GigaScience

Genome sequence of the small brown planthopper Laodelphax striatellus --Manuscript Draft--

Manuscript Number: GIGA-D-17-00204R1

Full Title: Genome sequence of the small brown planthopper Laodelphax striatellus

Article Type: Data Note

Funding Information:

Strategic Priority Research Program of the Chinese Academy of Sciences (XDB11040200): Dr. Feng Cui

Major State Basic Research Development Program of China (973 Program) (2014CB13840402): Dr. Feng Cui

National Natural Science Foundation of China (31371934): Dr. Yanyuan Bao

Abstract:

Background: Laodelphax striatellus Fallén (Hemiptera: Delphacidae) is one of the most destructive rice pests. L. striatellus differs from two other rice planthoppers with released genome sequences, Sogatella furcifera and Nilaparvata lugens, in many biological characteristics, such as host range, dispersal capacity, and vectoring plant viruses. Deciphering the genome of L. striatellus will help understand the genetic basis of the biological differences among the three rice planthoppers.

Findings: 193 Gb Illumina data and 32.4 Gb Pacbio data were generated and used to assemble a high quality L. striatellus genome sequence, which is 541 Mb in length and has a contig N50 of 118 Kb and a scaffold N50 of 1.08 Mb. Annotated repetitive elements account for 25.7% of the genome. 17736 protein-coding genes were annotated, capturing 97.6% and 98% of BUSCO eukaryote and arthropoda genes, respectively. Compared to N. lugens and S. furcifera, L. striatellus has the smallest genome and the lowest gene number. Gene family expansion and transcriptomic analyses provided hints to the genomic basis of the differences in important traits such as host range, migratory habit and plant virus transmission between L. striatellus and the other two planthoppers.

Conclusions: We report a high quality genome assembly of L. striatellus, which is an important genomic resource not only for the study of the biology of L. striatellus and its interactions with plant hosts and plant viruses, but also for comparisons to other planthoppers.

Corresponding Author: Feng Cui, Institute of Zoology, Chinese Academy of Sciences, Beijing, CHINA

Corresponding Author Secondary Information:

Corresponding Author's Institution: Institute of Zoology, Chinese Academy of Sciences

Corresponding Author's Secondary Institution:

First Author: Junjie Zhu

First Author Secondary Information:

Order of Authors: Junjie Zhu, Feng Jiang, Xianhui Wang, Pengcheng Yang, Yanyuan Bao, Wan Zhao, Wei Wang, Hong Lu, Qianshuo Wang, Na Cui, Jing Li, Xiaofang Chen, Lan Luo, Jinting Yu, Le Kang, Feng Cui

Order of Authors Secondary Information:

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation

Response to Reviewers:

Editor suggestions: I agree with reviewer 1 that all genome resources, such as contigs, scaffolds, and annotated scaffolds, should be made easily accessible, for example via the i5k repository mentioned by the reviewer, and via our repository GigaDB.

Response: We contacted i5k and all genome resources are being submitted to the i5k repository. We have uploaded these data to the repository GigaDB (ftp://[email protected]).

In your revised manuscript, please include a citation to your GigaDB dataset in your reference list, and cite this in the data availability section and elsewhere in the manuscript, where appropriate.

Response: We added a citation [57] to our GigaDB dataset in the reference list and cited it in the data availability section (Line 450).

Reviewer #1: The paper is very straightforward, easy to read and the data are strong and very clear. This is a short, interesting and comprehensive genome paper with appropriate strategies. I did not see either in the manuscript or in the GigaDB repository the access to the genome resources: contigs, scaffolds, annotated scaffolds, browser… Please make it easily accessible. Access to the assembled and annotated genome could be easier in a dedicated database, such as the one developed by i5k. I encourage the authors to contact the curators of this database.

Response: We contacted i5k and all genome resources are being submitted to the i5k repository. We have uploaded these data to the repository GigaDB (ftp://[email protected]).

It is probable that L. striatellus has a bacterial symbiont. Did the authors find some traces of bacterial sequences that could correspond to the symbiont(s)?

Response: We aligned the Wolbachia genome to the genome of L. striatellus and found that 96% of the Wolbachia genome was covered by the L. striatellus genome sequence.

I also highlight that the authors made some quantitative and qualitative comparisons with the 2 other planthopper genomes. More particularly, they performed transcriptomic analyses between these three species in terms of virus charge. I suggest that the authors indicate whether the three viruses used for this study have the same kind of biology within their host: do they replicate, for instance, in the planthopper host? Also, in the search for GO terms specific to the different conditions, it is not clear whether the authors used a statistical analysis to test whether the enrichment is significant.

Response: Yes, the three viruses used for this study are all transmitted in a persistent-propagative manner. We added this information in Line 384. We retrieved the common GO terms from the respective GO terms specific to the different conditions (Lines 405-406) and did not use enrichment analysis.

Table 1: please indicate more clearly which library is DNA or RNA.

Response: We added the library type information to Table 1 as suggested by the reviewer.

Fig 2 and Fig S7: mark with an arrow the location of L. striatellus

Response: We added an arrow marking the location of L. striatellus in these two figures.

Fig 4: I think information is missing, such as the divergence between the branches and the meaning of the red spots.

Response: The divergence times between branches are marked in Figure S7. The red spots in Figure 4 mark the divergence points between the branches.

Fig S4: it is difficult to follow with all the different curves of different colors. Is it necessary to show the data for all the species? Would it be better to select some species?

Response: We would prefer not to use fewer species in Fig S4. Cross-species density plots using a broad range of insect species provide a global comparison of gene structure parameters. The comparison indicates that the gene structure parameters of the small brown planthopper do not show an obvious deviation from those of the other species.

Reviewer #2: The one suggestion I would make is that the observations on host range do not always agree with what was found in the comparison of the Pea aphid and the Soybean aphid. There, the smaller genome was associated with a reduced host range. It is important to make the comparison between the results for the aphid and planthopper genomes to show that generalizations based only on one or the other will be misleading.

Response: We made the comparison between the results for the aphid and planthopper genomes in Lines 438-443: Despite having the smallest genome, L. striatellus has the widest host plant range among the three planthoppers. This situation is different from that of the genome evolution in Aphididae, where the soybean aphid, Aphis glycines, which is an extreme specialist, has the smallest genome compared to three other aphid species with published genome sequences [56].

Reference 8 is often cited rather than the original source. Unless the number of references is limited the original source should be cited.

Response: We replaced reference 8 with the original source, the new reference [10], for the flow cytometry in Lines 121 and 122.

Keywords should not include words in the title.

Response: We adjusted the keywords as “Comparative genomics; Insects; Genome sequencing; Annotation; Virus transmission”.

Line 129 In total, we. Not "Totally WE"

Response: We made the correction as the reviewer suggested.

Look at S1 Line 157. "Solution, fixed in", not "solution. After fixed". The latter could read as "After fixing", but it will be viewed as jargon.

Response: We made the correction as the reviewer suggested.

Comments regarding base composition are either unclear or vague. Line 201 refers to "proper base composition". The term proper is subjective. It is not clear what it means in this context. The same can be said for Line 202. It is not clear what is meant by "The GC content of L. striatellus was 34.54%, similar to that of N. lugens". Is this in reference to overall GC, or based on bin composition? The reader needs more information here. What exactly was done?

Response: We revised these sentences as follows: First, the overall base composition and the percentage of Ns were calculated. As shown in Table S1, the assembled genome had a low percentage (1.99%) of Ns and an expected base composition, which is similar to that of the other two planthoppers. The overall GC content of L. striatellus was 34.54%, similar to that of N. lugens [8] and slightly higher than that of S. furcifera [9].

Line 256 delete "was used".

Response: "was used" was deleted.

Line 421. Missing word, "We referred to genes"

Response: “to” was added here.

Additional Information:

Question: Are you submitting this manuscript to a special series or article collection?
Response: No

Question: Experimental design and statistics
Response: Yes

Full details of the experimental design and statistical methods used should be given in the Methods section, as detailed in our Minimum Standards Reporting Checklist. Information essential to interpreting the data presented should be made available in the figure legends.

Have you included all the information requested in your manuscript?

Question: Resources
Response: Yes

A description of all resources used, including antibodies, cell lines, and software tools, with enough information to allow them to be uniquely identified, should be included in the Methods section. Authors are strongly encouraged to cite Research Resource Identifiers (RRIDs) for antibodies, model organisms and tools, where possible.

Have you included the information requested as detailed in our Minimum Standards Reporting Checklist?

Question: Availability of data and materials
Response: Yes

All datasets and code on which the conclusions of the paper rely must be either included in your submission or deposited in publicly available repositories (where available and ethically appropriate), referencing such data using a unique identifier in the references and in the “Availability of Data and Materials” section of your manuscript.

Have you met the above requirement as detailed in our Minimum Standards Reporting Checklist?


Genome sequence of the small brown planthopper Laodelphax striatellus

Junjie Zhu1,4*, Feng Jiang2*, Xianhui Wang1, Pengcheng Yang2, Yanyuan Bao3, Wan Zhao1, Wei Wang1, Hong Lu1, Qianshuo Wang1, Na Cui1, Jing Li1, Xiaofang Chen1, Lan Luo1, Jinting Yu1, Le Kang1,2#, Feng Cui1#

1State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China

2Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China

3State Key Laboratory of Rice Biology and Ministry of Agriculture Key Laboratory of Agricultural Entomology, Institute of Insect Sciences, Zhejiang University, Hangzhou 310058, China

4University of Chinese Academy of Sciences, Beijing 100049, China

Junjie Zhu ([email protected]), Feng Jiang ([email protected]), Xianhui Wang ([email protected]), Pengcheng Yang ([email protected]), Yanyuan Bao ([email protected]), Wan Zhao ([email protected]), Wei Wang ([email protected]), Hong Lu ([email protected]), Qianshuo Wang ([email protected]), Na Cui ([email protected]), Jing Li ([email protected]), Xiaofang Chen ([email protected]), Lan Luo ([email protected]), Jinting Yu ([email protected]), Le Kang ([email protected]), Feng Cui ([email protected])

#Authors for correspondence:

Feng Cui
Tel: +86-10-64807218; Email: [email protected].

Le Kang
Tel: +86-10-64807219; Email: [email protected].

*These authors contributed equally to this work.

Abstract

Background: Laodelphax striatellus Fallén (Hemiptera: Delphacidae) is one of the most destructive rice pests. L. striatellus is different from two other rice planthoppers with a released genome sequence, Sogatella furcifera and Nilaparvata lugens, in many biological characteristics, such as host range, dispersal capacity, and vectoring plant viruses. Deciphering the genome of L. striatellus will help to understand the genetic basis of the biological differences among the three rice planthoppers.

Findings: 193 Gb Illumina data and 32.4 Gb Pacbio data were generated and used to assemble a high quality L. striatellus genome sequence, which is 541 Mb in length and has a contig N50 of 118 Kb and a scaffold N50 of 1.08 Mb. Annotated repetitive elements account for 25.7% of the genome. 17736 protein-coding genes were annotated, capturing 97.6% and 98% of BUSCO eukaryote and arthropoda genes, respectively. Compared to N. lugens and S. furcifera, L. striatellus has the smallest genome and the lowest gene number. Gene family expansion and transcriptomic analyses provided hints to the genomic basis of the differences in important traits such as host range, migratory habit and plant virus transmission between L. striatellus and the other two planthoppers.

Conclusions: We report a high quality genome assembly of L. striatellus, which is an important genomic resource not only for the study of the biology of L. striatellus and its interactions with plant hosts and plant viruses, but also for the comparisons to other planthoppers.

Keywords: Comparative genomics; Insects; Genome sequencing; Annotation; Virus transmission

Background

The small brown planthopper, Laodelphax striatellus (Delphacidae, Hemiptera), is one of the most destructive pests in a variety of crops (Figure 1). It is widespread in the Palearctic region, including countries such as China, Japan, Germany, Italy, Russia, Kazakhstan, Turkey, and the United Kingdom [1]. L. striatellus is polyphagous and its hosts include rice, maize, oats, tall oatgrass, wheat, and barley. It injures plants by sucking sap with its piercing-sucking mouthparts, after which symptoms such as stunting, chlorosis and hopper burn may develop. Apart from feeding damage, L. striatellus transmits various plant viruses, such as Rice stripe virus (RSV), Rice black-streaked dwarf virus (RBSDV), Barley yellow striate mosaic virus, Maize rough dwarf virus, Wheat rosette stunt virus, and Wheat chlorotic streak virus [2]. Some of these viruses, such as RSV and RBSDV, may cause serious damage to agricultural production. For example, rice stripe disease caused by RSV has broken out over the past several decades in many East Asian countries, including China, where rice field production was reduced by 30%–50% and total loss of harvest was observed in some areas [3].

L. striatellus is distinct from two other rice planthoppers, the white-backed planthopper (Sogatella furcifera) and the brown planthopper (Nilaparvata lugens), in several important traits such as host range, dispersal capacity, and the plant viruses that they vector. N. lugens mostly feeds on rice plants, S. furcifera feeds on rice, wheat and maize, and L. striatellus has an even broader host range. Both N. lugens and S.
furcifera are known for migratory habits [4]. Whereas S. furcifera is the vector of Southern rice black-streaked dwarf virus (SRBSDV) [5], and N. lugens is the vector of Rice ragged stunt virus (RRSV) and Rice grassy stunt virus [6, 7], L. striatellus is the carrier of RSV, RBSDV and several other viruses. Although the genome sequences of S. furcifera and N. lugens have been released recently [8, 9], no comparative genomic analyses were reported for the two planthoppers. Deciphering the genome of L. striatellus can help understand the genetic basis underlying the differences in important traits between L. striatellus and the other two rice planthoppers.

Data description

Sample and sequencing

The line used for genome sequencing is an inbred laboratory strain that was derived from a field population collected in Hai’an, Jiangsu province, China. A single gravid female was selected and her progeny were sib-mated for 22 generations to obtain the inbred line. Planthoppers were reared on 2-3 cm rice seedlings at 25 °C and a photoperiod of 16:8 h light/dark. DNA was extracted from the F22 specimens using Puregene Core Kit A (Qiagen) following the manufacturer’s instructions. We built 5 libraries with insert sizes between 180 bp and 800 bp for paired-end sequencing and 9 libraries with insert sizes between 1.4 Kb and 24 Kb for mate-pair sequencing according to standard protocols of the Illumina HiSeq 2500 sequencer (Table 1). We also constructed 33 Pacbio RSII libraries according to the standard Pacbio protocols (Table 1).
In total, we generated 190 Gb Illumina data (126 Gb paired-end reads and 64 Gb mate-pair reads) and 32.4 Gb Pacbio data, representing 316X and 54X coverage of the genome, respectively.

For transcriptome sequencing, total RNA was isolated from four tissues (antenna, brain, fat body and gonad) and whole bodies of three developmental stages (egg, nymph, adult) of L. striatellus using TRIzol reagent (Invitrogen) according to the manufacturer’s protocol. Nanodrop (Thermo Scientific) was used to determine RNA quantity and gel electrophoresis was used to examine RNA quality. cDNA libraries were constructed according to the manufacturer’s instructions and sequenced on an Illumina HiSeq 2500 sequencer.

Estimation of genome size and determination of chromosome number

We estimated the genome size of L. striatellus using two independent approaches: flow cytometry [10] and k-mer analyses [11]. The flow cytometry analysis was carried out according to a published procedure [10]. Briefly, a female adult was ground in PBS-T buffer. The mixture was filtered through a 40 μm cell filter, incubated with 2 μg/ml RNase A at 37 °C for 15 minutes and then stained with 5 μg/ml propidium iodide at 25 °C for 30 minutes. The fluorescence signal was detected by a FACSCalibur Analyzer (Becton, Dickinson and Company). Heads of Drosophila melanogaster and cytoblasts of Gallus gallus were processed with the same procedure and used as genome size references. The genome sizes of D. melanogaster and G. gallus are known to be 0.18 pg and 1.25 pg, respectively [12]. As shown in Figure S1, the genome size of L. striatellus was estimated to be 0.60 pg (587 Mb) by the flow cytometry method.
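
As a quick arithmetic check of the figures above, the following sketch converts a DNA mass in picograms to megabases and recomputes sequencing depth. It assumes the commonly used conversion of 1 pg ≈ 978 Mb, which the manuscript does not state, and an approximate genome size of 600 Mb for the depth calculation, which is also an assumption, chosen here only because it is consistent with the reported 316X and 54X. The function names are illustrative and are not part of the authors' pipeline.

```python
# Assumption: the standard conversion of 1 pg of DNA to about 978 Mb.
# This factor is not given in the manuscript.
MB_PER_PG = 978.0


def pg_to_mb(c_value_pg: float) -> float:
    """Convert a DNA mass in picograms to a genome size in megabases."""
    return c_value_pg * MB_PER_PG


def sequencing_depth(data_gb: float, genome_mb: float) -> float:
    """Return fold coverage: sequenced bases divided by genome size."""
    return data_gb * 1000.0 / genome_mb


if __name__ == "__main__":
    # Flow cytometry estimate reported in the text: 0.60 pg.
    print(f"0.60 pg = {pg_to_mb(0.60):.0f} Mb")
    # Assumption: a genome size of roughly 600 Mb for the depth estimate.
    print(f"Illumina depth: {sequencing_depth(190, 600):.1f}X")
    print(f"Pacbio depth: {sequencing_depth(32.4, 600):.1f}X")
```

With these assumptions the conversion gives about 587 Mb, matching the reported flow cytometry estimate.
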
In k-mer analysis, 31.94 Gb clean reads were utilized to generate a k-mer (k = 17) depth distribution curve (Figure S1D), based on which the genome size was estimated to be 550 Mb. Accordingly, the haploid genome size of L. striatellus was estimated to be 550-587 Mb.

The chromosome number was determined by cytological analysis of testes cells. The testes of newly emerged males were dissected in insect Ringer solution and fixed in Carnoy’s fixative for 15 minutes. The testes were washed with 0.01 mol/L PBS solution, stained with 0.5 μg/ml Hoechst 33258, and sealed with Antifade Mounting Medium (Beyotime). Cells in meiosis were selected for chromosome counting under a Zeiss LSM710 confocal microscope (Zeiss). In most cases, 15 haploid chromosomes were observed (30 for diploid chromosomes, Figure S2), although sometimes only 14 were visible. Thus the number of chromosomes in L. striatellus was determined to be 2n = 30.

Genome assembly and assessment

We assembled the genome with both Illumina sequencing and Pacbio sequencing data. Illumina data were used to build contigs and scaffolds as follows. First, all reads with >= 10% unidentified nucleotides, or with > 10 nt aligned to the adapter sequences, or being putative PCR duplicates were removed to obtain clean reads. Mate-pair reads from libraries with insert sizes >2 kb were classified as paired-end, unpaired, negative, and mate-pair reads, and only the negative and mate-pair reads were retained for the assembly. Second, we employed SOAPdenovo Ver. 3.0 (SOAPdenovo, RRID:SCR_010752) [13, 14] with the parameters “pregraph -K 33 -p 30 -d 30; contig -k 33 -M 3” to build the de Bruijn graph and assemble sequencing reads into contigs.
Third, all mate-pair reads were mapped to the contigs and mate-pair information was added in a stepwise manner to connect contigs into scaffolds. GapCloser Ver. 1.12 (GapCloser, RRID:SCR_015026) [13] was used to fill the gaps between scaffolds with a local assembly strategy. Afterwards, PBJelly Ver. 15.8.24 (PBJelly, RRID:SCR_012091) [15] was used to fill gaps between scaffolds using the 32.4 Gb (~ 54X) Pacbio data. Briefly, all the gaps (length >25 bp) in the assembly were identified first and the Pacbio reads were mapped to the assembly using PBJelly. The BLASR alignments were parsed to identify gap-supporting reads by comparing aligned and un-aligned base positions within each read [16]. The overlap-layout-consensus engine ALLORA within PBSuite (Ver. 15.8.24, Pacific Biosciences Menlo Park) [17] was used to assemble the reads for each gap to generate consensus gap-filling sequences. As the final step, the consensus gap-filling sequences were spliced into the corresponding gap position in the draft assembly, replacing all N's if the gap was closed and leaving an appropriate number of N's if the gap was only reduced.

With the above assembly procedure, we obtained a final assembly of 541 Mb, having 38,193 scaffolds with a contig N50 length of 118 Kb and a scaffold N50 length of 1.1 Mb. The length of the assembly accounts for 91.7% and 98.4% of the estimated genome size by flow cytometry and k-mer analysis, respectively. The longest contig and scaffold were 2.0 Mb and 10.4 Mb, respectively (Table 2). The Pacbio sequencing data greatly improved the length of contigs compared to the published genomes of N. lugens (contig N50, 24.2 Kb [8]) and S.
furcifera (contig N50, 70.7 Kb [9]), which were assembled with Illumina data only (Table 2). We aligned clean reads onto the genome assembly using BWA (BWA, RRID:SCR_010910) [18] and calculated the fraction of bases at a given sequencing depth. The results showed a very small fraction of low coverage bases, suggesting high coverage and accuracy of the genome assembly (Figure S3).

Validation and quality control

The completeness and accuracy of the genome assembly were assessed by four independent approaches. First, the overall base composition and the percentage of Ns were calculated. As shown in Table S1, the assembled genome had a low percentage (1.99%) of Ns and an expected base composition, which is similar to that of the other two planthoppers. The overall GC content of L. striatellus was 34.54%, similar to that of N. lugens [8] and slightly higher than that of S. furcifera [9]. Second, we remapped Illumina paired-end reads to the assembly using BWA [18] and found that 93.2% of reads could be mapped back, covering 96.83% of the assembled genome, including 95.08% of the genome with ≥ 20X coverage (Table S2). Third, we performed de novo transcriptome assembly using Trinity Ver. 2.0.2 (Trinity, RRID:SCR_013048) for RNA-seq data from multiple developmental stages and tissues (Table 1). We also included two published RNA sequencing datasets from salivary glands and alimentary canal [19] in the transcriptome assembly.
We mapped the assembled transcripts to the genome assembly using TopHat (TopHat, RRID:SCR_013035) with default parameters and found that 90.31% of the transcripts with > 90% transcript coverage were aligned to one scaffold (Table S3), indicating that most expressed genes were correctly assembled in the genome. When the RNA reads from the nine transcriptome datasets were directly mapped to the genome, 78% to 94% could be correctly mapped with appropriate splicing, indicating that the genome assembly represented gene regions well (Table S4). Finally, the benchmarking universal single-copy orthologs (BUSCO, RRID:SCR_015008), Ver. 1, dataset representing 2675 genes for arthropoda was used for genome assessment [20]. Our assembled genome captured 92% (2470/2675) of the BUSCO genes, suggesting that the gene repertoire was nearly complete (Table S5). Taken together, these results suggest that our assembled genome was highly accurate and covered nearly the whole genome.

Annotation of repetitive elements

Two independent methods, namely homology-based and de novo prediction, were applied for repetitive element annotation. For the homology-based method, the assembled genome was compared to Repbase issued on January 13, 2014 [21] using RepeatMasker Ver. 4.0.5 (RepeatMasker, RRID:SCR_012954) and RepeatProteinMask (Ver. 1.36) with default settings [22]. For the de novo prediction, we built a de novo repeat library with LTR_FINDER Ver. 1.0.5 (LTR_Finder, RRID:SCR_015247) [23], Piler (Ver. 1.06) [24], RepeatScout Ver. 1.0.5 (RepeatScout, RRID:SCR_014653) [25] and RepeatModeler Ver. 1.0.8 (RepeatModeler, RRID:SCR_015027). Tandem Repeats Finder (Ver.
4.07b) [26] was used to search for tandem repeats. Furthermore, RepeatProteinMask [22] was used to identify putative transposable element (TE) related proteins. After merging all the repetitive elements identified by the above-mentioned tools, we identified a total of 139.1 Mb repetitive sequences, accounting for 25.7% of the genome (Table S6). The percentage of repetitive elements in the L. striatellus genome was much lower than those of N. lugens (48.6% [8]) and S. furcifera (44.3% [9]). Of all the repetitive sequences, 10.59% were class I transposable elements (retrotransposons), including 5.01% long interspersed nuclear elements, 1.32% long terminal repeats, and 4.26% short interspersed nuclear elements. Class II elements (DNA transposons) represented only 4.92% of the genome (Table 3). L. striatellus had the lowest TE fraction and the smallest genome size compared to N. lugens and S. furcifera (Table 3).

Annotation of protein-coding genes

The protein-coding genes were annotated with evidence from the homology-based method, ab initio prediction, and RNA-seq data. For the homology-based method, the annotated gene sets from eight species, N. lugens, Acyrthosiphon pisum, Pediculus humanus, Nasonia vitripennis, D. melanogaster, Bombyx mori, Rhodnius prolixus and Daphnia pulex (Table S7), were aligned to the L. striatellus genome using TBLASTN (TBLASTN, RRID:SCR_011822) [27] with an E-value cutoff of 1E-5. GeneWise Ver. 2.2.0 (GeneWise, RRID:SCR_015054) [28] was used to define gene models. For ab initio prediction, we utilized Augustus Ver. 3.1 (Augustus: Gene Prediction, RRID:SCR_008417) [29], GlimmerHMM Ver. 3.0.4 (GlimmerHMM, RRID:SCR_002654) [30], SNAP (Ver.
2013-11-29) [31], GeneID (Ver. 1.4) [32, 33] and GENSCAN Ver. 1.0 (GENSCAN, RRID:SCR_012902) [34] to predict potential protein-coding genes from the repeat-masked genome. Furthermore, we identified gene structures with the assistance of nine transcriptomes assembled by Tophat-Cufflinks (Ver. 2.2.1) [35] and Trinity-PASA (Ver. 2.0.2) [36], respectively. Then we integrated all predicted gene structures above with EvidenceModeler (Ver. 1.1.1) [37] to obtain a non-redundant set of 17736 protein-coding genes with an average gene length of around 16.17 Kb (Tables S8-S9, Figure S4). We constructed the orthologous gene families using annotated genes from 22 closely related species (Table S7) and found that L. striatellus had 4210 species-specific genes, far fewer than N. lugens (10163) and S. furcifera (7743) (Figure 2). This may be attributed to the smaller genome size and lower gene number in L. striatellus.

We used three methods to evaluate the gene models that we obtained. First, we examined the 2 Kb upstream and downstream regions of annotated genes and found that the majority (16525, 93.17%) of genes did not contain any ambiguous bases (Ns) in these regions, indicating that these gene models are not located near an assembly gap and are thus unlikely to be fragments. Second, we compared our annotated genes with the corresponding orthologous genes in D. melanogaster. We performed BLASTX (BLASTX, RRID:SCR_001653) [27] searches against the D. melanogaster gene set using the de novo assembled transcripts in L. striatellus. A total of 8484 assembled transcripts that had identity > 60% with a D.
49 50 261 melanogaster gene and covered > 90% of the coding region were regarded as full-length 51 52 53 262 transcripts. Among them, 3728 transcripts (excluding redundant protein isoforms) 54 55 56 263 containing a complete ORF were searched against the annotated genes and 3093 57 58 264 59 (82.97%) of them had near perfect match to an annotated gene, indicating that most 60 61 12 62 63 64 65 1 265 annotated genes were complete. Third, we compared our annotated genes to the two 2 3 266 sets of BUSCO (Ver. 2) genes (1066 arthropoda genes and 303 eukaryote genes) [20] 4 5 6 267 and found that our predicted genes were considered as complete BUSCO genes in 97.6% 7 8 9 268 and 98.0% of the eukaryote genes and arthropoda genes, respectively (Figure S5), 10 11 12 269 suggesting that a nearly complete repertoire of protein-coding gene set was determined. 13 14 270 To estimate the level of heterozygosity in the gene model, we aligned 23X reads 15 16 17 271 to the genome assembly with BWA [18]. After removing duplicates, heterozygous 18 19 20 272 SNPs were identified using BCFtools [38]. The heterozygous SNPs in the coding 21 22 23 273 regions of each gene were used to compute read coverage and heterozygosity. Only a 24 25 274 single heterozygosity peak of around 0.3 was detected (Figure S6A). We ranked the 26 27 28 275 heterozygosity rate of all the gene set and took the top 20% as high heterozygosity (the 29 30 31 276 left as low heterozygosity). Coverage histograms of high and low heterozygosity 32 33 34 277 showed similar range of coverage distribution (Figure S6B). Therefore, the 35 36 278 heterozygosity did not influence the gene annotation. 37 38 39 279 In order to obtain putative functional assignments to the annotated genes, we 40 41 42 280 compared the annotated protein sequences of L. 
striatellus to proteins in KEGG (KEGG, 43 44 45 281 RRID:SCR_012773) [39], NR [40] and Swiss-Prot [41] databases using BLASTP 46 47 282 -5 48 (BLASTP, RRID:SCR_001010) [27] with an E-value cutoff of 1E . Domains and 49 50 283 motifs were scanned in Interpro [42] database by InterProScan (InterProScan, 51 52 53 284 RRID:SCR_005829) [43]. There were 78.7%, 66.3%, 63.6%, and 69.5% of annotated 54 55 56 285 proteins showing significant sequence similarity with the proteins in NR, Swiss-prot, 57 58 286 59 KEGG, and InterPro (InterPro, RRID:SCR_006695), respectively. Among the 12322 60 61 13 62 63 64 65 1 287 genes with an InterPro hit, 11159 (90.6%) had Pfam (Pfam, RRID:SCR_004726) 2 3 288 annotations and 8935 (72.5%) had Gene Ontology (GO, RRID:SCR_002811) 4 5 6 289 associations. After removing redundancy, 14182 of 17736 genes (80.0%) were assigned 7 8 9 290 to known databases (Figure 3). Among the 3554 unannotated genes, 1391 (7.8%) were 10 11 12 291 L. striatellus-specific genes. 13 14 292 15 16 17 293 Gene orthology prediction 18 19 20 294 Twenty one sequenced insects (Zootermopsis nevadensis, Tribolium castaneum, 21 22 23 295 Anoplophora glabripennis, Anopheles gambiae, D. melanogaster, A. pisum, Diuraphis 24 25 296 noxia, Cimex lectularius, L. striatellus, R. prolixus, N. lugens, S. furcifera, Diaphorina 26 27 28 297 citri, Oncopeltus fasciatus, Apis mellifera, N. vitripennis, B. mori, B. tabaci, Danaus 29 30 31 298 plexippus, Locusta migratoria, and P. humanus) and one non-insect arthropoda 32 33 34 299 sequenced species (D. pulex) were used to infer gene orthology and reconstruct the 35 36 300 phylogenetic tree. The annotated coding sequences were downloaded from the websites 37 38 39 301 listed in Table S7. The homologous gene families were identified using TreeFam [44, 40 41 42 302 45] and ascribed in different categories (Figure 2). 
The gene families were identified 43 44 45 303 following these steps: i) BLASTP [27] was used to compare all protein sequences for 46 47 304 -7 48 the 22 species with an E-value cutoff of 1E ; ii) the blast alignments were concatenated 49 50 305 by Solar (Ver. 0.9.6) [45], followed by homology identification among protein 51 52 53 306 sequences; and iii) gene families were identified using hcluster_sg (Ver. 0.5.0) [45]. 54 55 56 307 RAxML (Ver. 8.0.19) [46] was used to reconstruct the phylogenetic tree based on the 57 58 308 59 concatenated single-copy protein sequences under PROTGAMMAAUTO model with 60 61 14 62 63 64 65 1 309 100 bootstrap replicates. R8s (Ver. 1.7.1) [47] and MCMCtree (PAML package, Ver. 2 3 310 4.7; PAML, RRID:SCR_014932) [48] were used to estimate the divergence times 4 5 6 311 among species. The parameters used in MCMCtree were “--rootage 510 -clock 3 -alpha 7 8 9 312 0.977999 -model 7”. To examine gene family expansion and contraction in the three 10 11 12 313 planthoppers, we chose one additional hemipteran species R. prolixus as outgroup to 13 14 314 infer expanded/contracted gene families using CAFE (Ver. 3.1) [49]. A conditional P- 15 16 17 315 value was calculated for each gene family and the gene families with P-value < 0.05 18 19 20 316 were considered as significantly expanded or contracted. The phylogenetic analysis 21 22 23 317 revealed that L. striatellus clustered together with the other two planthoppers and had 24 25 318 a closer relationship to S. furcifera than N. lugens (Figure 4). The divergence times of 26 27 28 319 non-planthopper insect species were generally consistent with those estimated in the 29 30 31 320 previous study [8]. The result of molecular dating analysis indicated that the ancestor 32 33 34 321 of L. striatellus and S. furcifera split with N. lugens about 87.5 million years ago and L. 35 36 322 striatellus diverged from S. furcifera approximately 31 million years ago (Figure S7). 
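As a rough illustration of the significance criterion stated above (gene families with a conditional P-value < 0.05 are treated as significantly expanded or contracted), the following Python sketch applies that threshold to a small table of per-family results. Only the 0.05 cutoff comes from the text. The record layout, the family identifiers, and all numbers are hypothetical and do not reproduce CAFE output or any value from this study.

```python
from typing import NamedTuple

P_VALUE_CUTOFF = 0.05  # threshold stated in the text


class FamilyResult(NamedTuple):
    """Hypothetical per-family record; not the CAFE output format."""
    family_id: str       # made-up identifier
    ancestral_size: int  # inferred gene count at the parent node
    current_size: int    # gene count in the species of interest
    p_value: float       # conditional P-value for the family


def significant_changes(results, cutoff=P_VALUE_CUTOFF):
    """Group family IDs with P-value < cutoff by direction of size change."""
    changes = {"expanded": [], "contracted": []}
    for r in results:
        if r.p_value >= cutoff:
            continue  # size change is not significant
        if r.current_size > r.ancestral_size:
            changes["expanded"].append(r.family_id)
        elif r.current_size < r.ancestral_size:
            changes["contracted"].append(r.family_id)
    return changes


# Invented example values, for illustration only.
example = [
    FamilyResult("fam_A", 4, 11, 0.001),
    FamilyResult("fam_B", 6, 2, 0.030),
    FamilyResult("fam_C", 3, 5, 0.210),
    FamilyResult("fam_D", 5, 5, 0.900),
]

print(significant_changes(example))
```

In this sketch, families whose size changed but whose P-value is at or above the cutoff (such as the hypothetical fam_C) are left out of both lists. This mirrors how the text separates significantly expanded families from families that merely increased.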
Compared with N. lugens and S. furcifera, L. striatellus had fewer expanded gene families and more contracted gene families (Figure S8). This might partially explain why L. striatellus has the lowest gene number among the three planthopper species. Since the divergence of L. striatellus and S. furcifera, L. striatellus and S. furcifera had 95 and 547 expanded gene families, respectively (Figure S8). The significantly expanded gene families in L. striatellus included some specific members of multi-gene families, such as odorant receptor, cytochrome P450, and serine protease (especially trypsin, Table S10). The specific members of chemosensory protein, odorant binding protein, carboxylesterase, and ATP-binding cassette transporter families were also increased in L. striatellus although their P-values were higher than 0.05 (Table S10). Expansion of these gene families may have contributed to the widest host plant range of L. striatellus among the three planthoppers. The specific members of gene families associated with energy metabolism were significantly expanded in S. furcifera, such as acyl-CoA synthetase, fatty acyl-CoA reductase, acyl-CoA-binding protein, and acyl-coenzyme A thioesterase. The specific members of glyceraldehyde-3-phosphate dehydrogenase, D-beta-hydroxybutyrate dehydrogenase, ADP/ATP translocase, acyl-CoA transporter, and ATP synthase families also increased, albeit with P-values higher than 0.05 (Table S10). N. lugens had 433 expanded gene families (Figure S8). Several specific members of energy metabolism related gene families, including Delta(3,5)-Delta(2,4)-dienoyl-CoA isomerase, ATP-citrate synthase, malonyl-CoA decarboxylase, NADH dehydrogenase (ubiquinone) 1α subcomplex subunit 7 and subunit 8, acyl-CoA synthetase, ATP synthase, and enoyl-CoA delta isomerase, increased in N. lugens although their P-values were higher than 0.05 (Table S10). Expansion in the energy metabolism related gene families is in accordance with the migratory habit of S. furcifera and N. lugens.

Olfaction and detoxification system

It is essential for herbivorous insects to recognize and locate their host plants using their senses of gustation and olfaction. Chemicals from the environment are received and recognized by chemoreceptors, including odorant receptors (ORs), gustatory receptors (GRs), and ionotropic receptors (IRs), in gustatory and olfactory organs. Detoxification gene families also play an essential role in defense against natural xenobiotics from host plants or synthetic xenobiotics including insecticides. To identify chemoreception and detoxification related genes in L. striatellus, we retrieved the corresponding gene sequences of other insect species from previous studies and used them as queries. These genes were searched against the L. striatellus gene set using BLASTP [27] with an E-value cutoff of 1E-5. In addition, we scanned the gene sets of the three planthoppers for domain information using InterProScan and extracted genes with domains corresponding to each family. Finally, we integrated results from both BLASTP and InterProScan to obtain the final set of protein families.

There were 106 ORs, 38 IRs, and 12 GRs identified in L. striatellus (Table S11). The numbers of ORs and GRs in L. striatellus were more than twice those in N. lugens and S. furcifera, representing a significant expansion in these two families. This is consistent with the fact that L. striatellus is the most polyphagous among the three planthoppers, because polyphagous insects tend to have more OR genes than monophagous insects [8]. Moreover, we identified two protein families important for odor recognition and pheromone perception, namely odorant binding proteins (OBPs) and chemosensory proteins (CSPs). There were 16 OBPs and 31 CSPs in L. striatellus, the most among the three planthoppers (Table S11). The relatively higher number of odor related genes in L. striatellus might be closely related to its polyphagous habit.

We manually annotated families of detoxification related genes, including 26 UDP-glycosyltransferases, 29 glutathione-S-transferases, 54 carboxyl/cholinesterases, 73 ATP-binding cassette transporters, and 76 cytochrome P450s in L. striatellus (Table S12). The total number of detoxification related genes in L. striatellus was smaller than that in N. lugens, but larger than that in S. furcifera.

Immune-related genes

We identified immune gene repertoires of the three planthoppers based on a homology-based method. Immune genes from D. melanogaster, A. gambiae, Aedes aegypti, and Culex quinquefasciatus were downloaded from ImmunoDB [50]. Gene sets from the three planthoppers were used as queries and searched against the immune genes of the four insects, respectively, using BLASTX with an E-value cutoff of 1E-5. The best hits were selected for further domain architecture analysis using InterProScan and then were confirmed manually.
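The best-hit selection step described above can be sketched in Python as follows. The sketch assumes that the search results are available in BLAST tabular format with the standard 12 columns (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). It further assumes that the "best hit" is the hit with the highest bit score among those passing the 1E-5 E-value cutoff; the text does not state the ranking criterion. The gene and subject names are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # cutoff stated in the text


def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return {query: (subject, evalue, bitscore)} for the top hit per query.

    Assumes 12-column BLAST tabular input. Ranking by bit score is an
    assumption made for this sketch.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue  # hit does not pass the E-value cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Invented hits, for illustration only.
rows = [
    ["gene1", "immA", "78.0", "200", "44", "0", "1", "600", "1", "200", "1e-30", "250.0"],
    ["gene1", "immB", "55.0", "180", "81", "2", "1", "540", "5", "184", "1e-12", "90.5"],
    ["gene2", "immC", "30.0", "60", "42", "1", "10", "190", "3", "62", "2e-3", "35.1"],
]
example = "\n".join("\t".join(r) for r in rows)

print(best_hits(example))
```

Here the hypothetical gene2 is dropped because its only hit exceeds the E-value cutoff. A candidate list filtered this way would still need the domain architecture check and manual confirmation that the text describes.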
The number of immune-related genes in L. striatellus was 330, which was more than that in N. lugens (289) and S. furcifera (280) (Table S13). The redundant copies of immune genes in L. striatellus mainly included autophagy genes, 1,3-beta-D glucan binding protein genes, clip-domain serine protease genes, and genes of small RNA regulatory pathway members. However, the numbers of C-type lectin genes and Toll-like receptor genes were lower in L. striatellus compared to the other two planthoppers.

Transcriptomic responses of three planthoppers to their borne plant viruses

L. striatellus, S. furcifera, and N. lugens transmit different rice viruses. To explore the molecular response to respective plant viruses, we analyzed and compared the transcriptomic responses of L. striatellus to RSV, S. furcifera to SRBSDV, and N. lugens to RRSV. The three viruses are transmitted in a persistent-propagative manner. For L. striatellus, RSV was incubated in the 4th-instar nymphs for 5 d as described previously [51]. Three replicates of infected or non-infected insects were used to construct paired-end RNA-seq libraries for sequencing on an Illumina HiSeq 2500 sequencer. The transcriptomic data of S. furcifera infected with SRBSDV were retrieved from a previous study [52]. The 3rd-instar nymphs of N. lugens were infected with RRSV for 7 d before being collected for RNA extraction using the SV Total RNA Isolation System (Promega). The gene expression libraries for RRSV-infected and non-infected samples were constructed and sequenced on an Illumina HiSeq 2000 sequencer. RNA-seq reads were mapped to the corresponding genome using TopHat2 (Ver. 2.1.1) [53]. For L. striatellus and S. furcifera, HTSeq [54] was used to count the number of reads mapped to each gene model, and the edgeR package was used to identify differentially expressed genes (DEGs) with a fold change cutoff of 2 and an FDR cutoff of 0.01. For N. lugens, GFOLD (generalized fold change for ranking differentially expressed genes from RNA-seq data) was used to detect DEGs without biological replicates. The gene annotation files were downloaded from the corresponding websites (Table S14). We referred to genes with higher expression in the viruliferous group as up-regulated genes and those with lower expression as down-regulated genes. The results showed that 460 (185 up and 275 down), 162 (48 up and 114 down), and 1070 (515 up and 555 down) genes were differentially expressed in L. striatellus, N. lugens and S. furcifera, respectively, when bearing their respective plant viruses.

The DEGs in the three planthoppers were compared in GO terms and the common GO terms were retrieved (Table S15). The up-regulated genes in the three planthoppers were involved in the biological processes of regulation of transcription (GO:0006355) and protein phosphorylation (GO:0006468). The down-regulated genes in the three planthoppers took part in the biological processes of carbohydrate metabolic process (GO:0005975), chitin catabolic process (GO:0006032), and proteolysis (GO:0006508). Two zinc finger proteins of L. striatellus, one zinc finger protein of N. lugens, and six zinc finger proteins of S. furcifera were commonly up-regulated while genes of chitinases, cytochrome P450 CYP4s, and trypsins were commonly down-regulated in the three planthoppers (Table S16) in response to their respective plant viruses.
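The DEG criteria given earlier in this section for L. striatellus and S. furcifera (a fold change cutoff of 2 and an FDR cutoff of 0.01) can be illustrated with a minimal Python sketch. It assumes that each gene already has a log2 fold change (viruliferous versus non-viruliferous) and an FDR value from an upstream tool. It also assumes that the fold change cutoff is applied symmetrically, and that the comparisons are "at least 2-fold" and "FDR below 0.01"; the text does not specify whether the thresholds are inclusive. The values are invented and the sketch does not reproduce the edgeR analysis itself.

```python
import math

FOLD_CHANGE_CUTOFF = 2.0  # fold change cutoff stated in the text
FDR_CUTOFF = 0.01         # FDR cutoff stated in the text


def classify_gene(log2_fc, fdr):
    """Label a gene 'up', 'down', or 'not DE' under the stated cutoffs.

    log2_fc > 0 means higher expression in the viruliferous group.
    Inclusive/exclusive handling of the thresholds is an assumption.
    """
    if fdr >= FDR_CUTOFF or abs(log2_fc) < math.log2(FOLD_CHANGE_CUTOFF):
        return "not DE"
    return "up" if log2_fc > 0 else "down"


# Invented (log2 fold change, FDR) pairs, for illustration only.
example = {
    "geneA": (1.5, 0.001),
    "geneB": (-2.0, 0.005),
    "geneC": (3.0, 0.200),
    "geneD": (0.4, 0.000001),
}

labels = {gene: classify_gene(fc, fdr) for gene, (fc, fdr) in example.items()}
print(labels)
```

Both conditions must hold for a gene to be called a DEG: the hypothetical geneC shows a large change that is not significant, and geneD shows a significant change that is below 2-fold, so neither is counted.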
We also identified homologous genes that were commonly regulated in the three planthoppers by aligning N. lugens DEGs with those of L. striatellus and S. furcifera using BLASTP with an E-value cutoff of 1E-3, a sequence identity higher than 60% and a coverage higher than 50%. Three groups of homologous genes, including one group of commonly up-regulated genes and two groups of commonly down-regulated genes, were retrieved from the three planthoppers (Table S17). The protein lengths of these homologous genes ranged from 120 to 472 amino acids. We used these proteins as queries to search the NR database and found no homologous genes in other species with a cutoff of 1E-7, indicating that these genes are likely planthopper-specific genes.

Differences in immune response to virus infection in the three planthoppers were also observed. The RNAi pathway genes, RISC-loading complex TARBP2 and argonaute-3, were up-regulated in S. furcifera and N. lugens, respectively, but genes in the RNAi pathway did not respond to virus infection in L. striatellus. The antimicrobial peptide defensin was up-regulated in L. striatellus and N. lugens but was down-regulated in S. furcifera. The expression of the Down syndrome cell adhesion molecule gene increased in L. striatellus [55], decreased in S. furcifera, but did not show significant change in N. lugens in response to respective plant viruses.

In summary, we reported a high-quality genome of L. striatellus, a notorious rice pest insect. L. striatellus has the smallest genome and the lowest number of protein-coding genes compared to the other two rice planthoppers, S. furcifera and N. lugens. Comparative genomic analyses identified expansions and contractions in olfactory genes, detoxification genes, immune genes, and energy metabolism genes among the three rice planthoppers, which may have contributed to their differences in important traits such as host range, migratory habit, and plant virus transmission. Despite having the smallest genome, L. striatellus has the widest host plant range among the three planthoppers. This situation is different from that of the genome evolution in Aphididae, where the soybean aphid, Aphis glycines, which is an extreme specialist, has the smallest genome compared to three other aphid species with published genome sequences [56]. With the addition of the L. striatellus genome, the genome data of the three rice planthoppers will facilitate studies in various areas of planthopper biology and support the development of control strategies in the future.

Availability of supporting data

Genome sequencing and transcriptome data used for genome assembly and gene annotation are deposited in the SRA under BioProject number PRJNA393384. Further supporting data, including annotations, gene expression data, alignments, and BUSCO results, are available via the GigaScience repository GigaDB (GigaDB, RRID:SCR_004002) [57].
10 11 12 467 13 14 468 List of abbreviations 15 16 17 469 BUSCO: benchmarking universal single-copy ortholog; CSP: chemosensory 18 19 20 470 protein; DEG: differentially expressed gene; GO: Gene Ontology; GR: gustatory 21 22 23 471 receptor; IR: ionotropic receptor; OBP: odorant binding protein; OR: odorant receptor; 24 25 472 RBSDV: Rice black-streaked dwarf virus; RRSV: Rice ragged stunt virus; RSV: Rice 26 27 28 473 stripe virus; SRBSDV: Southern rice black streak dwarf virus; TE: transposable element; 29 30 31 474 32 33 34 475 Competing interests 35 36 476 The authors declare that there are no financial and non-financial competing interests in 37 38 39 477 this study. 40 41 42 478 43 44 45 479 Authors’ contributions 46 47 480 48 JZ and FJ collected the samples, prepared the DNA and RNA, analyzed the data, 49 50 481 and drafted the paper. XW and PY coordinated the project. YB sequenced the 51 52 53 482 transcriptomes. WZ, WW, HL, QW, NC, JL, XC, LL and JY analyzed the data. LK and 54 55 56 483 FC designed the research, wrote and revised the paper. 57 58 59 484 60 61 22 62 63 64 65 1 485 Acknowledgements 2 3 486 We thank Prof. Thomas Sicheritz-Pontén from Technical University of Denmark 4 5 6 487 and Prof. Renyi Liu from Shanghai Center for Plant Stress Biology and Center of 7 8 9 488 Excellence for Molecular Plant Sciences, Chinese Academy of Sciences for comments 10 11 12 489 and language suggestions. This work was supported by the Strategic Priority Research 13 14 490 Program of the Chinese Academy of Sciences (No. XDB11040200), Major State Basic 15 16 17 491 Research Development Program of China (973 Program) (No. 2014CB13840402), and 18 19 20 492 Natural Science Foundation of China (No. 31371934). 21 22 23 493 24 25 494 References 26 27 495 1. 3I Interactive Keys and Taxonomic Databases. Dmitriev DA. 2003. 28 29 496 http://dmitriev.speciesfile.org/index.asp. Accessed 1 May 2017. 30 497 2. GENUS LAODELPHAX FENNAH, 1963. 
College of Agriculture & Natural Resources, 31 32 498 University of Delaware. http://ag.udel.edu/research/delphacid/species/Laodelphax.htm. 33 499 Accessed 1 May 2017. 34 500 3. Sun DZ and Jiang L. Research on the Inheritance and Breeding of Rice Stripe Resistance. 35 36 501 Chinese Agricultural Science Bulletin. 2006; 12: 073. 37 502 4. Huang HJ, Xue J, Zhuo JC, Cheng RL, Xu HJ, and Zhang CX. Comparative analysis of the 38 503 transcriptional responses to low and high temperatures in three rice planthopper species. 39 40 504 Molecular ecology. 2017; 26(10): 2726-37. 41 505 5. Zhou GH, Wen JJ, Cai DJ, Li P, Xu DL, and Zhang SG. Southern rice black-streaked dwarf 42 43 506 virus: a new proposed Fijivirus species in the family Reoviridae. Chinese Science Bulletin. 2008; 44 507 53(23): 3677-85. 45 508 6. Jia DS, Guo NM, Chen HY, Akita F, Xie LH, Omura T, et al. Assembly of the viroplasm by 46 47 509 viral non-structural protein Pns10 is essential for persistent infection of rice ragged stunt virus 48 510 in its insect vector. Journal of General Virology. 2012; 93(10): 2299-309. 49 511 7. Zheng LM, Mao QZ, Xie LH, and Wei TY. Infection route of rice grassy stunt virus, a tenuivirus, 50 51 512 in the body of its brown planthopper vector, Nilaparvata lugens (Hemiptera: Delphacidae) after 52 513 ingestion of virus. Virus Research. 2014; 188: 170-73. 53 54 514 8. Xue J, Zhou X, Zhang CX, Yu L-L, Fan HW, Wang Z, et al. Genomes of the rice pest brown 55 515 planthopper and its endosymbionts reveal complex complementary contributions for host 56 516 adaptation. Genome Biology. 2014; 15(12): 521. 57 58 517 9. Wang L, Tang N, Gao XL, Chang ZX, Zhang LQ, Zhou GH, et al. Genome sequence of a rice 59 518 pest, the white-backed planthopper (Sogatella furcifera). GigaScience. 2017; 6(1): 1. 60 61 23 62 63 64 65 519 10. Hare EE and Johnston JS. Genome size determination using flow cytometry of propidium 1 520 iodide-stained nuclei. Molecular Methods for Evolutionary Genetics. 
2011: 3-12. 2 3 521 11. Li RQ, Fan W, Tian G, Zhu HM, He L, Cai J, et al. The sequence and de novo assembly of the 4 522 giant panda genome. Nature. 2010; 463(7279): 311. 5 523 12. Bennett MD, Leitch IJ, Price HJ, and Johnston JS. Comparisons with Caenorhabditis (∼ 100 6 7 524 Mb) and Drosophila (∼ 175 Mb) using flow cytometry show genome size in Arabidopsis to be∼ 8 525 157 Mb and thus∼ 25% larger than the Arabidopsis genome initiative estimate of∼ 125 Mb. 9 10 526 Annals of Botany. 2003; 91(5): 547-57. 11 527 13. Huang J, Zhang CM, Zhao X, Fei ZJ, Wan KK, Zhang Z, et al. The Jujube genome provides 12 528 insights into genome evolution and the domestication of sweetness/acidity taste in fruit trees. 13 14 529 PLOS Genetics. 2016; 12(12): e1006433. 15 530 14. Wang S, Zhang JB, Jiao WQ, Li J, Xun XG, Sun Y, et al. Scallop genome provides insights into 16 531 evolution of bilaterian karyotype and development. Nature Ecology & Evolution. 2017; 1: 0120. 17 18 532 15. English AC, Richards S, Han Y, Wang M, Vee V, Qu JX, et al. Mind the gap: upgrading genomes 19 533 with Pacific Biosciences RS long-read sequencing technology. PloS one. 2012; 7(11): e47768. 20 21 534 16. Chaisson MJ and Tesler G. Mapping single molecule sequencing reads using basic local 22 535 alignment with successive refinement (BLASR): application and theory. BMC bioinformatics. 23 536 2012; 13(1): 238. 24 25 537 17. Kelley RK, Wang G, and Venook AP. Biomarker use in colorectal cancer therapy. Journal of the 26 538 National Comprehensive Cancer Network. 2011; 9(11): 1293-302. 27 539 18. Li H and Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. 28 29 540 Bioinformatics. 2009; 25(14): 1754-60. 30 541 19. Zhao W, Lu LX, Yang PC, Cui N, Kang L, and Cui F. Organ-specific transcriptome response of 31 32 542 the small brown planthopper toward rice stripe virus. Insect Biochemistry and Molecular 33 543 Biology. 2016; 70: 60-72. 34 544 20. 
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, and Zdobnov EM. BUSCO: assessing 35 36 545 genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 37 546 2015; 31(19): 3210-12. 38 547 21. Bao WD, Kojima KK, and Kohany O. Repbase Update, a database of repetitive elements in 39 40 548 eukaryotic genomes. Mobile DNA. 2015; 6(1): 11. 41 42 549 22. Tarailo‐Graovac M and Chen NS. Using RepeatMasker to identify repetitive elements in 43 44 550 genomic sequences. Current Protocols in Bioinformatics. 2009: 4.10. 1-4.10. 14. 45 551 23. Xu Z and Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR 46 47 552 retrotransposons. Nucleic Acids Research. 2007; 35(suppl_2): W265-W68. 48 553 24. Edgar RC and Myers EW. PILER: identification and classification of genomic repeats. 49 554 Bioinformatics. 2005; 21(suppl_1): i152-i58. 50 51 555 25. Price AL, Jones NC, and Pevzner PA. De novo identification of repeat families in large genomes. 52 556 Bioinformatics. 2005; 21(suppl_1): i351-i58. 53 54 557 26. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids 55 558 Research. 1999; 27(2): 573. 56 559 27. Altschul SF, Gish W, Miller W, Myers EW, and Lipman DJ. Basic local alignment search tool. 57 58 560 Journal of Molecular Biology. 1990; 215(3): 403-10. 59 561 28. Birney E, Clamp M, and Durbin R. GeneWise and genomewise. Genome Research. 2004; 14(5): 60 61 24 62 63 64 65 562 988-95. 1 563 29. Keller O, Kollmar M, Stanke M, and Waack S. A novel hybrid gene prediction method 2 3 564 employing protein multiple sequence alignments. Bioinformatics. 2011; 27(6): 757-63. 4 565 30. Majoros WH, Pertea M, and Salzberg SL. TigrScan and GlimmerHMM: two open source ab 5 566 initio eukaryotic gene-finders. Bioinformatics. 2004; 20(16): 2878-79. 6 7 567 31. Bedell JA, Korf I, and Gish W. MaskerAid: a performance enhancement to RepeatMasker. 8 568 Bioinformatics. 2000; 16(11): 1040-41. 9 10 569 32. 
Blanco E and Abril JF. Computational gene annotation in new genome assemblies using GeneID. 11 570 Bioinformatics for DNA Sequence Analysis. 2009: 243-61. 12 571 33. Blanco E, Parra G, and Guigó R. Using geneid to identify genes. Current Protocols in 13 14 572 Bioinformatics. 2007: 4.3. 1-4.3. 28. 15 573 34. Burge C and Karlin S. Prediction of complete gene structures in human genomic DNA. Journal 16 574 of Molecular Biology. 1997; 268(1): 78-94. 17 18 575 35. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and 19 576 transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature 20 21 577 Protocols. 2012; 7(3): 562. 22 578 36. Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith Jr RK, Hannick LI, et al. Improving the 23 579 Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids 24 25 580 Research. 2003; 31(19): 5654-66. 26 581 37. Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, et al. Automated eukaryotic gene 27 582 structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. 28 29 583 Genome Biology. 2008; 9(1): R7. 30 584 38. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map 31 32 585 format and SAMtools. Bioinformatics. 2009; 25(16): 2078-79. 33 586 39. Kanehisa M and Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids 34 587 Research. 2000; 28(1): 27-30. 35 36 588 40. Pruitt KD, Tatusova T, and Maglott DR. NCBI reference sequences (RefSeq): a curated non- 37 589 redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Research. 38 590 2006; 35(suppl_1): D61-D65. 39 40 591 41. Consortium U. UniProt: a hub for protein information. Nucleic Acids Research. 2014: gku989. 41 592 42. Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, et al. InterPro: the 42 43 593 integrative protein signature database. Nucleic Acids Research. 
2008; 37(suppl_1): D211-D15. 44 594 43. Zdobnov EM and Apweiler R. InterProScan–an integration platform for the signature- 45 595 recognition methods in InterPro. Bioinformatics. 2001; 17(9): 847-48. 46 47 596 44. Li H, Coghlan A, Ruan J, Coin LJ, Heriche JK, Osmotherly L, et al. TreeFam: a curated database 48 597 of phylogenetic trees of gene families. Nucleic Acids Research. 2006; 34(suppl_1): 49 598 D572-D80. 50 51 599 45. Ruan J, Li H, Chen Z, Coghlan A, Coin LJ, Guo Y, et al. TreeFam: 2008 Update. Nucleic Acids 52 600 Research. 2008; 36(Database issue): D735-40. 53 54 601 46. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large 55 602 phylogenies. Bioinformatics. 2014; 30(9): 1312-13. 56 603 47. Sanderson MJ. r8s: inferring absolute rates of molecular evolution and divergence times in the 57 58 604 absence of a molecular clock. Bioinformatics. 2003; 19(2): 301-02. 59 605 48. Van De Wiel MA, Leday GG, Pardo L, Rue H, Van Der Vaart AW, and Van Wieringen WN. 60 61 25 62 63 64 65 606 Bayesian analysis of RNA sequencing data by estimating multiple shrinkage priors. Biostatistics. 1 607 2013; 14(1): 113-28. 2 3 608 49. De Bie T, Cristianini N, Demuth JP, and Hahn MW. CAFE: a computational tool for the study 4 609 of gene family evolution. Bioinformatics. 2006; 22(10): 1269-71. 5 610 50. ImmunoDB. EM Zdobnov Group. 2008. http://cegg.unige.ch/Insecta/immunodb. Accessed 29 6 7 611 May 2016. 8 612 51. Zhao W, Yang PC, Kang L, and Cui F. Different pathogenicities of Rice stripe virus from the 9 10 613 insect vector and from viruliferous plants. New Phytologist. 2016; 210(1): 196-207. 11 614 52. Wang L, Tang N, Gao XL, Guo DY, Chang ZX, Fu YT, et al. Understanding the immune system 12 615 architecture and transcriptome responses to southern rice black-streaked dwarf virus in 13 14 616 Sogatella furcifera. Scientific reports. 2016; 6. 15 617 53. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, and Salzberg SL. 
TopHat2: accurate 16 618 alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome 17 18 619 Biology. 2013; 14(4): R36. 19 620 54. Anders S, Pyl PT, and Huber W. HTSeq--a Python framework to work with high-throughput 20 21 621 sequencing data. Bioinformatics. 2015; 31(2): 166-69. 22 622 55. Zhang F, Li Q, Chen X, Huo Y, Guo H, Song Z, et al. Roles of the Laodelphax striatellus Down 23 623 syndrome cell adhesion molecule in Rice stripe virus infection of its insect vector. Insect 24 25 624 Molecular Biology. 2016; 25(4): 413-21. 26 625 56. Wenger JA, Cassone BJ, Legeai F, Johnston JS, Bansal R, Yates AD, et al. Whole genome 27 626 sequence of the soybean aphid, Aphis glycines. Insect Biochemistry and Molecular Biology. 28 29 627 2017. 30 628 57. Zhu JJ, Jiang F, Wang XH, Yang PC, Bao YY, Zhao W, et al. Supporting data for "Genome 31 32 629 sequence of the small brown planthopper Laodelphax striatellus". GigaScience Database. 2017. 33 630 http://dx.doi.org/10.5524/100361 34 35 631 36 37 38 632 Figure legends 39 40 41 633 Figure 1. Photograph of Laodelphax striatellus on a rice plant leaf. Scale bar, 1 mm. 42 43 634 Figure 2. Gene cluster analysis among 22 species. 1:1:1 and N:N:N 44 45 46 635 represents universal orthologs with single copy or multiple copy number, respectively. 47 48 49 636 Insect, Diptera, Hemiptera, Hymenoptera, Lepidoptera and Coleoptera stand for taxon- 50 51 52 637 specific orthologs, respectively. Other indicates orthlogs that do not belong to any 53 54 638 above-mentioned ortholog categories. SD indicates species-specifically duplicated 55 56 57 639 genes. ND indicates genes that cannot be classified into any other categories. The 58 59 60 640 location of Laodelphax striatellus was indicated by an arrow. 61 26 62 63 64 65 1 641 Figure 3. Venn diagram of functional annotation by four databases. NR, non- 2 3 642 redundant protein databases; KEGG, Kyoto Encyclopedia of Genes and Genomes. 4 5 6 643 Figure 4. 
Phylogenetic analysis of 22 arthropod species. The phylogenetic tree was constructed based on amino acid sequences of 277 single-copy orthologs among 22 arthropod species (Anopheles gambiae, Anoplophora glabripennis, Apis mellifera, Acyrthosiphon pisum, Bombyx mori, Bemisia tabaci, Cimex lectularius, Diaphorina citri, Drosophila melanogaster, Diuraphis noxia, Danaus plexippus, Daphnia pulex, Locusta migratoria, Laodelphax striatellus, Nilaparvata lugens, Nasonia vitripennis, Oncopeltus fasciatus, Pediculus humanus, Rhodnius prolixus, Sogatella furcifera, Tribolium castaneum, Zootermopsis nevadensis) using a maximum likelihood algorithm. The tree was rooted with D. pulex.

Table 1. Sequencing data used for genome assembly and annotation.

| Category | Accession | Life stage or tissue | Sample type | Insert size (bp) | Read length (bp) | Reads number |
|---|---|---|---|---|---|---|
| Survey | SRR5816389 | Adult | DNA | 230 | 2 x 125 | 127772669 |
| Assembly | SRR5830088 | Adult | DNA | 180 | 2 x 100 | 123459791 |
| | SRR5816388 | Adult | DNA | 250 | 2 x 125 | 137013558 |
| | SRR5816387 | Adult | DNA | 500 | 2 x 100 | 141587274 |
| | SRR5816386 | Adult | DNA | 500 | 2 x 125 | 30520480 |
| | SRR5816393 | Adult | DNA | 800 | 2 x 100 | 153498320 |
| | SRR5816392 | Adult | DNA | 1.4-1.6 K | 2 x 125 | 40251413 |
| | SRR5816391 | Adult | DNA | 2.6-2.8 K | 2 x 125 | 36559438 |
| | SRR5816390 | Adult | DNA | 5-5.6 K | 2 x 125 | 26684783 |
| | SRR5816385 | Adult | DNA | 5.6-6.5 K | 2 x 125 | 23069935 |
| | SRR5816384 | Adult | DNA | 9-11 K | 2 x 125 | 24285333 |
| | SRR5816377 | Adult | DNA | 11-13 K | 2 x 125 | 23396366 |
| | SRR5816376 | Adult | DNA | 13-15 K | 2 x 125 | 30547732 |
| | SRR5816379 | Adult | DNA | 15-18 K | 2 x 125 | 25926919 |
| | SRR5816378 | Adult | DNA | 18-24 K | 2 x 125 | 26325395 |
| | SRR5817574 | Adult | DNA | - | 8559 | 99701 |
| | SRR5817559 | Adult | DNA | - | 8947 | 77038 |
| | SRR5817582 | Adult | DNA | - | 8474 | 104288 |
| | SRR5817569 | Adult | DNA | - | 8518 | 114320 |
| | SRR5817560 | Adult | DNA | - | 9202 | 80599 |
| | SRR5817562 | Adult | DNA | - | 9211 | 100089 |
| | SRR5817573 | Adult | DNA | - | 8610 | 102997 |
| | SRR5817558 | Adult | DNA | - | 9007 | 86083 |
| | SRR5817581 | Adult | DNA | - | 8452 | 89374 |
| | SRR5817570 | Adult | DNA | - | 8419 | 101715 |
| | SRR5817550 | Adult | DNA | - | 9192 | 82657 |
| | SRR5817576 | Adult | DNA | - | 8597 | 105080 |
| | SRR5817553 | Adult | DNA | - | 8586 | 77467 |
| | SRR5817557 | Adult | DNA | - | 8821 | 75712 |
| | SRR5817567 | Adult | DNA | - | 8363 | 106634 |
| | SRR5817575 | Adult | DNA | - | 8620 | 105795 |
| | SRR5817552 | Adult | DNA | - | 8985 | 66096 |
| | SRR5817556 | Adult | DNA | - | 8573 | 83500 |
| | SRR5817568 | Adult | DNA | - | 8357 | 104295 |
| | SRR5817578 | Adult | DNA | - | 8528 | 108299 |
| | SRR5817565 | Adult | DNA | - | 8728 | 69694 |
| | SRR5817555 | Adult | DNA | - | 8480 | 86385 |
| | SRR5817571 | Adult | DNA | - | 8437 | 106314 |
| | SRR5817577 | Adult | DNA | - | 8686 | 106337 |
| | SRR5817566 | Adult | DNA | - | 8890 | 52889 |
| | SRR5817554 | Adult | DNA | - | 8648 | 85970 |
| | SRR5817572 | Adult | DNA | - | 8437 | 101258 |
| | SRR5817580 | Adult | DNA | - | 8490 | 104459 |
| | SRR5817563 | Adult | DNA | - | 8954 | 91218 |
| | SRR5817561 | Adult | DNA | - | 8724 | 84033 |
| | SRR5817579 | Adult | DNA | - | 8776 | 107138 |
| | SRR5817564 | Adult | DNA | - | 9054 | 68294 |
| | SRR5817551 | Adult | DNA | - | 8508 | 88776 |
| Annotation | SRR5816381 | Larva | RNA | 250-300 | 2 x 150 | 23733333 |
| | SRR5816380 | Adult | RNA | 250-300 | 2 x 150 | 24933333 |
| | SRR5816383 | Egg | RNA | 250-300 | 2 x 150 | 24633333 |
| | SRR5816382 | Fat body | RNA | 250-300 | 2 x 150 | 31300000 |
| | SRR5816375 | Brain | RNA | 250-300 | 2 x 150 | 40333333 |
| | SRR5816374 | Gonad | RNA | 250-300 | 2 x 150 | 33300000 |
| | SRR5816394 | Tentacle | RNA | 250-300 | 2 x 150 | 24966666 |

Note: The library labeled Survey in the Category column was used to estimate the genome size of Laodelphax striatellus. Libraries with insert sizes >1 Kb were mate-paired. For gene annotation, data from two previously sequenced tissues were also used: salivary gland (accession SRR1619428) and alimentary canal (accession SRR1617617).
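As a rough cross-check of Table 1, the sequencing yield of a library can be estimated from its read count and read length. The sketch below assumes that the Reads Number column counts read pairs for the paired-end (2 x N) libraries and single reads for the libraries with no insert size, where Read Length is taken to be a mean length. Neither assumption is stated in the table, and the function name `library_yield_bp` is illustrative only.

```python
def library_yield_bp(read_count: int, read_length: int, paired: bool = True) -> int:
    """Estimate the number of sequenced bases in one library.

    Assumption (not stated in Table 1): for paired-end libraries the
    read count is a count of pairs, so each entry contributes two reads.
    """
    reads_per_entry = 2 if paired else 1
    return read_count * reads_per_entry * read_length


# Two rows copied from Table 1 as examples.
short_insert = library_yield_bp(123459791, 100, paired=True)   # SRR5830088, 2 x 100
long_read = library_yield_bp(99701, 8559, paired=False)        # SRR5817574, no insert size

print(f"SRR5830088: {short_insert / 1e9:.2f} Gb")
print(f"SRR5817574: {long_read / 1e9:.2f} Gb")
```

Summing such per-library estimates over all rows gives only an approximate total, since it depends on how reads were counted and on whether the tabulated lengths are means.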
Table 2. Statistics comparison of genome assembly and annotation among three planthoppers.

| Category | Laodelphax striatellus, contig | Laodelphax striatellus, scaffold | Nilaparvata lugens^a, contig | Nilaparvata lugens^a, scaffold | Sogatella furcifera^b, contig | Sogatella furcifera^b, scaffold |
|---|---|---|---|---|---|---|
| Total size (Mb) | 530.2 | 541.0 | 993.8 | 1140.8 | 673.9 | 720.7 |
| Total number | 48574 | 38193 | 80046 | 46558 | 50020 | 20450 |
| Maximum length (Kb) | 1990 | 10350 | 230 | 2254 | 800 | 12789 |
| N50 length (Kb) | 118 | 1085 | 24 | 357 | 71 | 1185 |

| Category | Laodelphax striatellus | Nilaparvata lugens^a | Sogatella furcifera^b |
|---|---|---|---|
| GC content (%) | 34.5 | 34.6 | 31.6 |
| TE proportion (%) | 23.0 | 38.9 | 39.7 |
| BUSCO evaluation (%) | 92 | 81 | 92 |
| Gene number | 17736 | 27571 | 21254 |
| Average gene length (bp) | 14342 | 11216 | 12597 |
| Average CDS length (bp) | 1289 | 1135 | 1526 |
| Average exon per gene | 6 | 4 | 6 |
| Average exon length (bp) | 213 | 264 | 240 |
| Average intron length (bp) | 2587 | 3062 | 2064 |

Note: TE, transposable element; BUSCO, benchmarking universal single copy ortholog; CDS, coding sequence. Gene number means the number of protein-coding genes.
^a From the published Nilaparvata lugens genome [8].
^b From the published Sogatella furcifera genome [9].

Table 3. Comparison of transposable element (TE) contents of the three planthoppers.

| Class | L. striatellus, De novo + Repbase: length (bp) | L. striatellus, De novo + Repbase: % of genome | L. striatellus, TE proteins: length (bp) | L. striatellus, TE proteins: % of genome | L. striatellus, Combined TEs: length (bp) | L. striatellus, Combined TEs: % of genome | N. lugens, Combined TEs: length (bp) | N. lugens, Combined TEs: % of genome | S. furcifera, Combined TEs: length (bp) | S. furcifera, Combined TEs: % of genome |
|---|---|---|---|---|---|---|---|---|---|---|
| DNA | 24818676 | 4.59 | 2550902 | 0.47 | 26592872 | 4.92 | 162024958 | 14.20 | 126002323 | 17.33 |
| LINE | 24160245 | 4.47 | 4889094 | 0.90 | 27124925 | 5.01 | 182652892 | 16.00 | 69257982 | 9.52 |
| LTR | 7122249 | 1.32 | 0 | 0.00 | 7122249 | 1.32 | 168492299 | 14.80 | 31286552 | 4.30 |
| SINE | 22739683 | 4.20 | 743909 | 0.14 | 23044510 | 4.26 | 8272412 | 0.70 | 10730722 | 1.48 |
| Other | 0 | 0.00 | 0 | 0.00 | 0 | 0.00 | 41262 | 0.00 | 23167338 | 3.18 |
| Unknown | 27609625 | 5.10 | 0 | 0.00 | 27609625 | 5.10 | 21890733 | 1.90 | 28395639 | 3.90 |
| Total | 119645576 | 22.12 | 8177428 | 1.51 | 124360921 | 22.99 | 443765874 | 38.90 | 288840556 | 39.73 |

Note: De novo + Repbase refers to TEs identified by integrating the de novo and Repbase predictions. TE proteins refers to TEs identified by RepeatProteinMask. Combined TEs refers to the combination of the two results above. DNA, DNA transposon; LINE, long interspersed nuclear element; LTR, long terminal repeat; SINE, short interspersed nuclear element. Other means TEs that can be classified but do not belong to the given classes. Unknown means TEs that cannot be classified.

Additional file 1

Table S1. Base composition of the Laodelphax striatellus genome assembly.
Table S2. Summary of reads mapping to the genome assembly of Laodelphax striatellus.
Table S3. Transcript-based evaluation of the genome assembly of Laodelphax striatellus.
Table S4. Statistics of reads from nine transcriptomes mapped to different genomic regions.
Table S5. Genome completeness assessment using benchmarking universal single copy orthologs in five insects.
Table S6.
Repetitive elements predicted by different programs.
Table S7. Sources of genome data of 22 arthropod species.
Table S8. Gene models predicted by different methods.
Table S9. Statistical comparison of gene sets of Laodelphax striatellus and 9 other arthropod species.
Table S10. Expanded gene families in the three planthoppers.
Table S11. Chemoreception related genes in the three planthoppers.
Table S12. Detoxification related genes in the three planthoppers.
Table S13. Immune genes in the three planthoppers.
Table S14. Sources of gene annotation files for the three planthoppers.
Table S15. Shared Gene Ontology terms for differentially expressed genes in the three planthoppers responding to plant viruses.
Table S16. Commonly regulated genes with similar functions in the three planthoppers responding to plant viruses.
Table S17. Homologous genes in the three planthoppers responding to plant viruses.

Additional file 2

Figure S1. Laodelphax striatellus genome size estimation by flow cytometry and k-mer analyses. (A), (B) and (C) show fluorescence peaks for Drosophila melanogaster, Gallus gallus and L. striatellus, respectively. The genome sizes of D. melanogaster and G. gallus were 0.18 pg and 1.25 pg, respectively. The genome size of L. striatellus was calculated to be 0.60 pg. (D) illustrates the depth distribution of k-mers (k = 17).

Figure S2. Laodelphax striatellus chromosomes stained with Hoechst 33258. (A) haploid chromosomes. (B) diploid chromosomes.

Figure S3. Sequencing depth distribution. The x-axis shows sequencing depth and the y-axis shows the fraction of bases at a given sequencing depth.

Figure S4. Summary of gene structures of Laodelphax striatellus and eight other species used for gene annotation.

Figure S5. Benchmarking universal single copy orthologs (BUSCO) assessment of the Laodelphax striatellus gene set. The completeness of the gene set was assessed with two BUSCO Ver. 2 datasets (arthropoda and eukaryote). The recovered matches are classified as ‘complete’ if their lengths are within the expectation of the BUSCO profile match lengths. If these are found only once they are classified as ‘complete single’ and other ‘complete’ matches are classified as ‘complete duplicated’. The matches that are only partially recovered are classified as ‘fragmented’, and BUSCO groups for which there are no matches that pass the tests of orthology are classified as ‘missing’. For each species, the right bar shows the arthropoda results and the left bar shows the eukaryote results. Aga, Anopheles gambiae; Agl, Anoplophora glabripennis; Ame, Apis mellifera; Api, Acyrthosiphon pisum; Bmo, Bombyx mori; Bta, Bemisia tabaci; Cle, Cimex lectularius; Dci, Diaphorina citri; Dme, Drosophila melanogaster; Dno, Diuraphis noxia; Dpl, Danaus plexippus; Dpu, Daphnia pulex; Lmi, Locusta migratoria; Lst, Laodelphax striatellus; Nlu, Nilaparvata lugens; Nvi, Nasonia vitripennis; Ofa, Oncopeltus fasciatus; Phu, Pediculus humanus; Rpr, Rhodnius prolixus; Sfu, Sogatella furcifera; Tca, Tribolium castaneum; Zne, Zootermopsis nevadensis.

Figure S6. Determination of genomic heterozygosity. (A) Density distribution of heterozygous rates. (B) Frequency distribution of read coverage of both high and low heterozygosity. All heterozygosity rates were ranked; the top 20% was chosen as high heterozygosity (high_het in the legend) and the rest as low heterozygosity (low_het in the legend).

Figure S7. Divergence time estimation of 22 arthropod species. The number on each node stands for the divergence time from the present (million years ago, Mya) with 95% confidence interval values noted in brackets. Four calibration times were used in the estimation: D. pulex-D. melanogaster divergence (445~530 Mya), N. vitripennis-D. melanogaster divergence (279~306 Mya), A. gambiae-D. melanogaster divergence (235~269 Mya) and A. mellifera-N. vitripennis divergence (175~215 Mya). The location of L. striatellus is indicated by an arrow.

Figure S8. Gene family expansion and contraction in the three planthoppers. R. prolixus was used as the outgroup to construct the phylogenetic tree and infer expanded/contracted gene families by CAFE. A conditional P-value was calculated for each gene family, and families with P-value < 0.05 were considered as significantly expanded (green) or contracted (red).
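The flow cytometry estimates in the Figure S1 legend are given in picograms. A minimal sketch of converting them to megabases follows, using the commonly quoted approximation that 1 pg of double-stranded DNA corresponds to about 978 Mb. The constant and the function name `pg_to_mb` are assumptions introduced here for illustration and are not taken from the manuscript.

```python
# Approximate conversion factor between DNA mass and sequence length.
# This value is an assumption for illustration, not stated in the manuscript.
MB_PER_PG = 978.0


def pg_to_mb(mass_pg: float) -> float:
    """Convert a DNA content in picograms to an approximate size in megabases."""
    return mass_pg * MB_PER_PG


# DNA contents quoted in the Figure S1 legend.
for species, mass_pg in [
    ("D. melanogaster", 0.18),
    ("G. gallus", 1.25),
    ("L. striatellus", 0.60),
]:
    print(f"{species}: {mass_pg} pg is about {pg_to_mb(mass_pg):.0f} Mb")
```

Under this factor, 0.60 pg corresponds to roughly 587 Mb, which is in the same range as the 541 Mb scaffold-level assembly size listed in Table 2.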

[Figure 2 (image not reproduced): gene number (x-axis, 0 to 35000) for each of the 22 species, broken down by category: 1:1:1, N:N:N, Insect, Diptera, Hemiptera, Hymenoptera, Lepidoptera, Coleoptera, Other, SD, ND.]

[Figure 3 (image not reproduced): Venn diagram of genes annotated by Swiss-Prot, NR, InterPro and KEGG.]


Dear Editor,

Thank you very much for organizing the review of our manuscript entitled “Genome sequence of the small brown planthopper Laodelphax striatellus” (GIGA-D-17-00204) and for sending us the referees’ comments, which are very valuable to us. We have worked hard to respond to each of the reviewers’ comments and have revised the manuscript accordingly.

We hope that the new version of the paper can be accepted for publication in GigaScience.

Sincerely yours,

Feng Cui
Professor
Institute of Zoology
Chinese Academy of Sciences