bioRxiv preprint doi: https://doi.org/10.1101/2021.02.17.431348; this version posted February 18, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Chromosome-level genome assembly of the horned-gall aphid, Schlechtendalia chinensis (Hemiptera: Aphididae: Eriosomatinae)
Hong-Yuan Wei1,#, Yu-Xian Ye2,#, Hai-Jian Huang4, Ming-Shun Chen3, Zi-Xiang Yang1*, Xiao-Ming Chen1*, Chuan-Xi Zhang2,4*
1Research Institute of Resource Insects, Chinese Academy of Forestry, Kunming, China
2Institute of Insect Sciences, Zhejiang University, Hangzhou, China
3Department of Entomology, Kansas State University, Manhattan, KS, USA
4State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products; Key Laboratory of Biotechnology in Plant Protection of MOA of China and Zhejiang Province, Institute of Plant Virology, Ningbo University, Ningbo, China
#Contributed equally.
*Correspondence
Zi-Xiang Yang, Research Institute of Resource Insects, Chinese Academy of Forestry, Kunming, China. E-mail: [email protected]
Xiao-Ming Chen, Research Institute of Resource Insects, Chinese Academy of Forestry, Kunming, China. E-mail: [email protected]
Chuan-Xi Zhang, Institute of Insect Sciences, Zhejiang University, Hangzhou, China. E-mail: [email protected]
Funding information
National Natural Science Foundation of China, Grant/Award number: 31872305, U1402263; The basic research program of Yunnan Province, Grant/Award number: 202001AT070016; The grant for Innovative Team of ‘Insect Molecular Ecology and Evolution’ of Yunnan Province
Abstract
The horned gall aphid, Schlechtendalia chinensis, is an economically important insect that induces galls valuable to the medicinal and chemical industries. S. chinensis manipulates its host plant to form well-organized horned galls during feeding. So far, more than twenty aphid genomes have been reported; however, all of them are derived from free-living aphids. Here we generated a high-quality genome assembly of S. chinensis, representing the first genome sequence of a galling
aphid. The final genome assembly was 280.43 Mb, with 97% of the assembled sequences anchored into thirteen chromosomes. S. chinensis has the smallest genome among the aphid genomes available to date. The contig and scaffold N50 values were 3.39 Mb and 20.58 Mb, respectively. The assembly included 96.4% of conserved arthropod and 97.8% of conserved Hemiptera single-copy orthologous genes based on BUSCO analysis. A total of 13,437 protein-coding genes were predicted. Phylogenomic analysis showed that S. chinensis formed a single clade between the Eriosoma lanigerum clade and the Aphidini+Macrosiphini aphid clades. In addition, salivary proteins were found to be differentially expressed when S. chinensis underwent host alternation, indicating their potential roles in gall formation and plant defense suppression. A total of 36 cytochrome P450 genes were identified in S. chinensis, considerably fewer than in other aphids, probably due to its narrow host plant range. The high-quality S. chinensis genome assembly and annotation provide an essential genetic background for future studies to reveal the mechanism of gall formation and to explore the interaction between aphids and their host plants.
Key words
Schlechtendalia chinensis; PacBio sequencing; Hi-C analysis; Chromosome-level genome assembly; Comparative genomics; Gall formation; Host alternation
1 Introduction
Many aphid species are economically important plant pests that feed on plant sap and also cause damage by transmitting plant viruses. Around 100 aphid species have been identified as significant agricultural pests among the approximately 5,000 known species (Blackman & Eastop, 2017). Up to this point, studies on aphid genomes have mostly focused on the subfamily Aphidinae (International Aphid Genomics Consortium, 2010; Li et al., 2019; Mathers, 2020; Mathers et al., 2017; Mathers, Mugford, et al., 2020; Mathers, Wouters, et al., 2020; Nicholson et al., 2015; Thorpe et al., 2018; Wenger et al., 2016). Genome sequencing efforts for other subfamilies, such as Eriosomatinae, which is distantly related to Aphidinae, are however very limited (Julca et al., 2020; Biallo et al., 2020). Unlike most free-living aphids, the galling species from Eriosomatinae induce the formation of galls on their primary host plant and live in the interior. Galling aphids may serve as good models for the study of unique ecological and behavioral phenomena in insect-plant interaction and
coevolution (Moran, 1989; Wool, 2004). The horned gall aphid, Schlechtendalia chinensis (Hemiptera: Aphididae: Eriosomatinae: Fordini), is one of the most economically valuable insects, since it induces the formation of gallnuts that have been used for medicinal purposes and in chemical industries. The components of gallnuts are useful or essential ingredients in producing inks, wine, food, cosmetic antioxidants, and animal feed. These wide uses are due to their high tannin levels, which range from 50% to 70% (Zhang, Tang, & Cheng, 2008). The annual yield of gallnuts in China is 8,000-10,000 tons, accounting for >90% of the total yield worldwide (Zhang, Tang, & Cheng, 2008). S. chinensis has a peculiar, complex life cycle involving both sexual and asexual reproduction stages, with host alternation between the Chinese sumac (Rhus chinensis, Anacardiaceae) and mosses (Plagiomnium spp., Mniaceae). In this holocyclic life cycle, a fundatrix produced by a mated female crawls along the trunk and feeds on a new leaf, where it induces the formation of a horned gall. The fundatrix then continues to feed inside the gall and produces fundatrigeniae via parthenogenesis. From here on, the gall grows quickly and forms a highly organized structure. During gall development, drastic morphological rearrangements occur in leaf tissues. For example, the palisade tissues of a gall are reorganized and replaced by parenchyma cells (Liu et al., 2014). In autumn, galls mature and burst open, and alate autumn migrants fly to nearby mosses and produce nymphs for overwintering. In the following spring, nymphs develop in the moss into spring migrants that fly back to the primary host, R. chinensis. Here they produce female and male offspring (sexuales) with degenerated mouthparts that allow them to attach themselves to the trunk crevices.
After mating, each female produces only one fundatrix, starting the next life cycle all over again (Figure 1) (Zhang, Qiao, Zhong & Zhang, 1999; Blackman & Eastop, 2000). Taken together, the life cycle of S. chinensis comprises a series of all-female parthenogenetic generations and only a single sexual generation. This unusual life cycle, with morphologically distinct aphids at different stages, is likely driven by adaptation to different environmental conditions. Unlike most free-living aphids from the Aphidinae taxon, galling aphids have many distinct biological characteristics. The most striking characteristic is that the feeding of most galling aphids does not seriously affect the health of their host plants. In fact, the galls of S. chinensis even provide some benefits to its primary host plant (Chen et al., 2020).
The complexity both in its developmental process and in the structure of its induced galls implies that S. chinensis may have unique gene sets that regulate its development and manipulate its
host plants (Takeda et al., 2019; Hirano et al., 2020). The underlying molecular mechanisms for its complex life cycle and its ability to parasitize host plants via apparently harmless galls remain largely unknown. Here we generated a high-quality chromosome-level genome assembly of S. chinensis, representing the first genome sequence of an aphid that induces the formation of closed galls, using a combination of Illumina, Pacific Biosciences (PacBio) and Hi-C technologies. Subsequently, gene prediction, functional annotation and phylogenetic analysis were carried out to determine the relationship between S. chinensis and other aphid species. This reference genome sequence provides information about genomic organization in the Eriosomatinae subfamily and allows comparative genomic studies for a better understanding of the genes associated with its developmental regulation as well as its ability to induce gall formation, live on alternate host plant species, and suppress plant defense during parasitism.
2 Materials and Methods
2.1 Sample collection
S. chinensis samples were collected in October 2019 from fresh mature galls on R. chinensis in Wufeng county (30°10′ N, 110°52′ E, 960 m above sea level), Hubei Province, China. Winged and wingless fundatrigeniae in a gall were treated as the same clone. Fundatrigeniae contained within a gall were poured onto a petri dish after dissecting the gall. Impurities such as waxes were removed manually. The fundatrigeniae were then transferred to Eppendorf tubes. Samples were separated into groups, with each group containing about 20 individuals. The samples were immediately frozen in liquid nitrogen for 2 hours and then stored at -80 °C for later genome, RNA-seq and Hi-C analyses.
In addition to the original samples, a colony was established through artificial cultivation for other genetic studies. To establish the colony, autumn migrants of S. chinensis were collected from mature galls and transferred to a nursery of the moss Plagiomnium maximoviczii, which was maintained in a greenhouse. In the following year, spring migrants (sexuparae) were collected from the mosses and cultivated in the laboratory. Aphids were transferred to host trees after fundatrix emergence for gall induction. Aphid samples of different stages, including fundatrix, fundatrigeniae, autumn migrants, overwintering nymphs, and spring migrants, as well as male and female sexuales, were collected separately. All aphid samples were immediately frozen in liquid nitrogen for two hours and then stored at -80 °C until further analysis.
2.2 Genomic sequencing
Genomic DNA was isolated from aphid samples using a DNeasy Blood & Tissue Extraction Kit (Qiagen Inc., Valencia, CA, USA) following the manufacturer's instructions. After quality and quantity were measured, the DNA samples were used to construct a paired-end sequencing library (150 bp in length). The library was sequenced on the Illumina NovaSeq 6000 platform. In addition, a 20 kb library was constructed and sequenced on the PacBio RSII platform at Annoroad Gene Technology Co., Ltd. (Beijing, China).
Hi-C libraries were also constructed from aphid samples according to the Proximo Hi-C procedure, and their quality and concentration were determined using an Agilent 2100 Bioanalyzer and qPCR. Briefly, aphid samples were mixed with 1% formaldehyde for 10 min at room temperature, and the nuclei were extracted and permeabilized. After the quality of the libraries was ascertained, the libraries were pooled to achieve the required concentrations for Illumina sequencing (Rao et al., 2014).
RNA samples were extracted separately from the fundatrix, fundatrigeniae, autumn migrants, nymphs, spring migrants (sexuparae), and male and female sexuales, following the standard protocols provided by the manufacturer. RNA quantity, purity and integrity were determined using a NanoPhotometer and the Agilent 2100 Bioanalyzer. cDNA libraries were built following the strand-specific method. The libraries were initially quantified with a Qubit 2.0 fluorometer and diluted to 1.5 ng/µL. The libraries were pooled according to the requirements of effective concentration and target data volume for Illumina sequencing.
2.3 Genome assembly
We selected a k-mer length of 19 bases and used Illumina paired-end reads for k-mer analysis to estimate the genome size and generate the k-mer distribution using Jellyfish and GenomeScope (Ranallo-Benavidez, Jaron, & Schatz, 2020). We assembled the genome from PacBio long reads using Wtdbg2. Base errors in the resulting sequences, which derive from the noisy long reads, were corrected using NextPolish (Hu, Fang, Su, & Liu, 2019). Pilon was then used to improve the draft assembly with Illumina reads (Walker et al., 2014). Finally, HaploMerger2 was used to remove heterozygous sequences (Huang, Kang, & Xu, 2017).
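The k-mer based size estimate described above can be illustrated with a minimal sketch. This is not the Jellyfish or GenomeScope implementation: it ignores reverse complements and heterozygosity, and the function names and the `min_depth` error cutoff are assumptions made for illustration only. It shows the basic idea that genome size is approximately the total number of k-mer occurrences divided by the depth at the main peak of the k-mer histogram.

```python
from collections import Counter


def count_kmers(reads, k=19):
    """Count every k-mer of length k across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_histogram(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Naive estimate: total k-mer occurrences divided by the peak depth.

    Multiplicities below min_depth are treated as sequencing errors,
    which is an assumption of this sketch, not a value from the paper.
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth
```

In practice the histogram would come from a dedicated k-mer counter run on the full read set; the pure-Python counter here is only suitable for toy inputs.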
Haploid contigs were scaffolded using the 3D de novo assembly (3D-DNA) software (Dudchenko et al., 2017). Hi-C sequences were aligned with the draft genome assembly using
Juicer. An initial assembly was generated via a 3D-DNA analysis. The initial assembly was validated using Juicebox Assembly Tools (JBAT), resulting in the final chromosome-level genome assembly.
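Assembly contiguity is summarized in this paper by contig and scaffold N50 values. As a minimal sketch of how that statistic is defined (the function name is chosen here for illustration and does not come from any of the tools above), N50 is the length L such that sequences of length at least L together cover half or more of the total assembly length:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half
        # of the total assembly size is the N50.
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")
```

Applied to contig lengths before scaffolding and to scaffold lengths after Hi-C anchoring, the same function yields the two kinds of N50 value reported for an assembly.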
2.4 Chromosome staining
Salivary glands were dissected from fundatrigeniae of S. chinensis in phosphate-buffered saline (PBS) solution (137 mM NaCl, 2.68 mM KCl, 8.1 mM Na2HPO4 and 1.47 mM KH2PO4 at pH 7.4) under a stereomicroscope (COIC, Chongqing, China). The samples were fixed in 4% paraformaldehyde (Sangon Biotech, Shanghai, China) for 1 h and washed with PBS three times. The samples were then triturated mechanically, and 20 ml DAPI solution (Abcam, Cambridge, USA) was added to stain the chromosomes. Fluorescence images were examined and photographed under a Leica confocal laser-scanning microscope (Leica, Heidelberg, Germany).
2.5 Gene annotation
Repetitive sequences were identified using homology-based and de novo approaches. For the homology-based approach, RepeatMasker was used to screen the S. chinensis genome against the Hemiptera repeat database, with the command "RepeatMasker -pa 4 -e ncbi -species Hemiptera ch -dir". For the de novo approach, RepeatModeler was used to identify repetitive sequences (Flynn et al., 2020). The statistical results of the RepeatMasker and RepeatModeler analyses were combined.
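The text does not specify how the two sets of repeat statistics were combined. One common way to avoid counting the same bases twice, shown here purely as an assumed illustration, is to take the union of the annotated intervals before computing the repeat fraction. The sketch handles a single sequence with half-open (start, end) coordinates; the function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def combined_repeat_fraction(homology_hits, de_novo_hits, sequence_length):
    """Fraction of one sequence covered by either annotation source."""
    union = merge_intervals(list(homology_hits) + list(de_novo_hits))
    covered = sum(end - start for start, end in union)
    return covered / sequence_length
```

For a whole genome, the same merge would be applied per chromosome or scaffold and the covered lengths summed before dividing by the assembly size.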
Gene structure predictions were performed using homology-based, de novo, and RNA-seq-assisted methods. For the homology-based method, gene structures were predicted based on homology to those from Eriosoma lanigerum, Acyrthosiphon pisum, Myzus persicae, Aphis glycines, and Rhopalosiphum maidis (Birney, Michele, & Durbin, 2004). For de novo prediction, Augustus was first trained, RNA-seq data were then used to generate hint files, and Augustus was finally run to perform gene prediction (Stanke et al., 2006; Blanco, Parra & Guigó, 2007). In addition, GeneMark-ET, BRAKER, Maker and SNAP were applied separately to generate ab initio gene predictions. For the RNA-seq-assisted method, RNA-seq data were aligned to the S. chinensis genome using PASA. Finally, gene prediction results were integrated using EVidenceModeler (Haas et al., 2008). To annotate genes in the S. chinensis genome, we aligned them to seven functional databases. The databases used in the study were NCBI Non-Redundant Protein Sequence (Nr), Non-Redundant Nucleotide Sequence Database (Nt), SwissProt, Cluster of Orthologous Groups for eukaryotic complete genomes (KOG), The Integrated Resource of Protein Domains and Functional
Sites (InterPro), Conserved Domain Database (CDD), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes Orthology database (KEGG) and evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG). A local Blast2GO database was built first, and GO annotation was then performed with Blast2GO. The KEGG Automatic Annotation Server (KAAS) was used to annotate the S. chinensis genome sequence quickly, with the bi-directional best hit (BBH) method selected.
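The bi-directional best hit (BBH) criterion mentioned above can be sketched in a few lines. This is a conceptual illustration only and does not reproduce KAAS's actual scoring or thresholds; the function names and the input format (a dictionary of (query, subject) pairs mapped to bit scores) are assumptions. A gene and a reference entry form a BBH pair when each is the other's top-scoring hit.

```python
def best_hits(scores):
    """Return query -> best-scoring subject from {(query, subject): score}."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(forward, reverse):
    """Pairs (gene, reference) that are each other's best hit.

    forward: gene -> reference search scores
    reverse: reference -> gene search scores
    """
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((g, r) for g, r in fwd.items() if rev.get(r) == g)
```

With hypothetical identifiers, a gene whose best reference hit prefers a different gene in the reverse search is excluded, which is what makes BBH stricter than a one-directional best-hit assignment.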
2.6 Non-coding RNA identification
Transfer RNAs (tRNAs) were identified using the tRNAscan-SE program (with default parameters; the same applies below unless otherwise noted). RNAmmer was used to identify 5S/8S, 16S/18S and 23S/28S rRNAs. Ribosomal RNAs (rRNAs), microRNAs (miRNAs) and small nuclear RNAs (snRNAs) were identified based on the Rfam database (12.2) (Kalvari et al., 2017).
2.7 Phylogenetic analysis
We constructed phylogenetic trees using whole-genome sequences of S. chinensis and of ten other aphid species: Cinara cedri, E. lanigerum, Ap. glycines, Ap. gossypii, R. maidis, R. padi, Ac. pisum, Diuraphis noxia, M. cerasi and M. persicae. The aphid genome sequences and gene structure annotation files were downloaded from the NCBI genome database; genes containing mRNA information were retained and the tail CDS was modified. The protein and CDS sequences of all genes were then obtained. The protein sequences were filtered and analyzed by all-vs-all BLAST. Homologous genes were clustered using OrthoMCL (Li, Stoeckert & Roos, 2003). The coding sequences of homologous single-copy genes were extracted. Multiple sequence alignments of the homologous single-copy genes were performed using MAFFT (Katoh, Misawa, Kuma, & Miyata, 2002; Katoh & Standley, 2013). Protein sequence alignments were converted to codon sequence alignments, and conserved blocks were extracted from the alignments using Gblocks (Castresana, 2000). The extracted codon sequence alignments of all homologous single-copy genes were concatenated to build phylogenetic trees using RAxML (Stamatakis, 2014). The branch lengths of homologous genes were analyzed using PAML and compared with the standard tree to eliminate abnormal genes. The tree was then rebuilt using RAxML. By providing the root number and the multiple sequence alignment results with calibration point information, the species divergence times were calculated using mcmctree (part of the PAML software), and the divergence times within the evolutionary tree were obtained with 95% confidence intervals (Yang, 2007). The nodal dates of Ac. pisum and M. cerasi were 42.4-48.0 MYA, and
those of Ac. pisum and Ap. gossypii were 34-60 MYA.
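One step in the pipeline above, converting a protein alignment to a codon alignment, can be illustrated with a minimal sketch. The paper does not name the tool used for this step, so the function below is a hypothetical stand-in: it threads an unaligned coding sequence (with the stop codon already removed) onto its aligned protein, emitting one codon per residue and three gap characters per alignment gap.

```python
def protein_to_codon_alignment(aligned_protein, cds):
    """Back-translate one aligned protein row into a codon-aligned row.

    aligned_protein: protein sequence with '-' for alignment gaps
    cds: the unaligned coding sequence for the same gene, without stop codon
    """
    n_residues = sum(1 for residue in aligned_protein if residue != "-")
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length must be 3x the number of residues")
    codons = []
    position = 0
    for residue in aligned_protein:
        if residue == "-":
            codons.append("---")  # a protein gap spans a whole codon
        else:
            codons.append(cds[position:position + 3])
            position += 3
    return "".join(codons)
```

Because gaps are inserted in whole-codon units, the reading frame is preserved in every row, which is what allows conserved blocks to be trimmed and the alignments to be concatenated for codon-based tree building.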
2.8 Gene family expansion and contraction
We used CAFE to analyze gene family expansion and contraction by comparing our data with
those from 10 other aphids (C. cedri, E. lanigerum, Ap. glycines, Ap. gossypii, R. maidis, R. padi, Ac. pisum, D. noxia, M. cerasi and M. persicae). Briefly, the quantitative information of gene