
bioRxiv preprint doi: https://doi.org/10.1101/691923; this version posted July 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 Genome sequencing of Musa acuminata Dwarf Cavendish reveals a chromosome arm 2 duplication 3 4 Mareike Busche+, Boas Pucker+, Prisca Viehöver, Bernd Weisshaar, Ralf Stracke* 5 6 Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec), Bielefeld University, 7 Sequenz 1, 33615 Bielefeld, Germany 8 9 MB: [email protected], 0000-0002-2114-7613 10 BP: [email protected], 0000-0002-3321-7471 11 PV: [email protected], 0000-0003-3286-4121 12 BW: [email protected], 0000-0002-7635-3473 13 RS: [email protected], 0000-0002-9261-2279 14 15 + these authors contributed equally 16 * corresponding author 17 18 19 Abstract 20 Different Musa species, subspecies, and cultivars are currently investigated to reveal their 21 genomic diversity. Here, we compare the Musa acuminata cultivar Dwarf Cavendish against the 22 previously released Pahang assembly. Numerous small sequence variants were detected and 23 the ploidy of the cultivar presented here was determined as triploid based on sequence variant 24 frequencies. Illumina sequencing also revealed a duplication of a large segment of chromosome 25 2 in the genome of the cultivar studied. Comparison against previously sequenced cultivars 26 provided evidence that this duplication is unique to Dwarf Cavendish. Although no functional 27 relevance of this duplication was identified, this example shows the potential of plants to tolerate 28 such aneuploidies. 29 30 31 1 bioRxiv preprint doi: https://doi.org/10.1101/691923; this version posted July 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 32 Introduction 33 Bananas (Musa) are monocotyledonous perennial plants. The edible fruit (botanically a berry) 34 belongs to the most popular fruits in the world. In 2016, about 5.5 million hectares of land were 35 used for the production of more than 112 million tons of bananas [1]. The majority of bananas 36 were grown in Africa, Latin America, and Asia where they offer employment opportunities and 37 are important export commodities [1]. Furthermore, with an annual per capita consumption of 38 more than 200 kg in Rwanda and more than 100 kg in Angola, bananas provide food security in 39 developing countries [1,2]. While plantains or cooking bananas are commonly eaten as a staple 40 food in Africa and Latin America, the softer and sweeter dessert bananas are popular in Europe 41 and Northern America. Between 1998 and 2000, around 47 % of the world banana production 42 and the majority of the dessert banana production relied on the Cavendish subgroup of cultivars 43 [2]. 44 Even though only Cavendish bananas are traded internationally, numerous cultivars are used 45 for local consumption in Africa and South-east Asia. Bananas went through a long 46 domestication process which started at least 7,000 years ago [3]. The first step towards edible 47 bananas was interspecific hybridisation between subspecies from different regions, which 48 caused incorrect meiosis and diploid gametes [4]. The diversity of edible triploids resulted from 49 human selection and triploidization of Musa acuminata and Musa balbisiana cultivars [4]. 50 These exciting insights into the evolution of bananas were revealed by the analysis of genome 51 sequences. Technological advances boosted sequencing capacities and allowed the 52 (re-)sequencing of genomes from multiple subspecies and cultivars. M. acuminata can be 53 divided into several subspecies and cultivars. The first M. acuminata (DH Pahang) genome 54 sequence has been published in 2012 [5], many more genomes have been sequenced recently 55 including: banksii, burmannica, zebrina [6], malaccensis (SRR8989632, SRR6996493), Baxijiao 56 (SRR6996491, SRR6996491), Sucrier : Pisang_Mas (SRR6996492). Additionally, the genome 57 sequences of other Musa species, M. balbisiana [7], M. itinerans [8], and M. schizocarpa [9], 58 have already been published. 59 60 Here we report about our investigation of the genome of M. acuminata Dwarf Cavendish which 61 revealed multiple large scale deletions compared to the DH Pahang v2 reference assembly. 62 Moreover, we identified an increased copy number of the southern arm of chromosome 2, 63 indicating that this region was duplicated in one haplophase. 64 2 bioRxiv preprint doi: https://doi.org/10.1101/691923; this version posted July 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 65 Materials and methods 66 Plant material and DNA extraction. Musa acuminata Dwarf Cavendish tissue culture 67 seedlings were obtained from FUTURE EXOTICS/SolarTek (Düsseldorf, Germany) (Figure 1). 68 Plants were grown under natural daylight at 21°C. Genomic DNA was isolated from leaves 69 following the protocol of Dellaporta et al. [10]. 70 71 Figure 1: M. acuminata Dwarf Cavendish plant. 72 73 Library preparation and sequencing. Genomic DNA was subjected to sequencing library 74 preparation via the TrueSeq v2 protocol as previously described [11]. Paired-end sequencing 75 was performed on an Illumina HiSeq1500 and NextSeq500, respectively, resulting in 2x250 nt 76 and 2x154 nt read data sets. 77 78 Read mapping, variant calling, and variant annotation. All reads were mapped to the DH 79 Pahang v2 reference genome sequence via BWA-MEM v0.7 [12]. GATK v3.8 [13,14] was 80 deployed to identify small sequence variants based on this read mapping. The resulting variant 81 set was subjected to SnpEff [15] to assign predictions about the functional impact to all variants. 82 Variants with disruptive effects were selected using a customized Python script as described 83 earlier [11]. 84 The genome-wide distribution of small sequence variants was assessed for small nucleotide 85 variants (SNVs) and InDels based on previously developed scripts [16]. The length distribution 3 bioRxiv preprint doi: https://doi.org/10.1101/691923; this version posted July 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 86 of InDels inside coding sequences was compared to the length distribution of InDels outside 87 coding sequences using a customized Python script [11]. 88 89 De novo genome assembly. Trimmomatic v0.38 [17] was applied as previously described [11] 90 to remove low quality sequences and remaining adapter sequences. Different sets of trimmed 91 reads were subjected to SOAPdenovo2 for assembly using optimized parameters [18]. 92 Resulting assemblies were evaluated using previously described criteria [18] including BUSCO 93 v3 [19] and polished by removing potential contaminations and adapters as described before 94 [18]. The DH Pahang v2 assembly [5,20] was used in the contamination detection process to 95 distinguish between bona fide banana contigs and sequences of unknown origin. Contigs with 96 high sequence similarity to non-plant sequences were removed as previously described [18]. 97 Surviving contigs were sorted based on the DH Pahang v2 reference genome sequence and 98 joint into pseudochromosomes to facilitate downstream analyses. 99 100 Data availability. Sequencing datasets were submitted to EBI (ERR3412983, ERR3412984, 101 ERR3413471, ERR3413472, ERR3413473, ERR3413474). Python scripts are freely available 102 on github (https://github.com/bpucker/banana). 103 104 105 Results and discussion 106 Structural variants. The probably most remarkable difference between the Dwarf Cavendish 107 and Pahang genome sequence is the amplification of the southern region of chromosome 2 108 (Figure 2). Such a duplication was not observed in any of the other available sequencing data 109 sets (File S1, File S2). There are multiple deletions and insertions which are not randomly 110 distributed over the genome. An increased average coverage in the south of chromosome 2 111 indicates that a substantial fraction of this chromosome has been duplicated. Apparently, read 112 mapping indicates at least four large scale deletions in Dwarf Cavendish compared to Pahang 113 v2 on chromosomes 2, 4, 5 and 7 (Figure 2). However, analysis of the underlying sequence 114 revealed long stretches of ambiguous bases as the cause for these low coverage regions. 115 116 4 bioRxiv preprint doi: https://doi.org/10.1101/691923; this version posted July 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 117 118 Figure 2: Coverage distribution. Chromosomes are ordered by increasing number with the 119 north end on the left hand side. Mapping of M. acuminata Dwarf Cavendish reads against the 120 DH Pahang v2 reference sequence revealed a tetraploid region in the southern part of 121 chromosome 2 in Dwarf Cavendish. Apparent large scale deletions are technical artifacts 122 caused by large stretches of ambiguous bases in the Pahang assembly that cannot be covered 123 by reads. 124 125 Ploidy of M. acuminata Dwarf Cavendish. Based on the coverage of small sequence variants, 126 the ploidy of Dwarf Cavendish was identified as triploid (Figure 3). Many heterozygous variants 127 display a frequency of the reference allele close to 0.33 or close to 0.66.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages10 Page
-
File Size-