Mangifera indica L. cv. Amrapali
1/25/2019

PAG XXII Workshop
Sequencing Complex Genome
13th Jan 2019

A Reference Assembly of the Highly Heterozygous Genome of Mango (Mangifera indica L. cv. Amrapali)
Nagendra K. Singh
ICAR-National Research Centre on Plant Biotechnology
Pusa Campus, New Delhi-110012

1. Economic Significance of Mango (Mangifera indica L.)

• Annual production value of > 20 billion USD

Acknowledgements

Funding Support: ICAR-NPTC; ICAR-Extra Mural
National Partners (PIs):
• Anju Bajpai, ICAR-CISH Lucknow
• S.K. Singh, ICAR-IARI, New Delhi
• K.V. Ravishankar, ICAR-IIHR Bengaluru
• Anil Rai, ICAR-IASRI, New Delhi
ICAR-NRCPB: Vandna Rai, Kishor Gaikwad, Amitha Sevanthi
RA/SRF: Ajay Mahato, Pawan Jaysawal, Sangeeta Singh, Nisha Singh
Service Provider: Nucleome Informatics India

1. Origin and Distribution of Mangifera species

• The origin of the genus Mangifera has been traced to the Damalgiri hills of Meghalaya by a 65-million-year-old fossil of a mango leaf (R. C. Mehrotra, Birbal Sahni Institute of Palaeobotany, Lucknow)
• Common mango (Mangifera indica L.) originated in India (Woodrow, 1904; Scott, 1992; Cole & Hawson, 1963; Mukherjee, 1971; Malo, 1985)
• The Mughals (1556-1605) had a 500-1000 ha orchard with 100,000 mango trees
• 1,000 varieties of mango contribute about 39% of the total fruit production in India

[Figure: Mangifera casturi, Kalimantan, Indonesia]

1. Domestication and Dispersal of Mango Cultivars

[Figure; source: Singh et al. 2016]

Outline

1. Introduction
2. Improved ‘Amrapali’ assembly with BioNano optical fingerprinting
3. Annotation of protein-coding genes
4. Annotation of repeat elements
5. Centromere, telomere and non-coding RNA genes
6. Segmental duplications and phylogeny
7. Re-sequencing, SNP discovery, association studies
8. Prospects

2. ‘Amrapali’ Genome Assembly

• Amrapali: 1st mango hybrid (Dashehari/Neelam), widely grown in India, dwarf, regular bearer
• Amphidiploid (2n = 40)
• Genome size = 439 Mb
• 1st mango draft assembly, PAG 2014 (based on 454 and Illumina data): assembly size 492.6 Mb
• Assembly size larger than the estimated genome size?

2. Additional PacBio Data for Reference Assembly

| S. No. | PacBio Chemistry | No. of Runs | No. of reads | Total Bases | Genome coverage (x) |
|---|---|---|---|---|---|
| 1 | P4C2 | 5 | 968,567 | 2,990,090,529 | 6.6 |
| 2 | P5C3 | 25 | 5,357,722 | 13,407,765,248 | 29.79 |
| 3 | P5C3 | 25 | 6,025,450 | 15,162,322,569 | 33.69 |
| 4 | P6C4 | 20 | 8,847,654 | 13,686,456,431 | 31.90 |
| Total | | 75 | 21,199,393 | 45,246,634,777 | 101.98 |

2. Genome size of ‘Amrapali’ by Flow Cytometry

[Figure: DNA histogram of mango and pea nuclei. X axis: Cell count; Y axis: Relative fluorescence intensity (DNA content)]

• The nuclear DNA content of the ‘Amrapali’ genome: 402 ± 10 Mb
• Internal standard: Pisum sativum (4300 Mb)
• Based on two independent experiments with 4 replicates each, using a FACS cell sorter (BD-LSR II), BD Biosciences
• Method used: Dolezel J, Greilhuber J, Suda J. (2007) Estimation of nuclear DNA content in plants using flow cytometry. Nature Protocols 2(9):2233-2245.

2. ‘Amrapali’ PacBio Reference Assembly

Customized ‘Falcon’ assembler:
• Self error corrected PacBio reads
• 500 bp overlap, 25% mismatch
• Polishing using ‘quiver’
• Scaffolding using Illumina mate pair
• Heterozygosity problem resolved by PacBio long reads

Amrapali Assembly Statistics:

| Parameters | Value |
|---|---|
| No. and size of contigs | 9,703 (400.9 Mb) |
| Longest contig | 995.9 Kb |
| No. and size of scaffolds | 4,312 (403.2 Mb) |
| N50 of scaffolds | 282.5 Kb |
| Longest scaffold | 2.0 Mb |
| Scaffolds > 10 Kbp | 3,161 (73.3%) |
| Unknown bases (Ns) | 2.1 Mb (0.5%) |
| GC content | 32% |

Anchoring of scaffolds in pseudomolecules:
• No. of markers (CASTA) mapped in contigs = 6,311/6,594 (95.7%)
• No. of markers included in pseudomolecules = 5,492/6,311 (87.0%)
• No. and size of anchored scaffolds = 1,222 (283.77 Mbp, 70.4%)
• Longest pseudomolecule: 23.3 Mb
• Shortest pseudomolecule: 9.5 Mb
• No. and size of floating scaffolds: 3,089 (119.53 Mbp, 29.6%)

Amrapali WGS assembly (pseudomolecules + scaffolds) deposited at DDBJ/EMBL/GenBank (LMWC02000000), December 2017

2. Amrapali Genome: NGS Data Summary

| S. No. | Sequencing technology | No. of reads | Base pairs (bp) | Genome coverage (X) |
|---|---|---|---|---|
| 1 | 454-FLX | 6,179,100 | 2,436,840,346 | 7.64 |
| 2 | Illumina (MiSeq) 2X250 | 16,723,094 | 4,180,773,500 | 9.29 |
| 3 | Illumina (MiSeq) 2X250 | 33,091,426 | 8,272,856,500 | 18.39 |
| 4 | Illumina (MiSeq) 2X300 | 37,016,200 | 11,104,860,000 | 24.68 |
| 5 | Illumina (HiSeq) 2 Kb Mate Pair | 78,067,384 | 7,806,738,400 | 17.35 |
| 6 | Illumina (HiSeq) 8 Kb Mate Pair | 71,029,872 | 7,102,987,200 | 15.78 |
| 7 | Illumina (HiSeq) Paired end (insert 250-350 bp) | 455,013,076 | 45,501,307,600 | 101.11 |
| 9 | PacBio, 55 runs | 12,351,739 | 31,560,178,346 | 70 |

2nd Draft of Amrapali genome (PAG 2016)
PacBio draft assembly: 323 Mb (73.08%)
NCBI Ac. No. LMWC01000000

2. Amrapali Improved Assembly: BioNano Data Summary

| S. No. | Parameter | Value |
|---|---|---|
| 1 | Number of molecules | 890,863 |
| 2 | Total length of molecules (Mb) | 64,376 (160x) |
| 3 | Length N50 (Kb) | 93.7 |
| 4 | Minimum length (Kb) | 20.1 |
| 5 | Maximum length (Kb) | 1,353.6 |

2. Improved Amrapali Assembly with BioNano Optical Fingerprinting
(Using high density map by Luo et al. FPS 2016)

| Parameter | Version 2.0 (PacBio+Illumina) | Version 3.0 (Version 2 + BioNano) |
|---|---|---|
| Total number of scaffolds | 9030 | 2314 |
| Total genome coverage (Mbp) | 403.18 | 407.95 |
| No. of scaffolds in merged pseudomolecules | 1,222 | 97 |
| Total size of pseudomolecules (Mbp) | 283.77 | 293.29 |
| Total number of unordered scaffolds | 3090 | 2217 |
| Total size of unordered scaffolds (Mbp) | 119.42 | 114.68 |
| Total number of mapped markers | 6145 (95.05%) | 6297 (97.5%) |
| No. of markers used in pseudomolecules | 5492 (84.94%) | 5845 (90.045%) |
| No. of scaffolds after pseudomolecules | 4312 | 2217 |
| Longest pseudomolecule | 23.31 Mb | 25.5 Mb |
| Smallest pseudomolecule | 9.5 Mb | 9.8 Mb |

*Luo Chun, Shu Bo, Yao Quangsheng, Wu Hongxia, Xu Wentian, Wang Songbiao (2016). Construction of a High-Density Genetic Map Based on Large-Scale Marker Development in Mango Using Specific-Locus Amplified Fragment Sequencing (SLAF-seq). Front. Plant Sci., 30 August 2016.

The BLASTn program was used with advanced parameters for mapping of markers.

2. Mango Genome 20 Pseudomolecules

[Figure]

2. Genome Coverage: (BUSCO) Analysis

| S. No. | Details | Mango genome V 2.0 (PacBio+Illumina) | Mango genome V 3.0 (PacBio+Illumina+BioNano) |
|---|---|---|---|
| 1 | Complete BUSCOs (C) | 1293 (89.8%) | 1296 (90.0%) |
| 2 | Complete and single-copy BUSCOs (S) | 1118 (77.6%) | 1114 (77.4%) |
| 3 | Complete and duplicated BUSCOs (D) | 175 (12.2%) | 182 (12.6%) |
| 4 | Fragmented BUSCOs (F) | 50 (3.5%) | 48 (3.3%) |
| 5 | Missing BUSCOs (M) | 97 (6.7%) | 96 (6.7%) |

Assembly completeness analysis

2. Genetic and Physical Maps Correspondence

[Figure]

Cont. 2. Completeness of the Amrapali Assembly: by Mapping 19 Different NCBI-SRA Transcriptome Reads

| S. No. | Type of RNA-Seq Data (SRA-NCBI) | Sequencing Technology | Submitter | Mapping % |
|---|---|---|---|---|
| 1 | mango F1 population | Illumina HiSeq 2500 | SSCRI, China | 89.71825678 |
| 2 | RNA seq Chausa | NextSeq 500 | ICAR-CISH, India | 97.90796949 |
| 3 | RNA seq Amrapali | NextSeq 500 | ICAR-CISH, India | 97.8609031 |
| | Mangifera indica 'TOMMY ATKINS' transcript reads from mixed organs | Illumina HiSeq 2000 | Indiana University | 98.42958719 |
| 4 | Mangifera indica 'TOMMY ATKINS' transcript reads from seed | Illumina HiSeq 2000 | Indiana University | 98.05686491 |
| 5 | Mangifera indica 'TOMMY ATKINS' transcript reads from mesocarp | Illumina HiSeq 2000 | Indiana University | 98.63497871 |
| 6 | Mangifera indica 'TOMMY ATKINS' transcript reads from leaf | Illumina HiSeq 2000 | Indiana University | 98.19120256 |
| 7 | Mangifera indica 'TOMMY ATKINS' transcript reads from flower | Illumina HiSeq 2000 | Indiana University | 98.04236459 |
| 8 | Mangifera indica 'TOMMY ATKINS' transcript reads from exocarp | Illumina HiSeq 2000 | Indiana University | 98.03759958 |
| 9 | Mangifera indica 'TURPENTINE' transcript reads | Illumina HiSeq 2000 | Indiana University | 98.28825616 |
| 10 | Mangifera indica 'THAI EVERBEARING' transcript reads | Illumina HiSeq 2000 | Indiana University | 98.32381005 |
| 11 | Mangifera indica 'NEELUM' transcript reads | Illumina HiSeq 2000 | Indiana University | 98.6131978 |
| 12 | Mangifera indica 'M. CASTURI "PURPLE"' transcript reads | Illumina HiSeq 2000 | Indiana University | 98.39724743 |
| 13 | Mangifera indica 'BURMA' transcript reads | Illumina HiSeq 2000 | Indiana University | 96.02066555 |
| 14 | Mangifera indica 'TOMMY ATKINS' transcript reads from seed coat | Illumina HiSeq 2000 | Indiana University | 97.75470678 |
| 15 | Mangifera indica 'AMIN ABRAHIMPUR' transcript reads | Illumina HiSeq 2000 | Indiana University | 98.53282503 |
| 16 | Amrapali leaf transcriptome sequence | Illumina MiSeq | ICAR-NRCPB, India | 97.91252088 |
| 17 | Plant sample from Mangifera indica | Illumina HiSeq 2000 | Universidad Nat Auto de Mexico | 97.84069045 |
| 18 | Complete mango transcriptome | Illumina HiSeq 2000 | University of Karachi, Pakistan | 98.74955947 |
| 19 | MANGO TRANSCRIPTOME | Illumina HiSeq 2000 | SSCRI, China | 97.99064843 |

• Mapped RNA-seq reads on to the Amrapali assembly: > 96%
• Confirms high coverage of the assembly at the transcript level

3. Annotation of Protein-coding Genes
(De novo, protein homology and transcript based)

| S. No. | Particular | Value |
|---|---|---|
| 1 | Number of gene models | 46,283 |
| 2 | Number of CDS | 46,395 |
| 3 | Number of exons | 227,996 |
| 4 | Number of introns | 181,601 |
| 5 | Total CDS length | 41.5 Mb |
| 6 | Total gene length | 107.5 Mb |
| 7 | Shortest gene/CDS | 150 bp |
| 8 | Longest gene | 89,243 bp |
| 9 | Longest CDS | 12,582 bp |
| 10 | Mean gene length | 2,324 bp |
| 11 | Mean CDS length | 896 bp |
| 12 | Genome covered by genes (%) | 26.7 |
| 13 | Genome covered by CDS (%) | 10.3 |

3. Protein coding genes: Manual Categorization

[Figure: bar chart, functional categorization of annotated genes in the mango genome*. Y axis: No. of genes (0 to 9,000); 21 bars ranging from 197 to 7,884 genes; category labels not recoverable.]

* Using Swissprot, GO, NCBI and AHRD

5. Telomere, Centromere and Non-coding RNA

Total 1686 non-coding RNA genes belonging to 11 major families

4.