Supplementary Information Appendix 2 3 1

Supplementary Information Appendix 2 3 1

1 Supplementary Information Appendix 2 3 1. Plant material, sequencing, assembly and map integration of the Nicotiana attenuata and N. 4 obtusifolia ge no mes .......................................................................................................................... 2 5 1.1 Plant material and DNA preparation ......................................................................................... 2 6 1.2 Raw data processing................................................................................................................. 2 7 1.3 Genome assembly pipeline. ....................................................................................................... 3 8 1.4 Generating the optical map ....................................................................................................... 4 9 1.5 Anchor scaffolds to pseudomolecules ......................................................................................... 5 10 1.6 Quality assessment and validation of assembly ........................................................................... 6 11 2. Genome annotation ...................................................................................................................... 7 12 2.1 Annotation of repeats and LTR insertion time estimation in Nicotiana ......................................... 7 13 2.2 Analyzing DTT-NIC1 insertions in Nicotiana genomes................................................................ 8 14 2.3 Annotation of miRNA, tRNA and rRNA ...................................................................................... 9 15 2.4 Annotation of protein coding genes.......................................................................................... 10 16 2.5 RNA sequencing and data analysis .......................................................................................... 12 17 2.6 Expression of transposable elements........................................................................................ 14 18 2.7 Functional annotation of protein coding genes ......................................................................... 14 19 2.8 Functional annotation of protein domains ................................................................................ 15 20 2.9 Annotation of transcription factor and protein kinase genes ...................................................... 16 21 3. Evolutionary genomics analyses ................................................................................................. 16 22 3.1 Homologous group identification ............................................................................................ 16 23 3.2 Detection of gene duplication events in each HG ...................................................................... 16 24 3.3 Confirmation of the whole-genome triplication in Nicotiana...................................................... 18 25 3.4 Estimation of species divergence times..................................................................................... 20 26 3.5 Identification of lineage-specific gene family expansion ............................................................ 21 27 4. Evolution of nicotine biosynthesis .............................................................................................. 22 28 4.1 Identification and reconstruction of the evolution history of nicotine biosynthesis genes ............. 22 29 4.2 Validation of the promoter sequences of nicotine biosynthesis genes and their ancestral copies... 23 30 4.3 Prediction of TE-derived putative miRNA target sites into regulatory region of nicotine 31 biosynthesis genes........................................................................................................................ 23 32 4.4 Evolution of root-specific expression of nicotine biosynthesis genes........................................... 24 33 4.5 Data access............................................................................................................................ 25 34 5. Supplemental Tables .................................................................................................................. 27 35 6. Supplemental Figures................................................................................................................. 36 36 7. Captions for supplementary datasets S1 to S9 ............................................................................ 72 37 8. References .................................................................................................................................. 74 38 1 39 1. Plant material, sequencing, assembly and map integration of the Nicotiana attenuata and 40 N. obtusifolia genomes 41 1.1 Plant material and DNA preparation 42 Plants were grown as previously described (1). The genomic DNA sequenced by 454 and 43 Illumina HiSeq2000 technologies was isolated from late rosette-stage plants using the CTAB-method(2). 44 The two N. attenuata DNA plants used for this extraction were from a 30th generation inbred line, 45 referred to as the UT accession, which originated from a 1996 collection from a native population in 46 Washington County, Utah, USA (1). N. obtusifolia DNA was obtained from a single plant of the first 47 inbred generation derived from seeds collected from a native population in 2004 at the Lytle Ranch 48 Preserve, Saint George, Utah, USA. High molecular weight genomic DNA used to generate the optical 49 map of N. attenuata was isolated using a nuclei based protocol (3) from approximately one hundred 50 freshly harvested N. attenuata plants (harvested 29 days post germination) of the same inbred generation 51 and origin as used for the short-read sequencing. 522. To reduce the potential effects of secondary metabolites on single molecular sequencing, the plant 53 material used for PacBio sequencing was from a cross of two isogenic N. attenuata transgenic lines 54 (mother: ir-aoc, line A-07-457-1, which was transformed with pRESC5AOC [GenBank KX011463] and 55 is impaired in JA biosynthesis; father: irGGPPS, line A-07-230-5, which was transformed with 56 pRESC5GGPPS [GenBank KX011462] and is impaired in the synthesis of the abundant 17- 57 hydroxygeranyllinalool diterpene glycosides), both generated from the 22nd inbred generation of the 58 same origin as the N. attenuata plants described above (1). Genomic DNA was isolated from 59 approximately one hundred young plants (harvested 29 days post germination) by Amplicon Express 60 (http://ampliconexpress.com) according to a proprietary protocol. 61 1.2 Raw data processing 62 We filtered and trimmed the Illumina HiSeq2000 whole-genome shotgun raw sequences of N. 63 attenuata and N. obtusifolia to obtain high-quality non-duplicated read pairs prior to genome assembly. 2 64 This was achieved by a custom script that extracted only the largest reads, longer than 32 bp, and which 65 contained no bases with a Phred quality score lower than 11. We compared the first 32 bp of each read in 66 a pair to those of other read pairs and retained only one pair if the same sequence was found more than 67 once (deduplication). Roche/454 long reads for N. attenuata were processed by the sffToCA script 68 provided by the Celera Assembler v7 pipeline (CA7). Table S3 provides details of the filtered data used 69 for the genome assemblies. 70 1.3 Genome assembly pipeline. 71 The overall assembly workflow for N. attenuata is shown as SI Appendix, Fig. S3. All paired-end 72 reads from the sequenced libraries were assembled using the Celera Assembler (CA7) with a minimum 73 read length cut-off at 64bp. Preliminary tests showed that single end or short reads did not improve the 74 assemblies, but increased calculation time significantly. In total, 86.4 Gb short read data were assembled. 75 The expected genome size of N. attenuata is of 2.54 Gb based on coverage of the “larger than N50 length 76 unitigs”, similar to the estimation (1C=2.5pg) from the flow cytometry analysis. We used the SSPACE v2 77 scaffolder to further improve scaffolding using the mate-pair data and filled gaps using GapCloser v1.12 78 (4). After manual inspections, we found that certain neighboring contigs in the scaffolds still contained 79 overlaps, which might be due to the assembly process from CA7 that places copies of repeat sequence at 80 the end of contigs or due to issues in Gapcloser v1.12 that leave open some closable gaps. To close these 81 gaps between overlapped contigs, we compared neighboring contigs using BLASTN (min. identity 82 95%/min. length 43) and then joined overlapping contigs with custom scripts. 83 The assembly scaffolds from short reads were used for gap filling and further scaffolding using 84 PBJelly (v15.8.24)(5) with ~10x PacBio reads (N50=14.9 kb, max read length =48.9 kb), which resulted 85 in a 2.17 Gb genome assembly. While PBjelly only increased the N50 scaffold size from 176 kb to 202 kb, 86 it significantly increased the N50 contig size from 67 kb to 90 kb. Because PacBio reads contain about 87 12-15% errors, we performed an additional correction step using short reads with PILON(6). In total, 98.2% 88 of the draft assembly was confirmed by short reads and ~1.3 Mb sequences were corrected by PILON. 3 89 The PILON corrected assembly was further mapped to the 10x PacBio reads using BLASR and used 90 SSPACE-longreads for the second round scaffolding, which increased the N50 scaffold size to 292 kb. 91 To assemble the N. obtusifolia genome, we employed a hybrid strategy in which we first 92 assembled

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    77 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us