Milkweed Bug (Oncopeltus Fasciatus)
Total Page:16
File Type:pdf, Size:1020Kb
Milkweed Bug (Oncopeltus fasciatus) Genome Consortium [Logo by Chiaki Ueda] Supplementary Information Table of Contents 1. Genome and transcriptome sequencing and assembly ............................................ 4 1.1 Source materials, DNA and RNA purification ..................................................... 4 1.2 Library preparation ............................................................................................... 4 1.3 Sequencing ............................................................................................................ 5 2. Genome characteristics, quality control, expression analyses ................................ 6 2.1 Genome size .......................................................................................................... 6 2.1.a Flow cytometry estimation ..................................................................... 6 2.1.b k-mer estimation .................................................................................... 8 2.2 Lateral gene transfer events and bacterial contamination ................................... 13 2.3 Repeat content ..................................................................................................... 19 2.4 Comparative transcriptomic assessments of hemipteroid reproductive biology 23 3. Automated gene annotation using a Maker 2.0 pipeline tuned for arthropods ..... 26 4. Community curation and generating the official gene set .................................... 28 1 5. Curation and comparative analysis of specific gene families ............................... 33 5.1 Developmental regulation: transcription factors and signaling pathways .......... 33 5.1.a Anterior-posterior body axis: terminal patterning system and segmentation .................................................................................................... 34 5.1.b Hox and other homeobox transcription factors .................................... 36 5.1.c Iroquois Complex (Iro-C) cluster ......................................................... 39 5.1.d T-box transcription factors and heart determinants ............................. 42 5.1.e Nuclear receptors.................................................................................. 44 5.1.f Dorsal-ventral body axis: BMP/TGF- pathway ................................. 46 5.1.g Dorsal-ventral body axis: Toll/ NFB pathway ................................... 48 5.1.h Innate immunity ................................................................................... 51 5.1.i Notch, Hedgehog, and Torso RTK pathways ....................................... 53 5.1.j Wnt pathway ......................................................................................... 55 5.1.k Appendage patterning .......................................................................... 59 5.1.l Germline genes ..................................................................................... 61 5.1.m Eye development ................................................................................. 63 5.2 Structural and differentiation genes .................................................................... 65 5.2.a Bristle and neural development ............................................................ 65 5.2.b Molting and metamorphosis genes ...................................................... 72 5.2.c Structural cuticular proteins and pigmentation .................................... 74 5.3 Environmental adaptations .................................................................................. 77 5.3.a Stress response ..................................................................................... 77 5.3.b Cytochrome P450s ............................................................................... 79 5.3.c Insecticide resistance ............................................................................ 80 2 5.3.d Neuropeptides and their receptors........................................................ 81 5.3.e Visual genes and light detection........................................................... 82 5.3.f Chemoreceptors .................................................................................... 85 5.4 Molecular machinery .......................................................................................... 95 5.4.a Gene silencing machinery (RNAi, miRNA, piRNA) ........................... 95 5.4.b Sex determination and dosage compensation ...................................... 97 5.4.c Epigenetic machinery ........................................................................... 99 5.4.d Repressive C2H2 zinc finger effectors (KAP-1/ TRIM proteins) ..... 107 6. Post-OGS v1.1 pipeline analyses ........................................................................ 109 6.1 Protein gene orthology assessments via OrthoDB and BUSCO ....................... 109 6.2 Transcription factor classifications and orthology assignments ....................... 112 6.3 Gene structure evolution ................................................................................... 113 6.4 Interspecific comparisons of metabolic enzymes ............................................. 116 7. References ........................................................................................................... 117 3 1. Genome and transcriptome sequencing and assembly Contributors: Stephen Richards, Daniel S.T. Hughes, Shwetha C. Murali, Jiaxin Qu, Shannon Dugan, Sandra L. Lee, Hsu Chao, Huyen Dinh, Yi Han, HarshaVardhan Doddapaneni, Kim C. Worley, Donna M. Muzny, Richard A. Gibbs, Kristen A. Panfilio, Stefan Koelzer The milkweed bug Oncopeltus fasciatus is one of thirty arthropod species sequenced as a part of a pilot project for the i5K arthropod genomes project at Baylor College of Medicine Human Genome Sequencing Center. For all of these species, an enhanced Illumina-ALLPATHS-LG (v. 35218) sequencing and assembly strategy enabled multiple species to be approached in parallel at reduced costs. For most species in the pilot, including O. fasciatus, we sequenced four libraries of nominal insert sizes 180 bp, 500 bp, 3 kb and 8 kb. The amount of sequence generated from each of these libraries is noted in Table S 1.1, with NCBI SRA accessions. 1.1 Source materials, DNA and RNA purification Genomic DNA was extracted from an individual adult male to construct the main sequencing libraries: 180-bp, 500-bp paired end and 3-kb mate pair libraries. A fourth, larger mate pair library with 8-10 kb inserts was constructed with DNA extracted from an individual adult female, due to the higher amount of starting DNA required for this library. Additionally, to aid in genome assembly and gene prediction, RNA was extracted from three samples representing three different life history samples: an individual adult male, an individual adult female, and pooled, mixed- instar nymphs. 1.2 Library preparation To prepare the 180-bp and 500-bp libraries, we used a gel-cut paired end library protocol. Briefly, 1 µg of the DNA was sheared using a Covaris S-2 system (Covaris, Inc. Woburn, MA) using the 180-bp or 500-bp program. Sheared DNA fragments were purified with Agencourt AMPure XP beads, end-repaired, dA-tailed, and ligated to Illumina universal adapters. After adapter ligation, DNA fragments were further size selected by agarose gel and PCR amplified for 6 to 8 cycles using Illumina P1 4 and Index primer pair and Phusion® High-Fidelity PCR Master Mix (New England Biolabs). The final library was purified using Agencourt AMPure XP beads and quality assessed by Agilent Bioanalyzer 2100 (DNA 7500 kit) determining library quantity and fragment size distribution before sequencing. The long mate pair libraries with 3-kb or 8-kb insert sizes were constructed according to the manufacturer’s protocol (Mate Pair Library v2 Sample Preparation Guide art # 15001464 Rev. A PILOT RELEASE). Briefly, 5 µg (for 2 and 3-kb gap size library) or 10 µg (8-10 kb gap size library) of genomic DNA was sheared to desired size fragments by Hydroshear (Digilab, Marlborough, MA), then end repaired and biotinylated. Fragment sizes between 3-3.7 kb (3 kb) or 8-10 kb (8 kb) were purified from 1% low melting agarose gel and then circularized by blunt-end ligation. These size selected circular DNA fragments were then sheared to 400 bp (Covaris S- 2), purified using Dynabeads M-280 Streptavidin Magnetic Beads, end-repaired, dA- tailed, and ligated to Illumina PE sequencing adapters. DNA fragments with adapter molecules on both ends were amplified for 12 to 15 cycles with Illumina P1 and Index primers. Amplified DNA fragments were purified with Agencourt AMPure XP beads. Quantification and size distribution of the final library was determined before sequencing as described above. 1.3 Sequencing Sequencing was performed on Illumina HiSeq2000s generating 100-bp paired end reads. Reads were pre-processed using cutadapt for adapter removal and sickle-trim for quality trimming (- min length 20bp). Subsequently, reads were assembled using ALLPATHS-LG (v35218) [1] on a large memory computer with 1 TB of RAM and further scaffolded and gap-filled using in-house tools Atlas-Link (v.1.0) and Atlas gap-fill (v.2.2) (https://www.hgsc.bcm.edu/software/). This yielded an assembly of 1.099 Gb (774 Mb without gaps within scaffolds), with a contig N50 of 4.0 kb and scaffold N50 of 340 kb, which has been deposited in GenBank (assembly accession GCA_000696205.1). Table S 1.1: Sequencing, assembly, annotation statistics and accession numbers in Excel Supplement 5 2. Genome