Comparative Genomics of the Major Parasitic Worms
International Helminth Genomes Consortium
Supplementary Information

Introduction
Contributions from Consortium members
Methods
  1 Sample collection and preparation
  2.1 Data production, Wellcome Trust Sanger Institute (WTSI)
    DNA template preparation and sequencing
    Genome assembly
    Assembly QC
    Gene prediction
    Contamination screening
  2.2 Data production, McDonnell Genome Institute (MGI)
    Genome sequencing library preparation
    Genome assembly
    Assembly QC / Contamination screening
    Transcriptome sequencing and assembly
    Gene prediction
  2.3 Data production, Blaxter Nematode and Neglected Genomics (BaNG)
    Genome sequencing library preparation and sequencing
    Genome assembly
    Assembly QC
    Gene prediction
  3 Functional annotation
    Assigning protein names to predicted proteins
    Assigning GO terms to predicted proteins
  4 Repeat libraries and repeat-masking
  5 Regression model for genome size
  6 Mitochondrial genome analysis
  7 Defining high-quality ‘tier 1’ species for downstream analyses
  8 Compara database of gene families
    Construction of the in-house Compara database
    Identification of gene families, orthologs and paralogs
  9 Identification of synapomorphic gene families
  10 Phylogenetic analysis of candidate lateral gene transfers
  11 Network representation of gene families
  12 Phylogenetic tree based on gene family presence/absence
  13 Identification of gene family expansions
  14 Species Tree
  15 Novel domain combinations
  16 Ion Channels and ABC Transporters
  17 Proteases
  18 Kinase prediction
  20 Signal peptide for secretion and TM domains predictions
  21 InterPro and GO annotations
  22 Species-level functional enrichment (GO / InterPro / Pfam) analysis
  23 SCP/TAPS protein family
  24 GPCR analysis
  25 Metabolism
    Assigning ECs to predicted proteins and generating high-confidence EC predictions
    Reconstructing metabolic pathways and pathway hole-filling
    Analysis of KEGG metabolic modules and pathways
    Analysis of chokepoints in metabolic pathways
    Carbohydrate active enzymes (CAZymes)
  26 Identification of Potential Anthelmintic Drug Targets and Drugs
    Known anthelmintic drugs and compounds
    Dendrogram of known anthelmintic compounds
    Identifying potential helminth drug targets
    Identifying potential new anthelmintic drugs in ChEMBL
    Diversity analysis for creating a ‘diverse screening set’
    Identifying compounds available for purchase using ZINC15
    Self-organising map of compounds
Supplementary Results
  1. Genomic diversity in parasitic nematodes and platyhelminths
    1.1 Genome sequencing and assembly
      Sequencing strategy
      Genome assembly pipeline validation
      Assembly statistics