
Mining Microbial Metagenomes for Novel Insecticidal Proteins

Shilova IS¹, Johnson AJ¹, Chan L¹, Dabbagh K¹, David M¹, Davis IW², Haas JA², Jain S¹, Iwai S¹, Loriaux P¹, Ramachandran P¹, Rutherford E¹, Wegener KM², Weinmaier T¹, Williams RJ², Wu Y¹, DeSantis TZ¹, Bennett KA¹

¹Second Genome, Inc, 341 Allerton Ave, South San Francisco, CA 94080, USA
²Monsanto, 700 Chesterfield Parkway West, Chesterfield, MO 63017, USA

ABSTRACT

Microbes are genetically highly diverse and have evolved a variety of strategies to compete for nutrients, including pathogenesis of insects. Insecticidal microbes and their genes have been explored for agricultural uses for decades. Traditional methods to discover insecticidal and toxin genes are based on isolating bacteria on selective media, focusing primarily on the spore-forming species Bacillus thuringiensis. Thus, the diversity and novelty of isolated bacteria are limited by what can form colonies on the selective media. An alternative approach to identifying toxin genes is based on nucleotide homology searches in all available sequenced genomes. However, most of the bacteria from soil microbial communities have not been cultured, and their genomes have not been sequenced. Our approach to identifying novel insect toxins is based on mining soil metagenomes, in which the genetic potential of the whole microbial community includes uncultured species. In addition, we use multiple enrichment strategies on natural microbial communities to increase the abundance of insecticidal bacteria. The metagenomes from the enriched communities are then assessed using the Second Genome (SG) discovery platform, which includes statistical models trained on known insect toxins. The output from this analysis is a set of potentially novel insecticidal proteins that can be tested in vivo.

OBJECTIVES

• Enrich soil microbial communities for insecticidal bacteria
• Obtain metagenomes of the enriched communities and also public genomic and metagenomic data
• Identify novel insect toxins in metagenomes based on statistical models

I. SG discovery platform from metagenomes is based on SG knowledge bases, ’omics pipelines and toxin-specific statistical modeling

[Figure: platform workflow. Labels: Sample processing; DNA; SG Metagenomics Pipeline; Public metagenomes; SG KnowledgeBase (24K genomes, 116 metagenome assemblies); Protein Annotation Pipeline; Assay data using known insect toxins (stunting, mortality); feature matrix (x1,1 … xn,m) with toxin labels; Toxin Prediction Pipeline; New, testable protein insect toxins.]

II. The manually curated ontology system in the SG KnowledgeBase facilitates cross-study comparisons

[Figure: data flow. Labels: Nucleotide datasets; Public datasets; Client datasets; Metadata curation; SG Samples ontology; Data staging; SG Data + Knowledge Base; analysis.]

• Data is obtained from public and private datasets and includes nucleotide and amino acid sequences, gene and transcript abundances and taxonomic annotations
• SG Ontology describes biospecimen and environmental metadata using curated ontology terms

III. APPROACH

• Samples: Control sample and Carbon sources #1A, #1B, #2A, #2B and #3, + H2O
  - incubated at 20 °C in the dark for 72 h
  - sampled at 0, 24, 48 and 72 hours
• 16S rRNA V4-based analysis of community composition
  - Alpha diversity of microbial community
  - Differential analysis of community composition in different treatments to determine which carbon source results in the enrichment for insecticidal bacteria
• Illumina sequencing of metagenomes
  - SG Discovery platform (Section I) to identify toxin proteins

IV. RESULTS

1. High microbial diversity with most of identified OTUs not in culture

[Figure: Number of OTUs versus Number of Sequencing Reads, by Treatment (Chitin, Compost, Compost_Chitin, Control, Grass, Litter).]

• High sequencing depth (~180K reads per sample)
• Yet, more OTUs remain to be discovered
• ~13,800 OTUs identified (at 97% nucleotide identity to GreenGenes)
• 10% of OTUs had >97% nucleotide identity to sequenced genomes

2. Changes in alpha microbial diversity varied between carbon sources

[Figure: alpha diversity, Control versus C#1 (Chi-square 6.2, Kruskal-Wallis (KW) Pval = 0.01) and Control versus C#2 (Chi-square 1.9, KW Pval = 0.17).]

• Significantly reduced diversity in response to Carbon #1 addition
• No significant change in diversity in response to Carbon #2 addition

3. Strong shifts in soil microbial communities within 72 hours after addition of complex organic carbon compounds

[Figure: Principal Coordinate Analysis on Bray-Curtis distances. Axis 1 [52.9%], Axis 2 [18.8%]; points labeled by Treatment (Chitin, Compost, Compost_Chitin, Control, Grass, Litter) and Time (T0, T24, T48, T72); Carbon #1 addition and Carbon #2 addition highlighted.]

• Carbon compounds had distinct effects on the microbial community (PERMANOVA, p = 0.001)
• The shift, when present, was observed within 24 h after the start of incubation and was similar after 48 h and 72 h

4. Individual carbon compounds stimulated different phyla containing insecticidal-like taxa

[Figure: two panels showing log2 fold change in relative abundance in Control over C#1 and in Control over C#2, by phylum, with points colored by kingdom (Bacteria, Archaea); each panel separates taxa enriched in the carbon addition from taxa enriched in Control. Thresholds shown: log2 ≥ 1, padj ≤ 0.05. Phyla on the axes across the two panels: [Thermi], WPS-2, TM7, NKB19, Cyanobacteria, WS2, FBP, GN02, BRC1, Nitrospirae, Fibrobacteres, Chlorobi, TM6, Acidobacteria, OP11, Chlamydiae, Armatimonadetes, Spirochaetes, OD1, Firmicutes, WS3, Gemmatimonadetes, Verrucomicrobia, Chloroflexi, Euryarchaeota, Bacteroidetes, Actinobacteria, Planctomycetes, Proteobacteria, unclassified.]

• 1119 differential OTUs in C#1 addition
  - Top enriched phyla: [names not recoverable from the source]
• 1522 differential OTUs in C#2 addition
  - Top enriched phyla: Actinobacteria, [other names not recoverable from the source]

5. Metagenomes contained homologues to insecticidal toxins with variability within and among treatments

[Figure: heatmap of normalized hits to toxins. Relative abundances of the top 80 insecticidal toxin proteins (rows) in the metagenomes (columns) are shown; columns are annotated by Treatment (Control, Carbon #1, Carbon #2, Carbon #3) and Time, h (0, 24, 48, 72). Read-hits to insecticidal toxins were normalized to total reads in each sample, and the values were scaled to the mean of each row.]

• The number of read-hits to toxins ranged from 80 to 200,000 per sample, with a median of 1,500 read-hits per sample
• Abundances of read-hits to toxins varied within and among treatments
• The initial soil (Control) and initial Carbon sources contained toxin-like proteins (1.5% of total reads)
• After 48 h of incubation, at least two samples with Carbon #1 additions had a high number of read-hits to toxins

CONCLUSIONS

• The combination of microbiology, bioinformatics and data science makes it possible to mine the genetic potential of the total microbial community for novel insecticidal toxins and gives access to uncultured OTUs
• The SG Discovery platform is applicable to other fields for discovery of microbial proteins beneficial for [...] and human health

ACKNOWLEDGEMENTS

Mohan Iyer and Nadir Mahmood for initiating the project, Gary Andersen and Tom Malvar for consultation and discussions, and our interns: Emily Casey for help in the lab, and Akta Akta for help with SG Knowledge Base.
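The read-hit normalization described under Results 5 (hits normalized to total reads in each sample, then scaled to the mean of each toxin row) can be illustrated with a short sketch. This is not the SG pipeline: the function name, the toxin names and all counts are hypothetical, and it assumes that "scaled to the mean of each row" means dividing each normalized value by its row mean.

```python
from typing import Dict, List


def normalize_and_scale(
    hits: Dict[str, List[int]], total_reads: List[int]
) -> Dict[str, List[float]]:
    """Normalize toxin read-hits by per-sample read depth, then divide
    each toxin row by its own mean (assumed meaning of 'scaled')."""
    scaled: Dict[str, List[float]] = {}
    for toxin, counts in hits.items():
        if len(counts) != len(total_reads):
            raise ValueError(f"{toxin}: expected {len(total_reads)} samples")
        # Step 1: relative abundance = read-hits / total reads in that sample.
        rel = [c / n for c, n in zip(counts, total_reads)]
        # Step 2: scale the row so that its mean becomes 1.
        row_mean = sum(rel) / len(rel)
        scaled[toxin] = [r / row_mean if row_mean > 0 else 0.0 for r in rel]
    return scaled


# Hypothetical read-hit counts for two toxins across three samples.
hits = {"toxin_A": [10, 30, 20], "toxin_B": [0, 5, 5]}
total_reads = [1000, 1000, 2000]  # hypothetical total reads per sample

scaled = normalize_and_scale(hits, total_reads)
for toxin, row in scaled.items():
    print(toxin, [round(v, 3) for v in row])
```

After this scaling every row with at least one hit has a mean of 1, so a heatmap of the result shows where each toxin is over- or under-represented relative to its own average rather than which toxin is most abundant overall.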

Contact Information: [email protected] [email protected]