Metagenome Analysis and Draft Genome Reconstruction of Produced Water Samples from Research & Coalbed Methane Environments Innovation Center Daniel E

Metagenome Analysis and Draft Genome Reconstruction of Produced Water Samples from Research & Coalbed Methane Environments Innovation Center Daniel E. Ross,1,2* Djuna Gulliver1 1National Energy Technology Laboratory, U.S. Department of Energy, 2AECOM, (*[email protected]) Abstract Metagenome Analysis Genome Analysis Biogasification is a process that utilizes the microbial Metagenome reads Binned contigs community native to coalbeds to naturally convert Contig mapping to reference genome sequence currently unusable coal into readily available methane. One methodology involves injection of nutrients into the coal seams to stimulate biogenic coal degradation and methanogenesis. Identification of major functional pathways of biogenic coal degradation and subsequent methane production will lead to a better understanding of the coal-to-methane conversion, the microorganisms responsible for this conversion, and the nutrients DNA isolation Metagenome Metagenome Mapping binned contigs Contigs from the Pseudomonas genome bin from the K35 metagenome (bottom) were aligned to the complete genome sequence required to bolster this conversion in situ. This study and sequencing Assembly Binning to reference genomes of Pseudomonas stutzeri CCUG 29243 (top). Red lines demarcate contig boundary. examines the metagenome of four produced water samples from the Central Appalachian Basin Whole genome comparisons (Pocahontas 3 coal seam) to determine the Marine isolates composition (who’s there) and the potential functional Clinical isolates pathways (what can they do) of the resident microbial Soil/sludge community. Nucleic acid was recovered from produced isolates water samples using DNA isolation techniques and the quality and quantity of DNA was assessed. Illumina MiSeq next generation sequencing was employed, and Milici and Polyak, 2014 Metagenome Metagenome contigs Genome bins the resultant nucleic acid sequence data was processed using a suite of bioinformatics software. Four metagenomes, named K34, K35, BB137, and L32A were obtained from produced water samples Taxonomic assessment Metagenome Assembly from a depth of 1704 ft, 1912 ft, 1980 ft, and 2578 ft, of metagenome respectively. Methanogens were present in all samples, suggesting methanogenesis can occur. K35 K34 Furthermore, hydrocarbon degradation pathways were found, suggesting a route for biodegradation of coal. Importantly, a draft genome most closely related to Methanogens Pseudomonas stutzeri CCUG was extracted from the L32A K35 metagenome. Initial analysis of the draft genome revealed a complete nitrogen fixation pathway, and a BB137 naphthalene degradation pathway. Goals: • Investigate the microbial community in potential biogasification sites K34 Sequence reads were assembled and contig length was plotted Pan-genome tree comparing the draft genome of P. stutzeri K35 to 31 complete and draft Pseudomonas genomes. The tree • Characterize relevant functional pathways found in against contig number. A steeper slope represents a better represents the degree of similarity between the predicted proteins encoded by each genome. coal systems required for coal-to-methane 1704 ft. assembly. bioconversion Genome Results. The K35 Pan-genome Analysis Overview of functional • Construct draft genomes of abundant metagenome was estimated to be microorganisms in coal systems to complete a potential Pseudomonas Metagenome Binning ~50% Pseudomonas. After careful detailed characterization of prevalent functional Jones et al., 2010 contig binning and genome Pseudomonas potential mapping, the Pseudomonas stutzeri K35 purines, FAD, folate, his genome bin was 99.2% complete, pyrimidines riboflavin estimated by the presence/absence of 833 marker Methods: Metagenome Analysis Pipeline (MAP) D-Xylulose-5P genes. The pan-genome was The first step in the metagenome analysis Sugars Glucose-6-P DHAP 1. Sample Collection and Preservation determined and the core genome PAL pipeline (MAP) involves careful and K35 Fructose- Fructose- (genes common across all strains 6-P analysis 1,6-2P metadata Store aliquots for later use ser, calculated sample collection. Samples are 2-keto-3-deoxy-6-P-gluconate Glyceraldehyde- tested) was estimated. The draft gly, 1912 ft. 3-phosphate Enrichment collected by drillers, or when possible on genome encodes for a complete cys 2. Nucleic Acid extraction cultures site by NETL researchers. Importantly, to nitrification pathway as well as the Glyceraldehyde- Pyruvate Store aliquots for later use preserve sample integrity and prevent Acetyl-CoA upper and lower naphthalene Pan-genome of 32 Pseudomonas stutzeri 3-phosphate nucleic acid degradation, samples are degradation pathways. The work strains. Each bar represents the number of ED pro, 3. Nucleic Acid Analysis immediately aliquoted and frozen. After presented here represents an gene clusters found in 1 to 32 strains Pathway TCA arg examined. transport from the field, samples are thawed initial metagenomic/genomic Nitrite Nitrate and processed for DNA extraction. The Samples to be Contigs from the K35 metagenome assembly were binned. Each approach to functional Future work. Results presented Dinitrogen oxide Nitrite Quality assessment | DNA Library preparation collected by drillers, quality and quantity of DNA is assessed Methanogens dot represents a contig and each cluster represents a potential characterization of coalbed mailed to NETL-MGN. here will provide a framework for Dinitrogen Ammonia before preparing samples for sequencing. genome. methane microbial communities. tailoring nutrient amendments for Fd microbial enhanced coalbed Fd 3.1 16S rRNA 3.2 Metagenome The second step involves processing DNA analysis Metagenome Results. Four metagenomes were methane and provide a baseline gene analysis samples to generate a sequence library to BB137 analyzed and classified taxonomically. Generally, for monitoring changes in the be loaded into the sequencer (Illumina Sequencing to be 1980 ft. all samples contained Bacteria and Archaea, microbial community during MiSeq). Processing involves cleaning, done at NETL-PIT in mostly comprised of Proteobacteria and amendments. 3.1.1 16S 3.2.1 DNA Gulliver Lab (84-223) barcoding, and pooling DNA samples. amplification and sequencing and samples sent to Euryarchaeota. With the exception of K35, all sequencing Earth Microbiome metagenomes were dominated by Methanogens REFERENCES: Project (EMP) The third MAP step is the most time- from the order Euryarchaeota. Short DNA 1. Milici, R.C., and Polyak, D.E., 2014, Coalbed-methane production in the Appalachian basin, chap. G.2 consuming and computationally intensive. of Ruppert, L.F., and Ryder, R.T., eds., Coal and petroleum resources in the Appalachian basin; sequencing reads (250 bp) were assembled into Here, data that is retrieved from the Distribution, geologic framework, and geochemical character: U.S. Geological Survey Professional 3.2.2 Binning and longer contigs. The contigs were binned according Paper 1708, 25 p., http://dx.doi.org/10.3133/pp1708G.2. (Chapter G.2 supersedes USGS Open-File Assembly sequencer is processed and analyzed. Report 02–105.) to genomic signature. Each bin was individually Processing involves removing barcodes and 2. Jones E JP, Voytek MA, Corum MD, and Orem WH (2010). Stimulation of Methane Generation from analyzed for genome completeness by comparing Nonproductive Coal by Addition of Nutrients or a Microbial Consortium 76(21):7013-7022. Bioinformatics on trimming reads based upon the quality This technical effort was performed in support of the National Energy Technology Laboratory’s ongoing research under the RES contract 3.1.3 16S Linux in Gaia Lab at contigs to a reference marker gene set. Based ACKNOWLEDGEMENT: score (a measure of the confidence of each Methanogens DE‐FE0004000. clustering and 3.2.3 Annotation NETL-PIT upon the presence or absence of these marker annotation base call). Analysis involves metagenome DISCLAIMER: This project was funded by the Department of Energy, National Energy Technology Laboratory, an agency of the United States Government, through a genes, the % genome completeness was support contract with AECOM. Neither the United States Government nor any agency thereof, nor any of their employees, nor AECOM, nor any of their employees, assembly, binning, and annotation. L32A estimated. makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or 2578 ft. service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof. Science & Engineering To Power Our Future.

Load more