Microbial Community Response to Heavy and Light Crude Oil in the Great Lakes

Stephen Techtmann 10/24/19 Microbial Sensors Techtmann Lab @ MTU

Investigating the applications of environmental microbial communities

Hydraulic Fracturing Related Antibiotic Resistance Oil Bioremediation Techtmann Lab @ MTU Overview

• Background on oil biodegradation • Microbial response to light and heavy crude oil in the Great Lakes • Machine learning for prediction of contamination in the Great Lakes. Oil Spills Deepwater Horizon Enbridge Line 6B Deepwater Horizon Oil Spill

• 4,1000,000 bbl of oil released • Light Sweet Crude oil released • April 20, 2010 • 1101.7 miles of shoreline oiled

Atlas and Hazen 2011 Enbridge Line 6B Spill – Marshall MI

• 20,082 bbl of oil released • Diluted Bitumen • July 26, 2010 • 70 miles of shoreline oiled

https://www.mlive.com/news/kalamazoo/2010/07/state_of_emergency_declared_as.html Oil Transmissions Pipelines in the Great Lakes Region

Line 5: • 645 miles from Superior WI to Sarnia Ontario • 540,000 barrels per day • Light crude and natural gas liquids (NGLs) Crude oil Oil types and API Gravity Microbes and Biotechnology (Bioremediation)

Low cost input Microbe High value output Decreased Cost Contaminant Increased Efficiency Carbon dioxide or non- toxic daughter products

Carbon dioxide

Microbial Biomass

Petroleum Microbe Daughter Products

Water Microbial Ecology and Biotechnology

Low cost input Microbe High value output

Decreased Cost/Increased Efficiency Complex input Input A Microbe Microbe Output A

Input B Microbe Output B High value input output Input C Microbe Output C

Input D Microbe Output D

Complex input Microbial High value output Community Conceptual Model of Oil Biodegradation

Oxygen Carbon dioxide

pH Microbial Biomass

Petroleum Microbial Community Daughter Products Temperature

Nutrients (N, P, Fe) Water Next-generation sequencing and microbial ecology

Nucleic Acid Extraction

DNA RNA

Sequencing

Marker Gene Shotgun Sequencing Metagenomics Metatranscriptomics Bioinformatic analyses Phylogenetic Genes and Microbial Composition Functional Activity Potential Hazen et al 2013 16S rRNA Sequencing for Community Profiling

Environmental DNA

rRNA is useful in microbial ecology because 1) It is conserved in all organisms (16S rRNA – Amplified rRNA and Archaea, 18S rRNA - Eukarya genes 2) It is rarely horizontally transferred 3) Minor difference in hypervariable regions can distinguish between microbial taxa

Next-generation sequencing can be used to Sequence generate millions of 16S rRNA reads from hundreds of samples in 24 hours

Operational Taxonomic Unit Phylogenetic Diversity (OTU) – Taxonomic grouping defined solely on sequence Relative abundance of differences microbial taxa Microbial Community Response to Released Oil in the Gulf of Mexico

Released oil resulted in a dramatic change in the microbial

community composition Mason et al 2012 ISME Oceanospirillales dominate oil-impacted sites throughout the Gulf

The microbial community changed in a reliable manner in oil impacted sites

Mason et al 2012 ISME Line 6B Microbial Response

Pseudomonas spp. dominate the microbial community in Dilbit amended microcosms from the Kalamazoo River

Deshpande et al 2018 Study Goals • Expand the understanding of the microbial response to oil in the Great Lakes. • Determine the impact of different oil types on the microbial response to crude oil. Oxygen Carbon dioxide

pH Microbial Biomass Microbial Petroleum Community Daughter Products Temperature

Nutrients (N, P, Fe) Water Overview

• Background on oil biodegradation • Microbial response to light and heavy crude oil in the Great Lakes • Machine learning for prediction of contamination in the Great Lakes. Sampling

• Surface water samples were collected from seven sites. • Water was transported back to the lab for microcosm experiments Methods - Microcosms

Control – No Oil

Bakken Crude

16S rRNA sequencing Diluted Bitumen North Dakota Bakken Cold Lake Diluted Crude Bitumen

API Gravity (° API) 40.6 API Gravity (° API) 21.7 GLRC MI141 STR1 STR2 STR3 SUP1 SUP4 1.00 Taxonomic Composition of Microcosms

GLRCGLRC MI141MI141 STR1STR1 STR2STR2 STR3STR3 SUP1SUP1 SUP4SUP4 1.00

y 0.75 t Class

i Rank3 n

y 0.75

t i

u Rank3

D_2__Acidimicrobiia D_2__Nitrospira

n u m D_2__AcidimicrobiiaD_2__Actinobacteria D_2__Nitrospira D_2__OPB35 soil group

m D_2__Actinobacteria D_2__OPB35 soil group m

m D_2__AlphaproteobacterD_2__Alphaproteobacteria D_2__Phia ycisphaerae D_2__Phycisphaerae

o o D_2__Betaproteobacteria D_2__Planctomycetacia

C D_2__Betaproteobacteria D_2__Planctomycetacia

C f

0.50 D_2__Chloroflexia D_2__Solibacteres

f o

D_2__Chloroflexia D_2__Solibacteres

0.50 D_2__Cyanobacteria D_2__Spartobacteria

o

n

o D_2__CytophagiaD_2__Cyanobacteria D_2__Sphingobacteriia D_2__Spartobacteria

i t

n D_2__Deltaproteobacteria D_2__Thermoleophilia r

o D_2__Cytophagia D_2__Sphingobacteriia o

i D_2__Flavobacteriia D_2__Verrucomicrobia Incertae Sedis

p t

o D_2__GammaproteobacterD_2__Deltaproteobacteria D_2__Via errucomicrobiae D_2__Thermoleophilia

r r

o D_2__Holophagae

P 0.25 D_2__Flavobacteriia D_2__Verrucomicrobia Incertae Sedis p

o D_2__Gammaproteobacteria D_2__Verrucomicrobiae r D_2__Holophagae P 0.25

0.00

L L L L L L L

N N N N N N N N N N N N N N

O O O O O O O

E E E E E E E E E E E E E E

R R R R R R R

K K K K K K K

M M M M M M M

T T T T T T T

K K K K K K K

U U U U U U U

N N N N N N N

A A A A A A A

T T T T T T T

I I I I I I I

B B B B B B B

O O O O O O O

B B B B B B

0.00 B

C C C C C C C

L L L L L L L

N N N N N N N N N N N N N N

O O O O O O O

E E E E E E E E E E E E E E

R R R R R R R

K K K K K K K

M M M M M M M

T T T T T T T

K K K K K K K

U U U U U U U

N N N N N N N

A A A A A A A

T T T T T T T

I I I I I I I

B B B B B B B

O O O O O O O

B B B B B B B

C C C C C C C Microbial Community Response to Oil

0.2

PERMANOVA comparing oil type

shape Stat BAKKEN F Model 7.8567 0.0 BITUMEN R2 0.03144 CONTROL

P value 0.001 ]

% SITE

2

.

0

1

[

GLRC

2

.

s i x MI141 A −0.2 Pairwise PERMANOVA by oil type Axis 2 2 [10.2%] Axis STR1 STR2 Control Bakken Dilbit STR3 Control 0.002 0.002 SUP1 SUP4 Bakken 0.0430 0.013 −0.4 Dilbit 0.0326 0.0156

−0.2 0.0 0.2 AxisAxis.1 1 [24.3%] [24.3%] PCoA Straits

0.2 Oil addition selects for a distinct community PCoA Straits

0.1 0.2 shape ] 0.1

% STR1

shape

] 9

% STR1

. STR2 9

. STR2 2 2 STR3STR3

1 0.0

1 0.0

[

[

OIL_TYPE

2

.

s

i OIL_TYPEBAKKEN x

2 BITUMEN .

A −0.1 CONTROL s

i BAKKEN x BITUMEN A −0.1 −0.2 CONTROL

−0.3

−0.2 0.0 0.2 −0.2 Axis.1 [24.4%]

−0.3

−0.2 0.0 0.2 Axis.1 [24.4%] STR1 STR2 STR3

Shifts in Microbial Community Composition

Straits 1 Straits 3 STR1 StraitsSTR2 2 STR3

0.2 ]

% 0.2

4

. ]

4 OIL_TYPE

%

2

4

.

[

4 OIL_TYPE BAKKEN

2

[ 1

BAKKEN

1

s

s i

i BITUMENBITUMEN

x

x A

0.0 CONTROL

A A

CONTROL

0.0 o

C

A

P

o

C P

−0.2

T0 T1 T2 T3 T4 T5 T0 T1 T2 T3 T4 T5 T0 T1 T2 T3 T4 T5 Time −0.2

T0 T1 T2 T3 T4 T5 T0 T1 T2 T3 T4 T5 T0 T1 T2 T3 T4 T5 Time Differentially Abundant OTUs between Oil and Control

Oil Control Differentially Abundant OTUs between Oil and Control

Top 5 OTUs Enriched in Oil Microcosms across all the Great Lakes Domain Phylum Class Order Family log2Fold padj Change Bacteria Rhodocyclales Rhodocyclaceae -4.825034181 1.15E-09 Bacteria Proteobacteria Sphingomonadales Sphingomonadaceae -4.813390899 1.01E-80 Bacteria Proteobacteria Alphaproteobacteria Alphaproteobacteria Unknown Family -4.76857179 2.89E-05 Incertae Sedis Bacteria Proteobacteria Alphaproteobacteria Sphingomonadales Sphingomonadaceae -4.437041976 0.000114849 Bacteria Proteobacteria Alphaproteobacteria SAR11 clade LD12 freshwater group -4.351315443 6.79E-12 Top 5 OTUs Enriched in Control Microcosms across all the Great Lakes Domain Phylum Class Order Family log2Fold padj Change Bacteria Proteobacteria Betaproteobacteria 4.916518199 9.13E-14 Bacteria Cyanobacteria Chloroplast uncultured diatom uncultured diatom 4.409444746 3.70E-05 Bacteria Actinobacteria Actinobacteria Frankiales Sporichthyaceae 4.407054997 2.84E-28 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae 3.949906563 1.16E-16 Bacteria Proteobacteria Pseudomonadales Moraxellaceae 3.908584356 5.68E-10 Oil Type Selects for a Distinct Community

Straits 1 Straits 3 STR1 StraitsSTR2 2 STR3

Pairwise PERMANOVA by oil type 0.2

] Control Bakken Dilbit

%

4 .

4 OIL_TYPE Control 0.002 0.002

2 [

BAKKEN 1

Bakken 0.0430 0.013 s

i BITUMEN

x A

0.0 CONTROL Dilbit 0.0326 0.0156

A

o

C P

−0.2

T0 T1 T2 T3 T4 T5 T0 T1 T2 T3 T4 T5 T0 T1 T2 T3 T4 T5 Time Differentially Abundant OTUs between Bakken and Dilbit

Bakken DilBit Differentially Abundant OTUs between Bakken and Dilbit

Top 5 OTUs Enriched in Bakken Microcosms across all the Great Lakes

Phlyum Class Order Family Genus log2Fold padj Change Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae -7.267517002 4.83E-12 Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Aquabacterium -6.885120619 9.00E-12 Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae NA -5.246257406 4.22E-12 Proteobacteria Betaproteobacteria Burkholderiales -3.833837931 0.004553027 Proteobacteria Alphaproteobacteria SAR11 clade LD12 freshwater group uncultured bacterium -3.758914796 2.83E-07 Top 5 OTUs Enriched in Dilbit Microcosms across all the Great Lakes Phlyum Class Order Family Genus log2Fold padj Change Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae 2.771554143 0.00492125 Proteobacteria Betaproteobacteria Burkholderiales NA 2.189047269 0.002388574 Proteobacteria Alphaproteobacteria Sphingomonadales Sphingomonadaceae Novosphingobium 1.886806751 0.00492125 Proteobacteria Alphaproteobacteria Sphingomonadales Sphingomonadaceae NA 1.87143764 0.004692176 Proteobacteria Alphaproteobacteria SAR11 clade LD12 freshwater group uncultured bacterium 1.509995256 0.001452114 Summary so far • A diverse set of microbes responds to oil in the Great Lakes. • Alpha- and Betaproteobacteria are the primary groups enriched in response to oil. • Many of the enriched OTUs are related to known oil-degrading taxa. • There is a distinct microbial response to different types of crude oil. Overview

• Background on oil biodegradation • Microbial response to crude oil in the Great Lakes • Comparison of the microbial response to light and heavy crude oil. • Machine learning for prediction of contamination in the Great Lakes. Gulf of Mexico Sampling Sites

Smith et al 2015 mBio Ability of the Microbial Community Structure to Predict the Presence of Oil

Smith et al 2015 mBio The abundance of two taxa can predict oil contamination in the GoM

Smith et al 2015 mBio Machine Learning and Microbial Ecology

List of taxa found in oiled Sample Train ML model to identify features (taxa) that distinguish List of taxa between oil and no oil found in Non- Oiled Sample 10 oil samples Model10 no oil Yes No Taxa A?

10 oil samples 0 oil samples 0 no oil samples 10 no oil samples Machine Learning and Microbial Ecology

List of taxa found in oiled Taxa A – oil Sample Train ML model to identify features (taxa) Taxa B – oil that distinguish Taxa C – no oil List of taxa between oil and no oil Taxa D – no oil found in Non- Oiled Sample

O

Model O 75% accurate O classification

N ? O

Presence of Oil across the Great Lakes

0 .

• Random forests model 1

predicting the presence of

8

. 0

oil in microcosms for all n

o

i

6

.

s 0

sites was constructed. i

c

e

4

. r

• Accuracy on test set : 0

Precision P

0.8226 2

. 0

• F1 of test set : 0.7179 AUC = 0.911

0 . CONTROL OIL 0 CONTROL 14 3 0.0 0.2 0.4 0.6 0.8 1.0 Recall OIL 8 37 Recall

Presence of Oil in the Straits of Mackinac

0 .

• Random forests model 1

predicting the presence 8

.

0 n

of oil in microcosms from o

i

6

.

s 0

the Straits of Mackinac. i

c

e

4

. r

• Accuracy on test set: 1 0

Precision

P 2 • F1 of test set : 1 .

0 AUC = 0.974

0 . CONTROL OIL 0 CONTROL 11 0 0.0 0.2 0.4 0.6 0.8 1.0 Recall OIL 0 22 Recall Classifying the type of oil across the Great Lakes

• A random forests model was constructed to classify samples into control, Bakken or Diluted Bitumen crude oils based on the microbial community composition. Metric Control Bakken Dilbit Accuracy Mean F1 F1 0.8627 0.8000 0.7429 All Sites 0.8095 0.8019 Balanced 0.9146 0.8452 0.8017

All Sites All Accuracy Straits 0.9697 0.9696 F1 0.9565 1.0000 0.9524 Balanced 0.9773 1.0000 0.9545 Straits Accuracy Biosignatures

Model Number of features with importance >1 All sites – predict presence of oil 40 All sites – classify oil type 333 Straits – predict presence of oil 177 Straits – classify oil type 119

The most important features were OTUs classified as Sphingomonadales, Comamonadaceae, and Sporichthyaceae Summary • A diverse set of microbes responds to oil in the Great Lakes. • There is a distinct microbial response to different types of crude oil. • Machine learning can serve as an effective tool for interrogating the microbial for features that are predictive of contamination. • Models for prediction of the presence of oil across the Great Lakes were less accurate than those predicting the presence and type of oil in one location. • Models for prediction of oil and classification of the type of contamination were highly accurate in the Straits of Mackinac region. Acknowledgments

• Tim Butler • Paige Webb • Ryan Ghannam • Emma Byrne • The captain and crew of the NOAA R/V 5501 • Jorge Santo Domingo (US EPA) • MTU – REF and startup funs