University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln
Chemical & Biomolecular Engineering Theses, Chemical and Biomolecular Engineering, Dissertations, & Student Research Department of
Summer 7-27-2021
METABOLIC MODELING AND OMICS-INTEGRATIVE ANALYSIS OF SINGLE AND MULTI-ORGANISM SYSTEMS: DISCOVERY AND REDESIGN
Mohammad Mazharul Islam University of Nebraska-Lincoln, [email protected]
Follow this and additional works at: https://digitalcommons.unl.edu/chemengtheses
Part of the Biochemical and Biomolecular Engineering Commons
Islam, Mohammad Mazharul, "METABOLIC MODELING AND OMICS-INTEGRATIVE ANALYSIS OF SINGLE AND MULTI-ORGANISM SYSTEMS: DISCOVERY AND REDESIGN" (2021). Chemical & Biomolecular Engineering Theses, Dissertations, & Student Research. 36. https://digitalcommons.unl.edu/chemengtheses/36
This Article is brought to you for free and open access by the Chemical and Biomolecular Engineering, Department of at DigitalCommons@University of Nebraska - Lincoln. It has been accepted for inclusion in Chemical & Biomolecular Engineering Theses, Dissertations, & Student Research by an authorized administrator of DigitalCommons@University of Nebraska - Lincoln.
METABOLIC MODELING AND OMICS-INTEGRATIVE ANALYSIS OF
SINGLE AND MULTI-ORGANISM SYSTEMS: DISCOVERY AND REDESIGN
by
Mohammad Mazharul Islam
A DISSERTATION
Presented to the Faculty of
The Graduate College at the University of Nebraska
In Partial Fulfillment of Requirements
For the Degree of Doctor of Philosophy
Major: Chemical & Biomolecular Engineering
Under the Supervision of Professor Rajib Saha
Lincoln, Nebraska
June, 2021
METABOLIC MODELING AND OMICS-INTEGRATIVE ANALYSIS OF
SINGLE AND MULTI-ORGANISM SYSTEMS: DISCOVERY AND REDESIGN
Mohammad Mazharul Islam, Ph.D.
University of Nebraska, 2021
Advisor: Rajib Saha
Computations and modeling have emerged as indispensable tools that drive the process of understanding, discovery, and redesign of biological systems. With the accelerating pace of genome sequencing and annotation information generation, the development of computational pipelines for the rapid reconstruction of high-quality genome-scale metabolic networks has received significant attention. These models provide a rich tapestry for computational tools to quantitatively assess the metabolic phenotypes for various systems-level studies and to develop engineering interventions at the DNA, RNA, or enzymatic level by careful tuning in the biophysical modeling frameworks. in silico genome-scale metabolic modeling algorithms based on the concept of optimization, along with the incorporation of multi-level omics information, provides a diverse array of toolboxes for new discovery in the metabolism of living organisms
(which includes single-cell microbes, plants, animals, and microbial ecosystems) and allows for the reprogramming of metabolism for desired output(s). Throughout my doctoral research, I used genome-scale metabolic models and omics-integrative analysis tools to study how microbes, plants, animal, and microbial ecosystems respond or adapt to diverse environmental cues, and how to leverage the knowledge gleaned from that to answer important biological questions. Each chapter in this dissertation will provide a detailed description of the methodology, results, and conclusions from one specific research project. The research works presented in this dissertation represent important foundational advance in Systems Biology and are crucial for sustainable development in food, pharmaceuticals and bioproduction of the future. i
DEDICATION
To Seefat Farzin, a beloved wife and a good human being. ii
ACKNOWLEDGEMENTS
This dissertation has been made possible because of several important people in my academic and personal life, to whom I am greatly indebted.
First and foremost, I would like to express my deepest gratitude to Professor Rajib Saha, my research advisor. He is not only an excellent mentor but also an extraordinary human being, which have played a key role in my graduate journey. Our relationship goes back a long way, even before I officially started my PhD at University of Nebraska-Lincoln. During my MS studies at the Pennsylvania State University and during my doctoral admission process, his help and guidance was insurmountable. Throughout my doctoral research, Professor Saha has provided me with novel ideas, necessary pointers, important discussions, constructive feedback, and due appreciation for my work. He has been the most accessible resource during my PhD. He has been patient with the progress of my projects, ingenious during brainstorming ideas, reasonable during constructing logical arguments and hypotheses, and understanding during difficult times. Besides being a great academic advisor, Professor Saha has supported my professional and personal development in numerous ways.
I would like to thank my dissertation committee members and collaborators. Professors
Vinai C. Thomas, Samodha Fernando, and Harkamal Walia have provided useful insight, primary experimental results, and much needed guidance throughout the different research projects I have undertaken. I also acknowledge their research group members who generated the experimental datasets and performed analyses for many of the projects, especially Dr. Jong-Sam Ahn from
Professor Thomas’ research group and Ms. Jaspreet Sandhu from Professor Walia’s research group. Professor Alexandrov has provided very thoughtful feedback and suggestions during the iii review of my progress and my comprehensive examinations, which were critical in shaping the overall outcome of the dissertation.
I would like to acknowledge all the members of the Systems and Synthetic Biology
(SSBio) laboratory. They have engaged in long brainstorming sessions on research ideas and troubleshooting of programming and computing challenges, reviewed my work as well as manuscripts, shared their insight on career development, and participated in fun social activities with me throughout my doctoral journey. It felt like a family with them being around.
Specifically, I am grateful to Dr. Cheryl Immethun, Dr. Wheaton Schroeder, Adil Al-Siyabi,
Dianna Long, Brandi Brown, Mark Kathol, and Niaz Chowdhury. In addition, I thank all the undergraduate student researchers who have worked with me in different projects over the years, especially Matthew Van Beek, Andrea Goertzen, Tony Le, and Shardhat R. Daggumati.
I hereby acknowledge the funding entities that have supported my work: the University of Nebraska Faculty Startup Grant; Nebraska Systems Science Initiative Seed Grant; University of Nebraska Collaboration Initiative Grant; University of Nebraska Animal Nutrition, Growth and Lactation Grant; Effective Mitigation Strategies for Antimicrobial Resistance Grant; US
Department of Agriculture (USDA); National Institute of Food and Agriculture (NIFA);
National Institute of Health (NIH); National Institute of Allergy and Infectious Diseases (NIAID);
University of Nebraska-Lincoln Research Council Interdisciplinary Grant; and National
Science Foundation. I also thank the faculties and staff of the department of Chemical and
Biomolecular Engineering at University of Nebraska-Lincoln for their help and support.
Thanks go out to my teachers, mentors, and colleagues at the University of Nebraska-
Lincoln, the Pennsylvania State University, and Bangladesh University of Engineering and
Technology. They have instilled technical and non-technical knowledge and wisdom that made iv me successful as a researcher. Specifically, I would like to thank the following individuals:
Professor Costas D. Maranas, my MS advisor at the Pennsylvania State University, for supervising my research in metabolic modeling of microbial communities; Professor Tareq
Daher, my instructor of STEM Teaching at University of Nebraska-Lincoln, for teaching me the tools of evidence-based teaching; Dr. Lisa Rohde, Associate Director of Teaching & Research
Development at University of Nebraska-Lincoln, for guiding me during my CIRTL certification process; and Professor Sirajul Haque Khan at Bangladesh University of Engineering and
Technology, for first kindling the fire of optimization of chemical and biological systems through his graduate course.
I would also like to acknowledge the inspiration I have drawn from several individuals:
Dr. Clifford Paul Stoll, Dr. Jorge Gabriel Cham (PhD comics), Mr. Randall Patrick Munroe
(xkcd), Cartoonists Mr. Gavin Aung Than (Zen Pencils) and Mr. Matthew Boyd Inman (The
Oatmeal), and Dr. Isaac Asimov. Their creative muse and talent have helped me persevere, advance, think, and act wisely in numerous occasions.
My heartiest gratitude goes to my beloved wife, Seefat Farzin. She has supported my strenuous doctoral journey with patience, encouragement, and love. Being a partner in all the joys and sorrows of my life, she has been my biggest source of motivation. During this demanding career path of mine, she has made countless sacrifices and still stood by me with her beautiful smile all day long. Without her on my side, this journey would have been very difficult to pursue.
My parents, Md Zahidul Islam and Maleka Pervin, have provided mental support and inspiration throughout my graduate studies. Having their elder son some few thousand miles away from home, it was not an easy affair for them. They missed me physically every moment of the past few years and kept inspiring me despite all my shortfalls in carrying out my v responsibilities towards them. My brother, Mr. Mohammad Manzurul Islam, has been a playful sibling all along, and I thank him for putting his trust and confidence in me in the times of need.
Along with them, I thank all my friends and family back in Bangladesh for all their support.
My friends in Lincoln, Nebraska have been helpful and supporting of my research. I had the privilege to work with and be friends with people from not only the USA and Bangladesh but also from all over the world. I am greatly thankful to them for their continuous companionship and encouragement.
This list can only acknowledge a tiny fraction of the people whose direct and indirect support have been instrumental to my work. I express my sincere gratitude to all.
vi
PREFACE
Some of the content discussed in Chapter 2 has been published in npj Systems Biology and
Applications (Islam, M. M., S. C. Fernando, and R. Saha, “Metabolic Modeling Elucidates
Metabolic Transactions in the Rumen Microbiome and the Metabolic Shifts upon virome
Interactions”, Frontiers in Microbiology, 2019, 10: 2412). Used with permission.
Some of the content discussed in Chapter 3 has been published in Computational and Structural
Biotechnology Journal (Islam, M. M., J. K. Sandhu, H. Walia, and R. Saha, “Transcriptomic data-driven discovery of global regulatory features in developing rice seeds under heat stress”,
Computational and Structural Biotechnology Journal, 2020, 18:2556-2567). Used with permission.
Some of the content discussed in Chapter 4 has been published in Frontiers in Microbiology
(Islam, M. M., S. C. Fernando, and R. Saha, “Metabolic Modeling Elucidates Metabolic
Transactions in the Rumen Microbiome and the Metabolic Shifts upon virome Interactions”,
Frontiers in Microbiology, 2019, 10: 2412). Used with permission.
Some of the content discussed in Chapter 5 has been published in PeerJ (Islam, M. M., Le, T.,
Daggumati, S. R., and R. Saha, “Investigation of microbial community interactions between lake
Washington methanotrophs using genome-scale metabolic modeling”, PeerJ, 2020, 8:e9464).
Used with permission.
vii
Some of the content discussed in Chapter 6 is expected to be included in a manuscript that is currently in preparation for publication under the lead of Rajib Saha (Islam, M. M., Goertzen, A., and R. Saha, “Exploring the metabolic landscape of pancreatic ductal adenocarcinoma cells using genome-scale metabolic modeling”, (in preparation for submission to refereed journal).
viii
TABLE OF CONTENTS
Introduction ...... 1
1.1. Understanding metabolism: the role of Systems Biology ...... 1
1.2. Metabolic model development and analysis ...... 2
1.3. Curation and refinement of genome-scale metabolic models ...... 4
1.4. Computational modeling of single- and multi-cellular organisms ...... 5
1.5. Metabolic modeling of microbial communities ...... 6
1.6. Integration of multi-level omics’ datasets into genome-scale metabolic models ...... 9
1.7. Computational resources ...... 10
1.8. Dissertation Objectives ...... 11
Genome-Scale Metabolic Reconstruction and Multi-omics Analysis of Human Pathogen
Staphylococcus aureus ...... 13
2.1. Background ...... 14
2.2. Preliminary model reconstruction utilizing the existing knowledge base ...... 17
2.3. Model curation ...... 18
2.4. Evaluation of growth profiles of mutants in NTML ...... 21
2.5. Gene essentiality analyses ...... 23
2.6. Using GrowMatch to resolve Growth and No-growth inconsistencies ...... 25
2.7. Model validation and refinement: Incorporation of regulation ...... 27
2.8. Model validation and refinement: Growth phenotypes study ...... 28
2.9. Model validation and refinement: Metabolite excretion profiles of mutants ...... 31 ix
2.10. Model validation and refinement: Carbon catabolism capacity ...... 38
2.11. Conclusions ...... 38
Transcriptomics-guided Discovery of Global Regulatory Mechanisms During Heat Stress on
Developing Rice Seeds ...... 41
3.1. Introduction ...... 42
3.2. Data collection ...... 47
3.3. Differential gene expression under heat stress ...... 48
3.4. Co-expression network and clustering behavior ...... 52
3.5. Application of MiReN optimization framework to identify minimal regulatory networks 54
3.6. MiReN-predicted regulatory influences of known regulators in rice ...... 56
3.7. MiReN-predicted de novo regulatory influences on stress-responsive rice transcription
factors ...... 60
3.8. Conclusions ...... 64
Microbiome-Virome Interaction in Bovine Rumen: The Role of Viral Auxiliary Metabolic Genes in Modulating Microbial Community Dynamics ...... 68
4.1. Background ...... 69
4.2. Model reconstructions and curations ...... 74
4.3. Community formation and simulation ...... 77
4.4. Identification of unknown interactions and bridging of network gaps ...... 79
4.5. Identification of viral auxiliary metabolic genes ...... 84
4.6. Viral auxiliary metabolic genes and shifts in flux distributions in the metabolic models .. 84
4.7. Variability of the metabolic fluxes under different community objective functions ...... 89 x
4.8. Conclusions ...... 92
Microbial Community Dynamics in a Freshwater Lake Methanotrophic Ecosystem ...... 94
5.1. Background ...... 95
5.2. Metabolic model reconstruction and curation ...... 99
5.3. Community model formation ...... 102
5.4. Formaldehyde Inhibition Constraint ...... 102
5.5. Community dynamics under variable environmental conditions ...... 103
5.6. Dynamic shifts in metabolism under sediment incubated microcosm and synthetic co-
culture composition ...... 112
5.7. Conclusions ...... 115
Divergent Metabolic Landscape of Pancreatic Ductal Adenocarcinoma ...... 116
6.1. Background ...... 116
6.2. Transcriptomic data processing ...... 122
6.3. Preliminary pancreas metabolic reconstruction and curation ...... 124
6.4. Metabolic models of PDAC and healthy pancreas cell ...... 127
6.5. Unique metabolic traits in PDAC ...... 129
6.6. Conclusions ...... 134
Conclusions and Future Perspectives ...... 135
Appendix A ...... 195
Appendix B ...... 198
Appendix C ...... 207
Appendix D ...... 213 xi
Appendix E ...... 216
Appendix F ...... 220
Appendix G ...... 225
xii
LIST OF FIGURES
2.1 Overall view of the iSA863 model reconstruction 22
2.2 Growth-no growth (G-NG) prediction matrices 27
2.3 Refinements in the central metabolic pathway of the model iSA863 showing 30
correction of reaction directionality, additions, and deletions.
2.4 Growth curve for the mutants in (A) CDMG and (B) CDM media. 34
2.5 Shifts in flux space for eight mutants in the central carbon and nitrogen 36
metabolic pathway.
3.1 Step-by-step workflow to identify stress responsive genes of interest and the 46
potential regulator candidates to be analyzed using the MiReN framework.
3.2 Transcriptomic experimental design of control and stressed samples of 48
developing rice seed.
3.3 Enrichment of genes related to cellular processes differentially enriched under 50
stress condition.
3.4 MiReN-predicted and experimentally identified regulatory networks for (A) 58
Slender Rice 1, (B) Immune receptor XA21 (C) Kinesin motor-domain
containing proteins KIN12C and STD1, and (D) WD-repeat containing protein
OsWD40.
3.5 Minimal regulatory networks involving the MADS-Box M-type (A), MYB (B), 63
and bZIP (C) genes in control and stress conditions.
4.1 Initial community simulation results showing the interactions between the 78
bacterial and archaeal members.
4.2 Workflow for identifying possible interspecies interactions from Gapfill 80
suggestions. xiii
4.3 Identified de novo interactions in the community. 83
4.4 Changes in flux space after viral AMGs were added to the metabolic models. 87
4.5 Shifts in metabolism and inter-species interactions after the inclusion of viral 88
auxiliary metabolic genes.
4.6 Variability in metabolic fluxes under different community objective functions 90
5.1 Community dynamics showing the fluxes of key shared metabolites and biomass 104
in the community model.
5.2 The community composition and total biomass under varying Methane and 106
Oxygen conditions.
5.3 Central carbon metabolism fluxes in Methylobacter (A) and Methylomonas (B) 107
under high nutrient condition
5.4 Central carbon metabolism fluxes in Methylobacter (A) and Methylomonas (B) 108
under Oxygen limited condition.
5.5 Central carbon metabolism fluxes in Methylobacter (A) and Methylomonas (B) 109
under Nitrogen limited condition.
5.6 Central carbon metabolism fluxes in Methylobacter (A) and Methylomonas (B) 111
under Carbon limited condition.
5.7 Flux distribution for select metabolites in Methylobacter and Methylomonas 113
under A) Lake Washington sediment-incubated microcosm conditions and B)
synthetic co-culture conditions.
6.1 Schematic of the workflow for generating healthy pancreas and PDAC model 121
and elucidating the metabolic divergence in PDAC.
6.2 Model statistics for the healthy pancreas and the PDAC models. 128
6.3 Significantly upregulated and downregulated pathways in PDAC cell 129
metabolism xiv
6.4 Distinct metabolic features of PDAC cell. 131
A.1 Example of fixing cycles involving duplicate reactions. 195
A.2 Example of fixing cycles involving lumped reactions. 196
A.3 Example of fixing cycles involving non-specific cofactors. 197
B.1 The methodology to determine gene-essentiality. 202
G.1 Significantly upregulated and downregulated pathways in PDAC cell 225
metabolism.
xv
LIST OF TABLES
2.1 Comparison of model statistics between recent S. aureus metabolic models. 23
2.2 Metabolite excretion rates of multiple S. aureus mutants with altered carbon 32
and nitrogen metabolism in CDMG and CDM culture supernatants
(µM/OD600/h).
4.1 Statistics for the draft and curated models of R. flavefaciens, P. ruminicola, and 76
M. gottschalkii.
4.2 Model statistics after filling the gaps using GapFind-GapFill. 81
4.3 Phage-microbe association in the bovine rumen. 85
5.1 Model statistics for Methylobacter and Methylomonas 102
5.2 High and limiting nutrient conditions for community simulation. 103
B.1 Ranking scheme for GrowMatchNGG solutions. 204
C.1 Condition-specific and mutant-specific regulation information incorporated in 207
the iSA863 metabolic model
D.1 Growth evaluation of Staphylococcus aureus iSA863 metabolic model on 213
different carbon sources
E.1 Calculated nutrient uptake rates in the rumen community. 216
F.1 Reactions added to Ruminococcus flavefaciens 220
F.2 Reactions added to Prevotella ruminicola 222
F.3 Reactions added to Methanobrevibacter gottschalkii 223
1
Chapter 1
Introduction
Computation and modeling have emerged as indispensable tools that drive the process of understanding, discovery, and redesign of biological systems. Nowadays, we use computations to reconstruct models of metabolism that account for stoichiometry, regulation, kinetics and increasingly every macromolecular species present. The plasticity of living systems inherited through evolution enables biotechnologists to steer metabolism to many different directions ranging from strain development for chemicals and materials production, drug targeting in pathogens, prediction of enzyme functions, pan-reactome analysis, tailoring metabolism through omics’ data integration, modeling interactions among multiple cells or organisms, and understanding human diseases. A growing number of computational tools relying on mathematical optimization frameworks have emerged, benefiting from the rapid advancements in the reconstruction of genome‐scale metabolic models of microbes, plants, animals, and microbial ecosystems. These tools and models together allow us to address the challenges of identifying and quantifying the genetic and environmental interventions and minimizing the counteractions of the organisms in response to them. These large models together with constraint-based methods represent a key foundational advance in Systems Biology and metabolic engineering and are crucial for sustainable development in food, pharmaceuticals and bioproduction of the future.
1.1. Understanding metabolism: the role of Systems Biology
Systems biology is the use of computational and mathematics tools, modeling, and analysis for holistic understanding and design of biological systems. It is a method of generating hypotheses in silico, utilizing mathematics and knowledge of the biological system which may be investigated in vivo through synthetic biology or other more traditional methods. Systems biology
2 has evolved as a scientific discipline in which computational and mathematical modeling is used to study the complete system, as opposed to molecular biology, which focuses on subsystems often studied via in vitro experiments. Another characteristic of systems biology is that it involves quantitative analysis, unlike the largely qualitative; nature of molecular biology that focuses on hypothesis testing, which is used to determine whether a given model describing the system is true or false. Despite the different approaches, systems biology is highly dependent on the extensive biological information that has been acquired through molecular biology, and systems biology studies often result in the generation of hypotheses that require confirmation using a reductionist approach. Due to the high connectivity in metabolism, it is often necessary to use explicit and detailed mathematical models for the analyses in Systems Biology.
1.2. Metabolic model development and analysis
At the heart of the most Systems biology tools and analysis methods is the Metabolic Model, which has provided a more rigorous method of metabolic investigation. A metabolic network model captures the inter-conversion of metabolites through chemical transformations catalyzed by enzymes by assembling gene annotation and biological information from existing knowledgebases such as KEGG1, BRENDA2, ModelSeed3, KBase4, and Biocyc5. To this end, a metabolic model describes reaction stoichiometry and directionality, gene to protein to reaction associations (GPRs), organelle-specific reaction localization, transporter/exchange reaction information, transcriptional/translational regulation, and biomass composition 6. By defining the metabolic space, genome-scale metabolic models (GSMs) can assess allowable cellular phenotypes; explore the metabolic potentials and restrictions under specific environmental and/or genetic conditions 7-10. In order to have detailed blueprints for biological systems, a genome-scale metabolic model needs to be carefully reconstructed from available genome annotation and biological data, checked for elemental and charge balance, tested for its completeness, and filled in for any remaining network gaps before utilizing it to answer important biological questions.
3
A metabolic model is mathematically described as a matrix of reaction stoichiometries shown below:
�!! ⋯ �!" ⋮ ⋱ ⋮ (1.1) �#! ⋯ �#"
Where � is the stoichiometric coefficient of metabolite � in reaction �.
Flux Balance Analysis (FBA) is a widely used tool for studying GSM models and subsequently applying them for metabolic engineering purposes 11-13. Under pseudo steady state, FBA assumes that the internal concentration of metabolites within a cellular system stays constant over time 11.
In addition to the mass balance constraints, environmental constraints based on availability of nutrients, electron acceptors, or other environmental conditions, relation of reaction rates with concentrations of metabolite, and negative free energy change for spontaneous reactions can also be imposed. The effects of gene expressions may result in regulatory constraints on these models as the cell adapts to environmental changes 14. The solution space of this under-determined system of equations represents the bounds of metabolic flux distribution that the cell can achieve under a given condition 15,16. An optimization-based algorithm can then be used with specific objective functions (usually the cellular growth rate or yield of a desired bioproduct) to simulate biological behavior of the cell. The optimization formulation, in its most common form, is given below.
��������( ) �
������� ��