STUDYING THE UNDERSTUDIED: HYPER AMMONIA PRODUCING BACTERIA AND BACTERIOPHAGES IN THE RUMEN MICROBIOME

by

Jessica Charlotte Abigail Friedersdorff

A thesis submitted in partial fulfillment of the requirements for the degree of

PhD in Biological Sciences

Aberystwyth University

2020

Preface

I. Mandatory Layout of Declaration/Statements

Word Count of thesis: 62,288

DECLARATION

This work has not previously been accepted in substance for any degree and is not being concurrently submitted in candidature for any degree.

Candidate name Jessica Charlotte Abigail Friedersdorff

Signature:

Date 15/09/2020

STATEMENT 1

This thesis is the result of my own investigations, except where otherwise stated. Where *correction services have been used, the extent and nature of the correction is clearly marked in a footnote(s).

Other sources are acknowledged by footnotes giving explicit references. A bibliography is appended.

Signature:

Date 15/09/2020

[*this refers to the extent to which the text has been corrected by others]

STATEMENT 2

I hereby give consent for my thesis, if accepted, to be available for photocopying and for inter-library loan, and for the title and summary to be made available to outside organisations.

Signature:

Date 15/09/2020

NB: Candidates on whose behalf a bar on access (hard copy) has been approved by the University should use the following version of Statement 2:

I hereby give consent for my thesis, if accepted, to be available for photocopying and for inter-library loans after expiry of a bar on access approved by Aberystwyth University.

Signature:

Date 15/09/2020

II. Summary

Candidate’s Surname/Family Name Friedersdorff

Candidate’s Forenames (in full) Jessica Charlotte Abigail

Candidate for the Degree of PhD

Academic year the work submitted for examination 2020

Summary:

Greenhouse gas emissions and feed efficiency in ruminant livestock are pertinent and important topics, ones which have not suffered from lack of attention as ample research has endeavoured to further our understanding of the complex rumen microbial ecosystem. Despite this, some populations remain understudied. This is the key motivation behind the studies herein, which contribute to the understanding of the niche bacterial population of hyper ammonia producers (HAP) and bacteriophages (viruses that infect bacteria). HAP species degrade amino acids and peptides for energy, in the process producing hydrogen, carbon dioxide, and excessive amounts of ammonia. Hydrogen and carbon dioxide feed into methane production by archaea present in the rumen, whilst excess ammonia is removed from the animal host. This makes the HAPs an ideal target for potential population control, but firstly it was imperative to better understand them. This study first characterised the ammonia production phenotypes of bacterial cultures, then compared their genomes and transcriptomes to identify a signature that indicates the HAP phenotype. The work presented here has demonstrated the complexity and variability underlying the seemingly simple HAP phenotype, warranting further investigation in future work and isolation of novel HAPs in order to better understand this group before controlling the population. Phage therapy is one approach to population control that has been relatively little explored to date in the rumen. Despite phages being abundant in the rumen, there were only five genomes available of phages isolated from rumen-associated samples. This study isolated and characterised a further five novel phages that infect Butyrivibrio fibrisolvens. While the work presented here did not identify phages active against HAPs, these five Butyrivibrio phages contribute valuable information about the structure and function of the rumen ecosystem. 
It is suggested that continuation of this line of enquiry in future work would complement ongoing research utilising metagenomics, metatranscriptomics and metaproteomics aimed at understanding and improving rumen efficiency.

III. Funding

Knowledge Economy Skills Scholarships (KESS 2) is a pan-Wales higher level skills initiative led by Bangor University on behalf of the HE sector in Wales. It is part funded by the Welsh Government’s European Social Fund (ESF) convergence programme for West Wales and the Valleys.

The supporting company partner was Dr David Rooke, director of Dynamic Extractions LTD.

IV. Contents

I. MANDATORY LAYOUT OF DECLARATION/STATEMENTS

II. SUMMARY

III. FUNDING

IV. CONTENTS

V. ACKNOWLEDGMENTS

VI. LIST OF PUBLICATIONS

VII. FIGURE LEGENDS

VIII. TABLE LEGENDS

IX. ABBREVIATIONS

X. ABSTRACT

1. BACKGROUND AND INTRODUCTION

1.1. RUMINANTS AND THEIR IMPORTANCE
1.2. STRUCTURE AND FUNCTION OF THE RUMINANT STOMACH
1.2.1. The Rumen and its Specialised Function
1.3. DIGESTION IN RUMINANTS
1.3.1. Feed and Diet
1.3.2. Mechanical Digestion
1.3.3. Fermentation
1.3.4. Carbohydrate Metabolism
1.3.5. Nitrogen Metabolism
1.3.5.1. Ammonia in the Rumen; Sources, Transport and Fate
1.4. THE RUMEN MICROBIOME
1.4.1. Bacteria
1.4.1.1. Fibrolytic Bacteria
1.4.1.2. Proteolytic Bacteria
1.4.1.3. Deaminating Bacteria
1.4.1.4. Hyper Ammonia Producing (HAP) bacteria
1.4.2. Archaea
1.4.3. Viruses
1.4.3.1. Bacteriophages
1.4.3.2. Archaeal phages
1.4.4. Fungi
1.4.5. Protozoa
1.4.6. Understanding the Rumen Microbiome
1.5. FEED EFFICIENCY AND METHANE PRODUCTION IN THE RUMEN
1.5.1. Methane Production in the Rumen
1.5.2. Ammonia Production in the Rumen
1.5.3. Methods Considered to Mitigate Methane Emissions and Increase Feed Efficiency
1.6. AIMS AND OBJECTIVES

2. MATERIALS AND METHODS

2.1. RUMEN FLUID
2.2. CULTURE MEDIA
2.3. SUB-CULTURING
2.4. GRAM STAINING AND CELLULAR MORPHOLOGY
2.5. GROWTH CURVES
2.6. DNA EXTRACTION
2.7. BACTERIAL WHOLE GENOME SEQUENCING
2.8. 16S ANALYSIS
2.9. SAMPLE PREPARATION FOR PHENOTYPE CONFIRMATION
2.9.1. Measuring Ammonia Production
2.9.2. Measuring Volatile Fatty Acid Production
2.9.3. Fourier Transform Infrared Spectroscopy
2.10. RNA EXTRACTION FOR BACTERIAL TRANSCRIPTOMICS
2.11. PHAGE SCREENING
2.11.1. Source of Phages
2.11.2. Phage Filtrates
2.11.3. Enrichments
2.11.4. Polyethylene Glycol Precipitations
2.11.5. Soft Overlay Technique
2.11.6. Phage Screening, Isolation and Purification
2.11.7. Alternative Growth Media to Improve Lawn Growth
2.11.8. DNA Extraction
2.11.9. Solvent Stability and Host Range Testing
2.11.10. TEM
2.11.11. DNA Sequencing
2.12. COUNTER CURRENT CHROMATOGRAPHY
2.12.1. Obtaining Model Bacteriophages and Their Bacterial Hosts
2.12.2. Propagation and Preparation of Model Bacteriophages
2.12.3. Spectroscopy
2.13. DATA
2.14. CULTURE CONFIRMATION USING 16S SEQUENCING
2.15. COMPARISONS TO ACETOANAEROBIUM STICKLANDII DSM519
2.16. 40 UNIVERSAL GENE MARKERS PHYLOGENETIC TREE
2.17. SEQUENCE SIMILARITY APPROACH AND EVOLUTIONARY GENOME NETWORKS (EGN)
2.17.1. Optimising Network Parameters
2.17.2. Dividing the largest component
2.17.3. Finding Gene Families of Interest
2.18. FUNCTIONAL ANNOTATION APPROACH
2.19. IDENTIFICATION OF TRANSPORTER GENES PRESENT IN THE THREE PHENOTYPES
2.20. TRANSCRIPTOMIC APPROACH TO DETERMINE HYPER AMMONIA PRODUCTION GENE SIGNATURES
2.20.1. Quality Control and Trimming of Raw Reads
2.20.2. Mapping and Counting Reads to the Genomes
2.20.3. Identifying Interesting Patterns Using Transcriptome Count Data
2.21. BACTERIOPHAGE GENOME ASSEMBLY AND ANALYSIS
2.21.1. Quality Control, Read Trimming and Assembly
2.21.2. Comparing and Annotating the Phage Genomes
2.21.3. Phylogenetics and Phylogenomics
2.21.4. Assessing Host Interactions

3. AMMONIA PRODUCTION BY RUMINAL BACTERIA; A DEFINITION OF THE PHENOTYPE, IDENTIFICATION AND CHARACTERISATION OF MODEL SPECIES

3.1. INTRODUCTION
3.1.1. Aims and Objectives
3.2. EXPERIMENTAL PROCEDURES
3.3. RESULTS AND DISCUSSION
3.3.1. Classification of Rumen Microbes into Ammonia Production Phenotypic Groups
3.3.1.1. No Ammonia Producers (NAPs)
3.3.1.2. Some Ammonia Producers (SAPs)
3.3.1.3. Carbohydrate Utilising Hyper Ammonia Producers
3.3.1.4. Obligate Amino Acid Fermenting Hyper Ammonia Producers
3.3.2. Model Bacteria Growth and Metabolism In Vitro
3.3.2.1. Genome Sequencing for Culture Identification and Confirmation
3.3.2.2. 16S Sequencing for Culture Identification
3.3.2.3. Analysis of Ammonia Production Confirms Phenotypes
3.3.2.4. Volatile Fatty Acid Production
3.3.2.5. Fourier Transform Infrared Spectroscopy (FTIR)
3.4. CONCLUSION

4. ANALYSIS OF THE GENOMES OF RUMEN BACTERIA WITH KNOWN AMMONIA PRODUCTION PHENOTYPES TO DETERMINE A HYPER AMMONIA PRODUCTION SIGNATURE

4.1. INTRODUCTION
4.2. AIMS AND OBJECTIVES
4.3. EXPERIMENTAL PROCEDURES
4.4. RESULTS
4.4.1. Characteristic genes in Acetoanaerobium sticklandii DSM519 and HAPs
4.4.2. Phylogeny of Hyper Ammonia Producing Bacteria
4.4.3. Genomic Analysis to Determine Hyper Ammonia Production Gene Signatures
4.4.3.1. Optimising the Gene Family Creation
4.4.3.2. Sequence Similarity Approach for Genomic Analysis
4.4.3.3. Functional Annotation Approach for Genomic Analysis
4.4.3.4. Transporters
4.5. DISCUSSION
4.5.1. Characteristic Acetoanaerobium sticklandii DSM519 Genes are Not Characteristic of HAPS
4.5.2. Hyper Ammonia Production Is Not A Monophyletic Trait
4.5.3. The Sequence Similarity Approach Did Not Reveal A Unique HAP Genomic Signature
4.5.4. Functional Approaches Did Not Highlight Unique HAP Pathways
4.5.5. Hyper Ammonia Producing Bacteria Have More Amino Acid Transporter Orthologs
4.5.6. Limitations, Implications and Further Work
4.6. CONCLUSIONS

5. ANALYSIS OF THE TRANSCRIPTOMES OF RUMEN BACTERIA WITH KNOWN AMMONIA PRODUCTION PHENOTYPES TO DETERMINE A HYPER AMMONIA PRODUCTION SIGNATURE

5.1. INTRODUCTION
5.2. AIMS AND OBJECTIVES
5.3. EXPERIMENTAL PROCEDURES
5.4. RESULTS
5.4.1. Transcriptomic Data Quality and Evaluation
5.4.2. Genes with Highest Expression
5.4.3. Proportion of Gene Expression Contributing to Functions
5.4.4. Functional Annotation Approach for Transcriptomic Analysis
5.4.5. Expression of Transporters in HAPs and NAPs
5.5. DISCUSSION
5.5.1. Amino Acid Metabolism and Energy Conservation Form a Large Proportion of Some HAP Species Transcriptomes
5.5.2. Limitations, Implications and Future Work
5.6. CONCLUSIONS

6. RUMINAL BACTERIOPHAGES

6.1. INTRODUCTION
6.2. AIMS AND OBJECTIVES
6.3. EXPERIMENTAL PROCEDURES
6.4. RESULTS AND DISCUSSION
6.4.1. Detection of Phages in the Rumen Fluid and Faeces
6.4.1.1. Target species 1: Eubacterium pyruvativorans isol6
6.4.1.2. Target species 2: Acetoanaerobium sticklandii (ATCC12662)
6.4.1.3. Target species 3: Clostridium aminophilum F (ATCC49906)
6.4.1.4. Target species 4: Ruminococcus flavefaciens 007c
6.4.1.5. Target species 5: Butyrivibrio fibrisolvens D1 (DSM3071)
6.4.2. In vitro Characterisation of Butyrivibrio Phages
6.4.3. Phage Genome Analysis
6.4.4. Phages Bo-Finn and Arian
6.4.5. Phages Arawn and Idris
6.4.6. Phage Ceridwen Genome
6.4.7. Similarities and Differences Between the Butyrivibrio Phages
6.5. PHYLOGENY ANALYSIS REVEALS WIDER CONTEXT OF BUTYRIVIBRIO PHAGES
6.6. NO EVIDENCE OF PHAGE INTERACTIONS FOUND IN HOST GENOME
6.7. CONCLUSIONS, IMPLICATIONS AND FUTURE WORK

7. USING COUNTER CURRENT CHROMATOGRAPHY APPLICATIONS TO SEPARATE BACTERIOPHAGES

7.1. INTRODUCTION
7.1.1. Current Methods for Isolation and Purification
7.1.2. Counter Current Chromatography
7.2. AIMS AND OBJECTIVES
7.3. EXPERIMENTAL PROCEDURES
7.4. RESULTS
7.4.1. Bacteriophages Retain Infectivity After the CCC Process
7.4.1.1. T4
7.4.1.2. ϕX174
7.4.2. Phages were not detected with UV
7.4.3. Bacteriophages Remain in the Column but are Inactivated by Sodium Hydroxide
7.5. DISCUSSION
7.6. CONCLUSIONS AND FUTURE WORK

8. GENERAL DISCUSSION AND CONCLUSIONS

8.1. CONCLUSIONS

9. BIBLIOGRAPHY

10. APPENDIX


V. Acknowledgments

Firstly, huge thanks to my supervisors Chris Creevey and Alison Kingston-Smith for the opportunity and for the ongoing support, and David Rooke for not only his input in the project, but the support and encouragement he has offered throughout. I would like to acknowledge KESS for the funding, Jamie Newbold who directed us onto the HAPs, and Dave Whitworth for guidance. Thanks to Alan Cookson for his TEM expertise and boardgame knowledge, Colin Bright for his help with the CCC machine and Teri Davies for her support in the lab and for hacks out on her horses. Big thanks to Justin Pachebat for his mentorship at the start of my lecturing career, help and advice in the lab and as snack provider. Huge thanks to Pauline Rees Stevens for all the support she gave me and the terrible jokes she told when I needed cheering up.

Thanks to Ben Thomas, Tom Hitch, Karen Siu-Ting and Nick Dimonaco for lending me their ear, but mostly for the office antics and boardgame breakouts. Thanks to Cate Williams for putting my scissors and mug in jelly, and to Josh Nelthorpe-Cowne for the mutual sanity-keeping in the lab. Thanks to Elizabeth Hart for the support, friendship, and opportunity to spend time with her family and horses, and to Sissi and Canti for being patient with me and not getting too sassy. Thanks to Tom Martin, always there at the other end of the phone.

Thanks to my parents, family and family-in-law, for their encouragement throughout my PhD, even if they did not understand it, and also for their ongoing love and support throughout my education and academic career, and undoubtedly in the future. Thanks and scritches to Loki, the best dog for emotional support and mutually enjoyed cuddles, but the biggest thanks and love go to Max; my husband, my rock and always on-call computer-fixer. Finally, I would like to acknowledge my 14-year-old self, for aspiring to this.

Finis coronat opus.


VI. List of Publications

• Hitch, T. C. A., Thomas, B. J., Friedersdorff, J. C. A., Ougham, H., and Creevey, C. J. (2018). Deep Sequence Analysis Reveals the Ovine Rumen as a Reservoir of Antibiotic Resistance Genes. Environ. Pollut. 235, 571–575. doi:10.1016/j.envpol.2017.12.067.
• Friedersdorff, J. C. A., Thomas, B. J., Hay, H. R., Freeth‐Thomas, B. A., and Creevey, C. J. (2019). From treetops to tabletops: a preliminary investigation of how plants are represented in popular modern board games. PLANTS, PEOPLE, PLANET 1, 290–300. doi:10.1002/ppp3.10057.
• Gilbert, R. A., Townsend, E. M., Crew, K. S., Hitch, T. C. A., Friedersdorff, J. C. A., Creevey, C. J., Pope, P. B., Ouwerkerk, D., and Jameson, E. (2020). Rumen Virus Populations: Technological Advances Enhancing Current Understanding. Front. Microbiol. 11, 450. doi:10.3389/fmicb.2020.00450. – Contributed a section about the countercurrent chromatography work in Chapter 7.
• Friedersdorff, J. C. A., Thomas, B. J., Pidcock, S. E., Hart, E. H., Rubino, F., and Creevey, C. J. (2020). “Genome Sequencing and the Rumen Microbiome,” in Improving Rumen Function, eds. C. S. Mcsweeney and R. I. Mackie (Cambridge, UK: Burleigh Dodds Science Publishing), 97–132.
• Friedersdorff, J. C. A., Kingston-Smith, A., Pachebat, J., Cookson, A., Rooke, D., and Creevey, C. (2020). The Isolation and Genome Sequencing of Five Novel Bacteriophages From the Rumen Active Against Butyrivibrio fibrisolvens. Front. Microbiol. 11, 1–13. doi:10.3389/fmicb.2020.01588. – A publication from the output of Chapter 6.

Offered Papers and Posters

• Friedersdorff, J., Kingston-Smith, A., Creevey, C., Rooke, D., Bright, C., and West, D. (2018). Using High Performance Counter Current Chromatography for the Separation, Preparation and Purification of Bacteriophages from the Rumen Microbiome. [Poster] Exhibited at Countercurrent chromatography conference 2018, Braunschweig, 01 Aug 2018 - 03 Aug 2018. doi: 10.13140/RG.2.2.12796.90246 – preceded the work carried out in Chapter 7.
• Friedersdorff, J., Creevey, C., and Kingston-Smith, A. (2019). Characterising the Genomes and Transcriptomes of Hyper Ammonia Producing Bacteria from the Rumen. Access Microbiol. 1, 814. doi:10.1099/acmi.ac2019.po0523. – talk presenting some of the results found whilst carrying out the work in Chapters 4 and 5.
• Friedersdorff, J. (2020). Using the little things to tackle the big things: Can we use phages to impact climate change? Capsid & Tail, (82). Retrieved from https://phage.directory/capsid/rumen-phage – Future work

VII. Figure Legends

Figure 1.1 A diagram showing the organisation of the chambers of the ruminant stomach. The first and largest of the four chambers is the rumen, which is further divided into four sections: the dorsal sac (DSR), dorsal blind sac (DBS), ventral sac (VSR) and ventral blind sac (VBS). The cranial sac (CSR) then links the rumen to the reticulum (RET). This is connected to the omasum (OM) by the reticulo-omasal orifice (ROO). The final chamber is the abomasum (ABO) which then leads to the duodenum (D) (adapted from (Wu and Papas, 1997)).

Figure 1.2 Simplified overview of rumen fermentation (adapted from (Place et al., 2011)).

Figure 1.3 Schematic showing carbohydrate fermentation pathways. These are the main pathways employed by ruminal microbes to ferment the different carbohydrates available. Dotted lines show reactions carried out by other microbes (adapted from (Russell and Rychlik, 2001)).

Figure 1.4 Schematic diagram showing the fate of key nitrogen sources in the rumen. Dotted circles show the process and dotted arrows show alternative directions. BCVFA – Branched chain volatile fatty acids. ¹These may also be available in the feed, ²Urea is also removed from the host in the urine (adapted from (Hartinger et al., 2018)).

Figure 1.5 Diagram of key cellulolytic enzymes and the reactions they catalyse (Krause et al., 2003). Endoglucanase enzymes first break down the complex cellulose fibres, allowing access to cellobiohydrolases, which produce glucose polymers. Exoglucanase breaks down polysaccharide chains into smaller subunits, whilst glucosidase converts cellobiose into glucose (adapted from (Krause et al., 2003)).

Figure 1.6 A schematic diagram showing the key processes in proteolysis. Intracellular processes of ruminal bacteria are those inside the oval and extracellular activity is outside, where the protease is bound to the membrane (adapted from (Bach et al., 2005)).

Figure 1.7 The key bacteria involved at different stages of the proteolytic process (adapted from (Walker, Newbold and Wallace, 2005)).

Figure 1.8 The fate of the amino acid leucine in the rumen. Leucine is ultimately degraded to the VFA isovalerate, carbon dioxide and glutamate through the transamination and decarboxylation process in (a). Glutamate is then deaminated to 2-oxoglutarate and ammonia, and NADH is hydrogenated (b). Hydrogenase enzymes then remove hydrogen to produce NAD (c), and methanogenesis converts carbon dioxide and hydrogen into methane (d) (adapted from (Hino and Russell, 1985)).

Figure 1.9 Structural equation showing the Stickland reaction. The reaction in (A) shows the deamination of a donor amino acid catalysed by a dehydrogenase, producing ammonia and hydrogen and an α-keto acid, to which an inorganic phosphate binds to form an acyl-phosphate. (B) A reductase reduces an acceptor amino acid, donating the protons and electrons from the first reaction, producing yet more ammonia and the carbon chain creating the corresponding VFA. (C) The phosphate group phosphorylates ADP to create ATP (adapted from (de Vladar, 2012)).

Figure 1.10 Illustration of key metabolic pathways that take place in Clostridium sticklandii (strain DSM519) based on the genome. Figure from (Fonknechten et al., 2010).

Figure 1.11 Key pathways for biofuel production in C. sticklandii. Coloured boxes highlight the different pathways involved in catabolism or biosynthesis of different substrates, which are given in dark blue. Figure from (Sangavai and Chellapandi, 2017).

Figure 1.12 Diagram of lytic and lysogenic cycles (adapted from (Gilbert and Ouwerkerk, 2020a)).

Figure 1.13 Transmission electron microscopy (TEM) images of different phages isolated from the rumen. A range of morphological features are demonstrated by these phages, including group A morphology (typical of Myoviridae), with icosahedral or octagonal capsids, contractile tail and sometimes visible baseplate and collars (a,b,c,d). Group B morphology (typical of Siphoviridae) is also seen with elongated heads and long tails (e,f,g,k). Group C type morphologies (typical of Podoviridae) show collars and short tails (h, i, j), and a phage formed of long capsids was also seen (l) (adapted from (Klieve and Bauchop, 1988)).

Figure 3.1 Overview of the methods and processes used. Italics indicate the sections in chapter 2 where the methods are described.

Figure 3.2 Diagram representing the four ammonia producing phenotypes. Hexagons in the top graph represent a bacterium with a known ammonia production phenotype, placed on the y-axis to give a qualitative visual representation of amount of ammonia it produces compared to other bacteria, based on the literature. X-axis positioning is relevant to metabolic abilities of the bacterium, indicated by the elongated shapes below the graph. Size of trapezoid shape represents proportions of that ability, where a thicker shape shows more of that activity and absence shows none of that activity. (¹Bladen et al., 1961; ²Attwood et al., 1998; ³Whitehead et al., 2011; ⁴Russell, 2005; ⁵Eschenlauer et al., 2002).

Figure 3.3 Reactions of the amino acids most preferentially used by the obligate amino acid fermenting bacteria. ¹Chen and Russell, 1988, ²Fonknechten et al., 2010.

Figure 3.4 An example of alanine degradation producing pyruvate, ammonia and hydrogen from MetaCyc (Caspi et al., 2014).

Figure 3.5 Mauve multiple genome aligner showing arrangement of colinear blocks in the genomes of the sequenced HAPs compared to a reference genome. (A) shows the reference genome C. sticklandii DSM519 (top) aligned to the sequenced A. sticklandii ATCC12662 strain from this study (below). (B) shows the reference genome E. pyruvativorans i6 (top) from the Hungate collection aligned to the E. pyruvativorans Isol6 genome from this study (below). Each colinear block is coloured, with red lines denoting contigs and coloured lines between the genomes match up blocks that are aligned. Sequence similarity is represented as a graph within the blocks as a darker coloured line; for most blocks, similarity is so high this is difficult to visualise.

Figure 3.6 A boxplot in the style of Tukey of ammonia production from cultures in mid-log growth phase. Each box is formed of the median over six replicates for each bacterium, with the upper and lower interquartile ranges. Whiskers show smallest or greatest observation greater than or smaller than 1.5 times the interquartile ranges above and below the hinges, respectively. Observations outside of this range appear as outliers, indicated in the graph as black points. The solid line indicates the median of the blank, with dashed lines showing the upper and lower hinges of the blank to indicate a baseline of ammonia in the medium. Shared letters show no significant difference using Kruskal-Wallis and Bonferroni correction method (P>0.05).

Figure 3.7 Bar graphs showing amounts of volatile fatty acids present in samples after growth. Red bars are hyper ammonia producers (HAPs), no ammonia producers (NAPs) are green and blue are some ammonia producers (SAPs). The single gray bar is the blank medium, which acts as a baseline of VFAs present in the medium. The bar heights are the mean of six samples, with the error bar showing one standard deviation above and below the mean. The dashed lines indicate one standard deviation above and below the growth medium baseline.

Figure 3.8 (On previous page) FTIR analysis on supernatant. Normalised spectra, principal component analysis graphs and corresponding loading graphs for the second component for the supernatants of each of the three ammonia production phenotypes. A-C are the normalised spectra for the HAPs, NAPs, SAPs respectively, where grey bands correspond to polysaccharides (900 ~ 1200 cm⁻¹), amide II (1600 ~ 1640 cm⁻¹), amide I (1650 ~ 1690 cm⁻¹) and fatty acids (2900 – 3100 cm⁻¹), (Belanche et al., 2013; Mayorga et al., 2016; Nakanishi, 1962). Principal components one and two are plotted (D-F) out of the three components needed for each to describe >99.5 % of the variation. Loading graphs for the second principal component are shown below (G-I), with the red line indicating zero.

Figure 3.9 (On next page) FTIR analysis on the cell pellet. Normalised spectra, principal component analysis graphs and corresponding loading graphs for the second component for each of the three ammonia production phenotypes. The grey bands in A-C correspond to polysaccharides (900 ~ 1200 cm⁻¹), nucleic acids (1250 cm⁻¹), bending in methylene bonded to nitrogen in amine salts (CH2-H⁺, 1400~1440 cm⁻¹), amide II (1600 ~ 1640 cm⁻¹), amide I (1650 ~ 1690 cm⁻¹), fatty acids (2900 – 3100 cm⁻¹), (Belanche et al., 2013; Mayorga et al., 2016; Nakanishi, 1962). Principal components one and two are plotted out of five components needed to explain >99.5 % of the variance (D-F). Given that the greatest separation of the groups is due to the y axis, loadings for principal component two are shown (G-I) with the red line indicating zero.

Figure 3.10 (On next page) Spectra of all the bacterial samples and blanks, as well as PCA and DFA plots for both the supernatant (left) and pellet (right). Principal components one and two were plotted, out of three needed for supernatant samples and five needed for pellet samples to explain >99.5 % of the variance (C & D). Discriminant function analysis graphs (E & F) show separation in a supervised approach, with circles indicating 95 % level of confidence. Solid coloured, hollow or encircled symbols in the DFA plots indicate whether that data point was used as test, training or validation respectively.

Figure 4.1 Flow chart showing the key steps in this analysis, with relevant methods described in the sections given in italics.

Figure 4.2 Maximum likelihood supermatrix phylogenetic tree. Coloured bands in the outer ring indicate phyla as per the legend, and the coloured shapes in the outer ring indicate HAP, SAP or NAP phenotype, where filled shapes are confirmed phenotypes and empty shapes indicate putative phenotype. Purple triangles indicate the positions of the four sequenced genomes.

Figure 4.3 Venn diagram showing the counts of gene families shared between all species with known ammonia production phenotypes. HAPs are in red, SAPs in blue and NAPs in green.

Figure 4.4 Principal component analysis (PCA) plotting variance between counts of gene families for the three ammonia producing phenotypes.

Figure 4.5 Functional profiles using higher level COG functional categories. Background colours to the bars indicate HAPs (red), SAPs (blue) and NAPs (green).

Figure 4.6 Venn diagram showing counts of clusters of orthologous genes unique or common to the different ammonia production phenotypes. HAPs are in red, SAPs in blue and NAPs in green.

Figure 4.7 Principal component analysis (PCA) plotting variance between counts of genes belonging to bacterial orthologous genes (COGs).

Figure 4.8 Stacked graph of amino acid and peptide transporters in the genomes of species with known ammonia production phenotypes.

Figure 5.1 Flow diagram of the steps taken for RNA extraction and transcriptomic analysis. Italics show the section in chapter 2 to refer to for more details.

Figure 5.2 Mean quality Phred scores per base in reads from all samples. Made with MultiQC (Ewels et al., 2016).

Figure 5.3 Stacked graph for the expression of genes belonging to different functional categories, expressed as a percentage. Each functional category is coloured according to the legend on the right. Normalised expression counts were used with the count for each gene in a functional category summed and expressed as a total of all expression counts. The three species to the left of the dotted line are HAPs, to the right are NAPs.

Figure 5.4 Tukey style box and whisker plots of percentage of reads aligning to coding sequences with a particular function. The boxplots summarise the data within each phenotype, showing the median, first and third quartiles in the box, and 1.5 times the interquartile range above and below the median in the whiskers. The points are the replicates for each species, coloured according to the legend on the right. Black points are outliers. The functional identifiers are J - Translation, ribosomal structure and biogenesis; K - Transcription; L - Replication, recombination and repair; D - Cell cycle control, cell division, chromosome partitioning; V - Defence mechanisms; T - Signal transduction mechanisms; M - Cell wall/membrane/envelope biogenesis; N - Cell motility; U - Intracellular trafficking, secretion, and vesicular transport; O - Posttranslational modification, protein turnover, chaperones; C - Energy production and conversion; G - Carbohydrate transport and metabolism; E - Amino acid transport and metabolism; F - Nucleotide transport and metabolism; H - Coenzyme transport and metabolism; I - Lipid transport and metabolism; P - Inorganic ion transport and metabolism; Q - Secondary metabolites biosynthesis, transport and catabolism; S - Function unknown.

Figure 5.5 Tukey style box and whisker plots of significantly different expression of eggNOG functional orthologous groups. The boxplots summarise data within each phenotype, showing the median, first and third quartiles in the box, and 1.5 times the interquartile range above and below the median in the whiskers. The points are the replicates for each species, coloured according to the legend on the right. Data used for the boxplots and the points use DESeq2 normalised counts. (A) shows only boxplots for COG gene families that are significantly increased in HAPs, whereas (B) shows all but two gene families that are higher in NAPs. Species are indicated in the legend; 12662 – A. sticklandii ATCC12662; 49906 – C. aminophilum ATCC49906; Isol6 – E. pyruvativorans Isol6; S85 – F. succinogenes S85; 007c – R. flavefaciens 007c; SY3 – R. albus SY3...... 159

Figure 5.6 Tukey style box and whisker plots of significantly different expression of KEGG orthologous groups. The boxplots summarise data within each phenotype, showing the median, first and third quartiles in the box, and 1.5 times the interquartile range above and below the median in the whiskers. Data used for the boxplots and the points are replicates of the species based on DESeq2 normalised counts. (A) shows only boxplots for KO families that are significantly increased in HAPs, whereas (B) shows all but three gene families that are higher in NAPs. Species are indicated in the legend; 12662 – A. sticklandii ATCC12662; 49906 – C. aminophilum ATCC49906; Isol6 – E. pyruvativorans Isol6; S85 – F. succinogenes S85; 007c – R. flavefaciens 007c; SY3 – R. albus SY3...... 168

Figure 5.7 Stacked graph of transcripts per million totals for amino acid and peptide transporter families. The TPM values were totalled for all orthologs belonging to the same transporter family...... 170

Figure 5.8 Boxplots for each of the transporter families. HAP species are in shades of red and NAPs in shades of green, according to the legend on the right. The P-value for each transporter family is the comparison of HAPs and NAPs using a Wilcoxon rank sum test. Here, zero expression equates to absence of ortholog...... 171

Figure 5.9 An illustration of the possible interactions between two reactions which may be the pinnacle of the HAP phenotype...... 175

Figure 6.1 Transmission Electron Micrographs of viral particles from Rumen Fluid samples. A-D are from cow rumen fluid, whereas E-H were from sheep. Scale bars with magnification are provided on each image. Varying families of phages can be seen, such as the typical contractile tails of Myoviridae in B, E and F, the longer non-contractile tails of Siphoviridae in C, D, G and H, and possible Podoviridae with short tails or one of the non-tailed phage families in A. The long structure visible in G could be a tail, or a filamentous phage or plant virus...... 181

Figure 6.2 Clostridium aminophilum streaked on different agar. Left is Hobson’s M2 2 % agar medium, and right is reinforced clostridial medium (RCM) with 1.5 % casamino acids...... 183

Figure 6.3 Flow diagrams showing the tests carried out for these four bacterial species. HAP = hyper ammonia producer, NAP = no ammonia producer, AOL = Area of lysis, area of clearing in the lawn, SRF = sheep rumen fluid, SF = sheep faecal sample, CRF = cow rumen fluid, CF = cow faecal sample. See methods section 2.11.7 for information regarding supplements and alternative growth media...... 184

Figure 6.4 Flow diagram of steps taken to screen rumen-associated samples and isolate phages...... 185

Figure 6.5 Plaque morphologies of phages isolated against Butyrivibrio fibrisolvens DSM3071. (A) Phage D, (B) Phage M, (C) Phage P, (D) Phage C, (E) Phage J...... 186

Figure 6.6 Transmission electron micrographs of lytic Butyrivibrio phages. (A) Phage C, (B) Phage D, (C) Phage J, (D) Phage P, (E) Phage M. Scale bars are labelled...... 187

Figure 6.7 Mauve alignment of the five Butyrivibrio phages. Colinear blocks are colour-coded, with the similarity graph plotted within the block and lines joining similar blocks across genomes...... 190

Figure 6.8 Analysis of the Butyrivibrio phage Arian and Bo-Finn genomes. (A) Mauve alignments of the two genomes showing annotated ORFs, with directionality and function according to colour. Nucleotide similarity is expressed as a graph, with green indicating 100 %, yellow >30 %, red <30 %. (B) The organisation of the tail morphogenesis module in phage Arian. Uncharacterised proteins indicate where the protein was homologous to one reported previously, but had an unknown function, whereas hypothetical proteins did not show similarity above the set thresholds to any known proteins. Phage encoded proteins are those whose function was uncertain...... 193

Figure 6.9 Genome map of Butyrivibrio phage Arian. GC content is shown in blue and AT in green in the centre. Open reading frames are colour coded based on function, according to the legend. Uncharacterised proteins indicate where the protein was homologous to one reported previously, but had an unknown function, whereas hypothetical proteins did not show similarity above the set thresholds to any known proteins. Phage encoded proteins are those whose function was uncertain...... 194

Figure 6.10 Genome map of Butyrivibrio phage Bo-Finn. GC content is shown in blue and AT in green in the centre. Open reading frames are colour coded based on function, according to the legend. Uncharacterised proteins indicate where the protein was homologous to one reported previously, but had an unknown function, whereas hypothetical proteins did not show similarity above the set thresholds to any known proteins. Phage encoded proteins are those whose function was uncertain...... 195

Figure 6.11 Analysis of the Butyrivibrio phage Arawn and Idris genomes. (A) Mauve alignments of the two genomes showing annotated ORFs, with directionality and function according to colour. Nucleotide similarity is expressed as a graph, with green indicating 100 %, yellow >30 %, red <30 %. (B) The organisation of the tail morphogenesis module in phage Arawn. Uncharacterised proteins indicate where the protein was homologous to one reported previously, but had an unknown function, whereas hypothetical proteins did not show similarity above the set thresholds to any known proteins. Phage encoded proteins are those whose function was uncertain...... 197

Figure 6.12 Genome map of Butyrivibrio phage Arawn. GC content is shown in blue and AT in green in the centre. Open reading frames are colour coded based on function, according to the legend. Uncharacterised proteins indicate where the protein was homologous to one reported previously, but had an unknown function, whereas hypothetical proteins did not show similarity above the set thresholds to any known proteins. Phage encoded proteins are those whose function was uncertain...... 198

Figure 6.13 Genome map of Butyrivibrio phage Idris. GC content is shown in blue and AT in green in the centre. Open reading frames are colour coded based on function, according to the legend. Uncharacterised proteins indicate where the protein was homologous to one reported previously, but had an unknown function, whereas hypothetical proteins did not show similarity above the set thresholds to any known proteins. Phage encoded proteins are those whose function was uncertain...... 199

Figure 6.14 Genome map of Butyrivibrio phage Ceridwen. GC content is shown in blue and AT in green in the centre. Open reading frames are colour coded based on function, according to the legend. Uncharacterised proteins indicate where the protein was homologous to one reported previously, but had an unknown function, whereas hypothetical proteins did not show similarity above the set thresholds to any known proteins. Phage encoded proteins are those whose function was uncertain...... 201

Figure 6.15 The results of the ViPTree analysis (Nishimura et al., 2017) using a protein distance metric based on normalised tBLASTx scores plotted on a log scale. The tree includes 358 related taxa, with the five Butyrivibrio phages in this study highlighted with red stars and labelled with arrows. Taxa are also annotated with the virus family they belong to (if known) on the inner ring; red for Siphoviridae, lime green for Myoviridae and turquoise for Podoviridae. The phyla the bacterial hosts belong to for each taxon are given in the outer ring; Firmicutes in blue and Gammaproteobacteria in green...... 204

Figure 7.1 “Type-J planetary motion of a multilayer coil separation column. The column holder rotates about its own axis and revolves around the centrifuge axis at the same angular velocity (ω) in the same direction. This planetary motion prevents twisting the bundle of flow tubes allowing continuous elution through a rotating column without the risk of leakage and contamination.” Figure and caption from (Ito, 2005) ...... 209

Figure 7.2 Schematic diagram of the instrumental set up. The green highlights the route of the lower phase in the reverse phase through the semi-prep column. In this study, only the mobile phase was pumped into the semi-prep column (no upper or lower phases were used), and both reverse phase (RP) and normal phase (NP) were used. In later runs, the phage sample was injected inline after the pump, bypassing the valve box and normal sample injection port, as shown by the red arrow...... 210

Figure 7.3 Graphs showing fractionation of different nano and microparticles (Fedotov et al., 2015). (A) shows elution peaks for silica standards of three sizes; 150 nm, 290 nm and 900 nm, which concur with the change in flow rate. (B) shows elution peaks for urban street dust particles of a heterogeneous mixture of particles within a range of sizes...... 213

Figure 7.4 Previously published electron micrographs of ϕX174 and T4 bacteriophages to illustrate size and shape. Left - ϕX174 phage stained using a uranyl-acetate based ss method (Yazaki, 1981). Right - T4 Phage stained with sodium molybdate at 333,000 x magnification (Bradley, 1963)...... 213

Figure 7.5 Flowchart of processes detailing the runs, the conditions, and the aims. The figures these results produced are in italics on the right...... 216

Figure 7.6 Chromatograms from T4, 6 ml/min, reverse phase. Both chromatograms show the UV absorbance plotted over time, as well as the HPCCC speed and temperature. The bottom chromatogram also overlays the fractions collected in red, with the blue rectangle showing the fractions around the peak that were tested. Channels A = 210 nm, B = 254 nm, C = 280 nm, D = 366 nm...... 218

Figure 7.7 Fractions of 1 ml T4 sample applied to CCC in reverse phase at 6 ml/min. Areas of lysis visible on a lawn of E. coli DSM613 for fractions as labelled. The T4 sample before application is also tested, alongside FB and distilled water (dH2O)...... 218

Figure 7.8 Concentration gradients of the fractions of 1 ml T4 sample applied to CCC at 6 ml/min. Areas of lysis in the form of whole spots, confluent spots or single plaques visible on a bacterial lawn of E. coli DSM613. Labels of the fraction are in black on the left of the plates, and dilution factor in white along the bottom. FB is applied to the bottom row on both plates as a control...... 219

Figure 7.9 Spot assay of propagated T4 samples from fractions. Three fractions were subsampled and tested for phage to check subsequent lysis, indicative of phage activity. FB – Fortier buffer...... 220

Figure 7.10 Chromatograms from ϕX174, 6 ml/min, reverse phase. Both chromatograms show the UV absorbance plotted over time for four wavelengths, as well as the HPCCC speed and temperature. The bottom chromatogram also overlays the fractions collected, with the blue rectangle showing the fractions around the peak that were tested...... 222

Figure 7.11 Fractions of 1 ml ϕX174 sample applied to CCC in reverse phase at 6 ml/min. Areas of lysis visible on a lawn of E. coli DSM13127 for fractions as labelled. The ϕX174 sample before application is also tested, alongside Fortier buffer (FB) and distilled water (dH2O)...... 222

Figure 7.12 Concentration gradients of the fractions of 1 ml ϕX174 sample applied to CCC in reverse phase at 6 ml/min. Areas of lysis in the form of whole spots, confluent spots or single plaques visible on a bacterial lawn of E. coli DSM13127. Labels of the fraction are in black on the left of the plates, and dilution factor in white along the bottom. Sample 20 was spotted incorrectly (human error) and is labelled to correct for this. Fortier Buffer (FB) is applied to the bottom row on both plates as a control...... 223

Figure 7.13 Spot assay of propagated ϕX174 samples from fractions. Three fractions were subsampled from the initial spot test plate and tested for phage to check subsequent lysis, indicative of phage activity. FB – Fortier buffer, FB-C – contaminated FB spot, experimenter error...... 224

Figure 7.14 Using wave scans to determine whether phages cause peaks in absorbance. A and B are the spectra for water and Fortier Buffer, blanked against water. C-F are the spectra for bacterial eluates and the phage eluates blanked against the buffer...... 225

Figure 7.15 Spot tests on square plates with 10 μl from fractions produced in run 2(A-C) spotted onto soft agar overlays. The bacterium used was E. coli B (DSM613) and the phage loaded on the column was a T4 preparation. The first plate (A) was the result of a run using 3 ml/min flow rate, reverse phase and 0.5 ml of sample, the second plate (B) the same conditions except with normal phase, and the third plate (C) reverse phase with a flow rate of 1.5 ml/min...... 227

Figure 7.16 Chromatogram from 0.1 ml T4 at 3 ml/min in reverse phase after sodium hydroxide treatment. Both chromatograms show the UV absorbance plotted over time at four wavelengths, as well as the HPCCC speed and temperature. The bottom chromatogram also overlays the fractions collected, with asterisks above and the blue rectangle around the peak showing those fractions that were tested. Note that the line for 210 nm is not visible in this second chromatogram, but the peak is still apparent...... 228

Figure 7.17 Spot test of fractions of 0.1 ml T4 at 3 ml/min. The numbers correspond to 10 μl of that fraction spotted onto the bacterial host E. coli DSM613. IN – a sample of the mobile phase (water) entering the column, OUT – a sample of the mobile phase from the column after NaOH treatment. F1 – column flushed after the revolutions were stopped. F2 – same as F1, but after ~100 ml had been flushed. FB – Fortier Buffer. T4 – the initial T4 sample applied to the column...... 229

Appendix Figure 10.1 Growth curves of HAP, NAP and SAP cultures, all grown in the same conditions. Multiple lines and different symbols indicate multiple replicates. Some slow growing cultures were done in two parts, to capture growth across a 24 or 48 hour period as required...... 259

Appendix Figure 10.2 (On previous page) Tukey style box and whisker plots of significantly different expression of 268 eggNOG functional orthologous groups that were manually assessed for significant differences between HAPs and NAPs. The boxplots summarise data within each phenotype, showing the median, first and third quartiles in the box, and 1.5 times the interquartile range above and below the median in the whiskers. The points are the replicates for each species, coloured according to the legend on the right. Data used for the boxplots and the points use DESeq2 normalised counts. Species are indicated in the legend; 12662 – A. sticklandii ATCC12662; 49906 – C. aminophilum ATCC49906; Isol6 – E. pyruvativorans Isol6; S85 – F. succinogenes S85; 007c – R. flavefaciens 007c; SY3 – R. albus SY3...... 272

Appendix Figure 10.3 Phylogenomic tree from VICTOR analysis. Phylogenomic genome-BLAST distance phylogeny method, using the formula D0, yielding average support of 78 %. The numbers below the branches are GBDP pseudo-bootstrap support values from 100 replications. The branch lengths of the resulting VICTOR trees are scaled in terms of the respective distance formula used (Meier-Kolthoff and Göker, 2017; Meier-Kolthoff et al., 2013). The tree was visualized in iTOL (Letunic and Bork, 2007). Matching colours represent a shared species, genus or family, respectively...... 276

Appendix Figure 10.4 Metabolism pathway map showing the pathways related to the orthologous functional groups found to show significantly different expression in HAPs (in red) and NAPs (in green). Pathway made using iPath3.0 (Darzi et al., 2018)...... 278

Appendix Figure 10.5 Metabolism pathway map showing the pathways related to the KEGG orthologous gene groups that showed significantly different expression in HAPs (in red) and NAPs (in green). Pathway made using iPath3.0 (Darzi et al., 2018)...... 279

VIII. Table Legends

Table 1.1 Average expected population size of microorganisms in the rumen. ([1] (Lean et al., 2014) and the references therein, [2] (Klieve and Swain, 1993)) ...... 36

Table 1.2 Summary of hyper ammonia producing bacteria characteristics in the rumen. Information from figure in (Wallace, 1996)...... 47

Table 2.1 Universal 16S primers used to amplify the V6-V8 region of rRNA...... 66

Table 2.2 EGN parameters. Parameters a user can define during the edge creation step of EGN (Halary, 2012)...... 78

Table 2.3 List of amino acid and peptide transporters of interest with their transporter codes...... 79

Table 3.1 List of species that underwent revival from frozen stocks which were stored at either -20°C¹ or -80°C². ³Both of these are the same strain but were obtained from different culture collections at different times. ⁴Genomes of these strains were sequenced. ⁵These species underwent further characterisation...... 89

Table 3.2 Overview of the Hyper Ammonia Producing bacteria phenotypes. Nitrogen source and amino acid preference is determined by those that are depleted most in media or result in more ammonia production...... 99

Table 3.3 Statistics on the sequenced genomes using QUAST from MicrobesNG...... 101

Table 3.4 Results from MicrobesNG taxonomic identification analysis. As a part of the sequencing service, Kraken is used to map reads from samples to known bacterial families or genera; the most frequent taxon is reported below. Samples were arbitrarily labelled but include strain information. Results can be found here: https://microbesng.uk/portal/projects/53009323-2238-484E-99C0-53955A370129/...... 103

Table 3.5 BLAST results of sequenced 16S rRNA gene from cultures. The name of the culture is in the first column, followed by direction (Dir); forwards (F) or reverse (R). Subject ID is the sequence in the database that the query sequence had the highest match to, with percentage identity in the adjacent column. Those highlighted in grey did not match the strain expected. Some queries hit more than one sequence in the database equally well, and both are reported here where this was the case. ¹PCR products submitted for sequencing, not excised gel band...... 105

Table 3.6 Volatile fatty acid (VFA) production from bacterial cultures. Values are means of six individual replicates with the standard deviation shown. Kruskal-Wallis tests with Bonferroni correction were completed for each column. Shared letters show no significant difference (P>0.05)...... 110

Table 5.1 Information and statistics pertaining to the quality control of the transcriptomic data...... 148

Table 5.2 Functional descriptions and identifiers for the orthologous functional groups found to show significantly different expression in HAPs and NAPs. Those families in red are more expressed in the HAPs, and those in green are most expressed in the NAPs. The functional identifiers are C - Energy production and conversion; E - Amino acid transport and metabolism; F - Nucleotide transport and metabolism; D - Cell cycle control, cell division, chromosome partitioning; K - Transcription; O - Posttranslational modification, protein turnover, chaperones; H - Coenzyme transport and metabolism; T - Signal transduction mechanisms; J - Translation, ribosomal structure and biogenesis; U - Intracellular trafficking, secretion, and vesicular transport; L - Replication, recombination and repair; I - Lipid transport and metabolism; P - Inorganic ion transport and metabolism; M - Cell wall/membrane/envelope biogenesis; G - Carbohydrate transport and metabolism; N - Cell motility. Information for descriptions obtained from eggNOG 5.0 database (http://eggnog5.embl.de/download/eggnog_5.0/per_tax_level/2/2_annotations.tsv.gz, accessed 22/01/2020.) ...... 154

Table 5.3 Table of the average transcripts per million (TPM) counts for the four replicates each for the three HAP species, including standard deviation...... 160

Table 5.4 Transcripts per million (TPM) averages across four replicates for each species, with standard deviation for the seven KOs that were unique to HAPs, or common to HAPs and SAPs but not NAPs. Descriptions as per the KEGG database ((Kanehisa et al., 2016), www.genome.jp//; accessed 11/02/2020)...... 162

Table 5.5 Table of KEGG orthologous gene groups that showed significantly different expression in HAPs and NAPs. Numbers denote the mean and standard deviation of DESeq2 normalised count data, and the letters show where significant differences lie; different letters show a significant difference between these species. Those gene families highlighted in red are where the median is greater in HAPs compared to NAPs, whereas those highlighted in green are greater in NAPs compared to HAPs...... 162

Table 5.6 List of the descriptions from KEGG for the KEGG orthologs (KOs) found to have a significant increase in expression in HAPs compared to NAPs. Enzyme Commission (EC) number given if ortholog is an enzyme...... 165

Table 6.1 Phage morphologies and characteristics. Source is the phage filtrate that yielded the plaque. *Mixture is a combination of subsamples taken from areas of lysis from spots of sheep and cow rumen fluid, and a cow faecal sample. Plaque morphology on the same host with the same growth conditions. Head and tail size approximate, as measured from two or three TEM images...... 186

Table 6.2 Summary of phage sample and its source, the genomes constructed, coverage and allocated phage name. *Mixture – plaques from one sample of cow faeces, cow rumen fluid and sheep rumen fluid were combined and tested again to produce the plaques from which these phages were isolated...... 189

Table 6.3 Genome statistics summary. *As used previously (Gilbert et al., 2017). N/A indicates that no single phage genome shared more than one best hit homologous protein...... 191

Table 7.1 Tabular format of the results of the T4 spot test with dilutions of fractions, as seen in Figure 7.8...... 219

Table 7.2 Tabular format of the results of the ϕX174 spot test with dilutions of fractions, as seen in Figure 7.12...... 223

Appendix Table 10.1 Presence of characteristic genes from Acetoanaerobium sticklandii DSM519 in the HAPs, NAPs and SAPs. + indicates presence, based on BLASTp search with over 30 % identity over at least 80 % of the length of the A. sticklandii DSM519 gene. Characteristic gene products listed in (Fonknechten et al., 2010)...... 260

Appendix Table 10.2 Table of 40 universal single-copy gene families. Table information taken from (Creevey et al., 2011)...... 264

Appendix Table 10.3 Number of Coding Sequences (CDS) and those which were not expressed...... 265

Appendix Table 10.4 (On next page) Functional orthologous gene groups that showed significant difference in read counts between the species. Numbers denote the mean and standard deviation of DESeq2 normalised count data, and the letters show where significant differences lie; different letters show a significant difference between these species. Those gene families above the bold line are where the median is greater in HAPs compared to NAPs. The functional identifiers are C - Energy production and conversion; E - Amino acid transport and metabolism; F - Nucleotide transport and metabolism; D - Cell cycle control, cell division, chromosome partitioning; K - Transcription; O - Posttranslational modification, protein turnover, chaperones; H - Coenzyme transport and metabolism; T - Signal transduction mechanisms; J - Translation, ribosomal structure and biogenesis; U - Intracellular trafficking, secretion, and vesicular transport; L - Replication, recombination and repair; I - Lipid transport and metabolism; P - Inorganic ion transport and metabolism; M - Cell wall/membrane/envelope biogenesis; G - Carbohydrate transport and metabolism; N - Cell motility. 12662 – A. sticklandii ATCC12662; 49906 – C. aminophilum ATCC49906; Isol6 – E. pyruvativorans Isol6; S85 – F. succinogenes S85; 007c – R. flavefaciens 007c; SY3 – R. albus SY3...... 272

Appendix Table 10.5 Summary table of codon usage statistics for the bacterial host Butyrivibrio fibrisolvens DSM3071 and the five phage genomes. The ones highlighted correspond to the amino acid for which a tRNA was found in the genomes of phage Arian and Bo-Finn. (AA - Amino Acid, freq - frequency) ...... 275

Appendix Table 10.6 Summary table of the output from VICTOR. The genomes with the same numbers indicate that these belong to the same species, genus or family. This information is reflected in the tree in Appendix Figure 10.3...... 277

IX. Abbreviations

AF4 – asymmetrical flow field-flow fractionation
Amt – ammonium transporter
ANI – average nucleotide identity
AOL – area of lysis; an area of clearing observed on a lawn of bacterial cells
ATCC – American Type Culture Collection
ATP – adenosine triphosphate
BAM – binary alignment map file format
BED – browser extensible data file format
CCC – countercurrent chromatography
CDS – coding sequence
CF – cow faecal sample
CF-HAPs – carbohydrate fermenting hyper ammonia producers
COG – cluster of orthologous groups
CRF – cow rumen fluid
DSMZ – Deutsche Sammlung von Mikroorganismen und Zellkulturen (German Collection of Microorganisms and Cell Cultures)
EGN – Evolutionary Genome Network
FAO – Food and Agriculture Organisation
FFF – field-flow fractionation
FTIR – Fourier transform infrared spectroscopy
g – gravity or g-force
gff – general feature format
GHG – greenhouse gas
h – hour(s)
HAP – hyper ammonia producer
HSD – Tukey’s honestly significant difference post hoc statistical test
HP-FFAP – free fatty acid phase
iTOL – The Interactive Tree Of Life, available at: http://itol.embl.de/
KEGG – Kyoto Encyclopaedia of Genes and Genomes
KO – KEGG orthologous group
LAB – lactic acid bacteria
m – metre(s)
min – minute(s)
ml – millilitre(s)
mm – millimetre(s)
μl – microlitre(s)
μm – micrometre(s)
NCBI – National Centre for Biotechnology Information
OAFs – obligate amino acid fermenters
PCA – principal component analysis
PC1 and PC2 – the first and second principal components
SAM – sequence alignment/map format file
sdFFF – sedimentation field-flow fractionation
SF – sheep faecal sample
SRF – sheep rumen fluid
VFA – volatile fatty acid
°C – degree(s) Celsius

X. Abstract

Greenhouse gas emissions and feed efficiency in ruminant livestock are pertinent and important topics, ones which have not suffered from lack of attention as ample research has endeavoured to further our understanding of the complex rumen microbial ecosystem and its residents that contribute to enteric fermentation. Despite this, some populations remain less well-characterised.

This is the key motivation behind the studies herein, which contribute to the understanding of the niche bacterial population of hyper ammonia producers (HAPs) and of bacteriophages (viruses that infect bacteria) against predominant rumen bacteria. Chapter 3 introduces HAP species and their ability to degrade amino acids and peptides for energy, a wasteful process that produces hydrogen, carbon dioxide, and excessive amounts of ammonia. The hydrogen and carbon dioxide feed into methane production (methanogenesis) by archaea, whilst excess ammonia is removed from the animal host through the urine. The use of amino acids and production of excess waste products make the HAPs a target for potential population control, but it was first imperative to better understand this population. Suitable model bacterial species were identified for use in this study in chapter 3, and in vitro testing was carried out to confirm and define the ammonia producing phenotype of a total of nine species: three HAPs, which produced significantly higher amounts of ammonia; three “some ammonia producers” (SAPs), which showed no difference in ammonia compared to the growth medium (control); and three “no ammonia producers” (NAPs), which measured significantly less ammonia than the control, suggesting usage. Some separation of the three groups was seen using metabolome fingerprinting, and there were different volatile fatty acid production profiles, which together suggested that these three groups occupy different metabolic niches in the rumen and are therefore suitable for use in comparative genomic approaches.

The genomes of these nine species were then used in chapter 4 to search for genomic signatures unique to the HAPs, which would allow other novel HAP bacteria to be identified in the rumen and further afield. However, it was found that the HAPs are not all closely related and do not occupy a monophyletic clade, nor do the genomes share unique gene sequences or functions that suggest horizontal gene transfer or convergent evolution of the HAP phenotype. Without a unique genomic signature, it was hypothesised that HAPs utilise metabolic pathways similar to those of other bacteria but, because they rely on deamination for energy, must utilise these pathways more, something which could be reflected in the transcriptome.

In chapter 5, a comparison of the transcriptomes of HAPs and NAPs grown in the same medium revealed that HAP species generally dedicated a larger proportion of transcripts to energy production and conversion, and to amino acid transport or metabolism. Furthermore, a combination of genomic and transcriptomic evidence indicated the importance of glutamate and aspartate in HAP species. Significantly greater expression of glutamate dehydrogenase, aspartate aminotransferase and associated transporters in the HAPs provided evidence that HAPs could be cycling these amino acids to produce oxaloacetate, which can then enter the tricarboxylic acid cycle to produce ATP. In the process of oxaloacetate production, hydrogen and ammonia are produced, some of the defining features of the HAPs.

Despite being abundant in the rumen, bacteriophages have received less attention than other populations. To date there are only five genomes of lytic phages isolated from rumen-associated samples that target rumen bacteria. Whilst the primary aim of chapter 6 was to isolate phages against HAPs, other predominant bacterial species were also used as hosts for novel phage isolation from fresh faecal and rumen fluid samples from cows and sheep. Five Butyrivibrio phages were successfully isolated, sequenced, and characterised, and were found to belong to four unique species, three unique genera and the same viral family. They shared no sequence similarity to any other phage genomes deposited in public databases. Whilst they were only observed undergoing the lytic cycle, there was genomic evidence that two of the candidates could be temperate phages. Genomic analysis combined with TEM indicates that these five phages likely belong to the Siphoviridae phage family. These five novel genomes are a small but important contribution, as increasing the number of rumen phage representatives in databases is necessary to aid understanding and annotation of viral sequences in future metagenomic and metatranscriptomic studies, primarily in the rumen, but also further afield.

To explore alternative methods for phage separation, with the ultimate aim of obtaining phage fractions from crude rumen fluid, countercurrent chromatography (CCC) was employed as described in chapter 7. Applications of Enterobacteria phages T4 and ϕX174 to a CCC column revealed that these phages can survive the repeated acceleration forces of centrifugation and be eluted from the column intact and active. However, further work is needed to bring this method to fruition, applying the full potential of CCC and biphasic liquid separation to achieve distinct phage fractions, as a novel and scalable method for phage separation and purification.

Chapter 1

1. Background and Introduction

1.1. Ruminants and their Importance

Ruminants are a group of animals classified by the structure and function of their digestive system. Although they were originally grouped based on their four-compartment stomach and their ability to ruminate (rechew cud), their inherent ability to take advantage of microbial fermentation for digestion of fibrous feeds unsuitable for human consumption is what sets these animals apart (Russell, 2002). This group includes smaller ruminants such as sheep, goats and deer, and larger animals such as cattle, camels, and giraffes. As of 2018, the UK alone had around 9.9 million cattle and 33.8 million sheep and goats¹.

Ruminants were and continue to be important to human livelihood. They offer a source of food through milk production and meat from slaughter, fibres from wool, skins and hides for clothes and bedding, and manure for crop fertilisation; all of which are respected commodities which also give livestock farmers their living (FAO, 2016b). Because of the importance of ruminants, selected traits in individuals such as disease resistance, improved nutrition and increased output have been brought about through breeding and research (FAO, 2016b). Ruminants will also play an important role in the future, forming a target for the Food and Agriculture Organization (FAO) of the United Nations in meeting the projected 50 % increase in food and agriculture requirements between 2012 and 2050 (FAO, 2017). The global cattle population is projected to reach 2.6 billion by 2050, alongside 2.7 billion goats and sheep (Robinson et al., 2011). Given their global importance, studying ruminants is both worthwhile and necessary.

1.2. Structure and Function of the Ruminant Stomach

In order to study ruminants from the point of view of improving feed efficiency, it is important to first understand how the structure of the digestive tract relates to its function. The stomach consists of four distinct chambers. The rumen, reticulum and omasum form the forestomach. This is connected to the abomasum, which may also be referred to as the ‘true stomach’ as it is the first glandular stomach, resembling that of other mammals, and secretes its own enzymes and hydrochloric acid (Frandson et al., 2009; Agarwal et al., 2015). The stomach makes up around 55 % of the gastrointestinal tract in ruminants (Agarwal et al., 2015). A generalised diagram of the ruminant stomach can be seen in Figure 1.1. The exact morphology and absolute size of the stomach vary depending on the animal, the type of feed available in its habitat and its foraging behaviour. This is especially visible when comparing smaller ruminants to larger ones (Hofmann, 1989). Despite this, the organisation and function are the same, with the ultimate aim of digesting fibrous plant matter to obtain nutrients. The ruminant digestive system is therefore a prime example of successful evolution to improve nutrient absorption from feed. The act of rumination, coupled with a method of sorting larger ingesta for further mastication and the fermentative nature of the rumen, has resulted in a successful adaptation to a herbivorous diet (Clauss et al., 2010).

1 FAO. FAOSTAT - Live Animals. Latest update: 04/03/2020. (Accessed: 11/08/2020) http://www.fao.org/faostat/en/#data/QA.

1.2.1. The Rumen and its Specialised Function

The rumen is the largest digestion chamber of the stomach, comprising 80 % of the total capacity (Agarwal et al., 2015). It is the primary site of fermentation and digestion: ingested food travels down the oesophagus and enters the rumen, which is specialised for this function in several ways. The rumen maintains a homeostatic environment that is anaerobic, with a carefully regulated temperature (~39°C) and a constant pH (~6.9) buffered by saliva (Russell, 2002). It is also stratified, with three major layers of separation seen in grazing ruminants. At the bottom is the fluid layer, containing microbes, enzymes and nutrients. On top of this floats the fibre mat, which is comprised of the feed in various stages of breakdown and the associated colonising microbes, with gases collecting in the gas dome at the top of the rumen (Tschuor and Clauss, 2008). The ruminal sacs, separated by pillars, contract in a coordinated manner, allowing for efficient movement of ingesta. This is important, as it not only serves as mechanical digestion to break down food particles, but also allows the liquid phase, which is a mixture of saliva and ruminal fluid rich in microorganisms and enzymes, to make better contact with food particles (Russell, 2002). The rumen also ensures that the dry mass of ingesta is hydrated and broken up into smaller particles before allowing entrance to the reticulum. The gas dome is where gases produced during fermentation collect before being passed back through the oesophagus and eructed (belched) from the animal (Welch, 1982; Russell, 2002). To improve feed efficiency, it is important to understand not only the anatomy and physiology of the ruminant, but also the process of digestion.


Figure 1.1 A diagram showing the organisation of the chambers of the ruminant stomach. The first and largest of the four chambers is the rumen, which is further divided into four sections: the dorsal sac (DSR), dorsal blind sac (DBS), ventral sac (VSR) and ventral blind sac (VBS). The cranial sac (CSR) then links the rumen to the reticulum (RET). This is connected to the omasum (OM) by the reticulo-omasal orifice (ROO). The final chamber is the abomasum (ABO), which then leads to the duodenum (D) (adapted from (Wu and Papas, 1997)).

1.3. Digestion in Ruminants

1.3.1. Feed and Diet

The majority of domesticated ruminants are grazers, with a diet formed of grasses (Hackmann and Spain, 2010). The digestion of fibrous material from the herbivore diet is made possible through fermentation by the microbial population in the host gut. Energy for the majority of the microbes in the rumen is obtained through enzymatic fermentation of carbohydrates from plant material (Lean et al., 2014). This fermentation ultimately also results in volatile fatty acids (VFAs), carbon dioxide and methane. The VFAs are absorbed and utilised by the animal host, whereas the methane is removed from the body through eructation (belching). The type of diet, quantity of feed ingested and the microbial population affect not only the nutrition absorbed but also the amount of methane produced (Wright and Klieve, 2011). As such, sudden changes in feed or diet may influence the efficiency of this nutritional absorption (Lean et al., 2014). It seems that an optimal diet for ruminants is best designed to feed the rumen rather than the whole animal, as most of the important nutrients are obtained through microbial action.

1.3.2. Mechanical Digestion

Large ingested feed particles are broken down into smaller particles by ruminal peristalsis and rumination. Rumination is the regurgitation of fibrous ingesta (or “cud”) up the oesophagus from the rumen, followed by another cycle of mastication, mixing with saliva and swallowing (Welch, 1982). Saliva production is continuous and secretion is proportional to feed intake (Silanikove and Tadmor, 1989). Saliva acts as a lubricant for feed and digesta during mastication and forms the liquid medium in the rumen, as the rumen does not produce this itself. Saliva contains carbonates and urea and is therefore alkaline, acting as a buffer and neutralising acids produced during fermentation (McDougall, 1948). Grazing ruminants can spend up to 10 hours a day ruminating food (Russell, 2002). However, rumination time can be influenced by a number of external factors, such as the feed size, feeding time, stress and climate (Herskin et al., 2004; Schirmann et al., 2012; Soriani et al., 2013; Braun et al., 2015). This process is important in breaking down food particles to increase their surface area, giving the microorganisms better access to begin fermentation.

1.3.3. Fermentation

Herbivores require anaerobic fermentation and degradation by microorganisms to allow fibrous feeds to be broken down for nutrients. Cellulose, hemicellulose, starch, other carbohydrates and proteins are all digested by this microbial community (Flint, 1997). For fermentation to be successful, the environment is kept anaerobic, at a temperature between 38 °C and 41 °C and a pH ranging from 5.5 to 6.9 (Choudhury et al., 2015). The microbial activity helps maintain the anaerobic environment, the animal host controls the temperature and the pH is maintained by saliva, which contains buffering agents such as bicarbonate and phosphate (Wolin, 1981). To digest the lignocellulose-rich plant material, it is necessary to have a symbiotic relationship with the microbial community, to allow the host access to important proteins and VFAs, whilst providing ample nutrient sources for the microbes (Wolin, 1981; Newbold and Ramos-Morales, 2020). VFAs are the main source of energy for the host, and are transported into the blood stream from the rumen, along with other essential nutrients (Wolin, 1981; Nafikov and Beitz, 2007; Janssen, 2010). It has been possible to resolve some of the metabolic pathways undertaken in the rumen by deriving the products of ruminal microbial fermentation, either through detecting and measuring metabolite fluctuations in culture, or by inference from known metabolic pathways in other ecosystems (Seshadri et al., 2018). A simple overview of the important fermentation reactions that occur in the rumen can be seen in Figure 1.2.


Figure 1.2 Simplified overview of rumen fermentation (adapted from (Place et al., 2011)).

Figure 1.3 Schematic showing carbohydrate fermentation pathways. These are the main pathways employed by ruminal microbes to ferment the different carbohydrates available. Dotted lines show reactions carried out by other microbes (adapted from (Russell and Rychlik, 2001)).


1.3.4. Carbohydrate Metabolism

With the help of the microbiome, low quality, carbohydrate-rich lignocellulosic plant material inedible to humans can be effectively digested, producing short chain fatty acids which are used for energy in the ruminant host (Seshadri et al., 2018). Most of the dietary carbohydrates from ingested feed are in the form of cellulose and starch in the plant cells and are fermented by the microbes in the rumen. Only 5-20 % of dietary carbohydrates are digested by the ruminant host in the intestines, and this supplies only up to 10 % of the glucose requirement of the animal host (Nafikov and Beitz, 2007). Carbohydrate metabolism is therefore an important function of the rumen and its resident microbes. The Embden-Meyerhof-Parnas pathway of 6-carbon sugar metabolism is the most common glycolytic pathway in the rumen, believed to maximise the yield of energy in the form of adenosine triphosphate (ATP) (Russell and Wallace, 1997). Other pathways for glycolysis also exist in the rumen, but all pathways lead to the same fate: production of VFAs (the most prominent being butyrate, propionate and acetate), carbon dioxide and hydrogen, as seen in Figure 1.3. It is the type and quality of the plant-based feed that determines which carbohydrate fermentation pathways are used, altering the amounts of resulting products (Place et al., 2011). For those pathways that produce the VFAs acetate and butyrate, hydrogen is a by-product of the reaction. This hydrogen is utilised by methanogens present in the rumen to reduce carbon dioxide to methane (Place et al., 2011). The fate of hydrogen in the rumen has been reviewed previously (Ungerfeld, 2020).

1.3.5. Nitrogen Metabolism

Nitrogen is generally found abundantly in the rumen, but levels can vary depending on diet as it is most commonly sourced from protein in the plant matter (Wallace et al., 1997b), and rapid proteolysis occurs within the first few hours after feed ingestion in grazing ruminants (Kingston-Smith and Theodorou, 2000). Rumen degradable protein is subdivided into true protein nitrogen (proteins that can be broken down into peptides and then amino acids, which are then either incorporated into microbial proteins or deaminated to produce ammonia) and non-protein nitrogen (nucleic acids, ammonia, urea, free amino acids and peptide molecules also present in the feed) (Bach et al., 2005). Proteolysis is a process involving plant stress autolysis, damage to plant material during ingestion and mastication, as well as microbial colonisation and degradation (Kingston-Smith et al., 2003a, 2003b; Edwards et al., 2007). From this proteolysis of rumen degradable protein, nitrogen-containing substrates become available to be incorporated into growing microbial cells. These microbes, along with any other undegraded feed, pass from the rumen into the intestines, where they are further digested, releasing microbial protein and a source of nitrogen for the ruminant host (Walker et al., 2005). By allowing protein synthesis by microbes in the rumen from a variety of nitrogen sources and then absorbing microbial protein in the intestines, ruminants are able to efficiently gain reliable access to a protein source (Wallace et al., 1997b). Protein metabolism and catabolism in the rumen are therefore important for host feed efficiency.


Proteolysis in the rumen is complex, with a multitude of organisms, including bacteria, anaerobic fungi and ciliate protozoa, able to participate at different points in the process (Wallace, 1996; Wallace et al., 1997b; Walker et al., 2005) (Figure 1.4). Polypeptides are broken down extracellularly through bacterial enzyme hydrolysis of peptide bonds, and the resulting peptides and free amino acids can then be transported by bacteria into their cells, where these substrates are either incorporated into proteins or further deaminated and degraded, producing ammonia, VFAs and hydrogen or carbon dioxide (Tamminga, 1979; Russell and Martin, 1984). Protein breakdown does not always serve to fulfil the nutrient needs of microbial cells, and under energy-limiting conditions some microbes may utilise peptide and amino acid degradation for energy (Wallace, 1994; Kingston-Smith and Theodorou, 2000). Incorporation of nitrogen compounds into microbial protein can only occur efficiently if there is enough energy otherwise available for this biosynthesis, but if peptide breakdown and deamination exceed the rate at which amino acids and ammonia are utilised, the process becomes inefficient and wasteful to the ruminant host (Wallace, 1996). This is because excess ammonia from these processes can diffuse across the rumen wall into the blood stream of the host, where it is metabolised in the liver to urea and recycled (Tan and Murphy, 2004). Ammonia cannot be used by the ruminant host, and therefore excess ammonia and urea are removed from the body in the urine (Kingston-Smith and Theodorou, 2000). The removal of excess ammonia and urea is not only a loss of nitrogen from the host, but also raises environmental concerns, causing pollution and hazards to health (Walker et al., 2005; Hartinger et al., 2018). Not only does ammonia from excess protein degradation lead to environmental pollution, but the deamination process also involves the production of hydrogen and carbon dioxide.
High levels of hydrogen in the rumen have a negative effect on digestion and nutrient degradation (Janssen, 2010). In response to this, methanogenic archaea act as a hydrogen sink, taking up hydrogen and carbon dioxide and releasing them as methane gas. This is collected in the gas dome of the rumen, and then eructed by the animal (Janssen, 2010). Methane is a greenhouse gas that contributes to global warming; the significance of this is discussed later in this chapter.
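The role of methanogens as a hydrogen sink can be illustrated with a minimal sketch, assuming the textbook stoichiometry of hydrogenotrophic methanogenesis (CO2 + 4 H2 -> CH4 + 2 H2O). The function name and the example quantities are hypothetical and are not taken from the studies cited above; the sketch only shows how much methane a given pool of hydrogen and carbon dioxide could yield if fully consumed.

```python
# Assumed stoichiometry of hydrogenotrophic methanogenesis:
#   CO2 + 4 H2 -> CH4 + 2 H2O
H2_PER_CH4 = 4   # mol H2 consumed per mol CH4 formed
CO2_PER_CH4 = 1  # mol CO2 consumed per mol CH4 formed


def methane_from_hydrogen(h2_mol, co2_mol):
    """Return mol CH4 formed if the limiting substrate is fully consumed.

    Hypothetical helper for illustration only; real rumen methane output
    also depends on competing hydrogen sinks and microbial kinetics.
    """
    if h2_mol < 0 or co2_mol < 0:
        raise ValueError("substrate amounts must be non-negative")
    return min(h2_mol / H2_PER_CH4, co2_mol / CO2_PER_CH4)


if __name__ == "__main__":
    # Illustrative quantities only: 8 mol H2 with excess CO2.
    print(methane_from_hydrogen(8.0, 5.0))
```

Under this assumption, four moles of hydrogen are removed for every mole of methane eructed, which is why reducing hydrogen-producing reactions such as deamination could in principle lower methane output.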

1.3.5.1. Ammonia in the Rumen; Sources, Transport and Fate

There are two main sources of ammonia in the rumen, both of which require microbial input for catabolism and liberation of ammonia. The first is breakdown of exogenous nitrogen-containing compounds from the plant material in the feed, including proteins and their smaller subunits, nucleic acids and derivatives, or urea-containing compounds. The second is from endogenous sources, including hydrolysis of urea, which can be recycled within the host from the liver or saliva into the rumen, or from nitrogen compounds from sloughed host cells (Parker et al., 1995; Huntington and Archibeque, 2000; Tan and Murphy, 2004; Abdoun et al., 2007). There is generally no distinction made between ammonia (NH3) and ammonium (NH4+) in many of the publications relating to the rumen unless chemical symbols are used (Abdoun et al., 2007). As such, the term ammonia has in the past been used interchangeably for both forms. The same applies here, where ammonia is the generic term and chemical symbols are used to differentiate. As NH3 is a weak base, nearly all of it is protonated to NH4+ at the neutral to slightly acidic pH of the rumen environment (Huntington and Archibeque, 2000). However, ammonium is not lipid soluble and therefore will not pass the epithelial lining of the rumen or microbial cell walls without active transport. Bacterial cells that rely on ammonia for biosynthesis have active transporters (such as the ammonium transporter Amt) present in their membranes to facilitate ammonium transport (Pengpeng and Tan, 2013). NH3, on the other hand, is lipid soluble and will diffuse freely down a concentration gradient to move across membranes (Parker et al., 1995; Abdoun et al., 2007).
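The balance between NH3 and NH4+ described above can be estimated with the Henderson-Hasselbalch relationship. The following is a minimal sketch, assuming a pKa of 9.25 for the ammonium ion (the commonly quoted value at 25 °C; the value at rumen temperature will differ somewhat) and using the rumen pH range of 5.5 to 6.9 given earlier in this chapter. The function name is hypothetical.

```python
def nh3_fraction(ph, pka=9.25):
    """Fraction of total ammonia present as un-ionised NH3.

    Derived from the Henderson-Hasselbalch equation:
        pH = pKa + log10([NH3] / [NH4+])
    The default pKa of 9.25 is an assumed value for 25 degrees C.
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    # pH values spanning the rumen range quoted in the fermentation section.
    for ph in (5.5, 6.9):
        print(f"pH {ph}: {nh3_fraction(ph) * 100:.3f} % of total ammonia as NH3")
```

Under these assumptions, well under 1 % of total ammonia is present as freely diffusible NH3 at rumen pH, which is consistent with the statement that nearly all of it exists as NH4+.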

Once inside the cell, ammonia (or NH4+ if actively transported) is combined with α-ketoglutarate to make glutamate, catalysed by glutamate dehydrogenase (GDH). Glutamate then enters the GS-GOGAT pathway, where glutamine synthetase (GS) and glutamate synthase (GOGAT) catalyse the cycle between glutamine and glutamate respectively, both of which are progenitor nitrogen compounds for biosynthesis of oligopeptides (Pengpeng and Tan, 2013; Kim et al., 2014). This is a common pathway shared by many rumen bacteria for ammonia assimilation (Pengpeng and Tan, 2013). As mentioned previously, production of ammonia often exceeds the rumen microbial requirement. In this case, NH3 diffuses across the epithelial wall into the blood stream, and it is this free movement that is the main loss of ammonia from the rumen (Parker et al., 1995). Once the ammonia is in the bloodstream, the liver detoxifies it to urea or glutamine (Parker et al., 1995; Abdoun et al., 2007). In the case of liver damage or excessive ammonia absorption, the animal host experiences hyperammonaemia, which may lead to death (Lobley and Milano, 1997).


Figure 1.4 Schematic diagram showing the fate of key nitrogen sources in the rumen. Dotted circles show the process and dotted arrows show alternative directions. BCVFA – Branched chain volatile fatty acids. ¹ These may also be available in the feed. ² Urea is also removed from the host in the urine (adapted from (Hartinger et al., 2018)).


1.4. The Rumen Microbiome

It has been established in the previous section that without microbes the ruminant would not be able to digest fibrous foods to gain the nutrition it requires. Therefore, when studying rumen function, it is vital to consider the microbiome. Microbial populations and abundances differ between individual ruminants, with variation in diet influencing microbial community composition the most (Henderson et al., 2015). Early studies established that the populations are even time dependent, with noticeable changes before and after feeding (Warner, 1962). That is not to say, however, that there are no key resident rumen microbes common to all ruminants. Indeed, an extensive study on ruminants across the globe revealed a “core” microbiome, made up of certain microbes that were repeatedly detected in samples (Henderson et al., 2015). What variations there are in microbial communities and abundances can be attributed to differences in the climate, diet and farming strategies of the ruminant host (Henderson et al., 2015). The types of microbes present in the rumen and their estimated concentrations are listed in Table 1.1. The microbiome is well established in the rumen environment, making exogenous infection by transient microorganisms present on ingested feed less likely, as these are often outcompeted before they can affect the host (Kamra, 2005).

Table 1.1 Average expected population size of microorganisms in the rumen. ([1] (Lean et al., 2014) and the references therein, [2] (Klieve and Swain, 1993))

Microorganism     Expected population size
Bacteria          10^10 cells/ml [1]
Archaea           10^7 – 10^9 cells/ml [1]
Protozoa          10^5 – 10^6 cells/ml [1]
Fungi             10^3 – 10^5 zoospores/ml [1]
Bacteriophages    10^9 – 10^10 particles/ml [2]

1.4.1. Bacteria

Bacteria are the predominant microbes in the rumen, and make up 50 % of the ruminal cell mass (Creevey et al., 2014). The role of bacteria in the rumen is to break down and ferment fibrous feeds. The bacteria are also a source of microbial protein, where the bacterial cell is degraded in the host abomasum before entering the intestines (Russell, 2002). The main energy resource for most bacteria is carbohydrates, which are found in the plant cells, but other bacterial populations will also target protein and lipid portions of the feed. The absence of oxygen is necessary for survival of the microbial population, as almost all ruminal bacteria are obligate anaerobes (Hungate, 1975).


The microbiome is a complex network of microbes creating an interesting collaboration, where one microbe’s products are often another’s resource. Primary fermenters degrade complex polymers into monomers, which may be further utilised by either the primary fermenters or other microbes, creating a trophic chain (Morgavi et al., 2010). Many bacteria have a variety of abilities and may take part in different functions in the digestion process, some working synergistically with other species whilst some work antagonistically; thus it can be difficult to deconstruct the web of functions (Kamra, 2005). On the other hand, some bacteria are highly specialised, having co-evolved with the ruminant hosts to occupy a niche with specific functions and target substrates (Morgavi et al., 2010). The rumen environment undergoes fluctuations between abundant and scarce resource concentrations, and bacteria have evolved different methods of obtaining energy from the available resources. This diversification of the microbiome allows the rumen to maintain efficiency even during diet changes (Owens and Basalan, 2016). Specialisation and evolution into niches are thought to be brought about by competition for resources (Rubino et al., 2017), but other factors driving this in the rumen microbiome are still an area of current interest (Rubino et al., 2016; Hart et al., 2018). Bacteria are generally grouped based on their primary function in the rumen, which has been derived from pure culture and substrate-based experiments. Some of the key groups are summarised below.

1.4.1.1. Fibrolytic Bacteria

Fibre is formed of structural carbohydrates, such as cellulose, hemicellulose and pectin, present in plant cell walls and is degraded by enzymes released by microbes, predominantly bacteria (Owens and Basalan, 2016) (Figure 1.5). Fibrolytic species adhere to the large surface area created by mastication, and it is here that most fibre degradation takes place. Bacteria belonging to the phylum Firmicutes have been shown to be the predominant fibrolytic species (Nyonyo et al., 2014; Huws et al., 2016). In 1950, Hungate described three main contenders for the title of most active cellulolytic bacteria in the rumen (Hungate, 1950; Russell et al., 2009). Fibrobacter succinogenes is considered to be the bacterium with the highest cellulolytic activity, and is often one of the dominant species found tightly bound to plant fibre in the rumen consistently over time (Huws et al., 2016); along with Ruminococcus flavefaciens and R. albus, Gram positive cocci that utilise hemicellulose as their energy source (Russell et al., 2009). Although these three species are considered to be the major fibrolytic bacteria, others may contribute to this process too. Species belonging to the genera Butyrivibrio and Pseudobutyrivibrio, for example, are highly active xylan utilisers, whilst Prevotella species produce a range of xylan degrading enzymes as part of their repertoire of polysaccharide targeting enzymes, despite not always being considered as belonging to the highly active cellulolytic group (Krause et al., 2003). Most enzymes involved in degradation of plant fibre belong to a class of enzymes called glycosyl hydrolases (GH), which catalyse the hydrolysis of the glycosidic bond present in carbohydrates and derivatives to produce smaller poly- or monosaccharide units (Krause et al., 2003). Metatranscriptomic and metagenomic approaches have revealed the diversity of these enzymes present in the rumen.
A metatranscriptomic approach to determine enzymes and microbial communities involved in plant degradation in a cow fed a mixed diet (replicating a commercial diet) showed transcripts for cellulases which likely originated from Ruminococcus and Fibrobacter bacterial species, whereas enzymes involved in oligosaccharide breakdown could be traced to Prevotella species (Comtet-Marre et al., 2017). These results mostly correspond with patterns seen using metagenomic approaches, where the highest numbers of cellulase enzyme genes annotated in reference genomes of ruminal bacteria belonged to R. flavefaciens AE3010 and Fibrobacter succinogenes S85, but the highest number of genes for xylan degradation enzymes were also seen in R. flavefaciens AE3010, followed by Prevotella ruminicola 23 (Seshadri et al., 2018). These studies show that although the highest enzyme activities seem to correspond to the abundant and prevalent cellulolytic bacterial populations, bacteria that fit a different functional group also harbour these enzymes. For example, even Anaerovibrio lipolyticus, a predominant lipolytic species in the rumen, showed xylanase and carboxymethylcellulase activity (Nyonyo et al., 2014). Bacteria are not the only microorganisms that participate in this function. The full extent of the contributions of fungi and protozoa to fibrolysis in the rumen remains to be seen, as recent studies have shown that they play a more substantial role than previously thought (Comtet-Marre et al., 2017). Between these microbes, the cell walls of whole plant cells that enter the rumen are degraded, and once these cellulose-rich cell walls are breached, other enzymes gain access to otherwise unattainable substrates such as proteins (Kingston-Smith and Theodorou, 2000).

Figure 1.5 Diagram of key cellulolytic enzymes and the reactions they catalyse. Endoglucanase enzymes first break down the complex cellulose fibres, allowing access to cellobiohydrolases, which produce glucose polymers. Exoglucanase breaks down polysaccharide chains into smaller subunits, whilst glucosidase converts cellobiose into glucose (adapted from (Krause et al., 2003)).

1.4.1.2. Proteolytic Bacteria

As discussed previously, protein digestion is a complex collaboration between bacteria, anaerobic fungi, and ciliate protozoa. Bacteria contribute the most to proteolytic activity, but some proteolytic enzymes are also released from plant material during ingestion and digestion (Walker et al., 2005). It was once believed that most of the breakdown of proteins was due to protozoa, but studies some years later determined that bacteria took on this role, with protozoa acting in a supporting role. This was shown to be the case in experiments measuring ammonia production from protozoal and bacterial cultures. It was found that protozoa contribute less than bacteria to ammonia production, but a synergistic effect was present in combined cultures (Hino and Russell, 1987). Early studies also found that proteolytic activity resides in the ruminal fluid and that, although variable between samples, this activity does not depend on diet, nor is it directly linked to active bacterial growth (Blackburn and Hobson, 1960). The first step in proteolysis is the extracellular enzymatic degradation of proteins by hydrolysing some (or all) of the peptide bonds, producing oligopeptide subunits (Tamminga, 1979), as can be seen in Figure 1.6 and Figure 1.7. Those bacteria with proteolytic capabilities were isolated from rumen fluid using casein as the primary protein source along with carbohydrates in a nutrient-rich solid media. This yielded important, abundant, culturable proteolytic bacteria, including Streptococcus bovis (Russell et al., 1981), Bacteroides amylophilus (now Ruminobacter amylophilus (Stackebrandt and Hippe, 1986)), Selenomonas ruminantium, Bacteroides ruminicola (now Prevotella ruminicola (Shah and Collins, 1990)), Butyrivibrio fibrisolvens (Blackburn and Hobson, 1962), as well as species belonging to Lachnospira, Eubacterium, Fusobacterium and Clostridium (Wallace and Brammall, 1985). Some proteolytic enzymes could be separated from the cellular fraction, showing that enzymes were released and could break down casein in the solid media, creating a clearing. Those proteases that were cell-associated, such as on S. bovis and Eubacterium sp., did not successfully metabolise casein in solid media (Wallace and Brammall, 1985). Whilst protease release tends to be constant (Wallace and Brammall, 1985), it has been shown that activity can increase with an increased availability of carbohydrates (Cotta and Hespell, 1986). Interestingly, in vitro quantification of protease activity may indicate that a particular bacterium would not contribute substantially to proteolysis in the rumen, yet sequencing data used to quantify population size may show that the bacterium is highly abundant, in which case it would in fact participate in proteolysis; such is the case with Prevotella albensis (Hartinger et al., 2018). Prevotella species are important contributors to peptide breakdown (Russell, 2002; Hartinger et al., 2018), with P. 
ruminicola harbouring a broader range of dipeptidyl-peptidases than other microbes in the rumen, able to target different dipeptide combinations in longer peptide chains (Wallace et al., 1997a). These dipeptidyl-peptidases are the predominant type of peptidase enzymes in the rumen, catalysing the first step of peptide breakdown, where dipeptides are liberated from oligopeptides (Wallace et al., 1997b). Other peptidases catalyse the cleavage of a single amino acid from a peptide chain, a common function found in strains of Streptococcus bovis (Russell and Robinson, 1984; Wallace et al., 1997b). Although these two bacteria contribute the most to peptidolysis, other bacteria also contribute to this process, producing enzymes that cleave various subunits of peptides. This includes certain strains of Butyrivibrio fibrisolvens, Fibrobacter succinogenes, Lachnospira multipara, Ruminobacter amylophilus and Ruminococcus flavefaciens (Wallace and McKain, 1991). The second step pertains to the cleavage of dipeptides into two single amino acids, catalysed by dipeptidases, common to bacteria such as Megasphaera elsdenii, Prevotella spp., Fibrobacter succinogenes and Lachnospira multipara, as well as protozoa (Wallace and McKain, 1991; Walker et al., 2005). Once amino acids are liberated from oligo-, tri- and dipeptides, deamination takes place in the cell, as seen in Figure 1.6.

1.4.1.3. Deaminating Bacteria

Although the absolute amount of free amino acids present in the rumen varies depending on diet, at any given time there is only ever a low concentration compared to other substrates; this is because free amino acids are rapidly deaminated (Walker et al., 2005). Despite preformed amino acids being an important nitrogen source necessary for bacterial growth, only a fraction of the available amino acids are directly incorporated into new microbial protein (Hartinger et al., 2018). As previously mentioned, in energy-limiting conditions, some bacteria are capable of deaminating amino acids for energy, not just for microbial protein synthesis (Kingston-Smith and Theodorou, 2000). This deamination leads to the production of ammonia, VFAs of varying carbon chain lengths, hydrogen and/or carbon dioxide, as well as ATP in the process (Barker, 1981). The carbon dioxide and hydrogen in the rumen are further metabolised by methanogens to create methane; an example of the relationship between the deamination and methanogenesis processes can be seen in Figure 1.8. This relationship can be disturbed when mixed rumen microorganisms are treated with monensin (an ionophore antibiotic) or carbon monoxide (a dehydrogenase inhibitor), resulting in decreased amino acid deamination and methane production (Hino and Russell, 1985).

There is evidence for many different species of bacteria in the rumen capable of transporting peptides or amino acids from degraded plant matter into their cells not just for biosynthesis of microbial protein (Russell and Martin, 1984). As ammonia is a product that is easily measurable in the laboratory, this became a proxy for measuring amino acid deamination. This was implemented by research in the 1960s, which revealed that some individual bacteria isolated from rumen contents could produce measurable amounts of ammonia when grown on solid agar-based media supplemented with ruminal fluid, amino acids, protein hydrolysates and added carbohydrates (Bladen et al., 1961). The predominant bacteria isolated this way were Selenomonas ruminantium, Butyrivibrio fibrisolvens, Eubacterium ruminantium, Peptostreptococcus elsdenii (now Megasphaera elsdenii (Rogosa, 1971)) and Bacteroides ruminicola (now Prevotella ruminicola (Shah and Collins, 1990)), with the latter believed to be the most important amino acid fermenter and ammonia producer (Bladen et al., 1961; Russell and Wallace, 1997).


Figure 1.6 A schematic diagram showing the key processes in proteolysis. Intracellular processes of ruminal bacteria are those inside the oval and extracellular activity is outside, where the protease is bound to the membrane (adapted from (Bach et al., 2005)).

Figure 1.7 The key bacteria involved at different stages of the proteolytic process (adapted from (Walker, Newbold and Wallace, 2005)).


Yet it was found that most of these bacteria could not deaminate amino acids fast enough to satisfy growth energy requirements, and this process would therefore only be used for maintenance energy and to complement other energy-producing processes such as carbohydrate fermentation (Russell, 1983; Russell and Wallace, 1997; Wallace et al., 1997b; Hackmann and Firkins, 2015). There is, however, a niche in the rumen occupied by a small population of bacteria that utilise amino acids as their sole energy source, and in this process produce a large amount of ammonia. Such bacteria have been termed hyper ammonia producing bacteria (HAPs). They are obligate amino acid fermenters, and contribute to the excess ammonia production from protein digestion in the rumen (Russell, 2002). This makes them an important target when considering ways to decrease deamination and ammonia production.

1.4.1.4. Hyper Ammonia Producing (HAP) bacteria

Hyper ammonia producers (HAPs) are a group of anaerobic bacteria present in the rumen that are able to utilise amino acids and peptides as their only source of both carbon and nitrogen, gaining energy in the form of ATP from this process (Chen and Russell, 1988, 1989a). Indeed, it was previously known that amino acid deamination was common among the Clostridia (Mead, 1971). This group could be subdivided based on preferred amino acids and deamination products; some could utilise single amino acids and others would use pairs, where one amino acid acts as an electron donor and another as an electron acceptor in what is termed the Stickland reaction, as shown in Figure 1.9 (Mead, 1971). Products of amino acid fermentation are short-chain VFAs, hydrogen, ammonia and, most importantly, ATP (Barker, 1981). The pathways and enzymes for these bacteria are relatively well studied.

Figure 1.8 The fate of the amino acid leucine in the rumen. Leucine is ultimately degraded to the VFA isovalerate, carbon dioxide and glutamate through the transamination and decarboxylation process in (a). Glutamate is then deaminated to 2-oxoglutarate and ammonia, with NAD reduced to NADH (b). Hydrogenase enzymes then remove hydrogen to regenerate NAD (c), and methanogenesis converts carbon dioxide and hydrogen into methane (d) (adapted from (Hino and Russell, 1985)).

A study in 1988 described the isolation of a Peptostreptococcus species and a Clostridium species from the rumen of cattle, both of which were able to produce around 20-30 times more ammonia than M. elsdenii and Prevotella ruminicola, the two bacteria previously considered to be the main ruminal ammonia-producing bacteria (Russell et al., 1988). These were isolated from enriched rumen fluid, using trypticase as the sole nitrogen and energy source. Colonies with an interesting appearance were arbitrarily picked, and several morphological and biochemical characteristics were determined and compared to known ammonia producers. The isolates were named using presumptive identification, matching patterns of morphology and biochemistry to known bacterial genera. Strain C (a coccus) was named as a monensin-sensitive Peptostreptococcus, and was able to grow in the presence of protein hydrolysates as the only energy source (no carbohydrate) (Chen and Russell, 1988; Russell et al., 1988). In contrast, strain R (a rod shape) was named as a Clostridium and could grow in the presence of some carbohydrates (Russell et al., 1988). The following year, two further bacterial strains were isolated from the rumen (Chen and Russell, 1989a). They were also monensin sensitive and could utilise carbohydrates, albeit poorly, but grew very well on amino acids with high levels of ammonia production. These two were chosen based on cell morphology and named strain F (irregularly shaped) and strain SR (short rods). It was concluded, based on morphology and biochemical characteristics, that strain F belonged in the Eubacterium genus, whilst strain SR belonged to the Bacteroidaceae group (Chen and Russell, 1989a). A few years later these were formally identified and named using 16S rRNA sequence similarity searches on the three strains isolated previously (strain C from Chen and Russell, 1988, and strains SR and F from Chen and Russell, 1989a).
Respectively, the first two were found to be closely related to Peptostreptococcus anaerobius and Clostridium sticklandii (now Acetoanaerobium sticklandii (Galperin et al., 2016)), whilst the third had no matches and was proposed as a new species, named Clostridium aminophilum (Paster et al., 1993). These three bacterial species have been titled hyper ammonia producers (HAPs). Further studies have since extended the list of ruminal HAP species, and HAPs have been formally defined as being monensin-sensitive, present in low numbers in the rumen (10⁷ cells per ml of rumen fluid, compared to >10⁹ for low ammonia producing bacteria), and able in culture to produce around 300 nmol of ammonia per minute per milligram of protein (Wallace, 1996) (Table 1.2). Novel HAPs were isolated by taking advantage of their ability to grow without the requirement for carbohydrate as a source of energy, thereby separating them from other culturable, obligately saccharolytic bacteria. The source of nitrogen supplied, on the other hand, tended to differ, with HAPs showing a preference for taking up either peptides or amino acids, and even a preference for certain amino acids over others. These preferences were determined by observing a higher density of growth in liquid culture either in the presence of free amino acids in casamino acids (an acid hydrolysate of casein), or with peptides in trypticase (an enzymatic digest of casein) (Chen and Russell, 1988). For example, P. anaerobius grew better with casamino acids, could deaminate leucine, serine, phenylalanine, threonine, and glutamine quicker than other amino acids, and ammonia production was increased when amino acids were offered as Stickland pairs (Chen and Russell, 1988). A. sticklandii grew better on trypticase and could rapidly deaminate arginine, serine, lysine, glutamine and threonine (Chen and Russell, 1989a). The presence of sodium ions was found to be necessary for the uptake of amino acids in P. anaerobius and C. aminophilum (Chen and Russell, 1989b, 1990).

Figure 1.9 Structural equation showing the Stickland reaction. (A) The deamination of a donor amino acid is catalysed by a dehydrogenase, producing ammonia, hydrogen and an α-keto acid, to which an inorganic phosphate binds to form an acyl-phosphate. (B) A reductase reduces an acceptor amino acid, using the protons and electrons donated from the first reaction, producing yet more ammonia and the carbon chain that forms the corresponding VFA. (C) The phosphate group phosphorylates ADP to create ATP (adapted from (de Vladar, 2012)).
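As a worked illustration of the Stickland reaction, the textbook pairing of alanine (electron donor) with glycine (electron acceptor) can be written as two balanced half-reactions. This pairing is an assumption chosen for illustration and is not taken from the studies cited here; [H] denotes reducing equivalents rather than a specific carrier, and the phosphorylation step that yields ATP is omitted.

```latex
\begin{align*}
\text{Donor (oxidation):}\quad
  & \mathrm{CH_3CH(NH_2)COOH + 2\,H_2O \longrightarrow CH_3COOH + CO_2 + NH_3 + 4\,[H]} \\
\text{Acceptor (reduction):}\quad
  & \mathrm{2\,NH_2CH_2COOH + 4\,[H] \longrightarrow 2\,CH_3COOH + 2\,NH_3} \\
\text{Overall:}\quad
  & \mathrm{CH_3CH(NH_2)COOH + 2\,NH_2CH_2COOH + 2\,H_2O \longrightarrow 3\,CH_3COOH + 3\,NH_3 + CO_2}
\end{align*}
```

In this example every amino acid consumed releases one molecule of ammonia, which is consistent with the observation above that ammonia production increased when amino acids were offered as Stickland pairs.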

Implementation of the HAP-enrichment media led to the growth of a total of 19 distinct isolates from the rumen of sheep that were capable of a high level of ammonia production, and these were split into six groups based on morphology and biochemistry (Eschenlauer et al., 2002). Groups I, II and IV did not utilise sugars, their ammonia production rates were high and DNA analysis revealed similarity to Clostridium and Eubacterium species. Group VI showed a similar pattern, and sequencing revealed similarity to Acidaminococcus fermentans, a species already described as capable of growing solely on amino acids (Rogosa, 1969). Group V produced the highest amount of ammonia, but this was due to ureolytic activity, and this group could also utilise glucose, sucrose, and maltose. Group III did not fit the HAP category, as isolates were saccharolytic and produced ammonia at low rates (Eschenlauer et al., 2002). An isolate from one of the sheep showed hyper ammonia producing characteristics and was asaccharolytic, but its VFA production profile differed from that of HAPs previously isolated. DNA analysis confirmed this bacterium to be novel, and it was named Eubacterium pyruvativorans, which clusters phylogenetically with other Eubacteria, Clostridia and HAPs (Wallace et al., 2003).


Fifteen species with HAP-like phenotypes were isolated from the rumina of goats and sheep in Australia fed Calliandra calothyrsus, a proteinaceous and tannin-rich leafy shrub, all of which were able to grow on peptides and proteins in the presence of tannins and produced some amount of ammonia from peptide breakdown. Of these fifteen, ten fermented carbohydrates, six either showed low growth or did not use carbohydrates at all, and only one strain was claimed as a HAP, producing an amount of ammonia comparable to that of previous HAPs. This was the first hyper ammonia producing proteolytic species isolated (McSweeney et al., 1999). However, no explicit comment is made about the carbohydrate fermentation abilities of this isolate, only that it was placed in the category of “low level of carbohydrate fermentation”, and it was noted that the density of cultures decreased with the addition of carbohydrates compared to media with none. Analysis of the 16S rDNA sequence revealed that this proteolytic strain was most closely related to C. botulinum group 1, its fermentation patterns of peptides were similar to those of Acetoanaerobium sticklandii, and the amount of ammonia produced was comparable to C. aminophilum F, C. sticklandii SR (which would now be assigned to the Acetoanaerobium genus (Galperin et al., 2016)) and P. anaerobius C (McSweeney et al., 1999).

Meanwhile in New Zealand, Attwood et al. (1998) implemented the same HAP-specific media as previous studies, using tryptone and casamino acids as the sole nitrogen and energy source, but used roll tubes and liquid media, and cultured samples from pasture-grazing cattle, sheep and deer. Five isolates were found this way, and when challenged with various carbon sources in the form of various carbohydrates, the isolates showed no evidence of fermentation of these substrates, except for lactate and pyruvate. All isolates were sensitive to monensin. Dot blots of DNA from the isolates did not hybridise to oligonucleotides from the three previously found HAP bacteria, indicating that these isolates were different to the original three. One isolate closely resembled Peptostreptococcus anaerobius in phylogenetic analysis but was found to have an extra insert in the 16S sequence compared to P. anaerobius C. Another isolate had high sequence similarity to Fusobacterium necrophorum, whilst the closest known relatives of the other two strains were Eubacterium spp. and P. asaccharolyticus (Attwood et al., 1998). Ammonia production and growth rates did vary between these isolates, but despite this they all fall within the true HAP description, and therefore broadened knowledge about this niche.

Interestingly, further studies showed that ruminal isolates of Fusobacterium necrophorum had a high deamination rate of lysine, producing a significant amount of ammonia (Russell, 2005). Glutamate, glutamine, histidine, and serine could also be used as energy sources, and the isolates were sensitive to monensin, adhering to the previous HAP description. However, unlike other HAPs, F. necrophorum is Gram negative, and could also utilise glucose, maltose, galactose and lactose as energy sources for growth, but only when trypticase or yeast extract were also provided (Russell, 2005). The ability of F. necrophorum to utilise carbohydrates for growth shows that the hyper ammonia producing niche is not strictly asaccharolytic.


This was supported by work in which species isolated from forage-fed Nellore steers (Bos indicus) showed carbohydrate fermentation capabilities alongside ammonia production and were also termed hyper ammonia producing bacteria (Bento et al., 2015). This study classified those isolates producing ≥100 nmol NH3 mg protein⁻¹ min⁻¹ as HAPs (three times lower than previously described, Table 1.2), all of which were sensitive to monensin, but most strains showed increased culture density with the addition of carbohydrates. Only three strains were obligate amino acid fermenters and did not utilise carbohydrates for growth, whereas the other 27 strains were carbohydrate fermenters but had a similar deamination rate to those previously recorded. Phylogenetic analysis using 16S rDNA sequences revealed only distant relations to previously isolated HAP species (Bento et al., 2015). These studies indicate that bacteria are likely influenced by host diet and environment, and therefore by the substrates subsequently available, leading these bacteria to evolve into their niche. It has been noted previously that the ability to switch between carbohydrate and amino acid fermentation for energy production would indeed be advantageous in the rumen (Hartinger et al., 2018). The characteristics of HAPs are further discussed in Chapter 3.

Table 1.2 Summary of hyper ammonia producing bacteria characteristics in the rumen. Information from figure in (Wallace, 1996).

                     High numbers, low activity    Low numbers, high activity
Example bacteria     Butyrivibrio fibrisolvens     Clostridium aminophilum
                     Megasphaera elsdenii          Clostridium sticklandii
                     Prevotella ruminicola         Peptostreptococcus anaerobius
                     Selenomonas ruminantium
                     Streptococcus bovis
Concentration        >10⁹ cells ml⁻¹               10⁷ cells ml⁻¹
Ammonia production   10-20 nmol NH3 min⁻¹          300 nmol NH3 min⁻¹
Monensin activity    Mostly resistant              Sensitive
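The two working definitions of a HAP quoted above (around 300 nmol NH3 mg protein⁻¹ min⁻¹ from Wallace, 1996, and ≥100 nmol NH3 mg protein⁻¹ min⁻¹ from Bento et al., 2015) can be compared with a minimal sketch. The `Isolate` class, the function names, the category labels and the example measurements below are all hypothetical and serve only to show how the choice of threshold changes the classification; they are not data or methods from either study.

```python
from dataclasses import dataclass

# Thresholds quoted in the text, in nmol NH3 per mg protein per minute.
WALLACE_1996_THRESHOLD = 300.0
BENTO_2015_THRESHOLD = 100.0


@dataclass
class Isolate:
    """Hypothetical assay record for one bacterial isolate."""
    name: str
    nmol_ammonia: float       # total ammonia produced during the assay
    mg_protein: float         # bacterial protein in the assay
    minutes: float            # assay duration
    uses_carbohydrate: bool   # grows better when carbohydrate is added


def specific_rate(isolate: Isolate) -> float:
    """Specific ammonia production rate (nmol NH3 per mg protein per min)."""
    return isolate.nmol_ammonia / (isolate.mg_protein * isolate.minutes)


def classify(isolate: Isolate, threshold: float) -> str:
    """Label an isolate against a chosen rate threshold (illustrative labels)."""
    if specific_rate(isolate) < threshold:
        return "not HAP"
    if isolate.uses_carbohydrate:
        return "HAP (carbohydrate-utilising)"
    return "HAP (obligate amino acid fermenter)"


# Invented example values: 9000 nmol NH3 from 1 mg protein over 60 min.
example = Isolate("isolate_A", nmol_ammonia=9000.0, mg_protein=1.0,
                  minutes=60.0, uses_carbohydrate=True)

print(specific_rate(example))                      # rate in nmol/mg/min
print(classify(example, BENTO_2015_THRESHOLD))     # lower threshold
print(classify(example, WALLACE_1996_THRESHOLD))   # higher threshold
```

Under these invented values the same isolate is labelled a HAP with the lower threshold but not with the higher one, which illustrates why the definition used matters when comparing studies.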

As a member of the amino-acid utilising, non-pathogenic proteolytic Clostridia, Clostridium sticklandii was labelled a ‘gold mine’ for new and interesting biochemical reactions, and sequencing of its genome revealed the enzymes and reaction pathways involved in the Stickland reaction, as well as oxygen-sensitive and selenium-dependent enzymes (Fonknechten et al., 2010). Such pathways are illustrated in Figure 1.10, and notably most of these pathways are found in Clostridium spp. (Fonknechten et al., 2010). Eight selenoproteins were identified in the genome of C. sticklandii, where a cysteine residue is replaced by selenocysteine (Fonknechten et al., 2010), one of which is involved in the reduction of glycine that results in production of acetate and ammonia (Tanaka and Stadtman, 1978). With the genome available and the prevalence of C. sticklandii as an amino acid utiliser producing organic acids, it was possible to use models and simulations to predict the success of this species for protein-based biofuel production (Sangavai and Chellapandi, 2017). Further review of this bacterium in the context of biotechnology and industrial applications looked at a genome-scale model to identify key genes and enzymes for biorefining and biofuel production, as illustrated in Figure 1.11.

Figure 1.10 Illustration of key metabolic pathways that take place in Clostridium sticklandii (strain DSM519) based on the genome. Figure from (Fonknechten et al., 2010).


Figure 1.11 Key pathways for biofuel production in C. sticklandii. Coloured boxes highlight the different pathways involved in catabolism or biosynthesis of different substrates, which are given in dark blue. Figure from (Sangavai and Chellapandi, 2017).


1.4.2. Archaea

Archaea are less diverse than the bacterial populations in the rumen (Henderson et al., 2015). Most of the archaea in the rumen are methanogens, and the predominant genus is Methanobrevibacter (Morgavi et al., 2010). Certain methanogenic species are found in ruminants all over the globe (Henderson et al., 2015), and a core archaeal population is found irrespective of diet (Carberry et al., 2014). These methanogens are an important sink for hydrogen, sequestering dihydrogen molecules that are produced during normal fermentation (Kamra, 2005). They are able to use this hydrogen to grow, reducing carbon dioxide (CO2) to methane (CH4) through the hydrogenotrophic pathway (Huws et al., 2018). Certain species are also able to use formate, methyl groups or acetate as reduction agents (Janssen and Kirs, 2008). The methane produced is removed from the ruminant host through eructation, entering the environment and contributing to the accumulation of this greenhouse gas in the atmosphere, as well as representing a loss of potential dietary energy for the host (Huws et al., 2018). It is this methane production ability that has placed archaea in the spotlight of rumen microbiome research, particularly as targets of enteric methane mitigation strategies. Some recent examples include using methanogen-specific lytic enzymes (Altermann et al., 2018), vaccination against key methanogens (Williams et al., 2009; Zhang et al., 2015) and targeting methanogenesis activity using oils (Vargas et al., 2020). Removal of the archaea themselves is not a viable strategy alone, as this community plays an integral role in hydrogen utilisation in the rumen. Instead, reducing hydrogen production (Huws et al., 2018) or finding alternative sinks for the hydrogen would make more viable strategies (Lan and Yang, 2019; Ungerfeld, 2020).

1.4.3. Viruses

The predominant types of virus present in the rumen are bacteriophages and archaeal viruses, with bacteriophages the most well characterised (Gilbert et al., 2020). Other viruses, which may be commensal or pathogenic, may also be found in the rumen, but in much lower numbers.

1.4.3.1. Bacteriophages

Bacteriophages (phages) are viruses that infect bacteria and are believed to be almost always present wherever bacteria are, and can therefore be found in abundance in every biosphere (Clokie et al., 2011). They generally fall into two categories based on their lifecycle and propagation abilities: virulent and temperate phages. A phage will first bind to a bacterial cell and inject its genetic material into the host cell. At this point, depending on the phage ability and particular circumstances, a virulent phage will undergo the lytic cycle, where the host cell machinery is used to produce virion progeny, and eventually these newly-formed phages burst out of the bacterial cell, causing lysis and death. Temperate phages, however, may undergo either this lytic cycle or the lysogenic cycle. The latter involves incorporating the phage genome into the genome of the host cell, or circularising it in the cytoplasm, where it remains as a prophage and is propagated into the daughter cells. Upon stress or induction of the lysogen (the bacterial cell carrying the prophage), the prophage is no longer repressed and excises itself, then undergoes the lytic cycle to propagate itself further (Gilbert and Ouwerkerk, 2020a) (Figure 1.12).

Despite phages being abundant in the rumen, little is known to date about the phage population compared to other microbes in the rumen, or about the influence it may have on ruminal function and digestion (Anderson et al., 2017), but with the advent of new methods, the extent of the phage population in this environment is beginning to be uncovered (Gilbert et al., 2017). It was established in early studies that the phages present were morphologically diverse and in high abundance in ruminal fluid (Figure 1.13) and could influence bacterial populations (Orpin and Munn, 1973). Given the diversity and abundance of bacteria, the presence of bacteriophage in this environment was hardly surprising (Lockington et al., 1988). A noticeable variation in the size of the ruminal phage population between individuals was observed by studying phage DNA concentration in rumen fluid from sheep and goats using gel electrophoresis based approaches (Swain et al., 1996). This method compared banding patterns formed by phage DNA to determine the variety of phages present (Klieve and Swain, 1993). The banding patterns showed not only that there was variation between animals, but that even individual ruminants penned together and eating the same diet had distinctly different phage populations. The total amount of DNA observed within a group, however, was similar. This study also determined that phage populations fluctuated in a diurnal pattern in sheep fed once daily, reaching the minimum concentration two hours after feeding, and a maximum concentration after six to eight hours (Swain et al., 1996).

Early studies of individual phages that infect known bacterial hosts describe the successful isolation of lytic phages from ruminal fluid that target species such as Serratia spp. and Streptococcus bovis (Adams et al., 1966), and Fusobacterium necrophorum (Tamada et al., 1985). These studies employed the typical soft overlay technique, growing lawns of bacteria in a softer agar poured over solid bottom agar, and adding phage samples to observe lysis. Similar methods have also been used to identify phages against rumen bacteria from samples that did not originate from rumen fluid. For example, lytic phages against the common cellulolytic bacteria Ruminococcus albus and R. flavefaciens were isolated from abattoir runoff and washings of manure from transport trucks (Klieve et al., 2004), and phages against Prevotella ruminicola were found in sewage water (Klieve et al., 1991).

To further understand the phage populations that are present in the rumen, an induction study using the mutagen mitomycin C was conducted to observe the presence of prophages, that is, bacteriophages that have infected and integrated their phage genomes into the host bacterial genome (Klieve et al., 1989). Bacteria from ruminal fluid were cultured and characterised using cell and culture morphology and biochemistry, then mitomycin C was added to growing cultures, which were tested for the presence of phages using the soft overlay technique. Of 30 bacterial isolates tested, eight resulted in the production of phage-like particles after induction. One isolate also produced spontaneous particles without addition of the inducer (Klieve et al., 1989). This study demonstrated that temperate phages are present in the rumen, and that spontaneous phage particles could be indicative of lytic phage.


Figure 1.12 Diagram of lytic and lysogenic cycles (adapted from (Gilbert and Ouwerkerk, 2020a)).


Next-generation sequencing approaches are now elucidating the extent of prophage infection in genomes and the virome in the rumen, as well as sequencing the genomes of phages themselves. Sequencing of the above-mentioned lytic phages isolated against Prevotella ruminicola, Ruminococcus spp. and Streptococcus bovis revealed conserved genes, allowing for better phylogenetic classification of these phages, as well as combining this information with the sequenced host genome to determine phage-host interactions (Gilbert et al., 2017). This is discussed further in Chapter 6. A metagenomic approach to characterise the phage populations found that the most abundant phage types tended to be from the families Siphoviridae and Myoviridae, which are known to infect bacteria from the phyla Firmicutes and Proteobacteria, which are also found in high abundance in the rumen. The same study also found that prophages were much more common than lytic phages (Berg Miller et al., 2012). A similar study was carried out a year later, also using metagenomic approaches to characterise phages from thirteen dairy cattle (Ross et al., 2013). A large amount of variation was observed between the sampled cattle and, when comparing their results to the data from the previous study, the authors found that only up to 5 % of their virome sequences aligned to those from the previous dataset (Berg Miller et al., 2012; Ross et al., 2013). As discussed in the publication (Ross et al., 2013), these differences may either reflect the methodology used, or could signify that phage populations can vary drastically between individuals. Virome metagenome analysis in steers fed different diets revealed that diet has a significant impact on viral community changes, but despite this, a core virome could be identified, with fourteen viruses predicted to infect Bacteroidetes and Proteobacteria that were common to all samples and diets tested (Anderson et al., 2017). Reads from a metagenome study of the viral community from the ovine and caprine rumen were found to be similar to phages isolated from other environments, and some reads mapped to an archaeal phage and to prophages (Namonyo et al., 2018). The use of bioinformatic and computational methods for virome construction and prediction allows analysis of host-virus interactions and of the role viruses play in the rumen.


Figure 1.13 Transmission electron microscopy (TEM) images of different phages isolated from the rumen. A range of morphological features are demonstrated by these phages, including group A morphology (typical of Myoviridae), with icosahedral or octagonal capsids, contractile tail and sometimes visible baseplate and collars (a,b,c,d). Group B morphology (typical of Siphoviridae) is also seen with elongated heads and long tails (e,f,g,k). Group C type morphologies (typical of Podoviridae) show collars and short tails (h, i, j), and a phage formed of long capsids was also seen (l) (adapted from (Klieve and Bauchop, 1988)).


1.4.3.2. Archaeal phages

There are phages that infect archaea, and although it has been debated whether these viruses should still be classed as phages (Clokie et al., 2011), they are referred to here as such. The presence of prophages within sequenced genomes from ruminal methanogens, such as species in the Methanobrevibacter genus, has been reported (Attwood et al., 2008; Leahy et al., 2010; Kelly et al., 2016). The M. ruminantium genome, for example, was unexpectedly found to contain an intact prophage sequence, named ϕ-mru, which disrupts a putative membrane protein gene and is complete with an endoisopeptidase lysis gene (Attwood et al., 2008; Morgavi et al., 2012). These prophages have all been predicted computationally, with only ϕ-mru successfully induced, characterised and patented (Attwood et al., 2008). The presence of archaea carrying integrated prophages suggests that there is a wide range of phage types present in the rumen, able to target the different components of the microbiome.

1.4.4. Fungi

Although originally mistaken for flagellated protozoa, anaerobic fungi play an important role in the degradation of fibre (Puniya et al., 2015), and are considered the most effective fibre-degrading microorganisms in all herbivorous mammalian guts (Edwards et al., 2017). They are generally found attached to ingested plant fibres or as free zoospores (Gordon and Phillips, 1998), and it is the attachment of these zoospores that mediates colonisation by fungi (Edwards et al., 2008). Upon removal of fungi from the rumen, gas production and fibre degradation decrease (Kamra, 2005). Despite the importance of anaerobic fungi in the rumen, knowledge of the contributions, functions and abilities of this community within the rumen microbiome remains limited (Huws et al., 2018). This is partly due to the difficult and fastidious growth conditions these microorganisms require to be successfully cultured and studied outside the host, as well as their large, AT-rich genomes, which complicate sequencing efforts (Wang et al., 2019). Analysis using phylogenomic approaches of five rumen anaerobic fungi genomes combined with transcriptomic data revealed that these obligately anaerobic symbionts form a distinct group, and that there were bacterial and host genomic elements alongside genes that are novel and unique to fungi that contribute to their successful fibre degradation (Wang et al., 2019).

1.4.5. Protozoa

Rumen protozoa are mainly ciliates. These microbes are free living, and form a large proportion of the cellular mass in the rumen, despite populations being fewer in number than bacteria (Newbold et al., 2015; Huws et al., 2018). Yet the variability in protozoal populations in different ruminant individuals across the globe is greater than that seen in bacterial or archaeal populations (Henderson et al., 2015). Their main nutrient source is soluble carbohydrates (Hungate, 1975), but some ciliated protozoa play a role in proteolysis and require proteins (Wallace, 1996), as well as preying on prokaryotes as a source of nitrogenous compounds (Russell and Hespell, 1981). Protozoa also produce hydrogen, which fuels methanogenesis by archaea, which associate themselves either externally or internally in the cytoplasm of the protozoa (Tapio et al., 2017). Defaunation (the removal of protozoa from the rumen) showed not only that protozoa are unnecessary for ruminal function, but also that their removal led to an increase in bacterial populations, a reduction in methane and more microbial protein being made available to the host (Newbold et al., 2015).

1.4.6. Understanding the Rumen Microbiome

Given the reliance of ruminants on the microbiome for digestion, ample research has gone into understanding this interesting community. Since bacteria are the predominant microbes in the rumen, making up 50 % of the ruminal cell mass (Creevey et al., 2014), early studies concentrated on culturing these microorganisms. In vitro culturing techniques allow for successful growth of some of these microorganisms from the rumen, made possible by the important development of strictly anaerobic growth media rich enough to simulate the rumen environment (Krause and Russell, 1996b). Despite these techniques yielding many novel bacteria from the rumen, it has been found that between 85 and 95 % of bacteria and other microbes remain uncultured (Krause et al., 2003), and with recent developments in culturomics, it is thought that 23 % of the rumen microbiome could be cultured (Zehavi et al., 2018). To answer questions that arose about the remaining bacteria present in the rumen, different techniques had to be employed, and with the advances in sequencing, it is now possible to use genomic approaches to do this. Early studies used 16S rRNA probes to target ruminal bacteria and characterise the bacterial ecosystem phylogenetically without the need for culturing (Stahl et al., 1988). Not only was identification of bacteria possible using 16S rRNA probes, but assessment of culture-based hypotheses and metabolism could also be carried out; for example, these probes were used to determine that monensin affected growth of the hyper ammonia producers Peptostreptococcus anaerobius and Acetoanaerobium sticklandii but not Clostridium aminophilum (Krause and Russell, 1996a). Since then, many sequences from these studies have been deposited into online databases, allowing meta-analysis studies. In a study examining 16S rRNA gene sequences, 13,478 ruminal bacterial sequences were found, which were organised into 19 different phyla, of which Firmicutes, Bacteroidetes and Proteobacteria were predominant (Kim et al., 2011).

However, analysing the rumen microbiome one genome at a time will not elucidate the complex network of functions and microbial community efforts that ultimately go towards digestion; instead, looking at the metagenome will help to build a better image of the ecosystem (Krause et al., 2003). By analysing metagenomic samples from ruminants across the globe, microbial families, genera or even species have been identified as common to the rumen. What has been described as the ‘core bacterial microbiome’ is formed of species from the genera Prevotella, Butyrivibrio, and Ruminococcus, and unclassified genera in the families Lachnospiraceae and Ruminococcaceae and the orders Bacteroidales and Clostridiales (Henderson et al., 2015). With increased throughput of sequencing data, studies were able to analyse the rumen microbiome for enriched genes and functions, finding a substantial number of carbohydrate active genes involved in biomass breakdown (Hess et al., 2011). With deep enough sequencing, it is even possible to start to reconstruct near-complete genomes from metagenome samples, a feat first achieved in the same study, where fifteen previously uncultured bacterial genomes were assembled (Hess et al., 2011). Deep sequencing of 42 cow rumen samples achieved assembly of 913 draft genomes, all of which were assigned to a kingdom, with 416 (45.6 %) assigned at least to the family level, 158 (17.3 %) to a genus and only seven (0.8 %) to a known species (Stewart et al., 2018). This has since been expanded to 4,941 assembled draft genomes, all of which were assigned to a kingdom, 4,801 to a phylum, 4,514 to a class, 4,084 to an order, 3,188 to a family (64.5 %), 1,092 to a genus (22.1 %), and 144 to a species (2.9 %) (Stewart et al., 2019).
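
The proportions quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, using only the counts reported in the text; the function and variable names are illustrative and do not come from the cited studies.

```python
def percent_assigned(assigned: int, total: int) -> float:
    """Percentage of draft genomes assigned at a given rank, to one decimal place."""
    return round(100 * assigned / total, 1)

# Counts as quoted in the text for the two metagenome assembly studies.
stewart_2018 = {"total": 913, "family": 416, "genus": 158, "species": 7}
stewart_2019 = {"total": 4941, "family": 3188, "genus": 1092, "species": 144}

for label, counts in (("2018", stewart_2018), ("2019", stewart_2019)):
    for rank in ("family", "genus", "species"):
        pct = percent_assigned(counts[rank], counts["total"])
        print(f"Stewart et al., {label}: {rank} {pct} %")
```

In both datasets the assigned fraction falls sharply from family to species, which is the point the comparison is intended to make.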

Metagenomic sequencing therefore allows for taxonomic profiling and population analysis, functional profiling of the microbes and the formation of near-complete whole genomes, yet there are a number of limitations (Quince et al., 2017). Despite an array of bioinformatic tools to analyse data, most platforms and programmes still rely on pattern matching as their foundation, which requires comprehensive reference databases to be effective. This is a luxury not afforded to many studies, and where such databases exist, there is often a bias towards culturable and model organisms (Quince et al., 2017). Rumen microbial data is no exception to these problems, and a study in 2014 conducted a meta-analysis on available rumen microbial sequences to determine where future culturing efforts should be concentrated. Data was obtained from multiple sources, such as online databases, literature, and culture collections. It was found that the latter had a bias towards three phyla, which lend themselves to laboratory cultivation, whilst other taxa were under-represented (Creevey et al., 2014). Many genera from the rumen microbiome that had been cultivated had no genome sequence data, and instead these databases were biased towards the abundant bacteria involved in plant cell wall degradation (Creevey et al., 2014). To meet the need for better and more complete databases, the Hungate 1000 project was developed, with the aim of sequencing one thousand rumen microbes to create a set of reliable reference genomes (http://www.rmgnetwork.org/hungate1000.html). The major outcome of this project was published recently, having sequenced 410 genomes to complement the 91 already publicly available rumen microbe genomes (Seshadri et al., 2018). These genomes are from representative cultured rumen microorganisms, from across nine different phyla, 48 families and 82 genera, yet some microorganisms are better represented than others. For example, some genera have multiple species and strains sequenced; these tend to be the common and abundant carbohydrate fermenters (Seshadri et al., 2018). The Hungate 1000 database is a valuable resource for rumen microbiology research, as it comprises high-quality genomes sequenced from bacteria and archaea that have been cultured in vitro.

1.5. Feed Efficiency and Methane Production in the Rumen

Those who study ruminants are often ultimately interested in two main areas: increasing feed efficiency and reducing environmental impacts. Nutritionists, geneticists, microbiologists and farmers have all contributed to understanding the rumen microbiome, with efforts towards improving host nutrition and feed efficiency, and mitigating methane emissions and other pollutants (Hill et al., 2016; Løvendahl et al., 2018). The mechanisms, pathways and microbes involved in these areas have been studied in an endeavour to tackle these two issues.

1.5.1. Methane Production in the Rumen

Methane is classed as a greenhouse gas and contributes significantly to global warming, second only to carbon dioxide, but has a global warming potential 28 times higher than that of carbon dioxide and remains in the atmosphere for just over 12 years (IPCC, 2014). Of the methane produced globally through human activity, around 30 % is due to enteric fermentation in ruminants, exceeding emissions from use of fossil fuels, landfill and burning biomass, forming almost 16 % of total global greenhouse gas emissions (FAO, 2016a; Lan and Yang, 2019). It is calculated that cattle produce somewhere between 150 and 420 litres of methane a day, and sheep 25-55 litres a day (Janssen, 2010). These statistics make clear the importance of studying enteric fermentation and methane production, and show that research in this area is worthwhile.

The pathways involved in methanogenesis in the rumen are well characterised, with a number of degradation and fermentation processes contributing hydrogen and carbon dioxide to methane production, and the methane is then constantly emitted from the host animal (Lan and Yang, 2019). Archaeal methanogens are the only microbes in the rumen that produce methane (Hook et al., 2010), and most of these are hydrogenotrophs, using hydrogen to reduce carbon dioxide (Attwood and McSweeney, 2008). When carbohydrates from ingested plant matter are broken down in the anaerobic environment of the rumen, sugars are oxidised, releasing hydrogen, which reduces NAD+ to NADH (Figure 1.3).

NADH is then re-oxidised back to NAD+, and the hydrogen reduces CO2, forming methane (McAllister and Newbold, 2008). Other carbon sources for methane production include acetate or methyl-containing compounds, and formate instead of hydrogen may be an electron donor (Morgavi et al., 2010). Although this removal of methane from the rumen equates to 2-12 % of energy loss from the feed (Lan and Yang, 2019), it is important that hydrogen is removed from the fluid, as its presence inhibits certain important enzymes that are present in the rumen fluid (McAllister and Newbold, 2008). This methanogenesis pathway is believed to occur inside the cells of methanogens, using a number of intermediates and enzymes. Leahy et al. (2010) predicted the methanogenesis pathway in full, with the enzymes required for each step, using the Methanobrevibacter ruminantium genome.

1.5.2. Ammonia Production in the Rumen

As described in section 1.3.5.1, the main source of ammonia in the rumen is the degradation of nitrogenous compounds such as proteins and amino acids in the plant cells. Ammonia concentrations in the rumen tend to fluctuate and can depend on diet; for example, feeds rich in urea or protein lead to an increase in ammonia concentration (Russell, 2002). The amount of nitrogen in dry matter and crude protein intake also positively correlates with ammonia production (Bougouin et al., 2016; Liu et al., 2017). It is the proteolysis of the protein and the deamination of the amino acids in the feed that ultimately result in ammonia, and this production is often faster than microbial protein biosynthesis and the incorporation of ammonia into new useful compounds by the rumen microbes (Chen and Russell, 1988). This results in a build-up of excess ammonia, which is removed from the animal through urinary urea, but this requires energy input from the host (Russell, 2002). There is a large variation in the amount of ammonia excreted by the animals, with a range of between 0.19 and 432 grams of ammonia excreted per cow per day, where those in beef feedlots produced the highest amounts (Hristov et al., 2011). Urine contains around 90 % of the ammonia removed from the host, and the remaining 10 % is found in faeces (Bougouin et al., 2016). In cattle, between 10 and 35 % of nitrogen from feed may be incorporated into milk or meat, whereas almost all of the remaining nitrogen is excreted, forming a loss of up to 90 % (Liu et al., 2017). Once excreted, urea is rapidly degraded by urease enzymes to form ammonia, which, along with any ammonia present in the urine and faeces, begins to volatilise immediately. Once gaseous, it enters the atmosphere, where ammonia molecules react with sulphur and nitrogen oxides to form ammonium salts, small particles that contribute to air pollution (Hristov et al., 2011; Liu et al., 2017). Not only do ammonia emissions contribute to air pollution, but also to water eutrophication and disruption of aquatic ecosystems, and they cause pH changes in soil, affecting terrestrial environments (Hristov et al., 2011; Bougouin et al., 2016). It is estimated that 82 % of the total ammonia emissions around the world are due to agriculture (Leytem and Dungan, 2014).

1.5.3. Methods Considered to Mitigate Methane Emissions and Increase Feed Efficiency

To decrease methane emissions, there are two main options: either decrease the population of ruminants or find a way to mitigate methane production. The former option is not feasible; with the ever-increasing global human population, livestock farming is forecast to play a large role as a food source, as well as offering valuable commodities in fibres and livelihoods in farming ruminants (Attwood and McSweeney, 2008). With this in mind, there has already been a reasonable amount of activity and interest in researching alternative ways of reducing methane without affecting animal production. This has resulted in a number of methods being outlined as potential ways to mitigate methane production in ruminants. The Intergovernmental Panel on Climate Change (IPCC) listed the potential for use of feed additives such as bioactive compounds or fats, antibiotics, nutrient supplements such as nitrate or sulphate, or inhibitors that specifically target the methanogenic archaeal population. Other methods offer long-term management of methane emissions, such as breeding for a low emission phenotype or influencing the microbiome through vaccines, defaunation or bacteriophages (Smith et al., 2014).

Whilst bacteriophage therapy has been mentioned as an alternative for microbiome modulation, it has received less attention in the literature than other methods (Lan and Yang, 2019) and more work needs to be done in this area. Phages (both bacteriophages and archaeal viruses) are abundant in the rumen, and although each individual is specific, they are numerous and vary greatly, targeting a wide range of bacterial and archaeal hosts and causing host death through cell lysis (Klieve and Hegarty, 1999; McAllister and Newbold, 2008). Through this action, phages can influence populations, reducing the size of communities with precision. Because of this ability, phage therapy is a method of biocontrol worth considering, and may reduce methane and increase feed efficiency by targeting those organisms that contribute to this. For example, it is thought that the use of phages to target and control methanogens would decrease methanogenesis (Klieve and Hegarty, 1999; Gilbert et al., 2015). Phages have been successfully isolated against methanogens, including a phage belonging to the order Caudovirales from the rumen that targets Methanobrevibacter smithii (Gilbert et al., 2015). Other anaerobic environments such as digesters have yielded tailed phages against thermophilic methanogens, such as the virulent ψM1 against Methanothermobacter marburgensis (formerly Methanobacterium thermoautotrophicum) (Meile et al., 1989; Gilbert et al., 2015), and lytic ФF1 and ФF3 against various strains of Methanobacterium thermoformicicum (Nölling et al., 1993). Not only are lytic phages of interest, but discovery of prophages in genomes also offers insight into phage infection and interactions. As mentioned in section 1.4.3.2, sequencing of the rumen methanogen Methanobrevibacter ruminantium revealed the presence of the prophage φ-mru, and the putative lytic enzyme endoisopeptidase (PeiR) that it encodes was shown to lyse cells of M. ruminantium in culture, which may be a valid alternative method of biocontrol in the rumen (Leahy et al., 2010). A recent development of this work found that by fusing this enzyme to the polyhydroxyalkanoate enzyme, PeiR could be expressed and displayed on the surface of bionanoparticles produced by Escherichia coli, which had a lytic effect on a wide range of ruminal methanogens in vitro (Altermann et al., 2018).

Phages targeting predominant bacteria have also been described, and recently the genomes of some of these phages have been sequenced (Gilbert et al., 2017), with successful isolation of many morphologically distinct phages against Streptococcus bovis, a bacterium involved in ruminal lactic acidosis and therefore a prime target for biocontrol with phages (Klieve and Hegarty, 1999; Gilbert and Klieve, 2015). Despite ample examples of lytic phage isolation from the rumen, there is little evidence to date of these phages being marketed or commercially available as a therapeutic for microbiome modulation. Phage therapy has its disadvantages: since phages are highly specific, they may only target very few strains of the same species, and they need to come into contact with the target, both of which are issues in the dynamic rumen environment (McAllister and Newbold, 2008). It is also imperative that there is a deep understanding of the rumen ecosystem and communities that the phage will infect (Gilbert et al., 2015).


1.6. Aims and Objectives

• To explore the ammonia producing bacterial species in the rumen microbiome (chapter 1 and chapter 3) and define ammonia production phenotypes in vitro such that bacterial species can be allocated to these groups to allow for in silico comparisons (chapter 3).
• To carry out in silico comparisons of genomes (chapter 4) and transcriptomes (chapter 5) of the ammonia production groups to identify a unique signature for the HAPs.
• To isolate bacteriophages from rumen-associated samples, such as fresh rumen fluid and faeces. Whilst these phages should be against the targets described previously, phages against predominant ruminal bacteria are also important and contribute to furthering the understanding of the rumen virome (chapter 6).
• To utilise counter current chromatography-based techniques to determine the efficacy of this as a potential and novel method to separate bacteriophages (chapter 7).


Chapter 2

2. Materials and Methods

This chapter contains all materials and methods used throughout the studies herein. For simplification and ease of reading of the research chapters (chapters 3-7), a simple overview of the relevant experimental procedures is described in each chapter, with references to chapter 2 for further explanation. Furthermore, all scripts created for data analysis are named within the descriptions below and scripts themselves can be found online, as well as walkthroughs for genomics, transcriptomics and bacteriophage analyses that provide more detail about the steps taken, tools and commands used (https://github.com/jessalyn298/thesis_scripts, doi:10.5281/zenodo.4013479).

2.1. Rumen Fluid

Rumen fluid was obtained under the authority of licences under the UK Animals (Scientific Procedures) Act 1986 (see footnote 2).

Rumen fluid for the purpose of supplementing culture medium was prepared as a mixture from four fistulated Holstein-Friesian dry cows, which grazed ryegrass pasture through the day and were offered grass silage overnight. Hand grabs of rumen contents were taken in the morning before feeding and strained into four pre-warmed air-tight flasks, and transported to the laboratory, where the fluid from the different cows was mixed in roughly equal proportions before straining through two layers of muslin. The rumen fluid was then clarified to remove large particulate matter by spinning at 21859.2 x g at 4 °C for 25 minutes (Sorvall RC-26 Plus, SLA-1500 rotor). The supernatant was then removed and stored in plastic screw-lid containers at -20 °C.

For phage screening in chapter 6, fresh rumen fluid was obtained from cows and sheep. Hand grabs of rumen contents were taken in the morning from three fistulated 10-year-old dry Holstein-Friesian cows. These were squeezed and strained, then mixed in roughly equal proportions in an air-tight pre-warmed flask. The cows were grazing on ryegrass pasture and given ~500 g dairy concentrates in the morning during sampling. Similarly, hand grabs of rumen contents were obtained in the morning from a mixture of three seven-year-old Aberdale cross Texel fistulated sheep, grazing on ryegrass with ~300 g sugar beet and grass nuts supplemented each morning, and access to a Crystalyx lick. The solids from the hand grabs were squeezed and the rumen fluid mixed and poured into a pre-warmed Duran glass bottle until full to reduce the air space. The flasks were transported to the laboratory where they were further processed immediately, as detailed in section 2.11.1.

2 https://www.aber.ac.uk/en/media/departmental/rbi/staff-students/ethics/Experimental-work-involving-animalsat-Aberystwyth-University-En-v-2.pdf

2.2. Culture Media

All rumen bacteria were cultured and maintained using Hobson’s M2 medium (Hobson, 1969), using 10 ml/L sodium lactate 60 % (w/v), tryptone (Melford, Ipswich UK) as the casein hydrolysate, and minerals (b) consisting of KH2PO4, 3.0 g/L; (NH4)2SO4, 6.0 g/L; NaCl, 6.0 g/L; MgSO4·7H2O, 0.6 g/L; CaCl2·2H2O, 0.6 g/L. Rumen fluid was obtained and clarified as described previously (section 2.1), and defrosted before use. All components except for the cysteine HCl were combined, made up to volume with distilled water and brought to the boil in a microwave to reduce oxygen content. The mixture was then mixed with a magnetic stirrer and kept warm on a hotplate (>50 °C), whilst carbon dioxide was bubbled through. Cysteine HCl (1.0 g/L) was added at this point, and once adequately dissolved, the broth medium was decanted in aliquots of either 7 ml or 10 ml into Hungate tubes gassed with CO2, capped and autoclaved at 121 °C for 15 minutes at 15 psi. After autoclaving, any tubes with broth that appeared dark in colour or pink were discarded. Growth of bacterial cultures was necessary for chapters 3 through 6.

Solid Hobson’s M2 medium for bacterial culturing and phage screening in chapter 6 was prepared as described above, but with microbiological agar powder added to the other ingredients before boiling, to a final concentration of 2 %. This was decanted into Duran bottles or glass Universals that were equilibrated briefly in the anaerobic cabinet and then gassed with carbon dioxide, capped, then autoclaved and maintained at 55 °C before use. Liquid agar was poured into Petri dishes on the bench and left to equilibrate overnight in the anaerobic cabinet with a gas mix of 10:10:80 of CO2, H2 and N2 (Whitley A35 Anaerobic Workstation, Don Whitley Scientific, Shipley UK) to reinstate anaerobic conditions. Alternatively, agar was poured from glass Universals into Petri dishes directly in the anaerobic cabinet and left there to set before use. Soft agar for overlays was produced in the same way as bottom agar, but with a decreased quantity of agar powder to obtain a final concentration of 0.8 %, then 10 ml aliquots were placed into Hungate tubes, which were then capped, autoclaved, and either retained at 55 °C until immediate use or left to solidify at room temperature, then melted in a water bath immediately prior to use.
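
The required mass of agar powder follows directly from the target concentration and the batch volume. A minimal sketch of this calculation is given here, assuming that the 2 % and 0.8 % figures are weight per volume (g per 100 ml), which the text does not state explicitly; the batch volume of 500 ml is a hypothetical example.

```python
def agar_mass_g(volume_ml: float, percent_w_v: float) -> float:
    """Grams of agar powder needed for volume_ml of medium.

    Assumes the percentage is weight per volume (g per 100 ml).
    """
    return volume_ml * percent_w_v / 100.0

BOTTOM_AGAR_PERCENT = 2.0  # solid medium for plates
SOFT_AGAR_PERCENT = 0.8    # soft agar for overlays

batch_ml = 500.0  # hypothetical batch volume
print(f"Bottom agar: {agar_mass_g(batch_ml, BOTTOM_AGAR_PERCENT):.1f} g per {batch_ml:.0f} ml")
print(f"Soft agar:   {agar_mass_g(batch_ml, SOFT_AGAR_PERCENT):.1f} g per {batch_ml:.0f} ml")
```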

2.3. Sub-culturing

To maintain anaerobic conditions in Hungate tubes, cultures were propagated by using a needle and syringe inserted through the rubber septum of the tube. An aliquot of 0.3 ml (~5 %) of either a defrosted frozen culture or other previously grown liquid culture was inoculated into 7 ml of fresh Hobson’s M2 broth medium. When a frozen culture was used, three rounds of sub-culturing were used to ensure adequate revival. Cultures were incubated at 39 °C, usually overnight, but the time varied depending on the culture. Alternatively, a 5 μl loop was used to streak liquid broth culture onto an agar plate in the anaerobic cabinet, which was incubated there under anaerobic gas mixture at 39 °C. A bacterial culture stock was created by adding glycerol to a final concentration of 20 % (w/v) to a fully grown culture in a Hungate tube, which was stored at either -20 °C or -80 °C.

2.4. Gram Staining and Cellular Morphology

Gram staining was used as a quick and convenient method of detecting whether there was likely to be contamination in a bacterial culture. According to the manufacturer’s instructions, a small amount of culture was placed onto a glass slide, heat fixed and stained using crystal violet, Gram’s iodine, and safranin, each for one minute, and the supplied differentiator (Pro-Lab Diagnostics, Wirral UK). Cells were then observed using oil immersion microscopy. This technique was used for chapters 3 through 6.

2.5. Growth Curves

Approximate growth curves were created by measuring culture density using spectrophotometry over time. The purpose of creating growth curves was to identify a suitable time point at which a culture would be dense enough for further characterisations in chapters 3 and 5, but not enter the stationary or death phase. The anaerobic requirements of these cultures made typical plate reader growth assays impractical. Instead, measurements were carried out by growing the cultures as usual in the Hungate tubes, with suitable attachments to measure culture density with a spectrophotometer set to 600 nm (Jenway 6715 UV/Vis spectrophotometer, Cole-Parmer, Stone UK) or 590 nm (Portable Colorimeter model 45, Fisher Scientific, Hampton New Hampshire). Resulting graphs were plotted in Microsoft Excel (Office 16). All measurements were carried out in triplicate, except for some slower growing cultures, where a growth curve was approximated using six cultures, with a 12-hour staggered start. This allowed for the growth over a full 24 (or 48) hour period to be measured within reasonable working hours.
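
The staggered-start approach can be illustrated with a short sketch in Python. It assumes, as one possible reading of the description above, that the six cultures form two batches of three tubes inoculated 12 hours apart, and that each reading is placed on the curve by culture age (time of reading minus time of inoculation). All optical density values and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def merge_staggered(batches):
    """Combine readings from batches inoculated at different times.

    batches: list of (start_hour, readings), where readings is a list of
    (clock_hour, [optical density of each replicate tube]).
    Returns [(culture_age_h, mean_od)] sorted by culture age.
    """
    by_age = defaultdict(list)
    for start_hour, readings in batches:
        for clock_hour, ods in readings:
            by_age[clock_hour - start_hour].extend(ods)
    return [(age, mean(ods)) for age, ods in sorted(by_age.items())]

# Hypothetical readings taken at clock hours 0, 4 and 8 of one working day.
# The "early" batch was inoculated 12 hours before the "late" batch.
early = (-12, [(0, [0.45, 0.47, 0.46]), (4, [0.80, 0.82, 0.78]), (8, [0.95, 0.97, 0.96])])
late = (0, [(0, [0.02, 0.02, 0.02]), (4, [0.10, 0.12, 0.11]), (8, [0.30, 0.28, 0.29])])

curve = merge_staggered([early, late])
for age, od in curve:
    print(f"{age:>2} h  OD {od:.2f}")
```

Readings from one working day then cover culture ages from 0 to 20 hours, which is how a 24-hour curve can be approximated without overnight sampling.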

2.6. DNA Extraction

DNA was extracted from bacterial cultures using the FastDNA™ Spin Kit for Soil (MP Biomedicals, Solon Ohio). A 2 ml aliquot of culture was spun at 16,060 x g for 10 minutes at 4 °C in a microcentrifuge (Biofuge Fresco, Heraeus Instruments, fixed-angle rotor 3324, Hanau, Germany). The cell pellet was resuspended in 800 μl of kit-supplied sodium phosphate buffer and transferred to the Lysis Matrix E, and a further 122 μl of MT Buffer was added before homogenising samples three times on the FastPrep sample preparation system (FastPrep24 5G, MP Biomedicals, Solon Ohio) for 30 seconds at a speed of 6.0 m/s, with a 30 second incubation step on ice in between. DNA was eluted from the column filter in 50 μl of DES, and 2 μl was spotted on a blanked Take3 plate and quantified by spectrophotometry (Epoch, Biotek, Winooski, Vermont).

For a quicker method of DNA extraction without the use of the kit, DNA was also extracted from bacterial pellets using thermal lysis. A frozen bacterial pellet was defrosted and resuspended in molecular grade water, boiled at 100 °C for 15 minutes, cooled on ice for 10 minutes, then spun at 15,000 x g for 5 minutes at 4 °C in a centrifuge (Centurion Scientific K2015R, rotor BRK5424, Chichester UK), then retained on ice. The supernatant was then tested for DNA by spotting 2 μl on a blanked Take3 plate and quantifying by spectrophotometry (Synergy H1 Hybrid Reader, Biotek, Winooski, Vermont).

2.7. Bacterial Whole Genome Sequencing

Samples of DNA extracted from bacterial cultures using the extraction kit were diluted to a final concentration of around 30 ng/μl in 50 μl, and genome sequencing was provided by MicrobesNG (http://www.microbesng.uk; Birmingham UK), which is supported by the BBSRC (grant number BB/L024209/1). The bioinformatic analysis and genome assembly that this service provides were also used. Their standard analysis pipeline included using Kraken to find the closest available reference genome, BWA-MEM to align reads to this genome and assess quality, as well as de novo genome assembly using SPAdes and automated annotation using Prokka (https://microbesng.com/microbesng-faq/, accessed 24/02/2020). The results of this are further explained in chapter 3.
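
The dilution to around 30 ng/μl in 50 μl follows the usual C1V1 = C2V2 relationship. A minimal sketch is given here; the stock concentration of 120 ng/μl is a hypothetical example and the function name is illustrative.

```python
def dilution_volumes(stock_ng_per_ul: float,
                     target_ng_per_ul: float = 30.0,
                     final_volume_ul: float = 50.0):
    """Return (stock volume, diluent volume) in microlitres using C1*V1 = C2*V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("Stock is more dilute than the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul

# Hypothetical extract quantified at 120 ng/ul.
stock_ul, water_ul = dilution_volumes(120.0)
print(f"{stock_ul} ul DNA + {water_ul} ul water")  # 12.5 ul DNA + 37.5 ul water
```

Extracts below the target concentration cannot be diluted to it, so the sketch raises an error rather than returning a negative volume.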

2.8. 16S Analysis

Once DNA was extracted, 1 μl was used as a template in a PCR reaction with 400 nM universal 16S primers, the sequences for which can be found in Table 2.1. The DNA was amplified according to the manufacturer’s instructions using 0.5 U Taq polymerase and supplied reaction buffer, which contained dNTPs and magnesium chloride along with stabilisers and enhancers (PCRBIO HiFi polymerase, PCRBIOSYSTEMS, London UK). The reaction volume was made up to 25 μl with PCR grade water. Samples underwent thermocycling (Applied Biosystems 2720 Thermocycler V.208, Foster City California), with an initial denaturation step at 95 °C for 1 minute, followed by 35 cycles of 95 °C for 15 seconds, 56 °C for 30 seconds and 72 °C for 60 seconds, then a final chain elongation step at 72 °C for 5 minutes, after which samples were held at 4 °C until further use.
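
The thermocycling programme described above can be written down as a small data structure, which also gives the programmed run time. This is a sketch only: the dictionary layout and names are illustrative, and the calculation ignores the time the instrument takes to ramp between temperatures.

```python
# Temperatures in degrees Celsius, durations in seconds, as given in the text.
PROGRAM = {
    "initial_denaturation": (95, 60),
    "cycles": 35,
    "cycle_steps": [
        ("denaturation", 95, 15),
        ("annealing", 56, 30),
        ("extension", 72, 60),
    ],
    "final_extension": (72, 300),
    "hold_temperature": 4,
}

def programmed_seconds(program) -> int:
    """Total programmed time, excluding ramping and the final hold."""
    per_cycle = sum(seconds for _, _, seconds in program["cycle_steps"])
    return (program["initial_denaturation"][1]
            + program["cycles"] * per_cycle
            + program["final_extension"][1])

total = programmed_seconds(PROGRAM)
print(f"{total} s ({total / 60:.2f} min)")  # 4035 s (67.25 min)
```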

For the phenotype confirmation experiments in chapter 3, the DNA was extracted from bacterial cultures using the MP Biomedicals kit as described above and used as the DNA template. PCR products were visualised using a 1 % agarose gel with 0.5 X Tris-acetate-EDTA (TAE) buffer and 0.5 X GelRed (Biotium, Fremont California). BBS loading dye supplied with the DNA extraction kit (MP Biomedicals, Solon Ohio) was added to the PCR products and samples were run at 100 V for ~70 minutes against a 100 bp ladder (Quick-Load Purple 100 bp Ladder, New England Biolabs). PCR products have an expected size of ~450 bp, and these bands were excised and recovered from the gel using the Isolate II PCR and Gel Kit (Bioline, Memphis Tennessee). The excised DNA was then sent for sequencing in-house on the Sanger Sequencing platform, along with two additional samples: forward and reverse PCR products of E. pyruvativorans Isol6 that did not undergo gel excision, to see whether the same results could be obtained with fewer steps.

For the transcriptomic experiments in chapter 5, DNA for 16S analysis was first extracted by thermal lysis and used directly as the DNA template in the PCR reaction described above. PCR products were then run on a gel as before, but using a 100 bp HyperLadder (Bioline, Memphis Tennessee). The PCR products were sent directly for sequencing in-house on the Sanger Sequencing platform.

Table 2.1 Universal 16S primers used to amplify the V6-V8 region of rRNA.

Primer    Sequence
F968GC    5’-CGCCCGCCGCGCGCGGCGGGCGGGGCGGGGGCACGGGGGGAACGCGAAGAACTTAC-3’
R1401     5’-CGGTGTGTACAAGACCC-3’

2.9. Sample Preparation for Phenotype Confirmation

Each of the bacteria of interest was sub-cultured into fresh Hobson’s M2 broth and incubated for the designated time for each bacterium as determined by the growth curves (Appendix Figure 10.1), such that samples were taken during the mid-log growth phase. For each bacterium, seven replicates were sub-cultured, six of which were used for further processing, and the remaining one was further incubated as a control to ensure continued growth of the culture. For continuity, the same replicate underwent all of the experiments explained below, such that results could be tracked and correlated. This means, for example, that the results for replicate one in ammonia production, volatile fatty acid (VFA) measurements and Fourier transform infrared spectroscopy (FTIR) could be correlated (chapter 3).

2.9.1. Measuring Ammonia Production

An aliquot (1 ml) of bacterial culture grown to mid-log phase was added to 50 μl concentrated hydrochloric acid, mixed, and stored at -20 °C until analysis. Ammonia quantification was then carried out in-house using a colorimetric method, reacting ammonia with salicylate and dichloroisocyanurate, and analysed using a segmented flow analyser (ChemLab system 40, ChemLab Instruments Ltd, Essex UK). Sample readings were compared to a standard curve and multiplied by the dilution factor to obtain a final concentration of ammonia present in the samples. Graph creation and statistical tests were carried out using R (R Core Team, 2018) in RStudio (RStudioTeam, 2016), using packages ggplot2 (Wickham, 2016), agricolae (de Mendiburu, 2019), ggpubr (Kassambara, 2020), and dplyr (Wickham et al., 2019) (ammonia_boxplot.R). A Kruskal-Wallis test was used with Bonferroni correction to determine whether the amount of ammonia produced by a bacterium was significantly different to that of the growth medium baseline level and to other samples (chapter 3).
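
The conversion of an instrument reading into a concentration can be sketched as a linear standard curve followed by multiplication by the dilution factor. The example below is an illustration in Python and not the analysis used in this work, which was carried out in R. The standards, readings and the assumption of a linear response are hypothetical, as is the dilution factor of 1.05, which would correspond to 1 ml of culture plus 50 μl of acid if no further dilution were made.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def concentration(reading, slope, intercept, dilution_factor):
    """Read a concentration off the standard curve and correct for dilution."""
    return (reading - intercept) / slope * dilution_factor

# Hypothetical ammonia standards (mM) and their instrument readings.
standards_mM = [0.0, 1.0, 2.0, 4.0]
standard_readings = [0.05, 0.25, 0.45, 0.85]
slope, intercept = fit_line(standards_mM, standard_readings)

# Hypothetical sample reading; dilution factor assumed to be (1000 + 50) / 1000.
sample_mM = concentration(0.65, slope, intercept, dilution_factor=1.05)
print(f"slope {slope:.3f}, intercept {intercept:.3f}, sample {sample_mM:.2f} mM")
```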

2.9.2. Measuring Volatile Fatty Acid Production

For each replicate, 1 ml of mid-log culture was added to 50 μl concentrated orthophosphoric acid, mixed and stored at -20 °C. For analysis, samples were defrosted, filtered through 0.45 μm pore-size filters into a small vial and capped ready for analysis using gas chromatography in-house. An aliquot of 1 μl of sample was injected at 250 °C into the Varian CP3380 chromatograph, carried by hydrogen at 5 psi at a rate of ~20 ml/min through a 15 metre HP-FFAP column (Agilent, Santa Clara California) and detected by a flame ionisation detector. Resulting data was analysed using a Kruskal-Wallis test and Bonferroni correction, and displayed graphically using R (R Core Team, 2018) in RStudio (RStudioTeam, 2016) (VFA_multiplgraphs.R and VFA_significance_testing.R). This analysis pertains to chapter 3.
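
For readers unfamiliar with the statistics named above, the following sketch shows the Kruskal-Wallis H statistic and a Bonferroni adjustment in plain Python. It is an illustration only and not the R code used for this work: the H statistic is computed without the correction for tied values, no p-value is derived from it, and all data and p-values are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic using average ranks (no tie correction)."""
    pooled = sorted(value for group in groups for value in group)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values share the mean of ranks i+1 .. j.
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    n = len(pooled)
    rank_term = sum(sum(ranks[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical measurements for three fully separated groups.
h = kruskal_wallis_h([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(f"H = {h:.2f}")

# Hypothetical raw p-values from three pairwise comparisons.
adjusted = bonferroni([0.004, 0.020, 0.300])
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

The second comparison illustrates the cost of the correction: a raw p-value of 0.020 is no longer below 0.05 once three comparisons are accounted for.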

2.9.3. Fourier Transform Infrared Spectroscopy

For each replicate, 2 ml of culture was spun in a centrifuge at 10,000 x g for 10 minutes at room temperature. From the uppermost volume of supernatant, 200 μl was aliquoted into a fresh tube and frozen in liquid nitrogen. The remaining supernatant was discarded, and the remaining cell pellet was also frozen in liquid nitrogen. Samples were stored at -80 °C until analysis.

Before analysis, cell pellets were defrosted, resuspended in 200 μl of distilled water and transferred to a cryotube, along with a tungsten bead, and placed in ice. Samples were subjected to 2 minutes on a tissue lyser at speed 30 (Qiagen Tissue Lyser, Hilden Germany) and placed back on ice. A 5 μl aliquot of each sample was spotted once onto a 96-well silicone plate, with replication of 10 % of samples for quality assurance, briefly dried on a dry heating block at 40 °C and then inserted into the FTIR analyser (Vertex 70 spectrophotometer, Bruker Optik GmbH, Germany). Supernatants were defrosted, mixed well and 5 μl of each sample was spotted onto the sample plate, then dried and analysed as described above. Resulting data was converted to xy (wavelength against absorbance) data and analysed using PyChem (v3.0.5g; Jarvis et al., 2006). Spectra, principal components analyses (PCA) and discriminant function analyses (DFA) were all produced and carried out in PyChem, with cross validation implemented by randomly assigning replicates within one ammonia production group (formed of three species each) to either train or test equally using a random number generator, then using the 10 % checks as validation. The cross validation was repeated six times each for the pellet and supernatant data (chapter 3).
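
The random, equal assignment of replicates to train and test sets can be sketched as follows. This is an illustration in Python and not the procedure as implemented in PyChem; the species labels, the seed and the group size of three species with six replicates each are assumptions made for the example.

```python
import random

def split_train_test(replicates, rng):
    """Shuffle the replicates and split them equally into train and test sets."""
    shuffled = list(replicates)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical ammonia production group: three species, six replicates each.
group = [f"{species}_rep{rep}"
         for species in ("speciesA", "speciesB", "speciesC")
         for rep in range(1, 7)]

rng = random.Random(2020)  # fixed seed so the assignment can be reproduced
splits = [split_train_test(group, rng) for _ in range(6)]  # six repeats

for number, (train, test) in enumerate(splits, start=1):
    print(f"repeat {number}: {len(train)} train, {len(test)} test")
```

Fixing the seed is a design choice for reproducibility; a different seed gives a different but equally valid assignment.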

2.10. RNA Extraction for Bacterial Transcriptomics

Bacterial cultures of the three hyper ammonia producers and three no ammonia producers (see section 3.3.1) had total RNA extracted for transcriptomic analysis (chapter 5). Four RNA samples were prepared for each species by growing tubes of 7 ml culture to mid-log phase (as determined by growth curves, Appendix Figure 10.1), after which the cellular matter was pelleted by spinning at 15,000 x g for 15 minutes at 4 °C (Eppendorf 5810R, fixed angle rotor 34-6-38). The fresh pellets were then resuspended using the lysis buffer supplied with the MP Biomedicals FastRNA™ SPIN Kit for Microbes. RNA extraction was then completed as per the manufacturer’s instructions, using the FastPrep-24 instrument (MP Biomedicals, Solon Ohio) at recommended settings, with the minimum recommended settings at 20 °C for the spin steps (Centurion Scientific K2015R centrifuge, rotor BRK5424). In the case of A. sticklandii 12662 and R. flavefaciens 007c, as these bacteria have a lower growth density, two 7 ml tubes of culture were grown per RNA sample, the pellets were resuspended in 400 μl each of the lysis buffer (half the recommended volume for one sample), and the two resuspended pellets were then combined in one of the lysing matrix tubes supplied in the kit. RNA was eluted in the final step using 20 μl of the supplied DNAse/RNAse free water.

An extra 7 ml tube was inoculated for each species at the same time as those for the transcriptomics analysis and the bacterial cells were also pelleted. These were then frozen at -80 °C, used for DNA extraction using the thermal lysis method, and underwent 16S rRNA analysis (section 2.8), to ensure that the cultures grown for RNA extraction were indeed the species expected.

After RNA was extracted, it was quantified and quality analysed using a spectrophotometric approach (Take3 plate, Synergy H1 Hybrid Reader, Biotek, Winooski Vermont). To each sample, 1 unit of DNAse I (RNAse free, Thermo Scientific, Loughborough UK), which was diluted in 10 X reaction buffer with MgCl2, was added per microgram of RNA, along with 1 X reaction buffer and DEPC-treated water as per the manufacturer’s instructions. The samples were incubated at 37 °C for 30 minutes, then 1 μl of 50 mM EDTA was added per microgram of RNA and incubated at 65 °C for 10 minutes. The final RNA concentration was determined again using spectrophotometry and samples were frozen at -80 °C.
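As a worked example of the per-microgram amounts stated above, the following sketch calculates the DNAse I and EDTA required for a sample. The function name and the example concentration and volume are hypothetical; only the ratios (1 unit of DNAse I and 1 μl of 50 mM EDTA per microgram of RNA) are taken from the text.

```python
def dnase_treatment(rna_ng_per_ul, volume_ul):
    """Return total RNA (ug) and the DNAse I units and EDTA volume it calls for."""
    rna_ug = rna_ng_per_ul * volume_ul / 1000.0   # ng -> ug
    return {
        "rna_ug": rna_ug,
        "dnase_units": rna_ug * 1.0,   # 1 unit of DNAse I per ug of RNA
        "edta_ul": rna_ug * 1.0,       # 1 ul of 50 mM EDTA per ug of RNA
    }

# Hypothetical sample: 250 ng/ul in a 20 ul eluate
amounts = dnase_treatment(250.0, 20.0)
print(amounts)
```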

Samples were sent on dry ice to the Genomics Core Technology Unit at Queen’s University, Belfast, where the samples underwent further quality control measures, library preparation and sequencing. Briefly, samples were prepared using the Roche KAPA RNA HyperPrep kit with 200 ng of total RNA, quality control on the library preparation step performed on the fragment analyser (HS NGS kit), using Qubit and KAPA quantification. Libraries were pooled in equimolar amounts and denatured before applying to the Illumina NextSeq High Output 75. The indices used for these libraries were from the KAPA Unique Dual-Indexed Adapter kit and the Illumina TruSeq HT kit.

2.11. Phage Screening

Unless otherwise stated, methods for the preparation of rumen fluid and faecal samples and the isolation, propagation and purification of bacteriophages were followed as published previously (Klieve, 2005; Klieve and Gilbert, 2005) (chapter 6).

2.11.1. Source of Phages

Samples of rumen fluid and faeces were obtained from fistulated sheep and cows. Rumen fluid was obtained as described in section 2.1. From the same animals, fresh faeces were caught in a gloved hand once excreted naturally from the sheep and collected into sterile plastic bags. Fresh faeces from cows were collected from the field that morning into plastic pots and transferred into centrifuge tubes.

2.11.2. Phage Filtrates

For both sheep and cows, phage filtrates were made from rumen fluid, as well as samples of faeces from three separate animals each, as per the published methods (Klieve, 2005; Klieve and Gilbert, 2005). The rumen fluid was aliquoted into centrifuge tubes, spun at 15,000 x g for 15 minutes at 4 °C, and the supernatants stored on ice and filtered through 0.45 μm pore-size low-protein binding PES syringe filters. Around 10 g of faeces was weighed out into centrifuge tubes and 10 ml of Phage Storage Buffer (PSB; 20 mM Tris.HCl, 200 mM NaCl, 20 mM MgCl2, 0.1 % (w/v) gelatin (Klieve, 2005)) was added, then incubated at room temperature for one hour with mixing on a rotator. After spinning at 15,000 x g for 15 minutes at 4 °C, the supernatant was filtered through a 0.45 μm pore-size PES syringe filter. Filtrates were wrapped in foil and stored at 4 °C.

2.11.3. Enrichments

A 40 μl aliquot of each phage filtrate was added to 1 ml bacterial culture maintained in Hobson’s M2 medium in the early stages of growth. This was then incubated at 39 °C overnight, then spun at 15,000 x g for 15 minutes at 4 °C, and the supernatants were refrigerated until use.

2.11.4. Polyethylene Glycol Precipitations

Phage particles were concentrated from lysates (samples collected after phage elution) or filtrates (filtered samples to test for phages) by addition of polyethylene glycol (PEG) and sodium chloride with modifications to methods published previously (Bourdin et al., 2014; Antibody Design Laboratories, 2015; Gutiérrez et al., 2018; Namonyo et al., 2018). A 5 X stock of 50 % (w/v) PEG 8,000 and 2.5 M NaCl was made. A 1 X working concentration of the PEG/NaCl stock was used to precipitate phages overnight at 4 °C, and precipitated phages were collected by spinning at 12,000 x g for 30 minutes at 4 °C. The supernatants were discarded and the pellet resuspended in a smaller volume of Fortier Buffer (FB; 20 mM Tris-HCl, 100 nM NaCl, 10 mM MgSO4 (Fortier and Moineau, 2009)), wrapped in foil and refrigerated.
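The dilution of the 5 X PEG/NaCl stock to a 1 X working concentration can be illustrated with a short calculation. This is a sketch under the assumption that the stock is added directly to the sample, so that the stock makes up one fifth of the final volume; the function name and the example sample volume are illustrative.

```python
def peg_stock_volume(sample_ml, stock_x=5.0, working_x=1.0):
    """Volume of stock to add to a sample so the mixture is at working_x."""
    # stock_x * v = working_x * (sample + v)  =>  v = sample * working_x / (stock_x - working_x)
    return sample_ml * working_x / (stock_x - working_x)

sample_ml = 0.8                        # hypothetical sample volume
stock_ml = peg_stock_volume(sample_ml)
dilution = stock_ml / (sample_ml + stock_ml)
final_peg_percent = 50.0 * dilution    # from the 50 % (w/v) PEG 8,000 stock
final_nacl_molar = 2.5 * dilution      # from the 2.5 M NaCl stock
print(stock_ml, final_peg_percent, final_nacl_molar)
```

Under this assumption, one volume of stock is added to four volumes of sample, giving final concentrations of 10 % (w/v) PEG 8,000 and 0.5 M NaCl.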

2.11.5. Soft Overlay Technique

The soft overlay technique isolates bacteriophages using a solid agar base and a soft agar overlay that contains bacteria, such that the cells grow confluently to form a lawn; it is recommended for rumen phage isolation (Klieve, 2005). A lawn of host bacteria was achieved by adding 1 ml of overnight (or actively growing) culture to 3 ml of warm 0.8 % Hobson’s M2 agar (soft agar), mixing and pouring over set bottom agar (1.5 % Hobson’s M2 agar) in an anaerobic cabinet with a gas mix of 10:10:80 of CO2, H2 and N2 (Whitley A35 Anaerobic Workstation, Don Whitley Scientific).

For a spot test, 10 μl of phage filtrate was spotted in triplicate per phage filtrate sample onto the set soft agar overlay. Once dried, plates were inverted and incubated for >24 hours at 39 °C, or until lawn growth was visible.

For a plaque assay, 10 μl of phage filtrate was added to 1 ml of overnight culture, left to incubate for no more than 15 minutes, then mixed with 3 ml soft agar and poured over the bottom agar. Once set, plates were inverted and incubated for >24 hours at 39 °C, or until lawn growth was visible.

2.11.6. Phage Screening, Isolation and Purification

Following a spot test of phage filtrates, any spots positive for phage activity would appear as an area of lysis or clearing on the otherwise translucent-opaque bacterial lawn. Any positive spots were scraped using a 5 μl inoculating loop to remove the top soft agar, which was then mixed well in a small volume of PSB (see section 2.11.2), vortexed briefly, left to stand at room temperature for 30 minutes and stored at 4 °C until testing again on the same bacterial host.

Upon a positive result of propagation from the scraped sample, a plaque assay was carried out and after incubation overnight, well isolated plaques were picked using a pipette tip, placed into a small volume of PSB, vortexed briefly, left to stand for half an hour at room temperature, and tested again using a plaque assay. Dilutions of the picked plaques were made using PSB to avoid plaques overlapping in the early stages of purification, so that single isolated plaques with one morphology type could be observed.

Once the plaque morphology was homogenous, or at least three rounds of purification had taken place, the phage was eluted from a confluent plaque assay plate by adding 5 ml of PSB to the plate, macerating the agar gently using a plastic spreader, and leaving it to incubate at room temperature on a rocker, for at least 30 minutes. The eluate was aspirated using a wide-bore pipette tip into a microcentrifuge tube, spun at 15,000 x g for 2 minutes to pellet any agar, and the supernatant was filtered through a 0.45 μm pore-size PES syringe filter. Eluates were stored at 4 °C.

2.11.7. Alternative Growth Media to Improve Lawn Growth

In order to improve growth of bacterial lawns, alternatives were sought. Casamino acids were added to the basal Hobson’s M2 medium for C. aminophilum cultures to a final concentration of 15 g/L (Rychlik and Russell, 2002). Alternatively, Reinforced Clostridial Medium (RCM) was made, which consisted of 13 g/L yeast extract, 10 g/L peptone, 5 g/L glucose, 1 g/L soluble starch, 5 g/L NaCl, 3 g/L sodium acetate, and 0.5 g/L cysteine hydrochloride, with 2 % agar, and supplemented with 1.5 % casamino acids as recommended by the ATCC for the growth of C. aminophilum (ATCC). The addition of 15 g/L Bacto™ Peptone (BD Biosciences, Sparks, USA) to Hobson’s M2 medium was used to supplement E. pyruvativorans (Wallace et al., 2004). To try to improve growth of R. flavefaciens, cellobiose was added to the M2 medium to a final concentration of 2.5 g/L in addition to the 2 g/L already present (Saluzzi et al., 2001). A stock of each supplement was made up with distilled water, warmed to dissolve, then, in the anaerobic cabinet, filter sterilised through a 0.22 μm pore-size PES syringe filter. To indicate whether or not these additions had any effect on growth, the optical density of the cultures after 14, 16 and 22 hours was measured using a spectrophotometer at 600 nm (Jenway 6715).

2.11.8. DNA Extraction

Phages from 800 μl of eluates were precipitated using PEG/NaCl overnight and resuspended in a volume one tenth of the initial volume, 40 μl of which was then used for DNA extraction. Controls comprised a “host lysate sample”, which was made by taking an aliquot of host bacterial liquid culture, vortexing briefly and boiling at 100 °C for 15 minutes to lyse the cells, before spinning at 15,000 x g for 10 minutes, and filtering the supernatant through a 0.45 μm pore-size PES syringe filter. Of this, 600 μl was precipitated overnight with PEG/NaCl, alongside a sample of molecular grade water. These controls also underwent DNA extraction. The FastDNA™ Spin Kit for Soil (MP Biomedicals, Solon Ohio) was used to extract DNA from the lysate samples, using the FastPrep-24 for 30 seconds at 6.0 m/s, and spinning at 14,000 x g for 5 minutes at 18 °C. DNA was eluted from the column with 30 μl of supplied DES water.

The DNA concentration was determined using the Qubit fluorometer and the high spectrum DNA assay (Qubit 3 Fluorometer, Invitrogen by Thermo Fisher Scientific). Samples with a low DNA concentration were concentrated using a DNA Speed Vac (DNA 100, Savant), or in one case the sample was re-extracted using 900 μl of phage eluate, combining this with 300 μl of the supplied sodium phosphate buffer in the first step. In this case, DNA was eluted from the column first with 30 μl of DES water, then a second time with 50 μl. The concentrations of DNA in the re-extracted and concentrated samples were measured again, and all DNA samples were diluted with nuclease free water to achieve a final concentration of ~2 ng/μl in 10 μl, as necessary for sequencing.
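The dilution to ~2 ng/μl in 10 μl follows the usual C1V1 = C2V2 relationship, sketched below. The function name and the example stock concentration are hypothetical; the target concentration and final volume are those stated above.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=2.0, final_ul=10.0):
    """Return (DNA volume, water volume) in ul to reach the target concentration."""
    if stock_ng_per_ul < target_ng_per_ul:
        # Such samples would need concentrating or re-extracting first
        raise ValueError("stock is more dilute than the target concentration")
    dna_ul = target_ng_per_ul * final_ul / stock_ng_per_ul   # C1*V1 = C2*V2
    return dna_ul, final_ul - dna_ul

# Hypothetical Qubit reading of 8 ng/ul
dna_ul, water_ul = dilution_volumes(8.0)
print(dna_ul, water_ul)
```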

Quality of the DNA was tested using spectrophotometry via the 260/280 ratio (Epoch, Biotek Instruments). In addition, 5 μl of each DNA sample was diluted such that <30 ng/μl was loaded with 1 X BBS loading dye (MP Biomedicals, Solon Ohio) in a 0.5 % agarose gel, using 0.5 % TAE buffer and 1 X GelRed (Biotium, Fremont California), alongside a 1 kb Hyperladder™ (Bioline, Memphis Tennessee) and run at ~90 V for ~50 minutes. To gauge the expected behaviour of the bacterial genome, a one in 20 dilution of previously extracted DNA from the host bacteria (using the same kit) was loaded on the gel, alongside the other two controls.

2.11.9. Solvent Stability and Host Range Testing

To 100 μl of phage eluate, an equal volume of either chloroform or PSB was added, mixed well, and refrigerated for two hours, before testing using a spot test on the host bacterium (Klieve et al., 2004). Dilutions of the phages were carried out using one in ten dilutions down to 10⁻⁶.

Phage eluates were spotted in ten-fold dilutions down to 10⁻⁶ onto other potential hosts, which comprised a different strain from the isolation host (B. fibrisolvens JW11), three strains of a different species (B. hungatei strains JK615, Su6 and DSM10295), and three strains of the same genus (B. sp. strains DSM10305, DSM10316 and M55). All hosts were grown in Hobson’s M2 liquid medium overnight and then 1 ml of each host bacterium was mixed with 3 ml of 0.8 % soft M2 agar and poured over set bottom agar. Possible phage activity was compared to that of the phage samples on the original B. fibrisolvens D1 (DSM3071) host.

2.11.10. TEM

An aliquot of each of the sheep and cow rumen fluid phage filtrates was precipitated using PEG/NaCl, and these samples were taken to TEM, which was carried out in-house. Drops of 25 μl of phage eluate samples were applied to Formvar-filmed and carbon coated 300 Mesh Nickel grids (Agar Scientific, Stansted UK) and left to absorb for two minutes before wicking away excess sample with filter paper. The stain used was methylamine tungstate (Agar Scientific), using 25 μl of a 1 % (w/v) aqueous solution. After one minute, excess stain was wicked away and the grids left to air dry. Phage samples were visualised at 80 kV using a JEOL JEM1010 transmission electron microscope (JEOL Ltd, Tokyo Japan) and photographed using Carestream 4489 EM film (Agar Scientific, Stansted UK). Images were developed in Kodak D-19 developer for four minutes at 20 °C, fixed, washed, and dried, according to manufacturer’s instructions. The resulting negatives were scanned with an Epson Perfection V800 film scanner (Epson, Suwa Japan) and converted to positive images.

2.11.11. DNA Sequencing

DNA sequencing was done in-house, where isolated phage DNA was first diluted to 1 ng/μl and libraries then prepared using the Illumina Nextera XT protocol as per the manufacturer's instructions, selecting AMPure bead ratio as suggested for 2 x 300 bp reads. Libraries were quantified via Qubit fluorescence spectrophotometry, pooled at equimolar ratio, and diluted to 6 pM before loading on an Illumina MiSeq platform using a v3 600-cycle kit in 2 x 300 bp format. The sequencing and library preparation were carried out by the Next-generation Sequencing and Genotyping facility at Aberystwyth University.

2.12. Counter Current Chromatography

Counter Current Chromatography was performed with a Spectrum Series 20 model (Dynamic Extractions, Tredegar Wales) (chapter 7). All runs utilised a semi-preparative column, with ~135 ml capacity and width of 0.8 mm. The rotation of the column was always 1,600 rpm, and distilled water was used as the mobile phase. Sample volume, flow rates, and whether reverse or normal phase were all variables changed between runs, and for ease, conditions are restated with each result. Four UV channels were set to 210 nm, 254 nm, 280 nm and 366 nm, and recorded absorbance during elution, whilst fractions were collected at suitable intervals (Foxy R2 Fraction Collector, Teledyne Isco, Lincoln Nebraska). Fractions of interest were identified using the chromatograms and tested for phage presence using the spot test as described above.

2.12.1. Obtaining Model Bacteriophages and Their Bacterial Hosts

Escherichia phage T4 (DSM4505) was purchased from the DSMZ collection along with its bacterial host Escherichia coli strain B (DSM613), and Escherichia phage ϕX174 (DSM4497) and its bacterial host E. coli strain PC 0886 (DSM13127). The freeze-dried bacteria were revived according to instructions supplied; briefly, 0.5 ml of Luria Broth (LB; Melford Laboratories, Ipswich UK; made according to manufacturer’s instructions) was added to the freeze-dried material and left to rehydrate for 30 minutes, after which around half of this was transferred to 5 ml of LB broth, and a 5 μl inoculation loop was used to streak onto 2 % LB agar (LB broth with 2 % bacteriological agar powder; Sigma Aldrich). The remaining volume was used to spread onto 2 % LB agar. These were incubated at 37 °C overnight, with the broth culture subjected to shaking (~225 rpm). To the fully-grown broth cultures, glycerol was added to a final concentration of 10 %, mixed and stored at -20 °C for future use.

ϕX174 was received as an active culture and was ready to be propagated using the soft-overlay technique and plaque assay as described below. Dilutions from 1 to 10⁻⁵ were plated using this technique, and the 10⁻⁴ dilution plate proved to be suitably confluent to elute the phages as described below. Aliquots of the eluate were mixed with glycerol at a final concentration of 10 %, the cryoprotectant recommended by the DSMZ, and stored at -20 °C.

T4 was received as phage particles bound to a filter paper, which was propagated following the recommended methods. Briefly, the filter paper was placed on top of a prepared soft-agar overlay containing E. coli B (DSM613) and incubated at 37 °C overnight. The area of clearing was observed, and phages eluted using the method described below, incubating the eluent with the phage overlay for 4 hours before aspiration.

2.12.2. Propagation and Preparation of Model Bacteriophages

Bacteriophages were propagated using the soft-overlay technique with the relevant host. From a frozen stock, a 5 μl inoculating loop was used to streak the host bacterium onto a 2 % LB agar plate with added 5 mM MgSO4 and incubated at 37 °C overnight. This plate was then stored in the fridge for up to a month, from which colonies could be picked to make the cultures for phage propagation. This was done by taking a single colony and mixing it with a small volume of LB with added 5 mM of MgSO4 (~2-5 ml, depending on the experiment), and incubating at 37 °C with shaking (~225 rpm) until sufficient growth (~5 hours or overnight). As long as a fresh culture was used for the overlays and at least the exponential phase of growth was reached, the absolute incubation time of the broth culture had little effect on the resulting lawn quality. All broths and cultures used to grow host bacteria to propagate bacteriophages were supplemented with 5 mM MgSO4, to increase phage binding (Clokie and Kropinski, 2009).

When needed, bacteriophage preparations were first serially diluted 1 in 10 with LB broth with added 5 mM MgSO4, and either used for spot tests or plaque assays. For spot tests in 90 mm Petri dishes, 100 μl of fresh bacterial host culture was added to 3-4 ml of 0.75 % soft LB agar with added 5 mM MgSO4, mixed, and poured over set 2 % LB agar with added 5 mM MgSO4. For larger square plates, 400 μl of fresh host culture was combined with 10 ml of soft agar and poured over the same bottom agar as previously. Once set, 10 μl of each dilution was spotted gently onto the soft agar, often in duplicate or triplicate, as necessary. A plaque assay was carried out in a similar way, but 10 μl of one phage dilution was incubated for no more than 15 minutes with 100 μl of the freshly grown bacterial host culture, then combined with 3-4 ml of 0.75 % soft LB agar with added 5 mM MgSO4, mixed, and poured over set bottom agar as before. Once fully dry and set, the plates were incubated inverted at 37 °C until plaques were visible (>6 hours).

To elute phages from an agar overlay, ~5 ml of FB (see section 2.11.4) was added to those suitable dilution plates that achieved adequate confluent growth (where a high number of individual plaques are visible, with minimal overlap), and left on a rocker for >2 hours. Eluate was aspirated into fresh tubes of a suitable size and subjected to centrifugation at 5,000 x g for 10 minutes at 20 °C (Eppendorf 5810R centrifuge, fixed angle rotor F34-6-38). Supernatants were then filtered through a 0.45 μm pore-size low-protein binding PES syringe filter, and the subsequent phage filtrates stored at 4 °C until use.

2.12.3. Spectroscopy

As it was unclear what effect the presence of phage had on UV readings at wavelengths that the in-line detector was set to, samples were scanned across multiple wavelengths and transmission measured with a spectrophotometer. Samples included 800 μl of distilled water (mobile phase) alone, FB alone, bacterial culture alone and with phage samples. Each sample was aliquoted into a 1 cm pathlength plastic cuvette and scanned with the spectrophotometer (Ultrospec 4000, Pharmacia Biotech, Sweden). Either distilled water or FB was used to blank the spectrophotometer before measuring the sample of interest. Resulting absorbance graphs were compared.

2.13. Data

Sequence data used in this study is sourced from the Hungate 1000 project³ (Seshadri et al., 2018). This includes full genomes for 494 bacteria and archaea. Four additional genomes were added that were cultured in-house and sequenced by MicrobesNG (Birmingham, UK). A total of 498 genomes were available for these analyses. These genome sequences were downloaded and re-annotated in-house using Prokka (Seemann, 2014) and made available for other studies (Wilkinson et al., 2018). Some or all of these genomes were used in analysis in chapters 3 through 6.

³ https://genome.jgi.doe.gov/portal/TheHunmicrobiome/TheHunmicrobiome.info.html, accessed September 2016

2.14. Culture Confirmation Using 16S Sequencing

The resulting Sanger reads were queried using BLAST (blastn; v2.6.0+, (Altschul et al., 1990)) with default settings against a database containing all of the 16S sequences from the Hungate1000 collection genomes as well as the inclusion of Acetoanaerobium sticklandii strain DSM 519 16S sequence (NCBI Reference Sequence: NR_102880.1). Hits were manually checked, and the top hit for each query sequence was reported in chapter 3.

2.15. Comparisons to Acetoanaerobium sticklandii DSM519

The protein sequences and nucleotide whole genome sequence for Clostridium sticklandii DSM519 (species name not updated but referred to as Acetoanaerobium hereafter) were downloaded from the NCBI database (GenBank: FP565809.1), as published previously (Fonknechten et al., 2010). Selenocysteine transfer RNA sequences (tRNA-sec) were searched for in genome sequences using SecMarker (Santesmasses et al., 2017) online⁴ using default settings. The published list of 74 characteristic genes (Fonknechten et al., 2010) was used to obtain the protein sequences from the published genome, and these were used to create a BLAST database, against which the genomes of the nine HAPs, NAPs and SAPs were queried (blast-2.8.1+, (Altschul et al., 1990)). The top hits were retained if the percentage identity was ≥30% and the alignment length was ≥80% of the target gene length, where target genes were the 74 characteristic genes, as per the cut-offs used previously (Fonknechten et al., 2010).
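The hit filtering described above can be sketched as follows. This is a minimal illustration and not the script used in the study; the dictionary keys, the function name and the choice of Bitscore to define the top hit per query are assumptions, and the example hits are invented.

```python
def filter_top_hits(hits, min_identity=30.0, min_coverage=0.80):
    """Keep each query's top hit if it passes the identity and coverage cut-offs."""
    best = {}
    for hit in hits:
        query = hit["query"]
        # Top hit per query taken as the highest Bitscore (assumption)
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    kept = []
    for hit in best.values():
        # Alignment length as a fraction of the target (characteristic) gene length
        coverage = hit["aln_length"] / hit["target_length"]
        if hit["pident"] >= min_identity and coverage >= min_coverage:
            kept.append(hit)
    return kept

# Invented example hits against the database of characteristic genes
example_hits = [
    {"query": "geneA", "target": "char01", "pident": 45.0, "aln_length": 270, "target_length": 300, "bitscore": 210.0},
    {"query": "geneA", "target": "char02", "pident": 90.0, "aln_length": 100, "target_length": 300, "bitscore": 150.0},
    {"query": "geneB", "target": "char03", "pident": 28.0, "aln_length": 290, "target_length": 300, "bitscore": 95.0},
    {"query": "geneC", "target": "char04", "pident": 60.0, "aln_length": 120, "target_length": 300, "bitscore": 130.0},
]
retained = filter_top_hits(example_hits)
print([(h["query"], h["target"]) for h in retained])
```

In this example only geneA is retained: geneB falls below 30 % identity and geneC covers only 40 % of its target gene.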

2.16. 40 Universal Gene Markers Phylogenetic Tree

A phylogenetic tree was built in chapter 4 using the rumen microbial genomes available in the Hungate1000 collection using a supermatrix of 40 universal single-copy protein-coding genes, which were identified previously (Creevey et al., 2011), a list of which can be found in the appendix (Appendix Table 10.2). Firstly, orthologs for each of the 40 universal genes were identified in each of the genomes. This was done for each species by querying the amino acid gene sequences created using Prokka against a database of amino acid sequences for each of the 40 genes, which comprised orthologous sequences from organisms across the tree of life. This sequence similarity search was carried out using blastp in Diamond (v0.7.9.58, (Buchfink et al., 2015)), and the first hit for each gene in the genome was retained as long as the Bitscore was above 60⁵. This was further filtered by searching for the highest score for each unique combination of genome and COG, resulting in one ortholog for each of the 40 genes for each of the 498 genomes. Nearly all genomes contained orthologs for all 40 universal genes, with one bacterial genome containing the fewest with 31 orthologs of the universal genes. The orthologs for each of the 40 genes were then aligned using MUSCLE (v3.8.31, (Edgar, 2004)), then concatenated together using Catsequences (https://github.com/ChrisCreevey/catsequences) to create a supermatrix. A non-partitioned maximum-likelihood tree was built with the supermatrix using RAxML (v8.2.9, (Stamatakis, 2014)) with PROTGAMMA and LG substitution model. The four genomes sequenced were added into the tree using the option to insert into a reference tree. The resulting tree was visualized using Interactive Tree of Life (iTOL, v5.5, (Letunic and Bork, 2007)) and the symbol annotations were added to the tree to indicate the ammonia production phenotype (if known) and the four genomes that were sequenced.

⁴ https://secmarker.crg.es/

⁵ E-values and Bitscores are generally more reliable than percentage identity when inferring homology, and generally a Bitscore of 50 would be almost always significant assuming normal protein lengths and that databases are not large (Pearson, 2013). The use of a Bitscore of 60 in this study is fairly arbitrary but ensures adequate homology for these sequences, a threshold which was used before for homology assignment in metagenomic studies (Harrington et al., 2007).
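The selection of one ortholog per genome and universal gene, followed by concatenation into a supermatrix, can be sketched as below. This is an illustration only and does not reproduce Diamond, MUSCLE or Catsequences; the function names, the tuple layout of the hits and the gap-filling of genes missing from a genome are assumptions, and the example data are invented.

```python
def best_orthologs(hits, min_bitscore=60.0):
    """Keep the highest-scoring gene for each (genome, COG) pair above the cut-off."""
    best = {}
    for genome, gene, cog, bitscore in hits:
        if bitscore <= min_bitscore:
            continue                      # Bitscore must be above 60
        key = (genome, cog)
        if key not in best or bitscore > best[key][1]:
            best[key] = (gene, bitscore)
    return {key: gene for key, (gene, _) in best.items()}

def build_supermatrix(alignments, genomes):
    """Concatenate per-gene alignments; absent genes are gap-filled (assumption)."""
    parts = {genome: [] for genome in genomes}
    for cog in sorted(alignments):
        aligned = alignments[cog]
        width = len(next(iter(aligned.values())))   # alignment length for this gene
        for genome in genomes:
            parts[genome].append(aligned.get(genome, "-" * width))
    return {genome: "".join(seqs) for genome, seqs in parts.items()}

# Invented hits: (genome, gene, COG, Bitscore)
hits = [
    ("gA", "gene1", "COG0012", 150.0),
    ("gA", "gene2", "COG0012", 90.0),
    ("gA", "gene3", "COG0016", 40.0),
    ("gB", "gene9", "COG0012", 61.0),
]
orthologs = best_orthologs(hits)

# Invented aligned sequences for two genes
alignments = {
    "COG0012": {"gA": "MK-L", "gB": "MKAL"},
    "COG0016": {"gA": "TT"},
}
supermatrix = build_supermatrix(alignments, ["gA", "gB"])
print(orthologs)
print(supermatrix)
```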

2.17. Sequence Similarity Approach and Evolutionary Genome Networks (EGN)

Similarity networks were used in chapter 4 to create gene families using the amino acid gene sequences for all 498 genomes using the Evolutionary Gene and Genome Networks generator (EGN, v1.0, (Halary et al., 2013)). EGN takes as input the results of an all-against-all sequence similarity search, which was performed using blastp in Diamond (v0.7.9.58, (Buchfink et al., 2015)) returning the first 1,000,000 hits for each query sequence. EGN has a variety of parameters that can be set that influence the edges created (connections between genes) and the clustering of protein sequences into gene families. The edge creation step takes the input of the all-against-all hits and, using the user defined parameters, filters hits that match these parameters, storing them in a table of edges. The option in EGN for fast edge creation was always used, which required more memory but less storage space.

There are six parameters for a user to define in the edge creation step, each with a default, which are explained using information from the EGN manual and supplied in Table 2.2. During the network creation step, the gene network option was chosen. Once the edge table was created, to avoid recomputation this table could be further filtered by changing parameters again (but only above that of what was set in the previous step). Hit coverage and best reciprocity values however cannot be altered, but a user can choose whether to enforce these conditions or not. To acquire the optimal settings for clustering genes in this dataset, optimisation tests were carried out.

2.17.1. Optimising Network Parameters

To determine the best parameters for gene clustering using EGN, eight tests were completed, where one parameter was altered each time, concentrating on optimisation through changing best reciprocity, hit coverage percentage and hit identity. Each run was analysed for clustering efficiency using four different indicators: the number of gene families produced, the number of sequences present in the singletons group (those that do not fit within the parameters, or those without a hit in the all-against-all sequence similarity search), the number of sequences in the first gene family EGN creates, and the clustering of the 40 universal genes. The latter was used as an indicator of gene clustering efficiency as it would be expected that these genes would be clustered together into one family for each. If homologs of any one of the 40 universal genes were found in more than one gene family, the split was annotated onto the 40 genes tree using iTOL to visualise whether the split was phylogenetic. All tests used a Bitscore cut off of 60 during the all-against-all sequence similarity search step, except one, which looked at the influence of the Bitscore used on gene clustering in EGN by changing the threshold to 20.

The best results used a Bitscore cut off of 60 for the all-against-all search (Harrington et al., 2007), as well as best reciprocity set to 100 %, hit identity 50 % with both parameters enforced, and the other parameters left at the default. The results created by EGN using these settings were used for further analysis.

2.17.2. Dividing the Largest Component

Once the optimal parameters were set, the largest component or gene family that EGN created was further split using a community detection method called Louvain modularity clustering, as employed previously (Cheng et al., 2014). This method looks for closely connected communities of nodes using an iterative approach for each node to maximise the modularity, which is a measurement of edge density within communities compared to between communities (Blondel et al., 2008). Louvain clustering was implemented using the iGraph package (v1.0.1, (Csárdi and Nepusz, 2006)) in R (v3.2.3, (R Core Team, 2018)) (igraph_clustering.R).
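To make the modularity measure concrete, the sketch below calculates it for a given partition of a small unweighted, undirected network. It is not the Louvain algorithm itself, nor the R implementation used in this study; it only evaluates the quantity that Louvain clustering seeks to maximise, using the standard form Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of edges within community c, d_c the summed degree of its nodes and m the total number of edges. The function name and the toy network are illustrative.

```python
def modularity(edges, community):
    """Modularity of a partition of an unweighted, undirected graph without self-loops."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    total_degree = {}   # summed node degree per community
    for node, k in degree.items():
        c = community[node]
        total_degree[c] = total_degree.get(c, 0) + k
    internal = {}       # number of edges with both ends in the same community
    for u, v in edges:
        if community[u] == community[v]:
            c = community[u]
            internal[c] = internal.get(c, 0) + 1
    return sum(internal.get(c, 0) / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)

# Toy network: two triangles joined by a single edge
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_group = {node: 1 for node in two_groups}
q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
print(q_split, q_single)
```

Splitting the toy network into its two triangles gives a higher modularity than leaving it as one community, which is the kind of improvement the iterative approach searches for.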

2.17.3. Finding Gene Families of Interest

A count table that gives the number of genes present in each gene family for each of the 498 genomes was created using the counts in gene families created by EGN and after the community clustering. Only the subset of those nine genomes with known ammonia production phenotypes were further assessed (see chapter 3), first by simply sorting the data and looking for trends manually. Then a principal component analysis (PCA) was used to determine any trends in the data after normalising the data to the minimum: a scaling value was created by taking the number of genes in the genome with the fewest genes and dividing this by the number of genes in the genome to be analysed, then multiplying this value with the gene count in each genome. PCA was carried out in R (v3.6.0, (R Core Team, 2018)), using the command prcomp with scaling and the packages ggplot2 (v3.1.1, (Wickham, 2016)) and ggrepel (v0.8.1, (Slowikowski, 2019)) to plot the data (genomes_EGN_pcas.R).
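The normalisation to the minimum can be sketched as below. This is an illustration rather than the R script used in this study (genomes_EGN_pcas.R); the function name and the example counts are invented, and the number of genes in a genome is assumed here to be the sum of its gene family counts.

```python
def normalise_to_minimum(counts):
    """Scale each genome's gene family counts to the genome with the fewest genes.

    counts: {genome: {gene_family: number_of_genes}}
    """
    # Total genes per genome, taken as the sum of family counts (assumption)
    totals = {genome: sum(families.values()) for genome, families in counts.items()}
    smallest = min(totals.values())
    return {
        genome: {family: n * smallest / totals[genome] for family, n in families.items()}
        for genome, families in counts.items()
    }

# Invented counts for two genomes and two gene families
counts = {
    "genomeA": {"family1": 2, "family2": 2},   # 4 genes in total
    "genomeB": {"family1": 6, "family2": 2},   # 8 genes in total
}
scaled = normalise_to_minimum(counts)
print(scaled)
```

Here genomeB has twice as many genes as genomeA, so its counts are multiplied by 0.5, while genomeA is left unchanged.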

Table 2.2 EGN parameters. Parameters a user can define during the edge creation step of EGN (Halary, 2012).

Parameter name | Parameter description | Default setting
E-value threshold | The maximum number acceptable for a hit to be created, taken from the all-against-all sequence similarity search | 1e-05
Hit identity threshold | Minimum hit identity as a percentage, taken from the all-against-all sequence similarity search | 20 %
Hit covers of the shortest sequence | Number of identical residues divided by shortest sequence length, expressed as a percentage. | 20 %
Minimal hit length in nucleic acids | Minimum acceptable hit length in nucleotides. Divided by three for amino acids. | 75
Best reciprocity | A hit must be reciprocal, meaning gene A must hit gene B when either is the query or the subject. This reciprocal must also be within X % of the e-value for the best hit for that gene, where X is a user specified percentage. | 5 %
Hit coverage | Requires X % of the length of the sequence to be hit to consider edge creation, where X is a user specified percentage. | 90 %

2.18. Functional Annotation Approach

The amino acid sequences of predicted genes (the same used in the EGN sequence similarity search above) from just the nine genomes of interest (chapter 3) were annotated functionally in chapter 4 using Eggnog Mapper (v2.0.1, eggnog database version 5.0.0; downloaded 02/01/2020, (Huerta-Cepas et al., 2016, 2017), requiring Diamond v0.0.24 (Buchfink et al., 2015)). Genes were sorted into gene families based firstly on clusters of orthologous groups (COGs), and then re-clustered into different gene families based on the KEGG annotation (KO). Genes could appear in multiple different gene families if more than one functional annotation was assigned to them. Count tables for numbers of genes in each gene family were created for both the COGs and KOs. These were then examined using sorting and filtering to find those families of interest. Each COG is also assigned a functional category identifier, which allocates it to a higher level, more generic function (such as carbohydrate metabolism or amino acid metabolism).

The counts of genes that were assigned each functional category identifier were totalled and displayed as a 100 % stacked column graph in Microsoft Excel (Office 16). Principal component analysis was carried out using the counts of COGs in the nine genomes of interest, which were normalised to the lowest number of total COGs; each count was scaled by multiplying each gene family count by the lowest total number of COGs in a genome divided by the total number of COGs in the sample genome. The prcomp command with scaling in R (v3.6.0, (R Core Team, 2018)), and the packages ggplot2 (v3.1.1, (Wickham, 2016)) and ggrepel (v0.8.1, (Slowikowski, 2019)) were used to calculate and plot the resulting principal components (genomes_bact_OGs_pcas.R).

2.19. Identification of Transporter Genes Present in the Three Phenotypes

An analysis of the types of transporters present in the genomes was carried out in chapter 4. First, a BLAST database was created from all of the transporter homologs available in the Transporter Classification Database (TCDB [6]; downloaded 14/01/2020; (Saier et al., 2016)). A total of 19,116 amino acid sequences were added, and the genomes of interest were queried against this database in a sequence similarity search with blastp (vblast-2.8.1+, (Altschul et al., 1990)). The top hit from the BLAST output for each gene was retained if the bitscore was >60. The occurrences of the transporter families of interest (Table 2.3) were then counted and a stacked bar graph was produced in R (v3.6.0, (R Core Team, 2018)) using the packages ggplot2 (Wickham, 2016) [7] and reshape2 (Wickham, 2007) (genomes_aatransporters.R).

Table 2.3 List of amino acid and peptide transporters of interest with their transporter codes.

Transporter Code | Description
1.B.83 | The Alpha-Aminoxy Acid Channel Family
2.A.118 | The Basic Amino Acid Antiporter Family
2.A.3 | The Amino Acid-Polyamine-Organocation Family
2.A.95 | The 6 TMS Neutral Amino Acid Transporter Family
3.A.1.3 | The Polar Amino Acid Uptake Transporter Family
3.A.1.4 | The Hydrophobic Amino Acid Uptake Transporter Family
2.A.120 | The Putative Amino Acid Permease Family
2.A.42 | The Hydroxy/Aromatic Amino Acid Permease Family
2.A.18 | The Amino Acid/Auxin Permease Family
2.A.23 | The Dicarboxylate/Amino Acid:Cation (Na+ or H+) Symporter Family
2.A.26 | The Branched Chain Amino Acid:Cation Symporter (LIVCS) Family
2.A.21.2 | The Solute:Sodium Symporter Family
2.A.25 | The Alanine or Glycine:Cation Symporter Family
2.A.78 | The Branched Chain Amino Acid Exporter Family
2.A.114 | The Putative Peptide Transporter Carbon Starvation CstA Family
2.A.67 | The Oligopeptide Transporter Family
2.A.17 | The Proton-dependent Oligopeptide Transporter Family
1.A.11 | Ammonium Channel Family
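
The filtering and counting described in this section can be sketched in Python as below. This is an illustration only and not the script used in this work. It assumes that the BLAST output is in the standard 12-column tabular format (bitscore in the last column), that the "top hit" is the hit with the highest bitscore for each query, and that the TC number is the last "|"-separated field of the subject identifier; the function names and the example lines are hypothetical.

```python
from collections import Counter


def top_hits(blast_lines, min_bitscore=60.0):
    """Return {query: subject} for the best hit per query with bitscore > min_bitscore."""
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, bits) in best.items() if bits > min_bitscore}


def family_of(subject_id, families):
    """Return the family code that prefixes the subject's TC number, or None."""
    tc_number = subject_id.split("|")[-1]
    # Longer codes are tried first; the trailing "." stops 2.A.3 matching 2.A.30.
    for code in sorted(families, key=len, reverse=True):
        if tc_number == code or tc_number.startswith(code + "."):
            return code
    return None


def count_families(hits, families):
    """Count retained hits per transporter family of interest."""
    counts = Counter()
    for subject in hits.values():
        family = family_of(subject, families)
        if family is not None:
            counts[family] += 1
    return counts


def example_line(query, subject, bitscore):
    """Hypothetical helper: build a 12-column line with placeholder alignment values."""
    return "\t".join([query, subject, "40.0", "200", "0", "0",
                      "1", "200", "1", "200", "1e-20", str(bitscore)])


families = ["2.A.3", "3.A.1.3", "3.A.1.4"]  # a subset of Table 2.3
lines = [
    example_line("geneA", "gnl|TC-DB|P00001|2.A.3.1.1", 250.0),
    example_line("geneA", "gnl|TC-DB|P00002|2.A.30.1.1", 55.0),   # weaker hit, ignored
    example_line("geneB", "gnl|TC-DB|P00003|3.A.1.3.2", 40.0),    # below threshold
    example_line("geneC", "gnl|TC-DB|P00004|2.A.30.2.1", 120.0),  # not a family of interest
]
hits = top_hits(lines)
family_counts = count_families(hits, families)
```

Matching on the code followed by a full stop is a deliberate choice in this sketch, so that a short code such as 2.A.3 does not also capture unrelated families such as 2.A.30.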

[6] http://www.tcdb.org/
[7] http://ggplot2.org/

2.20. Transcriptomic Approach to Determine Hyper Ammonia Production Gene Signatures

Results of the transcriptomic approach can be seen in chapter 5.

2.20.1. Quality Control and Trimming of Raw Reads

Raw data were obtained from the sequencing centre and validated for data transfer errors using checksums. Read quality was then assessed using multiqc (v1.7, (Ewels et al., 2016)) before analysis. Single end trimming was carried out using Trimmomatic (v0.39, (Bolger et al., 2014)), using a quality threshold of a Phred score of 33 and a minimum length of 50 bp. Adapters were trimmed using the option ILLUMINACLIP:TruSeq3-SE.fa:2:30:10.
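
As an illustration of the checksum validation step, a transferred file can be compared against the checksum supplied by the sequencing centre as sketched below. The checksum algorithm is not stated above, so MD5 is an assumption here, and the function names and the temporary example file are hypothetical.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_ok(path, expected_md5):
    """True if the file's checksum matches the one supplied with the data."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Small self-contained demonstration with a temporary file (not real read data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    example_path = tmp.name
example_ok = transfer_ok(example_path, "5d41402abc4b2a76b9719d911017c592")
example_bad = transfer_ok(example_path, "0" * 32)
os.remove(example_path)
```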

2.20.2. Mapping and Counting Reads to the Genomes

Reads were mapped to the nucleotide genome sequence of the corresponding species using Bowtie2 (v2.3.3.1, (Langmead and Salzberg, 2012)) with the default --sensitive-local argument. The resulting sequence alignment/map (SAM) files were filtered using SamTools (v1.5, (Li et al., 2009)) to report primary reads only, using the flag -F260 to remove unmapped and secondary reads, and the output was written in binary alignment map (BAM) format. FeatureCounts (v2.0.0, (Liao et al., 2014)) was used to count the number of reads that aligned to genes, taking the four BAM files (one for each replicate) for each species, along with the general feature format (gff) file, and returning just the counts for coding sequences (CDS) and the corresponding gene ID. A read was not counted if its assignment was ambiguous, that is, if it overlapped two or more features.
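
The value 260 used with -F is the sum of two SAM flag bits, 4 (read unmapped) and 256 (secondary alignment). The short Python sketch below reproduces the same test on a flag value; it is an illustration of the bit arithmetic and not part of the pipeline, and the function name is hypothetical.

```python
UNMAPPED = 0x4       # SAM flag bit: read is unmapped
SECONDARY = 0x100    # SAM flag bit: secondary alignment
EXCLUDE_MASK = UNMAPPED | SECONDARY  # 4 + 256 = 260


def keep_alignment(flag, exclude_mask=EXCLUDE_MASK):
    """Mimic an -F filter: keep a record only if none of the masked bits are set."""
    return (flag & exclude_mask) == 0


# Example flag values: 0 (mapped, forward), 16 (mapped, reverse strand),
# 4 (unmapped), 256 (secondary), 272 (secondary on the reverse strand).
kept = [flag for flag in (0, 16, 4, 256, 272) if keep_alignment(flag)]
```

Note that this mask does not include the supplementary alignment bit (2048), so any supplementary records would pass this particular filter.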

2.20.3. Identifying Interesting Patterns Using Transcriptome Count Data

Once these count tables were created, the first observation was made by sorting the counts within a species and determining the gene with the highest number of reads averaged across the four replicates. Then, because the gene IDs in the FeatureCounts output matched the gene IDs in the genomes, any gene family or cluster to which a gene had been allocated using the genome could be mapped to the read count data. If multiple genes belonged to the same gene family or cluster, their counts were summed. Counts for genes belonging to the same cluster of orthologous groups (COG), as determined previously (section 2.18), were summed. This allowed the functional identifiers to be totalled and expressed as a percentage of reads in the transcriptome that belong to that function. The data were displayed as a 100 % stacked column graph created in Microsoft Excel (Office 16). Statistical comparisons were carried out using Wilcoxon rank sum tests and plotted as box and whisker plots in the style of Tukey using the proportional data in R (v3.6.0; (R Core Team, 2018)) in RStudio (v1.2.5033; (RStudioTeam, 2016)) and using packages ggpubr (v0.2.5; (Kassambara, 2020)), reshape2 (v1.4.3; (Wickham, 2007)) and ggplot2 (v3.1.1, (Wickham, 2016)). The R script used can be found online (transcriptomic_COG_IDs_boxplots_stats.R).
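
The summing of read counts by functional category and their expression as percentages can be sketched as follows. This is a minimal Python illustration and not the analysis script itself; the function name and example data are hypothetical, and it assumes that the denominator is the total of reads assigned to genes with a functional category (genes without a category are ignored).

```python
from collections import defaultdict


def category_percentages(gene_counts, gene_to_category):
    """Sum read counts per functional category and express them as percentages.

    gene_counts maps gene ID -> read count (e.g. the mean of the replicates).
    gene_to_category maps gene ID -> functional category identifier.
    """
    totals = defaultdict(float)
    for gene, count in gene_counts.items():
        category = gene_to_category.get(gene)
        if category is not None:
            totals[category] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: 100.0 * n / grand_total for cat, n in totals.items()}


# Hypothetical example: E = amino acid transport and metabolism,
# G = carbohydrate transport and metabolism.
example_gene_counts = {"gene1": 30, "gene2": 50, "gene3": 20, "gene4": 7}
example_categories = {"gene1": "E", "gene2": "G", "gene3": "E"}  # gene4 unannotated
percentages = category_percentages(example_gene_counts, example_categories)
```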

To statistically determine whether any genes had significantly more reads aligned to them, only those families represented in all of the genomes were chosen. Statistical analyses were carried out using the DESeq2 package (v1.22.2, (Love et al., 2014)) in R (v3.6.0, (R Core Team, 2018)), implemented in RStudio (v1.2.5033, (RStudioTeam, 2016)). The tests were set up such that each species was compared with every other species, and any genes that were significant (Padjusted value of <0.1 in DESeq2) in at least one comparison were retained. Maximum clique clustering was then used to determine whether each species was significantly different from the others using iGraph (v1.2.4.1, (Csárdi and Nepusz, 2006)), and each species was assigned letters, where a letter shared between two species indicated no significant difference and different letters indicated a significant difference (Padjusted <0.1 in DESeq2). A Python script (letters_python.py) was created and used to identify those genes where the set of letters that occurred in one group did not occur in the other, i.e. where any comparison between any species in one group and any species in the other would be significantly different for that gene. Those gene families were then visualised as boxplots plotted using the normalised data from DESeq2 using ggplot2 (v3.1.1, (Wickham, 2016)), reshape2 (v1.4.3, (Wickham, 2007)) and ggforce (v0.3.1, (Pedersen, 2019)). The R scripts used can be found online (transcriptomics_COGs_deseq_analysis.R, transcriptomics_KOs_deseq_analysis.R, transcriptomics_COGs_boxplots.R and transcriptomics_KOs_boxplots.R).
These gene families were further interrogated visually to find those where all species of one phenotype had significantly greater expression than the species of the other phenotype; these were obtained by finding the boxplots where the lowest normalised count of the phenotype with the greater median was greater than the highest normalised count of the phenotype with the smaller median. These gene families were replotted as boxplots and ordered by the difference in medians between the phenotypes. This analysis was carried out for read counts expressed as both COGs and KOs. Those COGs and KOs found to have significantly greater expression in either of the groups were used to plot the reactions represented on metabolism pathway maps using iPath3.0 [8] (Darzi et al., 2018).
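
The two selection rules described above can be sketched in Python as below. This is not letters_python.py, whose contents are not reproduced here, but a minimal illustration under stated assumptions: the letters for each species are given as a string (for example "ab"), and the function names and example values are hypothetical.

```python
from statistics import median


def groups_have_disjoint_letters(letters_by_species, group_a, group_b):
    """True if no significance letter is shared between the two groups.

    A shared letter between two species means "not significantly different",
    so disjoint letter sets mean every cross-group comparison was significant.
    """
    letters_a = set()
    for species in group_a:
        letters_a.update(letters_by_species[species])
    letters_b = set()
    for species in group_b:
        letters_b.update(letters_by_species[species])
    return letters_a.isdisjoint(letters_b)


def completely_separated(values_a, values_b):
    """True if the lowest value of the higher-median group exceeds
    the highest value of the lower-median group."""
    if median(values_a) >= median(values_b):
        high, low = values_a, values_b
    else:
        high, low = values_b, values_a
    return min(high) > max(low)


# Hypothetical letters for four species in two phenotype groups.
example_letters = {"sp1": "a", "sp2": "ab", "sp3": "c", "sp4": "cd"}
letters_differ = groups_have_disjoint_letters(
    example_letters, ["sp1", "sp2"], ["sp3", "sp4"])

# Hypothetical normalised counts for one gene family in the two groups.
separated = completely_separated([10.0, 12.0, 15.0], [1.0, 3.0, 9.0])
```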

The expression of transporter genes was analysed by firstly totalling all read counts of genes that were annotated with the transporter families of interest, as determined in the genomes previously (section 2.19). A stacked graph was created in R (v3.6.0, (R Core Team, 2018)) implemented in RStudio (v1.2.5033, (RStudioTeam, 2016)) with packages ggplot2 (v3.1.1, (Wickham, 2016)) and reshape2 (v1.4.3, (Wickham, 2007)). A Shapiro test was used to verify that the counts for each transporter family were not normally distributed, and unpaired Wilcoxon rank sum tests were carried out between phenotypes to determine whether expression was significantly different between the two groups. These tests were implemented with the ggpubr package (v0.2.5, (Kassambara, 2020)), which was also used to add P values to boxplots plotted with ggplot2. R scripts used in RStudio can be found online (transcriptomics_aatransporters.R).

[8] http://pathways.embl.de

2.21. Bacteriophage Genome Assembly and Analysis

The results from bacteriophage genome assembly and analysis can be seen in chapter 6.

2.21.1. Quality Control, Read Trimming and Assembly

Quality control of the raw reads was carried out using FastQC v.0.11.8 (Andrews, 2010), followed by quality trimming using the paired end default settings in Sickle (Joshi and Fass, 2011) and assembly using SPAdes v3.13.0 on paired and single reads after trimming, with just the “assembly-only” option applied (Bankevich et al., 2012), as conducted for phage genomes previously (Rihtman et al., 2016; Sazinas et al., 2018). Contigs were visualised using Bandage v0.8.1 (Wick et al., 2015), and those circular contigs with the highest coverage for assembled reads and of a reasonable length (>10 kbp) were extracted as the phage genomes. The phage genome contigs were inspected manually using Geneious Prime 2020.1.2 [9] and repeat regions were identified. A 127 bp sequence was repeated in all genomes, corresponding to the k-mer size used by SPAdes during assembly, and these repeats were removed manually in Geneious Prime. BWA-MEM v0.7.16a-r11810 (Li, 2013) was then used with default settings to align all reads in a sample to all of the corresponding assembled contigs, and SamTools v1.5 (Li et al., 2009) was then used to extract the alignments to the phage contig and manage output files. Coverage was assessed using the depth command in SamTools, and the coverage for the entire genome was calculated by averaging the coverage of each base in the genome.
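
The averaging of per-base coverage can be sketched in Python as below, assuming the three-column, tab-separated output of the SamTools depth command (reference name, position, depth). This is an illustration and not the procedure used verbatim; the function name, the optional contig_length argument and the example lines are hypothetical.

```python
def mean_coverage(depth_lines, contig, contig_length=None):
    """Average the per-base depth reported for one contig.

    If contig_length is given, it is used as the denominator so that positions
    absent from the depth output count as zero coverage; otherwise the mean is
    taken over the reported positions only.
    """
    total_depth = 0
    positions = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != contig:
            continue
        total_depth += int(fields[2])
        positions += 1
    denominator = contig_length if contig_length is not None else positions
    return total_depth / denominator if denominator else 0.0


# Hypothetical depth output: position 3 of "phage" is not reported.
example_depth = [
    "phage\t1\t10",
    "phage\t2\t20",
    "other\t1\t99",
    "phage\t4\t30",
]
mean_reported = mean_coverage(example_depth, "phage")
mean_whole_contig = mean_coverage(example_depth, "phage", contig_length=4)
```

The choice of denominator matters when some positions have no reads, because they may be missing from the depth output depending on how the command is run.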

The quality of the genomes was assessed using Pilon v1.23 (Walker et al., 2014), and the genomes were then rearranged using the terminase gene, which was predicted using Prokka v1.12 (Seemann, 2014) supplemented with the Caudovirales database [10]. The genomes were reordered and orientated manually using Geneious Prime, such that the start of the linear sequence for the circularly permuted genome was at the first base of the first gene found downstream of the terminase gene, as recommended previously (Russell, 2018).
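
Reordering was carried out manually in Geneious Prime, but the underlying operation on a circularly permuted sequence is a simple rotation, optionally preceded by reverse complementation so that the chosen gene is on the forward strand. A minimal Python sketch is given below; the function names and the short example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def rotate_circular(sequence, new_start):
    """Rotate a circular sequence so that it begins at new_start (1-based)."""
    if not 1 <= new_start <= len(sequence):
        raise ValueError("new_start must lie within the sequence")
    index = new_start - 1
    return sequence[index:] + sequence[:index]


# Hypothetical example: start the linear representation at base 3.
example_genome = "AACCGGTT"
reordered = rotate_circular(example_genome, 3)
```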

2.21.2. Comparing and Annotating the Phage Genomes

These genomes were queried against the viruses (taxid:10239) nucleotide collection (nr/nt) using BLASTN (Zhang et al., 2000) with default settings (carried out on 09/04/2020). For comparative genomics, average nucleotide identity (ANI) was calculated using FastANI v1.3 (Jain et al., 2018) with default settings. Pairwise comparisons of the nucleotide genomic sequences were conducted using the Genome-BLAST Distance Phylogeny (GBDP) method (Meier-Kolthoff et al., 2013) with default settings recommended for prokaryotic viruses in VICTOR [11] (Meier-Kolthoff and Göker, 2017) using the D0 formula. Genomic synteny was visualised using ProgressiveMauve within Geneious Prime, using default settings.

[9] https://www.geneious.com
[10] Available at: http://s3.climb.ac.uk/ADM_share/Caudovirales.tar.gz; last modified 20/07/2017
[11] Available at: https://ggdc.dsmz.de/victor.php; accessed 09/04/2020

Open reading frames (ORFs) were then predicted using Glimmer v3.02 (Delcher et al., 2007) with default settings, GeneMarkS-2 (Lomsadze et al., 2018) with Prokaryotic sequence type, genetic code 11 and GFF3 output using the online tool [12], Prodigal v2.6.3 with default settings with output as GFF (Hyatt et al., 2010) and PHANOTATE v1.2.2 (McNair et al., 2019) with default settings. All predicted ORFs were visualised and manually curated using Geneious, allocating each putative ORF to one of four groups: (A) those ORFs where all gene callers agreed on presence, start and end; (B) those ORFs where all gene callers agreed on gene presence, but with different predicted starts and/or ends; (C) those ORFs where the majority of gene callers agreed on ORF presence, with the same start and end; and (D) those ORFs where the majority of gene callers agreed on ORF presence, but with different predicted starts and/or ends.
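
The curation itself was manual, but the four groups follow a simple decision rule that can be sketched in Python as below. This is an illustration only: it assumes that the predictions of the different gene callers have already been matched to the same putative ORF, and that "majority" means strictly more than half of the gene callers. The function name and example coordinates are hypothetical.

```python
def orf_category(calls):
    """Assign a putative ORF to group A, B, C or D.

    calls maps gene caller name -> (start, end), or None if that caller
    did not predict the ORF. Returns None if no majority predicted it.
    """
    present = [coords for coords in calls.values() if coords is not None]
    if not present:
        return None
    same_coordinates = len(set(present)) == 1
    if len(present) == len(calls):
        return "A" if same_coordinates else "B"
    if len(present) > len(calls) / 2:
        return "C" if same_coordinates else "D"
    return None


# Hypothetical example: three of four callers predict the ORF, one with a different start.
example_calls = {
    "Glimmer": (100, 400),
    "GeneMarkS-2": (100, 400),
    "Prodigal": (130, 400),
    "PHANOTATE": None,
}
example_category = orf_category(example_calls)
```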

Nucleotide and protein sequences for ORFs in all four categories were obtained using the gff2bed tool (Bedops; v2.4.37; (Neph et al., 2012)), getfasta (Bedtools; v2.27.1; (Quinlan and Hall, 2010)) and transeq (Emboss; v6.6.0.0; (Rice et al., 2000)). These were then annotated by searching for homologs with BLAST v2.8.1+ (Altschul et al., 1990; Camacho et al., 2009), and all BLAST hits that achieved an e-value lower than 10^-5 (Aziz et al., 2018) and query coverage >80 %, as used by Prokka (Seemann, 2014), were retained. Custom nucleotide and protein databases comprising a number of NCBI databases were used, including representative viral reference genomes [13], viral genomes, proteins and non-redundant proteins from the 8th July 2019 RefSeq database [14], against which the full genomes and nucleotide gene sequences were searched; alongside the Caudovirales database [2] and the prokaryotic virus orthologous groups (pVOGs) database supplied with multiPHATE v1.0 (Ecale Zhou et al., 2019). Protein databases were obtained from Swissprot [15] and virus specific orthologous groups from EGGNOG [16], annotating the best hits with Uniprot IDs. Each ORF for each genome was manually curated and annotated using the best of the retained hits; where multiple hits from different databases had equally good scores, all were recorded. The name of the genome containing the gene with the best hit(s) for each ORF was also noted. Where gene callers did not agree on an ORF, the ORF with the best hit was used, considering coverage of query and subject and percentage identity to determine the best.

[12] Available at: http://exon.gatech.edu/GeneMark/genemarks2.cgi, accessed 10/03/2020
[13] ftp://ftp.ncbi.nlm.nih.gov/blast/db/ref_viruses_rep_genomes.tar.gz; last updated 23rd August 2019
[14] Release 95; ftp://ftp.ncbi.nlm.nih.gov/refseq/release/viral/
[15] August 2019 release; ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
[16] v5.0, http://eggnog5.embl.de/download/eggnog_5.0/e5.viruses.faa

Protein motifs were also predicted using hmmscan in HMMER v3.1b2 (Eddy, 2011) against the pfam [17], TIGRfam [18] (Haft et al., 2003) and HAMAP [19] (Lima et al., 2009) databases. A consensus gene name was then derived manually for each predicted ORF for each phage genome based on the evidence gathered from homology searches. Numbers of overlapping genes were obtained from the gff files for each of the genomes, and tRNAs were predicted using tRNAscan-SE v2.0 [20], assuming a bacterial sequence source and otherwise default settings (Chan and Lowe, 2019), and added to the annotated ORFs.

TMHMM v2.0 (Krogh et al., 2001) was run to detect transmembrane regions, einverted in Emboss was used to find inverted repeats, and PHACTS [21] (McNair et al., 2012) was used to determine the lifestyle of the phage. Promoters were identified in each genome using PhagePromoter [22] (Sampaio et al., 2019), setting the lifecycle for each phage according to the outcome of PHACTS, assuming all phages were in the Siphoviridae family, with the host bacterial genus set to other, both directions and a threshold of 0.5. Terminators were predicted using FindTerm [23] (Solovyev and Salamov, 2011), showing all putative terminators with a threshold value <-10, as recommended previously (Aziz et al., 2018). Promoters and terminators were then manually checked, removing terminators present within genes whilst retaining ones present in intergenic regions or in the 3’ end of upstream genes (Aziz et al., 2018), and choosing promoters in the correct orientation and with the highest score where multiple were present.

2.21.3. Phylogenetics and Phylogenomics

The nucleotide genome sequences of phage genomes were uploaded into ViPTree (Nishimura et al., 2017) using default settings, dsDNA nucleic acid type, Prokaryote host categories and a user-defined gene annotation file. From the tree resulting from the ViPTree analysis, the most closely related taxa were identified as those in the same clade as the Butyrivibrio phage genomes, of which there were 13, and these were downloaded using the NCBI ID supplied by ViPTree. The nucleotide sequences of these 13 genomes, along with the five phage genomes from this study, were then analysed using the Genome-BLAST Distance Phylogeny (GBDP) method (Meier-Kolthoff et al., 2013) with default settings recommended for prokaryotic viruses in VICTOR [24] (Meier-Kolthoff and Göker, 2017) using the D0 formula.

[17] Available at ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz; last modified 30/08/2018
[18] Release 15, available at https://ftp.ncbi.nlm.nih.gov/hmm/TIGRFAMs/release_15.0/TIGRFAMs_15.0_HMM.tar.gz
[19] HAMAP available with Prokka v1.12 (Seemann, 2014)
[20] http://lowelab.ucsc.edu/tRNAscan-SE/
[21] Available at http://edwards.sdsu.edu/PHACTS/upload.php; accessed on 24/03/2020
[22] Available as a Galaxy tool at https://galaxy.bio.di.uminho.pt/; accessed on 24/03/2020
[23] http://www.softberry.com/berry.phtml?topic=findterm&group=programs&subgroup=gfindb; version v2.8.1, accessed 24/03/2020
[24] Available at: https://ggdc.dsmz.de/victor.php; accessed 09/04/2020

2.21.4. Assessing Host Interactions

The genome of the bacterial host Butyrivibrio fibrisolvens DSM 3071 was downloaded (NCBI Reference Sequence NZ_FQXK01000003.1) and the GC content and codon usage were summarised in Geneious Prime. Additionally, the presence of prophages was predicted using the PHASTER online tool [25] (Arndt et al., 2016; Zhou et al., 2011). The amino acid sequences of the predicted prophage genes were subjected to a sequence similarity search using blastp against the protein sequences for genes predicted in the five phage genomes from this study, and the hits were filtered using the same thresholds as previously. CRISPR/Cas genes were identified in the reference genome and in 497 genomes of the Hungate1000 collection using CRISPRCasFinder (v4.2.19) with default settings (Couvin et al., 2018). The resulting CRISPR spacer sequences were queried against the phage genomes from this study using blastn with default settings.
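
GC content and codon usage were summarised in Geneious Prime; for reference, the two quantities can be computed as in the Python sketch below. The function names and example sequences are hypothetical, and the sketch assumes that codon usage is counted in frame from the first base of a coding sequence.

```python
from collections import Counter


def gc_content(sequence):
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    at = sequence.count("A") + sequence.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def codon_usage(coding_sequence):
    """Count in-frame codons of a coding sequence, ignoring a trailing partial codon."""
    coding_sequence = coding_sequence.upper()
    usable = len(coding_sequence) - len(coding_sequence) % 3
    return Counter(coding_sequence[i:i + 3] for i in range(0, usable, 3))


example_gc = gc_content("GGCCAATT")
example_codons = codon_usage("ATGAAATAA")
```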

[25] http://phaster.ca/

Chapter 3

3. Ammonia Production by Ruminal Bacteria; A Definition of the Phenotype, Identification and Characterisation of Model Species

3.1. Introduction

What makes Hyper Ammonia Producing (HAP) bacteria interesting is their ability to use amino acid fermentation alone to produce energy for growth, and therefore their ability to grow with amino acids or peptides as their only nitrogen and carbon source. The first HAPs identified were found not to utilise carbohydrates in the conditions tested, which formed the basis of the description of HAPs by Wallace, who categorised hyper ammonia producers as asaccharolytic, monensin sensitive bacteria (Wallace, 1996). Those bacteria that fit this profile have been termed the “classic” HAPs (Bento et al., 2015; Hartinger et al., 2018). However, some bacteria can hyper produce ammonia and grow on peptides or amino acids as the sole carbon source but, when given the opportunity, can also ferment carbohydrates. For example, non-ruminal Clostridium sticklandii (now Acetoanaerobium sticklandii (Galperin et al., 2016)) strains have been shown to be able to ferment glucose, galactose and maltose, yet the availability of these substrates in the medium does not influence growth, and they are therefore either not substrates or only minor substrates contributing to cell growth (Stadtman, 1954; Sangavai and Chellapandi, 2017).

Those bacteria that fit the “classic” HAP phenotype, such as Clostridium aminophilum F, Acetoanaerobium sticklandii SR and Peptostreptococcus anaerobius C, were isolated from the rumen of cattle, whereas E. pyruvativorans I6 was isolated from the rumen of sheep. However, once samples from the rumina of different animals were challenged with a protein-only medium, novel ammonia producing bacteria were isolated but with additional capabilities, thus expanding the niche. Fifteen species were isolated from the rumina of goats and sheep fed Calliandra calothyrsus, a proteinaceous and tannin-rich leafy shrub, all of which were able to grow on peptides and proteins in the presence of tannins and produced some amount of ammonia (McSweeney et al., 1999). Tannins are polyphenolic compounds that form insoluble complexes with proteins that usually decrease protein degradation by microbes (Patra et al., 2012). Of these fifteen species isolated, ten fermented carbohydrates, six either showed low growth or did not use carbohydrates at all, and only one strain was claimed as a HAP, producing an amount of ammonia comparable to that of previous HAPs. This was the first hyper ammonia producing proteolytic species isolated (McSweeney et al., 1999). However, no explicit comment was made about the carbohydrate fermentation abilities of this isolate, only that it was placed in the category of “low level of carbohydrate fermentation”, and it was noticed that the density of cultures decreased with the addition of carbohydrates compared to medium with none. Analysis of the 16S rDNA sequence revealed that this proteolytic strain was most closely related to C. botulinum group 1, while its fermentation patterns of peptides were similar to those of A. sticklandii and the amount of ammonia it produced was comparable to that of C. aminophilum F, A. sticklandii SR and P. anaerobius C (McSweeney et al., 1999).

Species isolated from forage fed Nellore steers (Bos indicus) showed carbohydrate fermentation capabilities alongside ammonia production and were termed hyper ammonia producing bacteria (Bento et al., 2015). This study classified those bacteria producing ≥100 nmol NH3 mg protein−1 min−1 as HAPs, which were all sensitive to monensin, but most strains showed increased culture density with addition of carbohydrates. Only three strains were obligate amino acid fermenters and did not utilise carbohydrates for growth, whereas the other 27 strains were carbohydrate fermenters but had a similar deamination rate to those HAPs previously recorded. Phylogenetic analysis using 16S rDNA sequences revealed distant relations to previously isolated HAP species (Bento et al., 2015). These studies indicate that bacteria are likely influenced by host diet and environment and therefore the subsequent substrates available, leading these bacteria to evolve into their niche. It has been noted previously that the ability to switch between carbohydrate and amino acid fermentation for energy production would indeed be advantageous in the rumen (Hartinger et al., 2018).

The list of known HAP species is short, and the list of those HAP species with their genome sequenced is even shorter. Few HAP isolates have undergone genome sequencing and most only have 16S sequences available. This is because the literature so far has concentrated either on isolating these interesting species capable of this phenotype from various ruminant hosts and characterising their abilities in vitro, or on sequencing and studying an individual species at the genome level. A. sticklandii is an example of this, as its prevalence as an amino acid utiliser and possible novel biochemistry and enzymes that could be exploited for biotechnological applications mean that the genome has been well annotated and studied (as discussed in section 1.4.1.4) (Fonknechten et al., 2010; Sangavai and Chellapandi, 2017). Studies such as these are insightful, reveal an individual’s capabilities and form suitable starting points for comparisons to other species. Such comparisons were carried out between this genome and those of C. aminophilum and other Clostridia, in the context of potential use in biofuel generation (Sangavai and Chellapandi, 2017). The key findings drawn from these comparisons were that the genomes of A. sticklandii, C. aminophilum and C. bifermentans (all previously hypothesised to be HAPs) all encode a “typical transporter system” and that while the amino acid fermentation abilities are very similar, growth patterns still varied. Furthermore, A. sticklandii was shown to have a greater metabolic capability than other genomes, including C. aminophilum (Sangavai and Chellapandi, 2017), suggesting that not all genes present in the genome of A. sticklandii are necessary to be a hyper ammonia producer. The in-depth review of the C. sticklandii DSM519 genome has revealed the key enzymes involved in amino acid catabolism along with interesting features, such as selenoproteins (Fonknechten et al., 2010; Sangavai and Chellapandi, 2017), which could be common to hyper ammonia producing bacteria and provide a good starting point for identifying commonalities.

Further comparisons, however, are limited by the lack of data, something which could be remedied by identifying more HAP species and their genomes. Whilst efforts in the past have concentrated on culturing, being able to identify novel HAPs from genomic data would widen the applicability of these bacteria, for example by predicting HAP species in any data set, such as those from the human microbiome. Externally, this group of bacteria shares a phenotype of amino acid and peptide utilisation and excessive ammonia production, but whether there is a commonality that unites this group remains to be seen. It is with such a commonality that more HAP species could be identified. For example, if this trait is monophyletic or there is a unique genomic signature that confers the phenotype, more HAP bacteria can be predicted. To define this signature, suitable HAPs and their genomes must first be identified, their phenotype confirmed during normal, consistent growth to ensure commonality, and suitable comparisons found.

3.1.1. Aims and Objectives

• To define the hyper ammonia production phenotype.
• To find suitable bacterial species available in culture with their genome sequenced and available.
• To identify a suitable method of comparison such that commonalities that are unique to HAPs can be found.
• To confirm the ammonia production capabilities of the HAP cultures and any other species chosen as comparisons.

3.2. Experimental Procedures

First, a review of the current literature was carried out to identify and define the ammonia production phenotypes and to find which ruminal species belonged to these groups. Those that were available in culture were grown in vitro according to the methods described in sections 2.2 and 2.3. A total of 13 strains from frozen stocks underwent revival (Table 3.1).

Four HAPs were cultured from frozen stock and their genomes sequenced (section 2.6), firstly to confirm the exact species of the culture, and secondly to make the genomes for these laboratory strains available. A growth assay was carried out using the same medium and conditions for all cultivable cultures, so that the approximate mid-log growth phase could be identified for each species (section 2.5). Growing cultures to this point would allow for fairer comparison, as different species require different incubation times to achieve full growth, and comparisons between cells of different species in lag, log, stationary or death phase would create bias. Such comparisons were carried out using measurements of cumulative ammonia concentration in the culture (section 2.9.1), production of volatile fatty acids (section 2.9.2), and fingerprints of the metabolome (section 2.9.3). Part of the 16S rRNA gene was sequenced to confirm the species in each of the cultures (section 2.8), and the resulting sequences were queried against a database of 16S sequences from the Hungate1000 collection (section 2.14). A visual overview of the processes can be seen in the flow diagram in Figure 3.1.

Table 3.1 List of species that underwent revival from frozen stocks, which were stored at either -20°C (1) or -80°C (2). (3) Both of these are the same strain but were obtained from different culture collections at different times. (4) Genomes of these strains were sequenced. (5) These species underwent further characterisation.

Species | Strain(s)
Clostridium aminophilum | ATCC49906 (1,3,5); DSM10710 (2,3,4)
Acetoanaerobium sticklandii | ATCC12662 (1,4,5)
Eubacterium pyruvativorans | Isol6 (1,4,5)
Peptostreptococcus anaerobius | ATCC27337 (1,4)
Megasphaera elsdenii | T81 (1,5)
Prevotella ruminicola | ATCC19189 (1,5)
Selenomonas ruminantium | Z108 (1)
Butyrivibrio fibrisolvens | DSM3071 (2,5)
Ruminococcus albus | SY3 (1,5)
Ruminococcus flavefaciens | 007c (1,5); FD1 (1)
Fibrobacter succinogenes | S85 (1,5)

Figure 3.1 Overview of the methods and processes used. Italics indicate the sections in chapter 2 where the methods are described.

3.3. Results and Discussion

3.3.1. Classification of Rumen Microbes into Ammonia Production Phenotypic Groups

In order to determine interesting characteristics particular to the hyper ammonia producing (HAP) bacteria, it is necessary to compare them to contrasting species that do not share the phenotype. The literature was consulted to determine whether a published classification system for ruminal bacteria based on ammonia production phenotypes could be applied to this study.

Such a system, published previously (Rychlik and Russell, 2000), used three groups: the HAPs, carbohydrate fermenting with low ammonia production rates (2–100 nmol NH3 mg protein−1 min−1) and carbohydrate fermenting with essentially no ammonia production (<1 nmol NH3 mg protein−1 min−1) when grown on a rich medium containing carbohydrates and trypticase. None of the carbohydrate fermenting species would grow on a medium without carbohydrates. However, this system does not account for more recently isolated HAP bacteria that can ferment carbohydrates. A fourth intermediary group has been described: isolates that fulfil the HAP phenotype, but can ferment carbohydrates for energy as well as amino acids and peptides (Bento et al., 2015), and therefore phenotypically should not be grouped with the obligate amino acid fermenting HAPs.

This leads to the conclusion that utilising ammonia production as a method of categorising bacteria is somewhat of a misnomer, as what is more interesting are the metabolic pathways of the cell. Ammonia production results from amino acid degradation, a pathway the HAPs utilise to a large extent, given that their method of energy production is through deamination. Those that produce either no ammonia or a small amount of ammonia were shown not to grow without carbohydrates, indicating that they rely primarily on carbohydrates for energy and growth. In other words, ammonia production is an indication of amino acid and peptide degradation. Therefore, ammonia production by bacteria in the rumen should be represented by a spectrum, where under the assumed optimal circumstances for bacterial growth, a given bacterium will fall somewhere on this scale based on the amount of ammonia a culture produces as well as its chosen metabolic pathways. Depending on where organisms fall, they can then be grouped. A diagram of this is shown in Figure 3.2, which better demonstrates the four ammonia production phenotypes, determined by information taken from literature resources.

Four distinct phenotypes arise from this spectrum. No Ammonia Producers (NAPs) are carbohydrate utilising bacteria that do not produce ammonia when given the opportunity (i.e. nitrogen sources). Some Ammonia Producers (SAPs) are those that produce a measurable yet small amount of ammonia but require carbohydrates for growth. Carbohydrate Fermenting Hyper Ammonia Producers (CF-HAPs) fulfil the HAP phenotype of utilising amino acids and peptides alone for energy and produce a large amount of ammonia, but may also utilise carbohydrates for energy when available. Obligate Amino Acid Fermenters (OAFs) are those “classic” HAPs that do not utilise carbohydrates for growth and produce a large amount of ammonia. For the purpose of studying the characteristics involved with hyper ammonia production, the latter two can be grouped together as HAPs, as they both have the ability to utilise amino acids and peptides as the sole energy source. To each of these groups, a small number of bacteria were assigned based on evidence in the literature. Based on the knowledge of metabolism, other aspects of activity a group may have can be predicted. For example, it was predicted that amino acid transport activity would be greater for HAPs as they rely on this as their energy source as well as for protein synthesis. Another consideration is that all bacteria require peptides and amino acids (or ammonia) as a nitrogen source for protein synthesis.
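
The four phenotypes can be summarised as a simple decision rule, sketched in Python below. Because no numerical cut-offs for ammonia production are defined here, the sketch takes a qualitative ammonia level as input; the labels "none", "some" and "high", the argument names and the function name are hypothetical and serve only to illustrate the grouping.

```python
def ammonia_phenotype(ammonia, grows_on_amino_acids_alone, ferments_carbohydrates):
    """Assign one of the four ammonia production phenotypes.

    ammonia: qualitative level, one of "none", "some" or "high" (hypothetical labels).
    grows_on_amino_acids_alone: True if amino acids or peptides suffice for energy.
    ferments_carbohydrates: True if carbohydrates can be used for energy.
    """
    if ammonia == "high" and grows_on_amino_acids_alone:
        # Both of these groups are treated together as HAPs.
        return "CF-HAP" if ferments_carbohydrates else "OAF"
    if ferments_carbohydrates and not grows_on_amino_acids_alone:
        if ammonia == "none":
            return "NAP"
        if ammonia == "some":
            return "SAP"
    return "unclassified"


example_phenotype = ammonia_phenotype("high", True, False)
```

Combinations that do not match any of the four descriptions are returned as "unclassified" in this sketch instead of being forced into a group.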

Given that the ammonia production phenotypes are all predicted based on literature, these assumptions rely on the relationship between the phenotype expressed by a culture when grown in vitro and the abilities and likely behaviour of the same bacteria in the rumen. For example, offering a bacterium a growth medium supplemented with substrates not found in the rumen may elicit a response that would not occur in the rumen. This tends to be avoided by using rumen fluid-based media. Absolute ammonia production is difficult to determine, as highlighted previously (Russell, 2005), because ammonia can be measured at different points, for example during the stationary phase of bacterial growth or during active exponential growth, which can result in different values. Therefore, there are no numerical interpretations of ammonia production on the graph seen in Figure 3.2; instead, this is just a visual guide to allow comparison of proportions of ammonia production as interpreted from the literature. A full description of each group and the bacteria that have been allocated to them can be found below.


Figure 3.2 Diagram representing the four ammonia producing phenotypes. Hexagons in the top graph represent a bacterium with a known ammonia production phenotype, placed on the y-axis to give a qualitative visual representation of the amount of ammonia it produces compared to other bacteria, based on the literature. X-axis positioning is relevant to metabolic abilities of the bacterium, indicated by the elongated shapes below the graph. Size of trapezoid shape represents proportions of that ability, where a thicker shape shows more of that activity and absence shows none of that activity. (¹Bladen et al., 1961; ²Attwood et al., 1998; ³Whitehead et al., 2011; ⁴Russell, 2005; ⁵Eschenlauer et al., 2002).


3.3.1.1. No Ammonia Producers (NAPs)

Ruminal bacteria that do not produce a net amount of ammonia in vitro belong in this group. Predominantly formed of cellulolytic bacteria, gaining energy from the breakdown of fibre, this group uses ammonia as its primary nitrogen source (Bryant and Robinson, 1961; Hartinger et al., 2018). The adaptation to the utilisation of ammonia over amino acids as a nitrogen source is an indication of specialisation in the rumen, as ammonia is much more prevalent in the rumen environment than free amino acids (Bryant and Robinson, 1961).

Fibrobacter succinogenes (formerly of the Bacteroides genus (Montgomery et al., 1988)), Ruminococcus flavefaciens and R. albus are the predominant cellulolytic species in the rumen (Russell et al., 2009; Pengpeng and Tan, 2013). One study showed that the addition of amino acids or peptides to cultures of these cellulolytic bacteria increased growth, but only as long as enough energy sources are available in the form of carbohydrates (Soto et al., 1994), suggesting that amino acids are used to synthesise new microbial proteins, or because an excess of amino acids means that more are available to deaminate, which produces energy and carbon skeletons, useful for protein synthesis (Atasoglu et al., 2001; Bach et al., 2005). It may be that deamination of amino acids can occur in the cells of these species, but the activity is low and ammonia that is produced this way does not leave the cell; a known difficulty posed for those studying ammonia as nitrogen sources (Atasoglu et al., 2001; Bach et al., 2005).

F. succinogenes strain S85 is the model fibrolytic organism for the rumen, adhering to feed particles to break down cellulose, particularly the crystalline structures, with the ability to use cellulose as its sole energy source, and produces succinate as the main product of its cellulose degradation pathways (Russell et al., 2009; Suen et al., 2011). With regard to nitrogen metabolism, ammonia is the primary source of nitrogen for amino acid synthesis for F. succinogenes, which it transports into the cell using a high-affinity ammonium transporter (Suen et al., 2011). Along with ammonia, it also requires branched and straight chain fatty acids (Bryant and Doetsch, 1954), likely as precursory carbon backbones for amino acids. This strain also cannot form ammonia from the breakdown of amino acids, meaning that limitations in ammonia availability can decrease cell growth (Bryant and Robinson, 1961; Maglione and Russell, 1997; Suen et al., 2011). Addition of single amino acids without ammonia, except for glutamine and asparagine, did not stimulate growth of S85 cultures (Bryant and Robinson, 1961), yet addition of casamino acids (but not tryptone) increased bacterial growth but had no effect on cellulolytic activity, showing that amino acids will stimulate growth but only if there is enough energy (for example in the form of cellulose) available (Soto et al., 1994). Ammonia was not produced when cultures of S85 and four other strains of F. succinogenes were grown on protein hydrolysate (Bladen et al., 1961). Strain F. succinogenes BL2 was found to have high peptidase activity against alanine dipeptides, whereas strain S85 had only low peptidase activity (Wallace and McKain, 1991). Whilst these strains share high levels of similarity in their 16S rRNA sequences, multiple passages and exposure in the laboratory may have resulted in mutations and their functions may be different (Kobayashi et al., 2008). F. succinogenes S85 was available in culture for this current study along with a genome in the Hungate collection, and therefore the ammonia production phenotype can be confirmed. It is unclear whether other strains of F. succinogenes are also NAPs, as cultures were not available to confirm this.

Ruminococci are also prevalent cellulolytic bacteria in the rumen, predominantly R. albus and R. flavefaciens, differentiated from each other by their colony pigment and variation in fermentation products. They both form acetate, formate and hydrogen, but R. albus produces ethanol, whereas R. flavefaciens produces succinate (Russell et al., 2009). Similar to F. succinogenes, strains of these species were found to require ammonia for growth even if other nitrogen sources (such as amino acids) were available in abundance (Bryant and Robinson, 1961). When a variety of strains of both R. albus and R. flavefaciens were tested for production of ammonia on trypticase, none of those tested produced ammonia (Bladen et al., 1961). R. albus SY3 and R. flavefaciens 007c are available in culture for this study and therefore the ammonia production phenotype can be confirmed. The revival of R. flavefaciens FD1 from frozen stock was unsuccessful. Genomes for this and other strains are also present in the Hungate collection, but without cultures to confirm lack of ammonia production these remain putative NAPs. Other predominant bacteria in the rumen that play a role in carbohydrate degradation without proteolytic activity would likely also belong to this group. Succinivibrio dextrinosolvens is an example of such a bacterium, degrading dextrins and starch in the rumen; it can use ammonia and amino acids as sources of nitrogen, but does not produce any ammonia in the presence of ample nitrogen sources (Bladen et al., 1961; Gomez-Alarcon et al., 1982).

3.3.1.2. Some Ammonia Producers (SAPs)

Bacteria belong to this group if, when given amino acids and/or peptides as a nitrogen source, they can deaminate amino acids and produce enough ammonia that it is released from the cell. Energy is mainly produced through carbohydrate fermentation. This includes (but is not limited to) the ruminal species Butyrivibrio fibrisolvens, Megasphaera elsdenii, Prevotella ruminicola and Selenomonas ruminantium.

Butyrivibrio fibrisolvens DSM3071 (strain D1) has been shown to produce ammonia, but only in small quantities as it has a weak deaminating activity, and the ammonia concentration was found not to increase even with increasing amino acid concentrations (Sales et al., 2000). However, addition of amino acids and ammonia to cultures of B. fibrisolvens causes an increase in growth and proteolytic activity (Sales et al., 2000). This species is also sensitive to monensin (Dennis et al., 1981). M. elsdenii strains on the other hand can deaminate amino acids nearly as quickly as HAPs, but total ammonia production was around four times lower and branched amino acids could not be used as a sole energy source. Ammonia production was only seen with casamino acids, not peptides, suggesting absence of peptidase activity in this species (Rychlik et al., 2002). M. elsdenii plays a role in lactate fermentation as well as carbohydrate fermentation and amino acid deamination (Counotte et al., 1981; Dennis et al., 1981; Weimer and Moen, 2013). Strain T81 has the highest ammonia production when grown with casamino acids and is resistant to monensin (Dennis et al., 1981; Rychlik et al., 2002). P. ruminicola strains use ammonia and peptides as their nitrogen sources, are proteolytic, utilise starch and carbohydrates, and play an integral role in succinate fermentation that contributes to rumen homeostasis (Purushe et al., 2010). However, P. ruminicola cannot utilise amino acids as a sole nitrogen source (Kim et al., 2017). Although ammonia production was observed for strain 23 (the type strain) when grown on trypticase, and P. ruminicola was indeed claimed to be the most important species for ammonia production in the rumen (Bladen et al., 1961), later studies found that ammonia consumption outweighed production during growth, noting an overall decrease in ammonia over time instead of an increase from production. This is likely a symptom of previous studies testing for ammonia after longer incubation times or on resting cells, resulting in observing ammonia accumulation, whereas studies on growing cells showed net ammonia uptake (Pittman and Bryant, 1964; Bryant and Robinson, 2010). This is supported by a study on a different strain, which found that ammonia production was low during active growth, but once the stationary phase was achieved ammonia began to accumulate (Russell, 1983). This is an indication that whilst P. ruminicola strains have the ability to deaminate amino acids to ammonia, this process may only be used during the stationary phase of growth. S. ruminantium is another species shown to produce ammonia when grown on trypticase (Bladen et al., 1961), and strain HD4 has been implicated as a SAP previously (Russell et al., 1988). Butyrivibrio fibrisolvens DSM3071, Megasphaera elsdenii T81 and Prevotella ruminicola ATCC19189 are available in culture, and therefore the ammonia production phenotype can be confirmed. The revival of S. ruminantium strain Z108 from frozen stocks was unsuccessful. There are a further seven, three and six strains for these species respectively in the Hungate collection, as well as 12 strains of S. ruminantium, which, without a culture for phenotype confirmation, remain putative SAPs.

3.3.1.3. Carbohydrate Utilising Hyper Ammonia Producers

This group is populated by those bacteria that have been isolated from the rumen and defined as hyper ammonia producers with the ability to grow purely on amino acids or peptides, but also have the ability to ferment or utilise carbohydrates for energy, given the opportunity. Although hyper ammonia producing bacteria were originally defined as being asaccharolytic, a study on Brazilian steers revealed carbohydrate fermenting HAPs, and concluded that this niche might actually be wider than once thought (Bento et al., 2015). There is a possibility that these bacteria have the ability to shift their metabolism based on substrate availability (Hartinger et al., 2018), as shown by strains of Peptostreptococcus russellii, a weakly saccharolytic bacterium isolated from swine manure, which grow more and produce less ammonia in the presence of glucose compared to amino acids or peptides (Whitehead and Cotta, 2004; Whitehead et al., 2011). One strain of this bacterium has also been identified and cultured from the rumen, and is part of the Hungate1000 collection (Seshadri et al., 2018). As well as P. russellii, other species of carbohydrate utilising HAPs include Fusobacterium necrophorum, for which one genome is available in the Hungate collection, and other undetermined species that relate to Clostridium argentinense, C. botulinum and C. bifermentans (Russell, 2005; Bento et al., 2015). None of these species are available in culture, so their ammonia production phenotype cannot be confirmed and they remain putative HAPs.

3.3.1.4. Obligate Amino Acid Fermenting Hyper Ammonia Producers

Peptostreptococcus anaerobius C was the first hyper ammonia producing bacterium isolated from the rumen, a monensin sensitive bacterium that could be cultured on trypticase as the only energy and nitrogen source and that produces a large amount of ammonia (Russell et al., 1988). This strain did not utilise any carbohydrates when given as a sole energy source, although over 90 % of strains (n = 63) of this species are able to ferment glucose and weakly ferment mannose (Murphy and Frick, 2013). Hydrogen was produced, as was hydrogen sulphide when the medium contained trypticase and cysteine (Russell et al., 1988). Amino acids are preferred over peptides, likely due to an inability to transport peptides effectively, with higher ammonia production from media containing casamino acids instead of trypticase, and production from Stickland pairs of amino acids greater than from single amino acids (Chen and Russell, 1988). Amino acid usage for this ruminal isolate can be seen in Table 3.2 and the reactions of the preferential amino acids can be seen in Figure 3.3(A). Although this species is described in the literature as a HAP, sequencing of the culture thought to be P. anaerobius revealed contamination, and as the phenotype therefore cannot be confirmed in this study, it remains a putative HAP.

A small short rod named strain SR was the next hyper ammonia producer isolated from the rumen (Chen and Russell, 1989a), later named Clostridium sticklandii (Paster et al., 1993) now of the Acetoanaerobium genus (Galperin et al., 2016). Neither the genome nor the culture of SR strain was available for this current study, however the ATCC strain 12662 was used, the 16S rRNA gene for which was previously shown to be “essentially identical” (99.9 %) to the SR strain sequence (Paster et al., 1993), suggesting they are very close relatives. This strain was isolated from mud from San Francisco bay (Stadtman, 1954) and has been deposited in the ATCC as well as DSMZ (DSM519). It has since become the most well studied proteolytic Clostridium as its variety of biochemical abilities have shed new light on interesting metabolic pathways (Fonknechten et al., 2010). Most notable is the ability to use amino acids in Stickland pairs, with threonine, arginine, lysine and serine acting as receptors, and glycine and proline as donors, reactions for which can be seen in Figure 3.3(B). There is a requirement for peptides for growth, either in the form of tryptone or yeast extract (Stadtman, 1954), whereas addition of glucose and other carbohydrates did not increase growth (Stadtman, 1954; Mead, 1971), despite other studies showing weak fermentation of some carbohydrates such as glucose, galactose and maltose (Stadtman and McClung, 1957; Sangavai and Chellapandi, 2017). Carbohydrates are likely not substrates for energy or growth (Fonknechten et al., 2010).


In the same vein as Acetoanaerobium sticklandii, Acidaminococcus fermentans should also be considered an obligate amino acid fermenter, as although some strains have been shown to be weakly saccharolytic, utilising glucose, the energy for this bacterium must originate from fermentation of amino acids (Rogosa, 1969). The type strain does not utilise glucose, instead using glutamate as a primary carbon source as well as energy source, alongside citrate and trans-aconitate, producing ammonia, carbon dioxide, butyrate and hydrogen (Chang et al., 2010). Although there are two strains available in the Hungate 1000 collection, there is no indication that these strains use carbohydrate substrates, nor that they are currently classed as hyper ammonia producers (Seshadri et al., 2018). Without cultures to test the phenotype of this species, they remain putative HAPs.

The third obligate amino acid fermenting bacterium isolated from the rumen was called Clostridium aminophilum F, an asaccharolytic irregular shaped rod, with a preference for free amino acids over peptides for growth (Chen and Russell, 1989a; Paster et al., 1993). Ammonia production from amino acids, however, is more particular than for the other two previously isolated ammonia-producing bacteria (Chen and Russell, 1989a). The amino acid preferences can be seen in Table 3.2, and products from these can be seen in Figure 3.3(C). This strain is available in culture (as ATCC 49906), thus the ammonia production phenotype can be confirmed, whilst the genome of it and another strain, a putative HAP, are available in the Hungate collection.

Figure 3.3 Reactions of the amino acids most preferentially used by the obligate amino acid fermenting bacteria. ¹Chen and Russell, 1988; ²Fonknechten et al., 2010.

Figure 3.4 An example of alanine degradation producing pyruvate, ammonia and hydrogen from MetaCyc (Caspi et al., 2014).

Eubacterium pyruvativorans isol6 is another asaccharolytic hyper ammonia producer isolated from the rumen of sheep, using peptides and amino acids as a sole energy source to produce ammonia in a comparable quantity to previously isolated HAP bacteria. Carbohydrates were not utilised and addition of carbohydrates to the growth medium did not increase growth (Wallace et al., 2003). Valerate and caproate are the main volatile fatty acids produced, while propionate and butyrate are used. Interestingly, this bacterium grows most rapidly on pyruvate, after which it is named, but given the low pyruvate availability in the rumen, this is unlikely to be the main ecological niche this bacterium occupies in this environment (Wallace et al., 2004). This strain is available in culture, and its phenotype can therefore be confirmed. Its genome, along with genomes of two other strains of putative HAPs, is available in the Hungate collection, in addition to the genome of the culture sequenced for this project.

There are no reactions available in the literature for this bacterium; however, based on the known metabolites produced, it is reasonable to assume that leucine and serine degradation likely leads to volatile fatty acid production as in P. anaerobius (Figure 3.3A), although malate is not reported as a product from E. pyruvativorans. According to MetaCyc (Caspi et al., 2014), there are a number of pathways in which alanine is degraded to pyruvate in bacteria, such as that shown in Figure 3.4, producing ammonia, hydrogen, and pyruvate, which could be the main pathway for ATP production and for producing butyrate, valerate and caproate (Wallace et al., 2004).

Other less characterised obligate amino acid fermenters include Peptostreptococcus sp. D1, isolated from ruminants in New Zealand. This isolate adheres to the same phenotype as those described above, growing on peptides preferentially and using amino acids as the sole energy source, resulting in high ammonia production. Addition of carbohydrates did not increase growth (Attwood et al., 1998). The genome for this strain is also available, but the strain remains a putative HAP as the phenotype cannot be confirmed without a culture.


Table 3.2 Overview of the Hyper Ammonia Producing bacteria phenotypes. Nitrogen source and amino acid preference is determined by those that are depleted most in media or result in more ammonia production.

Bacteria | Amino Acid Preference | Nitrogen Preference | Reference
Peptostreptococcus anaerobius C | Leucine, serine, phenylalanine, threonine, glutamine | Casamino acids | (Chen and Russell, 1988)
Acetoanaerobium sticklandii ATCC 12662 | Threonine, arginine, serine, cysteine, proline and glycine | Yeast extract or tryptone required¹ | (Fonknechten et al., 2010), ¹(Stadtman, 1954)
Clostridium aminophilum F | Glutamate, glutamine, serine, histidine | Casamino acids | (Paster et al., 1993)
Eubacterium pyruvativorans I-6 | Alanine, leucine, serine and proline | Trypticase required for growth | (Wallace et al., 2004)


3.3.2. Model Bacteria Growth and Metabolism In Vitro

Nine bacteria were successfully cultured and studied in vitro, three each of HAPs, NAPs, and SAPs (see Table 3.1 for more details). Confirmation of the four HAP cultures available was achieved firstly by whole genome sequencing, which also had the advantage of providing the genome sequence for the exact HAPs in culture. All cultures were confirmed using 16S sequencing and, once confirmed, ammonia production was studied as well as volatile fatty acid production and metabolic fingerprints. It was apparent from observations made whilst growing these cultures as well as from the growth curves (Appendix Figure 10.1) that the HAP species were the slowest to grow, and did not grow as densely as the SAPs, which tended to grow more densely than could be measured accurately with the spectrophotometer. M. elsdenii T81 grew the fastest, reaching stationary phase within 10 hours, and A. sticklandii ATCC12662 grew the slowest, taking around 48 hours to reach the stationary phase of growth. Although some of the growth curves do not fully capture the lag, stationary or death phases, they indicate suitable times for samples to be taken such that the culture is in the exponential (log) growth phase, as shown by dashed lines on the graphs (Appendix Figure 10.1).

3.3.2.1. Genome Sequencing for Culture Identification and Confirmation

Sequencing of the four HAP cultures (see Table 3.1) revealed that two of the four cultures were indeed what was expected, but the other two were not. MicrobesNG offers sequence analysis and assembly as well as raw data in the form of trimmed reads. Reads for samples were mapped to known bacterial families or genera using Kraken, giving an idea of taxonomic identity for the samples. 83.15 % and 89.61 % of the reads sequenced from the cultures thought to be Peptostreptococcus anaerobius ATCC 27337 and Clostridium aminophilum DSMZ 10710 mapped to Staphylococcus epidermidis and Lactobacillus ruminis, respectively. 99.49 % of reads from Acetoanaerobium sticklandii ATCC 12662 mapped successfully to [Clostridium] sticklandii as expected. For Eubacterium pyruvativorans Isol6, on the other hand, the most frequent species assignment was Clostridium sp., with only 0.91 % of the reads mapping to it. The full results that MicrobesNG provide can be seen in Table 3.4.
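The way these Kraken percentages were read can be sketched as a simple check of the most frequent species against the expected one. The values below are copied from Table 3.4; the 50 % cut-off, the matching on species epithet and the three outcome labels are assumptions made for illustration and were not part of the MicrobesNG analysis.

```python
# Most frequent species and percentage of reads per sample (from Table 3.4).
kraken_top_species = {
    "10461_PepA27337": ("Staphylococcus epidermidis", 83.15),
    "10462_ClosS12662": ("[Clostridium] sticklandii", 99.49),
    "10463_EPyisol6": ("Clostridium sp.", 0.91),
    "10464_LaClos10710": ("Lactobacillus ruminis", 89.61),
}

# Species epithet expected for each culture sent for sequencing.
expected_epithet = {
    "10461_PepA27337": "anaerobius",
    "10462_ClosS12662": "sticklandii",
    "10463_EPyisol6": "pyruvativorans",
    "10464_LaClos10710": "aminophilum",
}

MIN_PERCENT = 50.0  # hypothetical cut-off below which the result is not trusted


def assess(sample: str) -> str:
    """Return 'as expected', 'contaminated' or 'inconclusive' for one sample."""
    species, percent = kraken_top_species[sample]
    if percent < MIN_PERCENT:
        # Too few reads classified to say anything, e.g. a taxon that is
        # poorly represented in the reference database.
        return "inconclusive"
    if expected_epithet[sample] in species:
        return "as expected"
    return "contaminated"


for sample in kraken_top_species:
    print(sample, assess(sample))
```

Under these assumptions, an inconclusive result is a prompt for a second line of evidence, such as placement on a phylogenetic tree, rather than a verdict on the culture.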

Quality of the assembled genomes of the two cultures of interest was visually assessed using FastQC, revealing that the assemblies were satisfactory, with no unexplained abnormalities. MicrobesNG also uses QUAST, a quality assessment tool for genome assemblies. The results can be seen in Table 3.3. Further confirmation of the identity of the bacteria in the cultures was obtained by checking their placement on the phylogenetic tree, as can be seen in Figure 4.2 in Section 4.4.2, where these genomes are indicated using purple arrows. Placement of the taxa of the four sequenced bacteria revealed that two of the samples were indeed contaminated and not as expected, whilst the other two were as expected. The sequence from the culture expected to be P. anaerobius ATCC 27337 was closely related to and positioned between two strains of Staphylococcus epidermidis. C. aminophilum DSMZ 10710 was placed in a clade of other strains of Lactobacillus ruminis. The positioning of these taxa on the tree corroborates the earlier Kraken findings. Whilst Kraken did not identify E. pyruvativorans from the reads, the low number of reads mapping to Clostridium species indicates a likely lack of information in the database about this particular ruminal bacterium. Positioning on the tree, however, is as expected, with the sequenced genome sharing a common ancestor with all three of the other E. pyruvativorans strains. There are no genomes for any strains of A. sticklandii in the Hungate 1000 collection despite strains of this species having been isolated from the rumen previously (Paster et al., 1993). Given that a very high number of reads (99.49 %) mapped to A. sticklandii in Kraken, the likelihood that this is a contaminant is very low. Instead, the placement of this taxon is interesting: it fell amongst other known hyper ammonia producers, clustering amongst Peptostreptococcus and Clostridium species. This result agrees with phylogenetic reconstructions performed elsewhere (Wallace et al., 2003).

Table 3.3 Statistics on the sequenced genomes using QUAST from MicrobesNG.

Statistic | 10461_PepA27337 | 10462_ClosS12662 | 10463_EPyisol6 | 10464_LaClos10710
Contigs | 34 | 19 | 51 | 64
Largest contig (bp) | 312041 | 1065135 | 361391 | 179634
Total length (bp) | 2516271 | 2693859 | 2174708 | 2049063
N50 | 180773 | 475503 | 118119 | 63592
L50 | 6 | 2 | 6 | 10
GC (mol %) | 31.93 | 33.17 | 54.82 | 43.38
Ns per 100 kbp | 0 | 0 | 0 | 0
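For readers unfamiliar with the N50 and L50 statistics in Table 3.3, the sketch below shows how they are conventionally derived from a list of contig lengths: N50 is the length of the contig at which the cumulative length, counted from the largest contig downwards, first reaches half of the assembly, and L50 is the number of contigs needed to get there. The function name and the toy contig lengths are illustrative assumptions; the values in the table were produced by QUAST, and per-contig lengths are not reported here.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bp."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            # 'length' is the N50 contig; 'count' contigs were needed (L50).
            return length, count
    raise ValueError("no contigs supplied")


# Toy assembly (hypothetical lengths, not data from this study).
toy_contigs = [50, 40, 30, 20, 10]
print(n50_l50(toy_contigs))
```

A higher N50 together with a lower L50, as seen for 10462_ClosS12662, indicates that more of the assembly is held in a few long contigs.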

A genome comparison was carried out to compare each of the two sequenced HAPs of interest with its corresponding previously sequenced counterpart. For E. pyruvativorans Isol6, there is a genome for the same strain deposited in the Hungate 1000 collection, whereas a complete genome for the same strain of A. sticklandii was downloaded from NCBI for comparison. This was carried out using Mauve (Darling et al., 2004) and the resulting diagrams can be seen in Figure 3.5. Compared to the complete genome sequence of C. sticklandii DSM519, which consists of a single contig, the sequenced genome has many more contigs, and some areas were not captured or assembled successfully in the sequencing for this project, indicated by gaps between the coloured blocks in the complete genome in Figure 3.5A. Despite this, the blocks that have been identified in the sequenced genome are whole contigs that align to segments of the complete genome, each with a very high sequence similarity, indicated by the darker lines within the blocks. There are many small contigs, indicated by the mass of red on the right-hand side of the sequenced genome, which were not aligned successfully to the complete genome. This may be due to short contig length, or poor assembly for these contigs. The sequenced E. pyruvativorans in Figure 3.5B shows that this assembly has achieved better assembly of some contigs compared to the reference from the Hungate collection. For example, the first, third and fourth contigs in the sequenced genome contain blocks that match those found in separate contigs in the reference genome.

Because two of the four genomes came back as contaminated, additional confirmation of the cultures was required before further studies were carried out.

3.3.2.2. 16S Sequencing for Culture Identification

Sequencing the V4 to V6 regions of the 16S rRNA gene in the nine cultures that underwent phenotype confirmation revealed that all cultures were indeed of the expected species, with all but two cultures sharing a high similarity with the expected strain. The two that did not match the expected strain are highlighted in grey in Table 3.5. Additionally, some sequences matched multiple strains of the same species. It is likely that this result is due to short query sequences; a higher resolution and more reliable strain identification would therefore be achieved by sequencing the whole 16S gene, as well as by querying a larger database, not just one formed from sequences from the Hungate1000 collection. However, for this study, knowing that a culture is the correct species is the most important aspect. Running the PCR products on a gel and excising the band also proved unnecessary, as the forward and reverse samples both hit the expected target of E. pyruvativorans Isol6 when just the PCR products were submitted for sequencing.
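The species-level check described here amounts to comparing the genus and species of the expected culture with those of the best BLAST hit. The sketch below illustrates this using a few rows from Table 3.5; the helper names, the parsing of subject IDs and the accession-stripping pattern are assumptions for illustration and do not reproduce the analysis pipeline used in this study.

```python
import re

# (expected culture, direction, best BLAST subject ID, % identity),
# a subset of rows from Table 3.5.
hits = [
    ("Eubacterium pyruvativorans isol6", "F",
     "Eubacterium_pyruvativorans_i6", 98.551),
    ("Acetoanaerobium sticklandii ATCC12662", "F",
     "NR_102880.1_Acetoanaerobium_sticklandii_strain_DSM_519"
     "_16S_ribosomal_RNA,_partial_sequence", 98.673),
    ("Fibrobacter succinogenes S85", "F",
     "Fibrobacter_succinogenes_subsp._elongatus_HM2", 97.778),
    ("Megasphaera elsdenii T81", "F", "Megasphaera_elsdenii_LC1", 99.074),
]


def binomial(name: str):
    """Extract (genus, species) in lower case from a culture name or subject ID."""
    # Drop a leading accession such as 'NR_102880.1_' if present (assumed format).
    name = re.sub(r"^[A-Z]{2}_\d+\.\d+_", "", name)
    genus, species = re.split(r"[_ ]+", name)[:2]
    return genus.lower(), species.lower()


def species_confirmed(expected: str, subject_id: str) -> bool:
    """True when the best hit belongs to the expected species."""
    return binomial(expected) == binomial(subject_id)


for expected, direction, subject_id, identity in hits:
    print(expected, direction, identity, species_confirmed(expected, subject_id))
```

This check deliberately ignores the strain designation, in line with the point above that the short V4 to V6 query sequences support species-level but not reliable strain-level identification.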


Table 3.4 Results from MicrobesNG taxonomic identification analysis. As a part of the sequencing service, Kraken is used to map reads from samples to known bacterial families or genera, and the outcome for the most frequent taxon is reported below. Samples were arbitrarily labelled but include strain information. Results can be found here: https://microbesng.uk/portal/projects/53009323-2238-484E-99C0-53955A370129/.

Sample | Unclassified (%) | Most frequent family | Most frequent family (%) | Most frequent genus | Most frequent genus (%) | Most frequent species | Most frequent species (%) | Escherichia coli (%)
10461_PepA27337 | 12.97 | Staphylococcaceae | 86.87 | Staphylococcus | 86.86 | Staphylococcus epidermidis | 83.15 | 0.01
10462_ClosS12662 | 0.44 | Peptostreptococcaceae | 99.49 | Peptoclostridium | 99.49 | [Clostridium] sticklandii | 99.49 | 0.01
10463_EPyisol6 | 93.36 | Clostridiaceae | 1.16 | Clostridium | 1.14 | Clostridium sp. | 0.91 | 0.01
10464_LaClos10710 | 9.81 | Lactobacillaceae | 89.96 | Lactobacillus | 89.96 | Lactobacillus ruminis | 89.61 | 0


Figure 3.5 Mauve multiple genome aligner showing arrangement of colinear blocks in the genomes of the sequenced HAPs compared to a reference genome. (A) shows the reference genome C. sticklandii DSM519 (top) aligned to the sequenced A. sticklandii ATCC 12662 strain from this study (below). (B) shows the reference genome E. pyruvativorans i6 (top) from the Hungate collection aligned to the E. pyruvativorans Isol6 genome from this study (below). Each colinear block is coloured, with red lines denoting contigs, and coloured lines between the genomes match up blocks that are aligned. Sequence similarity is represented as a darker coloured line within the graph blocks; for most blocks, similarity is so high this is difficult to visualise.


Table 3.5 BLAST results of sequenced 16S rRNA gene from cultures. The name of the culture is in the first column, followed by direction (Dir); forwards (F) or reverse (R). Subject ID is the sequence in the database that the query sequence had the highest match to, with percentage identity in the adjacent column. Those highlighted in grey did not match the strain expected. Some queries hit more than one sequence in the database equally well, and both are reported here where this was the case. ¹PCR products submitted for sequencing, not excised gel band.

Expected | Dir | Subject ID | % ident
Eubacterium pyruvativorans isol6 | F | Eubacterium_pyruvativorans_i6 | 98.551
Eubacterium pyruvativorans isol6 | R | Eubacterium_pyruvativorans_i6 | 99.73
Acetoanaerobium sticklandii ATCC12662 | F | NR_102880.1_Acetoanaerobium_sticklandii_strain_DSM_519_16S_ribosomal_RNA,_partial_sequence | 98.673
Acetoanaerobium sticklandii ATCC12662 | R | NR_102880.1_Acetoanaerobium_sticklandii_strain_DSM_519_16S_ribosomal_RNA,_partial_sequence | 100
Clostridium aminophilum ATCC49906 (strain F) | F | Clostridium_aminophilum_F | 98.578
Clostridium aminophilum ATCC49906 (strain F) | R | Clostridium_aminophilum_F | 98.944
Ruminococcus flavefaciens 007c | F | Ruminococcus_flavefaciens_007c; Ruminococcus_flavefaciens_17 | 99.615
Ruminococcus flavefaciens 007c | R | Ruminococcus_flavefaciens_007c; Ruminococcus_flavefaciens_17 | 99.462
Fibrobacter succinogenes S85 | F | Fibrobacter_succinogenes_subsp._elongatus_HM2 | 97.778
Fibrobacter succinogenes S85 | R | Fibrobacter_succinogenes_subsp._elongatus_HM2 | 98.916
Ruminococcus albus SY3 | F | Ruminococcus_albus_SY3; Ruminococcus_albus_AD2013 | 100
Ruminococcus albus SY3 | R | Ruminococcus_albus_SY3; Ruminococcus_albus_AD2013 | 99.462
Butyrivibrio fibrisolvens DSMZ 3071 (strain D1) | F | Butyrivibrio_fibrisolvens_D1 (DSMZ 3071) | 99.267
Butyrivibrio fibrisolvens DSMZ 3071 (strain D1) | R | Butyrivibrio_fibrisolvens_D1 (DSMZ 3071) | 100
Prevotella ruminicola ATCC19189 (strain 23) | F | Prevotella_ruminicola_23 (ATCC 19189); Prevotella_ruminicola_KHP1 | 98.106
Prevotella ruminicola ATCC19189 (strain 23) | R | Prevotella_ruminicola_23 (ATCC 19189); Prevotella_ruminicola_KHP1 | 99.727
Megasphaera elsdenii T81 | F | Megasphaera_elsdenii_LC1 | 99.074
Megasphaera elsdenii T81 | R | Megasphaera_elsdenii_J1 | 97.468
Eubacterium pyruvativorans isol6¹ | F | Eubacterium_pyruvativorans_i6 | 96.061
Eubacterium pyruvativorans isol6¹ | R | Eubacterium_pyruvativorans_i6 | 100


3.3.2.3. Analysis of Ammonia Production Confirms Phenotypes

Ammonia production from the nine cultures was measured at the midpoint of the log growth phase. As shown in Figure 3.6, cultures of the three HAPs (E. pyruvativorans, A. sticklandii and C. aminophilum) contained significantly more ammonia (P < 0.05, Kruskal-Wallis test with Bonferroni correction) than the medium baseline, the NAPs and the SAPs, but did not differ significantly from each other. The NAPs were all significantly lower than the blank medium and showed no significant differences among themselves. The SAPs showed no significant change in the amount of ammonia present compared to the Hobson’s M2 growth medium, nor were there any significant differences in production among these three bacteria. Since ammonia was present in the medium due to the addition of ammonium salts, as well as any ammonia present in the rumen fluid, it was important to determine the background amount of ammonia in the medium before the addition of bacteria. This baseline is indicated on the graph by a solid line for the median and dashed lines for the boxplot whiskers. This experiment does not indicate the rate of ammonia production or the absolute change in ammonia, as that would require measuring ammonia concentrations over time, as well as determining the amount of ammonia in each tube of medium before culturing. Given that the rate of ammonia production relative to substrate is already well documented for these HAP bacteria (Chen and Russell, 1988, 1989a; Russell et al., 1988; Paster et al., 1993; Wallace et al., 2003), extensive testing seemed unnecessary. Instead, this experiment illustrates the amount of ammonia produced by a bacterial culture that has reached roughly half of its likely maximum cell density, based on turbidimetric measurements.
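As a minimal sketch of the statistics named above, the Kruskal-Wallis H statistic can be computed from pooled ranks, and a Bonferroni correction applied to a set of pairwise P values. The replicate readings and pairwise P values below are invented for illustration and are not the measurements from this study; the function names are hypothetical, and the closed-form P value used here only holds for three groups (two degrees of freedom).

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic."""
    pooled = [v for group in groups for v in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    return h / correction


def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical ammonia readings (mM), six replicates per group.
hap = [13.0, 14.0, 15.0, 16.0, 17.0, 18.0]
sap = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
nap = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

h_stat = kruskal_wallis_h([nap, sap, hap])
# With three groups, H is compared to a chi-square distribution with
# 2 degrees of freedom, whose survival function is exp(-H / 2).
p_value = math.exp(-h_stat / 2)
adjusted = bonferroni([0.004, 0.020, 0.300])  # hypothetical pairwise P values
```

With more than three groups, as in Figure 3.6, the P value would instead come from a chi-square distribution with (number of groups - 1) degrees of freedom, which is usually taken from a statistics library.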

The results (Figure 3.6) clearly show that HAP cultures contain significantly more ammonia than was originally present in the medium, and more than the cultures of both the SAPs and the NAPs. The graph also shows a decrease in ammonia in each of the NAP cultures, which agrees with experiments conducted previously (Atasoglu et al., 2001) and is likely due to the incorporation of ammonia into microbial protein (Tamminga, 1981; Abdoun et al., 2007) without any reciprocal production of ammonia diffusing out of the cells. The amount of ammonia in the cultures of any of the SAPs did not differ significantly from the medium baseline, which could indicate either that ammonia is neither used nor produced by these bacterial cells during this phase of growth, or that the rate of ammonia production matches the rate at which ammonia is used.

Both of these theories are supported by observations of Prevotella ruminicola B14 (previously Bacteroides ruminicola) published in the literature. One study found that during the exponential growth phase, Prevotella ruminicola produced little ammonia from peptide deamination, but once in the stationary phase, ammonia concentration increased (Russell, 1983). This supports the first theory, in which the level of ammonia does not differ from the baseline because little ammonia was used or produced at the time of quantification, when the culture was in the growth phase. It was also observed that cultures of P. ruminicola that were not provided with trypticase (a source of peptides) in the growth medium showed a decrease in ammonia concentration during bacterial growth, but concentrations then increased in the later stages of growth (Russell, 1983). This showed that ammonia can be used by the cells. In the current experiment, however, the medium contained a peptide source in the form of tryptone, so it is likely that ammonia was not used as a source of nitrogen, as no significant decrease in ammonia concentration was observed. On the other hand, the same study used labelled nitrogen atoms in ammonium sulphate to follow nitrogen in the ammonia pool, concluding that a small amount of ammonia cycled from trypticase nitrogen into the extracellular ammonia pool, then back into cellular nitrogen (Russell, 1983). This supports the second theory, showing that growing cultures may indeed produce ammonia as quickly as they use it.

Figure 3.6 A boxplot in the style of Tukey of ammonia production from cultures in mid-log growth phase. Each box shows the median of six replicates for each bacterium, bounded by the lower and upper quartiles (hinges). Whiskers extend to the smallest and greatest observations that lie within 1.5 times the interquartile range below and above the hinges, respectively. Observations outside of this range appear as outliers, indicated in the graph as black points. The solid line indicates the median of the blank, with dashed lines showing the upper and lower hinges of the blank to indicate a baseline of ammonia in the medium. Shared letters show no significant difference using Kruskal-Wallis and the Bonferroni correction method (P>0.05).
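The elements of the Tukey-style boxplot in Figure 3.6 (median, hinges, whiskers and outliers) can be sketched for a single group of six replicates as follows. The readings are hypothetical, `tukey_box_stats` is an illustrative name, and the quartile method chosen here is an assumption: plotting software may compute hinges slightly differently.

```python
import statistics


def tukey_box_stats(values):
    """Summarise one group the way a Tukey-style boxplot draws it."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median,
        "hinges": (q1, q3),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Six hypothetical replicate readings, one of them unusually high.
stats = tukey_box_stats([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
```

Here the value 100.0 falls beyond the upper fence and would be drawn as a separate point, while the upper whisker stops at 5.0.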

3.3.2.4. Volatile Fatty Acid Production

Total VFA production is given in Table 3.6 and shown as a bar graph in Figure 3.7. E. pyruvativorans Isol6, a hyper ammonia producer, produced the highest amount of volatile fatty acids overall, at around 115 mM after taking into account the VFA concentration in the starting growth medium. This was mainly caproate produced by this bacterium. Only four of the bacteria produced significantly more VFA than the base level (as determined by the VFA concentration in the growth medium), three of which are HAPs.
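The figure of around 115 mM can be reproduced from the mean values in Table 3.6 with a short calculation. This is only a sketch: the variable names are illustrative, and multiplying a mean total by a mean molar proportion gives an approximate concentration rather than a measured one.

```python
# Mean total VFA (mM) and caproate molar proportion (%) from Table 3.6.
medium_total_mm = 43.86          # Hobson's M2 medium blank
culture_total_mm = 158.89        # E. pyruvativorans Isol6
culture_caproate_pct = 72.06     # E. pyruvativorans Isol6

# Net VFA attributable to the culture, after subtracting the medium baseline.
net_total_mm = culture_total_mm - medium_total_mm

# Approximate caproate concentration in the culture: total x molar proportion.
caproate_mm = culture_total_mm * culture_caproate_pct / 100

print(f"net VFA: {net_total_mm:.2f} mM, caproate: {caproate_mm:.2f} mM")
```

The approximate caproate concentration is close to the net increase in total VFA, which is consistent with caproate being the main product of this culture.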

E. pyruvativorans uses propionate and butyrate to produce caproate and valerate (Wallace et al., 2004), which is reflected in the decrease in the former two VFAs and an increase in the latter two compared to the medium (Figure 3.7). Interestingly, the decrease in butyrate compared to the Hobson’s M2 medium was not noticed previously (Wallace et al., 2004), which may be due to subtle differences in medium composition. Fermentation of pyruvate into acetate or butyrate is a common pathway shared by many anaerobes in the rumen, releasing ATP (Russell, 2002). The most propionate and valerate were produced by M. elsdenii; these are products of D-lactate fermentation (Weimer and Moen, 2013). M. elsdenii also produced significant amounts of butyrate and caproate compared to other species and the medium (Figure 3.7), which are products of glucose metabolism (Nelson et al., 2017). Butyrate production was also increased for B. fibrisolvens, but not significantly compared to the medium, which is surprising since Butyrivibrio is deemed the most important genus in the rumen for butyrate production (Russell, 2002). Casamino acid fermentation by C. aminophilum produces butyrate and acetate (Paster et al., 1993). This is similar to A. sticklandii, which, as well as acetate and butyrate, also produces propionate, and amino-valerate as an intermediate during amino acid fermentation (Sangavai and Chellapandi, 2017); both patterns are reflected in the results.

The absence of caproic acid for F. succinogenes suggests that this VFA may have been metabolised completely from the medium, yet interestingly there is no mention of the use of caproate in the literature. This result might instead have been due to the levels of caproate being below the detectable levels of the background. There is also variation in the amount of VFAs detected in the growth medium, as shown by the relatively large standard deviations. Given that the six replicates of the medium all originated from the same batch, the variation should be relatively low. Instead, this variation may be due to the detection method, and it nonetheless goes some way towards explaining the variation seen between different bacterial samples as well.


Table 3.6 Volatile fatty acid (VFA) production from bacterial cultures. Values are means of six individual replicates with the standard deviation shown. Kruskal-Wallis tests with Bonferroni corrections were completed for each column. Shared letters show no significant difference (P>0.05). Acetate, propionate, butyrate, valerate and caproate are given as VFA molar proportions (%).

| Sample | Total VFA (mM) | Acetate | Propionate | Butyrate | Valerate | Caproate |
|---|---|---|---|---|---|---|
| Eubacterium pyruvativorans Isol6 | 158.89 ± 5.79a | 11.85 ± 1.15a | 1.42 ± 0.05a | 4.66 ± 0.54a | 10.00 ± 0.22a | 72.06 ± 1.56a |
| Acetoanaerobium sticklandii 12662 | 61.00 ± 8.16ab | 49.79 ± 1.93a | 9.50 ± 0.87ab | 24.33 ± 1.35ab | 11.49 ± 1.22ab | 7.90 ± 1.54bc |
| Clostridium aminophilum 49906 | 60.28 ± 6.88ab | 38.27 ± 5.28a | 8.87 ± 0.87ab | 40.61 ± 3.54bc | 5.70 ± 0.64bc | 6.56 ± 1.00bc |
| Ruminococcus flavefaciens 007c | 48.63 ± 8.70bc | 48.54 ± 5.43a | 11.42 ± 1.36b | 25.20 ± 2.19cd | 6.50 ± 0.79bc | 8.33 ± 1.70c |
| Ruminococcus albus SY3 | 48.50 ± 6.86bc | 49.83 ± 3.67a | 11.01 ± 1.13bc | 22.60 ± 2.13de | 7.06 ± 0.66c | 9.50 ± 1.42cd |
| Fibrobacter succinogenes S85 | 47.04 ± 3.92bc | 56.17 ± 3.62a | 11.95 ± 1.10bc | 25.29 ± 1.89de | 6.61 ± 0.64c | 0.00cd |
| Butyrivibrio fibrisolvens 3071 | 53.55 ± 7.70bc | 41.47 ± 4.59a | 10.39 ± 1.18b | 32.84 ± 3.14bcd | 6.52 ± 0.73bc | 8.78 ± 0.54c |
| Prevotella ruminicola 19189 | 42.97 ± 8.34c | 47.29 ± 4.33a | 12.20 ± 1.33c | 24.42 ± 2.76e | 7.75 ± 0.88c | 8.33 ± 0.82d |
| Megasphaera elsdenii T81 | 92.54 ± 5.97a | 23.98 ± 4.19a | 14.31 ± 1.20ab | 22.77 ± 0.55a | 18.94 ± 1.37a | 20.01 ± 1.18ab |
| Hobson’s M2 Medium | 43.86 ± 8.71c | 45.28 ± 2.98a | 11.91 ± 1.37bc | 23.92 ± 1.99de | 7.82 ± 0.72c | 11.07 ± 0.72cd |


Figure 3.7 Bar graphs showing amounts of volatile fatty acids present in samples after growth. Red bars are hyper ammonia producers (HAPs), green bars are no ammonia producers (NAPs) and blue bars are some ammonia producers (SAPs). The single grey bar is the blank medium, which acts as a baseline of VFAs present in the medium. The bar heights are the mean of six samples, with the error bar showing one standard deviation above and below the mean. The dashed lines indicate one standard deviation above and below the growth medium baseline.


3.3.2.5. Fourier Transform Infrared Spectroscopy (FTIR)

FTIR is a non-discriminatory method for identifying metabolite profiles and the types of compounds involved, but it does not resolve individual molecules. Both the supernatant and the cell pellet from mid-log growth phase cultures of HAPs, NAPs and SAPs were analysed using FTIR in order to get a glimpse into the metabolome and to compare metabolic profiles, firstly within and then between groups. For the supernatants, the FTIR results indicated some visible differences between the three bacterial species and the Hobson’s M2 growth medium (used as a blank) in each of the phenotypic groups. The spectra in Figure 3.8A-C revealed that all samples within the same phenotypic group had peaks at similar wavenumbers, showing that similar molecules are present across the samples. There is also little apparent difference in peak pattern in the spectra when comparing across the three phenotypic groups. The principal component analyses (PCA) in Figure 3.8D-F show relatively little separation of the spectra from the supernatants of the three bacterial species within any of the three phenotypic groups. Interestingly, the wavenumbers driving what little separation there is in the y-axis are common to all three phenotypes, with peaks at ~1590 cm-1, as shown by the largest peak in each loading graph in Figure 3.8G-I. These wavenumbers may correspond to bending in amide (NH) groups (Nakanishi, 1962). The principal component analysis across all samples in Figure 3.10 reveals some separation of the three phenotypes from each other and from the Hobson’s M2 medium. However, the variation between the growth medium blanks indicated that there is generally considerable variability in these data. Three principal components were required to describe >99.5 % of the variation in the data, with components one and two describing >98 %. Principal component two offers the most separation of the blank medium samples, as already seen in Figure 3.10, with peaks in the same regions as described above.
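To illustrate how a loading graph points to the wavenumbers that drive separation, the following minimal sketch extracts the first principal component from a small matrix of spectra and reports the wavenumber with the largest absolute loading. The absorbance values are invented, the function name is hypothetical, and power iteration is used here only to keep the example self-contained; it is not a description of the software used for the analyses in this chapter, which also examined the second component.

```python
import math


def first_component(spectra, iterations=200):
    """Return (loadings, explained_fraction) for the first principal component.

    Uses power iteration on the covariance matrix, which is adequate for a
    small illustration.
    """
    n, p = len(spectra), len(spectra[0])
    means = [sum(row[j] for row in spectra) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in spectra]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[a] * sum(cov[a][b] * vec[b] for b in range(p))
                     for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    return vec, eigenvalue / total_variance


# Hypothetical absorbances at five wavenumbers (cm-1) for four samples.
wavenumbers = [983, 1027, 1590, 1656, 2900]
spectra = [
    [0.30, 0.50, 0.20, 0.40, 0.10],
    [0.30, 0.52, 0.80, 0.40, 0.10],
    [0.30, 0.50, 0.30, 0.40, 0.10],
    [0.30, 0.52, 0.90, 0.40, 0.10],
]
loadings, explained = first_component(spectra)
# The wavenumber with the largest absolute loading drives the separation.
driver = wavenumbers[max(range(len(loadings)), key=lambda j: abs(loadings[j]))]
```

In this toy data set almost all of the variance sits at 1590 cm-1, so that wavenumber carries the largest loading, in the same way that the largest peak in a loading graph marks the region responsible for separation along that component.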

For the pellets, on the other hand, FTIR results showed more distinct separation between the phenotypic groups and growth medium. It should be noted that the growth medium yielded little to no pellet during sample preparation, as the broth was generally clarified with dissolved substrates. Therefore, there may be an inherent bias in comparing the samples to the blanks, and interpretations here should be taken with some caution. Whilst the spectra for these samples (Figure 3.9A-C) are generally consistent for each bacterial sample within the groups, there are visible differences in peak patterns in the spectra between the three groups, indicating possible variations of metabolites in the cells. As with the supernatants, there is generally visible separation in the PCA plots of the bacterial samples from those of the growth medium (Figure 3.9D-F). What separation can be seen between each HAP and medium in the PCA plot (Figure 3.9D) is driven by a positive peak at 1656 cm-1 and a negative peak at 983 cm-1, which may correspond to absorbance by amides (Nakanishi, 1962), and starch (980 to 1150 cm-1) or monosaccharides (800 to 1000 cm-1) (Belanche et al., 2013). The separation between each NAP and the medium is driven by a positive peak at 1592 cm-1 and a negative peak at 1027 cm-1, which may correspond to in-plane bending of amine groups (1560 ~ 1640 cm-1) or scissoring of NH3+ (1575 ~ 1600 cm-1) (Nakanishi, 1962), and either starch or monosaccharides. A positive peak from amides at 1654 cm-1 also drives separation between the SAPs and medium, as well as a broader negative peak at 1800 ~ 2100 cm-1, which may correspond to the immonium band and amide salts (Nakanishi, 1962).

To better visualise the separation between each phenotypic group and growth medium, a PCA was carried out alongside a supervised approach using discriminant function analysis (DFA). This revealed that for the supernatant, there was a small amount of separation between the blank medium, the HAPs and the other two phenotypes, whereas the NAPs and SAPs were more difficult to distinguish (Figure 3.10). The first discriminant function separates the phenotypic groups from the blank medium in the X axis, revealing the same loading patterns as the PCA. What is interesting, however, is that the second discriminant function showed separation of the HAPs from the NAPs and SAPs in the Y axis. This is driven primarily by the peaks at wavenumbers ~850 cm-1 and ~3400 cm-1, which could correspond to stretching of carbon-oxygen bonds in monosaccharides and amide stretching, respectively (Nakanishi, 1962; Belanche et al., 2013). Other peaks of interest are at wavenumbers 1350 cm-1, ~1600 cm-1 and 2800 cm-1, which may correspond to C-N stretching, the amide II region, and carboxyl groups, respectively. This could be evidence that amides, in the form of amino acids or peptides, and fatty acids are driving the separation between the phenotypic groups.

The PCA for the pellets, however, showed a more discernible separation of the groups (Figure 3.10), with the DFA showing distinct groupings of the three phenotypes and the culture medium. Separation of the HAPs from the medium samples occurs primarily in the first principal component (X-axis). Peaks at 1542 and 1592 cm-1 likely correspond to amine vibrations (1150 ~ 1650 cm-1), whereas a peak at 1656 cm-1 falls in the amide I band. Separation of HAPs from NAPs and SAPs, however, occurs primarily in the second component (Y-axis), with peaks in similar regions to those above (1658, 1594 and 1542 cm-1), but also a negative peak at 1031 cm-1, which is in the polysaccharide region, possibly due to starch or monosaccharides (Nakanishi, 1962; Belanche et al., 2013; Mayorga et al., 2016). This could indicate that the separation between HAP cells and the cells of the other two groups is driven by carbohydrates, whereas separation of HAP cells from the growth medium is primarily driven by amine groups, possibly because amino acids and peptides are depleted from the growth medium by bacterial cells in the cultures.

Figure 3.8 FTIR analysis on supernatant. Normalised spectra, principal component analysis graphs and corresponding loading graphs for the second component for the supernatants of each of the three ammonia production phenotypes. A-C are the normalised spectra for the HAPs, NAPs and SAPs respectively, where grey bands correspond to polysaccharides (900 ~ 1200 cm-1), amide II (1600 ~ 1640 cm-1), amide I (1650 ~ 1690 cm-1) and fatty acids (2900 – 3100 cm-1) (Belanche et al., 2013; Mayorga et al., 2016; Nakanishi, 1962). Principal components one and two are plotted (D-F) out of the three components needed for each to describe >99.5 % of the variation. Loading graphs for the second principal component are shown below (G-I), with the red line indicating zero.

Figure 3.9 FTIR analysis on the cell pellet. Normalised spectra, principal component analysis graphs and corresponding loading graphs for the second component for each of the three ammonia production phenotypes. The grey bands in A-C correspond to polysaccharides (900 ~ 1200 cm-1), nucleic acids (1250 cm-1), bending in methylene bonded to nitrogen in amine salts (CH2-H+, 1400 ~ 1440 cm-1), amide II (1600 ~ 1640 cm-1), amide I (1650 ~ 1690 cm-1) and fatty acids (2900 – 3100 cm-1) (Belanche et al., 2013; Mayorga et al., 2016; Nakanishi, 1962). Principal components one and two are plotted out of five components needed to explain >99.5 % of the variance (D-F). Given that the greatest separation of the groups is due to the y axis, loadings for principal component two are shown (G-I) with the red line indicating zero.
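Assigning a peak to a family of compounds amounts to looking up its wavenumber in a set of band ranges. The sketch below uses the ranges given in the captions of Figures 3.8 and 3.9; the single nucleic acid position (1250 cm-1) is left out because no range is given for it, and the function name and dictionary layout are illustrative only.

```python
# Band ranges (cm-1) as listed in the captions of Figures 3.8 and 3.9.
BANDS = {
    "polysaccharides": (900, 1200),
    "amine salts": (1400, 1440),
    "amide II": (1600, 1640),
    "amide I": (1650, 1690),
    "fatty acids": (2900, 3100),
}


def assign_bands(wavenumber):
    """Return the names of all listed bands that contain the wavenumber."""
    return [name for name, (low, high) in BANDS.items()
            if low <= wavenumber <= high]


for peak in (1656, 1027, 1592):
    print(peak, assign_bands(peak) or "no listed band")
```

A peak such as 1592 cm-1 falls just outside the listed amide II range, which is one reason the assignments in this section are worded as possibilities rather than identifications.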

Based on what has been established about these ammonia production phenotypes, it was hypothesised that the supernatant analysis would show a separation based on the different compounds that the cells produced or depleted from the growth medium, whereas the cell pellet would indicate the predominant metabolites that the cells transported in and utilised, as well as cell structure. It was expected that the supernatants from HAP cultures would have increased ammonia, reduced amino acids and peptides, and no difference in carbohydrates compared to the Hobson’s M2 growth medium or the NAPs and SAPs, as HAPs do not utilise carbohydrates. On the other hand, NAPs and SAPs would differ from the medium in carbohydrate content, and would likely also show a reduction in amino acids and peptides. Fatty acids would be produced by all bacteria, but given the large amount produced by Eubacterium pyruvativorans Isol6, as shown previously, this might be a driver for separation. Another consideration is the effect of the cell wall on separation. FTIR has been shown to be able to discriminate between Gram positive and Gram negative bacteria by analysing the structures present on the outside of the cells (Novais et al., 2019). However, this does not seem to be a driver of separation here; P. ruminicola, M. elsdenii and F. succinogenes are Gram negative but do not cluster together in the PCA, suggesting that cell wall composition does not strongly influence the spectra. Given that carbohydrates are the main source of energy for NAPs and SAPs, it was expected that the cells would transport these preferentially and break them down, which the metabolites, and therefore the FTIR spectra, might reflect. HAPs, on the other hand, would be transporting more peptides and amino acids, which may also be detectable. This hypothesis was not reflected in the PCA results, as no detectable differences in the wavenumbers that correspond to these compounds were observed between the HAPs, SAPs and NAPs. However, there were trends in the DFA of the supernatant samples that supported this hypothesis, with separation of HAPs from the NAPs, SAPs and growth medium, possibly driven by signals from carbohydrates, peptides, amino acids or ammonia, and volatile fatty acids. The hypothesis for the expected metabolites in pellets separating the groups is also supported by the DFA, with amide groups separating HAPs and the growth medium, and carbohydrates separating HAPs from the NAPs and SAPs.

However, the FTIR method alone is inconclusive, and further examination, such as a deeper metabolomics study, would be required. It is unsurprising that the analyses did not tightly group the different species within a phenotypic group together. As previously established, whilst the overall ammonia production phenotype is the same, this does not imply that these bacteria go about their metabolism in the same way. For example, it is known that Prevotella ruminicola is amylolytic, and will metabolise starch as well as other complex carbohydrates such as xylan (Krause et al., 2003), whereas Megasphaera elsdenii plays a role in lactate metabolism (Weimer and Moen, 2013). Because the medium was kept constant for these species, the FTIR spectra reflect their metabolic preferences. Whilst FTIR is a useful tool that can show the biochemical fingerprint of a whole cell or footprint (pellet) at a given time (Wenning et al., 2014), it also has many limitations and should be used as a gateway to further analysis. It primarily relies on a presence/absence pattern, and as such is used to differentiate or even identify microorganisms by comparing FTIR spectra, a technique applied in clinical, food safety and environmental settings, to name just a few (Wenning and Scherer, 2013). Although FTIR can be used to discern phenotypes, even within species there can be large phenotypic differences that make FTIR unsuitable (Wenning and Scherer, 2013). The quality of spectra can vary, often affected by water vapour and a poor signal to noise ratio (Wenning and Scherer, 2013), as well as by variations in, and availability of, substrates in the growth medium (Wenning et al., 2014). Resolution of individual molecules is also low when using FTIR, as peaks of absorbance often overlap. Instead, only families of molecules, such as proteins, lipids or nucleic acids, can be identified (Petibois and Desbat, 2010).

Figure 3.10 Spectra of all the bacterial samples and blanks, as well as PCA and DFA plots for both the supernatant (left) and pellet (right). Principal components one and two were plotted, out of three needed for supernatant samples and five needed for pellet samples to explain >99.5 % of the variance (C & D). Discriminant function analysis graphs (E & F) show separation in a supervised approach, with circles indicating 95 % level of confidence. Solid coloured, hollow or encircled symbols in the DFA plots indicate whether that data point was used as test, training or validation respectively.


3.4. Conclusion

So far, three distinct phenotypes of ammonia production by rumen bacteria have been identified. Three suitable representative species for each of these groups have been selected and cultured. The phenotypes were confirmed primarily by measuring ammonia production, as well as VFA production, and 16S rRNA gene sequencing was used to confirm that the cultures grown were indeed as expected. A comparison of the metabolomic fingerprints using FTIR revealed some interesting separation based on the chemistries of the groups in both the supernatant and cell pellet. Combined with the ammonia production and VFA patterns, these results offer evidence that the three groups occupy different metabolic niches. This characterisation builds the foundations on which further comparisons can be made, especially those using genomics and transcriptomics, to identify commonalities between these different phenotypic groups.


Chapter 4 4. Analysis of the Genomes of Rumen Bacteria with Known Ammonia Production Phenotypes to Determine A Hyper Ammonia Production Signature

4.1. Introduction

Hyper ammonia production through amino acid deamination is a wasteful process to the ruminant host (Chen and Russell, 1990), but despite this, knowledge of the bacterial population that contributes to this process in the rumen remains limited. It was hypothesised here that, as HAPs share the phenotypic trait of utilising amino acids as the sole carbon and nitrogen source, leading to excessive ammonia production, commonalities that are unique to this phenotype and shared by these species could be identified and combined into a signature. This signature could then be searched for in other bacterial genomes, increasing the knowledge of HAP species not only in the rumen but also in other microbiomes of interest, particularly the human gut. Although previous studies did not identify putative HAP species in human faecal samples, bacterial species were found that were capable of metabolising proteins in a similar way to the HAP species found in the rumen (Richardson et al., 2013). Such previous studies utilised only culture-based methods, isolating bacteria using amino acid and peptide rich media. The discovery of a HAP-phenotype signature from genomic data would allow metagenomic data from a variety of sources to be screened for the presence of novel putative HAP species. Hence, the current study concerns itself with identifying commonalities between HAPs and their genomes, using no ammonia producing bacteria (NAPs) and some ammonia producing bacteria (SAPs) as a comparison.

4.2. Aims and Objectives

• To determine genome-based commonalities between the HAP species using the NAP and SAP species as a comparison.
• To identify whether characteristic genes for Acetoanaerobium sticklandii DSM519 are present in the HAP genomes.
• To use the genomes to investigate four core hypotheses:
    o the HAP trait is inherited
    o the HAP trait is conferred by horizontal gene transfer
    o the HAP trait is conferred by convergent evolution
    o the HAPs are more optimised for amino acid transport
• Ultimately to determine a signature for the HAPs that can be used to determine novel HAP species in the rumen, and eventually other datasets such as human gut.


4.3. Experimental Procedures

First the genome of Acetoanaerobium sticklandii DSM519 was downloaded (Section 2.15) and the characteristic genes were identified in the nine HAP, NAP and SAP genomes. Selenocysteine transfer RNA (tRNAsec) genes were identified using SecMarker (Santesmasses et al., 2017). Subsequent analyses used all of the genomes from the Hungate1000 collection and a summary of the steps taken can be seen in Figure 4.1.

Figure 4.1 Flow chart showing the key steps in this analysis, with relevant methods described in the sections given in italics.


4.4. Results

4.4.1. Characteristic genes in Acetoanaerobium sticklandii DSM519 and HAPs

Acetoanaerobium sticklandii is a well-studied amino acid utiliser. With the availability of the A. sticklandii DSM519 genome and the published set of genes characteristic of this bacterium, it was of interest to determine whether these genes, as well as selenoprotein use, underpin the HAP phenotype.

In order to utilise selenoproteins, the selenocysteine transfer RNA (tRNAsec) is required to recognise the UGA codon, and the tool SecMarker (Santesmasses et al., 2017) can be used to identify whether it is encoded in a genome. As expected, a tRNAsec was identified in A. sticklandii DSM519 with an E-value of 5.4e-17. Similarly, one was identified in A. sticklandii ATCC12662 (from the genome sequenced in this study) with an E-value of 4.9e-17, and in Eubacterium pyruvativorans Isolate6 with an E-value of 2.4e-17. Clostridium aminophilum ATCC49906 did not have one, nor did any of the NAPs and SAPs, apart from Megasphaera elsdenii T81, which had one with an E-value of 1.1e-17. A full set of selenocysteine machinery is present in both A. sticklandii ATCC12662 and E. pyruvativorans Isolate6, as well as in the SAP M. elsdenii T81. C. aminophilum ATCC49906 only has selenophosphate synthase and D-proline reductase proprotein (part of the proline reductase complex). D-proline selenoprotein was only found to be present in M. elsdenii T81. None of the other NAPs or SAPs have any of the selenocysteine insertion machinery or proline reductase proteins. Two characteristic genes as described previously (Fonknechten et al., 2010) were found in all nine species of HAPs, NAPs and SAPs; these belonged to the glycine cleavage system and the cysteine catabolism pathway. There were no pathways that were complete in all of the HAPs alone, or in the HAPs and SAPs alone. Full results can be seen in Appendix Table 10.1.

4.4.2. Phylogeny of Hyper Ammonia Producing Bacteria

The first hypothesis tested whether hyper ammonia production was a phylogenetic trait that had been inherited from a common ancestor. To resolve this, a phylogenetic tree was built using a concatenation of orthologs for 40 universal non-protein coding gene markers identified in the genomes from the Hungate collection, visualised using the Interactive Tree of Life (iTOL; Letunic and Bork, 2007) and rooted on one of the archaea (Euryarchaeota) as an outgroup (Figure 4.2). This revealed that neither the confirmed nor the putative HAPs (those that could not be confirmed as HAPs from cultures in this current study) occupied one clade together, as indicated by the position of the red circles on the tree. The HAP phenotype is therefore not a monophyletic trait, as not all of the confirmed HAP species are closely related, nor are the putative HAP species available in the Hungate1000 collection. All but one of the HAPs belong to the phylum Firmicutes; the exception is one putative species, which belongs to the phylum Fusobacteria. Of the three confirmed and ten putative HAPs in the dataset, six do cluster together in one clade; however, the ammonia production phenotypes of the other bacterial species also present in this clade (Clostridium mangenotii LM2, C. glycolicum KPPR-9, C. sp NCR, Peptostreptococcaceae bacterium VA2, Eubacterium sp. AB3007 and Peptostreptococcaceae bacterium pGA-8) are unknown. Confirmed and putative SAP species are also placed across the tree, showing that ammonia production in general is not limited to one family or even phylum. Because the dataset contains multiple strains of the same species, uncoloured symbols indicate where the literature suggested a particular ammonia production phenotype for one strain or for the species generally, and this was then applied as a putative phenotype to all strains of that species. For those taxa without a symbol, their ammonia producing abilities are unknown or untested.
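The question of whether a set of taxa is monophyletic can be phrased as asking whether some clade of the rooted tree contains exactly those taxa and no others. The following minimal sketch checks this on a toy tree written as nested tuples; the tree, the taxon names and the function names are all hypothetical and do not represent the tree in Figure 4.2.

```python
def leaves(node):
    """Return the set of leaf names under a nested-tuple tree node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def is_monophyletic(tree, taxa):
    """True if some clade in the tree contains exactly the given taxa."""
    taxa = set(taxa)
    if leaves(tree) == taxa:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, taxa) for child in tree)


# Hypothetical rooted tree: an archaeal outgroup, then two bacterial clades.
tree = ("archaeon", (("hap_1", "hap_2"), ("hap_3", ("nap_1", "sap_1"))))

print(is_monophyletic(tree, {"hap_1", "hap_2"}))           # one clade
print(is_monophyletic(tree, {"hap_1", "hap_2", "hap_3"}))  # split across clades
```

In the toy tree the three HAP-labelled taxa do not form a single clade, because the smallest clade containing all of them also contains taxa with other phenotypes; this mirrors the reasoning used above to conclude that the trait is not monophyletic.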


Figure 4.2 Maximum likelihood supermatrix phylogenetic tree. Coloured bands in the outer ring indicate phyla as per the legend, and the coloured shapes in the outer ring indicate HAP, SAP or NAP phenotype, where filled shapes are confirmed phenotypes and empty shapes indicate putative phenotype. Purple triangles indicate the positions of the four sequenced genomes.


4.4.3. Genomic Analysis to Determine Hyper Ammonia Production Gene Signatures

Although the hyper ammonia production phenotype is not monophyletic, HAP bacterial genomes may harbour a particular series of genes that are necessary to confer the hyper ammonia production ability and are therefore indicative of this phenotype. These genes may share sequence similarity through conservation of established or ancient genes or through exogenous gene acquisition such as horizontal gene transfer, or they may encode a particular function but have a different sequence, which may have arisen through convergent evolution. To determine this, gene families were produced from all of the available genomes in the Hungate1000 collection using both sequence similarity, to cluster closely related genes, and a functional annotation approach, in which genes were annotated based on similarities to genes in already established orthologous groups that have assigned functions.
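Once genes are grouped into families, a candidate signature is the set of families present in every HAP genome and absent from every genome of the other phenotypes. A minimal sketch of that set logic is given below; the genome labels, family identifiers and function name are hypothetical, and real analyses would also have to deal with genomes of unknown phenotype.

```python
def signature_families(presence, phenotypes, target="HAP"):
    """Gene families found in every target genome and in no other genome."""
    targets = [g for g, p in phenotypes.items() if p == target]
    others = [g for g, p in phenotypes.items() if p != target]
    shared = set.intersection(*(presence[g] for g in targets))
    elsewhere = set().union(*(presence[g] for g in others))
    return shared - elsewhere


# Hypothetical genomes and gene family identifiers, for illustration only.
phenotypes = {"hap_1": "HAP", "hap_2": "HAP", "nap_1": "NAP", "sap_1": "SAP"}
presence = {
    "hap_1": {"fam_a", "fam_b", "fam_c"},
    "hap_2": {"fam_a", "fam_b", "fam_d"},
    "nap_1": {"fam_b", "fam_e"},
    "sap_1": {"fam_b", "fam_d"},
}
print(sorted(signature_families(presence, phenotypes)))
```

In this toy example only `fam_a` qualifies: `fam_b` is shared by the HAP genomes but also occurs in the comparison genomes, which is why the NAP and SAP genomes are needed as a contrast.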

4.4.3.1. Optimising the Gene Family Creation

Firstly, all the genomes available in the Hungate1000 collection were used to create the gene families with an all-against-all approach to sequence similarity searching (see methods in section 2.17.1). Using all of the genomes, instead of just those of the HAPs, NAPs and SAPs, ensured that gene families were more complete and better populated, avoided phylogenetic biases, and kept the families specific to rumen microbes. It also allowed the species with known ammonia production phenotypes to be placed into context with other rumen microorganisms. However, this could only be achieved by using optimum thresholds for allocating a gene to a gene family, to avoid gene family overpopulation and bias. An optimisation process was carried out using different parameters available in the Evolutionary Genome Network (EGN) programme. The optimal settings were chosen where the number of sequences in the singletons and in the largest component was low, where the number of gene families was neither excessively large (> 100,000) nor excessively small (< 10,000), and where each COG was found in a single gene family, or as close to this as possible. The different settings trialled can be seen in Table 4.1.
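The effect of the similarity thresholds on family formation can be illustrated with a minimal sketch. It is not EGN itself: it assumes that gene families are the connected components of a network whose edges are the hits passing the thresholds, and the `build_families` helper, the hit tuples and the threshold values are all hypothetical.

```python
from collections import defaultdict

def build_families(genes, hits, min_identity=20.0, min_coverage=50.0):
    """Group genes into families: connected components of the graph whose
    edges are hits passing both thresholds (illustrative values only)."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the component representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for query, subject, identity, coverage in hits:
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(query)] = find(subject)

    families = defaultdict(set)
    for g in genes:
        families[find(g)].add(g)
    # Largest families first, ties broken by member names for a stable order.
    return sorted(families.values(), key=lambda f: (-len(f), sorted(f)))

# Invented toy data: (query, subject, % identity, % coverage).
genes = ["a1", "a2", "b1", "b2", "c1"]
hits = [
    ("a1", "a2", 85.0, 95.0),  # passes both threshold sets
    ("a2", "b1", 35.0, 60.0),  # passes only the lax thresholds
    ("b1", "b2", 90.0, 99.0),  # passes both threshold sets
    ("b2", "c1", 15.0, 80.0),  # identity too low for either set
]

lax = build_families(genes, hits)
strict = build_families(genes, hits, min_identity=40.0, min_coverage=80.0)
singletons_lax = [f for f in lax if len(f) == 1]
```

With the lax thresholds the toy genes collapse into one large family and one singleton, whereas the stricter thresholds split them into two pairs and a singleton, which is the trade-off the optimisation had to balance.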

Table 4.1 Review of the optimisation tests run using EGN. The optional parameters available in EGN that are not listed here were left at their defaults. ‘Enforced?’ is an edge filtering option to enforce the hit coverage and best reciprocity parameters set.


|  | % Hit Identity | % Hit Coverage | % Reciprocal | Bitscore | Enforced? | Total Gene Families | Number of Singletons | Number in First Component | COGs |
|---|---|---|---|---|---|---|---|---|---|
| Default | 20.00 | 90.00 | 5.00 | 60 | none | 39520 | 139428 | 782319 | 31 of the COGs are in one family each, 8 COGs are split between two families each and 1 COG is in three families. Some COGs appear in the same family as other COGs. |
|  | 20.00 | 90.00 | 5.00 | 60 | both | 229403 | 277327 | 1329 | Some COGs had genes spread in up to 80 different families. |
|  | 20.00 | 90.00 | 100.00 | 60 | both | 63078 | 205057 | 29439 | 16 COGs were found in one family each, 34 across two families each, and the rest split, some into eight different families. |
|  | 20.00 | 50.00 | 100.00 | 60 | none | 39520 | 139428 | 782319 | 31 of the COGs were found in one family each, 8 across two families each and 1 across three families. |
|  | 20.00 | 50.00 | 100.00 | 20 | both | 44258 | 152832 | 475638 | 27 of the COGs were in one family, then 12 were across two families, and 2 had three families. |
| Cheng et al. (2014) | 40.00 | 80.00 | 100.00 | 60 | both | 96149 | 236378 | 16484 | All but 2 COGs were split across multiple families, varying between two and up to seven families. Some have just a couple of sequences in a different family, others are split entirely across multiple families. |
|  | 40.00 | 50.00 | 100.00 | 60 | both | 92259 | 216010 | 23404 | 9 COGs were found in one family, 20 COGs were found across two families each, 5 across three each, 3 across four each, 2 across five each and 1 across six. |
| Optimal | 20.00 | 50.00 | 100.00 | 60 | both | 46007 | 154997 | 413958 | 27 of the COGs are in one family each, 12 COGs are in two families each, and 2 COGs are in three each. |


4.4.3.2. Sequence Similarity Approach for Genomic Analysis

By comparing the presence, absence and numbers of genes in the different gene families for only the confirmed HAPs, NAPs and SAPs, and then identifying those gene families populated by HAPs alone, it may be possible to find HAP-specific genes. Three gene families had at least one gene present in each of the three HAP bacteria (A. sticklandii ATCC12662, E. pyruvativorans Isol6, C. aminophilum ATCC49906), and no genes present in the SAP bacteria (B. fibrisolvens DSM3071, M. elsdenii T81, P. ruminicola ATCC19189) or the NAP bacteria (F. succinogenes S85, R. albus SY3, R. flavefaciens 007c). These families were populated with genes annotated as glycine betaine transporters and as putative and hypothetical proteins. One gene family was populated by genes from the HAP and SAP bacterial genomes; these genes were annotated as alcohol dehydrogenases. Three gene families were absent from all three HAPs but present in all SAPs and NAPs; these were annotated as murein peptide carboxypeptidases, thiazole synthases and hypothetical proteins. An overview of the number of gene families common to all of the species within a known ammonia-production phenotype group is displayed as a Venn diagram (Figure 4.3).
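The presence and absence comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in this work: the `families_by_pattern` function, the family names and the gene counts are hypothetical, and only the genome groupings are taken from the text.

```python
def families_by_pattern(counts, groups):
    """counts: {family: {genome: gene_count}}; groups: {phenotype: [genomes]}.
    Returns {family: phenotypes in which every genome has at least one gene},
    keeping only families that are absent from every genome of all other phenotypes."""
    result = {}
    for family, per_genome in counts.items():
        present_all, absent_all = set(), set()
        for phenotype, genomes in groups.items():
            present = [per_genome.get(g, 0) > 0 for g in genomes]
            if all(present):
                present_all.add(phenotype)
            elif not any(present):
                absent_all.add(phenotype)
        # Every phenotype must be either fully present or fully absent.
        if present_all and present_all | absent_all == set(groups):
            result[family] = frozenset(present_all)
    return result

groups = {
    "HAP": ["A_sticklandii", "E_pyruvativorans", "C_aminophilum"],
    "SAP": ["B_fibrisolvens", "M_elsdenii", "P_ruminicola"],
    "NAP": ["F_succinogenes", "R_albus", "R_flavefaciens"],
}

def one_each(*phenotypes):
    # Toy helper: one gene in every genome of the named phenotypes.
    return {g: 1 for p in phenotypes for g in groups[p]}

# Invented counts for four hypothetical families.
counts = {
    "family_1": one_each("HAP"),                      # HAP only
    "family_2": one_each("HAP", "SAP"),               # HAP and SAP
    "family_3": one_each("SAP", "NAP"),               # lacking in HAPs
    "family_4": {"A_sticklandii": 2, "R_albus": 1},   # patchy, not reported
}
patterns = families_by_pattern(counts, groups)
```

Families with a patchy distribution, such as `family_4` in the toy data, are excluded because they are neither common to nor absent from a whole phenotype group.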

A principal components analysis (PCA) (Figure 4.4) was carried out on gene counts of these gene families, where the counts were normalised to the fewest number of genes in a genome. The PCA reduced the variation in counts across 46,464 gene families to just nine principal components. Plotting principal components one and two showed a distinct separation of the HAPs (red), together with M. elsdenii T81 (SAP), from the other species. Most of the separation is along the y-axis, driven by PC2. Whilst there is a distinct clustering pattern for the HAPs, this plot visualises only 69.4 % of the variance in the data, which limits the conclusions that can be drawn from this result and indicates that there is complexity within the data.
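The relationship between the number of genomes, the number of principal components and the share of variance shown in a two-component plot can be sketched as below. This is a generic illustration using NumPy on randomly generated counts, not the analysis or software used here; the function name and the toy matrix are assumptions.

```python
import numpy as np

def pca_explained_variance(counts):
    """counts: (n_genomes, n_families) array. Returns the fraction of total
    variance carried by each principal component, from the singular values
    of the column-centred matrix."""
    centred = counts - counts.mean(axis=0)
    singular_values = np.linalg.svd(centred, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

# Invented data: 9 genomes (rows) by 500 gene families (columns).
rng = np.random.default_rng(0)
toy_counts = rng.poisson(lam=2.0, size=(9, 500)).astype(float)

ratios = pca_explained_variance(toy_counts)
two_component_share = ratios[:2].sum()  # share of variance shown in a PC1 vs PC2 plot
```

However many gene families are supplied, the number of components cannot exceed the number of genomes, and after centring the last component carries essentially no variance. The proportion left unexplained by PC1 and PC2 is simply one minus `two_component_share`.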

The lack of substantial evidence of similar genes that are common to HAPs and not SAPs and NAPs suggests that neither conservation nor acquisition of sequential genes in a functional operon alone confers the hyper ammonia producing phenotype. Instead, there might be evidence of convergent evolution when looking at functional properties of genes, where the bacteria have evolved sequentially different genes that encode the same functional protein.

Figure 4.3 Venn diagram showing the counts of gene families shared between all species with known ammonia production phenotypes (total = 46,465). HAPs are in red, SAPs in blue and NAPs in green.

Figure 4.4 Principal component analysis (PCA) plotting variance between counts of gene families for the three ammonia producing phenotypes.

4.4.3.3. Functional Annotation Approach for Genomic Analysis

To assign function to the genes in the nine genomes of interest, the eggNOG mapper was used (Huerta-Cepas et al., 2017). Clusters of Orthologous Genes (COG) codes were assigned on average to 94.02 ± 0.99 % of genes in the HAP species, 92.07 ± 2.55 % of genes in the SAP species, and 85.16 ± 4.45 % of genes in the NAP species. As the specific functions of these COGs are known, a higher level, more general function (such as carbohydrate metabolism or amino acid metabolism) was also assigned to these genes. A full list of functional category identifiers can be seen in the legend in Figure 4.5. Using these categories, genes were grouped based on higher level function and visualised, creating a functional profile that shows the proportion of genes in a genome dedicated to particular functions and that can then be compared between genomes.
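A functional profile of this kind can be sketched as follows. The example assumes that each annotated gene carries a string of one or more single-letter COG category codes and that a gene with several letters contributes an equal share to each; both the splitting rule and the toy annotations are assumptions made for illustration.

```python
from collections import Counter

# A few of the standard COG category letters.
CATEGORY_NAMES = {
    "E": "Amino acid transport and metabolism",
    "G": "Carbohydrate transport and metabolism",
    "C": "Energy production and conversion",
    "S": "Function unknown",
}

def functional_profile(gene_categories):
    """gene_categories: one non-empty string of category letters per annotated gene.
    Returns the percentage of annotated genes assigned to each category."""
    totals = Counter()
    for letters in gene_categories:
        for letter in letters:
            # A gene with several letters is shared equally between them (assumption).
            totals[letter] += 1 / len(letters)
    n_genes = len(gene_categories)
    return {letter: 100 * share / n_genes for letter, share in totals.items()}

# Invented annotations for a ten-gene toy genome.
toy_genome = ["E", "E", "G", "C", "EG", "S", "S", "C", "E", "S"]
profile = functional_profile(toy_genome)
named_profile = {CATEGORY_NAMES[k]: v for k, v in profile.items()}
```

Because every gene contributes a total weight of one, the percentages in a profile sum to 100 and profiles from genomes of different sizes can be compared directly.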

For example, in the genomes of the HAP species A. sticklandii ATCC12662, C. aminophilum ATCC49906 and E. pyruvativorans Isol6, 9.2 %, 7.8 % and 9.2 % of the genes respectively contribute to amino acid transport and metabolism, compared to 6.9 %, 6.0 % and 6.2 % of the COGs in the NAP species F. succinogenes S85, R. albus SY3 and R. flavefaciens 007c respectively, and 7.0 %, 9.2 % and 6.0 % of the COGs in the SAP species B. fibrisolvens DSM3071, M. elsdenii T81 and P. ruminicola ATCC19189 respectively.

The proportion of genes that contribute to energy production and conversion is also higher in the HAP genomes A. sticklandii ATCC12662, C. aminophilum ATCC49906 and E. pyruvativorans Isol6, at 7.5 %, 6.3 % and 8.4 % respectively, compared to 4.6 %, 4.7 % and 5.3 % of the COGs in the NAP species F. succinogenes S85, R. albus SY3 and R. flavefaciens 007c respectively, and 3.6 %, 8.2 % and 5.0 % of the COGs in the SAP species B. fibrisolvens DSM3071, M. elsdenii T81 and P. ruminicola ATCC19189 respectively.

The proportions of COGs involved in carbohydrate transport and metabolism, on the other hand, were lower in the HAP species A. sticklandii ATCC12662, C. aminophilum ATCC49906 and E. pyruvativorans Isol6, at 3.9 %, 5.0 % and 3.5 % respectively, compared to 8.7 %, 7.5 % and 8.1 % of the COGs in the NAP species F. succinogenes S85, R. albus SY3 and R. flavefaciens 007c respectively, and 10.9 %, 4.8 % and 9.7 % of the COGs in the SAP species B. fibrisolvens DSM3071, M. elsdenii T81 and P. ruminicola ATCC19189 respectively.

Interestingly, for these three categories M. elsdenii T81 showed percentages of genes similar to those in the HAPs. Genes involved in amino acid transport and metabolism made up 9.2 % of its genes, which equals the highest percentage among the three HAPs, and carbohydrate transport and metabolism genes formed 4.8 % of the total functional complement for M. elsdenii T81, lower than the 10.9 % and 9.7 % of B. fibrisolvens DSM3071 and P. ruminicola ATCC19189 respectively (Figure 4.5).
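The percentages quoted in the preceding paragraphs can be summarised per phenotype with a short calculation. The values below are copied from the text; the group means, and the SAP means recalculated without M. elsdenii T81, are derived here purely as an illustration and were not reported in the original analysis.

```python
from statistics import mean

# Percentages as quoted in the text. Within each phenotype the order follows
# the order in which the strains are listed (M. elsdenii T81 is the second SAP).
reported = {
    "Amino acid transport and metabolism": {
        "HAP": [9.2, 7.8, 9.2], "NAP": [6.9, 6.0, 6.2], "SAP": [7.0, 9.2, 6.0]},
    "Energy production and conversion": {
        "HAP": [7.5, 6.3, 8.4], "NAP": [4.6, 4.7, 5.3], "SAP": [3.6, 8.2, 5.0]},
    "Carbohydrate transport and metabolism": {
        "HAP": [3.9, 5.0, 3.5], "NAP": [8.7, 7.5, 8.1], "SAP": [10.9, 4.8, 9.7]},
}

group_means = {
    category: {phenotype: round(mean(values), 2) for phenotype, values in by_group.items()}
    for category, by_group in reported.items()
}

# SAP means with M. elsdenii T81 left out, to show how far it pulls the SAP average.
sap_without_m_elsdenii = {
    category: round(mean([by_group["SAP"][0], by_group["SAP"][2]]), 2)
    for category, by_group in reported.items()
}
```

With only three genomes per group these means are descriptive only, but they make the direction of each difference, and the HAP-like values of M. elsdenii T81, easier to see.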

There were four clusters of orthologous genes identified as being unique to the HAPs. These were a) a group of genes for proteins with activity involved in energy production and conversion as well as posttranslational modification, protein turnover and chaperones (COG0384), b) a group belonging to the Betaine/Carnitine/Choline Transporter (BCCT) Family (transporter code 2.A.15) (COG1292), c) a PAS domain, which plays a role in signal transduction (COG2202), and d) a group which contains orthologs to a protein of unknown function (COG5505). Similarly, there were two COGs shared between HAPs and SAPs: one was a group of alcohol dehydrogenase enzymes (COG1063), whilst the second was a group of enzymes with O-methyltransferase activity (COG4122). Both of these orthologous groups play a role in amino acid transport and metabolism (Huerta-Cepas et al., 2019). An overview of the numbers of bacterial orthologous groups that were shared between or unique to the phenotypes can be seen in the Venn diagram (Figure 4.6).

Figure 4.5 Functional profiles using higher level COG functional categories. Background colours to the bars indicate HAPs (red), SAPs (blue) and NAPs (green). Bars are stacked from 0 % to 100 %. Legend, from top to bottom: Function unknown; General function prediction only; Secondary metabolites biosynthesis, transport and catabolism; Inorganic ion transport and metabolism; Lipid transport and metabolism; Coenzyme transport and metabolism; Nucleotide transport and metabolism; Amino acid transport and metabolism; Carbohydrate transport and metabolism; Energy production and conversion; Posttranslational modification, protein turnover, chaperones; Intracellular trafficking, secretion, and vesicular transport; Extracellular structures; Cytoskeleton; Cell motility; Cell wall/membrane/envelope biogenesis; Signal transduction mechanisms; Defense mechanisms; Nuclear structure; Cell cycle control, cell division, chromosome partitioning; Chromatin structure and dynamics; Replication, recombination and repair; Transcription; RNA processing and modification; Translation, ribosomal structure and biogenesis.

Principal component analysis reduced the variation in gene counts amongst 4916 COG gene families to nine principal components, with PC1 and PC2 explaining 59.6 % of the variation. Using these two components, the HAP genomes cluster together, along with M. elsdenii T81, separately from the other SAP and NAP genomes (Figure 4.7), with PC2 driving this separation along the y-axis. However, due to the low amount of variation these principal components explain, further conclusions cannot be made.

Whereas the eggNOG orthologous gene groups (COGs) create gene families across a variety of functions and taxa, the addition of a KEGG ortholog (Kanehisa et al., 2016) annotation allows a gene to be easily mapped to metabolic pathways and possibly expands the annotations applied to these genomes. Fewer genes were annotated with a KEGG ortholog than with a COG, with an average of 16.1 ± 1.0 % of total genes in HAPs, 14.1 ± 2.8 % in SAPs and 12.9 ± 0.6 % in NAPs. There were four KEGG functions unique to the HAPs, where each HAP species possessed at least one gene but the NAP and SAP species possessed none. These four functions are defined by KEGG as a putative spermidine/putrescine transport system ATP-binding protein involved in quorum sensing (K02052), a transposase for replication and repair (K07483), pldB, a lysophospholipase involved in glycerophospholipid metabolism (K01048), and flp, a CRP/FNR family transcriptional regulator and anaerobic regulatory protein (K21562). There were a further three functions for which all of the HAPs and SAPs had at least one representative gene: a spermidine/putrescine transport system permease protein (K11070), an uncharacterised protein with unknown function (K06975), and caffeoyl-CoA O-methyltransferase, which can be involved in monolignol biosynthesis, converting phenylalanine/tyrosine to monolignol (K00588).

Five functions had no representative genes in the HAPs but at least one in each of the NAP and SAP genomes. These were UDP-glucuronate 4-epimerase, involved in amino sugar and nucleotide sugar metabolism (K08679); thiazole tautomerase and thiazole synthase, both functional in thiamine metabolism (K10810, K03149); trimeric autotransporter adhesin (K21449); and acyl-CoA thioester hydrolase, which is involved in various metabolic pathways (K07107). Additionally, the number of genes for endoglucanase (K01179), which is involved in starch and sucrose metabolism, ranged from six to 30 in the SAP and NAP genomes, but from zero to two in the HAP genomes. Interestingly, M. elsdenii T81 also did not have any endoglucanases, despite ostensibly having the SAP phenotype.

Figure 4.6 Venn diagram showing counts of clusters of orthologous genes unique or common to the different ammonia production phenotypes (total = 4916). HAPs are in red, SAPs in blue and NAPs in green.

Figure 4.7 Principal component analysis (PCA) plotting variance between counts of genes belonging to clusters of orthologous genes (COGs).

4.4.3.4. Transporters

In order to metabolise target compounds for energy, be these carbohydrates, amino acids or peptides, the target compound must first be transported into the cell, and the resulting metabolites removed. Looking at the total numbers of amino acid and peptide transporter orthologs present in the genomes of interest, it was observed that there were more orthologs in HAPs than in NAPs (Figure 4.8). There is also a large variation in the numbers of orthologs present in the genomes: C. aminophilum ATCC49906 has more orthologs than the other HAPs (even after subtracting the one for the ammonium transporter) but fewer than the SAP M. elsdenii T81.

The large Dicarboxylate/Amino Acid:Cation Symporter Family of transporters (2.A.23) is common to all HAPs and two of the SAPs but none of the NAPs. This family transports various dicarboxylates (such as succinate, glutamate and aspartate) or amino acids along with hydrogen or sodium into the cell (Saier et al., 2016). The presence of these transporters may be integral to the hyper and some ammonia production phenotypes. In particular, the transporters found in the HAPs and SAPs are orthologs to an archaeal aspartate transporter (2.A.23.1.5), a glutamate/aspartate transporter (2.A.23.2.11), a glutamate/aspartate sodium symporter (2.A.23.2.2), another glutamate/aspartate sodium or hydrogen symporter (2.A.23.1.2), a serine/threonine sodium symporter (2.A.23.4.1) and a dicarboxylate (succinate, fumarate, malate and oxaloacetate) hydrogen symporter (2.A.23.1.6) (Saier et al., 2016), the last of which was unique to B. fibrisolvens DSM3071. Orthologs for transporters in the polar amino acid uptake family (3.A.1.3) and the alanine or glycine:cation symporter family (2.A.25) were found in all species. The two HAPs A. sticklandii ATCC12662 and E. pyruvativorans Isol6 did not possess any orthologs of the ammonium channel, whereas the HAP C. aminophilum ATCC49906 and all NAPs and SAPs did (Figure 4.8).
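Grouping individual transporter orthologs by family, as in the paragraph above, amounts to truncating each transporter classification (TC) number. The sketch below assumes that the first three fields of a TC number identify the family, which matches the 2.A.23 and 2.A.25 examples quoted in the text; the `tc_family` helper is hypothetical and the TC numbers are those listed above.

```python
from collections import defaultdict

def tc_family(tc_number):
    """Return the family-level prefix (first three fields) of a TC number."""
    return ".".join(tc_number.split(".")[:3])

# TC numbers quoted in the text for the 2.A.23 orthologs, plus two
# family-level codes that are also mentioned in this chapter.
quoted = [
    "2.A.23.1.5", "2.A.23.2.11", "2.A.23.2.2",
    "2.A.23.1.2", "2.A.23.4.1", "2.A.23.1.6",
    "2.A.25", "2.A.15",
]

by_family = defaultdict(list)
for tc in quoted:
    by_family[tc_family(tc)].append(tc)

family_sizes = {family: len(members) for family, members in by_family.items()}
```

Counting orthologs per family in this way, genome by genome, gives the kind of totals that are stacked in Figure 4.8.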

Figure 4.8 Stacked graph of amino acid and peptide transporters in the genomes of species with known ammonia production phenotypes. Legend: Amino acid transporters; Permeases; Cation symporters; Exporter; Peptide transporters; Ammonium channel.

4.5. Discussion

Currently, little is known about the genetic and biochemical organisation that confers the HAP phenotype. Here, a bioinformatic analysis was undertaken with the aim of elucidating key factors in known HAP genomes that could be used to differentiate HAPs from SAPs and NAPs among bacteria with an unknown ammonia production phenotype. Increasing the number of known HAP species would help develop the understanding of the roles these species play within a community, not only in the rumen but also in other microbiomes.

4.5.1. Characteristic Acetoanaerobium sticklandii DSM519 Genes are Not Characteristic of HAPs

A. sticklandii DSM519 is well characterised, and the enzymes and pathways involved in its amino acid utilisation are well understood. If these genes and pathways were also present in the other HAPs and absent from the NAPs (possibly shared with the SAPs), they could be characteristic of all HAPs. Instead, it was found that many of the genes characteristic of A. sticklandii DSM519 were present in the HAP, NAP and SAP species studied here. Furthermore, there were no pathways unique to the HAPs, indicating that those characteristic of the one well-studied species do not apply to all HAPs. However, some pathways may be completed in the HAP, NAP and SAP species studied here with homologs to those enzymes present in the model genome, something that sequence-based pattern-matching will fail to capture. If this is the case, other HAPs may also prove to be interesting reservoirs of novel biochemical reactions and enzymes applicable for biotechnological use, much like A. sticklandii (Sangavai and Chellapandi, 2017).

4.5.2. Hyper Ammonia Production Is Not A Monophyletic Trait

The first hypothesis was that the hyper ammonia producing bacterial species were closely related. Resolving this phylogenetic relationship could indicate the pattern of evolution these species have undergone and give an understanding of how the HAP phenotype of obligate amino acid fermentation and hyper ammonia production is likely to have arisen. Had the known HAP species formed one monophyletic clade on the tree, it could be assumed that this phenotype evolved from one common ancestor, and as such all members of this clade would likely possess the same or highly similar genomic information encoding this phenotype. An example of this is endospore formation, an ability of certain Gram-positive rods, where evidence has shown that evolution of this trait occurred once, with conservation of endosporulation genes that showed monophyly, which also allowed for the discovery of new endosporulating bacteria (Abecasis et al., 2013). However, this was not the case for the species examined here. Much like the lactic acid bacteria, the HAPs instead form a biological group rather than a taxonomic one (Makarova and Koonin, 2007): putative and known HAP species are placed in different clades across the tree, something that has been noted previously for the amino acid fermenting Clostridia (de Vladar, 2012). This indicates instead that the HAP phenotype has either arisen multiple times through convergent evolution or that key genes were transferred through horizontal gene transfer. There is, however, one clade that is more heavily populated with HAPs than the others, and it would be of interest in future work to determine whether the other members of that clade are also HAPs, as current literature is inconclusive. Whilst Clostridium mangenotii was studied amongst other Clostridia for the ability to grow on media containing only amino acids as both an energy and carbon source (Elsden et al., 1976), there were no comments on ammonia production. However, C. mangenotii does reduce proline to form 5-aminovaleric acid, a commonality shared with C. sticklandii (Elsden and Hilton, 1979), indicating that it may share the HAP phenotype. Literature on C. glycolicum reports similar abilities of growth with yeast extract as sole carbon and nitrogen source, as well as fermentation of some carbohydrates. Acetate, isovalerate, hydrogen and carbon dioxide are produced during fermentation of peptone and casamino acids, but not ammonia, and uracil is degraded to ammonia and carbon dioxide (Chamkha et al., 2001). Tests for ammonia production during growth on a nitrogen source without a carbohydrate carbon source would show whether these other closely related bacterial species are also HAPs, which would create a more densely populated clade of HAPs.

A predominant limitation in studying this hypothesis, as well as the following hypotheses, is the small sample size; without knowing the ammonia production phenotype of many of these organisms, the conclusions that can be drawn remain limited. To better trace the evolution of the HAPs, a larger number of representative taxa is therefore required. Currently, there are only three confirmed HAP strains and a further eight putative HAP species in the dataset, which limits the predictions and conclusions that can be made. Including data from other published datasets, such as HAPs isolated from Nellore steers (Bento et al., 2015), or those from ruminants in New Zealand (Attwood et al., 1998) and Scotland (Eschenlauer et al., 2002), would create a more balanced view of HAPs. However, the only data available for these HAPs at the time of this study were the 16S sequences. Although a tree using the 16S gene as a marker was tested using the data from the Hungate1000 dataset and the sequences from the published studies, the variability in sequence quality and in the hypervariable regions of the 16S gene that were sequenced created a misleading and unsuitable tree. Furthermore, the use of the 16S gene alone as a marker for taxonomic study has been shown not to be best practice. Taking multiple markers, such as the 40 universal genes, gives a better representation, a notion that was conceived as early as Darwin and published in his book On the Origin of Species (Darwin, 1859), and has since been discussed further (de Queiroz and Gatesy, 2007).

4.5.3. The Sequence Similarity Approach Did Not Reveal A Unique HAP Genomic Signature

Because there was no phylogenetic signature, it was hypothesised that the genomes of the hyper ammonia producing bacterial species shared genes with high sequence similarity that conferred this phenotype, which may have arisen through horizontal gene transfer, as suggested previously (de Vladar, 2012). By making gene families based on sequence similarity across all available rumen genomes, a large number of families were created. A family unique to the three confirmed HAPs was composed of genes for glycine betaine transporters. These transport compounds with quaternary ammonium groups [R-N+(CH3)3] (Ziegler et al., 2010), which tend to be amino acid derivatives, suggesting that HAPs possibly do not rely on amino acids alone but can also take up amino acid derivatives. Alternatively, the glycine betaine transporter has also been shown to act as an osmoregulator and sensor, transporting protective compounds into the cell to prevent osmotic stress (Ziegler et al., 2010). This suggests that the HAP bacterial cells may be exposed to osmotic stress, which may be linked to the reliance on sodium for amino acid transport (Chen and Russell, 1990). The gene family with genes common to HAPs and SAPs contained a variety of dehydrogenases with different substrates, including sorbitol, L-threonine, isopropanol and alcohol, as well as zinc-type alcohol dehydrogenase-like proteins. Alcohol dehydrogenases are zinc-containing enzymes that catalyse the oxidation of alcohols to their respective aldehyde or ketone, releasing protons in the process (Crichton, 2012). However, these gene families were not populated by just these HAPs and SAPs, but also by other species in the Hungate collection. As the ammonia production phenotype of the other species present is unknown, it is not possible to conclude that these are gene families that define the ability to produce ammonia. With so few gene families that were unique to all HAPs, or shared between HAPs and SAPs but not with NAPs, there is little evidence to support the hypothesis that horizontal gene transfer confers the ammonia production phenotype.

4.5.4. Functional Approaches Did Not Highlight Unique HAP Pathways

The use of functional annotations of genes removes the reliance on creating novel gene families from the rumen microbial genomes, and also removes the need to identify parameters and thresholds for sequence similarity between genes that are suitable for this dataset. Using clusters of orthologous genes (COGs) instead relies on applying the function of a known protein to those shown to be orthologous to it (Galperin et al., 2015). The eggNOG-mapper functionally annotates predicted genes and assigns orthology (Huerta-Cepas et al., 2017, 2019) by using sequence similarity between input sequences and sequences of genes already annotated and characterised. Assigning a function to a gene, especially a higher level, more generic function (such as carbohydrate metabolism), allowed the convergence hypothesis to be studied, where the HAP phenotype is conferred by the presence of genes that perform certain functions but do not necessarily share sequence similarity, as they arose separately. A higher percentage of the genes in the three HAPs than in the three NAPs played a role in amino acid metabolism and transport and, interestingly, in energy production and conversion. However, with such a small sample size it is difficult to determine whether this result is representative. Breakdown of amino acids is less efficient than that of other energy sources; for example, fermentation of four moles of glycine produces three moles of ATP, the highest ATP production of amino acid fermentation, whereas the fermentation of two moles of glucose produces five moles of ATP (de Vladar, 2012). It is therefore important that energy production is as efficient as possible, and that energy is conserved. Yet utilising amino acids as the sole energy source is enough to sustain bacterial life, as shown not only by the HAPs in this study but particularly by species of the genus Clostridium (de Vladar, 2012).
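The difference in efficiency is easier to see when the yields quoted above are expressed per mole of substrate. The short calculation below uses only the figures given in the text (de Vladar, 2012); the variable names are illustrative.

```python
from fractions import Fraction

# Yields quoted in the text: moles of ATP produced per moles of substrate fermented.
glycine = Fraction(3, 4)  # 3 mol ATP from 4 mol glycine
glucose = Fraction(5, 2)  # 5 mol ATP from 2 mol glucose

atp_per_mol_glycine = float(glycine)
atp_per_mol_glucose = float(glucose)

# How many times more ATP one mole of glucose yields than one mole of glycine.
fold_difference = glucose / glycine
```

On these figures a mole of glucose yields 2.5 mol ATP against 0.75 mol for glycine, a ratio of 10/3, so roughly three times as much glycine must be fermented for the same energy return.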

The three functions found to be unique to HAPs and the one common to HAPs and SAPs but not present in NAPs did resemble those identified using the sequence similarity approach. This not only confirms these commonalities, but also adds more functional information; for example, this isomerase in particular is known to play a role in energy production and conversion, as well as posttranslational modifications and chaperone proteins. In addition to these functions, a PAS domain function was unique to the HAPs, which plays a role in cell signalling in response to certain stimuli, including oxygen concentrations (Taylor and Zhulin, 1999). The two functions common to both HAPs and SAPs are both associated with amino acid transport and metabolism, specifically alcohol dehydrogenases and methyltransferases. The alcohol dehydrogenases play a role in the conversion of some aldehydes to their respective alcohols, a reaction that utilises extra electrons produced during the fermentation of some amino acids, particularly threonine, leucine, isoleucine and valine. The methyltransferase, which is used to produce monolignols from amino acids, was also highlighted using the KO groups. Using the KO groups, additional functions were found to be unique to HAPs and SAPs, such as spermidine/putrescine transporters, which allow cellular transport of polyamines. Although they are common to many eukaryotic and prokaryotic cells, polyamines do arise from the decarboxylation of amino acids, mostly ornithine, arginine or lysine (Potter and Paton, 2014). Exporters for these polyamines could suggest a mechanism for removing excess by-products of amino acid fermentation. Alternatively, if these transporters are specifically uptake-only transporters, there is a possibility that these polyamines could undergo a process similar to amino acid fermentation and also be used for energy, but this hypothesis currently lacks evidence.

4.5.5. Hyper Ammonia Producing Bacteria Have More Amino Acid Transporter Orthologs

Delving further into transporters related specifically to amino acids and peptides, it was apparent that there were more orthologs of these transporters in these HAPs and SAPs than in the NAPs. Logically, to ferment amino acids, a cell must first transport these nitrogenous substrates into the cell, be these amino acids or peptides (or possibly other derivatives, as suggested previously), and to do this requires a dedicated channel or transporter. Since HAPs require amino acids for energy, it was hypothesised that HAPs may have more genes for, or a wider variety of, transporters or channels than NAPs, which is reflected in these results; as SAPs have the ability to utilise amino acids and peptides for energy, they would likely follow this trend. There was one transporter family that was unique to the HAPs and SAPs, comprising orthologs to a variety of glutamate and aspartate transporters, suggesting that glutamate and aspartate are likely to be important for the ammonia production phenotype. Interestingly, A. sticklandii has been shown to export glutamate and aspartate, along with alanine (Fonknechten et al., 2010), but it is unclear whether all HAPs do this.

There are not only more but also a wider variety of cation symporter orthologs present in these HAP and SAP genomes than in the NAPs, which correlates with the dependency on sodium for amino acid uptake and growth noted in the putative HAP Peptostreptococcus anaerobius C (Chen and Russell, 1989b) and the confirmed HAP C. aminophilum F (Chen and Russell, 1990). Monensin is a cation antiporter, and incubating cultures of these two HAP species with monensin inhibited growth, something which was also noted for the ruminal A. sticklandii isolate (Paster et al., 1993) and E. pyruvativorans Isol6 (Wallace et al., 2003). As monensin decreased ammonia production when added to the rumen environment (Yang and Russell, 1993), this suggests that these cation transporters are necessary for HAP activity. Yet cation symporters are not unique to amino acid fermenters, as orthologs of these types of transporters are also found in NAPs, as well as in other prokaryotes and even eukaryotes. Naturally, amino acid uptake is necessary in some capacity in all living cells, as amino acids are the building blocks of proteins and a valuable source of carbon and nitrogen (Burkovski and Krämer, 2002). Therefore, the presence of some amino acid transporters in all cells should be expected.

The absence of ammonia transporters in two of the HAPs is also noteworthy. Ammonia is a preferred nitrogen source for many bacteria (Wacker et al., 2014) and needs active transport when required in high concentration. Although uncharged ammonia (NH3) can passively diffuse into cells (Kleiner, 1985), in solution it is protonated to form the charged ammonium cation (NH4+); at neutral pH, NH3 is found in equilibrium with the charged NH4+, with only around 1 % remaining uncharged (Andrade and Einsle, 2007). When ammonia is at low concentrations in the environment, the passive diffusion of uncharged ammonia cannot satisfy cellular requirements, therefore requiring specialised channels or active transport to allow transport of the charged ammonium cation (Zheng et al., 2004; Andrade and Einsle, 2007). Ammonia is known to have cytotoxic effects on cells when it accumulates in high concentrations intracellularly, but one study showed that this is not the case for Escherichia coli, Bacillus subtilis and Corynebacterium glutamicum, and went on to conclude that this resistance likely extends to other species, including the proteolytic Clostridia (Müller et al., 2006). It may be that other species rely on ammonium channels to import ammonium on a demand basis to avoid high intracellular concentrations and ammonia toxicity, whereas HAPs do not need to import ammonia as it is available in the cell from deamination. Excess uncharged ammonia (NH3) would likely then passively diffuse out of the HAP bacterial cells and avoid possible toxic effects.
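The small uncharged fraction at neutral pH follows from the acid-base equilibrium between NH4+ and NH3. The sketch below applies the Henderson-Hasselbalch relation with an assumed textbook pKa of 9.25 for NH4+ at 25 °C; this value is not taken from the sources cited above and will differ somewhat at rumen temperature.

```python
def uncharged_fraction(ph, pka=9.25):
    """Fraction of total ammonia present as uncharged NH3 at a given pH,
    from the Henderson-Hasselbalch relation. The default pKa is an assumed
    textbook value for NH4+ at 25 degrees C."""
    return 1.0 / (1.0 + 10 ** (pka - ph))

fraction_ph7 = uncharged_fraction(7.0)  # neutral pH
fraction_ph6 = uncharged_fraction(6.0)  # a more acidic environment
```

Under this assumption a little over half a percent of the total is uncharged at pH 7, in line with the figure of around 1 % quoted above, and the fraction falls roughly tenfold for each unit drop in pH.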

4.5.6. Limitations, Implications and Further Work

Although the sequence similarity gene family creation method had its advantages, there were some limitations, especially regarding the reliability of gene family formation. Despite the largest component in the gene family creation being split further using a community clustering approach, some other components contained a large number and a seemingly diverse mix of different genes, when considering the gene annotation provided by Prokka. This is due to the similarity thresholds: when set too low, they would be too lax and cluster more genes into fewer families, possibly based on motifs such as transmembrane regions; when set too stringently, there would be more families with fewer species in each family, which would not accurately represent shared genes. To assess this, the distribution of genes belonging to predefined COG families was examined, which indicated that family production was for the most part acceptable, grouping orthologs for these COGs into one family. One advantage this method holds over using predefined COG groups alone is the inclusion of all genes in all genomes available, as well as increased specificity for rumen microbial genomes. This relates to one of the main limitations of the functional approach: its reliance on finding representative sequences similar to ruminal sequences in the eggNOG and the transporter databases. Although these orthologs emanate from a variety of species and sources, using suitable thresholds to identify true homology can be problematic (Pearson, 2013).

Furthermore, genomic data such as these can only show a cell’s potential, showing gene presence or absence and not expression or post-translational activity. It may be that only a few genes are encoded but are upregulated and overexpressed when required, and therefore looking at the number of gene copies present does not reflect true functional capabilities. Expressed genes may produce proteins that are held in an inactive state and only activated when required. Alternatively, some genes may be annotated as functional but may not be, and are therefore never utilised by the cell. To better understand what functions and transporters are used by HAPs during the process of amino acid fermentation, transcriptomic studies are required.

The hypothesis that the genomes of HAPs contain common genomic signatures that are unique for this phenotype, possibly shared with SAP species to some extent, but not present in the NAPs, was not adequately supported by this evidence. Instead, from these results it is hypothesised that the processes HAPs use are common to bacteria for amino acid manipulation and biosynthesis. The overall observable HAP phenotype is distinct from the NAP phenotype, implying that some differences must be present. If HAPs are utilising amino acids for energy, they must therefore be using common pathways and enzymes more, meaning that the transcription of these enzymes is likely to be higher. Although enzymes often have multiple functions, and can sometimes catalyse reactions bidirectionally, it is possible to build a representation of the pathways HAPs may take from the genomes alone, as shown in genome-models for Acetoanaerobium sticklandii DSM519 (Sangavai and Chellapandi, 2017), but confirming this with transcriptional data or further experiments would be valuable.

4.6. Conclusions

Characteristic genes for Acetoanaerobium sticklandii DSM519 were found not to be common characteristics for the HAPs, nor was there evidence that selenoproteins are important for this phenotype. Hyper ammonia production is not a monophyletic trait, nor is there evidence here to support that this phenotype has arisen in phylogenetically distant species through horizontal gene transfer, nor convergent evolution of the same functions. Furthermore, there is no genomic signature that is unique to these HAPs compared to the NAPs and SAPs. Instead, the HAPs may share the same genes as the NAPs and SAPs, which may have been shrouded in these comparisons. It is these common genes involved in amino acid metabolism and synthesis that may be utilised more frequently and efficiently by the HAPs, something that may be reflected in the transcriptomes.


5. Analysis of the Transcriptomes of Rumen Bacteria with Known Ammonia Production Phenotypes to Determine A Hyper Ammonia Production Signature

5.1. Introduction

As discussed previously, hyper ammonia producing bacteria play an important role in nitrogen metabolism in the rumen. A genomic approach did not reveal a HAP signature, instead affirming that the HAP species achieve the same phenotype of high levels of ammonia production from amino acid and peptide degradation using pathways and enzymes that are common to amino acid metabolism and synthesis in many species. Therefore, to achieve this phenotype, the HAP species must instead be more efficient at utilising these pathways, utilising more enzymes and possibly expressing them more than the NAP species, something that would be reflected in the transcriptomes. During literature searches, no publications were found on the transcriptomes for the HAPs, offering the potential for a transcriptome-mediated HAP phenotype to be explored in this current work.

5.2. Aims and Objectives

• To sequence the transcriptomes of the three HAP species and three NAP species.
• To compare these transcriptomes to determine differences between HAPs and NAPs that could indicate genes involved in amino acid fermentation and hyper ammonia production.

5.3. Experimental Procedures

The transcriptomes of the three HAPs (Clostridium aminophilum ATCC49906, Acetoanaerobium sticklandii ATCC12662 and Eubacterium pyruvativorans Isol6) and three NAPs (Fibrobacter succinogenes S85, Ruminococcus albus SY3, R. flavefaciens 007c) were sequenced. A summary of the steps taken can be seen in Figure 5.1.

Firstly, five replicate cultures for each of the HAPs and NAPs were grown in Hungate tubes in the same Hobson’s M2 medium (Section 2.2), with each replicate inoculated from the same initial culture (Section 2.3). The exceptions were A. sticklandii ATCC12662 and R. flavefaciens 007c, which required the pooling of cells from two cultures for each RNA sample to obtain enough cellular material for extraction, resulting in nine cultures grown for each of these species. These cultures were then incubated for a predetermined time (as determined by growth curves, as described in Section 2.5) to achieve approximately mid-log growth phase, before harvesting cells by centrifugation for RNA and DNA extraction. The incubation times were 16 hours for A. sticklandii ATCC12662, 12 hours for both C. aminophilum ATCC49906 and E. pyruvativorans Isol6, seven hours for R. flavefaciens 007c, four hours for R. albus SY3 and 11 hours for F. succinogenes S85. Four cultures underwent RNA extraction and sample preparation, which was carried out as described in Section 2.10. The cells from the remaining culture for each species were frozen and used for 16S analysis (Section 2.8).

Once the sequencing data were received, read quality was assessed and reads were trimmed as described in Section 2.20.1. Reads were mapped to genes and counted (Section 2.20.2) and these data were used for expression analysis. This was done firstly by identifying the most highly expressed genes on average across the four replicates for each species. Then the data were analysed using read counts for both clusters of orthologous genes (COGs) and KEGG orthologs (KOs), as described in Section 2.20.3. A transcriptome functional profile was created by summing all reads aligned to genes that belonged to each functional category identifier, and expressing each sum as a percentage of the total reads. Counts for genes identified as amino acid, peptide or ammonia transporters were summed to allow inferences to be made about the transcription and expression of transporters. Sequence similarity approaches using EGN-created gene families were not used for transcriptome analysis because of the low reliability of the gene family creation. It was thought that the effects of poor gene family creation would be amplified when read counts were involved, leading to incorrect and misleading results.
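The functional profile calculation described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in Section 2.20.3: the function name, the dictionary-based inputs and the label used for genes without an annotation are all assumptions, and a multi-letter identifier such as "GM" is simply treated here as its own category.

```python
from collections import defaultdict

# Hypothetical label for genes with no functional category assigned.
UNASSIGNED = "No COG function available"


def functional_profile(gene_counts: dict[str, float],
                       gene_category: dict[str, str]) -> dict[str, float]:
    """Sum read counts per functional category and express each as a percentage.

    gene_counts maps gene ID -> read count; gene_category maps gene ID ->
    functional category identifier (for example "E" or "C").
    """
    total = sum(gene_counts.values())
    if total == 0:
        raise ValueError("no reads were counted")
    summed: dict[str, float] = defaultdict(float)
    for gene, count in gene_counts.items():
        summed[gene_category.get(gene, UNASSIGNED)] += count
    return {category: 100.0 * n / total for category, n in summed.items()}


if __name__ == "__main__":
    # Toy data: three genes, one of which has no annotation.
    counts = {"gene1": 50, "gene2": 30, "gene3": 20}
    categories = {"gene1": "E", "gene2": "C"}
    for category, pct in functional_profile(counts, categories).items():
        print(f"{category}: {pct:.1f} %")
```

Expressing each category as a share of the total makes profiles comparable between species with different sequencing depths, although it does not account for gene length.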

Flow diagram steps (the section of Chapter 2 describing each step is given in parentheses):

• Cultures inoculated (Sections 2.2, 2.3).
• Cultures incubated for a predetermined time to reach roughly mid-log growth phase (Section 2.5).
• Cells pelleted from one culture and frozen; cell pellets used for 16S analysis to confirm cultures (Sections 2.6, 2.8).
• Cells pelleted, RNA extracted and samples prepared (Section 2.10); sent to the Genomics Core Technology Unit at Queen’s University, Belfast.
• Raw reads received, quality controlled and trimmed (Section 2.20.1).
• Reads aligned to genes and counted (Section 2.20.2).
• Counts analysed by COGs, KOs, and amino acid, peptide and ammonia transporters (Section 2.20.3).
• Functional category identifiers used to build the transcriptome functional profile; statistical analysis.

Figure 5.1 Flow diagram of the steps taken for RNA extraction and transcriptomic analysis. Italics show the section in chapter 2 to refer to for more details.

5.4. Results

5.4.1. Transcriptomic Data Quality and Evaluation

With a total of 516,578,677 reads of 76 bp in length, 39.3 Gb of data was generated for this transcriptomics dataset, which included four replicates for each of six species.

The overall quality of the reads was excellent, with almost all bases on average showing a Phred score of over 30, as can be seen in Figure 5.2. The total number of reads for each sample and the number of reads retained after quality trimming can be seen in Table 5.1. There was a high number of duplicated reads present in all samples, with an average of 88.18 % of reads showing duplication. On average, 4.03 % of reads were removed by trimming, and of those remaining, 31.06 % aligned to protein coding genes that were not rRNA or ribosomal protein sequences.
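The summary statistics quoted here follow from simple arithmetic on the read counts. The sketch below recomputes the total sequencing yield and the percentage of reads removed by trimming for two samples, using read counts taken from Table 5.1. The function name and variable names are illustrative assumptions, not part of the analysis pipeline.

```python
def percent_removed(raw_reads: int, retained_reads: int) -> float:
    """Percentage of reads discarded by quality trimming."""
    return 100.0 * (raw_reads - retained_reads) / raw_reads


# Figures quoted in the text.
TOTAL_READS = 516_578_677
READ_LENGTH_BP = 76
N_SAMPLES = 24  # four replicates for each of six species

if __name__ == "__main__":
    # Total yield in gigabases and mean reads per sample.
    print(f"Yield: {TOTAL_READS * READ_LENGTH_BP / 1e9:.1f} Gb")
    print(f"Mean reads per sample: {TOTAL_READS / N_SAMPLES:,.0f}")

    # Two samples from Table 5.1: (reads before trimming, reads after trimming).
    samples = {
        "R. flavefaciens 007c sample": (20_129_894, 19_800_467),
        "C. aminophilum ATCC49906 sample": (20_096_428, 14_700_541),
    }
    for name, (raw, retained) in samples.items():
        print(f"{name}: {percent_removed(raw, retained):.2f} % removed")
```

The second sample shows why the per-sample values are worth inspecting: one C. aminophilum ATCC49906 replicate lost far more reads to trimming than the 4.03 % average suggests.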

5.4.2. Genes with Highest Expression

Expression analysis was carried out to determine which genes were expressed during the exponential growth phase of HAP and NAP bacterial cultures in a whole and well-supplemented rumen fluid medium. Assuming that the number of reads aligning to a gene correlates with its expression, on average across the four replicates the most highly expressed genes were a Putative_endo-beta-N-acetylglucosaminidase in C. aminophilum ATCC49906, and different hypothetical proteins in both the HAPs A. sticklandii ATCC12662 and E. pyruvativorans Isol6, as well as in the NAP R. albus SY3. The most highly expressed gene was a fimbrial protein in R. flavefaciens 007c and an alpha-agarase in F. succinogenes S85. A total of 18 genes in the genome of E. pyruvativorans Isol6, 14 in A. sticklandii ATCC12662 and one in C. aminophilum ATCC49906 had no reads aligned in any of the replicates. Eight genes had no reads in R. flavefaciens 007c and six in R. albus SY3, whereas all genes in F. succinogenes S85 had reads aligned (see Appendix Table 10.3). Manually inspecting the most highly expressed genes does not reveal any obvious patterns; for example, the most highly expressed genes in the HAPs are not directly related to amino acid or peptide transport or metabolism, nor are those highest in the NAPs obviously involved in carbohydrate or fibre degradation. This could reflect the growth medium used; offering a more substantial source of amino acids, peptides or carbohydrates might result in the expected gene expression profile. Instead, a more in-depth analysis is necessary to bring to light the underlying patterns of expression.

Table 5.1 Information and statistics pertaining to the quality control of the transcriptomic data.

| Species | Replicate | Number of reads | Number of reads after trimming | Percentage (%) of reads removed | Percentage (%) of retained reads aligning to coding regions (as reported by FeatureCounts) |
|---|---|---|---|---|---|
| A. sticklandii ATCC12662 | 1 | 26,333,518 | 25,506,921 | 3.14 | 29.71 |
| A. sticklandii ATCC12662 | 2 | 19,946,188 | 19,228,499 | 3.6 | 34.57 |
| A. sticklandii ATCC12662 | 3 | 22,916,892 | 21,439,585 | 6.45 | 33.85 |
| A. sticklandii ATCC12662 | 4 | 23,755,207 | 22,682,024 | 4.52 | 34.12 |
| C. aminophilum ATCC49906 | 1 | 17,150,102 | 16,142,405 | 5.88 | 24.39 |
| C. aminophilum ATCC49906 | 2 | 16,271,250 | 15,653,485 | 3.8 | 24.74 |
| C. aminophilum ATCC49906 | 3 | 18,138,770 | 17,741,124 | 2.19 | 25.05 |
| C. aminophilum ATCC49906 | 4 | 20,096,428 | 14,700,541 | 26.85 | 29.48 |
| E. pyruvativorans Isol6 | 1 | 19,041,759 | 18,159,348 | 4.63 | 23.56 |
| E. pyruvativorans Isol6 | 2 | 25,034,845 | 22,801,850 | 8.92 | 25.97 |
| E. pyruvativorans Isol6 | 3 | 19,792,084 | 18,655,653 | 5.74 | 24.97 |
| E. pyruvativorans Isol6 | 4 | 25,372,732 | 24,307,970 | 4.2 | 23.68 |
| F. succinogenes S85 | 1 | 28,944,057 | 28,802,773 | 0.49 | 20.18 |
| F. succinogenes S85 | 2 | 22,565,532 | 22,458,082 | 0.48 | 20.89 |
| F. succinogenes S85 | 3 | 19,724,993 | 19,504,529 | 1.12 | 25.11 |
| F. succinogenes S85 | 4 | 21,420,358 | 21,191,847 | 1.07 | 21.94 |
| R. albus SY3 | 1 | 27,321,811 | 27,076,897 | 0.9 | 47.09 |
| R. albus SY3 | 2 | 21,551,740 | 21,301,504 | 1.16 | 46.19 |
| R. albus SY3 | 3 | 20,263,625 | 20,167,672 | 0.47 | 43.55 |
| R. albus SY3 | 4 | 20,108,794 | 20,065,692 | 0.21 | 45.61 |
| R. flavefaciens 007c | 1 | 20,129,894 | 19,800,467 | 1.64 | 33.06 |
| R. flavefaciens 007c | 2 | 20,741,639 | 19,457,619 | 6.19 | 37.06 |
| R. flavefaciens 007c | 3 | 18,968,644 | 18,774,212 | 1.03 | 35.69 |
| R. flavefaciens 007c | 4 | 20,987,815 | 20,573,172 | 1.98 | 34.93 |
| Average | | 21,524,112 | 20,674,745 | 4.03 | 31.06 |


Figure 5.2 Mean quality Phred scores per base in reads from all samples. Made with MultiQC (Ewels et al., 2016).

5.4.3. Proportion of Gene Expression Contributing to Functions

Using the higher-level COG functional identifiers allowed examination of the proportion of gene expression for the various overall functions, creating a functional profile for the transcriptome. Overall, the functional profiles for the transcriptomes varied between all species (Figure 5.3). Genes important for energy production and conversion on average showed significantly higher expression in the HAPs than in the NAPs (P<0.001, identifier C in Figure 5.4). Similarly, genes involved in amino acid metabolism and transport formed on average 10.4 ± 4.2 % of the expression in the HAPs, which was significantly greater than the 4.6 ± 1.4 % in the NAPs (P=0.0011, identifier E in Figure 5.4). On the other hand, carbohydrate metabolism and transport were significantly more highly expressed in the NAPs than the HAPs (P<0.0056, identifier G in Figure 5.4). Looking at the individual species, genes known to play a role in energy production and conversion and in amino acid transport and metabolism were the most transcribed genes for the HAP A. sticklandii ATCC12662, forming 15.7 % and 14.8 % of the reads respectively (Figure 5.3). Cell wall, membrane and envelope biogenesis genes were the most transcribed group for E. pyruvativorans Isol6, forming 21.7 % of the reads, followed by energy production and conversion at 19.8 % and amino acid transport and metabolism at 11.3 %. Energy production and conversion formed 10 % of the reads in C. aminophilum ATCC49906, but amino acid transport and metabolism formed only 5.1 % of reads, which is lower than the proportion of reads aligned to genes with this function in both the NAPs F. succinogenes S85 (5.3 %) and R. flavefaciens 007c (5.5 %), as is apparent in the box and whisker plot (identifier E, Figure 5.4).

Figure 5.3 legend (y-axis from 0 % to 100 %). Functional categories, in legend order: No COG function available; Function unknown; General function prediction only; Secondary metabolites biosynthesis, transport and catabolism; Inorganic ion transport and metabolism; Lipid transport and metabolism; Coenzyme transport and metabolism; Nucleotide transport and metabolism; Amino acid transport and metabolism; Carbohydrate transport and metabolism; Energy production and conversion; Posttranslational modification, protein turnover, chaperones; Intracellular trafficking, secretion, and vesicular transport; Extracellular structures; Cytoskeleton; Cell motility; Cell wall/membrane/envelope biogenesis; Signal transduction mechanisms; Defense mechanisms; Nuclear structure; Cell cycle control, cell division, chromosome partitioning; Chromatin structure and dynamics; Replication, recombination and repair; Transcription; RNA processing and modification; Translation, ribosomal structure and biogenesis.

Figure 5.3 Stacked graph for the expression of genes belonging to different functional categories, expressed as a percentage. Each functional category is coloured according to the legend on the right. Normalised expression counts were used, with the counts for all genes in a functional category summed and expressed as a percentage of all expression counts. The three species to the left of the dotted line are HAPs; those to the right are NAPs.


Figure 5.4 Tukey style box and whisker plots of percentage of reads aligning to coding sequences with a particular function. The boxplots summarise the data within each phenotype, showing the median, first and third quartiles in the box, with whiskers extending up to 1.5 times the interquartile range beyond the first and third quartiles. The points are the replicates for each species, coloured according to the legend on the right. Black points are outliers. The functional identifiers are J - Translation, ribosomal structure and biogenesis; K - Transcription; L - Replication, recombination and repair; D - Cell cycle control, cell division, chromosome partitioning; V - Defence mechanisms; T - Signal transduction mechanisms; M - Cell wall/membrane/envelope biogenesis; N - Cell motility; U - Intracellular trafficking, secretion, and vesicular transport; O - Posttranslational modification, protein turnover, chaperones; C - Energy production and conversion; G - Carbohydrate transport and metabolism; E - Amino acid transport and metabolism; F - Nucleotide transport and metabolism; H - Coenzyme transport and metabolism; I - Lipid transport and metabolism; P - Inorganic ion transport and metabolism; Q - Secondary metabolites biosynthesis, transport and catabolism; S - Function unknown.

5.4.4. Functional Annotation Approach for Transcriptomic Analysis

Out of 588 orthologous gene families common to all HAP and NAP species, there were 269 gene families with a significant difference in expression (adjusted P value < 0.1 in DESeq2) between the HAP species and the NAP species, indicated by the absence of a shared letter (denoting significance) between a HAP and a NAP (Appendix Table 10.4). Boxplots were created for all 269 gene families and manually assessed to determine those families with an observable and apparent difference in expression, where for each gene family the lowest expression in any replicate of the phenotype with the greater median did not fall below the highest expression in the other phenotype (Appendix Figure 10.2). Manual curation of these 269 gene families revealed 47 families of note (Figure 5.5).
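The separation criterion applied during manual assessment of the boxplots can be written as a simple check. The following is a sketch of that rule only, under the assumption that it is applied to normalised counts pooled across the replicates of each phenotype; the function name is hypothetical, and the assessment in this work was carried out by eye rather than with this code.

```python
from statistics import median


def clearly_separated(hap_counts: list[float], nap_counts: list[float]) -> bool:
    """Return True when the two phenotypes do not overlap in expression.

    The phenotype with the greater median must have its lowest replicate
    value at or above the highest replicate value of the other phenotype.
    """
    if median(hap_counts) >= median(nap_counts):
        higher, lower = hap_counts, nap_counts
    else:
        higher, lower = nap_counts, hap_counts
    return min(higher) >= max(lower)


if __name__ == "__main__":
    # Toy normalised counts for one gene family.
    print(clearly_separated([950.0, 1200.0, 3000.0], [100.0, 240.0, 610.0]))  # no overlap
    print(clearly_separated([950.0, 1200.0, 3000.0], [100.0, 240.0, 990.0]))  # overlap
```

A rule of this kind is deliberately conservative: a family can differ significantly in DESeq2 and still be excluded if a single replicate from one phenotype overlaps the range of the other.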

Of these noteworthy gene families, 26 were more highly expressed in the HAPs and 21 in the NAPs. A summary of these gene families can be seen in Table 5.2; they are visualised as boxplots in Figure 5.5 and plotted onto metabolism pathway maps in Appendix Figure 10.4 and Appendix Figure 10.5. Of all the gene families that showed higher expression in the HAPs, only two were involved in amino acid metabolism and transport; specifically, a family of glutamate, leucine, phenylalanine and valine dehydrogenases (COG0334) and amino acid carrier proteins (COG1115). Three gene families that showed higher expression in the NAPs were related to amino acid metabolism and transport: a family of tryptophan synthases (COG0159), prephenate dehydrogenases (COG0287) and enzymes that catalyse the transfer of a phosphoribosyl group (COG0547). A family of rubredoxin genes (COG1773), which are involved in energy production and conversion, was significantly more highly expressed in the HAPs. Five of the 26 gene families (COG0282, COG0634, COG0503, COG1947 and COG0756) that were more highly expressed in the HAPs play a role in nucleotide transport and metabolism, whereas only one of the gene families (COG0040) with this role was more highly expressed in the NAPs. Two orthologous gene families involved in carbohydrate metabolism and transport were significantly more highly expressed in the NAPs (COG0406, COG0662), compared to one in the HAPs (COG0491), which is also involved in cell wall and membrane biosynthesis.

The six orthologous gene families that were unique to the HAPs (including the two also common to SAPs) and not present in the NAPs were not amongst the most highly expressed genes for those genomes, nor were the proportion of reads that mapped to these genes at a comparable expression level. The transcripts per million statistic normalises data for gene length and sequencing depth and can be seen for these gene families in Table 5.3.
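For reference, the transcripts per million calculation can be sketched as below. This is a generic illustration of the statistic rather than the code used to produce Table 5.3; the function name and the dictionary inputs (raw read counts and gene lengths in base pairs) are assumptions.

```python
def transcripts_per_million(counts: dict[str, float],
                            lengths_bp: dict[str, int]) -> dict[str, float]:
    """Convert raw read counts to TPM.

    Step 1: divide each count by gene length in kilobases (length normalisation).
    Step 2: scale the resulting rates so that they sum to one million
            (sequencing depth normalisation).
    """
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    scale = sum(rates.values())
    if scale == 0:
        raise ValueError("no reads were counted")
    return {gene: rate / scale * 1_000_000 for gene, rate in rates.items()}


if __name__ == "__main__":
    # Two toy genes with equal read counts but different lengths.
    toy_counts = {"geneA": 100, "geneB": 100}
    toy_lengths = {"geneA": 1000, "geneB": 2000}
    for gene, value in transcripts_per_million(toy_counts, toy_lengths).items():
        print(f"{gene}: {value:,.0f} TPM")
```

Because TPM values within a sample always sum to one million, they describe the relative share of transcription within that sample, which is why they suit the within-species comparison made here.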


Table 5.2 Functional descriptions and identifiers for the orthologous functional groups found to show significantly different expression in HAPs and NAPs. Those families in red are more highly expressed in the HAPs, and those in green are more highly expressed in the NAPs. The functional identifiers are C - Energy production and conversion; E - Amino acid transport and metabolism; F - Nucleotide transport and metabolism; D - Cell cycle control, cell division, chromosome partitioning; K - Transcription; O - Posttranslational modification, protein turnover, chaperones; H - Coenzyme transport and metabolism; T - Signal transduction mechanisms; J - Translation, ribosomal structure and biogenesis; U - Intracellular trafficking, secretion, and vesicular transport; L - Replication, recombination and repair; I - Lipid transport and metabolism; P - Inorganic ion transport and metabolism; M - Cell wall/membrane/envelope biogenesis; G - Carbohydrate transport and metabolism; N - Cell motility. Information for descriptions obtained from the eggNOG 5.0 database (http://eggnog5.embl.de/download/eggnog_5.0/per_tax_level/2/2_annotations.tsv.gz, accessed 22/01/2020).

Gene families more highly expressed in the HAPs (shown in red in the original table):

| Group | Functional description | Functional identifier |
|---|---|---|
| COG0334 | Belongs to the Glu Leu Phe Val dehydrogenases family | E |
| COG1115 | Amino acid carrier protein | E |
| COG0282 | Acetate kinase activity: catalyses the formation of acetyl phosphate from acetate and ATP. Can also catalyse the reverse reaction | F |
| COG0733 | Neurotransmitter:sodium symporter activity | D |
| COG0568 | Sigma factors are initiation factors that promote the attachment of RNA polymerase to specific initiation sites and are then released | K |
| COG0459 | Protein refolding | O |
| COG1191 | Sigma factor activity | K |
| COG1057 | Catalyses the reversible adenylation of nicotinate mononucleotide (NaMN) to nicotinic acid adenine dinucleotide (NaAD) | H |
| COG0542 | Response to heat | O |
| COG2172 | Sigma factor antagonist activity | T |
| COG0215 | Cysteine-tRNA activity | J |
| COG0251 | Oxidation-reduction process | J |
| COG1773 | Rubredoxin | C |
| COG0491 | Thiolesterase that catalyses the hydrolysis of S-D-lactoyl-glutathione to form glutathione and D-lactic acid | GM |
| COG0749 | In addition to polymerase activity, this DNA polymerase exhibits 5'-3' exonuclease activity | L |
| COG0258 | In addition to polymerase activity, this DNA polymerase exhibits 5'-3' exonuclease activity | L |
| COG0503 | Catalyses a salvage reaction resulting in the formation of AMP, that is energically less costly than de novo synthesis | F |
| COG0634 | Belongs to the purine pyrimidine phosphoribosyltransferase family | F |
| COG0234 | Binds to Cpn60 in the presence of Mg-ATP and suppresses the ATPase activity of the latter | O |
| COG1947 | Catalyses the phosphorylation of the position 2 hydroxy group of 4-diphosphocytidyl-2C-methyl-D-erythritol | F |
| COG2003 | Belongs to the UPF0758 family | L |
| COG1502 | Catalyses the reversible phosphatidyl group transfer from one phosphatidylglycerol molecule to another to form cardiolipin (CL) (diphosphatidylglycerol) and glycerol | I |
| COG0846 | NAD-dependent lysine deacetylase and desuccinylase that specifically removes acetyl and succinyl groups on target proteins. Modulates the activities of several proteins which are inactive in their acylated form | K |
| COG0756 | dUTP diphosphatase activity | F |
| COG0428 | Transporter | P |
| COG0816 | Could be a nuclease involved in processing of the 5'-end of pre-16S rRNA | L |

Gene families more highly expressed in the NAPs (shown in green in the original table):

| Group | Functional description | Functional identifier |
|---|---|---|
| COG0054 | 6,7-dimethyl-8-ribityllumazine synthase activity | H |
| COG0040 | Catalyses the condensation of ATP and 5-phosphoribose 1-diphosphate to form N'-(5'-phosphoribosyl)-ATP (PR-ATP). Has a crucial role in the pathway because the rate of histidine biosynthesis seems to be controlled primarily by regulation of HisG enzymatic activity | F |
| COG0287 | Prephenate dehydrogenase | E |
| COG0159 | Tryptophan synthase activity | E |
| COG1281 | Redox regulated molecular chaperone. Protects both thermally unfolding and oxidatively damaged proteins from irreversible aggregation. Plays an important role in the bacterial defense system toward oxidative stress | O |
| COG0157 | Belongs to the NadC ModD family | H |
| COG0547 | Catalyses the transfer of the phosphoribosyl group of 5-phosphorylribose-1-pyrophosphate (PRPP) to anthranilate to yield N-(5'-phosphoribosyl)-anthranilate (PRA) | E |
| COG0029 | Catalyses the oxidation of L-aspartate to iminoaspartate | H |
| COG1589 | Essential cell division protein. May link together the upstream cell division proteins, which are predominantly cytoplasmic, with the downstream cell division proteins, which are predominantly periplasmic. May control correct divisome assembly | D |
| COG0379 | Catalyses the condensation of iminoaspartate with dihydroxyacetone phosphate to form quinolinate | H |
| COG2264 | Protein methyltransferase activity | J |
| COG1354 | Participates in chromosomal partition during cell division. May act via the formation of a condensin-like complex containing Smc and ScpB that pull DNA away from mid-cell into both cell halves | D |
| COG0758 | DNA mediated transformation | LU |
| COG0769 | Catalyses the addition of meso-diaminopimelic acid to the nucleotide precursor UDP-N-acetylmuramoyl-L-alanyl-D-glutamate (UMAG) in the biosynthesis of bacterial cell-wall peptidoglycan | M |
| COG0406 | Alpha-ribazole phosphatase activity | G |
| COG2804 | Type II secretory pathway, ATPase PulE; Tfp pilus assembly pathway, ATPase PilB | NU |
| COG0662 | Cupin 2, conserved barrel domain protein | G |
| COG4972 | Pilus assembly protein | NU |
| COG0515 | Serine threonine protein kinase | KLT |
| COG0438 | Transferase activity, transferring glycosyl groups | M |
| COG2205 | Protein histidine kinase activity | T |


Figure 5.5 Tukey style box and whisker plots of significantly different expression of eggNOG functional orthologous groups. The boxplots summarise data within each phenotype, showing the median, first and third quartiles in the box, with whiskers extending up to 1.5 times the interquartile range beyond the first and third quartiles. The points are the replicates for each species, coloured according to the legend on the right. Data used for the boxplots and the points are DESeq2 normalised counts. (A) shows only boxplots for COG gene families that are significantly increased in HAPs, whereas (B) shows all but two gene families that are higher in NAPs. Species are indicated in the legend: 12662 - A. sticklandii ATCC12662; 49906 - C. aminophilum ATCC49906; Isol6 - E. pyruvativorans Isol6; S85 - F. succinogenes S85; 007c - R. flavefaciens 007c; SY3 - R. albus SY3.


Table 5.3 Average transcripts per million (TPM) counts, with standard deviation, across the four replicates for each of the three HAP species.

| COG | Function | A. sticklandii ATCC12662 | C. aminophilum ATCC49906 | E. pyruvativorans Isol6 |
|---|---|---|---|---|
| COG0384 | Energy production and conversion; Posttranslational modification, protein turnover, chaperones | 88.79 ± 10.47 | 190.28 ± 35.52 | 420.99 ± 103.21 |
| COG1292 | Cell wall/membrane functions | 2.51 ± 0.18 | 33.65 ± 8.72 | 322.6 ± 334.6 |
| COG2202 | Signal transduction | 27.4 ± 3.91 | 446.91 ± 42.25 | 36.62 ± 3.86 |
| COG5505 | Unknown function | 5.2 ± 1.29 | 262.03 ± 49.28 | 221.47 ± 30.37 |
| COG1063 | Amino acid transport and metabolism | 88.57 ± 16.42 | 9.39 ± 2.76 | 64.56 ± 12.27 |
| COG4122 | Amino acid transport and metabolism | 68.86 ± 12.43 | 99.35 ± 14.73 | 178.5 ± 19.15 |

The KEGG orthologs revealed more about the enzymes and pathways common to HAPs or NAPs. The KEGG orthologs with the greatest number of reads on average in A. sticklandii ATCC12662 are two groups of transport ATP-binding proteins, CydC and CydDC (K16012, K16014), which are cysteine exporters; on average across the four replicates, 12.41 % of reads aligned to genes with this function. For C. aminophilum F, 12.84 % of the reads aligned to orthologs that are arabinogalactan endo-1,4-beta-galactosidases (K01224) and endo-alpha-N-acetylgalactosaminidase (K17624), both of which are glycosylases not assigned to a defined pathway. In E. pyruvativorans Isol6, 2.60 % of the reads aligned to orthologs of acetyl-CoA C-acetyltransferase (K00626) and 2.09 % to glutamate dehydrogenase (K00262), which both play a role in amino acid metabolism amongst other pathways, and 2.05 % to pyruvate-ferredoxin/flavodoxin oxidoreductase (K03737), which is involved in pyruvate oxidation.

For the NAP species, the KEGG ortholog groups to which the highest percentage of reads aligned on average were as follows. In F. succinogenes S85, 1.85 % of the reads aligned to an elongation factor Tu (K02458), as well as to naphthyl-2-methylsuccinyl-CoA dehydrogenase (K15771), which is involved in naphthalene degradation. A further 1.73 % of reads aligned to an endoglucanase (K01170), which is a cellulase that plays a role in starch and sucrose metabolism, as well as cellulose degradation. In R. flavefaciens 007c, 6.33 % of the reads aligned to a general secretion pathway protein G (K02456), and 5.91 % to type IV pilus assembly protein PilA (K02650), which are both involved in secretion and signalling. In R. albus SY3, 3.93 % of reads aligned to small subunit ribosomal protein S1 (K02956), and 3.63 % to peptidoglycan DL-endopeptidase CwlO (K21471), an endopeptidase that acts on peptide-peptide bonds during the peptidoglycan biosynthesis and degradation processes.

From a total of 3025 unique KEGG ortholog groups (KOs) present in at least one of the six genomes, 572 were common to all three HAP species and all three NAP species. Of these 572, there were 33 KO groups where the HAPs showed significantly higher expression than the NAPs, and 23 where NAPs showed greater expression than the HAPs (Padj < 0.1, DESeq2). These KOs are listed in Table 5.5, visualised as boxplots in Figure 5.6 and plotted onto metabolism pathway maps in Appendix Figure 10.4. The boxplots indicate that for some of the KEGG orthologs, despite a significant increase in expression in the HAPs compared to the NAPs, there remains a large variation in expression between the HAP species; for example, in K03737, K00262, K00812 and K04069. There were three KEGG ortholog families involved in amino acid metabolism with higher expression in the HAPs than in the NAPs: glutamate dehydrogenase (K00262), aspartate aminotransferase (K00812) and alanine racemase (K01775). There was also increased expression of the enzyme acetate kinase (K00925), which is involved in propanoate metabolism, and of six transporters: a zinc transport system permease protein and zinc transporter (K09816, K07238), energy-coupling factor transport system substrate-specific component (K16924), alanine or glycine:cation symporter (K03310), a neurotransmitter:Na+ symporter (K03308), and polysaccharide transporter (K03328).

On the other hand, of the 22 KEGG orthologs that had a significantly higher expression in the NAPs, 17 were enzymes involved in a variety of pathways. These included six unidirectional enzymes involved in the synthesis of amino acids; specifically, cyclohexadienyl/prephenate dehydrogenase (K00220) and prephenate dehydrogenase (K04517) in tyrosine biosynthesis, anthranilate phosphoribosyltransferase (K00766), tryptophan synthase (K01695) and anthranilate synthase/phosphoribosyltransferase (K13497) in tryptophan biosynthesis, and phosphoribosyl-AMP cyclohydrolase / phosphoribosyl-ATP pyrophosphohydrolase (K11755) in histidine biosynthesis. Additionally, one of the KEGG orthologs with significantly increased expression is L-aspartate oxidase (K00278), which catalyses the bidirectional reaction of aspartate, water and oxygen to produce oxaloacetate, ammonia and hydrogen peroxide. This same enzyme also catalyses the reaction of aspartate and oxygen to produce iminoaspartate and hydrogen peroxide, a pathway in nicotinate and nicotinamide metabolism. Quinolinate synthase (K03517) and nicotinate-nucleotide pyrophosphorylase (K00767), which are the next two enzymes in the nicotinate and nicotinamide metabolism pathway, are also significantly more highly expressed in the NAPs. A glucokinase (K00845), which plays a role in carbohydrate metabolism, specifically galactose metabolism, starch and sugar metabolism, and glycolysis and gluconeogenesis, is also expressed significantly more in the NAPs than the HAPs. Of those KOs that were unique to just HAP species, or to HAPs and SAPs but not NAPs, none are particularly highly expressed in these transcriptomes (Table 5.4). Only one of the seven has a comparable level of expression (expressed as transcripts per million, TPM) across the three HAP species, which is lysophospholipase (K01048), involved in lipid metabolism.

Table 5.4 Transcripts per million (TPM) averages across four replicates for each species, with standard deviation, for the seven KOs that were unique to HAPs, or common to HAPs and SAPs but not NAPs. Descriptions as per the KEGG database (Kanehisa et al., 2016; www.genome.jp/kegg/; accessed 11/02/2020).

KO | Description | A. sticklandii ATCC 12662 | C. aminophilum ATCC49906 | E. pyruvativorans Isol6
K02052 | Putative spermidine/putrescine transport system ATP-binding protein | 400.3 ± 21.9 | 20.2 ± 4.3 | 68.14 ± 7.41
K07483 | Transposase | 81.73 ± 6.99 | 63.32 ± 8.13 | 360.35 ± 65.07
K01048 | Lysophospholipase [EC:3.1.1.5] | 148.82 ± 11.45 | 142.65 ± 30.61 | 128.32 ± 19.43
K21562 | CRP/FNR family transcriptional regulator, anaerobic regulatory protein | 1898.37 ± 121.56 | 212.44 ± 17.72 | 68.56 ± 11.48
K11070 | Spermidine/putrescine transport system permease protein | 220.4 ± 43.76 | 15.62 ± 4.22 | 166.05 ± 18.48
K06975 | Uncharacterised protein | 43.47 ± 13.45 | 1373.72 ± 329.07 | 396.97 ± 58.62
K00588 | Caffeoyl-CoA O-methyltransferase [EC:2.1.1.104] | 70.79 ± 13.06 | 109.74 ± 16.25 | 187.63 ± 19.38

Table 5.5 KEGG orthologous gene groups that showed significantly different expression in HAPs and NAPs. Numbers denote the mean and standard deviation of DESeq2 normalised count data, and the letters show where the significant differences lie; different letters indicate a significant difference between those species. Gene families highlighted in red are those where the median is greater in HAPs than in NAPs, whereas those highlighted in green are greater in NAPs than in HAPs.

KO | A. sticklandii ATCC 12662 | C. aminophilum ATCC49906 | E. pyruvativorans Isol6 | F. succinogenes S85 | R. flavefaciens 007c | R. albus SY3
K09816 | 802.4 ± 92.1 a | 310.1 ± 50.5 b | 550.8 ± 55.8 c | 218 ± 17.9 d | 181.1 ± 18.7 f | 109.7 ± 14.1 e
K07238 | 1624.8 ± 137.4 a | 227.9 ± 14.5 b | 452.4 ± 52.5 c | 125.5 ± 7.3 d | 44.8 ± 5.1 f | 10.1 ± 3.2 e
K01520 | 747.8 ± 99.8 e | 428.9 ± 53.3 a | 816.5 ± 20.3 e | 248.7 ± 26.3 b | 212.3 ± 11.3 d | 161 ± 7.2 c
K12410 | 890.7 ± 98.7 c | 800.4 ± 68.2 c | 1220.7 ± 225 a | 246.4 ± 26.9 d | 278.2 ± 20.1 d | 202.5 ± 28.7 b
K03328 | 678.9 ± 64.7 d | 771.6 ± 34.8 d | 717 ± 131.5 d | 180 ± 30 a | 38.2 ± 19.1 c | 15.5 ± 3.9 b
K06183 | 828.4 ± 92.1 a | 5806.3 ± 301 b | 1060 ± 63.1 c | 393.6 ± 30.4 d | 299.1 ± 36.4 f | 90 ± 8.6 e
K01515 | 610.2 ± 79.2 a | 1539 ± 147.2 b | 1093.5 ± 122.9 c | 136.5 ± 18.3 d | 441.1 ± 15.4 f | 96.9 ± 20.5 e
K06131 | 1110.7 ± 112.9 a | 1344.1 ± 31.6 e | 1314.9 ± 185 e | 72.4 ± 9 b | 357.1 ± 23 d | 563.8 ± 41.6 c
K03630 | 1586.5 ± 120.9 a | 964.1 ± 92.8 e | 1015.5 ± 80.5 e | 338.9 ± 58.9 b | 74.1 ± 14.7 d | 15.3 ± 2.8 c
K16924 | 651.8 ± 28.3 a | 1657 ± 116.7 b | 2026.4 ± 87 c | 345 ± 16.3 d | 556.6 ± 39.2 f | 478.9 ± 21.8 e
K00919 | 651.8 ± 28.3 a | 1657 ± 116.7 b | 2026.4 ± 87 c | 345 ± 16.3 d | 544.8 ± 41.1 f | 471.1 ± 21.3 e
K00759 | 2025.2 ± 110.8 c | 2049.5 ± 257 c | 3057.4 ± 124.2 a | 307.7 ± 20.3 b | 848.3 ± 89.8 d | 891.3 ± 61.1 d
K04078 | 453.4 ± 91.9 a | 1475 ± 119.8 b | 3135.1 ± 823.5 c | 191.7 ± 71 e | 268.4 ± 40.4 d | 169.5 ± 27.7 e
K00760 | 4444.8 ± 930.8 a | 1346.4 ± 101.1 e | 1354.4 ± 124.1 e | 136.4 ± 14.5 b | 111.6 ± 11.7 d | 42.3 ± 9.8 c
K01972 | 2284.1 ± 314.6 a | 4849.6 ± 99.5 b | 1431.4 ± 124.5 c | 911.4 ± 44.3 e | 621.4 ± 34.4 d | 937.6 ± 38.9 e
K03574 | 6117.9 ± 540.9 a | 5154.5 ± 163.6 e | 5230.9 ± 306.8 e | 647.9 ± 33.2 b | 3280.3 ± 115.7 d | 2678.1 ± 158.8 c
K01883 | 3760.4 ± 192 e | 4580.1 ± 106.8 a | 3800.6 ± 357.8 e | 1927.7 ± 243 b | 1199.1 ± 45.3 d | 614 ± 25.4 c
K04485 | 4354.7 ± 629.4 a | 3343.7 ± 384.6 b | 1079.5 ± 110.4 c | 478 ± 69.6 e | 784.8 ± 46 d | 490.4 ± 45.1 e
K03694 | 2934.9 ± 451.5 a | 5276.9 ± 578.3 c | 5712.6 ± 375.4 c | 1082.3 ± 243.4 d | 1697.6 ± 213.2 b | 1125.6 ± 92.8 d
K03695 | 2934.9 ± 451.5 a | 5276.9 ± 578.3 b | 6731.5 ± 638 c | 1082.3 ± 243.4 e | 1697.6 ± 213.2 d | 1125.6 ± 92.8 e
K00969 | 5677.6 ± 1053.8 a | 4318 ± 223.8 b | 6734.6 ± 275.6 c | 1133.4 ± 63.4 d | 2823.7 ± 314.9 f | 829.4 ± 105.1 e
K01775 | 6754.6 ± 917.2 a | 1574 ± 149.2 b | 27975.6 ± 1966 c | 1143.9 ± 121.6 e | 1113.7 ± 124.9 e | 715.3 ± 41.9 d
K04077 | 5868.4 ± 1334.3 a | 7556.6 ± 460.7 b | 23491.2 ± 4356.3 c | 1300.4 ± 310.1 d | 2013 ± 300.7 f | 544.4 ± 98.3 e
K10773 | 2143.2 ± 162.7 a | 6645.4 ± 943.2 b | 8720.5 ± 1223.3 c | 310.2 ± 24.8 e | 941.2 ± 125.3 d | 281.4 ± 20.9 e
K03087 | 5809.1 ± 711.5 a | 8431.3 ± 459.9 b | 10192.9 ± 980 c | 729.3 ± 99.4 d | 1859.2 ± 243.9 f | 1068 ± 156.9 e
K03308 | 9525.5 ± 1597.2 e | 10957.5 ± 250.9 e | 7999.2 ± 980.6 a | 1302.2 ± 71.4 b | 332.5 ± 76.2 d | 2720.2 ± 279.5 c
K00925 | 15247.6 ± 1106.3 a | 5551.9 ± 553.5 b | 10140.4 ± 974.1 c | 1696.3 ± 218.6 d | 2912.5 ± 92.3 f | 386 ± 112.3 e
K03086 | 5809.1 ± 711.5 a | 10324.2 ± 619.4 e | 10946.6 ± 1047.4 e | 3763.4 ± 217.9 b | 1859.2 ± 243.9 d | 1068 ± 156.9 c
K03310 | 8138.9 ± 1993.5 a | 10063.1 ± 1513.5 b | 13591.2 ± 1600.4 c | 3010.9 ± 123.4 d | 391.7 ± 56.8 e | 334.1 ± 39 e
K04069 | 19576.8 ± 1838.9 a | 10209.1 ± 1772.2 b | 1693 ± 155.1 c | 295 ± 12.7 e | 303.3 ± 26.8 e | 253.2 ± 8.8 d
K00812 | 45227.4 ± 8337.7 a | 1672.7 ± 174.9 b | 36200 ± 2417.3 c | 500.4 ± 28.7 d | 927.3 ± 65.2 f | 213.2 ± 24.2 e
K00262 | 84501.6 ± 12623.1 a | 23200.9 ± 1428.6 b | 131741.8 ± 14372.2 c | 11167 ± 1839.4 d | 3187.3 ± 518.5 f | 1423.1 ± 180.2 e
K03737 | 111111.7 ± 12207 e | 35805.1 ± 2522.9 a | 128503.1 ± 7866.6 e | 11238 ± 1033 b | 20819.4 ± 4477.3 d | 5823 ± 563 c

Table 5.5 (continued). KEGG orthologs: K12132, K02652, K00845, K03110, K00655, K03147, K00278, K11755, K05896, K02687, K03517, K07102, K04096, K03589, K00766, K13497, K00767, K08744, K04517, K01695, K00995, K00220, K00210. Species: A. sticklandii ATCC 12662, C. aminophilum ATCC49906, E. pyruvativorans Isol6, F. succinogenes S85, R. flavefaciens 007c, R. albus SY3. [Cell values for this part of the table are not recoverable from the extracted text.]

Table 5.6 List of the descriptions from KEGG for the KEGG orthologs (KOs) found to have a significant increase in expression in HAPs compared to NAPs. Enzyme Commission (EC) number given if ortholog is an enzyme.

KO | Description from KEGG | EC number
K03737 | por; pyruvate-ferredoxin/flavodoxin oxidoreductase | EC:1.2.7.1, 1.2.7.-
K00925 | ackA; acetate kinase | EC:2.7.2.1
K00262 | glutamate dehydrogenase (NADP+) | EC:1.4.1.4
K06131 | clsA_B; cardiolipin synthase A/B | EC:2.7.8.-
K01515 | nudF; ADP-ribose pyrophosphatase | EC:3.6.1.13
K00759 | APRT; adenine phosphoribosyltransferase | EC:2.4.2.7
K00760 | hprT; hypoxanthine phosphoribosyltransferase | EC:2.4.2.8
K01520 | dut; dUTP pyrophosphatase | EC:3.6.1.23
K00812 | aspB; aspartate aminotransferase | EC:2.6.1.1
K01775 | alr; alanine racemase | EC:5.1.1.1
K00969 | nadD; nicotinate-nucleotide adenylyltransferase | EC:2.7.7.18
K00919 | ispE; 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase | EC:2.7.1.148
K01883 | CARS; cysteinyl-tRNA synthetase | EC:6.1.1.16
K04077 | groEL; chaperonin GroEL |
K01972 | DNA ligase (NAD+) | EC:6.5.1.2
K10773 | NTH; endonuclease III | EC:4.2.99.18
K09816 | znuB; zinc transport system permease protein |
K03087 | rpoS; RNA polymerase nonessential primary-like sigma factor |
K03695 | clpB; ATP-dependent Clp protease ATP-binding subunit ClpB |
K03086 | rpoD; RNA polymerase primary sigma factor |
K06183 | rsuA; 16S rRNA pseudouridine516 synthase | EC:5.4.99.19
K03694 | clpA; ATP-dependent Clp protease ATP-binding subunit ClpA |
K04078 | groES; chaperonin GroES |
K03574 | mutT; 8-oxo-dGTP diphosphatase | EC:3.6.1.55
K04485 | radA; DNA repair protein RadA/Sms |
K16924 | mtsT; energy-coupling factor transport system substrate-specific component |
K07238 | zinc transporter, ZIP family |
K04069 | pflA; pyruvate formate lyase activating enzyme | EC:1.97.1.4
K12410 | npdA; NAD-dependent deacetylase | EC:3.5.1.-
K03630 | radC; DNA repair protein RadC |
K03308 | neurotransmitter:Na+ symporter, NSS family |
K03310 | alanine or glycine:cation symporter, AGCS family |
K03328 | polysaccharide transporter, PST family |

Figure 5.6 Tukey style box and whisker plots of significantly different expression of KEGG orthologous groups. The boxplots summarise data within each phenotype, showing the median and the first and third quartiles in the box, with whiskers extending to the most extreme data points within 1.5 times the interquartile range of the first and third quartiles. Data used for the boxplots and the points are replicates of the species based on DESeq2 normalised counts. (A) shows only boxplots for KO families that are significantly increased in HAPs, whereas (B) shows all but three gene families that are higher in NAPs. Species are indicated in the legend; 12662 – A. sticklandii ATCC12662; 49906 – C. aminophilum ATCC49906; Isol6 – E. pyruvativorans Isol6; S85 – F. succinogenes S85; 007c – R. flavefaciens 007c; SY3 – R. albus SY3.
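
The summary statistics behind a Tukey style boxplot can be illustrated with a minimal sketch. The function name `tukey_box_stats` and the example values are hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption; plotting software may use a slightly different convention.

```python
import statistics


def tukey_box_stats(values):
    """Return quartiles, whisker ends and outliers for a Tukey style boxplot."""
    data = sorted(values)
    # Quartiles using the inclusive method (an assumption of this sketch).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences lie 1.5 x IQR beyond the first and third quartiles.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical normalised counts with one extreme replicate.
box = tukey_box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(box)
```

In this toy example the value 100 falls outside the upper fence, so it would be drawn as an individual point rather than extending the whisker.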

5.4.5. Expression of Transporters in HAPs and NAPs

Whilst the genomic analysis of amino acid and peptide transporters suggested a signature in the HAPs compared to the NAPs, the genome contains a cell’s functional potential, and does not indicate whether these transporters are actively used. Analysing the transcriptomes to highlight how these transporter genes are expressed can indicate this. Using transcripts per million (TPM) to normalise data and allow for direct comparison amongst samples, it is apparent from Figure 5.7 and Figure 5.8 that HAPs have a higher proportion of transcripts for genes that are amino acid and peptide transporters, with the exception of a branched chain amino acid exporter (2.A.78), which is significantly more highly expressed in NAP species (P=0.0052).
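
As an illustration of the TPM normalisation referred to above, the following minimal sketch converts raw read counts to TPM for a single sample. It uses the standard definition (reads per kilobase, rescaled so that each sample sums to one million); the function name `tpm` and the example counts and gene lengths are hypothetical and are not taken from this study's pipeline.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts for one sample to transcripts per million."""
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths_kb must have the same length")
    # Step 1: divide each count by gene length in kilobases (reads per kilobase).
    rpk = [count / length for count, length in zip(counts, lengths_kb)]
    total = sum(rpk)
    if total == 0:
        raise ValueError("sample has no mapped reads")
    # Step 2: rescale so that the values for the sample sum to one million.
    return [value / total * 1_000_000 for value in rpk]


# Hypothetical sample: three genes with counts proportional to their lengths.
example_counts = [100, 200, 300]
example_lengths_kb = [1.0, 2.0, 3.0]
example_tpm = tpm(example_counts, example_lengths_kb)
print(example_tpm)
```

Because every sample sums to the same total, a TPM value can be read as the proportion of that sample's transcripts assigned to a gene, which is what allows the proportions of transporter transcripts to be compared between species.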

The hypothesis was further developed to stipulate that not only are there more of these transporters present, but they are also highly utilised, and this would be reflected in the transcriptomes. When comparing the expression of the transporter families represented in both HAPs and NAPs, there were more transcripts per million for transporter genes in HAPs than in NAPs (Figure 5.7). Additionally, there were significant differences in expression for some of the transporters that are common to all species. For example, expression of the alanine or glycine:cation symporter (2.A.25) was significantly greater in HAPs than in NAPs (P=7.4e-07). Similarly, the Peptide Transporter Carbon Starvation CstA Family (2.A.114) was also significantly more highly expressed in HAPs than in NAPs (P=2.604e-05); however, orthologs for this gene were found in only one of the NAPs, F. succinogenes S85, as well as in the three HAPs. The HAP species also had significantly higher expression than NAPs (P=0.00066) for the polar amino acid uptake family (3.A.1.3) but, despite this significance, only A. sticklandii ATCC12662 and C. aminophilum ATCC49906 had a higher proportion of transcripts for these orthologs than all of the NAP species (Figure 5.8). E. pyruvativorans Isol6 did not follow this trend; instead, a high proportion of transcripts for this species were for the Amino Acid-Polyamine-Organocation (APC) Superfamily (2.A.3). Specifically, the most highly expressed transporter in this superfamily (average TPM 7939.41 ± 862.12) was the inner membrane transporter (2.A.3.7.5), which was likely either an amino acid transporter or an amino acid:organic amine antiporter (Saier et al., 2016). The most highly expressed transporters for A. sticklandii ATCC12662 belonged to the Basic Amino Acid Antiporter family (2.A.118), with the highest expression (average TPM 4667.03 ± 480.37) being that of a putative C4 dicarboxylate transporter (2.A.118.1.2).

Interestingly, despite previously being shown to have more amino acid and peptide transporter orthologs present in its genome than the other two HAP species (Figure 4.8), C. aminophilum ATCC49906 had a lower proportion of transcripts from amino acid and peptide transporters than the other HAPs, although this was still substantially greater than that seen in the NAP species. Ammonia transporters (1.A.11) were not visible on the stacked graph (Figure 5.7) and are therefore expressed at low levels in all NAPs and in C. aminophilum ATCC49906, which also possesses an ortholog (Figure 5.8).

Figure 5.7 Stacked graph of transcripts per million totals for amino acid and peptide transporter families. The TPM values were totalled for all orthologs belonging to the same transporter family. Legend labels: amino acid transporters, permease, cation symporter, exporter, peptide transporter, ammonium channel.
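
The totalling of TPM values per transporter family can be sketched as below. This is a minimal illustration rather than the analysis code used here: the helper names and TPM values are hypothetical, and the family is assumed to be the first three fields of the transporter classification identifier (for example 2.A.3 for 2.A.3.7.5). Some families, such as 3.A.1.3, are defined at a depth of four fields, so `depth` is left as a parameter.

```python
from collections import defaultdict


def family_of(tc_id, depth=3):
    """Truncate a transporter classification ID to its family prefix."""
    return ".".join(tc_id.split(".")[:depth])


def total_tpm_by_family(ortholog_tpm, depth=3):
    """Sum TPM values over all orthologs that share a family prefix."""
    totals = defaultdict(float)
    for tc_id, tpm_value in ortholog_tpm:
        totals[family_of(tc_id, depth)] += tpm_value
    return dict(totals)


# Hypothetical (identifier, TPM) pairs for one species.
example_orthologs = [
    ("2.A.3.7.5", 10.0),
    ("2.A.3.1.1", 5.0),
    ("2.A.118.1.2", 2.5),
]
family_totals = total_tpm_by_family(example_orthologs)
print(family_totals)
```

Each family total then forms one segment of the stacked bar for that species.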

Figure 5.8 Boxplots for each of the transporter families. HAP species are in shades of red and NAPs in shades of green, according to the legend on the right. The P-value for each transporter family is from the comparison of HAPs and NAPs using a Wilcoxon rank sum test. Here, zero expression equates to absence of the ortholog.
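
The comparison of two phenotypes with a Wilcoxon rank sum test can be illustrated with the following self-contained sketch. It uses the normal approximation with tie and continuity corrections, which is an assumption of this example; statistical software often switches to an exact calculation for small samples without ties, so P-values may differ slightly. The function name and the TPM values are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test using the normal approximation.

    Returns the U statistic for x and the approximate P-value.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        # Tied observations all receive the average (mid) rank.
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    # Continuity correction of 0.5, floored at zero.
    z = max((abs(u_x - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    p_value = math.erfc(z / math.sqrt(2))
    return u_x, p_value


# Hypothetical TPM values for one transporter family.
hap_tpm = [820.0, 910.5, 760.2, 1005.3]
nap_tpm = [110.4, 95.0, 130.8, 101.1]
u_stat, p_val = rank_sum_test(hap_tpm, nap_tpm)
print(u_stat, round(p_val, 4))
```

Because the test is based on ranks, it makes no assumption that TPM values are normally distributed, which suits expression data with large differences in scale between species.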

5.5. Discussion

5.5.1. Amino Acid Metabolism and Energy Conservation Form a Large Proportion of Some HAP Species Transcriptomes

The current study explored the possibility of a transcriptome-mediated HAP phenotype by comparing expression in HAPs and NAPs. The transcriptomes revealed that there was a general trend for the HAP species to have a larger proportion of reads in their transcriptomes that aligned to genes that play a role in energy production and conservation and in amino acid transport or metabolism. One exception to this is C. aminophilum ATCC49906, where the proportion of expression related to amino acid metabolism was lower than that of two of the NAP species. However, this species had 26.9 % of genes that were labelled as function unknown, which is much larger than the average proportion of 12.7 % across all of the species for this unknown functional category. Therefore, this may not be a true reflection of the transcriptomic functional profile, and this HAP may even harbour novel or unrecognised genes related to its phenotype. Additionally, these higher-level functions are coarse and span enzymes related to all processes in a function; for example, metabolism and transport includes catabolism as well as anabolism. As NAPs require enzymes for amino acid biosynthesis and polypeptide production, comparing expression of genes with this function alongside expression for amino acid deamination likely masks further patterns.

Exploration of the functions that are highly expressed in HAPs compared to NAPs revealed individual gene families that were differentially expressed, including those related to amino acid transport and metabolism. Interestingly, a family of dehydrogenases involved in the deamination of glutamate, leucine, phenylalanine and valine (COG0334) was significantly more highly expressed in the HAPs than in the NAPs. However, the boxplot for this family in Figure 5.5 shows a large variation in expression within the phenotype for this orthologous group, with higher expression in A. sticklandii ATCC12662 and E. pyruvativorans Isol6 than in C. aminophilum ATCC49906. This could be relevant to the preference of certain bacteria for peptides over amino acids; E. pyruvativorans and A. sticklandii prefer peptides (Stadtman, 1954; Wallace et al., 2004; Fonknechten et al., 2010) whereas C. aminophilum prefers casamino acids (Paster et al., 1993). Although all species were grown in the same broth medium, this medium is complete and well supplemented, so each species could utilise its preferred nitrogen source. This variation within the phenotype for this gene is also further evidence that these HAPs use different functions to achieve the same phenotype. On the other hand, an orthologous group of amino acid carrier proteins (COG1115) was highly expressed in all of the HAPs and expressed at low levels in the NAPs, indicating a function likely to be necessary for this phenotype. The amino acid metabolism and transport functions that were significantly more highly expressed in NAPs were tryptophan synthases (COG0159) and prephenate dehydrogenases (COG0287), which are related to tryptophan and tyrosine biosynthesis, respectively (Bonvin et al., 2006). This is an example of how anabolism genes are included in the generic family of amino acid metabolism functions. The increase in expression of an orthologous group of acetate kinase enzymes (COG0282) suggests the possibility of the production of acetate and ATP from acetyl-phosphate, yet as acetate is not produced more in HAPs than in NAPs (Figure 3.6), there is little evidence to suggest that this reaction is important in HAP function.

This comparative approach requires that a function be represented by all six species, thus excluding any orthologous gene families that are unique to one phenotype. However, these had already been established using genomic approaches. Of those six orthologous groups found to be unique to HAPs, or common to HAPs and SAPs but none of the NAPs, at the genome level (section 4.4.3.3), none were shown to be highly expressed in any of the HAPs, indicating that they are likely not important for the HAP phenotype. Looking at functional pathways through KEGG functional orthologous groups also did not elucidate any apparent signatures. Of the seven KOs found to be unique to ammonia producing species, only one KEGG ortholog showed similar expression in each of the three HAPs: lysophospholipase. Although this enzyme is involved in lipid degradation, it does also play a role downstream of glycine, serine and threonine metabolism. With the exception of E. pyruvativorans Isol6, the most highly expressed KEGG orthologs in the HAPs did not immediately indicate phenotype specific amino acid metabolism related functions. For E. pyruvativorans Isol6, the glutamate enzymes are some of the most highly expressed KOs, as are the pyruvate metabolism enzymes. The increased expression of zinc transporters is interesting, suggesting the importance of zinc ions for the cell. A need for zinc is not unique to HAP species; many cells require this trace element, as it is used in many enzymes and proteins, but high levels can be toxic to the cell (Hantke, 2005). The increased expression of zinc transporters in HAPs compared to NAPs suggests an increased requirement for zinc inside the cell, which may be related to the production of high numbers of zinc-based enzymes, such as the alcohol dehydrogenases previously noted to be common to HAPs and SAPs. Interestingly, studies have previously shown that supplementing steers with zinc sulphate resulted in a reduction of ruminal amino acid degradation and an increase in the dietary amino acids made available to the host. However, this work was mainly concerned with protozoal populations and their contribution to amino acid degradation (Froetschel et al., 1990) and did not look at HAP activity, which may have increased but been overshadowed by the protozoal activity. Other studies of zinc supplementation noted a decrease in ammonia production; however, this was related to inhibition of urease and a reduction of urea hydrolysis (Arelovich et al., 2000), and hyper ammonia producing bacterial species were not mentioned.
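
The selection of orthologous groups that are present in HAPs but absent from NAPs can be expressed with simple set operations. The sketch below is illustrative only: it assumes that a group must be present in every HAP and absent from every NAP, it ignores SAPs, and the species labels and group identifiers are hypothetical placeholders rather than data from this study.

```python
def hap_shared_absent_from_naps(groups_by_species, phenotype):
    """Return groups found in every HAP genome and in no NAP genome."""
    hap_sets = [g for sp, g in groups_by_species.items() if phenotype[sp] == "HAP"]
    nap_sets = [g for sp, g in groups_by_species.items() if phenotype[sp] == "NAP"]
    if not hap_sets:
        raise ValueError("at least one HAP species is required")
    # Groups shared by all HAPs (assumed criterion for this sketch).
    shared_by_haps = set.intersection(*hap_sets)
    # Any group seen in at least one NAP is excluded.
    found_in_naps = set().union(*nap_sets)
    return shared_by_haps - found_in_naps


# Hypothetical presence data using placeholder identifiers.
example_groups = {
    "HAP_1": {"KO_A", "KO_B", "KO_C"},
    "HAP_2": {"KO_A", "KO_C", "KO_D"},
    "NAP_1": {"KO_A", "KO_E"},
    "NAP_2": {"KO_D", "KO_E"},
}
example_phenotype = {"HAP_1": "HAP", "HAP_2": "HAP", "NAP_1": "NAP", "NAP_2": "NAP"}
hap_only = hap_shared_absent_from_naps(example_groups, example_phenotype)
print(hap_only)
```

The expression of each group returned in this way can then be looked up in the transcriptomes, as was done for the groups discussed above.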

Other amino acid metabolism enzymes are also significantly more highly expressed in HAPs than in NAPs, such as aspartate aminotransferase, which catalyses not only the production of glutamate from 2-oxoglutarate and aspartate, but also the same reaction with phenylalanine, tyrosine, and tryptophan. Glutamate dehydrogenase catalyses the deamination of glutamate to produce 2-oxoglutarate, ammonia and hydrogen (Kanehisa et al., 2016). As each reaction produces a substrate which is used by the other reaction, it may be that this forms a cycle, as illustrated in Figure 5.9, which could be the predominant pathway through which ammonia is produced in the HAP species. However, from this alone it is not apparent how ATP is generated, but it may be that the oxaloacetate and 2-oxoglutarate produced are then available for the citric acid cycle. Further evidence for this is the significantly increased expression of pyruvate ferredoxin oxidoreductase, which catalyses the bidirectional reaction of oxidised ferredoxin, pyruvate and CoA to produce acetyl-CoA, carbon dioxide and hydrogen. This cycle resembles that seen in species of lactic acid bacteria (Fernández and Zúñiga, 2006). Excess glutamate and aspartate may be exported from the cell, something seen in A. sticklandii (Fonknechten et al., 2010). Indeed, this cycle forms part of the amino acid degrading pathways undertaken by Clostridioides difficile, where it is key to forming a 2-oxo-acid which can then enter either the reductive or oxidative Stickland reactions (Neumann-Schaal et al., 2019). More notably, this cycle is a key part of the malate-aspartate shuttle, albeit using malate instead of glutamate; the shuttle is involved in the oxidation of NADH produced by glycolysis, and in the electron transport chain in the mitochondria (Borst, 2020).
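
The proposed cycle can be written out as two coupled reactions, following the standard stoichiometry of aspartate aminotransferase (EC 2.6.1.1) and NADP+-dependent glutamate dehydrogenase (EC 1.4.1.4). The net reaction shown is a simple sum of the two and assumes that the 2-oxoglutarate and glutamate pools are fully recycled, which has not been demonstrated here.

```latex
\begin{align*}
\text{L-aspartate} + \text{2-oxoglutarate}
  &\rightleftharpoons \text{oxaloacetate} + \text{L-glutamate} \\
\text{L-glutamate} + \mathrm{H_2O} + \mathrm{NADP^+}
  &\rightleftharpoons \text{2-oxoglutarate} + \mathrm{NH_3} + \mathrm{NADPH} + \mathrm{H^+} \\
\text{Net: } \text{L-aspartate} + \mathrm{H_2O} + \mathrm{NADP^+}
  &\rightarrow \text{oxaloacetate} + \mathrm{NH_3} + \mathrm{NADPH} + \mathrm{H^+}
\end{align*}
```

Under this assumption, each turn of the cycle releases one molecule of ammonia per aspartate consumed, with 2-oxoglutarate and glutamate acting as carriers of the amino group.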

The deamination of aspartate is not unique to the HAPs, however. The NAPs have significantly increased expression of L-aspartate oxidase, which catalyses not only the bidirectional reaction of aspartate in an amino acid metabolism pathway, but also a reaction in a nicotinate and nicotinamide metabolism pathway. Whilst it is unclear precisely which reaction the NAPs may be highly expressing this enzyme for, the next two enzymes in the nicotinate and nicotinamide metabolism pathway are also significantly more highly expressed in NAPs, which suggests that aspartate is more likely to be important in nicotinate and nicotinamide metabolism. As well as this, amino acid biosynthesis is more important to NAPs, as enzymes relating to tyrosine, tryptophan and histidine biosynthesis are significantly more highly expressed in NAPs than in HAPs.

Another hypothesis tested was whether those amino acid metabolism genes that are common to many bacterial species were upregulated and expressed in the HAPs more so than in the NAPs. One such example of this was the transporters. Although some transporters were found to be unique to HAP species, many amino acid and peptide transporter orthologs were also found in NAPs. It is these universal transporter genes common to many bacterial species that were of particular interest. Analysis of the transcriptomes determined that some of these transporter genes were expressed more in the HAPs than in the NAPs, for example, the alanine or glycine:cation symporter and the peptide transporter carbon starvation CstA family (2.A.114). Alanine and glycine act as donor and acceptor amino acids respectively in Stickland reactions, where a pair of amino acids react to produce energy (de Vladar, 2012); the high levels of expression therefore indicate the importance of transporting these amino acids. The CstA peptide transporter has also been shown in Escherichia coli to be upregulated once glucose and other carbon energy sources are depleted; to avoid starvation, this upregulation induces peptide transport for utilisation as an alternative energy source (Schultz and Matin, 1991). As the only source of carbon for the HAPs is in the form of amino acids and peptides, the presence and high level of expression of a peptide transporter regulated by carbon requirements in the cell during exponential growth phase correlates with this finding.

On the other hand, the dicarboxylate/amino acid cation symporter family (2.A.23) was expressed relatively highly compared to other amino acid and peptide transporters in all of the HAPs. This ortholog was not present in the NAPs, and therefore is not expressed. The orthologs shown to be present in the HAPs were primarily for aspartate and glutamate transport; both of these substrates have been shown to be the targets of enzymes that are highly expressed in the HAPs. Additionally, the amino acid and peptide transporter with the highest expression in A. sticklandii ATCC12662 was a putative C4 dicarboxylate transporter. This may also take aspartate and glutamate as substrates, but there is no evidence that this particular gene does so; instead, it has been shown to transport only those four-carbon carboxylates used in the tricarboxylic acid (TCA) cycle, as shown previously during anaerobic growth of E. coli (Zientz et al., 1996). The use of E. coli as a model is commonplace in genomics, and it is important to question the transferability of this information to other species. Whilst there are some interesting patterns in the transcriptomics and transporters, it remains to be seen whether these transporters do have the functions that they are annotated with. Ideally, functional testing of the transporters found to be important in the HAPs is required for confirmation.

Figure 5.9 An illustration of the possible interactions between two reactions which may be central to the HAP phenotype.

5.5.2. Limitations, Implications and Future Work

Although each hypothesis was met with some small indication of something unique to HAPs, an obvious and fulfilling answer to the question of what formulates the HAP phenotype remains to be seen. Instead, it appears that whilst an outward phenotype is shared between the three HAP species studied here, the intricacies of amino acid metabolism and ammonia production are hidden within each species alone, arriving at the same phenotype through different pathways. To confirm which enzymes and pathways are upregulated in the individual HAPs during growth, further individual studies could be carried out on this transcriptomic data. Implementation of a different experiment, such as one using radiolabelled substrates in which the species are exposed to growth media with differing levels of nitrogen sources, would also better highlight these pathways. The design of this experiment, on the other hand, aimed to find commonalities between HAPs under the same conditions when compared to species of a polar opposite ammonia production phenotype, in this case the NAPs. This approach revealed the importance of glutamate and aspartate, which can possibly be used in a cyclical manner to produce ammonia and hydrogen, with the production of ATP from the tricarboxylic acid cycle, which can be supplied with products of these reactions, along with pyruvate and acetyl-CoA. This is supported by a number of findings, including the presence of transporters unique to the HAPs (but also present in the SAPs) that transport glutamate and aspartate and were shown to be expressed during the growth phase of the cultures, as well as the significantly increased expression of enzymes which catalyse the deamination of these substrates, in the process producing ammonia and hydrogen.

Yet the reactions identified above are common to other bacterial groups too, as these enzymes and pathways are present in the genomes of NAPs in this study, as well as in broader bacterial groups such as the lactic acid bacteria. If these pathways are indeed present across many phyla of bacteria, amino acid fermentation for energy may be an ancient form of energy production, something that has been claimed to have ‘powered metabolism in the RNA world’ (de Vladar, 2012). If this is indeed the case, then extant bacteria may retain the pathways for this, which is why there was not a specific signature for the HAPs in the genomes. Instead, what distinguishes the HAPs is that they do not possess all the genes necessary to metabolise carbohydrates as an adequate energy source (Fonknechten et al., 2010; Seshadri et al., 2018). An interesting genome and transcriptome comparison would be between HAPs and lactic acid bacteria (LAB). The two groups are by definition mutually exclusive; species that belong to the LAB group have the characteristic obligate sugar fermentation activity which produces lactic acid (Fernández and Zúñiga, 2006). During sugar starvation and stress, these bacteria are able to catabolise amino acids for energy (Papadimitriou et al., 2016), but it is uncertain whether all or some species in this group can utilise amino acids as the sole carbon source. Irrespective of whether or not these LAB species can use amino acid deamination pathways as a form of energy production, comparisons between genomes of HAP species and LAB species would expose those genes that HAPs lack, as they are not capable of sugar fermentation.

Other considerations and limitations for transcriptomics include the reliance on upregulation and expression of genes encoding those proteins necessary for certain functions during exponential growth. If a cell requires proteins constantly and consistently throughout its lifecycle, differential expression would not be seen in different growth phases. Additionally, transcriptomic analysis assumes that the number of transcripts correlates to protein activity, for which it is an unreliable indicator (Evans, 2015). However, the addition of transcriptomics did allow for replication. The genomic approaches suffered from some bias due to a small sample size; three species of each of the three ammonia production phenotypes is limiting. Having four replicates of each species forced biological and technical replicates, as each individual culture, despite being seeded from the same initial culture, is no longer influenced by outside effects. Future work, however, would need to isolate and identify more HAPs and NAPs, using culture-based approaches. As with previous literature, using an amino acid and peptide rich medium with this as the sole nitrogen and carbon source should ensure HAP isolation. NAP isolation would likely prove more difficult but could be achieved using a medium based on cellulose and other carbohydrates, and ammonia, as the carbon and nitrogen sources, respectively. Ideally, the outcome of this research would have identified commonalities of HAPs that create a searchable signature in genomic data. This would allow for the identification of additional HAP genomes. However, no such signature was identified in the genomes alone; instead, it became apparent that the three HAPs studied both utilise different pathways and share some of the fundamental genes for amino acid metabolism with the NAPs, both of which have the effect of diluting any signal during genome comparisons.

5.6. Conclusions

Hyper ammonia production is synonymous with obligate amino acid fermentation, as it is the process of deamination of amino acids to obtain energy that results in high levels of ammonia production, but the actual pathways taken to achieve this may be different, as seen in these HAP species. The pathways are also common to many other bacterial species, and are even present in those that can utilise carbohydrates, and because more energy can be obtained from sugars, amino acid fermentation tends to be used only as an alternative pathway, often activated during times of stress or starvation. It is thought that amino acid utilisation is the oldest form of energy production, and therefore it is likely that instead of these ruminal HAPs having evolved this phenotype, they have not gained the carbohydrate metabolism abilities, residing in the obligate amino acid fermenting niche. Exactly what makes them competitive in this niche remains unclear.

177 Chapter 6

6. Ruminal Bacteriophages26

6.1. Introduction

The use of prokaryotic viruses in the form of bacteriophages or archaeal viruses to target important and contributing bacteria or archaea in the rumen has been mentioned previously by other authors as a possible methane mitigation strategy (McAllister and Newbold, 2008; Leahy et al., 2010; Buddle et al., 2011; Kumar et al., 2014; Gilbert et al., 2015). There are three main areas that need to be researched and understood before a phage can be considered suitable as a mitigation strategy, namely the target organism, the phage characteristics and the microbial ecosystem that the phage will be exposed to (Gilbert et al., 2015). Although most of the discussions revolve around using archaeal viruses to target the methanogens in the rumen, little consideration has been given to other bacterial targets, especially those species responsible for producing larger amounts of carbon dioxide and hydrogen which feed into methanogenesis. Hyper ammonia producing bacteria fall into this category and would therefore make good targets for phages as a methane mitigation strategy.

Rumen phages are those that either target rumen bacterial species, or those that are believed to reside in the rumen, evidenced either by direct sampling of rumen fluid (Klieve and Bauchop, 1988), or through isolation from ruminant associated samples, such as from compacted manure from transport trucks, and abattoir kill-floor run-off (Klieve et al., 2004). Samples from these two rumen-associated environments (as well as from sewage samples) previously yielded five phages that are, to date, the only genomes of lytic rumen phages to be sequenced and published (Gilbert et al., 2017). Of these five, three belonged to the Siphoviridae family; two that targeted the bacterial host Prevotella ruminicola and one which targeted Streptococcus bovis, and another two that belonged to the Podoviridae family that targeted Ruminococcus albus (Gilbert et al., 2017). However, there is still much to be learned about phages in this environment and the opportunity was taken in this current study to further investigate their presence in samples taken directly from rumen fluid and fresh faeces. The aims of this current study were to attempt to isolate phages specific to the hyper ammonia producers Acetoanaerobium sticklandii ATCC12662, Clostridium aminophilum F (ATCC49906) and Eubacterium pyruvativorans Isolate 6, using samples of rumen fluid and faeces from sheep and cattle. To abate concerns regarding the small population size of the hyper ammonia producers, predominant ruminal bacteria were used as a control to confirm the methods were appropriate. Yet, isolation and genome sequencing of novel bacteriophages from the rumen alone, regardless of bacterial host, is a worthy endeavour, as increasing the number of genomes from lytic rumen phages beyond the current five genomes available is an important contribution, and developing the knowledge of the phage community will help understand the interactions between components of the rumen ecosystem (Gilbert et al., 2015).

26 Part of these results have been published (Friedersdorff et al., 2020) and the manuscript is available here: https://www.frontiersin.org/articles/10.3389/fmicb.2020.01588/full

6.2. Aims and Objectives

• To screen rumen derived samples (rumen fluid and faeces) for bacteriophages.
• To screen selected potential rumen bacterial hosts obtained from culture collections to assess suitability as phage targets.
• To isolate bacteriophages, purify and characterise them in vitro.
• To extract nucleic acids, sequence, reconstruct and analyse the phage genomes.

6.3. Experimental Procedures

Five bacterial hosts were selected, comprising three HAPs (Clostridium aminophilum ATCC49906, Acetoanaerobium sticklandii ATCC12662 and Eubacterium pyruvativorans Isol6), along with the NAP Ruminococcus flavefaciens 007c and the SAP Butyrivibrio fibrisolvens DSM3071. These were cultured and maintained using Hobson’s M2 medium (sections 2.2, 2.3).

A total of eight rumen-associated samples were screened for bacteriophages against these five selected hosts. Samples of faeces from three sheep and three cows (section 2.11.1) were screened for phages. First, solid samples underwent an elution step, using phage storage buffer to elute phages from the solid/semi-solid material (section 2.11.2). These, along with samples of mixed rumen fluid from three cows (CRF) and from three sheep (SRF), were clarified to create phage filtrates (section 2.11.2), as described previously (Klieve, 2005; Klieve and Gilbert, 2005). To increase the concentration of bacteriophages in a rumen derived sample, host bacterial cultures were incubated with phage filtrates to enrich for phages (section 2.11.3). Bacteriophages and viral particles were also concentrated from filtrates using PEG precipitation (section 2.11.4). Each preparation was then screened using the soft overlay technique (2.11.6), giving 24 screened samples in total: the four sheep and four cow rumen derived samples were each screened as crude filtrate, after enrichment and after PEG precipitation.

Phage screening was carried out using the soft overlay technique in the form of spot tests and plaque assays (section 2.11.5). For some bacteria, it was necessary to adjust the growth media to encourage confluent lawn growth (section 2.11.7).

Plaques or areas of lysis indicating phage activity were sampled and purified (section 2.11.6), and underwent DNA extraction and sequencing (sections 2.11.8, 2.11.11). The stability of the phages in chloroform and their host ranges were tested (2.11.9). Purified phage samples were also visualised using TEM (2.11.10).

The phage genome sequencing data underwent quality control, read trimming and genome assembly (section 2.21.1), then the genomes were annotated and compared to each other (section 2.21.2).

Phylogenetic and phylogenomic relationships were studied (section 2.21.3), and possible bacterial host interactions were assessed (section 2.21.4).

6.4. Results and Discussion

6.4.1. Detection of Phages in the Rumen Fluid and Faeces

Two different rumen derived samples from two different ruminant species were used for the evaluation of phage presence: the rumen fluid and faeces from both cattle and sheep. Bacteriophages with differing morphologies were observed using TEM in phage filtrates derived from both cow and sheep rumen fluid. As can be seen in Figure 6.1, there are a number of different morphologies of virus-like particles visible, with varying sizes and shapes of capsids (phage heads) and tails. This initial result indicated that phages were present in the rumen fluid samples and also that the PEG precipitation method was suitable for use with phages, which had remained morphologically intact in the TEM images. However, TEM does not indicate phage viability and whether they are still biologically active and capable of infecting, nor whether they are virulent or temperate phages (section 1.4.3.1), or what bacterial host they infect. These same filtrate samples were then used to screen for phages targeting specific bacterial hosts.

6.4.1.1. Target species 1: Eubacterium pyruvativorans isol6

In an initial spot test with phage lysates from rumen fluid and faecal samples, the bacterial lawn grew densely enough that some areas of lysis were visible in all three replicates of a cow faecal sample, two of three replicates of a sheep faecal sample, one of three replicates of a different sheep faecal sample and one of three replicates of sheep rumen fluid, suggestive of phage activity (Figure 6.3). These areas of clearing were subsampled, creating seven positive samples, which were tested for phage activity through propagation with another spot test; however, poor growth of subsequent bacterial lawns impeded further testing. Poor lawn growth was a particular problem when screening for phages against Eubacterium pyruvativorans isol6, which would grow inconsistently between tests. Since E. pyruvativorans has a requirement for peptides (Wallace et al., 2004), the growth medium was supplemented with peptides to increase lawn density and reliability to better visualise phage activity. Peptone was added to the basal Hobson’s M2 broth medium, which increased bacterial culture density relative to basal Hobson’s M2 (an OD600 absorbance reading of 1.212 compared to 1.004 for the Hobson’s M2 culture under equivalent incubation conditions). However, addition of this supplement to the top agar did not improve bacterial lawn growth density. Due to these issues, it was not possible to determine whether any lysis observed was caused by phage activity, but the results observed suggest that this is a promising area to pursue in the future as this bacterium is a HAP. The addition of sodium pyruvate has been shown to increase growth density more than the addition of peptides (Wallace et al., 2004), and would be a promising avenue to investigate for increasing lawn density.

Figure 6.1 Transmission electron micrographs of viral particles from rumen fluid samples. A - D are from cow rumen fluid, whereas E - H were from sheep. Scale bars with magnification are provided on each image. Phages of varying families can be seen, including morphologies typical of the Siphoviridae, Myoviridae and Podoviridae. The long structure visible in G could be a tail, a filamentous phage or a plant virus.

6.4.1.2. Target species 2: Acetoanaerobium sticklandii (ATCC12662)

No evidence of phage activity was seen in any of the phage lysates, precipitated lysates or samples enriched for phages against A. sticklandii, despite lawn growth being consistently more than adequate for observation of lysis. This may be because the population of A. sticklandii in the rumen is low (Yang and Russell, 1993; Krause and Russell, 1996a), and therefore the corresponding phage population is also likely to be low, reducing the chance of isolation. Alternatively, it may be that this host strain in particular is not suitable to isolate phages against rumen specific A. sticklandii strains. This ATCC strain was isolated originally from the black mud of San Francisco Bay (Stadtman and McClung, 1957), and may therefore represent a different strain of A. sticklandii than those present in the rumen. Since phages tend to be strain specific, it may be different enough to not be infected by rumen phages.

6.4.1.3. Target species 3: Clostridium aminophilum F (ATCC49906)

Similarly, no evidence of phage activity was seen in any of the phage lysates, precipitated lysates or samples enriched for phages against C. aminophilum (Figure 6.3); however, lawn growth was consistently poor on Hobson’s M2. Previous studies showed increased growth of C. aminophilum when free amino acids were available (Paster et al., 1993). This was not observed in the current study: when casamino acids were added to Hobson’s M2 liquid medium, the final OD600 absorbance was 0.667, compared to 0.793 when grown in basal Hobson’s M2. Instead, growth was visibly better in the ATCC recommended reinforced clostridial medium supplemented with casamino acids, as is shown in Figure 6.2.

6.4.1.4. Target species 4: Ruminococcus flavefaciens 007c

As R. flavefaciens is a predominant cellulolytic bacterium in the rumen, it was expected that phages against this bacterium would be more prominent in samples than phages against bacterial species with smaller populations, and therefore more likely to be isolated. However, lawns of R. flavefaciens were unreliable, leading to inconsistent results. Areas of lysis were found in an initial spot test with phage preparations from sheep rumen fluid and faecal samples. These were scraped and tested again for phage activity, which did reveal further areas of lysis. No other evidence of phage activity was seen in precipitated phage lysates or enrichment samples. To increase bacterial growth and production of reliable lawns, alternative media or supplements were considered. Evidence of media used to grow R. flavefaciens indicated the necessity for cellobiose (Saluzzi et al., 2001). Increasing the amount of cellobiose in liquid Hobson’s M2 medium did not cause a substantial increase in culture density when tested briefly spectrophotometrically (an absorbance reading at OD600 of 0.689 compared to 0.663 for the Hobson’s M2 medium). Addition of cellobiose supplement to the top agar did not improve lawn density. Improvements to lawn production to achieve reliably dense lawns are fundamental to being able to isolate phages against this bacterium.


Figure 6.2 Clostridium aminophilum streaked on different agar. Left is Hobson’s M2 2 % agar medium, and right is reinforced clostridial medium (RCM) with 1.5 % casamino acids.

6.4.1.5. Target species 5: Butyrivibrio fibrisolvens D1 (DSM3071)

Lawn growth for this species on basal Hobson’s M2 medium was more reliable compared to the other species in this current study. Individual plaques were visible in a spot test from one of the cow faecal samples, as well as the same sample after PEG phage precipitation. The same spot test also revealed areas of lysis from precipitated phage lysate samples from both sheep and cow rumen fluid. In order to increase the likelihood of further phage propagation, plaques seen in these rumen fluid and faecal samples were picked and combined into one subsample, with less concern given to origin of the samples. This mixture of picked plaques was tested with a further plaque assay, revealing a large number of plaques when adding either 10 or 100 μl of phage sample to the host, each showing two different plaque morphologies. Small, round plaques with a darker, clearer centre and rough edges were picked from this, purified, and became “Phage D” (Figure 6.5A). Large, rough-edged plaques were also picked, purified, and named “Phage M” (Figure 6.5B). Given that it was unclear as to where these samples were from, precipitated samples of sheep rumen fluid, cow rumen fluid and the cow faecal sample were further screened individually as a plaque assay. Addition of 100 μl of the precipitated cow faecal sample to host bacteria in a plaque assay revealed circular, clear cut plaques, ~1 mm in diameter. These plaques were picked and purified, to create sample “Phage P” (Figure 6.5C). Addition of 100 μl of precipitated phage filtrate from cow rumen fluid showed one medium-sized, circular clear-cut plaque, which was picked and propagated, to create “Phage C” (Figure 6.5D). Precipitated phage filtrate from sheep rumen fluid yielded “Phage J” (Figure 6.5E), which formed many small, pin-prick sized faint plaques.


Figure 6.3 Flow diagrams showing the tests carried out for these four bacterial species. HAP = hyper ammonia producer, NAP = no ammonia producer, AOL = Area of lysis, area of clearing in the lawn, SRF = sheep rumen fluid, SF = sheep faecal sample, CRF = cow rumen fluid, CF = cow faecal 1 sample. See methods section 2.11.7 for information regarding supplements and alternative growth media.

[Flow diagram: Butyrivibrio fibrisolvens DSM3071 (SAP). Crude samples were tested, and plaques from CF and rumen fluid samples were picked into one mixed sample. A plaque assay (PA) with 10 μl and 100 μl of the mixed sample showed two plaque morphologies; one of each was picked, yielding Phage D (four rounds of purification) and Phage M (five rounds of purification). PAs with 10 μl and 100 μl of precipitated SRF, CRF and CF yielded Phage J (four rounds of purification), Phage C (three rounds) and Phage P (three rounds). All purification used PA with varying dilutions. No lysis was observed from the enrichment samples.]

Figure 6.4 Flow diagram of steps taken to screen rumen-associated samples and isolate phages.


Figure 6.5 Plaque morphologies of phages isolated against Butyrivibrio fibrisolvens DSM3071. (A) Phage D, (B) Phage M, (C) Phage P, (D) Phage C, (E) Phage J.

6.4.2. In vitro Characterisation of Butyrivibrio Phages

All phages observed had a circular or even icosahedral-looking capsid, with a tail that was not obviously contractile, as seen in the TEM images (Figure 6.6). Based on these morphologies, it is likely that these lytic Butyrivibrio phages belong in the Siphoviridae family, one of the most often observed phage families (Ackermann, 2009), especially in the bovine and caprine rumen (Berg Miller et al., 2012; Namonyo et al., 2018). However, the TEM images are not of a high enough quality to draw conclusions on the phage family from morphology alone, and further genome analysis is required to confirm this. Phage M remained elusive, and adequate TEM images were difficult to obtain. A summary of the phage morphologies and rough sizes can be seen in Table 6.1.

Table 6.1 Phage morphologies and characteristics. Source is the phage filtrate that yielded the plaque. *Mixture is a combination of subsamples taken from areas of lysis from spots of sheep and cow rumen fluid, and a cow faecal sample. Plaque morphology on the same host with the same growth conditions. Head and tail size approximate, as measured from two or three TEM images.

Phage | Source | Plaque morphology | Head size | Tail size
C | Cow rumen fluid | Medium sized plaques, <1 mm in diameter, round with a darker, clearer centre. | ~58 - 63 nm | ~140 - 143 nm x ~10 nm
D | Mixture* | Small but bigger than pin pricks, circular | ~56 - 61 nm | >125 nm x ~15 nm
J | Sheep rumen fluid | Very small, faint, pinprick like plaques. | ~55 - 60 nm | ~120 - 140 nm x ~10 nm
M | Mixture* | Large, >1 mm in diameter, rough edge. | N/A | N/A
P | Cow faeces | Round, clear cut plaques, medium sized, <1 mm in diameter. | ~48 - 58 nm | ~136 - 139 nm x ~13 nm


Figure 6.6 Transmission electron micrographs of lytic Butyrivibrio phages. (A) Phage C, (B) phage D, (C) Phage J, (D) Phage P, (E) Phage M. Scale bars are labelled.

When these five phages were tested for infectivity against other hosts, including a different strain of the isolation host species (B. fibrisolvens JW11), three strains of a different species (B. hungatei strains JK615, Su6 and DSM10295) and three further strains of the same genus (Butyrivibrio sp. strains DSM10305, DSM10316 and M55), no phage activity was seen at any dilution. By spotting the same phages onto the host B. fibrisolvens DSM3071 under the same conditions and accounting for dilution factors, the estimated concentrations of the phages were as follows: Phage C ~10^4, Phage D ~10^8, Phage J ~10^5, Phage M ~10^4, Phage P ~10^6. It may be that the concentrations of some of these phages are too low to draw conclusions about their infectivity of other hosts. Further, as only a small subsample of the B. fibrisolvens population has been tested, these results do not decisively conclude that these phages are extremely host specific, but they do indicate that the phages are likely to have a narrow host range.
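The dilution-factor calculation mentioned above can be sketched as follows. This is a minimal illustration only, assuming the common convention of expressing phage concentration as plaque-forming units per millilitre (PFU/ml); the function name, plaque count, dilution and spotted volume are hypothetical and are not values from this study.

```python
def estimate_titre(plaque_count, dilution, volume_ml):
    """Estimate phage concentration (assumed unit: PFU/ml).

    plaque_count: plaques counted at the chosen dilution
    dilution:     dilution of the spotted sample, e.g. 1e-2 for 1 in 100
    volume_ml:    volume of diluted sample applied to the lawn, in ml
    """
    if plaque_count < 0 or dilution <= 0 or volume_ml <= 0:
        raise ValueError("count must be >= 0; dilution and volume must be > 0")
    # Scale the count back up by the dilution and the volume applied.
    return plaque_count / (dilution * volume_ml)


# Hypothetical example: 5 plaques in a 10 ul spot of a 1 in 100 dilution.
example_titre = estimate_titre(5, 1e-2, 0.01)
print(f"{example_titre:.1e} PFU/ml")
```

Because a spot test applies a small volume and plaques are often hard to count individually, an estimate of this kind is best read as an order of magnitude, which matches how the concentrations are reported above.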

All phages survived the addition of chloroform, which had no apparent effect on phages P and D, and only a possible slight reduction in concentration seen in phages J, C and M. However, this was only decided using one spot test, and to definitively observe a reduction in phage numbers, plaque assays would need to be carried out with replicates. However, as chloroform fully inactivates filamentous and lipid containing phages (Hyman, 2019), it is unlikely that any of the phages isolated here have these characteristics.

6.4.3. Phage Genome Analysis

The bead beating method employed in this study successfully extracted DNA from the five phage samples, which was then sequenced. Reads underwent quality control, trimming and assembly, resulting in one fully reconstructed circular phage contig in each of the samples (Table 6.2), except for phage sample J, which yielded two contigs (labelled J-1 and J-2). The two contigs had different read coverage and genome lengths (J-1 was 33,227 bp and J-2 was 31,128 bp in length). The reconstructed contigs surpassed the minimum of 100-fold read coverage recommended for phage genomes (Russell, 2018).
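As an illustration of the coverage criterion, the sketch below checks the average coverage values reported in Table 6.2 against the 100-fold minimum cited above. The helper fold_coverage uses the common approximation of total sequenced bases divided by genome length; it is an assumption for illustration and not necessarily how coverage was computed by the assembly software used in this study.

```python
MIN_FOLD_COVERAGE = 100  # recommended minimum cited in the text


def fold_coverage(n_reads, mean_read_length, genome_length):
    """Approximate fold coverage as total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return n_reads * mean_read_length / genome_length


# Average coverage per reconstructed contig, as reported in Table 6.2.
reported_coverage = {
    "C": 125.832,
    "D": 1390.5,
    "J-1": 1399.46,
    "J-2": 242.56,
    "M": 352.867,
    "P": 754.716,
}

# Contigs falling short of the recommended minimum (none are expected).
below_minimum = [name for name, cov in reported_coverage.items()
                 if cov < MIN_FOLD_COVERAGE]
print(below_minimum)
```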

Further analysis of these phage genomes revealed that phage genome J-1 reconstructed from sample J had the highest coverage, but its average nucleotide identity (ANI) to the phage genome reconstructed from phage C sample was 100 %. Therefore, the J-1 genome did not undergo further analysis. Queries of these phage genomes against the NCBI virus nucleotide database revealed that there was no significant similarity to viral genomes previously published and deposited in this database. Queries against the viral representative reference sequence database and viral refseq database from NCBI also did not reveal similarity to any previously deposited viral sequences. Therefore, it is likely that these five phage genomes were novel, and they were named Butyrivibrio phage Bo-Finn (phage C), Butyrivibrio phage Arawn (phage D), Butyrivibrio phage Arian (phage P), Butyrivibrio phage Idris (phage J-2, the second, shorter genome contig assembled from phage sample J) and Butyrivibrio phage Ceridwen (phage M). The summary of the samples, the source of the sample, the genome contigs reconstructed and subsequent names can be seen in Table 6.2.


Table 6.2 Summary of phage sample and its source, the genomes constructed, coverage and allocated phage name. *Mixture: plaques from one sample of cow faeces, cow rumen fluid and sheep rumen fluid were combined and tested again to produce the plaques from which these phages were isolated.

Phage Sample | Source | Genome Contigs Reconstructed | Average Coverage Over Entire Genome | Butyrivibrio phage
C | Cow rumen fluid | C | 125.832 | Bo-Finn
D | Mixture* | D | 1390.5 | Arawn
J | Sheep rumen fluid | J-1 | 1399.46 | N/A
J | Sheep rumen fluid | J-2 | 242.56 | Idris
M | Mixture* | M | 352.867 | Ceridwen
P | Cow faeces | P | 754.716 | Arian

To determine the similarity between these phage genomes, genome comparisons were first carried out using Mauve (Figure 6.7), and ANI was then calculated for any genomes shown to be similar. Butyrivibrio phage Arian and Butyrivibrio phage Bo-Finn shared similar genome length and genome synteny, and 98.6 % ANI. Butyrivibrio phage Arawn and Butyrivibrio phage Idris also shared a similar genome length and genome synteny to each other, and 98.96 % ANI. Butyrivibrio phage Ceridwen did not appear similar to the other four phages (Figure 6.7).

The reconstructed phage genomes ranged from 31 kb (Arawn) to 39.7 kb (Ceridwen) in length (Table 6.3), with numbers of predicted open reading frames (ORFs) ranging from 50 (Arawn) to 73 (Ceridwen). The percentage of ORFs with homology to known sequences ranged from 40 % (Ceridwen) to 52 % (Arawn) (Table 6.3). All phage genomes have overlapping genes, ranging from 15 % of genes in the smallest genome belonging to Idris, up to 30 % of the genes in the largest genome belonging to Ceridwen. Interestingly, this correlation contradicts the previous finding that smaller viral genomes generally have more overlapping genes (Brandes and Linial, 2016). All genomes have few areas that are not populated by ORFs, small intergenic regions and many overlapping ORFs, which are all common characteristics in phage genomes (McNair et al., 2019).
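The proportion of overlapping genes discussed above can be derived from ORF coordinates. The following is a minimal sketch under stated assumptions: ORFs are treated as closed intervals on a linear sequence regardless of strand, ORFs spanning the origin of a circular contig are not handled, and the coordinates in the example are hypothetical rather than taken from the annotated genomes.

```python
def fraction_overlapping(orfs):
    """Return the fraction of ORFs that overlap at least one other ORF.

    orfs: list of (start, end) tuples, 1-based inclusive, with start <= end.
    """
    if not orfs:
        return 0.0
    overlapping = 0
    for i, (start_a, end_a) in enumerate(orfs):
        # Two closed intervals overlap when each starts before the other ends.
        if any(start_a <= end_b and start_b <= end_a
               for j, (start_b, end_b) in enumerate(orfs) if j != i):
            overlapping += 1
    return overlapping / len(orfs)


# Hypothetical coordinates: the first two ORFs share four bases.
example_orfs = [(1, 300), (297, 800), (850, 1200), (1250, 1600)]
print(fraction_overlapping(example_orfs))
```

A pairwise comparison is used here for clarity; with only 50 to 73 ORFs per genome the quadratic cost is negligible.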

Figure 6.7 Mauve alignment of the five Butyrivibrio phages. Colinear blocks are colour-coded, with the similarity graph plotted within the block and lines join similar blocks across genomes.


Table 6.3 Genome statistics summary. *As used previously (Gilbert et al., 2017). N/A indicates that no single phage genome shared more than one best hit homologous protein.

Butyrivibrio phage | Genome Length (bp) | GC (%) | ORFs | ORFs with best hit homologous proteins | Most common phage name (number of best hit homologous proteins)*
Arawn | 31118 | 46.9 | 50 | 26 | Paenibacillus phage PG1 (2)
Arian | 33499 | 39.7 | 54 | 25 | N/A
Bo-Finn | 33227 | 39.7 | 51 | 24 | N/A
Ceridwen | 39745 | 40.8 | 73 | 29 | Clostridium phage phiCP13O (2), Clostridium phage phiCP26F (2)
Idris | 31128 | 46.9 | 55 | 25 | Paenibacillus phage PG1 (2)

6.4.4. Phages Bo-Finn and Arian

A score of 98.6 % ANI, which is greater than the 95 % nucleotide sequence identity threshold indicative of the same species set by the International Committee on Taxonomy of Viruses (ICTV) (Adriaenssens and Brister, 2017), suggests that Butyrivibrio phages Bo-Finn and Arian are different isolates that belong to the same species and genus. The two genomes are syntenous and share high sequence similarity across their entire genome (Figure 6.8A). The most pronounced difference is around 18 Kbp, where a subsequence is present in each genome that is not found in the other, as shown by the grey boxes in Figure 6.8A. Both phages have tail genes that correspond to those found in most members of the Siphoviridae: tail terminator, major tail protein, two chaperones, tape measure protein (TMP), distal tail protein, tail-associated lysozyme and baseplate/tip proteins, which together form a functional ‘tail morphogenesis module’ (Veesler and Cambillau, 2011). Figure 6.8B shows these genes in Butyrivibrio phage Arian, including a promoter and terminator, suggesting that these genes can be regulated together. Not all genes here showed homology to previously identified proteins, but if the tail morphogenesis module is conserved, these hypothetical proteins may be involved in tail formation. Additionally, of the predicted ORFs that were homologous to genes in other phages, 10 in the Bo-Finn genome and 11 in the Arian genome shared homology to phages of the Siphoviridae family, six in each genome to phages of Myoviridae, and one each to a phage of Podoviridae. Given the absence of any tail sheath proteins, which are common to Myoviridae (Adriaenssens and Cowan, 2014), and the majority of genes being similar to other Siphoviridae phages, it is likely that these two phages belong to the Siphoviridae family.

Butyrivibrio phage Bo-Finn was predicted by PHACTS non-confidently to undergo the temperate lifecycle, whereas Arian was predicted non-confidently to undergo the lytic lifecycle. This suggests that there was some uncertainty with predicting the lifecycle using the proteomes of these two phages and that this outcome was not statistically supported (McNair et al., 2012). The lytic cycle relies on using the cellular machinery to create progeny virions, eventually bursting the host cell, whereas temperate phages integrate the phage genome after injection in the host, becoming a prophage (section 1.4.3.1). It is possible to predict lifestyle based on the genes present. For example, genes involved in integration, excision and lysogeny, regulation of expression and toxin production are more commonly associated with the lysogenic lifecycle (McNair et al., 2012), with lysogeny relying on integrase genes in particular (Philipson et al., 2018). On the other hand, genes involved in nucleotide metabolism, structural proteins and lysis are more important in virulent phages (McNair et al., 2012).

Life cycle can also be informed by the predicted major capsid protein and the family it belongs to. The major capsid proteins in these two phage genomes were predicted using protein motifs and resembled those in the HK97 family of major capsid proteins, populated by temperate phages (Haft et al., 2003). This suggests that these phages could undergo the temperate lifecycle, yet there is no further evidence in the genome to suggest this ability; for example, there were no immunity repressors, partitioning genes or integration and excision enzymes annotated in the genome, which are common functions associated with temperate phages (Mavrich and Hatfull, 2017; Stark, 2017). Furthermore, the GC content of phages often correlates with that of their bacterial host genome (Almpanis et al., 2018). The GC content of phages Arian and Bo-Finn is equal to that of B. fibrisolvens DSM3071 (39.7 %).
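For reference, GC content as compared above is the percentage of guanine and cytosine bases in a sequence. A minimal sketch follows; the function name and the short example sequence are hypothetical, and ambiguous bases such as N are simply ignored here, which may differ from the tool used to produce Table 6.3.

```python
def gc_percent(sequence):
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc_count = sum(base in "GC" for base in bases)
    return 100 * gc_count / len(bases)


# Hypothetical ten-base sequence with four G or C bases.
print(gc_percent("ATGCGCATTA"))
```

Applied to a whole phage genome and to the host genome, the two percentages can then be compared directly, as is done in the paragraph above.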

Additionally, these two phages were observed to undergo the lytic cycle in vitro, and phage Arian harbours an endolysin (QHJ73651.1), while phage Bo-Finn contains both an endolysin (QHJ73713.1) and a putative holin (QHJ73715.1). Moreover, a tRNA gene for the amino acid glutamine was predicted in these genomes, which is used in 3.6 % of coding genes in the genome of phage Arian and 4.4 % in Bo-Finn, compared to 2.7 % of coding genes in the bacterial host genome (Appendix Table 10.5). These codon usage statistics suggest that this codon is utilised more often in the phage genome than the host genome and therefore encoding this tRNA would increase efficiency and virulence, something that has been noted previously in lytic phages (Bailly-Bechet et al., 2007). Because these genomes are so highly similar, it is likely that they would both undergo the same lifecycle, which is not reflected in the PHACTS results. Evidence so far would therefore indicate that these phages are virulent; however, with some ORFs remaining uncharacterised, there may be novel proteins associated with the temperate lifecycle. The full annotated genomes can be seen for phage Arian and Bo-Finn in Figure 6.9 and Figure 6.10, respectively.
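One way to obtain codon usage figures of this kind is sketched below. It is an assumption-laden illustration rather than the method behind Appendix Table 10.5: it counts the share of all codons across a set of coding sequences that encode glutamine (CAA and CAG in the standard and bacterial genetic codes), and the two short coding sequences are hypothetical.

```python
GLUTAMINE_CODONS = {"CAA", "CAG"}


def codon_usage_percent(coding_sequences, codons):
    """Percentage of all codons in the coding sequences found in `codons`."""
    total = 0
    hits = 0
    for cds in coding_sequences:
        cds = cds.upper()
        # Read in frame from the first base; ignore any trailing partial codon.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            total += 1
            if cds[i:i + 3] in codons:
                hits += 1
    if total == 0:
        raise ValueError("no complete codons found")
    return 100 * hits / total


# Hypothetical coding sequences: 3 of the 8 codons encode glutamine.
example_cds = ["ATGCAAGGTTAA", "ATGCAGCAATAG"]
print(codon_usage_percent(example_cds, GLUTAMINE_CODONS))
```

Running the same function over the phage coding sequences and the host coding sequences would give the two percentages needed for a phage versus host comparison.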

A blastn search for the entire genome sequences of phage Arian and Bo-Finn against the NCBI virus genome database did not reveal any significant alignments covering more than 8% of the query genome.

Figure 6.8 Analysis of the Butyrivibrio phage Arian and Bo-Finn genomes. (A) Mauve alignments of the two genomes showing annotated ORFs, with directionality and function according to colour. Nucleotide similarity is expressed as a graph, with green indicating 100 %, yellow >30 %, red <30 %. (B) The organisation of the tail morphogenesis module in phage Arian. Uncharacterised proteins indicate where the protein was homologous to one reported previously, but had an unknown function, whereas hypothetical proteins did not show similarity above the set thresholds to any known proteins. Phage encoded proteins are those for which there was uncertainty about their function.

Figure 6.9 Genome map of Butyrivibrio phage Arian. GC content is shown in blue and AT in green in the centre. Open reading frames are colour coded based on function, according to the legend. Uncharacterised proteins indicate where the protein was homologous to one reported previously, but had an unknown function, whereas hypothetical proteins did not show similarity above the set thresholds to any known proteins. Phage encoded proteins are those for which there was uncertainty about their function.

194

Chapter 6

Figure 6.10 Genome map of Butyrivibrio phage Bo-Finn. GC content is shown in blue and AT in green in the centre. Open reading frames are colour coded based on function, according to the legend. Uncharacterised proteins indicate where the protein was homologous to one reported previously, but had an unknown function, whereas hypothetical proteins did not show similarity above the set thresholds to any known proteins. Phage encoded proteins are those that there was uncertainty about their function.

6.4.5. Phages Arawn and Idris

Butyrivibrio phages Arawn and Idris also share high sequence similarity (98.96 % ANI). The two genomes are syntenous, with highly similar genome sequences at the nucleotide level. The most noticeable difference between the two genomes is at 21.3 Kbp, where the similarity is low, and a different sequence in phage Arawn results in three ORFs predicted in this region (Figure 6.11A). The series of tail proteins present in Arawn (Figure 6.11B) also resembles the tail morphogenesis module common to phages in the Siphoviridae family (Veesler and Cambillau, 2011). Unlike phage Arian, Arawn has a terminator predicted after the major tail protein (QHJ73553.1) and a promoter located in the 3’ end of a tail protein (QHJ73556.1). As with the tail morphogenesis module of Arian and Bo-Finn, not all genes in the genomes of Arawn and Idris showed homology to previously identified proteins, but these hypothetical proteins may be involved in tail formation. Additionally, of the predicted ORFs that were homologous to genes in other phages, 16 in the Arawn genome and 17 in the Idris genome shared homology to phages of the Siphoviridae family, and two in each genome to phages of Myoviridae. Given the absence of any tail sheath proteins, which are common to Myoviridae (Adriaenssens and Cowan, 2014), and the majority of genes being similar to those of other Siphoviridae phages, it is likely that these two phages also belong to the Siphoviridae family.

Both phages Arawn and Idris were predicted confidently by PHACTS to undergo the temperate lifecycle. The major capsid proteins are similar to those in the temperate phages Streptococcus phage phiD12 and Streptococcus phage phi-SsUD.1 (Tang et al., 2013). A putative excisionase (xis) gene was predicted in both Arawn (QHJ73575.1) and Idris (QHJ73845.1), after identification of a DNA binding domain in the excisionase family. The presence of a protein homologous to a reverse transcriptase in phage Arawn (QHJ73560.1) and Idris (QHJ73832.1) suggests a possible mechanism of prophage immunity, previously shown to have an abortive effect on lytic phage infection and prolong lysogeny (Odegrip et al., 2006). Although these phages were observed in vitro undergoing the lytic cycle, the presence of these genes suggests the potential for Butyrivibrio phages Arawn and Idris to undergo the temperate lifecycle. The fully annotated genomes of phage Arawn and Idris can be seen in Figure 6.12 and Figure 6.13, respectively.

As with phages Arian and Bo-Finn, a blastn search for the entire genome sequences of phage Arawn and Idris against the NCBI virus genome database did not reveal any significant alignments covering more than 8% of the query genome.

Figure 6.11 Analysis of the Butyrivibrio phage Arawn and Idris genomes. (A) Mauve alignments of the two genomes showing annotated ORFs, with directionality and function according to colour. Nucleotide similarity is expressed as a graph, with green indicating 100 %, yellow >30 %, red <30 %. (B) The organisation of the tail morphogenesis module in phage Arawn. Uncharacterised proteins indicate where the protein was homologous to one reported previously, but had an unknown function, whereas hypothetical proteins did not show similarity above the set thresholds to any known proteins. Phage encoded proteins are those that there was uncertainty about their function.

Figure 6.12 Genome map of Butyrivibrio phage Arawn. GC content is shown in blue and AT in green in the centre. Open reading frames are colour coded based on function, according to the legend. Uncharacterised proteins indicate where the protein was homologous to one reported previously, but had an unknown function, whereas hypothetical proteins did not show similarity above the set thresholds to any known proteins. Phage encoded proteins are those that there was uncertainty about their function.

Figure 6.13 Genome map of Butyrivibrio phage Idris. GC content is shown in blue and AT in green in the centre. Open reading frames are colour coded based on function, according to the legend. Uncharacterised proteins indicate where the protein was homologous to one reported previously, but had an unknown function, whereas hypothetical proteins did not show similarity above the set thresholds to any known proteins. Phage encoded proteins are those that there was uncertainty about their function.

6.4.6. Phage Ceridwen Genome

The longest of the Butyrivibrio phage genomes belongs to phage Ceridwen, which also had the densest genome, with the most predicted ORFs relative to genome length (Table 6.3). Of these ORFs, ~40 % shared similarity with a viral sequence or protein in one of the databases searched. This genome showed similar organisation to the other four phage genomes, with packaging, structural protein and DNA modification related genes mostly found together (Figure 6.14). As with the other genomes, there was evidence of a tail morphogenesis module, formed of a putative tape measure protein (QHJ73746.1), tail protein (QHJ73747.1) and endopeptidase tail protein (QHJ73748.1). Of the predicted ORFs that were homologous to genes in other phages, 20 shared homology to phages of the Siphoviridae family, and one to phages of Podoviridae. Therefore, it is likely that this phage belongs to the Siphoviridae family.

Phage Ceridwen was predicted non-confidently as having a lytic lifecycle by PHACTS, but the presence of an ORF similar to an antirepressor (QHJ73772.1) suggests lysogenic abilities. This antirepressor may act on the gene upstream of it: a gene homologous to the XRE family of transcriptional regulators (QHJ73771.1). The best characterised phage repressors belong to this family (Durante-Rodríguez et al., 2016); they act as the main repressor, mediated by the antirepressor to trigger downstream lysis genes (Argov et al., 2019), which in this case could be the neighbouring putative lysin (QHJ73773.1). Despite this, there is no evidence of excisionase genes, and this phage was only observed to undergo the lytic lifecycle in vitro.

A blastn search for the entire genome sequence of phage Ceridwen against the NCBI virus genome database did not reveal any significant alignments covering more than 1 % of the query genome.

Figure 6.14 Genome map of Butyrivibrio phage Ceridwen. GC content is shown in blue and AT in green in the centre. Open reading frames are colour coded based on function, according to the legend. Uncharacterised proteins indicate where the protein was homologous to one reported previously, but had an unknown function, whereas hypothetical proteins did not show similarity above the set thresholds to any known proteins. Phage encoded proteins are those that there was uncertainty about their function.

6.4.7. Similarities and Differences Between the Butyrivibrio Phages

The five phages in this study were observed in vitro to undergo the lytic lifecycle, infecting the same B. fibrisolvens host strain. The phages likely belong to the Siphoviridae family, as evidenced through genes encoded, genome organisation, size, and similarity to other phages belonging to this family. All five phages possess genomes with sizes greater than 20 Kbp and less than 125 Kbp, and tail morphogenesis modules typical of siphoviruses (Hatfull, 2008; Veesler and Cambillau, 2011).

Despite all targeting the same bacterial host, the GC content of these phages varied, with phages Arawn and Idris having the highest GC content (46.9 %), which does agree with the observation made previously that shorter phage genomes tend to have higher GC contents (Almpanis et al., 2018). There is, however, no linear relationship between GC content and genome length with these five phages (Table 6.3). Interestingly, phages Arawn and Idris were also predicted to be able to undergo the lysogenic cycle, yet the GC content of their genomes is higher than that of the bacterial host, which is 39.7 %. These findings do not follow trends observed previously that phages either maintain a GC content similar to the host, or lower (AT rich), but may instead suggest that the phage genome integrates into an area of the host genome that is more GC rich than average (Almpanis et al., 2018).
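
The GC content comparison above can be sketched in a few lines of Python. This is a minimal illustration, assuming GC content is calculated as the percentage of unambiguous bases that are G or C; the function name is hypothetical, and the only real values used are the 46.9 % and 39.7 % reported in the text.

```python
def gc_content(seq: str) -> float:
    """Percentage of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if (gc + at) else 0.0


# Values reported in the text for phages Arawn and Idris and for the host.
phage_gc = 46.9
host_gc = 39.7
difference = round(phage_gc - host_gc, 1)  # percentage points
print(f"Phage GC content exceeds host GC content by {difference} percentage points")
```

Applied to full genome sequences, `gc_content` would allow the same comparison to be repeated for each phage and for the candidate integration regions of the host genome.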

The genome lengths of around 30-40 Kbp for the Butyrivibrio phages are similar to the 33.5-34.6 Kbp of other lytic rumen phage genomes belonging to Siphoviridae that were sequenced previously (Gilbert et al., 2017), and are in the range of what has been observed from members of the Siphoviridae family generally (Hatfull, 2008). The number of ORFs in these Butyrivibrio phage genomes that share similarity with currently known nucleotide or amino acid sequences in the databases ranges from 40 % to 52 % of the total that were predicted, which is more than the ~25-27 % predicted for phages infecting Bacteroides and Ruminococcus species annotated previously, but not as many as the 54 % of ORFs annotated for the Streptococcus phage (Gilbert et al., 2017). This lack of confidence in assigning functional annotation to all predicted ORFs stems from reliance on sequence similarity matching and poor protein characterisation in phage genomes more generally. This lack of similarity to known sequences therefore is not particularly surprising, but it may suggest that the phages isolated in this study are considerably dissimilar to those isolated in other environments (Hatfull and Hendrix, 2011; Berg Miller et al., 2012). Additionally, phage genomes from the rumen are currently poorly represented in the public databases (Namonyo et al., 2018; Lawrence et al., 2019), limiting the conclusions that can be formed about the rumen phage population.

Butyrivibrio phage Bo-Finn, which was isolated from cow rumen fluid, was identified to be the same species as phage Arian, which was isolated from cow faeces. This suggests that phage populations are present throughout the digestive tract and may be evidence for passage of phages that reside in the rumen through the digestive system and finally into the excretions of the ruminant animal. Whilst it is generally accepted that the faecal microbiome cannot be used to represent the rumen microbiome, being instead more representative of the hindgut microbiota (Noel et al., 2019), sampling of faeces for phages may reflect phage populations in the rumen, although this requires further confirmation.

6.5. Phylogeny Analysis Reveals Wider Context of Butyrivibrio Phages

There were no significant alignments to any other phage genome in the NCBI viral reference database that exceeded 8 % coverage of the query genome, something that is not unusual (Hatfull and Hendrix, 2011), suggesting that these five genomes are novel. However, some predicted genes in the Butyrivibrio phage genomes did show some level of similarity to other phage genes in the databases. Phylogenetic analysis of phage proteomes using ViPTree (Nishimura et al., 2017) revealed that four of the five phages are more closely related to each other than to any other phages (Figure 6.15). The next closest relatives of these four phages are Lactobacillus and Paenibacillus phages. Phage Ceridwen, on the other hand, is in a clade of Clostridium phages. The long branches between the Butyrivibrio phages and their nearest relatives, however, suggest low protein sequence similarity. The closest relatives to all of the Butyrivibrio phages also belong to the Siphoviridae family, offering further evidence that the Butyrivibrio phages belong to this family as well (Figure 6.15).

The genomes of the 13 closest relatives of the five Butyrivibrio phages were analysed using VICTOR to determine whether the Butyrivibrio phages, based on genome BLAST distance, belonged to an existing species, genus, or VICTOR family. It was revealed that Butyrivibrio phages Arian and Bo-Finn belonged to the same species, whilst each of the other Butyrivibrio phages belonged to a unique species, which was not shared with any other phage genomes. Butyrivibrio phages Arawn and Idris belonged to the same genus, which was distinct from that of Arian and Bo-Finn, and from that of Ceridwen. The Butyrivibrio phages did however belong to the same VICTOR family, and none of the Butyrivibrio phages belonged to the same genus or family as any other phage genomes. Phylogenomic analysis, therefore, suggests four species and three genera that are novel and unique (Appendix Figure 10.3 and Appendix Table 10.6).

Figure 6.15 The results of the ViPTree analysis (Nishimura et al., 2017) using a protein distance metric based on normalised tBLASTx scores plotted on a log scale. The tree includes 358 related taxa, with the five Butyrivibrio phages in this study highlighted with red stars and labelled with arrows. Taxa are also annotated with the virus family they belong to (if known) on the inner ring; red for Siphoviridae, lime green for Myoviridae and turquoise for Podoviridae. The phyla the bacterial host belongs to for each taxa are given in the outer ring; Firmicutes in blue and Gammaproteobacteria in green.

6.6. No Evidence of Phage Interactions Found in Host Genome

The reference genome for Butyrivibrio fibrisolvens DSM 3071 did not show an intact prophage but had two regions of incomplete prophages: one with seven predicted phage proteins, of which six were highly similar to other phage proteins, and another with 10, of which six were also highly similar to other phage proteins. None of these predicted prophage proteins shared any similarity to proteins in the Butyrivibrio phage genomes from this study. Furthermore, none of the phage proteins from this study shared similarity above the thresholds to the bacterial host genome. The absence of a prophage similar to the Butyrivibrio phage genomes in this study suggests that these phages are more likely to undergo the lytic lifecycle.

Additionally, the host genome had one cas9 ortholog, and 14 CRISPR arrays, suggesting previous interaction with bacteriophages. However, none of these spacers were similar to any of the Butyrivibrio phages in this study, nor were any of the CRISPR spacers predicted in other microbial genomes in the Hungate1000 collection (Seshadri et al., 2018).

6.7. Conclusions, Implications and Future Work

This study presents five novel phage genomes isolated from rumen-associated samples, and the first phages isolated and sequenced that target Butyrivibrio fibrisolvens. Isolated from ruminant faecal and rumen fluid samples, these five phages represent four novel species from three separate genera and were shown, based on genomic characteristics, to likely belong to the Siphoviridae family. None of the genomes showed significant genome-wide similarity (alignments covering more than 8 % of the query genome) to any previously sequenced phages deposited in public databases. These five genomes double the number of cultured and sequenced phages from rumen-associated samples that are now available.

All of these Butyrivibrio phages were observed undergoing the lytic lifecycle, and further evidence in the genomes of Arian, Bo-Finn and Ceridwen supports this. Phages Arawn and Idris, however, harbour a number of lysogeny-associated genes, suggesting that these phages may be temperate. The lack of evidence of integration of these phages into the bacterial genomes using sequence similarity approaches does, however, call their lysogenic capabilities into question.

The addition of these five rumen-specific phage genomes to the reference databases is a small but important contribution and will help to annotate known cultured isolates in genomic and metagenomic datasets not just from the rumen (Gilbert and Ouwerkerk, 2020b), but also from other environments. There is no reason to believe that B. fibrisolvens would be a more or less efficient or likely phage host than any other bacterium, and the identification of five phages belonging to three different genera active against this single host suggests a highly diverse phage population in the rumen. However, it is likely that an effort on the international scale is required, similar to the strategies of the Hungate collection (Seshadri et al., 2018), to achieve a more representative sample of this diversity.

Phages are important in a wider context and hold the potential for use as mitigation strategies. There are three major factors that need to be understood before phage therapy can be developed and implemented (Gilbert et al., 2015): 1) Characterisation of the target host; this was carried out in chapters 3, 4 and 5. 2) Characterisation of the phage(s); the work described in this chapter begins to address this second factor. 3) Characterisation of the wider microbial context and physical environment where the phage will be utilised. Developing the understanding of these factors has proved fundamental to phage research, and using phages as a therapeutic has been successful in a medical context. With similar early developments made in phage use with livestock, it is hoped that future endeavours will include the use of phage therapy as a method to combat ammonia and methane production by targeting key bacterial species in the rumen. Essential to these developments is the third factor: understanding the wider microbial context and physical environment. In particular, key questions such as “what would happen to the microbiome if lytic phages were added?” need to be addressed. Given the perceived redundancy in the rumen microbial community (Weimer, 2015), and assuming that phages would successfully find and kill their target hosts in this complex community, it is tempting to speculate on what such a manipulation of one component of the rumen microbial ecosystem could realise. While it is possible that removing one component will simply open a niche for another with similar properties, it would be hoped that further research into phage-bacterium interactions will reveal a methodology for mitigation of climate change impacts of ruminants by manipulation of the rumen microbiota.

Chapter 7

7. Using Counter Current Chromatography Applications to Separate Bacteriophages

7.1. Introduction

With a growing interest in bacteriophage research, it is important that a variety of methods are tested to build up the repertoire of ways to isolate, separate and purify bacteriophages to achieve the standard of phage preparation required, as well as evaluate cost, time and yield for each method. The need to characterise the morphology, infection patterns and host range of a bacteriophage can be fulfilled by isolation and examination in vitro. Once characterised, a hurdle often encountered by researchers to bring a phage preparation to therapeutic fruition is to abate concerns regarding the safety of a phage cocktail; requiring a high purity with absence of bacterial toxins (Schmidt, 2019).

7.1.1. Current Methods for Isolation and Purification

In order to achieve this high standard (high purity, concentration and free of toxins and bacterial cells), typical approaches for small scale purification use centrifugation and filtration to remove any whole bacterial cells from lysates, then removal of bacterial debris and toxins using polyethylene glycol precipitation and caesium chloride (CsCl) or sucrose gradient ultracentrifugation. Although the resulting phage preparations are pure, yields are often low and the method long and time consuming, and achieving the highest and purest yield also requires a trained hand (Gill and Hyman, 2010). Additionally the viscosity of the separation media, which may be caesium chloride or sucrose, combined with the high forces of the ultracentrifugation often leads to a decrease in phage infectivity (Eskelin et al., 2016).

A scalable method applicable for phage purification from large sample sizes is the recent use of anion-exchange chromatography. It is presented as an easier approach compared to CsCl gradients, but only once the elution profile of the phage of interest has been optimised (Adriaenssens et al., 2012; Vandenheuvel et al., 2018). Recovery yield in the purest elution fraction from this technique varied from 55 % up to 99.9 % depending on the phage of interest, but no measure of purity was recorded (Adriaenssens et al., 2012). Given that samples applied to the column were phage lysates from lysed bacterial cultures, complete purification would require the removal of all non-phage particles, including residual agar or growth medium substrates.

Yet even before potential therapeutic phage preparations need to be purified, phages still need to be isolated and concentrated, and methods to process large volumes are useful, especially for industrial applications or for sources where phages are sparse or in lower concentrations. Tangential flow filtration is a common technique used to concentrate viruses from large volumes of water samples, removing bacterial cells and debris particles larger than 0.22 μm in size (Wommack et al., 2010). This allows for large volumes with low concentrations of phages to be processed quickly to concentrate the phages into a lower volume for downstream or further processing or testing, including metagenomic sequencing or isolating individual phages against bacterial hosts of interest. However, tangential flow filtration relies on membranes, and variables such as pore size and material have been shown to affect the efficiency of this method (Cai et al., 2015).

7.1.2. Counter Current Chromatography

Counter current chromatography (CCC) is the term often applied to any biphasic immiscible liquid-liquid partitioning technique, with one phase acting as the mobile phase and the other as the stationary phase. CCC has successfully been utilised to separate a range of natural and synthetic target compounds, such as plant extracts and medicinal products (Berthod and Faure, 2015). CCC can be split into two main types: hydrostatic and hydrodynamic. Hydrostatic CCC uses a column with chambers, with centrifugal forces keeping the stationary phase constant, and the mobile phase mixing in those chambers. This is also called centrifugal partition chromatography. Hydrodynamic CCC columns vary in design, but mostly they utilise the planetary motion of one drum, with column coils that are also spinning (Berthod and Faure, 2015) (Figure 7.1 and Figure 7.2). Each technique has its uses, advantages and disadvantages, which have been reviewed in more detail previously (Berthod et al., 2009). The application of CCC for studying bacteriophages in any capacity has not, to our knowledge, been reported in any literature previously.

Figure 7.1 “Type-J planetary motion of a multilayer coil separation column. The column holder rotates about its own axis and revolves around the centrifuge axis at the same angular velocity (ω) in the same direction. This planetary motion prevents twisting the bundle of flow tubes allowing continuous elution through a rotating column without the risk of leakage and contamination.” Figure and caption from (Ito, 2005)

Figure 7.2 Schematic diagram of the instrumental set up. The green highlights the route of the lower phase in the reverse phase through the semi-prep column. In this study, only the mobile phase was pumped into the semi-prep column (no upper or lower phases were used), and both reverse phase (RP) and normal phase (NP) were used. In later runs, the phage sample was injected inline after the pump, bypassing the valve box and normal sample injection port, as shown by the red arrow.

Another common separation technique is field-flow fractionation (FFF). When an external field is applied perpendicular to a column, compounds that are flowing through the column in a mobile phase interact with this external field. A variety of external fields can be applied, such as thermal, magnetic and centrifugal. The latter method is similar to CCC and is called sedimentation FFF (sdFFF), or more recently centrifugal FFF (CFFF) (Contado, 2017), which uses centrifugal forces to elicit separation through differential acceleration (Williams et al., 2011). sdFFF can effectively separate particles based on size, volume, mass and density with respect to the mobile phase. It has the power and resolution to separate particles from 1 nm to ~50 μm in size, effectively separating nano and microparticles such as dust, varying sizes of silica beads, sand and soil particles, achieved by simply changing the flow rate (Figure 7.3) (Fedotov et al., 2015), offering an intriguing opportunity for virus and bacteriophage separation. Studies also showed that T4D phage retained infectivity after being exposed to the forces of sdFFF (Caldwell et al., 1981), and the separation power of sdFFF was shown when phages T4, T7 and the tobacco mosaic virus were separated from T2 (Giddings et al., 1975). These early studies indicate that phages can be separated based primarily on size. Similarly, asymmetrical flow field-flow fractionation (AF4) also separates based on size, ranging from 2 nm to 1 μm, and when coupled to light-scattering detectors, can be used successfully to study viruses and virus-like particles (Eskelin et al., 2019). In particular, AF4 was used to study bacteriophage PRD1, revealing that recovery of the phage from the column ranged from 40 to 100 %, but greater yields were obtained at a lower virus concentration through increased sample dilution, which required an additional step of anion exchange chromatography to concentrate the phage samples after elution (Eskelin et al., 2016). It was also found that more phages retained infectivity when the samples were crude and less purified, for example bacterial lysates (Eskelin et al., 2016).

sdFFF is similar to CCC in that both utilise centrifugal forces: sdFFF for separation directly, and CCC for holding the stationary phase in place. These two techniques share advantages over other chromatography techniques. Samples need minimal preparation before application to the sdFFF or CCC column; for example, samples do not need to be filtered, and more concentrated samples can be applied before overloading occurs than is possible with other methods such as size-exclusion chromatography (Contado, 2017), making sdFFF and CCC quicker and more efficient. Much like AF4, sdFFF and CCC are rapid and gentle separation methods, with AF4 and sdFFF having been shown previously to preserve viral particle infectivity (Caldwell et al., 1981; Eskelin et al., 2016), something that has not yet been reported using CCC techniques. A disadvantage of AF4 and other FFF methods is their reliance on an ultrafiltration membrane, which is not only another parameter that requires optimisation, but can also affect the yield and recovery (Eskelin et al., 2016); CCC and sdFFF do not use such a membrane. CCC instead uses biphasic liquid-liquid separation, where both the mobile phase and stationary phase are liquid, which means that the properties of a wide variety of different immiscible solvents can be taken advantage of (Berthod, 2007). Also because of this, both phases can be recovered from the column, avoiding the often-irreversible interactions between targets and stationary phases seen in other chromatography methods such as membrane based AF4. Additionally, because of the liquid phases, overloading the column is less of an issue than in other methods, and scaling up is possible and relatively straightforward (Berthod, 2007).

7.2. Aims and Objectives

The aim of this research was to apply bacteriophages to a high performance CCC (HPCCC) machine, with the ultimate aim of determining the separation ability of the CCC technique for phages. This preliminary work would not utilise the HPCCC to its full potential, avoiding the use of a biphasic system in the first instance and instead using the centrifugal forces exerted by the CCC and the flow rate for separation, much in the same way as in sdFFF techniques. Therefore, the primary property that would separate phages would be size, and so, to maximise the probability of achieving clear separation, two well studied coliphages with different sizes and morphologies were chosen, as can be seen in Figure 7.4: ϕX174 is a microvirus with a capsid diameter of 25 nm, spike proteins and no tail (Yazaki, 1981), which is smaller than the T4 phage, with a 111 nm long and 78 nm wide elongated icosahedral head, and an 18 nm wide and 113 nm long contractile tail (Ackermann and Krisch, 1997). These phages were used to model bacteriophage behaviour in the CCC process.


Figure 7.3 Graphs showing fractionation of different nano and microparticles (Fedotov et al., 2015). (A) shows elution peaks for silica standards of three sizes; 150nm, 290nm and 900nm, which concur with the change in flow rate. (B) shows elution peaks for urban street dust particles of a heterogenous mixture of particles within a range of sizes.

Figure 7.4 Previously published electron micrographs of ϕX174 and T4 bacteriophages to illustrate size and shape. Left - ϕX174 phage stained using a uranyl-acetate based ss method (Yazaki, 1981). Right - T4 phage stained with sodium molybdate at 333,000 x magnification (Bradley, 1963).

7.3. Experimental Procedures

Bacteriophages and their respective Escherichia coli bacterial hosts were sourced and propagated as described in sections 2.12.1 and 2.12.2. Using a plaque assay (as described in section 2.12.2) with a serial dilution, the phage concentration for T4 phage sample was determined to be ~6 x 10^9 PFU/ml, and ϕX174 at ~1 x 10^10 PFU/ml. The different experiments are described below and summarised in Figure 7.5.
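
As an illustration of how a titre of this magnitude is typically derived from a plaque assay, the following minimal Python sketch applies the standard relationship between plaque count, dilution and plated volume. The plaque count, dilution and plated volume used here are hypothetical values chosen only to reproduce a titre of the reported order; they are not data from this study, and the function name is an assumption.

```python
import math


def titre_pfu_per_ml(plaques: int, dilution: float, volume_plated_ml: float) -> float:
    """Titre (PFU/ml) = plaques counted / (dilution x volume plated in ml)."""
    if dilution <= 0 or volume_plated_ml <= 0:
        raise ValueError("dilution and plated volume must be positive")
    return plaques / (dilution * volume_plated_ml)


# Hypothetical example: 60 plaques counted on the 10^-7 dilution plate,
# with 0.1 ml of that dilution plated.
example_titre = titre_pfu_per_ml(plaques=60, dilution=1e-7, volume_plated_ml=0.1)
print(f"{example_titre:.1e} PFU/ml")
```

Under these assumed values the calculation gives a titre of the same order as that reported for the T4 sample.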

Firstly, 1 ml of T4 phage sample was applied to the semi-preparative column, using a flow rate of 6 ml/min in reverse phase, and fractions were collected every two minutes (Section 2.12). The elution profile was monitored using a UV detector, plotting a chromatogram of absorbance over time. The run was stopped once the absorbance had formed an elution peak and plateaued, which occurred shortly after one column volume was eluted (around 25 minutes). The column was then flushed with mobile phase, then this experiment was repeated for the ϕX174 phage. Aliquots from fractions either side and from the centre of the elution peak were tested for the presence of phage. A spot test was carried out in duplicate for both phages, as per the methods described in section 2.12.2.
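
The relationship between flow rate, collection interval and eluted volume in this first run can be sketched as follows. This is a minimal illustration that assumes fraction collection starts at the moment of sample application and that the flow rate is constant; the function names are hypothetical.

```python
def fraction_volume_ml(flow_rate_ml_min: float, interval_min: float) -> float:
    """Volume collected in each fraction at a constant flow rate."""
    return flow_rate_ml_min * interval_min


def eluted_volume_ml(flow_rate_ml_min: float, elapsed_min: float) -> float:
    """Total volume eluted after a given run time."""
    return flow_rate_ml_min * elapsed_min


def fraction_number(elapsed_min: float, interval_min: float) -> int:
    """1-based number of the fraction being collected at a given run time."""
    return int(elapsed_min // interval_min) + 1


# First run: 6 ml/min, fractions every two minutes, peak at around 25 minutes.
per_fraction = fraction_volume_ml(6, 2)
eluted_at_peak = eluted_volume_ml(6, 25)
peak_fraction = fraction_number(25, 2)
print(per_fraction, eluted_at_peak, peak_fraction)
```

Under these assumptions each fraction holds 12 ml, and roughly 150 ml has passed through the column by the time the elution peak is observed.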

To determine whether any resulting areas of lysis were caused by phage activity, a 5 μl loop was used to take subsamples from the centre of areas of lysis in three of the fractions each for both T4 and ϕX174. These subsamples were resuspended in a small volume (<200 μl) of Fortier phage buffer (FB) (2.11.4), and tested again using a spot test (section 2.12.2). The concentration of phages in the resulting fractions was determined using a concentration gradient spot test, performed by serially diluting the fractions using a tenfold dilution, and spotting samples of the 10^-3 to 10^-8 dilutions as described previously (section 2.12.2) onto a large square plate.

A second run consisted of three applications of 0.5 ml of the T4 preparation only to the column: the first (A) with a flow rate of 3 ml/min in reverse phase and fractions collected every minute, the second (B) the same as the first but in normal phase, and the third (C) with a flow rate of 1.5 ml/min in reverse phase and fractions collected every two minutes. One in every 10 fractions was tested before the elution peak, then from shortly before the peak every fraction was tested. For the first application (A), fractions 3, 13, 23 and 33 were tested, then fraction 34 and each fraction thereafter until fraction 50. The second application (B) was similar, but every fraction from fraction 36 to 54 was tested. Testing of fractions in the third application (C) was intended to be the same as in the first (A), but due to a set-up error, the fraction collector was halted before the elution peak was seen after 82 minutes. Fractions were therefore collected before the peak and after the peak only and tested to determine suitability of the method.
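The choice of which fractions to test follows from where one column volume is expected to elute. The sketch below is a rough planning aid under stated assumptions: the column volume of ~135 ml is the approximate figure quoted later in this chapter, fraction collection is assumed to start at time zero with no dead volume, and the helper names `elution_time_min` and `fraction_number` are hypothetical.

```python
import math

# Approximate semi-preparative column volume quoted in this chapter (assumption: exact value unknown).
COLUMN_VOLUME_ML = 135.0


def elution_time_min(flow_rate_ml_min, column_volume_ml=COLUMN_VOLUME_ML):
    """Time in minutes for one column volume of mobile phase to pass through the column."""
    if flow_rate_ml_min <= 0:
        raise ValueError("flow rate must be positive")
    return column_volume_ml / flow_rate_ml_min


def fraction_number(time_min, interval_min):
    """1-based fraction being filled at time_min.

    Assumes fraction n covers the interval ((n - 1) * interval, n * interval].
    """
    if time_min < 0 or interval_min <= 0:
        raise ValueError("time must be non-negative and interval positive")
    return max(1, math.ceil(time_min / interval_min))


# Conditions of the three applications in run 2: (label, flow rate ml/min, collection interval min).
for label, flow, interval in (("A", 3.0, 1.0), ("B", 3.0, 1.0), ("C", 1.5, 2.0)):
    t = elution_time_min(flow)
    print(f"Run 2{label}: one column volume after ~{t:.0f} min, "
          f"around fraction {fraction_number(t, interval)}")
```

Under these assumptions, applications A and B reach one column volume at roughly fraction 45, which sits inside the ranges of consecutive fractions tested (34 to 50 and 36 to 54).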

In run three, NaOH (0.5 %) was added to the column until the output was alkaline (as determined by litmus paper, just over one column volume) to clean the column and inactivate residual phages, and left for >2 hours, before flushing with mobile phase (distilled water). Once the output was at neutral pH, 0.1 ml of T4 phage sample was added to the column, using water as the mobile phase at a flow rate of 3 ml/min in reverse phase. The run was stopped after the elution peak, once the chromatogram had plateaued. To track the movement of T4 phage through the column, samples were taken from the mobile phase input and from the column output before the phage sample was added, then fractions were collected every 2 minutes. Two further samples were also collected from the column flush once the rotation was off. A spot test on square plates was carried out on one in every three fractions up until vial 19, from which point each subsequent vial was tested, as well as on the control samples collected.

7.4. Results

7.4.1. Bacteriophages Retain Infectivity After the CCC Process

The first step to utilising the CCC for phage separation is to evaluate the effects the forces of the process have on the biological activity and viability of the bacteriophages. Both T4 and ϕX174 phages survived the process and could successfully infect their hosts using a plaque assay. It is difficult to assess yield or loss of phage virions, as knowing the absolute number of phage particles applied is inherently difficult and can only be evaluated using a plaque assay to give an idea of plaque forming units (PFU). Virions that were already inactive or broken down cannot be easily determined, nor can those that were inactivated or were adversely affected by the CCC process.


Phage preparations made (section 2.12.1).

Run 1: 1 ml of sample and 6 ml/min, both T4 and ϕX174 were applied. Aim – determine phage survivability. [Figures 7.6-7.8 and table 7.1 for T4. Figures 7.10-7.12 and table 7.2 for ϕX174.]

Subsamples from areas of lysis created by three different fractions from run 1 were tested for phage propagation (section 2.12.2). Aim – confirm phage presence. [Figure 7.9 and 7.13.]

Run 2: (A) 0.5 ml of T4 preparation only, 3 ml/min flow rate, reverse phase. (B) Same as (A) but in normal phase. (C) Same as (A) but 1.5 ml/min flow rate. Aim – test alternative conditions to obtain different elution patterns. 3 ml fractions were collected and some from before the peak and all during were tested for phage presence. [Figure 7.15]

Run 3: Pre-treated the column with NaOH (0.5 %). 0.1 ml T4 phage sample applied, 3 ml/min flow rate in reverse phase with fractions collected every two minutes. Aim – determine if 0.5 % NaOH is a suitable solvent for phage removal from column. [Figures 7.16, 7.17]

Figure 7.5 Flowchart of processes detailing the runs, the conditions, and the aims. The figures these results produced are in italics on the right.

7.4.1.1. T4

T4 successfully retained infectivity after application to the CCC. A volume of 1 ml of sample with an initial concentration of ~6 x 10⁹ PFU/ml of phage in FB was applied to the CCC with a flow rate of 6 ml/min. This resulted in a peak visible with all four of the UV channels at around 20 minutes (Figure 7.6), which is the expected time for the solvent front, or in this case the elution of one column volume of mobile phase (dH2O). The chromatogram did not suggest retention of the phages on the column, as they were eluted after one column volume; however, a spot test with concentration gradients revealed that the phages did not elute into one fraction. Instead, concentrations of phages increased and decreased over the fractions, reflecting the peak seen in the UV detection. All fractions taken contained phage, as shown by the spot assay in Figure 7.7, yet not enough fractions were tested to determine where phage elution began and ended. The positive control spot showed that the T4 sample injected into the CCC was indeed active, and the negative control spots of the buffer and water showed no phage activity and had no effect on the bacterial lawn growth. Fraction 13 contained the highest concentration of phages, as shown by the spot assay with a concentration gradient for each of the fractions (Figure 7.8), which corresponded to the centre of the peak in the chromatogram. The concentration gradient of the stock T4 sample that was applied to the column revealed roughly the same concentration as previously shown with a plaque assay, with plaques visible at 10⁻⁸, which, when accounting for dilution and sampling volumes, equates to a stock phage solution of ~10¹⁰ PFU/ml.

The areas of lysis were confirmed to be caused by phage by observing lysis patterns from propagated samples. The subsamples taken from areas of lysis in fraction numbers 9, 12 and 17 on the initial spot test again produced areas of lysis in another round of spot testing (Figure 7.9).

[Chromatograms, two panels: voltage (V) against time (min), 0 to 40 min, with traces for HPCCC speed (rpm), HPCCC temperature (C), Colibrick channels 3 and 4, and UV channels A to D. The lower panel overlays collected vials 1 to 48.]

Figure 7.6 Chromatograms from T4, 6 ml/min, reverse phase. Both chromatograms show the UV absorbance plotted over time, as well as the HPCCC speed and temperature. The bottom chromatogram also overlays the fractions collected in red, with the blue rectangle showing the fractions around the peak that were tested. Channels A = 210 nm, B = 254 nm, C = 280 nm, D = 366 nm.

Figure 7.7 Fractions of 1 ml T4 sample applied to CCC in reverse phase at 6 ml/min. Areas of lysis visible on a lawn of E. coli DSM613 for fractions as labelled. The T4 sample before application is also tested, alongside FB and distilled water (dH2O).

[Plate labels: first plate, fractions 9 to 13 and FB; second plate, fractions 14 to 17, T4 and FB; dilution factors 10⁻⁸ to 10⁻⁴ along the bottom of each plate.]

Figure 7.8 Concentration gradients of the fractions of 1 ml T4 sample applied to CCC at 6 ml/min. Areas of lysis in the form of whole spots, confluent spots or single plaques visible on a bacterial lawn of E. coli DSM613. Labels of the fraction are in black on the left of the plates, and dilution factor in white along the bottom. FB is applied to the bottom row on both plates as a control.

Table 7.1 Tabular format of the results of the T4 spot test with dilutions of fractions, as seen in Figure 7.8.

Dilution factor (to the power of 10): -3, -4, -5, -6, -7, -8

Sample: results, in column order
9: >20, 7, 1
10: Confluent, >20, 2
11: Confluent, >20, ~16, 1
12: Confluent, Confluent, >20, 1
13: Confluent, Confluent, >20, 4
14: Confluent, >20, 10
15: Confluent, >20, 6, 3
16: Confluent, >20, 6
17: >20, 6, 1
T4: Whole spot, Whole spot, Whole spot, Confluent, Semi-confluent, 10


Figure 7.9 Spot assay of propagated T4 samples from fractions. Three fractions were subsampled and tested for phage to check subsequent lysis, indicative of phage activity. FB – Fortier buffer.

7.4.1.2. ϕX174

Phage ϕX174 also successfully retained infectivity after application to the CCC column. The same parameters and sample volume as for T4 were used, with an initial ϕX174 concentration of ~4 x 10⁹ PFU/ml in FB. A peak was also seen in all four UV channels at around 21 minutes (Figure 7.10), which corresponds to the solvent front or one column volume. The chromatogram suggested that there was no retention of the phages on the column, as the peak was seen after one column volume. However, as with T4, the column did have an effect on the ϕX174 phage, in that the concentrations of the phages reflected the chromatogram peak and the phages did not elute as one bolus. All fractions contained bacteriophages, as seen by the presence of areas of lysis in the spot test in Figure 7.11; the positive control of the ϕX174 sample showed lysis, and the negative controls showed no lysis and had no effect on the bacterial lawn. As with T4, not enough fractions were tested to capture the start and end of the elution of the phage sample from the column. The fraction that contained the highest concentration of phage particles was fraction number 22 (Figure 7.12), with one plaque present in the 10⁻⁷ dilution. Due to the nature of the phage infection pattern, ϕX174 plaques tend to be large, leading to overlap of plaques or phage activity in some samples, but this does not obscure the overall pattern of increasing then decreasing concentrations of phages in the fractions. No areas of lysis or plaques were seen for fraction number 26 on the dilution gradient, but given the area of lysis present on the initial spot test for this fraction, this would suggest a concentration of ≤10³ PFU/ml. The concentration gradient of the phage solution revealed that the initial concentration was ~10¹⁰ PFU/ml, when accounting for dilutions.

As with the T4 samples, the areas of lysis were indeed caused by phage, shown by observing areas of lysis in samples propagated from fractions 19, 22 and 26, which resulted in further areas of lysis in a subsequent spot assay (Figure 7.13).

7.4.2. Phages were not detected with UV

In this instrumental set-up, an in-line UV detector was used. Both T4 and ϕX174 eluted consistently with the peaks observed using UV detection; however, it is unclear whether this absorbance peak is due to phages or to impurities or other substances in the eluates. Peaks may also be due to contaminants remaining in the column, but adequate flushing was done before use to avoid this. To determine whether the phage samples would be detected using the in-line UV detector channels, a spectrophotometric wave scan was done on a series of samples. This revealed that the phages had very little visible influence on the peaks seen. The water mobile phase inherently absorbs, although very little, at ~200-250 nm (Figure 7.14A). The buffer causes a slight increase in the absorbance in the same regions (Figure 7.14B). For both T4 and ϕX174, there was very little detectable difference between the spectra created by the phage eluate compared to the bacterial eluate (Figure 7.14C-F). A plaque assay was therefore imperative to determine the location of the phages in the fractions following elution.

[Chromatograms, two panels: voltage (V) against time (min), 0 to 25 min, with traces for HPCCC speed (rpm), HPCCC temperature (C), Colibrick channels 3 and 4, and UV channels A to D. The lower panel overlays collected vials 1 to 27.]

Figure 7.10 Chromatograms from ϕX174, 6 ml/min, reverse phase. Both chromatograms show the UV absorbance plotted over time for four wavelengths, as well as the HPCCC speed and temperature. The bottom chromatogram also overlays the fractions collected, with the blue rectangle showing the fractions around the peak that were tested.

Figure 7.11 Fractions of 1 ml ϕX174 sample applied to CCC in reverse phase at 6 ml/min. Areas of lysis visible on a lawn of E. coli DSM13127 for fractions as labelled. The ϕX174 sample before application is also tested, alongside Fortier buffer (FB) and distilled water (dH2O).


Figure 7.12 Concentration gradients of the fractions of 1 ml ϕX174 sample applied to CCC in reverse phase at 6 ml/min. Areas of lysis in the form of whole spots, confluent spots or single plaques visible on a bacterial lawn of E. coli DSM13127. Labels of the fraction are in black on the left of the plates, and dilution factor in white along the bottom. Sample 20 was spotted incorrectly (human error) and is labelled to correct for this. Fortier Buffer (FB) is applied to the bottom row on both plates as a control.

Table 7.2 Tabular format of the results of the ϕX174 spot test with dilutions of fractions, as seen in Figure 7.12.

Dilution factor (to the power of 10): -3, -4, -5, -6, -7, -8

Fraction: results, in column order
19: Whole spot, ~3, 1
20: Whole spot, Whole spot, 3, 2
21: Whole spot, Whole spot, Whole spot, 4
22: Whole spot, Whole spot, Whole spot, 1 (large), 1
23: Whole spot, Whole spot, ~8, ~4
24: Whole spot, 3, 2
25: 2, 3
26:
ϕX174:


Figure 7.13 Spot assay of propagated ϕX174 samples from fractions. Three fractions were subsampled from the initial spot test plate and tested for phage to check subsequent lysis, indicative of phage activity. FB – Fortier buffer, FB-C – contaminated FB spot, experimenter error.

Figure 7.14 Using wave scans to determine whether phages cause peaks in absorbance. A and B are the spectra for water and Fortier Buffer, blanked against water. C-F are the spectra for bacterial eluates and the phage eluates blanked against the buffer.

7.4.3. Bacteriophages Remain in the Column but are Inactivated by Sodium Hydroxide

It was observed that bacteriophages could remain in the column despite washing with mobile phase. In run 2, despite washing the column with >2 column volumes of water before starting and in between runs, all fractions tested from each of the three applications were shown with a spot test to contain phages (Figure 7.15 A-C), as seen by areas of lysis and plaques in all spots. This indicates the presence of remnant phages from a previous run, as even the fractions collected before one column volume had been eluted showed phage activity. The corresponding chromatograms are not shown, as UV detection had been found inadequate for phage detection.

Leaving the column in a solution of 0.5 % sodium hydroxide (NaOH) for 1 – 2 hours at the end of a previous run was found to deactivate remnant phages. The NaOH was flushed from the column with the mobile phase before a 0.1 ml sample of T4 phage was added. At 3 ml/min, a peak was seen at around 43 minutes, as expected (Figure 7.16). No lysis was seen in the spot tests on samples taken from the mobile phase after flushing but before a subsequent phage sample was applied, or in the early fractions before one column volume had eluted. There was no detectable phage activity in the mobile phase before it entered the column, or after it had left the column, which was treated with NaOH (Figure 7.17). No phages were detected until fraction number 16, where >30 plaques were counted. From vial 19 through to vial 25, the spots showed complete lysis, with confluent lysis visible in vials 26 and 27. A sample of the mobile phase from the flush once the procedure was stopped still revealed >20 visible plaques. This was reduced to <10 plaques after ~100 ml of mobile phase was flushed.

Figure 7.15 Spot tests on square plates with 10 μl from fractions produced in run 2 (A-C) spotted onto soft agar overlays. The bacterial host used was E. coli B (DSM613) and the phage loaded on the column was a T4 preparation. The first plate (A) was the result of a run using 3 ml/min flow rate, reverse phase and 0.5 ml of sample, the second plate (B) the same conditions except with normal phase, and the third plate (C) reverse phase with a flow rate of 1.5 ml/min.

[Chromatograms, two panels, with traces for HPCCC speed (rpm), HPCCC temperature (°C), Colibrick channels 2 and 3, and UV at 210 nm, 245 nm, 280 nm and 366 nm. Asterisks on the lower panel mark the fractions tested.]

Figure 7.16 Chromatogram from 0.1 ml T4 at 3 ml/min in reverse phase after sodium hydroxide treatment. Both chromatograms show the UV absorbance plotted over time at four wavelengths, as well as the HPCCC speed and temperature. The bottom chromatogram also overlays the fractions collected, with asterisks above and the blue rectangle around the peak showing those fractions that were tested. Note that the line for 210 nm is not visible in this second chromatogram, but the peak is still apparent.

[Plate labels, by row: OUT, IN, F2, F1; fractions 13, 10, 7, 4, 1; fractions 22, 21, 20, 19, 16; fractions 27, 26, 25, 24, 23; FB, FB, T4.]

Figure 7.17 Spot test of fractions of 0.1 ml T4 at 3 ml/min. The numbers correspond to 10 μl of that fraction spotted onto the bacterial host E. coli DSM613. IN – a sample of the mobile phase (water) entering the column, OUT – a sample of the mobile phase from the column after NaOH treatment. F1 – column flushed after the revolutions were stopped. F2 – same as F1, but after ~100 ml had been flushed. FB – Fortier Buffer. T4 – the initial T4 sample applied to the column.

7.5. Discussion

Given the proteinaceous nature of the common virus structure, particularly bacteriophages, it stands to reason that implementing protein chromatography methods to separate or isolate whole phages directly should be possible. Despite the size difference and more complicated morphology of phages compared to a protein solution, it is expected that phages would act like proteins in chromatography systems (Oślizło et al., 2011). Therefore, to study the application of field-flow fractionation in a countercurrent chromatography setup is a worthy and novel endeavour.

As with the sdFFF experiments conducted previously, it is important to first establish whether the phages would survive the process (Caldwell et al., 1981). Ideally the phages should remain not only intact but also viable and biologically active, and this activity may be affected by the different shear forces exerted during the process. No such loss of activity was observed, and both T4 (Figure 7.7) and ϕX174 (Figure 7.11) phages could be applied to the column with no adverse effects. However, the CCC forces did have an effect on the elution profile of the phage samples. Instead of eluting in one bolus (as is injected), there was an increase then decrease in concentration as the elution progressed, which matched the peak seen in the UV detection. There was no retention of the phages on the column that could be eluted with the mobile phase, as phages were eluted after one column volume. Although yield could not be accurately determined using these methods of detection, when a concentration gradient of each of the initial phage samples was spotted onto the respective host and compared to the fractions, there was no significant loss of phages in the resulting fractions. Had a portion of the phages become deactivated or otherwise lost in the column, the concentration of phages in the fractions would be expected to be much lower than it was. This is an advantage of the CCC process, since (in this case) the mobile phase can be fully eluted and all of the sample recovered (Sutherland, 2007), assuming that there is no interaction between phage particles and the PTFE tubing, which has not been ruled out.

One of the primary difficulties that this technique will need to overcome is finding an in-line detector that can reliably detect phage particles as they elute. Currently, UV detection across the different wavelength channels does not detect the bacteriophages themselves, as shown by the various comparative spectra (Figure 7.14). It was noticeable that the bacterial and phage eluates both had a slight yellow colour, a dilute shade of the corresponding agar they were eluted from. This is not surprising and indicates that elution with FB causes components of the agar to be dissolved, along with the phages and bacterial cells and debris. It is suggested that it is this mixture of remnants that absorbs at different wavelengths, causing the peaks seen. As these are present in both the bacterial and phage eluate, and the only difference between the two samples is the presence of phage, any differences in the spectra are likely to be due to phage presence. However, as can be seen, there are no obvious differences in the peaks or absorbance levels between the two samples for both T4 and ϕX174. Whilst the use of UV detection is also implemented and recommended in the anion-exchange chromatography method for purifying bacteriophages (Vandenheuvel et al., 2018), there were difficulties in detecting T4, T7 and tobacco mosaic virus elutions from sdFFF using UV detectors, because peaks from the virus particles could not easily be distinguished from background noise (Giddings et al., 1975). There are alternative in-line detectors, for example mass spectrometry or flame ionisation detectors, but these are destructive techniques, and require a subsample to be taken. Whilst in-line automated detection systems are preferred, other methods of detection after elution can be used. For example, in the current study, spot assays were used to confirm the presence and activity of phages, spotting fractions onto a lawn of the corresponding E. coli host. Compared to utilising spectroscopy and in-line detection, this method requires more effort, skill and patience, with results available only after >6 hours. As long as one active phage is present in 10 μl, this should be detectable, and therefore theoretically the average lowest level of detection for the spot test is 10² PFU/ml. Yet, once a suitable fraction and elution time has been identified, the number of fractions tested and eventually the target fraction can be estimated fairly reliably.

Because the method of detection implemented employs the lytic action of the phage to visualise activity on a bacterial host lawn, removing remnant phage particles from previous runs is imperative to avoid false positives. Flushing the column with water or even 50 % methanol between runs is not adequate, as shown by the presence of phages in fractions earlier than the solvent front (Figure 7.15). Instead, a bolus of 0.1 M sodium hydroxide should be used to flush the column in between runs to ensure remaining phages are inactivated. SdFFF has often been implemented in the separation of biological cells, including human and bacterial cells (Battu et al., 2001). Optimisations need to be made to ensure not only cell viability, but also high recovery and elution of otherwise sterile fractions. “Channel poisoning” can occur when there are interactions between the particles and the column surfaces, which often arise as the cell suspensions are not pure, similar to the phage samples in this study. This leads to a drop in yield, uncharacteristic elution profiles and reduction of cell viability (Battu et al., 2001).

7.6. Conclusions and Future Work

Despite the promise that this novel application of countercurrent chromatography to bacteriophage separation holds, this study has so far concluded only that utilising the CCC in this way showed similar capability to sdFFF techniques; further work is required to demonstrate the full potential of hydrodynamic CCC, which holds a number of advantages as a separation and purification technique. This study has established that phages can indeed survive the process, shown here for both the large tailed T4 coliphage and the small microvirus ϕX174. This is already an important first step towards carving a new methodology using CCC. In addition, it is imperative that the columns are adequately flushed and that phages are deactivated by pushing a bolus of sodium hydroxide through the column after use. The next steps for this work are firstly to try to decrease the flow rate substantially to determine whether there is any retention of the phages on the column. If phages can be retained on the column, other impurities can be eluted and removed from the column, then a change in parameters (such as an increase in flow rate) can elute the phages into a separate fraction. With a semi-preparative column volume of ~135 ml and a flow rate of 0.1 ml/min, this would take 1350 minutes, or 22.5 hours. Once retention is established, the flow rate should be slowly increased to determine at what point each phage would elute, to see if separation could be achieved. It is unclear how much resolution there will be for the separation of phages that do not differ drastically in size, but using T4 and ϕX174 initially would give an idea of feasibility. Another avenue to consider would be the use of a biphasic system, such as a polymer-salt system or micelles in an aqueous two-phase system, which has been successfully implemented on the filamentous phage M13 (Jue et al., 2014).
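The run-time estimate above can be tabulated for a range of candidate flow rates when planning such experiments. This is a minimal sketch: the function name `run_time_hours` is hypothetical, the ~135 ml column volume is the approximate figure given in the text, and the intermediate flow rate of 0.5 ml/min is an illustrative value rather than a condition tested in this study.

```python
def run_time_hours(flow_rate_ml_min, column_volume_ml=135.0, column_volumes=1.0):
    """Hours needed to pump the given number of column volumes at a constant flow rate."""
    if flow_rate_ml_min <= 0 or column_volume_ml <= 0 or column_volumes <= 0:
        raise ValueError("flow rate, column volume and number of column volumes must be positive")
    minutes = column_volumes * column_volume_ml / flow_rate_ml_min
    return minutes / 60.0


# Candidate flow rates in ml/min; 0.5 is a hypothetical intermediate value.
for rate in (0.1, 0.5, 1.5, 3.0, 6.0):
    print(f"{rate:>4} ml/min: {run_time_hours(rate):6.2f} h per column volume")
```

A table of this kind makes the trade-off explicit: each step down in flow rate that might improve retention also lengthens the run, which matters if fractions are to be collected and spot tested on the same day.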


8. General Discussion and Conclusions

The rumen microbiome is a complex community of microorganisms, contributing to breakdown of the fibrous plant matter that the ruminant host ingests. With the recent development of genome sequencing, further findings have come to light that develop our understanding of these microbial populations and their functions, as well as of diversity in the rumen and across different ruminant hosts. It is now understood that there is a core rumen microbiome, comprising some of the key bacterial genera and methanogenic species (Henderson et al., 2015). Furthermore, a metagenomic study of the viral populations in the rumen of steers fed different diets revealed that abundant populations of phages were found across all samples, indicating a core virome (Anderson et al., 2017). One of the key hurdles that metagenomic and metatranscriptomic data analyses face is the availability of rumen-specific data for comparative analysis. Contributing high quality data to databases to help tackle this issue is a running theme within these chapters.

Whilst the genomes for some key hyper ammonia producing bacterial species are available, there are likely to be more, as yet untested for this trait in vitro, and identifying more of these species in metagenomic datasets would be made possible if a genetic signature was identified that is unique to this community. A method to determine this signature was established in Chapter 3, where model bacterial species were identified which satisfied the following requirements: culturable in vitro, genome available (or made available through this study), and an ammonia production phenotype that was measurable, confirmed, and fell into one of three categories: hyper ammonia producer (HAP), some ammonia producer (SAP) or no ammonia producer (NAP). Here, ammonia production was used as the measurable output that was assumed to correlate with the ability to break down amino acids for energy, which produces large amounts of ammonia. A primary concern with measuring ammonia is that the source is unknown. On this occasion, it was not possible to state that all ammonia produced was detected, nor that it was a direct result of amino acid breakdown. To better monitor the movement of nitrogen through the ammonia pool, labelled nitrogen compounds could be used, an approach used previously, in which labelled ammonium sulphate was used to monitor extracellular ammonia and predict the movement of nitrogen from trypticase peptone into the ammonia pool and into cellular nitrogen (Russell, 1983). It is assumed that the excess ammonia a cell produces is removed, and it is this high ammonia production activity that separates the HAPs from the other groups. Future work could include the use of labelled nitrogen to confirm this, as well as developing other metabolomic approaches to reinforce some of the patterns indicated using FTIR analyses, such as using liquid chromatography and mass spectrometry to identify compounds, rather than types of compounds indicated by FTIR.

The genomes available for the nine model bacterial species (three each of HAPs, NAPs and SAPs) were used in comparative genomic approaches in Chapter 4 to identify genes that were unique to the hyper ammonia producers. Phylogenetic analysis looking at clustering of the HAPs amongst other genomes from the Hungate1000 collection revealed that hyper ammonia production was not a monophyletic trait. A lack of genes with a high sequence similarity in the HAPs suggested that the phenotype is not brought about through horizontal gene transfer, nor was there evidence of metabolic pathways unique to all HAPs that could suggest convergent evolution of the same functions. Comparisons between the NAPs, SAPs and HAPs did not lead to the identification of a genomic signature that was unique to the HAPs. Instead, it became apparent that the genes and functions used by the HAPs for amino acid deamination for energy varied even within this group, with no evidence that the HAPs used a set of genes or functions to carry out a particular pathway that was unique to only the HAPs. Rather, the assortment of different genes and functions the HAPs used were common to amino acid metabolism generally, something that all cells take part in to some extent. It is these common genes involved in amino acid metabolism and synthesis that may be utilised more frequently and efficiently by the HAPs, something that was reflected in the transcriptomes. Another limitation was the small number of model bacterial species; only three of each group were available that satisfied the aforementioned requirements. Although there was evidence in the literature of other bacterial species that would produce large amounts of ammonia and were capable of growth on a nitrogen-only medium, as discussed in Chapter 3, the absence of cultures for these species meant that they could not be compared in the same conditions and the ammonia production phenotype could not be confirmed in this instance.

Once it was realised that the genes that HAPs utilise for amino acid breakdown are common to amino acid metabolism pathways, these results led to the hypothesis that, because the HAPs rely on this fermentation for energy, these genes may be more active than those in other bacterial species, something that would be reflected in the transcriptomes. This was the premise for Chapter 5, where the transcriptomes of the HAPs and NAPs were compared. This analysis also offered a way to overcome the lack of biological replicates, a limitation of the genomic analyses. This study used multiple Hungate tubes containing active culture, each acting as a contained, independent environment and therefore as a biological replicate in the transcriptome analysis. This revealed that glutamate and aspartate are important to these HAPs, and further evidence from these studies indicated that these may be used in a cycle to produce ammonia and hydrogen, and other products that are intermediates in the tricarboxylic acid cycle, the energy production pathway. This hypothesis was supported by the finding of transporter genes for these compounds that were unique to the HAP genomes and were expressed in the transcriptomes, as well as the expression of other enzymes needed for this cycle. If the HAPs use the tricarboxylic acid cycle but utilise different initiating compounds obtained through amino acid degradation, replacing carbohydrate fermentation, this could explain why so few genomic differences were found between the HAPs, NAPs and SAPs. The absence of carbohydrate metabolism genes was not explored further here, because this study aimed to identify what the HAPs have that the NAPs lack, using the presence of a signature to signify the HAPs. However, it may be that these genomes lack some of the key carbohydrate metabolism genes, something which further studies could explore to determine whether a signature can be identified based on the absence of genes or functions. Another advantage of transcriptomics is the identification of genes that are actively utilised during growth, whereas a genome contains only the cell’s functional potential. Additionally, many methods rely on sequence similarity approaches to identify and annotate gene functions, but whether these genes are functional is uncertain. The presence of non-functioning carbohydrate metabolism genes in the HAP genomes would obscure any HAP signatures in genome analyses. For example, it has been shown previously that whilst A. sticklandii harbours genes for glucose fermentation, a gene for a key protein in the transporter system (protein II of the phosphotransferase system, or PTS) is absent, with the result that only slight fermentation of glucose was noted (Fonknechten et al., 2010).

The results from the transcriptomic and genomic analyses led to the formation of another hypothesis. It has already been postulated that amino acid utilisation is one of the oldest forms of energy production, powering metabolism in the primitive RNA-based world (de Vladar, 2012). It is therefore likely that, rather than these ruminal HAPs having evolved this ability recently (in evolutionary terms), they have simply not gained the more recently evolved carbohydrate metabolism abilities. Thus, the HAPs are able to (or are forced to) reside in the obligate amino acid fermenting niche, occupying and exploiting it in the rumen ecosystem. Yet exactly what makes them competitive in this niche remains unclear. Future work could endeavour to identify what confers this advantage. As it has been established in the studies here that the functions used do not differ from those that many other bacteria possess, perhaps the enzymes HAPs encode are better adapted, or they harbour a repertoire of enzymes in the form of isoforms, which would give them an advantage in fluctuating environmental conditions, something which has been noted previously in predominant rumen bacteria (Rubino et al., 2017).

Ideally, to improve upon the methods and outcomes of this study, novel HAPs should be isolated using previously described culturing techniques that take advantage of the selective ability to use nitrogen sources as energy (Chen and Russell, 1988, 1989a; Russell et al., 1988). Once isolated, their genomes and transcriptomes can be sequenced, and the genome analysis methods carried out here can be repeated on these new data. Furthermore, a cellulose and ammonia rich selective medium can be used to isolate likely NAP candidates, and their phenotypes assessed and compared to the ones described here. Their genomes and transcriptomes can also be sequenced, offering a comparison for the HAPs. Hopefully, this would yield a larger dataset with more replicates available for comparison and reveal common trends in the HAPs that can lead to their identification in other datasets, with applications not just to the rumen environment but to others. For example, identification of a hyper ammonia producing bacterial population in the human gut would be interesting from the point of view of amino acid degradation in the human host, as well as links to human health and disease (Richardson et al., 2013).

The use of bacteriophages to target such bacterial groups in order to improve rumen efficiency and mitigate methane emissions has been discussed previously (Gilbert et al., 2015). Although the methanogens themselves seem like the obvious target for phage therapy, these are the primary hydrogen sinks in the rumen (Kamra, 2005), and perturbing this population could have catastrophic effects on enteric fermentation and ruminant host health. Therefore, alternative targets need to be found. Because of the role that HAPs play indirectly in methanogenesis through hydrogen and carbon dioxide production, as well as their excessive ammonia production and the resulting nitrogen loss from the ruminant host, these bacterial species would be ideal targets for phage therapy. Whilst this sounds like a reasonable and straightforward aim, it would in fact require in-depth research, with preliminary characterisation and feasibility studies extending through to clinical trials and in vivo testing. Whilst these chapters work towards the former, the latter is beyond the scope of this study. Chapter 6 in particular presents the isolation and characterisation of five novel lytic Butyrivibrio phages and their genomes. These five phages were found to represent four species and three genera, all belonging to the same family which, based on TEM morphology and genome analysis, was thought likely to be Siphoviridae. The contribution of these five rumen phages (those that target rumen bacterial species or are isolated from rumen-associated samples) doubles the number of such lytic phage genomes deposited in public databases to date. Although the aim was to find phages that target HAP bacterial species, the isolation of these phages shows that the methods are applicable, and the isolation and genome sequencing of any rumen phages remain a worthy endeavour. These five phage genomes are therefore a small but important contribution, and will aid identification and annotation in future rumen metagenomic and metatranscriptomic analyses (Gilbert and Ouwerkerk, 2020b).

Whilst no phages were isolated against the HAP species, areas of clearing were found from some samples on lawns of Eubacterium pyruvativorans Isol6. However, these could not be propagated. Future work should concentrate on achieving better and more reliable lawns of these HAP species, perhaps by trying a HAP-specific medium. The addition of supplements did not seem to aid the growth of the HAP strains in the lawns; however, the reinforced clostridial medium did create a denser and more reliable lawn for Clostridium aminophilum ATCC49906. It was postulated that one of the reasons contributing to the absence of phages against these HAP bacterial hosts was the absence of these particular strains in the rumina tested. To yield the best and most relevant outcomes, HAPs would ideally first be isolated as mentioned previously, using a selective medium with nitrogen as the only energy source, after which the same samples can be tested for the presence of phages targeting those HAPs.

Although the methods employed here of centrifugation and ultrafiltration of rumen samples yielded phages, there was still an opportunity to develop a method that takes crude rumen fluid as the input and gives phage enriched fractions, or separated phages, as the output. This was the inspiration behind Chapter 7, a combined effort with Dynamic Extractions, a company specialising in countercurrent chromatography (CCC). Whilst chromatography techniques have been applied to phage separation before, there was no evidence in the literature that CCC has been used or even trialled for this purpose. CCC has advantages over other separation and purification techniques in that it can be scaled up easily. The aim of this chapter was to take the first steps in doing this, and it was established that phages can indeed survive this process, which involves long columns and repeated acceleration forces through centrifugation. This was confirmed for the phages T4 and PhiX174, both Enterobacteria phages, and the approach would therefore benefit from further exploration with a greater range of phages. Additionally, it was found that phages remained in the columns even after flushing with mobile phase (water), and that adequate disinfection of the columns is required between runs, for example using sodium hydroxide. Furthermore, in-line detection of the phages themselves proved to be difficult because the phage particles were not detected by UV detection. Proteins are typically detected at 280 nm, so phages would also be expected to be detected in the UV range due to their high protein content, yet this was not found to be the case. Alternative detection methods should be tried, such as mass spectrometry or flame ionisation detectors, although these are destructive techniques. Plaque assays and spot tests were ultimately the best methods to detect phage presence, and gave quantitative results of phage concentrations in fractions. Although this work laid down some important foundations, there remains a lot of potential for this as a phage separation method. Next steps would include testing the biphasic separation method, potentially using polyethylene glycol, and employing more of the CCC capabilities, not just the acceleration and centrifugal forces which were used here.
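Plaque assays give quantitative results because a titre can be back-calculated from a plaque count. As an illustration only, the Python sketch below applies the standard calculation, in which the number of plaques is divided by the product of the dilution and the volume plated. The function name, fraction labels and all numbers are hypothetical and are not data from Chapter 7; the dilution is assumed to be expressed as a fraction (for example 1e-6 for a 10^-6 dilution).

```python
def titre_pfu_per_ml(plaques: int, dilution: float, volume_ml: float) -> float:
    """Return the phage titre (PFU/mL) of the undiluted sample."""
    if plaques < 0:
        raise ValueError("plaque count cannot be negative")
    if dilution <= 0 or volume_ml <= 0:
        raise ValueError("dilution and plated volume must be positive")
    return plaques / (dilution * volume_ml)


# Hypothetical counts for three collected fractions (not measured values):
# (plaques counted, dilution plated, volume plated in mL)
example_fractions = {
    "fraction_1": (42, 1e-6, 0.1),
    "fraction_2": (17, 1e-4, 0.1),
    "fraction_3": (0, 1e-1, 0.1),
}

for name, (plaques, dilution, volume_ml) in example_fractions.items():
    titre = titre_pfu_per_ml(plaques, dilution, volume_ml)
    print(f"{name}: {titre:.2e} PFU/mL")
```

A count of zero, as in the third hypothetical fraction, only shows that the titre is below the detection limit of that plate, not that phages are absent.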

8.1. Conclusions

The work presented here has demonstrated the complexity and variability underlying the seemingly simple phenotype of the HAPs. This warrants further investigation and isolation of novel HAPs in order to better understand this group. This will then enable specific modifications of the rumen ecosystem to decrease outputs from production ruminants. Phage therapy is one approach to this problem that has been relatively little explored to date. While the work presented here did not identify phages active against HAPs, a further five rumen phages were identified that infect Butyrivibrio fibrisolvens, a predominant species found in the rumen, thereby contributing valuable information about the structure and function of the rumen ecosystem. It is suggested that continuation of this line of enquiry would complement ongoing research utilising metagenomics, metatranscriptomics and metaproteomics aimed at understanding and improving rumen efficiency.

237 Chapter 10 - Appendix

9. Bibliography

Abdoun, K., Stumpff, F., and Martens, H. (2007). Ammonia and urea transport across the rumen epithelium: a review. Anim. Health Res. Rev. 7, 43–59. doi:10.1017/S1466252307001156.
Abecasis, A. B., Serrano, M., Alves, R., Quintais, L., Pereira-Leal, J. B., and Henriques, A. O. (2013). A genomic signature and the identification of new sporulation genes. J. Bacteriol. 195, 2101–2115. doi:10.1128/JB.02110-12.
Ackermann, H. W. (2009). Phage classification and characterization. Methods Mol. Biol. 501, 127–140.
Ackermann, H. W., and Krisch, H. M. (1997). A catalogue of T4-type bacteriophages. Arch. Virol. 142, 2329–2345. doi:10.1007/s007050050246.
Adams, J. C., Gazaway, J. A., Brailsford, M. D., Hartman, P. A., and Jacobson, N. L. (1966). Isolation of bacteriophages from the bovine rumen. Experientia 22, 717–718. doi:10.1007/BF01901335.
Adriaenssens, E. M., and Brister, J. R. (2017). How to name and classify your phage: An informal guide. Viruses 9, 1–9. doi:10.3390/v9040070.
Adriaenssens, E. M., and Cowan, D. A. (2014). Using signature genes as tools to assess environmental viral ecology and diversity. Appl. Environ. Microbiol. 80, 4470–4480. doi:10.1128/AEM.00878-14.
Adriaenssens, E. M., Lehman, S. M., Vandersteegen, K., Vandenheuvel, D., Philippe, D. L., Cornelissen, A., et al. (2012). CIM® Monolithic Anion-Exchange Chromatography as a useful alternative to CsCl gradient purification of bacteriophage particles. Virology 434, 265–270. doi:10.1016/j.virol.2012.09.018.CIM.
Agarwal, N., Kamra, D. N., and Chaudhary, L. C. (2015). “Rumen Microbial Ecosystem of Domesticated Ruminants,” in Rumen Microbiology: From Evolution to Revolution, eds. A. K. Puniya, R. Singh, and D. N. Kamra (New Delhi: Springer India), 17–30. doi:10.1007/978-81-322-2401-3_2.
Almpanis, A., Swain, M., Gatherer, D., and McEwan, N. (2018). Correlation between bacterial G+C content, genome size and the G+C content of associated plasmids and bacteriophages. Microb. genomics 4, 0–7. doi:10.1099/mgen.0.000168.
Altermann, E., Schofield, L. R., Ronimus, R. S., Beattie, A. K., Reilly, K., and Ferguson, D. J. (2018). Inhibition of Rumen Methanogens by a Novel Archaeal Lytic Enzyme Displayed on Tailored Bionanoparticles. Front. Microbiol. 9, 2378. doi:10.3389/fmicb.2018.02378.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403–410. doi:10.1016/S0022-2836(05)80360-2.
Anderson, C. L., Sullivan, M. B., and Fernando, S. C. (2017). Dietary energy drives the dynamic response of bovine rumen viral communities. Microbiome 5, 155. doi:10.1186/s40168-017-0374-3.
Andrade, S. L. A., and Einsle, O. (2007). The Amt/Mep/Rh family of ammonium transport proteins (Review). Mol. Membr. Biol. 24, 357–365. doi:10.1080/09687680701388423.
Antibody Design Laboratories (2015). Small-Scale Preparation of Filamentous Bacteriophage by PEG Precipitation. Available at: http://www.abdesignlabs.com/technical-resources/bacteriophage-preparation/ [Accessed September 13, 2019].
Arelovich, H. M., Owens, F. N., Horn, G. W., and Vizcarra, J. A. (2000). Effects of supplemental zinc and manganese on ruminal fermentation, forage intake, and digestion by cattle fed prairie hay and urea. J. Anim. Sci. 78, 2972–2979. doi:10.2527/2000.78112972x.
Argov, T., Sapir, S. R., Pasechnek, A., Azulay, G., Stadnyuk, O., Rabinovich, L., et al. (2019). Coordination of cohabiting phage elements supports bacteria–phage cooperation. Nat. Commun. 10, 1–14. doi:10.1038/s41467-019-13296-x.
Atasoglu, C., Newbold, C. J., and Wallace, R. J. (2001). Incorporation of [15N] Ammonia by the Cellulolytic Ruminal and Ruminococcus flavefaciens 17. Society 67, 2819–2822. doi:10.1128/AEM.67.6.2819.
ATCC Product Sheet - Clostridium aminophilum (ATCC 49906).
Attwood, G., and McSweeney, C. (2008). Methanogen genomics to discover targets for methane mitigation technologies and options for alternative H2 utilisation in the rumen. Aust. J. Exp. Agric. 48, 28–37. doi:10.1071/EA07203.
Attwood, G. T., Kelly, W. J., Altermann, E. H., and Leahy, S. C. (2008). Analysis of the Methanobrevibacter ruminantium draft genome: Understanding methanogen biology to inhibit their action in the rumen. Aust. J. Exp. Agric. 48, 83–88. doi:10.1071/EA07269.
Attwood, G. T., Klieve, A. V, Ouwerkerk, D., and Patel, B. K. (1998). Ammonia-hyper producing bacteria from New Zealand ruminants. Appl. Environ. Microbiol. 64, 1794–1804.
Aziz, R. K., Ackermann, H. W., Petty, N. K., and Kropinski, A. M. (2018). “Essential steps in characterizing bacteriophages: Biology, taxonomy, and genome analysis,” in Methods in Molecular Biology (Humana Press Inc.), 197–215. doi:10.1007/978-1-4939-7343-9_15.
Bach, A., Calsamiglia, S., and Stern, M. D. (2005). Nitrogen Metabolism in the Rumen. J. Dairy Sci. 88, E9–E21. doi:10.3168/JDS.S0022-0302(05)73133-7.
Bailly-Bechet, M., Vergassola, M., and Rocha, E. (2007). Causes for the intriguing presence of tRNAs in phages. Genome Res. 17, 1486–1495. doi:10.1101/gr.6649807.
Barker, H. A. (1981). Amino acid degradation by anaerobic bacteria. Annu. Rev. Biochem. 50, 23–40.
Battu, S., Roux, A., Delebasee, S., Bosgiraud, C., and Cardot, P. J. P. (2001). Sedimentation field-flow fractionation device cleaning, decontamination and sterilization procedures for cellular analysis. J. Chromatogr. B Biomed. Sci. Appl. 751, 131–141. doi:10.1016/S0378-4347(00)00462-X.
Belanche, A., Weisbjerg, M. R., Allison, G. G., Newbold, C. J., and Moorby, J. M. (2013). Estimation of feed crude protein concentration and rumen degradability by Fourier-transform infrared spectroscopy. J. Dairy Sci. 96, 7867–7880. doi:10.3168/jds.2013-7127.
Bento, C., de Azevedo, A., Detmann, E., and Mantovani, H. (2015). Biochemical and genetic diversity of carbohydrate-fermenting and obligate amino acid-fermenting hyper-ammonia-producing bacteria from Nellore steers fed tropical forages and supplemented with casein. BMC Microbiol. 15, 28. doi:10.1186/s12866-015-0369-9.
Berg Miller, M. E., Yeoman, C. J., Chia, N., Tringe, S. G., Angly, F. E., Edwards, R. A., et al. (2012). Phage-bacteria relationships and CRISPR elements revealed by a metagenomic survey of the rumen microbiome. Environ. Microbiol. 14, 207–227. doi:10.1111/j.1462-2920.2011.02593.x.
Berthod, A. (2007). Countercurrent chromatography and the Journal of Liquid Chromatography: A love story. J. Liq. Chromatogr. Relat. Technol. 30, 1447–1463. doi:10.1080/10826070701277067.
Berthod, A., and Faure, K. (2015). Separations with a Liquid Stationary Phase: Countercurrent Chromatography or Centrifugal Partition Chromatography. Anal. Sep. Sci., 1177–1206. doi:10.1002/9783527678129.assep046.
Berthod, A., Maryutina, T., Spivakov, B., Shpigun, O., and Sutherland, I. A. (2009). Countercurrent chromatography in analytical chemistry (IUPAC technical report). Pure Appl. Chem. 81, 355–387. doi:10.1351/PAC-REP-08-06-05.
Blackburn, T. H., and Hobson, P. N. (1960). Proteolysis in the Sheep Rumen by Whole and Fractionated Rumen Contents. J. Gen. Microbiol. 22, 272–281. doi:10.1099/00221287-22-1-272.
Blackburn, T. H., and Hobson, P. N. (1962). Further Studies on the Isolation of Proteolytic Bacteria from the Sheep Rumen. J. Gen. Microbiol. 29, 69–81. doi:10.1099/00221287-29-1-69.
Bladen, H. A., Bryant, M. P., and Doetsch, R. N. (1961). A Study of Bacterial Species from the Rumen Which Produce Ammonia from Protein Hydrolyzate. Appl. Microbiol. 9, 175–80. Available at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1057698&tool=pmcentrez&rendertype=abstract.
Blondel, V. D., Guillaume, J. L., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008. doi:10.1088/1742-5468/2008/10/P10008.
Bolger, A. M., Lohse, M., and Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120. doi:10.1093/bioinformatics/btu170.
Bonvin, J., Aponte, R. A., Marcantonio, M., Singh, S., Christendat, D., and Turnbull, J. L. (2006). Biochemical characterization of prephenate dehydrogenase from the hyperthermophilic bacterium Aquifex aeolicus. Protein Sci. 15, 1417–1432. doi:10.1110/ps.051942206.
Borst, P. (2020). The malate–aspartate shuttle (Borst cycle): How it started and developed into a major metabolic pathway. IUBMB Life 72, 2241–2259. doi:10.1002/iub.2367.
Bougouin, A., Leytem, A., Dijkstra, J., Dungan, R. S., and Kebreab, E. (2016). Nutritional and Environmental Effects on Ammonia Emissions from Dairy Cattle Housing: A Meta-Analysis. J. Environ. Qual. 45, 1123. doi:10.2134/jeq2015.07.0389.
Bourdin, G., Schmitt, B., Guy, L. M., Germond, J. E., Zuber, S., Michot, L., et al. (2014). Amplification and purification of T4-Like Escherichia coli phages for phage therapy: From laboratory to pilot scale. Appl. Environ. Microbiol. 80, 1469–1476. doi:10.1128/AEM.03357-13.
Braun, U., Zürcher, S., and Hässig, M. (2015). Eating and rumination activity in 10 cows over 10 days. Res. Vet. Sci. 101, 196–198. doi:10.1016/j.rvsc.2015.05.001.
Bryant, M. P., and Doetsch, R. N. (1954). Factors necessary for the growth of Bacteroides succinogenes in the volatile acid fraction of rumen fluid. Science. doi:10.1126/science.120.3127.944-a.
Bryant, M. P., and Robinson, I. M. (1961). Studies on the Nitrogen Requirements of Some Ruminal Cellulolytic Bacteria. Appl. Microbiol.
Bryant, M. P., and Robinson, I. M. (2010). Apparent Incorporation of Ammonia and Amino Acid Carbon During Growth of Selected Species of Ruminal Bacteria. J. Dairy Sci. 46, 150–154. doi:10.3168/jds.s0022-0302(63)88991-2.
Buchfink, B., Xie, C., and Huson, D. H. (2015). Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60. doi:10.1038/nmeth.3176.
Buddle, B. M., Denis, M., Attwood, G. T., Altermann, E., Janssen, P. H., Ronimus, R. S., et al. (2011). Strategies to reduce methane emissions from farmed ruminants grazing on pasture. Vet. J. 188, 11–17. doi:10.1016/j.tvjl.2010.02.019.
Cai, L., Yang, Y., Jiao, N., and Zhang, R. (2015). Evaluation of tangential flow filtration for the concentration and separation of bacteria and viruses in contrasting marine environments. PLoS One 10. doi:10.1371/journal.pone.0136741.
Caldwell, K. D., Karaiskakis, G., and Giddings, J. C. (1981). Characterization Flow Fractionation of T4d Virus by Field-flow Fractionation. J. Chromatogr. A 215, 323–332. doi:10.1016/S0021-9673(00)81411-9.
Carberry, C. A., Waters, S. M., Kenny, D. A., and Creevey, C. J. (2014). Rumen methanogenic genotypes differ in abundance according to host residual feed intake phenotype and diet type. Appl. Environ. Microbiol. 80, 586–594. doi:10.1128/AEM.03131-13.
Caspi, R., Altman, T., Billington, R., Dreher, K., Foerster, H., Fulcher, C. A., et al. (2014). The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 42, 459–471. doi:10.1093/nar/gkt1103.
Chamkha, M., Labat, M., Patel, B. K. C., and Garcia, J. L. (2001). Isolation of a cinnamic acid-metabolizing Clostridium glycolicum strain from oil mill wastewaters and emendation of the species description. Int. J. Syst. Evol. Microbiol. 51, 2049–2054. doi:10.1099/00207713-51-6-2049.
Chang, Y. J., Pukall, R., Saunders, E., Lapidus, A., Copeland, A., Nolan, M., et al. (2010). Complete genome sequence of Acidaminococcus fermentans type strain (VR4 T). Stand. Genomic Sci. 3, 1–14. doi:10.4056/sigs.1002553.
Chen, G., and Russell, J. B. (1988). Fermentation of peptides and amino acids by a monensin-sensitive ruminal peptostreptococcus. Appl. Environ. Microbiol. 54, 2742–2749.
Chen, G., and Russell, J. B. (1989a). More monensin-sensitive, ammonia-producing bacteria from the rumen. Appl. Environ. Microbiol. 55, 1052–1057.
Chen, G., and Russell, J. B. (1989b). Sodium-dependent transport of branched-chain amino acids by a monensin-sensitive ruminal peptostreptococcus. Appl. Environ. Microbiol. 55, 2658–2663.
Chen, G., and Russell, J. B. (1990). Transport and deamination of amino acids by a Gram-positive, monensin-sensitive ruminal bacterium. Appl. Environ. Microbiol. 56, 2186–2192.
Cheng, S., Karkar, S., Bapteste, E., Yee, N., Falkowski, P., and Bhattacharya, D. (2014). Sequence similarity network reveals the imprints of major diversification events in the evolution of microbial life. Front. Ecol. Evol. 2, 1–13. doi:10.3389/fevo.2014.00072.
Choudhury, P. K., Salem, A. Z. M., Jena, R., Kumar, S., Singh, R., and Puniya, A. K. (2015). “Rumen Microbiology: An Overview,” in Rumen Microbiology: From Evolution to Revolution, eds. A. K. Puniya, R. Singh, and D. N. Kamra (New Delhi: Springer India), 3–16. doi:10.1007/978-81-322-2401-3_1.
Clauss, M., Hume, I. D., and Hummel, J. (2010). Evolutionary adaptations of ruminants and their potential relevance for modern production systems. Animal 4, 979–992. doi:10.1017/s1751731110000388.
Clokie, M. R. J., and Kropinski, A. M. (2009). Bacteriophages: methods and protocols. doi:10.1007/978-1-60327-164-6.
Clokie, M. R. J., Millard, A. D., Letarov, A. V, and Heaphy, S. (2011). Phages in nature. Bacteriophage 1, 31–45. doi:10.4161/bact.1.1.14942.
Comtet-Marre, S., Parisot, N., Lepercq, P., Chaucheyras-Durand, F., Mosoni, P., Peyretaillade, E., et al. (2017). Metatranscriptomics reveals the active bacterial and eukaryotic fibrolytic communities in the rumen of dairy cow fed a mixed diet. Front. Microbiol. 8, 1–13. doi:10.3389/fmicb.2017.00067.
Contado, C. (2017). Field flow fractionation techniques to explore the “nano-world.” Anal. Bioanal. Chem. 409, 2501–2518. doi:10.1007/s00216-017-0180-6.
Cotta, M. A., and Hespell, R. B. (1986). Proteolytic activity of the ruminal bacterium Butyrivibrio fibrisolvens. Appl. Environ. Microbiol. 52, 51–58. doi:10.1073/pnas.85.12.4426.
Counotte, G. H., Prins, R. A., Janssen, R. H., and Debie, M. J. (1981). Role of Megasphaera elsdenii in the Fermentation of dl-[2-C]lactate in the Rumen of Dairy Cattle. Appl. Environ. Microbiol. 42, 649–55. doi:10.1128/AEM.42.4.649-655.1981.
Creevey, C. J., Doerks, T., Fitzpatrick, D. A., Raes, J., and Bork, P. (2011). Universally distributed single-copy genes indicate a constant rate of horizontal transfer. PLoS One 6. doi:10.1371/journal.pone.0022099.
Creevey, C. J., Kelly, W. J., Henderson, G., and Leahy, S. C. (2014). Determining the culturability of the rumen bacterial microbiome. Microb. Biotechnol. 7, 467–479. doi:10.1111/1751-7915.12141.
Crichton, R. R. (2012). “Zinc – Lewis Acid and Gene Regulator,” in Biological Inorganic Chemistry (Elsevier), 229–246. doi:10.1016/b978-0-444-53782-9.00012-7.
Csárdi, G., and Nepusz, T. (2006). The igraph software package for complex network research. InterJournal Complex Syst. 1695, 1695. doi:10.3724/SP.J.1087.2009.02191.
Darling, A. C. E., Mau, B., Blattner, F. R., and Perna, N. T. (2004). Mauve: Multiple alignment of conserved genomic sequence with rearrangements. Genome Res. doi:10.1101/gr.2289704.
Darwin, C. (1859). On the origin of species. Orig. species Oxford world’s Class.
Darzi, Y., Letunic, I., Bork, P., and Yamada, T. (2018). IPath3.0: Interactive pathways explorer v3. Nucleic Acids Res. 46, W510–W513. doi:10.1093/nar/gky299.
de Mendiburu, F. (2019). agricolae: Statistical Procedure for Agricultural Research. Available at: https://cran.r-project.org/package=agricolae.
de Queiroz, A., and Gatesy, J. (2007). The supermatrix approach to systematics. Trends Ecol. Evol. 22, 34–41. doi:10.1016/j.tree.2006.10.002.
de Vladar, H. P. (2012). Amino acid fermentation at the origin of the genetic code. Biol. Direct 7, 6. doi:10.1186/1745-6150-7-6.
Delcher, A. L., Bratke, K. A., Powers, E. C., and Salzberg, S. L. (2007). Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23, 673–679. doi:10.1093/bioinformatics/btm009.
Dennis, S. M., Nagaraja, T. G., and Bartley, E. E. (1981). Effects of Lasalocid or Monensin on Lactate-Producing or -Using Rumen Bacteria. J. Anim. Sci. 52, 418–426.
Durante-Rodríguez, G., Mancheño, J. M., Díaz, E., and Carmona, M. (2016). Refactoring the λ phage lytic/lysogenic decision with a synthetic regulator. Microbiologyopen 5, 575–581. doi:10.1002/mbo3.352.
Eddy, S. R. (2011). Accelerated profile HMM searches. PLoS Comput. Biol. 7. doi:10.1371/journal.pcbi.1002195.
Edgar, R. C. (2004). MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797. doi:10.1093/nar/gkh340.
Edwards, J. E., Forster, R. J., Callaghan, T. M., Dollhofer, V., Dagar, S. S., Cheng, Y., et al. (2017). PCR and omics based techniques to study the diversity, ecology and biology of anaerobic fungi: Insights, challenges and opportunities. Front. Microbiol. 8. doi:10.3389/fmicb.2017.01657.
Edwards, J. E., Huws, S. A., Kim, E. J., and Kingston-Smith, A. H. (2007). Characterization of the dynamics of initial bacterial colonization of nonconserved forage in the bovine rumen. FEMS Microbiol. Ecol. 62, 323–335. doi:10.1111/j.1574-6941.2007.00392.x.
Edwards, J. E., Kingston-Smith, A. H., Jimenez, H. R., Huws, S. A., Skøt, K. P., Griffith, G. W., et al. (2008). Dynamics of initial colonization of nonconserved perennial ryegrass by anaerobic fungi in the bovine rumen. FEMS Microbiol. Ecol. 66, 537–545. doi:10.1111/j.1574-6941.2008.00563.x.
Elsden, S. R., and Hilton, M. G. (1979). Amino acids utilization in clostridial taxonomy. 1, 137–141.
Elsden, S. R., Hilton, M. G., and Waller, J. M. (1976). The end products of the metabolism of aromatic amino acids by clostridia. Arch. Microbiol. 107, 283–288. doi:10.1007/BF00425340.
Eschenlauer, S. C. P., McKain, N., Walker, N. D., McEwan, N. R., Newbold, C. J., and Wallace, R. J. (2002). Ammonia production by ruminal microorganisms and enumeration, isolation, and characterization of bacteria capable of growth on peptides and amino acids from the sheep rumen. Appl. Environ. Microbiol. 68, 4925–4931. doi:10.1128/AEM.68.10.4925-4931.2002.
Eskelin, K., Lampi, M., Meier, F., Moldenhauer, E., Bamford, D. H., and Oksanen, H. M. (2016). Asymmetric flow field flow fractionation methods for virus purification. J. Chromatogr. A 1469, 108–119. doi:10.1016/j.chroma.2016.09.055.
Eskelin, K., Poranen, M. M., and Oksanen, H. M. (2019). Asymmetrical flow field-flow fractionation on virus and virus-like particle applications. Microorganisms 7, 1–20. doi:10.3390/microorganisms7110555.
Evans, T. G. (2015). Considerations for the use of transcriptomics in identifying the “genes that matter” for environmental adaptation. J. Exp. Biol. 218, 1925–1935. doi:10.1242/jeb.114306.
Ewels, P., Magnusson, M., Lundin, S., and Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048. doi:10.1093/bioinformatics/btw354.
FAO (2016a). Reducing Enteric Methane for Improving Food Security and Livelihoods. 8.
FAO (2016b). The Contributions of Livestock Species and Breeds to Ecosystem Services. Available at: http://www.fao.org/documents/card/en/c/25208ece-20f2-44d8-a63e-7d7c84950a9d/.
FAO (2017). The Future of Food and Agriculture - Trends and Challenges. Rome.
Fedotov, P. S., Ermolin, M. S., and Katasonova, O. N. (2015). Field-flow fractionation of nano- and microparticles in rotating coiled columns. J. Chromatogr. A 1381, 202–209. doi:10.1016/j.chroma.2014.12.079.

242 Chapter 10 - Appendix Fernández, M., and Zúñiga, M. (2006). Amino acid catabolic pathways of lactic acid bacteria. Crit. Rev. Microbiol. 32, 155–183. doi:10.1080/10408410600880643. Flint, H. J. (1997). The rumen microbial ecosystem—some recent developments. Trends Microbiol. 5, 483–488. doi:10.1016/S0966-842X(97)01159-1. Fonknechten, N., Chaussonnerie, S., Tricot, S., Lajus, A., Andreesen, J. R., Perchat, N., et al. (2010). Clostridium sticklandii, a specialist in amino acid degradation:revisiting its metabolism through its genome sequence. BMC Genomics 11, 555. doi:10.1186/1471-2164-11-555. Fortier, L.-C., and Moineau, S. (2009). “Phage Production and Maintenance of Stocks, Including Expected Stock Lifetimes,” in Bacteriophages: Methods and Protocols, Volume 1, eds. M. R. J. Clokie and A. M. Kropinski (New York: Humana Press), 203–219. Frandson, R. D., Wilke, W. L., and Fails, A. D. (2009). “Anatomy and physiology of farm animals,” in Anatomy and physiology of farm animals (Wiley-Blackwell), 512. doi:10.1017/CBO9781107415324.004. Friedersdorff, J. C. A., Kingston-Smith, A. H., Pachebat, J. A., Cookson, A. R., Rooke, D., and Creevey, C. J. (2020). The Isolation and Genome Sequencing of Five Novel Bacteriophages From the Rumen Active Against Butyrivibrio fibrisolvens. Front. Microbiol. 11. doi:10.3389/fmicb.2020.01588. Froetschel, M. A., Martin, A. C., Amos, H. E., and Evans, J. J. (1990). Effects of zinc sulfate concentration and feeding frequency on ruminal protozoal numbers, fermentation patterns and amino acid passage in steers. J. Anim. Sci. 68, 2874. doi:10.2527/1990.6892874x. Galperin, M. Y., Brover, V., Tolstoy, I., and Yutin, N. (2016). Phylogenomic analysis of the family peptostreptococcaceae (Clostridium cluster xi) and proposal for reclassification of Clostridium litorale (Fendrich et al. 1991) and Eubacterium acidaminophilum (Zindel et al. 1989) as peptoclostridium lito. Int. J. Syst. Evol. Microbiol. 66, 5506–5513. doi:10.1099/ijsem.0.001548. 
Galperin, M. Y., Makarova, K. S., Wolf, Y. I., and Koonin, E. V. (2015). Expanded Microbial genome coverage and improved annotation in the COG database. Nucleic Acids Res. 43, D261–D269. doi:10.1093/nar/gku1223.
Giddings, J. C., Yang, F. J. F., and Myers, M. N. (1975). Application of Sedimentation Field-Flow Fractionation to Biological Particles: Molecular Weights and Separation. Sep. Sci. 10, 133–149. doi:10.1080/00372367508058996.
Gilbert, R. A., Kelly, W. J., Altermann, E., Leahy, S. C., Minchin, C., Ouwerkerk, D., et al. (2017). Toward understanding phage: Host interactions in the rumen; complete genome sequences of lytic phages infecting rumen bacteria. Front. Microbiol. 8, 1–17. doi:10.3389/fmicb.2017.02340.
Gilbert, R. A., and Klieve, A. V. (2015). “Ruminal Viruses (Bacteriophages, Archaeaphages),” in Rumen Microbiology: From Evolution to Revolution, eds. A. K. Puniya, R. Singh, and D. N. Kamra (New Delhi: Springer), 121–142.
Gilbert, R. A., and Ouwerkerk, D. (2020a). “Ruminal viruses and extrachromosomal genetic elements,” in Improving Rumen Function, eds. C. S. McSweeney and R. I. Mackie (Cambridge, UK), 281–320. doi:10.19103/AS.2020.0067.10.
Gilbert, R. A., Ouwerkerk, D., and Klieve, A. V. (2015). “Phage Therapy in Livestock Methane Amelioration,” in Livestock Production and Climate Change, eds. P. K. Malik, R. Bhatta, J. Takahashi, R. Kohn, and C. S. Prasad (Wallingford: CABI), 318–335.
Gilbert, R. A., Townsend, E. M., Crew, K. S., Hitch, T. C. A., Friedersdorff, J. C. A., Creevey, C. J., et al. (2020). Rumen Virus Populations: Technological Advances Enhancing Current Understanding. Front. Microbiol. 11, 450. doi:10.3389/fmicb.2020.00450.
Gilbert, R., and Ouwerkerk, D. (2020b). The Genetics of Rumen Phage Populations. in Proceedings 2019 doi:10.3390/proceedings2019036165.
Gill, J. J., and Hyman, P. (2010). Phage choice, isolation, and preparation for phage therapy. Curr. Pharm. Biotechnol.
Gomez-Alarcon, R. A., O’Dowd, C., Leedle, J. A. Z., and Bryant, M. P. (1982). 1,4-Naphthoquinone and other nutrient requirements of Succinivibrio dextrinosolvens. Appl. Environ. Microbiol. 44, 346–350.
Gordon, G. L. R., and Phillips, M. W. (1998). The role of anaerobic gut fungi in ruminants. Nutr. Res. Rev. 11, 133–168. doi:10.1079/NRR19980009.
Gutiérrez, D., Fernández, L., Rodríguez, A., García, P., Gutiérrez, D., Fernández, L., et al. (2018). Practical Method for Isolation of Phage Deletion Mutants. Methods Protoc. 1, 6. doi:10.3390/mps1010006.
Hackmann, T. J., and Firkins, J. L. (2015). Maximizing efficiency of rumen microbial protein production. Front. Microbiol. 6, 1–16. doi:10.3389/fmicb.2015.00465.
Hackmann, T. J., and Spain, J. N. (2010). Invited review: ruminant ecology and evolution: perspectives useful to ruminant livestock research and production. J. Dairy Sci. 93, 1320–1334. doi:10.3168/jds.2009-2071.
Haft, D. H., Selengut, J. D., and White, O. (2003). The TIGRFAMs database of protein families. Nucleic Acids Res. doi:10.1093/nar/gkg128.
Halary, S. (2012). Evolutionary Gene and Genome Networks generator - User Guide. 1–10.
Halary, S., McInerney, J. O., Lopez, P., and Bapteste, E. (2013). EGN: a wizard for construction of gene and genome similarity networks. BMC Evol. Biol. 13, 146. doi:10.1186/1471-2148-13-146.
Hantke, K. (2005). Bacterial zinc uptake and regulators. Curr. Opin. Microbiol. 8, 196–202. doi:10.1016/j.mib.2005.02.001.
Harrington, E. D., Singh, A. H., Doerks, T., Letunic, I., von Mering, C., Jensen, L. J., et al. (2007). Quantitative assessment of protein function prediction from metagenomics shotgun sequences. Proc. Natl. Acad. Sci. 104, 13913–13918. doi:10.1073/pnas.0702636104.
Hart, E. H., Creevey, C. J., Hitch, T., and Kingston-Smith, A. H. (2018). Meta-proteomics of rumen microbiota indicates niche compartmentalisation and functional dominance in a limited number of metabolic pathways between abundant bacteria. Sci. Rep. 8, 1–11. doi:10.1038/s41598-018-28827-7.
Hartinger, T., Gresner, N., and Südekum, K. H. (2018). Does intra-ruminal nitrogen recycling waste valuable resources? A review of major players and their manipulation. J. Anim. Sci. Biotechnol. 9. doi:10.1186/s40104-018-0249-x.
Hatfull, G. F. (2008). Bacteriophage genomics. Curr. Opin. Microbiol. 11, 447–453. doi:10.1016/j.mib.2008.09.004.
Hatfull, G. F., and Hendrix, R. W. (2011). Bacteriophages and their genomes. Curr. Opin. Virol. doi:10.1016/j.coviro.2011.06.009.
Henderson, G., Cox, F., Ganesh, S., Jonker, A., Young, W., Abecia, L., et al. (2015). Rumen microbial community composition varies with diet and host, but a core microbiome is found across a wide geographical range. Sci. Rep. 5, 1–13. doi:10.1038/srep14567.
Herskin, M. S., Munksgaard, L., and Ladewig, J. (2004). Effects of acute stressors on nociception, adrenocortical responses and behavior of dairy cows. Physiol. Behav. 83, 411–420. doi:10.1016/j.physbeh.2004.08.027.
Hess, M., Sczyrba, A., Egan, R., Kim, T. W., Chokhawala, H., Schroth, G., et al. (2011). Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science. doi:10.1126/science.1200387.
Hill, J., McSweeney, C., Wright, A.-D. G., Bishop-Hurley, G., and Kalantar-zadeh, K. (2016). Measuring Methane Production from Ruminants. Trends Biotechnol. 34, 26–35. doi:10.1016/j.tibtech.2015.10.004.
Hino, T., and Russell, J. B. (1985). Effect of Reducing-Equivalent Disposal and NADH/NAD on Deamination of Amino Acids by Intact Rumen Microorganisms and Their Cell Extracts. 50, 1368–1374.
Hino, T., and Russell, J. B. (1987). Relative Contributions of Ruminal Bacteria and Protozoa To the Degradation of Protein In Vitro. J. Anim. Sci. 64, 261–270.
Hobson, P. N. (1969). “Rumen Bacteria,” in Methods in Microbiology, eds. J. R. Norris and D. W. Ribbons (London: Academic Press), 131–149.
Hofmann, R. R. (1989). Evolutionary steps of ecophysiological adaptation and diversification of ruminants: a comparative view of their digestive system. Oecologia 78, 443–457. Available at: https://link.springer.com/article/10.1007/BF00378733.
Hook, S. E., Wright, A. D. G., and McBride, B. W. (2010). Methanogens: Methane producers of the rumen and mitigation strategies. Archaea 2010, 50–60. doi:10.1155/2010/945785.
Hristov, A. N., Hanigan, M., Cole, A., Todd, R., McAllister, T. A., Ndegwa, P. M., et al. (2011). Review: Ammonia emissions from dairy farms and beef feedlots. Can. J. Anim. Sci. 91, 1–35. doi:10.4141/cjas10034.
Huerta-Cepas, J., Forslund, K., Coelho, L. P., Szklarczyk, D., Jensen, L. J., Von Mering, C., et al. (2017). Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol. Biol. Evol. 34, 2115–2122. doi:10.1093/molbev/msx148.
Huerta-Cepas, J., Szklarczyk, D., Forslund, K., Cook, H., Heller, D., Walter, M. C., et al. (2016). EGGNOG 4.5: A hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44, D286–D293. doi:10.1093/nar/gkv1248.
Huerta-Cepas, J., Szklarczyk, D., Heller, D., Hernández-Plaza, A., Forslund, S. K., Cook, H., et al. (2019). EggNOG 5.0: A hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314. doi:10.1093/nar/gky1085.
Hungate, R. E. (1950). The Anaerobic Mesophilic Cellulolytic Bacteria. Bacteriol. Rev. 14, 1–49. Available at: http://mmbr.asm.org/content/14/1/1.abstract.
Hungate, R. E. (1975). The rumen microbial ecosystem. doi:10.1146/annurev.es.06.110175.000351.
Huntington, G. B., and Archibeque, S. L.
(2000). Practical aspects of urea and ammonia metabolism in ruminants. J. Anim. Sci. doi:10.2527/jas2000.77E-Suppl1y.
Huws, S. A., Creevey, C. J., Oyama, L. B., Mizrahi, I., Denman, S. E., Popova, M., et al. (2018). Addressing global ruminant agricultural challenges through understanding the rumen microbiome: Past, present, and future. Front. Microbiol. 9, 2161. doi:10.3389/fmicb.2018.02161.
Huws, S. A., Edwards, J. E., Creevey, C. J., Stevens, P. R., Lin, W., Girdwood, S. E., et al. (2016). Temporal dynamics of the metabolically active rumen bacteria colonizing fresh perennial ryegrass. FEMS Microbiol. Ecol. 92, 1–12. doi:10.1093/femsec/fiv137.
Hyatt, D., Chen, G.-L., LoCascio, P. F., Land, M. L., Larimer, F. W., and Hauser, L. J. (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119. doi:10.1186/1471-2105-11-119.
Hyman, P. (2019). Phages for Phage Therapy: Isolation, Characterization, and Host Range Breadth. Pharmaceuticals 12, 35. doi:10.3390/ph12010035.
IPCC (2014). Climate Change 2014: Synthesis Report. Contribution of Working Groups I, II and III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. doi:10.1017/CBO9781107415324.004.
Ito, Y. (2005). Golden rules and pitfalls in selecting optimum conditions for high-speed counter-current chromatography. J. Chromatogr. A 1065, 145–168. doi:10.1016/j.chroma.2004.12.044.
Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T., and Aluru, S. (2018). High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat. Commun. doi:10.1038/s41467-018-07641-9.
Janssen, P. H. (2010). Influence of hydrogen on rumen methane formation and fermentation balances through microbial growth kinetics and fermentation thermodynamics. Anim. Feed Sci. Technol. 160, 1–22. doi:10.1016/j.anifeedsci.2010.07.002.
Janssen, P. H., and Kirs, M. (2008).
Structure of the archaeal community of the rumen. Appl. Environ. Microbiol. 74, 3619–3625. doi:10.1128/AEM.02812-07.
Jarvis, R. M., Broadhurst, D., Johnson, H., O’Boyle, N. M., and Goodacre, R. (2006). PYCHEM: A multivariate analysis package for python. Bioinformatics 22, 2565–2566. doi:10.1093/bioinformatics/btl416.
Jue, E., Yamanishi, C. D., Chiu, R. Y. T., Wu, B. M., and Kamei, D. T. (2014). Using an aqueous two-phase polymer-salt system to rapidly concentrate viruses for improving the detection limit of the lateral-flow immunoassay. Biotechnol. Bioeng. 111, 2499–2507. doi:10.1002/bit.25316.
Kamra, D. N. (2005). Rumen microbial ecosystem. Curr. Sci. 89, 124–135.
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M., and Tanabe, M. (2016). KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462. doi:10.1093/nar/gkv1070.
Kassambara, A. (2020). ggpubr: “ggplot2” Based Publication Ready Plots. Available at: https://cran.r-project.org/package=ggpubr.
Kelly, W. J., Pacheco, D. M., Li, D., Attwood, G. T., Altermann, E., and Leahy, S. C. (2016). The complete genome sequence of the rumen methanogen Methanobrevibacter millerae SM9. Stand Genomic Sci 11, 1–9. doi:10.1186/s40793-015-0038-5.
Kim, J. N., Henriksen, E. D. C., Cann, I. K. O., and Mackie, R. I. (2014). Nitrogen utilization and metabolism in Ruminococcus albus 8. Appl. Environ. Microbiol. 80, 3095–3102. doi:10.1128/AEM.00029-14.
Kim, J. N., Méndez-García, C., Geier, R. R., Iakiviak, M., Chang, J., Cann, I., et al. (2017). Metabolic networks for nitrogen utilization in Prevotella ruminicola 23. Sci. Rep. 7, 7851. doi:10.1038/s41598-017-08463-3.
Kim, M., Morrison, M., and Yu, Z. (2011). Status of the phylogenetic diversity census of ruminal microbiomes. FEMS Microbiol. Ecol. 76, 49–63. doi:10.1111/j.1574-6941.2010.01029.x.
Kingston-Smith, A. H., Bollard, A. L., Armstead, I. P., Thomas, B. J., and Theodorou, M. K. (2003a).
Proteolysis and cell death in clover leaves is induced by grazing. Protoplasma 220, 119–129. doi:10.1007/s00709-002-0044-5.
Kingston-Smith, A. H., Bollard, A. L., Thomas, B. J., Brooks, A. E., and Theodorou, M. K. (2003b). Nutrient availability during the early stages of colonization of fresh forage by rumen micro-organisms. New Phytol. 158, 119–130. doi:10.1046/j.1469-8137.2003.00709.x.
Kingston-Smith, A. H., and Theodorou, M. K. (2000). Post-ingestion metabolism of fresh forage. New Phytol. 148, 37–55. doi:10.1046/j.1469-8137.2000.00733.x.
Kleiner, D. (1985). Bacterial ammonium transport. FEMS Microbiol. Lett. 32, 87–100. doi:10.1111/j.1574-6968.1985.tb01185.x.
Klieve, A. V. (2005). “Bacteriophages,” in Methods in Gut Microbial Ecology for Ruminants, eds. H. P. S. Makkar and C. S. McSweeney (Dordrecht, Netherlands: Springer), 39–46.
Klieve, A. V., Bain, P. A., Yokoyama, M. T., Ouwerkerk, D., Forster, R. J., and Turner, A. F. (2004). Bacteriophages that infect the cellulolytic ruminal bacterium Ruminococcus albus AR67. Lett. Appl. Microbiol. 38, 333–338. doi:10.1111/j.1472-765X.2004.01493.x.
Klieve, A. V., and Gilbert, R. A. (2005). “Bacteriophage Populations,” in Methods in Gut Microbial Ecology for Ruminants, eds. H. P. S. Makkar and C. S. McSweeney (Dordrecht, Netherlands: Springer), 129–137.
Klieve, A. V., Gregg, K., and Bauchop, T. (1991). Isolation and characterization of lytic phages from Bacteroides ruminicola ss brevis. Curr. Microbiol. 23, 183–187. doi:10.1007/BF02092277.
Klieve, A. V., and Hegarty, R. S. (1999). Opportunities for biological control of ruminal methanogenesis. Aust. J. Agric. Res. 50, 1315–1319. doi:10.1071/AR99006.
Klieve, A. V., Hudman, J. F., and Bauchop, T. (1989). Inducible bacteriophages from ruminal bacteria. Appl. Environ. Microbiol. 55, 1630–1634.
Klieve, A. V., and Bauchop, T. (1988). Morphological diversity of ruminal bacteriophages from sheep and cattle. Appl. Environ. Microbiol. 54, 1637–1641.
Klieve, A. V., and Swain, R. A. (1993). Estimation of ruminal bacteriophage numbers by pulsed-field gel electrophoresis and laser densitometry. Appl. Environ. Microbiol. 59, 2299–2303.
Kobayashi, Y., Shinkai, T., and Koike, S. (2008). Ecological and physiological characterization shows that Fibrobacter succinogenes is important in rumen fiber digestion - Review. Folia Microbiol. (Praha). 53, 195–200. doi:10.1007/s12223-008-0024-z.
Krause, D. O., Denman, S. E., Mackie, R. I., Morrison, M., Rae, A. L., Attwood, G. T., et al. (2003). Opportunities to improve fiber degradation in the rumen: microbiology, ecology, and genomics. FEMS Microbiol. Rev. 27, 663–693. doi:10.1016/S0168-6445(03)00072-X.
Krause, D. O., and Russell, J. B. (1996a). An rRNA Approach for Assessing the Role of Obligate Amino Acid-Fermenting Bacteria in Ruminal Amino Acid Deamination. Appl. Environ. Microbiol. 62, 815–821.
Krause, D. O., and Russell, J. B. (1996b). How Many Ruminal Bacteria Are There? J. Dairy Sci. 79, 1467–1475. doi:10.3168/jds.S0022-0302(96)76506-2.
Kumar, S., Choudhury, P. K., Carro, M. D., Griffith, G. W., Dagar, S. S., Puniya, M., et al. (2014). New aspects and strategies for methane mitigation from ruminants. Appl Microbiol Biotechnol 98, 31–44. doi:10.1007/s00253-013-5365-0.
Lan, W., and Yang, C. (2019). Ruminal methane production: Associated microorganisms and the potential of applying hydrogen-utilizing bacteria for mitigation. Sci. Total Environ. 654, 1270–1283. doi:10.1016/j.scitotenv.2018.11.180.
Langmead, B., and Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods. doi:10.1038/nmeth.1923.
Lawrence, D., Baldridge, M. T., and Handley, S. A. (2019). Phages and Human Health: More Than Idle Hitchhikers. Viruses 11, 587.
doi:10.3390/v11070587.
Leahy, S. C., Kelly, W. J., Altermann, E., Ronimus, R. S., Yeoman, C. J., Pacheco, D. M., et al. (2010). The genome sequence of the rumen methanogen Methanobrevibacter ruminantium reveals new possibilities for controlling ruminant methane emissions. PLoS One 5, e8926. doi:10.1371/journal.pone.0008926.
Lean, I. J., Golder, H. M., and Hall, M. B. (2014). Feeding, Evaluating, and Controlling Rumen Function. Vet. Clin. North Am. - Food Anim. Pract. 30, 539–575. doi:10.1016/j.cvfa.2014.07.003.
Letunic, I., and Bork, P. (2007). Interactive Tree Of Life (iTOL): An online tool for phylogenetic tree display and annotation. Bioinformatics 23, 127–128. doi:10.1093/bioinformatics/btl529.
Leytem, A. B., and Dungan, R. S. (2014). Livestock GRACEnet: A Workgroup Dedicated to Evaluating and Mitigating Emissions from Livestock Production. J. Environ. Qual. 43, 1101. doi:10.2134/jeq2014.06.0264.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., et al. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079. doi:10.1093/bioinformatics/btp352.
Liao, Y., Smyth, G. K., and Shi, W. (2014). FeatureCounts: An efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930. doi:10.1093/bioinformatics/btt656.
Lima, T., Auchincloss, A. H., Coudert, E., Keller, G., Michoud, K., Rivoire, C., et al. (2009). HAMAP: A database of completely sequenced microbial proteome sets and manually curated microbial protein families in UniProtKB/Swiss-Prot. Nucleic Acids Res. doi:10.1093/nar/gkn661.
Liu, Z., Liu, Y., Murphy, J., and Maghirang, R. (2017). Ammonia and Methane Emission Factors from Cattle Operations Expressed as Losses of Dietary Nutrients or Energy. Agriculture 7, 16. doi:10.3390/agriculture7030016.
Lobley, G. E., and Milano, G. D. (1997). Regulation of hepatic nitrogen metabolism in ruminants. Proc. Nutr. Soc. 56, 547–563.
doi:10.1079/PNS19970057.
Lockington, R. A., Attwood, G. T., and Brooker, J. D. (1988). Isolation and characterization of a temperate bacteriophage from the ruminal anaerobe Selenomonas ruminantium. Appl. Environ. Microbiol.
Lomsadze, A., Gemayel, K., Tang, S., and Borodovsky, M. (2018). Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes. Genome Res. 28, 1079–1089. doi:10.1101/gr.230615.117.
Love, M. I., Huber, W., and Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 1–21. doi:10.1186/s13059-014-0550-8.
Løvendahl, P., Huhtanen, P., Difford, G. F., Lidauer, M. H., Chagunda, M. G. G., Lund, P., et al. (2018). Review: Selecting for improved feed efficiency and reduced methane emissions in dairy cattle. Animal 12, s336–s349. doi:10.1017/s1751731118002276.
Maglione, G., and Russell, J. B. (1997). The adverse effect of nitrogen limitation and excess-cellobiose on Fibrobacter succinogenes S85. Appl. Microbiol. Biotechnol. 48, 720–725. doi:10.1007/s002530051122.
Makarova, K. S., and Koonin, E. V. (2007). Evolutionary genomics of lactic acid bacteria. J. Bacteriol. 189, 1199–1208. doi:10.1128/JB.01351-06.
Mavrich, T. N., and Hatfull, G. F. (2017). Bacteriophage evolution differs by host, lifestyle and genome. Nat. Microbiol. 2, 17112. doi:10.1038/nmicrobiol.2017.112.
Mayorga, O. L., Kingston-Smith, A. H., Kim, E. J., Allison, G. G., Wilkinson, T. J., Hegarty, M. J., et al. (2016). Temporal Metagenomic and Metabolomic Characterization of Fresh Perennial Ryegrass Degradation by Rumen Bacteria. Front. Microbiol. 7, 1–23. doi:10.3389/fmicb.2016.01854.
McAllister, T. A., and Newbold, C. J. (2008). Redirecting rumen fermentation to reduce methanogenesis. Aust. J. Exp. Agric. 48, 7–13. doi:10.1071/EA07218.
McDougall, E. I. (1948). Studies on ruminant saliva. 1. The composition and output of sheep’s saliva. Biochem. J. 43, 99–109.
doi:10.1042/bj0430099.
McNair, K., Bailey, B. A., and Edwards, R. A. (2012). PHACTS, a computational approach to classifying the lifestyle of phages. Bioinformatics. doi:10.1093/bioinformatics/bts014.
McSweeney, C. S., Palmer, B., Bunch, R., and Krause, D. O. (1999). Isolation and characterization of proteolytic ruminal bacteria from sheep and goats fed the tannin-containing shrub legume Calliandra calothyrsus. Appl. Environ. Microbiol. 65, 3075–3083.
Mead, G. C. (1971). The Amino Acid-Fermenting Clostridia. J Gen Microbiol 67, 47–56. doi:10.1099/00221287-67-1-47.
Meier-Kolthoff, J. P., Auch, A. F., Klenk, H. P., and Göker, M. (2013). Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics 14, 60. doi:10.1186/1471-2105-14-60.
Meile, L., Jenal, U., Studer, D., Jordan, M., and Leisinger, T. (1989). Characterization of ψM1, a virulent phage of Methanobacterium thermoautotrophicum Marburg. Arch. Microbiol. 152, 105–110. doi:10.1007/BF00456085.
Montgomery, L., Flesher, B., and Stahl, D. (1988). Transfer of Bacteroides succinogenes (Hungate) to Fibrobacter gen. nov. as Fibrobacter succinogenes comb. nov. and description of Fibrobacter intestinalis sp. nov. Int. J. Syst. Bacteriol. doi:10.1099/00207713-38-4-430.
Morgavi, D. P., Forano, E., Martin, C., and Newbold, C. J. (2010). Microbial ecosystem and methanogenesis in ruminants. Animal 4, 1024–1036. doi:10.1017/S1751731112000407.
Morgavi, D. P., Kelly, W. J., Janssen, P. H., and Attwood, G. T. (2012). Rumen microbial (meta)genomics and its application to ruminant production. Animal, 1–18. doi:10.1017/S1751731112000419.

Müller, T., Walter, B., Wirtz, A., and Burkovski, A. (2006). Ammonium toxicity in bacteria. Curr. Microbiol. 52, 400–406. doi:10.1007/s00284-005-0370-x.
Murphy, E. C., and Frick, I. M. (2013). Gram-positive anaerobic cocci - commensals and opportunistic pathogens. FEMS Microbiol. Rev. 37, 520–553. doi:10.1111/1574-6976.12005.
Nafikov, R. A., and Beitz, D. C. (2007). Carbohydrate and Lipid Metabolism in Farm Animals. J. Nutr. doi:10.1093/jn/137.3.702.
Nakanishi, K. (1962). Infrared Absorption Spectroscopy. Tokyo: Nankodo.
Namonyo, S., Wagacha, M., Maina, S., Wambua, L., and Agaba, M. (2018). A metagenomic study of the rumen virome in domestic caprids. Arch. Virol. 163, 3415–3419. doi:10.1007/s00705-018-4022-4.
Nelson, R. S., Peterson, D. J., Karp, E. M., Beckham, G. T., and Salvachúa, D. (2017). Mixed Carboxylic Acid Production by Megasphaera elsdenii from Glucose and Lignocellulosic Hydrolysate. Fermentation 3, 1–16. doi:10.3390/fermentation3010010.
Neph, S., Kuehn, M. S., Reynolds, A. P., Haugen, E., Thurman, R. E., Johnson, A. K., et al. (2012). BEDOPS: High-performance genomic feature operations. Bioinformatics. doi:10.1093/bioinformatics/bts277.
Neumann-Schaal, M., Jahn, D., and Schmidt-Hohagen, K. (2019). Metabolism the difficile way: The key to the success of the pathogen Clostridioides difficile. Front. Microbiol. 10. doi:10.3389/fmicb.2019.00219.
Newbold, C. J., De la Fuente, G., Belanche, A., Ramos-Morales, E., and McEwan, N. R. (2015). The role of ciliate protozoa in the rumen. Front. Microbiol. 6, 1–14. doi:10.3389/fmicb.2015.01313.
Newbold, C. J., and Ramos-Morales, E. (2020). Review: Ruminal microbiome and microbial metabolome: Effects of diet and ruminant host. Animal 14, S78–S86. doi:10.1017/S1751731119003252.
Nishimura, Y., Yoshida, T., Kuronishi, M., Uehara, H., Ogata, H., and Goto, S. (2017). ViPTree: the viral proteomic tree server. Bioinformatics 33, 2379–2380. doi:10.1093/bioinformatics/btx157.
Noel, S.
J., Olijhoek, D. W., Mclean, F., Løvendahl, P., Lund, P., and Højberg, O. (2019). Rumen and Fecal Microbial Community Structure of Holstein and Jersey Dairy Cows as Affected by Breed, Diet, and Residual Feed Intake. Animals 9, 498. doi:10.3390/ani9080498.
Nölling, J., Groffen, A., and de Vos, W. M. (1993). φF1 and φF3, two novel virulent, archaeal phages infecting different thermophilic strains of the genus Methanobacterium. J. Gen. Microbiol. 139, 2511. doi:10.1099/00221287-139-10-2511.
Novais, Â., Freitas, A. R., Rodrigues, C., and Peixe, L. (2019). Fourier transform infrared spectroscopy: unlocking fundamentals and prospects for bacterial strain typing. Eur. J. Clin. Microbiol. Infect. Dis. 38, 427–448. doi:10.1007/s10096-018-3431-3.
Nyonyo, T., Shinkai, T., and Mitsumori, M. (2014). Improved culturability of cellulolytic rumen bacteria and phylogenetic diversity of culturable cellulolytic and xylanolytic bacteria newly isolated from the bovine rumen. FEMS Microbiol. Ecol. 88, 528–537. doi:10.1111/1574-6941.12318.
Orpin, C. G., and Munn, E. A. (1973). The occurrence of bacteriophages in the rumen and their influence on rumen bacterial populations. Experientia 30, 1018–1021.
Owens, F. N., and Basalan, M. (2016). “Ruminal Fermentation,” in Rumenology, eds. D. Millen, M. De Beni Arrigoni, and R. Lauritano Pacheco (Cham: Springer), 63–102.
Papadimitriou, K., Alegría, Á., Bron, P. A., de Angelis, M., Gobbetti, M., Kleerebezem, M., et al. (2016). Stress Physiology of Lactic Acid Bacteria. Microbiol. Mol. Biol. Rev. 80, 837–890. doi:10.1128/mmbr.00076-15.
Parker, D. S., Lomax, M. A., Seal, C. J., and Wilton, J. C. (1995). Metabolic implications of ammonia production in the ruminant. Proc. Nutr. Soc. 54, 549–563. doi:10.1079/PNS19950023.
Paster, B. J., Russell, J. B., Yang, C. M., Chow, J. M., Woese, C. R., and Tanner, R. (1993). Phylogeny of the ammonia-producing ruminal bacteria Peptostreptococcus anaerobius, Clostridium sticklandii, and Clostridium aminophilum sp. nov. Int. J. Syst. Bacteriol. 43, 107–110. doi:10.1099/00207713-43-1-107.
Patra, A. K., Min, B.-R., and Saxena, J. (2012). “Dietary Tannins on Microbial Ecology of the Gastrointestinal Tract in Ruminants,” in Dietary Phytochemicals and Microbes, ed. A. K. Patra (Dordrecht: Springer Netherlands), 237–262. doi:10.1007/978-94-007-3926-0_8.
Pearson, W. R. (2013). An Introduction to Sequence Similarity (“Homology”) Searching. Curr. Protoc. Bioinforma. 42, Unit3.1. doi:10.1002/0471250953.bi0301s42.
Pedersen, T. L. (2019). ggforce: Accelerating “ggplot2.” Available at: https://cran.r-project.org/package=ggforce.
Pengpeng, W., and Tan, Z. (2013). Ammonia Assimilation in Rumen Bacteria: A Review. Anim. Biotechnol. 24, 107–128. doi:10.1080/10495398.2012.756402.
Petibois, C., and Desbat, B. (2010). Clinical application of FTIR imaging: New reasons for hope. Trends Biotechnol. 28, 495–500. doi:10.1016/j.tibtech.2010.07.003.
Philipson, C. W., Voegtly, L. J., Lueder, M. R., Long, K. A., Rice, G. K., Frey, K. G., et al. (2018). Characterizing phage genomes for therapeutic applications. Viruses 10, 1–20. doi:10.3390/v10040188.
Pittman, K. A., and Bryant, M. P. (1964). Peptides and Other Nitrogen Sources for Growth of Bacteroides. J. Bacteriol. 88, 401–410.
Place, S. E., Stackhouse, K. R., Wang, Q., and Mitloehner, F. M. (2011). Mitigation of greenhouse gas emissions from U.S. beef and dairy production systems. ACS Symp. Ser. 1072, 443–457. doi:10.1021/bk-2011-1072.ch023.
Potter, A. J., and Paton, J. C. (2014). Spermidine biosynthesis and transport modulate pneumococcal autolysis. J. Bacteriol. 196, 3556–3561. doi:10.1128/JB.01981-14.
Puniya, A. K., Singh, R., and Kamra, D. N. (2015). Rumen microbiology: From evolution to revolution. doi:10.1007/978-81-322-2401-3.
Purushe, J., Fouts, D. E., Morrison, M., White, B. A., Mackie, R. I., Coutinho, P. M., et al. (2010).
Comparative Genome Analysis of Prevotella ruminicola and Prevotella bryantii: Insights into Their Environmental Niche. Microb. Ecol. 60, 721–729. doi:10.1007/s00248-010-9692-8.
Quince, C., Walker, A. W., Simpson, J. T., Loman, N. J., and Segata, N. (2017). Shotgun metagenomics, from sampling to analysis. Nat. Biotechnol. 35, 833–844. doi:10.1038/nbt.3935.
Quinlan, A. R., and Hall, I. M. (2010). BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842. doi:10.1093/bioinformatics/btq033.
R Core Team (2018). R: A language and environment for statistical computing. Available at: https://www.r-project.org/.
Rice, P., Longden, L., and Bleasby, A. (2000). EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277. doi:10.1016/S0168-9525(00)02024-2.
Richardson, A. J., McKain, N., and Wallace, R. J. (2013). Ammonia production by human faecal bacteria, and the enumeration, isolation and characterization of bacteria capable of growth on peptides and amino acids. BMC Microbiol. 13, 6. doi:10.1186/1471-2180-13-6.
Robinson, T. P., Thornton, P. K., Franceschini, G., Kruska, R. L., Chiozza, F., Notenbaert, A., et al. (2011). Global livestock production systems. Rome: Food and Agriculture Organization of the United Nations (FAO) and International Livestock Research Institute (ILRI). Available at: http://www.fao.org/docrep/014/i2414e/i2414e.pdf.
Rogosa, M. (1969). Acidaminococcus gen. n., Acidaminococcus fermentans sp. n., anaerobic Gram-negative diplococci using amino acids as the sole energy source for growth. J. Bacteriol. 98, 756–766.
Rogosa, M. (1971). Transfer of Peptostreptococcus elsdenii Gutierrez et al. to a New Genus, Megasphaera [M. elsdenii (Gutierrez et al.) comb. nov.]. Int. J. Syst. Bacteriol. 21, 187–189. doi:10.1099/00207713-21-2-187.
Ross, E. M., Petrovski, S., Moate, P. J., and Hayes, B. J. (2013). Metagenomics of rumen bacteriophage from thirteen lactating dairy cattle. BMC Microbiol. 13, 242. doi:10.1186/1471-2180-13-242.
RStudio Team (2016). RStudio: Integrated Development for R. Available at: http://www.rstudio.com/.
Rubino, F., Carberry, C., Waters, S. M., Kenny, D., McCabe, M. S., and Creevey, C. J. (2017). Divergent functional isoforms drive niche specialisation for nutrient acquisition and use in rumen microbiome. ISME J. 11, 932–944. doi:10.1038/ismej.2016.172.
Rubino, F., Carberry, C., Waters, S. M., Kenny, D., McCabe, M. S., and Creevey, C. J. (2016). Divergent functional isoforms drive niche specialisation for nutrient acquisition and use in rumen microbiome. Nat. Publ. Gr. 13. doi:10.1038/ismej.2016.172.
Russell, D. A. (2018). “Sequencing, Assembling and Finishing Complete Bacteriophage Genomes,” in Bacteriophages. Methods in Molecular Biology Volume 3, eds. M. R. J. Clokie, A. M. Kropinski, and R. Lavigne (New York: Humana Press), 109–125.
Russell, J. B. (1983). Fermentation of peptides by Bacteroides ruminicola B14. Appl. Environ. Microbiol. 45, 1566–1574.
Russell, J. B. (2005). Enrichment of fusobacteria from the rumen that can utilize lysine as an energy source for growth. Anaerobe 11, 177–184. doi:10.1016/j.anaerobe.2005.01.001.
Russell, J. B., Bottje, W. G., and Cotta, M. A. (1981). Degradation of protein by mixed cultures of rumen bacteria: identification of Streptococcus bovis as an actively proteolytic rumen bacterium. J. Anim. Sci. 53, 242–252. doi:10.2527/jas1981.531242x.
Russell, J. B., and Hespell, R. B. (1981). Microbial Rumen Fermentation. J. Dairy Sci. 64, 1153–1169. doi:10.3168/jds.S0022-0302(81)82694-X.
Russell, J. B. (2002). Rumen Microbiology and Its Role in Ruminant Nutrition. Ithaca, New York: James B. Russell. Available at: https://www.ars.usda.gov/research/software/download/?softwareid=409.
Russell, J. B., and Martin, S. A. (1984).
Effects of Various Methane Inhibitors on the Fermentation of Amino Acids by Mixed Rumen Microorganisms in Vitro. J. Anim. Sci. 59, 1329–1338.
Russell, J. B., Muck, R. E., and Weimer, P. J. (2009). Quantitative analysis of cellulose degradation and growth of cellulolytic bacteria in the rumen. FEMS Microbiol. Ecol. 67, 183–197. doi:10.1111/j.1574-6941.2008.00633.x.
Russell, J. B., and Robinson, P. H. (1984). Compositions and Characteristics of Strains of Streptococcus bovis. J. Dairy Sci. 67, 1525–1531. doi:10.3168/jds.S0022-0302(84)81471-X.
Russell, J. B., and Rychlik, J. L. (2001). Factors that alter rumen microbial ecology. Science 292, 1119–1122. doi:10.1126/science.1058830.
Russell, J. B., Strobel, H. J., and Chen, G. J. (1988). Enrichment and isolation of a ruminal bacterium with a very high specific activity of ammonia production. Appl. Environ. Microbiol. 54, 872–877.
Russell, J. B., and Wallace, R. J. (1997). “Energy-Yielding and Energy-Consuming Reactions,” in The rumen microbial ecosystem, eds. P. N. Hobson and C. S. Stewart (London: Blackie Academic & Professional), 246–282.
Rychlik, J. L., Lavera, R., and Russell, J. B. (2002). Amino Acid Deamination by Ruminal Megasphaera elsdenii Strains. Curr. Microbiol. 45, 340–345. doi:10.1007/s00284-002-3743-4.
Rychlik, J. L., and Russell, J. B. (2000). Mathematical estimations of hyper-ammonia producing ruminal bacteria and evidence for bacterial antagonism that decreases ruminal ammonia production. FEMS Microbiol. Ecol. doi:10.1016/S0168-6496(00)00021-0.
Rychlik, J. L., and Russell, J. B. (2002). The adaptation and resistance of Clostridium aminophilum F to the butyrivibriocin-like substance of Butyrivibrio fibrisolvens JL5 and monensin. FEMS Microbiol. Lett. 209, 89–94. doi:10.1111/j.1574-6968.2002.tb11115.x.
Saier, M. H., Reddy, V. S., Tsu, B. V., Ahmed, M. S., Li, C., and Moreno-Hagelsieb, G. (2016). The Transporter Classification Database (TCDB): Recent advances. Nucleic Acids Res. 44, D372–D379. doi:10.1093/nar/gkv1103.
Sales, M., Lucas, F., and Blanchart, G. (2000). Effects of ammonia and amino acids on the growth and proteolytic activity of three species of rumen bacteria: Prevotella albensis, Butyrivibrio fibrisolvens, and Streptococcus bovis. Curr. Microbiol. 40, 380–386. doi:10.1007/s002840010074.
Saluzzi, L., Flint, H. J., and Stewart, C. S. (2001). Adaptation of Ruminococcus flavefaciens resulting in increased degradation of ryegrass cell walls. FEMS Microbiol. Ecol. 36, 131–137. doi:10.1016/S0168-6496(01)00125-8.
Sampaio, M., Rocha, M., Oliveira, H., and Dias, O. (2019). Predicting promoters in phage genomes using PhagePromoter. Bioinformatics. doi:10.1093/bioinformatics/btz580.
Sangavai, C., and Chellapandi, P. (2017). Amino acid catabolism-directed biofuel production in Clostridium sticklandii: An insight into model-driven systems engineering. Biotechnol. Reports 16, 32–43. doi:10.1016/j.btre.2017.11.002.
Santesmasses, D., Mariotti, M., and Guigó, R. (2017). Computational identification of the selenocysteine tRNA (tRNASec) in genomes. PLoS Comput. Biol. 13. doi:10.1371/journal.pcbi.1005383.
Schirmann, K., Chapinal, N., Weary, D. M., Heuwieser, W., and von Keyserlingk, M. A. G. (2012). Rumination and its relationship to feeding and lying behavior in Holstein dairy cows. J. Dairy Sci. 95, 3212–3217. doi:10.3168/jds.2011-4741.
Schmidt, C. (2019). Phage therapy’s latest makeover. Nat. Biotechnol. doi:10.1038/s41587-019-0133-z.
Schultz, J. E., and Matin, A. (1991). Molecular and functional characterization of a carbon starvation gene of Escherichia coli. J. Mol. Biol. 218, 129–140. doi:10.1016/0022-2836(91)90879-B.
Seemann, T. (2014). Prokka: Rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069. doi:10.1093/bioinformatics/btu153.
Seshadri, R., Leahy, S. C., Attwood, G. T., Teh, K. H., Lambie, S. C., Cookson, A. L., et al. (2018). Cultivation and sequencing of rumen microbiome members from the Hungate1000 Collection.
Nat. Biotechnol. 36, 359–367. doi:10.1038/nbt.4110. Shah, H. N., and Collins, D. M. (1990). NOTES: Prevotella, a New Genus To Include Bacteroides melaninogenicus and Related Species Formerly Classified in the Genus Bacteroides. Int. J. Syst. Bacteriol. 40, 205–208. doi:10.1099/00207713-40-2-205. Silanikove, N., and Tadmor, A. (1989). Rumen volume, saliva flow rate, and systemic fluid homeostasis in dehydrated cattle. Am. J. Physiol. 256, R809-15. Available at: http://www.ncbi.nlm.nih.gov/pubmed/2705570. Slowikowski, K. (2019). ggrepel: Automatically Position Non-Overlapping Text Labels with “ggplot2.” Available at: https://cran.r-project.org/package=ggrepel. Smith, P., Bustamante, M., Ahammad, H., Clark, H., Dong, H., Elsidigg, E. A., et al. (2014). “Agriculture, Forestry and Other Land Use (AFOLU),” in Climate Change 2014: Mitigation of Climate Change. Contribution of Working Group III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change, eds. O. Edenhofer, R. Pichs-Madruga, Y. Sokona, E. Farahani, S. Kadner, K. Seyboth, et al. (Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA), 811–922. Solovyev, V., and Salamov, A. (2011). “Automatic annotation of microbial genomes and metagenomic sequences,” in Metagenomics and its Applications in Agriculture, Biomedicine and Environmental Studies. Soriani, N., Panella, G., and Calamari, L. (2013). Rumination time during the summer season and its relationships with metabolic conditions and milk production. J. Dairy Sci. 96, 5082–94. doi:10.3168/jds.2013-6620. Soto, R. C., Muhammed, S. A., Newbold, C. J., Stewart, C. S., and Wallace, R. J. (1994). Influence of peptides, amino acids and urea on microbial activity in the rumen of sheep receiving grass hay and on

252 Chapter 10 - Appendix the growth of rumen bacteria in vitro. Anim. Feed Sci. Technol. 49, 151–161. doi:10.1016/0377- 8401(94)90088-4. Stackebrandt, E., and Hippe, H. (1986). Transfer of Bacteroides amylophilus to a new genus Ruminobacter gen. nov., nom. rev. as Ruminobacter amylophilus comb. nov. Syst. Appl. Microbiol. doi:10.1016/S0723-2020(86)80078-9. Stadtman, T. C. (1954). On the metabolism of an amino acid fermenting Clostridium. J. Bacteriol. 67, 314–320. Stadtman, T. C., and McClung, L. S. (1957). Clostridium sticklandii nov. spec. J. Bacteriol. 73, 218– 219. Stahl, D. A., Flesher, B., Mansfield, H. R., and Montgomery, L. (1988). Use of phylogenetically based hybridization probes for studies of ruminal microbial ecology. Appl. Environ. Microbiol. 54, 1079– 1084. Stamatakis, A. (2014). RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313. doi:10.1093/bioinformatics/btu033. Stark, W. M. (2017). Making serine integrases work for us. Curr. Opin. Microbiol. 38, 130–136. doi:10.1016/j.mib.2017.04.006. Stewart, R. D., Auffret, M. D., Warr, A., Walker, A. W., Roehe, R., and Watson, M. (2019). Compendium of 4,941 rumen metagenome-assembled genomes for rumen microbiome biology and enzyme discovery. Nat. Biotechnol. 37, 953–961. doi:10.1038/s41587-019-0202-3. Stewart, R. D., Auffret, M. D., Warr, A., Wiser, A. H., Press, M. O., Langford, K. W., et al. (2018). Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen. Nat. Commun. 9, 1–11. doi:10.1038/s41467-018-03317-6. Suen, G., Mikhailova, N., Ivanova, N. N., Currie, C. R., Stevenson, D. M., Weimer, P. J., et al. (2011). The Complete Genome Sequence of Fibrobacter succinogenes S85 Reveals a Cellulolytic and Metabolic Specialist. PLoS One. doi:10.1371/journal.pone.0018814. Sutherland, I. A. (2007). Recent progress on the industrial scale-up of counter-current chromatography. J. Chromatogr. A 1151, 6–13. 
doi:10.1016/j.chroma.2007.01.143. Swain, R. A., Nolan, J. V., and Klieve, A. V (1996). Natural variability and diurnal fluctuations within the bacteriophage population of the rumen. Appl. Environ. Microbiol. 62, 994–997. Tamada, H., Harasawa, R., and Shinko, T. (1985). Isolation of Bacteriophage in Fusobacterium necrophorum. Japanese J. Vetenary Sci. 47, 483–486. Available at: https://www.jstage.jst.go.jp/article/jvms1939/47/3/47_3_483/_pdf. Tamminga, S. (1979). Protein degradation in the forestomachs of ruminants. J. Anim. Sci. 79, 1615– 1630. doi:10.2527/jas1979.4961615x. Tamminga, S. (1981). Nitrogen and Amino Acid Metabolism in dairy cows. Tan, Z., and Murphy, M. (2004). Ammonia production, ammonia absorption, and urea recycling in ruminants. A review. J. Anim. Feed Sci. 13, 389–404. doi:10.22358/jafs/67425/2004. Tanaka, H., and Stadtman, T. C. (1978). Selenium-Dependent Clostridial Glycine Reductase. Methods Enzymol. 53, 373–382. doi:10.1016/S0076-6879(78)53043-7. Tapio, I., Snelling, T. J., Strozzi, F., and Wallace, R. J. (2017). The ruminal microbiome associated with methane emissions from ruminant livestock. J. Anim. Sci. Biotechnol. 8, 7. doi:10.1186/s40104- 017-0141-0. Taylor, B. L., and Zhulin, I. B. (1999). PAS domains: internal sensors of oxygen, redox potential, and light. Microbiol. Mol. Biol. Rev. 63, 479–506. Available at: http://www.ncbi.nlm.nih.gov/pubmed/10357859 [Accessed January 29, 2020]. Tschuor, A., and Clauss, M. (2008). Investigations on the stratification of forestomach contents in ruminants: An ultrasonographic approach. Eur. J. Wildl. Res. 54, 627–633. doi:10.1007/s10344-008- 0188-5.

Ungerfeld, E. M. (2020). Metabolic Hydrogen Flows in Rumen Fermentation: Principles and Possibilities of Interventions. Front. Microbiol. 11. doi:10.3389/fmicb.2020.00589.
Vandenheuvel, D., Rombouts, S., and Adriaenssens, E. M. (2018). “Purification of Bacteriophages Using Anion-Exchange Chromatography,” in Bacteriophages: Methods and Protocols, Volume 3, eds. M. R. J. Clokie, A. M. Kropinski, and R. Lavigne (New York: Springer), 59–69. doi:10.1007/978-1-4939-7343-9_5.
Vargas, J. E., Andrés, S., López-Ferreras, L., Snelling, T. J., Yáñez-Ruíz, D. R., García-Estrada, C., et al. (2020). Dietary supplemental plant oils reduce methanogenesis from anaerobic microbial fermentation in the rumen. Sci. Rep. 10, 1613. doi:10.1038/s41598-020-58401-z.
Veesler, D., and Cambillau, C. (2011). A Common Evolutionary Origin for Tailed-Bacteriophage Functional Modules and Bacterial Machineries. Microbiol. Mol. Biol. Rev. 75, 423–433. doi:10.1128/mmbr.00014-11.
Wacker, T., Garcia-Celma, J. J., Lewe, P., and Andrade, S. L. A. (2014). Direct observation of electrogenic NH4+ transport in ammonium transport (Amt) proteins. Proc. Natl. Acad. Sci. 111, 9995–10000. doi:10.1073/pnas.1406409111.
Walker, B. J., Abeel, T., Shea, T., Priest, M., Abouelliel, A., Sakthikumar, S., et al. (2014). Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS One 9, e112963. doi:10.1371/journal.pone.0112963.
Walker, N. D., Newbold, C. J., and Wallace, R. J. (2005). “Nitrogen Metabolism in the Rumen,” in Nitrogen and Phosphorus Nutrition of Cattle, eds. E. Pfeffer and A. Hristov (CABI), 71–116. Available at: https://ebookcentral.proquest.com/lib/aber/detail.action?docID=289439.
Wallace, R. J. (1994). “Amino Acid and Protein Synthesis, Turnover, and Breakdown by Ruminal Microorganisms,” in Principles of Protein Nutrition in Ruminants, ed. J. M. Asplund (Florida: CRC Press).
Wallace, R. J. (1996). Ruminal Microbial Metabolism of Peptides and Amino Acids. J. Nutr. 126, 1326S–1334S. Available at: http://jn.nutrition.org/content/126/4_Suppl/1326S.long.
Wallace, R. J., and Brammall, M. L. (1985). The Role of Different Species of Bacteria in the Hydrolysis of Protein in the Rumen. Microbiology 131, 821–832. doi:10.1099/00221287-131-4-821.
Wallace, R. J., Chaudhary, L. C., Miyagawa, E., McKain, N., and Walker, N. D. (2004). Metabolic properties of Eubacterium pyruvativorans, a ruminal “hyper-ammonia-producing” anaerobe with metabolic properties analogous to those of Clostridium kluyveri. Microbiology 150, 2921–2930. doi:10.1099/mic.0.27190-0.
Wallace, R. J., and McKain, N. (1991). A survey of peptidase activity in rumen bacteria. J. Gen. Microbiol. 137, 2259–2264. doi:10.1099/00221287-137-9-2259.
Wallace, R. J., McKain, N., Broderick, G. A., Rode, L. M., Walker, N. D., Newbold, C. J., et al. (1997a). Peptidases of the Rumen Bacterium, Prevotella ruminicola. Anaerobe 3, 35–42.
Wallace, R. J., McKain, N., McEwan, N. R., Miyagawa, E., Chaudhary, L. C., King, T. P., et al. (2003). Eubacterium pyruvativorans sp. nov., a novel non-saccharolytic anaerobe from the rumen that ferments pyruvate and amino acids, forms caproate and utilizes acetate and propionate. Int. J. Syst. Evol. Microbiol. 53, 965–970. doi:10.1099/ijs.0.02110-0.
Wallace, R., Onodera, R., and Cotta, M. A. (1997b). “Metabolism of Nitrogen-Containing Compounds,” in The Rumen Microbial Ecosystem, eds. P. N. Hobson and C. S. Stewart (London: Blackie Academic & Professional), 283–328.
Wang, Y., Youssef, N. H., Couger, M. B., Hanafy, R. A., Elshahed, M. S., and Stajich, J. E. (2019). Molecular Dating of the Emergence of Anaerobic Rumen Fungi and the Impact of Laterally Acquired Genes. mSystems 4. doi:10.1128/msystems.00247-19.
Warner, A. C. I. (1962). Some factors influencing the rumen microbial population. Microbiology 28, 129–146. doi:10.1099/00221287-28-1-129.
Weimer, P. J. (2015). Redundancy, resilience, and host specificity of the ruminal microbiota: Implications for engineering improved ruminal fermentations. Front. Microbiol. 6, 296. doi:10.3389/fmicb.2015.00296.
Weimer, P. J., and Moen, G. N. (2013). Quantitative analysis of growth and volatile fatty acid production by the anaerobic ruminal bacterium Megasphaera elsdenii T81. Appl. Microb. Cell Physiol. 97, 4075–4081. doi:10.1007/s00253-012-4645-4.
Welch, J. G. (1982). Rumination, Particle Size and Passage from the Rumen. J. Anim. Sci. 54, 885–894. doi:10.2134/jas1982.544885x.
Wenning, M., Breitenwieser, F., Konrad, R., Huber, I., Busch, U., and Scherer, S. (2014). Identification and differentiation of food-related bacteria: A comparison of FTIR spectroscopy and MALDI-TOF mass spectrometry. J. Microbiol. Methods 103, 44–52. doi:10.1016/j.mimet.2014.05.011.
Wenning, M., and Scherer, S. (2013). Identification of microorganisms by FTIR spectroscopy: Perspectives and limitations of the method. Appl. Microbiol. Biotechnol. 97, 7111–7120. doi:10.1007/s00253-013-5087-3.
Whitehead, T. R., and Cotta, M. A. (2004). Isolation and Identification of Hyper-Ammonia Producing Bacteria from Swine Manure Storage Pits. Curr. Microbiol. 48, 20–26. doi:10.1007/s00284-003-4084-7.
Whitehead, T. R., Cotta, M. A., Falsen, E., Moore, E., and Lawson, P. A. (2011). Peptostreptococcus russellii sp. nov., isolated from a swine-manure storage pit. Int. J. Syst. Evol. Microbiol. 61, 1875–1879. doi:10.1099/ijs.0.023762-0.
Wickham, H. (2007). Reshaping Data with the reshape Package. J. Stat. Softw. 21, 1–20. Available at: http://www.jstatsoft.org/v21/i12/.
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Available at: http://ggplot2.org.
Wickham, H., François, R., Henry, L., and Müller, K. (2019). dplyr: A Grammar of Data Manipulation. Available at: https://cran.r-project.org/package=dplyr.
Wilkinson, T. J., Huws, S. A., Edwards, J. E., Kingston-Smith, A. H., Siu-Ting, K., Hughes, M., et al. (2018). CowPI: A Rumen Microbiome Focussed Version of the PICRUSt Functional Inference Software. Front. Microbiol. 9, 1095. doi:10.3389/fmicb.2018.01095.
Williams, S. K. R., Runyon, J. R., and Ashames, A. A. (2011). Field-flow fractionation: Addressing the nano challenge. Anal. Chem. 83, 634–642. doi:10.1021/ac101759z.
Williams, Y. J., Popovski, S., Rea, S. M., Skillman, L. C., Toovey, A. F., Northwood, K. S., et al. (2009). A vaccine against rumen methanogens can alter the composition of archaeal populations. Appl. Environ. Microbiol. 75, 1860–1866. doi:10.1128/AEM.02453-08.
Wolin, M. J. (1981). Fermentation in the rumen and human large intestine. Science 213, 1463–1468.
Wommack, K. E., Sime-Ngando, T., Winget, D. M., Jamindar, S., and Helton, R. R. (2010). “Filtration-based methods for the collection of viral concentrates from large water samples,” in Manual of Aquatic Viral Ecology (American Society of Limnology and Oceanography), 110–117.
Wright, A. D. G., and Klieve, A. V. (2011). Does the complexity of the rumen microbial ecology preclude methane mitigation? Anim. Feed Sci. Technol. 166–167, 248–253. doi:10.1016/j.anifeedsci.2011.04.015.
Wu, S. H. W., and Papas, A. (1997). Rumen-stable delivery systems. Adv. Drug Deliv. Rev. 28, 323–334. doi:10.1016/S0169-409X(97)00087-2.
Yang, C. M. J., and Russell, J. B. (1993). Effect of monensin on the specific activity of ammonia production by ruminal bacteria and disappearance of amino nitrogen from the rumen. Appl. Environ. Microbiol. 59, 3250–3254.
Yazaki, K. (1981). Electron microscopic studies of bacteriophage phi X174 intact and “eclipsing” particles, and the genome by the staining, and shadowing method. J. Virol. Methods 2, 159–167. Available at: http://www.ncbi.nlm.nih.gov/pubmed/6168647.
Zehavi, T., Probst, M., and Mizrahi, I. (2018). Insights into culturomics of the rumen microbiome. Front. Microbiol. doi:10.3389/fmicb.2018.01999.
Zhang, L., Huang, X., Xue, B., Peng, Q., Wang, Z., Yan, T., et al. (2015). Immunization against rumen methanogenesis by vaccination with a new recombinant protein. PLoS One 10. doi:10.1371/journal.pone.0140086.
Zhang, Z., Schwartz, S., Wagner, L., and Miller, W. (2000). A greedy algorithm for aligning DNA sequences. J. Comput. Biol. doi:10.1089/10665270050081478.
Zheng, L., Kostrewa, D., Bernèche, S., Winkler, F. K., and Li, X. D. (2004). The mechanism of ammonia transport based on the crystal structure of AmtB of Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 101, 17090–17095. doi:10.1073/pnas.0406475101.
Ziegler, C., Bremer, E., and Krämer, R. (2010). The BCCT family of carriers: from physiology to crystal structure. Mol. Microbiol. doi:10.1111/j.1365-2958.2010.07332.x.
Zientz, E., Six, S., and Unden, G. (1996). Identification of a third secondary carrier (DcuC) for anaerobic C4-dicarboxylate transport in Escherichia coli: Roles of the three Dcu carriers in uptake and exchange. J. Bacteriol. 178, 7241–7247. doi:10.1128/jb.178.24.7241-7247.1996.

10. Appendix


Appendix Figure 10.1 Growth curves of HAP, NAP and SAP cultures, all grown in the same conditions. Multiple lines and different symbols indicate multiple replicates. Some slow-growing cultures were measured in two parts to capture growth across a 24 or 48 hour period, as required.

Appendix Table 10.1 Presence of characteristic genes from Acetoanaerobium sticklandii DSM519 in the HAPs, NAPs and SAPs. + indicates presence, based on a BLASTp search with over 30% identity over at least 80% of the length of the A. sticklandii DSM519 gene. Characteristic gene products are those listed in Fonknechten et al. (2010).

Columns: Characteristic gene products; Protein symbol; Label (Acetoanaerobium sticklandii); presence (+) in Acetoanaerobium sticklandii ATCC12662, Clostridium aminophilum ATCC49906, Eubacterium pyruvativorans isolate6, Butyrivibrio fibrisolvens DSM 3071, Megasphaera elsdenii T81, Prevotella ruminicola ATCC19189, Ruminococcus albus SY3, Ruminococcus flavefaciens 007c, Fibrobacter succinogenes S85

Sec insertion machinery:
Selenocysteine synthase; SelA; CLOST_1358; + + +
Selenocysteine-specific elongation factor; SelB; CLOST_1357; + + +
Selenophosphate synthase; SelD; CLOST_1359; + + + +
SeC(p) tRNA; SelC; CLOST_tRNA59; + + +
Proline reductase:
D-Proline reductase proprotein; PrdA; CLOST_2234; + + +
D-Proline selenoprotein; PrdB; CLOST_2232; +
Sec-containing electron transfer protein; PrdC; CLOST_2236; + + + +
Glycine reductase:
Substrate-specific activating selenoprotein; GrdB; CLOST_1113; + +
Substrate-specific activating selenoprotein; GrdB; CLOST_1114;
GrdB-stabilizing proprotein; GrdE; CLOST_1110; + +
Redox-active selenoprotein forming a carboxymethyl-selenoether; GrdA; CLOST_1112;
Redox-active selenoprotein forming a carboxymethyl-selenoether; GrdA; CLOST_1111;
Protein forming a protein-bound acetyl-ester from the GrdA-carboxymethyl-selenoether; GrdC; CLOST_1115; + +
Protein releasing the protein-bound acetyl group as acetyl-phosphate; GrdD; CLOST_1116; + +
Glycine cleavage system:
Aminomethyltransferase; GcvT; CLOST_0426; + + +
Lipoylprotein; GcvH; CLOST_0427; + + +
Lipoylprotein; GcvH; CLOST_1127; + + + + + + + + +
Glycine dehydrogenase; GcvP; CLOST_0428; + + +
Glycine dehydrogenase; GcvP; CLOST_0429; + + +
Lipoamide dehydrogenase; GcvL; CLOST_1166; + + + +
Threonine catabolism pathways:
Threonine dehydrogenase; Tdh; CLOST_1621; +
Threonine aldolase; ItaE; CLOST_0572; +
Threonine dehydratase; TdcB; CLOST_0395; + + + + + +
Arginine deiminase pathway:
Arginine deiminase; ArcA; CLOST_0926; +
Ornithine carbamoyltransferase; ArcB; CLOST_0927; + + + + + + + +
Carbamate kinase; ArcC; CLOST_0928; + + +
Ornithine reductive pathway:
Ornithine cyclodeaminase; ArcB; CLOST_1603; + +
Proline racemase; PrdF; CLOST_2228; + + +
Ornithine oxidative pathway:
Ornithine racemase; Orr; CLOST_1288; + +
Ornithine aminomutase; OraE; CLOST_1290; +
Ornithine aminomutase; OraS; CLOST_1291; +
2,4-diaminopentanoate dehydrogenase; Ord; CLOST_1294; +
2-amino-4-ketopentanoate thiolase (a and b subunits); OrtA; CLOST_1292; +
2-amino-4-ketopentanoate thiolase (a and b subunits); OrtB; CLOST_1293; +
Lysine fermentation pathway:
L-lysine 2,3-aminomutase; KamA; CLOST_1382; +
b-L-lysine-5,6-aminomutase (a and b subunits); KamD; CLOST_1379; +
b-L-lysine-5,6-aminomutase (a and b subunits); KamE; CLOST_1378; +
3,5-diaminohexanoate dehydrogenase; Kdd; CLOST_1383; +
3-keto-5-aminohexanoate cleavage enzyme; Kce; CLOST_1384; +
3-aminobutyryl-CoA ammonia lyase; Kal; CLOST_1385; +
Acetoacetate:butyrate CoA transferase (a and b subunits); AtoA; CLOST_1124; + + +
Acetoacetate:butyrate CoA transferase (a and b subunits); AtoD; CLOST_1123; + +
Serine catabolism pathway:
L-serine dehydratase; SdhA; CLOST_1364; + + + + +
L-serine dehydratase; SdhB; CLOST_1365; + + + + +
Cysteine catabolism pathway:
Putative L-cysteine sulphide lyase; ; CLOST_2039; + + + + + + + + +
Butyrate fermentation (from acetyl-CoA):
Acetyl-CoA acetyl transferase; AtoB; CLOST_1134; + + + + +
3-Hydroxybutyryl-CoA dehydrogenase; Hbd; CLOST_1133; + + + + +
Crotonase; Crt; CLOST_1132; + + + + + +
Butyryl-CoA dehydrogenase; Bcd; CLOST_1135; + + + + + + +
Electron transfer flavoprotein; EtfA; CLOST_1137; + + + + + + +
Electron transfer flavoprotein; EtfB; CLOST_1136; + + + + + + +
Oxidative stress response:
Mn-superoxide dismutase; SodA; CLOST_1948; + + +
Superoxide reductase; SorA; CLOST_1779; +
Alkyl hydroperoxide reductase; YkuU; CLOST_2030; +
Glutathione peroxidase; BtuE; CLOST_0978; + + + + + + + +
Glutathione peroxidase; BtuE; CLOST_2446; + + + + + + + +
Selenoperoxiredoxin; PrxU; CLOST_2406;
Thioredoxin dependent peroxidase; ; CLOST_1360; +
Methionine sulfoxide reductase A; MsrA; CLOST_1083; + +
Methionine sulfoxide reductase B; MsrB; CLOST_2458; +
Peroxide-responsive repressor; PerR; CLOST_1148; + + + + +
Wood-Ljungdahl pathway:
Carbon monoxide dehydrogenase; CODH-beta; CLOST_1160; + +
Acetyl-CoA synthase; CODH-alpha; CLOST_1171; +
Rnf complex:
Electron transport complex; RnfB; CLOST_1397; + + + + + +
Putative inner membrane subunit; RnfA; CLOST_1398; + + + + + + +
Putative inner membrane NADH-quinone reductase; RnfE; CLOST_1399; + + + + + + +
Electron transport complex protein precursor; RnfG; CLOST_1400; + + + + +
Putative inner membrane oxidoreductase; RnfD; CLOST_1401; + + + + + + + +
Electron transport complex protein; RnfC; CLOST_1402; + + + + + + + +
Hydrogenases and maturation proteins:
Putative catalytic subunit of iron-only hydrogenase; HymC; CLOST_0907; + + + + + +
Putative iron-only hydrogenase, electron-transfer subunit; HymB; CLOST_0908; + + + +
Putative iron-only hydrogenase, electron-transfer subunit; HymA; CLOST_0909; + + + + +
Putative iron-only hydrogenase, electron-transfer subunit; HymA; CLOST_1663; + + + +
Periplasmic iron-only hydrogenase; HydA; CLOST_0839; + + + + + +
Iron-only hydrogenase maturation protein; HydF; CLOST_0845; + + + + +
Fe-hydrogenase assembly protein; HydG; CLOST_0846; + +
Iron-only hydrogenase maturation protein; HydE; CLOST_0847; + + + + +
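The presence criterion stated in the caption of Appendix Table 10.1 (over 30% identity across at least 80% of the length of the A. sticklandii DSM519 gene) can be sketched as a filter on tabular BLASTp hits. This is only an illustration and not the pipeline used in this work: the five-column layout (query ID, subject ID, percent identity, alignment length, query length), the subject names and the example values are all hypothetical, and alignment length is used as an approximation of query coverage.

```python
import csv
import io

MIN_IDENTITY = 30.0  # percent identity threshold from the table caption
MIN_COVERAGE = 0.80  # fraction of the query gene length that must be aligned


def is_present(pident, aln_length, query_length):
    """Return True when a hit exceeds 30% identity over at least 80% of the query."""
    coverage = aln_length / query_length
    return pident > MIN_IDENTITY and coverage >= MIN_COVERAGE


def present_genes(tsv_text):
    """Collect the query IDs that have at least one hit passing both thresholds."""
    found = set()
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for qid, _sid, pident, length, qlen in reader:
        if is_present(float(pident), int(length), int(qlen)):
            found.add(qid)
    return found


# Hypothetical hits: only the first row passes both thresholds.
EXAMPLE_HITS = (
    "CLOST_1358\thypothetical_001\t45.2\t430\t460\n"
    "CLOST_1357\thypothetical_002\t28.0\t600\t630\n"
    "CLOST_2234\thypothetical_003\t55.0\t200\t630\n"
)

print(sorted(present_genes(EXAMPLE_HITS)))
```

A gene is reported once it has any passing hit, which mirrors the presence/absence marks in the table rather than counting paralogues.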

Appendix Table 10.2 Table of 40 universal single-copy gene families. Table information taken from Creevey et al. (2011).

Gene family ID; Description
COG0012; GTP Binding Protein
COG0016; Phenylalanyl-tRNA synthetase alpha subunit
COG0018; Arginyl-tRNA synthetase
COG0048; Ribosomal protein S12
COG0049; Ribosomal protein S7
COG0052; Ribosomal protein S2
COG0080; Ribosomal protein L11
COG0081; Ribosomal protein L1
COG0085; DNA-directed RNA polymerase
COG0087; Ribosomal protein L3
COG0088; Ribosomal protein L4
COG0090; Ribosomal protein L2
COG0091; Ribosomal protein L22
COG0092; Ribosomal protein S3
COG0093; Ribosomal protein L14
COG0094; Ribosomal protein L5
COG0096; Ribosomal protein S8
COG0097; Ribosomal protein L6P/L9E
COG0098; Ribosomal protein S5
COG0099; Ribosomal protein S13
COG0100; Ribosomal protein S11
COG0102; Ribosomal protein L13
COG0103; Ribosomal protein S9
COG0124; Histidyl-tRNA synthetase
COG0172; Seryl-tRNA synthetase
COG0184; Ribosomal protein S15P/S13E
COG0185; Ribosomal protein S19
COG0186; Ribosomal protein S17
COG0197; Ribosomal protein L16/L10E
COG0200; Ribosomal protein L15
COG0201; Preprotein subunit SecY
COG0202; DNA-directed RNA polymerase
COG0215; Cysteinyl-tRNA synthetase
COG0256; Ribosomal protein L18
COG0495; Leucyl-tRNA synthetase
COG0522; Ribosomal protein S4
COG0525; Valyl-tRNA synthetase
COG0533; Metal-dependent protease
COG0541; Signal recognition particle GTPase
COG0552; Signal recognition particle GTPase


Appendix Table 10.3 Number of Coding Sequences (CDS) and those which were not expressed.

Species; Phenotype; Total Number of CDS Regions (in GFF files); Number of CDS with zero reads aligning
A. sticklandii ATCC12662; HAP; 2557; 14
C. aminophilum ATCC49906; HAP; 2574; 1
E. pyruvativorans Isol6; HAP; 1894; 18
R. albus SY3; NAP; 3794; 6
R. flavefaciens 007c; NAP; 3139; 8
F. succinogenes S85; NAP; 3147; 0
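To put the counts in Appendix Table 10.3 on a common scale, the share of CDS with no aligned reads can be expressed as a percentage of all CDS per genome. The short sketch below does this using the values from the table; the variable and function names are illustrative only and are not taken from the analysis scripts used in this work.

```python
# (phenotype, total CDS in GFF file, CDS with zero reads aligning), from Appendix Table 10.3
CDS_COUNTS = {
    "A. sticklandii ATCC12662": ("HAP", 2557, 14),
    "C. aminophilum ATCC49906": ("HAP", 2574, 1),
    "E. pyruvativorans Isol6": ("HAP", 1894, 18),
    "R. albus SY3": ("NAP", 3794, 6),
    "R. flavefaciens 007c": ("NAP", 3139, 8),
    "F. succinogenes S85": ("NAP", 3147, 0),
}


def percent_unexpressed(total_cds, zero_read_cds):
    """Percentage of CDS to which no reads aligned."""
    return 100.0 * zero_read_cds / total_cds


for species, (phenotype, total, zero) in CDS_COUNTS.items():
    pct = percent_unexpressed(total, zero)
    print(f"{species} ({phenotype}): {zero}/{total} CDS with zero reads = {pct:.2f}%")
```

In every genome the unexpressed share is below 1% of CDS.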


Appendix Figure 10.2 Tukey-style box and whisker plots of the 268 eggNOG functional orthologous groups with significantly different expression, which were manually assessed for significant differences between HAPs and NAPs. The boxplots summarise data within each phenotype: the box shows the median and the first and third quartiles, and the whiskers extend up to 1.5 times the interquartile range beyond the quartiles. The points are the replicates for each species, coloured according to the legend on the right. Both the boxplots and the points use DESeq2 normalised counts. Species are indicated in the legend: 12662 – A. sticklandii ATCC12662; 49906 – C. aminophilum ATCC49906; Isol6 – E. pyruvativorans Isol6; S85 – F. succinogenes S85; 007c – R. flavefaciens 007c; SY3 – R. albus SY3.
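The summary statistics behind a Tukey-style boxplot can be sketched as follows. In this convention the box spans the first and third quartiles, and each whisker reaches the most extreme observation lying within 1.5 times the interquartile range of the box edge; anything beyond is drawn as an outlier. The example is a minimal sketch with invented counts, assuming linearly interpolated quartiles (the "inclusive" method of Python's statistics module); it is not the plotting code used for the figure.

```python
from statistics import quantiles


def tukey_box(values):
    """Return the quartiles, whisker ends and outliers of a Tukey-style boxplot."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the last observation inside each fence.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }


# Invented normalised counts for one orthologous group (illustration only).
example_counts = [10, 12, 13, 14, 15, 16, 40]
print(tukey_box(example_counts))
```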

Appendix Table 10.4 Functional orthologous gene groups that showed a significant difference in read counts between the species. Numbers denote the mean and standard deviation of DESeq2 normalised count data, and the letters show where significant differences lie; different letters indicate a significant difference between species. Gene families above the bold line are those where the median is greater in HAPs than in NAPs. The functional identifiers are: C - Energy production and conversion; E - Amino acid transport and metabolism; F - Nucleotide transport and metabolism; D - Cell cycle control, cell division, chromosome partitioning; K - Transcription; O - Posttranslational modification, protein turnover, chaperones; H - Coenzyme transport and metabolism; T - Signal transduction mechanisms; J - Translation, ribosomal structure and biogenesis; U - Intracellular trafficking, secretion, and vesicular transport; L - Replication, recombination and repair; I - Lipid transport and metabolism; P - Inorganic ion transport and metabolism; M - Cell wall/membrane/envelope biogenesis; G - Carbohydrate transport and metabolism; N - Cell motility. 12662 – A. sticklandii ATCC12662; 49906 – C. aminophilum ATCC49906; Isol6 – E. pyruvativorans Isol6; S85 – F. succinogenes S85; 007c – R. flavefaciens 007c; SY3 – R. albus SY3.


COG0756 COG0846 COG1502 COG2003 COG1947 COG0234 COG0634 COG0503 COG0258 COG0749 COG0491 COG1773 COG0251 COG0215 COG2172 COG0542 COG1057 COG1191 COG0459 COG0568 COG0733 COG0282 COG1115 COG0334 Species

F K I L F O F F L L GM C J J T O H K O K D F E E ID

746.9 ± 101.8 e 889 ± 1108.3 ± 106 a 1584.7 ± 135.1 a 650.9 ± 35.6 a 452.3 ± 89.2 a 4438.7 ± 931.5 a 2022.3 ± 124.5 c 2512.4 ± 310.6 e 2512.4 ± 310.6 a 2270.5 ± 195.9 a 1178.3 ± 90.8 a 2332.6 ± 388.4 a 3753.4 ± 175.7 e 2889.5 ± 342.5 c 14090.2 ± 1667.3 a 4486.3 ± 722 e 5 5860.6 ± 1342.6 e 5795.8 ± 670.1 a 9521.7 ± 1707.3 e 9449.4 ± 894.8 e 8135.2 ± 2046.3 e 84374.9 ± 12728.2 a 12662

216.1 ± 740.1 a

96.2 a

398.1 ± 44.1 a 1565.3 ± 174.2 b 1249.6 ± 41.8 c 897 ± 97.4 e 1538.9 ± 84.3 b 1369.8 ± 91.8 b 1250.9 ± 84.4 e 1908.3 ± 276 c 2085.7 ± 29.5 a 2085.7 ± 29.5 b 3067.7 ± 568.7 c 2370.3 ± 552.3 b 1488.1 ± 151.9 b 4257.6 ± 127.4 a 3880.7 ± 249.4 a 7108.4 ± 538.9 e 3180 ± 241.6 a 2867.1 ± 79.8 b 7017.9 ± 273.5 e 9605.9 ± 771.6 e 10189.6 ± 432.8 e 5164.1 ± 571.6 a 9338.1 ± 1286.9 e 19824.9 ± 1813.6 b 49906

807.1 ± 21.3 e 1242.4 ± 234.6 c 1299.6 ± 181 c 1003.8 ± 77.5 e 2003.7 ± 101.1 c 3097.4 ± 806.9 c 1338.1 ± 109.2 e 3022 ± 113.5 a 2554.1 ± 124 e 3972 ± 159.3 c 3323.6 ± 183.7 c 4140.2 ± 1251.3 c 5360.5 ± 962.1 c 3758 ± 369 e 2626.5 ± 296 c 6650.5 ± 575.3 e 5079.5 ± 303.5 e 6503.2 ± 572.6 c 23206.8 ± 4210.3 a 10823.8 ± 1078.4 e 7903.4 ± 929.7 a 10022.7 ± 944.9 e 13425.8 ± 1467.7 a 129680.6 ± 13484.2 c Isol6

262.4 ± 30.1 b 482.7 ± 43.1 e 387.4 ± 24.2 b 357.5 ± 63.1 b 363.8 ± 202.3 ± 75.5 e 143.9 ± 16.3 b 324.6 ± 24.2 b 792.2 ± 46.3 b 792.2 ± 46.3 d 1173.3 ± 30.4 d 240.8 ± 92.9 e 184.4 ± 15.9 d 2033.7 ± 263.9 b 855.4 ± 79 b 4049.7 ± 287.4 b 174.1 ± 11.1 b 1210.8 ± 188.7 d 1372.9 ± 334.4 b 3968.9 ± 229.8 b 1373.4 ± 1519.3 ± 142.8 b 1192.5 ± 88.1 b 8782.4 ± 1865.6 d S85

16.5 d

77.4 b

217.9 ± 12.8 d 535.3 ± 147.7 e 516.4 ± 45.7 d 76 ± 14.5 d 559.3 ± 46.1 f 275.2 ± 39.3 d 114.5 ± 11 d 870 ± 84 d 1159 ± 136.7 d 1407.9 ± 112.7 f 739.4 ± 63.1 b 270.6 ± 32.7 e 127.3 ± 26.6 f 1230.8 ± 52.3 d 1.9 ± 1.3 d 5189.6 ± 88.5 d 1949.9 ± 277.9 d 536.5 ± 100.9 e 2066.5 ± 313.8 d 1909.6 ± 267.9 d 341.6 ± 81 d 2989.7 ± 119.9 d 1.9 ± 1.1 d 3268.8 ± 517.8 f 007c

163.5 ± 6.6 c 205.6 ± 27.7 d 573.2 ± 48.7 d 15.6 ± 2.8 c 478.6 ± 18.8 e 172.2 ± 27.5 e 43.1 ± 10.6 c 906.1 ± 71.3 d 619.4 ± 69.5 c 619.4 ± 69.5 e 1125.7 ± 95.2 d 101.2 ± 13.5 d 934.4 ± 143.5 e 623.6 ± 16 c 2 ± 0.9 d 3577.2 ± 109.8 c 262.3 ± 40.4 c 518.9 ± 20.2 e 552.9 ± 98.4 c 1085.6 ± 163.6 c 2763.4 ± 274.8 c 391.3 ± 339.2 ± 35.9 c 1446.2 ± 184.6 e SY3

108.8 c


COG2205 COG0438 COG0515 COG4972 COG0662 COG2804 COG0406 COG0769 COG0758 COG1354 COG2264 COG0379 COG1589 COG0029 COG0547 COG0157 COG1281 COG0159 COG0287 COG0040 COG0054 COG0816 COG0428 Species

T M KLT NU G NU G M LU D J H D H E H O E E F H L P ID

8551.5 ± 506 c 495.8 ± 113.9 c 3952.6 ± 297 e 2970.2 ± 410.1 a 1023.9 ± 210.1 a 1310.1 ± 329.5 a 664 ± 145.4 e 1567.8 ± 138.6 a 60.5 ± 7.4 a 189.8 ± 61.5 a 724.8 ± 85 c 2 ± 0.5 a 473.5 ± 14.6 a 13.1 ± 7.1 a 18.9 ± 15.3 ± 3.9 a 692.8 ± 44.1 e 47.2 ± 6.5 a 381.6 ± 56.8 e 36.4 ± 16.5 a 227.8 ± 36.8 e 666.4 ± 37.9 a 1621.5 ± 127.7 a 12662

5.3 e

4664.1 ± 368.6 a 7115.6 ± 643.8 a 3979.6 ± 222 e 1000 ± 166.7 b 213.7 ± 35.7 b 1573.1 ± 188.9 b 845.8 ± 69.3 a 2120.2 ± 114.3 ± 16.4 b 749.5 ± 44.9 c 685.8 ± 92.5 c 122 ± 32.1 b 688.6 ± 6.2 b 128.8 ± 20.5 b 15 ± 3.6 e 97.3 ± 21.6 b 549.9 ± 30 a 9.5 ± 6.8 b 131 ± 6.8 a 404 ± 12.6 b 255.6 ± 25.9 e 1678 ± 225.4 b 211.6 ± 7.9 b 49906

44.2 b

8391.1 ± 257.2 c 483.9 ± 86.6 c 2718.7 ± 386.1 a 1268 ± 91.6 c 821.2 ± 79.1 c 3197.7 ± 141.6 c 582.2 ± 55.2 e 988.5 ± 45.4 c 18.9 ± 6.7 c 805.8 ± 107.4 c 297.1 ± 21.5 a 158.3 ± 25.1 c 1094.9 ± 119 c 287.7 ± 44.5 c 50.1 ± 9.6 a 284.7 ± 26.9 c 635.2 ± 58.1 e 82.7 ± 13.1 c 299.9 ± 57.1 e 154.4 ± 4 c 149.6 ± 27 a 577.9 ± 61.8 c 447.1 ± 51.4 c Isol6

25942.9 ± 356.1 d 36568.3 ± 3792.4 b 6659.6 ± 340.4 b 4355.8 ± 315.4 d 7224.1 ± 392.3 e 5647.6 ± 303.7 e 7531.5 ± 459.1 b 3082.6 ± 109 e 1947.9 ± 633.2 d 2642.4 ± 60 b 1716.2 ± 125.5 d 1112.9 ± 48.4 e 1499.8 ± 123.9 d 932.6 ± 36.3 e 724.3 ± 67.4 b 740 ± 20.8 d 1383.6 ± 69.1 b 415.3 ± 59 d 1460.1 ± 98 b 2150 ± 160.7 d 327.1 ± 18.6 b 358.6 ± 45.4 e 132.4 ± 7.9 d S85

12953.2 ± 735.6 b 11411.7 ± 2699.2 d 18855.5 ± 840.2 d 10539 ± 807 f 1838.3 ± 439 d 6113.6 ± 90.6 e 2249.7 ± 147.5 d 3419.3 ± 335.4 d 1375.5 ± 196.7 f 1588.1 ± 251.8 d 2374.9 ± 635.1 b 1064.1 ± 28.3 e 1592.4 ± 177.3 d 914.8 ± 38.8 e 1047.4 ± 127.3 d 1529.2 ± 185.7 f 897.4 ± 46.2 d 565.4 ± 61.9 f 945.2 ± 715.9 d 547 ± 93.1 e 543.4 ± 95.1 d 346.7 ± 42.9 e 45.9 ± 5 f 007c

24777.3 ± 1623.8 12692.6 ± 3460.5 d 15002.2 ± 1053.4 c 15555.2 ± 2071.5 e 7264.8 ± 820.5 e 12667.6 ± 1237.8 d 1570.6 ± 170.3 c 2828.9 ± 301.7 e 1122.1 ± 115.5 e 1863.4 ± 223 d 1675.5 ± 169.8 d 368.6 ± 1431.7 ± 122.2 d 1582 ± 135.2 d 439.1 ± 31.6 c 503.6 ± 15.9 e 1163.3 ± 70.2 c 1156.2 ± 103 e 613.1 ± 39.5 c 506.1 ± 29.3 e 827.5 ± 98.4 c 248 ± 28.3 d 10.2 ± 3.3 e SY3

26.7 d

d


Appendix Table 10.5 Summary table of codon usage statistics for the bacterial host Butyrivibrio fibrisolvens DSM3071 and the five phage genomes. Highlighted entries correspond to the amino acids for which a tRNA was found in the genomes of phages Arian and Bo-Finn. (AA - amino acid; freq - frequency)


Appendix Figure 10.3 Phylogenomic tree from VICTOR analysis. Phylogenomic genome-BLAST distance phylogeny method, using the formula D0, yielding average support of 78%. The numbers below branches are GBDP pseudo-bootstrap support values from 100 replications. The branch lengths of the resulting VICTOR trees are scaled in terms of the respective distance formula used (Meier-Kolthoff and Göker, 2017; Meier-Kolthoff et al., 2013). The tree was visualized in iTOL (Letunic and Bork, 2007). Matching colours represent a shared species, genus or family, respectively.


Appendix Table 10.6 Summary table of the output from VICTOR. Genomes that share a number in a column belong to the same species, genus or family. This information is reflected in the tree in Appendix Figure 10.3.

Genomes; Species; Genus; Family
Bacillus phage PM1; 1; 1; 2
Lactobacillus phage iA2; 4; 2; 2
Lactobacillus phage PLE2; 5; 2; 2
Lactococcus phage 1358; 6; 3; 1
Listeria phage P35; 7; 4; 1
Listeria phage P40; 8; 4; 1
Paenibacillus phage PG1; 9; 5; 2
Bacillus phage SPP1; 10; 6; 2
Butyrivibrio phage Arian; 12; 7; 3
Butyrivibrio phage Bo-Finn; 12; 7; 3
Butyrivibrio phage Ceridwen; 13; 8; 3
Butyrivibrio phage Arawn; 11; 9; 3
Butyrivibrio phage Idris; 14; 9; 3
Clostridium phage phiCP13O; 2; 10; 4
Clostridium phage phiCP34O; 3; 10; 4
Clostridium phage 39-O; 15; 10; 4
Clostridium phage phiCP26F; 16; 10; 4
Clostridium phage phi9O; 16; 10; 4
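Because shared cluster numbers in Appendix Table 10.6 denote shared taxa, the table can be read by grouping genomes on one column at a time. The sketch below does this with the values from the table; it is a reading aid only and does not reproduce how VICTOR derives its clusters, and the names VICTOR_CLUSTERS and group_by_rank are illustrative.

```python
from collections import defaultdict

# (genome, species cluster, genus cluster, family cluster), from Appendix Table 10.6
VICTOR_CLUSTERS = [
    ("Bacillus phage PM1", 1, 1, 2),
    ("Lactobacillus phage iA2", 4, 2, 2),
    ("Lactobacillus phage PLE2", 5, 2, 2),
    ("Lactococcus phage 1358", 6, 3, 1),
    ("Listeria phage P35", 7, 4, 1),
    ("Listeria phage P40", 8, 4, 1),
    ("Paenibacillus phage PG1", 9, 5, 2),
    ("Bacillus phage SPP1", 10, 6, 2),
    ("Butyrivibrio phage Arian", 12, 7, 3),
    ("Butyrivibrio phage Bo-Finn", 12, 7, 3),
    ("Butyrivibrio phage Ceridwen", 13, 8, 3),
    ("Butyrivibrio phage Arawn", 11, 9, 3),
    ("Butyrivibrio phage Idris", 14, 9, 3),
    ("Clostridium phage phiCP13O", 2, 10, 4),
    ("Clostridium phage phiCP34O", 3, 10, 4),
    ("Clostridium phage 39-O", 15, 10, 4),
    ("Clostridium phage phiCP26F", 16, 10, 4),
    ("Clostridium phage phi9O", 16, 10, 4),
]

RANK_COLUMN = {"species": 1, "genus": 2, "family": 3}


def group_by_rank(rows, rank):
    """Map each cluster number at the given rank to the genomes assigned to it."""
    column = RANK_COLUMN[rank]
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[0])
    return dict(groups)


for cluster, members in sorted(group_by_rank(VICTOR_CLUSTERS, "genus").items()):
    print(f"genus cluster {cluster}: {', '.join(members)}")
```

Grouped this way, the five Butyrivibrio phages fall into a single family cluster, three genus clusters and four species clusters.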


Appendix Figure 10.4 Metabolism pathway map showing the pathways related to the orthologous functional groups found to show significantly different expression in HAPs (in red) and NAPs (in green). Pathway made using iPath3.0 (Darzi et al., 2018).


Appendix Figure 10.5 Metabolism pathway map showing the pathways related to the KEGG orthologous gene groups that showed significantly different expression in HAPs (in red) and NAPs (in green). Pathway made using iPath3.0 (Darzi et al., 2018).
