View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by Helsingin yliopiston digitaalinen arkisto

Department of Veterinary Biosciences Faculty of Veterinary Medicine University of Helsinki Helsinki, Finland

Metaproteomics of the Human Intestinal Tract to Assess Microbial Functionality and Interactions with the Host

Carolin Adriane Kolmeder

ACADEMIC DISSERTATION To be presented, with the permission of the Faculty of Veterinary Medicine of the University of Helsinki, for public examination in Walter auditorium, EE-building, Agnes Sjöberginkatu 2, Helsinki, on May 28th, at 12 noon.

Helsinki 2015

Director of Studies: Professor Airi Palva Department of Veterinary Biosciences University of Helsinki Helsinki, Finland

Supervisors: Professor Willem M. de Vos Department of Veterinary Biosciences and RPU Immunobiology Faculty of Medicine, University of Helsinki, Helsinki, Finland and Laboratory of Microbiology Wageningen University Wageningen, The Netherlands

Docent Anne Salonen Professor Airi Palva Department of Immunology and Bacteriology Department of Veterinary Biosciences University of Helsinki University of Helsinki Helsinki, Finland Helsinki, Finland

Pre-examiners: Docent Kirsi Savijoki Associate Professor Koen Venema Department of Food and Environmental Department of Human Biology Sciences Maastricht University – Campus Venlo University of Helsinki Maastricht, The Netherlands Helsinki, Finland

Opponent: Docent Petri Auvinen Institute of Biotechnology University of Helsinki Helsinki, Finland

Published in Dissertationes Schola Doctoralis Scientiae Circumiectalis, Alimentarie, Biologicae

ISBN 978-951-51-1169-2 (paperback) and 978-951-51-1170-8 (PDF) ISSN 2342-5423 (print) and 2342-5431 (online) http://ethesis.helsinki.fi

Cover: Talvinen auringonlasku Kolilla – Jochen Müller Layout: Jochen Müller

Hansaprint Helsinki, Finland 2015

Non nobis solum nati sumus. Cicero/Plato

ABSTRACT ...... I LIST OF ORIGINAL PUBLICATIONS ...... II ABBREVIATIONS ...... III GLOSSARY ...... IV 1. INTRODUCTION ...... 1 2. REVIEW OF THE LITERATURE ...... 3 2.1 Human intestinal microbiota ...... 3 2.1.1 Development from birth to old age ...... 3 2.1.2 Variation and modulation in adulthood ...... 6 2.2 Methods to study the human intestinal microbiota ...... 7 2.2.1 Faecal material as the study object ...... 7 2.2.2 Phylogenetic analysis ...... 8 2.2.3 Metagenomics ...... 9 2.2.4 Metatranscriptomics ...... 13 2.2.5 Metaproteomics ...... 13 2.2.6 Data integration ...... 19 3. AIMS OF THE STUDY ...... 20 4. MATERIALS & METHODS ...... 21 4.1 Study cohorts and samples ...... 21 4.2 Methods ...... 22 4.2.1 Metaproteomics ...... 22 4.2.2 Phylogenetic microarray analysis ...... 23 4.2.3 Statistical analysis ...... 24 4.2.4 Additional data-sets used in this thesis ...... 24 5. RESULTS AND DISCUSSION ...... 25 5.1 Method development to study the human intestinal metaproteome (I, II and V) ...... 25 5.1.1 Analysis of the abundant metaproteome (II) ...... 25 5.1.2 Reproducibility testing of the pipeline (II) ...... 27 5.1.3 Development of in-house databases (I-IV) ...... 28 5.1.4 Functional annotation with COG, KEGG and eggNOG (I-IV) ...... 29 5.1.5 Reflections on the metaproteomic approach ...... 30 5.2 Intestinal metaproteomic baseline of healthy individuals and its variation over time ...... 31 5.2.1 Individuality of the faecal metaproteome (II and III) ...... 33 5.2.2 Main microbial functionalities (II and III) ...... 33 5.3 Host-microbiota interactions (II-IV) ...... 34 5.4 Comparing the metaproteome of lean and obese individuals (IV)...... 35 5.5 Phylogeny-specific metabolic activity – 16S rRNA genes versus peptides analysis (II-IV) ...... 36 5.6 Reflections on study cohorts ...... 37 6. CONCLUSIONS AND FUTURE PERSPECTIVES ...... 39 7. ACKNOWLEDGEMENTS ...... 42 8. REFERENCES ...... 45

ABSTRACT

The human body is complemented by its microbes. Consequently, various intimate reactions between host and microbes have been recognized and determining the role of the microbiota in many diseases has started. Until the present time, most attention has been focused on the intestinal tract that contains the majority of the microbiota. Extensive studies on the phylogenetic composition of the microbes present in faecal material revealed species richness and the individuality of the intestinal microbiota, unique to each human being. The work described in this thesis started when the first large scale metagenome analysis of the intestinal microbiota for global exploitation of the potential biological functions of this ecosystem was being initiated. Today, approximately ten million unique genes have been identified from the human intestinal microbiota, but little is known about which of these genes can be expressed into proteins and the conditions under which the protein synthesis occurs. Therefore, metaproteomics has emerged as a discipline in order to delve into the biological functions of an ecosystem in a holistic manner by targeting all the proteins present. This thesis work describes the development of a metaproteomic approach for analysing proteins from the human intestinal tract. The basis of this approach is to use crude faecal material to determine bacterial and human proteins simultaneously. Separating the proteins by one dimensional gel-electrophoresis and focusing on the molecular weight region defined by the 37 and 75 kDa bands of a protein marker was found to be both reproducible and feasible for the comparison of several dozens of samples. In a case study on the metaproteome and metagenome of two individuals the benefit of utilizing matched metagenomes on the number of identified spectra could be demonstrated. Furthermore, a bioinformatic workflow was devised to increase the ratio of peptides identified per protein class. Applying the developed metaproteomic approach to faecal samples derived from healthy adults identified ecosystem specific components including surface proteins like flagellins and proteins involved in vitamin synthesis. The monitoring of the metaproteome over six weeks in a probiotic intervention study revealed time-dependent and inter-individual variations in these functions. The major finding emerging from this work was that the metaproteome is personalized, both over the long-term as well as in short-term. Therefore, the phylogenetic individuality of the intestinal microbiota was also functionally reflected. A different set of microbes in their specific host in combination with environmental factors will express specific functionalities. Analysing both microbial and human proteins, which represented approximately 85% and 15% of the total proteins respectively, allowed identifying indications of host-microbe interactions. Enzymes both for degrading host and microbial components were identified. Effector and response molecules of the host- microbe dialogue were found. In addition, the metaproteome of a cohort of obese individuals was compared to that of non-obese individuals. Based on the spectral counts of the eggNOG assignments of the identified peptides the faecal metaproteomes of obese and non-obese subjects were significantly different from each other. In addition, insulin-sensitivity correlated significantly with bacterial proteins. In most cases, metaproteomic data was complemented by phylogenetic microarray data. This allowed looking at the composition and its function at the same time. Data integration revealed for example a correlation between Eubacterium rectale and flagellins. In the obesity study, the proportion of Bacteroidetes peptides in the faecal samples from the obese subjects compared to the non-obese subjects were relatively higher than the presence of this phylum detected by 16S rRNA analysis. This result implies biological activity of Bacteroidetes also in obese individuals. Metagenomics is still ahead of metaproteomics, both in numbers of analysed samples and available tools. However, a refinement of metaproteomic methodologies has started and the number of samples analysed is rising. Metaproteomics is one of the central disciplines to identify the key-molecules involved in host-microbes interactions.

I

LIST OF ORIGINAL PUBLICATIONS

This thesis is based on the following original publications referred to in the text by their Roman numerals (I- V):

I Rooijers, K.*, Kolmeder, C.*, Juste, C., Doré, J., de Been, M., Boeren, S., Galan, P., Beauvallet, C., de Vos, W.M. and Schaap, P.J., 2011. An iterative workflow for mining the human intestinal metaproteome. BMC Genomics, 12(6). *Equal contribution

II Kolmeder, C.A., de Been, M., Nikkilä, J., Ritamo, I., Mättö, J., Valmu, L., Salojärvi, J., Palva, A., Salonen, A. and de Vos, W.M., 2012. Comparative metaproteomics and diversity analysis of human intestinal microbiota testifies for its temporal stability and expression of core functions. PLoS ONE, 7(1), e29913.

III Kolmeder, C.A., Salojärvi, J., Ritari, J., de Been, M., Raes, J., Falony, G., Vieira-Silva, S., Kekkonen, R.A., Corthals, G., Palva, A., Salonen, A. and de Vos, W.M., 2015. Faecal metaproteomic analysis reveals a personalized and stable functional microbiome and limited effects of a probiotic intervention in adults. Submitted.

IV Kolmeder, C.A., Ritari, J., Verdam, F.J., Muth, T., Keskitalo, S., Varjosalo, M., Fuentes, S., Greve, J.W., Buurman, W.A., Reichl, U, Rapp, E., Martens, L., Palva, A., Salonen, A., Rensen, S. and de Vos, W.M., 2015. Colonic metaproteomic signatures of active and the host in obesity. Submitted.

V Kolmeder, C.A. and de Vos, W.M., 2014. Metaproteomics of our microbiome – developing insight in function and activity in man and model systems. Journal of Proteomics, 97, 3-16.

The original articles are reprinted with the kind permission of the publishers. In addition, some unpublished results are presented.

II

ABBREVIATIONS

1-DE one dimensional gel electrophoresis 2-DE two dimensional gel electrophoresis AMP abundant metaproteome BMI body mass index CD Crohn’s disease COG cluster of orthologous groups DB database EggNOG evolutionary genealogy of genes: non-supervised orthologous groups FDR false discovery rate geLC-MS/MS protein fractionation by 1-DE and peptide analysis by LC-MS/MS GIT gastro-intestinal tract GMM microbial gastro-intestinal metabolic modules HPLC high performance liquid chromatography HMP human microbiome project IBD inflammatory bowel disease IBS irritable bowel syndrome IMG/JGI integrated microbial genomes/joint genome institute KEGG Kyoto encyclopedia of genes and genomes KO KEGG orthology LC liquid chromatography LCA lowest common ancestor MALDI-ToF matrix assisted laser desorption/ionization time-of flight MRM multiple reaction monitoring MS mass spectrometry MS1 peptide ions MS/MS tandem mass spectrometry NGS next generation sequencing NSP non-starch polysaccharides NOG non-supervised orthologous group OTU operational taxonomic unit PSM peptide spectral matching qPCR quantitative polymerase chain reaction RS resistant starch SCFA short chain fatty acids SCX strong cation exchange SRM selected reaction monitoring UniProtKB universal protein resource knowledgebase

III

GLOSSARY

Activity Collection of expressed functions of a cell; in contrast to genetic material and potential

Ecosystem Living environment and organisms within

Functionality Collection of functions a cell is capable of performing

Matched data Metaproteomic and metagenomic data originating from the same biological sample; matched metaproteome and matched metagenome

Microbiota Collection of microbes living in an ecosystem

Microbiome Collection of genes of a microbiota; also used as a synonym for microbiota

Meta-Omics Global analysis of molecules contained in an ecosystem or microbiota with omics techniques that aim at the collective characterization and quantification of pools of biological molecules; metagenomics, metatranscriptomics, metaproteomics

Phylogeny Evolutionary relation of organisms; nowadays mostly based on genetic comparison

Taxonomy Hierarchical grouping of organisms according to phylogeny

IV

Introduction

1. INTRODUCTION

Making new biological discoveries requires two components: a special interest as well as technological possibilities. The discovery of microbial life by Antonie van Leeuwenhoek (1632-1723) is a good example from the early days of microbiology as it illustrates his keen interest in understanding a biological phenomenon and advancing technical developments. Van Leeuwenhoek was a careful observer and associated his diarrhoea to the presence of more and different little animals “animalcules”, as he called the microbes, in his faecal excrements. From his drawings one may conclude that he suffered from an infection by the protozoan Giardia. In addition, he developed lenses that later advanced into microscopes, instruments that are still widely used in microbiology today. Ever since, most attention for microbes living in or around the human body, has been focused on pathogenic microbes. However, in the late 19th century studying microbes being possibly beneficial to humans, such as the lactic acid bacteria, started. Microbiology has come a long way from these observations. Today, not only single microbial organisms are studied in isolation but also complete microbial communities, termed microbiota, are analysed, preferably in their original environment. The analysis of soil, oceans, eons-old ice, and the bodies of a variety of animals, has revealed huge microbial species variations and has provided indications of microbial functions. For instance, termites have specialized hindgut microbes helping them with degrading wood material and cows require a rumen microbiota to supply them with energy out of grass cellulose. By now, from all these microbial reservoirs, one has received most attention, as evidenced by the numbers of papers: the human being. Skin, the oral cavity and gut, among other body sites, are populated by microbes. In particular, the intestinal microbiota has become the focus of many studies that address their associations with health and disease. Modern microbiology has access to a large set of sophisticated molecular tools, each of which provides important information on the studied ecosystem (Table 1; Figure 1). The dominant phylogenetic composition of a microbiota can be assessed by analysing the 16S rRNA gene; this answers the question ‘Who is there’. The development of a variety of high-throughput gene sequencing methods has permitted the analysis of the entire gene repertoire of a microbial community, also known as the microbiome. The resulting metagenomic information describes the functional potential of the intestinal microbes, answering the question ’What can they do’. At present, phylogenetic and metagenomic approaches targeting the human microbiota are dominating the literature. However, other meta-omic approaches that analyse microbial consortia within their ecosystem are required to expand our knowledge of their in vivo functions by addressing the question ‘What are they doing’. The metabolic net-outcome (metabolomics) has been determined in a variety of studies. To some extent, the transcriptional activity of the microbial ecosystems (metatranscriptomics) has been addressed as well. Similarly, less studied than the phylogenetic composition and metagenome is the metaproteome, which includes the entire collection of proteins that are present within an ecosystem. Studying the proteins is required to answer the question ‘What has happened’. The term metaproteome was first proposed in 2004 (Rodríguez-Valera, 2004). In the same year, the first metaproteomic study, which analysed the metaproteome of activated sludge, was published (Wilmes and Bond, 2004). It was not before 2006, that the possibilities and challenges of metaproteomics were described in full (Wilmes and Bond, 2006). At the start of this thesis work, only one paper had been published describing the qualitative changes of the intestinal metaproteome occurring during early infancy (Klaassens et al. 2007). Due to the absence of reference genomes that are instrumental for protein identification, less than ten proteins could be identified. By now, metaproteomics has found application in many different environments from contaminated soil to the Atlantic Ocean, from termite hind-gut to human cerumen (Benndorf et al. 2007; Warnecke et al. 2007; Feig et al. 2013; Georges et al. 2014). The main aims of this thesis project were to develop an approach to analyse the human intestinal metaproteome and then to utilize this approach to establish a baseline of a healthy metaproteome.

1

Introduction

Table 1. Overview of state-of-the-art methods to study the human intestinal microbiota. The analytical target is named as well as the expected deliverables (Readout).

Method Target Readout

Phylogenetic analysis 16S rRNA sequences Microbiota composition and diversity

Metagenomics All DNA present Catalogue of microbial coding capacity and diversity on the functional level

Metatranscriptomics All mRNA Transcription of microbial genes as a proxy for their function – some host transcripts may be discovered as well Metaproteomics All proteins Expression of microbial and host genes into proteins as a mark for their intestinal function Metabolomics All metabolites and other small molecules Net metabolic output of the intestine derived from host-microbe interactions

Culturing Single microbes Real functional analysis

Figure 1. Tools with which to study the intestinal ecosystem. The intestinal ecosystem is directly influenced by the host and the environment surrounding the host.

2

Review of the Literature

2. REVIEW OF THE LITERATURE

2.1 Human intestinal microbiota

2.1.1 Development from birth to old age

We are born into a world full of microorganisms, including Bacteria, Archaea and Fungi as well as their viruses. Consequently, these microbes also colonize us. The first cohabitants on our inner and outer surfaces usually derive from our mothers and there seem to be routes that expose already the foetus to microbes, challenging the earlier view that babies are borne completely sterile (Tissier, 1900; Jiménez et al. 2005; Jost et al. 2014). Recent phylogenetic analysis has revealed that a great variety of microbial communities are hosted by the human body, distinct to each body site and niche (Human Microbiome Project Consortium, 2012). Each individual harbours a unique microbiota, this being influenced by both the host’s genes and environmental factors. However, for example the skin microbiota of two individuals is more similar than the skin microbiota and the intestinal microbiota within one individual, highlighting the site-specificity of the host-associated microbial communities. In addition, even a single body site, such as the mouth, can contain many micro-environments and hence different microbial communities exist in different niches (Kort et al. 2014). The intestine is not only the most complex but also the most studied of these ecosystems i.e. the environment and the organisms living within it (Qin et al. 2010; Karlsson et al. 2014; Li et al. 2014). The start-up community of the intestine may be shaped by the mode of delivery, infant nutrition – breast milk versus formula – and use of antibiotics, and may thereby also have an impact on health later in life (Penders et al. 2006; Scholtens et al. 2012; Nylund et al. 2014). Probably the most long-lasting effect of the intestinal community on our well-being is the stimulation of the immune-system. The intestinal microbiota plays an essential part in the priming of the immune system, reviewed by Weng and Walker (2013). This priming is needed in order to acquire the delicate balance between tolerance to antigens derived from food and commensals, and effective reactivity against pathogens later in life. As the human body and its physiology develop from infancy to adulthood, so does our intestinal microbiota. There are drastic changes taking place in infancy and childhood evidently affecting the microbiota. One early event is the change from nursing on breast milk, containing growth stimulating components for microbes (Wang et al. 2015), or formula to eating solid food (Bergström et al. 2014). The constant body growth and other developmental changes cause a constantly changing environment for the intestinal microbiota. The present studies have mostly focused on analysing faecal samples from infants and adults. From toddler age to adolescence less data is available and therefore the constant succession of the microbiota from infancy to adulthood is largely unknown. However, from early life until adulthood the diversity of the microbiota increases and an ecosystem rich in several hundreds to thousands of microbial species becomes established (Nielsen et al. 2014). At the level of operational taxonomic units (OTUs), unique organismal genetic units, the variation across individuals is tremendous, even though the main phyla in the gut are limited, including , Bacteroidetes, Actinobacteria, Proteobacteria and Verrucomicrobia. Collecting and curating 16S rRNA gene sequences at 97% similarity derived from the human intestine as deposited in public databases resulted in the definition of 2,473 OTUs (Ritari et al. 2015). These represent the present estimate of the number of bacterial and archaeal species possibly found in the human intestine, meaning that there is an almost infinite number of combinatorial community alternatives. In addition, more genera and species are constantly being isolated (Rajilić-Stojanović and de Vos, 2014). As discovered over a decade ago, each individual harbours one’s own personalized microbiota determined by genetic, environmental and dietary factors. As long as no drastic changes occur (see below), the microbiota is relatively stable and the temporal fluxes are smaller than the between-individual differences (Jalanka-Tuovinen et al. 2011). Therefore, the individual microbiota can be used for subject-identification. A challenging down-side of the large microbiota differences between

3

Review of the Literature individuals is evident when setting up group comparisons e.g. when attempting to study the effect of a special dietary regime. The between-group differences are hard to detect when the within-group differences are already large. However, unifying patterns across a collection of individuals have been identified and a set of enterotypes, each dominated by a certain group of bacteria, have been proposed (Arumugam et al. 2011). These patterns suggest that there is a limited set of module solutions providing the integral features in the intestine, provided by dominating bacteria. Such patterns like the enterotypes may be useful in study design since stratifying for a certain enterotype may make it easier to detect group differences. Recently, also the microbiota diversity (Salonen et al. 2014) and the abundance of individual bacteria (Korpela et al. 2014) have been reported to associate with the individual heterogeneity in dietary responses. This suggests that different features of the microbiota can be utilized in the stratification of the study subjects and in this way decrease the within-group variability. The whole digestive tract, from the oral cavity down to the colon, provides several different niches for microbial colonization, with the colon exhibiting the highest microbial density (Figure 2). As nutrient availability and human components change along the gastro-intestinal tract (GIT), the composition of the microbiota changes from one end to the other. The living environment of the intestinal microbes is not static at all. Among other factors, the amount of oxygen, the pH that varies from acidic in the stomach to more neutral in the intestine, and notably bile, which is a natural detergent, affect the types and amount of microbes present. In addition, pH and bile can vary due to changes in the diet and these can lead to changes in microbial communities (Duncan et al. 2009). The microbiota is encompassed by varying human enzymes having their origin in the saliva, gastric and pancreatic juice, bile and, in early life, breast milk. For example, the human intestinal alkaline phosphatase and the salivary scavenger and agglutinin (SALSA) have been shown to have an effect on the growth of intestinal microbes (Lallès, 2014; Reichhardt et al. 2014). The microbiota is also in contact with mucus, immune components and immune cells. Inevitably, evolution has produced an individual host-microbiota mutualism (Quercia et al. 2014). The human host provides the home for the microbiota and the intestinal microbes make a recognizable contribution to human physiology. The most obvious factor that links men and microbes is the diet and its major components; carbohydrates, lipids and proteins. Out of the large variety of carbohydrates, mono- and disaccharides and the less complex polysaccharide starch are digested mostly by host enzymes in the upper GI tract and immediately taken up as they are the major energy source. However, resistant starch (RS) and non-starch polysaccharides (NSP), which include celluloses, pectin, and hemicellulose (xylose, mannose or glucose), mostly reach the colon in an undigested form. Microbes, equipped with a large variety of hydrolases, degrade, to different extents, these plant-derived carbohydrates, which consist of a variety of sugars that are linked with various glycosylic bonds (Martens et al. 2014). For example, the genome of Bacteroides cellulosilyticus encodes over 400 enzymes for carbohydrate degradation. In general, the microbial gene repertoire for hydrolysing carbohydrates is much wider than that of the host. This means that humans depend on microbes to degrade certain food components; this may provide flexibility to adapt to short or long-term changes in the diet (Sonnenburg and Sonnenburg, 2014). Certain intestinal microbes not only rely on carbohydrates of dietary origin, but they also have enzymatic capacity to degrade host glycans derived from mucus, the glycoprotein structure secreted by goblet cells that covers the epithelial cells. For example Akkermansia muciniphila possesses this characteristic (Derrien et al. 2004), reviewed by Ouwerkerk et al. (2013); in this way, they contribute to a turn-over of this complex glycoprotein structure.

4

Review of the Literature

Figure 2. Schematic representation of the human digestive tract. Left side: concentration of microbes in cfu/ml. Right side: human components shaping the ecosystem; body fluids are secreted to the digestive tract along the digestive process; the intestine is lined by epithelial cells covered with a mucus layer produced by goblet cells, in yellow; various immune cells are present near the epithelial cells, dendritic cell as representative (Wang et al. 2005; Lawley and Walker, 2013; Schuijt et al. 2013)

While processing dietary and endogenous glycans, intestinal bacteria produce a variety of molecules that contain energy and/or act as bioactive molecules. The short chain fatty acids (SCFA) represent one group of microbial metabolites that is receiving considerable attention. Acetate, propionate, and butyrate are the major SCFA being formed in the intestine (Cummings and Macfarlane, 1991). In particular, butyrate is of special interest as it is both energy source and a signalling molecule for the epithelial cells. Several metabolic pathways for forming butyrate have been described (Louis et al. 2010). Similarly, propionate can be produced by different enzymatic routes (Reichardt et al. 2014). The main butyrate producers are members of Ruminococcacea and , the key species being Faecalibacterium prausnitzii and Roseburia intestinalis, caccae and Eubacterium hallii. Microbial networks exist for cross-feeding as certain species depend on the presence of other microbes and their contribution of certain substrates (Scott et al. 2013). For example Bifidobacterium ssp. produce lactate and acetate during growth on the polysaccharide inulin. These metabolites are further utilized by other bacteria, e.g. E. hallii, to produce butyrate.

5

Review of the Literature

Similarly to the hydrolases for degrading glycans, intestinal microbes possess genes with coding capacity to transform bile salts to a diverse range of metabolites. These metabolites have been shown to play a role in modulating the host’s energy metabolism, reviewed by Li and Chiang (2015). In addition, the GIT microbiota produces a set of proteinases. Microbial proteinases derived from pathogens have been shown not only to digest proteins but also to have non-proteolytic properties such as signalling to the host (Jarocki et al. 2014). However, as proteinases have been proposed to play a role in inflammatory bowel disease (IBD), it is likely that also commensal proteinases can have other properties than proteolytic ones (Steck et al. 2012). Intestinal microbes also contribute to the micronutrient supply and synthesize vitamins, for example B vitamins (LeBlanc et al. 2013). However, absorption by the host is not always proven. Thiamine has been shown to be synthesized by the intestinal microbiota and also human transporters exist implying the absorption by the host (Nabokina et al. 2014). Bacterial vitamin synthesis capacity seems to vary according to geographic region and depend on food supply and age (Yatsunenko et al. 2012). This indicates that vitamins of microbial origin contribute to the host’s vitamin supply. In contrast to vitamins, our GIT microbiota for example depends on the trace element iron from the host’s diet, as reviewed by Kortman et al. (2014). Changes in the metabolic activities of the intestinal microbiota possibly occur with old age. When the hormone set-up, the levels of physical activity, and nutrition habits are changing, the microbiota changes as well. In centenarians, it has been found that an increase in the levels of inflammation, so called inflammaging, was associated with a reduction in the numbers of butyrate-producing F. prausnitzii and an increase in the levels of potential pathogens (Biagi et al. 2010; Rampelli et al. 2013).

2.1.2 Variation and modulation in adulthood

The first component in a human’s life affecting the composition of an individual’s intestinal microbiota is the personal gene make-up. Twin studies have revealed that host genetics to some extent explain the composition of the intestinal microbiota (Zoetendal et al. 2001; Turnbaugh et al. 2009; Tims et al. 2012). In addition, single nucleotide polymorphisms have been shown to affect the intestinal community structure; this has been documented for the gene encoding fucosyl transferase (fut2) that determines the secretor-status of the host (Wacklin et al. 2014). The secretor-status profoundly affects the intestinal ecosystem as only in secretors the intestinal mucosa is decorated with fucose, a carbohydrate abundantly utilized by intestinal bacteria. Secretors and non-secretors have been shown to exhibit compositional differences in their microbiota. This provides one of the first examples with a known mechanism to explain how the host genotype shapes the intestinal microbiota (Wacklin et al. 2014). However, these are still early days of integrating the two extremely complex systems and trying to link variations in the gut microbiome to those in the human genome. Second, diet, as introduced above, is inevitably influencing the intestinal community. However, only a few human studies have been conducted that describe the daily differences in diet and the different macro- and micro-nutrient content. For example, in a cross-over study in which study participants consumed either a control diet, a diet high in RS or high in NSPs, or a weight loss-inducing diet, it was found that changes had occurred in microbial composition upon diet-switching (Walker et al. 2011). The effect of diet, attributable to food components, on the human intestinal microbiota has been reviewed recently (Salonen and de Vos, 2014; Graf et al. 2015). The microbial metabolomics profiles in vegans and omnivores have been compared, the effect of energy-content of diet on microbial composition has been studied, and metagenomic, metatranscriptomic and phylogenetic composition changes in response to a meat-fat based diet analysed (Jumpertz et al. 2011; Wu et al. 2014; David et al. 2014). In addition, differences in microbial composition and metagenomes in different ethnic groups have been detected (Yatsunenko et al. 2012; Tyakht et al. 2013; Schnorr et al. 2014; O’Keefe et al. 2015). These studies possibly reflect the different diets of the ethnic groups but it should be noted that these studies also compare populations with profound differences in

6

Review of the Literature hygiene level, general lifestyle, and host genetics. Flexibility in metabolic activities similar to that of the host is to be expected, reviewed by van Ommen et al. (2014). Therefore, setting aside extreme diets, most dietary interventions in healthy adults only lead to subtle changes in the microbiota (Salonen and de Vos, 2014). A recent meta-analysis including three different intervention studies on obese patients on different dietary interventions showed that some individuals’ microbiota may be responsive and others not upon dietary changes (Korpela et al. 2014). In a study, where healthy individuals consumed a mixture of seven milk- fermenting bacterial strains only subtle changes were found, as observed at the transcriptional level. The microbial composition remained unchanged (Mahowald et al. 2009). According to the definition provided by the World Health Organization (WHO), probiotics are "live micro-organisms which, when administered in adequate amounts, confer a health benefit on the host". They encompass a wide range of different species and have been widely used for example to modulate gut transit time (Dimidi et al. 2014). On microbial composition in healthy individuals, only subtle effects, if any at all, have been observed in response to the consumption of probiotics (Gerritsen et al. 2011). Third, the health status of a person can impact on the intestinal microbiota. Many associations between microbiota and disease have been reported. Moreover, indirect effects are well studied, such as by the usage of antibiotics. Also direct impact is likely but often it is not known whether the changes in the microbiota are causal or a consequence of the disease. Dramatic changes in an individual’s microbiota can occur upon antibiotic treatment as reviewed recently (Panda et al. 2014). It has been proposed that if no resilience is achieved, i.e. the changes to the intestinal microbial composition upon antibiotic usage are not balanced, infections by Clostridium difficile can occur (Song et al. 2013). These infections can become life-threatening in the case when C. difficile produces toxins in the intestine. Infections by C. difficile have been efficiently treated by introducing the microbiota from a healthy donor (Bookstaver et al. 2013; van Nood et al. 2013). This is evidence that the microbiota is causal both for the reoccurrence of the disease and its cure, one rare example of causality in a highly complex system. There is a large number of diseases that may involve the intestinal microbiota (de Vos and de Vos, 2012) ranging from intestinal diseases, such as IBD (Cantarel et al. 2011; Machiels et al. 2014) and irritable bowel syndrome (IBS) (Kassinen et al. 2007), to systemic conditions or diseases like obesity (Everard and Cani, 2013), non-alcoholic fatty liver disease, and diabetes type II (Vrieze et al. 2012), and even neuronal disorders such as Parkinson’s disease (Scheperjans et al. 2014). In most cases, differences in composition have been detected but mechanistic explanations are missing. In addition, it needs to be demonstrated whether these associations are a consequence or cause of the disease. Obesity and related metabolic aberrations represent one of the most actively studied diseases in relation to the intestinal microbiota. Studies addressing obesity-related phylogenetic profiles have yielded controversy results (Finucane et al. 2014). Those might be due to the applied methodology but also to the study cohort as the severity of obesity differed in the compared studies (Walters et al. 2014). On the other hand, several mechanisms that link gut microbiota to obesity and metabolic diseases have been proposed. These include increased energy harvest from the diet via SCFA production, enhanced fat storage as well as promotion of low-grade inflammation that contributes to development of insulin resistance (Moreno-Indias et al. 2014). This clearly shows that the functionality of the microbiota has to be studied in addition to the composition in order to understand the host-microbe cross-talk in obesity.

2.2 Methods to study the human intestinal microbiota

2.2.1 Faecal material as the study object

The easiest access to the human intestinal microbiota is faeces, which contains microbes and molecules that reflect as closely as possible the activities in the colon (Cummings, 1975). Since faeces can be obtained in a non-invasive manner, human studies targeting the intestinal microbiota have been mostly based on the analysis of faeces. In contrast, very few studies have involved samples taken from the ileum, intestinal

7

Review of the Literature mucosa (biopsies) or were conducted from other types of specimens such as luminal washes or biopsies (Booijink et al. 2010; Li et al. 2011). These studies clearly show that each compartment has its distinct microbiota with its own distinct functions. The components of faeces, which originate from host, food and microbes, reflect the whole digestive process. Faeces has been used as a cure for over thousand years and its composition has interested scientists for more than 100 years (Joslin, 1901). The bacterial mass represents more than half of the faecal wet weight, half of which were found to be intact microbial cells although many of these cells were not alive (Stephen and Cummings, 1980; Ben-Amor et al. 2005). Research has moved from describing the composition of faeces to using it for discriminating between different body conditions. Individual human proteins have been studied in faeces and used as disease markers e.g. calprotectin, a calcium binding protein derived from human neutrophils, is utilized as an inflammatory marker in inflammatory bowel disease (Foell et al. 2009). Elevated levels of calprotectin were also measured in a cohort of obese individuals indicating an inflamed state of the intestinal ecosystem in individuals with a high BMI (Verdam et al. 2013). The liquid phase of faeces, termed faecal water, has been widely used to test for toxic compounds, as reviewed by Pearson et al. (2009). Recently, faecal water has also been used to study the net outcome of host and microbiota metabolism and turn-over by metabolomics, both by nuclear magnetic resonance (NMR) and mass spectrometry (MS) techniques (Gao et al. 2009; Wu et al. 2010; O’Keefe et al. 2015). Since the metabolites present in faeces do not reflect the proportion of absorbed molecules and metabolites, it may be complemented with blood analysis to measure systemic metabolites. For example, the levels of plasma lipids have been determined and correlated to the intestinal microbial composition (Lahti et al. 2013a). Recently, several studies assessing the metabolites in faeces appeared. For example differences in microbial metabolites between chocolate-eaters and non-chocolate eaters were detected; another study described the effect of moderate wine consumption (Rezzi et al. 2007; Dewulf et al. 2013; Jiménez-Girón et al. 2014).

2.2.2 Phylogenetic analysis

The intestinal microbiota is phylogenetically rich, spreading over the three domains of life Bacteria, Archaea and Eukaryotes. There is a constant up-dating of phylogenetic grouping and classification (Rajilić-Stojanović and de Vos, 2014). Fungi and Archaea are more genetically deeply branched but also less researched than Bacteria, which represent the majority of intestinal microbes. However, Archaea, like Methanobrevibacter smithii and Methanosphaera stadtmanae, are of special interest due to their role in hydrogen disposal and in the production of methane that regulates the fermentation rate of some species (Gaci et al. 2014). Viruses add an additional “phylogenetic” complexity to the intestinal ecosystem. Similarly to the mycome, the virome has just started to be studied, but it should not be neglected when aiming to understand the entire intestinal ecosystem (Norman et al. 2014). The traditional way of identifying the phylogeny of a microorganism has been its isolation by culturing and studying its structural, physiological and biochemical characteristics such as cell-wall structure with Gram-staining and substrate preferences via feeding tests. There is the assumption that the majority of the world’s microbial biodiversity has not been cultured yet (Rappé and Giovannoni, 2003). At present, 1,057 intestinal microbial species have been cultured; 957 Bacteria, 92 Fungi, and 8 Archaea (Rajilić-Stojanović and de Vos, 2014). These represent less than half of the almost 2,500 known OTUs. Many of the intestinal microbes are strict anaerobes or are otherwise very demanding regarding their preferred culture and growth conditions. Moreover, some microbes may be obligate syntrophs and cannot be grown as pure cultures, or may grow very slowly demanding a patient researcher and long term funding. Molecular tools have been instrumental in exploring the microbial species variety. One specific gene has been established to categorize the phylogeny of an organism i.e. the gene encoding the 16S rRNA, an essential part of the prokaryotic ribosome. The 16S rRNA is an approximately 1,500 nt molecule, consisting of conserved gene regions and nine hypervariable parts. These characteristics make it an ideal phylogenetic marker and its sequence reflects evolutionary distance. The conserved regions are used to design nucleic acid

8

Review of the Literature primers for amplification of a broad range of species whereas the hypervariable part can be used for species identification or at least clustering it to its nearest neighbour. Quantitative polymerase chain reaction (qPCR) assays have been used to quantify certain microbial genera or species (Rinttilä et al. 2004). For community-wide microbiota studies 16S rRNA gene-based microarrays have been developed, such as the human intestinal tract chip HITChip (Rajilić-Stojanović et al. 2009). Microbial fingerprints have been used to compare the stability of the microbiota over time and relative differences of individual taxa between individuals and study groups. For example, phylogenetic fingerprints have shown clustering according to an individual’s health status such as the presence of IBD and obesity (Erickson et al. 2012; Verdam et al. 2013) and also served as predictors of the effect of diet (Korpela et al. 2014). As sequencing technologies have advanced and costs are sinking, phylogenetic analysis has reached a next level of sophistication. Instead of analysing a predefined set of phylotypes, the full diversity of specific microbiota can be explored. Next generation sequencing (NGS) techniques, sequencing techniques that followed the Sanger sequencing protocol and outperform it in terms of throughput and costs, made the global analysis of 16S rRNA possible. Sequencing platforms and strategies for gene assembly are reviewed in (Nagarajan and Pop, 2013). Accumulation of 16S rRNA sequences from various environments necessitate selection of publically available 16S rRNA for an annotation of the sequence reads as specific as possible. This will ease to find differences between individuals as those are more likely to occur at the species or even strain level rather than at the genus or higher levels. Such a database has been created recently for the intestine containing 2,473 OTUs, as indicated above, compared to over 100,000 contained in the 16S rRNA sequence repositories SILVA and Greengenes (Ritari et al. 2015). Accumulation of genomic and metagenomic data has supported the development of a software tool called Picrust that infers functions from the phylogenetic composition based on 16S rRNA gene information (Segata et al. 2012). Although technical developments have greatly facilitated phylogenetic analysis, one main factor still remains to complicate between-study comparisons i.e. DNA extraction. Large differences in the abundance of the major phyla have been found across studies that cannot be explained only by the sample origin (Lozupone et al. 2013). For simultaneous lysis of a diverse mixture of cells ranging from Gram-positive Firmicutes to Gram-negative Bacteroidetes, mechanical lysis by bead beating has been shown to be the less biased method (Salonen et al. 2010). In addition to DNA-extraction, sequencing specific biases such as the choice of primers, especially which variable region they are targeting, and the analytical platform used, affect the result of phylogenetic composition analyses (Lozupone et al. 2013).

2.2.3 Metagenomics

A number of technological developments including advanced PCR (Williams et al. 2006), next generation sequencing technologies, and new gene assembly strategies (see above) have made it possible to target the complete genetic content of microbial communities. Several studies have taken advantage of metagenomics to examine the genetic variety of the intestinal microbiota (Table 2). Sophisticated data processing strategies are required if one wishes to efficiently analyse the complex data generated. Hence, the recent years have seen the construction of several pipelines for processing raw sequence reads up to gene annotation, such as Medusa (Karlsson et al. 2014), Metaphlan (Segata et al. 2012), MEGAN (Huson and Weber, 2013), as well as metaProx that has been designed for faster annotation (Vey and Charles, 2014). In the late 2000s, two large projects focusing on the human microbiota were launched: the European MetaHIT with about 20 million Euro funding from the European Union focusing on the intestinal microbiota and the U.S. Human Microbiome Project HMP funded with over 100 million US $ from the National Institutes of Health, targeting microbiota at various body sites. Following the initial discovery of a baseline of 3 M genes, at the end of the MetaHit project, a total of 511 Europeans and 368 Chinese individuals had had their intestinal microbiota sequenced (Qin et al. 2010; Li et al. 2014). The HMP has sequenced over 500 human intestinal genomes as a reference data set to facilitate the assignment of metagenomic reads (Human

9

Review of the Literature

Microbiome Jumpstart Reference Strains Consortium et al. 2010). In addition, the microbiota from all body cavities of more than 100 subjects has been determined (Morgan et al. 2012). In both projects also data analysis tools were developed and the data has been made available to the scientific community (MetaHIT: http://meta.genomics.cn; HMP: hmpccad.com). The HMP has continued as integrated HMP (iHMP) which will focus more on the functionalities of the microbiota (Integrative HMP (iHMP) Research Network Consortium, 2014). Having started from sequencing clone libraries based on Sanger technology (Gill et al. 2006), the technological advances have been reflected in the rapid increase of the number of samples analysed in a single study, constantly increasing the number of sequenced nucleotides and the lengths of the reads while reducing the amount of time and money needed. Therefore, the number of analysed human metagenomes went up from two in 2006 to over 1,000 in 2014. Most studies have focused on identifying the genetic richness and coding capacity of the human microbiome with the focus on adults and European and U.S. inhabitants (Table 2). Mostly, the sequencing platforms HiSeq (Illumina) and 454 (Roche) have been used in the various projects. The fact that the next generation sequencing technologies is developing very rapidly is illustrated by the fact that less than ten years after its launch, the 454 sequencing platform is no longer supported by its manufacturer. The intestinal metagenomic sequencing efforts were recently combined into two separate gene atlases containing close to and over 10 Million microbial genes (Li et al. 2014; Karlsson et al. 2014; Table 2). Based on the data of both atlases, it was found that a core microbiome of close to 300,000 genes was shared by 50% of the study population. In the dataset collected by Karlsson and co-authors (2014), 1.3 million genes were reportedly shared by at least 20% of the population. The functional richness was assessed in the dataset collected by Li and co-authors (2014) and a total of 6,980 KEGG orthologous groups (KOs) (42.1% of genes) and 36,489 eggNOG orthologous groups (60.4% of genes) were identified (gene orthology databases like KEGG and eggNOG are detailed in 2.2.5). In the atlas reported by Karlsson and co-authors (2014) it was found that the core genes, i.e. genes found in at least 50% of the subjects, derived mainly from Bacteroides and Clostridium spp., which was to be expected as the majority of the intestinal microbiota belongs to the phyla Bacteroidetes and Firmicutes. More than 67% of the genes were without KO or known function but when looking at core genes, this number was reduced to 55% - this could indicate a bias in the annotation level of the shared house-keeping functions or just reflect the fact that more attention has been given to annotate the genomes of core bacteria. The fact that only half of the genes have been annotated indicates that, to a large extent, the functional individuality still has to be discovered. The explanation why individuals are either more or less prone to develop a certain disease or react in a specific way to a food component may lie in these unknown functionalities. In addition to metagenomes, sequencing of single microbial genomes is relevant for genome annotation and protein identification (see below). During recent years, there has also been a tremendous effort expended in these analyses. In 2007, there were only 306 bacterial genomes with 28 originating from the digestive system from Homo sapiens at the IMG/JGI system. The most recent release (04.02.2015) contained 23,908 bacterial genomes, out of which 1,847 had Homo sapiens as the host and in these, 398 digestive system as origin (1,034 genomes unclassified). The enormous increase of the number of sequenced bacterial genomes during the last years has been reviewed recently (Land et al. 2015).

10

Review of the Literature

6 and

;

ta resource

Da NA European Nucleotide Archive PRJNA28117 DDBJ/EMBL/ GenBank AN 32089 GenBank AN FJ362604 FJ372382 http://gutmeta. genomics.org.cn/ AN NCBI SRA04564 SRA050230

to

gene atlas gene

2010

et al. et al.

s

cluster orthologousof groups

;

Mio uniquegenes

00,000

3

,

Main finding Enrichedmetabolism glycans,of amino acids, xenobiotics 2407unique (COG) Functional differences between infants and adults and betweenhuman microbiota intestinal and other microbiota Overrepresentation: metabolism; carbohydrate underrepresentation: lipid transport & metabolism Core metagenome; differencesobesity in 3 Update of Qin 4,267,985 genes

and

gene

assembled

NA

-

R

genetic

metagenomic metagenomic

sequences

metagenomic

of metagenomicof

of of of

reads reads reads

of

40% un

b

b

;

gene

4 G

1

Sequencing method, dimension of information Sanger sequencing Bacterial 16S r distal gut metagenome 50,164 predictedreading open frames Sanger sequencing 1,057,481 sequence reads Sanger sequencing, NGS 1,937,461 partial bacterial 16S rRNA 2. sequence NGS 576.7 G sequence NGS 378.4 Gb sequence

)

69)

86

45 45 years:

-

– -

40.2)

mothers

4 days

)

±

17.9

(

(where available)

2

(range 18

years years (range 13

32 32 years) and 46

BMI, dietBMI,

-

age,

56% female,44% male diabetic patients and controls)

( (

7 months: 4; >1<3 years:24 2;

ender,

-

Study cohort Country of sample collection Sample number G U.S. 2 1 Male age 37, one female28 age one on unrestricted diet, one vegetarian Japan 13 7 male,female 5 3 7 U.S. 154 31 monozygotic and 23 dizygotic twin pairs (age 21 European Africanor ancestry concordant for obesity or leanness, lean/overweightor overweight/obese two sampling points with 57 intervals Denmark, Spain 124 Average age56 Average BMI 27kg/m China 368 43% female, male57% Average age48

a, b a, b a,

Overview significantof most the human intestinal 16S rRNA and metagenomic studies.

, 2006

Reference Gill Kurokawa, 2007 Turnbaugh, 2009 Qin, 2010 Qin, 2012

Table 2.

11

Review of the Literature

Read Archive

RAST RAST server

-

http://www.hmpdacc.org /HMGC/ MG Illumina rRNA 16SAN “qiime:850”; microbiome shotgun sequences AN “qiime:621” Sequence ERP002469 European Nucleotide Archive ERP003612 ERP002061

female 15 and male

;

18 18

878,816)

use referenceof genomes

risk individuals prediction

;

-

llumina platform; platform; llumina

groups pronounced

-

age

merindians

Microbiomes at body sites; Distribution of pathways across subjects more homogeneous than distribution taxa;of Functionality majorof cellular processes appeared be to similar across individuals 110 metagenomes Age and geographicdependent origin differences In all differences between U.S. and A Diabetic at by microbiome analysis Average 578,512genes/microbiome (range:59,147 Low andhigh diversity rich Genome assembly from metagenomic data without

for a

e PRJEB5224 e

metagenomic

reads

metagenomics

Gb ofGb

NGS 16 rRNA gene sequencing and subset of samples NGS 1,093,740,274 metagenomic sequence reads NGS 453 sequence NGS 10,067,996,412 metagenomic sequence reads NGS 4.5 Gb metagenomic sequence reads per sample

control

61)

-

years)

50

Venezuela; and

40

)

-

range

(

70 years

17 years 17(83years Malawian,

2

% male

s

years years

(range 18

5

mmg/MEDUSA/

4 56

-

46.6 kg/m

BMI 24±

53% female,47

(

U.S. 242 mean ageof 27± mean U.S. Amazonas of Venezuela, Malawian rural communities, USA metropolitan areas 531 115 individuals (34 families) from Malawi; 100 individuals (19 families) from 316 individuals (98 families) from the USA 326 individuals aged 0 65 Amerindian and 178 ofresidents the USA) plus 202 aged adults 18 Europe 145 70 years femaleold normal, orimpaired diabeticglucose Denmark 292 Average ageof BMI 18.1 33% lean, 9% overweight, obese58% Denmark, Spain 318 (including samplesQin from et al. 2010) Including andUC patientsCD

a

b

a

, 2012 ,

2013

continued

a, b a,

HMP consortium, 2012 Yatsunenko Karlsson, Le Chatelier, 2013 Nielsen 2014

Integrated in the gene collection by Karlsson et al. 2014; 782 metagenomes; 40 billion metagenomic reads from sequencing on I reads frombillion sequencing 40 metagenomic 2014; 782by metagenomes; et al. Karlsson Integrated gene collection the in

Integrated in the gene collection by Li et al. 2014; 1268 metagenomes at at European2014; 1268 metagenomes byNucleotideArchiv et al. Li Integrated gene collection the in

Table 2 project microbiome human number, generationHMP: next AN: accession sequencing, NGS: a b http://web.student.chalmers.se/groups/sysbio

12

Review of the Literature

2.2.4 Metatranscriptomics

One step down the cellular machinery to gene expression is the transcription into mRNA. However, since transcripts of bacteria and archaea have a short half-life, the design of the experimental approach and the preparation of the samples should be appropriate. In babies, where the metabolic turn-over is high and the intestinal transit usually very fast, transcripts may provide crucial information about intestinal activity. In adults mucosal or ileal samples are very appropriate for mRNA analysis and have provided essential insight into the metabolic conversions occurring in the upper intestinal tract where the microbes compete with the hosts for carbohydrates and form various trophic chains (Zoetendal et al. 2012; Zoetendal and de Vos, 2014). Moreover, this approach has revealed the functional activity of ingested Lactobacillus strains (Marco et al. 2010). In contrast, faecal samples from adults have been studied but the functional information which can be obtained from these specimens may be limited to the stable messengers (Booijink et al. 2010). It has been described that the digestive process in the human intestinal tract can last for up to six days (Cummings et al. 1978). Since the mRNA should be processed immediately, transcriptional analysis of faecal samples may be not become routine practice. Similar to 16S rRNA analysis, transcriptome analysis has moved from mere profiling and analysing a set of pre-defined mRNAs on a microarray to targeting each microbiota individually by sequencing mRNAs in an approach called RNAseq (Hampton-Marcell et al. 2013). Although faecal metatranscriptome data has to be interpreted with caution due to the above mentioned technical challenges, the present studies have provided evidence of the differences between potential function and regulated functions in vivo. In a recent study conducted in healthy men, the metatranscriptomes differed more than the metagenomes, reflecting the importance of functional studies as gene expression depends on many different factors (Franzosa et al. 2014). When individuals were challenged with a meat-fat based diet, samples clustered with KEGG assigned RNAseq data according to the diet but the taxonomic clustering remained subject-specific implying a strong effect of diet on microbial transcriptional activity (David et al. 2014).

2.2.5 Metaproteomics

Proteins are the real key-players of any physiological process. Among others, they can process substrates providing metabolic energy, have signalling and control functions within and outside the cells, have structural functions, and make a cell mobile. It is important to address the actual function out of the many possible ones encoded by the microbiome in order to obtain a more profound understanding of the activities of the intestinal ecosystem and the host-microbe interactions. Proteins are the ultimate targets for this as they determine the function of a cell. Moreover, protein synthesis is an energy-demanding process. For example in E. coli it was found that two thirds of the total energy used for growth was spent on protein biosynthesis (Neijssel and Mattos, 1994; Jewett et al. 2009). Hence, proteins are proxies of activity and since they are rather stable (Taverna and Goldstein, 2002) their analyses both represent information of on-going activities and those having taken place in the recent past. The term proteome describes the unity of proteins expressed at a certain time under a certain condition in an organelle or organism. This has to be expanded in the context of faecal metaproteomics as first reported by Klaassens and co-authors (2007) when they conducted a study in human babies. As detailed above, faecal material may contain proteins, or their fragments, derived from all steps of digestion. Therefore, faecal metaproteomics aims at analysing proteins expressed over a period of time. The proteins present in saliva (Harden et al. 2012), gastric fluid (Kam et al. 2011), bile acid (Barbhuiya et al. 2011), pancreas (Schrimpe-Rutledge et al. 2012; Paulo et al. 2013), and amniotic fluid (Michaels et al. 2007) have been characterized. Large-scale analysis of proteins, i.e. measuring hundreds to thousands of different proteins in one run has become possible due to the progress in two techniques i.e. liquid chromatography (LC), enabling the separation of several thousands of peptides based on their differences in hydrophobicity within a few hours,

13

Review of the Literature and mass spectrometry, making accurate mass measurement possible. Metaproteomics has advanced from the two dimension gel-electrophoresis (2-DE) approach followed by matrix assisted laser desorption/ionization time-of flight (MALDI-ToF) MS analysis to exploit mainly the 1-DE approach followed by LC tandem mass spectrometry (MS/MS), called GeLC-MS/MS. Instead of separating proteins according to their isoelectric point and molecular weight followed by the analysis of a limited set of proteins (2-DE), complex protein mixtures are fractionated by their molecular weight only (1-DE). The GeLC- MS/MS approach allows the analysis of a large mixture of peptides which are analysed in a single step by nano-high-performance LC-MS/MS. Reverse phase (RP) columns consisting of C18 molecules are used and a gradient increasing in hydrophobicity is applied to supply the mass spectrometer for peptide measurements. The various aspects of faecal metaproteomics have been reviewed recently (Study V of this thesis); the salient features of this discipline are summarized below (Figure 3).

Figure 3. Overview of a metaproteomics workflow with proposed specifications based on this thesis work.

Sample preparation Prior to LC-MS/MS analysis, the proteins need to be liberated from the sample matrix. As in all meta-omics techniques, a large collection of organisms can be the molecule-source for these proteins. Similarly, the proteins may be extracellular or intracellular. The available options for metaproteomics include the analysis of faecal water, chemical or mechanical lysis of either microbial cells, or all of the cells present in faeces. All of these options have found application in faecal metaproteomics but a systematic comparison such as for other metaproteomic samples like soil and activated sludge has yet to come (Keiblinger et al. 2012; Hansen et al. 2014). Due to limitations of the nano-HPLC and the mass spectrometers, complex protein mixtures need to be fractionated. This can be achieved by fractionation with 1-DE, followed by dividing the gel into several fractions that are then separately analysed by LC-MS/MS. Peptide mixtures can also be separated by strong cation exchange (SCX) chromatography and the resulting fractions analysed by reverse phase LC-MS. Another possibility is to inject the complex mixture into the LC and run a shallow gradient for a long period of time (see below).

14

Review of the Literature

LC-MS/MS Proteins are relatively stable molecules (see above) and this eases their analysis. To increase throughput, the proteins are digested into peptides prior to being subjected to MS. In order to reach the mass spectrometer after elution from the chromatographic columns, the peptides need to be charged by ionization. The mass analyser sorts the peptide ions according to their mass to charge (m/z) ratio and a detector measures their intensities. Modern mass spectrometers are usually hybrids containing two mass analysers; one high accurate mass analyser measuring the mass of intact peptide ions i.e. parent ions and, for easing down-stream peptide identification, one less accurate but fast mass analyser for fragmenting peptide ions and analysing the masses of the fragment ions. Several solutions are available for each component of a mass spectrometer, i.e. ionization source, mass analyser, and detector. These are compatible with each other and many different set- ups exist (Schneider and Riedel, 2010; Graham et al. 2011). There are also different methods available for fragmenting peptide ions (Guthals and Bandeira, 2012) resulting into different ion types. Typically, in metaproteomics studies, the peptides are ionized after nano-LC by electro-spray ionization and their m/z value analysed by hybrid mass spectrometers. The fragmentation is carried out by collision-induced dissociation leading to the production of mostly b- and y-ions, a fragmentation pattern which is appropriate for peptide-identification. Since its invention in the late 1980s, mass spectrometers have undergone rapid developments, both in terms of accuracy of the m/z determination and of speed of molecules analysed within a certain time window. In this thesis, this is illustrated by the fact that with the mass spectrometer used in Study IV five times more fragments were measured in the same time window (duty cycle) compared to Study I. These developments allow the analysis of complex protein mixtures, such as those derived from environmental systems, including the human intestinal tract. The invention of nano-HPLC allows the injection of few l and therefore sample availability is mostly not the bottle neck. This is a clear advantage to 2-DE. Despite the fact that only a limited number of human intestinal metaproteomic studies have been reported, various options for sample preparation and data analysis have been described, reflecting the multitude of options the proteomics tool box is offering. Least variation is seen in the LC-MS/MS setup reflecting the 1D gel/gel-free approaches and nano-LC plus Orbitrap as state of the art. In the faecal metaproteomics approaches, three major pipeline solutions have been applied: 2-DE followed by LC- MS/MS, 1-DE followed by LC/MSMS, and SCX chromatography coupled to LC-MS/MS (SCX RP LC- MS/MS) (Table 3). The sample complexity of environmental samples is generally higher than in classic proteomics studies where the proteins originate from one organism and are usually present in a homogenous sample matrix. In addition, one dogma for proteomics i.e. that all (or at least the majority) of expected proteins is contained in the sequence database (Nesvizhskii, 2014), is hardly fulfilled in metaproteomics. The discrepancy between the large interest in metaproteomics and its analytical challenge is expressed by the fact that the number of review articles exceeds the number of experimental articles. An overview of the human intestinal metaproteomic studies highlighting the differences in analytical approaches, used databases and outcome in protein numbers and identified functionalities has been recently published (Study V of this thesis). Since its publication, three additional studies have been reported which are focusing on method development and those are detailed in the Results and Discussion section of this thesis (Lichtman et al. 2013; Xiong et al. 2014; Tanca et al. 2015). One additional study described proteomic differences between healthy controls and patients with Crohn’s Disease (CD) based on 2-DE profiles of faecal proteins (Juste et al. 2014). Of note, the 16S rRNA analysis did not reveal any grouping according to the health status among the studied samples, highlighting the added value of proteome compared to conventional microbiota analysis based on phylogeny.

15

Review of the Literature

Table 3. Features of pre-fractionation techniques applied in human intestinal metaproteomics.

Feature 2-DE 1-DE SCX Required protein amount > 10 µg 1-10 µg 0.01 -0.001 µg Throughput sample Low Medium High preparation Throughput LC-MS/MS Low Medium Low Identification efficiency 10 – 100 proteins from one >500 proteins from one >500 proteins from one sample sample sample 2-DE: two dimensional gel electrophoresis; 1-DE one dimension gel electrophoresis; SCX: strong cation exchange

Peptide identification There are three main routes for inferring a peptide sequence to an m/z value. Those are peptide spectral matching (PSM), “de novo” sequencing, and the use of spectral libraries (Lam et al. 2008; Allmer, 2011; Vaudel et al. 2012). Most frequently applied is PSM which relies on matching parent and fragment ions against an in-silico digest of a protein sequence database. Many different algorithms have been developed for performing this task; these are available via open source tools or commercial software (Shteynberg et al. 2013). The choice of the protein sequence database is a very crucial one due to the complexity of the analysed samples in faecal metaproteomics. Hence, this is an essential step in the whole metaproteomic pipeline which is elaborated in the Results and Discussion part of this thesis. De novo sequencing i.e. direct inference of a peptide sequence from the m/z value is not yet routinely used in metaproteomics (Cantarel et al. 2011; Muth et al. 2015) as statistical methods to control for false positive identifications need to be developed. Spectral libraries, i.e. high quality reference m/z repositories, do not yet exist for metaproteomics. Since PSM is a probability-based approach, the number of false positive identifications has to be controlled and kept on a low level (Jeong et al. 2012). One common method is to use the results from PSM against a decoy database, which for example can be the sequences from the target database in reverse or a shuffling of the amino acids (Yadav et al. 2012). This approach has been introduced for classical proteomics and it has been shown that there is still improvement regarding sensitivity and accuracy of PSM in metaproteomics (Muth et al. 2015). False discovery rate estimations at the protein level are impaired by the problem of assigning peptides to proteins in metaproteomics. As the same peptide can be part of proteins originating from various species, phyla or even domains of life, protein inference is put to its extreme in metaproteomics. Therefore, analytical strategies are required to increase the statistical validity of PSM in metaproteomics. Specific post-translational modifications (PTMs) have not been considered in any human intestinal metaproteome study, as the complexity is already extremely high. A large variety of bacterial PTMs is known and includes phosphorylation, methylation and glycosylation, reviewed by Cain et al. (2014). In general, targeted proteomics approaches are required to identify for example phosphopeptides, reviewed by Huang et al. (2014). A proteomics approach has shown that some of their very abundant glycolytic enzymes of Lactobacillus rhamnosus GG are phosphorylated (Koponen et al. 2012). A large variety of PTMs are to be expected to be identified from intestinal metaproteomes.

Quantification In addition to identify the different proteins in the metaproteome, quantifying their amounts is required for sample comparison. Absolute quantification cannot be achieved in high-throughput proteomics but basic approximations for assessing protein quantity exist and are very useful. Two major approaches can be taken: labels can be introduced to compare between different samples based on the ratio of the differently labelled peptides, but also label-free quantification has been made possible during the last decade due to technical and software improvements (Seifert et al. 2012; Otto et al. 2014; Arsène-Ploetze et al. 2014). The latter approach has been mostly applied in faecal metaproteomics. Label-free analysis is either based on the chromatographic

16

Review of the Literature peaks of MS1 data, i.e. is the unity of all measured peptide ions, to compare peak intensities across experiments. This requires sophisticated software to adjust for differences in chromatography and peak intensities (aligning and normalization). Another, more simplistic but effective approach is counting the spectral hits for a certain unit (that may be proteins, peptides, taxonomic or functional unit). Various normalization methods have been described, reviewed by Neilson et al. (2011). During the last decade, selected reaction monitoring (SRM) methods, also called multiple reaction monitoring (MRM) methods, have been developed to quantify targeted proteins, reviewed by Liebler and Zimmerman (2013). Proof-of-principle studies identify proteins of interest and then a method is established to quantify these proteins. In a study on the metaproteome of the mucosal samples of CD patients a subset of proteins that were found to be significantly differently expressed in the 2-DE experiments, were targeted and verified with SRM/MRM (Juste et al. 2014). All of the SRM/MRM experiments of the thirteen proteins verified the earlier 2-DE results. Recently, SRM methods for 10,000 human proteins have been published (Rosenberger et al. 2014). In future, such methods may also be developed in large scale to target specific microbial proteins derived from the human intestine. In general, the evaluation of an approach is crucial for developing a method for multi-sample comparisons, especially when a multitude of options is available for each single analytical step. Already for the 1D separation and consequential protein in-gel digestion, a variety of choices have to be made and these will influence the subsequent results (Speicher et al. 2000; Glatter et al. 2012). Hence, the evaluation of a metaproteomic pipeline needs to be carefully planned and executed. The technical development in MS-based proteomics poses a challenge for data analysis and data storage (data repositories) as well as handling the constantly increasing size of raw files. A file containing the MS/MS spectra of a single sample or gel slice can be as large as 0.5 GB. Improvements have been achieved by standardisation of files for across-pipeline usage and public data repositories to enable reanalysis with constantly improving data analysis tools. For example, analysis solutions have been developed, with cloud computing to maximize compute resources (Muth et al. 2013). To handle these big data, software packages for data visualization are required; these have been reviewed recently (Oveland et al. 2015).

Proteomics applied in microbiology Proteomics has been widely used to study microbial activity but there are only very few studies on commensal intestinal microbes, in-vitro as well as in animal model (Schneider and Riedel, 2010; Otto et al. 2014). Performing a literature search on PubMed on 15/02/23 with the term “proteomics” in combination with either “pathogen”, “probiotic”, or “commensal” resulted in 1657, 81 and 55 hits, respectively. For example, the pathogen Staphylococcus aureus has been studied for its response to glucose-starvation using a proteome approach (Michalik et al. 2012). A reference proteome was reported for L. rhamnosus GG which is widely marketed as a probiotic and consumed world-wide, and this paradigm probiotic strain was analysed by proteomics in several culture conditions also resembling the intestinal environment (Koskenniemi et al. 2009; Koskenniemi et al. 2011; Koponen et al. 2012). Recently, the same group of researchers analysed the surface proteome of L. rhamnosus GG as this possibly contains the interacting molecules in the host upon consumption and identified expected secreted proteins as well as a set of moonlighting proteins (Espino et al 2015). Other probiotic or model strains that have been studied by proteomics are L. plantarum WCFS1 (Cohen et al. 2006), Bifidobacterium longum subsp. infantis (Kim et al. 2013), Propionibacterium freudenreichii (Le Maréchal 2015), and Weissella confusa (Lee et al. 2008). One example of a commensal of which its proteins were studied is Enterococcus faecalis (Lindenstrauß et al. 2013). In addition to measuring activity, proteomics has recently been used to identify microbial strains by MALDI in the case of Bifidobacterium animalis (Ruiz-Moyano et al. 2012). Similarly, the proteome of several human body compartments have been determined as detailed above, but only rarely the data is also searched against bacterial sequence entries.

17

Review of the Literature

Inferring functions to proteins Once a new genome is sequenced, the function of its genes or proteins is inferred from homology analysis with previously annotated genes or proteins; in the best case scenario the functions of selected genes are discovered by knock out, overproduction and phenotypic analysis. Due to the exponential growth of sequenced genes and therefore hypothetical proteins, experimental proof of the functions of these genes is lagging far behind. In the most recent release of the Universal Protein Resource Knowledgebase (UniProtKB) (2015_02 of 04-Feb-2015), the curated part Swissprot database contained 547,599 entries (http://web.expasy.org/docs/relnotes/relstat.html), whereas the non-curated part Trembl contained more than 100-fold more sequences, with 90,860,905 sequences (http://www.ebi.ac.uk/uniprot/TrEMBLstats/). It appears that the Swissprot database has reached a plateau whereas in Trembl the number of sequences is doubling approximately every year. Evidently, this generates the problem that annotation of new genes mostly relies on non-curated sequences and therefore introduces biases. There are several approaches linked to extensive databases that can be used to assign proteins to functional groups (Table 4). They differ in terms of the number and type of genomes on which the classification has been based on. In addition, the amount of levels for functional grouping varies and can be shallow like for the Cluster of Orthologous Genes (COG) (Tatusov et al. 1997) or more branched like the Kyoto Encyclopedia of Genes and Genomes orthology (KO) (Kanehisa et al. 2004). The Cluster of Orthologous Genes was originally built for bacterial proteins but extended for human functions as well (Koonin et al. 2004). However, there has been a break of more than ten years in updating this classifier as a result of funding issues. Considering the tremendous increase in sequenced genomes, this clearly created insufficient coverage but the recent 2014 update provides a reasonable starting position (Galperin et al. 2015). The database eggNOG built on the COG database but has a higher sequence coverage and therefore provides a good candidate for the functional annotation of especially bacterial genomes (Jensen et al. 2008). Recently, the Metaproteome Analyser (MPA), a pipeline which ranges from PSM to functional annotation, was released (Muth et al. 2015). One sample can be analysed with a collection of up to four non- commercial search algorithms within one analysis process. The interface allows representing the taxonomic and functional results in overview graphs. However, for this feature uniprot identifiers are required to infer and function and cannot be used in all metaproteomic approaches yet, as many protein candidates for metagenomic protein sequence databases do not contain uniprot identifiers.

Table 4. Overview of gene orthology databases.

Database Description Source COG Cluster of orthologous 25 functional families http://www.ncbi.nlm.nih.gov/COG/ genes 4,631 different COGs (Galperin et al. 2015) 711 full length eukaryotic, archaeal and bacterial genomes eggNOG non-supervised 25 functional families http://eggnog.embl.de orthologous groups (Powell et 1,700,000 orthologous groups al. 2014) Full length genomes 3,686 organisms (eukaryotic, archaeal and bacterial) KEGG 18,449 KEGG orthologies (KO) last free version 2011 (Kanehisa et al. 2012) (2015/2/18; inferred from KEGG GENES 16,055,760), 469 pathway maps as one part of the KEGG database one KO may belong to a variety of higher levels

18

Review of the Literature

2.2.6 Data integration

Studying the genes, transcripts, proteins or metabolites of the human intestinal microbiota all have their individual challenges. However, for a systemic understanding, different perspectives have to be taken at the same time and different data sets need to be integrated. Recent integrated approaches of the intestinal metaproteome followed a simple path and mainly described the study cohort (with only a limited set of subjects) from different angles and reported the similarities or differences with different data sets, like metagenomics and metatranscriptomics (Erickson et al. 2012; Ferrer et al. 2013; Daniel et al. 2014). Real integration still needs to be further enforced, as it is exemplified in this thesis, by using different types of data, like correlating the abundance of present community members with proteins to explain certain health conditions. Models of metabolic fluxes and network-modelling can be integrated as well (Faust et al. 2012). The term systems biology has been largely used for single organism studies but as analytical methods as well as bioinformatics are advancing, it is now possible to examine the human intestinal microbiota in a systems biological approach (Dos Santos et al. 2010; Siggins et al. 2012; dos Santos et al. 2013). Especially for generating hypotheses, intestinal models, as reviewed in (Venema and Van den Abbeele, 2013), as well as animal models help in reducing the complexity of the intestinal ecosystem. These hypotheses can then be tested in human trials. The differences between mice and humans have been reviewed recently (Nguyen et al. 2015). When testing the effect of diet on a consortium of 12 human bacterial strains in a mouse model, transcriptomics, composition analysis and proteomics revealed different changes at each analytical level (McNulty et al. 2013). The share of unique peptides per strains in the in vivo community was different to the predicted contribution based on genome comparison. These findings evidence the need for studying the intestinal community with complementary omics methodologies, not only to increase the insight, but also to reduce biases of each of the approaches. A new methodology is the use of stable isotope probing (SIP) that has been used to identify the bacteria with active metabolic conversions in an intestinal model and could also be used in humans to study protein fluxes (Kovatcheva‐Datchary et al. 2009). Recently, SIP has been used to study the active community found in mouse intestinal biopsies (Berry et al. 2013).

19

Review of the Literature

3. AIMS OF THE STUDY

The main aim of the thesis work was to characterize the intestinal ecosystem based on the proteins identified from faecal material. In detail, there were four main aims of this study:

I) To develop methods for assessing the metaproteome from the human intestinal tract and implement these in a proof-of-principle study;

II) To determine the intestinal functionalities and their short-term variance in healthy adults in the context of a probiotic intervention;

III) To conduct a comparison of the intestinal functionality of non-obese and obese individuals based on microbial and host proteins;

IV) To evaluate the methodologies which can be used to study the metaproteome of the human intestinal microbiota.

20

Materials & Methods

4. MATERIALS & METHODS

4.1 Study cohorts and samples During the course of this work faecal samples from four different study cohorts were analysed (Table 5). The individuals in Study I took part in the Micro-Obese cohort recruited from the large SU.VI.MAX2 cohort, in which individuals had been followed for eight years for their healthy nutritional and lifestyle habits (Hercberg et al. 2004). For Study II, which was a proof-of-principle study to assess the reproducibility of the developed approach and to determine the variation over time, samples were collected in-house from adults who provided informed consent. The study participants in Study III were a sub-set of a larger cohort of initially 68 individuals taking part in a probiotic trial testing three different probiotic strains in a double-blind setting (Lactobacillus rhamnosus GG, 1010 cfu per day; Bifidobacterium animalis ssp. lactis Bb12, 108 cfu per day; and Propionibacterium freudenreichii ssp. shermanii JS, 1010 cfu per day). The study product, to be consumed daily for three weeks, was a 250 ml fruit-based milk-drink without or with the probiotic strains. The set-up of this trial, the determined immune-markers and lipids, and the correlation between levels of lipids and the microbiota had been reported previously (Kekkonen et al. 2008a; Kekkonen et al. 2008b; Lahti et al. 2013a) (Figure 4). Metaproteomic analyses were carried out on samples from eight individuals from the placebo group and from eight individuals from the L. rhamnosus GG group. In Study IV, the intestinal metaproteomes were analysed in faecal samples from nine lean (BMI > 18.5 < 25 kg/m2), four overweight (BMI > 25 < 30 kg/m2), and 16 obese subjects (BMI > 30 kg/m2; N=4); of these seven were morbidly obese (BMI > 45 kg/m2). The microbial composition and clinical host variables had been determined previously (Verdam et al. 2013).

Table 5. Overview of study cohorts analysed in this thesis

Study Type Country of Subjects No. Timepoints Sample Age BMI sample (Female:Male) (intervals) storage range collection Study I Observational France 2 (0:2) 1 Fresh 61-63 < 30 study Study II Proof of Finland 3 (3:0) 2 (1y/ -70° C 23-35 < 30 principle study 1/2 y) Study III Placebo Finland 16 (11:5) 3 (3 weeks) -20° C o/n 27-54 18.4– controlled  -70° C 30.4 probiotic intervention Study IV Comparative The Netherlands 29 (20:9) 1 +4° C 18-54 18.6– study  -20° C 60.3 within one day

Figure 4. Study design of Study III with respect to the analyses described in this thesis.

21

Materials & Methods

4.2 Methods

4.2.1 Metaproteomics

Study I was conducted within a collaboration in which the proteins had already been extracted from faecal samples by first isolating microbial cells by Nycodenz gradient centrifugation (Murayama et al. 2001), followed by disruption of the cells by chemical lysis, as described in Study I in this thesis. In all other studies an in-house developed sample preparation approach was applied that included the following key features: i) no fractionation of faecal material, ii) extraction of proteins via mechanical lysis with bead beating (0.1 mm zirconium silicate beads) under anaerobic condition (gaseous nitrogen), and iii) protein fractionation into three parts according to molecular weight by one dimensional gel-electrophoresis with focusing on the approximately 40-80 kDa fraction. The proteins contained therein were termed the Abundant MetaProteome AMP (see Results and Discussion). The details of the applied metaproteomic methods have been described in the original articles (Studies I- IV). Table 6 presents an overview of the applied metaproteomic techniques from sample preparation to data analysis as applied in the four sub-studies.

Table 6. Overview of proteomics methods applied in this thesis Study I II III IV Sample Protein Extraction From microbes isolated from faecal material X Preparation (=bacterial pellet) From unfractionated faecal material X X X Detergent buffer (urea, thiourea, CHAPS) X Phosphate buffered saline (PBS) X X X FastPrep FP120 (Savant) @ 0.1 mm zirconium X silicate beads FastPrep 24 (MP Biomedicals) @ 0.1 mm X X zirconium silicate beads 1D PAGE NUPAGE 4-12% Bis-tris (Invitrogen); reducing X X X X agent, sample buffer, and MES running buffer (Biorad) Fractionation 20 gel pieces per gel lane X 3 gel pieces per gel lane, analysis focus on AMP X X X In-gel digestion (Shevchenko et al. 2006) X X X X Peptide clean-up GF/C filtration and OMIX tips X C18 microspin columns X HPLC-ESI- Nano-HPLC Gradient run length (min): 50, 85, 100, and 120, X X X X MS/MS respectively Mass spectrometers LTQ-Orbitrap X LTQ-Orbitrap XL X Orbitrap Velos X Velos Pro-Orbitrap Elite X

22

Materials & Methods

Table 6 continued

Data-analysis Peptide spectral OMSSA (Geer et al. 2004) X X X X matching – search algorithm X!Tandem (Craig and Beavis, 2004) X q-value correction (Käll et al. 2008); filtering of X ambiguous hits Peptide spectral Matched metagenomes, collection of bacterial X matching – database genomes (=synthetic metagenome) + unmatched metagenome Gut dedicated database (db) X Extension of db in Study II X X Separate search against subsets of db X Search against complete set of sequences in db X X Peptide grouping X False discovery Concatenated target and decoy db X X X rate(FDR) estimation Separate target and decoy db X FDR stringency 1 % X 5 % X X X Functional COG BLAST (Tatusov et al. 1997; Altschul et X X X assignment al. 1997) KEGG assignments through NCBInr BLAST X and subsequent MEGAN 5 analysis (Huson and Weber, 2013) EggNOG (Powell et al. 2014); Ublast (Edgar, X 2010) Functional grouping iPath (Letunic et al. 2008) X microbial gastro-intestinal metabolic modules X GMM (Le Chatelier et al. 2013) Taxonomic In-house python scripts – UniprotKB X assignment - Lowest common ancestora (LCA) Unipept (Mesuere et al. 2012) X X Quantification Progenesis LCMS (MS1 based) X Spectral counting (MS/MS based)b X X X a Peptide matching against UniprotKB entries with 100% identity; assignment of lowest taxonomic unit to which the peptide was uniquely mapped; in each case all the hierarchical levels were reported allowing the sub-setting of the data e.g. for bacterial peptides b Counting of the number of spectra identified per comparative unit e.g. KO or phylum; relative percentages in Studies I - III; absolute numbers in Study IV

4.2.2 Phylogenetic microarray analysis

In addition to the metaproteomic data, phylogenetic data was available. The experiments for nucleic acid based analysis were carried out by our research team (Study II and III) and collaborators (Study III, Study IV). In Study II and IV, faecal DNA was extracted by mechanical lysis as described in (Salonen et al. 2010) and in Study III with a modified Promega method as described in (Ahlroos and Tynkkynen, 2009). Of all samples in Studies II-IV, with the exception of four samples in Study III, the Human Intestinal Tract Chip (HITChip), a phylogenetic microarray described previously (Rajilić-Stojanović et al. 2009), was used to identify the bacterial composition of the faecal samples. The results of the microbiota composition in

23

Materials & Methods studies III and IV had been reported previously (Verdam et al. 2013; Lahti et al. 2013a). The HITChip targets the V1 and V6 hypervariable regions of the 16S rRNA gene of over 1000 bacterial phylotypes. Phylogenetic organization of the microarray probes and data pre-processing have been explained in detail elsewhere (Rajilić-Stojanović et al. 2009; Salonen et al. 2010; Jalanka-Tuovinen et al. 2011). In Studies III and IV, the probe signals were summarized with Robust Probabilistic Averaging algorithm (fRPA) (Lahti et al. 2013b) per phylotype and further to 131 genus-like levels, which are groups of bacteria sharing ≥90% sequence identity, and referred to with species name and relatives.

4.2.3 Statistical analysis

In studies II-IV, the statistical programming language R was used for statistical data analysis (R-core team).

Unsupervised methods For the correlation of various data-sets either Pearson (Studies II and III) or Spearman correlation was used (Study IV). The Jaccard Index (Levandowsky and Winter, 1971), in this case “number of shared peptide/peptide union”, was used to compare the metaproteomes at the level of identified peptide sequences (Study III). Complete linkage hierarchical clustering was applied to visualize sample relationships (Studies II-IV). Principal component analysis (PCA) (Venables and Ripley, 2002) was used as the ordination method to identify feature based clusters (Study II). In addition, principle coordinate analysis PCoA was used to identify sample clusters based on distances (Paradis et al. 2004). Jackknife iteration was used to test the statistical significance of the separation. Random forest analysis of ANOVA results was applied in Study II (Breiman, 2011; Faraway. 2002). Q-value corrections were applied. Wilcoxon test was used to test for group differences (Study IV).

Supervised method Boots-trap aggregated redundancy analysis (bagged RDA), applied and described in (Jalanka-Tuovinen et al. 2014), was used to identify the NOGs explaining most of the clustering of the study groups in Study IV.

4.2.4 Additional data-sets used in this thesis

Study I Study II Study III Study IV Metagenomic reads, Phylogenetic microarray Phylogenetic microarray Phylogenetic microarray NCBI BioProject 33303, data data; data; samples NO1 and NO3 Symptom diary Clinical host variables

24

Results and Discussion

5. RESULTS AND DISCUSSION

The general objective of the work described in this thesis was to gain a functional insight into the intestinal microbiota based on a metaproteomic analysis of human faecal material. For this purpose, appropriate sample preparation methodologies and a data analysis pipeline had first to be developed. These analytical strategies were applied to faecal samples collected from a total of 50 adults across four different studies. The functional individuality of the microbiome and changes in it occurring over time were addressed. In addition, possible modulations of these functions due to both a probiotic intervention and to obesity were followed. While the emphasis was placed on addressing the microbial functionalities, the developed approach also enabled the simultaneous analysis of the host functions.

5.1 Method development to study the human intestinal metaproteome (I, II and V) At the start of this project only one faecal metaproteomic study had been published, describing the metaproteome of baby faeces in the first 100 days of life (Klaassens et al. 2007). However, this first study was based on the classical 2-DE approach, before the appearance of the high throughput LC-MS/MS analytical technologies, and furthermore, it targeted baby microbiota which has a low complexity as compared to the adult microbiome. Moreover, no reference databases were available for the intestinal proteins. Hence, here a novel methodological pipeline had to be developed for analysing faeces obtained from adults; i.e. starting from refining the methodologies for preparing faecal samples for LC-MS/MS analysis and through different stages to finally devising a strategy for complex data-analysis. First, protein extraction and fractionation methods needed to be developed, applicable for the analysis of several dozens to hundreds of samples (Study II). Second, in the peptide spectral matching, as large a collection as possible of target sequences was needed with which to construct an intestine-specific database. As the sequence analysis of the intestinal metagenome has made very rapid progress, this database had to be updated (Studies I-IV). Third, a set of different gene orthology systems were applied to functionally annotate peptides and proteins, as no established system for the human intestinal proteins existed (Studies I-IV). Fourth, since previously the phylogenetic origin of a peptide had not been inferred from intestinal proteins, this was undertaken for the first time (Studies II-IV). Moreover, statistical approaches, like the Jaccard index and the integration of metaproteomic and phylogenetic data, originating from different disciplines were adopted for metaproteomics (Studies II-IV). Finally, a critical assessment of the methodologies used in human metaproteomics studies was performed and future needs and directions of human metaproteomics were proposed (Study V).

5.1.1 Analysis of the abundant metaproteome (II)

Mechanical lysis of microbial cells from adult faecal samples using a bead beating procedure had been shown to generate high yields of DNA that were suitable for determining the phylogenetic composition of a microbial ecosystem (Yu and Morrison, 2004). Hence, this method was applied to extract proteins from adult faecal samples. Similarly to the amount used for DNA extraction applied in Studies II-IV, a total of 125 mg faecal material was used as starting material (Salonen et al. 2010). Bead beating of the sample after three fold dilution in PBS with 500 mg zirconium silicate beads, for cell disruption, and three glass beads, for homogenization during the lysis process, under a gaseous nitrogen atmosphere resulted in protein concentrations high enough to produce highly visible Coomassie stained protein bands appropriate for consecutive nano-LC-MS/MS analysis. Proteins were fractionated according to their molecular weight by 1-DE. In addition to the size separation, this also provided a significant purification of the protein extracts and no further cleaning of the sample was needed anymore as these often results in dilution and the subsequent necessity to concentrate the samples. The 37 and 75 kDa bands of a pre-stained marker were used as molecular weight controls in the

25

Results and Discussion production of three gel fractions: one containing proteins larger than 80 kDa, one with a molecular weight range of approximately 40-80 kDa and one with proteins smaller than 40 kDa. The proteins contained in these fractions were called High molecular weight MetaProteome HMP, Abundant MetaProteome (AMP), and Low molecular weight MetaProteome (LMP), respectively. The proteins contained in the ~40-80 kDa region were termed AMP as the majority of proteins were covered in this region. By comparing the three molecular weight fractions, it was observed that 81.4% of the different proteins across all fractions were also identified in the AMP (Study II, 6 biological samples). A further motivation to focus on this area was that it may be difficult to identify high molecular weight proteins and the lower molecular weight fraction is likely to contain ribosomal proteins and partially digested proteins. In Study II, where the approach was published for the first time, AMP was referred to proteins present in the 35-80 kDa region. This terming was based on few individuals’ samples. Later, when analysing dozens of different samples, it was noticed that cutting the gels below the 35 kDa marker band may not always be convenient and a cutting above is required in order to have sample intrinsic marker bands for cutting which might be slightly above the 37 kDa band of the pre- stained marker. On average, in the AMP, almost 5500 unique peptides and 460 NOGs, as indicator for functional depth, were identified in one LC-MS/MS experiment with the experimental specifications as provided for Study IV (Table 6). Due to the complexity of the sample matrix and the proteins present in faecal material one sample preparation approach will always allow identification of only a subset of proteins present in the sample. Therefore, recently, various methods have been described that could increase the number of proteins identified in a metaproteomic experiment. Two main approaches are available for the identification of low- abundance proteins in the intestinal metaproteomes, i) separating the faecal fractions and ii) fractionation of proteins. Two recent studies described approaches to separately analyse bacterial and human proteins (Lichtman et al. 2013; Xiong et al. 2014). Analysing the proteins present in faecal water identified 53 human core proteins in a sample size of three individuals (Lichtman et al. 2013). This number corresponds to the 23 human KOs - which are a higher abstraction level than proteins - found to be the human functional core in Study III (data analysis this thesis; core defined as number of KOs identified in at least one sample of each individual). Therefore, the approach described here that is based on taking crude faecal material covers the same range of human proteins as assessed in the so-called targeted approach. The tedious separation described by (Xiong et al. 2014), which included several sample-preparation steps to separate human and microbial cells from each other, appeared to double the number of identified microbial proteins by two-fold compared to an approach analysing both fractions at once. Therefore, as found in other proteomics studies, increasing the level of fractionation increases the level of identifications. In addition to sample separation strategies, increasing the analysis time for one sample might increase protein coverage. This can be achieved by undertaking an extensive fractionation of proteins or peptides. For mouse faeces, filter assisted protein digestion (FASP), a protein digestion method which circumvents the in- gel digestion, in combination with a gradient running for over eight hours, produced over 16,000 peptides in four runs (Tanca et al. 2014). The gain in functional information compared to other approaches was not detailed. The same group compared the effect of differential centrifugation and achieved an enrichment of microbial peptides when separating bacterial cells from other faecal components by applying different centrifugation speeds. The preparation of a microbial pellet resulted into the identification of approximately 6,000 microbial peptides, whereas when the unseparated material was taken only around 4,000 peptides could be identified (Tanca et al. 2015). However, in the approach analysing the bacterial fraction, it appeared that the Bacteroidaceae were underrepresented. When comparing these other studies to the approach developed in this thesis (Study II), differences could be observed also at other steps of the pipeline, e.g. in the method of lysing the cells, the analytical platform used, as well as in the databases used (Table 7). These differences impair any direct comparison between the various studies. In general, the quantity and quality of fractionation and time spent per sample differs depending on which combination of techniques is selected. At

26

Results and Discussion the functional level, no major discrepancies appeared when comparing the different studies based on the identified proteins (Study V).

Table 7. Brief overview of examples for published method combinations in faecal metaproteomic experiments. Sample preparation Kolmeder et Tanca et al. 2014 Erickson et al. Juste et al. al. 2012 2011 2014 Crude faecal material X Bacterial pellet X X X

Digestion In-gel digestion X X FASEP X In-solution X

Analytical platform 2-DE X SCX X LC-MSMS moderate gradient < 3h X LC-MSMS Long gradient > 3h X

Protein sequence database Bacterial genomes X X X X Metagenomes X X

5.1.2 Reproducibility testing of the pipeline (II)

The effect of sample preparation on the reproducibility of the LC-MS/MS results had to be determined in order to allow its application for robust sample comparisons. For this purpose, MS1 profiles were used, since with respect to the analytical outputs, they represent best the uniqueness of each in-gel digest. Replicates of the protein extraction, in-gel digestion, and LC-MS/MS were taken for comparison. In the analyses, samples from six biological samples were included representing three different individuals and two sampling time- points. Altogether, the comparison was performed with 26,000 MS1 events across 22 AMPs. Based on Pearson correlation coefficients, the technical variation was compared to the variation of different samples from the same individual and in samples from different individuals (Figure 4). The technical variation, including the variation introduced by protein extraction, 1-DE/in-gel-digestion and LC-MS/MS analysis, was significantly lower than the variation of samples from the same individual collected on two different days with at least half a year time in-between. The variation observed in different samples from the same individual was significantly lower than the variation observed in samples taken from different individuals. This provided assurance that the developed approach could be applied in consecutive studies for comparing the functionalities of the intestinal microbiota across samples.

27

Results and Discussion

Figure 4. Boxplots of the distribution of the Pearson correlation coefficients (y-axis) to assess the effect of sample preparation, temporal and subject variation.

5.1.3 Development of in-house databases (I-IV)

One specific challenge in metaproteomics is the need for an adequate protein sequence database being a fair representation of the vast variety of different proteins present in the sample. The intestinal microbiota with the hundreds to thousands of different microbial species is an extreme version of this challenge. The individuality of the intestinal metaproteome and the consecutive required targeting of the database were demonstrated with metaproteome analysis of two samples for which the metagenomes had been analysed and were available for PSM i.e. matched metagenomes (Study I). Protein sequence databases were constructed by naïve translation of the metagenomes and these were used for PSM. By using the matched metagenome as the protein sequence database for PSM, it was possible to obtain double as many identified peptides than if one used the unmatched metagenome. In the same study, the following search strategy to increase the number of identified peptides was proposed; i) to use a collection of intestinal bacterial genomes as a database, called a synthetic metagenome, ii) perform a BLAST search with the hits from this search against a large metagenome database, iii) construct from these BLAST hits a protein sequence database and finally iv) perform PSM against this database and apply a conserved FDR of 1%. The aim was to increase per protein the sequence coverage by identified peptides, not to identify a large variety of functions. This approach of using an iterative workflow for PSM targeted the problem of impairing peptide identification on the basis of established validation strategies when the search sequence space becomes extremely large. In a classical proteomics experiment the database size is usually not larger than 10,000 sequences. However, the collection of metagenomic sequences used here contained 3,300,000 sequences (Qin et al. 2010). The result of doubling the number of identified peptides compared to the matched metagenome demonstrated the usefulness of the approach that could be applied in cases where extensive protein coverage is needed. Recently, other strategies for decreasing the size of the protein sequence database to aim for more specificity have been proposed. In one of these approaches a first search was conducted in a complex protein sequence database and a second search in a database constructed from the unfiltered results from the first search. The peptide identifications from the second search were subjected to strict FDR filtering (Jagtap et al. 2013). In another study, in a first search all the UniprotKB entries were used and then in a second search only the genomes of those microbial strains for which initially a hit had been found (Tanca et al. 2015). A systemic comparison of the available possibilities to increase the sensitivity of PSM in metaproteomics has yet to come.

28

Results and Discussion

Especially at the beginning of this work, having a matched metagenome of each analysed sample was not feasible due to the high costs. Therefore, it was decided to construct a database based on publically available sequences which would cover microbial, human and food derived proteins. During the course of this thesis work, the first comprehensive metagenome study was published (Kurokawa et al., 2007) followed by a human intestinal gene catalogue containing 3.3 Mio genes (Qin et al. 2010); the data of both studies was included into the protein sequence database constructed in-house. In addition, the protein sequence database contained a collection of intestinal microbial genomes; their number was initially 298 (Study II) but increased to 594 as more genomes became available (Studies III & IV). The human genome was included as well and extended for identifying alternative proteins in Studies III & IV. Finally, seven plant genomes were included in the protein sequence database that could be beneficial for identifying proteins present in food. The initial database contained 5,132,694 protein sequences (Study II) and the updated database 6,153,068 (Studies III & IV). The extension of the protein sequence database plays a possible role in the increased identification ratios during the course of this thesis work (Table 8). The largest collection of genes derived from the intestinal microbiota covers now more than 10,000,000 genes which may increase the ratio of identified peptides out of the metaproteomes (Karlsson et al. 2014). However, as exemplified in a recent study where the effect was tested of the size of the protein sequence database, it appeared that the optimum way of using this large sequence space still has to be determined (Muth et al. 2015). The more unique proteins that are present in a protein sequence database, the higher is the probability that peptides present in a sample will be identified due to the nature of PSM that allows only 100% similar identifications. Conversely, the recent study by Muth and co-authors revealed that by using the present peptide validation strategy of FDR filtering, the sensitivity will suffer in the case of large databases, hampering utilizing its full potential.

Table 8. Overview of MS/MS spectra and peptide identifications across studies Study I * Study II** Study III ** Study IV** 50 min gradient 85 min gradient Step-wise gradient 120 min gradient 70min + 30min Number of MS/MS ~45,000 ~4000 ~20,000 ~40,000 spectra per sample Identification ratio 7.9 - 11.9% ~17.0% ~24.0% ~27.0% *Different approach than used in Studies II-IV, 20 gel pieces covering full molecular weight range, each analysed with a 50 min gradient ** AMP; see Table 6 in Material and Methods for differences in mass spectrometers used and other parameters between the studies

5.1.4 Functional annotation with COG, KEGG and eggNOG (I-IV)

Simply because of their extreme complexity and since the appropriate methods are still under development, most metagenomic sequences have not been well annotated. Hence, a strategy of annotating the identified proteins by orthologies was used. This orthology-based annotation facilitated the comparison between samples. Due to the problem of protein inference, which is particularly pronounced in metaproteomics, it is impossible to assign one protein solution for one peptide, the common unit in classical proteomics for making comparison between samples. In addition, default settings of search algorithms only provide a certain amount of proteins solutions for one peptide (e.g. 30 by OMSSA). However, as in the case of glutamate dehydrogenase, over 500 proteins from different bacteria share at least one peptide and often more peptides. Therefore, the initial search hits need to be remapped to the original protein sequence database with all options per tryptic peptide recorded, and then proteins should be grouped according to their peptide dependencies. In Study II proteins have been grouped based on having at least one peptide in common and one unique peptide. This is computationally demanding and as no ready applicable software solution is available for this task, it was only applied in Study II, with in-house scripts. Therefore, otherwise peptides

29

Results and Discussion were functionally characterized based on the proteins they belonged to based on orthology and then the spectra per orthological functional category summed for subsequent quantitative comparisons. Out of the three orthological databases i.e. COG, KEGG and eggNOG, used for assigning a function to a peptide with eggNOG (Study IV) the highest functional assignation rate was obtained. For more than 90% of the spectra the respective proteins had a high confidence hit in the eggNOG database when applying a UBlast search (e-value of 1E-30 or lower); reflecting the suitability of this classification database of orthologous genes for annotating the human intestinal metaproteomes. The other databases COG and KEGG, which were used in studies II and III, covered around 70% of the identified spectra. All these orthological classification systems are based on a large collection of genomes from various sources and therefore they span a very broad range of functions. The description of the mapped orthologies is not always readily interpretable in the context of the human intestinal microbiota and may be viewed as rather abstract. Therefore, GMMs have been developed (Le Chatelier et al. 2013) that organize KOs into intestinal specific functional groups at three levels, such as mucin degradation or the degradation of carbohydrates, like arabinose and xylose (Study IV). These are meaningful categories and assist in the interpretation of the results. Characterization of protein functions can also be done based on the protein families database (PFAM) (Finn et al. 2014). This domain based functional annotation was not applied in this thesis work but has been previously used in metagenomic studies (Ellrott et al. 2010) and might be useful also in metaproteomics for functional description of proteins and sub sequential quantitative analysis for functionally targeted results.

5.1.5 Reflections on the metaproteomic approach

Taking into account that possibly 1000 microbial phylotypes are present in one individual intestine and that each of these phylotypes has a gene capacity of at least 2000 proteins, several millions of proteins could be present in one faecal sample. The proteins identified here obviously only reflect a very small fraction of the total proteins present in the samples. There are a few critical points that affect the success of a metaproteomic analysis. Sample preparation and protein or peptide fractionation strategies cannot be complete. However, the reproducibility can be assessed; based on a validated approach comparisons between samples can be made. In this work, the proteins were fractionated by molecular size which was then utilized in their subsequent selection and recovery. This process may have an effect on the identified proteins as only a subset of the proteins was analysed. Moreover, the recovery of the trypsin-digested peptides from the gel may be biased and affected by the amino acid sequences. In general, there exists no absolute bias-free proteomic technique, also not for metaproteomics and there is still potential for identifying more functions. To be kept in mind is that both the functional and taxonomic comparisons are based on identified spectra. This might introduce a further bias as it is of course not known which functions the not-identified proteins may have. They can be similar to those identified but they may also be different. Furthermore, a number of other parameters may introduce a bias, such as differences between search engines and their capacity to differentiate between similar peptides: with respect to the search results OMSSA for example is more conservative, tending to underrepresent, whereas X!Tandem has a tendency to over-represent. In addition, the more information is demanded for a peptide such as function and taxonomy, the smaller will become the remaining fraction for comparison; not all peptides have a taxonomy assigned and of these not all have an assigned function and vice versa. Therefore, improvements still can be made at all the various steps. The protein coverage needs to be increased by an optimized sample preparation method and analytical platform. However, careful evaluation is required that an accurate comparison between a large amount of samples is still possible. In addition, a cost-benefit analysis for the database searches is required with regard to the choice of search parameters and database size, all of which define the compute time and time spent on interpretation of the results, and peptide identification sensitivity (Muth et al. 2015). Similarly, peptide and protein validation methods for metaproteomics are required. Because of all these possible improvements, the here described approach

30

Results and Discussion captures a fraction of the faecal proteins. However, it represents a feasible solution for comparing dozens of samples including the functional and taxonomic characterization of the identified peptides. Although there was a constant development of the metaproteomic pipeline throughout the project no major discrepancies were observed. This indicates that the developed approach provides a reliable access to the major proteins contained in faeces. In Study II, protein quantification was based on MS1 data, in all other studies quantification was performed by counting the identified spectra per comparative group e.g. specific phylum. Quantification based on MS1 peaks might be more robust for identifying differences in protein abundances between samples compared to spectral counting. However, applying the MS1 based approach in metaproteomics generates two challenges. The large protein diversity across metaproteomes poses a challenge to the aligning and normalizing algorithms of software for quantifying MS1 based data. Furthermore, the MS1 based approach relies on assigning peptides to proteins; some proteins like glutamate-dehydrogenase are conserved. It is very challenging to resolve several hundred proteins forming one protein cluster at the single protein level. Today’s increased computer power make it more realistic that the standard application of MS1 based comparison in metaproteomics will come in the near future.

5.2 Intestinal metaproteomic baseline of healthy individuals and its variation over time Having developed methods and strategies for faecal metaproteome analysis, the focus of this work was to describe the proteins found in healthy individuals and to create a healthy metaproteomic baseline. Health is a complex state defined by the WHO “as state of complete physical, mental and social well-being and not merely the absence of disease or infirmity”. In Study III, people were asked to grade their health (“good”, “average”, “poor”). Two individuals reported that they had average health but all others reported good health. In the other studies, the participants were not asked to classify their health. In Study III, in which samples of the largest healthy cohort studied for their metaproteome until now were analysed, 103 microbial KOs were found in each of the 16 individuals at least once. This can be considered as the functional core microbiome. The majority (62%) of these could be mapped on the KEGG metabolic pathways (Figure 5). These core metabolic functions, which were mainly involved in carbohydrate and amino acid, and energy metabolism, represent the backbone of the intestinal ecosystem. The coverage appears less compared to what was found in Study II. This can be explained by the fact that in Study II only three individuals were studied, whereas in Study III 16 individuals were studied. Similarly as observed for phylogenetic analysis and metagenomes, the larger the cohort is, the smaller the number of shared species or genes/functions might be (Jalanka-Tuovinen et al. 2011; Karlsson et al. 2014).

31

Results and Discussion

et al. 2008).et al.

Metabolic pathways representing metaproteomic the core basedon KO assignments as assessed in 16 adults (Study(Letunic III)

Figure 5.

32

Results and Discussion

5.2.1 Individuality of the faecal metaproteome (II and III)

One of the main findings of this work was that the intestinal metaproteome of a human being is highly individual. This extends previous findings on the personalized composition of the intestinal microbiota that has been reported extensively (Li et al. 2014). It was observed that the individuality of the metaproteomes remained both over the long term (one half to one year; based on MS1 level, Study II) and the short term (6 weeks with probiotic intervention; Study III). Moreover, the personalized functional microbiome could be distinguished at both the peptide and KO level. This is an important finding as theoretically the individuality at the peptide level could be explained by genetic differences leading to amino acid differences. However, the observation that it was also found at the level of KOs, an abstract functional level, implies that there is a functional significance of the individuality. The individual variation exists in the type and extent of the cellular- and extracellular activities taking place in the intestinal ecosystem. These findings clearly go beyond metagenomic findings. Classifying genes to high functional level categories like COG families has revealed a rather homogenous distribution across individuals (Turnbaugh et al. 2009; Human Microbiome Project Consortium, 2012). However, most metagenomic studies have functionally not delved very deep into functional richness due to its obvious complexity.

5.2.2 Main microbial functionalities (II and III)

The main functionalities and special functions of the human intestinal microbiota explored by metaproteomics in this work clearly represent specific features of the ecosystem. Many proteins being involved in energy conversion were identified; these ranged from degrading both the initial food substrates, like arabinose isomerase for plant material degradation, and host derived, like fucosyl-isomerase to degrade fucose, up to the formation of intermediates such as butyrate and propionate by enzymes like butyryl-CoA dehydrogenase and methylmalonly CoA mutase. Clearly, generating energy for the microbial metabolism especially from carbohydrate sources appeared to be a characteristic of the intestinal microbiota, confirming earlier (meta-)genome-based observations. The maintenance of a redox-equilibrium is a very important task in an anaerobic environment. Rubrerythin was detected in the metaproteomes analysed in Study II and this protein could be involved in reducing oxidative damage by reducing peroxide to water. In addition, glutamate dehydrogenase was identified from several hundred microbes (Studies II & III). This protein may act as electron sink (Study II). The fact that high amounts of glutamate dehydrogenase were found highlights its functional importance as determined in a computational approach in which the functions found in a human derived metagenome were compared to metagenomes of other ecosystems (Ellrott et al. 2010). Furthermore, this demonstrates the capability of this metaproteomic approach to capture functional information from several hundreds of different species. The synthesis of vitamins like folate and vitamin B12 could be identified since it was possible to detect enzymes like formyltetrahydrofolate synthetase and cobalamin biosynthesis protein, respectively, in the metaproteomes analysed in Studies II and III. This does not imply that these vitamins are necessarily available to the host. However, it indicates that these vitamins may be required by the intestinal ecosystem. Interestingly, patients with IBD had a higher prevalence of deficiencies of folate and vitamin B12 (Bermejo et al. 2013). It is not to be excluded that the microbiota has a role in this phenomenon. In general, vitamin B12 was identified earlier as important factor in the intestine as both the host and the majority of intestinal microbes are requiring this vitamin, recently reviewed (Degnan et al. 2014). In general, in Study III it was possible to identify temporal and individual differences in all of the described functions. These identified microbial functions are evidence for the various metabolic activities taking place in the intestine and the changes occurring upon daily variations. Those proteins might be used as marker for the integrity of the microbial community.

33

Results and Discussion

In Study III, study participants had consumed a milk-based fruit drink with or without L. rhamnosus GG. Although, immunological experiments had revealed an effect by the consumption of L. rhamnosus GG (Kekkonen et al. 2008a), no changes to the functionalities were observed. A possible explanation for this might be that there were more pronounced changes in the ileum which were diluted in faecal material and could not be detected with the applied approach.

5.3 Host-microbiota interactions (II-IV) The developed approach allowed identifying also human proteins. These constituted approximately 15% of the identified spectra. Identification of bacterial and host proteins in parallel resulted in the identification of various indications for host-microbiota interactions. Flagellins were detected in most of the intestinal metaproteomes (Studies I-IV). They are microbial proteins that make the cell motile. Moreover, flagellins are also known ligands of human toll-like receptor 5 and trigger the NFkB pathway (Vijay-Kumar et al. 2008). In Study III with the pro-inflammatory serine/threonine protein kinase 4 a human protein could be found significantly positively correlating with flagellins. This may be an example of a host-microbe interaction i.e. the flagellins might be triggering the expression of serine/threonine kinase 4. Flagellins as component of the intestinal metaproteome were also found in a recent case study identifying the faecal proteins of one lean and one obese adolescent (Ferrer et al. 2013). In Study II, Eubacterium rectale and Roseburia intestinalis were identified as the possible sources of the flagellins. New intestinal species possessing the ability to produce flagellins have been recently identified, such as Lactobacillus ruminis (Neville et al. 2012). Elevated levels of flagellin-antigens have been found in patients with Crohn’s disease and post-infective IBS (Schoepfer et al. 2008). However, since they were detected also in healthy individuals, the presence of flagellins does not necessarily need to lead to inflammation. On the contrary, these microbial proteins could also stimulate the host in a beneficial way (Cullender et al. 2013). Human intestinal alkaline phosphatase is another protein that is involved in the inflammatory response. This protein was identified in the human metaproteome of several subjects (Studies II-IV). It was found in high amounts in obese and was one of the NOGs explaining the separation of samples between the obese and the non-obese group. It contributed up to 50% of the identified spectra with human origin. However, also in non-obese individuals, the levels of this enzyme could be high. The human IAP can bind bacterial lipopolysaccharides (LPS) and may therefore be controlling LPS-induced inflammation. Intestinal alkaline phosphatase has been used as a therapeutic agent for several conditions such as sepsis and is a well-studied enzyme. Several factors have been identified that affect its expression levels and enzyme activities, e.g. the type of blood group or pH (Estaki et al. 2014). Intestinal alkaline phosphatase might represent a good marker for regulative host processes allowing the gut to cope with changing environmental conditions. The intestinal tract is an ecosystem rich in food derived proteins and host and microbial derived proteins. Regulation of protein degradation is demonstrated by the identification of several proteinases but also proteinase inhibitors. Several decades ago, tryptic activity was detected in faecal samples derived from infants and therefore proteinases and their inhibitors were proposed to have a role in balancing intestinal functions (Norin et al. 1985). As discussed in the literature summary of this thesis, proteinases are also supposed to have another role in addition to proteolysis and it may be via these secondary roles that they contribute to a certain disease phenotype. Taken together, a combination of these proteins might be used as markers to monitor the health state of the intestinal ecosystem i.e. they could be used as indicators of an intact or perturbed system. Bacterial proteins degrading host derived glycans have been identified in many subjects (Study III), such as fucose isomerase which degrades mammalian derived glycans. In addition, also the host enzyme trehalase was identified; this is an enzyme that degrades trehalose, a disaccharide that is formed in bacteria upon stress to dehydration and acts as a compatible solute. These findings further illustrate the mutualistic relationship between the host and its microbes.

34

Results and Discussion

One example of indirect host-microbe interaction was observed in Study III in which two individuals exhibited relatively lower levels of alpha amylase and high levels of Prevotella ssp. peptide counts in comparison to the other individuals. This finding could indicate that these two individuals have consumed a diet not dominated by starch but containing more complex carbohydrates enforcing the activity of Prevotella.

5.4 Comparing the metaproteome of lean and obese individuals (IV) Within this work, the microbial composition and metaproteome of a cohort of 29 non-obese and obese were analysed. In the samples deriving from the non-obese individuals, statistically significant higher levels of Bacteroidetes 16S rRNA were found compared to the obese individuals. However, relatively more Bacteroidetes peptides than 16S rRNA signals were obtained in the obese group compared to the non-obese group. It is unlikely that this is due to the greater faecal mass in obese individuals i.e. the relative share should not be affected as protein and DNA amounts should be affected similarly in that case. One explanation could be that there is an uncoupling between growth and activity, a well-known physiological phenomenon recognized in industrially-utilized bacteria such as Lactococcus lactis cultured at low pH conditions (Goel et al. 2015). It has been reported that the amount of SCFA were significantly higher in obese than in non-obese subjects (Fernandes et al. 2014), and as a result the colonic pH values of the obese subjects are predicted to be markedly lower. In addition, it was found that the growth rate of Bacteroides spp. was tightly controlled and reduced at low pH and by the amount of SCFA (Duncan et al 2009). Hence, although the Bacteroides spp. may simply be not growing to high abundance due to the low pH in the obese subjects, these bacteria may still have very high activities. Yet another explanation could be a technical bias due to the phylogenetic analysis approach used or an incomplete coverage of the protein sequence database. Among the 957 cultured bacterial intestinal species 47% belong to the phylum Firmicutes and 10% to Bacteroidetes (Rajilić-Stojanović and de Vos, 2014). Out of 398 strains at IMG/JGI for which there is an assigned origin in the digestive tract, 19% belong to the phylum Bacteroidetes (Figure 6). It is possible that the 16S rRNA sequences of the “obese” Bacteroidetes species are not fully covered by the oligonucleotide probes on the microarray. Alternatively, these “obese” Bacteroidetes species may be overrepresented in the protein databases used. The effect of ethnicity on the Bacteroidetes levels was revealed in a study on the microbiota and metagenome of obese compared to lean individuals (Turnbaugh et al. 2009). Lean individuals with European ancestry had 1.1 times more Bacteroidetes than their obese counterparts whereas the ratio was 1.3 for individuals of African ancestry. This might be due to the fact that the lean individuals of African ancestry had significantly more Bacteroidetes and less Firmicutes than the individuals of European ancestry. These findings exemplify the difficulties which may arise in comparing groups e.g. according their BMI values. The metaproteomes of the non-obese and obese group differed in a statistically significant way as shown by a principle coordinate analysis of the identified NOGs. This demonstrated that there are considerable differences in functionalities between the non-obese and obese group. Among the 25 NOGs which were found to explain the separation between samples from the obese and the non-obese group, based on baggedRDA analysis, several NOGs were involved in carbohydrate degradation. Furthermore, the insulin sensitivity of the subjects, as measured with the HOMA index, correlated negatively with a specific bacterial two component regulator. This is a further example of the link between microbiota and the host, and it may represent a possible starting point to clarify the reasons of the metabolic changes occurring in obesity.

35

Results and Discussion

Figure 6. Microbial phyla distribution of 1,847 microbial genomes at IMG/JGI with origin Homo sapiens (IMG/JGI HS), 398 of those are from the sub-system digestive tract (IMG/JGI HS DT), and of 1057 cultured intestinal phylotypes (1057). (EA = Euryarchaeota)

5.5 Phylogeny-specific metabolic activity – 16S rRNA genes versus peptides analysis (II-IV) The recent expansion of the protein databases as consequence of the efforts ongoing in sequencing microbial genomes has provided a critical amount of microbial proteins sequences originating from a diverse range of species which made referring a LCA to a peptide possible. Furthermore, phylogenetic microarray data and metaproteomic data were compared to obtain a better understanding of which microbes were present and what was their metabolic activity. When comparing the phylum distributions measured by the phylogenetic microarray to the spectral count data of phylum-specific peptides both agreeing and disagreeing results were obtained (Figure 7). Differences in the averages per phylum were observed between Studies II, III and IV. Biological differences in the study cohorts, the storage of the faecal material (Table 5; Bahl et al. 2012) as well as differences in the analytical steps (Table 6) might have contributed to the observed differences.

100%

80%

60% Verrucomicrobia Proteobacteria 40% Actinobacteria 20% Bacteroidetes Firmicutes 0% 16S pep 16S pep 16S pep 16S pep Non-obese Obese Study II Study III Study IV

Figure 7. Barplot of the distribution of the main bacterial phyla across three studies assessed by phylogenetic microarray (16S) and metaproteomics analyses (pep).

36

Results and Discussion

The main functionalities were determined for the three major phyla found in the human intestine, Firmicutes, Bacteroidetes, and Actinobacteria. Firmicutes appeared to be dominantly active in carbohydrate metabolism and butyrate production, and Actinobacteria in sugar metabolism. The proteins produced by Bacteroidetes showed a more even distribution of several functions. These findings were in agreement with previous theoretical knowledge. Both in Study II and III, a large fraction of the identified peptides could not be assigned to proteins with any known function. This implies that the functionalities of Bacteroidetes are underrepresented in terms of cultured and sequenced species and hence need to be analysed more closely. In Study III, peptides were filtered according their phylogenetic origin and the KOs of these specific peptides were further studied. Specific focus was on the functionalities of Faecalibacterium prausnitzii, Prevotella ssp. and Methanobrevibacter as these organisms vary greatly in their abundance and encode distinct functions within the gut ecosystem. For F. prausnitzii for example glucuronate-isomerase was identified implying F. prausnitzii’s involvement in carbohydrate metabolism. According to the KOs that were identified based on Prevotella peptides these microbes contributed to the propionate synthesis in the analysed metaproteomes as methylmalonyl-CoA mutase was identified. In addition, proteins involved in the methane production derived from Methanobrevibacter were identified. The bias which remains in finding a LCA for a peptide and drawing a conclusion on these findings is that obviously not all peptides can be assigned. In this work the rate of assigning a LCA to a peptide was around 70%. An absolute peptide matching is required for assigning a phylogenetic origin to a peptide. However, this is impaired by the fact that many protein sequences still remain to be identified. Therefore, the results have to be seen in the light of present knowledge both in the form of oligonucleotides represented on the microarray as well as protein sequence entries in UniprotKB. In Study II, where aligned and normalized MS1 data for proteins were available the proteins could be correlated with the microarray data. Correlations for example between flagellin and Roseburia intestinalis and Eubacterium rectale could be identified. These findings could be verified by a BLAST search of the flagellin sequences. This highlights the fact that the integration of microarray and metaproteomic data makes it possible to identify the organisms responsible for the production of a specific protein. In Study I, it was noted that one of the individuals exhibited a high number of Akkermansia reads and similarly a high number of Akkermansia derived peptides could be identified. In the other individual, hardly any Akkermansia peptides were obtained. Akkermansia is so far the only member of the deeply rooted phylum Verrocumicrobia found in human GIT and therefore a good example for comparing 16S rRNA and peptide data.

5.6 Reflections on study cohorts All the study participants but one (diabetes type II) were individuals without (chronic) disease. They had not taken any antibiotics at least three months prior to sampling and did not suffer from any gastro-intestinal diseases. Due to sample availability issues, some skewed age distribution was observed in the probiotic intervention trial (Study III). In the obesity study (Study IV) there was a significant age difference between the non-obese and the obese group. Generally, due to the small size of the cohorts, there was no even distribution between male and female study participants. With the exception of Study II which was a proof of principle study, all samples were derived from collaborations for which a number of meta-data was already available. Therefore, there was no influence on the availability of meta-data such as health-questionnaires or food diaries which were not collected. The absence of food diaries impaired controlling for the impact of diet that can be significant. For instance, without dietary information, it is difficult to account for the individual variation in some of the results obtained here, such as why flagellins are detected in some individuals and not in others. Similarly, in Study IV it remains unclear whether there is a per se difference in metabolism or if a difference in diet could explain the separation of the intestinal metaproteome from obese and non-obese subjects.

37

Results and Discussion

In Study III, study participants filled a symptom diary that indicated showing normal health perturbations such as loose stools or coughing. There is a learning process encountered when arranging this kind of study. Studies with human subjects depend very much on their willingness to participate and this may be lower in healthy subjects than in individuals with a disease who may hope that their participation will confer a health benefit. Since scientific interest in intestinal microbiota is still increasing, one can predict that larger studies also collecting faecal material or other digestive tract compartments will be conducted in the future.

38

Conclusions and Future Perspectives

6. CONCLUSIONS AND FUTURE PERSPECTIVES

Microbiology has largely moved from light microscopy to what can be defined as “molecular microscopy”. By determining the sequences of 16S rRNA genes or metagenomes, a much larger variety of microbial communities has been discovered than ever would have been expected by looking at a microscopic picture or analysing colonies on a plate. The continuous development of both proteomics and the human microbiome field is reflected in the analytical capabilities of the different mass spectrometers used in this work, the development of available algorithms and other tools, and the constant increase in the numbers of microbial genes identified from the human intestine. The immense increase in genomic information of the intestinal microbiota has greatly facilitated studies at the protein level. However, the tremendous growth of sequence information has also hampered the potential identifications because of the great number of variations. Hence, it underlined the need for the development of new data analysis solutions. Proteomic software, such as is available for de novo sequencing, may further be developed to identify peptides in complex mixtures without the need for gene knowledge. Moreover, almost each year of this thesis work, an updated and faster or better mass spectrometer was introduced. The metaproteomic approach devised here is a detailed description of one solution for assessing the proteins contained in faecal material; as discussed, several other approaches are feasible. With regard to the sample preparation, the development in metaproteomics has been parallel to that encountered in metagenomics. First, several different approaches have been proposed and now a systematic comparison is being undertaken (Salonen et al. 2010; Lozupone et al. 2013; Franzosa et al. 2014). An important point to keep in mind is that we are at the beginning of understanding the intestinal microbiome in its entirety by applying methods producing complex data-sets. These present studies highlighted the variety and complexity of the intestinal microbiome. After these initial studies, a refinement and homogenization of the analytical tools has started. In the field of metaproteomics, a systematic comparison of sample preparation methodologies has still to emerge and there are many important questions needing to be addressed. What is the effect of sampling? What is the effect of freezing and is there an effect of storage time? Which is better: using a bacterial pellet or utilizing crude faecal material? How should the lysis be performed: by mechanical disruption, chemical lysis, or a combination of these techniques? What is the best protein-fractionation method? In one study, the question of appropriate study material has now been addressed and the results indicate that by using a bacterial pellet one can recover more microbial proteins but with a bias in the recovered phyla (Tanca et al. 2015). It is to be expected that further methodological harmonization based on a comprehensive evaluation will accelerate the speed in acquiring new knowledge about the intestinal microbiota and its involvement in health and disease. Another approach that may be important in the future is the use of stable isotope probing. This approach can be coupled to dietary interventions with food components labelled with stable isotopes; this would make it possible to analyse trophic chains. Similarly, as it has been done for 16S rRNA analysis in an intestinal model and proteins in aquatic model systems, this technique could be applied to follow the human-microbial interactions in humans based by measuring the incorporation rates of these substrates into proteins. Two solutions for coping with protein inference that complicates quantitative comparisons between samples were applied. In Study II grouping proteins according their relatedness based on at least one shared peptide was used to resolve PSM results on the protein level. Another solution was devised by assigning peptides to functional groups and performing quantitative comparisons on that level. However, what is still needed is to define the functional-depth at which meaningful differences between treatment/health-state groups are found; if this is the individual protein deriving from a certain strain or species, a group of proteins sharing same peptides or proteins grouped to a specific function. Functional grouping of proteins as provided by the orthologous gene databases such as COG or eggNOG or the recently developed GMM are very useful in providing an overview of the functions taking place in a very complex ecosystem. One step further to the

39

Conclusions and Future Perspectives defining of GMMs, grouping KOs to intestinal relevant functional categories, would be environment-specific annotation of human intestinal metagenomes. This requires experimental proofing of the predicted functions in combination with network analyses to test whether the predicted functions fit into the theoretical model. There is an overall awareness of the gene richness encoding for a large variety of functions of our microbes; and their importance for a multitude of body functions. However, there is still much to be learned of our microbes and us, in a world with usually a surplus of every food item. The mechanisms of interactions between host and microbiota are poorly understood. In the case of energy metabolism, recent research has opened up the possibly involved molecules and even genes. Bile acids are depending on the microbiota differently modified and these regulate the expression of the fxr gene, responsible for regulating host bile acid production. We need to concentrate again more on microbial physiology. Future efforts are to be taken to get to know yet un-cultured microbes and learn about their functions. Another possibility, closer to the in vivo situation, is to fractionate the intestinal microbes into distinct subsets, e.g. according to surface structure, and then to analyse their proteins. There is a large unknown gap between the potential and real functionalities of the intestinal microbiota; only half of the sequenced genes in metagenomic studies can be mapped to a gene homologue with a known function. The numbers of proteins detected in one faecal metaproteomics experiment are several factors of magnitude less than the theoretical number that could be detected. Some methods for fractionating the metaproteome have been described above. A further option for better capturing the full variety of proteins would be to utilize peptide ligand libraries as suggested recently (Righetti et al. 2015). Peptide ligand libraries could help to identify the less abundant proteins. Due to the multitude of factors affecting human metabolism and the composition and functions of our microbes, careful metadata collection, as well as phylogenetic and functional measurements, are required if we are to make meaningful associations. Data generated in such a way can be used to make predictions based on profiling data and it can be applied for diagnosis of an individual’s health state or the decision what would be the most successful treatment. As shown in this work, we have a personalized metaproteome and there is temporal variation of its functions. By conducting longitudinal studies, well-controlled for possible influencing factors, we may be able to determine which microbial set-ups are associated with which and which effectors change the property and if so, to which extent. Although the main purpose of metaproteomics is not to identify which organisms are present in the microbial community but instead to study its function, in this work the method of assigning a phylogenetic origin to a peptide was found useful in the handling of such complex data. Sub-setting the identifications into host and bacterial proteins revealed component specific proteins. Phylogenetic analyses complemented the metaproteomic data in this work. Although both sequencing and proteomics techniques developed tremendously during the course of this work, biases from both sides remain, complicating the one-to-one comparison. Until these biases are further tackled, any comparisons have to be viewed as approximations. A major divergence was observed between phylogenetic microarray analysis and metaproteomics for members of the genus Bacteroides. Whereas on the 16S rRNA level Bacteroides spp. were apparently significantly less abundant in obese individuals, relatively more Bacteroides peptides were identified from the obese individuals. While the simple explanation is that the Bacteroides spp. are more active in the obese subjects, the question arises why they are then not simply outgrowing and become more abundant. Consequential studies may provide further evidence that there is an uncoupling between growth and activity. There is a perceived need for studying the proteins in the upper regions of the GIT in addition to the studies so far mostly focused on faecal material. One study used lavage to sample proteins along the intestine (Li et al. 2011). However, this study focused on extracellular proteins and no attempts were made to lyse the microbes that could have been present. Sampling in a way comparably to the one carried out in a rat study, in which the metaproteome of different intestinal compartments were analysed (Haange et al. 2012), would

40

Conclusions and Future Perspectives contribute to describe in humans the spatial functional differences along – from proximal to distal, and across the intestinal tract – from epithelium to lumen. We have learned by using techniques of modern molecular biology and keen interest in the ongoing activities inside of our body that we require our microbes and our microbes require us for keeping a healthy body condition. Careful study of the vast information on the intestinal microbiota and expanding it with targeted new studies in a team of clinicians, microbiologists, nutrition scientists and bioinformaticians, will sharpen our insights of the complex mechanisms ongoing inside us.

41

Acknowledgements

7. ACKNOWLEDGEMENTS

This thesis work was carried out at the Department of Veterinary Biosciences, University of Helsinki and the Finnish Centre of Excellence in Microbial Food Safety Research. It was part of the TEKES project “Innovations for Intestinal Health” implemented in the Faculty of Veterinary Medicine. This work was also supported by the Academy of Finland and the Doctoral Programme in Food Chain and Health. A special thank you to Dean Professor Antti Sukura and the Head of the Department Professor Airi Palva for providing state-of-the-art facilities; everything needed was available.

I want to close this work by answering one question I was asked several times, especially in the beginning of this work: Why Finland? The reason is simple - I got this project here which I felt fitted me, and which privileges me to thank all of you to whom I am grateful for the best imaginable support:

My team of three supervisors: You displayed trust in me by giving me this project. You also believed in me that I would finish it. Airi, I am grateful that I always could enjoy your support. Kiitos kovasti ja anna anteeksi että en puhu suomea hyvin! Willem, your ideas were a very important driver of the project and your enthusiasm helped through difficult times. Your positive challenging attitude allowed me to grow. Anne, you have had many roles in this project: supervisor, office companion, friend. You have supported me with help on a very broad scale. Thank you!

Koen Venema and Kirsi Savijoki are thanked for their critical comments on my thesis. You gave me the chance to think through some aspects one more time.

I want to thank Ewen MacDonald for his kind suggestions to improve the English of this thesis.

This project produced some external hard-drives full of data which could only be handled with the help of our bioinformatics J-forces. Janne and Jarkko – both of you always had a patient explanation of R and statistics, now thanks to your pedagogical skills I’m happily using basic R-commands routinely. Jarkko - you got me started with problem-solving of R-scripts and the knowledge that it is usually the way of reading the data which causes the problem not the functions. Jarkko and Jarmo - you have been handling my huge tables, solved several software issues and contributed substantially to the data analysis in the second half of the project. Thank you!

This project allowed me to collaborate with many people both abroad and in Finland whom I want to thank for their contributions, especially for the provision of such fine facilities (state of the art mass-spectrometers) and the analysis tools so essential in this work: Sjef - for substantial help in getting the project started, Koos and Peter - for the joint writing of the first manuscript of this thesis. Ilja - you not only conducted the LC- MS/MS measurement, you even provided a courier service relating to the work for the second article. Thank you also for the various discussions on technical aspects of LC-MS/MS.

I want to thank Jaana Mättö for making possible the collaboration with the Red Cross and for being a member of my PhD follow-up group.

After mostly travelling to get my samples into the LC-MSMS, I finally obtained a very local collaborator thanks to you Salla and Markku. Markku - thank you also very much for supporting me in your role as a follow-up group member and posing some new proteomics challenges.

42

Acknowledgements

Mark - thank you very much for giving the project a Bioinformatics boost; thank you also for the numerous (non-)proteomics chats.

I want to thank all the numerous co-authors not named individually. Thank you very much for your input.

I’m grateful for the travel financing that I received, both to communicate with people from the proteomics and microbiology community.

Many people of the Finnish proteomics community are thanked for basic practical support.

This thesis is here also due to the efforts of many people, efforts made before I even started this project: The supervisors of my Bachelor’s and Master’s thesis, especially Professor Hannelore Daniel and Professor Karl- Heinz Engel, guided my first scientific steps; your inspiration and knowledge transfer planted the seed that I should proceed to study for a PhD; the time at the NRC and people there were very important in giving me a foundation in proteomics allowing me to start this thesis work. Thank you!

All the peer PhD-students are thanked for sharing the burdens and joys of doing a PhD. Especially Lotta for always having a very good reason to come to Turku and making me familiar with Mölkky, Jing for all the chats about science and life in general and your enormous scientific enthusiasm, and Katri for your very positive spirit.

Outi - you have been an un-replaceable lab-assistance and you taught me how to give instructions in the right way. Thank you for the Mökki experience and socialising me in the first part of the project.

The Mikro people are thanked for the chats during lunch and coffee breaks as well as the day-to-day support.

I want to thank the FiDiPro steering group for giving me the opportunity for periodical reflection of my work.

This thesis required a lot of computer resources. Timo, vielen Dank für die stets prompte Hilfe in all den Computerschwierigkeiten und dafür, dass du mir eine Software nach der anderen installiert hast. Vielen Dank für die deutschen Sprüche!

Despite a language barrier once in a while there weren’t too many bureaucratic hurdles; Maija, Ritva and Tarja - thank you for the smooth handling of these matters and the language support. Laila, thank you for the help related to graduate school matters. Tiina - thank you very much for your kind help with guiding me through the dissertation process.

I want to thank the graduated doctors for proving to me that graduation is possible. Especially Tanja and Jonna are thanked for answering all my questions about how to get the thesis ready and the numerous simultaneous translations during all these years.

All the late-night and weekend-workers are thanked for giving some background noise during the more lonely times in the lab.

Katrin und Philipp mit Felix und Isabella: Aus unseren wöchentlichen Lauftreffs sind (Halb-)Jahres Familientreffen geworden. Danke für eure Freundschaft!

43

Acknowledgements

Gabi und Robert: Danke für eure Unterstützung aus der Ferne und dass wir uns in München immer willkommen fühlen.

Mama, Papa und Christoph: Der Ort meines Doktorvorhabens ist nicht der naheste zu meiner niederbayerischen Heimat. Trotzdem habt ihr mich in meinem Vorhaben stets unterstützt wie in all meinen Wissens-Bestrebungen. DANKE!

Unaussprechlicher Dank an Jochen und Iida: Mit euch bin ich da, wo ich sein mag und die, die ich sein mag.

Helsinki, April 2015

Carolin Kolmeder

44

8. REFERENCES

Ahlroos, T., Tynkkynen, S., 2009. Quantitative Bergström, A., Skov, T.H., Bahl, M.I., Roager, strain-specific detection of Lactobacillus H.M., Christensen, L.B., Ejlerskov, K.T., rhamnosus GG in human faecal samples by Mølgaard, C., Michaelsen, K.F., Licht, T.R., real-time PCR. J. Appl. Microbiol. 106, 506- 2014. Establishment of intestinal microbiota 514. during early life: a longitudinal, explorative study of a large cohort of Danish infants. Allmer, J., 2011. Algorithms for the de novo Appl. Environ. Microbiol. 80, 2889-2900. sequencing of peptides from tandem mass spectra. Expert. Rev. Proteomics 8, 645-657 Bermejo, F., Algaba, A., Guerra, I., Chaparro, M., De-La-Poza, G., Valer, P., Piqueras, B., Arsène-Ploetze, F., Bertin, P.N., Carapito, C., Bermejo, A., García-Alonso, J., Pérez, M.J., 2014. Proteomic tools to decipher microbial et al., 2013. Should we monitor vitamin B12 community structure and functioning. and folate levels in Crohn's disease patients? Environ. Sci. Pollut. Res. Int. Scand. J. Gastroenterol. 48, 1272-1277.

Altschul, S.F., Madden, T.L., Schäffer, A.A., Berry, D., Stecher, B., Schintlmeister, A., Zhang, J., Zhang, Z., Miller, W., Lipman, Reichert, J., Brugiroux, S., Wild, B., Wanek, D.J., 1997. Gapped BLAST and PSI- W., Richter, A., Rauch, I., Decker, T., et al., BLAST: a new generation of protein 2013. Host-compound foraging by intestinal database search programs. Nucleic Acids microbiota revealed by single-cell stable Res. 25, 3389-3402. isotope probing. Proc. Natl. Acad. Sci. U S A 110, 4720-4725. Arumugam, M., Raes, J., Pelletier, E., Le Paslier, D., Yamada, T., Mende, D.R., Fernandes, Biagi, E., Nylund, L., Candela, M., Ostan, R., G.R., Tap, J., Bruls, T., Batto, J.M., et al., Bucci, L., Pini, E., Nikkilä, J., Monti, D., 2011. Enterotypes of the human gut Satokari, R., Franceschi, C., et al., 2010. microbiome. Nature 473, 174-180. Through ageing, and beyond: gut microbiota and inflammatory status in seniors and Bahl, M.I., Bergström, A., Licht, T.R., 2012. centenarians. PLoS One 5, e10667. Freezing fecal samples prior to DNA extraction affects the Firmicutes to Booijink, C.C., El-Aidy, S., Rajilić-Stojanović, Bacteroidetes ratio determined by M., Heilig, H.G., Troost, F.J., Smidt, H., downstream quantitative PCR analysis. Kleerebezem, M., de Vos, W.M., Zoetendal, FEMS Microbiol. Lett. 329, 193-197. E.G., 2010. High temporal and inter- individual variation detected in the human Barbhuiya, M.A., Sahasrabuddhe, N.A., Pinto, ileal microbiota. Environ. Microbiol. 12, S.M., Muthusamy, B., Singh, T.D., 3213-3227. Nanjappa, V., Keerthikumar, S., Delanghe, B., Harsha, H., Chaerkady, R., et al., 2011. Bookstaver, P.B., Ahmed, Y., Millisor, V.E., Comprehensive proteomic analysis of human Siddiqui, W., Albrecht, H., 2013. bile. Proteomics 11, 4443-4453. Clostridium difficile: case report and concise review of fecal microbiota transplantation. J. Ben-Amor, K., Heilig, H., Smidt, H., Vaughan, S. C. Med. Assoc. 109, 62-66. E.E., Abee, T., de Vos, W.M., 2005. Genetic diversity of viable, injured, and dead fecal Breiman, L., 2001. Random forests. Machine bacteria assessed by fluorescence-activated Learning 45, 5-32. cell sorting and 16S rRNA gene analysis. Appl. Environ. Microbiol. 71, 4679-4689. Cain, J.A., Solis, N., Cordwell, S.J., 2014. Beyond gene expression: The impact of protein post- Benndorf, D., Balcke, G.U., Harms, H., Von translational modifications in bacteria. J. Bergen, M., 2007. Functional metaproteome Proteomics 97, 265-286. analysis of protein extracts from contaminated soil and groundwater. ISME J. 1, 224-234.

Cantarel, B.L., Erickson, A.R., Verberkmoes, de Vos, W.M., de Vos, E.A.J., 2012. Role of the N.C., Erickson, B.K., Carey, P.A., Pan, C., intestinal microbiome in health and disease: Shah, M., Mongodin, E.F., Jansson, J.K., from correlation to causation. Nutr. Rev. 70, Fraser-Liggett, C.M., et al., 2011. Strategies S45-S56. for metagenomic-guided whole-community proteomics of complex microbial Degnan, P.H., Taga, M.E., Goodman, A.L., 2014. environments. PLoS One 6, e27173. Vitamin B 12 as a Modulator of Gut Microbial Ecology. Cell Metab. 20, 769-778. Cohen, D.P., Renes, J., Bouwman, F.G., Zoetendal, E.G., Mariman, E., de Vos, W.M., Derrien, M., Vaughan, E.E., Plugge, C.M., de Vaughan, E.E., 2006. Proteomic analysis of Vos, W.M., 2004. Akkermansia muciniphila log to stationary growth phase Lactobacillus gen. nov., sp. nov., a human intestinal mucin- plantarum cells and a 2-DE database. degrading bacterium. Int. J. Syst. Evol. Proteomics 6, 6485-6493. Microbiol. 54, 1469-1476.

Craig, R., Beavis, R.C., 2004. TANDEM: Dewulf, E.M., Cani, P.D., Claus, S.P., Fuentes, S., matching proteins with tandem mass spectra. Puylaert, P.G., Neyrinck, A.M., Bindels, Bioinformatics 20, 1466-1467. L.B., de Vos, W.M., Gibson, G.R., Thissen, J.P., et al., 2013. Insight into the prebiotic Cullender, T.C., Chassaing, B., Janzon, A., concept: lessons from an exploratory, double Kumar, K., Muller, C.E., Werner, J.J., blind intervention study with inulin-type Angenent, L.T., Bell, M.E., Hay, A.G., fructans in obese women. Gut 62, 1112- Peterson, D.A., et al., 2013. Innate and 1121. adaptive immunity interact to quench microbiome flagellar motility in the gut. Cell Dimidi, E., Christodoulides, S., Fragkos, K.C., Host Microbe 14, 571-581. Scott, S.M., Whelan, K., 2014. The effect of probiotics on functional constipation in Cummings, J. H., 1975. The colon: absorptive, adults: a systematic review and meta-analysis secretory and metabolic functions. Digestion of randomized controlled trials. Am. J. Clin. 13, 232-240. Nutr. 100, 1075-1084.

Cummings, J.H., Macfarlane, G.T., 1991. The dos Santos, F.B., de Vos, W.M., Teusink, B., control and consequences of bacterial 2013. Towards metagenome-scale models for fermentation in the human colon. J. Appl. industrial applications—the case of Lactic Bacteriol. 70, 443-459. Acid Bacteria. Curr. Opin. Biotechnol. 24, 200-206. Cummings, J.H., Wiggins, H.S., Jenkins, D.J., Houston, H., Jivraj, T., Drasar, B.S., Hill, dos Santos, V.M., Müller, M., De Vos, W.M., M.J., 1978. Influence of diets high and low 2010. Systems biology of the gut: the in animal fat on bowel habit, gastrointestinal interplay of food, microbiota and host at the transit time, fecal microflora, bile acid, and mucosal interface. Curr. Opin. Biotechnol. fat excretion. J. Clin. Invest. 61, 953-963. 21, 539-550.

Daniel, H., Gholami, A.M., Berry, D., Duncan, S.H., Louis, P., Thomson, J.M., Flint, Desmarchelier, C., Hahne, H., Loh, G., H.J., 2009. The role of pH in determining the Mondot, S., Lepage, P., Rothballer, M., species composition of the human colonic Walker, A., et al., 2014. High-fat diet alters microbiota. Environ. Microbiol. 11, 2112- gut microbiota physiology in mice. ISME J. 2122. 8, 295-308. Edgar, R.C., 2010. Search and clustering orders of David, L.A., Maurice, C.F., Carmody, R.N., magnitude faster than BLAST. Gootenberg, D.B., Button, J.E., Wolfe, B.E., Bioinformatics 26, 2460-2461. Ling, A.V., Devlin, A.S., Varma, Y., Fischbach, M.A., et al., 2014. Diet rapidly and reproducibly alters the human gut microbiome. Nature 505, 559-563.

Ellrott, K., Jaroszewski, L., Li, W., Wooley, J.C., Finn, R.D., Bateman, A., Clements, J., Coggill, P., Godzik, A., 2010. Expansion of the protein Eberhardt, R.Y., Eddy, S.R., Heger, A., repertoire in newly explored environments: Hetherington, K., Holm, L., Mistry, J., et al., human gut microbiome specific protein 2014. Pfam: the protein families database. families. PLoS Comput. Biol. 6, e1000798. Nucleic Acids Res. 42, D222-30.

Erickson, A.R., Cantarel, B.L., Lamendella, R., Finucane, M.M., Sharpton, T.J., Laurent, T.J., Darzi, Y., Mongodin, E.F., Pan, C., Shah, Pollard, K.S., 2014. A taxonomic signature M., Halfvarson, J., Tysk, C., Henrissat, B., et of obesity in the microbiome? Getting to the al., 2012. Integrated metagenomics/ guts of the matter. PloS One 9, e84689. metaproteomics reveals human host- microbiota signatures of Crohn's disease. Foell, D., Wittkowski, H., Roth, J., 2009. PLoS One 7, e49138. Monitoring disease activity by stool analyses: from occult blood to molecular Estaki, M., DeCoffe, D., Gibson, D.L., 2014. markers of intestinal inflammation and Interplay between intestinal alkaline damage. Gut 58, 859-868. phosphatase, diet, gut microbes and immunity. World J. Gastroenterol. 20, Franzosa, E.A., Morgan, X.C., Segata, N., 15650-15656. Waldron, L., Reyes, J., Earl, A.M., Giannoukos, G., Boylan, M.R., Ciulla, D., Everard, A., Cani, P.D., 2013. Diabetes, obesity Gevers, D., et al., 2014. Relating the and gut microbiota. Best. Pract. Res. Clin. metatranscriptome and metagenome of the Gastroenterol. 27, 73-83. human gut. Proc. Natl. Acad. Sci. U.S.A. 111, E2329-38. Faraway, J.J., 2002. Practical Regression and ANOVA using R. Published on the URL: Gaci, N., Borrel, G., Tottey, W., O’Toole, P.W., http://www.cran.rproject.org/doc/contrib/Far Brugère, J., 2014. Archaea and the human away-PRA.pdf. gut: New beginning of an old story. World J. Gastroenterol: WJG 20, 16062. Faust, K., Sathirapongsasuti, J.F., Izard, J., Segata, N., Gevers, D., Raes, J., Galperin, M.Y., Makarova, K.S., Wolf, Y.I., Huttenhower, C., 2012. Microbial co- Koonin, E.V., 2015. Expanded microbial occurrence relationships in the human genome coverage and improved protein microbiome. PLoS Computat. Biol. 8, family annotation in the COG database. e1002606. Nucleic Acids Res. 43, D261-9.

Feig, M.A., Hammer, E., Völker, U., Jehmlich, Gao, X., Pujos-Guillot, E., Martin, J., Galan, P., N., 2013. In-depth proteomic analysis of the Juste, C., Jia, W., Sebedio, J., 2009. human cerumen—a potential novel Metabolite analysis of human fecal water by diagnostically relevant biofluid. J. gas chromatography/mass spectrometry with Proteomics 83, 119-129. ethyl chloroformate derivatization. Anal. Biochem. 393, 163-175. Fernandes, J., Su, W., Rahat-Rozenbloom, S., Wolever, T., Comelli, E., 2014. Adiposity, Geer, L.Y., Markey, S.P., Kowalak, J.A., Wagner, gut microbiota and faecal short chain fatty L., Xu, M., Maynard, D.M., Yang, X., Shi, acids are linked in adult humans. Nutr W., Bryant, S.H., 2004. Open mass Diabetes 4, e121. spectrometry search algorithm. J. Proteome Res 3, 958-964. Ferrer, M., Ruiz, A., Lanza, F., Haange, S., Oberbach, A., Till, H., Bargiela, R., Campoy, Georges, A.A., El-Swais, H., Craig, S.E., Li, C., Segura, M.T., Richter, M., et al., 2013. W.K., Walsh, D.A., 2014. Metaproteomic Microbiota from the distal guts of lean and analysis of a winter to spring succession in obese adolescents exhibit partial functional coastal northwest Atlantic Ocean microbial redundancy besides clear differences in plankton. ISME J. 8, 1301-1313. community structure. Environ. Microbiol. 15, 211-226.

Gerritsen, J., Smidt, H., Rijkers, G.T., de Vos, Hansen, S.H., Stensballe, A., Nielsen, P.H., W.M., 2011. Intestinal microbiota in human Herbst, F., 2014. Metaproteomics: health and disease: the impact of probiotics. Evaluation of protein extraction from Genes Nutr. 6, 209-240. activated sludge. Proteomics 14, 2535-2539.

Gill, S.R., Pop, M., Deboy, R.T., Eckburg, P.B., Harden, C.J., Perez-Carrion, K., Babakordi, Z., Turnbaugh, P.J., Samuel, B.S., Gordon, J.I., Plummer, S.F., Hepburn, N., Barker, M.E., Relman, D.A., Fraser-Liggett, C.M., Nelson, Wright, P.C., Evans, C.A., Corfe, B.M., K.E., 2006. Metagenomic analysis of the 2012. Evaluation of the salivary proteome as human distal gut microbiome. Science 312, a surrogate tissue for systems biology 1355-1359. approaches to understanding appetite. J. Proteomics 75, 2916-2923. Glatter, T., Ludwig, C., Ahrné, E., Aebersold, R., Heck, A.J., Schmidt, A., 2012. Large-scale Hercberg, S., Galan, P., Preziosi, P., Bertrais, S., quantitative assessment of different in- Mennen, L., Malvy, D., Roussel, A., Favier, solution protein digestion protocols reveals A., Briançon, S., 2004. The SU. VI. MAX superior cleavage efficiency of tandem Lys- Study: a randomized, placebo-controlled trial C/trypsin proteolysis over trypsin digestion. of the health effects of antioxidant vitamins J. Proteome Res. 11, 5145-5156. and minerals. Arch. Intern. Med. 164, 2335- 2342. Goel, A., Eckhardt, T.H., Puri, P., Jong, A., Santos, F.B., Giera, M., Fusetti, F., de Vos, Huang, J., Wang, F., Ye, M., Zou, H., 2014. W.M., Kok, J., Poolman, B., et al., 2015. Enrichment and separation techniques for Protein costs do not explain evolution of large-scale proteomics analysis of the protein metabolic strategies and regulation of post-translational modifications. J. ribosomal content. Mol. Microbiol. Chromatogr. A. 1372, 1-17.

Graf, D., Di Cagno, R., Fåk, F., Flint, H.J., Human Microbiome Jumpstart Reference Strains Nyman, M., Saarela, M., Watzl, B., 2015. Consortium, Nelson, K.E., Weinstock, G.M., Contribution of diet to the composition of the Highlander, S.K., Worley, K.C., Creasy, human gut microbiota. Microb. Ecol. Health H.H., Wortman, J.R., Rusch, D.B., Mitreva, Dis. 26. M., Sodergren, E., Chinwalla, A.T., Feldgarden, M., Gevers, D., Haas, B.J., Graham, C., McMullan, G., Graham, R.L.J., 2011. Madupu, R., Ward, D.V., Birren, B.W., Proteomics in the microbial sciences. Gibbs, R.A., Methe, B., Petrosino, J.F., Bioeng. Bugs, 17-30. Strausberg, R.L., Sutton, G.G., White, O.R., Wilson, R.K., Durkin, S., Giglio, M.G., Guthals, A., Bandeira, N., 2012. Peptide Gujja, S., Howarth, C., Kodira, C.D., identification by tandem mass spectrometry Kyrpides, N., Mehta, T., Muzny, D.M., with alternate fragmentation modes. Mol. Pearson, M., Pepin, K., Pati, A., Qin, X., Cell. Proteomics 11, 550-557. Yandava, C., Zeng, Q., Zhang, L., Berlin, A.M., Chen, L., Hepburn, T.A., Johnson, J., Haange, S., Oberbach, A., Schlichting, N., McCorrison, J., Miller, J., Minx, P., Hugenholtz, F., Smidt, H., von Bergen, M., Nusbaum, C., Russ, C., Sykes, S.M., Till, H., Seifert, J., 2012. Metaproteome Tomlinson, C.M., Young, S., Warren, W.C., analysis and molecular genetics of rat Badger, J., Crabtree, J., Markowitz, V.M., intestinal microbiota reveals section and Orvis, J., Cree, A., Ferriera, S., Fulton, L.L., localization resolved species distribution and Fulton, R.S., Gillis, M., Hemphill, L.D., enzymatic functionalities. J. Proteome Res. Joshi, V., Kovar, C., Torralba, M., 11, 5406-5417. Wetterstrand, K.A., Abouellleil, A., Wollam, A.M., Buhay, C.J., Ding, Y., Dugan, S., Hampton-Marcell, J.T., Moormann, S.M., Owens, FitzGerald, M.G., Holder, M., Hostetler, J., S.M., Gilbert, J.A., 2013. Preparation and Clifton, S.W., Allen-Vercoe, E., Earl, A.M., metatranscriptomic analyses of host-microbe Farmer, C.N., Liolios, K., Surette, M.G., Xu, systems. Methods Enzymol. 531, 169-185. Q., Pohl, C., Wilczek-Boney, K., Zhu, D., 2010. A catalog of reference genomes from

the human microbiome. Science 328, 994- Jewett, M.C., Miller, M.L., Chen, Y., Swartz, J.R., 999. 2009. Continued protein synthesis at low [ATP] and [GTP] enables cell adaptation Human Microbiome Project Consortium, 2012. during energy limitation. J. Bacteriol. 191, Structure, function and diversity of the 1083-1091. healthy human microbiome. Nature 486, 207-214. Jiménez, E., Fernández, L., Marín, M.L., Martín, R., Odriozola, J.M., Nueno-Palop, C., Huson, D.H., Weber, N., 2013. Microbial Narbad, A., Olivares, M., Xaus, J., community analysis using MEGAN. Rodríguez, J.M., 2005. Isolation of Methods Enzymol. 531, 465-485. commensal bacteria from umbilical cord blood of healthy neonates born by cesarean Integrative HMP (iHMP) Research Network section. Curr. Microbiol. 51, 270-274. Consortium, 2014. The Integrative Human Microbiome Project: dynamic analysis of Jiménez-Girón, A., Ibáñez, C., Cifuentes, A., microbiome-host omics profiles during Simó, C., Muñoz-González, I., Martín- periods of human health and disease. Cell Álvarez, P.J., Bartolome, B., Moreno Host Microbe 16, 276-289. Arribas, M.V., 2014. Faecal metabolomic fingerprint after moderate consumption of Jagtap, P., Goslinga, J., Kooren, J.A., McGowan, red wine by healthy subjects. J. Proteome T., Wroblewski, M.S., Seymour, S.L., Res. 14, 897-905. Griffin, T.J., 2013. A two‐step database search method improves sensitivity in Joslin, E.P., 1901. The Influence of Bile on peptide sequence matches for Metabolism. J. Exp. Med. 5, 513-525. metaproteomics and proteogenomics studies. Proteomics 13, 1352-1357. Jost, T., Lacroix, C., Braegger, C., Chassard, C., 2014. Stability of the maternal gut Jalanka-Tuovinen, J., Salojärvi, J., Salonen, A., microbiota during late pregnancy and early Immonen, O., Garsed, K., Kelly, F.M., lactation. Curr. Microbiol. 68, 419-427. Zaitoun, A., Palva, A., Spiller, R.C., de Vos, W.M., 2014. Faecal microbiota composition Jumpertz, R., Le, D.S., Turnbaugh, P.J., Trinidad, and host-microbe cross-talk following C., Bogardus, C., Gordon, J.I., Krakoff, J., gastroenteritis and in postinfectious irritable 2011. Energy-balance studies reveal bowel syndrome. Gut 63, 1737-1745. associations between gut microbes, caloric load, and nutrient absorption in humans. Am. Jalanka-Tuovinen, J., Salonen, A., Nikkilä, J., J. Clin. Nutr. 94, 58-65. Immonen, O., Kekkonen, R., Lahti, L., Palva, A., de Vos, W.M., 2011. Intestinal Juste, C., Kreil, D.P., Beauvallet, C., Guillot, A., microbiota in healthy adults: temporal Vaca, S., Carapito, C., Mondot, S., Sykacek, analysis reveals individual and common core P., Sokol, H., Blon, F., et al., 2014. Bacterial and relation to intestinal symptoms. PLoS protein signals are associated with Crohn's One 6, e23035. disease. Gut 63, 1566-1577.

Jarocki, V.M., Tacchi, J.L., Djordjevic, S.P., Käll, L., Storey, J.D., Noble, W.S., 2008. Non- 2014. Non‐proteolytic functions of microbial parametric estimation of posterior error proteases increases pathological complexity. probabilities associated with peptides Proteomics 15, 1075-1088. identified by tandem mass spectrometry. Bioinformatics 24, i42-8. Jensen, L.J., Julien, P., Kuhn, M., von Mering, C., Muller, J., Doerks, T., Bork, P., 2008. Kam, S.Y., Hennessy, T., Chua, S.C., Gan, C.S., eggNOG: automated construction and Philp, R., Hon, K.K., Lai, L., Chan, W.H., annotation of orthologous groups of genes. Ong, H.S., Wong, W.K., et al., 2011. Nucleic Acids Res. 36, D250-4. Characterization of the human gastric fluid proteome reveals distinct pH-dependent Jeong, K., Kim, S., Bandeira, N., 2012. False protein profiles: implications for biomarker discovery rates in spectral identification. studies. J. Proteome Res. 10, 4535-4546. BMC Bioinformatics 13 Suppl 16, S2.

Kanehisa, M., Goto, S., Kawashima, S., Okuno, infantis reveals the metabolic insight on Y., Hattori, M., 2004. The KEGG resource consumption of prebiotics and host glycans. for deciphering the genome. Nucleic Acids PloS One 8, e57535. Res. 32, D277-80. Klaassens, E.S., de Vos, W.M., Vaughan, E.E., Kanehisa, M., Goto, S., Sato, Y., Furumichi, M., 2007. Metaproteomics approach to study the Tanabe, M., 2012. KEGG for integration and functionality of the microbiota in the human interpretation of large-scale molecular data infant gastrointestinal tract. Appl. Environ. sets. Nucleic Acids Res. 40, D109-D114. Microbiol. 73, 1388-1392.

Karlsson, F.H., Tremaroli, V., Nookaew, I., Koonin, E.V., Fedorova, N.D., Jackson, J.D., Bergström, G., Behre, C.J., Fagerberg, B., Jacobs, A.R., Krylov, D.M., Makarova, K.S., Nielsen, J., Bäckhed, F., 2013. Gut Mazumder, R., Mekhedov, S.L., Nikolskaya, metagenome in European women with A.N., Rao, B.S., et al., 2004. A normal, impaired and diabetic glucose comprehensive evolutionary classification of control. Nature 498, 99-103. proteins encoded in complete eukaryotic genomes. Genome Biol. 5, R7. Karlsson, F.H., Nookaew, I., Nielsen, J., 2014. Metagenomic Data Utilization and Analysis Koponen, J., Laakso, K., Koskenniemi, K., (MEDUSA) and Construction of a Global Kankainen, M., Savijoki, K., Nyman, T.A., Gut Microbial Gene Catalogue. PLoS de Vos, W.M., Tynkkynen, S., Kalkkinen, Comput. Biol. 10, e1003706. N., Varmanen, P., 2012. Effect of acid stress on protein expression and phosphorylation in Kassinen, A., Krogius-Kurikka, L., Mäkivuokko, Lactobacillus rhamnosus GG. J. Proteomics H., Rinttila, T., Paulin, L., Corander, J., 75, 1357-1374. Malinen, E., Apajalahti, J., Palva, A., 2007. The fecal microbiota of irritable bowel Korpela, K., Flint, H.J., Johnstone, A.M., Lappi, syndrome patients differs significantly from J., Poutanen, K., Dewulf, E., Delzenne, N., that of healthy subjects. Gastroenterology de Vos, W.M., Salonen, A., 2014. Gut 133, 24-33. microbiota signatures predict host and microbiota responses to dietary interventions Keiblinger, K.M., Wilhartitz, I.C., Schneider, T., in obese individuals. PloS One 9, e90702. Roschitzki, B., Schmid, E., Eberl, L., Riedel, K., Zechmeister-Boltenstern, S., 2012. Soil Kort, R., Caspers, M., van de Graaf, A., van metaproteomics - Comparative evaluation of Egmond, W., Keijser, B., Roeselers, G., protein extraction protocols. Soil Biol. 2014. Shaping the oral microbiota through Biochem. 54, 14-24. intimate kissing. Microbiome 2, 41.

Kekkonen, R.A., Lummela, N., Karjalainen, H., Kortman, G.A., Raffatellu, M., Swinkels, D.W., Latvala, S., Tynkkynen, S., Jarvenpaa, S., Tjalsma, H., 2014. Nutritional iron turned Kautiainen, H., Julkunen, I., Vapaatalo, H., inside out: intestinal stress from a gut Korpela, R., 2008a. Probiotic intervention microbial perspective. FEMS Microbiol. has strain-specific anti-inflammatory effects Rev. 38, 1202-1234. in healthy adults. World J. Gastroenterol. 14, 2029-2036. Koskenniemi, K., Koponen, J., Kankainen, M., Savijoki, K., Tynkkynen, S., de Vos, W.M., Kekkonen, R.A., Sysi-Aho, M., Seppanen-Laakso, Kalkkinen, N., Varmanen, P., 2009. T., Julkunen, I., Vapaatalo, H., Oresic, M., Proteome analysis of Lactobacillus Korpela, R., 2008b. Effect of probiotic rhamnosus GG using 2-D DIGE and mass Lactobacillus rhamnosus GG intervention on spectrometry shows differential protein global serum lipidomic profiles in healthy production in laboratory and industrial-type adults. World J. Gastroenterol. 14, 3188- growth media. J. Proteome Res. 8, 4993- 3194. 5007.

Kim, J., An, H.J., Garrido, D., German, J.B., Lebrilla, C.B., Mills, D.A., 2013. Proteomic analysis of Bifidobacterium longum subsp.

Koskenniemi, K., Laakso, K., Koponen, J., Lawley, T.D., Walker, A.W., 2013. Intestinal Kankainen, M., Greco, D., Auvinen, P., colonization resistance. Immunology 138, 1- Savijoki, K., Nyman, T.A., Surakka, A., 11. Salusjarvi, T., de Vos, W.M., Tynkkynen, S., Kalkkinen, N., Varmanen, P., 2011. Le Chatelier, E., Nielsen, T., Qin, J., Prifti, E., Proteomics and transcriptomics Hildebrand, F., Falony, G., Almeida, M., characterization of bile stress response in Arumugam, M., Batto, J., Kennedy, S., et al., probiotic Lactobacillus rhamnosus GG. Mol. 2013. Richness of human gut microbiome Cell. Proteomics 10, M110.002741. correlates with metabolic markers. Nature 500, 541-546. Kovatcheva‐Datchary, P., Egert, M., Maathuis, A., Rajilić‐Stojanović, M., De Graaf, A.A., LeBlanc, J.G., Milani, C., de Giori, G.S., Sesma, Smidt, H., De Vos, W.M., Venema, K., F., van Sinderen, D., Ventura, M., 2013. 2009. Linking phylogenetic identities of Bacteria as vitamin suppliers to their host: a bacteria to starch fermentation in an in vitro gut microbiota perspective. Curr. Opin. model of the large intestine by RNA‐based Biotechnol. 24, 160-168. stable isotope probing. Environ. Microbiol. 11, 914-926. Lee, J.H., Karamychev, V.N., Kozyavkin, S.A., Mills, D., Pavlov, A.R., Pavlova, N.V., Kurokawa, K., Itoh, T., Kuwahara, T., Oshima, Polouchine, N.N., Richardson, P.M., K., Toh, H., Toyoda, A., Takami, H., Morita, Shakhova, V.V., Slesarev, A.I., et al., 2008. H., Sharma, V.K., Srivastava, T.P., et al., Comparative genomic analysis of the gut 2007. Comparative metagenomics revealed bacterium Bifidobacterium longum reveals commonly enriched gene sets in human gut loci susceptible to deletion during pure microbiomes. DNA Res. 14, 169-181. culture growth. BMC Genomics 9, 247.

Lahti, L., Salonen, A., Kekkonen, R.A., Salojärvi, Le Maréchal, C., Peton, V., Plé, C., Vroland, C., J., Jalanka-Tuovinen, J., Palva, A., Orešič, Jardin, J., Briard-Bion, V., Durant, G., M., de Vos, W.M., 2013a. Associations Chuat, V., Loux, V., Foligné, B., et al., 2015. between the human intestinal microbiota, Surface proteins of Propionibacterium Lactobacillus rhamnosus GG and serum freudenreichii are involved in its anti- lipids indicated by integrated analysis of inflammatory properties. Journal of high-throughput profiling data. PeerJ 1, e32. proteomics 113, 447-461.

Lahti, L., Torrente, A., Elo, L.L., Brazma, A., Letunic, I., Yamada, T., Kanehisa, M., Bork, P., Rung, J., 2013b. A fully scalable online pre- 2008. iPath: interactive exploration of processing algorithm for short biochemical pathways and networks. Trends oligonucleotide microarray atlases. Nucleic Biochem. Sci. 33, 101-103. Acids Res. 41, e110. Levandowsky, M., Winter, D., 1971. Distance Lallès, J., 2014. Intestinal alkaline phosphatase: between sets. Nature 234, 34-35. novel functions and protective effects. Nutr. Rev. 72, 82-94. Li, J., Jia, H., Cai, X., Zhong, H., Feng, Q., Sunagawa, S., Arumugam, M., Kultima, J.R., Lam, H., Deutsch, E.W., Eddes, J.S., Eng, J.K., Prifti, E., Nielsen, T., et al., 2014. An Stein, S.E., Aebersold, R., 2008. Building integrated catalog of reference genes in the consensus spectral libraries for peptide human gut microbiome. Nat. Biotechnol. 32, identification in proteomics. Nat. Methods 5, 834-841. 873-875. Li, T., Chiang, J.Y., 2015. Bile acids as metabolic Land, M., Hauser, L., Jun, S., Nookaew, I., Leuze, regulators. Curr. Opin. Gastroenterol. 31, M.R., Ahn, T., Karpinets, T., Lund, O., Kora, 159-165. G., Wassenaar, T., et al., 2015. Insights from 20 years of bacterial genome sequencing. Funct. Integr. Genomics, 1-21.

Li, X., LeBlanc, J., Truong, A., Vuthoori, R., Lactobacillus gut-adaptive responses in Chen, S.S., Lustgarten, J.L., Roth, B., Allard, humans and mice. ISME J. 4, 1481-1484. J., Ippoliti, A., Presley, L.L., et al., 2011. A metaproteomic approach to study human- Martens, E.C., Kelly, A.G., Tauzin, A.S., Brumer, microbial ecosystems at the mucosal luminal H., 2014. The devil lies in the details: how interface. PLoS One 6, e26542. variations in polysaccharide fine-structure impact the physiology and evolution of gut Lichtman, J.S., Marcobal, A., Sonnenburg, J.L., microbes. J. Mol. Biol. 426, 3851-3865. Elias, J.E., 2013. Host-centric proteomics of stool: a novel strategy focused on intestinal McNulty, N.P., Wu, M., Erickson, A.R., Pan, C., responses to the gut microbiota. Mol. Cell. Erickson, B.K., Martens, E.C., Pudlo, N.A., Proteomics 12, 3310-3318. Muegge, B.D., Henrissat, B., Hettich, R.L., et al., 2013. Effects of diet on resource Liebler, D.C., Zimmerman, L.J., 2013. Targeted utilization by a model human gut microbiota quantitation of proteins by mass containing Bacteroides cellulosilyticus WH2, spectrometry. Biochemistry (N.Y.) 52, 3797- a symbiont with an extensive glycobiome. 3806. PLoS Biol. 11, e1001637.

Lindenstrauß, A.G., Behr, J., Ehrmann, M.A., Mesuere, B., Devreese, B., Debyser, G., Aerts, Haller, D., Vogel, R.F., 2013. Identification M., Vandamme, P., Dawyndt, P., 2012. of fitness determinants in Enterococcus Unipept: Tryptic Peptide-Based Biodiversity faecalis by differential proteomics. Arch. Analysis of Metaproteome Samples. J. Microbiol. 195, 121-130. Proteome Res. 11, 5773-5780.

Louis, P., Young, P., Holtrop, G., Flint, H.J., Michaels, J.A., Dasari, S., Pereira, L., Reddy, 2010. Diversity of human colonic butyrate‐ A.P., Lapidus, J.A., Lu, X., Jacob, T., producing bacteria revealed by analysis of Thomas, A., Rodland, M., Roberts, C.T., et the butyryl‐CoA: acetate CoA‐transferase al., 2007. Comprehensive proteomic analysis gene. Environ. Microbiol. 12, 304-314. of the human amniotic fluid proteome: gestational age-dependent changes. J. Lozupone, C.A., Stombaugh, J., Gonzalez, A., Proteome Res. 6, 1277-1285. Ackermann, G., Wendel, D., Vazquez-Baeza, Y., Jansson, J.K., Gordon, J.I., Knight, R., Michalik, S., Bernhardt, J., Otto, A., Moche, M., 2013. Meta-analyses of studies of the human Becher, D., Meyer, H., Lalk, M., Schurmann, microbiota. Genome Res. 23, 1704-1714. C., Schlüter, R., Kock, H., et al., 2012. Life and death of proteins: a case study of Machiels, K., Joossens, M., Sabino, J., De Preter, glucose-starved Staphylococcus aureus. Mol. V., Arijs, I., Eeckhaut, V., Ballet, V., Claes, Cell. Proteomics 11, 558-570. K., Van Immerseel, F., Verbeke, K., et al., 2014. A decrease of the butyrate-producing Moreno-Indias, I., Cardona, F., Tinahones, F.J., species Roseburia hominis and Queipo-Ortuño, M.I., 2014. Impact of the gut Faecalibacterium prausnitzii defines microbiota on the development of obesity dysbiosis in patients with ulcerative colitis. and type 2 diabetes mellitus. Front. Gut 63, 1275-1283. Microbiol. 5.

Mahowald, M.A., Rey, F.E., Seedorf, H., Morgan, X.C., Segata, N., Huttenhower, C., 2012. Turnbaugh, P.J., Fulton, R.S., Wollam, A., Biodiversity and functional genomics in the Shah, N., Wang, C., Magrini, V., Wilson, human microbiome. Trends Genet. 29, 51- R.K., et al., 2009. Characterizing a model 58. human gut microbiota composed of members of its two dominant bacterial phyla. Proc. Murayama, K., Fujimura, T., Morita, M., Shindo, Natl. Acad. Sci. U.S.A. 106, 5859-5864. N., 2001. One‐step subcellular fractionation of rat liver tissue using a Nycodenz density Marco, M.L., de Vries, M.C., Wels, M., gradient prepared by freezing‐thawing and Molenaar, D., Mangell, P., Ahrne, S., de two‐dimensional sodium dodecyl sulfate Vos, W.M., Vaughan, E.E., Kleerebezem, electrophoresis profiles of the main fraction M., 2010. Convergence in probiotic of organelles. Electrophoresis 22, 2872-2880.

Muth, T., Kolmeder, C.A., Salojärvi, J., Keskitalo, Nielsen, H.B., Almeida, M., Juncker, A.S., S., Varjosalo, M., Verdam, F.J., Rensen, Rasmussen, S., Li, J., Sunagawa, S., Plichta, S.S., Reichl, U., Vos, W.M., Rapp, E., 2015. D.R., Gautier, L., Pedersen, A.G., Le Navigating through metaproteomics data–a Chatelier, E., 2014. Identification and logbook of database searching. Proteomics. assembly of genomes and genetic elements in complex metagenomic samples without Muth, T., Behne, A., Heyer, R., Kohrs, F., using reference genomes. Nat. Biotechnol. Benndorf, D., Hoffmann, M., Lehtevä, M., 32, 822-828. Reichl, U., Martens, L., Rapp, E., 2015. The MetaProteomeAnalyzer: a powerful open- Norin, K., Gustafsson, B., Lindblad, B., Midtvedt, source software suite for metaproteomics T., 1985. The establishment of some data analysis and interpretation. J. Proteome microflora associated biochemical Res. 14, 1557-1565. characteristics in feces from children during the first years of life. Acta Paediatrica 74, Muth, T., Peters, J., Blackburn, J., Rapp, E., 207-212. Martens, L., 2013. ProteoCloud: A full- featured open source proteomics cloud Norman, J.M., Handley, S.A., Virgin, H.W., 2014. computing pipeline. J. Proteomics 88, 104- Kingdom-agnostic metagenomics and the 108. importance of complete characterization of enteric microbial communities. Nabokina, S.M., Inoue, K., Subramanian, V.S., Gastroenterology 146, 1459-1469. Valle, J.E., Yuasa, H., Said, H.M., 2014. Molecular identification and functional Nylund, L., Satokari, R., Salminen, S., de Vos, characterization of the human colonic W.M., 2014. Intestinal microbiota during thiamine pyrophosphate transporter. J. Biol. early life–impact on health and disease. Proc. Chem. 289, 4405-4416. Nutr. Soc. 73, 457-469.

Nagarajan, N., Pop, M., 2013. Sequence assembly O’Keefe, S.J.D, Li, J.V., Lahti, L., Ou, J., demystified. Nat. Rev. Genet. 14, 157-167. Carbonero, F., Mohammed, K., Posma, P.M., Kinross, J., Wahl, E., Ruder, E., et al., 2015. Neijssel, O.M., Mattos, M., 1994. The energetics Fat, Fiber and Cancer Risk in African of bacterial growth: a reassessment. Mol. Americans and Rural Africans. Nature Microbiol. 13, 179-182. Comm., 6.

Neilson, K.A., Ali, N.A., Muralidharan, S., Otto, A., Becher, D., Schmidt, F., 2014. Mirzaei, M., Mariani, M., Assadourian, G., Quantitative proteomics in the field of Lee, A., Van Sluyter, S.C., Haynes, P.A., microbiology. Proteomics 14, 547-565. 2011. Less label, more free: approaches in label‐free quantitative mass spectrometry. Ouwerkerk, J.P., de Vos, W.M., Belzer, C., 2013. Proteomics 11, 535-553. Glycobiome: bacteria and mucus at the epithelial interface. Best Pract. Res. Clin. Nesvizhskii, A.I., 2014. Proteogenomics: Gastroenterol. 27, 25-38. concepts, applications and computational strategies. Nat. Methods 11, 1114-1125. Oveland, E., Muth, T., Rapp, E., Martens, L., Berven, F.S., Barsnes, H., 2015. Viewing the Neville, B.A., Forde, B.M., Claesson, M.J., proteome: How to visualize proteomics data? Darby, T., Coghlan, A., Nally, K., Ross, Proteomics. R.P., O’Toole, P.W., 2012. Characterization of pro-inflammatory flagellin proteins Panda, S., Casellas, F., Vivancos, J.L., Cors, produced by Lactobacillus ruminis and M.G., Santiago, A., Cuenca, S., Guarner, F., related motile Lactobacilli. PloS One 7, Manichanh, C., 2014. Short-Term Effect of e40592. Antibiotics on Human Gut Microbiota. PloS One 9, e95476. Nguyen, T.L., Vieira-Silva, S., Liston, A., Raes, J., 2015. How informative is the mouse for Paradis, E., Claude, J., Strimmer, K., 2004. APE: human gut microbiota research? Dis. Model. Analyses of Phylogenetics and Evolution in Mech. 8, 1-16. R language. Bioinformatics 20, 289-290.

Paulo, J.A., Kadiyala, V., Banks, P.A., Conwell, abundant microbiota of young and elderly D.L., Steen, H., 2013. Mass spectrometry- adults. Environ. Microbiol. 11, 1736-1751. based quantitative proteomic profiling of human pancreatic and hepatic stellate cell Rampelli, S., Candela, M., Turroni, S., Biagi, E., lines. Genomics Proteomics Bioinformatics Collino, S., Franceschi, C., O'Toole, P.W., 11, 105-113. Brigidi, P., 2013. Functional metagenomic profiling of intestinal microbiome in extreme Pearson, J.R., Gill, C.I., Rowland, I.R., 2009. ageing. Aging (Albany NY) 5, 902. Diet, fecal water, and colon cancer– development of a biomarker. Nutr. Rev. 67, Rappé, M.S., Giovannoni, S.J., 2003. The 509-526. uncultured microbial majority. Annu. Rev. Microbiol. 57, 369-394. Penders, J., Thijs, C., Vink, C., Stelma, F.F., Snijders, B., Kummeling, I., van den Brandt, R-core team, R: A language and environment for P.A., Stobberingh, E.E., 2006. Factors statistical computing. R Foundation for influencing the composition of the intestinal Statistical Computing, Vienna, Austria. URL microbiota in early infancy. Pediatrics 118, http://www.R-project.org/. 511-521. Reichardt, N., Duncan, S.H., Young, P., Powell, S., Forslund, K., Szklarczyk, D., Belenguer, A., Leitch, C.M., Scott, K.P., Trachana, K., Roth, A., Huerta-Cepas, J., Flint, H.J., Louis, P., 2014. Phylogenetic Gabaldon, T., Rattei, T., Creevey, C., Kuhn, distribution of three pathways for propionate M., et al., 2014. eggNOG v4.0: nested production within the human gut microbiota. orthology inference across 3686 organisms. ISME J. 8, 1323-1335. Nucleic Acids Res. 42, D231-9. Reichhardt, M.P., Jarva, H., de Been, M., Miguel Qin, J., Li, R., Raes, J., Arumugam, M., Burgdorf, Rodriguez, J., Quintana Jimenez, E., K.S., Manichanh, C., Nielsen, T., Pons, N., Loimaranta, V., de Vos, W.M., Meri, S., Levenez, F., Yamada, T., et al., 2010. A 2014. The salivary scavenger and agglutinin human gut microbial gene catalogue in early life: diverse roles in amniotic fluid established by metagenomic sequencing. and in the infant intestine. J. Immunol. 61, Nature 464, 59-65. 252-252.

Qin, J., Li, Y., Cai, Z., Li, S., Zhu, J., Zhang, F., Rezzi, S., Ramadan, Z., Martin, F.P., Fay, L.B., Liang, S., Zhang, W., Guan, Y., Shen, D., et van Bladeren, P., Lindon, J.C., Nicholson, al., 2012. A metagenome-wide association J.K., Kochhar, S., 2007. Human metabolic study of gut microbiota in type 2 diabetes. phenotypes link directly to specific dietary Nature 490, 55-60. preferences in healthy individuals. J. Proteome Res. 6, 4469-4477. Quercia, S., Candela, M., Giuliani, C., Turroni, S., Luiselli, D., Rampelli, S., Brigidi, P., Righetti, P.G., Candiano, G., Citterio, A., Franceschi, C., Bacalini, M.G., Garagnani, Boschetti, E., 2015. Combinatorial Peptide P., et al., 2014. From lifetime to evolution: Ligand Libraries as a “Trojan Horse” in timescales of human gut microbiota Deep Discovery Proteomics. Anal. Chem. adaptation. Front. Microbiol. 5. 87, 293-305.

Rajilić-Stojanović, M., de Vos, W.M., 2014. The Rinttilä, T., Kassinen, A., Malinen, E., Krogius, first 1000 cultured species of the human L., Palva, A., 2004. Development of an gastrointestinal microbiota. FEMS extensive set of 16S rDNA‐targeted primers Microbiol. Rev. 38, 996-1047. for quantification of pathogenic and indigenous bacteria in faecal samples by Rajilić-Stojanović, M., Heilig, H.G., Molenaar, real‐time PCR. J. Appl. Microbiol. 97, 1166- D., Kajander, K., Surakka, A., Smidt, H., de 1177. Vos, W.M., 2009. Development and application of the human intestinal tract chip, a phylogenetic microarray: analysis of universally conserved phylotypes in the

Ritari, J., Lahti, L., de Vos, W.M., 2015. Schnorr, S.L., Candela, M., Rampelli, S., Improved Taxonomic Assignment of Human Centanni, M., Consolandi, C., Basaglia, G., Intestinal 16s rRNA Sequence Reads by a Turroni, S., Biagi, E., Peano, C., Severgnini, Dedicated Reference Database. Under M., et al., 2014. Gut microbiome of the review. Hadza hunter-gatherers. Nat. Commun. 5, 3654. Rodríguez-Valera, F., 2004. Environmental genomics, the big picture? FEMS Microbiol. Schoepfer, A.M., Schaffer, T., Seibold-Schmid, Lett. 231, 153-158. B., Müller, S., Seibold, F., 2008. Antibodies to flagellin indicate reactivity to bacterial Rosenberger, G., Koh, C.C., Guo, T., Röst, H.L., antigens in IBS patients. Neurogastroenterol. Kouvonen, P., Collins, B.C., Heusel, M., Motil. 20, 1110-1118. Liu, Y., Caron, E., Vichalkovski, A., et al., 2014. A repository of assays to quantify Scholtens, P.A., Oozeer, R., Martin, R., Amor, 10,000 human proteins by SWATH-MS. K.B., Knol, J., 2012. The early settlers: Scientific Data 1, 140031. intestinal microbiology in early life. Annu. Rev. Food Sci. Technol. 3, 425-447. Ruiz-Moyano, S., Tao, N., Underwood, M.A., Mills, D.A., 2012. Rapid discrimination of Schrimpe-Rutledge, A.C., Fontès, G., Gritsenko, Bifidobacterium animalis subspecies by M.A., Norbeck, A.D., Anderson, D.J., matrix-assisted laser desorption ionization- Waters, K.M., Adkins, J.N., Smith, R.D., time of flight mass spectrometry. Food Poitout, V., Metz, T.O., 2012. Discovery of Microbiol. 30, 432-437. novel glucose-regulated proteins in isolated human pancreatic islets using LC–MS/MS- Salonen, A., de Vos, W.M., 2014. Impact of diet based proteomics. J. Proteome Res. 11, on human intestinal microbiota and health. 3520-3532. Annu. Rev. Food Sci. Technol. 5, 239-262. Schuijt, T.J., van der Poll, T., de Vos, W.M., Salonen, A., Lahti, L., Salojärvi, J., Holtrop, G., Wiersinga, W.J., 2013. The intestinal Korpela, K., Duncan, S.H., Date, P., microbiota and host immune interactions in Farquharson, F., Johnstone, A.M., Lobley, the critically ill. Trends Microbiol. 21, 221- G.E., et al., 2014. Impact of diet and 229. individual variation on intestinal microbiota composition and fermentation products in Scott, K.P., Gratz, S.W., Sheridan, P.O., Flint, obese men. ISME J. 8, 2218-2230. H.J., Duncan, S.H., 2013. The influence of diet on the gut microbiota. Pharmacol. Res. Salonen, A., Nikkilä, J., Jalanka-Tuovinen, J., 69, 52-60. Immonen, O., Rajilić-Stojanović, M., Kekkonen, R.A., Palva, A., de Vos, W.M., Segata, N., Waldron, L., Ballarini, A., 2010. Comparative analysis of fecal DNA Narasimhan, V., Jousson, O., Huttenhower, extraction methods with phylogenetic C., 2012. Metagenomic microbial microarray: effective recovery of bacterial community profiling using unique clade- and archaeal DNA using mechanical cell specific marker genes. Nat. Methods 9, 811- lysis. J. Microbiol. Methods 81, 127-134. 814.

Scheperjans, F., Aho, V., Pereira, P.A., Koskinen, Seifert, J., Taubert, M., Jehmlich, N., Schmidt, F., K., Paulin, L., Pekkonen, E., Haapaniemi, E., Völker, U., Vogt, C., Richnow, H., von Kaakkola, S., Eerola‐Rautio, J., Pohja, M., et Bergen, M., 2012. Protein‐based stable al., 2015. Gut microbiota are related to isotope probing (protein‐SIP) in functional Parkinson's disease and clinical phenotype. metaproteomics. Mass Spectrom. Rev. 31, Mov. Disord. 30, 350-358. 683-697.

Schneider, T., Riedel, K., 2010. Environmental Shevchenko, A., Tomas, H., Havlis, J., Olsen, proteomics: analysis of structure and J.V., Mann, M., 2006. In-gel digestion for function of microbial communities. mass spectrometric characterization of Proteomics 10, 785-798. proteins and proteomes. Nat. Protoc. 1, 2856- 2860.

Shteynberg, D., Nesvizhskii, A.I., Moritz, R.L., Taverna, D.M., Goldstein, R.A., 2002. Why are Deutsch, E.W., 2013. Combining results of proteins marginally stable? Proteins 46, 105- multiple search engines in proteomics. Mol. 109. Cell. Proteomics 12, 2383-2393. Tims, S., Derom, C., Jonkers, D.M., Vlietinck, R., Siggins, A., Gunnigle, E., Abram, F., 2012. Saris, W.H., Kleerebezem, M., de Vos, Exploring mixed microbial community W.M., Zoetendal, E.G., 2012. Microbiota functioning: recent advances in conservation and BMI signatures in adult metaproteomics. FEMS Microbiol. Ecol. 80, monozygotic twins. ISME J. 7, 707-717. 265-280. Tissier, H., 1900. Recherches sur la flore Song, Y., Garg, S., Girotra, M., Maddox, C., von intestinale des nourrissons: état normal et Rosenvinge, E.C., Dutta, A., Dutta, S., pathologique. G. Carré et C. Naud. Fricke, W.F., 2013. Microbiota dynamics in patients treated with fecal microbiota Turnbaugh, P.J., Hamady, M., Yatsunenko, T., transplantation for recurrent Clostridium Cantarel, B.L., Duncan, A., Ley, R.E., Sogin, difficile infection. PloS One 8, e81330. M.L., Jones, W.J., Roe, B.A., Affourtit, J.P., et al., 2009. A core gut microbiome in obese Sonnenburg, E.D., Sonnenburg, J.L., 2014. and lean twins. Nature 457, 480-484. Starving our Microbial Self: The Deleterious Consequences of a Diet Deficient in Tyakht, A.V., Kostryukova, E.S., Popenko, A.S., Microbiota-Accessible Carbohydrates. Cell Belenikin, M.S., Pavlenko, A.V., Larin, Metab. 20, 779-786. A.K., Karpova, I.Y., Selezneva, O.V., Semashko, T.A., Ospanova, E.A., et al., Speicher, K.D., Kolbas, O., Harper, S., Speicher, 2013. Human gut microbiota community D.W., 2000. Systematic analysis of peptide structures in urban and rural populations in recoveries from in-gel digestions for protein Russia. Nat. Comm. 4. identifications in proteome studies. J. Biomol. Tech. 11, 74-86. van Nood, E., Vrieze, A., Nieuwdorp, M., Fuentes, S., Zoetendal, E.G., de Vos, W.M., Steck, N., Mueller, K., Schemann, M., Haller, D., Visser, C.E., Kuijper, E.J., Bartelsman, J.F., 2012. Bacterial proteases in IBD and IBS. Tijssen, J.G., et al., 2013. Duodenal infusion Gut 61, 1610-1618. of donor feces for recurrent Clostridium difficile. N. Engl. J. Med. 368, 407-415. Stephen, A.M., Cummings, J.H., 1980. The microbial contribution to human faecal mass. van Ommen, B., van der Greef, J., Ordovas, J.M., J. Med. Microbiol. 13, 45-56. Daniel, H., 2014. Phenotypic flexibility as key factor in the human nutrition and health Tanca, A., Palomba, A., Pisanu, S., Addis, M.F., relationship. Genes Nutr. 9, 1-9. Uzzau, S., 2015. Enrichment or depletion? The impact of stool pretreatment on Vaudel, M., Sickmann, A., Martens, L., 2012. metaproteomic characterization of the human Current methods for global proteome gut microbiota. Proteomics. identification. Expert Rev. Proteomics 9, 519-532. Tanca, A., Palomba, A., Pisanu, S., Deligios, M., Fraumene, C., Manghina, V., Pagnozzi, D., Venables, W.N., Ripley, B.D., 2002. Modern Addis, M.F., Uzzau, S., 2014. A applied statistics with S. Springer. straightforward and efficient analytical pipeline for metaproteome characterization. Venema, K., Van den Abbeele, P., 2013. Microbiome 2, 1-16. Experimental models of the gut microbiome. Best Pract. Res. Clin. Gastroenterol. 27, 115- Tatusov, R.L., Koonin, E.V., Lipman, D.J., 1997. 126. A genomic perspective on protein families. Science 278, 631-637.

Verdam, F.J., Fuentes, S., de Jonge, C., Zoetendal, Milk Oligosaccharides Consumed. J. Pediatr. E.G., Erbil, R., Greve, J.W., Buurman, W.A., Gastroenterol. Nutr. de Vos, W.M., Rensen, S.S., 2013. Human intestinal microbiota composition is Warnecke, F., Luginbühl, P., Ivanova, N., associated with local and systemic Ghassemian, M., Richardson, T.H., Stege, inflammation in obesity. Obesity 21, E607- J.T., Cayouette, M., McHardy, A.C., E615. Djordjevic, G., Aboushadi, N., et al., 2007. Metagenomic and functional analysis of Vey, G., Charles, T.C., 2014. MetaProx: the hindgut microbiota of a wood-feeding higher database of metagenomic proximons. termite. Nature 450, 560-565. Database (Oxford) 2014, 10.1093/database/bau097. Weng, M., Walker, W., 2013. The role of gut microbiota in programming the immune Vijay-Kumar, M., Aitken, J.D., Gewirtz, A.T., phenotype. J. Dev. Orig. Health Dis. 4, 203- 2008. Toll like receptor-5: protecting the gut 214. from enteric microbes. Semin. Immunopathol. 30, 11-21. Williams, R., Peisajovich, S.G., Miller, O.J., Magdassi, S., Tawfik, D.S., Griffiths, A.D., Vrieze, A., Van Nood, E., Holleman, F., Salojärvi, 2006. Amplification of complex gene J., Kootte, R.S., Bartelsman, J.F.W.M., libraries by emulsion PCR. Na. Methods 3, Dallinga-Thie, G.M., Ackermans, M.T., 545-550. Serlie, M.J., Oozeer, R., et al., 2012. Transfer of intestinal microbiota from lean Wilmes, P., Bond, P.L., 2004. The application of donors increases insulin sensitivity in two-dimensional polyacrylamide gel subjects with metabolic syndrome. electrophoresis and downstream analyses to a Gastroenterology 143, 913-916. mixed community of prokaryotic microorganisms. Environ. Microbiol. 6, 911- Wacklin, P., Tuimala, J., Nikkilä, J., Tims, S., 920. Mäkivuokko, H., Alakulppi, N., Laine, P., Rajilić-Stojanović, M., Paulin, L., de Vos, Wilmes, P., Bond, P.L., 2006. Metaproteomics: W.M., et al., 2014. Faecal Microbiota studying functional gene expression in Composition in Adults Is Associated with the microbial ecosystems. Trends Microbiol. 14, FUT2 Gene Determining the Secretor Status. 92-97. PloS One 9, e94863. Wu, G.D., Compher, C., Chen, E.Z., Smith, S.A., Walker, A.W., Ince, J., Duncan, S.H., Webster, Shah, R.D., Bittinger, K., Chehoud, C., L.M., Holtrop, G., Ze, X., Brown, D., Stares, Albenberg, L.G., Nessel, L., Gilroy, E., et M.D., Scott, P., Bergerat, A., et al., 2011. al., 2014. Comparative metabolomics in Dominant and diet-responsive groups of vegans and omnivores reveal constraints on bacteria within the human colonic diet-dependent gut microbiota metabolite microbiota. ISME J. 5, 220-230. production. Gut , gutjnl-2014-308209.

Walters, W.A., Xu, Z., Knight, R., 2014. Meta- Wu, J., An, Y., Yao, J., Wang, Y., Tang, H., 2010. analyses of human gut microbes associated An optimised sample preparation method for with obesity and IBD. FEBS Lett. 588, 4223- NMR-based faecal metabonomic analysis. 4233. Analyst 135, 1023-1030.

Wang, M., Ahrne, S., Jeppsson, B., Molin, G., Xiong, W., Giannone, R.J., Morowitz, M.J., 2005. Comparison of bacterial diversity Banfield, J.F., Hettich, R.L., 2014. along the human intestinal tract by direct Development of an Enhanced Metaproteomic cloning and sequencing of 16S rRNA genes. Approach for Deepening the Microbiome FEMS Microbiol. Ecol. 54, 219-231. Characterization of the Human Infant Gut. J. Proteome Res. 14, 133-141. Wang, M., Li, M., Wu, S., Lebrilla, C.B., Chapkin, R.S., Ivanov, I., Donovan, S.M., 2015. Fecal Microbiota Composition of Breast-fed Infants is Correlated with Human

Yadav, A.K., Kumar, D., Dash, D., 2012. Learning from decoys to improve the sensitivity and specificity of proteomics database search results. PloS One 7, e50651.

Yatsunenko, T., Rey, F.E., Manary, M.J., Trehan, I., Dominguez-Bello, M.G., Contreras, M., Magris, M., Hidalgo, G., Baldassano, R.N., Anokhin, A.P., et al., 2012. Human gut microbiome viewed across age and geography. Nature 486, 222-227.

Yu, Z., Morrison, M., 2004. Improved extraction of PCR-quality community DNA from digesta and fecal samples. BioTechniques 36, 808-812.

Zoetendal, E.G., Akkermans, A.D., Akkermans- van Vliet, W.M., de Visser, J., Arjan, G.M., de Vos, W.M., 2001. The host genotype affects the bacterial community in the human gastronintestinal tract. Microb. Ecol. Health Dis. 13, 129-134.

Zoetendal, E.G., Raes, J., van den Bogert, B., Arumugam, M., Booijink, C.C., Troost, F.J., Bork, P., Wels, M., de Vos, W.M., Kleerebezem, M., 2012. The human small intestinal microbiota is driven by rapid uptake and conversion of simple carbohydrates. ISME J. 6, 1415-1426.

Zoetendal, E.G., de Vos, W.M., 2014. Effect of diet on the intestinal microbiota and its activity. Curr. Opin. Gastroenterol. 30, 189- 195.