THESIS submitted to obtain the degree of DOCTOR OF THE UNIVERSITÉ DE GRENOBLE

Specialty: Methods and Algorithms in Biology

Presented by Tarfa Mustafa

Thesis supervised by Roberto A. Geremia and co-supervised by Jean Martins
Prepared at the Laboratoire d'Ecologie Alpine, within the doctoral school Ingénierie de la santé, la cognition et l'environnement

Towards a metatranscriptomic comparison of two alpine soils under contrasted snow cover

Thesis publicly defended on 28 September 2011, before a jury composed of:

Rolland Marmeisse: CR CNRS, Ecologie Microbienne, Lyon (reviewer)
Denis Faure: DR CNRS, Institut des Sciences du Végétal, Gif-sur-Yvette (reviewer)
Lionel Ranjard: DR INRA, Genosol, Dijon (examiner)
Patrick Ravanel: Prof, UJF, LECA, Grenoble (examiner)
Jean Martins: CR CNRS, LTHE, Grenoble (thesis co-supervisor)
Roberto A. Geremia: DR CNRS, LECA, Grenoble (thesis supervisor)

Table of contents
Acknowledgements
Presentation of the approach
Chapter I - Literature review
2.1 Soil environment in general, who is there?
2.1.1 Soil environment, formation and characteristics.
2.1.2 Soil biodiversity: definition and concept.
2.1.3 Microbial diversity estimation methods.
2.1.4 Eukaryotes as major component of soil environment.
2.1.4.1 Eukaryotes evolution history.
2.1.4.2 Fungal phylogeny.
2.2 Soil ecosystem, who is there, and what are they doing?
2.2.1 What is an ecosystem?
2.2.2 The soil ecosystem, concept, definition and key elements.
2.2.2.1 Major factors influencing soil ecosystem.
2.2.2.2 Above- and belowground interaction effects on ecosystem functions.
I. Feedbacks between below- and aboveground communities.
II. Rhizodeposition as major factor in biogeochemical cycles of soil.
2.2.3 Soil ecosystem: Fungi as a major actor
2.3 Alpine soil as special ecosystem:
2.3.1 Definition and key features.
2.3.2 Climate changes effect on biodiversity in alpine ecosystem.
2.3.2.1 Snow-cover effects on seasonal succession of microbial communities.
2.3.2.2 Climate changes determine vegetal communities in alpine environment.
2.3.2.3 Snow-cover impact on plant-microorganisms interactions.
2.3.2.4 Biogeochemical cycles in alpine environment in function of snow-cover.
2.3.3 Snowmelt gradient in relation with ecosystem functions.
2.4 Methods for studying functional diversity of soil:
2.4.1 Functional diversity analysis using traditional and biochemical approaches.
2.4.2 Molecular approaches to solve ecological questions.
2.4.2.1 Historical preface.
2.4.2.2 Metagenomic, a step forward for understanding functional diversity.
2.4.2.3 Transcriptome sequencing for gene expression analysis.


2.4.2.3.1 First methodologies, advances and limitations.
2.4.2.3.2 Next generation sequencing, application, advances and limitations.
2.4.2.3.3 Tag-based approaches for gene expression profiling.
2.4.2.4 Metatranscriptomic approach:
2.4.2.4.1 Definition, concept and limitations.
2.4.2.4.2 Knowledge on metatranscriptomic approach.
2.4.2.4.3 Drawbacks and technical limitations.
Summary
Chapter II - Development of total RNA extraction protocol, cDNA libraries construction, and 454 pyrosequencing
3.1 Introduction:
3.2 Material and methods:
3.2.1 Soil samples.
3.2.2 Total RNA extraction.
3.2.2.1 Extraction method.
3.2.2.2 RNA purification and DNA elimination.
3.2.2.3 RNA quality and quantity estimation.
3.2.3 cDNA preparation.
3.2.4 454 Pyrosequencing.
3.3 Results and discussion:
3.3.1 Total RNA extraction.
3.3.1.1 RNA PowerSoil™ Total RNA Isolation Kit.
3.3.1.2 Modified protocol proposed by Bailly et al. (2007).
3.3.2 cDNA libraries preparations.
3.4 Discussion:
Summary
Chapter III: 454 pyrosequencing data analysis
4.1 Introduction:
4.2 Material and methods:
4.2.1 Sequences analysis, size selection, trimming and assembly.
4.2.2 Taxonomical and functional analysis.
4.3 Results:
4.3.1 Raw dataset preparation and assembly.
4.3.2 Community composition and taxonomic origin of transcripts.


4.3.2.1 Taxonomic affiliation of ribosomal reads.
4.3.2.2 Taxonomic affiliation of putative mRNA reads.
4.3.2.2.1 MG-RAST approach.
4.3.2.2.2 Blast2Go approach.
4.3.3 Functional analysis of transcripts.
4.3.3.1 Functional analysis using SEED subsystems (MG-RAST).
4.3.3.2 Functional analysis based on Gene Ontology (GO terms).
4.4 Discussion
4.4.1 Choice of studied ecosystem and its possible ecological significance.
4.4.2 Choice of molecular approach for ecological questions.
4.4.3 Qualitative and quantitative aspects of obtained dataset.
4.4.4 Assembly, as important factor for improving 454 pyrosequencing reads quality.
4.4.5 Phylogenetic analysis; different approaches, different databases, and as consequence different results.
4.4.6 Functional annotations using MG-RAST and BLAST2GO tell different stories:
4.5 Conclusions and perspective.
Summary
References:
Additional files:
Article 1: Towards a metatranscriptomic comparison between two alpines soils.
Article 2: Effect of plant species identity, drought stress and defoliation on the diversity of bacterial and fungal communities associated with the roots of grasses.
Article 3: The dependence on porosity of the phylogenetic structure of bacterial urban sediments.


Acknowledgements

It went by like a flash of lightning, like the blink of an eye! Five years have passed; the task is finished, but the scientific adventure and the human experience have kept changing in my mind and in myself, and will only engrave themselves more deeply in my memory and in my heart. The human experience of a thesis is sometimes (or often) an experience of solitude. But I wish to acknowledge that it would not even have been possible without your support or, quite simply, without your presence at my side. I would first like to thank Rolland Marmeisse (Ecologie Microbienne, Lyon) and Denis Faure (Institut des Sciences du Végétal, Gif-sur-Yvette) for agreeing to review my thesis, and for their critical comments on the manuscript. I would also like to thank Lionel Ranjard (INRA Genosol, Dijon) and Patrick Ravanel (UJF-LECA, Grenoble) for agreeing to be part of the jury. To my thesis supervisor Roberto Geremia: although our relationship went through many difficult stages over the course of my thesis, I want to thank you for accepting me into your team and for allowing me to flourish in several scientific fields, above all microbial ecology. I also thank you for your determination to get us out of deadlocked situations every time. I also thank Jean Martins for his participation in the first part of my thesis; although the thesis plan was largely modified, I will not forget his insistence on continuing to supervise my thesis until the end. Jean-Marc Bonville, it is with you that I discovered the world of molecular biology and bioinformatics. Despite so many arguments and misunderstandings, the last two years of my thesis, spent under your supervision, were the most innovative period, and the period in which I discovered the real world of research. I thank you deeply for everything you taught me.
Many thanks to Armelle Munièr, the person with whom I spent very good moments. Armelle, your presence at my side was one of the rare positive elements in the troubled course of my thesis. I can never forget the precious help you gave me and, frankly, I cannot imagine these five years of thesis without you. This thesis could not have seen the light of day, nor been brought properly to completion, without the decisive interventions of Philippe Choler.


To the big boss Patrick Ravanel, thank you for your availability and your support, especially in moments of doubt and discouragement. This thesis work would not have been the same without the former PEX team, and I particularly want to thank the adorable Angélique San Miguel. I can never forget all the times you comforted me; your presence in the laboratory was a very bright ray of sunshine for me. Ozgur Clinck, so many packs of cigarettes smoked at the foot of the LECA building, so much suffering shared, but also so many hopes and dreams. I thank you for all the moments spent together. Nor can I forget the magnificent Mathieu Faure, the third foreigner in the laboratory along with Ozgur and me. Abdel, so many scientific discussions with you and so many things shared; I thank you and wish you every success in your personal and professional life. Luci and Bahar, I thank you for the moments shared at the beginning of my thesis and for the precious scientific help you gave me. Many thanks to Mrs Lucile Sage and Mr Mamadu Bello for the precious help you gave me, and for being able to confide in you in difficult moments. A special thought for all the members of the football team (Michaiel, Fred, Franchisco, Abdel, Saiied, Julien, Max and many others) and of course for our team of female supporters (Emma, Marie-Audri, Claire and Angélique). I also wish to thank Olivier Lontin, our brilliant IT specialist. You were always available for me, and with the five computers you provided me during my thesis, I think we set a record that will be hard to beat at LECA. I thank my Syrian friends with whom I spent these five years of thesis (Anas, Hazem, Hatfan, Safi, Tahsin).
I thank my father, my mother and all my family for the moral support they gave me, without which I could not have reached the end of the long tunnel of this experience (dark as it was, it may seem like a nightmare). I wish to deeply thank my adorable wife, who supported me during the last two years of my thesis and who had to put up with my moods and the stress I went through, especially during the writing phase. Your presence beside me gave me the hope and the desire to continue and to overcome all the scientific and human obstacles.


Presentation of the approach

The general objective of this thesis was to study "the effect of natural and anthropogenic stresses on the microbial diversity of terrestrial ecosystems, with a particular focus on impact and resilience". The first chapter focuses on the distribution of bacterial and fungal communities within the microstructure of a multi-contaminated sedimentary layer from an urban stormwater infiltration basin located in Meyzieu (Rhône, France). This work was published in the journal FEMS Microbiology Ecology (Badin et al. 2011). The second chapter aimed to show the effect of plant species, and in particular of the stress imposed by drought and defoliation, on the diversity of bacterial and fungal communities associated with plant roots. This work has been submitted for publication in the journal Plant and Soil (Bouasria et al. 2011). The third chapter concerns the effects of an amendment with phenolic compounds and litter extracts on the diversity of microbial communities. This work has not yet been published because of experimental and technical difficulties. Other lines of research were also addressed during this thesis, notably on the mobilization of bacteria and fungi in two different contexts: an urban neosol from an urban stormwater drainage basin and an agricultural soil. Unfortunately, this work could not be completed for financial and technical reasons. It will not be presented in this document. Carrying out all of this work took three and a half years, at the end of which some objectives had not been reached. We therefore had to radically reorient the second part of the thesis towards the study of the functional diversity of alpine soils using a metatranscriptomic approach.
The distribution of snow at the landscape scale is one of the most important variables controlling the structure and function of mountain ecosystems. Moreover, snow cover is one of the most important factors controlling microclimate and plant growth conditions in alpine ecosystems. Changes in snow depth and snow-cover duration can cause major changes in edapho-climatic conditions, as well as in the composition of plant communities. Seasonally snow-covered soils are strongly suspected of sequestering large quantities of organic carbon. In alpine zones and arctic tundra, the duration of snow cover radically affects the structure and functioning of the ecosystem. In addition, the major biogeochemical cycles of nutrients and carbon are influenced by the frequency and duration of freeze-thaw events.


We used the metatranscriptomic approach to try to understand the actual functional diversity and the activities expressed by microorganisms in alpine soils in response to different environmental constraints. Transcriptomics and, by extension, metatranscriptomics can be seen as the complete quantitative analysis of all the genes expressed by one or several organisms, or by the entire ecosystem. Using this approach first involves extracting RNA of good quality and in good yield, and then converting this RNA into cDNA while targeting the mRNA fraction. The ability to assess the metatranscriptome of complex microbial communities under different environmental conditions represents in itself a significant advance in our ability to link the structure and functions of communities with DNA genotypes (sequences) and with the corresponding phenotype. Large-scale pyrosequencing of cDNAs offers a unique opportunity to explore in depth the nature and complexity of the transcriptome. Moreover, pyrosequencing is well suited to analyzing the transcriptome of model and non-model species and allows the rapid production of sequencing data at lower cost (in time and money). The environment of alpine zones offers very interesting ecological opportunities to assess the effects of snow on local climatic conditions and on the processes taking place in this ecosystem. Yet alpine microbial communities are not well known, and only a few comparative studies of microbial community dynamics in relation to snow-cover patterns have been described. In this study, we present the first preparation of a cDNA database and a first characterization of the metatranscriptome of two alpine soils, sequenced using 454 technology.
We also present a sequence analysis and annotation procedures using publicly available software and Python scripts written in the Obitools environment. We also developed an adapted bioinformatic analysis pipeline that makes it possible to correctly extract functional and taxonomic information from these databases. Our objectives were to launch a first experimental study by first establishing the metatranscriptomic approach to the activities of eukaryotic microbial communities in alpine soils under two highly contrasting snow-cover conditions, named LSM (late snowmelt) and ESM (early snowmelt), which are characterized by contrasting climatic gradients and associated differences in vegetation.


Chapter I - Literature review

2.1 Soil environment in general, who is there?

2.1.1 Soil environment, formation and major characteristics:

Historically, soil is the product of the weathering of rocks and minerals, and can have various properties according to the origin of the parent material, climate and vegetation (Taylor et al., 2009). From an evolutionary perspective, the emergence of higher plants with deep and mycorrhizal root systems is considered a major process in Earth's history and in the biogeochemical cycles of CO2 (and O2), via their impact on rock weathering and soil formation in terrestrial ecosystems (Berner, 1997; Taylor et al., 2009). Soil structure and development are controlled by several abiotic factors such as depth and loosening, structuring, water retention, aeration and gas exchange, availability of inorganic and organic compounds, rock weathering and C sequestration, which in turn are controlled and determined by biotic factors such as plants and their associated microorganisms (Van-Breemen, 1993; Lambers et al., 2009). Plant-microorganism associations and interactions are major factors in the formation or modification of soil through their role in supplying and transforming organic matter (Lambers et al., 2009; Taylor et al., 2009). Overall, soil is the most spatially and temporally heterogeneous environment on earth: a combination of complex physical structure, geochemistry, and dramatic seasonal fluctuations in temperature, moisture and nutrient availability, which are all experienced by soil microbes at micro-scales and result in higher diversity and biomass of organisms compared with other environments (Schmidt et al., 2007; Roesch et al., 2007). All these organisms not only cohabit but also interact in soil through nutrient recycling, competition, parasitism, and predation in a multitrophic network (Coleman & Whitman, 2005). 
These complex interactions regulate the soil biomass so that the composition of one group is influenced by the others (Edel-Hermann et al., 2008).

2.1.2 Soil biodiversity:

Soil biodiversity refers to all organisms living in soil, which can be divided, according to organism size class, into macro-, meso- and microfauna and, according to preferred living environment, into aboveground and belowground communities (Breure et al., 2004). Biodiversity is concerned with the functional attributes of the ecosystem, for example decomposition and nutrient cycling, in addition to the number of species of all the biota present. This differentiates it from the concept of species diversity, which is concerned with the identity and distribution of species in a given habitat or region (Coleman & Whitman, 2005). Microbial diversity describes complexity and variability at different levels of biological organization. It covers genetic variability within taxa (species) and the number (richness) and relative abundance (evenness) of taxa and functional groups (guilds) in communities (Torsvik & Øvreås, 2002). The term species diversity consists of two components. The first is the total number of species present, which is referred to as species richness; in other words, it refers to the quantitative variation among species. The second component is the distribution of individuals among these species, which is referred to as evenness. However, biodiversity is based upon the species as a unit (Øvreås, 2000). As soil biota is thought to harbor the largest part of the world's biodiversity and to control processes that are considered globally important components in the cycling of organic matter, energy and nutrients, it is important to study these organisms in their natural habitats (Gardi et al., 2009). Recent results from both culture-dependent and culture-independent approaches clearly demonstrate that soil biodiversity is even higher than previously imagined (Tiedje et al., 1999; Gardi et al., 2009). Therefore, diversity analysis is important in order to: (1) understand the relative distribution of organisms; (2) increase the information concerning the diversity of genetic resources in a community; (3) enhance the knowledge of the functional role of diversity and the factors regulating biodiversity; and finally (4) understand the consequences of biodiversity, for example whether a specific level of diversity is crucial for ecosystem functioning and sustainability. Studies on biodiversity and its relation to ecosystem structure and function have mainly focused on macro-organisms, and little attention has been directed towards microorganisms, despite the huge role that they play in the cycling of organic matter, energy and nutrients within all environments (Øvreås, 2000; Gardi et al., 2009). Basically, there are three principal biological domains in life, represented by Prokaryotes (dominated by Bacteria), Archaea, and finally Eukarya, which consists mainly of protists, fungi, plants and animals. As these three domains are well represented within all terrestrial ecosystems (Fig. 1) (Coleman & Whitman, 2005), soil is considered the most important reservoir of biodiversity on earth, and the least explored one (Tiedje et al., 1999; Gardi et al., 2009).
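
The two components of species diversity described in this section, richness and evenness, can be made concrete with a short calculation. The following Python sketch is illustrative only: the Shannon index and Pielou's evenness are one common way to quantify these components and are an assumption here, not measures named in the text above, and the taxon names and counts are hypothetical.

```python
import math


def richness(counts):
    """Number of taxa with at least one individual (species richness)."""
    return sum(1 for n in counts.values() if n > 0)


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with p_i > 0."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    proportions = [n / total for n in counts.values() if n > 0]
    return -sum(p * math.log(p) for p in proportions)


def pielou_evenness(counts):
    """Pielou's evenness J = H' / ln(S); defined here as 0.0 when S < 2."""
    s = richness(counts)
    if s < 2:
        return 0.0
    return shannon_index(counts) / math.log(s)


# Hypothetical communities with the same richness but different evenness.
even_community = {"taxon_a": 25, "taxon_b": 25, "taxon_c": 25, "taxon_d": 25}
uneven_community = {"taxon_a": 97, "taxon_b": 1, "taxon_c": 1, "taxon_d": 1}

for name, community in [("even", even_community), ("uneven", uneven_community)]:
    print(name, richness(community),
          round(shannon_index(community), 3),
          round(pielou_evenness(community), 3))
```

Both toy communities have the same richness, so only the evenness term separates them; this is why richness alone is usually reported together with an abundance-based measure.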


Figure 1. Universal phylogenetic tree based on small-subunit ribosomal RNA sequences. Sixty-four rRNA sequences representative of all known phylogenetic domains were aligned and a tree was produced using a likelihood method. Modified from Pace (1999) by Coleman & Whitman (2005).

Soil microorganisms are the unseen majority in soil and comprise a large portion of life's genetic diversity (Van der Heijden et al., 2008). Bacteria, which are the dominant fraction of prokaryotes, are considered probably the most species-rich array of organisms on earth (Torsvik & Øvreås, 2002; Coleman & Whitman, 2005). It is believed that there are at least 53 bacterial phyla on earth, based upon signature sequences encountered in DNA extracted directly from the environment. Representatives of only 27 phyla have been cultivated and described in pure culture, and most of these phyla are represented by only a few isolates, some with only one described species. Moreover, only five phyla (Actinobacteria, Bacteroidetes, Cyanobacteria, Firmicutes, and Proteobacteria) represent 95% of all cultivated and published species (Coleman & Whitman, 2005). Fungi are considered the second most abundant fraction within the microbial community in soil (Anderson & Cairney, 2004; Coleman & Whitman, 2005), and the most diverse group of Eukarya, with diversity estimates of 0.712 to 1.500 x 10^6 species (Mueller et al., 2007). In fact, fungal hyphae often account for the greatest fraction of soil biomass (Wardle, 2002) and can reach lengths of hundreds to thousands of meters per gram of soil (Peay et al., 2008). Environmental surveys indicate that the Archaea are diverse and abundant not only in extreme environments, but also in soil, where they may carry out a key role in the biogeochemical cycles of the planet (Gribaldo & Armanet, 2006; Lamb et al., 2011). The Archaea are now known to be metabolically diverse organisms coexisting with Bacteria and Eukarya in the majority of earth environments, both terrestrial and aquatic, including extreme ones characterized by high or low pH, low temperature, or high salinity or pressure (Rothschild & Mancinelli, 2001). In addition, archaeal molecular systems generally show a level of complexity, in terms of number of components, halfway between that of bacterial and eukaryal ones. For example, a number of ribosomal proteins are uniquely shared between Archaea and Eukarya, while none is uniquely shared between Archaea and Bacteria or between Bacteria and Eukarya (Gribaldo & Armanet, 2006; Lamb et al., 2011). Despite the dominance of bacteria and fungi in the majority of soil ecosystems, they represent only part of the soil biota, which also includes a wide range of other eukaryotic organisms, including protozoa and nematodes. In addition to well-studied fauna such as ants, the soil ecosystem contains many less-studied but more numerous mesofauna, such as micro-arthropods (Torsvik & Øvreås, 2002; Coleman & Whitman, 2005). Research on soil biodiversity also indicates several thousand invertebrate species. The most important soil invertebrate groups in terms of numerical abundance and/or total biomass in temperate regions are nematodes, micro-arthropods (mites and collembolans), enchytraeids and earthworms, which are found in the uppermost soil layers (for example, the soil surface and the litter layer). However, the majority of soil organisms are still unknown; it has been estimated, for example, that the currently described fauna of Nematoda, Acari and Protozoa represents less than 5% of the total number of species (Gardi et al., 2009).

2.1.3 Methods of microbial diversity estimation:

Estimating soil microbial diversity is a central part of understanding soil microbial ecology (Maron et al., 2007). Even though the presence of microorganisms in soil was observed in the nineteenth century, soil is still considered one of the last great domains for biodiversity research and is thought to contain a remarkable range of organisms not yet explored (Gabriel, 2010). All investigations of soil microorganisms are hampered by the heterogeneity of the soil composition, the vast numbers (c. 10^9 individual cells) and diversity (>10^6 distinct taxa) of microorganisms present in each gram of soil, the paucity of knowledge concerning the majority of the microbiota (Hirsch et al., 2010), the irregular distribution and abundance of microorganisms in soil (Coleman & Whitman, 2005; Nielsen et al., 2011), and finally the inability to culture most soil microorganisms, as in general only a minuscule fraction of the soil microbial population (estimated at between 1 and 10%) can be cultured by standard laboratory practices (Torsvik & Øvreås, 2002; Kirk et al., 2004; Maron et al., 2007; Warnecke & Hess, 2009). In recent decades, several attempts to overcome the limitations of culture-dependent and biochemical approaches have led to considerable progress in soil biology and microbiology. The development of molecular tools and the implementation of signature DNA sequences and other molecular approaches have greatly facilitated the identification of novel taxa. Molecular fingerprints of microbial communities are nowadays a common method for the analysis and comparison of environmental samples. The significance of differences between microbial community fingerprints is analyzed by considering the presence of different phylotypes and their relative abundance (Portillo & Gonzalez, 2008). Molecular-based techniques can be divided into four major groups: guanine plus cytosine (G+C) content, nucleic acid reassociation and hybridization, DNA microarrays, and PCR-based approaches (Greene & Voordouw, 2003; Kirk et al., 2004). PCR-based approaches have been the most widely used and developed techniques in recent years; in these, the community of interest is targeted using specific oligonucleotide primers in a polymerase chain reaction (Kirk et al., 2004; Edel-Hermann et al., 2008; Gupta et al., 2008). The phylotypic diversity of microbial communities is assessed through three kinds of methods relying on electrophoresis. The first kind exploits the size variation of a marker gene, whose amplicons are separated according to their length, as in RISA and ARISA (Fisher & Triplett, 1999). The second kind separates amplicons by their sequences and thus their conformations, as in denaturing and temperature gradient gel electrophoresis (DGGE and TGGE), which were originally developed to detect point mutations in DNA sequences (Øvreås, 2000; Kirk et al., 2004). Another technique of this kind, which relies on native electrophoretic separation based on differences in DNA sequence, is single strand conformation polymorphism (SSCP) (Orita et al., 1989; Zinger et al., 2008). Finally, the third kind combines both previous principles: PCR products are digested according to their sequence with a restriction enzyme and then separated by length, as in restriction fragment length polymorphism (RFLP) and terminal RFLP (T-RFLP) (Liu et al., 1997; Tiedje et al., 1999). However, standard PCR-based approaches using "universal primers" for rRNA genes are not quantitative, although they do provide very useful qualitative information on dominant microbial populations (Kirk et al., 2004; Coleman & Whitman, 2005; Edel-Hermann et al., 2008). In the last two decades, cultivation-independent surveys, especially those based on the 16S rRNA and 18S rRNA genes, have greatly advanced our understanding of microbial diversity (Torsvik & Øvreås, 2002; Shrestha et al., 2009). Though 16S rRNA is the marker of choice for the majority of microbial classification, its heterogeneity, due to multiple copies of the gene, usually requires a method to select some copies and discard others based on some criteria (Kotamarti et al., 2010). A condition for using PCR is to find a good target gene (molecular marker) and to develop primers that selectively amplify this gene. First, the target gene should contain both conserved and variable regions; second, the primers should target the gene from all members of the clade investigated, and should not target any gene from organisms outside this clade. This is not trivial, as only a minor fraction of soil microorganisms have been isolated and had their 16S rRNA gene investigated, and we do not know how well the available databases represent the target clade (Sørensen et al., 2009). While community profiling techniques are now commonly used in conjunction with DNA sequencing and phylogenetic analysis for the taxonomic identification of species present within a sample, there are still many technical limitations, which can be summarized as follows:

(1) The biological starting material (extracted DNA or RNA) depends largely on extraction procedures. Cell lysis efficiency varies among microbial groups and can reduce the extraction efficiency for certain groups, which as a consequence will not be represented in subsequent amplification steps. Moreover, co-extracted substances can interfere with subsequent PCR analysis and inhibit enzymatic activities. Subsequent purification steps can lead to loss of DNA or RNA, again potentially biasing molecular diversity analysis (Prosser, 2002).
(2) Differential amplification of target genes can also bias PCR-based diversity studies, even though these genes/fragments are chosen as targets because they are present in all organisms, have well-defined regions for taxonomic classification, are not subject to horizontal transfer, and have sequence databases available to researchers (von Wintzingerode et al., 1997).
(3) Another factor affecting estimates of soil microbial diversity is that microbial communities may have several nested levels of organization, which could depend on different soil properties or groups of properties (Kirk et al., 2004).
(4) Chimeric DNA sequences may be generated from environmental DNA samples, and their presence has been shown to considerably influence the taxonomic identification of soil microorganisms (von Wintzingerode et al., 1997).

Despite all these limitations, these community profiling techniques have provided remarkably different pictures of the distribution of specific groups of microorganisms from those previously obtained from culture-based work (Kirk et al., 2004; Saleh-Lakha et al., 2005; Edel-Hermann et al., 2008; Gupta et al., 2008).
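
The T-RFLP principle outlined in this section (digest labelled PCR products with a restriction enzyme, then separate the terminal fragments by length) can be illustrated with an in-silico digest. This is a minimal sketch under stated assumptions: the amplicon sequences are invented toy data, the label is assumed to sit at the 5' end of each amplicon, and the enzyme is assumed to be HaeIII (recognition site GGCC, cutting between GG and CC). None of these choices comes from the text above.

```python
from collections import Counter


def terminal_fragment_length(amplicon, site, cut_offset):
    """Length of the 5'-terminal fragment after cutting at the first site.

    cut_offset is the number of bases of the recognition site that stay on
    the 5' fragment. An uncut amplicon is returned at full length.
    """
    position = amplicon.upper().find(site)
    if position == -1:
        return len(amplicon)
    return position + cut_offset


def trflp_profile(amplicons, site, cut_offset):
    """Count how many amplicons yield each terminal fragment length."""
    return Counter(
        terminal_fragment_length(seq, site, cut_offset)
        for seq in amplicons.values()
    )


# Hypothetical toy amplicons (real rRNA gene amplicons are far longer).
amplicons = {
    "taxon_a": "ATGCATGGCCTTAGC",
    "taxon_b": "ATGGCCATGCATTAGC",
    "taxon_c": "ATGCATGGCCAAAAAA",
    "taxon_d": "ATGCATTTAGC",
}

# Assumed enzyme: HaeIII, site GGCC, cut after the second base (GG^CC).
profile = trflp_profile(amplicons, site="GGCC", cut_offset=2)
for length, count in sorted(profile.items()):
    print(length, count)
```

In this toy profile, taxon_a and taxon_c give the same terminal fragment length, so one peak in a fingerprint does not necessarily correspond to one taxon.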

6 2.1.4 Eukaryotes as major components of the soil environment: 2.1.4.1 Eukaryotes evolution history: Many attempts have been made to establish a natural, phylogenetic system of eukaryotes, but the relationships and the order of evolutionary appearance of many varied groups remain uncertain, primarily because of a lack of clear synapomorphies. As a result, extensive studies aiming to establish the structure and root of the Eukaryote tree by phylogenetic analyses of molecular sequences have not resulted in a generally accepted tree (Arisue et al., 2005). However, determining the relationships among and divergence times for the major eukaryotic lineages remains one of the most important and controversial outstanding problems in evolutionary biology. The sequencing and phylogenetic analyses of ribosomal RNA (rRNA) genes led to the first nearly comprehensive phylogenies of eukaryotes in the late 1980s, and supported a view where cellular complexity was acquired during the divergence of extant unicellular eukaryote lineages. More recently, however, refinements in analytical methods coupled with the availability of many additional genes for phylogenetic analysis showed that much of the deep structure of early rRNA trees was artefactual (Roger & Hug, 2006). In the last decade, significant advances in taxon sampling and methodology have increased our understanding of eukaryotic evolutionary relationships and dramatically altered our view of high-level diversity. Whereas early phylogenetic analyses relied primarily on the information contained within a single gene (i.e. that encoding 18S rRNA), and were thus limited in terms of their resolution, the genomics revolution has brought an ever-increasing amount of data to bear on the question of protist phylogeny for example, which represent the prominent part of eukaryotic diversity, and more generally, the ancient divergences among eukaryotic lineages (Keeling et al., 2005; Lane & Archibald, 2008). 
These multigene phylogenetic analyses, together with the discovery of important molecular and ultrastructural phylogenetic characters, have resolved eukaryotic diversity into six major hypothetical groups (Fig. 2), erected on the basis of a diverse mix of morphological and molecular sequence data. Yet relationships among these groups remain poorly understood because of saturation of sequence changes on the billion-year time-scale, possible rapid radiations of major lineages, phylogenetic artifacts and endosymbiotic or lateral gene transfer among eukaryotes (Roger & Hug, 2006; Lane & Archibald, 2008). The strength of the evidence supporting these super-assemblages has been the subject of much discussion, and the relationships between the supergroups are largely unknown. Indeed, whether molecular data can fully resolve relationships between taxa that diverged one billion years ago is unclear (Lane & Archibald, 2008). Nevertheless, the history of eukaryotic evolution is one of ever-increasing diversity and complexity at multiple levels. Increases in genotypic and phenotypic complexity are usually associated with the expansion of gene families. For instance, it has been shown that the diversification of gene families involved in cell differentiation and cell-cell communication contributed to the origin of multicellularity (Zhou et al., 2010).

Figure 2. A view of the tree of eukaryotes. A hypothetical phylogeny indicating the six major supergroups of eukaryotes. Dotted branches indicate lineages that do not clearly fall within any of the major groups. The placement of the root of the tree of eukaryotes is indicated by dihydrofolate reductase (DHFR)-thymidylate synthase (TS) fusion data and myosin gene family data. Alternative positions for the root are indicated by asterisks. The grey shaded region depicts the parts of this hypothetical tree of eukaryotes that are not strongly recovered (with greater than 85% bootstrap support) in published single or multiple gene phylogenies (Roger & Hug, 2006).

2.1.4.2 Fungal phylogeny:

Whereas early culture-based studies led to the conclusion that soils were dominated by a few hundred globally distributed fungi, cloning studies easily document hundreds of species within a single plot and indicate high levels of local variability and endemism (Gilbert & Souza, 2002). In recent years, direct nucleic acid extraction approaches coupled with PCR amplification have become more widely applied in investigations of soil fungal ecology and are beginning to provide significant advances in our understanding of genetic diversity in soil fungal communities (Anderson & Cairney, 2004; Sørenson et al., 2009). O'Brien et al. (2005) evaluated soil fungal diversity at two sites in a temperate forest using direct isolation of small-subunit and internal transcribed spacer (ITS) rRNA genes by PCR and high-throughput sequencing of cloned fragments. They found a remarkably large number of species in just a few grams of soil, but this is only a fraction of the estimated number of fungal species (Gilbert & Souza, 2002). Furthermore, the species documented in these studies still appear to be only a small fraction of the total species pool (Anderson & Cairney, 2004; Peay et al., 2008). Despite the ubiquity and clear importance of fungi in terrestrial ecosystems, the ecological study of fungal communities has long been held back by an inability to identify species in their vegetative states (Peay et al., 2008). In addition, quantification based on the multi-copy genomic regions targeted in fungal ecology assumes that the number of copies is similar in different fungal species. The number of copies of the rRNA operon is, however, known to vary between fungal species, and the exact number of copies contained within the genome remains unknown for most species at present. This variation between species complicates the quantification of different fungal species in a mixed DNA pool if the rRNA operon is the target region. A further complicating factor is the potential presence of both spores and mycelium in a single soil sample, which are likely to be co-extracted during the DNA extraction process (Anderson & Cairney, 2004). The advent of next-generation sequencing technologies opens the door to systematic and comprehensive studies of fungal diversity.
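The effect of rRNA operon copy-number variation on quantification can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the copy number of each taxon is known, which, as noted above, is rarely the case for fungi. The function name, the taxon labels and all numbers are invented for illustration and are not data from the studies cited.

```python
def copy_number_corrected_abundance(read_counts, copies_per_genome):
    """Convert rRNA read counts into relative genome abundances.

    read_counts: dict mapping taxon name -> number of rRNA reads (hypothetical data)
    copies_per_genome: dict mapping taxon name -> assumed rRNA operon copy number
    """
    genome_equivalents = {}
    for taxon, reads in read_counts.items():
        copies = copies_per_genome[taxon]
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        # Each genome contributes `copies` reads on average, so divide them out.
        genome_equivalents[taxon] = reads / copies
    total = sum(genome_equivalents.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    # Normalize so that the corrected abundances sum to 1.
    return {taxon: value / total for taxon, value in genome_equivalents.items()}


# Invented example: taxon_A yields more reads but carries more operon copies.
reads = {"taxon_A": 600, "taxon_B": 400}
copies = {"taxon_A": 150, "taxon_B": 50}
corrected = copy_number_corrected_abundance(reads, copies)
for taxon in reads:
    raw = reads[taxon] / sum(reads.values())
    print(f"{taxon}: raw read share {raw:.2f}, corrected share {corrected[taxon]:.2f}")
```

In this invented case the taxon with the larger share of reads is the less abundant one once copy number is taken into account, which is why read proportions from a mixed DNA pool cannot be interpreted directly as species proportions.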
Some of the reports that used 454 pyrosequencing to evaluate fungal diversity can be summarized as follows. Buée et al. (2009) assessed the fungal diversity in six different forest soils using tag-encoded 454 pyrosequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS-1). Most operational taxonomic units (81%) belonged to the Dikarya subkingdom (Ascomycota and Basidiomycota). Richness, abundance and taxonomic analyses identified the Agaricomycetes as the dominant fungal class. Öpik et al. (2009) assessed the diversity of arbuscular mycorrhizal fungi (AMF) in a natural ecosystem (boreonemoral forest) by applying 454 sequencing, and found a significant difference between AMF communities in the roots of forest specialist plant species and those in the roots of habitat generalist plant species. One of the major limitations in investigating fungal diversity in soil, in which the extracted DNA pool contains DNA from a diverse range of eukaryotic and prokaryotic organisms, has been the suitability of available PCR primers. The challenge has been to design PCR primers that amplify as broad a taxonomic range of fungi as possible, but at the same time prevent co-amplification of closely related eukaryotic DNA (Anderson & Cairney, 2004). The lack of specificity of primers for fungal templates limits their usefulness in mixed DNA samples, particularly where the ratio of fungal DNA to non-fungal DNA is low (Gardes & Bruns, 1993). In contrast to bacteria (16S rRNA), taxonomic identification of fungi based on sequences of the eukaryotic ribosomal small subunit, the 18S rRNA, is more problematic, with identification commonly limited to genus or family level (Anderson & Cairney, 2004). For fungi, the highly conserved SSU gene provides little species-level resolution, resulting in the lumping of closely related phylotypes and even genera (Tedersoo et al., 2010). This is primarily due to the relative lack of variation within 18S rRNA genes between closely related fungal species, a result of the relatively short period of evolution of the fungal kingdom compared with bacteria. The problem is compounded by the lack of a complete database of fungal reference sequences. In contrast, non-coding rDNA spacer regions, such as the ITS, benefit from a fast rate of evolution, resulting in greater sequence variation between closely related species than in the more conserved coding regions of the rRNA gene cluster (Gardes & Bruns, 1993; Anderson & Cairney, 2004). Because of this improved species-level resolution, the internal transcribed spacer (ITS) region at the 3'-end of the SSU (in eukaryotes) has been accepted as an exploratory barcoding target for fungal identification (Tedersoo et al., 2010).

Soil is a very complex system and one of the most diverse habitats on earth. It comprises a variety of microhabitats with different physicochemical gradients and irregular environmental conditions, and it contains the most diverse collection of living organisms, with possibly thousands of different species. Soil biota can be regarded as the “biological engine of the earth” and is implicated in most of the key functions that soil provides in terms of ecosystem services: driving many fundamental nutrient cycling processes, influencing the weathering and solubilization of minerals, contributing to soil structure and aggregation and, finally, degrading pollutants.


The soil ecosystem: who is there?

Soil is the product of the weathering of rocks and minerals, and its properties can differ according to the origin of the parent rock, the climate and the vegetation. In general, soil structure and functioning are controlled by several abiotic factors such as depth, structure, water retention, aeration and gas exchange, the availability of inorganic and organic matter, and the sequestration of large amounts of carbon. All these factors are in turn controlled and determined by biotic factors such as plants and their association with microorganisms. Associations and interactions between plants and microorganisms are important factors in the formation or modification of soils through their role in the supply and transformation of organic matter. Plants, mainly through their roots and associated microorganisms, are largely involved in the breakdown of soil C. In addition, soil organic matter is the direct product of the combined biological activities of plants, microorganisms and animals, and of numerous abiotic factors. The production of soil organic matter, including extracellular polysaccharides and other cellular debris, can increase the capacity of the soil to maintain its structure once formed, and also controls soil fertility and functions. Soil is the most spatially and temporally heterogeneous environment on earth, combining a complex physical structure with geochemistry and drastic seasonal fluctuations in temperature, moisture and nutrient availability. All these elements affect soil microorganisms at the microscale, which results in an enormous diversity and biomass compared with other environments.

All these organisms not only cohabit but also take part in nutrient recycling, competition, parasitism and predation within a multitrophic network. These complex interactions regulate soil biomass in such a way that the composition of one group is influenced by the others. Soil can therefore be considered a very complex system and is one of the most diverse habitats on earth. It contains the most diverse collection of living organisms, and comprises a wide variety of microhabitats and thousands of different species subject to different physicochemical gradients and highly variable meteorological and environmental conditions. Because the soil biocenosis is considered one of the most diverse on earth and controls important processes, it is essential to better understand the diversity and functioning of these organisms in their natural habitats. Diversity analyses are important for many reasons, such as: (1) understanding the relative distribution of organisms; (2) increasing knowledge of the diversity of genetic resources in a community; (3) improving knowledge of the functional role of diversity and of the factors regulating biological diversity; and finally (4) understanding the role of biological diversity in order to answer questions such as: is a specific level of diversity essential for the functioning and sustainability of ecosystems? However, studies on biological diversity and its relationship with ecosystem structure and functionality have mostly concentrated on macro-organisms, and little attention has been paid to microorganisms despite their predominant role in all environments. Such studies are now made easier by the wide range of methods available for studying soil microbial diversity. Each method has its limits and generally provides only a partial picture of soil microbial diversity. Some techniques are powerful tools for analysing soil eukaryote communities, while others are better suited to prokaryote communities. As it is impossible to assess the efficiency of each method with our current knowledge, it is preferable to study microbial communities at as many different levels as possible. Advances in diversity analyses, and more specifically in molecular approaches, will help improve our understanding of these communities and their functioning.

2.2 Soil ecosystem, who is there, and what are they doing?

2.2.1 What is an Ecosystem?

In general, an ecosystem is a complex system in which various species interact with each other to form complex networks (Maron et al., 2007; Peay et al., 2008). Through such network interactions, an ecosystem is capable of accomplishing system-level functions which could not be achieved by individual populations. The stability of ecosystems depends on three major properties: resistance, reflected in the degree of change caused by a disturbance; resilience, the speed with which the ecosystem returns to its pre-disturbance level; and functional redundancy, the fact that the same functions are achieved by different populations (Maron et al., 2007). Within ecosystems, diversity can be affected by several factors such as trophic interactions, spatial and temporal habitat heterogeneity, disturbance and eutrophication (Torsvik & Øvreås, 2002). These factors can have negative effects, such as stress, or positive effects, such as resource diversity or biological interactions (Fig. 3).

Figure 3 . External factors determine diversity, which in turn affects ecosystem properties like stability, resilience, redundancy and productivity (Lynch et al., 2004).
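Resistance and resilience, as defined in the text, can be turned into simple numerical indices. The Python sketch below is only one possible formalization, chosen here for illustration: resistance is taken as the fraction of a measured function retained immediately after a disturbance, and resilience as the fraction of the lost function regained per unit time. The function names and the numbers are hypothetical, and these indices are not taken from Maron et al. (2007) or Lynch et al. (2004).

```python
def resistance(baseline, disturbed):
    """Fraction of the baseline function retained right after a disturbance.

    1.0 means no change; lower values mean a larger change.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 1.0 - abs(disturbed - baseline) / baseline


def resilience(baseline, disturbed, recovered, elapsed_time):
    """Fraction of the lost function regained per unit of elapsed time."""
    if elapsed_time <= 0:
        raise ValueError("elapsed_time must be positive")
    lost = abs(baseline - disturbed)
    if lost == 0:
        # Nothing was lost, so there is nothing to recover.
        return 0.0
    still_missing = abs(baseline - recovered)
    return (lost - still_missing) / (lost * elapsed_time)


# Hypothetical soil respiration values (arbitrary units) before a disturbance,
# immediately after it, and 10 days later.
before, after, later = 100.0, 60.0, 90.0
print("resistance:", resistance(before, after))
print("resilience per day:", resilience(before, after, later, 10.0))
```

With these invented values the function keeps 60% of its baseline level after the disturbance and regains 7.5% of the loss per day. Functional redundancy is not captured by such indices, since it requires knowing which populations carry out the function.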

The major underlying principle of diversity studies is probably the assumption that interactions between populations in a habitat lead to an organized and stable community (Øvreås, 2000). There are three general categories of interactions by which organisms in one compartment can affect biodiversity in another one: (1) obligate, selective interactions (one-to-one linkages), through mutualism for example; (2) one-to-many species linkages, via keystones and dominants; and (3) causal richness, or many-to-many linkages. The nature and extent of these interactions vary a great deal depending on the systems studied and the spatial scales at which the mechanisms are being considered (Coleman & Whitman, 2005). In a microbial community many different organisms will perform the same processes and probably be found in the same niche. Stress in one part can be rapidly amplified and spread to the whole system through the positive feedback links that hold the system together. Recovery of ecosystems after disturbance includes re-establishing the energy source for organisms as well as ecosystem processes, which depends mainly on the functional aspect of the microbial community and not on the species composition (Øvreås, 2000). Positive effects of diversity may include increased stability, resilience, resistance to stress, and even productivity (Lynch et al., 2004). The factors controlling ecosystem functions and dynamics are well identified for higher organisms such as plants, because of the large number of morphologically identifiable functional traits (Peay et al., 2008). These factors have not yet been widely addressed in microbial ecology, because of the huge taxonomic and functional diversity within microbial communities, together with the difficulty of studying them in their natural habitats, which holds back the characterization of population dynamics and the definition of functional groups (Maron et al., 2007; Peay et al., 2008).

2.2.2 The soil ecosystem, concept, definition and key elements:

Soil is one of the most important ecosystems on earth, as it is a dynamic system in which physical, chemical and biological components interact (Kirk et al., 2004; Van der Heijden et al., 2008; Bastida et al., 2009; Guazzaroni et al., 2009). Soil functioning is carried out by communities of organisms that cohabit in the soil (Coleman & Whitman, 2005). Within this ecosystem, microorganisms are major components of all soils, often with low biomass compared with the mineral or humus fraction but with absolutely crucial roles for the functioning of soils (Breure et al., 2004), and they influence a large number of important ecosystem processes (Kirk et al., 2004; Guazzaroni et al., 2009). From this point of view it is crucial to generate a thorough understanding of these key microorganisms and the processes they facilitate (Guazzaroni et al., 2009). Although there can be thousands to millions of species living in a square meter of soil, only a limited subset of organisms contributes to some specific processes (Coleman & Whitman, 2005). This functional diversity is an aspect of the overall microbial diversity in soil, and encompasses a range of activities (Torsvik & Øvreås, 2002). The properties and characteristics measured in soil biodiversity studies vary considerably, but most can be assigned to three broad groups: decomposition-related (such as mass loss from leaf litter or wood over time), respiration-related (fluxes of CO2 either alone or in response to nutrient conditions), or biota-related (such as the biomass of various soil organisms, including microbes, nematodes and arthropods) (Nielsen et al., 2011). Most of the soil functions influenced by soil organisms are performed through multitrophic interactions that ensure the resistance, resilience and recovery of these functions. Soil organisms are implicated in many biogeochemical cycles, such as those of carbon, nitrogen and phosphorus, through the decomposition and transformation of organic matter entering soil, including plant litter and animal manure (Kirk et al., 2004; Van der Heijden et al., 2008; Bastida et al., 2009; Gardi et al., 2009; Guazzaroni et al., 2009). In addition, soil organisms can fulfil a range of other functions, such as: degradation of anthropogenic compounds such as pesticides; stabilization of soil aggregates, specifically by building clay-humus complexes; improvement of soil porosity through burrowing activities; modification of soil pH through nitrification and denitrification, which can change the mobility of heavy metals; and, finally, modification of heavy metal mobility under different redox conditions (Van der Heijden et al., 2008; Gardi et al., 2009). Soil microorganisms perform a wide range of functions and account for 60-80% of biological activity in soil.

2.2.2.1 Major factors influencing soil ecosystem:

The role of soil organisms in nutrient cycling and other environmental processes is regulated by a diverse range of biotic and abiotic factors. The functionality and dynamic performance of soil organisms emerge from the interaction between physical and biological processes as mediated by the structure of soil (Wardle, 2006). Although it is widely accepted that abiotic factors affect the soil ecosystem (for example, particle size has a higher impact on microbial diversity and community structure than factors like bulk pH and the type and amount of organic compound input), biotic factors are thought to be the most important drivers of soil biodiversity (Torsvik & Øvreås, 2002). These biotic factors operate in a hierarchical manner across a range of spatial and temporal scales (Fig. 4). At the most local of these scales, the diversity of soil organisms can potentially be regulated by interactions that occur among taxa within the same trophic grouping. Biotic factors are directly relevant to the so-called diversity-functioning issue, which focuses on determining whether the diversity of organisms influences key ecosystem properties such as nutrient flow rates, decomposition, productivity, and resistance and resilience to disturbances (Wardle, 2006).


Figure 4. The hierarchical nature of biotic factors that may influence alpha-diversity within a given group of soil organisms. ‘Within group’ can be applied to key trophic groups (e.g. microbe-feeding nematodes) or taxonomic groups (e.g. fungi and oribatid mites). ‘Interference mechanisms’ apply primarily to interactions among fungi, among bacteria, and between bacteria and fungi; ‘predation’ by other soil organisms includes soil faunal consumption of bacteria, fungi and other soil fauna. The right-hand axis is presented as a gradient, because while some organisms and processes operate exclusively in the aboveground or belowground compartments, others operate in both compartments (e.g. plants that have both aboveground and belowground structures; animals that spend time both above- and belowground). The relative positions of these drivers in terms of temporal and spatial scale are intended to be illustrative, not exact, and some of these drivers operate across a range of scales (Wardle, 2006).

However, while linking populations and processes has been a major focus of macro-ecology, there is debate about whether information on the structure of microbial communities is informative at the ecosystem level. Some authors have argued that ecosystems act to average out microbial species effects, implying that microbial populations do not easily scale up to explain processes (Andren et al., 1999). Others have argued that all possible niches are always filled in an environment, so that microbial diversity has no direct role in ecosystem function (Finlay et al., 1997). Understanding ecosystem functions therefore requires an integration of both biotic and abiotic factors: although much work has concentrated on developing techniques able to measure the diversity of soil biology, little effort has been spent in connecting these measures with habitats and then functions (Young & Crawford, 2004; Foissner et al., 2005).

2.2.2.2 Above- and belowground interaction effects on ecosystem functions:

Although it has long been known that soil organisms are integral to soil fertility, it is only during the past few decades that ecologists have begun to explore belowground communities and their functional significance for plant communities and ecosystem processes. This interest is due, in part, to technological advances that have enabled scientists to extract and characterize soil microbial communities and to assess their function. However, it is also due to an increasing recognition by ecologists, who have traditionally focused on aboveground organisms, of the importance of belowground organisms as structuring forces in terrestrial ecosystems (Bardgett et al., 2005). For example, bacteria have been estimated to number from 4 to 6 × 10^30 cells, with a sizable proportion, about 92%, being in the subsurface. At this level of abundance, the amount of cellular nitrogen and phosphorus in soil prokaryotes is comparable to that found in terrestrial plants, which illustrates the importance of these organisms to biochemical processes in soil (Coleman & Whitman, 2005). However, linking plant diversity to microbial diversity and soil ecosystem function is far more challenging, because plant community structure is not independent of soil conditions. Additionally, associations observed in field studies cannot always be attributed to the fundamental effects of plants on the soil community, as there are, conversely, many mechanisms through which the soil microbial community can influence plant community structure (Bardgett, 2005; Van der Heijden et al., 2008).

I. Feedbacks between below- and aboveground communities:

Our understanding of relationships between aboveground and belowground communities is complicated by the fact that they operate over a relatively wide range of spatial and temporal scales, which also differ depending on the body size and life history of the organism concerned and the size of its habitat unit or domain (Bardgett et al., 2005). The aboveground and belowground components strongly influence each other, suggesting that there may be important feedbacks between them. The strength of these feedbacks is expected to depend on which groups of soil biota are considered (Wardle, 2006). Understanding the distribution of microorganisms relative to plant distribution is important for understanding how plants influence soil processes and neighboring plants (Schmidt et al., 2000). Numbers of species above ground and below ground may be correlated when taxa in both habitats respond similarly to the same or correlated environmental driving variables, in particular across large gradients of disturbance, climate, soil conditions, or geographic area. It is also known that plant community composition affects soil microbial community structure and function, and it has been shown to correlate with the spatial distribution of soil properties and microbial biomass in many natural systems (Schmidt & Lipson, 2004; Wardle, 2006). There is increasing recognition that the soil groups most directly associated with plant roots (e.g. mycorrhizal fungi, root pathogens and herbivores) show a higher degree of specificity. This means that a greater range of plant species should be able to support a greater range of root-associated biota (Schmidt et al., 2000; Wardle, 2006). Moreover, the range and more particularly the functional traits of plants (e.g., whether they harbor nitrogen-fixing symbionts, or are warm-season grasses or rosette forbs) are generally strong drivers of ecosystem processes (Schmidt & Lipson, 2004; Bardgett et al., 2005; Coleman & Whitman, 2005). High diversity in plant species can also result in high diversity of litter quality or of the types of litter entering the belowground system (for example, plants that produce specific phenolic compounds may select for populations of soil microbes that metabolize and detoxify such compounds). Changes in plant diversity would therefore modify resource availability for heterotrophic microbial communities in soil, and thereby modify their composition (Benizri & Amiaud, 2005; Bardgett et al., 2005; Coleman & Whitman, 2005; Van der Heijden et al., 2008). Soil microorganisms also influence aboveground ecosystems by contributing to plant nutrition, plant health, soil structure, soil fertility and the regulation of plant communities; for instance, they release nutrients in plant-available forms and degrade toxic residues (Fig. 5). They also form symbiotic associations with plant roots that often act as antagonists to pathogens (Breure et al., 2004; Kirk et al., 2004).
Therefore, a high diversity of resources and species in soil could feed back to a high diversity aboveground, where certain species or functional groups are closely linked to groups below ground (Schmidt et al., 2000; Schmidt & Lipson, 2004; Bardgett et al., 2005; Coleman & Whitman, 2005). Conservative estimates suggest that 20,000 plant species are completely dependent on microbial symbionts for growth and survival, pointing to the importance of soil microbes as regulators of plant species richness on earth. Soil microbes are also important regulators of plant productivity, especially in nutrient-poor ecosystems where plant symbionts are responsible for the acquisition of limiting nutrients. For example, mycorrhizal fungi and nitrogen-fixing bacteria are responsible for 5-20% (grassland and savannah) to 80% (temperate and boreal forests) of all nitrogen, and up to 75% of the phosphorus, that is acquired by plants annually (Van der Heijden et al., 2008). Moreover, protozoa are considered one of the most important components of soil biota because of their essential roles in biogeochemical cycles. There is convincing evidence that soil protozoa respire about 10% of the total carbon input, mineralize 20–40% of the net nitrogen, and significantly enhance the growth of plants and earthworms (Foissner et al., 2005).

Figure 5. Schematic representation showing the impact of soil microbes on nutrient acquisition and plant productivity in natural ecosystems. Plant litter is decomposed by a wide range of bacteria and fungi (1), making nutrients available for uptake by mycorrhizal fungi (2) and plant roots, or immobilizing nutrients into microbial biomass and recalcitrant organic matter (4). Ecto-mycorrhizal fungi and ericoid mycorrhizal fungi also have access to organic nutrients and deliver these nutrients to their host plants (3). Some plants can also acquire organic nutrients directly. Nutrients can also be lost from soil through denitrification of nitrate into dinitrogen gas or nitrogen oxides by denitrifying bacteria (5), or when nitrifying bacteria and Archaea facilitate nitrogen leaching by transforming ammonium into nitrate (6), which is much more mobile in soil. The contribution of microbes to leaching losses of other nutrients (e.g. phosphorus) is still poorly understood. Nitrogen-fixing bacteria (both free-living and symbiotic) transform nitrogen gas into ammonium (7), thereby making it available to plants and enhancing plant productivity. Finally, microbial pathogens attack plants and can reduce plant productivity (8) (Van der Heijden et al., 2008).

Microorganisms can also affect the aboveground community negatively by reducing plant productivity, for example when microbial pathogens attack forest trees, and this can have dramatic consequences for ecosystem processes (Fig. 5). Microbes can also indirectly influence plant productivity via the action of free-living microbes that alter rates of nutrient supply and resource partitioning, and thus reduce plant productivity through competition for nutrients with plant roots and/or by promoting nutrient loss via leaching of mobile nutrient forms (Van der Heijden et al., 2008). However, microbial effects on plant productivity are often difficult to estimate, because populations and communities of soil microbes are affected by a wide range of other soil biota, especially their consumers, such as protozoa and nematodes (Wardle, 2006).

II. Rhizodeposition as a major factor in biogeochemical cycles in soil:

The rhizosphere is an environment where root exudates stimulate microbes that have the potential to rapidly initiate growth following a dissolved organic matter (DOM) pulse. These growth spurts lead to increased predation by protists and larger soil eukaryotes, resulting in increased microbial turnover and leakage of dissolved organic N (DON) and dissolved inorganic N (DIN) back into the rhizosphere environment (Schmidt et al., 2007). In the rhizosphere, organic carbon is considered to be the key factor for microbial density and activity. During plant growth, roots actively or passively release a range of organic compounds that comprises exudates (small molecules, such as organic acids, amino acids and sugars), secretions (such as enzymes), lysates from dead cells and mucilage. This process, known as rhizodeposition, is of ecological importance as it results in chemical, physical and biological characteristics in the rhizosphere that differ from those of the bulk soil. Rhizodeposits therefore play an important role in the regulation of symbiotic and protective associations between plants and soil microorganisms (Singh et al., 2004; Lambers et al., 2009). Siciliano et al. (2003) have proposed that the plant's influence on the soil microbial community would extend beyond the rhizosphere and would alter the functioning of the bulk soil microbial community to aid in degradation. Exudates are of prime importance for microorganisms since they can be assimilated without the synthesis of exo-enzymes. They offer a readily available source of carbon and/or energy that is likely to favor fast-growing microbes in the rhizosphere, provided these microbes have the necessary metabolic capabilities, and that can support high levels of microbial activity (Singh et al., 2004; Benizri & Amiaud, 2005).

Most soil microorganisms are heterotrophic and, as such, depend upon plant C and, in turn, provide plants with nitrogen (N), phosphorus (P) and other minerals through decomposition of soil organic matter. Figure 6 shows the flow of carbon in the rhizosphere (Singh et al., 2004). Soil organisms play a key role in C dynamics, and a loss of species through global changes could influence global C dynamics. Indeed, the flux of belowground C to the atmosphere through decomposition and subsequent respiration is very large, and has been estimated to be an order of magnitude larger than anthropogenic CO2 emissions. Although abiotic factors, such as temperature and moisture, are considered to be primary determinants of decomposition rates and C cycling, soil organisms are directly responsible for the vast majority of organic matter decomposition (Björk et al., 2008; Djukic et al., 2010; Nielsen et al., 2011).


Figure 6. A simplified model diagram showing carbon flow and multi-dimensional interactions in the rhizosphere. Root exudates provide carbon to soil microorganisms, which in turn provide nitrogen and phosphorus to the plant through mineralization and immobilization of organic matter. Soil organic matter (SOM) is formed by microbial action on plant litter and dead animals. Soil animals feed on plant litter, soil microbes and SOM (Singh et al., 2004).

Differences in species traits among some soil biota lead to species-specific effects on aspects of C cycling, and not all species found in soils are redundant from an ecosystem functioning perspective (Coleman & Whitman, 2005). Increased heterogeneity of carbon (C) substrates from aboveground is thought to positively influence belowground diversity through the following stepwise process (Fig. 7): (1) diversity of primary producers leads to diversity of C inputs belowground; (2) carbon resource heterogeneity leads to diversity of herbivores and detritivores; and (3) diversity of detritivores or belowground herbivores leads to diversity of organisms at higher trophic levels in belowground food webs (Hooper et al., 2000). The balance between primary production and decomposition ultimately controls ecosystem C storage. Soils are of considerable importance for C cycling in terrestrial ecosystems, as a large proportion of the global terrestrial C pool (approximately 80%) is found there (Coleman & Whitman, 2005; Nielsen et al., 2011). Decomposition and nutrient mineralization processes are carried out by a diverse range of taxa, suggesting a significant level of functional redundancy in the decomposer biota (Wardle, 2006). Microbes carry out most of the actual breakdown of plant material because they are among the few organisms in soils that produce enzymes capable of degrading recalcitrant plant-derived compounds such as lignin.
Additionally, numerous experiments have demonstrated that soil animals, including mites, nematodes and earthworms, and their interactions with soil microorganisms typically stimulate decomposition, measured as rates of litter mass loss and C mineralization (Nielsen et al., 2011).


Figure 7. Diversity of terrestrial ecosystem components as a function of resource heterogeneity and trophic interactions (Hooper et al., 2000).

Symbiotic microorganisms can play a key role in accessing complex organic N, bypassing the saprotrophs involved in mineralization. However, in some mycorrhizal systems these saprotrophs play a pivotal role in making N available to plants. Arbuscular mycorrhizal (AM) fungi also improve N nutrition by extending the absorptive zone through hyphal extensions, described as the ‘mycorrhizosphere’. This increase in N uptake is related to the stimulation of bacteria grazing in the rhizosphere (Lambers et al., 2009). There is sufficient evidence that soil fauna have significant effects on all of the pools and fluxes in these cycles, and soil fauna mineralize more N than microbes in some habitats. It is therefore essential that their role in the C and N cycles be understood. Soil fauna can directly affect N cycling through two mechanisms. First, fauna that are efficient assimilators of C and whose prey have C:N ratios similar to their own are likely to contribute directly to the mineral N pool. Second, fauna that are inefficient assimilators of C and whose prey have higher C:N ratios than their own are likely to contribute most to the dissolved organic matter (DOM) pool. Different groups of fauna are likely to contribute to these two pathways. Protists and bacteria-feeding nematodes are more likely to be important for N mineralization through grazing on microbial biomass, whereas enchytraeids and fungal-feeding microarthropods are most likely to be important for DOM production (Osler & Sommerkorn, 2007). At the functional level, microorganisms that colonize the rhizosphere help plants to acquire P and K, and some enhance N uptake from the soil through their effect on root morphology and physiology. Recent observations of a higher diversity of ammonia-oxidizing bacteria (Briones et al., 2003) and nitrogen-fixing (nifH) genes (Cocking, 2003) in the rhizosphere than in bulk soil suggest that plants might select functional groups rather than taxonomic groups of microorganisms (Singh et al., 2004). Other investigations indicate that the type and amount of available organic substrates strongly influence the abundance of microbial groups and their functional diversity in soil ecosystems (Torsvik & Øvreås, 2002; Breure et al., 2004).

2.2.3 Soil ecosystem: Fungi as a major actor:

Eukaryotes are one of the most important components of the soil microbiota and are considered major players in soil communities, as they strongly influence prokaryotic community structure (Hirsch et al., 2010). Fungi represent a large share of the soil biota and the dominant fraction of soil eukaryotes. They are considered one of the most important functional components of terrestrial ecosystems as decomposers, mutualists and pathogens, and hold a central place in the subterranean food web because they decompose organic matter and transfer nutrients to plants through mycorrhizae (Torsvik & Øvreås, 2002; Anderson & Cairney, 2004; Wardle, 2006; Mueller et al., 2007). Fungi are probably best known for their role as decomposers, dominating the decomposition of plant parts, particularly lignified cellulose. They produce a wide range of extracellular enzymes that break down complex organic polymers into simpler forms that can be taken up by the fungi themselves or by other organisms. This process is an essential step in the carbon cycle; without it, plant detritus would quickly tie up available carbon and mineral nutrients. It is not surprising, therefore, that eliminating fungi significantly reduces the loss of both carbon and nitrogen from litter (Beare et al., 1992). Fungi directly shape the community dynamics of plants, animals and bacteria through a range of interactions (Gilbert & Souza, 2002). The majority of vascular plants are associated with arbuscular mycorrhizal or ectomycorrhizal fungi and benefit from an increased capacity to extract phosphorus and other nutrients from the soil. Mycorrhizal fungi thus have an important role in plant community development, nutrient cycling and the maintenance of soil structure (Breure et al., 2004). The symbiosis between plant roots and fungi, referred to as mycorrhiza, is one of the most ubiquitous mutualisms in terrestrial ecosystems. These mycorrhizal associations enable plants to acquire mineral nutrients and water in exchange for photosynthetically derived sugars. It is likely that plant adaptation to life on land 400 million years ago was possible only with the help of mycorrhizal symbionts. Many plants depend heavily on mycorrhizae for mineral nutrition, and the absence of appropriate fungi can significantly alter plant community structure. Fungi share many ecological similarities with macroorganisms such as plants. For example, fungi are sessile and compete for space in order to control access to resources. Although individual hyphae are microscopic, genets or ramets can come to occupy large spaces and can survive for many years (Peay et al., 2008). There is evidence of possible effects of mycorrhizal fungal diversity and/or composition on ecosystem productivity at the species level and even at the genetic level (Wardle, 2006). Fungi are the most common and important plant pathogens, causing serious crop loss and shaping the composition and structure of natural plant communities in many significant but often underappreciated ways. For example, seedling mortality is often highest close to parent plants because of host-specific fungal pathogens that reside on the parents. Such distance-dependent mortality has been hypothesized as a major mechanism preventing competitive exclusion and maintaining plant species diversity (Gilbert & Souza, 2002). However, while some fungi are clearly parasites, the nature of many fungal interactions is uncertain and may change depending on the environment in which the interactions occur. Endophytic fungi, which live ubiquitously inside the leaves, stems and roots of plants, are a good example.
Although some of these fungi produce secondary compounds that protect their hosts from herbivory, their overall effects on plant fitness can change dramatically depending on environmental conditions or herbivore pressure. Although there is ample evidence in the literature for competitive interactions in soil fungal communities, direct evidence for competition as a determinant of fungal diversity is scarce. Unlike bacteria, fungi do not seem to exhibit a high frequency of horizontal gene transfer, so functional traits are relatively stable, and species concepts are useful and reasonably well developed. For these reasons, it is likely that much of the ecological theory derived from macroorganisms is applicable to the study of fungi (Peay et al., 2008). Fungal associations with plants may be mutualistic or pathogenic, thus influencing the productivity and colonization ability of plant communities (Torsvik & Øvreås, 2002; Wardle, 2006).

Soil is one of the most important and complex ecosystems on Earth, with many interaction networks that contribute to maintaining its resistance, resilience and functional redundancy. Most soil functions are carried out by its inhabitants, which range from microorganisms to macroorganisms, with the most important roles attributed to microorganisms such as bacteria and fungi. The interactions among, and roles played by, microorganisms and other organisms within soil ecosystems are influenced by several factors. These can be divided into abiotic factors, such as pH, the type and amount of organic compounds, or particle size, and biotic factors, which are directly related to feedbacks between belowground and aboveground communities. Soil microorganisms play a crucial role in all soil functioning and in the major biogeochemical cycles, with a specific role for each component of the soil biota, such as fungi acting as decomposers, mutualists and pathogens.

The soil ecosystem: who is there, and what are they doing?

In general, an ecosystem is a complex system in which different species interact with one another to form complex networks. An ecosystem can perform system-level functions that could not be performed by individual populations. Ecosystem stability depends on three important parameters: resistance, resilience and functional redundancy. These correspond, respectively, to the degree of change caused by disturbances, the speed with which the system returns to its pre-disturbance level, and the fact that the same functions can be performed by different populations. Soil is one of the most important ecosystems on Earth. It is a dynamic system in which physical, chemical and biological components interact. Soil functioning is controlled by the communities of organisms that live together in it. Although they usually represent only a small biomass compared with the mineral or humus fractions, microorganisms are important components of the soil system, with crucial roles in soil functioning, since they carry out important tasks in the ecosystem and influence a large number of environmental processes. Soil organisms may have other functions, such as the degradation of anthropogenic compounds like pesticides, the stabilization of soil aggregates, and the modification of soil pH through nitrification and denitrification, with an important effect on the mobility of heavy metals and, ultimately, on their bioavailability and therefore their impact. The role of soil organisms in nutrient cycles and other environmental processes is regulated by a diverse range of biotic and abiotic factors.
The functionality and dynamic performance of soil organisms emerge from interactions between physical and biological processes, under the control of soil structure. Biotic factors are directly involved in the expression of functional diversity, the study of which aims to determine whether key microorganisms influence ecosystem properties such as nutrient decomposition, productivity, and resistance or resilience to disturbance. The numbers of aboveground and belowground species can be correlated when taxa in the two habitats respond in the same way to the same environmental variables, particularly across a large gradient of disturbance, climate, soil conditions or geographic region. The composition of plant communities is nevertheless known to affect the structure of microbial communities and soil functions. These effects have been shown to be linked to the spatial distribution of soil properties and microbial biomass in many natural systems. Soil microorganisms also influence terrestrial ecosystems by contributing to plant nutrition and health, soil structure and fertility, and the regulation of plant communities (releasing nutrients in plant-available forms and degrading toxic residues). They also form symbiotic associations with plant roots that act as antagonists of pathogens. However, estimating microbial effects on plant productivity is often difficult because soil microbial populations and communities are affected by a wide range of other soil organisms, especially their predators, such as protozoa and nematodes. Soil microorganisms play a central role in C dynamics. A loss of species linked to global change could influence global carbon (C) dynamics. Although abiotic factors such as temperature and moisture can be considered the main variables influencing the rate of organic matter decomposition and the carbon cycle, soil organisms are directly responsible for the vast majority of organic matter decomposition. Symbiotic microorganisms can play a key role in accessing complex organic nitrogen, and they play an essential role in making nitrogen available to plants. Arbuscular mycorrhizal fungi can also improve N nutrition by extending the absorptive zone through hyphal extensions, described as the 'mycorrhizosphere'. This increase in nitrogen uptake is linked to the stimulation of bacteria living in the rhizosphere.
Eukaryotes are one of the most important components of the soil microbiota and are considered major players in soil communities, on a par with bacteria. They strongly influence the structure of the prokaryotic community. Fungi represent the dominant fraction of eukaryotes and are considered the most important functional component of terrestrial ecosystems as decomposers, mutualists and pathogens. Fungi are probably best known for their role in the decomposition of plant organic matter, dominating the decomposition of plant debris and particularly of lignified cellulose. They produce a wide range of extracellular enzymes that break down complex organic polymers into simpler forms that can then be taken up by the fungi or by other organisms (bacteria), which gives them a central role in the carbon cycle.

2.3 Alpine soil as special ecosystem:

2.3.1 Definition and key features:

The alpine zone is situated above the tree line and below the non-seasonal snowline on temperate and tropical mountains (Edwards et al., 2007), and occupies one fifth of the global land surface (Beniston, 1996). Alpine environments are defined as cold, windy and snowy, and are characterized by low growing-season temperatures and long-lasting snow cover (Edwards et al., 2007; Schöb et al., 2009). These ecosystems are dominated by slow-growing plant species and characterized by low net primary productivity, high soil organic matter content and a limited supply of soil nutrients (Choler, 2005). Large amounts of organic carbon are thought to be sequestered in these environments (Broll, 1998; Nemergut et al., 2005), estimated at 11% of total world soil carbon (Robinson, 2002). Soils from arctic and alpine tundra environments therefore play important roles in global biogeochemical processes, especially the C cycle. Climate change puts these large soil C stocks at risk of mineralization through increased microbial respiration, which may provide a positive feedback to warming (Oechel et al., 1997; Nemergut et al., 2005; McGuire et al., 2009). In addition to soil pH, the length of the growing season and the daily maximum temperature are the key abiotic factors that control snowbed communities (Schöb et al., 2009).

2.3.2 Climate changes effect on biodiversity in alpine ecosystem:

High mountain regions contain a substantial part of global biodiversity, which is a crucial element in conservation and an insurance for ecosystem stability (Schöb et al., 2009). The diversity of soil organisms is closely related to climate change (Broll, 1998), which alters snowfall regimes and thereby influences snow cover and, ultimately, tundra plant communities (Wipf & Rixen, 2010). In addition to plants, many soil organisms have developed specific adaptations to survive extreme environmental conditions, and therefore play an important role in the functioning of extreme ecosystems. Apart from microorganisms, these include the following invertebrate groups: Lumbricidae, Enchytraeidae, Collembola, Acari, Diplopoda, Isopoda, Nematoda, Protozoa and insect larvae. Some small mammals, especially burrowing rodents, may influence soil properties in alpine areas in many ways (Broll, 1998).

2.3.2.1 Snow-cover effects on seasonal succession of microbial community:

Alpine tundra soils commonly experience profound seasonal cycles that are characterized by shifts in microbial communities, processes and soil nutrient availability (Nemergut et al., 2005). Microbes are heavily involved in driving large-scale biogeochemical processes. This raises the question of the functional importance of biogeochemical cycles for spatial and seasonal variations in alpine microbial communities, especially under climate change (Lipson et al., 2002; Zinger et al., 2009). Climate change is widely accepted as a crucial factor determining plant composition and diversity in alpine environments. As a consequence, a shift in growth-form composition, as observed in arctic ecosystems, might have significant effects on biogeochemical cycling in alpine ecosystems (Walker et al., 2006). Variation in microbial communities can be distinguished by physical location (communities favoured by either the snowpack or the underlying soil system) and by timing (for example, those that operate semi-continuously throughout the winter or are concentrated in shorter periods such as snowmelt) (Edwards et al., 2007). Soil microbial communities have the metabolic and genetic capability to adapt to changing environmental conditions on very short time scales, and can perceive and respond to environmental changes much faster than plants and animals (Schmidt et al., 2007). The feedback responses of microorganisms to climate change in terms of greenhouse gas flux may either amplify (positive feedback) or reduce (negative feedback) the rate of climate change (Singh et al., 2010). Alpine tundra systems have relatively short, snow-free growing seasons, when plants are photosynthetically active and competitive in the uptake and immobilization of available nitrogen. The corresponding summer microbial communities are probably fast-growing organisms that feed on labile root exudates.
As summer transitions to fall, drier soils and cooler temperatures lead to the beginning of plant senescence, and the microbial community shifts to a population capable of degrading more recalcitrant compounds, such as the polymers and phenolics found in plant litter. Winter's lower temperatures and snow cover stimulate the cold-adapted microbial community, which feeds on moribund plant litter and mineralizes and immobilizes nitrogen from plant material. Winter is an extremely active time for the microbial community, and the highest levels of microbial biomass are seen under the snowpack (Nemergut et al., 2005). Lipson et al. (2002) compared the seasonal abundance of bacterial and fungal cells and found that the winter microbial community uses more complex substrates from dead plant material, such as phenolics, starch and cellulose, and functions well at low temperatures, whereas the summer community relies more on exudates from live plant roots, such as amino acids, and functions better at warmer temperatures. The differences in substrate utilization could reflect the different substrates available seasonally: in winter dead plant material is available, whereas in summer live roots and their exudates are present (Lipson et al., 2002; Schmidt et al., 2007). Observation of the seasonal succession of functional groups prompted phylogenetic studies of seasonal fluctuations of the microbial community in tundra soils. From a broad phylogenetic perspective, fungi dominate the under-snow biomass and bacteria are more active in the summer (Schmidt et al., 2007). These observations agree with studies from other systems that also show that fungi dominate microbial biomass in the winter (Bardgett et al., 2005). Within fungi, there are also large seasonal changes in the major groups (Ascomycetes, Basidiomycetes and Zygomycetes), but the most dramatic changes occur within previously unknown subphylum-level clades of Ascomycetes (Schmidt et al., 2007). The fungal communities are particularly seasonally dynamic. At snowmelt, eukaryotic SSU rRNA gene surveys suggest that the population of ascomycetes drops by about 50%, while the populations of zygomycetes and basidiomycetes increase in abundance (Nemergut et al., 2005). Studies of mycorrhizal fungi in alpine soils have provided some of the only field-based evidence that plants can use a succession of different fungal partners to access different soil nutrients over the course of a growing season. A succession of fungal partners was observed in the roots of Ranunculus adoneus, and phosphorus uptake by this plant coincided with the transitory (less than two weeks) abundance of AM arbuscules in its new roots in the field (Mullen & Schmidt, 1993). In contrast, all N uptake in R. adoneus occurred during and just after snowmelt, when the overwintering roots were heavily infected with a novel endophytic fungus (Mullen et al., 1998). Zinger et al. (2009, 2011; Fig. 8), studying the contrasting diversity patterns of bacterial and fungal soil communities in alpine ecosystems, found that bacterial beta diversity was correlated with soil pH, whereas fungal beta diversity was mainly related to soil organic matter. Similar results were obtained by Lentendu et al. (2011), who studied the same landscape to estimate fungal diversity across different alpine tundra habitats and also found that fungal beta diversity patterns were significantly correlated with environmental conditions.


Figure 8: Seasonal variations of bacterial and fungal communities assessed by CE-SSCP. (a) Between three sites from August 2006 to May 2007; (b) at Site B over 2 years, from June 2005 to May 2007. Early snowmelt (ESM) locations are shown as filled symbols and late snowmelt (LSM) locations as open symbols; squares indicate Site A, circles Site B and triangles Site C. Ju, June; Au, August; Oc, October; Ma, May; 5, 2005; 6, 2006; 7, 2007. Small symbols indicate samples grouping at atypical positions. Molecular fingerprints were compared by computing bootstrapped dendrograms. Thick lines indicate branches supported by a bootstrap value >500. The ovals show the relevant groupings: thick dark-grey lines, post-winter convergence of May samples; dark-grey dashed lines, monthly grouping of ESM sites; light-grey dashed lines, yearly grouping in ESM (Zinger et al., 2009).

2.3.2.2 Climate changes determine vegetal communities in alpine environment:

Alpine plant species are variably sensitive to climate change (Theurillat & Guisan, 2001; Thuiller et al., 2005; Walker et al., 2006), can respond in a fast and flexible way (Cannone et al., 2007), and could be particularly endangered by climate change because of the loss of their habitat (Schöb et al., 2009). Snow-cover duration, which is closely related to climate change, can affect plant growth through its influence on temperature, light regime, wind exposure, soil water content, nitrogen availability and disturbance regime. It is therefore widely accepted that recurrent snowmelt patterns determine the strategies that plant communities use against these stresses and disturbances. Differences in snowmelt timing could thus be a strong ecological driver of alpine plant species diversity (Choler, 2005). Several species occur only in snowbeds, thereby characterizing this extreme alpine plant community (Schöb et al., 2009). For example, climate change between 1953 and 2003 has largely affected the vegetation at a high alpine site in the European Alps. As these changes follow a sharp increase in both summer and annual temperatures after 1980, it is supposed that vegetation of the alpine (2400-2800 m) and nival (above 2800 m) belts responds in a fast and flexible way, contradicting previous hypotheses that alpine and nival species have a natural inertia and are able to tolerate an increase of 1-2 °C in mean air temperature (Cannone et al., 2007). By measuring the absorption of many amino acids in the four major ecosystem types of arctic Alaska, Kielland (1994) found that deciduous shrubs had higher uptake rates than the more slowly growing evergreen shrubs, suggesting that new growth created a sink that strongly influenced the capacity for amino acid uptake. Moreover, Wipf & Rixen (2010) showed that flowering phenology responded strongly to changes in the timing of snowmelt. The least responsive group of species was the graminoids, although they did show a decrease in productivity and abundance with experimentally increased snow cover. The species group with the greatest phenological response to snowmelt changes was the dwarf shrubs. Wilson & Nilsson (2009) also indicated that experimental warming only slightly affected alpine plant community composition. However, when combined with changes in snow-cover duration and depth, the effects of global climate change on vegetation may be especially pronounced in alpine meadows (Theurillat & Guisan, 2001), typically illustrated by an increase in shrub cover and a loss of species richness (Wilson & Nilsson, 2009). Experimental results predict that warming will cause a decline in biodiversity across a wide variety of tundra, at least in the short term, and provide rigorous experimental evidence that recently observed increases in shrub cover in many tundra regions are a response to climate warming. These changes have important implications for processes and interactions within tundra ecosystems and between tundra and the atmosphere (Walker et al., 2006). Vegetation within alpine ecosystems is related to snowmelt date and snow-cover duration, as illustrated by the contrast between early and late snowmelt locations.
Thus, vegetation within alpine ecosystems can be divided into early snowmelt location (ESM) vegetation, which is dominated by stress-tolerant species such as Kobresia myosuroides (Cyperaceae) and Dryas octopetala (Rosaceae), and late snowmelt location (LSM) vegetation, which is dominated by low-stature species that can cope with a shorter growing season, such as Carex foetida (Cyperaceae), Alopecurus alpinus (Poaceae), Alchemilla pentaphyllea (Rosaceae) and Salix herbacea (Salicaceae) (Choler, 2005; Baptist et al., 2008; Zinger et al., 2009; Baptist et al., 2010). Alpine vegetation can also be classified by growth strategy into slow-growing species, such as the alpine herb Acomastylis rossii, and fast-growing species, such as the grass Deschampsia caespitosa (Bardgett et al., 2005).

2.3.2.3 Snow-cover impact on plant-microorganisms interactions:

In terrestrial ecosystems, the response of plant communities and symbiotic microorganisms, such as mycorrhizal fungi and nitrogen-fixing bacteria, to climate change is quite well understood, in terms of both physiology and community structure, as it is a crucial factor determining the nature and extent of terrestrial-ecosystem feedback responses. However, understanding the responses of microbial communities to climate change is complicated by the vast and largely unexplored diversity of microbiota found in terrestrial environments, for which only a few examples of food webs have been fully constructed (Singh et al., 2010). Plant-microbe interactions may be critical to the response to freeze-thaw and drying-rewetting events (Groffman et al., 1999): such events increase soil amino acid concentrations and decrease microbial population size, providing plants with windows of increased resources and decreased competition (Lipson & Monson, 1998). Variations in plant-derived C substrates influence microbial diversity, whereas microbial activity, biomass and immobilization of N influence plant diversity. For example, some slow-growing, phenolic-rich plant species (Acomastylis rossii) exude C into the soil to influence microbial immobilization of N, promoting low-nutrient conditions, whereas other, fast-growing species (Deschampsia caespitosa) exhibit high turnover of fine roots, promoting more fertile conditions (Bardgett et al., 2005). Spring snowmelt triggers a crash in microbial biomass and a concordant pulse of nitrogen, probably from lysed microbial cells. Much of this nitrogen is then taken up by plants, and the growing season begins anew (Nemergut et al., 2005). Under-snow microbial communities metabolize allelochemicals released from plant litter in the fall and winter, before plant growth commences in the spring (Schmidt & Lipson, 2004).
Variations in the chemistry of these compounds are therefore a potential driver of diversity in the microbial community. For example, litter that is rich in easily degraded polyphenolic compounds enhances overall microbial biomass, particularly fungi, whereas litter that is rich in carbohydrates and sugars enhances bacterial growth (Bardgett et al., 2005).


Figure 9. Seasonal changes within alpine soil (Bardgett et al., 2005).

In alpine wet meadows, as illustrated in Figure 9, senescing plants in autumn provide a pulse of labile C that supports microbial growth (Figure 9a). In winter, microbial biomass continues to increase in soils warmer than -5 °C as C and N in plant litter are consumed and mineralized (Figure 9b). In spring, rapid changes in microclimate and the exhaustion of labile C compounds lead to turnover of the microbial community, with concomitant release of labile N for plant uptake (Figure 9c). Finally, in summer, plant uptake of N to meet growth demands occurs during the early part of the season, followed by a period of C sequestration and loss to soil microbes (Figure 9d) (Bardgett et al., 2005). In contrast, in alpine dry meadows, snow cover is irregular and soils are frequently exposed to very low air temperatures, so soils are frozen for most of the winter after receiving plant litter in the autumn. During the spring, the snowpack melts, briefly saturating soils and exposing them to a more dynamic temperature regime. During the early summer, soils warm and start to dry as plants become active. In late summer the soils become quite dry, until the monsoonal rains in August or September. Plants senesce during the fall, and soils again become frozen and exposed until late winter or early spring (Lipson et al., 2002).

2.3.2.4 Biogeochemical cycles in alpine environment in function of snow-cover: In arctic and alpine tundra, the duration of snow cover has radical impacts on ecosystem structure and functioning. Major nutrient and carbon cycling is influenced by the number, frequency and duration of freeze-thaw events, or by any changes in the snow cover regime (Robinson, 2002; Edwards et al., 2007; Björk et al., 2008). Snow directly and indirectly controls nutrient and carbon cycling (Groffman et al., 2001). Direct control involves the short-term effects of snow cover on winter soil temperature (by insulating the soil) and/or on summer soil moisture, whereas indirect control involves the long-term effects of snow cover variations on growing-season length, soil fertility and water availability. Snowpack development could affect (i) soil microbiological activity and nutrient transformations; (ii) the capacity of the accumulating snowpack to retain atmospherically derived solutes; (iii) preferential elution and rapid runoff of solutes from the snowpack during periods of thaw; and (iv) leaching of solutes. Melting of the snowpack can lead to wide fluctuations (including diurnal ones) in soil moisture and in prevailing local environmental conditions, such as pH and redox. When coupled with intense hydrological activity during the spring snowmelt, this can result in a period of considerable nutrient loss from the soil system. Individual processes operate and vary spatially (both horizontally and vertically within the soil profile) and temporally, responding to factors such as the availability of carbon and other substrates (Edwards et al., 2007). There is currently much interest in carbon (C) cycling because of potential interactions and feedbacks with climate change. While anthropogenic C emissions are driving current climate change, the coupled nature of climate and C cycling could accelerate this change. This is because terrestrial ecosystems store vast quantities of C, and models suggest that feedbacks between climate and terrestrial C cycling could stimulate the transfer of C from terrestrial ecosystems to the atmosphere because of, for example, enhanced rates of decomposition, changing vegetation type in the Arctic, and the loss of Amazonian forests (Nielsen et al., 2011). The response of the carbon cycle of the Arctic to changes in climate is a major issue of global concern (McGuire et al., 2009). Freezing and thawing of soils may affect the turnover of soil organic matter and thus the losses of C and N from soils (Fig. 10). Changes in microbial biomass and populations, root turnover and soil structure might explain increased gaseous and solute fluxes of C and N following freeze-thaw events (Matzner & Borken, 2008).


Figure 10: A simplistic conceptual model to illustrate complex feedbacks caused by climate change. Increased levels of carbon dioxide (CO2) result in a higher plant biomass and a higher rhizodeposition of carbon, which increases microbial biomass and activity in the short term. However, in the long term, limitation of mineral nutrients such as nitrogen may constrain this response. Such mineral limitation will affect the dominance of oligotrophic and copiotrophic microorganisms in a given ecosystem, which in turn may influence CO2 flux (Singh et al., 2010).

Previous studies have suggested that these freeze-thaw stresses result in root and microbial mortality, releasing labile organic N and C to soil and increasing available N (Pilon et al., 1994). Experiments in alpine and tundra ecosystems have offered several approaches and perspectives on C mineralization. Recent evidence indicates that significant amounts of C may be lost as CO2 to the atmosphere from tundra ecosystems during the fall, winter and spring months, which is related to the fact that high-latitude ecosystems are particularly vulnerable to climate change (Oechel et al., 1993; Oechel et al., 1997). Only a few studies have addressed the relative importance of species vs. climatic effects on decomposition in alpine or arctic tundra. Hobbie (1996) compared the effects of increased temperature and of litter from different Alaskan tundra plant species on carbon and nitrogen mineralization in microcosms, and found that warming from 4 °C to 10 °C significantly increased rates of soil and litter respiration, litter decomposition, litter nitrogen release, and soil net nitrogen mineralization. Baptist et al (2008) found that a deep winter snowpack creates an abiotic environment that is more stable and favorable for litter decomposition. However, the results also showed that growth form was a more important driver of decomposition than snowpack depth.
Thus, changes in litter quality resulting from community-level shifts in dominant growth forms will likely have a stronger impact on litter decomposition in alpine tundra than the direct effect of changing snow regimes (Baptist et al., 2010). Correspondingly, Nadelhoffer et al (1991) showed that C mineralization was more strongly related to organic matter quality than to temperature in arctic tundra. Williams et al. (1998) reported that if snowfall increases in alpine or arctic areas, an increase in C mineralization may be expected. Oechel et al. (1997) believed that the relationship between rates of decomposition and N mineralization is ‘relatively weak’, because “although rates of microbial respiration appear to increase exponentially with elevated temperature, rates of N mineralization appear to be insensitive to small or intermediate variations in soil temperature and moisture due to rapid immobilization of mineralized N by soil microorganisms” (Robinson, 2002). However, nitrogen cycling is also affected by snowpack in alpine ecosystems. Changes in the duration and size of the snowpack as a consequence of climate change can greatly increase N2O production (Nemergut et al., 2005; Matzner & Borken, 2008). Brooks et al (1998) proposed a potential mechanism for the transfer of N between microorganisms and plants through freeze/thaw events, especially when snow depth is no longer sufficient to prevent diurnal temperature fluctuations in the soil. Exposure of soil to freezing temperatures at night may result in the lysis of microbial cells. As soils warm during the day, N released on completion of this freeze/thaw cycle is potentially available to plants, which are better able to survive the freeze/thaw (Edwards et al., 2007). Baptist et al (2010) found that a reduced snow cover may have a weak and immediate direct effect on litter decomposition rates and N availability in alpine tundra. A much larger impact on nutrient cycling is likely to be mediated by longer-term changes in the relative abundance of lignin-rich dwarf shrubs. In general, ectomycorrhizal species had higher amino acid uptake than did non-mycorrhizal species. Moreover, in a snow depth manipulation study conducted in a northern hardwood forest, Groffman et al. (1999) showed significant increases in N cycling and loss in plots where the snow was removed in comparison to the controls. This could also be related to the fact that a lack of snow cover results in colder soil temperatures, more extensive soil freezing and an increase in freeze-thaw cycles (Edwards et al., 2007). Schmidt et al (2007) have proposed a model for the nitrogen cycle within seasonally snow-covered ecosystems. It is shown in Fig. 11.


Figure 11: A conceptual model of the succession of N cycles and losses from seasonally snow-covered ecosystems based on year-round studies of alpine meadows. From left to right, the fall/winter cycle is a time of microbial buildup (low turnover), high depolymerase activity, and immobilization of N into microbial cells. At snow melt-out, the cold-adapted (psychrophilic) microbial biomass crashes, resulting in release of dissolved N (DN). Some of this N is retained by growth of the snowmelt microbial community and overwintering endophytic fungi or is lost to leaching. N in the summer microbial community turns over approximately 10 times (N turnover time of 13–18 days) and each turn of the microbial cycle releases DON and DIN. Most of this N is rapidly retaken up by the microbial biomass, but some may be taken up by plants and/or lost to leaching. The magnitude of N losses during the summer should be directly related to the rapidity of microbial turnover and relative plant demand for N (Schmidt et al., 2007).

In alpine tundra systems, plants are generally poor competitors for nutrients relative to microorganisms (Lipson & Monson, 1998; Wallenstein et al., 2008). Even if plants take up only a small fraction of the microbial N released in the rhizosphere, this is an important source of N for plants in natural ecosystems. In alpine soils, plants only have to capture about 5% of the N released by turnover of microbial N to meet their N demands for growth, which strongly supports the hypothesis that turnover of the microbial community is the largest source of DON and DIN for plant uptake during the plant growing season (Schmidt et al., 2007). Inorganic N supplied to plants by mineralization is not sufficient to meet the annual N requirement of many tundra species. Although N mineralization is slow in tundra soils and concentrations of inorganic N are low, these soils have large stocks of both structural and soluble organic N (Kielland, 1994).
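
The turnover figures quoted above lend themselves to a short worked calculation. The sketch below is only illustrative: it takes the roughly 10 turnovers, the 13–18 day turnover time and the 5% plant capture from the text, and assumes (this is not stated in the source) that each turnover releases the whole standing microbial N pool. The function names and the pool size of 1.0 arbitrary units are hypothetical.

```python
def implied_active_period_days(turnovers, turnover_time_days):
    """Length of the period needed to fit the given number of turnovers."""
    return turnovers * turnover_time_days


def plant_n_demand(microbial_n_pool, turnovers, capture_fraction):
    """N captured by plants over the season.

    Assumption (not from the source): every turnover releases the
    entire standing microbial N pool.
    """
    released_n = microbial_n_pool * turnovers
    return released_n * capture_fraction


# Figures quoted in the text: about 10 turnovers of 13 to 18 days each.
shortest = implied_active_period_days(10, 13)
longest = implied_active_period_days(10, 18)

# Hypothetical pool of 1.0 (arbitrary units) and the 5% capture quoted above.
demand = plant_n_demand(1.0, 10, 0.05)
print(shortest, longest, demand)
```

Under these assumptions the summer microbial cycle spans roughly 130 to 180 days, and a 5% capture of the released N corresponds to about half of the standing microbial N pool, which illustrates why a small captured fraction can still cover plant demand.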

2.3.3 Snowmelt gradient in relation with ecosystem functions: A fundamental assumption in landscape ecology is that spatial patterns have significant influences on the flows of materials and energy, while processes create and modify spatial patterns. Various scale issues relating to the dynamics of snow exist. While knowledge of the link to processes at the plot scale is growing, there are difficulties associated with up-scaling and extrapolation to regional, continental, or global scales (Edwards et al., 2007). Abiotic conditions in snowbeds are largely regulated by snow. First, the beginning and duration of the growing season are determined by the date of snowmelt, that is, the moment when the snow cover has almost disappeared. The timing and duration of snowmelt are controlled by the thickness of the snowpack, the quality of the snow, and climatic factors such as air temperature and solar radiation. Second, the soil temperature during the growing season is also largely influenced by snow: the quantity of accumulated snow and the duration of the melting period regulate the amount and inflow of cold meltwater, which keeps the soil humid and cool (Edwards et al., 2007; Schöb et al., 2009; Baptist et al., 2010). In addition to significant seasonal variation in the structure and function of the microbial communities, these soils demonstrate considerable spatial variation, some of which is also controlled by snowpack (Nemergut et al., 2005). Three periods were identified based upon the length and general significance of contrasting physical conditions. These included pre- and post-snowpack periods likely to experience freeze/thaw conditions, with a relatively stable middle period of continuous snow cover. A final hydrologically active period was associated with snowmelt. However, the balance between overall nutrient/carbon sequestration and overwinter loss depends to some extent upon the relative timescales of each period. Snowpack thickness influences the diffusion of atmospheric oxygen into soil, which can result in reduced soil oxygen availability due to heterotrophic respiration under deep snowpacks (Edwards et al., 2007). Late and early snowmelt locations can be distinguished by differences in snowpack thickness, duration and quality. In late snowmelt locations, soils are classified as stagnogley, enriched in clay, whereas in early snowmelt locations they are considered alpine rankers (Duchaufour & Gilot, 1966).
In late snowmelt locations, deep snow cover acts as an insulating layer and maintains soil temperature at 0 °C, whereas the shallow and variable snow cover of early snowmelt locations leads to very low soil temperatures during winter and tends to limit microbial activity (Nemergut et al., 2005; Baptist et al., 2010). Snow cover, through soil temperature, may therefore have a significant impact on decomposition, particularly during winter (Baptist et al., 2010). A lack of snow cover results in colder soil temperatures, more extensive soil freezing and an increase in freeze-thaw cycles (Edwards et al., 2007). Whatever the species, early snowmelt locations showed consistently reduced decomposition rates and delayed final stages of N mineralization. This lower decomposition rate was associated with freezing soil temperatures during winter. Across late and early snowmelt locations, deeper snowpack correlates with higher levels of under-snow microbial respiration (Nemergut et al., 2005), which is illustrated by the different responses of winter and summer soil respiration to temperature. Respiration proceeded rapidly at 0 °C in winter soil, but respiration in summer soil was almost completely halted at 0 °C (Lipson et al., 2002).

In alpine environments, Bryant et al. (1998) considered that the variation in decomposition rates along a snowmelt gradient was a [...] compared to late snowmelt locations. During summer, soil temperature and soil moisture were similar in late snowmelt and early snowmelt locations. Thus, summertime conditions cannot explain the effect of snowmelt location on yearly mass loss. In contrast, the positive relationship between mass loss and mean winter soil temperature clearly shows that snowmelt location effects are related to soil temperature and therefore to snow depth during winter. In late snowmelt locations, deep snow cover, which acts as an insulating layer, maintains soil temperature at 0 °C. Owing to this greater insulation, production of CO2 was significantly greater under deeper snowpacks (Edwards et al., 2007). Changes in snow cover dynamics in alpine tundra will have profound impacts on microbial communities. For example, a reduction in snowpack could result in a loss of fungal diversity, as we observed between LSM and ESM locations (Zinger et al., 2009). Schöb et al (2009) analyzed the small-scale species distribution along the snowmelt and soil temperature gradients within alpine snowbeds in the Swiss Alps, and found that the date of snowmelt and soil temperature were relevant abiotic factors for small-scale vegetation patterns within alpine snowbed communities. Surface litter decomposition studies show a significant increase in litter mass loss under deep and early snow, with no significant change under medium and little snow conditions. Changes in climate that result in differences in snow duration, depth, and extent may therefore produce large changes in the C and N soil dynamics of alpine ecosystems (Williams et al., 1998).

The consequences of climate change for snowbeds are largely illustrated by the extinction of their highly specialized plant species due to the loss of appropriate habitats. The distribution of most of the characteristic snowbed species is limited to the harsh environmental conditions occurring in snowbeds and depends significantly on the respective snowmelt date and/or soil temperature. In summary, substrate availability and enzyme activity data indicate that there is a yearly succession of dominant substrates for microbial growth, progressing from carbon polymers/phenolics (winter) to proteins (snowmelt) to rhizodeposition (summer). The snowmelt microbial community is largely fueled by protein released from the crash of the winter microbial community. The broad microbial processes of both C and N mineralization at high latitudes are governed by a suite of often autocorrelated factors that include low soil temperature, exceptionally high or low soil moisture contents and organic matter of low resource quality.

Alpine soil: a special ecosystem

The alpine environment is defined as cold, windy and snowy, and is characterized by periods of low temperatures and long-lasting snow cover. These ecosystems are dominated by plant species with low growth rates and by very low primary productivity, a high organic matter content and limited nutrient reserves. Because of slow turnover, large quantities of organic carbon are thought to be sequestered in these environments, estimated at 11% of total soil carbon worldwide. This is why arctic tundra soils and alpine soils play an important role in global biogeochemical processes, especially the carbon cycle. This large carbon reserve is exposed to climate change, which may put the carbon at risk through mineralization processes driven by increased respiration of microbial communities, potentially inducing positive feedbacks to climate warming. In addition to soil pH, the length of the growing season and the daily maximum temperature are the key abiotic factors controlling the microbial and plant communities of snow-covered soils. Besides plants, many soil organisms are adapted to extreme environmental conditions and have developed specific adaptations to survive, and therefore play an important role in the functioning of these extreme ecosystems. Apart from microorganisms, these include the following invertebrate groups: Lumbricidae, Enchytraeidae, Collembola, Acari, Diplopoda, Isopoda, Nematoda, Protozoa and insect larvae. Microbes are heavily involved in large-scale soil biogeochemical processes. Climate change is recognized as a crucial factor in determining the composition and diversity of plants in the alpine environment. Consequently, a change in the composition of growth forms, as observed in the arctic ecosystem, could have significant effects on biogeochemical cycles in alpine ecosystems. The major carbon and nutrient cycles are influenced by the number, frequency and duration of snowmelt events, or by any change in the snow cover regime of soils. Alpine plants vary in their sensitivity to climate change, which can strongly influence alpine vegetation; this vegetation responds rapidly and flexibly and could be particularly endangered by climate change because of the degradation of its habitat. The duration of soil snow cover, which is linked to climate change, has the potential to affect plant development through its influence on temperature, light, wind exposure, soil water content, nitrogen availability and, ultimately, the disturbance regime. It is therefore accepted that repeated and constant freeze-thaw events influence and determine the strategy used by the plant community against physical disturbances. The microbial communities of alpine tundra soils follow pronounced seasonal cycles characterized by changes in these microbial communities, in processes and in nutrient availability. Snow directly and indirectly controls the carbon and nutrient cycles. Direct control involves the short-term effects of snow cover on soil temperature in winter and on soil moisture in summer, whereas indirect control involves the long-term effects of variations in snow cover on the length of the growing season, soil fertility and water availability. In alpine terrestrial ecosystems, plants are generally poor competitors for nutrients relative to microorganisms. Even if plants take up only a small fraction of the nitrogen released in the rhizosphere, this remains an important source of nitrogen for plants in natural ecosystems, which are often nutrient-deficient. Substrate availability and enzyme activities indicate that there is an annual succession of dominant substrates for the growth of microbial communities, from carbon polymers in winter to proteins during snowmelt and finally to rhizodeposition in summer. The microbial communities observed during snowmelt are largely fueled by the proteins released by the collapse of the winter microbial community. The microbial processes of carbon and nitrogen mineralization at high altitude are governed by a suite of often correlated factors that include low soil temperature, exceptionally high or low soil water content and the low resource quality of organic matter. Differences in soil properties can also be observed between soils with contrasting snow cover duration and thickness. In late snowmelt (LSM) zones, soils are often classified as stagnogley, enriched in clay, whereas in early snowmelt (ESM) zones, soils are often classified as alpine rankers. In the former (LSM), the deep snow cover acts as an insulating layer and maintains soil temperature at around 0 °C, whereas the shallow and variable snow cover of the latter (ESM) exposes soils to very low temperatures in winter, which strongly limits microbial activity.

2.4 Methods for studying functional diversity of soil: Microbial activities shape the biogeochemistry of the planet and the health of macroorganisms (Kirk et al., 2004). However, despite their abundance, the impact of soil microbes on ecosystem processes is still poorly understood. Determining the metabolic processes performed by microbes is important for both understanding and manipulating ecosystems (Van der Heijden et al., 2008; Bastida et al., 2009). Therefore, understanding the diversity of soil organisms could afford better comprehension of the impacts of soil microbial activity on ecosystems and on major soil ecological processes (Broll, 1998). Modern microbial techniques show great promise in linking ecological concepts and theoretical frameworks to ecosystem-level function. Nevertheless, it has been difficult to understand the nature and dynamics of microbial habitats, owing to the inability to culture most microbes, the heterogeneity of soil composition, high microbial diversity, the complex spatial arrangement of biota and their resources, the physical protection of microorganisms, and the lack of information concerning the majority of the microbiota (Kirk et al., 2004; Gupta et al., 2008; Van der Heijden et al., 2008; Bastida et al., 2009).

2.4.1 Functional diversity analysis using traditional and biochemical approaches: Early attempts were made to simulate natural conditions and to obtain more direct evidence of the biochemical and ecological activities and functions of microorganisms in soil. These methods used a single enrichment of the soil with an appropriate substrate and analysis of metabolic products, which led to results characteristic of a closed system (Gabriel, 2010). One of the methods developed using this technique is the catabolic response profile (CRP), a measure of short-term substrate-induced respiration that has been used to calculate the diversity (range and evenness) of catabolic functions expressed in situ (Torsvik & Øvreås, 2002). Measurements of respiration activity as an index of functional activity can be divided into three major groups: measurement of carbon dioxide production (HolmJensen, 1960), measurement of oxygen consumption, and determination of the respiratory quotient (RQ, the ratio of the volume of carbon dioxide produced to the volume of oxygen consumed in respiration over a period of time) (Stotzky, 1960). Another method that uses specific substrate-uptake profiles is the combination of fluorescent in situ hybridization and microautoradiography, which can be used to characterize individual bacterial cells within complex microbial communities (Torsvik & Øvreås, 2002; Benndorf et al., 2007; Bastida et al., 2009). Overall, these approaches provide only limited information on the populations associated with a specific process rather than a complete description of their functional role within a community (Maron et al., 2007). Nevertheless, the latter method offers a powerful technique for identifying microorganisms that are actively involved in specific metabolic processes (Torsvik & Øvreås, 2002). Measuring soil enzyme activities is one of the most important approaches developed for studying functional diversity. The techniques are based mostly on spectrophotometry, fluorescence, radiolabelling and gas or high-pressure liquid chromatography. The sources of enzymes in soil may be microorganisms, plants or animals, both living and dead, from which enzymes can be released due to changes in membrane permeability or after cell decomposition; the majority of these enzymes are of microbial origin and are closely related to microbial abundance and/or activity (Gabriel, 2010). Several enzymes related to important biochemical processes have been detected using these methods, for example enzymes degrading cellulose, hemicelluloses and other polysaccharides and those involved in lignin transformation, which are considered the most important in soil (Hayano et al., 1986). Organisms specialized in the degradation of xylem cell wall components (cellulose, hemicelluloses, lignins and extractives), namely filamentous wood-decaying fungi, were also detected using these techniques (Lundell et al., 2011). Another biochemical approach is the phospholipid fatty acid (PLFA) technique, in which structural data obtained from lipids are based on the occurrence of fatty acids or quinones that are specific to certain taxa. With this technique, a metabolic function is only inferred from the presence of microorganisms known for that function (Anderson & Cairney, 2004; Benndorf et al., 2007).
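
The respiratory quotient and the catabolic response profile described above can be sketched in a few lines. This is a minimal illustration, not a method taken from the cited works: RQ follows the definition given in the text, range is counted as the number of substrates that elicit a response, and evenness is computed as the reciprocal Simpson index, which is an assumed choice of index since the text does not name one. All function names and input values are hypothetical.

```python
def respiratory_quotient(co2_volume, o2_volume):
    """RQ: volume of CO2 produced divided by volume of O2 consumed."""
    if o2_volume <= 0:
        raise ValueError("o2_volume must be positive")
    return co2_volume / o2_volume


def catabolic_profile(responses):
    """Return (range, evenness) for substrate-induced respiration responses.

    range    = number of substrates with a positive response
    evenness = 1 / sum(p_i ** 2), the reciprocal Simpson index
               (assumed here; the source does not specify an index)
    """
    if any(r < 0 for r in responses):
        raise ValueError("responses must be non-negative")
    total = sum(responses)
    if total <= 0:
        raise ValueError("at least one response must be positive")
    proportions = [r / total for r in responses]
    substrate_range = sum(1 for r in responses if r > 0)
    evenness = 1.0 / sum(p * p for p in proportions)
    return substrate_range, evenness


# Hypothetical measurements in arbitrary but consistent units.
rq = respiratory_quotient(co2_volume=3.0, o2_volume=4.0)
profile = catabolic_profile([1.0, 1.0, 1.0, 1.0])
print(rq, profile)
```

With this index, evenness equals the number of substrates when all responses are equal and falls towards 1 as a single substrate dominates the profile.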
Stable isotope probing (SIP) is another widely used approach that separates the nucleic acids of different organisms according to their ability to use particular substrates labeled with stable isotopes (Anderson & Cairney, 2004; Benndorf et al., 2007; Bastida et al., 2009; Singh et al., 2010). The metaproteomic approach, the identification of all the proteins in soil, provides information about actual functionality in relation to metabolic pathways (Benndorf et al., 2007; Maron et al., 2007; Bastida et al., 2009). Metaproteomics detects gene expression at the final level, that of translation. However, compared with nucleic acid-based approaches, protein-based methods suffer from several drawbacks. Firstly, there are no commercial protein extraction kits as there are for DNA and RNA. Secondly, soil is a poor source of proteins, at least compared with microbial cultures, and, unlike in the case of DNA or RNA, no techniques such as PCR exist to ‘replicate’ proteins. Thirdly, many proteins are closely associated with compounds that interfere with their identification, such as those of humic origin. Finally, the genome database for protein matches for soil microorganisms is not complete, so more effort should be made to develop this approach in genomic projects (Bastida et al., 2009). Overall, because the majority of microorganisms in soil are uncultivable (Dinsdale et al., 2008), this majority is not represented by any traditional approach. This demonstrates the limits of these approaches for describing natural diversity: most of the microbial diversity remains unexplored (Torsvik & Øvreås, 2002; Saleh-Lakha et al., 2005; Maron et al., 2007), and the role of these microorganisms, in terms of their contribution to nutrient and energy flow, soil respiration, gene transfer, degradation of pollutants and disease, will be underestimated or completely missed (Dinsdale et al., 2008).

2.4.2 Molecular approaches to solve ecological questions:

2.4.2.1 Historical preface: Molecular ecology has passed through several stages of development (Fig. 12): from the 1960s, when the focus was solely on monoxenic cultures lacking interactions between microorganisms and between microorganisms and their habitats, through the 1980s, when studies began to consider not only single organisms but also the density, diversity and activity of microbial populations isolated from natural environments, to the present day, dominated by the so-called “omic era”, in which functional and structural diversity are studied at all levels of ecosystems (Maron et al., 2007).

Figure 12: Historical and step-by-step evolution of microbial ecology (Maron et al., 2007).

However, Pace et al (1985) were the first to try to overcome the limitations related to the difficulty of culturing most microorganisms, by introducing a cultivation-independent approach based on the extraction, amplification, cloning and characterization of rDNA genes directly from natural environments. Since these early works, much effort has been dedicated to developing methods to characterize the information contained in nucleic acids extracted from environmental samples. Advances in molecular biology methods have improved our understanding of microbial functions and of the interactions of microbes with other organisms and their environment (Yun et al., 2004; Saleh-Lakha et al., 2005), and have had a profound impact on biology and other major disciplines, including evolutionary biology. As a consequence, considerable attention has focused on links between ecological factors, such as species diversity, abundance and distribution, and the flux of energy and nutrients through ecosystems. In the future, these advances in molecular techniques will certainly play an important role in understanding how ecological and geophysical factors govern ecosystem processes (Johnson et al., 2009).

2.4.2.2 Metagenomic, a step forward for understanding functional diversity: Molecular techniques based on DNA have enabled microbial community structure to be studied at the DNA level and have allowed the abundance of genes related to a given microbial group involved in a metabolic pathway to be quantified (Maron et al., 2007; Bastida et al., 2009; Nowrousian et al., 2010). The metagenome is the collection of DNA fragments from all the microorganisms present in a community (Maron et al., 2007; Bastida et al., 2009; Nowrousian et al., 2010). The metagenomic approach was developed recently for prokaryotic microorganisms in order to understand the (true) functional diversity and activities expressed in soil by microorganisms in response to different environmental constraints, and it enables studies of organisms that are not easily cultured (Maron et al., 2007; Dinsdale et al., 2008; Guazzaroni et al., 2009; Nowrousian et al., 2010). The basis of the metagenomic approach is the construction of metagenomic clone libraries, which involves the extraction of environmental DNA, its shearing and its subsequent ligation into suitable cloning vectors (Warnecke & Hess, 2009). The metagenomic approach not only records the genetic information present in the different genomes of the microorganisms that colonize these environments, but also offers the opportunity to gain insights into the genome organization, gene content, functional significance and genetic variability of new microbial species belonging to non-cultivable phyla in natural microbial communities (Dinsdale et al., 2008; Warnecke & Hess, 2009).

Metagenomic studies were first performed using Sanger sequencing, which was later superseded by next generation sequencing (NGS) techniques. The use of NGS techniques further improves the possibilities offered by metagenomics because NGS eliminates the need for cloning steps, thereby excluding cloning biases, and provides higher sensitivity through increased sequencing depth at lower cost (Nowrousian et al., 2010). The metagenomic approach has recently been used in many studies. For example, Yun et al (2004) used a metagenomic approach to characterize a novel amylolytic enzyme encoded by a gene from a soil-derived metagenomic library. Tringe et al (2005) characterized and compared the metabolic capabilities of terrestrial and marine microbial communities using largely unassembled sequence data obtained by shotgun sequencing of DNA isolated from various environments. Martin-Cuadrado et al (2007) presented a large sequencing effort in the deep ocean, studying the metagenomics of the deep Mediterranean. Dinsdale et al (2008) used a metagenomic approach to elucidate the functional potential of nine biomes, including subterranean samples, hyper-saline ponds, and aquaculture-fish-associated, terrestrial-animal-associated and mosquito-associated samples. The use of metagenomic approaches is nevertheless hampered by several difficulties and drawbacks. Even if signature DNA sequences can infer the extent of biodiversity in prokaryotes, they provide limited insights into the phenotypic and functional properties of these organisms (Coleman & Whitman, 2005). The genomic plasticity of microbes causes variations in the gene content of closely related strains, making predictions of community metabolism on the basis of representative genomes and signature genes such as 16S ribosomal RNA unreliable (Dinsdale et al., 2008).
Moreover, while the use of DNA as a molecular marker reveals information on the presence of organisms or the potential function of a community, it gives no information on the activities occurring in situ, and does not provide insight into gene regulation and transcription (McGrath et al., 2008; Sørenson et al., 2009). Other biases of this method arise from primers and/or exponential amplification, from the impossibility of assessing all three domains of life simultaneously, and from the persistence of free DNA (Urich et al., 2008), in addition to inefficient DNA extraction, cloning biases and limited sequencing capacity (Warnecke & Hess, 2009).

Besides all these difficulties concerning the use of metagenomic approaches for prokaryotic organisms, additional difficulties arise when applying this approach to eukaryotic ones. The frequent presence of introns and the lack of conservation of motifs in promoter sequences prevent the expression of genomic copies of eukaryotic protein-coding genes not only in a bacterial cell but also in most eukaryotic hosts. Moreover, as no protocols are easily applied for separating eukaryotic cells from a complex

environmental matrix such as soil, a DNA-based metagenomic library that includes eukaryotic DNA would also necessarily include prokaryotic sequences. A further problem, in addition to the major limitations already noted for prokaryotes, is the large size of eukaryotic genomes compared with those of prokaryotes. Overall, it is unlikely that a workable metagenomic library based on genomic DNA can capture a significant fraction of the gene content of a eukaryotic microbial community (Bailly et al., 2007). Questions such as how natural microbial communities respond to perturbations in environmental conditions are better answered by analysis of community mRNA than of genomic DNA (Gilbert et al., 2008), and it is well established that mRNA transcripts may correlate more closely with enzyme production rates than functional gene concentrations in DNA do (Wallenstein et al., 2008).

2.4.2.3 Transcriptome sequencing for gene expression analysis:

An interesting enigma in molecular biology is how the identical genetic make-up of cells can give rise to different cell types, each of which plays a defined role in the functioning of a multicellular organism. This phenotypic diversity has been linked to the fact that different cell types within the organism activate (or express) different sets of genes (transcriptomes) that lead to different cell fates and functions. The correlation of cellular fate and function with gene expression patterns has thus been of prime interest to biologists for decades (Morozova et al., 2009). The transcriptome is the sum of the transcripts (mRNAs) of an organism at a defined spatial and temporal locus (Broadley et al., 2008).
Transcriptome sequencing has been used for applications ranging from gene expression profiling, genome annotation and rearrangement detection to non-coding RNA discovery and quantification, and has been a key area of biological investigation for decades (Bastida et al., 2009; Morozova & Marra, 2008). RNA-based high-throughput approaches that report gene activity (“transcriptomics”) are being developed in parallel with high-throughput sequencing and microarray methods (Hirsch et al., 2010).

2.4.2.3.1 First methodologies, advances and limitations:

The earliest attempts to understand cellular transcriptomes included examinations of total cellular RNA from different organisms, tissue types or disease states for the presence and quantity of transcripts of interest. The first candidate gene-based studies utilized Northern blot analysis (Bastida et al., 2009; Morozova et al., 2009), a low-throughput technique that required the use of radioactivity and large amounts of input RNA. This procedural complexity and the requirement for relatively large amounts of RNA restricted Northern blotting to the detection of a few known transcripts at a time from samples where RNA availability was not

limited (Morozova et al., 2009). More recently, several approaches have been developed. The first is based on microarrays, where DNA and RNA from environmental samples can be quantified in one step using a microarray with several thousand probes for known genes and pathways (Benndorf et al., 2007). cDNA is hybridized to an array of complementary oligonucleotide probes corresponding to genes of interest, and the abundance of a particular mRNA species is estimated from its hybridization intensity to the relevant probe (Torsvik & Øvreås, 2002). Functional gene arrays, on which genes encoding functional enzymes involved in biogeochemical cycling processes are fabricated, have been used as signatures for monitoring the physiological status and functional activities of microbial populations and communities (Saleh-Lakha et al., 2005). For example, Wu et al. (2001) used this technique to detect genes coding for nitrate reductase in bacteria. The development of microarrays supplanted single-gene approaches by allowing the simultaneous characterization of expression levels of thousands of known or putative transcripts (Morozova et al., 2009). This advance brought about a multitude of expression-profiling initiatives aiming to comprehensively characterize the expression signatures of different cell types. Further developments in the microarray field enabled other transcriptomics applications, such as the detection of non-coding RNAs, single-nucleotide polymorphisms (SNPs) and alternative splicing events (Warnecke & Hess, 2009). Despite their power to measure the expression of thousands of genes simultaneously, microarray methods do not readily address several key aspects, especially the ability to detect novel transcripts and the ability to study the coding sequence of detected transcripts, and they offer limited detection sensitivity and quantification reliability (Gao et al., 2007; Morozova et al., 2009; Warnecke & Hess, 2009).
Moreover, the microarray approach is limited by the fact that transcript abundance is inferred from hybridization intensity rather than measured directly, which results in noisy data that can interfere with reproducibility and cross-sample comparisons (Morozova et al., 2009).

2.4.2.3.2 Next generation sequencing, application, advances and limitations:

Traditional culture-independent molecular identification methods have so far suffered from high cost and low throughput, which render many taxa undetectable (Amend et al., 2010). Transcriptome sequencing studies have evolved from determining the sequence of individual cDNA clones to more comprehensive attempts to construct cDNA sequencing libraries representing portions of a species' transcriptome. Due to the high cost of the Sanger method used in these studies and the complexity of the associated cloning step, routine full-length cDNA (FL-cDNA) sequencing efforts were not possible, resulting in low coverage,

insufficient to comprehensively characterize whole transcriptomes of multicellular species (Morozova et al., 2009). Recently, large-scale sequencing has been revolutionized by the development of several so-called next-generation sequencing (NGS) technologies. These have drastically increased the number of bases obtained per sequencing run while at the same time decreasing the cost per base. Compared to Sanger sequencing, NGS technologies yield shorter read lengths; however, despite this drawback, they have greatly facilitated genome sequencing, first for prokaryotic genomes and in recent years also for eukaryotic ones. This advance was possible due to the associated development of software that allows the de novo assembly of draft genomes from large numbers of short reads (Morozova & Marra, 2008; Nowrousian et al., 2010). The four commercially available next-generation sequencing technologies, 454 (Roche), Genome Analyzer (Illumina/Solexa), ABI SOLiD (Applied Biosystems) (Marguerat et al., 2008; Morozova & Marra, 2008; Morozova et al., 2009) and the most recently released Helicos HeliScope, produce an abundance of short reads at a much higher throughput than is achievable with a Sanger sequencer (Fig. 13).

Figure 13: Advances in sequencing chemistry implemented in next-generation sequencers. (a) The pyrosequencing approach implemented in 454/Roche sequencing technology detects incorporated nucleotides by chemiluminescence resulting from PPi release. (b) The Illumina method utilizes sequencing-by-synthesis in the presence of fluorescently labeled nucleotide analogues that serve as reversible reaction terminators. (c) The single-molecule sequencing-by-synthesis approach detects template extension using Cy3 and Cy5 labels attached to the sequencing primer and the incoming nucleotides, respectively. (d) The SOLiD method sequences templates by sequential ligation of labeled degenerate probes. Two-base encoding implemented in the SOLiD instrument allows for probing each nucleotide position twice (Morozova et al., 2009).

Currently, several strategies and platforms are under development, including sequencing by synthesis (SBS), sequencing by hybridization and nanopore sequencing. Pyrosequencing is an SBS method that can sequence thousands of DNA fragments in a few hours (Fig. 13). The entire genome of a bacterium was sequenced in 4.5 h with high accuracy, compared with the several months required by the Sanger procedure. A unique feature of pyrosequencing is that several small samples can be run on a single 454 chip. This makes it possible to perform replications and to multiplex samples from the same or different organisms in a single experiment (Gowda et al., 2006). Pyrosequencing detects the pyrophosphate released when a nucleotide is incorporated: the pyrophosphate liberated with each nucleotide addition generates light in a reaction coupled to ATP sulfurylase and luciferase (Ahmadian et al., 2006). Pyrosequencing technology has recently developed considerably, and to date 454 pyrosequencing is the most widely used next-generation sequencing technology for the de novo sequencing and analysis of transcriptomes in non-model organisms (McGrath et al., 2008; Morozova et al., 2009; Sun et al., 2010; Hirsch et al., 2010). 454 sequencing technology has experienced a rapid improvement in throughput, read length and accuracy. The newest 454 sequencing platform, the GS FLX Titanium, can generate one million reads with an average length of 400 bases at 99.5% accuracy per run.

However, pyrosequencing has several inherent technological problems. The data volumes produced are too large to be edited manually and, as yet, there is no consensus on how to recognize and handle potential mistakes. The construction of flexible, automated pipelines for processing the data and the reduction of mistakes related to base-reading errors are among the greatest challenges in pyrosequencing technology (Tedersoo et al., 2010).
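As a rough illustration of what the figures quoted for the GS FLX Titanium imply, the yield per run and the expected number of miscalled bases can be worked out directly from them (one million reads, 400 bases on average, 99.5% accuracy). The short Python sketch below is only a back-of-the-envelope calculation: the function name `run_yield` is hypothetical, and it assumes that the quoted accuracy applies uniformly to every base, which is a simplification.

```python
def run_yield(n_reads, mean_read_length, accuracy):
    """Return (total bases, expected erroneous bases) for one sequencing run.

    Assumption for illustration only: `accuracy` is treated as a uniform
    per-base probability of a correct base call.
    """
    total_bases = n_reads * mean_read_length
    expected_errors = total_bases * (1.0 - accuracy)
    return total_bases, expected_errors


# Figures quoted in the text for the GS FLX Titanium platform.
total, errors = run_yield(n_reads=1_000_000, mean_read_length=400, accuracy=0.995)
print(f"{total / 1e6:.0f} Mb per run, about {errors / 1e6:.1f} million expected base errors")
```

Under these assumptions a run yields about 400 Mb, with on average about two miscalled bases per 400-base read, which gives a sense of why automated error handling is considered a major challenge.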
The advent of next-generation sequencing technologies has tremendously reduced sequencing cost and experimental complexity, as well as improved transcript coverage, rendering sequencing-based transcriptome analysis more readily available and useful to individual laboratories. This technological advance challenged the dominance of microarrays, enabling many new applications to be introduced for the study of transcriptomes (Morozova & Marra, 2008; Morozova et al., 2009). Moreover, next-generation sequencing technologies have revolutionized transcriptomics by providing opportunities for multidimensional examinations of cellular transcriptomes in which high-throughput expression data are obtained at single-base resolution (Marguerat et al., 2008; Bastida et al., 2009). Furthermore, the continually decreasing costs of next-generation sequencing technologies, combined with the added qualitative information value of sequence data as compared to microarrays, could make transcriptome sequencing the preferred means of evaluating

questions regarding gene expression and alternative splicing for model organisms. De novo sequencing continues to represent the best approach for non-model organisms where genomic sequence data are lacking. The ability to conduct omics-level experiments using only a few cells will allow researchers to answer questions that were previously intractable due to the heterogeneous and complex nature of most tissue samples (Kapteyn et al., 2010).

There are several caveats to next-generation sequencing-based approaches. Multiple difficulties were encountered during the development of protocols involving cDNA synthesis and amplification. For example, researchers observed artifacts such as primer dimers that dominated sequencing data sets and reduced effective coverage, prompting the use of semi-suppressive PCR to reduce primer dimer frequency. Thus, although these methods may be useful for qualitative applications, establishing and improving their quantitative capabilities will probably require additional developments (Ozsolak & Milos, 2010). Moreover, despite the growing availability of next-generation sequencing technologies, significant effort is still often needed to collect the required biological starting material. For transcriptome analysis, this generally entails the synthesis and subsequent amplification of cDNA. Although exponential amplification raises concerns about representation bias, exponentially amplified cDNA has been reported to be representative of the transcript population, even when starting with the equivalent of a single cell’s RNA (Kapteyn et al., 2010). In addition, given the tendency of reverse transcriptase to generate false second-strand cDNA products during first-strand cDNA synthesis, it is not clear whether the approaches that rely on sequencing first-strand cDNA products (either directly or by intra- or intermolecular ligation) are absolutely strand specific. Furthermore, ligation tends to have sequence preferences.
Thus, the approaches that rely on ligation may suffer from various representational biases. Examples of such bias are found in transcriptome profiling and ribosome profiling experiments, in which extremely uneven coverage was seen for libraries prepared using ligation, compared with libraries prepared using enzymatic 3’ polyadenylation (Ozsolak & Milos, 2010). A common outcome of these sequence collections is the cataloguing of numerous uncharacterized genes, a large proportion of which are identified as being among the most abundant in these environments (Tringe et al., 2005; Gilbert et al., 2008). This outcome is a product of the low genomic diversity currently represented in sequence databases, which have historically relied on cultured organisms. As is well documented, uncultured microorganisms represent the vast majority of microbial diversity, and so these databases grossly underrepresent the functional potential of microorganisms (Morales & Holben, 2010).

NGS technologies hold great potential for the study of eukaryotic microorganisms. One reason is that, because their genomes are mostly small, high sequence coverage can already be achieved with a moderate amount of NGS data, so that projects such as genome sequencing, which were previously restricted to large genome centers, are now possible even for small groups. Filamentous fungi were among the first eukaryotes for which de novo genome sequencing projects included or were solely based on NGS reads (DiGuistini et al., 2009; Nowrousian et al., 2010), and yeasts were among the first organisms for which RNA-seq was established (Nagalakshmi et al., 2008). For other eukaryotic microorganisms, NGS has so far mainly been used in transcriptomics studies in combination with Sanger sequencing to establish EST libraries, e.g., for the oomycete Pythium ultimum (Cheung et al., 2008) or the charophyte algae Coleochaete orbicularis and Spirogyra pratensis (Timme & Delwiche, 2010). However, with 65 sequenced genomes and more than 70 genome sequences in progress, ranging from green algae to a number of species belonging to the diverse phylogenetic groups summarized as protists, it is only a matter of time until NGS-based approaches are applied to eukaryotic microorganisms on a broad scale (Nowrousian et al., 2010). For eukaryotic microorganisms, RNA-seq will be extremely useful in combination with de novo genome sequencing, since most sequenced genomes can only be annotated automatically, manual annotation projects being extremely labor- and cost-intensive and therefore restricted to a few select model organisms. Automatic annotation leads to predicted gene models that are often incorrect, especially with respect to intron distribution. The inclusion of RNA-seq data in a genome sequencing project allows much better, evidence-based gene predictions during automated annotation without greatly increasing costs (Nowrousian et al., 2010).
Nevertheless, the few examples published so far have successfully demonstrated the potential for the discovery of genes and genetic markers in these systems. Weber et al. (2007) used massively parallel pyrosequencing to study the Arabidopsis thaliana seedling transcriptome. Barakat et al. (2009) used 454 sequencing to compare the transcriptomes of American chestnut (Castanea dentata) and Chinese chestnut (Castanea mollissima) in response to chestnut blight infection caused by the fungus Cryphonectria parasitica, which infects stem tissues and kills the trees by girdling them. Sato et al. (2009) presented the first genome-level transcriptome of the wood-degrading fungus Phanerochaete chrysosporium grown on red oak, using poly(A) mRNA and a 454 pyrosequencing approach. Moreover, Wang et al. (2009) presented a global characterization of the Artemisia annua glandular trichome transcriptome using 454 pyrosequencing with an oligo(dT) primer. Sun et al. (2010), by studying the American ginseng root transcriptome using a GS FLX Titanium platform, found that transcriptome analysis based on

454 pyrosequencing is a powerful tool for identifying the genes encoding the enzymes responsible for the biosynthesis of secondary metabolites in non-model plants.

2.4.2.3.3 Tag-based approaches for gene expression profiling.

Tag-based approaches are the most widely used strategies for gene expression analysis. One of the major advantages of tag-based methods for gene expression profiling is their ability to detect absolute gene expression values by retrieval of actual tag counts. Unlike microarrays, which rely on previously identified or predicted sequences, tag-based methods are ideal for gene discovery and for the detection of low-level gene expression, owing to the depth of sampling achieved by sequencing multiple tags per clone. These features render the practices of normalization and subtraction commonly performed in cDNA library construction obsolete when it comes to tag-based approaches. Tag-based approaches offer the unique advantage of not only cataloguing gene expression but also providing a profile of the transcriptional activity within a genome. Differential gene expression, transcription start site identification, alternative polyadenylation site determination, antisense transcription, small RNA profiling and low-abundance transcript discovery are some of the applications that tag-based methodologies can efficiently provide with high throughput (Vega-Sanchez et al., 2007). In the last decade, various sequence-based strategies have been developed for transcriptome studies:

I. Serial Analysis of Gene Expression (SAGE): SAGE was the first reported tag sequencing method for gene expression profiling. SAGE experiments offered many advantages over microarrays, such as the ability to detect novel transcripts, the ability to obtain direct measures of transcript abundance, thus allowing easier comparisons between multiple samples, and the discovery of novel alternative splice isoforms. However, SAGE studies still involved a laborious cloning procedure, were costly, and produced short sequence tags (14 or 21 bp) that are difficult to resolve for transcripts with similar coding sequences (Morozova et al., 2009).
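The idea that tag-based methods measure expression by retrieving actual tag counts can be sketched in a few lines. The Python example below is a minimal illustration, not the protocol used in any of the cited studies: it assumes that the tags have already been extracted from the sequenced clones, the function names `count_tags` and `tags_per_million` are hypothetical, and the four 14 bp tags are invented toy data.

```python
from collections import Counter


def count_tags(tags):
    """Count how many times each tag sequence was observed (case-insensitive)."""
    return Counter(tag.upper() for tag in tags)


def tags_per_million(counts):
    """Scale raw tag counts to tags per million so that libraries can be compared."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


# Toy library of four 14 bp tags (invented data, for illustration only).
library = [
    "CATGAAACCCGGGT",
    "CATGAAACCCGGGT",
    "CATGTTTGGGCCCA",
    "catgaaacccgggt",
]

counts = count_tags(library)
tpm = tags_per_million(counts)
for tag, n in counts.most_common():
    print(tag, n, tpm[tag])
```

Because abundance is read directly from the counts, two libraries of different sizes can be compared after scaling, without reference to a hybridization signal. Mapping each tag back to a transcript, which is where short tags become hard to resolve, is deliberately left out of this sketch.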
Serial Analysis of Gene Expression has been used in a variety of studies. For example, Poroyko et al. (2005) used SAGE to define the number and relative abundance of transcripts in the root tip of well-watered maize seedlings. More recently, Molina et al. (2008) used an improved version of the SAGE technique to gain insight into stress-related gene activity in chickpea (Cicer arietinum) in response to drought stress.

II. Expressed sequence tag (EST): Expressed sequence tags (ESTs) emerged as the first high-throughput, tag-based method for the study of gene expression and for genome annotation. The development of EST sequencing in 1991 partially addressed the cost limitation of FL-cDNA sequencing by introducing a less complete, less

accurate, yet cheaper approach to the detection of expressed transcripts than was possible with sequencing FL-cDNAs (Vega-Sanchez et al., 2007; Morozova & Marra, 2008). The generation of ESTs, which are single sequencing reads derived from one end of a cDNA clone, has been used to characterize cellular mRNA profiles. With the development of next-generation sequencing technologies, EST sequencing has gained potential as one of the sequence census methods for studying mRNA profiles on a genome-wide scale. It has been estimated that most EST sequencing projects fail to cover 20–40% of transcripts, which usually include rare or very long transcripts as well as transcripts with highly specific expression patterns. Another challenge of EST-driven gene annotation is alternative splicing and the complex structure of many loci in multicellular eukaryotes, resulting in a substantial number of incomplete annotations. Next-generation sequencing technologies have the potential to provide much deeper coverage of EST libraries. However, the short reads may be problematic when annotating alternative splice variants and the complete, accurate structures of protein-coding loci (Morozova & Marra, 2008). A number of studies have been successful at constructing EST libraries using the Sanger and 454 technologies. Cheung et al. (2008) used an EST approach to study Pythium, an agriculturally important genus of plant pathogens. Vega-Arreguin et al. (2009) used the sequencing and analysis of ESTs for the discovery of novel genes and for the annotation of genomic sequences in the Palomero maize transcriptome, by a high-throughput pyrosequencing strategy.

III. Massively parallel signature sequencing (MPSS): Massively parallel signature sequencing (MPSS) has been introduced more recently.
The procedure involves cloning cDNAs on microbeads and digesting them with the tagging enzyme DpnII, followed by sequencing by hybridization using adapters, four bases at a time, generating 17-20 bp signatures. MPSS allows the identification of millions of signatures per experiment, surpassing even the largest SAGE applications, which cover only hundreds of thousands of tags. However, MPSS experiments could not be performed in individual laboratories, as the technology was only available, until recently, through Illumina. The now discontinued MPSS technique has been replaced by a new platform that uses a sequencing-by-synthesis approach, with DNA molecules attached to a flow cell, and is known as Solexa sequencing technology (Vega-Sanchez et al., 2007). Gowda et al. (2006) used both massively parallel signature sequencing (MPSS) and robust-long serial analysis of gene expression (RL-SAGE) to study the mycelium and appressorium transcriptomes of Magnaporthe grisea in order to understand the molecular mechanisms of its pathogenesis on rice plants at the transcriptome level. Alagna

et al. (2009) found that massively parallel sequencing of different olive fruit cDNA collections provided large-scale information about the structure and putative function of gene transcripts accumulated during fruit development. Comparative transcript profiling allowed the identification of differentially expressed genes with potential relevance in regulating fruit metabolism and phenolic content during ripening.

2.4.2.4 Metatranscriptomic approach:

2.4.2.4.1 Definition, concept and limitations:

Over the past few years, attempts have been made to analyze gene expression in natural populations. Transcriptomics, and by extension metatranscriptomics, can be seen as the comprehensive, quantitative analysis of all genes expressed by one or several organisms, or by a whole ecosystem (John et al., 2009). While the disciplines of genomics and metagenomics study the genomic potential of a particular organism or a microbial community, respectively, transcriptomics and metatranscriptomics deal with the subset of genes that are transcribed under certain environmental conditions (Warnecke & Hess, 2009). The sum of all gene transcripts in an organism is termed the transcriptome. The application of transcriptome analysis to natural environments (metatranscriptomics) refers to the expressed subset of genes within a microbial community at a certain point in time (John et al., 2009; Warnecke & Hess, 2009). Metatranscriptomics, the study of the collective RNA from all the microorganisms in a community, therefore constitutes a very important step forward in the attempt to reveal soil microbial functionality (Maron et al., 2007; Bastida et al., 2009).

Historically, metatranscriptomic studies have involved either the use of microarrays or mRNA-derived cDNA clone libraries. These approaches have produced significant insight into the metatranscriptome of different communities but have limitations when exploring the diversity of a natural community. Firstly, a microarray only gives information about those sequences for which it was designed, and it is usual to screen for gene sequences that are already known. Secondly, although transcript cloning avoids this problem through the random amplification and sequestering of environmental mRNA fragments, it introduces other biases; e.g.
any cloned transcript that encodes toxic products or titrates host DNA-binding factors will skew the relative abundance of sequences (Gilbert et al., 2008). The sequencing of cDNA rather than genomic DNA focuses analysis on the transcribed portion of the genome, which provides certain advantages over DNA-based metagenomics, the most important being the reduction of community complexity by focusing on the active population in a sample. This focus reduces the size of the sequencing target space, which can be viewed as desirable given

the fact that, even with next-generation sequencers, sequencing an entire genome is still an expensive task. The coupling of direct ‘omic’ approaches to direct high-throughput sequencing is producing massive amounts of data, replacing the previous limitation of sequencing depth with the challenge of analyzing terabytes (trillions of bytes) of sequence information (Morozova et al., 2009; Morales & Holben, 2010).

The identification and quantification of mRNA species under different conditions or in different cell types have long been of interest to biologists. RNA is a molecule of many facets and subtleties, participating in almost all macromolecular processes. The central dogma states that genetic information flows from DNA to RNA and then to protein. mRNA is the template carrying the genetic message from the gene to the ribosomal factories for protein synthesis. mRNAs are heterogeneous in size, ranging from hundreds to thousands of nucleotides (Gu & Reddy, 2001). RNA has consequently been targeted more often for information on the active portion of the population, which represents a logical next step after the metagenomic approach (Sørenson et al., 2009). In prokaryotes, genetic information in DNA is collinear with the specified product, so newly transcribed mRNAs are used directly as templates for translation. In eukaryotes, however, a much more complex process occurs to produce the mature mRNA template for translation (Gu & Reddy, 2001).

Both metagenomics and metatranscriptomics have been considered as technologies that can facilitate the exploration of the uncultivable majority of microorganisms, their metabolic capabilities and their functional roles (Morales & Holben, 2010). The biogeochemical cycling of globally important elements is intrinsically controlled by the expression of the genes encoding the reactions catalyzed by the organisms (mostly microbes) participating in these cycles (John et al., 2009).
Sequencing the expressed genetic information of an ecosystem (the metatranscriptome) can provide information about the response of organisms to varying environmental conditions. The application of high-throughput sequencing technology is now enabling access to both known and previously unknown transcripts in natural communities (Gilbert et al., 2008). Sequencing of cDNAs is one field to which NGS was applied quickly and successfully. The use of NGS to obtain transcriptomics data is collectively known as RNA deep sequencing or RNA-seq (Morozova & Marra, 2008; Nowrousian et al., 2010). Its applications include the characterization of low-abundance transcripts and genotyping to determine, for example, which alleles of the transcripts might be differentially expressed. In these scenarios, it may be preferable to enrich for the desired subset of transcripts, in order to minimize the overall cost of sequencing and maximize the number of samples that can be analyzed. RNA-seq of poly(A)+ RNA species offers a natural route for exome sequencing without the use of enrichment

strategies. Recent advances in RNA-seq have provided researchers with a powerful toolbox for the characterization and quantification of the transcriptome. Using these technological advances, it is now possible to build a complete catalogue of the transcripts that are derived from genomes ranging from those of simple unicellular organisms to those of complex mammalian cells, as well as in tissues in normal and disease states, and to define complex biological networks in a wide range of biological specimens. With these networks, data-driven RNA network models of cells and tissues could provide a full understanding of the biological pathways that are active in various physiological conditions (Ozsolak & Milos, 2010). These developments have enabled the characterization of variations in microbial community structure and diversity in multiple situations, allowing the identification of populations preferentially associated with environmental perturbations. Studies on the metatranscriptome, together with those on the metagenome and the metaproteome, will contribute to progress in our knowledge of microbial communities and their contribution to ecosystem functioning (Maron et al., 2007). The current approaches, microbial community metagenomic and metatranscriptomic techniques, have been developed as other ways to study microbial assemblages, giving rise to exponentially increasing collections of information from numerous environments (Morales & Holben, 2010). In correspondence with metagenomics, environmental transcriptomics (metatranscriptomics) retrieves and sequences environmental mRNAs from a microbial assemblage without prior knowledge of which genes the community might be expressing (Poretsky et al., 2005; Frias-Lopez et al., 2008). It thus provides a less biased perspective on microbial gene expression in situ than other approaches.
Like approaches that use DNA as a molecular marker, methods based on RNA have permitted such analyses in the context of transcription and have provided some information on gene activity.

2.4.2.4.2 Knowledge on Metatranscriptomic approach:

The first glimpse of an environmental transcriptome was the work of Poretsky et al. (2005), who built primarily prokaryotic mRNA libraries derived from two aquatic sites. Although this was only a glimpse of the gene expression profile of these environments, it provided the proof of concept necessary for metatranscriptomics (John et al., 2009). Gilbert et al. (2008) presented a study of a complex marine metatranscriptome obtained from random whole-community mRNA using the GS-FLX pyrosequencing technology. This study is characterized by an exceptional enrichment of mRNA (99.92%) versus ribosomal RNA. It showed that metatranscriptomic studies of natural microbial communities are not only feasible, but when

58 paired with metagenomic data sets, offer an unprecedented opportunity to explore both structure and function of microbial communities (Gilbert et al., 2008). John et al 2009 presented the first report of a metatranscriptome from eukaryotic marine plankton. A metatranscriptomic cDNA library targeting transcriptome has been reported; Leininger et al (2006) have studied the soil metatranscriptome by targeting prokaryote community. Grant et al (2006) and Todaka et al (2007) have used the 3’ poly(A) tail of eukaryotes mRNA for identification of different eukaryotic protein coding sequences from hot springs water and termite gut respectively. Bailly et al (2007) presented the first metatranscriptomic study of eukaryotic community of forest soil, and has approved the potential of this approach to address ecophysiological questions, not only at the community level, but also at the level of a single abundant species. Urich et al (2008) used the metatranscriptomic approach to study soil microbial community structure and function. Moreover, Shi et al (2009) used metatranscriptomics approach to reveals unique microbial small RNAs in the ocean’s water column. Tartar et al (2009) studied parallel metatranscriptome analyses of host and symbiont gene expression in the gut of the termite reticulitermes flavips. Shrestha et al (2009) reported the first study towards global metatranscriptome analysis in paddy soils with contrasting conditions. Bulk composite mRNA was extracted from the oxic and anoxic zones of 70-day- old flooded paddy soil microcosms and used to construct two cDNA libraries by semi- randomly primed RT-PCR. Poretsky et al (2009) used environmental transcriptomics to elucidate day/night differences in gene expression in surface waters of the North Pacific subtropical gyre. This analysis provided information on the dominant metabolic processes within the bacterioplankton assemblages and reveals changes in expression patterns of biogeochemically relevant processes. 
A summary of metatranscriptomic studies is shown in Table 1.

Table 1: Some recent studies that applied metatranscriptomic approaches to environmental samples.

2.4.2.4.3 Drawbacks and technical limitations:
Global metatranscriptomics may involve different sources of bias. Data collection may be distorted by the procedure used to extract mRNA, while horizontal gene transfer events and an insufficient reference database of complete genome sequences may also bias the data analysis (Shrestha et al., 2009). The major limitations of metatranscriptomics, however, are the difficulty of eliminating humic acids during extraction, the differential transcription kinetics of similar genes in different populations, and the low correlation between RNA levels and synthesis of the corresponding proteins. Nevertheless, indications of genetic potential alone do not elucidate functionality, whereas the level at which that potential is expressed by microbial communities in ecosystems does (Maron et al., 2007). Environmental transcriptomics protocols are technically difficult because prokaryotic mRNAs generally lack the poly(A) tails that make isolation of eukaryotic messages relatively straightforward (Liang & Pardee, 1992) and because mRNAs have relatively short half-lives. In addition, mRNAs are much less abundant than rRNAs in total RNA extracts, so the rRNA background often overwhelms the mRNA signal (Grunberg-Manago, 1999; Kapranov et al., 2002; Maron et al., 2007; Frias-Lopez et al., 2008; Poretsky et al., 2009). Several approaches for overcoming these challenges have recently been proposed. For example, ribosomal RNA (rRNA) subtraction has been used in combination with randomly primed RT-PCR to generate microbial community cDNA for cloning and downstream sequence analysis. Although preliminary results were encouraging, relatively large sample volumes (≈10 liters) and long sample collection times were required. Linear RNA amplification methods have been widely used to study gene expression in eukaryotic tissues but are not generally applicable to bacterial and archaeal mRNA because they require a poly(A) tail (Frias-Lopez et al., 2008).

For a long time, microbial taxonomic and functional diversity was estimated using traditional, culture-dependent approaches, which were insufficient for discovering and identifying all the organisms within an environmental sample. Advances in molecular biology methods have since improved our understanding of microbial community structure and functions and of the interactions of microorganisms with other organisms and their environment. Beyond its biotechnological applications, the metatranscriptome should reveal the pattern of gene expression of a microbial community in a complex environment; it could be used to infer the physiological status of that community and to identify the environmental conditions that have a major impact on gene expression in situ. Large-scale use of metatranscriptomic techniques could allow future targeting of function-related activities for molecular biological studies without prior knowledge of the important functional gene groups or sequence variants present in the sample.

Methods for studying the functional diversity of soils:
Determining the metabolic processes carried out by microbes is important both for understanding and for manipulating terrestrial ecosystems. Several biochemical and traditional approaches have been developed to study the functional diversity of soils. Stable isotope probing (SIP) and phospholipid fatty acid (PLFA) analysis have been widely used in many environments to elucidate their functional biodiversity. Metaproteomics, the identification of all proteins, is an approach that provides information on actual functionality with respect to metabolic pathways. Given that the majority of soil microorganisms are non-culturable, traditional approaches that rely on a culture step are limited in their ability to describe natural diversity. Consequently, a large part of microbial diversity has remained unexplored until now, as has the activity of microorganisms in terms of their contribution to nutrient and energy fluxes, soil respiration, gene transfer and pollutant degradation, all of which are largely underestimated or completely overlooked by traditional methods. DNA-based molecular techniques have made it possible to study the structure of microbial communities at the DNA level and to relate gene abundance to a given microbial group involved in a particular metabolic pathway. The metagenome is the set of DNA fragments from all the microorganisms present in a community. Metagenomics has also enabled the study of microorganisms that cannot be cultured, or not easily, in the laboratory. It is nevertheless unlikely that a metagenomic database built from genomic DNA can capture a significant fraction of the gene content of a microbial community.
Thus, to better understand how natural microbial communities respond to perturbations under environmental conditions, a community analysis based on mRNA seems more appropriate than one based on genomic DNA, since it is well established that mRNA transcripts are more directly linked to rates of enzyme production than are the concentrations of functional genes in the metagenome. An intriguing puzzle in molecular biology is how cells with an identical genetic make-up can give rise to different cell types, each of which plays a defined role in the functioning of a multicellular organism. This phenotypic diversity has been attributed to the fact that different cell types in the organism activate different sets of genes (transcriptomes), which can lead to different cell functions. The transcriptome is the sum of the transcripts (mRNAs) of an organism at a defined point in space and time. Transcriptome sequencing has been used for applications ranging from gene expression profiling, genome annotation and rearrangement detection to the discovery of non-coding RNAs, and has been a key area of biological investigation for several years. Transcriptome studies have evolved from determining the sequences of individual cDNA clones to more comprehensive efforts to build cDNA databases representing portions of the transcribed sequences of species. Because of the high cost of the Sanger method used in these studies and the complexity of the cloning procedure, producing full-length cDNAs (FL-cDNAs) is not a practical way to study functional biodiversity. Recently, large-scale sequencing has been revolutionized by the development of several technologies known as next-generation sequencing (NGS). cDNA sequencing is a field to which NGS was applied rapidly and successfully. The use of NGS to obtain transcriptomic data is collectively known as deep RNA sequencing, or RNA-seq. With these technological advances, we can now build a complete catalogue of transcripts derived from genomes ranging from those of simple unicellular organisms to complex mammalian cells, or even from normal or infected tissues. Whereas genomics and metagenomics study the genomic potential of a particular organism or of a microbial community, transcriptomics and metatranscriptomics deal, respectively, with the subsets of genes that are transcribed under certain environmental conditions. Applying transcriptome analyses to natural environments (metatranscriptomics) provides an understanding of the gene expression of the ambient microbial community in situ.
Consequently, metatranscriptomics, defined as the study of the set of RNAs from all the microorganisms of a community, constitutes the second step towards revealing the microbial functionality of soils, in combination with protein analysis. Beyond its biotechnological applications, metatranscriptomics should reveal the level of gene expression of a microbial community in a complex environment and could be used to infer the physiological status of that community. It should also make it possible to identify the environmental conditions that have a major impact on gene expression in situ. Large-scale use of metatranscriptomics could allow future screening of activities for specific molecular biology studies, without a priori knowledge of the functional gene groups that are important or variably present in soils.

CHAPTER II: Development of total RNA extraction protocol, cDNA library construction, and 454 pyrosequencing

3.1 Introduction:
Gene expression can be used to identify genes and activities essential for cellular functions under varying conditions (Borneman & Triplett, 1997). Many techniques are now available to analyze microbial community structure and function by analyzing microbial rRNA and mRNA, respectively (Kirk et al., 2004; Coleman & Whitman, 2005; Saleh-Lakha et al., 2005; Edel-Hermann et al., 2008; Gupta et al., 2008). The first logical step in any molecular microbial ecology method applied to environmental matrices is the direct extraction of nucleic acids in high yield and quality (Sørensen et al., 2009), followed by analysis of the resulting data with appropriate tools and strategies in order to estimate the taxonomic diversity and gene expression in these environments. The isolation and analysis of mRNA transcripts from environmental microbial samples is an important step towards understanding the complex processes of microbial ecology (Poretsky et al., 2005). However, the functions of prokaryotic genes and their transcripts remain difficult to study because of technical problems with the isolation of mRNA (McGrath et al., 2008). Most eukaryotic mRNAs carry 50-200 adenylic acid residues at their 3' end, compared with 15 to 60 in prokaryotes; moreover, the eukaryotic polyadenylation machinery recognizes a specific consensus near the 3' end, whereas the polyadenylation sites of prokaryotic mRNA are diverse and the reaction does not require a consensus sequence. In addition, most eukaryotic mRNAs contain a 7-methylguanosine residue (the cap) attached to the terminal residue of the initial transcript through a 5'-ppp-5' linkage, whereas prokaryotic mRNAs do not contain an exact 5' cap structure (Gu & Reddy, 2001).
In general, mRNA molecules are much less abundant than rRNA molecules in total RNA extracts (Poretsky et al., 2005), accounting for only approximately 1-5% of total RNA (McGrath et al., 2008). Moreover, because transcription and translation occur simultaneously, mRNA is usually fragmented and therefore represents a small, unstable fraction of total extracted cellular RNA, which can hardly be isolated without considerable residual rRNA contamination (McGrath et al., 2008; Shrestha et al., 2009). In addition, the very short half-life of prokaryotic mRNA (30 s, based on studies of cultured bacteria) is a major obstacle to the isolation and characterization of mRNA from soil (Sayler et al., 2001; Poretsky et al., 2005). Other factors affecting mRNA isolation from a soil matrix are potential degradation by ribonucleases and the fact that extraction methods optimized for one soil are not easily applicable to other soils (Sayler et al., 2001). Extracting total RNA has the advantage of avoiding extensive mRNA purification and amplification steps, which enables fast preparation of soil samples containing humic acids and other substances that could affect downstream molecular applications (Urich et al., 2008). Soil RNA extraction protocols basically include three stages. The first is cell lysis, the complete lysis of microorganisms to release all intracellular RNA. This is the most crucial step and the one that varies most among RNA extraction protocols. The most widely used cell lysis techniques for RNA extraction from soil are ballistic disintegration of cells using glass or zirconium beads, solubilization of cell membranes by detergent, and boiling or enzymatic degradation of the cell wall and membranes coupled with osmotic shock, usually with repeated freeze-thaw cycles. The method of choice for cell lysis can depend on the sample; typically, a more destructive mechanical approach is used as the humic and clay content increases. The remaining steps are standard across samples: the second stage is inactivation of nuclease (RNase) activity to prevent degradation and loss of RNA, and the third is extraction and purification of the RNA and removal of organic contaminants, humic acids or humic substances that are co-extracted with nucleic acids from soil and inhibit enzymatic activities, for example that of Taq DNA polymerase (Anderson & Cairney, 2004; Saleh-Lakha et al., 2005). RNA should be extracted in sufficient yield and high purity to screen gene expression and relate it to the microbial activities detected in soil (Hurt et al., 2001; Anderson & Cairney, 2004; Saleh-Lakha et al., 2005).
The essential and critical parameters for efficient recovery of nucleic acids (Hurt et al., 2001; Kirk et al., 2004; Saleh-Lakha et al., 2005) are the efficiency of cell lysis (which varies between and within microbial groups), purification from contaminants such as humic acids and other organic matter (which interfere with subsequent cDNA synthesis and PCR analysis), and the efficiency of nucleic acid recovery after lysis (subsequent purification steps can lead to loss of DNA or RNA, again potentially biasing molecular diversity analysis). Our objective was to develop an extraction protocol for the two ecosystems studied, LSM (late snowmelt) and ESM (early snowmelt). Our strategy for developing the extraction protocol rested on five basic parameters. (1) Quality: RNA extraction should yield high-quality RNA, as judged by the RNA integrity number (RIN), an estimate of the quality of extracted RNA based on a number of measured parameters, which should be over 7 (Agilent Technologies). (2) Reproducibility: extraction should be consistent between the two soils and within each soil, based on the number of successful extractions out of the total number of samples extracted. (3) Sufficient yield: the extraction yield should be within the acceptable range for subsequent pyrosequencing analysis (at least 5 µg of total extracted RNA in 100 µl). (4) Variability: because it is difficult to compare data generated using different nucleic acid extraction protocols, the final protocol should be suitable for RNA extraction from the different soil samples studied here, taking into account the different characteristics of each soil (e.g. variable content of humic acids and organic matter). (5) Productivity: sufficient RNA yield and quality should be obtained from limited quantities of starting soil material (0.5 g). Once RNA has been extracted in high yield and quality, the next step is to convert it into cDNA. Several strategies are used, ranging from converting the total extracted RNA to first purifying mRNA from the total extracted RNA. The cap structure of the mRNA can be used as a tag for an intact and complete RNA template. For example, capped eukaryotic mRNA can be specifically selected by affinity retention using a cap-binding protein coupled to a solid support, or through the chemical introduction of a biotin group into the diol residue of the cap structure (Edery et al., 1995). Alternatively, an oligo-capping technique is used in which a synthetic oligonucleotide replaces the cap structure and is ligated to the mRNA (Suzuki et al., 1997). The generation of full-length cDNAs from an mRNA template has been the subject of substantial development in biotechnology research. Conventional reverse transcription (RT) reactions use either an oligo(dT) primer or random hexamers to synthesize single-stranded cDNA. The synthesis of full-length double-stranded cDNA relies on more indirect approaches.
Schmidt & Mueller (1999) presented CapSelect as a novel technique for the selective enrichment of full-length cDNAs, which was successfully applied to the specific enrichment of complete mRNA 5' ends in PCR-mediated sequence analysis, cDNA library construction and direct transcription start site mapping. Another method relies on template-switching PCR (TS-PCR), which takes advantage of the terminal transferase activity and template-switching ability of Moloney murine leukemia virus reverse transcriptase (MMLV-RT) to add an arbitrary sequence to the 5' end of a transcript in a manner that occurs preferentially for capped, full-length transcripts. This arbitrary sequence, encoded in a TS oligo, along with a second arbitrary sequence added at the 3' end of the cDNA by the oligo-dT-containing primer used to initiate reverse transcription, is subsequently used to amplify the total pool of putatively full-length transcripts (Zhu et al., 2001; Kapteyn et al., 2010). Reverse transcriptase can also synthesize cDNA in a primer-independent manner, which is thought to be caused by self-priming arising from RNA secondary structure; this results in random cDNA synthesis. Furthermore, reverse transcriptases have lower fidelity than other polymerases owing to their lack of proofreading mechanisms, and their RNA-to-cDNA conversion efficiency varies with the experimental conditions (Ozsolak & Milos, 2010). Despite the difficulties of RNA extraction, numerous protocols have been developed for total RNA extraction. Most of these have been optimized for specific samples and have not been applied to diverse environmental samples. Therefore, the choice of an appropriate protocol for isolating RNA, and of an appropriate protocol for converting the extracted RNA into cDNA prior to pyrosequencing, is essential for subsequent characterization. The metatranscriptomic approach used in this study to analyze the functional diversity of two very different ecosystems within the alpine tundra environment, the late snowmelt (LSM) and early snowmelt (ESM) locations, could represent an advance in understanding the factors that influence gene expression in these environments in relation to snow-cover duration and other biotic and abiotic factors.

3.2 Materials and methods:
3.2.1 Soil samples:
The sampling site was located in the Grand Galibier massif in the French south-western Alps, between Grenoble and Briançon (elevation 2520 m). Two alpine ecosystems with contrasting snow-cover regimes, early snowmelt (ESM) and late snowmelt (LSM) sites, were studied. Sampling was carried out in the middle of the growing season, in August 2008: three soil samples per plot were collected from the top 10 cm of soil, sieved (2 mm mesh size) to remove fine roots and large organic debris, frozen on dry ice and then stored at -80°C. Before RNA isolation, 10 g of soil was ground for 10 min in chilled mortars under liquid N2 and stored in 2 g aliquots at -80°C.

Figure 14: Alpine ecosystem: (a) general landscape; (b) the two studied ecosystems, showing differences in color (different organic matter content) and in vegetation cover.

3.2.2 Total RNA extraction:
3.2.2.1 Extraction method:
RNA was extracted from the two soil locations (LSM and ESM) according to the extraction parameters mentioned above. Because RNA is unstable and easily degraded by RNases, all solutions used for RNA extraction were prepared with RNase-free stocks and DEPC-treated water. The flowchart of the whole experimental procedure is shown in Fig. 15.

Figure 15: Schematic outline of the experimental workflow. Text in grey indicates steps that were not adopted in the final protocol.

Two extraction methods were used. First, extractions were carried out using the RNA PowerSoil™ Total RNA Isolation Kit according to the manufacturer's instructions. Second, the initial steps of the RNA extraction protocol proposed by Bailly et al. (2007) were adopted and subsequently modified to suit our soils. The general steps of the protocol are as follows. (1) 0.5 g of frozen ground soil was added to tubes containing 0.5 g of glass beads (0.6 mm diameter; Sigma) and 1 ml of lysis buffer (different lysis buffer compositions were tried; Table 2).

Components                      mix1      mix2      mix3      mix4
Lysis solution (pH 9)
  Tris-HCl                      92.5 mM   85 mM     75 mM     45 mM
  Na2EDTA                       18.5 mM   17 mM     15 mM     9 mM
  NaCl                          92.5 mM   85 mM     75 mM     45 mM
  SDS                           1.85%     1.7%      1.5%      0.9%
Denaturant solution (pH 8)
  Guanidine isothiocyanate      100 mM    400 mM    800 mM    100 mM
  Tris-HCl                      0.25 mM   1 mM      2 mM      0.2 mM
  Na2EDTA                       0.02 mM   0.1 mM    0.2 mM    0.02 mM
β-mercaptoethanol (µl)          50        50        50        25
Phenol (pH 5) (µl)              0         0         0         500
Table 2: Composition of initial lysis buffer.

(2) Cells were further disrupted in this mix by agitation for 10 min at maximum speed at room temperature using a Vortex-Genie 2 (Scientific Industries), then centrifuged at 14000 x g for 5 min.
(3) Two successive extractions were performed by adding 1 ml of phenol:chloroform:isoamyl alcohol (25:24:1, by vol.), vortexing for 30 s and centrifuging for 10 min at 14000 x g; the upper aqueous phase was then carefully transferred to new tubes, avoiding the interphase and the lower phase.
(4) The aqueous phase from the previous step was purified twice with chloroform:isoamyl alcohol (24:1, by vol.) to remove as much residual phenol as possible (essential for later enzymatic reactions). After manual vortexing and centrifugation for 10 min at 14000 x g, the upper aqueous phase was carefully transferred to new tubes.
(5) Nucleic acids were then precipitated for 30 min at -80°C by adding 0.1 volume of 3 M Na-acetate (pH 5.2) and 2.5 volumes of 100% ethanol to the aqueous phase, then centrifuged for 15 min at 15000 x g.
(6) The nucleic acid pellet was then resuspended in 100 µl of DEPC-treated H2O.
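The reagent volumes in step (5) scale with the volume of aqueous phase recovered. The following is a minimal sketch of that arithmetic, assuming both ratios are taken relative to the aqueous-phase volume; the function name and the 400 µl example volume are illustrative and do not come from the protocol.

```python
def precipitation_volumes(aqueous_ul):
    """Return (Na-acetate, ethanol) volumes in µl for the precipitation step.

    Assumption: both ratios are relative to the recovered aqueous phase.
    """
    if aqueous_ul <= 0:
        raise ValueError("aqueous phase volume must be positive")
    sodium_acetate_ul = 0.1 * aqueous_ul  # 0.1 volume of 3 M Na-acetate (pH 5.2)
    ethanol_ul = 2.5 * aqueous_ul         # 2.5 volumes of 100% ethanol
    return sodium_acetate_ul, ethanol_ul


# Hypothetical example: 400 µl of aqueous phase recovered after step (4).
acetate, ethanol = precipitation_volumes(400.0)
print(f"Na-acetate: {acetate:.0f} µl, ethanol: {ethanol:.0f} µl")
```

If the ethanol were instead dosed on the combined volume of aqueous phase plus acetate, the second ratio would need to be applied to 1.1 times the aqueous volume.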

3.2.2.2 RNA purification and DNA elimination:
Nucleic acids were further purified as follows. (1) 40 µl of LiCl (4 M) was added to the purified soil RNA, which was precipitated overnight at 4°C and centrifuged at 14000 x g for 15 min; the RNA pellet was then resuspended in 40 µl of DEPC-treated water. (2) Residual DNA was digested at ambient temperature for 25 min using 3 Kunitz units/µl of DNase I (QIAGEN). Low-molecular-weight contaminants and DNA residues were further eliminated on RNeasy® Mini Kit columns (QIAGEN), following the manufacturer's instructions.
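As a rough check on the LiCl precipitation, the final LiCl concentration follows from a simple dilution calculation. The sketch below assumes that the full 100 µl of resuspended nucleic acids from the extraction is carried into this step, which the text does not state explicitly; the function name is illustrative.

```python
def final_concentration(stock_molar, stock_ul, sample_ul):
    """Concentration of the stock solute after mixing it into the sample."""
    total_ul = stock_ul + sample_ul
    if total_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock_molar * stock_ul / total_ul


# 40 µl of 4 M LiCl added to an assumed 100 µl of RNA solution.
licl_molar = final_concentration(stock_molar=4.0, stock_ul=40.0, sample_ul=100.0)
print(f"Final LiCl concentration: {licl_molar:.2f} M")  # prints 1.14 M
```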

3.2.2.3 RNA quality and quantity estimation:
RNA yield and quality obtained with the two extraction methods described above were estimated using: (1) Migration of nucleic acids on a 1.5% agarose gel. An extraction was considered successful if two distinct bands of 16S/18S and 23S/28S rRNA were visible, without the smear that reflects RNA degradation. (2) Ultraviolet (UV) absorbance on a NanoDrop ND-1000 spectrophotometer, where the 260/280 absorbance ratio (which reflects the purity of the nucleic acids relative to protein) provides an estimate of the quality of the extracted nucleic acids and should be between 1.80 and 2.00 (Aranda et al., 2009). (3) Capillary electrophoresis using the Bioanalyzer 2100 with the RNA 6000 Nano Kit (Agilent Technologies), which gives the RNA integrity number (RIN), a major step in the standardization of RNA integrity assessment (acceptable values should be over 7) (Schroeder et al., 2006).
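The three acceptance criteria above can be expressed as a single check. This is only an illustrative sketch: the class and function names are hypothetical, the example values are invented, and requiring all three criteria at once is an assumption about how they were combined.

```python
from dataclasses import dataclass

RATIO_RANGE = (1.80, 2.00)  # acceptable 260/280 range given in the text
MIN_RIN = 7.0               # RIN should be over 7


@dataclass
class RnaExtract:
    name: str
    ratio_260_280: float
    rin: float
    rrna_bands_visible: bool  # two distinct rRNA bands, no smear, on the gel


def passes_qc(extract):
    """True if the extract meets the gel, 260/280 and RIN criteria together."""
    low, high = RATIO_RANGE
    return (
        extract.rrna_bands_visible
        and low <= extract.ratio_260_280 <= high
        and extract.rin > MIN_RIN
    )


# Hypothetical extracts, for illustration only.
examples = [
    RnaExtract("extract_A", 1.92, 8.1, True),
    RnaExtract("extract_B", 1.65, 5.4, False),
]
for extract in examples:
    print(extract.name, "pass" if passes_qc(extract) else "fail")
```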

3.2.3 cDNA preparation:
The MINT cDNA synthesis kit (Evrogen) was used to convert the total RNA extracted from LSM and ESM samples into cDNA according to the manufacturer's instructions. The technology used in this kit (Fig. 16) depends on specific properties of MINT reverse transcriptase (RT). First-strand cDNA synthesis starts from the 3'-primer, which comprises an oligo(dT) sequence (5'-AAGCAGTGGTATCAACGCAGAGTAC(T)30VN-3') that anneals to the poly(A)+ stretch of RNA. When RT reaches the 5' end of the mRNA, it adds a non-template oligo(dC) stretch to the 3' end of the newly synthesized first-strand cDNA. This oligo(dC) stretch base-pairs with the complementary 3'-end oligo(dG) of the PlugOligo primer (5'-AAGCAGTGGTATCAACGCAGAGTACGGGGG-P-3'). RT identifies the PlugOligo as an extension of the RNA template and continues first-strand cDNA synthesis to the end of the oligonucleotide, thus incorporating the PlugOligo sequence into the 5' end of the cDNA. The efficiency of this step is increased by the IP-solution. The last 3'-dG residue of the PlugOligo is a terminator nucleotide carrying a 3'-phosphate group, which prevents unwanted annealing and extension of the PlugOligo. In the final step, ds-cDNA is amplified by PCR.


Figure 16: Schematic outline of the Mint cDNA synthesis workflow (Schmidt & Mueller, 1999).

Double-stranded cDNAs were synthesized and amplified by performing 10 PCRs of 24, 27 and 31 cycles on each sample, using primer M1 (5'-AAGCAGTGGTATCAACGCAGAGT-3'). PCR products from the same sample were then pooled and purified using the QIAquick® PCR Purification Kit. Six cDNA libraries were prepared from the LSM and ESM locations (three replicates each). These six libraries were then pooled to produce two final cDNA libraries, one for LSM and one for ESM.
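A single primer can plausibly serve for the PCR step because the M1 sequence is identical to the shared 5' portion of both the 3'-primer and the PlugOligo given in section 3.2.3. The short sketch below simply verifies that relationship on the sequences quoted in the text; the variable and function names are illustrative, and the (T)30VN tail of the 3'-primer is left out.

```python
M1 = "AAGCAGTGGTATCAACGCAGAGT"
THREE_PRIME_ADAPTER = "AAGCAGTGGTATCAACGCAGAGTAC"  # part preceding (T)30VN
PLUG_OLIGO = "AAGCAGTGGTATCAACGCAGAGTACGGGGG"


def shares_primer_prefix(oligo, primer=M1):
    """True if the oligo starts with the full primer sequence."""
    return oligo.startswith(primer)


for name, seq in [("3'-primer", THREE_PRIME_ADAPTER), ("PlugOligo", PLUG_OLIGO)]:
    print(name, shares_primer_prefix(seq))
```

Both checks return True, so cDNA flanked by these two adaptor sequences carries an M1 priming site at each end.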

3.2.4 454 Pyrosequencing:
Pyrosequencing was carried out at Genoscope (Centre National de Séquençage, Evry, France). First, the cDNA was nebulized to produce short fragments approximately 500 bp in length. Short adaptors were then ligated to the blunted fragments: A1 (C*C*A*T*CTCATCCCTGCGTGTCTCCGAC*T*C*A*G-3'), extended by the 3' tag ACGAGTGCGT for LSM and ACGCTCGACA for ESM, and B1 (5'-biotinylated C*C*T*A*TCCCCTGTGTGCCTTGGCAGTC*T*C*A*G), where an asterisk indicates a phosphorothioate-modified base. DNA molecules containing both A and B were selected. The two cDNA libraries were mixed and pyrosequenced jointly according to Margulies et al. (2005) on one fourth of a plate, with the modifications required for the 454 GS-FLX Titanium technology (Roche).
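Because the two libraries were sequenced together, reads have to be assigned back to LSM or ESM using the tags appended to adaptor A. The sketch below shows one simple way this could be done; it is not the pipeline used in this work. It assumes that each read begins with its tag, that matching is exact with no allowance for sequencing errors, and the example reads are invented.

```python
# Tags from the text, mapped to the library they identify.
TAGS = {"ACGAGTGCGT": "LSM", "ACGCTCGACA": "ESM"}


def assign_library(read):
    """Return (library, read without tag), or (None, read) if no tag matches."""
    for tag, library in TAGS.items():
        if read.startswith(tag):
            return library, read[len(tag):]
    return None, read


# Hypothetical reads, for illustration only.
reads = ["ACGAGTGCGTAAGCAGTGGTATC", "ACGCTCGACATTGACCA", "GGGGTTTTAAAC"]
for read in reads:
    print(assign_library(read))
```

Reads that match neither tag are returned unassigned rather than guessed, which keeps the two libraries from contaminating each other.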

3.3 Results and discussion:
3.3.1 Total RNA extraction:
3.3.1.1 RNA PowerSoil™ Total RNA Isolation Kit:
Two experiments were carried out using the RNA PowerSoil™ Total RNA Isolation Kit. Experiment I: 2 LSM and 2 ESM samples were extracted with this protocol following the manufacturer's instructions, followed by further purification by LiCl precipitation. RNA quality and yield of the final extracts before and after precipitation, estimated with the NanoDrop spectrophotometer, are shown in Table 3.

          Crude RNA extractions (1)         After RNA precipitation using LiCl (2)
Sample    260/280   yield (µg/g of soil)    260/280   yield (µg/g of soil)
LSM1      1.75      12.9528                 1.88      0.9096
LSM2      1.74      40.2224                 1.5       1.2312
ESM1      -         -                       -         -
ESM2      1.91      31.0752                 1.81      4.3088
Table 3: RNA quality and concentration estimated using the NanoDrop ND-1000 spectrophotometer, (1) before and (2) after precipitation using LiCl.
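The loss of material during LiCl precipitation can be quantified directly from the yields in Table 3. The following sketch computes the percentage of the crude yield recovered after precipitation; the dictionary and function names are illustrative, and ESM1 is omitted because no values were obtained for it.

```python
# (crude yield, yield after LiCl) in µg of RNA per g of soil, from Table 3.
TABLE3_YIELDS = {
    "LSM1": (12.9528, 0.9096),
    "LSM2": (40.2224, 1.2312),
    "ESM2": (31.0752, 4.3088),
}


def recovery_percent(crude, purified):
    """Percentage of the crude yield still present after purification."""
    return 100.0 * purified / crude


for sample, (crude, purified) in TABLE3_YIELDS.items():
    print(f"{sample}: {recovery_percent(crude, purified):.1f}% recovered")
```

For all three samples the recovery is below 15% of the crude yield.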

The PowerSoil RNA extraction kit gave yields that varied between sites and within each site. Reproducibility was not achieved, since extraction of the ESM1 sample failed (Table 3). Quality, as estimated from the NanoDrop 260/280 ratio, was also variable, which could be related to other substances co-extracted with the nucleic acids, as indicated by the dark-brown color of the final extracts. Although further purification by LiCl precipitation improved the quality (260/280 ratio) of some extracts (LSM1, Table 3), it reduced the yield. After repeating the extraction procedure for LSM and ESM samples several times (data not shown), and considering quality (the 260/280 ratio was largely outside the acceptable range) and productivity (half of the ESM samples were always poorly extracted or failed completely), we concluded that the PowerSoil RNA extraction kit followed by precipitation of salts and contaminants with lithium chloride (LiCl) was not suitable for obtaining appropriate quality and yield for both soils. Experiment II: the presence of DNA, and the failure of LiCl to separate it from the RNA, prompted us to try DNA digestion. The same soil samples (2 LSM and 2 ESM) were extracted again using the same protocol (RNA PowerSoil™ Total RNA Isolation Kit) following the manufacturer's instructions, except that LiCl precipitation was replaced by digestion of residual DNA with DNase I and purification on an RNeasy column, following the manufacturer's instructions. This modification clearly improved the quality of the extracted RNA as estimated by NanoDrop. RNA yield and quality (Table 4) were acceptable for both LSM and ESM samples in terms of the NanoDrop 260/280 ratio, although yields varied between and within sites.

          Crude RNA extractions (1)         After RNA purification and DNA elimination (2)
Sample    260/280   yield (µg/g of soil)    260/280   yield (µg/g of soil)
LSM1      1.83      2.4322                  1.89      1.3394
LSM2      1.71      1.0572                  2.02      0.7866
ESM1      1.77      2.0734                  1.78      2.0838
ESM2      1.89      5.5812                  2.05      2.6626
Table 4: RNA quality and concentration estimated using the NanoDrop ND-1000 spectrophotometer, (1) before and (2) after digestion with DNase I and purification on RNeasy columns.

However, further quality estimation using the Bioanalyzer 2100 with the RNA 6000 Nano Kit (Agilent Technologies) (Fig. 17) showed unacceptable RIN values for both LSM and ESM extracts. This could be because our procedure also extracts large amounts of extracellular protein and humic acids from soil particles, which could interfere with the spectra of nucleic acids. The Agilent profiles also revealed RNA degradation, especially for ESM samples, and small amounts of potential nucleic acid (DNA) were still recovered after DNA digestion.

Figure 17: Electropherograms of 2 LSM and 2 ESM samples after digestion with DNase I and purification on RNeasy columns, obtained with the Bioanalyzer 2100 RNA 6000 Nano Kit from Agilent Technologies.

Digestion of residual DNA with DNase I and purification on RNeasy columns thus improved the yield and quality of the RNA PowerSoil™ total RNA extracts obtained from the same samples (LSM and ESM). However, the protocol proved better suited to LSM than to ESM samples (Experiments I and II). Extraction from ESM soils was irregular in both reproducibility and yield, which prevented adoption of this protocol as the final protocol (data not shown). These results suggest that the modified protocol gave extracts with acceptable yield, even though yield varied between sites and within each site. In contrast, quality estimates were outside the appropriate range and showed substantial degradation, so the modification did not circumvent the problem of RNA degradation. Given, in addition, the total failure of RNA extraction for certain ESM samples, and the fact that comparing data generated with different nucleic acid extraction protocols can markedly influence estimates of soil microbial diversity, we decided to abandon this protocol and to test another.

3.3.1.2 Modified protocol proposed by Bailly et al. (2007):

Several extractions were first carried out with the exact RNA protocol proposed by Bailly et al. (2007). Two LSM samples and four ESM samples were extracted with this protocol (Fig. 18).

Figure 18: RNA quality estimation by agarose gel. Lanes 1-2 correspond to LSM1 and LSM2, respectively. Lanes 3-6 correspond to ESM1, ESM2, ESM3 and ESM4, respectively.

Figure 18 clearly shows degradation of the RNA extracts, visible as a slight smear starting from the rRNA bands and extending towards shorter fragments. The Agilent analysis (Fig. 19) confirms this degradation through the absence of the two distinct rRNA bands (16/18S and 23/26S), especially for ESM samples. RIN values were well below 7, especially for ESM samples.


Figure 19: Electropherograms for quality estimation of the total RNA extracts, based on the Agilent Bioanalyzer.

Using the exact protocol proposed by Bailly et al. (2007) gave extracts with relatively acceptable yield but poor quality, especially for ESM samples. This protocol was therefore not suitable in terms of quality, reproducibility and variability, which prompted us to modify it to suit the specific characteristics of our soil samples. The first parameter changed was soil sample size, which had no obvious effect on extraction quality or yield: there was no difference between 0.5 g and 1 g of soil (data not shown). The second parameter changed was the composition of the lysis buffer; several compositions were tested (Table 2). Mixes 1, 2 and 3 did not give acceptable results and showed evidence of RNA degradation, particularly for ESM samples (data not shown). Because of this degradation during extraction, we hypothesized that adding phenol to the initial lysis buffer, combined with the modifications above, could prevent the immediate degradation of RNA. The extraction was carried out as rapidly as possible to limit mRNA degradation. Further modifications were to keep the samples on ice throughout the procedure, to transfer the soil immediately from -80 °C into tubes pre-chilled in liquid N2, and to shake immediately to ensure thorough mixing of the soil with all components of the lysis buffer, including the phenol. Moreover, because LSM and ESM samples differ in organic matter content, two phenol:chloroform:isoamyl alcohol extractions followed by two chloroform:isoamyl alcohol purification steps were performed to remove as many of the contaminants co-eluted with the RNA as possible. With these modifications, two soil samples, one LSM and one ESM, were extracted. After the first phase separation there were two phases separated by an opaque creamy interphase, with no distinct difference between the two phases. After the second phase separation, most of the dark-brown color was found in the chloroform:isoamyl alcohol phase, which was separated from a clear light-yellow phase by a creamy interphase whose thickness varied with the type of soil (LSM or ESM). Consequently, the phenol phase contained significantly less brown smear, suggesting that most humic compounds and contaminants were successfully removed or transferred to the lower layer. The results in Fig. 20 show that these modifications gave excellent RNA yield and quality for both LSM and ESM samples.

Figure 20: RNA extraction quality and concentration estimations based on (A) a 1.5% agarose gel and (B) the Agilent Bioanalyzer.

The modifications introduced to the protocol proposed by Bailly et al. (2007), especially the lysis buffer composition (mix 4), substantially improved RNA extraction quality and yield. Figure 20 clearly shows the two distinct ribosomal bands (16/18S and 23/28S), with no obvious traces of degradation and with excellent RIN values. With these modifications we obtained total RNA with RNA integrity numbers of 7.6 and 7.3 for LSM and ESM respectively, and with a yield sufficient for generating cDNA libraries, provided that several extractions from each sample are pooled. Three replicates from LSM samples and three from ESM samples were then extracted with the modified protocol. The quality of the total RNA from these six samples, estimated with the Agilent Bioanalyzer, is shown in Fig. 21.


Figure 21: Electropherograms from Agilent Bioanalyzer analysis of the six total RNA extracts from LSM and ESM soil samples.

3.3.2 cDNA libraries preparations:

Six cDNA libraries were prepared from the total RNA extracted from the 3 LSM and 3 ESM samples shown in Fig. 21, using the MINT cDNA synthesis kit according to the manufacturer's recommendations. Fig. 22 shows the quality of the six cDNA libraries, estimated by capillary electrophoresis with the Agilent Bioanalyzer.

Figure 22: Electropherograms from the Agilent Bioanalyzer for the six cDNA libraries constructed from the 3 LSM and 3 ESM samples.

The three replicates for each sample type (LSM and ESM) were pooled to produce two libraries for pyrosequencing. Quality estimation by migration on agarose gel (Fig. 23, A) and by spectrophotometry (Fig. 23, B) showed that the two final cDNA libraries were of high quality based on the 260/280 ratio (1.84 and 1.86 for LSM and ESM respectively), with sufficient final yield (7.27 µg/100 µl and 9.86 µg/100 µl for LSM and ESM respectively).

Figure 23: Quality estimation of the cDNA libraries prepared from LSM and ESM samples by (A) agarose gel and (B) ultraviolet detection of nucleic acids at 260 nm with a NanoDrop spectrophotometer.

Figure 24: Electropherograms from the Agilent Bioanalyzer for the two final cDNA libraries (LSM and ESM).

3.4 Discussion:

Total RNA was extracted from two alpine soil sites: LSM (Late Snow Melt) and ESM (Early Snow Melt). We established a final extraction protocol that gave appropriate results in terms of quality (all extracts had excellent RIN values), reproducibility (the protocol was successful for both LSM and ESM samples), yield (the final RNA yield was in the suitable range, provided that several extractions from each sample are pooled) and productivity (extractions were successful even from a limited amount of starting material, 0.5 g of soil). Extracting total RNA from two ecosystems that differ completely in their abiotic and biotic factors, owing to different snowmelt regimes and vegetation cover, makes it harder to adopt a single protocol appropriate for both soils. Our results confirm that, even when the initial steps are the same in all RNA extraction procedures, some protocols are more appropriate for certain soils than for others, depending on the properties of each soil and on how well the protocol is adapted to them (Sayler et al., 2001). The presence of phenol in the lysis buffer was crucial for RNA stabilization, as phenol is an important agent for nucleic acid separation and may protect against degradation (Kirby et al., 1956). Moreover, carrying out the whole extraction under cold conditions from the start, with sterilized material and in the presence of phenol, could circumvent the problem of RNA degradation. Overall, obtaining RNA with high yield and quality is crucial for successful downstream analysis, which is itself a substantial step towards detecting and identifying the active members of the community as opposed to the dormant ones. The introduction of high-throughput next-generation DNA sequencing (NGS) technologies revolutionized transcriptomics by allowing RNA analysis through cDNA sequencing at massive scale (RNA-seq) (Ozsolak & Milos, 2010). This development eliminated several challenges posed by earlier technologies and was an important step towards elucidating structural and functional diversity, and the factors controlling it, in natural environments, among them alpine environments (Edwards et al., 2007). Specific polyadenylation at the 3' end of RNA was long believed to be unique to eukaryotic messenger RNAs. Studies over the last ten years of the structural gene for poly(A) polymerase I in Escherichia coli showed that RNA polyadenylation does not occur only in eukaryotes (Jasiecki & Wegrzyn, 2003). However, the 3' ends of prokaryotic and eukaryotic transcripts are very different: most eukaryotic mRNAs carry a poly(A) tail, in contrast to those of prokaryotes (Gu & Reddy, 2001).
RNA-seq strategies often involve a poly(A) mRNA-enrichment step. Polyadenylation of transcripts also takes place during transcript degradation, so poly(A) enrichment may also enrich for degradation products of RNA polymerase I transcripts and other RNAs (Ozsolak & Milos, 2010). Targeting the poly(A) fraction should therefore recover essentially the eukaryotic fraction of microbial communities. In this context, a metatranscriptomic approach that targets the eukaryotic community through poly(A) tails should be an important step towards understanding the role of eukaryotes in the alpine environment, and towards elucidating how these communities respond to climatic change and to vegetation covers that differ in growth strategy and in associated microbial communities, particularly at the two studied locations, late snowmelt (LSM) and early snowmelt (ESM).

Experimental procedures:

Protocols for extracting RNA from soil basically comprise three steps: cell lysis, that is, the complete lysis of microorganisms to release all intracellular RNA; inactivation of nuclease (RNase) activity to prevent degradation and loss of RNA; and finally extraction and purification of the RNA to remove organic pollutants and humic acids or humic substances, which are co-extracted with soil nucleic acids and inhibit enzymatic activities such as that of Taq DNA polymerase. It is therefore highly recommended to extract RNA in sufficient quantity and at high quality in order to search for expressed genes and relate them to the microbial activities detected in the soil. Our objective was to develop an extraction protocol for the two ecosystems studied, LSM (soils subject to late snowmelt) and ESM (soils subject to early snowmelt). Our strategy for developing the extraction protocol rested initially on five basic criteria: (1) quality: the extraction should produce high-quality RNA extracts, based on RIN (RNA integrity number) values, which had to be above 7; (2) reproducibility: extraction had to be consistent between the two samples and within each sample, judged by the number of successful extractions out of the total number of samples; (3) yield: the quantity extracted should be in the range acceptable for pyrosequencing (at least 5 µg of total extracted RNA in 100 µl); (4) variability: data produced with different RNA extraction protocols are very difficult to compare, so the chosen protocol had to be suitable for RNA extraction from the different soil samples studied here, taking into account the characteristics of each soil; and finally (5) productivity: obtaining RNA extracts of sufficient quality from limited quantities of soil (0.5 g). Despite the difficulties of RNA extraction, many protocols have been developed for the extraction of total RNA. Most of them were optimized for specific samples and have not been applied to diverse environmental samples. The choice of appropriate protocols for RNA isolation, and more particularly for converting the extracted RNA into cDNA before pyrosequencing, is therefore essential for the reliable characterization of ecosystem functioning. Different strategies have been used to study ecosystem functioning in relation to the structure of microbial communities. The isolation and analysis of mRNA transcripts from environmental microbial samples is an important step towards improving our understanding of the complex processes that matter in microbial ecology. Once RNA has been extracted to high standards of quality and quantity, the next step is to convert it into cDNA. Generating full-length cDNAs from mRNAs is an important challenge for biotechnological research. Conventional reverse transcription (RT) reactions use either the oligo(dT) approach or random hexamers to synthesize the first cDNA strand. The metatranscriptomic approach used in this study to analyze the functional diversity of completely different terrestrial ecosystems (in relation to their snow cover, LSM and ESM, in an alpine tundra context) could be the next step towards understanding the factors that influence gene expression in these environments in relation to the duration of snow cover and other biotic and abiotic factors.

Chapter III 454 pyrosequencing data analysis

4.1 Introduction:

Various strategies have been used to study the relationship between ecosystem functioning and the structure of microbial communities. Essential objectives of these efforts are to attribute key functions to specific community members and, with regard to ecosystem stability, to reveal cooperation between community members and functional redundancies (Dinsdale et al., 2008; Morales & Holben, 2010). The enormous microbial diversity present in a natural ecosystem is a rich resource for the discovery of unknown microbes and of the novel genes and proteins they encode (Schmidt et al., 2007). The first step after obtaining sequences, by 454 pyrosequencing or other approaches, from nucleic acids extracted from environmental samples such as soil is to analyze the data with appropriate databases and software. Regardless of the sequencing approach, the first steps in the analysis of any metagenome, and by extension any metatranscriptome, are sequence trimming (removal of the primers and adapters used for sequencing), followed by size selection (removal of short reads and of low-quality reads containing, for example, stretches of polyA/T), and then comparison of the sequences with known sequence databases. Several databases and associated software tools have recently been developed to analyze the enormous amounts of data obtained from sequencing and other molecular approaches. One of the earliest and most important databases is GenBank, a comprehensive database of publicly available nucleotide sequences for more than 380,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects.
The importance of this database comes from its daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ), which ensures worldwide coverage (Benson et al., 2010). Another widely used database is the UniProt Knowledgebase (UniProtKB), the central access point for extensive curated protein information, including function, classification and cross-references, accessible at http://www.ebi.ac.uk/uniprot/. It consists of two sections. UniProtKB/Swiss-Prot contains records with full manual annotation, or computer-assisted and manually verified annotation, performed by biologists and based on published literature and sequence analysis. UniProtKB/TrEMBL contains records with computationally generated annotation and large-scale functional characterization (Wu et al., 2006). The SEED database is more recent and is limited by the number of genomes it contains: 58 archaeal, 902 bacterial, 562 eukaryal, 1713 viral and 0 environmental genomes. Of these, only 58 archaeal, 868 bacterial and 29 eukaryal genomes are more or less complete (http://theseed.uchicago.edu/FIG/index.cgi). The SEED database is used together with several other databases, including rDNA databases such as RDP-II and the European 16S RNA database, and boutique databases such as the chloroplast database, the mitochondrial database and the ACLAME database of mobile elements. With advances in molecular and biochemical approaches for analyzing natural environments, many databases have been created and are continually developed to obtain the best estimate of the taxonomic and functional diversity of the environments studied. This computationally intensive task provides the basic data types for many subsequent analyses, including phylogenetic comparisons and functional annotations. An important step in obtaining a high-quality genome sequence is to assemble short reads correctly into longer sequences that accurately represent contiguous genomic regions. Current sequencing technologies continue to offer increases in throughput and corresponding reductions in cost and time. Unfortunately, the benefit of obtaining a large number of reads is counterbalanced by sequencing errors, with different biases observed on each platform, and by the amount of data, which requires powerful computers to analyze. Although software is available to assemble reads from each individual system, no procedure has been proposed for high-quality simultaneous assembly of reads from a mix of technologies (Boisvert et al., 2010; Laserson et al., 2011).
A number of DNA sequence assembly programs have been developed, such as PHRAP (Green, 1996), TIGR Assembler (Sutton et al., 1995) and SEQAID (Peltola et al., 1984). The CAP3 assembler has been widely used in genome sequencing projects. CAP3 is an improved version of CAP with the following major phases: (1) removal of poor end regions of reads, (2) computation of overlaps between reads, (3) removal of false overlaps, (4) construction of contigs, and finally construction of multiple sequence alignments and generation of consensus sequences (Huang & Madan, 1999). Assembly and other analyses of pyrosequencing data can be assessed with rarefaction or accumulation curves. An accumulation curve is a plot of the cumulative number of types observed versus sampling effort, and it indicates how well a community has been sampled. The idea that microbial diversity cannot be estimated comes from the fact that many microbial accumulation curves are linear or close to linear because of high diversity, small sample sizes, or both (Hughes et al., 2001). Non-metric multidimensional scaling (NMDS) can be used to compare datasets with different ecological variables. NMDS reduces multidimensional similarity data to a low-dimensional ordination in which relative distance indicates relative similarity (i.e. samples with very similar composition lie close together, and vice versa). The procedure uses an iterative algorithm to successively refine sample positions in the ordination until they match the underlying similarity matrix as closely as possible (Bennett et al., 2008). The next step after trimming, size selection and assembly is to assess the taxonomic content of the sample and to link functions to specific populations by assigning phylogenies to specific genetic fragments, which is a huge challenge in biodiversity studies (Morales & Holben, 2010). Several methods have recently been developed to estimate taxonomic and functional diversity in natural environments. These methods have now progressed to the point where they can be, and are being, used for a meaningful assessment of the diversity (taxonomic and functional) and functioning of soil biota (Gori et al., 2010; Gupta et al., 2008; Schreiber et al., 2010). However, since the majority of organisms in environmental samples belong to unknown taxonomic groups, one of the biggest challenges is not just to catalog the known organisms but also to identify and characterize new organisms belonging to known or unknown taxonomic groups. These organisms could belong to an entirely new species, genus, family, order, class or even phylum (Mohammed et al., 2010). Metatranscriptomic analyses can, in principle, be used to associate specific microbial taxa with in situ expression dynamics.
However, phylogenetic inference based on protein-coding genes is highly dependent on a given gene's conservation across taxa, the depth of taxonomic sampling, taxon richness and evenness in the sample, and sequence read length. The accuracy of taxonomic assignment may also be reduced by horizontal gene transfer (Shrestha et al., 2009). Phylogenetic analysis gives a simple look at “who is in the sample?” (Schreiber et al., 2010). In 454 pyrosequencing datasets, the abundance of reads assigned to taxa or phylotypes is commonly interpreted as a measure of gene or taxon abundance, useful for quantitative comparisons of community similarity (Amend et al., 2010). Only a small fraction of the reads generated by 454 pyrosequencing can be taxonomically characterized, depending on the size of the marker gene database used (Krause et al., 2008) and on the phylogenetic diversity of the samples. Most current approaches to taxonomic classification are based on similarity between the organisms to be classified and known, correctly classified organisms (Kotamarti et al., 2010; Schreiber et al., 2010). 16S rRNA, 18S rRNA or other slowly evolving marker genes are frequently used as phylogenetic anchors to predict the taxonomic origin of environmental genomic fragments (Krause et al., 2008). The use of phylogenetic anchors limits this approach to sequence fragments physically linked to the genes used for phylogenetic reconstruction (Morales & Holben, 2010). The first tool used for taxonomic assessment, and still widely used, is the Basic Local Alignment Search Tool (BLAST; Altschul et al., 1990). With BLAST, each read is assigned to its best-matching reference sequence, called the best hit (BH); this is still the best stand-alone assignment method for long reads (Gori et al., 2010). Simply classifying genomic fragments by best BLAST hit yields reliable results only if close relatives are available for comparison (Krause et al., 2008). The known shortcomings of BLAST-based analysis in metagenomics include the requirement for sufficient sequence length and for close homologs in the reference database (Schreiber et al., 2010). The Lowest Common Ancestor (LCA) approach assigns each read to one taxon, computed as the lowest common taxonomic ancestor of a suitable set of sequences (hits). These hits are obtained by matching the read against a database of reference sequences, such as the NCBI-NR protein database. In this way LCA assigns reads to taxa at possibly different taxonomic ranks. Similarly, CARMA identifies protein family fragments among the reads and assigns each fragment to the ancestor taxon shared by the phylogenetic sub-tree of reference proteins in which the fragment is located (Gori et al., 2010). To overcome some of the drawbacks of LCA, several taxonomic approaches have recently been developed, such as the NBC (Naïve Bayes Classification) web server for taxonomic classification of metagenomic reads (Rosen et al., 2010) and MTR (Multiple Taxonomic Ranks based clustering), proposed by Gori et al. (2010).
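The LCA assignment described above can be illustrated with a short Python sketch. It is only a toy version: it assumes that each hit has already been resolved to a root-first lineage, and the function name `lowest_common_ancestor` and the example lineages are hypothetical. It does not reproduce the implementation of any published tool.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages.

    Each lineage is a list of taxon names ordered from the root downwards.
    Returns None when there are no hits (the read stays unassigned).
    """
    if not lineages:
        return None
    lca = None
    # Walk down the ranks in parallel; stop at the first disagreement.
    for ranks in zip(*lineages):
        if all(taxon == ranks[0] for taxon in ranks):
            lca = ranks[0]
        else:
            break
    return lca


# Illustrative lineages for the hits of one read (assumed, not real data).
hits = [
    ["cellular organisms", "Eukaryota", "Fungi", "Ascomycota"],
    ["cellular organisms", "Eukaryota", "Fungi", "Basidiomycota"],
]
print(lowest_common_ancestor(hits))  # prints: Fungi
```

Because the two hits disagree at the phylum level, the read is placed on the shared node above them, which is how LCA ends up assigning reads at different taxonomic ranks.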
In contrast to homology-based approaches, several methods classify single reads directly by their DNA signature, such as PhyloPythia (McHardy et al., 2007) and TACOA (Diaz et al., 2009). Beyond the creation and maintenance of databases, software was needed to exploit them optimally. In this context, sequences are screened for potential protein-encoding genes (PEGs) via a BLASTX search against the SEED comprehensive non-redundant database, sourced from the INSDC databases, sequencing centers and other sources. In parallel with the BLASTX searches, the sequence data are compared with all accessory databases using the appropriate algorithms and significance criteria. Finally, these matches to external databases are used to compute the derived data, and a phylogenetic reconstruction of the sample is computed from both the phylogenetic information in the SEED nr database and the similarities to the ribosomal RNA database (Meyer b et al., 2009). Another software package for taxonomic and functional analysis is MEGAN. The Lowest Common Ancestor (LCA) algorithm, a simple similarity-based algorithm for taxonomic assignment, is the core of the MEGAN annotation tool. MEGAN takes the results of a BLAST comparison as input and attempts to place each read on a node of the NCBI taxonomy. This is done by the LCA algorithm, which assigns each ribosomal read to the lowest common ancestor in the taxonomy of a subset of the best-scoring matches in the BLAST result. Ribosomal reads that have no BLAST matches are assigned to the special node “no hits”, and those unassigned for algorithmic reasons are placed on the special node “not assigned”. The result of the analysis is displayed as a tree representation of the NCBI taxonomy (Huson & Mitra., 2011). The logical next step after estimating phylogenetic diversity from 454 pyrosequencing data is to attribute the species belonging to certain taxa to functional groups or roles (Poretsky et al., 2005). MG-RAST is an open-source server based on the SEED framework for comparative genomics that has recently been used to analyze the rapidly growing volume of random community genomics, or metagenomics, data (Meyer et al., 2008). MG-RAST can perform both kinds of profiling at once, because the detected homologies may provide information about both functional and taxonomic relations. Reads are partitioned according to the MG-RAST hierarchical classification into “subsystems”, which group reads by their similarity to known protein sequences in the SEED database. Functional classifications of the PEGs are computed by projecting against SEED FIGfams and subsystems based on these similarity searches (Meyer b et al., 2009).
These functional assignments become the raw input to an automatically generated initial metabolic reconstruction of the sample, providing suggestions for metabolic fluxes and flows, reactions and enzymes (Overbeek et al., 2005; Meyer et al., 2008). In molecular biology, protein sequences, structures and functions have all been subjected to classification. While protein sequence and structure are easy to define, quantify and compare, protein function is considerably more complex. As a consequence, functional classification schemes are usually created from prior knowledge, whereas most sequence and structural classification schemes are data-driven. This opens the possibility of introducing arbitrary, artificial or human-imposed functional definitions and classifications, whether for historical reasons, because of the experimental toolkit available, or simply out of the need to group and categorize things. For example, a protein function such as ‘transmission of genetic information’ is clearly a category, since in many cases it groups genes and proteins without evolutionary or functional relationships. It merely reflects our own organization of biological phenomena around the ‘central dogma’ in order to arrange knowledge in textbooks (Chagoyen & Pazos, 2010). The first schemes for functional classification were based on a small number of disjoint functional groups (Riley, 1998). Over time, more complex functional vocabularies and ontologies were created. The de facto standard today for representing protein function in computational terms is the one maintained by the Gene Ontology (GO) consortium (Ashburner et al., 2000). GO is now used beyond the primary goal for which it was designed, protein functional annotation. It is increasingly used to evaluate large sets of relationships between proteins, e.g. protein-protein interactions or mRNA co-expression, under the assumption that related proteins tend to have the same or similar GO terms. This assumption holds only for terms representing functional groups with biological significance (‘classes’), and not for those representing human-imposed aggregations or conceptualizations lacking a biological rationale (‘categories’) (Chagoyen & Pazos, 2010). GO defines a set of functional terms related by parent-child relationships that form a directed acyclic graph (DAG), which can be navigated from very general to highly specific terms (functions). GO in fact defines three separate ontologies representing three orthogonal aspects of protein function: molecular function (GO:MF), biological process (GO:BP) and cellular component (GO:CC). A given protein is annotated by assigning it one or more terms from these three ontologies. However, when trying to perform GO-based analysis in poorly characterized organisms, we encountered a number of drawbacks with existing tools.
In general, these tools are either not designed for high-throughput sequence annotation, are limited in their mining and visualization capabilities, or accept only gene or probe identifiers as input, which restricts them to annotated sequences already deposited in public databases. Blast2GO (B2G) is a universal GO annotation, visualization and statistics framework that brings advanced functional analysis to the genomics research of non-model species. B2G has been designed to (1) allow automatic, high-throughput sequence annotation and (2) integrate functionality for annotation-based data mining. Basically, B2G uses the Basic Local Alignment Search Tool (BLAST) to find sequences similar to one or several input sequences (http://www.ncbi.nlm.nih.gov/BLAST/). The program extracts the GO terms associated with each of the hits obtained and returns an evaluated GO annotation for the query sequences; enzyme codes are then obtained by mapping from the equivalent GO terms (Conesa et al., 2005).
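The directed acyclic graph structure of GO described above can be illustrated with a minimal sketch. The term names and parent links below are a toy example invented for illustration, not an extract of the real ontology; the function simply walks from a specific term up to every more general term it inherits from.

```python
from collections import deque

# Toy child -> parents map (hypothetical terms, not real GO content).
# A term may have several parents, which is what makes this a DAG
# rather than a tree.
TOY_PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalytic activity": ["root"],
    "nucleotide binding": ["binding"],
    "toy ATP-dependent enzyme": ["nucleotide binding", "catalytic activity"],
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    queue = deque(parents[term])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents[current])
    return seen


if __name__ == "__main__":
    for name in sorted(ancestors("toy ATP-dependent enzyme", TOY_PARENTS)):
        print(name)
```

Annotating a protein with a specific term implicitly annotates it with all of that term's ancestors, which is why comparisons are usually made at a fixed level of the hierarchy.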

4.2 Materials and methods:

4.2.1 Sequence analysis, size selection, trimming and assembly:

Reads obtained by pyrosequencing (FASTA files) for both the LSM and ESM datasets were processed with Unix commands and Python scripts in the OBITools environment (http://www.grenoble.prabi.fr/trac/OBITools ). The reads were trimmed in order, first, to eliminate the M1 primer sequence used for cDNA amplification at either end, second, to eliminate the polyA tail when found at the 3'-end and the polyT stretch when present at the 5'-end, and finally, to eliminate all reads < 40 nt. After trimming of the raw dataset, the FASTA file containing the trimmed reads (> 40 nt) was converted into a database, which was then compared by BlastN with a collection of 35 LSU and 39 SSU rRNA sequences sampled from 19 bacterial phyla, Archaea and various eukaryotic taxa. Reads with a hit at an E-value < 0.001 were considered rRNA sequences on the basis of sequence similarity to this reference. Reads with no hit against the manually generated ribosomal reference were considered putative mRNA. Reads were assembled with the publicly available CAP3 program (http://deepc2.psi.iastate.edu/aat/cap/cap.html ) using default parameters. R scripts were written to compute accumulation curves for the assembled sequences and to depict the complexity of the assembled dataset. BlastX searches were conducted against the UniProt TrEMBL and Swiss-Prot databases ( http://expasy.org/sprot/ ), using several E-value cutoffs between 10^-2 and 10^-20.
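The trimming and size-selection steps described above can be sketched as follows. This is an illustration only, not the OBITools scripts actually used: the primer sequence is a hypothetical placeholder (the M1 primer sequence is not given here), primer matching is exact, and the minimum homopolymer run of 10 nt is an assumed parameter.

```python
import re

M1_PRIMER = "GATTACAGATTACA"  # hypothetical placeholder, not the real M1 primer
MIN_LENGTH = 40               # reads shorter than this are discarded
MIN_HOMOPOLYMER = 10          # assumed minimum length of a polyA/polyT stretch

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_read(seq, primer=M1_PRIMER):
    """Remove the primer at either end, then the 5' polyT and 3' polyA."""
    seq = seq.upper()
    if seq.startswith(primer):
        seq = seq[len(primer):]
    primer_rc = revcomp(primer)
    if seq.endswith(primer_rc):
        seq = seq[:-len(primer_rc)]
    seq = re.sub(r"^T{%d,}" % MIN_HOMOPOLYMER, "", seq)  # polyT at the 5'-end
    seq = re.sub(r"A{%d,}$" % MIN_HOMOPOLYMER, "", seq)  # polyA at the 3'-end
    return seq


def select_reads(reads):
    """Trim every read and keep those of at least MIN_LENGTH nucleotides."""
    trimmed = (trim_read(r) for r in reads)
    return [r for r in trimmed if len(r) >= MIN_LENGTH]


if __name__ == "__main__":
    insert = "ACGT" * 12  # 48 nt of toy sequence
    raw = M1_PRIMER + "T" * 12 + insert + "A" * 15 + revcomp(M1_PRIMER)
    print(select_reads([raw, "ACGT" * 5]))
```

A production pipeline would normally tolerate mismatches in the primer and handle quality values; both are left out here to keep the order of operations visible.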

4.2.2 Taxonomic and functional analysis:

Taxonomic analyses were carried out in two ways. Firstly, rRNA and mRNA reads were compared with the SEED database using the MG-RAST software with an E-value cutoff of 0.01. A phylogenetic reconstruction of the sample is computed using both the phylogenetic information contained in the SEED nr database and the similarities to the ribosomal RNA database. Secondly, rRNA reads were aligned against the NCBI non-redundant protein database using the Blast2GO software. BlastN results passing an E-value cutoff of 10^-6 were retained and analyzed using MEGAN (Metagenome Analysis Software, http://www-ab.informatik.uni-tuebingen.de/software/megan ). BLAST output files were analyzed with MEGAN version 4.0beta21. In order to estimate the functional diversity of our dataset, protein-coding sequences were functionally annotated by BlastX searches of the putative mRNA sequences on the MG-RAST server (Meta Genome Rapid Annotation using Subsystem Technology; v1.2) at the Argonne National Laboratory (http://metagenomics.nmpdr.org ). A Fisher test was performed on the sequences related to each

subsystem in order to identify the enzymes that differed significantly between the two datasets. For each gene or functional category, a pairwise comparison of read hit distribution to all other reads or all other MG-RAST annotated reads was performed using a Monte Carlo test with 10000 samplings ( chisq.test function in R using sim=T and B=10000 ). The null hypothesis is that the probability of annotating a given function is the same in the two datasets. The enzymes that differed significantly were further compared against the ExPASy Proteomics Server ( http://expasy.org/cgi-bin/show_thumbnails.pl ). All putative mRNA reads were also annotated using Blast2GO (B2G), a comprehensive bioinformatics tool for the functional annotation and analysis of uncharacterized gene or protein sequences ( http://www.blast2go.org/ ). For multivariate analysis, the trimmed and size-selected (putative mRNA) reads of our two metatranscriptomes, together with 8 published pyrosequenced transcriptomes downloaded from the Short Read Archive (SRA) division of the GenBank repository and the functional attributions of 5 genomes downloaded from the SEED database, were all deposited on the MG-RAST server and then apportioned into functional groups following the subsystem approach. The proportion of each functional category was calculated from the total number of reads annotated with MG-RAST within each dataset. NMDS (non-metric multidimensional scaling) ordinations based on the Kulczynski similarity index were used to examine the relative similarity of the two alpine metatranscriptomes to the downloaded transcriptomes and to the genomic functional attributions derived from the SEED database.
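The Monte Carlo test described above can be sketched in Python for a single functional category. This is a stand-in written for illustration, not the R code used in this work: it redistributes the annotated reads at random between the two datasets while keeping both margins fixed, and compares the simulated chi-square statistics with the observed one. The function names and the example counts are hypothetical.

```python
import random


def chi2_stat(a, b, c, d):
    """Pearson chi-square statistic of the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / margins


def monte_carlo_p(hits_a, total_a, hits_b, total_b, n_sim=10000, seed=0):
    """Simulated p-value for H0: same hit probability in both datasets."""
    rng = random.Random(seed)
    n = total_a + total_b
    k = hits_a + hits_b  # reads numbered 0..k-1 carry the function
    observed = chi2_stat(hits_a, total_a - hits_a, hits_b, total_b - hits_b)
    extreme = 0
    for _ in range(n_sim):
        # Draw which reads fall in dataset A, margins fixed.
        sim_a = sum(1 for i in rng.sample(range(n), total_a) if i < k)
        sim_b = k - sim_a
        stat = chi2_stat(sim_a, total_a - sim_a, sim_b, total_b - sim_b)
        if stat >= observed - 1e-12:
            extreme += 1
    # The observed table counts as one of the samplings.
    return (extreme + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical counts: 40 of 100 reads versus 5 of 100 reads.
    print(monte_carlo_p(40, 100, 5, 100, n_sim=500))
```

With datasets of tens of thousands of reads, a dedicated hypergeometric sampler would be much faster than drawing read indices; the explicit draw is kept here because it mirrors the null hypothesis directly.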

4.3 Results:

4.3.1 Raw dataset preparation, trimming and assembly:

cDNA samples were prepared from two alpine soil sites (LSM, ESM) and sequenced using a one-quarter plate run on the 454 GS-FLX Titanium platform. This one-quarter run produced 92078 and 66779 reads for LSM and ESM, respectively. Eliminating short reads (< 40 nt, usually containing poly-A and poly-T stretches) reduced the dataset to 64% and 65% of the initial number of reads. The remaining reads amounted to 58978 sequences for the LSM site, with a cumulated sequence length of 23.1 Mb and a read length (average ± SD) of 392 ± 130 nt, compared with 43472 sequences for the ESM site, with a cumulated sequence length of 16.8 Mb and a read length (average ± SD) of 386 ± 131 nt. An overview of the data flow is shown in Fig. 25.


Figure 25. Overview of data flow.

Comparison of the trimmed and size-selected sequences with the ribosomal reference database revealed sequences with a BLAST match: 8919 sequences (15% of the trimmed and size-selected sequences) for LSM and 4296 sequences (10%) for ESM. These sequences, assigned as rRNA, were first isolated for later analysis and removed from the trimmed and size-selected dataset. The remaining sequences were considered putative mRNA sequences and amounted to 50059 sequences (85% of the trimmed and size-selected reads) in LSM and 39176 sequences (90%) in ESM (Fig. 26).
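The read counts reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts given in the text; the function and key names are chosen for illustration.

```python
# Read counts taken from the text: raw reads, reads kept after trimming
# and size selection, and reads matching the ribosomal reference.
COUNTS = {
    "LSM": {"raw": 92078, "long": 58978, "rrna": 8919},
    "ESM": {"raw": 66779, "long": 43472, "rrna": 4296},
}


def data_flow(raw, long, rrna):
    """Derive the putative mRNA count and the percentages of each step."""
    mrna = long - rrna
    return {
        "kept_pct": 100 * long / raw,    # share of raw reads kept
        "rrna_pct": 100 * rrna / long,   # share of kept reads that are rRNA
        "mrna": mrna,                    # putative mRNA reads
        "mrna_pct": 100 * mrna / long,   # share of kept reads that are mRNA
    }


if __name__ == "__main__":
    for site, counts in COUNTS.items():
        flow = data_flow(**counts)
        print(site, flow["mrna"], {k: round(v, 1) for k, v in flow.items()})
```

Running the same check whenever a filtering threshold changes makes it easy to spot counts that no longer add up.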

Fig. 26. Size distribution of cDNA reads before trimming. Reads were considered mRNA (shown in blue) when they were longer than 40 nt and did not match the ribosomal reference. Reads with BLAST hits against the ribosomal reference are shown in red. The remaining reads (shown in yellow) were classed as bad reads, as they were shorter than 40 nt and contained long stretches of polyA and polyT.

Putative mRNA reads were assembled using the CAP3 program. The 50039 sequences in LSM were assembled into 5827 contigs with an (average ± SD) length of 400 ± 184 nt, while 28362 sequences (57%) remained as singletons with a length of 363 ± 144 nt. By comparison, the 39176 sequences in ESM were assembled into 3864 contigs with a length of 457 ± 181 nt, while 26164 sequences (67%) remained as singletons with a length of 369 ± 138 nt (Table 5).

Table 5: General statistics on read assembly using CAP3.

Assembly produced a small number of large contigs (> 800 bp) and a considerable number of contigs containing only two reads, with average sizes of 635 and 595 bp for LSM and ESM respectively; in contrast, only a few of the assemblies contained over 800 pyrosequencing reads. A large majority of reads remained unassembled, with 57% and 67% singletons, respectively. Contig size was only moderately larger than the initial read size (LSM 465.7 ± 78 nt, ESM 487 ± 88.4 nt), which could mean that the contigs obtained resulted from overlaps between similar regions of the two reads joined together. The accumulation curves show a positive relationship between the total number of reads and the number of reads assembled into contigs; the assembly was therefore not able to capture the full complexity of our dataset. We nevertheless noted slightly more complexity in the ESM assembly (Fig. 27).
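An accumulation curve of the kind discussed above can be sketched by random subsampling. The code below is a simplified illustration and not the R scripts used here: it assumes that each read is already labelled with the contig it belongs to (or None for a singleton) and counts the distinct contigs represented in random subsamples of increasing size, averaged over repeated draws. The use of 100 repetitions follows the figure legend; everything else is an assumption.

```python
import random
from statistics import mean, pstdev


def accumulation_curve(read_to_contig, sizes, n_rep=100, seed=0):
    """Mean and SD of the number of distinct contigs seen per sample size.

    read_to_contig: list with one entry per read, giving its contig
    identifier, or None when the read stayed a singleton.
    """
    rng = random.Random(seed)
    curve = []
    for size in sizes:
        counts = []
        for _ in range(n_rep):
            sample = rng.sample(read_to_contig, size)
            counts.append(len({c for c in sample if c is not None}))
        curve.append((size, mean(counts), pstdev(counts)))
    return curve


if __name__ == "__main__":
    # Toy assignment: two contigs of five reads each and ten singletons.
    toy = ["c1"] * 5 + ["c2"] * 5 + [None] * 10
    for size, avg, sd in accumulation_curve(toy, [5, 10, 20], n_rep=20):
        print(size, round(avg, 2), round(sd, 2))
```

A curve that keeps rising with sample size, as reported for both sites, indicates that sequencing depth has not saturated the diversity of transcripts.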


Figure 27: Accumulation of contigs upon assembly of putative mRNA reads by CAP3. Triangles: ESM, circles: LSM. Dotted lines represent average ± standard deviation of 100 random samplings.

As assembly did not improve on the initial reads in terms of size and number of reads assembled into contigs, we decided to set the assembled reads aside and to continue with the initial datasets obtained after trimming, size selection and removal of ribosomal reads. Comparison with the UniProt-TrEMBL database revealed that the distribution of hits was homogeneous between the two datasets (LSM and ESM): 90% of reads were annotated at an E-value cutoff of 10^-2, compared with 70% at the more stringent cutoff of 10^-20 (Fig. 28). In contrast, comparison with UniProt-Swiss-Prot showed that only 35% and 15% of sequences were annotated in the two datasets at E-value cutoffs of 10^-2 and 10^-20, respectively. Likewise, all sequences considered as mRNA were annotated against the SEED database, where 16% and 7% of sequences were annotated at E-value cutoffs of 10^-2 and 10^-20, respectively. These comparisons show that the apparent diversity of our dataset depends largely on the reference database and on the E-value cutoff used.

Figure 28. Cumulated BlastX hits at various E-values, using the SEED, Swiss-Prot and TrEMBL databases. Putative mRNA reads were annotated against each database according to the number of best hits at E-value cutoffs ranging from 10^-2 to 10^-20.
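The cumulated hit counts summarized in Figure 28 amount to counting, for each cutoff, the reads whose best hit has an E-value at or below that cutoff. A minimal sketch follows, assuming the best-hit E-value of each read has already been extracted from the BLAST output; the function name and the toy values are illustrative.

```python
def annotated_fraction(best_evalues, n_reads, exponents=range(2, 21, 2)):
    """Fraction of reads annotated at each cutoff 10^-exponent.

    best_evalues: best-hit E-value of every read that had at least one hit;
    reads without any hit are simply absent from this list.
    n_reads: total number of reads submitted to the search.
    """
    fractions = {}
    for exp in exponents:
        cutoff = 10.0 ** -exp
        hits = sum(1 for e in best_evalues if e <= cutoff)
        fractions[exp] = hits / n_reads
    return fractions


if __name__ == "__main__":
    # Toy data: five reads, four of which returned a hit.
    toy_evalues = [1e-30, 1e-10, 1e-3, 0.5]
    for exp, frac in annotated_fraction(toy_evalues, n_reads=5).items():
        print("1e-%d" % exp, frac)
```

Keying the result by exponent rather than by the floating-point cutoff avoids rounding surprises when looking values up.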

4.3.2 Community composition and taxonomic origin of transcripts:

4.3.2.1 Taxonomic affiliation of ribosomal reads:

I. MG-RAST approach: Ribosomal sequences accounted for 8919 (15% of total long reads) and 4269 (10% of total long reads) reads in the LSM and ESM datasets respectively. Comparison of all ribosomal reads with the SEED database via MG-RAST revealed that the majority of sequences with BLAST matches based on ribosomal proteins were of bacterial origin, with 64.64% and 61.34% in the LSM and ESM datasets respectively (Table 6). The two datasets showed largely similar distributions of BLAST matches with the ribosomal protein approach.

             Protein based                     16S
             LSM              ESM              LSM              ESM
Archaea      132 (2.45%)      42 (1.64%)       0 (0.00%)        0 (0.00%)
Bacteria     3481 (64.64%)    1569 (61.34%)    1754 (97.77%)    888 (95.38%)
Eukaryota    279 (5.18%)      214 (8.37%)      0 (0.00%)        0 (0.00%)
Virus        0 (0.00%)        0 (0.00%)        0 (0.00%)        0 (0.00%)
Other        1493 (27.73%)    733 (28.66%)     40 (2.23%)       43 (4.62%)

Table 6: General information on the content of ribosomal reads based on the SEED database.

A high proportion of sequences in the two datasets had a BLAST match against the SEED database, with 80.13% and 75.45% of total annotated reads in the LSM and ESM datasets respectively. The comparison between the LSM and ESM datasets showed a large similarity (Fig. 29). Bacteria were dominant, with 86.62% and 82.37% in LSM and ESM respectively, followed by sequences related to eukaryotes, with 10.82% and 15.18%. Within bacteria, Proteobacteria were dominant in both datasets, with 31.9% and 32.7% of total annotated reads, followed by Firmicutes with 21.8% and 16.6% in LSM and ESM respectively. The majority of sequences related to Fungi were attributed to Ascomycota, with 5.9% and 8% of total assigned reads in LSM and ESM respectively. The distribution of sequences related to other groups, such as Alveolata, Viridiplantae, Actinobacteria and Bacteroidetes, was largely similar, and almost all groups were represented about equally in the two datasets. The distribution of microbial groups based on BLAST against the SEED database is shown in Figure 29.

Figure 29: Taxonomic affiliation of ribosomal reads based on the SEED database via MG-RAST, with an E-value cutoff of 10^-2.

II. MEGAN approach: Among the 8943 and 4306 ribosomal reads for LSM and ESM respectively, bacteria were by far the most abundant domain in both datasets, with largely the same phyla represented. Bacteria accounted for 4780 reads (53.44%) in LSM and 2563 reads (59.52%) in ESM. Fungi and Viridiplantae were the next most abundant groups in both datasets. Within bacteria, the most abundant phyla in both datasets were Proteobacteria, with 1044 reads (21.84% of the sequences related to bacteria) for LSM and 533 reads (20.79%) for ESM, followed by Acidobacteria, with 915 reads (19.14%) for LSM and 438 reads (17.08%) for ESM. Figures 30 and 31 show the MEGAN results for the LSM and ESM datasets at the class level. While both datasets showed similar results for the three most abundant phyla, certain phyla were underrepresented, such as Verrucomicrobia in the LSM dataset, and others were completely missing, such as Nitrospirae and Thermobaculum in the LSM dataset and Cyanobacteria in the ESM dataset.

Figure 30: High-level summary of a MEGAN analysis of LSM dataset, based on a BLASTX comparison of the 8943 reads against the NCBI-NR database.

Figure 31: High-level summary of a MEGAN analysis of ESM dataset, based on a BLASTX comparison of the 4303 reads against the NCBI-NR database.

4.3.2.2 Taxonomic affiliation of putative mRNA reads:

I. MG-RAST approach: All putative mRNA reads (50039 in LSM and 39346 in ESM) were aligned against the SEED database via MG-RAST with an E-value cutoff of 10^-2. At the domain level, eukaryotes were by far the most abundant in both datasets (80.23% and 79.3% in LSM and ESM respectively), followed by bacteria (18.61% and 19.32%). Both datasets showed a largely similar distribution for the most abundant groups (Bacteria, Viridiplantae, Fungi and Metazoa). Fungi were the most abundant group within eukaryotes; within fungi, however, the 5893 sequences in the LSM dataset and the 4057 sequences in the ESM dataset were all attributed to Ascomycota. Largely the same bacterial phyla were represented in the two datasets. The most abundant bacterial phylum in both datasets was Proteobacteria, with 1402 reads (39.93% of reads assigned to bacteria) for LSM and 1222 reads (45.26%) for ESM, followed by Acidobacteria with 457 reads (13.01%) for LSM and 498 reads (20.08%) for ESM. Actinobacteria, with 359 reads (10.22%) for LSM and 292 reads (11.77%) for ESM, were the third most abundant bacterial phylum. While both datasets showed similar results for the three most abundant phyla, certain phyla were underrepresented in the LSM dataset, such as Acidobacteria and Chloroflexi. The taxonomic distribution of reads with BLAST hits against the SEED database is shown in Fig. 32.

Figure 32: Taxonomic affiliation of the putative mRNA reads based on BlastX analysis against the SEED database via MG-RAST. Sequences that did not return any significant hit against the SEED database at an E-value cutoff of 10^-2 were not assigned.

II. Blast2GO approach: All putative mRNA reads in the two datasets (LSM and ESM) generated by 454 pyrosequencing were aligned against the NCBI nr protein database using BlastX via the Blast2GO software, with an E-value cutoff of 10^-6. In both datasets, only about 35% of total reads had a significant BLAST match against the NCBI nr protein database (Fig. 33).

Among annotated reads, 8980 (51.18%) and 7033 (53.20%) were not assigned to any of the major kingdoms of life and were considered as unknown or other. Fungi were the most abundant group in both datasets, with 3733 (21.27%) and 3109 (20.82%) of total annotated reads in LSM and ESM respectively. Within fungi, the Basidiomycota and Ascomycota clades were dominant, with 30% and 70% in LSM and ESM respectively for Basidiomycota, and 70% and 56% of the sequences related to fungi in LSM and ESM respectively for Ascomycota (data not shown). The second most abundant group was represented by sequences related to plants, with 2769 (32.3%) and 2119 (29.6%) matches against the NCBI nr protein database. We also observed relatively fewer reads related to Bacteria, Chordata, Amoebozoa and Nematoda.

Figure 33: Taxonomic affiliation of the putative mRNA reads based on BlastX analysis against the NCBI nr protein database via the Blast2GO software, using an E-value cutoff of 10^-6.

4.3.3 Functional annotations:

4.3.3.1 Functional analysis using SEED subsystems (MG-RAST):

For the LSM dataset, the majority of annotated sequences were related to eukaryotic proteins (70.00%), followed by bacterial (15.00%) and archaeal (0.26%) proteins, while 13.85% of deposited sequences were not assigned to proteins of any of the four major kingdoms. By comparison, the ESM dataset contained 71% eukaryotic proteins, 16% bacterial proteins, 0.7% archaeal proteins and 12.14% not assigned to any of the four major kingdoms (Table 7).

             Protein based                     16S
             LSM              ESM              LSM              ESM
Archaea      100 (0.62%)      89 (0.68%)       0 (0.00%)        0 (0.00%)
Bacteria     2387 (14.79%)    2077 (15.94%)    13 (100%)        6 (100%)
Eukaryota    11414 (70.74%)   9250 (70.97%)    0 (0.00%)        0 (0.00%)
Virus        0 (0.00%)        0 (0.00%)        0 (0.00%)        0 (0.00%)
Other        2234 (13.85%)    1618 (12.14%)    0 (0.00%)        43 (4.52%)

Table 7: General distribution of the LSM and ESM datasets, compared with the SEED database via the MG-RAST software.

In the LSM and ESM datasets respectively, 8030 and 6474 sequences had a BLAST hit against the SEED database, which represented about 17% of total putative mRNA in both datasets. The majority of subsystems were present in both soil datasets (Table 8). The greatest proportion of sequences that could be assigned to a metabolic function was related to the protein metabolism subsystem, with 31.27% and 29.09% in LSM and ESM respectively. The second most abundant subsystem was carbohydrates, with 11.85% and 10.35% in LSM and ESM respectively. In general, similar groups of functions were present in both soil datasets. Statistical analysis of sequence abundance across subsystems revealed that 8 subsystems (cofactors, vitamins, virulence, Cell Wall and Capsule, Membrane Transport, Protein Metabolism, Nucleosides and Nucleotides and carbohydrates) differed significantly between the two datasets (Table 8).

Table 8: Functional classification of putative protein-coding sequences derived from cDNA of the LSM and ESM datasets. Sequences were annotated with the BLASTX algorithm against the SEED database via MG-RAST, using an E-value of < 0.01 as the significance threshold and a minimum alignment length of 50 bp. The two P-values were calculated from the total number of analyzed reads and from the total number of annotated reads, respectively.

Several subsystems did not show any significant difference between the LSM and ESM datasets at the top level of the hierarchical classification. However, looking in more detail within each subsystem that showed significant differences, many differences also appear at subsystem hierarchy levels 2 and 3. The most obvious significant differences between the two datasets, based on P-values calculated from total annotated reads, concerned Ribosome LSU eukaryotic and archaeal and Dehydrogenase complex, with a P-value of 0.001 (Table 9).

Table 9: SEED functional groups showing differences in hit distribution. P-value is that of a Monte-Carlo test using 10000 simulated random samplings. Italics denote non significant differences

Probing carbohydrate subsystem as indicator of metabolic activities

The Fisher test calculated for each subsystem within the carbohydrates subsystem showed significant differences between the two metatranscriptomic datasets. 930 and 669 sequences related to the carbohydrates subsystem were detected in the LSM and ESM sites respectively, representing 11.85% and 10.35% of total putative mRNA annotated by MG-RAST. In general, similar groups of functions were found in the two soil metatranscriptomes. However, 19 subsystems showed significant differences between the LSM and ESM datasets, among which 14 functional categories were associated with an Enzyme Commission (EC) number (Table 10).
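The Fisher test used for these comparisons can be illustrated with a minimal two-sided exact test on a 2x2 table, written here in plain Python. This is a sketch of the general technique, not the implementation used by the annotation tools; the function name is illustrative and the two-sided rule (summing all tables no more probable than the observed one) is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    a, b: reads with / without the function in the first dataset.
    c, d: reads with / without the function in the second dataset.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)

    def weight(x):
        # Unnormalized hypergeometric probability of x hits in row 1.
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    # Exact integer arithmetic: sum the tables at most as likely as observed.
    total = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return total / comb(n, row1)


if __name__ == "__main__":
    # Classic small example: 3 of 4 versus 1 of 4.
    print(fisher_exact_two_sided(3, 1, 1, 3))
```

With the counts reported above, one possible call would be `fisher_exact_two_sided(930, 8030 - 930, 669, 6474 - 669)`, taking the numbers of reads with a SEED hit as totals; which totals to use is a choice, since P-values were computed on both annotated and total reads.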

Table 10: Significant differences within the carbohydrates subsystem, determined by a Fisher test on all sequences related to carbohydrates. A P-value < 0.05 was considered significant. The Fisher test was calculated on the basis of both the number of annotated reads and the total number of reads deposited on MG-RAST.

Enzymes involved in trehalose biosynthesis were found at both locations (LSM and ESM), except for Trehalose synthase treS (EC 5.4.99.16) and Trehalose-6-phosphate phosphatase (EC 3.1.3.12), which were detected only at the ESM location. Overall, sequences related to trehalose biosynthesis were more common at the ESM location than at the LSM location. In addition, sequences related to sucrose metabolism, represented by Beta-glucosidase (EC 3.2.1.21) and Phosphoglucomutase (EC 5.4.2.2), were much more abundant in the ESM dataset than in the LSM dataset. In contrast, enzymes involved in transforming pyruvate into acetyl-CoA were much more abundant in the LSM dataset. More precisely, enzymes such as Pyruvate dehydrogenase E1 (EC 1.2.4.1), Enolase (EC 4.2.1.11) and Pyruvate decarboxylase (EC 4.1.1.1), which play an essential role in linking the glycolysis pathway to the citric acid cycle and in releasing energy via NADH, were much more abundant in the LSM dataset.

Probing secondary metabolism subsystem

All functions related to secondary metabolism were represented in both datasets. There were no significant differences among the subsystems of the general secondary metabolism subsystem, based on the Fisher test calculated for all sequences related to secondary metabolism (Table 11). However, at subsystem hierarchy level 3, several significant differences appeared, especially within the biosynthesis of phenylpropanoids subsystem: for example, Apigenin derivatives, Phytosterol biosynthesis and Biflavanoid biosynthesis were much more abundant in the ESM metatranscriptome dataset. In contrast, Flavanone biosynthesis was much more abundant in the LSM metatranscriptome dataset.

Table 11: Sequences related to the secondary metabolism subsystem. A Fisher test was performed to estimate significant differences between our two metatranscriptomes. Names in bold and italic are those that showed significant differences between the two datasets based on P-values.

P-values calculated from the total sequences related to the Biosynthesis of phenylpropanoids subsystem show that, although there were many significant differences at subsystem hierarchy levels 2 and 3, no significant differences remained at the final classification level, except for EC 2.1.1.75 (flavonoid 7-O-methyltransferase), which was much more abundant in the LSM dataset (Table 12). Moreover, the Flavanone-3-hydroxylase enzyme (EC 1.14.13.21) was involved in two secondary metabolism subsystems, Biflavanoid biosynthesis and Anthocyanin biosynthesis, and was the dominant enzyme in Anthocyanin biosynthesis, with 62.5% and 50% of the sequences related to this subsystem at the LSM and ESM locations respectively. Enzymes such as those of the multigene families encoding P450 hydroxylases, methyltransferases and glycosyltransferases were present nearly equally in the two datasets, except for sequences related to genes encoding cytochrome P450, which were more abundant at the LSM location, with 11.8% and 5.6% of the sequences related to secondary metabolism subsystems in the LSM and ESM datasets respectively (data not shown).

Table 12: Sequences related to the Biosynthesis of phenylpropanoids subsystem. A Fisher test was performed to estimate significant differences between our two metatranscriptomes.

4.3.3.2 Functional analysis based on Gene Ontology (GO terms)

Gene Ontology terms were assigned to 12426 sequences (24.83% of total annotated reads) in the LSM dataset and 10018 sequences (25.04% of total annotated reads) in the ESM dataset. These assignments are based on sequence similarity with known proteins annotated with GO terms. GO analysis with Blast2GO showed that the distribution of gene functions for cDNA sequences from the LSM and ESM datasets was similar. The 35 most abundant transcripts matching well-characterized genes in public databases are shown in Fig. 34. Many transcripts could not be annotated, because they either matched anonymous ESTs or did not hit any entry in the database.

Fig. 34: The 35 most abundant KEGG pathways detected in the LSM and ESM metatranscriptome datasets.

The Fisher test used to estimate differences in gene function between the two datasets showed 22 significantly different GO terms, listed in Table 13.

Table 13: GO terms that differ significantly between the LSM and ESM datasets. The Fisher test was calculated on the total number of sequences annotated in the two datasets, using FDR (p-value corrected by False Discovery Rate control), FWER (p-value corrected by Family-Wise Error Rate) and single-test p-value (p-value without multiple testing correction).
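The difference between the three p-values listed in Table 13 can be illustrated with two standard corrections. The exact procedures applied by Blast2GO are not detailed here, so the sketch below uses the Bonferroni correction as a stand-in for FWER control and the Benjamini-Hochberg procedure as a stand-in for FDR control; both choices are assumptions made for illustration.

```python
def bonferroni(pvalues):
    """FWER-style adjustment: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """FDR-style adjustment (Benjamini-Hochberg step-up procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    single_test = [0.01, 0.04, 0.03, 0.005]  # toy single-test p-values
    print(bonferroni(single_test))
    print(benjamini_hochberg(single_test))
```

FWER corrections are the more conservative of the two, so a GO term can be significant by FDR while failing the FWER threshold.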

The most specific GO terms were GO:0045730, GO:0047727, GO:0031419 and GO:0042110, representing respectively nutrient reservoir activity, isobutyryl-CoA mutase activity, cobalamin binding and T cell activation, which were all more represented in the ESM dataset than in LSM. Other highly specific GO terms were GO:0005840, GO:0003735 and GO:0006097, representing ribosome, structural constituent of ribosome and glyoxylate cycle, which were all significantly more represented in the LSM dataset (Fig. 35).

Fig. 35: GO terms that differ significantly between the LSM and ESM datasets. The test was calculated on the total number of sequences annotated in the two datasets, using FDR (p-value corrected by False Discovery Rate control), FWER (p-value corrected by Family-Wise Error Rate) and single-test p-value (p-value without multiple testing correction).

GO annotation analyses based on KEGG metabolic pathways showed that gene function categories associated with metabolic processes are highly represented in both datasets. To group the proteins with associated GO terms, the top-level term for each GO category (“Molecular function”, “Biological process” and “Cellular component”) was recorded at the different match levels. The distribution of hits across the categories turned out to be roughly similar at each match level. Among the 10864 and 8871 annotations associated with “Molecular function” in LSM and ESM respectively, the dominant terms at hierarchy level 3 were clearly protein binding, nucleotide binding and hydrolase activity, which were common to the LSM and ESM datasets. In contrast, oxidoreductase activity, nucleic acid binding, transferase activity and ion binding were present only in the LSM dataset (Fig. 36).


Figure 36: Molecular function distribution of LSM and ESM.

At the same level, the most represented biological process terms were cellular biosynthesis process, cellular macromolecule metabolic process and protein metabolic process, which were common to the two datasets. In contrast, certain categories were present in only one dataset and completely missing from the other, such as gene expression in the ESM dataset and transpiration in the LSM dataset. Other cellular functions, such as nucleobase, nucleoside, nucleotide and nucleic acid metabolic process, cellular nitrogen compound metabolic process and macromolecule biosynthesis process, were common to the two datasets in different proportions (Fig. 37).

Figure. 37: Biological processes distribution of LSM and ESM.

Finally, within “Cellular component”, the dominant terms were divided between cell part, membrane-bounded organelle, non-membrane-bounded organelle and organelle part, which were largely similar between the LSM and ESM metatranscriptome datasets (Fig. 38).

Figure 38: Cellular component distribution of LSM and ESM.

Estimating the number of genes expressed in our datasets

In order to estimate the number of genes expressed in our two datasets (LSM and ESM), all annotation results based on the SEED database (MG-RAST), the NCBI non-redundant protein database (Blast2GO) and the SwissProt database were compared and accumulation curves were generated (Fig. 39). Sequences were assigned gene names based on the gene name annotation of the best BLAST match for that sequence. The numbers of proteins with known function identified were close for SwissProt and Blast2GO, but much smaller for MG-RAST (7861, 7875, and 1753 for LSM, respectively; Fig. 39A). Among reads annotated by at least one system, about half (46%) had an associated enzyme code. In contrast to overall protein annotation, most EC numbers were given by one system only. For 71% of the 1362 reads annotated by the 3 systems, the EC annotations were consensual (data not shown). The proportion of proteins associated with an enzyme code (i.e., of enzymes) was much lower using either SwissProt (8%) or Blast2GO (12%) than using MG-RAST (36%). As a result, the numbers of enzymes identified by the three pipelines were much closer together than the numbers of protein annotations (data not shown). Blast2GO retrieved the most enzyme codes, followed by SwissProt and MG-RAST (898, 720 and 616, respectively, for LSM and 878, 678, and 621 for ESM; Fig. 39B).


Figure 39: Protein annotation versus EC annotation. A, accumulation of protein names across the mRNA datasets. Circles denote Blast2GO, diamonds SwissProt, and triangles MG-RAST annotations. Open symbols are for LSM, and filled symbols for ESM. Shown is the average of 100 random samplings; standard deviations always fitted within symbol height. B, same as above, but for proteins associated with an EC code.

Comparison with other transcriptomes and metatranscriptomes:

The NMDS ordination clearly showed that our two datasets (LSM and ESM) are largely similar to each other and largely different from all other datasets. The coral larval transcriptome (Test4) was the closest to our two metatranscriptome datasets, followed by the Artemisia annua transcriptome (Test3), the transcriptome of the wood-degrading fungus Phanerochaete chrysosporium (Test8) and the Olea europaea transcriptome (Test6). Moreover, the comparison of our two metatranscriptomic datasets with functional annotations derived from the SEED database, using two-dimensional NMDS ordinations, showed that our two datasets were largely different from the functional annotations of several prokaryotic and eukaryotic genomes.

Figure 40: Two-dimensional NMDS ordination of the functional annotations of 10 transcriptomes. All pairwise distances among samples were calculated with the Kulczynski distance. The data matrix consists of 10 datasets, comprising our two metatranscriptome datasets (LSM and ESM) and 8 other transcriptomes downloaded from the Short Read Archive (SRA) division of the GenBank repository and submitted to MG-RAST (Test1: American chestnut transcriptome, Test2: Zygaena filipendulae transcriptome, Test3: Artemisia annua transcriptome, Test4: Coral larval transcriptome, Test5: American ginseng transcriptome, Test6: Olea europaea transcriptome, Test7: Marine microbial metatranscriptome and Test8: Phanerochaete chrysosporium transcriptome), together with 5 genomes (G1: Arabidopsis thaliana, G2: Gibberella zeae, G3: Neurospora crassa, G4: Blastopirellula marina, G5: Drosophila melanogaster). Datasets were compared on the basis of the percentage of sequences related to 26 subsystems (functional categories) of the SEED database. Only significant differences at a P-value of 0.05 are represented in this ordination.
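The Kulczynski distance that feeds the NMDS ordination can be computed directly from the functional profiles. The sketch below uses the usual abundance-based definition, d = 1 - 0.5 * (S / sum(x) + S / sum(y)), where S is the sum of the element-wise minima of the two profiles; the profiles shown are invented toy percentages, and the ordination step itself is not reproduced.

```python
def kulczynski_distance(x, y):
    """Kulczynski dissimilarity between two non-negative abundance profiles."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    return 1.0 - 0.5 * (shared / sum(x) + shared / sum(y))


def distance_matrix(profiles):
    """Pairwise distances between named profiles, as a dict keyed by pairs."""
    names = list(profiles)
    return {
        (p, q): kulczynski_distance(profiles[p], profiles[q])
        for p in names
        for q in names
    }


if __name__ == "__main__":
    # Toy percentages over three functional categories (hypothetical values).
    toy_profiles = {
        "sample_A": [50.0, 30.0, 20.0],
        "sample_B": [20.0, 30.0, 50.0],
        "sample_C": [50.0, 30.0, 20.0],
    }
    for pair, dist in sorted(distance_matrix(toy_profiles).items()):
        print(pair, round(dist, 3))
```

The resulting matrix is the input an NMDS routine would then embed in two dimensions; identical profiles give a distance of 0 and profiles with no category in common give 1.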

4.4 Discussion

4.4.1 Choice of studied ecosystem and its possible ecological significance: Studying natural environments such as alpine ecosystems in a representative way that elucidates the interactions between all their components is hampered by several factors. These interactions cannot always be simply ascribed, as there are many mechanisms through which the soil microbial community can influence plant community structure and vice versa, and associations between plant and microbial community structures can arise through the shared influence of external environmental factors (Lamb et al., 2011). All these factors and interactions are more clearly observed in alpine tundra ecosystems. In alpine environments, climate change may alter vegetation composition as well as the quantity and quality of plant litter, which in turn may affect microbial community composition and functioning (Djukic et al., 2010). More precisely, the duration of snow cover and the frequency of freeze-thaw events radically influence alpine tundra structure and functioning (Nemergut et al., 2005; Edwards et al., 2007; Schöb et al., 2009). The snowmelt gradient, illustrated by the presence of late snowmelt (LSM) and early snowmelt (ESM) locations, represents an important opportunity to study the effects of snow on local climatic conditions and ecosystem processes. Thus, understanding which genes are expressed and under which conditions, and which metabolic pathways are activated or not in relation to the snow gradient, could be ecologically very important for understanding the alpine tundra ecosystem and its nutrient cycling, all the more so as alpine soils sequester a large quantity of C.
4.4.2 Choice of molecular approach for ecological questions: To obtain the best estimate of the factors that control nutrient cycling and major processes along a snowmelt gradient whose two ends (LSM and ESM) differ largely in their abiotic and biotic factors, the choice of the molecular approach used to answer a given ecological question is very important. Molecular microbial ecology is based on molecular markers, which can be genes or gene transcripts that can be identified in a complex pool of nucleic acids and that provide information on the group of organisms harboring them. Information based on DNA and RNA can answer questions related to the population structure of a specific environment, such as: Is a particular gene present in this population? What is the phylogenetic composition of this community? Are particular genes expressed? How does the community composition change after perturbation of the environment? What are the spatial or temporal differences in a particular habitat? However, the choice of molecular marker is

very dependent on the question asked. While the use of DNA as a molecular marker reveals information on the presence of organisms or the potential function of a community, it gives no information on the activity at the time of sampling. In recent years, RNA has therefore been targeted more often to obtain information on the active fraction of the population, as transcript formation is believed to follow metabolic activity (Sørensen et al., 2009). In this context, the metatranscriptome should reflect the pattern of gene expression of the microbial community in a complex environment (Bailly et al., 2007). The two cDNA datasets were generated from alpine soil samples by 3' poly(A) enrichment using an oligo(dT) primer, which was expected to eliminate the majority of rRNA. This was not entirely the case in our datasets: 15% and 10% of reads had a BLAST hit against ribosomal references for LSM and ESM, respectively. This is consistent with the observation that, although poly(A) enrichment of eukaryotic mRNA can considerably reduce the number of rRNA reads retained and eventually sequenced, some will inevitably pass through the purification and tail-based amplification (John et al., 2009). These ribosomal sequences were isolated and analyzed separately from the putative mRNA reads, because retained rRNA reads can reduce the efficiency of protein-coding sequence recovery from eukaryotic metatranscriptome libraries and interfere with any analysis of the mRNA transcripts present in the dataset (McGrath et al., 2008; John et al., 2009). Nevertheless, the relatively large percentage of eukaryotic mRNA sequences in our libraries indicates that the oligo(dT) approach adequately selected polyadenylated mRNA, even though the persistence of rRNA reads within the retained dataset means that the poly(A) enrichment was not totally successful.
4.4.3 Qualitative and quantitative aspects of the obtained datasets: Current RNA-seq approaches can present many difficulties. First, the RNA-seq signal across transcripts tends to show non-uniform coverage, which may result from biases introduced during various steps, such as priming with random hexamers, cDNA synthesis, ligation, amplification and sequencing. Second, commonly used RNA-seq strategies can introduce transcript-length bias because of the multiple fragmentation and RNA or cDNA size-selection steps they use. This bias may complicate downstream analyses (Ozsolak & Milos, 2010). Trimming and size selection of 454 pyrosequencing reads are crucial in order to get the most representative estimate of what occurs in situ. One technical difficulty is the presence of long poly(A/T) tails in cDNA, which may lead to low-quality sequencing reads (Sun et al., 2010). Short reads frequently contained poly(A/T) stretches and represented 36% and 35% of the raw dataset generated by the 454 pyrosequencing GS

FLX system for LSM and ESM, respectively, which means that our approach of oligo(dT) priming with 30 T was not totally adequate. 454 pyrosequencing does not efficiently process homopolymer regions longer than 8 bp; thus, using oligo(dT) primers yields sequences that are 3'-enriched relative to the entire transcriptome and that frequently contain polyadenylated tails, which can significantly reduce the length of informative reads and produce low-quality sequencing reads (Meyer et al., 2009; Vega-Arreguín et al., 2009; Sun et al., 2010). In addition, the elimination of short reads, which usually contain poly(A/T) stretches, is justified by the fact that the significance of a sequence similarity depends in part on the length of the query sequence and that longer reads are generally more informative and allow more robust annotation; short reads obtained from next-generation sequencing frequently cannot be matched to known genes (Frias-Lopez et al., 2008; Meyer et al., 2009). Another technical problem is the number of PCR cycles used to amplify double-stranded cDNA. Several cycle numbers were tried in order to obtain the optimal size for pyrosequencing. In the end, 31 cycles were chosen, which we think could be a major source of bias, as increasing the number of PCR cycles can increase PCR artifacts. More cycles lead to the accumulation of more point-mutation artifacts, and it has been suggested to perform PCR with as few cycles as possible (Wu et al., 2001). However, lower numbers of PCR cycles (24 and 27) did not yield the amount of product required for pyrosequencing, set at 5 µg/100 µl. Since BLAST E-values are affected by sequence length, short sequences may be unfairly penalized if the E-value threshold is chosen as stringent as 10⁻¹⁰.
Therefore, a less stringent E-value threshold of 10⁻³ has been used in previous studies for the taxonomic and functional characterization of short sequence reads (Shrestha et al., 2009). Comparing our two metatranscriptome datasets with several databases is an important step towards covering the widest possible range of references and obtaining a comprehensive, quantitative analysis of our datasets. These comparisons revealed that the apparent diversity of our datasets depends largely on the reference database used (UniProt TrEMBL, UniProt Swiss-Prot, NCBI-NR and SEED) and on the E-value cutoff (10⁻² to 10⁻²⁰), with a negative relationship between the number of annotated sequences and the stringency of the E-value cutoff. The different percentages of annotated reads obtained with these databases could be explained by the fact that the SEED database is not totally adequate for eukaryotes, as it contains only 29 eukaryal genomes, which are not even complete (Meyer et al., 2008), and that the UniProtKB/Swiss-Prot database contains entries that are highly selected and

revised, which explains the low percentage of reads annotated using this database, and finally that UniProtKB/TrEMBL is less stringent and contains more entries annotated with less precision, which could also explain its high percentage of annotated reads (Wu et al., 2006).
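The dependence of BLAST E-values on query length noted in this section can be made concrete with a short sketch based on the standard Karlin-Altschul relation E = m · n · 2^(-S'), where m is the query length, n the database length and S' the bit score. The sketch is a simplification under stated assumptions: it uses raw rather than effective lengths, and the database size (`db_len`) and the score of about 2 bits per perfectly matched residue (`bits_per_residue`) are hypothetical values chosen for illustration, not parameters of the searches performed in this work.

```python
def evalue(bit_score: float, query_len: int, db_len: float) -> float:
    """Expected number of chance hits with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bit_score)


def best_case_evalue(
    query_len: int,
    db_len: float = 1e9,            # hypothetical database size (residues)
    bits_per_residue: float = 2.0,  # hypothetical score of a perfect match
) -> float:
    """Smallest E-value a query can reach if it matches over its full length."""
    return evalue(bits_per_residue * query_len, query_len, db_len)


def passes(query_len: int, threshold: float) -> bool:
    """True if even a perfect full-length match could meet the threshold."""
    return best_case_evalue(query_len) <= threshold


# Under these assumptions, a 25-residue query can satisfy a cutoff of 1e-3
# but can never satisfy a cutoff of 1e-10, whatever it matches.
short_ok_relaxed = passes(25, 1e-3)
short_ok_strict = passes(25, 1e-10)
long_ok_strict = passes(50, 1e-10)
```

Because the best achievable score grows with the length of the alignment, a stringent cutoff excludes short reads regardless of how well they match, which is the rationale for the relaxed thresholds applied to short reads.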

4.4.4 Assembly as an important factor for improving the quality of 454 pyrosequencing reads: Within the last decade, large-scale ‘omic’ approaches have evolved to the point where the original primary limitation, namely depth of sequencing, has been replaced by the challenge of analyzing ever-increasing sequence datasets (both the size of individual datasets and the total collection of datasets being amassed through time). These computational challenges include the correct assembly of genomes, the functional characterization of small genetic fragments and phylogenetic placement in the absence of traditional marker sequences such as the 16S rRNA gene (Morales & Holben, 2010). Short sequence reads make the assembly of overlapping sequences problematic; the challenge is greatest when genomic data are not available to aid assembly, when there is polymorphism, and when the data are cDNA sequences that contain variation created by alternative splicing and vary widely in transcript abundance, making coverage uneven (Vera et al., 2008). A further challenge for any 454 project is obtaining sufficient coverage of less abundant transcripts. The 28362 and 26164 sequences in LSM and ESM, respectively, that were considered singletons (not incorporated into contigs) confirm the low coverage and the high diversity of our datasets. Deep sequencing of a transcriptome with 454 technology typically produces many singleton sequences that fail to be assembled; on the one hand, these could result from sequencing errors, artifacts of cDNA normalization, or contamination from other sources, and on the other hand, they could represent unique genes expressed at low levels (Meyer et al., 2009).
Accumulation curves of reads assembled into contigs and of reads not incorporated into contigs (singletons) clearly showed a positive relationship between the total number of reads and the number of reads assembled into contigs. The curves clearly did not reach saturation, so the assembly could not capture the full complexity of our datasets. In any case, there is no evident solution for assembling the sequences from a non-model-system transcriptome sequencing project into a number of sequences that precisely matches the number of genes expressed (Meyer et al., 2009). The noncontiguous sequence fragments resulting from small sequence fragments and the low coverage of a sample often lead to disconnected genetic elements. As parts of an organized system, gene sequences alone provide little information in the physiological or ecological sense without an understanding of the regulatory framework that controls their expression (Sorek &

Cossart, 2009). Also, the surrounding sequences often provide a context to discern whether putative gene sequences are capable of producing a viable protein. Unless successfully assembled into sequences representing operons or other genetic organizational units, short sequence reads merely produce a collection of genetic fragments that may have high identity with parts of genes or gene families, but do not provide sufficient information to ascertain an ecological function or guarantee gene product functionality (Morales & Holben, 2010). As a result, we preferred to carry out the taxonomic and functional analyses on unassembled reads because (i) we could not validate our assembly, (ii) assembly with CAP3 did not produce many long contigs but mostly short contigs with no more than 2 reads per contig, and (iii) the accumulation curves clearly showed that saturation was not reached. Finally, we think that the failure to reach saturation during assembly is largely related to the volume of reads obtained by pyrosequencing. Tringe et al. (2005) showed that environmental sequence data pose challenges to genome assembly, which suggests that complex communities will demand enormous sequencing expenditure for the assembly of even the most predominant members.
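The accumulation curves discussed in this section can be approximated with a simple subsampling sketch. It assumes that a table assigning each read to a contig (or to itself, for singletons) is already available, and it treats a read as "assembled" in a subsample when at least one other sampled read belongs to the same contig. This is only an approximation of re-running the assembler on each subsample; the function name, the toy read table and the choice of 100 repeats are illustrative assumptions, not the procedure used in this work.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def accumulation_curve(
    read_to_contig: Dict[str, str],
    depths: Sequence[int],
    n_repeats: int = 100,
    seed: int = 0,
) -> List[Tuple[int, float, float]]:
    """Mean numbers of assembled reads and singletons at each sampling depth."""
    rng = random.Random(seed)
    contig_ids = list(read_to_contig.values())
    curve = []
    for depth in depths:
        assembled_total = 0
        for _ in range(n_repeats):
            # Draw `depth` reads without replacement and count, per contig,
            # how many of the sampled reads fall into it.
            counts = Counter(rng.sample(contig_ids, depth))
            assembled_total += sum(c for c in counts.values() if c >= 2)
        mean_assembled = assembled_total / n_repeats
        curve.append((depth, mean_assembled, depth - mean_assembled))
    return curve


# Hypothetical toy table: three reads in contig A, two in contig B, one singleton.
toy_reads = {
    "read1": "A", "read2": "A", "read3": "A",
    "read4": "B", "read5": "B",
    "read6": "read6",
}
curve = accumulation_curve(toy_reads, depths=[1, 2, 4, 6])
```

A curve of assembled reads that keeps rising up to the full sampling depth, as described above for our datasets, indicates that additional sequencing would still bring new reads into contigs.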

4.4.5 Phylogenetic analysis: different approaches, different databases and, as a consequence, different results: All biologists who sample natural communities face the problem of how well a sample reflects a community’s true diversity. New genetic techniques have revealed more extensive microbial diversity than was previously detected with culture-dependent methods and morphological identification. As a result, we must rely on samples to inform us about the actual diversity of microbial communities (Hughes et al., 2001). However, a question emerging from environmental sequencing projects is the extent to which the data are interpretable in the absence of significant individual genome assemblies, as most microbial communities are extremely complex and thus not assembled (Tringe et al., 2005). For environmental ‘omic’ approaches, this raises the question of how to assign identity to randomly mixed small sequence fragments in the presence of high genomic and transcriptomic diversity, especially under high sequencing error rates. At least two scenarios arise in current research that can lead to erroneous genome or transcriptome reconstructions: (1) multiple related strains with subtle genomic/transcriptomic variability are present, but are not recognized as individual strains or transcripts in the assembly process, thus abolishing the detection of species-level genetic richness; (2) multiple different genome arrangements are present and real, but one is selected over the others based on the order of

discovery (Morales & Holben, 2010). Estimating taxonomic diversity in high-mountain soils is important because these areas are changing rapidly due to global warming, and many high-elevation ecosystems that depend on snowmelt and long periods of snow cover may disappear in the future, leading to the extinction of many species before they are even known to science (Freeman et al., 2009). Knowing which taxa are present in these ecosystems is essential for understanding the results of metatranscriptome analyses of datasets generated from them. A dominant community member should concomitantly dominate a pyrosequencing dataset. The differences in the proportional abundance of a given species across samples should be biologically meaningful and reflect the actual proportional abundance of that species in the environment (Amend et al., 2010). The abundance of thousands of transcripts varies markedly in response to environmental and developmental perturbations, affecting protein translation and activity, and thus organism phenotypes. Consequently, it is widely assumed that variation in the abundance of transcripts among individuals within a population will ultimately lead to divergence in transcript abundance among populations and taxa through adaptive phenotypic selection (Broadley et al., 2008). For sequences amplified from reverse-transcribed RNA, it can nevertheless be assumed that some correlation exists between the number of sequences and the volume of the corresponding biologically active cytoplasm (Bailly et al., 2007). As our two metatranscriptome datasets (LSM and ESM) represent two very different ecosystems, with different vegetation covers and consequently different microbial populations, as well as different snow-cover durations, we expected to detect different community structures in the two datasets.
When comparing environmental sequences with fully identified reference sequences, a common practice has been to rely on threshold values for sequence similarity (Ovaskainen et al., 2010). As seen previously for ribosomal reads, comparing our datasets with different databases at different E-values results in different taxonomic affiliations. Taxonomic annotations of putative mRNA against the SEED database were largely biased. For example, MG-RAST assigned 5886 sequences of the LSM dataset to fungi, and 100% of these were assigned to the phylum Ascomycota, which does not correspond to previous results on the community structure of alpine tundra. Zinger et al. (2009) showed that a variety of fungal phyla were present, with an even dominance of Basidiomycota and Ascomycota. Moreover, the majority of molecular surveys of fungal community composition have shown that Basidiomycota and Ascomycota are the dominant fungi in vegetated soils (Freeman et al., 2009). In the dry meadow, Meyer et al. (2004) investigated subnival and summer eukaryotic

communities and found sequences related to fungi, cercozoans, alveolates and lobosea. Fungi dominated this soil, and three major fungal phyla (ascomycetes, basidiomycetes, and zygomycetes) were present. The dominance of Ascomycota sequences in our two datasets could therefore be related to the fact that the SEED database is not well adapted to eukaryotic metagenomes, but rather to prokaryotic ones. As there was no reason to believe that not even one sequence belonged to Basidiomycota or to other groups of fungi, we did not feel comfortable relying on the taxonomic affiliations obtained from the SEED database. In contrast, the results obtained by comparing our two datasets with the NCBI-NR database were far more plausible. Fungi were the dominant group in both datasets (LSM and ESM); within fungi, we observed a largely similar distribution of sequences related to Basidiomycota and Ascomycota in the LSM dataset, and a weak dominance of sequences related to Basidiomycota compared with Ascomycota. Moreover, when all putative mRNA reads were analyzed with the NBC software using default parameters, the absolute majority of reads in both datasets were assigned to bacteria (data not shown). These results, which were clearly biased and not meaningful, were ignored when estimating the taxonomic origin of our two metatranscriptomes. Many challenges are associated with the interpretation of microbial gene expression patterns at the community level. Factors influencing metatranscriptome analyses include the specifics of mRNA synthesis and degradation rates, environmental conditions at the time of sampling, sequence read size and target gene size, and the specific method used for gene identification and annotation (Frias-Lopez et al., 2008).
The differences in the proportional abundance of a given species across samples should be biologically meaningful and reflect the actual proportional abundance of that species in the environment (Amend et al., 2010). In our experiment, however, this could not be confirmed, as the abundance of fungi and plants in the two datasets could be explained by the use of an oligo(dT) approach, which enriches reads containing the poly(A) tails that characterize almost all eukaryotes. Certain species that were not expected in alpine tundra soil, such as Branchiostoma floridae and Nematostella vectensis, were detected; this could be the result of methodological artifacts such as sequencing errors or other technical problems when blasting against the different databases. Overall, the two taxonomic approaches used, MG-RAST and Blast2GO, demonstrated the dominance of sequences of eukaryotic origin, then of sequences related to plants, with about 15-25% of bacterial sequences always present. The taxonomic assignments of MG-RAST and Blast2GO were relatively similar only at the

kingdom level, but differed significantly at finer phylogenetic scales (phylum or taxon level). However, understanding the functional roles of the communities in our two metatranscriptomic datasets requires relating them to the dominant taxonomic groups. In phylogenetic analyses, the number of species detected in a sample, or the number of organisms discerned at any given phylogenetic level, is strongly affected by the number of sequences analyzed (Roesch et al., 2007). Vega-Arreguín et al. (2009), studying the Palomero maize transcriptome, found that increasing the number of 454 sequencing runs significantly increased the number of novel genomic sequences matched by expressed sequence tags, providing expression evidence for genome regions that most probably represent genes or transcriptionally active non-coding regions with low levels of expression. As we do not know exactly the source of the bias represented by the presence of about 15% and 10% of ribosomal reads among the annotated reads of LSM and ESM, respectively, we tried to identify these potential ribosomal reads using the MG-RAST and MEGAN approaches. The two programs gave largely similar results for our two datasets, with certain exceptions. Taxonomic annotations based on the SEED database via MG-RAST showed a large dominance of sequences related to bacteria compared with the annotations obtained against the NCBI-NR database using MEGAN. Taxonomic affiliations of putative ribosomal reads based on the SEED database were largely homogeneous between the LSM and ESM datasets. In contrast, based on the comparison with the NCBI-NR database, some microbial phyla were over- or underrepresented, or even missing, in one dataset relative to the other.
Overall, and as mentioned above, the oligo(dT) approach should enrich reads with poly(A) tails; therefore, the sequences identified as rRNA were considered artifacts or sequencing errors and were subsequently eliminated.

4.4.6 Functional annotations using MG-RAST and Blast2GO tell different stories: MG-RAST arranges metabolic pathways into a hierarchical structure in which all the genes required for a specific function are grouped into subsystems, which are groups of genes or functional roles acting together in a biological process (Meyer et al., 2008). At the highest level of organization, the subsystems include both catabolic and anabolic functions (for example, secondary metabolism), and at the lowest level they correspond to specific pathways (for example, biosynthesis of phenylpropanoids). The comparison of our two metatranscriptome datasets showed that, first, a large range of expressed genes was detected

in both datasets and, second, some metabolic pathways were significantly more represented in LSM than in ESM, and vice versa. As MG-RAST classifies reads into functional subsystems at three hierarchical levels, it was interesting to examine the significant differences at subsystem hierarchy levels 2 and 3 in more detail. Some subsystems showed no differences at the first level of the hierarchy but did show differences at the second or third level. For example, the secondary metabolism subsystem did not differ between LSM and ESM, nor did the general subsystems within it; but within the biosynthesis of phenylpropanoids (hierarchy level 3) there were significant differences: apigenin derivatives and phytosterol biosynthesis were considerably more abundant in the ESM metatranscriptome, whereas flavanone biosynthesis was considerably more abundant in the LSM metatranscriptome. Functional differences exist between species with regard to specific aspects of C cycling. For example, it has been shown that specific components of the soil fungal community take part in the decomposition of particular C sources, such as glycine, sucrose, lignin, cellulose and tannin protein, and only certain groups of microbes are associated with the breakdown of specific C compounds (Nielsen et al., 2011).
Trehalose is widely distributed in living cells and is found frequently in yeast, fungi and plants (Cardoso et al., 2007), where it plays a variety of roles, primarily as a protectant against stress conditions including dehydration, heat and cold shock, oxidative stress and antimicrobial drug treatments. It also serves as a storage carbohydrate in fungi, thanks to the enzyme trehalase, which hydrolyzes trehalose into two molecules of glucose; consequently, the genes, proteins and intermediate metabolites involved in trehalose biosynthesis act as key regulators of basic carbon metabolism in fungi. Trehalose synthase (TreS, EC 5.4.99.16) and trehalose-6-phosphate phosphatase (EC 3.1.3.12), which were detected only at the ESM location, are part of widespread pathways of trehalose synthesis in many biological systems. When high levels of trehalose are produced in the cell, perhaps as a result of exposure to stress, TreS may function to convert this trehalose to maltose and then to glycogen when the stress is removed. Removal of trehalose is probably essential because high levels of trehalose may be toxic. Conversely, if trehalose falls to a dangerously low level, TreS may function to convert glycogen to maltose and then to trehalose (Pan et al., 2008). In contrast, enzymes such as pyruvate dehydrogenase E1 (EC 1.2.4.1), enolase (EC 4.2.1.11) and pyruvate decarboxylase (EC 4.1.1.1), involved in glycolysis and in the transformation of pyruvate into acetyl-CoA (a process called pyruvate decarboxylation), which in turn can be used in the citric acid cycle to

carry out cellular respiration, were considerably more abundant in the LSM dataset. These enzymes play an essential role in linking the glycolysis pathway to the citric acid cycle and releasing energy via NADH (Radin et al., 2009). These results only indicate a tendency of what is happening in these environments (LSM and ESM). For example, the variation in enzymes related to trehalose could suggest that organisms at the ESM location, which has no regular snow cover and is exposed to very low temperatures and repeated freeze-thaw cycles, are more stressed and therefore produce more stress-related enzymes, such as those of trehalose metabolism. In contrast, at the LSM location we can clearly see the dominance of enzymes related to respiratory pathways, which suggests that the organisms living there are active and produce these enzymes in relation to respiratory activity. Historically, the microbiology of these soils has been investigated in the snow-free season only, because it was assumed that low temperatures and frozen soil prohibited microbial activity during the winter. More recent studies demonstrate that microbial processes continue in snow-covered soils, and that a significant portion of the yearly decomposition and production of microbially derived trace gases occurs in subnival soil (Nemergut et al., 2005). In alpine environments, Bryant et al. (1998) considered that the variation in decomposition rates along a snowmelt gradient was a function of temperature and moisture. However, snow was not assessed as a potential determinant of litter decomposition because, during summer, soil temperature and soil moisture were similar in late snowmelt and early snowmelt locations. These findings could explain why we obtained largely similar community structure and functionality at the LSM and ESM locations, as we collected our samples in August, long after the snow cover had melted at both locations.
Our findings agree with those of Wallenstein et al. (2008), who found that fungal phyla did not differ greatly in abundance between samples taken prior to soil freezing and after the soils thawed, suggesting that most microorganisms survive the winter intact through resistance mechanisms. In contrast, during the rest of the plant growing season, microbial biomass is very dynamic and appears to coincide with plant activity, being relatively high when soils are warm and moist. These microbial dynamics appear to be linked to shifting substrate availability and changes in temperature regime; the observation that the microbial community is greatly affected by these two factors led to the hypothesis that the summer and winter microbial communities differ in function and composition (Lipson et al., 2002). Baptist et al. (2010) found that, although the effect of snowmelt location was not significant, there was a trend toward higher decomposition rates in late snowmelt locations. Microbial biomass is high during winter and declines rapidly as snow melts in the

spring, and this decline is associated with changes in temperature regime and substrate availability (Lipson et al., 2002). Moreover, evidence is mounting that snow-covered soils are teeming with active microbial life (Lipson & Monson, 1998; Nemergut et al., 2005). One of the most important aspects of mining genomics data is to associate individual sequences and related expression information with biological function. Automatic functional annotation is an effective approach to this problem. Functional annotation allows the categorization of genes into functional classes, which can be very useful for understanding the physiological meaning of large numbers of genes and for assessing functional differences between subgroups of sequences (Conesa et al., 2005). GO annotation provides a resource for investigating the specific processes, functions or cellular structures involved in metabolic processes that underlie essential biogeochemical cycles in the soil. The hierarchical structure of the GO vocabularies allows the selection of sets of genes involved in a specific process at the desired level of detail. For each sequence, the specific annotated terms were mapped to higher-level parent terms to provide a broad overview of the groups of genes cataloged in each metatranscriptome for each of the three ontology vocabularies. The functions of the genes identified cover various biological processes, which were largely common to the two metatranscriptome datasets. GO annotation analyses showed that, overall, the two metatranscriptome datasets were similar and that gene function categories associated with metabolic pathways are highly represented in both.
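The mapping of specific annotated terms to higher-level parent terms described above can be sketched as a traversal of a term hierarchy. The hierarchy below is a hypothetical toy fragment with GO-like term names, written for illustration only; it is not the actual Gene Ontology graph, and the function names and read annotations are assumptions rather than the Blast2GO procedure itself.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

# Hypothetical toy hierarchy: each term lists its direct parents.
PARENTS: Dict[str, List[str]] = {
    "trehalose biosynthetic process": ["carbohydrate metabolic process"],
    "glycolytic process": ["carbohydrate metabolic process"],
    "carbohydrate metabolic process": ["metabolic process"],
    "response to cold": ["response to stress"],
    "metabolic process": ["biological_process"],
    "response to stress": ["biological_process"],
    "biological_process": [],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable from `term` by following parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def summarize(
    annotations: Dict[str, Iterable[str]],
    parents: Dict[str, List[str]],
    level_terms: Set[str],
) -> Counter:
    """Count sequences per high-level term; each sequence counts once per term."""
    counts: Counter = Counter()
    for terms in annotations.values():
        covered: Set[str] = set()
        for term in terms:
            covered |= ({term} | ancestors(term, parents)) & level_terms
        counts.update(covered)
    return counts


# Hypothetical annotated reads.
annotations = {
    "read1": ["trehalose biosynthetic process"],
    "read2": ["glycolytic process", "response to cold"],
    "read3": ["response to cold"],
}
overview = summarize(annotations, PARENTS, {"metabolic process", "response to stress"})
```

Counting each sequence once per high-level category, as done here, avoids inflating a category when a read carries several specific terms that share the same parent.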
The aim of such a methodology is eventually to enable ecological descriptions based on the suite of genes expressed by the organisms active in the alpine ecosystem, which we could not achieve because of the lack of replicates in our experimental design. For non-model species with little or no genomic data available, sequencing offers rapid characterization of a large portion of the transcriptome and therefore provides a comprehensive tool for gene discovery (Wang et al., 2009). Genes involved in plant secondary metabolism have frequently been identified by the expressed sequence tag (EST) approach. The lower cost and greater sequence coverage offered by pyrosequencing make it possible to identify more candidate genes involved in plant natural product biosynthetic pathways, especially those with low abundance that are often missed by conventional EST projects (Wang et al., 2009). Sequences related to secondary metabolism represented a dominant fraction of the functional annotations based on Blast2GO. Secondary metabolism is directly related to plants; therefore, obtaining a high percentage of reads related to secondary metabolism, especially with Blast2GO, means that the sieving processes

120 effected on soil samples were not sufficient to eliminate all the raciness from the soil and that a large portion of these fragment were amplified using oligo(dT) approach. However, the percentage of sequences related to secondary metabolism was largely different using MG- RAST and Blast2go approach, with large dominance of sequences related to secondary metabolism within NCBI-NR database rather than SEED database. These variations are initially related to differences between databases and E-values used to get these estimations. Overall, these variances pose a real problem for the interpretation of these results at the community level. Combined comparative analyses of housekeeping genes and functional genes may provide information in both phylogenetic diversity and the potential functional diversity of microbial communities. Such information is very useful in functional diversity studies to track highly expressed genes and genes critical in biogeochemical pathways (Torsvik & Øvreås, 2002). However, in-depth information on community functioning in a biogeochemical or ecological context within our two datasets (LSM and ESM) was limited, as most of the predictions were related to cellular housekeeping functions. However, a large portion of house-keeping genes, that is required for the maintenance of basic cellular function were largely dominant in our both metatranscriptomes datasets. High levels of expression might be expected for some house-keeping genes, causing them to be well represented in even an incomplete transcriptome sequencing effort. Finally, lack of annotation for many of the genes discovered will remain a problem when working with non-model species, regardless of methodological approach or assembly quality.
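The sensitivity of category proportions to the E-value cutoff, mentioned above as one source of disagreement between annotation pipelines, can be illustrated with a small sketch. The hits, categories and cutoffs below are hypothetical and are not taken from our MG-RAST or Blast2GO results.

```python
from collections import Counter

# Hypothetical best hits: (read id, functional category, E-value).
HITS = [
    ("read_1", "secondary metabolism", 1e-3),
    ("read_2", "secondary metabolism", 1e-12),
    ("read_3", "protein metabolism", 1e-20),
    ("read_4", "carbohydrates", 1e-6),
    ("read_5", "secondary metabolism", 1e-4),
]


def category_shares(hits, max_evalue):
    """Return category -> fraction among hits with E-value <= max_evalue."""
    kept = [category for _, category, evalue in hits if evalue <= max_evalue]
    if not kept:
        return {}
    counts = Counter(kept)
    return {category: n / len(kept) for category, n in counts.items()}


loose = category_shares(HITS, 1e-3)    # permissive cutoff keeps all five hits
strict = category_shares(HITS, 1e-5)   # stricter cutoff keeps three hits
print(loose["secondary metabolism"], strict["secondary metabolism"])
```

In this toy case the share assigned to one category changes substantially with the cutoff alone, so proportions obtained from different databases are only comparable when the same threshold is applied to both.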

4.5 Conclusions and perspectives.

This is, to our knowledge, the first report of a metatranscriptomic approach applied to a eukaryotic alpine soil dataset. In ecosystems for which no fully sequenced genomes or other molecular data are available, interpreting sequencing and assembly results is very difficult. This study lacks technical replicates from separate 454 runs, so it is not possible to determine whether the biases we encountered are representative of pyrosequencing in general. Consequently, no definitive conclusions can be drawn about the metabolic pathways discussed above. Nevertheless, despite these caveats and the room for improvement, this study demonstrates the feasibility of the metatranscriptomic approach, from the extraction of RNA from alpine soil to the recovery of functional cDNAs expressed by the eukaryotic community occupying this ecosystem. Despite the limited number of sequences obtained, we identified sequences coding for enzymes that could participate in major soil biochemical processes. Large-scale sequencing is likely to identify a significant number of such genes, which could be used to clarify the activities expressed in situ by representatives of different eukaryotic phyla. Based on these coverage estimates, frequently transcribed genes from abundant taxa were well represented, but greater depth would have been required to fully capture some specialized processes carried out by the rarest members of the alpine soil community.

Pyrosequencing is a powerful alternative to traditional identification techniques in terms of cost, time and sample throughput, and is likely to recover more species and to provide a less biased qualitative picture of microbial community composition. However, because of PCR biases and technical errors, caution is needed when interpreting the quantitative aspects and the biological meaning of the data. Moreover, tools are needed to handle next-generation sequencing data and to deal with the computational complexity of large-scale studies. In light of the above, the most important question we faced is what ecological significance can be inferred in the absence of technical replicates for our two datasets, and whether the differences observed could reflect actual taxonomic and functional differences between the two datasets (LSM and ESM).

For further studies using the metatranscriptomic approach to elucidate taxonomic and functional diversity in natural environments, it is important to note that:
I. Despite the growing availability of next-generation sequencing technologies, significant effort is still often needed to collect the required biological starting material. For transcriptome analysis, this generally entails RNA extraction with high yield and quality, mRNA isolation and purification, and the synthesis and subsequent amplification of cDNA.
II. Such research requires methodological progress, specifically an increased cDNA sampling effort combined with an overall cDNA length that allows taxonomic and functional assignments with high significance, as existing approaches are often insufficient to detect certain transcripts and/or to cover their entire length.
III. Assembling genomes for low-abundance community members in any environmental sample would clearly require significantly more sequence data.
IV. Global metatranscriptome analysis still relies on new high-throughput sequencing platforms that provide longer cDNA reads. Optimal annotation will be achieved with large metatranscriptomic datasets whose read lengths exceed those obtained so far. The 454 GS FLX Titanium pyrosequencing platform, with individual reads of around 400 bp, is thus an important step forward for metatranscriptomics.
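The remark above that greater sequencing depth would be needed to capture rare community members can be illustrated with a rarefaction curve. The sketch below assumes that each read carries one label (a taxon or gene family); the community composition, depths and number of permutations are hypothetical choices, not values from this study.

```python
import random


def rarefaction_curve(labels, depths, n_permutations=100, seed=0):
    """Mean number of distinct labels observed at each subsampling depth.

    labels: one label (e.g. a taxon or gene family) per read.
    depths: increasing read counts at which richness is evaluated.
    Each permutation shuffles the reads and counts the distinct labels
    among the first `depth` reads, i.e. sampling without replacement.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(depths)
    for _ in range(n_permutations):
        order = list(labels)
        rng.shuffle(order)
        for i, depth in enumerate(depths):
            totals[i] += len(set(order[:depth]))
    return [total / n_permutations for total in totals]


# Hypothetical community: three abundant labels and ten singletons.
reads = ["A"] * 50 + ["B"] * 30 + ["C"] * 10 + [f"rare_{i}" for i in range(10)]
depths = [10, 25, 50, 100]
curve = rarefaction_curve(reads, depths)
for depth, richness in zip(depths, curve):
    print(depth, round(richness, 2))
```

A curve that is still rising at the largest depth indicates that rare labels remain unsampled, which is the situation described for the rarest members of the alpine soil community.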

Analysis of 454 pyrosequencing data

An important goal in microbial ecology is to link microbial populations to specific environmental processes (e.g. biogeochemical transformations), in other words to provide direct links between organisms and their activities. Different strategies have been used to study ecosystem functioning in relation to microbial community structure. The essential aims of these efforts are to assign key functions to specific community members and, with regard to the stability of the ecosystem considered, to reveal cooperation between community members and functional redundancies.

The first step after obtaining 454 pyrosequencing data (or other types of high-throughput sequencing data) from nucleic acids extracted from the environmental samples studied is to analyse these data using appropriate databases and software. Regardless of the sequencing approach used to produce the data, the first steps in the analysis of any metagenome, and by extension any metatranscriptome, involve trimming the sequences (removing the primers and adaptors used for sequencing), followed by size selection through the removal of short reads and low-quality reads (for example, those containing poly(A/T) stretches), and then comparing these sequences with previously identified sequences in public databases. With advances in molecular and biochemical approaches for analysing natural environments, many databases have been created and are continually being developed to provide the best estimate of the taxonomic and functional diversity of the environments studied. This computationally intensive task provides fundamental data for many subsequent analyses, including phylogenetic comparisons and functional annotations.

An important step in obtaining high-quality genome sequences is the correct assembly of reads into longer sequences that accurately represent contiguous genomic regions. Assembly and other analytical tools used to handle pyrosequencing data can be evaluated using accumulation or rarefaction curves. The next step after sequence trimming, size selection and assembly is to assess the taxonomic content of the sample, as well as the relationship between functions and specific populations, by assigning phylogenies to specific genetic fragments, which is in itself an enormous challenge for studies of functional biological diversity.
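The first preprocessing steps described above (adaptor trimming, size selection and removal of reads with poly(A/T) stretches) can be sketched as follows. The adaptor sequence, minimum length and homopolymer rule are hypothetical placeholders, not the parameters used in this work.

```python
import re

ADAPTER = "ACGTACGT"                        # hypothetical adaptor/primer sequence
MIN_LENGTH = 20                             # hypothetical minimum read length (bp)
HOMOPOLYMER = re.compile(r"A{10,}|T{10,}")  # hypothetical poly(A/T) rule


def trim_adapter(seq, adapter=ADAPTER):
    """Remove the adaptor if it occurs at the 5' or 3' end of the read."""
    if seq.startswith(adapter):
        seq = seq[len(adapter):]
    if seq.endswith(adapter):
        seq = seq[:-len(adapter)]
    return seq


def clean_reads(reads, min_length=MIN_LENGTH):
    """Trim adaptors, then drop short reads and reads with poly(A/T) stretches."""
    kept = {}
    for read_id, seq in reads.items():
        seq = trim_adapter(seq.upper())
        if len(seq) < min_length:
            continue  # size selection
        if HOMOPOLYMER.search(seq):
            continue  # low-quality read containing a poly(A/T) stretch
        kept[read_id] = seq
    return kept


# Hypothetical reads illustrating the three outcomes.
reads = {
    "r1": "ACGTACGT" + "GATTACAGGCTTAGCCATGCAGTC",  # kept after trimming
    "r2": "ACGTACGT" + "GATTACA",                   # too short after trimming
    "r3": "GGCC" + "A" * 15 + "GGCCTTGACC",         # contains a poly(A) stretch
}
cleaned = clean_reads(reads)
print(cleaned)
```

Only the reads that pass all three filters would then be compared against public databases, so the chosen thresholds directly affect how many sequences reach the annotation step.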

After estimating taxonomic diversity, it is important to assign the species belonging to a given taxon to functional groups or to their biological role. This is, to our knowledge, the first study presenting a metatranscriptomic approach to alpine soils targeting eukaryotic communities. In ecosystems with few molecular data and where fully sequenced genomes are lacking, interpreting pyrosequencing and assembly results is very difficult. Moreover, in the absence of technical replicates obtained from separate 454 runs (as in our study), it is very difficult to determine whether the biases we encountered are representative of pyrosequencing in general. Nor is it possible to draw definitive conclusions about the metabolic pathways discussed above. Nevertheless, despite these limitations and the potential for improvement, we have shown that metatranscriptomic sequencing and characterization can be sufficient to identify many biological signatures expressed in complex biological samples such as those from alpine terrestrial ecosystems. Even though only a limited number of sequences were produced and replicates were lacking, we identified sequences coding for enzymes that participate in important biogeochemical processes in alpine soils. This study thus demonstrates the applicability of metatranscriptomic approaches based on RNA extracted from these alpine soils and on the recovery of functional cDNAs expressed by the eukaryotic community of this ecosystem.

References:
Ahmadian, A., M. Ehn, and S. Hober. 2006. Pyrosequencing: History, biochemistry and future. Clinica Chimica Acta 363:83-94.
Alagna, F., N. D'Agostino, L. Torchia, M. Servili, R. Rao, M. Pietrella, G. Giuliano, M.L. Chiusano, L. Baldoni, and G. Perrotta. 2009. Comparative 454 pyrosequencing of transcripts from two olive genotypes during fruit development. BMC Genomics 10.
Altschul, S.F., W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. 1990. Basic local alignment search tool. Journal of Molecular Biology 215:403-410.
Amend, A.S., K.A. Seifert, and T.D. Bruns. 2010. Quantifying microbial communities with 454 pyrosequencing: does read abundance count? Molecular Ecology 19:5555-5565.
Anderson, I.C., and J.W.G. Cairney. 2004. Diversity and ecology of soil fungal communities: increased understanding through the application of molecular techniques. Environmental Microbiology 6:769-779.
Andren, O., L. Brussaard, and M. Clarholm. 1999. Soil organism influence on ecosystem-level processes bypassing the ecological hierarchy? Applied Soil Ecology 11:177-188.
Aranda, R., S.M. Dineen, R.L. Craig, R.A. Guerrieri, and J.M. Robertson. 2009. Comparison and evaluation of RNA quantification methods using viral, prokaryotic, and eukaryotic RNA over a 10(4) concentration range. Analytical Biochemistry 387:122-127.
Arisue, N., M. Hasegawa, and T. Hashimoto. 2005. Root of the eukaryota tree as inferred from combined maximum likelihood analyses of multiple molecular sequence data. Molecular Biology and Evolution 22:409-420.
Ashburner, M., C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, and G. Sherlock. 2000. Gene Ontology: tool for the unification of biology. Nature Genetics 25:25-29.
Bailly, J., L. Fraissinet-Tachet, M.C. Verner, J.C. Debaud, M. Lemaire, M. Wesolowski-Louvel, and R. Marmeisse. 2007. Soil eukaryotic functional diversity, a metatranscriptomic approach. ISME Journal 1:632-642.
Baptist, F., L. Zinger, J.C. Clement, C. Gallet, R. Guillemin, J.M.F. Martins, L. Sage, B. Shahnavaz, P. Choler, and R. Geremia. 2008. Tannin impacts on microbial diversity and the functioning of alpine soils: a multidisciplinary approach. Environmental Microbiology 10:799-809.
Baptist, F., N.G. Yoccoz, and P. Choler. 2010. Direct and indirect control by snow cover over decomposition in alpine tundra along a snowmelt gradient. Plant and Soil 328:397-410.
Barakat, A., D.S. DiLoreto, Y. Zhang, C. Smith, K. Baier, W.A. Powell, N. Wheeler, R. Sederoff, and J.E. Carlson. 2009. Comparison of the transcriptomes of American chestnut (Castanea dentata) and Chinese chestnut (Castanea mollissima) in response to the chestnut blight infection. BMC Plant Biology 9.
Bardgett, R.D., W.D. Bowman, R. Kaufmann, and S.K. Schmidt. 2005. A temporal approach to linking aboveground and belowground ecology. Trends in Ecology & Evolution 20:634-641.
Bardgett, R. 2005. The biology of soils. Oxford University Press, Oxford.
Bastida, F., J.L. Moreno, C. Nicolas, T. Hernandez, and C. Garcia. 2009. Soil metaproteomics: a review of an emerging environmental science. Significance, methodology and perspectives. European Journal of Soil Science 60:845-859.

Beare, M.H., R.W. Parmelee, P.F. Hendrix, W.X. Cheng, D.C. Coleman, and D.A. Crossley. 1992. Microbial and faunal interactions and effects on litter nitrogen and decomposition in agroecosystems. Ecological Monographs 62:569-591.
Beniston, M. 2001. The effects of global warming on mountain regions: a summary of the 1995 report of the intergovernmental panel on climate change.
Benizri, E., and B. Amiaud. 2005. Relationship between plants and soil microbial communities in fertilized grasslands. Soil Biology & Biochemistry 37:2055-2064.
Benndorf, D., G.U. Balcke, H. Harms, and M. von Bergen. 2007. Functional metaproteome analysis of protein extracts from contaminated soil and groundwater. ISME Journal 1:224-234.
Bennett, L.T., S. Kasel, and J. Tibbits. 2008. Non-parametric multivariate comparisons of soil fungal composition: Sensitivity to thresholds and indications of structural redundancy in T-RFLP data. Soil Biology & Biochemistry 40:1601-1611.
Benson, D.A., I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, and E.W. Sayers. 2011. GenBank. Nucleic Acids Research 39:D32-D37.
Berner, R.A. 1997. Paleoclimate - The rise of plants and their effect on weathering and atmospheric CO2. Science 276:544-546.
Bjork, R.G., M.P. Bjorkman, M.X. Andersson, and L. Klemedtsson. 2008. Temporal variation in soil microbial communities in Alpine tundra. Soil Biology & Biochemistry 40:266-268.
Boisvert, S., F. Laviolette, and J. Corbeil. 2010. Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. Journal of Computational Biology 17:1519-1533.
Borneman, J., and E.W. Triplett. 1997. Rapid and direct method for extraction of RNA from soil. Soil Biology & Biochemistry 29:1621-1624.
Breure, A.M. 2004. Soil Biodiversity: Measurements, Indicators, Threats and Soil functions. I International Conference of Soil and Compost Eco-Biology. September 2004, León - Spain.
Briones, A.M., S. Okabe, Y. Umemiya, N.B. Ramsing, W. Reichardt, and H. Okuyama. 2003. Ammonia-oxidizing bacteria on root biofilms and their possible contribution to N use efficiency of different rice cultivars. Plant and Soil 250:335-348.
Broadley, M.R., P.J. White, J.P. Hammond, N.S. Graham, H.C. Bowen, Z.F. Emmerson, R.G. Fray, P.P.M. Iannetta, J.W. McNicol, and S.T. May. 2008. Evidence of neutral transcriptome evolution in plants. New Phytologist 180:587-593.
Broll, G. 1998. Diversity of soil organisms in alpine and arctic soils in Europe. Review and research needs. Pirineos: 43-72.
Brooks, P.D., M.W. Williams, and S.K. Schmidt. 1998. Inorganic nitrogen and microbial biomass dynamics before and during spring snowmelt. Biogeochemistry 43:1-15.
Bryant, D.M., E.A. Holland, T.R. Seastedt, and M.D. Walker. 1998. Analysis of litter decomposition in an alpine tundra. Canadian Journal of Botany-Revue Canadienne De Botanique 76:1295-1304.
Buee, M., M. Reich, C. Murat, E. Morin, R.H. Nilsson, S. Uroz, and F. Martin. 2009. 454 Pyrosequencing analyses of forest soils reveal an unexpectedly high fungal diversity. New Phytologist 184:449-456.
Cannone, N., S. Sgorbati, and M. Guglielmin. 2007. Unexpected impacts of climate change on alpine vegetation. Frontiers in Ecology and the Environment 5:360-364.

Cardoso, F.S., R.F. Castro, N. Borges, and H. Santos. 2007. Biochemical and genetic characterization of the pathways for trehalose metabolism in Propionibacterium freudenreichii, and their role in stress response. Microbiology-SGM 153:270-280.
Chagoyen, M., and F. Pazos. 2010. Quantifying the biological significance of gene ontology biological processes-implications for the analysis of systems-wide data. Bioinformatics 26:378-384.
Cheung, F., J. Win, J.M. Lang, J. Hamilton, H. Vuong, J.E. Leach, S. Kamoun, C.A. Levesque, N. Tisserat, and C.R. Buell. 2008. Analysis of the Pythium ultimum transcriptome using Sanger and Pyrosequencing approaches. BMC Genomics 9.
Choler, P. 2005. Consistent shifts in Alpine plant traits along a mesotopographical gradient. Arctic Antarctic and Alpine Research 37:444-453.
Cocking, E.C. 2003. Endophytic colonization of plant roots by nitrogen-fixing bacteria. Plant and Soil 252:169-175.
Coleman, D.C., and W.B. Whitman. 2005. Linking species richness, biodiversity and ecosystem function in soil systems. Pedobiologia 49:479-497.
Conesa, A., S. Gotz, J.M. Garcia-Gomez, J. Terol, M. Talon, and M. Robles. 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21:3674-3676.
Diaz, N.N., L. Krause, A. Goesmann, K. Niehaus, and T.W. Nattkemper. 2009. TACOA - Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinformatics 10.
DiGuistini, S., N.Y. Liao, D. Platt, G. Robertson, M. Seidel, S.K. Chan, T.R. Docking, I. Birol, R.A. Holt, M. Hirst, E. Mardis, M.A. Marra, R.C. Hamelin, J. Bohlmann, C. Breuil, and S.J.M. Jones. 2009. De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data. Genome Biology 10.
Dinsdale, E.A., R.A. Edwards, D. Hall, F. Angly, M. Breitbart, J.M. Brulc, M. Furlan, C. Desnues, M. Haynes, L.L. Li, L. McDaniel, M.A. Moran, K.E. Nelson, C. Nilsson, R. Olson, J. Paul, B.R. Brito, Y.J. Ruan, B.K. Swan, R. Stevens, D.L. Valentine, R.V. Thurber, L. Wegley, B.A. White, and F. Rohwer. 2008. Functional metagenomic profiling of nine biomes. Nature 452:629-U8.
Djukic, I., F. Zehetner, A. Mentler, and M.H. Gerzabek. 2010. Microbial community composition and activity in different Alpine vegetation zones. Soil Biology & Biochemistry 42:155-161.
Duchaufour, P., and J.C. Gilot. 1996. Etude d’une chaîne de sols de l’étage alpine (col du Galibier) et ses relations avec la végétation. Oecologia Plantarum 1:253-274.
Edel-Hermann, V., N. Gautheron, C. Alabouvette, and C. Steinberg. 2008. Fingerprinting methods to approach multitrophic interactions among microflora and microfauna communities in soil. Biology and Fertility of Soils 44:975-984.
Edery, I., L.L. Chu, N. Sonenberg, and J. Pelletier. 1995. An efficient strategy to isolate full-length cDNAs based on an mRNA cap retention procedure (CAPture). Molecular and Cellular Biology 15:3363-3371.
Edwards, A.C., R. Scalenghe, and M. Freppaz. 2007. Changes in the seasonal snow cover of alpine regions and its effect on soil processes: A review. Quaternary International 162:172-181.
Finlay, B.J., S.C. Maberly, and J.I. Cooper. 1997. Microbial diversity and ecosystem function. Oikos 80:209-213.

Fisher, M.M., and E.W. Triplett. 1999. Automated approach for ribosomal intergenic spacer analysis of microbial diversity and its application to freshwater bacterial communities. Applied and Environmental Microbiology 65:4630-4636.
Foissner, W., H. Berger, K. Xu, and S. Zechmeister-Boltenstern. 2005. A huge, undescribed soil ciliate (Protozoa: Ciliophora) diversity in natural forest stands of Central Europe. Biodiversity and Conservation 14:617-701.
Freeman, K.R., A.P. Martin, D. Karki, R.C. Lynch, M.S. Mitter, A.F. Meyer, J.E. Longcore, D.R. Simmons, and S.K. Schmidt. 2009. Evidence that chytrids dominate fungal communities in high-elevation soils. Proceedings of the National Academy of Sciences of the United States of America 106:18315-18320.
Frias-Lopez, J., Y. Shi, G.W. Tyson, M.L. Coleman, S.C. Schuster, S.W. Chisholm, and E.F. DeLong. 2008. Microbial community gene expression in ocean surface waters. Proceedings of the National Academy of Sciences of the United States of America 105:3805-3810.
Gabriel, J. 2010. Development of soil microbiology methods: from respirometry to molecular approaches. Journal of Industrial Microbiology & Biotechnology 37:1289-1297.
Gao, H.C., Z.M.K. Yang, T.J. Gentry, L.Y. Wu, C.W. Schadt, and J.Z. Zhou. 2007. Microarray-based analysis of microbial community RNAs by whole-community RNA amplification. Applied and Environmental Microbiology 73:563-571.
Gardes, M., and T.D. Bruns. 1993. ITS primers with enhanced specificity for basidiomycetes - application to the identification of mycorrhizae and rusts. Molecular Ecology 2:113-118.
Gardi, C., L. Montanarella, D. Arrouays, A. Bispo, P. Lemanceau, C. Jolivet, C. Mulder, L. Ranjard, J. Rombke, M. Rutgers, and C. Menta. 2009. Soil biodiversity monitoring in Europe: ongoing activities and challenges. European Journal of Soil Science 60:807-819.
Gilbert, G.S., and W.P. Sousa. 2002. Host specialization among wood-decay polypore fungi in a Caribbean mangrove forest. Biotropica 34:396-404.
Gilbert, J.A., D. Field, Y. Huang, R. Edwards, W.Z. Li, P. Gilna, and I. Joint. 2008. Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS ONE 3.
Gori, F., G. Folino, M.S.M. Jetten, and E. Marchiori. 2010. MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks. Bioinformatics 27:196-203.
Gowda, M., R.C. Venu, M.B. Raghupathy, K. Nobuta, H.M. Li, R. Wing, E. Stahlberg, S. Couglan, C.D. Haudenschild, R. Dean, B.H. Nahm, B.C. Meyers, and G.L. Wang. 2006. Deep and comparative analysis of the mycelium and appressorium transcriptomes of Magnaporthe grisea using MPSS, RL-SAGE, and oligoarray methods. BMC Genomics 7.
Grant, S., W.D. Grant, D.A. Cowan, B.E. Jones, Y.H. Ma, A. Ventosa, and S. Heaphy. 2006. Identification of eukaryotic open reading frames in metagenomic cDNA libraries made from environmental samples. Applied and Environmental Microbiology 72:135-143.
Green, P. 1996. PHRAP documentation. University of Washington, Seattle.
Greene, E.A., and G. Voordouw. 2003. Analysis of environmental microbial communities by reverse sample genome probing. Journal of Microbiological Methods 53:211-219.
Gribaldo, S., and C. Brochier-Armanet. 2006. The origin and evolution of Archaea: a state of the art. Philosophical Transactions of the Royal Society B-Biological Sciences 361:1007-1022.

Groffman, P.M., J.P. Hardy, S. Nolan, R.D. Fitzhugh, C.T. Driscoll, and T.J. Fahey. 1999. Snow depth, soil frost and nutrient loss in a northern hardwood forest. Hydrological Processes 13:2275-2286.
Groffman, P.M., C.T. Driscoll, T.J. Fahey, J.P. Hardy, R.D. Fitzhugh, and G.L. Tierney. 2001. Effects of mild winter freezing on soil nitrogen and carbon dynamics in a northern hardwood forest. Biogeochemistry 56:191-213.
Grunberg-Manago, M. 1999. Messenger RNA stability and its role in control of gene expression in bacteria and phages. Annual Review of Genetics 33:193-227.
Gu, J., and R. Reddy. 2001. Cellular RNAs: Varied Roles. Encyclopedia of Life Science.
Guazzaroni, M.E., A. Beloqui, P.N. Golyshin, and M. Ferrer. 2009. Metagenomics as a new technological tool to gain scientific knowledge. World Journal of Microbiology & Biotechnology 25:945-954.
Gupta, V., R.P. Dick, and D.C. Coleman. 2008. Functional microbial ecology: Molecular approaches to microbial ecology and microbial habitats - Preface. Soil Biology & Biochemistry 40:1269-1271.
Hayano, K. 1986. Cellulase complex in a tomato field soil - induction, localization and some properties. Soil Biology & Biochemistry 18:215-219.
Hirsch, P.R., T.H. Mauchline, and I.M. Clark. 2010. Culture-independent molecular techniques for soil microbial ecology. Soil Biology & Biochemistry 42:878-887.
Hobbie, S.E. 1996. Temperature and plant species control over litter decomposition in Alaskan tundra. Ecological Monographs 66:503-522.
Holmjensen, I. 1960. A new gas absorption device - its application to titrimetric and conductometric micro determinations of carbon dioxide in air. Analytica Chimica Acta 23:13-27.
Hooper, D.U., D.E. Bignell, V.K. Brown, L. Brussaard, J.M. Dangerfield, D.H. Wall, D.A. Wardle, D.C. Coleman, K.E. Giller, P. Lavelle, W.H. Van der Putten, P.C. De Ruiter, J. Rusek, W.L. Silver, J.M. Tiedje, and V. Wolters. 2000. Interactions between aboveground and belowground biodiversity in terrestrial ecosystems: Patterns, mechanisms, and feedbacks. Bioscience 50:1049-1061.
Huang, X.Q., and A. Madan. 1999. CAP3: A DNA sequence assembly program. Genome Research 9:868-877.
Hughes, J.B., J.J. Hellmann, T.H. Ricketts, and B.J.M. Bohannan. 2001. Counting the uncountable: Statistical approaches to estimating microbial diversity. Applied and Environmental Microbiology 67:4399-4406.
Hurt, R.A., X.Y. Qiu, L.Y. Wu, Y. Roh, A.V. Palumbo, J.M. Tiedje, and J.H. Zhou. 2001. Simultaneous recovery of RNA and DNA from soils and sediments. Applied and Environmental Microbiology 67:4495-4503.
Huson, D.H., and S. Mitra. 2011. Comparative metagenome analysis using MEGAN. Handbook of Molecular Microbial Ecology I: Metagenomics and Complementary Approaches.
Jasiecki, J., and G. Wegrzyn. 2003. Growth-rate dependent RNA polyadenylation in Escherichia coli. EMBO Reports 4:172-177.
John, D.E., B.L. Zielinski, and J.H. Paul. 2009. Creation of a pilot metatranscriptome library from eukaryotic plankton of a eutrophic bay (Tampa Bay, Florida). Limnology and Oceanography-Methods 7:249-259.

Kapranov, P., S.E. Cawley, J. Drenkow, S. Bekiranov, R.L. Strausberg, S.P.A. Fodor, and T.R. Gingeras. 2002. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296:916-919.
Kapteyn, J., R.F. He, E.T. McDowell, and D.R. Gang. 2010. Incorporation of non-natural nucleotides into template-switching oligonucleotides reduces background and improves cDNA synthesis from very small RNA samples. BMC Genomics 11.
Keeling, P.J., G. Burger, D.G. Durnford, B.F. Lang, R.W. Lee, R.E. Pearlman, A.J. Roger, and M.W. Gray. 2005. The tree of eukaryotes. Trends in Ecology & Evolution 20:670-676.
http://cran.r-project.org/web/packages/picante/index.html. 2009. picante: R tools for integrating phylogenies and ecology. R package version 0.7-2. http://cran.r-project.org/web/packages/picante/index.html, Accessed 3 June 2011.
Kielland, K. 1994. Amino acid absorption by Arctic plants - implications for plant nutrition and nitrogen cycling. Ecology 75:2373-2383.
Kirby, K.S. 1956. A new method for the isolation of ribonucleic acids from mammalian tissues. British Empire Cancer Campaign Research Fellow 64:405-409.
Kirk, J.L., L.A. Beaudette, M. Hart, P. Moutoglis, J.N. Klironomos, H. Lee, and J.T. Trevors. 2004. Methods of studying soil microbial diversity. Journal of Microbiological Methods 58:169-188.
Kotamarti, R.M., M. Hahsler, D. Raiford, M. McGee, and M.H. Dunham. 2010. Analyzing taxonomic classification using extensible Markov models. Bioinformatics 26:2235-2241.
Krause, L., N.N. Diaz, A. Goesmann, S. Kelley, T.W. Nattkemper, F. Rohwer, R.A. Edwards, and J. Stoye. 2008. Phylogenetic classification of short environmental DNA fragments. Nucleic Acids Research 36:2230-2239.
Lamb, E.G., N. Kennedy, and S.D. Siciliano. 2011. Effects of plant species richness and evenness on soil microbial community diversity and function. Plant and Soil 338:483-495.
Lambers, H., C. Mougel, B. Jaillard, and P. Hinsinger. 2009. Plant-microbe-soil interactions in the rhizosphere: an evolutionary perspective. Plant and Soil 321:83-115.
Lane, C.E., and J.M. Archibald. 2008. The eukaryotic tree of life: endosymbiosis takes its TOL. Trends in Ecology & Evolution 23:268-275.
Laserson, J., V. Jojic, and D. Koller. 2011. Genovo: de novo assembly for metagenomes. Journal of Computational Biology 18:429-443.
Leininger, S., T. Urich, M. Schloter, L. Schwark, J. Qi, G.W. Nicol, J.I. Prosser, S.C. Schuster, and C. Schleper. 2006. Archaea predominate among ammonia-oxidizing prokaryotes in soils. Nature 442:806-809.
Lentendu, G., L. Zinger, S. Manel, E. Coissac, P. Choler, R.A. Geremia, and C. Melodelima. 2011. Assessment of soil fungal diversity in different alpine tundra habitats by means of pyrosequencing. Fungal Diversity.
Liang, P., and A.B. Pardee. 1992. Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science 257:967-971.
Lipson, D.A., and R.K. Monson. 1998. Plant-microbe competition for soil amino acids in the alpine tundra: effects of freeze-thaw and dry-rewet events. Oecologia 113:406-414.
Lipson, D.A., C.W. Schadt, and S.K. Schmidt. 2002. Changes in soil microbial community structure and function in an alpine dry meadow following spring snow melt. Microbial Ecology 43:307-314.

Liu, Y., N.Y. Garceau, J.J. Loros, and J.C. Dunlap. 1997. Thermally regulated translational control of FRQ mediates aspects of temperature responses in the Neurospora circadian clock. Cell 89:477-486.
Lundell, T.K., M.R. Makela, and K. Hilden. 2011. Lignin-modifying enzymes in filamentous basidiomycetes - ecological, functional and phylogenetic review. Journal of Basic Microbiology 50:5-20.
Lynch, J.M., A. Benedetti, H. Insam, M.P. Nuti, K. Smalla, V. Torsvik, and P. Nannipieri. 2004. Microbial diversity in soil: ecological theories, the contribution of molecular techniques and the impact of transgenic plants and transgenic microorganisms. Biology and Fertility of Soils 40:363-385.
Marguerat, S., B.T. Wilhelm, and J. Bahler. 2008. Next-generation sequencing: applications beyond genomes. Biochemical Society Transactions 36:1091-1096.
Margulies, M., M. Egholm, W.E. Altman, S. Attiya, J.S. Bader, L.A. Bemben, J. Berka, M.S. Braverman, Y.J. Chen, Z.T. Chen, S.B. Dewell, A. de Winter, J. Drake, L. Du, J.M. Fierro, R. Forte, X.V. Gomes, B.C. Godwin, W. He, S. Helgesen, C.H. Ho, S.K. Hutchison, G.P. Irzyk, S.C. Jando, M.L.I. Alenquer, T.P. Jarvie, K.B. Jirage, J.B. Kim, J.R. Knight, J.R. Lanza, J.H. Leamon, W.L. Lee, S.M. Lefkowitz, M. Lei, J. Li, K.L. Lohman, H. Lu, V.B. Makhijani, K.E. McDade, M.P. McKenna, E.W. Myers, E. Nickerson, J.R. Nobile, R. Plant, B.P. Puc, M. Reifler, M.T. Ronan, G.T. Roth, G.J. Sarkis, J.F. Simons, J.W. Simpson, M. Srinivasan, K.R. Tartaro, A. Tomasz, K.A. Vogt, G.A. Volkmer, S.H. Wang, Y. Wang, M.P. Weiner, D.A. Willoughby, P.G. Yu, R.F. Begley, and J.M. Rothberg. 2006. Genome sequencing in microfabricated high-density picolitre reactors (vol 437, pg 376, 2005). Nature 441:120-120.
Maron, P.A., L. Ranjard, C. Mougel, and P. Lemanceau. 2007. Metaproteomics: A new approach for studying functional microbial ecology. Microbial Ecology 53:486-493.
Martin-Cuadrado, A.B., P. Lopez-Garcia, J.C. Alba, D. Moreira, L. Monticelli, A. Strittmatter, G. Gottschalk, and F. Rodriguez-Valera. 2007. Metagenomics of the deep Mediterranean, a warm bathypelagic habitat. PLoS ONE 2.
Matzner, E., and W. Borken. 2008. Do freeze-thaw events enhance C and N losses from soils of different ecosystems? A review. European Journal of Soil Science 59:274-284.
McGrath, K.C., S.R. Thomas-Hall, C.T. Cheng, L. Leo, A. Alexa, S. Schmidt, and P.M. Schenk. 2008. Isolation and analysis of mRNA from environmental microbial communities. Journal of Microbiological Methods 75:172-176.
McGuire, A.D., L.G. Anderson, T.R. Christensen, S. Dallimore, L.D. Guo, D.J. Hayes, M. Heimann, T.D. Lorenson, R.W. Macdonald, and N. Roulet. 2009. Sensitivity of the carbon cycle in the Arctic to climate change. Ecological Monographs 79:523-555.
McHardy, A.C., H.G. Martin, A. Tsirigos, P. Hugenholtz, and I. Rigoutsos. 2007. Accurate phylogenetic classification of variable-length DNA fragments. Nature Methods 4:63-72.
Meyer, A.F., D.A. Lipson, A.P. Martin, C.W. Schadt, and S.K. Schmidt. 2004. Molecular and metabolic characterization of cold-tolerant alpine soil Pseudomonas sensu stricto. Applied and Environmental Microbiology 70:483-489.
Meyer, F., D. Paarmann, M. D'Souza, R. Olson, E.M. Glass, M. Kubal, T. Paczian, A. Rodriguez, R. Stevens, A. Wilke, J. Wilkening, and R.A. Edwards. 2008. The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9.

Meyer a, E., G.V. Aglyamova, S. Wang, J. Buchanan-Carter, D. Abrego, J.K. Colbourne, B.L. Willis, and M.V. Matz. 2009. Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx. Bmc Genomics 10.

Meyer b, F., R. Overbeek, and A. Rodriguez. 2009. FIGfams: yet another set of protein families. Nucleic Acids Research 37:6643-6654. Mohammed, M.H., T.S. Ghosh, N.K. Singh, and S.S. Mande. 2010. SPHINX-an algorithm for taxonomic binning of metagenomic sequences. Bioinformatics 27:22-30. Molina, C., B. Rotter, R. Horres, S.M. Udupa, B. Besser, L. Bellarmino, M. Baum, H. Matsumura, R. Terauchi, G. Kahl, and P. Winter. 2008. SuperSAGE: the drought stress- responsive transcriptome of chickpea roots. Bmc Genomics 9. Morales, S.E., and W.E. Holben. 2010. Linking bacterial identities and ecosystem processes: can 'omic' analyses be more than the sum of their parts? Fems Microbiology Ecology 75:2-16. Morozova, O., and M.A. Marra. 2008. Applications of next-generation sequencing technologies in functional genomics. Genomics 92:255-264. Morozova, O., M. Hirst, and M.A. Marra. 2009. Applications of New Sequencing Technologies for Transcriptome Analysis. Annual Review of Genomics and Human Genetics 10:135-151. Mueller, G.M., J.P. Schmit, P.R. Leacock, B. Buyck, J. Cifuentes, D.E. Desjardin, R.E. Halling, K. Hjortstam, T. Iturriaga, K.H. Larsson, D.J. Lodge, T.W. May, D. Minter, M. Rajchenberg, S.A. Redhead, L. Ryvarden, J.M. Trappe, R. Watling, and Q.W. Wu. 2007. Global diversity and distribution of macrofungi. Biodiversity and Conservation 16:37-48. Mullen, R.B., and S.K. Schmidt. 1993. Mycorrhizal Infection, Phosphorus Uptake, and Phenology in Ranunculus-Adoneus - Implications for the Functioning of Mycorrhizae in Alpine Systems. Oecologia 94:229-234. Mullen, R.B., S.K. Schmidt, and C.H. Jaeger. 1998. Nitrogen uptake during snowmelt by the snow buttercup, Ranunculus adoneus. Arctic and Alpine Research 30:121-125. Nadelhoffer, K.J., A.E. Giblin, G.R. Shaver, and J.A. Laundre. 1991. Effects of temperature and substrate quality on element mineralization in 6 Arctic soils. Ecology 72:242- 253. Nagalakshmi, U., Z. Wang, K. Waern, C. Shou, D. Raha, M. Gerstein, and M. Snyder. 2008. 
The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320:1344-1349. Nemergut, D.R., E.K. Costello, A.F. Meyer, M.Y. Pescador, M.N. Weintraub, and S.K. Schmidt. 2005. Structure and function of alpine and arctic soil microbial communities. Research in Microbiology 156:775-784. Nielsen, U.N., E. Ayres, D.H. Wall, and R.D. Bardgett. 2011. Soil biodiversity and carbon cycling: a review and synthesis of studies examining diversity-function relationships. European Journal of Soil Science 62:105-116. Nowrousian, M., J.E. Stajich, M.L. Chu, I. Engh, E. Espagne, K. Halliday, J. Kamerewerd, F. Kempken, B. Knab, H.C. Kuo, H.D. Osiewacz, S. Poggeler, N.D. Read, S. Seiler, K.M. Smith, D. Zickler, U. Kuck, and M. Freitag. 2010. De novo Assembly of a 40 Mb Eukaryotic Genome from Short Sequence Reads: Sordaria macrospora, a Model Organism for Fungal Morphogenesis. Plos Genetics 6. O'Brien, H.E., J.L. Parrent, J.A. Jackson, J.M. Moncalvo, and R. Vilgalys. 2005. Fungal community analysis by large-scale sequencing of environmental samples. Applied and Environmental Microbiology 71:5544-5550.

132 Oechel, W.C., S.J. Hastings, G. Vourlitis, M. Jenkins, G. Riechers, and N. Grulke. 1993. Recent Change of Arctic Tundra Ecosystems from a Net Carbon-Dioxide Sink to a Source. Nature 361:520-523. Oechel, W.C., G. Vourlitis, and S.J. Hastings. 1997. Cold season CO2 emission from arctic soils. Global Biogeochemical Cycles 11:163-172. http://cran.r-project.org/web/packages/vegan/index.html. 2009. vegan: Community Ecology Package. R package version 1.15-4. http://cran.r- project.org/web/packages/vegan/index.html, Accessed 3 June 2011. Opik, M., M. Metsis, T.J. Daniell, M. Zobel, and M. Moora. 2009. Large-scale parallel 454 sequencing reveals host ecological group specificity of arbuscular mycorrhizal fungi in a boreonemoral forest. New Phytologist 184:424-437. Orita, M., H. Iwahana, H. Kanazawa, K. Hayashi, and T. Sekiya. 1989. Detection of polymorphisms of human and DNA by Gel-electrophoresis as Single Strand Conformation Polymorphisms. Proceedings of the National Academy of Sciences of the United States of America 86:2766-2770. Osler, G.H.R., and M. Sommerkorn. 2007. Toward a complete soil C and N cycle: Incorporating the soil fauna. Ecology 88:1611-1621. Ovaskainen, O., J. Nokso-Koivista, J. Hottola, T. Rajala, T. Pennanen, H. Ali-Kovero, O. Miettinen, P. Oinonen, P. Auvinen, L. Paulin, K.H. Larsson, and R. Makipaa. 2010. Identifying wood-inhabiting fungi with 454 sequencing - what is the probability that BLAST gives the correct species? Fungal Ecology 3:274-283. Overbeek, R., T. Begley, R.M. Butler, J.V. Choudhuri, H.Y. Chuang, M. Cohoon, V. de Crecy-Lagard, N. Diaz, T. Disz, R. Edwards, M. Fonstein, E.D. Frank, S. Gerdes, E.M. Glass, A. Goesmann, A. Hanson, D. Iwata-Reuyl, R. Jensen, N. Jamshidi, L. Krause, M. Kubal, N. Larsen, B. Linke, A.C. McHardy, F. Meyer, H. Neuweger, G. Olsen, R. Olson, A. Osterman, V. Portnoy, G.D. Pusch, D.A. Rodionov, C. Ruckert, J. Steiner, R. Stevens, I. Thiele, O. Vassieva, Y. Ye, O. Zagnitko, and V. Vonstein. 2005. 
The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Research 33:5691-5702. Ovreas, L. 2000. Population and community level approaches for analysing microbial diversity in natural environments. Ecology Letters 3:236-251. Ozsolak, F., and P.M. Milos. 2010. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics 12:87-98. Pace, N.R. 1999. Microbial ecology & diversity. Asm News 65:328-333. Pace, B.W., Bank, S., Wise, L., Burson, L.C., Borrero, E. 1985. Amylase isoenzymes in the acute abdomen: an adjunct in those patients with elevated total amylase. Am J Gastroenterol 1985; 80:898-901. Pan, Y.T., J.D. Carroll, N. Asano, I. Pastuszak, V.K. Edavana, and A.D. Elbein. 2008. Trehalose synthase converts glycogen to trehalose. Febs Journal 275:3408-3420. Peay, K.G., P.G. Kennedy, and T.D. Bruns. 2008. Fungal Community Ecology: A Hybrid Beast with a Molecular Master. Bioscience 58:799-810. Peltola, H., H. Soderlund, and E. Ukkonen. 1984. SEQAID - A DNA-Sequence assembling program based on a mathematical-model. Nucleic Acids Research 12:307-321. Pilon, C.E., B. Cote, and J.W. Fyles. 1994. Effect of Snow Removal on Leaf Water Potential, Soil-Moisture, Leaf and Soil Nutrient Status and Leaf Peroxidase-Activity of Sugar Maple. Plant and Soil 162:81-88.

133 Poretsky, R.S., N. Bano, A. Buchan, G. LeCleir, J. Kleikemper, M. Pickering, W.M. Pate, M.A. Moran, and J.T. Hollibaugh. 2005. Analysis of microbial gene transcripts in environmental samples. Applied and Environmental Microbiology 71:4121-4126. Poretsky, R.S., I. Hewson, S.L. Sun, A.E. Allen, J.P. Zehr, and M.A. Moran. 2009. Comparative day/night metatranscriptomic analysis of microbial communities in the North Pacific subtropical gyre. Environmental Microbiology 11:1358-1375. Poroyko, V., L.G. Hejlek, W.G. Spollen, G.K. Springer, H.T. Nguyen, R.E. Sharp, and H.J. Bohnert. 2005. The maize root transcriptome by serial analysis of gene expression. Plant Physiology 138:1700-1710. Portillo, M.C., and J.M. Gonzalez. 2009. Comparing bacterial community fingerprints from white colonizations in Altamira Cave (Spain). World Journal of Microbiology & Biotechnology 25:1347-1352. Prosser, J.I. 2002. Molecular and functional diversity in soil micro-organisms. Plant and Soil 244:9-17. R Foundation for Statistical Computing, Vienna, Austria. http://cran.r-project.org/. 2010. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://cran.r-project.org/, Accessed 3 June 2011. Rardin, M.J., S.E. Wiley, R.K. Naviaux, A.N. Murphy, and J.E. Dixon. 2009. Monitoring phosphorylation of the pyruvate dehydrogenase complex. Analytical Biochemistry 389:157- 164. Riley, M. 1998. Systems for categorizing functions of gene products. Current Opinion in Structural Biology 8:388-392. Robinson, C.H. 2002. Controls on decomposition and soil nitrogen availability at high latitudes. Plant and Soil 242:65-81. Roesch, L.F., R.R. Fulthorpe, A. Riva, G. Casella, A.K.M. Hadwin, A.D. Kent, S.H. Daroub, F.A.O. Camargo, W.G. Farmerie, and E.W. Triplett. 2007. Pyrosequencing enumerates and contrasts soil microbial diversity. Isme Journal 1:283-290. Roger, A.J., and L.A. Hug. 2006. 
The origin and diversification of eukaryotes: problems with molecular phylogenetics and molecular clock estimation. Philosophical Transactions of the Royal Society B-Biological Sciences 361:1039-1054. Rosen, G., A. Rosenfeld, T.Y. Lim, Y. Lan, and C. Blackwood. 2010. NBC: the Naïve Bayes Classification tool webserver for taxonomic classification of metagenomic reads. Briefings in Bioinformatics 27:127-129. Rothschild, L.J., and R.L. Mancinelli. 2001. Life in extreme environments. Nature 409:1092-1101. Saleh-Lakha, S., M. Miller, R.G. Campbell, K. Schneider, P. Elahimanesh, M.M. Hart, and J.T. Trevors. 2005. Microbial gene expression in soil: methods, applications and challenges. Journal of Microbiological Methods 63:1-19. Sato, S., F.A. Feltus, P. Iyer, and M. Tien. 2009. The first genome-level transcriptome of the wood-degrading fungus Phanerochaete chrysosporium grown on red oak. Current Genetics 55:273-286. Sayler, G.S., J.T. Fleming, and D.E. Nivens. 2001. Gene expression monitoring in soils by mRNA analysis and gene lux fusions. Current Opinion in Biotechnology 12:455-460. Schmidt, W.M., and M.W. Mueller. 1999. CapSelect: a highly sensitive method for 5' CAP-dependent enrichment of full-length cDNA in PCR-mediated analysis of mRNAs. Nucleic Acids Res 27:e31.

134 Schmidt, S.K., D.A. Lipson, and T.K. Raab. 2000. Effects of willows (Salix brachycarpa) on populations of Salicylate-mineralizing microorganisms in alpine soils. Journal of Chemical Ecology 26:2049-2057. Schmidt, S.K., and D.A. Lipson. 2004. Microbial growth under the snow: Implications for nutrient and allelochemical availability in temperate soils. Plant and Soil 259:1-7. Schmidt, S.K., E.K. Costello, D.R. Nemergut, C.C. Cleveland, S.C. Reed, M.N. Weintraub, A.F. Meyer, and A.M. Martin. 2007. Biogeochemical consequences of rapid microbial turnover and seasonal succession in soil. Ecology 88:1379-1385. Schob, C., P.M. Kammer, P. Choler, and H. Veit. 2009. Small-scale plant species distribution in snowbeds and its sensitivity to climate change. Plant Ecology 200:91-104. Schreiber, F., P. Gumrich, R. Daniel, and P. Meinicke. 2010. Treephyler: fast taxonomic profiling of metagenomes. Bioinformatics 26:960-961. Schroeder, A., O. Mueller, S. Stocker, R. Salowsky, M. Leiber, M. Gassmann, S. Lightfoot, W. Menzel, M. Granzow, and T. Ragg. 2006. The RIN: an RNA integrity number for assigning integrity values to RNA measurements. Bmc Molecular Biology 7. Shi, Y.M., G.W. Tyson, and E.F. DeLong. 2009. Metatranscriptomics reveals unique microbial small RNAs in the ocean's water column. Nature 459:266-U154. Shrestha, P.M., M. Kube, R. Reinhardt, and W. Liesack. 2009. Transcriptional activity of paddy soil bacterial communities. Environmental Microbiology 11:960-970. Siciliano, S.D., J.J. Germida, K. Banks, and C.W. Greer. 2003. Changes in microbial community composition and function during a polyaromatic hydrocarbon phytoremediation field trial. Applied and Environmental Microbiology 69:483-489. Singh, B.K., P. Millard, A.S. Whiteley, and J.C. Murrell. 2004. Unravelling rhizosphere- microbial interactions: opportunities and limitations. Trends in Microbiology 12:386-393. Singh, B.K., R.D. Bardgett, P. Smith, and D.S. Reay. 2010. 
Microorganisms and climate change: terrestrial feedbacks and mitigation options. Nature Reviews Microbiology 8:779- 790. Sorek, R., and P. Cossart. 2009. Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity. Nature Reviews Genetics 11:9-16. Sorensen, J., M.H. Nicolaisen, E. Ron, and P. Simonet. 2009. Molecular tools in rhizosphere microbiology-from single-cell to whole-community analysis. Plant and Soil 321:483-512. Stotzky, G. 1960. A Simple Method for the Determination of the Respiratory Quotient of Soils. Canadian Journal of Microbiology 6:439-452. Sun, C., Y. Li, Q. Wu, H.M. Luo, Y.Z. Sun, J.Y. Song, E.M.K. Lui, and S.L. Chen. 2010. De novo sequencing and analysis of the American ginseng root transcriptome using a GS FLX Titanium platform to discover putative genes involved in ginsenoside biosynthesis. Bmc Genomics 11. Sutton, G.G., O. White, M.D. Adams, and A.R. Kerlavage. 1995. TIGR Assembler: A new tool for assembling large shotgun sequencing projects. Genome Science and Technology 1:9- 19. Suzuki, M., K. Matsui, M. Yamada, H. Kasai, T. Sofuni, and T. Nohmi. 1997. Construction of mutants of Salmonella typhimurium deficient in 8-hydroxyguanine DNA glycosylase and their sensitivities to oxidative mutagens and nitro compounds. Mutation Research-Genetic Toxicology and Environmental Mutagenesis 393:233-246.

135 Tartar, A., M.M. Wheeler, X.G. Zhou, M.R. Coy, D.G. Boucias, and M.E. Scharf. 2009. Parallel metatranscriptome analyses of host and symbiont gene expression in the gut of the termite Reticulitermes flavipes. Biotechnology for Biofuels 2. Taylor, L.L., J.R. Leake, J. Quirk, K. Hardy, S.A. Banwart, and D.J. Beerling. 2009. Biological weathering and the long-term carbon cycle: integrating mycorrhizal evolution and function into the current paradigm. Geobiology 7:171-191. Tedersoo, L., R.H. Nilsson, K. Abarenkov, T. Jairus, A. Sadam, I. Saar, M. Bahram, E. Bechem, G. Chuyong, and U. Koljalg. 2010. 454 Pyrosequencing and Sanger sequencing of tropical mycorrhizal fungi provide similar results but reveal substantial methodological biases. New Phytologist 188:291-301. Theurillat, J.P., and A. Guisan. 2001. Potential impact of climate change on vegetation in the European Alps: A review. Climatic Change 50:77-109. Thuiller, W., S. Lavorel, M.B. Araujo, M.T. Sykes, and I.C. Prentice. 2005. Climate change threats to plant diversity in Europe. Proceedings of the National Academy of Sciences of the United States of America 102:8245-8250. Tiedje, J.M., S. Asuming-Brempong, K. Nusslein, T.L. Marsh, and S.J. Flynn. 1999. Opening the black box of soil microbial diversity. Applied Soil Ecology 13:109-122. Timme, R.E., and C.F. Delwiche. 2010. Uncovering the evolutionary origin of plant molecular processes: comparison of Coleochaete (Coleochaetales) and Spirogyra (Zygnematales) transcriptomes. Bmc Plant Biology 10. Todaka, N., S. Moriya, K. Saita, T. Hondo, I. Kiuchi, H. Takasu, M. Ohkuma, C. Piero, Y. Hayashizaki, and T. Kudo. 2007. Environmental cDNA analysis of the genes involved in lignocellulose digestion in the symbiotic protist community of Reticulitermes speratus. Fems Microbiology Ecology 59:592-599. Torsvik, V., and L. Ovreas. 2002. Microbial diversity and function in soil: from genes to ecosystems. Current Opinion in Microbiology 5:240-245. Tringe, S.G., C. 
von Mering, A. Kobayashi, A.A. Salamov, K. Chen, H.W. Chang, M. Podar, J.M. Short, E.J. Mathur, J.C. Detter, P. Bork, P. Hugenholtz, and E.M. Rubin. 2005. Comparative metagenomics of microbial communities. Science 308:554-557. Urich, T., A. Lanzen, J. Qi, D.H. Huson, C. Schleper, and S.C. Schuster. 2008. Simultaneous Assessment of Soil Microbial Community Structure and Function through Analysis of the Meta-Transcriptome. Plos One 3. Van der Heijden, M.G.A., R.D. Bardgett, and N.M. van Straalen. 2008. The unseen majority: soil microbes as drivers of plant diversity and productivity in terrestrial ecosystems. Ecology Letters 11:296-310. Vega-Arreguin, J.C., E. Ibarra-Laclette, B. Jimenez-Moraila, O. Martinez, J.P. Vielle- Calzada, L. Herrera-Estrella, and A. Herrera-Estrella. 2009. Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing. Bmc Genomics 10. Vega-Sanchez, M.E., M. Gowda, and G.L. Wang. 2007. Tag-based approaches for deep transcriptome analysis in plants. Plant Science 173:371-380. Vera, J.C., C.W. Wheat, H.W. Fescemyer, M.J. Frilander, D.L. Crawford, I. Hanski, and J.H. Marden. 2008. Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Molecular Ecology 17:1636-1647. Van Breemen, N., 1993. Soils as biotic constructs favouring net productivity. Geoderma 57, 183-211.

136 Von Wintzingerode, F., U.B. Gobel, and E. Stackebrandt. 1997. Determination of microbial diversity in environmental samples: pitfalls of PCR-based rRNA analysis. Fems Microbiology Reviews 21:213-229. Walker, M.D., C.H. Wahren, R.D. Hollister, G.H.R. Henry, L.E. Ahlquist, J.M. Alatalo, M.S. Bret-Harte, M.P. Calef, T.V. Callaghan, A.B. Carroll, H.E. Epstein, I.S. Jonsdottir, J.A. Klein, B. Magnusson, U. Molau, S.F. Oberbauer, S.P. Rewa, C.H. Robinson, G.R. Shaver, K.N. Suding, C.C. Thompson, A. Tolvanen, O. Totland, P.L. Turner, C.E. Tweedie, P.J. Webber, and P.A. Wookey. 2006. Plant community responses to experimental warming across the tundra biome. Proceedings of the National Academy of Sciences of the United States of America 103:1342-1346. Wallenstein, M.D., and M.N. Weintraub. 2008. Emerging tools for measuring and modeling the in situ activity of soil extracellular enzymes. Soil Biology & Biochemistry 40:2098-2106. Wang, W., Y.J. Wang, Q. Zhang, Y. Qi, and D.J. Guo. 2009. Global characterization of Artemisia annua glandular trichome transcriptome using 454 pyrosequencing. Bmc Genomics 10. Wardle, D.A. 2002. Communities and Ecosystems. Linking the Aboveground and Belowground Components. Princeton University Press, Princeton, U.S.A. Wardle, D.A. 2006. The influence of biotic interactions on soil biodiversity. Ecology Letters 9:870-886. Warnecke, F., and M. Hess. 2009. A perspective: Metatranscriptomics as a tool for the discovery of novel biocatalysts. Journal Of Biotechnology 142:91-95. Weber, A.P.M., K.L. Weber, K. Carr, C. Wilkerson, and J.B. Ohlrogge. 2007. Sampling the arabidopsis transcriptome with massively parallel pyrosequencing. Plant Physiology 144:32-42. Williams, M.W., P.D. Brooks, and T. Seastedt. 1998. Nitrogen and carbon soil dynamics in response to climate change in a high-elevation ecosystem in the Rocky Mountains, USA. Arctic and Alpine Research 30:26-30. Wilson, S.D., and C. Nilsson. 2009. Arctic alpine vegetation change over 20 years. 
Global Change Biology 15:1676-1684. Wipf, S., and C. Rixen. 2010. A review of snow manipulation experiments in Arctic and alpine tundra ecosystems. Polar Research 29:95-109. Wu, J.Y., X.T. Jiang, Y.X. Jiang, S.Y. Lu, F. Zou, and H.W. Zhou. 2001. Effects of polymerase, template dilution and cycle number on PCR based 16 S rRNA diversity analysis using the deep sequencing method. Bmc Microbiology 10. Wu, C.H., R. Apweiler, A. Bairoch, D.A. Natale, W.C. Barker, B. Boeckmann, S. Ferro, E. Gasteiger, H.Z. Huang, R. Lopez, M. Magrane, M.J. Martin, R. Mazumder, C. O'Donovan, N. Redaschi, and B. Suzek. 2006. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Research 34:D187-D191. Young, I.M., and J.W. Crawford. 2004. Interactions and self-organization in the soil- microbe complex. Science 304:1634-1637. Yun, J., S. Kang, S. Park, H. Yoon, M.J. Kim, S. Heu, and S. Ryu. 2004. Characterization of a novel amylolytic enzyme encoded by a gene from a soil-derived metagenomic library. Applied and Environmental Microbiology 70:7229-7235. Zhou, X.F., Z.G. Lin, and H. Ma. 2010. Phylogenetic detection of numerous gene duplications shared by animals, fungi and plants. Genome Biology 11.

Zhu, Y.Y., E.M. Machleder, A. Chenchik, R. Li, and P.D. Siebert. 2001. Reverse transcriptase template switching: A SMART (TM) approach for full-length cDNA library construction. Biotechniques 30:892-897.
Zinger, L., J. Gury, F. Giraud, S. Krivobok, L. Gielly, P. Taberlet, and R.A. Geremia. 2007. Improvements of polymerase chain reaction and capillary electrophoresis single-strand conformation polymorphism methods in microbial ecology: Toward a high-throughput method for microbial diversity studies in soil. Microbial Ecology 54:203-216.
Zinger, L., J. Gury, O. Alibeu, D. Rioux, L. Gielly, L. Sage, F. Pompanon, and R.A. Geremia. 2008. CE-SSCP and CE-FLA, simple and high-throughput alternatives for fungal diversity studies. Journal of Microbiological Methods 72:42-53.
Zinger, L., B. Shahnavaz, F. Baptist, R.A. Geremia, and P. Choler. 2009. Microbial diversity in alpine tundra soils correlates with snow cover dynamics. Isme Journal 3:850-859.
Zinger, L., D.P.H. Lejon, F. Baptist, A. Bouasria, S. Aubert, R.A. Geremia, and P. Choler. 2011. Contrasting diversity patterns of crenarchaeal, bacterial and fungal soil communities in an alpine landscape. Plos One 6:e19950.

Annex 1: Towards a metatranscriptomic comparison of two alpine soils

Mustafa, T., Monier, A., Geremia, R.A., Choler, PH., Bonneville, JM.

1 Laboratoire d'Ecologie Alpine, UMR CNRS-UJF 5553, Université J. Fourier - Grenoble I, BP 53, 38041 Grenoble, France
2 Station Alpine J. Fourier, UMS 2925 CNRS-UJF, Université J. Fourier - Grenoble I, F-38041 Grenoble, France

* Correspondence: JM Bonneville, Laboratoire d'Ecologie Alpine, UMR CNRS-UJF 5553, Université J. Fourier - Grenoble I, BP XX, 38041 Grenoble, France; e-mail: jean- [email protected]

Keywords: Alpine soil, metatranscriptome, Eukaryotes, polyadenylated mRNA, metabolic pathways.


1. Introduction

Soils are highly diverse habitats that cover most continental areas. Eukaryotic and prokaryotic microbial communities, which represent most of the soil biomass, exhibit high species diversity. A single gram of soil may harbor several thousand prokaryotic species and more than 400 species (Lentendu et al., 2011). Within the soil ecosystem, almost all biochemical processes are carried out by communities of soil microorganisms, which influence the most important environmental functions (Bailly et al., 2007). Bacteria and fungi are responsible for 90-95% of the total biochemical processes in most soils (Djukic et al., 2010; Courty et al., 2010). Moreover, they are the driving force of organic matter decomposition and thus play a key role in the nutrient cycles of soil ecosystems (Djukic et al., 2010). For instance, fungi are recognized as crucial for plant cell wall degradation.

The composition of soil microbial communities has been the subject of numerous studies. Molecular methods are preferred because only a small proportion of soil microorganisms has been cultured, owing to the difficulty of isolating them under laboratory conditions. Fingerprinting and, more recently, massive sequencing provide clues about the factors influencing microbial community composition. Whereas soil prokaryotic communities are mainly influenced by abiotic factors such as pH, fungal communities are influenced by the quality of organic matter and the composition of the aboveground plant community. Studies performed in alpine soils show essentially the same results, and also reveal a seasonal succession of microbial communities. Metagenomics, which records the genetic information present in soil microorganisms, offers new opportunities to assess the functional properties of soils and may provide insights into the genome organization and gene content of new microbial species belonging to non-cultivable phyla (Bailly et al., 2007).
Most studies have used soil DNA, which reveals the presence of organisms or the potential functions of a community, but gives no information on the metabolic state of the organisms or on the activities actually occurring in situ (McGrath et al., 2008; Urich et al., 2008; Sorensen et al., 2009). Studies on soil RNA are better suited to assessing the metabolically active organisms and how they are affected by perturbations in environmental conditions (Gilbert et al., 2008). Consequently, RNA has been targeted more often to obtain information on the active portion of the population, which represents a logical next step after metagenomics (Urich et al., 2008; Sorensen et al., 2009).

As mentioned above, studies of community composition and metagenomics are important for assessing both microbial distribution and potentially available functions, but they are not informative about the metabolic state. RNA-based approaches (transcriptomics) may provide a better understanding of functional diversity and of the biological activities actually expressed in soil. The functional annotation of mRNA provides a snapshot of the actual functional capabilities of a soil, while mRNA abundance may be a proxy for enzyme production rates (Wallenstein and Weintraub, 2008). Given the taxonomic diversity and the proportion of unknown organisms, a high sequencing depth is likely to be necessary. The recently developed high-throughput pyrosequencing technology allows sequence data to be produced rapidly, with greatly reduced effort, time and cost (Vera et al., 2008; Wang et al., 2009). Thus, the increased throughput of next-generation sequencing technologies such as 454 sequencing shows great potential for expanding sequence databases for functional studies of soil biota (Meyer et al., 2009). Indeed, 454 pyrosequencing has already been used for transcriptomic studies of non-model organisms (McGrath et al., 2008; Sun et al., 2010; Hirsch et al., 2010).

Transcriptomic studies on environmental samples are called “metatranscriptomics” or “environmental transcriptomics”. In environmental studies, the main approach has been random priming, which yields mostly rRNA. However, several drawbacks have been identified: the approach introduces many biases (Gilbert et al., 2008), microbial mRNA is unstable and represents only a small portion of total cellular RNA, and significant levels of rRNA contamination persist (McGrath et al., 2008). The lack of reference genomes can also result in lower-quality annotations.
For eukaryotes, RNA extracted from environmental samples could circumvent these problems: owing to its poly-A tail, eukaryotic mRNA can be specifically isolated from a complex RNA mixture and converted into intronless cDNAs, which can be cloned to generate environmental metatranscriptomic cDNA libraries representative of the fraction of protein-coding genes expressed at the time of sampling (Bailly et al., 2007). The few examples published so far have successfully demonstrated the potential for the discovery of genes and genetic markers in these systems (Meyer et al., 2009); this approach could be applied to alpine soils, which to our knowledge have not yet been studied in this way.

Seasonally snow-covered soils represent 20% of the global land surface. It is widely assumed that these soils contain large amounts of organic carbon and that the mineralization of this carbon stock is of increasing concern in a warmer climate (Zinger et al., 2009). In arctic and alpine tundra, the duration of snow cover has a strong impact on ecosystem structure and functioning; moreover, major nutrient and carbon cycling is influenced by the

number, frequency and duration of freeze-thaw events (Björk et al., 2008), which disrupt soil structure and act as a selective antimicrobial agent (Edwards et al., 2007). Thus, alpine tundra offers ecologically important opportunities to estimate the impact of snow on local climatic conditions and ecosystem processes (Zinger et al., 2009). However, alpine microbial communities are not well known, and only a few comparative studies of microbial community dynamics in relation to snow cover patterns have been reported.

Our objective was to establish a first metatranscriptomic approach to the activity of the eukaryotic soil microbial community under two contrasting conditions in alpine tundra, namely early snowmelt (ESM) and late snowmelt (LSM), which are the two extremes of a snow cover gradient, in order to gain an initial experimental understanding of the interactions within microbial communities in alpine tundra and with their environment. In this study, we present the first cDNA library preparation and titration for the metatranscriptomes of two alpine soils sequenced using 454 technology, as well as sequence analysis and annotation procedures relying exclusively on publicly available software and Python scripting tools. We have also set up a bioinformatics analysis pipeline that allows functional and taxonomic information and statistics to be reliably extracted from this dataset.

2. Materials and Methods

2.1. Site characteristics and soil sampling

The study was conducted in the French South-Western Alps (Lieu-dit Vallon de Roche Noire, Massif du Grand Galibier, France; 45°0.05’N, 06°0.38’E, 2500 m asl), in truly alpine grasslands. Samples were collected (i) at an early snowmelt (ESM) site, characterized by a shallow winter snowpack and frequent and deep soil freezing events, and (ii) at a late snowmelt (LSM) site, which accumulates a dense and deep winter snowpack that insulates the soil from freezing temperatures.
Differences in snow cover regime between the two sites lead to marked differences in growing season length, with soil temperature becoming positive (i.e. snow free) about 40 days later in LSM than in ESM (Zinger et al., 2009). Consequently, the vegetation at the two sites differs: ESM is dominated by stress-tolerant species such as Kobresia myosuroides (Cyperaceae) and Dryas octopetala (Rosaceae), and by low-stature species such as Carex foetida (Cyperaceae), whereas the LSM vegetation is mainly composed of fast-growing species such as Alopecurus alpinus (Poaceae) and Alchemilla pentaphyllea (Rosaceae), as well as the cold-tolerant Salix herbacea (Salicaceae) (Choler, 2005). The two sites are separated by approximately 25 m; their surface areas range between 50 and 100 m² and they correspond to the sites B described in an earlier study (Zinger et al., 2009). Sampling was done in the late growing

season, on August 22nd 2008. At each site, three soil samples were collected from the top 10 cm of soil, sieved (2 mm mesh size) to remove visible roots, frozen on dry ice for transport and then stored at -80 °C.

2.2. RNA extraction and purification

Total soil RNA was extracted according to Bailly et al. (2007) with minor modifications. Briefly, 10 g of soil were submerged in liquid N2 in a mortar and ground with a pestle for 10 min. Two extractions for each of the three LSM and ESM samples were carried out in tubes containing 0.5 g of ground soil, 0.5 g of glass beads (0.6 mm diameter; Sigma), 450 µl of lysis buffer (0.1 M Tris-HCl pH 9, 20 mM EDTA, 0.1 M NaCl and 2% SDS), 25 µl of denaturing solution (4 M guanidine isothiocyanate, 10 mM Tris-HCl pH 8, 1 mM EDTA), 25 µl of β-mercaptoethanol, and 500 µl of phenol saturated with 0.1 M citrate buffer at pH 5. Tubes were agitated for 10 min at maximum speed at room temperature on a Vortex-Genie 2 (Scientific Industries), then centrifuged for 10 min. Two successive extractions with a phenol:chloroform:isoamyl alcohol mixture (25:24:1 by vol), followed by two extractions with chloroform:isoamyl alcohol (24:1 by vol), were then performed. The nucleic acids in the aqueous phase were precipitated by adding 0.1 volume of 3 M Na-acetate (pH 5.2) and 2.5 volumes of ethanol. The pellet was resuspended in 100 µl of diethyl pyrocarbonate-treated water. Contaminating DNA was digested at ambient temperature for 25 min using DNase I (3 kU/µl; Qiagen). Low-molecular-weight contaminants and DNA residues were eliminated on an RNeasy® Mini Kit column (Qiagen). RNA concentrations were estimated on a NanoDrop ND-1000 spectrophotometer, and RNA quality was assessed by capillary electrophoresis using the Bioanalyzer 2100 with the RNA 6000 Nano Kit (Agilent Technologies). RNA yields ranged from 3 to 6 mg/g fresh soil for LSM and from 5 to 11 mg/g for ESM.

2.3. cDNA library construction and pyrosequencing

RNA extracted from each soil sample was first converted into first-strand cDNA in three independent reactions, each using 225 ng of total RNA as template and the oligo(dT) primer of the MINT cDNA synthesis kit (Evrogen), according to the manufacturer's recommendations. Each reaction was then amplified in 10 parallel PCR tubes with an initial denaturation of 1 min at 95°C, 31 cycles of 15 s at 95°C, 20 s at 66°C and 3 min at 72°C, and a final extension of 15 s at 66°C followed by 3 min at 72°C. PCR products were pooled per sample and purified on QIAquick® columns (Qiagen). The cDNA purity, concentration and size range were estimated spectrophotometrically and by capillary electrophoresis using the Bioanalyzer 2100 with the DNA 6000 Nano Chips kit (Agilent Technologies). Finally, the cDNA samples were pooled per site (> 7 mg for each) and sent for sequencing to Genoscope (Centre

National de Séquençage, Evry, France). There, the two cDNAs were fragmented by nebulization to produce fragments of ~500 bp length, blunted, and ligated to A1 and B1 adaptors (A1: C*C*A*T*CTCATCCCTGCGTGTCTCCGAC*T*C*A*G-3', extended by the 3' tag ACGAGTGCGT for LSM and ACGCTCGACA for ESM; B1: 5' biotinylated C*C*T*A*TCCCCTGTGTGCCTTGGCAGTC*T*C*A*G; asterisks indicate phosphorothioate-modified bases). DNA molecules containing A1 and B1 were selected, mixed, and the samples were pyrosequenced jointly on one fourth of a plate according to Margulies et al. (2005), with the modifications required for the 454 GS-Flx Titanium technology (Roche).

2.4. Sequence analysis, assembly, bioinformatics pipelines.

FASTA files obtained from pyrosequencing were split into LSM and ESM files using their tags, and sequencing reads were trimmed and annotated using a series of custom Python scripts in the Obitools environment ( http://www.grenoble.prabi.fr/trac/OBITools ) that are available upon request. Raw sequences were trimmed i) of the M1 primer sequence used for cDNA amplification at either DNA end, ii) of the polyA tail when found in 3', and iii) of the polyT stretch when present in 5'; these homopolymeric stretches were sometimes > 100 nt in length, and only trimmed reads > 40 nt in length were kept. To identify ribosomal RNA reads, the resulting FASTA files were converted into databases and queried by BLASTn for similarity to one of a collection of complete LSU and SSU rRNA sequences gathered from the 19 bacterial phyla, 2 archaeal phyla, and 14 eukaryotic species taken among known soil fauna and flora members (see https://sites.google.com/site/metatranscriptomerochenoire/). Reads with an E-value < 0.001 are hereafter named r-RNA reads and were sorted out, leaving putative mRNA reads.
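The trimming rules described above can be illustrated with a minimal Python sketch. It is not the Obitools-based script used in this work: the function name `trim_read`, the minimum homopolymer run `min_run`, and the exact-match handling of the primer are assumptions made for illustration, and the M1 primer sequence (not given here) is left as a placeholder parameter.

```python
import re

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def trim_read(seq, primer="", min_run=10, min_len=40):
    """Trim one read; return the trimmed sequence, or None if it is too short.

    primer  : placeholder for the amplification primer (hypothetical value,
              upper case); only exact matches at the read ends are removed.
    min_run : assumed minimal length of a poly(T)/poly(A) stretch to trim.
    min_len : reads must be longer than this after trimming (40 nt in the text).
    """
    seq = seq.upper()
    if primer:
        if seq.startswith(primer):
            seq = seq[len(primer):]
        rc = reverse_complement(primer)
        if seq.endswith(rc):
            seq = seq[:-len(rc)]
    seq = re.sub(r"^T{%d,}" % min_run, "", seq)   # 5' poly(T) stretch
    seq = re.sub(r"A{%d,}$" % min_run, "", seq)   # 3' poly(A) tail
    return seq if len(seq) > min_len else None
```

A read that returns `None` corresponds to the non-informative class of reads; a real pipeline would also tolerate mismatches in the primer and in the homopolymer stretches, which this sketch does not.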
Read assembly was performed using the CAP3 software with default parameters (http://deepc2.psi.iastate.edu/aat/cap/cap.html; Huang and Madan, 1999), on a FASTA file free of annotations; rarefaction curves were drawn with an R script. BLAST searches using putative mRNA reads as queries were performed locally against the UniProt TrEMBL and Swiss-Prot databases using the -w15 option to relax gap opening penalty and an E-value cut-off of 10⁻⁶, and the identity of the best hit was stored. For functional annotation with MG-RAST (Meta Genome Rapid Annotation using Subsystem Technology; v1.2; Huson et al., 2009), FASTA files were submitted at http://metagenomics.nmpdr.org, using as thresholds an E-value of < 0.01 and a minimum alignment length of 50 bp. For Blast2GO annotation (http://www.blast2go.org/start_blast2go; Conesa and Gotz, 2008), an E-value cutoff of 10⁻⁶ was used. EC codes were those attributed to a read with the various annotation systems, when

available (the SwissProt enzyme code being the one associated with the best hit). When an annotation system associated a string of several ECs with a single read, that string was considered as a new entry in EC rarefaction curves. EC strings from distinct annotation systems were considered as conflicting with each other only when they did not contain a common item. For rarefaction curves of functional annotations or ECs, the FASTA file annotations were first converted into a tabular format; an abundance table that gives read hit numbers for each function (or EC) was then written using Unix commands, and finally processed using an R script. For Venn diagrams, a contingency table scoring the presence/absence of a functional annotation (or EC) for each read and each annotation pipeline was written, and their co-occurrence scored following Cagnard (2010). To deduce a taxonomic structure of the soil communities from reads similar to mRNA coding for known proteins, read similarities to TrEMBL entries were first retrieved with a BLASTx search; assignments to taxonomic groups were then performed with MEGAN (Huson et al., 2007). The subsets of reads coding for all groups of ribosomal proteins were retrieved with MG-RAST, distributed between an LSU and an SSU FASTA file, and processed as above. For each gene or functional category, a pairwise comparison of read hit distribution to all other reads (Tables 2 and 3) or all other MG-RAST annotated reads (Figure 3 and Table S1) was performed using a Monte Carlo test with 10000 samplings ( chisq.test function in R using sim=T and B=10000 ). The null hypothesis is that the probability of annotating a given function is the same in the two datasets. To calculate the maximum number of false positives in a hit list, p-values were first computed according to Audic and Claverie (1997) and shown to be very close to those provided by the Monte Carlo test (data not shown).
Second, only entries totalling at least 6 hits were considered (n = 89 for Table 3), because the statistical power is very limited on entries with fewer hits (e.g. unbalanced hit scores of 4:0 and 5:1 return a non-significant p-value, 0.06). Finally, the maximal number of false positives was estimated as n × p, where p is the highest p-value considered (Audic and Claverie, 1997). Sequence depth estimates were computed assuming that the true ratio of expression of a gene X in two conditions (RATIO) is known, and defining PSUCCES as the probability of annotation of a read as gene X product in the first sample. Assuming that an equal number of reads, N, has been submitted for annotation in samples A and B, the numbers of X-annotated reads in samples A and B, Xa and Xb, are random variables following the binomial distribution, whose parameters (n, p) are respectively (N, PSUCCES) and (N, PSUCCES*RATIO). An R script making use of the bsamsize function of the Hmisc package then estimates the sample size N needed to achieve a

95% power of a two-sided test for the difference in two proportions, for several values of PSUCCES and of RATIO 1.
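The sequencing depth estimate described above can be sketched in Python as follows. This is not the R script used in this work: it applies the standard normal-approximation formula for comparing two proportions with equal group sizes and no continuity correction, so its output may differ slightly from that of bsamsize. The function name `reads_needed` and its argument names are illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

def reads_needed(p_success, ratio, alpha=0.05, power=0.95):
    """Approximate number of reads N per sample needed to detect p1 != p2.

    p_success : probability that a read is annotated as gene X in sample A
    ratio     : assumed true expression ratio, so that p2 = p_success * ratio
    alpha     : two-sided significance level
    power     : desired probability of detecting the difference
    """
    p1 = p_success
    p2 = p_success * ratio
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("need two distinct proportions strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    # Normal approximation for two binomial proportions, equal N per sample
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)
```

Calling `reads_needed(1e-4, 2.0)` gives a depth of a few hundred thousand reads per sample, which illustrates how quickly the required N grows as PSUCCES decreases.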

3. Results

cDNA was synthesized from poly-A RNA purified from ESM and LSM soils and submitted to pyrosequencing. A total of 92078 cDNA reads for LSM and 66779 for ESM were obtained. The size distribution of raw sequence reads was bimodal, with a first peak around 60 nt and a second one around 500 nt (Figure 1A and B). Primers and terminal homopolymeric stretches were trimmed (see Methods), and trimmed reads < 40 nt were set apart as non-informative. Forty-four percent of these reads contained a 5' poly(T) stretch, versus 7% among longer reads, indicating that the presence of a 5' poly(T) interfered negatively with the pyrosequencing process. Reads longer than 40 nt after trimming (58978 for LSM and 43472 for ESM) were further filtered to set aside ribosomal RNA sequences (see Methods). Reads showing similarity to bacterial, eukaryotic or archaeal rRNA accounted for < 15% of the informative reads and were set apart (Figure 1A and B). Reads without similarity to rRNA were considered as putative mRNA reads and amounted to 50035 (LSM) and 39064 (ESM), with similar average lengths (Figure 1A).

Fig. 1. Putative mRNA and rRNA reads and their assembly into contigs. (A and B) Size distribution of cDNA reads before trimming for LSM (A) and ESM (B). Orange, red and blue bars denote non-informative, r-RNA, and putative mRNA reads, respectively. Trimmed reads > 40 nt in length were classed as r-RNA- or putative mRNA reads (see Methods). (C), Rarefaction of contigs upon assembly of putative mRNA reads (red symbols) or r-RNA reads (blue symbols) by CAP3. Filled triangles: ESM, open circles: LSM. Dotted lines denote average ± standard deviation of 100 random samplings.

In order to get an insight into the complexity of the RNA datasets, overlapping reads were assembled into contigs, and rarefaction curves were produced (Figure 1C). For putative mRNAs, a large majority of reads remained unassembled, with 57% and 67% singletons for LSM and ESM, respectively. Contig length was only moderately increased over the initial read size (Figure 1), and the median contig content was 2 reads. Visual inspection of the contigs revealed an extensive overlap of reads in most contigs, indicating that some mRNA regions are better represented than others. In line with a decreased sequence complexity, a higher level of read assembly was achieved with rRNA datasets (Figure 1C). In order to get an insight into the protein coding capacity of the datasets, unassembled reads were aligned against the TrEMBL database, providing a comparison to the largest non-redundant repository of protein sequences. A large majority of putative mRNA reads (87%) were annotated as similar to a known protein coding sequence. Unassigned reads in LSM and ESM had a much smaller mean length (~170 nt), suggesting that they also arose by reverse transcription of a coding mRNA, but were too short for a protein similarity to be detected. A global functional identification of mRNA reads was further performed using three annotation systems (Figure 2A, see Methods).

Fig. 2. Protein annotation by MG-RAST, Blast2GO & SwissProt. (A) Venn diagram showing the number of reads annotated by the 3 pipelines for the 50035 LSM putative mRNA reads. (B) Rarefaction curve of proteins identified upon virtual translation of the putative mRNA datasets. Circles denote Blast2GO, diamonds SwissProt, and triangles MG-RAST annotations. Open symbols are for LSM, and filled symbols for ESM. Only the average of 100 random samplings is shown; the standard deviations were always smaller than symbol heights. (C) Venn diagram of the protein annotations including an enzyme code for the LSM dataset. (D) Same as B, but for proteins associated with an EC code.
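The rarefaction procedure behind these curves (average and standard deviation of 100 random samplings at each depth) can be sketched in Python. The curves in this work were produced with an R script from an abundance table; the function below is only an illustrative equivalent, and the name `rarefaction` and its arguments are assumptions.

```python
import random
from statistics import mean, stdev

def rarefaction(labels, depths, n_draws=100, seed=0):
    """Return (depth, mean, sd) of the number of distinct labels per subsample.

    labels  : list with one annotation (function, EC or contig id) per read
    depths  : subsample sizes, each no larger than len(labels)
    n_draws : number of random samplings per depth (100 in the figures)
    """
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        # Draw reads without replacement and count distinct annotations
        counts = [len(set(rng.sample(labels, depth))) for _ in range(n_draws)]
        curve.append((depth, mean(counts), stdev(counts)))
    return curve
```

A curve that keeps rising almost linearly with depth indicates that many annotations remain unsampled, whereas a curve that bends towards a plateau indicates that the most abundant categories have been captured.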

First, the MG-RAST server annotated 16-17% of the reads as similar to a SEED protein database entry. Second, we aligned mRNA reads to Swiss-Prot, which resulted in 26% of annotated reads. Finally, the Blast2GO server annotated 36% of the reads. Figure 2A shows that most reads annotated in MG-RAST were also retrieved by SwissProt (6627 + 167, i.e. 85% of overlap), and that most reads annotated by SwissProt were also annotated by Blast2GO (6626 + 5680, i.e. 96% of overlap); similar trends were noted in ESM (not shown). Thus, the three pipelines annotate a common core of reads. The numbers of proteins with known function identified were close together for SwissProt and Blast2GO, but much smaller for MG-RAST (7861, 7875, and 1753 for LSM, respectively; Figure 2B). The TrEMBL BLAST E-value, a negative indicator of annotation quality, had a median of 6 × 10⁻³⁰ for the complete dataset, and was reduced to similar values for the MG-RAST, SwissProt, and Blast2GO subsets (4 × 10⁻³⁹, 1 × 10⁻³⁹, and 2 × 10⁻³⁸, respectively). Among reads annotated by at least one system, about half (46%) had an associated enzyme code (EC number; Figure 2C). In contrast with the overall protein annotation, however, most EC numbers were given by one system only (Figure 2A and 2C). For 71% of the 1362 reads annotated by the 3 systems, the EC annotations were consensual. The proportion of proteins associated with an EC (i.e., of enzymes) was much lower using either SwissProt (8%) or Blast2GO (12%) than using MG-RAST (36%). As a result, the numbers of enzymes identified by the three pipelines were much closer together than the numbers of protein annotations (Figure 2C and 2D). Blast2GO retrieved the most enzyme codes, followed by SwissProt and MG-RAST (898, 720 and 616, respectively, for LSM). Nearly as many EC codes (878, 678, and 621) were identified from the smaller ESM dataset.
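The rule used to decide whether EC annotations from different pipelines agree (two EC strings conflict only when they share no EC number) can be written as a short Python sketch. The separators accepted in an EC string and the function names are assumptions made for illustration, as is the choice to call a read consensual when no pair of its annotations conflicts.

```python
def ec_items(ec_string):
    """Split an EC string such as '3.2.1.21; 3.2.1.3' into a set of EC numbers."""
    for sep in (";", ",", "|"):          # assumed separators
        ec_string = ec_string.replace(sep, " ")
    return set(ec_string.split())

def conflicting(ec_a, ec_b):
    """True when two non-empty EC strings have no EC number in common."""
    return ec_items(ec_a).isdisjoint(ec_items(ec_b))

def consensual(ec_strings):
    """True when no pair of the available (non-empty) EC strings conflicts."""
    present = [s for s in ec_strings if s and s.strip()]
    return all(not conflicting(a, b)
               for i, a in enumerate(present)
               for b in present[i + 1:])
```

For example, `consensual(["1.1.1.37", "1.1.1.37, 4.2.1.17", "1.1.1.37"])` is true under this rule, because every pair of strings shares EC 1.1.1.37.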
In order to estimate the contribution of different eukaryotic taxa to soil gene expression, we first studied the taxonomic origins of r-RNA reads. Most rRNA reads were of bacterial origin (> 75%), and fungi predominated among eukaryotes. However, whereas the large subunit (LSU) rRNA and the small subunit (SSU) rRNA are equally represented in a cell, we noted that r-RNA reads were not randomly distributed between the two subunit RNAs, with for instance a two-fold excess of LSU over SSU for Acidobacteria, and a 20-fold excess for Proteobacteria (data not shown). It is thus likely that oligo(dT) priming introduced sampling biases on ribosomal RNA templates, which are not polyadenylated. We then investigated the taxonomic origin of the putative mRNAs, which were blasted against the TrEMBL database and further analyzed using MEGAN (Table 1). This strategy failed to assign > 61% of sequences. However, among the sequences assigned to cellular organisms, eukaryotes contributed more than 80%, while bacteria represented less than 14%; Archaea were less abundant than

Bacteria by nearly two orders of magnitude. Among eukaryotes, Fungi, Viridiplantae and Metazoa were the major phyla.

Table 1. Taxonomic structure of the soil communities deduced from putative mRNA reads.

Since conserved phylogenetically informative genes represent only a small fraction of total metagenomic data sets (Kunin et al., 2008), we have analysed the subsets of reads coding for ribosomal proteins (Table 1). The large subunit (LSU) and small subunit (SSU) protein subsets are expected to provide similar patterns, as ribosomal proteins are expressed together. With respect to the complete protein coding dataset, the contribution of Bacteria in the ribosomal subset diminished strongly, with a corresponding increase in the contribution of eukaryotes (Table 1). Fungi remained the most abundant kingdom, but we noticed an increase in the lower eukaryotic nodes, and a decrease for Viridiplantae. 2 Taken together, these data indicate that the mRNA dataset is enriched in eukaryotic sequences, with Fungi as the most represented eukaryotic group.

Distribution of biochemical functions

To study the distribution of functions in ESM and LSM metatranscriptomes, we used the MG-RAST pipeline, which classes assigned reads into nested subsystems that group together functionally related proteins. The distribution of assigned reads into the 26 top subsystems is

2 The apparent frequency of Metazoa increased with the LSU but decreased with the SSU subset, which is awkward to comment on.

shown in Figure 3. Two functional groups, Protein metabolism and Carbohydrates, were above 10% of assignments. In addition, hits in these two subsystems were more frequent in LSM than in ESM (p < 0.05 in a Monte Carlo test; Figure 3 and Methods). The reverse trend was observed and statistically significant (p < 0.05) for 6 minor subsystems: Virulence, Cofactors and Vitamins, Cell Wall and Capsule, Membrane Transport, Nucleosides and Nucleotides, and RNA Metabolism (Figure 3). These unbalanced subsystems were explored further, and their components also exhibiting significant differences are listed in Table S1. Most remarkably, there were consistent increases in the metabolism of several vitamins in ESM, and metal ion transporters accounted for part of the overall increase in Membrane Transport. Interestingly, despite the lack of overall variation for Secondary Metabolism, the most expressed phenylpropanoid biosynthetic pathways showed contrasting patterns (Table S1).
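The Monte Carlo test used for these comparisons can be sketched in Python for a single functional category. The analysis in this work relied on the chisq.test function in R; the code below is an illustrative re-implementation under the assumption that simulated 2 × 2 tables keep the observed margins fixed and that the p-value is computed as (1 + number of simulated statistics at least as large as the observed one) / (B + 1). Function and argument names are hypothetical.

```python
import random

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(((a, b), (c, d))):
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / n
            stat += (obs - exp) ** 2 / exp
    return stat

def monte_carlo_pvalue(hits_a, hits_b, reads_a, reads_b, n_sim=10000, seed=0):
    """Simulated p-value for 'the hit probability is the same in A and B'."""
    k, n = hits_a + hits_b, reads_a + reads_b
    if k == 0 or k == n:
        raise ValueError("the category must have some, but not all, of the reads")
    rng = random.Random(seed)
    observed = chi2_2x2(hits_a, hits_b, reads_a - hits_a, reads_b - hits_b)
    extreme = 0
    for _ in range(n_sim):
        # Place the k hits at random among all n reads (margins stay fixed);
        # read indices below reads_a belong to sample A.
        sim_a = sum(1 for i in rng.sample(range(n), k) if i < reads_a)
        sim_b = k - sim_a
        sim = chi2_2x2(sim_a, sim_b, reads_a - sim_a, reads_b - sim_b)
        if sim >= observed * (1 - 1e-9):   # small tolerance for ties
            extreme += 1
    return (extreme + 1) / (n_sim + 1)
```

With very few hits (for example 3:0 in two samples of equal size), such a test cannot return a small p-value, which is why low-count entries carry little statistical power.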

Fig. 3. Functional classification of putative mRNA reads from LSM and ESM. Datasets were annotated by the MG-RAST server (http://metagenomics.nmpdr.org ), and the fraction of annotated reads assigned to each of the 26 top level SEED subsystems was plotted. Asterisks and triangles denote annotation frequencies significantly more represented in LSM and ESM, respectively (p < 0.05, see Methods); their components are analyzed in Table S1.

Differences in hit counts between LSM and ESM were sometimes observed for lower rank SEED subsystems. We investigated the phenylpropanoid pathway because of its pivotal role in the production of soil recalcitrant organic matter. Despite the apparent lack of overall variation of the secondary metabolism, hit counts in lower subsystems like the synthesis of apigenin derivatives and of phytosterol were significantly higher in ESM, while the opposite

was true for flavanone biosynthesis (Table S1). The major differences between LSM and ESM were found in protein metabolism and carbohydrate use, and are detailed below.

Protein metabolism. Within the Protein Metabolism subsystem, MG-RAST retrieved differences in the protein biosynthesis group, and the major difference was related to ribosomal proteins. A significantly larger fraction of reads was assigned to proteins from the large ribosomal subunit (LSU) in LSM than in ESM (Table 2). To confirm this observation, we retrieved the reads coding for ribosomal proteins using Blast2GO, which defines largely overlapping but not identical sets (Table 2). The Blast2GO annotation confirmed the increase in LSM for ribosomal proteins, which was observed for both subunits (LSU and SSU). These data suggest a more active translation system in the LSM soil. We further explored protein turnover by analysing reads involved in protein degradation. Since the SEED annotation system failed to retrieve several functions associated with this pathway (i.e. in the ubiquitin pathway), we worked with Blast2GO. This did not reveal differences for either proteasome or polyubiquitin, but ubiquitin-protein ligase assignments were clearly more abundant in LSM than in ESM. Because both ribosome biogenesis and protein ubiquitinylation are up-regulated, we propose that protein turnover in LSM is higher than in ESM.

Table 2. Reads associated with ribosomal proteins or the ubiquitin degradation system

Carbohydrate metabolism. Investigating the SEED Carbohydrate subsystem revealed a total of 327 known proteins encoded in LSM or ESM and representing a total of 1600 reads. The median hit number was 3; as a consequence, the statistical power is null for the less expressed half of the identified proteins (i.e. the most unbalanced distribution, 3:0, does not produce a significant p-value). On the other hand, eighty-nine proteins had at least 6 hits each, and represented most (57%) of the reads in this subsystem. Fifteen out of these 89 proteins showed a significant difference in read frequency between LSM and ESM (p < 0.05 in a Monte Carlo test; Table 3). In this outlier list of 15, the number of false positive entries was estimated to be at most 3 (see Methods). Much like for the overall annotated sequences, the corresponding reads were largely eukaryotic in origin: bacterial hits accounted for 24% and 10% of them for LSM and ESM, respectively.

Table 3. Proteins acting in carbohydrate metabolism and displaying asymmetric distribution in the LSM and ESM soils.

The notable exception was isobutyryl-CoA mutase, for which all hits cover a single sequence of bacterial origin, and which may represent an amplification event without biological meaning. For 11 proteins, read frequencies were higher in LSM (Table 3); altogether, they accounted for 41% of the read gap between LSM and ESM for the whole Carbohydrate subsystem 3. Conversely, 4 proteins were significantly overrepresented in ESM, thus opposing

3 mbsStat_carboH .xls, Carbohydrates_chi2sim.xls

to the trend observed for the whole Carbohydrate subsystem (Table 3). Eleven of the 15 proteins with unbalanced expression have an associated EC code, which allowed 10 of them to be placed on a metabolic pathway (Figure 4). This revealed a striking asymmetry between the two soil habitats in the nature of the enzymes supplying glucose to the cell, as well as in glucose utilization. On one hand, β-glucosidase, an enzyme degrading structural plant cell wall compounds, was more abundant in ESM; this was also the case for phosphoglucomutase, which can channel glucose away from glycolysis. On the other hand, amylases (glucan 1,4-α-glucosidase) degrading storage polysaccharides were more abundant in LSM. Also, several enzymes of glycolysis and of the Krebs cycle were more abundant in LSM (Figure 4 and Table 3). The higher abundance of phosphoglucomutase in ESM could be part of a channelling of glucose to storage or stress carbohydrates like trehalose, as trehalose synthase also tended to increase in ESM.

Fig. 4. Carbohydrate enzymes showing differential expression map to divergent pathways. The role of enzymes listed in Table 3 was investigated and is reported on this metabolic map. Dark and light grey letters denote overexpression in LSM and ESM, respectively. Note the asymmetry in glucose feeding and use between the two datasets.

In order to confirm the data on the 12 enzymatic activities displaying asymmetric hit distribution, we fetched the reads annotated with Blast2GO and SwissProt and sharing the corresponding EC numbers (Table 3). Because enzyme sequences can share the same EC number, yet belong to distinct SEED subsystems, we also reinvestigated the MG-RAST annotations the same way (Table 3). The differences in hit counts between LSM and ESM were in most instances of the same sign as in the Carbohydrate subsystem. When looking at the whole MG-RAST output, 10 enzymes showed significant differences. For Blast2GO and SwissProt, this number was reduced to three. The recurrent differences concerned the Krebs cycle. Our data thus support the hypothesis that channelling of glucose towards ATP synthesis is higher in LSM than in ESM. Taken together, these results suggest two different life styles of eukaryotes from either habitat: ESM biota thrives on complex extracellular resources to cope with stress, while LSM organisms mainly use glucose for ATP synthesis.

Power test analysis

The previous sections indicate that, although they are quite similar, ESM and LSM soil metatranscriptomes display relevant differences in glucose metabolism and possibly in other pathways. The question arises of the relation between the depth of the sequencing effort and the completeness of the resulting catalogue of genes (or gene classes) differentially expressed in conditions A and B. Detecting these differences depends on the absolute level of the given gene expression in A, the A/B expression ratio, and the sequencing depth (Figure 5). With around 40,000 reads, we have detected most of the expression differences between LSM and ESM whose annotations represent > 1% of all reads in soil A and < 0.8% in soil B. For an annotation present in 0.1% of the A reads, differences have likely been detected if its frequency is < 0.05% in B.
Here, the 3 top subsystems each gathered > 1% of the putative mRNA reads, and two of them showed a differential expression. All other subsystems (except photosynthesis) were above 0.1%, which suggests that when no variation is detected, the true expression ratio is less than 2-fold. For fA = 10⁻⁴, about 5 × 10⁵ reads are required to safely detect a 2-fold difference (Figure 5); this is a frequency reached by the 350 most expressed genes, many of which code for ribosomal proteins or enzymes of central metabolic pathways; they gathered > 35% of annotated reads. For fA = 10⁻⁵, more than 2 × 10⁶ reads are needed to detect a 2.5-fold difference. With about 50000 reads, the median number of reads per annotation was 1 for Blast2GO and 2 for MG-RAST (data not shown). This implies that at least half of the genetic functions, taken individually, cannot be shown to be differentially expressed unless millions of reads are available; it also emphasizes the need for pertinent gene classes in data mining.

Discussion

Large-scale gene expression study of environmental samples by metatranscriptomics, or environmental transcriptomics, is an exciting field of research in ecology that imposes no a priori choice of the ecologically relevant genes or gene groups on which to focus. This new field has been opened by the advent of next-generation DNA sequencing techniques, which keep on improving. We identified a protein coding potential for a much larger fraction of the putative mRNA reads than in a previous metatranscriptomic study (32%; Urich et al., 2008), and this is likely due to the increase in read length from the GS20 (120 nt) to the GS-Flx pyrosequencing technology. Since rRNA is present in overwhelming amounts over mRNA in every cell, getting access to the spectrum of biochemical functions encoded by soil RNA requires the enrichment of mRNA before sequencing, and/or its a posteriori filtering from very large datasets (Urich et al., 2008). The poly(A) RNA approach, used here for eukaryotic mRNA enrichment in a protocol that places the DNA polymerase used for emulsion PCR in contact with poly(A)- or poly(T)-containing templates, generated a class of unreadable short sequences (Figure 1A), but recent cDNA synthesis and sequencing protocols make it possible to avoid this problem (Sun et al.) (Meyer et al., 2009; Vega-Arreguin et al., 2009). A large majority of r-RNA reads in the two alpine soils of this study were of prokaryotic origin, as previously found in an agricultural soil (Urich et al., 2008), suggesting that bacteria make up most of the soil biomass. Nevertheless, ~85% of putative mRNA reads were assigned to eukaryote taxa (Table 1), which is similar to what is described in another study focusing on a marine plankton metatranscriptome and using a similar approach (John et al., 2009).
The residual bacterial and archaeal assignments may reflect i) an incomplete selection of polyadenylated RNA, ii) the existence of a small fraction of polyadenylated bacterial or archaeal RNA that is targeted for degradation (Portnoy et al., 2005; Slomovic et al., 2008), or iii) low-quality annotation. The nearly linear rarefaction curves produced upon read assembly indicate a very partial sampling within cDNA populations of very high complexity (Figure 1C). Rarefaction curves of biochemical functions and of enzymes did not reach a plateau (Figures 2B and 2D), but deviated sufficiently from linearity to allow a quantitative comparison of the most expressed genes or gene classes. In the absence of a reference genome, data mining from soil metatranscriptomes is likely to be safer without read assembly, because chimeric, artifactual proteins are likely to be predicted from overlapping reads covering orthologous mRNAs from different species, or from paralogous mRNAs. These wrong products will be difficult to

dissociate from true fusion proteins, which include abundantly expressed proteins like polyubiquitin and the ribosomal genes for L40 or S27A (Karbstein). Metatranscriptomic analysis entirely depends on the efficient extraction of bona fide information from nucleic acid databases. We illustrate here that the two main annotation pipelines, i.e. MG-RAST and Blast2GO, annotated cDNA reads in noticeably different ways. The number of annotated reads was about half as large for MG-RAST as for Blast2GO (Figure 2A); nonetheless, some proteins were identified more frequently by MG-RAST (e.g. enoyl-CoA hydratase, malate dehydrogenase; Table 3). MG-RAST identified about four times fewer proteins with a known function than Blast2GO (Figure 2B), but the two pipelines identified nearly as many enzymes (Figure 2D). These marked differences likely reflect a smaller repertoire of proteins, more oriented towards prokaryote genomes, in the SEED database 4. Because the annotations of a single read by MG-RAST, Blast2GO and SwissProt sometimes differed, the sets of reads annotated by the pipelines as belonging to a given gene or gene group are not fully overlapping and can differ quantitatively. For instance, despite the strong conservation of ribosome or proteasome proteins, the corresponding numbers of hits differed among annotation tools (Table 2). As a consequence, differences highlighted with one annotation system are more robust when another confirms them. Another reason to use both annotation systems in parallel is that important gaps remain in the gene function groupings, such as the ubiquitin pathway, absent in MG-RAST. Community structure analysis was based on reads coding for proteins and led to the identification of fungi as the major eukaryotic phylum contributing to both datasets, followed by plants and metazoans (Fig. 1). The near absence of hits in the photosynthesis subsystems is in line with a root origin of the plant reads.
Attempts to get more precise insights into the taxonomic makeup of the dataset, however, led to the emergence of both expected and unexpected groups: as expected, Asco- and Basidiomycetes were the overwhelming fungal groups, and eudicotyledons and Poaceae dominated among Viridiplantae. In contrast, Metazoa were dominated by nematodes, but arthropods were less represented than Chordata, with surprising contributions of Cephalochordata and fishes. Assignments to Cnidaria, another typically aquatic group, were nearly as frequent as to Rotifers (data not shown). Symmetrically, a minority of the sea plankton cDNA clones sequenced by John et al. corresponded to organisms very unlikely to be present,

4 Caution is needed when using EC numbers directly: conflicts among the 3 pipelines in EC annotations are frequent (29%), both at the global scale and at a small scale (carbohydrates; data not shown).

like fungi, plants or metazoa (John et al., 2009). These egregious assignments might be the result of an uneven density of sequenced genomes across the tree of life. Among bacterial phyla, imbalances in taxonomic assignments between rRNA and protein datasets have been pointed out (Urich et al., 2008), and the same may apply to Eukaryota. Another explanation of such a feature may come from the ribosomal protein subsets. These ubiquitous proteins are conserved across the tree of life (e.g. Liao and Dennis, 1994), which limits their resolution power for taxonomy, but are expressed strongly and in a coordinated manner (Bremer and Dennis, 1996). Therefore, data obtained with either ribosomal subunit should give both independent and convergent indications. The proportion of annotations of prokaryotic origin was notably lower in the r-protein subsets (< 0.05) than in the whole protein set (> 0.12; Table 1). This is a paradoxical situation, because every growing cell needs to express r-proteins in a hardly compressible way: in Escherichia coli, the fraction of r-protein fluctuates, but stays over 0.09 (Bremer and Dennis, 1996). 5 A possible explanation is that a wide and as yet unsuspected array of genes has been sequenced only in bacteria so far, and the counterparts in the eukaryotic soil microflora remain unidentified. High-elevation soils might contain large reservoirs of undocumented biodiversity, mainly due to their oligotrophic nature and dramatic freeze-thaw cycles, even in summer (Freeman et al., 2009). Soil metagenomic studies will be helpful to improve our knowledge of soil biodiversity. The rarefaction curves for contigs (Figure 1C), protein functions and enzyme names (Figure 2) all indicate a higher complexity for the ESM metatranscriptome. This raises the possibility that the expression of a larger gene set is required to cope with stress.
Transcripts of genes involved in the biosynthesis of vitamins and cofactors, for example, appear to be more expressed in ESM (Table S1). A likely explanation is that root exudates in LSM soil provide these molecules to soil microorganisms, which therefore down-regulate their synthesis. Transcriptional regulation in thiamin biosynthesis is documented for fungi (Maundrell, 1990) 6.

5 Ribosomal proteins are sometimes missed by automatic gene callers because of their small size (Kunin et al., 2008), but the same should apply to eukaryotic and prokaryotic r-proteins (sizes to be checked). I did not find ribosomal proteins in the quoted paper.

6 The increased biosynthesis of vitamin B12 is consistent with the increased expression of isobutyryl- CoA mutase, which uses it as a cofactor (Banerjee and Ragsdale, 2003)

Differences in expression in the two most expressed gene groups are reported in this study (Tables 2 and 3). We observed an increased hit frequency both in ribosomal proteins and in the ubiquitin pathway, which usually leads to protein degradation. Proteins to be hydrolyzed are covalently conjugated to several ubiquitin tags and addressed to the proteasome, where they are degraded while the ubiquitin is recycled; the specificity of this intracellular proteolysis is brought about by an array of enzymes with ubiquitin ligase activity, which collectively decide on the individual fate of nuclear and cytoplasmic proteins (Hershko, 2005). We therefore propose that both protein synthesis and protein degradation are faster cellular processes in LSM than in ESM. Within housekeeping biochemical pathways, like the production of energy from carbohydrates or protein metabolism, it is interesting to note that some segments are regulated, while others display no variation. For instance, we observed no difference for respiration (not shown), while differences were clear for the upstream components, glycolysis and the Krebs cycle (Figure 4 and Table 3). A plausible interpretation is that oxidative phosphorylation is mediated by proteins that saturate the mitochondrial inner membrane, and for which an increased expression would not lead to an increased activity. A similar pattern emerges in the ubiquitin pathway. The recurrent components, ubiquitin (produced from polyubiquitin genes) and the proteasome, displayed no obvious variation, while the regulatory components, i.e. ubiquitin protein ligases, were clearly more abundant in LSM. The most likely interpretation in both cases is that the primary targets of gene regulation are the bottleneck steps in the biological processes. Differences in gene expression have been noted for isolated genes or for small gene classes, e.g. in the phenylpropanoid pathway. In the absence of differences in broader or neighbouring gene groups, such observations are difficult to interpret. Because the statistical power of a metatranscriptomic study varies sharply with sample-sequencing depth (Figure 5), larger datasets will be needed for a few model habitats to resolve such issues. On the other hand, given the deep differences detected among the broadest gene classes between transcriptomes from strongly different habitats, such as soil and marine communities (Figure 6), a modest sequencing depth would be sufficient to detect major differences in a wide series of contrasted soils, and might allow soil biota function abundances to be correlated with habitat properties. The efficiency of such an approach will certainly be enhanced by the definition of functional gene groups of biogeochemical relevance. A biogeochemical interpretation of soil metagenomic data will need to make use of internal mRNA standards (Gifford et al.), and will need to cope with the extra difficulties linked to soil heterogeneity.
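The dependence of statistical power on sequencing depth noted above can be illustrated with a rough calculation. The sketch below uses a normal approximation to a two-sided two-proportion test; it is not necessarily the test behind Figure 5, and the function name `approx_power`, the hit frequencies and the depths are hypothetical values chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p1, p2, depth, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    p1, p2 : hypothetical true hit frequencies of one gene class
             in the two libraries being compared
    depth  : number of annotated reads per library (assumed equal)
    alpha  : significance level of the test
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between the two observed frequencies.
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / depth)
    shift = abs(p1 - p2) / se
    # Probability that the test statistic falls beyond either critical value.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Illustrative frequencies (1.0% vs 1.5%), not values from this study.
    for depth in (1_000, 10_000, 100_000):
        print(depth, round(approx_power(0.010, 0.015, depth), 3))
```

Under these assumptions, a small gene class needs a far deeper library than a broad one to reach the same power, which is the qualitative point made in the text.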

References:
Audic S, Claverie JM (1997) The significance of digital gene expression profiles. Genome Res 7: 986-995
Bailly J, Fraissinet-Tachet L, Verner MC, Debaud JC, Lemaire M, Wesolowski-Louvel M, Marmeisse R (2007) Soil eukaryotic functional diversity, a metatranscriptomic approach. Isme J 1: 632-642
Banerjee R, Ragsdale SW (2003) The many faces of vitamin B12: catalysis by cobalamin-dependent enzymes. Annu Rev Biochem 72: 209-247
Björk RG, Björkman MP, Andersson MX, Klemedtsson L (2008) Temporal variation in soil microbial communities in Alpine tundra. Soil Biology and Biochemistry 40: 266-268
Bremer H, Dennis PP (1996) Modulation of chemical composition and other parameters of the cell by growth rate. In Neidhardt, ed, Escherichia coli and Salmonella: Cellular and Molecular Biology, Ed 2. ASM Press, Washington DC, pp 1553-1569
Buée M, Reich M, Murat C, Morin E, Nilsson RH, Uroz S, Martin F (2009) 454 Pyrosequencing analyses of forest soils reveal an unexpectedly high fungal diversity. New Phytol 184: 449-456
Cagnard N (2010) eVenn: A powerful tool to compare lists and draw Venn diagrams. R package version 1.24.1
Conesa A, Gotz S (2008) Blast2GO: A comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics 2008: 619832
Courty P-E, Buée M, Diedhiou AG, Frey-Klett P, Le Tacon F, Rineau F, Turpault M-P, Uroz S, Garbaye J The role of ectomycorrhizal communities in forest ecosystem processes: New perspectives and emerging concepts. Soil Biology and Biochemistry 42: 679-698
de Boer W, Folman LB, Summerbell RC, Boddy L (2005) Living in a fungal world: impact of fungi on soil bacterial niche development. FEMS Microbiol Rev 29: 795-811
Dinsdale EA, Edwards RA, Hall D, Angly F, Breitbart M, Brulc JM, Furlan M, Desnues C, Haynes M, Li L, McDaniel L, Moran MA, Nelson KE, Nilsson C, Olson R, Paul J, Brito BR, Ruan Y, Swan BK, Stevens R, Valentine DL, Thurber RV, Wegley L, White BA, Rohwer F (2008) Functional metagenomic profiling of nine biomes. Nature 452: 629-632
Djukic I, Zehetner F, Mentler A, Gerzabek MH Microbial community composition and activity in different Alpine vegetation zones. Soil Biology and Biochemistry 42: 155-161
Edwards AC, Scalenghe R, Freppaz M (2007) Changes in the seasonal snow cover of alpine regions and its effect on soil processes: A review. Quaternary International 162-163: 172-181
Freeman KR, Martin AP, Karki D, Lynch RC, Mitter MS, Meyer AF, Longcore JE, Simmons DR, Schmidt SK (2009) Evidence that chytrids dominate fungal communities in high-elevation soils. Proc Natl Acad Sci U S A 106: 18315-18320
Gifford SM, Sharma S, Rinta-Kanto JM, Moran MA Quantitative analysis of a deeply sequenced marine microbial metatranscriptome. Isme J 5: 461-472
Gilbert JA, Field D, Huang Y, Edwards R, Li W, Gilna P, Joint I (2008) Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS One 3: e3042
Grant S, Grant WD, Cowan DA, Jones BE, Ma Y, Ventosa A, Heaphy S (2006) Identification of eukaryotic open reading frames in metagenomic cDNA libraries made from environmental samples. Appl Environ Microbiol 72: 135-143
Hershko A (2005) The ubiquitin system for protein degradation and some of its roles in the control of the cell division cycle. Cell Death Differ 12: 1191-1197
Hirsch PR, Mauchline TH, Clark IM (2010) Culture-independent molecular techniques for soil microbial ecology. Soil Biology and Biochemistry 42: 878-887
Huang X, Madan A (1999) CAP3: A DNA sequence assembly program. Genome Res 9: 868-877
Huson DH, Auch AF, Qi J, Schuster SC (2007) MEGAN analysis of metagenomic data. Genome Res 17: 377-386
Huson DH, Richter DC, Mitra S, Auch AF, Schuster SC (2009) Methods for comparative metagenomics. BMC Bioinformatics 10 Suppl 1: S12
Sørensen J, Nicolaisen MH, Ron E, Simonet P (2009) Molecular tools in rhizosphere microbiology - from single-cell to whole-community analysis. Plant and Soil 321: 483-512
John D, Zielinski B, Paul J (2009) Creation of a pilot metatranscriptome library from eukaryotic plankton of a eutrophic bay (Tampa Bay, Florida). Limnol Oceanogr Methods 7: 249-259
Karbstein K Chaperoning ribosome assembly.
Kunin V, Copeland A, Lapidus A, Mavromatis K, Hugenholtz P (2008) A bioinformatician's guide to metagenomics. Microbiol Mol Biol Rev 72: 557-578
Lentendu G, Zinger L, Manel S, Coissac E, Choler P, Geremia R, Melodelima C (2011) Assessment of soil fungal diversity in different alpine tundra habitats by means of pyrosequencing. Fungal Diversity: 1-11
Liao D, Dennis PP (1994) Molecular phylogenies based on ribosomal protein L11, L1, L10, and L12 sequences. J Mol Evol 38: 405-419
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437: 376-380
Maundrell K (1990) nmt1 of fission yeast. A highly transcribed gene completely repressed by thiamine. Journal of Biological Chemistry 265: 10857-10864
McGrath KC, Thomas-Hall SR, Cheng CT, Leo L, Alexa A, Schmidt S, Schenk PM (2008) Isolation and analysis of mRNA from environmental microbial communities. J Microbiol Methods 75: 172-176
Meyer E, Aglyamova GV, Wang S, Buchanan-Carter J, Abrego D, Colbourne JK, Willis BL, Matz MV (2009) Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx. BMC Genomics 10: 219
Morales SE, Holben WE Linking bacterial identities and ecosystem processes: can 'omic' analyses be more than the sum of their parts? FEMS Microbiol Ecol 75: 2-16
Portnoy V, Evguenieva-Hackenberg E, Klein F, Walter P, Lorentzen E, Klug G, Schuster G (2005) RNA polyadenylation in Archaea: not observed in Haloferax while the exosome polynucleotidylates RNA in Sulfolobus. EMBO Rep 6: 1188-1193
Roling WF, Ferrer M, Golyshin PN Systems approaches to microbial communities and their functioning. Curr Opin Biotechnol 21: 532-538
Shi Y, Tyson GW, Eppley JM, Delong EF (2011) Integrated metatranscriptomic and metagenomic analyses of stratified microbial assemblages in the open ocean. Isme J 5: 999-1013
Singh BK, Bardgett RD, Smith P, Reay DS Microorganisms and climate change: terrestrial feedbacks and mitigation options. Nat Rev Microbiol 8: 779-790
Slomovic S, Portnoy V, Schuster G (2008) Detection and characterization of polyadenylated RNA in Eukarya, Bacteria, Archaea, and organelles. Methods Enzymol 447: 501-520
Stewart FJ, Ottesen EA, DeLong EF (2010) Development and quantitative analyses of a universal rRNA-subtraction protocol for microbial metatranscriptomics. Isme J 4: 896-907
Sun C, Li Y, Wu Q, Luo H, Sun Y, Song J, Lui EM, Chen S (2010) De novo sequencing and analysis of the American ginseng root transcriptome using a GS FLX Titanium platform to discover putative genes involved in ginsenoside biosynthesis. BMC Genomics 11: 262
Tedersoo L, Nilsson RH, Abarenkov K, Jairus T, Sadam A, Saar I, Bahram M, Bechem E, Chuyong G, Koljalg U (2010) 454 Pyrosequencing and Sanger sequencing of tropical mycorrhizal fungi provide similar results but reveal substantial methodological biases. New Phytol 188: 291-301
Todaka N, Moriya S, Saita K, Hondo T, Kiuchi I, Takasu H, Ohkuma M, Piero C, Hayashizaki Y, Kudo T (2007) Environmental cDNA analysis of the genes involved in lignocellulose digestion in the symbiotic protist community of Reticulitermes speratus. FEMS Microbiol Ecol 59: 592-599
Urich T, Lanzen A, Qi J, Huson DH, Schleper C, Schuster SC (2008) Simultaneous assessment of soil microbial community structure and function through analysis of the meta-transcriptome. PLoS ONE 3: e2527
Vega-Arreguin JC, Ibarra-Laclette E, Jimenez-Moraila B, Martinez O, Vielle-Calzada JP, Herrera-Estrella L, Herrera-Estrella A (2009) Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing. BMC Genomics 10: 299
Vera JC, Wheat CW, Fescemyer HW, Frilander MJ, Crawford DL, Hanski I, Marden JH (2008) Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Mol Ecol 17: 1636-1647
Wallenstein MD, Weintraub MN (2008) Emerging tools for measuring and modeling the in situ activity of soil extracellular enzymes. Soil Biology and Biochemistry 40: 2098-2106
Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63
Warnecke F, Hess M (2009) A perspective: metatranscriptomics as a tool for the discovery of novel biocatalysts. J Biotechnol 142: 91-95
Zinger L, Lejon DP, Baptist F, Bouasria A, Aubert S, Geremia RA, Choler P Contrasting diversity patterns of crenarchaeal, bacterial and fungal soil communities in an alpine landscape. PLoS One 6: e19950
Zinger L, Shahnavaz B, Baptist F, Geremia RA, Choler P (2009) Microbial diversity in alpine tundra soils correlates with snow cover dynamics. Isme J 3: 850-859

Annex 2:

Relationships between plant status and bacterial and fungal communities associated with grass roots under laboratory conditions

Authors
Bouasria Abderrahim a,§, Mustafa Tarfa a,§, De Bello Francesco b, Zinger Lucie a, Lemperiere Guy c, Geremia Roberto A. a,*, and Choler Philippe a
a Laboratoire d’Ecologie Alpine, UMR CNRS-UJF 5553, Université de Grenoble 1, CNRS, [UdS], LECA, BP 53, 38041 Grenoble, France
b Institute of Botany, Czech Academy of Sciences, Dukelská 135, CZ-379 82 Třeboň, Czech Republic
c Laboratoire Caractérisation et Contrôle des Populations de Vecteurs, UR016 IRD, Parc Scientifique Agropolis, Bd de Lironde, 34394 Montpellier, France

§ To be considered as equal first authors * Corresponding author: e-mail: [email protected] , LECA, BP 53, Bât D Biologie, 38041 Grenoble cedex 9 France. Tel. + 33 (0) 476 511 413, Fax: + 33 (0) 476 514 279.

Abstract
Despite the importance of plant-soil feedbacks in terrestrial ecosystems, the effect of stress and disturbance on soil microbial communities has received little attention so far. We examined the changes in rhizosphere microbial diversity of four grass species subjected to a stress (drought) and/or a disturbance (defoliation). A full-factorial microcosm experiment, in which we manipulated plant species identity, water supply and defoliation, was conducted over two growing seasons. The diversity of bacterial and fungal communities was determined using Single-Strand Conformation Polymorphism (SSCP) molecular fingerprinting. Using SSCP profiles as a surrogate for microbial diversity, we used multivariate analysis of variance to test for the effect of experimental factors on microbial community changes. Around 50% of the variance was explained by the experimental factors; plant species identity contributed the most, explaining 22% of the total variance. Drought had a stronger effect on bacterial than on fungal beta diversity. The effect of defoliation was weak but tended to be stronger for fungal communities, which were, overall, the most responsive to plant species identity and to plant growth responses. Bacterial and fungal rhizospheric communities respond to direct and indirect, i.e. plant-mediated, effects of experimental treatments. Contrasting plant growth strategies in response to stress and disturbance are important drivers of rhizospheric microbial community dynamics.

Keywords Drought – herbivory – grass – microbial diversity – rhizosphere – environmental changes

Abbreviations SSCP: Single Strand Conformation Polymorphism, Fo: Festuca ovina, Kc: Koeleria cristata, Ac: Agrostis capillaris, Dc: Deschampsia cespitosa, SLA: Specific Leaf Area, C: control, D: drought, F: defoliation, DF: drought and defoliation, SOM: soil organic matter content, VSW: Volumetric Soil Water content

1. Introduction
Understanding how plants cope with stress (constraints on primary productivity) and disturbance (destruction of biomass) has provided the foundations of major plant functional classifications and a general framework to analyse spatial and temporal patterns of species composition change [1]. How a suite of traits at different levels of organization, from organs to population, underpins plant response to stress and disturbance, and how it affects community dynamics and functioning, are still open questions [2]. These long-standing issues in plant ecology need to be re-evaluated from a microbial perspective since plant-soil microbe interactions play a pivotal role in plant community dynamics and spatial diversity patterns [3-6]. Yet, experimental studies examining the response of both plants and microbes to stress and disturbance remain scarce [but see 7, 8].
Rhizospheric microbial communities may respond either directly or indirectly to the stress and disturbance factors plants cope with. An indirect effect is an effect mediated through plant growth and plant physiological changes in response to the environment. Such plant-mediated effects are increasingly regarded as of primary importance to understand ecosystem biogeochemistry [9] and the relationships across multi-trophic levels of diversity [10]. Recent studies have documented how changes in plant carbon allocation may be influential in defining the soil microbial community associated with roots. For example, herbivory (or defoliation) has an indirect impact on soil microbes or the soil mesofauna by modifying the plant’s root:shoot ratio and rhizodeposition flow [8, 11-13]. These studies have focused on changes in microbial biomass and in microbial activities, but did not directly analyze changes in microbial beta diversity associated with plant response to stress and disturbance. There is therefore a need for comparative studies examining the consequences of plant growth on the composition of root-associated microbial communities.
The advent of molecular techniques in microbial ecology has opened new avenues to characterize soil microbial diversity patterns [14]. Many descriptive studies have shed new light on the biogeography of microbes at a range of spatial and temporal scales [review in 15] and there is a burst of interest in understanding patterns of microbial beta diversity, i.e. the variation in microbial community structure among a set of sample units [16]. A growing body of evidence supports the role of plant species identity in shaping soil microbial communities [17-19]. However, estimating the contribution of plant species identity compared to that of other environmental drivers remains difficult with mensurative studies. Hypothesis-driven approaches should complement these studies to provide more robust tests of factors controlling

microbial diversity patterns and, more specifically, to disentangle the direct and indirect effects of environmental changes on soil microbial diversity.
Here, we report on an experimental study that examines changes in the beta diversity of rhizosphere microbial communities associated with grass roots. We carried out a greenhouse microcosm experiment in which we controlled for plant species identity, water supply (drought) and defoliation (disturbance) using a full-factorial design. Bacterial and fungal microbial communities were characterized using Single-Strand Conformation Polymorphism (SSCP) molecular profiling. We addressed the following three main questions: How much of the variation in molecular profiles among pots, a surrogate of beta diversity, is explainable by experimental factors? Do bacterial and fungal communities respond similarly to experimental factors? To what extent do soil microbial changes depend on the species-specificity of plant growth response to stress and disturbance?
2. Materials and methods
2.1 Sites and species sampling
In early spring 2006, we collected four dominant grass species in the mountain grasslands of the ‘Alpage de Darbounouse’ (44°58'7.82"N, 5°28'39.08"E) at 1300 m. The sampling area is located in the “Reserve Naturelle des Hauts Plateaux du Vercors”, a limestone plateau (17 000 ha) located in the French Alps. The selected grass species, Festuca ovina (Fo), Koeleria cristata (Kc), Agrostis capillaris (Ac) and Deschampsia cespitosa (Dc), represent conservative (Fo and Kc) and exploitative (Ac and Dc) strategies. The conservative strategy is characterized by a low relative growth rate and a low Specific Leaf Area (SLA). Fo and Kc dominate dry, infertile grasslands with shallow soils, mostly found on the top of small convexities. The exploitative strategy is characterized by a high relative growth rate and a high SLA. Ac and Dc dominate mesic, productive grasslands preferentially located in small depressions. All these grasslands have been grazed for centuries and the impact of grazing is generally higher in the mesic grasslands.
2.2 Experimental design
We randomly sampled soil cores, 30 x 30 cm wide and 40 cm deep, covered by large tussocks of each grass species. Tillers of each species were manually separated from large tussocks and transplanted into small pots to establish a source of plant material for the experimental studies. In order to ensure soil homogeneity, we worked with a reconstituted soil consisting of vermiculite, sand and a commercial, non-sterilized potting compost (Fertiligène®, http://www.fertiligene.com/) in the proportion 4/1/1. Pots were filled with this soil. The nitrogen availability in the soil mixture ranged from 10 to 12 mg of mineral N per kg

of soil [20]. The soil was mixed and sieved for homogenisation. Transplanted tillers in these pots constituted a first root-conditioning phase in the composite soil.
The greenhouse experiment was conducted in Grenoble (France). The first phase of the experiment was a soil conditioning phase: seven tillers of the same species were transplanted into pots of 20 cm diameter and 25 cm depth filled with the composite soil described above. We controlled for the initial fresh biomass of all tillers by cutting leaves above 3 cm and roots below 5 cm. This also eliminated the younger roots, which likely harbour the majority of mycorrhizae and microbial communities. Tiller roots were gently washed in sterile water before transplantation. A total of nearly 3000 tillers were transplanted. Two series of fifteen pots were left without plants.
Experimental treatments were applied during a second phase including the 2006 and 2007 growing seasons. We randomly assigned four treatments, i.e. control (C), drought (D), defoliation (F), and drought and defoliation (DF), to pots. Fifteen replicates per species and per treatment combination were available. Our completely randomized experimental design did not include blocks. Pots were moved on a weekly basis to account for possible heterogeneity in light resources or air temperature within the greenhouse. The defoliation treatment (F) consisted of cutting the aboveground biomass at a height of 7 cm after one month of growth. In the drought treatment (D), 400 ml of water was supplied every 15 days vs. every 2 days in the control (C). The drought treatment was applied to a subset of the unvegetated pots (hereafter Du) whereas the other subset was watered every 2 days (hereafter Cu). At the end of the experiment, aboveground and belowground plant biomass were harvested, dried at 85 °C for 48 hours and weighed.
2.3 Soil characteristics
Samples of 15 g of soil were dried at 105 °C and burned for 5 hours at 550 °C to determine soil organic matter content (SOM). The percentage of SOM was calculated according to: SOM (%) = 100 × (dry mass (105 °C) – dry mass (550 °C)) / dry mass (105 °C). Total carbon and nitrogen were estimated with 3-5 mg of dried (70 °C for 72 h) and 200 μm-sieved soil samples using a Flash EA 1112 elemental analyser (Thermo Fisher Scientific Inc., Italy). Soil water pH was measured after mixing 5 g of soil with 12.5 ml of distilled water [21]. Volumetric Soil Water content (VSW) was measured using TDR probes (Soilmoisture Equipment Corp., Santa Barbara, California, USA). Daily measurements were made during the first 15 days following watering and then on a weekly basis. Measurements were done in subsamples of 5 pots for each species as well as for unvegetated pots.
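The loss-on-ignition calculation of Section 2.3 can be written as a short function. This is a minimal sketch: the function name `som_percent` and the example masses are hypothetical, and the result is expressed as a percentage of the 105 °C dry mass, as in the formula above.

```python
def som_percent(dry_mass_105, dry_mass_550):
    """Soil organic matter (%) from loss on ignition.

    dry_mass_105 : sample mass after drying at 105 °C (g)
    dry_mass_550 : sample mass after burning at 550 °C (g)
    """
    if dry_mass_105 <= 0:
        raise ValueError("dry mass at 105 °C must be positive")
    if not 0 <= dry_mass_550 <= dry_mass_105:
        raise ValueError("mass at 550 °C must lie between 0 and the 105 °C mass")
    # Mass lost on ignition, relative to the oven-dry mass.
    return 100.0 * (dry_mass_105 - dry_mass_550) / dry_mass_105


if __name__ == "__main__":
    # Hypothetical masses, for illustration only.
    print(som_percent(15.0, 12.0))
```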

2.4 Microbial community structure
For each experimental condition, rhizosphere soil samples from three different pots were collected by gently shaking and rinsing the dense root system with sterile water to avoid the presence of any root material in soil samples. The obtained sludge was centrifuged and the soil pellet was immediately frozen at -20 °C until extraction. The diversity of bacterial and fungal communities was estimated using the Single Strand Conformation Polymorphism (SSCP) method [22]. For each soil sample, we extracted DNA from three aliquots of 0.25 g wet mass using the PowerSoil-htp™ 96 Well Soil DNA Isolation Kit (MO BIO Laboratories, Ozyme, St Quentin en Yvelines, France). DNA concentration was quantified using the NanoDrop ND-1000 (NanoDrop Technologies). The extracted DNA was verified by electrophoresis on a 1.5% agarose gel. The three DNA extracts per pot were pooled to get a composite sample.
The SSCP analysis followed Zinger et al. [22]. Briefly, this method separates amplified fragments of the microbial ssu gene (bacteria) or ITS1 region (fungi) according to their size and conformation; the fragments are then visualized as peaks whose intensity depends on fragment abundance. Fungal diversity was assessed by amplification of the ITS1 region (Internal Transcribed Spacer) using the primers ITS5 (5’-GGAGTAAAAGTCGTAACAAGG-3’) and ITS2 (5’-GCTGCGTTCTTCATCGATGC-3’). The ITS2 primer was labelled at the 5’ end with fluorescein phosphoramidite (FAM). For bacterial diversity, the V3 region of ssu was amplified using the primers W49 (5’-ACGGTCCAGACTCCTACGGG-3’) and W104 (5’-GTGCCAGCAGCCGCGGTAA-3’), which was labelled with 5’-fluorescein phosphoramidite (FAM). Three independent PCR reactions were performed. The PCR mixture (25 µl) contained 2.5 mM of MgCl2, 1X of AmpliTaq Gold™ buffer, 0.5 µM of each primer, 0.1 µM of dNTP, 1 U of AmpliTaq Gold polymerase (Applied Biosystems, Courtaboeuf, France), and 1 µl of each DNA sample (10 ng/µl). The PCR conditions were the same in both cases: 95 °C for 10 min; 30 cycles of 95 °C for 30 s, 56 °C for 15 s and 72 °C for 15 s; and a final extension at 72 °C for 7 min. PCR products were visualized on a 2% TBE agarose gel. A 1-µl aliquot of the PCR product was mixed with 10 µl of formamide Hi-Di (Applied Biosystems, Courtaboeuf, France) and 0.2 µl of the standard internal DNA molecular weight marker Genescan-400HD ROX (Applied Biosystems, Courtaboeuf, France). The SSCP was performed on an ABI Prism 3130 Genetic Analyzer (Applied Biosystems, Courtaboeuf, France), using a 36 cm capillary. The polymer consisted of 5% GenScan polymer, 10% glycerol and 3200 buffer (Applied Biosystems, Courtaboeuf, France); the running buffer consisted of 10% glycerol and 3200 buffer. The injection time and voltage were set to 22 s and 1 kV, respectively. Electrophoresis was performed at 32 °C for 25 min [22, 23].

2.5 Statistical analyses
SSCP profiles were normalized to control for variations of total fluorescence intensity among electrophoregrams. To estimate microbial beta diversity, distances between normalized SSCP profiles were calculated using the Kulczynski distance, a pairwise dissimilarity measure based on quantitative data [24]. Analyses performed with other distances (e.g. Bray-Curtis, Gower) did not change the main conclusions of the study (data not shown). The resulting dissimilarity matrix (Y) summarized the multivariate dispersion of the sample units. In order to disentangle the relative effect of each factor on microbial beta diversity, we chose a statistical method to partition the sums of squares of Y among experimental treatments [25]. We had three fixed experimental factors, namely plant species identity (A), drought (B) and defoliation (C). The generic form of the full-factorial model was thus: $Y = A + B + C + A \times B + A \times C + B \times C + A \times B \times C + \varepsilon$, where $\varepsilon$ is the error term. This multivariate analysis of variance was carried out using the function adonis of the package vegan [26, 27]. The method is similar to a redundancy analysis [28]. Sums of squares and resulting F-tests from permutations of the raw data were calculated to test for the significance of experimental factors on Y. In addition to variance partitioning, neighbour-joining distance trees based on dissimilarity matrices were constructed with the package ape [29] to visualize the dissimilarity patterns between microbial communities. Support for resulting groups was calculated from 1000 bootstrap replicates. All statistical analyses were done with the R software [30].
3. Results
3.1 Soil and plant responses to experimental treatments
Soil parameters of vegetated vs. unvegetated pots only differed in the drought treatment, where the presence of grasses significantly reduced soil pH (Fig. 1b and Supplementary Material 1) and soil water content (Fig. 1c and Supplementary Material 1). Soil pH increased under drought and defoliation for Ac (Fig. 1b) and tended to decrease under drought for Kc (effect only significant when comparing drought and drought x defoliation). Soil water depletion tended to increase for Ac and Dc under defoliation, with or without drought treatment (Fig. 1c). Species differed in their root growth response under drought: Ac (and to a lesser extent Dc) exhibited reduced root biomass compared to control whereas Kc (and to a lesser extent Fo) increased its root biomass (Fig. 1d). The defoliation treatment did not lead to a significant decrease in the root biomass investment regardless of the species considered (Fig. 1d).
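The profile normalization and Kulczynski distance described in Section 2.5 can be sketched in a few lines. This is an illustrative re-implementation in plain Python, not the R code used in the study; it follows the usual quantitative form of the Kulczynski dissimilarity, and the function names and the toy profiles are hypothetical.

```python
def normalize(profile):
    """Scale an SSCP profile so that its fluorescence intensities sum to 1."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile must have a positive total intensity")
    return [value / total for value in profile]


def kulczynski(x, y):
    """Kulczynski dissimilarity between two non-negative quantitative profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    shared = sum(min(a, b) for a, b in zip(x, y))
    return 1.0 - 0.5 * (shared / sum(x) + shared / sum(y))


def distance_matrix(profiles):
    """Pairwise Kulczynski distances between normalized profiles."""
    scaled = [normalize(p) for p in profiles]
    return [[kulczynski(a, b) for b in scaled] for a in scaled]


if __name__ == "__main__":
    # Toy profiles (peak intensities per migration position), for illustration.
    toy = [[2, 2, 0, 0], [1, 3, 0, 0], [0, 0, 5, 5]]
    for row in distance_matrix(toy):
        print([round(d, 3) for d in row])
```

With normalized profiles the two ratios in the formula are equal, so the distance reduces to one minus the shared fraction of total intensity.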

[Figure 1: bar plots in four panels, (a) total N content (%), (b) pH, (c) soil water content (%) and (d) root biomass (g), for the Control, Defoliation, Drought and Drought x Defoliation treatments. Legend: Agrostis capillaris, Deschampsia caespitosa, Festuca ovina, Koeleria cristata, No plant.]

Figure 1 . Barplot of soil characteristics (a-c) and root biomass (d) at the end of the experiment. Means and standard errors are shown for each experimental treatment. Values with the same letters are not significantly different according to Tukey's post hoc test at 5%. Lowercase letters compare species within treatment. Uppercase letters compare treatment within species.

3.2 Rhizosphere Microbial Diversity
At the end of the experiment, the bacterial (Fig. 2a) and fungal (Fig. 2b) communities of unvegetated pots (tfin(C) and tfin(D)) (i) strongly differed from those found in vegetated pots and (ii) were different from the microbial communities of the initial soil mixture (tini). This is indicative of a significant, plant-independent microbial turnover during the time course of the experiment. Neighbour-joining tree topology showed that this microbial turnover was particularly pronounced under drought treatment for Fungi (Fig. 2b).

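[Figure 2: neighbour-joining trees in two panels, (a) and (b), with unvegetated pots labelled tini, tfin(C) and tfin(D); scale bars 0.02 and 0.01. Legend: species (Agrostis capillaris, Deschampsia caespitosa, Festuca ovina, Koeleria cristata, No plant) and treatments (Control, Defoliation, Drought, Defoliation x Drought).]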

Figure 2. Neighbour-joining trees of soil bacterial (a) and fungal (b) SSCP profiles. Branches with a bootstrap value above 50% (based on 1000 resamplings) are in bold. Trees are based on Kulczynski’s distances. Soil microflora of unvegetated pots (x) are shown at the beginning (tini) and at the end (tfin) of the experiment for the control (C) and the drought treatment (D). See Table 1 for variance partitioning.

Variation partitioning of the entire dataset showed that around 50% of the SSCP profile diversity was explained by the different experimental factors (Table 1). Plant species identity was the most significant factor for both Bacteria and Fungi, with R2 = 0.22. Drought had a significant effect on bacterial beta diversity (R2 = 0.11), and a much less pronounced effect on fungal beta diversity (R2 = 0.04). By contrast, the effect of defoliation was weak. All combined effects including plant species identity were significant for Fungi whereas only the drought x plant species identity interaction was significant for Bacteria (Table 1). These results suggest that responses to experimental treatments tended to be more idiosyncratic (i.e. plant species related) for Fungi than for Bacteria.

Table 1. Relative contribution of species identity, drought and defoliation to microbial beta diversity. Multivariate analysis of variance was performed on the distances between SSCP profiles.

df: degrees of freedom; SS: Sum of Squares.
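The R2 values reported in Table 1 come from a multivariate analysis of variance performed on a distance matrix. As an illustration only, the following minimal Python sketch computes R2 for a single factor as the among-group sum of squares divided by the total sum of squares, both derived from pairwise distances in the manner of non-parametric MANOVA [25]. It is not the analysis code used in this study (which relied on R packages); the function name `permanova_r2` and the toy data are assumptions made for the example, and no permutation test is included.

```python
import numpy as np


def permanova_r2(dist, groups):
    """R2 = SS_among / SS_total for one factor, from a square distance matrix.

    Hypothetical helper for illustration: sums of squares are obtained from
    squared pairwise distances, so no raw coordinates are needed.
    """
    dist = np.asarray(dist, dtype=float)
    groups = np.asarray(groups)
    n = dist.shape[0]

    # Total sum of squares: squared distances over all pairs, divided by n.
    upper = np.triu_indices(n, k=1)
    ss_total = np.sum(dist[upper] ** 2) / n

    # Within-group sum of squares: same quantity computed inside each group.
    ss_within = 0.0
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        sub = dist[np.ix_(idx, idx)]
        sub_upper = np.triu_indices(len(idx), k=1)
        ss_within += np.sum(sub[sub_upper] ** 2) / len(idx)

    # Among-group sum of squares is what remains.
    return (ss_total - ss_within) / ss_total


if __name__ == "__main__":
    # Toy data: four samples on a line, two well-separated groups.
    coords = np.array([0.0, 1.0, 10.0, 11.0])
    toy_dist = np.abs(np.subtract.outer(coords, coords))
    print(round(permanova_r2(toy_dist, ["A", "A", "B", "B"]), 3))
```

With several crossed factors, as in Table 1, each term receives its own sum of squares and the unexplained remainder is the residual; the sketch covers only the single-factor case.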

[Figure 3: panels (a) to (d); scale bars 0.01. Legend: Agrostis capillaris, Deschampsia caespitosa, Festuca ovina, Koeleria cristata.]

Figure 3. Neighbour-joining trees of soil bacterial SSCP profiles in control (a), drought (b), defoliation (c) and drought x defoliation (d) treatments. Branches with bootstrap values above 50% are in bold. Trees are based on Kulczynski’s distances. See Table 2 for variance partitioning.

Given the significant species by drought and species by defoliation interactions (Table 1), we further examined the effects of species within each treatment for Bacteria (Fig. 3) and Fungi (Fig. 4), with the corresponding variation partitioning in Table 2. In control pots, the plant species identity effect was strong (Fig. 3a, 4a) and accounted for 50% of the variance for both Bacteria and Fungi (Table 2). In the drought treatment, the plant species identity effect remained high (R2 = 0.57 for Bacteria and R2 = 0.47 for Fungi). By contrast, under defoliation, this species effect was markedly diminished for Bacteria (R2 = 0.23) and, to a lesser extent, for Fungi (R2 = 0.34), whereas the drought x defoliation treatment lay in between.
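The trees in Figs. 2 to 4 are built from Kulczynski’s distances between SSCP profiles. The sketch below shows one common form of that dissimilarity in Python, assuming each profile is a vector of non-negative peak intensities with a non-zero total; the function names and the toy profiles are hypothetical and are not taken from the analysis pipeline of this study.

```python
import numpy as np


def kulczynski(x, y):
    """Kulczynski dissimilarity between two non-negative abundance profiles.

    d = 1 - 0.5 * (S / sum(x) + S / sum(y)), with S = sum of element-wise minima.
    Assumes both profiles have a non-zero total.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    shared = np.minimum(x, y).sum()
    return 1.0 - 0.5 * (shared / x.sum() + shared / y.sum())


def kulczynski_matrix(profiles):
    """Square, symmetric matrix of pairwise Kulczynski dissimilarities."""
    profiles = np.asarray(profiles, dtype=float)
    n = profiles.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = out[j, i] = kulczynski(profiles[i], profiles[j])
    return out


if __name__ == "__main__":
    # Three made-up profiles with two peaks each.
    toy_profiles = [[2, 2], [1, 3], [0, 4]]
    print(kulczynski_matrix(toy_profiles))
```

The resulting matrix is the kind of input a neighbour-joining routine or a distance-based analysis of variance would take; identical profiles give 0 and profiles with no shared peaks give 1.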

[Figure 4: panels (a) to (d); scale bars 0.02. Legend: Agrostis capillaris, Deschampsia caespitosa, Festuca ovina, Koeleria cristata.]

Figure 4. Neighbour-joining trees of soil fungal SSCP profiles in control (a), drought (b), defoliation (c) and drought x defoliation (d) treatments. Branches with bootstrap values above 50% are in bold. Trees are based on Kulczynski’s distances. See Table 2 for variance partitioning.

Table 2. Quantification of the effect of plant species on microbial beta diversity in each treatment.

df: degrees of freedom; SS: Sum of Squares.

To compare the effects of drought and defoliation treatments on microbial communities, we also performed variation partitioning within each species (Table 3; see also the corresponding neighbour-joining trees in Supplementary Material 2 and 3). The drought effect on bacterial and fungal SSCP profiles was only significant for Ac and Dc (Table 3). Defoliation did not influence bacterial beta diversity, but was always significant for fungal beta diversity, with an R2 ranging from 0.07 (Ac and Dc) to 0.16 (Kc).

Table 3. Quantification of the effect of stress and disturbance treatments on microbial beta diversity for each species.

df: degrees of freedom; SS: Sum of Squares.

4. Discussion

In this study, we addressed the effect of stress and disturbance on rhizosphere microbial communities through direct and indirect, i.e. plant-mediated, effects. We used a microcosm experiment to control for two key driving factors in temperate grasslands, i.e. water shortage and defoliation. Microcosm experiments are powerful approaches to identify drivers of microbial communities [31-33]. Around 50% of the microbial beta diversity was accounted for by plant species identity in control pots (Table 2), and plant species identity was the primary source of variation, with 22% of explained variance (Table 1). Such a feature might have been caused by microbes from the field remaining on the surface of washed roots despite our efforts to strongly reduce this native and likely specific biomass (see Material and Methods). The microbial communities of the initial soil (tini, Fig. 2) were embedded in those of the vegetated pots, suggesting that the microbial communities in the vegetated pots were recruited from microorganisms present in the soil of the pots. In addition, the vegetated pots were not grouped by sampling location, indicating that the microbial signature of the native grassland soils was not retained in the experimental pots. Taken together, our results suggest that each grass species selected its own rhizospheric microflora from the common pool of microbes occurring in the exogenous soil mixture.

The effect of plant species identity on microbial biomass and activity has been well documented [34-36]. Root exudation [37-41], litter input [36, 42], and the establishment of symbiotic associations to enhance nutrient uptake [43, 44] are among the mechanisms through which plants can regulate soil microbial composition in a species-specific way. However, other studies observed a weak effect of plant species identity on microbial communities compared to that of the soil type [e.g. 45, 46]. Here we did not compare the relative importance of these two factors but only examined monocultures grown on a soil mixture containing exogenous organic matter. Microflora of native mountain soils have been shown to be largely determined by plant species composition [47]. A companion study using native soils of the investigated mountain grasslands would allow assessing the impact of stress and disturbance on a specialized microflora already in place.

It has been argued that fungal diversity patterns of vegetated soils are generally more related to plant species identity than bacterial diversity patterns [reviewed in 48]. Bacteria seem less directly related to plants and are thus more influenced by soil properties [45, 49]. For example, soil bacterial beta diversity has been linked to soil properties such as nutrient availability [50].
In our study, changes in soil pH were mainly observed under drought and their effect was therefore difficult to disentangle from that of soil water content changes. The drought treatment most strongly affected the rhizospheric microflora for Ac (Table 3) despite minor changes of soil parameters for that species (Fig. 1a-c). This may result from the contrasting patterns of root biomass investment that we observed for the different grass species (Fig. 1d, see below).

Although one would have expected a stronger plant specificity of Fungi due to mycorrhizal associations, our results indicated that the effect of plant species identity was similar for both Bacteria and Fungi (Table 2). This might be due to (i) a strong reduction of mycorrhizal biomass when younger roots were cut prior to the experiment and/or (ii) the absence of roots in soils submitted to molecular analyses, excluding endophytic fungi. Both Bacteria, which depend primarily on labile carbon pools, and Fungi, which are better adapted to using complex plant carbon compounds [48, 51], may be similarly affected by plant species identity through root exudation and tissue turnover [52]. Our experiment lasted two growing seasons, and both labile carbon compounds supplied during vegetative growth and complex carbon structures from dead tissues had the potential to influence both bacterial and fungal communities.

Fungi were less sensitive to drought than Bacteria (Table 1), and these findings are consistent with earlier reports [53, 54]. Our study also revealed that the impact of the drought treatment on microbial communities was, at least in part, indirect, as we found a more pronounced effect for the two exploitative plant species, Ac and Dc, than for the two conservative plant species, Fo and Kc (Table 3). Noticeably, this contrasting effect of drought was observed even though soil pH, soil water, and soil N content did not significantly differ among species. The increased production of roots in Fo and Kc under drought and presumably the lower transpiration flow (higher Specific Leaf Area, reduced leaf area) are features of stress-tolerant strategies [55]. This is consistent with the field distribution of these two species, which dominate the most xeric mountain grasslands occurring on shallow calcareous soils. We hypothesize that the maintenance of a high root biomass, and possibly of rhizodeposition flow, under drought is essential to buffer the direct impact of drought on microbes.
By contrast, the rhizospheric microflora of Ac and Dc were more responsive to water shortage, and further work is needed to determine whether this response is driven by soil moisture directly, by root biomass variations and/or by qualitative changes of carbon supply to microbes. Overall, our results are consistent with the hypothesis that plant growth strategies strongly influence the diversity and functioning of rhizospheric communities in a context of environmental changes [56].

Defoliation had a weak effect on microbial beta diversity (Table 1), and two thirds of the variance was unexplained when testing for a species effect within the defoliation treatment (Table 2). Guitian and Bardgett [57] reported an increased microbial biomass following defoliation in a pot experiment but did not assess microbial community diversity changes. Many studies have examined the effect of aboveground foliar herbivory on microbes and put forward nitrogen cycling changes to explain changes in microbial biomass and activity [58]. An experimental defoliation differs from herbivory in many respects, and obviously the long-term response of microbes in natural communities is different from that observed following experimental cutting over two growing seasons. In our experiment, defoliation triggered a slight reduction in root biomass for all four species (Fig. 1d), but the effect was never significant. Therefore, the reduced effect of plant species identity in the defoliation treatment compared to the control seemed independent of root biomass. The repeated defoliation may have reduced the flow of decomposable carbon to the rhizosphere and may consequently have diminished the strength of microflora selection by plant species.

5. Conclusion

In this study, we used molecular methods to characterize the microflora associated with grass roots and variation partitioning to tease out the role of different experimental factors in the turnover of bacterial and fungal communities. Our findings showed that the contrasting growth responses of plants coping with stress and disturbance cascade into rhizosphere microbial community changes. This work opens avenues to understand the linkages between plant functional groups and soil microbial community dynamics. Further studies should identify the plant functional traits that are the most influential on microbial turnover and pay more attention to the feedback of microbial community changes on plant performance.

6. Acknowledgments

We gratefully acknowledge Jean-Marc Bonneville and Jean Martins for comments on the manuscript. We thank Isabelle Boulangeat for her help on microbial data analysis. Logistic support was provided by Pierre-Eymard Biron, wildlife reserve officer of the ‘Reserve Naturelle des Hauts Plateaux du Vercors’ (PNR du Vercors). AB was partly funded by the ‘Pôle de Recherches sur la Biodiversité’, an initiative of the ‘Conseil General de l’ISERE’ (Grenoble, France).

7. References

[1] J.P. Grime, Plant strategies, vegetation processes, and ecosystem properties, John Wiley & Sons, Chichester, 2001.
[2] K.N. Suding, D.E. Goldberg, K.M. Hartman, Relationships among species traits: Separating levels of response and identifying linkages to abundance, Ecology, 84 (2003) 1-16.
[3] D.A. Wardle, Communities and Ecosystems. Linking the Aboveground and Belowground Components, Princeton University Press, Princeton, U.S.A., 2002.
[4] R. Bardgett, The biology of soils, Oxford University Press, Oxford, 2005.
[5] H.L. Reynolds, A. Packer, J.D. Bever, K. Clay, Grassroots ecology: Plant-microbe-soil interactions as drivers of plant community structure and dynamics, Ecology, 84 (2003) 2281-2291.
[6] C.H. Ettema, D.A. Wardle, Spatial soil ecology, Trends in Ecology & Evolution, 17 (2002) 177-183.
[7] W. Williamson, D. Wardle, The soil microbial community response when plants are subjected to water stress and defoliation disturbance, Applied Soil Ecology, 37 (2007) 139-149.
[8] J. Mikola, G.W. Yeates, D.A. Wardle, G.M. Barker, K.I. Bonner, Response of soil food-web structure to defoliation of different plant species combinations in an experimental grassland community, Soil Biology and Biochemistry, 33 (2001) 205-214.
[9] S. Lavorel, E. Garnier, Predicting changes in community composition and ecosystem functioning from plant traits: revisiting the Holy Grail, Functional Ecology, 16 (2002) 545-556.
[10] D.A. Wardle, R.D. Bardgett, J.N. Klironomos, H. Setala, W.H. van der Putten, D.H. Wall, Ecological linkages between aboveground and belowground biota, Science, 304 (2004) 1629-1633.
[11] L.I. Sorensen, J. Mikola, M.M. Kytoviita, Defoliation effects on plant and soil properties in an experimental low arctic grassland community - the role of plant community structure, Soil Biology and Biochemistry, 40 (2008) 2596-2604.
[12] M. Sinka, T.H. Jones, S.E. Hartley, The indirect effect of above-ground herbivory on collembola populations is not mediated by changes in soil water content, Applied Soil Ecology, 36 (2007) 92-99.
[13] L.M. Macdonald, E. Paterson, L.A. Dawson, A.J.S. McDonald, Defoliation and fertiliser influences on the soil microbial community associated with two contrasting Lolium perenne cultivars, Soil Biology and Biochemistry, 38 (2006) 674-682.

[14] J.M. Tiedje, S. Asuming-Brempong, K. Nusslein, T.L. Marsh, Flynn, Opening the black box of soil microbial diversity, Applied Soil Ecology, 13 (1999) 109-122.
[15] J.B. Martiny, B.J. Bohannan, J.H. Brown, R.K. Colwell, J.A. Fuhrman, J.L. Green, M.C. Horner-Devine, M. Kane, J.A. Krumins, C.R. Kuske, P.J. Morin, S. Naeem, L. Ovreas, A.L. Reysenbach, V.H. Smith, J.T. Staley, Microbial biogeography: putting microorganisms on the map, Nature Reviews: Microbiology, 4 (2006) 102-112.
[16] M.J. Anderson, T.O. Crist, J.M. Chase, M. Vellend, B.D. Inouye, A.L. Freestone, N.J. Sanders, H.V. Cornell, L.S. Comita, K.F. Davies, S.P. Harrison, N.J. Kraft, J.C. Stegen, N.G. Swenson, Navigating the multiple meanings of beta diversity: a roadmap for the practicing ecologist, Ecology Letters, 14 (2011) 19-28.
[17] E. Yergeau, K.K. Newsham, D.A. Pearce, G.A. Kowalchuk, Patterns of bacterial diversity across a range of Antarctic terrestrial habitats, Environmental Microbiology, 9 (2007) 2670-2682.
[18] D.L. Mummey, P.D. Stahl, Spatial and temporal variability of bacterial 16S rDNA-based T-RFLP patterns derived from soil of two Wyoming grassland ecosystems, FEMS Microbiology Ecology, 46 (2003) 113-120.
[19] E. Brodie, S. Edwards, N. Clipson, Bacterial community dynamics across a floristic gradient in a temperate upland grassland ecosystem, Microbial Ecology, 44 (2002) 260-270.
[20] N. Gross, K.N. Suding, S. Lavorel, C. Roumet, Complementarity as a mechanism of coexistence between functional groups of grasses, Journal of Ecology, 95 (2007) 1296-1305.
[21] F. Yan, S. Schubert, K. Mengel, Soil pH increase due to biological decarboxylation of organic anions, Soil Biol. Bioch., 28 (1996) 617-624.
[22] L. Zinger, J. Gury, O. Alibeu, D. Rioux, L. Gielly, L. Sage, F. Pompanon, R.A. Geremia, CE-SSCP and CE-FLA, simple and high-throughput alternatives for fungal diversity studies, Journal of Microbiological Methods, 72 (2008) 42-53.
[23] L. Zinger, J. Gury, F. Giraud, S. Krivobok, L. Gielly, P. Taberlet, R.A. Geremia, Improvements of polymerase chain reaction and capillary electrophoresis single-strand conformation polymorphism methods in microbial ecology: Toward a high-throughput method for microbial diversity studies in soil, Microbial Ecology, 54 (2007) 203-216.
[24] D.P. Faith, P.R. Minchin, L. Belbin, Compositional dissimilarity as a robust measure of ecological distance, Vegetatio, 69 (1987) 57-68.
[25] M.J. Anderson, A new method for non-parametric multivariate analysis of variance, Austral Ecology, 26 (2001).

[26] S. Kembel, D. Ackerly, S. Blomberg, P. Cowan, M. Helmus, H. Morlon, C.O. Webb, picante: R tools for integrating phylogenies and ecology. R package version 0.7-2, in, http://cran.r-project.org/web/packages/picante/index.html , Accessed 3 June 2011, 2009.
[27] J. Oksanen, R. Kindt, P. Legendre, B. O'Hara, L. Gavin, vegan: Community Ecology Package. R package version 1.15-4, in, http://cran.r-project.org/web/packages/vegan/index.html , Accessed 3 June 2011, 2009.
[28] P. Legendre, M.J. Anderson, Distance-based redundancy analysis: Testing multispecies responses in multifactorial ecological experiments, Ecological Monographs, 69 (1999) 1-24.
[29] E. Paradis, APE: Analyses of Phylogenetics and Evolution in R language, Bioinformatics, 20 (2004) 289-290.
[30] R Development Core Team, R: A Language and Environment for Statistical Computing, in, R Foundation for Statistical Computing, Vienna, Austria. http://cran.r-project.org/ , Accessed 3 June 2011, 2010.
[31] B. Stres, T. Danevcic, L. Pal, M.M. Fuka, L. Resman, S. Leskovec, J. Hacin, D. Stopar, I. Mahne, I. Mandic-Mulec, Influence of temperature and soil water content on bacterial, archaeal and denitrifying microbial communities in drained fen grassland soil microcosms, FEMS Microbiology Ecology, 66 (2008) 110-122.
[32] B. Stres, L. Philippot, J. Faganeli, J.M. Tiedje, Frequent freeze-thaw cycles yield diminished yet resistant and responsive microbial communities in two temperate soils: a laboratory experiment, FEMS Microbiology Ecology, 74 (2010) 323-335.
[33] F. Baptist, L. Zinger, J. Clement, C. Gallet, R. Guillemin, J. Martins, L. Sage, B. Shahnavaz, P. Choler, R. Geremia, Tannin impacts on microbial diversity and the functioning of alpine soils: a multidisciplinary approach, Environmental Microbiology, 10 (2008) 799-809.
[34] L. Innes, P.J. Hobbs, R.D. Bardgett, The impacts of individual plant species on rhizosphere microbial communities in soils of different fertility, Biology and Fertility of Soils, 40 (2004) 7-13.
[35] S.J. Grayston, S.Q. Wang, C.D. Campbell, A.C. Edwards, Selective influence of plant species on microbial diversity in the rhizosphere, Soil Biology & Biochemistry, 30 (1998) 369-378.
[36] R.D. Bardgett, A. Shine, Linkages between plant litter diversity, soil microbial biomass and ecosystem function in temperate grasslands, Soil Biology and Biochemistry, 31 (1999) 317-321.

[37] M. Bacilio-Jimenez, S. Aguilar-Flores, E. Ventura-Zapata, E. Perez-Campos, S. Bouquelet, E. Zenteno, Chemical characterization of root exudates from rice (Oryza sativa) and their effects on the chemotactic response of endophytic bacteria, Plant and Soil, 249 (2003) 271-277.
[38] B.W. Hütsch, J. Augustin, W. Merbach, Plant rhizodeposition - an important source for carbon turnover in soils, Journal of Plant Nutrition and Soil Science, 165 (2002) 397-407.
[39] C.D. Broeckling, A.K. Broz, J. Bergelson, D.K. Manter, J.M. Vivanco, Root exudates regulate soil fungal community composition and diversity, Applied and Environmental Microbiology, 74 (2008) 738-744.
[40] C. Bertin, X.H. Yang, L.A. Weston, The role of root exudates and allelochemicals in the rhizosphere, Plant and Soil, 256 (2003) 67-83.
[41] D. Standing, J.I. Rangel Castro, J.I. Prosser, M. A, K. Killham, Rhizosphere carbon flow: a driver of soil microbial diversity, in: R.D. Bardgett, M.B. Usher, D.W. Hopkins (Eds.) Biological diversity and function in soils, Cambridge University Press, Cambridge, 2005, pp. 154-170.
[42] C. Conn, J. Dighton, Litter quality influences on decomposition, ectomycorrhizal community structure and mycorrhizal root surface acid phosphatase activity, Soil Biology and Biochemistry, 32 (2000) 489-496.
[43] M.G.A. Van Der Heijden, R.D. Bardgett, N.M. Van Straalen, The unseen majority: soil microbes as drivers of plant diversity and productivity in terrestrial ecosystems, Ecology Letters, 11 (2008) 296-310.
[44] G. Berg, K. Smalla, Plant species and soil type cooperatively shape the structure and function of microbial communities in the rhizosphere, FEMS Microbiology Ecology, 68 (2009) 1-13.
[45] B.K. Singh, S. Munro, J.M. Potts, P. Millard, Influence of grass species and soil type on rhizosphere microbial community structure in grassland soils, Applied Soil Ecology, 36 (2007) 147-155.
[46] N. Nunan, T.J. Daniell, B.K. Singh, A. Papert, J.W. McNicol, J.I. Prosser, Links between plant and rhizoplane bacterial communities in grassland soils, characterized using molecular techniques, Applied and Environmental Microbiology, 71 (2005) 6784-6792.
[47] L. Zinger, D.P.H. Lejon, F. Baptist, A. Bouasria, S. Aubert, R.A. Geremia, P. Choler, Contrasting diversity patterns of crenarchaeal, bacterial and fungal soil communities in an alpine landscape, Plos One, 6 (2011) e19950.

[48] P. Millard, B.K. Singh, Does grassland vegetation drive soil microbial diversity?, Nutrient Cycling in Agroecosystems, 88 (2010) 147-158.
[49] U.N. Nielsen, G.H.R. Osler, C.D. Campbell, D. Burslem, W. van der, The influence of vegetation type, soil properties and precipitation on the composition of soil mite and microbial communities at the landscape scale, Journal of Biogeography, 37 (2010) 1317-1328.
[50] N. Fierer, R.B. Jackson, The diversity and biogeography of soil bacterial communities, Proceedings of the National Academy of Sciences, USA, 103 (2006) 626-631.
[51] W. de Boer, L.B. Folman, R.C. Summerbell, L. Boddy, Living in a fungal world: impact of fungi on soil bacterial niche development, FEMS Microbiology Reviews, 29 (2005) 795-811.
[52] C. Mougel, P. Offre, L. Ranjard, T. Corberand, E. Gamalero, C. Robin, P. Lemanceau, Dynamic of the genetic structure of bacterial and fungal communities at different developmental stages of Medicago truncatula Gaertn. cv. Jemalong line J5, New Phytologist, 170 (2006) 165-175.
[53] D. Griffin, Water potential as a selective factor in the microbial ecology of soils, in: J.F. Parr, W.R. Gardner, L.F. Elliott (Eds.) Water potential relations in soil microbiology, Soil Science Society of America, Madison, Wisconsin, 1981, pp. 141-151.
[54] R. Harris, Effect of water potential on microbial growth and activity, in: J.F. Parr, W.R. Gardner, L.F. Elliott (Eds.) Water potential relations in soil microbiology, Soil Science Society of America, Madison, Wisconsin, 1981, pp. 23-95.
[55] H. Poorter, O. Nagel, The role of biomass allocation in the growth response of plants to different levels of light, CO2, nutrients and water: a quantitative review, Australian Journal of Plant Physiology, 27 (2000) 1191.
[56] D.A. Wardle, G.W. Yeates, W. Williamson, K.I. Bonner, The response of a three trophic level soil food web to the identity and diversity of plant species and functional groups, Oikos, 102 (2003) 45-56.
[57] R. Guitian, R.D. Bardgett, Plant and soil microbial responses to defoliation in temperate semi-natural grassland, Plant and Soil, 220 (2000) 271-277.
[58] A.K. Patra, L. Abbadie, A. Clays Josserand, V. Degrange, S.J. Grayston, P. Loiseau, F. Louault, S. Mahmood, S. Nazaret, L. Philippot, E. Poly, J.I. Prosser, A. Richaume, X. Le Roux, Effects of grazing on microbial functional groups involved in soil N dynamics, Ecological Monographs, 75 (2005) 65-80.

Supplementary Table S1. (a) Effect of stress and disturbance treatments on soil parameters for each species and (b) effect of species on soil parameters for each stress and disturbance treatment.

Supplementary Material 2. Neighbour-joining trees of soil bacterial SSCP profiles for Ac (a), Dc (b), Fo (c) and Kc (d) species. Branches with bootstrap values above 50% are in bold. Trees are based on Kulczynski’s distances. See Table 3 for variance partitioning.

Supplementary Material 3. Neighbour-joining trees of soil fungal SSCP profiles for Ac (a), Dc (b), Fo (c) and Kc (d) species. Branches with bootstrap values above 50% are in bold. Trees are based on Kulczynski’s distances. See Table 3 for variance partitioning.

Annex 3:

Microbial communities of urban stormwater sediments: The phylogenetic structure of bacterial communities varies with porosity.

Anne-Laure Badin 1, Tarfa Mustafa 2, Cédric Bertrand 3, Armelle Monier 2, Cécile Delolme 1, Roberto A. Geremia* 2, Jean-Philippe Bedell 1

1 Université de Lyon, Lyon, F-69003, France; Université Lyon1, Villeurbanne, F-69622, France; ENTPE, Vaulx-en-Velin, F-69518, France; CNRS, UMR 5023 Laboratoire Ecologie des Hydrosystèmes Naturels et Anthropisés, Villeurbanne, F-69622, France.
2 Laboratoire d'Ecologie Alpine, CNRS UMR 5553, Université Joseph Fourier, Grenoble, F-38041, France
3 Laboratoire de Chimie des Biomolécules et de l'Environnement - EA 4215, Université de Perpignan Via Domitia, Perpignan, France

* Corresponding author: Roberto A. Geremia ; Laboratoire d'Ecologie Alpine (LECA) UJF/CNRS UMR 5553, UFR BIOLOGIE; Bât. D Univ. Joseph Fourier ; F-38041 Grenoble Cedex 9 ; France ; Tel.: +33476514113; Fax: +33 476 51 42 79; E-mail: [email protected]

Key words : aggregates; bacterial biomasses and diversity; Cyanobacteria; fungal communities; soil fractionation; urban sediment

Abstract

This study focuses on the distribution of bacterial and fungal communities within the microstructure of a multicontaminated sedimentary layer resulting from urban stormwater infiltration. Fractionation was performed on the basis of differential porosity and aggregate grain size, resulting in 5 fractions: leachable (corresponding to the macroporosity), <10 µm, 10-160 µm, 160-1000 µm (corresponding to the aggregates) and >1000 µm. Both bacterial and fungal biomasses are higher in the <10 µm and leachable fractions. Aggregates contain numerous bacteria but very low fungal biomass. SSCP molecular profiles highlighted differences between the bacterial and fungal communities of the leachable fraction and those of the aggregates. Random Sanger sequencing of ssu clones revealed that this difference is mainly due to the presence of Epsilonproteobacteria and Firmicutes in the leachable fractions, whereas Cyanobacteria are more abundant in aggregates. The Cyanobacteria phylotypes in aggregates were dominated by sequences related to Microcoleus vaginatus, while the leachable fractions displayed sequences of chloroplastic origin. More than 50% of the phylotypes observed were related to Proteobacteria, and a further 40% were related to Cyanobacteria and Bacteroidetes. Preferential distribution of clades was observed in almost all detected phyla or classes. This study provides insight into the identities of dominant members of the bacterial communities of urban sediments. The presence of M. vaginatus highlights its dominance in pioneer soils.

Introduction

Every year in urban areas, millions of tons of contaminated solid materials accumulate on impermeable surfaces and form deposits such as wastewater sludge, road dust and sediments. In France, such urban sediments from stormwater have been estimated to represent several ten or so million tons dry weight (hereafter DW) per year (Petavy & Ruban 2005, Ruban 2005).
Stormwater is often collected and made to infiltrate towards groundwater through infiltration basins, leading to the settling of highly polluted urban deposits, which constitute a potential source of contamination for groundwater (Pitt et al. 1999; Datry et al. 2004). Such sediments are mainly composed of silt-sized particles that contain high levels of metals and petroleum-derived organic matter (OM) (Durand et al., 2004, Durand et al., 2005, Clozel et al., 2006, Badin et al., 2008). They are colonized by a consistently active microflora (Neto et al., 2007, Mermillod-Blondin et al., 2008) and support plant growth. The solid materials constituting the sedimentary layer present high contents of OM (4 to 27%) and biomasses (Durand et al. 2004; Winiarski et al. 2006; Murakami et al. 2008). These sediments can be considered as soil, because they have been formed as a function of parent rock and organic matter deposits; but in contrast to well-formed soil, they have evolved for only a few decades (the duration of the infiltration practices). They can be considered as a substrate in which pedogenesis is occurring; we propose to consider these materials as the surface layer of a Technosol (Capilla et al. 2006; FAO 2006). A Technosol is defined as a soil whose properties and pedogenesis are dominated by its technical origin.

Soil structure is a dynamic rather than a static property of soil (Hussein and Adey, 1998) that influences the transfer of solutes to other compartments or their uptake by the soil. It is known that the water that reaches groundwater with mobile solutes and bacteria is transported by convection in the macroporosity (advective flow), whereas diffusion mechanisms are effective in the microporosity and are responsible for the transformation of organic matter and, indirectly, for pollutant retention. Consequently, the leaching of pollutants and bacteria to the subsurface depends both on their mobility properties and on the physical structure of the sediments in the surface layer. The basic unit of soil structure is the aggregate, a cluster (or group) of soil particles in which the forces holding the particles together are much stronger than the forces between adjacent aggregates (Martin et al., 1955). Various soil constituents are responsible for aggregate formation. According to Edwards and Bremner (1967), Tisdall and Oades (1982), Oades (1988), and Boix-Fayos et al. (2001), the essential parameters that act on aggregation are organic matter (OM), microorganisms, clays and divalent cations.

At the microscale, soils are complex, heterogeneous and structured environments (Young & Crawford, 2004).
Their physical and chemical characteristics differ from point to point, even over a few μm, leading to different microenvironments, which strongly influence bacterial cells (Hattori & Hattori, 1976, Chenu & Stotzky 2002, Grundmann 2004, Franklin & Mills 2007). Soil structure contains two different microhabitats: the inner part of soil aggregates (associated with microporosity <2.5-6 μm) and the outer part of aggregates (i.e. the surrounding macroporosity) together with aggregate surfaces (Hattori & Hattori 1976, Ranjard et al., 2000, Mummey et al., 2006, Kim & Sansalone 2008). Each pore space provides different niches for microbial life. For instance, microorganisms living in the macroporosity must cope with desiccation but easily obtain nutrients carried through the soil during infiltration. On the other hand, microorganisms living in the microporosity must cope with anoxia and low nutrient resources, but are protected from predation. Thus the diversity of soil microhabitats could lead to differences in microbial communities.

Indeed, the distribution of microorganisms in soil porosities and grain size fractions has already been assessed in soil (Hattori & Hattori 1976, Jocteur Monrozier et al., 1991, Ranjard et al., 2000, Sessitsch et al., 2001, Mummey et al., 2006, Kim & Sansalone 2008). In some well-structured soils, the bacterial communities in microporosity and macroporosity are different (Hattori & Hattori 1976, Ranjard et al., 2000, Belnap et al., 2002, Pallud et al., 2004, Mummey et al., 2006). By contrast, in less structured soils, no evidence of differential distribution was found (Blackwood et al., 2006, Kim et al., 2008). The technosol of infiltration basins contains very stable aggregates (Badin et al., 2009); however, no information exists about the microorganisms colonizing the different aggregates in this type of solid material. Characterization of the microbial composition of the aggregate and macroporosity fractions is required to better understand structure formation and the interactions between solid phases and leaching waters. Indeed, microorganisms can interact with the solid phase and change the properties of the aggregates, e.g. by degrading OM or sequestering metals; this action may substantially affect the composition of leachates.

The main goal of this study was to describe the microbial compositions of the macro- and microstructures of urban sediment as background work for the study of leaching processes. The technosol was fractionated into 5 fractions on the basis of aggregate size. The bacterial and fungal molecular profiles were determined by Capillary-Electrophoresis Single Strand Conformational Polymorphism (CE-SSCP). The bacterial community was further characterized by random sequencing of an ssu library.
Our results show that bacterial communities of leachable and aggregated fractions differ greatly. To our knowledge, this is the first study on the two main porosity fractions of such urban stormwater sediment.

Materials and methods

Description of site, sampling and sediment

The infiltration basin studied is located at Chassieu, an urban area NE of Lyon, France. The 1 ha infiltration pond receives stormwater from an urban and industrialized watershed of 185 ha and was described previously (Winiarski et al., 2006). The sampling area (Badin et al., 2008) is located in the infiltration part of the pond.

Sampling was done with a clean shovel. Five samples of 1 kg each (samples A to E) were collected from a 2 m² area on May 10th 2006. This sampling strategy was designed to capture the spatial variability of the site. The samples were kept at 4°C for 2 weeks before the fractionation experiments. The chemical characteristics of the urban sediment were also reported previously (Badin et al., 2008). Briefly, the sediments studied were silt-sized materials with a high total heavy metal content (1934 ±188 mg/kg) and organic matter content (10 ±2% DW); the organic fraction was mainly composed of petroleum byproducts (Badin et al., 2008). The water content was 39 ±7%, NO₃⁻ + NO₂⁻ 1.62 ±0.77 mg/kg and NH₄⁺ 717 ±368 mg/kg.

Grain size analyses and fractionation

Grain size analyses and aggregate determination were performed in triplicate for each of the 5 samples described above, in aqueous phase, by laser diffractometry (Malvern Mastersizer 2000G, range 0.02 to 2,000 μm). The grain size distribution of each sample was measured before and after a 1-min sonication. The aggregates considered were only those that could be destroyed by the energy supplied by ultrasound (US) for 1 min in the sampler of the diffractometer (50-60 Hz). Prior to analysis, each sample was gently wet-sieved at 1,600 μm with tap water. Fractionation: Based on the aggregate size distribution, five fractions were obtained: the leachable part and four grain size fractions (>1000, 160-1000, 10-160 and <10 µm). The aggregates are mainly found in the 160-1000 µm fraction, but could also be present in the >1000 µm fraction. Fractionation was performed in triplicate for each sample as described previously (Ranjard et al., 1997, Ranjard et al., 2000). Briefly, the sediment sample was suspended and subjected to 14 cycles of gentle shaking and settling.
The supernatant was recovered after each settling and pooled, resulting in the leachable fraction. The remaining sediment was fractionated by wet sieving (1000 and 160 µm) and sedimentation (10-160 and <10 µm). This procedure was repeated 3 times on the 5 samples collected; the relative proportions of the fractions were very similar (p-value of one-way ANOVA > 0.07). Thus, one sample was chosen to further characterize the fractions. The B sediment sample was chosen because it shows the least deviation from the means for water and OM contents, bacterial counts and pH. Fractionation was performed once more in triplicate (B1, B2, B3) and subsamples were allocated to the various analyses (microbial characterization and OM characterization; Badin et al., 2008).
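The choice of a representative sample described above can be illustrated with a short sketch. It is only one possible way to formalize "least deviation from the means" (a sum of absolute z-scores across variables); the source does not state how the deviation was computed, and all values and names below are hypothetical.

```python
from statistics import mean, stdev

def most_representative(samples):
    """Return the sample closest to the across-sample means, plus all scores.

    Assumption: closeness is measured as the sum of absolute z-scores over
    all variables. The source does not specify the criterion actually used.
    """
    names = list(samples)
    variables = list(next(iter(samples.values())))
    scores = {name: 0.0 for name in names}
    for var in variables:
        values = [samples[name][var] for name in names]
        mu, sd = mean(values), stdev(values)
        for name in names:
            # A variable with no spread contributes nothing to the score.
            if sd > 0:
                scores[name] += abs(samples[name][var] - mu) / sd
    return min(scores, key=scores.get), scores

# Hypothetical measurements for the five bulk samples (not the thesis data).
samples = {
    "A": {"water": 46.0, "om": 12.0, "log_bacteria": 10.1, "ph": 7.4},
    "B": {"water": 39.0, "om": 10.0, "log_bacteria": 9.8, "ph": 7.1},
    "C": {"water": 32.0, "om": 8.0, "log_bacteria": 9.5, "ph": 6.8},
    "D": {"water": 41.0, "om": 11.0, "log_bacteria": 9.9, "ph": 7.2},
    "E": {"water": 37.0, "om": 9.0, "log_bacteria": 9.7, "ph": 7.0},
}
best, scores = most_representative(samples)
print(best, round(scores[best], 3))
```

Standardizing each variable before summing keeps variables with large units (e.g. water content in %) from dominating those with small ranges (e.g. pH).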

Bacterial enumeration and fungal biomass determination

Bacterial enumeration: The bacterial counts were obtained by direct counting using epifluorescence microscopy, which detects live and dead cells. Bacteria were stained with 4’,6-diamidino-2-phenylindole (DAPI) (Porter & Feig 1980, Kepner & Pratt 1994). Counts were performed in triplicate for each bulk sediment sample (A to E) and for each subsample resulting from the B1 to B3 fractionations, within the 3 days following sampling or grain size fractionation, on samples kept at 4°C. The bulk sediments and the >1000 and 160-1000 µm fractions were treated as solid samples, whereas the leachable, <10 and 10-160 µm fractions were treated as suspensions. The solid fractions were blended for 90 s with sterile NaCl 0.8% in a liquid:solid ratio of 5:1. The resulting suspensions were collected in sterile tubes and sonicated for 1 min. Formaldehyde (9 mg/mL), a solution of glutaraldehyde buffered at pH 7.2 (0.4 mg/mL), and DAPI stain (0.002 mg/mL) were added successively. The staining reaction lasted 20 min in the dark. The suspensions were filtered on black prestained 0.22 µm filters (Millipore Isopore GTPB). Fluorescent bacteria were counted in ten areas (along a Z-shaped line) per filter and three filters per sample.

Fungal biomass determination: Ergosterol is a fungal cellular compound and has been used as a marker of live fungal biomass (Djajakirana et al., 1996). It was extracted by 30 min sonication using 1 g eq/DW of sediment and 80 mL of bidistilled ethanol. Filtered extracts (0.45 µm) were then concentrated by evaporation and the dry extracts were recovered in 6 mL of bidistilled ethanol. The extract was filtered again (0.45 µm) prior to analysis by Liquid Chromatography (Agilent 1100) and Mass Spectrometry. The mobile phase used in an isocratic run was methanol:water (97:3).
Detection was performed by Atmospheric Pressure Chemical Ionization (APCI) and quantification was done in selected ion monitoring mode using the characteristic fragment ion at m/z 279. Ergosterol content was quantified in subsamples resulting from the B1 to B3 fractionations (duplicates or triplicates) and in 4 bulk sediment samples (triplicates).

Extraction, amplification and cloning of bacterial ssu genes

DNA extraction from sediments and grain size fractions: DNA was extracted from bulk sediment samples (triplicates from B) and from each sample resulting from the B1 to B3 fractionations. DNA extractions were performed with the PowerSoil™ Extraction Kit (Mo Bio Laboratories, Ozyme, St Quentin en Yvelines, France), using 250 mg of fresh weight sediment or solid fractions (>1000, 160-1000 µm) per sample, and with the UltraClean Water kit (Mo Bio Laboratories), using 250 mL of the suspended

fractions (10-160, <10 µm and the leachable fractions), according to the manufacturer's instructions. Three extractions were performed on the bulk sediment and one on each fraction per fractionation assay (i.e. 3 per fraction). The DNA extracts were checked on an agarose gel (1% in Tris-Borate-EDTA, hereafter TBE). PCR-SSCP analyses: The phylogenetic structure of bacterial and fungal communities was assessed by using single strand conformation polymorphism (SSCP) as described previously (Zinger et al., 2007, Gury et al., 2008, Zinger et al., 2008). Briefly, fungal diversity was assessed by amplification of the ITS1 region by using the primer ITS5 (5’-GGAGTAAAAGTCGTAACAAGG-3’) and the fluorescently labeled primer (5’-HEX) ITS2 (5’-GCTGCGTTCTTCATCGATGC-3’). For bacterial diversity, the V3 region of ssu was amplified by using the primer W49 (5’-ACGGTCCAGACTCCTACGGG-3’) and the fluorescently labeled primer (5’-FAM) W104 (5’-GTGCCAGCAGCCGCGGTAA-3’).

The PCR mixture (25 μL) consisted of 2.5 mM of MgCl2, 0.2 µM of each primer, 0.05 µM of each dNTP, 1 U of Taq polymerase (Roche), and 1 µL of each DNA sample (10 ng/μL), made up with ultrapure water. The PCR conditions were the same in both cases: 95°C for 10 min; 30 cycles of 95°C for 10 s, 56°C for 15 s and 72°C for 15 s; and a final extension at 72°C for 7 min. PCR products were visualized on a 2% TBE agarose gel, allowing DNA concentration assessment. CE-SSCP: A 1-μL aliquot of the PCR product was mixed with 10 µL of Hi-Di formamide (Applied Biosystems, Courtaboeuf, France), 0.5 µL of NaOH (0.3 M) and 0.2 µL of the internal DNA molecular weight standard Genescan-400HD ROX (Applied Biosystems). SSCP was performed on an ABI Prism 3130 Genetic Analyzer (Applied Biosystems), using a 36 cm capillary. The polymer contained 5% CAP polymer and 10% glycerol (Applied Biosystems), and the running buffer contained 10% glycerol and 10% of 3100 buffer. The injection time and voltage were set at 22 s and 1 kV, respectively. Electrophoresis was performed at 32°C for 25 min (Zinger et al., 2007, Zinger et al., 2008). The profiles obtained from CE-SSCP were exported as numerical values and compared by correspondence analysis. This discriminative analysis ordinates rows by columns and columns by rows; it is usually performed on contingency tables to visualize the distribution of species in different habitats and, conversely, habitat use by species (Ramette, 2007). Here, it allowed the visualization of phylotypes (in equivalent base pairs: nbp) for subsamples, and of subsamples for phylotypes. It was performed with the ADE-4 software (Dray & Dufour 2007), a package of the R software (Team 2008). Two

incongruous profiles were substituted by the mean of their replicates in order to avoid biases in the CoA (one of the aggregated fractions for fungi and bacteria).

Clone library construction and analysis. Bacterial communities were monitored by using 800 bp of the ssu gene encompassed by the primers 63F (5’-CAGGCCTAACACATGCAAGTC-3’, positions 43 to 63 of E. coli ssu) (Marchesi et al., 1998) and Com2-ph reverse (5’-CCGTCAATTCCTTTGAGTTT-3’, positions 907 to 926 of E. coli ssu) (Schwieger & Tebbe 1998). Clone libraries were constructed as previously described (Zinger ISME) for the leachable and aggregated samples (DNA extracts were pooled for each fraction). Briefly, eight independent PCR amplifications were performed on each sample, pooled, purified using the QIAquick PCR Purification kit (250) (QIAGEN) and cloned using a TOPO TA PCR 4.1 cloning kit (Invitrogen SARL, Molecular Probes, Cergy Pontoise, France). Escherichia coli (TOP10F’) was transformed by electroporation. Blue transformants were selected for sequencing. Plasmid DNA was extracted by using the NucleoSpin®-Robot96 Cor Kit (Macherey-Nagel). Sanger sequencing was performed by Cogenics (Meylan, France), using M13 primers.

Sequence and statistical analyses: Chimeric sequences were detected by using Bellerophon (Huber et al., 2004) and removed from the dataset. We obtained 256 sequences for the leachable fraction and 211 sequences for the aggregated fraction. The taxonomic assignment of ssu sequences was done by using the Ribosomal Database Project (Cole et al., 2003). Rarefaction analysis at the order level indicates that most of the diversity at this level was covered with the sequencing depth used here (data not shown). The closest matches were downloaded as references from GenBank (www.ncbi.nlm.nih.gov). The sequences shorter than 800 bp were removed to improve the alignment, leaving 209 sequences for the leachable fraction and 164 for the aggregated fraction.
The multiple alignments were performed by using the ClustalW algorithm (Thompson et al., 1994). After calculating the Jukes-Cantor distance, the phylogenetic tree was constructed with MEGA 3.1 (Kumar et al., 2004) by using Neighbour-Joining with 1000 bootstrap replicates. We computed the nearest taxon index (hereafter NTI), which quantifies the degree of phylogenetic clustering of taxa given their patterns of presence/absence and their phylogenetic relationships (Webb et al. 2002), as provided by the R package “picante” (Kembel 2009).

Data processing

Statistical analyses were performed with R (Team 2008). Nested ANOVAs were performed to compare bacterial counts between fractions (the potential effect of the subsampling used for fractionation purposes was taken into account). Bacteria counts were log

transformed to satisfy the assumptions of normality of residuals and equality of variances. An a posteriori Tukey HSD test was performed when the ANOVA revealed statistically significant differences. The significance level was set at α=0.05. For fungi, no statistical analysis was performed given the small number of fungal biomass data available. Only raw data (min-max) are given.

Results

Identification of the aggregated fraction

The basic unit of the sedimentary structure is the aggregate. In order to determine the aggregated fraction of the sediment, we compared the grain size distributions of unsonicated and sonicated sediments. The >1000 μm fraction was mostly composed of gravels but also of pieces of wood, particulate organic matter, aggregates, etc. The aggregated fraction was considered to be the fraction sensitive to sonication (Badin et al., 2009). As shown in Figure 1, the 160-1000 µm peak was strongly reduced by sonication, indicating that this fraction is mostly made up of aggregates. The particles between 10 and 160 µm, whose proportion increased after sonication, could be elementary particles or microaggregates too stable to be disaggregated by ultrasound. Consequently, fractionation thresholds were fixed. The fractions separated were: <10, 10-160, 160-1000 and >1000 μm plus the leachable fraction.

Figure 1: Particle grain size distribution of the stormwater sediments. The black line matches the measurement made without preliminary ultra-sonication, the gray one matches the measurement made after sonication. Note aggregation of the 160-1000 µm fraction.
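The comparison of unsonicated and sonicated grain size distributions can be sketched as a per-class relative loss of volume. This is a minimal illustration, assuming the diffractometer output has already been binned into size classes; the function name, the class boundaries and the volume percentages are hypothetical and are not the measured data.

```python
def sonication_loss(before, after):
    """Relative loss of volume in each size class after sonication.

    Positive values mean the class lost material (aggregates were broken);
    negative values mean the class gained material released from larger classes.
    """
    loss = {}
    for size_class, v_before in before.items():
        v_after = after[size_class]
        loss[size_class] = (v_before - v_after) / v_before if v_before > 0 else 0.0
    return loss

# Hypothetical volume percentages per size class (µm), before and after 1 min of ultrasound.
before = {"<10": 20.0, "10-160": 40.0, "160-1000": 40.0}
after = {"<10": 30.0, "10-160": 60.0, "160-1000": 10.0}

loss = sonication_loss(before, after)
# The class with the largest relative loss is taken as the aggregated fraction.
aggregated = max(loss, key=loss.get)
print(aggregated, loss[aggregated])
```

With these illustrative numbers, the 160-1000 µm class loses three quarters of its volume while the two finer classes gain material, which is the pattern the text describes for Figure 1.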

Table 1: Characteristics of the bulk sediment and the fractions. Grain sizes are in μm. Analyses were performed in triplicate.

(a) Regarding the number of bacteria, the fractions sharing the same letter have not been shown to be significantly different (a posteriori Tukey HSD test, α=0.05). (b) Results from calculation: (weight proportion of each fraction) x (bacteria count/ ergosterol content in fraction). (c) Sum of heavy metals measured in aqua regia digests.
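The calculation in footnote (b) can be written out as a short sketch: each fraction's count per gram is weighted by its mass proportion, and the weighted values are then expressed as shares of their sum. The mass proportions and counts below are hypothetical placeholders chosen only to show the arithmetic, and the function name is an assumption.

```python
def contribution_to_bulk(weight_fraction, count_per_g):
    """Share of each fraction in the total count, weighted by mass proportion.

    weighted value = (weight proportion of the fraction) x (count per g of fraction)
    """
    weighted = {f: weight_fraction[f] * count_per_g[f] for f in weight_fraction}
    total = sum(weighted.values())
    return {f: w / total for f, w in weighted.items()}

# Hypothetical mass proportions (sum to 1) and bacterial counts per g DW of fraction.
weight_fraction = {"leachable": 0.02, "<10": 0.04, "10-160": 0.30,
                   "160-1000": 0.14, ">1000": 0.50}
count_per_g = {"leachable": 7e10, "<10": 7e10, "10-160": 2e10,
               "160-1000": 2e10, ">1000": 5e9}

shares = contribution_to_bulk(weight_fraction, count_per_g)
for fraction, share in shares.items():
    print(f"{fraction}: {share:.1%}")
```

The sketch makes explicit why a fraction with a high count per gram can still contribute little to the whole sediment when its mass proportion is small.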

The mass proportions of the fractions are shown in Table 1. The weight proportions of each grain size fraction did not differ between the triplicates of the 5 sediment samples (p value of one-way ANOVA > 0.07). The >1000 μm fraction was the major fraction. The macroaggregated fraction (160-1000 μm) represents 14% and the leachable part only 2% of the sediment dry weight. The leachable fraction consisted of particles finer than 100 μm (>90%). Given the reproducibility of fractionation, one sample was fractionated three times and analyzed for further studies.

Biomass distribution

In order to assess the distribution of microorganisms in the six studied fractions, the bacteria were counted and the ergosterol content was measured in each fraction (Table 1). Regarding the bacteria, the 2 finest fractions (the leachable and <10 µm fractions) contained the largest counts per gram of dry weight of fraction (around 7 x 10^10 bacteria/g DW of sediment) (p values ranging from <0.001 to 0.020 and from <0.001 to 0.013, respectively). The values were significantly lower in the >1000 μm fraction (p < 0.001) and intermediate in the 10-160 and 160-1000 μm fractions. Finally, bacteria counts in the bulk sediments were not shown to be different from those of the aggregated fraction (160-1000 μm). When considering the weight contribution of each fraction, the 10-160 and 160-1000 µm fractions accounted for 55% of the total bacterial counts, while the leachable fraction accounted for only 8% (counts per g of total sediment). In summary, the fractions that contain the highest biomass per g are also the smallest fractions by weight, and consequently their contribution to the whole sediment is often minor. The ergosterol content of the fractions suggests that fungal biomass per gram of dry weight was lowest in the aggregated fraction (160-1000 μm) and highest in both the <10 μm and leachable fractions. Moreover, the ergosterol contents measured in the bulk sediments and the finer fractions were similar.
Since fungi are expected to colonize the macroporosity (i.e. not to be inside aggregates), these results further support the hypothesis that the macroporosity is represented by the leachable fraction, while the <10 µm fraction would be potentially leachable. Our results indicate that bacteria are found mainly in the aggregate fraction while fungal biomass is found in the macroporosity. It should be kept in mind that we compare live and dead counts (bacteria) with live biomass (fungi). Nevertheless, we can hypothesize that bacterial and fungal niches are different: bacteria mainly colonize the aggregate fraction while fungi colonize the macroporosity.

Genotypic diversity of the microbial communities in grain size fractions

To compare the bacterial and fungal communities of these size fractions, we performed CE-SSCP on 3 independent fractionation experiments. The bacterial and fungal data were analyzed by correspondence analysis (CoA). The CoA on the genotypic diversity of the bacterial communities is plotted in Figure 2. The first canonical axis (eigenvalue = 0.026) explained 77% of the total variation in the data, the second (0.003) a further 8%. The replicates of the >1000 μm fraction differed greatly, indicating that bacterial communities living in the largest grain size fraction could be very different from one sampling point to another. This high spatial variability supports the 1st axis of the CoA, and may reflect the variety of substrates observed in the >1000 µm fraction (pieces of wood, large aggregates, gravels) that could select different bacterial communities. Along the 1st axis, the discriminative peaks are those for phylotypes (or OTUs) from 202 to 205.5 bp. In contrast, the replicates of the other fractions and of the bulk sediment are well grouped, indicating that their bacterial communities are homogeneous. The samples of the bulk sediment and the <10, 10-160 and 160-1000 μm fractions cluster at the center of the CoA map (Figure 2). Thus, the genotypic fingerprints of the bacterial communities of the bulk sediment are similar to those of the finer fractions. Moreover, they are composed of the majority of the phylotypes (OTUs) detected overall. On the other hand, the samples of the leachable fraction are clearly discriminated from the bulk sediment and the finer fractions. These results strongly suggest that bacterial communities from the macroporosity are different from those of the solid fractions.

Figure 2: Mapping of the bacterial genotypic diversity in the grain size fractions and bulk sediment. Results from correspondence analysis (CoA) for bacterial SSCP profiles in fractionated grain size fractions (in μm). The first two axes are kept for mapping (77% and 8%); the first one is horizontal. Leach. = the leachable fraction.
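The percentages attached to the CoA axes follow from a standard property of correspondence analysis: the eigenvalues sum to the total inertia of the table, which equals the Pearson chi-square statistic divided by the grand total. The sketch below computes that total inertia in plain Python and the share explained by one eigenvalue. It is not the ADE-4 implementation, and the small tables are hypothetical.

```python
def total_inertia(table):
    """Total inertia of a contingency table: Pearson chi-square / grand total."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2 / n

def explained_share(eigenvalue, inertia):
    """Fraction of the total inertia carried by one CoA axis."""
    return eigenvalue / inertia

# Hypothetical tables: rows = samples, columns = SSCP peaks (phylotypes).
independent = [[10, 20], [20, 40]]   # identical profiles: no structure
segregated = [[10, 0], [0, 10]]      # each sample has its own peak

print(total_inertia(independent), total_inertia(segregated))
```

Read the other way round, an eigenvalue of 0.026 explaining 77% of the variation implies a total inertia of roughly 0.026 / 0.77, i.e. about 0.034, for the bacterial table.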

The CoA on the genotypic diversity of fungal communities is shown in Figure 3. The first canonical axis (eigenvalue = 0.208) explained 45% of the total variation in the data, the second (0.076) a further 17%. Almost 40% of the total inertia of fungal diversity was not taken into account for mapping. The variability among replicates of the bulk sediment and of the >1000 and 160-1000 μm fractions is very large and seems to support the 1st axis. On the contrary, the replicates of the finer (<10 and 10-160 µm) and leachable fractions are well grouped. The 2nd axis seems to be supported by the difference between fungal communities living, on the one hand, in the <10 µm fraction, and on the other, in the leachable and 10-160 µm fractions.

Figure 3: Mapping of the fungal genotypic diversity in the grain size fractions and bulk sediment. Results from correspondence analysis (CoA) for fungal SSCP profiles in fractionated grain size fractions (in μm). The first two axes are kept for mapping (45% and 17%); the first one is horizontal. Leach. = the leachable fraction.

Thus, fungal communities appear to be very heterogeneous in the larger fractions but homogeneous in the finer fractions. At least 3 homogeneous fungal communities were detected: those living in the macroporosity, the <10 µm fraction and the 10-160 µm fraction. Both bacterial and fungal communities from the macroporosity of the urban sediments are different from those of the sediment microporosity.

Composition of the bacterial communities living inside the aggregates and in the macroporosity

Based on both the technosol structure and the genotypic diversity of the bacterial communities, the bacteria living in the leachable fraction (corresponding to the macroporosity) and in the aggregated fraction (160-1000 µm) were investigated further. We obtained 457

bacterial ssu gene sequences (256 sequences for the leachable fraction and 201 sequences for the aggregated fraction). The sequences were grouped at 97% similarity, yielding 98 groups for the leachable fraction and 101 groups for the aggregated fraction. The Shannon-Weaver diversity indices are 7.64 and 4.83 for the leachable and aggregated fractions, respectively. Thus both fractions display high but different bacterial diversities. The analysis of the phylogenetic structure indicated that the aggregated fraction was over-dispersed (NTI = -0.43) but not significantly so (P = 0.66), while the leachable fraction was clustered (NTI = 1.65, P = 0.047). The latter result indicates that the leachable fraction was subject to environmental constraints. The clone libraries were compared by using LibCompare of RDPII. The results are presented in Figure 4. An overall analysis reveals that in both samples the majority of phylotypes are related to Proteobacteria, a substantial number to the Bacteroidetes and only a few sequences to Actinobacteria (only 11 in the leachable fraction). Secondly, the bacterial communities from the macroporosity (the leachable fraction) and from the aggregates (the 160-1000 μm fraction) are quite dissimilar (Figure 4), supporting the results of CE-SSCP. At the phylum level, the proportion of Cyanobacteria is significantly higher in the aggregates (32.4% vs. 7.6%, p-value = 1.98 x 10^-9). The proportions of the Proteobacteria (63.5% vs. 45.7%, p-value = 6.00 x 10^-4) and the Firmicutes (6.6% vs. 0%, p-value = 2.94 x 10^-4) are significantly higher in the macroporosity communities. The difference in Proteobacteria is due to the class Epsilonproteobacteria (9.6% vs. 0%, p-value = 6.70 x 10^-6).
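For reference, the Shannon index of a set of sequence groups can be computed from the number of sequences in each group, as in the sketch below. The logarithm base is a parameter here because the source does not state which base was used; the group counts are hypothetical.

```python
import math

def shannon_index(counts, base=math.e):
    """Shannon diversity H' = -sum(p_i * log(p_i)) over non-empty groups."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h

# Hypothetical numbers of sequences per 97%-similarity group.
even_community = [1, 1, 1, 1]        # four equally abundant groups
dominated_community = [97, 1, 1, 1]  # one group dominates

print(shannon_index(even_community, base=2))
print(shannon_index(dominated_community, base=2))
```

For S groups the index cannot exceed log(S) in the chosen base, a bound reached only when all groups are equally abundant, which gives a quick consistency check on any reported value.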

Figure 4: Proportions of clones of the leachable fraction (macroporosity, in red) and the aggregated fraction (in blue). ***: p value < 0.001.
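To give a feel for how a difference in clone proportions between two libraries can be tested, the sketch below uses a pooled two-proportion z-test. This is a generic stand-in chosen for simplicity: it is not claimed to be the statistic computed by RDP LibCompare, so its p-values need not match those reported above. The counts are illustrative.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both libraries are entirely with or without the taxon: no difference to test.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Illustrative counts: clones of one taxon out of all clones in each library.
z, p = two_proportion_z_test(19, 209, 0, 164)
print(f"z = {z:.2f}, p = {p:.2e}")
```

The normal approximation becomes unreliable when a taxon is represented by very few clones; an exact test would be a safer choice in that case.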

In order to search for the presence of specific phylotypes in each sample, the sequences from both samples were merged and clustered by using phylogenetic approaches (Supplementary materials: Figures 1 and 2). For the group of Cyanobacteria, the ssu sequences were clustered into two main clades, one of them representing chloroplastic sequences (Figure 5). The prokaryotic clade contains most of the “cyanobacterial” sequences from the aggregated fraction (39/54), while the chloroplastic clade groups most of the sequences of the leachable fraction. The most abundant prokaryotic phylotype is related to Microcoleus vaginatus (37 sequences from the aggregated fraction and 3 from the leachable fraction). Interestingly, the chloroplastic sequences were related to algae and diatoms, namely the algae Scenedesmus obliquus (leachable) and Spirogyra maxima (aggregated), the diatom Nitzschia frustulum (aggregated) and an unknown group (leachable).

Figure 5: Cyanobacteria and chloroplastic diversity of the leachable (red circle) and aggregated fractions (blue triangle). The numbers of sequences observed are indicated in brackets.

For the group of Proteobacteria, the sequences were clustered into four main classes (Alpha-, Beta-, Gamma- and Epsilonproteobacteria) (Figure 6). The class of Alphaproteobacteria, accounting for 35/209 sequences in the leachable fraction and 20/164 sequences in the aggregated fraction, also seems to contain phylotypes displaying preferential distribution. Indeed, most of the sequences related to Sphingomonas sp. were only found in the leachable fraction. The other sequences, related to Skermanella, Porphyrobacter neustonensis, Devosia, Bradyrhizobium, Brevundimonas, and uncultured phylotypes, did not display preferential distribution. The class of Betaproteobacteria contained 36/209 sequences in the leachable fraction and 24/164 in the aggregated fraction. Most of the phylotypes related to defluvii were from the leachable fraction, while phylotypes related to an uncultured Betaproteobacterium were from the aggregated fraction. The clades associated with Ideonella sp. and Rhodoferax sp. were dominated by aggregate and leachable phylotypes, respectively. For the other clades (Aquaspirillum, Methylibium fluvum, and hortensis), there was no clear trend. Thus, even if there is no significant difference in the distribution of this class, it seems that certain Betaproteobacterium phylotypes have a preferential distribution. The class of Gammaproteobacteria, which represented 31/209 sequences in the leachable fraction and 22/164 in the aggregated fraction, displayed phylotypes related to Thermomonas brevis, Aspromonas, and uncultured species (Chromatiales and Gammaproteobacterium) without a trend in distribution. Finally, the class of Epsilonproteobacteria (19/209 sequences in the leachable fraction) was dominated by phylotypes related to Arcobacter sp.


Figure 6: Proteobacteria diversity of the leachable (red circle) and aggregated fractions (blue triangle). The numbers of observed sequences are indicated in brackets.

The group of Bacteroidetes displayed sequences from both the leachable fraction (28/209) and the aggregated fraction (33/164). These sequences were clustered into five main clades. Flavobacteriaceae was largely represented by phylotypes related to Flavobacterium, Glacier bacterium, Candidatus Amoebinatus, and uncultured Chryseobacterium. Interestingly, the branches related to F. kamowaganaensis cluster phylotypes from the aggregated fraction. There were two clades related to uncultured Bacteroidetes, both of them grouping mostly phylotypes from the aggregated fraction. In the group of Sphingobacteriaceae there was a clear distinction between the phylotypes from the leachable and aggregated fractions. The phylotypes from the aggregated fraction were related to Sphingobacterium sp. The phylotypes from the leachable fraction were related to uncultured strains (Crenotrichaceae). Concerning the fifth clade,

it was related to Flexibacteriaceae and grouped mostly phylotypes from the aggregated fraction. These phylotypes were related to Cytophaga hutchinsoni, Arcocella aquatica, Rhodocytophaga aerolata and Flexibacterium bacterium. Once again, phylogenetic analysis revealed preferential distribution of the phylotypes. For the phylum of Firmicutes (24/209), which was present only in the leachable fraction, the sequences were related to Nocardioides, Microbacterium, Clostridium, Fusibacter, and Anaerovorax. Thus, phylogenetic analysis revealed preferential distribution of clades in almost all the phyla or classes detected.

Discussion

Human activities in urban areas generate huge volumes of solid materials that can be transported and deposited in stormwater infiltration devices, leading to the formation of urban sediment. Although the contaminant levels of these sediments are often assessed, their physical structure and microbial communities are poorly described. Moreover, analysis of the leachable fraction is relevant not only as a descriptor of a microhabitat of urban sediment, but also because the pollutants and bacteria it contains, including pathogenic bacteria, could reach the groundwater. Here, we found that the distributions of fungal and bacterial communities are different. The former mainly colonize the leachable fraction, while the latter are also present in the aggregates. Moreover, the phylogenetic structure of bacterial communities from the aggregates is different from that of the leachable fraction.

General pattern in bacterial diversity of this urban sediment

When both the aggregated and leachable fractions are considered, more than 50% of the phylotypes observed were related to Proteobacteria, while a further 40% were related to Cyanobacteria and Bacteroidetes. Actinobacteria and Acidobacteria represent less than 3 and 2%, respectively.
In natural soils, by contrast, the general trend is that Alphaproteobacteria, Acidobacteria, and Actinobacteria are often dominant; Bacteroidetes content seems to vary among soils, while Firmicutes and Planctomycetes are definitely less abundant (Janssen 2006, Roesch et al. 2007). The differences between soil and technosol may have several causes. First, soil formation needs plants, which may select a specific bacterial assemblage. Second, soil formation takes more than hundreds of years; this time may be necessary to recruit microbial species. Third, the metal content of the soil and the quality of its OM may play a central role in selecting microorganisms. Indeed, while the OM of soils is derived from plants, and microbial species are better adapted to thrive on it, the OM of the technosol studied here derives from petroleum byproducts. The urban sediment studied here was rich in

organic compounds with petroleum by-products (steranes and terpanes, unresolved complex mixture (UCM) and polycyclic aromatic hydrocarbons (PAH)), but plant and bacterial biomarkers, such as phytol and derivatives or sterols, were also found (Badin et al., 2008). Such organic compounds may be either recalcitrant to degradation or of low bioavailability. Moreover, the technosol contains large amounts of metal pollutants (see above). Both the complex OM and the metal content would select bacterial phylotypes adapted to cope with them, similar to those present in contaminated sites. As a matter of fact, many phylotypes found here have previously been found in polluted environments such as contaminated soils and wastewater (Table 2), and could be involved in the degradation of the complex OM.

Table 2: Report of some bacterial strains in various environments.

Environment: bacterial strains observed
Wastewater treatment and activated sludge: Aquaspirillum psychrophilum (Morgan-Sagastume et al., 2008), Simplicispira (Grabovich et al., 2006), Leptothrix (Kraigher et al., 2008), Thermomonas brevis (Mergaert et al., 2003), Afipia (Cole et al., 2004), Nocardioidaceae (Yoon & Park, 2006), Algoriphagus (Okabe et al., 2007)
Heavy metal contaminated soil and sediment: Microcoleus (Trzcińska & Pawlik-Skowrońska, 2008)
Oil contaminated soil and sediment: Nitzschia frustulum (Paissé et al., 2008), Acidovorax (Li et al., 2008, Paissé et al., 2008), Janthinobacterium agaricidamnosum, Sphingomonadaceae, Rhodobacter, Clostridium, Flavobacterium (Li et al., 2008), Microbacterium (Evtushenko & Takeuchi, 2006), Leptothrix (Omoregie et al., 2008), Xanthomonas (Kim & Crowley, 2007, Paissé et al., 2008), Skermanella (Kim & Crowley, 2007), Afipia (Nogales et al., 2001), Erythromicrobium (Leys et al., 2004)
Extreme habitats: Erythromicrobium (Rathgeber et al., 2008), Silanimonas lenta (Manucharova et al., 2008), Taxeobacter (Reichenbach, 2006)

The abundance of Cyanobacteria was unexpected. This phylum is considered to be competitive in the absence of a large pool of organic carbon, such as in desert soils or early successional soils from a receding glacier (Gundlapally & Garcia-Pichel 2006, Nemergut et al., 2007). As a matter of fact, the urban sediments can be considered to lack a large pool of bioavailable OM, because their OM is complex and recalcitrant to degradation. In this context, the presence of photosynthetic autotrophs (Cyanobacteria and algae) suggests that C fixation by photosynthesis would be more competitive than OM degradation (Badin et al., 2008). Thus, this urban sediment harboured bacterial communities with different C uptake characteristics (photosynthesis and OM degradation). It is tempting to speculate that the part of the microbial community feeding on recalcitrant OM has an oligotrophic behavior, and is not competitive enough to preclude the development of Cyanobacteria. It is noteworthy that large proportions of Cyanobacteria are present in the early stages of soil formation in the absence of vegetation, for instance in desert biological crusts (Garcia-Pichel et al., 2001, Gundlapally & Garcia-Pichel 2006, see also below), tailing dumps resulting from a former Zn-Cd mine (Trzcińska & Pawlik-Skowrońska 2008), and early successional soils from a receding glacier (Nemergut et al., 2007). The presence of this pioneer phylum suggests that this urban sediment is somehow a soil in an early successional state. As expected, these results underline functional differences between natural soils and our urban sediments.

Microbial distribution within soil fractions

There are contrasting reports in the literature on the influence of aggregation on microbial community structure. While differential distribution of microbial communities was not observed in some studies (Blackwood et al., 2006, Kim et al., 2008), in agricultural soils Sessitsch et al.
(2001) showed that particle size has a notable impact on microbial diversity and community structure. This seems also to be the case for the urban sediments as judged by the molecular profiles analysis. Moreover, our molecular analyses further support the distinction between microbial communities of microporosity (aggregates) and macroporosity (outer, leachable fraction) previously observed (Hattori & Hattori 1976, Ranjard et al. , 2000, Mummey et al. , 2006). However, the fungal communities from the bulk, >1000 and aggregated fractions were heterogeneous, indicating a lack of structure of this community at the scale studied here. It is possible that increasing the sampling scale (the whole basin rather than the plot or increasing the amount of soil extracted) would reveal a structure. However, it is also possible that the

205 fungal community is not structured yet, and that we are catching a dispersal signal. Actually, the technosol may not be an appropriated habitat, because complexity of the OM may preclude the utilization by fungi. This hypothesis is also supported by the presence of photosynthetic organisms, which indirectly support the proposal of technosol as a pioneer soil (see below). Another important point regards the presence of pathogenic bacteria that may leach towards groundwater. Is important to stress that the methods used here are not specific for pathogenic bacteria, and are not accurate enough. Still, we found Arcobacter sp . in the leachable fraction. This strain was reported to be a pathogen of fecal origin. This finding is further supported by the presence of fecal sterols in the sediments (waste water markers (coprostanol, coprostanone, 24-ethyl-coprostanol) were reported in these specific sediments (Badin et al, 2008)). More appropriated methods are needed to asses this point. Bacterial communities associated with aggregates recalls poor soils The sizes and proportions of the aggregated fraction were first evaluated and the fractionation protocol was designed to keep the most representative aggregated fraction: 160-1000 μm. Several phylotypes were shown to be preferentially associated with this fraction, Microcoleus vaginatus being the most represented. This bacteria was described as a large, highly mobile, filamentous species lacking a heterocyst (unable to fix N 2), that is ubiquitous and can colonize soils within only a few days after soil disturbance if the soil is wet (Belnap, 2002). Indeed, M. vaginatus was previously observed to dominate certain biological soil crust communities. Biological soil crusts are the topmost layers of the soil wherever higher plant cover is restricted, and most notably in arid regions (Garcia-Pichel et al. , 2001, Gundlapally & Garcia-Pichel 2006). 
Although the urban sediment studied here has a high water content, the low bioavailability of its OM (see above) makes it similar to pioneer soils, which are poor in nutrients, and also selects for autotrophic bacteria. The pioneering role of M. vaginatus can be related to the transformation of sediment structure. Indeed, M. vaginatus was reported to contribute to soil microstructure by coating, enmeshment, binding and gluing of particles (Malam Issa et al., 2007). Other Cyanobacteria, heterotrophic bacteria and fungi also contribute to the formation of biological soil crusts (Garcia-Pichel et al., 2001, Gundlapally & Garcia-Pichel 2006). Roeselers et al. (2007) studied successional changes in the community composition of freshwater phototrophic biofilms growing on polycarbonate slides and inoculated with biofilm samples obtained from the sedimentation tank of a wastewater treatment plant. They reported the dominance of M. vaginatus and the presence of other phylotypes: algae (Scenedesmus obliquus), some Bacteroidetes (Cytophagales, Taxeobacter), and some Betaproteobacteria (Acidovorax). Interestingly, we also observed these phylotypes. The question remains of the N supply, which may be provided by heterocystous Cyanobacteria, which were not found here. The absence of N2-fixing Cyanobacteria in our dataset may be explained by a failure to extract DNA from these strains because of their thick extracellular sheaths (Garcia-Pichel et al., 2001). It is also possible that the NH4+ present in our samples (Badin et al., 2011) is sufficient to support the growth of bacteria.

Bacterial communities associated with the leachable fraction reveal habitat fragmentation

When water infiltrates through a porous material, macropores carry the majority of the advective flow. Indeed, as much as 1410 µg of Zn, 224 µg of Cu, and 8 and 10% of the total bacterial and fungal biomasses, respectively, were leached from the technosol in a controlled drying experiment carried out in a laboratory column (Badin et al., 2011). Here, the leachable fraction corresponds to the macroporosity. The leachable bacterial community (macroporosity) is clearly different from those of the non-fractionated sediments and other fractions (Figure 2). Phylotypes related to the following taxa were identified only or preferentially in the leachable fraction: Sphingomonas sp., Acidovorax defluvii, Arcobacter sp., Sphingobacteriaceae, Crenotrichaceae, Nocardioides, Microbacterium, Clostridium, Fusibacter, and Anaerovorax. Some are known to grow preferentially anaerobically (e.g. Clostridium (Wiegel et al., 2006) and Anaerovorax sp. (Matthies et al., 2000)) and others aerobically (e.g. Microbacterium (Evtushenko & Takeuchi, 2006)).
Thus, the bacterial composition suggests that the macroporosity is not a uniform aerobic habitat, but also comprises anaerobic regions. This is compatible with current knowledge on urban sediments. The technosol fraction is a porous material in which around 50% of the pores were filled with water, which gives rise to aquatic and aerial niches. Additionally, the different drainage times may have led to different levels of dissolved O2 (Lassabatere et al., 2010). The composition of the bacterial community reflects the presence of at least two microenvironments differing in O2 content. Thus, even such small fractions contain heterogeneous habitats.

Conclusion

This study describes the characteristics of the microbial communities of an urban sediment resulting from stormwater infiltration practices. The distribution of bacteria and fungi in the microstructure was also investigated by analysing the leachable fraction (macroporosity) and different grain size fractions. While fungal biomass is more abundant in the macroporosity, bacteria are present in both the macro- and microporosity. The genotypic diversity of both bacterial and fungal communities highlighted the heterogeneity of niches in the larger fraction (> 1000 µm), but also the similarity among the smaller fractions. Bacterial communities of the microporosity differ strongly from those of the macroporosity. While the bacterial communities of the aggregate fraction are compatible with those of poor soil habitats or contaminated material, the leachable fraction reveals the presence of microenvironments. The succession of microbial communities through time should be investigated further. Moreover, the fate of the leached microbes in the subsoil should be investigated in order to assess the risk for groundwater quality.

Acknowledgements

The work presented was funded in part by the ECOPLUIE project backed by the PRECODD research program (2005) (n° ANR: ANR-05-ECOT-006-07, n° ADEME: 0594C-0089) and the Ministère de l'Ecologie, de l'Energie, du Développement durable et de l'Aménagement du territoire - Direction de la recherche et de l'innovation (Convention n° 07 DST S 002). We also thank the two anonymous reviewers, whose comments greatly improved the paper. This work was also carried out in the framework of the regional observatory on urban hydrology (OTHU).

References

Badin A-L, Bedell J-P & Delolme C (2009) Effect of water content on aggregation and contaminant leaching: the study of an urban Technosol. J. Soils Sediments 9: 653-663.
Badin A-L, Faure P, Bedell J-P & Delolme C (2008) Distribution of organic pollutants and natural organic matter in urban storm water sediments as a function of grain size. Sci. Total Environ. 403: 178-187.
Badin A-L, Méderel G, Béchet B, Borschneck D & Delolme C (2009) Study of the aggregation of the surface layer of Technosols from stormwater infiltration basins using grain size analyses with laser diffractometry. Geoderma 153: 163-171.
Badin A-L, Monier A, Volatier L, Geremia RA, Delolme C & Bedell J-P (2011) Structural stability, microbial biomass and community composition of sediments affected by the hydric dynamics of an urban stormwater infiltration basin. Microbiol. Ecol. (DOI 10.1007/s00248-011-9829-4).
Belnap J (2002) Nitrogen fixation in biological soil crusts from southeast Utah, USA. Biol. Fertil. Soils 35: 128-135.
Blackwood CB, Dell CJ, Smucker AJM & Paul EA (2006) Eubacterial communities in different soil macroaggregate environments and cropping systems. Soil Biol. Biochem. 38: 720-728.
Boix-Fayos C, Calvo-Cases A, Imeson AC & Soriano-Soto MD (2001) Influence of soil properties on the aggregation of some Mediterranean soils and the use of aggregate size and stability as land degradation indicators. Catena 44: 47-67.
Capilla X, Schwartz C, Bedell JP, Sterckeman T, Perrodin Y & Morel JL (2006) Physicochemical and biological characterisation of different dredged sediment deposit sites in France. Environ. Pollut. 143: 106-116.
Chenu C & Stotzky G (2002) Interaction between Microorganisms and Soil Particles: An Overview. Interactions between Soil Particles and Microorganisms (Huang PM, Bollag J-M & Senesi N, eds.), p. 571. John Wiley and Sons.
Clozel B, Ruban V, Durand C & Conil P (2006) Origin and mobility of heavy metals in contaminated sediments from retention and infiltration ponds. Appl. Geochem. 21: 1781-1798.
Cole AC, Semmens MJ & LaPara TM (2004) Stratification of Activity and Bacterial Community Structure in Biofilms Grown on Membranes Transferring Oxygen. Appl. Environ. Microbiol. 70: 1982-1989.

Cole JR, Chai B, Marsh TL, Farris J, Wang Q, Kulam SA, Chandra S, McGarrell DM, Schmidt TM, Garrity GM & Tiedje JM (2003) The Ribosomal Database Project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy. Nucleic Acids Res. 31: 442-443.
Datry T, Malard F & Gibert J (2004) Dynamics of solutes and dissolved oxygen in shallow urban groundwater below a stormwater infiltration basin. Sci. Total Environ. 329: 215-229.
Djajakirana G, Joergensen RG & Meyer B (1996) Ergosterol and microbial biomass relationship in soil. Biol. Fertil. Soils 22: 299-304.
Dray S & Dufour AB (2007) The ade4 package: implementing the duality diagram for ecologists. J. Statist. Software 22: 1-20.
Durand C, Ruban V & Amblès A (2005) Characterisation of complex organic matter present in contaminated sediments from water retention ponds. J. Anal. Appl. Pyrolysis 73: 17-28.
Durand C, Ruban V, Amblès A & Oudot J (2004) Characterization of the organic matter of sludge: determination of lipids, hydrocarbons and PAHs from road retention/infiltration ponds in France. Environ. Pollut. 132: 375-384.
Edwards AP & Bremner JM (1967) Microaggregates in soils. J. Soil Sci. 18: 64-73.
Evtushenko LI & Takeuchi M (2006) The Family Microbacteriaceae. The Prokaryotes, Vol. 3 (York SN, ed.), pp. 1020-1098. Springer.
Franklin RB & Mills AL (2007) Introduction. The spatial distribution of microbes in the environment (Franklin RB & Mills AL, eds.), p. 333. Springer, Dordrecht, The Netherlands.
Garcia-Pichel F, López-Cortés A & Nübel U (2001) Phylogenetic and Morphological Diversity of Cyanobacteria in Soil Desert Crusts from the Colorado Plateau. Appl. Environ. Microbiol. 67: 1902-1910.
Grabovich M, Gavrish E, Kuever J, Lysenko AM, Podkopaeva D & Dubinina G (2006) Proposal of Giesbergeria voronezhensis gen. nov., sp. nov. and G. kuznetsovii sp. nov. and reclassification of [Aquaspirillum] anulus, [A.] sinuosum and [A.] giesbergeri as Giesbergeria anulus comb. nov., G. sinuosa comb. nov. and G. giesbergeri comb. nov., and [Aquaspirillum] metamorphum and [A.] psychrophilum as Simplicispira metamorpha gen. nov., comb. nov. and S. psychrophila comb. nov. Int. J. Syst. Evol. Microbiol. 56: 569-576.
Grundmann GL (2004) Spatial scales of soil bacterial diversity - the size of a clone. FEMS Microbiol. Ecol. 48: 119-127.

Gundlapally SR & Garcia-Pichel F (2006) The community and phylogenetic diversity of biological soil crusts in the Colorado Plateau studied by molecular fingerprinting and intensive cultivation. Microb. Ecol. 52: 345-357.
Gury J, Zinger L, Gielly L, Taberlet P & Geremia RA (2008) Exonuclease activity of proofreading DNA polymerases is at the origin of artifacts in molecular profiling studies. Electrophoresis 29: 2437-2444.
Hattori T & Hattori R (1976) The physical environment in soil microbiology: an attempt to extend principles of microbiology to soil microorganisms. CRC Critical Rev. Microbiol. 4: 423-461.
Huber T, Faulkner G & Hugenholtz P (2004) Bellerophon: A program to detect chimeric sequences in multiple sequence alignments. Bioinformatics 20: 2317-2319.
Hussein J & Adey MA (1998) Changes in microstructure, voids and b-fabric of surface samples of a Vertisol caused by wet/dry cycles. Geoderma 85: 63-82.
Janssen PH (2006) Identifying the dominant soil bacterial taxa in libraries of 16S rRNA and 16S rRNA genes. Appl. Environ. Microbiol. 72: 1719-1728.
Jocteur Monrozier L, Ladd JN, Fitzpatrick RW, Foster RC & Raupach M (1991) Components and microbial biomass content of size fractions in soils of contrasting aggregation. Geoderma 50: 37-62.
Kembel S, Ackerly D, Blomberg S, Cowan P, Helmus M, Morlon H & Webb CO (2009) picante: R tools for integrating phylogenies and ecology. R package version 0.7-2.
Kepner Jr. RL & Pratt JR (1994) Use of fluorochromes for direct enumeration of total bacteria in environmental samples: Past and present. Microbiol. Rev. 58: 603-615.
Kim J-Y & Sansalone JJ (2008) Event-based size distributions of particulate matter transported during urban rainfall-runoff events. Water Res. 42: 2756-2768.
Kim JS & Crowley DE (2007) Microbial diversity in natural asphalts of the Rancho La Brea Tar Pits. Appl. Environ. Microbiol. 73: 4579-4591.
Kim JS, Dungan RS & Crowley D (2008) Microarray analysis of bacterial diversity and distribution in aggregates from a desert agricultural soil. Biol. Fertil. Soils 44: 1003-1011.
Kraigher B, Kosjek T, Heath E, Kompare B & Mandic-Mulec I (2008) Influence of pharmaceutical residues on the structure of activated sludge bacterial communities in wastewater treatment bioreactors. Water Res. 42: 4578-4588.
Kumar S, Tamura K & Nei M (2004) MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief. Bioinform. 5: 150-163.

Lassabatere L, Angulo-Jaramillo R, Goutaland D, Letellier L, Gaudet JP, Winiarski T & Delolme C (2010) Effect of the settlement of sediments on water infiltration in two urban infiltration basins. Geoderma 156: 316-325.
Leys NMEJ, Ryngaert A, Bastiaens L, Verstraete W, Top EM & Springael D (2004) Occurrence and Phylogenetic Diversity of Sphingomonas Strains in Soils Contaminated with Polycyclic Aromatic Hydrocarbons. Appl. Environ. Microbiol. 70: 1944-1955.
Li D, Yang M, Li Z, Qi R, He J & Liu H (2008) Change of bacterial communities in sediments along Songhua River in Northeastern China after a nitrobenzene pollution event. FEMS Microbiol. Ecol. 65: 494-503.
Malam Issa O, Defarge C, Le Bissonnais Y, Marin B, Duval O, Bruand A, D’Acqui LP, Nordenberg S & Annerman M (2007) Effects of the inoculation of cyanobacteria on the microstructure and the structural stability of a tropical soil. Plant and Soil 290: 209-219.
Manucharova NA, Vlasenko AN, Tourova TP, Panteleeva AN, Stepanov AL & Zenova GM (2008) Thermophilic chitinolytic microorganisms of brown semidesert soil. Microbiology 77: 610-614.
Marchesi JR, Sato T, Weightman AJ, Martin TA, Fry JC, Hiom SJ & Wade WG (1998) Design and evaluation of useful bacterium-specific PCR primers that amplify genes coding for bacterial 16S rRNA. Appl. Environ. Microbiol. 64: 795-799.
Martin JP, Martin WP, Page JB, Raney WA & De Ment JD (1955) Soil aggregation. Adv. Agron. 7: 1-37.
Matthies C, Evers S, Ludwig W & Schink B (2000) Anaerovorax odorimutans gen. nov., sp. nov., a putrescine-fermenting, strictly anaerobic bacterium. Int. J. Syst. Evol. Microbiol. 50: 1591-1594.
Mergaert J, Cnockaert MC & Swings J (2003) Thermomonas fusca sp. nov. and Thermomonas brevis sp. nov., two mesophilic species isolated from a denitrification reactor with poly(ε-caprolactone) plastic granules as fixed bed, and emended description of the genus Thermomonas. Int. J. Syst. Evol. Microbiol. 53: 1961-1966.
Mermillod-Blondin F, Nogaro G, Vallier F & Gibert J (2008) Laboratory study highlights the key influences of stormwater sediment thickness and bioturbation by tubificid worms on dynamics of nutrients and pollutants in stormwater retention systems. Chemosphere 72: 213-223.
Morgan-Sagastume F, Nielsen JL & Nielsen PH (2008) Substrate-dependent denitrification of abundant probe-defined denitrifying bacteria in activated sludge. FEMS Microbiol. Ecol. 66: 447-461.

Mummey D, Holben W, Six J & Stahl P (2006) Spatial stratification of soil bacterial populations in aggregates of diverse soils. Microb. Ecol. 51: 404-411.
Murakami M, Nakajima F & Furumai H (2008) The sorption of heavy metal species by sediments in soakaways receiving urban road runoff. Chemosphere 70: 2099-2109.
Nemergut DR, Anderson SP, Cleveland CC, Martin AP, Miller AE, Seimon A & Schmidt SK (2007) Microbial community succession in an unvegetated, recently deglaciated soil. Microb. Ecol. 53: 110-122.
Neto M, Ohannessian A, Delolme C & Bedell JP (2007) Towards an optimized protocol for measuring global dehydrogenase activity in storm-water sediments. J. Soils Sediments 7: 101-110.
Nogales B, Moore ERB, Llobet-Brossa E, Rossello-Mora R, Amann R & Timmis KN (2001) Combined Use of 16S Ribosomal DNA and 16S rRNA to Study the Bacterial Community of Polychlorinated Biphenyl-Polluted Soil. Appl. Environ. Microbiol. 67: 1874-1884.
Oades JM (1988) The retention of organic matter in soils. Biogeochemistry 5: 35-70.
Okabe S, Odagiri M, Ito T & Satoh H (2007) Succession of sulfur-oxidizing bacteria in the microbial community on corroding concrete in sewer systems. Appl. Environ. Microbiol. 73: 971-980.
Omoregie EO, Mastalerz V, De Lange G, Straub KL, Kappler A, Røy H, Stadnitskaia A, Foucher J-P & Boetius A (2008) Biogeochemistry and community composition of iron- and sulfur-precipitating microbial mats at the Chefren mud volcano (Nile deep sea fan, eastern Mediterranean). Appl. Environ. Microbiol. 74: 3198-3215.
Paissé S, Coulon F, Goñi-Urriza M, Peperzak L, McGenity TJ & Duran R (2008) Structure of bacterial communities along a hydrocarbon contamination gradient in a coastal sediment. FEMS Microbiol. Ecol. 66: 295-305.
Pallud C, Dechesne A, Gaudet J-P, Debouzie D & Grundmann GL (2004) Modification of spatial distribution of 2,4-dichlorophenoxyacetic acid degrader microhabitats during growth in soil columns. Appl. Environ. Microbiol. 70: 2709-2716.
Petavy F & Ruban V (2005) Estimation des gisements. (LCPC, ed.), pp. 52-54. LCPC, Bouguenais, France.
Pitt R, Clark S & Field R (1999) Groundwater contamination potential from stormwater infiltration practices. Urban Water J. 1: 217-236.
Porter KG & Feig YS (1980) The use of DAPI for identifying and counting aquatic microflora. Limnol. Oceanogr. 25: 943-948.

Ramette A (2007) Multivariate analyses in microbial ecology. FEMS Microbiol. Ecol. 62: 142-160.
Ranjard L, Richaume A, Jocteur Monrozier L & Nazaret S (1997) Response of soil bacteria to Hg(II) in relation to soil characteristics and cell location. FEMS Microbiol. Ecol. 24: 321-331.
Ranjard L, Nazaret S, Gourbière F, Thioulouse J, Linet P & Richaume A (2000) A soil microscale study to reveal the heterogeneity of Hg(II) impact on indigenous bacteria by quantification of adapted phenotypes and analysis of community DNA fingerprints. FEMS Microbiol. Ecol. 31: 107-115.
Rathgeber C, Lince MT, Alric J, Lang AS, Humphrey E, Blankenship RE, Verméglio A, F., Plumley FG, Van Dover CL, Beatty JT & Yurkov V (2008) Vertical distribution and characterization of aerobic phototrophic bacteria at the Juan de Fuca Ridge in the Pacific Ocean. Photosynth. Res. 97: 235-244.
Reichenbach H (2006) chap 6.7: The order Cytophagales. Prokaryotes Vol. 7 (Dworkin M, ed.), pp. 549-590. Springer New York, New York.
Roesch LF, Fulthorpe RR, Riva A, Casella G, Hadwin AKM, Kent AD, Daroub SH, Camargo FAO, Farmerie WG & Triplett EW (2007) Pyrosequencing enumerates and contrasts soil microbial diversity. ISME J. 1: 283-290.
Roeselers G, Van Loosdrecht MCM & Muyzer G (2007) Heterotrophic pioneers facilitate phototrophic biofilm development. Microb. Ecol. 54: 578-585.
Ruban V (2005) Caractérisation et gestion des sédiments de l'assainissement pluvial. Génie urbain EG 19, Laboratoire Central des Ponts et Chaussées (LCPC, ed.), p. 151, Paris.
Schwieger F & Tebbe CC (1998) A new approach to utilize PCR-single-strand-conformation polymorphism for 16S rRNA gene-based microbial community analysis. Appl. Environ. Microbiol. 64: 4870-4876.
Sessitsch A, Weilharter A, Gerzabek MH, Kirchmann H & Kandeler E (2001) Microbial Population Structures in Soil Particle Size Fractions of a Long-Term Fertilizer Field Experiment. Appl. Environ. Microbiol. 67: 4215-4224.
R Development Core Team (2008) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Thompson JD, Higgins DG & Gibson TJ (1994) Clustal-W - Improving The Sensitivity Of Progressive Multiple Sequence Alignment Through Sequence Weighting, Position-Specific Gap Penalties And Weight Matrix Choice. Nucleic Acids Res. 22: 4673-4680.

Tisdall JM & Oades JM (1982) Organic matter and water-stable aggregates in soils. J. Soil Sci. 33: 141-163.
Trzcińska M & Pawlik-Skowrońska B (2008) Soil algal communities inhabiting zinc and lead mine spoils. J. Appl. Phycol. 20: 341-348.
Webb CO, Ackerly DD, McPeek MA & Donoghue MJ (2002) Phylogenies and community ecology. Annu. Rev. Ecol. Evol. Syst. 33: 475-505.
Wiegel J, Tanner R & Rainey F (2006) An Introduction to the Family Clostridiaceae. The Prokaryotes, Vol. 4 (York SN, ed.), pp. 654-678. Springer, New York.
Winiarski T, Bedell JP, Delolme C & Perrodin Y (2006) The impact of stormwater on a soil profile in an infiltration basin. Hydrogeol. J. 14: 1244-1251.
Yoon JH & Park YA (2006) The Genus Nocardioides. The Prokaryotes (Dworkin M, ed.), pp. 1099-1113. Springer, New York.
Young IM & Crawford JW (2004) Interactions and self-organization in the soil-microbe complex. Science 304: 1634-1637.
Zinger L, Gury J, Giraud F, Krivobok S, Gielly L, Taberlet P & Geremia RA (2007) Improvements of polymerase chain reaction and capillary electrophoresis single-strand conformation polymorphism methods in microbial ecology: Toward a high-throughput method for microbial diversity studies in soil. Microb. Ecol. 54: 203-216.
Zinger L, Gury J, Alibeu O, Rioux D, Gielly L, Sage L, Pompanon F & Geremia RA (2008) CE-SSCP and CE-FLA, simple and high-throughput alternatives for fungal diversity studies. J. Microbiol. Methods 72: 42-53.


Résumé: Soil is the most spatially and temporally heterogeneous environment on Earth, combining a complex physical structure, geochemistry, and drastic seasonal fluctuations in temperature, moisture and nutrient availability. It hosts an enormous diversity of bacteria and fungi, which account for most of the biomass and are the key actors in the degradation of organic matter. The soils of arctic and alpine tundra are major carbon sinks, and the functioning of these cold ecosystems, both above and below ground, is strongly modulated by snowpack depth. We used the metatranscriptomic approach, a semi-quantitative analysis of the genes expressed by one or several organisms or by the entire ecosystem, to try to understand the actual functional diversity and the activities expressed in alpine soils by eukaryotes in response to these environmental constraints. We studied the activities of eukaryotic microbial communities (by targeting the polyA fraction of messenger RNAs) in alpine soils under two strongly contrasted snow-cover conditions. We present a bioinformatic pipeline for sequence analysis and the results of three parallel annotation procedures. Several thousand functional cDNAs were identified, which allows a first outline of the taxonomic structure of these ecosystems. Despite the absence of a reference genome for the eukaryotic communities of alpine soil, we were able to identify several sequences coding for enzymes that are essential in metabolic pathways of broad importance in most biogeochemical cycles. We also showed that the two soils studied, at the two extremes of a snow-cover gradient, differ in the enzymes directing glucose utilization: towards degradation linked to energy production in one case, and towards the formation of sugars linked to stress exposure in the other.


Abstract: Soil is the most spatially and temporally heterogeneous environment on earth, a combination of complex physical structure, geochemistry and dramatic seasonal fluctuations in temperature, humidity and nutrient availability. It contains a huge diversity of bacteria and fungi that make up the bulk of the biomass and are key players in the degradation of organic matter. The soils of the arctic and alpine tundra are major carbon sinks, and the above- and belowground functioning of these cold ecosystems is strongly modulated by the thickness of the snowpack. We used the metatranscriptomic approach, a semi-quantitative analysis of the genes expressed by one or more organisms or by the entire ecosystem, to try to understand the real functional diversity and the activities expressed in alpine soils by eukaryotic communities in response to these environmental constraints. We studied the activities of eukaryotic microbial communities (targeting the polyA fraction of mRNA) in two alpine soils under highly contrasted snow conditions. We present a bioinformatics analysis pipeline and the results of three parallel annotation procedures. Thousands of functional cDNAs were identified, allowing a draft of the taxonomic structure of these ecosystems. Despite the lack of a reference genome for the eukaryotic communities of alpine soil, we identified several sequences coding for enzymes that are essential in metabolic pathways widely important in most biogeochemical cycles. We have also shown that the two soils studied, at the two extremes of a snow-cover gradient, differ in the enzymes directing the use of glucose: towards degradation linked to energy production in one case, and towards the formation of sugars related to stress exposure in the other.
