Microbial Detection and Monitoring Using Tools

Teresa Lettieri

Institute for Environment and Sustainability

2006 EUR 22600 EN

The mission of the Institute for Environment and Sustainability is to provide scientific-technical support to the European Union’s policies for the protection and sustainable development of the European and global environment.

European Commission Directorate-General Joint Research Centre Institute for Environment and Sustainability

Contact information Address: via E. Fermi, 1, TP 300 E-mail: [email protected] Tel.: +39-0332789868 Fax: 39-0332789352

http://ies.jrc.cec.eu.int http://www.jrc.cec.eu.int

Legal Notice Neither the European Commission nor any person acting on behalf of the Commission is responsible for the use which might be made of this publication.

A great deal of additional information on the European Union is available on the Internet. It can be accessed through the Europa server http://europa.eu.int

EUR 22600 EN

ISSN 1018-5593

Luxembourg: Office for Official Publications of the European Communities

© European Communities, 2006

Reproduction is authorised provided the source is acknowledged

Printed in Italy

CONTENTS

INTRODUCTION

MICROBIAL WORLD

MOLECULAR BIOLOGY TOOLS
  DNA Sequencing
  DNA Cloning System
  PCR and Real Time PCR
  Hybridization Techniques
  Immunofluorescence

FROM FIRST MICROBIAL GENOMICS TO MICROBIAL COMMUNITY

GENOMICS TOOLS
  DNA Microarray Technology
  Mass Spectrometry
  Single Cell Genomic Sequencing

PUBLIC DATABASES

MICROBIAL COMMUNITIES: DYNAMIC CHANGES

CONCLUSIONS

REFERENCES

Introduction

Micro-organisms are responsible for most of the biogeochemical cycles that shape the environment of the Earth and its oceans. So far, only a fraction of these organisms has been well studied, mainly those living on land and those considered from an anthropogenic perspective, e.g. because they cause human diseases or provide useful products and services. Furthermore, the inability to grow many micro-organisms in pure culture has hampered the study and understanding of their metabolic processes.

Recent advances in molecular biology and -omics technologies offer new and exciting perspectives on the microbial world: they help to understand the relationships between biological communities and their functions, to identify the diversity of a population, and to monitor the effects of environmental factors on community structure.

In March 2003, J. Craig Venter and coworkers began to explore environmental bacteria in a culture-independent manner by isolating DNA directly from an environmental sample (the Sargasso Sea) and cloning it into large-insert clones for sequencing (1). The sequencing led to the identification of new species. Indeed, molecular approaches have significantly influenced our understanding of microbial diversity and ecology. In particular, ribosomal RNA (rRNA) gene sequence comparisons have provided a revolutionary approach for interpreting microbial evolutionary relationships (2). In a logical extension of this technique, the extraction of phylogenetically informative genes directly from naturally occurring micro-organisms represents another important development, opening up the natural microbial world to closer scrutiny (3, 4).

Further advances in genome sequencing have had, and are still having, a great impact on microbial biology, providing insights into, among other aspects, microbial diversity. So far, the full genome sequences of many micro-organisms, especially marine ones, have been completed and many others are underway. By combining cultivation-independent gene sequencing with genomic approaches (e.g. whole-genome shotgun sequencing, -omics technologies), it is now possible to investigate natural microbial communities more comprehensively. These techniques, first applied to marine plankton to characterize uncultivated marine bacterial and archaeal species (5), are now becoming a common way to characterize community structure.

Applications include the genome analysis of uncharacterized taxa (6, 7), the expression of novel genes and pathways from uncultured environmental micro-organisms, the elucidation of community-specific metabolism and the comparison of the gene contents of different communities.

While on the one hand we are discovering a huge microbial diversity, on the other hand there are increasing concerns about the loss of biodiversity, caused directly by exploitation, pollution and habitat destruction, or indirectly through climate change and related perturbations of ocean biogeochemistry (8-10). This review explores the development and application of genomics technologies for monitoring microbial biodiversity (microbial ecology). I will first introduce the concept of the microbial world and what we know so far, and then turn to the molecular biology tools that are transforming our view of microbial diversity (metagenomics, ecogenomics).

The second section is dedicated to the applicability of -omics technologies to the monitoring of microbial biodiversity in seas, oceans and freshwaters, illustrated with examples. This part is introduced by a description of the -omics technologies (DNA microarrays, MALDI-TOF mass spectrometry and single-cell genome sequencing) and of the available public databases.

Microbial World

The microbial world is a highly heterogeneous group distributed across four domains: Bacteria, Archaea, Eukaryota and viruses. Within the Eukaryota, the micro-organisms include protozoa, algae and fungi. Viruses are the most abundant entities on the planet. Although they are not organisms in the same sense as eukaryotes, archaea and bacteria, they are of considerable biological importance and, like bacteria, help to cycle organic matter in the marine food web.

In a more global view, three classes can be defined in the ecosystem: producers, decomposers and consumers. The producers are plants, algae, and autotrophic bacteria and archaea (e.g. cyanobacteria). They take up carbon from inorganic sources (carbon dioxide) and, together with water, convert it into organic compounds (photosynthesis); they are usually referred to as phytoplankton. The decomposers, in contrast, cannot acquire carbon from inorganic sources, but they can transform the organic matter supplied by the producers, either as exudates or as dead organic matter. They play a key role in the recycling of dissolved organic matter (DOM), either by transforming organic material into inorganic forms (mineralization, which yields nutrients for the producers) or by contributing to the food chain, since they are eaten by micro-flagellates and ciliates, which in turn are food for zooplankton and small fish (the first consumers). This process is known as the microbial loop (Fig. 1). The role of viruses in the microbial loop, especially that of bacteriophages, is still unclear. Bacteriophages attack bacteria, so they can influence the loop by releasing more dissolved organic matter and, at the same time, by altering the bacterial community; in this way they can affect the microbial loop as well as the carbon cycle.

[Figure 1: diagram of the microbial loop. Labels: Carbon (CO2) + nutrients; dissolved organic matter (DOM) ?; micro-flagellates; microbial loop; viruses; bacteria ?]

Fig. 1 The microbial loop. The carbon cycle and the energy flow are controlled by the microbial world, both through production (phytoplankton) and through the release of dissolved organic matter into the cycle (heterotrophic bacteria). The role of viruses and their influence are still unknown. They attack bacteria and unicellular algae, influencing the total biomass.


Molecular Biology Tools

The genomic age began in 1977, when a virus that infects Escherichia coli was sequenced. The development of molecular biology techniques then allowed part of a genome, or a single gene, to be cloned and further characterized. At the beginning of the 1990s, high-throughput DNA sequencing techniques and instruments boosted the sequencing of many other organisms, opening the era of genomics and related technologies. In this section I describe technologies that were known before the genomic era. They deserve a separate section because they laid the foundations of the new technologies, as recognised by the scientific community, which awarded their authors the Nobel Prize.

DNA sequencing

As mentioned above, the basic principle of sequencing was established in the 1970s by the work of Sanger (11) (Nobel Prize in 1980). It took almost twenty years to arrive at an efficient, automated system that uses fluorescent dyes to tag the dideoxyribonucleotides, with a different colour for each of the four nucleotides (12). Faster and cheaper sequencing methods and equipment are continuously being developed. For example, the recent pyrosequencing method uses a novel fibre-optic slide of individual wells and could sequence 25 million bases in one 4-hour run with an accuracy of 99.96% (13).

DNA Cloning System

Paul Berg created the first recombinant DNA molecules (ref. Berg, Science). While studying the actions of isolated genes, he devised methods for splitting DNA molecules at selected sites and attaching the resulting segments to the DNA of a virus or plasmid, which could then enter bacterial or animal cells. A large variety of cloning systems is now available to accommodate different types and sizes of DNA fragments. For example, vectors can be based on plasmids (optimal range of DNA fragments 0.5-10 kb), on part of a virus, e.g. a bacteriophage (7-20 kb), on cosmids or fosmids (35-45 kb), on bacterial artificial chromosomes (BAC, 80-200 kb) and on yeast artificial chromosomes (YAC, 200-1500 kb). The development of vectors such as cosmids, BACs and YACs accelerated the sequencing and analysis of genetic information directly from environmental samples, giving rise to the field of metagenomics (6, 14).
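The size ranges quoted above can be turned into a small lookup, sketched here in Python. The function name `suitable_vectors` and the dictionary layout are illustrative choices, not part of any published tool; the ranges are simply those given in the text.

```python
# Optimal insert-size ranges (in kb) as quoted in the text for each vector type.
VECTOR_RANGES_KB = {
    "plasmid": (0.5, 10),
    "bacteriophage": (7, 20),
    "cosmid/fosmid": (35, 45),
    "BAC": (80, 200),
    "YAC": (200, 1500),
}


def suitable_vectors(insert_kb):
    """Return the vector types whose quoted optimal range contains insert_kb."""
    return [
        name
        for name, (low, high) in VECTOR_RANGES_KB.items()
        if low <= insert_kb <= high
    ]


if __name__ == "__main__":
    for size in (8, 40, 150, 60):
        print(size, "kb ->", suitable_vectors(size) or "no quoted range fits")
```

Note that the quoted ranges leave gaps (for example between 45 and 80 kb) and overlap in places, so the lookup can return zero or several vector types.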

PCR and Real Time Quantitative-PCR

Kary B. Mullis (Nobel Prize, 1993) invented the polymerase chain reaction (PCR) technique. In the original PCR procedure, one problem was that the DNA polymerase had to be replenished after every cycle because it was not stable at the high temperatures needed for denaturation. This problem was solved in 1987 with the discovery of a heat-stable DNA polymerase called Taq, an enzyme isolated from the thermophilic bacterium Thermus aquaticus. Taq polymerase also led to the invention of the PCR machine (15). This tool allows very small amounts of DNA from laboratory and environmental (field) samples to be analysed. With appropriate primers it is possible to amplify a target gene, e.g. the small-subunit ribosomal RNA gene. The ribosomal gene (16S rDNA) is among the most conserved genes and is for this reason extensively used to identify an organism’s taxonomic group, determine related groups and estimate rates of species divergence (phylogenetic tree).
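As a minimal illustration of how conserved gene sequences are compared, the sketch below computes the percent identity between sequences that are assumed to be already aligned. The three short fragments are invented toy data, not real 16S rDNA sequences, and real phylogenetic analyses use full alignments and evolutionary models rather than raw identity.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Alignment columns in which either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical, pre-aligned toy fragments (not real 16S rDNA data).
fragments = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTTC",
    "isolate_3": "ACG-TCGAAC",
}

if __name__ == "__main__":
    for (name_a, seq_a), (name_b, seq_b) in combinations(fragments.items(), 2):
        print(f"{name_a} vs {name_b}: {percent_identity(seq_a, seq_b):.1f}%")
```

Pairs with higher identity would be placed closer together in a tree built from such distances.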

More recently, real-time PCR (RT-PCR), or quantitative real-time PCR (QRT-PCR), has made it possible to monitor the progress of the PCR in real time, increasing the sensitivity and allowing quantification starting from a very low copy number of the template (16). The most popular system, the TaqMan® assay, is shown in Figure 2. The system is based on the 5'-3' exonuclease activity of AmpliTaq Gold® polymerase, which cleaves the labelled probe bound to its specific target (17). The probe is designed to lie between the forward and reverse primers, with the fluorescent dye at the 5' end and the quencher at the 3' end. The PCR kinetics are designed so that, after melting (denaturation) of the DNA duplex, the probe hybridizes to its specific target at a higher temperature than the PCR primers. Extension by the Taq polymerase from the subsequently annealed forward primer is then blocked by the probe. The 5' end of the probe is displaced and the resulting 5' overhang is cleaved by the 5'-3' exonuclease activity of the polymerase. Consequently, the fluorescent dye is physically separated from the quencher, giving an increase in fluorescence that is directly proportional to the amount of amplified DNA. By recording the level of fluorescence emission at each amplification cycle, it is possible to monitor a reaction during its exponential phase, in which replicate samples amplify exponentially and the increase in the amount of PCR product correlates with the initial amount of target template.
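The link between the exponential phase and the initial template amount is commonly exploited through a standard curve, in which the threshold cycle (Ct) of known dilutions is regressed on the logarithm of their copy number. The sketch below illustrates this under stated assumptions: the text does not describe a specific quantification procedure, the dilution series is synthetic (perfect doubling per cycle), and all function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def standard_curve(copies, ct_values):
    """Fit Ct against log10(copy number); return slope, intercept, efficiency.

    Efficiency is 1.0 when the product exactly doubles in every cycle.
    """
    slope, intercept = fit_line([math.log10(c) for c in copies], ct_values)
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, efficiency


def copies_from_ct(ct, slope, intercept):
    """Estimate the initial copy number of an unknown from its Ct."""
    return 10 ** ((ct - intercept) / slope)


# Synthetic dilution series: an assumed single copy crosses the threshold at
# cycle 40 and the product doubles every cycle (illustrative values only).
standard_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
standard_ct = [40.0 - math.log2(c) for c in standard_copies]

if __name__ == "__main__":
    slope, intercept, efficiency = standard_curve(standard_copies, standard_ct)
    print(f"slope {slope:.3f}, intercept {intercept:.2f}, efficiency {efficiency:.2%}")
    unknown_ct = 40.0 - math.log2(5000)
    print(f"estimated copies: {copies_from_ct(unknown_ct, slope, intercept):.0f}")
```

With perfect doubling, each tenfold dilution shifts Ct by about 3.32 cycles; real assays usually deviate from this, which is why the efficiency is estimated from the fitted slope.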


Fig. 2 TaqMan assay. The RT-PCR reaction exploits the 5' nuclease activity of AmpliTaq Gold DNA polymerase to cleave a TaqMan probe during the PCR. The TaqMan probe contains a reporter dye (R) at the 5' end and a quencher dye (Q) at the 3' end. During the first step, the primers and the probe anneal to the specific target and polymerization starts. During extension, the probe is displaced and cleaved by the 5'-3' exonuclease activity. Cleavage of the probe separates the reporter dye from the quencher, resulting in increased fluorescence of the reporter. Accumulation of the PCR products is detected directly by monitoring the increase in fluorescence of the reporter. When the probe is intact, the proximity of the reporter dye to the quencher suppresses the fluorescence emission, primarily by Förster-type energy transfer.

Hybridization techniques: FISH, FISH-microautoradiography and Stable Isotope Probing

Fluorescent in situ hybridization (FISH) was developed by Amann (18) and is used to detect uncultivable bacteria. Using fluorescently tagged specific probes (e.g. targeting 16S rRNA), FISH makes it possible to observe micro-organisms directly and to quantify specific species, genera, families or phyla in a given environmental sample (19).

Recently, Kalyuzhnaya et al. (20) combined FISH with flow cytometry and cell sorting (FACS). The method was used for the detection and enrichment of two types of methanotroph populations (type I and II) from Lake Washington sediments.

FISH has also been combined with specific radiolabelled substrates to link identification with metabolic analysis (21). This technique involves a short incubation of an environmental sample with a 14C-labelled substrate, after which the sample is fixed, thin-sectioned and analysed by FISH (22).

Another interesting development is stable isotope probing (SIP), which involves exposing an environmental sample to a substrate enriched in a stable isotope (e.g. 15N) and subsequently analysing the labelled biomarkers. This technique has been improved, especially in recent years, in an attempt to link the identity of micro-organisms to their function. An exhaustive review has recently been published by Dumont (23).


Immunofluorescence

This technique is also well known in molecular biology. In microbiology, however, it has not been used frequently so far, because the genome sequences, and hence the protein sequences, needed to develop antibodies were not always available.

The post-genomic period is definitely filling this gap: many protein sequences have been identified, in some cases even micro-organism-specific ones, making the technique a valuable tool for the screening of environmental samples.

The immunofluorescence technique employs two sets of antibodies. A primary antibody is used against the antigen of interest. Subsequently, a secondary, dye-coupled antibody that recognizes the primary antibody is used. In this way, the researcher may create several primary antibodies that recognize various antigens but that, because they all share a common constant region, can be recognized by a single dye-coupled antibody. Typically this is done by using antibodies raised in different species (Figure 3).

Immunofluorescently labelled samples can be analysed with a fluorescence microscope or by confocal microscopy. P. Tuomi et al. (24) used polyclonal antibodies to analyse nine different bacteria from Lake Saelenvannet in western Norway and to monitor their population dynamics through two spring seasons.


[Figure 3: schematic of immunofluorescence staining. Labels: micro-organism; specific (primary) antibody; secondary antibody.]

Fig. 3 Simple scheme of immunofluorescence staining. Two antibodies are used. The primary antibody is specific for the target micro-organism (e.g. against a membrane receptor). The secondary antibody is dye-coupled and is usually raised against the common (constant) region of the primary antibody, which is shared by all primary antibodies raised in the same host species.

From First Microbial Genomics to Microbial Community Genomics, Metagenomics

It is now a decade since the first microbial genome was sequenced. Although genomics is still in its infancy and the best is still to come, amazing strides have been made since the completion in 1995 of the first genome sequence of a free-living organism, the bacterium Haemophilus influenzae (25). Just ten years later, 261 microbial genomes have been completed and a further 669 are in progress. Most recently, the merging of cultivation-independent gene sequencing with contemporary genomic approaches such as “whole-genome shotgun sequencing” (1) is providing a more comprehensive picture of the structure and function of indigenous microbial communities.

The environmental sample is gently lysed to disrupt the cell membranes, and the extracted genomic DNA is then cloned into fosmids and BACs (26, 27), which, as previously mentioned, can accommodate large inserts (up to 200 kb). Several studies have already used large-insert DNA. They led to the discovery of an unexpected mechanism of light-driven energy generation in the ocean (28), a massive survey of the gene complement of Sargasso Sea micro-organisms (1), and the characterization of biochemical pathways that can differentiate microbial species living in different habitats (29).

One of the most remarkable applications of metagenomics is the Sargasso Sea study. It produced 1,987,936 DNA sequence reads, for a total of 1,625 Mb of DNA. The gene complement of the assembled Sargasso Sea database revealed a total of about 1,800 species, including 148 previously unknown bacterial phylogenetic types. The analysis also identified spatial variation in species richness and relative abundance among the four sampled sites (1). Another genomic analysis of natural microbial communities led to the discovery of rhodopsin in bacteria, a domain of life not expected to contain these photoproteins (6). The authors could demonstrate and test the activity of these proteins in vitro and in vivo by heterologous expression in E. coli. Subsequent field studies showed that these bacteria are widespread in the marine environment.
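As a quick check on the scale of the Sargasso Sea data set, the figures quoted above imply an average read length of roughly 800 bases. The short calculation below uses only the numbers given in the text and assumes that 1 Mb means 10^6 bases; the species count is stated only approximately in the text, so the derived fraction is approximate too.

```python
# Figures quoted in the text for the Sargasso Sea data set.
TOTAL_READS = 1_987_936
TOTAL_BASES = 1_625 * 10**6      # 1,625 Mb, assuming 1 Mb = 10**6 bases
ESTIMATED_SPECIES = 1_800        # "about 1,800" in the text
NEW_PHYLOTYPES = 148

mean_read_length = TOTAL_BASES / TOTAL_READS
new_fraction = NEW_PHYLOTYPES / ESTIMATED_SPECIES

if __name__ == "__main__":
    print(f"mean read length: {mean_read_length:.0f} bases")
    print(f"previously unknown phylotypes: about {new_fraction:.0%} of the estimated species")
```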


Genomics Tools

DNA Microarray technology

DNA microarrays have revolutionized biological research. With the help of a microarray, researchers can query the whole genome at once, rather than just a few genes at a time. Experiments that used to be impossible are now being performed in days or hours. "By being able to see the big picture, all the genes, all the genetic variation, we can readily pick out answers--we can make discoveries that we could never make before," Eric Lander (International Human Genome Sequencing Consortium).

The field of DNA microarrays has evolved from the key insight of Ed Southern (30), who showed twenty-five years ago that labelled nucleic acid molecules could be used to interrogate nucleic acid molecules attached to a solid support. The resulting Southern blot can be viewed as the first DNA array (31). Subsequently, the explosion of array technologies was sparked by two key innovations. The first was the use of non-porous solid supports, such as glass, which facilitated the miniaturization of the array and the development of fluorescence-hybridization detection (32-34). The second was the development of methods for high-density spatial synthesis of oligonucleotides, which allows the analysis of thousands of genes at the same time. Recently, a significant technical achievement was the production of arrays with more than 250,000 oligonucleotide probes or 10,000 different cDNAs per square centimeter (35). DNA microarrays are fabricated by high-speed robots, generally on glass. The applications of this technology are extensively described in the literature, e.g. in toxicology, ecotoxicology (36), drug discovery and tumor studies.

In microbiology, many microarrays are commercially available for pathogen detection and their applications are well documented, e.g. for E. coli (37), Salmonella (Garizar et al., 2002) and Helicobacter pylori (38). Recently, several types of microarray have been developed for the detection of bacteria and the quantitative assessment of their community structures (39). These arrays include i) phylogenetic oligonucleotide arrays, which contain highly specific sequences from the rRNA genes of specific groups of microbes (40) (Fig. 4); ii) community genome arrays, which contain highly specific genes or parts of genes from known cultured microbial species; and iii) functional gene arrays, which contain conserved domains of genes involved in specific metabolic pathways such as the biogeochemical cycling of carbon, nitrogen, sulphate, phosphate and metals. Other applications are discussed further in the section on microbial communities and dynamic changes.

[Figure 4: workflow of a DNA microarray experiment. Labels: probes (A, B, C, D); targets (A', B', C', D'), i.e. mRNA or DNA converted to labelled cDNA; co-hybridization; scanning; image; data analysis.]

Fig. 4 DNA microarray analysis. The chip is spotted with fragments of genes or genomes (probes, in fixed and annotated positions) and hybridized with the complementary sequences (targets: labelled mRNA or genomic DNA) extracted from the sample (a mixed or pure culture of micro-organisms). The signal is visualized by scanning the chip, and the resulting image data are then analysed.
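The data analysis step at the end of this workflow often starts from per-probe signal ratios. The sketch below shows one common, simple approach for a two-channel (co-hybridized) array: log2 ratios followed by median centring to remove an overall dye or loading bias. The source does not specify an analysis method, so this choice, the function names and the intensity values are all illustrative assumptions.

```python
import math
from statistics import median


def log2_ratios(sample_signal, reference_signal, floor=1.0):
    """Per-probe log2(sample / reference); signals below `floor` are clipped."""
    ratios = {}
    for probe, sample_value in sample_signal.items():
        s = max(sample_value, floor)
        r = max(reference_signal[probe], floor)
        ratios[probe] = math.log2(s / r)
    return ratios


def median_center(ratios):
    """Subtract the median log ratio so that the typical probe sits at zero."""
    centre = median(ratios.values())
    return {probe: value - centre for probe, value in ratios.items()}


# Hypothetical background-corrected intensities for probes A-D.
sample = {"A": 800.0, "B": 400.0, "C": 400.0, "D": 100.0}
reference = {"A": 200.0, "B": 200.0, "C": 200.0, "D": 200.0}

if __name__ == "__main__":
    centred = median_center(log2_ratios(sample, reference))
    for probe, value in centred.items():
        print(f"probe {probe}: centred log2 ratio {value:+.1f}")
```

In this toy example the sample channel is uniformly twice as bright as the reference, and centring removes that offset so that only probes A and D stand out.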

Mass spectrometry – based microorganism identification

Mass spectrometry (MS) can be used to identify individual proteins present in the sample under analysis. A mass spectrometer discriminates between chemical compounds by separating molecular ions according to their mass-to-charge (m/z) ratio, allowing the determination of molecular mass. The key improvement that has allowed the use of MS for the identification of proteins has been the development of “soft” ionization methods such as matrix-assisted laser desorption ionization (MALDI) and electrospray ionization (ESI). These two methods allow the ionization of large biomolecules such as proteins with very little or no fragmentation (hence the term soft ionization). A further key element of the technique is the availability of a large number of fully sequenced genomes, which makes peptide mass fingerprinting possible (41, 42). In this method, the protein to be identified is subjected to proteolytic digestion with a protease (usually trypsin). The masses of the resulting peptides are measured by mass spectrometry to give an experimental peptide fingerprint. This experimental fingerprint is then compared with all the predicted peptide fingerprints of that organism available in the complete proteome database. The protein with the most theoretical peptide fragments in agreement with the experimental data is identified as the best match.
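The matching procedure just described can be sketched in a few lines. The example below performs an in silico tryptic digestion, computes monoisotopic peptide masses and counts how many observed masses agree with each candidate protein. It is a simplified illustration under several assumptions: trypsin is modelled as cleaving after K or R unless the next residue is P, no missed cleavages or modifications are considered, the mass tolerance is arbitrary, and the two protein sequences are invented.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(protein):
    """Cleave after K or R, except when the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def count_matches(observed, theoretical, tolerance=0.2):
    """Number of observed masses within `tolerance` Da of a theoretical mass."""
    return sum(any(abs(o - t) <= tolerance for t in theoretical) for o in observed)


def best_match(observed, database, tolerance=0.2):
    """Return the best-scoring protein name and the score of every candidate."""
    scores = {}
    for name, sequence in database.items():
        masses = [peptide_mass(p) for p in tryptic_peptides(sequence)]
        scores[name] = count_matches(observed, masses, tolerance)
    return max(scores, key=scores.get), scores


# Invented toy proteome and a simulated experimental fingerprint.
toy_database = {"protein_A": "MKTAYIAKQRQISFVK", "protein_B": "GAVLKDEFRWPK"}
observed_masses = [peptide_mass(p) + 0.05 for p in tryptic_peptides(toy_database["protein_A"])]

if __name__ == "__main__":
    winner, scores = best_match(observed_masses, toy_database)
    print("scores:", scores)
    print("best match:", winner)
```

Real search engines additionally weight matches statistically and allow for missed cleavages, which this sketch deliberately omits.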

The possibility of identifying a large number of the proteins present at a given time in an organism or tissue has given rise to the field of proteomics. Early applications of proteomics focused on the differences, at the protein level, between normal and cancerous cells. For example, one study compared the proteomes of normal human luminal and myoepithelial breast cells using immunopurified cell populations. It detected 170 spots with a twofold difference in expression levels (43) and identified 52 of them.

This technique has also been applied to environmental problems. Several studies have attempted to identify “protein expression signatures” in response to environmental stressors. Bradley et al. analysed the response, at the protein level, of marine invertebrates to salinity and temperature, and to chemicals such as polychlorinated biphenyls and copper (44-46).

Advances in MS technology and the advent of time-of-flight (TOF) detectors coupled with MALDI have allowed the identification of intact micro-organisms by MALDI-TOF MS fingerprinting (47). The identification of micro-organisms by protein profiling follows a simple protocol: starting from a single colony or other biological material, samples can be analysed in a few minutes. Automated spectrum acquisition is completed in a few seconds per sample, and the spectra are then analysed with the appropriate identification software. The identification is based on the measurement of constantly expressed, highly abundant proteins, such as ribosomal proteins (48). MALDI-TOF MS spectra are usually measured in the range 2000-20000 Da, where very few metabolites appear, and it has been shown that the culture medium has little effect on the peak pattern. The growth state of the cells also has little effect on the peak pattern of the measured proteins (49). Sample preparation and measurement use simple, standard protocols, so the acquired profile spectra are very reproducible, even across different MALDI-TOF instruments.

This method is fast, robust and has a low cost per sample; it is thus well suited to environmental research and to the study of dynamic changes of biodiversity in microbial communities (50).
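To make the idea of fingerprint-based identification concrete, the sketch below compares the peak list of an unknown sample with reference peak lists and ranks the references by the fraction of shared peaks. This is only an illustration: the similarity measure (shared peaks divided by the union of peaks), the 500 ppm tolerance and all peak values are assumptions, and commercial identification software uses its own scoring schemes.

```python
MASS_RANGE_DA = (2000.0, 20000.0)  # measurement window quoted in the text


def peaks_in_range(peaks):
    """Keep only peaks inside the measurement window, sorted by mass."""
    low, high = MASS_RANGE_DA
    return sorted(m for m in peaks if low <= m <= high)


def shared_peaks(a, b, ppm):
    """Count peaks of sorted list `a` matched one-to-one in sorted list `b`."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        tolerance = a[i] * ppm / 1e6
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def similarity(peaks_a, peaks_b, ppm=500):
    """Shared peaks divided by the total number of distinct peaks (0 to 1)."""
    a, b = peaks_in_range(peaks_a), peaks_in_range(peaks_b)
    if not a or not b:
        return 0.0
    shared = shared_peaks(a, b, ppm)
    return shared / (len(a) + len(b) - shared)


def identify(unknown, library, ppm=500):
    """Return the best-scoring reference name and all similarity scores."""
    scores = {name: similarity(unknown, ref, ppm) for name, ref in library.items()}
    return max(scores, key=scores.get), scores


# Hypothetical reference fingerprints and an unknown spectrum (masses in Da).
reference_library = {
    "species_X": [3125.0, 4365.2, 5096.8, 6255.4, 7274.0, 9536.1],
    "species_Y": [2890.3, 4512.7, 5381.0, 6410.9, 8369.5, 10300.2],
}
unknown_spectrum = [1500.0, 3125.6, 4364.9, 5097.5, 6255.0, 7273.1, 9740.0]

if __name__ == "__main__":
    best, scores = identify(unknown_spectrum, reference_library)
    print("scores:", {name: round(score, 3) for name, score in scores.items()})
    print("best match:", best)
```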

Single cell genomic sequencing

This method was published by Zhang in Nature (51). It is a promising tool that will further transform the concept of microbial diversity and, more generally, of biodiversity. The authors have set up a technique to sequence the genome of a single cell.

The metagenomic approach, although it has revealed an enormous biodiversity in environmental samples, faces some obstacles, such as the difficulty of assembling contigs into discrete genomes and the sampling bias toward abundant species (52). The sequence from one uncultured cell could improve our knowledge of the genetic heterogeneity in a population and help to understand its effect on biodiversity in terms of i) interactions between host and parasitic genomes, ii) cell-cell interactions (e.g. predator-prey, symbionts, commensals) and iii) the recovery of rare species from the environmental genomic sample.

The method combines an amplification step monitored in real time, which the authors called “real-time ultra-low isothermal amplification”, with a procedure to ensure that the amplification starts from a single cell (polymerase cloning, or “ploning”).

The first step adapts the real-time monitoring used in real-time PCR to minimize the effects of exogenous DNA contamination and of the endogenous background caused by unspecific annealing of the primers. To achieve this, they used SYBR green fluorescence to monitor the isothermal amplification (to detect contamination from exogenous DNA (53)) and designed hexanucleotide primers that cannot hybridize to one another.

To plone a single cell, they developed a procedure to assess whether an amplicon (amplified target) is truly amplified from a single cell. They first set up the conditions in the laboratory using an artificial mixture of four genetically distinguishable E. coli strains, starting from serially diluted populations until they were sure that the amplification was obtained from only one cell. They also set up the method for Prochlorococcus marinus, one of the most abundant bacterial lineages in the ocean (54), and then applied it to a Pacific Ocean sample. They successfully ploned two single Prochlorococcus cells, which were confirmed to be closely related to the sequenced strain MIT9312 (54).
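Why serial dilution helps to reach single-cell amplification can be illustrated with a simple statistical model. The sketch below assumes that the number of cells per aliquot follows a Poisson distribution, which is a standard assumption for limiting dilution but is not stated in the source; the authors' own verification relied on genetically distinguishable strains. Function names are hypothetical.

```python
import math


def p_cells(k, mean_cells):
    """Poisson probability that an aliquot contains exactly k cells."""
    return math.exp(-mean_cells) * mean_cells ** k / math.factorial(k)


def p_single_given_occupied(mean_cells):
    """Probability that a non-empty aliquot contains exactly one cell."""
    p_empty = p_cells(0, mean_cells)
    return p_cells(1, mean_cells) / (1.0 - p_empty)


if __name__ == "__main__":
    for mean in (3.0, 1.0, 0.3, 0.1):
        print(
            f"mean {mean:>4} cells/aliquot: "
            f"empty {p_cells(0, mean):.2f}, "
            f"single among occupied {p_single_given_occupied(mean):.2f}"
        )
```

Under this model, diluting further raises the chance that an amplifying aliquot started from one cell, at the cost of more empty aliquots.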

Public databases for microbial genomes

Table 1 lists the main databases currently available for microbial genomics. The Institute for Genomic Research (TIGR), founded by J. Craig Venter, launched the genome era with its landmark publication of the first full DNA sequence of a free-living organism, the bacterium Haemophilus influenzae. The database provides DNA, protein and taxonomic data for microbes, plants and humans. In addition, a free website can be used to display information on all of the publicly available, complete prokaryotic genomes.

The Joint Genome Institute (JGI) has released 119 finished microbial genomes, including eukaryotic microbial sites and metagenomic analyses. The database also includes annotation, alignment and protein translation. Since March 2006, the Integrated Microbial Genomes (IMG) system has provided a framework for comparative analysis of the genomes sequenced by the Joint Genome Institute. Its goal is to facilitate the visualization and exploration of genomes from a functional and evolutionary perspective.

NCBI (National Center for Biotechnology Information) created public databases such as GenBank, an annotated collection of all publicly available DNA sequences, from bacteria to human.

The Ribosomal Database Project (55) provides ribosome-related data (more than 120,000 bacterial 16S sequences) and services such as online data analysis, rRNA-derived phylogenetic trees, and aligned and annotated rRNA sequences.

The Genome Information Broker (GIB) is a comprehensive data repository of complete microbial genomes in the public domain. GIB distributes the genome sequence data and annotation. It is possible to explore any microbial genome by clone name, ORF name/number, function, gene name, product name, location, sequence (namely, homology search) and other features. Both fragments and whole genomes can be downloaded.

The EMBL (European Molecular Biology Laboratory) Nucleotide Sequence Database is the major European database. It contains DNA sequences from any species.

Despite the many databases available, other informatics challenges need to be considered, as the field of environmental genomics is developing so rapidly. Management, annotation and archiving of data sets are ongoing processes, and many, but not all, of the mechanisms needed to handle the huge datasets now emerging are available. One issue of concern is the establishment of standards and requirements for environmental genomic metadata. Indeed, all this information will be needed to continue mining complex community datasets, for instance to understand how community gene content maps onto taxonomic composition, metabolic repertoire and phenotypic expression. The databases will play a key role in collecting this information, which includes not only the appropriately organized community structure but also the sampling location, the physical and chemical parameters, and standardized sampling and sequencing protocols (27).
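The kind of metadata listed above can be pictured as a structured record attached to each environmental data set. The sketch below is purely illustrative: the class `SampleMetadata`, its field names and the example values are hypothetical and do not correspond to any existing metadata standard.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SampleMetadata:
    """Hypothetical minimal metadata record for an environmental genomic sample."""

    sample_id: str
    latitude_deg: float
    longitude_deg: float
    collection_date: str          # ISO 8601 date, e.g. "2006-05-17"
    sampling_protocol: str
    sequencing_protocol: str
    physical_parameters: dict = field(default_factory=dict)
    chemical_parameters: dict = field(default_factory=dict)

    def problems(self):
        """Return a list of human-readable problems; empty if the record looks sane."""
        issues = []
        if not -90.0 <= self.latitude_deg <= 90.0:
            issues.append("latitude out of range")
        if not -180.0 <= self.longitude_deg <= 180.0:
            issues.append("longitude out of range")
        for name in ("sample_id", "collection_date",
                     "sampling_protocol", "sequencing_protocol"):
            if not getattr(self, name):
                issues.append(f"{name} is empty")
        return issues


# Invented example values, for illustration only.
record = SampleMetadata(
    sample_id="example-001",
    latitude_deg=45.8,
    longitude_deg=8.6,
    collection_date="2006-05-17",
    sampling_protocol="surface water, 0.2 um filtration",
    sequencing_protocol="whole-genome shotgun",
    physical_parameters={"temperature_c": 14.2, "depth_m": 1.0},
    chemical_parameters={"nitrate_umol_l": 3.1},
)

if __name__ == "__main__":
    print(record.problems() or "record passes basic checks")
    print(json.dumps(asdict(record), indent=2))
```

Storing such records in a plain, serializable form is one simple way to keep sampling context attached to sequence data; agreed field names and units would be the task of the standards mentioned in the text.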

Table 1: List of cited databases and repository services.

Acronym     Full name                                               Web site
TIGR        The Institute for Genomic Research                      http://www.tigr.org/
JGI         Joint Genome Institute                                  http://genome.jgpsf.org/mic_home.html
IMG         Integrated Microbial Genomes                            http://img.jgi.doe.gov/cgibin/pub/main.cgi
RDP         Ribosomal Database Project                              http://rdp.cme.msu.edu/
NCBI        National Center for Biotechnology Information Genome    www.ncbi.nlm.nih.gov/Genomes/index.html
GIB         Genome Information Broker                               http://gib.genes.nig.ac.jp/
EMBL-Bank   EMBL Nucleotide Sequence Database                       http://www.ebi.ac.uk/embl/

Microbial communities: dynamic changes and genomics techniques

The complete genome sequences of micro-organisms, combined with the microbial genomes recovered from environmental samples (metagenomics), make it possible to take up the challenge of exploring the nature of microbial communities, monitoring them over time and understanding the dynamic changes caused by various forces (such as climate change, nutrient concentrations, habitat destruction, etc.).

Dennis et al. (39) designed a DNA microarray containing probes for 64 known bacterial metabolic and catabolic genes to monitor gene expression changes in a complex environment. They analysed pulp and paper wastewater treatment systems, showing that they could predict the structure of the microbial community in the sample. The betaproteobacterial order "Rhodocyclales" was identified in environmental samples with a DNA microarray based on 16S rRNA. The authors were able to detect specific bacteria in an environmental sample even when they made up less than 1% of the whole population (56).

Recently, a DNA microarray platform for the characterization of bacterial communities in freshwater sediments was described by Peplies et al. (57). They used 70 probes targeting heterogeneous 16S rRNA sequences, which were first tested against pure bacterial cultures to measure the rate of false-negative signals (approximately 5%), and then against samples from four German river sites. The results were also confirmed by an additional technique, FISH (fluorescence in situ hybridization).
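The false-negative rate quoted above can be thought of as the fraction of probe/reference-culture tests in which a signal was expected but not detected. A minimal sketch of that calculation follows; the function name, the input layout and the toy data are assumptions made for illustration and are not taken from the cited study.

```python
from typing import Iterable, Tuple


def false_negative_rate(tests: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of expected signals that were missed.

    Each test is a pair (signal_expected, signal_detected).
    Tests where no signal is expected do not enter the calculation.
    """
    expected = 0
    missed = 0
    for signal_expected, signal_detected in tests:
        if signal_expected:
            expected += 1
            if not signal_detected:
                missed += 1
    if expected == 0:
        raise ValueError("no tests with an expected signal")
    return missed / expected


# Toy data: 20 tests where a signal is expected, one of which fails,
# plus 5 negative controls that are ignored by the calculation.
toy_tests = [(True, True)] * 19 + [(True, False)] + [(False, False)] * 5
rate = false_negative_rate(toy_tests)
print(f"false-negative rate: {rate:.0%}")
```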

Despite these examples and the others mentioned above in the DNA microarray section, significant challenges remain with regard to specificity and sensitivity, especially because microbial communities contain highly heterogeneous groups of organisms. However, in the future the availability of more genomic information, together with single-cell genome sequencing, will make it possible to design specific probes that avoid cross-hybridization between strains within a species.

Urakawa et al. (58) characterized microbial communities to evaluate the diversity and changes of ammonia-oxidizing bacteria (AOB), using the Real Time PCR technique and immunofluorescence staining. They collected five surface marine sediment samples in Tokyo Bay, a eutrophic bay. They could identify and quantify the two major groups present in the samples, a result that was also confirmed by immunofluorescence staining. The results revealed that the spatial distribution of AOB and the shifts in their population structure are closely correlated with nutrient and organic inputs from river run-off and phytoplankton blooms.
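Quantification by Real Time PCR is commonly based on a standard curve: the threshold cycle (Ct) of a dilution series with known template copy numbers is fitted as a linear function of log10(copies), and unknown samples are then read off the fitted line. The sketch below illustrates that general calculation only; it is not the protocol of the cited study, and the function names and the dilution-series values are invented for the example.

```python
from typing import Sequence, Tuple


def fit_standard_curve(log10_copies: Sequence[float],
                       ct_values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    if n < 2 or n != len(ct_values):
        raise ValueError("need at least two matched standards")
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope: float) -> float:
    """Efficiency implied by the slope; 1.0 means doubling every cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Invented ten-fold dilution series (10^3 to 10^7 copies per reaction).
standards_log10 = [3.0, 4.0, 5.0, 6.0, 7.0]
standards_ct = [30.00, 26.68, 23.36, 20.04, 16.72]

slope, intercept = fit_standard_curve(standards_log10, standards_ct)
unknown_copies = copies_from_ct(25.0, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"efficiency={amplification_efficiency(slope):.3f}")
print(f"estimated copies at Ct 25.0: {unknown_copies:.3g}")
```

A slope close to -3.32 corresponds to an efficiency close to 1.0, which is one reason the slope of the standard curve is usually reported alongside the quantities derived from it.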

Another application of Real Time PCR is the detection of toxin genes associated with cyanobacterial communities in Lake Erie (59). During algal blooms, the lake is dominated by Microcystis, which is responsible for toxin production.

The authors used the technique to detect the genes mcyD and mcyB, which are involved in the biosynthesis of the toxin microcystin, as well as 16S rDNA fragments specific either to Microcystis species or to all bacteria. They tested 21 samples from 13 field stations and identified the presence of Microcystis. Furthermore, they generated a ribosomal library from seven stations to characterise the microbial communities, showing that the dominant sequences belonged to the Synechococcus and Cyanobium phylogenetic cluster. Some years earlier, McElhiney and co-authors (60) isolated a single-chain antibody against the toxin microcystin for rapid detection of the toxin in water. Interestingly, they immobilized the antibody on a column and were able to concentrate trace levels of microcystin from large volumes, showing high sensitivity and specificity.

Conclusions

The first decade of the genomics era has revolutionized our understanding of microbiology, and it is likely that this process will accelerate as new technologies are developed that allow even more rapid generation of genomic data, which in turn will open more avenues of research. However, at present the field is still taking snapshots, not yet making movies. The challenge of the next decade will be to string all the pictures together in order to really appreciate the complexity and the dynamic nature of the exchanges that are taking place in the microbial world, and their functional implications. To achieve this goal, an interdisciplinary systems approach is needed. The American Society for Microbiology has issued a call to create systems microbiology and systems microbial ecology (Buckley MR (2005): Systems Microbiology: Beyond Microbial Genomics. ASM Report). Systems microbiology can be considered a subset of the newly emerging field of systems biology (61).

Systems microbiology seeks to treat an organism or community as a whole, integrating fundamental biological knowledge (at the molecular, cellular, population, community and ecosystem levels) with genomics, transcriptomics (gene expression level), proteomics (protein level) and metabolomics (metabolite level).

Ultimately, this discipline will require the contribution of many scientists, such as microbiologists, molecular biologists, chemists, biochemists, bioinformaticians, computational scientists and mathematicians, and only collaboration among them will move the field forward. Systems microbiology and systems ecology promise not only to shed light on the activities of microbes, but also to provide tools and approaches to better understand the complexity and the biodiversity of the ecosystem.

References

1. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers YH, Smith HO. Environmental genome shotgun sequencing of the Sargasso Sea. Science 304:66-74 (2004).

2. Woese CR. Bacterial evolution. Microbiol Rev 51:221-71 (1987).

3. Giovannoni SJ, Rappe MS, Vergin KL, Adair NL. 16S rRNA genes reveal stratified open ocean bacterioplankton populations related to the Green Non-Sulfur bacteria. Proc Natl Acad Sci U S A 93:7979-84 (1996).

4. Pace NR. A molecular view of microbial diversity and the biosphere. Science 276:734-40 (1997).

5. Stein JL, Marsh TL, Wu KY, Shizuya H, DeLong EF. Characterization of uncultivated prokaryotes: isolation and analysis of a 40-kilobase-pair genome fragment from a planktonic marine archaeon. J Bacteriol 178:591-9 (1996).

6. Beja O, Aravind L, Koonin EV, Suzuki MT, Hadd A, Nguyen LP, Jovanovich SB, Gates CM, Feldman RA, Spudich JL, Spudich EN, DeLong EF. Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289:1902-6 (2000).

7. de la Torre JR, Christianson LM, Beja O, Suzuki MT, Karl DM, Heidelberg J, DeLong EF. Proteorhodopsin genes are distributed among divergent marine bacterial taxa. Proc Natl Acad Sci U S A 100:12830-5 (2003).

8. Lotze HK, Lenihan HS, Bourque BJ, Bradbury RH, Cooke RG, Kay MC, Kidwell SM, Kirby MX, Peterson CH, Jackson JB. Depletion, degradation, and recovery potential of estuaries and coastal seas. Science 312:1806-9 (2006).

9. Pandolfi JM, Bradbury RH, Sala E, Hughes TP, Bjorndal KA, Cooke RG, McArdle D, McClenachan L, Newman MJ, Paredes G, Warner RR, Jackson JB. Global trajectories of the long-term decline of coral reef ecosystems. Science 301:955-8 (2003).

10. Worm B, Barbier EB, Beaumont N, Duffy JE, Folke C, Halpern BS, Jackson JB, Lotze HK, Micheli F, Palumbi SR, Sala E, Selkoe KA, Stachowicz JJ, Watson R. Impacts of biodiversity loss on ocean ecosystem services. Science 314:787-90 (2006).

11. Sanger F, Air GM, Barrell BG, Brown NL, Coulson AR, Fiddes CA, Hutchison CA, Slocombe PM, Smith M. Nucleotide sequence of bacteriophage phi X174 DNA. Nature 265:687-95 (1977).

12. Prober JM, Trainor GL, Dam RJ, Hobbs FW, Robertson CW, Zagursky RJ, Cocuzza AJ, Jensen MA, Baumeister K. A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides. Science 238:336-41 (1987).

13. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376-80 (2005).

14. Beja O, Suzuki MT, Koonin EV, Aravind L, Hadd A, Nguyen LP, Villacorta R, Amjadi M, Garrigues C, Jovanovich SB, Feldman RA, DeLong EF. Construction and analysis of bacterial artificial chromosome libraries from a marine microbial assemblage. Environ Microbiol 2:516-29 (2000).

15. Saiki RK, Gelfand DH, Stoffel S, Scharf SJ, Higuchi R, Horn GT, Mullis KB, Erlich HA. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239:487-91 (1988).

16. Heid CA, Stevens J, Livak KJ, Williams PM. Real time quantitative PCR. Genome Res 6:986-94 (1996).

17. Holland PM, Abramson RD, Watson R, Gelfand DH. Detection of specific polymerase chain reaction product by utilizing the 5'→3' exonuclease activity of Thermus aquaticus DNA polymerase. Proc Natl Acad Sci U S A 88:7276-80 (1991).

18. Fuchs BM, Zubkov MV, Sahm K, Burkill PH, Amann R. Changes in community composition during dilution cultures of marine bacterioplankton as assessed by flow cytometric and molecular biological techniques. Environ Microbiol 2:191-201 (2000).

19. Bathe S, Hausner M. Design and evaluation of 16S rRNA sequence based oligonucleotide probes for the detection and quantification of Comamonas testosteroni in mixed microbial communities. BMC Microbiol 6:54 (2006).

20. Kalyuzhnaya MG, Zabinsky R, Bowerman S, Baker DR, Lidstrom ME, Chistoserdova L. Fluorescence in situ hybridization-flow cytometry-cell sorting-based method for separation and enrichment of type I and type II methanotroph populations. Appl Environ Microbiol 72:4293-301 (2006).

21. Lee N, Nielsen PH, Andreasen KH, Juretschko S, Nielsen JL, Schleifer KH, Wagner M. Combination of fluorescent in situ hybridization and microautoradiography-a new tool for structure-function analyses in microbial ecology. Appl Environ Microbiol 65:1289-97 (1999).

22. Daims H, Ramsing NB, Schleifer KH, Wagner M. Cultivation-independent, semiautomatic determination of absolute bacterial cell numbers in environmental samples by fluorescence in situ hybridization. Appl Environ Microbiol 67:5810-8 (2001).

23. Dumont MG, Murrell JC. Stable isotope probing - linking microbial identity to function. Nat Rev Microbiol 3:499-504 (2005).

24. Tuomi P, Torsvik T, Heldal M, Bratbak G. Bacterial population dynamics in a meromictic lake. Appl Environ Microbiol 63:2181-8 (1997).

25. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496-512 (1995).

26. Sabehi G, Beja O, Suzuki MT, Preston CM, DeLong EF. Different SAR86 subgroups harbour divergent proteorhodopsins. Environ Microbiol 6:903-10 (2004).

27. DeLong EF. Microbial community genomics in the ocean. Nat Rev Microbiol 3:459-69 (2005).

28. Oz A, Sabehi G, Koblizek M, Massana R, Beja O. Roseobacter-like bacteria in Red and Mediterranean Sea aerobic anoxygenic photosynthetic populations. Appl Environ Microbiol 71:344-53 (2005).

29. Kruger M, Meyerdierks A, Glockner FO, Amann R, Widdel F, Kube M, Reinhardt R, Kahnt J, Bocher R, Thauer RK, Shima S. A conspicuous nickel protein in microbial mats that oxidize methane anaerobically. Nature 426:878-81 (2003).

30. Southern EM. Detection of specific sequences among DNA fragments separated by gel electrophoresis. J Mol Biol 98:503-17 (1975).

31. Southern EM. Blotting at 25. Trends Biochem Sci 25:585-8 (2000).

32. Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270:467-70 (1995).

33. Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW. Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc Natl Acad Sci U S A 93:10614-9 (1996).

34. Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, Brown EL. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14:1675-80 (1996).

35. Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ. High density synthetic oligonucleotide arrays. Nat Genet 21:20-4 (1999).

36. Lettieri T. Recent applications of DNA microarray technology to toxicology and ecotoxicology. Environ Health Perspect 114:4-9 (2006).

37. Chizhikov V, Rasooly A, Chumakov K, Levy DD. Microarray analysis of microbial virulence factors. Appl Environ Microbiol 67:3258-63 (2001).

38. Bjorkholm B, Sjolund M, Falk PG, Berg OG, Engstrand L, Andersson DI. Mutation frequency and biological cost of antibiotic resistance in Helicobacter pylori. Proc Natl Acad Sci U S A 98:14607-12 (2001).

39. Dennis P, Edwards EA, Liss SN, Fulthorpe R. Monitoring gene expression in mixed microbial communities by using DNA microarrays. Appl Environ Microbiol 69:769-78 (2003).

40. Loy A, Lehner A, Lee N, Adamczyk J, Meier H, Ernst J, Schleifer KH, Wagner M. Oligonucleotide microarray for 16S rRNA gene-based detection of all recognized lineages of sulfate-reducing prokaryotes in the environment. Appl Environ Microbiol 68:5064-81 (2002).

41. Shevchenko A, Jensen ON, Podtelejnikov AV, Sagliocco F, Wilm M, Vorm O, Mortensen P, Boucherie H, Mann M. Linking genome and proteome by mass spectrometry: large-scale identification of yeast proteins from two dimensional gels. Proc Natl Acad Sci U S A 93:14440-5 (1996).

42. Sagliocco F, Guillemot JC, Monribot C, Capdevielle J, Perrot M, Ferran E, Ferrara P, Boucherie H. Identification of proteins of the yeast protein map using genetically manipulated strains and peptide-mass fingerprinting. Yeast 12:1519-33 (1996).

43. Page MJ, Amess B, Townsend RR, Parekh R, Herath A, Brusten L, Zvelebil MJ, Stein RC, Waterfield MD, Davies SC, O'Hare MJ. Proteomic definition of normal human luminal and myoepithelial breast cells purified from reduction mammoplasties. Proc Natl Acad Sci U S A 96:12589-94 (1999).

44. Shepard JL, Bradley BP. Protein expression signatures and lysosomal stability in Mytilus edulis exposed to graded copper concentrations. Mar Environ Res 50:457-63 (2000).

45. Shepard JL, Olsson B, Tedengren M, Bradley BP. Protein expression signatures identified in Mytilus edulis exposed to PCBs, copper and salinity stress. Mar Environ Res 50:337-40 (2000).

46. Bradley BP, Shrader EA, Kimmel DG, Meiller JC. Protein expression signatures: an application of proteomics. Mar Environ Res 54:373-7 (2002).

47. Fenselau C, Demirev PA. Characterization of intact microorganisms by MALDI mass spectrometry. Mass Spectrom Rev 20:157-71 (2001).

48. Ryzhov V, Fenselau C. Characterization of the protein subset desorbed by MALDI from whole bacterial cells. Anal Chem 73:746-50 (2001).

49. Pineda FJ, Antoine MD, Demirev PA, Feldman AB, Jackman J, Longenecker M, Lin JS. Microorganism identification by matrix-assisted laser desorption/ionization mass spectrometry and model-derived ribosomal protein biomarkers. Anal Chem 75:3817-22 (2003).

50. Demirev PA, Feldman AB, Kowalski P, Lin JS. Top-down proteomics for rapid identification of intact microorganisms. Anal Chem 77:7455-61 (2005).

51. Zhang K, Martiny AC, Reppas NB, Barry KW, Malek J, Chisholm SW, Church GM. Sequencing genomes from single cells by polymerase cloning. Nat Biotechnol 24:680-6 (2006).

52. Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, Richardson PM, Solovyev VV, Rubin EM, Rokhsar DS, Banfield JF. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428:37-43 (2004).

53. Hafner GJ, Yang IC, Wolter LC, Stafford MR, Giffard PM. Isothermal amplification and multimerization of DNA by Bst DNA polymerase. Biotechniques 30:852-6, 858, 860 passim (2001).

54. Rocap G, Larimer FW, Lamerdin J, Malfatti S, Chain P, Ahlgren NA, Arellano A, Coleman M, Hauser L, Hess WR, Johnson ZI, Land M, Lindell D, Post AF, Regala W, Shah M, Shaw SL, Steglich C, Sullivan MB, Ting CS, Tolonen A, Webb EA, Zinser ER, Chisholm SW. Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature 424:1042-7 (2003).

55. Cole JR, Chai B, Farris RJ, Wang Q, Kulam SA, McGarrell DM, Garrity GM, Tiedje JM. The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res 33:D294-6 (2005).

56. Loy A, Schulz C, Lucker S, Schopfer-Wendels A, Stoecker K, Baranyi C, Lehner A, Wagner M. 16S rRNA gene-based oligonucleotide microarray for environmental monitoring of the betaproteobacterial order "Rhodocyclales". Appl Environ Microbiol 71:1373-86 (2005).

57. Peplies J, Lachmund C, Glockner FO, Manz W. A DNA microarray platform based on direct detection of rRNA for characterization of freshwater sediment-related prokaryotic communities. Appl Environ Microbiol 72:4829-38 (2006).

58. Urakawa H, Kurata S, Fujiwara T, Kuroiwa D, Maki H, Kawabata S, Hiwatari T, Ando H, Kawai T, Watanabe M, Kohata K. Characterization and quantification of ammonia-oxidizing bacteria in eutrophic coastal marine sediments using polyphasic molecular approaches and immunofluorescence staining. Environ Microbiol 8:787-803 (2006).

59. Ouellette AJ, Handy SM, Wilhelm SW. Toxic Microcystis is widespread in Lake Erie: PCR detection of toxin genes and molecular characterization of associated cyanobacterial communities. Microb Ecol 51:154-65 (2006).

60. McElhiney J, Drever M, Lawton LA, Porter AJ. Rapid isolation of a single-chain antibody against the cyanobacterial toxin microcystin-LR by phage display and its use in the immunoaffinity concentration of microcystins from water. Appl Environ Microbiol 68:5288-95 (2002).

61. Ideker T, Galitski T, Hood L. A new approach to decoding life: systems biology. Annu Rev Genomics Hum Genet 2:343-72 (2001).

European Commission

EUR 22600 EN – DG Joint Research Centre, Institute for the Environment and Sustainability
Title: Microbial Biodiversity: Detection and Monitoring Using Molecular Biology Tools
Authors: Teresa Lettieri
Luxembourg: Office for Official Publications of the European Communities
2006 – 32 pp. – 21 x 29 cm
EUR - Scientific and Technical Research series; ISSN 1018-5593

Abstract

Microorganisms are responsible for most of the biogeochemical cycles that shape the environment of the earth and its oceans. So far, only a part of these organisms has been well studied, especially those living on land and those considered from an anthropocentric perspective, e.g. causing human diseases or providing useful products and services. Furthermore, the inability to generate pure cultures has hampered the study and understanding of the metabolic processes of many micro-organisms.

Recently, advances in molecular biology and -omics technologies have been offering new and more exciting perspectives on, and knowledge of, the microbial world, making it possible to understand the relationships between biological communities and their functions, to identify the diversity of populations, and to monitor the effects of environmental factors on community structures.

This review explores the development and application of genomics technologies to the monitoring of microbial biodiversity and microbial ecology. It introduces the concept of the microbial world, and shows how molecular biology tools and -omics technologies have revolutionized microbial ecological studies and expanded our view of the previously underappreciated microbial world.


The mission of the JRC is to provide customer-driven scientific and technical support for the conception, development, implementation and monitoring of EU policies. As a service of the European Commission, the JRC functions as a reference centre of science and technology for the Union. Close to the policy-making process, it serves the common interest of the Member States, while being independent of special interests, whether private or national.