Vol 455 | 25 September 2008

Q&A

MICROBIOLOGY
Metagenomics
Philip Hugenholtz and Gene W. Tyson

Ten years after the term metagenomics was coined, the approach continues to gather momentum. This culture-independent, molecular way of analysing environmental samples of cohabiting microbial populations has opened up fresh perspectives on microbiology.

Why the ‘meta’ in metagenomics?
Genomics determines the complete genetic complement of an organism by high-throughput sequencing of the base pairs of its DNA. The most prominent example was the Human Genome Project, which involved the sequencing of 3 billion base pairs. But the genomes of hundreds of organisms from all three domains of life (archaea, bacteria and eukarya), as well as those of quasi-life forms such as viruses, have now been sequenced. Metagenomics, by contrast, involves sampling the genome sequences of a community of organisms inhabiting a common environment. Metagenomics has also been more broadly defined as any type of analysis of DNA obtained directly from the environment: for example, after the appropriate procedures, screening such DNA for particular enzymatic activity. To date, the approach has been applied exclusively to microbial communities.

Why do we need metagenomics?
Microbiology has traditionally been based on pure cultures grown in the laboratory. But most microorganisms cannot be grown in this way and we have been ignorant of their existence. This cultivation bottleneck has skewed our view of microbial diversity and limited our appreciation of the microbial world. Metagenomics provides a relatively unbiased view not only of the community structure (species richness and distribution) but also of the functional (metabolic) potential of a community.

What environments can be analysed?
In principle, any environment is amenable to metagenomic analysis provided that nucleic acids can be extracted from sample material (Fig. 1). Simpler communities are more tractable to a particular technique called shotgun sequencing (Box 1); this was a rationale for one of the earliest studies, which targeted a biofilm in acid drainage from mines that consisted of only a handful of dominant microbial populations. Most interest, however, has centred on the marine environment: the largest metagenomic study to date is the Global Ocean Sampling Expedition, which follows the voyage of Darwin’s ship HMS Beagle.

Metagenomics is now also being adopted in medicine. Of particular note is an international initiative, the Human Microbiome Project, which aims to map human-associated microbial communities (including those of the gut, mouth, skin and vagina).

What surprises have there been?
A strength of metagenomics is its potential for serendipitous discovery. An example is the discovery of proteorhodopsin proteins, light-driven proton pumps that were first identified in environmental DNA from bacterioplankton. Proteorhodopsins have since been found to be widely distributed and highly expressed in diverse microbial groups from aquatic habitats, and they may represent a major source of energy flux in the photic zone of the world’s oceans. A more recent discovery is that of archaeal ammonia oxidizers. It was thought that bacteria were solely responsible for aerobic ammonia oxidation, although their numbers often could not account for the observed rates of ammonia oxidation in many habitats.
The fortuitous discovery of an ammonia monooxygenase gene next to an archaeal marker gene (encoding small-subunit ribosomal RNA) spawned a rash of papers implicating archaea as the main source of ammonia oxidation in many marine and terrestrial ecosystems.

Can whole genomes be reconstructed from an environmental sample?
Yes: the genomes of dominant species can be fully reconstructed from environmental samples using random sequencing. For example, complete or near-complete genomes have been assembled from microbial populations present in biofilms in acid mine drainage, in activated sludges and in marine samples. Having the complete or near-complete genome of a dominant population provides the gene inventory for the organism and allows its metabolic potential to be determined, including inferring the absence of metabolic pathways as well as their presence. A key feature of genomes obtained from environmental sources is that they are composites of the population from which they were derived, and encompass the genetic microheterogeneity present in that population.

What have we learned about microbial evolution?
Metagenomics provides the first broad insights into coexisting (sympatric) populations, as every sequence read is derived from a different individual within a given community. In communities in which deep sequence-read coverage of individual populations is possible, metagenomics provides an exquisite view of the evolutionary processes shaping these organisms. For instance, data from archaeal populations in acid mine drainage were used to show that genetic recombination occurs at a much higher frequency than previously predicted, and is the primary evolutionary force […]

[…] most dominant population in a soil sample, and many times that to obtain genomes from less dominant populations. But it is possible to extract biologically meaningful information from completely unassembled sequence data using a gene-centric approach.

Are we starting to get a handle on genetic diversity in the environment?
Scarcely: we are still far from measuring the full extent of genetic diversity encoded by microbial life. The Global Ocean Sampling Expedition has generated the largest metagenomic data set so far, comprising 6.12 million […]

NEWS & VIEWS Q&A | NATURE | Vol 455 | 25 September 2008

Figure 1 | Timeline of sequence-based metagenomic projects showing the variety of environments sampled since 2002. The oceanic viriomes (all viruses in a habitat) (August 2006) were from the Sargasso Sea, Gulf of Mexico, coastal British Columbia and the Arctic Ocean. The nine biomes (March 2008) were stromatolites, fish gut, fish ponds, mosquito viriome, human-lung viriome, chicken gut, bovine gut and marine viriome. The different technologies used are dye-terminator shotgun sequencing (black), fosmid library sequencing (pink) and pyrosequencing (green). (Graphic based on data sets represented at www.genomesonline.org.)

Box 1 | The nuts and bolts of metagenomics

Metagenomics begins with the extraction of genomic DNA from cellular organisms and/or viruses in an environmental sample. For ‘traditional’ dye-terminator sequencing, the DNA is sheared into uniform lengths and inserted into a vector (a known DNA fragment that can be moved between organisms). The vector is then replicated in a bacterial host (typically Escherichia coli), which produces many clones of the genome fragment suitable for sequencing.

Current sequencing technologies produce ‘reads’ of fewer than 1,000 bases (Fig. 3), so thousands of reads are required to recover whole genomes. As with single-genome (isolate) projects, a range of sizes of DNA can be cloned.

Initially, metagenomic studies focused on screening libraries of large-insert clones (fosmids and bacterial artificial chromosomes) for clones containing genes of functional or evolutionary interest, and these clones were then fully sequenced to provide contextual data. This is still a useful approach, but such directed sequencing has largely been replaced by ‘shotgun’ sequencing of randomly sampled, anonymous DNA, which is cheaper and has higher throughput.

As with isolate genomics, metagenomic data processing usually involves the assembly of short, overlapping sequence reads into a consensus sequence, and prediction of which stretches of sequence encode genes. In metagenomics, however, complex communities may not result in any assembly, as no population may have been sampled a sufficient number of times to result in overlapping reads.

The amount of an organism’s genome recovered from an environmental sample depends on how many sequence data are obtained and the relative abundance of the organism in its community. For example, assuming completely random sampling, a population with an average genome size of 3 million base pairs, and comprising 0.1% of a community, would require 3 billion bases of sequence data to obtain a 1× coverage (each base of the genome is represented on average by one read). This is beyond the range of dye-terminator sequencing, but new, highly parallelized sequencing technologies, such as 454–Roche pyrosequencing and Illumina sequencing, may be up to the job (Fig. 3).

A desirable extra step in metagenomics is to identify the owner of each anonymous DNA fragment, a process called binning, or classification. Binning methods are still in their infancy, and usually only longer fragments (5 kilobases or more) can be classified reliably. P.H. & G.W.T.
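The coverage example in Box 1 reduces to one division: under completely random sampling, only a fraction of the sequenced bases comes from the population of interest, so the total must be scaled up by the inverse of that fraction. The following is a minimal sketch of that arithmetic, not a method taken from the article. The function names are hypothetical, and the 800-base read length is an illustrative assumption (the box says only that reads are fewer than 1,000 bases).

```python
import math


def bases_required(genome_size_bp: int, relative_abundance: float,
                   coverage: float = 1.0) -> float:
    """Total community sequence (in bases) needed for one population to reach
    the requested average coverage, assuming completely random sampling."""
    if genome_size_bp <= 0 or coverage <= 0:
        raise ValueError("genome size and coverage must be positive")
    if not 0 < relative_abundance <= 1:
        raise ValueError("relative abundance must be a fraction in (0, 1]")
    # Only `relative_abundance` of all sequenced bases belongs to the target
    # population, so the community-wide total is scaled by its inverse.
    return coverage * genome_size_bp / relative_abundance


def reads_required(total_bases: float, read_length_bp: int) -> int:
    """Number of fixed-length reads needed to supply `total_bases`."""
    if read_length_bp <= 0:
        raise ValueError("read length must be positive")
    return math.ceil(total_bases / read_length_bp)


if __name__ == "__main__":
    # Box 1 example: 3 million bp genome at 0.1% of the community, 1x coverage.
    total = bases_required(3_000_000, 0.001, coverage=1.0)
    print(f"{total:.3g} bases of sequence data")
    # Assumed read length of 800 bases (illustrative only).
    print(reads_required(total, 800), "reads")
```

With the box's figures the first function gives about 3 billion bases, matching the text. Raising `coverage` scales the requirement linearly, and lowering `relative_abundance` scales it inversely, which is why less dominant populations need far more sequence data.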