PERSPECTIVES

OPINION

Unravelling ancient microbial history with community proteogenomics and lipid geochemistry

Jochen J. Brocks and Jillian Banfield

Abstract | Our window into the Earth’s ancient microbial past is narrow and obscured by missing data. However, we can glean information about ancient microbial ecosystems using fossil lipids (biomarkers) that are extracted from billion-year-old sedimentary rocks. In this Opinion article, we describe how environmental genomics and related methodologies will give molecular fossil research a boost, by increasing our knowledge about how evolutionary innovations in microorganisms have changed the surface of planet Earth.

The first appearance of metabolic pathways that allowed microorganisms to exploit new and previously untapped energy sources, such as methanogenesis and sulphate reduction, must have had profound effects on the chemical composition of the atmosphere and oceans. One of the main goals of geobiology is to determine when such physiologies first appeared in the Earth’s history and to correlate these events with the associated chemical and isotopic changes in the rock record (FIG. 1).

The focus of most microbiological investigations is on living cells. However, even in death, the remains of microorganisms from all three domains of life can provide important information about how natural ecosystems work. In lakes and oceans, most cellular polymers, such as DNA, proteins and carbohydrates, are degraded rapidly in the water column and, subsequently, in the bottom sediments. Under exceptional circumstances, ancient bacterial DNA can persist in sediments for several hundred thousand years1, an observation that could revolutionize the study of relatively recent microbial ecosystems. However, more recalcitrant biomolecules, such as lipids, can permanently escape the recycling process. In fine-grained sediments, where water circulation is restricted, remineralization of dead biomass by oxygen-consuming heterotrophs and sulphate reducers often leads to anoxic and sulphidic pore waters. Under these reducing conditions, lipids lose their functionalities; for example, their double bonds and hydroxy groups. The exact mechanisms of the biological and chemical sedimentary reactions that turn natural products in recent ecosystems into the corresponding population of molecular fossils (diagenesis) are poorly understood and present a formidable challenge for biomarker interpretations. During diagenesis, molecules of dominant community members could be completely lost, whereas lipids of less abundant organisms could become enriched. Therefore, it is currently difficult to make quantitative statements about community composition based on biomarker abundance, and virtually impossible to draw conclusions about the absence of specific organisms. As a result, studies of diagenetic transformations under different chemical and geological conditions will be a priority of biomarker geochemistry in the coming years. A recent breakthrough in this area was the recognition that non-biological addition of hydrogen sulphide to double bonds, followed by reductive desulphurization, is an important pathway of lipid preservation2,3 (FIG. 2).

Although most taxonomic information is lost in the transition from biomolecule to defunctionalized geomolecule, the generated hydrocarbon skeletons can often still be linked to precursors that are diagnostic for specific groups of organisms4 (TABLE 1). These molecules are also highly resistant against further degradation. Encased in lithified sediments, they can persist as recognizable molecular skeletons for billions of years. These molecular fossils can be extracted from rocks (the antiquity of which can be established by radiometric isotopic techniques) and identified using mass spectrometry. They can provide intriguing glimpses into the species composition of the Earth’s ancient microbial ecosystems. A fascinating example is the abundance of glycerol dialkyl glycerol tetraethers (GDGTs) in black shales, deposited during the mid-Cretaceous oceanic anoxic event 1b, approximately 112 million years (Myr) ago. These GDGTs point to a massive expansion of pelagic, chemoautotrophic Crenarchaeota into the world’s oceans5.

In this Opinion article we describe how genomics, proteomics and metabolomics of natural microbial consortia could be combined to produce a tree of life in which each lineage is annotated with key biomarkers and the base sequence of all genes that are required for their biosynthesis. Ideally, some nodes in the tree would be roughly dated, and the topology would yield data about the frequency of horizontal gene transfer (HGT) and convergent evolution in lipid biosynthetic pathways.

Biomarkers and microbial detection

Biomarkers can be used to measure geochemical parameters, such as salinity, temperature and redox state. For example, the incorporation of cyclopentane rings into the biphytanyl chains of GDGTs increases the thermal transition point of cell membranes and is a mechanism for temperature adaptation in archaea. The number of incorporated rings in GDGTs extracted from marine sediments has a linear correlation with annual mean sea surface temperature6. As a classical example of the application of this correlation, GDGTs yielded a high-resolution record of sea surface temperatures across the Palaeocene–Eocene thermal maximum (PETM), which occurred ~55 Myr ago and was one of the most rapid global

NATURE REVIEWS | MICROBIOLOGY VOLUME 7 | AUGUST 2009 | 601 © 2009 Macmillan Publishers Limited. All rights reserved

[Figure 1 panels, left to right: atmospheric O2 (% PAL, logarithmic axis from 0.1 to 100); ocean redox state (steps 1 to 4); geological events; era and aeon; biological events. Ages are given in billions of years (Gyr).]

Aeons and eras: Phanerozoic (C, M, P), from 0.542 Gyr; Proterozoic (Neoproterozoic, Mesoproterozoic and Palaeoproterozoic, with boundaries at 1.0 and 1.6 Gyr), from 2.5 Gyr; Archaean, before 2.5 Gyr.

Geological events
0.055: PETM thermal event
0.065: Meteorite impact and Cretaceous/Tertiary mass extinction
0.252: Permo/Triassic mass extinction
0.58: Gaskiers glaciation43
0.64: Marinoan ‘snowball’ glaciation44
0.71: Sturtian ‘snowball’ glaciation44
~1.8: Disappearance of banded iron formations
2.22: Makganyene ‘snowball’ glaciation45
2.32–2.45: Great oxygenation event46 (O2 >0.001% PAL)
2.6–2.5: First traces of atmospheric O2 (REF. 47–49)
4.6: Formation of the Earth

Biological events
0.065: Disappearance of dinosaurs
0.091: Biomarkers for centric diatoms50
Evolution of flowering plants
<0.632: Fossil (stem group?) animal embryos51
>0.635: Biomarker evidence for sponge52
0.75: Fossil testate amoebae53
Fossil chlorophytes?53,54
~1.0: Fossil xanthophytes?53,54
1.2: Fossil rhodophytes (complex multicellularity and sexual reproduction)53,54
1.5: Fossil evidence for eukaryotes bearing a cytoskeleton54
1.6: Oldest certainly syngenetic biomarkers: Chlorobiaceae, Chromatiaceae and type I methanotrophs9
~1.7: Oldest eukaryotic fossils53
>1.8: Trace fossils of millimetre-sized, motile organisms (giant protists?)55
1.9: Oldest fossils with diagnostic cyanobacterial morphology56
Rise in O2 as evidence for cyanobacterial activity45,57
2.5: Oldest indigenous hydrocarbons in shales (PAH)58
2.6: Fossils of filamentous cyanobacteria?59
2.7: Isotopic evidence for sulphate reduction60
2.8: Isotopic evidence for aerobic or anaerobic methanotrophy61,62
3.2: Filamentous fossils in hydrothermal spring system63
3.5: Isotopic evidence for sulphur disproportionation64; isotopic evidence for methanogenesis?65; oldest probable microbial mats (stromatolites)66
>3.7: Possible isotopic evidence for biological carbon fixation67

Figure 1 | Timeline (in billions of years (Gyr) before the present) of major environmental and biological events in the Earth’s history. The left panel shows the evolution of atmospheric O2 concentrations through time as a percentage of the present atmospheric level (PAL)42. The question marks indicate that the trajectory of the curve is poorly constrained. The middle panel shows the evolution of the redox state of the oceans (excluding temporary conditions such as oceanic anoxic events), including: anoxic and ferrous deep oceans and anoxic surface waters (step 1); surface waters that became increasingly oxygenated (step 2); anoxic and possibly sulphidic deep ocean waters and oxygenated surface waters (step 3); and predominantly oxygenated deep oceans (step 4). C, Cainozoic; M, Mesozoic; P, Palaeozoic; PAH, polyaromatic hydrocarbon; PETM, Palaeocene–Eocene thermal maximum.

www.nature.com/reviews/micro

heating events in the Earth’s history. The GDGT data, in combination with other geochemical proxies, showed that the sharp heating event was the cause, not the consequence, of a rapid injection of the greenhouse gas methane into the atmosphere–ocean system7.

A second, unique application of biomarkers is the reconstruction of extremely ancient microbial ecosystems. The oldest-known indigenous biomarkers8 were found in 1,640-Myr-old sedimentary rocks that were deposited in northern Australia, in a deep rift basin that was probably connected to the ocean (FIG. 1). The biomarkers identified an entirely bacterial and archaeal ecosystem that was distinct from communities found in modern marine environments9. Extracts of the oil-prone rocks yielded high concentrations of the aromatic carotenoid okenane, indicating that there was prolific growth of planktonic purple sulphur bacteria (Chromatiaceae), and chlorobactane and isorenieratane, indicating that green and brown pigmented green sulphur bacteria (Chlorobiaceae) were also present. The sea was stratified and sulphidic up into the photic zone of the water column, consistent with models that predict vastly euxinic oceans between ~1,800 and 800 Myr ago (FIG. 1). A lack of steroids suggests that eukaryotic algae did not yet have an important role in this ancient sea.

Okenane is unique to the 1,640-Myr-old rocks from northern Australia and has never been discovered in younger rocks. In the modern biosphere, the only known precursor with an okenane skeleton is okenone (FIG. 2), a purple-coloured carotenoid found exclusively in Chromatiaceae. The presence of okenane in these rocks was therefore interpreted as evidence that the family Chromatiaceae must have evolved before the end of the Palaeoproterozoic period9 (FIG. 1). But is this interpretation robust? Could other, as-yet-uncharacterized modern lineages produce okenone? Could Chromatiaceae have acquired the ability to synthesize okenone more recently, through HGT? If acquisition was through HGT, the biomarker should match the gene donor and not the recipient. An alternative explanation is that the okenane in ancient rocks was derived from a group of organisms that is now extinct. It is clear that the interpretation of molecular fingerprints preserved in rocks is limited by our knowledge of the biosynthetic capacities of organisms that existed when the rocks were formed.

Currently, biomarker interpretations are based primarily on lipids detected in extant organisms that have been grown in the laboratory. However, most bacteria and archaea have not yet been isolated10. Of 105 known bacterial and archaeal phyla, 73 do not have a single cultured representative, and we do not have a single complete genome for 76 phyla11. Even when organisms can be grown in isolation, their lipid products might differ from those produced in nature. Consequently, our knowledge about the distribution of biomarkers in nature is not sufficient to assess the reliability of biomarkers.

‘Orphan biomarkers’ are hydrocarbons that have been detected in sedimentary rocks and crude oils, but have as-yet-unknown biological sources12. The most prominent orphan biomarkers are cheilanthanes, which are tricyclic terpanes that occur in most sedimentary rocks from the Precambrian period to the present day. These biomarkers are as abundant and widely distributed as eukaryotic steranes and bacterial hopanes, yet we have not recognized a single living representative. Evidently, we are still unaware of the natural products of some of the most common organisms in the ocean.

Lipids produced by bacteria and archaea

Improved cultivation and isolation techniques10 would clearly be an important step towards linking newly described bacteria and archaea to their lipid products. However, the diversity of bacteria and archaea is probably vast, and the challenges of isolating specific organisms of interest are so considerable that most of these analyses are likely to be impossible in the foreseeable future. Cultivation-independent techniques could offer a promising alternative way to fill the gaps in our knowledge of lipid biosynthetic pathways, and such techniques have already considerably expanded our understanding of microbial biodiversity13. The generation of 16S rRNA gene clone libraries and group-specific fluorescence in situ hybridization (FISH) has helped to describe which organisms are present in natural environments and which microbial groups are quantitatively important14. Cultivation-independent sequencing of functional genes has provided information about the number and diversity of organisms that can produce a certain class of lipids in a particular environment. For example, in a seminal study15, Pearson and colleagues used degenerate PCR primers to study the domain-wide genetic diversity of the squalene-hopene cyclase

Figure 2 | The diagenetic transformation of okenone, a carotenoid found exclusively in planktonic purple sulphur bacteria (Chromatiaceae), to the molecular fossil okenane. Okenane retains the diagnostic carbon skeleton of its precursor and can be extracted from billion-year-old sedimentary rocks. It is regarded as a biomarker for Chromatiaceae. [Reaction scheme, top to bottom: okenone; addition of H2S; reductive desulphurization; further diagenetic reduction; okenane.]


Table 1 | Selected biomarkers, their biological origins and environmental interpretation

Biomarker: predominant biological origins and environmental interpretation

Archaea

Head-to-head linked acyclic isoprenoids (C31 to C40), archaeol, other isoprenoidal glycerol ethers and isoprenoidal glycerol dialkyl glycerol tetraethers (TEX86): Archaea in general4,12; the number of rings incorporated into the isoprenoid chains of glycerol dialkyl glycerol tetraethers is a proxy for annual mean sea surface temperature6

2,6,15,19-Tetramethylicosane: Marine pelagic Crenarchaeota; exclusively associated with a mid-Cretaceous oceanic anoxic event5,72

Bacterioruberane: A predicted biomarker for Haloarchaea; postulated to be indicative of salt lake environments12

Crocetane: Archaeal anaerobic methane oxidizers; indicative of sub-marine gas seeps, gas hydrates or mud volcanoes73 (also found as a degradation product of diaromatic carotenoids)

2,6,10,15,19-Pentamethylicosane: Methanogenic and methanotrophic archaea74

Crenarchaeol: Marine pelagic Crenarchaeota; found from the Cretaceous period to the present5

Bacteria

C31 to C35 hopanes: Bacteria in general75

2α-Methylhopanes: Cyanobacteria17, Pseudomonadales and Rhizobiales76

3β-Methylhopanes: Type I methanotrophic bacteria (Methylococcaceae), other methylotrophs and acetic acid bacteria77,78

4-Methylcholestanes and 4,4-dimethylcholestanes: Type I methanotrophic bacteria (Methylococcaceae)9,79, dinoflagellates and biosynthetic intermediates in all eukaryotes

Chlorobactane: Green pigmented Chlorobiaceae; indicates anoxic and sulphidic conditions in the water column in the presence of light (photic zone euxinia) or in microbial mats80

Isorenieratane: Carbon isotopically heavy isorenieratane is diagnostic for brown pigmented Chlorobiaceae and is indicative of photic zone anoxia (in the water column or in microbial mats)81; isotopically lighter isorenieratane may be derived from sponges and actinomycetes82

Okenane: Purple sulphur bacteria (Chromatiaceae); indicative of photic zone euxinia and planktonic conditions (currently only associated with the late Palaeoproterozoic period9,82)

Eukaryotes

24-Norcholestane (C26 steroids): Diatoms; found from the Cretaceous period to the present83

Cholestanes (C27 steroids): Eukaryotes in general84 with possible minor contributions from myxobacteria85; probably predominantly derived from red algae in the Proterozoic period

Ergostanes (C28 steroids): Diagnostic for eukaryotes84; relative concentrations increased through the Phanerozoic period, when they were predominantly derived from secondary symbiotic algae such as dinoflagellates, diatoms and coccolithophorids86

Stigmastane (C29 steroids): Diagnostic for eukaryotes84; the main sources in the Neoproterozoic period were probably green algae and in the post-Devonian period, probably terrestrial plants

24-n-Propylcholestanes (C30 steroids): Pelagophyte algae (‘brown tides’ and Sarcinochrysidales); almost always representative of marine environments87

24-Isopropylcholestanes (C30 steroids): Demosponges; found from the Neoproterozoic period to the present and represent the oldest (>635 million years (Myr) old) credible evidence for animal life in the geological record52,88,89

Dinosteranes (C30 steroids): Predominantly dinoflagellates (rare in diatoms); found from the Mesozoic to Cenozoic periods (dinosteranes in older rocks are either from younger contamination or are derived from ‘protodinoflagellates’)90

C30 highly branched isoprenoid: Centric diatoms (Rhizosolenia); found in marine environments from 91.5 Myr ago (Upper Turonian period) to the present50,91

C37 to C39 di- to tetra-unsaturated alkenones: Prymnesiophyte algae (marine); the number of double bonds is used to estimate past temperatures in the photic zone of the water column (Uk37 proxy)92

Gammacerane: Probably predominantly bacterivorous ciliates (possibly also fungi, ferns and alphaproteobacteria)93,94

Oleanane, cadinanes and bicadinanes95: Angiosperms; oleanane in pre-Cretaceous rocks predates the oldest angiosperm fossils96

Phyllocladanes, beyerane, kaurane and atisane: Conifers; found from the Devonian period to the present4,97
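
The TEX86 entry in Table 1 rests on a linear relationship between the number of rings in GDGTs and annual mean sea surface temperature. The following Python sketch illustrates how such a proxy could be evaluated. Neither the index definition nor the calibration coefficients are given in this article: the abundance ratio used here is the commonly used TEX86 definition, and the slope and intercept (0.015 and 0.28) are a widely cited core-top calibration, both included as assumptions. The function names and the sample abundances are hypothetical.

```python
def tex86(gdgt1, gdgt2, gdgt3, cren_isomer):
    """Ring index from relative GDGT abundances (assumed standard TEX86 ratio).

    gdgt1..gdgt3 are GDGTs with one to three cyclopentane rings;
    cren_isomer is the crenarchaeol regio-isomer.
    """
    numerator = gdgt2 + gdgt3 + cren_isomer
    denominator = gdgt1 + numerator
    if denominator <= 0:
        raise ValueError("abundances must sum to a positive value")
    return numerator / denominator


def sst_from_tex86(index, slope=0.015, intercept=0.28):
    """Invert the assumed linear calibration index = slope * SST + intercept."""
    return (index - intercept) / slope


# Hypothetical relative abundances from a sediment extract (arbitrary units).
sample = {"gdgt1": 40.0, "gdgt2": 30.0, "gdgt3": 10.0, "cren_isomer": 20.0}
index = tex86(**sample)
print(f"TEX86 = {index:.2f}, estimated SST = {sst_from_tex86(index):.1f} degrees C")
```

Because the relationship is linear, the uncertainty of a temperature estimate scales directly with the uncertainty of the index, which is one reason the choice of calibration matters more than the arithmetic shown here.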


(sqhC) gene in marine and freshwater bacterioplankton. Functionalized hopanoids are preserved in large quantities in modern aquatic sediments, and hopanes, their fossil analogues, are probably the most abundant polycyclic hydrocarbons in crude oil of all ages16. Cyanobacteria were cited as a major source17. However, Pearson’s environmental study yielded 79 novel sqhC sequences that averaged only 60% translated amino acid identity to their closest relatives in the databases. Surprisingly, cyanobacteria were not among them. This study highlights how the biological sources of even the more common biomarkers remain largely unknown15.

Although the recovery of single targeted genes can yield insights into the lipid biosynthetic capacity of bacteria and archaea in natural environments, we posit that it is possible to determine an almost-complete lipid complement produced by individual bacteria and archaea in an environment through cultivation-independent genomic, proteomic and organic geochemical analysis of natural consortia. Using the approach outlined in FIG. 3, near-complete genomes of all dominant (>5% relative abundance) bacteria and archaea can be recovered from an environmental sample. This was first achieved for simple consortia that are dominated by a small number of organisms for which deep genomic sampling of dominant populations is possible. The approach has since been deployed across a wide range of environment types (reviewed in REF. 18). Community genomic data from an acidophilic biofilm yielded near-complete (>95%) genomes of the five dominant archaea and bacteria19. Only one of these species had been cultured previously. With improved methods for assigning reconstructed genome fragments to the organisms from which they were derived (binning), extensive genomic reconstruction of the same community has been achieved for nine archaea and five bacteria, some of which constitute only a few percent of the community (G.J. Dick, personal communication).

Correlating genes with lipid products. Once a near-complete gene complement has been recovered for an uncultivated organism, it should be possible to predict all the lipids that the organism has the ability to produce (FIG. 3). In practice, however, association of genes with lipid products can be difficult, owing to the fact that generic proteins, such as cytochromes or demethylases, are implicated in the structural modification and functional ornamentation of lipids. Even slight changes to the amino acid sequence of enzymes can cause widely different lipid products. For example, a single mutation in the protosteryl cation stabilizing region of cycloartenol synthase enables the production of additional sterols that have a different carbon skeleton (lanosterol and parkeol)20. In many cases, it will therefore be difficult to deduce the carbon skeleton of a lipid product based on genomic information alone, and even more challenging to predict the number, position and stereochemistry of functional groups. However, in the case of well-studied proteins for which the three dimensional structures of the reactive centres and substrate pockets are known, it is possible to predict exact lipid products. For example, the position of bulky amino acid residues protruding into the product channel of prenyltransferases can be used to predict where the elongation of the growing polyprenyl chain will be terminated and whether the major products will have 20, 25, 30 or more carbon atoms21.

For many lipid classes, such as sterols, carotenoids, acyclic isoprenoids and phospholipids, identification of complete biosynthetic pathways and exact lipid products based on genomic information is already possible. However, it remains a challenge to recognize genes that belong to new or incompletely characterized pathways. Most genes involved in lipid biosynthesis remain unknown, and environmental samples continuously yield new lipid structures and even new, and unusual, lipid classes, such as ladderanes22. Assigning lipids with uncharacterized biosynthetic pathways to bacteria and archaea that persistently defy isolation requires unconventional methods. Lipids with unusual carbon isotopic compositions, such as those of methanotrophs or green sulphur bacteria, could be linked to specific source organisms by comparison with the carbon isotopic composition of diagnostic DNA fragments23 or by micron-scale 13C measurements of fluorescently labelled cell aggregates using secondary ion mass spectrometry (SIMS)24. Moreover, with time-of-flight (TOF)-SIMS it is possible to map the distribution of lipids in mounted environmental samples, such as sections through microbial mats25. At a spatial resolution of 5 μm, TOF-SIMS can detect attomolar quantities of intact lipids with a sufficient mass resolution to identify a wide range of molecules. At lower mass resolution, TOF-SIMS can achieve a spatial resolution in the submicron range, at the size of individual microorganisms. If sensitivity and mass accuracy can be further improved at high spatial resolution, TOF-SIMS, combined with FISH, will allow characterization of the lipids in a single cell.

Analysing parallel variations of abundance of lipids and potential source organisms across multiple samples can also give clues about which source produces what. The genomes of potential source organisms can then be inspected, using bioinformatics, for distant homologues of enzymes known to be involved in the production of similar lipids. However, it can be difficult to recognize homologues from highly divergent lineages. This limitation will be overcome once comprehensive databases of genomic sequences and lipid products become available. At that time it might be possible to determine the function of a gene involved in lipid biosynthesis by comparison with three-dimensional enzyme structures from potentially distantly related organisms26. Corresponding genes could also be transferred to, and expressed in, organisms in culture and then be subjected to functional screening, an approach sometimes referred to as functional genomics27,28. For example, a metagenomic library of river sediments surprisingly yielded a putative bacteriochlorophyll a synthase (bchG) gene associated with a 16S rRNA gene from the uncultivated miscellaneous crenarchaeotal group. The putative bchG was cloned and heterologously expressed in Escherichia coli; the resulting protein was capable of synthesizing bacteriochlorophyll a, presenting the first evidence for a functional enzyme for bacteriochlorophyll biosynthesis in archaea29.

Community genomics also help to define the conditions that might bring recalcitrant bacteria and archaea into culture30, facilitating the study of unknown lipid biosynthetic pathways using genetic and molecular biological techniques. New information about the conditions that can allow microorganisms to be cultured will also come from comparisons of the proteomes of cultivated whole communities with those of environmental communities. Anomalies detected in the culture, such as overproduction of oxidative stress response proteins and osmoprotectants, will reveal how the media could be revised to improve the growth of stressed organisms, ultimately leading to their isolation and the study of their lipid content.

Environmental regulation of lipid biosynthesis. Some lipids, and their structural variants, are produced in response to changing environmental conditions; for example, the GDGTs discussed above incorporate an increasing number of rings into the lipid chain in response to rising temperatures. Although community genomic data sets contain information about which


microorganism can potentially produce what type of lipid, we cannot determine which lipid biosynthetic enzymes are actually expressed under specific environmental conditions, or at a particular point in time, using this information alone. This question of ‘who makes what, when’ can be answered by community proteomics, the study of the presence and abundance of proteins in natural microbial environments (FIG. 3; steps 4, 5 and 6).

Without genomic reference information, the study of proteins in environmental samples is restricted to a small fraction of the community proteome31. However, using information from predicted amino acid sequences derived from community genomic data sets (FIG. 3; step 4), identification of several thousand protein structures is possible32. In these analyses, proteins are extracted from environmental samples (FIG. 3; step 5) and enzymatically fragmented at specific amino acids to produce peptides. The mass to charge ratio is determined for

[Figure 3 diagram. Panels: microbial ecosystem; community genomic data set; reconstructed genomes (a, b, c); predicted enzymes; community proteome; identified and predicted proteins; community lipidome; unidentified lipids; molecular fossils. Step labels: 1, DNA extraction; 2, reconstruction of genomes; 3, identification of genes involved in lipid biosynthesis (*); 4, prediction of primary enzyme structures; 5, protein extraction; 6, identification of proteins based on predicted amino acid sequences; 7, prediction of lipids produced by each organism based on observed biosynthetic pathways; 8, assignment of lipids to organisms based on predicted structures; 9, lipid extraction and identification; 10, confirmation by spatial high resolution mass spectrometry; 11, interpretation of the fossil record.]

Figure 3 | Community genomics and proteomics for prediction of lipid products. With nearly completely reconstructed genomes and knowledge of which genes were expressed at the time when the sample was taken, it should, in principle, be possible to assign each lipid detected in a community lipidome to an organism or group of organisms in a consortium, even if these organisms cannot be isolated and grown in culture. In step 1, DNA is extracted from natural samples, fragments are sequenced and sequences are assembled to reconstruct larger stretches of genome. The reconstructed DNA segments are assigned to populations of organisms by co-assembly into larger fragments that carry phylogenetically informative genes or are binned based on metrics such as depth of sampling, GC percentage19 or through sequence signature analysis68–70, following methods pioneered by Karlin and colleagues71. In this way, nearly complete genomes of all dominant (>5% relative abundance) bacteria and archaea can be recovered (step 2). Genes involved in lipid biosynthesis are identified (step 3) and primary protein structures are predicted according to the base sequences of these genes (step 4). Proteins involved in lipid biosynthesis that were extracted from the same environmental sample (step 5) can then be identified on the basis of the predicted primary protein structures and assigned to individual organisms (step 6). Using information about the enzymes that are part of well-studied lipid biosynthetic pathways and that were detected in the environmental sample, it should be possible to predict major lipid end products (step 7) and thereby assign actual lipids in the environment to individual organisms or a group of organisms (steps 8–10). The identification of lipids of organisms that cannot be isolated could improve our understanding of the sources and environmental importance of fossil lipids in the geological record (step 11).
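
The Figure 3 caption mentions that reconstructed DNA segments can be binned using metrics such as depth of sampling and GC percentage. The Python sketch below illustrates that idea in its simplest form. It is not the procedure used in the cited studies: the greedy assignment rule, the tolerance values, the function names and the example contigs are all hypothetical and chosen only for illustration.

```python
def gc_percent(seq):
    """Percentage of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def bin_contigs(contigs, gc_tol=5.0, depth_ratio_tol=1.5):
    """Greedily group contigs with similar GC content and sampling depth.

    contigs: iterable of (name, sequence, mean_read_depth) tuples.
    A contig joins the first bin whose mean GC percentage is within gc_tol
    points and whose mean depth differs by at most a factor depth_ratio_tol
    (both thresholds are assumptions); otherwise it starts a new bin.
    """
    bins = []
    for name, seq, depth in contigs:
        if depth <= 0:
            raise ValueError("read depth must be positive")
        gc = gc_percent(seq)
        for b in bins:
            ratio = max(depth, b["depth"]) / min(depth, b["depth"])
            if abs(gc - b["gc"]) <= gc_tol and ratio <= depth_ratio_tol:
                n = len(b["members"])
                # Update the running means of the bin.
                b["gc"] = (b["gc"] * n + gc) / (n + 1)
                b["depth"] = (b["depth"] * n + depth) / (n + 1)
                b["members"].append(name)
                break
        else:
            bins.append({"gc": gc, "depth": depth, "members": [name]})
    return bins


# Hypothetical contigs from two populations: low GC at high depth
# and high GC at low depth.
example = [
    ("c1", "ATATATATGC", 50.0),
    ("c2", "ATTTAAGCAT", 45.0),
    ("c3", "GCGCGCGCAT", 10.0),
    ("c4", "GGCCGCGTAC", 12.0),
]
for b in bin_contigs(example):
    print(b["members"], round(b["gc"], 1), round(b["depth"], 1))
```

Depth is compared as a ratio rather than a difference because sampling depth tracks relative abundance, which can span orders of magnitude between dominant and minor community members.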

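
Steps 5 and 6 of Figure 3 identify extracted proteins by comparing measured peptide masses with masses predicted from genome-derived amino acid sequences. The following Python sketch shows the principle under several assumptions that are not stated in the article: cleavage follows the usual trypsin rule (after K or R, but not before P), only singly charged intact peptides are compared, and the matching tolerance is arbitrary. The residue masses are standard monoisotopic values; the function names and the two example proteins are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276   # charge carrier


def tryptic_digest(protein):
    """Split a protein after K or R unless the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and following != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mz(peptide, charge=1):
    """Theoretical mass to charge ratio of a protonated peptide."""
    mass = sum(RESIDUE_MASS[residue] for residue in peptide) + WATER
    return (mass + charge * PROTON) / charge


def match_peaks(observed_mz, predicted_proteins, tol=0.01):
    """Return (peak, peptide, organism) for peaks that match predicted peptides."""
    hits = []
    for organism, protein in predicted_proteins.items():
        for peptide in tryptic_digest(protein):
            expected = peptide_mz(peptide)
            for peak in observed_mz:
                if abs(peak - expected) <= tol:
                    hits.append((peak, peptide, organism))
    return hits


# Hypothetical predicted proteins for two organisms and one measured peak.
predicted = {"organism_a": "GGK", "organism_b": "AAK"}
peak = peptide_mz("GGK")
print(match_peaks([peak], predicted))
```

A real search would also score the fragmentation spectrum of each peptide and control the false positive rate, which this sketch leaves out.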

both the whole peptides and their fragmentation products. To identify peptides, the mass spectra are then compared with theoretical mass to charge ratios for peptides and their fragmentation products that are predicted from genome sequences (FIG. 3; step 6). The experimentally identified peptides are mapped back onto predicted proteins to determine the source organism. This methodology was used to identify 2,033 proteins from the 5 dominant species that formed an acidic biofilm32. For these 5 species, over 50% of all predicted proteins were identified (with a <1% false positive rate33). A probable function was recognized for 69% of these proteins, including enzymes involved in lipid biosynthesis. The community proteome reveals which enzymes involved in lipid biosynthesis are expressed, and roughly in what concentration, at the time of sampling. In this way, it should become possible to assign each lipid detected in the community lipidome to an organism, or group of organisms, in the consortium (FIG. 3; steps 7–10). Community proteomic methods offer a snapshot of the lipid biosynthetic activity of a microbial community under defined environmental conditions and can be applied to environmental perturbation or time series experiments.

Proteogenomic studies on environmental samples that contain uncultivated organisms have, as yet, only been applied to abundant members of microbial communities32. Proteogenomics of fractionated samples, such as cells separated according to size or fluorescence-activated cell sorting14, could be used to capture genomes, proteomes and lipids of organisms that can play a crucial part in an ecosystem but that have low relative abundances. Studies of the genetic diversity and proteomes of complex microbial ecosystems such as soil and marine water will also require the collection of extensive reference genomes, and DNA amplification from single cells will probably become the method of choice to generate this data34–36.

With recent advances in peptide mass spectrometry and sample fractionation, it is possible to detect proteins with five orders of dynamic range37. As the different proteins of microorganisms have a concentration range of about six orders of magnitude, it will be possible to identify all but the least-abundant proteins in sorted cell concentrates. This offers the potential to study subtle adjustments of the lipid biosynthetic apparatus to changing physical and chemical conditions without disturbing the system. It will also be possible to detect adjustments in lipid metabolism in microbial communities by sequencing cDNA products from environmental mRNA transcripts38. Using the techniques described above, changes in lipid biosynthesis and other metabolic activities could be followed simultaneously in all major and some minor organisms of a community. This, in turn, will help to identify the functions of those lipids, and their structural variants, that are important in the geological record, such as hopanoids, methylhopanoids, and sterols and their alkylated counterparts.

Some secondary metabolites carry specific carbon skeletons that can, in principle, be preserved as hydrocarbon fossils. However, their concentrations will be minute, and their detection in the complex matrix of bitumens and oils will be a formidable challenge. Coupling two-dimensional gas chromatography techniques with fast multidimensional MS (such as TOF–TOF) could disentangle complex hydrocarbon mixtures and make even the minutest traces visible. In this way, fossil siderophores could yield information about the scarcity of specific ions in ancient oceans and fossil antibiotics could tell us when the first bacterivorous predators arose.

Extrapolating into the past

Specific lipids can be present in different lineages through common inheritance, convergent evolution or HGT. HGT is particularly problematic for biomarker interpretations because too little is known about its prevalence. For example, 1.6-Gyr (billion years)-old rocks from northern Australia contain steroids that are methylated at C4. The biosynthetic capacity to produce these steroids is found in all eukaryotes but also occurs in at least three unrelated groups of bacteria39. Phylogenetic trees of oxidosqualene cyclase, the first enzyme in the sterol biosynthetic pathway, show that sterol biosynthesis was laterally transferred between eukaryotes and the three bacterial groups39. The timing and direction of transfer are currently unknown. However, information about the age of the HGT events, obtained from molecular clocks, could help to unravel which of the four steroid-producing groups is the most likely source of the 1.6-Gyr-old steroids. For example, if the bacterial groups acquired oxidosqualene cyclase from eukaryotes less than 1.6 Gyr ago, then Eukarya are the most likely source. Even though such molecular clock estimates will commonly have uncertainties in the order of hundreds of millions of years, the information can be sufficient to discriminate between alternative biological sources of particularly ancient biomarkers.

One of the greatest uncertainties in the interpretation of ancient biomarkers is the possibility that suitable precursor lipids were also present in groups of organisms that are now extinct. Although this problem seems to be intractable, it should be possible to assess the probability that extinct groups acquired complete or partial biosynthetic pathways in the past, based on information about the frequency of HGT and convergent evolution in extant organisms. For example, we could estimate the probability that extinct archaea acquired steroid biosynthesis from bacteria by looking at the frequency of similar events between archaea and bacteria in the past. The frequency of HGT is probably dependent on the function of the metabolic product, the position of the newly acquired pathway in the existing metabolic network, and the classes and number of genes involved in the transfer. The frequency of HGT between two groups will also strongly depend on the evolutionary distance between them. On the basis of experimental40 and community genomic data41, it is now well-established that the frequency of genetic exchange by homologous recombination shows a log-linear dependence on sequence identity. In other words, homologous recombination events become increasingly rare as organisms become more distantly related. But can relationships be defined for other lateral transfer mechanisms? For example, although inter-domain transfers of lipid biosynthetic pathways have been documented, they seem to be rare. Massive, cultivation-independent sequence data sets can provide the required rates of HGT for different metabolic products at all hierarchical levels in the tree of life. They will also yield insights into the mechanisms of HGT, particularly the role of viruses and other mobile elements in gene transfer. Other questions that could be answered include: how commonly do viral genomes encode lipid pathways and how wide is the host range of modern viruses? Although uncertainties will remain, such as the size of the microbial biosphere in the past and changes in frequency of HGT over time, the parameters discussed above can be combined to estimate the probability of transfer events and convergent evolution in the past and to make statements about the reliability of biomarker interpretations.

Outlook

Bacteria and archaea evolved early in the Earth’s history (FIG. 1). Yet there is no diagnostic morphological evidence for the domain Bacteria in the rock record until 2.1 Gyr ago, and no fossilized cell of any age

NATURE REVIEWS | MICROBIOLOGY VOLUME 7 | AUGUST 2009 | 607 © 2009 Macmillan Publishers Limited. All rights reserved

has been recognized as an archaeon. The reason is clear: even in the exceptionally rare event that soft-bodied organisms are preserved in rock, the simple shapes and forms of almost all archaea and bacteria make it difficult to assign cellular fossils to any particular clade. Molecular fossils fill some of the vast gaps in the geological record. Diagnostic biomarkers exist for organisms that otherwise leave no traces, such as green and purple sulphur bacteria, crenarchaeotes, and bacterial and archaeal methanotrophs. These biomarkers, accompanied by isotopic signatures of biogenic materials, rare diagnostic body fossils, and deductions from molecular clocks and ancestral state reconstructions, are our window into the Earth’s microbial past (FIG. 1). Biomarkers have helped to track the response of microorganisms to cataclysmic events in the Earth’s history, such as major perturbations of the carbon cycle, anoxic oceans, major glaciations, global heating events and resulting catastrophic losses in biodiversity. As many organisms and their lipids are sensitive to environmental conditions, biomarkers have also yielded records of marine temperatures, eutrophication, anoxic and sulphidic conditions and hypersalinity.

The study of ancient microbial ecosystems using hydrocarbon biomarkers is a young field and most applications remain to be uncovered. As we fill in the gaps in our knowledge about the biosynthetic capacities of extant organisms, and link gene inventories to lipid profiles from environmental samples, we will discover new biomarkers and their meaning. This will help decipher how the Earth and its biosphere have evolved together. In particularly ancient rocks, we could discover molecules that resemble extant lipids but that are the products of an ancestral, abbreviated biosynthetic pathway. The sequences of such molecules would yield information about the timing of the evolution of lipid biosynthetic pathways directly from the geological record. We might also finally be able to adopt orphan biomarkers and use them to probe the appearance of currently unknown organisms and link their to events in the Earth’s history.

Jochen J. Brocks is at the Research School of Earth Sciences, and Centre for Macroevolution and Macroecology, The Australian National University, Canberra, ACT 0200, Australia.

Jillian Banfield is at the Department of Earth and Planetary Science, University of California, Berkeley, CA 94720-4767, USA.

Correspondence to J.J.B.
e-mail: [email protected]

doi:10.1038/nrmicro2167

1. Coolen, M. J. L. & Overmann, J. 217,000-year-old DNA sequences of green sulfur bacteria in Mediterranean sapropels and their implications for the reconstruction of the paleoenvironment. Environ. Microbiol. 9, 238–249 (2007).
2. Werne, J. P. et al. Investigating pathways of diagenetic organic matter sulfurization using compound-specific sulfur isotope analysis. Geochim. Cosmochim. Acta 72, 3489–3502 (2008).
3. Hebting, Y. et al. Biomarker evidence for a major preservation pathway of sedimentary organic carbon. Science 312, 1627–1631 (2006).
4. Peters, K. E., Walters, C. C. & Moldowan, J. M. The Biomarker Guide (Cambridge Univ. Press, UK, 2004).
5. Kuypers, M. M. M. et al. Massive expansion of marine archaea during a mid-Cretaceous oceanic anoxic event. Science 293, 92–94 (2001).
6. Schouten, S., Hopmans, E. C., Schefuss, E. & Sinninghe Damsté, J. S. Distributional variations in marine crenarchaeotal membrane lipids: a new tool for reconstructing ancient sea water temperatures? Earth Planet. Sci. Lett. 204, 265–274 (2002).
7. Sluijs, A. et al. Environmental precursors to rapid light carbon injection at the Palaeocene/Eocene boundary. Nature 450, 1218–1221 (2007).
8. Summons, R. E., Powell, T. G. & Boreham, C. J. Petroleum geology and geochemistry of the Middle Proterozoic McArthur Basin, northern Australia: III. Composition of extractable hydrocarbons. Geochim. Cosmochim. Acta 52, 1747–1763 (1988).
9. Brocks, J. J. et al. Biomarker evidence for green and purple sulphur bacteria in a stratified Paleoproterozoic sea. Nature 437, 866–870 (2005).
10. Giovannoni, S. & Stingl, U. The importance of culturing bacterioplankton in the ‘omics’ age. Nature Rev. Microbiol. 5, 820–826 (2007).
11. Achtman, M. & Wagner, M. Microbial diversity and the genetic nature of microbial species. Nature Rev. Microbiol. 6, 431–440 (2008).
12. Brocks, J. J. & Summons, R. E. in Treatise on Geochemistry Vol. 8 (ed. Schlesinger, W. H.) 63–115 (Elsevier–Pergamon, Oxford, 2004).
13. Medini, D. et al. Microbiology in the post-genomic era. Nature Rev. Microbiol. 6, 419–430 (2008).
14. Amann, R. & Fuchs, B. M. Single-cell identification in microbial communities by improved fluorescence in situ hybridization techniques. Nature Rev. Microbiol. 6, 339–348 (2008).
15. Pearson, A., Flood Page, S. R., Jorgenson, T. L., Fischer, W. W. & Higgins, M. B. Novel hopanoid cyclases from the environment. Environ. Microbiol. 9, 2175–2188 (2007).
16. Ourisson, G. in Early Life on Earth, Nobel Symposium Vol. 84 (ed. Bengtson, S.) 259–269 (Columbia Univ. Press, New York, 1994).
17. Summons, R. E., Jahnke, L. L., Hope, J. M. & Logan, G. A. 2-Methylhopanoids as biomarkers for cyanobacterial oxygenic photosynthesis. Nature 400, 554–557 (1999).
18. Wilmes, P., Simmons, S. L., Denef, V. J. & Banfield, J. F. The dynamic genetic repertoire of microbial communities. FEMS Microbiol. Rev. 33, 109–132 (2009).
19. Tyson, G. W. et al. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428, 37–43 (2004).
20. Hart, E. A. et al. Directed evolution to investigate steric control of enzymatic oxidosqualene cyclization. An isoleucine-to-valine mutation in cycloartenol synthase allows lanosterol and parkeol biosynthesis. J. Am. Chem. Soc. 121, 9887–9888 (1999).
21. Ohnuma, S. et al. A pathway where polyprenyl diphosphate elongates in prenyltransferase. Insight into a common mechanism of chain length determination of prenyltransferases. J. Biol. Chem. 273, 26705–26713 (1998).
22. Sinninghe Damsté, J. S. et al. Linearly concatenated cyclobutane lipids form a dense bacterial membrane. Nature 419, 708–712 (2002).
23. Sessions, A. L., Sylva, S. P. & Hayes, J. M. A moving-wire device for carbon-isotopic analyses of nanogram quantities of nonvolatile organic carbon. Anal. Chem. 77, 6519–6527 (2005).
24. Orphan, V. J., House, C. H., Hinrichs, K. U., McKeegan, K. D. & DeLong, E. F. Methane-consuming archaea revealed by directly coupled isotopic and phylogenetic analysis. Science 293, 484–487 (2001).
25. Thiel, V. et al. Biomarkers at the microscopic range: ToF-SIMS molecular imaging of Archaea-derived lipids in a microbial mat. Geobiology 5, 413–421 (2007).
26. Shin, D. et al. Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center. J. Struct. Funct. Genomics 8, 99–105 (2007).
27. Allen, H. K., Moe, L. A., Rodbumrer, J., Gaarder, A. & Handelsman, J. Functional metagenomics reveals diverse β-lactamases in a remote Alaskan soil. ISME J. 3, 241–251 (2008).
28. Lim, H. K. et al. Characterization of a forest soil metagenome clone that confers indirubin and indigo production on Escherichia coli. Appl. Environ. Microbiol. 71, 7768–7777 (2005).
29. Meng, J. et al. An uncultivated crenarchaeota contains functional bacteriochlorophyll a synthase. ISME J. 3, 106–116 (2009).
30. Tyson, G. W. & Banfield, J. F. Cultivating the uncultivated: a community genomics perspective. Trends Microbiol. 13, 411–415 (2005).
31. Denef, V. J., Shah, M. B., VerBerkmoes, N. C., Hettich, R. L. & Banfield, J. F. Implications of strain- and species-level sequence divergence for community and isolate shotgun proteomic analysis. J. Proteome Res. 6, 3152–3161 (2007).
32. Ram, R. J. et al. Community proteomics of a natural microbial biofilm. Science 308, 1915–1920 (2005).
33. Lo, I. et al. Strain-resolved community proteomics reveals recombining genomes of acidophilic bacteria. Nature 446, 537–541 (2007).
34. Walker, A. & Parkhill, J. Single-cell genomics. Nature Rev. Microbiol. 6, 176–177 (2008).
35. Raes, J. & Bork, P. Molecular eco-systems biology: towards an understanding of community function. Nature Rev. Microbiol. 6, 693–699 (2008).
36. Woyke, T. et al. Assembling the marine metagenome, one cell at a time. PLoS ONE 4, e5299 (2009).
37. VerBerkmoes, N. C., Denef, V. J., Hettich, R. L. & Banfield, J. F. Functional analysis of natural microbial consortia using community proteomics. Nature Rev. Microbiol. 7, 196–205 (2009).
38. Frias-Lopez, J. et al. Microbial community gene expression in ocean surface waters. Proc. Natl Acad. Sci. USA 105, 3805–3810 (2008).
39. Pearson, A., Budin, M. & Brocks, J. J. Phylogenetic and biochemical evidence for sterol synthesis in the bacterium Gemmata obscuriglobus. Proc. Natl Acad. Sci. USA 100, 15352–15357 (2003).
40. Roberts, M. S. & Cohan, F. M. The effect of DNA sequence divergence on sexual isolation in Bacillus. Genetics 134, 401–408 (1993).
41. Eppley, J. M., Tyson, G. W., Getz, W. M. & Banfield, J. F. Genetic exchange across a species boundary in the archaeal genus Ferroplasma. Genetics 177, 407–416 (2007).
42. Canfield, D. E. The early history of atmospheric oxygen: homage to Robert M. Garrels. Annu. Rev. Earth Planet Sci. 33, 1–36 (2005).
43. Canfield, D. E., Poulton, S. W. & Narbonne, G. M. Late-Neoproterozoic deep-ocean oxygenation and the rise of animal life. Science 315, 92–95 (2007).
44. Hoffman, P. F., Kaufman, A. J., Halverson, G. P. & Schrag, D. P. A Neoproterozoic snowball Earth. Science 281, 1342–1346 (1998).
45. Kirschvink, J. L. & Kopp, R. E. Palaeoproterozoic ice houses and the evolution of oxygen-mediating enzymes: the case for a late origin of photosystem II. Philos. Trans. R. Soc. Lond. B 363, 2755–2765 (2008).
46. Bekker, A. et al. Dating the rise of atmospheric oxygen. Nature 427, 117–120 (2004).
47. Anbar, A. D. et al. A whiff of oxygen before the Great Oxidation Event? Science 317, 1903–1906 (2007).
48. Kaufman, A. J. et al. Late Archean biospheric oxygenation and atmospheric evolution. Science 317, 1900–1903 (2007).
49. Wille, M. et al. Evidence for a gradual rise of oxygen between 2.6 and 2.5 Ga from Mo isotopes and Re-PGE signatures in shales. Geochim. Cosmochim. Acta 71, 2417–2435 (2007).
50. Sinninghe Damsté, J. S. et al. The rise of the rhizosolenid diatoms. Science 304, 584–587 (2004).
51. Xiao, S., Zhang, Y. & Knoll, A. H. Three-dimensional preservation of algae and animal embryos in a Neoproterozoic phosphorite. Nature 391, 553–558 (1998).
52. Love, G. D. et al. Fossil steroids record the appearance of Demospongiae during the Cryogenian period. Nature 457, 718–721 (2009).
53. Knoll, A. H., Javaux, E., Hewitt, D. & Cohen, P. Eukaryotic organisms in Proterozoic oceans. Philos. Trans. R. Soc. Lond. B 361, 1023–1038 (2006).
54. Butterfield, N. J. A vaucheriacean alga from the middle Neoproterozoic of Spitsbergen: implications for the evolution of Proterozoic eukaryotes and the Cambrian explosion. Paleobiology 30, 231–252 (2004).


55. Rasmussen, B., Bengtson, S., Fletcher, I. R. & McNaughton, N. J. Discoidal impressions and trace-like fossils more than 1200 million years old. Science 296, 1112–1115 (2002).
56. Hofmann, H. J. Precambrian microflora, Belcher Islands, Canada: significance and systematics. J. Paleontol. 50, 1040–1073 (1976).
57. Rasmussen, B., Fletcher, I. R., Brocks, J. J. & Kilburn, M. R. Reassessing the first appearance of eukaryotes and Cyanobacteria. Nature 455, 1101–1104 (2008).
58. Brocks, J. J., Summons, R. E., Buick, R. & Logan, G. A. Origin and significance of aromatic hydrocarbons in giant iron ore deposits of the late Archean Hamersley Basin in Western Australia. Org. Geochem. 34, 1161–1175 (2003).
59. Kazmierczak, J. & Altermann, W. Neoarchean biomineralization by benthic cyanobacteria. Science 298, 2351 (2002).
60. Shen, Y., Buick, R. & Canfield, D. E. Isotopic evidence for microbial sulphate reduction in the early Archaean era. Nature 410, 77–81 (2001).
61. Hayes, J. M. in Early Life on Earth (ed. Bengtson, S.) 220–236 (Columbia Univ. Press, New York, 1994).
62. Hinrichs, K.-U. Microbial fixation of methane carbon at 2.7 Ga: was an anaerobic mechanism possible? Geochem. Geophys. Geosyst. 3, 1042 (2002).
63. Rasmussen, B. Filamentous microfossils in a 3,235-million-year-old volcanogenic massive sulphide deposit. Nature 405, 2676–2679 (2000).
64. Philippot, P. et al. Early Archaean microorganisms preferred elemental sulfur, not sulfate. Science 317, 1534–1537 (2007).
65. Ueno, Y., Yamada, K., Yoshida, N., Maruyama, S. & Isozaki, Y. Evidence from fluid inclusions for microbial methanogenesis in the early Archaean era. Nature 440, 516–519 (2006).
66. Walter, M. R., Buick, R. & Dunlop, J. S. R. Stromatolites 3,400–3,500 Myr old from the North Pole area, Western Australia. Nature 284, 443–445 (1980).
67. Rosing, M. T. 13C-depleted carbon microparticles in >3700-Ma sea-floor sedimentary rocks from West Greenland. Science 283, 674–676 (1999).
68. Teeling, H., Meyerdierks, A., Bauer, M., Amann, R. & Glöckner, F. O. Application of tetranucleotide frequencies for the assignment of genomic fragments. Environ. Microbiol. 6, 938–947 (2004).
69. Abe, T., Sugawara, H., Kinouchi, M., Kanaya, S. & Ikemura, T. Novel phylogenetic studies of genomic sequence fragments derived from uncultured microbe mixtures in environmental and clinical samples. DNA Res. 12, 281–290 (2005).
70. Mavromatis, K. et al. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nature Methods 4, 495–500 (2007).
71. Karlin, S., Mrázek, J. & Campbell, A. M. Compositional biases of bacterial genomes and evolutionary implications. J. Bacteriol. 179, 3899–3913 (1997).
72. Vink, A., Schouten, S., Sephton, S. & Sinninghe Damsté, J. S. A newly discovered norisoprenoid, 2,6,15,19-tetramethylicosane, in Cretaceous black shales. Geochim. Cosmochim. Acta 62, 965–970 (1998).
73. Thiel, V. et al. Highly isotopically depleted isoprenoids: molecular markers for ancient methane venting. Geochim. Cosmochim. Acta 63, 3959–3966 (1999).
74. Schouten, S., van der Maarel, M. J., Huber, R. & Sinninghe Damsté, J. S. 2,6,10,15,19-pentamethylicosenes in Methanolobus bombayensis, a marine methanogenic archaeon, and in Methanosarcina mazei. Org. Geochem. 26, 409–414 (1997).
75. Rohmer, M., Bouvier-Navé, P. & Ourisson, G. Distribution of hopanoid triterpenes in prokaryotes. J. Gen. Microbiol. 130, 1137–1150 (1984).
76. Rashby, S. E., Sessions, A. L., Summons, R. E. & Newman, D. K. Biosynthesis of 2-methylbacteriohopanepolyols by an anoxygenic phototroph. Proc. Natl Acad. Sci. USA 104, 15099–15104 (2007).
77. Zundel, M. & Rohmer, M. Prokaryotic triterpenoids 1. 3β-methylhopanoids from Acetobacter species and Methylococcus capsulatus. Eur. J. Biochem. 150, 23–27 (1985).
78. Summons, R. E. & Jahnke, L. L. in Biological Markers in Sediments and Petroleum (eds Moldowan, J. M., Albrecht, P. & Philp, R. P.) 182–200 (Prentice Hall, New Jersey, 1992).
79. Bird, C. W. et al. Steroids and squalene in Methylococcus capsulatus grown on methane. Nature 230, 473–474 (1971).
80. Schaeffer, P., Adam, P., Wehrung, P. & Albrecht, P. Novel aromatic carotenoid derivatives from sulfur photosynthetic bacteria in sediments. Tetrahedr. Lett. 38, 8413–8416 (1997).
81. Summons, R. E. & Powell, T. G. Chlorobiaceae in Paleozoic seas revealed by biological markers, isotopes and geology. Nature 319, 763–765 (1986).
82. Brocks, J. J. & Schaeffer, P. Okenane, a biomarker for purple sulfur bacteria (Chromatiaceae), and other new carotenoid derivatives from the 1,640 Ma Barney Creek Formation. Geochim. Cosmochim. Acta 72, 1396–1414 (2008).
83. Holba, A. G. et al. 24-norcholestanes as age-sensitive molecular fossils. Geology 26, 783–786 (1998).
84. Volkman, J. K. Sterols in microorganisms. Appl. Microbiol. Biotechnol. 60, 496–506 (2003).
85. Bode, H. B. et al. Steroid biosynthesis in prokaryotes: identification of myxobacterial steroids and cloning of the first bacterial 2,3(S)-oxidosqualene cyclase from the myxobacterium Stigmatella aurantiaca. Mol. Microbiol. 47, 471–481 (2003).
86. Grantham, P. J. & Wakefield, L. L. Variations in the sterane carbon number distribution of marine source rock derived oils through geological time. Org. Geochem. 12, 61–73 (1988).
87. Moldowan, J. M. et al. Sedimentary 24-n-propylcholestanes, molecular fossils diagnostic of marine algae. Science 247, 309–312 (1990).
88. McCaffrey, M. A. et al. Paleoenvironmental implications of novel C30 steranes in Precambrian to Cenozoic age petroleum and bitumen. Geochim. Cosmochim. Acta 58, 529–532 (1994).
89. McCaffrey, A. M., Dahl, J. E. P., Sundararaman, P., Moldowan, J. M. & Schoell, M. Source rock quality determination from oil biomarkers II: a case study using tertiary-reservoired Beauford Sea oils. Am. Assoc. Pet. Geol. Bull. 78, 1527–1540 (1994).
90. Moldowan, J. M. & Talyzina, N. M. Biogeochemical evidence for dinoflagellate ancestors in the Early Cambrian. Science 281, 1168–1170 (1998).
91. Nichols, P. D., Volkman, J. K., Palmisano, A. C., Smith, G. A. & White, D. C. Occurrence of an isoprenoid C25 di-unsaturated alkene and high neutral lipid content in Antarctic sea-ice diatom communities. J. Phycol. 24, 90–96 (1988).
92. Brassell, S. C., Eglinton, G., Marlowe, I. T., Pflaumann, U. & Sarnthein, M. Molecular stratigraphy: a new tool for climatic assessment. Nature 320, 129–133 (1986).
93. Ten Haven, H. L., Rohmer, M., Rullkötter, J. & Bisseret, P. Tetrahymanol, the most likely precursor of gammacerane, occurs ubiquitously in marine sediments. Geochim. Cosmochim. Acta 53, 3073–3079 (1989).
94. Sinninghe Damsté, J. S. et al. Evidence for gammacerane as an indicator of water column stratification. Geochim. Cosmochim. Acta 59, 1895–1900 (1995).
95. van Aarssen, B. G. K., Hessels, J. K. C., Abbink, O. A. & de Leeuw, J. W. The occurrence of polycyclic sesqui-, tri-, and oligoterpenoids derived from a resinous polymeric cadinene in crude oils from southeast Asia. Geochim. Cosmochim. Acta 56, 1231–1246 (1992).
96. Moldowan, J. M. et al. The molecular fossil record of oleanane and its relation to angiosperms. Science 265, 768–771 (1994).
97. Noble, R. A., Alexander, R., Kagi, R. I. & Knox, J. Tetracyclic diterpenoid hydrocarbons in some Australian coals, sediments and crude oils. Geochim. Cosmochim. Acta 49, 2141–2147 (1985).

Acknowledgements
We thank W. Fischer and B. Rasmussen for advice on figure 1.

FURTHER INFORMATION
Jochen J. Brocks’ homepage: http://shrimp.anu.edu.au/people/jjb/jjb.html

SCIENCE AND SOCIETY

Communicable disease among displaced Afghans: refuge without shelter

Alefiyah Rajabali, Omer Moin, Amna S. Ansari, Mohammad R. Khanani and Syed H. Ali

Abstract | More than 23 years of warfare in Afghanistan has caused over 6 million Afghans to seek asylum in approximately 70 different countries, with most Afghan refugees settling in the developing countries of Pakistan and Iran. In a developing host country, poor sanitation and nutrition, overcrowding and inaccessibility to health care facilities act synergistically to influence morbidity and mortality from infectious disease in the refugee population. In this science and society article we discuss the prevalence of transmissible infection, modes of transmission, associated risk factors, and the state of health and health care in the displaced Afghan population.

Afghan refugees constitute the single largest refugee population in the world1. Although most live in Pakistan (1.9 million) and Iran (0.9 million) (FIG. 1a), a number of other countries also host Afghan refugees, including India, Uzbekistan, the United Kingdom, Germany, the Netherlands, Denmark, Sweden, Norway, Austria, Hungary, Egypt, Canada, the United States and Australia2. Three key periods in the history of Afghanistan have generated waves of migrant refugees (FIG. 1b). The first
