
Environmental Microbiology (2014) 16(3), 643–657 doi:10.1111/1462-2920.12365

Systematic evaluation of bias in microbial community profiles induced by whole genome amplification

Susana O. L. Direito,1† Egija Zaura,2 Miranda Little,1 Pascale Ehrenfreund3,4 and Wilfred F. M. Röling1*

1Molecular Cell Physiology, Faculty of Earth and Life Sciences, VU University Amsterdam, Amsterdam, The Netherlands.
2Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands.
3Leiden Institute of Chemistry, Leiden, The Netherlands.
4Space Policy Institute, Elliott School of International Affairs, Washington, Washington, DC, USA.

Received 24 August, 2013; revised 10 December, 2013; accepted 17 December, 2013. *For correspondence. E-mail wilfred.roling@falw.vu.nl; Tel. +31 20 5987192; Fax +31 20 5987223. †Present address: School of Physics and Astronomy, University of Edinburgh, Edinburgh, Midlothian EH9 3JZ, Scotland.

Summary

Whole genome amplification methods facilitate the detection and characterization of microbial communities in low biomass environments. We examined the extent to which the actual community structure is reliably revealed and factors contributing to bias. One widely used [multiple displacement amplification (MDA)] and one new primer-free method [primase-based whole genome amplification (pWGA)] were compared using a polymerase chain reaction (PCR)-based method as control. Pyrosequencing of an environmental sample and principal component analysis revealed that MDA impacted community profiles more strongly than pWGA and indicated that this related to species GC content, although an influence of DNA integrity could not be excluded. Subsequently, biases by species GC content, DNA integrity and fragment size were separately analysed using defined mixtures of DNA from various species. We found significantly less amplification of species with the highest GC content for MDA-based templates and, to a lesser extent, for pWGA. DNA fragmentation also interfered severely: species with more fragmented DNA were less amplified with MDA and pWGA. pWGA was unable to amplify low molecular weight DNA (< 1.5 kb), whereas MDA was inefficient. We conclude that pWGA is the most promising method for characterization of microbial communities in low-biomass environments and for currently planned astrobiological missions to Mars.

Introduction

Advances in molecular techniques have revolutionized microbial ecology since the 1980s (Pace et al., 1986). Polymerase chain reaction (PCR)-based culture-independent sequence analysis of 16S rRNA genes has become the most widely used method to determine the phylogenetic composition of microbial communities, circumventing the problem that most microorganisms cannot be cultured (Hawkins et al., 2002). It contributed to the description of microbial communities in low-biomass, extreme environments with resemblance to Mars (e.g. Drees et al., 2006; Pointing et al., 2009; Direito et al., 2011).

More recently, the field of microbial ecology has moved from the PCR-based analysis of single genes towards the analysis of metagenomes and functional capabilities of microbial communities through metagenomics (e.g. Tyson et al., 2004) and community-wide functional microarrays (e.g. He et al., 2007). However, obtaining sufficient DNA for these analyses can be an issue, especially for low-biomass environments (Abulencia et al., 2006). Conventional PCR (Mullis et al., 1986) is not practical to generate sufficient DNA template as it requires prior sequence information to define primers that target specific sequences. Multiple displacement amplification (MDA) has been used to amplify whole genomes from low-biomass environments for use as template in metagenomics (Abulencia et al., 2006). Functional microarrays are often hybridized with DNA that is first amplified by MDA (Wu et al., 2006).

Whole genome amplification (WGA) methods, like MDA, offer yet another advantage over PCR in describing microbial communities in low-biomass, extreme Mars analogues, and possibly even astrobiology missions to Mars; the random amplification of DNA by WGA methods may aid the detection of life, as even on Earth rRNA gene-directed PCR can miss species (e.g. Huber et al., 2002). Furthermore, virus-like entities might be present on other planets, as viral entities are very primordial

© 2013 Society for Applied Microbiology and John Wiley & Sons Ltd 644 S. O. L. Direito et al.

(Forterre, 2006). WGA methods enable obtaining sufficient DNA for the analysis of viral communities (Kim and Bae, 2011).

MDA requires Bacillus subtilis bacteriophage phi29 DNA polymerase and a mixture of random exonuclease-resistant primers (hexamers) to perform isothermal amplification at low temperature (30°C) (Dean et al., 2002). MDA provides a more complete genomic coverage and less biased amplification compared with other WGA methods (Lasken and Egholm, 2003; Vora et al., 2004). However, MDA is also subject to biases (preferential amplification of certain sequences or genetic loci). The observed biases have been suggested to relate to DNA fragment length (Abulencia et al., 2006), GC content (Bredel et al., 2005; Pugh et al., 2008; Yilmaz et al., 2010), chromosome position (Lage et al., 2003; Pugh et al., 2008) and stochastic effects originating from amplification of very low amounts of template (Raghunathan et al., 2005). The causes of bias are still poorly understood and have not been systematically studied in a microbial ecology context, where the genetic composition of samples can be very complex due to the generally large diversity in species and the genes they contain.

Another drawback of MDA is that it can give rise to false-positive results because of contaminating exogenous DNA or endogenous template-independent primer-primer interactions (Zhang et al., 2006). A recent primase-based WGA (pWGA) method is truly primer-independent, which would aid in diminishing false-positive results for low-biomass environments. pWGA employs bacteriophage T7 gene 4 protein (gp4) primase to synthesize primers on-template, excluding the need to add synthetic primers (Li et al., 2008). Gp4 has both primase and helicase activity; the helicase opens the DNA double strand for both priming and polymerization, thereby removing the need for a denaturing step (Vincent et al., 2004), and the primase generates the primers by recognizing the sequences 3′-CTGG(G/T)-5′ or 3′-CTGTG-5′ (Li et al., 2008). So far only a few pWGA studies have been performed (Li et al., 2008; Schaerli et al., 2010; Tate et al., 2012), all unrelated to microbial ecology.

Here, we systematically evaluated and compared biases in microbial community profiles generated by MDA and pWGA. Species with lower GC content might be over-represented after amplification because DNA with higher GC content is more stable and difficult to denature (Marmur and Doty, 1959). Excessive fragmentation of Gram-negative DNA can occur as these cells are easier to break during mechanical cell lysis than Gram-positives (Schneegurt et al., 2003). Fragmented DNA is possibly more difficult to amplify. We present a detailed investigation of the MDA and pWGA methods for influence of GC content, DNA fragmentation and DNA size.

Results

The effect of two WGA methods, MDA and pWGA, on 16S rRNA gene-based community profiling was examined and compared with a control for which the DNA template was not subjected to WGA. Throughout the Results and Discussion sections, ‘MDA’ and ‘pWGA’ sources refer to the use of template that was generated by these WGA methods. ‘Control’ refers to the control, in which only PCR was used to generate community profiles.

Evaluation of WGA of an environmental sample

The impact of MDA and pWGA on community representation in a Cu-impacted soil sample was already evident after principal coordinate analysis (Fig. 1A) and cluster analysis on denaturing gradient gel electrophoresis (DGGE) profiles (Supporting Information Fig. S1A). As triplicates were highly reproducible (Control > 94%, MDA > 92% and pWGA > 82% similarity), two independently derived DNA sources per treatment were subjected to high-resolution profiling by pyrosequencing. After quality filtering, a total of 8247 reads [1375 average per sample, standard deviation = 985, range: 620–3204] with 615 unique sequences were obtained, corresponding to 564 different operational taxonomic units (OTUs). Control samples (not subjected to WGA) and samples first subjected to WGA (MDA and pWGA) revealed a similar species richness (Table 1); the numbers of OTUs [analysis of variance (ANOVA), F = 5.0, P = 0.11] and Chao1 species richness estimator based on incidence data (ANOVA, F = 0.27, P = 0.78) did not differ significantly. However, the Shannon diversity index (which combines species richness and abundances) did (ANOVA, F = 87.2, P = 0.002), with a slightly decreased index for pWGA, compared with the control, and lowest value for MDA (Table 1).

Reads were classified into 26 different phyla, and only a very small fraction remained unclassified (Control = 0.2% ± 0.1%, MDA = 0.6% ± 0.0% and pWGA = 0.1% ± 0.1%). Acidobacteria, Actinobacteria, Bacteroidetes, Chloroflexi, Firmicutes and [...] were the most represented phyla, and an examination at the phylum level revealed an impact of WGA on the relative abundances of phyla. Samples subjected only to PCR (Control samples) highlighted more Acidobacteria (21.9% ± 1.0%) than did pWGA- and MDA-based DNA sources (14.3% ± 1.3% and 12.7% ± 1.3% respectively) and also more Actinobacteria (23.7% ± 2.5%) than did pWGA (16.5% ± 0.2%) and MDA (3.1% ± 1.0%), while MDA strongly favoured the amplification of Bacteroidetes (25.7% ± 0.4%) as compared with the control (1.9% ± 0.0%) and pWGA (4.3% ± 1.1%), and Firmicutes (24.4% ± 1.9%) as compared with the control (10.0% ± 0.3%) and pWGA (21.2% ± 5.4%).
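The richness and diversity measures compared above can be illustrated with a minimal sketch. It assumes per-OTU read counts for a single sample; the counts below are hypothetical and are not data from this study. The Chao1 form used here is the common bias-corrected, abundance-based variant, which is an assumption and may differ from the estimator variant used for Table 1.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 (assumed variant): S_obs + F1*(F1-1) / (2*(F2+1)).

    F1 and F2 are the numbers of OTUs seen exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

# Hypothetical OTU read counts for one sample (illustration only).
otu_counts = [50, 30, 10, 5, 2, 2, 1, 1, 1]
diversity = shannon_index(otu_counts)
richness = chao1(otu_counts)
```

Because the Shannon index weights OTUs by their relative abundance, it can differ between treatments even when the OTU count and Chao1 do not, which matches the pattern reported in Table 1.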


Fig. 1. Impact of WGA approaches on bacterial community representation for a Cu-polluted soil sample from the Wildekamp, the Netherlands. A. Ordination plot of principal coordinate analysis on a matrix containing Pearson correlations between 16S rRNA gene-based DGGE profiles. B. Ordination biplot of PCA on OTUs derived from 454 pyrosequencing. Samples are displayed by dots; lines represent the loadings of OTUs (see also Table 2). Only loadings higher than 0.1 or lower than −0.1 are shown. PC1 and PC2 explained 65.8% and 23.7% of variance respectively.
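The Pearson correlation matrix that underlies the ordination in panel A can be sketched as follows. The band-intensity profiles are hypothetical (one relative intensity per DGGE band position), and the function names are illustrative rather than taken from any software used in the study.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long intensity profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(profiles):
    """Pairwise Pearson correlations, keyed by (sample, sample)."""
    names = list(profiles)
    return {(p, q): pearson(profiles[p], profiles[q]) for p in names for q in names}

# Hypothetical relative band intensities at five gel positions.
profiles = {
    "Control": [0.30, 0.25, 0.20, 0.15, 0.10],
    "MDA": [0.05, 0.10, 0.15, 0.30, 0.40],
    "pWGA": [0.20, 0.25, 0.20, 0.20, 0.15],
}
matrix = correlation_matrix(profiles)
```

A matrix of this kind is the usual input for principal coordinate analysis or for clustering of gel profiles.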

Table 1. Diversity statistics for control samples not subjected to WGA (‘Control’) and samples subjected to WGA (i.e. either ‘MDA’ or ‘pWGA’), derived from pyrosequencing and assigning reads to genera.

| Samples | Averages of number of OTUs | Chao1 | Shannon H |
|---|---|---|---|
| Control | 176 ± 8 | 278 ± 70 | 4.6 ± 0.0* |
| MDA | 157 ± 6 | 263 ± 19 | 4.1 ± 0.0* |
| pWGA | 167 ± 1 | 246 ± 22 | 4.3 ± 0.0* |

*Significantly different (P < 0.05) between samples.
The averaged (n = 2) number of OTUs, Chao1 species richness estimator and Shannon diversity index, with standard deviation, are shown. The data used for this analysis was a subsampling of 620 reads from each individual sample.

The impact of WGA on community profiles became more apparent at higher phylogenetic resolution (Fig. 2), and when combined with statistical (Table 2), ordination (Fig. 1B) and cluster (Supporting Information Fig. S1B and C) analysis. Figure 2 shows strong effects of WGA on the reported community structure at the family level. Cluster analysis of the pyrosequencing data after assigning reads to genera (Supporting Information Fig. S1B) and weighted UniFrac cluster analysis at the OTU level (Supporting Information Fig. S1C) had Control and pWGA clustering closer together. Replicates were highly reproducible (> 96% similarity; Supporting Information Fig. S1B). These two cluster analyses differ from the DGGE profile clustering seen in Supporting Information Fig. S1A, where both WGA methods grouped close to each other. However, this difference might relate to the low resolution of DGGE in comparison with pyrosequencing. In addition, principal coordinate analysis (Fig. 1A) on DGGE data and principal component analysis (PCA) on the pyrosequencing data revealed a similar picture (Fig. 1B); along the PC1 axis (explaining 66% of

Fig. 2. Relative abundance (%) of reads attributed to different families for control samples not subjected to WGA (Control.1 and Control.2) and samples subjected to WGA (MDA.1, MDA.2, pWGA.1 and pWGA.2).
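Relative abundances such as those plotted here are typically computed after bringing all samples to the same read depth; the note under Table 1 mentions a subsampling to 620 reads per sample. The sketch below illustrates both steps under stated assumptions: the read labels, the fixed random seed and the helper names are hypothetical, and the sampling is a plain random draw without replacement rather than the procedure of any specific pipeline.

```python
import random
from collections import Counter

def subsample(reads, depth, seed=0):
    """Draw `depth` reads without replacement so samples share one depth."""
    rng = random.Random(seed)
    return rng.sample(reads, depth)

def relative_abundance(reads):
    """Percentage of reads assigned to each taxon."""
    counts = Counter(reads)
    total = len(reads)
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}

# Hypothetical family assignments for 1000 reads of one sample.
reads = ["Bacillaceae"] * 400 + ["Clostridiaceae"] * 300 + ["Chitinophagaceae"] * 300
rarefied = subsample(reads, 620)
family_profile = relative_abundance(rarefied)
```

Subsampling to the smallest library (620 reads in this data set) avoids richness differences that only reflect unequal sequencing depth.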

variance), the MDA-based DNA templates and controls were most distant from each other and the pWGA-based templates were in between. Along the PC2 axis (explaining 24% of variance), the pWGA-based templates were separated from the MDA-based templates and controls.

Sixteen (in bold in Table 2) of the 30 most abundant OTUs showed significant differences, and many of the most abundant OTUs also contributed strongly to PC1 and/or PC2 (indicated in light grey in Table 2). The four OTUs with strongest contribution to PC1 and PC2 (Fig. 1B) were Bacillus (#73890), which was the most abundant OTU (Table 2) and was significantly higher in whole genome amplified templates (pWGA and MDA); Roseiflexus (#33), which was only amplified with pWGA; Ramlibacter (#15036), more amplified with pWGA; and Clostridium (#67881), more amplified with MDA. More amplified with MDA were also Flavisolibacter (#92578) and Chitinophagaceae (#39439). Acidobacteria Gp6 (#21274), Acidothermus (#5217) and Burkholderiaceae (#14209) were significantly more abundant in control samples than in WGA-based templates. Blastococcus (#86789) was only amplified in controls and pWGA. Especially the most abundant OTUs appeared to differ significantly between amplification methods (11 out of the first 15 most abundant OTUs, Table 2).

A regression was performed on the PC values using a combination of continuous (GC content, genome size) and categorical (Gram-staining) explanatory variables (indicated in Table 2). Gram staining was used as an indicator for DNA integrity; Gram-negative microorganisms are easier to lyse than Gram-positives (Schneegurt et al., 2003), and therefore, their DNA may be more fragmented during bead beating. While neither GC content nor Gram-staining could explain PC2 values, a significant positive effect of GC content on PC1 values (P < 0.0001) was observed, with low GC bacteria being relatively over-represented in pWGA and even more in MDA samples. A marginal effect for Gram-staining (P = 0.065) was observed. Therefore, subsequently, GC content and DNA integrity were individually investigated using defined mixtures of species. We also tested for genome size, but no significant relation with PC1 or PC2 was observed.

Influence of species GC content on microbial community profiles

DGGE analysis (Fig. 3A) on a defined mixture of intact genomic DNA of six species with different GC contents (Table 3) showed that WGA did affect the part of the community that was detected and that there was also an effect of the method of WGA on what was detected. Control samples clustered more closely with the MDA-based DNA source (similarities > 85%) than with the pWGA-based template (Supporting Information Fig. S2: similarities > 78%). However, PCA on the DGGE band intensities (Fig. 3B) revealed a similar separation of samples in two-dimensional space as that obtained for the experiment utilizing environmental DNA (Fig. 1); control and MDA replicates were separated along the X-axis, with pWGA in between, but separated along the Y-axis from the Control and MDA samples. pWGA-based replicates were more reproducible than MDA-based replicates (Fig. 3 and Supporting Information Fig. S2).

Significant differences occurred in particular in the amplification of Deinococcus radiodurans (one-way ANOVA: F = 60, P = 0.0001) and Paracoccus denitrificans (F = 68, P = 0.0001). Deinococcus radiodurans was less amplified after MDA (2% ± 2%) and pWGA (0% ± 0%) than with Control only (13% ± 2%). Paracoccus denitrificans was also under-represented by MDA but was preferably amplified with pWGA (pWGA: 23% ± 1%; Control: 8% ± 3%; MDA: 2% ± 2%). Because these two species (D. radiodurans and P. denitrificans) have the highest GC content (both 67%), the influence of GC content on representation of this defined mock community after MDA was comparable with that of the undefined, environmental sample (Fig. 1).

Impact of DNA fragmentation on microbial community profiles

A negative effect of DNA fragmentation on the relative contribution of the corresponding species to community profiles was revealed after mixing equal amounts of intact or ClaI-digested DNA of Bacillus cereus and Staphylococcus aureus (having comparable GC contents, see Table 3, and without ClaI restriction sites in their targeted 16S rRNA genes), subjecting these four mixtures to pWGA or MDA, and performing PCR and DGGE on their products (Table 4). When DNA of both species in the original DNA template was either intact or fragmented, there was no preferential amplification of the DNA of either species relative to each other, irrespective of whether the template was directly assessed by PCR or first subjected to MDA (Table 4). Differences in DNA sizes did not affect community representation when the DNA template was only subjected to PCR, without prior WGA. However, after MDA amplification, the relative band surface for S. aureus was higher (66% ± 2%) when only B. cereus DNA in the original mixture was fragmented and lower (32% ± 2%) when only S. aureus DNA was fragmented.

With pWGA amplified template, when DNA of both species in the original DNA template was intact, there was no significant effect (one-sample t-test, P > 0.05) on community profiles either; the observed relative band surface for S. aureus (44% ± 9%) corresponded to the expected 50%. However, fragmentation had a stronger effect on


Table 2. Overview of the 30 most abundant OTUs, obtained by 454 pyrosequencing on a DNA template from Wildekamp, the Netherlands, that had been subjected to three different treatments.

Relative abundance (%) is given as mean and SD for Control, MDA and pWGA; F and P are from one-way ANOVA (significance at P < 0.05).

| No. of sequences (from total 8247) | Taxon | Gram stain | Genome size (Mb) | GC content (%) | PC1 loading | PC2 loading | Control mean | Control SD | MDA mean | MDA SD | pWGA mean | pWGA SD | F | P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1138 | Bacillus | + | 4.8 ± 0.8 (n = 36) [1] | 37 ± 4 (n = 10) [1] | −0.609 | −0.541 | 6.89a | 0.46 | 17.44b | 0.49 | 16.03a,b | 4.21 | 10.85 | 0.042 |
| 344 | Acidobacteria Gp6 | – | 1.2 ± 0.7 (n = 8) [1] | 60 ± 2 (n = 7) [1] | 0.239 | 0.043 | 5.98c | 0.1 | 1.7a | 0.13 | 3.72b | 0.27 | 283.44 | < 0.001 |
| 202 | Rhodocyclaceae | – | 4.3 (n = 1) [1] | 65 ± 4 (n = 5) [1] | 0.174 | −0.087 | 3.34 | 0.74 | 0.86 | 0.16 | 3.37 | 2.47 | 1.87 | 0.297 |
| 198 | Ramlibacter | – | 4.1 (n = 1) [2] | 70 (n = 1) [2] | 0.055 | −0.473 | 0.56a | 0.21 | 0.31a | 0.02 | 4.98b | 1.13 | 31.15 | 0.01 |
| 191 | Anaerolineae | – | 3.5 (n = 1) [1] | 54 (n = 1) [1] | 0.215 | −0.038 | 3.67 | 0.64 | 0.16 | 0.01 | 2.63 | 1.42 | 8.03 | 0.062 |
| 160 | Acidothermus | + | 2.4 (n = 1) [1] | 67 (n = 1) [1] | 0.16 | 0.066 | 3.32b | 0.13 | 0.37a | 0.53 | 1.38a | 0.04 | 45.24 | 0.006 |
| 136 | Sporosarcina | + | 3.7 (n = 1) [2] | 42 (n = 1) [2] | −0.074 | −0.033 | 0.97 | 0.28 | 2.15 | 0.98 | 1.65 | 0.81 | 1.24 | 0.406 |
| 134 | Roseiflexus | – | 5.8 ± 0.0 (n = 2) [1] | 61 ± 0 (n = 2) [1] | 0.024 | −0.373 | 0.00a | 0 | 0.00a | 0 | 3.49b | 0.21 | 561.31 | < 0.001 |
| 131 | Micrococcaceae | + | 4.2 ± 0.9 (n = 18) [1] | 64 ± 7 (n = 14) [1] | 0.07 | −0.092 | 1.59b | 0.21 | 0.39a | 0.09 | 1.89b | 0.2 | 42.19 | 0.006 |
| 129 | Clostridium | + | 4.4 ± 0.9 (n = 41) [1] | 30 ± 3 (n = 10) [1] | −0.433 | 0.366 | 0.46a | 0.26 | 7.86b | 0.75 | 0.37a | 0.04 | 174.51 | 0.001 |
| 118 | Blastococcus | + | 4.9 (n = 1) [2] | 73 (n = 1) [2] | 0.106 | −0.038 | 1.91b | 0.66 | 0.00a | 0 | 1.40a,b | 0.27 | 11.49 | 0.039 |
| 108 | Burkholderiaceae | – | 7.9 ± 0.9 (n = 36) [1] | 66 ± 2 (n = 10) [1] | 0.085 | 0.019 | 1.96b | 0.04 | 0.38a | 0.31 | 1.05a | 0.15 | 31.52 | 0.01 |
| 96 | Acidobacteria Gp6 | – | 1.2 ± 0.7 (n = 8) [1] | 60 ± 2 (n = 7) [1] | 0.073 | 0.006 | 1.66b | 0.08 | 0.30a | 0.2 | 0.97a,b | 0.22 | 28.46 | 0.011 |
| 91 | Nitrospira | – | 4.3 (n = 1) [1] | 54 (n = 1) [1] | −0.032 | 0.085 | 1.28 | 0.55 | 1.8 | 0.43 | 0.76 | 0.06 | 3.28 | 0.176 |
| 90 | Flavisolibacter | – | N.A. | 43 ± 1 (n = 2) [3] | −0.188 | 0.098 | 0.33a | 0.08 | 3.67b | 0.97 | 0.98a | 0.33 | 18.03 | 0.021 |
| 89 | Nitrosomonadaceae | – | 3.2 ± 0.5 (n = 4) [1] | 49 ± 4 (n = 4) [1] | 0.079 | −0.023 | 1.54 | 0.73 | 0.22 | 0.32 | 1.19 | 0.41 | 3.52 | 0.164 |
| 82 | Candidatus Solibacter | – | 10.0 (n = 1) [1] | 62 (n = 1) [1] | 0.011 | 0.026 | 1.18 | 0.51 | 1.02 | 0.38 | 0.93 | 0.21 | 0.22 | 0.812 |
| 80 | Xanthobacteraceae | – | 3.1 ± 2.6 (n = 3) [1] | 68 ± 0 (n = 3) [1] | 0.057 | −0.011 | 1.29b | 0.17 | 0.32a | 0.24 | 0.99a,b | 0.12 | 14.77 | 0.028 |
| 78 | Solirubrobacterales | + | 8.0 ± 1.9 (n = 2) [1] | 73 (n = 1) [1] | 0.06 | −0.023 | 1.23b | 0.2 | 0.16a | 0.23 | 0.97b | 0.04 | 20.16 | 0.018 |
| 74 | Terrabacter | + | 4.9 (n = 1) [2] | 71 (n = 1) [4] | 0.05 | 0.049 | 1.43 | 0.02 | 0.4 | 0.57 | 0.51 | 0.34 | 4.33 | 0.131 |
| 73 | Acidobacteria Gp4 | – | 1.2 ± 0.7 (n = 8) [1] | 60 ± 2 (n = 7) [1] | 0.01 | 0.024 | 1.06 | 0.33 | 0.85 | 0.28 | 0.73 | 0.08 | 0.86 | 0.507 |
| 72 | Conexibacter | + | 6.0 ± 0.5 (n = 2) [1] | 73 (n = 1) [1] | 0.072 | 0.02 | 1.41 | 0.83 | 0.08 | 0.11 | 0.64 | 0.15 | 3.71 | 0.155 |
| 71 | Acidobacteria Gp1 | – | 1.2 ± 0.7 (n = 8) [1] | 60 ± 2 (n = 7) [1] | 0 | −0.027 | 0.74 | 0.41 | 0.78 | 0.26 | 1.02 | 0.08 | 0.57 | 0.616 |
| 64 | Arthrobacter | + | 4.6 ± 0.2 (n = 14) [1] | 64 ± 3 (n = 5) [1] | 0.034 | −0.093 | 0.56a,b | 0.4 | 0.07a | 0.11 | 1.23b | 0.16 | 10.54 | 0.044 |
| 63 | Acidobacteria Gp2 | – | 1.2 ± 0.7 (n = 8) [1] | 60 ± 2 (n = 7) [1] | −0.078 | 0.081 | 0.67 | 0.12 | 2 | 0.77 | 0.47 | 0.09 | 6.68 | 0.079 |
| 61 | Bacillus | + | 4.8 ± 0.8 (n = 36) [1] | 37 ± 4 (n = 10) [1] | −0.062 | 0.004 | 0.33 | 0.11 | 1.41 | 0.51 | 0.77 | 0.14 | 5.91 | 0.091 |
| 59 | Chitinophagaceae | – | 7.0 ± 2.0 (n = 5) [2] | 41 ± 2 (n = 4) [3,5] | −0.178 | 0.134 | 0.20a | 0.08 | 3.23b | 1.14 | 0.22a | 0.13 | 13.7 | 0.031 |
| 58 | Conexibacter | + | 6.0 ± 0.5 (n = 2) [1] | 73 (n = 1) [1] | 0.035 | −0.027 | 0.83 | 0.1 | 0.32 | 0.24 | 0.92 | 0.42 | 2.64 | 0.218 |
| 58 | Aciditerrimonas | + | N.A. | 74 (n = 1) [6] | 0.062 | −0.001 | 1.13b | 0.04 | 0.08a | 0.11 | 0.72a,b | 0.31 | 14.84 | 0.028 |
| 58 | | – | N.A. | 67 (n = 1) [7] | 0.028 | −0.014 | 0.8 | 0.23 | 0.24 | 0.12 | 0.64 | 0.33 | 2.89 | 0.2 |

Pyrosequencing was either directly performed on the isolated DNA (‘Control’) or after prior WGA with MDA or pWGA. Taxa occurrence and names, in combination with their Gram stain, genome sizes (Mb) and GC content (%), relative abundance (%) [average over two independent amplifications with standard deviation (SD)], PCA loadings of principal components (PC) 1 and 2, and results of one-way ANOVA are indicated. Shaded area of the Table: OTUs which contributed with loadings > 0.1 or < −0.1 to PC1 and PC2. OTUs in bold indicate significant differences between treatments. No correction for multiple testing was performed; the False Discovery Rate was 9%. In the columns for relative abundance, the letters a, b and c indicate the homogeneous subsets to which the averages belong (post hoc tests, Tukey, P < 0.05). GC contents and genome sizes of species belonging to the identified genus were obtained from: [1] http://img.jgi.doe.gov/; [2] EzGenome database; [3] Yoon and Im, 2007; [4] Montero-Barrientos et al., 2005; [5] Lim et al., 2009; [6] Itoh et al., 2011; [7] Kämpfer et al., 2006. N.A. indicates information not available.
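The F and P columns of Table 2 can be approximately recomputed from the reported means and standard deviations. The sketch below assumes a balanced one-way ANOVA with n = 2 replicates per treatment, as stated in the table notes, and uses the closed-form upper tail of the F distribution that holds when the numerator has two degrees of freedom (three treatments). Function names are illustrative, and small deviations from the tabulated values are expected because the inputs are rounded.

```python
def anova_from_summary(means, sds, n):
    """One-way ANOVA F statistic from group means and SDs (equal group size n)."""
    k = len(means)
    grand_mean = sum(means) / k
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_within = sum(sd ** 2 for sd in sds) / k  # pooled variance for equal n
    df_between, df_within = k - 1, k * (n - 1)
    return ms_between / ms_within, df_between, df_within

def p_value_three_groups(f_stat, df_within):
    """Upper-tail P of F(2, df_within): (1 + 2F/df_within) ** (-df_within/2)."""
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)

# Most abundant Bacillus OTU in Table 2: Control, MDA, pWGA (mean, SD), n = 2.
f_stat, df1, df2 = anova_from_summary([6.89, 17.44, 16.03], [0.46, 0.49, 4.21], n=2)
p_value = p_value_three_groups(f_stat, df2)
```

For this row the sketch gives F close to 10.8 and P close to 0.042, in line with the tabulated 10.85 and 0.042.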

Fig. 3. (A) DGGE (30–55% denaturant gradient) profiles of bacterial 16S rRNA gene fragments in a defined mixture of six different species (Bacillus cereus, Deinococcus radiodurans R1, Desulfitobacterium hafniense Y51, Escherichia coli CL4B, Paracoccus denitrificans Pd1222 and Shewanella putrefaciens 200R; Table 3), after PCR (with GC-clamp) was performed on the amplification products of MDA, PCR (Control), or pWGA, as indicated. (B) Ordination biplot of PCA on the DGGE band intensities. Samples are displayed by dots, lines represent the loadings.

community profiles after pWGA than observed after MDA. When the DNA of both species had been fragmented, S. aureus 16S rRNA genes were less well amplified (27% ± 2%). When S. aureus DNA was intact and B. cereus DNA was fragmented, this value rose for S. aureus (76% ± 1%). However, when S. aureus DNA was fragmented and B. cereus DNA was intact, 16S rRNA genes of S. aureus could not be detected (Table 4).


Table 3. Properties of the seven species used in this study and the number of 16S rRNA molecules used for the experiment on impact of GC content.

| Species | Gram stain (+/−) | Genome size (Mb) | Genomic GC content (%) | 16S rRNA gene copy numbers in genome | 16S rRNA copy numbers used in defined mixture |
|---|---|---|---|---|---|
| Bacillus cereus | + | 5.4 | 36 | 12 | 5.3 × 10^5 |
| Deinococcus radiodurans R1 | + | 3.3 | 67 | 3 | 1.1 × 10^6 |
| Desulfitobacterium hafniense Y51 | + | 5.7 | 47 | 6 | 2.8 × 10^5 |
| Escherichia coli CL4B | – | 4.6 | 51 | 7 | 1.5 × 10^6 |
| Paracoccus denitrificans Pd1222 | – | 5.2 | 67 | 3 | 4.6 × 10^5 |
| Shewanella putrefaciens 200R | – | 4.8 | 45 | 5 | 1.2 × 10^6 |
| Staphylococcus aureus | + | 2.9 | 33 | 5 | n.a. |

Genome sizes, GC content and number of 16S rRNA gene copies were taken from http://img.jgi.doe.gov/. n.a.: not applied in mixture to test impact of GC content.

Suitability of WGA for low molecular weight DNA

As a follow-up on the experiment with DNA fragmentation, WGA on 10 ng of template consisting of defined small fragments (either 0.2 or 1.5 kb long 16S rRNA gene PCR fragment derived from a single clone; see Experimental procedures) was performed. MDA resulted in visible amplification products between 1 and > 10 kb size, comparable with the positive control. For some replicates, amplification products were mostly > 10 kb size (lanes 0.2 kb-c, 1.5 kb-b and 1.5 kb-c in Fig. 4A). On the other hand, pWGA amplification did not give well-reproducible results, as it generated visible amplification for only a few samples (lanes 0.2 kb-b, 1.5 kb-a and 1.5 kb-c in Fig. 4B). The positive reactions revealed mostly > 10 kb size products, comparable with the positive control (Fig. 4B). Both methods amplified the targeted 16S rRNA gene fragment without introducing major alterations in DNA sequences because DGGE banding positions were not affected by WGA (Fig. 4C). Quantification revealed that for MDA, 10 ng of 0.2 kb-sized original template generated on average 3.3 (± 2.7) × 10^3 ng double-stranded DNA (dsDNA), of which only 3% ± 2% was PCR-amplifiable, while 10 ng of 1.5 kb-sized template resulted in 1.7 (± 0.7) × 10^3 ng dsDNA, of which 24% ± 8% was PCR-amplifiable (Table 5). For the successful pWGA reactions, 10 ng of 0.2 kb fragment size resulted in 144 ng dsDNA of which only 1.4 ng was amplifiable, while 10 ng of 1.5 kb fragments resulted in 32 ± 37 ng dsDNA of which 16 ± 19 ng was amplifiable, indicating that pWGA hardly increased the amount of PCR-amplifiable DNA, and in case of the 0.2 kb template, it even decreased.

Discussion

WGA techniques are gaining ground in microbial ecology (Binga et al., 2008). However, any bias in the techniques could lead to erroneous assessment of microbial community structure. Consequently, the systematic determination of the biases inherent to the various WGA methodologies is of great relevance. This study revealed that factors that

Table 4. Impact of DNA fragmentation on community representation after whole genome amplification (MDA or pWGA).

| Source of DNA in PCR-DGGE | S. aureus DNA | B. cereus DNA | Relative band surface (%) for S. aureus |
|---|---|---|---|
| Original DNA | Intact | Intact | 50 ± 1 |
| Original DNA | Fragmented | Fragmented | 50 ± 1 |
| Original DNA | Intact | Fragmented | 50 ± 1 |
| Original DNA | Fragmented | Intact | 50 ± 1 |
| MDA amplified DNA | Intact | Intact | 50 ± 2 |
| MDA amplified DNA | Fragmented | Fragmented | 50 ± 2 |
| MDA amplified DNA | Intact | Fragmented | 66 ± 2 |
| MDA amplified DNA | Fragmented | Intact | 32 ± 2 |
| pWGA amplified DNA | Intact | Intact | 44 ± 9 |
| pWGA amplified DNA | Fragmented | Fragmented | 27 ± 2 |
| pWGA amplified DNA | Intact | Fragmented | 76 ± 1 |
| pWGA amplified DNA | Fragmented | Intact | 0 ± 0 |

WGA was performed on DNA mixtures of B. cereus and S. aureus with all possible combinations of intact and fragmented DNA and with equal 16S rRNA copy numbers. The average relative DGGE band surfaces (%), with standard deviation, for S. aureus are presented (n = 3).
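Whether a band surface in Table 4 deviates from the expected 50% can be checked with a one-sample t statistic computed from the reported mean, standard deviation and n = 3. The sketch below is an illustration under that assumption; the critical value 4.303 is the standard two-sided 5% value of Student's t with 2 degrees of freedom, and the function name is hypothetical. Rows with a standard deviation of 0 are left out because the statistic is undefined there.

```python
import math

T_CRIT_DF2 = 4.303  # two-sided 5% critical value of Student's t, 2 degrees of freedom

def one_sample_t(mean, sd, n, expected):
    """t statistic for H0: population mean equals `expected`."""
    return (mean - expected) / (sd / math.sqrt(n))

# pWGA rows of Table 4 (mean ± SD of S. aureus band surface, n = 3).
t_both_intact = one_sample_t(44, 9, 3, expected=50)
t_both_fragmented = one_sample_t(27, 2, 3, expected=50)

both_intact_differs = abs(t_both_intact) > T_CRIT_DF2
both_fragmented_differs = abs(t_both_fragmented) > T_CRIT_DF2
```

With these inputs the intact/intact mixture does not differ significantly from 50%, whereas the mixture with both DNAs fragmented does, consistent with the description of the pWGA results.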


Fig. 4. Effects of MDA and pWGA amplification on low molecular weight DNA (0.2 and 1.5 kb indicate the length of the 16S rRNA gene fragment in the DNA template used in amplification, and a, b and c indicate replicates of each amplification). (A,B) Yields of products resulting from MDA (A) and pWGA (B) amplification, as revealed by 1.2% agarose gel electrophoresis; N, negative control; P, positive control of respective kits; L, 1 kb DNA ladder (Promega); (C) DGGE (30–55% denaturant gradient) profiles of 16S rRNA gene fragments: Control refers to the original template for PCR-DGGE, MDA and pWGA refer to the use of WGA product as template for PCR-DGGE; M12, marker consisting of a mixture of 12 different bacterial 16S rRNA gene fragments.

influence the veracity of two different types of WGA methods (MDA and pWGA) include GC content, DNA integrity and DNA size.

Environmental DNA extracts comprise a complex mixture of DNA from different species, differing in GC content and in DNA integrity. GC contents in microbial genomes can vary radically from < 20% to > 70% (Hildebrand et al., 2010). Species with lower GC content may become over-represented when community profiles are generated by amplification-based methods because DNA with higher GC content is more stable and more difficult to denature (Marmur and Doty, 1959). The influence of GC content has been inferred previously for PCR (Reysenbach et al., 1992) and was briefly reported for a few phi29 DNA polymerase-based MDA kits including the MDA kit used in this study (Yilmaz et al., 2010), but was not yet reported for pWGA. Through a systematic evaluation, we showed here that there is a clear influence of GC content on microbial community profiles after WGA. Detailed statistical analysis on the basis of 454 pyrosequencing of an environmental sample revealed that bias towards GC was much stronger for MDA than for pWGA. Genera that contributed strongly to the separation along the PCA axes or to the observed variation included low GC content Gram-positive Firmicutes such as Bacillus and Clostridium. That pWGA was less biased by GC content was also evident from the observation that members belonging to high GC content, Gram-negative genera, i.e. Roseiflexus (Chloroflexi: 61% GC content) and Ramlibacter (Betaproteobacteria: 70% GC) were most abundant in pWGA-based templates. In addition, other high GC content taxa, like Anaerolineae (54% GC) and Rhodocyclaceae (∼65% GC) occurred in similar abundances after pWGA and in the control method. The lower susceptibility of pWGA to GC content may be related to the helicase that opens the DNA double helix in this method (Vincent et al., 2004), while MDA relies on an initial denaturation step.

Next to GC content, also other factors may have contributed to the observed differences in the experiment employing an environmental DNA template. Possibly, these factors may interfere with relating the differences to GC content or correlate with GC content. DNA fragmentation during DNA isolation can compromise the PCR-based detection of bacteria with more fragile cell walls (Miller et al., 1999). Excessive fragmentation can occur

Table 5. Quantification of the products of MDA and pWGA amplifications on 10 ng low molecular weight DNA template (0.2 or 1.5 kb fragment sizes).

Percentage (%) of Ratio amplifiable Template amplifiable 16S DNA to template Amplification size (kb) Total dsDNA (ng) rRNA gene copies DNA

MDA 0.2 3.27(± 2.70)103 3 ± 2 9.82 1.5 1.70(± 0.69)103 24 ± 8 40.8 pWGA 0.2 144* 1* 0.14 1.5 32 ± 37** 50 ± 52** 1.60

* and ** only one and two, respectively, out of the three independent amplifications were successful (see Fig. 4). Total dsDNA (ng) was determined with PicoGreen, and the averages with standard deviation are presented. Expected 16S rRNA copy numbers were calculated based on the measured amount of dsDNA, fragment length and sequence composition, and compared with copy numbers as determined by qPCR on the WGA products, to obtain the percentage of PCR-amplifiable 16S rRNA gene copies (average with standard deviation). The ratio of amplifiable DNA to template indicates the amount of DNA that was amplifiable after WGA, divided by the initial amount of template (10 ng).
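The derived columns of Table 5 follow from simple arithmetic, and the Python sketch below restates it. The ratio function implements the definition given in the footnote (amplifiable DNA after WGA divided by the 10 ng of template). The expected-copy function is only an approximation: it assumes an average mass of about 650 g/mol per base pair instead of the sequence-specific calculation the authors describe, and all function names are illustrative rather than taken from the study.

```python
AVOGADRO = 6.022e23         # molecules per mole
BP_MASS_G_PER_MOL = 650.0   # ASSUMPTION: average mass of one dsDNA base pair


def expected_copies(dsdna_ng, fragment_bp):
    """Approximate number of fragment copies contained in a dsDNA mass (ng)."""
    grams = dsdna_ng * 1e-9
    return grams * AVOGADRO / (fragment_bp * BP_MASS_G_PER_MOL)


def percent_amplifiable(qpcr_copies, dsdna_ng, fragment_bp):
    """qPCR-detected copies as a percentage of the copies expected from mass."""
    return 100.0 * qpcr_copies / expected_copies(dsdna_ng, fragment_bp)


def amplifiable_ratio(total_dsdna_ng, pct_amplifiable, template_ng=10.0):
    """Amplifiable DNA after WGA divided by the initial template amount."""
    return total_dsdna_ng * pct_amplifiable / 100.0 / template_ng


# Mean values from Table 5: (method, template size in kb) -> (total dsDNA ng, %)
TABLE5 = {
    ("MDA", 0.2): (3270.0, 3.0),
    ("MDA", 1.5): (1700.0, 24.0),
    ("pWGA", 0.2): (144.0, 1.0),
    ("pWGA", 1.5): (32.0, 50.0),
}

if __name__ == "__main__":
    for (method, size_kb), (total_ng, pct) in TABLE5.items():
        print(method, size_kb, round(amplifiable_ratio(total_ng, pct), 2))
```

Run on the tabulated means, the ratio function gives values close to the published 9.82, 40.8, 0.14 and 1.60; small differences arise because the published inputs are themselves rounded.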

© 2013 Society for Applied Microbiology and John Wiley & Sons Ltd, Environmental Microbiology, 16, 643–657

for DNA of Gram-negative bacteria as their cells are easier to break than Gram-positives (Schneegurt et al., 2003). In our experiments, the low GC Bacilli and Clostridia were over-represented after MDA and pWGA. Both are Gram-positives; thus, their DNA also likely remained more intact during isolation. Therefore, we continued with experiments in which we used defined mixtures of species to study GC content and DNA integrity separately. The results clearly indicated that both factors can contribute to bias.

A defined mixture of high molecular weight DNA of six species, with known GC contents, produced results in line with the experiment on the soil sample; it revealed significantly less amplification of the two species (P. denitrificans and D. radiodurans) with the highest GC content (both 67%) using MDA-based templates and, to a lesser extent, pWGA-based templates.

The effect of fragmentation on WGA for microbial ecology studies was never fully examined. Fragmentation also interfered with amplification, as revealed by MDA and pWGA amplification on mixtures of intact and fragmented DNA of B. cereus and S. aureus, both having comparable GC contents. Effects of fragmentation on community profiles were strongest after pWGA.

We performed our experiments in the context of studies on low-biomass, Mars analogue environments on Earth (Direito et al., 2011) and the preparation for missions to detect life on Mars (Ehrenfreund et al., 2011). The expected low (if any) biomass, restricted to localized areas, makes the search for extinct and extant life on Mars a complex endeavour. In addition to a thorough assessment of the mineralogy, geochemistry and the bioenergetic potential of the landing site and advanced sampling technology, innovative life detection instrumentation will be required to find possible remnants of life or organic fragments on Mars (Ehrenfreund et al., 2011). The fact that WGA techniques are sensitive and do not require previous sequence information increases their potential for life detection purposes, where highly conserved or ‘universal’ gene sequences should not be assumed as certain. Polymerases have been engineered to replicate molecules alternative to RNA and DNA, like xenonucleic acids (Pinheiro et al., 2012). Possibly, the enzymes used in WGA can also be evolved to detect and amplify deviating nucleic acids.

The WGA methods studied here generated reproducible results, with over 80% similarity for replicates. In particular, pWGA is a promising method for microbial ecology studies on Mars analogues and Mars itself, as it provided the results most similar to PCR and less bias towards low GC content species. By not requiring external primers, pWGA is a truly isothermal and primer-free amplification method, while MDA is subject to primer dimer formation (thus detection of false-positives) and chimeras and, like PCR, may introduce mutations during the denaturing step and strand synthesis (Dean et al., 2002; Binga et al., 2008). Future studies should also investigate the effects of pWGA on representation of microbial communities using non-PCR-based methods to analyse the pWGA-amplified DNA, such as metagenomics studies employing random shotgun sequencing.

Ideally, WGA-based methods should also allow for the amplification of strongly fragmented DNA. If life existed on Mars and was based on DNA, this life may have gone extinct (Isenbarger et al., 2008). Ancient DNA undergoes fragmentation because of depurination processes that create abasic sites resulting in DNA breaks (Garcia-Garcera et al., 2011). PCR-based WGA methods that require degenerate or random primers such as degenerate oligonucleotide-primed PCR and primer extension preamplification can amplify low molecular weight DNA (< 3 kb), but these have a much higher genomic coverage bias compared with MDA (Pinard et al., 2006), generate non-specific amplification artefacts (Cheung and Nelson, 1996) and relatively short DNA fragments (< 1 kb) that cannot be used in many applications (Dean et al., 2002). Of the two methods we tested, MDA proved to be the better method to amplify fragments as short as 0.2 kb, although subsequent PCR amplification on the MDA product was inefficient, especially for shorter fragments (0.2 kb). MDA produces a hyperbranched DNA network of repeated sequences (Spits et al., 2006). If the branches are too short, a hyperbranched DNA network of short branches is produced, which may make the sequence inaccessible to both targeting primers and subsequent exponential PCR amplification. In addition, templates tend to decrease even more in size after each amplification round because of positioning of the random primers along the template (Shoaib et al., 2008). pWGA did not amplify low molecular weight DNA (< 1.5 kb) well, and also, the amplified DNA was not suitable for subsequent PCR, in particular when a 0.2 kb DNA fragment was used as template. A possible reason for pWGA not being effective was that the primase requires DNA motifs 3′-CTGG(G/T)-5′ or 3′-CTGTG-5′ (Li et al., 2008) for initiating amplification. These motifs were few in the used 1.5 kb fragment (seven motifs per fragment) and absent from the 0.2 kb fragment. Directed evolution experiments may aid in improving pWGA for amplification of low molecular weight DNA; such experiments have allowed the improvement of DNA polymerases vis-à-vis detecting and repairing damaged DNA (e.g. d’Abbadie et al., 2007).

MDA and pWGA methods have additional practical advantages over other WGA methods by requiring less energy consumption and simpler instrumentation than PCR, which makes these techniques worth considering and investigating further for life detection purposes beyond Earth.
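The Discussion relates the poor pWGA performance on short templates to the scarcity of primase recognition motifs (3′-CTGG(G/T)-5′ or 3′-CTGTG-5′), and the whole study turns on template GC content. The Python sketch below shows one way such properties could be checked for a candidate template. It is an illustration only, not the authors' procedure: it assumes that motifs on both strands count, that overlapping matches are counted separately, and that the input contains only A, C, G and T; the function names are hypothetical.

```python
import re

# The motifs are written 3'->5' in the text; read 5'->3' they become
# [GT]GGTC (from 3'-CTGG(G/T)-5') and GTGTC (from 3'-CTGTG-5').
# The lookahead lets overlapping occurrences be counted separately.
MOTIFS_5_TO_3 = re.compile(r"(?=([GT]GGTC|GTGTC))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_primase_motifs(seq):
    """Count recognition motifs on both strands (ASSUMPTION: both are usable)."""
    top = seq.upper()
    strands = (top, reverse_complement(top))
    return sum(len(MOTIFS_5_TO_3.findall(strand)) for strand in strands)


def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    fragment = "AAGGGTCAAGTGTCAA"  # made-up sequence, not from the study
    print(count_primase_motifs(fragment), round(gc_percent(fragment), 1))
```

Applied to a real 0.2 kb or 1.5 kb 16S rRNA gene fragment, a count of this kind would indicate whether a template offers any initiation sites before pWGA is attempted; whether the result matches the seven motifs reported for the 1.5 kb fragment depends on the counting conventions assumed here.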


Experimental procedures

Strains

Bacillus cereus, Escherichia coli CL4B, Shewanella putrefaciens 200 and Staphylococcus aureus were grown in Luria–Bertani medium; Paracoccus denitrificans Pd1222 in Brain Heart Infusion; and Deinococcus radiodurans R1 in Deutsche Sammlung von Mikroorganismen und Zellkulturen medium 53. All strains were aerobically grown at 37°C with agitation, except Desulfitobacterium hafniense Y51 that was grown statically and anaerobically at 37°C.

DNA sources

PowerSoil DNA isolation kit (MO BIO Laboratories, Solana Beach, CA, USA) was used according to the manufacturer's protocol for DNA isolation from 250 μl of culture. For experiments requiring intact DNA of B. cereus and S. aureus, without significant DNA fragmentation, the DNeasy Blood and Tissue kit (QIAGEN, Hilden, Germany) employing lysozyme to break the cells was used. Subsequent ‘controlled’ fragmentation of intact DNA was achieved by restriction with ClaI (Roche, Indianapolis, IN, USA). This enzyme does not cut in the 16S rRNA gene of either species. The restriction reaction mix consisting of 24 μl of DNA extract (∼1 μg), 1 μl of ClaI (10 U; Roche), 4 μl of 10× buffer H (Roche), 0.4 μl of Bovine Serum Albumin and 10.6 μl of DNase and RNase-free water (Promega, Madison, WI, USA) was incubated at 37°C for 3 h, after which the cut DNA was purified (GeneJET PCR Purification Kit, Fermentas, St Leon-Rot, Germany).

To investigate amplification biases for an environmental sample, a DNA extract from a soil sample taken at Wildekamp near Bennekom (The Netherlands) was used. The soil of this sample was treated with 500 kg ha−1 of copper and the pH adjusted to 6.1 in 1982, and when collected, the pH was 5.0 and the copper concentration 0.27 mg kg−1. A previous study revealed that the microbial community structure at this location was highly diverse (de Boer et al., 2011).

WGA

MDA was performed with Illustra GenomiPhi V2 DNA Amplification Kit (GE Healthcare, Buckinghamshire, UK), and pWGA was performed with Rapisome pWGA kit (Biohelix, Beverly, MA, USA) according to the protocols of the manufacturers.

PCR

Primers F357 and R518 (Muyzer et al., 1993) were used to amplify 0.2 kb bacterial 16S rRNA gene fragments. Primer F357 with a GC clamp (Muyzer et al., 1993) was used for DGGE profiling. Primers 8F and 1512R were applied to generate 1.5 kb 16S rRNA gene fragments (Felske et al., 1997). The PCR programme consisted of an initial denaturation at 94°C for 5 min; 35 cycles of 94°C for 30 s, 54°C for 30 s and 72°C for 30 s (90 s for 8F/1512R); and a final elongation step at 72°C for 8 min. A total volume of 25 μl was used in each PCR reaction, containing 0.4 μM forward and reverse primers, 12.5 μl Fidelitaq PCR Master Mix (2×) (USB Corporation, Cleveland, OH, USA), 9.5 μl DNase and RNase-free water (MP Biomedicals, Solon, OH, USA) and 1 μl template.

Quantification of 16S rRNA genes and total DNA

Quantitative real-time PCR was performed with a 7300 Real-Time PCR System (AB Applied Biosystems, Branchburg, CA, USA) using primers F357 and R518 (Muyzer et al., 1993). The qPCR program consisted of: 50°C for 2 min, 95°C for 15 min, 40 cycles (95°C for 15 s, 54°C for 30 s, 72°C for 30 s) and 72°C for 10 min, with a melting curve program: 95°C for 15 s, 60°C for 1 min, 95°C for 15 s and 60°C for 15 s. Triplicates were performed for each qPCR reaction, and negative controls were DNase and RNase-free water (MP Biomedicals, Solon, OH, USA). Reaction volumes (25 μl) consisted of 1 μl each of forward and reverse primer (0.4 μM), 12.5 μl 2× DyNAmo HS SYBR Green qPCR master mix (Finnzymes, Espoo, Finland), 0.6 μl 50× ROX (Finnzymes), 6.9 μl DNase and RNase-free water (MP Biomedicals), and 3 μl template. A standard curve consisting of DNA dilutions in the range of 0.01–100 mg l−1 was included.

dsDNA was quantified using the Quant-iT PicoGreen dsDNA kit (Invitrogen, Breda, the Netherlands).

Community profiling

PCR products were analysed by DGGE (DCode Universal Mutation Detection System, Bio Rad, Hercules, CA, USA). Gels were 8% polyacrylamide (37.5:1 acrylamide/Bis), with a 30–55% denaturing gradient. Electrophoresis was performed in 1× TAE buffer at 200 V and 60°C for 3.5 h. A mixture of 12 different bacterial 16S rRNA gene fragments prepared in our laboratory from cloned fragments was used as marker. Gels were stained with ethidium bromide, illuminated with a Vilber Lourmat (TCP-20-M) UV transilluminator and photographed. Quantitative analysis of DGGE profiles was performed with Gel Compar II (Applied Maths, Kortrijk, Belgium). Similarity values were calculated using Pearson correlation and visualized by unweighted pair group method with arithmetic mean (UPGMA) cluster analysis (van Verseveld and Röling, 2004).

Determining influence of species GC content on microbial community profiles

A mixture containing DNA from six different bacteria was used (see Table 1 for species, their properties and 16S rRNA copy numbers used in the defined mixture). The 16S rRNA gene copy numbers of each species used to prepare the mixture were determined by qPCR. After performing PCR, MDA and pWGA, each in triplicate, on the mixture (and on water-only controls), a PCR with GC-clamped primers was performed so that the amplification products could be separated by DGGE.

Determining impact of DNA fragmentation on microbial community profiles

Four different mixtures of fragmented and intact DNA of B. cereus and S. aureus were prepared (Table 2). Mixtures

consisted of equal amounts of 16S rRNA gene copies, quantified by qPCR, for each species. Mixtures were subjected to MDA and pWGA (in triplicate), and products checked by 1.2% agarose gel electrophoresis. These products and the original DNA mixtures were PCR-amplified with GC-clamped primers and checked by DGGE. The relative band surface of each species was determined with Gel Compar II (Applied Maths).

Low molecular weight DNA amplification

In order to establish the suitability of MDA and pWGA to amplify low molecular weight DNA, 16S rRNA gene fragments (10 ng of either 0.2 or 1.5 kb long fragments) were subjected to MDA and pWGA in triplicate and compared with the positive controls of the respective kits (MDA: lambda DNA, and pWGA: human genomic DNA), followed by PCR on the products with GC-clamped primers and DGGE. Total amounts of dsDNA in the MDA and pWGA products were determined with PicoGreen (Invitrogen, Breda, the Netherlands), while qPCR was used to quantify the amplifiable DNA in the products. Fragments were obtained by PCR on a pGEM-T plasmid containing a 1.5 kb 16S rRNA gene fragment. The plasmid was isolated using the GeneJET Plasmid Miniprep kit (Fermentas Life Sciences, St. Leon-Rot, Germany) from an E. coli JM109 transformant in a clone library described in Direito and colleagues (2011).

Community profiling after WGA of environmental DNA

Bacterial community profiling of an environmental sample (Wildekamp) was performed by using PCR targeting the hypervariable region V3 of 16S rRNA genes for DGGE analysis and PCR targeting the hypervariable region V5–V7 of 16S rRNA genes for 454 pyrosequencing. MDA and pWGA were first performed in triplicate, and their products and original DNA extracts were subjected to PCR.

Pyrosequencing

DNA amplicon library construction involved a PCR amplification step allowing the tagging of amplicons with identification tags or barcodes. PCR reaction was performed with forward primer 785F (GGATTAGATACCCBRGTAGTC) and reverse primer 1175R (ACGTCRTCCCCDCCTTCCTC) (Kraneveld et al., 2012). These primers also included at the 5′ end the 454 Life Sciences (Branford, CT, USA) adapter A (in the forward primer) and B (in the reverse primer) plus a unique 10 nucleotides long barcode in the forward primer. The PCR program consisted of an initial denaturation at 95°C for 120 s, 9 cycles (95°C for 30 s, 53°C for 30 s and 72°C for 80 s), 29 cycles (95°C for 30 s, 62°C for 30 s and 72°C for 80 s) and a final elongation step at 72°C for 180 s. A total volume of 50 μl was used in each PCR reaction, containing 0.4 μM forward and reverse primers, 25 μl GoTaq Colorless Master Mix (2×) (Promega), 13.4 μl nuclease-free water (Promega) and 6 μl template. Bands were cut from 1.2% agarose gels, purified with MinElute Gel Extraction kit (QIAGEN), analysed with Experion Automated Electrophoresis System (Bio-Rad Laboratories, Hercules, CA, USA) and quantified with PicoGreen. The amplicon concentration of the negative, water-only controls was not sufficient for pyrosequencing. All amplicons were pooled together in equimolar quantities (20 ng each). This mixture was dried at room temperature overnight by using a Centrivap concentrator Labconco (Beun De Ronde, Abcoude, the Netherlands) connected to an automatic freeze-dryer (Virtis, Gardiner, NY, USA; Cenco Instrumenten, Breda, the Netherlands) and resuspended to a final concentration of 20 ng DNA μl−1 of nuclease-free water. The size and quality of the amplicon mixture was analysed with a Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) and sequenced with a 454 GS FLX Titanium system (Roche, Basel, Switzerland) and amplicon processed at Macrogen, Inc. (Seoul, South Korea). These sequence data have been submitted to the NCBI Sequence Read Archive (http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?) database under the study accession number SRP028927.

Pyrosequencing data analysis

QIIME (Caporaso et al., 2010) version 1.4.0 was used to process sequence data. Data quality control was performed by initially splitting reads per individual sample according to their barcodes. Barcodes and primers were trimmed off, and poor quality reads were filtered out. Removed reads included reads with a quality score below 25 on average for the entire read, reads with more than one error in barcode or an ambiguous base call (N > 0), homopolymer sequences of ≥ 6 nucleotides, and too short (< 150) or too long (> 1000) reads. Sliding window test (50 nt) of quality scores was enabled, and sequences of low quality were truncated at the beginning of the poor-quality window. Only a single mismatch in each primer was allowed. Data were denoised (Reeder and Knight, 2010) using Denoiser version 1.3.0, and chimeras were removed with UCHIME (Edgar et al., 2011) version 4.2.40 using both ‘reference’ and ‘de novo’ modes. Sequences were clustered into OTUs with the UCLUST Reference Optimal algorithm (Edgar, 2010) and a 97% similarity threshold. For assignment, the RDP classifier (Cole et al., 2009) was used to align the sequences to the SILVA rRNA database (Pruesse et al., 2007) trimmed to span the targeted hypervariable regions V5–V7 as described by Brandt and colleagues (2012).

Cluster analysis with Pearson correlation as similarity measure, PCA, principal coordinate analysis, regression analysis and diversity index Shannon H were calculated in PAST version 2.14 (Hammer et al., 2001). Chao1 indexes for species richness for a subsampling of 620 reads were calculated with the calculator available at http://www.biology.ualberta.ca/jbrzusto/rarefact.php#Calculator. Pearson correlation and one-way ANOVA with post-hoc tests (Tukey) were performed with SPSS statistics version 20 (SPSS, Chicago, IL, USA).

Acknowledgments

We thank Hans V. Westerhoff for reading a previous version of the manuscript; Dwayne Holmes for his preliminary work with the Wildekamp sample and testing different WGA methods; Martin Braster for providing the DNA extract of the Wildekamp sample; Raquel Vargas for providing the Desulfitobacterium hafniense Y51 culture; Jorke Kamstra for

his assistance with the Experion Automated Electrophoresis System; Mark Buijs and Jessica Koopman for their assistance during the pyrosequencing process. This research was funded by a grant from the Netherlands Organization for Scientific Research (NWO/SRON User Support Programme Space Research, in support of the project ‘Molecular detection of life on Mars’ ALW-GO-PL/07–11). Pascale Ehrenfreund acknowledges support from the NASA Astrobiology Institute.

References

Abulencia, C.B., Wyborski, D.L., Garcia, J.A., Podar, M., Chen, W., Chang, S.H., et al. (2006) Environmental whole-genome amplification to access microbial populations in contaminated sediments. Appl Environ Microbiol 72: 3291–3301.
Binga, E.K., Lasken, R.S., and Neufeld, J.D. (2008) Something from (almost) nothing: the impact of multiple displacement amplification on microbial ecology. ISME J 2: 233–241.
de Boer, T.E., Taş, N., Braster, M., Temminghoff, E.J.M., Röling, W.F.M., and Roelofs, D. (2011) The influence of long-term copper contaminated agricultural soil at different pH levels on microbial communities and springtail transcriptional regulation. Environ Sci Technol 46: 60–68.
Brandt, B.W., Bonder, M.J., Huse, S.M., and Zaura, E. (2012) TaxMan: a server to trim rRNA reference databases and inspect taxonomic coverage. Nucleic Acids Res 40: W82–W87.
Bredel, M., Bredel, C., Juric, D., Kim, Y., Vogel, H., Harsh, G.R., et al. (2005) Amplification of whole tumor genomes and gene-by-gene mapping of genomic aberrations from limited sources of fresh-frozen and paraffin-embedded DNA. J Mol Diagn 7: 171–182.
Caporaso, J.G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F.D., Costello, E.K., et al. (2010) QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7: 335–336.
Cheung, V.G., and Nelson, S.F. (1996) Whole genome amplification using a degenerate oligonucleotide primer allows hundreds of genotypes to be performed on less than one nanogram of genomic DNA. Proc Natl Acad Sci U S A 93: 14676–14679.
Cole, J.R., Wang, Q., Cardenas, E., Fish, J., Chai, B., Farris, R.J., et al. (2009) The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37: D141–D145.
d’Abbadie, M., Hofreiter, M., Vaisman, A., Loakes, D., Gasparutto, D., Cadet, J., et al. (2007) Molecular breeding of polymerases for amplification of ancient DNA. Nat Biotechnol 25: 939–943.
Dean, F.B., Hosono, S., Fang, L., Wu, X., Faruqi, A.F., Bray-Ward, P., et al. (2002) Comprehensive human genome amplification using multiple displacement amplification. Proc Natl Acad Sci U S A 99: 5261–5266.
Direito, S.O.L., Ehrenfreund, P., Marees, A., Staats, M., Foing, B., and Röling, W.F.M. (2011) A wide variety of putative extremophiles and large beta-diversity at the Mars Desert Research Station (Utah). Int J Astrobiology 10: 191–207.
Drees, K.P., Neilson, J.W., Betancourt, J.L., Quade, J., Henderson, D.A., Pryor, B.M., and Maier, R.M. (2006) Bacterial community structure in the hyperarid core of the Atacama Desert, Chile. Appl Environ Microbiol 72: 7902–7908.
Edgar, R.C. (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26: 2460–2461.
Edgar, R.C., Haas, B.J., Clemente, J.C., Quince, C., and Knight, R. (2011) UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27: 2194–2200.
Ehrenfreund, P., Röling, W.F.M., Thiel, C.S., Quinn, R., Sephton, M.A., Stoker, C., et al. (2011) Astrobiology and habitability studies in preparation for future Mars missions: trends from investigating minerals, organics and biota. Int J Astrobiology 10: 239–253.
Felske, A., Rheims, H., Wolterink, A., Stackebrandt, E., and Akkermans, A.D.L. (1997) Ribosome analysis reveals prominent activity of an uncultured member of the class Actinobacteria in grassland soils. Microbiology 143: 2983–2989.
Forterre, P. (2006) Three RNA cells for ribosomal lineages and three DNA viruses to replicate their genomes: a hypothesis for the origin of cellular domain. Proc Natl Acad Sci U S A 103: 3669–3674.
Garcia-Garcera, M., Gigli, E., Sanchez-Quinto, F., Ramirez, O., Calafell, F., Civit, S., and Lalueza-Fox, C. (2011) Fragmentation of contaminant and endogenous DNA in ancient samples determined by shotgun sequencing; prospects for human palaeogenomics. PLoS ONE 6: e24161.
Hammer, Ø., Harper, D.A.T., and Ryan, P.D. (2001) PAST: paleontological statistics software package for education and data analysis. Palaeontologia Electron 4: 9.
Hawkins, T.L., Detter, J.C., and Richardson, P.M. (2002) Whole genome amplification – applications and advances. Curr Opin Biotechnol 13: 65–67.
He, Z.L., Gentry, T.J., Schadt, C.W., Wu, L.Y., Liebich, J., Chong, S.C., et al. (2007) GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1: 67–77.
Hildebrand, F., Meyer, A., and Eyre-Walker, A. (2010) Evidence of selection upon genomic GC-content in bacteria. PLoS Genet 6: e1001107.
Huber, H., Hohn, M.J., Rachel, R., Fuchs, T., Wimmer, V.C., and Stetter, K.O. (2002) A new phylum of Archaea represented by a nanosized hyperthermophilic symbiont. Nature 417: 63–67.
Isenbarger, T.A., Carr, C.E., Johnson, S.S., Finney, M., Church, G.M., Gilbert, W., et al. (2008) The most conserved genome segments for life detection on earth and other planets. Orig Life Evol Biosph 38: 517–533.
Itoh, T., Yamanoi, K., Kudo, T., Ohkuma, M., and Takashina, T. (2011) Aciditerrimonas ferrireducens gen. nov., sp. nov., an iron-reducing thermoacidophilic actinobacterium isolated from a solfataric field. Int J Syst Evol Microbiol 61: 1281–1285.
Kämpfer, P., Young, C.-C., Arun, A.B., Shen, F.-T., Jäckel, U., Rosselló-Mora, R., et al. (2006) Pseudolabrys taiwanensis gen. nov., sp. nov., an alphaproteobacterium isolated from soil. Int J Syst Evol Microbiol 56: 2469–2472.
Kim, K.-H., and Bae, J.-W. (2011) Amplification methods bias metagenomic libraries of uncultured single-stranded and double-stranded DNA viruses. Appl Environ Microbiol 77: 7663–7668.
Kraneveld, E.A., Buijs, M.J., Bonder, M.J., Visser, M., Keijser, B.J.F., Crielaard, W., and Zaura, E. (2012) The relation between oral Candida load and bacterial microbiome profiles in Dutch older adults. PLoS ONE 7: e42770.
Lage, J.M., Leamon, J.H., Pejovic, T., Hamann, S., Lacey, M., Dillon, D., et al. (2003) Whole genome analysis of genetic alterations in small DNA samples using hyperbranched strand displacement amplification and array-CGH. Genome Res 13: 294–307.
Lasken, R.S., and Egholm, M. (2003) Whole genome amplification: abundant supplies of DNA from precious samples or clinical specimens. Trends Biotechnol 21: 531–535.
Li, Y., Kim, H.-J., Zheng, C., Chow, W.H.A., Lim, J., Keenan, B., et al. (2008) Primase-based whole genome amplification. Nucleic Acids Res 36: e79.
Lim, J.H., Baek, S.-H., and Lee, S.-T. (2009) Ferruginibacter alkalilentus gen. nov., sp. nov. and Ferruginibacter lapsinanis sp. nov., novel members of the family ‘Chitinophagaceae’ in the phylum Bacteroidetes, isolated from freshwater sediment. Int J Syst Evol Microbiol 59: 2394–2399.
Marmur, J., and Doty, P. (1959) Heterogeneity in deoxyribonucleic acids. I. Dependence on composition of the configurational stability of deoxyribonucleic acids. Nature 183: 1427–1429.
Miller, D.N., Bryant, J.E., Madsen, E.L., and Ghiorse, W.C. (1999) Evaluation and optimization of DNA extraction and purification procedures for soil and sediment samples. Appl Environ Microbiol 65: 4715–4724.
Montero-Barrientos, M., Rivas, R., Velázquez, E., Monte, E., and Roig, M.G. (2005) Terrabacter terrae sp. nov., a novel actinomycete isolated from soil in Spain. Int J Syst Evol Microbiol 55: 2491–2495.
Mullis, K., Faloona, F., Scharf, S., Saiki, R., Horn, G., and Erlich, H. (1986) Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold Spring Harb Symp Quant Biol 51 (Part 1): 263–273.
Muyzer, G., Dewaal, E.C., and Uitterlinden, A.G. (1993) Profiling of complex microbial populations by denaturing gradient gel electrophoresis analysis of polymerase chain reaction-amplified genes coding for 16S rRNA. Appl Environ Microbiol 59: 695–700.
Pace, N.R., Stahl, D.A., Lane, D.J., and Olsen, G.J. (1986) The analysis of natural microbial-populations by ribosomal-RNA sequences. Adv Microb Ecol 9: 1–55.
Pinard, R., de Winter, A., Sarkis, G.J., Gerstein, M.B., Tartaro, K.R., Plant, R.N., et al. (2006) Assessment of whole genome amplification-induced bias through high-throughput, massively parallel whole genome sequencing. BMC Genomics 7: 216.
Pinheiro, V.B., Taylor, A.I., Cozens, C., Abramov, M., Renders, M., Zhang, S., et al. (2012) Synthetic genetic polymers capable of heredity and evolution. Science 336: 341–344.
Pointing, S.B., Chan, Y.K., Lacap, D.C., Lau, M.C.Y., Jurgens, J.A., and Farrell, R.L. (2009) Highly specialized microbial diversity in hyper-arid polar desert. Proc Natl Acad Sci U S A 106: 19964–19969.
Pruesse, E., Quast, C., Knittel, K., Fuchs, B.M., Ludwig, W., Peplies, J., and Glöckner, F.O. (2007) SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35: 7188–7196.
Pugh, T.J., Delaney, A.D., Farnoud, N., Flibotte, S., Griffith, M., Li, H.I., et al. (2008) Impact of whole genome amplification on analysis of copy number variants. Nucleic Acids Res 36: e80.
Raghunathan, A., Ferguson, H.R., Bornarth, C.J., Song, W., Driscoll, M., and Lasken, R.S. (2005) Genomic DNA amplification from a single bacterium. Appl Environ Microbiol 71: 3342–3347.
Reeder, J., and Knight, R. (2010) Rapidly denoising pyrosequencing amplicon reads by exploiting rank-abundance distributions. Nat Methods 7: 668–669.
Reysenbach, A.L., Giver, L.J., Wickham, G.S., and Pace, N.R. (1992) Differential amplification of rRNA genes by polymerase chain reaction. Appl Environ Microbiol 58: 3417–3418.
Schaerli, Y., Stein, V., Spiering, M.M., Benkovic, S.J., Abell, C., and Hollfelder, F. (2010) Isothermal DNA amplification using the T4 replisome: circular nicking endonuclease-dependent amplification and primase-based whole-genome amplification. Nucleic Acids Res 38: e201.
Schneegurt, M.A., Dore, S.Y., and Kulpa, C.F., Jr (2003) Direct extraction of DNA from soils for studies in microbial ecology. Curr Issues Mol Biol 5: 1–8.
Shoaib, M., Baconnais, S., Mechold, U., Le Cam, E., Lipinski, M., and Ogryzko, V. (2008) Multiple displacement amplification for complex mixtures of DNA fragments. BMC Genomics 9: 415.
Spits, C., Le Caignec, C., De Rycke, M., Van Haute, L., Van Steirteghem, A., Liebaers, I., and Sermon, K. (2006) Whole-genome multiple displacement amplification from single cells. Nat Protoc 1: 1965–1970.
Tate, C.M., Nunez, A.N., Goldstein, C.A., Gomes, I., Robertson, J.M., Kavlick, M.F., and Budowle, B. (2012) Evaluation of circular DNA substrates for whole genome amplification prior to forensic analysis. Forensic Sci Int Genet 6: 185–190.
Tyson, G.W., Chapman, J., Hugenholtz, P., Allen, E.E., Ram, R.J., Richardson, P.M., et al. (2004) Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428: 37–43.
van Verseveld, H.W., and Röling, W.F.M. (2004) Cluster analysis and statistical comparison of molecular community profile data. In Molecular Microbial Ecology Manual, Vol. 2, 2nd edn. Kowalchuk, G.A., de Bruijn, F.J., Head, I.M., Akkermans, A.D.L., and van Elsas, J.D. (eds). Dordrecht, The Netherlands: Kluwer Academic Publishers, pp. 1373–1396.
Vincent, M., Xu, Y., and Kong, H. (2004) Helicase-dependent isothermal DNA amplification. EMBO Rep 5: 795–800.
Vora, G.J., Meador, C.E., Stenger, D.A., and Andreadis, J.D. (2004) Nucleic acid amplification strategies for DNA microarray-based pathogen detection. Appl Environ Microbiol 70: 3047–3054.
Wu, L., Liu, X., Schadt, C.W., and Zhou, J. (2006) Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72: 4931–4941.


Yilmaz, S., Allgaier, M., and Hugenholtz, P. (2010) Multiple displacement amplification compromises quantitative analysis of metagenomes. Nat Methods 7: 943–944.
Yoon, M.-H., and Im, W.-T. (2007) Flavisolibacter ginsengiterrae gen. nov., sp. nov. and Flavisolibacter ginsengisoli sp. nov., isolated from ginseng cultivating soil. Int J Syst Evol Microbiol 57: 1834–1839.
Zhang, K., Martiny, A.C., Reppas, N.B., Barry, K.W., Malek, J., Chisholm, S.W., and Church, G.M. (2006) Sequencing genomes from single cells by polymerase cloning. Nat Biotechnol 24: 680–686.

Supporting information

Additional Supporting Information may be found in the online version of this article at the publisher’s web-site:

Fig. S1. Impact of whole genome amplification approaches on bacterial community representation for a Cu-polluted soil sample from the Wildekamp, the Netherlands.
A. UPGMA cluster analysis of DGGE profiles (30–55% denaturant gradient) of bacterial 16S rRNA gene fragments after Pearson correlation. For each lane, the applied amplification method is indicated: control samples not subjected to WGA (Control) and samples subjected to WGA (MDA and pWGA) and the number represents the replicate.
B. Cluster analysis with Pearson correlation as similarity measure for final taxa (genus) level obtained for samples subjected to 454 pyrosequencing.
C. Weighted UniFrac cluster analysis at the OTU level (using a normalization of 620 sequences). The scale bar indicates a weighted UniFrac distance of 0.05.

Fig. S2. Cluster analysis of DGGE profiles of bacterial 16S rRNA gene fragments in a defined mixture of six different species (Bacillus cereus, Deinococcus radiodurans R1, Desulfitobacterium hafniense Y51, Escherichia coli CL4B, Paracoccus denitrificans Pd1222 and Shewanella putrefaciens 200R; Table 3) used to test impact of GC content, with Pearson correlation as similarity measure after determining the relative DGGE band surface intensities for each species.
