Article The Influence of DNA Extraction and Lipid Removal on Human Milk Bacterial Profiles

Anna Ojo-Okunola 1,* , Shantelle Claassen-Weitz 1 , Kilaza S. Mwaikono 2,3 , Sugnet Gardner-Lubbe 4 , Heather J. Zar 5,6,7, Mark P. Nicol 1,8 and Elloise du Toit 1

1 Division of Medical Microbiology, Department of Pathology, Faculty of Health Sciences, Observatory 7925, University of Cape Town, Cape Town 7700, South Africa; tellafi[email protected] (S.C.-W.); [email protected] (M.P.N.); [email protected] (E.d.T.)
2 Computational Biology Group and H3ABioNet, Department of Integrative Biomedical Sciences, Observatory 7925, University of Cape Town, Cape Town 7700, South Africa; [email protected]
3 Department of Science and Laboratory Technology, Dar es Salaam Institute of Technology, 11000 Dar es Salaam, Tanzania
4 Department of Statistics and Actuarial Science, Faculty of Economic and Management Sciences, Stellenbosch University, Matieland 7602, Stellenbosch, South Africa; [email protected]
5 Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, Observatory 7925, University of Cape Town, Cape Town 7700, South Africa; [email protected]
6 SAMRC Unit on Child & Adolescent Health, Observatory 7925, University of Cape Town, Cape Town 7700, South Africa
7 Department of Pediatrics and Child Health, Red Cross War Memorial Children’s Hospital, Rondebosch, Cape Town 7700, South Africa
8 School of Biomedical Sciences, Division of Infection and Immunity, The University of Western Australia, M504, Perth, WA 6009, Australia
* Correspondence: [email protected]

 Received: 16 April 2020; Accepted: 6 May 2020; Published: 15 May 2020 

Abstract: Culture-independent molecular techniques have advanced the characterization of environmental and human samples, including the human milk (HM) bacteriome. However, extraction of high-quality genomic DNA that is representative of the bacterial population in samples is crucial. Lipid removal from HM prior to DNA extraction is common practice, but this may influence the bacterial population detected. The objective of this study was to compare four commercial DNA extraction kits and lipid removal in relation to HM bacterial profiles. Four commercial DNA extraction kits, QIAamp® DNA Microbiome Kit, ZR Fungal/Bacterial DNA MiniPrep™, QIAsymphony DSP DNA Kit and ZymoBIOMICS™ DNA Miniprep Kit, were assessed using milk collected from ten healthy lactating women. The kits were evaluated based on their ability to extract high quantities of pure DNA from HM and on how well they extracted DNA from bacterial communities present in a commercial mock microbial community standard spiked into HM. Finally, the kits were evaluated by assessing their extraction repeatability. Bacterial profiles were assessed using Illumina MiSeq sequencing targeting the V4 region of the 16S rRNA gene. The ZR Fungal/Bacterial DNA MiniPrep™ and ZymoBIOMICS™ DNA Miniprep (Zymo Research Corp., Irvine, CA, USA) kits extracted the highest DNA yields with the best purity. DNA extracted using ZR Fungal/Bacterial DNA MiniPrep™ best represented the bacteria in the mock community spiked into HM. In un-spiked HM samples, DNA extracted using the QIAsymphony DSP DNA kit showed statistically significant differences in taxa prevalence from DNA extracted using ZR Fungal/Bacterial DNA MiniPrep™ and ZymoBIOMICS™ DNA Miniprep kits. The only difference observed between skim and whole milk bacterial profiles was in the relative abundances of Enhydrobacter and Acinetobacter.
DNA extraction, but not lipid removal, substantially influences bacterial profiles detected in HM samples, emphasizing the need for careful selection of a DNA extraction kit to improve DNA recovery from a range of bacterial taxa.

Methods Protoc. 2020, 3, 39; doi:10.3390/mps3020039 www.mdpi.com/journal/mps

Keywords: 16S rRNA gene sequencing; bacterial profiles; DNA extraction; human milk; skim milk; whole milk

1. Introduction

There is a growing interest in the role that human milk (HM) microbes play in infant and maternal health. The HM microbiota has been shown to have a role in the development of the infant gut bacteriome and in promoting programming of the immune system [1,2]. Randomized clinical trials have shown that, among women with staphylococcal lactational mastitis, the clinical signs of mastitis were alleviated after oral administration of HM-derived Lactobacillus salivarius and Lactobacillus gasseri strains compared with the control group, who ingested a placebo; HM bacteria may therefore serve as an alternative treatment for lactational infectious mastitis caused by Staphylococcus aureus [3,4]. Studies characterizing the bacterial diversity in HM were initially based on culture-dependent techniques, which have several limitations, including detecting only viable bacteria and being labor-intensive [5–7]. Culture-independent molecular techniques, enabled by next-generation sequencing (NGS), can profile bacteria in complex environments and provide detailed phylogenetic information [8,9]. For this, however, high-quality genomic DNA that is representative of the microbial communities is required. The optimization of DNA extraction methods has been documented for samples such as feces, vagina, soil, bovine milk, saliva and colonic tissue [10–17], with only one recent study having been conducted on HM samples [18]. This is crucial as methodological variation, such as the use of different DNA extraction kits, may impact microbial community profiling [19]. Moreover, the effect of removing the lipid layer of milk prior to DNA extraction is unknown, even though this approach is commonly used [20–22]. Lipid-rich tissue can cause difficulties in DNA extraction because lipids interfere with tissue disruption or influence the chemistry of the DNA isolation buffers [23].
Optimization of DNA extraction from HM is necessary as HM is known to have a relatively low bacterial biomass, and interfering substances (such as proteins) pose a challenge to the extraction of large amounts of quality DNA [17]. In addition, the HM bacteriome has been reported to contain a variety of Gram-positive and Gram-negative bacterial species [20] with differing cell wall composition, which makes some species more difficult to lyse than others. Improper lysis of these two groups of bacteria may result in a biased representation of the bacterial community present in HM samples. Hence, methods need to be tested for their effectiveness, their efficiency in lysing bacterial cells, and the quality of the extracted DNA. The aim of this study was to compare and evaluate the extraction of bacterial genomic DNA from HM samples using four commercial DNA extraction kits. Selection of kits used in this study was based on their availability and prior use for HM bacteriome studies. We also incorporated a mock microbial community with predetermined DNA ratios from a mixture of bacterial species to assess bias of the DNA extraction kits. Furthermore, whole milk (WM) and skim milk (SM) were compared to determine whether the removal of the lipid layer affected the bacterial population detected in HM.

2. Material and Methods

2.1. Subjects and Sample Collection

HM samples were collected from ten healthy lactating women residing in Cape Town, South Africa, after their consent was obtained. The women were asked to wash their hands, nipples and the surrounding area with soap and water. Milk was collected manually by hand expression into a 50 mL sterile collection bottle after discarding the first few drops. After collection, the samples were transported on ice, aliquoted, and stored at −20 °C until further processing. This study received ethical approval from the University of Cape Town Human Research Ethics Committee, South Africa (HREC REF: 649/2016).

2.2. Methods of DNA Extraction

Each of the ten HM samples was processed as un-spiked SM (n = 10) and WM (n = 10). DNA was extracted in duplicate from each SM and WM sample, using 4 different kits (total number of extracts = 160, Figure 1A). In addition, as an extraction control, one HM sample was divided into SM and WM, with each 1 mL sample spiked with 75 µL (1 × 10^9 cells) of Zymobiomics Microbial Community Standard (ZMCS) (Catalogue no. D6300, Zymo Research Corp., Irvine, CA, USA). As per the manufacturer’s specifications, the genomic DNA abundance (%) for each bacterial species is 12%, while the microbial operational taxonomic unit (OTU) relative abundances are Pseudomonas spp. (4.6%), Enterobacteriaceae 1 (11.3%), Enterobacteriaceae 2 (10.0%), Listeria spp. (15.9%), Staphylococcus spp. (13.3%), Lactobacillus spp. (18.8%), Enterococcus spp. (10.4%) and Bacillus spp. (15.7%) (Table S1).
Commercial ZMCS was employed as the mock community because it is readily available and affordable, and because it is manufactured by Zymo Research Corp., Irvine, CA, USA, the same company that manufactured two of the DNA kits assessed in the study. DNA was extracted from replicates (on consecutive days) of the same spiked sample with each of the 4 kits (n = 16) (Figure 1B).
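The extraction counts quoted above (160 un-spiked and 16 spiked extracts) follow from a fully crossed design. The short Python sketch below enumerates that design and checks that the listed OTU abundances of the ZMCS sum to 100%. The sample labels and variable names are illustrative assumptions, not identifiers used in the study.

```python
from itertools import product

# Factors taken from the text; labels such as "HM01" are hypothetical.
SAMPLES = [f"HM{i:02d}" for i in range(1, 11)]  # ten un-spiked milk samples
MILK_TYPES = ["SM", "WM"]                       # skim milk, whole milk
KITS = ["A", "B", "C", "D"]                     # four extraction kits
REPLICATES = [1, 2]                             # duplicate extractions


def extraction_design(samples):
    """Return one (sample, milk type, kit, replicate) tuple per DNA extract."""
    return list(product(samples, MILK_TYPES, KITS, REPLICATES))


unspiked = extraction_design(SAMPLES)
spiked = extraction_design(["HM01+ZMCS"])  # the single sample spiked with ZMCS

# Theoretical OTU relative abundances (%) of the ZMCS, as listed in the text.
ZMCS_16S = {
    "Pseudomonas": 4.6, "Enterobacteriaceae 1": 11.3,
    "Enterobacteriaceae 2": 10.0, "Listeria": 15.9,
    "Staphylococcus": 13.3, "Lactobacillus": 18.8,
    "Enterococcus": 10.4, "Bacillus": 15.7,
}

print(len(unspiked), len(spiked), round(sum(ZMCS_16S.values()), 1))
```

Enumerating the design this way makes it easy to confirm that every sample, milk type, kit and replicate combination is present before samples are processed.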

Figure 1. Human milk samples collection per milk type and DNA extraction method. (A) DNA extraction from un-spiked skim milk and whole milk samples. (B) DNA extraction from spiked skim milk and whole milk samples.

HM samples were homogenized by vortexing and SM was prepared by adapting a previously published protocol [24]. In brief, the samples were centrifuged at 3500× g for 20 min at −10 °C, and the fat layer was removed with a 10 µL disposable inoculation loop. The supernatant was thereafter centrifuged in the same tube at 7600× g for 10 min at room temperature, and the cell pellet was used for further processing. The pellet from WM was prepared by centrifugation of the original milk sample at 7600× g for 10 min at room temperature. DNA was extracted from un-spiked HM samples (n = 160) and spiked HM samples (n = 16) using the recommended starting volume of the four different commercial DNA kits described below:
Kit A (QIAamp® DNA Microbiome Kit), (Qiagen, Hilden, Germany): 500 µL Buffer AHL (host cell lysis buffer) was added to 1 mL of WM or SM. The pellet obtained after centrifugation was used for further processing.
Kit B (ZR Fungal/Bacterial DNA MiniPrep™), (Zymo Research Corp., Irvine, CA, USA): The pellet obtained from centrifugation of 1 mL of WM (or SM) was resuspended in 250 µL of the resultant supernatant before proceeding to add 750 µL Lysis Solution (Zymo Research Corp., Irvine, CA, USA) to the tube.
Kit C (QIAsymphony DSP DNA Kit), (Qiagen, Hilden, Germany): The pellet obtained from centrifugation of 1 mL of WM (or SM) was resuspended in 250 µL of the resultant supernatant. An “off-board” mechanical lysis step followed as recommended by the manufacturer and as previously described [11], using 750 µL Lysis Solution and ZR BashingBead™ (Zymo Research Corp., Irvine, CA, USA). Following mechanical lysis, the lysate was centrifuged at 5800× g for 1 min, and 400 µL of the supernatant was used for DNA extraction on the QIAsymphony® SP instrument (Qiagen, Hombrechtikon, Switzerland).
Kit D (ZymoBIOMICS™ DNA Miniprep Kit), (Zymo Research Corp., Irvine, CA, USA): The pellet obtained from centrifugation of 1 mL of WM (or SM) was resuspended in 250 µL of the resultant supernatant before proceeding to add 750 µL ZymoBIOMICS™ Lysis Solution (Zymo Research Corp., Irvine, CA, USA) to the tube.
An elution volume of 50 µL was used for all the kits except kit C, for which the minimum elution volume was 60 µL as set by the supplier. For homogeneity and to ensure higher concentrations of DNA samples (as recommended by the manufacturer), a 50 µL elution volume was used for Kit D. With the exception of Kit D, which was eluted in DNase/RNase Free Water, DNA from the kits was eluted in elution buffers. All bead-beating steps were performed in the TissueLyser LT™ (Qiagen, FRITSCH GmbH, Idar-Oberstein, Germany) at a frequency of 50 Hz for 5 min.

2.3. DNA Quantification

The concentration and purity of DNA were measured using a NanoDrop™ ND-2000c Spectrophotometer (Thermo Fisher Scientific Inc., MA, USA). DNA yield was obtained by multiplying the DNA concentration by the final elution volume. The DNA yield from all samples was also assessed by 16S quantitative polymerase chain reaction (qPCR) using a protocol previously described [25]. Each 30 µL PCR reaction contained 2.5 µL DNA template, 1 µL of 0.166 µM probe, 15 µL of SensiFAST™ Probe No-ROX (Catalogue no. BIO-86020, Bioline, MA, USA), 9.5 µL of MilliQ water and 1 µL each of 0.333 µM forward and reverse primer, with conditions as described (Table S2A). The qPCR was carried out on a 7500 Fast Real-Time PCR System (Applied Biosystems, Foster City, CA, USA). DNA was stored at −20 °C until further processing.

2.4. Extraction and Sequencing Controls

The ZMCS was extracted using all four kits and served as a positive extraction control. DNA extraction on all HM samples was done in duplicate on two consecutive days using the respective kits to evaluate extraction repeatability. In addition, DNA extracts from two samples were randomly selected for repeat processing (library preparation and sequencing) to evaluate repeatability of steps following DNA extraction. In low biomass samples such as HM samples, a portion of sequence reads may result from exogenous DNA contributed by reagent contaminants introduced during DNA extraction and 16S rRNA gene library preparation. To allow in silico correction for contamination, DNA extracted from a pure culture of the cyanobacterium Arthrospira spirulina was spiked into DNA extracts from each of the respective elution buffers (negative extraction control) at a 16S rRNA gene concentration similar to that of HM samples as assessed by qPCR.
Since negative controls have little or no “competing” naturally present bacterial DNA, amplification of this small amount of background DNA may lead to overestimation of the contribution of contaminants to bacterial profiles. We compensated for this effect by spiking an amount of known bacterial DNA into control samples at an equivalent concentration to that found in HM samples. These “cyanobacteria-spiked-elution buffers” were included in the library preparation and sequencing steps alongside the samples.
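Returning to the yield calculation in Section 2.3 (DNA concentration multiplied by the final elution volume), a minimal Python sketch is given below, together with a check against the 1.8–2.0 range for the 260/280 ratio cited in the Results. The function names and the example readings are hypothetical; only the elution volumes (50 µL for kits A, B and D, 60 µL for kit C) and the purity range come from the text.

```python
# Final elution volumes (µL) per kit, as stated in Section 2.2.
ELUTION_UL = {"A": 50, "B": 50, "C": 60, "D": 50}


def dna_yield_ng(concentration_ng_per_ul, kit):
    """Total DNA yield (ng) = concentration (ng/µL) x elution volume (µL)."""
    return concentration_ng_per_ul * ELUTION_UL[kit]


def purity_in_range(a260_a280, low=1.8, high=2.0):
    """True if the 260/280 absorbance ratio lies within the cited range."""
    return low <= a260_a280 <= high


# Hypothetical NanoDrop readings, for illustration only.
example_concentration = 12.0  # ng/µL
for kit in ("B", "C"):
    print(kit, dna_yield_ng(example_concentration, kit), "ng")
print(purity_in_range(1.85), purity_in_range(1.42))
```

Because kit C elutes in a larger volume, equal concentrations correspond to a higher total yield for that kit, which is why yield rather than concentration is the quantity compared between kits.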

2.5. 16S Ribosomal Ribonucleic Acid (rRNA) Amplicon Library Preparation

A two-step amplification approach described by Wu and colleagues [26] was employed to avoid PCR amplification biases associated with the use of adapter and index sequences. In the first PCR reaction, the hypervariable V4 region of the 16S rRNA gene was amplified using primers and PCR cycling conditions as previously described [27,28] (Table S2B). Each 25.25 µL PCR reaction contained 12.5 µL 2× MyTaq™ HS Mix (BIO-25046), 2 µL of 0.8 µM forward and reverse primers, 1 µL of MilliQ water, 0.75 µL dimethyl sulphoxide (catalog no. D4540, Sigma-Aldrich®, St. Louis, MO, USA) and 7 µL DNA template. In the second PCR reaction, the same reagents were used as above, except that the template was 7 µL of the amplicon product from the first PCR reaction, and the reverse primers contained Illumina adapters and various unique index sequences at the 3’ end for each sample [29]. The PCR conditions were the same as for the short PCR run described, except that 30 cycles were conducted in the second PCR run [27] (Table S2C). Amplicon products were cleaned with the Agencourt SPRIPlate 96 super Magnet Plate, and the QuantiFluor™ dsDNA System was used to quantify cleaned amplicons [27,28]. The integrity of the cleaned amplicons was checked by gel electrophoresis. Briefly, 5 µL of each cleaned amplicon was analyzed on a 2% agarose gel containing 1% ethidium bromide. Amplicons were normalized by pooling at an equimolar concentration of 100 ng and purified using the Agencourt AMPure system (Beckman Coulter, UK). The pooled library was extracted on a 1.5% agarose gel. The QIAquick Gel Extraction Kit (Qiagen, MA, USA) was used for gel purification with the following minor modification to the manufacturer’s protocol: the elution buffer, Tris-EDTA buffer (pH 8.0), was heated at 70 °C to improve amplicon recovery (step 13). The Qubit® dsDNA BR Assay Kit was used for final quantification of the pooled 16S library.
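The reagent volumes quoted for the first-round PCR can be checked against the stated 25.25 µL reaction volume, and scaled to a master mix, with a short Python sketch. Two points are assumptions rather than statements from the text: the primer volume is read as 2 µL of each primer (the reading that reproduces 25.25 µL), and the plate size and 10% pipetting overage in the example are hypothetical.

```python
# Per-reaction volumes (µL) for the first-round V4 PCR.
# Assumption: 2 µL of EACH primer, which reproduces the stated 25.25 µL total.
FIRST_PCR_UL = {
    "2x MyTaq HS Mix": 12.5,
    "forward primer (0.8 uM)": 2.0,
    "reverse primer (0.8 uM)": 2.0,
    "MilliQ water": 1.0,
    "DMSO": 0.75,
    "DNA template": 7.0,
}


def master_mix(per_reaction, n_reactions, overage=0.10,
               exclude=("DNA template",)):
    """Scale shared reagents to n reactions plus a fractional overage.

    The template is excluded because it is added to each well separately.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(volume * factor, 2)
            for name, volume in per_reaction.items()
            if name not in exclude}


total_ul = sum(FIRST_PCR_UL.values())
mix = master_mix(FIRST_PCR_UL, n_reactions=96)  # hypothetical 96-well plate

print(total_ul)
for reagent, volume in mix.items():
    print(f"{reagent}: {volume} µL")
```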

2.6. 16S Ribosomal Ribonucleic Acid (rRNA) Gene Sequencing

The pooled 16S library was paired-end sequenced on the Illumina® MiSeq™ platform using the MiSeq Reagent v3 kit, 600 cycles (Illumina, CA, USA). The quality control steps entailed (1) the quantitation of adapter-ligated dsDNA using the KAPA Library Quantification Kits (Illumina®) (KAPA Biosystems, MA, USA) and (2) analysis of fragment size of the pooled library with the Agilent High-Sensitivity (HS) DNA Kit (Agilent Technologies, Santa Clara, CA, USA). The library pool was thereafter diluted to 4 nM using Buffer EB (Qiagen, Hilden, Germany), and denatured and neutralized using 0.2 N NaOH and HT1 Buffer (Illumina®). A final library dilution was prepared at 5.5 pM, which was loaded onto the sequencer according to the manufacturer’s instructions [30], alongside the sequencing control (PhiX library) spiked into the 16S library at 15% (v/v).

2.7. Bioinformatics Workflow

The sequencing quality of FASTQ files was assessed using the FASTQC (v0.10.1) package [31]. Forward and reverse sequences were then merged using UPARSE (v7.0.1090), allowing 3 mismatches in overlaps (fastq_maxdiff set to 3), followed by quality filtering using USEARCH9 fastq_filter (sequences truncated to 250 bp). Reads with a maximum expected number of errors >0.1 were discarded (fastq_maxee set to 0.1) [32]. De-replication and selection of sequences occurring more than once was performed with the sortbysize command in USEARCH9. Clustering of sequences into operational taxonomic units (OTUs) (with a clustering radius of 3) was done using the USEARCH9 cluster_otus command. The USEARCH9 uchime2_ref tool was used to detect and remove chimeras, and OTU counts were obtained using USEARCH9 usearch_global [33]. Decontamination of HM samples was performed per kit type, by first removing cyanobacteria sequences from the four “cyanobacteria-spiked-NTC” controls. Sequences remaining after the removal of cyanobacteria sequences were identified as “contaminant sequences”. The latter were screened against HM samples by aligning HM sample sequences to spiked control sequences at 100% similarity using align_seq.py, based on PyNAST [34]. An average number of reads was calculated for each of the “contaminant sequences” matching at 100% similarity to HM sample sequences. “Contaminant sequences” were removed from HM samples by removing the average number of reads calculated from the four “cyanobacteria-spiked-NTC” controls. Further processing of data was performed using the Quantitative Insights Into Microbial Ecology (QIIME) 1.9.1 suite of software tools [35]. OTU picking occurred at 97% sequence similarity, and taxonomic assignment was carried out against the SILVA database (version 132) [36] using the Ribosomal Database Project (RDP) classifier (v2.2) in QIIME (v1.9.1) [35]. A rarefaction plot of Shannon diversity against sequencing depth was also generated in QIIME using alpha_rarefaction.py [35]. The raw sequencing reads were deposited in the NCBI Sequence Read Archive (SRA) database with accession number PRJNA510564.
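Two steps of this workflow lend themselves to a short illustration: the expected-error filter and the subtraction of contaminant reads. The Python sketch below is not the USEARCH or QIIME implementation. It uses the standard definition of expected errors (the sum of per-base error probabilities derived from Phred scores) and a simple per-sequence subtraction of the mean control count. The function names, the sequence identifiers, and the choice to floor negative counts at zero are assumptions made for illustration.

```python
def expected_errors(phred_scores):
    """Expected number of errors in a read: sum of 10^(-Q/10) over bases."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_maxee(phred_scores, max_ee=0.1):
    """Keep a read only if its expected errors do not exceed max_ee."""
    return expected_errors(phred_scores) <= max_ee


def contaminant_means(controls, spike_id):
    """Mean read count per non-spike sequence across the spiked controls."""
    ids = {seq for ctrl in controls for seq in ctrl if seq != spike_id}
    n = len(controls)
    return {seq: sum(ctrl.get(seq, 0) for ctrl in controls) / n for seq in ids}


def decontaminate(sample, means):
    """Subtract mean contaminant counts; floor at zero (an assumption)."""
    return {seq: max(0, round(count - means.get(seq, 0)))
            for seq, count in sample.items()}


# Hypothetical counts: four spiked controls and one milk sample.
controls = [
    {"cyano": 900, "otu_x": 10},
    {"cyano": 950, "otu_x": 30},
    {"cyano": 800},
    {"cyano": 990, "otu_y": 4},
]
means = contaminant_means(controls, spike_id="cyano")
cleaned = decontaminate({"otu_x": 25, "otu_y": 1, "otu_z": 500}, means)

print(passes_maxee([40] * 250), passes_maxee([30] * 250))
print(cleaned)
```

With a threshold of 0.1, a 250 bp read must average roughly Q34 or better to be retained, which shows how stringent this filter is.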

2.8. Data Analysis

Data analysis and graphical illustrations of the data (bar plots, boxplots, dendrograms) were generated in the R statistical package (version 3.4.1) and RStudio 1.1.456 [37]. Agglomerative cluster dendrograms were generated by complete linkage hierarchical clustering [38] using the [hclust] function [39]. This hierarchical clustering method is based on the Bray–Curtis dissimilarity index [40] of the R vegan package [41]. Cluster dendrograms were generated for all OTUs with a relative abundance of >0.5%. Alpha diversity within each sample was measured using the Shannon–Weaver index with the function [diversity] in the R package vegan [42], which measures both the richness and evenness of organisms within a given sample. Analysis of Variance (Type II tests) [43] was used to test for significant differences in alpha diversity between groups, with a significance threshold of p < 0.05, while error estimates were based on Pearson residuals. Log-ratio biplots using a Bayesian prior technique for adjustments of zero counts were made as previously described [44] and employed lambda-scaling to ensure evenness in the “total spread” of the data sets [45]. Log-ratio biplots were used to show multivariate clustering patterns as they are specific for proportions/percentages [46]. Generalized linear models (GLM) were used to test the effect of SM and WM, and the four DNA extraction kits, on HM bacterial profiles at different taxonomy levels. The negative binomial distribution [47] in the package stats with the Quasi-Poisson family function [48] was applied to model over-dispersion. The Benjamini–Hochberg method for multiple testing correction was used to correct all p-values, set at a 5% significance level, by the false discovery rate (FDR) [49]. Tukey’s Honest Significant Differences (HSD) method was used to generate a single-step multiple comparison of means procedure with 95% family-wise confidence intervals [50].
Notched box plots [51] were made to show distribution analysis of the data, as they display the 95% confidence interval for the median.
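For readers less familiar with two of the quantities used in this section, the Python sketch below computes the Shannon index and Benjamini–Hochberg adjusted p-values from first principles. It is intended only to illustrate the definitions; the analyses in the study were run in R, and the example inputs here are hypothetical. The natural logarithm is used for the Shannon index, which is the default in vegan's diversity function.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical inputs for illustration.
print(shannon([10, 10, 10, 10]))   # four equally abundant taxa: ln(4)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

An even community of four taxa gives the maximum possible value for that richness, ln(4), which illustrates how the index combines richness and evenness.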

3. Results

3.1. Influence of DNA Extraction Kits and Lipid Removal on Yield and Quality of DNA Extracted from Un-Spiked Human Milk Samples

The efficiencies of four DNA extraction kits were compared based on yield and purity of the extracted DNA from un-spiked HM samples (n = 160) with the NanoDrop™ ND-2000c Spectrophotometer (Thermo Fisher Scientific Inc., MA, USA) (Figure 2). A significant difference in the yield of DNA extracted was observed between the kits (p = 8.71 × 10^−9) (Figure 2A). Kits B and D gave the highest DNA yield; Tukey’s HSD revealed no significant difference between these two kits (p = 0.96). No significant difference was observed in DNA yield when comparing SM and WM (Figure S1A). However, when comparing bacterial 16S DNA concentration from HM samples, using qPCR, no significant difference was observed between kits (p = 0.253) (Figure S2A). Similarly, no significant difference was observed in 16S DNA concentration when comparing SM and WM (p = 0.524) (Figure S2B). DNA purity was assessed using the 260/280 absorbance ratio measure as previously described [18], and no significant differences were observed between the four kits (p = 0.327) (Figure 2B), though the purity of DNA varied between the kits. Kits B and D had DNA purity closest to the recommended range of 1.8–2.0, while kit C showed a large variation in DNA purity between samples. There was no significant difference in DNA purity between SM and WM (Figure S1B).

Figure 2. DNA yield and purity of the four different commercial kits. Notched box plots showing (A) the DNA yield and (B) the DNA purity (260/280 absorbance) obtained by using each of the four kits. The notched box signifies the 75% (upper) and 25% (lower) quartile showing the distribution of 50% of the samples. The line inside the box plot represents the median, and the notch the 95% confidence interval for the median. The whiskers (top and bottom) represent the maximum and minimum values. Outliers, which are beyond 1.5 times the interquartile range above the maximum value and below the minimum value, are shown with open circles. *** represents p < 0.001.

3.2. Influence of DNA Extraction Kits and Lipid Removal on Bacterial Profiles Obtained from Mock Microbial Community Standard Spiked into Human Milk

To evaluate which DNA extraction kit best extracted the bacterial communities in the known ZMCS, community composition and abundance were assessed after spiking this community into WM and SM of sample 1 (extracted by each of the 4 kits in duplicate) (Figure S3B). When comparing the bacterial 16S DNA concentration in the spiked vs. un-spiked sample, >99.9% of the total 16S DNA within the spiked sample originated from the ZMCS (Table S3), and therefore, the contribution of the endogenous bacterial microbiota of this sample to the profiles generated from it is likely negligible. The log10 median 16S gene copy numbers/µL are shown in Figure S4. There was a 3.8 log10 difference between the medians of spiked (5.7 log10) and un-spiked (1.9 log10) samples. The spiked-in DNA therefore accounted for >99.9% of the total DNA in these samples. Hierarchical cluster analysis was used to create a dendrogram of the bacterial composition of the ZMCS spiked into the HM sample alongside the community profile provided by the manufacturer (Figure S3). Bacterial profiles did not cluster based on whether DNA was extracted from WM or SM but rather based on the DNA extraction kit used. Overall, kit A showed a very different profile compared to ZMCS and the other three kits under investigation. Kit A only extracted DNA from three of the eight bacterial genera/families present in ZMCS. The three other kits (kits B, C and D) represented all eight bacterial genera/families, albeit in differing abundance. Kit C showed the widest variation in composition between replicates, with the samples clustering on different clades of the dendrogram. Kit B (and some replicates for kit C) clustered closest to the ZMCS, suggesting the best representation of the microbial community standard.
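The step from a 3.8 log10 difference to ">99.9%" at the start of this section can be made explicit. The Python sketch below assumes that the endogenous 16S load in the spiked sample equals the median measured in the un-spiked sample, so that the spiked-in share is one minus the ratio of the two loads. The function name is hypothetical.

```python
def spike_fraction(log10_spiked, log10_unspiked):
    """Fraction of total 16S copies attributable to the spike-in.

    Assumes the endogenous load in the spiked sample equals the un-spiked
    load, so the fraction is 1 - 10^(unspiked - spiked).
    """
    return 1 - 10 ** (log10_unspiked - log10_spiked)


# Median log10 16S copies/µL reported in the text.
fraction = spike_fraction(log10_spiked=5.7, log10_unspiked=1.9)
print(f"{fraction:.4%}")
```

Under this assumption the endogenous bacteria contribute roughly 0.016% of the 16S copies, which is consistent with the reported figure of more than 99.9% originating from the ZMCS.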
Beta diversity (Bray–Curtis dissimilarity index) was also computed to show the differences in bacterial composition between the ZMCS and DNA extracted from each kit for WM and SM spiked samples. A lower beta diversity value would mean greater similarity between the composition of ZMCS and the HM sample spiked with ZMCS. For SM, kit B had the lowest beta diversity of 0.06, meaning it most closely represented the bacterial profiles of the ZMCS. For WM, the lowest beta diversity of 0.16 was seen for kits B and C (Table S4).
We further evaluated the differences in relative abundances of the eight bacterial genera/families expected from the known “theoretical” bacterial profile of ZMCS and those resulting from the extraction kits (Figure 3). Gram-negative organisms present in ZMCS were substantially under-represented in samples extracted with Kit A (Figure 3A–C). Kit B best represented the Gram-negative organisms in the known mock community with relative abundances closest to the mock community (Figure 3A–C). The relative abundances of Gram-negative organisms in the ZMCS were over-represented in DNA extracted using Kit C and Kit D (Figure 3A–C). In relation to the five Gram-positive organisms, no kit showed an ideal representation of all (Figure 3D–H). Kit B and Kit C resulted in the closest proportional representation of Lactobacillus spp. to the ZMCS. On the other hand, they showed a lower relative abundance of Listeria spp. and Staphylococcus spp. compared with ZMCS. There were no differences in relative abundances of taxa in ZMCS extracted from SM vs. WM (Figure 4A–H).
Due to the poor performance of kit A in extracting DNA from Pseudomonas spp., Enterobacteriaceae, Enterococcus spp. and Bacillus spp. in the HM sample spiked with ZMCS, the spiked samples were re-extracted with kit A; on this occasion, however, the first steps of the manufacturer’s guidelines involving the initial host DNA depletion step (benzonase treatment) were omitted, a decision based on communication with the manufacturer. The benzonase steps are intended to deplete extracellular bacterial DNA. This modification of the extraction protocol resulted in substantially improved representation of the bacteria in the ZMCS (Figure 5). However, at this point, kit A was excluded from further analysis of (un-spiked) HM samples.
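The Bray–Curtis comparison against the theoretical ZMCS profile described earlier in this section can be sketched in a few lines of Python. The reference abundances are those given by the manufacturer in Section 2.2; the "observed" profile is invented purely for illustration and does not correspond to any kit in the study.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return numerator / denominator


# Theoretical OTU relative abundances (%) of the ZMCS (Section 2.2).
ZMCS = {
    "Pseudomonas": 4.6, "Enterobacteriaceae 1": 11.3,
    "Enterobacteriaceae 2": 10.0, "Listeria": 15.9,
    "Staphylococcus": 13.3, "Lactobacillus": 18.8,
    "Enterococcus": 10.4, "Bacillus": 15.7,
}

# Hypothetical observed profile (%), for illustration only.
observed = {
    "Pseudomonas": 5.6, "Enterobacteriaceae 1": 12.3,
    "Enterobacteriaceae 2": 11.0, "Listeria": 13.9,
    "Staphylococcus": 11.3, "Lactobacillus": 18.8,
    "Enterococcus": 11.4, "Bacillus": 15.7,
}

print(round(bray_curtis(ZMCS, observed), 3))
```

A value of 0 means the extracted profile matches the reference exactly and a value of 1 means the two share no taxa, so the 0.06 reported for kit B on SM indicates a close match.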

Figure 3. Relative abundances of bacterial taxa extracted by kits in comparison to the commercial ZMCS composition. The horizontal line in each box indicates the relative abundance of the taxon in question in the commercial ZMCS as given by the manufacturer. p < 0.05 shows a statistically significant difference in relative abundance of each bacterial taxon between different kits. p-values were generated by ANOVA in the stats package of R and adjusted using the Benjamini–Hochberg false discovery rate. ZMCS = Zymobiomics Microbial Community Standard.
