bioRxiv preprint doi: https://doi.org/10.1101/765610; this version posted September 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. PHYLOGENOMIC CONFLICT IN HYLARANA
1 Exons, Introns, and UCEs Reveal Conflicting Phylogenomic Signals in a Rapid
2 Radiation of Frogs (Ranidae: Hylarana)
3
4 Kin Onn Chan1,2,*, Carl R. Hutter2, Perry L. Wood, Jr.3, L. Lee Grismer4, Rafe M.
5 Brown2
6
7 1 Lee Kong Chian Natural History Museum, Faculty of Science, National University of
8 Singapore, 2 Conservatory Drive, Singapore 117377. Email: [email protected]
9
10 2 Biodiversity Institute and Department of Ecology and Evolutionary Biology, University
11 of Kansas, Lawrence, KS 66045, USA. Email: [email protected]; [email protected]
12
13 3 Department of Biological Sciences & Museum of Natural History, Auburn University,
14 Auburn, Alabama 36849, USA. Email: [email protected]
15
16 4 Herpetology Laboratory, Department of Biology, La Sierra University, 4500 Riverwalk
17 Parkway, Riverside, California 92505, USA. Email: [email protected]
18
19 *Corresponding author
20
21 Abstract.—Numerous types of genomic markers have been used to resolve recalcitrant
22 nodes, yet their relative performance and congruence have rarely been compared directly.
23 Using target-capture sequencing, we obtained more than 12,000 highly informative exons
24 and introns, including ~600 UCEs to address long-standing systematic problems in
25 Southeast Asian Golden-backed frogs of the genus complex Hylarana. To reduce gene
26 tree estimation errors, we filtered the data using different thresholds of taxon
27 completeness and parsimony informative sites (PIS) in addition to using the best-fit
28 models of DNA evolution to estimate individual single-locus gene trees. We then
29 estimated species trees using concatenation (IQ-TREE), summary coalescent (ASTRAL),
30 and distance-based methods (ASTRID). Topological incongruence among these methods
31 and variation in nodal support were examined in detail using a suite of different measures
32 including quartet frequencies, bootstrap, local posterior probabilities, gene concordance
33 factors, and quartet scores. Results showed that high levels of incongruence were present
34 along the backbone of the phylogeny, specifically surrounding short internodes. We also
35 demonstrated that filtering data by PIS was more efficacious at improving congruence
36 compared to filtering by missing data, and that exons were more sensitive to data filtering
37 than introns and UCEs. Despite utilizing more than 6.9 million characters and 2.7 million
38 PIS, analyses failed to converge on a single concordant topology. Instead, exons, introns,
39 and UCEs produced genuinely strongly-supported yet conflicting phylogenetic signals,
40 which affected our phylogeny estimates at different scales/levels—indicating a general,
41 potentially alarming challenge for phylogenomic inference employing many of today's
42 massive datasets. Additionally, bootstrap values were consistently high despite low levels
43 of congruence and high proportions of gene trees that support conflicting topologies,
44 indicating that traditional bootstraps are likely poor measures of congruence or branch
45 support in large phylogenomic datasets, especially during instances of rapid
46 diversification. Although low bootstrap values do ostensibly reflect low heuristic support,
47 we recommend that high bootstrap support obtained from large genomic datasets be
48 interpreted with caution. Additional complementary measures such as quartet frequencies,
49 gene concordance factors, quartet scores, and posterior probabilities can be useful to
50 provide a more robust and accurate representation of bipartition certainty and ultimately,
51 evolutionary history of incompletely resolved or poorly-understood clades.
52 Keywords: FrogCap, bootstrap, branch support, incongruence, quartet frequency, gene
53 concordance factor
54
55 Generating large amounts of data is no longer an issue in the era of
56 phylogenomics. Instead, limitations are imposed by model complexities (parameter
57 space) and computational tractability. Furthermore, analyzing genome-scale data has
58 revealed a different suite of challenges including high levels of incongruence, conflicting
59 evolutionary histories, and systematic bias (Gee 2003; Phillips et al. 2004; Philippe et al.
60 2011; Delsuc et al. 2005; Philippe et al. 2005; Jeffroy et al. 2006; Galtier and Daubin
61 2008; Dell’Ampio et al. 2014; Smith et al. 2015; Zhang et al. 2015; Leaché et al. 2015;
62 Kendall and Colijn 2016; Crowl et al. 2017; Reddy et al. 2017; Platt et al. 2018; Pease et
63 al. 2018; Roycroft et al. 2019). It is therefore important to find a “sweet spot” that
64 optimizes the shifting trade-off between amount of data and analytical resources without
65 compromising the accuracy of inferences. As such, understanding the impacts of data
66 filtering/subsampling strategies and performing robust assessments on analytical methods
67 and the accuracy of species tree inferences are integral components to the rapidly
68 expanding future of the field.
69 Incongruence can arise not only from biological processes such as hybridization,
70 horizontal gene transfer, and incomplete lineage sorting that violate the assumption of
71 orthology (Whitfield and Lockhart 2007; Whitfield and Kjer 2008; Eaton et al. 2015;
72 Meiklejohn et al. 2016; Tarver et al. 2016; Ottenburghs et al. 2017; Léveillé-Bourret et al.
73 2018), but also through systematic biases associated with the analysis of large datasets.
74 Gene tree estimation errors (GTEE) resulting from (but not limited to) model
75 misspecification or insufficient phylogenetic signal can increase noise and affect
76 phylogenetic inference (Roure et al. 2013; Doyle et al. 2015; Roch and Warnow 2015;
77 Vachaspati and Warnow 2015; Blom et al. 2017; Molloy and Warnow 2017; Nute et al.
78 2018). Due to different underlying models and assumptions, different analytical methods
79 such as concatenation, distance-based, and coalescent-based summary methods can also
80 produce variable results. Several studies have argued that concatenation can perform as
81 well as or better than summary methods, which may be adversely affected by high GTEE
82 (Gatesy and Springer 2014; Simmons and Gatesy 2015; Tonini et al. 2015). Conversely,
83 concatenation analyses have also been shown to fail or produce spuriously high support
84 for the wrong tree (Weisrock et al. 2012; Wielstra et al. 2014; Roch and Steel 2015;
85 Warnow 2015; Molloy and Warnow 2017; Mendes and Hahn 2018). Although it is
86 widely acknowledged that GTEE is an important analytical challenge, potentially
87 affecting species tree estimation, recent studies suggest that distance- and summary-based
88 methods that are statistically consistent under the MSC model may perform better under a
89 wide range of model conditions, and that they have the potential to produce low error
90 rates when many genes are available and GTEE is low (Bayzid and Warnow 2013; Patel
91 2013; Lanier and Knowles 2015; Roch and Warnow 2015; Mirarab et al. 2016; Baca et
92 al. 2017; Molloy and Warnow 2017; Nute et al. 2018; Vachaspati and Warnow 2018).
93 Therefore, if large numbers of gene trees can be estimated with low GTEE, the power of
94 coalescent-based methods can be harnessed to estimate species trees with high accuracy.
95 Analyses of massive gene sequence datasets have also demonstrated how
96 traditional measures of support such as the non-parametric bootstrap and posterior
97 probabilities can be positively misleading (Phillips et al. 2004; Seo 2008; Wiens and
98 Morrill 2011; Kumar et al. 2012; Weisrock et al. 2012; Yang and Zhu 2018; Roycroft et
99 al. 2019). Resampling methods such as non-parametric bootstrapping essentially measure
100 site-sampling variance as opposed to observed variance in the data. Because site-
101 sampling variance is an inverse function of sample size (amount of data), bootstrap
102 values will inevitably inflate as the amount of data increases (Felsenstein 1985; Kumar et
103 al. 2012); this tendency does not necessarily reflect variation in the data themselves. In
104 contrast, calculating Bayesian posterior probabilities is computationally expensive and
105 can also produce spuriously high support in big datasets (Susko 2008; Yang and Zhu
106 2018). As genome-scale datasets become more common, more robust characterizations of
107 uncertainty are needed to tease apart conflict from true signal strength (Gadagkar et al.
108 2005; Smith et al. 2015; Minh et al. 2018; Pease et al. 2018), which can be
109 disproportionately obfuscated in nodes that are old or separated by short internal branches
110 (Whitfield and Lockhart 2007; Whitfield and Kjer 2008; Rothfels et al. 2012; Meiklejohn
111 et al. 2016; Blom et al. 2017; Léveillé-Bourret et al. 2018; Mclean et al. 2019; Roycroft
112 et al. 2019).
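The dependence of bootstrap support on sample size can be illustrated with a deliberately simplified simulation. This is a toy sketch and not a likelihood-based bootstrap: it assumes each site independently "votes" for one of three alternative resolutions of a node (labelled A, B, and C, with hypothetical proportions of 38%, 34%, and 28%) and that a replicate recovers whichever resolution wins the plurality of resampled sites. The function names and proportions are illustrative assumptions only.

```python
import random
from collections import Counter


def bootstrap_support(site_votes, n_reps, rng):
    """Fraction of bootstrap replicates in which resolution 'A' wins the site vote."""
    n = len(site_votes)
    wins = 0
    for _ in range(n_reps):
        # Resample sites with replacement, as in the non-parametric bootstrap.
        resample = Counter(rng.choices(site_votes, k=n))
        if max(resample, key=resample.get) == "A":
            wins += 1
    return wins / n_reps


def simulate(n_sites, n_reps=100, seed=1):
    """Return (share of sites favouring A, bootstrap support for A)."""
    rng = random.Random(seed)
    # Hypothetical, strongly conflicting signal: A is favoured by a minority of sites.
    site_votes = rng.choices(["A", "B", "C"], weights=[0.38, 0.34, 0.28], k=n_sites)
    share_a = site_votes.count("A") / n_sites
    return share_a, bootstrap_support(site_votes, n_reps, rng)


if __name__ == "__main__":
    for n_sites in (50, 500, 20000):
        share, support = simulate(n_sites)
        print(n_sites, round(share, 3), support)
```

Under these assumptions, the share of sites favouring A stays near 38% regardless of alignment length, whereas support for A should approach 100% once the alignment is long enough, because resampling variance shrinks while the underlying conflict does not.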
113 Fueled in part by the increasing availability of complete genomes and
114 transcriptomes, the development of target capture methods has impelled the
115 phylogenomic revolution through custom-designed probe-sets that target specific
116 genomic markers with the aim of capturing orthologous and informative loci across
117 different evolutionary timescales (Bi et al. 2012; Faircloth et al. 2012; Lemmon et al.
118 2012; Singhal et al. 2017; Collins and Hrbek 2018). Among the freely-available target
119 capture methods, ultra-conserved elements (UCEs) and exonic markers are widely-used
120 to resolve ambiguous relationships (Bi et al. 2012; Faircloth et al. 2012; Blaimer et al.
121 2015; Bragg et al. 2016, 2018; Hugall et al. 2016; Meiklejohn et al. 2016; Baca et al.
122 2017; Van Dam et al. 2017). UCEs are typically used to reconstruct deep-time
123 evolutionary relationships (Crawford et al. 2012; Faircloth et al. 2012, 2013; McCormack
124 et al. 2012), whereas exon-capture methods are more suitable at moderate evolutionary
125 scales (Bi et al. 2012; Bragg et al. 2016; Abdelkrim et al. 2018; Ilves et al. 2018). Faster-
126 evolving, non-coding introns have also been shown to be effective at resolving
127 problematic nodes at the species, genus, and family level (Armstrong et al. 2001; Allen
128 and Omland 2003; DeBry and Seshadri 2005; Creer 2007; Chojnowski et al. 2008;
129 Krauss et al. 2008; Igea et al. 2010; Folk et al. 2015) and furthermore, have been
130 demonstrated to contain stronger and more congruent phylogenetic signals compared to
131 exons (Chen et al. 2017a). Although these different types of genomic markers have been
132 employed to resolve recalcitrant evolutionary relationships, rarely have direct
133 comparisons been made to examine the relative performance and phylogenetic
134 congruency of different markers (Chen et al. 2017c). A newly developed anuran (frogs
135 and toads) probe-set (“FrogCap”) with a module specifically designed for the superfamily
136 Ranoidea, targets more than 12,000 highly informative and orthologous exonic (and their
137 intervening intronic regions) and UCE loci (Hutter 2019), effectively covering a wide
138 range of diversification timescales. We employed the FrogCap probe-set to examine the
139 efficacy of exons, introns, and UCEs to resolve recalcitrant nodes in a systematically
140 chaotic group of frogs that are plagued by pervasive phylogenetic ambiguity and
141 concomitant taxonomic instability.
142 The systematics of the amphibian family Ranidae has one of the most volatile and
143 contentious taxonomic histories among all amphibian groups (Dubois 1992; Chen et al.
144 2005; Dubois et al. 2005; Frost et al. 2006; Che et al. 2007; Stuart 2008; Pyron and
145 Wiens 2011; Oliver et al. 2015; Yuan et al. 2016; Arifin et al. 2018). Within Ranidae, the
146 systematics and taxonomy of Golden-backed frogs of the genus-complex Hylarana sensu
147 lato (s.l.) are particularly problematic, in part because of morphological similarities
148 arising from convergence and symplesiomorphy. More than a dozen generic and sub-generic names
149 have been created, synonymized, resurrected, and/or revalidated, with the majority of
150 changes based on morphology and/or Sanger-derived genetic markers (Dubois 1992;
151 Oliver et al. 2015; Chan and Brown 2017; Frost 2019). At present, this group harbors at
152 least 94 species, which are distributed across Africa, Southeast Asia, and Australasia
153 (Frost, 2018), thereby presenting interesting systematic challenges, compelling
154 evolutionary questions, and expansive biogeographic significance. However, the absence
155 of a stable, well-resolved phylogeny prevents the investigation of such questions, to say
156 nothing of accurate species diversity estimates (AmphibiaWeb 2019) and conservation
157 status of the taxa involved (IUCN 2018). To address this challenge and explore the
158 general issue of variation in performance and information content of these classes of data,
159 we collected unprecedented amounts of genomic data in the form of exons, introns, and
160 UCEs using the FrogCap probe-set with the main aim of resolving the backbone of the
161 Hylarana s. l. phylogeny. We took specific measures to minimize the effects of GTEE by
162 filtering the data according to various thresholds of taxon completeness (missing data)
163 and phylogenetic information content [proportion of parsimony-informative-sites/loci
164 (PIS)]. Next, individual single-locus gene trees were estimated using the best-fit model of
165 substitution and data were analyzed using concatenation, summary, and distance-based
166 methods. Finally, we assessed incongruence using various measures of branch support
167 including ultrafast bootstrap, local posterior probability, quartet support, and gene
168 concordance factor to: 1) explore their adequacy in capturing the underlying variation in
169 the data; and 2) determine whether uncertainty is due to systematic bias, insufficient
170 phylogenetic signal, or representative of genuine conflict characterized by variable gene
171 histories.
172
173 MATERIALS AND METHODS
174 Taxon Sampling and DNA Extraction
175 We sequenced 31 ingroup samples consisting of 20 species, with representatives
176 from all 10 genera (Table S1). Tissue samples for molecular work were obtained from the
177 museum holdings of The University of Kansas Biodiversity Institute (KU), California
178 Academy of Sciences (CAS), La Sierra University Herpetological Collection, Riverside,
179 California (LSUHC), and the Museum of Vertebrate Zoology, Berkeley (MVZ).
180 Genomic DNA was extracted using the automated Promega Maxwell® RSC Instrument
181 (Tissue DNA kit) and subsequently quantified using the Promega Quantus® Fluorometer.
182
183 Probe Design, Library Preparation, and Sequencing
184 Probe design follows Hutter (2019) and is summarized here. Probes were
185 synthesized as biotinylated RNA oligos in a myBaits kit (Arbor Biosciences™, formerly
186 MYcroarray® Ann Arbor, MI) by matching 25 publicly available transcriptomes to the
187 Nanorana parkeri and Xenopus tropicalis genomes using the program BLAT (Kent
188 2002). Matching sequences were clustered by their genomic coordinates to detect
189 presence/absence across species and to achieve full locus coverage. To narrow the locus
190 selection to coding regions, each cluster was matched to available coding region
191 annotations from the Nanorana parkeri genome using the program EXONERATE (Slater
192 and Birney 2005). Loci from all matching species were then aligned using MAFFT
193 (Katoh and Standley 2013) and subsequently separated into 120 bp-long bait sequences
194 with 2x tiling (50% overlap among baits) using the myBaits-2 kit (40,040 baits) with
195 120mer sized baits. These loci have an additional bait at each end extending into the
196 intronic region to increase the coverage and capture success of these areas. Baits were
197 then filtered, retaining those that lacked sequence repeats, had a GC content of 30%–50%, and
198 did not match their reverse complement or multiple genomic regions.
199 Additionally, 646 UCEs that contain at least 10% informative sites were included
200 (Alexander et al. 2016).
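The tiling and GC-content steps described above can be sketched as follows. This is a minimal illustration, not the FrogCap design code: the function names are hypothetical, and only the 120 bp bait length, 2x tiling, and 30%–50% GC window are taken from the text. The repeat and reverse-complement filters are omitted.

```python
def tile_baits(locus_seq, bait_len=120, tiling=2):
    """Cut a locus into fixed-length baits; 2x tiling gives a 60 bp step (50% overlap)."""
    step = bait_len // tiling
    last_start = len(locus_seq) - bait_len
    if last_start < 0:
        return []  # locus shorter than a single bait
    return [locus_seq[i:i + bait_len] for i in range(0, last_start + 1, step)]


def gc_fraction(seq):
    """Proportion of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def keep_bait(bait, gc_min=0.30, gc_max=0.50):
    """Retain a bait only if its GC content lies within the stated window."""
    return gc_min <= gc_fraction(bait) <= gc_max


if __name__ == "__main__":
    locus = "ACGT" * 75  # hypothetical 300 bp target
    baits = [b for b in tile_baits(locus) if keep_bait(b)]
    print(len(baits))
```

With 2x tiling, every internal base of the target is covered by two baits, which is why the step is half the bait length.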
201 Library preparation was performed by Arbor Biosciences and, briefly, proceeded as follows: (1)
202 genomic DNA was sheared to 300–500 bp; (2) adaptors were ligated to DNA fragments;
203 (3) unique identifiers were attached to the adapters to later identify individual samples;
204 (4) biotinylated 120mer RNA library baits were hybridized to the sequences; (5) target
205 sequences were selected by adhering to magnetic streptavidin beads; (6) target regions
206 were amplified via PCR; and (7) samples were pooled and sequenced on an Illumina
207 HiSeq PE-3000 with 150 bp paired-end reads. Sequencing was performed at the
208 Oklahoma Medical Research Foundation DNA Sequencing Facility.
209
210 Bioinformatics
211 The bioinformatics pipeline for filtering adapter contamination, assembling loci,
212 and exporting alignments is available on GitHub; we used version 2 of the pipeline
213 (https://github.com/chutter/FrogCap-Sequence-Capture). Adapter contamination and
214 other sequencing artefacts were filtered from raw reads using the program AFTERQC
215 (Chen et al. 2017b). Paired-end reads were merged using the program BBMERGE
216 (Bushnell et al. 2017), which avoids inflating coverage for these regions due to uneven
217 lengths from cleaning (Zhang et al. 2014). The cleaned reads were then assembled de
218 novo using the program SPADES v.3.12 (Bankevich et al. 2012) under a variety of k-mer
219 schemes. SPADES also has built-in error correction, so error correction was not
220 performed prior to assembly. The contigs were then matched against the reference probe
221 sequences with BLAT, keeping only those contigs that uniquely matched to the probe
222 sequences. The final set of matching loci was then aligned on a locus-by-locus basis
223 using MAFFT.
224 Alignments were trimmed and saved separately into usable datasets for
225 phylogenetic analyses and data type comparisons: (1) Introns: the exon previously
226 delimited was trimmed out of the original contig and the two remaining intronic regions
227 were concatenated; (2) Exons: each alignment was adjusted to be in an open-reading
228 frame and trimmed to the largest reading frame that accommodated >90% of the
229 sequences; alignments with no clear reading frame were discarded; (3) Exons-combined:
230 exons from the same gene, which may be linked (Lanier and Knowles 2012; Scornavacca
231 and Galtier 2017), were concatenated and treated as a single locus; and (4) UCEs were
232 also saved as a separate dataset. We applied internal trimming only to the intron and UCE
233 alignments using the program trimAl (automatic1 function; Capella-Gutiérrez et al.,
234 2009). All alignments were externally trimmed to ensure that at least 50 percent of the
235 samples had sequence data present.
236
237 Data Filtering and Phylogenetic Analysis
238 We sought to minimize the effects of GTEE by applying two widely-used data
239 filtering strategies. In addition to the unfiltered data, each dataset (Exons, Exons-
240 combined, Introns, and UCEs) was filtered at 50%, 75%, and 95% sampling
241 completeness (loci that did not meet these thresholds were discarded). Because loci with
242 low phylogenetic information can introduce noise and increase GTEE, we also filtered
243 data according to information content using number of parsimony-informative-sites (PIS)
244 as a proxy. We assembled datasets that contained the top 50%, 25%, and 5% of loci with
245 the highest PIS. Summary statistics, partitioning, and concatenation of data were
246 performed using the program AMAS (Borowiec 2016) and custom R scripts.
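The two filtering strategies can be sketched as below. This is an illustrative reimplementation under stated assumptions, not the AMAS or custom R code used in the study: alignments are assumed to be dictionaries mapping taxon names to aligned sequences, a site is counted as parsimony-informative when at least two nucleotide states each occur in at least two sequences, and all function names are hypothetical.

```python
from collections import Counter


def count_pis(alignment):
    """Number of parsimony-informative sites in one locus (gaps and Ns ignored)."""
    seqs = [s.upper() for s in alignment.values()]
    pis = 0
    for column in zip(*seqs):
        counts = Counter(base for base in column if base in "ACGT")
        # Informative: at least two states, each present in at least two taxa.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            pis += 1
    return pis


def completeness(alignment, all_taxa):
    """Fraction of taxa with at least one called base in this locus."""
    present = sum(1 for t in all_taxa if set(alignment.get(t, "").upper()) & set("ACGT"))
    return present / len(all_taxa)


def filter_by_completeness(loci, all_taxa, threshold):
    """Keep loci whose taxon completeness meets the threshold (e.g. 0.5, 0.75, 0.95)."""
    return {name: aln for name, aln in loci.items()
            if completeness(aln, all_taxa) >= threshold}


def top_fraction_by_pis(loci, fraction):
    """Names of the loci in the top fraction (e.g. 0.5, 0.25, 0.05) ranked by PIS."""
    ranked = sorted(loci, key=lambda name: count_pis(loci[name]), reverse=True)
    return ranked[:max(1, int(len(ranked) * fraction))]
```

Completeness filtering applies an absolute threshold, so the number of retained loci depends on the data, whereas PIS filtering as sketched here retains a fixed fraction of loci by rank.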
247 Phylogenetic estimation was performed using concatenation, distance-based, and
248 summary methods. For the concatenation analysis, we used the maximum likelihood
249 program IQ-TREE v1.7 (Nguyen et al. 2015; Chernomor et al. 2016). Due to the sheer
250 number of loci, we only performed an unpartitioned analysis using the GTR+GAMMA
251 substitution model (model testing and partitioned analysis for individual loci were not
252 computationally tractable). Branch support was assessed using 1,000 ultrafast bootstrap
253 replicates (UFB; Hoang et al., 2017). Nodes with UFB >95 were considered strongly-
254 supported.
255 Because empirical and simulation studies have suggested that concatenation
256 analysis can result in the wrong tree with high support, and that unpartitioned analysis
257 can be statistically inconsistent in the presence of incomplete lineage sorting (ILS)
258 (Degnan and Rosenberg 2009; Roch and Steel 2015; Warnow 2015), we also performed
259 distance- and summary-based species tree analyses that are ILS aware and statistically
260 consistent under the multi-species coalescent model. The program ASTRAL-III (Zhang
261 et al. 2018), hereafter referred to only as ASTRAL, was used because it has one of the
262 lowest error rates when the number of informative sites is high and has been shown to
263 produce more accurate results compared to other summary methods under a variety of
264 conditions including high ILS and low GTEE (Mirarab et al. 2014; Davidson et al. 2015;
265 Vachaspati and Warnow 2015, 2018; Ogilvie et al. 2016; Molloy and Warnow 2017).
266 Prior to the species tree analysis, IQ-TREE was used to estimate gene trees for each
267 individual locus. To further reduce GTEE arising from model misspecification, we
268 estimated and used the best-fit substitution model for each individual locus as determined
269 by the program ModelFinder (Kalyaanamoorthy et al. 2017). The resulting gene trees
270 were then used as input in the ASTRAL analysis. Finally, because phylogenomic species
271 tree estimation can benefit from a mixture of genes aimed at resolving different parts of
272 the tree (Townsend and Leuenberger 2011; Chen et al. 2015), we performed an ASTRAL
273 analysis on a dataset comprising 500 loci with the highest PIS from the Exons-combined,
274 Introns, and UCE datasets. To improve accuracy of all ASTRAL analyses, we collapsed
275 branches that were below 10% bootstrap support as recommended by the authors (Zhang
276 et al. 2018).
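Collapsing weakly supported branches amounts to contracting each such branch into a polytomy before the gene trees are summarized. The sketch below shows the operation on a toy tree structure. It is not the tool used in the study: the representation (a leaf is a string; an internal node is a tuple of support value and list of children) and the function name are assumptions made for illustration.

```python
def collapse_low_support(node, min_support=10.0):
    """Return a copy of the tree with internal branches below min_support contracted."""
    if isinstance(node, str):  # leaf
        return node
    support, children = node
    new_children = []
    for child in children:
        child = collapse_low_support(child, min_support)
        is_internal = not isinstance(child, str)
        if is_internal and child[0] is not None and child[0] < min_support:
            # Contract the branch: attach the grandchildren directly to this node.
            new_children.extend(child[1])
        else:
            new_children.append(child)
    return (support, new_children)


if __name__ == "__main__":
    # Hypothetical gene tree; the root carries no support value (None).
    gene_tree = (None, [(95.0, ["A", "B"]), (5.0, ["C", (80.0, ["D", "E"])])])
    print(collapse_low_support(gene_tree))
```

In this example the branch with 5% support is removed, leaving C and the (D, E) clade attached directly to the root, while well-supported branches are untouched.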
277 Finally, the same sets of gene trees were used to estimate species trees using the
278 distance-based method ASTRID, which has been shown to outperform ASTRAL when
279 many genes are available and when ILS is very high (Vachaspati and Warnow 2015).
280
281 Assessing Incongruence
282 ASTRAL quartet scores were computed to summarize the proportion of induced
283 quartet trees (from individual single-locus gene trees) in the species tree. For example, a
284 score of 0.5 would mean that 50% of quartet trees induced by the gene trees are in the
285 species tree. The normalized Robinson-Foulds distance (RFDist) was also used to
286 examine topological congruence between each gene tree and the corresponding species
287 tree derived from ASTRAL. We further used quartet support, quartet frequencies, and the
288 gene concordance factor (gCF) to measure the amount of gene tree conflict around each
289 branch of the species tree. Quartet support and frequencies were calculated in ASTRAL
290 to examine the number of gene tree quartets supporting the primary, second, and third
291 alternative topologies. For every branch of the species tree, the gCF represents the
292 percentage of decisive gene trees containing that branch, while accounting for unequal
293 taxon coverage among gene trees (Minh et al. 2018).
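The two branch-based measures can be illustrated with a short sketch. It makes simplifying assumptions that differ from the software used in the study: trees are nested tuples of taxon names, all gene trees contain every taxon (so the correction for unequal taxon coverage in the gCF is not needed and is not implemented), and the Robinson-Foulds distance is normalized by the total number of non-trivial bipartitions in the two trees. The function names are hypothetical.

```python
def leaf_set(node):
    """All taxon names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def splits(tree):
    """Non-trivial bipartitions of a tree, each stored as the side lacking a fixed taxon."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # orient every bipartition consistently
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaf_set(node)
        side = all_leaves - clade if anchor in clade else clade
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def normalized_rf(tree_a, tree_b):
    """Symmetric difference of bipartitions divided by their total count (0 = identical)."""
    sa, sb = splits(tree_a), splits(tree_b)
    total = len(sa) + len(sb)
    return len(sa ^ sb) / total if total else 0.0


def simple_gcf(species_tree, gene_trees):
    """Percentage of gene trees containing each species-tree branch (full coverage assumed)."""
    gene_splits = [splits(g) for g in gene_trees]
    return {s: 100.0 * sum(s in gs for gs in gene_splits) / len(gene_trees)
            for s in splits(species_tree)}
```

A branch present in every gene tree receives a value of 100 under this simplified gCF, and a gene tree sharing no bipartitions with the species tree has a normalized distance of 1.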
294
295 RESULTS
296 Data Assembly
297 After matching assembled contigs with targeted loci, an average of 13,745 contigs
298 were obtained per sample, with a mean and median length of 939.3 and 896.7
299 respectively (Table S2). Overall ingroup taxon occupancy was high with the exception of
300 one sample (Fig. S1). However, subsequent analyses showed that this had no effect on
301 phylogenetic estimation as the sample was consistently recovered in the same position
302 with high support across all datasets and analyses. Our Exons and Introns datasets had
303 similar occupancies that were slightly lower than UCEs and Exons-combined (Fig. S1).
304 Prior to data filtering, the Exons and Introns datasets consisted of more than 12,000 loci;
305 the UCE dataset contained 638 loci. Exons from the same gene were also combined to
306 form a separate dataset (Exons-combined) comprising 2,254 loci (Table 1). UCE loci
307 were longest on average, followed by Exons-combined, Introns, and Exons. On average,
308 our Introns datasets had the highest number of PIS per locus followed by UCE, Exons,
309 and Exons-combined. Intronic loci also had a much higher proportion of PIS compared to
310 all other datasets (0.5–0.6 vs. 0.2–0.3) and consequently, had a much higher sum of total
311 PIS (>2.7 million before filtering; Fig. S2; Table 1).
312 Filtering by completeness did not drastically reduce the number of loci (except at
313 the most stringent threshold of 95% completeness) and the resulting datasets contained
314 more loci and total PIS than datasets filtered by PIS. The UCE dataset was not filtered at
315 5% PIS because too few loci were retained at that threshold. The average proportion of
316 PIS per locus was not substantially affected by data filtering, indicating that captured loci
317 were consistently informative within a particular marker type (Table 1).
318
319 Phylogenetic Estimation
320 Overall, the phylogenetic analyses produced five different topologies
321 (T1–T5; Table 2). However, topology T5 was only recovered from the Introns 95 dataset
322 (248 loci), which contained numerous poorly supported nodes and hence, was considered
323 inaccurate and not included in further discussions. Topologies T1–T4 were only
324 discordant at three nodes differing in relationships of the
325 Humerana+Hylarana/Amnirana, “Amnirana”/Sylvirana, and Hydrophylax/“Hylarana”
326 celebensis clades (Fig. 1). We therefore focused on these problematic clades in
327 downstream analyses.
328 Different filtering strategies did not generally alter tree topology when analyzed
329 using the same method, except at extreme filtering thresholds (Table 2). However,
330 conflicting topologies were recovered in a number of datasets analyzed by the different
331 methods we employed. Our IQ-TREE analysis recovered topology T2 for all Exons and
332 Exons-combined datasets. However, topology T1 was recovered in analyses of most
333 Exons-combined datasets by ASTRAL and ASTRID. Results for the Introns datasets
334 were variable and all five topologies were recovered, but analyses of the majority of these
335 datasets resulted in the inference of topology T1. The UCE datasets only recovered the
336 T3 and T4 topologies, which varied according to filtering strategy and method of
337 inference. The combined dataset consisting of the 500 most parsimony-informative loci from
338 the Exons-combined, Introns, and UCE datasets produced topology T1 with full support
339 across all analytical methods (Table 2). Individual phylogenies with branch support are
340 presented in the supplementary material.
341
342 Tree Support and Incongruence
343 Average bootstrap support for individual gene trees was relatively high across all
344 datasets (>60% for Exons; >75% for Exons-combined, Introns, and UCE), indicating that
345 the gene trees contained high phylogenetic information and low GTEE (Fig. 2). Filtering
346 strategy had a more pronounced impact on Exons and Exons-combined datasets and
347 filtering by PIS produced gene trees with higher average bootstrap support compared to
348 filtering by completeness. Bootstrap support for gene trees in Introns and UCE datasets
349 was less perturbed by data filtering and showed similar but minor improvements (Fig.
350 2). Similarly, the Exons dataset exhibited the highest observed topological incongruence
351 between gene trees and species trees (measured using Robinson-Foulds distance,
352 RFDist) and was the most sensitive to data filtering (Fig. 2). Filtering by PIS was also
353 more effective in improving topological congruence compared to filtering by
354 completeness. Topological congruence of Introns and UCE datasets was also least
355 sensitive to data filtering (Fig. 2). Reflecting a similar trend, ASTRAL species tree
356 quartet scores (QS) and mean gCF showed very slight improvements when filtered by
357 completeness in the Exons and Exons-combined datasets, but improvement was markedly
358 higher when filtered by PIS. These scores improved to a much lesser degree for the
359 Introns and UCE datasets. On average, quartet scores and mean gCF were highest in
360 analyses of UCEs, followed by Introns, Exons-combined, and Exons datasets (Table 2).
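The Robinson-Foulds comparison between a gene tree and a species tree can be sketched as follows. This is a minimal illustration and not the implementation used in this study: trees are written as nested Python tuples with hypothetical taxon labels A to E, both trees are assumed to contain the same taxa, and the distance is left unnormalized (the count of bipartitions present in one tree but not the other).

```python
def leaf_set(tree):
    """Return the set of leaf labels of a tree given as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions (splits) of an unrooted tree.

    Each split is stored as the side that does NOT contain a fixed anchor
    taxon, so the same split is always represented the same way.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Skip trivial splits that every tree on these taxa shares.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Unnormalized Robinson-Foulds distance between two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must contain the same taxa")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-taxon trees (not topologies from this study).
species_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(species_tree, gene_tree))  # prints 2
```

Averaging rf_distance over all gene trees in a dataset gives a single incongruence summary per dataset; gene trees with missing taxa would first need the species tree pruned to the shared taxon set, which this sketch does not do.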
361
362 Branch Support and Incongruence
363 The inferred level of congruence surrounding each node was strongly and
364 positively correlated with its associated internal branch length and this relationship holds
365 true regardless of data filtering schemes (Fig. 3). The shortest internal branches (Node 1–
366 3) had the lowest gCF and QS and were also where most topological incongruence
367 among datasets occurred (Fig. 1). Overall, bootstrap support values across focal nodes
368 were consistently high and invariant for all data types and filtering strategies, with the
369 exception of Node 1 for the Exons-combined dataset and Node 3 for the UCE dataset
370 (Fig. 4). Conversely, posterior probabilities, gCF, and QS exhibited greater variability
371 across all datasets, presumably providing a better characterization of variation in the data.
372 Unlike gCF, posterior probabilities and QS did not progressively improve with more
373 stringent filtering strategies. Variation in gCF for the Introns and UCE datasets was also
374 smaller compared to Exons and Exons-combined (Fig. 4).
375
376 Node 1.—Topology T2 was recovered only by our Exons and Exons-combined
377 datasets (Table 2). However, an examination of quartet frequencies for the primary,
378 second, and third alternative topologies revealed that relatively equal, and in some
379 datasets equal proportions (unfiltered, 50% and 75% completeness) of gene trees
380 supported either the primary or alternate topology for that node (Fig. 5). For Exons
381 datasets, gCF values were very low (<10%) when unfiltered or filtered by completeness
382 but improved when filtered by PIS. However, despite being associated with low gCF and
383 high proportions of gene trees supporting contrasting topologies, bootstrap values were
384 100 across all Exons datasets (Table S3). Interestingly, although we inferred equal
385 proportions of gene trees supporting either the primary or alternate topologies for the
386 Exons and Exons-combined datasets that were unfiltered and filtered at 50% and 75%
387 completeness (Fig. 5), ASTRAL and ASTRID analyses inferred topologies T2 for Exons
388 and T1 for Exons-combined datasets respectively (Table 2). A closer examination of the
389 numbers of gene trees supporting each topology (as opposed to proportions) revealed that
390 only a small number of additional gene trees supported the primary topology in Exons-
391 combined datasets. However, although the number of additional gene trees supporting the
392 primary (over the alternate) topology was very low (not more than 7; Fig. S3), they were
393 sufficient to infer a different topology with relatively high bootstrap support (UFB 89–95;
394 Table S3). Another conflicting result was produced by the Exons-combined dataset
395 filtered at 25% PIS, which recovered the primary topology with high support (UFB=99;
396 PP=1.0) in ASTRAL, but the alternate topology in ASTRID (Table 2). The Introns and
397 UCE datasets unequivocally supported the primary topology for this node (Fig. 5).
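The contrast between proportions and raw counts described for Node 1 can be made concrete with a short Python sketch. The counts below are hypothetical placeholders chosen only to show the effect and are not values from this study; the function name summarize_node is likewise an assumption.

```python
def summarize_node(counts):
    """Summarize support for competing topologies around one node.

    `counts` maps a topology label to the number of gene trees supporting it.
    Returns whole-number percentages, the best-supported label, and the
    margin (in gene trees) between the two best-supported topologies.
    """
    if len(counts) < 2:
        raise ValueError("need at least two competing topologies")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no gene trees counted")
    percents = {label: round(100 * n / total) for label, n in counts.items()}
    ranked = sorted(counts, key=counts.get, reverse=True)
    margin = counts[ranked[0]] - counts[ranked[1]]
    return percents, ranked[0], margin


# Hypothetical counts for illustration only (not data from this study).
toy_counts = {"primary": 2003, "alternate_1": 2000, "alternate_2": 1497}
percents, best, margin = summarize_node(toy_counts)
print(percents)      # primary and alternate_1 both round to 36
print(best, margin)  # primary 3
```

In this toy case the two leading topologies are indistinguishable as rounded percentages, yet a margin of three gene trees still determines which one is ranked first, which is why counts are worth reporting alongside proportions.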
398
399 Node 2.—The clear majority of gene trees in the Exons and Exons-combined
400 datasets supported the primary topology, whereas the majority of UCE gene trees
401 supported the alternate topology (Fig. 5). The Introns dataset was more ambiguous and
402 resulted in variable support for conflicting alternate topologies depending on how data
403 was filtered (Fig. 5; Table 2). Topologies were inconsistent at extreme filtering thresholds
404 (95% completeness and top 5% PIS), most likely due to insufficient information from the
405 low numbers of retained loci. For the Introns datasets filtered at 50% and 25% PIS, our
406 ASTRID and concatenated analyses recovered the primary topology with high support
407 (UFB=100), whereas ASTRAL recovered the alternate topology with low support for
408 Introns PIS-50 (PP=0.15) but high support for Introns PIS-25 (PP=1.0; Tables 2, S3).
409 Quartet frequencies indicated equal proportions of gene trees (39%) supporting both
410 primary and alternate topologies in the Introns PIS-50 dataset, and only an additional 1%
411 more supporting the alternate over the primary topology in the Introns PIS-25 dataset
412 (Fig. 5). Despite the high level of incongruence surrounding this node, bootstrap support
413 was 100 for all datasets and local posterior probabilities were 1.0 in analyses of all but
414 two datasets (Introns 95, Introns PIS-50; Table S3).
415
416 Node 3.—Only analyses of our UCE dataset resulted in the inference of topology
417 T4. However, bootstrap and posterior probabilities were low and quartet frequencies
418 showed relatively equal numbers of gene trees supporting either the primary or alternate
419 topology, indicating that topology T4 was not strongly supported (Figs. 5, S3; Table S3).
420
421 DISCUSSION
422 Impacts of Data Filtering
423 Our results showed that filtering by PIS is a more successful strategy to improve
424 branch support and topological congruence compared to filtering by taxon
425 completeness/missing data even though fewer loci were retained. Interestingly, the
426 magnitude of improvement was more substantial in exonic loci compared to introns and
427 UCEs, indicating that exonic data are more sensitive to data subsampling; these findings
428 are similar to results reported by Chen et al. (2017a). One possible explanation for this
429 trend is that exonic loci are less variable, contain a higher number of uninformative
430 single-locus gene trees, and therefore produce more variation in gene tree topologies.
431 Filtering out loci with fewer PIS removes uninformative trees, resulting in less gene tree
432 variation and higher congruence between gene trees and species trees. Another possible explanation
433 could be provided by the observation that selection may act on exons, and thus, introduce
434 more phylogenetic noise. Congruence improved considerably when exons of the same
435 gene were combined (Exons-combined), even though this dataset had significantly fewer
436 loci and total PIS. Unexpectedly, despite exons being the class of data most affected by
437 data filtering and having the highest levels of incongruence among datasets, phylogenetic
438 inference using exons was remarkably congruent across all analyses, producing the T2
439 topology regardless of filtering strategies.
440 Even though certain data filtering strategies increased congruence, stringent
441 filtration could come at the expense of data (loss of character information, which
442 ultimately can bias topological estimates). This possibility was exemplified by datasets
443 filtered using the most stringent criteria (95% completeness and top 5% PIS). These
444 datasets had the highest levels of congruence but produced the most erratic topologies,
445 demonstrating that high congruence does not always translate to accurate results. This
446 finding highlights the importance of finding an optimal balance between data filtering
447 thresholds and the amount of retained data. We found that datasets consisting of fewer
448 than 1000 loci were the most unreliable, regardless of the quality of retained loci.
449 Similarly, analyzing data filtered to maximize information did not always produce better
450 results; when testing for a relationship between PIS and gCF using linear regression, we
451 did not find a significant relationship (Fig. S4).
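A test of this kind can be sketched with an ordinary least-squares fit. The Python example below is a minimal illustration under stated assumptions: it is not the analysis script used in this study, the function name simple_linear_regression is hypothetical, and the toy values are placeholders rather than measured PIS or gCF values. It returns the slope, intercept, Pearson correlation, and the t statistic for the null hypothesis of zero slope (n - 2 degrees of freedom); converting t to a p-value would require a t-distribution function from a statistics library.

```python
import math


def simple_linear_regression(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r, t), where r is the Pearson correlation
    and t is the test statistic for a zero slope with n - 2 degrees of
    freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) < 1:
        t = r * math.sqrt((n - 2) / (1 - r * r))
    else:
        # Perfect fit: the statistic is unbounded.
        t = math.copysign(math.inf, r)
    return slope, intercept, r, t


# Hypothetical placeholder values (not data from this study).
toy_pis = [1.0, 2.0, 3.0, 4.0]
toy_gcf = [2.0, 1.0, 2.0, 1.0]
slope, intercept, r, t = simple_linear_regression(toy_pis, toy_gcf)
print(round(slope, 3), round(r, 3))
```

A weak correlation with a small absolute t value, as in the toy case, is the pattern that would correspond to the absence of a significant relationship.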
452
453 Incongruence and Measures of Support
454 Although this study utilized large numbers of informative loci, our results
455 demonstrate that the resolution of recalcitrant nodes, especially those that are separated
456 by very short internal branches, remains a continuing analytical challenge. With the
457 current ability to sample and estimate thousands of individual gene histories in
458 phylogenetic studies of sequence capture data, a single species tree summary consensus
459 topology may still be difficult to empirically estimate. By implementing measures to
460 reduce systematic bias and via a thorough examination of alternate topologies, we
461 demonstrate that different genetic markers such as exons, introns, and UCEs can produce
462 genuinely conflicting phylogenetic signals, which impact phylogenetic inference at
463 multiple scales. At Node 1 (Fig. 5), all Exons datasets supported the alternate topology,
464 whereas Introns and UCEs supported the primary topology. On the other hand, for Nodes
465 2 and 3, the Exons, Exons-combined, and Introns datasets supported the primary
466 topology whereas the alternate topology was supported by the UCE dataset. Given the
467 high levels of conflict among different genetic markers, determining which topology
468 represents the species tree, or whether a single “true” species tree even exists, may not be
469 possible or even desirable when we consider that disregarding a considerable amount of
470 genuine but conflicting signals that are inherently present in the data may misrepresent
471 evolutionary history (Philippe et al. 2011; Hahn and Nakhleh 2016; Crowl et al. 2017;
472 Reddy et al. 2017; Rosser et al. 2017; Platt et al. 2018).
473 The presence of strong conflicting signals also highlights the importance of
474 examining not only branch support, but also alternate topologies when assessing the
475 accuracy of phylogenomic trees. This study showed that alternate topologies can be
476 inferred with high support even when there are equal proportions of gene trees supporting
477 conflicting topologies. In such cases, the probability of obtaining the alternate topology
478 may be as likely as the primary topology and this cannot be gleaned by merely observing
479 branch support values. We therefore advocate for the examination and reporting of
480 alternate topologies when evaluating the confidence of phylogenomic trees, instead of
481 purely relying on measures of branch support. Furthermore, our results showed that
482 increasing the number of loci does not positively correlate with increasing congruence.
483 Instead, we found a remarkably strong and positive correlation between levels of
484 congruence and its associated branch length across all datasets and marker types (Fig. 3).
485 Although a similar relationship was implied by Wiens et al.'s (2008) analysis of 20 loci,
486 we demonstrate this phenomenon at a genomic scale, orders of magnitude
487 more expansive, and with data from across the genome.
488 Despite the known shortcomings of traditional bootstrapping for large datasets
489 (Gadagkar et al. 2005; Kumar et al. 2012; Smith et al. 2015; Rodríguez et al. 2017;
490 Roycroft et al. 2019), the procedure remains one of the most widely used measures of
491 heuristic nodal support, including for phylogenomic datasets. Although alternate bootstrap
492 methods have been proposed to reduce false positives (e.g., resampling sites within
493 partitions or resampling partitions instead of sites; Nei et al. 2001; Gadagkar et al. 2005),
494 these strategies are not computationally tractable for tens of thousands of informative
495 loci. However, traditional bootstrapping remains tractable for assessing statistical support
496 for clades, especially when the amount of information is limited. The continued value of
497 traditional bootstrapping was illustrated by relatively low BS values at Node 1 for the
498 Exons-combined dataset and Node 3 for our UCE dataset, both of which we interpret as likely
499 indicative of insufficient phylogenetic information. On the other hand, high bootstrap
500 values are not necessarily reflective of high confidence for a correct topology, and can be
501 artifacts of big data, which reduces sampling variance (Smith et al. 2015; Rodríguez et al.
502 2017; Roycroft et al. 2019). Although low bootstrap values can in fact reflect poor
503 support, we recommend that high bootstrap values obtained from large genomic datasets
504 be interpreted with caution and that complementary measures such as gCF, quartet
505 scores, and posterior probabilities be used to obtain a better characterization of variation in the
506 data and a more comprehensive perspective of evolutionary history.
507
508 Implications for Systematics and Taxonomy
509 Despite lingering uncertainty surrounding the branching order of specific clades,
510 our findings resolve some long-standing systematic conundrums with high statistical
511 support. The reciprocally monophyletic relationship between Chalcorana and Pulchrana
512 differs from Oliver et al. (2015) but was also inferred recently by Chan and Brown
513 (2017), and is consistently and unequivocally supported across all our analyses. Our
514 results also conclusively demonstrated that Southeast Asian “Amnirana” nicobariensis is
515 not congeneric with the true Amnirana from Africa (the clade containing the generotype
516 species) and, thus, should be placed in a different genus. However, genomic data from
517 additional taxa, especially from the genus Indosylvirana, will be needed before this taxon
518 can be placed with certainty in an existing genus. Another novel insight gleaned from this
519 study is the phylogenetic placement of “Hylarana” celebensis from Sulawesi, Indonesia.
520 This clade was inferred as the sister lineage of the “Indosylvirana” milleti + Papurana
521 clade with high support across all datasets and analyses with the exception of the UCE
522 species tree analysis, which inferred it to be sister to Hydrophylax but with low support.
523 Our results conclusively showed that “Hylarana” celebensis does not belong to the genus
524 Hylarana (the clade containing the generotype species for Hylarana) but, instead, may
525 represent a distinct lineage for which the erection of a new genus may be required. The
526 placement of “Indosylvirana” milleti from Cambodia as the sister lineage of Papurana
527 was unexpected, yet highly supported across all analyses. The genus Papurana is
528 restricted to the Australasian region, which was never connected by land to Indochina
529 (Hall 1998, 2013; Voris 2000). Hence, its sister relationship with a lineage from
530 Cambodia is biogeographically incoherent (just as the inclusion of "Amnirana"
531 nicobariensis makes little biogeographic sense, when included in the otherwise African
532 genus Amnirana). We speculate that missing taxa could be responsible for these
533 anomalous relationships and that inclusion of additional taxa from the intervening regions
534 of Indochina, Borneo, and Wallacea will be necessary to resolve the placement of
535 recalcitrant lineages and stabilize classification.
536
537 SUPPLEMENTARY MATERIAL
538 Data and online-only supplementary materials are available from the Dryad
539 Digital Repository (https://doi.org/10.5061/dryad.bj907mp).
540
541 ACKNOWLEDGEMENTS
542 KOC’s work was supported by U.S. National Science Foundation (DEB 1702036)
543 and the National Geographic Society (9722-15). Indonesian, Philippine, and Solomon
544 Islands sampling were funded by NSF support to RMB (DEB 0640737, 0743491,
545 1557053, respectively) and Illumina sequencing was partially funded by DEB 1654388.
546
547 REFERENCES
548 Abdelkrim J., Aznar-Cormano L., Fedosov A.E., Kantor Y.I., Lozouet P., Phuong M.A.,
549 Zaharias P., Puillandre N. 2018. Exon-Capture-Based phylogeny and diversification
550 of the venomous gastropods (Neogastropoda, Conoidea). Mol. Biol. Evol. 35:2355–
551 2374.
552 Alexander A.M., Su Y.C., Oliveros C.H., Olson K. V., Travers S.L., Brown R.M. 2016.
553 Genomic data reveals potential for hybridization, introgression, and incomplete
554 lineage sorting to confound phylogenetic relationships in an adaptive radiation of
555 narrow-mouth frogs. Evolution (N. Y). 71:475–488.
556 Allen E.V.A.S., Omland K.E. 2003. Novel intron phylogeny supports plumage
557 convergence in Orioles (Icterus). Auk. 120:961–969.
558 AmphibiaWeb. 2019. AmphibiaWeb. Available from http://amphibiaweb.org.
559 Arifin U., Smart U., Hertwig S.T., Smith E.N., Iskandar D.T., Haas A. 2018. Molecular
560 phylogenetic analysis of a taxonomically unstable ranid from Sumatra, Indonesia,
561 reveals a new genus with gastromyzophorous tadpoles and two new species.
562 Zoosystematics Evol. 94:163–193.
563 Armstrong M.H., Braun E.L., Kimball R.T. 2001. Phylogenetic utility of Avian
564 Ovomucoid Intron G: A comparison of nuclear and mitochondrial phylogenies in
565 Galliformes. Auk. 118:799–804.
566 Baca S.M., Alexander A., Gustafson G.T., Short A.E.Z. 2017. Ultraconserved elements
567 show utility in phylogenetic inference of Adephaga (Coleoptera) and suggest
568 paraphyly of ‘Hydradephaga.’ Syst. Entomol. 42:786–795.
569 Bankevich A., Nurk S., Antipov D., Gurevich A.A., Dvorkin M., Kulikov A.S., Lesin
570 V.M., Nikolenko S.I., Pham S., Prjibelski A.D., Pyshkin A. V., Sirotkin A. V.,
571 Vyahhi N., Tesler G., Alekseyev M.A., Pevzner P.A. 2012. SPAdes: A new genome
572 assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol.
573 19:455–477.
574 Bayzid M.S., Warnow T. 2013. Naive binning improves phylogenomic analyses.
575 Bioinformatics. 29:2277–2284.
576 Bi K., Vanderpool D., Singhal S., Linderoth T., Moritz C., Good J.M. 2012.
577 Transcriptome-based exon capture enables highly cost-effective comparative
578 genomic data collection at moderate evolutionary scales. BMC Genomics. 13:403.
579 Blaimer B.B., Brady S.G., Schultz T.R., Lloyd M.W., Fisher B.L., Ward P.S. 2015.
580 Phylogenomic methods outperform traditional multi-locus approaches in resolving
581 deep evolutionary history: A case study of formicine ants. BMC Evol. Biol. 15:1–
582 14.
583 Blom M.P.K., Bragg J.G., Potter S., Moritz C. 2017. Accounting for uncertainty in gene
584 tree estimation: Summary-coalescent species tree inference in a challenging
585 radiation of Australian lizards. Syst. Biol. 66:352–366.
586 Borowiec M.L. 2016. AMAS: a fast tool for alignment manipulation and computing of
587 summary statistics. PeerJ. 4:e1660.
588 Bragg J.G., Potter S., Afonso Silva A.C., Hoskin C.J., Bai B.Y.H., Moritz C. 2018.
589 Phylogenomics of a rapid radiation: The Australian rainbow skinks. BMC Evol.
590 Biol. 18:1–12.
591 Bragg J.G., Potter S., Bi K., Moritz C. 2016. Exon capture phylogenomics: efficacy
592 across scales of divergence. Mol. Ecol. Resour. 16:1059–1068.
593 Bushnell B., Rood J., Singer E. 2017. BBMerge – Accurate paired shotgun read merging
594 via overlap. PLoS One. 12:1–15.
595 Capella-Gutiérrez S., Silla-Martínez J.M., Gabaldón T. 2009. trimAl: a tool for
596 automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics.
597 25:1972–1973.
598 Chan K.O., Brown R.M. 2017. Did true frogs ‘dispersify’? Biol. Lett. 13:20170299.
599 Che J., Pang J., Zhao H., Wu G.F., Zhao E.M., Zhang Y.P. 2007. Phylogeny of Raninae
600 (Anura: Ranidae) inferred from mitochondrial and nuclear sequences. Mol.
601 Phylogenet. Evol. 43:1–13.
602 Chen L., Murphy R.W., Lathrop A., Ngo A., Orlov N.L., Cuc T.H., Somorjai I.L.M.
603 2005. Taxonomic chaos in Asian ranid frogs: an initial phylogenetic resolution.
604 Herpetol. J. 15:231–243.
605 Chen M.Y., Liang D., Zhang P. 2015. Selecting question-specific genes to reduce
606 incongruence in phylogenomics: A case study of jawed vertebrate backbone
607 phylogeny. Syst. Biol. 64:1104–1120.
608 Chen M.Y., Liang D., Zhang P. 2017a. Phylogenomic resolution of the phylogeny of
609 laurasiatherian mammals: Exploring phylogenetic signals within coding and
610 noncoding sequences. Genome Biol. Evol. 9:1998–2012.
611 Chen S., Huang T., Zhou Y., Han Y., Xu M., Gu J. 2017b. AfterQC: Automatic filtering,
612 trimming, error removing and quality control for fastq data. BMC Bioinformatics.
613 18:91–100.
614 Chen Z., Li H., Zhu Y., Feng Q., He Y., Chen X. 2017c. Molecular phylogeny of the
615 family Dicroglossidae (Amphibia: Anura) inferred from complete mitochondrial
616 genomes. Biochem. Syst. Ecol. 71:1–9.
617 Chernomor O., Von Haeseler A., Minh B.Q. 2016. Terrace aware data structure for
618 phylogenomic inference from supermatrices. Syst. Biol. 65:997–1008.
619 Chojnowski J.L., Kimball R.T., Braun E.L. 2008. Introns outperform exons in analyses of
620 basal avian phylogeny using clathrin heavy chain genes. Gene. 410:89–96.
621 Collins R.A., Hrbek T. 2018. An in silico comparison of protocols for dated
622 phylogenomics. Syst. Biol. 67:633–650.
623 Crawford N.G., Faircloth B.C., McCormack J.E., Brumfield R.T., Winker K., Glenn T.C.
624 2012. More than 1000 ultraconserved elements provide evidence that turtles are the
625 sister group of archosaurs. Biol. Lett. 8:783–786.
626 Creer S. 2007. Choosing and using introns in molecular phylogenetics. Evol. Bioinforma.
627 3:99–108.
628 Crowl A.A., Myers C., Cellinese N. 2017. Embracing discordance: Phylogenomic
629 analyses provide evidence for allopolyploidy leading to cryptic diversity in a
630 Mediterranean Campanula (Campanulaceae) clade. Evolution (N. Y). 71:913–922.
631 Van Dam M.H., Lam A.W., Sagata K., Gewa B., Laufa R., Balke M., Faircloth B.C.,
632 Riedel A. 2017. Ultraconserved elements (UCEs) resolve the phylogeny of
633 Australasian smurf-weevils. PLoS One. 12:1–21.
634 Davidson R., Vachaspati P., Mirarab S., Warnow T. 2015. Phylogenomic species tree
635 estimation in the presence of incomplete lineage sorting and horizontal gene
636 transfer. BMC Genomics. 16:S1.
637 DeBry R.W., Seshadri S. 2005. Nuclear intron sequences for phylogenetics of closely
638 related mammals: an example using the phylogeny of Mus. J. Mammal. 82:280–
639 288.
640 Degnan J.H., Rosenberg N.A. 2009. Gene tree discordance, phylogenetic inference and
641 the multispecies coalescent. Trends Ecol. Evol. 24:332–340.
642 Dell’Ampio E., Meusemann K., Szucsich N.U., Peters R.S., Meyer B., Borner J.,
643 Petersen M., Aberer A.J., Stamatakis A., Walzl M.G., Minh B.Q., Von Haeseler A.,
644 Ebersberger I., Pass G., Misof B. 2014. Decisive data sets in phylogenomics:
645 Lessons from studies on the phylogenetic relationships of primarily wingless insects.
646 Mol. Biol. Evol. 31:239–249.
647 Delsuc F., Brinkmann H., Philippe H. 2005. Phylogenomics and the reconstruction of the
648 tree of life. Nat. Rev. Genet. 6:361–375.
649 Doyle V.P., Young R.E., Naylor G.J.P., Brown J.M. 2015. Can we identify genes with
650 increased phylogenetic reliability? Syst. Biol. 64:824–837.
651 Dubois A. 1992. Notes sur la classification des Ranidae (Amphibiens anoures). Bull.
652 Mens. la Société Linnéenne Lyon. 61:305–352.
653 Dubois A., Crombie R.I., Glaw F. 2005. Amphibia Mundi. 1.2. Recent amphibians:
654 Generic and infrageneric taxonomic additions (1981-2002). Alytes. 23:25–69.
655 Eaton D.A.R., Hipp A.L., González-Rodríguez A., Cavender-Bares J. 2015. Historical
656 introgression among the American live oaks and the comparative nature of tests for
657 introgression. Evolution (N. Y). 69:2587–2601.
658 Faircloth B.C., McCormack J.E., Crawford N.G., Harvey M.G., Brumfield R.T., Glenn
659 T.C. 2012. Ultraconserved elements anchor thousands of genetic markers spanning
660 multiple evolutionary timescales. Syst. Biol. 61:717–726.
661 Faircloth B.C., Sorenson L., Santini F., Alfaro M.E. 2013. A phylogenomic perspective
662 on the radiation of ray-finned fishes based upon targeted sequencing of
663 Ultraconserved Elements (UCEs). PLoS One. 8.
664 Felsenstein J. 1985. Confidence limits on phylogenies: an approach using the bootstrap.
665 Evolution (N. Y). 39:783–791.
666 Folk R.A., Mandel J.R., Freudenstein J. V. 2015. A protocol for targeted enrichment of
667 intron-containing sequence markers for recent radiations: A phylogenomic example
668 from Heuchera (Saxifragaceae). Appl. Plant Sci. 3:1500039.
669 Frost D.R. 2019. Amphibian Species of the World: an Online Reference. Version 6.0
670 (accessed 10 June 2019). .
671 Frost D.R., Grant T., Faivovich J., Bain R.H., Haas A., Haddad C.F.B., De Sá R.O.,
672 Channing A., Wilkinson M., Donnellan S.C., Raxworthy C.J., Campbell J. a., Blotto
673 B.L., Moler P., Drewes R.C., Nussbaum R. a., Lynch J.D., Green D.M., Wheeler
674 W.C. 2006. The amphibian tree of life. Bull. Am. Museum Nat. Hist. 297:1–291.
675 Gadagkar S.R., Rosenberg M.S., Kumar S. 2005. Inferring species phylogenies from
676 multiple genes: Concatenated sequence tree versus consensus gene tree. J. Exp.
677 Zool. Part B Mol. Dev. Evol. 304:64–74.
678 Galtier N., Daubin V. 2008. Dealing with incongruence in phylogenomic analyses.
679 Philos. Trans. R. Soc. B Biol. Sci. 363:4023–4029.
680 Gatesy J., Springer M.S. 2014. Phylogenetic analysis at deep timescales: Unreliable gene
681 trees, bypassed hidden support, and the coalescence/concatalescence conundrum.
682 Mol. Phylogenet. Evol. 80:231–266.
683 Gee H. 2003. Evolution: ending incongruence. Nature. 425:782.
684 Hahn M.W., Nakhleh L. 2016. Irrational exuberance for resolved species trees. Evolution
685 (N. Y). 70:7–17.
686 Hall R. 1998. The plate tectonics of Cenozoic SE Asia and the distribution of land and
687 sea. Biogeogr. Geol. Evol. SE Asia.:99–131.
688 Hall R. 2013. The palaeogeography of Sundaland and Wallacea since the Late Jurassic. J.
689 Limnol. 72:1–17.
690 Hoang D.T., Chernomor O., von Haeseler A., Minh B.Q., Le S.V. 2017. UFBoot2:
691 improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35:518–522.
692 Hugall A.F., O’Hara T.D., Hunjan S., Nilsen R., Moussalli A. 2016. An exon-capture
693 system for the entire class Ophiuroidea. Mol. Biol. Evol. 33:281–294.
694 Hutter C.R. 2019. FrogCap: A novel exon-capture probeset for Ranoidea. bioRxiv. In
695 press.
696 Igea J., Juste J., Castresana J. 2010. Novel intron markers to study the phylogeny of
697 closely related mammalian species. BMC Evol. Biol. 10:369.
698 Ilves K.L., Torti D., López-Fernández H. 2018. Exon-based phylogenomics strengthens
699 the phylogeny of Neotropical cichlids and identifies remaining conflicting clades
700 (Cichliformes: Cichlidae: Cichlinae). Mol. Phylogenet. Evol. 118:232–243.
701 IUCN. 2018. The IUCN Red List of Threatened Species.
702 Jeffroy O., Brinkmann H., Delsuc F., Philippe H. 2006. Phylogenomics: the beginning of
703 incongruence? Trends Genet. 22:225–231.
704 Kalyaanamoorthy S., Minh B.Q., Wong T.K.F., von Haeseler A., Jermiin L.S. 2017.
705 ModelFinder: fast model selection for accurate phylogenetic estimates. Nat.
706 Methods. 14:587–589.
707 Katoh K., Standley D.M. 2013. MAFFT multiple sequence alignment software version 7:
708 Improvements in performance and usability. Mol. Biol. Evol. 30:772–780.
709 Kendall M., Colijn C. 2016. Mapping phylogenetic trees to reveal distinct patterns of
710 evolution. Mol. Biol. Evol. 33:2735–2743.
711 Kent W.J. 2002. BLAT — The BLAST-Like Alignment Tool. Genome Res. 12:656–
712 664.
713 Krauss V., Thümmler C., Georgi F., Lehmann J., Stadler P.F., Eisenhardt C. 2008. Near
714 intron positions are reliable phylogenetic markers: an application to Holometabolous
715 insects. Mol. Biol. Evol. 25:821–830.
716 Kumar S., Filipski A.J., Battistuzzi F.U., Kosakovsky Pond S.L., Tamura K. 2012.
717 Statistics and truth in phylogenomics. Mol. Biol. Evol. 29:457–472.
718 Lanier H.C., Knowles L.L. 2012. Is recombination a problem for species-tree analyses?
719 Syst. Biol. 61:691–701.
720 Lanier H.C., Knowles L.L. 2015. Applying species-tree analyses to deep phylogenetic
721 histories: Challenges and potential suggested from a survey of empirical
722 phylogenetic studies. Mol. Phylogenet. Evol. 83:191–199.
723 Leaché A.D., Chavez A.S., Jones L.N., Grummer J.A., Gottscho A.D., Linkem C.W.
724 2015. Phylogenomics of Phrynosomatid lizards: conflicting signals from sequence
725 capture versus restriction site associated DNA sequencing. Genome Biol. Evol.
726 7:706–719.
727 Lemmon A.R., Emme S.A., Lemmon E.M. 2012. Anchored hybrid enrichment for
728 massively high-throughput phylogenomics. Syst. Biol. 61:727–744.
729 Léveillé-Bourret É., Starr J.R., Ford B.A., Moriarty Lemmon E., Lemmon A.R. 2018.
730 Resolving rapid radiations within Angiosperm families using anchored
731 phylogenomics. Syst. Biol. 67:94–112.
732 McCormack J.E., Faircloth B.C., Crawford N.G., Gowaty P.A., Brumfield R.T., Glenn
733 T.C. 2012. Ultraconserved elements are novel phylogenomic markers that resolve
734 placental mammal phylogeny when combined with species-tree analysis. Genome
735 Res. 22:746–754.
736 McLean B.S., Bell K.C., Allen J.M., Helgen K.M., Cook J.A. 2019. Impacts of inference
737 method and data set filtering on phylogenomic resolution in a rapid radiation of
738 Ground Squirrels (Xerinae: Marmotini). Syst. Biol. 68:298–316.
739 Meiklejohn K.A., Faircloth B.C., Glenn T.C., Kimball R.T., Braun E.L. 2016. Analysis
740 of a rapid evolutionary radiation using ultraconserved elements: evidence for a bias
741 in some multispecies coalescent methods. Syst. Biol. 65:612–627.
742 Mendes F.K., Hahn M.W. 2018. Why concatenation fails near the anomaly zone. Syst.
743 Biol. 67:158–169.
744 Minh B.Q., Hahn M.W., Lanfear R. 2018. New methods to calculate concordance factors
745 for phylogenomic datasets. bioRxiv.:doi: http://dx.doi.org/10.1101/487801.
746 Mirarab S., Bayzid M.S., Warnow T. 2016. Evaluating summary methods for multilocus
747 species tree estimation in the presence of incomplete lineage sorting. Syst. Biol.
748 65:366–380.
749 Mirarab S., Reaz R., Bayzid M.S., Zimmermann T., Swenson M.S., Warnow T. 2014.
750 ASTRAL: Genome-scale coalescent-based species tree estimation. Bioinformatics.
751 30:541–548.
752 Molloy E.K., Warnow T. 2017. To include or not to include: the impact of gene filtering
753 on species tree estimation methods. Syst. Biol. 67:285–303.
754 Nei M., Xu P., Glazko G. 2001. Estimation of divergence times from multiprotein
755 sequences for a few mammalian species and several distantly related organisms.
756 Proc. Natl. Acad. Sci. 98:2497–2502.
757 Nguyen L.T., Schmidt H.A., von Haeseler A., Minh B.Q. 2015. IQ-TREE: A fast and
758 effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol.
759 Biol. Evol. 32:268–274.
760 Nute M., Chou J., Molloy E.K., Warnow T. 2018. The performance of coalescent-based
761 species tree estimation methods under models of missing data. BMC Genomics.
762 19:1–22.
763 Ogilvie H.A., Heled J., Xie D., Drummond A.J. 2016. Computational performance and
764 statistical accuracy of *BEAST and comparisons with other methods. Syst. Biol.
765 65:381–396.
766 Oliver L.A., Prendini E., Kraus F., Raxworthy C.J. 2015. Systematics and biogeography
767 of the Hylarana frog (Anura: Ranidae) radiation across tropical Australasia,
768 Southeast Asia, and Africa. Mol. Phylogenet. Evol. 90:176–192.
769 Ottenburghs J., Kraus R.H.S., van Hooft P., van Wieren S.E., Ydenberg R.C., Prins
770 H.H.T. 2017. Avian introgression in the genomic era. Avian Res. 8:1–11.
771 Patel S. 2013. Error in Phylogenetic Estimation for Bushes in the Tree of Life. J.
772 Phylogenetics Evol. Biol. 01:1–10.
773 Pease J.B., Brown J.W., Walker J.F., Hinchliff C.E., Smith S.A. 2018. Quartet Sampling
774 distinguishes lack of support from conflicting support in the green plant tree of life.
775 Am. J. Bot. 105:385–403.
776 Philippe H., Brinkmann H., Lavrov D. V., Littlewood D.T.J., Manuel M., Wörheide G.,
777 Baurain D. 2011. Resolving difficult phylogenetic questions: Why more sequences
778 are not enough. PLoS Biol. 9.
779 Philippe H., Delsuc F., Brinkmann H., Lartillot N. 2005. Phylogenomics. Annu. Rev.
780 Ecol. Evol. Syst. 36:541–562.
781 Phillips M.J., Delsuc F., Penny D. 2004. Genome-scale phylogeny and the detection of
782 systematic biases. Mol. Biol. Evol. 21:1455–1458.
783 Platt R.N., Faircloth B.C., Sullivan K.A.M., Kieran T.J., Glenn T.C., Vandewege M.W.,
784 Lee T.E., Baker R.J., Stevens R.D., Ray D.A. 2018. Conflicting evolutionary
785 histories of the mitochondrial and nuclear genomes in New World Myotis bats. Syst.
786 Biol. 67:236–249.
787 Pyron R.A., Wiens J.J. 2011. A large-scale phylogeny of Amphibia including over 2800
788 species, and a revised classification of extant frogs, salamanders, and caecilians.
789 Mol. Phylogenet. Evol. 61:543–583.
790 Reddy S., Kimball R.T., Pandey A., Hosner P.A., Braun M.J., Hackett S.J., Han K.L.,
791 Harshman J., Huddleston C.J., Kingston S., Marks B.D., Miglia K.J., Moore W.S.,
792 Sheldon F.H., Witt C.C., Yuri T., Braun E.L. 2017. Why do phylogenomic data sets
793 yield conflicting trees? Data type influences the avian tree of life more than taxon
794 sampling. Syst. Biol. 66:857–879.
795 Roch S., Steel M. 2015. Likelihood-based tree reconstruction on a concatenation of
796 aligned sequence data sets can be statistically inconsistent. Theor. Popul. Biol.
797 100:56–62.
798 Roch S., Warnow T. 2015. On the robustness to gene tree estimation error (or lack
799 thereof) of coalescent-based species tree methods. Syst. Biol. 64:663–676.
800 Rodríguez A., Burgon J.D., Lyra M., Irisarri I., Baurain D., Blaustein L., Göçmen B.,
801 Künzel S., Mable B.K., Nolte A.W., Veith M., Steinfartz S., Elmer K.R., Philippe
802 H., Vences M. 2017. Inferring the shallow phylogeny of true salamanders
803 (Salamandra) by multiple phylogenomic approaches. Mol. Phylogenet. Evol.
804 115:16–26.
805 Rosser N.L., Thomas L., Stankowski S., Richards Z.T., Kennington W.J., Johnson M.S.
806 2017. Phylogenomics provides new insight into evolutionary relationships and
807 genealogical discordance in the reef-building coral genus Acropora. Proc. R. Soc. B
808 Biol. Sci. 284.
809 Rothfels C.J., Larsson A., Kuo L.Y., Korall P., Chiou W.L., Pryer K.M. 2012.
810 Overcoming deep roots, fast rates, and short internodes to resolve the ancient rapid
811 radiation of eupolypod II ferns. Syst. Biol. 61:490–509.
812 Roure B., Baurain D., Philippe H. 2013. Impact of missing data on phylogenies inferred
813 from empirical phylogenomic data sets. Mol. Biol. Evol. 30:197–214.
814 Roycroft E.J., Moussalli A., Rowe K.C. 2019. Phylogenomics Uncovers Confidence and
815 Conflict in the Rapid Radiation of Australo-Papuan Rodents. Syst. Biol. syz044.
816 Scornavacca C., Galtier N. 2017. Incomplete lineage sorting in mammalian
817 phylogenomics. Syst. Biol. 66:112–120.
818 Seo T.K. 2008. Calculating bootstrap probabilities of phylogeny using multilocus
819 sequence data. Mol. Biol. Evol. 25:960–971.
820 Simmons M.P., Gatesy J. 2015. Coalescence vs. concatenation: Sophisticated analyses
821 vs. first principles applied to rooting the angiosperms. Mol. Phylogenet. Evol.
822 91:98–122.
823 Singhal S., Grundler M., Colli G., Rabosky D.L. 2017. Squamate conserved loci (SqCL):
824 a unified set of conserved loci for phylogenomics and population genetics of
825 squamate reptiles. Mol. Ecol. Resour. 17:e12–e24.
826 Slater G., Birney E. 2005. Automated generation of heuristics for biological sequence
827 comparison. BMC Bioinformatics. 6:31.
828 Smith S.A., Moore M.J., Brown J.W., Yang Y. 2015. Analysis of phylogenomic datasets
829 reveals conflict, concordance, and gene duplications with examples from animals
830 and plants. BMC Evol. Biol. 15:1–15.
831 Stuart B.L. 2008. The phylogenetic problem of Huia (Amphibia: Ranidae). Mol.
832 Phylogenet. Evol. 46:49–60.
833 Susko E. 2008. On the distributions of bootstrap support and posterior distributions for a
834 star tree. Syst. Biol. 57:602–612.
835 Tarver J.E., Dos Reis M., Mirarab S., Moran R.J., Parker S., O’Reilly J.E., King B.L.,
836 O’Connell M.J., Asher R.J., Warnow T., Peterson K.J., Donoghue P.C.J., Pisani D.
837 2016. The interrelationships of placental mammals and the limits of phylogenetic
838 inference. Genome Biol. Evol. 8:330–344.
839 Tonini J., Moore A., Stern D., Shcheglovitova M., Orti G. 2015. Concatenation and
840 species tree methods exhibit statistically indistinguishable accuracy under a range of
841 simulated conditions. PLOS Curr. Tree Life.
843 Townsend J.P., Leuenberger C. 2011. Taxon Sampling and the Optimal Rates of
844 Evolution for Phylogenetic Inference. Syst. Biol. 60:358–365.
845 Vachaspati P., Warnow T. 2015. ASTRID: Accurate species TRees from internode
846 distances. BMC Genomics. 16:1–13.
847 Vachaspati P., Warnow T. 2018. SVDquest: Improving SVDquartets species tree
848 estimation using exact optimization within a constrained search space. Mol.
849 Phylogenet. Evol. 124:122–136.
850 Voris H.K. 2000. Maps of Pleistocene sea levels in Southeast Asia: Shorelines, river
851 systems and time durations. J. Biogeogr. 27:1153–1167.
852 Warnow T. 2015. Concatenation analyses in the presence of incomplete lineage sorting.
853 PLOS Curr. Tree Life.:1–10.
854 Weisrock D.W., Smith S.D., Chan L.M., Biebouw K., Kappeler P.M., Yoder A.D. 2012.
855 Concatenation and concordance in the reconstruction of mouse lemur phylogeny: An
856 empirical demonstration of the effect of allele sampling in phylogenetics. Mol. Biol.
857 Evol. 29:1615–1630.
858 Whitfield J.B., Kjer K.M. 2008. Ancient rapid radiations of insects: challenges for
859 phylogenetic analysis. Annu. Rev. Entomol. 53:449–472.
860 Whitfield J.B., Lockhart P.J. 2007. Deciphering ancient rapid radiations. Trends Ecol.
861 Evol. 22:258–265.
862 Wielstra B., Arntzen J.W., Van Der Gaag K.J., Pabijan M., Babik W. 2014. Data
863 concatenation, Bayesian concordance and coalescent-based analyses of the species
864 tree for the rapid radiation of Triturus newts. PLoS One. 9.
865 Wiens J.J., Kuczynski C.A., Smith S.A., Mulcahy D.G., Sites J.W., Townsend T.M.,
866 Reeder T.W. 2008. Branch lengths, support, and congruence: Testing the
867 phylogenomic approach with 20 nuclear loci in snakes. Syst. Biol. 57:420–431.
868 Wiens J.J., Morrill M.C. 2011. Missing data in phylogenetic analysis: Reconciling results
869 from simulations and empirical data. Syst. Biol. 60:719–731.
870 Yang Z., Zhu T. 2018. Bayesian selection of misspecified models is overconfident and
871 may cause spurious posterior probabilities for phylogenetic trees. Proc. Natl. Acad.
872 Sci. 115:1854–1859.
873 Yuan Z.Y., Zhou W.W., Chen X., Poyarkov N.A., Chen H.M., Jang-Liaw N.H., Chou
874 W.H., Matzke N.J., Iizuka K., Min M.S., Kuzmin S.L., Zhang Y.P., Cannatella D.C.,
875 Hillis D.M., Che J. 2016. Spatiotemporal diversification of the True Frogs (genus
876 Rana): A historical framework for a widely studied group of model organisms. Syst.
877 Biol. 65:824–842.
878 Zhang C., Rabiee M., Sayyari E., Mirarab S. 2018. ASTRAL-III: Polynomial time
879 species tree reconstruction from partially resolved gene trees. BMC Bioinformatics.
880 19:15–30.
881 Zhang J., Kobert K., Flouri T., Stamatakis A. 2014. PEAR: A fast and accurate Illumina
882 Paired-End reAd mergeR. Bioinformatics. 30:614–620.
883 Zhang Q., Feild T.S., Antonelli A. 2015. Assessing the impact of phylogenetic
884 incongruence on taxonomy, floral evolution, biogeographical history, and
885 phylogenetic diversity. Am. J. Bot. 102:566–580.
886
887 FIGURE CAPTIONS
888 Figure 1 Comparisons of the four primary topologies obtained from phylogenetic
889 analyses across the various datasets with discordant taxa highlighted in red. The three focal
890 nodes with the highest discordance are labelled with a red circle.
891
892 Figure 2 Density plots showing average bootstrap values for each single-locus gene tree
893 and normalized Robinson–Foulds distances between each gene tree and the
894 corresponding species tree. Vertical dotted lines represent mean values for each dataset.
895
896 Figure 3 Relationship between branch length (in coalescent units) and its corresponding
897 gene concordance factor (gCF). Branch lengths were obtained from the ASTRAL
898 analysis.
899
900 Figure 4 Comparison of ultrafast bootstrap values (from IQ-TREE), local posterior
901 probabilities (from ASTRAL), gene concordance factor, and quartet support (from
902 ASTRAL) for each focal node across the various datasets.
903
904 Figure 5 Frequency (in percentage) of the three possible topologies surrounding each
905 focal node. Cladograms representing each possible topology are color coded to match the
906 stacked bars.
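
The normalized Robinson–Foulds distance plotted in Figure 2 can be illustrated with a minimal Python sketch. This is not the pipeline used in the study. It assumes trees are supplied as nested tuples of taxon names (not Newick files), treats them as unrooted, and normalizes by 2(n − 3), the maximum distance between two fully resolved trees on n shared taxa. The function names `leaf_set`, `bipartitions`, and `normalized_rf` and the toy trees are hypothetical.

```python
def leaf_set(node):
    """Collect all taxon names below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def bipartitions(tree):
    """Return the non-trivial splits of a tree, treated as unrooted."""
    taxa = leaf_set(tree)
    reference = min(taxa)  # fixed taxon used to orient every split the same way
    splits = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        # Keep only splits with at least two taxa on each side.
        if 2 <= len(below) <= len(taxa) - 2:
            splits.add(taxa - below if reference in below else below)
        return below

    walk(tree)
    return splits


def normalized_rf(tree_a, tree_b):
    """Symmetric-difference (RF) distance divided by its maximum, 2(n - 3)."""
    taxa = leaf_set(tree_a)
    if taxa != leaf_set(tree_b):
        raise ValueError("trees must contain the same taxa")
    if len(taxa) < 4:
        raise ValueError("at least four taxa are needed")
    differing = bipartitions(tree_a) ^ bipartitions(tree_b)
    return len(differing) / (2 * (len(taxa) - 3))


# Toy example (hypothetical taxa): the trees share one of their two splits.
gene_tree = (("A", "B"), "C", ("D", "E"))
species_tree = (("A", "C"), "B", ("D", "E"))
print(normalized_rf(gene_tree, species_tree))  # 0.5
```

A value of 0 means the gene tree and species tree share every internal split, and a value of 1 means two fully resolved trees share none.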
907
908 Table 1. Attributes and summary statistics of the various datasets used in this study. PIS
909 = parsimony informative sites.
Dataset             Filtering      No. loci  Locus length (mean | median)  Total PIS  Mean prop. PIS
Exons-unfiltered    None           12,332    213 | 165                     573,425    0.2
Exons 50            50% complete   10,375    215 | 168                     507,033    0.21
Exons 75            75% complete   8,599     224 | 171                     446,916    0.22
Exons 95            95% complete   770       312 | 210                     57,467     0.23
Exons PIS-50        Top 50% PIS    6,166     286 | 207                     441,134    0.25
Exons PIS-25        Top 25% PIS    3,083     273 | 390                     319,642    0.27
Exons PIS-5         Top 5% PIS     617       702 | 852                     147,407    0.29
EC-unfiltered       None           2,254     619 | 480                     291,342    0.2
EC 50               50% complete   1,822     576 | 459                     216,646    0.2
EC 75               75% complete   1,749     583 | 465                     211,947    0.2
EC 95               95% complete   705       667 | 537                     101,132    0.21
EC PIS-50           Top 50% PIS    1,127     878 | 726                     220,124    0.22
EC PIS-25           Top 25% PIS    564       1,173 | 986                   151,899    0.23
EC PIS-5            Top 5% PIS     113       2,028 | 1,884                 54,082     0.24
Introns-unfiltered  None           12,299    480 | 476                     2,744,044  0.47
Introns 50          50% complete   10,570    496 | 496                     2,558,468  0.49
Introns 75          75% complete   8,333     513 | 500                     2,117,497  0.5
Introns 95          95% complete   248       533 | 540                     59,442     0.46
Introns PIS-50      Top 50% PIS    6,150     595 | 583                     1,867,336  0.52
Introns PIS-25      Top 25% PIS    3,075     662 | 653                     1,074,656  0.54
Introns PIS-5       Top 5% PIS     615       773 | 761                     261,722    0.56
UCE-unfiltered      None           638       782 | 769                     114,282    0.22
UCE 50              50% complete   516       787 | 781                     97,156     0.23
UCE 75              75% complete   447       815 | 800                     89,804     0.24
UCE 95              95% complete   157       861 | 864                     32,432     0.24
UCE PIS-50          Top 50% PIS    319       916 | 901                     82,109     0.28
UCE PIS-25          Top 25% PIS    160       998 | 994                     48,370     0.31
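
As an illustration of the PIS columns in Table 1, the sketch below counts parsimony informative sites in a single alignment using the standard definition: a site is informative when at least two character states each occur in at least two sequences. It is a minimal example, not the script used to build the table. It assumes the alignment is a list of equal-length DNA strings and ignores gaps and ambiguity codes, and the names `is_parsimony_informative` and `pis_summary` and the toy alignment are hypothetical.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two bases each occur in at least two sequences."""
    # Assumption: only unambiguous bases count; gaps and N are ignored.
    counts = Counter(base for base in column if base in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def pis_summary(alignment):
    """Return (number of PIS, proportion of PIS) for one aligned locus."""
    sequences = [seq.upper() for seq in alignment]
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    length = len(sequences[0])
    if length == 0:
        raise ValueError("alignment has no sites")
    # zip(*sequences) walks the alignment column by column.
    n_pis = sum(is_parsimony_informative(col) for col in zip(*sequences))
    return n_pis, n_pis / length


# Toy locus (hypothetical data): columns 2, 4 and 5 are informative.
toy_locus = ["AACGT", "AACGA", "ATCTA", "ATGTT"]
print(pis_summary(toy_locus))  # (3, 0.6)
```

Averaging the per-locus proportion over all loci in a dataset would give a figure comparable to the last column, assuming that is how the mean proportion was calculated.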
910
911
912
913
914 Table 2 Inferred topologies from the IQ-TREE, ASTRAL, and ASTRID analyses and
915 their corresponding quartet scores, QS (for ASTRAL) and average gCF values.
                    Topology
Dataset             IQ-TREE  ASTRAL  ASTRID  QS    Mean gCF
Exons-unfiltered    T2       T2      T2      0.69  50.94
Exons 50            T2       T2      T2      0.66  50.96
Exons 75            T2       T2      T2      0.66  51.07
Exons 95            T2       T2      T2      0.68  54.23
Exons PIS-50        T2       T2      T2      0.71  58.75
Exons PIS-25        T2       T2      T2      0.75  65.41
Exons PIS-5         T2       T2      T2      0.84  77.78
EC-unfiltered       T2       T1      T1      0.76  63.96
EC 50               T2       T1      T1      0.75  62.67
EC 75               T2       T1      T1      0.75  62.93
EC 95               T2       T1      T1      0.77  66.24
EC PIS-50           T2       T2      T1      0.81  73.13
EC PIS-25           T2       T2      T2      0.85  78.68
EC PIS-5            T2       T1      T1      0.89  83.88
Introns-unfiltered  T1       T1      T1      0.78  66.55
Introns 50          T1       T1      T1      0.78  66.79
Introns 75          T1       T1      T1      0.78  67.21
Introns 95          T5       T3      T4      0.79  68.65
Introns PIS-50      T1       T3      T1      0.79  68.84
Introns PIS-25      T1       T3      T1      0.8   69.45
Introns PIS-5       T2       T3      T3      0.81  70.70
UCE-unfiltered      T3       T4      T4      0.81  71.79
UCE 50              T3       T4      T3      0.8   71.40
UCE 75              T3       T3      T3      0.8   72.00
UCE 95              T4       T4      T4      0.81  73.07
UCE PIS-50          T3       T4      T3      0.85  76.44
UCE PIS-25          T3       T4      T3      0.84  77.96
916
[Figure plot, image not reproduced; axes and panels match the Figure 3 caption. Panels: Exon, Exons-combined, Intron, UCE. x-axis: Branch Length; y-axis: gCF. Legend (Dataset): Unfiltered, 50% complete, 75% complete, 95% complete, Top 50% PIS, Top 25% PIS, Top 5% PIS.]