
bioRxiv preprint doi: https://doi.org/10.1101/765610; this version posted September 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

1 Exons, Introns, and UCEs Reveal Conflicting Phylogenomic Signals in a Rapid

2 Radiation of Frogs (Ranidae: Hylarana)

3

4 Kin Onn Chan1,2,*, Carl R. Hutter2, Perry L. Wood, Jr.3, L. Lee Grismer4, Rafe M.

5 Brown2

6

7 1 Lee Kong Chian Natural History Museum, Faculty of Science, National University of

8 Singapore, 2 Conservatory Drive, Singapore 117377. Email: [email protected]

9

10 2 Biodiversity Institute and Department of Ecology and Evolutionary Biology, University

11 of Kansas, Lawrence, KS 66045, USA. Email: [email protected]; [email protected]

12

13 3 Department of Biological Sciences & Museum of Natural History, Auburn University,

14 Auburn, Alabama 36849, USA. Email: [email protected]

15

16 4 Herpetology Laboratory, Department of Biology, La Sierra University, 4500 Riverwalk

17 Parkway, Riverside, California 92505, USA. Email: [email protected]

18

19 *Corresponding author

20

21 Abstract.—Numerous types of genomic markers have been used to resolve recalcitrant

22 nodes, yet their relative performance and congruence have rarely been compared directly.

23 Using target-capture sequencing, we obtained more than 12,000 highly informative exons

24 and introns, including ~600 UCEs to address long-standing systematic problems in

25 Southeast Asian Golden-backed frogs of the genus complex Hylarana. To reduce gene

26 tree estimation errors, we filtered the data using different thresholds of taxon

27 completeness and parsimony informative sites (PIS) in addition to using the best-fit

28 models of DNA substitution to estimate individual single-locus gene trees. We then

29 estimated species trees using concatenation (IQ-TREE), summary coalescent (ASTRAL),

30 and distance-based methods (ASTRID). Topological incongruence among these methods

31 and variation in nodal support were examined in detail using a suite of different measures

32 including quartet frequencies, bootstrap, local posterior probabilities, gene concordance

33 factors, and quartet scores. Results showed that high levels of incongruence were present

34 along the backbone of the phylogeny, specifically surrounding short internodes. We also

35 demonstrated that filtering data by PIS was more efficacious at improving congruence

36 compared to filtering by missing data, and that exons were more sensitive to data filtering

37 than introns and UCEs. Despite utilizing more than 6.9 million characters and 2.7 million

38 PIS, analyses failed to converge on a single concordant topology. Instead, exons, introns,

39 and UCEs produced genuinely strongly-supported yet conflicting phylogenetic signals,

40 which affected our phylogeny estimates at different scales/levels—indicating a general,

41 potentially alarming challenge for phylogenomics inference employing many of today's

42 massive datasets. Additionally, bootstrap values were consistently high despite low levels

43 of congruence and high proportions of gene trees that support conflicting topologies,

44 indicating that traditional bootstraps are likely poor measures of congruence or branch

45 support in large phylogenomic datasets, especially during instances of rapid

46 diversification. Although low bootstrap values do ostensibly reflect low heuristic support,

47 we recommend that high bootstrap support obtained from large genomic datasets be

48 interpreted with caution. Additional complementary measures such as quartet frequencies,

49 gene concordance factors, quartet scores, and posterior probabilities can be useful to

50 provide a more robust and accurate representation of bipartition certainty and ultimately,

51 evolutionary history of incompletely resolved or poorly-understood taxa.

52 Keywords: FrogCap, bootstrap, branch support, incongruence, quartet frequency, gene

53 concordance factor

54

55 Generating large amounts of data is no longer an issue in the era of

56 phylogenomics. Instead, limitations are imposed by model complexities (parameter

57 space) and computational tractability. Furthermore, analyzing genome-scale data has

58 revealed a different suite of challenges including high levels of incongruence, conflicting

59 evolutionary histories, and systematic bias (Gee 2003; Phillips et al. 2004; Philippe et al.

60 2011; Delsuc et al. 2005; Philippe et al. 2005; Jeffroy et al. 2006; Galtier and Daubin

61 2008; Dell’Ampio et al. 2014; Smith et al. 2015; Zhang et al. 2015; Leaché et al. 2015;

62 Kendall and Colijn 2016; Crowl et al. 2017; Reddy et al. 2017; Platt et al. 2018; Pease et

63 al. 2018; Roycroft et al. 2019). It is therefore important to find a “sweet spot” that

64 optimizes the shifting trade-off between amount of data and analytical resources without

65 compromising the accuracy of inferences. As such, understanding the impacts of data

66 filtering/subsampling strategies and performing robust assessments on analytical methods

67 and the accuracy of species tree inferences are integral components to the rapidly

68 expanding future of the field.

69 Incongruence can arise not only from biological processes such as hybridization,

70 horizontal gene transfer, and incomplete lineage sorting that violate the assumption of

71 orthology (Whitfield and Lockhart 2007; Whitfield and Kjer 2008; Eaton et al. 2015;

72 Meiklejohn et al. 2016; Tarver et al. 2016; Ottenburghs et al. 2017; Léveillé-Bourret et al.

73 2018), but also through systematic biases associated with the analysis of large datasets.

74 Gene tree estimation errors (GTEE) resulting from (but not limited to) model

75 misspecification or insufficient phylogenetic signal can increase noise and affect

76 phylogenetic inference (Roure et al. 2013; Doyle et al. 2015; Roch and Warnow 2015;

77 Vachaspati and Warnow 2015; Blom et al. 2017; Molloy and Warnow 2017; Nute et al.

78 2018). Due to different underlying models and assumptions, different analytical methods

79 such as concatenation, distance-based, and coalescent-based summary methods can also

80 produce variable results. Several studies have argued that concatenation can perform as

81 well or better than summary methods, which may be adversely affected by high GTEE

82 (Gatesy and Springer 2014; Simmons and Gatesy 2015; Tonini et al. 2015). Conversely,

83 concatenation analyses have also been shown to fail or produce spuriously high support

84 for the wrong tree (Weisrock et al. 2012; Wielstra et al. 2014; Roch and Steel 2015;

85 Warnow 2015; Molloy and Warnow 2017; Mendes and Hahn 2018). Although it is

86 widely acknowledged that GTEE is an important analytical challenge, potentially

87 affecting species tree estimation, recent studies suggest that distance- and summary-based

88 methods that are statistically consistent under the MSC model may perform better under a

89 wide range of model conditions—and that they have the potential to produce low error

90 rates when many genes are available and GTEE is low (Bayzid and Warnow 2013; Patel

91 2013; Lanier and Knowles 2015; Roch and Warnow 2015; Mirarab et al. 2016; Baca et

92 al. 2017; Molloy and Warnow 2017; Nute et al. 2018; Vachaspati and Warnow 2018).

93 Therefore, if large numbers of gene trees can be estimated with low GTEE, the power of

94 coalescent-based methods can be harnessed to estimate species trees with high accuracy.

95 Analyses of massive gene sequence datasets have also demonstrated how

96 traditional measures of support such as the non-parametric bootstrap and posterior

97 probabilities can be positively misleading (Phillips et al. 2004; Seo 2008; Wiens and

98 Morrill 2011; Kumar et al. 2012; Weisrock et al. 2012; Yang and Zhu 2018; Roycroft et

99 al. 2019). Resampling methods such as non-parametric bootstrapping essentially measure

100 site-sampling variance as opposed to observed variance in the data. Because site-

101 sampling variance is an inverse function of sample size (amount of data), bootstrap

102 values will inevitably inflate as the amount of data increases (Felsenstein 1985; Kumar et

103 al. 2012); this tendency does not necessarily reflect variation in the data themselves. In

104 contrast, calculating Bayesian posterior probabilities is computationally expensive and

105 can also produce spuriously high support in big datasets (Susko 2008; Yang and Zhu

106 2018). As genome-scale datasets become more common, more robust characterizations of

107 uncertainty are needed to tease apart conflict from true signal strength (Gadagkar et al.

108 2005; Smith et al. 2015; Minh et al. 2018; Pease et al. 2018), which can be

109 disproportionately obfuscated in nodes that are old or separated by short internal branches

110 (Whitfield and Lockhart 2007; Whitfield and Kjer 2008; Rothfels et al. 2012; Meiklejohn

111 et al. 2016; Blom et al. 2017; Léveillé-Bourret et al. 2018; Mclean et al. 2019; Roycroft

112 et al. 2019).
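
The dependence of bootstrap support on dataset size can be illustrated with a toy simulation. The following sketch is not part of the original analysis: it assumes a hypothetical alignment in which each site independently favors one of two competing topologies (A with probability 0.55, B otherwise), and it replaces tree search with a simple majority vote over sites. The function name and all parameter values are illustrative assumptions.

```python
import random


def bootstrap_support(n_sites, p_favor_a=0.55, n_reps=200, seed=1):
    """Fraction of bootstrap replicates in which topology A wins a
    majority vote over sites (a toy stand-in for tree inference)."""
    rng = random.Random(seed)
    # Hypothetical data: True if a site favors topology A, False for B.
    sites = [rng.random() < p_favor_a for _ in range(n_sites)]
    wins = 0
    for _ in range(n_reps):
        # Resample sites with replacement, as in the non-parametric bootstrap.
        resampled = rng.choices(sites, k=n_sites)
        if 2 * sum(resampled) > n_sites:
            wins += 1
    return wins / n_reps


if __name__ == "__main__":
    # Compare support across increasingly large (simulated) alignments.
    for n in (100, 1_000, 10_000):
        print(n, bootstrap_support(n))
```

Under these assumptions, the proportion of sites favoring topology A stays near 55% at every dataset size, yet the fraction of replicates recovering A is expected to approach 100% as sites are added, because resampling variance shrinks with the number of sites while the underlying conflict does not.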

113 Fueled in part by the increasing availability of complete genomes and

114 transcriptomes, the development of target capture methods has impelled the

115 phylogenomic revolution through custom-designed probe-sets that target specific

116 genomic markers with the aim of capturing orthologous and informative loci across

117 different evolutionary timescales (Bi et al. 2012; Faircloth et al. 2012; Lemmon et al.

118 2012; Singhal et al. 2017; Collins and Hrbek 2018). Among the freely-available target

119 capture methods, ultra-conserved elements (UCEs) and exonic markers are widely-used

120 to resolve ambiguous relationships (Bi et al. 2012; Faircloth et al. 2012; Blaimer et al.

121 2015; Bragg et al. 2016, 2018; Hugall et al. 2016; Meiklejohn et al. 2016; Baca et al.

122 2017; Van Dam et al. 2017). UCEs are typically used to reconstruct deep-time

123 evolutionary relationships (Crawford et al. 2012; Faircloth et al. 2012, 2013; McCormack

124 et al. 2012), whereas exon-capture methods are more suitable at moderate evolutionary

125 scales (Bi et al. 2012; Bragg et al. 2016; Abdelkrim et al. 2018; Ilves et al. 2018). Faster-

126 evolving, non-coding introns have also been shown to be effective at resolving

127 problematic nodes at the species, genus, and family level (Armstrong et al. 2001; Allen

128 and Omland 2003; DeBry and Seshadri 2005; Creer 2007; Chojnowski et al. 2008;

129 Krauss et al. 2008; Igea et al. 2010; Folk et al. 2015) and furthermore, have been

130 demonstrated to contain stronger and more congruent phylogenetic signals compared to

131 exons (Chen et al. 2017a). Although these different types of genomic markers have been

132 employed to resolve recalcitrant evolutionary relationships, rarely have direct

133 comparisons been made to examine the relative performance and phylogenetic

134 congruency of different markers (Chen et al. 2017c). A newly developed anuran (frogs

135 and toads) probe-set (“FrogCap”) with a module specifically designed for the superfamily

136 Ranoidea, targets more than 12,000 highly informative and orthologous exonic (and their

137 intervening intronic regions) and UCE loci (Hutter 2019), effectively covering a wide

138 range of diversification timescales. We employed the FrogCap probe-set to examine the

139 efficacy of exons, introns, and UCEs to resolve recalcitrant nodes in a systematically

140 chaotic group of frogs that are plagued by pervasive phylogenetic ambiguity and

141 concomitant taxonomic instability.

142 The family Ranidae has one of the most volatile and

143 contentious taxonomic histories among all amphibian groups (Dubois 1992; Chen et al.

144 2005; Dubois et al. 2005; Frost et al. 2006; Che et al. 2007; Stuart 2008; Pyron and

145 Wiens 2011; Oliver et al. 2015; Yuan et al. 2016; Arifin et al. 2018). Within Ranidae, the

146 systematics and taxonomy of Golden-backed frogs of the genus-complex Hylarana sensu

147 lato (s.l.) are particularly problematic, in part, due to morphological similarities

148 arising from convergence and symplesiomorphy. More than a dozen generic and sub-generic names

149 have been created, synonymized, resurrected, and/or revalidated, with the majority of

150 changes based on morphology and/or Sanger-derived genetic markers (Dubois 1992;

151 Oliver et al. 2015; Chan and Brown 2017; Frost 2019). At present, this group harbors at

152 least 94 species, which are distributed across Africa, Southeast Asia, and Australasia

153 (Frost 2018), thereby presenting interesting systematic challenges, compelling

154 evolutionary questions, and expansive biogeographic significance. However, the absence

155 of a stable, well-resolved phylogeny prevents the investigation of such questions, to say

156 nothing of accurate species diversity estimates (AmphibiaWeb 2019) and conservation

157 status of the taxa involved (IUCN 2018). To address this challenge and explore the

158 general issue of variation in performance and information content of these classes of data,

159 we collected unprecedented amounts of genomic data in the form of exons, introns, and

160 UCEs using the FrogCap probe-set with the main aim of resolving the backbone of the

161 Hylarana s. l. phylogeny. We took specific measures to minimize the effects of GTEE by

162 filtering the data according to various thresholds of taxon completeness (missing data)

163 and phylogenetic information content [proportion of parsimony-informative-sites/loci

164 (PIS)]. Next, individual single-locus gene trees were estimated using the best-fit model of

165 substitution and data were analyzed using concatenation, summary, and distance-based

166 methods. Finally, we assessed incongruence using various measures of branch support

167 including ultrafast bootstrap, local posterior probability, quartet support, and gene

168 concordance factor to: 1) explore their adequacy in capturing the underlying variation in

169 the data; and 2) determine whether uncertainty is due to systematic bias, insufficient

170 phylogenetic signal, or representative of genuine conflict characterized by variable gene

171 histories.

172

173 MATERIALS AND METHODS

174 Taxon Sampling and DNA Extraction

175 We sequenced 31 ingroup samples consisting of 20 species, with representatives

176 from all 10 genera (Table S1). Tissue samples for molecular work were obtained from the

177 museum holdings of The University of Kansas Biodiversity Institute (KU), California

178 Academy of Sciences (CAS), La Sierra University Herpetological Collection, Riverside,

179 California (LSUHC), and the Museum of Vertebrate Zoology, Berkeley (MVZ).

180 Genomic DNA was extracted using the automated Promega Maxwell® RSC Instrument

181 (Tissue DNA kit) and subsequently quantified using the Promega Quantus® Fluorometer.

182

183 Probe Design, Library Preparation, and Sequencing

184 Probe design follows Hutter (2019) and is summarized here. Probes were

185 synthesized as biotinylated RNA oligos in a myBaits kit (Arbor Biosciences™, formerly

186 MYcroarray® Ann Arbor, MI) by matching 25 publicly available transcriptomes to the

187 Nanorana parkeri and Xenopus tropicalis genomes using the program BLAT (Kent

188 2002). Matching sequences were clustered by their genomic coordinates to detect

189 presence/absence across species and to achieve full locus coverage. To narrow the locus

190 selection to coding regions, each cluster was matched to available coding region

191 annotations from the Nanorana parkeri genome using the program EXONERATE (Slater

192 and Birney 2005). Loci from all matching species were then aligned using MAFFT

193 (Katoh and Standley 2013) and subsequently separated into 120 bp-long bait sequences

194 with 2x tiling (50% overlap among baits) using the myBaits-2 kit (40,040 baits) with

195 120mer sized baits. These loci have an additional bait at each end extending into the

196 intronic region to increase the coverage and capture success of these areas. Baits were

197 then filtered, retaining only those that lacked sequence repeats, had a GC content of 30%–50%,

198 and did not match their reverse complement or multiple genomic regions.

199 Additionally, 646 UCEs that contain at least 10% informative sites were included

200 (Alexander et al. 2016).
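
The tiling and filtering steps described above can be sketched in a few lines. This is a minimal illustration and not the FrogCap or myBaits design code: the function names are hypothetical, and only the GC-content window and the reverse-complement check are implemented. Screening for sequence repeats and for matches to multiple genomic regions requires a reference genome and is omitted.

```python
def tile_baits(locus_seq, bait_len=120, tiling=2):
    """Split a locus into overlapping baits. With 2x tiling each bait
    starts half a bait length after the previous one (50% overlap)."""
    step = bait_len // tiling
    last_start = len(locus_seq) - bait_len
    # Loci shorter than one bait yield an empty list.
    return [locus_seq[i:i + bait_len] for i in range(0, last_start + 1, step)]


def gc_content(seq):
    """Proportion of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def passes_filters(bait, gc_min=0.30, gc_max=0.50):
    """Keep baits inside the GC window that differ from their own
    reverse complement (repeat and multi-hit screens are not modeled)."""
    in_gc_window = gc_min <= gc_content(bait) <= gc_max
    return in_gc_window and bait.upper() != reverse_complement(bait)
```

For example, a 300 bp locus tiled this way yields four 120 bp baits starting at positions 0, 60, 120, and 180.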

201 Library preparation was performed by Arbor Biosciences and, briefly, proceeded as follows: (1)

202 genomic DNA was sheared to 300–500 bp; (2) adaptors were ligated to DNA fragments;

203 (3) unique identifiers were attached to the adapters to later identify individual samples;

204 (4) biotinylated 120mer RNA library baits were hybridized to the sequences; (5) target

205 sequences were selected by adhering to magnetic streptavidin beads; (6) target regions

206 were amplified via PCR; and (7) samples were pooled and sequenced on an Illumina

207 HiSeq PE-3000 with 150 bp paired-end reads. Sequencing was performed at the

208 Oklahoma Medical Research Foundation DNA Sequencing Facility.

209

210 Bioinformatics

211 The bioinformatics pipeline for filtering adapter contamination, assembling loci,

212 and exporting alignments is available on GITHUB; we used version 2 of the pipeline

213 (https://github.com/chutter/FrogCap-Sequence-Capture). Adapter contamination and

214 other sequencing artefacts were filtered from raw reads using the program AFTERQC

215 (Chen et al. 2017b). Paired-end reads were merged using the program BBMERGE

216 (Bushnell et al. 2017), which avoids inflating coverage for these regions due to uneven

217 lengths from cleaning (Zhang et al. 2014). The cleaned reads were then assembled de

218 novo using the program SPADES v.3.12 (Bankevich et al. 2012) under a variety of k-mer

219 schemes. SPADES also has built-in error correction, so error correction was not

220 performed prior to assembly. The contigs were then matched against the reference probe

221 sequences with BLAT, keeping only those contigs that uniquely matched to the probe

222 sequences. The final set of matching loci was then aligned on a locus-by-locus basis

223 using MAFFT.

224 Alignments were trimmed and saved separately into usable datasets for

225 phylogenetic analyses and data type comparisons: (1) Introns: the exon previously

226 delimited was trimmed out of the original contig and the two remaining intronic regions

227 were concatenated; (2) Exons: each alignment was adjusted to be in an open-reading

228 frame and trimmed to the largest reading frame that accommodated >90% of the

229 sequences; alignments with no clear reading frame were discarded; (3) Exons-combined:

230 exons from the same gene, which may be linked (Lanier and Knowles 2012; Scornavacca

231 and Galtier 2017), were concatenated and treated as a single locus; and (4) UCEs were

232 also saved as a separate dataset. We applied internal trimming only to the intron and UCE

233 alignments using the program trimAl (automatic1 function; Capella-Gutiérrez et al.

234 2009). All alignments were externally trimmed to ensure that at least 50 percent of the

235 samples had sequence data present.
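
External trimming by occupancy can be sketched as follows. This is one plausible reading of the rule stated above, not the pipeline's actual code: the alignment is assumed to be a dictionary of equal-length sequences, gaps and ambiguous bases ("-", "N", "?") are assumed to count as missing, and the function name is hypothetical.

```python
def trim_external(alignment, min_occupancy=0.5, missing="-N?"):
    """Trim both ends of an alignment (dict: sample name -> aligned
    sequence) until the outermost retained columns contain data for
    at least `min_occupancy` of the samples."""
    seqs = list(alignment.values())
    n_cols = len(seqs[0])

    def occupied(col):
        # Count samples with a real base at this column.
        present = sum(1 for s in seqs if s[col].upper() not in missing)
        return present / len(seqs) >= min_occupancy

    keep = [c for c in range(n_cols) if occupied(c)]
    if not keep:
        # No column meets the threshold, so nothing is retained.
        return {name: "" for name in alignment}
    start, end = keep[0], keep[-1] + 1
    # Interior columns are left untouched; only the flanks are removed.
    return {name: seq[start:end] for name, seq in alignment.items()}
```

Only the flanking columns are removed here; interior columns with sparse data are kept, which matches the distinction drawn above between external and internal trimming.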

236

237 Data Filtering and Phylogenetic Analysis

238 We sought to minimize the effects of GTEE by applying two widely-used data

239 filtering strategies. In addition to the unfiltered data, each dataset (Exons, Exons-

240 combined, Introns, and UCEs) was filtered at 50%, 75%, and 95% sampling

241 completeness (loci that did not meet these thresholds were discarded). Because loci with

242 low phylogenetic information can introduce noise and increase GTEE, we also filtered

243 data according to information content using number of parsimony-informative-sites (PIS)

244 as a proxy. We assembled datasets that contained the top 50%, 25%, and 5% of loci with

245 the highest PIS. Summary statistics, partitioning, and concatenation of data were

246 performed using the program AMAS (Borowiec 2016) and custom R scripts.
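
The two filtering strategies can be expressed compactly. The sketch below is a Python stand-in for illustration only (the study used AMAS and custom R scripts). It assumes each locus is a list of aligned sequences, one per sampled individual, with absent samples simply left out, and that loci are ranked by their raw PIS count; all function names are hypothetical.

```python
from collections import Counter


def count_pis(seqs):
    """Count parsimony-informative sites: columns in which at least two
    different nucleotides each occur in at least two sequences."""
    n_pis = 0
    for column in zip(*seqs):
        counts = Counter(base.upper() for base in column)
        shared = [b for b in "ACGT" if counts[b] >= 2]
        if len(shared) >= 2:
            n_pis += 1
    return n_pis


def filter_by_completeness(loci, n_samples, threshold):
    """Keep loci sequenced for at least `threshold` (0-1) of all samples."""
    return {name: seqs for name, seqs in loci.items()
            if len(seqs) / n_samples >= threshold}


def top_fraction_by_pis(loci, fraction):
    """Keep the top `fraction` (0-1) of loci ranked by PIS count."""
    ranked = sorted(loci, key=lambda name: count_pis(loci[name]), reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))
    return {name: loci[name] for name in ranked[:n_keep]}
```

With these helpers, the 75% completeness dataset would correspond to `filter_by_completeness(loci, n_samples, 0.75)` and the top 25% PIS dataset to `top_fraction_by_pis(loci, 0.25)`.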

247 Phylogenetic estimation was performed using concatenation, distance-based, and

248 summary methods. For the concatenation analysis, we used the maximum likelihood

249 program IQ-TREE v1.7 (Nguyen et al. 2015; Chernomor et al. 2016). Due to the sheer

250 number of loci, we only performed an unpartitioned analysis using the GTR+GAMMA

251 substitution model (model testing and partitioned analysis for individual loci were not

252 computationally tractable). Branch support was assessed using 1,000 ultrafast bootstrap

253 replicates (UFB; Hoang et al. 2017). Nodes with UFB >95 were considered strongly-

254 supported.

255 Because empirical and simulation studies have suggested that concatenation

256 analysis can result in the wrong tree with high support, and that unpartitioned analysis

257 can be statistically inconsistent in the presence of incomplete lineage sorting (ILS)

258 (Degnan and Rosenberg 2009; Roch and Steel 2015; Warnow 2015), we also performed

259 distance- and summary-based species tree analyses that are ILS aware and statistically

260 consistent under the multi-species coalescent model. The program ASTRAL-III (Zhang

261 et al. 2018), hereafter referred to only as ASTRAL, was used because it has one of the

262 lowest error rates when the number of informative sites is high and has been shown to

263 produce more accurate results compared to other summary methods under a variety of

264 conditions including high ILS and low GTEE (Mirarab et al. 2014; Davidson et al. 2015;

265 Vachaspati and Warnow 2015, 2018; Ogilvie et al. 2016; Molloy and Warnow 2017).

266 Prior to the species tree analysis, IQ-TREE was used to estimate gene trees for each

267 individual locus. To further reduce GTEE arising from model misspecification, we

268 estimated and used the best-fit substitution model for each individual locus as determined

269 by the program ModelFinder (Kalyaanamoorthy et al. 2017). The resulting gene trees

270 were then used as input in the ASTRAL analysis. Finally, because phylogenomic species

271 tree estimation can benefit from a mixture of genes aimed at resolving different parts of

272 the tree (Townsend and Leuenberger 2011; Chen et al. 2015), we performed an ASTRAL

273 analysis on a dataset comprising 500 loci with the highest PIS from the Exons-combined,

274 Introns, and UCE datasets. To improve accuracy of all ASTRAL analyses, we collapsed

275 branches that were below 10% bootstrap support as recommended by the authors (Zhang

276 et al. 2018).
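
Collapsing weakly supported gene-tree branches before summary analysis amounts to contracting every internal branch below the threshold into a polytomy. The following is a minimal sketch under stated assumptions, not the tool used in the study: trees are held in a small hypothetical `Node` class, and `support` stores the bootstrap value of the branch above each internal node.

```python
class Node:
    """Minimal tree node for illustration (hypothetical structure)."""

    def __init__(self, name=None, support=None, children=None):
        self.name = name              # leaf label; None for internal nodes
        self.support = support        # bootstrap support of the branch above
        self.children = children or []


def collapse_low_support(node, threshold=10.0):
    """Return a copy of the tree in which internal branches with support
    below `threshold` are contracted, producing polytomies."""
    new_children = []
    for child in node.children:
        child = collapse_low_support(child, threshold)
        is_internal = bool(child.children)
        if is_internal and child.support is not None and child.support < threshold:
            # Remove the weak branch: attach its children to the parent.
            new_children.extend(child.children)
        else:
            new_children.append(child)
    return Node(node.name, node.support, new_children)


def to_newick(node):
    """Serialize a tree (without the trailing semicolon)."""
    if not node.children:
        return node.name
    inner = ",".join(to_newick(c) for c in node.children)
    label = "" if node.support is None else str(node.support)
    return f"({inner}){label}"
```

For a gene tree `((A,B)5,(C,D)90)`, the branch uniting A and B falls below the 10% threshold and is contracted, whereas the branch uniting C and D is retained.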

277 Finally, the same sets of gene trees were used to estimate species trees using the

278 distance-based method ASTRID, which has been shown to outperform ASTRAL when

279 many genes are available and when ILS is very high (Vachaspati and Warnow 2015).

280

281 Assessing Incongruence

282 ASTRAL quartet scores were computed to summarize the proportion of induced

283 quartet trees (from individual single-locus gene trees) in the species tree. For example, a

284 score of 0.5 would mean that 50% of quartet trees induced by the gene trees are in the

285 species tree. The normalized Robinson-Foulds distance (RFDist) was also used to

286 examine topological congruence between each gene tree and the corresponding species

287 tree derived from ASTRAL. We further used quartet support, quartet frequencies, and the

288 gene concordance factor (gCF) to measure the amount of gene tree conflict around each

289 branch of the species tree. Quartet support and frequencies were calculated in ASTRAL

290 to examine the number of gene tree quartets supporting the primary, second, and third

291 alternative topologies. For every branch of the species tree, the gCF represents the

292 percentage of decisive gene trees containing that branch, while accounting for unequal

293 taxon coverage among gene trees (Minh et al. 2018).
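
Two of the measures above, the normalized Robinson-Foulds distance and the gene concordance factor, can both be derived from the sets of bipartitions (splits) in each tree. The sketch below is a simplified illustration and not the IQ-TREE or ASTRAL implementation. It assumes topology-only Newick strings without branch lengths or support labels, and identical taxon sets in every tree, so the correction for unequal taxon coverage is not reproduced. Normalizing by the total number of splits in both trees is one common convention, adopted here as an assumption.

```python
def parse_clades(newick):
    """Parse a topology-only Newick string. Returns (all leaf names,
    list of leaf sets found beneath each internal node)."""
    stack, clade_sets, leaves, name = [], [], set(), ""

    def flush():
        nonlocal name
        if name:
            leaves.add(name)
            stack[-1].add(name)
            name = ""

    for ch in newick.strip().rstrip(";"):
        if ch == "(":
            stack.append(set())
        elif ch == ",":
            flush()
        elif ch == ")":
            flush()
            done = stack.pop()
            clade_sets.append(frozenset(done))
            if stack:
                stack[-1] |= done  # pass leaves up to the parent clade
        elif not ch.isspace():
            name += ch
    return frozenset(leaves), clade_sets


def splits(newick):
    """Non-trivial bipartitions of a tree, treated as unrooted."""
    leaves, clade_sets = parse_clades(newick)
    result = set()
    for clade in clade_sets:
        other = leaves - clade
        if len(clade) >= 2 and len(other) >= 2:
            result.add(frozenset((clade, other)))
    return result


def normalized_rf(newick_a, newick_b):
    """Splits found in only one tree, divided by all splits in both."""
    a, b = splits(newick_a), splits(newick_b)
    total = len(a) + len(b)
    return len(a ^ b) / total if total else 0.0


def gene_concordance(species_newick, gene_newicks):
    """Percentage of gene trees containing each species-tree split."""
    gene_splits = [splits(g) for g in gene_newicks]
    return {s: 100.0 * sum(s in g for g in gene_splits) / len(gene_splits)
            for s in splits(species_newick)}
```

A value of 0 from `normalized_rf` indicates identical topologies and a value of 1 indicates that no internal branches are shared. `gene_concordance` returns one percentage per internal branch of the species tree.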

294

295 RESULTS

296 Data Assembly

297 After matching assembled contigs with targeted loci, an average of 13,745 contigs

298 were obtained per sample, with a mean and median length of 939.3 bp and 896.7 bp,

299 respectively (Table S2). Overall ingroup taxon occupancy was high with the exception of

300 one sample (Fig. S1). However, subsequent analyses showed that this had no effect on

301 phylogenetic estimation as the sample was consistently recovered in the same position

302 with high support across all datasets and analyses. Our Exons and Introns datasets had

303 similar occupancies that were slightly lower than UCEs and Exons-combined (Fig. S1).

304 Prior to data filtering, the Exons and Introns datasets consisted of more than 12,000 loci;

305 the UCE dataset contained 638 loci. Exons from the same gene were also combined to

306 form a separate dataset (Exons-combined) comprising 2,254 loci (Table 1). UCE loci

307 were longest on average, followed by Exons-combined, Introns, and Exons. On average,

308 our Introns datasets had the highest number of PIS per locus followed by UCE, Exons,

309 and Exons-combined. Intronic loci also had a much higher proportion of PIS compared to

310 all other datasets (0.5–0.6 vs. 0.2–0.3) and consequently, had a much higher sum of total

311 PIS (>2.7 million before filtering; Fig. S2; Table 1).

312 Filtering by completeness did not drastically reduce the number of loci (except at

313 the most stringent threshold of 95% completeness) and the resulting datasets contained

314 more loci and total PIS than datasets filtered by PIS. The UCE dataset was not filtered at

315 5% PIS because too few loci were retained at that threshold. The average proportion of

316 PIS per locus was not substantially affected by data filtering, indicating that captured loci

317 were consistently informative within a particular marker type (Table 1).

318

319 Phylogenetic Estimation

320 Overall, all methods of phylogenetic analyses produced five different topologies

321 (T1–T5; Table 2). However, topology T5 was only recovered from the Introns 95 dataset

322 (248 loci), which contained numerous poorly supported nodes and hence, was considered

323 inaccurate and not included in further discussions. Topologies T1–T4 were only

324 discordant at three nodes differing in relationships of the

325 Humerana+Hylarana/Amnirana, “Amnirana”/, and Hydrophylax/“Hylarana”

326 celebensis clades (Fig. 1). We therefore focused on these problematic clades in

327 downstream analyses.

328 Different filtering strategies did not generally alter tree topology when analyzed

329 using the same method, except at extreme filtering thresholds (Table 2). However,

330 conflicting topologies were recovered in a number of datasets analyzed by the different

331 methods we employed. Our IQ-TREE analysis recovered topology T2 for all Exons and

332 Exons-combined datasets. However, topology T1 was recovered in analyses of most

333 Exons-combined datasets by ASTRAL and ASTRID. Results for the Introns datasets

334 were variable and all five topologies were recovered, but analyses of the majority of these

335 datasets resulted in the inference of topology T1. The UCE datasets only recovered the

336 T3 and T4 topologies, which varied according to filtering strategy and method of

337 inference. The combined dataset consisting of the 500 most parsimony-informative loci from

338 the Exons-combined, Introns, and UCE datasets produced topology T1 with full support

339 across all analytical methods (Table 2). Individual phylogenies with branch support are

340 presented in the supplementary material.

341

342 Tree Support and Incongruence

343 Average bootstrap support for individual gene trees was relatively high across all

344 datasets (>60% for Exons; >75% for Exons-combined, Introns, and UCE), indicating that

345 the gene trees contained high phylogenetic information and low GTEE (Fig. 2). Filtering

346 strategy had a more pronounced impact on Exons and Exons-combined datasets and

347 filtering by PIS produced gene trees with higher average bootstrap support compared to

348 filtering by completeness. Bootstrap support for gene trees in Introns and UCE datasets

349 was less perturbed by data filtering and showed similar but minor improvements (Fig.

350 2). Similarly, the Exons dataset exhibited the highest observed topological incongruence

351 between gene trees and species trees (measured using Robinson-Foulds distance,

352 RFDist) and was the most sensitive to data filtering (Fig. 2). Filtering by PIS was also

353 more effective in improving topological congruence compared to filtering by

354 completeness. Topological congruence of Introns and UCE datasets was also least

355 sensitive to data filtering (Fig. 2). Reflecting a similar trend, ASTRAL species tree

356 quartet scores (QS) and mean gCF showed very slight improvements when filtered by

357 completeness in the Exons and Exons-combined datasets, but improvement was markedly

358 higher when filtered by PIS. These scores improved to a much lesser degree for the

359 Introns and UCE datasets. On average, quartet scores and mean gCF were highest in

360 analyses of UCEs, followed by Introns, Exons-combined, and Exons datasets (Table 2).
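
For readers unfamiliar with the incongruence metric used above, the following is a minimal sketch of an unnormalized Robinson-Foulds distance between two trees on the same taxon set. It is not the implementation used in this study. Trees are assumed to be supplied as nested tuples of taxon names (a hypothetical input format chosen to avoid a Newick parser), and both trees are treated as unrooted.

```python
def leaf_sets(tree, collected):
    """Return the leaf set of `tree`, adding every clade's leaf set to `collected`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, collected) for child in tree))
    collected.add(leaves)
    return leaves


def bipartitions(tree):
    """Return the non-trivial bipartitions (splits) of a tree.

    Each split is stored as the side that does NOT contain a fixed
    reference taxon, so the result does not depend on rooting.
    """
    collected = set()
    all_leaves = leaf_sets(tree, collected)
    reference = min(all_leaves)
    splits = set()
    for clade in collected:
        rest = all_leaves - clade
        if len(clade) < 2 or len(rest) < 2:
            continue  # trivial split, shared by all trees
        splits.add(rest if reference in clade else clade)
    return splits


def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_sets(tree_a, set()) != leaf_sets(tree_b, set()):
        raise ValueError("trees must contain the same taxa")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))
```

A distance of zero indicates identical unrooted topologies; larger values indicate that more internal branches differ between a gene tree and the species tree.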

361

362 Branch Support and Incongruence

363 The inferred level of congruence surrounding each node was strongly and

364 positively correlated with its associated internal branch length and this relationship holds

365 true regardless of data filtering schemes (Fig. 3). The shortest internal branches (Node 1–

366 3) had the lowest gCF and QS and were also where most topological incongruence

367 among datasets occurred (Fig. 1). Overall, bootstrap support values across focal nodes

368 were consistently high and invariant for all data types and filtering strategies, with the

369 exception of Node 1 for the Exons-combined dataset and Node 3 for the UCE dataset

370 (Fig. 4). Conversely, posterior probabilities, gCF, and QS exhibited greater variability

371 across all datasets, presumably providing a better characterization of variation in the data.

372 Unlike gCF, posterior probabilities and QS did not progressively improve with more

373 stringent filtering strategies. Variation in gCF for the Introns and UCE datasets was also

374 smaller compared to Exons and Exons-combined (Fig. 4).

375

376 Node 1.—Topology T2 was recovered only by our Exons and Exons-combined

377 datasets (Table 2). However, an examination of quartet frequencies for the primary,

378 second, and third alternative topologies revealed that relatively equal, and in some

379 datasets equal proportions (unfiltered, 50% and 75% completeness) of gene trees

380 supported either the primary or alternate topology for that node (Fig. 5). For Exons

381 datasets, gCF values were very low (<10%) when unfiltered or filtered by completeness

382 but improved when filtered by PIS. However, despite being associated with low gCF and

383 high proportions of gene trees supporting contrasting topologies, bootstrap values were

384 100 across all Exons datasets (Table S3). Interestingly, although we inferred equal

385 proportions of gene trees supporting either the primary or alternate topologies for the

386 Exons and Exons-combined datasets that were unfiltered and filtered at 50% and 75%

387 completeness (Fig. 5), ASTRAL and ASTRID analyses inferred topologies T2 for Exons

388 and T1 for Exons-combined datasets respectively (Table 2). A closer examination of the

389 numbers of gene trees supporting each topology (as opposed to proportions) revealed that

390 only a small number of additional gene trees supported the primary topology in Exons-

391 combined datasets. However, although the number of additional gene trees supporting the

392 primary (over the alternate) topology was very low (not more than 7; Fig. S3), they were

393 sufficient to infer a different topology with relatively high bootstrap support (UFB 89–95;

394 Table S3). Another conflicting result was produced by the Exons-combined dataset

395 filtered at 25% PIS, which recovered the primary topology with high support (UFB=99;

396 PP=1.0) in ASTRAL, but the alternate topology in ASTRID (Table 2). The Introns and

397 UCE datasets unequivocally supported the primary topology for this node (Fig. 5).
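
The distinction drawn above between the proportion and the number of gene trees supporting each resolution of a node can be illustrated with a simplified tally. The sketch below is not how ASTRAL or IQ-TREE compute quartet support or gCF: it assumes that each of the four lineages around the node is represented by a single taxon, and that each gene tree is supplied as a pair (taxon set, list of split sides). All names are hypothetical.

```python
from collections import Counter

LABELS = ("ab|cd", "ac|bd", "ad|bc")


def quartet_topology(splits, taxa, quartet):
    """Return which of the three quartet resolutions a gene tree displays.

    `splits` holds one side of each bipartition of the gene tree, `taxa`
    is the full taxon set of that tree. Returns None when the tree is
    unresolved for the quartet or lacks one of its taxa.
    """
    a, b, c, d = quartet
    pairings = {
        "ab|cd": ({a, b}, {c, d}),
        "ac|bd": ({a, c}, {b, d}),
        "ad|bc": ({a, d}, {b, c}),
    }
    taxa = set(taxa)
    for side in splits:
        side = set(side)
        other = taxa - side
        for label, (p, q) in pairings.items():
            if (p <= side and q <= other) or (q <= side and p <= other):
                return label
    return None


def tally_quartet(gene_trees, quartet):
    """Count decisive gene trees supporting each resolution."""
    counts = Counter({label: 0 for label in LABELS})
    for taxa, splits in gene_trees:
        label = quartet_topology(splits, taxa, quartet)
        if label is not None:
            counts[label] += 1
    return counts


def frequencies(counts):
    """Convert counts to proportions of decisive gene trees."""
    total = sum(counts.values())
    return {label: (counts[label] / total if total else 0.0) for label in LABELS}
```

Reporting both tally_quartet and frequencies for a focal node makes it visible when two resolutions have near-identical proportions but differ by a handful of gene trees.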

398

399 Node 2.—The clear majority of gene trees in the Exons and Exons-combined

400 datasets supported the primary topology, whereas the majority of UCE gene trees

401 supported the alternate topology (Fig. 5). The Introns dataset was more ambiguous and

402 resulted in variable support for conflicting alternate topologies depending on how data

403 was filtered (Fig. 5; Table 2). Topologies were inconsistent at extreme filtering thresholds

404 (95% completeness and top 5% PIS), most likely due to insufficient information from the

405 low numbers of retained loci. For the Introns datasets filtered at 50% and 25% PIS, our

406 ASTRID and concatenated analyses recovered the primary topology with high support

407 (UFB=100), whereas ASTRAL recovered the alternate topology with low support for

408 Introns PIS-50 (PP=0.15) but high support for Introns PIS-25 (PP=1.0; Tables 2, S3).

409 Quartet frequencies indicated equal proportions of gene trees (39%) supporting both

410 primary and alternate topologies in the Introns PIS-50 dataset, and only an additional 1%

411 more supporting the alternate over the primary topology in the Introns PIS-25 dataset

412 (Fig. 5). Despite the high level of incongruence surrounding this node, bootstrap support

413 was 100 for all datasets and local posterior probabilities were 1.0 in analyses of all but

414 two datasets (Introns 95, Introns PIS-50; Table S3).

415

416 Node 3.—Only analyses of our UCE dataset resulted in the inference of topology

417 T4. However, bootstrap and posterior probabilities were low and quartet frequencies

418 showed relatively equal numbers of gene trees supporting either the primary or alternate

419 topology, indicating that topology T4 was not strongly supported (Figs. 5, S3; Table S3).

420

421 DISCUSSION

422 Impacts of Data Filtering

423 Our results showed that filtering by PIS is a more successful strategy to improve

424 branch support and topological congruence compared to filtering by taxon

425 completeness/missing data even though fewer loci were retained. Interestingly, the

426 magnitude of improvement was more substantial in exonic loci compared to introns and

427 UCEs, indicating that exonic data are more sensitive to data subsampling; these findings

428 are similar to results reported by Chen et al. (2017a). One possible explanation for this

429 trend is that exonic loci are less variable, contain a higher number of uninformative

430 single-locus gene trees, and therefore produce more variation in gene tree topologies.

431 Filtering out loci with fewer PIS removes uninformative trees, resulting in less gene tree

432 variation and higher congruence between gene trees and species trees. Another possible explanation

433 could be provided by the observation that selection may act on exons, and thus, introduce

434 more phylogenetic noise. Congruence improved considerably when exons of the same

435 gene were combined (Exons-combined), even though this dataset had significantly fewer

436 loci and total PIS. Unexpectedly, despite exons being the class of data most affected by

437 data filtering and having the highest levels of incongruence among datasets, phylogenetic

438 inference using exons was remarkably congruent across all analyses, producing the T2

439 topology regardless of filtering strategies.

440 Even though certain data filtering strategies increased congruence, stringent

441 filtration could come at the expense of data (loss of character information, which

442 ultimately can bias topological estimates). This possibility was exemplified by datasets

443 filtered using the most stringent criteria (95% completeness and top 5% PIS). These

444 datasets had the highest levels of congruence but produced the most erratic topologies,

445 demonstrating that high congruence does not always translate to accurate results. This

446 finding highlights the importance of finding an optimal balance between data filtering

447 thresholds and the amount of retained data. We found that datasets consisting of fewer

448 than 1000 loci were the most unreliable, regardless of the quality of retained loci.

449 Similarly, analyzing data filtered to maximize information did not always produce better

450 results; when testing for a relationship between PIS and gCF using linear regression, we

451 did not find a significant relationship (Fig. S4).
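
A test of this kind can be sketched as follows. The ordinary least-squares fit matches the linear regression named above, but the permutation test used here to obtain a P-value is a stand-in chosen because it needs only the Python standard library; it is not necessarily the significance test applied in this study. The function names and default settings (n_perm, seed) are hypothetical.

```python
import random


def ols(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy) if syy else 0.0
    return slope, intercept, r_squared


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation P-value for the slope of y on x.

    y is shuffled repeatedly; the P-value is the share of shuffles whose
    absolute slope is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(ols(x, y)[0])
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(ols(x, y_perm)[0]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Here x would hold a per-dataset PIS summary and y the corresponding mean gCF; a large P-value is consistent with the absence of a detectable linear relationship.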

452

453 Incongruence and Measures of Support

454 Although this study utilized large numbers of informative loci, our results

455 demonstrate that the resolution of recalcitrant nodes, especially those that are separated

456 by very short internal branches, remains a continuing analytical challenge. With the

457 current ability to sample and estimate thousands of individual gene histories in

458 phylogenetic studies of sequence capture data, a single species tree summary consensus

459 topology may still be difficult to empirically estimate. By implementing measures to

460 reduce systematic bias and via a thorough examination of alternate topologies, we

461 demonstrate that different genetic markers such as exons, introns, and UCEs can produce

462 genuinely conflicting phylogenetic signals, which impact phylogenetic inference at

463 multiple scales. At Node 1 (Fig. 5), all Exons datasets supported the alternate topology,

464 whereas Introns and UCEs supported the primary topology. On the other hand, for Nodes

465 2 and 3, the Exons, Exons-combined, and Introns datasets supported the primary

466 topology whereas the alternate topology was supported by the UCE dataset. Given the

467 high levels of conflict among different genetic markers, determining which topology

468 represents the species tree, or whether a single “true” species tree even exists, may not be

469 possible or even desirable when we consider that disregarding a considerable amount of

470 genuine but conflicting signal that is inherently present in the data may misrepresent

471 evolutionary history (Philippe et al. 2011; Hahn and Nakhleh 2016; Crowl et al. 2017;

472 Reddy et al. 2017; Rosser et al. 2017; Platt et al. 2018).

473 The presence of strong conflicting signals also highlights the importance of

474 examining not only branch support, but also alternate topologies when assessing the

475 accuracy of phylogenomic trees. This study showed that alternate topologies can be

476 inferred with high support even when there are equal proportions of gene trees supporting

477 conflicting topologies. In such cases, the probability of obtaining the alternate topology

478 may be as likely as the primary topology and this cannot be gleaned by merely observing

479 branch support values. We therefore advocate for the examination and reporting of

480 alternate topologies when evaluating the confidence of phylogenomic trees, instead of

481 purely relying on measures of branch support. Furthermore, our results showed that

482 increasing the number of loci does not positively correlate with increasing congruence.

483 Instead, we found a remarkably strong and positive correlation between levels of

484 congruence and its associated branch length across all datasets and marker types (Fig. 3).

485 Although a similar relationship was implied by Wiens et al.'s (2008) analysis of 20 loci,

486 we demonstrate this phenomenon at a genomic scale, orders of magnitude

487 more expansive, and with data from across the genome.

488 Despite the known shortcomings of traditional bootstrapping for large datasets

489 (Gadagkar et al. 2005; Kumar et al. 2012; Smith et al. 2015; Rodríguez et al. 2017;

490 Roycroft et al. 2019), the procedure remains one of the most widely used measures of

491 heuristic nodal support, including for phylogenomic datasets. Although alternate bootstrap

492 methods have been proposed to reduce false positives, e.g., resampling sites within

493 partitions or resampling partitions instead of sites (Nei et al. 2001; Gadagkar et al. 2005),

494 these strategies are not computationally tractable for tens of thousands of informative

495 loci. However, traditional bootstrapping remains tractable for assessing statistical support

496 for clades, especially when the amount of information is limited. The continued value of

497 traditional bootstrapping was illustrated by relatively low BS values at Node 1 for the

498 Exons-combined dataset and Node 3 for our UCE dataset; both we interpret as likely

499 indicative of insufficient phylogenetic information. On the other hand, high bootstrap

500 values are not necessarily reflective of high confidence for a correct topology, and can be

501 artifacts of big data, which reduces sampling variance (Smith et al. 2015; Rodríguez et al.

502 2017; Roycroft et al. 2019). Although low bootstrap values can in fact reflect poor

503 support, we recommend that high bootstrap values obtained from large genomic datasets

504 be interpreted with caution and complemented by measures such as gCF, quartet

505 scores, and posterior probabilities to obtain a better characterization of variation in the

506 data and a more comprehensive perspective of evolutionary history.
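
The effect of dataset size on bootstrap values can be illustrated with a deliberately simplified simulation. The sketch below is a caricature, not a phylogenetic bootstrap: it assumes that every site "votes" for one of two competing topologies (A or B) and that the topology with more votes in a resampled dataset is the one inferred, whereas a real analysis re-estimates a tree from each pseudoreplicate. The function name and parameter values are hypothetical.

```python
import random


def bootstrap_support(n_sites, frac_a, n_reps=200, seed=1):
    """Share of bootstrap replicates in which topology A wins a majority vote.

    The original dataset contains exactly round(n_sites * frac_a) sites
    favouring A; the rest favour B. Each replicate resamples sites with
    replacement, as in a nonparametric bootstrap.
    """
    rng = random.Random(seed)
    n_a = round(n_sites * frac_a)
    sites = ["A"] * n_a + ["B"] * (n_sites - n_a)
    wins = 0
    for _ in range(n_reps):
        resample = rng.choices(sites, k=n_sites)
        if resample.count("A") * 2 > n_sites:
            wins += 1
    return wins / n_reps


if __name__ == "__main__":
    # Same 52:48 split of signal, two dataset sizes (toy values, not from this study).
    print(bootstrap_support(100, 0.52))
    print(bootstrap_support(10_000, 0.52))
```

With the same near-even split of signal, support for A in this toy model approaches 100% as the number of sites grows, because resampling variance shrinks while the underlying conflict is unchanged.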

507

508 Implications for Systematics and Taxonomy

509 Despite lingering uncertainty surrounding the branching order of specific clades,

510 our findings resolve some long-standing systematic conundrums with high statistical

511 support. The reciprocally monophyletic relationship between Chalcorana and Pulchrana

512 differs from Oliver et al. (2015) but was also inferred recently by Chan and Brown

513 (2017), and is consistently and unequivocally supported across all our analyses. Our

514 results also conclusively demonstrated that southeast Asian “Amnirana” nicobariensis is

515 not congeneric with the true Amnirana from Africa (the clade containing the generotype

516 species) and, thus, should be placed in a different genus. However, genomic data from

517 additional taxa, especially from the genus Indosylvirana, will be needed before this taxon

518 can be placed with certainty in an existing genus. Another novel insight gleaned from this

519 study is the phylogenetic placement of “Hylarana” celebensis from Sulawesi, Indonesia.

520 This clade was inferred as the sister lineage of the “Indosylvirana” milleti + Papurana

521 clade with high support across all datasets and analyses with the exception of the UCE

522 species tree analysis, which inferred it to be sister to Hydrophylax but with low support.

523 Our results conclusively showed that “Hylarana” celebensis does not belong to the genus

524 Hylarana (the clade containing the generotype species for Hylarana) but, instead, may

525 represent a distinct lineage for which the erection of a new genus may be required. The

526 placement of “Indosylvirana” milleti from Cambodia as the sister lineage of Papurana

527 was unexpected, yet highly supported across all analyses. The genus Papurana is

528 restricted to the Australasian region, which was never connected by land to Indochina

529 (Hall 1998, 2013; Voris 2000). Hence, its sister relationship with a lineage from

530 Cambodia is biogeographically incoherent (just as the inclusion of "Amnirana"

531 nicobariensis makes little biogeographic sense, when included in the otherwise African

532 genus Amnirana). We speculate that missing taxa could be responsible for these

533 anomalous relationships and that inclusion of additional taxa from the intervening regions

534 of Indochina, Borneo, and Wallacea will be necessary to resolve the placement of

535 recalcitrant lineages and stabilize classification.

536

537 SUPPLEMENTARY MATERIAL

538 Data and online-only supplementary materials are available from the Dryad

539 Digital Repository (https://doi.org/10.5061/dryad.bj907mp)

540

541 ACKNOWLEDGEMENTS

542 KOC’s work was supported by U.S. National Science Foundation (DEB 1702036)

543 and the National Geographic Society (9722-15). Indonesian, Philippine, and Solomon

544 Islands sampling were funded by NSF support to RMB (DEB 0640737, 0743491,

545 1557053, respectively) and Illumina sequencing was partially funded by DEB 1654388.

546

547 REFERENCES

548 Abdelkrim J., Aznar-Cormano L., Fedosov A.E., Kantor Y.I., Lozouet P., Phuong M.A.,

549 Zaharias P., Puillandre N. 2018. Exon-Capture-Based phylogeny and diversification

550 of the venomous gastropods (Neogastropoda, Conoidea). Mol. Biol. Evol. 35:2355–

551 2374.

552 Alexander A.M., Su Y.C., Oliveros C.H., Olson K. V., Travers S.L., Brown R.M. 2016.

553 Genomic data reveals potential for hybridization, introgression, and incomplete

554 lineage sorting to confound phylogenetic relationships in an adaptive radiation of

555 narrow-mouth frogs. Evolution (N. Y). 71:475–488.

556 Allen E.V.A.S., Omland K.E. 2003. Novel intron phylogeny supports plumage

557 convergence in Orioles (Icterus). Auk. 120:961–969.

558 AmphibiaWeb. 2019. AmphibiaWeb. Available from http://amphibiaweb.org.

559 Arifin U., Smart U., Hertwig S.T., Smith E.N., Iskandar D.T., Haas A. 2018. Molecular

560 phylogenetic analysis of a taxonomically unstable ranid from Sumatra, Indonesia,

561 reveals a new genus with gastromyzophorous tadpoles and two new species.

562 Zoosystematics Evol. 94:163–193.

563 Armstrong M.H., Braun E.L., Kimball R.T. 2001. Phylogenetic utility of Avian

564 Ovomucoid Intron G: A comparison of nuclear and mitochondrial phylogenies in

565 Galliformes. Auk. 118:799–804.

566 Baca S.M., Alexander A., Gustafson G.T., Short A.E.Z. 2017. Ultraconserved elements

567 show utility in phylogenetic inference of Adephaga (Coleoptera) and suggest

568 paraphyly of ‘Hydradephaga.’ Syst. Entomol. 42:786–795.

569 Bankevich A., Nurk S., Antipov D., Gurevich A.A., Dvorkin M., Kulikov A.S., Lesin

570 V.M., Nikolenko S.I., Pham S., Prjibelski A.D., Pyshkin A. V., Sirotkin A. V.,

571 Vyahhi N., Tesler G., Alekseyev M.A., Pevzner P.A. 2012. SPAdes: A new genome

572 assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol.

573 19:455–477.

574 Bayzid M.S., Warnow T. 2013. Naive binning improves phylogenomic analyses.

575 Bioinformatics. 29:2277–2284.

576 Bi K., Vanderpool D., Singhal S., Linderoth T., Moritz C., Good J.M. 2012.

577 Transcriptome-based exon capture enables highly cost-effective comparative

578 genomic data collection at moderate evolutionary scales. BMC Genomics. 13:403.

579 Blaimer B.B., Brady S.G., Schultz T.R., Lloyd M.W., Fisher B.L., Ward P.S. 2015.

580 Phylogenomic methods outperform traditional multi-locus approaches in resolving

581 deep evolutionary history: A case study of formicine ants. BMC Evol. Biol. 15:1–

582 14.

583 Blom M.P.K., Bragg J.G., Potter S., Moritz C. 2017. Accounting for uncertainty in gene

584 tree estimation: Summary-coalescent species tree inference in a challenging

585 radiation of Australian lizards. Syst. Biol. 66:352–366.

586 Borowiec M.L. 2016. AMAS: a fast tool for alignment manipulation and computing of

587 summary statistics. PeerJ. 4:e1660.

588 Bragg J.G., Potter S., Afonso Silva A.C., Hoskin C.J., Bai B.Y.H., Moritz C. 2018.

589 Phylogenomics of a rapid radiation: The Australian rainbow skinks. BMC Evol.

590 Biol. 18:1–12.

591 Bragg J.G., Potter S., Bi K., Moritz C. 2016. Exon capture phylogenomics: efficacy

592 across scales of divergence. Mol. Ecol. Resour. 16:1059–1068.

593 Bushnell B., Rood J., Singer E. 2017. BBMerge – Accurate paired shotgun read merging

594 via overlap. PLoS One. 12:1–15.

595 Capella-gutiérrez S., Silla-martínez J.M., Gabaldón T. 2009. trimAl: a tool for

596 automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics.

597 25:1972–1973.

598 Chan K.O., Brown R.M. 2017. Did true frogs ‘dispersify’? Biol. Lett. 13:20170299.

599 Che J., Pang J., Zhao H., Wu G.F., Zhao E.M., Zhang Y.P. 2007. Phylogeny of Raninae

600 (Anura: Ranidae) inferred from mitochondrial and nuclear sequences. Mol.

601 Phylogenet. Evol. 43:1–13.

602 Chen L., Murphy R.W., Lathrop A., Ngo A., Orlov N.L., Cuc T.H., Somorjai I.L.M.

603 2005. Taxonomic chaos in Asian ranid frogs: an initial phylogenetic resolution.

604 Herpetol. J. 15:231–243.

605 Chen M.Y., Liang D., Zhang P. 2015. Selecting question-specific genes to reduce

606 incongruence in phylogenomics: A case study of jawed vertebrate backbone

607 phylogeny. Syst. Biol. 64:1104–1120.

608 Chen M.Y., Liang D., Zhang P. 2017a. Phylogenomic resolution of the phylogeny of

609 laurasiatherian mammals: Exploring phylogenetic signals within coding and

610 noncoding sequences. Genome Biol. Evol. 9:1998–2012.

611 Chen S., Huang T., Zhou Y., Han Y., Xu M., Gu J. 2017b. AfterQC: Automatic filtering,

612 trimming, error removing and quality control for fastq data. BMC Bioinformatics.

613 18:91–100.

614 Chen Z., Li H., Zhu Y., Feng Q., He Y., Chen X. 2017c. Molecular phylogeny of the

615 family Dicroglossidae (Amphibia: Anura) inferred from complete mitochondrial

616 genomes. Biochem. Syst. Ecol. 71:1–9.

617 Chernomor O., Von Haeseler A., Minh B.Q. 2016. Terrace aware data structure for

618 phylogenomic inference from supermatrices. Syst. Biol. 65:997–1008.

619 Chojnowski J.L., Kimball R.T., Braun E.L. 2008. Introns outperform exons in analyses of

620 basal avian phylogeny using clathrin heavy chain genes. Gene. 410:89–96.

621 Collins R.A., Hrbek T. 2018. An in silico comparison of protocols for dated

622 phylogenomics. Syst. Biol. 67:633–650.

623 Crawford N.G., Faircloth B.C., Mccormack J.E., Brumfield R.T., Winker K., Glenn T.C.

624 2012. More than 1000 ultraconserved elements provide evidence that turtles are the

625 sister group of archosaurs. Biol. Lett. 8:783–786.

626 Creer S. 2007. Choosing and using introns in molecular phylogenetics. Evol. Bioinforma.

627 3:99–108.

628 Crowl A.A., Myers C., Cellinese N. 2017. Embracing discordance: Phylogenomic

629 analyses provide evidence for allopolyploidy leading to cryptic diversity in a

630 Mediterranean Campanula (Campanulaceae) clade. Evolution (N. Y). 71:913–922.

631 Van Dam M.H., Lam A.W., Sagata K., Gewa B., Laufa R., Balke M., Faircloth B.C.,

632 Riedel A. 2017. Ultraconserved elements (UCEs) resolve the phylogeny of

633 Australasian smurf-weevils. PLoS One. 12:1–21.

634 Davidson R., Vachaspati P., Mirarab S., Warnow T. 2015. Phylogenomic species tree

635 estimation in the presence of incomplete lineage sorting and horizontal gene

636 transfer. BMC Genomics. 16:S1.

637 DeBry R.W., Seshadri S. 2005. Nuclear intron sequences for phylogenetics of closely

638 related mammals: an example using the phylogeny of Mus. J. Mammal. 82:280–

639 288.

640 Degnan J.H., Rosenberg N. a. 2009. Gene tree discordance, phylogenetic inference and

641 the multispecies coalescent. Trends Ecol. Evol. 24:332–340.

642 Dell’Ampio E., Meusemann K., Szucsich N.U., Peters R.S., Meyer B., Borner J.,

643 Petersen M., Aberer A.J., Stamatakis A., Walzl M.G., Minh B.Q., Von Haeseler A.,

644 Ebersberger I., Pass G., Misof B. 2014. Decisive data sets in phylogenomics:

645 Lessons from studies on the phylogenetic relationships of primarily wingless insects.

646 Mol. Biol. Evol. 31:239–249.

647 Delsuc F., Brinkmann H., Philippe H. 2005. Phylogenomics and the reconstruction of the

648 tree of life. Nat. Rev. Genet. 6:361–375.

649 Doyle V.P., Young R.E., Naylor G.J.P., Brown J.M. 2015. Can we identify genes with

650 increased phylogenetic reliability? Syst. Biol. 64:824–837.

651 Dubois A. 1992. Notes sur la classification des Ranidae (Amphibiens anoures). Bull.

652 Mens. la Société Linnéenne Lyon. 61:305–352.

653 Dubois A., Crombie R.I., Glaw F. 2005. Amphibia Mundi. 1.2. Recent amphibians:

654 Generic and infrageneric taxonomic additions (1981-2002). Alytes. 23:25–69.

655 Eaton D.A.R., Hipp A.L., González-Rodríguez A., Cavender-Bares J. 2015. Historical

656 introgression among the American live oaks and the comparative nature of tests for

657 introgression. Evolution (N. Y). 69:2587–2601.

658 Faircloth B.C., McCormack J.E., Crawford N.G., Harvey M.G., Brumfield R.T., Glenn

659 T.C. 2012. Ultraconserved elements anchor thousands of genetic markers spanning

660 multiple evolutionary timescales. Syst. Biol. 61:717–726.

661 Faircloth B.C., Sorenson L., Santini F., Alfaro M.E. 2013. A phylogenomic perspective

662 on the radiation of ray-finned fishes based upon targeted sequencing of

663 Ultraconserved Elements (UCEs). PLoS One. 8.

664 Felsenstein J. 1985. Confidence limits on phylogenies: an approach using the bootstrap.

665 Evolution (N. Y). 39:783–791.

666 Folk R.A., Mandel J.R., Freudenstein J. V. 2015. A protocol for targeted enrichment of

667 intron-containing sequence markers for recent radiations: A phylogenomic example

668 from Heuchera (Saxifragaceae). Appl. Plant Sci. 3:1500039.

669 Frost D.R. 2019. Amphibian Species of the World: an Online Reference. Version 6.0

670 (accessed 10 June 2019). .

671 Frost D.R., Grant T., Faivovich J., Bain R.H., Haas A., Haddad C.F.B., De Sá R.O.,

672 Channing A., Wilkinson M., Donnellan S.C., Raxworthy C.J., Campbell J. a., Blotto

673 B.L., Moler P., Drewes R.C., Nussbaum R. a., Lynch J.D., Green D.M., Wheeler

674 W.C. 2006. The amphibian tree of life. Bull. Am. Museum Nat. Hist. 297:1–291.

675 Gadagkar S.R., Rosenberg M.S., Kumar S. 2005. Inferring species phylogenies from

676 multiple genes: Concatenated sequence tree versus consensus gene tree. J. Exp.

677 Zool. Part B Mol. Dev. Evol. 304:64–74.

678 Galtier N., Daubin V. 2008. Dealing with incongruence in phylogenomic analyses.

679 Philos. Trans. R. Soc. B Biol. Sci. 363:4023–4029.

680 Gatesy J., Springer M.S. 2014. Phylogenetic analysis at deep timescales: Unreliable gene

681 trees, bypassed hidden support, and the coalescence/concatalescence conundrum.

682 Mol. Phylogenet. Evol. 80:231–266.

683 Gee H. 2003. Evolution: ending incongruence. Nature. 425:782.

684 Hahn M.W., Nakhleh L. 2016. Irrational exuberance for resolved species trees. Evolution

685 (N. Y). 70:7–17.

686 Hall R. 1998. The plate tectonics of Cenozoic SE Asia and the distribution of land and

687 sea. Biogeogr. Geol. Evol. SE Asia.:99–131.

688 Hall R. 2013. The palaeogeography of Sundaland and Wallacea since the Late Jurassic. J.

689 Limnol. 72:1–17.

690 Hoang D.T., Chernomor O., von Haeseler A., Minh B.Q., Le S.V. 2017. UFBoot2:

691 improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35:518–522.

692 Hugall A.F., O’Hara T.D., Hunjan S., Nilsen R., Moussalli A. 2016. An exon-capture

693 system for the entire class Ophiuroidea. Mol. Biol. Evol. 33:281–294.

694 Hutter C.R. 2019. FrogCap: A novel exon-capture probeset for Ranoidea. bioRxiv. In

695 press.

696 Igea J., Juste J., Castresana J. 2010. Novel intron markers to study the phylogeny of

697 closely related mammalian species. BMC Evol. Biol. 10:369.

698 Ilves K.L., Torti D., López-Fernández H. 2018. Exon-based phylogenomics strengthens

699 the phylogeny of Neotropical cichlids and identifies remaining conflicting clades

700 (Cichliformes: Cichlidae: Cichlinae). Mol. Phylogenet. Evol. 118:232–243.

701 IUCN. 2018. The IUCN Red List of Threatened Species. .

702 Jeffroy O., Brinkmann H., Delsuc F., Philippe H. 2006. Phylogenomics: the beginning of

703 incongruence? Trends Genet. 22:225–231.

704 Kalyaanamoorthy S., Minh B.Q., Wong T.K.F., von Haeseler A., Jermiin L.S. 2017.

705 ModelFinder: fast model selection for accurate phylogenetic estimates. Nat.

706 Methods. 14:587–589.

707 Katoh K., Standley D.M. 2013. MAFFT multiple sequence alignment software version 7:

708 Improvements in performance and usability. Mol. Biol. Evol. 30:772–780.

709 Kendall M., Colijn C. 2016. Mapping phylogenetic trees to reveal distinct patterns of

710 evolution. Mol. Biol. Evol. 33:2735–2743.

711 Kent W.J. 2002. BLAT — The BLAST -Like Alignment Tool. Genome Res. 12:656–

712 664.

713 Krauss V., Thümmler C., Georgi F., Lehmann J., Stadler P.F., Eisenhardt C. 2008. Near

714 intron positions are reliable phylogenetic markers: an application to Holometabolous

715 insects. Mol. Biol. Evol. 25:821–830.

716 Kumar S., Filipski A.J., Battistuzzi F.U., Kosakovsky Pond S.L., Tamura K. 2012.

717 Statistics and truth in phylogenomics. Mol. Biol. Evol. 29:457–472.

718 Lanier H.C., Knowles L.L. 2012. Is recombination a problem for species-tree analyses?

719 Syst. Biol. 61:691–701.

720 Lanier H.C., Knowles L.L. 2015. Applying species-tree analyses to deep phylogenetic

721 histories: Challenges and potential suggested from a survey of empirical

722 phylogenetic studies. Mol. Phylogenet. Evol. 83:191–199.

723 Leaché A.D., Chavez A.S., Jones L.N., Grummer J.A., Gottscho A.D., Linkem C.W.

724 2015. Phylogenomics of Phrynosomatid lizards: conflicting signals from sequence

725 capture versus restriction site associated DNA sequencing. Genome Biol. Evol.

726 7:706–719.

727 Lemmon A.R., Emme S.A., Lemmon E.M. 2012. Anchored hybrid enrichment for

728 massively high-throughput phylogenomics. Syst. Biol. 61:727–744.

729 Léveillé-Bourret É., Starr J.R., Ford B.A., Moriarty Lemmon E., Lemmon A.R. 2018.

730 Resolving rapid radiations within Angiosperm families using anchored

731 phylogenomics. Syst. Biol. 67:94–112.

732 McCormack J.E., Faircloth B.C., Crawford N.G., Gowaty P.A., Brumfield R.T., Glenn

733 T.C. 2012. Ultraconserved elements are novel phylogenomic markers that resolve

734 placental mammal phylogeny when combined with species-tree analysis. Genome

735 Res. 22:746–754.

736 McLean B.S., Bell K.C., Allen J.M., Helgen K.M., Cook J.A. 2019. Impacts of inference

737 method and data set filtering on phylogenomic resolution in a rapid radiation of

738 Ground Squirrels (Xerinae: Marmotini). Syst. Biol. 68:298–316.

739 Meiklejohn K.A., Faircloth B.C., Glenn T.C., Kimball R.T., Braun E.L. 2016. Analysis

740 of a rapid evolutionary radiation using ultraconserved elements: evidence for a bias

741 in some multispecies coalescent methods. Syst. Biol. 65:612–627.

742 Mendes F.K., Hahn M.W. 2018. Why concatenation fails near the anomaly zone. Syst.

743 Biol. 67:158–169.

744 Minh B.Q., Hahn M.W., Lanfear R. 2018. New methods to calculate concordance factors

745 for phylogenomic datasets. bioRxiv.:doi: http://dx.doi.org/10.1101/487801.

746 Mirarab S., Bayzid M.S., Warnow T. 2016. Evaluating summary methods for multilocus

747 species tree estimation in the presence of incomplete lineage sorting. Syst. Biol.

748 65:366–380.

749 Mirarab S., Reaz R., Bayzid M.S., Zimmermann T., Swenson M.S., Warnow T. 2014.

750 ASTRAL: Genome-scale coalescent-based species tree estimation. Bioinformatics.

751 30:541–548.

752 Molloy E.K., Warnow T. 2017. To include or not to include: the impact of gene filtering

753 on species tree estimation methods. Syst. Biol. 67:285–303.

754 Nei M., Xu P., Glazko G. 2001. Estimation of divergence times from multiprotein

755 sequences for a few mammalian species and several distantly related organisms.

756 Proc. Natl. Acad. Sci. 98:2497–2502.

757 Nguyen L.T., Schmidt H.A., von Haeseler A., Minh B.Q. 2015. IQ-TREE: A fast and

758 effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol.

759 Biol. Evol. 32:268–274.

760 Nute M., Chou J., Molloy E.K., Warnow T. 2018. The performance of coalescent-based

761 species tree estimation methods under models of missing data. BMC Genomics.

762 19:1–22.

763 Ogilvie H.A., Heled J., Xie D., Drummond A.J. 2016. Computational performance and

764 statistical accuracy of *BEAST and comparisons with other methods. Syst. Biol.

765 65:381–396.

766 Oliver L.A., Prendini E., Kraus F., Raxworthy C.J. 2015. Systematics and

767 biogeography of the Hylarana frog (Anura: Ranidae) radiation across tropical Australasia,

768 Southeast Asia, and Africa. Mol. Phylogenet. Evol. 90:176–192.

769 Ottenburghs J., Kraus R.H.S., van Hooft P., van Wieren S.E., Ydenberg R.C., Prins

770 H.H.T. 2017. Avian introgression in the genomic era. Avian Res. 8:1–11.

771 Patel S. 2013. Error in Phylogenetic Estimation for Bushes in the Tree of Life. J.

772 Phylogenetics Evol. Biol. 01:1–10.

773 Pease J.B., Brown J.W., Walker J.F., Hinchliff C.E., Smith S.A. 2018. Quartet Sampling

774 distinguishes lack of support from conflicting support in the green plant tree of life.

775 Am. J. Bot. 105:385–403.

776 Philippe H., Brinkmann H., Lavrov D. V., Littlewood D.T.J., Manuel M., Wörheide G.,

777 Baurain D. 2011. Resolving difficult phylogenetic questions: Why more sequences

778 are not enough. PLoS Biol. 9.

779 Philippe H., Delsuc F., Brinkmann H., Lartillot N. 2005. Phylogenomics. Annu. Rev.

780 Ecol. Evol. Syst. 36:541–562.

781 Phillips M.J., Delsuc F., Penny D. 2004. Genome-scale phylogeny and the detection of

782 systematic biases. Mol. Biol. Evol. 21:1455–1458.

783 Platt R.N., Faircloth B.C., Sullivan K.A.M., Kieran T.J., Glenn T.C., Vandewege M.W.,

784 Lee T.E., Baker R.J., Stevens R.D., Ray D.A. 2018. Conflicting evolutionary

785 histories of the mitochondrial and nuclear genomes in New World Myotis bats. Syst.

786 Biol. 67:236–249.

787 Pyron A.R., Wiens J.J. 2011. A large-scale phylogeny of Amphibia including over 2800

788 species, and a revised classification of extant frogs, salamanders, and caecilians.

789 Mol. Phylogenet. Evol. 61:543–583.

790 Reddy S., Kimball R.T., Pandey A., Hosner P.A., Braun M.J., Hackett S.J., Han K.L.,

791 Harshman J., Huddleston C.J., Kingston S., Marks B.D., Miglia K.J., Moore W.S.,

792 Sheldon F.H., Witt C.C., Yuri T., Braun E.L. 2017. Why do phylogenomic data sets

793 yield conflicting trees? Data type influences the avian tree of life more than taxon

794 sampling. Syst. Biol. 66:857–879.

795 Roch S., Steel M. 2015. Likelihood-based tree reconstruction on a concatenation of

796 aligned sequence data sets can be statistically inconsistent. Theor. Popul. Biol.

797 100:56–62.

798 Roch S., Warnow T. 2015. On the robustness to gene tree estimation error (or lack

799 thereof) of coalescent-based species tree methods. Syst. Biol. 64:663–676.

800 Rodríguez A., Burgon J.D., Lyra M., Irisarri I., Baurain D., Blaustein L., Göçmen B.,

801 Künzel S., Mable B.K., Nolte A.W., Veith M., Steinfartz S., Elmer K.R., Philippe

802 H., Vences M. 2017. Inferring the shallow phylogeny of true salamanders

803 (Salamandra) by multiple phylogenomic approaches. Mol. Phylogenet. Evol.

804 115:16–26.

805 Rosser N.L., Thomas L., Stankowski S., Richards Z.T., Kennington W.J., Johnson M.S.

806 2017. Phylogenomics provides new insight into evolutionary relationships and

807 genealogical discordance in the reef-building coral genus Acropora. Proc. R. Soc. B

808 Biol. Sci. 284.

809 Rothfels C.J., Larsson A., Kuo L.Y., Korall P., Chiou W.L., Pryer K.M. 2012.

810 Overcoming deep roots, fast rates, and short internodes to resolve the ancient rapid

811 radiation of eupolypod II ferns. Syst. Biol. 61:490–509.

812 Roure B., Baurain D., Philippe H. 2013. Impact of missing data on phylogenies inferred

813 from empirical phylogenomic data sets. Mol. Biol. Evol. 30:197–214.

814 Roycroft E.J., Moussalli A., Rowe K.C. 2019. Phylogenomics Uncovers Confidence and

815 Conflict in the Rapid Radiation of Australo-Papuan Rodents. Syst. Biol. syz044.

816 Scornavacca C., Galtier N. 2017. Incomplete lineage sorting in mammalian

817 phylogenomics. Syst. Biol. 66:112–120.

818 Seo T.K. 2008. Calculating bootstrap probabilities of phylogeny using multilocus

819 sequence data. Mol. Biol. Evol. 25:960–971.

820 Simmons M.P., Gatesy J. 2015. Coalescence vs. concatenation: Sophisticated analyses

821 vs. first principles applied to rooting the angiosperms. Mol. Phylogenet. Evol.

822 91:98–122.

823 Singhal S., Grundler M., Colli G., Rabosky D.L. 2017. Squamate conserved loci (SqCL):

824 a unified set of conserved loci for phylogenomics and population genetics of

825 squamate reptiles. Mol. Ecol. Resour. 17:e12–e24.

826 Slater G., Birney E. 2005. Automated generation of heuristics for biological sequence

827 comparison. BMC Bioinformatics. 6:31.

828 Smith S.A., Moore M.J., Brown J.W., Yang Y. 2015. Analysis of phylogenomic datasets

829 reveals conflict, concordance, and gene duplications with examples from

830 animals and plants. BMC Evol. Biol. 15:1–15.

831 Stuart B.L. 2008. The phylogenetic problem of Huia (Amphibia: Ranidae). Mol.

832 Phylogenet. Evol. 46:49–60.

833 Susko E. 2008. On the distributions of bootstrap support and posterior distributions for a

834 star tree. Syst. Biol. 57:602–612.

835 Tarver J.E., Dos Reis M., Mirarab S., Moran R.J., Parker S., O’Reilly J.E., King B.L.,

836 O’Connell M.J., Asher R.J., Warnow T., Peterson K.J., Donoghue P.C.J., Pisani D.

837 2016. The interrelationships of placental mammals and the limits of phylogenetic

838 inference. Genome Biol. Evol. 8:330–344.

839 Tonini J., Moore A., Stern D., Shcheglovitova M., Orti G. 2015. Concatenation and

840 species tree methods exhibit statistically indistinguishable accuracy under a range of

841 simulated conditions. PLOS Curr. Tree Life.

843 Townsend J.P., Leuenberger C. 2011. Taxon Sampling and the Optimal Rates of

844 Evolution for Phylogenetic Inference. Syst. Biol. 60:358–365.

845 Vachaspati P., Warnow T. 2015. ASTRID: Accurate species TRees from internode

846 distances. BMC Genomics. 16:1–13.

847 Vachaspati P., Warnow T. 2018. SVDquest: Improving SVDquartets species tree

848 estimation using exact optimization within a constrained search space. Mol.

849 Phylogenet. Evol. 124:122–136.

850 Voris H.K. 2000. Maps of Pleistocene sea levels in Southeast Asia: Shorelines, river

851 systems and time durations. J. Biogeogr. 27:1153–1167.

852 Warnow T. 2015. Concatenation analyses in the presence of incomplete lineage sorting.

853 PLOS Curr. Tree Life.:1–10.

854 Weisrock D.W., Smith S.D., Chan L.M., Biebouw K., Kappeler P.M., Yoder A.D. 2012.

855 Concatenation and concordance in the reconstruction of mouse lemur phylogeny: An

856 empirical demonstration of the effect of allele sampling in phylogenetics. Mol. Biol.

857 Evol. 29:1615–1630.

858 Whitfield J.B., Kjer K.M. 2008. Ancient rapid radiations of insects: challenges for

859 phylogenetic analysis. Annu. Rev. Entomol. 53:449–472.

860 Whitfield J.B., Lockhart P.J. 2007. Deciphering ancient rapid radiations. Trends Ecol.

861 Evol. 22:258–265.

862 Wielstra B., Arntzen J.W., Van Der Gaag K.J., Pabijan M., Babik W. 2014. Data

863 concatenation, Bayesian concordance and coalescent-based analyses of the species

864 tree for the rapid radiation of Triturus newts. PLoS One. 9.

865 Wiens J.J., Kuczynski C.A., Smith S.A., Mulcahy D.G., Sites J.W., Townsend T.M.,

866 Reeder T.W. 2008. Branch lengths, support, and congruence: Testing the

867 phylogenomic approach with 20 nuclear loci in snakes. Syst. Biol. 57:420–431.

868 Wiens J.J., Morrill M.C. 2011. Missing data in phylogenetic analysis: Reconciling results

869 from simulations and empirical data. Syst. Biol. 60:719–731.

870 Yang Z., Zhu T. 2018. Bayesian selection of misspecified models is overconfident and

871 may cause spurious posterior probabilities for phylogenetic trees. Proc. Natl. Acad.

872 Sci. 115:1854–1859.

873 Yuan Z.Y., Zhou W.W., Chen X., Poyarkov N.A., Chen H.M., Jang-Liaw N.H., Chou

874 W.H., Matzke N.J., Iizuka K., Min M.S., Kuzmin S.L., Zhang Y.P., Cannatella D.C.,

875 Hillis D.M., Che J. 2016. Spatiotemporal diversification of the True Frogs (genus

876 Rana): A historical framework for a widely studied group of model organisms. Syst.

877 Biol. 65:824–842.

878 Zhang C., Rabiee M., Sayyari E., Mirarab S. 2018. ASTRAL-III: Polynomial time

879 species tree reconstruction from partially resolved gene trees. BMC Bioinformatics.

880 19:15–30.

881 Zhang J., Kobert K., Flouri T., Stamatakis A. 2014. PEAR: A fast and accurate Illumina

882 Paired-End reAd mergeR. Bioinformatics. 30:614–620.

883 Zhang Q., Feild T.S., Antonelli A. 2015. Assessing the impact of phylogenetic

884 incongruence on taxonomy, floral evolution, biogeographical history, and

885 phylogenetic diversity. Am. J. Bot. 102:566–580.

887 FIGURE CAPTIONS

888 Figure 1 Comparisons of the four primary topologies obtained from phylogenetic

889 analyses across the various datasets with discordant taxa highlighted in red. The three focal

890 nodes with the highest discordance are labelled with a red circle.

892 Figure 2 Density plots showing average bootstrap values for each single-locus gene tree

893 and normalized Robinson–Foulds distances between each gene tree and the

894 corresponding species tree. Vertical dotted lines represent mean values for each dataset.

896 Figure 3 Relationship between branch length (in coalescent units) and its corresponding

897 gene concordance factor (gCF). Branch lengths were obtained from the ASTRAL

898 analysis.

900 Figure 4 Comparison of ultrafast bootstrap values (from IQ-TREE), local posterior

901 probabilities (from ASTRAL), gene concordance factor, and quartet support (from

902 ASTRAL) for each focal node across the various datasets.

904 Figure 5 Frequency (in percentage) of the three possible topologies surrounding each

905 focal node. Cladograms representing each possible topology are color coded to match the

906 stacked bars.

908 Table 1. Attributes and summary statistics of the various datasets used in this study. PIS

909 = parsimony informative sites.

Dataset              Filtering       No. loci   Locus length (mean | median)   Total PIS   Mean prop. PIS
Exons-unfiltered     None            12,332     213 | 165                      573,425     0.2
Exons 50             50% complete    10,375     215 | 168                      507,033     0.21
Exons 75             75% complete    8,599      224 | 171                      446,916     0.22
Exons 95             95% complete    770        312 | 210                      57,467      0.23
Exons PIS-50         Top 50% PIS     6,166      286 | 207                      441,134     0.25
Exons PIS-25         Top 25%         3,083      273 | 390                      319,642     0.27
Exons PIS-5          Top 5% PIS      617        702 | 852                      147,407     0.29
EC-unfiltered        None            2,254      619 | 480                      291,342     0.2
EC 50                50% complete    1,822      576 | 459                      216,646     0.2
EC 75                75% complete    1,749      583 | 465                      211,947     0.2
EC 95                95% complete    705        667 | 537                      101,132     0.21
EC PIS-50            Top 50% PIS     1,127      878 | 726                      220,124     0.22
EC PIS-25            Top 25%         564        1,173 | 986                    151,899     0.23
EC PIS-5             Top 5% PIS      113        2,028 | 1,884                  54,082      0.24
Introns-unfiltered   None            12,299     480 | 476                      2,744,044   0.47
Introns 50           50% complete    10,570     496 | 496                      2,558,468   0.49
Introns 75           75% complete    8,333      513 | 500                      2,117,497   0.5
Introns 95           95% complete    248        533 | 540                      59,442      0.46
Introns PIS-50       Top 50% PIS     6,150      595 | 583                      1,867,336   0.52
Introns PIS-25       Top 25%         3,075      662 | 653                      1,074,656   0.54
Introns PIS-5        Top 5% PIS      615        773 | 761                      261,722     0.56
UCE-unfiltered       None            638        782 | 769                      114,282     0.22
UCE 50               50% complete    516        787 | 781                      97,156      0.23
UCE 75               75% complete    447        815 | 800                      89,804      0.24
UCE 95               95% complete    157        861 | 864                      32,432      0.24
UCE PIS-50           Top 50% PIS     319        916 | 901                      82,109      0.28
UCE PIS-25           Top 25%         160        998 | 994                      48,370      0.31
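As an illustration of the PIS columns in Table 1, the following is a minimal sketch of how parsimony informative sites could be counted for a single aligned locus. It uses the standard definition (a site is parsimony informative when at least two character states each occur in at least two sequences). The function names `is_parsimony_informative` and `pis_summary` are hypothetical, the choice to ignore gaps and ambiguity codes is an assumption, and this is not the pipeline used to generate the table.

```python
from collections import Counter

# Assumption: only unambiguous nucleotides count as character states;
# gaps, N, and other ambiguity codes are ignored.
VALID_STATES = set("ACGT")


def is_parsimony_informative(column):
    """Return True if at least two states each occur in at least two sequences."""
    counts = Counter(base.upper() for base in column if base.upper() in VALID_STATES)
    states_seen_twice = sum(1 for n in counts.values() if n >= 2)
    return states_seen_twice >= 2


def pis_summary(alignment):
    """Return (number of PIS, proportion of PIS) for a list of aligned sequences."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n_sites = lengths.pop()
    if n_sites == 0:
        return 0, 0.0
    # zip(*alignment) iterates over alignment columns (sites).
    n_pis = sum(is_parsimony_informative(col) for col in zip(*alignment))
    return n_pis, n_pis / n_sites


if __name__ == "__main__":
    toy_locus = ["ACGTA", "ACGTC", "ATGAC", "ATGAA"]
    # Sites 2, 4, and 5 each have two states shared by two sequences.
    print(pis_summary(toy_locus))
```

Under these assumptions, a per-locus proportion like the one returned here could be averaged across loci to obtain a value comparable to the "Mean prop. PIS" column.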

914 Table 2 Inferred topologies from the IQ-TREE, ASTRAL, and ASTRID analyses and

915 their corresponding quartet scores, QS (for ASTRAL) and average gCF values.

                                Topology
Dataset              IQ-TREE   ASTRAL   ASTRID   QS     Mean gCF
Exons-unfiltered     T2        T2       T2       0.69   50.94
Exons 50             T2        T2       T2       0.66   50.96
Exons 75             T2        T2       T2       0.66   51.07
Exons 95             T2        T2       T2       0.68   54.23
Exons PIS-50         T2        T2       T2       0.71   58.75
Exons PIS-25         T2        T2       T2       0.75   65.41
Exons PIS-5          T2        T2       T2       0.84   77.78
EC-unfiltered        T2        T1       T1       0.76   63.96
EC 50                T2        T1       T1       0.75   62.67
EC 75                T2        T1       T1       0.75   62.93
EC 95                T2        T1       T1       0.77   66.24
EC PIS-50            T2        T2       T1       0.81   73.13
EC PIS-25            T2        T2       T2       0.85   78.68
EC PIS-5             T2        T1       T1       0.89   83.88
Introns-unfiltered   T1        T1       T1       0.78   66.55
Introns 50           T1        T1       T1       0.78   66.79
Introns 75           T1        T1       T1       0.78   67.21
Introns 95           T5        T3       T4       0.79   68.65
Introns PIS-50       T1        T3       T1       0.79   68.84
Introns PIS-25       T1        T3       T1       0.8    69.45
Introns PIS-5        T2        T3       T3       0.81   70.70
UCE-unfiltered       T3        T4       T4       0.81   71.79
UCE 50               T3        T4       T3       0.8    71.40
UCE 75               T3        T3       T3       0.8    72.00
UCE 95               T4        T4       T4       0.81   73.07
UCE PIS-50           T3        T4       T3       0.85   76.44
UCE PIS-25           T3        T4       T3       0.84   77.96

[Figure: gene concordance factor (gCF, y-axis) plotted against branch length (x-axis) in four panels (Exon, Exons-combined, Intron, UCE). Legend (Dataset): Unfiltered, 50% complete, 75% complete, 95% complete, Top 50% PIS, Top 25% PIS, Top 5% PIS.]