bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Transition bias influences the evolution of antibiotic resistance in Mycobacterium

2 tuberculosis

3 Joshua L. Payne1,2π*, Fabrizio Menardo3,4 π, Andrej Trauner3,4, Sonia Borrell3,4, Sebastian M.

4 Gygli3,4, Chloe Loiseau3,4, Sebastien Gagneux3,4†, Alex R. Hall1†

5 1. Institute of Integrative , ETH Zurich, Switzerland

6 2. Swiss Institute of Bioinformatics, Lausanne, Switzerland

7 3. Swiss Tropical and Public Health Institute, Basel, Switzerland

8 4. University of Basel, Basel, Switzerland

9 *Correspondence: [email protected]

10 π, These authors contributed equally.

11 †These authors also contributed equally.

12

13

1 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

14 Abstract

15 Transition bias, an overabundance of transitions relative to , has been widely

16 reported among studies of spreading under relaxed selection. However, demonstrating

17 the role of transition bias in adaptive evolution remains challenging. We addressed this challenge

18 by analyzing adaptive antibiotic-resistance mutations in the major human pathogen

19 Mycobacterium tuberculosis. We found strong evidence for transition bias in two independently

20 curated datasets comprising 152 and 208 antibiotic resistance mutations. This was true at the level

21 of mutational paths (distinct, adaptive DNA sequence changes) and events (individual instances

22 of the adaptive DNA sequence changes), and across different and promoters

23 conferring resistance to a diversity of antibiotics. It was also true for mutations that do not code

24 for amino acid changes (in gene promoters and the ribosmal gene rrs), and for mutations that are

25 synonymous to each other and are therefore likely to have similar fitness effects, suggesting that

26 transition bias can be caused by a bias in supply. These results point to a central role for

27 transition bias in determining which mutations drive adaptive antibiotic resistance evolution in a

28 key pathogen.

29

30 Keywords

31 Antibiotic resistance, evolution, mutation bias, Mycobacterium tuberculosis, transition bias

32

33

2 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

34 Significance statement

35 Whether and how transition bias influences adaptive evolution remain open questions. We

36 studied 296 DNA mutations that confer antibiotic resistance to the human pathogen

37 Mycobacterium tuberculosis. We uncovered strong transition bias among these mutations and

38 also among the number of times each mutation has evolved in different strains or geographic

39 locations, demonstrating that transition bias can influence adaptive evolution. For a subset of

40 mutations, we were able to rule out an alternative selection-based hypothesis for this bias,

41 indicating that transition bias can be caused by a biased mutation supply. By revealing this bias

42 among M. Tuberculosis resistance mutations, our findings improve our ability to predict the

43 mutational pathways by which pathogens overcome treatment.

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

3 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

60 Introduction

61 Mutation creates genetic variation and therefore influences evolution. Mutation is not an entirely

62 random process, but rather exhibits biases toward particular DNA sequence changes. For

63 example, a bias toward transitions (-to-purine or -to-pyrimidine changes),

64 relative to transversions (purine-to-pyrimidine or pyrimidine-to-purine changes), has been widely

65 reported among studies of mutations spreading under relaxed selection (1-5).

66

67 Demonstrating the role of such transition bias in adaptive evolution remains challenging, with

68 most existing evidence derived from individual case studies (6-10). Stoltzfus and McCandlish

69 recently reported the first systematic study of transition bias in putatively adaptive evolution,

70 using the repeated occurrence of amino acid replacements in laboratory or natural evolution as

71 evidence that the replacements are adaptive (11). Their meta-analysis provides compelling

72 evidence that transition bias influences adaptive evolution, with transitions observed in at least

73 two-fold excess of the null expectation that they occur once for every two transversions. Yet such

74 analyses have two limitations. First, while the repeated occurrence of an amino acid replacement

75 is highly suggestive of adaptation (12), it is not direct evidence of adaptation. Second, an

76 overabundance of transitions could result from a bias in mutation supply (i.e., mutation-based

77 transition bias), from a greater selective advantage conferred by the amino acid replacements

78 caused by transitions relative to those caused by transversions (i.e., selection-based transition

79 bias), or from both. For example, the spontaneous of methylated to

80 may contribute to mutation-based transition bias (13), whereas the propensity of non-

81 synonymous transitions to conserve the biochemical properties of amino acids better than non-

82 synonymous transversions may contribute to selection-based transition bias (14). Discriminating

83 amongst these two forms of transition bias has proven challenging to date (15, 16).

84

4 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

85 Despite the emerging evidence that transitions are overrepresented among adaptive mutations as

86 well as those spreading under relaxed selection, it remains unclear whether this plays a role in

87 real-world scenarios where rapid adaptive evolution has important implications for treatment of

88 infectious disease, such as the emergence of antibiotic resistance in pathogenic bacteria. For

89 example, if pathogen populations fix the first resistance mutation that appears ('first-come-first-

90 served') (17), then a mutation supply biased toward particular types of substitutions

91 will influence which genetic changes drive adaptation. Alternatively, if many beneficial

92 mutations are available to selection, and pathogens fix those with the highest selective advantage

93 ('pick-the-winner'), a bias in mutation supply would have a weaker impact on which genetic

94 changes drive adaptation. Therefore, identifying transition bias in such scenarios would improve

95 our basic understanding of how resistance evolves, and our ability to predict the relative

96 likelihoods of alternative mutational pathways to resistance.

97

98 Here, we study transition bias in the evolution of antibiotic resistance in Mycobacterium

99 tuberculosis (MTB), a major human pathogen for which antibiotic resistance evolution is a key

100 obstacle to effective treatment (18). We do so using two independently curated datasets of

101 mutations that are known to confer antibiotic resistance and are therefore definitively adaptive.

102 Additionally, we test specifically for mutation-based transition bias by considering two subsets of

103 adaptive mutations: 1) Mutations located in gene promoters and in the ribosomal gene rrs, which

104 is not translated to protein and therefore should not be influenced by selection-based bias caused

105 by transitions encoding different amino acid changes than transversions. 2) Mutations that are

106 synonymous to each other and are therefore likely to have similar fitness effects.

107

108 Our results reveal strong transition bias in the mutational paths to antibiotic resistance and in the

109 number of times each mutational path is used in the evolution of antibiotic resistance, across 22

110 genes or gene promoters that confer resistance to 11 antibiotics. We also observe transition bias

5 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

111 among adaptive mutations that do not code for amino acid changes, and among adaptive

112 mutations that are synonymous to each other, consistent with the hypothesis that transition bias is

113 at least partly mutation-based. We therefore demonstrate that transition bias influences adaptive

114 evolution, specifically the evolution of antibiotic resistance in a key global pathogen.

115

116 Results

117 We curated a dataset of 152 unique point mutations that confer resistance to at least one of 11

118 different antibiotics, and that appeared in at least one of 9,351 publicly-available MTB

119 (Materials and Methods). We refer to this dataset as the Basel dataset. We also analyzed an

120 independently curated dataset of 208 unique point mutations that confer resistance to at least one

121 of 8 antibiotics and appeared in at least one of 5,310 MTB genomes (19) (Materials and

122 Methods). We refer to this dataset as the Manson dataset. The Basel and Manson datasets have 64

123 point mutations in common, and together include resistance mutations for 11 antibiotics.

124

125 Following Stoltzfus and McCandlish (11), we separately studied transition bias in mutational

126 paths and in mutational events. A mutational path is any single mutation in one of our datasets,

127 such as the well known C > G that causes the S315T substitution in the sole MTB

128 catalase KatG to confer resistance to isoniazid (20). This mutational path may be used any

129 number of times during adaptation of MTB strains to antibiotics, for example in different patients

130 or in different geographic regions. Each use is a mutational event. For the Basel dataset, we

131 calculated the number of mutational events using a parsimony-based analysis of mutational gains

132 and losses at all nodes in the reconstructed phylogeny of the 9,351 MTB genomes (Materials and

133 Methods). The Manson dataset also contained estimates of the number of mutational events,

134 derived using similar methods (Materials and Methods).

135

6 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

136 We studied transition bias by calculating the ratio of the number of transitions to the number of

137 transversions in each dataset, separately for mutational paths and events. We calculated 95%

138 binomial confidence intervals on this ratio and an empirical p-value that describes the probability

139 of observing a transition:transversion ratio greater than the observed ratio, given a null model

140 (Materials and Methods). Among several null models described by Stoltzfus and McCandlish

141 (11), we chose the most conservative, which assumes that all nucleotide mutations are equally

142 likely and there is no difference between the average fitness effects of transitions and

143 transversions. This null model gives a transition:transversion ratio of 0.5, because for any given

144 nucleotide there is one possible transition and two possible transversions. We therefore inferred

145 transition bias in cases where the lower bound of the 95% binomial confidence interval of the

146 observed transition:transversion ratio was greater than 0.5 and the empirical p-value was less than

147 0.05.

148

149 We observed transition bias in both mutational paths and mutational events, for both the Basel

150 and Manson datasets (Figure 1). Transition bias was considerably more pronounced in the

151 number of mutational events, as compared to the number of mutational paths. This provides

152 compelling evidence that transition bias influences the evolution of antibiotic resistance in MTB.

153 To understand why, assume there was no mutation-based or selection-based transition bias and

154 the high transition:transversion ratio at the level of mutational paths was instead due to chance,

155 such as a sampling artifact. In this scenario we would expect at most the same transition bias at

156 the level of events, rather than the roughly two-fold increase that we observed.

157

158 To determine which mutations explained the observed transition bias, we calculated the relative

159 rates of all six possible nucleotide pair mutations, accounting for GC content (1), which is

160 relatively high in MTB compared to other bacteria (Materials and Methods). For mutational

161 paths, the rate of A/T > G/C transitions exceeded the rate of any other nucleotide-pair mutation

7 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

162 (Figure 2A,B). The same was true for mutational events (Figure 2C,D), where the rates of both

163 forms of transitions (G/C > A/T and A/T > G/C) were at least 1.5 times that of all forms of

164 transversions.

165

166 Because the influence of transition bias might depend on the mechanism of antibiotic resistance,

167 we next tested for transition bias separately for different antibiotics. This reduced the number of

168 mutational paths and events that could be analyzed, so we first determined the antibiotics for

169 which we had sufficient statistical power to ensure that an observed lack of transition bias was not

170 due to reduced sample size (Materials and Methods). This analysis revealed that, at the level of

171 mutational paths, our datasets were too small to test for transition bias of the strength observed in

172 the entire dataset for any individual antibiotic (Figure S1A,B). This is because at least 44 (Basel)

173 and 118 (Manson) mutational paths would be required to provide sufficient statistical power, and

174 the maximum number of paths per antibiotic was 30 (for streptomycin) in the Basel dataset and

175 51 (for rifampicin) in the Manson dataset (Figure S2A,B). In contrast, we were able to test for

176 transition bias in the mutational events associated with resistance to individual antibiotics,

177 because only 14 and 12 events were required to provide sufficient statistical power in the Basel

178 and Manson datasets (Figure S1C,D). For this analysis, we considered mutations that

179 simultaneously conferred resistance to multiple antibiotics in separate categories. For example,

180 some mutations in the gene Rv1484 (inhA) and its promoter confer resistance to both isonazid

181 and ethionamide (5 and 6 mutations respectively in the Basel and Manson datasets). To avoid

182 counting such mutations multiple times, we analyzed them separately and did not include them

183 together with mutations that conferred resistance only to isonazid or ethionamide. All but two

184 individual antibiotics, both in the Basel dataset, had enough mutational events to test for

185 transition bias (Figure S2C,D).

186

8 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

187 We observed transition bias in the number of mutational events associated with the evolution of

188 resistance to almost all antibiotics, with transition bias indicated by transition:transversion ratios

189 ranging from 1.59 for pyrazinamide to 41.6 for kanamycin (i.e., from more than three-fold to

190 more than eighty-fold excess of the null expectation; Table 1). The glaring exception was

191 isoniazid, which was dominated by a single C > G transversion (S315T in KatG), representing

192 409 of the 472 mutational events associated with isoniazid resistance in the Basel dataset (or 671

193 events if we include mutations in inhA that are associated with resistance to both isoniazid and

194 ethionamide) and 321 of the 389 mutational events associated with isoniazid resistance in the

195 Manson dataset (or 545 if we include inhA).

196

197 The above results suggest transition bias influences the evolution of antibiotic resistance in MTB.

198 However, it remains unclear whether this bias is mutation-based or selection-based. To

199 disentangle these potential sources of transition bias, we used two different approaches. First, if

200 the observed bias were caused by transitions and transversions encoding amino acid changes with

201 different average fitness effects, we would not expect the bias to extend to mutations that do not

202 encode amino acid changes. We tested this by examining mutations in gene promoters and the

203 ribosomal gene rrs. There were 23 such mutations in the Basel dataset, and 25 in the Manson

204 dataset (12 occur in both datasets). Because this did not provide enough statistical power to

205 perform the analysis at the level of paths, we only considered events here. We found an excess of

206 transitions, demonstrating transition bias among mutations that do not encode amino acid

207 changes. Specifically, in the Basel dataset, there were 603 events that comprised 525 transitions

208 and 78 transversions (transition:transversion ratio of 6.73; 95% binomial confidence interval:

209 (5.30, 8.65); empirical p-value = 0). In the Manson dataset, there were 520 events that comprised

210 421 transitions and 99 transversions (transition:transversion ratio of 4.25; 95% binomial

211 confidence interval: (3.41, 5.35); empirical p-value = 0). Note that mutations in genes such as rrs

9 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

212 may nevertheless have variable fitness effects despite not encoding amino acid changes (21), and

213 we discuss this further below.

214

215 Second, we considered cases where different mutations caused the same resistance-conferring

216 amino acid change, and were therefore synonymous to each other and expected to have similar

217 fitness effects. Specifically, we considered amino acid changes that can be caused by both

218 transition and transversion mutations at the same ancestral codon. For example, methionine can

219 mutate to isoleucine via the transition ATG > ATA or the transversions ATG > ATT and ATG >

220 ATC. In the standard genetic code, there are five such amino acid changes (Table 2). These were

221 rare or non-existent in the Basel and Manson datasets (Table 2), except the amino acid change

222 methionine-to-isoleucine, which occurred in 3 mutational paths and 137 events in the Basel

223 dataset, and in 4 mutational paths and 135 events in the Manson dataset. In the Basel dataset, the

224 137 events comprised 88 transitions and 49 transversions (transition:transversion ratio of 1.80;

225 95% binomial confidence interval: (1.25, 2.60); empirical p-value = 0). In the Manson dataset, the

226 135 events comprised 96 transitions and 39 transversions (transition:transversion ratio of 2.46;

227 95% binomial confidence interval: (1.68, 3.67); empirical p-value = 0). This shows transitions

228 were also overrepresented among events conferring resistance via the same amino acid change

229 (from methionine to isoleucine).

230

231 Ideally, we would have sufficient mutational paths or events to perform the above analysis on all

232 of the amino acid changes in Table 2. To this end, we analyzed an older dataset, TBDReaMDB

233 (22), that included a greater number of resistance mutations, but was not compiled with the same

234 strict inclusion criteria as the Basel and Manson datasets. Across the 717 mutational paths from

235 this dataset that we included (Materials & Methods), we observed transition bias

236 (transition:transversion ratio of 0.86; 95% binomial confidence interval: (0.74, 1.00); empirical p-

237 value = 0). In this dataset there were also a sufficient number of the amino acid changes shown in

10 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

238 Table 2 to take an aggregated approach to calculating transition bias amongst mutations that are

239 synonymous to each other. Specifically, amongst all such mutational paths, there were 16

240 transitions and 12 transversions (Table 2; transition:transversion ratio of 1.33; empirical p-value =

241 0.014). This was more than twice the expected ratio of 0.63, derived from a null model

242 accounting for the number of transitions and transversions in these particular mutational paths

243 (Materials and Methods).

244

245 Discussion

246 We found strong transition bias at multiple levels (mutational paths and events) and in multiple

247 independently curated datasets (Basel, Manson, and TBDReamDB). By analyzing antibiotic

248 resistance mutations that confer a fitness benefit in the presence of selecting antibiotics, we

249 overcame the difficulty of classifying mutations as adaptive. By focusing part of our analysis on

250 nucleotide substitutions in gene promoters and the ribosomal gene rrs, and on mutations that are

251 synonymous to each other, we overcame the difficulty of ruling out an overabundance of

252 transitions caused by transitions and transversions encoding amino acid changes with different

253 average fitness effects. Our data also revealed notable exceptions to the general trend toward

254 transition bias. In particular, the most common resistance mutation by far against isoniazid was a

255 transversion. Our results have four key implications for antibiotic resistance evolution and

256 mutational biases.

257

258 First, quantifying the overabundance of transitions improves our ability to predict mutational

259 pathways to resistance. MTB often acquires multiple resistance mutations sequentially, and some

260 mutational trajectories are more common than others (23). Our results suggest the probability of

261 following a given trajectory will be higher when it contains a greater fraction of transitions than

262 alternative trajectories encoding similar resistance .

263

11 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

264 Second, for at least part of our data, we excluded an important potential explanation for transition

265 bias, specifically that transitions encode amino acid changes with different average fitness effects

266 than transversions. We did this by showing that the bias extended to mutations in gene promoters

267 and the ribosomal gene rrs, and to mutations that are synonymous to each other. We still cannot

268 rule out variable fitness effects that are not linked to amino acid changes, as observed for

269 streptomycin-resistance mutations in ribosomal genes (21), or synonymous resistance mutations

270 in other bacterial species (24). Nevertheless, if such effects were important drivers of resistance

271 evolution in MTB, we would expect a higher fraction of observed mutations to be synonymous (0

272 and 2 mutations in coding regions were synonymous in the Basel and Manson datasets,

273 respectively). Moreover, a recent survey of experimentally observed fitness effects did not

274 support a difference in average fitness effects between transitions and transversions (16). By

275 contrast, analysis of mutations under relaxed selection indicates transitions occur more frequently

276 than transversions, including for MTB (1). Therefore, while we do not argue there is no role for

277 biased fitness effects in general (in fact there is evidence of this for viruses (15)), it is unlikely to

278 be the sole cause of the overabundance of transitions we observed, indicating this is explained

279 instead or in addition by a higher mutation supply of transitions than transversions.

280

281 Third, if we then accept that a biased mutation supply explains at least some of the observed bias

282 among mutational paths to resistance, this is consistent with a role for mutation-limited 'first-

283 come-first-served' dynamics (17) in resistance evolution in MTB. This is not necessarily the case

284 for other pathogens, such as bacteria where resistance is horizontally transmissible. Indeed, this

285 scenario may be relatively likely for pathogens like MTB, where infections expand slowly from a

286 small infectious dose (25) initially containing little genetic variation. On the other hand, genomic

287 evidence shows multiple resistant MTB lineages can occur within the same patient (26, 27),

288 which is less supportive of mutation-limited dynamics. The extent to which such lineages

289 compete with each other will depend on the spatial population structure, for example if they are in

12 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

290 different lung sections they may not compete directly (28). Overall, this indicates resistance may

291 evolve via a process somewhere along a continuum between first-come-first-served and pick-the-

292 winner (29).

293

294 Fourth, MTB may be closer to one end of this continuum when challenged with some antibiotics

295 than others. Specifically, a single transversion accounted for >70% of the mutational events

296 conferring isoniazid resistance (the same was also true of mutations conferring resistance to both

297 isoniazid and ethionamide), indicating a greatly reduced role for transition bias. Instead, the high

298 frequency of the S315T mutation in katG among isoniazid-resistant isolates may reflect that it

299 confers resistance to clinically relevant concentrations, particularly in combination with inhA

300 promoter mutations (30), without severely reducing catalytic function (20). MTB strains carrying

301 this mutation, while not detected in in vitro mutant screens (31), also appear more likely to

302 transmit than strains with other isoniazid-resistance mutations (32).

303

304 Our observation that transition bias is prevalent at the level of paths, but even stronger at the level

305 of events, is consistent with Stoltzfus & McCandlish's (2017) observations for mutations in

306 parallel experimental or natural populations, and is expected under mutation-based transition bias

307 when the sample size is large (11). Specifically, when the sample size is small, each observed

308 mutational path is represented by one event only, so the transition:transversion ratio of paths and

309 events is identical. With increasing sample size, individual paths are represented by multiple

310 events. In this case, the transition:transversion ratio among events will tend toward that of the

311 adaptive , and transitions will be overrepresented if there is mutation-based

312 transition bias. However the transition:transversion ratio of observed paths will tend toward the

313 ratio for all possible unique adaptive mutations, which would approximate 0.5 even when there is

314 mutation-based transition bias (11). Note the observed ratio at the level of paths, while weaker

315 than at the level of events, still exceeded 0.5. This could be because our datasets are incomplete

13 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

316 samples of the possible paths to resistance in MTB, and the likelihood of a given path featuring in

317 our datasets is higher if it occurs more frequently (i.e. mutation-based bias and intermediate

318 sample size). Alternatively, our datasets may capture the vast majority of paths to resistance, but

319 due to selection-based bias a greater fraction of them are transitions than transversions.

320

321 The bias we observed was also prevalent at the level of individual nucleotide changes, with A/T >

322 G/C transitions being particularly common. This is in contrast to earlier evidence that G/C > A/T

323 transitions are the most common type of mutation spreading under relaxed selection in multiple

324 species, including MTB (1). Nevertheless, in that study the ratio of transitions to A/T compared to

325 G/C was lower for MTB than other bacteria. Further evidence of a distinctive transition bias

326 toward G/C in Mycobacteria, and in particular an overabundance of A/T > G/C transitions, comes

327 from a recent mutation accumulation experiment with the related species Mycobacterium

328 smegmatis (33). Another earlier study testing for evidence that oxidative damage caused by

329 antibiotics influences mutational biases also found that A/T > G/C transitions were the most

330 common type of substitution in an earlier version of the TBDReaMDB (34). Thus, transition bias

331 is pervasive across earlier studies and our dataset, yet may be toward different types of transitions

332 depending on the species and on whether mutations are involved in antibiotic resistance.

333

334 In conclusion, our data support the hypothesis that a bias toward transitions plays a key role in

335 determining the genetic changes driving antibiotic resistance evolution.

336

337 Materials and Methods

338 The Basel dataset

339 We curated a list of mutations known to confer resistance to one or more of the following drugs

340 or drug classes: isoniazid, ethionamide, rifampicin, ethambutol, pyrazinamide, fluoroquinolones,

341 aminoglycosides. Starting from a previously published set of 120 mutations (35), we first

14 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

342 excluded mutations in rpsA and ahpC, because these genes were unlikely to confer resistance to

343 pyrazinamide and isoniazid respectively (36, 37). We then added gyrB to the list of pertinent

344 genes, because some mutations in this gene have been shown to lead to fluoroquinolone

345 resistance (38).

346

347 We included additional mutations if they met one or more of the following criteria: 1) they have

348 been shown, by virtue of allelic exchange, to confer resistance; 2) introduction of the mutation

349 into the enzyme of interest was investigated in vitro and shown to confer properties consistent

350 with drug resistance; 3) they were identified in laboratory-generated antibiotic-resistant strains as

351 the most likely candidate for resistance; or 4) there was a clear correlation between the presence

352 of the mutation and drug resistance as detected by phenotypic drug susceptibility testing of

353 clinical strains. This resulted in a list of 196 mutations (Table S1), of which we found 152 in at

354 least one of 9,351 publicly-available MTB genomes (Table S2) (39). These are the mutational

355 paths in the Basel dataset (Table S3).

356

357 We determined the mutational events by first calling single nucleotide polymorphisms in the

358 9,351 genomes and then using the polymorphisms to reconstruct the genomes’ phylogeny and to

359 infer mutational gains and losses throughout the phylogeny, as follows. We clipped Illumina

360 adaptors, trimmed low quality reads using Trimmomatic v. 0.33 (SLIDINGWINDOW:5:20) (40),

361 and removed reads shorter than 20 base pair. We merged overlapping paired-end reads using

362 SeqPrep v. 1.2 (overlap size = 15), and mapped the resulting reads to the reconstructed ancestral

363 sequence of the M. tuberculosis complex (41) using the mem algorithm of BWA v 0.7.13 (ref

364 (42)). We marked duplicated reads using the MarkDuplicates module of Picard v. 2.9.1,

365 performed local realignment of reads around using the RealignerTargetCreator and

366 IndelRealigner modules of GATK v. 3.4.0 (ref (43)), and excluded reads with an alignment score

367 lower than (0.93 × read_length) – (read_length × 4 × 0.07), which corresponds to more than 7

15 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

368 mismatches per 100 base pairs. We called single nucleotide polymorphisms using Samtools v. 1.2

369 mpileup (44) and VarScan v. 2.4.1 (ref (45)), with the following thresholds: minimum mapping

370 quality of 20, minimum base quality at a position of 20, minimum read depth at a position of 7,

371 minimum percentage of reads supporting the call 90%, maximum strand bias for a position 90%.

372 Genomes were excluded if 1) they had an average coverage < 20x, 2) more than 50% of their

373 single nucleotide polymorphisms were excluded due to the strand bias filter, 3) more than 50% of

374 their single nucleotide polymorphisms had a percentage of reads supporting the call between 10%

375 and 90%, or 4) they contained single nucleotide polymorphisms that belong to different MTB

376 lineages, because this indicates that a mix of genomes was sequenced. Finally, we excluded all

377 genomic positions with more than 10% missing data. This resulted in a final dataset of 300,583

378 polymorphic positions.

379

380 We inferred a phylogenetic tree based on the 300,583 single nucleotide polymorphisms with

381 FastTree (46), using double digit precision and the options -nocat and -nosupport. We extracted

382 the bases at each of the 152 genomic positions in our list of resistance mutations from the vcf file

383 for the 9,351 genomes and assembled them in phylip format. To determine the number of

384 mutational events per mutational path, we reconstructed the nucleotide changes at the 152

385 genomic positions on the phylogenetic tree rooted with the inferred ancestral sequence of the

386 MTB complex (41). To do this we used the maximum parsimony ACCTRAN algorithm (47)

387 implemented in PAUP* v. 4.0a (ref (47)), giving equal weight to all characters and considering

388 them as unordered.

389

390 The Manson dataset

391 Manson et al. (19) compiled a list of polymorphisms associated with resistance to eight

392 antibiotics (Supplementary Table 4 in ref (19)), and searched for these polymorphisms in 5,310

393 MTB genomes. They found 392 of these polymorphisms in at least one (Supplementary

16 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

394 Table 5 in ref (19)). We filtered these polymorphisms to only include point mutations, resulting in

395 a dataset of 208 mutational paths. Manson et al. (19) calculated the number of events per

396 mutational path by reconstructing the phylogeny of the 5,310 MTB genomes and using a

397 parsimony-based analysis to determine mutational gains and losses throughout the phylogeny. We

398 used their estimates of the number of events per mutational path, as they were reported in

399 Supplementary Table 5 of ref (19).

400

401 The TBDReaMDB dataset

402 TBDReaMDB is a dataset of 1,178 mutational paths associated with resistance to at least one of 9

403 antibiotics. We filtered this dataset to only include mutational paths that (1) are non-synonymous

404 point mutations, with both both the ancestral and derived codons reported (709 paths), or (2)

405 point mutations in promoters that are upstream of a gene’s transcription start site (8 paths), and

406 (3) are non-redundant, where we considered two mutational paths redundant if they were the

407 same mutational path and associated with the same gene ID, drug, and codon position (or

408 nucleotide position in the case of promoters). The filtered dataset contains 717 mutational paths.

409

410 We used this dataset, which includes a greater number of mutations than the Basel and Manson

411 datasets, but with less strict inclusion criteria, to study transition bias amongst amino acid

412 changes that can be caused by mutational paths that are transitions or transversions, and that arise

413 from the same codon (Table 2). For this purpose, we developed a null model accounting for the

414 number of transitions and transversions in the available mutational paths that cause such amino

415 acid changes. Specifically, for a given combination i of amino acid change and ancestral codon,

416 there are ni mutational paths that are transitions and pi mutational paths that are transversions. In

417 the TBDReaMDB dataset, there are m = 6 such combinations of amino acid changes and

418 ancestral codons (top six rows of Table 2). Some are observed multiple times (at different loci),

419 giving a total of 28 such observations (i.e., the sum of the values in the rightmost column of Table

17 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

! ! 420 2). The expected probability of a transition under the null model is (1/ !!! �!) !!! �!�! /

421 (�!!�!), where qi is the number of times each mutation (path) is observed at different loci

422 (events). For the m = 6 observed mutational paths, this gives a null transition probability of

423 (1/(1+1+11+1+7+7))*(1*0.5 + 1*0.5 + 11*0.33 + 1*0.33 + 7*0.33 + 7*0.5) = 0.385, which

424 equates to a transition:transversion ratio of 0.63.

425

426 Calculating empirical p-values for transition:transversion ratios

427 To calculate an empirical p-value for an observed transition:transversion ratio in a dataset

428 containing x mutational paths or events, we randomly generated 106 datasets of x mutations

429 according to our null model that transversions are twice as likely as transitions. Each of the x

430 mutations in each dataset was chosen to be a transition with probability 1/3 or a transversion with

431 probability 2/3 (when we analyzed mutations that are synonymous to each other in the

432 TBDReaMDB dataset, we used probabilities of 0.385 and 0.615). We determined the

433 transition:transversion ratio for each of these datasets and then calculated the empirical p-value as

434 the fraction of these ratios that were greater than the observed ratio.

435

436 Calculating the relative rates of the six nuleotide pair mutations

437 GC content may influence the number of mutational paths or mutational events in our datasets.

438 MTB has a high GC content, 65.6% genome-wide (48) and 64.4% in the 17 genes associated with

439 resistance in the Basel and Manson datasets. Thus, we expected to see more mutations from G/C

440 or C/G than from A/T or T/A, simply because there are more Gs and Cs in the genes associated

441 with resistance in our datasets. To control for this effect in our calculation of the relative rates of

442 the six possible nucleotide pair mutations, we followed the method of Hershberg and Petrov (1).

443 Specifically, we first determined the number of mutations of each type we would expect under

444 equal GC content by multiplying the number of mutations from A/T to G/C, C/G, or T/A by

445 64.4/(100 - 64.4). We then calculated the relative rates of the six nucleotide pair mutations by

18 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

446 dividing the number of each (unchanged for mutations from G/C) by the sum of all possible pairs

447 of mutations, and multiplying by 100.

448

449 Calculating statistical power

450 The numbers of paths and events are much smaller for individual antibiotics than in the full

451 datasets, which may render our test of transition bias statistically underpowered for individual

452 antibiotics. To determine the minumum number of mutational paths or events required to rule out

453 the possibility that an observed lack of transition bias might be due to small sample size, we

454 downsampled our datasets as follows. For each of the Basel and Manson datasets separately, we

455 randomly sampled x mutational paths or events from the dataset without replacement and

456 calculated the fraction of these paths or events that were transitions (we calculate this fraction,

457 rather than the transition:transversion ratio, to avoid division by zero) and the associated 95%

458 binomial confidence interval. We repeated this 104 times for each value of x, and calculated the

459 average fraction of transitions, the minimum of the lower 95% confidence intervals, and the

460 maximum of the upper 95% confidence intervals. For mutational paths, we varied x from 2 to the

461 maximum number of mutational paths in each dataset, and for mutational events, we varied x

462 from 2 to 800. We then determined the minimum value of x for which the minimum of the 95%

463 confidence intervals exceeded 1/3, which is the expected fraction of transitions under the null

464 model (corresponding to a transition:transversion ratio of 0.5). This is the minimum number of

465 mutational paths or events required to exclude the possibility that an observed lack of transition

466 bias at a given value of x is due to small sample size, given the transition bias in the full dataset.

467 For the Basel dataset, this minimum number of mutational paths was 44 and the minimum

468 number of mutational events was 14. In the Manson dataset, the minimum number of mutational

469 paths was 118 and the minimum number of mutational events was 12. No antibiotic was

470 associated with more than these minimum numbers of mutational paths in either the Basel or

471 Manson datasets (Figure S2A,B). However, we had at least the minimum number of mutational

19 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

472 events for all antibiotics and combinations except for Amikacin+Capreomycin, and mutations

473 conferring resistance only to Capreomycin in the Basel dataset (Figure S2C,D).

474

475 Acknowledgements

476 We acknowledge support from Swiss National Science Foundation (SNSF) grants

477 PP00P3_170604 (J.L.P), 31003A_165803 (A.R.H.), 310030_166687 (S.G.), IZRJZ3_164171

478 (S.G.), IZLSZ3_170834 (S.G.) and CRSII5_177163 (S.G.), the European Research

479 Council (309540-EVODRTB to S.G.) and SystemsX.ch. Part of the calculations were

480 performed at sciCORE (http://scicore.unibas.ch/) scientific computing center at University of

481 Basel.

482

483

484

485

486

487

488

489

490

491

492

493

494

495

496

20 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

497 Table 1. Summary of transition bias in mutational events per antibiotic in the Basel and

498 Manson datasets. Rows are ordered by decreasing number of events in the Basel dataset.

499 Dashes indicate antibiotics for which there are no mutational events in the respective

500 dataset. Mutations that confer resistance to multiple antibiotics are reported separately and

501 were not counted among the events conferring resistance to individual antibiotics. RIF-

502 Rifampicin, EMB - Ethambutol, INH – Isonazid, SM – Streptomycin, ETH - Ethionamide,

503 FQ – Floroquinolones, AK – Amikacin, CAP – Capreomycin, KAN – Kanamycin, PZA –

504 Pyrazinamide, OFX - Ofloxacine. Ti – number of transitions, Tv – number of transversions,

505 Ti : Tv – transition:transversion ratio, 95% CI – 95% binomial confidence interval, p –

506 empirical p-value.

507

Basel dataset Manson dataset

Antibiotic Ti Tv Ti:Tv 95% CI p Ti Tv Ti:Tv 95% CI p

RIF 365 223 1.64 (1.38, 1.94) 0 494 209 2.36 (2.01, 2.80) 0

EMB 390 154 2.53 (2.10, 3.07) 0 342 131 2.61 (2.13, 3.22) 0

INH 42 430 0.10 (0.07, 0.13) 1 54 335 0.16 (0.12, 0.22) 1

SM 285 71 4.01 (3.08, 5.28) 0 254 72 3.53 (2.71, 4.65) 0

INH, ETH 231 31 7.45 (5.11, 11.22) 0 137 18 7.61 (4.64, 13.23) 0

FQ 166 57 2.91 (2.14, 4.01) 0 - - - - -

AK, CAP, KAN 128 0 ∞ (34.20, ∞) 0 - - - - -

PZA 67 28 2.39 (1.52, 3.86) 0 46 29 1.59 (0.98, 2.61) 0

KAN 52 9 5.78 (2.82, 13.34) 0 208 5 41.6 (17.54, 129.46) 0

ETH 7 10 0.70 (0.23, 2.04) 0.3 16 10 1.60 (0.68, 3.95) 0.003

KAN, CAP 17 0 ∞ (4.13, ∞) 0 - - - - -

OFX - - - - - 220 91 2.42 (1.89, 3.12) 0

21 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

508 Table 2. Observed transitions and transversions in mutational events in the Basel and

509 Manson datasets, and in mutational paths in the TBDReaMDB dataset, among amino acid

510 changes that can be caused by both transition and transversion mutations to the same

511 codon. Abbreviations as in Table 1.

Amino acid change Ancestral Possible Basel Manson TBDReamDB codon Ti:Tv Ti:Tv Ti:Tv Ti:Tv G → R GGA 1:1 0:0 0:0 0:1 G → R GGG 1:1 0:0 0:0 1:0 M → I ATG 1:2 88:49 96:39 6:5 F → L TTT 1:2 0:0 0:0 1:0 F → L TTC 1:2 0:0 0:0 1:6 W → R TGG 1:1 0:0 0:0 7:0 Stop → R TGA 1:1 0:0 0:0 0:0 512

513

514

515

516

517

518

519

520

521

522

523

524

525

526

527

528

22 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

529 Figure 1. Transition bias in mutational paths and mutational events in the Basel and

530 Manson datasets. The dashed horizontal line shows the null expectation of the

531 transition:transversion ratio.

532

533

534

535

536

537 Basel 2 Manson 538

539

540 1.5 541

542

543 1

544 Transition:transversion ratio

545

546 0.5

547 Paths Events

548

549

550

551

552

23 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

553 Figure 2. Relative rates of the six nucleotide pair mutations, for mutational paths and

554 events in the Basel and Manson datasets. Transitions are indicated with bold text. Rates

555 adjusted for GC content (Materials and methods).

Basel Manson 35 25 A B 30 20 25

20 15

15 10 10 5 5

0 0 Proportion of mutational paths (%) Proportion of mutational paths (%)

A>T,T>A A>T,T>A G>C,C>GG>T,C>A A>C,T>GG>A,G>A,C>T C>TA>G,A>G,T>C T>C G>C,C>GG>T,C>A A>C,T>GG>A,G>A,C>T C>TA>G,A>G,T>C T>C

50 40 C D 40

30 30

20 20

10 10

0 0 Proportion of mutational events (%) Proportion of mutational events (%)

A>T,T>A A>T,T>A 556 G>C,C>GG>T,C>A A>C,T>GG>A,G>A,C>T C>TA>G,A>G,T>C T>C G>C,C>GG>T,C>A A>C,T>GG>A,G>A,C>T C>TA>G,A>G,T>C T>C

557

558

559

560

561

562

563

24 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

564 Figure S1. The average fraction of transitions (open circles), with the minimum and

565 maximum of the lower and upper 95% binomial confidence intervals (dashed lines), relative

566 to the number of sampled paths or events from the Basel or Manson datasets (x-axis). Open

567 circles and dashed lines are derived from 104 replications per number of sampled paths or

568 events. The horizontal line indicates the null expectation of a fraction of transitions equal to

569 1/3, and the vertical line indicates the minimum number of mutational paths or events

570 required for the minimum lower bound on the 95% confidence interval to exceed 1/3.

Basel Manson 1 1 A B 0.8 0.8

0.6 0.6

0.4 0.4 that are transitions 0.2 that are transitions 0.2 Fraction of sampled paths Fraction of sampled paths 0 0 100 101 102 100 101 102 Number of sampled paths Number of sampled paths

1 1 C D 0.8 0.8

0.6 0.6

0.4 0.4 that are transitions 0.2 that are transitions 0.2 Fraction of sampled events Fraction of sampled events 0 0 100 101 102 100 101 102 571 Number of sampled events Number of sampled events

572

573

574

25 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

575 Figure S2. The numbers of mutational paths and events that confer resistance to individual

576 or multiple antibiotics in the Basel and Manson datasets.

Basel Manson 35 A 50 B 30

25 40

20 30 15 20 10 Number of paths 10 5

0 0

SM RIF FQ RIF SM INH PZAEMB INH ETH KANCAP PZA ETHEMB OFX KAN AK,CAP INH,ETH KAN,CAP INH, ETH AK,CAP,KAN 600 800 C D 500 600 400

300 400

200 200 Number of events 100

0 0

RIF INHSM FQ RIF SM EMB PZAKANETH CAP EMB INH OFX KAN PZA ETH AK,CAP INH,ETH KAN,CAP INH, ETH K,CAP,KAN 577 A

578

579

580

581

582

583

584

26 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

585 References

586 1. Hershberg R & Petrov DA (2010) Evidence that mutation is universally 587 biased towards AT in bacteria. PLoS Genet 6(9):e1001115. 588 2. Petrov DA & Hartl DL (1999) Patterns of nucleotide substitution in 589 Drosophila and mammalian genomes. Proc Natl Acad Sci U S A 96(4):1475- 590 1479. 591 3. Ossowski S, et al. (2010) The rate and molecular spectrum of spontaneous 592 mutations in Arabidopsis thaliana. Science 327(5961):92-94. 593 4. Keightley PD, et al. (2009) Analysis of the genome sequences of three 594 Drosophila melanogaster spontaneous mutation accumulation lines. Genome 595 Res 19(7):1195-1201. 596 5. Pauly MD, Procario MC, & Lauring AS (2017) A novel twelve class fluctuation 597 test reveals higher than expected mutation rates for influenza A viruses. Elife 598 6. 599 6. Galen SC, et al. (2015) Contribution of a mutational hot spot to 600 adaptation in high-altitude Andean house wrens. Proc Natl Acad Sci U S A 601 112(45):13958-13963. 602 7. Sackman AM, et al. (2017) Mutation-Driven Parallel Evolution during Viral 603 Adaptation. Mol Biol Evol 34(12):3243-3253. 604 8. Rokyta DR, Joyce P, Caudle SB, & Wichman HA (2005) An empirical test of the 605 mutational landscape model of adaptation using a single-stranded DNA virus. 606 Nat Genet 37(4):441-444. 607 9. Weigand MR & Sundin GW (2012) General and inducible hypermutation 608 facilitate parallel adaptation in Pseudomonas aeruginosa despite divergent 609 mutation spectra. Proc Natl Acad Sci U S A 109(34):13680-13685. 610 10. Couce A, Rodriguez-Rojas A, & Blazquez J (2015) Bypass of genetic 611 constraints during mutator evolution to antibiotic resistance. Proc Biol Sci 612 282(1804):20142698. 613 11. Stoltzfus A & McCandlish DM (2017) Mutational Biases Influence Parallel 614 Adaptation. Mol Biol Evol 34(9):2163-2172. 615 12. Storz JF (2016) Causes of molecular convergence and parallelism in protein 616 evolution. Nat Rev Genet 17(4):239-250. 617 13. Rosenberg MS, Subramanian S, & Kumar S (2003) Patterns of transitional 618 mutation biases within and among mammalian genomes. Mol Biol Evol 619 20(6):988-993. 620 14. Zhang J (2000) Rates of conservative and radical nonsynonymous nucleotide 621 substitutions in mammalian nuclear genes. J Mol Evol 50(1):56-68. 622 15. Lyons DM & Lauring AS (2017) Evidence for the Selective Basis of Transition- 623 to-Transversion Substitution Bias in Two RNA Viruses. Mol Biol Evol 624 34(12):3205-3215. 625 16. Stoltzfus A & Norris RW (2016) On the Causes of Evolutionary 626 Transition:Transversion Bias. Mol Biol Evol 33(3):595-602. 627 17. Yampolsky LY & Stoltzfus A (2001) Bias in the introduction of variation as an 628 orienting factor in evolution. Evol Dev 3(2):73-83.

27 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

629 18. Gagneux S (2018) Ecology and evolution of Mycobacterium tuberculosis. Nat 630 Rev Microbiol 16(4):202-213. 631 19. Manson AL, et al. (2017) Genomic analysis of globally diverse Mycobacterium 632 tuberculosis strains provides insights into the emergence and spread of 633 multidrug resistance. Nat Genet 49(3):395-402. 634 20. Yu S, Girotto S, Lee C, & Magliozzo RS (2003) Reduced affinity for Isoniazid in 635 the S315T mutant of Mycobacterium tuberculosis KatG is a key factor in 636 antibiotic resistance. J Biol Chem 278(17):14769-14775. 637 21. Springer B, et al. (2001) Mechanisms of streptomycin resistance: selection of 638 mutations in the 16S rRNA gene conferring resistance. Antimicrob Agents 639 Chemother 45(10):2877-2884. 640 22. Sandgren A, et al. (2009) Tuberculosis drug resistance mutation database. 641 PLoS Med 6(2):e2. 642 23. Borrell S, et al. (2013) between antibiotic resistance mutations 643 drives the evolution of extensively drug-resistant tuberculosis. Evol Med 644 Public Health 2013(1):65-74. 645 24. Schenk MF, Szendro IG, Krug J, & de Visser JA (2012) Quantifying the 646 adaptive potential of an antibiotic resistance enzyme. PLoS Genet 647 8(6):e1002783. 648 25. Dheda K, Barry CE, 3rd, & Maartens G (2016) Tuberculosis. Lancet 649 387(10024):1211-1226. 650 26. Trauner A, et al. (2017) The within-host population dynamics of 651 Mycobacterium tuberculosis vary with treatment efficacy. Genome Biol 652 18(1):71. 653 27. Sun G, et al. (2012) Dynamic population changes in Mycobacterium 654 tuberculosis during acquisition and fixation of drug resistance in patients. J 655 Infect Dis 206(11):1724-1733. 656 28. Lieberman TD, et al. (2016) Genomic diversity in autopsy samples reveals 657 within-host dissemination of HIV-associated Mycobacterium tuberculosis. 658 Nat Med 22(12):1470-1474. 659 29. McCandlish DM & Stoltzfus A (2014) Modeling evolution using the 660 probability of fixation: history and implications. Q Rev Biol 89(3):225-252. 661 30. Lempens P, et al. (2018) Isoniazid resistance levels of Mycobacterium 662 tuberculosis can largely be predicted by high-confidence resistance- 663 conferring mutations. Sci Rep 8(1):3246. 664 31. Bergval IL, Schuitema AR, Klatser PR, & Anthony RM (2009) Resistant 665 mutants of Mycobacterium tuberculosis selected in vitro do not reflect the in 666 vivo mechanism of isoniazid resistance. J Antimicrob Chemother 64(3):515- 667 523. 668 32. Gagneux S, et al. (2006) Impact of bacterial on the transmission of 669 isoniazid-resistant Mycobacterium tuberculosis. Plos Pathog 2(6):e61. 670 33. Kucukyildirim S, et al. (2016) The Rate and Spectrum of Spontaneous 671 Mutations in Mycobacterium smegmatis, a Bacterium Naturally Devoid of the 672 Postreplicative Mismatch Repair Pathway. G3 (Bethesda) 6(7):2157-2163.

28 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

673 34. Wang ZY, Xiong M, Fu LY, & Zhang HY (2013) Oxidative DNA damage is 674 important to the evolution of antibiotic resistance: evidence of mutation bias 675 and its medicinal implications. J Biomol Struct Dyn 31(7):729-733. 676 35. Walker TM, et al. (2015) Whole-genome sequencing for prediction of 677 Mycobacterium tuberculosis drug susceptibility and resistance: a 678 retrospective cohort study. Lancet Infect Dis 15(10):1193-1202. 679 36. Dillon NA, Peterson ND, Feaga HA, Keiler KC, & Baughn AD (2017) Anti- 680 tubercular Activity of Pyrazinamide is Independent of trans-Translation and 681 RpsA. Sci Rep 7(1):6135. 682 37. Nieto RL, et al. (2016) Virulence of Mycobacterium tuberculosis after 683 Acquisition of Isoniazid Resistance: Individual Nature of katG Mutants and 684 the Possible Role of AhpC. PLoS One 11(11):e0166807. 685 38. Malik S, Willby M, Sikes D, Tsodikov OV, & Posey JE (2012) New insights into 686 fluoroquinolone resistance in Mycobacterium tuberculosis: functional 687 genetic analysis of gyrA and gyrB mutations. PLoS One 7(6):e39754. 688 39. Menardo F, et al. (2018) Treemmer: a tool to reduce large phylogenetic 689 datasets with minimal loss of diversity. BMC Bioinformatics 19(1):164. 690 40. Bolger AM, Lohse M, & Usadel B (2014) Trimmomatic: a flexible trimmer for 691 Illumina sequence data. Bioinformatics 30(15):2114-2120. 692 41. Comas I, et al. (2013) Out-of- migration and Neolithic coexpansion of 693 Mycobacterium tuberculosis with modern humans. Nat Genet 45(10):1176- 694 1182. 695 42. Li H & Durbin R (2009) Fast and accurate short read alignment with 696 Burrows-Wheeler transform. Bioinformatics 25(14):1754-1760. 697 43. McKenna A, et al. (2010) The Genome Analysis Toolkit: a MapReduce 698 framework for analyzing next-generation DNA sequencing data. Genome Res 699 20(9):1297-1303. 700 44. Li H (2011) A statistical framework for SNP calling, mutation discovery, 701 association mapping and population genetical parameter estimation from 702 sequencing data. Bioinformatics 27(21):2987-2993. 703 45. Koboldt DC, et al. (2012) VarScan 2: somatic mutation and copy number 704 alteration discovery in cancer by exome sequencing. Genome Res 22(3):568- 705 576. 706 46. Price MN, Dehal PS, & Arkin AP (2010) FastTree 2--approximately maximum- 707 likelihood trees for large alignments. PLoS One 5(3):e9490. 708 47. Wilgenbusch JC & Swofford D (2003) Inferring evolutionary trees with 709 PAUP*. Curr Protoc Bioinformatics Chapter 6:Unit 6 4. 710 48. Cole ST, et al. (1998) Deciphering the biology of Mycobacterium tuberculosis 711 from the complete genome sequence. Nature 393(6685):537-544. 712

29