bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
1 Transition bias influences the evolution of antibiotic resistance in Mycobacterium
2 tuberculosis
3 Joshua L. Payne1,2π*, Fabrizio Menardo3,4 π, Andrej Trauner3,4, Sonia Borrell3,4, Sebastian M.
4 Gygli3,4, Chloe Loiseau3,4, Sebastien Gagneux3,4†, Alex R. Hall1†
5 1. Institute of Integrative Biology, ETH Zurich, Switzerland
6 2. Swiss Institute of Bioinformatics, Lausanne, Switzerland
7 3. Swiss Tropical and Public Health Institute, Basel, Switzerland
8 4. University of Basel, Basel, Switzerland
9 *Correspondence: [email protected]
10 π, These authors contributed equally.
11 †These authors also contributed equally.
12
13
1 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
14 Abstract
15 Transition bias, an overabundance of transitions relative to transversions, has been widely
16 reported among studies of mutations spreading under relaxed selection. However, demonstrating
17 the role of transition bias in adaptive evolution remains challenging. We addressed this challenge
18 by analyzing adaptive antibiotic-resistance mutations in the major human pathogen
19 Mycobacterium tuberculosis. We found strong evidence for transition bias in two independently
20 curated datasets comprising 152 and 208 antibiotic resistance mutations. This was true at the level
21 of mutational paths (distinct, adaptive DNA sequence changes) and events (individual instances
22 of the adaptive DNA sequence changes), and across different genes and gene promoters
23 conferring resistance to a diversity of antibiotics. It was also true for mutations that do not code
24 for amino acid changes (in gene promoters and the ribosmal gene rrs), and for mutations that are
25 synonymous to each other and are therefore likely to have similar fitness effects, suggesting that
26 transition bias can be caused by a bias in mutation supply. These results point to a central role for
27 transition bias in determining which mutations drive adaptive antibiotic resistance evolution in a
28 key pathogen.
29
30 Keywords
31 Antibiotic resistance, evolution, mutation bias, Mycobacterium tuberculosis, transition bias
32
33
2 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
34 Significance statement
35 Whether and how transition bias influences adaptive evolution remain open questions. We
36 studied 296 DNA mutations that confer antibiotic resistance to the human pathogen
37 Mycobacterium tuberculosis. We uncovered strong transition bias among these mutations and
38 also among the number of times each mutation has evolved in different strains or geographic
39 locations, demonstrating that transition bias can influence adaptive evolution. For a subset of
40 mutations, we were able to rule out an alternative selection-based hypothesis for this bias,
41 indicating that transition bias can be caused by a biased mutation supply. By revealing this bias
42 among M. Tuberculosis resistance mutations, our findings improve our ability to predict the
43 mutational pathways by which pathogens overcome treatment.
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
3 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
60 Introduction
61 Mutation creates genetic variation and therefore influences evolution. Mutation is not an entirely
62 random process, but rather exhibits biases toward particular DNA sequence changes. For
63 example, a bias toward transitions (purine-to-purine or pyrimidine-to-pyrimidine changes),
64 relative to transversions (purine-to-pyrimidine or pyrimidine-to-purine changes), has been widely
65 reported among studies of mutations spreading under relaxed selection (1-5).
66
67 Demonstrating the role of such transition bias in adaptive evolution remains challenging, with
68 most existing evidence derived from individual case studies (6-10). Stoltzfus and McCandlish
69 recently reported the first systematic study of transition bias in putatively adaptive evolution,
70 using the repeated occurrence of amino acid replacements in laboratory or natural evolution as
71 evidence that the replacements are adaptive (11). Their meta-analysis provides compelling
72 evidence that transition bias influences adaptive evolution, with transitions observed in at least
73 two-fold excess of the null expectation that they occur once for every two transversions. Yet such
74 analyses have two limitations. First, while the repeated occurrence of an amino acid replacement
75 is highly suggestive of adaptation (12), it is not direct evidence of adaptation. Second, an
76 overabundance of transitions could result from a bias in mutation supply (i.e., mutation-based
77 transition bias), from a greater selective advantage conferred by the amino acid replacements
78 caused by transitions relative to those caused by transversions (i.e., selection-based transition
79 bias), or from both. For example, the spontaneous deamination of methylated cytosines to
80 thymines may contribute to mutation-based transition bias (13), whereas the propensity of non-
81 synonymous transitions to conserve the biochemical properties of amino acids better than non-
82 synonymous transversions may contribute to selection-based transition bias (14). Discriminating
83 amongst these two forms of transition bias has proven challenging to date (15, 16).
84
4 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
85 Despite the emerging evidence that transitions are overrepresented among adaptive mutations as
86 well as those spreading under relaxed selection, it remains unclear whether this plays a role in
87 real-world scenarios where rapid adaptive evolution has important implications for treatment of
88 infectious disease, such as the emergence of antibiotic resistance in pathogenic bacteria. For
89 example, if pathogen populations fix the first resistance mutation that appears ('first-come-first-
90 served') (17), then a mutation supply biased toward particular types of nucleotide substitutions
91 will influence which genetic changes drive adaptation. Alternatively, if many beneficial
92 mutations are available to selection, and pathogens fix those with the highest selective advantage
93 ('pick-the-winner'), a bias in mutation supply would have a weaker impact on which genetic
94 changes drive adaptation. Therefore, identifying transition bias in such scenarios would improve
95 our basic understanding of how resistance evolves, and our ability to predict the relative
96 likelihoods of alternative mutational pathways to resistance.
97
98 Here, we study transition bias in the evolution of antibiotic resistance in Mycobacterium
99 tuberculosis (MTB), a major human pathogen for which antibiotic resistance evolution is a key
100 obstacle to effective treatment (18). We do so using two independently curated datasets of
101 mutations that are known to confer antibiotic resistance and are therefore definitively adaptive.
102 Additionally, we test specifically for mutation-based transition bias by considering two subsets of
103 adaptive mutations: 1) Mutations located in gene promoters and in the ribosomal gene rrs, which
104 is not translated to protein and therefore should not be influenced by selection-based bias caused
105 by transitions encoding different amino acid changes than transversions. 2) Mutations that are
106 synonymous to each other and are therefore likely to have similar fitness effects.
107
108 Our results reveal strong transition bias in the mutational paths to antibiotic resistance and in the
109 number of times each mutational path is used in the evolution of antibiotic resistance, across 22
110 genes or gene promoters that confer resistance to 11 antibiotics. We also observe transition bias
5 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
111 among adaptive mutations that do not code for amino acid changes, and among adaptive
112 mutations that are synonymous to each other, consistent with the hypothesis that transition bias is
113 at least partly mutation-based. We therefore demonstrate that transition bias influences adaptive
114 evolution, specifically the evolution of antibiotic resistance in a key global pathogen.
115
116 Results
117 We curated a dataset of 152 unique point mutations that confer resistance to at least one of 11
118 different antibiotics, and that appeared in at least one of 9,351 publicly-available MTB genomes
119 (Materials and Methods). We refer to this dataset as the Basel dataset. We also analyzed an
120 independently curated dataset of 208 unique point mutations that confer resistance to at least one
121 of 8 antibiotics and appeared in at least one of 5,310 MTB genomes (19) (Materials and
122 Methods). We refer to this dataset as the Manson dataset. The Basel and Manson datasets have 64
123 point mutations in common, and together include resistance mutations for 11 antibiotics.
124
125 Following Stoltzfus and McCandlish (11), we separately studied transition bias in mutational
126 paths and in mutational events. A mutational path is any single mutation in one of our datasets,
127 such as the well known C > G transversion that causes the S315T substitution in the sole MTB
128 catalase KatG to confer resistance to isoniazid (20). This mutational path may be used any
129 number of times during adaptation of MTB strains to antibiotics, for example in different patients
130 or in different geographic regions. Each use is a mutational event. For the Basel dataset, we
131 calculated the number of mutational events using a parsimony-based analysis of mutational gains
132 and losses at all nodes in the reconstructed phylogeny of the 9,351 MTB genomes (Materials and
133 Methods). The Manson dataset also contained estimates of the number of mutational events,
134 derived using similar methods (Materials and Methods).
135
6 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
136 We studied transition bias by calculating the ratio of the number of transitions to the number of
137 transversions in each dataset, separately for mutational paths and events. We calculated 95%
138 binomial confidence intervals on this ratio and an empirical p-value that describes the probability
139 of observing a transition:transversion ratio greater than the observed ratio, given a null model
140 (Materials and Methods). Among several null models described by Stoltzfus and McCandlish
141 (11), we chose the most conservative, which assumes that all nucleotide mutations are equally
142 likely and there is no difference between the average fitness effects of transitions and
143 transversions. This null model gives a transition:transversion ratio of 0.5, because for any given
144 nucleotide there is one possible transition and two possible transversions. We therefore inferred
145 transition bias in cases where the lower bound of the 95% binomial confidence interval of the
146 observed transition:transversion ratio was greater than 0.5 and the empirical p-value was less than
147 0.05.
148
149 We observed transition bias in both mutational paths and mutational events, for both the Basel
150 and Manson datasets (Figure 1). Transition bias was considerably more pronounced in the
151 number of mutational events, as compared to the number of mutational paths. This provides
152 compelling evidence that transition bias influences the evolution of antibiotic resistance in MTB.
153 To understand why, assume there was no mutation-based or selection-based transition bias and
154 the high transition:transversion ratio at the level of mutational paths was instead due to chance,
155 such as a sampling artifact. In this scenario we would expect at most the same transition bias at
156 the level of events, rather than the roughly two-fold increase that we observed.
157
158 To determine which mutations explained the observed transition bias, we calculated the relative
159 rates of all six possible nucleotide pair mutations, accounting for GC content (1), which is
160 relatively high in MTB compared to other bacteria (Materials and Methods). For mutational
161 paths, the rate of A/T > G/C transitions exceeded the rate of any other nucleotide-pair mutation
7 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
162 (Figure 2A,B). The same was true for mutational events (Figure 2C,D), where the rates of both
163 forms of transitions (G/C > A/T and A/T > G/C) were at least 1.5 times that of all forms of
164 transversions.
165
166 Because the influence of transition bias might depend on the mechanism of antibiotic resistance,
167 we next tested for transition bias separately for different antibiotics. This reduced the number of
168 mutational paths and events that could be analyzed, so we first determined the antibiotics for
169 which we had sufficient statistical power to ensure that an observed lack of transition bias was not
170 due to reduced sample size (Materials and Methods). This analysis revealed that, at the level of
171 mutational paths, our datasets were too small to test for transition bias of the strength observed in
172 the entire dataset for any individual antibiotic (Figure S1A,B). This is because at least 44 (Basel)
173 and 118 (Manson) mutational paths would be required to provide sufficient statistical power, and
174 the maximum number of paths per antibiotic was 30 (for streptomycin) in the Basel dataset and
175 51 (for rifampicin) in the Manson dataset (Figure S2A,B). In contrast, we were able to test for
176 transition bias in the mutational events associated with resistance to individual antibiotics,
177 because only 14 and 12 events were required to provide sufficient statistical power in the Basel
178 and Manson datasets (Figure S1C,D). For this analysis, we considered mutations that
179 simultaneously conferred resistance to multiple antibiotics in separate categories. For example,
180 some mutations in the gene Rv1484 (inhA) and its promoter confer resistance to both isonazid
181 and ethionamide (5 and 6 mutations respectively in the Basel and Manson datasets). To avoid
182 counting such mutations multiple times, we analyzed them separately and did not include them
183 together with mutations that conferred resistance only to isonazid or ethionamide. All but two
184 individual antibiotics, both in the Basel dataset, had enough mutational events to test for
185 transition bias (Figure S2C,D).
186
8 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
187 We observed transition bias in the number of mutational events associated with the evolution of
188 resistance to almost all antibiotics, with transition bias indicated by transition:transversion ratios
189 ranging from 1.59 for pyrazinamide to 41.6 for kanamycin (i.e., from more than three-fold to
190 more than eighty-fold excess of the null expectation; Table 1). The glaring exception was
191 isoniazid, which was dominated by a single C > G transversion (S315T in KatG), representing
192 409 of the 472 mutational events associated with isoniazid resistance in the Basel dataset (or 671
193 events if we include mutations in inhA that are associated with resistance to both isoniazid and
194 ethionamide) and 321 of the 389 mutational events associated with isoniazid resistance in the
195 Manson dataset (or 545 if we include inhA).
196
197 The above results suggest transition bias influences the evolution of antibiotic resistance in MTB.
198 However, it remains unclear whether this bias is mutation-based or selection-based. To
199 disentangle these potential sources of transition bias, we used two different approaches. First, if
200 the observed bias were caused by transitions and transversions encoding amino acid changes with
201 different average fitness effects, we would not expect the bias to extend to mutations that do not
202 encode amino acid changes. We tested this by examining mutations in gene promoters and the
203 ribosomal gene rrs. There were 23 such mutations in the Basel dataset, and 25 in the Manson
204 dataset (12 occur in both datasets). Because this did not provide enough statistical power to
205 perform the analysis at the level of paths, we only considered events here. We found an excess of
206 transitions, demonstrating transition bias among mutations that do not encode amino acid
207 changes. Specifically, in the Basel dataset, there were 603 events that comprised 525 transitions
208 and 78 transversions (transition:transversion ratio of 6.73; 95% binomial confidence interval:
209 (5.30, 8.65); empirical p-value = 0). In the Manson dataset, there were 520 events that comprised
210 421 transitions and 99 transversions (transition:transversion ratio of 4.25; 95% binomial
211 confidence interval: (3.41, 5.35); empirical p-value = 0). Note that mutations in genes such as rrs
9 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
212 may nevertheless have variable fitness effects despite not encoding amino acid changes (21), and
213 we discuss this further below.
214
215 Second, we considered cases where different mutations caused the same resistance-conferring
216 amino acid change, and were therefore synonymous to each other and expected to have similar
217 fitness effects. Specifically, we considered amino acid changes that can be caused by both
218 transition and transversion mutations at the same ancestral codon. For example, methionine can
219 mutate to isoleucine via the transition ATG > ATA or the transversions ATG > ATT and ATG >
220 ATC. In the standard genetic code, there are five such amino acid changes (Table 2). These were
221 rare or non-existent in the Basel and Manson datasets (Table 2), except the amino acid change
222 methionine-to-isoleucine, which occurred in 3 mutational paths and 137 events in the Basel
223 dataset, and in 4 mutational paths and 135 events in the Manson dataset. In the Basel dataset, the
224 137 events comprised 88 transitions and 49 transversions (transition:transversion ratio of 1.80;
225 95% binomial confidence interval: (1.25, 2.60); empirical p-value = 0). In the Manson dataset, the
226 135 events comprised 96 transitions and 39 transversions (transition:transversion ratio of 2.46;
227 95% binomial confidence interval: (1.68, 3.67); empirical p-value = 0). This shows transitions
228 were also overrepresented among events conferring resistance via the same amino acid change
229 (from methionine to isoleucine).
230
231 Ideally, we would have sufficient mutational paths or events to perform the above analysis on all
232 of the amino acid changes in Table 2. To this end, we analyzed an older dataset, TBDReaMDB
233 (22), that included a greater number of resistance mutations, but was not compiled with the same
234 strict inclusion criteria as the Basel and Manson datasets. Across the 717 mutational paths from
235 this dataset that we included (Materials & Methods), we observed transition bias
236 (transition:transversion ratio of 0.86; 95% binomial confidence interval: (0.74, 1.00); empirical p-
237 value = 0). In this dataset there were also a sufficient number of the amino acid changes shown in
10 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
238 Table 2 to take an aggregated approach to calculating transition bias amongst mutations that are
239 synonymous to each other. Specifically, amongst all such mutational paths, there were 16
240 transitions and 12 transversions (Table 2; transition:transversion ratio of 1.33; empirical p-value =
241 0.014). This was more than twice the expected ratio of 0.63, derived from a null model
242 accounting for the number of transitions and transversions in these particular mutational paths
243 (Materials and Methods).
244
245 Discussion
246 We found strong transition bias at multiple levels (mutational paths and events) and in multiple
247 independently curated datasets (Basel, Manson, and TBDReamDB). By analyzing antibiotic
248 resistance mutations that confer a fitness benefit in the presence of selecting antibiotics, we
249 overcame the difficulty of classifying mutations as adaptive. By focusing part of our analysis on
250 nucleotide substitutions in gene promoters and the ribosomal gene rrs, and on mutations that are
251 synonymous to each other, we overcame the difficulty of ruling out an overabundance of
252 transitions caused by transitions and transversions encoding amino acid changes with different
253 average fitness effects. Our data also revealed notable exceptions to the general trend toward
254 transition bias. In particular, the most common resistance mutation by far against isoniazid was a
255 transversion. Our results have four key implications for antibiotic resistance evolution and
256 mutational biases.
257
258 First, quantifying the overabundance of transitions improves our ability to predict mutational
259 pathways to resistance. MTB often acquires multiple resistance mutations sequentially, and some
260 mutational trajectories are more common than others (23). Our results suggest the probability of
261 following a given trajectory will be higher when it contains a greater fraction of transitions than
262 alternative trajectories encoding similar resistance phenotypes.
263
11 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
264 Second, for at least part of our data, we excluded an important potential explanation for transition
265 bias, specifically that transitions encode amino acid changes with different average fitness effects
266 than transversions. We did this by showing that the bias extended to mutations in gene promoters
267 and the ribosomal gene rrs, and to mutations that are synonymous to each other. We still cannot
268 rule out variable fitness effects that are not linked to amino acid changes, as observed for
269 streptomycin-resistance mutations in ribosomal genes (21), or synonymous resistance mutations
270 in other bacterial species (24). Nevertheless, if such effects were important drivers of resistance
271 evolution in MTB, we would expect a higher fraction of observed mutations to be synonymous (0
272 and 2 mutations in coding regions were synonymous in the Basel and Manson datasets,
273 respectively). Moreover, a recent survey of experimentally observed fitness effects did not
274 support a difference in average fitness effects between transitions and transversions (16). By
275 contrast, analysis of mutations under relaxed selection indicates transitions occur more frequently
276 than transversions, including for MTB (1). Therefore, while we do not argue there is no role for
277 biased fitness effects in general (in fact there is evidence of this for viruses (15)), it is unlikely to
278 be the sole cause of the overabundance of transitions we observed, indicating this is explained
279 instead or in addition by a higher mutation supply of transitions than transversions.
280
281 Third, if we then accept that a biased mutation supply explains at least some of the observed bias
282 among mutational paths to resistance, this is consistent with a role for mutation-limited 'first-
283 come-first-served' dynamics (17) in resistance evolution in MTB. This is not necessarily the case
284 for other pathogens, such as bacteria where resistance is horizontally transmissible. Indeed, this
285 scenario may be relatively likely for pathogens like MTB, where infections expand slowly from a
286 small infectious dose (25) initially containing little genetic variation. On the other hand, genomic
287 evidence shows multiple resistant MTB lineages can occur within the same patient (26, 27),
288 which is less supportive of mutation-limited dynamics. The extent to which such lineages
289 compete with each other will depend on the spatial population structure, for example if they are in
12 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
290 different lung sections they may not compete directly (28). Overall, this indicates resistance may
291 evolve via a process somewhere along a continuum between first-come-first-served and pick-the-
292 winner (29).
293
294 Fourth, MTB may be closer to one end of this continuum when challenged with some antibiotics
295 than others. Specifically, a single transversion accounted for >70% of the mutational events
296 conferring isoniazid resistance (the same was also true of mutations conferring resistance to both
297 isoniazid and ethionamide), indicating a greatly reduced role for transition bias. Instead, the high
298 frequency of the S315T mutation in katG among isoniazid-resistant isolates may reflect that it
299 confers resistance to clinically relevant concentrations, particularly in combination with inhA
300 promoter mutations (30), without severely reducing catalytic function (20). MTB strains carrying
301 this mutation, while not detected in in vitro mutant screens (31), also appear more likely to
302 transmit than strains with other isoniazid-resistance mutations (32).
303
304 Our observation that transition bias is prevalent at the level of paths, but even stronger at the level
305 of events, is consistent with Stoltzfus & McCandlish's (2017) observations for mutations in
306 parallel experimental or natural populations, and is expected under mutation-based transition bias
307 when the sample size is large (11). Specifically, when the sample size is small, each observed
308 mutational path is represented by one event only, so the transition:transversion ratio of paths and
309 events is identical. With increasing sample size, individual paths are represented by multiple
310 events. In this case, the transition:transversion ratio among events will tend toward that of the
311 adaptive mutation rate, and transitions will be overrepresented if there is mutation-based
312 transition bias. However the transition:transversion ratio of observed paths will tend toward the
313 ratio for all possible unique adaptive mutations, which would approximate 0.5 even when there is
314 mutation-based transition bias (11). Note the observed ratio at the level of paths, while weaker
315 than at the level of events, still exceeded 0.5. This could be because our datasets are incomplete
13 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
316 samples of the possible paths to resistance in MTB, and the likelihood of a given path featuring in
317 our datasets is higher if it occurs more frequently (i.e. mutation-based bias and intermediate
318 sample size). Alternatively, our datasets may capture the vast majority of paths to resistance, but
319 due to selection-based bias a greater fraction of them are transitions than transversions.
320
321 The bias we observed was also prevalent at the level of individual nucleotide changes, with A/T >
322 G/C transitions being particularly common. This is in contrast to earlier evidence that G/C > A/T
323 transitions are the most common type of mutation spreading under relaxed selection in multiple
324 species, including MTB (1). Nevertheless, in that study the ratio of transitions to A/T compared to
325 G/C was lower for MTB than other bacteria. Further evidence of a distinctive transition bias
326 toward G/C in Mycobacteria, and in particular an overabundance of A/T > G/C transitions, comes
327 from a recent mutation accumulation experiment with the related species Mycobacterium
328 smegmatis (33). Another earlier study testing for evidence that oxidative damage caused by
329 antibiotics influences mutational biases also found that A/T > G/C transitions were the most
330 common type of substitution in an earlier version of the TBDReaMDB (34). Thus, transition bias
331 is pervasive across earlier studies and our dataset, yet may be toward different types of transitions
332 depending on the species and on whether mutations are involved in antibiotic resistance.
333
334 In conclusion, our data support the hypothesis that a bias toward transitions plays a key role in
335 determining the genetic changes driving antibiotic resistance evolution.
336
337 Materials and Methods
338 The Basel dataset
339 We curated a list of mutations known to confer resistance to one or more of the following drugs
340 or drug classes: isoniazid, ethionamide, rifampicin, ethambutol, pyrazinamide, fluoroquinolones,
341 aminoglycosides. Starting from a previously published set of 120 mutations (35), we first
14 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
342 excluded mutations in rpsA and ahpC, because these genes were unlikely to confer resistance to
343 pyrazinamide and isoniazid respectively (36, 37). We then added gyrB to the list of pertinent
344 genes, because some mutations in this gene have been shown to lead to fluoroquinolone
345 resistance (38).
346
347 We included additional mutations if they met one or more of the following criteria: 1) they have
348 been shown, by virtue of allelic exchange, to confer resistance; 2) introduction of the mutation
349 into the enzyme of interest was investigated in vitro and shown to confer properties consistent
350 with drug resistance; 3) they were identified in laboratory-generated antibiotic-resistant strains as
351 the most likely candidate for resistance; or 4) there was a clear correlation between the presence
352 of the mutation and drug resistance as detected by phenotypic drug susceptibility testing of
353 clinical strains. This resulted in a list of 196 mutations (Table S1), of which we found 152 in at
354 least one of 9,351 publicly-available MTB genomes (Table S2) (39). These are the mutational
355 paths in the Basel dataset (Table S3).
356
357 We determined the mutational events by first calling single nucleotide polymorphisms in the
358 9,351 genomes and then using the polymorphisms to reconstruct the genomes’ phylogeny and to
359 infer mutational gains and losses throughout the phylogeny, as follows. We clipped Illumina
360 adaptors, trimmed low quality reads using Trimmomatic v. 0.33 (SLIDINGWINDOW:5:20) (40),
361 and removed reads shorter than 20 base pair. We merged overlapping paired-end reads using
362 SeqPrep v. 1.2 (overlap size = 15), and mapped the resulting reads to the reconstructed ancestral
363 sequence of the M. tuberculosis complex (41) using the mem algorithm of BWA v 0.7.13 (ref
364 (42)). We marked duplicated reads using the MarkDuplicates module of Picard v. 2.9.1,
365 performed local realignment of reads around indels using the RealignerTargetCreator and
366 IndelRealigner modules of GATK v. 3.4.0 (ref (43)), and excluded reads with an alignment score
367 lower than (0.93 × read_length) – (read_length × 4 × 0.07), which corresponds to more than 7
15 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
368 mismatches per 100 base pairs. We called single nucleotide polymorphisms using Samtools v. 1.2
369 mpileup (44) and VarScan v. 2.4.1 (ref (45)), with the following thresholds: minimum mapping
370 quality of 20, minimum base quality at a position of 20, minimum read depth at a position of 7,
371 minimum percentage of reads supporting the call 90%, maximum strand bias for a position 90%.
372 Genomes were excluded if 1) they had an average coverage < 20x, 2) more than 50% of their
373 single nucleotide polymorphisms were excluded due to the strand bias filter, 3) more than 50% of
374 their single nucleotide polymorphisms had a percentage of reads supporting the call between 10%
375 and 90%, or 4) they contained single nucleotide polymorphisms that belong to different MTB
376 lineages, because this indicates that a mix of genomes was sequenced. Finally, we excluded all
377 genomic positions with more than 10% missing data. This resulted in a final dataset of 300,583
378 polymorphic positions.
379
380 We inferred a phylogenetic tree based on the 300,583 single nucleotide polymorphisms with
381 FastTree (46), using double digit precision and the options -nocat and -nosupport. We extracted
382 the bases at each of the 152 genomic positions in our list of resistance mutations from the vcf file
383 for the 9,351 genomes and assembled them in phylip format. To determine the number of
384 mutational events per mutational path, we reconstructed the nucleotide changes at the 152
385 genomic positions on the phylogenetic tree rooted with the inferred ancestral sequence of the
386 MTB complex (41). To do this we used the maximum parsimony ACCTRAN algorithm (47)
387 implemented in PAUP* v. 4.0a (ref (47)), giving equal weight to all characters and considering
388 them as unordered.
389
390 The Manson dataset
391 Manson et al. (19) compiled a list of polymorphisms associated with resistance to eight
392 antibiotics (Supplementary Table 4 in ref (19)), and searched for these polymorphisms in 5,310
393 MTB genomes. They found 392 of these polymorphisms in at least one genome (Supplementary
16 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
394 Table 5 in ref (19)). We filtered these polymorphisms to only include point mutations, resulting in
395 a dataset of 208 mutational paths. Manson et al. (19) calculated the number of events per
396 mutational path by reconstructing the phylogeny of the 5,310 MTB genomes and using a
397 parsimony-based analysis to determine mutational gains and losses throughout the phylogeny. We
398 used their estimates of the number of events per mutational path, as they were reported in
399 Supplementary Table 5 of ref (19).
400
401 The TBDReaMDB dataset
402 TBDReaMDB is a dataset of 1,178 mutational paths associated with resistance to at least one of 9
403 antibiotics. We filtered this dataset to only include mutational paths that (1) are non-synonymous
404 point mutations, with both both the ancestral and derived codons reported (709 paths), or (2)
405 point mutations in promoters that are upstream of a gene’s transcription start site (8 paths), and
406 (3) are non-redundant, where we considered two mutational paths redundant if they were the
407 same mutational path and associated with the same gene ID, drug, and codon position (or
408 nucleotide position in the case of promoters). The filtered dataset contains 717 mutational paths.
409
410 We used this dataset, which includes a greater number of mutations than the Basel and Manson
411 datasets, but with less strict inclusion criteria, to study transition bias amongst amino acid
412 changes that can be caused by mutational paths that are transitions or transversions, and that arise
413 from the same codon (Table 2). For this purpose, we developed a null model accounting for the
414 number of transitions and transversions in the available mutational paths that cause such amino
415 acid changes. Specifically, for a given combination i of amino acid change and ancestral codon,
416 there are ni mutational paths that are transitions and pi mutational paths that are transversions. In
417 the TBDReaMDB dataset, there are m = 6 such combinations of amino acid changes and
418 ancestral codons (top six rows of Table 2). Some are observed multiple times (at different loci),
419 giving a total of 28 such observations (i.e., the sum of the values in the rightmost column of Table
17 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
! ! 420 2). The expected probability of a transition under the null model is (1/ !!! �!) !!! �!�! /
421 (�!!�!), where qi is the number of times each mutation (path) is observed at different loci
422 (events). For the m = 6 observed mutational paths, this gives a null transition probability of
423 (1/(1+1+11+1+7+7))*(1*0.5 + 1*0.5 + 11*0.33 + 1*0.33 + 7*0.33 + 7*0.5) = 0.385, which
424 equates to a transition:transversion ratio of 0.63.
425
426 Calculating empirical p-values for transition:transversion ratios
427 To calculate an empirical p-value for an observed transition:transversion ratio in a dataset
428 containing x mutational paths or events, we randomly generated 106 datasets of x mutations
429 according to our null model that transversions are twice as likely as transitions. Each of the x
430 mutations in each dataset was chosen to be a transition with probability 1/3 or a transversion with
431 probability 2/3 (when we analyzed mutations that are synonymous to each other in the
432 TBDReaMDB dataset, we used probabilities of 0.385 and 0.615). We determined the
433 transition:transversion ratio for each of these datasets and then calculated the empirical p-value as
434 the fraction of these ratios that were greater than the observed ratio.
435
436 Calculating the relative rates of the six nuleotide pair mutations
437 GC content may influence the number of mutational paths or mutational events in our datasets.
438 MTB has a high GC content, 65.6% genome-wide (48) and 64.4% in the 17 genes associated with
439 resistance in the Basel and Manson datasets. Thus, we expected to see more mutations from G/C
440 or C/G than from A/T or T/A, simply because there are more Gs and Cs in the genes associated
441 with resistance in our datasets. To control for this effect in our calculation of the relative rates of
442 the six possible nucleotide pair mutations, we followed the method of Hershberg and Petrov (1).
443 Specifically, we first determined the number of mutations of each type we would expect under
444 equal GC content by multiplying the number of mutations from A/T to G/C, C/G, or T/A by
445 64.4/(100 - 64.4). We then calculated the relative rates of the six nucleotide pair mutations by
18 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
446 dividing the number of each (unchanged for mutations from G/C) by the sum of all possible pairs
447 of mutations, and multiplying by 100.
448
449 Calculating statistical power
450 The numbers of paths and events are much smaller for individual antibiotics than in the full
451 datasets, which may render our test of transition bias statistically underpowered for individual
452 antibiotics. To determine the minumum number of mutational paths or events required to rule out
453 the possibility that an observed lack of transition bias might be due to small sample size, we
454 downsampled our datasets as follows. For each of the Basel and Manson datasets separately, we
455 randomly sampled x mutational paths or events from the dataset without replacement and
456 calculated the fraction of these paths or events that were transitions (we calculate this fraction,
457 rather than the transition:transversion ratio, to avoid division by zero) and the associated 95%
458 binomial confidence interval. We repeated this 104 times for each value of x, and calculated the
459 average fraction of transitions, the minimum of the lower 95% confidence intervals, and the
460 maximum of the upper 95% confidence intervals. For mutational paths, we varied x from 2 to the
461 maximum number of mutational paths in each dataset, and for mutational events, we varied x
462 from 2 to 800. We then determined the minimum value of x for which the minimum of the 95%
463 confidence intervals exceeded 1/3, which is the expected fraction of transitions under the null
464 model (corresponding to a transition:transversion ratio of 0.5). This is the minimum number of
465 mutational paths or events required to exclude the possibility that an observed lack of transition
466 bias at a given value of x is due to small sample size, given the transition bias in the full dataset.
467 For the Basel dataset, this minimum number of mutational paths was 44 and the minimum
468 number of mutational events was 14. In the Manson dataset, the minimum number of mutational
469 paths was 118 and the minimum number of mutational events was 12. No antibiotic was
470 associated with more than these minimum numbers of mutational paths in either the Basel or
471 Manson datasets (Figure S2A,B). However, we had at least the minimum number of mutational
19 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
472 events for all antibiotics and combinations except for Amikacin+Capreomycin, and mutations
473 conferring resistance only to Capreomycin in the Basel dataset (Figure S2C,D).
474
475 Acknowledgements
476 We acknowledge support from Swiss National Science Foundation (SNSF) grants
477 PP00P3_170604 (J.L.P), 31003A_165803 (A.R.H.), 310030_166687 (S.G.), IZRJZ3_164171
478 (S.G.), IZLSZ3_170834 (S.G.) and CRSII5_177163 (S.G.), the European Research
479 Council (309540-EVODRTB to S.G.) and SystemsX.ch. Part of the calculations were
480 performed at sciCORE (http://scicore.unibas.ch/) scientific computing center at University of
481 Basel.
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
20 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
497 Table 1. Summary of transition bias in mutational events per antibiotic in the Basel and
498 Manson datasets. Rows are ordered by decreasing number of events in the Basel dataset.
499 Dashes indicate antibiotics for which there are no mutational events in the respective
500 dataset. Mutations that confer resistance to multiple antibiotics are reported separately and
501 were not counted among the events conferring resistance to individual antibiotics. RIF-
502 Rifampicin, EMB - Ethambutol, INH – Isonazid, SM – Streptomycin, ETH - Ethionamide,
503 FQ – Floroquinolones, AK – Amikacin, CAP – Capreomycin, KAN – Kanamycin, PZA –
504 Pyrazinamide, OFX - Ofloxacine. Ti – number of transitions, Tv – number of transversions,
505 Ti : Tv – transition:transversion ratio, 95% CI – 95% binomial confidence interval, p –
506 empirical p-value.
507
Basel dataset Manson dataset
Antibiotic Ti Tv Ti:Tv 95% CI p Ti Tv Ti:Tv 95% CI p
RIF 365 223 1.64 (1.38, 1.94) 0 494 209 2.36 (2.01, 2.80) 0
EMB 390 154 2.53 (2.10, 3.07) 0 342 131 2.61 (2.13, 3.22) 0
INH 42 430 0.10 (0.07, 0.13) 1 54 335 0.16 (0.12, 0.22) 1
SM 285 71 4.01 (3.08, 5.28) 0 254 72 3.53 (2.71, 4.65) 0
INH, ETH 231 31 7.45 (5.11, 11.22) 0 137 18 7.61 (4.64, 13.23) 0
FQ 166 57 2.91 (2.14, 4.01) 0 - - - - -
AK, CAP, KAN 128 0 ∞ (34.20, ∞) 0 - - - - -
PZA 67 28 2.39 (1.52, 3.86) 0 46 29 1.59 (0.98, 2.61) 0
KAN 52 9 5.78 (2.82, 13.34) 0 208 5 41.6 (17.54, 129.46) 0
ETH 7 10 0.70 (0.23, 2.04) 0.3 16 10 1.60 (0.68, 3.95) 0.003
KAN, CAP 17 0 ∞ (4.13, ∞) 0 - - - - -
OFX - - - - - 220 91 2.42 (1.89, 3.12) 0
21 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
508 Table 2. Observed transitions and transversions in mutational events in the Basel and
509 Manson datasets, and in mutational paths in the TBDReaMDB dataset, among amino acid
510 changes that can be caused by both transition and transversion mutations to the same
511 codon. Abbreviations as in Table 1.
Amino acid change Ancestral Possible Basel Manson TBDReamDB codon Ti:Tv Ti:Tv Ti:Tv Ti:Tv G → R GGA 1:1 0:0 0:0 0:1 G → R GGG 1:1 0:0 0:0 1:0 M → I ATG 1:2 88:49 96:39 6:5 F → L TTT 1:2 0:0 0:0 1:0 F → L TTC 1:2 0:0 0:0 1:6 W → R TGG 1:1 0:0 0:0 7:0 Stop → R TGA 1:1 0:0 0:0 0:0 512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
22 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
529 Figure 1. Transition bias in mutational paths and mutational events in the Basel and
530 Manson datasets. The dashed horizontal line shows the null expectation of the
531 transition:transversion ratio.
532
533
534
535
536
537 Basel 2 Manson 538
539
540 1.5 541
542
543 1
544 Transition:transversion ratio
545
546 0.5
547 Paths Events
548
549
550
551
552
23 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
553 Figure 2. Relative rates of the six nucleotide pair mutations, for mutational paths and
554 events in the Basel and Manson datasets. Transitions are indicated with bold text. Rates
555 adjusted for GC content (Materials and methods).
Basel Manson 35 25 A B 30 20 25
20 15
15 10 10 5 5
0 0 Proportion of mutational paths (%) Proportion of mutational paths (%)
A>T,T>A A>T,T>A G>C,C>GG>T,C>A A>C,T>GG>A,G>A,C>T C>TA>G,A>G,T>C T>C G>C,C>GG>T,C>A A>C,T>GG>A,G>A,C>T C>TA>G,A>G,T>C T>C
50 40 C D 40
30 30
20 20
10 10
0 0 Proportion of mutational events (%) Proportion of mutational events (%)
A>T,T>A A>T,T>A 556 G>C,C>GG>T,C>A A>C,T>GG>A,G>A,C>T C>TA>G,A>G,T>C T>C G>C,C>GG>T,C>A A>C,T>GG>A,G>A,C>T C>TA>G,A>G,T>C T>C
557
558
559
560
561
562
563
24 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
564 Figure S1. The average fraction of transitions (open circles), with the minimum and
565 maximum of the lower and upper 95% binomial confidence intervals (dashed lines), relative
566 to the number of sampled paths or events from the Basel or Manson datasets (x-axis). Open
567 circles and dashed lines are derived from 104 replications per number of sampled paths or
568 events. The horizontal line indicates the null expectation of a fraction of transitions equal to
569 1/3, and the vertical line indicates the minimum number of mutational paths or events
570 required for the minimum lower bound on the 95% confidence interval to exceed 1/3.
Basel Manson 1 1 A B 0.8 0.8
0.6 0.6
0.4 0.4 that are transitions 0.2 that are transitions 0.2 Fraction of sampled paths Fraction of sampled paths 0 0 100 101 102 100 101 102 Number of sampled paths Number of sampled paths
1 1 C D 0.8 0.8
0.6 0.6
0.4 0.4 that are transitions 0.2 that are transitions 0.2 Fraction of sampled events Fraction of sampled events 0 0 100 101 102 100 101 102 571 Number of sampled events Number of sampled events
572
573
574
25 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
575 Figure S2. The numbers of mutational paths and events that confer resistance to individual
576 or multiple antibiotics in the Basel and Manson datasets.
Basel Manson 35 A 50 B 30
25 40
20 30 15 20 10 Number of paths 10 5
0 0
SM RIF FQ RIF SM INH PZAEMB INH ETH KANCAP PZA ETHEMB OFX KAN AK,CAP INH,ETH KAN,CAP INH, ETH AK,CAP,KAN 600 800 C D 500 600 400
300 400
200 200 Number of events 100
0 0
RIF INHSM FQ RIF SM EMB PZAKANETH CAP EMB INH OFX KAN PZA ETH AK,CAP INH,ETH KAN,CAP INH, ETH K,CAP,KAN 577 A
578
579
580
581
582
583
584
26 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
585 References
586 1. Hershberg R & Petrov DA (2010) Evidence that mutation is universally 587 biased towards AT in bacteria. PLoS Genet 6(9):e1001115. 588 2. Petrov DA & Hartl DL (1999) Patterns of nucleotide substitution in 589 Drosophila and mammalian genomes. Proc Natl Acad Sci U S A 96(4):1475- 590 1479. 591 3. Ossowski S, et al. (2010) The rate and molecular spectrum of spontaneous 592 mutations in Arabidopsis thaliana. Science 327(5961):92-94. 593 4. Keightley PD, et al. (2009) Analysis of the genome sequences of three 594 Drosophila melanogaster spontaneous mutation accumulation lines. Genome 595 Res 19(7):1195-1201. 596 5. Pauly MD, Procario MC, & Lauring AS (2017) A novel twelve class fluctuation 597 test reveals higher than expected mutation rates for influenza A viruses. Elife 598 6. 599 6. Galen SC, et al. (2015) Contribution of a mutational hot spot to hemoglobin 600 adaptation in high-altitude Andean house wrens. Proc Natl Acad Sci U S A 601 112(45):13958-13963. 602 7. Sackman AM, et al. (2017) Mutation-Driven Parallel Evolution during Viral 603 Adaptation. Mol Biol Evol 34(12):3243-3253. 604 8. Rokyta DR, Joyce P, Caudle SB, & Wichman HA (2005) An empirical test of the 605 mutational landscape model of adaptation using a single-stranded DNA virus. 606 Nat Genet 37(4):441-444. 607 9. Weigand MR & Sundin GW (2012) General and inducible hypermutation 608 facilitate parallel adaptation in Pseudomonas aeruginosa despite divergent 609 mutation spectra. Proc Natl Acad Sci U S A 109(34):13680-13685. 610 10. Couce A, Rodriguez-Rojas A, & Blazquez J (2015) Bypass of genetic 611 constraints during mutator evolution to antibiotic resistance. Proc Biol Sci 612 282(1804):20142698. 613 11. Stoltzfus A & McCandlish DM (2017) Mutational Biases Influence Parallel 614 Adaptation. Mol Biol Evol 34(9):2163-2172. 615 12. Storz JF (2016) Causes of molecular convergence and parallelism in protein 616 evolution. Nat Rev Genet 17(4):239-250. 617 13. Rosenberg MS, Subramanian S, & Kumar S (2003) Patterns of transitional 618 mutation biases within and among mammalian genomes. Mol Biol Evol 619 20(6):988-993. 620 14. Zhang J (2000) Rates of conservative and radical nonsynonymous nucleotide 621 substitutions in mammalian nuclear genes. J Mol Evol 50(1):56-68. 622 15. Lyons DM & Lauring AS (2017) Evidence for the Selective Basis of Transition- 623 to-Transversion Substitution Bias in Two RNA Viruses. Mol Biol Evol 624 34(12):3205-3215. 625 16. Stoltzfus A & Norris RW (2016) On the Causes of Evolutionary 626 Transition:Transversion Bias. Mol Biol Evol 33(3):595-602. 627 17. Yampolsky LY & Stoltzfus A (2001) Bias in the introduction of variation as an 628 orienting factor in evolution. Evol Dev 3(2):73-83.
27 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
629 18. Gagneux S (2018) Ecology and evolution of Mycobacterium tuberculosis. Nat 630 Rev Microbiol 16(4):202-213. 631 19. Manson AL, et al. (2017) Genomic analysis of globally diverse Mycobacterium 632 tuberculosis strains provides insights into the emergence and spread of 633 multidrug resistance. Nat Genet 49(3):395-402. 634 20. Yu S, Girotto S, Lee C, & Magliozzo RS (2003) Reduced affinity for Isoniazid in 635 the S315T mutant of Mycobacterium tuberculosis KatG is a key factor in 636 antibiotic resistance. J Biol Chem 278(17):14769-14775. 637 21. Springer B, et al. (2001) Mechanisms of streptomycin resistance: selection of 638 mutations in the 16S rRNA gene conferring resistance. Antimicrob Agents 639 Chemother 45(10):2877-2884. 640 22. Sandgren A, et al. (2009) Tuberculosis drug resistance mutation database. 641 PLoS Med 6(2):e2. 642 23. Borrell S, et al. (2013) Epistasis between antibiotic resistance mutations 643 drives the evolution of extensively drug-resistant tuberculosis. Evol Med 644 Public Health 2013(1):65-74. 645 24. Schenk MF, Szendro IG, Krug J, & de Visser JA (2012) Quantifying the 646 adaptive potential of an antibiotic resistance enzyme. PLoS Genet 647 8(6):e1002783. 648 25. Dheda K, Barry CE, 3rd, & Maartens G (2016) Tuberculosis. Lancet 649 387(10024):1211-1226. 650 26. Trauner A, et al. (2017) The within-host population dynamics of 651 Mycobacterium tuberculosis vary with treatment efficacy. Genome Biol 652 18(1):71. 653 27. Sun G, et al. (2012) Dynamic population changes in Mycobacterium 654 tuberculosis during acquisition and fixation of drug resistance in patients. J 655 Infect Dis 206(11):1724-1733. 656 28. Lieberman TD, et al. (2016) Genomic diversity in autopsy samples reveals 657 within-host dissemination of HIV-associated Mycobacterium tuberculosis. 658 Nat Med 22(12):1470-1474. 659 29. McCandlish DM & Stoltzfus A (2014) Modeling evolution using the 660 probability of fixation: history and implications. Q Rev Biol 89(3):225-252. 661 30. Lempens P, et al. (2018) Isoniazid resistance levels of Mycobacterium 662 tuberculosis can largely be predicted by high-confidence resistance- 663 conferring mutations. Sci Rep 8(1):3246. 664 31. Bergval IL, Schuitema AR, Klatser PR, & Anthony RM (2009) Resistant 665 mutants of Mycobacterium tuberculosis selected in vitro do not reflect the in 666 vivo mechanism of isoniazid resistance. J Antimicrob Chemother 64(3):515- 667 523. 668 32. Gagneux S, et al. (2006) Impact of bacterial genetics on the transmission of 669 isoniazid-resistant Mycobacterium tuberculosis. Plos Pathog 2(6):e61. 670 33. Kucukyildirim S, et al. (2016) The Rate and Spectrum of Spontaneous 671 Mutations in Mycobacterium smegmatis, a Bacterium Naturally Devoid of the 672 Postreplicative Mismatch Repair Pathway. G3 (Bethesda) 6(7):2157-2163.
28 bioRxiv preprint doi: https://doi.org/10.1101/421651; this version posted September 20, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
673 34. Wang ZY, Xiong M, Fu LY, & Zhang HY (2013) Oxidative DNA damage is 674 important to the evolution of antibiotic resistance: evidence of mutation bias 675 and its medicinal implications. J Biomol Struct Dyn 31(7):729-733. 676 35. Walker TM, et al. (2015) Whole-genome sequencing for prediction of 677 Mycobacterium tuberculosis drug susceptibility and resistance: a 678 retrospective cohort study. Lancet Infect Dis 15(10):1193-1202. 679 36. Dillon NA, Peterson ND, Feaga HA, Keiler KC, & Baughn AD (2017) Anti- 680 tubercular Activity of Pyrazinamide is Independent of trans-Translation and 681 RpsA. Sci Rep 7(1):6135. 682 37. Nieto RL, et al. (2016) Virulence of Mycobacterium tuberculosis after 683 Acquisition of Isoniazid Resistance: Individual Nature of katG Mutants and 684 the Possible Role of AhpC. PLoS One 11(11):e0166807. 685 38. Malik S, Willby M, Sikes D, Tsodikov OV, & Posey JE (2012) New insights into 686 fluoroquinolone resistance in Mycobacterium tuberculosis: functional 687 genetic analysis of gyrA and gyrB mutations. PLoS One 7(6):e39754. 688 39. Menardo F, et al. (2018) Treemmer: a tool to reduce large phylogenetic 689 datasets with minimal loss of diversity. BMC Bioinformatics 19(1):164. 690 40. Bolger AM, Lohse M, & Usadel B (2014) Trimmomatic: a flexible trimmer for 691 Illumina sequence data. Bioinformatics 30(15):2114-2120. 692 41. Comas I, et al. (2013) Out-of-Africa migration and Neolithic coexpansion of 693 Mycobacterium tuberculosis with modern humans. Nat Genet 45(10):1176- 694 1182. 695 42. Li H & Durbin R (2009) Fast and accurate short read alignment with 696 Burrows-Wheeler transform. Bioinformatics 25(14):1754-1760. 697 43. McKenna A, et al. (2010) The Genome Analysis Toolkit: a MapReduce 698 framework for analyzing next-generation DNA sequencing data. Genome Res 699 20(9):1297-1303. 700 44. Li H (2011) A statistical framework for SNP calling, mutation discovery, 701 association mapping and population genetical parameter estimation from 702 sequencing data. Bioinformatics 27(21):2987-2993. 703 45. Koboldt DC, et al. (2012) VarScan 2: somatic mutation and copy number 704 alteration discovery in cancer by exome sequencing. Genome Res 22(3):568- 705 576. 706 46. Price MN, Dehal PS, & Arkin AP (2010) FastTree 2--approximately maximum- 707 likelihood trees for large alignments. PLoS One 5(3):e9490. 708 47. Wilgenbusch JC & Swofford D (2003) Inferring evolutionary trees with 709 PAUP*. Curr Protoc Bioinformatics Chapter 6:Unit 6 4. 710 48. Cole ST, et al. (1998) Deciphering the biology of Mycobacterium tuberculosis 711 from the complete genome sequence. Nature 393(6685):537-544. 712
29