<<

bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1

2

3 Aminoacyl tRNA Synthetases as Malarial Drug Targets:

4 A Comparative Study

5 Dorothy Wavinya Nyamai, Özlem Tastan Bishop*

6 Research Unit in Bioinformatics (RUBi), Department of and Microbiology,

7 Rhodes University, Grahamstown 6140, South Africa

8

9 *Correspondence to [email protected] bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

10 Abstract

11 Treatment of parasitic diseases has been challenging due to the development of drug

12 resistance by parasites, and thus there is need to identify new class of drugs and drug targets.

13 translation is important for survival of plasmodium and the pathway is present in all

14 the life cycle stages of the plasmodium parasite. Aminoacyl tRNA synthetases are primary

15 in protein translation as they catalyse the first reaction where an amino acid is added

16 to the cognate tRNA. Currently, there is limited research on comparative studies of

17 aminoacyl tRNA synthetases as potential drug targets. The aim of this study is to understand

18 differences between plasmodium and human aminoacyl tRNA synthetases through

19 bioinformatics analysis. Plasmodium falciparum, P. fragile, P. vivax, P. ovale, P. knowlesi,

20 P. bergei, P. malariae and human aminoacyl tRNA synthetase sequences were retrieved from

21 UniProt database and grouped into 20 families based on amino acid specificity. Despite

22 functional and structural conservation, multiple , motif discovery, pairwise

23 sequence identity calculations and molecular phylogenetic analysis showed striking

24 differences between parasite and human . Prediction of alternate binding sites

25 revealed potential druggable sites in PfArgRS, PfMetRS and PfProRS at regions that were

26 weakly conserved when compared to the human homologues. These differences provide a

27 basis for further exploration of plasmodium aminoacyl tRNA synthetases as potential drug

28 targets.

29 Keywords

30 Aminoacyl tRNA synthetases; motif analysis; phylogenetic tree calculations, homology

31 modelling

32 Abbreviations bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

33 aaRS, aminoacyl tRNA synthetases; RF, rossmann fold; CPI, connective peptide I; MSA,

34 multiple ; CD, catalytic domain; ABD, anticodon binding domain

35 Introduction 36 Parasitic diseases like trypanosomiasis, malaria, leishmaniasis and filariasis affect millions of

37 people in the world yearly [1–4]. These diseases cause a remarkable burden in economic

38 development and health of affected countries and thus the need to come up with control and

39 prevention strategies. Currently, the main mode of prevention and treatment of these parasitic

40 diseases is by use of drugs as there are no approved vaccines in the market [5]⁠. However,

41 most parasites have developed resistance against conventional drugs leading to the drugs

42 being ineffective [6,7]. Thus, there is need to develop new classes of drugs and to identify

43 drug targets to solve the shortcoming of drug resistance. Targeting housekeeping pathways

44 such as protein translation may help deal with drug resistance as they are important for the

45 survival of most parasites [8–10].

46 Plasmodium parasite causes malaria disease, which is a major public concern due to its high

47 mortality and morbidity rates [10,11]. There are five plasmodium species that cause malaria

48 in human, and these are Plasmodium falciparum (P. falciparum), Plasmodium vivax (P.

49 vivax), Plasmodium knowlesi (P. knowlesi), Plasmodium malariae (P. malariae) and

50 Plasmodium ovale (P. ovale) [12]. Plasmodium has three genomes; cytoplasm, mitochondrial

51 and apicoplast, and each of them needs a functional protein translation mechanism for growth

52 and survival [10,13,14]. Plasmodium proteins involved in protein translation machinery are

53 generally encoded by the nuclear genome and exported to target organelles to carry out

54 various functions in protein synthesis [13,15–17].

55 Aminoacyl tRNA synthetases (aaRSs) are a group of key enzymes in protein translation

56 pathway; they catalyze the first reaction, where an amino acid is added to the cognate tRNA bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

57 molecule in the presence of ATP and magnesium (Mg2+) ions. This reaction takes place in

58 two steps; first ATP activates the amino acid through formation of aminoacyl-adenylate

59 intermediate, while the second step involves ligation of the adenylate intermediate to the

60 cognate tRNA molecule through a covalent bond generating AMP [8,9,18]. Although the

61 canonical function of these enzymes is to add amino acids to tRNA for translation and they

62 are highly conserved in their catalytic domains, in general aaRSs show sequence, structural

63 and functional diversity across organisms [19]. Furthermore, in some organisms, aaRSs have

64 evolved to perform non-canonical functions such as angiogenesis, RNA splicing, signaling

65 events, transcription regulation, apoptosis and immune responses [20–22]. P. falciparum

66 tyrosyl-tRNA synthetases (PfTyrRS), for instance, have cytokine-like functions, while

67 eukaryotic methionyl-tRNA synthetases (MetRS) have glutathione-S- domains

68 that play a key role in protein-protein interactions [23,24]. P. falciparum lysyl tRNA

69 synthetase (PfLysRS) synthesizes diadenosine polyphosphate, a signaling molecule that plays

70 a role in gene expression, DNA replication and regulation of ion channels of the parasite

71 [25,26].

72 Of the five human malaria parasites, P. falciparum, known to be highly pathogenic, causes

73 the most severe forms of malaria, and is responsible for most of the malaria mortality cases

74 reported across the world [27]. P. falciparum has a total of 36 aaRSs that are asymmetrically

75 distributed in either the cytoplasm, mitochondria or the apicoplast compartments. Of the 36

76 P. falciparum aaRSs, 15 reside in the apicoplast, 16 in the cytoplasm and four in

77 mitochondria: AlaRS, GlyRS, ThrRS and CysRS are found both in the apicoplast and the

78 cytoplasm and each of the four is encoded by a single gene and exported to the two

79 compartments while only phenylalanine aminoacyl synthetase (PheRS) is encoded in the

80 mitochondria [28–30]. P. falciparum protein translation in the mitochondria relies on

81 enzymes imported from the cytoplasm including aaRSs [28]. The apicoplast encodes AspRS, bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

82 PheRS, ValRS, LysRS, HisRS, AsnRS, ProRS, SerRS, TrpRS, ArgRS, IleRS, GluRS,

83 LeuRS, TyrRS and MetRS while AlaRS, CysRS ThrRS and GlyRS are reported to have a

84 single gene encoding both the cytoplasm and apicoplast [15,17,30,31]. A single

85 transcript for each gene is spliced alternatively to generate the two isoforms for each protein

86 which are then targeted to either the cytosol or the apicoplast [29,30]. Each of these genes

87 encodes a protein with a N-terminal extension that corresponds to a signal and transit peptide

88 and is conserved in the apicomplexa phylum [29]. P. falciparum cytoplasm has genes that

89 encode ProRS, AspRS, IleRS, LysRS, HisRS, PheRS, AsnRS, ArgRS, GlnRS, SerRS,

90 TrpRS, ValRS, MetRS, LeuRS, GluRS and TyrRS [31].

91 In human, aaRSs carry out aminoacylation reactions in the cytoplasm, nucleus and the

92 mitochondria. After tRNA is encoded in the nucleus, it is transported to the cytoplasm where

93 protein translation takes place [8]. The human mitochondria acquires nuclear-encoded aaRSs

94 with the aid of translation signals within the aaRSs proteins to carry out protein synthesis

95 [32]. The cytoplasm is the only compartment where both aminoacylation and protein

96 synthesis exclusively takes place in humans. Human aaRSs are, thus, classified as

97 mitochondrial or cytoplasmic based on the compartment where they are localized [32]. In

98 human, a total of 36 aaRSs have been reported with 17 of them in the mitochondrion and 16

99 aaRSs exclusively functioning in the cytoplasm while the other three catalyze aminoacylation

100 reactions in both organelles [8,32]. The three bifunctional aaRSs in human are GlnRS, GlyRS

101 and LysRS. In the cytoplasm, aminoacylation of proline and glutamate is catalyzed by a

102 single bifunctional enzyme (Glu/ProRS). Thus, both compartments have enzymes for

103 charging all the 20 amino acids [32,33].

104 Generally, aaRSs proteins are classified into two distinct classes based on key features of the

105 catalytic site architecture and the manner of charging tRNA [18,20]. Class I aaRSs include

106 IleRS, LeuRS, MetRS, CysRS, GlnRS, GluRS, TrpRS, ValRS, ArgRS and TyrRS. Proteins bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

107 in this class have a catalytic domain (Figure 1A) characterized by a Rossmann fold (RF)

108 located near the N-terminal [34]. The catalytic domain of this class comprises five parallel β-

109 sheet strands flanked by α-helices. The RF possesses highly conserved HIGH and KMSKS

110 motifs separated by a loop [35,36] as shown in Figure 1A. The HIGH motif is located in a

111 region formed by a loop linking the first β-sheet strand and the adjacent α-helix while the

112 KMSKS motif occurs after the fifth β-sheet strand [8]. The RF domain has an insert known as

113 the connective peptide I (CPI) in all enzymes in this class whose structure is characteristic of

114 mixed α and β folds. Proteins in this class have common domains that include an alpha-

115 helical anticodon binding domain (ABD), connective peptide (CPI) and the tRNA stem

116 contact fold [37]. The CPI insert is found towards the end of the first half of the fourth β-

117 strand of the RF joining the N-terminal and C-terminal sections of the catalytic domain [8].

118 With the exception of TyrRS, MetRS and TrpRS all Class I enzymes are monomeric [8]. In

119 monomeric enzymes, the CPI binds tRNA at the 3’- single stranded end while in TrpRS and

120 TyrRS it forms the dimer interface of these dimeric enzymes [34,38]. In ValRS, IleRS and

121 LeuRS, the CPI insert is enlarged (250-275 amino acid residues as compared to CysRS and

122 MetRS where it is 50 and 100 residues respectively) to include an editing domain for editing

123 misacylated tRNA through hydrolysis [39]. The editing domain proofreads the

124 aminoacylation process through pre-transfer or post-transfer editing [8]. Post-transfer editing

125 involves hydrolyzing of misacylated tRNA to amino acid and tRNA while pre-transfer

126 modification hydrolyzes the mis-activated aminoacyl adenylate to AMP and amino acid [8].

127 The ABD of proteins in Class I occurs at the C-terminal which binds the anticodons in the

128 cognate tRNA [40].

129 Class I enzymes binds to the tRNA acceptor end through the minor groove and these

130 enzymes aminoacylate the 2’-OH group of adenosine nucleotide [8,40]. Proteins in this class

131 can further be classified into five subclasses based on sequence similarity and bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

132 physicochemical properties of their substrates [41,42]. Subclass Ia members charge

133 hydrophobic amino acids that have aliphatic side chains and include ValRS, MetRS, IleRS

134 and LeuRS. Subclass Ib proteins have charged amino acids as their substrates and include

135 GlnRS, CysRS and GluRS. Members of subclass IIb bind to the cognate tRNA before

136 carrying out the aminoacylation process [8,43]. TrpRS and TyrRS belong to subclass Ic and

137 their substrates are aromatic amino acids. ArgRS is the only member of subclass Id and it

138 possesses an Add1 domain at the N-terminal whose function is to recognize the D-loop in the

139 tRNA core (Figure 1A) [8,40]. Class I LysRS found in some bacteria and archaea shares

140 structural similarity with subclass Ib but it has a unique alpha helix cage and is thus grouped

141 in subclass Ie [44].

142 Class II aaRSs include HisRS, ProRS, LysRS, SerRS, AspRS, ThrRS, AlaRS, GlyRS, PheRS

143 and AsnRS. Proteins in this class are further grouped in three subclasses whose members are

144 more closely related than other subclasses [45,46]. Class IIa proteins exist as dimers and

145 includes ProRS, SerRS, GlyRS, ThrRS, HisRS and all have the aminoacylation domain at the

146 N-terminal [8]. Members of this subclass have an ABD at the C-terminal (figure 1B). The

147 anticodon binding domain is absent in SerRS as this protein does not require an anticodon to

148 discriminate its cognate tRNA [47,48]. ProRS has editing domains located between motifs I

149 and II at the catalytic domain while in ThrRS the editing domain is at the N-terminus (Figure

150 1B) [40]. Members of Class IIb are dimers and have a C-terminal catalytic domain that is

151 structurally similar and include AspRS, LysRS and AsnRS. The ABD in this subclass is

152 located at the N-terminal (Figure 1B). Class IIc includes PheRS, AlaRS and GlyRS and all

153 exist in tetrameric conformation [8,46]. AlaRS possesses a C-Ala domain at the C- terminal

154 which is absent in other members of Class IIc. The editing domain in AlaRS occurs between

155 the tRNA binding domain and the C-Ala domain (Figure 1B) [40]. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

156 Class II enzymes possess a catalytic site domain characterized by seven β-sheet strands

157 connected by α-helices [49]. This domain, just like the Class I catalytic domain couples

158 amino acid, ATP and tRNA 3’-terminus during catalytic reactions [40,50]. Class II catalytic

159 domain has three weakly conserved motifs (Figure 1B, Figure 2B); Motif I found at the N-

160 terminal of the catalytic region is characterized by a long α-helix linked to a short β-strand

161 with a proline residue at the end which is highly conserved and is involved in homo

162 dimerization [51]. Motif II juxtaposes amino acid, ATP and tRNA and comprise β- sheet

163 strands. Motif III is located at the C-terminal of the catalytic domain and binds ATP and

164 comprise alternate β-strands and α-helices [36]. LysRS can be classified in both classes based

165 on the structure and mode of charging tRNA, with Class I LysRS occurring in some bacteria

166 and most archaea [52] while Class II LysRS occurs in most bacteria and all [8].

167 Figure 1. Key domains of aminoacyl tRNA synthetases. A) Class I aaRS showing the 168 Catalytic Domain (CD) and the anticodon binding domain (ABD). The CD has a CPI insert in 169 all the proteins in this Class. The CPI insert (orange) in IleRS, LeuRS and ValRS is enlarged 170 to form an editing domain while in TyrRS and TrpRS it functions in the formation of dimers. 171 ArgRS has an Add1 domain (cyan) at the N-terminus which is involved in tRNA recognition. 172 B) Class II aaRS showing the catalytic domain with the three conserved motifs (I, II and III). 173 In GlyRS, HisRS and ProRS the anticodon binding domain is at the C-terminal. In AspRS, 174 AsnRS and LysRS it is at the N-terminal, while in AspRS, LysRS and AsnRS proteins the 175 ABD occurs at the C-terminal. Dimer interfaces are shown by a magenta color and are 176 characterized by motif I. ProRS has an editing domain that occurs between motif I and II at 177 the catalytic site while in ThrRS the editing domain is located at the N-terminal. AlaRS has a 178 C-Ala domain (gold) at the C-terminal that functions in dimer formation. 179

180 Figure 2 A) The apo structure of PfTyrRS (Class I). The catalytic domain (residues 22- 181 260) is shown as cyan (cartoon) while the anticodon binding domain (residues 261-370) is 182 shown in grey. The highly conserved KMSKS motif (red) and the HIGH motif (yellow) are 183 shown in the structure. ATP and tyrosine binding sites are shown as blue dotted ellipses. 184 Asp61, His70, Ala72, Gln73, Gln210, His235, Met237, Leu238, Met248, Lys250 are 185 involved in ATP binding while residues Tyr60, Glu64, Ala96, Phe99, Ile172, Tyr188, Gln192 186 and Asp195 are involved in tyrosine binding [53]. B) Apo structure of PfLysRS (Class II). 187 The anticodon binding domain (residues 77-226) is shown in grey cartoon while the catalytic 188 domain (residues 227-583) is shown in cyan. Motif I (red), motif II (yellow) and motif III 189 (magenta) are shown by arrows. The ATP and lysine binding site are shown by 190 the blue dotted ellipses. Residues Arg330, His338, Asn339, Phe342, Glu500, Asn503, bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

191 Gly556 and Arg559 are implicated in binding of ATP while residues Glu308, Asn330, 192 Glu346, Tyr348, Asn503, Tyr505, and Glu507 are involved in binding of lysine [54].

193

194 Protein translation has been explored as a target in the development of antimalarial drugs

195 with most drugs interfering with the ribosome [55]. Recently, there has been increased

196 interest in exploring P. falciparum aaRSs as potential drug targets [15,25,31,55–57].

197 Plasmodium aaRSs inhibitors have been identified that target either the ATP pocket, the

198 amino acid or tRNA binding site or the editing domains of some of these enzymes. Some of

199 the compounds reported to target P. falciparum aaRSs are halofuginone, cladosporin, 3-

200 aminomethyl benzoxaborole AN6426, glyburide and TCMDC-124506 [58–61].

201 Halofuginone, a derivative of febrifugine, targets ProRS tRNA and proline binding site

202 mimicking tRNA 3’-Adenine 76 and L-pro in an ATP dependent manner [57,62,63].

203 Halofuginone binding to human and plasmodium ProRS involves identical residues and in

204 both the compound mimics proline and adenine substrates binding pose thus leading to

205 toxicity in human cells [58,64,65]. Cladosporin, a secondary metabolite from fungi, is

206 reported to have activity against blood and liver stage P. falciparum and its activity is

207 selective to only the parasite LysRS protein [25,59]. Cladosporin, an adenosine analogue

208 binds at the ATP binding site of PfLysRS [25,59]. Cladosporin can, thus, be used as a basis

209 for development of other scaffolds with improved drug-like properties. The compound 3-

210 aminomethyl benzoxaborole AN6426 was reported to be active against LeuRS in drug

211 resistant P. falciparum but did not impair growth of the wild type [61]. This compound binds

212 to the editing domain of PfLeuRS and inhibits it inactivating the 3’Adenine 76 nucleotide of

213 the cognate tRNA covalently and the catalytic turnover of P. falciparum resistant strains [56].

214 Glyburide and TCMDC-124506 are reported to bind to a site adjacent to the ATP binding site

215 of PfProRS and displace key residues involved in ATP binding thus inhibiting the enzyme bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

216 activity [60]. Glyburide and TCMDC are selective to PfProRS and do not cause toxicity to

217 human cells and thus can be used as a basis for development of drugs targeting PfProRS.

218 Due to these shortcomings of the current compounds that target aaRSs and the ever-

219 increasing antimalarial drug resistance [6,15,16,19,27,66], there is need to develop novel

220 drugs and identify more targets to counter this resistance. In addition, the development of

221 drugs that are active against the liver, blood stage parasites [66] and the sexual stages of the

222 parasites thus terminating the infection cycle would help in malaria eradication [67]. With

223 aaRSs proteins being present in all stages of the parasite life cycle, identification of subtle

224 differences between the plasmodium and human proteins would help in achieving this goal.

225 Although aaRS are desirable drug targets, selectivity of drugs to only parasitic aaRS and not

226 human proteins is a challenge as human aaRS have bacterial and eukaryotic origin [68–70].

227 High conservation of aaRS across plasmodium and the human host may hinder development

228 of parasite specific inhibitors [10,71,72]. Comparative studies between host and parasite

229 sequences and structures are important in identifying differences that can be exploited for

230 drug development [71,73–75]. The aim of this study was to discern sequence and structural

231 differences of aaRS between human and plasmodium proteins despite the functional

232 conservation of these proteins. The differences that occur at the active pockets and the

233 predicted druggable sites can thus be exploited for development of drugs with good

234 selectivity [76]. This study included P. falciparum, P. bergei, P. malariae, P. ovale, P. yoelii,

235 P. vivax, P. knowlesi, P. fragile, human and other mammalian sequences. Sequences from

236 toxoplasma and cryptosporidium species which belong to the apicomplexa family and other

237 prokaryotes were also included for molecular phylogenetic tree calculations (Additional file

238 1). Targeting of cytosolic protein machinery in plasmodium shows immediate death while

239 inhibition of apicoplast protein translation machinery is reported to show delayed death

240 where parasites die only during the next replication process after treatment [77]; thus, in this bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

241 study we focus mainly on the cytosolic aaRSs. The sequences were classified into two groups

242 based on differences in structure of their catalytic domain and further into the different aaRS

243 families based on their amino acid substrates [18,20,78]. The study was divided into two

244 parts. First, sequence-based analysis which involved motif search, multiple sequence

245 alignment and phylogenetic tree calculations was carried out. Secondly, structure-based

246 analysis was carried out which involved modeling of 3D structures of proteins, mapping of

247 identified motifs to these structures and identification of probable allosteric drug targeting

248 sites on the 3D models. The results showed striking differences in motifs and at residue level

249 between parasite and human proteins. The results from this study thus form a basis for further

250 research on aaRS as potential antimalarial drug targets and other parasitic diseases.

251 bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

252 Methods 253 Sequence retrieval

254 Plasmodium falciparum aminoacyl tRNA synthetases (PfaaRS) were retrieved from NCBI-

255 Protein database [79]. Protein sequences of other plasmodium species and human ones were

256 searched by BLAST in UniProt using each PfaaRS as the query sequence for the specific

257 family [80]. The BLASTp algorithm with the default BLOSUM62 matrix was used for the

258 search of homologous sequences. (Additional file 1). The data set consisted of the five

259 plasmodium species that infect human, P. bergei, P. yoelii, P. fragile and human

260 homologues. For phylogenetic tree calculations, other apicomplexan (cryptosporidium and

261 toxoplasma) sequences and prokaryote sequences were also retrieved (Additional file 1). The

262 sequences were then grouped into 20 groups based on the different aaRS families. Retrieved

263 sequences were also grouped into two classes (Class I and Class II), each consisting of ten

264 protein families [81,82]. Crystal structures for human and P. falciparum ArgRS, TrpRS,

265 MetRS, TyrRS, LysRS and ProRS proteins were retrieved from Protein Data Bank (PDB)

266 [83].

267 Motif discovery

268 Motif discovery was done using Multiple Expectation Maximisation for Motif Elicitation

269 (MEME) vs 4.11 to identify highly conserved motifs in each aaRS class [84]. A total of 90

270 motifs with a motif width of 6-50 residues were run for each of the non-homologous classes.

271 The MAST tool was used to identify overlapping motifs [85]. A Python script was used to

272 analyse MAST files and MEME log files. Motif conservation was represented as a number of

273 sites per a total number of class sequences, and the results were displayed as heatmaps.

274 Further, motif discovery was performed for each aaRS family and the results also displayed bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

275 as heatmaps. For each aaRS family, the default parameters were used with motif width of 6-

276 50 residues and the number of motifs run for each family varied (Additional file 3).

277 Homology modelling and model quality assessment

278 3D structures of P. falciparum, H. sapiens, P. bergei, P. malariae, P. knowlesi, P. fragile, P.

279 yoelii, P. ovale and P. vivax proteins were built by homology modelling using MODELLER

280 v9.15 [86]. Templates were identified using HHpred and PRotein Interactive MOdeling

281 (PRIMO) webservers [87,88] for the six ArgRS, TyrRS, TrpRS, MetRS, LysRS and ProRS

282 families (Additional file 2). The other families had no good quality templates hence models

283 were not built. For ArgRS - 5JLD [89]; for MetRS - 4DLP [90]; for TrpRS - 4J75 [91]; for

284 TyrRS - 5USF [92]; for ProRS - 4NCX [93] and for LysRS - 4DPG [94] was used. For each

285 protein, 100 models were calculated and the top three models with the lowest z- DOPE

286 (Discrete Optimized Protein Energy) score were selected for validation. Structure quality

287 assessment was done using Analysis (PROSA) webserver [95], Verify3D

288 [96] and Qualitative Model Energy Analysis (QMEAN) [97] and the model with the best

289 scores was selected for allosteric site prediction and motif mapping.

290 Sequence alignment

291 For each family of sequences, multiple sequence alignment was carried out using Profile

292 Multiple Alignment with Local Structures and 3D constraints (PROMALS3D) and Tree-

293 based Consistency Objective Function Evaluation (TCOFFEE) alignment tools [98,99].

294 Visualization and editing of the alignments were done using the Jalview vs. 2.10 software

295 [100]. The alignment results from the two alignment tools were compared, and, in both, it

296 was observed that the sequences were aligned identically except for the less conserved C-

297 terminal and N-terminal regions. TCOFFEE sequence alignments were used for the

298 phylogenetic tree calculations as well as for all versus all pairwise sequence identity bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

299 calculations via a Python script. The sequence identity results were translated into heatmaps

300 using a Matlab script.

301 Molecular phylogenetic analysis

302 Phylogenetic tree calculations were carried out for each family of aaRSs to study

303 evolutionary relationships within the protein families using Molecular Evolutionary Genetic

304 Analysis (MEGA) vs7.0 tool [101]. For sequence alignment of each family, three gap

305 deletion options - 90%, 95% and 100% - were used to calculate the models, and the best three

306 models for each deletion option were selected based on the lowest Bayesian information

307 criterion (BIC) scores. Maximum Likelihood (ML) statistical method was used to infer

308 evolutionary relationship while calculating trees for the top three models for each gap

309 deletion option for each protein families [102]. Total of 180 (3x3x20) trees were calculated.

310 Nearest-Neighbor-Interchange search was performed for all the constructed trees. BioNJ and

311 Neighbor Join algorithms were used for a matrix of pairwise distances calculated using JTT

312 model to obtain the initial trees for the heuristic search and the topology with the highest log

313 likelihood selected [103]. A strong branch swap filter and 1000 bootstrap replicates were

314 used for each tree calculation. The trees were then compared to the bootstrap consensus trees

315 to ensure that branching patterns were accurate and the best model and gap deletion for each

316 case was, then, chosen.

317 Prediction of alternate druggable sites

318 Structure-based drug design and development requires understanding of the structure and

319 function of the binding sites of the target protein. Identification of new drug targeting sites

320 different from the validated active sites is key in development of new classes of drugs. In this

321 study, probable druggable sites of our protein models were determined using FTMap

322 webserver [104] and SiteMap [105,106]. Homology models were used as input for the bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

323 prediction of probable druggable sites. The FTMap webserver identifies probable binding

324 sites by screening of small compounds that vary in shape, polarity and size using an empirical

325 energy function and the CHARMM force field [104]. The webserver docks isopropanol,

326 acetaldehyde, phenol, benzaldehyde, urea, dimethyl ether, acetonitrile, ethane, acetamide,

327 benzene, methylamine, cyclohexane, ethanol, N,N-dimethylformamide, isobutanol and

328 acetone at the surface of the protein [107]. Clusters of low energy conformations are

329 calculated and ranking of the probes is done based on the average energy [104]. The site that

330 binds most of the compounds is considered the active binding site while other regions that

331 bind several compounds are the predicted binding sites.

332 SiteMap, a tool in Schrödinger suites assigns site points in cavities that are likely to

333 contribute to protein-protein or protein-ligand interactions based on energetic and geometric

334 properties [105,106]. The tool uses an algorithm that depends on how well sheltered the sites

335 from solvents are and how close they are from the protein surface to determine the likeness of

336 a site point. The sites are classified based on different properties which include; how enclosed

337 is the site by the protein, the size of the site as measured by the number of points, the degree

338 by which a ligand can accept or donate hydrogen bonds, how tight the site interacts with the

339 protein, how exposed the site is to the solvent and the hydrophilic and hydrophobic nature of

340 the site [105]. The predicted binding sites are then ranked based on a SiteScore calculated

341 using a linear combination of these factors [105]. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

342 Results and Discussion

343 In this study, 92 Class I and 89 Class II proteins were analysed for the eight plasmodium

344 species and their human homologues. More mammalian sequences were included for MSA

345 and motif search within each aaRS family to avoid bias. A protein from each aaRS family

346 was represented for each organism except for PbAspRS which was reported as a putative

347 protein thus we did not include it in the study. Overall, the study is divided into two parts. In

348 the first part, sequence related analyses such as MSA, phylogenetic tree calculations and

349 motif identification were performed with the aim of understanding the general differences

350 between plasmodial and human proteins. The second part included homology modelling,

351 mapping of motif information into 3D structures and identification of alternative drug

352 targeting sites, as the within a family of proteins is generally highly conserved,

353 hence identification of plasmodial protein specific inhibitors might be challenging.

354 PART 1 – SEQUENCE BASED ANALYSES

355 Discovery of motifs that are conserved in each AARS class

356 Motif analysis was done for each aaRS class (Figure 3 and 4) and for each family (see

357 Additional file 3 & 4). The results were displayed as heatmaps using a Python script and

358 mapped to multiple sequence alignment results and available structures. Motifs discovered

359 for each family varied as shown in Additional file 3 and 4. Motif numbering used in this

360 section is based on the MEME results.

361 In Class I, 90 motifs were identified as shown in Figure 3. The start and end positions of

362 highly conserved motifs in this class is shown in Table 1. Motif 1 was conserved in all 92

363 sequences in this class (Figure 3). This motif contains conserved residues involved in ATP

364 binding. Motif 2 was present in 45 out of 92 sequences and this motif has also been reported

365 to be important in ATP binding [40]. Class I aaRS enzymes are known to have a Rossmann bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

366 fold catalytic domain which is characteristic of the highly conserved Motif 1 and 2 [108,109].

367 Motif 12, 20 and 65 were also highly conserved among sequences in this class. The other

368 motifs clustered based on the enzyme family but some were conserved across different

369 enzymes within the same class. Motif 3, 4, 5, 13 and 14, for example, was conserved in all

370 GluRS and GlnRS sequences (Figure 3). These shared motifs show that these two proteins

371 have a high sequence identity and may explain why plasmodium apicoplast GluRS

372 mischarges glutamine specific tRNA with glutamate. In this case, glutamate is then changed

373 to glutamine a reaction catalysed by glutamyl-tRNA amidotransferase enzyme [110,111].

374 Motif 1 consisting of the HIGH signature which is characteristic of the Rossman fold was

375 conserved in all Class I aaRS (Figure 3) [81]. This class also showed high conservation of a

376 Motif 2 containing the KMSKS conserved signature which has also been reported to be part

377 of the RF in this class (Figure 6 & Additional file 3 & 4). The HIGH motif is present in the

378 first half of the RF while the KMSKS motif is present in the second half of the RF domain

379 (Figure 5 & Additional file 4). Motif conservation of the Rossman fold reflects the functional

380 importance of this region. This fold is involved in ATP binding and has been reported to be

381 highly conserved in class I proteins [36]. Class I catalytic domain is characteristic of a five

382 strand parallel sheets flanked by α-helices with amino acid and ATP binding sites on opposite

383 sides of a pseudo-2-fold symmetry. The Rossmann fold, in all Class I proteins has a

384 connective polypeptide I (CPI) insert which is characterized by alpha and beta folds [40]. The

385 conserved Motifs 1 and 2 across the class are present in the catalytic domains [40]. Detailed

386 analysis of each showed conserved motifs specific to each family (Additional

387 file 3). Further, some conserved motifs unique only in the plasmodium proteins were

388 observed (Figure 4 & Additional file 3 & 4).

389 On mapping the motifs to the multiple sequence alignments, differences at the residue level

390 were observed despite the high level of motif conservation thus these residues can be the bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

391 basis of drug discovery. specific motifs in ArgRS, MetRS, GluRS WHEP domain

392 and AspRS are important for the association of proteins to a multi- tRNA synthetase complex

393 in eukaryotes [112–114]. In human, nine aaRSs form a complex together with non-synthetase

394 p18, p38 and p43 accessory proteins [114–116]. Leucyl, isoleucyl, glutaminyl, lysyl,

395 methionyl, aspartyl, prolyl and glutamyl-tRNA synthetases form the multi-synthetase

396 complex together with the auxiliary proteins in human aaRS but this complex is not present

397 in plasmodium aaRSs [34].

398 These unique motifs may also play important roles other than the canonical catalytic roles

399 [117]. Human LeuRS and GluRS, for example, have been reported to trigger leucine

400 dependent cellular proliferation and glutamine dependent apoptosis by functioning as amino

401 acid binding sensors [118,119]. Highly conserved motifs specific to each aaRS group are as a

402 result of idiosyncratic insertions at the C-terminal or within or after the Rossmann fold of

403 each protein family in this class [21,40,114] (Additional file 3 & 4). Methionine, valine,

404 isoleucine and leucine aaRSs are all known to be specific to substrates that have aliphatic side

405 chains and Motifs 20, 24 and 65 that are highly conserved in these four proteins may have a

406 role in this specificity [40]. LeuRS, IleRS, MetRS, ArgRS, ValRS and CysRS have a

407 structurally conserved anticodon binding domain characterized by α-helices and this may

408 explain the conservation of Motifs 2, 20, 44 and 65 among these proteins (Figure 3) [40].

409 Plasmodium TrpRS has an N-terminal extension which is 227 amino acid residues long that

410 constitute a AlaX-like domain and a linker region that function in binding of tRNA and in

411 aminoacylation activity [120]. This extension is not present in the human TrpRS and thus

412 explains the unique motifs at the N- terminal of the plasmodium proteins. Plasmodium

413 sequences also have a lysine-enriched insertion at the C-terminal end of the KMSKS motif

414 which is 15 residues long in PfTrpRS which is absent in the human sequence [120]. The

415 domain for binding anticodons in Class I is located at the carboxyl terminal except for bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

416 LeuRS. The structures of this region are highly divergent even within the sub-classes and is

417 known to play an important role in tRNA discrimination [40].

418 Figure 3. Motifs identified in Class I aaRS presented as a heat map. The colours 419 represent conservation of motifs of the identified 90 motifs in this class. Conservation 420 increases from blue to red while the absence of motifs is shown by a white colour. Motifs not 421 present in human aaRS are shown in a red asterisk. 422 423 Table 1. Highly conserved motifs in Class I aaRS. Motif 1, 2, 6, 12, 20 and 65 in Class I P. 424 falciparum aaRS and the human homologues. Motif positions in the sequences are indicated 425 and dashes are used where the motif is not present.

Motif 1 Motif 2 Motif 6 Motif 12 Motif 20 Motif 65 PfArgRS 134-152 ------HsArgRS 197-215 ------PfCysRS 131-149 382-419 ------HsCysRS 53-71 405-442 ------700-714 ---- PfGlnRS 284-302 ------HsGlnRS 266-284 ------PfGluRS 313-331 ------HsGluRS 200-218 ------PfIleRS 127-145 782-819 623-651 222-271 167-181 91-111 HsIleRS 44-62 599-636 444-472 139-188 94-108 63-83 PfLeuRS 141-159 1119-1156 688-716 219-268 181-195 160-180 HsLeuRS 49-67 715-752 ------89-103 ---- PfMetRS 228-246 514-551 ------268-282 247-267 HsMetRS 269-287 ------310-324 289-309 PfTrpRS 306-324 ------534-548 ---- HsTrpRS 159-177 ------PfTyrRS 61-79 ------HsTyrRS 53-71 ------PfValRS 86-104 628-665 454-482 181-230 126-140 105-125 HsValRS 340-358 861-898 708-736 435-484 380-394 359-379 426 427 In Class II, there were three highly conserved motifs across the class (Figure 4, Table 2). In

428 the reporting of motif results of this class motif names are based on MEME results and not on

429 previous literature. Motif 1 was present in 60 sequences, Motif 2 was present in 58 sequences

430 while motif 20 was present in 76 sequences out of 89 sequences (Figure 4). Motif 1, motif 2

431 and Motif 19 discovered in Class II identified in this study contain the conserved signatures bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

432 of Class II proteins (motif III, motif II and motif I respectively) reported by Chaliotis et al

433 (2016) [36]. In Class II, motifs also clustered based on the protein family. Motif conservation

434 among proteins may mean that these regions play a specific function in the proteins. Motif

435 discovery was then done for each protein to determine conserved motifs within homologous

436 sequences of each protein and the results presented as heat maps (Additional file 3).

437 Figure 4. Motifs identified in Class II aaRS presented as a heat map. Motifs not present 438 in human aaRS are shown in a red asterisk. The colours represent conservation of the 439 identified 90 motifs in this class. Conservation increases from blue to red while the absence 440 of motifs is shown by a white colour. 441 442 Table 2. Highly conserved motifs in Class II aaRS. Motif 1, 2, 19 and 20 in Class II P. 443 falciparum aaRS and the human homologues. Motif positions in the sequences are indicated 444 and dashes are used where the motif is not present.

Motif 1 Motif 2 Motif 19 Motif 20 PfAlaRS 620-640 ------942-962 HsAlaRS 236-256 ------23-43 PfAsnRS 574-594 369-389 ---- 343-363 HsAsnRS 512-532 313-333 249-289 ---- PfAspRS 590-610 382-402 317-357 360-380 HsAspRS 465-485 264-284 199-239 242-262 PfGlyRS ---- 469-489 ---- 173-193 HsGlyRS ---- 322-342 ---- 130-150 PfHisRS 918-938 688-708 ---- 658-678 HsHisRS 378-398 148-168 ---- 118-138 PfLysRS 549-569 321-341 254-294 362-382 HsLysRS 543-563 314-334 247-287 ---- PfPheRS 313-333 ------110-130 HsPheRS 308-328 ------PfProRS ---- 463-483 ------HsProRS ---- 1225-1245 ------PfSerRS ------202-222 HsSerRS ------192-212 PfThrRS 653-673 ------358-378 HsThrRS 444-464 ------151-171 445

446 Class II aaRS have a highly conserved catalytic domain that occurs as β-sheet strands with α-

447 helices on either side. This domain binds ATP, amino acid and the tRNA during bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

448 aminoacylation. Motif 1 has been reported previously (as motif III) to be part of the active

449 site forming α-helices and β-strands [36,121]. Motif 2, (Figure 4) also found at the catalytic

450 site of proteins in this group forms β strands in pairs joined by a loop [35]. Motif I plays a

451 role in binding of ATP while Motif 2 couples ATP, tRNA and amino acid binding [35,122].

452 Another weakly conserved motif in the active site of these proteins forms an α-helix that is

453 linked to a β-strand with a proline residue at the end (Motif 19, Figure 4). This motif is

454 known to be crucial in formation of dimers in most proteins of this class [123].

455 Further, subclasses in this class have conserved motifs within each subclass (Figure 4). For

456 example, Ser, Thr, Gly, Pro and His aaRSs all belong to the Class IIa and have anticodon

457 binding domains that are specific to the subclass [48]. These proteins are specific to small and

458 hydrophobic amino acids and have motifs that are conserved among them as shown in the

459 heatmap (Figure 4). The anticodon binding domain comprises of three α-helices five and β-

460 stranded sheets and occurs in the C-terminus of this sub-class [48,124,125]. The anticodon

461 binding domain is absent in SerRS as this protein does not require an anticodon to

462 discriminate its cognate tRNA [47,48]. Subclass IIb which comprises of AsnRS, LysRS and

463 AspRS have a unique anticodon binding domain at the N-terminal and share conserved

464 motifs (Figure 4) [50,126,127]. This subclass of enzymes is specific to large polar and

465 charged amino acid substrates and are similar in structural organization. AspRS is capable of

466 catalysing aminoacylation of aspartate and asparagine and thus it can be classified as

467 discriminating and non-discriminating protein just like GluRS [128,129]. Non-discriminating

468 AspRS is only present in bacteria and archaea but not in eukaryotes [130]. Family specific

469 motifs, can be attributed to the diversity in accessory domains found at the N- and C-

470 terminal or within loops in the core domain [131].

471 bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

472 Multiple sequence alignment and Motif mapping

473 Plasmodium and mammalian sequences for every aaRS family were aligned using TCOFFEE

474 as indicated in the methodology. The alignment results were visualized using Jalview

475 software and motifs discovered for each family mapped to these alignments [100]. A purple

476 colour was used for the motifs that were conserved in all plasmodium and mammalian

477 sequences, blue colour for only motifs conserved in mammalian species and green colour for

478 motifs conserved only in plasmodium sequences (Additional file 4). On carrying out motif

479 analysis and sequence alignment of Class I aaRSs, it was observed that not all families had

480 the KMSKS signature though all proteins had the HIGH signature (Additional file 4).

481 Alignment of ArgRS showed inserts in mammalian ArgRS at both the C- and N-termini that

482 are not present in plasmodium sequences (Additional file 4A). The highly conserved HIGH

483 signature in Class I aaRSs catalytic domain was observed in Motif 1 of this family (HVGH)

484 (Additional file 4A). Motifs 10 and 12 which were conserved only in mammalian sequences

485 were observed in the N-terminal. Human ArgRS has a basic 72 residue extension at the N-

486 terminal which is characteristic of mammalian ArgRS and plays a role in interaction with

487 accessory proteins like p43 to form the multi-synthetase complex [116,132]. Mammals also

488 have an ArgRS isoform that lacks this extension and is believed to be important in ubiquitin

489 dependent protein degradation where it forms Arg-tRNAArg which is transferred to ArgRS

490 which then adds the arginine to all acidic N-terminal amino acids [133,134].

491

492 bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

493 Figure 5: Motifs discovered in TyrRS family mapped to the multiple sequence alignment 494 results. Motif numbering is based on MEME results. A purple colour shows motifs conserved 495 in all sequences while motifs only present in mammalian sequences are shown in blue. The 496 highly conserved HIGH and KMSKS motifs in Class II aaRSs are shown in a red and yellow 497 box respectively. 498

499 CysRS sequence alignment and motif mapping showed a highly conserved core domain and

500 weakly conserved N- and C-terminal domains. The highly conserved HIGH signature was

501 found in Motif 2 of this family occurring as HLGH in plasmodium and HMGH in the

502 mammalian sequences (Additional file 4B). Motif 8, 10, 12, 13, 18 and 19 were conserved

503 only in mammalian sequences while Motif 11 and 15 were only conserved in plasmodium

504 sequences analysed in this family (Additional file 4B). GlnRS alignment also showed low

505 conservation on both termini with inserts observed for the mammalian sequences at the N-

506 terminal (Additional file 4C). Only two plasmodium specific motifs were found at the core

507 domain, Motif 23 at the N-terminal end of the highly conserved HIGH signature (Motif 2)

508 and Motif 29. Motif 8, 9, 11 and 13 were found only in the mammalian species (Additional

509 file 4C). P. falciparum is reported to have Glutathione-S-transferase (GST)-like domains

510 though their function in the malarial parasite has not been reported [34]. These domains are

511 important in formation of multi-synthetase complex through protein-protein interactions in

512 eukaryotes [21,22,117]. GST-like domains have also been reported in MetRS though just like

513 in GlnRS, the function of these domains in plasmodium is not known unlike in eukaryotes

514 where they play a role in protein-protein interactions [19].

515 The GluRS family also showed low conservation at the N-terminal with Motif 16 present in

516 mammalian sequences at this terminal (Additional file 4D). The HIGH signature was found

517 in Motif 3 as HIGH in all sequences analysed except for PfGluRS where it was HVGH

518 (Additional file 4). P. falciparum GluRS sequence has a glutamine rich N- terminal from

519 residue 68 as opposed to other plasmodium species. In mammals including human, this bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

520 enzyme is a bifunctional protein acting both as GluRS and ProRS thus it catalyses

521 aminoacylation of both proline and glutamate [135]. On alignment with plasmodium GluRS,

522 the mammalian sequences showed a C-terminal extension indicating that it is the N- terminal

523 end that catalyses glutamate aminoacylation. The human enzyme contains three motifs that

524 link the two catalytic domains that function in formation of the multicomplex synthetase and

525 play a role in protein-nucleic acid interactions [135,136]. Similar motifs have been reported

526 in other aaRS like GlyRS, HisRS and TrpRS though they occur at the N-or C-termini of the

527 core domains as a single copy as opposed to the Glu/ProRS where they occur as tandem

528 repeats linking the two catalytic domains [135–137]. Human IleRS has an extension at the C-

529 terminal which was absent in plasmodium sequences, but the core domain of this family was

530 highly conserved (Additional file 4E). Motif 19, 20 and 26 were conserved in the C-terminal

531 of mammalian sequences but absent in plasmodium sequences. The three tandem motifs in

532 the human bifunctional Glu/ProRS have been shown to interact with two repeated motifs in

533 IleRS at the C-terminal extension [138]. In IleRS, the HIGH signature was found in Motif 1

534 while the KMSKS signature was in Motif 3 occurring as HYGH and KMSKR respectively

535 (Additional file 4E). Alignment and motif discovery of LeuRS family showed that this family

536 of protein has low conservation even at the core domain (Additional file 4F). Motif 21, 25

537 and 27 were conserved in plasmodium sequences. Only Motifs 3, 5, 6, 26 and 36 were

538 conserved through all mammalian and plasmodium sequences (Additional file 4F). The other

539 motifs were conserved only in mammalian sequences. The highly conserved Motif 6 had the

540 HIGH signature occurring as HVGH for PfLeuRS, PmLeuRS, PoLeuRS, PyLeuRS, HMGH

541 for PfrLeuRS, PvLeuRS and PkLeuRS and HLGH in the analysed mammalian sequences

542 (Additional file 4F). Anticodon binding domain in LeuRS is located at the C-terminal which

543 had a low conservation as seen in Additional file 4F and this may provide specific targets for

544 drug discovery [139]. Motif discovery and alignment of MetRS showed high conservation of bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

545 mammalian sequences. Some unique motifs were only present in plasmodium MetRS but

546 were absent in mammalian sequences (Additional file 4G). The highly conserved HIGH

547 signature was observed in Motif 8 which was conserved in all sequences analysed while the

548 KMSKS signature was found in Motif 14 conserved in plasmodium and Motif 6 in

549 mammalian sequences (Additional file 4G). The catalytic domain of MetRS was weakly

550 conserved with only Motif 1, 2, 4, 8, and 15 being conserved in all sequences at this region.

551 The C-terminal showed mammalian and plasmodium specific motifs. Motif 5 and 9 found at

552 the N-terminal were conserved in all analysed sequences in this family (Additional file 4G).

553 TrpRS alignment revealed a plasmodium specific extension at the N-terminal characterised

554 by Motif 8, 9, 10 and 14 (Additional file 4H). This extension plays a role in aminoacylation

555 and tRNA binding as reported in P. falciparum [34]. In P. falciparum, this extension

556 comprises of a linker region and an AlaX-like domain that plays a role in tRNA binding but

557 does not edit mis-acylations as observed with Pyrococcus horikoshii [120]. The core domain

558 and the C-terminal of TrpRS family showed highly conserved motifs in all the sequences

559 with only a short Motif 18 present in mammalian sequences (Additional file 4H). Alignment

560 and mapping motifs discovered in TyrRS sequences showed high conservation of motifs at

561 the core domain (Figure 5). Alignment of sequences in this family showed an extension at the

562 C-terminal of the mammalian TyrRS which was missing in all plasmodium sequences (Figure

563 5). This extension was characterised by Motifs 6, 8, 9, 11 and 20 which were conserved in all

564 the mammalian sequences analysed (Figure 5). This extension in human TyrRS is an

565 endothelial monocyte-activating polypeptide II (EMAPII) domain that has cytokine-like

566 functions like angiogenesis and inflammation [22,140]. Motif discovery showed that the core

567 domain is highly conserved across the mammalian and plasmodium TyrRS sequences (Figure

568 5). The catalytic domain of the human sequence is also different from the malarial parasites

569 in that it has a buried tripeptide cytokine motif (Glu-Leu-Arg) while in plasmodium this motif bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

570 is on the surface [22,53]. ValRS alignment showed a N-terminal extension for the

571 mammalian sequences that was absent in all plasmodium sequences comprising of Motifs 14,

572 16, 18, 22 and 25 (Additional file 4J). Mapping of motifs showed that the catalytic domain of

573 proteins analysed in this family are highly conserved though a few plasmodium specific

574 motifs were observed. The highly conserved HIGH signature was found in Motif 2 of this

575 family while the KMSKS signature was in Motif 7 (Additional file 4J). The N-terminal

576 domain showed Motif 16, 33, 34 and 35 which were conserved only in plasmodium

577 sequences (Additional file 4J). Motifs 20, 30 and 38 that were specific to mammalian

578 sequences were also observed at the N-terminal (Additional file 4J).

579 Alignment of AlaRS sequences showed a N-terminal extension of varying lengths in the

580 plasmodium species which was absent in mammalian AlaRS (Additional file 4K). The C-

581 terminal of the proteins in this family showed Motifs 20, 21 and 29 that were only conserved

582 in plasmodium sequences and not in human as well as mammalian specific motifs (Motif 8,

583 14, 17 and 18). AsnRS, LysRS, and AspRS alignment and motif discovery showed low

584 conservation at the N-terminal while core domains and the C-terminal showed high

585 conservation. The anticodon binding domain of these proteins is located at the highly variable

586 N-terminal and thus drugs that specifically bind to the parasite tRNA binding site can be

587 designed [14,141]. Motif 11, 12 and 17 were conserved in plasmodium sequences of AsnRS

588 family at the N-terminal while in this region, Motif 5, 6 and 13 conserved in mammalian

589 sequences were observed (Additional file 4L). In AspRS both the catalytic domain and the C-

590 terminal were highly conserved with the presence of two short Motifs (16 and 20) conserved

591 only in mammals (Additional file 4M). GlyRS, HisRS, ProRS, ThrRS families belong to the

592 subclass IIa and have a highly conserved tRNA binding region at the C-terminal as seen in

593 the alignments and motifs in this region (Additional file 4 N, O, R and T). HisRS family

594 showed a N-terminal extension for all plasmodium sequences analysed but absent in the bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

595 mammalian sequences (Additional file 4O). This extension was characterised by Motifs 11,

596 12, 14, 15, 17, 18, 19 and 23 (Additional file 4O). However, SerRS which also belongs to this

597 subclass does not need an anticodon to discriminate its substrate and thus lacks this domain

598 [48] and the C-terminal of this family showed low conservation (Additional file 4S). ProRS

599 showed Motif 17 and 20 which were conserved only in plasmodium sequences analysed

600 (Additional file 4R). Plasmodium ProRS has a Ybak domain at the N-terminal which edits

601 mischarged Pro-tRNAAla and Pro-tRNASer and this may explain the plasmodium specific

602 motifs at the N-terminal [15,34,57]. The mammalian sequences analysed for this family were

603 of the cytosolic bifunctional Glu/ProRS proteins and this explains the mammalian specific

604 motifs observed at the N-terminal which is believed to be the region responsible for

605 glutamate aminoacylation (Additional file 4R).

606 PheRS motif discovery and alignment showed that the plasmodium sequences are highly

607 variable when compared to mammalian PheRS (Additional file 4Q). Motifs 9, 10, 11 and 13

608 were conserved only in plasmodium while Motifs 5, 6, 8 and 14 were conserved in

609 mammalian sequences in this family (Additional file 4Q). Only Motifs 1, 2, 3 and 4 were

610 conserved across all the sequences in this family (Additional file 4Q). Plasmodium PheRS

611 has a nuclear localization signal and DNA binding domains and thus in addition to

612 aminoacylation, this enzyme mediates cellular processes by binding DNA [142]. Despite high

613 conservation at the aaRS active sites, differences were noted at the residue level after the

614 sequences were aligned. For example, in LysRS family, P. falciparum ATP binding pocket at

615 positions Val328 and Ser344 corresponds to Gln321 and Thr338 respectively in the human

616 protein (Figure 6). Residues with a large side chain at this position like observed in human

617 LysRS do not favour binding of cladosporin a known inhibitor for PfLysRS [25]. These two

618 residues are thus believed to be responsible for selective binding of cladosporin to P.

619 falciparum and not human LysRS [25]. Discovery of drugs that have high specificity to bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

620 parasitic proteins has for a long time been a challenge resulting in drug toxicity in human

621 cells [14]. The alignment results showed striking differences at the sequence level of

622 plasmodium and human aaRSs that can further be explored for the design and development of

623 drugs with few side effects.

624 Figure 6: Mapping of discovered motifs in LysRS family to multiple sequence 625 alignment. A purple colour shows motifs conserved in all sequences while motifs only 626 present in mammalian sequences are shown in blue. One motif conserved only in 627 plasmodium species is shown in green. Motif numbering is based on MEME results. The 628 three conserved signatures in Class II aaRSs are shown in red, yellow and pink boxes. The 629 red arrows show residues Val328 and Ser344 in P. falciparum which are key residues in 630 binding of ATP. 631 632 Phylogenetic tree calculations and pairwise sequence identity calculations agree in

633 grouping sequences

634 On conducting phylogenetic tree analysis, all plasmodium species clustered together, and this

635 was also seen on performing all versus all pairwise sequence identity calculations (Figure 7,

636 Figure 8 and Additional file 6). In this study, numbering of sequences in sequence identity

637 heatmaps was based on the branching of phylogenetic trees. In Class I, plasmodium

638 sequences in TyrRS family showed the highest sequence identity (above 85%) while GlnRS

639 plasmodium sequences showed the lowest sequence (below 75%) identity among

640 plasmodium families. In most of the families, P. yoelii and P. bergei sequences were

641 clustered together in the trees. P. vivax, P. fragile and P. knowlesi were also clustered

642 together in many families, indicating that they are highly conserved and share evolutionary

643 history. These similarities were also captured in sequence identity calculations, and reflected

644 as imaginary boxes in heat maps. Here we will name them as “conservation boxes”. P. bergei

645 and P. yoelii are rodent malaria parasites and are used to study human malaria [143,144]. P.

646 fragile infects simians and studies have shown that human red blood cells do not support the

647 growth of this parasite but it showed a high sequence identity to P. knowlesi whose natural bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

648 vertebrate host is Macaca fascicularis but has been reported to infect human in some parts of

649 Southeast Asia [145,146]. P. knowlesi has been reported to have a close phylogenetic

650 relationship to P. vivax [147] and the two showed a sequence identity above 95% in TyrRS

651 (Figure 8). P. fragile-monkey models can thus be used to study parasite-host-system for the

652 immunological response of the falciparum-like parasite both in vivo and in vitro [148].

653 In ArgRS sequence identity calculations, plasmodium sequences had above 80% sequence

654 identity and motif discovery showed that all motifs identified were conserved in all sequences

655 (Additional file 3 & 6.1). ValRS plasmodium sequences showed 80% sequence identity with

656 PvValRS, PkValRS and PfrValRS clustering together with a 90% sequence identity. In this

657 family, PyValRS and PbValRS showed above 95% sequence identity, clustered together in

658 the phylogenetic tree and shared Motif 36 which was absent in the other plasmodium

659 sequences (Additional file 3 & 6.11). PvCysRS, PkCysRS and PfrCysRS clustered together

660 with a 90% sequence identity and shared Motif 22 which was missing in other plasmodium

661 sequences (Additional file 3 & 6.2). Motif 27 was present only in PyCysRS and PbCysRS

662 and these two sequences showed a 90% sequence identity Additional file 3 & 6.2). PfrGlnRS,

663 PkGlnRS and PvGlnRS clustered together and Motif 34, 35 and 37 were only present in these

664 sequences (Additional file 3 & 6.3). In this family, E. coli, human and S. cerevisiae

665 sequences formed an outgroup showing they are the oldest aaRS (Additional file 6.3). GluRS

666 plasmodium sequences had above 75% sequence identity and shared all identified motifs

667 (Additional file 3 & 6.4). PbIleRS and PyIleRS shared Motif 38 and 39 and showed 95%

668 sequence identity (Additional file 3 & 6.5). Cryptosporidium and toxoplasma belong to the

669 apicomplexan family together with plasmodium and their sequences showed about 50%

670 sequence identity to plasmodium sequences in IleRS and MetRS family (Additional file 6.5

671 & 6.7). PvLeuRS, PfrLeuRS and PkLeuRS had 80% sequence identity and shared Motif 39

672 (Additional file 3 & 6.6). In TrpRS, Motif 21 and 23 were only identified in PbTrpRS and bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

673 PyTrpRS which had 90% sequence identity. In all families in Class I, human sequences

674 showed low sequence identities (below 40%) compared to the plasmodium sequences

675 (Additional file 6).

676 677 Figure 7. A) TyrRS family phylogenetic tree. Maximum Likelihood method was used to 678 infer evolutionary history using Le_Gascuel_2008 model at 95% site coverage [149]. 679 Phylogenetic tree calculations were done using MEGA7 [101]. The tree that had the highest 680 log likelihood (-2978.09) is shown. Initial tree(s) for the heuristic search were obtained by 681 using BioNJ and Neighbor-Join algorithms to a matrix of pairwise distances calculated using 682 a JTT model, and then selecting the topology with higher log likelihood value. A Gamma 683 distribution was used to calculate evolutionary rate differences among sites (5 categories (+G, 684 parameter = 0.4355)). Nine amino acid sequences were used for this analysis. There were 343 685 positions after the calculations. B) TyrRS pairwise sequence calculations. The sequence 686 identity values of the sequences in the TyrRS family is shown. The heatmap shows the 687 identity scores as a color-coded matrix for every aaRS sequence versus every aaRS sequence 688 in this family. Conservation increases from blue to red in the heat map. 689 690 691 In Class II aaRSs, ProRS family showed the highest sequence identity with plasmodium

692 sequences having above 80% sequence identity (Additional file 6). The high sequence

693 identity among plasmodium sequences was also reflected in motif identification where all the

694 sequences shared the identified motifs (Additional file 3). In Class IIa GlyRS showed the

695 least conservation with most of the sequences having less than 65% sequence identity

696 (Additional file 6.16). GlyRS family showed low conservation with sequence identity less

697 than 70% for all sequences except for PfrGlyRS, PvGlyRS and PkGlyRS which formed a

698 conservation box with a sequence identity of about 75% (Additional file 6.16). This

699 clustering was also seen in motif identification whereby PfrGlyRS, PvGlyRS and PkGlyRS

700 had Motif 24 which was absent in all other plasmodium sequences in this family. PbGlyRS

701 and PyGlyRS had a sequence identity of 90% and shared Motif 27, 30 and 34 (Additional file

702 3 & 6.16). P. falciparum ThrRS had a low sequence identity compared to other plasmodium

703 sequences and it also branched separately in the phylogenetic tree. In SerRS family, human,

704 T. brucei, C. albicans, T. gondii and C. parvum also formed a conservation box but with a bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

705 sequence identity of about 65%. P. vivax, P. fragile and P. knowlesi in this family had a high

706 sequence identity forming a conservation box and clustered together in the phylogenetic tree

707 (Additional file 6.20). PfrThrRS, PvThrRS and PkThrRS shared Motif 24, 27, 29 and 33

708 showing these sequences are closely related as depicted by trees and sequence identity

709 calculations (Additional file 3 & 6.20). In SerRS family, plasmodium sequences formed a

710 conservation box with about 75% sequence identity with each other except for P. yoelii

711 which was more identical to P. bergei with a sequence identity of 90% (Additional file 6.19).

712 In motif identification, P. yoelii shared Motif 20 and 22 which were all absent in all other

713 plasmodium sequences explaining the high sequence conservation (Additional file 3).

714 PkSerRS and PvSerRS branched together and the two shared Motif 19 showing that the

715 sequences are closely related (Additional file 3). In HisRS family, all plasmodium sequences

716 formed a conservation box showing more than 70% sequence identity to each other except for

717 PfHisRS (Additional file 6.14). This difference was also seen in motifs identified in this

718 family where Motif 21 was present in all plasmodium sequences but absent in PfHisRS

719 (Additional file 3).

720 In Class IIb, AsnRS sequences were highly conserved with above 80% sequence identity

721 while AspRS was the least conserved with about 65% sequence identity (Additional file 6.13

722 & 6.15). The high sequence conservation in AsnRS was also seen in motif discovery where

723 all plasmodium sequences shared identified motifs (Additional file 3). In AsnRS family, C.

724 ubiquitum showed a higher sequence identity to S. typhi sequence than to T. gondii which

725 belongs to the same phylum. PvAspRS and PkAspRS branched together in tree calculation

726 and these two proteins shared Motif 22, 26 and 28 showing they are closely related

727 (Additional file 3 & 6.15). In LysRS family, plasmodium sequences showed a sequence

728 identity of above 75% with PbLysRS and PyLysRS forming a conservation box with about

729 95% sequence identity (Figure 8). PfrLysRS, PkLysRS and PvLysRS also formed a bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

730 conservation box and these three proteins shared Motif 15 which was absent in other

731 plasmodium sequences (Figure 8, Additional file 3).

732 Overall, PheRS was the least conserved family in Class II with plasmodium sequences with

733 only about 50% sequence identity and this was seen during motif discovery where only a few

734 motifs were conserved across species (Additional file 3 & 6). In the AlaRS family, P.

735 falciparum (sequence 5 in the heatmap) was less conserved compared to other plasmodium

736 sequences as seen in the (Additional file 6). Plasmodium sequences in AlaRS family showed

737 a sequence identity above 70% (Additional file 6.12). In this family, P. vivax, P. fragile and

738 P. knowlesi also formed a conservation box while P. yoelii and P. bergei also formed a

739 conservation box indicating that these sequences are highly conserved compared to other

740 plasmodium sequences. PfrAlaRS, PvAlaRS and PkAlaRS shared Motif 31 which was absent

741 in all other plasmodium sequences but present in mammalian sequences (Additional file 3).

742 In all the families in Class II, human sequences in this class branched out as an out group and

743 this is supported by the low sequence identity (below 40%) shown in the conservation

744 heatmaps (Additional file 6).

745 Figure 8. A) LysRS family phylogenetic tree. Maximum Likelihood method was used to 746 infer evolutionary history using Le_Gascuel_2008 model at 90% site coverage [149]. 747 Phylogenetic tree calculations were done using MEGA7 [101]. The tree that had the highest 748 log likelihood (-6116.25) is shown. Initial tree(s) for the heuristic search were obtained by 749 using BioNJ and Neighbor-Join algorithms to a matrix of pairwise distances calculated using 750 a JTT model, and then selecting the topology with higher log likelihood value. A Gamma 751 distribution was used to calculate evolutionary rate differences among sites (5 categories (+G, 752 parameter = 0.6075)). Eleven amino acid sequences were used for this analysis. There were 753 503 positions after the calculations. B) LysRS pairwise sequence calculations. The sequence 754 identity values of the sequences in the LysRS family is shown. The heatmap shows the 755 identity scores as a color-coded matrix for every aaRS sequence versus every aaRS sequence 756 in this family. Conservation increases from blue to red in the heat map. 757 bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

758 PART 2 – STRUCTURAL ANALYSES

759 Accurate 3D protein models are calculated for Class I and Class II aaRSs

760 In the PDB, there are only four Class I (ArgRS, MetRS, TrpRS, TyrRS) and two Class II

761 (LysRS and ProRS) structures that were available with reasonable quality. As a first step,

762 each of these crystal structures was remodelled to eliminate the missing residues, except

763 PfTyrRS, as this structure does not have missing residues. It was previously shown that

764 homology modelling with a very high sequence template identity (or remodelling itself) does

765 not introduce modelling errors [150]. As a next step, these models were used to model the 3D

766 structures of the homologues (see Additional file 2 for further information).

767 For each protein, 100 homology models were calculated, and the three best models selected

768 based on z-DOPE scores. DOPE score is an atomic statistical potential which depends on a

769 native protein structure [151]. It is highly accurate in assessment of the quality of protein

770 models as it accounts for the spherical and finite shape of the protein native structure [151–

771 153]. It depends on the number of atom pairs considered and thus the number of all possible

772 pairs of heavy atoms in the protein are normalized to get the z-DOPE score [151,154].

773 Models with lowest z-DOPE were selected and model quality assessment was done using

774 Verify 3D [96], ProSA [95] and QMEAN [97] webservers. Verify 3D assesses the

775 compatibility of the 3D structure with the amino acid sequence (1D) and assigns a class to the

776 structure based on the local environment, location and secondary structure and compares this

777 to known native structures [96]. At least 80% of the amino acid residues should have a score

778 greater than or equal to 0.2 in the 3D/1D profile for the structure to be considered of good

779 quality. ProSA-web is a tool for checking errors in a 3D model and displays the quality score

780 as graphical presentation. Areas of the model that are not accurate are identified by a plot of

781 local quality scores which are then mapped on the 3D structure using colour codes [95]. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

782 QMEAN score describes the major geometrical aspects of protein models using five

783 structural descriptors. The overall status of residues is described by a solvation potential,

784 long-range interactions are assessed by secondary structure-specific pairwise residue-level

785 potential that is dependent on distance and a torsion angle potential is used to determine the

786 local geometry which is calculated over three consecutive residues [97]. Descriptors of

787 solvent accessibility and the agreement between calculated and predicted structures are also

788 used in calculating the score [97]. All the calculated models passed the quality evaluation

789 tests from these three tools (Additional file 2).

790 The models for the plasmodium ArgRS were built using 5JLD [89] as a template while 4ZAJ

791 was used for the human homologue. The ArgRS models consist of the N-terminal, catalytic

792 domain and the anticodon binding domain. All the models for MetRS, which included the

793 catalytic domain and the anticodon binding domain, were calculated using 4DLP [90].

794 Plasmodium TrpRS models were built with 4J75 [155] while 1R6T [156] was used for

795 HsTrpRS. It was possible to model the N-terminal, catalytic and anticodon binding domains

796 for this family. The crystal structure 5USF [92] was used for the calculation of plasmodium

797 TyrRS while 1Q11 [156] consisting only the catalytic and anticodon binding domain was

798 used to model the HsTyrRS. The catalytic and anticodon binding domains of LysRS were

799 built using 4DPG [94] as the starting structure while 4NCX [60] was used for building ProRS

800 models which included a zinc-binding like domain at the C-terminal.

801 The 3D models were, then, used for mapping identified motifs to structures as well as for the

802 search of alternate druggable sites in P. falciparum homologues.

803 Motif mapping to homology models

804 Out of all identified motifs (Additional file 3), the motifs of the six families with structures

805 were mapped into the 3D structures (Figure 9, Figure 10 and Additional file 5). The start and bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

806 end residues for motifs identified in the six families are shown for P. falciparum and the

807 human homologues (Table 3). In ArgRS family, motifs were conserved in all analysed

808 structures except Motif 16 which was present only in the plasmodium sequences but absent in

809 in HsArgRS (Additional file 5A, Figure 9). HsArgRS N-terminal had Motifs 10 and 11 which

810 were absent in plasmodium structures. Motif 13 was not positionally conserved in the

811 analysed structures. In plasmodium it occurs in the anticodon binding domain and the N-

812 terminal while in HsArgRS it occurs in catalytic and the anticodon binding domains

813 (Additional file 5A). In HsMetRS, Motif 5 was in the anticodon binding domain while in

814 plasmodium structures this motif was mapped to the catalytic site. The motif occurs in an

815 alpha helix region in HsMetRS while in PfMetRS the site consists of beta sheets. Motif 14

816 occurring in the catalytic site and a loop region in PfMetRS was missing in HsMetRS

817 structure (Figure 9). Motif 10 was present in HsMetRS anticodon binding domain but absent

818 in plasmodium. Other motifs in this family were conserved across all analysed structures. In

819 TrpRS, Motif 7 was only present in HsTrpRS but absent in all plasmodium structures. Motif

820 8, 9 and 10 were present only in PyTrpRS (Additional file 5C). Motif 1 and Motif 4 were

821 mapped at the catalytic domain in all structures except in PyTrpRS where they are in the

822 anticodon binding domain (Additional file 5C). Motif 2 was present at the catalytic domain of

823 all the TrpRS homology model structures but absent in PyTrpRS. In TyrRS family, Motif 14

824 was conserved in PfTyrRS, PkTyrRS, PmTyrRS, PvTyrRS and PyTyrRS while Motif 12 was

825 only present in human (Figure 9 & Additional file 5D).

826 In Class II, in LysRS, Motif 9 occurs at the catalytic domain in a region consisting of alpha

827 helices and loops in all structures except PfrLysRS and PmLysRS where it mapped in a

828 region consisting of both beta strands and alpha helices. PfrLysRS and PmLysRS did not

829 have Motif 4 present in the anticodon binding domain of all other structures. Motif 8 mapped

830 in a region consisting of alpha helices in all structures except in PfrLysRS and PmLysRS bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

831 where the region consisted of beta sheets and alpha helices (Additional file 5E). Mapped

832 motif in ProRS were conserved in all analysed secondary structures (Figure 10 & Additional

833 file 5F).

834 Figure 9: Class I motifs mapped to the homology models of P. falciparum and the 835 human homologue. Motifs are numbered according to the MEME results. 836

837 Figure 10: Class II motifs mapped to the homology models of P. falciparum and the 838 human homologues. Motifs are numbered according to the MEME results.

839 bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

840 Table 3: Starting and ending positions of motifs identified in PfArgRS, PfMetRS, PfTrpRS, PfTyrRS, PfLysRS and PfProRS as well as the 841 human homologues. Dashes show where the motif was not present.

Motif 1 Motif 2 Motif 3 Motif 4 Motif 5 Motif 6 Motif 7 Motif 8 Motif 9 Motif 10 Motif 11 Motif 12 Motif 13 Motif 14 Motif 15 Motif 16 Motif 17 PfArgRS 135-184 441-490 308-357 506-555 364-413 30-64 230-279 70-119 186-226 ------15-29 558-578 420-434 286-306 ---- HsArgRS 198-247 499-548 371-420 568-617 428-477 101-135 293-342 142-191 249-289 50-99 620-660 8-48 346-360 478-498 549-563 ---- 363-370 PfMetRS 243-292 519-559 ---- 439-459 331-360 ------193-242 405-433 ------562-611 ---- 468--517 301-329 ---- 657-706 HsMetRS 285-334 598-638 ---- 510-530 711-740 ---- 460-509 234-283 35-63 ------104-153 159-208 ---- 343-371 839-888 ---- PfTrpRS 301-350 447-496 512-561 249-298 392-441 562-611 ---- 74-123 ---- 124-173 1-41 203-243 ---- 43-71 354-368 612-626 174-199 HsTrpRS 154-203 304-353 354-403 102-151 247-296 404-453 48-97 ------222-242 ---- 6-46 ---- 206-220 454-468 ---- PfTyrRS 172-221 60-88 265-314 320-360 91-108 ---- 116-144 ------223-251 ------30-50 149-169 254-264 362-469 51-58 HsTyrRS 150-199 39-67 240-289 295-335 69-86 488-528 97-125 395-444 339-388 200-228 447-487 1-29 129-149 ---- 229-239 89-96 30-37 PfLysRS 304-353 465-514 532-581 177-226 254-303 121-170 360-409 70-119 414-463 228-248 ---- 515-529 40-60 ------354-359 ---- HsLysRS 297-346 459-508 526-575 169-218 247-296 114-163 353-402 63-112 409-458 221-241 ---- 509-523 577-597 ------347-352 1-15 PfProRS 362-411 268-317 662-711 502-551 570-606 447-496 319-359 ------718-746 ------418-446 615-648 ---- 109-149 HsProRS 1124- 1030- 332-381 1266- 1338- 1209- 1081- 447-496 666-715 392-441 1484- 191-240 257-306 1180- 1383- 1443- 497-537 1173 1079 1315 1374 1258 1121 1512 1208 1416 1483 bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

842 New potential druggable sites in P. falciparum aaRSs are identified

843 FTMap provides information on binding hot spots and the druggability of these sites using

844 probes from fragment libraries [107]. These fragment hits can be used in identification of hits

845 from larger ligands. On the other hand, SiteMap predicts possible binding sites using an

846 algorithm that assigns site points using geometric and energetic properties [105,106]. The site

847 points are then grouped to give sites which are ranked based on a SiteScore computed based

848 on size, hydrophobicity, exposure to the solvent and the ease of donating or accepting

849 hydrogens. Both FTMap and SiteMap showed consistency in prediction of probable binding

850 sites. In all the six modelled proteins, FTMap and SiteMap were able to predict the known

851 active sites which consists of the ATP and amino acid binding sites as the highest ranked site

852 (Figure 11 & 2). Alternative sites were also predicted in PfArgRS, PfMetRS, PfProRS, and

853 HsProRS that can be targeted for design of new drug classes using both FTMap and SiteMap

854 (Figure 11, 12 & 13). Since two tools show consistency in prediction of possible binding

855 sites, we only discuss the results from FTMap in this study.

856 The identified potential druggable site in PfArgRS is in a region located at the anticodon

857 binding domain characterized by Motif 4 and 6 but the site is not present in HsArgRS (Figure

858 9 & 11, Table 3). Probes at this site interact with residues in the ABD – His515, Lys518,

859 Ile522, Lys534, Glu537, Asp541 and Tyr34 located in the N-terminal domain. Motif results

860 showed low conservation of these residues with His515 corresponding to Cys577, Lys518 to

861 Arg580, Ile522 to Ile584, Lys534 to Thr592, Glu537 to Asp595, Asp541 to Glu599 and

862 Tyr34 to Ser105 in the human homologue (Figure 11D). These residues are, however, highly

863 conserved in the other plasmodium sequences studied (Additional file 4A). This region can

864 thus be potentially targeted for inhibitor design with high selectivity to the plasmodium

865 protein as indicated by the low conservation in the human homologue. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

866 Figure 11: Homology models of PfArgRS and HsArgRS and prediction of potential 867 ligand binding sites. The catalytic domain of the models is shown in cyan, the anticodon 868 binding domain (ABD) in grey and the N-terminal domains of PfArgRS and HsArgRS are 869 shown in a light orange colour. The HIGH and KMSKS motifs highly conserved in Class I 870 are shown in red and yellow respectively. Known druggable sites are shown by the red dotted 871 ellipses while the predicted site in PfArgRS by FTMap is shown in purple dotted ellipses. A) 872 PfArgRS homology model. B) Insert – zoomed view of the predicted druggable site in 873 PfArgRS with the residues interacting with probes represented as magenta sticks. C) 874 HsArgRS homology model. No probable druggable sites were predicted in human 875 homologue. D) Motif 4 and 6 logos showing conservation of residues in this family and 876 PfArgRS residues interacting with probes at the predicted site.

877

878 The predicted hotspot in PfMetRS is in a pocket formed by Motifs 5, 9, 14, 20 and the loop

879 region of Motif 4 (Figure 10 & 12). Motif 5 is present in HsMetRS, but this motif occurs in

880 the anticodon binding domain while Motif 14 is not present in HsMetRS (Figure 10 & 12).

881 HsMetRS, however has a Motif 7 present in this site which is absent in the PfMetRS. Probes

882 at the PfMetRS predicted site were interacting with residues Trp481, Ala421, Asp422,

883 Arg415, Pro419, Met385, Leu420, Leu423 and Tyr353. Tyr353, Leu420, Asp422 and Ala421

884 located in Motif 4 and 9 corresponds to Ala733, Val50, Gln52 and Leu51 respectively in the

885 human homologue. The low conservation of residues in these two motifs may explain why

886 the probes only docked to PfMetRS and not HsMetRS. This difference in conservation at

887 residue level in the predicted site can thus be targeted for the potential development of drugs

888 of that bind selectively to PfMetRS. A study by Hussain et al [19] reported an auxiliary

889 binding site different from ATP and methionine binding sites in PfMetRS. Inhibitors at this

890 site interacted with residues Phe482, Ile231, His483, Tyr454, Trp447, Ile479 and Leu451

891 [19]. These residues map to Motif 4 and 14 located at the predicted site by FTMap in

892 PfMetRS homology model (Table 3). An auxiliary binding pocket has also been reported in

893 Trypanosoma brucei MetRS [157].

894 bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

895 Figure 12: Homology models of PfMetRS and HsMetRS and prediction of potential 896 ligand binding sites. The catalytic domain of the models is shown in cyan and the anticodon 897 binding domain (ABD) in grey. The HIGH and KMSKS motifs are shown in red and yellow 898 respectively. A) PfMetRS homology model. Known druggable sites are shown by the red 899 dotted ellipses while the predicted site in PfMetRS by FTMap is shown in purple dotted 900 ellipses. B) Insert – zoomed view of the predicted druggable pocket in PfMetRS showing 901 stick representation (magenta) of residues interacting with probes at this site. The predicted 902 site is located at the catalytic domain C) HsMetRS homology model. HsMetRS had no 903 probable druggable sites predicted by FTMap. D) Motif 5, 9 and 20 logos showing 904 conservation of residues in this family and PfMetRS residues interacting with probes 905 predicted site.

906

907 The identified potentially druggable site in PfProRS occurs at a region characterised by Motif

908 1, 5 and 11 which are also present in HsProRS (Figure 9 & 13). In PfProRS, residues Tyr746,

909 Thr397, Phe262, Arg401 and Lys394 were interacting with probes docked at this site while in

910 human, Thr1164, Phe1167, Thr1277, Leu1162, Arg1278 and Thr1276 were interacting with

911 the probes. All residues implicated in the interaction of probes in PfProRS were conserved in

912 all the studied sequences in this family except Thr397 which corresponds to Gln1159 in the

913 human homologue (Figure 13D & Additional file 4R). A previous study by Hewitt et al [60]

914 reported selective binding of glyburide and TCMDC-124506 at the PfProRS predicted site.

915 This pocket is located at a region formed by α5 (residues 513-524), α9 (residues 261-272)

916 and β-hairpin 1 and 2 (residues 276-287). We observed interaction of Phe262 to the FTMap

917 probes and Tyr746 (Figure 13) which is also reported to interact with glyburide and TCMDC-

918 124506 [60]. Inhibition of PfProRS by the two compounds is known to be through distortion

919 of the ATP binding site [60]. Binding of glyburide and TCMDC-124506 causes movement of

920 a loop between Val389 and Glu404 displacing Phe405, Arg401 and Arg390 which are key

921 residues in ATP binding [60]. The unique predicted sites in PfArgRS, PfMetRS and PfProRS

922 can thus be targeted through high throughput screening to identify new inhibitors.

923 Figure 13: Build homology models of ProRS and prediction of potential ligand binding 924 sites. The catalytic domain is shown in cyan, anticodon binding domain in grey and the C- bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

925 terminal zinc-binding like domain is shown in light pink colour. Motif 2 located at the 926 catalytic domain is shown in yellow. Known druggable sites are shown by the red dotted 927 ellipses while other predicted sites by FTMap are shown in purple dotted ellipses. A) The 928 homology model of PfProRS. B) Insert – zoomed view of the predicted site in PfProRS 929 showing residues interacting with probes at this site as magenta sticks. These probes were 930 interacting with residues – Tyr746, Thr397, Phe262, Arg401 and Lys394. C) The homology 931 model of HsLysRS showing a probable druggable site with residues Thr1164, Phe1161, 932 Thr1277, Leu1162, Thr1276 and Arg1278 interacting with probes. D) Motif 1 and 11 logos 933 showing conservation of residues in these motifs and PfProRS residues interacting with 934 probes at the predicted site. 935

936 Conclusion

937 Resistance and selectivity remain a challenge when designing anti-parasitic drugs. This study

938 aimed at getting insights on the differences at sequence and structure level between

939 plasmodium and human aaRS. Motif analysis of the two aaRSs classes showed family

940 specific motifs. Further, analysis of motifs for each family showed plasmodium specific and

941 also mammalian specific motifs. Multiple sequence alignments and motif analysis of aaRS

942 families showed high conservation of the core domains while N- and C- termini of most

943 families showed low conservation. Interestingly, the core domain of LeuRS sequences

944 showed low conservation despite functional conservation. ArgRS sequence alignment

945 showed mammalian specific inserts at the N- and C-termini while mammalian TyrRS and

946 ValRS had N-terminal extension not present in plasmodium sequences. Inhibitors can be

947 designed to target the highly variable ABD located either at the N-terminal or the C-terminal.

948 On doing pairwise sequence identity calculations, ProRS was the most conserved aaRS

949 family while GlyRS was the least conserved. Phylogenetic studies showed that human

950 proteins had different evolutionary history to plasmodium proteins with plasmodium

951 sequences clustering together. Plasmodium sequences also showed high sequence identity

952 compared to the human homologues which had below 40% sequence identity. P. yoelii and P.

953 bergei were seen to cluster in trees in most of the aaRS families showing that these proteins bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

954 are closely related, and this was also depicted by the high sequence identity and shared motifs

955 among them. P. fragile, P. knowlesi and P. vivax aaRSs were also seen to share evolutionary

956 history and had high sequence identity. Prediction of additional druggable sites identified hot

957 spots in PfArgRS, PfMetRS and PfProRS. The identified sites showed low conservation and

958 variation of identified motifs between P. falciparum proteins and the human homologues.

959 The identified sites can thus be targeted to develop drugs that only selectively bind to

960 plasmodium proteins. As per the results in this study, it is evident that despite structural

961 conservation, plasmodium aaRS have key features that differentiate them from human

962 proteins. These differences can be targeted to develop antimalarial drugs with less toxicity to

963 the host.

964 bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

965 Author contributions

966 Ö.T.B designed the study. D.W.N acquired the data, performed data analysis and wrote the

967 initial draft. All authors contributed in interpretation and discussion of results and writing of

968 the manuscript.

969 Acknowledgements

970 This work is supported by the National Research Foundation (NRF) South Africa (Grant

971 Number 105267). The content of this publication is solely the responsibility of the authors

972 and does not necessarily represent the official views of the funders. D.W.N thanks Rhodes-

973 Henderson Bursary and National Research Foundation South Africa for financial support.

974 D.W.N thanks Vuyani Moses for his guidance in performing the calculations and data

975 analysis.

976

977

978 bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

979 Additional files

980 Additional file 1

981 A table showing the data set used in the study with Blast details and crystal structures

982 retrieved from the Protein Data Bank. The species, E-value, identity, accession number, PDB

983 ID and sequence lengths are given.

984 Additional file 2

985 Homology model validation results obtained for Verify 3D, QMEAN and ProSA webservers.

986 The z-DOPE scores for each model and the templates used for modelling are also shown.

987 Additional file 3

988 Motifs discovered for the 20 aminoacyl tRNA synthetase families using MEME software.

989 The default motif width of 6-50 residues was used. The Mast tool was used to identify

990 overlapping motifs. The number of motifs run for each family varied and motif conservation

991 was presented as number of sites divided by total number of class sequences and results

992 displayed as heatmaps. Motif conservation increases from blue to red.

993 Additional file 4

994 Results on mapping of discovered motifs on multiple sequence alignments for the 20 aaRS

995 families. Multiple sequence alignment was performed using TCOFFEE software with default

996 parameters.

997 Additional file 5

998 Mapping of unique motifs to homology models in plasmodium ArgRS, MetRS, TrpRS,

999 TyrRS, LysRS and ProRS families and the respective human homologues. Motif numbering

1000 for each protein is based on the MEME results.

1001 Additional file 6

1002 Phylogenetic trees and pairwise sequence calculations for aaRS families: Molecular

1003 Phylogenetic calculations were performed using MEGA7. Sequence identity calculations bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1004 were done using an in-house python script and results displayed as heatmaps. Conservation

1005 increases from blue to red.

1006 bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1007 References

1008 1. Brooker S, Akhwale W, Pullan R, Estambale B, Clarke SE, Snow RW, et al. Epidemiology

1009 of Plasmodium-helminth co-infection in Africa: Populations at risk, potential impact on

1010 anemia, and prospects for combining control. Am J Trop Med Hyg. 2007;77:88–98.

1011 2. Fevre EM, Wissmann B V, Welburn SC, Lutumba P. The burden of human African

1012 Trypanosomiasis. PLoS Negl Trop Dis [Internet]. 2008;2. Available from:

1013 http://ovidsp.ovid.com/ovidweb.cgi?T=JS&CSC=Y&NEWS=N&PAGE=fulltext&D=emed8

1014 &AN=2009223996

1015 3. Gething PW, Patil AP, Smith DL, Guerra C a., Elyazar IRF, Johnston GL, et al. A new

1016 world malaria map: Plasmodium falciparum endemicity in 2010. Malar J [Internet].

1017 2011;10:378. Available from: http://eprints.soton.ac.uk/344413/

1018 4. Boatin BA, Basáñez MG, Prichard RK, Awadzi K, Barakat RM, García HH, et al. A

1019 research agenda for helminth diseases of humans: Towards control and elimination. PLoS

1020 Negl. Trop. Dis. 2012.

1021 5. Prichard RK, Basanez MG, Boatin BA, McCarthy JS, Garcia HH, Yang GJ, et al. PLOS

1022 NEGLECTED TROPICAL DISEASES. A Res. Agenda Helminth Dis. Humans Interv.

1023 Control Elimin. 2012. p. 14.

1024 6. Baird JK, J.K. B. Drug therapy: effectiveness of antimalarial drugs. N Engl J Med

1025 [Internet]. 2005;352:1565. Available from:

1026 http://ovidsp.ovid.com/ovidweb.cgi?T=JS&PAGE=reference&D=emed7&NEWS=N&AN=2

1027 005172567%5Cnhttp://content.nejm.org/

1028 7. Nayyar GML, Breman JG, Newton PN, Herrington J. Poor-quality antimalarial drugs in

1029 southeast Asia and sub-Saharan Africa. Lancet Infect. Dis. 2012. p. 488–96. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1030 8. Rajendran V, Kalita P, Shukla H, Kumar A, Tripathi T. Aminoacyl-tRNA synthetases:

1031 Structure, function, and drug discovery. Int J Biol Macromol [Internet]. Elsevier B.V.;

1032 2018;111:400–14. Available from: https://doi.org/10.1016/j.ijbiomac.2017.12.157

1033 9. Fang P, Guo M. Evolutionary Limitation and Opportunities for Developing tRNA

1034 Synthetase Inhibitors with 5-Binding-Mode Classification. Life [Internet]. 2015;5:1703–25.

1035 Available from: http://www.mdpi.com/2075-1729/5/4/1703

1036 10. Manickam Y, Chaturvedi R, Babbar P, Malhotra N, Jain V, Sharma A. Drug targeting of

1037 one or more aminoacyl-tRNA synthetase in the malaria parasite Plasmodium falciparum.

1038 Drug Discov Today [Internet]. Elsevier Current Trends; 2018 [cited 2018 Jun 12]; Available

1039 from: https://www.sciencedirect.com/science/article/pii/S135964461730538X

1040 11. World Health Organization W, WHO, World Health Organization W. World Malaria

1041 Report 2015. World Health. 2015;1–280.

1042 12. Antinori S, Galimberti L, Milazzo L, Corbellino M. Biology of human malaria plasmodia

1043 including Plasmodium knowlesi. Mediterr J Hematol Infect Dis. 2012;4.

1044 13. Jackson KE, Habib S, Frugier M, Hoen R, Khan S, Pham JS, et al. Protein translation in

1045 Plasmodium parasites. Trends Parasitol. 2011. p. 467–76.

1046 14. Pham JS, Dawson KL, Jackson KE, Lim EE, Pasaje CFA, Turner KEC, et al. Aminoacyl-

1047 tRNA synthetases as drug targets in eukaryotic parasites. Int. J. Parasitol. Drugs Drug Resist.

1048 2014. p. 1–13.

1049 15. Khan S, Sharma A, Jamwal A, Sharma V, Pole AK, Thakur KK, et al. Uneven spread of

1050 cis- and trans-editing aminoacyl-tRNA synthetase domains within translational compartments

1051 of P. falciparum. Sci Rep [Internet]. 2011;1:188. Available from:

1052 http://www.nature.com/articles/srep00188 bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1053 16. Pino P, Aeby E, Foth BJ, Sheiner L, Soldati T, Schneider A, et al. Mitochondrial

1054 translation in absence of local tRNA aminoacylation and methionyl tRNAMet formylation in

1055 Apicomplexa. Mol Microbiol. 2010;76:706–18.

1056 17. Jackson KE, Pham JS, Kwek M, De Silva NS, Allen SM, Goodman CD, et al. Dual

1057 targeting of aminoacyl-tRNA synthetases to the apicoplast and cytosol in Plasmodium

1058 falciparum. Int J Parasitol [Internet]. Australian Society for Parasitology Inc.; 2012;42:177–

1059 86. Available from: http://dx.doi.org/10.1016/j.ijpara.2011.11.008

1060 18. Ibba M, Söll D. Aminoacyl-tRNA Synthesis. Annu Rev Biochem [Internet].

1061 2000;69:617–50. Available from:

1062 http://www.annualreviews.org/doi/10.1146/annurev.biochem.69.1.617

1063 19. Hussain T, Yogavel M, Sharma A. Inhibition of protein synthesis and malaria parasite

1064 development by drug targeting of methionyl-tRNA synthetases. Antimicrob Agents

1065 Chemother. 2015;59:1856–67.

1066 20. Park SG, Schimmel P, Kim S. Aminoacyl tRNA synthetases and their connections to

1067 disease. Proc Natl Acad Sci U S A [Internet]. 2008;105:11043–9. Available from:

1068 http://europepmc.org/articles/PMC2516211/?report=abstract

1069 21. Guo M, Schimmel P, Yang XL. Functional expansion of human tRNA synthetases

1070 achieved by structural inventions. FEBS Lett. 2010. p. 434–42.

1071 22. Paul M, Schimmel P. Essential nontranslational functions of tRNA synthetases. Nat

1072 Chem Biol [Internet]. 2013;9:145–53. Available from:

1073 http://www.nature.com/doifinder/10.1038/nchembio.1158

1074 23. Bhatt TK, Khan S, Dwivedi VP, Banday MM, Sharma A, Chandele A, et al. Malaria

1075 parasite tyrosyl-tRNA synthetase secretion triggers pro-inflammatory responses. Nat bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1076 Commun [Internet]. 2011;2:530. Available from:

1077 http://www.nature.com/doifinder/10.1038/ncomms1522

1078 24. Wolf YI, Aravind L, Grishin N V., Koonin E V. of Aminoacyl-tRNA

1079 synthetases-analysis of unique domain architectures and phylogenetic trees reveals a complex

1080 history of horizontal gene transfer events. Genome Res. 1999;9:689–710.

1081 25. Sharma A, Khan S, Sharma A, Belrhali H, Yogavel M. Structural basis of malaria

1082 parasite lysyl-tRNA synthetase inhibition by cladosporin. J Struct Funct Genomics.

1083 2014;15:63–71.

1084 26. Sherma A, Yogavel M, Sharma A. Structural and functional attributes of malaria parasite

1085 diadenosine tetraphosphate . Sci Rep [Internet]. Nature Publishing Group;

1086 2016;6:1–12. Available from: http://dx.doi.org/10.1038/srep19981

1087 27. Miller LH, Baruch DI, Marsh K, Doumbo OK. The pathogenic basis of malaria. Nature.

1088 2002;415:673–9.

1089 28. Sharma A, Sharma A. Plasmodium falciparum mitochondria import tRNAs along with an

1090 active phenylalanyl-tRNA synthetase. Biochem J [Internet]. 2015;465:459–69. Available

1091 from: http://www.biochemj.org/bj/465/bj4650459.htm

1092 29. Jackson KE, Pham JS, Kwek M, De Silva NS, Allen SM, Goodman CD, et al. Dual

1093 targeting of aminoacyl-tRNA synthetases to the apicoplast and cytosol in Plasmodium

1094 falciparum. Int J Parasitol [Internet]. Pergamon; 2012 [cited 2018 Jun 12];42:177–86.

1095 Available from:

1096 https://www.sciencedirect.com/science/article/pii/S0020751911002992?via%3Dihub

1097 30. Pham JS, Sakaguchi R, Yeoh LM, De Silva NS, McFadden GI, Hou Y-M, et al. A dual-

1098 targeted aminoacyl-tRNA synthetase in Plasmodium falciparum charges cytosolic and bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1099 apicoplast tRNACys. Biochem J [Internet]. 2014;458:513–23. Available from:

1100 http://www.ncbi.nlm.nih.gov/pubmed/24428730

1101 31. Manickam Y, Chaturvedi R, Babbar P, Malhotra N, Jain V, Sharma A. Drug targeting of

1102 one or more aminoacyl-tRNA synthetase in the malaria parasite Plasmodium falciparum.

1103 Drug Discov Today [Internet]. Elsevier Ltd; 2018;0:26–33. Available from:

1104 http://dx.doi.org/10.1016/j.drudis.2018.01.050

1105 32. Bonnefond L, Fender A, Rudinger-Thirion J, Giegé R, Florentz C, Sissler M. Toward the

1106 full set of human mitochondrial aminoacyl-tRNA synthetases: Characterization of AspRS and

1107 TyrRS. Biochemistry. 2005;44:4805–16.

1108 33. Antonellis A, Green ED. The Role of Aminoacyl-tRNA Synthetases in Genetic Diseases.

1109 Annu Rev Genomics Hum Genet [Internet]. 2008;9:87–107. Available from:

1110 http://www.annualreviews.org/doi/10.1146/annurev.genom.9.081307.164204

1111 34. Bhatt TK, Kapil C, Khan S, Jairajpuri MA, Sharma V, Santoni D, et al. A genomic

1112 glimpse of aminoacyl-tRNA synthetases in malaria parasite Plasmodium falciparum. BMC

1113 Genomics. 2009;10:644.

1114 35. Eriani G, Cavarelli J, Martin F, Ador L, Rees B, Thierry JC, et al. The class II aminoacyl-

1115 tRNA synthetases and their active site: Evolutionary conservation of an ATP binding site. J

1116 Mol Evol. 1995;40:499–508.

1117 36. Chaliotis A, Vlastaridis P, Mossialos D, Ibba M, Becker HD, Stathopoulos C, et al. The

1118 complex evolutionary history of aminoacyl-tRNA synthetases. Nucleic Acids Res [Internet].

1119 2016;45:gkw1182. Available from: http://www.ncbi.nlm.nih.gov/pubmed/27903912

1120 37. Chen JF, Guo NN, Li T, Wang ED, Wang YL. CP1 domain in Escherichia coli leucyl-

1121 tRNA synthetase is crucial for its editing function. Biochemistry. 2000;39:6726–31. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1122 38. Doublié S, Bricogne G, Gilmore C, Carter, CW. Tryptophanyl-tRNA synthetase crystal

1123 structure reveals an unexpected homology to tyrosyl-tRNA synthetase. Structure. 1995;3:17–

1124 31.

1125 39. Yadavalli SS, Ibba M. Quality control in aminoacyl-tRNA synthesis: Its role in

1126 translational fidelity. Adv. Protein Chem. Struct. Biol. 2012. p. 1–43.

1127 40. Perona JJ, Hadd A. Structural diversity and protein engineering of the aminoacyl-tRNA

1128 Synthetases. Biochemistry. 2012;51:8705–29.

1129 41. Cusack S. Aminoacyl-tRNA synthetases Stephen Cusack. Curr Opin Struct Biol

1130 [Internet]. 1997;7:881–9. Available from: http://biomednet.com/elecref/O959440X00700881

1131 42. Perona JJ, Gruic-Sovulj I. Synthetic and editing mechanisms of aminoacyl-tRNA

1132 synthetases. Top. Curr. Chem. 2014. p. 1–41.

1133 43. Ibba M, Losey HC, Kawarabayasi Y, Kikuchi H, Bunjun S, Söll D. Substrate recognition

1134 by class I lysyl-tRNA synthetases: a molecular basis for gene displacement. Proc Natl Acad

1135 Sci U S A. 1999;96:418–23.

1136 44. Bennett EJ, Shaler T a, Woodman B, Ryu K-Y, Zaitseva TS, Becker CH, et al. Anticodon

1137 recognition and discrimination by the alpha-helix cage domain of class I lysyl-tRNA

1138 synthetase. J Biol Chem [Internet]. 2007;282:11033–8. Available from:

1139 http://www.ncbi.nlm.nih.gov/pubmed/17540772%5Cnhttp://www.pubmedcentral.nih.gov/arti

1140 clerender.fcgi?artid=2816034&tool=pmcentrez&rendertype=abstract%5Cnhttp://www.ncbi.n

1141 lm.nih.gov/pubmed/17567576%5Cnhttp://www.ncbi.nlm.nih.gov/pubmed/17581591%5Cnhtt

1142 p://www

1143 45. Pouplana LR, Buechter DD, Davis MW, Schimmel P. Idiographic representation of

1144 conserved domain of a class II tRNA synthetase of unknown structure. Protein Sci. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1145 1993;2:2259–62.

1146 46. Smith TF, Hartman H. The evolution of Class II Aminoacyl-tRNA synthetases and the

1147 first code. FEBS Lett [Internet]. Federation of European Biochemical Societies;

1148 2015;589:3499–507. Available from: http://dx.doi.org/10.1016/j.febslet.2015.10.006

1149 47. Normanly J, Ollick T, Abelson J. Eight base changes are sufficient to convert a leucine-

1150 inserting tRNA into a serine-inserting tRNA. Proc Natl Acad Sci U S A [Internet].

1151 1992;89:5680–4. Available from:

1152 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=49356&tool=pmcentrez&rendert

1153 ype=abstract

1154 48. Yao P, Fox PL. Aminoacyl-tRNA synthetases in medicine and disease. EMBO Mol. Med.

1155 2013. p. 332–43.

1156 49. Martinez-Rodriguez L, Erdogan O, Jimenez-Rodriguez M, Gonzalez-Rivera K, Williams

1157 T, Li L, et al. Functional class I and II amino acid-activating enzymes can be coded by

1158 opposite strands of the same gene. J Biol Chem. 2015;290:19710–25.

1159 50. Ruff M, Krishnaswamy S, Boeglin M, Poterszman a, Mitschler a, Podjarny a, et al.

1160 Class II aminoacyl transfer RNA synthetases: crystal structure of yeast aspartyl-tRNA

1161 synthetase complexed with tRNA(Asp). Science. 1991;252:1682–9.

1162 51. Bhatt T, Kapil C, Khan S, Jairajpuri M, Sharma V, Santoni D, et al. A genomic glimpse

1163 of aminoacyl-tRNA synthetases in malaria parasite Plasmodium falciparum. BMC Genomics

1164 [Internet]. 2009;10:644. Available from:

1165 http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-10-644

1166 52. Ibba M, Bono JL, Rosa P a, Söll D. Archaeal-type lysyl-tRNA synthetase in the Lyme

1167 disease spirochete Borrelia burgdorferi. Proc Natl Acad Sci U S A. 1997;94:14383–8. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1168 53. Bhatt TK, Khan S, Dwivedi VP, Banday MM, Sharma A, Chandele A, et al. Malaria

1169 parasite tyrosyl-tRNA synthetase secretion triggers pro-inflammatory responses. Nat

1170 Commun [Internet]. 2011;2:530. Available from:

1171 http://www.ncbi.nlm.nih.gov/pubmed/22068597

1172 54. Khan S, Garg A, Camacho N, Van Rooyen J, Kumar Pole A, Belrhali H, et al. Structural

1173 analysis of malaria-parasite lysyl-tRNA synthetase provides a platform for drug development.

1174 Acta Crystallogr Sect D Biol Crystallogr [Internet]. 2013 [cited 2017 May 22];69:785–95.

1175 Available from: http://scripts.iucr.org/cgi-bin/paper?S0907444913001923

1176 55. Saint-Léger A, Sinadinos C, Ribas de Pouplana L. The growing pipeline of natural

1177 aminoacyl-tRNA synthetase inhibitors for malaria treatment. Bioengineered. 2016;7:60–4.

1178 56. Ebere, Palencia A, Guo D, Ahyong V, Dong C, Li X, et al. Antimalarial Benzoxaboroles

1179 Target. 2016;60:4886–95.

1180 57. Jain V, Kikuchi H, Oshima Y, Sharma A, Yogavel M. Structural and functional analysis

1181 of the anti-malarial drug target prolyl-tRNA synthetase. J Struct Funct Genomics.

1182 2014;15:181–90.

1183 58. Keller TL, Zocco D, Sundrud MS, Hendrick M, Edenius M, Yum J, et al. Halofuginone

1184 and other febrifugine derivatives inhibit prolyl- tRNA synthetase. 2012;8:311–7.

1185 59. Hoepfner D, McNamara CW, Lim CS, Studer C, Riedl R, Aust T, et al. Selective and

1186 specific inhibition of the plasmodium falciparum lysyl-tRNA synthetase by the fungal

1187 secondary metabolite cladosporin. Cell Host Microbe [Internet]. Elsevier Inc.; 2012;11:654–

1188 63. Available from: http://dx.doi.org/10.1016/j.chom.2012.04.015

1189 60. Hewitt SN, Dranow DM, Horst BG, Abendroth JA, Forte B, Hallyburton I, et al.

1190 Biochemical and Structural Characterization of Selective Allosteric Inhibitors of the bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1191 Plasmodium falciparum Drug Target, Prolyl-tRNA-synthetase. ACS Infect Dis [Internet].

1192 2017 [cited 2017 May 22];3:34–44. Available from:

1193 http://pubs.acs.org/doi/abs/10.1021/acsinfecdis.6b00078

1194 61. Sonoiki E, Palencia A, Guo D, Ahyong V, Dong C, Li X, et al. Antimalarial

1195 Benzoxaboroles Target Plasmodium falciparum Leucyl-tRNA Synthetase. Antimicrob

1196 Agents Chemother [Internet]. American Society for Microbiology; 2016 [cited 2018 Jun

1197 12];60:4886–95. Available from: http://www.ncbi.nlm.nih.gov/pubmed/27270277

1198 62. Jain V, Yogavel M, Oshima Y, Kikuchi H, Touquet B, Hakimi MA, et al. Structure of

1199 prolyl-tRNA synthetase-halofuginone complex provides basis for development of drugs

1200 against malaria and toxoplasmosis. Structure [Internet]. Elsevier Ltd; 2015;23:819–29.

1201 Available from: http://dx.doi.org/10.1016/j.str.2015.02.011

1202 63. Jain V, Yogavel M, Kikuchi H, Oshima Y, Hariguchi N, Matsumoto M, et al. Targeting

1203 Prolyl-tRNA Synthetase to Accelerate Drug Discovery against Malaria, Leishmaniasis,

1204 Toxoplasmosis, Cryptosporidiosis, and Coccidiosis. Structure [Internet]. Elsevier Ltd.;

1205 2017;25:1495–1505.e6. Available from: http://dx.doi.org/10.1016/j.str.2017.07.015

1206 64. Son J, Lee EH, Park M, Kim JH, Kim J, Kim S, et al. Conformational changes in human

1207 prolyl-tRNA synthetase upon binding of the substrates proline and ATP and the inhibitor

1208 halofuginone. Acta Crystallogr Sect D Biol Crystallogr. 2013;69:2136–45.

1209 65. Zhou H, Sun, Litao Yang X-L, And Schimmel P. ATP-Directed Capture of Bioactive

1210 Herbal-Based Medicine on Human tRNA Synthetase. 2015;2:147–85.

1211 66. Burrows JN, Chibale K, Wells TNC. The state of the art in anti-malarial drug discovery

1212 and development. Curr Top Med Chem. 2011;11:1226–54.

1213 67. Baker DA. Malaria gametocytogenesis. Mol Biochem Parasitol [Internet]. Elsevier B.V.; bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1214 2010;172:57–65. Available from: http://dx.doi.org/10.1016/j.molbiopara.2010.03.019

1215 68. Pham JS, Dawson KL, Jackson KE, Lim EE, Pasaje CFA, Turner KEC, et al. Aminoacyl-

1216 tRNA synthetases as drug targets in eukaryotic parasites. Int. J. Parasitol. Drugs Drug Resist.

1217 2014. p. 1–13.

1218 69. Chihade JW, Brown JR, Schimmel PR, de Pouplana LR. Origin of mitochondria in

1219 relation to evolutionary history of eukaryotic alanyl-tRNA synthetase. Proc Natl Acad Sci U

1220 S A. 2000;97:12153–7.

1221 70. Pouplana LR De, Schimmel P. Visions & Reflections A view into the origin of life :

1222 aminoacyl-tRNA synthetases. Nature. 2000;57:865–70.

1223 71. Faya N, Penkler DL, Tastan Bishop Ö. Human, vector and parasite Hsp90 proteins: A

1224 comparative bioinformatics analysis. FEBS Open Bio. 2015;5:916–27.

1225 72. Pink R, Hudson A, Mouriès MA, Bendig M. Opportunities and challenges in antiparasitic

1226 drug discovery. Nat Rev Drug Discov. 2005;4:727–40.

1227 73. Bunjun S, Stathopoulos C, Graham D, Min B, Kitabatake M, Wang a L, et al. A dual-

1228 specificity aminoacyl-tRNA synthetase in the deep-rooted eukaryote Giardia lamblia. Proc

1229 Natl Acad Sci U S A. 2000;97:12997–3002.

1230 74. Hatherley R, Clitheroe CL, Faya N, Tastan Bishop Ö. Plasmodium falciparum Hop:

1231 Detailed analysis on complex formation with and Hsp90. Biochem Biophys Res

1232 Commun. 2015;456:440–5.

1233 75. Hatherley R, Blatch GL, Bishop ÖT. Plasmodium falciparum Hsp70-x: A heat shock

1234 protein at the host-parasite interface. J Biomol Struct Dyn. 2014;32:1766–79.

1235 76. Shibata S, Gillespie JR, Kelley AM, Napuli AJ, Zhang Z, Kovzun K V., et al. Selective

1236 inhibitors of methionyl-tRNA synthetase have potent activity against Trypanosoma brucei bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1237 infection in mice. Antimicrob Agents Chemother. 2011;55:1982–9.

1238 77. Dahl EL, Rosenthal PJ. Multiple antibiotics exert delayed effects against the Plasmodium

1239 falciparum apicoplast. Antimicrob Agents Chemother. 2007;51:3485–90.

1240 78. Schimmel P. Development of tRNA synthetases and connection to genetic code and

1241 disease. Protein Sci. 2008;17:1643–52.

1242 79. Richa Agarwala, Tanya Barrett, Jeff Beck, Dennis A. Benson, Colleen Bollin, Evan

1243 Bolton, Devon Bourexis, J. Rodney Brister, Stephen H. Bryant, Kathi Canese, Karen Clark,

1244 Michael DiCuc- cio, Ilya Dondoshansky, Scott Federhen, Michael Feolo, Kathryn Funk, L

1245 KZ. Database resources of the National Center for Biotechnology Information. Nucleic Acids

1246 Res [Internet]. 2014;41:1–12. Available from:

1247 http://www.ncbi.nlm.nih.gov/pubmed/25398906

1248 80. The UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res

1249 [Internet]. 2015;43:D204-12. Available from:

1250 http://nar.oxfordjournals.org/cgi/content/long/43/D1/D204

1251 81. Eriani G, Delarue M, Poch O, Gangloff J, Moras D. Partition of tRNA synthetases into

1252 two classes based on mutually exclusive sets of sequence motifs. Lett to Nat. 1990;347:203–

1253 6.

1254 82. Cusack S. Aminoacyl-tRNA synthetases. Curr Opin Struct Biol. 1997;7:881–9.

1255 83. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The Protein

1256 Databank. Nucleic Acids Res. 2000;28:235–42.

1257 84. Bailey TL, Johnson J, Grant CE, Noble WS. The MEME Suite. Nucleic Acids Res.

1258 2015;43:W39–49.

1259 85. Bailey TL, Gribskov M. Combining evidence using p-values: application to sequence bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1260 homology searches. Bioinformatics [Internet]. 1998;14:48–54. Available from:

1261 https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/14.1.48

1262 86. Kelm S, Shi J, Deane CM. MEDELLER: Homology-based coordinate generation for

1263 membrane proteins. Bioinformatics. 2010;26:2833–40.

1264 87. Söding J, Biegert A, Lupas AN. The HHpred interactive server for protein homology

1265 detection and structure prediction. Nucleic Acids Res. 2005;33.

1266 88. Hatherley R, Brown DK, Glenister M, Bishop ÖT. PRIMO: An interactive homology

1267 modeling pipeline. PLoS One. 2016;11:1–20.

1268 89. Jain V, Yogavel M, Sharma A. Dimerization of Arginyl-tRNA Synthetase by Free Heme

1269 Drives Its Inactivation in Plasmodium falciparum. Structure. 2016;24:1476–87.

1270 90. Ojo KK, Ranade RM, Zhang Z, Dranow DM, Myers JB, Choi R, et al. Brucella melitensis

1271 Methionyl-tRNA-Synthetase (MetRS), a potential drug target for brucellosis. PLoS One.

1272 2016;11.

1273 91. Koh CY, Kim JE, Napoli AJ, Verlinde CLMJ, Fan E, Buckner FS, et al. Crystal structures

1274 of Plasmodium falciparum cytosolic tryptophanyl-tRNA synthetase and its potential as a

1275 target for structure-guided drug design. Mol Biochem Parasitol. 2013;189:26–32.

1276 92. Barros-Álvarez X, Kerchner KM, Koh CY, Turley S, Pardon E, Steyaert J, et al.

1277 Leishmania donovani tyrosyl-tRNA synthetase structure in complex with a tyrosyl adenylate

1278 analog and comparisons with human and protozoan counterparts. Biochimie. 2017;138:124–

1279 36.

1280 93. Hewitt SN, Dranow DM, Horst BG, Abendroth JA, Forte B, Hallyburton I, et al.

1281 Biochemical and structural characterization of selective allosteric inhibitors of the

1282 Plasmodium falciparum drug target, prolyl-tRNA-synthetase. ACS Infect Dis. 2017;3:34–44. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1283 94. Ofir-Birin Y, Fang P, Bennett SP, Zhang H-M, Wang J, Rachmin I, et al. Structural

1284 Switch of Lysyl-tRNA Synthetase between Translation and Transcription. Mol Cell

1285 [Internet]. 2013 [cited 2017 May 23];49:30–42. Available from:

1286 http://linkinghub.elsevier.com/retrieve/pii/S1097276512008635

1287 95. Wiederstein M, Sippl MJ. ProSA-web: Interactive web service for the recognition of

1288 errors in three-dimensional structures of proteins. Nucleic Acids Res. 2007;35.

1289 96. Eisenberg D, Lüthy R, Bowie JU. VERIFY3D: Assessment of protein models with three-

1290 dimensional profiles. Methods Enzymol. 1997;277:396–406.

1291 97. Benkert P, Künzli M, Schwede T. QMEAN server for protein model quality estimation.

1292 Nucleic Acids Res [Internet]. 2009;37:W510–4. Available from:

1293 http://www.ncbi.nlm.nih.gov/pubmed/19429685%5Cnhttp://www.pubmedcentral.nih.gov/arti

1294 clerender.fcgi?artid=PMC2703985%5Cnhttp://www.pubmedcentral.nih.gov/articlerender.fcg

1295 i?artid=2703985&tool=pmcentrez&rendertype=abstract

1296 98. Di Tommaso P, Moretti S, Xenarios I, Orobitg M, Montanyola A, Chang JM, et al. T-

1297 Coffee: A web server for the multiple sequence alignment of protein and RNA sequences

1298 using structural information and homology extension. Nucleic Acids Res. 2011;39.

1299 99. Pei J, Kim BH, Grishin N V. PROMALS3D: A tool for multiple protein sequence and

1300 structure alignments. Nucleic Acids Res. 2008;36:2295–300.

1301 100. Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. Jalview Version 2-A

1302 multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009;25:1189–

1303 91.

1304 101. Kumar S, Stecher G, Tamura K. MEGA7: Molecular Evolutionary Genetics Analysis

1305 Version 7.0 for Bigger Datasets. Mol Biol Evol. 2016;33:1870–4. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1306 102. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. MEGA5: Molecular

1307 evolutionary genetics analysis using maximum likelihood, evolutionary distance, and

1308 maximum parsimony methods. Mol Biol Evol. 2011;28:2731–9.

1309 103. Jones DT, Taylor WR, Thornton JM. The rapid generation of data matrices

1310 from protein sequences. Bioinformatics. 1992;8:275–82.

1311 104. Ngan CH, Bohnuud T, Mottarella SE, Beglov D, Villar EA, Hall DR, et al. FTMAP:

1312 Extended protein mapping with user-selected probe molecules. Nucleic Acids Res. 2012;40.

1313 105. Halgren T. New method for fast and accurate binding-site identification and analysis.

1314 Chem Biol Drug Des. 2007;69:146–8.

1315 106. Halgren TA. Identifying and characterizing binding sites and assessing druggability. J

1316 Chem Inf Model. 2009;49:377–89.

1317 107. Kozakov D, Grove LE, Hall DR, Bohnuud T, Mottarella SE, Luo L, et al. The FTMap

1318 family of web servers for determining and characterizing ligand-binding hot spots of proteins.

1319 Nat Protoc. 2015;10:733–55.

1320 108. Moras D. Structural and functional relationships between aminoacyl-tRNA synthetases.

1321 Trends Biochem Sci [Internet]. 1992;17:159–64. Available from:

1322 papers3://publication/uuid/7E1AF307-2980-4379-933C-27969EA880B7

1323 109. Fourmy D, Mechulam Y, Blanquet S. Crucial role of an idiosyncratic insertion in the

1324 Rossman fold of class 1 aminoacyl-tRNA synthetases: the case of methionyl-tRNA

1325 synthetase. Biochemistry [Internet]. 1995;34:15681–8. Available from:

1326 http://www.ncbi.nlm.nih.gov/pubmed/7495798

1327 110. Mailu BM, Ramasamay G, Mudeppa DG, Li L, Lindner SE, Peterson MJ, et al. A

1328 nondiscriminating glutamyl-tRNA synthetase in the plasmodium apicoplast: The first enzyme bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1329 in an indirect aminoacylation pathway. J Biol Chem. 2013;288:32539–52.

1330 111. Mailu BM, Li L, Arthur J, Nelson TM, Ramasamy G, Fritz-Wolf K, et al. Plasmodium

1331 apicoplast Gln-tRNAGln biosynthesis utilizes a unique GatAB amidotransferase essential for

1332 erythrocytic stage parasites. J Biol Chem. 2015;290:29629–41.

1333 112. Kyriacou S V., Deutscher MP. An Important Role for the Multienzyme Aminoacyl-

1334 tRNA Synthetase Complex in Mammalian Translation and Cell Growth. Mol Cell.

1335 2008;29:419–27.

1336 113. Khan S, Doerig C, Baker D, Billker O, Blackman M, Chitnis C, et al. Recent advances

1337 in the biology and drug targeting of malaria parasite aminoacyl-tRNA synthetases. Malar J

1338 [Internet]. 2016;15:203. Available from:

1339 http://malariajournal.biomedcentral.com/articles/10.1186/s12936-016-1247-0

1340 114. Robinson JC, Kerjan P, Mirande M. Macromolecular assemblage of aminoacyl-tRNA

1341 synthetases: Quantitative analysis of protein-protein interactions and mechanism of complex

1342 assembly. J Mol Biol. 2000;304:983–94.

1343 115. Han JM, Kim JY, Kim S. Molecular network and functional implications of

1344 macromolecular tRNA synthetase complex. Biochem. Biophys. Res. Commun. 2003. p. 985–

1345 93.

1346 116. Lee SW. Aminoacyl-tRNA synthetase complexes: beyond translation. J Cell Sci

1347 [Internet]. 2004;117:3725–34. Available from:

1348 http://jcs.biologists.org/cgi/doi/10.1242/jcs.01342

1349 117. Guo M, Yang X-L, Schimmel P. New functions of aminoacyl-tRNA synthetases beyond

1350 translation. Nat Rev Mol Cell Biol [Internet]. 2010;11:668–74. Available from:

1351 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3042954&tool=pmcentrez&rende bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1352 rtype=abstract

1353 118. Ko YG, Kim EK, Kim T, Park H, Park HS, Choi EJ, et al. Glutamine-dependent

1354 Antiapoptotic Interaction of Human Glutaminyl-tRNA Synthetase with Apoptosis Signal-

1355 regulating 1. J Biol Chem. 2001;276:6030–6.

1356 119. Han JM, Jeong SJ, Park MC, Kim G, Kwon NH, Kim HK, et al. Leucyl-tRNA

1357 synthetase is an intracellular leucine sensor for the mTORC1-signaling pathway. Cell.

1358 2012;149:410–24.

1359 120. Khan S, Garg A, Sharma A, Camacho N, Picchioni D, Saint-Léger A, et al. An

1360 Appended Domain Results in an Unusual Architecture for Malaria Parasite Tryptophanyl-

1361 tRNA Synthetase. PLoS One. 2013;8.

1362 121. Cusack S, Berthet-Colominas C, Härtlein M, Nassar N, Leberman R. A second class of

1363 synthetase structure revealed by X-ray analysis of Escherichia coli seryl-tRNA synthetase at

1364 2.5 A. Nature. 1990;347:249–55.

1365 122. Cavarelli J, Eriani G, Rees B, Ruff M, Boeglin M, Mitschler A, et al. The active site of

1366 yeast aspartyl-tRNA synthetase: structural and functional aspects of the aminoacylation

1367 reaction. EMBO J [Internet]. 1994;13:327–37. Available from:

1368 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=394812&tool=pmcentrez&render

1369 type=abstract

1370 123. Dignam JD, Guo J, Griffith WP, Garbett NC, Holloway A, Mueser T. Allosteric

1371 interaction of nucleotides and tRNA ala with E. coli alanyl-tRNA synthetase. Biochemistry.

1372 2011;50:9886–900.

1373 124. Arnez JG, Harris DC, Mitschler A, Rees B, Francklyn CS, Moras D. Crystal structure of

1374 histidyl-tRNA synthetase from Escherichia coli complexed with histidyl-adenylate. EMBO J. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1375 1995;14:4143–55.

1376 125. Logan DT, Mazauric MH, Kern D, Moras D. Crystal structure of glycyl-tRNA

1377 synthetase from Thermus thermophilus. EMBO J [Internet]. 1995;14:4156–67. Available

1378 from:

1379 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=394498&tool=pmcentrez&render

1380 type=abstract

1381 126. Onesti S, Miller AD, Brick P. The crystal structure of the lysyl-tRNA synthetase (LysU)

1382 from Escherichia coli. Structure. 1995;3:163–76.

1383 127. Berthet-Colominas C, Seignovert L, Härtlein M, Grotli M, Cusack S, Leberman R. The

1384 crystal structure of asparaginyl-tRNA synthetase from Thermus thermophilus and its

1385 complexes with ATP and asparaginyl-adenylate: The mechanism of discrimination between

1386 asparagine and aspartic acid. EMBO J. 1998;17:2947–60.

1387 128. Becker HD, Reinbolt J, Kreutzer R, Giegé R, Kern D. Existence of two distinct aspartyl-

1388 tRNA synthetases in Thermus thermophilus. Structural and biochemical properties of the two

1389 enzymes. Biochemistry. 1997;36:8785–97.

1390 129. Min B, Pelaschier JT, Graham DE, Tumbula-Hansen D, Söll D. Transfer RNA-

1391 dependent amino acid biosynthesis: an essential route to asparagine formation. Proc Natl

1392 Acad Sci U S A [Internet]. 2002;99:2678–83. Available from:

1393 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=122407&tool=pmcentrez&render

1394 type=abstract

1395 130. Sheppard K, Yuan J, Hohn MJ, Jester B, Devine KM, Söll D. From one amino acid to

1396 another: tRNA-dependent amino acid biosynthesis. Nucleic Acids Res. 2008;36:1813–25.

1397 131. Delarue M. Aminoacyl-tRNA synthetases. Curr Opin Struct Biol [Internet]. 1995;5:48– bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1398 55. Available from: http://www.ncbi.nlm.nih.gov/pubmed/7773747

1399 132. Guigou L, Shalak V, Mirande M. The tRNA-Interacting Factor p43 Associates with

1400 Mammalian Arginyl-tRNA Synthetase but Does Not Modify Its tRNA Aminoacylation

1401 Properties. Biochemistry. 2004;43:4592–600.

1402 133. Ferber S, Ciechanover A. Role of arginine-tRNA in protein degradation by the ubiquitin

1403 pathway. Nature [Internet]. 1987;326:808–11. Available from:

1404 http://www.ncbi.nlm.nih.gov/pubmed/3033511

1405 134. Zheng YG, Wei H, Ling C, Xu MG, Wang ED. Two forms of human cytoplasmic

1406 arginyl-tRNA synthetase produced from two translation initiations by a single mRNA.

1407 Biochemistry. 2006;45:1338–44.

1408 135. Jeong EJ, Hwang GS, Kyung Hee Kim, Min Jung Kim, Kim S, Kim KS. Structural

1409 analysis of multifunctional peptide motifs in human bifunctional tRNA synthetase:

1410 Identification of RNA-binding residues and functional implications for tandem repeats.

1411 Biochemistry. 2000;39:15775–82.

1412 136. Rho SB, Lee JS, Jeong EJ, Kim KS, Kim YG, Kim S. A multifunctional repeated motif

1413 is present in human bifunctional tRNA synthetase. J Biol Chem. 1998;273:11267–73.

1414 137. Fett R, Knippers R. The primary structure of human glutaminyl-tRNA synthetase: A

1415 highly conserved core, amino acid repeat regions, and homologies with translation elongation

1416 factors. J Biol Chem. 1991;266:1448–55.

1417 138. Rho SB, Kim MJ, Lee JS, Seol W, Motegi H, Kim S, et al. Genetic dissection of protein-

1418 protein interactions in multi-tRNA synthetase complex. Proc Natl Acad Sci U S A.

1419 1999;96:4488–93.

1420 139. Hsu JL, Martinis SA. A Flexible Peptide Tether Controls Accessibility of a Unique C- bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1421 terminal RNA-binding Domain in Leucyl-tRNA Synthetases. J Mol Biol. 2008;376:482–91.

1422 140. Wakasugi K. Two Distinct Cytokines Released from a Human Aminoacyl-tRNA

1423 Synthetase. Science (80- ) [Internet]. 1999;284:147–51. Available from:

1424 http://www.sciencemag.org/cgi/doi/10.1126/science.284.5411.147

1425 141. Frugier M, Moulinier L, Giegé R. A domain in the N-terminal extension of class IIb

1426 eukaryotic aminoacyl-tRNA synthetases is important for tRNA binding. EMBO J [Internet].

1427 2000;19:2371–80. Available from:

1428 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=384352&tool=pmcentrez&render

1429 type=abstract

1430 142. Dou X, Limmer S, Kreutzer R. DNA-binding of phenylalanyl-tRNA synthetase is

1431 accompanied by loop formation of the double-stranded DNA. J Mol Biol [Internet].

1432 2001;305:451–8. Available from: http://www.ncbi.nlm.nih.gov/pubmed/11152603

1433 143. de Oca MM, Engwerda C, Haque A. Plasmodium berghei ANKA (PbA) infection of

1434 C57BL/6J mice: a model of severe malaria. [Internet]. Methods Mol. Biol. 2013. Available

1435 from: http://www.ncbi.nlm.nih.gov/pubmed/23824903

1436 144. Otto TD, Böhme U, Jackson AP, Hunt M, Franke-Fayard B, Hoeijmakers WAM, et al.

1437 A comprehensive evaluation of rodent malaria parasite genomes and gene expression. BMC

1438 Biol [Internet]. 2014;12:86. Available from:

1439 http://bmcbiol.biomedcentral.com/articles/10.1186/s12915-014-0086-0

1440 145. Mitsui H, Arisue N, Sakihama N, Inagaki Y, Horii T, Hasegawa M, et al. Phylogeny of

1441 Asian primate malaria parasites inferred from apicoplast genome-encoded genes with special

1442 emphasis on the positions of Plasmodium vivax and P. fragile. Gene. 2010;450:32–8.

1443 146. Pain A, Böhme U, Berry AE, Mungall K, Finn RD, Jackson AP, et al. The genome of bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1444 the simian and human malaria parasite Plasmodium knowlesi. Nature. 2008;455:799–803.

1445 147. Carlton JM, Adams JH, Silva JC, Bidwell SL, Lorenzi H, Caler E, et al. Comparative

1446 genomics of the neglected human malaria parasite Plasmodium vivax. Nature. 2008;455:757–

1447 63.

1448 148. Höglind A, Areström I, Ehrnfelt C, Masjedi K, Zuber B, Giavedoni L, et al. Systematic

1449 evaluation of monoclonal antibodies and immunoassays for the detection of Interferon-γ and

1450 Interleukin-2 in old and new world non-human primates. J Immunol Methods. 2017;441:39–

1451 48.

1452 149. Le SQ, Gascuel O. An improved general amino acid replacement matrix. Mol Biol Evol.

1453 2008;25:1307–20.

1454 150. Bishop ÖT, Kroon M. Study of protein complexes via homology modeling, applied to

1455 cysteine and their protein inhibitors. J Mol Model. 2011;17:3163–72.

1456 151. Shen MY, Sali A. Statistical potential for assessment and prediction of protein structures

1457 [Internet]. Protein Sci. 2006. p. 2507–24. Available from:

1458 http://www.ncbi.nlm.nih.gov/pubmed/17075131

1459 152. Eramian D, Shen M, Devos D, Melo F, Sali A, Marti-Renom MA. A composite score

1460 for predicting errors in protein structure models. Protein Sci [Internet]. 2006;15:1653–66.

1461 Available from: http://doi.wiley.com/10.1110/ps.062095806

1462 153. Marko AC, Stafford K, Wymore T. Stochastic pairwise alignments and scoring methods

1463 for comparative protein structure modeling. J Chem Inf Model [Internet]. 2007;47:1263–70.

1464 Available from: http://www.ncbi.nlm.nih.gov/pubmed/17391002

1465 154. Chen H, Kihara D. Estimating quality of template-based protein models by alignment

1466 stability. Proteins Struct Funct Genet. 2008;71:1255–74. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1467 155. Koh CY, Kim JE, Napoli AJ, Verlinde CLMJ, Fan E, Buckner FS, et al. Crystal

1468 structures of Plasmodium falciparum cytosolic tryptophanyl-tRNA synthetase and its

1469 potential as a target for structure-guided drug design. Mol Biochem Parasitol [Internet].

1470 Elsevier B.V.; 2013;189:26–32. Available from:

1471 http://dx.doi.org/10.1016/j.molbiopara.2013.04.007

1472 156. Yang X-L, Otero FJ, Skene RJ, McRee DE, Schimmel P, Ribas de Pouplana L. Crystal

1473 structures that suggest late development of genetic code components for differentiating

1474 aromatic side chains. Proc Natl Acad Sci [Internet]. 2003;100:15376–80. Available from:

1475 http://www.pnas.org/cgi/doi/10.1073/pnas.2136794100

1476 157. Koh CY, Kim JE, Shibata S, Ranade RM, Yu M, Liu J, et al. Distinct states of

1477 methionyl-tRNA synthetase indicate inhibitor binding by conformational selection. Structure.

1478 2012;20:1681–91.

1479 bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint doi: https://doi.org/10.1101/440891; this version posted October 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.