bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 2 3 4

5

6

7 Identifying -Relevant Mutations in the DLC START Domain 8 using Evolutionary and Structure-Function Analyses

9 10 Ashton S. Holub1,2, Renee A. Bouley3, Ruben C. Petreaca4, Aman Y. Husbands1,2* 11

12 1 Department of Molecular Genetics, The Ohio State University, Columbus, OH, 43215, USA.

13 2 Center for Applied Plant Sciences (CAPS), The Ohio State University, Columbus, OH, 43215, 14 USA.

15 3 Department of Chemistry and Biochemistry, The Ohio State University, Marion, OH, 43302, 16 USA.

17 4 Department of Molecular Genetics, The Ohio State University, Marion, OH, 43302, USA.

18

19 *Corresponding author. Email: [email protected]

20

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

21 Simple Summary: Deleted in Liver Cancer (DLC) are tumor suppressors that contain a

22 StAR-related lipid transfer (START) domain. Little is known about the DLC START domain

23 including the residues that mediate its activation. Using the Catalogue of Somatic Mutations in

24 Cancer (COSMIC) dataset, evolutionary, and structure-function analyses, we identify key

25 functional residues of DLC START domains. Mutations in these residues are significantly

26 overrepresented in numerous in multiple tissue types. In other contexts, START

27 domains bind hydrophobic small molecules and stimulate regulatory outputs of interacting

28 partners or co-occurring domains. The identification of functional residues in the DLC START

29 domain may thus have implications for targeted manipulation of DLC tumor suppressive activity.

30 Critically, these residues would not have been identified using traditional queries of COSMIC.

31 Thus, we propose tandem evolutionary and structure-function approaches are an underutilized

32 strategy to unmask cancer-relevant mutations within COSMIC.

33 Abstract: Rho GTPase signaling promotes proliferation, invasion, and metastasis in a broad

34 spectrum of cancers. Rho GTPase activity is regulated by the Deleted in Liver Cancer (DLC)

35 family of bonafide tumor suppressors which directly inactivate Rho GTPases by stimulating GTP

36 hydrolysis. In addition to a RhoGAP domain, DLC proteins contain a StAR-related lipid transfer

37 (START) domain. START domains in other organisms bind hydrophobic small molecules and

38 can regulate interacting partners or co-occurring domains through a variety of mechanisms. In

39 the case of DLC proteins, their START domain appears to contribute to tumor suppressive

40 activity. However, the nature of this START-directed mechanism, as well as the identities of

41 relevant functional residues, remain virtually unknown. Using the Catalogue of Somatic

42 Mutations in Cancer (COSMIC) dataset, and evolutionary and structure-function analyses, we

43 identify several conserved residues likely to be required for START-directed regulation of DLC-1

44 and DLC-2 tumor suppressive capability. This pan-cancer analysis shows that conserved

45 residues of both START domains are highly-overrepresented in cancer cells from a wide range

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

46 tissues. Interestingly, in DLC-1 and DLC-2, three of these residues form multiple interactions at

47 the tertiary structural level. Further, mutation of any of these residues is predicted to disrupt

48 interactions and thus destabilize the START domain. As such, these mutations would not have

49 emerged from traditional hotspot scans of COSMIC. We propose that evolutionary and

50 structure-function analyses are an underutilized strategy which could be used to unmask

51 cancer-relevant mutations within COSMIC. Our data also suggest DLC-1 and DLC-2 as high-

52 priority candidates for development of novel therapeutics targeting their START domain.

53 Keywords: evolution, structure-function analyses, ligand-binding domain, tumor suppressor,

54 computational genomics, COSMIC, novel druggable therapeutic targets

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

55 1. Introduction

56 Rho GTPases are a subfamily of G proteins involved in signal transduction. These proteins

57 regulate multiple pathways, including the cytoskeleton, cell polarity, cell cycle progression,

58 and microtubule dynamics [1]. Signaling is activated by GTP binding at their GTPase domain

59 and inactivated by its hydrolysis into GDP. This balance is mediated by the opposing activities

60 of Rho guanine nucleotide exchange factors (RhoGEFs) and Rho GTPase activating proteins

61 (RhoGAPs), which promote GTP replacement and GTP hydrolysis, respectively (Figure 1).

62 Upregulated signaling of Rho GTPases such as RhoA, Cdc42, and Rac, or their respective

63 RhoGEFs, leads to tumor proliferation, migration, invasion, and metastasis in a number of

64 cancers [2-4].

65 Rho GTPases have a globular structure with a limited druggable surface, making them

66 particularly challenging drug targets [5-7]. Efforts to mitigate the effects of Rho GTPases in

67 cancer have therefore focused primarily on disrupting aspects of Rho GTPase signaling

68 (reviewed in [7]). These include studies of RhoGAPs, which have specific consistent inhibitory

69 effects on Rho GTPase signaling and are considered particularly attractive components to

70 target within the Rho GTPase signaling module [1,3,5,7-14] (Figure 1). The Deleted in Liver

71 Cancer 1 (DLC-1 or STARD12), DLC-2 (STARD13), and DLC-3 (STARD8) RhoGAPs are

72 frequently downregulated in cancer and associated with poor prognosis [12,15]. The Cancer

73 Genome Atlas (TCGA) data indicate that, of the three DLC , DLC-1 is usually the most

74 dramatically downregulated, followed by DLC-2 [12]. However, unlike DLC-2, DLC-1 is

75 embryonic lethal when knocked out in mouse models [16,17]. A recent pan-cancer analysis also

76 showed that DLC-1 missense mutations are common in cancer cells [15]. This suggests a larger

77 role for DLC-1 in control of disease and development. DLC-3, on the other hand, does not show

78 a correlation between copy number loss and cancer progression in the TCGA dataset [12].

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

79 Despite making differential contributions to cancer progression, all three DLC proteins are

80 considered to be bonafide tumor suppressors that inhibit cancer cell growth [16,18-24].

81 Adjacent to the DLC RhoGAP domain is a StAR-related lipid transfer (START) domain.

82 START domains adopt a helix-grip fold structure, forming deep hydrophobic binding pockets

83 that undergo conformational changes upon binding of lipophilic ligands [25-27]. START-

84 containing proteins are distributed throughout Eubacteria and Eukaryota, and are important

85 regulators of numerous biological processes. For instance, in humans, START-containing

86 proteins are associated with numerous human diseases, including metabolic, oncogenic, and

87 autoimmune disorders [28-30]. Structurally, these proteins can be divided into two classes:

88 minimal START proteins and START-containing multidomain proteins (SMPs), which possess a

89 wide variety of additional functional domains [31]. START domains employ a range of complex

90 regulatory mechanisms [19,20,32-34]. Of particular interest, START domains in both minimal

91 proteins and SMPs are able to stimulate regulatory outputs of interacting partners or co-

92 occurring domains in a ligand-dependent manner [32,34]. In the context of DLC proteins, this

93 would be predicted to manifest as START-dependent promotive effects on tumor suppressor

94 activity.

95 The presence of a START domain suggests DLC proteins may be regulated by a lipophilic

96 ligand, rendering them more easily targetable by natural or synthetic small molecules. However,

97 while there is considerable in vivo and in vitro evidence for them as bonafide tumor

98 suppressors, the role of their START domain remains poorly understood. In fact, the residues

99 mediating START function remain virtually unknown. This is a key bottleneck for targeted

100 modulation of DLC tumor suppressive activity. Here we use an evolutionary approach to identify

101 conserved residues in DLC START domains that are also mutated in the Catalogue of Somatic

102 Mutations in Cancer (COSMIC) dataset. We find that conserved residues of DLC-1 and DLC-2

103 START domains are highly-overrepresented in tumor samples from a broad spectrum of

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

104 cancers. Moreover, using comparative structural modeling, we identify a set of three residues,

105 present in both DLC-1 and DLC-2 START domains, that cluster and interact at the tertiary level.

106 Mutations identified in COSMIC are predicted to break these non-covalent interactions, and

107 disrupt START tertiary structure and function. Our results support the notion that the START

108 domain is involved in DLC-mediated tumor suppression, and that DLC proteins may thus be

109 particularly tractable therapeutic targets. Importantly, these residues would not have emerged

110 from traditional scans for mutational hotspots. As such, we propose that tandem evolutionary

111 and structural analyses could help unmask hidden cancer-relevant mutations within the

112 COSMIC dataset.

113 2. Results and Discussion

114 2.1 Mutations in conserved residues of DLC-1 and DLC-2 START domains are overrepresented

115 in tumors.

116 COSMIC contains numerous missense, nonsense, and frameshift mutations in the START

117 domains of DLC-1, DLC-2, and DLC-3 [36]. We focused on missense mutations, as the latter

118 two are more likely to yield strong effects that confound specific contributions of the DLC

119 START domain. We found 47, 33, and 54 missense mutations falling within the DLC-1, DLC-2,

120 and DLC-3 START domains, respectively (Table S1-S3). These mutations do not cluster into

121 hotspots, as a Kolmogorov-Smirnov test for uniformity failed to reject the null hypothesis (Table

122 S4). Although uniformly distributed, we hypothesized that they may nonetheless be

123 disproportionately affecting functional residues.

124 To identify functional residues of DLC START domains, we performed a multiple sequence

125 alignment (MSA), as evolutionary conservation is a strong indicator of critical residues within

126 domains [37]. START domains were selected from 123 DLC-1, DLC-2, and DLC-3

127 orthologs from 46 vertebrate species spanning 450 million years of divergent evolution. Using a

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

128 stringent 98% threshold, we identified 41 residues (~20%) within the 206 amino acid DLC

129 START domains that are identical in ≥98% of sequences (Figure 2; see Figure S1 for full

130 species MSA). Missense mutations from COSMIC were placed onto the MSA to score whether

131 they preferentially affected these conserved residues (Figure 3A). DLC-1 had 17 mutations

132 falling in 11 conserved residues, DLC-2 had 13 mutations falling in 10 conserved residues, and

133 DLC-3 had 5 mutations falling in 5 conserved residues (Figure 3; Tables S1-S3). This

134 corresponds to 36.2%, 39.3%, and 9.3% of total mutations in DLC-1, DLC-2, and DLC-3 START

135 domains, respectively (Table S5). We next tested whether these frequencies deviate from the

136 approximately 20% likelihood of mutations occurring in conserved residues by chance (Table

137 S5). Indeed, mutations are significantly overrepresented in conserved residues of both DLC-1

138 (p-value = 0.006) and DLC-2 (p-value = 0.005) START domains (Figure 3B; Tables S5-S7). By

139 contrast, mutations of the DLC-3 START domain are not overrepresented in conserved residues

140 (p-value = 0.052). These findings are consistent with the frequent downregulation of DLC-1 and

141 DLC-2, but not DLC-3, in many cancers [12]. Taken together, these evolutionary analyses

142 identify conserved residues in DLC-1 and DLC-2 START domains which may be linked to

143 cancer phenotypes.

144 2.2 Conserved residues form multiple bonds within the START tertiary structure

145 Conservation of residues implies critical roles in function [37]. For example, an arginine

146 residue near the opening of the START cavity is highly-conserved and known to be required for

147 START activity [38]; this residue also accrued mutations in multiple independent patient

148 samples (DLC-1 R988; Table S1). In addition, mutation of the conserved E966 residue in DLC-1

149 was recently shown to impair tumor suppressor activity [15] (Table S1). The recovery of multiple

150 residues with known roles in either cancer or START activity supports the biological relevance

151 of the residues identified by our analyses. Most conserved DLC START residues are

152 uncharacterized however, and it is formally possible that amino acid substitutions have only

7 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

153 minimal effects on tertiary structure [39]. We thus sought to predict their impact on structure and

154 function of DLC START domains using comparative structural modeling. As mutations in the

155 DLC-3 START domain were not overrepresented in COSMIC, it was excluded from these

156 structural analyses.

157 Mutations in COSMIC falling within conserved residues were mapped to homology models

158 of the DLC-1 and DLC-2 START domains (Figure 4). Strikingly, three conserved residues with

159 missense mutations in COSMIC colocalized to the same positions within the DLC-1 and DLC-2

160 START domains: R947, S1077, and F1078 of DLC-1 (Figure 4A,B), and R969, S1100, and

161 F1101 of DLC-2 (Figure 4C,D). The conserved R947/R969 residues are part of helix-2, and are

162 in close proximity to the S1077/S1100 and F1078/F1101 residues, which sit near the end of the

163 C-terminal helix-3. Given the degree of conservation and close proximity in both DLC-1 and

164 DLC-2, we focused our subsequent analyses on these arginine, serine, and phenylalanine

165 residues.

166 The close proximity of these residues in three-dimensional space presents the possibility that

167 they may interact at the tertiary level. We examined this idea using Crystallographic Object-

168 Oriented Toolkit (CooT), and identified multiple predicted interactions between these residues,

169 as well as between these residues and their neighbors. For instance, the R947 residue of DLC-

170 1 forms a cation-π interaction with the aromatic ring of F1078 (3.3 Å; Figure 5A-C). Cation-π

171 interactions in proteins generally occur between a positively charged amino acid and the π face

172 of an aromatic ring, and are key contributors to protein structure [40,41]. In addition to this

173 cation-π interaction, R947 also forms a hydrogen bond with Q1081 (3.1 Å). Finally, S1077 does

174 not directly interact with either R947 or F1078, but instead forms a hydrogen bond with K1073

175 within helix-3. In the case of DLC-2, R969 and F1101 in DLC-2 similarly form a cation-π

176 interaction, although the strength of this interaction is predicted to be weaker (3.8 Å; Figure 6A-

177 C). In contrast to S1077 of DLC-1, S1100 in DLC-2 interacts with R969 via hydrogen bonding at

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

178 both its carboxy backbone (2.9 Å) and side chain (3.2 Å) of S1100. Thus, while not apparent

179 from the primary sequence, three-dimensional homology models reveal these residues appear

180 to function together at the tertiary level.

181 2.3 COSMIC missense mutations disrupt tertiary-level interactions in DLC-1 and DLC-2 START

182 domains

183 Their conservation, proximity, and extensive intermolecular interactions suggest that

184 missense mutants in the arginine, serine, or phenylalanine residues would disrupt START

185 structure and function. To test this, we separately replaced each residue with one of the

186 missense mutations identified in COSMIC (Table S1-S3). We then generated new homology

187 models for DLC-1 and DLC-2 START domains and used CooT to determine the likely effect of

188 these mutations on the previously identified noncovalent network.

189 In DLC-1, the R947C mutation abolishes its cation-π interaction with F1078 and hydrogen

190 bond with Q1081, but leaves the S1077-K1073 hydrogen bond intact (Figure 5D). S1077F

191 retains the R947 and F1078 cation-π interaction, but abolishes the S1077F-K1073 hydrogen

192 bond, as well as the hydrogen bond between R947 and Q1081 (Figure 5E). Finally, F1078L

193 disrupts its cation-π interaction with R947, but retains the hydrogen bond between R947 and

194 Q1081, as well as the S1077-K1073 hydrogen bond (Figure 5F).

195 In the case of DLC-2, the R969C mutation similarly abolishes its cation-π interaction with

196 F1101, as well as its hydrogen bond to S1100 (Figure 6D). S1100F retains the cation-π

197 interaction with R969 and F1101, while abolishing all hydrogen bonds formed with R969 (Figure

198 6E). Finally, F1101L abolishes its cation-π interaction with R969 and disrupts the hydrogen

199 bond between R969 and the side chain of S1100, but retains the hydrogen bond between R969

200 and the carboxyl backbone of S1100 (Figure 6F).

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

201 The arginine-to-cysteine mutations in both DLC-1 (R947C) and DLC-2 (R969C) disrupt

202 cation-π interactions, as well as multiple hydrogen bonds. Thus, we predict this substitution to

203 be the most destabilizing in both START domains. This is also in line with global analyses of the

204 COSMIC dataset, which found that arginine-to-glutamine and arginine-to-cysteine substitutions

205 occur with high frequency in driver mutations [42]. The serine-to-phenylalanine mutations

206 (S1077F in DLC-1 and S1100F in DLC-2) also abolish multiple hydrogen bonds and replace a

207 small polar side chain with a large hydrophobic side chain. Thus, this substitution could

208 conceivably destabilize START domains to a similar extent as the arginine-to-cysteine

209 mutations. By contrast, the phenylalanine-to-leucine mutations (F1078L and F1101L) are

210 predicted to be the least destabilizing, as several hydrogen bonds are left intact and both amino

211 acids have similar physicochemical properties [39]. These structural analyses, focusing on a

212 subset of mutations overrepresented in patient tumors, support a role for the START domain in

213 regulating the tumor suppressor activity of DLC-1 and DLC-2.

214 3. Conclusions

215 Functional residues mediating the tumor suppressive activity of the DLC START domain

216 remain virtually unknown. Here, we undertook a pan-cancer analysis and identified residues

217 within DLC-1 and DLC-2 START domains likely to affect the function of these genes. We also

218 show that COSMIC missense mutations in DLC-1 and DLC-2 START domains are significantly

219 overrepresented in evolutionarily-conserved residues predicted to be critical for proper folding,

220 interaction, and/or activity [37,38]. Importantly, the relevance of these residues to START and

221 DLC tumor suppressive activities is supported by multiple previous findings [15,20]. Our

222 analyses thus provide an evolutionary rationale to begin drug targeting of key residues within

223 the conserved START domains of these genes.

224 Three of these substitutions, occurring in both DLC-1 and DLC-2, disrupt multiple tertiary-

225 level interactions, and are likely to impact START structure. The remaining overrepresented

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

226 COSMIC missense mutations may similarly impact START activity, or affect other properties

227 such as ligand binding or conformational change. A potential DLC START regulatory

228 mechanism is suggested by studies of both minimal proteins and SMPs, demonstrating that

229 START domains can stimulate regulatory outputs in an allosteric manner [32,35]. If the DLC

230 START domain functions similarly, it could promote RhoGAP tumor suppressor activity, possibly

231 via allosteric regulation of intra- or inter-molecular interactions.

232 These data also suggest mutation hotspots might not capture the full spectrum of cancer-

233 relevant mutations in COSMIC. For instance, the conserved arginine, serine, and phenylalanine

234 residues in DLC-1 and DLC-2 appear to work together to stabilize START structure (Figures 5

235 and 6). Single substitutions in any of these residues would have the same structural and

236 functional consequences for DLC START activity, thereby diluting its signal in COSMIC. We

237 propose that, in addition to traditional analyses of mutation frequency, COSMIC should be

238 queried using coupled evolutionary and structure-function analyses. These tandem approaches

239 may help unmask hidden cancer-relevant mutations.

240 DLC-1 and DLC-2 are frequently down-regulated, as opposed to mutated, in cancer [12,43].

241 For instance, DLC-1 expression is reduced 10-fold in lung adenocarcinoma, 3-fold in

242 , 4-fold in breast cancer, and 3-fold in colon cancer [12]. Stimulating

243 the activity of residual DLC-1 and DLC-2 proteins through an agonistic START ligand could be a

244 means to improve patient outcomes, particularly if used in combination with current therapeutic

245 approaches. Importantly, agonistic START ligands have already been developed. In plants,

246 signaling of the hormone abscisic acid (ABA) is mediated by its minimal START domain

247 receptor [44]. Opabactin, a rationally designed agonistic START ligand, has a 7-fold stronger

248 affinity than the native ligand, and increases ABA signaling outputs in vivo by a factor of 10 [45].

249 An agonistic ligand designed for the DLC START domain could compensate for reduced

250 outputs, and would be relevant for a number of cancer types. Taken together, our data identify

11 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

251 several conserved residues likely to underlie the START-directed regulation of DLC-1 and DLC-

252 2 tumor suppressive capability. DLC-1 and DLC-2 may thus be high-priority candidates for

253 development of novel therapeutics targeting their START domain.

254 4. Materials and methods

255 4.1 Kolmogorov-Smirnov test of uniformity

256 Mutation data was obtained from the COSMIC database (v91 GRCh38) for the dominant

257 isoforms of DLC-1 (DLC1_ENST00000358919.6), DLC-2 (ENST00000336934.9), and DLC-3

258 (ENST00000374599.7). DLC START domains were identified using HHpred [46]: DLC-1 (880 -

259 1083aa), DLC-2 (903aa-1108aa), and DLC-3 (893aa – 1098aa). Mutations outside of the

260 START domain, as well as nonsense, frameshift, and intronic mutations within the START

261 domain, were excluded from our analyses. To ensure only independent mutation events were

262 measured, duplicate samples from the same patient were removed before analysis. Missense

263 mutations were counted for individual residues, and a one-sample nonparametric test was done

264 using the Kolmogorov-Smirnov test in IBM SPSS Statistics 26. The observed distribution of

265 missense mutations was tested against the null hypothesis that mutations are uniformly

266 distributed along the sample data. Test option settings were set to a significance value of 0.05,

267 a confidence value of 95%, and to exclude cases test-by-test. User-Missing Values were set to

268 “Exclude”.

269 4.2 Multiple Sequence Alignment (MSA)

270 To build an MSA and identify conserved residues in DLC START domains, orthologs of

271 DLC-1, DLC-2, and DLC-3 were identified using NCBI’s Blastp program against the reference

272 proteins (refseq_protein) database [47].In total, 123 full length sequences were found from 46

273 vertebrate species. Sequences were aligned using the ClustalW algorithm [48] in the MEGA X

274 software [49], and trimmed to exclude residues falling outside the START domain. Conservation

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

275 of residues was analyzed using R and “msa” package from Bioconductor [50]. A stringent 98%

276 consensus threshold was used, allowing for only 2 of the 123 sequences to contain a non-

277 similar residue at a given position. This threshold was selected to account for rare lineage

278 specific mutations that may be permissive to changes in critical residues, but not representative

279 of most sequences [51]. Residues were then shaded according to similarity and a consensus

280 logo was added using R and the “msa” package from Bioconductor. Finally, this file was

281 exported in .tex format and compiled using TeXworks (Figure S1).

282 4.3 Mutation counts and chi-square analysis

283 The expected number of mutations in conserved residues assuming a random distribution

284 was calculated by using the percentage of identical residues multiplied by the total number of

285 mutations in START domain of each paralog.

# 표푓 푐표푛푠푒푟푣푒푑 푟푒푠𝑖푑푢푒푠 퐸푥푝푒푐푡푒푑 푚푢푡푎푡𝑖표푛푠 𝑖푛 푐표푛푠푒푟푣푒푑 푟푒푠𝑖푑푢푒푠 = # 표푓 푚푢푡푎푡𝑖표푛푠 푥 퐷표푚푎𝑖푛 푙푒푛푔푡286ℎ

287 The expected number of mutations in non-conserved residues was determined by

288 subtracting the expected number of mutations in conserved residues from the total number of

289 mutations. Chi-square values were calculated using the formula: (퐸푥푝푒푐푡푒푑−푂푏푠푒푟푣푒푑)2 퐶ℎ𝑖 푠푞푢푎푟푒 푣푎푙푢푒 = 퐸푥푝푒푐푡푒푑 290 These summed chi-square values were then calculated using the website:

291 https://www.socscistatistics.com/pvalues/chidistribution.aspx

292 4.4 Homology modeling and identification of tertiary-level interactions

293 Homology modeling of the each START domain was performed using the Max Planck

294 Institute Bioinformatics Toolkit [46]. START domain sequences of DLC-1 and DLC-2 were

295 separately queried using HHpred against the (PDB_mmCIF70_20_May)

296 structural database to identify related protein structures for homology modelling. The top 10

297 structures were identical for both START domains, and included the crystal structure of DLC-2

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

298 (2PSO). These 10 protein structures (chain A of 2PSO, chain A 2R55, chain A of 2MOU, chain

299 A of 5I9J, chain A of 6L1M, chain A of 2E3P, chain C of 3P0L, chain A of 1LN1, chain B of

300 3FO5, chain B of 3QSZ) were forwarded to HHpred-TemplateSelection to generate the MSA.

301 Template alignment was subsequently forwarded to MODELLER to predict the tertiary

302 structures of the DLC-1 and DLC-2 START domains. Structural models were examined using

303 Crystallographic Object-Oriented Toolkit (CooT) to identify predicted interactions formed with

304 the conserved residues using the Environmental Distances tool with a cut-off of 4.0 Å. Mutated

305 DLC-1 and DLC-2 START domain were modeled using the same template alignment as their

306 respective wildtype models, and analyzed for predicted interactions in CooT using the same cut-

307 off value.

308 Supplementary Materials: The following are available online at XXX, Figure S1: Full multiple

309 sequence alignment (MSA) of START domains from DLC-1, DLC-2, and DLC-3 orthologs.

310 Tables S1-S7 in Excel.

311 Author Contributions: Conceptualization, ASH, RCP, and AYH; methodology, ASH, RCP,

312 RAB, and AYH; software, ASH and RAB; validation, ASH, RCP, RAB and AYH; formal analysis,

313 ASH, RAB, and AYH; investigation, ASH and RCP; resources, ASH, RCP, and AH; data

314 curation, ASH; writing—original draft preparation, ASH and AYH; writing—review and editing,

315 ASH, RCP, RAB, and AYH; visualization, ASH, RAB, and AYH; supervision, AYH; project

316 administration, AYH; funding acquisition, RCP and AYH. All authors have read and agreed to

317 the published version of the manuscript.

318 Funding: ASH is funded by a Pelotonia Graduate Fellowship Award. Funding for RCP was

319 provided by the Ohio State University James Comprehensive Cancer Center through a P30

320 award (CA016058) and a Pelotonia award.

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

321 Acknowledgements: We thank members of the Husbands Lab for insightful comments on the

322 work.

323 Conflict of Interest: The authors declare no conflict of interest.

324

325

15 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

326

327 Figure 1. The Rho GTPase signaling module. Rho GTPases (red) are active while bound to 328 GTP. RhoGAPs (purple) stimulate hydrolysis of bound GTP, switching Rho GTPases an 329 inactive GDP-bound state. RhoGDIs (blue) bind GDP-bound Rho GTPases and sequester them 330 in the cytosol. Finally, RhoGEFs (orange) exchange bound GDP for GTP, reactivating Rho 331 GTPase signaling. 332

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

333 334 Figure 2. Representative subset of the multiple sequence alignment (MSA) of START domains 335 from DLC-1, DLC-2, and DLC-3 orthologs. Species include Homo sapiens (Hs), Sus scrofa (Ss), 336 Aquila chrysaetos (Ac), Alligator mississippiensis (Am), Xenopus tropicalis (Xt), and Danio rerio 337 (Dr). Blue indicates residues that are identical in ≥98% of sequences. Magenta indicates 338 residues with similar physiochemical properties in ≥98% of sequences. White represents non- 339 conserved residues. Logo represents the consensus residue(s) of the 123 sequences used in 340 the full alignment (see Figure S1). 341 342

17 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

343 344 Figure 3. COSMIC missense mutations localizing to conserved residues of DLC-1, DLC-2, and 345 DLC-3 START domains. (A) Lollipop plot displaying the location and frequency of missense 346 mutations. Mutations occur in identically conserved (blue), physiochemically conserved 347 (magenta), or non-conserved residues (white). (B) COSMIC missense mutations are 348 overrepresented in the evolutionary-conserved residues of the START domains of DLC-1 and 349 DLC-2, but not DLC-3. *** P < .001, chi-square test. 350 351

18 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

352 353 354 355

356

357

358

359

360

361

362

363

364 Figure 4. Homology models of DLC-1 and DLC-2 START domains. (A,C) Side and (B,D) top 365 down views of (A,B) DLC-1 and (C,D) DLC-2 START domains. Blue indicates conserved 366 residues with missense mutations in cancers. Red indicates conserved arginine, serine, and 367 phenylalanine residues mutated in both DLC-1 and DLC-2, as well as the highly-conserved 368 arginine in DLC-1 (R988) mutated in numerous cancers. 369

19 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

370

371

372

373

374

375

376

377

378

379

380

381

382

383

384 Figure 5. Consequences of COSMIC missense mutations on the tertiary structure of the DLC-1 385 START domain. (A-C) Conserved residues colocalizing near the C-terminal helix form multiple 386 tertiary-level interactions. (D) R947C mutation disrupts the cation-π interaction of F1078 and 387 hydrogen bond of Q1081. (E) F1078L disrupts the R947 cation-π interaction, but retains the 388 hydrogen bond between R947 and Q1081. (F) S1077F retains the cation-π interaction, and 389 disrupts all other bonds. Dotted lines indicate hydrogen bonding and cation-π interactions. 390

20 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 Figure 6. Consequences of COSMIC missense mutations on the tertiary structure of the DLC-2 406 START domain. A-C Conserved residues colocalizing near the C-terminal helix form multiple 407 tertiary-level interactions. (D) R969C mutation disrupts all interactions formed between these 408 residues. E) F1101L disrupts the cation-π interaction of R969 and the hydrogen bond with the 409 side chain of S1100. Dotted lines indicate hydrogen bonding and cation-π interactions. 410 411

21 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

412 Supplemental Figure and Table Legends 413 414 Figure S1. Full multiple sequence alignment (MSA) of START domains from DLC-1, DLC-2, 415 and DLC-3 orthologs. Species are listed down the left side of the MSA. Blue indicates residues 416 that are identical in ≥98% of sequences. Magenta indicates residues with similar physiochemical 417 properties in ≥98% of sequences. White represents non-conserved residues. Logo represents 418 the consensus residue(s) of all 123 sequences. 419 Table S1 – COSMIC missense mutations and sequence conservation for DLC-1. Conserved 420 and non-conserved residues in the DLC-1 START domain mutated in COSMIC. Features 421 include: residue number, number of missense mutations, identity of introduced substitution, and 422 level of conservation in MSA. 423 Table S2 – COSMIC missense mutations and sequence conservation for DLC-2. Conserved 424 and non-conserved residues in the DLC-2 START domain mutated in COSMIC. Features 425 include: residue number, number of missense mutations, identity of introduced substitution, and 426 level of conservation in MSA. 427 Table S3 - COSMIC missense COSMIC missense mutations and sequence conservation for 428 DLC-3. Conserved and non-conserved residues in the DLC-3 START domain mutated in 429 COSMIC. Features include: residue number, number of missense mutations, identity of 430 introduced substitution, and level of conservation in MSA. 431 Table S4 - Kolmogorov-Smirnov (K-S) test of uniformity for DLC-1, DLC-2, and DLC-3 START 432 domains. K-S test indicates no deviation from uniformity for all DLC START domains. P > 0.05. 433 Table S5 – Summary of DLC-1, DLC-2, and DLC-3 mutations in conserved and non-conserved 434 residues. Percentage of residues ≥98% conserved in all DLC START domains (green), 435 percentage of conserved residues with at least one mutation (pink), percentage of all mutations 436 falling in conserved residues (grey). 437 Table S6 – Chi-square input value determination. Calculation of Expected numbers of mutations 438 falling in conserved or non-conserved residues (for chi-square test). 439 Table S7 – Chi-square analysis. Chi-square tested demonstrating DLC-1 and DLC-2 START 440 domains accrue significantly more mutations in conserved residues than expected by chance. 441 DLC-3 does not show this effect. ***P < 0.001. 442 443 444

22 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

445 References 446 1. Etienne-Manneville S, Hall A. 2002. Rho GTPases in cell biology. Nature. 420: 629 – 447 635. 448 2. Vega F, Ridley A. 2008. Rho GTPases in cancer cell biology. FEBS Letters. 582: 2093 – 449 2101. 450 3. Haga R, Ridley A. 2016. Rho GTPases: Regulation and roles in cancer cell biology. 451 Small GTPases. 7(4): 207 – 221. 452 4. Jung H et al. 2020. Dysregulation of Rho GTPases in Human Cancers. Cancers. 12(5): 453 1179. 454 5. del Mar Maldonado M. Dharmawardhane S. 2018. Targeting Rac and Cdc42 GTPases 455 in Cancer. Cancer Res. 78(12): 3101 – 3111. 456 6. Shang X et al. 2012. Rational design of small molecule inhibitors targeting RhoA 457 subfamily Rho GTPases. Chem Buil. 19 (6): 699 – 710. 458 7. Lin Y, Zheng Y. 2015. Approaches of targeting Rho GTPases in cancer drug discovery. 459 Expert Opin Drug Discov. 10(9): 991 – 1010. 460 8. Prieto-Dominguez N, Parnell C, Teng Y. 2019. Drugging the Small GTPase Pathways in 461 Cancer Treatment: Promises and Challenges. Cells. 8: 255. 462 9. Garcia-mata R, Boulter E. Burridge K. 2012. The invisible hand: regulation of RHO 463 GTPases by RHOGDIs. Nat Rev Mol Cell Biology. 12(8): 493 – 504. 464 10. Zhao L et al. 2008. Overexpression of Rho GDP-Dissociation Inhibitor Alpha Is 465 Associated with Tumor Progression and Poor Prognosis of Colorectal Cancer. Journal of 466 Proteome Research. 7: 3994 – 4003. 467 11. Jones M et al. 2002. Proteomic analysis and identification of new biomarkers and 468 therapeutic targets for invasive ovarian cancer. Proteomics. 2: 76 – 84. 469 12. Wang D et al. 2016. DLC1 is the principally biologically-relevant down-regulated DLC 470 family member in several cancers. Oncotarget. 7(29): 45144 – 44157. 471 13. Peck J, Douglas IV G, Wu C, Burbelo P. 2002. Human RhoGAP domain-containing 472 proteins: structures, function, and evolutionary relationships. FEBS Letters. 528 (1-3). 27 473 – 34. 474 14. Hodge R, Ridley A. 2016. Regulating Rho GTPases and their regulators. Nat Rev Mol 475 Cell Biol. 17: 496 – 510. 476 15. Wang D et al. 2020. Cancer-Associated Point Mutations in the DLC1 Tumor Suppressor 477 and Other Rho-GAPs Occur Frequently and Are Associated with Decreased Function. 478 Cancer Res. 80(17): 3568 – 3579. 479 16. Durkin M et al. 2005. DLC-1, a Rho GTPase-activating protein with tumor suppressor 480 function, is essential for embryonic development. FEBS Letters. 579(5): 1191 – 1196. 481 17. Yau T et al. 2009. Deleted in Liver Cancer 2 (DLC2) Was Dispensable for Development 482 and Its Deficiency Did Not Aggravate Hepatocarcinogenesis. PLoS ONE. 4(8): E6566. 483 18. Durkin M et al. 2007. Deleted in liver cancer 3 (DLC-2), a novel Rho GTPase-activating 484 protein, is downregulated in cancer and inhibits tumor cell growth. Oncogene. 26: 4580 – 485 4589. 486 19. Sun L, Sun J, Song J. 2019. High expression of DLC family proteins predicts better 487 prognosis and inhibits tumor progression in NSCLC. Molecular Medicine Reports. 19: 488 4881 – 4889. 489 20. Du X et al. 2012. Functional Interaction of Tumor Suppressor DLC1 and Caveolin-1 in 490 Cancer Cells. Cancer Res. 72(17): 4405 – 4416.

23 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

491 21. Healy K et al. 2008. DLC-1 Suppresses Non-Small Cell Lung Cancer Growth and 492 Invasion By RhoGAP-Dependent and Independent Mechanisms. Mol Carcinog. 47(5): 493 326 – 337. 494 22. Lin Y et al. 2010. DLC2 modulates angiogenic responses in vascular endothelial cells by 495 regulating cell attachment and migration. Oncogene. 29(20): 3010 – 3016. 496 23. Leung T et al. 2005. Deleted in liver cancer 2 (DLC2) suppresses cell transformation by 497 means of inhibition of RhoA activity. Proc Natl Acad Sci U S A. 102(42): 15207 – 15212. 498 24. Ching Y et al. 2003. Deleted in Liver Cancer (DLC) 2 Encodes a RhoGAP Protein with 499 Growth Suppressor Function and Is Underexpressed in Hepatocellular Carcinoma. The 500 Journal of Biological Chemistry. 278: 10824 – 10830. 501 25. Iyer L, Koonin E, Aravind L. 2001. Adaption of the helix-grip fold for ligand binding and 502 catalysis in the START domain superfamily. PROTEINS: Structure, Function, and 503 Genetics. 43: 134-144. 504 26. Roderick S et al. 2002. Structure of human phosphatidylcholine transfer protein in 505 complex with its ligand. Nat Struct Biol. 9(7): 507-511. 506 27. Murcia M et al. 2006. Modeling the structure of the StART domains of MLN64 and StAR 507 proteins in complex with cholesterol. The Journal of Lipid Research. 47: 2614 – 2630. 508 28. Soccio R, Breslow J. 2003. StAR-related Lipid Transfer (START) Proteins: Mediators of 509 Intracellular Lipid Metabolism. The Journal of Biological Chemistry. 278: 22183 – 22186. 510 29. Alpy F, Tomasetto C. 2005. Give lipids a START: the StAR-related lipid transfer 511 (START) domain in mammals. Journal of Cell Science.118: 2791-2801. 512 30. Raya A et al. 1999. Characterization of a Novel Type of Serine/Threonine Kinase That 513 Specifically Phosphorylates the Human Goodpasture Antigen. The Journal of Biological 514 Chemistry. 274: 12642 – 12649. 515 31. Schrick K et al. 2004. START lipid/sterol-binding domains are amplified in plants and are 516 predominantly associated with homeodomain transcription factors. Genome Biology. 5: 517 R41. 518 32. Tillman M et al. 2020. Allosteric regulation of thioesterase superfamily member 1 by lipid 519 sensor domain binding fatty acids and lysophosphatidylcholine. PNAS. 202003877 520 33. Wang S et al. 2019. YR36/WKS1-mediated phosphorylation of PsbO, an extrinsic 521 member of photosystem II, inhibits photosynthesis and confers stripe rust resistance in 522 wheat. Mol Plant. 12:1639 – 1650. 523 34. Khafif M et al. 2017. An essential role for the VASt domain of the Arabidopsis VAD1 524 protein in the regulation of defense and cell death in response to pathogens. PLoS ONE. 525 12: 1 – 18. 526 35. Kanno et al. 2007. Interacting proteins dictate function of the minimal START domain 527 phosphatidylcholine transfer protein/ StarD2. Journal of Biological Chemistry. 282(42): 528 30728-30736. 529 36. Tate J et al. 2019. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic 530 Acids Research. 47(D1): D941 – D947. 531 37. Capra J, Singh M. 2007. Predicting functionally important residues from sequence 532 conservation. Bioinformatics. 23(15): 1875 – 1882. 533 38. Baker B et al. 2007. Cholesterol Binding Does Not Predict Activity of the Steroidogenic 534 Acute Regulatory Protein, StAR. Journal of Biological Chemistry. 282: 10223 – 10232. 535 39. Castro-Chavez F. 2010. The rule of variation: Amio acid exchange according to the 536 rotating circular genetic code. J Theo Biol. 264(7): 711 – 721.

24 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.20.305201; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

537 40. Dougherty D. 1996. Cation-π Interactions in Chemistry and Biology: A New View of 538 Benen, Phe, Tyr, and Trp. Science. 271:163 – 168. 539 41. Gallivan J, Dougherty D. 1999.Cation-π interactions in structural biology. PNAS. 96(17): 540 9459 – 9464. 541 42. Anoosha P, Sakthivel R, Gromiha M. 2016. Exploring preferred amino acid mutations in 542 cancer genes: Applications to identify potential drug targets. BBA – Molecular Basis of 543 Disease. 1862(2): 155 – 165. 544 43. Popescu N, Goodison S. 2014. Deleted in liver cancer-1 (DLC-1): An emerging 545 metastasis suppressor . Mol Diagn Ther.18(3): 292 – 302. 546 44. Miyakawa T, Fujita Y, Yamaguchi-Shinozaki K, Tanokura M. 2013. Structure and 547 function of abscisic acid receptors. Trends Plant Sci. 18:259-266. 548 45. Vaidya et al. 2019. Dynamic control of plant water use using designed ABA receptor 549 agonists. Science. 366(445): eaaw8848. 550 46. Zimmermann L et al. 2008. A Completely Reimplemented MPI Bioinformatics Toolkit 551 with a New HHpred Sever at its Core. J Mol Biol. 430(15): 2237 – 2243. 552 47. NCBI Resource Coordinators. 2018. Database resources of the National Center for 553 Biotechnology Information. Nucleic Acids Research. 46: D8 – D13. 554 48. Thompson J, Higgins D, Gibson T. 1994. CLUSTAL W: improving the sensitivity of 555 progressive multiple sequence alignment through sequence weighting, position-specific 556 gap penalties and weight matrix choice. Nucleic Acids Res. 22(22): 4673 – 4680. 557 49. Kumar S, Stecher G, Knyaz C, Tamura K. 2018. MEGA X: Molecular Evolutionary 558 Genetics Analysis across Computing Platforms. Mol Biol Evol. 35(6): 1547 – 1549. 559 50. Bodenhofer U, Bonatesta E, Horejs-Kainrath C, Hochreiter S (2015). “msa: an R 560 package for multiple sequence alignment.” Bioinformatics, 31(24), 3997–3999. 561 51. Bridgham J, Carroll S, Thornton J. 2006. Evolution of hormone-receptor complexity by 562 molecular exploitation. Science. 312. 97-101 563 564 565 566 567

25