bioRxiv preprint doi: https://doi.org/10.1101/2021.04.26.441280; this version posted April 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

SARS-CoV-2 Entry Protein TMPRSS2 and Its Homologue, TMPRSS4 Adopts Structural Fold Similar to Blood Coagulation and Complement Pathway Related Proteins

Vijaykumar Yogesh Muley (∗,a), Amit Singh (∗∗,b), Karl Gruber (b), Alfredo Varela-Echavarría (∗,a)

(a) Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, México
(b) Institute of Molecular Biosciences, University of Graz, Graz, Austria

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) utilizes the TMPRSS2 receptor to enter target cells and subsequently causes coronavirus disease 19 (COVID-19). TMPRSS2 belongs to the type II serine proteases of subfamily TMPRSS, which is characterized by the presence of the serine-protease domain. TMPRSS4 is another TMPRSS member, which has a domain architecture similar to TMPRSS2. TMPRSS2 and TMPRSS4 have been shown to be involved in SARS-CoV-2 infection. However, their normal physiological roles have not been explored in detail. In this study, we analyzed the amino acid sequences and predicted 3D structures of TMPRSS2 and TMPRSS4 to understand their functional aspects at the protein domain level. Our results suggest that these proteins are likely to have common functions based on their conserved domain organization. Furthermore, we show that the predicted 3D structure of their serine protease domain has significant similarity to that of plasminogen, which dissolves blood clots, and of other blood coagulation related proteins. Additionally, molecular docking analyses of inhibitors of four blood coagulation and anticoagulation factors show the same high specificity to TMPRSS2 and TMPRSS4 3D structures. Hence, our observations are consistent with the blood coagulopathy observed in COVID-19 patients, and the predicted functions of these proteins based on the sequence and structural analyses offer avenues to better understand this disease and explore therapeutic approaches.

Keywords: Covid19; TMPRSS2; TMPRSS4; Protease; SARS-CoV-2; Blood coagulation factors

1. Introduction

Proteolysis is mediated by a special class of proteins called proteases or peptidases that hydrolyze peptide bonds of their substrate proteins (López-Otín and Overall, 2002). They act as a surveillance system that monitors the turnover of cellular proteins. Hence, they modulate a plethora of cellular processes including cell growth, survival, and death, as well as phagocytosis, signaling pathways and membrane re-modelling (Muley et al., 2019; Puente et al., 2005). In Escherichia coli, 36% (26% with stringent criteria) of proteases belong to the serine protease family (Clausen et al., 2002) and this distribution is estimated to be similar for many organisms. More than two percent of human genes encode proteases (Puente et al., 2005), and 20 of them are classified as type II transmembrane serine proteases (TTSP). TTSPs have a conserved domain organization, which consists of a single-pass transmembrane domain located near the amino-terminal end of the protein spanning through the cytosol and a large extracellular portion at the carboxy-terminus containing the serine protease domain of the chymotrypsin fold (Clausen et al., 2002; Szabo and Bugge, 2008). This fold is characterized by the Ser-His-Asp catalytic triad, which is involved in proteolytic activity. These proteases are widely distributed in prokaryotic and eukaryotic genomes (Clausen et al., 2002; Muley et al., 2019; Puente et al., 2005). Interestingly, the first TTSP member was identified over a century ago by Pavlov due to its essential role in food digestion (Szabo and Bugge, 2008), and it was cloned in 1994, leading to its characterization as a plasma membrane-anchored protein (Kitamoto et al., 1994).

∗ Corresponding author. ∗∗ First author. Email addresses: [email protected]; [email protected] (Vijaykumar Yogesh Muley), [email protected] (Alfredo Varela-Echavarría)

The transmembrane protease, serine 2 (TMPRSS2) and 4 (TMPRSS4) are members of the TTSP family and belong to the hepsin/transmembrane protease/serine (TMPRSS) subfamily of TTSP (Szabo and Bugge, 2008). TMPRSS2 facilitates SARS-CoV-1 and SARS-CoV-2 entry in human cells and plays a critical role in coronavirus disease 19 (COVID-19) (Hoffmann et al., 2020; Hu et al., 2020; Matsuyama et al., 2010). TMPRSS4 was previously characterized as TMPRSS3 (Wallrapp et al., 2000), and along with TMPRSS2 it promotes SARS-CoV-2 infection in human enterocytes (Zang et al., 2020). Its overexpression has been observed in dozens of cancers and it contributes to tumorigenesis and metastasis (Aberasturi and Calvo, 2015; Lee et al., 2016; Villalba et al., 2019). Interestingly, TMPRSS2 and TMPRSS4 have also been shown to act as host cell entry receptors for Influenza virus (Bertram et al., 2010), and TMPRSS2 was further shown to be involved in replication of H7N9 and Influenza viruses in vivo (Sakai et al., 2014). However, their functions are not clearly understood in normal conditions or in viral diseases.

In this study, we analyzed the amino acid sequences and predicted 3D structures of TMPRSS2 and TMPRSS4 to understand their functional aspects at the protein domain level. Our results suggest that these proteins are likely to have common functions based on their conserved domain organization. Furthermore, we show that the predicted 3D structure of their serine protease domain has significant similarity to that of plasminogen, and of other blood coagulation related proteins. Additionally, molecular docking analyses of inhibitors of four blood coagulation and anticoagulation factors show the same high specificity to TMPRSS2 and TMPRSS4 3D structures. Hence, our observations are consistent with the blood coagulopathy observed in COVID-19 patients, and the predicted functions of these proteins based on the sequence and structural analyses offer avenues to better understand this disease and explore therapeutic approaches.


2. Material and methods

2.1. Sequence analysis

Protein sequences of TMPRSS2 and TMPRSS4 from human and their mouse orthologs were obtained from the UniProt database (Bateman et al., 2017). Protein domains were identified using the scanProsite tool from the ProSite database (Castro et al., 2006; Sigrist et al., 2009). Further domain architecture information was obtained from the Genome3D database (Lewis et al., 2015). The TOPCONS web server was used to predict the membrane-spanning region of the proteins (Tsirigos et al., 2015). A multiple sequence alignment of human and mouse proteins was constructed using the MAFFT plugin of the JalView program and visualized using the latter (Katoh et al., 2018; Waterhouse et al., 2009). The sequences of TMPRSS2 and TMPRSS4 were used for searches against the Protein Data Bank (PDB) database using HHPred to find their structural homologs (Berman, 2000; Hildebrand et al., 2009). Phyre2 was used in intensive mode to predict their 3D structures (Kelley et al., 2015). Phyre2 modelled the TMPRSS2 structure using the PDB template structures 4O03_A, 2XRC_D, 6ESO_A, 4DUR_A, 4HZH_B, 1Z8G_A, and 3NXP_A. The same templates, except 3NXP_A, were also used to model the TMPRSS4 structure. The regions composed of the scavenger receptor cysteine-rich (SRCR) and serine protease domains in TMPRSS2 and TMPRSS4 were modelled with high accuracy by Phyre2, which was also supported by HHPred results. The predicted structures belonging to this region were then uploaded to the CATH web server to obtain the structural domain hits from available crystal structures (Dawson et al., 2017). CATH results confirmed the presence of two distinct domains, a large domain with a Greek-key β-barrel fold (Chymotrypsin domain) and an SRCR domain. Then, the 3D protein structure corresponding to this region was compared with the template structures identified by Phyre2 and the top 20 structural homologs obtained from the HHPred search, together comprising 36 unique structures. The domain architectures of the corresponding proteins were extracted using the ProSite database (Sigrist et al., 2009).

2.2. Protein 3D structure analysis

We computed the root mean square deviation (RMSD) between the backbone structure of the protease domain alone, the SRCR domain alone, and both domains of TMPRSS2 and TMPRSS4 and the above-mentioned 36 PDB structures using the align module in PyMOL, with a maximum of 20 iteration cycles and BLOSUM62 as the scoring matrix (Schrödinger, LLC, 2015). The structures of plasminogen (PDB accession, 5UGG) and prothrombin activator (a catalytic domain of prothrombinase, PDB accession, 4BXW) are available in complex with their selective inhibitors YO (trans-4-aminomethylcyclohexanecarbonyl-l-tyrosine-n-octylamide, PDB accession, 89M) and L-Glu-Gly-Arg chloromethyl ketone (PDB accession, 0GJ), respectively (Law et al., 2017; Lechtenberg et al., 2013). These structures were superimposed with TMPRSS2 and TMPRSS4 in the presence and absence of their inhibitors using the PyMOL align tool. We selected the 89M and 0GJ ligands, and also the thrombin inhibitor D-phenylalanyl-N-(3-chlorobenzyl)-L-prolinamide (PDB accession, 22U) from PDB structure 2ZC9 (Baum et al., 2009) and the plasma kallikrein inhibitor N-[(6-amino-2,4-dimethylpyridin-3-yl)methyl]-1-({4-[(1H-pyrazol-1-yl)methyl]phenyl}methyl)-1H-pyrazole-4-carboxamide (PDB accession, 75D) from 6O1S (Partridge et al., 2019), to perform docking studies on the TMPRSS2, TMPRSS4, 4BXW, 5UGG, and 2ANY structures using the Glide docking program (Friesner et al., 2004). Briefly, the LigPrep module in Maestro was employed to generate multiple conformations of the ligands followed by energy minimization (Schrödinger, 2018). The target protein structures were preprocessed to remove bad contacts using the wizard integrated into Maestro. The OPLS force field was used to minimize the protein structure (Jorgensen and Tirado-Rives, 1988). The center of the receptor grid was placed on the center of mass of the catalytic triad (Ser-His-Asp) of the proteins, followed by extra precision Glide docking (Friesner et al., 2006). All structural images were rendered using PyMOL.
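
The placement of the docking grid on the center of mass of the Ser-His-Asp triad can be illustrated with a minimal sketch. It is not part of the Glide workflow; the function name `center_of_mass`, the small mass table, and the toy coordinates are assumptions made for illustration only, and atoms are assumed to be supplied as (element, x, y, z) tuples already extracted from a structure file.

```python
# Approximate standard atomic masses (in daltons) for elements common in proteins.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def center_of_mass(atoms):
    """Return the mass-weighted mean position of (element, x, y, z) tuples."""
    total_mass = 0.0
    sx = sy = sz = 0.0
    for element, x, y, z in atoms:
        mass = ATOMIC_MASS[element]  # raises KeyError for unsupported elements
        total_mass += mass
        sx += mass * x
        sy += mass * y
        sz += mass * z
    if total_mass == 0.0:
        raise ValueError("at least one atom is required")
    return (sx / total_mass, sy / total_mass, sz / total_mass)


# Hypothetical coordinates standing in for atoms of the three triad residues.
toy_triad_atoms = [
    ("O", 1.0, 2.0, 3.0),   # e.g. a serine side-chain oxygen
    ("N", 2.0, 2.5, 3.5),   # e.g. a histidine ring nitrogen
    ("O", 3.0, 1.5, 4.0),   # e.g. an aspartate carboxylate oxygen
]
grid_center = center_of_mass(toy_triad_atoms)
```

In practice the atom list would contain every atom of the three catalytic residues, and the resulting point would be passed to the docking software as the grid center.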

3. Results

3.1. A conserved extracellular domain architecture of TMPRSS2 and TMPRSS4 suggests their related functions

The human TMPRSS2 gene encodes a 492 amino acid long protein compared to the 437 amino acids encoded by TMPRSS4 (Wallrapp et al., 2000). Pairwise global sequence alignment using the Needleman-Wunsch algorithm showed amino acid similarity of 42.2% and identity of 30.3% between them (Madeira et al., 2019; Needleman and Wunsch, 1970). A single-pass transmembrane helix is present in both proteins near their N-termini (Supplementary figure 1). The approximate location of the transmembrane helix is between residues 85 and 105 in TMPRSS2 and between residues 33 and 53 in TMPRSS4, leaving a longer N-terminal sequence of 84 amino acids in TMPRSS2. The N-terminal sequence preceding the transmembrane helix in both proteins is shorter and predicted to be cytoplasmic, while the sequence following the helix is longer and predicted to be extracellular (Supplementary figure 1). As shown in Figure 1A, both proteins have a conserved extracellular domain architecture containing the Low-density lipoprotein (LDL) receptor class A domain (denoted as LDLRA), the scavenger receptor cysteine rich (SRCR) domain and the protease domain of the Peptidase S1A, chymotrypsin family or Trypsin-like serine protease superfamily according to the Pfam and Superfamily databases, respectively (Gough et al., 2001; Sonnhammer et al., 1997). The protease domain, hereafter referred to as serine-protease, adopts a chymotrypsin type structural fold characterized by Greek-key β-barrels. The results pertaining to the extracellular portion are consistent with previous sequence analysis reports (Aberasturi and Calvo, 2015; Szabo and Bugge, 2008; Wallrapp et al., 2000). The chymotrypsin fold is a prototype structural feature of the high temperature requirement A (HtrA) family of Trypsin-like serine proteases, which act as chaperones and are responsible for maintaining protein tertiary structure at high temperature (Clausen et al., 2002).
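
For readers unfamiliar with global alignment, the following minimal sketch shows the Needleman-Wunsch recurrence and how a percent identity can be derived from the resulting alignment. It is an illustration under simplifying assumptions: it uses a flat match/mismatch score and a linear gap penalty, whereas alignment servers typically use a substitution matrix and affine gap penalties, so it would not reproduce the values reported above. The function names, scoring parameters, and toy sequences are hypothetical, and identity is assumed to be counted over the full alignment length.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns as a percentage of the alignment length."""
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


# Toy sequences, not the actual protein sequences.
aligned_a, aligned_b = needleman_wunsch("ACGT", "AGT")
identity = percent_identity(aligned_a, aligned_b)
```

Percent similarity would additionally require a substitution matrix such as BLOSUM62 to decide which non-identical pairs count as similar.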


In contrast to the conserved C-terminal regions, the N-terminal amino acid sequences preceding the transmembrane helix in both proteins differ substantially (Figure 1A). TMPRSS4 has only a stretch of 32 residues at its N-terminus, which does not show similarity to any known structure. In TMPRSS2, however, the equivalent N-terminal region of approximately 85 amino acids is predicted to be structurally homologous to the Delta-retroviral matrix superfamily (CATH superfamily code 1.10.185.10), and within this region, a stretch of amino acids spanning positions 5 to 33 shows similarity to the β-sandwich domain of the Sec23/24 superfamily (CATH superfamily code 2.60.40.1670) according to the Genome3D database annotation (Lewis et al., 2015). The β-sandwich domain of Sec23/24 can also be confirmed with Superfamily database searches (SUPERFAMILY/SCOP database accession: 81995) (Gough et al., 2001). This domain is likely to adopt the Human T-cell Leukemia Virus Type II Matrix Protein structural fold according to the CATH database annotation (Dawson et al., 2017). However, we could not detect homologous sequences for this region in viral genome restricted searches using PSI-BLAST at NCBI, nor did an HHPred search predict similar structures in the PDB database (Hildebrand et al., 2009; Johnson et al., 2008). Therefore, experimental analyses are required to address the functional aspects of both predicted domains.

3.2. Multiple sequence alignment of TMPRSS2 and TMPRSS4 with their mouse orthologs shows highly conserved SRCR and serine protease domains

To understand the amino acid variations in both proteins, we performed a multiple sequence alignment of TMPRSS2 and TMPRSS4 with their mouse orthologs only, since our aim was to confirm whether important amino acid positions are conserved in both proteins. This analysis revealed indels in the N-terminal region of TMPRSS2 and TMPRSS4 that do not affect their conserved transmembrane helix, which is followed by a conserved C-terminal sequence (Figure 1B). The LDLRA domain is located immediately after the transmembrane helix in both proteins, which is consistent with previous studies (Aberasturi and Calvo, 2015; Szabo and Bugge, 2008). This domain, which binds lipoproteins such as LDLs, contains six disulfide-bonded cysteines and a highly conserved cluster of negatively charged amino acids (Bieri et al., 1995; Yamamoto et al., 1984). All six cysteines are conserved in TMPRSS2 and its mouse ortholog, and four are also conserved in TMPRSS4. One indel is adjacent to the LDLRA domain at the N-terminal end in TMPRSS4 and another at the C-terminus of TMPRSS2. The one in TMPRSS4 corresponds to one of its missing cysteine residues, and another cysteine residue is substituted by phenylalanine. Both proteins, however, have conserved calcium binding sites within this LDLRA domain. The bound calcium ion imparts structural integrity to the domain (Bieri et al., 1995), suggesting the presence of LDLRA domain activity in both proteins. This domain is followed in both proteins by a highly conserved SRCR domain and the serine protease domain (Figure 1B). The biochemical functions of SRCR domains have not been established with certainty, but they are likely to mediate protein-protein interactions and ligand binding (Hohenester et al., 1999; Resnick et al., 1994). This domain is found in diverse secreted and membrane bound proteins including regulators of the complement cascades involved in the immune response (Freeman et al., 1990). The catalytic triad of Ser-His-Asp residues responsible for proteolytic activity is conserved in human and mouse (Figure 1B).
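
Checking whether a residue such as cysteine is conserved across all sequences of an alignment amounts to scanning alignment columns. The sketch below illustrates the idea on made-up aligned strings; the function name `conserved_columns` and the toy alignment are hypothetical and do not represent the TMPRSS2/TMPRSS4 alignment of Figure 1B.

```python
def conserved_columns(alignment, residue="C"):
    """Return 0-based column indices where every sequence has `residue`."""
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [
        index
        for index, column in enumerate(zip(*alignment))
        if all(symbol == residue for symbol in column)
    ]


# Toy alignment with gaps ('-'); these are not real protein sequences.
toy_alignment = [
    "AC-DC",
    "GCCDC",
    "ACQ-C",
]
cysteine_columns = conserved_columns(toy_alignment)
```

Columns in which one sequence carries a gap or a substitution, as described above for two of the LDLRA cysteines in TMPRSS4, are excluded from the result.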

Overall, these results reveal that the extracellular region of TMPRSS2 and TMPRSS4 and its domain organization are highly conserved, suggesting that they have related functions.

3.3. TMPRSS2 and TMPRSS4 show homology with complement factors and blood coagulation and anticoagulation related proteins

To identify known structural homologs of both proteins, we queried their sequences against the PDB database using the HHPred webserver. The regions spanning amino acid positions 110 to 491 in TMPRSS2 and 96 to 436 in TMPRSS4 showed significant similarity with several PDB structures (hits) (Figure 1C). We selected the top 20 significant hits for each protein for further analysis (details of search results are provided in Supplementary table 1). The structures 2XRC, 1Z8G, and 2OQ5 were the most closely related to the extracellular region of both proteins (Figure 1C). TMPRSS2 showed the best match with the 2XRC structure of human complement factor I encoded by the CFI gene (Roversi et al., 2011), while TMPRSS4 showed the best match with the 1Z8G structure belonging to another TTSP family member, hepsin (HPN), which is also known as TMPRSS1 (Herter et al., 2005). Hepsin and TMPRSS2 proteolytically cleave the Angiotensin-converting enzyme 2 (ACE2) in a similar manner (Heurich et al., 2014). The 2OQ5 structure is a part of the catalytic domain of the TTSP family member DESC1 (Kyrieleis et al., 2007). DESC1 was shown to activate influenza viruses and coronaviruses in cell culture linked to host cell entry (Zmora et al., 2014). Most remaining HHPred hits belonged to structures of complement factors or blood coagulation and anticoagulation proteins (Table 1). It is noteworthy that not a single structural match was obtained for the cytoplasmic region of either protein, even when their amino acid sequences were queried alone.

The high sequence similarity of TMPRSS2 and TMPRSS4 with known structures in the PDB database allowed modelling of their 3D structures using the Phyre2 web server (Kelley et al., 2015). In both sequences, about 86% of the residues were modelled at more than 90% confidence. The 67 and 61 residues at the N-terminal ends of TMPRSS2 and TMPRSS4, respectively, were modelled ab initio due to the lack of homology with known structures. Therefore, we removed the coordinates of the first 125 and 61 amino acids from the predicted structures of TMPRSS2 and TMPRSS4, respectively, due to their low confidence prediction. Figure 2 shows that the SRCR and serine-protease domains were modelled with high accuracy in both proteins, with preservation of the clustered cysteines and of the catalytic triad of the serine-protease domain, respectively. A similar adjacent two-domain architecture is common in other TTSPs such as TMPRSS5 and Hepsin (Herter et al., 2005; Szabo and Bugge, 2008).

The template structures identified by Phyre2 (see Material and Methods) and the top 20 hits obtained by the HHPred search of TMPRSS2 and TMPRSS4 homologs correspond to a total of 36 structures of 29 proteins from 10 species. As expected, all of these correspond to known proteases and, strikingly, many of them are related to blood coagulation processes, both pro- and anticoagulation (10 of 29 proteins) (Table 1). These proteins include the anticoagulation factor plasminogen, the coagulation factor thrombin and plasma kallikrein. Moreover, five of the identified proteins were linked to immune functions, including Complement factors D and I, involved in the alternative complement pathway of the immune response, as well as the granzymes GZMA, GZMK, and GZMM, required for activation of caspase-independent cell death in cytotoxic T-cells and NK-cells (Ewen et al., 2012). We also observed that the six proteins encoded by the genes F11, F2, KLKB1, PLAU, ACR, and TMPRSS11E are targets of SERPINA5, a plasma serine protease inhibitor with hemostatic roles as a procoagulant, anticoagulant and proinflammatory factor (Yang and Geiger, 2017).

We further analyzed the domain architecture of these proteins. The serine-protease domain is conserved in all homologs, and half of them are also accompanied by other domains, particularly those found in proteins related to blood coagulation (Figure 3). These findings raised the possibility that TMPRSS2 and TMPRSS4 have functions in blood pro- and anticoagulation related processes. We believe that these functions could be performed by their serine protease domain alone, as the thrombin-like snake venom serine protease (UniProt name, VSPSX_GLOSA) has only the protease domain and shows strong blood coagulation activity in vitro (Figure 3) (Wei et al., 2007).

3.4. Structural homology of the predicted structure of TMPRSS2 and TMPRSS4 with plasminogen and enteropeptidases

We used the align module in PyMOL to compute the RMSD between backbone atoms of the predicted structures of TMPRSS2 and TMPRSS4 and each of the above-mentioned 36 structures. More than 30 structures showed high similarity with both predicted structures, with RMSD values of less than 1 Å (Table 2). Among them, the PDB structure 5UGG containing a serine protease domain showed a striking superimposition, with RMSD values of 0.563 Å and 0.499 Å with TMPRSS2 and TMPRSS4, respectively (Figure 4A, C). The 5UGG structure is that of plasminogen (Law et al., 2017). Further underscoring the significance of this finding, along with the overall similarity of their chymotrypsin fold, the coordinates of the active site amino acid triad are almost identical between 5UGG and both predicted structures (Figure 4B, D). The second-best structural alignment of TMPRSS2 was with the 4DGJ structure, with an RMSD value of 0.611 Å, and that of TMPRSS4 was with 3W94, with an RMSD value of 0.532 Å (Table 2). The 4DGJ and 3W94 structures are representative of TTSP enteropeptidases, namely the human TMPRSS15, which is found on the brush border membrane of epithelial cells in the duodenum, and its homolog from the Japanese rice fish Oryzias latipes (UniProt accession A4UWM5), respectively. On the other hand, the TMPRSS4 SRCR domain shows structural similarity with the equivalent domain in hepsin (PDB 1Z8G, with an RMSD value of 0.169 Å), while the TMPRSS2 SRCR domain was more similar to chain D of the Complement factor I domain (PDB 2XRC, with an RMSD value of 0.458 Å) (Table 2) (Herter et al., 2005; Roversi et al., 2011).
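
As a reminder of what the reported values measure, the RMSD over N paired atoms is the square root of the mean squared distance between matched positions. The sketch below computes only this final quantity for two coordinate lists that are assumed to be already superimposed and paired atom by atom; it does not perform the sequence alignment, superposition, or outlier rejection cycles carried out by the PyMOL align module. The function name `rmsd` and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the input units between two paired lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("at least one atom pair is required")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Toy backbone positions in angstroms, not taken from any real structure.
model_backbone = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0)]
reference_backbone = [(0.1, 0.0, 0.0), (1.5, 0.2, 0.0), (3.0, 0.5, 0.3)]
toy_rmsd = rmsd(model_backbone, reference_backbone)
```

Because alignment tools may discard poorly fitting atom pairs during refinement, a reported RMSD can refer to fewer atoms than the full backbone, which should be kept in mind when comparing values across structures.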


Taken together with the HHPred predictions, the above results suggest that TMPRSS2 and TMPRSS4 are likely to adopt a structural core similar to the two TTSP enteropeptidases, TMPRSS15 and Hepsin, the Complement factor I, and plasminogen, since they were most closely related at the sequence and predicted structure level (Heissig et al., 2020; Lo et al., 2020; Risitano et al., 2020). These results are consistent with previous in-silico analyses of the predicted TMPRSS2 structure. Hepsin was used as a template to model the TMPRSS2 structure in three previous studies (Chikhale et al., 2020; Idris et al., 2020; Rahman et al., 2020), while the structures of human TMPRSS15 and its Japanese rice fish ortholog were also used (Hempel et al., 2021; Huggins, 2020).

3.5. TMPRSS2 and TMPRSS4 may be inhibited by thrombin, plasma kallikrein and plasminogen inhibitors

The plasminogen serine protease domain structure showed the best overlap with the TMPRSS2 and TMPRSS4 protease domains. Plasminogen degrades fibrin and thereby dissolves blood clots (Storti and Szwast, 1982), and this activity has been shown to be inhibited by Aprotinin, a polypeptide of 58 amino acid residues from bovine lung (Mahdy and Webster, 2004). Interestingly, Aprotinin also inhibits plasma kallikrein and thrombin, which are involved in blood coagulation and showed a reasonable structural match with the TMPRSS2 and TMPRSS4 serine protease domains. Therefore, we assumed that their selective ligands can also inhibit the activity of TMPRSS2 and TMPRSS4. Structures of the complexes of 5UGG (plasminogen) and 4BXW (prothrombin activator) with their selective inhibitors 89M and 0GJ, respectively, are available in the PDB database (Law et al., 2017; Lechtenberg et al., 2013). In addition, we selected the 22U and 75D molecules, which are selective inhibitors of thrombin (PDB, 2ZC9) and plasma kallikrein (PDB, 6O1S), respectively (Baum et al., 2009; Partridge et al., 2019). The latter structures were treated as positive controls since they were not used as templates for TMPRSS2 and TMPRSS4 structure prediction by Phyre2. In addition, we selected the structure 2ANY of the kallikrein protease family (Tang et al., 2005). These four inhibitor molecules were then used for docking with TMPRSS2, TMPRSS4, 4BXW, 5UGG, and 2ANY using the Glide docking program. We selected 4BXW, 5UGG, and 2ANY among the other structures since they represent thrombin, plasminogen, and kallikrein family serine protease structures and had the lowest RMSD values with the TMPRSS2 and TMPRSS4 structures. The receptor grid was centered on the center of mass of the catalytic triad (Ser-His-Asp) of these proteins. Interestingly, most Glide docking scores for TMPRSS2 and TMPRSS4 with all four inhibitors were as good as with their original receptor molecules (Table 3). The predicted structures of TMPRSS2 and TMPRSS4 had binding energy values below -5 kcal/mol for all molecules except for TMPRSS4, which had a binding energy of around -8 kcal/mol for the inhibitors 0GJ and 89M. Hence, TMPRSS2 scores reflected a close fit for all inhibitors, suggesting that it has a 3D structure similar to blood coagulants as well as anticoagulants. However, TMPRSS4 showed a reasonably high selectivity only for the inhibitors of the blood coagulation factors thrombin and kallikrein. All these reported inhibitors are known to interfere with the catalytic triad of the serine proteases. As shown in Figure 5, although both TMPRSS2 and TMPRSS4 have the same number of bonds with both inhibitors, the latter forms an extra charged interaction with 0GJ using Lys287, which also forms a cation-pi interaction and a hydrogen bond with the 89M ligand, allowing a tighter binding.

In summary, these results suggest that the serine protease domains of TMPRSS2 and TMPRSS4 are likely to have structural cores very similar to those of plasminogen, thrombin, and plasma kallikrein. Hence, they are likely to share effects as blood anticoagulant factors.

4. Discussion

The membrane-bound proteins TMPRSS2 and ACE2 are well-known entry points of SARS-CoV-2. On the other hand, TMPRSS4 has not been well studied in the context of COVID-19 pathogenesis. TMPRSS4 expression on cell membranes has been shown to promote SARS-CoV-1 S protein driven cell-cell fusion similar to that of TMPRSS2 but without the cleavage of the S protein (Glowacka et al., 2011). This suggests that TMPRSS4 activates the S protein independently of its cleavage by an unknown molecular mechanism. TMPRSS2 and TMPRSS4 also activate hemagglutinin, which is indispensable for influenza virus infectivity in lungs (Chaipan et al., 2009). Moreover, both proteins have also been shown to assist SARS-CoV-2 infection in the intestine (Zang et al., 2020). This evidence suggests that TMPRSS2 and TMPRSS4 perform related functions. Therefore, we analyzed the TMPRSS2 and TMPRSS4 proteins at the sequence and predicted structure levels and provide evidence supporting their related functions.

4.1. The N-terminal cytoplasmic region of TMPRSS2 is similar to the domain involved in protein trafficking

354 The N-terminal cytoplasmic region of 85 residues in TMPRSS2 is likely

355 to adopt the structural fold (CATH ID: 1.10.185.10) found in Human T-cell

356 Leukemia Virus Type II Matrix Protein (Dawson et al., 2017). This gag protein

357 is a common feature of all retroviruses and is required for membrane localization

358 of the assembling viral particle and subsequently remains associated to the

359 inner surface of the membrane of the mature virion (Christensen et al., 1996).

360 Interestingly, we also found a hit for another domain in this region belonging to

361 the β-sandwich domain of the Sec23/24 superfamily (CATH ID: 2.60.40.1670).

362 This domain is a prototype of sec23/24 proteins that are part of the multi-

363 subunit complex COPII coat, which is responsible for the selective export of

364 cargo proteins from the endoplasmic reticulum to the Golgi apparatus (Hughes

365 and Stephens, 2008). Both domains seem to be involved in cargo trafficking.

366 However, we did not find homologous sequences for this region in viral genome

367 restricted searches using PSI-BLAST at NCBI, nor in HHPred searches

368 in the PDB database (Hildebrand et al., 2009; Johnson et al., 2008). This is

369 likely due to the fact that structures are more conserved than the sequences and


370 hence it is often observed that the same fold is adopted by proteins even though

371 amino acid sequences differ substantially. Therefore, experimental analyses are

372 required to address the functional aspects of both predicted domains considering

373 their possible roles in viral assembly and trafficking (Christensen et al., 1996;

374 Hughes and Stephens, 2008).

375 4.2. LDLRA domains of TMPRSS2 and TMPRSS4 may be involved in viral

376 entry via receptor-mediated endocytosis

377 The extracellular part of both proteins has highly conserved SCRC, LDLRA,

378 and serine-protease domains in the same order. The LDLRA domains are

379 unstructured and their binding of the calcium ion confers them structural

380 integrity, which in turn allows their binding to cholesterols. The calcium binding

381 site is conserved in both proteins suggesting that they can form an active LDLRA

382 structure. Cholesterol is an essential component of the eukaryotic cell membranes

383 in which it plays a critical role by maintaining their fluidity and hence the barrier

384 between cell and environment. Membrane receptors having LDLRA domains

385 bind LDLs that contain esterified cholesterol and carry them into cells after

386 clustering in clathrin-coated pits by a receptor-mediated endocytosis (Daly et al.,

387 1995). However, the function of the LDLRA domain in TMPRSS2 and TMPRSS4

388 is not clear as of now. It may normally bring LDLs inside cells through receptor-

389 mediated endocytosis as it does with other proteins such as apolipoprotein E

390 (Daly et al., 1995; Brown and Goldstein, 1986), which can be used for the host

391 membrane remodelling. Assuming that viruses exploit host cell machinery to

392 perform their tasks, TMPRSS2 mediated endocytosis can also import SARS-

393 CoV-2 inside cells along with LDLs. Since endocytosis leaves no evidence of

394 virus entry and it can avoid detection of its cargo by immunosurveillance, the

395 endocytic pathway appears to be a common mechanism of host cell entry for

396 many viruses. For example, herpes simplex virus 1 and human immunodeficiency

397 virus 1 are capable of entering directly but often use the endocytic pathways

398 for cell entry (Daecke et al., 2005; Miyauchi et al., 2009; Nicola et al., 2003).

399 The membrane domains with high concentrations of cholesterols known as lipid

400 rafts have also been shown to be targeted by viruses for cell entry (Lingwood

401 and Simons, 2010; Simons and Ikonen, 1997). The importance of the clathrin-mediated

402 endocytic pathway, lipid rafts and the presence of ACE2 receptors in them

403 has been confirmed in SARS-CoV-1 (Glende et al., 2008; Inoue et al., 2007;

404 Wang et al., 2008) as well as in SARS-CoV-2 infections (Li et al., 2021; Nardacci

405 et al., 2021). Furthermore, many RNA viruses harness endocytosis to traffic

406 cholesterol from the remodeled membrane and extracellular medium to generate

407 replication organelles, where cholesterol regulates viral polyprotein processing

408 and genome replication (Ilnytska et al., 2013). The Cholesterol-25-hydroxylase

409 converts cholesterol to 25-hydroxycholesterol and depletes it from host membrane,

410 which in turn has been shown to block SARS-CoV-2 membrane fusion thereby

411 inhibiting infection in lung epithelial cells (Wang et al., 2020). In addition, higher

412 levels of oxidized cholesterols could lead to the induction of a procoagulant state

413 (Kim et al., 2020) or aggravate the formation of atherosclerotic plaques. This

414 would in part explain the blood clotting and breathing problems observed in


415 COVID19 patients (Gu et al., 2021). Therefore, TMPRSS2 and TMPRSS4

416 LDLRA domains must be studied further.

417 4.3. Serine-protease domains of TMPRSS2 and TMPRSS4 show significant

418 similarity with plasminogen and blood coagulation factors

419 The TMPRSS2 and TMPRSS4 3D structures are not available. Hence, we

420 first identified their structural homologs using HHPred, and also modelled 3D

421 structures using Phyre2. We identified 29 structural homologs from 10 species

422 with both approaches. These 3D structures were mapped to the SCRC and

423 serine-protease in both proteins, but no match was found for the LDLRA domain.

424 When we analyzed further the domain architectures of the homologs, we found

425 that the serine-protease domain is conserved in all, and half of them are also

426 accompanied by other domains, particularly those found in proteins related to

427 blood coagulation and anticoagulation. As discussed earlier, the TMPRSS2 and

428 TMPRSS4 serine protease domains are present at their C-termini. The TMPRSS2

429 domain, which is tethered to the outer face of the cell membrane, activates the S

430 protein of SARS-CoV-2 for host cell entry. Among the serine-protease domains

431 of 29 proteins, that of plasminogen showed a striking superimposition with

432 TMPRSS2 and TMPRSS4 and the coordinates of their active site amino acid

433 triad were almost identical. It has also been shown that plasminogen shares 95%

434 identity within the S1–S1’ subsites of TMPRSS2, which are used for cleavage

435 of the SARS-CoV-2 spike protein, and 64.71% within S4–S4’ subsites, which

436 were the highest among 14 serine proteases including TMPRSS15 which was

437 selected for homology modelling of the TMPRSS2 structure (Huggins, 2020).

438 This suggests that the serine-protease domains of the TMPRSS2 and TMPRSS4

439 are likely to have a protease activity similar to that of plasminogen, which

440 dissolves the fibrin of blood clots (Storti and Szwast, 1982). Hence, these

441 findings suggest that TMPRSS2 and TMPRSS4 serine protease domains have

442 similar catalytic properties to those of blood clotting factors further supporting

443 the notion that they are likely involved in blood coagulation and anticoagulation

444 through common mechanisms involving plasminogen, plasma kallikrein and

445 thrombin. Similar to these three blood coagulation related factors, the protease

446 domains of TMPRSS2 and TMPRSS4 are also extracellular and they are

447 synthesized in an inactive zymogen form. Interestingly, TMPRSS4 directly

448 activates the urokinase-type plasminogen activator (pro-uPA) encoded by the

449 PLAU gene through its proteolytic activity, which in turn can cleave zymogen

450 plasminogen to form the active enzyme plasmin (Min et al., 2014). This suggests

451 a potential role of TMPRSS4 in blood clot resolution upstream of plasminogen

452 and pro-uPA, which is one of the TMPRSS2 and TMPRSS4 structural homologs

453 identified in this study. It is noteworthy that pro-uPA and plasminogen are

454 both ligands for the LDLRA domain containing protein families (Liu et al.,

455 2001). Hence additional studies are warranted to determine whether TMPRSS4

456 activates pro-uPA which in turn converts plasminogen to plasmin to resolve

457 blood clots. These sequential reactions are likely to be dependent on the LDLRA

458 domain of TMPRSS4 and we believe TMPRSS2 also performs similar functions.
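
The subsite identities quoted above are percentages of identical residues over a chosen set of aligned positions. A minimal sketch of that calculation follows, assuming two rows of an existing alignment. The function name `percent_identity` and the toy sequences are hypothetical and are not the TMPRSS2 or plasminogen subsite residues.

```python
def percent_identity(seq_a, seq_b, positions=None):
    """Percent of aligned columns that carry the same residue in both sequences.

    seq_a, seq_b: rows of the same alignment (equal length, '-' for gaps).
    positions: optional iterable of 0-based alignment columns to restrict the
    comparison to, for example the columns that make up a subsite.
    Columns with a gap are counted as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    columns = list(range(len(seq_a)) if positions is None else positions)
    if not columns:
        raise ValueError("no positions to compare")
    matches = sum(1 for i in columns if seq_a[i] == seq_b[i] and seq_a[i] != "-")
    return 100.0 * matches / len(columns)


# Toy aligned sequences (hypothetical): 11 of the 17 columns are identical.
toy_a = "ACDEFGHIKLMNPQRST"
toy_b = "ACDEFGHIKLMAAAAAA"

print(round(percent_identity(toy_a, toy_b), 2))
print(percent_identity(toy_a, toy_b, positions=range(11)))
```

As a purely arithmetic illustration, 11 identical residues out of 17 compared positions give 64.71%, and 19 out of 20 give 95%. The actual number of residues in each subsite is defined in the cited analysis, not here.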


459 4.4. Plasminogen, thrombin, and plasma kallikrein inhibitors show selective

460 binding to TMPRSS2 and TMPRSS4 serine-protease active sites.

461 Aprotinin has been shown to inhibit plasminogen, thrombin, and plasma

462 kallikrein. Therefore, we assumed that the inhibitors of these proteins can

463 also inhibit TMPRSS2 and TMPRSS4. To test this, we used 3D structures

464 of four selective inhibitors and performed docking analysis with the protease

465 active sites of TMPRSS2, TMPRSS4, prothrombin activator, plasminogen, and

466 plasma kallikrein structures. As expected, the Glide docking scores with all four

467 inhibitors for TMPRSS2, and to a lesser extent for TMPRSS4, were as good as

468 with their original molecules suggesting that these inhibitors are likely to impair

469 TMPRSS2 and TMPRSS4 protease activity. Moreover, Aprotinin, a polypeptide

470 consisting of 58 amino acid residues from bovine lung inhibits plasminogen,

471 plasma kallikrein and thrombin (Mahdy and Webster, 2004). There is also

472 evidence that it specifically inhibits other serine-proteases including TMPRSS2

473 and TMPRSS4 that cleave hemagglutinin protein of Influenza virus, and aerosol

474 inhalation of Aprotinin is used for the treatment of patients with mild-to-

475 moderate influenza infections (Ovcharenko and Zhirnov, 1994; Zhirnov et al.,

476 2011). Aprotinin has been suggested to inhibit TMPRSS2 as well (Shen et al.,

477 2017).
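
The comparison described above, in which an inhibitor's docking score against TMPRSS2 or TMPRSS4 is judged against its score with its original target, can be sketched as follows. This is only an illustration under stated assumptions: the function `comparable_targets`, the tolerance of 1.0 score units, and every score in the example are hypothetical and are not results of this study. The sketch relies on the Glide convention that more negative scores indicate better predicted binding.

```python
def comparable_targets(scores, native, tolerance=1.0):
    """List receptors that dock a ligand about as well as its native receptor.

    scores: mapping of receptor name -> docking score for one ligand
            (more negative means better predicted binding).
    native: name of the receptor the ligand was originally designed for.
    tolerance: how many score units worse than the native score still counts
               as comparable (an arbitrary threshold for illustration).
    """
    reference = scores[native]
    return sorted(
        receptor
        for receptor, score in scores.items()
        if receptor != native and score <= reference + tolerance
    )


# Hypothetical scores for one inhibitor; not values reported in this work.
example_scores = {"plasminogen": -7.2, "TMPRSS2": -7.0, "TMPRSS4": -5.1}

print(comparable_targets(example_scores, native="plasminogen"))
```

With these invented numbers only TMPRSS2 falls within the tolerance, mirroring the qualitative pattern of a close match for TMPRSS2 and a weaker one for TMPRSS4.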

478 4.5. TMPRSS2 and TMPRSS4 may function at high-temperature during immune

479 response owing to their serine-protease domain

480 Strikingly, the 29 structural homologs of TMPRSS2 and TMPRSS4 we

481 detected can be grouped into a few related functions including 10 in blood

482 coagulation related processes, 5 in immune functions, and the rest in alternative

483 immune response complement pathway, and activation of caspase-independent

484 cell death in cytotoxic T-cells and NK-cells (Ewen et al., 2012). Interestingly,

485 the five proteins encoded by the genes F11, F2, KLKB1, PLAU, and ACR, and

486 also TMPRSS member TMPRSS11E are the targets of SERPINA5, a plasma

487 serine protease inhibitor with hemostatic roles as a procoagulant, anticoagulant

488 and proinflammatory factor (Yang and Geiger, 2017). It is highly possible that

489 TMPRSS2 and TMPRSS4 genes are also targets of SERPINA5. Additionally,

490 it is known that dexamethasone induces expression of the beta isoform of

491 GZMA and represses expression of its alpha isoform, upon binding of the

492 glucocorticoid receptor (Ruike et al., 2007), ultimately leading to apoptotic cell

493 death (Myoumoto et al., 2007). Interestingly, septic shock is an inflammatory

494 response which causes excessive cell death, and critical COVID19 patients have

495 been shown to recover using Dexamethasone (RECOVERY Collaborative Group

496 et al., 2021). Furthermore, anticoagulants have been used to treat COVID19

497 patients with good success (Levi et al., 2020; Violi et al., 2020).

498 One of the hallmarks of immune response is fever, especially upon infec-

499 tions by viruses or other pathogens. Presumably TMPRSS2 and TMPRSS4

500 along with their 29 homologs need to remain stable and functional at

501 high temperature during fever. Intriguingly, their serine-protease domain is a

502 prototype feature of the high temperature requirement A (HtrA) protein family,


503 whose members act as chaperones and are responsible for maintaining protein

504 quality at high temperature (Clausen et al., 2002). The HtrA family is present

505 in the three domains of life and hence, the HtrA mediated cellular response to

506 high temperature is universally conserved (Muley et al., 2019). Null mutants of

507 Escherichia coli HtrA do not survive at elevated temperature due to instability of

508 temperature sensitive proteins caused by the lack of protein quality control measures

509 mediated by HtrA (Lipinska et al., 1990). These observations indicate that

510 the serine-protease domains are likely selected as a part of blood coagulation

511 and anticoagulation, immune response, and alternative complement pathways to

512 perform their functions efficiently during immune responses and keep normal

513 hemostasis at high temperatures. Hence, the allele frequency and nucleotide

514 sequences of these protein coding genes can be expected to vary among the

515 populations originally adapted to cold, hot, and temperate zones (Wang et al.,

516 2020). This could be one of the important factors involved in diverse rates of

517 dispersion of SARS-CoV-2 infection in countries with different populations and

518 temperatures. One cannot ignore the diverse genetic makeup of the world’s

519 population influenced by the surrounding environment, and TMPRSS2 and TM-

520 PRSS4 should be further studied in this regard especially their serine-protease

521 domain.

522 5. Conclusions

523 Our in-silico analysis based on the predicted structures of TMPRSS2 and

524 TMPRSS4 allowed the identification of structural homologs many of which are

525 involved in blood coagulation, immune response, and proteolysis, which are

526 important in the context of immune functions. The similarity of the proteolytic

527 domains of TMPRSS2 and TMPRSS4 to that of the blood clotting factors

528 suggests that their catalytic properties are similar as well. Indeed, the tight

529 docking of known inhibitors for these factors to the catalytic sites of TMPRSS2

530 and TMPRSS4 strongly suggests that their activity is also inhibited. This would

531 in part explain why anticoagulant treatments that have been used to treat

532 COVID19 patients have had good success. In addition to treating the clotting

533 problems in these patients, it is expected that the inhibition of the extracellular

534 domains of TMPRSS2 and TMPRSS4 would inhibit their proteolytic effect on

535 ACE2 thus limiting virus entry into the target cells. Moreover, inhibition of

536 these proteins in platelets might also limit the thrombotic effects of SARS-CoV-2

537 (Zhang et al., 2020). Hence, our studies shed light on a novel mechanism by which

538 anticoagulant treatments might act to limit the severity of COVID19, linking LDL

539 and clotting factors and offer an avenue for further exploration of therapeutic

540 approaches for this disease that is affecting the whole world population.

541 6. Funding

542 Financial support to VYM and AV-E was provided by IA203920 and IN229620

543 DGAPA-UNAM grants respectively. AV-E was also supported by CONACYT-

544 315802 grant. Financial support to AS and KG was provided by the Austrian


545 Science Funds (FWF) through the doc.funds project DOC-46 “Catalox” and the

546 Doctoral Academy Graz of the University of Graz.

547 7. References

548 Aberasturi, A.L. de, Calvo, A., 2015. TMPRSS4: an emerging potential thera-

549 peutic target in cancer. British journal of cancer 112, 4–8. doi:10.1038/bjc.2014.403

550 Bateman, A., Martin, M.J., O’Donovan, C., Magrane, M., Alpi, E., Antunes,

551 R., Bely, B., Bingley, M., Bonilla, C., Britto, R., Bursteinas, B., Bye-A-Jee, H.,

552 Cowley, A., Da Silva, A., De Giorgi, M., Dogan, T., Fazzini, F., Castro, L.G.,

553 Figueira, L., Garmiri, P., Georghiou, G., Gonzalez, D., Hatton-Ellis, E., Li, W.,

554 Liu, W., Lopez, R., Luo, J., Lussi, Y., MacDougall, A., Nightingale, A., Palka,

555 B., Pichler, K., Poggioli, D., Pundir, S., Pureza, L., Qi, G., Rosanoff, S., Saidi,

556 R., Sawford, T., Shypitsyna, A., Speretta, E., Turner, E., Tyagi, N., Volynkin,

557 V., Wardell, T., Warner, K., Watkins, X., Zaru, R., Zellner, H., Xenarios, I.,

558 Bougueleret, L., Bridge, A., Poux, S., Redaschi, N., Aimo, L., Argoud-Puy, G.,

559 Auchincloss, A., Axelsen, K., Bansal, P., Baratin, D., Blatter, M.C., Boeckmann,

560 B., Bolleman, J., Boutet, E., Breuza, L., Casal-Casas, C., De Castro, E., Coudert,

561 E., Cuche, B., Doche, M., Dornevil, D., Duvaud, S., Estreicher, A., Famiglietti,

562 L., Feuermann, M., Gasteiger, E., Gehant, S., Gerritsen, V., Gos, A., Gruaz-

563 Gumowski, N., Hinz, U., Hulo, C., Jungo, F., Keller, G., Lara, V., Lemercier, P.,

564 Lieberherr, D., Lombardot, T., Martin, X., Masson, P., Morgat, A., Neto, T.,

565 Nouspikel, N., Paesano, S., Pedruzzi, I., Pilbout, S., Pozzato, M., Pruess, M.,

566 Rivoire, C., Roechert, B., Schneider, M., Sigrist, C., Sonesson, K., Staehli, S.,

567 Stutz, A., Sundaram, S., Tognolli, M., Verbregue, L., Veuthey, A.L., Wu, C.H.,

568 Arighi, C.N., Arminski, L., Chen, C., Chen, Y., Garavelli, J.S., Huang, H., Laiho,

569 K., McGarvey, P., Natale, D.A., Ross, K., Vinayaka, C.R., Wang, Q., Wang,

570 Y., Yeh, L.S., Zhang, J., 2017. UniProt: The universal protein knowledgebase.

571 Nucleic Acids Research 45, D158–D169. doi:10.1093/nar/gkw1099

572 Baum, B., Muley, L., Heine, A., Smolinski, M., Hangauer, D., Klebe, G., 2009.

573 Think Twice: Understanding the High Potency of Bis(phenyl)methane Inhibitors

574 of Thrombin. Journal of Molecular Biology 391, 552–564. doi:10.1016/j.jmb.2009.06.016

575 Berman, H.M., 2000. The Protein Data Bank. Nucleic Acids Research 28,

576 235–242. doi:10.1093/nar/28.1.235

577 Bertram, S., Glowacka, I., Blazejewska, P., Soilleux, E., Allen, P., Danisch,

578 S., Steffen, I., Choi, S.-Y., Park, Y., Schneider, H., Schughart, K., Pöhlmann,

579 S., 2010. TMPRSS2 and TMPRSS4 Facilitate Trypsin-Independent Spread

580 of Influenza Virus in Caco-2 Cells. Journal of Virology 84, 10016–10025.

581 doi:10.1128/jvi.00239-10

582 Bieri, S., Djordjevic, J.T., Daly, N.L., Smith, R., Kroon, P.A., 1995. Disulfide

583 bridges of a cysteine-rich repeat of the LDL receptor ligand-binding domain.

584 Biochemistry 34, 13059–13065. doi:10.1021/bi00040a017

585 Brown, M.S., Goldstein, J.L., 1986. A receptor-mediated pathway for choles-

586 terol homeostasis. Science 232, 34–47. doi:10.1126/science.3513311

587 Castro, E. de, Sigrist, C.J., Gattiker, A., Bulliard, V., Langendijk-Genevaux,

588 P.S., Gasteiger, E., Bairoch, A., Hulo, N., 2006. ScanProsite: Detection of


589 PROSITE signature matches and ProRule-associated functional and structural

590 residues in proteins. Nucleic Acids Research 34, W362–W365. doi:10.1093/nar/gkl124

591 Chaipan, C., Kobasa, D., Bertram, S., Glowacka, I., Steffen, I., Tsegaye, T.S.,

592 Takeda, M., Bugge, T.H., Kim, S., Park, Y., Marzi, A., Pöhlmann, S., 2009.

593 Proteolytic activation of the 1918 influenza virus hemagglutinin. Journal of

594 virology 83, 3200–11. doi:10.1128/JVI.02205-08

595 Chikhale, R.V., Gupta, V.K., Eldesoky, G.E., Wabaidur, S.M., Patil, S.A.,

596 Islam, M.A., 2020. Identification of potential anti-TMPRSS2 natural prod-

597 ucts through homology modelling, virtual screening and molecular dynamics

598 simulation studies. Journal of Biomolecular Structure and Dynamics 1–16.

599 doi:10.1080/07391102.2020.1798813

600 Christensen, A.M., Massiah, M.A., Turner, B.G., Sundquist, W.I., Sum-

601 mers, M.F., 1996. Three-Dimensional Structure of the HTLV-II Matrix Protein

602 and Comparative Analysis of Matrix Proteins from the Different Classes of

603 Pathogenic Human Retroviruses. Journal of Molecular Biology 264, 1117–1131.

604 doi:10.1006/jmbi.1996.0700

605 Clausen, T., Southan, C., Ehrmann, M., 2002. The HtrA family of proteases:

606 implications for protein composition and cell fate. Molecular cell 10, 443–55.

607 doi:10.1016/s1097-2765(02)00658-5

608 Daecke, J., Fackler, O.T., Dittmar, M.T., Kräusslich, H.-G., 2005. Involve-

609 ment of Clathrin-Mediated Endocytosis in Human Immunodeficiency Virus Type

610 1 Entry. Journal of Virology 79, 1581–1594. doi:10.1128/jvi.79.3.1581-1594.2005

611 Daly, N.L., Scanlon, M.J., Djordjevic, J.T., Kroon, P.A., Smith, R., 1995.

612 Three-dimensional structure of a cysteine-rich repeat from the low-density lipopro-

613 tein receptor. Proceedings of the National Academy of Sciences of the United

614 States of America 92, 6334–6338. doi:10.1073/pnas.92.14.6334

615 Dawson, N.L., Lewis, T.E., Das, S., Lees, J.G., Lee, D., Ashford, P., Orengo,

616 C.A., Sillitoe, I., 2017. CATH: an expanded resource to predict protein func-

617 tion through structure and sequence. Nucleic Acids Research 45, D289–D295.

618 doi:10.1093/nar/gkw1098

619 Ewen, C.L., Kane, K.P., Bleackley, R.C., 2012. A quarter century of

620 granzymes. doi:10.1038/cdd.2011.153

621 Freeman, M., Ashkenas, J., Rees, D.J., Kingsley, D.M., Copeland, N.G.,

622 Jenkins, N.A., Krieger, M., 1990. An ancient, highly conserved family of

623 cysteine-rich protein domains revealed by cloning type I and type II murine

624 macrophage scavenger receptors. Proceedings of the National Academy of

625 Sciences 87, 8810–8814. doi:10.1073/pnas.87.22.8810

626 Friesner, R.A., Banks, J.L., Murphy, R.B., Halgren, T.A., Klicic, J.J., Mainz,

627 D.T., Repasky, M.P., Knoll, E.H., Shelley, M., Perry, J.K., Shaw, D.E., Francis,

628 P., Shenkin, P.S., 2004. Glide: A New Approach for Rapid, Accurate Docking

629 and Scoring. 1. Method and Assessment of Docking Accuracy. Journal of

630 Medicinal Chemistry 47, 1739–1749. doi:10.1021/jm0306430

631 Friesner, R.A., Murphy, R.B., Repasky, M.P., Frye, L.L., Greenwood, J.R.,

632 Halgren, T.A., Sanschagrin, P.C., Mainz, D.T., 2006. Extra precision glide: Dock-

633 ing and scoring incorporating a model of hydrophobic enclosure for protein-ligand


634 complexes. Journal of Medicinal Chemistry 49, 6177–6196. doi:10.1021/jm051256o

635 Glende, J., Schwegmann-Wessels, C., Al-Falah, M., Pfefferle, S., Qu, X., Deng,

636 H., Drosten, C., Naim, H.Y., Herrler, G., 2008. Importance of cholesterol-rich

637 membrane microdomains in the interaction of the S protein of SARS-coronavirus

638 with the cellular receptor angiotensin-converting enzyme 2. Virology 381, 215–

639 221. doi:10.1016/j.virol.2008.08.026

640 Glowacka, I., Bertram, S., Muller, M.A., Allen, P., Soilleux, E., Pfefferle,

641 S., Steffen, I., Tsegaye, T.S., He, Y., Gnirss, K., Niemeyer, D., Schneider, H.,

642 Drosten, C., Pohlmann, S., 2011. Evidence that TMPRSS2 Activates the Severe

643 Acute Respiratory Syndrome Coronavirus Spike Protein for Membrane Fusion

644 and Reduces Viral Control by the Humoral Immune Response. Journal of

645 Virology 85, 4122–4134. doi:10.1128/JVI.02232-10

646 Gough, J., Karplus, K., Hughey, R., Chothia, C., 2001. Assignment of

647 homology to genome sequences using a library of hidden Markov models that

648 represent all proteins of known structure. Journal of Molecular Biology 313,

649 903–919. doi:10.1006/jmbi.2001.5080

650 Gu, S.X., Tyagi, T., Jain, K., Gu, V.W., Lee, S.H., Hwa, J.M., Kwan, J.M.,

651 Krause, D.S., Lee, A.I., Halene, S., Martin, K.A., Chun, H.J., Hwa, J., 2021.

652 Thrombocytopathy and endotheliopathy: crucial contributors to COVID-19

653 thromboinflammation. doi:10.1038/s41569-020-00469-1

654 Heissig, B., Salama, Y., Takahashi, S., Osada, T., Hattori, K., 2020. The

655 multifaceted role of plasminogen in inflammation. Cellular Signalling 75, 109761.

656 doi:10.1016/j.cellsig.2020.109761

657 Hempel, T., Raich, L., Olsson, S., Azouz, N.P., Klingler, A.M., Hoffmann,

658 M., Pöhlmann, S., Rothenberg, M.E., Noé, F., 2021. Molecular mechanism

659 of inhibiting the SARS-CoV-2 cell entry facilitator TMPRSS2 with camostat and

660 nafamostat. Chem. Sci. –. doi:10.1039/D0SC05064D

661 Herter, S., Piper, D.E., Aaron, W., Gabriele, T., Cutler, G., Cao, P., Bhatt,

662 A.S., Choe, Y., Craik, C.S., Walker, N., Meininger, D., Hoey, T., Austin, R.J.,

663 2005. Hepatocyte growth factor is a preferred in vitro substrate for human

664 hepsin, a membrane-anchored serine protease implicated in prostate and ovarian

665 cancers. Biochemical Journal 390, 125–136. doi:10.1042/BJ20041955

666 Heurich, A., Hofmann-Winkler, H., Gierer, S., Liepold, T., Jahn, O., Pohlmann,

667 S., 2014. TMPRSS2 and ADAM17 Cleave ACE2 Differentially and Only Pro-

668 teolysis by TMPRSS2 Augments Entry Driven by the Severe Acute Respira-

669 tory Syndrome Coronavirus Spike Protein. Journal of Virology 88, 1293–1307.

670 doi:10.1128/JVI.02202-13

671 Hildebrand, A., Remmert, M., Biegert, A., Söding, J., 2009. Fast and

672 accurate automatic structure prediction with HHpred. Proteins: Structure,

673 Function, and Bioinformatics 77, 128–132. doi:10.1002/prot.22499

674 Hoffmann, M., Kleine-Weber, H., Schroeder, S., Krüger, N., Herrler, T.,

675 Erichsen, S., Schiergens, T.S., Herrler, G., Wu, N.-H., Nitsche, A., Müller, M.A.,

676 Drosten, C., Pöhlmann, S., 2020. SARS-CoV-2 Cell Entry Depends on ACE2

677 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell

678 181, 271–280.e8. doi:10.1016/j.cell.2020.02.052

679 Hohenester, E., Sasaki, T., Timpl, R., 1999. Crystal structure of a scavenger


680 receptor cysteine-rich domain sheds light on an ancient superfamily. Nature

681 structural biology 6, 228–32. doi:10.1038/6669

682 Hu, B., Guo, H., Zhou, P., Shi, Z.L., 2020. Characteristics of SARS-CoV-2

683 and COVID-19. Nature Reviews Microbiology. doi:10.1038/s41579-020-00459-7

684 Huggins, D.J., 2020. Structural analysis of experimental drugs binding to the

685 SARS-CoV-2 target TMPRSS2. Journal of Molecular Graphics and Modelling

686 100, 107710. doi:10.1016/j.jmgm.2020.107710

687 Hughes, H., Stephens, D.J., 2008. Assembly, organization, and function of the

688 COPII coat. Histochemistry and Cell Biology 129, 129–151. doi:10.1007/s00418-007-0363-x

690 Idris, M.O., Yekeen, A.A., Alakanse, O.S., Durojaye, O.A., 2020. Computer-

691 aided screening for potential TMPRSS2 inhibitors: a combination of pharma-

692 cophore modeling, molecular docking and molecular dynamics simulation ap-

693 proaches. Journal of Biomolecular Structure and Dynamics 1–19. doi:10.1080/07391102.2020.1792346

694 Ilnytska, O., Santiana, M., Hsu, N.Y., Du, W.L., Chen, Y.H., Viktorova,

695 E.G., Belov, G., Brinker, A., Storch, J., Moore, C., Dixon, J.L., Altan-Bonnet,

696 N., 2013. Enteroviruses harness the cellular endocytic machinery to remodel

697 the host cell cholesterol landscape for effective viral replication. Cell Host and

698 Microbe 14, 281–293. doi:10.1016/j.chom.2013.08.002

699 Inoue, Y., Tanaka, N., Tanaka, Y., Inoue, S., Morita, K., Zhuang, M., Hattori,

700 T., Sugamura, K., 2007. Clathrin-Dependent Entry of Severe Acute Respiratory

701 Syndrome Coronavirus into Target Cells Expressing ACE2 with the Cytoplasmic

702 Tail Deleted. Journal of Virology 81, 8722–8729. doi:10.1128/jvi.00253-07

703 Johnson, M., Zaretskaya, I., Raytselis, Y., Merezhuk, Y., McGinnis, S.,

704 Madden, T.L., 2008. NCBI BLAST: a better web interface. Nucleic Acids

705 Research 36, W5–W9. doi:10.1093/nar/gkn201

706 Jorgensen, W.L., Tirado-Rives, J., 1988. The OPLS [optimized potentials

707 for liquid simulations] potential functions for proteins, energy minimizations

708 for crystals of cyclic peptides and crambin. Journal of the American Chemical

709 Society 110, 1657–1666. doi:10.1021/ja00214a001

710 Katoh, K., Rozewicki, J., Yamada, K.D., 2018. MAFFT online service: Mul-

711 tiple sequence alignment, interactive sequence choice and visualization. Briefings

712 in Bioinformatics 20, 1160–1166. doi:10.1093/bib/bbx108

713 Kelley, L.A., Mezulis, S., Yates, C.M., Wass, M.N., Sternberg, M.J., 2015.

714 The Phyre2 web portal for protein modeling, prediction and analysis. Nature

715 Protocols 10, 845–858. doi:10.1038/nprot.2015.053

716 Kim, M., Yoo, H.J., Lee, D., Lee, J.H., 2020. Oxidized LDL induces

717 procoagulant profiles by increasing lysophosphatidylcholine levels, lysophos-

718 phatidylethanolamine levels, and Lp-PLA2 activity in borderline hypercholes-

719 terolemia. Nutrition, Metabolism and Cardiovascular Diseases 30, 1137–1146.

720 doi:10.1016/j.numecd.2020.03.015

721 Kitamoto, Y., Yuan, X., Wu, Q., McCourt, D.W., Sadler, J.E., 1994. En-

722 terokinase, the initiator of intestinal digestion, is a mosaic protease composed of

723 a distinctive assortment of domains. Proceedings of the National Academy of

724 Sciences 91, 7588–7592. doi:10.1073/pnas.91.16.7588

725 Kyrieleis, O.J.P., Huber, R., Ong, E., Oehler, R., Hunter, M., Madison, E.L.,


726 Jacob, U., 2007. Crystal structure of the catalytic domain of DESC1, a new

727 member of the type II transmembrane serine proteinase family. FEBS Journal

728 274, 2148–2160. doi:10.1111/j.1742-4658.2007.05756.x

729 Law, R.H.P., Wu, G., Leung, E.W.W., Hidaka, K., Quek, A.J., Caradoc-

730 Davies, T.T., Jeevarajah, D., Conroy, P.J., Kirby, N.M., Norton, R.S., Tsuda, Y.,

731 Whisstock, J.C., 2017. X-ray crystal structure of plasmin with tranexamic acid–

732 derived active site inhibitors. Blood Advances 1, 766–771. doi:10.1182/bloodadvances.2016004150

733 Lechtenberg, B.C., Murray-Rust, T.A., Johnson, D.J.D., Adams, T.E., Kr-

734 ishnaswamy, S., Camire, R.M., Huntington, J.A., 2013. Crystal structure of

735 the prothrombinase complex from the venom of Pseudonaja textilis. Blood 122,

736 2777–2783. doi:10.1182/blood-2013-06-511733

737 Lee, Y., Ko, D., Min, H.-J., Kim, S.B., Ahn, H.-M., Lee, Y., Kim, S., 2016.

738 TMPRSS4 induces invasion and proliferation of prostate cancer cells through in-

739 duction of Slug and cyclin D1. Oncotarget 7, 50315–50332. doi:10.18632/oncotarget.10382

Levi, M., Thachil, J., Iba, T., Levy, J.H., 2020. Coagulation abnormalities and thrombosis in patients with COVID-19. doi:10.1016/S2352-3026(20)30145-9

Lewis, T.E., Sillitoe, I., Andreeva, A., Blundell, T.L., Buchan, D.W., Chothia, C., Cozzetto, D., Dana, J.M., Filippis, I., Gough, J., Jones, D.T., Kelley, L.A., Kleywegt, G.J., Minneci, F., Mistry, J., Murzin, A.G., Ochoa-Montaño, B., Oates, M.E., Punta, M., Rackham, O.J., Stahlhacke, J., Sternberg, M.J., Velankar, S., Orengo, C., 2015. Genome3D: exploiting structure to help users understand their sequences. Nucleic Acids Research 43, D382–D386. doi:10.1093/nar/gku973

Li, X., Zhu, W., Fan, M., Zhang, J., Peng, Y., Huang, F., Wang, N., He, L., Zhang, L., Holmdahl, R., Meng, L., Lu, S., 2021. Dependence of SARS-CoV-2 infection on cholesterol-rich lipid raft and endosomal acidification. Computational and Structural Biotechnology Journal 19, 1933–1943. doi:10.1016/j.csbj.2021.04.001

Lingwood, D., Simons, K., 2010. Lipid rafts as a membrane-organizing principle. doi:10.1126/science.1174621

Lipinska, B., Zylicz, M., Georgopoulos, C., 1990. The HtrA (DegP) protein, essential for Escherichia coli survival at high temperatures, is an endopeptidase. Journal of Bacteriology 172, 1791–1797. doi:10.1128/jb.172.4.1791-1797.1990

Liu, C.X., Li, Y., Obermoeller-McCormick, L.M., Schwartz, A.L., Bu, G., 2001. The Putative Tumor Suppressor LRP1B, a Novel Member of the Low Density Lipoprotein (LDL) Receptor Family, Exhibits Both Overlapping and Distinct Properties with the LDL Receptor-related Protein. Journal of Biological Chemistry 276, 28889–28896. doi:10.1074/jbc.M102727200

Lo, M.W., Kemper, C., Woodruff, T.M., 2020. COVID-19: Complement, Coagulation, and Collateral Damage. The Journal of Immunology 205, 1488–1495. doi:10.4049/jimmunol.2000644

López-Otín, C., Overall, C.M., 2002. Protease degradomics: A new challenge for proteomics. Nature Reviews Molecular Cell Biology 3, 509–519. doi:10.1038/nrm858

Madeira, F., Park, Y.M., Lee, J., Buso, N., Gur, T., Madhusoodanan, N., Basutkar, P., Tivey, A.R., Potter, S.C., Finn, R.D., Lopez, R., 2019. The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Research. doi:10.1093/nar/gkz268

Mahdy, A., Webster, N., 2004. Perioperative systemic haemostatic agents. British Journal of Anaesthesia 93, 842–858. doi:10.1093/bja/aeh227

Matsuyama, S., Nagata, N., Shirato, K., Kawase, M., Takeda, M., Taguchi, F., 2010. Efficient Activation of the Severe Acute Respiratory Syndrome Coronavirus Spike Protein by the Transmembrane Protease TMPRSS2. Journal of Virology 84, 12658–12664. doi:10.1128/JVI.01542-10

Min, H.J., Lee, M.K., Lee, J.W., Kim, S., 2014. TMPRSS4 induces cancer cell invasion through pro-uPA processing. Biochemical and Biophysical Research Communications 446, 1–7. doi:10.1016/j.bbrc.2014.01.013

Miyauchi, K., Kim, Y., Latinovic, O., Morozov, V., Melikyan, G.B., 2009. HIV Enters Cells via Endocytosis and Dynamin-Dependent Fusion with Endosomes. Cell 137, 433–444. doi:10.1016/j.cell.2009.02.046

Muley, V.Y., Akhter, Y., Galande, S., 2019. PDZ Domains Across the Microbial World: Molecular Link to the Proteases, Stress Response, and Protein Synthesis. Genome Biology and Evolution 11, 644–659. doi:10.1093/gbe/evz023

Myoumoto, A., Nakatani, K., Koshimizu, T.A., Matsubara, H., Adachi, S., Tsujimoto, G., 2007. Glucocorticoid-induced granzyme A expression can be used as a marker of glucocorticoid sensitivity for acute lymphoblastic leukemia therapy. Journal of Human Genetics 52, 328–333. doi:10.1007/s10038-007-0119-4

Nardacci, R., Colavita, F., Castilletti, C., Lapa, D., Matusali, G., Meschi, S., Del Nonno, F., Colombo, D., Capobianchi, M.R., Zumla, A., Ippolito, G., Piacentini, M., Falasca, L., 2021. Evidences for lipid involvement in SARS-CoV-2 cytopathogenesis. Cell Death and Disease 12. doi:10.1038/s41419-021-03527-9

Needleman, S.B., Wunsch, C.D., 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48, 443–453. doi:10.1016/0022-2836(70)90057-4

Nicola, A.V., McEvoy, A.M., Straus, S.E., 2003. Roles for Endocytosis and Low pH in Herpes Simplex Virus Entry into HeLa and Chinese Hamster Ovary Cells. Journal of Virology 77, 5324–5332. doi:10.1128/jvi.77.9.5324-5332.2003

Ovcharenko, A., Zhirnov, O., 1994. Aprotinin aerosol treatment of influenza and paramyxovirus bronchopneumonia of mice. Antiviral Research 23, 107–118. doi:10.1016/0166-3542(94)90038-8

Partridge, J.R., Choy, R.M., Silva-Garcia, A., Yu, C., Li, Z., Sham, H., Metcalf, B., 2019. Structures of full-length plasma kallikrein bound to highly specific inhibitors describe a new mode of targeted inhibition. Journal of Structural Biology 206, 170–182. doi:10.1016/j.jsb.2019.03.001

Puente, X., Sánchez, L., Gutiérrez-Fernández, A., Velasco, G., López-Otín, C., 2005. A genomic view of the complexity of mammalian proteolytic systems. Biochemical Society Transactions 33, 331–334. doi:10.1042/BST0330331

Rahman, N., Basharat, Z., Yousuf, M., Castaldo, G., Rastrelli, L., Khan, H., 2020. Virtual Screening of Natural Products against Type II Transmembrane Serine Protease (TMPRSS2), the Priming Agent of Coronavirus 2 (SARS-CoV-2). Molecules 25, 2271. doi:10.3390/molecules25102271

RECOVERY Collaborative Group, Horby, P., Lim, W.S., Emberson, J.R., Mafham, M., Bell, J.L., Linsell, L., Staplin, N., Brightling, C., Ustianowski, A., Elmahi, E., Prudon, B., Green, C., Felton, T., Chadwick, D., Rege, K., Fegan, C., Chappell, L.C., Faust, S.N., Jaki, T., Jeffery, K., Montgomery, A., Rowan, K., Juszczak, E., Baillie, J.K., Haynes, R., Landray, M.J., 2021. Dexamethasone in Hospitalized Patients with Covid-19. The New England Journal of Medicine 384, 693–704. doi:10.1056/NEJMoa2021436

Resnick, D., Pearson, A., Krieger, M., 1994. The SRCR superfamily: a family reminiscent of the Ig superfamily. Trends in Biochemical Sciences 19, 5–8. doi:10.1016/0968-0004(94)90165-1

Risitano, A.M., Mastellos, D.C., Huber-Lang, M., Yancopoulou, D., Garlanda, C., Ciceri, F., Lambris, J.D., 2020. Complement as a target in COVID-19? Nature Reviews Immunology 20, 343–344. doi:10.1038/s41577-020-0320-7

Roversi, P., Johnson, S., Caesar, J.J.E., McLean, F., Leath, K.J., Tsiftsoglou, S.A., Morgan, B.P., Harris, C.L., Sim, R.B., Lea, S.M., 2011. Structural basis for complement factor I control and its disease-associated sequence polymorphisms. Proceedings of the National Academy of Sciences 108, 12839–12844. doi:10.1073/pnas.1102167108

Ruike, Y., Katsuma, S., Hirasawa, A., Tsujimoto, G., 2007. Glucocorticoid-induced alternative promoter usage for a novel 5’ variant of granzyme A. Journal of Human Genetics 52, 172–178. doi:10.1007/s10038-006-0099-9

Sakai, K., Ami, Y., Tahara, M., Kubota, T., Anraku, M., Abe, M., Nakajima, N., Sekizuka, T., Shirato, K., Suzaki, Y., Ainai, A., Nakatsu, Y., Kanou, K., Nakamura, K., Suzuki, T., Komase, K., Nobusawa, E., Maenaka, K., Kuroda, M., Hasegawa, H., Kawaoka, Y., Tashiro, M., Takeda, M., 2014. The Host Protease TMPRSS2 Plays a Major Role in In Vivo Replication of Emerging H7N9 and Seasonal Influenza Viruses. Journal of Virology 88, 5608–5616. doi:10.1128/JVI.03677-13

Schrödinger, 2018. Maestro | Schrödinger.

Schrödinger, LLC, 2015. The PyMOL molecular graphics system, version 1.8.

Shen, L.W., Mao, H.J., Wu, Y.L., Tanaka, Y., Zhang, W., 2017. TMPRSS2: A potential target for treatment of influenza virus and coronavirus infections. Biochimie 142, 1–10. doi:10.1016/j.biochi.2017.07.016

Sigrist, C.J., Cerutti, L., De Castro, E., Langendijk-Genevaux, P.S., Bulliard, V., Bairoch, A., Hulo, N., 2009. PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Research 38, D161–D166. doi:10.1093/nar/gkp885

Simons, K., Ikonen, E., 1997. Functional rafts in cell membranes. doi:10.1038/42408

Sonnhammer, E.L., Eddy, S.R., Durbin, R., 1997. Pfam: A comprehensive database of protein domain families based on seed alignments. Proteins: Structure, Function and Genetics 28, 405–420. doi:10.1002/(SICI)1097-0134(199707)28:3<405::AID-PROT10>3.0.CO;2-L

Storti, R.V., Szwast, A.E., 1982. Molecular cloning and characterization of Drosophila genes and their expression during embryonic development and in primary muscle cell cultures. Developmental Biology 90, 272–283. doi:10.1016/0012-1606(82)90376-1

Szabo, R., Bugge, T., 2008. Type II transmembrane serine proteases in development and disease. The International Journal of Biochemistry & Cell Biology 40, 1297–1316. doi:10.1016/j.biocel.2007.11.013

Tang, J., Yu, C.L., Williams, S.R., Springman, E., Jeffery, D., Sprengeler, P.A., Estevez, A., Sampang, J., Shrader, W., Spencer, J., Young, W., McGrath, M., Katz, B.A., 2005. Expression, Crystallization, and Three-dimensional Structure of the Catalytic Domain of Human Plasma Kallikrein. Journal of Biological Chemistry 280, 41077–41089. doi:10.1074/jbc.M506766200

Tsirigos, K.D., Peters, C., Shu, N., Käll, L., Elofsson, A., 2015. The TOPCONS web server for consensus prediction of membrane protein topology and signal peptides. Nucleic Acids Research 43, W401–W407. doi:10.1093/nar/gkv485

Villalba, M., Exposito, F., Pajares, M.J., Sainz, C., Redrado, M., Remirez, A., Wistuba, I., Behrens, C., Jantus-Lewintre, E., Camps, C., Montuenga, L.M., Pio, R., Lozano, M.D., Andrea, C. de, Calvo, A., 2019. TMPRSS4: A Novel Tumor Prognostic Indicator for the Stratification of Stage IA Tumors and a Liquid Biopsy Biomarker for NSCLC Patients. Journal of Clinical Medicine 8, 2134. doi:10.3390/jcm8122134

Violi, F., Pastori, D., Cangemi, R., Pignatelli, P., Loffredo, L., 2020. Hypercoagulation and Antithrombotic Treatment in Coronavirus 2019: A New Challenge. Thrombosis and Haemostasis 120, 949–956. doi:10.1055/s-0040-1710317

Wallrapp, C., Hähnel, S., Müller-Pillasch, F., Burghardt, B., Iwamura, T., Ruthenbürger, M., Lerch, M.M., Adler, G., Gress, T.M., 2000. A novel transmembrane serine protease (TMPRSS3) overexpressed in pancreatic cancer. Cancer Research 60, 2602–6.

Wang, H., Yang, P., Liu, K., Guo, F., Zhang, Y., Zhang, G., Jiang, C., 2008. SARS coronavirus entry into host cells through a novel clathrin- and caveolae-independent endocytic pathway. Cell Research 18, 290–301. doi:10.1038/cr.2008.15

Wang, S., Li, W., Hui, H., Tiwari, S.K., Zhang, Q., Croker, B.A., Rawlings, S., Smith, D., Carlin, A.F., Rana, T.M., 2020. Cholesterol 25-Hydroxylase inhibits SARS-CoV-2 and other coronaviruses by depleting membrane cholesterol. The EMBO Journal 39, e106057. doi:10.15252/embj.2020106057

Waterhouse, A.M., Procter, J.B., Martin, D.M.A., Clamp, M., Barton, G.J., 2009. Jalview Version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191. doi:10.1093/bioinformatics/btp033

Wei, W., Zhao, W., Wang, X., Teng, M., Niu, L., 2007. Purification, crystallization and preliminary X-ray diffraction analysis of saxthrombin, a thrombin-like enzyme from Gloydius saxatilis venom. Acta Crystallographica Section F Structural Biology and Crystallization Communications 63, 704–707. doi:10.1107/S1744309107031429

Yamamoto, T., Davis, C., Brown, M.S., Schneider, W.J., Casey, M., Goldstein, J.L., Russell, D.W., 1984. The human LDL receptor: A cysteine-rich protein with multiple Alu sequences in its mRNA. Cell 39, 27–38. doi:10.1016/0092-8674(84)90188-0

Yang, H., Geiger, M., 2017. Cell penetrating SERPINA5 (ProteinC inhibitor, PCI): More questions than answers. Seminars in Cell & Developmental Biology 62, 187–193. doi:10.1016/j.semcdb.2016.10.007

Zang, R., Castro, M.F.G., McCune, B.T., Zeng, Q., Rothlauf, P.W., Sonnek, N.M., Liu, Z., Brulois, K.F., Wang, X., Greenberg, H.B., Diamond, M.S., Ciorba, M.A., Whelan, S.P., Ding, S., 2020. TMPRSS2 and TMPRSS4 promote SARS-CoV-2 infection of human small intestinal enterocytes. Science Immunology 5, eabc3582. doi:10.1126/sciimmunol.abc3582

Zhang, S., Liu, Y., Wang, X., Yang, L., Li, H., Wang, Y., Liu, M., Zhao, X., Xie, Y., Yang, Y., Zhang, S., Fan, Z., Dong, J., Yuan, Z., Ding, Z., Zhang, Y., Hu, L., 2020. SARS-CoV-2 binds platelet ACE2 to enhance thrombosis in COVID-19. Journal of Hematology and Oncology 13, 120. doi:10.1186/s13045-020-00954-7

Zhirnov, O., Klenk, H., Wright, P., 2011. Aprotinin and similar protease inhibitors as drugs against influenza. Antiviral Research 92, 27–36. doi:10.1016/j.antiviral.2011.07.014

Zmora, P., Blazejewska, P., Moldenhauer, A.-S., Welsch, K., Nehlmeier, I., Wu, Q., Schneider, H., Pohlmann, S., Bertram, S., 2014. DESC1 and MSPL Activate Influenza A Viruses and Emerging Coronaviruses for Host Cell Entry. Journal of Virology 88, 12087–12097. doi:10.1128/JVI.01427-14


Table 1: Structural homologs of TMPRSS2 and TMPRSS4 identified by HHPred and Phyre2

| PDB ID | UniProt ID | UniProt name | Protein name | Gene | Organism | Length |
|---|---|---|---|---|---|---|
| 4CRG | P03951 | FA11_HUMAN | Coagulation factor XI (FXI) | F11 | Homo sapiens | 625 |
| 4XDE | P00748 | FA12_HUMAN | Coagulation factor XII | F12 | Homo sapiens | 615 |
| 3NXP, 4O03 | P00734 | THRB_HUMAN | Prothrombin, Coagulation factor II | F2 | Homo sapiens | 622 |
| 6R2W | P08709 | FA7_HUMAN | Coagulation factor VII | F7 | Homo sapiens | 295 |
| 2ANY | P03952 | KLKB1_HUMAN | Plasma kallikrein | KLKB1 | Homo sapiens | 638 |
| 5YC6 | P00749 | UROK_HUMAN | Urokinase-type plasminogen activator | PLAU | Homo sapiens | 431 |
| 5UGG, 4DUR | P00747 | PLMN_HUMAN | Plasminogen | PLG | Homo sapiens | 810 |
| 3H5C | P22891 | PROZ_HUMAN | Vitamin K-dependent protein Z | PROZ | Homo sapiens | 444 |
| 3S69 | Q7SZE1 | VSPSX_GLOSA | Thrombin-like enzyme saxthrombin (SVTLE) | - | Gloydius saxatilis | 258 |
| 4BXW | Q56VR3 | FAXC_PSETE | Venom prothrombin activator pseutarin-C catalytic subunit (PCCS) | - | Pseudonaja textilis | 467 |
| 5NAT | P00746 | CFAD_HUMAN | Complement factor D | CFD | Homo sapiens | 253 |
| 5FCR | P03953 | CFAD_MOUSE | Complement factor D | Cfd | Mus musculus | 259 |
| 2XRC | P05156 | CFAI_HUMAN | Complement factor I | CFI | Homo sapiens | 583 |
| 3GYL | Q16651 | PRSS8_HUMAN | Prostasin | PRSS8 | Homo sapiens | 343 |
| 3NCL | Q9Y5Y6 | ST14_HUMAN | Suppressor of tumorigenicity 14 protein | ST14 | Homo sapiens | 855 |
| 1FIW | Q9GL10 | ACRO_SHEEP | Acrosin | ACR | Ovis aries | 329 |
| 1AO5 | P36368 | EGFB2_MOUSE | Epidermal growth factor-binding protein type B | Egfbp2 | Mus musculus | 261 |
| 3W94 | A4UWM5 | A4UWM5_ORYLA | Enteropeptidase-1 | EP-1 | Oryzias latipes | 1036 |
| 1ORF | P12544 | GRAA_HUMAN | Granzyme A, Cytotoxic T-lymphocyte proteinase 1 | GZMA | Homo sapiens | 262 |
| 1MZA | P49863 | GRAK_HUMAN | Granzyme K, Natural killer cell tryptase-2 | GZMK | Homo sapiens | 264 |
| 2ZGC | P51124 | GRAM_HUMAN | Granzyme M, Natural killer cell granular protease | GZMM | Homo sapiens | 257 |
| 2R0L, 1YC0 | Q04756 | HGFA_HUMAN | Hepatocyte growth factor activator | HGFAC | Homo sapiens | 655 |
| 1Z8G, 1P57 | P05981 | HEPS_HUMAN | Serine protease hepsin, Transmembrane protease serine 1 | HPN | Homo sapiens | 417 |
| 2OQ5 | Q9UL52 | TM11E_HUMAN | Transmembrane protease serine 11E | TMPRSS11E | Homo sapiens | 423 |
| 4DGJ | P98073 | ENTK_HUMAN | Transmembrane protease serine 15, Enteropeptidase | TMPRSS15 | Homo sapiens | 1019 |
| 1YM0 | Q3HR18 | Q3HR18_EISFE | Lumbrokinase F238 | - | Eisenia fetida | 245 |
| 1ELT | Q7SIG3 | ELA1_SALSA | Elastase-1 | - | Salmo salar | 236 |
| 2F91 | Q52V24 | Q52V24_ASTLP | Hepatopancreas trypsin (Fragment) | - | Astacus leptodactylus | 237 |
| 1PQ7 | P35049 | TRYP_FUSOX | Trypsin | - | Fusarium oxysporum | 248 |

Note: Blood coagulation related proteins are highlighted in bold face. A hyphen indicates that the information is not available.

Table 2: Root mean square deviations (RMSD) of the predicted TMPRSS2 and TMPRSS4 3D structures from 36 known structural homologs

| PDB ID (chain) | TMPRSS2: both domains | TMPRSS2: protease domain | TMPRSS2: SRCR domain | TMPRSS4: both domains | TMPRSS4: protease domain | TMPRSS4: SRCR domain |
|---|---|---|---|---|---|---|
| 5UGG A | 0.572 | 0.563 | | 0.499 | 0.492 | |
| 4DGJ A | 0.611 | 0.611 | | 0.584 | 0.584 | |
| 2R0L A | 0.612 | 0.612 | | 0.592 | 0.592 | |
| 3S69 A | 0.613 | 0.613 | | 0.674 | 0.674 | |
| 1YC0 A | 0.634 | 0.638 | | 0.582 | 0.578 | |
| 1YM0 A | 0.645 | 0.645 | | 0.599 | 0.599 | |
| 1P57 B | 0.648 | 0.648 | | 0.687 | 0.679 | |
| 3NXP A | 0.656 | 0.559 | | 1.007 | 0.913 | |
| 3NCL A | 0.659 | 0.667 | | 0.647 | 0.647 | |
| 2ANY A | 0.665 | 0.692 | | 0.534 | 0.534 | |
| 4CRG A | 0.667 | 0.667 | | 0.574 | 0.574 | |
| 1ELT A | 0.692 | 0.692 | | 0.642 | 0.642 | |
| 3W94 A | 0.692 | 0.692 | | 0.532 | 0.532 | |
| 2OQ5 A | 0.693 | 0.684 | | 0.639 | 0.639 | |
| 2F91 A | 0.701 | 0.701 | | 0.677 | 0.677 | |
| 3GYL B | 0.731 | 0.836 | | 0.688 | 0.688 | |
| 1AO5 B | 0.737 | 0.737 | | 0.813 | 0.813 | |
| 1FIW A | 0.740 | 0.740 | | 0.746 | 0.746 | |
| 5YC6 U | 0.749 | 0.749 | | 0.621 | 0.621 | |
| 4XDE A | 0.762 | 0.762 | | 0.683 | 0.683 | |
| 5NAT A | 0.805 | 0.805 | | 0.766 | 0.766 | |
| 6R2W H | 0.806 | 0.806 | | 0.711 | 0.697 | |
| 1Z8G A | 0.814 | 0.818 | 0.869 | 0.801 | 0.679 | 0.169 |
| 2XRC A | 0.832 | 0.684 | 0.996 | 0.821 | 0.813 | 8.525 |
| 5FCR B | 0.836 | 0.836 | | 0.937 | 0.937 | |
| 4BXW A | 0.842 | 0.611 | | 0.747 | 0.746 | |
| 4DUR A | 0.867 | 0.699 | | 0.721 | 0.687 | |
| 2ZGC A | 0.897 | 0.563 | | 1.128 | 1.128 | |
| 1ORF A | 0.912 | 0.612 | | 1.027 | 1.021 | |
| 2XRC D | 0.955 | 0.605 | 0.458 | 0.959 | 0.737 | 8.659 |
| 6ESO A | 1.055 | 0.699 | | 0.594 | 0.534 | |
| 1PQ7 A | 1.116 | 1.116 | | 1.021 | 1.021 | |
| 1MZA A | 1.139 | 0.740 | | 1.359 | 1.097 | |
| 4HZH B | 1.361 | 0.818 | | 1.297 | 1.097 | |
| 3H5C B | 1.369 | 1.178 | | 1.189 | 1.187 | |
| 4O03 A | 19.354 | 0.605 | | 1.205 | 0.890 | |

Note: The table is sorted from the best to the worst RMSD value of TMPRSS2 (both domains) against the compared known structures. The known structures that showed the best RMSD values with the structure of both domains, the protease domain alone, or the SRCR domain alone are highlighted in bold face.
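For readers unfamiliar with the metric reported in Table 2, the following is a minimal sketch of an RMSD calculation. It assumes that the two structures have already been superimposed and that their atoms are paired by index; the superposition step itself is not shown. The function name `rmsd` and the coordinates are hypothetical illustrations and are not taken from the study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired lists of (x, y, z) tuples.

    Assumes the structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    # Sum of squared Euclidean distances over all atom pairs
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical, already-superimposed C-alpha coordinates (in angstroms)
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
template = [(0.0, 0.3, 0.0), (1.5, 0.0, 0.4), (3.0, 0.0, 0.0)]
print(round(rmsd(model, template), 3))
```

Lower values indicate closer agreement between the paired atoms, which is why Table 2 is sorted in ascending order.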


Table 3: Glide docking scores (in kcal/mol) of blood coagulation related protein inhibitors with their known targets (plasminogen, plasma kallikrein, and prothrombin activator) and with the predicted structures of TMPRSS2 and TMPRSS4.

| Protein name or PDB ID | OGJ (a) | 22U (a) | 89M (b) | 7SD (c) |
|---|---|---|---|---|
| TMPRSS2 | -4.851 | -5.123 | -4.332 | -4.602 |
| TMPRSS4 | -8.842 | -5.565 | -8.564 | -5.92 |
| 4BXW (Venom prothrombin activator pseutarin-C catalytic subunit) | -5.985 | -3.283 | -4.145 | -5.12 |
| 5UGG (Plasminogen) | -4.576 | -4.589 | -4.576 | -4.873 |
| 2ANY (Plasma kallikrein, light chain) | -4.45 | -5.11 | -4.238 | -5.489 |

Note: Columns are inhibitors. TMPRSS2 and TMPRSS4 docking scores are based on predicted structures. 4BXW, 5UGG, and 2ANY are PDB structure identifiers whose protease domains were used for docking. All of these structures belong to blood coagulation factors (annotated in brackets). (a) Peptide-like selective thrombin inhibitor; (b) Plasminogen inhibitor; (c) Plasma kallikrein inhibitor.
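The comparison in Table 3 can be reproduced with a short sketch. It assumes the usual Glide convention that a more negative score indicates stronger predicted binding; the scores are copied from Table 3, while the helper names `best_receptor` and `score_gap` are hypothetical and not part of the original analysis.

```python
# Glide docking scores (kcal/mol) copied from Table 3
scores = {
    "TMPRSS2": {"OGJ": -4.851, "22U": -5.123, "89M": -4.332, "7SD": -4.602},
    "TMPRSS4": {"OGJ": -8.842, "22U": -5.565, "89M": -8.564, "7SD": -5.92},
    "4BXW": {"OGJ": -5.985, "22U": -3.283, "89M": -4.145, "7SD": -5.12},
    "5UGG": {"OGJ": -4.576, "22U": -4.589, "89M": -4.576, "7SD": -4.873},
    "2ANY": {"OGJ": -4.45, "22U": -5.11, "89M": -4.238, "7SD": -5.489},
}


def best_receptor(table, inhibitor):
    """Return the receptor with the most negative score for one inhibitor."""
    return min(table, key=lambda receptor: table[receptor][inhibitor])


def score_gap(table, inhibitor, receptor, reference):
    """Score of `receptor` minus score of `reference` for one inhibitor.

    A negative gap means `receptor` scores better than `reference`
    (assumption: more negative is better).
    """
    return table[receptor][inhibitor] - table[reference][inhibitor]


for inhibitor in ("OGJ", "22U", "89M", "7SD"):
    print(inhibitor, best_receptor(scores, inhibitor))
```

Docking scores are approximate rankings rather than measured affinities, so small gaps between receptors should be read with caution.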


Figure 1: Sequence comparison of TMPRSS2 and TMPRSS4 with their mouse orthologs Tmprss2 and Tmprss4. Panel (A) shows the domain architectures of the TMPRSS2 and TMPRSS4 proteins. The following domains with known activity are shown with their amino acid positions: low-density lipoprotein receptor class A (LDLRA_2), scavenger receptor cysteine-rich (SRCR_2), and serine protease, trypsin family (Trypsin). Panel (B) shows the multiple sequence alignment of human TMPRSS2 and TMPRSS4 with their mouse orthologs. The approximate locations of the domains shown in (A) are indicated by bars above the alignment in (B), using the same color code as in (A). The Ser-His-Asp catalytic triad, which is responsible for the proteolytic activity of the Trypsin domain, is conserved in all four sequences and is marked by asterisks in (B). The LDLRA_2 domain, which may have cholesterol transport activity, appears to be functional in TMPRSS2 but is truncated in TMPRSS4. Panel (C) shows the structural homologs of TMPRSS2 and TMPRSS4 in the PDB database.

Figure 2: Predicted 3D structures of TMPRSS2 and TMPRSS4 SRCR and serine-protease domains. The figure shows structural decomposition of predicted proteins. Greek-key β-barrel fold and scavenger receptor cysteine-rich (SRCR) domains are shown in indigo and grey, respectively.


Figure 3: Domain architectures of TMPRSS2 and TMPRSS4 structural homologs. The figure shows the domain architectures of 29 proteins from the PROSITE database. Active sites are shown as red diamonds and disulphide bridges as golden lines.

Proteins shown, in figure order: ACRO_SHEEP, EGFB2_MOUSE, GRAA_HUMAN, GRAK_HUMAN, GRAM_HUMAN, HEPS_HUMAN, Q52V24_ASTLP, ELA1_SALSA, Q3HR18_EISFE, VSPSX_GLOSA, TRYP_FUSOX, PRSS8_HUMAN, CFAD_MOUSE, CFAD_HUMAN, CFAI_HUMAN, FA7_HUMAN, FAXC_PSETE, PROZ_HUMAN, UROK_HUMAN, FA12_HUMAN, HGFA_HUMAN, THRB_HUMAN, PLMN_HUMAN, KLKB1_HUMAN, FA11_HUMAN, TM11E_HUMAN, ST14_HUMAN, A4UWM5_ORYLA, ENTK_HUMAN.

Figure 4: Superimposition of protease domains of predicted 3D structures of TMPRSS2 and TMPRSS4 with other known protease domains. The protease domain of the plasminogen PDB structure 5UGG is superimposed on that of TMPRSS2 (A) and TMPRSS4 (C). The TMPRSS2 and TMPRSS4 domains are shown in green and the 5UGG domain in red. Panels (B) and (D) show the superimposition of the serine-protease catalytic triad (Ser-His-Asp) of TMPRSS2 and TMPRSS4 (green) with that of 5UGG (red). The 5UGG ligand (PDB ID 87M) is shown as blue spheres in (A) and (C).


Figure 5: Docking of TMPRSS2 and TMPRSS4 with known protease inhibitors. TMPRSS2 (A) and TMPRSS4 (B) are shown docked with the peptide-like thrombin inhibitor OGJ, and TMPRSS2 (C) and TMPRSS4 (D) with the tranexamic acid-derived plasminogen inhibitor 89M. The docking score of OGJ was -4.851 kcal/mol with TMPRSS2 and -8.980 kcal/mol with TMPRSS4, whereas that of 89M was -4.332 kcal/mol and -8.564 kcal/mol, respectively. Hydrogen bonds are indicated by pink arrows and cation-pi interactions by red arrows.


Supplementary table 1: Structural homologs of TMPRSS2 and TMPRSS4 sequences identified using HHPred

TMPRSS2 sequence search hits

| PDB ID (chain) | Prob (a) | E-value (b) | Score (c) | SS (d) | Query HMM (e) | Template HMM (f) |
|---|---|---|---|---|---|---|
| 2XRC A | 100.0 | 5.5E-30 | 268.4 | 345 | 110-491 | 201-558(565) |
| 1Z8G A | 100.0 | 1.9E-27 | 235.6 | 349 | 140-492 | 1-363(372) |
| 2OQ5 A | 99.9 | 6.3E-25 | 200.2 | 230 | 256-489 | 1-231(232) |
| 4DGJ A | 99.9 | 9.9E-25 | 199.1 | 233 | 256-489 | 1-235(235) |
| 4BXW A | 99.9 | 2.5E-24 | 216.9 | 344 | 117-491 | 7-412(423) |
| 6R2W H | 99.9 | 3.4E-24 | 197.6 | 234 | 256-491 | 1-237(249) |
| 1P57 B | 99.9 | 3.8E-24 | 198.0 | 236 | 256-492 | 1-246(255) |
| 3W94 A | 99.9 | 4.1E-24 | 194.5 | 233 | 256-489 | 1-235(235) |
| 2ANY A | 99.9 | 9.5E-24 | 192.9 | 236 | 256-492 | 1-239(241) |
| 3H5C B | 99.9 | 1.1E-23 | 203.6 | 296 | 119-489 | 1-317(317) |
| 4XDE A | 99.9 | 6.8E-24 | 197.5 | 237 | 255-492 | 2-247(257) |
| 4CRG A | 99.9 | 7.6E-24 | 192.9 | 234 | 256-490 | 1-237(238) |
| 3NCL A | 99.9 | 9.1E-24 | 193.3 | 232 | 256-489 | 1-240(241) |
| 5NAT A | 99.9 | 1.3E-23 | 192.2 | 230 | 256-492 | 1-231(232) |
| 5FCR B | 99.9 | 1.8E-23 | 191.0 | 231 | 256-492 | 1-232(234) |
| 3GYL B | 99.9 | 1.3E-23 | 195.3 | 235 | 256-490 | 1-243(261) |
| 2F91 A | 99.9 | 1.7E-23 | 191.5 | 231 | 256-489 | 1-237(237) |
| 1ELT A | 99.9 | 1.6E-23 | 192.0 | 228 | 256-488 | 1-236(236) |
| 5YC6 U | 99.9 | 1.7E-23 | 192.1 | 234 | 256-489 | 1-246(246) |
| 2R0L A | 99.9 | 2E-23 | 192.7 | 236 | 256-492 | 1-242(248) |

TMPRSS4 sequence search hits

| PDB ID (chain) | Prob (a) | E-value (b) | Score (c) | SS (d) | Query HMM (e) | Template HMM (f) |
|---|---|---|---|---|---|---|
| 1Z8G A | 100.0 | 1.1E-33 | 270.1 | 340 | 96-436 | 1-362(372) |
| 2XRC A | 100.0 | 1.2E-29 | 256.1 | 372 | 60-436 | 26-558(565) |
| 2OQ5 A | 100.0 | 5.3E-26 | 200.3 | 229 | 205-434 | 1-231(232) |
| 2R0L A | 99.9 | 1.7E-25 | 199.5 | 229 | 205-436 | 1-241(248) |
| 1ORF A | 99.9 | 2.4E-25 | 197.1 | 229 | 205-437 | 1-234(234) |
| 4XDE A | 99.9 | 4.8E-25 | 198.2 | 233 | 203-436 | 1-246(257) |
| 1P57 B | 99.9 | 1.9E-24 | 193.2 | 231 | 205-436 | 1-245(255) |
| 5NAT A | 99.9 | 2.3E-24 | 190.5 | 225 | 205-435 | 1-229(232) |
| 1YC0 A | 99.9 | 2.6E-24 | 196.4 | 242 | 195-437 | 21-277(283) |
| 5YC6 U | 99.9 | 1.7E-24 | 192.0 | 230 | 205-434 | 1-246(246) |
| 1AO5 B | 99.9 | 6.3E-25 | 194.4 | 224 | 205-436 | 1-236(237) |
| 5UGG A | 99.9 | 6.6E-24 | 189.4 | 239 | 194-436 | 6-251(251) |
| 2ZGC A | 99.9 | 3.7E-24 | 190.2 | 229 | 205-436 | 1-231(240) |
| 1YM0 A | 99.9 | 2.6E-24 | 190.9 | 228 | 205-436 | 1-238(238) |
| 1FIW A | 99.9 | 3.4E-24 | 196.3 | 232 | 205-436 | 1-251(290) |
| 4DGJ A | 99.9 | 3.2E-24 | 189.1 | 228 | 205-434 | 1-235(235) |
| 1MZA A | 99.9 | 4.1E-24 | 189.4 | 228 | 204-435 | 2-236(240) |
| 4BXW A | 99.9 | 4.6E-24 | 207.3 | 241 | 194-437 | 159-413(423) |
| 3S69 A | 99.9 | 2.3E-24 | 190.8 | 226 | 205-436 | 1-227(234) |
| 1PQ7 A | 99.9 | 3.8E-24 | 187.3 | 224 | 205-433 | 1-224(224) |

Note: TMPRSS2 and TMPRSS4 protein sequences were searched against the entire PDB database using the HHPred web server. The top 20 hits for each protein are reported with their PDB and chain identifiers. (a) Probability that the target is a true positive; (b) the number of hits expected by chance with a score better than that of the target when scanning the database; (c) raw sequence similarity score; (d) secondary structure similarity score between query and target; (e) range of aligned match states from the query HMM; (f) range of aligned match states from the target HMM.
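The HMM range columns of Supplementary table 1 can be turned into a coverage fraction with a short sketch. It assumes that a range is written as `start-end` and that the optional number in parentheses in the Template HMM column is the total number of match states in the template, as in standard HHPred output. The function names `parse_range` and `template_coverage` are hypothetical.

```python
import re

# Matches strings such as "256-489" or "1-231(232)"
RANGE_PATTERN = re.compile(r"^(\d+)-(\d+)(?:\((\d+)\))?$")


def parse_range(text):
    """Split an HMM range string into (start, end, total); total may be None."""
    match = RANGE_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised range: {text!r}")
    start, end = int(match.group(1)), int(match.group(2))
    total = int(match.group(3)) if match.group(3) else None
    return start, end, total


def template_coverage(text):
    """Fraction of the template spanned by the aligned range (inclusive ends)."""
    start, end, total = parse_range(text)
    if total is None:
        raise ValueError("template length in parentheses is required")
    return (end - start + 1) / total


# Template HMM ranges taken from the TMPRSS2 hits for 2OQ5 A and 2XRC A
print(round(template_coverage("1-231(232)"), 3))
print(round(template_coverage("201-558(565)"), 3))
```

A coverage close to 1 indicates that nearly the whole template chain is spanned by the alignment, whereas a lower value indicates that only part of the template, such as a single domain, is matched.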

Supplementary figure 1: The position of the membrane helix in TMPRSS2 and TMPRSS4. Panels: (A) TMPRSS2, (B) TMPRSS4; x-axis: protein length. The combined analysis report obtained from the TOPCONS web server is shown, in which lower G (free energy) values represent amino acids that are likely to be part of the transmembrane helix. The thick red and blue lines represent the inside and outside topology of the protein, respectively. A transmembrane helix is predicted at the N-termini of the TMPRSS2 (A) and TMPRSS4 (B) protein sequences.


Supplementary figure 2: Superimposition of the SRCR domain of the predicted 3D structures of TMPRSS2 and TMPRSS4 with the scavenger receptor cysteine-rich (SRCR) domain of the 1Z8G PDB structure. The TMPRSS2 (A) and TMPRSS4 (B) SRCR domains are shown in purple and that of 1Z8G in grey.
