bioRxiv preprint doi: https://doi.org/10.1101/2021.04.26.441280; this version posted April 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
SARS-CoV-2 Entry Protein TMPRSS2 and Its Homologue, TMPRSS4 Adopts Structural Fold Similar to Blood Coagulation and Complement Pathway Related Proteins
Vijaykumar Yogesh Muley^(∗,a), Amit Singh^(∗∗,b), Karl Gruber^(b), Alfredo Varela-Echavarría^(∗,a)
(a) Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, México
(b) Institute of Molecular Biosciences, University of Graz, Graz, Austria
Abstract
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) utilizes the TMPRSS2 receptor to enter target human cells and subsequently causes coronavirus disease 2019 (COVID-19). TMPRSS2 belongs to the type II serine proteases of subfamily TMPRSS, which is characterized by the presence of the serine-protease domain. TMPRSS4 is another TMPRSS member, which has a domain architecture similar to TMPRSS2. TMPRSS2 and TMPRSS4 have been shown to be involved in SARS-CoV-2 infection. However, their normal physiological roles have not been explored in detail. In this study, we analyzed the amino acid sequences and predicted 3D structures of TMPRSS2 and TMPRSS4 to understand their functional aspects at the protein domain level. Our results suggest that these proteins are likely to have common functions based on their conserved domain organization. Furthermore, we show that the predicted 3D structure of their serine protease domain has significant similarity to that of plasminogen, which dissolves blood clots, and of other blood coagulation related proteins. Additionally, molecular docking analyses of inhibitors of four blood coagulation and anticoagulation factors show the same high specificity to TMPRSS2 and TMPRSS4 3D structures. Hence, our observations are consistent with the blood coagulopathy observed in COVID-19 patients and their predicted functions based on the sequence and structural analyses offer avenues to understand better and explore therapeutic approaches for this disease.
Keywords: Covid19; TMPRSS2; TMPRSS4; Protease; SARS-CoV-2; Blood coagulation factors
1. Introduction
Proteolysis is mediated by a special class of proteins called proteases or peptidases that hydrolyze peptide bonds of their substrate proteins (López-Otín and Overall, 2002). They act as a surveillance system that monitors the turnover of cellular proteins. Hence, they modulate a plethora of cellular processes including cell growth, survival, and death, as well as phagocytosis, signaling pathways and membrane re-modelling (Muley et al., 2019; Puente et al., 2005). In Escherichia coli, 36% (26% with stringent criteria) of proteases belong to the serine protease family (Clausen et al., 2002) and this distribution is estimated to be similar for many organisms. More than two percent of human genes encode proteases (Puente et al., 2005), and 20 of them are classified as type II transmembrane serine proteases (TTSPs). TTSPs have a conserved domain organization, which consists of a single-pass transmembrane domain located near the amino-terminal end of the protein spanning through the cytosol and a large extracellular portion at the carboxy-terminus containing the serine protease domain of the chymotrypsin fold (Clausen et al., 2002; Szabo and Bugge, 2008). This fold is characterized by the Ser-His-Asp catalytic triad, which is involved in endopeptidase activity. These enzymes are widely distributed in prokaryotic and eukaryotic genomes (Clausen et al., 2002; Muley et al., 2019; Puente et al., 2005). Interestingly, the first TTSP member was identified over a century ago by Pavlov due to its essential role in food digestion (Szabo and Bugge, 2008), and it was cloned in 1994, leading to its characterization as a plasma membrane-anchored protein (Kitamoto et al., 1994).
∗Corresponding Author ∗∗First author Email addresses: [email protected]; [email protected] (Vijaykumar Yogesh Muley), [email protected] (Alfredo Varela-Echavarría)
The transmembrane protease, serine 2 (TMPRSS2) and 4 (TMPRSS4) are members of the TTSP family and belong to the hepsin/transmembrane protease/serine (TMPRSS) subfamily of TTSP (Szabo and Bugge, 2008). TMPRSS2 facilitates SARS-CoV-1 and SARS-CoV-2 entry in human cells and plays a critical role in coronavirus disease 2019 (COVID-19) (Hoffmann et al., 2020; Hu et al., 2020; Matsuyama et al., 2010). TMPRSS4, previously characterized as TMPRSS3 (Wallrapp et al., 2000), promotes SARS-CoV-2 infection in human enterocytes together with TMPRSS2 (Zang et al., 2020). Its overexpression has been observed in dozens of cancers and it contributes to tumorigenesis and metastasis (Aberasturi and Calvo, 2015; Lee et al., 2016; Villalba et al., 2019). Interestingly, TMPRSS2 and TMPRSS4 have also been shown to act as host cell entry receptors for influenza virus (Bertram et al., 2010) and TMPRSS2 was further shown to be involved in replication of H7N9 and influenza viruses in vivo (Sakai et al., 2014). However, their functions are not clearly understood in normal conditions or in viral diseases.
In this study, we analyzed the amino acid sequences and predicted 3D structures of TMPRSS2 and TMPRSS4 to understand their functional aspects at the protein domain level. Our results suggest that these proteins are likely to have common functions based on their conserved domain organization. Furthermore, we show that the predicted 3D structure of their serine protease domain has significant similarity to that of plasminogen, and of other blood coagulation related proteins. Additionally, molecular docking analyses of inhibitors of four blood coagulation and anticoagulation factors show the same high specificity to TMPRSS2 and TMPRSS4 3D structures. Hence, our observations are consistent with the blood coagulopathy observed in COVID-19 patients and their predicted functions based on the sequence and structural analyses offer avenues to understand better and explore therapeutic approaches for this disease.
2. Material and methods
2.1. Sequence analysis
Protein sequences of TMPRSS2 and TMPRSS4 from humans and their mouse orthologs were obtained from the UniProt database (Bateman et al., 2017). Protein domains were identified using the scanProsite tool from the ProSite database (Castro et al., 2006; Sigrist et al., 2009). Further domain architecture information was obtained from the Genome3D database (Lewis et al., 2015). The TOPCONS web server was used to predict the membrane-spanning region of the proteins (Tsirigos et al., 2015). Multiple sequence alignment of human and mouse proteins was constructed using the MAFFT plugin of the JalView program, and visualized using the latter (Katoh et al., 2018; Waterhouse et al., 2009). The sequences of TMPRSS2 and TMPRSS4 were used for searches against the Protein Data Bank (PDB) database using HHPred to find their structural homologs (Berman, 2000; Hildebrand et al., 2009). Phyre2 was used in intensive mode to predict their 3D structures (Kelley et al., 2015). Phyre2 modelled the TMPRSS2 structure using the PDB template structures 4O03_A, 2XRC_D, 6ESO_A, 4DUR_A, 4HZH_B, 1Z8G_A, and 3NXP_A. The same templates, except 3NXP_A, were also used to model the TMPRSS4 structure. The regions composed of the scavenger receptor cysteine-rich (SRCR) and serine protease domains in TMPRSS2 and TMPRSS4 were modelled with high accuracy by Phyre2, which was also supported by HHPred results. The predicted structures belonging to this region were then uploaded to the CATH web server to obtain the structural domain hits from available crystal structures (Dawson et al., 2017). CATH results confirmed the presence of two distinct domains, a large domain with Greek-key β-barrel fold (Chymotrypsin domain) and an SRCR domain. Then, the 3D protein structure corresponding to this region was compared with the template structures identified by Phyre2 and the top 20 structural homologs obtained from the HHPred search, together comprising 36 unique structures. The domain architectures of the corresponding proteins were extracted using the ProSite database (Sigrist et al., 2009).
2.2. Protein 3D structure analysis
We computed the root mean square deviation (RMSD) between the backbone structures of the protease domain alone, the SRCR domain alone, and both domains of TMPRSS2 and TMPRSS4 and those of the above-mentioned 36 PDB structures using the align module in PyMOL, with a maximum of 20 iteration cycles and BLOSUM62 as the scoring matrix (Schrödinger, LLC, 2015). The structures of plasminogen (PDB accession, 5UGG) and prothrombin activator (a catalytic domain of prothrombinase, PDB accession, 4BXW) are available in complex with their selective inhibitors YO (trans-4-aminomethylcyclohexanecarbonyl-l-tyrosine-n-octylamide, PDB accession, 89M) and L-Glu-Gly-Arg chloromethyl ketone (PDB accession, 0GJ), respectively (Law et al., 2017; Lechtenberg et al., 2013). These structures were superimposed with TMPRSS2 and TMPRSS4 in the presence and absence of their inhibitors using the PyMOL align tool. We selected the 89M and 0GJ ligands, and also the thrombin inhibitor D-phenylalanyl-N-(3-chlorobenzyl)-L-prolinamide (PDB accession, 22U) from PDB structure 2ZC9 (Baum et al., 2009) and the plasma kallikrein inhibitor N-[(6-amino-2,4-dimethylpyridin-3-yl)methyl]-1-({4-[(1H-pyrazol-1-yl)methyl]phenyl}methyl)-1H-pyrazole-4-carboxamide (PDB accession, 75D) from 6O1S (Partridge et al., 2019), for docking studies on the TMPRSS2, TMPRSS4, 4BXW, 5UGG, and 2ANY structures using the Glide docking program (Friesner et al., 2004). Briefly, the LigPrep module in Maestro was employed to generate multiple conformations of the ligands followed by energy minimization (Schrödinger, 2018). The target protein structures were preprocessed to remove bad contacts using the wizard integrated into Maestro. The OPLS force field was used to minimize the protein structures (Jorgensen and Tirado-Rives, 1988). The center of the receptor grid was placed on the center of mass of the active-site triad (Ser-His-Asp) of each protein, followed by extra precision Glide docking (Friesner et al., 2006). All structural images were rendered using PyMOL.
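As an illustration of how a receptor grid center can be derived from the catalytic triad, the following minimal Python sketch computes a mass-weighted center from a list of atoms. It is not the Maestro/Glide procedure itself; the function name `center_of_mass`, the mass table, and the example coordinates are assumptions introduced here for illustration only.

```python
# Approximate atomic masses (Da) for elements commonly found in proteins.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def center_of_mass(atoms):
    """Return the mass-weighted mean position of (element, x, y, z) tuples."""
    total_mass = 0.0
    weighted = [0.0, 0.0, 0.0]
    for element, x, y, z in atoms:
        mass = ATOMIC_MASS[element]
        total_mass += mass
        weighted[0] += mass * x
        weighted[1] += mass * y
        weighted[2] += mass * z
    if total_mass == 0.0:
        raise ValueError("no atoms supplied")
    return tuple(w / total_mass for w in weighted)


# Hypothetical coordinates (in Å) standing in for atoms of the Ser, His and
# Asp side chains; real values would come from the modelled structure.
triad_atoms = [
    ("O", 10.0, 4.0, 2.0),   # placeholder for a Ser hydroxyl oxygen
    ("N", 12.0, 6.0, 2.0),   # placeholder for a His imidazole nitrogen
    ("O", 14.0, 4.0, 2.0),   # placeholder for an Asp carboxylate oxygen
]
grid_center = center_of_mass(triad_atoms)
print(grid_center)
```

In practice the atom list would contain all atoms of the three triad residues, and the resulting coordinates would be passed to the docking software as the grid center.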
3. Results
3.1. A conserved extracellular domain architecture of TMPRSS2 and TMPRSS4 suggests their related functions
The human TMPRSS2 gene encodes a 492 amino acid long protein compared to the 437 amino acids encoded by TMPRSS4 (Wallrapp et al., 2000). Pairwise global sequence alignment using the Needleman-Wunsch algorithm showed amino acid similarity of 42.2% and identity of 30.3% between them (Madeira et al., 2019; Needleman and Wunsch, 1970). A single-pass transmembrane helix is present in both proteins near their N-termini (Supplementary figure 1). The transmembrane helix is located approximately between residues 85 and 105 in TMPRSS2 and between residues 33 and 53 in TMPRSS4, leaving a longer N-terminal sequence of 84 amino acids in TMPRSS2. In both proteins, the N-terminal sequence preceding the transmembrane helix is shorter and predicted to be cytoplasmic, while the sequence following it is longer and exposed to the extracellular milieu (Supplementary figure 1). As shown in Figure 1A, both proteins have a conserved extracellular domain architecture containing the Low-density lipoprotein (LDL) receptor class A domain (denoted as LDLRA), the scavenger receptor cysteine rich (SRCR) domain and the protease domain of the Peptidase S1A, chymotrypsin family or Trypsin-like serine protease superfamily according to the Pfam and Superfamily databases, respectively (Gough et al., 2001; Sonnhammer et al., 1997). The protease domain, hereafter referred to as serine-protease, adopts a chymotrypsin type structural fold characterized by Greek-key β-barrels. The results pertaining to the extracellular portion are consistent with the previous sequence analysis reports (Aberasturi and Calvo, 2015; Szabo and Bugge, 2008; Wallrapp et al., 2000). The chymotrypsin fold is a prototype structural feature of the high temperature requirement A (HtrA) protein family of Trypsin-like serine proteases, which act as chaperones and are responsible for maintaining protein tertiary structure at high temperature (Clausen et al., 2002).
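The pairwise global alignment step reported above can be illustrated with a minimal Needleman-Wunsch sketch. It uses a simple match/mismatch score and a linear gap penalty, which are assumptions made for brevity; the alignment tool cited in the text would typically use a substitution matrix and affine gap penalties, so this sketch is not expected to reproduce the reported 42.2% similarity or 30.3% identity. The function names and the toy sequences are illustrative only.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns as a percentage of the alignment length."""
    matches = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * matches / len(aln_a)


# Toy sequences, not taken from TMPRSS2 or TMPRSS4.
aligned = needleman_wunsch("ACGT", "ACT")
print(aligned, percent_identity(*aligned))
```

Note that identity is computed here over the full alignment length including gap columns; other conventions (for example, dividing by the shorter sequence length) give different percentages.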
In contrast to the conserved C-terminal regions, the N-terminal amino acid sequences preceding the transmembrane helix in both proteins differ substantially (Figure 1A). TMPRSS4 has only a stretch of 32 residues at its N-terminus, which does not show similarity to any known structure. In TMPRSS2, however, the equivalent N-terminal region of approximately 85 amino acids is predicted to be structurally homologous to the Delta-retroviral matrix superfamily (CATH superfamily code 1.10.185.10), and within this region, a stretch of amino acids spanning positions 5 to 33 shows similarity to the β-sandwich domain of the Sec23/24 superfamily (CATH superfamily code 2.60.40.1670) according to the Genome3D database annotation (Lewis et al., 2015). The β-sandwich domain of Sec23/24 can also be confirmed with the Superfamily database searches (SUPERFAMILY/SCOP database accession: 81995) (Gough et al., 2001). This domain is likely to adopt the Human T-cell Leukemia Virus Type II Matrix Protein structural fold according to CATH database annotation (Dawson et al., 2017). However, we could not detect homologous sequences for this region in viral genome restricted searches using PSI-BLAST at NCBI, nor did HHPred searches predict similar structures in the PDB database (Hildebrand et al., 2009; Johnson et al., 2008). Therefore, experimental analyses are required to address the functional aspects of both predicted domains.
3.2. Multiple sequence alignment of TMPRSS2 and TMPRSS4 with their mouse orthologs shows highly conserved SRCR and serine protease domains
To understand the amino acid variations in both proteins, we performed multiple sequence alignment of TMPRSS2 and TMPRSS4 with their mouse orthologs only, since our aim was to confirm whether important amino acid positions are conserved in both proteins. This analysis revealed indels in the N-terminal region of TMPRSS2 and TMPRSS4 not affecting their conserved transmembrane helix, which is followed by a conserved C-terminal sequence (Figure 1B). The LDLRA domain is located right next to the membrane helix in both proteins, which is consistent with previous studies (Aberasturi and Calvo, 2015; Szabo and Bugge, 2008). This domain, which binds lipoproteins such as LDLs, contains six disulfide-bonded cysteines and a highly conserved cluster of negatively charged amino acids (Bieri et al., 1995; Yamamoto et al., 1984). All six cysteines are conserved in TMPRSS2 and its mouse ortholog, and four are also conserved in TMPRSS4. One indel is adjacent to the LDLRA domain at the N-terminal end in TMPRSS4 and another at the C-terminus of TMPRSS2. The one in TMPRSS4 corresponds to one of its missing cysteine residues, and another cysteine residue is substituted by phenylalanine. Both proteins, however, have conserved calcium binding sites within this LDLRA domain. The bound calcium ion imparts structural integrity to the domain (Bieri et al., 1995), suggesting the presence of LDLRA domain activity in both proteins. This domain in both proteins is followed by a highly conserved SRCR domain and the serine protease domain (Figure 1B). The biochemical functions of SRCR domains have not been established with certainty but they are likely to mediate protein-protein interactions and ligand binding (Hohenester et al., 1999; Resnick et al., 1994). This domain is found in diverse secreted and membrane bound proteins including regulators of the complement cascades involved in immune response (Freeman et al., 1990). The catalytic triad of Ser-His-Asp residues responsible for the proteolytic activity of the serine protease domain is conserved in human and mouse (Figure 1B).
Overall, these results reveal that the extracellular region of TMPRSS2 and TMPRSS4 and its domain organization are highly conserved, suggesting that they have related functions.
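The kind of conservation check described in this section (whether cysteines or catalytic residues occupy the same alignment column in all four sequences) can be sketched in a few lines of Python. The function name `conserved_columns` and the toy alignment below are hypothetical; they do not reproduce the actual alignment shown in Figure 1B.

```python
def conserved_columns(alignment, residue=None):
    """Return 0-based columns where all sequences share the same non-gap residue.

    alignment: dict mapping sequence names to equal-length gapped strings.
    residue:   if given, report only columns conserved for that residue.
    """
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("aligned sequences must have equal length")
    columns = []
    for index, column in enumerate(zip(*alignment.values())):
        first = column[0]
        if first != "-" and all(c == first for c in column):
            if residue is None or first == residue:
                columns.append(index)
    return columns


# Toy alignment with made-up sequences; names are placeholders only.
toy = {
    "human_A": "MC-DSHC",
    "mouse_A": "MCADSHC",
    "human_B": "MF-DSHC",
    "mouse_B": "MFADSHC",
}
print(conserved_columns(toy))        # all fully conserved columns
print(conserved_columns(toy, "C"))   # columns with a conserved cysteine
```

In the toy example, the second column holds a cysteine in two sequences and a phenylalanine in the other two, so it is not reported as conserved, mirroring the cysteine-to-phenylalanine substitution discussed above.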
3.3. TMPRSS2 and TMPRSS4 show homology with complement system and blood coagulation and anticoagulation related proteins
To identify known structural homologs of both proteins, we queried their sequences in the PDB database using the HHPred webserver. The regions spanning amino acid positions 110 to 491 in TMPRSS2 and 96 to 436 in TMPRSS4 showed significant similarity with several PDB structures (hits) (Figure 1C). We selected the top 20 significant hits for each protein for further analysis (details of search results are provided in the Supplementary table 1). The structures 2XRC, 1Z8G, and 2OQ5 were the most closely related to the extracellular region of both proteins (Figure 1C). TMPRSS2 showed the best match with the 2XRC structure of Human complement factor I encoded by the CFI gene (Roversi et al., 2011), while TMPRSS4 matched best with the 1Z8G structure belonging to another TTSP family member, hepsin (HPN), which is also known as TMPRSS1 (Herter et al., 2005). Hepsin and TMPRSS2 proteolytically cleave the Angiotensin-converting enzyme 2 (ACE2) in a similar manner (Heurich et al., 2014). The 2OQ5 structure is a part of the catalytic domain of TTSP family member DESC1 (Kyrieleis et al., 2007). DESC1 was shown to activate influenza viruses and coronaviruses for host cell entry in cell culture (Zmora et al., 2014). Most remaining HHPred hits belonged to structures of complement factors or blood coagulation and anticoagulation proteins (Table 1). It is noteworthy that not a single structural match was obtained for the cytoplasmic region of both proteins, even when their amino acid sequences were queried alone.
The high sequence similarity of TMPRSS2 and TMPRSS4 with known structures in the PDB database allowed modelling their 3D structures using the Phyre2 web server (Kelley et al., 2015). In both sequences, about 86% of the residues were modelled at more than 90% confidence. The 67 and 61 residues at the N-terminal ends of TMPRSS2 and TMPRSS4, respectively, were modelled ab initio due to the lack of homology with known structures. We therefore removed the low-confidence coordinates of the first 125 and 61 amino acids from the predicted structures of TMPRSS2 and TMPRSS4, respectively. Figure 2 shows that the SRCR and serine-protease domains were modelled with high accuracy in both proteins with the preservation of clustered cysteines and the catalytic triad of the serine-protease domain, respectively. A similar adjacent two-domain architecture is common in other TTSPs such as TMPRSS5 and Hepsin (Herter et al., 2005; Szabo and Bugge, 2008).
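Removing the low-confidence N-terminal coordinates from a model file is a simple filtering step. The sketch below shows one possible way to do it for PDB-format text, relying on the fixed-column layout of ATOM/HETATM records (residue number in columns 23 to 26). The function names and the toy records are assumptions for illustration; the text does not state which tool was used for this step.

```python
def drop_leading_residues(pdb_lines, first_kept):
    """Keep coordinate records whose residue number is >= first_kept.

    Non-coordinate records (headers, TER, END, ...) are passed through unchanged.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            residue_number = int(line[22:26])  # PDB columns 23-26
            if residue_number < first_kept:
                continue
        kept.append(line)
    return kept


def toy_atom(serial, residue_number):
    """Build a placeholder CA record in PDB column layout (dummy coordinates)."""
    return (f"ATOM  {serial:>5}  CA  GLY A{residue_number:>4}"
            "       0.000   0.000   0.000")


# Toy model containing residues 124-127; dropping the first 125 residues
# corresponds to keeping residue 126 onwards (62 onwards for a 61-residue cut).
model = [toy_atom(i, i) for i in (124, 125, 126, 127)] + ["END"]
trimmed = drop_leading_residues(model, first_kept=126)
print(len(trimmed))
```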
The template structures identified by Phyre2 (see Material and Methods) and the top 20 hits obtained by the HHPred search of TMPRSS2 and TMPRSS4 homologs correspond to a total of 36 structures of 29 proteins from 10 species. As expected, all of these correspond to known proteases and, strikingly, many of them are related to blood coagulation processes, both pro- and anticoagulation (10 of 29 proteins) (Table 1). These proteins include the anticoagulation factor plasminogen, the coagulation factor thrombin and plasma kallikrein. Moreover, five of the identified proteins were linked to immune functions, including the Complement factors D and I involved in the alternative complement pathway of the immune response, as well as the Granzymes GZMA, GZMK, and GZMM required for activation of caspase-independent cell death in cytotoxic T-cells and NK-cells (Ewen et al., 2012). We also observed that the six proteins encoded by the genes F11, F2, KLKB1, PLAU, ACR, and TMPRSS11E are the targets of SERPINA5, a plasma serine protease inhibitor with hemostatic roles as a procoagulant, anticoagulant and proinflammatory factor (Yang and Geiger, 2017).
We further analyzed the domain architecture of these proteins. The serine-protease domain is conserved in all homologs, and half of them are also accompanied by other domains, particularly those found in proteins related to blood coagulation (Figure 3). These findings raised the possibility that TMPRSS2 and TMPRSS4 have functions in blood pro- and anticoagulation related processes. We believe that these functions could be performed by their serine protease domain alone as the thrombin-like snake venom serine protease (UniProt name, VSPSX_GLOSA) has only the protease domain and it shows strong blood coagulation activity in vitro (Figure 3) (Wei et al., 2007).
3.4. Structural homology of the predicted structure of TMPRSS2 and TMPRSS4 with plasminogen and enteropeptidases
We used the align module in PyMOL to compute the RMSD between backbone atoms of the predicted structures of TMPRSS2 and TMPRSS4 and each of the above-mentioned 36 structures. More than 30 structures showed high similarity with both predicted structures, with RMSD values of less than 1 Å (Table 2). Among them, the PDB structure 5UGG containing a serine protease domain showed a striking superimposition with RMSD values of 0.563 Å and 0.499 Å with TMPRSS2 and TMPRSS4, respectively (Figure 4A, C). The 5UGG structure belongs to plasminogen (Law et al., 2017). Further underscoring the significance of this finding, along with the overall similarity of their chymotrypsin fold, the coordinates of the active site amino acid triad are almost identical between 5UGG and both predicted structures (Figure 4B, D). The second-best structural alignment of TMPRSS2 was with the 4DGJ structure with an RMSD value of 0.611 Å, and that of TMPRSS4 was with 3W94 with an RMSD value of 0.532 Å (Table 2). The 4DGJ and 3W94 structures are representative of TTSP enteropeptidases: the human TMPRSS15, which is found on the brush border membrane of epithelial cells in the duodenum, and its homolog from the Japanese rice fish Oryzias latipes (UniProt accession A4UWM5), respectively. On the other hand, the TMPRSS4 SRCR domain shows structural similarity with the equivalent domain in hepsin (PDB 1Z8G, with an RMSD value of 0.169 Å), while the TMPRSS2 SRCR domain was more similar to chain D of the Complement factor I domain (PDB 2XRC, with an RMSD value of 0.458 Å) (Table 2) (Herter et al., 2005; Roversi et al., 2011).
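For readers unfamiliar with the metric, the RMSD values quoted above are the square root of the mean squared distance between paired atoms after superposition. The minimal sketch below computes this quantity for two coordinate lists that are assumed to be already superimposed and paired one-to-one; it does not perform the sequence alignment, superposition, or iterative outlier rejection that the PyMOL align module carries out, and the function name and coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy backbone coordinates (Å): every atom of the second set is shifted by 1 Å.
model_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference_atoms = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd(model_atoms, reference_atoms))
```

Because the align procedure rejects poorly fitting atom pairs over its refinement cycles, the reported values reflect the well-superimposed core rather than every backbone atom.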
Taken together with the HHPred predictions, the above results suggest that TMPRSS2 and TMPRSS4 are likely to adopt a structural core similar to those of the TTSPs TMPRSS15 (enteropeptidase) and hepsin, Complement factor I, and plasminogen, since these were most closely related at the sequence and predicted structure levels (Heissig et al., 2020; Lo et al., 2020; Risitano et al., 2020). These results are consistent with previous in-silico analyses of the predicted TMPRSS2 structure. Hepsin was used as a template to model the TMPRSS2 structure in three previous studies (Chikhale et al., 2020; Idris et al., 2020; Rahman et al., 2020), while human TMPRSS15 and its Japanese rice fish ortholog structures were used in others (Hempel et al., 2021; Huggins, 2020).
3.5. TMPRSS2 and TMPRSS4 may be inhibited by thrombin, plasma kallikrein and plasminogen inhibitors
The plasminogen serine protease domain structure showed the best overlap with TMPRSS2 and TMPRSS4 protease domains. Plasminogen dissolves fibrin and thereby blood clots (Storti and Szwast, 1982), and this activity has been shown to be inhibited by Aprotinin, a polypeptide consisting of 58 amino acid residues from bovine lung (Mahdy and Webster, 2004). Interestingly, Aprotinin also inhibits plasma kallikrein and thrombin, which are involved in blood coagulation and showed a reasonable structural match with TMPRSS2 and TMPRSS4 serine protease domains. Therefore, we assumed that their selective ligands can also inhibit the activity of TMPRSS2 and TMPRSS4. Structures of the complexes of 5UGG (plasminogen) and 4BXW (prothrombin activator) with their selective inhibitors 89M and 0GJ, respectively, are available in the PDB database (Law et al., 2017; Lechtenberg et al., 2013). In addition, we selected the 22U and 75D molecules, which are selective inhibitors of thrombin (PDB, 2ZC9) and plasma kallikrein (PDB, 6O1S), respectively (Baum et al., 2009; Partridge et al., 2019). The latter structures were treated as positive controls since they were not used as templates for TMPRSS2 and TMPRSS4 structure prediction by Phyre2. In addition, we selected the structure 2ANY of the kallikrein protease family (Tang et al., 2005). These four inhibitor molecules were then used for docking with TMPRSS2, TMPRSS4, 4BXW, 5UGG, and 2ANY using the Glide docking program. We selected 4BXW, 5UGG, and 2ANY among the other structures since they represent thrombin, plasminogen, and kallikrein family serine protease structures and they had the lowest RMSD values with the TMPRSS2 and TMPRSS4 structures. The receptor grid was centered on the center of mass of the catalytic triad (Ser-His-Asp) of these proteins. Interestingly, most Glide docking scores for TMPRSS2 and TMPRSS4 with all four inhibitors were as good as with their original receptor molecules (Table 3). The predicted structures of TMPRSS2 and TMPRSS4 had binding energy values below -5 kcal/mol for all molecules except for TMPRSS4, which had a binding energy of around -8 kcal/mol for the inhibitors 0GJ and 89M. Hence, TMPRSS2 scores reflected a close fit for all inhibitors, suggesting that it has a 3D structure similar to blood coagulants as well as anticoagulants. However, TMPRSS4 showed a reasonably high selectivity only for the inhibitors of the blood coagulation factors thrombin and kallikrein. All these reported inhibitors are known to interfere with the catalytic triad of the serine proteases. As shown in Figure 5, although both TMPRSS2 and TMPRSS4 have the same number of bonds with both inhibitors, the latter imparts an extra charged interaction with 0GJ using Lys287, which also forms a cation-pi and hydrogen bond with the 89M ligand, allowing a tighter binding.
In summary, these results suggest that the serine protease domains of TMPRSS2 and TMPRSS4 are likely to have structural cores very similar to those of plasminogen, thrombin, and plasma kallikrein. Hence, they are likely to share effects as blood anticoagulant factors.
4. Discussion
The membrane-bound proteins TMPRSS2 and ACE2 are well known entry points of SARS-CoV-2. On the other hand, TMPRSS4 has not been well studied in the context of COVID-19 pathogenesis. TMPRSS4 expression on cell membranes has been shown to promote SARS-CoV-1 S protein driven cell-cell fusion similar to that of TMPRSS2 but without the cleavage of the S protein (Glowacka et al., 2011). This suggests that TMPRSS4 activates the S protein independently of its cleavage by an unknown molecular mechanism. TMPRSS2 and TMPRSS4 also activate hemagglutinin, which is indispensable for influenza virus infectivity in lungs (Chaipan et al., 2009). Moreover, both proteins have also been shown to assist SARS-CoV-2 infection in the intestine (Zang et al., 2020). This evidence suggests that TMPRSS2 and TMPRSS4 perform related functions. Therefore, we analyzed TMPRSS2 and TMPRSS4 proteins at the sequence and predicted structure levels and provide evidence supporting their related functions.
352 4.1. The N-terminal cytoplasmic region of TMPRSS2 is similar to the domain
353 involved in protein trafficking
354 The N-terminal cytoplasmic region of 85 residues in TMPRSS2 is likely
355 to adopt the structural fold (CATH ID: 1.10.185.10) found in Human T-cell
356 Leukemia Virus Type II Matrix Protein (Dawson et al., 2017). This gag protein
357 is a common feature of all retroviruses and is required for membrane localization
358 of the assembling viral particle and subsequently remains associated to the
359 inner surface of the membrane of the mature virion (Christensen et al., 1996).
360 Interestingly, we also found a hit for another domain in this region belonging to
361 the β-sandwich domain of the Sec23/24 superfamily (CATH ID: 2.60.40.1670).
362 This domain is a prototype of sec23/24 proteins that are part of the multi-
363 subunit complex COPII coat, which is responsible for the selective export of
364 cargo proteins from the endoplasmic reticulum to the Golgi apparatus (Hughes
365 and Stephens, 2008). Both domains seem to be involved in cargo trafficking.
366 However, we did not find homologous sequences for this region in searches
367 restricted to viral genomes using PSI-BLAST at NCBI, nor in HHpred searches
368 against the PDB database (Hildebrand et al., 2009; Johnson et al., 2008). This is
369 likely because structures are more conserved than sequences, and
370 hence it is often observed that the same fold is adopted by proteins even though
371 amino acid sequences differ substantially. Therefore, experimental analyses are
372 required to address the functional aspects of both predicted domains considering
373 their possible roles in viral assembly and trafficking (Christensen et al., 1996;
374 Hughes and Stephens, 2008).
375 4.2. LDLRA domains of TMPRSS2 and TMPRSS4 may be involved in viral
376 entry via receptor-mediated endocytosis
377 The extracellular part of both proteins has highly conserved SCRC, LDLRA,
378 and serine-protease domains in the same order. The LDLRA domains are
379 unstructured, and their binding of a calcium ion confers structural
380 integrity, which in turn allows them to bind cholesterol. The calcium binding
381 site is conserved in both proteins suggesting that they can form an active LDLRA
382 structure. Cholesterol is an essential component of the eukaryotic cell membranes
383 in which it plays a critical role by maintaining their fluidity and hence the barrier
384 between cell and environment. Membrane receptors having LDLRA domains
385 bind LDLs that contain esterified cholesterol and carry them into cells after
386 clustering in clathrin-coated pits by receptor-mediated endocytosis (Daly et al.,
387 1995). However, the function of the LDLRA domain in TMPRSS2 and TMPRSS4
388 is not yet clear. It may normally bring LDLs inside cells through receptor-mediated
389 endocytosis as it does with other proteins such as apolipoprotein E
390 (Brown and Goldstein, 1986; Daly et al., 1995), which can be used for host
391 membrane remodelling. Assuming that viruses exploit host cell machinery to
392 perform their tasks, TMPRSS2-mediated endocytosis could also import SARS-CoV-2
393 inside cells along with LDLs. Since endocytosis leaves no evidence of
394 virus entry and it can avoid detection of its cargo by immunosurveillance, the
395 endocytic pathway appears to be a common mechanism of host cell entry for
396 many viruses. For example, herpes simplex virus 1 and human immunodeficiency
397 virus 1 are capable of entering directly but often use the endocytic pathways
398 for cell entry (Daecke et al., 2005; Miyauchi et al., 2009; Nicola et al., 2003).
399 The membrane domains with high concentrations of cholesterols known as lipid
400 rafts have also been shown to be targeted by viruses for cell entry (Lingwood
401 and Simons, 2010; Simons and Ikonen, 1997). The importance of the clathrin-mediated
402 endocytic pathway, lipid rafts and the presence of ACE2 receptors in them
403 has been confirmed in SARS-CoV-1 (Glende et al., 2008; Inoue et al., 2007;
404 Wang et al., 2008) as well as in SARS-CoV-2 infections (Li et al., 2021; Nardacci
405 et al., 2021). Furthermore, many RNA viruses harness endocytosis to traffic
406 cholesterol from the remodeled membrane and extracellular medium to generate
407 replication organelles, where cholesterol regulates viral polyprotein processing
408 and genome replication (Ilnytska et al., 2013). Cholesterol 25-hydroxylase
409 converts cholesterol to 25-hydroxycholesterol and depletes it from the host membrane,
410 which in turn has been shown to block SARS-CoV-2 membrane fusion thereby
411 inhibiting infection in lung epithelial cells (Wang et al., 2020). In addition, higher
412 levels of oxidized cholesterols could lead to the induction of a procoagulant state
413 (Kim et al., 2020) or aggravate the formation of atherosclerotic plaques. This
414 would in part explain the blood clotting and breathing problems observed in
415 COVID-19 patients (Gu et al., 2021). Therefore, TMPRSS2 and TMPRSS4
416 LDLRA domains must be studied further.
417 4.3. Serine-protease domains of TMPRSS2 and TMPRSS4 show significant
418 similarity with plasminogen and blood coagulation factors
419 The TMPRSS2 and TMPRSS4 3D structures are not available. Hence, we
420 first identified their structural homologs using HHpred, and also modelled 3D
421 structures using Phyre2. We identified 29 structural homologs from 10 species
422 with both approaches. These 3D structures were mapped to the SCRC and
423 serine-protease domains in both proteins, but no match was found for the LDLRA domain.
424 When we further analyzed the domain architectures of the homologs, we found
425 that the serine-protease domain is conserved in all, and half of them are also
426 accompanied by other domains, particularly those found in proteins related to
427 blood coagulation and anticoagulation. As discussed earlier, the TMPRSS2 and
428 TMPRSS4 serine protease domains are present at their C-termini. The TMPRSS2
429 domain, which is tethered to the outer face of the cell membrane, activates the S
430 protein of SARS-CoV-2 for host cell entry. Among the serine-protease domains
431 of 29 proteins, that of plasminogen showed a striking superimposition with
432 TMPRSS2 and TMPRSS4 and the coordinates of their active site amino acid
433 triad were almost identical. It has also been shown that plasminogen shares 95%
434 identity within the S1–S1’ subsites of TMPRSS2, which are used for cleavage
435 of the SARS-CoV-2 spike protein, and 64.71% within the S4–S4’ subsites, which
436 were the highest values among 14 serine proteases, including TMPRSS15, which was
437 selected for homology modelling of the TMPRSS2 structure (Huggins, 2020).
438 This suggests that the serine-protease domains of the TMPRSS2 and TMPRSS4
439 are likely to have a protease activity similar to that of plasminogen, which
440 dissolves the fibrin of blood clots (Storti and Szwast, 1982). Hence, these
441 findings suggest that TMPRSS2 and TMPRSS4 serine protease domains have
442 similar catalytic properties to those of blood clotting factors further supporting
443 the notion that they are likely involved in blood coagulation and anticoagulation
444 through common mechanisms involving plasminogen, plasma kallikrein and
445 thrombin. Similar to these three blood coagulation related factors, the protease
446 domains of TMPRSS2 and TMPRSS4 are extracellular and are
447 synthesized in an inactive zymogen form. Interestingly, TMPRSS4 directly
448 activates the urokinase-type plasminogen activator (pro-uPA) encoded by the
449 PLAU gene through its proteolytic activity, which in turn can cleave zymogen
450 plasminogen to form the active enzyme plasmin (Min et al., 2014). This suggests
451 a potential role of TMPRSS4 in blood clot resolution upstream of plasminogen
452 and pro-uPA, which is one of the TMPRSS2 and TMPRSS4 structural homologs
453 identified in this study. It is noteworthy that pro-uPA and plasminogen are
454 both ligands for the LDLRA domain containing protein families (Liu et al.,
455 2001). Hence additional studies are warranted to determine whether TMPRSS4
456 activates pro-uPA which in turn converts plasminogen to plasmin to resolve
457 blood clots. These sequential reactions are likely to be dependent on the LDLRA
458 domain of TMPRSS4 and we believe TMPRSS2 also performs similar functions.
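The subsite identities quoted above (95% and 64.71%) are percentages of matching residues over a chosen set of aligned positions. The following is a minimal sketch of that calculation, not the procedure used by Huggins (2020): the function name, the toy sequences, and the choice to skip gapped columns are all assumptions made for illustration. The actual residue counts behind the published percentages are not given in this manuscript; the toy input simply shows that 11 matches over 17 compared positions rounds to 64.71%.

```python
def percent_identity(aligned_a, aligned_b, columns=None):
    """Percent identity between two aligned sequences.

    aligned_a, aligned_b: equal-length strings from a pairwise alignment,
        with '-' marking gaps.
    columns: optional iterable of 0-based alignment columns to compare
        (for example, the columns that make up a set of subsites).
        If None, all columns are used.
    Columns where either sequence has a gap are skipped (an assumption;
    other conventions count gaps as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if columns is None:
        columns = range(len(aligned_a))

    compared = 0
    matches = 0
    for i in columns:
        a, b = aligned_a[i], aligned_b[i]
        if a == "-" or b == "-":
            continue  # gapped column: not counted
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy input only (hypothetical, not real subsite residues):
# 17 aligned positions, of which the first 11 are identical.
toy_a = "A" * 17
toy_b = "A" * 11 + "G" * 6
toy_value = round(percent_identity(toy_a, toy_b), 2)
```

Restricting `columns` to the positions of a given subsite group is what makes the value a subsite identity rather than a whole-domain identity, which is why two proteins can differ substantially overall and still score highly within S1–S1’.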
459 4.4. Plasminogen, thrombin, and plasma kallikrein inhibitors show selective
460 binding to TMPRSS2 and TMPRSS4 serine-protease active sites
461 Aprotinin has been shown to inhibit plasminogen, thrombin, and plasma
462 kallikrein. Therefore, we assumed that the inhibitors of these proteins can
463 also inhibit TMPRSS2 and TMPRSS4. To test this, we used 3D structures
464 of four selective inhibitors and performed docking analysis with the protease
465 active sites of TMPRSS2, TMPRSS4, prothrombin activator, plasminogen, and
466 plasma kallikrein structures. As expected, the Glide docking scores of all four
467 inhibitors for TMPRSS2, and to a lesser extent for TMPRSS4, were as good as
468 those for their original target molecules, suggesting that these inhibitors are likely to impair
469 TMPRSS2 and TMPRSS4 protease activity. Moreover, aprotinin, a polypeptide
470 of 58 amino acid residues from bovine lung, inhibits plasminogen,
471 plasma kallikrein and thrombin (Mahdy and Webster, 2004). There is also
472 evidence that it specifically inhibits other serine proteases, including TMPRSS2
473 and TMPRSS4, that cleave the hemagglutinin protein of influenza virus, and aerosol
474 inhalation of aprotinin is used for the treatment of patients with mild-to-moderate
475 influenza infections (Ovcharenko and Zhirnov, 1994; Zhirnov et al.,
476 2011). Aprotinin has been suggested to inhibit TMPRSS2 as well (Shen et al.,
477 2017).
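The comparison described above, in which an inhibitor's docking score against TMPRSS2 or TMPRSS4 is judged against its score for its original target, can be sketched as follows. This is an illustrative outline only: the inhibitor label, all score values, and the 1.0 kcal/mol tolerance are hypothetical placeholders, not results from this study. It assumes the usual Glide convention that more negative scores indicate more favourable predicted binding.

```python
def compare_to_native(scores, native_target, tolerance=1.0):
    """Compare docking scores on other targets with the native-target score.

    scores: {inhibitor: {target: docking_score}}; more negative is better.
    native_target: {inhibitor: name of the target it was designed for}.
    tolerance: how much worse (less negative) a score may be and still be
        called comparable. The default is an arbitrary illustrative value.

    Returns {(inhibitor, target): (delta, comparable)} where
    delta = score_on_target - score_on_native_target.
    """
    results = {}
    for inhibitor, by_target in scores.items():
        native_name = native_target[inhibitor]
        native_score = by_target[native_name]
        for target, score in by_target.items():
            if target == native_name:
                continue  # nothing to compare the native target against
            delta = score - native_score
            results[(inhibitor, target)] = (delta, delta <= tolerance)
    return results


# Hypothetical placeholder values, NOT scores reported in this work.
example_scores = {
    "inhibitor_1": {"plasminogen": -7.0, "TMPRSS2": -6.5, "TMPRSS4": -5.0},
}
example_native = {"inhibitor_1": "plasminogen"}
example_result = compare_to_native(example_scores, example_native)
```

With these placeholder numbers, the TMPRSS2 score falls within the tolerance of the native score while the TMPRSS4 score does not, mirroring the qualitative pattern of "as good as" versus "to a lesser extent" without reproducing any actual data.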
478 4.5. TMPRSS2 and TMPRSS4 may function at high-temperature during immune
479 response owing to their serine-protease domain
480 Strikingly, the 29 structural homologs of TMPRSS2 and TMPRSS4 we
481 detected can be grouped into a few related functions including 10 in blood
482 coagulation related processes, 5 in immune functions, and the rest in alternative
483 immune response complement pathway, and activation of caspase-independent
484 cell death in cytotoxic T-cells and NK-cells (Ewen et al., 2012). Interestingly,
485 the five proteins encoded by the genes F11, F2, KLKB1, PLAU, and ACR, and
486 also TMPRSS member TMPRSS11E are the targets of SERPINA5, a plasma
487 serine protease inhibitor with hemostatic roles as a procoagulant, anticoagulant
488 and proinflammatory factor (Yang and Geiger, 2017). It is highly possible that
489 TMPRSS2 and TMPRSS4 are also targets of SERPINA5. Additionally,
490 it is known that dexamethasone induces expression of the beta isoform of
491 GZMA and represses expression of its alpha isoform, upon binding of the
492 glucocorticoid receptor (Ruike et al., 2007), ultimately leading to apoptotic cell
493 death (Myoumoto et al., 2007). Interestingly, septic shock is an inflammatory
494 response which causes excessive cell death, and critical COVID-19 patients have
495 been shown to recover using dexamethasone (RECOVERY Collaborative Group
496 et al., 2021). Furthermore, anticoagulants have been used to treat COVID-19
497 patients with good success (Levi et al., 2020; Violi et al., 2020).
498 One of the hallmarks of the immune response is fever, especially upon infections
499 by viruses or other pathogens. Presumably TMPRSS2 and TMPRSS4,
500 along with their 29 homologs, need to remain stable and functional at
501 high temperature during fever. Intriguingly, their serine-protease domain is a
502 prototype feature of the high temperature requirement A (HtrA) protein family,
503 whose members act as chaperones and are responsible for maintaining protein
504 quality at high temperature (Clausen et al., 2002). The HtrA family is present
505 in the three domains of life and hence, the HtrA mediated cellular response to
506 high temperature is universally conserved (Muley et al., 2019). Null mutants of
507 Escherichia coli HtrA do not survive at elevated temperature because
508 temperature-sensitive proteins become unstable without the protein quality control measures
509 mediated by HtrA (Lipinska et al., 1990). These observations indicate that
510 the serine-protease domains are likely selected as a part of blood coagulation
511 and anticoagulation, immune response, and alternative complement pathways to
512 perform their functions efficiently during immune responses and keep normal
513 hemostasis at high temperatures. Hence, the allele frequency and nucleotide
514 sequences of these protein coding genes can be expected to vary among the
515 populations originally adapted to cold, hot, and temperate zones (Wang et al.,
516 2020). This could be one of the important factors involved in diverse rates of
517 dispersion of SARS-CoV-2 infection in countries with different populations and
518 temperatures. One cannot ignore the diverse genetic makeup of the world’s
519 population influenced by the surrounding environment, and TMPRSS2 and TMPRSS4,
520 especially their serine-protease domains, should be further studied in this
521 regard.
522 5. Conclusions
523 Our in-silico analysis based on the predicted structures of TMPRSS2 and
524 TMPRSS4 allowed the identification of structural homologs many of which are
525 involved in blood coagulation, immune response, and proteolysis, which are
526 important in the context of immune functions. The similarity of the proteolytic
527 domains of TMPRSS2 and TMPRSS4 to those of the blood clotting factors
528 suggests that their catalytic properties are similar as well. Indeed, the tight
529 docking of known inhibitors for these factors to the catalytic sites of TMPRSS2
530 and TMPRSS4 strongly suggests that their activity is also inhibited. This would
531 in part explain why anticoagulant treatments that have been used to treat
532 COVID-19 patients have had good success. In addition to treating the clotting
533 problems in these patients, it is expected that the inhibition of the extracellular
534 domains of TMPRSS2 and TMPRSS4 would inhibit their proteolytic effect on
535 ACE2 thus limiting virus entry into the target cells. Moreover, inhibition of
536 these proteins in platelets might also limit the thrombotic effects of SARS-CoV-2
537 (Zhang et al., 2020). Hence, our studies shed light on a novel mechanism by which
538 anticoagulant treatments might act to limit the severity of COVID-19, linking LDL
539 and clotting factors and offer an avenue for further exploration of therapeutic
540 approaches for this disease that is affecting the whole world population.
541 6. Funding
542 Financial support to VYM and AV-E was provided by IA203920 and IN229620
543 DGAPA-UNAM grants respectively. AV-E was also supported by CONACYT-
544 315802 grant. Financial support to AS and KG was provided by the Austrian
545 Science Funds (FWF) through the doc.funds project DOC-46 “Catalox” and the
546 Doctoral Academy Graz of the University of Graz.
547 7. References
548 Aberasturi, A.L. de, Calvo, A., 2015. TMPRSS4: an emerging potential thera-
549 peutic target in cancer. British journal of cancer 112, 4–8. doi:10.1038/bjc.2014.403
550 Bateman, A., Martin, M.J., O’Donovan, C., Magrane, M., Alpi, E., Antunes,
551 R., Bely, B., Bingley, M., Bonilla, C., Britto, R., Bursteinas, B., Bye-AJee, H.,
552 Cowley, A., Da Silva, A., De Giorgi, M., Dogan, T., Fazzini, F., Castro, L.G.,
553 Figueira, L., Garmiri, P., Georghiou, G., Gonzalez, D., Hatton-Ellis, E., Li, W.,
554 Liu, W., Lopez, R., Luo, J., Lussi, Y., MacDougall, A., Nightingale, A., Palka,
555 B., Pichler, K., Poggioli, D., Pundir, S., Pureza, L., Qi, G., Rosanoff, S., Saidi,
556 R., Sawford, T., Shypitsyna, A., Speretta, E., Turner, E., Tyagi, N., Volynkin,
557 V., Wardell, T., Warner, K., Watkins, X., Zaru, R., Zellner, H., Xenarios, I.,
558 Bougueleret, L., Bridge, A., Poux, S., Redaschi, N., Aimo, L., ArgoudPuy, G.,
559 Auchincloss, A., Axelsen, K., Bansal, P., Baratin, D., Blatter, M.C., Boeckmann,
560 B., Bolleman, J., Boutet, E., Breuza, L., Casal-Casas, C., De Castro, E., Coudert,
561 E., Cuche, B., Doche, M., Dornevil, D., Duvaud, S., Estreicher, A., Famiglietti,
562 L., Feuermann, M., Gasteiger, E., Gehant, S., Gerritsen, V., Gos, A., Gruaz-
563 Gumowski, N., Hinz, U., Hulo, C., Jungo, F., Keller, G., Lara, V., Lemercier, P.,
564 Lieberherr, D., Lombardot, T., Martin, X., Masson, P., Morgat, A., Neto, T.,
565 Nouspikel, N., Paesano, S., Pedruzzi, I., Pilbout, S., Pozzato, M., Pruess, M.,
566 Rivoire, C., Roechert, B., Schneider, M., Sigrist, C., Sonesson, K., Staehli, S.,
567 Stutz, A., Sundaram, S., Tognolli, M., Verbregue, L., Veuthey, A.L., Wu, C.H.,
568 Arighi, C.N., Arminski, L., Chen, C., Chen, Y., Garavelli, J.S., Huang, H., Laiho,
569 K., McGarvey, P., Natale, D.A., Ross, K., Vinayaka, C.R., Wang, Q., Wang,
570 Y., Yeh, L.S., Zhang, J., 2017. UniProt: The universal protein knowledgebase.
571 Nucleic Acids Research 45, D158–D169. doi:10.1093/nar/gkw1099
572 Baum, B., Muley, L., Heine, A., Smolinski, M., Hangauer, D., Klebe, G., 2009.
573 Think Twice: Understanding the High Potency of Bis(phenyl)methane Inhibitors
574 of Thrombin. Journal of Molecular Biology 391, 552–564. doi:10.1016/j.jmb.2009.06.016
575 Berman, H.M., 2000. The Protein Data Bank. Nucleic Acids Research 28,
576 235–242. doi:10.1093/nar/28.1.235
577 Bertram, S., Glowacka, I., Blazejewska, P., Soilleux, E., Allen, P., Danisch,
578 S., Steffen, I., Choi, S.-Y., Park, Y., Schneider, H., Schughart, K., Pöhlmann,
579 S., 2010. TMPRSS2 and TMPRSS4 Facilitate Trypsin-Independent Spread
580 of Influenza Virus in Caco-2 Cells. Journal of Virology 84, 10016–10025.
581 doi:10.1128/jvi.00239-10
582 Bieri, S., Djordjevic, J.T., Daly, N.L., Smith, R., Kroon, P.A., 1995. Disulfide
583 bridges of a cysteine-rich repeat of the LDL receptor ligand-binding domain.
584 Biochemistry 34, 13059–13065. doi:10.1021/bi00040a017
585 Brown, M.S., Goldstein, J.L., 1986. A receptor-mediated pathway for choles-
586 terol homeostasis. Science 232, 34–47. doi:10.1126/science.3513311
587 Castro, E. de, Sigrist, C.J., Gattiker, A., Bulliard, V., Langendijk-Genevaux,
588 P.S., Gasteiger, E., Bairoch, A., Hulo, N., 2006. ScanProsite: Detection of
589 PROSITE signature matches and ProRule-associated functional and structural
590 residues in proteins. Nucleic Acids Research 34, W362–W365. doi:10.1093/nar/gkl124
591 Chaipan, C., Kobasa, D., Bertram, S., Glowacka, I., Steffen, I., Tsegaye, T.S.,
592 Takeda, M., Bugge, T.H., Kim, S., Park, Y., Marzi, A., Pöhlmann, S., 2009.
593 Proteolytic activation of the 1918 influenza virus hemagglutinin. Journal of
594 virology 83, 3200–11. doi:10.1128/JVI.02205-08
595 Chikhale, R.V., Gupta, V.K., Eldesoky, G.E., Wabaidur, S.M., Patil, S.A.,
596 Islam, M.A., 2020. Identification of potential anti-TMPRSS2 natural prod-
597 ucts through homology modelling, virtual screening and molecular dynamics
598 simulation studies. Journal of Biomolecular Structure and Dynamics 1–16.
599 doi:10.1080/07391102.2020.1798813
600 Christensen, A.M., Massiah, M.A., Turner, B.G., Sundquist, W.I., Sum-
601 mers, M.F., 1996. Three-Dimensional Structure of the HTLV-II Matrix Protein
602 and Comparative Analysis of Matrix Proteins from the Different Classes of
603 Pathogenic Human Retroviruses. Journal of Molecular Biology 264, 1117–1131.
604 doi:10.1006/jmbi.1996.0700
605 Clausen, T., Southan, C., Ehrmann, M., 2002. The HtrA family of proteases:
606 implications for protein composition and cell fate. Molecular cell 10, 443–55.
607 doi:10.1016/s1097-2765(02)00658-5
608 Daecke, J., Fackler, O.T., Dittmar, M.T., Kräusslich, H.-G., 2005. Involve-
609 ment of Clathrin-Mediated Endocytosis in Human Immunodeficiency Virus Type
610 1 Entry. Journal of Virology 79, 1581–1594. doi:10.1128/jvi.79.3.1581-1594.2005
611 Daly, N.L., Scanlon, M.J., Djordjevic, J.T., Kroon, P.A., Smith, R., 1995.
612 Three-dimensional structure of a cysteine-rich repeat from the low-density lipopro-
613 tein receptor. Proceedings of the National Academy of Sciences of the United
614 States of America 92, 6334–6338. doi:10.1073/pnas.92.14.6334
615 Dawson, N.L., Lewis, T.E., Das, S., Lees, J.G., Lee, D., Ashford, P., Orengo,
616 C.A., Sillitoe, I., 2017. CATH: an expanded resource to predict protein func-
617 tion through structure and sequence. Nucleic Acids Research 45, D289–D295.
618 doi:10.1093/nar/gkw1098
619 Ewen, C.L., Kane, K.P., Bleackley, R.C., 2012. A quarter century of
620 granzymes. doi:10.1038/cdd.2011.153
621 Freeman, M., Ashkenas, J., Rees, D.J., Kingsley, D.M., Copeland, N.G.,
622 Jenkins, N.A., Krieger, M., 1990. An ancient, highly conserved family of
623 cysteine-rich protein domains revealed by cloning type I and type II murine
624 macrophage scavenger receptors. Proceedings of the National Academy of
625 Sciences 87, 8810–8814. doi:10.1073/pnas.87.22.8810
626 Friesner, R.A., Banks, J.L., Murphy, R.B., Halgren, T.A., Klicic, J.J., Mainz,
627 D.T., Repasky, M.P., Knoll, E.H., Shelley, M., Perry, J.K., Shaw, D.E., Francis,
628 P., Shenkin, P.S., 2004. Glide: A New Approach for Rapid, Accurate Docking
629 and Scoring. 1. Method and Assessment of Docking Accuracy. Journal of
630 Medicinal Chemistry 47, 1739–1749. doi:10.1021/jm0306430
631 Friesner, R.A., Murphy, R.B., Repasky, M.P., Frye, L.L., Greenwood, J.R.,
632 Halgren, T.A., Sanschagrin, P.C., Mainz, D.T., 2006. Extra precision glide: Dock-
633 ing and scoring incorporating a model of hydrophobic enclosure for protein-ligand
634 complexes. Journal of Medicinal Chemistry 49, 6177–6196. doi:10.1021/jm051256o
635 Glende, J., Schwegmann-Wessels, C., Al-Falah, M., Pfefferle, S., Qu, X., Deng,
636 H., Drosten, C., Naim, H.Y., Herrler, G., 2008. Importance of cholesterol-rich
637 membrane microdomains in the interaction of the S protein of SARS-coronavirus
638 with the cellular receptor angiotensin-converting enzyme 2. Virology 381, 215–
639 221. doi:10.1016/j.virol.2008.08.026
640 Glowacka, I., Bertram, S., Muller, M.A., Allen, P., Soilleux, E., Pfefferle,
641 S., Steffen, I., Tsegaye, T.S., He, Y., Gnirss, K., Niemeyer, D., Schneider, H.,
642 Drosten, C., Pohlmann, S., 2011. Evidence that TMPRSS2 Activates the Severe
643 Acute Respiratory Syndrome Coronavirus Spike Protein for Membrane Fusion
644 and Reduces Viral Control by the Humoral Immune Response. Journal of
645 Virology 85, 4122–4134. doi:10.1128/JVI.02232-10
646 Gough, J., Karplus, K., Hughey, R., Chothia, C., 2001. Assignment of
647 homology to genome sequences using a library of hidden Markov models that
648 represent all proteins of known structure. Journal of Molecular Biology 313,
649 903–919. doi:10.1006/jmbi.2001.5080
650 Gu, S.X., Tyagi, T., Jain, K., Gu, V.W., Lee, S.H., Hwa, J.M., Kwan, J.M.,
651 Krause, D.S., Lee, A.I., Halene, S., Martin, K.A., Chun, H.J., Hwa, J., 2021.
652 Thrombocytopathy and endotheliopathy: crucial contributors to COVID-19
653 thromboinflammation. doi:10.1038/s41569-020-00469-1
654 Heissig, B., Salama, Y., Takahashi, S., Osada, T., Hattori, K., 2020. The
655 multifaceted role of plasminogen in inflammation. Cellular Signalling 75, 109761.
656 doi:10.1016/j.cellsig.2020.109761
657 Hempel, T., Raich, L., Olsson, S., Azouz, N.P., Klingler, A.M., Hoffmann,
658 M., Pöhlmann, S., Rothenberg, M.E., Noé, F., 2021. Molecular mechanism
659 of inhibiting the SARS-CoV-2 cell entry facilitator TMPRSS2 with camostat and
660 nafamostat. Chem. Sci. –. doi:10.1039/D0SC05064D
661 Herter, S., Piper, D.E., Aaron, W., Gabriele, T., Cutler, G., Cao, P., Bhatt,
662 A.S., Choe, Y., Craik, C.S., Walker, N., Meininger, D., Hoey, T., Austin, R.J.,
663 2005. Hepatocyte growth factor is a preferred in vitro substrate for human
664 hepsin, a membrane-anchored serine protease implicated in prostate and ovarian
665 cancers. Biochemical Journal 390, 125–136. doi:10.1042/BJ20041955
666 Heurich, A., Hofmann-Winkler, H., Gierer, S., Liepold, T., Jahn, O., Pohlmann,
667 S., 2014. TMPRSS2 and ADAM17 Cleave ACE2 Differentially and Only Pro-
668 teolysis by TMPRSS2 Augments Entry Driven by the Severe Acute Respira-
669 tory Syndrome Coronavirus Spike Protein. Journal of Virology 88, 1293–1307.
670 doi:10.1128/JVI.02202-13
671 Hildebrand, A., Remmert, M., Biegert, A., Söding, J., 2009. Fast and
672 accurate automatic structure prediction with HHpred. Proteins: Structure,
673 Function, and Bioinformatics 77, 128–132. doi:10.1002/prot.22499
674 Hoffmann, M., Kleine-Weber, H., Schroeder, S., Krüger, N., Herrler, T.,
675 Erichsen, S., Schiergens, T.S., Herrler, G., Wu, N.-H., Nitsche, A., Müller, M.A.,
676 Drosten, C., Pöhlmann, S., 2020. SARS-CoV-2 Cell Entry Depends on ACE2
677 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell
678 181, 271–280.e8. doi:10.1016/j.cell.2020.02.052
679 Hohenester, E., Sasaki, T., Timpl, R., 1999. Crystal structure of a scavenger
680 receptor cysteine-rich domain sheds light on an ancient superfamily. Nature
681 structural biology 6, 228–32. doi:10.1038/6669
682 Hu, B., Guo, H., Zhou, P., Shi, Z.L., 2020. Characteristics of SARS-CoV-2
683 and COVID-19. Nature Reviews Microbiology. doi:10.1038/s41579-020-00459-7
684 Huggins, D.J., 2020. Structural analysis of experimental drugs binding to the
685 SARS-CoV-2 target TMPRSS2. Journal of Molecular Graphics and Modelling
686 100, 107710. doi:10.1016/j.jmgm.2020.107710
687 Hughes, H., Stephens, D.J., 2008. Assembly, organization, and function of the
688 COPII coat. Histochemistry and Cell Biology 129, 129–151. doi:10.1007/s00418-
689 007-0363-x
690 Idris, M.O., Yekeen, A.A., Alakanse, O.S., Durojaye, O.A., 2020. Computer-
691 aided screening for potential TMPRSS2 inhibitors: a combination of pharma-
692 cophore modeling, molecular docking and molecular dynamics simulation ap-
693 proaches. Journal of Biomolecular Structure and Dynamics 1–19. doi:10.1080/07391102.2020.1792346
694 Ilnytska, O., Santiana, M., Hsu, N.Y., Du, W.L., Chen, Y.H., Viktorova,
695 E.G., Belov, G., Brinker, A., Storch, J., Moore, C., Dixon, J.L., Altan-Bonnet,
696 N., 2013. Enteroviruses harness the cellular endocytic machinery to remodel
697 the host cell cholesterol landscape for effective viral replication. Cell Host and
698 Microbe 14, 281–293. doi:10.1016/j.chom.2013.08.002
699 Inoue, Y., Tanaka, N., Tanaka, Y., Inoue, S., Morita, K., Zhuang, M., Hattori,
700 T., Sugamura, K., 2007. Clathrin-Dependent Entry of Severe Acute Respiratory
701 Syndrome Coronavirus into Target Cells Expressing ACE2 with the Cytoplasmic
702 Tail Deleted. Journal of Virology 81, 8722–8729. doi:10.1128/jvi.00253-07
703 Johnson, M., Zaretskaya, I., Raytselis, Y., Merezhuk, Y., McGinnis, S.,
704 Madden, T.L., 2008. NCBI BLAST: a better web interface. Nucleic Acids
705 Research 36, W5–W9. doi:10.1093/nar/gkn201
706 Jorgensen, W.L., Tirado-Rives, J., 1988. The OPLS [optimized potentials
707 for liquid simulations] potential functions for proteins, energy minimizations
708 for crystals of cyclic peptides and crambin. Journal of the American Chemical
709 Society 110, 1657–1666. doi:10.1021/ja00214a001
710 Katoh, K., Rozewicki, J., Yamada, K.D., 2018. MAFFT online service: Mul-
711 tiple sequence alignment, interactive sequence choice and visualization. Briefings
712 in Bioinformatics 20, 1160–1166. doi:10.1093/bib/bbx108
713 Kelley, L.A., Mezulis, S., Yates, C.M., Wass, M.N., Sternberg, M.J., 2015.
714 The Phyre2 web portal for protein modeling, prediction and analysis. Nature
715 Protocols 10, 845–858. doi:10.1038/nprot.2015.053
716 Kim, M., Yoo, H.J., Lee, D., Lee, J.H., 2020. Oxidized LDL induces
717 procoagulant profiles by increasing lysophosphatidylcholine levels, lysophos-
718 phatidylethanolamine levels, and Lp-PLA2 activity in borderline hypercholes-
719 terolemia. Nutrition, Metabolism and Cardiovascular Diseases 30, 1137–1146.
720 doi:10.1016/j.numecd.2020.03.015
721 Kitamoto, Y., Yuan, X., Wu, Q., McCourt, D.W., Sadler, J.E., 1994. En-
722 terokinase, the initiator of intestinal digestion, is a mosaic protease composed of
723 a distinctive assortment of domains. Proceedings of the National Academy of
724 Sciences 91, 7588–7592. doi:10.1073/pnas.91.16.7588
725 Kyrieleis, O.J.P., Huber, R., Ong, E., Oehler, R., Hunter, M., Madison, E.L.,
726 Jacob, U., 2007. Crystal structure of the catalytic domain of DESC1, a new
727 member of the type II transmembrane serine proteinase family. FEBS Journal
728 274, 2148–2160. doi:10.1111/j.1742-4658.2007.05756.x
729 Law, R.H.P., Wu, G., Leung, E.W.W., Hidaka, K., Quek, A.J., Caradoc-
730 Davies, T.T., Jeevarajah, D., Conroy, P.J., Kirby, N.M., Norton, R.S., Tsuda, Y.,
731 Whisstock, J.C., 2017. X-ray crystal structure of plasmin with tranexamic acid–
732 derived active site inhibitors. Blood Advances 1, 766–771. doi:10.1182/bloodadvances.2016004150
733 Lechtenberg, B.C., Murray-Rust, T.A., Johnson, D.J.D., Adams, T.E., Kr-
734 ishnaswamy, S., Camire, R.M., Huntington, J.A., 2013. Crystal structure of
735 the prothrombinase complex from the venom of Pseudonaja textilis. Blood 122,
736 2777–2783. doi:10.1182/blood-2013-06-511733
737 Lee, Y., Ko, D., Min, H.-J., Kim, S.B., Ahn, H.-M., Lee, Y., Kim, S., 2016.
738 TMPRSS4 induces invasion and proliferation of prostate cancer cells through in-
739 duction of Slug and cyclin D1. Oncotarget 7, 50315–50332. doi:10.18632/oncotarget.10382
740 Levi, M., Thachil, J., Iba, T., Levy, J.H., 2020. Coagulation abnormalities
741 and thrombosis in patients with COVID-19. doi:10.1016/S2352-3026(20)30145-9
742 Lewis, T.E., Sillitoe, I., Andreeva, A., Blundell, T.L., Buchan, D.W., Chothia,
743 C., Cozzetto, D., Dana, J.M., Filippis, I., Gough, J., Jones, D.T., Kelley, L.A.,
744 Kleywegt, G.J., Minneci, F., Mistry, J., Murzin, A.G., Ochoa-Montaño, B., Oates,
745 M.E., Punta, M., Rackham, O.J., Stahlhacke, J., Sternberg, M.J., Velankar, S.,
746 Orengo, C., 2015. Genome3D: exploiting structure to help users understand their
747 sequences. Nucleic Acids Research 43, D382–D386. doi:10.1093/nar/gku973
748 Li, X., Zhu, W., Fan, M., Zhang, J., Peng, Y., Huang, F., Wang, N.,
749 He, L., Zhang, L., Holmdahl, R., Meng, L., Lu, S., 2021. Dependence of
750 SARS-CoV-2 infection on cholesterol-rich lipid raft and endosomal acidifica-
751 tion. Computational and Structural Biotechnology Journal 19, 1933–1943.
752 doi:10.1016/j.csbj.2021.04.001
753 Lingwood, D., Simons, K., 2010. Lipid rafts as a membrane-organizing
754 principle. doi:10.1126/science.1174621
755 Lipinska, B., Zylicz, M., Georgopoulos, C., 1990. The HtrA (DegP) protein,
756 essential for Escherichia coli survival at high temperatures, is an endopeptidase.
757 Journal of Bacteriology 172, 1791–1797. doi:10.1128/jb.172.4.1791-1797.1990
758 Liu, C.X., Li, Y., Obermoeller-McCormick, L.M., Schwartz, A.L., Bu, G.,
759 2001. The Putative Tumor Suppressor LRP1B, a Novel Member of the Low
760 Density Lipoprotein (LDL) Receptor Family, Exhibits Both Overlapping and
761 Distinct Properties with the LDL Receptor-related Protein. Journal of Biological
762 Chemistry 276, 28889–28896. doi:10.1074/jbc.M102727200
763 Lo, M.W., Kemper, C., Woodruff, T.M., 2020. COVID-19: Complement,
764 Coagulation, and Collateral Damage. The Journal of Immunology 205, 1488–
765 1495. doi:10.4049/jimmunol.2000644
766 López-Otín, C., Overall, C.M., 2002. “Protease degradomics: A new chal-
767 lenge for proteomics”. Nature Reviews Molecular Cell Biology 3, 509–519.
768 doi:10.1038/nrm858
769 Madeira, F., Park, Y.M., Lee, J., Buso, N., Gur, T., Madhusoodanan, N.,
770 Basutkar, P., Tivey, A.R., Potter, S.C., Finn, R.D., Lopez, R., 2019. The
18 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.26.441280; this version posted April 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
771 EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic acids
772 research. doi:10.1093/nar/gkz268
773 Mahdy, A., Webster, N., 2004. Perioperative systemic haemostatic agents.
774 British Journal of Anaesthesia 93, 842–858. doi:10.1093/bja/aeh227
775 Matsuyama, S., Nagata, N., Shirato, K., Kawase, M., Takeda, M., Taguchi, F.,
776 2010. Efficient Activation of the Severe Acute Respiratory Syndrome Coronavirus
777 Spike Protein by the Transmembrane Protease TMPRSS2. Journal of Virology
778 84, 12658–12664. doi:10.1128/JVI.01542-10
779 Min, H.J., Lee, M.K., Lee, J.W., Kim, S., 2014. TMPRSS4 induces cancer
780 cell invasion through pro-uPA processing. Biochemical and Biophysical Research
781 Communications 446, 1–7. doi:10.1016/j.bbrc.2014.01.013
782 Miyauchi, K., Kim, Y., Latinovic, O., Morozov, V., Melikyan, G.B., 2009.
783 HIV Enters Cells via Endocytosis and Dynamin-Dependent Fusion with Endo-
784 somes. Cell 137, 433–444. doi:10.1016/j.cell.2009.02.046
785 Muley, V.Y., Akhter, Y., Galande, S., 2019. PDZ Domains Across the
786 Microbial World: Molecular Link to the Proteases, Stress Response, and Protein
787 Synthesis. Genome Biology and Evolution 11, 644–659. doi:10.1093/gbe/evz023
788 Myoumoto, A., Nakatani, K., Koshimizu, T.A., Matsubara, H., Adachi, S.,
789 Tsujimoto, G., 2007. Glucocorticoid-induced granzyme A expression can be
790 used as a marker of glucocorticoid sensitivity for acute lymphoblastic leukemia
791 therapy. Journal of Human Genetics 52, 328–333. doi:10.1007/s10038-007-0119-4
792 Nardacci, R., Colavita, F., Castilletti, C., Lapa, D., Matusali, G., Meschi,
793 S., Del Nonno, F., Colombo, D., Capobianchi, M.R., Zumla, A., Ippolito, G.,
794 Piacentini, M., Falasca, L., 2021. Evidences for lipid involvement in SARS-CoV-2
795 cytopathogenesis. Cell Death and Disease 12. doi:10.1038/s41419-021-03527-9
796 Needleman, S.B., Wunsch, C.D., 1970. A general method applicable to the
797 search for similarities in the amino acid sequence of two proteins. Journal of
798 Molecular Biology 48, 443–453. doi:10.1016/0022-2836(70)90057-4
799 Nicola, A.V., McEvoy, A.M., Straus, S.E., 2003. Roles for Endocytosis and
800 Low pH in Herpes Simplex Virus Entry into HeLa and Chinese Hamster Ovary
801 Cells. Journal of Virology 77, 5324–5332. doi:10.1128/jvi.77.9.5324-5332.2003
802 Ovcharenko, A., Zhirnov, O., 1994. Aprotinin aerosol treatment of influenza
803 and paramyxovirus bronchopneumonia of mice. Antiviral Research 23, 107–118.
804 doi:10.1016/0166-3542(94)90038-8
805 Partridge, J.R., Choy, R.M., Silva-Garcia, A., Yu, C., Li, Z., Sham, H., Met-
806 calf, B., 2019. Structures of full-length plasma kallikrein bound to highly specific
807 inhibitors describe a new mode of targeted inhibition. Journal of Structural
808 Biology 206, 170–182. doi:10.1016/j.jsb.2019.03.001
809 Puente, X., Sánchez, L., Gutiérrez-Fernández, A., Velasco, G., López-Otín,
810 C., 2005. A genomic view of the complexity of mammalian proteolytic systems.
811 Biochemical Society Transactions 33, 331–334. doi:10.1042/BST0330331
812 Rahman, N., Basharat, Z., Yousuf, M., Castaldo, G., Rastrelli, L., Khan, H.,
813 2020. Virtual Screening of Natural Products against Type II Transmembrane
814 Serine Protease (TMPRSS2), the Priming Agent of Coronavirus 2 (SARS-CoV-2).
815 Molecules 25, 2271. doi:10.3390/molecules25102271
816 RECOVERY Collaborative Group, Horby, P., Lim, W.S., Emberson, J.R.,
19 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.26.441280; this version posted April 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
817 Mafham, M., Bell, J.L., Linsell, L., Staplin, N., Brightling, C., Ustianowski, A.,
818 Elmahi, E., Prudon, B., Green, C., Felton, T., Chadwick, D., Rege, K., Fegan,
819 C., Chappell, L.C., Faust, S.N., Jaki, T., Jeffery, K., Montgomery, A., Rowan,
820 K., Juszczak, E., Baillie, J.K., Haynes, R., Landray, M.J., 2021. Dexamethasone
821 in Hospitalized Patients with Covid-19. The New England journal of medicine
822 384, 693–704. doi:10.1056/NEJMoa2021436
823 Resnick, D., Pearson, A., Krieger, M., 1994. The SRCR superfamily: a
824 family reminiscent of the Ig superfamily. Trends in Biochemical Sciences 19, 5–8.
825 doi:10.1016/0968-0004(94)90165-1
826 Risitano, A.M., Mastellos, D.C., Huber-Lang, M., Yancopoulou, D., Garlanda,
827 C., Ciceri, F., Lambris, J.D., 2020. Complement as a target in COVID-19?
828 Nature Reviews Immunology 20, 343–344. doi:10.1038/s41577-020-0320-7
829 Roversi, P., Johnson, S., Caesar, J.J.E., McLean, F., Leath, K.J., Tsiftsoglou,
830 S.A., Morgan, B.P., Harris, C.L., Sim, R.B., Lea, S.M., 2011. Structural basis
831 for complement factor I control and its disease-associated sequence polymor-
832 phisms. Proceedings of the National Academy of Sciences 108, 12839–12844.
833 doi:10.1073/pnas.1102167108
834 Ruike, Y., Katsuma, S., Hirasawa, A., Tsujimoto, G., 2007. Glucocorticoid-
835 induced alternative promoter usage for a novel 5’ variant of granzyme A. Journal
836 of Human Genetics 52, 172–178. doi:10.1007/s10038-006-0099-9
837 Sakai, K., Ami, Y., Tahara, M., Kubota, T., Anraku, M., Abe, M., Nakajima,
838 N., Sekizuka, T., Shirato, K., Suzaki, Y., Ainai, A., Nakatsu, Y., Kanou, K.,
839 Nakamura, K., Suzuki, T., Komase, K., Nobusawa, E., Maenaka, K., Kuroda,
840 M., Hasegawa, H., Kawaoka, Y., Tashiro, M., Takeda, M., 2014. The Host
841 Protease TMPRSS2 Plays a Major Role in In Vivo Replication of Emerging
842 H7N9 and Seasonal Influenza Viruses. Journal of Virology 88, 5608–5616.
843 doi:10.1128/JVI.03677-13
844 Schrödinger, 2018. Maestro | Schrödinger.
845 Schrödinger, LLC, 2015. The PyMOL molecular graphics system, version 1.8.
846 Shen, L.W., Mao, H.J., Wu, Y.L., Tanaka, Y., Zhang, W., 2017. TMPRSS2:
847 A potential target for treatment of influenza virus and coronavirus infections.
848 Biochimie 142, 1–10. doi:10.1016/j.biochi.2017.07.016
849 Sigrist, C.J., Cerutti, L., De Castro, E., Langendijk-Genevaux, P.S., Bulliard,
850 V., Bairoch, A., Hulo, N., 2009. PROSITE, a protein domain database for
851 functional characterization and annotation. Nucleic Acids Research 38, D161–
852 D166. doi:10.1093/nar/gkp885
853 Simons, K., Ikonen, E., 1997. Functional rafts in cell membranes. doi:10.1038/42408
854 Sonnhammer, E.L., Eddy, S.R., Durbin, R., 1997. Pfam: A comprehen-
855 sive database of protein domain families based on seed alignments. Pro-
856 teins: Structure, Function and Genetics 28, 405–420. doi:10.1002/(SICI)1097-
857 0134(199707)28:3<405::AID-PROT10>3.0.CO;2-L
858 Storti, R.V., Szwast, A.E., 1982. Molecular cloning and characterization of
859 Drosophila genes and their expression during embryonic development and in pri-
860 mary muscle cell cultures. Developmental Biology 90, 272–283. doi:10.1016/0012-
861 1606(82)90376-1
862 Szabo, R., Bugge, T., 2008. Type II transmembrane serine proteases in
20 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.26.441280; this version posted April 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
863 development and disease. The International Journal of Biochemistry & Cell
864 Biology 40, 1297–1316. doi:10.1016/j.biocel.2007.11.013
865 Tang, J., Yu, C.L., Williams, S.R., Springman, E., Jeffery, D., Sprengeler,
866 P.A., Estevez, A., Sampang, J., Shrader, W., Spencer, J., Young, W., McGrath,
867 M., Katz, B.A., 2005. Expression, Crystallization, and Three-dimensional
868 Structure of the Catalytic Domain of Human Plasma Kallikrein. Journal of
869 Biological Chemistry 280, 41077–41089. doi:10.1074/jbc.M506766200
870 Tsirigos, K.D., Peters, C., Shu, N., Käll, L., Elofsson, A., 2015. The TOP-
871 CONS web server for consensus prediction of membrane protein topology and sig-
872 nal peptides. Nucleic Acids Research 43, W401–W407. doi:10.1093/nar/gkv485
873 Villalba, M., Exposito, F., Pajares, M.J., Sainz, C., Redrado, M., Remirez,
874 A., Wistuba, I., Behrens, C., Jantus-Lewintre, E., Camps, C., Montuenga, L.M.,
875 Pio, R., Lozano, M.D., Andrea, C. de, Calvo, A., 2019. TMPRSS4: A Novel
876 Tumor Prognostic Indicator for the Stratification of Stage IA Tumors and a
877 Liquid Biopsy Biomarker for NSCLC Patients. Journal of Clinical Medicine 8,
878 2134. doi:10.3390/jcm8122134
879 Violi, F., Pastori, D., Cangemi, R., Pignatelli, P., Loffredo, L., 2020. Hyperco-
880 agulation and Antithrombotic Treatment in Coronavirus 2019: A New Challenge.
881 Thrombosis and Haemostasis 120, 949–956. doi:10.1055/s-0040-1710317
882 Wallrapp, C., Hähnel, S., Müller-Pillasch, F., Burghardt, B., Iwamura,
883 T., Ruthenbürger, M., Lerch, M.M., Adler, G., Gress, T.M., 2000. A novel
884 transmembrane serine protease (TMPRSS3) overexpressed in pancreatic cancer.
885 Cancer research 60, 2602–6.
886 Wang, H., Yang, P., Liu, K., Guo, F., Zhang, Y., Zhang, G., Jiang, C., 2008.
887 SARS coronavirus entry into host cells through a novel clathrin- and caveolae-
888 independent endocytic pathway. Cell Research 18, 290–301. doi:10.1038/cr.2008.15
889 Wang, S., Li, W., Hui, H., Tiwari, S.K., Zhang, Q., Croker, B.A., Rawlings, S.,
890 Smith, D., Carlin, A.F., Rana, T.M., 2020. Cholesterol 25-Hydroxylase inhibits
891 SARS -CoV-2 and other coronaviruses by depleting membrane cholesterol. The
892 EMBO Journal 39, e106057. doi:10.15252/embj.2020106057
893 Waterhouse, A.M., Procter, J.B., Martin, D.M.A., Clamp, M., Barton, G.J.,
894 2009. Jalview Version 2–a multiple sequence alignment editor and analysis
895 workbench. Bioinformatics 25, 1189–1191. doi:10.1093/bioinformatics/btp033
896 Wei, W., Zhao, W., Wang, X., Teng, M., Niu, L., 2007. Purification,
897 crystallization and preliminary X-ray diffraction analysis of saxthrombin, a
898 thrombin-like enzyme from Gloydius saxatilis venom. Acta Crystallographica
899 Section F Structural Biology and Crystallization Communications 63, 704–707.
900 doi:10.1107/S1744309107031429
901 Yamamoto, T., Davis, C., Brown, M.S., Schneider, W.J., Casey, M., Goldstein,
902 J.L., Russell, D.W., 1984. The human LDL receptor: A cysteine-rich protein
903 with multiple Alu sequences in its mRNA. Cell 39, 27–38. doi:10.1016/0092-
904 8674(84)90188-0
905 Yang, H., Geiger, M., 2017. Cell penetrating SERPINA5 (ProteinC inhibitor,
906 PCI): More questions than answers. Seminars in Cell & Developmental Biology
907 62, 187–193. doi:10.1016/j.semcdb.2016.10.007
908 Zang, R., Castro, M.F.G., McCune, B.T., Zeng, Q., Rothlauf, P.W., Sonnek,
21 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.26.441280; this version posted April 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
909 N.M., Liu, Z., Brulois, K.F., Wang, X., Greenberg, H.B., Diamond, M.S., Ciorba,
910 M.A., Whelan, S.P., Ding, S., 2020. TMPRSS2 and TMPRSS4 promote SARS-
911 CoV-2 infection of human small intestinal enterocytes. Science Immunology 5,
912 eabc3582. doi:10.1126/sciimmunol.abc3582
913 Zhang, S., Liu, Y., Wang, X., Yang, L., Li, H., Wang, Y., Liu, M., Zhao, X.,
914 Xie, Y., Yang, Y., Zhang, S., Fan, Z., Dong, J., Yuan, Z., Ding, Z., Zhang, Y., Hu,
915 L., 2020. SARS-CoV-2 binds platelet ACE2 to enhance thrombosis in COVID-19.
916 Journal of Hematology and Oncology 13, 120. doi:10.1186/s13045-020-00954-7
917 Zhirnov, O., Klenk, H., Wright, P., 2011. Aprotinin and similar pro-
918 tease inhibitors as drugs against influenza. Antiviral Research 92, 27–36.
919 doi:10.1016/j.antiviral.2011.07.014
920 Zmora, P., Blazejewska, P., Moldenhauer, A.-S., Welsch, K., Nehlmeier, I.,
921 Wu, Q., Schneider, H., Pohlmann, S., Bertram, S., 2014. DESC1 and MSPL
922 Activate Influenza A Viruses and Emerging Coronaviruses for Host Cell Entry.
923 Journal of Virology 88, 12087–12097. doi:10.1128/JVI.01427-14
22 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.26.441280; this version posted April 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Table 1: Structural homologs of TMPRSS2 and TMPRSS4 identified by HHPred and Phyre2

| PDB ID | Uniprot ID | Uniprot Name | Protein name | Gene | Organism | Length |
|---|---|---|---|---|---|---|
| 4CRG | P03951 | FA11_HUMAN | Coagulation factor XI (FXI) | F11 | Homo sapiens | 625 |
| 4XDE | P00748 | FA12_HUMAN | Coagulation factor XII | F12 | Homo sapiens | 615 |
| 3NXP, 4O03 | P00734 | THRB_HUMAN | Prothrombin, Coagulation factor II | F2 | Homo sapiens | 622 |
| 6R2W | P08709 | FA7_HUMAN | Coagulation factor VII | F7 | Homo sapiens | 295 |
| 2ANY | P03952 | KLKB1_HUMAN | Plasma kallikrein | KLKB1 | Homo sapiens | 638 |
| 5YC6 | P00749 | UROK_HUMAN | Urokinase-type plasminogen activator | PLAU | Homo sapiens | 431 |
| 5UGG, 4DUR | P00747 | PLMN_HUMAN | Plasminogen | PLG | Homo sapiens | 810 |
| 3H5C | P22891 | PROZ_HUMAN | Vitamin K-dependent protein Z | PROZ | Homo sapiens | 444 |
| 3S69 | Q7SZE1 | VSPSX_GLOSA | Thrombin-like enzyme saxthrombin (SVTLE) | - | Gloydius saxatilis | 258 |
| 4BXW | Q56VR3 | FAXC_PSETE | Venom prothrombin activator pseutarin-C catalytic subunit (PCCS) | - | Pseudonaja textilis | 467 |
| 5NAT | P00746 | CFAD_HUMAN | Complement factor D | CFD | Homo sapiens | 253 |
| 5FCR | P03953 | CFAD_MOUSE | Complement factor D | Cfd | Mus musculus | 259 |
| 2XRC | P05156 | CFAI_HUMAN | Complement factor I | CFI | Homo sapiens | 583 |
| 3GYL | Q16651 | PRSS8_HUMAN | Prostasin | PRSS8 | Homo sapiens | 343 |
| 3NCL | Q9Y5Y6 | ST14_HUMAN | Suppressor of tumorigenicity 14 protein | ST14 | Homo sapiens | 855 |
| 1FIW | Q9GL10 | ACRO_SHEEP | Acrosin | ACR | Ovis aries | 329 |
| 1AO5 | P36368 | EGFB2_MOUSE | Epidermal growth factor-binding protein type B | Egfbp2 | Mus musculus | 261 |
| 3W94 | A4UWM5 | A4UWM5_ORYLA | Enteropeptidase-1 | EP-1 | Oryzias latipes | 1036 |
| 1ORF | P12544 | GRAA_HUMAN | Granzyme A, Cytotoxic T-lymphocyte proteinase 1 | GZMA | Homo sapiens | 262 |
| 1MZA | P49863 | GRAK_HUMAN | Granzyme K, Natural killer cell tryptase-2 | GZMK | Homo sapiens | 264 |
| 2ZGC | P51124 | GRAM_HUMAN | Granzyme M, Natural killer cell granular protease | GZMM | Homo sapiens | 257 |
| 2R0L, 1YC0 | Q04756 | HGFA_HUMAN | Hepatocyte growth factor activator | HGFAC | Homo sapiens | 655 |
| 1Z8G, 1P57 | P05981 | HEPS_HUMAN | Serine protease hepsin, Transmembrane protease serine 1 | HPN | Homo sapiens | 417 |
| 2OQ5 | Q9UL52 | TM11E_HUMAN | Transmembrane protease serine 11E | TMPRSS11E | Homo sapiens | 423 |
| 4DGJ | P98073 | ENTK_HUMAN | Transmembrane protease serine 15, Enteropeptidase | TMPRSS15 | Homo sapiens | 1019 |
| 1YM0 | Q3HR18 | Q3HR18_EISFE | Lumbrokinase F238 | - | Eisenia fetida | 245 |
| 1ELT | Q7SIG3 | ELA1_SALSA | Elastase-1 | - | Salmo salar | 236 |
| 2F91 | Q52V24 | Q52V24_ASTLP | Hepatopancreas trypsin (Fragment) | - | Astacus leptodactylus | 237 |
| 1PQ7 | P35049 | TRYP_FUSOX | Trypsin | - | Fusarium oxysporum | 248 |

Note: Blood coagulation related proteins are highlighted in bold face. A hyphen indicates that the information is not available.
Table 2: Root mean square deviations of predicted TMPRSS2 and TMPRSS4 3D structures with 36 known structural homologs

| PDB ID | TMPRSS2: Both domains | TMPRSS2: Protease domain | TMPRSS2: SRCR domain | TMPRSS4: Both domains | TMPRSS4: Protease domain | TMPRSS4: SRCR domain |
|---|---|---|---|---|---|---|
| 5UGG A | 0.572 | 0.563 | | 0.499 | 0.492 | |
| 4DGJ A | 0.611 | 0.611 | | 0.584 | 0.584 | |
| 2R0L A | 0.612 | 0.612 | | 0.592 | 0.592 | |
| 3S69 A | 0.613 | 0.613 | | 0.674 | 0.674 | |
| 1YC0 A | 0.634 | 0.638 | | 0.582 | 0.578 | |
| 1YM0 A | 0.645 | 0.645 | | 0.599 | 0.599 | |
| 1P57 B | 0.648 | 0.648 | | 0.687 | 0.679 | |
| 3NXP A | 0.656 | 0.559 | | 1.007 | 0.913 | |
| 3NCL A | 0.659 | 0.667 | | 0.647 | 0.647 | |
| 2ANY A | 0.665 | 0.692 | | 0.534 | 0.534 | |
| 4CRG A | 0.667 | 0.667 | | 0.574 | 0.574 | |
| 1ELT A | 0.692 | 0.692 | | 0.642 | 0.642 | |
| 3W94 A | 0.692 | 0.692 | | 0.532 | 0.532 | |
| 2OQ5 A | 0.693 | 0.684 | | 0.639 | 0.639 | |
| 2F91 A | 0.701 | 0.701 | | 0.677 | 0.677 | |
| 3GYL B | 0.731 | 0.836 | | 0.688 | 0.688 | |
| 1AO5 B | 0.737 | 0.737 | | 0.813 | 0.813 | |
| 1FIW A | 0.740 | 0.740 | | 0.746 | 0.746 | |
| 5YC6 U | 0.749 | 0.749 | | 0.621 | 0.621 | |
| 4XDE A | 0.762 | 0.762 | | 0.683 | 0.683 | |
| 5NAT A | 0.805 | 0.805 | | 0.766 | 0.766 | |
| 6R2W H | 0.806 | 0.806 | | 0.711 | 0.697 | |
| 1Z8G A | 0.814 | 0.818 | 0.869 | 0.801 | 0.679 | 0.169 |
| 2XRC A | 0.832 | 0.684 | 0.996 | 0.821 | 0.813 | 8.525 |
| 5FCR B | 0.836 | 0.836 | | 0.937 | 0.937 | |
| 4BXW A | 0.842 | 0.611 | | 0.747 | 0.746 | |
| 4DUR A | 0.867 | 0.699 | | 0.721 | 0.687 | |
| 2ZGC A | 0.897 | 0.563 | | 1.128 | 1.128 | |
| 1ORF A | 0.912 | 0.612 | | 1.027 | 1.021 | |
| 2XRC D | 0.955 | 0.605 | 0.458 | 0.959 | 0.737 | 8.659 |
| 6ESO A | 1.055 | 0.699 | | 0.594 | 0.534 | |
| 1PQ7 A | 1.116 | 1.116 | | 1.021 | 1.021 | |
| 1MZA A | 1.139 | 0.740 | | 1.359 | 1.097 | |
| 4HZH B | 1.361 | 0.818 | | 1.297 | 1.097 | |
| 3H5C B | 1.369 | 1.178 | | 1.189 | 1.187 | |
| 4O03 A | 19.354 | 0.605 | | 1.205 | 0.890 | |

Note: The table is sorted from the best to the worst RMSD value of TMPRSS2 (both domains) with the compared known structures. The known structures that showed the best RMSD values with the structure of both domains, the protease domain alone, or the SRCR domain alone are highlighted in bold face.
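For readers unfamiliar with the quantity tabulated in Table 2, the following is a minimal sketch of how a root mean square deviation is computed between two sets of paired atomic coordinates. It assumes the two structures have already been superimposed and that atoms have already been paired, which in practice is done by the structure comparison software; the function name `rmsd` and the toy coordinates are illustrative and are not taken from the study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z) points.

    Assumes the points are already paired one-to-one and already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates (hypothetical, not taken from any PDB entry):
# every atom of `template` is displaced by one unit along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
template = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(model, template))
```

Lower values indicate closer agreement between the paired atoms, which is why the table is sorted in ascending order.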
Table 3: Glide docking scores (in kcal/mol) of blood coagulation related protein inhibitors with their known targets plasminogen, plasma kallikrein, and prothrombin activator, and with the predicted structures of TMPRSS2 and TMPRSS4.

| Protein name or PDB ID | OGJ (a) | 22U (a) | 89M (b) | 7SD (c) |
|---|---|---|---|---|
| TMPRSS2 | -4.851 | -5.123 | -4.332 | -4.602 |
| TMPRSS4 | -8.842 | -5.565 | -8.564 | -5.92 |
| 4BXW (Venom prothrombin activator pseutarin-C catalytic subunit) | -5.985 | -3.283 | -4.145 | -5.12 |
| 5UGG (Plasminogen) | -4.576 | -4.589 | -4.576 | -4.873 |
| 2ANY (Plasma kallikrein, light chain) | -4.45 | -5.11 | -4.238 | -5.489 |

Note: The columns OGJ, 22U, 89M, and 7SD are the inhibitors. TMPRSS2 and TMPRSS4 docking scores are based on predicted structures. 4BXW, 5UGG, and 2ANY are PDB structure identifiers whose protease domains were used for docking. All these structures are part of the blood coagulation factors (annotated in brackets). (a) Peptide-like selective thrombin inhibitor; (b) Plasminogen inhibitor; (c) Plasma kallikrein inhibitor.
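As a reading aid for Table 3, the short sketch below looks up, for each inhibitor, the protein with the most negative docking score. It assumes the usual Glide convention that a more negative score corresponds to a more favourable predicted pose; the dictionary simply transcribes the table, and the helper name `best_scoring_protein` is hypothetical.

```python
# Glide docking scores (kcal/mol) transcribed from Table 3.
scores = {
    "TMPRSS2": {"OGJ": -4.851, "22U": -5.123, "89M": -4.332, "7SD": -4.602},
    "TMPRSS4": {"OGJ": -8.842, "22U": -5.565, "89M": -8.564, "7SD": -5.92},
    "4BXW": {"OGJ": -5.985, "22U": -3.283, "89M": -4.145, "7SD": -5.12},
    "5UGG": {"OGJ": -4.576, "22U": -4.589, "89M": -4.576, "7SD": -4.873},
    "2ANY": {"OGJ": -4.45, "22U": -5.11, "89M": -4.238, "7SD": -5.489},
}


def best_scoring_protein(table, inhibitor):
    """Return the protein whose score for `inhibitor` is the most negative."""
    return min(table, key=lambda protein: table[protein][inhibitor])


# Assumption: more negative score = more favourable predicted binding pose.
best = {inh: best_scoring_protein(scores, inh) for inh in ("OGJ", "22U", "89M", "7SD")}
for inhibitor, protein in best.items():
    print(inhibitor, protein, scores[protein][inhibitor])
```

With the values as tabulated, TMPRSS4 has the most negative score for all four inhibitors. Docking scores are predictions, so this ranking is only a way of summarising the table.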
Figure 1: Sequence comparison of TMPRSS2 and TMPRSS4 with their mouse orthologs Tmprss2 and Tmprss4. Panel (A) shows the domain architectures of the TMPRSS2 and TMPRSS4 proteins. The following domains with known activity are shown with their amino acid positions: Low-density lipoprotein receptor class A (LDLRA_2), Scavenger receptor cysteine rich (SRCR_2), and Serine proteases, trypsin family (Trypsin). The multiple sequence alignment of human TMPRSS2 and TMPRSS4 with their mouse orthologs is shown in panel (B). The approximate locations of the domains shown in (A) are indicated by bars on top of the multiple alignment in (B), with the same color code as in (A). The location of the Ser-His-Asp triad, responsible for the proteolytic activity of the Trypsin domain, is conserved in all four sequences and indicated by asterisks in (B). The serine protease domain and its active sites are conserved in all proteins. The LDLRA_2 domain appears to be functional in TMPRSS2 but it is truncated in TMPRSS4, which may have the cholesterol transport activity. Panel (C) shows the structural homologs of TMPRSS2 and TMPRSS4 in the PDB database.
Figure 2: Predicted 3D structures of the SRCR and serine-protease domains of TMPRSS2 and TMPRSS4. The figure shows the structural decomposition of the predicted structures. The Greek-key β-barrel fold and the scavenger receptor cysteine-rich (SRCR) domains are shown in indigo and grey, respectively.
Figure 3: Domain architectures of TMPRSS2 and TMPRSS4 structural homologs. The figure shows the domain architectures of 29 proteins from the PROSITE database. The active sites are shown with red diamonds and disulphide bridges with golden lines. Proteins shown (in the order of the figure labels): ACRO_SHEEP, EGFB2_MOUSE, GRAA_HUMAN, GRAK_HUMAN, GRAM_HUMAN, HEPS_HUMAN, Q52V24_ASTLP, ELA1_SALSA, Q3HR18_EISFE, VSPSX_GLOSA, TRYP_FUSOX, PRSS8_HUMAN, CFAD_MOUSE, CFAD_HUMAN, CFAI_HUMAN, FA7_HUMAN, FAXC_PSETE, PROZ_HUMAN, UROK_HUMAN, FA12_HUMAN, HGFA_HUMAN, THRB_HUMAN, PLMN_HUMAN, KLKB1_HUMAN, FA11_HUMAN, TM11E_HUMAN, ST14_HUMAN, A4UWM5_ORYLA, ENTK_HUMAN.
Figure 4: Superimposition of the protease domains of the predicted 3D structures of TMPRSS2 and TMPRSS4 with other known protease domains. The protease domain of the plasminogen PDB structure 5UGG is superimposed with that of TMPRSS2 (A) and TMPRSS4 (C). The TMPRSS2 and TMPRSS4 domains are shown in green and the 5UGG domain in red. Panels (B) and (D) show the superimposition of the serine-protease catalytic triad (Ser-His-Asp) of TMPRSS2 and TMPRSS4 in green and of 5UGG in red. The structure of the 5UGG ligand (PDB ID 87M) is shown with blue spheres in (A) and (C).
Figure 5: Docking of TMPRSS2 and TMPRSS4 with known protease inhibitors. Docking of the peptide-like thrombin inhibitor OGJ is shown with TMPRSS2 (A) and TMPRSS4 (B), and docking of the tranexamic acid-derived plasminogen inhibitor 89M is shown in (C) and (D). The docking score of OGJ was -4.851 kcal/mol with TMPRSS2 and -8.980 kcal/mol with TMPRSS4, whereas the scores of 89M were -4.332 kcal/mol and -8.564 kcal/mol, respectively. Hydrogen bonds are indicated by pink arrows and cation-pi interactions by red arrows.
Supplementary table 1: Structural homologs of TMPRSS2 and TMPRSS4 sequences identified using HHPred

TMPRSS2 sequence search hits

| PDB ID | Prob (a) | E-value (b) | Score (c) | SS (d) | Query HMM (e) | Template HMM (f) |
|---|---|---|---|---|---|---|
| 2XRC A | 100.0 | 5.5E-30 | 268.4 | 345 | 110-491 | 201-558(565) |
| 1Z8G A | 100.0 | 1.9E-27 | 235.6 | 349 | 140-492 | 1-363(372) |
| 2OQ5 A | 99.9 | 6.3E-25 | 200.2 | 230 | 256-489 | 1-231(232) |
| 4DGJ A | 99.9 | 9.9E-25 | 199.1 | 233 | 256-489 | 1-235(235) |
| 4BXW A | 99.9 | 2.5E-24 | 216.9 | 344 | 117-491 | 7-412(423) |
| 6R2W H | 99.9 | 3.4E-24 | 197.6 | 234 | 256-491 | 1-237(249) |
| 1P57 B | 99.9 | 3.8E-24 | 198.0 | 236 | 256-492 | 1-246(255) |
| 3W94 A | 99.9 | 4.1E-24 | 194.5 | 233 | 256-489 | 1-235(235) |
| 2ANY A | 99.9 | 9.5E-24 | 192.9 | 236 | 256-492 | 1-239(241) |
| 3H5C B | 99.9 | 1.1E-23 | 203.6 | 296 | 119-489 | 1-317(317) |
| 4XDE A | 99.9 | 6.8E-24 | 197.5 | 237 | 255-492 | 2-247(257) |
| 4CRG A | 99.9 | 7.6E-24 | 192.9 | 234 | 256-490 | 1-237(238) |
| 3NCL A | 99.9 | 9.1E-24 | 193.3 | 232 | 256-489 | 1-240(241) |
| 5NAT A | 99.9 | 1.3E-23 | 192.2 | 230 | 256-492 | 1-231(232) |
| 5FCR B | 99.9 | 1.8E-23 | 191.0 | 231 | 256-492 | 1-232(234) |
| 3GYL B | 99.9 | 1.3E-23 | 195.3 | 235 | 256-490 | 1-243(261) |
| 2F91 A | 99.9 | 1.7E-23 | 191.5 | 231 | 256-489 | 1-237(237) |
| 1ELT A | 99.9 | 1.6E-23 | 192.0 | 228 | 256-488 | 1-236(236) |
| 5YC6 U | 99.9 | 1.7E-23 | 192.1 | 234 | 256-489 | 1-246(246) |
| 2R0L A | 99.9 | 2E-23 | 192.7 | 236 | 256-492 | 1-242(248) |

TMPRSS4 sequence search hits

| PDB ID | Prob (a) | E-value (b) | Score (c) | SS (d) | Query HMM (e) | Template HMM (f) |
|---|---|---|---|---|---|---|
| 1Z8G A | 100.0 | 1.1E-33 | 270.1 | 340 | 96-436 | 1-362(372) |
| 2XRC A | 100.0 | 1.2E-29 | 256.1 | 372 | 60-436 | 26-558(565) |
| 2OQ5 A | 100.0 | 5.3E-26 | 200.3 | 229 | 205-434 | 1-231(232) |
| 2R0L A | 99.9 | 1.7E-25 | 199.5 | 229 | 205-436 | 1-241(248) |
| 1ORF A | 99.9 | 2.4E-25 | 197.1 | 229 | 205-437 | 1-234(234) |
| 4XDE A | 99.9 | 4.8E-25 | 198.2 | 233 | 203-436 | 1-246(257) |
| 1P57 B | 99.9 | 1.9E-24 | 193.2 | 231 | 205-436 | 1-245(255) |
| 5NAT A | 99.9 | 2.3E-24 | 190.5 | 225 | 205-435 | 1-229(232) |
| 1YC0 A | 99.9 | 2.6E-24 | 196.4 | 242 | 195-437 | 21-277(283) |
| 5YC6 U | 99.9 | 1.7E-24 | 192.0 | 230 | 205-434 | 1-246(246) |
| 1AO5 B | 99.9 | 6.3E-25 | 194.4 | 224 | 205-436 | 1-236(237) |
| 5UGG A | 99.9 | 6.6E-24 | 189.4 | 239 | 194-436 | 6-251(251) |
| 2ZGC A | 99.9 | 3.7E-24 | 190.2 | 229 | 205-436 | 1-231(240) |
| 1YM0 A | 99.9 | 2.6E-24 | 190.9 | 228 | 205-436 | 1-238(238) |
| 1FIW A | 99.9 | 3.4E-24 | 196.3 | 232 | 205-436 | 1-251(290) |
| 4DGJ A | 99.9 | 3.2E-24 | 189.1 | 228 | 205-434 | 1-235(235) |
| 1MZA A | 99.9 | 4.1E-24 | 189.4 | 228 | 204-435 | 2-236(240) |
| 4BXW A | 99.9 | 4.6E-24 | 207.3 | 241 | 194-437 | 159-413(423) |
| 3S69 A | 99.9 | 2.3E-24 | 190.8 | 226 | 205-436 | 1-227(234) |
| 1PQ7 A | 99.9 | 3.8E-24 | 187.3 | 224 | 205-433 | 1-224(224) |

Note: TMPRSS2 and TMPRSS4 protein sequences were searched against the entire PDB database using the HHPred web-server. The top 20 hits for each protein are reported in the table with their PDB and chain identifiers. (a) Probability of the target being a true positive; (b) the number of hits one can expect by chance with a score better than the one for the target when scanning the database; (c) raw sequence similarity score; (d) secondary structure similarity score between query and target; (e) range of aligned match states from the query HMM; (f) range of aligned match states from the target HMM.
[Figure panels: A) TMPRSS2; B) TMPRSS4. Axis label: Protein Length]
Supplementary figure 1: The position of the transmembrane helix in TMPRSS2 and TMPRSS4. The combined analysis report obtained from the TOPCONS web-server is shown, in which lower ΔG (free energy) values mark amino acids that are likely to be part of the transmembrane helix. The thick red and blue lines represent the inside and outside topology of the protein, respectively. A transmembrane helix is predicted at the N-terminus of both the TMPRSS2 (A) and TMPRSS4 (B) protein sequences.
Supplementary figure 2: Superimposition of the SRCR domains of the predicted 3D structures of TMPRSS2 and TMPRSS4 with the scavenger receptor cysteine-rich (SRCR) domain of the 1Z8G PDB structure. The TMPRSS2 (A) and TMPRSS4 (B) SRCR domains are shown in purple, while 1Z8G is shown in grey.