International Journal of Molecular Sciences

Article

Identification of Host Cellular Protein Substrates of SARS-CoV-2 Main Protease

Márió Miczi 1,2, Mária Golda 1,2, Balázs Kunkli 1,2, Tibor Nagy 3, József Tőzsér 1 and János András Mótyán 1,*

1 Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, 4032 Debrecen, Hungary; [email protected] (M.M.); [email protected] (M.G.); [email protected] (B.K.); [email protected] (J.T.)
2 Doctoral School of Molecular Cell and Immune Biology, University of Debrecen, 4032 Debrecen, Hungary
3 Department of Applied Chemistry, Faculty of Science and Technology, University of Debrecen, 4032 Debrecen, Hungary; [email protected]
* Correspondence: [email protected]; Tel.: +36-52/512-900

 Received: 23 November 2020; Accepted: 12 December 2020; Published: 15 December 2020 

Abstract: The novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of coronavirus disease-19 (COVID-19), which is associated with severe pneumonia. As with other viruses, the interaction of SARS-CoV-2 with host cell proteins is necessary for successful replication, and cleavage of cellular targets by the viral protease may also contribute to the pathogenesis, but knowledge about the human proteins that are processed by the main protease (3CLpro) of SARS-CoV-2 is still limited. We tested the prediction potentials of two different in silico methods for the identification of SARS-CoV-2 3CLpro cleavage sites in human proteins. Short stretches of homologous host-pathogen protein sequences (SSHHPS) that are present in SARS-CoV-2 polyprotein and human proteins were identified using BLAST analysis, and the NetCorona 1.0 webserver was used to successfully predict cleavage sites, although this method was primarily developed for SARS-CoV. Human C-terminal-binding protein 1 (CTBP1) was found to be cleaved in vitro by SARS-CoV-2 3CLpro; the existence of the cleavage site was proved experimentally by using a His6-MBP-mEYFP recombinant substrate containing the predicted target sequence. Our results highlight both potentials and limitations of the tested algorithms. The identification of candidate host substrates of 3CLpro may help to better understand the molecular mechanisms behind the replication and pathogenesis of SARS-CoV-2.

Keywords: COVID-19; coronavirus; SARS; SARS-CoV-2; main protease; 3CL protease; NetCorona; SSHHPS; cleavage site identification; cleavage site prediction; host protein cleavage

1. Introduction

A novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was identified in December 2019 as the causative agent of coronavirus disease-19 (COVID-19) that occurred first in Wuhan, Hubei province, China [1]. According to the data that were reported to the World Health Organization up to 10 December 2020, the global SARS-CoV-2 pandemic was associated with >68.16 million confirmed cases of infections and >1.55 million virus-related deaths worldwide (https://covid19.who.int). In general, viruses rely on the host machinery for efficient infection and for the completion of the replication cycle. Furthermore, changing expression profiles of host genes and interactions with the host proteins can also help the virus to evade the immune reaction after the infection, as was observed for both SARS-CoV and SARS-CoV-2 infections [2–5]. It is known that SARS coronavirus infection can influence multiple tissues or organs, including the respiratory system [6], coagulation

Int. J. Mol. Sci. 2020, 21, 9523; doi:10.3390/ijms21249523 www.mdpi.com/journal/ijms

system [7,8], gastrointestinal tract [9], or nervous system [10]. Numerous interacting partners of SARS-CoV-2 proteins have already been identified [3], but the detailed function and proteolytic targets of SARS-CoV-2 in the host cells are still understudied, however, various symptoms may be connected in part with the destruction of host proteins.

The genome of SARS-CoV-2 codes for multiple non-structural proteins (nsp) including two cysteine proteases, a papain-like protease (nsp3, PLpro), and a 3-chymotrypsin-like protease (nsp5, 3CLpro, or main protease), this last one is responsible for most of the processing of the viral polyprotein. Both SARS-CoV and SARS-CoV-2 3CL proteases consist of three domains. Domain I and II contain antiparallel β-barrels, while domain III has a helical arrangement. The active site comprises His41 and Cys145 catalytic residues [11–13].

SARS-CoV and SARS-CoV-2 3CL proteases share high sequence identity (96%) [14] and differ only in few residues (Figure 1a), including the Ser46 (SARS-CoV-2 3CLpro numbering) which serine residue is located in the proximity of the active site of the enzyme (Figure 1b) but is not involved in the formation of any substrate binding subsite (Figure 1c). The substrate-binding cleft is located between domain I and II, and the substrate-binding subsites show high conservation [11–13]. Each amino acid side chain of the substrate (P4-P4') which fit in a successive subsite of the enzyme (S4-S4') is named according to the notation of Schechter and Berger [15]. The S4 site of the protease is a shallow hydrophobic site, while S3 enables binding of a wide range of residues, including hydrophobic (e.g., Val), polar (e.g., Thr), or basic (Arg, Lys) residues, because P3 residue is exposed to the solvent. S2 and S1 are deep sites, S2 shows a preference for hydrophobic P2 residues (Leu, Phe, Val) of autoproteolytic cleavage sites of the polyprotein, while S1 pocket specifically binds P1-Gln residue. The relatively shallow S1' site mainly binds Ser or Ala residues, while the deep and hydrophilic S2' site can accept a wide variety of the residues even a large Lys. Similar to P3, the P3' residue is also exposed to the solvent, thus specific interactions are not formed with the protease at this site, and the shallow hydrophobic S4' site also can bind various residues [11–13]. The high conservation of substrate binding subsites implies that efficient inhibitors may target a wide range of CoV 3CL proteases [13], and the specificity of SARS-CoV-2 3CLpro may be highly comparable with that of SARS-CoV.
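The subsite preferences described above can be turned into a very rough sequence filter. The following is a minimal sketch, assuming a simplified consensus of P2 in {Leu, Phe, Val}, P1 = Gln and P1' in {Ser, Ala}; the pattern and the function name `find_candidate_sites` are illustrative choices, not part of NetCorona or of the method used in this study.

```python
import re

# Simplified consensus taken from the text (an assumption, not a validated model):
# P2 in {Leu, Phe, Val}, P1 = Gln, P1' in {Ser, Ala}.
# The lookahead checks P1' without consuming it.
CONSENSUS = re.compile(r"[LFV]Q(?=[SA])")


def find_candidate_sites(sequence: str) -> list:
    """Return 1-based positions of P1-Gln residues whose context fits the consensus."""
    # m.start() is the 0-based index of P2, so P1 is at m.start() + 1 (0-based),
    # i.e. m.start() + 2 in 1-based numbering.
    return [m.start() + 2 for m in CONSENSUS.finditer(sequence.upper())]


if __name__ == "__main__":
    # nsp4 site of the polyprotein (SAVLQ*SGFRK): Gln is the 5th residue.
    print(find_candidate_sites("SAVLQSGFRK"))
    # nsp8 site (AVKLQ*NNELS): P1' is Asn, so this strict pattern misses it.
    print(find_candidate_sites("AVKLQNNELS"))
```

Because the S1' site only "mainly" binds Ser or Ala, such a strict pattern misses genuine sites such as the nsp8 site, which is one reason why scoring approaches are preferred over fixed motifs.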

Figure 1. Comparison of SARS-CoV and SARS-CoV-2 3CL proteases. (a) Sequence alignment of the proteases is shown, only those residues are represented for SARS-CoV 3CLpro that are different as compared to SARS-CoV-2 3CLpro. Active site residues are green and underlined, residues that differ in the two proteases are red. (b) Schematic structure of SARS-CoV-2 3CLpro based on its X-ray crystal structure (6LU7.pdb). Catalytic residues (His41 and Cys145) are highlighted by green color, the residues that are different in SARS-CoV and SARS-CoV-2 3CL proteases are shown by red. Ser46 residue is also highlighted. Inhibitor bound to the active site is also shown by sticks. (c) Compositions of substrate binding subsites are shown based on literature data [11–13,16], subsite nomenclature is shown according to Schechter and Berger [15].

The autoproteolytic cleavage site sequences of SARS-CoV and SARS-CoV-2 3CL polyproteins have already been described [17,18], but only a few cleavage sites were identified in host target proteins. It has been reported that SARS-CoV 3CLpro can cleave cellular V-ATPase G1 in vitro [19], and A549 human lung carcinoma cells overexpressing SARS-CoV 3CLpro showed down-regulated NF-κB production [20]; the decreased NF-κB protein level may possibly be a consequence of the proteolytic processing of NF-κB by SARS-CoV 3CLpro. Based on the high-confidence interaction of SARS-CoV-2 3CLpro, histone deacetylase 2 (HDAC2) was also identified as a candidate target, and the catalytically inactive protease was found to interact with tRNA methyltransferase 1 (TRMT1), as well [3]. To date, only a single in vitro study has been reported in which an LC-MS based N-terminomics approach was applied to identify host targets of SARS-CoV and CoV-2 3CLpro by incubating the recombinantly expressed proteases with cell lysates of lung and epithelial cells. Numerous host targets were identified, and the cleavage site preferences derived from these in vitro proteomic analyses revealed a high preference for P1-Gln, P2-Leu, and P1’-Gly/Ala/Ser residues [21]. In silico methods are useful tools for the prediction of cleavage site sequences; such tools were designed for some viral proteases, e.g., for human immunodeficiency virus proteases (HIVcleave webserver) [22], picornaviral proteases (NetPicoRNA v. 1.0 webserver) [23], Group IV viral proteases [24,25], and an algorithm has also been developed for SARS-CoV 3CLpro (NetCorona 1.0 webserver) [26]. The identification of short stretches of homologous host-pathogen protein sequences (SSHHPS) was also used successfully to determine cleavage sites of Zika virus and Venezuelan equine encephalitis virus (VEEV) proteases in multiple human target proteins [24]. 
This method is based on the principle that host proteins may also contain sequences that are identical with cleavage site sequences of viral polyproteins and may therefore be targeted by the viral protease. The NetCorona webserver was developed based on multiple cleavage site sequences of coronavirus polyproteins and is applicable for the prediction of potential cleavage sites of SARS-CoV 3CLpro, thus it can be used for the identification of proteolytic targets and for inhibitor design, as well [26]. This algorithm was applied previously to predict cleavage sites in the nucleocapsid protein of porcine epidemic diarrhea virus (PEDV) 3CLpro [27], in the equine coronavirus polyprotein [28], or in human protein targets of SARS-CoV 3CLpro while developing the method [26]. In the case of SARS-CoV-2 3CLpro, glutathione peroxidase 1, selenoprotein F, and thioredoxin reductase 1 were proposed to be host substrates by using in silico algorithms [29], but the results were not validated in vitro. These proteins were not identified in the recently reported proteomic analysis as substrates of SARS-CoV-2 3CLpro [21], although in vitro identification of host targets in additional cell types remains to be performed. Therefore, the application of in silico methods may aid the identification of proteolytic targets, and results of in silico analyses can be correlated with those of in vitro measurements to assess the reliability of predictions, which are widely used in computational drug design. Accordingly, in this study, we aimed to apply SSHHPS analysis and the NetCorona 1.0 webserver to predict SARS-CoV-2 3CLpro cleavage sites. BLAST analysis was used to identify SSHHPS in human proteins, while NetCorona v. 1.0 webserver was applied for the prediction of cleavage probabilities. Structures of potential targets were also investigated to determine surface accessibilities of the predicted cleavage sites. 
Experimental approaches, including the design and use of His6-MBP-mEYFP recombinant protein substrates (MBP, maltose-binding protein; mEYFP, monomeric enhanced yellow fluorescent protein) were also applied to prove susceptibility for processing by SARS-CoV-2 3CLpro.

2. Results

2.1. Comparison of SARS-CoV and SARS-CoV-2 Protease Cleavage Sites

First, we compared the autoproteolytic cleavage site sequences of SARS-CoV and SARS-CoV-2 3CLpro and found that the recognition sites closely resemble each other (Figure 2). Similar to SARS-CoV [21,30], SARS-CoV-2 3CLpro cleavage sites also contain a conserved Gln residue in the P1 position, and there are hydrophobic (Leu, Phe, or Val) and small aliphatic residues (mainly Ser or Ala) in P2 and P1' positions, respectively.

Figure 2. Comparison of SARS-CoV and SARS-CoV-2 cleavage site sequences. Sequences are shown for PLpro and 3CLpro based on literature data [15,16]. Cleavage positions are shown by asterisks, the sequences enclosed by the lines were used to prepare sequence logos. P4-P4' substrate residues are numbered according to the nomenclature of Schechter and Berger [15].

Both the identical binding site compositions (Figure 1) and high similarity of 3CLpro cleavage sites in the viral polyproteins (Figure 2) implied that SARS-CoV and SARS-CoV-2 3CL proteases share similar substrate profiles. Accordingly, the NetCorona v. 1.0 webserver which has been developed primarily for the prediction of SARS-CoV 3CLpro cleavage sites [26] was assumed to be potentially applicable to predict SARS-CoV-2 3CLpro cleavage sites, as well.
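The residue preferences summarized above can be checked by simple counting. The sketch below tallies the residues found at a chosen position of the eleven SARS-CoV-2 3CLpro autoproteolytic sites listed in Table 1; it is only an illustration of how the per-position data behind a sequence logo can be obtained, and the helper name `position_counts` is hypothetical.

```python
from collections import Counter

# 3CLpro autoproteolytic sites of the SARS-CoV-2 polyprotein as listed in Table 1
# (P5-P5', asterisk marks the scissile bond).
SITES_3CLPRO = [
    "SAVLQ*SGFRK", "GVTFQ*SAVKR", "VATVQ*SKMSD", "RATLQ*AIASE",
    "AVKLQ*NNELS", "TVRLQ*AGNAT", "EPMLQ*SADAQ", "HTVLQ*AVGAC",
    "VATLQ*AENVT", "FTRLQ*SLENV", "YPKLQ*SSQAW",
]


def position_counts(sites, offset):
    """Count residues at a position relative to the scissile bond.

    offset -1 is P1, -2 is P2, +1 is P1', +2 is P2', and so on.
    """
    if offset == 0:
        raise ValueError("offset must be non-zero")
    counts = Counter()
    for site in sites:
        left, right = site.split("*")
        residue = left[offset] if offset < 0 else right[offset - 1]
        counts[residue] += 1
    return counts


if __name__ == "__main__":
    for label, offset in (("P2", -2), ("P1", -1), ("P1'", 1)):
        print(label, dict(position_counts(SITES_3CLPRO, offset)))
```

In this small set, P1 is always Gln, P2 is Leu in nine of eleven sites, and P1' is Ser or Ala in ten of eleven sites, in line with the consensus described in the text.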

2.2. Testing NetCorona 1.0 Webserver for Prediction of SARS-CoV-2 3CLpro Cleavage Sites

First, we tested whether NetCorona 1.0 algorithm is suitable for the identification of autoproteolytic cleavage sites within the SARS-CoV-2 polyprotein. As was expected, no NetCorona score was predicted for the cleavage sites of PLpro (nsp1, nsp2, and nsp3) because these sites are different from the consensus pattern of 3CLpro. Cleavage sites of 3CLpro were identified successfully by the webserver (Table 1), only the nsp5 site resulted in a score being slightly below the threshold, indicating that 87% sensitivity of the method [26] may be a limiting factor of prediction. The results implied that the NetCorona 1.0 webserver can be potentially applied to predict cleavage sites of SARS-CoV-2 3CLpro.

Table 1. Prediction of cleavage sites of SARS-CoV-2 polyprotein by NetCorona 1.0 webserver. Sequence of SARS-CoV-2 polyprotein was used as input [31]. Values lower than the threshold (0.5) are not predicted as a potential cleavage site. Cleavage positions are shown by asterisks.

SARS-CoV-2 Cleavage Site   Protease   Sequence       NetCorona 1.0 Score
nsp1                       PLpro      ELNGG*AYTRY    no score given
nsp2                       PLpro      TLKGG*APTKV    no score given
nsp3                       PLpro      ALKGG*KIVNN    no score given
nsp4                       3CLpro     SAVLQ*SGFRK    0.891
nsp5                       3CLpro     GVTFQ*SAVKR    0.458
nsp6                       3CLpro     VATVQ*SKMSD    0.783
nsp7                       3CLpro     RATLQ*AIASE    0.838
nsp8                       3CLpro     AVKLQ*NNELS    0.860
nsp9                       3CLpro     TVRLQ*AGNAT    0.904
nsp10                      3CLpro     EPMLQ*SADAQ    0.865
nsp12                      3CLpro     HTVLQ*AVGAC    0.905
nsp13                      3CLpro     VATLQ*AENVT    0.680
nsp14                      3CLpro     FTRLQ*SLENV    0.964
nsp15                      3CLpro     YPKLQ*SSQAW    0.899
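The thresholding described for Table 1 can be reproduced with a few lines. This is a minimal sketch that applies the 0.5 cut-off to the scores reported in the table; treating a score exactly equal to the threshold as a predicted site is an assumption made here, and the function name `split_by_threshold` is hypothetical.

```python
# NetCorona 1.0 scores of the 3CLpro sites as reported in Table 1.
SCORES = {
    "nsp4": 0.891, "nsp5": 0.458, "nsp6": 0.783, "nsp7": 0.838,
    "nsp8": 0.860, "nsp9": 0.904, "nsp10": 0.865, "nsp12": 0.905,
    "nsp13": 0.680, "nsp14": 0.964, "nsp15": 0.899,
}
THRESHOLD = 0.5  # values lower than this are not predicted as cleavage sites


def split_by_threshold(scores, threshold=THRESHOLD):
    """Return (predicted, missed) site names; a score equal to the threshold
    is counted as predicted (an assumption of this sketch)."""
    predicted = [site for site, score in scores.items() if score >= threshold]
    missed = [site for site, score in scores.items() if score < threshold]
    return predicted, missed


if __name__ == "__main__":
    predicted, missed = split_by_threshold(SCORES)
    recovered = len(predicted) / len(SCORES)
    print(f"predicted: {len(predicted)} of {len(SCORES)} ({recovered:.1%})")
    print("missed:", missed)
```

With these values, ten of the eleven known 3CLpro sites are recovered and only the nsp5 site falls below the threshold, which is of the same order as the 87% sensitivity reported for the method [26].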

2.3. Identification of Host Targets by SSHHPS Analysis and NetCorona Prediction

The NetCorona algorithm was found to be an effective tool for the prediction of those cleavage sites within the full-length polyprotein sequences which show the consensus pattern (Table 1), no additional cleavage sites were identified in the polyprotein. We assumed that other methods that are based on the similarities of sequence motifs may also be applicable for cleavage site identification. Such a method is the SSHHPS analysis of which prediction potential has already been proved in the case of Group IV proteases [24,25]. We applied this method to find candidate targets of SARS-CoV-2 3CLpro, the SSHHPS were identified in human proteins by BLAST analysis using autoproteolytic cleavage site sequences of SARS-CoV-2 polyprotein as input (Table 1).

The results of SSHHPS analyses are shown in Table S1 for all cleavage sites of SARS-CoV-2 polyprotein. Numerous human proteins were found to contain such a site that is similar to autoproteolytic cleavage sites of the polyprotein, highest similarities were obtained e.g., for C-terminal-binding protein 1 and 2 (CTBP1 and CTBP2), dihydropyrimidinase-related protein 2, protein tyrosine kinase 6 (PTK6), acetylcholinesterase (ACHE), protocadherin 19, JNK1/MAPK8-associated membrane protein, or obscurin proteins (Table S1).

SSHHPS analysis showed a high similarity of a sequence motif of human PTK6 protein (89VRRLQ*AEGNA98) with that of the viral polyprotein (nsp9, TVRLQ*AGNAT). Accordingly, this site was identified by NetCorona 1.0 webserver with a relatively high probability, indicating that PTK6 contains a putative cleavage site of 3CLpro (Figure 3).

Figure 3. Putative SARS-CoV-2 3CLpro cleavage sites in some selected human target proteins. For the selected proteins, PDB and UniProt identifiers are shown, the scores and cleavage sites predicted by NetCorona 1.0 webserver are also indicated. Predicted score for IRAK1 was reported previously [26]. The structures of the proteins are also shown (grey), the predicted cleavage sites are highlighted (blue), arrows show the P1-Gln residues.

Human CTBP1 protein was also predicted to contain a sequence motif (373ELNGAAYRYP382) which is similar to the nsp1 site of SARS-CoV-2 polyprotein (ELNGG*AYTRY). The relatively high score obtained for this site by SSHHPS analysis indicated that CTBP1 may also be a proteolytic target. However, the identified cleavage site is likely to be a cleavage site of PLpro, the putative target sequence does not resemble the consensus pattern of 3CLpro cleavage sites and contains no glutamine in the P1 position. Despite this, the sequence of CTBP1 was analyzed by the NetCorona 1.0 webserver, as well (Figure S1). As it was expected, the single motif (373ELNGAAYRYP382) of CTBP1, identified by SSHHPS analysis, was not predicted as a putative cleavage site of 3CLpro, but interestingly the prediction revealed a putative 3CLpro cleavage site in CTBP1 (153GTRVQ*SVEQI162), which was not identified by SSHHPS analysis based on similarity with nsp1–15 cleavage sites.
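The similarity between host motifs and viral cleavage sites can be illustrated by counting identical positions in an ungapped, position-by-position comparison. This is only a toy stand-in for the SSHHPS search: BLAST performs local alignment with a substitution matrix and may introduce gaps, so its alignments and scores can differ from the counts below. The function name `ungapped_identity` is hypothetical.

```python
def ungapped_identity(host_motif: str, viral_site: str) -> int:
    """Count identical residues between two equally long motifs.

    Asterisks (cleavage position markers) are ignored. No gaps or
    substitution scores are used, unlike in a BLAST alignment.
    """
    a = host_motif.replace("*", "").upper()
    b = viral_site.replace("*", "").upper()
    if len(a) != len(b):
        raise ValueError("motifs must have the same length after removing '*'")
    return sum(x == y for x, y in zip(a, b))


if __name__ == "__main__":
    # CTBP1 motif (residues 373-382) versus the nsp1 site of the polyprotein.
    print(ungapped_identity("ELNGAAYRYP", "ELNGG*AYTRY"))
    # PTK6 motif (residues 89-98) versus the nsp9 site of the polyprotein.
    print(ungapped_identity("VRRLQ*AEGNA", "TVRLQ*AGNAT"))
```

In this strict comparison the CTBP1 motif shares six of ten positions with the nsp1 site, while the PTK6 motif shares four of ten positions with the nsp9 site, all of them in the contiguous RLQ*A core around the scissile bond.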

These results implied that the SSHHPS analysis may also be potentially applicable for the identification of the cleavage sites of PLpro, however, testing prediction potential in the case of PLpro was out of the scope of this study. NetCorona 1.0 webserver is applicable only for the prediction of 3CLpro sites.

2.4. Selection of Targets for In Vitro Investigation

Out of the possible targets identified by in silico sequence analyses, we selected CTBP1 and PTK6 for further investigation. These proteins were identified as candidate substrates of SARS-CoV-2 3CLpro using NetCorona 1.0 webserver, as well (Figure 3). Interleukin-1 receptor-associated kinase 1 (IRAK1) was predicted previously to be potentially cleaved by SARS-CoV 3CLpro [26], thus it was also selected for testing its proteolysis in vitro. We decided to include IRAK1 in this study in order to prove that potential targets of SARS-CoV 3CLpro may be accessible for cleavage by SARS-CoV-2 3CLpro, as well. Furthermore, to the best of our knowledge, cleavage of IRAK1 by SARS-CoV or SARS-CoV-2 3CLpro has not been proved experimentally to date. Interestingly, the cleavage site in IRAK1 was not identified by SSHHPS analysis; however, it was predicted with a high score by the NetCorona algorithm (Figure 3). This may highlight a limitation of SSHHPS analysis and implies that the number of potential targets may depend on the settings of the BLAST search. It has already been described that COVID-19 caused by SARS-CoV-2 infection may lead to coagulation disorders and increased risk of venous thromboembolism [7]. Therefore, we investigated whether some human plasma proteins may be susceptible to proteolysis by SARS-CoV-2 3CLpro. The sequences of these proteins were analyzed only by the NetCorona algorithm, because SSHHPS were not identified by BLAST analysis in these proteins. Human plasminogen (PLMN) and plasminogen activator inhibitor 2 (PAI2) were identified as candidate substrates, while fibrinogen, plasminogen activators, and plasminogen activator inhibitor 1 (PAI1) were predicted to contain no putative 3CLpro cleavage site (Table S2). A higher score was obtained for PLMN as compared to PAI2, therefore, we selected PLMN for testing cleavage in vitro (Figure 3). 
Identification of some plasma proteins (PLMN, PAI2) as potential targets by the NetCorona webserver implied that SSHHPS analysis alone may not be sensitive enough for the high throughput identification of protease substrates. However, these methods are based on different approaches for cleavage site identification, and the number of identified sites may depend on BLAST settings (e.g., length of query sequence). Additionally, we assumed that structural contexts of the putative cleavage sites need to be considered, therefore, accessibilities of target regions were also determined. A similar in silico approach has already been applied for the identification of potential cleavage sites in host selenoproteins and enzymes of glutathione synthesis [29], but neither proteomic [21] nor specifically targeted analyses have proved cleavage of these targets in vitro to date. Interestingly, PAI2 was identified as a substrate of SARS-CoV and hCoV-NL63 3CLpro, as well, while cleavage of PLMN was not detected by a proteomic analysis [21]. To investigate whether the candidate substrates are sensitive towards proteolysis by SARS-CoV-2 3CLpro in vitro, we selected CTBP1, PTK6, IRAK1, and PLMN recombinant proteins because the potential cleavage sites of these proteins were found to be exposed to the surface (Figure 3). The putative cleavage site in acetylcholinesterase (ACHE) was found to be buried in the structure, therefore, ACHE was excluded from further analysis. The example of ACHE shows the importance of structural analysis of candidate substrates: the possible target sequences may be inaccessible for proteolysis despite the high probability of cleavage implied by sequence-based prediction (e.g., by NetCorona v. 1.0).

2.5. In Vitro Cleavage of Recombinant Proteins by SARS-CoV-2 3CLpro

For in vitro cleavage reactions we used untagged SARS-CoV-2 3CLpro. The protease was expressed in BL21(DE3) cells fused to an N-terminal His6-tag and then purified by Ni-NTA affinity chromatography. After the enzymatic removal of His6-tag using Factor Xa, the untagged enzyme was further purified by ion-exchange chromatography. The purity of the enzyme was assessed by SDS-PAGE (Figure 4).

Figure 4. Purification of SARS-CoV-2 3CLpro. Figure shows a Coomassie-stained gel image, after SDS-PAGE analysis of samples: (1) Molecular weight standard; (2) Soluble lysate; (3) Ni-NTA affinity chromatography, flowthrough; (4) Ni-NTA affinity chromatography, eluate fraction of His6-SARS-CoV-2 3CLpro; (5) ion-exchange affinity chromatography, eluate of untagged SARS-CoV-2 3CLpro. Dotted arrow shows SARS-CoV-2 3CLpro, which has a slightly lower molecular weight than the His6-tagged enzyme.

Cleavage reactions were performed with SARS-CoV-2 3CLpro to investigate the susceptibility of the selected human proteins to proteolysis (Figure 5). Additionally, a His6-MBP-mEYFP recombinant protein containing a natural cleavage site of SARS-CoV-2 3CLpro (nsp4, TSAVLQ*SGFRKM) was also designed and applied as a positive control substrate in the in vitro cleavage reactions. The NetCorona score obtained for the recombinant substrate was identical to the value calculated for the AVLQ*SGFR cleavage site of the polyprotein (Table 1). As was expected, the His6-MBP-TSAVLQ*SGFRKM-mEYFP fusion protein substrate was cleaved very efficiently by the protease. The substrate and cleavage products were separated by denaturing SDS-PAGE and then detected in the gel using UV transillumination, which indicated successful in-gel renaturation of mEYFP (Figure 5a).

Figure 5. Cleavage of recombinant proteins by SARS-CoV-2 3CLpro. After cleavage reactions by SARS-CoV-2 PR, reaction mixtures were analyzed by SDS-PAGE. Uncleaved proteins and SARS-CoV-2 PR are shown by continuous and dashed lines, respectively, while asterisks indicate cleavage products. (a) Cleavage of His6-MBP-TSAVLQ*SGFRKM-mEYFP. After in-gel renaturation, the full-length substrate and SGFRKM-mEYFP cleavage product were visualized by UV illumination, and the gel was stained with Coomassie dye, as well. (b) Cleavage reaction of recombinant BSA. (c) Cleavage reaction of recombinant plasminogen. (d) Cleavage reaction of recombinant IRAK1. (e) Cleavage reaction of recombinant PTK6. (f) Cleavage reaction of recombinant CTBP1. Gel images are representatives of ≥2 parallel experiments.

For negative control, we applied bovine serum albumin (BSA) as a substrate of the untagged enzyme. Neither SSHHPS analysis nor the NetCorona webserver predicted SARS-CoV-2 3CLpro cleavage sites in BSA (Table S2); in agreement with this, we found that BSA was not processed by the protease (Figure 5b).

PLMN was also predicted to contain a putative cleavage site, but we observed no processing (Figure 5c). However, the NetCorona score obtained for the putative cleavage site was above the threshold (0.5) but was below the highest probability range (0.8–1.0) (Figure 3).

IRAK1 was identified previously as a candidate target of SARS-CoV 3CLpro [27], but its processing by a coronavirus 3CLpro has not been proved to date; therefore, IRAK1 was also subjected to proteolysis. We observed almost complete turnover of IRAK1 upon cleavage by SARS-CoV-2 3CLpro (Figure 5d), proving that IRAK1 is a proteolytic target of SARS-CoV-2 3CLpro. In this study, this protein was not newly identified as a candidate substrate and thus was not further investigated in vitro.

CTBP1 protein was found to be processed, but we observed lower turnover as compared to IRAK1 (Figure 5e). The appearance of cleavage products implied processing of CTBP1, and we assumed that this cleavage occurs within the cleavage site identified by the NetCorona v. 1.0 webserver (153GTRVQ*SVEQI162). In order to prove the existence of cleavage between the 157th and 158th residues, processing of CTBP1 was further investigated, as described later.

PTK6 was identified as a potential target both by SSHHPS analysis and NetCorona prediction, but we did not detect its processing by SARS-CoV-2 3CLpro (Figure 5f). It is important to note that the recombinant PTK6 was supplied in a buffer containing phenylmethylsulfonyl fluoride (PMSF), which is known to effectively inhibit numerous serine proteases (including chymotrypsin and trypsin). Therefore, to exclude the possibility that the processing of PTK6 was impaired by PMSF, we investigated its effect on SARS-CoV-2 3CLpro activity. We found that PMSF does not inhibit the processing of the His6-MBP-TSAVLQ*SGFRKM-mEYFP recombinant substrate, even at 0.005% (m/v) final concentration (Figure 6b). This implied that PMSF present in the stock solution cannot prevent proteolysis and proved that PTK6 is not a proteolytic target of SARS-CoV-2 3CLpro.

The effect of the N-terminal His6-tag on the activity of 3CLpro was also tested, and the His6-tagged enzyme was unable to process the recombinant substrate (Figure 6a). This is in agreement with the findings of Grum-Tokars et al., who revealed a dramatic decrease of SARS-CoV 3CLpro activity upon addition of N- or C-terminal affinity tags [32].

Figure 6. Effect of N-terminal His6 tag and PMSF on SARS-CoV-2 3CLpro activity. (a) In cleavage reactions we used His6-MBP-TSAVLQ*SGFRKM-mEYFP as a substrate of His6-tagged and untagged forms of SARS-CoV-2 3CLpro. (b) The effect of PMSF on the cleavage of His6-MBP-TSAVLQ*SGFRKM-mEYFP was studied, using untagged protease. PMSF was dissolved in ethanol and was applied in 0.005–0.00005% (m/v) concentration range. Uncleaved substrate is shown in figure part (a). Gel images are representatives of ≥2 parallel experiments.


In order to investigate the possible causes of why we observed no proteolysis in the case of some candidate targets, we further analyzed the structures and compared the accessibilities of the putative cleavage sites (Figure 7). The comparison of solvent-accessible surface areas of PTK6, PLMN, IRAK1, and CTBP1 structures showed that P5–P1 and P1’–P5’ residues may have relatively lower accessibility in PTK6 and PLMN, respectively. We assume that relatively lower solvent-accessible surface areas (SASA) of these sites may prevent efficient binding and cleavage of the substrate. In contrast, the average values obtained for P5–P5’ residues are more comparable in CTBP1 and IRAK1 proteins and show relatively higher overall accessibility of the putative cleavage site; however, a threshold was not determined. We assumed that the relatively lower cleavage efficiency of CTBP1 may be caused in part by the accessibilities of P2’–P5’ residues of the 153GTRVQ*SVEQI162 site, which are located in an α-helix, while the entire target site of IRAK1 is located in a loop region. The relatively higher accessibilities of cleavage sites are in agreement with the susceptibilities of CTBP1 and IRAK1 for proteolysis in vitro (Figure 5) and indicate that probabilities of cleavage sites predicted by the NetCorona webserver need to be interpreted by considering surface accessibilities of putative sites, as well. Our result highlights that determination of apparent accessibilities of cleavage sites in the protein structures is not sufficient (Figure 2); in agreement with the results of Taylor and Radding [29], we also suggest the detailed determination of structural characteristics, especially the calculation of numerical SASA values (Figure 7), for more reliable cleavage site prediction.

Figure 7. Analysis of solvent-accessible surface areas. Relative solvent-accessible surface area (SASA) values (%) were determined for all atoms in PTK6, PLMN, IRAK1, and CTBP1, and were averaged for every position in a 5-residue window. Values for P5–P5’ residues are highlighted by red, the average-of-average and SD values calculated for P5–P5’ sites are shown in brackets. Residue numbering starts in each case from the first residue of the protein in the coordinate file.
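The windowed averaging described in the caption can be sketched as follows. This is only an illustration under stated assumptions: the window is taken as centered on each residue and truncated at the chain termini (the caption does not say how the window is positioned), and the function names and the SASA values in the usage example are hypothetical, not data from this study.

```python
from statistics import mean, stdev


def window_average(values, window=5):
    """Centered moving average; windows are truncated at the chain termini."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


def site_summary(values, p1_index, flank=5, window=5):
    """Mean and SD of the windowed values over P5..P5'.

    p1_index is the 0-based index of the P1 residue in `values`.
    """
    start = p1_index - flank + 1
    end = p1_index + flank + 1
    if start < 0 or end > len(values):
        raise ValueError("cleavage site is too close to a chain terminus")
    site = window_average(values, window)[start:end]
    return mean(site), stdev(site)


# Hypothetical relative SASA values (%) for a 14-residue stretch, P1 at index 6.
relative_sasa = [12, 30, 45, 50, 38, 60, 72, 65, 40, 22, 18, 35, 41, 27]
avg, sd = site_summary(relative_sasa, p1_index=6)
print(f"P5-P5' average of averages: {avg:.1f}% (SD {sd:.1f})")
```

Smoothing before summarizing the site reduces the influence of a single exposed or buried side chain, which is one plausible reason to prefer a windowed value over per-residue numbers.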

2.6. Identification of Cleavage Position in CTBP1 and in His6-MBP-mEYFP Substrates

We designed and prepared His6-MBP-mEYFP recombinant fusion proteins which were used as substrates of SARS-CoV-2 3CLpro. A His6-MBP-TSAVLQ*SGFRKM-mEYFP substrate contained the natural nsp4 cleavage site of SARS-CoV-2 polyprotein, while a His6-MBP-REGTRVQ*SVEQIRE-mEYFP protein contained the cleavage site sequence which was identified in CTBP1 by the NetCorona algorithm (153GTRVQ*SVEQI162).

As was expected, His6-MBP-TSAVLQ*SGFRKM-mEYFP was processed by SARS-CoV-2 3CLpro (Figure 5a), and we observed proteolysis of the His6-MBP-REGTRVQ*SVEQIRE-mEYFP protein as well. The substrate turnover was lower as compared to His6-MBP-TSAVLQ*SGFRKM-mEYFP, which implies lower cleavage efficiency for the CTBP1 cleavage site (Figure 8). This is in contrast with the obtained cleavage probabilities, as a higher NetCorona score was obtained for the CTBP1 cleavage site (0.946) as compared to the nsp4 site (0.891). This observation implied that the cleavage probabilities predicted purely based on protein sequence may show no strong correlation with cleavage efficiencies, indicating that it is important to validate the results of in silico predictions in vitro.

Figure 8. Cleavage of recombinant fusion protein substrates by SARS-CoV-2 3CLpro. His6-MBP-mEYFP substrates containing TSAVLQ*SGFRKM (SARS-CoV-2 nsp4) or REGTRVQ*SVEQIRE (CTBP1) cleavage site sequences were digested with SARS-CoV-2 3CLpro. After SDS-PAGE, the gel was stained with Coomassie dye. Uncleaved proteins and SARS-CoV-2 PR are shown by continuous and dashed lines, respectively, while asterisks indicate cleavage products. Gel images are representatives of ≥2 parallel experiments.

The reaction mixtures were analyzed by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) and the molecular weights of cleavage fragments were determined for the identification of cleavage positions. As was expected, the recombinant His6-MBP-mEYFP substrate representing the nsp4 cleavage site sequence of the polyprotein (TSAVLQ*SGFRKM) was cleaved within the incorporated sequence at the desired position (Figure 9a). After the cleavage of recombinant CTBP1 protein, the analysis of cleavage fragments implied that the full-length protein is cleaved at the predicted site (153GTRVQ*SVEQI162) (Figure 9b), and the recombinant His6-MBP-mEYFP substrate containing the same cleavage site was also cleaved at the same predicted position (Figure 9c). These results proved that the fluorescent substrates are suitable for fluorimetric assay.

2.7. Comparison of Cleavage Efficiencies of SARS-CoV-2 and CTBP Cleavage Sites

After proving that the recombinant substrates are cleaved at the desired position (Figure 9), we performed cleavage reactions using the His6-MBP-mEYFP substrates to demonstrate that the fusion proteins are applicable for proteinase assays and to compare the cleavage efficiencies of SARS-CoV-2 and CTBP1 cleavage sites (Figure 10). The cleavage reactions revealed that the His6-MBP-TSAVLQ*SGFRKM-mEYFP substrate containing a natural cleavage site of the polyprotein is a better substrate of SARS-CoV-2 3CLpro as compared to the CTBP cleavage site-containing substrate (Figure 10) (Table 2). This is in agreement with the results of the gel-based assay, which showed higher cleavage efficiency of the His6-MBP-TSAVLQ*SGFRKM-mEYFP substrate as compared to His6-MBP-REGTRVQ*SVEQIRE-mEYFP (Figure 8), but is in contrast with the higher NetCorona score obtained for the latter cleavage site (Figure 3). Our results proved that the designed substrates can be used for proteolytic assays, and show that the substrate is processed at the incorporated CTBP1 cleavage site only with low efficiency.


Figure 9. Determination of cleavage sites by MALDI-TOF MS. In cleavage reactions His6-MBP-TSAVLQ*SGFRKM-mEYFP (a), recombinant CTBP1 (b), and His6-MBP-REGTRVQ*SVEQIRE-mEYFP proteins (c) were used as substrates. Molecular weights (Da) of SARS-CoV-2 3CLpro are labeled by black, values for the full-length substrates are shown by dark blue color, while the proteolytic fragments are colored by light blue and green, respectively. Cleavage site sequences are also shown, asterisks indicate cleavage position.
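Assigning a cleavage position from MALDI-TOF data relies on comparing measured fragment masses with the masses expected for cleavage at a given bond. The sketch below illustrates that calculation for the inserted CTBP1 site sequence alone, since the full fusion protein sequences are not given in this section. It assumes standard average residue masses and unmodified termini; the function names are hypothetical, and the numbers it prints are not the values reported in Figure 9.

```python
# Average (not monoisotopic) residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water per free polypeptide chain (H at N-term, OH at C-term)


def average_mass(sequence: str) -> float:
    """Average mass (Da) of an unmodified linear polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def cleavage_fragments(site: str) -> tuple[str, str]:
    """Split a sequence written with '*' at the scissile bond (P1*P1')."""
    n_fragment, c_fragment = site.split("*")
    return n_fragment, c_fragment


site = "REGTRVQ*SVEQIRE"  # CTBP1-derived insert used in the fusion substrate
n_frag, c_frag = cleavage_fragments(site)
intact = average_mass(n_frag + c_frag)
print(f"intact peptide: {intact:.2f} Da")
print(f"N-terminal fragment {n_frag}: {average_mass(n_frag):.2f} Da")
print(f"C-terminal fragment {c_frag}: {average_mass(c_frag):.2f} Da")
```

Hydrolysis adds one water molecule, so the two fragment masses sum to the intact mass plus about 18 Da; this gives a quick consistency check on an assigned cleavage position.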

Table 2. Kinetic parameters determined for SARS-CoV-2 3CLpro using His6-MBP-TSAVLQ*SGFRKM-mEYFP and His6-MBP-REGTRVQ*SVEQIRE-mEYFP substrates. Data were calculated based on the results of two parallel experiments.

| Cleavage Site | KM (µM) | kcat (s−1) | kcat/KM (µM−1 s−1) |
|---|---|---|---|
| TSAVLQ*SGFRKM | 5.086 ± 2.046 | 1.349 ± 0.370 | 0.2652 ± 0.1291 |
| REGTRVQ*SVEQIRE | 0.860 ± 0.303 | 0.0033 ± 0.0004 | 0.0038 ± 0.0014 |
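The relationship between the columns of Table 2 can be checked with a short calculation. The sketch below assumes simple Michaelis-Menten kinetics and uses only the mean KM and kcat values from the table; the enzyme concentration is a hypothetical placeholder, and the function names are introduced here for illustration.

```python
def catalytic_efficiency(kcat: float, km: float) -> float:
    """kcat/KM in µM^-1 s^-1 when kcat is in s^-1 and KM in µM."""
    return kcat / km


def initial_velocity(substrate: float, kcat: float, km: float, enzyme: float) -> float:
    """Michaelis-Menten initial rate v0 = kcat * [E] * [S] / (KM + [S]), in µM s^-1."""
    return kcat * enzyme * substrate / (km + substrate)


# Mean values taken from Table 2 (KM in µM, kcat in s^-1).
TABLE_2 = {
    "TSAVLQ*SGFRKM": {"km": 5.086, "kcat": 1.349},
    "REGTRVQ*SVEQIRE": {"km": 0.860, "kcat": 0.0033},
}

efficiency = {
    site: catalytic_efficiency(p["kcat"], p["km"]) for site, p in TABLE_2.items()
}
fold = efficiency["TSAVLQ*SGFRKM"] / efficiency["REGTRVQ*SVEQIRE"]

for site, value in efficiency.items():
    print(f"{site}: kcat/KM = {value:.4f} µM^-1 s^-1")
print(f"nsp4 site is processed about {fold:.0f}-fold more efficiently")

ENZYME_UM = 0.01  # hypothetical enzyme concentration, for illustration only
nsp4 = TABLE_2["TSAVLQ*SGFRKM"]
print(initial_velocity(nsp4["km"], nsp4["kcat"], nsp4["km"], ENZYME_UM))
```

Note that the CTBP1 site has the lower KM but a far lower kcat, so the difference in catalytic efficiency is driven by turnover rather than by apparent affinity.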


[Figure 10 plot: initial velocity v0 (µM s−1) versus substrate concentration [S] (µM).]

Figure 10. Kinetic analysis of SARS-CoV-2 3CLpro using recombinant fusion protein substrates. Data obtained for His6-MBP-TSAVLQ*SGFRKM-mEYFP and His6-MBP-REGTRVQ*SVEQIRE-mEYFP substrates are shown by full circles and squares, respectively. Averages of values obtained from two parallel experiments are plotted, error bars represent SD. Kinetic parameters are shown in Table 2.

The GTRVQ*SVEQI sequence motif is fully identical in the highly homologous CTBP1 and CTBP2 human proteins (Figure S2); therefore, CTBP2 is likely to be a target of SARS-CoV-2 3CL protease as well. In agreement with this, CTBP2 has been proved to be a proteolytic target of hCoV-NL63 3CLpro [21], and the highly similar cleavage site specificities imply that CTBP proteins may be potential targets of SARS-CoV and SARS-CoV-2 3CLpro enzymes, but their susceptibility for proteolytic cleavage needs to be investigated in the context of other cell types and/or species, as well.

3. Discussion

In this work, we aimed to test the application of sequence-based algorithms for the prediction of SARS-CoV-2 3CLpro cleavage sites in different proteins; such methods have already been applied in the case of SARS-CoV 3CLpro [26] or Zika and VEEV Group IV viral proteases [24,25].

Comparison of SARS-CoV and SARS-CoV-2 3CL proteases showed a high identity of protease sequences and substrate binding subsite compositions, which implied that the NetCorona v. 1.0 webserver, which was developed for the prediction of SARS-CoV 3CLpro cleavage sites [26], may be applicable to identify potential host targets of SARS-CoV-2 3CLpro. In addition, identification of SSHHPS using BLAST analysis may be applied to identify putative target sequences of PLpro; however, prediction potential was not tested in this context.

The most probable candidate host substrates were considered to contain SSHHPS and/or a potential cleavage site with a high NetCorona score, and of the candidate targets, we selected CTBP1 and PTK6 proteins and investigated their susceptibility for proteolysis in vitro. Additionally, IRAK1, which has already been predicted previously to contain a potential cleavage site of SARS-CoV 3CLpro [26], was also studied, and we proved that it is a substrate of SARS-CoV-2 3CLpro. Plasma protein PLMN, which contains a predicted cleavage site, was not digested by SARS-CoV-2 3CLpro, possibly due to inaccessibility of the cleavage site in the structure.

A His6-MBP-TSAVLQ*SGFRKM-mEYFP recombinant substrate containing a natural cleavage site of SARS-CoV-2 polyprotein was designed and used as a positive control in cleavage reactions. This substrate system has already been applied previously to study proteases of HIV-1, tobacco etch virus [33–35], yeast Ty1 retrotransposon [36], and Venezuelan equine encephalitis virus (VEEV) [37], and the protease of human paternally expressed gene 10 (PEG10) protein [38]. The successful adaptation of this recombinant substrate system enables enzymatic characterization of SARS-CoV-2 3CLpro and screening of inhibitors using a microcentrifuge tube- [33,36] or a microtiter plate-based protease assay [34,37] in the future. Furthermore, a wide variety of sequences can be potentially inserted into recombinant substrates, making specificity studies and target site identifications possible.


CTBP1 is a transcriptional co-repressor protein that is involved in the regulation of the expression of genes controlling development, oncogenesis, and apoptosis [39]. CTBP1 and -2 were found previously to influence viral replication, and enhanced replication of adenovirus E1A was observed upon CTBP knockdown [40]. PTK6 is also referred to as breast tumor kinase, and is an intracellular non-receptor tyrosine kinase, while PLMN is the zymogen form of plasmin being responsible for digestion of fibrin clot (fibrinolysis). Here we identified both proteins as candidate targets of SARS-CoV-2 3CLpro, but we did not observe their processing in spite of the presence of a putative cleavage site (predicted by the NetCorona webserver). The sets of experimentally determined cleavage sites—e.g., obtained from in vitro proteomic analysis [21]—are expected to aid the improvement of prediction algorithms’ reliability, while our results also represented some limitations of the applied in silico methods and highlighted the necessity of structural analysis and determination of cleavage site accessibilities, otherwise, candidate targets can be identified only with lower accuracy. In summary, we have successfully adapted the SSHHPS analysis for the identification of potential coronavirus cleavage sites, and “repurposed” the NetCorona 1.0 webserver for the prediction of candidate human target proteins of SARS-CoV-2 3CLpro. We demonstrated that the NetCorona 1.0 webserver developed primarily for the 3CLpro of SARS-CoV is applicable efficiently for that of SARS-CoV-2, as well. The NetCorona webserver can be applied for the prediction of 3CLpro cleavage sites, while our results implied that SSHHPS analysis may be used to identify substrates of PLpro, as well, however, we have not tested PLpro in vitro. The prediction algorithms were tested only for human proteins, but they can be potentially adapted for the identification of host targets in other species as well. 
Our results highlighted a limitation of sequence-based cleavage site predictions and showed that the structural context of cleavage sites also needs to be considered, because regions with lower solvent-accessible surface area may be less susceptible to proteolysis even if they have a high NetCorona score. We identified the CTBP1 protein as a host substrate of SARS-CoV-2 3CLpro, and the existence of the predicted cleavage site was confirmed experimentally for both the recombinant CTBP1 and the His6-MBP-REGTRVQ*SVEQIRE-mEYFP substrate. Nonetheless, it is important to note that the CTBP cleavage site was processed with remarkably lower efficiency. Based on homology, we assume that human CTBP2 is also a host substrate of the protease, but future studies need to reveal what role the processing of the CTBP proteins plays in the viral life cycle. Identification of additional molecular targets of the SARS-CoV and SARS-CoV-2 3CL proteases may help to better understand viral replication, pathogenesis, and coronavirus-induced phenotypes.

4. Materials and Methods

4.1. In Silico Analyses

4.1.1. BLAST Analysis

Autoproteolytic cleavage site sequences of SARS-CoV-2 3CLpro were obtained from the literature [18]. BLAST analysis was performed to identify SSHHPS in human proteins, using the cleavage site sequences as input. A human-specific sequence search was run in BLASTP (part of BLAST+ 2.10.0) using the “blastp-short” option (optimized for query sequences shorter than 30 residues) with the PAM30 scoring matrix [41,42]. The 10-residue-long query sequences (P5–P5’ residues) were aligned against the nr BLAST database (all non-redundant databases including GenBank translations, PDB, SwissProt, PIR, and PRF entries, excluding environmental samples from WGS projects), which consisted of a total of 281,252,422 sequences. To include partially aligned hits of the catalytic residues and those with similar physicochemical characteristics, the following parameter values were set: window length, 15; cutoff value, 25,500; threshold score, 5.
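The query preparation and search settings described in this subsection can be sketched in Python. This is a minimal illustration under stated assumptions, not the script used in the study: the helper names are hypothetical, and the mapping of the reported window length, cutoff value, and threshold score onto the BLAST+ options -window_size, -evalue, and -threshold is an assumption, because the text does not name the command-line flags. The command is only assembled, not executed, and the restriction to human sequences is not shown.

```python
from typing import List


def p5_to_p5prime(sequence: str, p1_index: int) -> str:
    """Return the 10-residue P5-P5' window around a cleavage site.

    p1_index is the 0-based index of the P1 residue; the scissile bond
    lies between sequence[p1_index] and sequence[p1_index + 1].
    """
    start, end = p1_index - 4, p1_index + 6
    if start < 0 or end > len(sequence):
        raise ValueError("cleavage site is too close to a terminus")
    return sequence[start:end]


def blastp_short_command(query_fasta: str, db: str = "nr") -> List[str]:
    """Assemble (but do not run) a blastp-short command line.

    Assumption: 'window length', 'cutoff value' and 'threshold score'
    correspond to -window_size, -evalue and -threshold, respectively.
    """
    return [
        "blastp",
        "-task", "blastp-short",   # short-query mode
        "-matrix", "PAM30",        # scoring matrix named in the text
        "-query", query_fasta,
        "-db", db,
        "-window_size", "15",      # assumed mapping of "window length"
        "-evalue", "25500",        # assumed mapping of "cutoff value"
        "-threshold", "5",         # assumed mapping of "threshold score"
        "-outfmt", "6",            # tabular output, chosen for illustration
    ]


# The nsp4 site from Table 3 is TSAVLQ*SGFRKM; P1 is the Q at index 5.
query = p5_to_p5prime("TSAVLQSGFRKM", 5)
command = blastp_short_command("queries.fasta")  # hypothetical file name
```

Here `query` holds the ten residues flanking the scissile bond, and `command` could be passed to `subprocess.run` on a machine where BLAST+ and a local copy of the database are installed.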

4.1.2. NetCorona Prediction

The NetCorona v. 1.0 webserver (available at http://www.cbs.dtu.dk/services/NetCorona/) was applied to predict the presence of SARS-CoV-2 3CLpro cleavage sites [26], using sequences of full-length proteins as input. The default threshold of the NetCorona algorithm is 0.5. A higher prediction score indicates a higher cleavage probability; the most probable cleavage sites were identified with a prediction score > 0.8.
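The two score cut-offs mentioned for NetCorona (0.5 and 0.8) can be expressed as a small classification helper. This is only an illustrative sketch: the function name and the category labels are hypothetical, and treating both cut-offs as strict inequalities is an assumption.

```python
def classify_netcorona_score(score: float,
                             default_threshold: float = 0.5,
                             high_confidence: float = 0.8) -> str:
    """Label a NetCorona prediction score using the two cut-offs in the text."""
    if score > high_confidence:
        return "most probable"      # score > 0.8
    if score > default_threshold:
        return "predicted"          # above the default threshold of 0.5
    return "not predicted"


# Hypothetical scores, for illustration only (not results from the study).
labels = [classify_netcorona_score(s) for s in (0.35, 0.62, 0.91)]
```

Candidate sites could then be ranked by score before their structural accessibility is inspected.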

4.1.3. Structures

Coordinate files were downloaded from the Protein Data Bank [43]. We used X-ray crystal structures of SARS-CoV-2 3CLpro (6LU7.pdb) [13], CTBP1 (4U6Q.pdb) [44], PLMN (1DDJ.pdb) [45], and ACHE (1B41.pdb) [46], and an NMR solution structure of PTK6 (1RJA.pdb) [47]. The homology model of IRAK1 was downloaded from the SWISS-MODEL Repository [48]; the model had been prepared based on the X-ray crystal structure of the protein (6BFN.pdb) [49]. The per-residue solvent accessible surface areas (SASA) were computed from the coordinate files using the Lee and Richards algorithm with the default probe radius (1.4 Å) at a resolution of 200 slices per atom, using the FreeSASA tool [50]. Structural figures were prepared using the PyMOL Molecular Graphics System (Version 1.3, Schrödinger, LLC). Sequence logos were prepared with the WebLogo 3 webserver [51].
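To illustrate what a solvent accessible surface area measures, the sketch below computes per-atom SASA for a toy set of spheres. Note the assumptions: it uses the simpler Shrake-Rupley point-sampling method as a stand-in, not the Lee and Richards slicing algorithm of FreeSASA that was used in the study; the atom radii and coordinates are invented for the example; and the function names are hypothetical. Only the 1.4 Å probe radius is taken from the text.

```python
import math
from typing import List, Sequence, Tuple

Atom = Tuple[float, float, float, float]  # x, y, z, van der Waals radius (in Å)


def sphere_points(n: int) -> List[Tuple[float, float, float]]:
    """Spread n points quasi-uniformly on a unit sphere (golden spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden_angle * i),
                       r * math.sin(golden_angle * i), z))
    return points


def atom_sasa(atoms: Sequence[Atom], probe: float = 1.4,
              n_points: int = 100) -> List[float]:
    """Per-atom SASA (in Å^2) by Shrake-Rupley point sampling."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        r_ext = ri + probe  # radius of the probe-centre surface
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + r_ext * ux, yi + r_ext * uy, zi + r_ext * uz
            # A test point is buried if it lies inside any other extended sphere.
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms) if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * r_ext ** 2 * exposed / n_points)
    return areas


isolated = atom_sasa([(0.0, 0.0, 0.0, 1.8)])[0]
pair = atom_sasa([(0.0, 0.0, 0.0, 1.8), (1.5, 0.0, 0.0, 1.8)])
```

An isolated sphere is fully exposed, whereas each of two overlapping spheres loses part of its accessible surface. Per-residue values, as used in the study, are obtained by summing the per-atom areas of each residue.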

4.2. In Vitro Analyses

4.2.1. Materials

All materials were obtained from Sigma-Aldrich unless otherwise indicated. The purified recombinant human CTBP1 (ab93729), PTK6 (ab60888), and IRAK1 tagged with an N-terminal glutathione-S-transferase (ab268679) were ordered from Abcam, PLMN was obtained from Chromogenix, and BSA (A7030) from Sigma-Aldrich (St. Louis, MO, USA). Lyophilized BSA and PLMN were dissolved in distilled water (1 mg/mL stock).

4.2.2. Expression and Purification of SARS-CoV-2 3CLpro

A pET11a plasmid bearing the coding sequence of SARS-CoV-2 3CLpro (GenBank: MT291835.2) fused to an N-terminal hexahistidine tag (His6) was obtained using the gene synthesis service of GenScript. The pET11a bacterial expression plasmid coding for His6-SARS-CoV-2 3CLpro was transformed into BL21(DE3) E. coli cells. For protein expression, transformant cells were cultured at 37 °C in Luria-Bertani (LB) medium supplemented with ampicillin (100 µg/mL final concentration). Expression of the protease was induced by the addition of isopropyl β-D-1-thiogalactopyranoside (IPTG) to a final concentration of 1 mM, followed by shaking the suspensions at 37 °C for 3 h. After incubation, cells were collected by centrifugation at 4 °C for 20 min at 5000× g (Sorvall Lynx 4000, Thermo Fisher Scientific, Waltham, MA, USA). The pelleted cells were lysed in buffer A (20 mM Tris, 150 mM NaCl, 10 mM imidazole, pH 7.5) followed by sonication (Branson Sonifier 450) and centrifugation at 4 °C for 20 min at 10,000× g. His6-SARS-CoV-2 3CLpro was purified from the supernatant on a His-Trap Ni-chelate affinity chromatography column (GE Healthcare) using the Äkta Prime instrument (Amersham Pharmacia Biotech, Little Chalfont, UK). The buffer of the eluate was changed to buffer B (20 mM Tris, 1 mM DTT, pH 8.0) by ultrafiltration using Amicon Ultra centrifugal filters (10K, Merck Millipore, Burlington, MA, USA), followed by removal of the His6-tag by digesting His6-SARS-CoV-2 3CLpro with Factor Xa (BCXA-1060, Haematologic Technologies, Essex Junction, VT, USA). The untagged protease was purified by ion-exchange chromatography using a HiTrap Q FF column (GE Healthcare, Chicago, IL, USA). Finally, the fractions were dialyzed against buffer C (20 mM Tris, 150 mM NaCl, 1 mM EDTA, 1 mM DTT, pH 7.8) and stored at −20 °C. The purity of the fractions was determined by SDS-PAGE.

4.2.3. Vector Construction for the Expression of His6-MBP-mEYFP Substrates

The coding sequences of the cleavage sites were cloned into a pDest-His6-MBP-mEYFP bacterial expression plasmid based on the method described previously [33,34]; the applied oligonucleotide primers and the cleavage site sequences are shown in Table 3. The success of cloning was confirmed by DNA sequencing (Eurofins Genomics Germany GmbH; Ebersberg, Germany), followed by transformation of the verified pDest-His6-MBP-mEYFP expression constructs into BL21(DE3) E. coli cells.

Table 3. Oligonucleotide primers used for cloning. pDest-His6-MBP-mEYFP expression vectors were prepared by ligating the following complementary oligonucleotide primer pairs, which code for the cleavage site sequences, into the plasmid. The CTBP1 (151–164) sequence is fully identical in the CTBP1 and CTBP2 proteins. FW: forward; RV: reverse.

Cleavage Site        Sequence           Oligonucleotide Primer Sequence
SARS-CoV-2 nsp4      TSAVLQ*SGFRKM      FW: 5’-TAAAACCTCTGCGGTGCTGCAGTCTGGCTTTCGTAAAATGG-3’
                                        RV: 5’-CTAGCCATTTTACGAAAGCCAGACTGCAGCACCGCAGAGGTTTTAAT-3’
CTBP1/2 (151–164)    REGTRVQ*SVEQIRE    FW: 5’-TAAACGTGAAGGCACCCGTGTGCAGTCTGTGGAACAGATCCGTGAAG-3’
                                        RV: 5’-CTAGCTTCACGGATCTGTTCCACAGACTGCACACGGGTGCCTTCACGTTTAAT-3’
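The primer pairs in Table 3 can be checked computationally: translating each forward primer should yield the intended cleavage site peptide, and each forward primer should be contained in the reverse complement of its partner. The sketch below is a minimal illustration using the standard genetic code. The reading-frame offset of four nucleotides (skipping the leading TAAA) is an assumption inferred from the sequences themselves, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate complete codons only; trailing 1-2 nucleotides are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def encoded_peptide(forward_primer: str, frame_offset: int = 4) -> str:
    """Peptide encoded by a forward primer (frame_offset of 4 is an assumption)."""
    return translate(forward_primer[frame_offset:])


NSP4_FW = "TAAAACCTCTGCGGTGCTGCAGTCTGGCTTTCGTAAAATGG"
NSP4_RV = "CTAGCCATTTTACGAAAGCCAGACTGCAGCACCGCAGAGGTTTTAAT"
CTBP_FW = "TAAACGTGAAGGCACCCGTGTGCAGTCTGTGGAACAGATCCGTGAAG"
CTBP_RV = "CTAGCTTCACGGATCTGTTCCACAGACTGCACACGGGTGCCTTCACGTTTAAT"

nsp4_peptide = encoded_peptide(NSP4_FW)
ctbp_peptide = encoded_peptide(CTBP_FW)
pairs_anneal = (NSP4_FW in reverse_complement(NSP4_RV)
                and CTBP_FW in reverse_complement(CTBP_RV))
```

The extra nucleotides at the ends of the reverse primers are consistent with single-stranded overhangs on the annealed duplex that allow ligation into the plasmid.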

4.2.4. Expression and Purification of the His6-MBP-mEYFP Substrates

The His6-MBP-mEYFP protein substrates (Table 3) were expressed in BL21(DE3) E. coli cells based on the protocol described previously [33,34], with slight modifications. After expression at 37 °C, cells were collected by centrifugation and the pellet was suspended in lysis buffer (20 mM Tris-HCl, 100 mM NaCl, 5 mM imidazole, pH 7.8), followed by sonication and centrifugation. The recombinant proteins were purified from the cleared cell lysates using Ni-NTA magnetic agarose beads (Cube Biotech, Germany) [33–35]. After purification, the elution buffer (100 mM EDTA, 0.05% Tween 20, pH 8.0) was exchanged for distilled water, and the total protein concentration was determined by measuring absorbance at 280 nm using a NanoDrop 2000 (Thermo Fisher Scientific, Waltham, MA, USA). Sample purity was determined by SDS-PAGE, using a 14% polyacrylamide gel. The purified fusion proteins were then used in cleavage reactions as substrates of SARS-CoV-2 3CLpro.

4.2.5. Cleavage Reactions by SARS-CoV-2 3CLpro

For cleavage reactions, the recombinant proteins were incubated with purified SARS-CoV-2 3CLpro in reaction buffer (20 mM Tris, 100 mM NaCl, pH 7.8) at 37 °C for at least 1 h. To analyze cleavage reactions by SDS-PAGE, the polyacrylamide gels were stained with Coomassie dye. In some cases, the denaturing SDS-PAGE was followed by in-gel renaturation of the His6-MBP-mEYFP substrates by rinsing the gel in distilled water. The uncleaved substrate and cleavage products were visualized in the unstained gel based on their fluorescence under UV light using an AlphaImager gel documentation system (ProteinSimple) [33–35]; the gel was then also stained with Coomassie dye.

4.2.6. Cleavage Site Identification by MALDI-TOF MS

For the identification of cleavage sites, the reaction mixtures were concentrated and desalted using C4 ZipTip pipette tips (ZTC04S096, Sigma-Aldrich, St. Louis, MO, USA), according to the instructions of the manufacturer. 2,5-Dihydroxybenzoic acid (DHB) (100 mg/mL), dissolved in 50% aqueous acetonitrile containing 0.1% TFA, was applied as the matrix. Then, 0.5 µL matrix and 1 µL sample were deposited and mixed on the plate and allowed to dry. The mass spectrometric measurements were performed with a Bruker Autoflex Speed MALDI-TOF mass spectrometer. The linear mode was used for all samples, where ion source voltage 1 and ion source voltage 2 were 19.5 kV and 18.3 kV, respectively. The applied laser was a solid-state laser (355 nm, ≥100 µJ/pulse) operated at 200 Hz, and 10,000 shots were summed. The results were evaluated with the flexAnalysis software (Bruker, Billerica, MA, USA).

4.2.7. Proteinase Assay with His6-MBP-mEYFP Substrates

The magnetic bead-based assay was performed based on the method described previously [33–36], with slight modifications. Cleavage reactions were performed in reaction buffer (20 mM Tris, 100 mM NaCl, pH 7.8) by incubating samples at 37 °C for 10 min. For the measurements with the His6-MBP-TSAVLQ*SGFRKM-mEYFP and His6-MBP-REGTRVQ*SVEQIRE-mEYFP substrates, the enzyme was applied at final concentrations of 0.074 µM and 0.74 µM, respectively. Because no selective tight-binding inhibitor was available, the active site concentration could not be determined, and the activity of SARS-CoV-2 3CLpro was regarded as 100%. Fluorimetric measurements were performed using a Biotek Synergy H1 device at 510 nm excitation and 540 nm emission wavelengths.

Supplementary Materials: Supplementary materials can be found at http://www.mdpi.com/1422-0067/21/24/9523/s1.

Author Contributions: Conceptualization, J.A.M., M.M., M.G., B.K., T.N., J.T.; data curation, J.A.M., M.M., M.G., B.K., T.N.; formal analysis, J.A.M., M.M., M.G., B.K., T.N.; funding acquisition, J.T.; investigation, J.A.M., M.M., M.G., B.K., T.N.; methodology, J.A.M., M.M., M.G., B.K., T.N.; resources, J.T.; software, J.A.M., B.K.; supervision, J.A.M., J.T.; validation, J.A.M., T.N., J.T.; visualization, J.A.M., M.M., M.G., T.N.; writing—original draft preparation, J.A.M., M.M., M.G., B.K., T.N., J.T.; writing—review and editing, J.A.M., M.M., M.G., B.K., T.N., J.T. All authors have read and agreed to the published version of the manuscript.

Funding: This work was supported in part by the Higher Education Institutional Excellence Programme (NKFIH-1150-6/2019) of the Ministry of Innovation and Technology in Hungary, within the framework of the Biotechnology thematic programme of the University of Debrecen. This paper was also supported by the GINOP-2.3.3-15-2016-00021 project, co-financed by the European Union and the European Regional Development Fund.

Acknowledgments: The authors are grateful to the colleagues of the Laboratory of Retroviral Biochemistry at the Department of Biochemistry and Molecular Biology (University of Debrecen), especially to Szilvia Janics-Pető for the technical assistance.

Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

3CLpro        3-chymotrypsin-like protease (main protease)
COVID-19      coronavirus disease-19
CTBP1         C-terminal-binding protein 1
CTBP2         C-terminal-binding protein 2
MBP           Maltose-binding protein
mEYFP         Monomeric enhanced yellow fluorescent protein
PLpro         Papain-like protease
PR            Protease
SARS-CoV-2    Severe acute respiratory syndrome coronavirus 2
SSHHPS        Short stretches of homologous host-pathogen protein sequences

References

1. Wu, F.; Zhao, S.; Yu, B.; Chen, Y.M.; Wang, W.; Song, Z.G.; Hu, Y.; Tao, Z.W.; Tian, J.H.; Pei, Y.Y.; et al. A new coronavirus associated with human respiratory disease in China. Nature 2020, 579, 265–269.
2. Fung, T.S.; Liu, D.X. Human Coronavirus: Host-Pathogen Interaction. Annu. Rev. Microbiol. 2019, 73, 529–557.
3. Gordon, D.E.; Jang, G.M.; Bouhaddou, M.; Xu, J.; Obernier, K.; White, K.M.; O’Meara, M.J.; Rezelj, V.V.; Guo, J.Z.; Swaney, D.L. A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature 2020, 583, 459–468.
4. Kumar, S.; Nyodu, R.; Maurya, V.K.; Saxena, S.K. Host Immune Response and Immunobiology of Human SARS-CoV-2 Infection. In Coronavirus Disease 2019 (COVID-19); Springer: Singapore, 2020; pp. 43–53.
5. Blanco-Melo, D.; Nilsson-Payant, B.E.; Liu, W.C.; Uhl, S.; Hoagland, D.; Møller, R.; Jordan, T.X.; Oishi, K.; Panis, M.; Sachs, D.; et al. Imbalanced Host Response to SARS-CoV-2 Drives Development of COVID-19. Cell 2020, 181, 1036–1045.e9.
6. Yin, Y.; Wunderink, R.G. MERS, SARS and other coronaviruses as causes of pneumonia. Respirology 2018, 23, 130–137.
7. Giannis, D.; Ziogas, I.; Gianni, P. Coagulation disorders in coronavirus infected patients: COVID-19, SARS-CoV-1, MERS-CoV and lessons from the past. J. Clin. Virol. 2020, 127, 104362.
8. Li, H.; Liu, L.; Zhang, D.; Xu, J.; Dai, H.; Tang, N.; Su, X.; Cao, B. SARS-CoV-2 and viral sepsis: Observations and hypotheses. Lancet 2020, 395, 1517–1520.
9. Lin, L.; Jiang, X.; Zhang, Z.; Huang, S.; Zhang, Z.; Fang, Z.; Gu, Z.; Gao, L.; Shi, H.; Mai, L.; et al. Gastrointestinal symptoms of 95 cases with SARS-CoV-2 infection. Gut 2020, 69, 997–1001.
10. Baig, A.M. Neurological manifestations in COVID-19 caused by SARS-CoV-2. CNS Neurosci. Ther. 2020, 26, 499–501.
11. Xue, X.; Yu, H.; Yang, H.; Xue, F.; Wu, Z.; Shen, W.; Li, J.; Zhou, Z.; Ding, Y.; Zhao, Q.; et al. Structures of two coronavirus main proteases: Implications for substrate binding and antiviral drug design. J. Virol. 2008, 82, 2515–2527.
12. Yang, H.; Xie, W.; Xue, X.; Yang, K.; Ma, J.; Liang, W.; Zhao, Q.; Zhou, Z.; Pei, D.; Ziebuhr, J.; et al. Design of wide-spectrum inhibitors targeting coronavirus main proteases. PLoS Biol. 2005, 3, e324.
13. Jin, Z.; Du, X.; Xu, Y.; Deng, Y.; Liu, M.; Zhao, Y.; Zhang, B.; Li, X.; Zhang, L.; Peng, C.; et al. Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors. Nature 2020, 582, 289–293.
14. Zhang, L.; Lin, D.; Sun, X.; Curth, U.; Drosten, C.; Sauerhering, L.; Becker, S.; Rox, K.; Hilgenfeld, R. Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors. Science 2020, 368, 409–412.
15. Schechter, I.; Berger, A. On the size of the active site in proteases. I. Papain. Biochem. Biophys. Res. Commun. 1967, 27, 157–162.
16. Stoddard, S.V.; Stoddard, S.D.; Oelkers, B.K.; Fitts, K.; Whalum, K.; Whalum, K.; Hemphill, A.D.; Manikonda, J.; Martinez, L.M.; Riley, E.G.; et al. Optimization Rules for SARS-CoV-2 Mpro Antivirals: Ensemble Docking and Exploration of the Coronavirus Protease Active Site. Viruses 2020, 12, 942.
17. Muramatsu, T.; Takemoto, C.; Kim, Y.T.; Wang, H.; Nishii, W.; Terada, T.; Shirouzu, M.; Yokoyama, S. SARS-CoV 3CL protease cleaves its C-terminal autoprocessing site by novel subsite cooperativity. Proc. Natl. Acad. Sci. USA 2016, 113, 12997–13002.
18. Chan, J.F.; Kok, K.H.; Zhu, Z.; Chu, H.; To, K.K.; Yuan, S.; Yuen, K.Y. Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerg. Microbes Infect. 2020, 9, 221–236.
19. Lin, C.W.; Tsai, F.J.; Wan, L.; Lai, C.C.; Lin, K.H.; Hsieh, T.H.; Shiu, S.Y.; Li, J.Y. Binding interaction of SARS coronavirus 3CL(pro) protease with vacuolar-H+ ATPase G1 subunit. FEBS Lett. 2005, 579, 6089–6094.
20. Liao, H.H.; Wang, Y.C.; Chen, M.C.; Tsai, H.Y.; Lin, J.; Chen, S.T.; Tsay, G.J.; Cheng, S.L. Down-regulation of granulocyte-macrophage colony-stimulating factor by 3C-like proteinase in transfected A549 human lung carcinoma cells. BMC Immunol. 2011, 12, 16.
21. Koudelka, T.; Boger, J.; Henkel, A.; Schönherr, R.; Krantz, S.; Fuchs, S.; Rodríguez, E.; Redecke, L.; Tholey, A. N-Terminomics for the Identification of in vitro Substrates and Cleavage Site Specificity of the SARS-CoV-2 Main Protease. Proteomics 2020, e2000246.
22. Shen, H.B.; Chou, K.C. HIVcleave: A web-server for predicting human immunodeficiency virus protease cleavage sites in proteins. Anal. Biochem. 2008, 375, 388–390.
23. Blom, N.; Hansen, J.; Blaas, D.; Brunak, S. Cleavage site analysis in picornaviral polyproteins: Discovering cellular targets by neural networks. Protein Sci. 1996, 5, 2203–2216.
24. Morazzani, E.M.; Compton, J.R.; Leary, D.H.; Berry, A.V.; Hu, X.; Marugan, J.J.; Glass, P.J.; Legler, P.M. Proteolytic cleavage of host proteins by the Group IV viral proteases of Venezuelan equine encephalitis virus and Zika virus. Antivir. Res. 2019, 164, 106–122.
25. Hu, X.; Compton, J.R.; Legler, P.M. Analysis of Group IV Viral SSHHPS Using In Vitro and In Silico Methods. J. Vis. Exp. 2019, 154, e60421.
26. Kiemer, L.; Lund, O.; Brunak, S.; Blom, N. Coronavirus 3CLpro proteinase cleavage sites: Possible relevance to SARS virus pathology. BMC Bioinform. 2004, 5, 72.
27. Jaru-Ampornpan, P.; Jengarn, J.; Wanitchang, A.; Jongkaewwattana, A. Porcine Epidemic Diarrhea Virus 3C-Like Protease-Mediated Nucleocapsid Processing: Possible Link to Viral Cell Culture Adaptability. J. Virol. 2017, 91.
28. Zhang, J.; Guy, J.S.; Snijder, E.J.; Denniston, D.A.; Timoney, P.J.; Balasuriya, U.B. Genomic characterization of equine coronavirus. Virology 2007, 369, 92–104.
29. Taylor, E.W.; Radding, W. Understanding Selenium and Glutathione as Antiviral Factors in COVID-19: Does the Viral Mpro Protease Target Host Selenoproteins and Glutathione Synthesis? Front. Nutr. 2020, 7, 143.
30. Fan, K.; Ma, L.; Han, X.; Liang, H.; Wei, P.; Liu, Y.; Lai, L. The substrate specificity of SARS coronavirus 3C-like proteinase. Biochem. Biophys. Res. Commun. 2005, 329, 934–940.
31. Zhang, C.; Zheng, W.; Huang, X.; Bell, E.W.; Zhou, X.; Zhang, Y. Protein Structure and Sequence Reanalysis of 2019-nCoV Genome Refutes Snakes as Its Intermediate Host and the Unique Similarity between Its Spike Protein Insertions and HIV-1. J. Proteome Res. 2020, 19, 1351–1360.
32. Grum-Tokars, V.; Ratia, K.; Begaye, A.; Baker, S.C.; Mesecar, A.D. Evaluating the 3C-like protease activity of SARS-Coronavirus: Recommendations for standardized assays for drug discovery. Virus Res. 2008, 133, 63–73.
33. Bozóki, B.; Gazda, L.; Tóth, F.; Miczi, M.; Mótyán, J.A.; Tőzsér, J. A recombinant fusion protein-based, fluorescent protease assay for high throughput-compatible substrate screening. Anal. Biochem. 2018, 540, 52–63.
34. Bozóki, B.; Mótyán, J.A.; Miczi, M.; Gazda, L.D.; Tőzsér, J. Use of Recombinant Fusion Proteins in a Fluorescent Protease Assay Platform and Their In-gel Renaturation. J. Vis. Exp. 2019, 16, e58824.
35. Mótyán, J.A.; Miczi, M.; Bozóki, B.; Tőzsér, J. Data supporting Ni-NTA magnetic bead-based fluorescent protease assay using recombinant fusion protein substrates. Data Brief. 2018, 18, 203–208.
36. Gazda, L.D.; Joóné, M.K.; Nagy, T.; Mótyán, J.A.; Tőzsér, J. Biochemical characterization of Ty1 retrotransposon protease. PLoS ONE 2020, 15, e0227062.
37. Bozóki, B.; Mótyán, J.A.; Hoffka, G.; Waugh, D.S.; Tőzsér, J. Specificity Studies of the Venezuelan Equine Encephalitis Virus Non-Structural Protein 2 Protease Using Recombinant Fluorescent Substrates. Int. J. Mol. Sci. 2020, 21, 7686.
38. Golda, M.; Mótyán, J.A.; Mahdi, M.; Tőzsér, J. Functional Study of the Retrotransposon-Derived Human PEG10 Protease. Int. J. Mol. Sci. 2020, 21, 2424.
39. Stankiewicz, T.R.; Gray, J.J.; Winter, A.N.; Linseman, D.A. C-terminal binding proteins: Central players in development and disease. Biomol. Concepts 2014, 5, 489–511.
40. Subramanian, T.; Zhao, L.J.; Chinnadurai, G. Interaction of CtBP with adenovirus E1A suppresses immortalization of primary epithelial cells and enhances virus replication during productive infection. Virology 2013, 443, 313–320.
41. Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402.
42. Schäffer, A.A.; Aravind, L.; Madden, T.L.; Shavirin, S.; Spouge, J.L.; Wolf, Y.I.; Koonin, E.V.; Altschul, S.F. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res. 2001, 29, 2994–3005.
43. Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N.; Bourne, P.E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235–242.
44. Hilbert, B.J.; Morris, B.L.; Ellis, K.C.; Paulsen, J.L.; Schiffer, C.A.; Grossman, S.R.; Royer, W.E., Jr. Structure-guided design of a high affinity inhibitor to human CtBP. ACS Chem. Biol. 2015, 10, 1118–1127.
45. Wang, X.; Terzyan, S.; Tang, J.; Loy, J.A.; Lin, X.; Zhang, X.C. Human plasminogen catalytic domain undergoes an unusual conformational change upon activation. J. Mol. Biol. 2000, 295, 903–914.
46. Kryger, G.; Harel, M.; Giles, K.; Toker, L.; Velan, B.; Lazar, A.; Kronman, C.; Barak, D.; Ariel, N.; Shafferman, A.; et al. Structures of recombinant native and E202Q mutant human acetylcholinesterase complexed with the snake-venom toxin fasciculin-II. Acta Crystallogr. D Biol. Crystallogr. 2000, 56 Pt 11, 1385–1394.
47. Hong, E.; Shin, J.; Kim, H.I.; Lee, S.T.; Lee, W. Solution structure and backbone dynamics of the non-receptor protein-tyrosine kinase-6 Src homology 2 domain. J. Biol. Chem. 2004, 279, 29700–29708.
48. Bienert, S.; Waterhouse, A.; de Beer, T.A.; Tauriello, G.; Studer, G.; Bordoli, L.; Schwede, T. The SWISS-MODEL Repository-new features and functionality. Nucleic Acids Res. 2017, 45, D313–D319.
49. Wang, L.; Qiao, Q.; Ferrao, R.; Shen, C.; Hatcher, J.M.; Buhrlage, S.J.; Gray, N.S.; Wu, H. Crystal structure of human IRAK1. Proc. Natl. Acad. Sci. USA 2017, 114, 13507–13512.
50. Mitternacht, S. FreeSASA: An open source C library for solvent accessible surface area calculations. F1000Res 2016, 5, 189.
51. Crooks, G.E.; Hon, G.; Chandonia, J.-M.; Brenner, S.E. WebLogo: A Sequence Logo Generator. Genome Res. 2004, 14, 1188–1190.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).