Charonis SS, Tsilibary EP, Georgopoulos AP. SARS-CoV-2 Virus and Human Leukocyte (HLA) Class II: Investigation in silico of Binding Affinities for COVID-19 Protection and Vaccine Development. J Immunological Sci. (2020); 4(4): 12-23 Journal of Immunological Sciences

Original Research Article Open Access

SARS-CoV-2 Virus and Human Leukocyte Antigen (HLA) Class II: Investigation in silico of Binding Affinities for COVID-19 Protection and Vaccine Development

Spyros S. Charonis1,2, Effie-Photini Tsilibary1,2, Apostolos P. Georgopoulos1,2*

1Brain Sciences Center, Department of Veterans Affairs Health Care System, Minneapolis, MN 55417, USA
2Department of Neuroscience, University of Minnesota Medical School, Minneapolis, MN 55455, USA

Article Info

Article Notes
Received: October 06, 2020
Accepted: November 16, 2020

*Correspondence:
Dr. Apostolos P. Georgopoulos, Brain Sciences Center, Department of Veterans Affairs Health Care System, Minneapolis, MN 55417, USA; Email: [email protected].

© 2020 Georgopoulos AP. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License.

Keywords:
SARS-CoV-2
SARS-CoV-2 spike protein
Human Leukocyte Antigen (HLA)
Vaccines

ABSTRACT

SARS-CoV-2 causes COVID-19, urgently requiring the development of effective vaccine(s). Much of current efforts focus on the SARS-CoV-2 spike-glycoprotein by identifying highly antigenic epitopes as good vaccine candidates. However, high antigenicity is not sufficient, since the activation of relevant T cells depends on the presence of the complex of the antigen with a suitably matching Human Leukocyte Antigen (HLA) Class II molecule, not the antigen alone: in the absence of such a match, even a highly antigenic epitope in vitro will not elicit antibody formation in vivo. Here we assessed systematically in silico the binding affinity of epitopes of the spike-glycoprotein to 66 common HLA-Class-II alleles (frequency ≥ 0.01). We used a sliding epitope window of 22-amino-acid-width to scan the entire glycoprotein and determined the binding affinity of each subsequence to each HLA allele. DPB1 had highest binding affinities, followed by DRB1 and DQB1. Higher binding affinities were concentrated in the initial part of the glycoprotein (S1-S460), with a peak at S223-S238. This region would be well suited for effective vaccine development by ensuring high probability for successful matching of the vaccine antigen from that region to a HLA Class II molecule for CD4+ T cell activation by the antigen-HLA molecule complex.

Introduction

SARS-CoV-2 causes COVID-19, a disease that has now become a global pandemic1,2. Hence, there is an urgent need to develop an effective vaccine against the virus. Many efforts in that direction focus on the SARS-CoV-2 spike-glycoprotein, seeking to identify highly antigenic epitopes as good vaccine candidates3. However, high antigenicity is not sufficient, since the activation of relevant T cells depends on the presence of the complex of the antigen with a suitably matching Human Leukocyte Antigen (HLA) Class II molecule, not the antigen alone: in the absence of such a match, even a highly antigenic epitope in vitro will not elicit antibody formation in vivo.

Therefore, there are two key factors for a vaccine to be effective: first, there should be a good match with a HLA Class II molecule, and second, the antigen should be adequately immunogenic, assuming that the individual is immunocompetent for antibody production.

In the case of an actual viral infection, the mounting of antibodies against the virus protein is called "specific (or adaptive) immunity". The key agents for specific immunity and antibody production are the HLA Class II molecules. These bind to ~18-22-mer AA peptides4 resulting from the cleavage of the viral protein by proteases in antigen presenting cells (APC), which include macrophages. When a match occurs, the HLA Class II-peptide complex moves to the cell surface, attracts CD4+ T cells, and a complex process of antibody production is initiated5. Thus, the crucial first step is the successful match between the virus protein peptide and a Class II molecule. An individual carries 6 classical Class II alleles (2 from each of the DRB1, DQB1 and DPB1 genes)4. If these provide a good match to the peptide, the antibody production process starts immediately and, assuming that the person is immunocompetent, antibodies will be produced in time to eliminate the virus and protect from future exposures to that virus. If, on the other hand, there is no match, antibodies cannot be produced and there will not be specific immunity for that virus. In fact, complete lack of Class II molecules, as occurs in a rare genetic disorder (primary immunodeficiency disorder6), is a devastating disease characterized by uncontrolled viral infections and poor prognosis.

Finally, the possibility and quality of the match critically depend on the affinity between the virus peptide and the HLA Class II molecule. Indeed, in this study, we examined systematically and exhaustively the SARS-CoV-2 virus spike glycoprotein to determine the binding affinities of epitopes to 66 common HLA Class II alleles and then identify regions in the protein with high such affinities as good candidates for vaccine development. The structure of the spike protein (S) of SARS-CoV-2 has been recently solved using cryo-electron microscopy7 and its sequence is known8. Specific attention has been directed towards the spike glycoprotein of SARS-CoV-2 in a number of recent studies, which even revealed its crystal structure2,9,10, since it represents a major binding site for cells via their ACE2 receptors.

Materials and Methods

The main objective of this study was to exhaustively assess the binding affinities of HLA Class II molecules to the SARS-CoV-2 spike glycoprotein. For that purpose, we assessed the binding affinities of 66 Class II alleles, as follows.

HLA alleles

For this study, we selected the more frequent alleles of the classical HLA Class II genes (DPB1, DQB1, DRB1), namely all alleles with global frequencies ≥ 0.01, an arbitrary but reasonable threshold. For that purpose, we obtained an Estimation of Global Allele Frequencies by querying the website http://www.allelefrequencies.net/ 11. The alleles with frequencies ≥ 0.01 that we used are given in Table 1. They comprised 21, 15 and 30 alleles of the DPB1, DQB1 and DRB1 genes, respectively.

SARS-CoV-2 spike glycoprotein

The amino acid (AA) sequence of the SARS-CoV-2 spike glycoprotein ("glycoprotein") was retrieved from the UniprotKB database12. It consists of 1273 AAs.

Partitioning the SARS-CoV-2 spike glycoprotein

As mentioned above, the main objective of this study was to assess exhaustively the binding affinities of HLA Class II molecules to the SARS-CoV-2 spike glycoprotein. For that purpose, we used a sliding epitope window approach13 to partition the sequence of the spike glycoprotein into subsequences of all possible consecutive 22-mers (e.g. residues S1-S22, S2-S23, …, S1252-S1273) that covered the entire sequence length (1273 AA). The method is illustrated in Figure 1. More specifically, a set of 22-AA-length subsequences was generated (number of subsequences = length of spike glycoprotein - 22); thus the number of subsequences was 1273 - 22 = 1251. Subsequences were collected and queried in the IEDB database (www.iedb.org)14 in order to determine their binding affinity to a specific HLA Class II molecule. Binding affinity predictions were obtained using the NetMHCIIpan method14. For each 22-mer, a binding affinity score was predicted and reported as a percentile rank by comparing the peptide's score against the scores of five million random n-mers selected from the SwissProt database11. Smaller percentile ranks indicate higher binding affinity. Therefore, the minimum percentile rank (i.e. highest affinity) for each allele and 22-mer of the spike glycoprotein was found and kept. This yielded 1251 values for each allele.
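The partitioning step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `sliding_windows` and the toy sequence are hypothetical, and the real input would be the 1273-AA sequence retrieved from UniProtKB. Note that enumerating every 22-mer from S1-S22 through S1252-S1273 gives L - 22 + 1 windows, whereas the text counts L - 22 = 1251, so the exact indexing convention behind the reported count is an assumption here.

```python
from typing import Iterator, Tuple


def sliding_windows(sequence: str, width: int = 22) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, subsequence) for every consecutive window.

    Positions are 1-based and inclusive, as in the S1-S22 notation of the text.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    for offset in range(len(sequence) - width + 1):
        yield offset + 1, offset + width, sequence[offset:offset + width]


# Toy sequence standing in for the spike glycoprotein (hypothetical input).
toy = "MFVFLVLLPLVSSQCVNLTTRTQLPPAYTN"
windows = list(sliding_windows(toy))
print(len(windows))  # number of 22-mers in the toy sequence
print(windows[0])    # first window with its 1-based coordinates
```

Each yielded subsequence would then be submitted to a binding predictor for every allele; that query step is deliberately left out because it depends on the external tool.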

Figure 1. A sample of the sliding window approach for the SARS-CoV-2 spike glycoprotein.


Table 1. HLA Class II alleles used, ordered by their global frequencies in descending order. (See text for details.)

Allele       Frequency  Allele       Frequency  Allele       Frequency
DPB1*04:01   0.23022    DQB1*03:01   0.24528    DRB1*07:01   0.11305
DPB1*101:01  0.177      DQB1*02:01   0.13308    DRB1*15:01   0.0956
DPB1*05:01   0.16296    DQB1*03:02   0.10029    DRB1*03:01   0.0885
DPB1*04:02   0.16181    DQB1*05:01   0.0997     DRB1*11:01   0.07516
DPB1*02:01   0.15451    DQB1*06:02   0.07696    DRB1*01:01   0.06829
DPB1*03:01   0.0676     DQB1*02:02   0.07162    DRB1*13:01   0.05585
DPB1*01:01   0.05857    DQB1*03:03   0.05454    DRB1*11:04   0.05065
DPB1*13:01   0.04415    DQB1*04:02   0.05185    DRB1*04:01   0.0442
DPB1*14:01   0.04115    DQB1*06:01   0.05043    DRB1*13:02   0.03842
DPB1*02:02   0.02725    DQB1*05:02   0.04806    DRB1*16:01   0.03479
DPB1*09:01   0.02473    DQB1*05:03   0.0427     DRB1*14:01   0.03005
DPB1*17:01   0.02403    DQB1*06:03   0.04042    DRB1*14:54   0.02665
DPB1*28:01   0.01827    DQB1*06:04   0.02836    DRB1*15:02   0.02583
DPB1*77:01   0.01597    DQB1*04:01   0.02426    DRB1*12:01   0.02435
DPB1*11:01   0.0146     DQB1*06:09   0.01356    DRB1*04:04   0.02248
DPB1*18:01   0.01448                            DRB1*09:01   0.02224
DPB1*107:01  0.014                              DRB1*04:05   0.02144
DPB1*10:01   0.01295                            DRB1*08:01   0.02097
DPB1*21:01   0.01172                            DRB1*12:02   0.02014
DPB1*22:01   0.01149                            DRB1*04:03   0.01807
DPB1*06:01   0.01001                            DRB1*01:02   0.01745
                                                DRB1*13:03   0.0167
                                                DRB1*04:11   0.01642
                                                DRB1*08:03   0.01629
                                                DRB1*04:07   0.01524
                                                DRB1*16:02   0.01494
                                                DRB1*14:02   0.0148
                                                DRB1*10:01   0.01429
                                                DRB1*08:02   0.01421
                                                DRB1*04:02   0.01362

Data analysis

We used two measures to quantify the binding affinity of an allele to the glycoprotein: (a) the value with the lowest 22-mer minimum rank (of the total 1251 values), called Lowest Percentile Rank (LPR), and (b) the percent of percentile rank values less than or equal to 5% (PR5). LPR is a unique value and, therefore, prone to chance fluctuations, whereas PR5 is a more stable measure with the reasonable interpretation as the probability of high binding affinity of the allele at the 5% threshold. (Exploratory analyses using a 10% threshold yielded very similar results.) The data were then analyzed to assess (a) the effect of Gene on LPR and PR5, and (b) the differential variation of Gene-related effect on LPR and PR5 along the AA sequence of the glycoprotein to determine epitopes of high binding affinity to specific alleles.

Analyses using bootstrap

Since each individual carries 6 HLA Class II alleles of the classical genes (two of each DPB1, DQB1, DRB1), we simulated by bootstrap sets of individual cases by combining randomly six alleles (two from each of the three genes, DPB1, DQB1, and DRB1) using ad hoc computer programs in Microsoft Visual Intel FORTRAN (version 2019).

Application to single individuals. As mentioned above, every individual carries 6 classical HLA Class II alleles, 2 from each gene (DPB1, DQB1, DRB1). For a specific individual k, a quantitative estimate of their ability to make antibodies against SARS-CoV-2 would be the average of the PR5 of the 6 alleles carried by that individual (equations 1 and 2).
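The two affinity measures and the per-individual estimate can be written out as a short Python sketch. The function names are hypothetical and the percentile ranks below are made-up values chosen only to show the arithmetic (123 of 1251 windows at or below the 5% threshold gives 9.83213, the same form as the PR5 values tabulated in the Results); this is an illustration of the definitions in the text, not the authors' FORTRAN programs.

```python
from statistics import mean
from typing import Sequence


def lowest_percentile_rank(ranks: Sequence[float]) -> float:
    """LPR: the smallest percentile rank over all windows (highest affinity)."""
    return min(ranks)


def pr5(ranks: Sequence[float], threshold: float = 5.0) -> float:
    """PR5: percent of windows whose percentile rank is <= threshold."""
    hits = sum(1 for r in ranks if r <= threshold)
    return 100.0 * hits / len(ranks)


def individual_score(pr5_values: Sequence[float]) -> float:
    """Average PR5 over the six alleles carried by one individual."""
    if len(pr5_values) != 6:
        raise ValueError("expected six alleles (two per gene)")
    return mean(pr5_values)


# Hypothetical ranks for one allele: 123 strong binders out of 1251 windows.
ranks = [1.0] * 123 + [50.0] * 1128
print(lowest_percentile_rank(ranks))
print(round(pr5(ranks), 5))

# Six PR5 values (two per gene) for one illustrative individual.
carried = [7.83373, 5.11591, 5.83533, 0.39968, 6.31495, 9.83213]
print(round(individual_score(carried), 5))
```

Passing `threshold=10.0` to `pr5` corresponds to the exploratory 10% analysis mentioned in the text.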

We estimated basic statistics for this individual estimate using a bootstrap procedure, as follows. Let M_DPB1, M_DQB1, M_DRB1 be the number of available alleles in the DPB1, DQB1 and DRB1 genes, respectively, in the population; in this study, for the most common (f ≥ 0.01) alleles globally, we have: M_DPB1 = 21, M_DQB1 = 15 and M_DRB1 = 30. We generated 1000 bootstrap values by selecting randomly (with replacement, allowing for homozygotes) two alleles from each set of the three genes (equations 3 and 4, where the asterisk (*) denotes a bootstrap sample).

Application to populations. Here we extended the analysis above on single individuals to populations, by taking into account the frequency of allele i in the population. For that purpose, we computed an estimate of the weighted (by allele frequency) binding affinity for an individual k in the population using the bootstrapping procedure above (equation 5, where the asterisk (*) denotes a bootstrap sample). We then computed the sum of the corresponding frequency weights (equation 6) and the corresponding average allele frequency (equation 7). We then computed the mean of the weighted binding affinity over the N = 1000 bootstrap samples (equations 8-10).

Association between binding affinity and allele frequency in populations. Here we investigated the possible association between the average binding affinity (equation 2) and the corresponding average allele frequency (equation 7) using a linear regression analysis. As a control, we paired the observed average binding affinities to randomly chosen average allele frequencies (from the same gene set) and performed the same analysis.

Association between binding affinity and allele frequency in single genes. Here we investigated the possible association between allele frequency and binding affinity (PR5) for each one of the 3 genes above using a linear regression analysis.

General statistical methods

Standard statistical methods were employed in these analyses, including analysis of variance (ANOVA) and linear regression, using the IBM-SPSS statistical package (version 26).

Results

Overall relations of individual HLA Class II alleles to SARS-CoV-2 spike glycoprotein

Figure 2 illustrates the results of the sliding epitope window analysis for 3 HLA Class II alleles. Fig. 2A shows the percentile rank values for all 1251 epitopes for an allele with high binding affinities (DRB1*13:02) and another with low binding affinities (DQB1*02:02). Fig. 2B plots the percentile rank values for 2 HLA Class II alleles that differ only by one amino acid (DRB1*13:01, DRB1*13:02) and attests to the importance of HLA Class II molecule structure on binding affinity to SARS-CoV-2: DRB1*13:02 has much higher binding affinities than DRB1*13:01. Figure 3 illustrates results on LPR and PR5 for two alleles, one with high (left panel) and another with low (right panel) binding affinities. Figure 4 shows the frequency distributions of LPR (Fig. 4A) and PR5 (Fig. 4B), and their joint distribution (Fig. 4C). LPR and PR5 were significantly associated; more specifically, LPR decreased with the logarithm of PR5 (linear regression of LPR against ln(PR5), F[1,64] = 145.2, P = 4.1x10^−18). Tables 2 and 3 give the LPR and PR5 values of the 66 alleles studied.

Overall relations of individual HLA Class II genes to SARS-CoV-2 spike glycoprotein

The PR5 of the three genes (DPB1, DQB1, DRB1) differed significantly (Figure 5; F[2,63] = 5.69, P = 0.005, ANOVA). DPB1 had significantly higher PR5 than DQB1 (P = 0.002, ANOVA); the PR5 did not differ significantly between DPB1 and DRB1 (P = 0.365, ANOVA). Finally, DRB1 had significantly higher PR5 than DQB1 (P = 0.009, ANOVA).

Localization of allele binding affinities along the SARS-CoV-2 spike glycoprotein sequence

Lower LPR values (i.e. higher binding affinities) were concentrated in the initial part of the glycoprotein (approximately between S1-S480 AA residues) (Figure 6, left panel); a cumulative plot is shown in Fig. 6, right panel. Similarly, high PR5 values were more frequent in the initial part of the glycoprotein (approximately S1-S501; Figure 7).

Relevance of allele-SARS-CoV-2 binding affinities to single individuals

The frequency distribution of 1000 average binding affinities (see Methods) is shown in Figure 8 (left panel) and their rank-ordered values in Fig. 8 (right panel).

Figure 2. A) Percentile rank values of all 22-mer sliding epitopes (N = 1251) are plotted along the SARS-CoV-2 glycoprotein sequence for DRB1*13:02 (green; low rank values = high binding affinities) and DQB1*02:02 (magenta; high rank values = low binding affinities). B) All percentile rank values are plotted for 2 alleles that differ by only one amino acid and yet possess very different binding affinities (DRB1*13:01, red; DRB1*13:02, green).

Figure 3. Left panel, percentile rank values and associated LPR and PR5 values for an allele (DPB1*04:01) with high binding affinities. Right panel, percentile rank values and associated LPR and PR5 values for an allele (DQB1*02:02) with low binding affinities.

Figure 4. A) Frequency distribution of LPR. B) Frequency distribution of PR5. C) LPR is plotted against PR5. See text for details.


Table 2. LPR values for all alleles, ranked from smallest (highest binding affinity) to largest (lowest binding affinity). The associated AA sequence and its location along the spike glycoprotein are also given. See text for details.

Allele       LPR    Sequence for LPR        Start position  End position
DRB1*08:03   0.01   GTTLDSKTQSLLIVNNATNVVI  106             128
DRB1*14:02   0.01   GTTLDSKTQSLLIVNNATNVVI  106             128
DRB1*13:02   0.02   GTTLDSKTQSLLIVNNATNVVI  106             128
DPB1*04:02   0.04   NLCPFGEVFNATRFASVYAWNR  333             355
DPB1*77:01   0.04   NLCPFGEVFNATRFASVYAWNR  333             355
DRB1*04:11   0.05   LPIGINITRFQTLLALHRSYLT  228             250
DRB1*01:02   0.07   PIGINITRFQTLLALHRSYLTP  229             251
DRB1*04:03   0.07   TQQLIRAAEIRASANLAATKMS  1008            1030
DPB1*02:01   0.08   LCPFGEVFNATRFASVYAWNRK  334             356
DPB1*01:01   0.09   CPFGEVFNATRFASVYAWNRKR  335             357
DPB1*04:01   0.09   TNLCPFGEVFNATRFASVYAWN  332             354
DRB1*04:04   0.1    LPIGINITRFQTLLALHRSYLT  228             250
DPB1*02:02   0.11   NLCPFGEVFNATRFASVYAWNR  333             355
DPB1*10:01   0.11   YVTQQLIRAAEIRASANLAATK  1006            1028
DPB1*101:01  0.11   NLCPFGEVFNATRFASVYAWNR  333             355
DPB1*14:01   0.11   YVTQQLIRAAEIRASANLAATK  1006            1028
DPB1*17:01   0.11   YVTQQLIRAAEIRASANLAATK  1006            1028
DRB1*15:01   0.11   PIGINITRFQTLLALHRSYLTP  229             251
DPB1*06:01   0.13   CPFGEVFNATRFASVYAWNRKR  335             357
DPB1*18:01   0.14   PFGEVFNATRFASVYAWNRKRI  336             358
DQB1*03:01   0.14   LAGTITSGWTFGAGAALQIPFA  877             899
DRB1*12:02   0.14   LPIGINITRFQTLLALHRSYLT  228             250
DPB1*28:01   0.16   LPIGINITRFQTLLALHRSYLT  228             250
DRB1*04:02   0.16   PIGINITRFQTLLALHRSYLTP  229             251
DPB1*09:01   0.17   PFGEVFNATRFASVYAWNRKRI  336             358
DPB1*107:01  0.17   PFGEVFNATRFASVYAWNRKRI  336             358
DPB1*13:01   0.17   PFGEVFNATRFASVYAWNRKRI  336             358
DQB1*06:01   0.17   LTPGDSSSGWTAGAAAYYVGYL  248             270
DPB1*03:01   0.18   VTQQLIRAAEIRASANLAATKM  1007            1029
DPB1*05:01   0.18   LPIGINITRFQTLLALHRSYLT  228             250
DPB1*11:01   0.18   LCPFGEVFNATRFASVYAWNRK  334             356
DRB1*16:01   0.18   PIGINITRFQTLLALHRSYLTP  229             251
DRB1*10:01   0.19   VASQSIIAYTMSLGAENSVAYS  686             708
DRB1*08:02   0.2    VTQQLIRAAEIRASANLAATKM  1007            1029
DRB1*15:02   0.21   LDSKTQSLLIVNNATNVVIKVC  109             131
DPB1*22:01   0.22   PFGEVFNATRFASVYAWNRKRI  336             358
DRB1*12:01   0.22   LPIGINITRFQTLLALHRSYLT  228             250
DRB1*13:03   0.22   LDSKTQSLLIVNNATNVVIKVC  109             131
DRB1*04:07   0.25   DSKTQSLLIVNNATNVVIKVCE  110             132
DRB1*14:01   0.27   LPIGINITRFQTLLALHRSYLT  228             250
DRB1*14:54   0.27   LPIGINITRFQTLLALHRSYLT  228             250
DQB1*06:02   0.29   ITSGWTFGAGAALQIPFAMQMA  881             903
DQB1*06:03   0.29   ITSGWTFGAGAALQIPFAMQMA  881             903
DPB1*21:01   0.3    VTQQLIRAAEIRASANLAATKM  1007            1029
DQB1*03:03   0.3    TSGWTFGAGAALQIPFAMQMAY  882             904
DRB1*09:01   0.31   GWTFGAGAALQIPFAMQMAYRF  884             906
DQB1*04:01   0.34   GWTFGAGAALQIPFAMQMAYRF  884             906
DQB1*04:02   0.34   GWTFGAGAALQIPFAMQMAYRF  884             906
DRB1*08:01   0.44   VGGNYNYLYRLFRKSNLKPFER  444             466
DRB1*11:01   0.44   VGGNYNYLYRLFRKSNLKPFER  444             466
DRB1*04:05   0.47   LPIGINITRFQTLLALHRSYLT  228             250
DRB1*16:02   0.52   QDLFLPFFSNVTWFHAIHVSGT  51              73
DQB1*03:02   0.54   TSGWTFGAGAALQIPFAMQMAY  882             904
DQB1*06:09   0.64   GWTFGAGAALQIPFAMQMAYRF  884             906
DRB1*11:04   0.66   VGGNYNYLYRLFRKSNLKPFER  444             466
DQB1*06:04   0.8    GWTFGAGAALQIPFAMQMAYRF  884             906
DRB1*04:01   0.92   QQLIRAAEIRASANLAATKMSE  1009            1031
DRB1*03:01   1.1    KHTPINLVRDLPQGFSALEPLV  205             227
DRB1*07:01   1.1    LPIGINITRFQTLLALHRSYLT  228             250
DRB1*01:01   1.2    PIGINITRFQTLLALHRSYLTP  229             251
DRB1*13:01   1.7    LPIGINITRFQTLLALHRSYLT  228             250
DQB1*05:02   2.5    CNGVEGFNCYFPLQSYGFQPTN  479             501
DQB1*05:03   2.6    GVEGFNCYFPLQSYGFQPTNGV  481             503
DQB1*05:01   2.8    VSPTKLNDLCFTNVYADSFVIR  381             403
DQB1*02:01   3.9    INLVRDLPQGFSALEPLVDLPI  209             231
DQB1*02:02   3.9    INLVRDLPQGFSALEPLVDLPI  209             231

Table 3. PR5 values for all alleles, ranked from highest (best affinity) to lowest. See text for details.

Allele       PR5
DRB1*04:05   9.83213
DRB1*04:01   9.27258
DRB1*04:07   8.39329
DRB1*13:02   8.39329
DRB1*10:01   8.31335
DRB1*15:02   8.07354
DRB1*16:02   8.07354
DPB1*04:01   7.83373
DPB1*02:02   7.67386
DPB1*101:01  7.67386
DPB1*02:01   7.35412
DPB1*04:02   7.19424
DPB1*77:01   7.19424
DRB1*13:03   7.11431
DQB1*06:01   7.03437
DRB1*16:01   6.79456
DPB1*01:01   6.63469
DRB1*04:04   6.63469
DQB1*06:02   6.31495
DRB1*07:01   6.31495
DPB1*28:01   6.15508
DPB1*22:01   5.9952
DRB1*14:02   5.9952
DQB1*03:01   5.83533
DQB1*04:01   5.7554
DQB1*04:02   5.7554
DRB1*01:01   5.7554
DRB1*15:01   5.7554
DRB1*04:11   5.59552
DQB1*06:03   5.51559
DRB1*04:03   5.51559
DPB1*09:01   5.35572
DPB1*107:01  5.35572
DPB1*13:01   5.35572
DPB1*06:01   5.27578
DPB1*18:01   5.19584
DRB1*09:01   5.19584
DPB1*03:01   5.11591
DPB1*05:01   5.11591
DPB1*11:01   4.71623
DPB1*21:01   4.71623
DPB1*10:01   4.63629
DPB1*14:01   4.63629
DPB1*17:01   4.63629
DQB1*03:03   4.47642
DRB1*08:03   3.67706
DRB1*14:01   3.67706
DRB1*14:54   3.67706
DRB1*04:02   3.51719
DRB1*08:01   3.51719
DRB1*12:01   3.43725
DQB1*06:04   3.35731
DRB1*11:01   3.27738
DRB1*01:02   3.11751
DRB1*12:02   3.11751
DQB1*06:09   2.95763
DQB1*03:02   2.79776
DRB1*11:04   2.71783
DRB1*08:02   2.55795
DQB1*05:03   1.91847
DRB1*03:01   1.91847
DRB1*13:01   1.83853
DQB1*05:02   1.43885
DQB1*05:01   0.79936
DQB1*02:01   0.39968
DQB1*02:02   0.39968

Figure 5. Mean ± SEM of percent of PR5 for the 66 alleles used (N = 21, 15, 30 for genes DPB1, DQB1 and DRB1, respectively). See text for details.

The mean and median of these average binding affinities were 4.97 and 4.96, respectively. These results indicate that ~50% of the individuals in the simulated population possess binding affinities to SARS-CoV-2 spike glycoprotein greater than the average binding affinity of the whole sample.

Relevance of allele-SARS-CoV-2 binding affinities to whole populations

In this analysis, we extended the findings above on single individuals to populations, by taking into account the frequency of allele i in the population by computing an estimate of the weighted (by allele frequency) binding affinity for an individual k in the population using the bootstrapping procedure described in the Methods section. We found that the weighted mean binding affinity was 5.01, a value very close to the mean (= 4.97) and median (= 4.96) of the distribution of the unweighted individual averages.

In the analyses above, we used the global frequencies of the 66 alleles. However, those frequencies differ from population to population (country, ethnicity, etc.). In general, the quantitative ability to mount antibodies in a population depends (a) on the frequency of those alleles in the population, and (b) on the binding affinity of alleles in the population. Therefore, for accurate estimates of the weighted binding affinity in various populations, the frequencies of all HLA Class II alleles in the population should be known.

Association between binding affinity and allele frequency in populations

Here we investigated the possible association between the average binding affinity (equation 2) and the corresponding average allele frequency (equation 7). We found that the average binding affinity was positively associated with average allele frequency: the higher the average binding affinity of the 6-allele set, the higher the average frequency of the alleles in the set (Figure 9, left panel; r = 0.129; slope (beta) ± SE = 0.0348 ± 0.000846; t[998] = 4.12, P = 0.000042). This means that a 1% increase in average binding affinity is associated with an increase of 0.0348 in average allele frequency. This can be interpreted as reflecting an evolutionary pressure for increasing allele frequency with overall higher affinities to SARS-CoV-2, with a presumed survival advantage. As a control, we paired the observed average binding affinities to randomly chosen average allele frequencies (from the same gene set); no significant association was found (Fig. 9, right panel; r = 0.027, P = 0.389). The results above came from a single bootstrap. We carried out 1000 such bootstraps to obtain a long-term expected estimate of the slope of the dependence of average allele frequency on average binding affinity. The mean slope (± SD) was 0.002815 ± 0.00087 per unit of average binding affinity.
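The bootstrap of simulated individuals and the regression of average allele frequency on average binding affinity can be sketched as follows. This is an illustration under stated assumptions, not the authors' FORTRAN/SPSS workflow: alleles are drawn uniformly within each gene, the frequency-weighted score is taken as Σ f·PR5 / Σ f over the six alleles (one plausible reading of the weighting described in the Methods), and only three alleles per gene are included, with frequency and PR5 values copied from Tables 1 and 3. All function names are hypothetical.

```python
import random
from statistics import mean

# (global frequency, PR5) pairs copied from Tables 1 and 3. Only three alleles
# per gene are listed to keep the sketch short; the study used 21, 15 and 30.
ALLELES = {
    "DPB1": [(0.23022, 7.83373), (0.16296, 5.11591), (0.01001, 5.27578)],
    "DQB1": [(0.24528, 5.83533), (0.13308, 0.39968), (0.01356, 2.95763)],
    "DRB1": [(0.11305, 6.31495), (0.09560, 5.75540), (0.02144, 9.83213)],
}


def simulate_individual(rng):
    """Pick two alleles per gene uniformly, with replacement (homozygotes allowed)."""
    return [rng.choice(pool) for pool in ALLELES.values() for _ in range(2)]


def summarize(picks):
    """Return (mean PR5, mean frequency, frequency-weighted mean PR5) for six alleles."""
    freqs = [f for f, _ in picks]
    pr5s = [b for _, b in picks]
    weighted = sum(f * b for f, b in picks) / sum(freqs)
    return mean(pr5s), mean(freqs), weighted


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def bootstrap(n=1000, seed=0):
    """Simulate n individuals and summarize each one."""
    rng = random.Random(seed)
    return [summarize(simulate_individual(rng)) for _ in range(n)]


samples = bootstrap()
mean_pr5 = [s[0] for s in samples]
mean_freq = [s[1] for s in samples]
print(round(mean(mean_pr5), 2))        # average of the 1000 individual scores
print(ols_slope(mean_pr5, mean_freq))  # slope of mean frequency on mean PR5
```

Because only a nine-allele subset is used, the printed values are not expected to reproduce the statistics reported above; repeating `bootstrap` with different seeds and collecting the slopes mirrors the 1000-bootstrap estimate of the mean slope.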


Figure 6. Left panel: LPR for the 66 alleles. Right panel: cumulative plot of the data in the left panel.

Figure 7. Left panel: Total counts of PR5 values for all 66 alleles used are plotted against the glycoprotein sequence. Right panel: cumulative plot of the data in the left panel.

Figure 8. Frequency distribution of 1000 average binding affinities. See text for details.

Association between binding affinity and allele frequency in single genes

The results above come from simulated HLA makeup of individuals, consisting of 6 alleles (2 from each gene DPB1, DQB1, DRB1). The significant dependence of average allele frequency on the average binding affinity (Figure 9, left panel) suggests that specific genes might be driving this dependence. We tested this hypothesis by performing a linear regression analysis between allele frequency and binding affinity for each one of the 3 genes above. We found a highly significant correlation only for the DPB1 gene (Figure 10; r = 0.589, P = 0.005, N = 21 alleles). In contrast, no significant association was found for the DQB1 gene (r = 0.022, P = 0.938, N = 15) or the DRB1 gene (r = 0.153, P = 0.421, N = 30). These findings indicate that the hypothesized evolutionary pressure has been exerted through the DPB1 gene.
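The DPB1 correlation can be checked directly from the tabulated values. The sketch below pairs each DPB1 allele's global frequency (Table 1) with its PR5 (Table 3) and computes a plain Pearson correlation; the helper name `pearson_r` is hypothetical, and the P value is not computed here.

```python
from math import sqrt
from statistics import mean

# DPB1 alleles: (name, global frequency from Table 1, PR5 from Table 3).
DPB1 = [
    ("04:01", 0.23022, 7.83373), ("101:01", 0.17700, 7.67386),
    ("05:01", 0.16296, 5.11591), ("04:02", 0.16181, 7.19424),
    ("02:01", 0.15451, 7.35412), ("03:01", 0.06760, 5.11591),
    ("01:01", 0.05857, 6.63469), ("13:01", 0.04415, 5.35572),
    ("14:01", 0.04115, 4.63629), ("02:02", 0.02725, 7.67386),
    ("09:01", 0.02473, 5.35572), ("17:01", 0.02403, 4.63629),
    ("28:01", 0.01827, 6.15508), ("77:01", 0.01597, 7.19424),
    ("11:01", 0.01460, 4.71623), ("18:01", 0.01448, 5.19584),
    ("107:01", 0.01400, 5.35572), ("10:01", 0.01295, 4.63629),
    ("21:01", 0.01172, 4.71623), ("22:01", 0.01149, 5.99520),
    ("06:01", 0.01001, 5.27578),
]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


freq = [f for _, f, _ in DPB1]
pr5_values = [p for _, _, p in DPB1]
r = pearson_r(freq, pr5_values)
print(round(r, 3))  # the text reports r = 0.589 for DPB1
```

The same function applied to the DQB1 and DRB1 columns of Tables 1 and 3 would give the other two per-gene correlations.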

Figure 9. Average six-allele frequency is plotted against randomly assigned average binding affinity for actual pairing (left panel) and random pairing (right panel). See text for details.

Figure 10. Allele frequency is plotted against binding affinity of corresponding alleles. See text for details.

Discussion

Here we investigated the relations between SARS-CoV-2 spike glycoprotein and 66 common HLA Class II alleles. Our study was inspired by the known dependence of antibody production against a foreign antigen on a good match between the antigen and HLA Class II molecules. There were four major aims of this study, as follows. First, to investigate exhaustively the binding affinity of suitable fragments (epitopes) of the spike glycoprotein to the different 66 HLA Class II alleles; second, to identify portions of the spike glycoprotein with high binding affinity to HLA Class II molecules, as good candidates for vaccine development; third, to derive a quantitative estimate of the ability of an individual to mount antibodies against that protein, as happens during an epidemic or following vaccination; and fourth, to gain an insight into a possible evolutionary pressure by alleles with high binding affinity (and, therefore, the successful production of antibodies and protection from COVID-19) to occur at higher frequencies, i.e. a positive association between binding affinity and allele frequency. We discuss the findings on these objectives below.

Methodological considerations

The predicted binding affinities were based on an algorithm that considers existing experimental data14. Moreover, synthetic peptides which contain amino acid sequences of proteins without post-translational modifications such as glycosylation have been used extensively and have proven an excellent type of molecule for the mimicry of protein sites: such peptides can be generated as exact copies of protein fragments, as well as in diverse chemical modifications16. An example of the usefulness of this approach17 was the identification of peptides mimicking antibody binding to the Epidermal Growth Factor Receptor.

Studies of binding affinities of HLA Class II molecules to a protein typically focus on specific fragments (epitopes) of the protein, based on a priori considerations of interest to the particular fragment. In contrast, here we sought to assess systematically the binding affinities of all continuous epitopes of the SARS-CoV-2 spike glycoprotein with each of 66 common (frequency ≥ 0.01) HLA Class II alleles. For that purpose, we used a sliding epitope window approach13 in which a sliding epitope of 22 AA sequence length (suitable for HLA Class II molecule binding) scanned the whole glycoprotein sequence of 1273 AA, for a total of 1251 epitopes. For each epitope, we determined the minimum rank of its binding to each of the 66 alleles used in this study, as compared to 5 million peptides of the same length; the smaller the minimum rank, the better the binding affinity. This procedure enabled us to (a) evaluate binding affinities of individual alleles, (b) compare binding affinities among the 3 classical HLA Class II genes (DPB1, DQB1, DRB1), (c) identify epitopes with high binding affinity along the glycoprotein sequence, and thus (d) evaluate the ability of individuals and populations carrying these alleles to mount antibodies against the spike glycoprotein, assuming that those epitopes are of adequate antigenicity and that individuals are immunocompetent.

Location of glycoprotein epitopes with high binding affinity to HLA Class II molecules

We found that more than one-half of high affinity binding epitopes were located within the first 500 AA of the glycoprotein (Figs. 6 and 7), with two hot spots in the interval S200-S400 AA. Therefore, vaccines containing epitopes from that part of the glycoprotein would have a high chance of good match with HLA Class II molecules, the first step in initiating antibody production. These findings are in contrast with a recent paper3 which described the most "immunodominant" epitopes in amino acid positions S510-S570 of the S protein, all of which have low affinities in our analysis (Figs. 6 and 7). Although the authors discuss the concept of immunodominance, it should be pointed out that effective antigenicity, which they assessed on T and B cells from mice, is necessary but not sufficient. The reason lies in the fact that T cells are the first to interact with the antigen-HLA molecule complex, not the antigen alone: in the absence of such a match, even a highly antigenic epitope in vitro will not elicit antibody formation in vivo. Therefore, binding affinities of glycoprotein epitopes to HLA Class II molecules are crucial for effective vaccine development.

Application to individuals and populations

An important aspect of this study focused on determining the ability of an individual for a good match between spike glycoprotein epitopes and their own HLA Class II molecules. Assuming that individuals are immunocompetent otherwise (i.e. they are not immunocompromised due to disease, age, or drug intake), the goodness of the match above is crucial in initiating antibody production because it is the antigen epitope-HLA Class II molecule complex that activates CD4+ T lymphocytes. Now, every individual carries 2 alleles from each of the 3 classical HLA Class II genes (DPB1, DQB1, DRB1), for a total of 6 such alleles. We combined that information with the binding affinity of each allele we determined in this study, to derive estimates of the "goodness of match" of an individual, and then extended this analysis to populations of individuals using a bootstrap procedure. There were three major findings from this analysis. First, the goodness of match of an individual can be estimated and ranked within the population, thus providing information concerning the rank of that individual among others with respect to that measure. This, in turn, will help identify individuals who may not be able to produce an adequate number of antibodies following SARS-CoV-2 infection or vaccination. Second, the goodness of match of a population overall can be estimated and used as a predictor of the expected antibody response in a natural setting (epidemic) or following vaccination. It should be noted that both cases above, regarding individuals and populations, rely on two factors, namely the HLA Class II genetic makeup and the binding affinity of an allele in that makeup. For an individual, both of these factors are known and, therefore, it is straightforward to compute the goodness of match. However, for a population, an additional critical element is the frequency of occurrence of an allele, information that has a major bearing on estimating the goodness of match in the population as a whole, since allele affinities will have to be weighted by their frequencies in the population (see equation 5 above). In the present study, we used 66 HLA Class II alleles with global frequencies ≥ 0.01. However, there are many more alleles with lower frequencies, and, even more important, the frequencies of various alleles differ across populations (countries, ethnicities, etc.). Unfortunately, complete lists of HLA Class II alleles are not available for any population because it has never been a useful goal to obtain such a list; instead, most information comes from studies related to […]. Be that as it may, here we outlined a rigorous procedure that will provide the population estimate of goodness of spike glycoprotein-HLA Class II match needed: additional information about added alleles will naturally increase the accuracy of the estimate but will not affect the procedure.

Association between allele frequency and binding affinity

We found a significant positive association between allele frequency and binding affinity, such that higher average allele frequencies (across 6 alleles of the HLA Class II genetic makeup) were associated with higher average binding affinities (Fig. 9). This suggests an evolutionary pressure such that alleles conferring higher protection against SARS-CoV-2 spike glycoprotein ultimately became more frequent. A further investigation of this relation in individual genes revealed a highly significant positive association between single allele frequency and binding affinity only for the DPB1 gene (Fig. 10). The source of this postulated evolutionary drive is unclear, since SARS-CoV-2 is a novel virus. A possible explanation would be that epitopes of the SARS-CoV-2 spike glycoprotein are shared with those of other proteins of viruses that have been present for a long time. This could be a plausible explanation for the current observation that exposure to common cold (presumably caused by other corona viruses) may protect

Page 21 of 23 Charonis SS, Tsilibary EP, Georgopoulos AP. SARS-CoV-2 Virus and Human Leukocyte Antigen (HLA) Class II: Investigation in silico of Binding Affinities for COVID-19 Protection Journal of Immunological Sciences and Vaccine Development. J Immunological Sci. (2020); 4(4): 12-23 against COVID-1919. We are currently investigating in Author Contributions our lab this general idea by comparing the SARS-CoV-2 All authors have read and agree to the published version spike glycoprotein to proteins of other upper respiratory of the manuscript. Conceptualization, A.P.G.; methodology, S.C. and A.P.G.; software, S.C. and A.P.G.; validation, S.C. and A.P.G.; formal analysis, S.C. and A.P.G.; investigation, S.C., allelesviruses, of and DPB1. evaluating If this hypothesistheir binding is affinitiestrue, age-long to the upper same respiratory66 alleles to epidemics find out a may possible have providedcommon associationthe evolutionary with pressure for selecting HLA Class II molecules with high andE-P.T. A.P.G.; and A.P.G.; visualization, writing—original A.P.G..; supervision, draft preparation, A.P.G.; project E-P.T., administration,S.C. and A.P.G.; A.P.G.; writing—review funding acquisition, and editing, A.P.G. E-P.T., S.C. advantage to their carriers. binding affinity to such epitopes, thus conferring a survival Funding Relevance to the COVID-19 pandemic A pivotal role of HLA Class II in the ability to mount antiSARS-CoV-2 antibodies was also indicated in a study of ConflictsThis research of Interest received no external funding. a cohort of convalescent patients who developed antibodies against different areas of the spike protein of the virus (S1, RBD, and S2 regions) with variable titers of antibodies. More ReferencesThe authors declare no conflict of interest. 1. Padron-Regalado E. Vaccines for SARS-CoV-2: Lessons from other

inspecifically, 10 of the ≈30%175 patients, of recovered the level mild was COVID-19 below the patientslimit of 2. coronavirus strains. Infect Dis Ther. 2020; 9(2): 1-20. generated a deficient level of neutralizing antibody titers; and structural modeling of SARS-CoV-2 spike protein reveals an detection while in contrast, two patients showed very evolutionayJaimes JA, distinct André NM,and proteolytically Chappie JS, sensitive et al. Phylogenetic activation loop. analysis J Mol Biol. 2020; 432 (10): 3309-3325. ability to mount antibodies to the virus should depend on 3. Zhang BZ, Hu YF, Chen LL, et al. Mining of epitopes on spike protein of high titers. This finding strongly suggests that the variable the interplay between virus and host immune response in HLA genotype, at least in part. The authors concluded that SARS-CoV-2 from COVID-19 patients. Cell Res. 2020; 30(8): 702-704. coronavirus infections should be further explored for the 4. doi:Blum 10.1038/s41422-020-0366-x. JS, Wearsch PA, Cresswell P. Pathways of antigen processing. development of effective vaccine against SARS-CoV-220, a Annu. Rev. Immunol. 2013; 31: 443-473. point addressed in detail herein. 5. Straube F. via MHC Class II molecules. In: Vohr HW. (eds) Encyclopedic Reference of Immunotoxicology. Springer As to the duration of anti-virus antibodies, these 0_99 Berlin Heidelberg. 2005. https://doi.org/10.1007/3-540-27806- cohort of 70 patients who had variable antibody titers, 6. beingreached highest a maximum during dayin the 31-40 first since 20 daysonset, of and infection decreasing in a Immunol. 2014; 134: 269-275. Hanna S, Etzioni A. MHC class I and II deficiencies. J Allergy Clin slightly afterwards20 7. Wrapp D, , Corbett KS, et al. Cryo-EM Structure of the 2019-nCoV spike in the prefusion conformation. Science. 
2020; Nianshuang Wang N concluded that the .titers Thus, of changes antibodies of antibodyand how levelslong anyand immunity their duration in patients were individual-specific.will last is a key question The authors to be 367(6483): 1260-1263. addressed for safe and effective antiviral treatments and 8. Walls AC, Park YJ, Tortorici MA, et al. structure, function, and vaccines in the future20; this further indicates differences antigenicity of the SARS-CoV-2 spike glycoprotein. Cell. 2020; 180: 9. 281–292.Wang Q, Zhang Y, Wu L, et al. structural and functional basis of SARS- between individuals depending on their ability to mount and sustain an optimal antibody response pointing to the 10. CoV-2Ou X, Liu entry Y, Lei by X,using et al. human Characterization ACE2. Cell. of 2020; spike 181: glycoprotein 1-11. of SARS- CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV. paramountFinally, thesignificance published of informationthe HLA genotype. regarding different antibodies raised by COVID-19 patients strengthens the Nature Communications. 2020; 11: 1620. | https://doi.org/10.1038/ 11. s41467-020-15562-9 | www.nature.com/naturecommunications http://www.allelefrequencies.net/ Accessed on 6.16.2020. The Allele Frequence Database. 12. UniProt Consortium. UniProt: A Worldwide Hub of Protein Knowledge importance of our data which demonstrates specific 2019; 47(D1):D506-D515. forS-domain future vaccine virus peptides development, binding which with will high serve affinity for (1) to 13. Nucleic Charonis Acids S, James Res. LM, Georgopoulos AP. In silico assessment of binding strategicdifferent clinical HLA class management; II alleles. (2) This evaluation information of vaccination is critical

affinities of three dementia-protective Human Leukocyte Antigen (3) assignment of clinical professional and managerial (HLA) alleles to nine human herpes virus . Curr Res Transl teamsefficiency interacting in different with individuals COVID-19 inpatients the general20. population; 14. Med. Immune 2020; Epitope 68(4): 211-216.Database doi: and 10.1016/j.retram.2020.06.002. Analysis Resource: www.iedb.org . Accessed on 6.16.2020.


15. Jensen KK, Andreatta M, Marcatili P, et al. Improved methods for predicting peptide binding affinity to MHC class II molecules. Immunology. 2018; 154(3): 394-406. doi: 10.1111/imm.12889.
16. Groß A, Hashimoto CH, Sticht H, Eichler J. Synthetic peptides as protein mimics. Front Bioeng Biotechnol. 2015; 3: article 211. doi: 10.3389/fbioe.2015.00211.
17. Sachdeva S, Joo H, Tsai J, et al. A rational approach for creating peptides mimicking antibody binding. Scientific Reports. 2019; 9: 997. doi: 10.1038/s41598-018-37201-6.
18. Mateus J, Grifoni A, Tarke A, et al. Selective and cross-reactive SARS-CoV-2 T cell epitopes in unexposed humans. Science. 2020. doi: 10.1126/science.abd3871.
19. Shi Y, Wang Y, Shao C, et al. COVID-19 infection: the perspectives on immune responses. Cell Death Differ. 2020; 27(5): 1451-1454. doi: 10.1038/s41418-020-0530-3.
20. Wang X, Guo X, Xin Q, et al. Neutralizing Antibodies Responses to SARS-CoV-2 in COVID-19 Inpatients and Convalescent Patients. Clin Infect Dis. 2020 Jun 4; ciaa721. doi: 10.1093/cid/ciaa721. Online ahead of print.
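Illustrative sketch (not part of the published analysis). The Discussion describes combining the 6 alleles of an individual's HLA Class II makeup with per-allele binding affinities, extending this to populations with a bootstrap procedure, and weighting allele affinities by their population frequencies. The Python sketch below shows one way such a computation could be organized. Several things here are assumptions and not the authors' method: the allele labels, frequencies, and affinity scores are hypothetical toy values; the affinity score is assumed to be rescaled so that higher means better binding (the paper works with minimum percentile ranks, where smaller is better); the individual goodness of match is assumed to be the plain mean over the six alleles; and the paper's own equations, including equation 5, are not reproduced.

```python
import random
from statistics import mean

# Hypothetical toy data: gene -> allele label -> (frequency, affinity score).
# Labels and numbers are invented for illustration; higher score = better binding.
ALLELES = {
    "DPB1": {"DPB1*A": (0.30, 0.9), "DPB1*B": (0.10, 0.6)},
    "DQB1": {"DQB1*A": (0.20, 0.5), "DQB1*B": (0.05, 0.3)},
    "DRB1": {"DRB1*A": (0.15, 0.7), "DRB1*B": (0.02, 0.4)},
}

def individual_match(genotype):
    """Mean affinity score over the six alleles (2 per gene) of one individual."""
    scores = [ALLELES[gene][allele][1]
              for gene, pair in genotype.items()
              for allele in pair]
    return mean(scores)

def sample_genotype(rng):
    """Draw 2 alleles per gene with probability proportional to allele frequency."""
    genotype = {}
    for gene, table in ALLELES.items():
        names = list(table)
        weights = [table[name][0] for name in names]
        genotype[gene] = tuple(rng.choices(names, weights=weights, k=2))
    return genotype

def bootstrap_population(n, seed=0):
    """Goodness-of-match scores for n simulated individuals."""
    rng = random.Random(seed)
    return [individual_match(sample_genotype(rng)) for _ in range(n)]

def frequency_weighted_affinity(gene):
    """Affinity of one gene's alleles, weighted by normalized allele frequency."""
    table = ALLELES[gene]
    total = sum(freq for freq, _ in table.values())
    return sum(freq * score for freq, score in table.values()) / total

def percentile_rank(value, population):
    """Fraction of simulated individuals scoring at or below `value`."""
    return sum(score <= value for score in population) / len(population)

if __name__ == "__main__":
    person = {"DPB1": ("DPB1*A", "DPB1*B"),
              "DQB1": ("DQB1*A", "DQB1*A"),
              "DRB1": ("DRB1*B", "DRB1*B")}
    population = bootstrap_population(1000, seed=1)
    print("individual score:", individual_match(person))
    print("rank in population:", percentile_rank(individual_match(person), population))
    print("population mean:", mean(population))
```

Adding further alleles to the table changes the estimates but not the structure of the computation, which mirrors the remark in the Discussion that additional allele information increases accuracy without affecting the procedure.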
