medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

1 Longitudinal proteomic profiling of dialysis patients with COVID-19 reveals 2 markers of severity and predictors of death 3 4 Jack Gisby*1, Candice L. Clarke*1,2, Nicholas Medjeral-Thomas*1,2, Talat H. Malik1, 5 Artemis Papadaki1, Paige M. Mortimer1, Norzawani B. Buang1, Shanice Lewis1, 6 Marie Pereira1, Frederic Toulza1, Ester Fagnano1, Marie-Anne Mawhin1, Emma E. 7 Dutton1, Lunnathaya Tapeng1, Arianne C. Richard3, Paul D. W. Kirk4,5, Jacques 8 Behmoaras1, Eleanor Sandhu1,2, Stephen P. McAdoo1,2, Maria F. Prendecki1,2, 9 Matthew C. Pickering1, Marina Botto1, Michelle Willicombe*1,2, David C. Thomas*1,2, 10 James E. Peters*1,6 11 * equal contributions 12 13 Author Affiliations 14 1) Centre for Inflammatory Disease, Dept of Immunology and Inflammation, Imperial College 15 London. 16 2) Renal and Transplant Centre, Hammersmith Hospital, Imperial College Healthcare NHS 17 Trust, London, United Kingdom. 18 3) CRUK Cambridge Institute and Cambridge Institute for Medical Research, University of 19 Cambridge. 20 4) MRC Biostatistics Unit, Forvie Way, University of Cambridge. 21 5) Cambridge Institute of Therapeutic Immunology & Infectious Disease, University of 22 Cambridge. 23 6) Health Data Research UK, London, UK. 24 25 Correspondence to: James E. Peters. Email: [email protected] 26

NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.1 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

27 Abstract 28 End-stage kidney disease (ESKD) patients are at high risk of severe COVID-19. We 29 measured 436 circulating in serial blood samples from hospitalised and non- 30 hospitalised ESKD patients with COVID-19 (n=256 samples from 55 patients). Comparison 31 to 51 non-infected patients revealed 221 differentially expressed proteins, with consistent 32 results in a separate subcohort of 46 COVID-19 patients. 203 proteins were associated with 33 clinical severity, including IL6, markers of monocyte recruitment (e.g. CCL2, CCL7), 34 neutrophil activation (e.g. proteinase-3) and epithelial injury (e.g. KRT19). Machine learning 35 identified predictors of severity including IL18BP, CTSD, GDF15, and KRT19. Survival 36 analysis with joint models revealed 69 predictors of death. Longitudinal modelling with linear 37 mixed models uncovered 32 proteins displaying different temporal profiles in severe versus 38 non-severe disease, including integrins and adhesion molecules. These data implicate 39 epithelial damage, innate immune activation, and leucocyte-endothelial interactions in the 40 pathology of severe COVID-19 and provide a resource for identifying drug targets. 41 42 Introduction 43 Coronavirus disease (COVID-19), caused by the SARS-CoV-2 virus, displays wide clinical 44 heterogeneity from asymptomatic to fatal disease. Patients with severe disease exhibit 45 marked inflammatory responses and immunopathology. The mechanisms underlying this 46 remain incompletely characterised and the key molecular mediators are yet to be 47 determined. The first treatment shown to reduce mortality from COVID-19 in randomised 48 trials was dexamethasone [1], a corticosteroid which has broad non-specific effects on the 49 immune system. Even with corticosteroid treatment, mortality in severe COVID-19 remains 50 significant. There is a wide armamentarium of existing drugs that target inflammation more 51 selectively, providing potential repurposing opportunities for the treatment of COVID-19. 52 Recently, the REMAP-CAP trial has demonstrated efficacy of anti-IL6 receptor blockade in 53 patients admitted to intensive care units with severe disease [2]. In order to select the most 54 promising agents for future trials, we urgently need to better understand the molecular 55 drivers of severe disease. Proteins are the effector molecules of biology and the targets of 56 most drugs. Therefore, proteomic profiling to identify the key mediators of severe disease 57 provides a valuable tool for identifying and prioritising potential drug targets [3]. 58 59 Risk factors for severe or fatal COVID-19 include age, male sex, non-European ancestry, 60 obesity, diabetes mellitus, cardiovascular disease, and immunosuppression [4]. End-stage 61 kidney disease (ESKD) is one of the strongest risk factors for severe COVID-19 (estimated 62 hazard ratio for death 3.69) [4], and ESKD patients hospitalised with COVID-19 have a 63 mortality of approximately 30% [5–8]. ESKD patients have a high prevalence of vascular and

2 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

64 cardiometabolic disease (e.g. hypertension, ischaemic heart disease, diabetes), either as a 65 result of the underlying cause of their renal disease or as a consequence of renal failure. In 66 addition, ESKD results in both relative immunosuppression and chronic low-grade 67 inflammation, which may impact viral defence and the host inflammatory response. 68 69 Here we performed proteomic profiling of serial blood samples of ESKD patients with 70 COVID-19, leveraging the unique opportunity for longitudinal sampling in both the outpatient 71 and inpatient settings afforded by a large multi-ethnic haemodialysis cohort (Figure 1a). 72 These data revealed 221 proteins that are dysregulated in COVID-19 versus matched non- 73 infected ESKD patients. Using linear mixed models, joint models and machine learning, we 74 identified proteins that are markers of COVID-19 severity and risk of death. Finally, we 75 characterised the temporal dynamics of the blood proteomic response during COVID-19 76 infection in ESKD patients, uncovering 32 proteins that display altered trajectories in patients 77 with severe versus non-severe disease. 78 79 Results 80 We recruited 55 ESKD patients with COVID-19 (subcohort A; Table 1). All patients were 81 receiving haemodialysis prior to acquiring COVID-19. Blood samples were taken as soon as 82 feasible following COVID-19 diagnosis. At time of initial sample, 30 patients were outpatients 83 attending haemodialysis sessions and 25 were hospitalised inpatients (Methods, Figure 1). 84 Following the initial blood sample, serial sampling was performed for 51/55 patients. We also 85 recruited 51 non-infected haemodialysis patients as ESKD controls, mirroring the age, sex 86 and ethnicity distribution of the COVID-19 cases (Figure 1 figure supplement 1a-c). We 87 used the Olink proteomics platform to measure 436 proteins (Supplementary File 1a) in 88 256 plasma samples from the COVID-19 patients and the 51 control samples. The proteins 89 measured consisted of 5 multiplex ‘panels’ focussed on proteins relevant to immuno- 90 inflammation, cardiovascular and cardiometabolic disease. The 436 proteins assayed 91 showed strong enrichment for immune-related proteins (Supplementary File 1b). 92 93 In addition, we performed the Olink proteomic assays in 52 serum samples from a separate 94 set of 46 COVID-19 positive ESKD patients (subcohort B), and 11 serum samples from 95 ESKD COVID-19 negative controls (a subset of the controls described above). For the large 96 majority of patients only a single timepoint was available. A higher proportion of these 97 patients (41/46, 89%) were hospitalised and had severe disease (Table 2) than in subcohort 98 A (Figure 1, Table 1). 99 100 Proteomic differences between COVID-19 positive and negative ESKD patients

3 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

101 Principal components analysis (PCA) of proteomic data from subcohort A demonstrated 102 differences between samples from COVID-19 positive cases and controls, although the two 103 groups did not separate into discrete clusters (Figure 2a-b). To examine the effects of 104 COVID-19 on the plasma proteome, we performed a differential expression analysis in 105 subcohort A between COVID-19 cases (n=256 samples passing quality control from 55 106 patients) and non-infected ESKD controls (n=51) using linear mixed models, which account 107 for serial samples from the same individual (Methods). This revealed 221 proteins 108 associated with COVID-19 (5% false discovery rate, FDR); the vast majority were 109 upregulated, with only 40 downregulated (Figure 3a, Supplementary File 1c). In order to 110 provide a succinct and standardised nomenclature, we report proteins by the symbols of the 111 encoding them (see Supplementary File 1a for a mapping of symbols to full 112 names). The most strongly upregulated proteins (in terms of fold change) were DDX58, 113 CCL7, IL6, CXCL11, KRT19 and CXCL10, and the most strongly downregulated were 114 SERPINA5, CCL16, FABP2, PON3, ITGA11 and MMP12 (Figure 3 figure supplement 1). 115 Notably, many of the upregulated proteins were chemotaxins. 116 117 We observed that a high proportion of the measured proteins were associated with COVID- 118 19. Given the highly targeted nature of the Olink panels that we used (enriched for immune 119 and inflammation-related proteins), this was not surprising. Nevertheless, to ensure that the 120 Benjamini-Hochberg adjustment of p-values was controlling the false discovery rate at the 121 5% level, we performed two additional analyses (Methods). First, we estimated the FDR 122 using an alternative method (the plug-in method of Tibshirani [9]); this confirmed appropriate 123 FDR control. Second, we used permutation to estimate the distribution of the number of 124 proteins expected to be declared significant under the null hypothesis of no association 125 between any proteins and COVID-19. This showed that the probability of observing the 126 number of differentially abundant proteins we identified was highly unlikely under the null 127 (empirical p < 1x10-5; Figure 3 figure supplement 2). 128 129 Although our COVID-19 negative controls were well matched in terms of age, sex and 130 ethnicity (Figure 1 figure supplement 1a-c), perfect matching of comorbidities was not 131 feasible in the context of the healthcare emergency at the time of patient recruitment. There 132 was a higher prevalence of diabetes in the COVID-19 cases compared to the controls (61.8 133 % versus 47.1%, respectively; Table 1). To evaluate whether differing rates of diabetes had 134 impacted the proteins identified as differentially abundant between cases and controls, we 135 performed a sensitivity analysis adding diabetes as an additional covariate in the linear 136 mixed model. This did not materially affect our findings; estimated effect sizes and -log10 p- 137 values from models with and without the inclusion of diabetes were highly correlated

4 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

138 (Pearson r > 0.99, and r = 0.95, respectively; Figure 3 figure supplement 3a-b). Full results 139 from both models are shown in Supplementary File 1c. Similarly, there were also 140 differences in the underlying cause of ESKD in cases compared controls (Table 1). We 141 therefore performed a further sensitivity analysis adjusting for underlying cause of renal 142 failure. This did not make any meaningful difference to our results (Figure 3 figure 143 supplement 3c-d, Supplementary File 1c). 144 145 We also considered the possibility that timing of haemodialysis might affect the plasma 146 proteome. To minimise the impact of this, all samples were taken prior to haemodialysis. For 147 the large majority (86.6%) of samples, the most recent haemodialysis was between 48 and 148 72 hours prior to blood draw. This consistency in timing of blood sampling reduces the 149 potential for impact of this issue. Nevertheless, to evaluate whether timing of haemodialysis 150 might have impacted our results, we performed a sensitivity analysis including time from last 151 haemodialysis as a covariate. Our results were not materially affected by this, with -log10 p- 152 values and estimated effect sizes very highly correlated with those obtained without inclusion 153 of this covariate (Pearson r >0.99 for effect size estimates and for -log10 p-values; Figure 3 154 figure supplement 4a-b, Supplementary File 1c).

155 156 We used the smaller subcohort B (n=52 serum samples from 46 patients with COVID-19; 157 Methods) for validation. We first projected the data from subcohort B into the PCA space of 158 subcohort A to examine how well the separation of cases and controls in the PCA space 159 replicated (Methods). This revealed clearer separation of infected and non-infected patients 160 than in subcohort A (Figure 2c-d), perhaps reflecting the higher proportion of hospitalised 161 patients (41 of 46 patients) in subcohort B (Table 2). We next performed differential 162 abundance analysis in subcohort B and found 201 proteins that were dysregulated in cases 163 versus controls (5% FDR) (Supplementary File 1c). Of the 221 differentially abundant 164 proteins from subcohort A, 150 (69.7%) were also identified in subcohort B at 5% FDR 165 (Figure 4a). Effect sizes in each dataset showed a strong correlation (r=0.80, Figure 4b). 166 This demonstrates that our findings are highly reproducible despite differences in sample 167 sizes and blood materials (plasma versus serum in subcohort A and B, respectively). 168 169 Proteins associated with COVID-19 severity 170 Examination of the principal components plot labelling samples by clinical severity at the 171 time of sampling (defined by WHO severity scores, graded as mild, moderate, severe or 172 critical) demonstrated a gradient of COVID-19 severity, best captured by principal 173 components 1 and 3 (Figure 2 figure supplement 1a). To determine the proteomic effects

5 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

174 of COVID-19 severity, we tested for associations between proteins and WHO severity score 175 at the time of blood sampling, using linear mixed models with severity encoded as an ordinal 176 predictor (Methods). This analysis revealed 203 proteins associated with severity (Figure 177 3b, Supplementary File 1d). The majority of these were upregulated in more severe 178 disease, with only 42 down-regulated. A sensitivity analysis adjusting for time since last 179 haemodialysis, made no significant impact on our results (Figure 3 figure supplement 4c- 180 d, Supplementary File 1d). Consistent with previous reports, we found that severe COVID- 181 19 was characterised by elevated IL6. In addition, we observed a signature of upregulated 182 monocyte chemokines (e.g. CCL2, CCL7, CXCL10), neutrophil activation and degranulation 183 (e.g. PRTN3, MPO) and epithelial injury (e.g. KRT19, AREG, PSIP1, GRN). (Figure 3b,c, 184 Figure 5). SERPINA5 and leptin showed the greatest downregulation as COVID-19 severity 185 increased (Figure 3b,c). 186 187 We next asked how does the COVID-19 severity protein signature relate to the proteins that 188 are differentially abundant between cases and controls? The majority (140/203; 69%) of 189 severity-associated proteins were also identified as differentially abundant in the COVID-19 190 positive versus negative analysis (Figure 6a). Log fold changes for proteins in COVID-19 191 versus non-infected patients were correlated with effect sizes in the severity analysis, such 192 that the proteins most upregulated in cases versus controls also tended to show the greatest 193 upregulation in severe disease (Figure 6b). However, there were some notable exceptions 194 (eg CCL20, IL17C, OSM) that were strongly associated with severity but not differentially 195 expressed in infected versus non-infected patients (Figure 6c). 196 197 Supervised learning to predict COVID-19 severity 198 Principal components analysis revealed that some samples from patients who had mild or 199 moderate disease at the time of sampling clustered with samples from patients with severe 200 disease (Figure 2 figure supplement 1a). Examination of the same PCA plot this time 201 labelling samples according to the patient’s overall clinical course (measured by peak WHO 202 severity score over the duration of the illness) (Figure 2 figure supplement 1b), revealed 203 that these samples came from individuals who subsequently developed severe or critical 204 disease. This suggested that molecular changes may predate clinical deterioration. To 205 evaluate this further, we used supervised learning approaches to test if the proteomic 206 signature of the first blood sample for each patient in our dataset could identify whether the 207 patient either had clinically severe COVID-19 at the time of sampling or would develop 208 severe disease in the future. Whereas differential expression analyses consider each protein 209 marker separately, machine learning techniques allow examination of all proteins 210 concurrently, thus capturing non-linear relationships in the dataset. Using Random Forests,

6 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

211 we trained a classifier on the first sample for each COVID-19 patient to predict the overall 212 clinical course, defined by peak WHO severity. For the purposes of this analysis, we 213 binarised clinical course into either WHO mild/moderate or severe/critical. 214 215 The Random Forests method achieved 71% accuracy in predicting peak severity. By 216 contrast, using only clinically available predictors (demographics, comorbidities and clinical 217 laboratory results), the Random Forests method achieved 66% accuracy in predicting peak 218 severity. Combining clinical parameters plus proteins did not improve accuracy (71%) 219 compared to using proteomic predictors alone, suggesting that the information contained in 220 the clinical predictors is captured at the proteomic level. While we do not believe that 221 proteomic profiling is likely to enter clinical practice for risk stratification during this 222 pandemic, the features selected by the classifier can highlight proteins of biological 223 importance. We therefore interrogated the model to identify key proteins by calculating 224 feature importance metrics (Methods, Supplementary File 1e). The most important 225 proteins for indicating the presence of current or future severe disease were IL18BP, CTSD 226 (Cathepsin D), GDF15, KRT19, TNFSF11 and IL1RL1 (ST2) (Figure 7a). It is notable that 227 through this distinct analytical approach, KRT19 again emerged as a key biomarker of 228 severe disease. 229 230 Proteins associated with risk of death 231 9/55 patients in subcohort A died. We therefore sought to identify proteins associated with 232 risk of death. To leverage the dynamic nature of repeated protein measurements for 233 prediction of death, we utilised joint models, which combine linear mixed models and Cox 234 proportional hazards models [10,11] (Methods). This analysis identified 44 proteins for 235 which increased concentration was associated with increased risk of death (Figure 7b, 236 Supplementary File 1f), including CST3, IL22RA1, AZU1, CCL28 and SPON1, and 25 237 proteins for which increased concentration was associated with reduced risk of death, 238 including CD84, TNFSF12, TANK, PRKCQ and ADM. 239 240 Associations with clinical laboratory tests 241 A number of routine clinical laboratory tests have well characterised associations with 242 COVID-19 (e.g. elevated inflammatory markers, d-dimer and reduced lymphocyte count) 243 [12]. We therefore compared our proteomic data from COVID-19 patients at each timepoint 244 to contemporaneous clinical laboratory measurements using linear mixed models 245 (Methods). We found associations between plasma proteins and all clinical laboratory 246 measurements except troponin (Figure 8, Supplementary File 1g). Many of these proteins 247 were also markers of severity (e.g. IL6, KRT19, IFN-gamma and CXCL10 were strongly

7 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

248 associated with raised CRP and ferritin and reduced lymphocyte counts). Of note CCL7, a 249 monocyte chemokine that was also identified as an important marker of severity by the 250 Random Forests classifier, was associated with lower monocyte count and raised 251 inflammatory markers. Elevated neutrophil count was associated with Oncostatin-M, which 252 regulates IL6, GCSF and GMCSF production, and with the proteases MMP9 and defensin. 253 254 Longitudinal analysis reveals proteins with distinct temporal profiles according to severity 255 The immune response to infection is dynamic, and therefore snapshot measurements 256 provide only partial insights. Leveraging the dense serial sampling in our dataset (Figure 1), 257 we modelled the temporal trajectory of each protein and asked whether or not any protein 258 trajectories differed in patients with a severe/critical versus mild/moderate overall clinical 259 course. This was achieved using linear mixed models that included a term for time from first 260 symptoms and a time x severity interaction term (Methods). 261 262 178 proteins displayed a significant association with time from first symptoms (5% FDR), 263 demonstrating the temporal variability in plasma proteins across the disease course 264 (Supplementary File 1h). Moreover, we identified 32 proteins for which there was 265 significant interaction between time and severity, i.e. proteins displaying differential temporal 266 trajectories between mild/moderate and severe/critical infections (Supplementary File 1h, 267 Figure 9). Among the proteins with the strongest temporal differences according to clinical 268 course were the integrins ITGA11 and ITGB6, the adhesion molecule ICAM1, TNFRSF10B 269 (a receptor for TRAIL) and PLAUR, the receptor for urokinase plasminogen activator. Most 270 of these proteins exhibited rising profiles in the more severe patients but flat profiles in milder 271 cases. ACE2, the receptor for SARS-CoV-2, also displayed this pattern (Figure 9). In 272 contrast, abundance of ITGA11, which was also identified as reduced in the analysis of 273 infected versus non-infected patients, fell over time in the severe group. 274 275 Testing for proteins associated with ethnicity 276 In the UK, individuals from ethnic minorities are at higher risk of severe disease and death 277 from COVID-19 [4]. We therefore examined whether any of the proteins we measured 278 exhibited differences across ethnicities, analysing COVID-19 positive cases and controls 279 separately (Methods). In COVID-19 negative ESKD patients, no proteins were significantly 280 associated with ethnicity in a multivariable model adjusting for age and sex. In COVID-19 281 positive ESKD patients, there is the potential for protein associations with ethnicity to be 282 confounded by disease severity. To account for this, we included severity as well as age and 283 sex as covariates. A single protein, LY75, was associated with ethnicity in this multivariable 284 model (nominal P 0.0001, Benjamini-Hochberg adjusted P 0.04, with higher levels in white

8 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

285 patients). Using the same within cases analysis strategy in subcohort B, we found no 286 proteins were significantly associated with ethnicity after multiple testing correction, although 287 the nominal P value for LY75 was 0.025. While these analyses failed to identify substantial 288 ethnicity-related variation in the proteins we measured, an important caveat is that there 289 were relatively modest numbers of individuals from each ethnic group, and so statistical 290 power was limited. Larger multi-ethnic studies are needed to adequately address this 291 question. 292 293 Comparisons to other proteomic studies in COVID-19 294 Other studies have used a variety of proteomic platforms to investigate COVID-19. We 295 compared our findings to those of 3 published studies [13–15] and a preprint by Filbin et al 296 [16]. Of the 221 proteins that were differentially abundant in our analysis of COVID-19 297 positive versus negative ESKD patients, 116 associations had been previously reported 298 (Supplementary File 1i). Of the 203 proteins associated with severity, 165 had previously 299 been reported (Supplementary File 1j). 300 301 We focussed in more detail on the study by Filbin et al [16] because of the large sample size 302 and the breadth of proteomic assay used. This study comprised 384 patients with acute 303 respiratory distress (306 COVID-19 positive and 79 COVID-19 negative), and measured 304 1,472 proteins using the Olink Explore platform. 417 of these were also measured in our 305 study. Of the 221 proteins differentially abundant in our case/control analysis, 210 were 306 measured in their study. Of these 100 (47.6%) were significant in their analysis of COVID-19 307 positive versus negative respiratory distress. In addition, we observed strong correlation 308 (r=0.69) between the estimated log fold changes in our and their studies (Figure 4 figure 309 supplement 1). Of the 203 proteins associated with severity in our study, 192 were 310 measured in their study. 157 of these were significantly associated with severity, giving a 311 concordance of 81.8%. Thus, despite the differences in study design and clinical 312 populations, we observed notable similarities in our results and those reported by Filbin et al 313 [16]. 314

315 Discussion 316 317 In this study we performed plasma proteomic profiling of haemodialysis patients with COVID- 318 19. A strength of our study was that we were able to perform serial blood sampling in both 319 the outpatient and inpatient setting, including longitudinal samples from the same individual 320 before and after hospitalisation. This was possible because haemodialysis patients are 321 unable to fully isolate as they must continue to attend for regular dialysis sessions.

9 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

322 Moreover, haemodialysis patients represent an important group since ESKD is one of the 323 strongest risk factors for death from COVID-19 [4,6–8]. Data from the UK Renal Registry 324 shows that 7- and 14-day mortality for COVID-19 infected in-centre haemodialysis patients 325 was 11% and 19%, respectively [17]. Data from the Scottish Renal Registry estimates 30- 326 day mortality following a positive COVID-19 test as 22%, and as of 31 May 2020, 28.2% of 327 renal replacement therapy patients who had a positive COVID-19 test had died [18]. In our 328 local population of 1,352 in-centre haemodialysis patients, 315 patients had tested positive 329 for COVID-19 by the end of our study period (31 May 2020), of whom 53% required 330 hospitalisation and 85 (27%) died. The OpenSAFELY study [4] examined ~17 million UK 331 primary care records and linked these to the UK COVID-19 mortality register. Patients with 332 estimated glomerular filtration rate (eGFR) <30ml/min/1.73m2 had a hazard ratio (HR) for 333 death of 3.56 after adjustment for age and sex. 334 335 In part, the high mortality from COVID-19 in ESKD patients likely reflects the fact that these 336 patients are enriched for cardiometabolic traits that predispose to severe COVID-19. 337 However, in multivariable analyses adjusting for these factors, impaired renal function 338 remains an independent risk factor for severe COVID-19 [4]. Moreover, there is an inverse 339 relationship between renal function and risk of death from COVID-19 across the spectrum of 340 chronic kidney disease. These observations support the notion that the state of ESKD per se 341 is an important determinant of outcome in COVID-19. ESKD is well recognised as an 342 immunosuppressed state [19–21], with defects in both innate and adaptive immunity [22– 343 25]. Accordingly, ESKD confers increased vulnerability to viral infections including influenza 344 and respiratory syncytial virus [26–29]. In addition, ESKD results in a chronic low-grade 345 inflammatory state [30]. This tendency to a pro-inflammatory state, combined with reduced 346 ability to respond to viruses, may contribute to the abnormal host response to SARS-CoV-2 347 infection, producing the immunopathology that leads to severe COVID-19. 348 349 Our comparison of COVID-19 positive and negative haemodialysis patient plasma samples 350 revealed 221 proteins that were differentially abundant in COVID-19. The majority of these 351 were upregulated, with strong representation of viral response proteins (e.g. DDX58, IFNG), 352 /chemokines (e.g. IL6, CCL7, CXCL10 and CXCL11) and epithelial proteins (e.g. 353 KRT19, PSIP1) (Figure 3a). The COVID-19 negative controls in this analysis were carefully 354 matched to cases in terms of age, sex and ethnicity. However, complete matching of clinical 355 characteristics was not feasible; there were differences in the prevalence of diabetes and the 356 underlying causes of ESKD between COVID-19 positive cases and controls (Table 1). 357 Sensitivity analyses adjusting for these covariates gave highly consistent results, indicating

10 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

358 that our findings are robust. In addition, we validated our findings when we analysed serum 359 samples from a separate subcohort of COVID-19 positive ESKD patients. 360 361 ESKD is itself likely to significantly impact the plasma proteome. Previous cross-sectional 362 studies have shown that the levels of many circulating proteins have an inverse relationship 363 with eGFR [31,32]. A longitudinal study using an Olink proteomics panel (although not one 364 used in our study) found that for 74% of the 84 proteins measured, protein levels rose as 365 eGFR fell [33]. For many proteins, it is unclear whether this inverse relationship with renal 366 function reflects cause or effect. Some proteins may be increased in chronic kidney disease 367 due to reduced renal clearance, some may be elevated secondary to tissue injury or chronic 368 inflammation, and others may be drivers of renal injury. Regardless, this observation of 369 widespread changes in the blood proteome of kidney disease patients emphasises the 370 importance of using COVID-19 ESKD patients rather than healthy individuals as our control 371 group. 372 373 Analysis within COVID-19 cases revealed 203 proteins associated with disease severity, the 374 strongest of which was IL6 (Figure 3b). Association of IL6 with severe disease is well- 375 established and has already received considerable attention [34,35]. Despite promising initial 376 case reports of IL6R receptor blockade in COVID-19, convincing efficacy was not 377 demonstrated in early randomised trials [36]. More recently, the REMAP-CAP trial has 378 shown the benefit of anti-IL6R therapy when given to critically ill patients on admission to 379 intensive care units [2], indicating that IL6 does contribute to critical illness from COVID-19. 380 Our finding that IL6 was most strongly upregulated in severe disease demonstrates the 381 value of plasma proteomic profiling in identifying putative drug targets. 382 383 Members of the CCL and CXCL chemokine families (e.g. CCL2, CCL7, CCL20 and 384 CXCL10) were strongly associated with severity. Likewise, higher levels of CCL2, CCL7, 385 CCL20 and CXCL10 were associated with lower blood lymphocyte count and higher 386 inflammatory markers (CRP and ferritin) (Supplementary File 1g), which are clinical 387 markers of severe disease and poorer outcome in COVID-19 [37]. Of note, CCL20 is a 388 chemoattractant for lymphocytes [38], and its negative association with lymphocyte count 389 may reflect a direct effect on migration of lymphocytes from the blood into the tissues rather 390 than simply marking severe disease. CCL2 (also known as MCP-1) and CCL7 (MCP-3), are 391 both chemokines for monocytes, and CXCL10 has pleiotropic immunological effects 392 including chemotaxis. These chemokines were also negatively correlated with blood 393 monocyte count, suggesting recruitment of these innate immune cells into damaged tissues. 394

11 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

395 The neutrophil proteases PRTN3 (proteinase-3) and MPO (myeloperoxidase) (Figure 5) and 396 the neutrophil-derived protein AZU1 were associated with severe disease (Supplementary 397 File 1d), indicating that neutrophil activation and degranulation are features of severe 398 COVID-19. Degranulation of neutrophils releasing PRTN3 and MPO could potentially 399 contribute to oxidative damage in the lungs and thus more severe disease. 400 401 A striking finding of our study was the association of disease severity with upregulation of 402 epithelial proteins (e.g. KRT19) and epithelial tissue repair pathways (e.g. PSIP1, AREG, 403 GRN (progranulin)), most likely reflecting lung and vascular damage. KRT19 was notably 404 prominent in our analyses, as well as the study by Filbin et al. [16](Supplementary File 1j). 405 KRT19 is an intermediate filament protein, important for the structural integrity of epithelial 406 cells [39]. These data suggest that severe COVID-19 is characterised by destruction of the 407 lung epithelium and vascular endothelium. Vascular injury might thus explain the high level 408 of vascular thrombosis seen in patients in severe disease. In summary, our data reveal that 409 severe COVID-19 is characterised proteomically by a signature of innate immune activation 410 and epithelial injury. 411 412 69% of proteins associated with severity were also differentially abundant in the case versus 413 control analysis (Figure 6a), and for the large majority of proteins the within-cases severity 414 analysis effect size was proportional to the fold change between cases and controls (Figure 415 6b). This suggests that, in general, the distinction in the plasma proteome between severe 416 and mild COVID-19 is a quantitative difference in the COVID-19 signature, rather than there 417 being an orthogonal signature involving a different set of proteins. Consistent with this 418 concept, examination of PCA plots coloured by severity revealed that while there was a 419 gradient of COVID-19 severity, the samples from severe or critical patients did not form a 420 discrete cluster distinct from those from patients with milder disease (Figure 2 figure 421 supplement 1). However, there were a few exceptions where proteins that were associated 422 with severity were not upregulated in the case-control analysis. These included OSM, IL17C, 423 and CCL20 (Figure 6c). These proteins therefore reflect biological processes specifically of 424 severe disease and may represent therapeutic targets. 425 426 Survival analysis identified 44 proteins associated with increased risk of death (Figure 7b). 427 As expected, many of these were also associated with disease severity, high CRP and lower 428 lymphocyte count (Figure 7 figure supplement 1). In contrast, 25 proteins were associated 429 with reduced risk of death (Figure 7b). One such protein is the multi-functional 430 TNFSF12 (TWEAK). Although TWEAK can exert pro-inflammatory effects, it also can inhibit 431 the innate immune response [40] and promote tissue repair and endothelial cell proliferation

12 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

432 and survival [41], which may be beneficial responses in COVID-19. This illustrates that 433 although proteins associated with inflammation are often thought to be destructive, the 434 inflammatory response also induces a programme for limiting injury and initiating tissue 435 repair. Insufficient activation of such homeostatic mechanisms may contribute to why some 436 individuals get severe COVID-19. 437 438 The host immune response to COVID-19 is a dynamic process, and clinical deterioration 439 typically occurs 7-10 days after first symptoms. Temporal information may therefore be 440 important in determining optimum timing of therapeutic intervention (e.g. blockade of a 441 particular cytokine). By taking serial samples and examining their patterns within individuals 442 over time, we were able to model protein trajectories and found that many proteins display 443 temporal variability during COVID-19. Longitudinal measurements also allow molecular 444 comparison of severe versus mild disease trajectories. By modelling the interaction term 445 between time from first symptoms and overall disease course, we found 32 proteins that 446 displayed distinct temporal profiles in severe vs mild disease. These results point to 447 enhanced leucocyte-endothelial cell interactions indicated by upregulation of cell adhesion 448 molecules (e.g. ITGB6, ICAM1) in severe disease. This endothelial activation may contribute 449 to COVID-19-associated thrombosis discussed above. Management of thrombosis in 450 COVID-19 currently consists of anticoagulation. Our results suggest that disrupting 451 leucocyte-endothelial interactions may be a complementary therapeutic strategy. 452 453 Several proteins associated with either risk of death or clinical severity lie in pathways 454 targeted by existing drugs. PARP1 was identified as an important marker of current or future 455 severe COVID-19, and also was associated with risk of death. PARP1 is associated with 456 inflammatory and vascular disease [42]. PARP1 inhibitors are in use for cancer [43], and our 457 data suggest that re-purposing of PARP1 inhibition in COVID-19 should be explored further. 458 IL33 was associated with both risk of death and clinical severity, and its receptor IL1RL1 459 (ST2) was associated with clinical severity and identified as an important predictor of severe 460 clinical course. Monoclonal antibodies against IL33 and its receptor are in late-stage 461 development for asthma [44], and could also be explored in COVID-19. As discussed above, 462 MPO was associated with clinical severity. MPO inhibitors [45] might have a role in reducing 463 neutrophil-mediated tissue injury in COVID-19. Finally, inhibitors of monocyte chemokines 464 (e.g. CCL2) and their receptors have been developed [46,47], although drugging these 465 pathways is made more challenging by molecular cross-talk. An important caveat is that we 466 cannot determine whether the associations we observed are drivers of pathology in COVID- 467 19 or simply reflect the downstream consequences of inflammation and tissue injury. Future

13 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

468 studies using Mendelian randomisation analysis will provide a useful tool for assessing 469 causality and prioritizing drug targets. 470 471 Other groups have studied the plasma or serum proteome in COVID-19 [13–16,48], using 472 either mass spectrometry or immunoassays including the Olink platform. Mass spectrometry 473 is less sensitive than immunoassays and so it likely to be unable to detect many of the 474 cytokines measured here. Conversely, it can provide complementary information by 475 measuring many proteins that our immunoassays did not target. A limitation of our study was 476 that we used Olink panels that measured specific proteins selected on their relevance to 477 inflammation, immunity, cardiovascular and metabolic disease. This bias precluded formal 478 pathway enrichment analysis. In general, our results had greater similarities to studies that 479 used immunoassays over mass spectrometry (Supplementary File 1i-j). 47.6% of proteins 480 differentially expressed in COVID-19 positive versus negative ESKD patients in our study 481 were differentially expressed in COVID-19 positive versus negative acute respiratory distress 482 syndrome patients in the study of Filbin et al [16], who used a different Olink proteomics 483 platform. Moreover, we observed consistent effect sizes (Figure 4 figure supplement 1). 484 These similarities are striking given the difference in clinical populations and control groups; 485 in Filbin et al’s report the controls included patients with non-COVID-19 respiratory infections 486 whereas our control group did not have active infection. The concordance in proteins 487 associated with COVID-19 severity within cases was even higher (81.8%). The similarities 488 suggests a similar plasma proteomic signature of COVID-19 across different clinical 489 populations, particularly the signature associated with severity. 490 491 In summary, this study reveals proteins associated with COVID-19 infection and severity, 492 and demonstrates altered dynamic profiles between patients with severe disease and those 493 with a more indolent course. Our results emphasise the importance of studying and targeting 494 mechanisms that reduce the lung epithelial and endothelial damage to both alleviate the 495 severity of the infection and to reduce the chance of long-lasting complications. These data 496 provide a valuable resource for therapeutic target prioritisation. 497 498 Materials and Methods 499 500 Subjects and samples 501 Ethical approval: All participants (patients and controls) were recruited from the Imperial 502 College Renal and Transplant Centre and its satellite dialysis units, London, and provided 503 written informed consent prior to participation. Study ethics were reviewed by the UK 504 National Health Service (NHS) Health Research Authority (HRA) and Health and Care

14 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

505 Research Wales (HCRW) Research Ethics Committee (reference 20/WA/0123: The impact 506 of COVID-19 on patients with renal disease and immunosuppressed patients). Ethical 507 approval was given. 508 509 Subcohort A: We recruited 55 COVID-19 positive haemodialysis patients, either as 510 outpatients or as inpatients (Table 1). All patients were receiving in-centre outpatient 511 haemodialysis prior to COVID-19 diagnosis. COVID-19 was confirmed in all cases with 512 positive nasal PCR for the SARS-CoV-2 virus. Patients were recruited during the first UK 513 national lockdown, with recruitment from 8th April – 30th May 2020. Blood was collected in 514 EDTA tubes and centrifuged to obtain plasma, and stored at –80oC. Sample processing was 515 performed within 4 hours of venepuncture. The initial sample was taken as an outpatient for 516 30 patients and as an inpatient for 25. Where feasible, serial blood samples were taken. In 517 total, 259 samples were taken (3 subsequently failed QC – see below). The median number 518 of serial samples was 5 (range 1-10) (Figure 1 figure supplement 1d). 8 patients who were 519 recruited as outpatients were subsequently admitted to hospital with COVID-19 over the 520 course of the study. 27 of 55 (49.1%) patients had severe or critical disease (defined by 521 peak WHO severity). 9 (16.4%) patients died. 522 523 In addition, we recruited 51 COVID-19 negative haemodialysis controls. COVID-19 negative 524 haemodialysis controls were selected to mirror the cases in terms of demographic features 525 (age, sex, ethnicity) (Figure 1 figure supplement 1a-c). These control patients had no 526 clinical features of any other infection. 527 528 Subcohort B: We also recruited a separate set of 46 COVID-19 positive haemodialysis 529 patients (Table 2). These patients were recruited from the same centre, but slightly earlier 530 than subcohort A (recruitment commenced on 30th March 2020). For these patients, blood 531 was collected in serum tubes and centrifuged to obtain serum. At this time, we had very 532 limited access to laboratory facilities and so plasma was not collected from these patients. 5 533 were outpatients and 41 were inpatients, reflecting the fact that UK policy was weighted 534 towards inpatient testing at the time these patients were recruited. 33 of 46 patients (71.7%) 535 had severe or critical disease (by peak WHO severity), and 9 (19.6%) patients died. For 40 536 patients, only one sample from a single timepoint was collected, and for 6 patients, 2 537 samples were collected. To provide controls for subcohort B, we used serum samples from 538 11 non-infected haemodialysis patients (collected at the same time as plasma from a subset 539 of the ESKD control group described above). 540 541 Clinical severity scores

15 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

542 Severity scoring was performed based on WHO classifications (WHO clinical management 543 of COVID-19: Interim guidance 27 May 2020) adapted for clinical data available from 544 electronic medical records. ‘Mild’ was defined as COVID-19 symptoms but no evidence of 545 pneumonia and no hypoxia. ‘Moderate’ was defined as symptoms of pneumonia or hypoxia 546 with oxygen saturation (SaO2) greater than 92% on air, or an oxygen requirement no greater 547 than 4L/min. ‘Severe’ was defined as SaO2 less than 92% on air, or respiratory rate more 548 than 30 per minute, or oxygen requirement more than 4L/min. ‘Critical’ was defined as organ 549 dysfunction or shock or need for high dependency or intensive care support (i.e. the need for 550 non-invasive ventilation or intubation). Severity scores were charted throughout a patient’s 551 illness. We defined the overall severity/clinical course for each patient as the peak severity 552 score that occurred during the patient’s illness. 553 554 Proteomic assays 555 Plasma and serum proteomic measurements were performed using Olink proximity 556 extension immunoassays (https://www.olink.com/products/). Five 92-protein multiplex Olink 557 panels were run (‘inflammation’, ‘immune response’, ‘cardiometabolic’, ‘cardiovascular 2’ and 558 ‘cardiovascular 3’), resulting in 460 measurements per sample. Since a small number of 559 proteins were measured on more than one panel, we measured a total of 436 unique 560 proteins. The Olink assays were run using 88 samples/plate. All plates were run in a single 561 batch. Plate layouts was carefully designed to avoid confounding of potential plate effects 562 with biological or clinical variables of interest. To achieve this, we used an experimental 563 design that combined ensuring case/control balance across plates with random selection of 564 samples from each category and random ordering of allocation to wells. This is outlined in 565 more detail as follows. We ensured that each plate contained a mixture of control and case 566 samples. Specifically, a fixed proportion of each plate was designated for control samples. 567 The allocation of specific control samples to each plate was performed using randomisation. 568 For the case samples, we again used randomisation for plate assignment, with the constraint 569 that once one sample from a given patient was allocated to a plate, all other longitudinal 570 samples from that patient were assigned to same plate. Finally, once all the samples had 571 been allocated to plates, the layout of samples within each plate was determined through a 572 further randomisation step for well allocation. 573 574 Protein annotation 575 We used the Human Protein Atlas version 20.0 [49] for protein annotation (Figure 1 figure 576 supplement 2). We performed enrichment analysis of the 436 proteins that we measured 577 using string-db [50]. 578

16 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

579 Normalisation and quality assessment and control 580 The data was normalised using standard Olink workflows to produce relative protein 581 abundance on a log2 scale (‘NPX’). Quality assessment was performed by a) examination of 582 Olink internal controls, and b) inspection of boxplots, relative log expression (RLE) plots [51] 583 and PCA. Following these steps, 3 poor quality samples were removed. In addition, 5 584 samples failed quality control on a single proteomic panel only, with the remaining panels 585 passing QC. For these samples, proteins on the panel that failed QC were set to missing, 586 and the data for the remaining proteins was retained. 587 588 Principal components analysis revealed no substantial impact of plate effects (Figure 2 589 figure supplement 2). 13 proteins were assayed more than once due to their inclusion in 590 multiple Olink panels. For plasma, the median correlation between the assays was 0.986 591 with an IQR of 0.974-0.993 and a range of 0.925 to 0.998. For serum, the median correlation 592 between the assays was 0.991 with an IQR of 0.952-0.995 and a range of 0.737-0.999. We 593 removed duplicate assays at random prior to subsequent analyses. 594 595 For 11 ESKD controls, we had contemporaneous plasma and serum samples. To assess the 596 comparability of these two matrices, we calculated the Pearson’s correlation coefficient 597 between the assays for each protein (Supplementary File 1k). 344/436 (78.9%) of proteins 598 had a Pearson’s r > 0.5. We also report the variance of each protein in plasma and serum 599 since low correlation may reflect low variance. The proteins with the lowest estimated 600 Pearson correlation coefficient were AZU1, STK4 and TANK. We highlight that this 601 comparison had small sample size (only 11 samples) and that the samples were from control 602 patients without infection. Caution should be made in extrapolating these findings to the 603 context of active infection where protein dynamic ranges may be different. 604 605 Missing values 606 Following QC, 0.22% data points were missing for the plasma dataset and 0.35% for the 607 serum dataset. For analyses that required no missing values (PCA and supervised learning), 608 we imputed missing values as follows. The dataset was first scaled and centred, and missing 609 values imputed using caret’s k-nearest neighbours (kNN) method [52]. The 5 closest 610 samples (by Euclidean distance) were used to estimate each missing value. 611 612 Principal Components Analysis 613 Singular value decomposition was used to perform PCA on the proteomic data from 614 subcohort A (plasma samples). We then used the loadings from subcohort A together with

17 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

615 the proteomic data from subcohort B to calculate principal components scores. This enabled 616 projection of subcohort B data into the PCA space of subcohort A. 617 618 Differential protein abundance analysis: COVID-19 positive versus negative 619 Differential protein abundance analyses between COVID-19 positive and negative samples 620 were performed using linear mixed models, to account for the use of serial samples from the 621 same individuals (R lme4 package [53]). This analysis compared 256 samples from 55 622 COVID-19 patients with 51 non-infected patients (1 sample per non-infected patients). Age, 623 sex and ethnicity were included as covariates. We used a random intercept term to estimate 624 the variability between individuals in the study and account for repeated measures. The 625 regression model in R notation was:

626 NPX ~ covid_status + sex + age + ethnicity + (1 | individual)

627 where NPX represents the protein abundance and covid_status was a categorical variable 628 (infected/non-infected). Sex and ethnicity were also categorical variables. Age was a 629 quantitative variable. We calculated P values using a type 3 F test in conjunction with 630 Satterthwaite’s method for estimating the degrees of freedom for fixed effects [54]. The 631 regression model was fitted for each of the 436 proteins individually. Multiple testing 632 correction was performed using the Benjamini-Hochberg method and a 5% FDR used for the 633 significance threshold. 634 635 The same approach was used for subcohort B. This analysis comprised 52 serum samples 636 from 46 COVID-19 positive patients versus 11 samples from non-infected patient samples (1 637 sample per non-infected patient). 638 639 As sensitivity analyses, we repeated the differential abundance analyses between case and 640 controls for the subcohort A adjusting for additional covariates and comparing this to the 641 basic model (i.e. using age, sex, and ethnicity alone). This was performed for each of the 642 following parameters: diabetes status, cause of ESKD, and time to last haemodialysis. 643 644 Testing for associations between proteins and clinical severity 645 For testing the association of plasma proteins with the 4-level WHO severity rating (mild, 646 moderate, severe and critical) within COVID-19 positive cases from subcohort A (n=256 647 samples from 55 patients), we used a similar linear mixed modelling approach to the COVID- 648 19 positive versus negative differential abundance analysis; for this analysis, the 649 covid_status term was replaced by a severity variable encoded using orthogonal polynomial 650 contrasts to account for ordinal nature of severity levels. As before, age, sex and ethnicity

18 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

651 were included as covariates. As a sensitivity analysis, we repeated the analysis with time to 652 last haemodialysis (days) as an additional covariate. 653 654 Testing for associations between proteins and clinical laboratory tests 655 The linear mixed modelling strategy was also employed for testing association of temporal 656 clinical laboratory variables and protein levels, with the value of the clinical variable (as a 657 quantitative trait) used in place of covid_status. Only COVID-19 positive patients were 658 included in this analysis. Contemporaneous lab measurements were not available for all 659 samples. This varied according to the clinical lab parameter. Some (eg troponin, d-dimer) 660 were measured less frequently than full blood count and CRP. Details of the proportion of 661 missing values for each lab parameter are included in (Supplementary File 1g). We also 662 calculated correlations between clinical laboratory variables and protein levels using the R 663 package rmcorr, which determines the overall within-individual relationship among paired 664 measures that have been taken on two or more occasion [55]. 665 666 Testing for associations between proteins and ethnicity 667 We performed testing of protein levels and ethnicity separately in COVID-19 negative ESKD 668 patients and COVID-19 positive ESKD patients. These analyses were limited to individuals 669 who were White, South Asian (Indian, Pakistani or Bangladeshi ancestry) or Black as there 670 were too few individuals from other ethnic groups for meaningful interpretation. For COVID- 671 19 negative patients (1 sample per patient), we performed linear regression for each protein 672 with ethnicity as the predictor variable, and age and sex as covariates. For COVID-19 673 positive patients we used a linear mixed model to account for serial samples from the same 674 individual, again with age, and sex as covariates. 675 676 Multiple testing correction 677 We used the Benjamini-Hochberg method to control the FDR at 5% for all statistical 678 analyses. 679 680 Alternative estimation of the FDR using the plug-in method 681 To provide additional support that the Benjamini-Hochberg procedure was providing 682 adequate control of the FDR, we also used the plug-in method of Tibshirani [9] as an 683 alternative method to estimate the FDR, as described below. 684 1. We defined R as the number of associations declared significant in the real data. 685 2. We defined C as the test statistic used as the significance threshold used in the real data 686 (i.e. that corresponding to an adjusted p-value of 0.05).

19 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

687 3. The expected number of proteins that we would find significant under the null hypothesis 688 that no proteins are differentially abundant between COVID-19 positive versus negative 689 patients (i.e. false positives) was estimated using a permutation strategy. We randomly 690 permuted each individuals’ COVID-19 status label 100,000 times and, in each case, 691 repeated the differential abundance analysis on the permuted data. The estimated the 692 number of false positives (͐Ȩ) was then estimated by the number of associations with test 693 statistic > C in 100,000 permutations of the data, divided by the number of permutations. 694 4. The estimated FDR was then calculated as ͐Ȩ/R. 695 696 We implemented a similar approach for the testing the association of proteins with severity 697 scores within cases. 698 699 Using this method, the estimated FDR for the case versus control analysis was 0.062 and for 700 the severity analysis 0.057, indicating that we had appropriately controlled the FDR. 701 702 Empirical p-value calculation 703 As a complementary analysis, based on the approach of Filbin et al. [16], we estimated the 704 empirical P-value for the likelihood of observing as many significant proteins as we identified 705 in the real data if the null hypothesis of no differentially abundant proteins in cases versus 706 controls were true. We again used 100,000 permutations of the case control labels to 707 estimate the null distribution. We performed Benjamini-Hochberg adjustment on the nominal 708 p-values of each permutation, and counted the number of proteins that were significant 709 (adjusted p-value <0.05) in each permutation. 710 The distribution of the number of proteins declared significant is shown in Figure 3 figure 711 supplement 2a; on no occasion in 100,000 permutations did we observe more proteins 712 declared significant than in the real data. We can thus state that the empirical P-value (the 713 fraction of permutation runs where we observed ≥ the number of associations in the real 714 data) is less than 1/100,000 = 1x10-5. 715 716 We also applied this method to the association testing of proteins with severity scores within 717 cases (Figure 3 figure supplement 2b). Again, on no occasion in 100,000 permutations did 718 we observe more proteins declared significant than in the real data (empirical p-value <1x10- 719 5). 720 721 Supervised learning

20 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

722 Random forest models were fit using R’s randomForest and caret packages [52,56]. Data 723 was centred, scaled and imputed as in Data Preparation with the caveat that, during cross- 724 validation, the pre-processing procedure was first applied on the resampled (training) data 725 before the same method was applied without re-calculation to the holdout (test) set. To 726 estimate model accuracy, we used 4-fold cross-validation. The cross-validation procedure 727 was repeated 100 times. The model’s parameters were kept constant at 500 trees and an 728 mtry value (number of proteins randomly sampled as candidates at each node) calculated as 729 the square root of the number of features. After parameter estimation, we fit a final model 730 trained using the entirety of the dataset. This model was used for subsequent feature 731 extraction. Random forest feature extraction was carried out using the R 732 randomForestExplainer package. We made use of the following importance measures: 733 accuracy decrease (the average decrease in prediction accuracy upon swapping out a 734 feature), number of trees (the number of trees with a node corresponding to a feature) and 735 mean minimal depth (the average depth at which a node corresponding to a feature occurs). 736 Three models were generated with different input features: i) proteomic data alone; ii) clinical 737 parameters alone; iii) proteomic data and clinical parameters. Clinical parameters included 738 sex, age, ethnicity, cause of ESKD, comorbidities, smoking status, radiological evidence of 739 pulmonary infiltrates, and clinical laboratory tests. 740 741 Survival analysis using joint modelling 742 Following scaling and centering, we fit linear mixed models for each protein to capture the 743 temporal trajectories of each individual. A polynomial spline of degree 2 was used to model 744 protein concentration with respect to time (from symptom onset, measured in days); the 745 spline was fitted for samples that were taken between 1 and 28 days from first symptoms, 746 inclusive. Proteomic data after that point was censored. We estimated both random 747 intercepts and random slopes for each individual, as per the following R formula notation:

748 NPX ~ time + (time | individual)

749 These were joined to a Cox regression model using the jointModel package [11] in order to 750 estimate the association of each protein with risk for death. P values were calculated using a 751 Wald test for the association between the linear mixed model and Cox regression. 752 Benjamini-Hochberg adjustment was applied, with an adjusted p-value of 0.05 used as the 753 significance threshold. 754 755 Longitudinal analysis 756 We also used linear mixed models to estimate the temporal profile of each protein. For this 757 longitudinal analysis we explicitly modelled the time from first symptoms. We set up the

21 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

758 model to test for each protein a) whether the protein significantly change over time and b) 759 whether the protein changes over time differently in individuals with a mild versus severe 760 disease course. The latter was performed statistically by testing for an interaction effect 761 between time and clinical course. For the purposes of this analysis, we binarised patients 762 into severe or non-severe clinical course according to the peak WHO severity disease of 763 their illness. Patients with a peak WHO score of mild or moderate were considered non- 764 severe and those with a peak score of severe or critical were considered severe. 765 766 We then used R’s bs function to fit a polynomial spline of degree 2 to model protein 767 concentration with respect time (from symptom onset, measured in days) [57]. The spline 768 was fit for samples that were taken between 1 and 21 days from symptom onset, inclusive. 769 We estimated random slopes with respect to time, in addition to random intercepts, to 770 account for each individual’s unique disease course. For each protein, we fitted the following 771 model (R notation): 772 NPX ~ time * severity + sex + age + ethnicity + (time | individual) 773 To identify proteins that changed significantly over time, we examined the P-values for the 774 main effect of time. To identify proteins with distinct temporal profiles between severe and 775 non-severe cases, we examined the P values for the time x severity interaction term. For 776 each of these two research questions, P-values were adjusted for the multiple proteins 777 tested using the Benjamini-Hochberg method and 5% FDR used as the significance 778 threshold. 779 780 Data availability statement 781 All data generated during this study are included in the manuscript and supporting files. 782 Underlying source data for all analyses (individual-level proteomic and clinical phenotyping 783 data) are available without restriction as Source Data Files 1-4. In addition, these data have 784 been deposited in the Dryad Digital Repository (doi:10.5061/dryad.6t1g1jwxj). 785 786 Code availability: code is available in the following GitHub repository: 787 https://github.com/jackgisby/longitudinal_olink_proteomics 788 789 Author contributions: 790 Conceived and designed the study: MCP, MB, MW, DCT, JEP. 791 Funding acquisition: JB, MCP, MB, MW, DCT, JEP. 792 Patient recruitment and sample collection: CLC, NM-T, ES, SPM, MFP, MW, DCT. 793 Led and coordinated patient recruitment: MW. 794 Sample processing: THM, PMM, NBB, SL, MP, FT, EF, M-A M, ED, LT, MB. 795 Clinical phenotyping: CLC, NM-T, SL, ES, MFP, MCP, JEP. 796 Data analysis: JG, AP. 797 Supervised the analysis: JEP. 798 Statistical support: PK, ACR. 799 Wrote the paper: JG, JEP

22 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

800 Results interpretation and editing the paper: MB, MCP, DT, ACR, JB. 801 All authors critically reviewed and approved the manuscript before submission. 802 803 Acknowledgements 804 The authors thank the patients who volunteered for this study and the staff at Imperial 805 College Healthcare NHS Trust (the Imperial College Healthcare NHS Trust renal COVID-19 806 group and dialysis staff): 807 808 Appelbe M, Ashby DR, Brown EA, Cairns T, Charif R, Condon M, Corbett RW, Duncan N, 809 Edwards C, Frankel A, Griffith M, Harris S, Hill P, Kousios A, Levy JB, Loucaidou M, 810 Lightstone L, Liu L, Lucisano G, Lynch K, Mclean A, Moabi D, Muthusamy A, Nevin M, 811 Palmer A, Parsons D, Prout V, Salisbury E, Smith C, Tam F, Tanna A, Tansey K, Tomlinson 812 J, Webster P. 813 814 We also acknowledge the efforts of renal specialist doctors in training for assistance with 815 recruiting patients to this study. 816 817 We also thank: 818 - Hari and Rachna Murgai and Milan and Rishi Khosla for generous support with sample 819 transport. 820 - Dr Kerry Rostron for exceptional support with laboratory facilities in challenging 821 circumstances and the Department administrators for their help. 822 - Dr Brian Tom and Dr Jessica Barrett (MRC Biostatistics Unit, University of Cambridge) for 823 statistical advice. 824 - Prof Sir John Savill (Melbourne Academic Centre for Health) for comments on the 825 manuscript. 826 827 Funding statement 828 This research was partly funded by Community Jameel and the Imperial President’s 829 Excellence Fund and by a UKRI-DHSC COVID-19 Rapid Response Rolling Call 830 (MR/V027638/1) (to JEP). We also acknowledge a contribution from UKRI/NIHR through the 831 UK Coronavirus Immunology Consortium (UK-CIC) and the National Institute for Health 832 Research (NIHR) Biomedical Research Centre based at Imperial College Healthcare NHS 833 Trust and Imperial College London. The views expressed are those of the author(s) and not 834 necessarily those of the NHS, the NIHR or the Department of Health. JEP is supported by 835 UKRI Innovation Fellowship at Health Data Research UK (MR/S004068/2). DCT is 836 supported by a Stage 2 Wellcome-Beit Prize Clinical Research Career Development 837 Fellowship (20661206617/A/17/Z and 206617/A/17/A) and the Sidharth Burman endowment. 838 MCP is a Wellcome Trust Senior Fellow in Clinical Science (212252/Z/18/Z). NM-T and ES 839 are supported by Wellcome Trust and Imperial College London Research Fellowships, and 840 CLC by an Auchi Clinical Research Fellowship. PDWK is supported by the UK Medical 841 Research Council (MC_UU_00002/13). 842 843 The funders had no role in study design, data collection and interpretation, or the decision to 844 submit the work for publication. 845 846 Competing interests: 847 None of the authors have any patents (planned, pending or issued) relevant to this work. Dr. 848 McAdoo reports personal fees from Celltrion, Rigel, GSK and Cello, unrelated to the 849 submitted work. Dr Peters has received travel and accommodation expenses and hospitality

23 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

850 from Olink to speak at Olink-sponsored academic meetings. None of the other authors have 851 any competing interests. 852 853

24 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

854 Figure Legends 855 856 Figure 1 – Study design. 857 a) Schematic representing a summary of the patient cohorts, sampling and the major 858 analyses. Blue and red stick figures represent outpatients and hospitalised patients, 859 respectively. b) Timing of serial blood sampling in relation to clinical course of COVID-19 860 (subcohort A). Black asterisks indicate when samples were obtained. Three patients were 861 already in hospital prior to COVID-19 diagnosis (indicated by red bars). 862 863 Figure 1 figure supplement 1 – Baseline characteristics of subcohort A. 864 The number of COVID-19 positive and negative patients in subcohort A (plasma), stratified 865 by: a) sex, b) age, and c) ethnicity. d) Serial samples obtained for COVID-19 patients. 866 867 Figure 2 – Principal components analysis. 868 PC = principal component. Each point represents a sample. Colouring indicates COVID-19 869 status. The directions and relative sizes of the 6 largest PC loadings are plotted as arrows 870 (middle column). 871 a-b) Subcohort A. Due to serial sampling, there are multiple samples for most patients. The 872 proportion of variance explained in subcohort A by each PC is shown in parentheses on the 873 axis labels. c-d) Subcohort B. Samples are projected into the PCA coordinates from 874 subcohort A. 875 876 Figure 2 figure supplement 1 – Principal components analysis in relation to clinical 877 severity. 878 a) Colouring indicates WHO severity at time of sampling. b) Colouring indicates overall 879 clinical course (indicated by peak WHO severity) for the patient from which that sample was 880 taken. 881 882 Figure 2 figure supplement 2 – Principal components analysis in relation to assay 883 plate. 884 Principal components analysis of the subcohort A coloured by plate. 885 886 Figure 3 – Identification of dysregulated proteins. 887 a) Proteins up-regulated (red) or down-regulated (blue) in COVID-19 positive patients 888 versus negative ESKD patients. n= 256 plasma samples from 55 COVID-19 positive 889 patients, versus n= 51 ESKD controls (1 sample per control patient). 890 b) Proteins associated with disease severity. Associations of protein levels against WHO 891 severity score at the time of sampling. Linear gradient indicates the effect size. A positive 892 effect size (red) indicates that an increase in protein level is associated with increasing 893 disease severity and a negative gradient (blue) the opposite. n= 256 plasma samples from 894 55 COVID-19 positive patients. For a) and b): P-values from linear mixed models after 895 Benjamini-Hochberg adjustment; significance threshold= 5% FDR; dark-grey = non- 896 significant. 897 c) Heatmap showing protein levels for selected proteins with strong associations with 898 severity. Each column represents a sample (n=256 COVID-19 samples and 51 non-infected 899 samples). Each row represents a protein. Proteins are annotated using the symbol of their 900 encoding . For the purposes of legibility, not all significantly associated proteins are 901 shown; the heatmap is limited to the 17% most up- or down-regulated proteins (by effect 902 size) of those with a significant association. Proteins are ordered by hierarchical clustering. 903 Samples are ordered by WHO severity at the time of blood sample (‘Severity’). ‘Overall 904 course’ indicates the peak WHO severity over the course of the illness. 905 906 Figure 3 figure supplement 1. Differential abundance analysis between ESKD patients 907 with and without COVID-19.

25 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

908 Heatmap showing selected proteins with the largest fold changes in differential abundance 909 analysis (subcohort A). As for Figure 3, the heatmap is limited to the 17% most up- or down- 910 regulated proteins (by fold change) of those with a significant association. 911 912 Figure 3 figure supplement 2 – Permutation analysis to estimate the null distribution. 913 Histogram showing the distribution of the number of associations declared significant (FDR 914 5%) after random permutation of class labels (100,000 replications). a) the COVID-19 +ve 915 versus -ve differential abundance analysis. b) the COVID-19 severity differential abundance 916 analysis. The vertical red line denotes the number of proteins we found significant in the 917 analysis with the true sample labels. 918 919 Figure 3 figure supplement 3 – Sensitivity analyses adjusting for diabetes status and 920 cause of ESKD. 921 As sensitivity analyses, the COVID-19 positive versus negative differential abundance 922 regressions were repeated adding diabetes status (a-b) and cause of ESKD (c-d) as 923 additional covariates. The basic model included age, sex and ethnicity as covariates. Each 924 point represents a protein. A comparison of -log10 p-values and effect sizes are shown for 925 all 436 proteins. r indicates Pearson’s correlation coefficient. 926 927 Figure 3 figure supplement 4 – Sensitivity analysis adjusting for time since last 928 haemodialysis. 929 Comparison of results obtained with and without adding time since last haemodialysis as an 930 additional covariate to the regression models. a-b) COVID-19 positive versus negative 931 differential expression analysis. c-d) Severity analysis. Each point represents a protein. r 932 indicates Pearson’s correlation coefficient. 933 934 Figure 4 – Validation. 935 a) Overlap between the significant associations in the differential abundance analysis 936 between ESKD patients with and without COVID-19 in subcohorts A and B. 5% FDR was 937 used as the significance threshold in both analyses. 938 b) Comparison of estimated effect sizes for all 436 proteins in the differential abundance 939 analyses (COVID-19 positive versus negative) in subcohort A and B. Each point represents 940 a protein. Pearson’s r is shown. 941 Differential abundance analyses were performed using linear mixed models. Subcohort A 942 analysis (plasma samples): 256 samples from 55 COVID-19 patients versus 51 non-infected 943 patient samples (single time-point). Subcohort B (serum samples): 52 samples from 55 944 COVID-19 patients and 11 non-infected patient samples (single time-point). 945 946 Figure 4 figure supplement 1– Comparison with the report of Filbin et al. [16] 947 Comparison of log2 fold change for COVID-19 positive versus negative ESKD patients in our 948 study versus COVID-19 positive versus negative respiratory distress patients in the report by 949 Filbin et al. [16]. Colours indicate whether a protein was significantly differentially abundant 950 in each study. Pearson’s r is shown. 951 952 Figure 5 - Selected proteins strongly associated with COVID-19 severity. 953 Violin plots showing distribution of plasma protein levels according to COVID-19 status at the 954 time of blood draw. Boxplots indicate median and interquartile range. n=256 samples from 955 55 COVID-19 patients and 51 samples from non-infected patients. WHO severity indicates 956 the clinical severity score of the patient at the time the sample was taken. Mild n=135 957 samples; moderate n=77 samples; severe n=29 samples; critical n= 15 samples. Upper 958 panel: monocyte chemokines. Middle panel: markers of epithelial injury. Lower panel: 2 959 neutrophil proteases, and IL6. 960 961 Figure 6 – comparison of proteins differentially expressed in COVID-19 with those 962 associated with clinical severity.

26 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

963 a) Overlap between the proteins significantly differentially expressed in COVID-19 (n=256 964 COVID-19 samples and 51 non-infected samples) versus those associated with severity 965 (within cases analysis, n=256 samples) (subcohort A). 5% FDR was used as the significant 966 cut-off in both analyses. 967 b) Comparison of effect sizes for each protein in the COVID-19 positive versus negative 968 analysis (x-axis) and severity analysis (y-axis). Each point represents a protein. Pearson’s r 969 is shown. 970 c) Examples of proteins specifically associated with severity but not significantly differentially 971 abundant in the comparison of all cases versus controls. Violin plots showing distribution of 972 plasma protein levels according to COVID-19 status at the time of blood draw. Boxplots 973 indicate median and interquartile range. n=256 samples from 55 COVID-19 patients and 51 974 samples from non-infected patients. WHO severity indicates the clinical severity score of the 975 patient at the time the sample was taken. Mild n=135 samples; moderate n=77 samples; 976 severe n=29 samples; critical n= 15 samples. 977 978 Figure 7 – Prediction of severe COVID-19 and death. 979 a) The 12 most important proteins for predicting overall clinical course (defined by peak 980 COVID-19 WHO severity) using Random Forests supervised learning. If a variable is 981 important for prediction, it is likely to appear in many decision trees (number of trees) and be 982 close to the root node (i.e. have a low minimal depth). The mean minimal depth across all 983 trees (white box) was used as the primary feature selection metric. 984 b) Proteins that are significant predictors of death (Benjamini-Hochberg adjusted p <0.05). 985 n=256 samples from 55 COVID-19 positive patients, of whom 9 died. Risk coefficient 986 estimates are from a joint model. Bars indicate 95% confidence intervals. For proteins with a 987 positive risk coefficient, a higher concentration corresponds to a high risk of death, and vice 988 versa for proteins with negative coefficients. 989 990 Figure 7 figure supplement 1 – proteins associated with risk of death: correlation to 991 clinical severity and clinical laboratory measurements. 992 Proteins significantly associated with risk of death (5% FDR) are shown. The estimated 993 effect size from the linear mixed model testing association with severity are also shown. 994 Correlations between protein levels and contemporaneous clinical laboratory marker values 995 were calculated using rmcorr [55] for each of the proteins significant (5% FDR) in the joint 996 model. The rows and columns of the clinical marker correlation matrix are ordered by 997 hierarchical clustering. 998 999 Figure 8 – Associations of clinical laboratory markers with plasma proteins. 1000 Proteins that are positively (red) or negatively (blue) associated with clinical laboratory 1001 parameters (5% FDR). P-values from differential abundance analysis using linear mixed 1002 models after Benjamini-Hochberg adjustment. Dark-grey = non-significant. Two associations 1003 were found for d-dimer (not shown- see Supplementary File 1g). 1004 1005 Figure 9 – Modelling of temporal protein trajectories. 1006 The top 18 proteins displaying the most significantly (5% FDR) different longitudinal 1007 trajectories between patients with a mild or moderate (n= 28) versus severe or critical (n= 1008 27) overall clinical course (defined by peak WHO severity). Means and 95% confidence 1009 intervals for each group, predicted using linear mixed models (Methods), are plotted. The 1010 remainder of significant proteins are shown in Figure 9 figure supplement 1. Individual 1011 data points are shown in Figure 9 figure supplement 2. 1012 1013 Figure 9 figure supplement 1– display of modelled temporal trajectories for other 1014 proteins with a significant time x severity interaction. Proteins significant at 5% FDR but 1015 not shown in Figure 9 are displayed here. 1016

27 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

1017 Figure 9 figure supplement 2– raw data points for modelling of temporal protein 1018 trajectories. The 8 most significant proteins from Figure 9 are displayed. 1019 1020 1021 1022

28 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

1023 COVID-19 positive ESKD patients (n=55) ESKD controls (n=51)

Overall Patients with peak Patient with peak severity mild or severity severe or moderate critical (n=28) (n=27)

Age Median 72.2 73.4 68.5 70.1 (Q1-Q3) 62.5-77.3 65.5-76.4 61.8-78.8 62.2-75.1 Sex M 39 (70.9%) 18 (64.3%) 21 (77.8%) 36 (70.6%) F 16 (29.1%) 10 (35.7%) 6 (22.2%) 15 (29.4%) Ethnicity White 16 (29.1%) 5 (17.9%) 11 (40.7%) 13 (25.5%) Black 8 (14.5%) 5 (17.9%) 3 (11.1%) 8 (15.7%) South Asian 18 (32.7%) 10 (35.7%) 8 (29.6%) 20 (39.2%) Asian (other) 4 (7.3%) 1 (3.6%) 3 (11.1%) 3 (5.9%) Other 9 (16.4%) 7 (25.0%) 2 (7.4%) 7 (13.7%) Diabetes 34 (61.8%) * 16 (57.1%) 18 (66.7%) 24 (47.1%) * Current Smoker 1 (1.8%) 1 (3.6%) 0 0 ESKD Cause DN 29 (52.7%) 14 (50.0%) 15 (55.6%) 20 (39.2%) Genetic 1 (1.8%) 1 (3.6%) 0 1 (2.0%) GN 3 (5.5%) 1 (3.6%) 2 (7.4%) 9 (17.6%) HTN/Vascular 5 (9.1%) 3 (5.5%) 2 (7.4%) 7 (13.7%) Other 8 (14.5%) 5 (17.9%) 3 (11.1%) 4 (7.8%) Unknown 9 (16.4%) 4 (14.3%) 5 (18.5%) 10 (19.6%) Hospitalisation 33 (60%) 6 (21.4%) 27 (100%) N/A due to COVID-19†

Fatal COVID-19 9 (16.3%) 0 (0%) 9 (33.3%) N/A

1024 1025 Table 1 – Characteristics of subcohort A. 1026 GN = Glomerulonephritis. HTN = Hypertension. IQR = inter-quartile range. ‘South Asian’ represents 1027 individuals with Indian, Pakistani or Bangladeshi ancestry. Subsets defined according to peak WHO 1028 severity over the course of the illness. N/A = not applicable. 1029 *One patient had type 1 diabetes, the remainder type 2. 1030

29 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

1031 COVID-19 positive COVID-19 negative ESKD patients ESKD controls (n=46) (n=11)*

Age Median 64.3 71.6 (Q1-Q3) 60.3-73.0 (61.7-73.9) Sex M 32 (69.6%) 8 (72.3%) F 14 (30.4%) 3 (27.3%) Ethnicity White 11 (23.9%) 3 (27.3%) Black 8 (17.4%) 3 (27.3%) South Asian 12 (26.1%) 3 (27.3%) Asian (other) 7 (15.2%) 0 Other 8 (17.4%) 2 (18.2%) Diabetes 29 (63.0%) 6 (54.5%) Current Smoker 2 (4.3%) 0 (%) ESKD Cause DN 19 (41.3%) 5 (45.5%) Genetic 1 (2.2%) 0 GN 7 (15.2%) 1 (9.1%) HTN/Vascular 3 (6.5%) 1 (9.1%) Other 3 (6.5%) 2 (18.2%) Unknown 13 (28.3%) 2 (18.2%) Hospitalisation due to 41 (89.1%) N/A COVID-19

Severe or critical COVID-19 33 (71.7%) N/A

Fatal COVID-19 9 (19.6%) N/A

1032 1033 Table 2 – Characteristics of subcohort B. 1034 GN = Glomerulonephritis. HTN = Hypertension. IQR = inter-quartile range. ‘South Asian’ represents 1035 individuals with Indian, Pakistani or Bangladeshi ancestry. Subsets defined according to peak WHO 1036 severity over the course of the illness. N/A = not applicable. *These 11 controls are a subset of the 1037 control patients used in subcohort A. 1038 1039

30 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

1040 Supplementary File 1 table legends. 1041 1042 Supplementary File 1a. Protein Annotation. 1043 List of the 436 proteins measured. GeneID = gene symbol of the gene encoding the protein (used as 1044 the main identifier in the manuscript); UniProt = UniProt ID; Olink Assay Name = protein id used by 1045 Olink; Protein Name = full protein name; Panel name = the name of the 92 protein multiplex Olink 1046 panel on which the protein was measured. 1047 1048 Supplementary File 1b. Enrichment of Reactome terms for the entire set of proteins measured. 1049 The results of enrichment testing for genes corresponding to all 436 measured proteins against the 1050 background of the genome. The analysis was performed against the Reactome pathways using 1051 string-db. The list of Reactome terms is ordered by the number of proteins associated with the term. 1052 1053 Supplementary File 1c. Differential abundance analysis for COVID-19 positive vs negative 1054 ESKD patients in subcohort A and B. 1055 Summary statistics for all 436 proteins are shown. Pvalue = nominal p-value from linear mixed model. 1056 Adjusted Pvalue = p-values after Benjamini-Hochberg correction. Fold change = estimated fold 1057 change from regression coefficient. Proteins are ordered based on results in subcohort A: first by 1058 whether they are significant or not (at 5% FDR), then by fold change (from positive to negative). Note 1059 the associations are not ordered by p-value so strong associations do not necessarily appear at the 1060 top of the table. Significant adjusted p-values are coloured in green and non-significant in grey. 1061 Estimated fold changes are coloured in a gradient from red to blue for up or downregulated in COVID- 1062 19 +ve versus –ve, respectively. 1063 Sample size for subcohort A: n= 256 plasma samples from 55 COVID-19 positive ESKD patients, 1064 versus n= 51 ESKD controls (1 sample per control patient). 1065 Sample size for subcohort B: 52 samples from 55 COVID-19 patients and 11 non-infected patient 1066 samples (single time-point). 1067 1068 Supplementary File 1d Associations of proteins and COVID-19 severity (subcohort A). 1069 Summary statistics for all 436 proteins are shown. Pvalue = nominal p-value from linear mixed model. 1070 Adjusted Pvalue = p-values after Benjamini-Hochberg correction. Fold change = estimated fold 1071 change from regression coefficient. Proteins are ordered first by whether they are significant or not (at 1072 5% FDR), then by linear gradient (effect size) from positive to negative. Note the associations are not 1073 ordered by p-value so strong associations do not necessarily appear at the top of the table. 1074 1075 Supplementary File 1e. Predictors of clinical course from Random Forests. 1076 Importance metrics for each protein for prediction according to a random forest model trained to 1077 predict current or future severe/critical disease using the first sample of each patient. Proteins are 1078 ordered by mean minimal depth across all trees – this was used as the primary importance metric. 1079 1080 Supplementary File 1f. Proteomic predictors of fatal COVID-19. 1081 Summary statistics from joint models for fatal disease. Results for all 436 proteins are shown. “Is 1082 significant” indicates significance (green) or not (grey) at 5% FDR. The association coefficient for 1083 each protein indicates the direction and magnitude of the estimated log relative risk for death (red 1084 indicates higher protein levels increase risk of death, blue the opposite). 95% confidence intervals are 1085 plotted. 1086 1087 Supplementary File 1g. Associations of proteins and clinical laboratory measurements. 1088 Clinical variable = clinical lab tests: white cell count, lymphocyte count, neutrophil count, monocyte 1089 count, C-reactive protein, ferritin, d-dimer, troponin. 1090 1091 Supplementary File 1h. Longitudinal proteomic profiling with linear mixed models. 1092 Summary statistics from the linear mixed models used to identify proteins with differential temporal 1093 trajectories between mild/moderate (n=28) and severe/critical COVID-19 patients (n=27). Summary 1094 statistics for all 436 proteins are shown. 1095

31 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

1096 Pvalue = nominal p-value from linear mixed model for the interaction term between time from 1097 symptom onset (days) and overall WHO severity (as a binary variable: mild-moderate or severe- 1098 critical). 1099 1100 Adjusted Pvalue = p-values after Benjamini-Hochberg correction. “Is significant” indicates significance 1101 (green) or not (grey) at 5% FDR. 1102 1103 Supplementary File 1i. Comparison to other proteomic studies of COVID-19 positive vs 1104 negative patients. 1105 Proteins that were differentially abundant in COVID-19 +ve vs -ve patients in our data are listed (5% 1106 FDR). TRUE indicates that the protein was reported as differentially abundant in the relevant previous 1107 proteomic study. The final column summarises whether the association was previously reported in 1108 any of the 4 studies. We have not harmonised significance thresholds between studies: we simply 1109 report whether the authors declared the protein significant by the threshold of their study. 1110 1111 Supplementary File 1j. Comparison to other proteomic studies of COVID-19 severity. 1112 Proteins that were associated with severity in our data are listed (5% FDR). TRUE indicates that the 1113 protein was reported as associated with severity in the relevant previous proteomic study. The final 1114 column summarises whether the association was previously reported in any 1 or more of the 4 1115 studies. We have not harmonised significance thresholds between studies: we simply report whether 1116 the authors declared the protein significant by the threshold of their study. 1117 1118 Results are shown for all 436 proteins against all 8 lab measurements. 1119 1120 Adjusted p-value = p-value from linear mixed model after Benjamini-Hochberg correction. 1121 1122 Gradient indicates effect size and direction. A positive gradient (red) indicates higher concentrations 1123 of proteins are associated with higher clinical laboratory measurements. “Is significant” indicates 1124 significance (green) or not (grey) at 5% FDR. 1125 1126 Contemporaneous clinical laboratory tests were not available for all plasma samples. The proportion 1127 of samples for which contemporaneous lab tests were available were: white cell count 66%, 1128 neutrophils 66%, monocytes 66%, lymphocytes 66%, CRP 64%, ferritin 36%, troponin 35%, d-dimer 1129 30%. 1130 1131 Supplementary File 1k. Per protein correlations between plasma and serum levels derived 1132 from the same blood sample in 11 COVID-19 negative ESKD patients. 1133 Plasma and serum were taken from 11 non-infected ESKD patients that were measured in both 1134 subcohort A (plasma) and B (serum). Pearson’s r was calculated for the 11 paired measurements for 1135 each protein. Proteins are ordered by r value; this column is coloured from red to blue for positive and 1136 negative r values, respectively. 95% confidence intervals are reported. We also report the variance of 1137 the NPX levels for each protein in plasma and in serum. 1138 1139

32 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

1140 References 1141 1. Horby, P. et al. Dexamethasone in Hospitalized Patients with Covid-19 — Preliminary 1142 Report. N. Engl. J. Med. 0, NEJMoa2021436 (2020) doi:10.1056/NEJMoa2021436. 1143 2. Gordon, A. C. et al. -6 Receptor Antagonists in Critically Ill Patients with 1144 Covid-19 - Preliminary report. medRxiv (2021) doi:10.1101/2021.01.07.21249390. 1145 3. Suhre, K., McCarthy, M. I. & Schwenk, J. M. Genetics meets proteomics: perspectives 1146 for large population-based studies. Nat. Rev. Genet. (2020) doi:10.1038/s41576-020- 1147 0268-2. 1148 4. Williamson, E. J. et al. Factors associated with COVID-19-related death using 1149 OpenSAFELY. Nature 584, 430–436 (2020) doi:10.1038/s41586-020-2521-4. 1150 5. Docherty, A. B. et al. Features of 16,749 hospitalised UK patients with COVID-19 1151 using the ISARIC WHO Clinical Characterisation Protocol. medRxiv 1152 2020.04.23.20076042 (2020) doi:10.1101/2020.04.23.20076042. 1153 6. Corbett, R. W. et al. Epidemiology of COVID-19 in an Urban Dialysis Center. J. Am. 1154 Soc. Nephrol. 31, 1815–1823 (2020) doi:10.1681/ASN.2020040534. 1155 7. Ng, J. H. et al. Outcomes of patients with end-stage kidney disease hospitalized with 1156 COVID-19. Kidney Int. (2020) doi:10.1016/j.kint.2020.07.030. 1157 8. Valeri, A. M. et al. Presentation and outcomes of patients with ESKD and COVID-19. 1158 J. Am. Soc. Nephrol. 31, 1409–1415 (2020) doi:10.1681/ASN.2020040470. 1159 9. Hastie T, Tibshirani R, F. J. High-Dimensional Problems: p ≫ N. in The Elements of 1160 Statistical Learning 649–698 (Springer Series in Statistics, 2001). 1161 10. Ibrahim, J. G., Chu, H. & Chen, L. M. Basic concepts and methods for joint models of 1162 longitudinal and survival data. J. Clin. Oncol. Off. J. Am. Soc. Clin. Oncol. 28, 2796– 1163 2801 (2010) doi:10.1200/JCO.2009.25.0654. 1164 11. Rizopoulos, D. JM: An R package for the joint modelling of longitudinal and time-to- 1165 event data. J. Stat. Softw. 35, 1–33 (2010) doi:10.18637/jss.v035.i09. 1166 12. Guan, W. et al. Clinical characteristics of coronavirus disease 2019 in China. N. Engl. 1167 J. Med. 382, 1708–1720 (2020) doi:10.1056/NEJMoa2002032. 1168 13. Shen, B. et al. Proteomic and Metabolomic Characterization of COVID-19 Patient 1169 Sera. Cell 182, 59-72.e15 (2020) doi:10.1016/j.cell.2020.05.032. 1170 14. Lucas, C. et al. Longitudinal analyses reveal immunological misfiring in severe 1171 COVID-19. Nature 584, 463–469 (2020) doi:10.1038/s41586-020-2588-y. 1172 15. Arunachalam, P. S. et al. Systems biological assessment of immunity to mild versus 1173 severe COVID-19 infection in humans. Science (80-. ). 6261, 1–18 (2020) 1174 doi:10.1126/science.abc6261. 1175 16. Filbin, M. R. et al. Plasma proteomics reveals tissue-specific cell death and mediators 1176 of cell-cell interactions in severe COVID-19 patients. bioRxiv 2020.11.02.365536 1177 (2020) doi:10.1101/2020.11.02.365536. 1178 17. COVID-19 Data. The Renal Association https://renal.org/health-professionals/covid- 1179 19/covid-19-data (2020). 1180 18. Scottish Renal Registry COVID-19 report. Public Health Scotland 1181 https://beta.isdscotland.org/find-publications-and-data/population-health/covid- 1182 19/scottish-renal-registry-covid-19-report/ (2020).

33 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

1183 19. Eiselt, J. et al. Previous Vaccination and Age are More Important Predictors of 1184 Immune Response to Influenza Vaccine than Inflammation and Iron Status in Dialysis 1185 Patients. Kidney Blood Press. Res. 41, 139–147 (2016) doi:10.1159/000443416. 1186 20. Girndt, M., Sester, U., Sester, M., Kaul, H. & Köhler, H. Impaired cellular immune 1187 function in patients with end-stage renal failure. Nephrology, dialysis, 1188 transplantation: official publication of the European Dialysis and Transplant 1189 Association - European Renal Association vol. 14 2807–2810 (1999) 1190 doi:10.1093/ndt/14.12.2807. 1191 21. Sarnak, M. J. & Jaber, B. L. Mortality caused by sepsis in patients with end-stage 1192 renal disease compared with the general population. Kidney Int. 58, 1758–1764 1193 (2000) doi:10.1111/j.1523-1755.2000.00337.x. 1194 22. Alexiewicz, J. M., Smogorzewski, M., Fadda, G. Z. & Massry, S. G. Impaired 1195 phagocytosis in dialysis patients: studies on mechanisms. Am. J. Nephrol. 11, 102– 1196 111 (1991) doi:10.1159/000168284. 1197 23. Massry, S. & Smogorzewski, M. Dysfunction of polymorphonuclear leukocytes in 1198 uremia: role of parathyroid hormone. Kidney Int. Suppl. 78, S195-6 (2001) 1199 doi:10.1046/j.1523-1755.2001.59780195.x. 1200 24. Girndt, M., Sester, M., Sester, U., Kaul, H. & Köhler, H. Molecular aspects of T- and 1201 B-cell function in uremia. Kidney Int. Suppl. 78, S206-11 (2001) doi:10.1046/j.1523- 1202 1755.2001.59780206.x. 1203 25. Meier, P., Dayer, E., Blanc, E. & Wauters, J.-P. Early activation correlates with 1204 expression of apoptosis markers in patients with end-stage renal disease. J. Am. 1205 Soc. Nephrol. 13, 204–212 (2002). 1206 26. Betjes, M. G. H. Immune cell dysfunction and inflammation in end-stage renal 1207 disease. Nat. Rev. Nephrol. 9, 255–265 (2013) doi:10.1038/nrneph.2013.44. 1208 27. Boattini, M. et al. Influenza and respiratory syncytial virus infections in the oldest-old 1209 continent. Eur. J. Clin. Microbiol. Infect. Dis. Off. Publ. Eur. Soc. Clin. Microbiol. 39, 1210 2085–2090 (2020) doi:10.1007/s10096-020-03959-9. 1211 28. Prasad, N. et al. Respiratory Syncytial Virus–Associated Hospitalizations Among 1212 Adults With Chronic Medical Conditions. Clin. Infect. Dis. (2020) 1213 doi:10.1093/cid/ciaa730. 1214 29. Usvyat, L. A. et al. Seasonal variations in mortality, clinical, and laboratory parameters 1215 in hemodialysis patients: a 5-year cohort study. Clin. J. Am. Soc. Nephrol. 7, 108– 1216 115 (2012) doi:10.2215/CJN.03880411. 1217 30. Carrero, J. J. & Stenvinkel, P. Inflammation in end-stage renal disease--what have we 1218 learned in 10 years? Semin. Dial. 23, 498–509 (2010) doi:10.1111/j.1525- 1219 139X.2010.00784.x. 1220 31. Naseeb, U. et al. Complementary LC-MS/MS Proteomic Analysis of Uremic Plasma 1221 Proteins. J. Coll. Physicians Surg. Pak. 25, 606–609 (2015). 1222 32. Christensson, A. et al. The Impact of the Glomerular Filtration Rate on the Human 1223 Plasma Proteome. Proteomics. Clin. Appl. 12, e1700067 (2018) 1224 doi:10.1002/prca.201700067. 1225 33. Lind, L. et al. Longitudinal effects of aging on plasma proteins levels in older adults - 1226 associations with kidney function and hemoglobin levels. PLoS One 14, e0212060 1227 (2019) doi:10.1371/journal.pone.0212060.

34 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

1228 34. Wu, C. et al. Risk Factors Associated With Acute Respiratory Distress Syndrome and 1229 Death in Patients With Coronavirus Disease 2019 Pneumonia in Wuhan, China. 1230 JAMA Intern. Med. 180, 934–943 (2020) doi:10.1001/jamainternmed.2020.0994. 1231 35. Li, X. et al. Risk factors for severity and mortality in adult COVID-19 inpatients in 1232 Wuhan. J. Allergy Clin. Immunol. 146, 110–118 (2020) doi:10.1016/j.jaci.2020.04.006. 1233 36. Furlow, B. COVACTA trial raises questions about ’s benefit in COVID-19. 1234 Lancet Rheumatol. 2, e592 (2020) doi:10.1016/S2665-9913(20)30313-1. 1235 37. Gupta, R. K. et al. Development and validation of the ISARIC 4C Deterioration model 1236 for adults hospitalised with COVID-19: a prospective cohort study. Lancet. Respir. 1237 Med. (2021) doi:10.1016/S2213-2600(20)30559-2. 1238 38. Schutyser, E., Struyf, S. & Van Damme, J. The CC chemokine CCL20 and its 1239 receptor CCR6. Cytokine Growth Factor Rev. 14, 409–426 (2003) doi:10.1016/s1359- 1240 6101(03)00049-2. 1241 39. Saha, S. K. et al. KRT19 directly interacts with β-catenin/RAC1 complex to regulate 1242 NUMB-dependent NOTCH signaling pathway and breast cancer properties. 1243 Oncogene 36, 332–349 (2017) doi:10.1038/onc.2016.221. 1244 40. Maecker, H. et al. TWEAK attenuates the transition from innate to adaptive immunity. 1245 Cell 123, 931–944 (2005) doi:10.1016/j.cell.2005.09.022. 1246 41. Burkly, L. C., Michaelson, J. S., Hahm, K., Jakubowski, A. & Zheng, T. S. TWEAKing 1247 tissue remodeling by a multifunctional cytokine: role of TWEAK/Fn14 pathway in 1248 health and disease. Cytokine 40, 1–16 (2007) doi:10.1016/j.cyto.2007.09.007. 1249 42. Henning, R. J., Bourgeois, M. & Harbison, R. D. Poly(ADP-ribose) Polymerase 1250 (PARP) and PARP Inhibitors: Mechanisms of Action and Role in Cardiovascular 1251 Disorders. Cardiovasc. Toxicol. 18, 493–506 (2018) doi:10.1007/s12012-018-9462-2. 1252 43. Rouleau, M., Patel, A., Hendzel, M. J., Kaufmann, S. H. & Poirier, G. G. PARP 1253 inhibition: PARP1 and beyond. Nat. Rev. Cancer 10, 293–301 (2010) 1254 doi:10.1038/nrc2812. 1255 44. Corren, J. New Targeted Therapies for Uncontrolled Asthma. J. allergy Clin. Immunol. 1256 Pract. 7, 1394–1403 (2019) doi:10.1016/j.jaip.2019.03.022. 1257 45. Galijasevic, S. The development of myeloperoxidase inhibitors. Bioorg. Med. Chem. 1258 Lett. 29, 1—7 (2019) doi:10.1016/j.bmcl.2018.11.031. 1259 46. Vergunst, C. E. et al. Modulation of CCR2 in rheumatoid arthritis: a double-blind, 1260 randomized, placebo-controlled clinical trial. Arthritis Rheum. 58, 1931–1939 (2008) 1261 doi:10.1002/art.23591. 1262 47. Haringman, J. J. et al. A randomized controlled trial with an anti-CCL2 (anti-monocyte 1263 chemotactic protein 1) monoclonal antibody in patients with rheumatoid arthritis. 1264 Arthritis Rheum. 54, 2387–2392 (2006) doi:10.1002/art.21975. 1265 48. Rodriguez, L. et al. Systems-Level Immunomonitoring from Acute to Recovery Phase 1266 of Severe COVID-19. Cell reports. Med. 1, 100078 (2020) 1267 doi:10.1016/j.xcrm.2020.100078. 1268 49. Uhlen, M. et al. Tissue-based map of the human proteome. Science (80-. ). 347, 1269 1260419–1260419 (2015) doi:10.1126/science.1260419. 1270 50. Szklarczyk, D. et al. STRING v11: Protein-protein association networks with increased 1271 coverage, supporting functional discovery in genome-wide experimental datasets. 1272 Nucleic Acids Res. 47, D607–D613 (2019) doi:10.1093/nar/gky1131.

35 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

1273 51. Gandolfo, L. C. & Speed, T. P. RLE plots: Visualizing unwanted variation in high 1274 dimensional data. PLoS One 13, 1–9 (2018) doi:10.1371/journal.pone.0191629. 1275 52. Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 28, 1276 1–26 (2008) doi:10.18637/jss.v028.i05. 1277 53. Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting Linear Mixed-Effects Models 1278 Using lme4. J. Stat. Softw. 67, (2015) doi:10.18637/jss.v067.i01. 1279 54. Kuznetsova, A., Brockhoff, P. B. & Christensen, R. H. B. lmerTest Package: Tests in 1280 Linear Mixed Effects Models. J. Stat. Softw. 82, (2017) doi:10.18637/jss.v082.i13. 1281 55. Bakdash, J. Z. & Marusich, L. R. Repeated measures correlation. Front. Psychol. 8, 1282 1–13 (2017) doi:10.3389/fpsyg.2017.00456. 1283 56. Leo, B. Random forests. Mach. Learn. 45, 5–32 (2001) 1284 doi:10.1023/A:1010933404324. 1285 57. Perperoglou, A., Sauerbrei, W., Abrahamowicz, M. & Schmid, M. A review of spline 1286 function procedures in R. BMC Med. Res. Methodol. 19, 1–16 (2019) 1287 doi:10.1186/s12874-019-0666-3. 1288

36 Figure 1 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license . Subcohort A Serial blood sampling N=256 a COVID-19 +ve Identification of dysregulated proteins samples ESKD patients (N=55) 1st sample as outpatient N=30 patients Differential expression analysis: i) COVID-19 +ve cases vs controls ii) Within-cases analysis of severity N=8 deteriorated and hospitalised

1st sample as inpatient N=25 patients Determine biomarkers of severe COVID-19

N=9 deaths Supervised Feature Learning Extraction Control group: Subcohort B COVID-19 -ve COVID-19 +ve ESKD ESKD patients N=46 patients N=51 N=52 samples Longitudinal analysis

Model Joint Models Plasma / serum proteomics Temporal + to Identify (436 proteins) Trajectories Predictors of Death b O O O O O O O O O O O O O O O O O O O O O O O O O O O O COVID-19 Infected Individuals Infected COVID-19 O

0 10 20 30 40 Time from first positive swab (days)

COVID-19 WHO Severity Mild Moderate Severe Critical Sample Timepoint

O Hospital Admission Hospitalised before COVID-19 diagnosis Death medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 2

COVID-19 Status Negative Positive

Subcohort A (plasma) Subcohort B (serum)

IKBKG a NF2 c

SH2B3 0.1 0.1 0.1 IRAK4 AXIN1

DCTN1

0.0 0.0 0.0 PC2 (8.94%) -0.1 -0.1 -0.1

-0.2 -0.1 0.0 0.1 -0.2 -0.1 0.0 0.1 -0.2 0.0 0.2 b d 0.2 0.2 0.2

0.1 0.1 0.1 0.0

PC3 (7%) 0.0 0.0 -0.1 TREML2

CD46 SERPINA5 -0.2

-0.1 -0.1 CDH1 PTPRS MFAP5 -0.2 -0.1 0.0 0.1 -0.2 -0.1 0.0 0.1 -0.2 0.0 0.2 PC1 (19.11%) PC1 PC1 medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 3 a COVID-19 +ve vs -ve COVID-19 Severity differential abundance analysis b differential abundance analysis

CCL2 KRT19 DDX58 AREG PSIP1 CCL7 PSIP1 CKAP4 PRTN3 BLMH GRN LGALS9 CCL7 CXCL10 7.5 LDLR HMOX1 CXCL11 9 CXCL1 TNFRSF10B IGFBP1 ITGA11 KRT19 IL1RN DDX58 TRIM21 DECR1 DFFA PLAT CCL20 SERPINA5 IL8 SRPK2 CXCL10 5.0 SERPINA5 6 IL6 IL6 FETUB SERPINE1 ITGB1BP2 MMP12 SERPINA12 PON3 2.5 CCL16 FABP2 IFNG 3 -log10(Adjusted Pvalue) -log10(Adjusted Pvalue)

LEP

0.0 0 0 1 2 -0.5 0.0 0.5 1.0 1.5 Log2 Fold Change Effect Size

Age Abundance c (Z-Score) Sex Severity Overall Course

Upregulated

Age 80

40

Sex F M

Downregulated WHO Severity Negative Mild Moderate Severe Critical medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 4 a b Significance None Serum only Plasma only Both

IL6 Subcohort A Subcohort B DDX58 COVID-19 COVID-19 CCL7 +ve vs -ve +ve vs -ve 4 (plasma) (serum)

TRIM21 ZBTB16 ITGB1BP2 CXCL10

IL10 2 CXCL11

Log2 Fold Change

Subcohort B (serum) r = 0.804 0 MMP12

FABP2 PON3

SERPINA5 CCL16 0 2 4 Subcohort A (plasma) Log2 Fold Change medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 5

CCL2 CCL7 CXCL10 10.0 16 15 7.5 14 14

5.0 12 13

10 12 2.5

8 11 0.0

KRT19 AREG PSIP1

10

7.5 7.5 8

5.0 5.0 6

2.5 4 2.5

0.0

PRTN3 MPO IL6

5 10 10 4 8

3 6 5

2 4

0 NPX (log2 protein abundance) NPX (log2 protein abundance) NPX (log2 protein abundance) 1 2

Mild Mild Mild Severe Critical Severe Critical Severe Critical Negative Moderate Negative Moderate Negative Moderate medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 6

None Severity only a b Significance COVID-19 +ve vs -ve only Both

COVID-19 IL6 1.5 CCL7 +ve vs -ve Severity IGFBP1 KRT19 DDX58 CCL20

FGF21 1.0 IL17C

OSM

CXCL1 0.5 ITGB1BP2 DECR1 IL1RN r = 0.754 0.0 DAPP1 Severity Effect Size Severity Effect CXCL5 CTRC -0.5 GIF SERPINA12 LEP FETUB SERPINA5 0 2 4 COVID-19 +ve vs -ve Log2 Fold Change c CCL20 IL17C OSM 15.0 12 10

12.5 8 9

10.0 6 6

7.5 4 3

NPX (log2 protein abundance) 5.0 2

Mild Mild Mild Severe Critical Severe Critical Severe Critical Negative Moderate Negative Moderate Negative Moderate medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 7 a Minimal Depth 0 1 2 3 4

IL18BP 0.82 CTSD 1.18 GDF15 1.77 KRT19 1.8 TNFSF11 1.83 IL1RL1 1.83 PARP1 1.89

Importance FLT3LG 1.92 CKAP4 1.95 CDH5 2.09 CCL24 2.09 CCL2 2.12

0 10 20 30 Number of Trees

CST3 b IL22RA1 AZU1 CCL28 SPON1 TRIM5 FABP6 HCLS1 OSM PTX3 IL6 PRDX1 LDLR FST NID1 HEXIM1 PARP1 TFPI IL2 DKK1 KRT19 IL33 GLO1 PLAT SERPINE1 IL8 PRELP NRP1 TIMP1 IL20 CA1 ST6GAL1 PSIP1 S100A12 PRTN3 IL1RL1 HSPG2 IL10 COL18A1 AREG C1QTNF1 IL1R2 F3 CXCL16 EIF4EBP1 TNFRSF13B NCR1 NRTN FAP CD5 LTA FCRL6 AXL CD244 TNFSF11 TEK IL20RA HSPB1 IL13 CSF1 AMBP SIT1 SPON2 SOD2 ADM PRKCQ TANK TNFSF12 CD84 -10 -5 0 5 10 Risk Coefficient (95% CI) medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 8

Lymphocyte Count C-Reactive Protein

IL1RL1 IL6 SERPINA5 CKAP4 8 KRT19 PTX3 20 CKAP4 IL18 CXCL10 NID1 KRT19 TNFSF13B EBI3 IL18BP CR2 AGRP AREG CCL7 IL6 NID1 IL33 CD6 SERPINA5 6 IL10 TNFSF13B CCL7 15 IL8 PSIP1 IL8 TNFSF11 IGFBP1 SERPINA12 IL1RL1 CXCL10 DDX58 SERPINA12 4 IGFBP1 10 CCL16 FGF21 CCL20 CCL16 IFNG CTRC DDX58

CCL24 -log10(Adjusted Pvalue) 2 5 CCL17 IFNG

0 0 -1 0 1 -0.005 0.000 0.005 0.010 Ferritin Model Gradient Neutrophil Count Model Gradient

6 OSM HNMT TNFRSF10C PLAUR TNFRSF1B CCL2 6 TNFRSF1A CTSL TGFA MMP9

PARP1 SELE REG3A 4 CCL7 PSIP1 KRT19 PRSS2

IL10 IGFBP1 4 TNFRSF10B DEFA1

S100A12 IL6

PAPPA CCL16 2 CLEC4C KITLG IFNG CCL5 SERPINA5 CCL20 2 IL12B

-log10(Adjusted Pvalue) DDX58 -log10(Adjusted Pvalue) IL6

0 0 -2e-04 0e+00 2e-04 4e-04 -0.1 0.0 0.1

White Cell Count Monocyte Count 10.0 3 LYVE1 TNFRSF10C TNFRSF10C CCL16

BOC LAG3 CX3CL1 ANGPTL3 7.5 CCL7 PARP1 PLAUR ITGB6 SERPINA12 PLAUR MBL2 2 PAPPA KRT19 REG3A DDX58 5.0 PRSS2 OSM HAO1 SELE CCL17 MMP9 TNFRSF11A CEACAM8 CCL16 LAG3 CCL24 1 2.5 TNFRSF10B -log10(Adjusted Pvalue) -log10(Adjusted Pvalue) CCL8

IFNG

0.0 0 -0.1 0.0 0.1 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 Effect Size Effect Size medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 9

WHO Severity Mild/Moderate Severe/Critical

2.5 ITGA11 5.2 ITGB6 TNFRSF10B

4.8 11 2.0

4.4 10

1.5 4.0 9 NPX (log2 protein abundance) 1.0 3.6 5 10 15 20 5 10 15 20 5 10 15 20 9.5 PLAUR 8.0 ICAM1 4.0 CNTN1 9.0

7.5 8.5 3.6

8.0 7.0 3.2 7.5

6.5 5 10 15 20 5 10 15 20 5 10 15 20 CFHR5 LIFR 8 CA5A

5.25 8.5 7

5.00 6 8.0 4.75 5 7.5 4.50 4

5 10 15 20 5 10 15 20 5 10 15 20

MARCO NRP1 8.4 FIGF

7.0 1.75 8.2

1.50 6.8 8.0

1.25 6.6 7.8

1.00 5 10 15 20 5 10 15 20 5 10 15 20

IFNLR1 REG3A 3.5 ACP5 3.5 6.0

3.0 5.5 3.0

2.5 5.0 2.5

2.0 4.5

NPX (log2 protein abundance)2.0 NPX (log2 protein abundance) NPX (log2 protein abundance) NPX (log2 protein abundance) 5 10 15 20 5 10 15 20 5 10 15 20

S100A12 CDCP1 GNLY 6 1.75 5.0 1.50 5

4.5 1.25 4

4.0 1.00 3 0.75 NPX (log2 protein abundance) 5 10 15 20 5 10 15 20 5 10 15 20 Time from first symptoms (days) Time from first symptoms (days) Time from first symptoms (days) medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 1 figure supplement 1

a Female Male b Sex COVID-19 -ve COVID-19 +ve

40 20

30 15

20 10 Number of Individuals

Number of Individuals 10 5

0 0 COVID-19 -ve COVID-19 +ve 40 50 60 70 80 90 40 50 60 70 80 90 Age

Ethnicity Asian (other) Black Other South Asian White d c 15

20

15 10

10 5 Number of Individuals

Number of Individuals 5

0 0 COVID-19 -ve COVID-19 +ve 1 2 3 4 5 6 7 8 9 10 Number of Samples medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 2 figure supplement 1

COVID-19 WHO Severity

Negative Mild Moderate Severe Critical a Severity at time of sampling 0.2

0.1 PC3 (7%) 0.0

-0.1

-0.2 -0.1 0.0 0.1 PC1 (19.11%) b Overall clinical course 0.2

0.1 PC3 (7%) 0.0

-0.1

-0.2 -0.1 0.0 0.1 PC1 (19.11%) medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 2 figure supplement 2

Plate ID 1 2 3 4 5 a

0.1

0.0 PC2 (8.94%)

-0.1

-0.2 -0.1 0.0 0.1 PC1 (19.11%) b 0.2

0.1 PC3 (7%) 0.0

-0.1

-0.2 -0.1 0.0 0.1 PC1 (19.11%) medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 3 figure supplement 1

Age Abundance (Z-Score) C Sex Severity Overall Course Downregulated

Age 80 Upregulated 40

Sex F M

WHO Severity Negative Mild Moderate Severe Critical medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 3 figure supplement 2 a COVID-19 +ve vs -ve differential abundance analysis

75000

50000

25000 Number of Replications

0 0 50 100 150 200

Number of Proteins Significant

COVID-19 severity b differential abundance analysis

75000

50000

25000 Number of Replications

0 0 50 100 150 200 Number of Proteins Significant medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 3 figure supplement 3 a r = 0.999 b 12 r = 0.955 2

9

1

6

0 3 as additional covariate as additional covaraite -log10(Pvalue) with diabetes -log10(Pvalue) Log2 Fold Change with diabetes Log2 Fold

0 0 1 2 0 3 6 9 12 Log2 Fold Change -log10(Pvalue) c r = 0.997 12 r = 0.949 2 d

9

1

6

0 3 ESKD as additional covariate ESKD as additional covaraite -log10(Pvalue) with cause of -log10(Pvalue) Log2 Fold Change with cause of Log2 Fold

0 0 1 2 0 3 6 9 12 Log2 Fold Change -log10(Pvalue) medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 3 figure supplement 4

COVID-19 +ve vs -ve differential abundance analysis a r = 0.999 b 12 r = 0.997 2

9

1

6

0 3 dialysis as additional covariate dialysis as additional covaraite -log10(Pvalue) with time from last -log10(Pvalue) Log2 Fold Change with time from last Log2 Fold 0 0 1 2 0 3 6 9 12 Log2 Fold Change -log10(Pvalue)

COVID-19 Severity differential abundance analysis c r = 0.999 d r = 0.998 1.5

1.0 10

0.5

5 0.0 dialysis as additional covariate Effect Size with time from last Effect Size

-0.5 dialysis as additional covaraite -log10(Pvalue) with time from last -log10(Pvalue) 0 -0.5 0.0 0.5 1.0 1.5 0 5 10 Effect Size -log10(Pvalue) medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 4 figure supplement 1

Significance None Differential Abundance Analysis Filbin et al. Both

CXCL10 2 DDX58 IFNG CCL7

KRT19 CCL8 CXCL11

1 TRIM21 LAG3 CPA1 IL1RN AGER NID1 IL6 LEP HSPB1 SOD2 CXCL1 ITGB1BP2 DECR1 0 KITLG PON3 IL10 SRPK2

Filbin et al . Effect Size CST5

FABP2 CCL16 CRTAC1 -1 MMP12 FGF23 GH1 r = 0.688 NPPB CCL24 -1 0 1 2 Differential Abundance Log2 Fold Change medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 7 figure supplement 1

Correlation (clinical measurement)

COVID-19 Severity Effect Size

Fatal COVID-19 Risk Coefficient

DDimerTroponinCRP FerritinNeutrophilWhite CountLymphocyte Cell CountMonocyte COVID-19Count CountFatal Severity COVID-19 Effect Risk Size Coefficient medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 9 figure supplement 1

WHO Severity Mild/Moderate Severe/Critical

HSD11B1 LYVE1 5 SFTPD 7.75 3.5 4 7.50 3.0 3 7.25 2.5 7.00 2

2.0 6.75 5 10 15 20 5 10 15 20 5 10 15 20

CCL24 VASN CLEC7A

6 2.75

5 7.0 2.50 4

6.5 3 2.25

2

5 10 15 20 5 10 15 20 5 10 15 20

NOTCH3 10.75 TNFRSF11A 6.5 IL17C 5.75 10.50 6.0

5.50 10.25 5.5

5.25 10.00 5.0

5.00 9.75 4.5

4.75 9.50 4.0 5 10 15 20 5 10 15 20 5 10 15 20 3.6 IL15RA FXYD5 FCGR2A

2.5 3.3 4.5 2.0 3.0

1.5 4.0 2.7 1.0 NPX (log2 protein abundance) NPX (log2 protein abundance) NPX (log2 protein abundance) NPX (log2 protein abundance) 5 10 15 20 5 10 15 20 5 10 15 20

5.5 ACE2

2.0 5.0

4.5 1.6

4.0 1.2

FETUB 3.5 0.8 NPX (log2 protein abundance) 5 10 15 20 5 10 15 20 Time from first symptoms (days) Time from first symptoms (days) medRxiv preprint doi: https://doi.org/10.1101/2020.11.05.20223289; this version posted February 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .

Figure 9 figure supplement 2

WHO Severity Mild/Moderate Severe/Critical

ITGA11 ITGB6

Mild/Moderate Severe/Critical Mild/Moderate Severe/Critical 3.0

2.5 5 2.0

1.5 4

1.0

NPX (log2 protein abundance) 3 5 10 15 20 5 10 15 20 5 10 15 20 5 10 15 20

TNFRSF10B PLAUR

Mild/Moderate Severe/Critical Mild/Moderate Severe/Critical

10 12

11 9

10 8 9 7 8 NPX (log2 protein abundance) 5 10 15 20 5 10 15 20 5 10 15 20 5 10 15 20

ICAM1 CNTN1

Mild/Moderate Severe/Critical Mild/Moderate Severe/Critical

4.0 8

3.5

7 3.0

2.5 6 NPX (log2 protein abundance) 5 10 15 20 5 10 15 20 5 10 15 20 5 10 15 20

CFHR5 LIFR

Mild/Moderate Severe/Critical Mild/Moderate Severe/Critical

9 6.0

5.5 8 5.0

4.5 7 4.0 NPX (log2 protein abundance) 5 10 15 20 5 10 15 20 5 10 15 20 5 10 15 20 Time from first symptoms (days) Time from first symptoms (days)