bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 The spatial proteome of the human intervertebral disc reveals architectural 2 changes in health, ageing and degeneration

3 Vivian Tam1,2*, Peikai Chen1*, Anita Yee1, Nestor Solis3, Theo Klein3,4, 4 Mateusz Kudelko1, Rakesh Sharma5, Wilson CW Chan1,2,6, Christopher M. Overall3, 5 Lisbet Haglund7, Pak C Sham8, Kathryn SE Cheah1, Danny Chan1,2

6 1 School of Biomedical Sciences, The University of Hong Kong, Hong Kong, China; 7 2 The University of Hong Kong Shenzhen Institute of Research and Innovation, Shenzhen, China; 8 3 Centre for Blood Research, Faculty of Dentistry, University of British Columbia, Vancouver, Canada; 9 4 Present address: Triskelion BV, Zeist, The Netherlands; 10 5 Proteomics and Metabolomics Core Facility, The University of Hong Kong, Hong Kong, China; 11 6 Department of Orthopaedics Surgery and Traumatology, HKU-Shenzhen Hospital, Shenzhen, China; 12 7 Department of Surgery, McGill University, Montreal, Canada; 13 8 Centre for PanorOmic Sciences (CPOS), The University of Hong Kong, Hong Kong, China. 14

15

16 KSEC is a senior editor at eLife. All other authors declare that they do not have any conflict of 17 interests

18 * These authors contributed equally to this manuscript

19 20 21 22 23 Correspondence should be addressed to:

24 Danny Chan 25 School of Biomedical Sciences 26 Faculty of Medicine 27 The University of Hong Kong 28 21 Sassoon Road, Hong Kong 29 Email: [email protected] 30 Tel.: +852 3917 9482

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

31 Abstract

32 The proteome of the human intervertebral disc, particularly the extracellular matrix, underpins its 33 integrity and function. We describe high quality spatial proteomes from young and aged lumbar 34 discs, profiled across different regions, and identified sets of that are variable or constant 35 with respect to disc compartments, levels, and anatomic directions. From these, four 36 modules defining a young disc were derived. Mapping these modules to aged disc proteomes 37 showed that the central nucleus pulposus becomes more like the outer annulus fibrosus. 38 Transcriptome analysis supported these proteomic findings. Concurrently, dynamic proteomic 39 analyses indicate impaired protein synthesis and increased degradative activities. We identified a 40 novel set of proteins that correlates with disc hydration that are predictive of MRI intensities. 41 Signalling pathways (SHH, WNT, BMP/TGFβ) and hypertrophic chondrocyte differentiation 42 events relating to ageing and degeneration were implicated. This work presents the foundation 43 proteomic architectural landscapes of young and ageing human discs.

44 (149 words)

45

46

47

48 Key words: human, intervertebral discs, nucleus pulposus, annulus fibrosus, ageing, 49 extracellular matrix, proteomics, TAILS, degradome, SILAC, transcriptomics, hypertrophy-like

50

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

51 Introduction

52 The 23 intervertebral discs (IVDs) in the human spine provide stability, mobility and flexibility. 53 IVD degeneration (IDD), most common in the lumbar region (Saleem et al., 2013; Teraguchi et 54 al., 2014), is associated with a decline in function and a major cause of back pain, affecting up to 55 80% of the world’s population at some point in life (Rubin, 2007), presenting significant 56 socioeconomic burdens. Multiple interacting factors such as genetics, influenced by ageing, 57 mechanical and other stress factors, contribute to the pathobiology, onset, severity and progression 58 of IDD (Munir et al., 2018).

59 IVDs are large, avascular, extracellular matrix (ECM)-rich structures comprising three 60 compartments: a hydrated nucleus pulposus (NP) at the centre, surrounded by a tough annulus 61 fibrosus (AF) at the periphery, and cartilaginous endplates of the adjoining vertebral bodies 62 (Humzah and Soames, 1988). The early adolescent NP is populated with vacuolated notochordal- 63 like cells, which are gradually replaced by small chondrocyte-like cells (Risbud et al., 2015). Blood 64 vessels terminate at the endplates, nourishing and oxygenating the NP via diffusion, whose limited 65 capacity mean that NP cells are constantly subject to hypoxic, metabolic and mechanical stresses 66 (Urban et al., 2004).

67 With ageing and degeneration, there is an overall decline in cell “health” and numbers (Rodriguez 68 et al., 2011; Sakai et al., 2012), disrupting homeostasis of the disc proteome. The ECM has key 69 roles in biomechanical function and disc hydration. Indeed, a hallmark of IDD is reduced hydration 70 in the NP, diminishing the disc’s capacity to dissipate mechanical loads. Clinically, T-2 weighted 71 magnetic resonance imaging (MRI) is the gold standard for assessing IDD, that uses disc hydration 72 and structural features such as bulging or annular tears to measure severity (Pfirrmann et al., 2001; 73 Schneiderman et al., 1987). The hydration and mechanical properties of the IVD are dictated by 74 the ECM composition, which is produced and maintained by the IVD cells.

75 To meet specific biomechanical needs, cells in the IVD compartments synthesise different 76 compositions of ECM proteins. Defined as the “matrisome” (Naba et al., 2012), the ECM houses 77 the cells and facilitates their inter-communication by regulation of the availability and presentation 78 of signalling molecules (Taha and Naba, 2019). With ageing or degeneration, the NP becomes 79 more fibrotic and less compliant (Yee et al., 2016), ultimately affecting disc biomechanics (Newell

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

80 et al., 2017). Changes in matrix stiffness can have a profound impact on cell-matrix interactions 81 and downstream transcriptional regulation, signalling activity and cell fate (Park et al., 2011).

82 The resulting alterations in the matrisome lead to vicious feedback cycles that reinforce cellular 83 degeneration and ECM changes. Notably, many of the associated IDD genetic risk factors, such 84 as COL9A1 (Jim et al., 2005), ASPN (Song et al., 2008), and CHST3 (Song et al., 2013), are 85 variants in encoding matrisome proteins, highlighting their importance for disc function. 86 Therefore, knowledge of the cellular and extracellular proteome and their spatial distribution in 87 the IVD is crucial to understanding the mechanisms underlying the onset and progression of IDD 88 (Feng et al., 2006).

89 Current knowledge of IVD biology is inferred from a limited number of transcriptomic studies on 90 human (Minogue et al., 2010; Riester et al., 2018; Rutges et al., 2010) and animal (Veras et al., 91 2020) discs. Studies showed that cells in young healthy NP express markers including CD24, 92 KRT8, KRT19 and T (Fujita et al., 2005; Minogue et al., 2010; Rutges et al., 2010), whilst NP 93 cells in aged or degenerated discs have different and variable molecular signatures (Chen et al., 94 2006; Rodrigues-Pinto et al., 2016), such as genes involved in TGFβ signalling (TGFA, INHA, 95 INHBA, BMP2/6). The healthy AF expresses genes including collagens (COL1A1 and COL12A1) 96 (van den Akker et al., 2017), growth factors (PDGFB, FGF9, VEGFC) and signalling molecules 97 (NOTCH and WNT) (Riester et al., 2018). Although transcriptomic data provides valuable cellular 98 information, it does not faithfully reflect the molecular composition. Cells represent only a small 99 fraction of the disc volume, transcriptome-proteome discordance does not enable accurate 100 predictions of protein levels from mRNA (Fortelny et al., 2017), and the disc matrisome 101 accumulates and remodels over time.

102 Proteomic studies on animal models of IDD, including murine (McCann et al., 2015), canine 103 (Erwin et al., 2015), and bovine (Caldeira et al., 2017), have been reported. Nevertheless, human- 104 animal differences in cellular phenotypes and mechanical loading physiologies mean that these 105 findings might not translate to the human scenario. So far, human proteomic studies have 106 compared IVDs with other cartilaginous tissues (Onnerfjord et al., 2012a); and have shown 107 increases in fibrotic changes in ageing and degeneration (Yee et al., 2016), a role for inflammation 108 in degenerated discs (Rajasekaran et al., 2020), the presence of haemoglobins and

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

109 immunoglobulins in discs with spondylolisthesis and herniation (Maseda et al., 2016), and changes 110 in proteins related to cell adhesion and migration in IDD (Sarath Babu et al., 2016). The reported 111 human disc proteomes were limited in the numbers of proteins identified and finer 112 compartmentalisation within the IVD, and disc levels along the lumbar spine have yet to be 113 studied. Nor have the proteome dynamics in term of ECM remodelling (synthesis and degradation) 114 in young human IVDs and changes in ageing and degeneration been described.

115 In this study, we established a reference spatial proteome of young healthy IVDs based on 33 LC- 116 MS/MS (liquid chromatography mass spectrometry) profiles generated from a systematic division 117 of the different compartments of three whole lower lumbar discs from a 16-year-old male non- 118 degenerated individual. To gain insights into how the IVD proteomic landscape changes with 119 ageing, we analysed 33 LC-MS/MS profiles from IVDs isolated from a 59-year-old male 120 individual at the corresponding disc levels. We evaluated differences and similarities in the 121 proteome of these disc compartments, by principal component analysis (PCA), analysis of variance 122 (ANOVA) and identification of differentially expressed proteins (DEPs). The proteomic findings 123 were consolidated with 8 transcriptome profiles from 4 individuals.

124 We discovered modules containing specific sets of proteins that describe the landscape of a young 125 IVD, and the deconstruction of these modules with ageing and degeneration. We analysed the 126 dynamic changes in the disc proteome by SILAC (Stable Isotopic Labelling of Amino Acids in 127 Culture) (Ong et al., 2002) on the IVDs of 5 individuals, and by TAILS (Terminal Amine Isotopic 128 Labelling of Substrates) (Kleifeld et al., 2010) on 6 individuals, to determine synthesis and 129 degradation status, and their relationships with disc compartment and IDD. We confirmed the 130 presence of a basal ECM remodelling process in the NP of young IVDs that increases with ageing 131 and degeneration. Finally, we correlated types of proteins with tissue hydration using high 132 resolution MRI of the aged discs, to derive a hydration signature predictive of MRI intensities.

133

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

134 Results

135 Disc samples and their phenotypes

136 Lumbar disc components from 17 individuals (Table 1) were harvested for proteomic, 137 transcriptomic, SILAC and degradome (Lopez-Otin and Overall, 2002) analyses. Reference 138 proteomes were generated from lumbar disc segments of a young (16 years) and an aged (59 years) 139 individual. T1- and T2-weighted MRI (3T) showed the young discs (L3/4, L4/5, L5/S1) were non- 140 degenerated, with a Schneiderman score of 1 from T2 images (Figure 1A). The NP of young IVD 141 were well hydrated (white) with no disc bulging, endplate changes, or observable inter-level 142 variations (Figure 1A; Supplemental Figure S1A), consistent with healthy discs and were deemed 143 fit to serve as a benchmarking reference. To investigate structural changes associated with ageing, 144 high resolution (7T) MRI was taken for the aged discs (Figure 1B). All discs had irregular 145 endplates, and annular tears were present (green arrowheads) adjacent to the lower endplate and 146 extending towards the posterior region at L3/4 and L4/5 (Figure 1B). The NP exhibited regional 147 variations in hydration in both sagittal and coronal images (Figure 1B). As the young and aged 148 MRIs were obtained at different resolutions, direct comparison is difficult. However, the aged 149 discs were less hydrated and the NP and AF structures less distinct, consistent with gross 150 observations (Supplemental Figure S1A). Phenotypes of the disc samples used in the generation 151 of other profiling data are described in methods and Table 1.

152 Quality data detecting large numbers of matrisome and non-matrisome proteins

153 To derive spatial reference proteomes for young and aged human IVDs, we subdivided the lumbar 154 discs into 11 key regions (Figure 1C), spanning the outer-most (outer AF; OAF) to the central- 155 most (NP) region of the disc, traversing both anteroposterior and lateral axes, adding valuable 156 spatial information to our proteomic dataset. Since the disc is an oval shape, an inner AF (IAF) 157 region was assigned in the lateral directions. A “mixed” compartment between the NP and IAF 158 with undefined boundary was designated as the NP/IAF in all four (anteroposterior and lateral) 159 directions. In all, this amounted to 66 specimens with different compartments, ages, directions and 160 levels, which then underwent LC-MS/MS profiling (Supplemental Table S1). Systematic analyses 161 of the 66 profiles are depicted in a flowchart (Figure 1D).

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

162 A median of 654 proteins per profile were identified for the young samples and 829 proteins for 163 the aged samples, with a median of 742 proteins per profile for young and aged combined. The 164 proteome-wide distributions were on similar scales across the profiles (Supplemental Figure S1B). 165 Of the 3,100 proteins detected in total, 418 were matrisome proteins (40.7% of all known 166 matrisome proteins) and 2,682 non-matrisome proteins (~14% of genome-wide non-matrisome 167 genes) (Figure 1E; Supplemental Figure S1C), and 983 were common to all four major 168 compartments, namely the OAF, IAF, NP/IAF, and NP (Figure 1E, upper panel). A total of 1,883 169 proteins were identified in young discs, of which 690 (36%) were common to all regions. 170 Additionally, 45 proteins (2.4%) were unique to NP, whilst NP/IAF, IAF and OAF had 86 (4.6%), 171 54 (2.9%) and 536 (28%) unique proteins, respectively (Figure 1E, middle panel). For the aged 172 discs, 2,791 proteins were identified, of which 803 (28.8%) were common to all regions. NP, 173 NP/IAF and IAF had 34 (12%), 80 (28.7%) and 44 (15%) unique proteins, respectively, with the 174 OAF accounting for the highest proportion of 1,314 unique proteins (47%) (Figure 1E, lower 175 panel). The aged OAF had the highest number of detected proteins with an average of 1,156, 176 followed by the young OAF with an average of 818 (Figure 1F). The quantity and spectrum of 177 protein categories identified suggest sufficient proteins had been extracted and the data are of high 178 quality.

179 Levels of matrisome proteins declines in all compartments of aged discs

180 We divided the detected matrisome proteins into core matrisome (ECM proteins, encompassing 181 collagens, proteoglycans and ), and non-core matrisome (ECM regulators, ECM 182 affiliated and secreted factors), according to a matrisome classification database (Naba et al., 2012) 183 (matrisomeproject.mit.edu) (Figure 1F). Despite the large range of total numbers of proteins 184 detected (419 to 1,920) across the 66 profiles (Figure 1F), all six sub-categories of the matrisome 185 contained similar numbers of ECM proteins (Figure 1F,G; Supplemental Figure S2A). The non- 186 core matrisome proteins were significantly more abundant in aged than in young discs 187 (Supplemental Figure S2A). On average, 19 collagens, 18 proteoglycans, 68 glycoproteins, 52 188 ECM regulators, 22 ECM affiliated proteins, and 29 secreted factors of ECM were detected per 189 profile. The majority of the proteins in these matrisome categories were detected in all disc 190 compartments, and in both age groups. A summary of all the comparisons are presented in

7 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

191 Supplemental Figure S1C-E, and the commonly expressed matrisome proteins are listed in Table 192 2.

193 Even though there are approximately three times more non-matrisome than matrisome proteins 194 per profile on average (Figure 1F), their expression levels in terms of label-free quantification 195 (LFQ) values are markedly lower (Supplemental Figure S2B). Specifically, the expression levels

196 of core-matrisome were the highest, with an average log2(LFQ) of 30.65, followed by non-core 197 matrisome at 28.56, and then non-matrisome at 27.28 (Supplemental Figure S2B). Within the core- 198 matrisome, the expression was higher (p=6.4×10-21) in young (median 30.74) than aged (median 199 29.72) discs (Supplemental Figure S2C,D). This difference between young and aged discs is 200 consistent within the sub-categories of core and non-core matrisome, with the exception of the 201 ECM regulator category (Figure 1H). The non-core matrisome and non-matrisome, however, 202 exhibited smaller cross-compartment and cross-age differences in terms of expression levels 203 (Supplemental Figure S2E-H). This implies that the amount of ECM protein in each compartment 204 of the disc declines with ageing and possibly changes in the relative composition, but the numbers 205 of proteins detected per matrisome sub-category remain similar. This agrees with the concept that 206 ECM synthesis is not sufficient to counterbalance degradation, as exemplified in a proteoglycan 207 study (Silagi et al., 2018).

208 Cellular activities inferred from non-matrisome proteins

209 Although 86.5% (2,682) of the detected proteins were non-matrisome, their expression levels were 210 considerably lower than matrisome proteins across all sample profiles (Figure 1F). A functional 211 categorisation according to the Nomenclature Committee family annotations 212 (Yates et al., 2017) showed many categories containing information for cellular components and 213 activities, with the top 30 listed in Figure 1I. These included transcriptional and translational 214 machineries, post-translational modifications, mitochondrial function, protein turnover; and 215 importantly, transcriptional factors, cell surface markers, and inflammatory proteins that can 216 inform gene regulation, cell identity and response in the context of IVD homeostasis, ageing and 217 degeneration.

218 These functional overviews highlighted 77 DNA-binding proteins and/or transcription factors, 83 219 cell surface markers, and 175 inflammatory-related proteins, with their clustering data presented

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

220 as heatmaps (Supplemental Figure S3). Transcription factors and cell surface markers are detected 221 in some profiles (Supplemental Figure S3A and B). The heatmap of the inflammatory-related 222 proteins showed that more than half of the proteins are detected in the majority of samples, with 4 223 major clusters distinguished by age and expression levels (Supplemental S3C). For example, one 224 of the clusters in the aged samples showed enrichment for complement and coagulation cascades 225 (False Discovery Rate, FDR q=1.62×10-21) and clotting factors (FDR q=6.05×10-9), indicating 226 potential infiltration of blood vessels. Lastly, there are 371 proteins involved in signalling 227 pathways, and their detection frequency in the different compartments and heat map expression 228 levels are illustrated in Supplemental Figure S3D.

229 Histones and housekeeping genes inform cross compartment- and age-specific 230 variations in cellularity

231 Cellularity within the IVD, especially the NP, decreases with age and degeneration (Rodriguez et 232 al., 2011; Sakai et al., 2012). We assessed whether cellularity of the different compartments could 233 be inferred from the proteomic data. Quantitation of histones can reflect the relative cellular 234 content of tissues (Wisniewski et al., 2014). We detected 10 histones, including subunits of histone 235 1 (HIST1H1B/C/D/E, HIST1H2BL, HIST1H3A, HIST1H4A) and histone 2 (HIST2H2AC, 236 HIST2H2BE, HIST2H3A), with 4 subunits identified in over 60 sample profiles that are mutually 237 co-expressed (Supplemental Figure S4A). Interestingly, histone concentrations, and thus 238 cellularity, increased from the inner to the outer compartments of the disc, and showed a highly 239 significant decrease in aged discs compared to young discs across all compartments (Figure 1J), 240 (Wilcoxon p=5.6×10-4) (Supplemental Figure S4B).

241 GAPDH and ACTA2 are two commonly used reference proteins, involved in the metabolic 242 process and the cytoskeletal organisation of cells, respectively. They are expected to be relatively 243 constant between cells and are used to quantify the relative cellular content of tissues (Barber et 244 al., 2005). They were detected in all 66 profiles. GAPDH and ACTA2 amounts were significantly 245 correlated with a Pearson correlation coefficient (PCC) of 0.794 (p=9.3×10-9) (Supplemental 246 Figure S4C), and they were both significantly co-expressed with the detected histone subunits, 247 with PCCs of 0.785 and 0.636, respectively (Figure 1K; Supplemental Figure S4D). As expected, 248 expression of the histones, GAPDH and ACTA2 was not correlated with two core-matrisome

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

249 proteins, ACAN and COL2A1 (Supplemental Figure S4E); whereas ACAN and COL2A1 were 250 significantly co-expressed (Supplemental Figure S4F), as expected due to their related regulation 251 of expression and tissue function. Thus, cellularity information can be obtained from proteomic 252 information, and the histone quantification showing reduced cellularity in the aged IVD is 253 consistent with the reported changes (Rodriguez et al., 2011; Sakai et al., 2012).

254 Phenotypic variations revealed by PCA and ANOVA

255 PCA captures the information content distinguishing age and tissue types

256 To gain a global overview of the data, we performed PCA on a set of 507 proteins selected 257 computationally, allowing maximal capture of valid values, while incurring minimal missing 258 values, followed by imputations (Supplemental Figure S5; Methods). The first two principal 259 components (PCs) explained a combined 65.5% of the variance, with 39.9% and 25.6% for the 260 first and second PCs, respectively (Figure 2A and B). A support vector machine with polynomial 261 kernel was trained to predict the boundaries: it showed PC1 to be most informative to predict age, 262 with a clear demarcation between the two age groups (Figure 2A, vertical boundary), whereas PC2 263 distinguished disc sample localities, separating the inner compartments (NP, NP/IAF and IAF) 264 from the OAF (Figure 2A, horizontal boundary). PC3 captured only 5.0% of the variance (Figure 265 2C,D), but it distinguished disc level, separating the lowest level (L5/S1) from the rest of the 266 lumbar discs (L3/4 and L4/5) (Figure 2D, horizontal boundary). Samples in the upper level (L3/4 267 and L4/5) appeared to be more divergent, with the aged disc samples deviating from the young 268 ones (Figure 2D).

269 Top correlated genes with the principal components are insightful of disc homeostasis

270 To extract the most informative features of the PCA, we performed proteome-wide associations 271 with each of the top three PCs, which accounted for over 70% of total variance, and presented the 272 top 100 most positively and top 100 most negatively correlated proteins for each of the PCs (Figure 273 2E-G). As expected, the correlation coefficients in absolute values were in the order of PC1 > 274 PC2 > PC3 (Figure 2E-G). The protein content is presented as non-matrisome proteins (grey 275 colour) and matrisome proteins (coloured) that are sub-categorised as previously. For the 276 negatively correlated proteins, the matrisome proteins contributed to PC1 in distinguishing young

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

277 disc samples, as well as to PC2 for sample location within the disc, but less so for disc level in 278 PC3. Further, the relative composition of the core and non-core matrisome proteins varied between 279 the three PCs, depicting the dynamic ECM requirement and its relevance in ageing (PC1), tissue 280 composition within the disc (PC2) and mechanical loading (PC3).

281 PC1 of young discs identified known chondrocyte markers, CLEC3A (Lau et al., 2018) and 282 LECT1/2 (Zhu et al., 2019); hedgehog signalling proteins, HHIPL2, HHIP, and SCUBE1 (Johnson 283 et al., 2012); and xylosyltransferase-1 (XYLT1), a key for initiating the attachment of 284 glycosaminoglycan side chains to proteoglycan core proteins (Silagi et al., 2018) (Figure 2E). Most 285 of the proteins that were positively correlated in PC1 were coagulation factors or coagulation 286 related, suggesting enhanced blood infiltration in aged discs. PC2 implicated key changes in 287 molecular signalling proteins (hedgehog, WNT and Nodal) in the differences between the inner 288 and outer disc regions (Figure 2F). Notably, PC2 contains heat shock proteins (HSPA1B, HSPA8, 289 HSP90AA1, HSPB1) which are more strongly expressed in the OAF than in inner disc, indicating 290 the OAF is under stress (Takao and Iwaki, 2002). Although the correlations in PC3 were much 291 weaker, proteins such as CILP/CILP2, DCN, and LUM were associated with lower disc level.

292 ANOVA reveals the principal phenotypes for categories of ECMs

293 To investigate how age, disc compartment, level, and direction affect the protein profiles, we 294 carried out ANOVA for each of these phenotypic factors for the categories of matrisome and non- 295 matrisome proteins (Supplemental Figure S5H-J). In the young discs, the dominant phenotype 296 explaining the variances for all protein categories was disc compartment. It is crucial that each 297 disc compartment (NP, IAF and OAF) has the appropriate protein composition to function 298 correctly (Supplemental Figure S5I). This also fits the understanding that young healthy discs are 299 axially symmetric and do not vary across disc levels. In aged discs, compartment is still relevant 300 for non-matrisome proteins and collagen, but disc level and directions become influential for other 301 protein categories, which is consistent with variations in mechanical loading occurring in the discs 302 with ageing and degeneration (Supplemental Figure S5J). In the combined (young and aged) disc 303 samples, age was the dominant phenotype across major matrisome categories, while compartment 304 best explained the variance in non-matrisome, reflecting the expected changes in cellular (non- 305 matrisome) and structural (matrisome) functions of the discs (Supplemental Figure S5H). This

11 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

306 guided us to analyse the young and aged profiles separately, before performing cross-age 307 comparisons.

308 The spatial proteome of young and healthy discs

309 PCA of the 33 young profiles showed a distinctive separation of the OAF from the inner disc 310 regions on PC1 (upper panel of Figure 3A). In PC2, the lower level L5/S1 could generally be 311 distinguished from the upper lumbar levels (lower panel of Figure 3A). The detected proteome of 312 the young discs (Figure 3B) accounts for 9.2% of the human proteome (or 1,883 out of the 20,368 313 on UniProt). We performed multiple levels of pairwise comparisons (summarised in Supplemental 314 Figure S6A) to detect proteins associated with individual phenotypes, using three approaches (see 315 Methods): statistical tests; proteins detected in one group only; or proteins using a fold-change 316 threshold. We detected a set of 671 DEPs (Supplemental Table S2) (termed the ‘variable set’), 317 containing both matrisome and non-matrisome proteins (Figure 3D), and visualised in a heatmap 318 (Supplemental Figure S7), with identification of four modules (Y1-Y4).

319 Expression modules showing lateral and anteroposterior trends

320 To investigate how the modules are associated with disc components, we compared their protein

321 expression profiles along the lateral and anteroposterior axes. The original log2(LFQ) values were 322 transformed to z-scores to be on the same scale. Proteins of the respective modules were 323 superimposed on the same charts, disc levels combined or separated (Figure 3E-H; Supplemental 324 Figure S8). Module Y1 is functionally relevant to NP, containing previously reported NP and novel 325 markers KRT19, CD109, KRT8 and CHRD (Anderson et al., 2002), CHRDL2, SCUBE1 (Johnson 326 et al., 2012) and CLEC3B. Proteins levels in Y1 are lower on one side of the OAF, increases 327 towards the central NP, before declining towards the opposing side, forming a concave pattern in 328 both lateral and anteroposterior directions, with a shoulder drop occurring between IAF and OAF 329 (Figure 3E).

330 Module Y2 was enriched in proteins for ECM organisation (FDR q=8.8×10-24); Y3 was enriched 331 for proteins involved in smooth muscle contraction processes (FDR q=8.96×10-19); and Y4 was 332 enriched for proteins of the innate immune system (FDR q=2.93×10-20). Interestingly, these 333 modules all showed a tendency for convex patterns with upward expression toward the OAF

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

334 regions, in both lateral and anteroposterior directions, in all levels, combined or separated (Figure 335 3F-H; Supplemental S8B-D). The higher proportions of ECM, muscle contraction and immune 336 system proteins in the OAF are consistent with the contractile function of the AF (Nakai et al., 337 2016), and with the NP being avascular and “immune-privileged” in a young homeostatic 338 environment (Sun et al., 2020).

339 Inner disc regions are characterised with NP markers

340 The most distinctive pattern on the PCA is the separation of the OAF from the inner disc, with 99 341 proteins expressed higher in the OAF and 55 expressed higher in the inner disc (Figure 3I,J). 342 Notably, OAF and inner disc contained different types of ECM proteins. The inner disc regions 343 were enriched in collagens (COL3A1, COL5A1, COL5A2, and COL11A2), matrillin (MATN3), 344 and proteins associated with ECM synthesis (PCOLCE) and matrix remodelling (MXRA5). We 345 also identified in the inner disc previously reported NP markers (KRT19, KRT8) (Risbud et al., 346 2015), in addition to inhibitors of WNT (FRZB, DKK3) and BMP (CHRD, CHRDL2) signalling 347 (Figure 3I,J). Of note, FRZB and CHRDL2 were recently shown to have potential protective 348 characteristics in osteoarthritis (Ji et al., 2019). The TGFβ pathway appears to be suppressed in the 349 inner disc where antagonist CD109 (Bizet et al., 2011; Li et al., 2016) is highly expressed, and the 350 TGFβ activity indicator TGFBI is expressed higher in the OAF than in the inner disc.

351 The OAF signature is enriched with proteins characteristic of tendon and ligament

352 The OAF is enriched with various collagens (COL1A1, COL6A1/2/3, COL12A1 and COL14A1), 353 basement membrane (BM) proteins (LAMA4, LAMB2 and LAMC1), small leucine-rich 354 proteoglycans (SLRP) (BGN, DCN, FMOD, OGN, PRELP), and BM-anchoring protein (PRELP) 355 (Figure 3I,J). Tendon-related markers such as thrombospondins (THBS1/2/4) (Subramanian and 356 Schilling, 2014) and cartilage intermediate layer proteins (CILP/CILP2) are also expressed higher 357 in the OAF. Tenomodulin (TNMD) was exclusively expressed in 9 of the 12 young OAF profiles, 358 and not in any other compartments (Supplemental Table S2). This fits current understanding of 359 the AF as a tendon/ligament-like structure (Nakamichi et al., 2018). In addition, the OAF was 360 enriched in actin-myosin (Figure 3I), suggesting a role of contractile function in the OAF, and in 361 heat shock proteins (HSPA1B, HSPA8, HSPB1, HSP90B1, HSP90AA1), suggesting a stress 362 response to fluctuating mechanical loads.

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

363 Spatial proteome enables clear distinction between IAF and OAF

364 We sought to identify transitions in proteomic signatures between adjacent compartments. The NP 365 and NP/IAF protein profiles were highly similar (Supplemental Figure 6B). Likewise, NP/IAF and 366 IAF showed few DEPs, except COMP which was expressed higher in IAF (Supplemental Figure 367 6C). OAF and NP (Supplemental Figure 6R), and OAF and NP/IAF (Supplemental Figure 6O) 368 showed overlapping DEPs, consistent with NP and NP/IAF having highly similar protein profiles, 369 despite some differences in the anteroposterior direction (Supplemental Figure 6P,Q). The clearest 370 boundary within the IVD, between IAF and OAF, was marked by a set of DEPs, of which 371 COL5A1, SERPINA5, MXRA5 were enriched in the IAF, whereas LAMB2, THBS1, CTSD 372 typified the OAF (Supplemental Figure 6D). These findings agreed with the modular patterns 373 (Figure 3E-H).

374 The constant set shows common characteristics for the young reference architecture

375 Of the 1,880 proteins detected, 1204 proteins were not found to vary with respect to the phenotypic 376 factors. The majority of these proteins were detected in few profiles (Figure 3B) and were not used 377 in the comparisons. We set a cutoff for a detection in >1/2 of the profiles to prioritise a set of 245 378 proteins, hereby referred to as the ‘constant set’ (Figure 3C). Both the variable and the constant 379 sets contained high proportions of ECM proteins (Figure 3D). Amongst the proteins in the constant 380 set that were detected in all 33 young profiles were known protein markers defining a young NP 381 or disc, including COL2A1, ACAN and A2M (Risbud et al., 2015). Other key proteins in the 382 constant set included CHAD, HAPLN1, VCAN, HTRA1, CRTAC1, and CLU. Collectively, these 383 proteins showed the common characteristics shared by compartments of young discs, and they, 384 alongside the variable set, form the architectural landscape of the young disc.

385 Diverse changes in the spatial proteome with ageing

386 Fewer inner-outer differences but greater variation between levels in aged discs

387 PCA was used to identify compartmental, directional and level patterns for the aged discs (Figure 388 4A). Albeit less clear than for the young discs (Figure 3A), the OAF could be distinguished from 389 the inner disc regions on PC1, explaining 46.7% of the total variance (Figure 4A). PC2 showed a

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

390 more distinct separation of signatures from lumbar disc levels L5/S1 to the upper disc levels 391 (L4/L5 and L3/4), accounting for 21.8% of the total variance (Figure 4A).

392 Loss of the NP signature from inner disc regions

393 As with the young discs, we performed a series of comparative analyses (Figure 4C; Supplemental 394 Figure S9 A-H; Supplemental Table S3). Detection of DEPs between the OAF and the inner disc 395 (Figure 4B), showed that 100 proteins were expressed higher in the OAF, similar to young discs 396 (Figure 3C). However, in the inner regions, only 9 proteins were significantly expressed higher, in 397 marked contrast to the situation in young discs. Fifty-five of the 100 DEPs in the OAF region 398 overlapped in the same region in the young discs, but only 3 of the 9 DEPs in the inner region were 399 identified in the young disc; indicating changes in both regions but more dramatic in the inner 400 region. This suggests that ageing and associated changes may have initiated at the centre of the 401 disc. The typical NP markers (KRT8/19, CD109, CHRD, CHRDL2) were not detectable as DEPs 402 in the aged disc; but CHI3L2, A2M and SERPING1 (Figure 4B), which have known roles in tissue 403 fibrosis and wound healing (Lee et al., 2011; Naveau et al., 1994; Wang et al., 2019a), were 404 detected uniquely in the aged discs.

405 Gradual modification of ECM composition and cellular responses in the outer AF

406 A comparative analysis of the protein profiles indicated that the aged OAF retained 55% of the 407 proteins of a young OAF. These changes are primarily reflected in the class of SLRPs (BGN, DCN, 408 FMOD, OGN, and PRELP), and glycoproteins such as CILP, CILP2, COMP, FGA/B, and FGG. 409 From a cellular perspective, 45 proteins enriched in the aged OAF could be classified under 410 ‘responses to stress’ (FDR q=1.86×10-7; contributed by CAT, PRDX6, HSP90AB1, EEF1A1, 411 TUBB4B, P4HB, PRDX1, HSPA5, CRYAB, HIST1H1C), suggesting OAF cells are responding 412 to a changing environment such as mechanical loading and other stress factors.

413 Convergence of the inner disc and outer regions in aged discs

414 To map the relative changes between inner and outer regions of the aged discs, we performed a 415 systematic comparison between compartments (Supplemental Figure S9 A-D). The most 416 significant observation was a weakening of the distinction between IAF and OAF that was seen in 417 young discs, with only 17 DEPs expressed higher in the OAF of the aged disc (Supplemental

15 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

418 Figure S9C). More differences were seen when we included the NP in the comparison with OAF 419 (Supplemental Figure S9D), indicating some differences remain between inner and outer regions 420 of the aged discs. While the protein profiles of the NP and IAF were similar, with no detectable 421 DEPs (Supplemental Figure S9A,B), their compositions shared more resemblance with the OAF. 422 These progressive changes of the protein profiles and DEPs between inner and outer compartments 423 suggest the protein composition of the inner disc compartments becomes more similar to the OAF 424 with ageing, with the greatest changes in the inner regions. This further supports a change initiating 425 from the inner region of the discs with ageing.

426 Changes in young module patterns reflect convergence of disc compartments

427 We investigated protein composition in the young disc modules (Y1-Y4) across the lateral and 428 anteroposterior axes (Figure 4C-F). For module Y1 that consists of proteins defining the NP 429 region, the distinctive concave pattern has flattened along both axes, but more so for the 430 anteroposterior direction where the clear interface between IAF and OAF was lost (Figure 4C; 431 Supplemental Figure S9I). Similarly for modules Y2 and Y3, which consist of proteins defining 432 the AF region, the trends between inner and outer regions of the disc have changed such that the 433 patterns become more convex, with a change that is a continuum from inner to outer regions 434 (Figure 4D,E; Supplemental Figure S9J,K). These changes in modules Y1-3 in the aged disc 435 further illustrate the convergence of the inner and outer regions, with the NP/IAF becoming more 436 OAF-like. For Y4, the patterns along the lateral and anteroposterior axes were completely 437 disrupted (Figure 4F; Supplemental Figure S9J). As Y4 contains proteins involved in vascularity 438 and inflammatory processes, these changes indicate disruption of cellular homeostasis in the NP.

439 Disc level variations reflect spatial and temporal progression of disc changes

440 The protein profiles of the aged disc levels, consistent with the PCA findings, showed similarity 441 between L3/4 and L4/5 (Supplemental Figure S9E), but, in contrast to young discs, differences 442 between L5/S1 and L4/5 (Supplemental Figure S9F), and more marked differences between L5/S1 443 and L3/4 (Supplemental Figure S9G,H). Overall, the findings from PCA, protein profiles (Figure 444 2B-D) and MRI (Figure 1B) agree. As compared to young discs, the more divergent differences 445 across the aged disc levels potentially reflect progressive transmission from the initiating disc to

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

446 the adjacent discs with ageing. To further investigate the aetiologies underlying IDD, cross-age 447 comparisons are needed.

448 Aetiological insights uncovered by young/aged comparisons

449 Next, we performed extensive pair-wise comparisons between the young and aged samples under 450 a defined scheme (Supplemental Figure S10A; Supplemental Table S4). First, we compared all 33 451 young samples with all 33 aged samples, which identified 169 DEPs with 104 expressed higher in 452 the young and 65 expressed higher in the aged discs (Figure 5A). A simple GO term analysis 453 showed that the most important biological property for a young disc is structural integrity, which 454 is lost in aged discs (Figure 5B; Supplemental Table S5). The protein classes most enriched in the 455 young discs were related to cartilage synthesis, chondrocyte development, and ECM organisation 456 (Figure 5B). The major changes in the aged discs relative to young ones, were proteins involved 457 in cellular responses to an ageing environment, including inflammatory and cellular stress signals, 458 progressive remodelling of disc compartments, and diminishing metabolic activities (Figure 5C; 459 Supplemental Table S5).

460 Inner disc regions present with most changes in ageing

461 For young versus old discs, we compared DEPs of the whole disc (Figure 5A) with those from the 462 inner regions (Figure 5D) and OAF (Figure 5E) only. Seventy-five percent (78/104) of the down- 463 regulated DEPs (Figure 5F) were attributed to the inner regions, and only 17% (18/104) were 464 attributed to the OAF (Figure 5F). Similarly, 65% (42/65) of the up-regulated DEPs were solely 465 contributed by inner disc regions, and only 18.4% (12/65) by the OAF. Only 5 DEPs were higher 466 in the OAF, while 49 were uniquely up-regulated in the inner discs (Figure 5G). The key biological 467 processes in each of the compartments are highlighted in Figure 5F, G.

468 The changing biological processes in the aged discs

469 Expression of known NP markers was reduced in aged discs, especially proteins involved in the 470 ECM and its remodelling, where many of the core matrisome proteins essential for the structural 471 function of the NP were less abundant or absent. While this is consistent with previous 472 observations (Feng et al., 2006), of interest was the presence of a set of protein changes that were 473 also seen in the OAF, which was also rich in ECM and matrix remodelling proteins (HTRA1,

17 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

474 SERPINA1, SERPINA3, SERPINC1, SERPINF1, and TIMP3) and proteins involved in fibrotic 475 events (FN1, POSTN, APOA1, APOB), suggesting these changes are occurring in the aging IVDs 476 (Figure 5 F,G).

477 Proteins associated with cellular stress are decreased in the aged inner disc, with functions ranging 478 from molecular chaperones needed for (HSPB1, HSPA1B and HSPA9) to 479 modulation of oxidative stress (SOD1) (Figure 5F). SOD1 has been shown to become less 480 abundant in the aged IVD (Hou et al., 2014) and in osteoarthritis (Scott et al., 2010). HSPB1 is 481 cytoprotective and a deficiency is associated with inflammation reported in degenerative discs 482 (Wuertz et al., 2012). We found an increased concentration of (CLU) (Figure 5G), an 483 extracellular that aids the solubilisation of misfolded protein complexes by binding 484 directly and preventing protein aggregation (Trougakos, 2013; Wyatt et al., 2009), and also has a 485 role in suppressing fibrosis (Peix et al., 2018).

486 Inhibitors of WNT (DKK3 and FRZB), and antagonists of BMP/TGFβ (CD109, CHRDL2, DCN 487 FMOD, INHBA and THBS1) signalling were decreased or absent in the aged inner region (Figure 488 5F,G), consistent with the reported up-regulation of these pathways in IDD (Hiyama et al., 2010) 489 and its closely related condition osteoarthritis (Leijten et al., 2013). Targets of hedgehog signalling 490 (HHIPL2 and SCUBE1) were also reduced (Figure 5F), consistent with SHH’s key roles in IVD 491 development and maintenance (Rajesh and Dahia, 2018). TGFβ signalling is a well-known 492 pathway associated with fibrotic outcomes. WNT is known to induce chondrocyte hypertrophy 493 (Dong et al., 2006), that can be enhanced by a reduction in (Figure 5F), a known inhibitor 494 of chondrocyte hypertrophy (Saito et al., 2007).

495 To gain an overview of the disc compartment variations between young and aged discs, we 496 followed the same strategy as in Supplemental Figure S7 to aggregate three categories of DEPs in 497 all 23 comparisons (Supplemental Figure S10A), and created a heatmap from the resulting 719 498 DEPs. This allowed us to identify 6 major protein modules (Figure 5H). A striking feature is 499 module 6 (M6), which is enriched for proteins involved in the complement pathway (GSEA 500 Hallmark FDR q=4.9×10-14) and angiogenesis (q=2.3×10-3). This module contains proteins that 501 are all highly expressed in the inner regions of the aged disc, suggesting the presence of blood. M6 502 also contains the macrophage marker CD14, which supports this notion.

18 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

503 We visualised the relationship between the young (Y1-4) and young/aged (M1-6) modules using 504 an alluvial chart (Figure 5I). Y1 corresponds primarily to M1b that is enriched with fibrosis, 505 angiogenesis, apoptosis and EMT (epithelial to mesenchymal transition) proteins. Y2 seems to 506 have been deconstructed into three M modules (M2b > M3 > M4). M2b and M3 contain proteins 507 linked to heterogeneous functions, while proteins in M4 are associated with myogenesis and 508 cellular metabolism, but also linked to fibrosis and angiogenesis. Y3 primarily links to M2a with 509 a strong link to myogenesis, and mildly connects with M3 and M4. Y4 has the strongest connection 510 with M6a, which is linked to coagulation. Both the variable and the constant sets of the young disc 511 were also changed in ageing.

512 In all, the IVD proteome showed that with ageing, activities of the SHH pathways were decreased, 513 while those of the WNT and BMP/TGFβ pathways, EMT, angiogenesis, fibrosis, cellular stresses 514 and chondrocyte hypertrophy-like events were increased.

515 Concordant changes between the transcriptome and proteome of disc cells

516 The proteome reflects both current and past transcriptional activities. To investigate upstream 517 cellular and regulatory activities, we obtained transcriptome profiles from two IVD compartments 518 (NP and AF) and two sample states (young, scoliotic but non-degenerated, YND; aged individuals 519 with clinically diagnosed IDD, AGD) (Table 1). The transcriptome profiles of YND and AGD are 520 similar to the young and aged disc proteome samples, respectively. After normalisation 521 (Supplemental Figure S11A) and hierarchical clustering, we found patterns reflecting relationships 522 among IVD compartments and ages/states (Supplemental Figure S11B-D). PCA of the 523 transcriptome profiles showed that PC1 captured age/state variations (Figure 6A) and PC2 524 captured the compartment (AF or NP) differences, with a high degree of similarity to the proteomic 525 PCA that explained 65.0% of all data variance (Figure 2A).

526 Transcriptome shows AF-like characteristics of the aged/degenerated NP

527 We compared the transcriptome profiles of different compartments and age/state-groups. We 528 detected 88 DEGs (differentially expressed genes) between young AF and NP samples 529 (Supplemental Figure S11E); 39 were more abundant in NP (including known NP markers CD24, 530 KRT19) and 49 were more abundant in AF, including the AF markers, COL1A1 and THBS1. In

19 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

531 the AGD samples, 11 genes differed between AF and NP (Supplemental Figure S11F), comparable 532 to the proteome profiles (Figure 4C). Between the YND and AGD AF, there were 45 DEGs, with 533 COL1A1 and MMP1 more abundant in YND and COL10A1, WNT signalling (WIF1, WNT16), 534 inflammatory (TNFAIP6, CXCL14, IL11), and fibrosis-associated (FN1, CXCL14) genes more 535 abundant in AGD (Supplemental Figure S11G). The greatest difference was between YND and 536 AGD NP, with 216 DEGs (Supplemental Figure S11H), with a marked loss of NP markers 537 (KRT19, CD24), and gain of AF (THBS1, DCN), proteolytic (ADAMTS5), and EMT (COL1A1, 538 COL3A1, PDPN, NT5E, LTBP1) markers with age. Again, consistent with the proteomic findings, 539 the most marked changes are in the NP, with the transcriptome profiles becoming AF-like.

540 Concordance between transcriptome and proteome profiles

541 We partitioned the DEGs between the YND and AGD into DEGs for individual compartments 542 (Figure 6B). The transcriptomic (Figure 6B) Venn diagram was very similar to the proteomic one 543 (Figure 5F-G). For example, WNT/TGFβ antagonists and ECM genes were all down-regulated 544 with ageing/degeneration, while genes associated with stress and ECM remodelling were more 545 common. When we directly compared the transcriptomic DEGs and proteomic DEPs across 546 age/states and compartments (Figure 6C-F), we observed strong concordance between the two 547 types of datasets for a series of markers. In the young discs, concordant markers included KRT19 548 and KRT8, CHRDL2, FRZB, and DKK3 in the NP, and COL1A1, SERPINF1, COL14A1, and 549 THBS4 in the AF (Figure 6C). In the AGD discs, concordant markers included CHI3L2, A2M and 550 ANGPTL4 in the NP and MYH9, HSP90AB1, HBA1, and ACTA2 in the AF (Figure 6D). A high 551 degree of concordance was also observed when we compared across age/states for the AF (Figure 552 6E) and NP (Figure 6F).

553 Despite the transcriptomic samples having diagnoses (scoliosis for YND and IDD for AGD), 554 whereas the proteome samples were cadaver samples with no reported diagnosis of IDD, the 555 changes detected in the transcriptome profiles substantially support the proteomic findings. A 556 surprising indication from the transcriptome was the increased levels of COL10A1 (Lu et al., 557 2014), BMP2 (Grimsrud et al., 2001), IBSP, defensin beta-1 (DEFB1), ADAMTS5, pro- 558 inflammatory (TNFAIP6, CXCL) and proliferation (CCND1, IGFBP) genes in the AGD NP

20 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

559 (Supplemental Figure S11E-H), reaffirming the involvement of hypertrophic-like events (Melas 560 et al., 2014) in the aged and degenerated NP.

561 The genome-wide transcriptomic data included over 20 times more genes per profile than the 562 proteomic data, providing additional biological information about the disc, particularly low 563 abundance proteins, such as transcription factors and surface markers. For example, additional 564 WNT antagonists were WIF1 (Wnt inhibitory factor) and GREM1 (Figure 6B) (Leijten et al., 565 2013). Comparing the YND NP against YND AF or AGD NP (Supplemental Figure S11E,H), we 566 identified higher expression of three transcription factors, T (brachyury), HOPX (homeodomain- 567 only protein homeobox), and ZNF385B in the YND NP. Brachyury is a well-known marker for 568 the NP (Risbud et al., 2015), and HOPX is differentially expressed in mouse NP as compared to 569 AF (Veras et al., 2020), and expressed in mouse notochordal NP cells (Lam, 2013). Overall, 570 transcriptomic data confirmed the proteomic findings and revealed additional markers.

571 Changes in the active proteome in the ageing IVD

572 The proteomic data up to this point is a static form of measurement (static proteome) and represents 573 the accumulation and turnover of all proteins up to the time of harvest. The transcriptome indicates 574 genes that are actively transcribed, but does not necessarily correlate to translation or protein 575 turnover. Thus, we studied changes in the IVD proteome (dynamic proteome) that would reflect 576 newly synthesised proteins and proteins cleaved by proteases (degradome), and how they relate to 577 the static proteomic and transcriptomic findings reported above.

578 Aged or degenerated discs synthesise fewer proteins

579 We performed ex vivo labelling of newly synthesised proteins using the SILAC protocol (Ong et 580 al., 2002) (Figure 7A; Methods) on AF and NP samples from 4 YND individuals and one AGD 581 individual (Table 1). In the SILAC profiles, light isotope-containing signals correspond to the pre- 582 existing unlabelled proteome, and heavy isotope-containing signals to newly synthesised proteins 583 (Figure 7B). The ECM compositions in the light isotope-containing profiles (Figure 7B, middle 584 panel) are similar to the static proteome samples of the corresponding age groups described above 585 (Figure 1F). In contrast, the heavy isotope-containing profiles contained fewer proteins in the AGD

21 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

586 than in the YND samples (Figure 7B, left panel) and showed variable heavy to light ratio profiles 587 (Figure 7B, right panel).

588 To facilitate comparisons, we averaged the abundance of the proteins detected in the NP or AF for 589 which we had more than one sample, then ranked the abundance of the heavy isotope-containing 590 (Figure 7C) and light isotope-containing (Figure 7D) proteins. The number of proteins newly 591 synthesised in the AGD samples was about half that in the YND samples (Figure 7C). This is 592 unlikely to be a technical artefact as the total number of light isotope-containing proteins detected 593 in the AGD samples is comparable to the YND, in both AF and NP (Figure 7D), and the difference 594 is again well illustrated in the heavy to light ratios (Figure 7E).

595 Reduced synthesis of non-matrisome proteins was found for the AGD samples (GAPDH as a 596 reference point (dotted red lines) (Figures 7C,D; Figure 7C,E, grey portions). Of the 68 high 597 abundant non-matrisome proteins in the YND NP compartment that were not present in the AGD 598 NP, 28 are ribosomal proteins (Supplemental Figure S12B), suggesting reduced translational 599 activities. This agrees with our earlier findings of cellularity, as represented by histones, in the 600 static proteome (Figure 1J, K).

601 Changes in protein synthesis in response to the cell microenvironment affects the architecture of 602 the disc proteome. To understand how the cells may contribute and respond to the accumulated 603 matrisome in the young and aged disc, we compared the newly synthesised matrisome proteins of 604 AGD and YND samples rearranged in order of abundance (Figure 7E). More matrisome proteins 605 were synthesised in YND samples across all classes. In YND AF, collagens were synthesised in 606 higher proportions than in AGD AF with the exception of fibril-associated COL12A1 (Figure 7E, 607 top panel). Similarly, higher proportions in YND AF were observed for proteoglycans (except 608 FMOD), glycoproteins (except TNC, FBN1, FGG and FGA), ECM affiliated proteins (except 609 C1QB), ECM regulators (except SERPINF2, SERPIND1, A2M, ITIH2, PLG), and secreted 610 factors (except ANGPTL2). Notably, regulators that were exclusively synthesised in young AF 611 are involved in collagen synthesis (P4HA1/2, LOXL2, LOX, PLOD1/2) and matrix turnover 612 (MMP3), with enrichment of protease HTRA1 and protease inhibitors TIMP3 and ITIH in YND 613 AF compared to AGD AF.

22 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

614 In AGD NP, overall collagen synthesis was less than in YND NP (Figure 7E, lower panel); 615 however, there was more synthesis of COL6A1/2/3 and COL12A1. Furthermore, AGD NP 616 synthesised more LUM, FMOD, DCN, PRG4, and PRELP proteoglycans than YND NP. Notably, 617 there was less synthesis of ECM-affiliated proteins (except C1QC and SEMA3A) and regulators 618 – particularly those involved in collagen synthesis (P4HA1/2, LOXL2, LOX) – but an increase in 619 protease inhibitors. A number of newly synthesised proteins in AGD NP were similarly 620 represented in the transcriptome data, including POSTN, ITIH2, SERPINC1, IGFBP3, and PLG. 621 Some genes were simultaneously underrepresented in the AGD NP transcriptome and newly 622 synthesised proteins, including hypertrophy inhibitor GREM1 (Leijten et al., 2013).

623 Proteome of aged or degenerated discs is at a higher degradative state

624 The degradome reflects protein turnover by identifying cleaved proteins in a sample (Lopez-Otin 625 and Overall, 2002).When combined with relative quantification of proteins through the use of 626 isotopic and isobaric tagging and enrichment for cleaved neo amine (N)-termini of proteins before 627 labelled samples are quantified by mass-spectrometry, degradomics is a powerful approach to 628 identify the actual status of protein cleavage in vivo.

629 We employed the well validated and sensitive terminal amine isotopic labelling of substrates 630 (TAILS) method (Kleifeld et al., 2010; Rauniyar and Yates, 2014) to analyse and compare 6 discs 631 from 6 individuals (3 young or non-degenerated, YND; and 3 aged and degenerated, AGD) (Table 632 1) (Figure 7F) (Kleifeld et al., 2010; Rauniyar and Yates, 2014) using the 6-plex tandem mass tag 633 (TMT)-TAILS (labelling 6 independent samples and analysed together on the mass spectrometer), 634 with one of the YND samples (YND136) as the reference sample (Figure 7F). Whereas shotgun 635 proteomics is intended to identify the proteome components, N-terminome data is designed to 636 identify the exact cleavage site in proteins that also evidence stable cleavage products in vivo.

637 Here, TAILS identified 92 and 68 cleaved proteins in the AF and NP disc samples, respectively. 638 For each protein, we calculated the means and standard deviations in the AGD and YND samples, 639 and then ranked these according to the normalised differences between the two groups, for NP 640 (Figure 7G) and AF (Figure 7H). By setting a cut-off of absolute normalised difference of greater 641 than 1, we detected proteins that were either cleaved less or cleaved more in AGD samples 642 compared to the YND samples. Interestingly, in both AF and NP, the AGD samples had more

23 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

643 cleaved proteins than the YND samples. In the NP, 5 proteins were less cleaved in AGD and 10 644 proteins were more cleaved (Figure 7G); while in the AF, 5 were less cleaved and 17 were more 645 cleaved (Figure 7H).

646 The proteins with more cleavage sites were functionally relevant to the corresponding disc 647 compartments. For example, in the YND NP, there were more cleaved neo N-terminal peptides 648 from cleaved CHRDL2, COL2A1, immunoglobulins and LYZ; whereas the AGD NP was 649 enriched for semi-trypic cleaved neo N-terminal peptides corresponding to cleaved blood proteins 650 (A2M, ALB, HBA1) and matrix proteins FN1, COMP, and PRG4 (Figure 7G). In the AGD AF, 651 we quantified increased proportions of peptides relative to YND AF, that corresponded to cleaved 652 blood proteins (HBB, HBA1, C3, immunoglobulins), isoform 2 of clusterin (CLU), FN1, COMP, 653 HAPLN1, CILP, FGFBP2; whereas in the YND AF there were more peptides corresponding to 654 cleaved collagen types I, II and VI (Figure 7H).

655 The cleavage sites in the non-degenerated aged samples in both compartments did not deviate 656 noticeably from the reference sample, in terms of relative abundance. These results suggest that 657 protein degradation activity increases with age, whether or not degeneration occurs.

658 MRI landscape correlates with proteomic landscape

659 We tested for a correlation between MRI signal intensity and proteome composition. In 660 conventional 3T MRI of young discs, the NP is brightest reflecting its high hydration state while 661 the AF is darker, thus less hydrated (Figure 1A; Supplemental Figure S13B). Since aged discs 662 present with more MRI phenotypes, we used higher resolution MRI (7T) on them (Figure 1B; 663 Figure 8A), which showed less contrast between NP and AF than in the young discs. To enhance 664 robustness, we obtained three transverse stacks per disc level for the aged discs (Figure 8B; 665 Supplemental Figure S13D), and averaged the pixel intensities for the different compartments 666 showing that overall, the inner regions were still brighter than the outer (Figure 8C).

667 Next, we performed a level-compartment bi-clustering on the pixel intensities of the aged disc 668 MRIs, which was bound by disc level and compartment (Figure 8D). The findings resembled the 669 proteomic clustering and PCA patterns (Figure 2; Supplemental Figure S6V-W). We performed a 670 pixel intensity averaging of the disc compartments from the 3T images (Supplemental Figure 13B),

24 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

671 and a level-compartment bi-clustering on the pixel intensities (Supplemental Figure S13C). While 672 the clustering can clearly partition the inner from the outer disc compartments, the information 673 value from each of the compartments is less due to the lower resolution of the MRI. In all, these 674 results indicate a link between regional MRI landscapes and proteome profiles, prompting us to 675 investigate their potential connections.

676 Proteome-wide associations with MRI landscapes reveals a hydration matrisome

677 The MRI and the static proteome were done on the same specimens in both individuals, so we 678 could perform proteome-wide associations with the MRI intensities. We detected 85 significantly 679 correlated ECM proteins, hereby referred to as the hydration matrisome (Figure 8E). We found no 680 collagen to be positively correlated with brighter MRI, which fits current understanding as 681 collagens contribute to fibrosis and dehydration. Other classes of matrisome proteins were either 682 positively or negatively correlated, with differential components for each class (Figure 8E). 683 Positively correlated proteoglycans included EPYC, PRG4 (lubricin) and VCAN, consistent with 684 their normal expression in a young disc and hydration properties. Negatively correlated proteins 685 included OAF (TNC, SLRPs) and fibrotic (POSTN) markers (Figure 8E).

686 Given this MRI-proteome link and the greater dynamic ranges of MRI in the aged discs enabled 687 by the higher resolution 7T MRIs (Figure 8D), we hypothesised that the hydration matrisome 688 might be used to provide information about MRI intensities and thus disc hydration. To test this, 689 we trained a LASSO regression model (Tibshirani, 1996) of the aged MRIs using the hydration 690 matrisome (85 proteins), and applied the model to predict the intensity of the MRIs of the young 691 discs, based on the young proteome of the same 85 proteins. Remarkably, we obtained a PCC of 692 0.689 (p=8.9×10-6; Spearman=0.776) between the actual and predicted MRI (Supplemental Figure 693 S13E). The predicted MRI intensities of the young disc exhibited a smooth monotonic decrease 694 from the NP towards IAF, then dropped suddenly towards the OAF (Figure 8F, right panel), with 695 an ROC AUC (receiver operating characteristics, area under the curve) of 0.996 between IAF and 696 OAF (Supplemental Figure S13F). In comparison, actual MRIs exhibited a linear decrease from 697 NP to OAF (Figure 8F, left panel). On reviewing these two patterns, we argue that the predicted 698 intensities may be a more faithful representation of the young discs’ water contents than the actual 699 MRI, as it reflects the gross images (Supplemental Figure S1A), PCA (Figure 3A) and Y1 modular

25 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

700 trend in the young discs (Figure 3E). This exercise not only revealed the inherent connections 701 between regional MRI and regional proteome, but also identified a set of ECM components that is 702 predictive of MRI relating to disc hydration, which may be valuable for future clinical applications.

703

26 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

704 Discussion

705 The IVD is architecturally designed for its function. Thus, spatial information relating to the 706 molecular make-up in different compartments is critical for understanding its biology in health and 707 disease. This can only be obtained from intact IVDs. In our study, we provided information on the 708 spatial proteome of entire human IVDs encompassing different compartments along the lateral and 709 anteroposterior axes, and at different levels of the lumbar discs. This spatial proteome provides a 710 reference map of the proteomic architecture of a young disc which can be used to provide insights 711 into ageing and degeneration.

712 A discovery dataset was established from intact lumbar discs of a young and an aged individual, 713 consisting of 66 proteomic specimens in total; with 11 regional specimens per disc for three lumbar 714 disc levels per individual. The young discs were considered non-degenerated by disc morphologies 715 and MRI. The aged discs displayed degeneration characteristics including reduced MRI intensities 716 and annular tears. We extended the findings of this data set with transcriptomics, SILAC and 717 degradome profiling from another 15 individuals.

718 Using a well-established protein extraction protocol (Onnerfjord et al., 2012b), and 719 chromatographic fractionation of the peptides prior to mass spectrometry, we produced a dataset 720 of the human intervertebral disc comprising 3,100 proteins, encompassing ~400 matrisome and 721 2,700 non-matrisome proteins, with 1,769 proteins detected in 3 or more profiles, considerably 722 higher than recent studies (Maseda et al., 2016; Rajasekaran et al., 2020; Ranjani et al., 2016; 723 Sarath Babu et al., 2016). The high quality of our data enabled the application of unbiased 724 approaches including PCA and ANOVA to reveal the relative importance of the phenotypic 725 factors. Particularly, age was found to be the dominant factor influencing proteome profiles.

726 Comparisons between different compartments of the young disc produced a reference landscape 727 containing known (KRT8/19 for the NP; COL1A1, CILP and COMP for the AF) and novel (FRZB, 728 CHRD, CHRDL2 for the NP; TNMD, SLRPs and SOD1 for the AF) markers. The young healthy 729 discs were enriched for matrisome components consistent with a healthy functional young IVD. 730 Despite morphological differences between NP and IAF, the inner disc compartments (NP, 731 NP/IAF and IAF) display high similarities, in contrast to the large differences between IAF and 732 OAF, which was consistent for discs from all lumbar disc levels. This morphological-molecular

27 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

733 discrepancy might be accounted for by subtle differences in the ECM organisation, such as 734 differences in GAG moieties on proteoglycans, or levels of or other modifications 735 of ECM that diversify function (Silagi et al., 2018). Nonetheless, we partitioned the detected 736 proteins into a variable set that captures the diversity, and a constant set that lays the common 737 foundation of all young compartments, which work in synergy to achieve disc function.

738 Clustering analysis of the 671 DEPs of the variable set identified 4 key modules (Y1-Y4). Visually, 739 Y1 and Y2 mapped across the lateral and anteroposterior axes with opposing trends. Molecularly, 740 module Y1 (NP) contained proteins promoting regulation of matrix remodelling, such as matrix 741 degradation inhibitor MATN3 (Jayasuriya et al., 2012) and MMP inhibitor TIMP1. Inhibitors of 742 WNT and BMP signalling were also present. Module Y2 (OAF) included COL1A1, THBS1/2/3, 743 CILP1/CILP2 and TNMD, consistent with the OAF’s tendon-like features (Nakamichi et al., 744 2018). It also included a set of SLRPs that might play roles in regulation of collagen assembly 745 (Robinson et al., 2017; Taye et al., 2020), fibril alignment (Robinson et al., 2017), maturation and 746 crosslinking (Kalamajski et al., 2016); while others are known to inhibit or promote TGFβ 747 signalling (Markmann et al., 2000). Notably, the composition of the IAF appeared to be a transition 748 zone between NP and OAF rather than an independent compartment, as few proteins can 749 distinguish it from adjacent compartments. Classes of proteins in both Y3 (smooth muscle feature) 750 and Y4 (immune and blood) resemble Y2, which reflects the contractile property of the AF (Nakai 751 et al., 2016) and the capillaries infiltrating or present at the superficial outer surface of the IVD.

752 In the aged disc, the change in the DEPs between the inner and outer regions of the discs suggests 753 extensive changes in the inner compartment(s). Mapping the aged data onto modules Y1-Y4 754 allowed a visualisation of the changes. The flattening of the Y1 and Y2 modules along both the 755 lateral and anteroposterior axes indicated a convergence of the inner and outer disc. This is 756 supported by the observed rapid decline of NP proteins and increase of AF proteins in the inner 757 region. Fewer changes were seen in the aged OAF, which concurs with the notion that degenerative 758 changes originate from the NP and radiate outwards. The most marked change was seen in module 759 Y4 (blood), where the pattern was inverted, characterised by high expression in the NP but low in 760 OAF. While contamination cannot be excluded and there are reports that capillaries do not 761 infiltrate the NP even in degeneration (Nerlich et al., 2007), our finding is consistent with other 762 proteomic studies showing enrichment of blood proteins in pathological NP (Maseda et al., 2016).

28 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

763 The route of infiltration can be from the fissured AF or cartilage endplates. Calcified endplates are 764 more susceptible to microfractures, which can lead to blood infiltration into the NP (Sun et al., 765 2020). Of interest is the involvement of an immune response within the inner disc. This 766 corroborates reports of inflammatory processes in ageing and degenerative discs, with the up- 767 regulation of pro-inflammatory cytokines and presence of inflammatory cells (Molinos et al., 768 2015; Wuertz et al., 2012).

769 The SILAC and degradome studies provided important insights into age-related differences in the 770 biosynthetic and turnover activity in the IVD. The SILAC data indicated that protein synthesis is 771 significantly impaired in aged degenerated discs. These findings correlate with reports of reduced 772 cellularity in ageing (Rodriguez et al., 2011), which we have also ascertained by leveraging the 773 relationship of histones and housekeeping genes with cell numbers. From the TAILS degradome 774 analysis, we observed more cleaved protein fragments in aged compartments, particularly for 775 structural proteins important for tissue integrity such as COMP and those involved in cell-matrix 776 interactions such as FN1, which was coupled with the enrichment of the proteolytic process GO 777 terms in the aged static proteome. Collectively, this reveals a systematic modification and 778 replacement of the primary proteomic architecture of the young IVD with age that is associated 779 with diminished or failure in functional properties in ageing or degeneration.

780 Despite known transcriptome-proteome discordance (Fortelny et al., 2017), our identification of 781 concordant changes allow insights into active changes in the young and aged discs. For example, 782 inhibitors of the WNT pathway and antagonists of BMP/TGFβ signalling (Leijten et al., 2013) 783 were down regulated in the aged discs in both the proteome and transcriptome. Interestingly, the 784 activation of these pathways is known to promote chondrocyte hypertrophy (Dong et al., 2006), 785 and hypertrophy has been noted in IDD (Rutges et al., 2010). This suggests a model where the 786 regulatory environment suppressing cellular hypertrophy changes with ageing or degeneration, 787 resulting in conditions such as cellular senescence and tissue mineralisation that are part of the 788 pathological process. In support, S100A1, a known inhibitor of chondrocyte hypertrophy (Saito et 789 al., 2007) is down regulated, while chondrocyte hypertrophy markers COL10A1 and IBSP 790 (Komori, 2010) are up-regulated in the aged disc. Similar changes have been observed in ageing 791 mouse NP (Veras et al., 2020) as well as in osteoarthritis (Zhu et al., 2009) where chondrocyte 792 hypertrophy is thought to be involved in its aetiology (Ji et al., 2019; van der Kraan and van den

29 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

793 Berg, 2012). Given that WNT inhibitors are already in clinical trials for osteoarthritis (Wang et 794 al., 2019b), this may point to a prospective therapeutic strategy for IDD.

795 A key finding of our study is the direct demonstration, within a single individual, of association 796 between the hydration status of the disc as revealed by MRI, and the matrisome composition of 797 the disc proteome. The remarkable correlation between predicted hydration states inferred from 798 the spatial proteomic data and the high-definition phenotyping of the aged disc afforded by 7T 799 MRI has enormous potential for understanding the molecular processes underlying IDD.

800 In conclusion, we have generated a comprehensive dataset of the young and aged disc spatial 801 proteome, that provides detailed information on the proteome in the different compartments of the 802 IVD and the architectural changes that accompany ageing. The data reveal the role of the spatial 803 organisation of the proteome that underpins the functional properties of the different compartments 804 of the human IVD with implications for the underlying molecular pathology of degeneration 805 (Figure 8G, H). The rich datasets are a valuable resource for cross referencing with animal and in 806 vitro studies to evaluate clinical relevance and guide the development of therapeutics of human 807 IDD.

808

30 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

809 Materials and methods

810 Cadaveric specimens

811 Two human lumbar spines were obtained through approved regulations and governing bodies, with 812 one young (16M) provided by L.H. (McGill University) and one aged (59M) from Articular 813 Engineering, LLC (IL, USA). The young lumbar spine was received frozen as an intact whole 814 lumbar spine. The aged lumbar spine was received frozen, dissected into bone-disc-bone segments. 815 The cadaveric samples were stored at -80⁰C until use.

816 Clinical specimens

817 Clinical specimens were obtained with approval by the Institutional Review Board (references UW 818 13-576 and EC 1516-00 11/01/2001) and with informed consent in accordance with the Helsinki 819 Declaration of 1975 (revision 1983) from another 15 patients undergoing surgery for IDD, trauma 820 or AIS (adolescent idiopathic scoliosis) at Queen Mary Hospital (Hong Kong), and Duchess of 821 Kent Children’s Hospital (Hong Kong). Information of both the cadaveric and clinical samples are 822 summarised in Table 1.

823 MRI imaging of cadaveric samples

824 The discs were thawed overnight at 4⁰C, and then pre-equalised for scanning at room temperature. 825 For the young IVD, these were imaged together as the lumbar spine was kept intact. T2-weighted 826 and T1-weighted sagittal and axial MRI, T1-rho MRI and Ultrashort-time-to-echo MRI images 827 were obtained using a 3T Philips Achieva 3.0 system at the Department of Diagnostic Radiology, 828 The University of Hong Kong.

829 For the aged discs, the IVD were imaged separately as bone-disc-bone segments, at the Department 830 of Electrical and Electronic Engineering, The University of Hong Kong. The MRS and CEST 831 imaging were performed. The FOV for the CEST imaging was adjusted to 76.8×76.8 mm2 to 832 accommodate the size of human lumbar discs (matrix size = 64×64, slice thickness = 2 mm). All 833 MRI experiments were performed at room temperature using a 7 T pre-clinical scanner (70/16 834 Pharmascan, Bruker BioSpin GmbH, Germany) equipped with a 370 mT/m gradient system along

31 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

835 each axis. Single-channel volume RF coils with different diameters were used for the samples 836 based on size (60 mm for GAG phantoms and human cadaveric discs).

837 Image assessment of the aged lumbar IVD

838 The MRI images in the transverse view were then assessed for intensity of the image (brighter 839 signifying more water content). Three transverse MRI images per IVD were overlaid with a grid 840 representing the areas that were cut for mass-spectrometry measurements as outlined previously. 841 For each region, the ‘intensity’ was represented by the average of the pixel intensities, which were 842 graphically visualised and used for correlative studies.

843 Division of cadaveric IVD for mass spectrometry analysis

844 The endplates were carefully cut off with a scalpel, exposing the surface of the IVD, which were 845 then cut into small segments spanning seven segments in the central left-right lateral axis, and 846 five segments in the central anteroposterior axis (Figure 1C). In all, this corresponds to a total of 847 11 locations per IVD. Among them, 4 are from the OAF, 2 from the IAF (but only in the lateral 848 axis), 1 from the central NP, and 4 from a transition zone between IAF and the NP (designated 849 the ‘NP/IAF’). Samples were stored frozen at -80⁰C until use.

850 SILAC by ex vivo culture of disc tissues

851 NP and AF disc tissues from spine surgeries (Table 1) were cultured in custom-made Arg- and 852 Lys-free α-MEM (AthenaES) as per formulation of Gibco α-MEM (Cat #11900-024), 853 supplemented with 10% dialysed FBS (10,000 MWCO, Biowest, Cat# S181D), 854 penicillin/streptomycin, 2.2 g/L sodium bicarbonate (Sigma), 30 mg/L L-methionine (Sigma), 21 13 855 mg/L “heavy” isotope-labelled C6 L-arginine (Arg6, Cambridge Isotopes, Cat # CLM-2265-H), 856 146 mg/L “heavy” isotope-labelled 4,4,5,5-D4 L-Lysine (Lys4, Cambridge Isotopes, Cat # DLM-

857 2640). Tissue explants were cultured for 7 days in hypoxia (1% O2 and 5% CO2 in air) at 37⁰C 858 before being washed with PBS and frozen until use.

859 Protein extraction and preparation for cadaveric and SILAC samples

860 The frozen samples were pulverised using a freezer mill (Spex) under liquid nitrogen. Samples 861 were extracted using 15 volumes (w/v) of extraction buffer (4M guanidine hydrochloride (GuHCl),

32 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

862 50 mM sodium acetate, 100 mM 6-aminocaproic acid, and HALT protease inhibitor cocktail 863 (Thermo Fischer Scientific), pH 5.8). Samples were mechanically dissociated with 10 freeze-thaw 864 cycles and sonicated in a cold water bath, before extraction with gentle agitation at 4°C for 48 865 hours. Samples were centrifuged at 15,000g for 30 minutes at 4°C and the supernatant was ethanol 866 precipitated at a ratio of 1:9 for 16 hours at -20°C. The ethanol step was repeated and samples were 867 centrifuged at 5000 g for 45 min at 4°C, and the protein pellets were air dried for 30 min.

868 Protein pellets were re-suspended in fresh 4M urea in 50 mM ammonium bicarbonate, pH 8, using 869 water bath sonication to aid in the re-solubilisation of the samples. Samples underwent reduction 870 with TCEP (5mM final concentration) at 60°C for 1 hr, and alkylation with iodoacetamide (500 871 mM final concentration) for 20 min at RT. Protein concentration was measured using the BCA 872 assay (Biorad) according to manufacturer’s instructions. 200 μg of protein was then buffer 873 exchanged with 50 mM ammonium bicarbonate with centricon filters (Millipore, 30 kDa cutoff) 874 according to manufacturer’s instructions. Samples were digested with mass spec grade 875 Trypsin/LysC (Promega) as per manufacturer’s instructions. For SILAC-labelled samples, formic 876 acid was added to a final concentration of 1%, and centrifuged and the supernatant then desalted 877 prior to LC-MS/MS measurements. For the cadaveric samples, the digested peptides were then 878 acidified with TFA (0.1% final concentration) and quantified using the peptide quantitative 879 colorimetric peptide assay kit (Pierce, catalogue 23275) before undergoing fractionation using the 880 High pH reversed phase peptide fractionation kit (Pierce, catalogue number 84868) into four 881 fractions. Desalted peptides, were dried, re-suspended in 0.1% formic acid prior to LC-MS/MS 882 measurements.

883 Mass spectrometry for cadaveric and SILAC samples

884 Samples were loaded onto the Dionex UltiMate 3000 RSLC nano Liquid Chromatography coupled 885 to the Orbitrap Fusion Lumos Tribid Mass Spectrometer. Peptides were separated on a commercial 886 Acclaim C18 column (75 μm internal diameter × 50 cm length, 1.9 μm particle size; Thermo). 887 Separation was attained using a linear gradient of increasing buffer B (80% ACN and 0.1% formic 888 acid) and declining buffer A (0.1% formic acid) at 300 nL/min. Buffer B was increased to 30% B 889 in 210 min and ramped to 40% B in 10 min followed by a quick ramp to 95% B, where it was held 890 for 5 min before a quick ramp back to 5% B, where it was held and the column was re-equilibrated.

33 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

891 Mass spectrometer was operated in positive polarity mode with capillary temperature of 300°C. 892 Full survey scan resolution was set to 120 000 with an automatic gain control (AGC) target value 893 of 2 × 106, maximum ion injection time of 30 ms, and for a scan range of 400−1500 m/z. Data 894 acquisition was in DDA mode to automatically isolate and fragment topN multiply charged 895 precursors according to their intensities. Spectra were obtained at 30000 MS2 resolution with AGC 896 target of 1 × 105 and maximum ion injection time of 100 ms, 1.6 m/z isolation width, and 897 normalised collisional energy of 31. Preceding precursor ions targeted for HCD were dynamically 898 excluded of 50 s.

899 Label free quantitative data processing for cadaveric samples

900 Raw data were analysed using MaxQuant (v.1.6.3.3, Germany). Briefly, raw files were searched 901 using Andromeda search engine against human UniProt protein database (20,395 entries, Oct 902 2018), supplemented with sequences of contaminant proteins. Andromeda search parameters for 903 protein identification were set to a tolerance of 6 ppm for the parental peptide, and 20 ppm for 904 fragmentation spectra and trypsin specificity allowing up to 2 miscleaved sites. Oxidation of 905 methionine, carboxyamidomethylation of cysteines was specified as a fixed modification. Minimal 906 required peptide length was specified at 7 amino acids. Peptides and proteins detected by at least 907 2 label-free quantification (LFQ) ion counts for each peptide in one of the samples were accepted, 908 with a false discovery rate (FDR) of 1%. Proteins were quantified by normalised summed peptide 909 intensities computed in MaxQuant with the LFQ option enabled. A total of 66 profiles were 910 obtained: 11 locations × 3 disc levels × 2 individuals; with a median of 665 proteins (minimum 911 419, maximum 1920) per profile.

912 Data processing for SILAC samples

913 The high resolution, high mass accuracy mass spectrometry (MS) data obtained were processed 914 using Proteome Discoverer (Ver 2.1), wherein data were searched using Sequest algorithm against 915 Human UniProt database (29,900 entries, May 2016), supplemented with sequences of 916 contaminant proteins, using the following search parameters settings: oxidized methionine (M), 917 acetylation (Protein N-term), heavy Arginine (R6) and Lysine (K4) were selected as dynamic 918 modifications, carboxyamidomethylation of cysteines was specified as a fixed modification, 919 minimum peptide length of 7 amino acids was enabled, tolerance of 10 ppm for the parental

34 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

920 peptide, and 20 ppm for fragmentation spectra, and trypsin specificity allowing up to 2 miscleaved 921 sites. Confident proteins were identified using a target-decoy approach with a reversed database, 922 strict FDR 1% at peptide and PSM level. Newly synthesised proteins were heavy labelled with 923 Arg6- and Lys4 and the data was expressed as the normalised protein abundance obtained from 924 heavy (labelled)/light (un-labelled) ratio.

925 Degradome sample preparation, mass spectrometry and data processing

926 Degradome analyses was performed on NP and AF from three non-degenerated and three 927 degenerated individuals (Table 1). Frozen tissues were pulverised as described above and prepared 928 for TAILS as previously reported (Kleifeld et al., 2010). After extraction with SDS buffer (1% 929 SDS, 100 mM dithiothreitol, 1X protease inhibitor in deionised water) and sonication (three cycles, 930 15s/cycle), the supernatant (soluble fraction) underwent reduction at 37⁰C and alkylation with a 931 final concentration of 15mM iodoacetamide for 30 min at RT. Samples were precipitated using 932 chloroform/methanol, and the protein pellet air dried. Samples were re-suspended in 1M NaOH, 933 quantified by nanodrop, diluted to 100mM HEPES and 4M GnHCl and pH adjusted pH6.5-7.5) 934 prior to 6-plex TMT labelling as per manufacturer’s instructions (Sixplex TMT, Cat# 90061, 935 ThermoFisher Scientific). Equal ratios of TMT-labelled samples were pooled and 936 methanol/chloroform precipitated. Protein pellets were air-dried and re-suspended in 200mM 937 HEPES (pH8), and digested with trypsin (1:100 ratio) for 16 hr at 37⁰C, pH 6.5 and a sample was 938 taken for pre-TAILS. High-molecular-weight dendritic polyglycerol aldehyde (ratio of

939 5:1 w/w polymer to sample) and NaBH3CN (to a final concentration of 80 mM) was added, 940 incubated at 37⁰C for 16 hr, followed by quenching with 100 mM ethanolamine (30 min at 37⁰C) 941 and underwent ultrafiltration (MWCO of 10,000). Collected samples were desalted, acidified to 942 0.1% formic acid and dried, prior to MS analysis.

943 Samples were analysed on a Thermo Scientific Easy nLC-1000 coupled online to a Bruker 944 Daltonics Impact II UHR QTOF. Briefly, peptides were loaded onto a 20cm x 75µm I.D. analytical 945 column packed with 1.8µm C18 material (Dr. Maisch GmbH, Germany) in 100% buffer A (99.9%

946 H2O, 0.1% formic acid) at 800 bar followed by a linear gradient elution in buffer B (99.9% 947 acetonitrile, 0.1% formic acid) to a final 30% buffer B for a total 180 min including washing with 948 95% buffer B. Eluted peptides were ionized by ESI and peptide ions were subjected to tandem MS

35 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

949 analysis using a data-dependent acquisition method. A top17 method was employed, where the top 950 17 most intense multiply charged precursor ions were isolated for MS/MS using collision-induced- 951 dissociation, and actively excluded for 30s.

952 MGF files were extracted and searched using Mascot against the UniProt Homo sapiens database, 953 with semi-ArgC specificity, TMT6plex quantification, variable oxidation of methionine, variable 954 acetylation of N termini, 20 ppm MS1 error tolerance, 0.05 Da MS2 error tolerance and 2 missed 955 cleavages. Mascot .dat files were imported into Scaffold Q+S v4.4.3 for peptide identification 956 processing to a final FDR of 1%. Quantitative values were calculated through Scaffold and used 957 for subsequent analyses.

958 Transcriptomic samples: isolation, RNA extraction and data processing

959 AF and NP tissues from 4 individuals were cut into approximately 0.5cm3 pieces, and put into the 960 Dulbecco's modified Eagle's medium (DMEM) (Gibco) supplemented with 20 mM HEPES 961 (USB), 1% penicillin-streptomycin (Gibco) and 0.4% fungizone (Gibco). The tissues were 962 digested with 0.2% pronase (Roche) for 1 hour, and centrifuged at 200 g for 5 min to remove 963 supernatant. AF and NP were then digested by 0.1% type II collagenase (Worthington 964 Biochemical) for 14 hours and 0.05% type II collagenase for 8 hours, respectively. Cell suspension 965 was filtered through a 70 µm cell strainer (BD Falcon) and centrifuged at 200 g for 5 min. The cell 966 pellet was washed with buffered saline (PBS) and centrifuged again to remove the 967 supernatant.RNA was then extracted from the isolated disc cells using Absolutely RNA Nanoprep 968 Kit (Stratagene), following manufacturer’s protocol, and stored at -80°C until further processing.

969 The quality and quantity of total RNA were assessed on the Bioanalyzer (Agilent) using the RNA 970 6000 Nano total RNA assay. cDNA was generated using Affymetrix GeneChip Two-Cycle cDNA 971 Synthesis Kit, followed by in vitro transcription to produce biotin-labelled cRNA. The sample was 972 then hybridised onto the Affymetrix GeneChip Human Genome U133 Plus 2.0 Array. The array 973 image, CEL file and other related files were generated using Affymetrix GeneChip Command 974 Console. The experiment was conducted as a service at the Centre for PanorOmic Sciences of the 975 University of Hong Kong.

36 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

976 CEL and other files were loaded into GeneSpring GX 10 (Agilent) software. The RMA algorithm 977 was used for probe summation. Data were normalised with baseline transformed to median of all 978 samples. A loose filtering based on the raw intensity values was then applied to remove 979 background noise. Consequently, transcriptomic data with a total of 54,675 probes (corresponding 980 to 20,887 genes) and 8 profiles were obtained.

981 Bioinformatics and functional analyses

982 The detected proteins were compared against the transcription factor (TF) database (Vaquerizas et 983 al., 2009) and the human genome nomenclature consortium database for cell surface markers 984 (CDs) (Braschi et al., 2019), where 77 TFs and 83 CDs were detected (Supplemental Figure S3A). 985 Excluding missing values, the LFQ levels among the data-points range from 15.6 to 41.1, with a 986 Gaussian-like empirical distribution (Supplemental Figure S5A). The numbers of valid values per 987 protein were found to decline rapidly when they were sorted in descending order (Supplemental 988 Figure S5B, upper panel). To perform principal component analyses (PCAs), only a subset of 989 genes with sufficiently large numbers of valid values (i.e. non-missing values) were used. The cut- 990 off for this was chosen based on a point corresponding to the steepest slope of descending order 991 of valid protein numbers (Supplemental Figure S5B, second panel), such that the increase of valid 992 values is slower than the increase of missing values beyond that point. Subsequently, the top 507 993 genes were picked representing 59.8% of all valid values. This new subset includes 12.4% of all 994 missing values. Since the subset of data still contains some missing values, an imputation strategy 995 was adopted employing the Multiple Imputation by Chained Equations (MICE) method and 996 package (van Buuren and Groothuis-Oudshoorn, 2011), with a max iteration set at 50 and the 997 default PMM method (predictive mean matching). To further ensure normality, Winsorisation was 998 applied such that genes whose average is below 5% or above 95% of all genes were also excluded 999 from PCA. The data was then profile-wise standardised (zero-mean and 1 standard deviation) 1000 before PCA was applied on the R platform (Team, 2013).

1001 To assess the impact of the spatiotemporal factors on the proteomic profiles, we performed 1002 Analysis of Variance (ANOVA), correlating each protein to the age, compartments, level, and 1003 directionality. To draw the soft boundaries on the PCA plot between groups of samples, support 1004 vector machines with polynomial (degree of 2) kernel were applied using the LIBSVM package

37 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1005 (Chang and Lin, 2011) and the PCA coordinates as inputs for training. A meshed grid covering the 1006 whole PCA field was created to make prediction and draw probability contours for -0.5, 0, and 0.5 1007 from the fitted model. Hierarchical clustering was performed with (1- correlation coefficient) as 1008 the distance metrics unless otherwise specified.

1009 To address the problem of ‘dropout’ effects while avoiding extra inter-dependency introduced due 1010 to imputations, we adopted three strategies in calculating the differentially expressed proteins 1011 (DEPs), namely the statistical DEPs (sDEPs), exclusive DEPs (eDEPs), and the fold-change DEPs 1012 (fDEPs). First, for the proteins that have over half valid values in both groups under comparison, 1013 we performed t-testing with p-values adjusted for multiple testing by the false discovery rate 1014 (FDR). Those with FDR below 0.05 were considered sDEPs. Second, for the proteins where one 1015 group has some valid values while the other group is completely not detected, we considered the 1016 ones with over half valid values in one group to be eDEPs. For those proteins that were expressed 1017 in <50% in both groups, we considered the ones with fold-change greater than 2 to be fDEPs.

1018 To fit the lateral and anteroposterior trends for the modules of genes identified in the young 1019 samples, a Gaussian Process Estimation (GPE) model was trained using the GauPro package in R 1020 (Team, 2013). Pathway analyses was conducted on the GSEA (Subramanian et al., 2005). 1021 Signalling proteins was compiled based on 25 Signal transduction pathways listed on KEGG 1022 (Kanehisa et al., 2019).

1023 The LASSO model between MRI and proteome was trained using the R package “glmnet”, 1024 wherein the 85 ECMs were first imputed for missing values in them using MICE. Nine ECMs 1025 were not imputed for too many missing values, leaving 76 for training and testing. The best value 1026 for λ was determined by cross-validations. A model was then trained on the aged MRIs (dependent 1027 variable) and aged proteome of the 76 genes (independent variable). The fitted model was then 1028 applied to the young proteome to predict MRIs of the young discs.

1029 Raw data depository and software availability

1030 The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium 1031 via the PRIDE (Vizcaino et al., 2016) repository with the following dataset identifiers for cadaver 1032 samples (PXD017774), SILAC samples (PXD018193), and degradome samples

38 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1033 (PXD018298000). The RAW data for the transcriptome data has been deposited on NCBI GEO 1034 with accession number GSE147383. The custom scripts for processing and analysing the data were 1035 housed at github.com/hkudclab/human_disc_proteome. An interactive web interface for the data 1036 is available at sbms.hku.hk/dclab/human_disc_proteome.

39 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1037 Tables

1038 Table 1. Summary of disc samples.

Reason for Samples Age Sex Disc level/s Disc regions surgery Cadaver samples Young spine 16 M L3/4, L4/5, L5/S1 NP, NP/IAF, IAF, OAF N/A Aged spine 59 M L3/4, L4/5, L5/S1 NP, NP/IAF, IAF, OAF N/A Transcriptome samples YND74 17 M L1/2 NP, OAF Scoliosis YND88 16 M L1/2 NP, OAF Scoliosis AGD40 62 F L4/5 NP, OAF Degeneration AGD45 47 M L4/5 NP, OAF Degeneration SILAC samples YND148 19 F L2/3 OAF Scoliosis YND149 15 F L1/2 OAF Scoliosis YND151 15 F L1/2 NP, OAF Scoliosis YND152 14 F L1/2 NP, OAF Scoliosis AGD80 63 M L4/5 NP, OAF Degeneration Degradome samples YND136 17 F L1/2 NP, OAF Scoliosis YND141 20 F L1/2 NP, OAF Scoliosis YND143 53 M L1/2 NP, OAF Trauma AGD62 55 F L5/S1 NP, OAF Degeneration AGD65 68 F L4/5 NP, OAF Degeneration AGD67 55 M L4/5 NP, OAF Degeneration 1039

40 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1040 Table 2. Commonly expressed ECM and associated proteins across all 66 profiles.

Categories (Number of proteins) Protein names Core matrisome Collagens (13) COL1A1/2, COL2A1, COL3A1, COL5A1, COL6A1/2/3, COL11A1/2, COL12A1, COL14A1, COL15A1 Proteoglycans (14) ACAN, ASPN, BGN, CHAD, DCN, FMOD, HAPLN1, HSPG2, LUM, OGN, OMD, PRELP, PRG4, VCAN Glycoproteins (34) ABI3BP, AEBP1, CILP, CILP2, COMP, DPT, ECM2, EDIL3, EFEMP2, EMILIN1, FBN1, FGA, FGB, FN1, FNDC1, LTBP2, MATN2/3, MFGE8, MXRA5, NID2, PCOLCE, PCOLCE2, PXDN, SMOC1/2, SPARC, SRPX2, TGFBI, THBS1/2/4, TNC, TNXB Other matrisome ECM affiliated proteins (10) ANXA1/2/4/5/6, CLEC11A, CLEC3A/B, CSPG4, SEMA3C ECM regulators (16) A2M, CD109, CST3, F13A1, HTRA1, HTRA3, ITIH5, LOXL2/3, PLOD1, SERPINA1/3/5, SERPINE2, SERPING1, TIMP1 Secreted factors (2) ANGPTL2, FGFBP2 1041

41 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1042 Supplemental Tables

1043 Supplemental Table 1. Raw data of the 66 LC-MS/MS static proteome profiles.

1044 Supplemental Table 2. Differentially expressed proteins (DEPs) among pairs of sample groups 1045 within the 33 young disc profiles.

1046 Supplemental Table 3. Differentially expressed proteins (DEPs) among pairs of sample groups 1047 within the 33 aged disc profiles.

1048 Supplemental Table 4. Differentially expressed proteins (DEPs) between young and aged 1049 sample groups.

1050 Supplemental Table 5. Significantly enriched (GO) terms associated with 1051 proteins expressed higher in all young or all aged discs. 1052

42 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1053 Figure legends

1054 Figure 1. MRI, outline of workflow, and global overview of data. 1055 (A) Clinical T2-weighted MRI images of the young lumbar discs in the sagittal and coronal 1056 plane (left and right panels), T1 MRI image of the young lumbar spine (middle panel). 1057 (B) High resolution T2-weighted MRI of the aged lower lumbar spine in sagittal (left panel) 1058 and coronal plane (right pane). 1059 (C) Diagram showing the anatomy of the IVD and locations from where the samples were 1060 taken. VB: vertebral body; NP, nucleus pulposus; AF, annulus fibrosus; IAF: inner AF; 1061 OAF: outer AF; NP/IAF: a transition zone between NP and IAF. 1062 (D) Schematic diagram showing the structure of the data, and the flow of analyses. 1063 (E) Venn diagrams showing the overlap of detected proteins in the four major compartments. 1064 Top panel, young and aged profiles; middle, young only; bottom, aged only. 1065 (F) Barchart showing the numbers of proteins detected per sample, categorised into 1066 matrisome (coloured) or non-matrisome proteins (grey). 1067 (G) Barcharts showing the composition of the matrisome and matrisome-associated proteins. 1068 Heights of bars indicate the number of proteins in each category expressed per sample. 1069 The N number in brackets indicate the aggregate number of proteins. 1070 (H) Violin plots showing the level of sub-categories of ECMs in different compartments of 1071 the disc. The green number on top of each violin shows its median. LFQ: Label Free 1072 Quantification. 1073 (I) Top 30 HGNC gene families for all non-matrisome proteins detected in the dataset. 1074 (J) Violin plots showing the averaged expression levels of 10 detected histones across the 1075 disc compartments and age-groups. 1076 (K) Scatter-plot showing the co-linearity between GAPDH and histones.

1077 Figure 2. Principle component analysis (PCA) of the 66 profiles based on a set of 507 genes 1078 selected by optimal cut-off (see Supplemental Figure S5A-C). 1079 (A) Scatter-plot of PC1 and PC2 color-coded by compartments, and dot-shaped by age- 1080 groups. Solid curves are the support vector machines (SVMs) decision boundaries 1081 between inner disc regions (NP, NP/IAF, IAF) and OAF, and dashed curves are soft 1082 boundaries for probability equal to ±0.5 and are applied to all plots in this figure.

43 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1083 (B) Scatter-plot of PC1 and PC2 color-coded by disc levels. The SVM boundaries are trained 1084 between L5/S1 and upper levels (L3/4 and L4/5). 1085 (C) Scatter-plot of PC1 and PC3, color-coded by disc compartments. The SVM boundaries 1086 are trained between inner disc regions and OAF. 1087 (D) Scatter-plot of PC1 and PC3, color-coded by disc levels. The SVM boundaries are 1088 trained between L5/S1 and upper levels (L3/4 and L4/5). 1089 (E) Top 100 positively and negatively correlated genes with PC1, color-coded by ECM 1090 categories. 1091 (F) Top 100 positively and negatively correlated genes with PC2, color-coded by ECM 1092 categories. 1093 (G) Top 100 positively and negatively correlated genes with PC3, color-coded by ECM 1094 categories.

1095 Figure 3. Delineating the young and healthy disc proteome. 1096 (A) PCA plot of all 33 young profiles. Curves in the upper panel show the SVM boundaries 1097 between the OAF and inner disc regions, those in the lower panel separate the L5/S1 disc 1098 from the upper disc levels. L, left; R, right; A, anterior; P, posterior. 1099 (B) A schematic illustrating the partitioning of the detected human disc proteome into 1100 variable and constant sets. 1101 (C) A histogram showing the distribution of non-DEPs in terms of their detected frequencies 1102 in the young discs. Only 245 non-DEP proteins were detected in over 16 profiles, which 1103 is thus defined to be the constant set; while the remaining ~1,000 proteins were 1104 considered marginally detected. 1105 (D) Piecharts showing the ECM compositions in the variable (left) and constant (right) sets. 1106 The constant set proteins that were detected in all 33 young profiles are listed at the 1107 bottom. 1108 (E) Normalised expression (Z-scores) of proteins in the young module Y1 (NP signature) 1109 laterally (top panel) and anteroposterially (bottom panel), for all three disc levels 1110 combined. The red curve is the Gaussian Process Estimation (GPE) trendline, and the 1111 blue curves are 1 standard deviation above or below the trendline. 1112 (F) Lateral trends of module Y2 (AF signature) for each of the three disc levels.

44 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1113 (G) Lateral trends of module Y3 (Smooth muscle cell signature) for each of the three disc 1114 levels. 1115 (H) Lateral trends of module Y4 (Immune and blood) for each of the three disc levels. 1116 (I) Volcano plot of differentially expressed proteins (DEPs) between OAF and inner disc (an 1117 aggregate of NP, NP/IAF, IAF), with coloured dots representing DEPs. 1118 (J) A functional categorisation of the DEPs in (I).

1119 Figure 4. Characterisation of the aged disc proteome. 1120 (A) PCA plot of all the aged profiles on PC1 and PC2, color-coded by compartments. Curves 1121 in the left panel show the SVM boundaries between OAF and inner disc; those in the 1122 right panel separate the L5/S1 disc from the upper disc levels. Letters on dots indicate 1123 directions: L, left; R, right; A, anterior; P, posterior. 1124 (B) Volcano plot showing the DEPs between the OAF and inner disc (an aggregate of NP, 1125 NP/IAF and IAF), with the coloured dots representing statistically significant 1126 (FDR<0.05) DEPs. 1127 (C) Using the same 4 modules identified in young samples, we determined the trend for these 1128 in the aged samples. Locational trends of module Y1 showing higher expression in the 1129 inner disc, albeit they are more flattened than in the young disc samples. Top panel shows 1130 left to right direction and bottom panel shows anterior to posterior direction. The red 1131 curve is the Gaussian Process Estimation (GPE) trendline, and the blue curves are 1 1132 standard deviation above or below the trendline. This also applies to (D), (E) and (F). 1133 (D) Lateral trends for module Y2 in the aged discs. 1134 (E) Lateral trends for module Y3 in the aged discs. 1135 (F) Lateral trends for module Y4 in the aged discs.

1136 Figure 5. Comparison between young and aged samples.

1137 (A) Volcano plot showing the DEPs between all the 33 young and 33 aged profiles. Coloured 1138 dots represent statistically significant DEPs. 1139 (B) GO term enrichment of DEPs higher in young profiles. 1140 (C) GO term enrichment of DEPs higher in aged profiles. Full names of GO terms in (B) and 1141 (C) are listed in Supplemental Table S5.

45 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1142 (D) Volcano plot showing DEPs between aged and young inner disc regions. 1143 (E) Volcano plot showing DEPs between aged and young OAF. 1144 (F) Venn diagram showing the partitioning of the young/aged DEPs that were down- 1145 regulated in aged discs, into contributions from inner disc regions and OAF. 1146 (G) Venn diagram showing the partitioning of the young/aged DEPs that were up-regulated 1147 in aged discs, into contributions from inner disc regions and OAF. 1148 (H) A heat map showing proteins expressed in all young and aged disc, with the 1149 identification of 6 modules (module 1: higher expression in young inner disc regions, 1150 modules 2 and 4: higher expression in young OAF, module 3: highly expressing in aged 1151 OAF, module 5: higher expression across all aged samples, and module 6: higher 1152 expression in aged inner disc, and some OAF). 1153 (I) An alluvial chart showing the six modules identified in (H) and their connections to the 1154 previously identified four modules and constant set in the young reference proteome; as 1155 well as their connections to enriched GO terms.

1156 Figure 6. Concordance between proteomic and transcriptome data.

1157 (A) A PCA plot of the 8 transcriptomic profiles. Curves represent SVM boundaries between 1158 patient-groups or compartments. 1159 (B) Venn diagrams showing the partitioning of the young/aged DEGs into contributions from 1160 inner disc regions and OAF. Left: down-regulated in AGD samples; right: up-regulated. 1161 (C) Transcriptome data from the NP and AF of two young individuals were compared to the 1162 proteomic data, with coloured dots representing identified proteins also expressed at the 1163 transcriptome level. 1164 (D) Transcriptome and proteome comparison of aged OAF and NP. 1165 (E) Transcriptome and proteome comparison of young and aged OAF. 1166 (F) Transcriptome and proteome comparison of young and aged NP.

1167 Figure 7. The dynamic proteome of the intervertebral disc shows less biosynthesis of proteins in 1168 aged tissues.

1169 (A) Schematic showing pulse-SILAC labelling of ex-vivo cultured disc tissues where heavy 1170 Arg and Lys are incorporated into newly made proteins (heavy), and pre-existing proteins

46 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1171 remaining unlabelled (light). NP and AF tissues from young (n=3) and aged (n=1) were 1172 cultured for 7 days in hypoxia prior to MS. 1173 (B) Barcharts showing the number of identified non-matrisome (grey) and matrisome 1174 (coloured) existing proteins (middle panel); newly synthesised proteins (left panel), and 1175 the heavy/light ratio (right panel) for each of the samples. 1176 (C) The quantities of each of the heavy labelled (newly synthesised) proteins identified for 1177 each of the four groups were averaged, and then plotted in descending order of 1178 abundance. It shows that YND AF and NP synthesise higher numbers of proteins than the 1179 AGD AF and NP. The red dotted reference line shows the expression of GAPDH. 1180 (D) The quantities of each of the light (existing) proteins identified for each group was 1181 averaged, and then plotted in descending order of abundance which shows that there are 1182 similar levels of existing proteins in the four pooled samples. 1183 (E) The matrisome proteins of (C) were singled out for display. The abundance of these 1184 proteins in YND samples were generally higher across all types of matrisome proteins 1185 than the AGD, with the exceptions of aged related proteins. 1186 (F) Schematic showing the workflow of degradome analysis by N-terminal amine isotopic 1187 labelling (TAILS) for the identification of cleaved neo N-terminal peptides. 1188 (G) Identification of cleaved proteins ranked according to tandem mass tag (TMT) isobaric 1189 labelling of N-terminal peptides in NP. The numbers of N-terminal peptides identified 1190 from corresponding proteins (x axis) were plotted in increasing ratios, with those that 1191 were less cleaved in AGD NP to the left, whereas proteins cleaved at a greater number of 1192 sites or in increased amount per site in AGD NP to the right and included a higher 1193 proportion of ECM and plasma protein N-terminal peptides that were identified. Data is

1194 expressed as the log2(ratio) of N-terminal peptides relative to the YND136 sample. n.s.: 1195 not significant. 1196 (H) Comparison of protein degradation in AF. The numbers of peptides identified from 1197 corresponding proteins (x axis) were plotted in increasing ratios, with those that were less 1198 cleaved in AGD AF to the left, whereas proteins cleaved at a greater number of sites or in 1199 increased amount per site in AGD AF to the right. A higher proportion of ECM and 1200 plasma protein N-terminal peptides were also identified in AGD samples, whilst in YND

47 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1201 AF, the highest amount of detected peptides came from collagens. Data is expressed as

1202 the log2(ratio) relative to the YND136 sample.

1203 Figure 8. MRI intensities and their correlation with the proteomic data. 1204 (A) The middle MRI stack of each disc level in the aged cadaveric sample. 1205 (B) Schematic of the disc showing the three stacks of MRI images per disc. 1206 (C) Violin plots showing the pixel intensities within each location per disc level, 1207 corresponding to the respective locations taken for mass spectrometry measurements. 1208 Each violin-plot is the aggregate of three stacks of MRIs per disc. 1209 (D) A heatmap bi-clustering of levels and compartments based on the MRI intensities. 1210 (E) The hydration ECMs: the ECM proteins most positively and negatively correlated with 1211 MRI. 1212 (F) The 3T MRI intensities of the young discs across the compartments (left), and the 1213 predicted MRI intensities based on a LASSO regression model trained on the hydration 1214 ECMs (right). 1215 (G) A water-tank model of the dynamics in disc proteomics showing the balance of the 1216 proteome is maintained by adequate anabolism to balance catabolism. 1217 (H) Diagram showing the partitioning of the detected proteins into variable and constant sets, 1218 whereby four modules characterising the young healthy disc were further derived; and 1219 showing their changes with ageing. SMC: smooth muscle cell markers.

48 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1220 Author contributions

1221 VT handled IRB/IC, coordinated MRI measurements, performed tissue dissection, sample 1222 preparation, data processing (MaxQuant), analyses (Perseus) and interpretation, prepared figures. 1223 PKC performed bioinformatics analyses (ANOVA, PCA, SVM, DEGs, GPE, LASSO), and 1224 prepared figures. VT and PKC wrote the first draft of the manuscript. AY generated the microarray 1225 data. RS, NS and TK performed the mass spectrometry and processed the data, supervised and 1226 funded by CMO. MK, PS and WC were involved in interpretation. LH provided the young spine 1227 and interpreted the data. KC provided critical input, interpreted results and was involved in writing. 1228 DC conceived ideas, supervised the project, interpreted data and results, wrote the manuscript. All 1229 authors contributed to writing and approved the manuscript.

1230 Acknowledgments

1231 We thank Dr Ed Wu and Dr Anna Wang of the Dept of EEE at HKU for performing the high- 1232 resolution MRI on the aged discs. We thank Dr Dino Samartzis for arranging the MRI of the young 1233 lumbar spine, and Prof. Kenneth Cheung and Dr Jason Cheung for collecting surgical disc 1234 specimens. Part of this work was supported by the Theme-based Research Scheme (T12-708/12N) 1235 and Area of Excellence (AoE/M-04/04) of the Hong Kong Research Grants (RGC) Council 1236 awarded to KSEC (Project Coordinator and PI) and DC (co-PI), the RGC European Union - Hong 1237 Kong Research and Innovation Cooperation Co-funding Mechanism (E-HKU703/18) awarded to 1238 DC, and by the Ministry of Science and Technology of the People’s Republic of China: National 1239 Strategic Basic Research Program (“973”) (2014CB942900) awarded to DC. The TAILS analyses 1240 were supported by a Canadian Institutes of Health Research Foundation Grant FDN-148408 to 1241 CMO, who holds a Canada Research Chair in Protease Proteomics and Systems Biology.

1242

49 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1243 References

1244 Anderson, R.M., Lawrence, A.R., Stottmann, R.W., Bachiller, D., and Klingensmith, J. (2002). Chordin 1245 and noggin promote organizing centers of forebrain development in the mouse. Development 129, 1246 4975-4987. 1247 Barber, R.D., Harmer, D.W., Coleman, R.A., and Clark, B.J. (2005). GAPDH as a housekeeping gene: 1248 analysis of GAPDH mRNA expression in a panel of 72 human tissues. Physiol Genomics 21, 389- 1249 395. 1250 Bizet, A.A., Liu, K., Tran-Khanh, N., Saksena, A., Vorstenbosch, J., Finnson, K.W., Buschmann, M.D., 1251 and Philip, A. (2011). The TGF-beta co-receptor, CD109, promotes internalization and degradation 1252 of TGF-beta receptors. Biochim Biophys Acta 1813, 742-753. 1253 Braschi, B., Denny, P., Gray, K., Jones, T., Seal, R., Tweedie, S., Yates, B., and Bruford, E. (2019). 1254 Genenames.org: the HGNC and VGNC resources in 2019. Nucleic Acids Res 47, D786-D792. 1255 Chang, C.C., and Lin, C.J. (2011). LIBSVM: A Library for Support Vector Machines. Acm T Intel Syst 1256 Tec 2. 1257 Chen, J., Yan, W., and Setton, L.A. (2006). Molecular phenotypes of notochordal cells purified from 1258 immature nucleus pulposus. Eur Spine J 15 Suppl 3, S303-311. 1259 Dong, Y.F., Soung do, Y., Schwarz, E.M., O'Keefe, R.J., and Drissi, H. (2006). Wnt induction of 1260 chondrocyte hypertrophy through the Runx2 transcription factor. J Cell Physiol 208, 77-86. 1261 Feng, H., Danfelter, M., Stromqvist, B., and Heinegard, D. (2006). Extracellular matrix in disc 1262 degeneration. J Bone Joint Surg Am 88 Suppl 2, 25-29. 1263 Fortelny, N., Overall, C.M., Pavlidis, P., and Freue, G.V.C. (2017). Can we predict protein from mRNA 1264 levels? Nature 547, E19-E20. 1265 Fujita, N., Miyamoto, T., Imai, J., Hosogane, N., Suzuki, T., Yagi, M., Morita, K., Ninomiya, K., 1266 Miyamoto, K., Takaishi, H., et al. (2005). CD24 is expressed specifically in the nucleus pulposus 1267 of intervertebral discs. Biochem Biophys Res Commun 338, 1890-1896. 1268 Grimsrud, C.D., Romano, P.R., D'Souza, M., Puzas, J.E., Schwarz, E.M., Reynolds, P.R., Roiser, R.N., and 1269 O'Keefe, R.J. (2001). BMP signaling stimulates chondrocyte maturation and the expression of 1270 Indian hedgehog. J Orthop Res 19, 18-25. 1271 Hiyama, A., Sakai, D., Risbud, M.V., Tanaka, M., Arai, F., Abe, K., and Mochida, J. (2010). Enhancement 1272 of intervertebral disc cell senescence by WNT/beta-catenin signaling-induced matrix 1273 metalloproteinase expression. Arthritis Rheum 62, 3036-3047. 1274 Hou, G., Lu, H., Chen, M., Yao, H., and Zhao, H. (2014). Oxidative stress participates in age-related 1275 changes in rat lumbar intervertebral discs. Arch Gerontol Geriatr 59, 665-669. 1276 Humzah, M.D., and Soames, R.W. (1988). Human intervertebral disc: structure and function. Anat Rec 1277 220, 337-356. 1278 Jayasuriya, C.T., Goldring, M.B., Terek, R., and Chen, Q. (2012). Matrilin-3 induction of IL-1 receptor 1279 antagonist is required for up-regulating collagen II and aggrecan and down-regulating ADAMTS- 1280 5 . Arthritis Res Ther 14, R197. 1281 Ji, Q., Zheng, Y., Zhang, G., Hu, Y., Fan, X., Hou, Y., Wen, L., Li, L., Xu, Y., Wang, Y., et al. (2019). 1282 Single-cell RNA-seq analysis reveals the progression of human osteoarthritis. Ann Rheum Dis 78, 1283 100-110. 1284 Jim, J.J., Noponen-Hietala, N., Cheung, K.M., Ott, J., Karppinen, J., Sahraravand, A., Luk, K.D., Yip, S.P., 1285 Sham, P.C., Song, Y.Q., et al. (2005). The TRP2 allele of COL9A2 is an age-dependent risk factor 1286 for the development and severity of intervertebral disc degeneration. Spine (Phila Pa 1976) 30, 1287 2735-2742. 1288 Johnson, J.L., Hall, T.E., Dyson, J.M., Sonntag, C., Ayers, K., Berger, S., Gautier, P., Mitchell, C., Hollway, 1289 G.E., and Currie, P.D. (2012). Scube activity is necessary for Hedgehog signal transduction in vivo. 1290 Dev Biol 368, 193-202.

50 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1291 Kalamajski, S., Bihan, D., Bonna, A., Rubin, K., and Farndale, R.W. (2016). Fibromodulin Interacts with 1292 Collagen Cross-linking Sites and Activates Lysyl Oxidase. J Biol Chem 291, 7951-7960. 1293 Kanehisa, M., Sato, Y., Furumichi, M., Morishima, K., and Tanabe, M. (2019). New approach for 1294 understanding genome variations in KEGG. Nucleic Acids Res 47, D590-D595. 1295 Kleifeld, O., Doucet, A., auf dem Keller, U., Prudova, A., Schilling, O., Kainthan, R.K., Starr, A.E., Foster, 1296 L.J., Kizhakkedathu, J.N., and Overall, C.M. (2010). Isotopic labeling of terminal amines in 1297 complex samples identifies protein N-termini and protease cleavage products. Nat Biotechnol 28, 1298 281-288. 1299 Komori, T. (2010). Regulation of bone development and extracellular matrix protein genes by RUNX2. 1300 Cell Tissue Res 339, 189-195. 1301 Lam, T.-k. (2013). Fate of notochord descendent cells in the intervertebral disc. In HKU Theses Online 1302 (HKUTO) (The University of Hong Kong (Pokfulam, Hong Kong)). 1303 Lau, D., Elezagic, D., Hermes, G., Morgelin, M., Wohl, A.P., Koch, M., Hartmann, U., Hollriegl, S., 1304 Wagener, R., Paulsson, M., et al. (2018). The cartilage-specific lectin C-type lectin domain family 1305 3 member A (CLEC3A) enhances tissue plasminogen activator-mediated plasminogen activation. 1306 J Biol Chem 293, 203-214. 1307 Lee, C.G., Da Silva, C.A., Dela Cruz, C.S., Ahangari, F., Ma, B., Kang, M.J., He, C.H., Takyar, S., and 1308 Elias, J.A. (2011). Role of chitin and chitinase/chitinase-like proteins in inflammation, tissue 1309 remodeling, and injury. Annu Rev Physiol 73, 479-501. 1310 Leijten, J.C., Bos, S.D., Landman, E.B., Georgi, N., Jahr, H., Meulenbelt, I., Post, J.N., van Blitterswijk, 1311 C.A., and Karperien, M. (2013). GREM1, FRZB and DKK1 mRNA levels correlate with 1312 osteoarthritis and are regulated by osteoarthritis-associated factors. Arthritis Res Ther 15, R126. 1313 Li, C., Hancock, M.A., Sehgal, P., Zhou, S., Reinhardt, D.P., and Philip, A. (2016). Soluble CD109 binds 1314 TGF-beta and antagonizes TGF-beta signalling and responses. Biochem J 473, 537-547. 1315 Lopez-Otin, C., and Overall, C.M. (2002). Protease degradomics: a new challenge for proteomics. Nat Rev 1316 Mol Cell Biol 3, 509-519. 1317 Lu, Y., Qiao, L., Lei, G., Mira, R.R., Gu, J., and Zheng, Q. (2014). Col10a1 gene expression and 1318 chondrocyte hypertrophy during skeletal development and disease. Frontiers in Biology 9, 195- 1319 204. 1320 Markmann, A., Hausser, H., Schonherr, E., and Kresse, H. (2000). Influence of decorin expression on 1321 transforming growth factor-beta-mediated collagen gel retraction and biglycan induction. Matrix 1322 Biol 19, 631-636. 1323 Maseda, M., Yamaguchi, H., Kuroda, K., Mitsumata, M., Tokuhashi, Y., and Esumi, M. (2016). Proteomic 1324 Analysis of Human Intervertebral Disc Degeneration. Journal of Nihon University Medical 1325 Association 75, 16-21. 1326 Melas, I.N., Chairakaki, A.D., Chatzopoulou, E.I., Messinis, D.E., Katopodi, T., Pliaka, V., Samara, S., 1327 Mitsos, A., Dailiana, Z., Kollia, P., et al. (2014). Modeling of signaling pathways in chondrocytes 1328 based on phosphoproteomic and cytokine release data. Osteoarthritis Cartilage 22, 509-518. 1329 Minogue, B.M., Richardson, S.M., Zeef, L.A., Freemont, A.J., and Hoyland, J.A. (2010). Characterization 1330 of the human nucleus pulposus cell phenotype and evaluation of novel marker gene expression to 1331 define adult stem cell differentiation. Arthritis Rheum 62, 3695-3705. 1332 Molinos, M., Almeida, C.R., Caldeira, J., Cunha, C., Goncalves, R.M., and Barbosa, M.A. (2015). 1333 Inflammation in intervertebral disc degeneration and regeneration. J R Soc Interface 12, 20150429. 1334 Munir, S., Rade, M., Maatta, J.H., Freidin, M.B., and Williams, F.M.K. (2018). Intervertebral Disc Biology: 1335 Genetic Basis of Disc Degeneration. Curr Mol Biol Rep 4, 143-150. 1336 Naba, A., Clauser, K.R., Hoersch, S., Liu, H., Carr, S.A., and Hynes, R.O. (2012). The matrisome: in silico 1337 definition and in vivo characterization by proteomics of normal and tumor extracellular matrices. 1338 Mol Cell Proteomics 11, M111 014647. 1339 Nakai, T., Sakai, D., Nakamura, Y., Nukaga, T., Grad, S., Li, Z., Alini, M., Chan, D., Masuda, K., Ando, 1340 K., et al. (2016). CD146 defines commitment of cultured annulus fibrosus cells to express a 1341 contractile phenotype. J Orthop Res 34, 1361-1372.

51 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1342 Nakamichi, R., Kataoka, K., and Asahara, H. (2018). Essential role of Mohawk for tenogenic tissue 1343 homeostasis including spinal disc and periodontal ligament. Mod Rheumatol 28, 933-940. 1344 Naveau, S., Poynard, T., Benattar, C., Bedossa, P., and Chaput, J.C. (1994). Alpha-2-macroglobulin and 1345 hepatic fibrosis. Diagnostic interest. Dig Dis Sci 39, 2426-2432. 1346 Nerlich, A.G., Schaaf, R., Walchli, B., and Boos, N. (2007). Temporo-spatial distribution of blood vessels 1347 in human lumbar intervertebral discs. Eur Spine J 16, 547-555. 1348 Newell, N., Little, J.P., Christou, A., Adams, M.A., Adam, C.J., and Masouros, S.D. (2017). Biomechanics 1349 of the human intervertebral disc: A review of testing techniques and results. J Mech Behav Biomed 1350 Mater 69, 420-434. 1351 Ong, S.E., Blagoev, B., Kratchmarova, I., Kristensen, D.B., Steen, H., Pandey, A., and Mann, M. (2002). 1352 Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach 1353 to expression proteomics. Mol Cell Proteomics 1, 376-386. 1354 Onnerfjord, P., Khabut, A., Reinholt, F.P., Svensson, O., and Heinegard, D. (2012a). Quantitative 1355 proteomic analysis of eight cartilaginous tissues reveals characteristic differences as well as 1356 similarities between subgroups. J Biol Chem 287, 18913-18924. 1357 Onnerfjord, P., Khabut, A., Reinholt, F.P., Svensson, O., and Heinegard, D. (2012b). Quantitative 1358 proteomic analysis of eight cartilaginous tissues reveals characteristic differences as well as 1359 similarities between subgroups. Journal of Biological Chemistry 287, 18913-18924. 1360 Park, J.S., Chu, J.S., Tsou, A.D., Diop, R., Tang, Z., Wang, A., and Li, S. (2011). The effect of matrix 1361 stiffness on the differentiation of mesenchymal stem cells in response to TGF-beta. Biomaterials 1362 32, 3921-3930. 1363 Peix, L., Evans, I.C., Pearce, D.R., Simpson, J.K., Maher, T.M., and McAnulty, R.J. (2018). Diverse 1364 functions of clusterin promote and protect against the development of pulmonary fibrosis. Sci Rep 1365 8, 1906. 1366 Pfirrmann, C.W., Metzdorf, A., Zanetti, M., Hodler, J., and Boos, N. (2001). Magnetic resonance 1367 classification of lumbar intervertebral disc degeneration. Spine (Phila Pa 1976) 26, 1873-1878. 1368 Rajasekaran, S., Tangavel, C., K, S.S., Soundararajan, D.C.R., Nayagam, S.M., Matchado, M.S., 1369 Raveendran, M., Shetty, A.P., Kanna, R.M., and Dharmalingam, K. (2020). Inflammaging 1370 determines health and disease in lumbar discs-evidence from differing proteomic signatures of 1371 healthy, aging, and degenerating discs. Spine J 20, 48-59. 1372 Rajesh, D., and Dahia, C.L. (2018). Role of Sonic Hedgehog Signaling Pathway in Intervertebral Disc 1373 Formation and Maintenance. Curr Mol Biol Rep 4, 173-179. 1374 Ranjani, V., Sreemol, G., Muthurajan, R., Natesan, S., Gnanam, R., Kanna, R.M., and Rajasekaran, S. 1375 (2016). Proteomic Analysis of Degenerated Intervertebral Disc-identification of Biomarkers of 1376 Degenerative Disc Disease and Development of Proteome Database. Global Spine Journal 6, 1377 WO025. 1378 Rauniyar, N., and Yates, J.R., 3rd (2014). Isobaric labeling-based relative quantification in shotgun 1379 proteomics. J Proteome Res 13, 5293-5309. 1380 Riester, S.M., Lin, Y., Wang, W., Cong, L., Mohamed Ali, A.M., Peck, S.H., Smith, L.J., Currier, B.L., 1381 Clark, M., Huddleston, P., et al. (2018). RNA sequencing identifies gene regulatory networks 1382 controlling extracellular matrix synthesis in intervertebral disk tissues. J Orthop Res 36, 1356-1369. 1383 Risbud, M.V., Schoepflin, Z.R., Mwale, F., Kandel, R.A., Grad, S., Iatridis, J.C., Sakai, D., and Hoyland, 1384 J.A. (2015). Defining the phenotype of young healthy nucleus pulposus cells: recommendations of 1385 the Spine Research Interest Group at the 2014 annual ORS meeting. J Orthop Res 33, 283-293. 1386 Robinson, K.A., Sun, M., Barnum, C.E., Weiss, S.N., Huegel, J., Shetye, S.S., Lin, L., Saez, D., Adams, 1387 S.M., Iozzo, R.V., et al. (2017). Decorin and biglycan are necessary for maintaining collagen fibril 1388 structure, fiber realignment, and mechanical properties of mature tendons. Matrix Biol 64, 81-93. 1389 Rodrigues-Pinto, R., Berry, A., Piper-Hanley, K., Hanley, N., Richardson, S.M., and Hoyland, J.A. (2016). 1390 Spatiotemporal analysis of putative notochordal cell markers reveals CD24 and keratins 8, 18, and 1391 19 as notochord-specific markers during early human intervertebral disc development. J Orthop 1392 Res 34, 1327-1340.

52 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1393 Rodriguez, A.G., Slichter, C.K., Acosta, F.L., Rodriguez-Soto, A.E., Burghardt, A.J., Majumdar, S., and 1394 Lotz, J.C. (2011). Human disc nucleus properties and vertebral endplate permeability. Spine (Phila 1395 Pa 1976) 36, 512-520. 1396 Rubin, D.I. (2007). Epidemiology and risk factors for spine pain. Neurol Clin 25, 353-371. 1397 Rutges, J.P., Duit, R.A., Kummer, J.A., Oner, F.C., van Rijen, M.H., Verbout, A.J., Castelein, R.M., Dhert, 1398 W.J., and Creemers, L.B. (2010). Hypertrophic differentiation and calcification during 1399 intervertebral disc degeneration. Osteoarthritis Cartilage 18, 1487-1495. 1400 Saito, T., Ikeda, T., Nakamura, K., Chung, U.I., and Kawaguchi, H. (2007). S100A1 and , 1401 transcriptional targets of SOX trio, inhibit terminal differentiation of chondrocytes. EMBO Rep 8, 1402 504-509. 1403 Sakai, D., Nakamura, Y., Nakai, T., Mishima, T., Kato, S., Grad, S., Alini, M., Risbud, M.V., Chan, D., 1404 Cheah, K.S., et al. (2012). Exhaustion of nucleus pulposus progenitor cells with ageing and 1405 degeneration of the intervertebral disc. Nat Commun 3, 1264. 1406 Saleem, S., Aslam, H.M., Rehmani, M.A., Raees, A., Alvi, A.A., and Ashraf, J. (2013). Lumbar disc 1407 degenerative disease: disc degeneration symptoms and magnetic resonance image findings. Asian 1408 Spine J 7, 322-334. 1409 Sarath Babu, N., Krishnan, S., Brahmendra Swamy, C.V., Venkata Subbaiah, G.P., Gurava Reddy, A.V., 1410 and Idris, M.M. (2016). Quantitative proteomic analysis of normal and degenerated human 1411 intervertebral disc. Spine J 16, 989-1000. 1412 Schneiderman, G., Flannigan, B., Kingston, S., Thomas, J., Dillin, W.H., and Watkins, R.G. (1987). 1413 Magnetic resonance imaging in the diagnosis of disc degeneration: correlation with discography. 1414 Spine (Phila Pa 1976) 12, 276-281. 1415 Silagi, E.S., Shapiro, I.M., and Risbud, M.V. (2018). Glycosaminoglycan synthesis in the nucleus pulposus: 1416 Dysregulation and the pathogenesis of disc degeneration. Matrix Biol 71-72, 368-379. 1417 Song, Y.Q., Cheung, K.M., Ho, D.W., Poon, S.C., Chiba, K., Kawaguchi, Y., Hirose, Y., Alini, M., Grad, 1418 S., Yee, A.F., et al. (2008). Association of the asporin D14 allele with lumbar-disc degeneration in 1419 Asians. Am J Hum Genet 82, 744-747. 1420 Song, Y.Q., Karasugi, T., Cheung, K.M., Chiba, K., Ho, D.W., Miyake, A., Kao, P.Y., Sze, K.L., Yee, A., 1421 Takahashi, A., et al. (2013). Lumbar disc degeneration is linked to a sulfotransferase 1422 3 variant. J Clin Invest 123, 4909-4917. 1423 Subramanian, A., and Schilling, T.F. (2014). Thrombospondin-4 controls matrix assembly during 1424 development and repair of myotendinous junctions. Elife 3. 1425 Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., 1426 Pomeroy, S.L., Golub, T.R., Lander, E.S., et al. (2005). Gene set enrichment analysis: a knowledge- 1427 based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102, 1428 15545-15550. 1429 Sun, Z., Liu, B., and Luo, Z.J. (2020). The Immune Privilege of the Intervertebral Disc: Implications for 1430 Intervertebral Disc Degeneration Treatment. Int J Med Sci 17, 685-692. 1431 Taha, I.N., and Naba, A. (2019). Exploring the extracellular matrix in health and disease using proteomics. 1432 Essays Biochem 63, 417-432. 1433 Takao, T., and Iwaki, T. (2002). A comparative study of localization of 27 and heat 1434 shock protein 72 in the developmental and degenerative intervertebral discs. Spine (Phila Pa 1976) 1435 27, 361-368. 1436 Taye, N., Karoulias, S.Z., and Hubmacher, D. (2020). The "other" 15-40%: The Role of Non-Collagenous 1437 Extracellular Matrix Proteins and Minor Collagens in Tendon. J Orthop Res 38, 23-35. 1438 Team, R.C. (2013). R: A language and environment for statistical computing. 1439 Teraguchi, M., Yoshimura, N., Hashizume, H., Muraki, S., Yamada, H., Minamide, A., Oka, H., Ishimoto, 1440 Y., Nagata, K., Kagotani, R., et al. (2014). Prevalence and distribution of intervertebral disc 1441 degeneration over the entire spine in a population-based cohort: the Wakayama Spine Study. 1442 Osteoarthritis Cartilage 22, 104-110. 1443 Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J Roy Stat Soc B Met 58, 267-288.

53 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1444 Trougakos, I.P. (2013). The molecular chaperone apolipoprotein J/clusterin as a sensor of oxidative stress: 1445 implications in therapeutic approaches - a mini-review. Gerontology 59, 514-523. 1446 Urban, J.P., Smith, S., and Fairbank, J.C. (2004). Nutrition of the intervertebral disc. Spine (Phila Pa 1976) 1447 29, 2700-2709. 1448 van Buuren, S., and Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations 1449 in R. J Stat Softw 45, 1-67. 1450 van den Akker, G.G.H., Koenders, M.I., van de Loo, F.A.J., van Lent, P., Blaney Davidson, E., and van der 1451 Kraan, P.M. (2017). Transcriptional profiling distinguishes inner and outer annulus fibrosus from 1452 nucleus pulposus in the bovine intervertebral disc. Eur Spine J 26, 2053-2062. 1453 van der Kraan, P.M., and van den Berg, W.B. (2012). Chondrocyte hypertrophy and osteoarthritis: role in 1454 initiation and progression of cartilage degeneration? Osteoarthritis Cartilage 20, 223-232. 1455 Vaquerizas, J.M., Kummerfeld, S.K., Teichmann, S.A., and Luscombe, N.M. (2009). A census of human 1456 transcription factors: function, expression and evolution. Nat Rev Genet 10, 252-263. 1457 Veras, M.A., McCann, M.R., Tenn, N.A., and Seguin, C.A. (2020). Transcriptional profiling of the murine 1458 intervertebral disc and age-associated changes in the nucleus pulposus. Connect Tissue Res 61, 63- 1459 81. 1460 Vizcaino, J.A., Csordas, A., del-Toro, N., Dianes, J.A., Griss, J., Lavidas, I., Mayer, G., Perez-Riverol, Y., 1461 Reisinger, F., Ternent, T., et al. (2016). 2016 update of the PRIDE database and its related tools. 1462 Nucleic Acids Res 44, D447-456. 1463 Wang, X.L., Hou, L., Zhao, C.G., Tang, Y., Zhang, B., Zhao, J.Y., and Wu, Y.B. (2019a). Screening of 1464 genes involved in epithelial-mesenchymal transition and differential expression of complement- 1465 related genes induced by PAX2 in renal tubules. Nephrology (Carlton) 24, 263-271. 1466 Wang, Y., Fan, X., Xing, L., and Tian, F. (2019b). Wnt signaling: a promising target for osteoarthritis 1467 therapy. Cell Commun Signal 17, 97. 1468 Wisniewski, J.R., Hein, M.Y., Cox, J., and Mann, M. (2014). A "proteomic ruler" for protein copy number 1469 and concentration estimation without spike-in standards. Mol Cell Proteomics 13, 3497-3506. 1470 Wuertz, K., Vo, N., Kletsas, D., and Boos, N. (2012). Inflammatory and catabolic signalling in 1471 intervertebral discs: the roles of NF-kappaB and MAP . Eur Cell Mater 23, 103-119; 1472 discussion 119-120. 1473 Wyatt, A.R., Yerbury, J.J., and Wilson, M.R. (2009). Structural characterization of clusterin-chaperone 1474 client protein complexes. J Biol Chem 284, 21920-21927. 1475 Yates, B., Braschi, B., Gray, K.A., Seal, R.L., Tweedie, S., and Bruford, E.A. (2017). Genenames.org: the 1476 HGNC and VGNC resources in 2017. Nucleic Acids Res 45, D619-D625. 1477 Yee, A., Lam, M.P., Tam, V., Chan, W.C., Chu, I.K., Cheah, K.S., Cheung, K.M., and Chan, D. (2016). 1478 Fibrotic-like changes in degenerate human intervertebral discs revealed by quantitative proteomic 1479 analysis. Osteoarthritis Cartilage 24, 503-513. 1480 Zhu, M., Tang, D., Wu, Q., Hao, S., Chen, M., Xie, C., Rosier, R.N., O'Keefe, R.J., Zuscik, M., and Chen, 1481 D. (2009). Activation of beta-catenin signaling in articular chondrocytes leads to osteoarthritis-like 1482 phenotype in adult beta-catenin conditional activation mice. J Bone Miner Res 24, 12-21. 1483 Zhu, S., Qiu, H., Bennett, S., Kuek, V., Rosen, V., Xu, H., and Xu, J. (2019). Chondromodulin-1 in health, 1484 osteoarthritis, cancer, and heart disease. Cell Mol Life Sci 76, 4493-4502. 1485

54 FIGURE 1 A B C A T11/T1 L3/4 L3/4 L3/4 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyrightVB holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.L R T12/L1 IVD VB P L1/2 P L4/5 L4/5 L4/5 L2/3

11 pro les Anterior (A) L3/4 corresponding to OAF 11 locations NP/IAF L4/5 (L) (R) L5/S1 L5/S1 L5/S1 OAF IAF NP/IAF NP NP/IAF IAF OAF Left Right NP/IAF OAF L5/S1 A A L R L R Posterior (P) P P T2 T1 T2 T2 T2 H Young Aged D E IAF NP/IAF OAF 72 126 NP 14 11 proteomic pro les per disc 102 9 Collagens Proteoglycans 158 14 Young and aged 1337 60 31.9 32 31.6 31.5 29.7 29.7 29.9 29.9 35.0 35.7 36.2 36.5 34.3 34.7 35.4 34.3 L3/4 L4/5 L5/S1 L3/4 L4/5 L5/S1 40 (in aggregate 3,100) 983 40 96 1 Young individual Aged individual 3751 35 LFQ)

44 2 35 IAF NP/IAF Raw data (n=66) OAF 54 86 NP 30 89 13 15 30 125 24 536 45 Young discs only 25 (in aggregate 1,883) 25 Imputation Detection Association 690 Intensities (log 89 3 of DEPs with 3538 20 MRI images 43 PCA, SVM, NP IAF NP IAF NP IAF NP IAF Functional OAF OAF OAF OAF ANOVA IAF NP/IAF NP/IAF NP/IAF NP/IAF NP/IAF & clustering analyses OAF 44 80 NP analyses & cross 117 9 7 Glycoproteins ECM affiliated comparisons Aged discs only 180 12 1314 34 29.8 30.0 30.1 29.6 29.1 29.1 29.3 29.2 29.1 29.1 28.9 29.3 28.0 28.4 28.8 28.6 (in aggregate 2,791) 803 40 36 109 1 2831

25 LFQ) 2 35 32

F Age-group Core matrisome non-Core matrisome 30 28 Young Collagens ECM regulators 25 Aged Proteoglycans ECM aliated 24 Intensities (log Glycoproteins Secreted factors

NP NP NP NP non-matrisome proteins IAFOAF IAFOAF IAFOAF IAFOAF NP/IAF NP/IAF NP/IAF NP/IAF ECM regulators Secreted factors

28.4 29.0 28.4 28.4 29.5 29.6 29.3 28.7 35 28.3 28.2 28.3 27.4 27.0 27.0 27.0 26.9 1000 1500

35 LFQ)

2 30

30 detected per sample

25 25 Absolute number of protein genes number of protein Absolute 0 500 Intensities (log 20 age-group 20

NP NP NP NP IAFOAF IAFOAF IAFOAF IAFOAF NP/IAF NP/IAF NP/IAF NP/IAF L3/4 NP L4/5 NP L3/4 NP L4/5 NP L5/S1 NP L5/S1 NP I L3/4 L IAF L4/5 L IAF L3/4 R IAF L4/5 R IAF L3/4 A OAF L4/5 A OAF L3/4 L OAF L4/5 L OAF L3/4 P OAF L4/5 P OAF L3/4 R OAF L4/5 R OAF L3/4 L IAF L4/5 L IAF L3/4 R IAF L4/5 R IAF L3/4 A OAF L4/5 A OAF L3/4 L OAF L4/5 L OAF L3/4 P OAF L4/5 P OAF L3/4 R OAF L4/5 R OAF L5/S1 L IAF L5/S1 R IAF L5/S1 A OAF L5/S1 L OAF L5/S1 P OAF L5/S1 R OAF L5/S1 L IAF L5/S1 R IAF L5/S1 A OAF L5/S1 L OAF L5/S1 P OAF L5/S1 R OAF L3/4 A NP/IAF L4/5 A NP/IAF L3/4 L NP/IAF L4/5 L NP/IAF L3/4 P NP/IAF L4/5 P NP/IAF L3/4 R NP/IAF L4/5 R NP/IAF L3/4 A NP/IAF L4/5 A NP/IAF L3/4 L NP/IAF L4/5 L NP/IAF L3/4 P NP/IAF L4/5 P NP/IAF L3/4 R NP/IAF L4/5 R NP/IAF

L5/S1 A NP/IAF L5/S1 L NP/IAF L5/S1 P NP/IAF L5/S1 R NP/IAF L5/S1 A NP/IAF L5/S1 L NP/IAF L5/S1 P NP/IAF L5/S1 R NP/IAF 60 40 G 20 counts Protein 0 Secreted factors (N=83) 20 0 ATPases ydrolases Zinc fingers ECM PDZ domain Mitochondrial I−set domain CD molecules Phosphatases Immunoglobulin aliated (N=47) 0 15 e helical domain EF−hand domain RNA binding motif Major spliceosome WD repeat domain Ribosomal proteins Heat shock proteins L ribosomal proteins ComplementS ribosomal system proteins regulatory subunits GlycosyltransferasesGlycoside h Intermediate filaments ProteinProtein coat compl phosphataseexes 1 ucleolar RNA host genes Spliceosomal B complex SpliceosomalSpliceosomalSpliceosomal A complex C complex P complex

ECM Armadillo lik 25 regulators Small n Ras small GTPases superfamily (N=109) 0 r=0.785 -15 J 29.4729.7 30.0231.5628.1 28.9229.9530.47 K (p=6.5×10 ) 34 Glyco- 50 proteins OAF (N=114) 32 IAF NP/IAF 0 30 NP

Proteo- 15

glycans (N=23) 0 28 Collagens (N=30) 0 15 (logLFQ) 26 age-group young

24 histones (logLFQ) 27 28 29 30 31 32

histones expression aged

30 31 32 33 34 NP NP IAFOAF IAFOAF L3/4 NP L4/5 NP L3/4 NP L4/5 NP GAPDH (logLFQ) L5/S1 NP L5/S1 NP NP/IAF NP/IAF L3/4 L IAF L4/5 L IAF L3/4 R IAF L4/5 R IAF L3/4 A OAF L4/5 A OAF L3/4 L OAF L4/5 L OAF L3/4 P OAF L4/5 P OAF L3/4 R OAF L4/5 R OAF L3/4 L IAF L4/5 L IAF L3/4 R IAF L4/5 R IAF L3/4 A OAF L4/5 A OAF L3/4 L OAF L4/5 L OAF L3/4 P OAF L4/5 P OAF L3/4 R OAF L4/5 R OAF Young Aged L5/S1 L IAF L5/S1 R IAF L5/S1 A OAF L5/S1 L OAF L5/S1 P OAF L5/S1 R OAF L5/S1 L IAF L5/S1 R IAF L5/S1 A OAF L5/S1 L OAF L5/S1 P OAF L5/S1 R OAF L3/4 A NP/IAF L4/5 A NP/IAF L3/4 L NP/IAF L4/5 L NP/IAF L3/4 P NP/IAF L4/5 P NP/IAF L3/4 R NP/IAF L4/5 R NP/IAF L3/4 A NP/IAF L4/5 A NP/IAF L3/4 L NP/IAF L4/5 L NP/IAF L3/4 P NP/IAF L4/5 P NP/IAF L3/4 R NP/IAF L4/5 R NP/IAF L5/S1 A NP/IAF L5/S1 L NP/IAF L5/S1 P NP/IAF L5/S1 R NP/IAF L5/S1 A NP/IAF L5/S1 L NP/IAF L5/S1 P NP/IAF L5/S1 R NP/IAF FIGURE 2

A bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948E Top correlated; this version with PC1 posted JulyF 12,Top 2020. correlated The with copyright PC2 holderG forTop this correlated preprint with (which PC3 was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. young −0.5 0.0 0.5 −0.5 0.0 0.5 −0.5 0.0 0.5 aged NP NP/IAF COL11A2 COL9A1 COL1A1 COL2A1 COL5A1 3 COL16A1 2 COL5A2 COL11A1 OMD IAF COL3A1 BCAN DCN COL1A2 CHADL 3 LUM COL11A1 EPYC PRELP OAF COL15A1 13 EDIL3 OGN 8 COL5A1 MXRA5 BGN COL6A2 IGFBP5 HAPLN1 COL9A1 EMILIN1 KERA COL6A3 CRISPLD1 COMP COL6A1 PCOLCE2 LTBP2 COL8A1 SMOC2 CILP HSPG2 SLIT3 14 SRPX2 VCAN SRPX CILP2 ACAN AEBP1 THBS1 11 HAPLN3 SMOC1 EFEMP2 CHAD 8 IGFBP7 THBS2 HAPLN1 VIT IGFBP6 BGN MATN3 LAMA4 0.5 ASPN CSPG4 LAMB2 FNDC1 SEMA3C CLEC11A PC2 (25.6%) 3 THBS3 SEMA3A LMAN1 3 ABI3BP SERPINE2 ANXA5 0 PCOLCE SERPINA5 MMP3 −0.5 FBN1 PAMR1 CTSD EMILIN3 SERPING1 CTSF TNXB TIMP2 F13A1 6 MATN2 PLOD1 CTSB MATN3 CST3 13 LOX EMILIN1 19 PLOD2 GDF5 IGFBP7 CD109 SFRP4 −0.5 ECM2 TIMP1 ANGPTL7 5 SPP1 ITIH5 S100A4 0 SLIT3 SLPI S100A9 VIT LOXL2 WISP2

0.5 −20 0 20 40 60 LTBP4 INHBA H1FX LAMA5 SCUBE3 FTH1 NID2 CCL18 CLIC4 −20 −10 0 10 20 30 40 TNC TGFB1 TALDO1 CLEC3A S100A6 PDIA3 ANXA1 CHRD PPP2R1A ANXA4 IL17B GBA PC1 (39.9%) LGALS3 6 CXCL12 NOV ANXA2 ANGPTL4 17 ENO2 B GPC1 PDGFC CHID1 CD109 SCUBE1 HSP90B1 young LOX FSTL1 HIST1H2BL L3/4 SLPI CCL14 DPP7 ITIH5 6 FRZB SIL1 aged LOXL2 CHRDL2 CA1 L4/5 SERPINA5 FGFBP2 RPL15 IL17B MDK SOST FRZB C1S GAA L5/S1 HHIP APOD UTRN SCUBE1 QSOX1 GANAB CCL18 8 ISLR HEXB ANGPTL7 FAP ARF4 CHRDL2 XYLT1 ACTR3 S100A1 CLSTN1 HSP90AB1 CALU MFI2 MAMDC2 HHIPL2 TNFRSF11B HBG1 CKB DKK3 AOC2 PGAM1 ATRN CHD9 LECT2 SCRG1 RCN3 TPI1 MINPP1 GSTP1 SSC5D CHST3 CLIC1 CKM PDGFRL ATP6AP2 65 RCN3 HEXA PNP LECT1 CFHR2 ARF1 PYGB HHIPL1 NNMT RNASE4 CPQ CA2 LYZ GALNT2 AKR1A1

PC2 (25.6%) ANG KRT8 SKP1 ENPP2 GGH RYR1 RNASE3 GRN CFD DSC3 CRTAC1 47 ARHGDIA VIM MSMP ITGBL1 CAPS KAL1 CAPG LPL PPIC PGM2 XYLT1 40 CYTL1 LDHA PRDX1 LPL Collagens PPIA ALDOA OAF YWHAZ UGP2 ACLY Proteoglycans MYOC ITGB1 PXYLP1 NT5E MYH1 CCDC80 PDIA6 KRT14 CP Glycoproteins CLU MFI2 DNAJC3 TKT −20 0 20 40 60 PYGM POFUT2 ECM aliated GSTO1 CRYAB RNASE4 CALR RCN1 APMAP P4HB −20 −10 0 10 20 30 40 LDHA VASN ECM regulators KRT10 VASN APOM KRT1 ACTA2 PYGL Secreted factors HSP90AA1 PC1 (39.9%) GAPDH KRT19 HSPA5 PGK1 C1R HIST1H4A C CD9 DDX39B Others CFL1 GRN NUCB1 RPS3 KRT8 CHI3L1 NCL MDH1 CHPF TLN1 LRG1 1 COL4A2 IGFALS young VTN LAMB1 MFAP4 FGA FBLN1 IGFBP3 7 FGB 7 LAMB2 7 EFEMP1 aged FGG NID1 SLIT3 ECM1 LAMC1 CRISPLD1 IGFALS SERPINH1 LRG1 1 HPX 2 SERPINB1 1 CLEC3B HTRA1 CAPZB SERPIND1 F13A1 HSPA1B SERPINF2 MMP2 LMNA SERPINA4 SERPINA1 PDIA6 SERPINA3 F9 ILF3 SERPINA1 A2M HIST1H2BL SERPINA6 F13B YWHAZ 13 AGT HABP2 RPS8 F13B F10 GPI SERPING1 ITIH1 HLA−A SERPINA7 SERPINA7 GLUD1 F12 SERPINA6 CAPG HABP2 24 F12 HNRNPK KNG1 SERPINA10 ARPC2 FSTL1 ITIH2 SYNCRIP 2 ANGPTL4 ITIH4 TPM2 CFHR5 PLG SSB DBNL F2 HSPA8 HSPA2

PC3 (5.0%) KNG1 SET CD163 SERPINC1 RNH1 BIN1 SERPIND1 HBA1 BTD SERPINF2 PRDX2 IGHG3 HRG CKAP4 UGP2 NP SERPINA4 IDH2 VDAC1 CRLF1 EEF2 PROS1 3 ANGPTL4 CFL1 PRSS3P2 NP/IAF MST1 GDI2 PXYLP1 C4B ESD IGJ HOXD1 ACO2 CKM IAF APOL1 HNRNPA2B1 KAL1 GPLD1 VCL C2 CRP EIF4A1 IGHD OAF CFH CALR GPLD1 IGKV3−20 HSP90AA1 CCT3 −15 −10 −5 0 5 10 15 20 C2 HDGF A1BG CD5L HBB ENO3 APOB GSTP1 CFHR2 −20 −10 0 10 20 30 40 BTD ATP5A1 IGHV4−61 IGJ MYL6B KRT19 APOC1 UBA1 LPL PC1 (39.9%) CFHR1 BLVRB ATRN D CFHR2 PLEC PLTP LGALS3BP CA2 APOL1 ORM1 SND1 IGLC6 ORM2 AHNAK TNNT3 young KLKB1 ATP5B GC APOM 92 CA1 C8A aged CD14 EIF5A CFH AZGP1 NCL ORM2 SHBG SELENBP1 MYOM2 APOE AKR1B1 KRT8 C5 HIST2H2AC RPS16 APCS CBR1 C8G IGHG4 FSCN1 77 HADHB HPR PFN1 DKK3 AFM YWHAQ C4B C3 HSPB1 TPM1 C4BPA TUBB CD5L IGLC6 HNRNPM ACTN2 65 ALB MYL6 TTN HP MAMDC2 APCS GC AKR1A1 TNNC2 APOC3 ACTG1 CFI AHSG RPL12 APOM IGKC HNRNPC IGKV3−20 APOA1 DPYSL3 CFB IGHG3 DDX17 MYBPC1 C6 ADH1B DES 0.5 P0DOX2 RPS3 LMNB1 0 IGHA1 RPL18 AHSG

PC3 (5.0%) −0.5 TF TPM3 MYH7 IGHM HNRNPD MYLPF IGLL5 MYH9 ACTN3 C7 OLFML1 PYGM A1BG RAP1B HPR C8B CYB5R3 IGHM IGK PCBP1 MYL3 LBP HNRNPA3 LDB3 L3/4 CFI TPM4 MYH1 APOA2 HNRNPU CD14 IGHG1 VCP SHBG L4/5 CPB2 CLIC1 IGHV5−51 APOH RPL3 TNNC1 L5/S1 IGHG2 CAP1 LBP PON1 HNRNPR MYL1

−15 −10 −5 0 5 10 15 20 C8G SEPT7 APOA2 APOA4 TAGLN2 MYL2 C9 CAPZA2 CP −20 −10 0 10 20 30 40 C8A HNRNPA1 IGHG4 IGHD PLS3 MYH2 PGLYRP2 ST13 P0DOX2 PC1 (39.9%) TTR MYL12A CASQ1 FIGURE 3

A bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948B ; this versionD posted July 12, 2020. The copyright holder for this preprint (which NP was notA certifiedP by peer review) is the author/funder. All rights reserved.Variable protein No reuse set allowed withoutConstant permission. protein set NP/IAF Human proteome 27 (4.0%) 11 (4.5%) L IAF L L OAF L (20,368 proteins) R R 44 (6.6%) 15 (6.1%) Detected L proteome P 16 (2.4%) 9 (3.7%) L R (1,883 proteins) A 53 (7.9%) 24 (9.8%) R P A

PC2 (11.7%) R L A R 11 (1.6%) 7 (2.9%) P A R 506 (75.5%) 171 (69.8%) −0.5 L Variable Constant 13 (1.9%) 8 (3.3%) R set set A 0.5 L P P R 0 −15 −10 −5 0 5 10 15 −30 −20 −10 0 10 20 Marginally PC1 (36.5%) detected Major proteins in the constant set: L3/4 Collagens: COL1A2,COL2A1,COL9A1,COL15A1 L4/5 L5/S1 Proteoglycans: C ACAN,CHAD,HAPLN1,HSPG2,OMD,VCAN 245 Glycoproteins: ABI3BP,AEBP1,CTGF,DPT,ECM2,EFEMP2,EMILIN3,FNDC1,LTBP2, NID2,PXDN,TNC, TNXB ECM aliated: CLEC3A,CLEC11A 0.5 0 >16 PC2 (11.7%) −0.5 ECM regulators: A2M,HTRA1,HTRA3,LOXL2,LOXL3,SERPINA1,SERPINA3,SERPINE1 Secreted factors: ANGPTL2,FGFBP2,PTN Detected in

Number of Pro les Non-matrisome: ACTA2,ANG,C1R,C3,C4B,CALU,CAPS,CFH,CKB,CLU,CRTAC1,DSTN, −15 −10 −5 0 5 10 15

−30 −20 −10 0 10 20 0 5 10 15 20 25 30 EEF1A1, EHD2,ENO1,FBXO2,FHL1,FTH1,FTL,GANAB,GSN,HHIPL2, PC1 (36.5%) 1 200 400 600 800 1000 1200 HSPA5, IGK,KRT1, KRT10,LDHA,LRP1,LYZ,NEB,P4HB,PDIA4,PEBP1, Number of proteins PGAM1,PGK1, PGM1,PPIB, PRDX1,PRDX6,PTRF,PYGB,RNH1, SOD3,SSC5D,TPI1, TTN,TUBB4B,UGP2,VIM E Module Y1: NP markers F Module Y2: AF markers 2 I 2 1 1 Higher in OAF 0 COL6A2 SERPINF1 0 Higher in inner disc COL6A3

Z-scores −1 −1 YWHAZ Z-scores (L3/4) 8 (all levels mixed) −2 −2 L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAF R_OAF COL5A2 COL6A1 L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAF R_OAF

ANXA6 2 LAMC1 1 HSPA1B COL1A1 2 0 CD109 GSTP1

6 BGN 1 −1 CSPG4 ANXA4 THBS4 Z-scores (L4/5) AOC2 LAMB2 0 −2 COL5A1 ANXA5 PLS3 L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAF R_OAF COL11A1 AHNAK SEMA3C OGN Z-scores ANXA2 LMNA −1 PRELP THBS1 YWHAE CAP1 (all levels mixed) PDGFRL −2 MXRA5 CILP2 UBA1 ACTN1 A_OAF A_NP/IAF P_NP/IAF P_OAF 2 FMOD NP SERPINA5 VIT EDIL3 ANXA1 1 4 KRT8 CXCL12 GAPDH HIST1H2BL LAMA4 PRDX2 MFGE8 CLSTN1 HIST1H4A YWHAQ 0 ACTG1 TNFRSF11B ISLR CALM3 PPIA −log(FDR q) C1S KAL1 IQGAP1 HNRNPA2B1 −1 APOD SMOC1 DCN HSPA8 SND1ATP5B HPX INHBA SERPINE2 CCDC80 PDIA3 GNAI2 KRT19 TIMP1 ARF1 GDI2 THBS2SOD1 NCL SERPINC1 Z-scores (L5/S1) EMILIN1 −2 IL17B FRZB CTSD CFL1 HSPB1 ORM1 L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAF R_OAF CLEC3B VCL OAF HSP90B1 AZGP1SERPINH1 MYH9 SRPX2 RPS8 ALB RNASE4 CP CFB PDIA6 HBB VCP IGHA1 SERPING1 PCOLCE2 FBN1MSN VWA1 COL12A1 APOA1 2 PXYLP1DKK3LPL SMOC2 HSP90AA1 CLIC1 HIST2H2AC CHRD PCOLCECPQ COMP ALDOA FGA Module Y3: smooth muscle QSOX1 LOX DPYSL3 FGB G MATN3 MFI2ATRNSLPIIGFBP7 TUBA1B ACTN4 cell markers Module Y4: immune & blood SLIT3 TGFBI HBA1 CALR ITIH2 H GALNT2 COL11A2 CILP PKM 2 CHRDL2 PLOD1 RPS3 ARF4COL14A1 FGG COL3A1 NUCB1 RPS27A PFN1 1 1 0 0 −1 0 Z-scores (L3/4) −1 Z-scores ( L3/4 ) L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAF R_OAF L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAF R_OAF −6 −4 −2 0 2 4 6 log(fold-change) 2 1 J 1 0 0 Higher in inner disc (non-OAF) Higher in OAF COL3A1, COL5A1, COL5A2, −1 −1 COL12A1, COL14A1, COL1A1, COL6A1, Collagens COL11A1, COL11A2 COL6A2, COL6A3 Z-scores (L4/5) Z-scores ( L4/5 ) −2 −2 Proteoglycans BGN, DCN, FMOD, OGN, PRELP L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAF R_OAF L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAF R_OAF EDIL3, EMILIN1, IGFBP7, MATN3, MXRA5, CILP, CILP2, COMP, FBN1, FGA, FGB, Glycoproteins FGG, LAMA4, LAMB2, LAMC1, MFGE8, SRPX2, PCOLCE, PCOLCE2, SLIT3, SMOC1, SMOC2, VIT TGFBI, THBS1, THBS2, THBS4, VWA1 2 2 ECM aliated CLEC3B, CSPG4, SEMA3C ANXA1, ANXA2, ANXA4, ANXA5, ANXA6, HPX 1 1 CD109, PLOD1, SERPINA5, SERPINE2, CTSD, ITIH2, LOX, SERPINC1, SERPINF1, ECM regulators SERPING1, SLPI, TIMP1 SERPINH1 0 0 Secrected factors CHRD, CHRDL2, CXCL12, FRZB, IL17B, INHBA −1 −1 Z-scores ( L5/S1 ) Basement COL6A1, COL6A2, COL6A3, LAMA4, LAMB2, Z-scores (L5/S1) −2 membranes LAMC1 L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAF R_OAF L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAF R_OAF Surface markers CD109 FIGURE 4 A C D Module Y1: NP markers Module Y2: AF markers bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948P ; this version posted July 12, 2020. The copyright holder for this preprint (which was notP certifiedP by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. A L RA 2 R L LL 1 P R 1 0 R A 0.5 L A R 0 0 −1 P −0.5 Z-scores (L3/4) R LP −1 L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAFR_OAF R PC2 (21.8%) −2 A PC2 (21.8%) NP

Z-scores (all levels mixed) L_OAF L_IAF L_NP/IAFNP R_NP/IAFR_IAF R_OAF NP/IAF L L3/4 R IAF L L R L4/5 2 L5/S1 OAF 0.5 −30 −20 −10 0 10 20

−30 −20 −10 0 10 20 −0.5 0 A 1 −60 −40 −20 0 20 −60 −40 −20 0 20 0 PC1 (46.7%) PC1 (46.7%) 2 −1

Z-scores (L4/5) −2 1 L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAFR_OAF

B 0

−1 2 Higher in OAF PKM −2 1 MSN Z-scores (all levels mixed) A_OAF A_NP/IAF NP P_NP/IAF P_OAF Higher in inner disc 0 −1

Z-scores (L5/S1) −2 ANXA6 L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAFR_OAF AHNAK

6 IQGAP1 GSTP1 E HSP90AA1 Module Y3: smooth muscle YWHAE F HBD MYH9 cell markers Module Y4: immune & blood HBB TKT 1.5 EEF1A1 HSP90AB1 1 ANXA1 HSPA1B 1 COL6A2 0.5 YWHAZ 0 0 TUBA1B CHI3L2 FBN1 RNH1 −0.5 PRDX1 ANXA4 Z-scores (L3/4) −1

4 CFL1 PGK1 −1 Z-scores (L3/4) LOX PFN1 L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAFR_OAF ENO1 TUBB4B L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAF R_OAF ANXA2 HBA1 S100A9 CTSD CLIC1 PPIA LMNA A2M FSCN1 BLVRB CA1 −log(FDR q) ANXA5 MAMDC2 PRDX6 COL1A1 UBA1 CAT CAPG SOD1CALR SERPING1 TPI1 COL6A3 PEBP1 2 COL6A1 PRDX2 EEF2 ACTN4 SELENBP1 2 UGP2 HIST1H1C VCP HIST2H2AC 1 RPS27A ACTG1 HIST1H2BL TNXB 1 ATRN THBS4 VIM ATP5A1 ARHGDIA CALM3 COL12A1 0 0 2 ANGPTL4 CAP1 HSP90B1 PLS3 LDHB ATP5B FLNA −1 C1S HSPG2 DSTN HNRNPA2B1 Z-scores (L4/5) −1 PDIA6 PARK7 TPM3 PLEC VASN HSPA5 YWHAQ NID2 GDI2 Z-scores (L4/5) RBP4 GAPDH CRYAB PTRF HSPA8 −2 PDIA3 AK1 L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAF R_OAF L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAFR_OAF CFHR2 ARF1 P4HB GPI TAGLN2 S100A4 LAMA5 ACTA2

2 2

0 1 1 0 0 −1

−2 0 2 4 6 Z-scores (L5/S1) −1

Z-scores (L5/S1) −2 L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAFR_OAF log(fold-change) L_OAFL_IAF L_NP/IAFNP R_NP/IAFR_IAF R_OAF FIGURE 5

A bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948B ; this version posted July 12, 2020. The copyright holder for this preprint (which allwas 33 notyoung certified vs all by 33 peer aged review) is the author/funder. All rights reserved. No reuse allowedC without permission. Extracellular Structure Org Defense Rsps APOA4 Skeletal Sys Dev Proteolysis Higher in aged Collagen Fibril Org Neg Reg Peptidase Higher in young Connective Tiss Dev Adaptive Immune Rsps Bio Adhesion Neg Reg Proteolysis Cartilage Dev Reg Proteolysis Chondrocyte Di Reg Peptidase Bone Dev Neg Reg Hydrolase Cartilage Dev EC Structure Org Endochondral Bone Morph Endocytosis APOA1 Supramolecular Fiber Org IL17B COL5A2 C9 Reg Hydrolase Activity COL11A2 HSPG2 Growth Plate Cartilage Dev IGHG1 Rsps To Wounding TF Bone Growth HP F2 Wound Healing MATN3 FNDC1 APOH ALB Chondrocyte Dev IGK ITIH4 COL15A1COL1A2 C6 Fibrinolysis HHIPL2 IGKC C8B PLG Skeletal Sys Morph THBS3 FGB FGG Reg Wound Healing FBN1 FGA Bone Morph COL5A1 ECM1 (FDR q) TNXB ENPP2 Inmtry Rsps MATN2 ORM1 Growth Plate Cartilage Morph 10 VCAN F13A1 COL2A1 SCUBE1 ITIH2 Acute Inmtry Rsps ITIH5 A2M IGLL5 HPX Chondrocyte Dev COL6A2 ABI3BP APOE Cytolysis CLEC3A COL11A1 COL6A3 C3 AZGP1 GRC Chondrocyte Di COL3A1 LOX IGHG2SERPIND1 C5 −log COL6A1 CD109 APOB C8A Cartilage Morphogenesis Reg Rsps To Stress CKB ANXA1 C4BPA TTR HAPLN3 FRZBCAPSPGAM1 ANGPTL2 SERPINC1 Phagocytosis SPARC LECT2CALU PCOLCE CFH CFI HRG 0 20 40 PRDX1ANXA4S100A1 CLU IGHA1 RCN3PYGB C2 AFMC7 IGFBP7 SSC5D LOXL2ECM2 SERPINA1 SERPINF2 VTN 0 10 20 30 NID2 EMILIN3 SPP1 PRG4 POSTN KRT19 COL8A1 ANG EMILIN1 HTRA1 IGLC6 -log (FDR q) CKM KRT14 ANXA2 BGN LDHA FN1 CFB MMP2 10 XYLT1 LAMA5 CYTL1 SERPINA3 -log (FDR q) HAPLN1 KRT8 PTRFITGAV CCL18 PFKP CHAD UGP2 GNASLYZ C4B SERPINF1 GC 10 CHRDL2TNC RNASE4 TIMP3 LECT1TUBA1B SLPIACANRCN1VASNANXA5 ITIH1 CLSTN1ATRN DSC3 TPI1FMOD RBP4 COL14A1 CILP2 LOXL3 VIM MFI2 LTBP4 YWHAE FTL FLT1 CD9 PGM1 VWA1 COL12A1LGALS3 PKMFGFBP2 CTSD EHD2 GPC1 TGFBI GNB1 SERPINA5 0 5 10 15 20

−4 −2 0 2 4 6 8 Collagens: COL2A1, COL3A1, COL5A1, COL11A1, COL12A1, log(fold-change) COL14A1 Proteoglycans: ACAN, CHAD, HAPLN1 Glycoproteins: ECM2, EMILIN1, EMILIN3, FNDC1, IGFBP7, LAMA5, D MATN2/3, NID2, PCOLCE, SPP1, THBS3, TNC, TNXB APOA4 Collagens: COL8A1 Higher in aged inner regions Proteoglycans: FMOD NP markers: KRT8/19, DSC3, LECT1/2, CHAD, HAPLN1, CLEC3A Higher in young inner regions F Shh signalling: HHIPL2, SCUBE1 TNXB Glycoproteins: CILP2 COL11A2 COL6A3 HPX Mesenchymal: VIM MATN3 BMP/TGFβ antgaonists: FMOD THBS3 IGHG1 APOA1 Wnt, BMP/TGFβ antgaonists: FRZB, CHRDL2, CD109, COL5A1 HSPG2 IGHG2 Others: CD9, ITGAV, LOXL3, GPC1, GNAS FNDC1 CLU IGK C9 down-regulated IL17B COL6A2 ITIH5 ORM1 Inhibition of hypertrophy: S100A1 MATN2 HTRA1 ALB VIM ANXA2 in all aged COL5A2 COL6A1 PRDX1 A2M SERPINF1 FGG HHIPL2PTRF samples COL3A1CD109 PRG4 IGKC ENPP2 NID2 IGLL5 F2 PLG COL11A1FBN1 ECM1 SERPINC1 Collagens: COL1A2, COL5A2, COL6A1/2/3,

(FDR q) APOH CKM COL15A1 TF FGB XYLT1 FN1 AZGP1 HPC8B 10 ASPN TIMP3 FGA COL11A2, COL15A1 SCUBE1 TNC ANXA1 ITIH4 ITIH2 Proteoglycans: ASPN, CHADL UGP2 LOX COL1A2 COL2A1 CYTL1 TPI1 RNASE4 C3 IGHA1 Proteoglycans: BGN, HAPLN3, HSPG2, VCAN 8 CLEC3A COL12A1 PCOLCE C6 Glycoproteins: EDIL3, MXRA5, PCOLCE2, SMOC2, THBS4 KRT8 CALU SERPINA5ECM2PKM F13A1 −log FRZB YWHAE AFM ANXA4 ACTA2EMILIN1 RBP4 Glycoproteins: ABI3BP TUBA1B CXCL12 ATRN Stress: HSP90AA1, HSPA1B, HSPA5, HSPA8, SOD1 CHADL SSC5DPGAM1 ABI3BP

5KRT19 COL14A1 10ALDOA PEBP1 QSOX1 ANGPTL2 15 SPP1 DSC3 MFI2PPIB TNFRSF11B CALM3 THBS4 C7 Others: FBN1, LOX, ANXA1/4/5, ANXA5, IL17B CLSTN1 LAMA5 LECT1 CHAD CRYAB HTRA3 C5 0 78 Ribosomes: RPS27A, RNASE3 LMNAVCAN LYZ LOXL2 LECT2 SLPI CCL18ANGCSPG4 EDIL3 CCDC80 C2 HSPA1B ANXA6 EMILIN3 CKB MXRA5 LDHA HAPLN3 S100A1 RCN3 SERPINA3 APOE CHRDL2 LGALS3 EEF1A1GAPDH MYH1 TUBB4B ANXA5 SERPINA1 IGLC6 HNRNPA2B1DKK3 CHST14 PGK1 Wnt, BMP/TGFβ antgaonists: DKK3 EHD2 AHNAK IGFBP7 CAPS SMOC2 CFD APOB 18 DPYSL2 HAPLN1 PPIA PYGB SERPINE1 KRT14 MDH1GNB1PRDX6ENO1 TIMP1 FHL1 CLEC3B TKT RCN1RPS27A CFH IQGAP1 PCOLCE2 PXYLP1 CFB RNASE3PYGM HSP90AA1 BGN ACANHSPA5 TGFBI CTSD C4B HIST1H1C CDH1 PGM1 APOD VASNLTBP4 CLEC11A FABP3 MYL1 FGFBP2 1 YWHAZ DSTN P4HA1 HSPA8 57 SOD1 ATP5A1 PFKP PDGFRL AGT POSTN ACTN1 Collagens: COL1A1 2 0 −5 0 5 10 down-regulated down-regulated in aged SPARC, in aged log(fold-change) OAF inner disc E VWA1 FBN1 BGN Higher in aged OAF Higher in young OAF G ANXA4 Proteoglycans: PRG4 Glycoproteins: ECM1, FGA, FGB, FGG, FN1 APOA4 ECM regulators: HRG, ITIH1, LOX APOH Immunoglobulins: IGHA1, IGHG1, IGHG2, IGK, IGKC, IGLC6, IGLL5, COL1A2 SERPIND1, SERPINF2 up-regulated COL15A1 Matrix remodelling: HTRA1, SERPINA1, SERPINA3, SERPINC1, SERPINF1, TIMP3 COL1A1 Others: TTR, C4BPA, C8A, MMP2 in all aged Blood: FGA, FGB, FGG, C2, C3, C4B, C5, C7, C8B, CFB, CFH COL6A2 ANXA1 CFI, FLT1, FTL, GC SPARC VWA1 samples Stress: CLU (FDR q) COL6A1 IL17B ANXA5 C6VTN

10 Fibrosis: FN1, APOA1, APOB VCAN CTSD APOE COL5A2 F13A1 HAPLN3 HSPG2 C9 COL6A3 POSTN ABI3BP Matrix: Vtn 11 −log COL11A2 AHSG PLG IGHM APCS ITIH4 Proteolysis: MMP2 2 42 Blood: F13A1, C6, C9 10 Fibrosis: POSTN, APOA4, APOE 3 Immunoglobulins: IGHM 7 ECM aliated: CLEC11A

0 1 2 3 4 0 Glycoproteins:AHSG, APCS ECM regulators: AGT, HTRA3, SERPINE1, TIMP1 −4 −2 0 2 4 6 8 up-regulated up-regulated Protelytic: HTRA3, TIMP1 log(fold-change) in aged in aged OAF inner disc H

protein protein min max I Not detected Young proteome Aged-Young & modules modules Enrichment Young Aged Fibrosis Ages M1a Adipogenesis Y1 Androgen response Angiogenesis L3/4 L OAF L3/4 P OAF M1b Apical junction L3/4 R OAF L4/5 A OAF Apoptosis L3/4 A OAF Coagulation L5/S1 A OAF 4 Y2 L5/S1 P OAF 2b L5/S1 R OAF M2a L4/5 L OAF Complement L4/5 R OAF L5/S1 L OAF E2F targets L4/5 P OAF L3/4 L IAF Y3 L3/4 L NP/IAF EMT L3/4 NP L3/4 P NP/IAF M2b L3/4 R NP/IAF Estrogen response early L3/4 R IAF Y4 Estrogen response late L4/5 L IAF L3/4 A NP/IAF Fatty acid metabolism L4/5 L NP/IAF G2M checkpoint L4/5 P NP/IAF 1a 1b L4/5 R IAF variable proteins L5/S1 L IAF Hedgehog signaling L4/5 R NP/IAF Heme metabolism L4/5 NP Hypoxia

L4/5 A NP/IAF Other L5/S1 A NP/IAF Il2 stat5 signaling L5/S1 R IAF L5/S1 L NP/IAF KRAS signaling dn L5/S1 R NP/IAF KRAS signaling up L5/S1 NP Mitotic spindle L5/S1 P NP/IAF MTORC1 signaling L3/4 L OAF L3/4 R OAF 2a M3 MYC targets v1 L3/4 A OAF 3 L4/5 L OAF L5/S1 R OAF Myogenesis L3/4 P OAF L3/4 R IAF L3/4 L IAF L3/4 A NP/IAF Oxidative

L3/4 P NP/IAF the constant set P53 pathway L3/4 R NP/IAF 6 L3/4 L NP/IAF Peroxisome L3/4 NP Proteins in PI3K/AKT MTor signaling L4/5 L NP/IAF L4/5 NP Protein secretion L4/5 P NP/IAF ROS pathway L4/5 R IAF L4/5 L IAF UV response dn L4/5 A NP/IAF M4 Xenobiotic metabolism L4/5 R NP/IAF Hypertrophy L5/S1 P NP/IAF 6a L4/5 A OAF Stress L4/5 R OAF M5 L4/5 P OAF L5/S1 P OAF L5/S1 L OAF L5/S1 A OAF M6a L5/S1 L IAF

L5/S1 R IAF Marginally Others L5/S1 L NP/IAF 5 detected L5/S1 NP in young L5/S1 R NP/IAF M6b L5/S1 A NP/IAF ZG16 IGKV1-33 IAH1 NOV MGAT1 CTNNB1 PLA2G2A ITIH6 SAA1 PRPH PREP SPAG9 EPYC DAG1 SSBP1 MPZ PPBP AOC2 VPS13C ROCK2 MYADM IGHV5-51 FLT1 CDSN PAPSS2 HK1 KRT6C KRT5 KRT9 KRT1 KRT2 TSN PSMD1 SPARC OXCT1 NES FBXO2 ATP1A1 TWF2 NEO1 SRPX PMM2 PGAM1 LECT2 LANCL1 CHRDL2 SSC5D SELP RCN1 THBS3 FNDC1 COL8A1 RNASE3 COL3A1 LTBP4 TNXB HRAS COL15A1 S100A6 S100A1 SPP1 PSMB6 VAPA RNASET2 IGLV6-57 HHIP CTSC KRT14 MATN4 HYAL1 EFHD2 DDB1 CAPN3 BCAN SLPI TNFRSF11B XYLT1 SERPINA5 MFI2 CLSTN1 CXCL12 SCUBE1 VIT IL17B MATN3 COL5A2 FRZB ITIH5 COL5A1 CD109 ENPP2 EMILIN3 COL9A1 LOXL3 LOXL2 LECT1 IGFBP7 RNASE4 LYZ CHAD FGFBP2 ANG VCAN ABI3BP HHIPL2 COL2A1 COL11A2 COL11A1 CLEC3A SLIT3 KRT8 KRT19 MXRA5 DKK3 INHBA QSOX1 APOD CDH1 ITGAV CKB RNASE1 CD9 TPR LPL STXBP3 MYOZ1 COL9A3 UBXN10 MGP ECH1 SRL GRN MRC1 EEA1 LGALS3 UGP2 DSC3 EPHX1 BMP3 PSME1 CCL18 SLC1A5 G6PD GGT2 APOBEC2 ITGB1 FBN3 DPP4 RSU1 KAL1 B4GAT1 CCT4 ACADVL POFUT2 PLIN4 LMAN2 GYG1 HSPA2 LMAN1 FAM49B SPTA1 HBG1 ANGPTL7 SPTB S100A8 ERAP1 LASP1 FIBIN HSPA13 AKR1C1 RYR1 LTBP1 UTRN SLC2A1 VEGFA GNA11 FKBP7 MECP2 ATP6V1E1 COL4A1 SEC31A PDGFD KRT16 FAM129B IGFBP6 H1FX IGHV3-23 CRIP2 CFHR1 PTN MSMP EIF5 ESYT1 DKK1 XIRP1 PSMA2 HPRT1 CAPN13 DHX9 COPA APMAP SBSPON TNMD APOC4 MASP1 CASQ1 ZNF282 DEFA3 MT2A HIST1H1B NT5E STXBP1 AMBP KERA KRT10 FHL3 COL1A1 WISP2 FBLN1 MYOC CAV1 TNNI1 TOR1B SLC25A4 GOT1 PYGM TNNC2 TNNC1 MYH2 TNNT3 MYL3 MYL1 MYLPF MYH1 MYH7 MYL2 MYBPC1 TPM3 TPM2 TPM1 ACTN2 MYOM3 TNNT1 PDHA1 LDB3 PFKM MYOM2 MYBPC2 UQCRFS1 ATP5O FLNC CFL2 BIN1 TUFM AOC3 ACTN3 DES PVR GPD1 KLHL41 GLO1 RPS21 CTSG CKMT2 NPM1 LDHB SPTBN1 OLFML1 MYH11 LAMA2 LAMA4 LAMC1 LAMB2 NID1 COL4A2 TUBB4B TUBB TUBA1B TKT ERP44 PTRF VIM ANXA2 HNRNPU UBA1 IQGAP1 CBR1 EEF1A1 S100A10 EHD2 TNC NID2 COL14A1 COL12A1 HIST1H1C DSTN VCL MYH10 HNRNPA3 TLN1 SPTAN1 NCL MYL12A HNRNPK HDGF SOD2 RPLP2 RPL18 HSPA9 LMNA HSPA1B MYL6 HSPB1 ATP5A1 ATP5B ASPN SERPINH1 LG CLTC DPYSL2 DPYSL3 ACTN1 RNH1 PLS3 ARHGAP1 TAGLN2 MYL6B VCP MYH9 PLEC FLNA PFN1 CAP1 CLIC1 CFL1 AHNAK CALR MAMDC2 ACTN4 HIST2H2AC FSCN1 SEPT2 TXNDC5 WDR1 PGRMC1 GATD3A ANKRD2 SNX3 SH3BGR FKBP1A SH3BGRL TCP1 CLSTN2 EIF2S1 GMFB DPYD RPN1 VDAC1 PAPPA MYOF HINT1 MVP RRBP1 SFPQ KCTD12 SRSF7 CAST MARCKS FLNB XPNPEP1 LMNB1 ATP5H SPON1 HSPD1 HADHB ACO1 PGD PHB LCP1 RAN TAGLN ALDH2 TPT1 BASP1 YWHAG SERPINB1 CAPZA1 ARHGDIB FABP4 HSP90AB1 LMNB2 DDAH2 ADH1B NME2 TMOD4 LMCD1 SDHA ETFB MYH4 BCAM FSTL3 CAPZA2 STIP1 TPSAB1 CCT5 OGDH MYOM1 ACAA2 PRDX3 SDHB DLST GPX1 CCT6A HADHA CD163 ALDH1A1 IDH1 CAPZB RPS19 HNRNPC RPS9 GNB2L1 MDH2 RPL4 ESD LAMB1 IDH2 ACO2 AKR1B1 AKR1A1 RPS20 RPL11 ANXA11 TXN SYNCRIP HUWE1 HNRNPA1 ANP32A CYB5R3 GLUD1 ADD3 VAT1 TPM4 IGHV3-64D DCTN2 TPP1 TFG CAPN1 FABP5 DYNC1H1 SEPT11 LAP3 RPL5 CTSA NAP1L1 HLA-DRA DCTN1 CTSD CTSB TNNI2 GARS MAP1B EEF1A2 SNX2 MYO1D AGL RPN2 CAPN2 AP2A2 RPS2 ANPEP AKR7A2 HMGB1 S100A9 HRNR MPO CA2 CA1 LTF SLC4A1 HBD CAT BLVRB PSMA7 RPS16 PRKDC HSPA4 H6PD C1QBP RPL10 MYH3 CNN1 PEA15 FAM162A CCT2 GTF3C4 ATP5D NSFL1C DDX1 PDLIM5 UQCRC1 APOC2 PHPT1 DMD AAMDC VPS35 DCD CNDP2 FBLN2 H3F3A HMCN1 PRKCDBP H1F0 MRC2 DARS PFKP GNAS RAC1 CHD9 PGAM2 SOD1 TPI1 MDH1 FHL1 ALDOC FABP3 CRYAB MB CA3 ENO3 CKM ALDOA LAMA5 HAPLN3 MATN2 COL6A3 COL6A2 COL6A1 LOX FBN1 ANXA4 ANXA1 COL1A2 CALU CDK17 NEB UQCRC2 PDHB HADH IGKV2-24 AK3 PAPOLA AK2 RDX AKR1C3 UQCRB MBL2 CRISPLD1 COX6B1 SF3B3 CHI3L2 COX4I1 PHB2 IGKV1-6 AKAP12 MFAP4 COL28A1 CCL21 SIL1 CYFIP1 HNRNPL PPIC PAICS DDX39B CMPK1 CDC42 RPS11 PROZ DAK SEMA3A TNFAIP6 PRG4 HTRA1 TIMP3 FN1 CRLF1 CHI3L1 BPIFB1 A2M AGT F13A1 VWF PTGES3 IL1RAP TCAP CORO1B IGKV4-1 VARS MST1 CFHR5 MANF C4A ECM1 PROS1 ROCK1 GPLD1 EFEMP1 POSTN MMP2 SERPINF1 ITIH1 CRP PLTP SERPINA10 SAA4 APOC3 NAMPT FCN3 PODN BTD SERPINA6 SERPINF2 PGLYRP2 ITIH4 SERPIND1 PON1 DSG1 C5 APOA2 KLKB1 IGFALS HPR SERPINA3 SERPINA1 F12 CFI AFM AHSG SERPINA7 C3 IGHD C7 HABP2 C4B C2 IGLL5 IGLC6 IGKC LRG1 ORM2 ORM1 IGHA1 HPX GC IGHG3 SERPINC1 P0DOX2 KNG1 IGHG2 IGHG1 TF ALB APOH F2 A1BG CD14 AZGP1 PLG ITIH2 VTN FGA FGG FGB APOA1 TTR HRG C9 C8G C8A IGHM HP APOA4 SERPINA4 LBP C8B C6 APOE APCS APOB IGHG4 CFH IGJ CD5L F5 HOXD1 SHBG LGALS3BP F13B CPN2 CPB2 APOL1 APOC1 MYLK APOM ANGPTL4 C1RL PIGR PADI2 F11 ARPC5 FKBP10 IGKV3-20 CNPY2 ITIH3 DSP C4BPA CFP C1QC C1QB proteins ALS1 (M6 Module Module excld. Module M1 Module M2 Module M3 M4 M5 Module M6 M6a) (NP Signature) (AF Signature) (AF Sig.) (AF Sig.) (AF Sig.) (Blood) FIGURE 6

bioRxiv preprint doi: https://doi.org/10.1101/2020.07.11.192948; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.Collagens: COL1A1, COL3A1, A Young and non-degenerated (YND) B Wnt, BMP/TGFβ COL13A1 antgaonists: Proteoglycans: DCN Aged and degenerated (AGD) CHRDL2 Proteolytic: ADAMTS5 BMP: BMP2, NP SVM boundary for YND/AGD Others: ANKRD10, DST, FLJ38717, HIST1H2BG, TGFb: LTBP1 AF SVM boundary for AF/NP JMJD1C, N4BP2L2, Ossi cation: IBSP 0 0 EIF1AY PDE1A, PELI1, TncRNA, PLA2G2A Proliferation: CCND1, AGD45NP IGFBP3/5/6 YND74AF AGD45AF TRA2A, UBE2D3, WIF1 ZCCHC7, ZNF207, F5 Osmotic: AQP1, KCNK5 ZNF638 PSPH YND88AF C4BPA SPARCL1 EPYC 0.5 IL11 PRG4 0 WNT16 KIAA0146 NP markers: GABBR2 0 SLC1A1 up-regulated Collagens: COL1A1, down-regulated KRT8/19, FN1 COL14A1 in all AGD CD24, NRN1 in all AGD COL10A1 YND88NP AGD40AF -0.5 Glycoproteins: FGL2 samples HOPX, SAMD5 ST6GALNAC5 samples IGFBP1 Others: ACTA2, Stress: HSPA6 TMEM200A TNFAIP6 PC2(13.5%) 0 EMCN, MMP1, BMP antgaonists: ADAMTS6 DEFB1 SERPINF1, 0 GREM1 CDH19 0 CXCL14 GPNMB, EDNRA, Wnt antgaonists: TREM1 KRIT1, TM4SF18, 1 15 FRZB 2 1 AGD40NP STC1 1 7 -0.5 12 15 -60YND74NP -40 -20 0 20 40 116 35 0.5 0 0 0 -100 -50 0 50 100 down-regulated down-regulated up-regulated up-regulated PC1 (51.5%) in AGD AF in AGD NP in AGD AF in AGD NP

C D LAMC1 MYH9

SERPINF1 6

HPX SERPINC1 AHNAK IGHA1 MYH9 FGG ITIH2 ATP5B LAMA4 VCP THBS1 4 LMNA PLS3 COL1A1 CAP1 AOC2 SERPINH1 IQGAP1 VCPATP5B COL14A1 CALR HSP90AB1 VCLACTN1 COL12A1 CALR THBS4 HSP90AA1 NCL TUBA1B HBB HBA1 COL6A1THBS2 HSPA8ANXA6 HNRNPA2B1 IQGAP1CLIC1GSTP1MFGE8 ACTA2 PRDX2 ANXA6 PFN1 ANXA2 SND1 GDI2 COL6A2 HIST1H1C GNAI2 ACTN4UBA1 PPIA EEF2 PRELP ACTG1ALDOA ANXA5FMODANXA4 ACTG1GDI2 THBS4

SRPX2 ARF4 LOX 2 SOD1ARF1 VWA1 EEF1A1YWHAE DCNHSP90B1 PPIACOMP DSTN PDIA6UGP2 YWHAE PDIA3 CILP2 PRDX1 PDIA3 GAPDH FBN1 ANXA1 MSN TGFBI YWHAZ MSN HSP90B1

young inner disc} young CILP PGK1 HSPA5 ARF1 ANXA5 ENO1

PCOLCE2 CFB CCDC80 SMOC2 LPL AF} − { CHRD SMOC1 TIMP1 GALNT2 EDIL3 (proteomics data) (proteomics data) CD109 MFI2 OAF SLPI KAL1 C1S SERPINE2 FRZB SERPINA5COL5A1 CP CLSTN1 CHRDL2 MXRA5 SEMA3C QSOX1 DKK3 MATN3 TNFRSF11B VASN C1S {aged OAF} − inner disc} {y oung O

CFHR2

−2 0 A2M INHBA CHI3L2 ANGPTL4 KRT19

KRT8 −6 −4 −2 0 2 4 6

−4 −2 0 2 4 6 −2 −1 0 1 2 3

{YND AF} − {YND NP} {AGD AF} − {AGD NP} (microarray data) (microarray data) E F IGHM

APOA4 10 APOA4 SERPINC1 APOA1 HPX PLG FGG IGHA1 FGA PLG ITIH2 ITIH4 F2 POSTN C8B SERPINF1 VTN ITIH4 ORM1 C6 APOB APCS IGHG1 5 C6 APOH AZGP1 TF AHSG F13A1 PRG4 POSTN F13A1 A2M ECM1

CTSD TIMP3 HTRA1 SERPINA1 HTRA3 FN1 CLU SERPINA3CFD SERPINE1 CTSD CCDC80 ANGPTL2 TIMP1 CLEC11A 0 (proteomics data) (proteomics data) HSPA5 ANXA5 APOD EDIL3 HSP90AA1 ACAN RPS27A PEBP1 ANXA5 HNRNPA2B1YWHAETHBS4 RCN1DSTN VWA1 HIST1H1C ACTA2 LGALS3 SPP1 ABI3BP HAPLN3 ANXA4 COL1A2 CCL18CXCL12PYGM COL5A2 AHNAK LECT1 TUBA1B {aged OAF} − {young COL1A1 HAPLN3 COL15A1 CD109 COL6A1 COL6A2 CHRDL2 SLPI CALU VIM COL6A1 FRZB ANXA4 UGP2 SCUBE1 SPARC CLEC3A COL14A1 {aged inner disc} − {young ENPP2 CKM COL11A2

−5 KRT8

−4 −2 0 2 4 6 8 KRT19 IL17B IL17B COL11A2

−4 −2 0 2 4 6 −4 −2 0 2 4

{AGD AF} − {YND AF} {AGD NP} − {YND NP} (microarray data) (microarray data) FIGURE 7

A B YND: Young and non-degenerated Collagens Glycoproteins ECM regulators Others bioRxivSILAC preprint doi: https://doi.org/10.1101/2020.07.11.192948AGD: Aged and degenerated ; this version posted July 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved.Proteoglycans No reuse ECMallowed affiliated without permission.Secreted factors

YND NP YND AF

250 300 150 200 AGDAGD NP AF

heavy ex vivo 150 200 100 labeling culture medium (7d) 100 100 50 50 vy (# proteins detected) Heavy-to-Light Ratio

0 Light (# proteins detected) 0 0 Hea

NP_AGD80AF_AGD80 NP_AGD80AF_AGD80 NP_AGD80AF_AGD80 NP_YND151NP_YND152AF_YND151AF_YND152OAF_YND148OAF_YND149 NP_YND151NP_YND152AF_YND151AF_YND152OAF_YND148OAF_YND149 NP_YND151NP_YND152AF_YND151AF_YND152OAF_YND148OAF_YND149 AAAAA C SOD2 GAPDHTAGLN2 FN1 SOD3 PRG4 CLU COL2A1Caveolin FGFBP2 200 150 YND AF (averaged)

100 COMPCILP HAPLIN1HBBCOL1A1 50 0 1 100 200 300 HBB CD44 C1QBGAPDH SOST COL2A1 FGFBP2 200 FN1 150 AGD AF CILPA2M 100 CLU HAPLIN1 COMPHBA1 50 PRG4 0 1 100 200 300 light heavy GREM1TAGLN2 GAPDH SOD2 (existing) (newly synthesised) 200 150 COL2A1Caveolin YND NP (averaged) COL1A1FN1 HAPLIN1 100 CILP HBBCLU A2M Abundance (Heavy) Abundance 50 PRG4 0 1 100 200 300 GAPDHFN1 200 HBB COL2A1 AGD NP m/z 150 100 CLUHAPLIN1CILP A2M COMP 50 PRG4 HBA1HBA2 0 1 100 200 300 GAPDH proteins

D 200 E 150 YND AF (averaged) PLOD2 P4HA2 LOXL2 IGFBP7 P3H1 P4HA1 LGALS1 IGFBP3 SERPINH1 MMP3 COL5A2 SERPINE2 S100A10 COL5A1 AEBP1 COL3A1 PLOD1 F13A1 MGP AMBP COL11A1 PCOLCE ANXA2 FN1 TIMP3 COL9A3 ANXA1 EDIL3 SPARC 100 200 CD109 ACAN THBS4 ANXA6 PRG4 VTN FBLN1 HTRA1 SMOC2 POSTN TNC COL2A1 FNDC1 MFAP4 NID1 COL6A2 S100A4 KERA S100A1 TGFBI THBS2 MFGE8 ANXA4 ITIH4 COL14A1 LUM

50 THBS1 S100B CHAD LAMC1 FGFBP2 COL11A2 PTN FBN1 DCN 100 COL15A1 ABI3BP COL6A3 LOX ASPN COMP (Heavy) HSPG2 CILP SRPX2 HAPLN1 COL1A2 CLEC3A TNXB

0 CILP2 in YND AF FMOD COL6A1 COL1A1 COL12A1 Abundance MATN2 PRELP SERPINF1 FGG FGA C1QB ANGPTL2 1 100 200 300 400 SERPINF2 A2M ITIH2 PLG SERPIND1 0 200 150 AGD AF 100 100 (Heavy) in AGD AF in AGD 50 Abundance 200 0 1 100 200 300 400 Collagens Glycoproteins ECM regulators 200 Proteoglycans ECM affiliated Secreted factors 150 YND NP (averaged) 100

Abundance (Light) Abundance 50 LGALS1 GREM1 P4HA2 P3H1 PLOD2 SERPINH1 P4HA1 LOXL2 FSTL1 ANXA2 ANXA6 S100A10 CSPG4 ASPN COL5A1 ANXA1 COL14A1 200 S100A1

0 COL5A2 TIMP3 S100A4 PLOD1 TNC COL9A3 COL6A2 MGP COL11A1 TGFBI COL2A1 SPARC EDIL3 SERPINE2 COL9A1 F13A1 THBS1 ANXA4 1 100 200 300 400 S100B AEBP1 ABI3BP CD109 COL1A1 FN1 HAPLN1 POSTN LUM LOX EMILIN1 CHAD COL1A2 MFGE8 100 COL11A2 COL6A3 ACAN HSPG2 MFAP4 SRPX2

200 CILP MATN4 (Heavy) CLEC3A FMOD A2M CILP2 DCN in YND NP FBN1 HTRA1 COL6A1 SERPINF1 Abundance PRG4 ITIH2 PRELP COL12A1 IGFBP3 ECM2 VTN FGA FGG COMP C1QC PLG SERPINC1 SERPINF2 ITIH4 SERPINA1 KNG1 SEMA3A 150 AGD NP SERPINA5 LPA SERPIND1 0 100 50 100 0 (Heavy) in AGD NP in AGD 1 100 200 300 400 Abundance 200 proteins GAPDH F TMT TAILS G Standard error of YND (young and/or non-degenerated) NP Standard error of AGD (aged and degenerated) NP ● Reference Target ● ● ● ● ● ● AGD- YND K K ● ● avrg avrg N N ● ● ● K K ● ● ● ● ● ● ● ● ● ● ● ● ● (n.s.) ● ● ● ● ● ● ● ● ● ● ● ● N N ● ● ● ● ● ● ● ● ● ● AGD ● ● ● ● ● ● ● ● ● ● ● ● avrg ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● TMT ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 2 4 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● labeling ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● YND K K ● ● avrg K K ● K K Pool

relative to YND136 NP less cleaved in AGD NP more cleaved in AGD NP ● −6 −4 −2 ● K K K K Ig Ig Ig K K Ig Ig Ig C3 LYZ FN1 ALB ALB CLU HBB A2M A2M CILP HBA1 PRG4 PRG4 PRG4 COMP COMP COMP COMP COMP COMP COMP COMP COMP COMP IGHG4 HTRA1 ABI3BP COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL2A1 COL1A1 COL2A1 COL2A1 COL2A1 COL2A1 COL1A1 COL1A1 HAPLN1 HAPLN1 FGFBP2 Trypsin digestion CHRDL2 COL11A2 COL11A1 5 10

K K Uncha racterized K K H K K ● AGD Depletion of avrg CHO Standard error of YND AF ● ● tryptic peptides ● ● CHO ● AGD YND withpolymer Standard error of AGD AF ● avrg- avrg ● ● ● ● ● ● ● (p=0.006147) ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● YND ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● avrg ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ● ●● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 2● 4 6 ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● Enrichment of relative to YND136 AF ● ● ● ● ● ● ● ● ● ● ● ● N-terminal peptides −2 ● ● ● ● & mass spectrometry ● less cleaved in AGD AF more cleaved in AGD AF Ig C3 FN1 FN1 CLU HBB HBB CILP CILP CILP HBA1 ACAN ACAN ACAN ACAN CILP2 CILP2 COMP COMP COMP COMP COMP COMP COMP COMP COMP Cluster COL2A1 COL1A1 COL1A1 COL1A1 COL6A3 COL1A1 COL1A1 COL1A2 COL1A1 COL1A1 COL2A1 COL1A1 COL1A1 COL1A1 COL1A1 COL1A1 COL1A1 COL1A2 COL1A1 COL2A1 COL1A1 COL1A1 COL6A3 COL2A1 COL2A1 COL1A2 COL6A1 COL1A1 COL2A1 COL2A1 COL1A2 COL1A1 COL1A1 COL1A2 COL1A1 m/z COL2A1 COL2A1 COL1A2 COL2A1 COL2A1 COL1A2 COL2A1 COL2A1 COL2A1 COL2A1 COL3A1 COL2A1 COL1A1 COL1A1 COL1A1 COL1A2 COL2A1 COL1A1 COL1A1 COL2A1 COL1A2 COL2A1 HAPLN1 HAPLN1 HAPLN1 FGFBP2 IGKV3-11 COL11A1 IGKV1-33 IGKV3-20 Uncharacterized 5 17 FIGURE 8 A B L3/4, Stack-2 L4/5, Stack-2 A L5/S1, Stack-2 bioRxiv preprint doi:A https://doi.org/10.1101/2020.07.11.192948; this version postedA July 12, 2020. The copyright holder for this preprint (which 10 was not certified by peer review) is the author/funder. All rights reserved. No reuse allowedStack without 1 permission. Stack 2 4 L R Stack 3 L 1 R L R 8 6 2 3 7 9 5 11 D P P P

NP C L3/4 L4/5 L5/S1 NP/IAF (L) 1 2 3 4 5 6 7 8 9 10 11 NP/IAF (A) NP/IAF (P) NP/IAF (R) IAF (R) IAF (L) OAF (R) OAF (L) OAF (P) OAF (A) intensities L3/4(3) L4/5(3) L4/5(2) L4/5(1) L3/4(2) L3/4(1) L5/S1(3) L5/S1(2) L5/S1(1) 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.1 0.30.20.4 0.5 0.6 NP NP NP MRI Intensities IAF L IAF R OAF L OAF R OAF A OAF P IAF L IAF R OAF L OAF R OAF A OAF P IAF L IAF R OAF L OAF R OAF A OAF P NP/IAF L NP/IAF R NP/IAF A NP/IAF P NP/IAF L NP/IAF R NP/IAF A NP/IAF P NP/IAF L NP/IAF R NP/IAF A NP/IAF P

F LASSO predicted MRI E Real MRI intensities intensities in the young discs Collagens Glycoproteins ECM regulators in the young discs (based 76 ECM genes) Proteoglycans ECM affiliated Secreted factors 0.5

0.75

0.4

0.50

0.3 ASPN ANXA4 ANXA5 ANXA1 CLEC3B CTSD CTSB SERPINB1 LOX SERPINH1 S100A9 PTN S100A11 S100A4 COL6A2 COL6A3 COL6A1 COL12A1 COL15A1 COL14A1 COL1A1 FMOD PRELP OGN HSPG2 THBS1 FBLN1 THBS4 MFGE8 MMRN2 TNXB LAMC1 NTN1 VTN TNC THBS2 NID2 FBN1 MATN2 LAMB2 VWA1 POSTN LAMA5 FGA FGB ANXA6 ANXA2 SEMA3E 0.25 0.0 0.5 0.2 FN1 A2M Real MRI intensities ISM2 CST3 MBL2 SDC2 PPBP PRG4 EPYC VCAN ECM1 CHRD TIMP3 TIMP1 TIMP2 FSTL1 LOXL3 LOXL2 INHBA TGFB1 AEBP1 HTRA3 SPARC CSPG4 HGFAC MXRA5 PDGFC

IGFBP3 0.00 FAM20C Predicted MRI intensities TNFAIP6 SEMA3A MRI intensities SERPINA5 SERPINA3 SERPINE2 SERPINA7 SERPINE1 Correlation with SERPING1

−0.5 NP IAF NP IAF OAF OAF

NP/IAF NP/IAF

INHBA G THBS1 development FMOD ageing, injury DKK3 CHRDL2 DCN degeneration FRZB CHRD CD109 Shh (HHIP, SCUBE) Wnt Bmp TGFβ TGFβ (TGFBI, LTBP1) notochordal cells (KRT19, KRT8, DSC3) Hypertrophy chondrocytes (LECT1/2, CLEC3A) hypertrophy (COL10A1, IBSP) proteolytic (HTRA1,ADAMTS, MMP) S100A1 newly inammation synthesised stress (SOD↓, CLU) senescence (Ribosomal↓) bleeding (CD14, CD163,HBB) young brosis (POSTN, FN1) dehydration young

proteome

aged degradome aged

H Y1 (NP) attened young aged Y2 (AF) marginally

detected Y3 (SMC)

variable marginally set variable detected set constant set Y4 (Blood) inverted constant Cellularity set Cellularity

hydration matrisome

young and aged or homeostatic degenerated