Identification of pre-diagnostic proteomics markers contributing to the future risk of lung cancer
Sonia Dagnino, Barbara Bodinier, Marc Chadeau-Hyam
8th October, 2020
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 1 / 27 Data overview
Prospective case-control study of lung cancer on 648 participants from EPIC Italy (N=382) and NOWAC (N=266, Women only)
92 circulatory inflammatory proteins measured with the Olink platform
Protein levels provided are log2-transformed (Olink’s NPX unit where an increase of 1 NPX corresponds to the doubling of protein concentration) Replicated measurements (N=2) for 56 EPIC Italy participants Quality control: 16 controls (same sample measured 16 times) ⇒ A total of 720 measurements (704 samples and 16 controls) over 8 plates (N=90 samples each)
In the same participants (NOWAC only, N=222): transcriptomics measurements (N=11,610 probes)
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 2 / 27 Participant characteristics
Full population NOWAC EPIC Women EPIC Men (N=648) (N=266) (N=169) (N=213) Age at sample (years) 55.29 (5.58) 56.49 (4.01) 53.17 (7.49) 55.48 (4.99) Gender: Female 435 (67.1) 266 (100.0) 169 (100.0) 0 (0.0) Body Mass Index 25.58 (3.86) 24.90 (3.53) 25.56 (4.91) 26.41 (3.05) Centre NOWAC 266 (41.0) 266 (100.0) 0 (0.0) 0 (0.0) Florence 132 (20.4) 0 (0.0) 79 (46.7) 53 (24.9) Naples 8 (1.2) 0 (0.0) 8 (4.7) 0 (0.0) Ragusa 28 (4.3) 0 (0.0) 0 (0.0) 28 (13.1) Turin 122 (18.8) 0 (0.0) 26 (15.4) 96 (45.1) Varese 92 (14.2) 0 (0.0) 56 (33.1) 36 (16.9) Lung cancer status: case 323 (49.8) 133 (50.0) 84 (49.7) 106 (49.8) Time to diagnosis (years) 5.80 (3.59) 3.81 (2.02) 7.24 (3.69) 7.17 (3.87) Subtype Adenocarcinoma 142 (44.0) 63 (47.4) 37 (44.0) 42 (39.6) Large-cell carcinoma 42 (13.0) 6 (4.5) 13 (15.5) 23 (21.7) Small-cell carcinoma 46 (14.2) 26 (19.5) 10 (11.9) 10 (9.4) Squamous-cell carcinoma 50 (15.5) 19 (14.3) 11 (13.1) 20 (18.9) Other lung cancer 43 (13.3) 19 (14.3) 13 (15.5) 11 (10.4) Smoking status Never 181 (28.1) 70 (26.3) 69 (40.8) 42 (20.0) Former 199 (30.9) 73 (27.4) 35 (20.7) 91 (43.3) Current 265 (41.1) 123 (46.2) 65 (38.5) 77 (36.7) Smoking intensity 9.56 (8.75) 7.79 (6.12) 6.87 (8.66) 13.80 (9.89) Packyears 14.36 (14.59) 11.44 (10.75) 9.72 (13.00) 21.64 (17.03) Time since quitting smoking (years) 5.42 (8.77) 4.96 (9.49) 4.06 (7.44) 6.75 (8.51) Smoking duration 22.04 (16.83) 24.29 (17.91) 15.31 (15.41) 24.64 (15.08) Cumulative Smoking Index 0.94 (0.93) 0.56 (0.43) 0.93 (1.03) 1.42 (1.08) Quality Control: Warning 15 (2.3) 3 (1.1) 1 (0.6) 11 (5.2) Storage time (years) 14.91 (4.94) 9.03 (0.98) 18.97 (1.17) 18.74 (1.23) Gel status: poor quality 21 (10.7) - 4 (5.2) 17 (14.2)
⇒ Differences in smoking habits between Men and Women in EPIC
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 3 / 27 Participant characteristics
Full population NOWAC EPIC Women EPIC Men (N=648) (N=266) (N=169) (N=213) Age at sample (years) 55.29 (5.58) 56.49 (4.01) 53.17 (7.49) 55.48 (4.99) Gender: Female 435 (67.1) 266 (100.0) 169 (100.0) 0 (0.0) Body Mass Index 25.58 (3.86) 24.90 (3.53) 25.56 (4.91) 26.41 (3.05) Centre NOWAC 266 (41.0) 266 (100.0) 0 (0.0) 0 (0.0) Florence 132 (20.4) 0 (0.0) 79 (46.7) 53 (24.9) Naples 8 (1.2) 0 (0.0) 8 (4.7) 0 (0.0) Ragusa 28 (4.3) 0 (0.0) 0 (0.0) 28 (13.1) Turin 122 (18.8) 0 (0.0) 26 (15.4) 96 (45.1) Varese 92 (14.2) 0 (0.0) 56 (33.1) 36 (16.9) Lung cancer status: case 323 (49.8) 133 (50.0) 84 (49.7) 106 (49.8) Time to diagnosis (years) 5.80 (3.59) 3.81 (2.02) 7.24 (3.69) 7.17 (3.87) Subtype Adenocarcinoma 142 (44.0) 63 (47.4) 37 (44.0) 42 (39.6) Large-cell carcinoma 42 (13.0) 6 (4.5) 13 (15.5) 23 (21.7) Small-cell carcinoma 46 (14.2) 26 (19.5) 10 (11.9) 10 (9.4) Squamous-cell carcinoma 50 (15.5) 19 (14.3) 11 (13.1) 20 (18.9) Other lung cancer 43 (13.3) 19 (14.3) 13 (15.5) 11 (10.4) Smoking status Never 181 (28.1) 70 (26.3) 69 (40.8) 42 (20.0) Former 199 (30.9) 73 (27.4) 35 (20.7) 91 (43.3) Current 265 (41.1) 123 (46.2) 65 (38.5) 77 (36.7) Smoking intensity 9.56 (8.75) 7.79 (6.12) 6.87 (8.66) 13.80 (9.89) Packyears 14.36 (14.59) 11.44 (10.75) 9.72 (13.00) 21.64 (17.03) Time since quitting smoking (years) 5.42 (8.77) 4.96 (9.49) 4.06 (7.44) 6.75 (8.51) Smoking duration 22.04 (16.83) 24.29 (17.91) 15.31 (15.41) 24.64 (15.08) Cumulative Smoking Index 0.94 (0.93) 0.56 (0.43) 0.93 (1.03) 1.42 (1.08) Quality Control: Warning 15 (2.3) 3 (1.1) 1 (0.6) 11 (5.2) Storage time (years) 14.91 (4.94) 9.03 (0.98) 18.97 (1.17) 18.74 (1.23) Gel status: poor quality 21 (10.7) - 4 (5.2) 17 (14.2)
⇒ Differences in smoking duration between NOWAC and EPIC Women
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 3 / 27 Participant characteristics
Full population NOWAC EPIC Women EPIC Men (N=648) (N=266) (N=169) (N=213) Age at sample (years) 55.29 (5.58) 56.49 (4.01) 53.17 (7.49) 55.48 (4.99) Gender: Female 435 (67.1) 266 (100.0) 169 (100.0) 0 (0.0) Body Mass Index 25.58 (3.86) 24.90 (3.53) 25.56 (4.91) 26.41 (3.05) Centre NOWAC 266 (41.0) 266 (100.0) 0 (0.0) 0 (0.0) Florence 132 (20.4) 0 (0.0) 79 (46.7) 53 (24.9) Naples 8 (1.2) 0 (0.0) 8 (4.7) 0 (0.0) Ragusa 28 (4.3) 0 (0.0) 0 (0.0) 28 (13.1) Turin 122 (18.8) 0 (0.0) 26 (15.4) 96 (45.1) Varese 92 (14.2) 0 (0.0) 56 (33.1) 36 (16.9) Lung cancer status: case 323 (49.8) 133 (50.0) 84 (49.7) 106 (49.8) Time to diagnosis (years) 5.80 (3.59) 3.81 (2.02) 7.24 (3.69) 7.17 (3.87) Subtype Adenocarcinoma 142 (44.0) 63 (47.4) 37 (44.0) 42 (39.6) Large-cell carcinoma 42 (13.0) 6 (4.5) 13 (15.5) 23 (21.7) Small-cell carcinoma 46 (14.2) 26 (19.5) 10 (11.9) 10 (9.4) Squamous-cell carcinoma 50 (15.5) 19 (14.3) 11 (13.1) 20 (18.9) Other lung cancer 43 (13.3) 19 (14.3) 13 (15.5) 11 (10.4) Smoking status Never 181 (28.1) 70 (26.3) 69 (40.8) 42 (20.0) Former 199 (30.9) 73 (27.4) 35 (20.7) 91 (43.3) Current 265 (41.1) 123 (46.2) 65 (38.5) 77 (36.7) Smoking intensity 9.56 (8.75) 7.79 (6.12) 6.87 (8.66) 13.80 (9.89) Packyears 14.36 (14.59) 11.44 (10.75) 9.72 (13.00) 21.64 (17.03) Time since quitting smoking (years) 5.42 (8.77) 4.96 (9.49) 4.06 (7.44) 6.75 (8.51) Smoking duration 22.04 (16.83) 24.29 (17.91) 15.31 (15.41) 24.64 (15.08) Cumulative Smoking Index 0.94 (0.93) 0.56 (0.43) 0.93 (1.03) 1.42 (1.08) Quality Control: Warning 15 (2.3) 3 (1.1) 1 (0.6) 11 (5.2) Storage time (years) 14.91 (4.94) 9.03 (0.98) 18.97 (1.17) 18.74 (1.23) Gel status: poor quality 21 (10.7) - 4 (5.2) 17 (14.2)
⇒ Differences in time to diagnosis between NOWAC and EPIC
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 3 / 27 Participant characteristics
Full population NOWAC EPIC Women EPIC Men (N=648) (N=266) (N=169) (N=213) Age at sample (years) 55.29 (5.58) 56.49 (4.01) 53.17 (7.49) 55.48 (4.99) Gender: Female 435 (67.1) 266 (100.0) 169 (100.0) 0 (0.0) Body Mass Index 25.58 (3.86) 24.90 (3.53) 25.56 (4.91) 26.41 (3.05) Centre NOWAC 266 (41.0) 266 (100.0) 0 (0.0) 0 (0.0) Florence 132 (20.4) 0 (0.0) 79 (46.7) 53 (24.9) Naples 8 (1.2) 0 (0.0) 8 (4.7) 0 (0.0) Ragusa 28 (4.3) 0 (0.0) 0 (0.0) 28 (13.1) Turin 122 (18.8) 0 (0.0) 26 (15.4) 96 (45.1) Varese 92 (14.2) 0 (0.0) 56 (33.1) 36 (16.9) Lung cancer status: case 323 (49.8) 133 (50.0) 84 (49.7) 106 (49.8) Time to diagnosis (years) 5.80 (3.59) 3.81 (2.02) 7.24 (3.69) 7.17 (3.87) Subtype Adenocarcinoma 142 (44.0) 63 (47.4) 37 (44.0) 42 (39.6) Large-cell carcinoma 42 (13.0) 6 (4.5) 13 (15.5) 23 (21.7) Small-cell carcinoma 46 (14.2) 26 (19.5) 10 (11.9) 10 (9.4) Squamous-cell carcinoma 50 (15.5) 19 (14.3) 11 (13.1) 20 (18.9) Other lung cancer 43 (13.3) 19 (14.3) 13 (15.5) 11 (10.4) Smoking status Never 181 (28.1) 70 (26.3) 69 (40.8) 42 (20.0) Former 199 (30.9) 73 (27.4) 35 (20.7) 91 (43.3) Current 265 (41.1) 123 (46.2) 65 (38.5) 77 (36.7) Smoking intensity 9.56 (8.75) 7.79 (6.12) 6.87 (8.66) 13.80 (9.89) Packyears 14.36 (14.59) 11.44 (10.75) 9.72 (13.00) 21.64 (17.03) Time since quitting smoking (years) 5.42 (8.77) 4.96 (9.49) 4.06 (7.44) 6.75 (8.51) Smoking duration 22.04 (16.83) 24.29 (17.91) 15.31 (15.41) 24.64 (15.08) Cumulative Smoking Index 0.94 (0.93) 0.56 (0.43) 0.93 (1.03) 1.42 (1.08) Quality Control: Warning 15 (2.3) 3 (1.1) 1 (0.6) 11 (5.2) Storage time (years) 14.91 (4.94) 9.03 (0.98) 18.97 (1.17) 18.74 (1.23) Gel status: poor quality 21 (10.7) - 4 (5.2) 17 (14.2)
⇒ Differences in storage time between NOWAC and EPIC
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 3 / 27 Below detection limit/missing data by disease status
Proportion of measurements below the limit of detection in cases (N=351 samples from 323 participants) and controls (N=353 samples from 325 participants) separately
1.0 Healthy controls (N=353) Lung cancer cases (N=351)
0.8
0.6
0.4 Proportion LOD below
0.2
0.0 IL2 IL4 IL5 IL6 IL8 IL7 LIF uPA NT3 TNF IL33 IL20 IL24 IL13 IL10 IL18 SCF CD6 CD5 ADA HGF LIFR OPG Flt3L OSM TSLP PDL1 FGF5 CCL3 CCL4 CST5 CD40 CSF1 TNFB IL17A IL12B IL17C CD8A ARTN NRTN MCP3 MCP2 MCP4 MCP1 IL2RB SIRT2 GDNF DNER TRAIL MMP1 AXIN1 ST1A1 FGF23 4EBP1 FGF21 CCL25 CCL20 CCL11 CCL19 CCL28 CCL23 FGF19 CD244 CXCL9 CXCL5 CXCL1 CXCL6 IL18R1 CASP8 VEGFA IL20RA IL10RA IL15RA IL10RB CDCP1 MMP10 TWEAK CX3CL1 CXCL11 CXCL10 IL1alpha SLAMF1 IL22RA1 STAMBP TRANCE BetaNGF ENRAGE TNFSF14 TGFalpha TNFRSF9 IFNgamma LAPTGFbeta1
⇒ 21 proteins with more than 30% of measurements below the detection limit ⇒ Additionally, 3 missing values for sample 103590 (warning in QC) ⇒ Very small differences in proportions of missing values between cases and controls
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 4 / 27 Quality check: control measurements
Standard deviation between samples (dark blue) and controls (red) Protein names are ordered by decreasing proportion of samples below the LOD
2.0 Samples (N=704) Controls (N=16)
1.5
1.0 Standard deviation
0.5
0.0 IL2 IL4 IL5 IL6 IL8 IL7 LIF uPA NT3 TNF IL33 IL20 IL24 IL13 IL10 IL18 SCF CD6 CD5 ADA HGF LIFR OPG Flt3L OSM TSLP PDL1 FGF5 CCL3 CCL4 CST5 CD40 CSF1 TNFB IL17A IL12B IL17C CD8A ARTN NRTN MCP3 MCP2 MCP4 MCP1 IL2RB SIRT2 GDNF DNER TRAIL MMP1 AXIN1 ST1A1 FGF23 4EBP1 FGF21 CCL25 CCL20 CCL11 CCL19 CCL28 CCL23 FGF19 CD244 CXCL9 CXCL5 CXCL1 CXCL6 IL18R1 CASP8 VEGFA IL20RA IL10RA IL15RA IL10RB CDCP1 MMP10 TWEAK CX3CL1 CXCL11 CXCL10 IL1alpha SLAMF1 IL22RA1 STAMBP TRANCE BetaNGF ENRAGE TNFSF14 TGFalpha TNFRSF9 IFNgamma LAPTGFbeta1
⇒ More variability between the actual samples than between the controls for all proteins except IL1alpha (∼ 80% of data below the detection limit)
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 5 / 27 Repeated measurements
Pearson’s correlation between replicates (N=56 samples measured twice) on the 71 proteins with less than 30% of missing values 1.000 0.995 0.990 0.985 0.980 0.975 Correlation between replicates Correlation between 0.970 0.965 2101850 (N=69) 2100495 (N=71) 2101630 (N=71) 2102068 (N=71) 2101536 (N=71) 2100450 (N=69) 2103779 (N=71) 2103865 (N=70) 2102038 (N=71) 2103767 (N=68) 2101836 (N=70) 2102121 (N=70) 2101730 (N=69) 2103774 (N=71) 2104083 (N=70) 2103971 (N=71) 2104096 (N=70) 2104023 (N=69) 2103895 (N=71) 2104084 (N=70) 2104013 (N=68) 2105573 (N=70) 2105490 (N=71) 2105053 (N=71) 2105697 (N=69) 2105253 (N=70) 2105623 (N=71) 2105298 (N=70) 2105524 (N=70) 2105179 (N=68) 2106943 (N=70) 2105180 (N=70) 2107903 (N=69) 2107578 (N=71) 2105501 (N=71) 2105407 (N=69) 2108054 (N=69) 2108262 (N=71) 2108330 (N=70) 2108558 (N=71) 2108343 (N=67) 2107974 (N=70) 2109777 (N=68) 2110358 (N=69) 2110574 (N=67) 2110115 (N=71) 2111901 (N=70) 2109801 (N=70) 2110032 (N=68) 2111944 (N=70) 2111169 (N=71) 2111183 (N=70) 2112878 (N=71) 2112935 (N=70) 2111168 (N=71) 2111282 (N=69)
⇒ High consistency between both measurements (correlations above 0.96) ⇒ The average value is taken for replicated measurements ⇒ Remaining missing values are imputed using QRILC for left-censored data
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 6 / 27 Quality control and outlier detection
Principal Component Analysis: score plots along the first 3 PCs Detection of outliers along the first 5 PCs using the multivariate distance-based algorithm implemented in the R package mvoutlier
10 10 4
2
5 5 0
−2 0 0
−4 Comp 2 (9.18% e.v.) Comp 3 (3.61% e.v.) Comp 3 (3.61% e.v.) −6 −5 −5
−8
−30 −20 −10 0 10 −30 −20 −10 0 10 −8 −6 −4 −2 0 2 4 Comp 1 (43.7% e.v.) Comp 1 (43.7% e.v.) Comp 2 (9.18% e.v.)
Selected Samples (N=578) Quality Control (N=5) Sample Conservation (N=6) Detected Outliers (N=40) Quality Control + Detected Outliers (N=4) Sample Conservation + Detected Outliers (N=9) Quality Control + Sample Conservation + Detected Outliers (N=6)
⇒ All 70 samples with poor quality data are removed for further analyses
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 7 / 27 Principal component analysis after exclusions
PCA on the imputed data after exclusion of participants (N=578)
100 NOWAC (N=238) Florence (N=131) Naples (N=7) Turin (N=83) 80 Varese (N=92) 5 Men (N=176) Women (N=402)
60
0 40 Comp 2 (7.46% e.v.)
20 −5 Percentage of explained variance of explained Percentage
0
0 10 20 30 40 50 60 70 −10 −5 0 5 10
Number of components Comp 1 (27.23% e.v.)
⇒ The first PC explains 27% of the variance ⇒ Score plots show strong differences between EPIC centres and NOWAC ⇒ Need to account for heterogeneity between study/cohorts in the models
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 8 / 27 Principal component analysis after exclusions
Denoising of the data by extracting the residuals from a linear mixed model with protein levels (outcome) against plate and centre as random effects PCA on the residuals (denoised data)
100
5
80
60 0
40 Comp 2 (7.52% e.v.) NOWAC (N=238) −5 Florence (N=131) 20 Naples (N=7) Turin (N=83) Percentage of explained variance of explained Percentage Varese (N=92) Men (N=176) 0 Women (N=402) −10
0 10 20 30 40 50 60 70 −10 −5 0 5 10
Number of components Comp 1 (18.93% e.v.)
⇒ The first PC explains 19% of the variance ⇒ Linear mixed models were successful in removing centre effects
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 9 / 27 Analytical plan
1 Investigating the associations with lung cancer in Women only 2 Accounting for the effects of smoking using packyears and smoking status 3 Sensitivity analyses: stratification by cohort and median time to diagnosis 4 Validation in Men 5 Accounting for joint effects of the proteins in multivariate analyses 6 Stratification on main histological subtypes 7 Validation in external cohorts (EPIC, NSHDS) 8 Evaluation of the complementarity between proteins and packyears in lung cancer status discrimination using ROC curves 9 Exploration of the functional role of lung cancer-related proteins via OMICs integration
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 10 / 27 Univariate analyses in Women
Univariate logistic models with lung cancer status/packyears (outcome) against protein levels (predictor) and adjusted on age and BMI
⇒ 12 significant associations with lung cancer after FDR correction ⇒ 11/12 are also associated with packyears ⇒ Overall attenuation of the strength of the association with lung cancer upon adjustment on packyears, only CDCP1 survives adjustment
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 11 / 27 Investigating the effects on smoking
Stratified analyses by smoking status
⇒ No significant association in never smoking Women ⇒ CDCP1, SCF and IL6 are significantly associated with future lung cancer status in current smoking Women
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 12 / 27 Validation in Men
Analyses adjusted on packyears are conducted in Women (N=388) and Men (N=173) separately
0.6 CDCP1 CDCP1 3.0
0.4 2.5 (Men)
2.0 ) e
0.2 u l a
v 1.5 (Men)
− β p ( 0.0 10 1.0 g o l − 0.5 −0.2
0.0
−0.2 0.0 0.2 0.4 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5
β (Women) − log10(p − value) (Women)
⇒ CDCP1 is nominally significant in Men
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 13 / 27 Pairwise correlations between proteins
Heatmap of Pearson’s correlations between the imputed levels of the 71 inflammatory proteins in controls (left) and cases (right) Hierarchical clustering performed in healthy controls CONTROLS (N=290) CASES (N=288)
AXIN1 1 AXIN1 1 SIRT2 SIRT2 STAMBP STAMBP ST1A1 ST1A1 ENRAGE 0.5 ENRAGE 0.5 CASP8 CASP8 LAPTGFbeta1 LAPTGFbeta1 CD40 CD40 CCL11 CCL11 CXCL11 0 CXCL11 0 MCP4 MCP4 VEGFA VEGFA HGF HGF CD6 −0.5 CD6 −0.5 CD5 CD5 TGFalpha TGFalpha OSM OSM TNFSF14 TNFSF14 CD244 −1 CD244 −1 CSF1 CSF1 IL8 IL8 MMP1 MMP1 4EBP1 4EBP1 ADA ADA MCP1 MCP1 CCL23 CCL23 CXCL6 CXCL6 MCP2 MCP2 CST5 CST5 IL15RA IL15RA IL10 IL10 CDCP1 CDCP1 IFNgamma IFNgamma TNFB TNFB TNF TNF IL12B IL12B TNFRSF9 TNFRSF9 FGF21 FGF21 CD8A CD8A IL6 IL6 TRANCE TRANCE FGF19 FGF19 IL17A IL17A MMP10 MMP10 uPA uPA TRAIL TRAIL TWEAK TWEAK OPG OPG LIFR LIFR IL10RB IL10RB CX3CL1 CX3CL1 DNER DNER CCL25 CCL25 FGF23 FGF23 SCF SCF NT3 NT3 IL7 IL7 CXCL1 CXCL1 CXCL5 CXCL5 CCL19 CCL19 CCL20 CCL20 Flt3L Flt3L CCL28 CCL28 IL18 IL18 CXCL9 CXCL9 CXCL10 CXCL10 IL18R1 IL18R1 PDL1 PDL1 MCP3 MCP3 CCL4 CCL4 CCL3 CCL3 AXIN1 SIRT2 STAMBP ST1A1 ENRAGE CASP8 LAPTGFbeta1 CD40 CCL11 CXCL11 MCP4 VEGFA HGF CD6 CD5 TGFalpha OSM TNFSF14 CD244 CSF1 IL8 MMP1 4EBP1 ADA MCP1 CCL23 CXCL6 MCP2 CST5 IL15RA IL10 CDCP1 IFNgamma TNFB TNF IL12B TNFRSF9 FGF21 CD8A IL6 TRANCE FGF19 IL17A MMP10 uPA TRAIL TWEAK OPG LIFR IL10RB CX3CL1 DNER CCL25 FGF23 SCF NT3 IL7 CXCL1 CXCL5 CCL19 CCL20 Flt3L CCL28 IL18 CXCL9 CXCL10 IL18R1 PDL1 MCP3 CCL4 CCL3 AXIN1 SIRT2 STAMBP ST1A1 ENRAGE CASP8 LAPTGFbeta1 CD40 CCL11 CXCL11 MCP4 VEGFA HGF CD6 CD5 TGFalpha OSM TNFSF14 CD244 CSF1 IL8 MMP1 4EBP1 ADA MCP1 CCL23 CXCL6 MCP2 CST5 IL15RA IL10 CDCP1 IFNgamma TNFB TNF IL12B TNFRSF9 FGF21 CD8A IL6 TRANCE FGF19 IL17A MMP10 uPA TRAIL TWEAK OPG LIFR IL10RB CX3CL1 DNER CCL25 FGF23 SCF NT3 IL7 CXCL1 CXCL5 CCL19 CCL20 Flt3L CCL28 IL18 CXCL9 CXCL10 IL18R1 PDL1 MCP3 CCL4 CCL3
⇒ Overall, similar correlation patterns in cases and controls ⇒ Some correlated proteins, need to account for this in the models
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 14 / 27 Multivariate analyses
Logistic-LASSO on lung cancer against all proteins, adjusted on age and BMI Variable selection used in combination with stability analyses (1,000 iterations on subsamples of 80% of the data) to derive selection proportions in the base and further adjusted on packyears models
1.0 1.0 SCF CDCP1 A Base model IL10 MCP1 IL8 CDCP1 Model adjusted on packyears IL6 IL12BCD8A IL10ST1A1 MMP10 0.8 0.8 CXCL11 IL8
ST1A1 LIFR CXCL6CXCL5 OPG 0.6 0.6 IL6 SCF CD8A
0.4 0.4
Selection proportion 0.2 Selection proportion 0.2
0.0 0.0
IL8 IL6 IL7 −5 0 5 10 uPA NT3 TNF IL10 IL18 SCF CD5 CD6 ADA HGF LIFR OPG Flt3L OSM PDL1 CCL4 CD40 CSF1 CST5 CCL3 TNFB IL12B IL17A CD8A MCP1 MCP2 MCP3 MCP4 SIRT2 TRAIL DNER MMP1 AXIN1 ST1A1 CCL20 CCL11 FGF19 CCL23 FGF21 FGF23 CCL19 CCL28 CD244 CCL25 4EBP1 CXCL6 CXCL5 IL18R1 CXCL9 CXCL1 CASP8 VEGFA IL10RB IL15RA CDCP1 MMP10 TWEAK CXCL11 CXCL10 CX3CL1 STAMBP TRANCE ENRAGE TNFSF14
TGFalpha log p value signed TNFRSF9 − 10( − ) ( ) IFNgamma LAPTGFbeta1
B 1.0 ⇒ Good consistency with univariate results1.0 CDCP1 CDCP1 ⇒ CDCP1 and IL10 in the model adjusted on smokingIL10 0.8 0.8 IL10
0.6 0.6 Barbara Bodinier Pre-diagnostic markers of lung cancer CCL25 8th October, 2020 15 / 27 CXCL11
0.4 0.4
Selection proportion 0.2 Selection proportion 0.2
0.0 0.0
IL8 IL6 IL7 −4 −2 0 2 4 6 8 10 uPA NT3 TNF IL10 IL18 SCF CD5 CD6 ADA HGF LIFR OPG Flt3L OSM PDL1 CCL4 CD40 CSF1 CST5 CCL3 TNFB IL12B IL17A CD8A MCP1 MCP2 MCP3 MCP4 SIRT2 TRAIL DNER MMP1 AXIN1 ST1A1 CCL20 CCL11 FGF19 CCL23 FGF21 FGF23 CCL19 CCL28 CD244 CCL25 4EBP1 CXCL6 CXCL5 IL18R1 CXCL9 CXCL1 CASP8 VEGFA IL10RB IL15RA CDCP1 MMP10 TWEAK CXCL11 CXCL10 CX3CL1 STAMBP TRANCE ENRAGE TNFSF14
TGFalpha log p value signed TNFRSF9 − 10( − ) ( ) IFNgamma LAPTGFbeta1
C 1.0 1.0 CDCP1 IL12B
CCL11 CDCP1 0.8 0.8 CD244 SCF IL10RB CXCL6 CCL3TRAIL MCP1 0.6 0.6 CD6 CXCL10 IL18
0.4 0.4
Selection proportion 0.2 Selection proportion 0.2
0.0 0.0
IL8 IL6 IL7 0 5 10 uPA NT3 TNF IL10 IL18 SCF CD5 CD6 ADA HGF LIFR OPG Flt3L OSM PDL1 CCL4 CD40 CSF1 CST5 CCL3 TNFB IL12B IL17A CD8A MCP1 MCP2 MCP3 MCP4 SIRT2 TRAIL DNER MMP1 AXIN1 ST1A1 CCL20 CCL11 FGF19 CCL23 FGF21 FGF23 CCL19 CCL28 CD244 CCL25 4EBP1 CXCL6 CXCL5 IL18R1 CXCL9 CXCL1 CASP8 VEGFA IL10RB IL15RA CDCP1 MMP10 TWEAK CXCL11 CXCL10 CX3CL1 STAMBP TRANCE ENRAGE TNFSF14
TGFalpha log p value signed TNFRSF9 − 10( − ) ( ) IFNgamma LAPTGFbeta1 1.0 1.0 SCF CDCP1 A Base model IL10 MCP1 IL8 CDCP1 Model adjusted on packyears IL6 IL12BCD8A IL10ST1A1 MMP10 0.8 0.8 CXCL11 IL8
ST1A1 LIFR CXCL6CXCL5 OPG 0.6 0.6 IL6 SCF CD8A
Analyses by0.4 histological subtypes 0.4
Selection proportion 0.2 Selection proportion 0.2
Stability analyses0.0 of the logistic-LASSO stratified0.0 by subtype:
IL8 IL6 IL7 −5 0 5 10 uPA NT3 TNF IL10 IL18 SCF CD5 CD6 ADA HGF LIFR OPG Flt3L OSM PDL1 CCL4 CD40 CSF1 CST5 CCL3 TNFB IL12B IL17A CD8A MCP1 MCP2 MCP3 MCP4 SIRT2 TRAIL DNER MMP1 AXIN1 ST1A1 CCL20 CCL11 FGF19 CCL23 FGF21 FGF23 CCL19 CCL28 CD244 CCL25 4EBP1 CXCL6 CXCL5 IL18R1 CXCL9 CXCL1 CASP8 VEGFA IL10RB IL15RA CDCP1 MMP10 TWEAK CXCL11 CXCL10 CX3CL1 STAMBP TRANCE ENRAGE TNFSF14
TGFalpha log p value signed TNFRSF9 − 10( − ) ( ) adenocarcinoma (N=91 cases) and small-cellIFNgamma carcinoma (N=32 cases) LAPTGFbeta1
B 1.0 1.0 CDCP1 CDCP1 IL10
0.8 0.8 IL10
0.6 0.6 CCL25
CXCL11
0.4 0.4
Selection proportion 0.2 Selection proportion 0.2
0.0 0.0
IL8 IL6 IL7 −4 −2 0 2 4 6 8 10 uPA NT3 TNF IL10 IL18 SCF CD5 CD6 ADA HGF LIFR OPG Flt3L OSM PDL1 CCL4 CD40 CSF1 CST5 CCL3 TNFB IL12B IL17A CD8A MCP1 MCP2 MCP3 MCP4 SIRT2 TRAIL DNER MMP1 AXIN1 ST1A1 CCL20 CCL11 FGF19 CCL23 FGF21 FGF23 CCL19 CCL28 CD244 CCL25 4EBP1 CXCL6 CXCL5 IL18R1 CXCL9 CXCL1 CASP8 VEGFA IL10RB IL15RA CDCP1 MMP10 TWEAK CXCL11 CXCL10 CX3CL1 STAMBP TRANCE ENRAGE TNFSF14
TGFalpha log p value signed TNFRSF9 − 10( − ) ( ) IFNgamma LAPTGFbeta1
C 1.0 1.0 CDCP1 IL12B
CCL11 CDCP1 0.8 0.8 CD244 SCF IL10RB CXCL6 CCL3TRAIL MCP1 0.6 0.6 CD6 CXCL10 IL18
0.4 0.4
Selection proportion 0.2 Selection proportion 0.2
0.0 0.0
IL8 IL6 IL7 0 5 10 uPA NT3 TNF IL10 IL18 SCF CD5 CD6 ADA HGF LIFR OPG Flt3L OSM PDL1 CCL4 CD40 CSF1 CST5 CCL3 TNFB IL12B IL17A CD8A MCP1 MCP2 MCP3 MCP4 SIRT2 TRAIL DNER MMP1 AXIN1 ST1A1 CCL20 CCL11 FGF19 CCL23 FGF21 FGF23 CCL19 CCL28 CD244 CCL25 4EBP1 CXCL6 CXCL5 IL18R1 CXCL9 CXCL1 CASP8 VEGFA IL10RB IL15RA CDCP1 MMP10 TWEAK CXCL11 CXCL10 CX3CL1 STAMBP TRANCE ENRAGE TNFSF14
TGFalpha log p value signed TNFRSF9 − 10( − ) ( ) IFNgamma LAPTGFbeta1 ⇒ CDCP1 is highly selected but re-ordering of the other signals suggesting heterogeneity between the subtypes
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 16 / 27 Validation of CDCP1 in external cohorts
Validation using data from two external cohorts: EPIC (Netherlands, UK, Germany, Spain) and NSHDS (Northern Sweden Health and Disease Study) Logistic models adjusted on age and BMI (base model), and further adjusted on packyears or smoking status
⇒ Associations with all lung cancer survive adjustment on smoking ⇒ Significant associations with adenocarcinoma despite small sample size
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 17 / 27 Quantifying the amount of disease-relevant information
ROC analyses with packyears, CDCP1 and LASSO-selected proteins to quantify the amount of disease-relevant information brought by proteins
ALL LUNG CANCER (N=191 CASES) ADENOCARCINOMA (N=89 CASES) SMALL-CELL (N=30 CASES)
A 1.0 B 1.0 C 1.0
0.8 0.8 0.8
0.6 0.6 0.6
0.4 0.4 0.4 True Positive Rate Positive True Rate Positive True Rate Positive True
0.2 0.2 0.2 Packyears − AUC = 0.74 [0.73−0.74] Packyears − AUC = 0.69 [0.68−0.70] Packyears − AUC = 0.88 [0.87−0.88] CDCP1 − AUC = 0.65 [0.65−0.66] CDCP1 − AUC = 0.68 [0.68−0.69] CDCP1 − AUC = 0.74 [0.73−0.75] CDCP1 + Packyears − AUC = 0.75 [0.74−0.75] CDCP1 + Packyears − AUC = 0.73 [0.73−0.74] CDCP1 + Packyears − AUC = 0.88 [0.87−0.89] LASSO stable (N=10) − AUC = 0.73 [0.72−0.75] LASSO stable (N=2) − AUC = 0.71 [0.70−0.71] LASSO stable (N=3) − AUC = 0.81 [0.78−0.82] 0.0 LASSO stable (N=10) + Packyears − AUC = 0.78 [0.77−0.79] 0.0 LASSO stable (N=2) + Packyears − AUC = 0.76 [0.75−0.76] 0.0 LASSO stable (N=3) + Packyears − AUC = 0.88 [0.86−0.89]
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate False Positive Rate ⇒ CDCP1 alone yields an AUC of 0.65 (all LC), 0.68 (adenocarcinoma) and 0.74 (small-cell carcinoma) ⇒ Increase from 0.69 to 0.73 with CDCP1 on top of packyears (adenocarcinoma) ⇒ Moderate additional information with more proteins (N=10, AUC going from 0.75 to 0.78)
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 18 / 27 Integration of CDCP1 with transcriptomics
Univariate analyses of CDCP1 against all N=11,610 transcripts measured in the same participants (N=222, NOWAC)
8
LRRN3 − Znl3dSA4HuOuhEnO4o ●
6 SEM1 − Ngi9Xlgh6np8jgjgjk ● ) e u l a v
● − 4 ● p
( ● ● ● ● ● ● 10 ● ● ● ●● ● ● ●
g ● ● ● ●● ●● ● ● ●● o ●●
l ● ● ● ●● ●●● ●● ●● ●● ● − ●● ●● ●●● ●● ●●● ●● ● ● ●● ●● ●● ●● ●● ●●●● ●●● ● ●● ●● ●●● ●● ●●● ● ●●●● ●●●●● ●●●●● ●●●●● ●●● ●●● ●● ●●●● ●●● ●●●● 2 ●●● ●● ●●●● ●●●●●● ●●●● ●●●● ●●●●● ●●● ●●● ●●●● ●●● ●●●● ●●●●● ●●● ●●●●● ●●●●● ● ●●● ●●● ●●●● ●●●●●● ●●●●● ●●●● ●●●●● ●●●●●● ●●●● ●●●●● ●●●●● ●●●● ●●●●● ●●●●● ●●●●● ●●●● ●●●●● ●●●●● ●●●● ●●●●●● ●●●●● ●●●● ●●●●●● ●●●●●● ●●●●●● ●●●●●● ●●●●● ●●●●● ●●●●● ●●●●●● ●●●●●● ●●●●● ● ●●●●●● ●●●●● ●●●●●●● ●●●●● ●●●●●● ●●●●●● ●●●●●● ●●●●● ●●●●●● ●●●●● ● ●●●●● ●●●●●●● ●●●●●● ●●●●●● ●●●●●● ●●●●●●● ●●●●●● ●●●●● ●●●●●●● ●●●●●● ●●●●●● ●●●●●● ●●●●●● ●●●●●● ●●●●●● ●●●●●● ●●●●●●● ●●●●●● ●●●●●● ●●●●●● ●●●●●● ●●●●●● ●●●●●● ●●●●●● ●●●●●●● ●●●●●● ●●●●●●● ●●●●●●● ●●●●●● ●●●●●● ●●●●●●● ●●●●●●● ●●●●●●● ●●●●●● ●●●●●●● ●●●●●●● ●●●●●●● ●●●●●●● ●●●●●●●● ●●●●●●● 0 ●●●●●●●●●●●●●●●
−0.10 −0.05 0.00 0.05 0.10 0.15 β ⇒ Significant association of CDCP1 with LRRN3 (marker of tobacco smoking) and SEM1
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 19 / 27 Exploration of the functional role of CDCP1
11,610 transcripts measured in the same participants (N=222, NOWAC) Transcript nuIDs could be linked to 11,485 unique gene symbols 10,656 of these gene symbols could be identified on the Panther Database
Functional annotation based on classifications from knowledgebases: Biological Process Reactome
⇒ Each classification provides different levels of grouping of the genes (classification in groups and sub-groups of genes) ⇒ One gene can belong to different groups within each classification
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 20 / 27 Associations between CDCP1 and Reactome pathways
Grouping of the transcripts by Reactome pathways ⇒ 1,545 functional groups involving 6,401 unique genes (some of these pathways are made of just one gene)
Summary of each group using PCA: all components explaining more than 5% of the variance of the group are kept ⇒ 8,043 PC scores summarising the pathways (new dataset) ⇒ Number of PCs per pathway ranging between 1 and 10
Univariate regressions of CDCP1 against PC scores summarising the groups Estimation of the Effective Number of Test with the number of PCs to explain 90% of the variance over the 8,043 scores (ENT=109)
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 21 / 27 Associations between CDCP1 and Reactome pathways
Univariate regressions of CDCP1 against PC scores summarising Reactome pathways
4
) 3 e u l a v - p (
10 2 g o l -
1
0
−1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 b
Deactivation of the beta−catenin transactivating complex Defective GALNT3 causes familial hyperphosphatemic tumoral calcinosis (HFTC) Initial triggering of complement CBY1 CTBP1 XPO1 HDAC1 SOX4 RPS27A UBA52 TLE3 APC RBBP5 AKT1 UBB LEF1 MLL4 SOX13 BCL9 MEN1 PYGO2 BTRC TCF7 TLE4 BCL9L ASH2L TLE2 CHD8 UBC YWHAZ CTNNBIP1 TLE1 CTNNB1 CFP C2 C1QB C1QC C1QA CFB CFD FCN1 GZMM HDC C4A MUC4 MUC2 MUC20 MUC16 MUC6 MUC1
⇒ Identification of 3 groups including 6-30 transcripts that were not detected in univariate analyses
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 22 / 27 Associations between CDCP1 and Biological Processes
Grouping of the transcripts by Biological Processes ⇒ 3,600 functional groups involving 9,826 unique genes (some of these pathways are made of just one gene)
Summary of each group using PCA: all components explaining more than 5% of the variance of the group are kept ⇒ 20,974 PC scores summarising the pathways (new dataset) ⇒ Number of PCs per pathway ranging between 1 and 10
Univariate regressions of CDCP1 against PC scores summarising the groups Estimation of the Effective Number of Test with the number of PCs to explain 90% of the variance over the 20,974 scores (ENT=140)
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 23 / 27 Associations between CDCP1 and Biological Processes
Univariate regressions of CDCP1 against PC scores summarising Biological Processes
5
4 ) e u l
a 3 v - p ( 10 g
o 2 l -
1
0
−1.0 −0.5 0.0 0.5 1.0 b
Protein localization to nucleus (GO:0034504) Interleukin−18−mediated signaling pathway (GO:0035655) ERAD pathway (GO:0036503) Positive regulation of ERAD pathway (GO:1904294) Positive regulation of metallopeptidase activity (GO:1905050) Response to fluid shear stress (GO:0034405) Positive regulation of dendrite development (GO:1900006) Negative regulation of cAMP−dependent protein kinase activity (GO:2000480) Regulation of chemotaxis (GO:0050920) Regulation of cell−cell adhesion (GO:0022407) Positive regulation of pseudopodium assembly (GO:0031274) RNF185 RCN3 UBXN6 RHBDD1 CCDC47 SEL1L VCP MARCH6 C2orf30 DNAJB12 RNF5 SYVN1 DERL1 AKT1 ALOX5 PDGFB IL18RAP IL18R1 IL18BP YWHAE GBP2 CALR XPO1 HHEX ZBTB7A DULLARD TOR1B LMNA DNAJB6 TOR1AIP1 TMEM188 BMP4 ABCA7 ZBTB16 BANP PAF1 DVL1 TOPORS XPA RPL11 TOR1A PLRG1 RNFT1 C19orf6 USP13 ATXN3 DDRGK1 MAPK14 ANTXR1 MAPK3 CLDN4 KARS,SYK P2RX4 PKD1 CITED2 PDGFRB LRP8 TMEM106B IQGAP1 ALK CYFIP1 PACSIN1 PTN DAB2IP EZH2 PKIA PKIG PRKAR2A SIRT1 PRKAG2 PRKAR1A USP14 CXCR4 FAM49B P2RY12 PC MINK1 LEF1 SRC ADAM8 FUT7 KIT CDC42EP1 APC CDC42EP2 F2RL1 CDC42EP4 CCR7 CCL21 CDC42EP3 FAM39E C15orf62 CDC42
⇒ Identification of 13 groups including protein localization to nucleus, regulation of cell-cell adhesion and regulation of chemotaxis
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 24 / 27 Conclusions and perspectives
Analyses of a panel of circulating inflammatory proteins in association with the future risk of lung cancer in two prospective cohorts ⇒ Identification of robust associations between CDCP1 and all lung cancer and adenocarcinoma ⇒ Associations hold in stratified analyses by cohort (EPIC/NOWAC) or median time-to-diagnosis ⇒ Validation of these findings in two independent cohorts ⇒ Moderate gain in AUC on top of packyears (0.04 for adenocarcinoma) ⇒ Limited gain when considering joint effects of inflammatory proteins
CDCP1 (CUB domain containing protein 1): transmembrane noncatalytic receptor involved in the loss of anchorage in epithelial cells during mitosis ⇒ Previously found associated with higher proliferation and poor prognosis in lung cancer
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 25 / 27 Conclusions and perspectives
Using transcriptomics measurements in the same individuals, integration of CDCP1 with transcript levels to gain insight into its functional role Functional grouping of the transcripts based on the Reactome and Biological Processes knowledgebases Identification of associations between CDCP1 and summarised pathways ⇒ β-catenin transactivating complex: linked to a range of cancers, implicated in tumour development ⇒ cell-cell adhesion: CDCP1 disruption associated with interference in EGF/EGFR (Epidermal growth factor receptor) induced cell migration
Metabolomics measurements in the same participants: to be integrated with proteins and transcripts to explore joint effects at multiple molecular levels
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 26 / 27 Acknowledgements
Dr Sonia Dagnino Prof Marc Chadeau-Hyam (PI) Prof Roel Vermeulen Dr Florence Guida Dr Karl Smith-Byrne Dr Mattias Johansson Dr Therese Haugdahl Nost Prof Torkjel Sandanger Funding: Cancer Research - UK ‘Mechanomics’ PRC project grant (Grant PRC 22184 to MC-H)
Barbara Bodinier Pre-diagnostic markers of lung cancer 8th October, 2020 27 / 27