Application of the Proteograph™ Product Suite to the Identification of Differential Isoform Plasma Abundance in Early Lung Cancer vs. Healthy Controls

Asim Siddiqui*, John E. Blume, Margaret K. R. Donovan, Marwin Ko, Ryan W. Benz, Theodore L. Platt, Juan C. Cuevas, Serafim Batzoglou and Omid C. Farokhzad

Proteograph Platform Delivers Unbiased, Utilizing Proteograph Platform to Interrogate Protein Isoforms in a Proteograph Data Sheds Light on Biological Deep and Rapid at Scale Non-small Cell Lung Cancer (NSCLC) Plasma Study Consequences of Protein Isoforms

The ~20,000 in the human genome encode over one million Proteograph Platform Enables Deep and Unbiased Plasma Proteomics BMP1 Shows Differential Isoform Abundance Pattern protein variants, because of alternative splice forms, allelic variation and protein modifications. Though large-scale studies Percentage of Samples Number of Peptides Detected The lowest abundant putative protein isoform, BMP1 from the list of have expanded our understanding of cancer biology through in Which Protein Group Detected per Protein candidate identified in this study, comprises four protein analysis of both tissue and biofluids, similarly-scaled unbiased, deep 1A 1B 1C coding isoforms. Two of these isoforms are substantially longer n=80 proteomics studies of biofluids have remained impractical due to 2499 (~400-800 residues) than the other two isoforms covering additional Median 8 complexity of workflows. We have previously described the . Peptides mapping to exons that cover all four protein isoforms Proteograph Product Suite, a novel platform that leverages the n=61 (5A) have higher abundance in cancer relative to controls, whereas nano-bio interactions of nanoparticles for deep and unbiased 1992 peptides mapping to exons that cover only the two longer isoforms proteomic sampling at scale. We have shown the utility of the have higher abundance in healthy controls. BMP1 is known to play a Proteograph solution for unbiased and deep interrogation of plasma dual role in cancer, acting as both suppressor and activator3 and this from 141 subjects: 80 pre-classified healthy controls and 61 differential pattern of isoform abundance may shed further light into samples from early-stage NSCLC patients to create a plasma

Control (Healthy) BMP1’s role in cancer.

Number of Proteins of Number

Early Early NSCLC Number of Subjectsof Number

biomarker classifier for the detection of NSCLC versus healthy Counts Group Protein controls with AUC of 0.911. Here we present a further analysis of this BMP1 Early NSCLC Healthy Control BMP1 data to dissect differences between patients and controls in plasma 5A abundance of protein isoforms arising from alternative Sample Type BMP-202 ENST00000306385

splicing. 1 3 4 6 7

Peptide Peptide 4/5

y

Peptide Peptide 1

Peptide Peptide 2 Peptide Peptide 3 Peptide 6 Peptide 7

h

t

l a

e 2 5 H

1 3 4 6 7

y

l

r a

1,000 E 2 5 Genome Proteome Figure 1. NSCLC study using the Proteograph Product Suite. Number of subjects in NSCLC study including Static indicator of risk Dynamic indicator of status 800 healthy controls and early NSCLC samples (1A). Protein groups detected across 141 subjects in control vs. BMP-201 ENST00000306349

early NSCLC plasma samples (1B), across a percentage of the subjects in the NSCLC study. 2499 protein 1 3 4 6

600 y

h

t

l a

e 2 5 groups are found across all subjects and 1,992 in 25% of the subjects. Number of peptides for each protein H

400 1 3 4 6

y l group in the NSCLC study (1C) shows, 21,959 peptides are detected in total with a median 8 peptides per r High accessibility of a High utility but low E 2 5 200 content but low utility accessibility of content protein across the NSCLC plasma study using the Proteograph Product Suite. Cataloged variants (mm) variants Cataloged BMP-204 ENST00000397814 0

1

y

h

t

l a

e 2 Utility of Information H

1

y

l

r a

E 2

10M+ human exomes genetic variants BMP-203 ENST00000354870 ~695M 1

Identification of Putative Protein Isoforms Using Peptide Abundance y

catalogued h t

1M+ genomes l a

e 2 H

1

y

l

r a

E 2

*** *** *** < 0.2% of genetic variants fully characterized 1.0 5B Peptide 1 Peptide 2 Peptide 3 Peptide 4 Peptide 5 Peptide 6 Peptide 7

p = 2.1e−04 p = 2.7e−02 p = 6.6e−03 p = 4.8e−02 p = 1.1e−01 p = 5.8e−02 p = 3.5e−02

0.5 6 Relative

5 Abundance

Core Technology 4 0.0

Proprietary Engineered Nanoparticles Detected Peptides Peptide Peptide Peptide Peptide Peptide Peptide Peptide 3 Peptides Intensity

Protein X 2

y

y

y

y

y

y

y

y

y

y

y

y

y

y

l

C

l

C

l

C

l

C

l

C

l

C

l

C

h

h

h

h

h

h

h

r

r

r

r

r

r

r

L

L

L

L

L

L

L

t

t

t

t

t

t

t

l

l

l

l

l

l

l

a

a

a

a

a

a

a

C

C

C

C

C

C

C

a

a

a

a

a

a

a

E

E

E

E

E

E

E

e

e

e

e

e

e

e

S

S

S

S

S

S

S

H

H

H

H

H

H

H

N

N

N

N

N

N N n=80 Plasma Proteins Nanoparticles Protein coronas

Figure 2. Overview of the putative protein isoform identification strategy. From 1,992 proteins we filtered Figure 5. BMP1 map and peptide intensity. BMP1 peptides

Control to proteins present in at least 50% of subjects from either heathy or early cases and searched from NSCLC early and control subjects mapped to four BMP protein- for peptides that had differential abundance between controls and cancer (p < 0.05; Benjamini- coding transcripts (5A) and corresponding peptide intensities (5B) Hochberg corrected). Next, we filtered for proteins comprising sets of peptides where at least shows; two of the peptides (purple shading) are more abundant in Digestion NSCLC one peptide had significantly higher and another significantly lower plasma abundance in healthy controls vs. NSCLC and five of the peptides (teal shading) are more abundant in n=61 early NSCLC. healthy controls. MaxQuant 5ug

16 Candidate Proteins 16 Candidate Proteins Ranked by the HPPP 16 candidate proteins We demonstrated that measurements at the peptide level for the Data analysis Microflow SWATH Tryptic peptides APOB LC/MS analysis RAP1B 15/3486 Matched plasma proteome enable quantification of differential isoform VCL TLN1 abundance patterns, which were inaccessible to prior methods of FLNA lesser scale, depth, or coverage compared to the Proteograph BMP1 COL6A3 platform. By extending our approach to include additional features PRG4 such as protein amino acid variants and PTMs, we anticipate LDHB RTN4 extending our knowledge to enable proteogenomics. Proteograph Product Suite FERMT3 HADHA THBS3 Estimatedng/mL ITIH1 C4A The Proteograph Product Suite has the throughput required C1R 0.0 0.2 0.4 0.6 0.8 for the identification of protein isoforms across the proteome in an Associated Open Targets Score unbiased manner, enabling cancer biology research and biomarker Associated Open Target Score Intensity Rank discoveries at scale.

Figure 3. Identification of 16 putative protein Figure 4. Putative protein isoforms matched to isoforms and corresponding open targets score. the Human Plasma Proteome Project (HPPP)2. 15 References: Associated open targets score for lung carcinoma out 16 proteins were found in the HPPP. These were 1. Blume et al. Nat. Comm. (2020) targets with putative protein isoforms. Nine novel ranked by estimated concentration in plasma. 2. Deutsch et al. J. Proteome Res.(2018) lung carcinoma targets with little or non-existing P13497 (BMP1) is the least abundant. P61224 Sample is ready to be analyzed on most LC/MS instruments 3. Bach et al. Mol. Ther. Oncolytics (2018 information are highlighted (teal bracket). (RAP1B) is not present in the HPPP.

Seer, Inc., Redwood City, CA - *[email protected] © Seer 2021, Proteograph and Seer are trademarks of Seer Inc, all other trademarks are property of their respective owners