Department of Pharmacy Analytical Biochemistry
The role of proteogenomics in understanding molecular mechanisms of COPD
Yanick Hagemeijer, Rainer Bischoff, Victor Guryev, Peter Horvatovich
X-Omics Festival, April 12, 2021
Population and intra-individual genomics variability
[Figure: data integration across omics layers (genome/exome sequencing, DNA variants, RNA-seq, transcripts, LC-MS/MS, proteins and peptides, metabolites), spanning genome, epigenome, transcriptome, translatome, proteome, phosphoproteome, glycoproteome, acetylome, kinome, glycome, lipidome, metabolome, microbiome and phenome; sources of variability: population genetic variability, disease, tumor heterogeneity. Adapted from … PMID 23664091]
Era of personal genomics
[Figure: timeline of genomes read, 2007 to 2018, marking the Craig Venter and James Watson genomes, the 1000 Genomes Project, the Genome of the Netherlands and ~500 000 genomes]
Difficulty in identifying variants de novo from MS/MS spectra
NH2-E A N F D I N Q L Y D C N W V V V N C S T P G N F F H V L R-COOH
[Figure: two annotated MS/MS spectra, intensity (Int) versus m/z, with b- and y-ion series labelled along the peptide sequence]
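Database search identifies a spectrum by comparing it with candidate peptides derived from a list of protein sequences, so a variant peptide that is missing from the list cannot be matched. The sketch below illustrates only the candidate-selection step under simplifying assumptions: tryptic cleavage without missed cleavages, unmodified residues, and a precursor-mass filter. The protein sequences and the names `CANONICAL_DB` and `VARIANT_DB` are hypothetical toy data, not sequences from this study.

```python
import re

# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def tryptic_peptides(protein):
    """Cleave after K or R unless followed by P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def candidates(precursor_mass, database, tol_ppm=10.0):
    """Return (protein, peptide) pairs whose mass matches the precursor."""
    hits = []
    for name, protein in database.items():
        for pep in tryptic_peptides(protein):
            mass = peptide_mass(pep)
            if abs(mass - precursor_mass) / mass * 1e6 <= tol_ppm:
                hits.append((name, pep))
    return hits


# Hypothetical toy sequences: the variant carries an E -> D substitution.
CANONICAL_DB = {"toy_canonical": "MKLSEAVTKGFYR"}
VARIANT_DB = {"toy_variant": "MKLSDAVTKGFYR"}

observed = peptide_mass("LSDAVTK")  # precursor mass of the variant peptide
print(candidates(observed, CANONICAL_DB))  # no candidate in the canonical list
print(candidates(observed, VARIANT_DB))    # found once the variant is included
```

A real search engine also scores fragment ions and controls the false discovery rate; the point here is only that the sequence list bounds what can be identified.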
• Best identification method: database search (DBS)
• DBS requires a list of protein sequences expected to be present in the sample
• Canonical sequences from Swiss-Prot and UniProt (Ensembl) are used in common proteomics workflows
• Public databases contain only a low number of variants (20, 80 and 30 k proteins)
Lam MCP 2011
Chronic Obstructive Pulmonary Disease
• 20% of smokers develop COPD; more than 200 million people have COPD
• Progressive loss of lung function with a large impact on quality of life
• Insufficient insight into the molecular mechanisms of COPD
• Limited therapeutic options
COPD phenotypes and treatment
[Figure: chronic bronchitis (airways) and emphysema (alveoli), leading to impaired lung function and symptoms]
Today's treatment: “One size fits all”
Study design
20/18 lung tissues
COPD: 10 COPD stage IV (8 female/2 male)
Control: 10/8 ex-smokers control (6/4 female/4 male)
Proteins: Orbitrap Q Exactive+, 1D-LC-MS/MS and 2D-LC-MS/MS
mRNA sequences: Illumina, 20 million sequencing depth, polyadenylated mRNA fraction
Brandsma CA, Guryev V, Timens W, …, van den Berge M, Horvatovich P. Integrated proteogenomic approach identifying a protein signature of COPD and a new splice variant of SORBS1. Thorax, 2020, 75(2):180-183. PMID: 31937552
Proteogenomics data integration workflow
Input files: mRNA raw sequence (Illumina); LC-MS/MS data (Orbitrap QE+)
mRNA-Seq pre-processing:
• FastQC: quality control
• Trimmomatic: sequence trimming
• STAR, TopHat2: alignment and annotation
• Cufflinks, StringTie: mRNA assembly and quantification
• TransDecoder: translated protein sequence
• Output: mRNA quantitative table (Fasta format/FPKM)
MS1 quantification:
• Grid: smoothing, resampling
• Centroid: peak detection and quantification
• Warp2D: time alignment
• MetaMatch: peak matching
• Output: peptide/protein quantitative table
mRNA-Seq and LC-MS/MS:
• SearchGUI/PEAKS: peptide/protein identification
• PeptideShaker/PEAKS: validation of peptide/protein identification
• Output: quantitative table (PSM, IC)
Output files: protein/mRNA peptide/protein identification and quantities
Identified sequence classes:
• Annotated proteins
• SAAVs
• Splice isoforms
• indels
• New gene models
• New transcripts
Suits F, Hoekman B, Rosenling T, Bischoff R, Horvatovich P. Threshold Avoiding Proteomics Pipeline. Anal Chem., 2011, 83(20):7786-94.
One reason why transcript is present, and protein is absent
annotated/Normal: exon AACA | intron gtgagt...atacag | exon AAGA
in 5 COPD samples: exon AACA | intron gtgagt...at | exon ACAGAAGA (frameshift)
[Figure: translated protein sequence of oncostatin M receptor gene, without mutation and with mutation]
STRING protein-protein interaction network
[Figure: STRING network, proteomics + transcriptomics, FDR 0.01, fixed layout. STRING edges: experiments and database (legend: experiments, database, both)]
Peptide identification (combined dataset)
[Figure: overlap of identified peptides by sequence source: public (UniProt, Ensembl), confirmed gene models, non-synonymous variant, new transcript isoform. Counts as extracted: 500, 164, 231, 8767, 0, 17586, 9707, 0, 6, 361, 17314, 0, 722, 321, 642]
901 peptides are not in curated public databases
PSM count of novel peptides present only in COPD or Control
[Figure: bar chart of PSM count per novel peptide, grouped as higher in Control or higher in COPD; y-axis: PSM count, 0 to 25]
“Novel peptides” identification: PEAKS score and SpectrumAI test

Peptides only present in COPD
Peptide sequence | Quality score | Gene | Effect | Ion support / Flanking ion support
ADSQDAGQETEKEGEDPQASAQDETPITSAK | 134.83 | AKAP12 | E1600D amino acid substitution |
DSRPSQAAGDNQGDEVK | 132.86 | THRAP3 | A201V amino acid substitution |
TGQEALSQTTISWAPFQDTSEYIISCHPVGTDEEPLQFR | 131.44 | FN1 | Native | NA / NA
AEHMETNAVGPSQSSDTR | 129.43 | MPRIP | P327Q amino acid substitution |
EGEDPQASAQDETPITSAK | 129.34 | AKAP12 | E1600D amino acid substitution |
DPAEGTPLEAAGTR | 118.25 | SORBS1 | Splice variant, exon extension | NA / NA
ELTVSNNDINEAGVHVLCQGLK | 115.80 | RNH1 | R188H amino acid substitution |
TLEGLQVEEEPVYK | 106.37 | HCLS1 | E361K amino acid substitution |
EASQGSSASSAPQSVK | 105.60 | ZNF830 | H99Q amino acid substitution |
SDSELNNEVAAR | 103.81 | CYBRD1 | S266N amino acid substitution |
EQALQEAMEQLEELELER | 100.83 | SWAP70 | Q505E amino acid substitution |
EQQNDTSSELQNR | 91.55 | ZFYVE16 | I192T amino acid substitution |
YGFQFLR | 81.46 | PCYOX1 | S149F amino acid substitution |
NKYEDEINR | 70.58 | KRT7 | H186R amino acid substitution |
EPVSGAVEGK | 62.17 | CAVIN2 | Q174E amino acid substitution |
NVSSNPCHEAVGIK | 56.53 | SORBS1 | Splice variant, exon extension | NA / NA
YEDEINR | 49.80 | KRT7 | H186R amino acid substitution |

Peptides only present in Control
Peptide sequence | Quality score | Gene | Effect | Ion support / Flanking ion support
ISYGPDWKDFYVVEPLAFEGTPEQK | 120.51 | HEXA | I447V amino acid substitution |
ISNSAAYSGSVAPANSALGQTQPSDQDTLVQR | 110.84 | PDLIM5 | T410A amino acid substitution |
LHSLTQAKEESEK | 110.58 | RRBP1 | L1043H amino acid substitution | -
GGGAGFISGLTYLELDNPAGNK | 108.34 | C7 | S389T amino acid substitution |
ASQSVSSNYLAWYQQKPGQAPR | 105.88 | IGKV3-20 | S52N amino acid substitution |
QTLEKENTDLAGELR | 77.52 | MYH11 | A1241T amino acid substitution |
NSLFLQMNSLR | 76.68 | IGHV3-43 | Y99F amino acid substitution |
LLIYWASAR | 68.41 | IGKV4-1 | T79A amino acid substitution |
LLEDLR | 33.62 | OTOA/PDE4DIP | Native | NA / NA

Zhu Y, Orre LM, Johansson HJ, Huss M, Boekel J, Vesterlund M, Fernandez-Woodbridge A, Branca RMM, Lehtiö J. Discovery of coding regions in the human genome by integrated proteogenomics analysis workflow. Nature Commun. 2018 Mar 2;9(1):903.
Validation of non-reference peptides with synthetic peptides MS/MS spectra
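Validation against a synthetic peptide comes down to checking that two fragment spectra agree. The slides do not state how agreement was scored, so the sketch below shows one widely used option, a cosine (normalized dot product) similarity on binned peaks. The function names, the 1 m/z bin width and the peak lists are illustrative assumptions, not data from this study.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity of two spectra: 1.0 is identical shape, 0.0 no shared bins."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical peak lists: (m/z, intensity).
endogenous = [(175.1, 300.0), (304.2, 1200.0), (417.2, 800.0), (530.3, 450.0)]
synthetic = [(175.1, 600.0), (304.2, 2400.0), (417.2, 1600.0), (530.3, 900.0)]
unrelated = [(210.6, 500.0), (355.7, 700.0)]

print(cosine_similarity(endogenous, synthetic))  # same pattern at a different scale
print(cosine_similarity(endogenous, unrelated))  # no shared bins
```

The score ignores absolute intensity, which suits this comparison because a synthetic standard is usually measured at a different abundance than the endogenous peptide.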
[Figure: MS/MS spectra panels for SORBS1 (two panels), AKAP12 and SWAP70 (EQALQEAMEQLEELELER, Q505E amino acid substitution)]
Two peptides uniquely mapping to a novel exon of SORBS1 gene
SORBS1 is encoded on the minus strand of chr10 (band 10q24.1), between 95.31 and 95.56 Mbp (gene length: 249.64 kb)
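The two SORBS1 peptides listed in the novel-peptide table (NVSSNPCHEAVGIK and DPAEGTPLEAAGTR) can be located directly in the translated sequence. The sketch below uses a 120-residue fragment copied from the >NEWP24530 sequence on this slide and checks that each peptide occurs in it with tryptic boundaries. The helper name `locate` is an assumption for illustration, and the reported offsets are relative to the fragment, not to the full protein.

```python
# Fragment copied from the >NEWP24530 sequence (two consecutive 60-residue lines).
FRAGMENT = (
    "YAATRTVYHKNVSSNPCHEAVGIKKVSSLYVPCLSNNICLAASENSSRVARDPAEGTPLE"
    "AAGTRAPAPGLVSRTAGTGKPPPAPPPDPPKLFFDIRKDAVNRGESPSLGTQASFPDVRP"
)
PEPTIDES = ["NVSSNPCHEAVGIK", "DPAEGTPLEAAGTR"]


def locate(peptide, protein):
    """Return the 0-based position and tryptic context of a peptide, or None."""
    start = protein.find(peptide)
    if start < 0:
        return None
    preceding = protein[start - 1] if start > 0 else None
    return {
        "start": start,
        "end": start + len(peptide),
        "preceding": preceding,
        # Tryptic: follows K/R (or the sequence start) and ends in K/R.
        "tryptic": preceding in (None, "K", "R") and peptide[-1] in "KR",
    }


for peptide in PEPTIDES:
    print(peptide, locate(peptide, FRAGMENT))

# Length bookkeeping from the slides: native protein plus the inserted residues.
NATIVE_LENGTH = 1266
INSERTED = 239
print(NATIVE_LENGTH + INSERTED)  # length of the novel splice variant
```

Both peptides are preceded by K or R and end in K or R, consistent with tryptic digestion, and the length sum agrees with the 1505 residues given for the novel variant.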
[Figure: gene model showing exon 8, the new exon and exon 7]
Adds 239 AAs between residues 270 and 271
2 peptides were found by MS/MS analysis
>NEWP24530
MSSECDGGSKAVMNGLAPGSNGQDKATADPLRARSISAVKIIPVKTVKNASGLVLPTDMD
LTKICTGKGAVTLRASSSYRETPSSSPASPQETRQHESKPGLEPEPSSADEWRLSSSADA
NGNAQPSSLAAKGYRSVHPNLPSDKSQDATSSSAAQPEVIVVPLYLVNTDRGQEGTARPP
TPLGPLGCVPTIPATASAASPLTFPTLDDFIPPHLQRWPHHSQPARASGSFAPISQTPPS
FSPPPPLVPPAPEDLRRVSEPDLTGAVSSTVLSPPRPPLPQKDRFAWQSPTIHNTYKDSL
YLSSPKPYVPLGTPRQQNPSQPQPISVLLAAGSAPKGVVCPGSLLPDSTFPSASSQPQQR
YAATRTVYHKNVSSNPCHEAVGIKKVSSLYVPCLSNNICLAASENSSRVARDPAEGTPLE
AAGTRAPAPGLVSRTAGTGKPPPAPPPDPPKLFFDIRKDAVNRGESPSLGTQASFPDVRP
PVLGPRVTSDPENRKSKESYLLQPSYPAKDSSPLLNEVSSSLIGTDSQAFPSVSKPSSAY
PSTTIVNPTIVLLQHNREQQKRLSSLSDPVSERRVGEQDSAPTQEKPTSPGKAIEKRAK…
Sequence regions marked on the slide: exons 1-7, new exon, exons 8 and further
Protein domains of SORBS1 splice variants
native SORBS1 (1 to 1266): Sorb, SH3, SH3, SH3
novel SORBS1 splice variant (1 to 1505; new exon 8, +239 AA): Atrophin-1 superfamily, Sorb, SH3, SH3, SH3
Cancer moonshot for melanoma
Clinically well characterized patients
Biobanks
BRAF V600E Tumor sections
Vemurafenib
Melanoma lymph node metastasis
Multi-omics profiles & digital pathology: proteomics, genomics, digital pathology
PMIDs: 31599373, 30914758, 30900145
491.085 Da
Funding and Collaborations
Yanick Hagemeijer, Ekaterina Ovchinnikova, Victor Guryev, Corry-Anke Brandsma, Maria Yakovleva, Maarten van den Berge, Melinda Rezeli, Wim Timens, Jonatan Eriksson, Karin Wolters, Thomas Fehniger, Alejandro Sanchez Brotons, Dirkje Postma, György Markó-Varga, Karel Gerbrands, Gyorgy Halmos, Ana Ciconelle, Renee Vehoeven, Hjalmar Permentier, Rainer Bischoff, Frank Suits