Chromosome 16 Update: Surfing Transcriptomics Landscapes
A step beyond the annotation of the Chr-16 proteome

Fernando J Corrales, Concha Gil, Juan P Albar & Chr16-Spanish HPP
ProteoRed-ISCIII, Spain

SpHPP Consortium: Spanish Human Proteome Project, Chromosome 16

AIMS
• Chromosome 16 annotation.
• Selection of 3 cell lines based on published transcriptomic profiles for maximum chromosome 16 coverage.
• Transcriptomic analysis of the selected cell lines.
• Shotgun proteomic analysis of the selected cell lines.
• Development of SRM methods for 200 proteins/year.
• Expression of missing proteins and development of SRM methods.

32 research units organized in 5 working groups (WG):
WG1. Protein expression and purification. Peptides. Transcriptomics.
WG2. Development of quantitative S/MRM assays.
WG3. Shotgun proteomics. Molecular profiles and PTMs.
WG4. Bioinformatics.
WG5. Biology and disease: neurodegenerative, cardiovascular, infectious, cancer, obesity, rheumatic disorders.
Chromosome 16 known and missing proteins

• ENSEMBL v70: 2316 genes, 886 protein coding genes
• Databases: neXtProt (Nov 2012), Peptide Atlas (Dec 2012), GPMDB (Nov 2012), HPA v10.0
• 187 unknown proteins
• 484 HPA antibodies for Chr16 proteins, 51 for unknown
• 110 OMIM hits: obesity, neurodegenerative diseases, cancer
• Protein expression vectors for 28 Chr16 unknown proteins

Cell lines: coverage of 75% of gene products in lymphoid cells, epithelial cells and fibroblasts.
Shotgun proteomic analysis -> protein profile
Transcriptomic analysis -> gene expression profile
Data integration: global and cell type specific outcomes
• Correlation proteome/transcriptome
• Proteome coverage and chromosome distribution
• Chr16 proteome coverage
• Chr16 missing proteins

[Figure, panels A and B] Annotation of chromosome 16 using HUPO guidelines. (A) The difference between the protein coding genes in Ensembl v70 and the proteins with experimental evidence in GPMdb, neXtProt and Human Peptide Atlas results in the group of 187 missing proteins. (B) Distribution of FPKM in the 16 tissue samples of HBM, comparing genes coding for known and missing proteins. The p-values show the statistical significance of the decrease in expression of genes coding for chr16 missing proteins compared to genes coding for chr16 known proteins.

How to check MIAPE compliance of submitted data?
[Figure: workflow] RAW data -> MIAPE Extraction -> Data compilation -> Data curation -> Data inspection -> Data submission. Labels in the figure: search engine; PSI standard XML; MIAPE (MS, MSI); project definition; inspection (FDR thresholds, visualization, etc.); PRIDE XML export; ProteomeXchange.

MIAPE Extraction
• Extraction of
  • MIAPE Mass Spectrometry (MS) and
  • MIAPE Mass Spectrometry Informatics (MSI)
  from X!Tandem, mzIdentML, mgf, mzML, PRIDE XML

A fully local workflow (on the local computer) is being developed -> no data storage on the MIAPE repository.

ProteoRed MIAPE repository: http://www.proteored.org/MIAPEExtractor

MIAPE compliant submissions (2013): 4 cell lines, 8 labs, 19 experiments
Basic rpHPLC, 1DE, offgel separations. 1 PRIDE XML per experiment. Each cell line in a separate dataset. Upload method: Aspera protocol.

Cell line   # experiments   # files   File size   Elapsed time
MCF7        8               538       60.8 Gb     3h 36min
CCD18       6               442       97.1 Gb     6h 36min
RAMOS       3               228       30.3 Gb     1h 56min
Jurkat      2               102       33.8 Gb     3h 01min
Total       19              1310      222 Gb      ≈ 15 hours

Number of peptides/proteins/genes (1% FDR protein level):

            Total                               Chr16
Cell line   Genes    Proteins   Peptides (*)    Genes   Proteins   Peptides (*)
MCF7        5,161    5,273      37,602          272     264        1,693
CCD18       4,670    4,670      26,959          205     199        1,043
RAMOS       3,413    3,343      15,945          152     150        714
Jurkat      6,785    6,989      55,055          332     332        2,420
Total       8,131    8,766      77,805          383     383        3,345

(*) unique peptides
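The Chr16 gene counts reported above can be turned into a rough coverage figure. The following is a minimal sketch, not part of the original slides: it assumes that the 886 protein coding genes annotated in Ensembl v70 are the right denominator for the Chr16 gene counts at 1% protein-level FDR, and the function and variable names are illustrative only.

```python
# Chr16 genes identified per cell line at 1% protein-level FDR,
# copied from the table of peptides/proteins/genes.
CHR16_GENES_DETECTED = {
    "MCF7": 272,
    "CCD18": 205,
    "RAMOS": 152,
    "Jurkat": 332,
}
CHR16_GENES_COMBINED = 383  # "Total" row of the table

# Assumption: Ensembl v70 protein coding genes on Chr16 as denominator.
CHR16_PROTEIN_CODING_GENES = 886


def coverage_percent(detected, total=CHR16_PROTEIN_CODING_GENES):
    """Return the percentage of `total` genes covered by `detected`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * detected / total


def coverage_report():
    """Map each cell line (and the combined set) to its Chr16 coverage in %."""
    report = {
        name: round(coverage_percent(count), 1)
        for name, count in CHR16_GENES_DETECTED.items()
    }
    report["All cell lines"] = round(coverage_percent(CHR16_GENES_COMBINED), 1)
    return report


if __name__ == "__main__":
    for name, percent in coverage_report().items():
        print(f"{name}: {percent}% of Chr16 protein coding genes")
```

Under this assumption the combined shotgun data cover a little over 43% of the Chr16 protein coding genes, with Jurkat contributing the largest single-cell-line share.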
Development of SRM methods

120 proteins (6 groups) -> 106 proteins detected -> 50 proteins validated (at least 2 labs)

[Figure: Cardiotrophin 1 calibration curve. Intensity vs. peptide amount (fmol), 0 to 2000 fmol; y = 167.15x + 4863.7, R² = 0.99831.]
[Figure: CTF1 intensity in Control, Cirrhosis and HCC samples.]

Cell lines tested: Ramos, MCF7, CCD18, TC28, Huh7, Jurkat, HUAECS (each x marks detection in one cell line).

Accession  Gene_id    log(e)    # Laboratories  # Peptides  Detected in
A5YKK6     CNOT1      +1186.1   2               1           x x
B5ME19     EIF3CL     +339.5    3               4           x x x x
O14983     ATP2A1     +392.6    3               1           x x x x
O43809     NUDT21     +206.8    3               4           x x x x
O60884     DNAJA2     +173.1    3               1           x x x x x
O75150     RNF40      +187.3    2               2           x x x
P00505     GOT2       +564.2    3               3           x x x x x
P00738     HP         +1400.2   3               4           x x
P04075     ALDOA      +1076.2   3               3           x x x x x x
P15170     GSPT1      +284.5    3               3           x x x x x
P15559     NQO1       +384.3    3               3           x x x
P15880     RPS2       +357.3    3               4           x x x x
P22695     UQCRC2     +361.5    3               4           x x x x x
P31146     CORO1A     +598.1    3               5           x x x x
P35637     FUS        +365      3               4           x x x x
P49411     TUFM       +411.3    3               9           x x x x x
P49588     AARS       +543      3               3           x x x x x
P63279     UBE2I      +210.9    3               1           x x x x x
P69849     NOMO3      +369.4    3               2           x x x x x x
P80404     ABAT       +230.7    2               1           x x
Q08AM6     VAC14      +187.1    3               4           x x x x
Q12789     GTF3C1     +1039.8   3               1           x x
Q12931     TRAP1      +392.6    3               1           x x x x x
Q13509     TUBB3      +901.2    3               7           x x x x x
Q14019     COTL1      +236.4    3               2           x x x
Q14694     USP10      +177.7    3               5           x x x x
Q14807     KIF22      +387.7    2               1           x x x
Q15393     SF3B3      +702      3               4           x x x x
Q16775     HAGH       +196.9    3               2           x x x
Q49A26     GLYR1      +271.3    3               2           x x x x
Q53FZ2     ACSM3      +198.8    3               1           x x
Q68EM7     ARHGAP17   +1107.8   2               1           x x x x
Q6P2E9     EDC4       +435.1    3               3           x x x x
Q86W42     THOC6      +179.2    3               4           x x x x x
Q8TBB5     KLHDC4     +285.4    3               1           x x x
Q92793     CREBBP     +408.5    2               1           x x
Q93009     USP7       +312.2    3               5           x x x x x x
Q96DA0     ZG16B      +265      2               1           x x x
Q96QK1     VPS35      +335.9    3               3           x x x x x
Q9NUI1     DECR2      +185      2               2           x x x
Q9NUU7     DDX19A     +203.5    3               7           x x x x
Q9UMR2     DDX19B     +219.6    2               1           x x
Q9UQ35     SRRM2      +1962.2   3               3           x x x x x x

Unknown proteins cloned

Chromosome 16: ~187 unknown proteins -> WG1: 25 new proteins cloned in pANT7_cGST -> digest with restriction enzymes, sequencing -> expression (IVTT) and purification (GST) -> mass spectrometry (MRM)

Unknown proteins results

Uniprot Accession Number  Gene      Protein                                                        Selected peptides  Detected peptides  Detected transitions  Identified peptides
O60359                    CACNG3    Voltage-dependent calcium channel gamma-3 subunit              12                 9                  43                    8
Q6UXU4                    GSG1L     Germ cell-specific gene 1-like protein                         7                  5                  33                    7
Q8WV35                    LRRC29    Leucine-rich repeat-containing protein 29                      9                  9                  53                    9
Q96A59                    MARVELD3  MARVEL domain-containing protein 3                             6                  6                  34                    6
Q9NXF8                    ZDHC7     Palmitoyltransferase ZDHHC7                                    7                  2                  10                    2
Q6PL45                    C16ORF79  BRICHOS domain-containing protein 5                            7                  5                  22                    0
Q8TEW6                    DOK4      Docking protein 4                                              11                 9                  44                    8
Q8TB05                    FAM100A   UBA-like domain-containing protein 1                           1                  1                  8                     2
Q9NWW0                    HCFC1R1   Host cell factor C1 regulator 1                                7                  4                  17                    2
Q9HBE5                    IL21R     Interleukin-21 receptor                                        12                 8                  36                    5
O75324                    SNN       Stannin                                                        3                  2                  28                    2
Q96B96                    TMEM159   Promethin                                                      3                  3                  28                    2
Q9BTX3                    TMEM208   Transmembrane protein 208                                      3                  1                  5                     0
Q96H86                    ZNF764    Zinc finger protein 764                                        12                 6                  22                    7
Q8TAZ6                    CKLFSF2   CKLF-like MARVEL transmembrane domain-containing protein 2     6                  2                  12                    1
Q8IZF4                    GPR114    Probable G-protein coupled receptor 114                        11                 3                  18                    1
Q6PII5                    HAGHL     Hydroxyacylglutathione hydrolase-like protein                  10                 7                  43                    6
Q8TDN1                    KCNG4     Potassium voltage-gated channel subfamily G member 4           11                 3                  12                    2
Q8N635                    C16ORF73  Meiosis-specific with OB domain-containing protein             14                 4                  26                    1
Q8WTQ4                    C16ORF78  Uncharacterized protein C16orf78                               10                 8                  40                    2
Q8IUW3                    SPATA2L   Spermatogenesis-associated protein 2-like protein              9                  9                  64                    6
Q8WVE7                    TMEM170A  Transmembrane protein 170A                                     6                  1                  6                     0
P17023                    ZNF19     Zinc finger protein 19                                         15                 3                  11                    0
A8K8V0                    ZNF785    Zinc finger protein 785                                        10                 7                  58                    7

CONCLUSIONS
• The number of protein coding genes is 886 (ENSv70), including 187 that encode proteins without experimental MS evidence, according to the adopted C-HPP criteria.
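The definition of the missing-protein group (protein coding genes in Ensembl v70 with no experimental evidence in GPMdb, neXtProt or Peptide Atlas) amounts to a set difference. The sketch below illustrates that idea only; it assumes each resource has already been reduced to a set of gene identifiers in a common namespace (that mapping step is not shown), and the identifiers in the example are hypothetical placeholders, not real Chr16 entries.

```python
def missing_proteins(protein_coding_genes, *evidence_sources):
    """Return the genes that have no experimental evidence in any source.

    protein_coding_genes: iterable of gene identifiers (e.g. from Ensembl).
    evidence_sources: one iterable of gene identifiers per resource
                      (e.g. GPMdb, neXtProt, Peptide Atlas).
    """
    with_evidence = set()
    for source in evidence_sources:
        with_evidence.update(source)
    return set(protein_coding_genes) - with_evidence


if __name__ == "__main__":
    # Hypothetical toy identifiers, for illustration only.
    ensembl = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"}
    gpmdb = {"GENE_A", "GENE_B"}
    nextprot = {"GENE_B", "GENE_C"}
    peptide_atlas = {"GENE_A"}

    missing = missing_proteins(ensembl, gpmdb, nextprot, peptide_atlas)
    print(sorted(missing))  # genes without evidence in any resource
```

A gene counts as known as soon as one resource reports it, so the union of the evidence sets is subtracted from the annotation; applied to the real resources, this is the kind of operation that yields the 187 of 886 genes reported here.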