FROM HUMAN GENOME PROJECT TO HUMAN PROTEOME PROJECT /HPP/: RUSSIAN PARTICIPATION

ARCHAKOV A.I. INSTITUTE OF BIOMEDICAL CHEMISTRY OF RAMS, MOSCOW, RUSSIA

www.ibmc.msk.ru [http://www.proteome.ru/en/roadmap/]

GENOME [OUTLINE]

TRANSCRIPTOME [LAYOUT]

PROTEOME [PRODUCT]

THE MAIN DIFFERENCE BETWEEN CATERPILLAR AND BUTTERFLY: THE SAME GENOME, THE DIFFERENT PROTEOME

STEPS OF HPP HUPO FORMATION:

BARBADOS CONFERENCE 2007 05/11-01-2007

H. PEARSON. BIOLOGISTS INITIATE PLAN TO MAP HUMAN PROTEOME. NATURE, 452, 24, 920-921, 2008.

HUMAN PROTEOME PROJECT HUPO 2008, 7th WORLD CONGRESS, 16/20-08-2008, AMSTERDAM, NL

HPP MOSCOW WORKSHOP, RUSSIA, 20/21-03-2009

SEOUL, KHUPO, 26/27-03-2009

CANADIAN HUPO, TORONTO, 26/30-09-2009

THE EXISTING PROTEOMIC TECHNOLOGIES ALLOW LAUNCHING THE HPP

AFTER COMPLETION OF THE HPP, GENOME-BASED MEDICINE SHOULD BECOME PROTEOME-BASED, WHICH WILL BE THE BASIS FOR PERSONALIZED MEDICINE.

HOW MANY PROTEINS EXIST IN HUMAN BODY?

22 PAIRS OF SOMATIC CHR + 1 PAIR OF SEX CHR: 20 000 GENES [1]

× ~50 FORMS PER GENE [2]

= ~1 [2] - 2 [3] MLN PROTEINS
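The arithmetic behind this estimate can be sketched in a few lines. This is only an illustration of the multiplication on the slide; the function name `estimate_protein_species` is hypothetical, and the inputs are the slide's own round figures.

```python
# Back-of-the-envelope estimate of the number of protein species.
# Inputs are the round figures quoted on the slide; the function name
# is a hypothetical helper introduced for illustration only.

GENES = 20_000        # protein-coding genes [1]
FORMS_PER_GENE = 50   # average number of protein forms per gene [2]


def estimate_protein_species(genes: int, forms_per_gene: int) -> int:
    """Multiply the gene count by the average number of forms per gene."""
    return genes * forms_per_gene


total = estimate_protein_species(GENES, FORMS_PER_GENE)
print(f"{total:,} protein species (about {total / 1e6:.0f} mln)")
```

With these inputs the product is the lower bound of the quoted range (about 1 mln).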

______
[1] The sequence of the human genome. Venter et al., Science, 2001
[2] Extent of modifications in Human Proteome Samples…, Nielsen et al., MCP, 2006
[3] Biospecific irreversible fishing… Archakov et al., Proteomics, 2009

HOW MANY PROTEINS IN HUMAN PROTEOME: dependence between detection limit of the staining method and number of protein spots on 2DE. (Archakov et al., PROTEOMICS (2009), 9, 1326-1343).

Staining methods and detection limits (DL, M): Amido Black 10^-6; Coomassie Blue 10^-7; Silver Tio 10^-8; Silver Glutar 10^-9; Cy5 10^-10; Cy5Sat 10^-11

[Figure: DL, M (10^-6 to 10^-24) versus number of protein species (10 to 5000) for BLOOD PLASMA and HepG2 CELLS; values marked on the plot: 2.2 Mln and 380 Thsnd]

WHY GENE CENTRIC PROTEOME PROJECT IS REALISTIC?

• RECENT PROTEOMICS IS GENOME-BASED SCIENCE. GENE CENTRIC PROJECT IS APPLICATION OF EXISTING TECHNOLOGIES TO SIMPLIFIED TASKS
• THE CREATED MRM/IRREVERSIBLE BINDING TECHNOLOGY ALLOWS ANALYSIS OF BLOOD PLASMA PROTEINS WITH SENSITIVITY 10^-18 M (1 copy/1 µL) OR 1 PROTEIN COPY PER 10^3 LIVER OR HepG2 CELLS.
• FOCUSED MRM ANALYSIS OF SINGLE CHROMOSOME WITH KNOWN GENE NUMBER ALLOWS GENERATING “GOLD STANDARD” FOR GENE CENTRIC PROTEOME PROJECT.

MRM: multiple reaction monitoring

CRITERIA FOR CHROMOSOME SELECTION:

• TOTAL NUMBER OF PROTEIN-CODING GENES
• CLINICAL RELEVANCE
• NUMBER OF ALREADY IDENTIFIED PROTEINS
• ABSENCE OF IMMUNOGLOBULINS

CHROMOSOME STATISTICS

LENGTH (BPS): 78,077,248
GENES: 513
KNOWN PROTEIN-CODING GENES: 285
PSEUDOGENE GENES: 64
miRNA GENES: 32
rRNA GENES: 13
snRNA GENES: 51
snoRNA GENES: 36
Misc RNA GENES: 25

Ensembl release 60 - Nov 2010, http://www.ensembl.org

DATA MINING STATISTICS FOR 18th CHROMOSOME: 286 MASTER PROTEIN CODING GENES

PRIDE (PRoteomics IDEntifications database) = 224 identified proteins
Protein Atlas = 102 proteins

In PRIDE only: 134
In Protein Atlas only: 12
In PRIDE and in Protein Atlas: 90
48 PROTEINS – NOT CONFIRMED

How many proteins can we expect for 18th chromosome?

285: Transcriptome analysis of 18th chromosome

+ 230: Over 80% of the genes undergo alternative splicing (Kampa et al., Genome Res., 2004, 14, 331-342)

= 516

× 50: On average, each of them can have 50 PTMs (Nielsen M, Savitski M, Zubarev R, MCP, 2006, 5, 2384-)

= 25800 protein species can be expected as expressed in HepG2 cell line

(The HepG2 cell line originates from a hepatocellular adenocarcinoma.)

ROADMAP

PROTEOME OF THE 18-TH HUMAN CHROMOSOME: GENE CENTRIC IDENTIFICATION OF TRANSCRIPTS, PROTEINS AND PEPTIDES

http://www.proteome.ru/en/roadmap/
http://www.hupo.org/research/hpp/soc/

RusHPP CONSISTS OF TWO PHASES:

THE GOAL OF THE PILOT PHASE IS IDENTIFICATION OF ALL MASTER PROTEINS PRODUCED BY 18th CHR IN LIVER AND HepG2 CELLS AND THEIR IDENTIFICATION IN PLASMA WITH SENSITIVITY 10^-18 M (1 PROTEIN COPY/µL) AND 1 PROTEIN COPY PER 10^3 LIVER OR HepG2 CELLS.

MAIN PHASE OF RusHPP

THE GOAL IS THE IDENTIFICATION OF ALL MODIFIED PROTEINS (ABOUT 30000) EXPRESSED BY 18th CHR AT SENSITIVITY 10^-18 M.

BIOINFORMATIC AND EXPERIMENTAL CREATION OF 18th CHR PROTEIN INTERACTOME BY COMBINING OPTICAL BIOSENSOR WITH MS (BUNEEVA O. ET AL. PROTEOMICS 2010, 10, 23-57.)

CREATION OF 18th CHR PROTEIN KNOWLEDGE BASE

PRINCIPAL DIFFERENCE BETWEEN GENOMICS AND PROTEOMICS

THE PRINCIPAL DIFFERENCE BETWEEN GENOMICS AND PROTEOMICS IS THE EXISTENCE OF POLYMERASE CHAIN REACTION (PCR) IN GENOMICS, WHICH ALLOWS MULTIPLYING NUCLEIC ACID MOLECULES, AND THE ABSENCE OF A PCR-LIKE REACTION IN PROTEOMICS. DUE TO PCR, GENOMICS DOES NOT HAVE THE DETECTION LIMIT (DL) OBSTACLE. PROTEOMICS HAS IT.

THREE BOTTLENECKS OF HPP

• LOW SENSITIVITY OF RECENT PROTEOMIC TECHNOLOGIES. THE BEST ONE, MRM-MS, REACHES SENSITIVITY UP TO 10^-14 M*, CORRESPONDING TO 10 000 PROTEIN COPIES IN 1 µL OF PLASMA.
• THE ABSENCE OF “GOLD STANDARD” FOR SAMPLES AND SAMPLING.
• PROTEOMICS IS SITUATIONAL SCIENCE. WHAT IS THE BORDERLINE BETWEEN DIFFERENT SITUATIONS?

WHAT IS REVERSE AVOGADRO’S NUMBER?

N_A = 6.022×10^23 MOLECULES/MOLE
1 MOLE IN 1 L = 1 M
1/N_A ≈ 10^-24 MOLE/MOLECULE, SO 1 MOLECULE/L ≈ 10^-24 M
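The reverse Avogadro relation can be checked numerically. The sketch below converts between molar concentration and absolute copy number in a given volume. The helper names `copies_in_volume` and `molar_of_single_copy` are hypothetical, and the 1 µL sample volume is taken from the bottleneck statement above.

```python
# Conversion between molar concentration and number of molecules.
# Helper names are hypothetical and used for illustration only.

AVOGADRO = 6.022e23  # molecules per mole
MICROLITRE = 1e-6    # litres


def copies_in_volume(molar: float, volume_l: float) -> float:
    """Number of molecules present in `volume_l` litres at `molar` mol/L."""
    return molar * AVOGADRO * volume_l


def molar_of_single_copy(volume_l: float) -> float:
    """Molar concentration that corresponds to one molecule in `volume_l` litres."""
    return 1.0 / (AVOGADRO * volume_l)


# One molecule per litre: the "reverse Avogadro number", on the order of 10^-24 M.
print(f"1 molecule/L  = {molar_of_single_copy(1.0):.2e} M")

# One molecule per microlitre: on the order of 10^-18 M.
print(f"1 molecule/uL = {molar_of_single_copy(MICROLITRE):.2e} M")

# 10^-14 M in 1 uL: several thousand copies, the same order as the quoted 10 000.
print(f"10^-14 M in 1 uL = {copies_in_volume(1e-14, MICROLITRE):.0f} copies")
```

The exact prefactor (1.66 rather than 1) is dropped on the slide, so the slide's round numbers agree with this calculation to within an order of magnitude.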

[Archakov A.I. et al. PROTEOMICS 2007, 7, 4–9]

IT WOULD BE POSSIBLE TO INCREASE THE SENSITIVITY UP TO 10^-18 M [ARCHAKOV ET AL., 2009], CORRESPONDING TO 1 PROTEIN COPY IN 1 µL. IT BECAME ACHIEVABLE DUE TO COMBINING AFM OR MRM TECHNOLOGIES WITH IRREVERSIBLE BINDING OF PROTEINS TO BrCN-SEPHAROSE.

** AFM: atomic force microscopy; MRM: multiple reaction monitoring

THE SENSITIVITY IN THE RANGE OF 10^-18 M WILL BE QUITE ENOUGH FOR STARTING THE HPP.

COMBINING OF IRREVERSIBLE FISHING TECHNOLOGY WITH AFM

CONCENTRATION OF PROTEINS FROM SOLUTION ON THE ACTIVATED AFM-CHIP USING IRREVERSIBLE FISHING

[Diagram: SOLUTION, FISHING, AFM SURFACE]

IN THE SOLUTION:
- CONCENTRATION OF PROTEINS – 10^-11 M
- VOLUME – 1 mL

ON THE AFM SURFACE:
- SURFACE CONCENTRATION OF FISHED PROTEINS – 10^-3 M
- AFM SURFACE – 1 mm^2
- HEIGHT OF THE MOLECULE – 5 nm
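The surface concentration quoted above follows from the listed parameters if one assumes that every protein molecule in the solution ends up in a layer one molecule high on the chip. That complete-capture assumption, and the function name `surface_layer_concentration`, are introduced here for illustration only; the sketch is a unit-conversion check, not a model of the binding kinetics.

```python
# Effective concentration of proteins fished from solution onto an AFM chip.
# Assumption (illustrative): all molecules from the solution are captured in a
# surface layer whose thickness equals the height of one molecule.


def surface_layer_concentration(
    c_solution_molar: float,  # protein concentration in solution, mol/L
    volume_l: float,          # solution volume, L
    area_mm2: float,          # activated chip area, mm^2
    height_nm: float,         # height of one molecule, nm
) -> float:
    """Return the molar concentration inside the surface layer."""
    moles = c_solution_molar * volume_l
    area_m2 = area_mm2 * 1e-6                    # mm^2 -> m^2
    height_m = height_nm * 1e-9                  # nm -> m
    layer_volume_l = area_m2 * height_m * 1e3    # m^3 -> L
    return moles / layer_volume_l


C_SOLUTION = 1e-11  # M, from the slide
c_surface = surface_layer_concentration(C_SOLUTION, 1e-3, 1.0, 5.0)
factor = c_surface / C_SOLUTION

print(f"surface concentration: {c_surface:.1e} M")
print(f"concentration factor:  {factor:.1e}")
```

Under these assumptions the layer volume is 5×10^-12 L, which gives a surface concentration of about 2×10^-3 M, the same order of magnitude as the 10^-3 M on the slide.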

AFM IRREVERSIBLE FISHING INCREASES SURFACE CONCENTRATION BY FACTOR OF 10^8

AFM IRREVERSIBLE CHEMICAL FISHING: AVIDIN ON AFM-SUPPORT

EXPERIMENT: SUCCINIMIDE MODIFIED MICA
C_AVIDIN = 10^-13 M; V = 1 ml; T = 37 °C; t_INCUB = 60 min
S_ACTIVATION = 0,4 mm^2; S_SCAN = 16 FRAMES × 25 µm^2 = 4×10^-4 mm^2; t_SCAN = 240 min
N_MOLECULES = 5122 ± 500 molecules / 400 µm^2
C_VS = 5×10^-6 M (FROM EXPERIMENT)
C_VS = 10^-5 M (FROM THEORY)

[AFM image of avidin on the support; height scale 0 to 5 nm, lateral scale 5 µm]

THE DEPENDENCE OF ANTI-HCVcoreimm/HCVcoreAg PROTEIN COMPLEX NUMBER ON HCVcoreAg CONCENTRATION IN SOLUTION (IRREVERSIBLE BINDING) (ARCHAKOV et al., PROTEOMICS, 2009, 9, 1326-1343)

[Figure: NUMBER OF COMPLEXES (10 to 10^4, with cutoff line) versus CONCENTRATION, M (10^-13 to 10^-19), for V = 1 mL and V = 50 mL. Curve 1 – EXPERIMENTAL IRREVERSIBLE BINDING; curve 2 – THEORETICAL IRREVERSIBLE BINDING]

EXPERIMENTAL CONDITIONS: anti-HCVcoreAgimm (PHOTO-CROSS LINKER MODIFIED); S_IMS = 0,4 mm^2; S_SCAN = 16 FRAMES × 25 µm^2 = 4×10^-4 mm^2; t_SCAN = 240 min; T = 37 °C; t = 60 min

COMBINING OF IRREVERSIBLE FISHING WITH MRM MS

Experimental workflow for low and ultra low copied protein detection

[Workflow diagram: Step 1, Step 2, Step 3. Instruments: HPLC-Chip Q-TOF (Agilent 6510) and HPLC-Chip QqQ (Agilent 6410). QqQ dynamic MRM analysis of individual proteins and of proteins in human plasma]

MRM STRATEGY FOR LOW AND ULTRA-LOW COPIED PROTEIN DETECTION (BSA and CYP102/BM3/ as an example)

+ trypsin: Detection limit for purified proteins is 10^-16 M, 600 copies/1 µl (s/n>7)

+ serum + trypsin: Detection limit for CYP102 in the presence of human serum is 10^-14 M, 60 000 copies/1 µl (s/n>7)

IRREVERSIBLE BINDING OF PROTEINS ON BrCN-SEPHAROSE BEADS

[Figure panels: PURIFIED CYP102/BSA; CYP102/BSA IN THE PRESENCE OF HUMAN SERUM. Concentrations: 10^-9 M, 10^-12 M, 10^-15 M, 10^-18 M. Volume: 0.5-50 ml]

IRREVERSIBLE BINDING OF CYP102/BSA ON BrCN-SEPHAROSE BEADS FOLLOWED BY DIGESTION WITH TRYPSIN

DETECTION LIMIT FOR PURIFIED PROTEINS AND PROTEINS IN THE PRESENCE OF HUMAN SERUM IS 10^-18 M, 1 COPY/1 µL

NEW TECHNOLOGIES FOR Rus-HPP

1. ANALYTICAL COMPLEX BASED ON THE COMBINATION OF ATOMIC FORCE MICROSCOPY AND MASS-SPECTROMETRY

2. ANALYTICAL INSTRUMENTS BASED ON NANOWIRES

3. INFORMATIONAL CLOUD COMPUTING SYSTEM BASED ON THE PERSONAL SUPERCOMPUTER PLATFORM

WHAT DOES PROTEOME-BASED MEDICINE MEAN?

1. NEW DIAGNOSTIC TESTS BASED ON HIGH SENSITIVITY TECHNOLOGY SOLUTIONS (1A) AND PROTEOTYPING (1B) WILL BE CREATED.

1A. UP-TO-DATE METHODS HAVE SENSITIVITY OF ABOUT 10^-12 M. IT MEANS THAT 1 BILLION PROTEIN COPIES COULD BE DETECTED IN 10L OF BIOLOGICAL MATERIAL. IF SENSITIVITY INCREASES UP TO 10^-18 M, WE COULD DETECT 1 PROTEIN COPY PER CELL.

1B. AS THERE EXIST SNP, SAP, AS AND PTM, THE TOTAL NUMBER OF PROTEINS ENCODED IN THE GENOME (~20,000 GENES) COULD INCREASE UP TO 2 MLN. PRESUMABLY, THESE UNKNOWN PROTEINS CAN BE USED IN DIAGNOSTICS.

DETECTION OF HCVcoreAg AT TWO SPOTS OF AFM CHIP FROM HCVcoreAg SOLUTION (C=10^-9 M)

[Figure, panel ANTI-HCVcoreAgimm SPOT: RELATIVE CONTENT, % (0 to 25) versus HEIGHT, NM (0 to 10). Curve 1 – ANTI-HCVcoreAg; curve 2 – anti-HCVcoreimm/HCVcoreAg, h = 3-7 nm]

[Figure, panel ANTI-HBsAgimm (CONTROL SPOT): RELATIVE CONTENT, % (0 to 25) versus HEIGHT, NM (0 to 10). Curve 3 – ANTI-HBsAgimm; curve 4 – AFM CHIP AFTER INCUBATION IN HCVcoreAg (C=10^-9 M) SOLUTION. SURFACE TOPOGRAPHY HAS NOT CHANGED]

COMPARISON OF AFM AND OTHER METHODS (ELISA AND PCR) FOR DETECTION OF HBsAg AND HCVcoreAg

HCVcoreAg (hepatitis C virus, HCV): AFM VERSUS PCR (RNA HCV)

          PCR +    PCR -
AFM +      24        2
AFM -       8        7

COINCIDENCE – 76%

HBsAg (hepatitis B surface antigen): AFM VERSUS ELISA

          ELISA +  ELISA -
AFM +      25        3
AFM -       1        6

COINCIDENCE – 89%
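The coincidence figures can be reproduced from the four counts of each comparison as the share of samples on which both methods give the same result. The sketch below assumes that "coincidence" means this overall percent agreement; the function name `coincidence_percent` is hypothetical. The result does not depend on which method is assigned to rows and which to columns.

```python
# Overall percent agreement between two diagnostic methods from a 2x2 table.
# Assumption (illustrative): "coincidence" = (both positive + both negative) / total.


def coincidence_percent(both_pos: int, discordant_a: int,
                        discordant_b: int, both_neg: int) -> float:
    """Percentage of samples on which the two methods agree."""
    total = both_pos + discordant_a + discordant_b + both_neg
    return 100.0 * (both_pos + both_neg) / total


hcv = coincidence_percent(24, 2, 8, 7)   # AFM versus PCR, HCVcoreAg
hbs = coincidence_percent(25, 3, 1, 6)   # AFM versus ELISA, HBsAg

print(f"HCVcoreAg: {hcv:.1f}% agreement over 41 samples")
print(f"HBsAg:     {hbs:.1f}% agreement over 35 samples")
```

Rounded to whole percent, the two values match the 76% and 89% stated above.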

POSSIBLE REASON OF DISAGREEMENT:

132 aa of 191 (70%) in HCVcore PROTEIN SEQUENCE ARE INVARIANT [Bukh et al., PNAS, 1994, 91, 8239-8243]

183 aa of 226 (80%) in HBsAg PROTEIN SEQUENCE ARE INVARIANT [Norder et al., J. Gen. Virol., 1992, 73, 1201-1208]

NEW DIAGNOSTIC TESTS BASED ON PROTEOTYPING

PROTEOTYPE IS A RESULT OF:

WHAT DOES PROTEOME-BASED MEDICINE MEAN? /continuation/

2. NEW DRUG TARGETS. THERE ARE ABOUT 500 DRUG TARGETS NOW IN USE IN PHARMACOLOGY. AT THE END OF HPP AROUND 5 000 - 10 000 NEW DRUG TARGETS COULD BE FOUND.

3. NEW MOLECULAR MECHANISMS OF DISEASE DEVELOPMENT WOULD BE DISCOVERED. IDENTIFICATION OF THOUSANDS OF NEW PROTEINS WOULD HELP TO DECIPHER THE COMPLEX METABOLIC PATHWAYS AND REVEAL UNCOMMON INDIRECT PROTEIN-PROTEIN RELATIONSHIPS.

ACKNOWLEDGEMENTS

• A. Lisitsa
• V. Zgoda
• Yu. Ivanov
• E. Ponomarenko

IBMCH RAMS, MOSCOW