(12) Patent Application Publication (10) Pub. No.: US 2007/0083334 A1 Mintz Et Al

US 20070O83334A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2007/0083334 A1 Mintz et al. (43) Pub. Date: Apr. 12, 2007 (54) METHODS AND SYSTEMS FOR (60) Provisional application No. 60/322,285, filed on Sep. ANNOTATING BOMOLECULAR 14, 2001. Provisional application No. 60/322,359, SEQUENCES filed on Sep. 14, 2001. Provisional application No. 60/322,506, filed on Sep. 14, 2001. Provisional appli (75) Inventors: Liat Mintz, Kendall-Park, NJ (US); cation No. 60/324,524, filed on Sep. 26, 2001. Pro Hanging Xie, Plainsboro, NJ (US); visional application No. 60/354.242, filed on Feb. 6, Dvir Dahari, Tel Aviv (IL); Erez 2002. Provisional application No. 60/371,494, filed Levanon, Petach Tikva (IL); Shiri on Apr. 11, 2002. Provisional application No. 60/384, Freilich, Haifa (IL); Nili Beck, Kfar 096, filed on May 31, 2002. Provisional application Saba (IL); Wei-Yong Zhu, Plainsboro, No. 60/397,784, filed on Jul. 24, 2002. NJ (US); Alon Wasserman, Tel Aviv (IL); Chen Hermesh, Mishmar Hashiva Publication Classification (IL); Idit Azar, Tel Aviv (IL); Jeanne Bernstein, Kfar Yona (IL) (51) Int. Cl. G06F 9/00 (2006.01) Correspondence Address: (52) U.S. Cl. ................................................. 702/19, 702/20 Martin D. Moynihan PRTSI, Inc. (57) ABSTRACT P.O. Box 16446 Arlington, VA 22215 (US) A method of annotating biomolecular sequences. The method comprises (a) computationally clustering the bio (73) Assignee: Compugen Ltd. molecular sequences according to a progressive homology range, to thereby generate a plurality of clusters each being (21) Appl. No.: 11/443,428 of a predetermined homology of the homology range; and (22) Filed: May 31, 2006 (b) assigning at least one ontology to each cluster of the plurality of clusters, the at least one ontology being: (i) Related U.S. Application Data derived from an annotation preassociated with at least one biomolecular sequence of each cluster, and/or (ii) generated (63) Continuation of application No. 10/242,799, filed on from analysis of the at least one biomolecular sequence of Sep. 13, 2002, now abandoned. each cluster thereby annotating biomolecular sequences. Patent Application Publication Apr. 12, 2007 Sheet 1 of 25 US 2007/0083334 A1 CO N y CS V g) sew s O) ou Patent Application Publication Apr. 12, 2007 Sheet 2 of 25 US 2007/0083334 A1 r CO wa i s c Patent Application Publication Apr. 12, 2007 Sheet 3 of 25 US 2007/0083334 A1 Figure 2 gastrontestinal systern gastrointestinal glands -. Ngastrointestinal tract / Y. -16 A- issuire tieyer gal \ -4y Visetia esophageal gastric cal ecta NN.bowel epitheliurnissa di Sadler liver ar sets of earra carcer N largethers Y sudder tepa T toy recret juriura 4. er Patent Application Publication Apr. 12, 2007 Sheet 4 of 25 US 2007/0083334 A1 N ef CD s 3) ept .2 C ---s Patent Application Publication Apr. 12, 2007 Sheet 5 of 25 US 2007/0083334 A1 Figure 4 (page 1/9) 1 tumor 1.1 epithelial cell tumors 1.1.1 carcinoma 1.1.1.1 adenocarcinoma 1.1.1.2 lobular carcinoma 1.2 Mesenchinal cell tumors 1.2.1 sarconna .2.1.1 liposarcoma 1.2.1.2 rhabdomyosarcoma 1.2.1.3 pnet 1.2.1.4 ewing sarcoma 1.3 blood funnors 1.3.1 lymphoma 1.3.2 leukemia 1.3.3 myeloma 1.4 endocrine tumors 1.4.1 pheocromocytoma 1.4.2 carcinoid Patent Application Publication Apr. 12, 2007 Sheet 6 of 25 US 2007/0083334 A1 Figure 4 (page 2/9) 2 endocrine System 2.1 adrenal 2.1.1 pheocromocytoma 2.2 pancreas 2.2.1 islets of Langerhans 2.3 neuroendocrine 2.3.1 hypothalamus 2.3.2 carcinoid 2.4 thyroid 3 vascular tissue 3.1 arteries 3.1.1 aorta 3.2 vein Patent Application Publication Apr. 12, 2007 Sheet 7 of 25 US 2007/0083334 A1 Figure 4 (page3/9) 4 genitourinary system 4.1 urinary system 4.1.1 bladder 4.1.2 kidney 4.2 genital system 4.2.1 women genital System 4.2.1.1 cervix 4.2.É.2 ovary 4.2.1.3 uterus 4.2.1.3.1 endometriun 4.2.2 men gentile system 4.2.2.1 prostate 4.2.2.2 testis 4.2.2.2.1 epididymis 5 muscles 5.1 rhabdomyosarcoma 5.2 tongue 5.3 bladder 5.4 heart 5.5 uterus Patent Application Publication Apr. 12, 2007 Sheet 8 of 25 US 2007/0083334 A1 Figure 4 (page 4/9) 6 Blood 6.1 peripheral blood 6.1.1 erythroid line 6.1.2 leukocyte 6.1.2.1 lymphoid system 6.1.2.1.1 lymphoma 6.1.2.1.2 spleen 6.1.2.1.3 thalamus 6.2 stern cells 6.2.1 myeloid 6.2.2 myeloma 6.3 Bone narroW 6.4 leukemia Patent Application Publication Apr. 12, 2007 Sheet 9 of 25 US 2007/0083334 A1 Figure 4 (page 5/9) 7 nerve system 7.1 CNS, central nervous system 7.1.1 brain 7.1.1.1 cerebrum 7.1.1.2 cerebellum 7.1.1.3 pitituary gland 7.1.1.4 hypothalamus 7.1.1.5 thalamus 7.1.1.6 olfactory 7.1.1.7 Hippocampus 7.1.1.8 amygdala 7.1.1.9 frontal lobe 7.1.10 pnet 7.2 Embryonal nerve system 7.2.1 primitive neuroectoderm 7.3 retina 7.3.1 retinoblastoma Patent Application Publication Apr. 12, 2007 Sheet 10 of 25 US 2007/0083334 A1 Figure 4 (page 6/9) 8 breast 8.1 ductal breast 8.1.1 ductal carcinoma 8.2 lobular carcinoma 8.3 mammary 9 Skeleton 9.1 bore 9.1.1 eving sarcoma 9.1.2 craniofacial 9.1.2.1 calvarium 9.2 connective tissue 9.2.1 trabeculae 9.2.2 cartilage 10 embryo 10.1 annion 10.2 chorion 10.3primitive neuroectoderm 10.4 placenta Patent Application Publication Apr. 12, 2007 Sheet 11 of 25 US 2007/0083334 A1 Figure 4 (page 7/9) 11 exocrine system 11.1 pancreas 11.1.1 islets of Langerhans 11.2prostate 11.3salivary gland Patent Application Publication Apr. 12, 2007 Sheet 12 of 25 US 2007/0083334 A1 Figure 4 (Page 8/9) 12 face organs 12.1 nose 12.2 ear 12.2.1 cochlea 12.3eye 12.3.1 retina 12.3.1.1 retinobastonia 12.3.2 lenis f2.4 mouth 12.5 tongue 13 gastrointestinal system 13.1 UICOSa 13.2stomach 13.3 intestine 13.3.1 colorectal 13.3.1.1 colon 13.4 hepatobiliary system 13.4.1 liver 13.4.2 biliary system 13.4.2.1 gall bladder 13.5 pancreas 13.5.1 islets of Langerhans Patent Application Publication Apr. 12, 2007 Sheet 13 of 25 US 2007/0083334 A1 Figure 4 (page 9/9) 14 respiratory system 14.1 nasopharynx 14.2 lung 14.2.1 small cell lung carcinona 15 Skin 15.1 dernis 15.1.1 melanocyte 16 fat tissue 16.1 iposarcoma Patent Application Publication Apr. 12, 2007 Sheet 14 of 25 US 2007/0083334 A1 120 s 100 gif AARNTSEEstegger":Ey. s s i.A. A.5 80 ' C : .S. 60 l U A. dy 40 w) s 3 220 O O 10 20 30 40 50 60 70 80 90 100 LOD Score Figure 5 Patent Application Publication Apr. 12, 2007 Sheet 15 of 25 US 2007/0083334 A1 ce o d d e e s e as ed go t s f ad n sugaolfsäsuoo Joaquin N Patent Application Publication Apr. 12, 2007 Sheet 16 of 25 US 2007/0083334 A1 i s suiao i?suolo Joaquit N Patent Application Publication Apr. 12, 2007 Sheet 17 of 25 US 2007/0083334 A1 Subolji/suo joaqun N Patent Application Publication Apr. 12, 2007 Sheet 18 of 25 US 2007/0083334 A1 brain stomach placenta kidney spleen thymus ovary heart lung testis liver breast pancreas i muscle adenocarcinoma colon normal Duke A Duke B Duke C Duke D B C s D genomic DNA Patent Application Publication Apr. 12, 2007 Sheet 19 of 25 US 2007/0083334 A1 89_InÃ¡H Patent Application Publication Apr. 12, 2007 Sheet 20 of 25 US 2007/0083334 A1 6a.m.??, Patent Application Publication Apr. 12, 2007 Sheet 21 of 25 US 2007/0083334 A1 Brain Stomach Placenta Kidney Spleen Thymus Adenocarcinoma Colon Normal i Colon Duke’s A Colon Duke's B Colon Duke's C Colon Duke's D . Patent Application Publication Apr. 12, 2007 Sheet 22 of 25 US 2007/0083334 A1 2u Ouoleo fun SN # Ouseboa At ON 9ff t E. e SeAe I uOSSeIdx peZeu.ION eA)be Patent Application Publication Apr. 12, 2007 Sheet 23 of 25 US 2007/0083334 A1 euouble fun SN is 9 oup23-OaAyyppo. 9i Gi (ouapW) 0900 gif vil (ouapy) oz.90 vil coodog)-euoup Jelouapf Eli oup JeouenbSylplay Zi Ou)Je3-uenbS gppoA is eu Ougojeo souenbS oil euoup Jeo souenbS 6: i (lood-og) ouple3-uenbS gy Og) euoupoleo SoueabS is (uo) snouenbS fun 9ty eulouse ZFS)). Gil NiOZ-9) reuo, fun. Eit ood peup og 'ulo fundi in a r n r are N N a e SeAe UOISSeIdx peZIIeu.ION 9Agee Patent Application Publication Apr. 12, 2007 Sheet 24 of 25 US 2007/0083334 A1 A 2ULuoup J23 ft. In SNY 9 OuJeboat p-po, 9 gli (ouapw) 0990 Gli ity (ouapW) OZ-9) will to Oro-buloupe3Ouapy Ei ougeo-tuenbS gpa ety ouble3-uenbSuppo Li eu Ougoue) souenbS Oy euouplje3 SoulenbS 6 i too dog) ouple-uebS 9 O) 2uoupe) SoulenbS is (uo) soughbS fung euoupleo ZSO is NiO29O euo fun ey lood use upogu ON fun Ziy in to to r n - o v SeAe UOISSeIdx peZeu.ION eAgee US 2007/0083334 A1 Apr.

(12) Patent Application Publication (10) Pub. No.: US 2007/0083334 A1 Mintz Et Al

Review Article Functional Subunits of Eukaryotic Chaperonin CCT/Tric in Protein Folding

Leeds Thesis Template

The Microbiota-Produced N-Formyl Peptide Fmlf Promotes Obesity-Induced Glucose

Style Specifications Thesis

Product Sheet Info

Biochemical Basis for Differential Deoxyadenosine Toxicity To

Structure, Hormonal Regulation and Chromosomal Location of Genes Encoding Barley (1-+A)-B-Xylan Endohydrolases

Supplementary Information

The Metabolic Building Blocks of a Minimal Cell Supplementary

Q 297 Suppl USE

Exploring the Data Work Organization of the Gene Ontology Shuheng Wu

The Chaperonin Folding Machine