(12) Patent Application Publication (10) Pub. No.: US 2008/0085284 A1 Patell et al.
US 20080085284A1

(19) United States
(12) Patent Application Publication
(10) Pub. No.: US 2008/0085284 A1
Patell et al.
(43) Pub. Date: Apr. 10, 2008

(54) CONSTRUCTION OF A COMPARATIVE DATABASE AND IDENTIFICATION OF VIRULENCE FACTORS COMPARISON OF POLYMORPHIC REGIONS IN CLINICAL ISOLATES OF INFECTIOUS ORGANISMS

(76) Inventors: Villoo Morawala Patell, Bangalore (IN); K.R. Rajyashri, Bangalore (IN); Marc Rodrigue, Marcy (FR); Guy Vernet, Marcy (FR)

Correspondence Address:
SALIWANCHIK LLOYD & SALIWANCHIK
A PROFESSIONAL ASSOCIATION
PO BOX 142950
GAINESVILLE, FL 32614-2950 (US)

(21) Appl. No.: 11/632,108
(22) Filed: Apr. 9, 2007

Publication Classification

(51) Int. Cl.
A61K 31/70 (2006.01)
A61K 39/00 (2006.01)
A61P 43/00 (2006.01)
C07H 21/04 (2006.01)
C12Q 1/68 (2006.01)

(52) U.S. Cl.: 424/184.1; 435/6; 514/44; 536/23.7; 536/24.33; 707/102

(57) ABSTRACT

The present invention is directed to novel nucleotide sequences to be used for diagnosis, identification of the strain, typing of the strain and giving orientation to its potential degree of virulence, infectivity and/or latency for all infectious diseases, more particularly tuberculosis. The present invention also includes a method for the identification and selection of polymorphisms associated with the virulence and/or infectivity in infectious diseases, more particularly in tuberculosis, by a comparative genomic analysis of the sequences of different clinical isolates/strains of infectious organisms. The regions of polymorphism can also act as potential drug targets and vaccine targets. More particularly, the invention also relates to identifying virulence factors of M. tuberculosis strains and other infectious organisms to be included in a diagnostic DNA chip allowing identification of the strain, typing of the strain and finally giving orientation to its potential degree of virulence. Although the present invention has been illustrated with specific reference to the polymorphic region in the Mycobacterium tuberculosis, the said invention is not to be understood and construed as being limited to Tuberculosis but is applicable to all infectious diseases.

Patent Application Publication Apr. 10, 2008 Sheet 1 of 31 US 2008/0085284 A1

Figure 1: Entity Relationship Model

SNP: Ref pos {PK}, Ref annotation id {FK}, Ref base, Ref aa, In cdc1551, In h37rv
SEQ SNP: SNP id {FK}, Annotation id {FK}, Query pos, Query base, Sequence id, Is nsSNP
annotation: Id {PK}, Organism, Version, Accession no, Gene start, Gene end, Locus tag, Product, Protein id, EC number, DB xref, DB xref GOA, Type, Strand, Gene name, Gene link, note
SNP analysis: ref pos, BCG annotation, ref base, BCG AA, query name, query pos, query annotation, query base, query_aa, is nsSNP, qry orf, bcg orf, is non cons, is iden base, funcanno id
long poly: Accession no, Ref start, Ref end, BCG annotation, BCG orf, Query name, Query start, Query end, Query annotation, Query orf, Funcanno id
indels: Accession no, Ref start, Ref end, BCG annotation, BCG orf, Query name, Query start, Query end, Query annotation, Query orf, Funcanno id
Other labels legible in the figure, whose entity is not recoverable from the extracted text: function, amino, class, Username, Password

Sheet 2 of 31

Figure 2: Identification of Single Nucleotide Polymorphisms (SNPs) in M. tuberculosis strains H37Rv, CDC1551 and M. bovis BCG

A total of 1829 SNPs have been identified in the three genomes. Of these, 1825 SNPs are identical in H37Rv and CDC1551, with a different nucleotide in BCG. 1579 of these are in ORFs while the rest (246) are in non-coding regions. The SNPs in the ORFs are categorized into synonymous and non-synonymous SNPs. The latter are further categorized on the basis of the resulting change in the primary structure of the protein: conservative for no change and non-conservative for a changed primary structure of the encoded protein.

Sheet 3 of 31

Fig 3: Identification of indels in M. tuberculosis strains H37Rv, CDC1551 and M. bovis BCG

A total of 794 indels have been identified in the three genomes. Of these, 237 are present in both H37Rv and CDC1551 with respect to BCG: 178 are in ORFs and 59 are outside the ORFs.

Sheet 4 of 31

Fig 4: Identification of long polymorphisms in M. tuberculosis strains H37Rv, CDC1551 and M. bovis BCG

136 polymorphisms are present in the three genomes, 30 of them being identical in CDC1551 and H37Rv. 22 of these polymorphisms are present in the ORFs while 8 are outside the ORFs.

Sheet 5 of 31

Figure 5: Display showing a region of 10 kb of the BCG genome with three types of annotations: BCG ORFs, SNPs in H37Rv, and SNPs in CDC1551

The details of the different color codes used are as follows (the colored symbols themselves are not recoverable from the extracted text):
Synonymous
Non-synonymous or truncated protein
Indel or long polymorphisms
Alignment of H37Rv and CDC1551
Polymorph in the coding region of only BCG
Polymorph in the non-coding region of both BCG and query

[Figure 5 graphic: genome browser tracks for SNPs and indels in H37Rv, alignment with H37Rv, annotated genes, alignment with CDC1551, SNPs and indels in CDC1551, and coding regions; coordinates and gene labels are not legible in the extracted text.]
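The SNP categories described for Figure 2 (non-coding, synonymous, non-synonymous conservative, non-synonymous non-conservative) can be illustrated with a minimal Python sketch. This is not the applicants' implementation: the function names, the use of the standard codon table, and the four-way amino-acid grouping used to decide "conservative" are all assumptions made for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: the
# standard codon assignments apply to the ORFs being compared).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical physico-chemical grouping used to call a substitution
# "conservative"; the source does not specify its own criterion in detail.
GROUPS = ["AVLIMFWPG", "STCYNQ", "DE", "KRH"]


def group_of(aa):
    """Return the index of the group containing aa, or None for a stop codon."""
    for index, members in enumerate(GROUPS):
        if aa in members:
            return index
    return None


def classify_snp(ref_codon, offset, query_base, in_orf=True):
    """Classify a single-base change at position `offset` (0-2) of ref_codon."""
    if not in_orf:
        return "non-coding"
    query_codon = ref_codon[:offset] + query_base + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    query_aa = CODON_TABLE[query_codon]
    if ref_aa == query_aa:
        return "synonymous"
    ref_group = group_of(ref_aa)
    if ref_group is not None and ref_group == group_of(query_aa):
        return "non-synonymous, conservative"
    # Includes changes to or from a stop codon (truncated protein).
    return "non-synonymous, non-conservative"


print(classify_snp("CTG", 2, "A"))  # CTG (Leu) to CTA (Leu)
print(classify_snp("GAT", 2, "G"))  # GAT (Asp) to GAG (Glu)
print(classify_snp("GAA", 0, "A"))  # GAA (Glu) to AAA (Lys)
```

The counts quoted for Figure 2 are internally consistent under this partition: 1579 SNPs in ORFs plus 246 in non-coding regions give the 1825 SNPs shared by H37Rv and CDC1551.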
Sheet 6 of 31

Figure 6: Comparative genomics browser displaying BCG in the upper panel and H37Rv in the bottom panel.

The segments labeled MUM-* are the perfect matches generated by the MUMmer tool, and the vertical lines show the alignment of the MUM segments in both genomes. The color coding of the ORFs indicates the length of each ORF. This helps researchers because, if an ORF in H37Rv aligns with an ORF in BCG but the two have different colors, then a mutation has given them different lengths (see, for example, the genes in the MUM-1280 region).

[Figure 6 graphic: browser panels with gene, indel and SNP tracks for BCG and H37Rv; coordinates and gene labels are not legible in the extracted text.]

Sheet 7 of 31

Fig 7.1: PRIMER designed to amplify the polymorphisms (Polymorphism ID: 593)

[Table: the legible column headings are FORWARD PRIMER, REVERSE PRIMER and ORF; the row values are only partly legible in the extracted text (1375.153, 1373397, YES; 1373907, 1373629..1373649, 1374,080.1374O1, YES).]

Upper Primer: 24-mer 5' GCCGACGCTGCTTGGATGATGAT
Lower Primer: 5' CASCGTTGCCCCGTTGGTAT 3'
DNA 250 pM, Salt 50 mM
Product Tm (%GC Method): 85.3 °C
Other reported parameters, whose values are not legible in the extracted text: Primer Overall Stability, Primer Location, Primers Tm Difference, Optimal Annealing Temperature, Product Length, Product GC Content

[Fig 7.1 graphic: genome browser tracks for SNPs and indels in H37Rv, alignment with H37Rv, annotated genes, alignment with CDC1551, and SNPs and indels in CDC1551; coordinates are not legible in the extracted text.]

Sheet 8 of 31

Fig 7.2: PRIMER designed to amplify the polymorphism (Polymorphism ID: 639)

STRAIN   SNP       START     END       FORWARD PRIMER      REVERSE PRIMER
BCG      1476918   1476917   1477153   1476701..1476719    1477154..1477177

Two further rows, whose strain labels are not legible in the extracted text, read: 1478424, 1477169, 1478659, 1478207..1478225, 1478660..1478683; and 147888., 1477626, 147916, 1478664..1478682.

Upper Primer: 19-mer 5' CGGCAGGTTCGTGGTCTCG
Lower Primer: 24-mer 5' GTGGGCGGGTGTAATGTTGAAGG
DNA 250 pM, Salt 50 mM
Product Length: 477 bp
Product Tm (%GC Method): 88 °C
Other reported parameters, whose values are not legible in the extracted text: Primer Overall Stability, Primer Location, Primers Tm Difference, Optimal Annealing Temperature, Product GC Content

[Fig 7.2 graphic: genome browser tracks for SNPs and indels in H37Rv, alignment with H37Rv, annotated genes (including alkA), alignment with CDC1551, and SNPs in CDC1551; coordinates are not legible in the extracted text.]
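The primer coordinates and the reported product length in Fig 7.2 can be cross-checked with a short Python sketch. It assumes that the BCG row reads 1476701..1476719 for the forward primer and 1477154..1477177 for the reverse primer (a reading of the garbled extraction), and that coordinates are 1-based and inclusive. The helper names `span_length` and `gc_fraction` are hypothetical and are not taken from the source or from any primer-design tool.

```python
def span_length(start, end):
    """Length of a 1-based, inclusive coordinate interval."""
    return end - start + 1


def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


# Upper primer as printed in Fig 7.2 (19-mer).
upper_primer = "CGGCAGGTTCGTGGTCTCG"

# Assumed reading of the BCG row coordinates.
forward = (1476701, 1476719)
reverse = (1477154, 1477177)

print(len(upper_primer), span_length(*forward))  # 19 19
print(span_length(*reverse))                     # 24
print(span_length(forward[0], reverse[1]))       # 477
print(round(gc_fraction(upper_primer), 3))
```

Under these assumptions the forward interval matches the 19-mer upper primer, the reverse interval matches the stated 24-mer lower primer, and the span from the first forward base to the last reverse base equals the reported product length of 477 bp.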