Identify Audiences, Especially in the Fast-Moving Field of Bioinformatics

An Introduction to NCBI's Bioinformatics Resources
Dr. Medha Devare
Life Sciences/Bioinformatics Specialist
Albert R. Mann Library, Cornell University, Ithaca, NY 14853
USAIN 2006: Delivering Information for the New Life Sciences
October 7, 2006

Part I: Introduction to DNA Sequencing
Part II: Data Mining in Bioinformatics

CENTRAL DOGMA OF BIOLOGY
Courtesy: National Human Genome Research Institute

NUCLEOTIDES
Nucleotide = phosphate + pentose sugar + base
http://www.web-books.com/MoBio/Free/Ch3A.htm

PENTOSE SUGARS
http://www.web-books.com/MoBio/Free/Ch3A.htm

NITROGENOUS BASES
  • Purines: Adenine, Guanine
  • Pyrimidines: Cytosine, Thymine, Uracil (RNA only)
http://dl.clackamas.cc.or.us/ch106-09/nucleoti.htm

STRUCTURE OF DNA
Courtesy: National Human Genome Research Institute

DNA REPLICATION
http://www.ncc.gmu.edu/dna/repanim.htm

DNA SEQUENCING
http://www.dnalc.org/ddnalc/resources/cycseq.html

CLONING: PLASMID VECTOR
http://www.accessexcellence.org/RC/VL/GG/inserting.html

CLONING: identifying transformed cells
[Figure labels: DNA insert; AmpR; origin of replication]

VECTORS

Vector                Form                             Host     Carrying capacity  Major uses
Plasmid               Double-stranded circular DNA     E. coli  Up to 15 kb        cDNA libraries; subcloning
Bacteriophage lambda  Virus, linear DNA                E. coli  Up to 25 kb        Genomic and cDNA libraries
Cosmid                Double-stranded circular DNA     E. coli  30-45 kb           Genomic libraries
Bacteriophage P1      Virus, circular DNA              E. coli  70-90 kb           Genomic libraries
BAC                   Bacterial artificial chromosome  E. coli  100-500 kb         Genomic libraries
YAC                   Yeast artificial chromosome      Yeast    250-2000 kb        Genomic libraries

GENOME SEQUENCING
  • Genome sequencing: http://www.pbs.org/wgbh/nova/genome/sequencer.html#
  • Whole genome shotgun sequencing: http://smcg.cifn.unam.mx/enp-unam/03-EstructuraDelGenoma/animaciones/humanShot.swf

What is bioinformatics?
Research, development or application of computational tools and approaches to expand the use, acquisition, visualization, analysis, organization and archiving of biological, medical, behavioral or health data. [Bioinformatics at the NIH, 2001]
http://grants.nih.gov/grants/bistic/bistic.cfm

Important databases in the public domain
  • National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov)
  • European Bioinformatics Institute (http://www.ebi.ac.uk/)
  • European Molecular Biology Laboratory (http://www.embl.org)
  • DNA Data Bank of Japan (http://www.ddbj.nig.ac.jp/Welcome.html)
  • TIGR (http://www.tigr.org)

The National Center for Biotechnology Information (NCBI)
Bethesda
Created in 1988 (National Library of Medicine at NIH)
  - Establish public databases
  - Conduct research in computational biology
  - Develop software tools for sequence analysis
  - Disseminate biomedical information
NCBI FieldGuide

NCBI database types: Bibliographic
  • Citations for biomedical articles: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  • Free archive of life sci. journals: http://www.pubmedcentral.nih.gov/
  • Books that can be searched online: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books&itool=toolbar
  • Human genes/genetic disorders: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
From NCBI FieldGuide

NCBI database types
  - Sequence (nucleotide; protein)
  - Taxonomy
  - Genome
  - Gene
  - Expression
  - Structure
http://www.ncbi.nlm.nih.gov
NCBI FieldGuide

Types of Sequence Databases: Primary Databases
  - Contain raw and redundant data: original experimental sequences, submitted and "owned" by experimentalists
  - Database staff review and organize the data: don't add, modify or update the records
  - Examples: GenBank, SNP, GEO
NCBI FieldGuide

Types of Sequence Databases: Derivative Databases
  - Human-curated (data compilation and correction). Examples: LocusLink, OMIM & Literature databases
  - Computationally-derived (auto-partitioning GenBank seqs). Example: UniGene
  - Combination. Examples: RefSeq, Genome Assembly
NCBI FieldGuide

1° Sequence Database: GenBank
  • Nucleotide-only sequence database
  • Archival (>292,000 organisms)
Submission of GenBank Data to NCBI:
  - Direct submissions of individual records via Web (BankIt, Sequin)
  - Batch submissions of bulk sequences via e-mail (EST, dbGSS, dbSTS)
  - FTP accounts for sequencing centers
NCBI FieldGuide

The International Sequence Database Collaboration
[Diagram: GenBank (NCBI, NIH; Entrez), EMBL (EBI; SRS) and DDBJ (CIB, NIG; getentry), each receiving submissions and updates]
NCBI FieldGuide

Check for cross-functionality of accession numbers
Accession no. AB062786
  • EBI: http://www.ebi.ac.uk
  • DDBJ: http://www.ddbj.nig.ac.jp/

Organization of GenBank: GenBank Divisions (gbdiv)
Records are divided into 18 divisions: 1 Patent, 5 High Throughput, 12 Traditional.

High Throughput (bulk) divisions:
  EST  Expressed Sequence Tag
  GSS  Genome Survey Sequence
  HTG  High Throughput Genomic
  STS  Sequence Tagged Site
  HTC  High Throughput cDNA

Traditional divisions:
  PRI  Primate
  PLN  Plant and Fungal
  BCT  Bacterial and Archaeal
  INV  Invertebrate
  ROD  Rodent
  VRL  Viral
  VRT  Other Vertebrate
  MAM  Mammalian (ex. ROD and PRI)
  PHG  Phage
  SYN  Synthetic (cloning vectors)
  UNA  Unannotated
  ENV  Environmental

Traditional Divisions:
  • Direct Submissions (Sequin and BankIt)
  • Accurate
  • Well characterized
Bulk Divisions:
  • Batch Submission (Email and FTP)
  • Inaccurate
  • Poorly characterized
NCBI FieldGuide

[Annotated GenBank record. Labels: Length; mRNA = cDNA, DNA = genomic; Division; Accession Number; Accession.Version; NCBI's Taxonomy; Feature Table; GenPept Protein ID]

Database searching: http://www.ncbi.nlm.nih.gov/

e.g. pharmacogenetics
  • Identifying novel targets for new drugs
    - mapping and identifying genes associated w/ disease
    - characterizing protein targets for new drugs
  • Identifying genetic variants associated w/ adverse drug reactions
    - e.g., cytochrome P450s = multigene family of enzymes (liver)
    - genetically variable expression = variation in drug efficacy
Adapted from: Wolf et al., British Medical J., 320: 987-990

Potential consequences of polymorphic drug metabolism
  • Extended pharmacological effect
  • Adverse drug reactions
  • Lack of pro-drug activation (e.g., codeine)
  • Drug toxicity
  • Increased effective dose
  • Metabolism by alternate, deleterious pathways
  • Exacerbated drug-drug interactions
Adapted from: Wolf et al., British Medical J., 320: 987-990

Common pharmacogenetic polymorphisms in human drug metabolizing enzymes
(Weber, W.W. Pharmacogenetics. Oxford, 1997)

Gene    Metaboliser phenotype  Frequency                      # of drugs  Examples
CYP2D6  Poor                   White 6%, African American 2%  >100        codeine, dextromethorphan
CYP2D6  Ultra-rapid            Ethiopian 20%, Spanish 7%
CYP2C9  Reduced                                               >60         ibuprofen, warfarin
TPMT    Poor                   low in all populations         <10         6-mercaptopurine, 6-thioguanine

Example: Cytochrome P450 gene, CYP2D6
  • CYP2D6 is highly polymorphic (inactive in ~6% of Caucasians)
    - codes for debrisoquine hydroxylase
Adapted from: Wolf et al., British Medical J., 320: 987-990
http://www.ncbi.nlm.nih.gov/

Sequence/structure searching tools
[Diagram: tools arranged along an axis running from sequence to structure]
  - Simple sequence search (BLAST)
  - Profile-sequence search (HMMER)
  - Structure-sequence search (threading)
  - Homology modeling (MODELLER)
  - Structure-structure search (CE)
Slide courtesy of Pillardy, Ripoll, and Sun (CBSU)

Tool comparison

                        BLAST            HMM       Threading
Sensitivity:            Least sensitive            Most sensitive
Speed:                  Seconds          Minutes   Hours
DB size:                1 x 10^6         1 x 10^6  18000 (PDB)
Result interpretation:  Relatively easy            Some expertise required

Slide courtesy of Pillardy, Ripoll, and Sun (CBSU)

Sequence similarity searching: Why do it?
  • identify and annotate sequences with no, incomplete, or incorrect annotations (GenBank)
  • infer functionality for genes/proteins
  • find conserved domains
  • assemble genomes; clean up sequences (e.g., suspected cloning vector sequences)
  • explore evolutionary relationships
NOTE: Similar sequences may
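
As a rough illustration of what "similar" means in the list above, the sketch below scores two pre-aligned, equal-length sequences by percent identity. It is a deliberate simplification and not how BLAST works: BLAST searches for local alignments and scores them statistically. The function name percent_identity and the example sequences are made up for illustration and do not come from the slides.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions holding the same residue in two pre-aligned sequences.

    Assumes both sequences are already aligned and of equal length;
    gaps, substitutions matrices and local alignment are not handled.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (pre-aligned)")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare position by position, ignoring letter case.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    query = "ATGGCCATTG"  # hypothetical query sequence
    candidates = {
        "hit_1": "ATGGCTATTG",  # hypothetical database entries
        "hit_2": "TTGACCGTAG",
    }
    # Rank candidates from most to least similar to the query.
    ranked = sorted(
        candidates.items(),
        key=lambda item: percent_identity(query, item[1]),
        reverse=True,
    )
    for name, seq in ranked:
        print(f"{name}: {percent_identity(query, seq):.1f}% identity")
```

Real similarity searches must also handle sequences of different lengths, insertions and deletions, which is why tools such as BLAST are used in practice.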
