Introduction to Bioinformatics
21‐Mar‐15

Introduction to Bioinformatics
Bas E. Dutilh
Systems Biology: Bioinformatic Data Analysis
Utrecht University, March 19th 2015

Info and documentation
• http://theory.bio.uu.nl/BDA/2015
• http://www.google.com
  – … but only for guidance and hints: never take the internet for granted
• Campbell Biology, 9th or 10th edition, Pearson
• Reader
  – Printed in black and white
  – Download full color PDF at: http://theory.bio.uu.nl/BDA/2015/BioInf2015.pdf
  – Errata: http://theory.bio.uu.nl/BDA/2015/errata.html

Evaluation
• Final mark course
  – 2/3 mark of Mathematics/Theoretical Biology
  – 1/3 mark of Bioinformatic Data Analysis
• Bioinformatics: mark of written exam only
  – NOTE: this is different from the info in the study guide (studiegids)!
  – Date: April 9th 2015 at 17:00‐20:00 in Educatorium Gamma
• Bonus point
  – NOTE: this is different from the info in the study guide (studiegids)!
  – Make all practicals and have them signed by the assistant
    • In case of emergencies you can be late by one class maximum
  – Hand in your mini‐article on time (deadline: April 7th 2015) through http://theory.bio.uu.nl/sb/rooster.html
  – The bonus point will only be added to the mark of the written exam if this mark is >4 before addition
  – The maximum mark is a 10

How would you figure out the function of a protein?
• Activity assay
• X‐ray structure
• Knock‐out mouse
• BLAST search
How about for all proteins in a genome?

Genome sizes
• Chaos chaos (1.4 Tb, Friz 1968)
• Tb: Tera base pairs (10^12)
• Gb: Giga base pairs (10^9)
• Mb: Mega base pairs (10^6)
• kb: Kilo base pairs (10^3)

Gene density and non‐coding DNA
• Mammals (including humans) have the lowest gene density
  – Number of genes in a given length of DNA
• Introns within genes
• Noncoding DNA between genes

Components of the human genome
• 20,000 – 25,000 protein‐coding genes (1.5%)
• Introns (25.9%)
• Transposable elements (44.7%)
  – DNA transposons
  – Long terminal repeat (LTR) retrotransposons
  – Short interspersed nuclear elements (SINEs)
  – Long interspersed nuclear elements (LINEs)
  – Endogenous retroviruses
  – Miniature inverted repeat transposable elements (MITEs)

Largest genomes
• Largest sequenced genome: Loblolly pine (Pinus taeda), 20,000,000,000 bp (20 Gb)
• Kinugasasō (Paris japonica), 149,000,000,000 bp (149 Gb)

Smallest genomes
• Eukaryota
  – Free: Ostreococcus tauri (12.6 Mb)
  – Endosymb: Encephalitozoon intestinalis (2.3 Mb)
• Bacteria and Archaea
  – Free: Mycoplasma genitalium (580 kb)
  – Endosymb: Cand. Carsonella ruddii (160 kb)
• Viruses
  – Circoviridae (1.8 kb – only two proteins!)
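
Worked example (not part of the original slides): the unit prefixes from the Genome sizes slide and the percentages from the Components of the human genome slide can be turned into rough base-pair counts. The sketch below assumes a human genome of 3 Gb, as stated elsewhere in these slides. The function names format_bp and component_sizes are hypothetical helpers chosen for illustration only.

```python
# Unit prefixes as defined on the "Genome sizes" slide, largest first.
UNITS = [("Tb", 10**12), ("Gb", 10**9), ("Mb", 10**6), ("kb", 10**3)]


def format_bp(n_bp):
    """Express a base-pair count in the largest unit that keeps the value >= 1."""
    for name, factor in UNITS:
        if n_bp >= factor:
            return f"{n_bp / factor:g} {name}"
    return f"{n_bp} bp"


# Assumption: 3 Gb human genome, fractions as listed on the slide.
HUMAN_GENOME_BP = 3_000_000_000
FRACTIONS = {
    "protein-coding": 0.015,
    "introns": 0.259,
    "transposable elements": 0.447,
}


def component_sizes(genome_bp, fractions):
    """Approximate number of base pairs covered by each genome component."""
    return {name: round(genome_bp * frac) for name, frac in fractions.items()}


if __name__ == "__main__":
    print(format_bp(149_000_000_000))  # Paris japonica
    print(format_bp(580_000))          # Mycoplasma genitalium
    for name, bp in component_sizes(HUMAN_GENOME_BP, FRACTIONS).items():
        print(f"{name}: ~{format_bp(bp)}")
```

Under these assumptions, protein-coding sequence covers only about 45 Mb of the human genome, while transposable elements cover more than 1.3 Gb.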

Genetic diversity
• Phylogenetic Tree of Life
[Figure: tree of life. Labels: Eukaryotes; Archaea; Bacteria; Prokaryotes]

Human genome
• 3,000,000,000 bp (3 Gb)
• Human Genome Project (HGP)
  – 1990‐2003
  – Draft genome sequence complete in 2000
• Reference genome
  – Source: blood (female) and sperm (male)
  – Samples taken from many donors, but only a few were used to protect donor identities
  – Sequence is not from one individual
    • >70% from one male donor
• Cost HGP: $ 3,000,000,000
  – Target: $ 1,000 genome

Genome sequencing
Whole Genome Shotgun (WGS) approach
[Figure labels: Cloned genomes; Segments known order; Fragment and sequence; Assemble sequences; Consensus genome]

Personal genome sequences
[Figure labels: Reference Genome; Craig Venter; James Watson; Your personal genome sequence; ~2,000,000 differences; ~5,000,000 differences; ~5,000,000 differences]

So we have a $200 personal genome…
• …now the million dollar question is:
What can I learn from my 3,000,000,000 A’s, C’s, G’s, and T’s?

Personalized medicine
• From reactive to proactive medicine
  – Identify high risk alleles
  – Adapt lifestyle (e.g. risk of high blood pressure)
  – Preventive screening or treatment (e.g. risk of cancer)
• Pharmacogenomics:
  – Impact of genetic variation on response to medication
• Sergey Brin (co‐founder, co‐investor): LRRK2 polymorphism on chromosome 12
  – 28% risk of Parkinson’s at age 59
  – 51% at age 69
  – 74% at age 79

Biology is Big Data science
[Figure: # genomes sequenced. Labels: DNA; RNA; Protein]
Moore's Law: computer power doubles every ~2 years.

Omics sciences
• The suffix ‐ome refers to a totality of some sort
• Gene (genetics) • Genome • Genomics
• Transcript (RNA) • Transcriptome • Transcriptomics
• Protein • Proteome • Proteomics
• Metabolite • Metabolome • Metabolomics
• Lipid • Lipidome • Lipidomics
• Microbe • Microbiome • Microbiomics (?!)
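
Worked example (not part of the original slides): the Whole Genome Shotgun steps named on the Genome sequencing slide (fragment and sequence, then assemble sequences) can be illustrated with a toy greedy overlap assembler. This is a minimal sketch under strong assumptions: reads are error-free, come from one strand, and the sequence contains no repeats longer than the minimum overlap. Real assemblers are far more elaborate, and no consensus step over multiple reads is performed here. The names overlap and greedy_assemble and the example sequence are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b, or 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest suffix/prefix overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Hypothetical 16 bp "genome" fragmented into three overlapping 8 bp reads.
    fragments = ["AGCTTGAC", "ATGCCGTA", "CGTAAGCT"]
    print(greedy_assemble(fragments))
```

The greedy choice of the longest overlap is a simple heuristic; with repeated sequence it can join the wrong reads, which is one reason repeats make genome assembly hard.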

Genomics
• Identify differences in gene content between genomes
• Analyze genome evolution
• Predict gene functions
Chordata ↔ Echinodermata

Metagenomics
• Discover new species: “Biological Dark Matter”
[Figure labels: Sample; Filter; Microbes or viruses]

Human microbiome and virome
• In your body:
  ~10^13 human cells
  ~10^14 bacteria
  ~10^15 viruses
Image: Lisa Brown for

Bioinformatics
• Bioinformatics: study of informatic processes in biotic systems
  Paulien Hogeweg and Ben Hesper (Utrecht University, 1970)
• Bioinformatic Data Analysis: using computational methods to analyze biological data

Bioinformatics in Utrecht today