21‐Mar‐15
Introduction to Bioinformatics
Bas E. Dutilh
Systems Biology: Bioinformatic Data Analysis
Utrecht University, March 19th 2015

Info and documentation
• http://theory.bio.uu.nl/BDA/2015
• http://www.google.com
  – … but only for guidance and hints: never take the internet for granted
• Campbell Biology, 9th or 10th edition, Pearson
• Reader
  – Printed in black and white
  – Download full color PDF at: http://theory.bio.uu.nl/BDA/2015/BioInf2015.pdf
  – Errata: http://theory.bio.uu.nl/BDA/2015/errata.html

Evaluation
• Final mark course
  – 2/3 mark of Mathematics/Theoretical Biology
  – 1/3 mark of Bioinformatic Data Analysis
• Bioinformatics: mark of written exam only
  – NOTE: this is different from the info in the study guide (studiegids)!
  – Date: April 9th 2015 at 17:00-20:00 in Educatorium Gamma
• Bonus point
  – NOTE: this is different from the info in the study guide (studiegids)!
  – Make all practicals and have them signed by the assistant
    • In case of emergencies you can be late by one class maximum
  – Hand in your mini-article on time (deadline: April 7th 2015) through http://theory.bio.uu.nl/sb/rooster.html
  – The bonus point will only be added to the mark of the written exam if this mark is >4 before addition
  – The maximum mark is a 10

How would you figure out the function of a protein?
• Activity assay
• X-ray structure
• Knock-out mouse
• BLAST search
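
The marking rules on the Evaluation slide can be written out as a small calculation. This is only a sketch: the function names are hypothetical, and the size of the bonus (one full mark point) is an assumption based on the phrase "bonus point".

```python
def exam_mark_with_bonus(written_exam, earned_bonus, bonus=1.0):
    """Apply the bonus rule to the written exam mark.

    Assumption: the bonus is worth 1.0 mark point (the slide only says "bonus point").
    The bonus is added only if the exam mark is above 4, and the result is capped at 10.
    """
    if earned_bonus and written_exam > 4:
        written_exam += bonus
    return min(written_exam, 10.0)


def final_course_mark(math_theory_mark, bioinformatics_mark):
    """Weighted course mark: 2/3 Mathematics/Theoretical Biology, 1/3 Bioinformatics."""
    return (2 * math_theory_mark + bioinformatics_mark) / 3


bio = exam_mark_with_bonus(written_exam=6.0, earned_bonus=True)
print(bio, final_course_mark(math_theory_mark=6.0, bioinformatics_mark=bio))
```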

How about for all proteins in a genome?

Genome sizes
Chaos chaos (1.4 Tb, Friz 1968)
Tb: Tera base pairs (10^12)
Gb: Giga base pairs (10^9)
Mb: Mega base pairs (10^6)
Kb: Kilo base pairs (10^3)
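
To get a feel for these units, the sketch below converts a raw base-pair count into the largest fitting unit. The helper name `format_bp` is hypothetical; the unit factors are the ones listed above.

```python
# Unit factors from the slide: Kb = 10^3, Mb = 10^6, Gb = 10^9, Tb = 10^12 base pairs.
UNITS = [("Tb", 10**12), ("Gb", 10**9), ("Mb", 10**6), ("Kb", 10**3)]


def format_bp(n_bp):
    """Express a genome size in the largest unit for which the value is at least 1."""
    for name, factor in UNITS:
        if n_bp >= factor:
            return f"{n_bp / factor:g} {name}"
    return f"{n_bp} bp"


print(format_bp(3_000_000_000))  # a 3,000,000,000 bp genome
print(format_bp(580_000))        # a 580,000 bp genome
```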

Gene density and non-coding DNA
• Mammals (including humans) have the lowest gene density
  – Number of genes in a given length of DNA
• Introns within genes
• Noncoding DNA between genes

Components of the human genome
• 20,000 – 25,000 protein-coding genes (1.5%)
• Introns (25.9%)
• Transposable elements (44.7%)
  – DNA transposons
  – Long terminal repeat (LTR) retrotransposons
  – Short interspersed nuclear elements (SINEs)
  – Long interspersed nuclear elements (LINEs)
  – Endogenous retroviruses
  – Miniature inverted repeat transposable elements (MITEs)
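
The percentages become more tangible when converted into base pairs. The sketch below assumes the 3,000,000,000 bp human genome size quoted elsewhere in this lecture; the dictionary keys and the function name are illustrative only, and the remainder is simply whatever the slide does not itemise.

```python
GENOME_BP = 3_000_000_000  # human genome size used in this lecture (3 Gb)

# Fractions taken from the "Components of the human genome" slide.
FRACTIONS = {
    "protein-coding genes": 0.015,
    "introns": 0.259,
    "transposable elements": 0.447,
}


def component_sizes(genome_bp=GENOME_BP, fractions=FRACTIONS):
    """Return the approximate size in bp of each component plus the unlisted remainder."""
    sizes = {name: round(genome_bp * frac) for name, frac in fractions.items()}
    sizes["other (not itemised on the slide)"] = genome_bp - sum(sizes.values())
    return sizes


for name, bp in component_sizes().items():
    print(f"{name}: {bp / 1e6:.0f} Mb")
```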

Largest genomes
• Largest sequenced genome: Loblolly pine (Pinus taeda), 20,000,000,000 bp (20 Gb)
• Kinugasasō (Paris japonica), 149,000,000,000 bp (149 Gb)

Smallest genomes
• Eukaryota
  – Free-living: Ostreococcus tauri (12.6 Mb)
  – Endosymbiont: Encephalitozoon intestinalis (2.3 Mb)
• Bacteria and Archaea
  – Free-living: Mycoplasma genitalium (580 kb)
  – Endosymbiont: Cand. Carsonella ruddii (160 kb)
• Viruses
  – Circoviridae (1.8 kb, only two proteins!)

Genetic diversity
• Phylogenetic Tree of Life
Figure labels: Eukaryotes, Archaea, Bacteria, Prokaryotes

Human genome
• 3,000,000,000 bp (3 Gb)
• Human Genome Project (HGP)
  – 1990-2003
  – Draft genome sequence complete in 2000
• Reference genome
  – Source: blood (female) and sperm (male)
  – Samples taken from many donors, but only a few were used to protect donor identities
  – Sequence is not from one individual
    • >70% from one male donor
• Cost HGP: $3,000,000,000
  – Target: $1,000 genome
Genome sequencing Whole Genome Shotgun (WGS) approach
Cloned genomes
Segments known order
Fragment and sequence
Assemble sequences
Consensus genome
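
The "fragment and sequence, assemble sequences, consensus genome" steps can be illustrated with a toy assembler. This is a deliberately simplified sketch, not how production assemblers work: it uses greedy merging on exact suffix/prefix overlaps and ignores sequencing errors, reverse complements, and repeats. All names are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest exact overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: remaining reads stay separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping fragments of a made-up 14 bp "genome".
print(greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"]))
```

With real data, repeats longer than the reads make this greedy strategy unreliable, which is one reason approaches that first place large cloned segments in a known order were used.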

Personal genome sequences
Your personal genome sequence
~2,000,000 differences
Craig Venter, James Watson
~5,000,000 differences, ~5,000,000 differences
Reference Genome

So we have a $200 personal genome…
• …now the million dollar question is: What can I learn from my 3,000,000,000 A's, C's, G's, and T's?

Personalized medicine
Sergey Brin: co-founder, co-investor
LRRK2 polymorphism on chromosome 12
- 28% risk of Parkinson's at age 59
- 51% at age 69
- 74% at age 79
• From reactive to proactive medicine
  – Identify high risk alleles
  – Adapt lifestyle (e.g. risk of high blood pressure)
  – Preventive screening or treatment (e.g. risk of cancer)
• Pharmacogenomics:
  – Impact of genetic variation on response to medication

Omics sciences
• The suffix -ome refers to a totality of some sort
• Gene (genetics) • Genome • Genomics
• Transcript (RNA) • Transcriptome • Transcriptomics
• Protein • Proteome • Proteomics
• Metabolite • Metabolome • Metabolomics
• Lipid • Lipidome • Lipidomics
• Microbe • Microbiome • Microbiomics (?!)
Figure labels: DNA, RNA, Protein

Biology is Big Data science
Figure axis: # genomes sequenced
Moore's Law: computer power doubles every ~2 years.
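
As a rough illustration of what "doubles every ~2 years" implies, the sketch below computes the growth factor after a given number of years. The function name is hypothetical and the two-year doubling time is the approximate figure from the slide, not a precise constant.

```python
def moores_law_factor(years, doubling_time_years=2.0):
    """Growth factor after `years` if a quantity doubles every `doubling_time_years`."""
    return 2 ** (years / doubling_time_years)


for years in (2, 10, 20):
    print(f"after {years} years: x{moores_law_factor(years):g}")
```

Any data set that doubles faster than this, such as the number of sequenced genomes in some periods, outgrows the available computing power over time.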

Genomics
• Identify differences in gene content between genomes
• Analyze genome evolution
• Predict gene functions
Figure label: Chordata ↔ Echinodermata

Metagenomics
• Discover new species: “Biological Dark Matter”
Figure labels: Sample, Filter, Microbes or viruses

Human microbiome and virome
• In your body: ~10^13 human cells, ~10^14 bacteria, ~10^15 viruses
Image: Lisa Brown for

Bioinformatics
• Bioinformatics: study of informatic processes in biotic systems
  – Paulien Hogeweg and Ben Hesper (Utrecht University, 1970)
• Bioinformatic Data Analysis: using computational methods to analyze biological data
Bioinformatics in Utrecht today