Sequencing Genomes

Sequencing Genomes

Sequencing Genomes Propojený obrázek nelze zobrazit. Příslušný soubor byl pravděpodobně přesunut, přejmenován nebo odstraněn. Ověřte, zda propojení odkazuje na správný soubor a umístění. Human Genome Project: History, results and impact MUDr. Jan Pláteník, PhD. (December 2020) Beginnings of sequencing • 1965: Sequence of a yeast tRNA (80 bp) determined • 1977: Sanger’s and Maxam & Gilbert’s techniques invented • 1981: Sequence of human mitochondrial DNA (16.5 kbp) • 1983: Sequence of bacteriophage T7 (40 kbp) • 1984: Epstein & Barr‘s Virus (170 kbp) 1 Homo sapiens • 1985-1990: Discussion on human genome sequencing • “dangerous” - “meaningless” - “impossible to do” • 1988-1990: Foundation of HUMAN GENOME PROJECT • International collaboration: HUGO (Human Genome Organisation) • Aims: – genetic map of human genome – physical map: marker every 100 kbp – sequencing of model organisms (E. coli, S. cerevisiae, C. elegans, Drosophila, mouse) – find all human genes (estim. 60-80 tisíc) – sequence all human genome (estim. 4000 Mbp) by 2005 Other genomes • July 1995 : Haemophilus influenzae (1.8 Mbp) ... First genome of independent organism • October 1996: Saccharomyces cerevisiae (12 Mbp) ... First Eukaryota • December 1998: Caenorhabditis elegans (100 Mbp) ... First Metazoa 2 May 1998: • Craig Venter launches private biotechnology company CELERA GENOMICS, Inc. and announces intention to sequence whole human genome in just 3 years and 300 mil. USD using the whole- genome shotgun approach. • The publicly funded HGP in that time: sequenced cca 4 % of the genome March 2000: • Celera Genomics & academic collaborators publish draft genome of Drosophila melanogaster (cca 2/3 from 180 Mbp) • ... whole-genome shotgun is feasible for large genomes as well • ... ... Human genome: competition between Human Genome Project and Celera Genomics 3 International Human Genome Sequencing Consortium (Human Genome Project, HGP) • Open to co-operation from any country • 20 laboratories from USA, Great Britain, Japan, France, Germany and China • About 2800 workers, main coordinator: Francis Collins, NIH • Publicly funded (about 3 billion USD) • Approach: clone-by-clone • Results: duty to upload on internet within 24 hours (the Bermuda rule). Clone-by-clone genomic DNA fragments cca 150,000 bp cloning in BAC ( bacterial artificial chromosome ) clones positioned in the genome using physical maps (STS - sequence tagged site , fingerprint - cleavage by restrictases) digestion of every clone to short fragments cca 500 bp sequencing assembly of each clone sequence with computer 4 Celera Genomics, Inc. • Private biotechnology company, based in Rockville, Maryland, USA. President Craig Venter. • Investments into automation and computer processing, few dozens of employees • Approach: whole-genome shotgun + utilised publicly shared data from HGP. • Results: raw data temporarily available at company www site, but all other updates and annotations for commercial purpose. Whole-genome shotgun genomic DNA fragments 2, 10, 50 kbp cloned in plasmids E.coli sequencing sequence assembly using sophisticated computer algorithms 5 February 2001: • International Human Genome Sequencing Consortium publishes draft of human genome in Nature (Feb. 15 th 2001) • Draft: 90 % euchromatin (2.95 Gbp, whole genome 3.2 Gbp). 25 % definitive. • Celera Genomics, Inc. publishes human genome sequence in Science (Feb. 16 th 2001) • Sequence of euchromatin (2.91 Gbp) Advance in sequencing 1985: 500 bp /lab and day – still the Sanger dideoxynucleotide technique, but – capillary electrophoresis instead of gel – fluorescence markers instead radioactivity – full automatisation & robotisation – computer power 2000: 175,000 bp /day (Celera) 1000 bp/sec. (HGP) 6 Sequencing continues... • Human genome now: Definitive version announced 14/4/2003 …50 years since DNA double helix. The reference sequence still being updated. • Fugu rubripes: draft of genome in August 2002 • Mouse: • Celera Genomics: draft in June 2001 • Mouse Genome Sequencing Consortium: Nature, December 2002 • Laboratory rat: draft in March 2004 • Chimpanzee: September 2005 • … and many other genomes: malaria (the cause Plasmodium falciparum and carrier Anopheles gambiae), zebrafish, rice, dog, cattle, sheep, pig, chicken, honeybee, mammoth etc. Research in “postgenomic” age • New approaches to study genes & proteins: • GENOMICS ... analysis of whole genome and its expression • PROTEOMICS ... analysis of whole proteome, i.e. all proteins in given tissue or organism • BIOINFORMATICS ... processing, analysis and interpretation of large data sets (NA or protein sequences, gene arrays, 3D protein structures etc. Experiments in silico • Rapid development of new technologies: • e.g. DNA Microarray - expression of thousands of genes can be studied simultaneously 7 DNA Microarray (“ DNA chip”) Single Nucleotide Polymorphism (SNP) A G A G T T C T G C T C G A G G G T T C T G C G CG Occurs on average in one base per 1000 bp, i.e. in 0.1 % of human genome About 10 millions of SNPs with occurrence > 1% Coding/non-coding Protein structure changed/unchanged 8 International HapMap Project • Further international collaboration 2002-2009 • Genotyping and sequencing of DNA from 270 people from four different populations (USA, Nigeria, Japan, China) • Aims at finding • all important human SNPs (about 10,000,000) • their stable combinations (haplotypes) • Tag SNP for each haplotype • Data publicly available for further exploration Human genetic variation • Two unrelated humans have 99.5% of genome identical • Single Nucleotide Polymorphisms: 0.1% • Copy number variation (insertions, deletions, duplications): 0.4% • Variable number tandem repeats (…DNA fingerprinting in forensics) • Epigenetics (methylation) 9 Second-generation sequencers: E.g. Illumina Co., XII/2008: • One run (3 days) of Genome Analyzer made by Illumina Inc. = 60 years of work of ABI 3730xl (used by Celera Genomics) • Cost of one human genome sequencing: 40-50,000 $ • First individual human genomes sequenced: • 2007: Craig Venter, James Watson – both genomes published in the internet … and third-generation sequencers Graph: Nature 458, 719-724 (2009). Obtained from http://genome.wellcome.ac.uk 10 Next-Generation Sequencing (NGS) • 454 pyrosequencing • Sequencing by synthesis (Illumina) • SOLiD sequencing by ligation • Ion Torrent semiconductor sequencing • DNA nanoball sequencing • Heliscope single molecule sequencing • Single molecule real time (SMRT) sequencing • Nanopore DNA sequencing Archon X Prize for Genomics $ 10,000,000 Announced in 2006. For the first team that succeeds in sequencing of 100 individual human genomes within 30 days in certain requested quality and cost below $1,000 per one genome. 11 Archon X Prize for Genomics $ 10,000,000 Announced in 2006. For the first team that succeeds in sequencing of 100 individual human genomes within 30 days in certain requested quality and cost below $1,000 per one genome. (Cost to sequence a human genome according to NHGRI, Wikimedia Commons) 12 Next-Generation Sequencing (NGS) Current technology, e.g Illumina HiSeq 3000 or HiSeq 4000: Sequencing by synthesis (SBS) Up to 400 Gb/day …12 human genomes or 96 exomes in 3.5 days www.illumina.com https://www.youtube.com/watch?v=womKfikWlxM • Sequencing all cca 1.5 million of known eukaryotic species on Earth (… all plants, animals, protozoa and fungi, so far <0.2 % sequenced) • Launched 1/11/2018, expected to take 10 years, 4.7 billion USD. 13 Human Genome Project: Results The Human Genome Fig. from Bolzer et al. 2005, PLoS Biol. 3(5): e157 DOI: 10.1371/journal.pbio.0030157 Haploid genome: 3 billion base pairs divided to 23 chromosomes • 1 meter of DNA if extended • 750 Mb (1 CD) • 2 million standard printed pages (50 letters/line, 30 lines/page) 14 DNA in cell nucleus Nucleus of typical human cell has diameter 5-8 μm and contains 2 m of DNA Comparable to a tennis ball into which 20 km of thin thread has been neatly packed. Classification of eukaryotic genomic DNA: • Degree of condensation: • Euchromatin • Heterochromatin (cca 10%, sequencing difficult) • Repetitivity: • Highly repetitive • Moderately repetitive • Non-repetitive (single-copy) • Function: • Structural (centromeres, telomeres) • Coding protein • Transcribed to noncoding RNA (introns, rRNA, tRNA, miRNA etc.) • Transposons • Regulatory sequences • Junk…? 15 Experiments with denaturation & reassociation of DNA: Rapid reassociation (10-15%): - highly repetitive DNA Intermediate reassociation (25- 40%): - moderately r epetitive DNA Slow reassociation (50- 60%): - non-repetitive (single copy) DNA Fig: Lodish, H. et al.: Molecular Cell Biology (3rd ed.), W.H.Freeman, New York 1995. Classification of eukaryotic genomic DNA: • Highly repetitive (simple-sequence DNA): • All heterochromatin (centromeres, telomeres, 8% of genome, yet unsequenced) • Minisatellites (3% of euchromatin) • Moderately repetitive: • Tandemly repeated genes for rRNA, tRNA and histones (more identical copies to achieve high transcription efficiency, e.g. rRNA genes in eukaryotes >100 copies) • Transpozons • Non-repetitive: • Protein genes • Genes for noncoding RNA • Regulatory sequences 16 Eukaryotic GENE Fig: Murray, R.K. et al.: Harperova biochemie, Appleton & Lange 1993, in Czech H&H 2002. Genes are not placed evenly in genome • Big differences among chromosomes: • chromosome 1: 2,057 coding genes • chromosome Y: 65 coding genes • Regions rich in genes (“cities”) - more C and G • Regions poor in genes (“deserts”) - more A and T, up to 3 Mb! • CpG islands -

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    32 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us