(12) United States Patent (10) Patent No.: US 8,663,919 B2 Dong Et Al

Total Page:16

File Type:pdf, Size:1020Kb

(12) United States Patent (10) Patent No.: US 8,663,919 B2 Dong Et Al USOO8663919B2 (12) United States Patent (10) Patent No.: US 8,663,919 B2 Dong et al. (45) Date of Patent: Mar. 4, 2014 (54) CHROMOSOME CONFORMATION (56) References Cited ANALYSIS U.S. PATENT DOCUMENTS (75) Inventors: Shoulian Dong, Mountain View, CA 2005/OO32105 A1* 2/2005 Bair et al. ......................... 435/6 (US); Junko F. Stevens, Menlo Park, CA 2009/0061425 A1* 3/2009 Lo et al. ............................ 435/6 (US); Chunmei Liu, Palo Alto, CA 2010.0081141 A1 4/2010 Chen (US); Cora L. Woo, Santa Clara, CA (US); Luz Montesclaros, Pittsburg, CA FOREIGN PATENT DOCUMENTS (US) EP 1835.038 9, 2007 Assignee: Life Technologies Corporation, EP 2083090 T 2009 (73) WO 2009,147386 12/2009 Carlsbad, CA (US) WO 2011, 146056 11, 2011 (*) Notice: Subject to any disclaimer, the term of this OTHER PUBLICATIONS patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days. Berensmeier (2006) “Magnetic particles for the separation and purificationofnucleic acids' Appl Microbiol Biotechnol 73:495-504; Appl. No.: 13/475,089 published online Oct. 25, 2006.* (21) Hagege, Helene et al., “Quantitative Analysis of Chromosome Con Filed: May 18, 2012 formation Capture Assays (3C-qPCR)', Nature Protocols, vol. 2, No. (22) 7, 2007, 1722-1733. Partial International Search Report for International Application No. (65) Prior Publication Data PCT/US2012/038558 dated Sep. 24, 2012. US 2012/O3O2449 A1 Nov. 29, 2012 International Search Report and Written Opinion for International Appl. No. PCT/US2012/03855 mailed Jan. 15, 2013. Dekker, Job et al., "Capturing Chromosome Conformation'. Science, Related U.S. Application Data vol. 295, 2002, 1306-1311. (60) Provisional application No. 61/487.614, filed on May * cited by examiner 18, 2011, provisional application No. 61/579,444, filed on Dec. 22, 2011, provisional application No. Primary Examiner — Christopher M Babic 61/579,876, filed on Dec. 23, 2011. Assistant Examiner — Karen S Weiler (51) Int. C. (57) ABSTRACT CI2O I/68 (2006.01) Disclosed herein are compositions, methods and kits for ana (52) U.S. C. lyzing three-dimensional chromatin and/or chromosome USPC ........................................................... 435/6.1 conformation. Method are also disclosed for using the meth (58) Field of Classification Search ods disclosed herein for diagnosing diseases such as cancer. None See application file for complete search history. 7 Claims, 27 Drawing Sheets U.S. Patent Mar. 4, 2014 Sheet 1 of 27 US 8,663,919 B2 o U.S. Patent Mar. 4, 2014 Sheet 3 of 27 US 8,663,919 B2 U.S. Patent Mar. 4, 2014 Sheet 4 of 27 US 8,663,919 B2 U.S. Patent Mar. 4, 2014 Sheet 5 Of 27 US 8,663,919 B2 U.S. Patent US 8,663,919 B2 U.S. Patent US 8,663,919 B2 U.S. Patent US 8,663,919 B2 U.S. Patent Mar. 4, 2014 Sheet 10 of 27 US 8,663,919 B2 90 ??. Kessy - kessy boV U.S. Patent R US 8,663,919 B2 |18omed?u3AEuo??e6?TIIºO-ue?w?egIe?oL Ž{?-------------------------------------------------------------------------------------------------------------------------------------------|{}) #No.?://?q?,09&3 |-09, |-0, <?|º 3|-001gst|…No. -------------------------------------------------------------------------------------------------------------------------------------|-08OE º §g |-|-07?i ‘EË U.S. Patent Mar. 4, 2014 Sheet 12 of 27 US 8,663,919 B2 c were vs. Cer rir waa sia s C. : -- -- --~~ es ce s r . & res ii. {..) M KS is s s s . xeror wo 8. 8 CD d. {X --- c ws d s -- ex cN w- cc (c. x. (N. c. sa- ge a s is Á3u8 had uo 38.3) 8 A83 geze (N U.S. Patent Mar. 4, 2014 Sheet 13 Of 27 US 8,663,919 B2 029 r - - - - - - - - - - - - - - - - - --K L sex f s KC war : s * : c sea reas a x - c. x s &s U.S. Patent Mar. 4, 2014 Sheet 14 of 27 US 8,663,919 B2 $(94-3 U.S. Patent Mar. 4, 2014 Sheet 15 Of 27 US 8,663,919 B2 i. U.S. Patent Mar. 4, 2014 Sheet 16 of 27 US 8,663,919 B2 Amplification Piot Amplification Piot O3 + 1,000 E-3 di:. 1.000E-4 di. Detector. 8sw Fio: A R vs. joie Threshoic:50 F.G. 5A U.S. Patent Mar. 4, 2014 Sheet 17 Of 27 US 8,663,919 B2 3} 250 20 50 O 5 interra adaptor incorporated targets G. 53 U.S. Patent Mar. 4, 2014 Sheet 18 of 27 US 8,663,919 B2 - - - - F.G. 6A U.S. Patent Mar. 4, 2014 Sheet 19 Of 27 US 8,663,919 B2 80 - 80 - 45w 35 30 250 - 20 5. Sr. F.G. 6B U.S. Patent Mar. 4, 2014 Sheet 20 Of 27 US 8,663,919 B2 uô?søgog U.S. Patent Mar. 4, 2014 Sheet 21 of 27 US 8,663,919 B2 $3',d???šan N-O bo e 80 U.S. Patent Mar. 4, 2014 Sheet 22 of 27 US 8,663,919 B2 ...r.l...r...r.l...r...r.l...r....r. seeeeeeeee N-O & O eja Å U.S. Patent Mar. 4, 2014 Sheet 23 Of 27 US 8,663,919 B2 Aligned Met Curves 5 66 67 6869 to 7, 72 7374 is 76 77 is 7e 80 81628384 85 36 8788 emperature {C} A: 0% Methyl, Cr? at 23 B: 100% Methyl FG 9A U.S. Patent Mar. 4, 2014 Sheet 24 of 27 US 8,663,919 B2 PD Expression 23 Confo frn GA / MCF-7 Control not in GAPDH F.G. 93 U.S. Patent Mar. 4, 2014 Sheet 25 Of 27 US 8,663,919 B2 W six-sc on Oo - Co is a ri () c- Y - O - c < *; k (- - - - x---see % S85te 3 po U.S. Patent Mar. 4, 2014 Sheet 26 of 27 US 8,663,919 B2 r s s S.- S.- SNNNN, NY, i. s res NNNicx, NNNNNNNNN, CNN st s--- Sefuel O pio U.S. Patent Mar. 4, 2014 Sheet 27 Of 27 US 8,663,919 B2 BC2 2. 20.5 20.0 9.5 - - BC2 Expression. & AC-7 231st 5 9.0 8.5 8,0-- --------------------------------------- 17.5 23 Contro MCF-7 Control FG. 21A hsa-MiR-2 Expression 28,5- 28.35 - 28 AiR 21 Expression: 27.5 C7|3:5 27 26.5 25.5- 24.5 23 Contro Nor MCF Control Nor FG 21B US 8,663,919 B2 1. 2 CHROMOSOME CONFORMATION ization (FISH) analysis can examine multiple loci, but the ANALYSIS severe experimental conditions may adversely affect chromo Somal organization. CROSS-REFERENCE TO RELATED In an attempt to overcome the limitations of visual chro APPLICATIONS mosomal analysis, methods using the polymerase chain reac tion (PCR) have been developed. For example, the chromo This application claims priority to U.S. Provisional Appli Some conformation capture (3C) method analyzes overall cation No. 61/487,614, filed May 18, 2011, U.S. Provisional chromosomal spatial organization and physical properties at Application No. 61/579,444, filed Dec. 22, 2011 and U.S. a higher resolution (see, Dekker et al., Science 295:1306 Provisional Application No. 61/579,876, filed Dec. 23, 2011 10 1311 (2002)). In3C experiments, genomic DNA (gDNA) and proteins in the chromosomes are fixed in place by cross which disclosures are herein incorporated by reference in linking. The cross-linked gldNA is digested by a restriction their entirety. enzyme and ligated before being purified for analysis. Physi FIELD cal interactions between genomic loci are identified as spe 15 cific cross-ligated DNA elements using PCR amplification. As a result of 3C methods, spatial information is converted to This disclosure relates to methods, reagents and kits useful quantifiable DNA sequences. However, the wide adaptation for epigenetic analysis of DNA. of 3C methods has been hindered by the lack of quantitative process controls and cumbersome protocols. BACKGROUND Derivative methods of 3C, such as 4C and 5C, have also been developed. 4C and 5C differ from 3C only in the analysis Epigenetics is the study of heritable changes in gene func of the ligation product. In 4C, the ligation product is first tion that do not involve changes in DNA sequence. These amplified by PCR using two outwardly facing primers from changes can occur due to the chemical modification of spe the restriction site to create a circular DNA molecule which is cific genes or gene-associated proteins of an organism. These 25 then analyzed by microarray technology. In 5C, the ligation modifications can affect how genes are expressed and used in products are mixed with special primers designed to anneal at cells. For example, methylation of specific regions in gene the ends of the restriction fragment. Analysis is carried out sequences. Such as CpG sites, can make these genes less either by use of a microarray or by sequencing against a 3C transcriptionally active. Another example is post-transla library of ligation products. tional modification of histone proteins around which genomic 30 However, there are several disadvantages associated with DNA is wound. This histone modification can affect the these 3C-based methods. The assays are time consuming and unwinding of the DNA during transcription which, in turn, are not precise and sensitive enough for reactions with low can affect the expression of the transcribed genes. Further digestion or ligation efficiency. These methods also require more, chromosome conformation or chromatin compaction significant dilution of the DNA sample. As a result, large can also have an effect on gene expression, Such as affecting 35 quantities of the sample may be needed, which may not the accessibility of the template to polymerases or changing always be available.
Recommended publications
  • DNA Sequencing
    1 DNA Sequencing Description of Module Subject Name Paper Name Module Name/Title DNA Sequencing Dr. Vijaya Khader Dr. MC Varadaraj 2 DNA Sequencing 1. Objectives 1. DNA sequencing Introduction 2. Types of methods available 3. Next generation sequencing 2. Lay Out 3 DNA Sequencing Sequencing Maxam Gilbert Sanger's Next generation 3 Introduction 4 DNA Sequencing In the mid 1970s when molecular cloning techniques in general were rapidly improving, simple methods were also developed to determine the nucleotide sequence of DNA. These advances laid the foundation for the detailed analysis of the structure and function of the large number of genes. Other fields which utilize DNA sequencing include diagnostic and forensic biology. With the invention of DNA sequencing, research and discovery of new genes is increased many folds. Using this technique whole genomes of many animal (human), plant, and microbial genomes have been sequenced. The first attempts to sequence DNA mirrored techniques developed in 1960s to sequence RNA. These involved : a) Specific cleavage to the DNA in to smaller fragments by enzymatic digestion or chemical digestion b) Nearest neighbour analysis c) Wandering spot method. Indeed in some studies the DNA was transcribed in to RNA with E.coli RNA polymerase and then sequenced as RNA So there were several questions unanswered; How many base pairs (bp) are there in a human genome?( ~3 billion (haploid)) How much did it cost to sequence the first human genome? ~$2.7 billion How long did it take to sequence the first human genome?( ~13 years) When was the first human genome sequence complete? (2000-2003) Goal figuring the order of nucleotides across a genome Problem Current DNA sequencing methods can handle only short stretches of DNA at once (<1-2Kbp) Solution 5 DNA Sequencing Sequence and then use computers to assemble the small pieces 3.1 Maxam-Gilbert sequencing Also known as chemical sequencing, here there is use radioactive labeling at one 5' end of the DNA fragment which is to be sequenced.
    [Show full text]
  • DNA Sequencing - I
    SUBJECT FORENSIC SCIENCE Paper No. and Title PAPER No.13: DNA Forensics Module No. and Title MODULE No.21: DNA Sequencing - I Module Tag FSC_P13_M21 FORENSIC SCIENCE PAPER No.13: DNA Forensics MODULE No.21: DNA Sequencing - I TABLE OF CONTENTS 1. Learning Outcomes 2. Introduction 3. First Generation Sequencing Methods 4. Second Generation Sequencing Techniques 5. Third Generation Sequencing – emerging technologies 6. DNA Sequence analysis 7. Summary FORENSIC SCIENCE PAPER No.13: DNA Forensics MODULE No.21: DNA Sequencing - I 1. Learning Outcomes After studying this module, reader shall be able to understand - DNA Sequencing Methods of DNA Sequencing DNA sequence analysis 2. Introduction In 1953, James Watson and Francis Crick discovered double-helix model of DNA, based on crystallized X-ray structures studied by Rosalind Franklin. As per this model, DNA comprises of two strands of nucleotides coiled around each other, allied by hydrogen bonds and moving in opposite directions. Each strand is composed of four complementary nucleotides – adenine (A), cytosine (C), guanine (G) and thymine (T) – with A always paired with T and C always paired with G with 2 & 3 hydrogen bonds respectively. The sequence of the bases (A,T,G,C) along DNA contains the complete set of instructions that make up the genetic inheritance. Defining the arrangement of these nucleotide bases in DNA strand is a primary step in assessing regulatory sequences, coding and non-coding regions. The term DNA sequencing denotes to methods for identifying the sequence of these nucleotides bases in a molecule of DNA. The basis for sequencing proteins was initially placed by the effort of Fred Sanger who by 1955 had accomplished the arrangement of all the amino acids in insulin, a small protein produced by the pancreas.
    [Show full text]
  • DNA Sequencing  Genetically Inherited Diseases
    DNA, MUTATION DETECTION TYPES OF DNA There are two major types of DNA: Genomic DNA and Mitochondrial DNA Genomic DNA / Nuclear DNA: Comprises the genome of an organism and lead to an expression of genetic traits. Controls expression of the various traits in an organism. Sequenced as part of the Human Genome Project to study the various functions of the different regions of the genome Usually, during DNA replication there is a recombination of genes bringing about a change in sequence leading to individual specific characteristics. This way the difference in sequence could be studied from individual to individual. MITOCHONDRIAL DNA(MT DNA) mtDNA is a double stranded circular molecule. mtDNA is always Maternally inherited. Each Mitochodrion contains about 2-10 mtDNA molecules. mtDNA does not change from parent to offspring (Without recombination) Containing little repetitive DNA, and codes for 37 genes, which include two types of ribosomal RNA, 22 transfer RNAs and 13 protein subunits for some enzymes HUMAN STRUCTURAL GENE 1. Helix–turn–helix 2. Zinc finger 3. Leucine zipper 4. Helix–loop–helix GENE TO PROTEIN Facilitate transport of the mRNA to the cytoplasm and attachment to the ribosome Protect the mRNA from from 5' exonuclease Acts as a buffer to the 3' exonuclease in order to increase the half life of mRNA. TRANSLATION TYPES OF DNA SEQUENCE VARIATION VNTR: Variable Number of Tandem Repeats or minisatellite (Telomeric DNA, Hypervariable minisatellite DNA) ~6-100 bp core unit SSR : Simple Sequence Repeat or STR (short
    [Show full text]
  • Bioinformatics: a Perspective
    Bioinformatics: A perspective Dr. Matthew L. Settles Genome Center University of California, Davis [email protected] Outline • The World we are presented with • Advances in DNA Sequencing • Bioinformatics as Data Science • Viewport into bioinformatics • Training • The Bottom Line Sequencing Costs October 2016 Cost per Megabase of Sequence Cost per Human Sized Genome @ 30x $0.014/Mb $1245 per $1000 Human sized $10000000 (30x) genome $100 $10 Dollars $100000 $1 $0.1 $1000 2005 2010 2015 2005 2010 2015 Year Year • Includes: labor, administration, management, utilities, reagents, consumables, instruments (amortized over 3 years), informatics related to sequence productions, submission, indirect costs. • http://www.genome.gov/sequencingcosts/ Growth in Public Sequence Database 13 10 WGS > 1 trillion bp 8 10 11 10 6 10 9 10 Bases Sequences 4 7 10 10 ● GenBank ● GenBank 5 ● WGS 2 ● WGS 10 10 1990 2000 2010 1990 2000 2010 Year Year • http://www.ncbi.nlm.nih.gov/genbank/statistics Short Read Archive (SRA) Growth of the Sequence Read Archive (SRA) over time 15 10 > 1 quadrillion bp 14 10 13 10 12 10 11 10 ● Bases ● Bytes ● Open Access Bases ● Open Access Bytes 2000 2005 2010 2015 Year http://www.ncbi.nlm.nih.gov/Traces/sra/ Increase in Genome Sequencing Projects Lists > 3700 unique genus • JGI – Genomes Online Database (GOLD) • 67,822 genome sequencing projects Brief History Sequencing Platforms • 1986 - Dye terminator Sanger sequencing, technology dominated until 2005 until “next generation sequencers”, peaking at about 900kb/day ‘Next’ Generation • 2005 – ‘Next Generation Sequencing’ as Massively parallel sequencing, both throughput and speed advances. The first was the Genome Sequencer (GS) instrument developed by 454 life Sciences (later acquired by Roche), Pyrosequencing 1.5Gb/day Discontinued Illumina • 2006 – The second ‘Next Generation Sequencing’ platform was Solexa (later acquired by Illumina).
    [Show full text]
  • Sequencing Genomes
    Sequencing Genomes Propojený obrázek nelze zobrazit. Příslušný soubor byl pravděpodobně přesunut, přejmenován nebo odstraněn. Ověřte, zda propojení odkazuje na správný soubor a umístění. Human Genome Project: History, results and impact MUDr. Jan Pláteník, PhD. (December 2020) Beginnings of sequencing • 1965: Sequence of a yeast tRNA (80 bp) determined • 1977: Sanger’s and Maxam & Gilbert’s techniques invented • 1981: Sequence of human mitochondrial DNA (16.5 kbp) • 1983: Sequence of bacteriophage T7 (40 kbp) • 1984: Epstein & Barr‘s Virus (170 kbp) 1 Homo sapiens • 1985-1990: Discussion on human genome sequencing • “dangerous” - “meaningless” - “impossible to do” • 1988-1990: Foundation of HUMAN GENOME PROJECT • International collaboration: HUGO (Human Genome Organisation) • Aims: – genetic map of human genome – physical map: marker every 100 kbp – sequencing of model organisms (E. coli, S. cerevisiae, C. elegans, Drosophila, mouse) – find all human genes (estim. 60-80 tisíc) – sequence all human genome (estim. 4000 Mbp) by 2005 Other genomes • July 1995 : Haemophilus influenzae (1.8 Mbp) ... First genome of independent organism • October 1996: Saccharomyces cerevisiae (12 Mbp) ... First Eukaryota • December 1998: Caenorhabditis elegans (100 Mbp) ... First Metazoa 2 May 1998: • Craig Venter launches private biotechnology company CELERA GENOMICS, Inc. and announces intention to sequence whole human genome in just 3 years and 300 mil. USD using the whole- genome shotgun approach. • The publicly funded HGP in that time: sequenced cca 4
    [Show full text]
  • BIMM 143 Genome Informatics I Lecture 13
    BIMM 143 Genome Informatics I Lecture 13 Barry Grant http://thegrantlab.org/bimm143 Todays Menu: • What is a Genome? • Genome sequencing and the Human genome project • What can we do with a Genome? • Compare, model, mine and edit • Modern Genome Sequencing • 1st, 2nd and 3rd generation sequencing • Workflow for NGS • RNA-Sequencing and Discovering variation What is a genome? The total genetic material of an organism by which individual traits are encoded, controlled, and ultimately passed on to future generations Side note! Genetics and Genomics • Genetics is primarily the study of individual genes, mutations within those genes, and their inheritance patterns in order to understand specific traits. • Genomics expands upon classical genetics and considers aspects of the entire genome, typically using computer aided approaches. Side note! Genomes come in many shapes • Primarily DNA, but can be RNA in the case of some viruses Prokaryote • Some genomes are circular, others linear Bacteriophage • Can be organized into discrete units (chromosomes) or freestanding molecules (plasmids) Eukaryote Side note! Under a microscope, a Eukaryotic cell's genome (i.e. collection of chromosomes) resembles a chaotic jumble of noodles. The looping is not random however and appears to play a role in controlling gene regulation. Image credit: Scientific American March 2019 Genomes come in many sizes 100,000 Eukaryotes Prokaryotes 1,000 Viruses 10 Number of protein coding genes Number of protein 0.01 1.00 100.00 Genome size (Mb) Modified from image by Estevezj
    [Show full text]
  • Next Generation Sequencing for SARS-Cov-2
    Next generation sequencing for SARS-CoV-2 Next generation sequencing for SARS-CoV-2/COVID-19 Acknowledgements The PHG Foundation and FIND are grateful for the insight provided by the individuals consulted during the course of this work. Full consultee acknowledgements are listed in Appendix 1 Lead writers Chantal Babb de Villiers (PHG Foundation) Laura Blackburn (PHG Foundation) Sarah Cook (PHG Foundation) Joanna Janus (PHG Foundation) Emma Johnson (PHG Foundation) Mark Kroese (PHG Foundation) Concept development and lead review Swapna Uplekar (FIND) Reviewers Devy Emperador (FIND) Jilian Sacks (FIND) Marva Seifert (FIND) Anita Suresh (FIND) Report production PHG Foundation July 2020; Update March 2021 This report is the result of PHG Foundation’s independent research and analysis and is not linked to a third party in any way. PHG Foundation has provided occasional analytical services to Oxford Nanopore Technologies (ONT) as part of a consultancy agreement. Next generation sequencing for SARS-CoV-2 2 Contents Introduction 6 1.1 Methods 6 1.2 SARS-CoV-2 7 1.3 SARS-CoV-2 genome sequencing 8 1.4 SARS-CoV-2 biology 9 2 Global surveillance: applications of sequencing technologies and techniques 14 2.1 Overview of surveillance 15 2.2 Overview of genomic surveillance 17 2.3 Sequencing initiatives across the globe 18 2.4 Sequencing data repositories and data sharing 25 2.5 SARS-CoV-2 genomic data releases by country 26 2.6 Additional sequencing related initiatives 29 3 Diagnostics and the role of sequencing 32 3.1 Sequencing for diagnosis 32 3.2
    [Show full text]
  • Next Generation Sequencing: Principle and Applications
    Int.J.Curr.Microbiol.App.Sci (2021) 10(08): 329-333 International Journal of Current Microbiology and Applied Sciences ISSN: 2319-7706 Volume 10 Number 08 (2021) Journal homepage: http://www.ijcmas.com Review Article https://doi.org/10.20546/ijcmas.2021.1008.039 Next Generation Sequencing: Principle and Applications Arul Pandiyan* and Mahesh Dahake Department of Veterinary Pathology, Nagpur Veterinary College, Maharashtra Animal and Fishery Sciences University, Nagpur-440006, India *Corresponding author ABSTRACT Next Generation Sequencing (NGS) also named as massively parallel sequencing, is a powerful new tool that can be used for the complex K eyw or ds diagnosis and intensive monitoring of infectious diseases in veterinary medicine. NGS technologies are also being increasingly used to study the Next Generation Sequencing, aetiology, genomics, evolution and epidemiology of infectious disease, as Clinical diagnostics, well as host pathogen interaction, host immune response including Sanger sequencing, Metagenomics responses to antimicrobial treatment and vaccination. NGS approaches can be used as primary tools for the study of disease outbreaks by the Article Info identification and follow-up of transmission routes, thereby helping in the Accepted: identification of outbreak origins in zoonotic diseases. The above 15 July 2021 applications of next generation sequencing in disease and cancer diagnosis, Available Online: 10 August 2021 genome characterization and viral diversity, viral metagenomics and host pathogen interaction study make it a powerful tool for detailed analysis of complex populations in a minimalistic period of time. Introduction The structure of DNA was identified by Watson and Crick based on the fundamental Histopathology is a technique used to directly DNA crystallography and X-ray diffraction visualize the diseased tissue for the collection work of Rosalind Franklin (Watson J.D.et al., of data which can be used for patient 1953).
    [Show full text]
  • Identification of Genetic Variation Using Next-Generation Sequencing
    Technische Universitat¨ Munchen¨ Lehrstuhl f¨urBioinformatik Identification of genetic variation using Next-Generation Sequencing Sebastian Hubert Eck Vollst¨andigerAbdruck der von der Fakult¨atWissenschaftszentrum Weihen- stephan f¨urErn¨ahrung,Landnutzung und Umwelt der Technischen Univer- sit¨atM¨unchen zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften genehmigten Dissertation. Vorsitzender: Univ.-Prof. Dr. H.-R. Fries Pr¨uferder Dissertation: 1. Univ.-Prof. Dr. H-W. Mewes 2. Univ.-Prof. Dr. R. Zimmer, Ludwig-Maximilians Universit¨atM¨unchen 3. Priv.-Doz. Dr. T. M. Strom Die Dissertation wurde am 30.07.2013 bei der Technischen Universit¨atM¨unchen eingereicht und am 27.01.2014 durch die Fakult¨atWissenschaftszentrum Wei- henstephan f¨urErn¨ahrung,Landnutzung und Umwelt angenommen. Summary The field of genetic research was drastically changed by the advent of next- generation DNA sequencing technologies during the last three years. These technologies are able to generate unparalleled amounts of DNA sequence in- formation at a cost that is several orders of magnitude lower than standard sequencing techniques. Employing this technology allows identification of dis- ease causative and disease associated mutations from a frequency spectrum previously not accessible with standard techniques. Even though showing great promise, researchers are also facing new challenges. In particular the handling and analysis of unprecedented amounts of data. The goal of this thesis was the conception and implementation of an auto- mated, computational analysis pipeline for next generation sequencing data in general and whole exome sequencing data in particular in order to robustly identify disease related mutations. The pipeline covers all key analysis steps including alignment to a reference genome, variant calling, quality filtering and annotation of identified variants, storage of all variants in a relational database and identification of putatively causative mutations.
    [Show full text]
  • BIMM 143 Genome Informatics I Lecture 14
    BIMM 143 Genome Informatics I Lecture 14 Barry Grant http://thegrantlab.org/bimm143 TODAYS MENU: ‣ What is a Genome? • Genome sequencing and the Human genome project ‣ What can we do with a Genome? • Compare, model, mine and edit ‣ Modern Genome Sequencing • 1st, 2nd and 3rd generation sequencing ‣ Workflow for NGS • RNA-Sequencing and Discovering variation Genetics and Genomics • Genetics is primarily the study of individual genes, mutations within those genes, and their inheritance patterns in order to understand specific traits. • Genomics expands upon classical genetics and considers aspects of the entire genome, typically using computer aided approaches. What is a Genome? The total genetic material of an organism by which individual traits are encoded, controlled, and ultimately passed on to future generations Genomes come in many shapes • Primarily DNA, but can be RNA in the case of some viruses prokaryote • Some genomes are circular, others linear bacteriophage • Can be organized into discrete units (chromosomes) or freestanding molecules (plasmids) eukaryote Prokaryote by Mariana Ruiz Villarreal| Bacteriophage image by Splette / CC BY-SA | Eukaryote image by Magnus Manske/ CC BY-SA Genomes come in many sizes Modified from image by Estevezj / CC BY-SA Genome Databases NCBI Genome: http://www.ncbi.nlm.nih.gov/genome Early Genome Sequencing • Chain-termination “Sanger” sequencing was developed in 1977 by Frederick Sanger, colloquially referred to as the “Father of Genomics” • Sequence reads were typically 750-1000 base pairs in length
    [Show full text]
  • Genomics: a Perspective
    Genomics: A perspective Matthew L. Settles, Ph.D. Genome Center University of California, Davis [email protected] Sequencing Platforms • 1986 - Dye terminator Sanger sequencing, technology dominated until 2005 (and remains relevant) until “next generation sequencers”, peaking at about 900kb/day ‘Next’ Generation • 2005 – ‘Next Generation Sequencing’ as Massively parallel sequencing, both throughput and speed advances. The first was the Genome Sequencer (GS) instrument developed by 454 life Sciences (later acquired by Roche), Pyrosequencing 1.5Gb/day Discontinued Illumina (Solexa) • 2006 – The second ‘Next Generation Sequencing’ platform. Now the dominant platform with 75% market share of sequencer and and estimated >90% of all bases sequenced are from an Illumina machine, Sequencing by Synthesis > 1600Gb/day. NovaSeq HiSeq Complete Genomics • 2006 – Using DNA nanoball sequencing, has been a leader in Human genome resequencing, having sequenced over 20,000 genomes to date. In 2013 purchased by BGI and is now set to release their first commercial sequencer, the Revolocity. Throughput on par with HiSeq NOW DEFUNCT Human genome/exomes only. 10,000 Human Genomes per year Bench top Sequencers vRoche 454 Junior vLife Technologies § Ion Torrent § Ion Proton § Gene Studio S5 vIllumina § MiSeq § MiniSeq § iSeq 100 The ‘Next, Next’ Generation Sequencers (3rd Generation) • 2009 – Single Molecule Read Time sequencing by Pacific Biosystems, most successful third generation sequencing platforms, RSII ~2Gb/day, newer Pac Bio Sequel ~14Gb/day, near 100Kb reads. SMRT Sequencing Iso-seq on Pac Bio possible, transcriptome without ‘assembly’ SmidgION: nanopore sensing for use with mobile devices Oxford Nanopore • 2015 – Another 3rd generation sequencer, founded in 2005 and currently in beta testing.
    [Show full text]
  • Cancer Genome Sequencing and Its Implications for Personalized Cancer Vaccines
    Cancers 2011, 3, 4191-4211; doi:10.3390/cancers3044191 OPEN ACCESS cancers ISSN 2072-6694 www.mdpi.com/journal/cancers Review Cancer Genome Sequencing and Its Implications for Personalized Cancer Vaccines Lijin Li 1,†, Peter Goedegebuure 1,2,†, Elaine R. Mardis 2,3, Matthew J.C. Ellis 2,4, Xiuli Zhang 1, John M. Herndon 1, Timothy P. Fleming 1,2, Beatriz M. Carreno 2,4, Ted H. Hansen 2,5 and William E. Gillanders 1,2,* 1 Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110, USA; E-Mails: [email protected] (L.L.); [email protected] (P.G.); [email protected] (X.Z.); [email protected] (J.M.H.); [email protected] (T.P.F.) 2 The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110, USA; E-Mails: [email protected] (E.R.M.); [email protected] (M.J.C.E.); [email protected] (B.M.C.); [email protected] (T.H.H.) 3 The Genome Institute at Washington University School of Medicine, St. Louis, MO 63108, USA 4 Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA 5 Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA † These authors contributed equally to this work. * Author to whom correspondence should be addressed; E-Mail: [email protected]; Tel.: +1-314-7470072; Fax: +1-314-454-5509. Received: 17 September 2011; in revised form: 31 October 2011 / Accepted: 9 November 2011 / Published: 25 November 2011 Abstract: New DNA sequencing platforms have revolutionized human genome sequencing.
    [Show full text]