Sequencing of Nucleic Acids: from the First Human Genome to Next

268 DOI: 10.17344/acsi.2021.6691 Acta Chim. Slov. 2021, 68, 268–278 Review Sequencing of Nucleic Acids: from the First Human Genome to Next Generation Sequencing in COVID-19 Pandemic Borut Furlani,1,§ Katarina Kouter,2,§ Damjana Rozman3,* and Alja Videtič Paska2,* 1 University of Ljubljana, Faculty of Medicine, Vrazov trg 2, 1000 Ljubljana, Slovenia 2 University of Ljubljana, Faculty of Medicine, Institute of Biochemistry and Molecular Genetics, Medical Centre for Molecular Biology, Vrazov trg 2, 1000 Ljubljana, Slovenia 3 University of Ljubljana, Faculty of Medicine, Institute of Biochemistry and Molecular Genetics, Centre for Functional Genomics and Bio-Chips, Zaloška 4, 1000 Ljubljana, Slovenia §These two author equally contributed to this work. * Corresponding author: E-mail: [email protected]; Tel. + 386 1 543 76 61; Fax. + 386 1 543 76 41, [email protected]; Tel. + 386 1 543 75 91; Fax. + 386 1 543 76 41 Received: 01-26-2021 Abstract Despite being around for more than 40 years, DNA sequencing is regarded as young technology in clinical medicine. As sequencing is becoming cheaper, faster and more accurate, it is rapidly being incorporated into clinical laboratories. In 2003, the completion of the first human genome opened the door to personalized medicine. Ever since it has been expected for genomics to widely impact clinical care and public health. However, many years can pass for genomic dis- coveries to reflect back and benefit the patients. DNA sequencing represents a less biased approach to diagnostics. It is not only a diagnostic tool, but can also influence clinical management and therapy. As new technologies rapidly emerge it is important for researchers and health professionals to have basic knowledge about the capabilities and drawbacks of the existing sequencing methods, and their use in clinical setting and research. This review provides an overview of nucleic acid sequencing technologies from historical perspective and later focuses on clinical utilization of sequencing. Some of the most promising areas are presented with selected examples from Slovenian researchers. Keywords: Clinical sequencing; DNA; nucleic acid; genomics; next generation sequencing; precision medicine 1. Introduction implemented in diagnostics and clinical management. Due to the rapid development of novel technologies, se- More than forty years ago, two papers described the quencing of nucleic acids will contribute to the discovery first method for determining the sequence of nucleotide of the genomic, transcriptomic and epigenomic basis of bases in DNA1,2 and ever since, the development of new unsolved diseases, improved diagnostics, and personalized sequencing methods has been exponential. In 2003, the therapies. Human Genome Project (HGP) was finalized, presenting Advances in sequencing technologies have contrib- the complete version of the Human Genome.3 Once the uted to a significant reduction of the sequencing costs in reference human genome was established, scientists tried the last 15 years (Figure 1). In 2004, the National Human to explain disease mechanisms or susceptibility for certain Genome Research Institute started an initiative to reduce diseases through population resequencing and determina- the whole genome sequence cost to US$10004 accelerating tion of disease-causing genomic variants. Recently, DNA the development of cheaper and faster sequencing meth- sequencing has been moving into the clinical setting to be ods. A deviation of sequencing cost from ‘Moore’s law’ oc- Furlani et al.: Sequencing of Nucleic Acids: from the First Human Genome ... Acta Chim. Slov. 2021, 68, 268–278 269 curred around year 2008 which coincides with the transi- 5’ carbon atom phosphate group of a joining dNTP. What tion from Sanger’s sequencing of nucleic acids to the next made the sequencing possible is the addition of labelled generation sequencing technologies (NGS) resulting in a dideoxynucleotides (ddNTP); dNTPs lacking the deoxyri- rapid fall of sequencing cost. Human genome sequencing bose 3’ carbon atom hydroxy group needed to form the for $1000 was achieved a few years ago5 and with novel phosphodiester bond. After the incorporation of a ddNTP, technologies we are quickly approaching a $100 human DNA polymerase could no longer add new nucleotides as genome. With today’s enormous sequencing outputs and the phosphodiester bonds can no longer be formed. This lower cost per base the biggest challenge remains mean- results in the production of DNA fragments that vary in ingful interpretation and informative reporting of se- length; however, all fragments end with a labelled ddNTP. quencing results. Initially, sequencing was performed using radioactively la- This review provides an overview of nucleic acid se- belled ddNTPs. About a decade later Sanger sequencing quencing technologies and examples of clinical utilization was automated and commercialized using fluorescent-la- of sequencing, also from Slovenian researchers. belled ddNTPs7 and capillary electrophoresis, providing a single base pair resolution.8 Using automated Sanger sequencing, researchers were able to read up to 75,000 bp per day.9 This method presented the foundation for the HGP, the biggest collaborative biological project that was officially completed in 2003 (it took 13 years and cost al- most US$3 billion to obtain the complete version of the human genome).3,6,10 The next step was to identify genomic differences/variants among people to explain disease mechanisms and/or disease susceptibility. Such projects required genomes of many individuals to be sequenced, however, Sanger sequencing was far too time-consuming and expensive. 2. 2. Next (or Second) Generation Sequencing Figure 1: Sequencing cost for the human genome. The departure of (NGS) sequencing cost curve from Moore’s law coincides with the emergence of next generation sequencing (NGS). Moore’s law originates The “hunt” began for alternative DNA sequencing in the computer hardware industry that involves doubling of ‘com- methods, ultimately resulting in the emergence of NGS puting power’ every two years. It is considered that technologies technologies. NGS is based on massive parallel sequencing, that follow the law are regarded as successful.6 It thus represents a useful relationship to compare technology advances. Data shown in meaning that billions of short DNA fragments are se- the figure was obtained from the National Human Genome Re- quenced simultaneously producing short sequence search Institute.6 “r e a d s”. 11 Reads are computationally aligned to the reference sequence to assemble the consensus DNA sequence (Figure 2). NGS technologies significantly increased sequencing throughput, decreased labour, and sequencing 12 2. Short Review of the Sequencing cost. In 2004, the company 454 Life Sciences released the first commercially available NGS platform13, which is no Technologies longer used today. 454 Life Sciences technology was based In 1977, researchers developed two methods which on pyrosequencing; using detection of light to determine enabled sequencing of nucleic acids of several hundred the DNA sequence. The basic principle of pyrosequencing base pairs (bp), the Sanger’s “dideoxy method” and the consists of using enzymes to build the complementary Maxam-Gilbert’s method1,2, causing revolution in biology. DNA strand and detect the base order of DNA strands mo- bilized to beads located on a titer plate. Each addition of 2. 1. First-Generation DNA Sequencing dNTPs to the growing DNA strand, catalysed by DNA polymerase, results in the release of a pyrophosphate. Pyroph- The Sanger sequencing is known today as the osphate is then catalysed to adenosine triphosphate by sul- first-generation DNA sequencing and was based on the fate adenylyltransferase. Enzyme luciferase later utilizes the use of polymerase chain reaction. DNA polymerase is an adenosine triphosphate to generate light by converting lu- enzyme that can elongate an existing DNA molecule by ciferin to oxyluciferin. Only one of the four dNTPs is added adding deoxynucleotides (dNTPs; a nucleobase, deoxyri- at a time, and unused dNTPs are degraded by the enzyme bose, and phosphate groups) to the DNA 3’-end via a apyrase before the addition of new dNTP. Intensity of the phosphodiester bond between the 3’ carbon atom hydroxy light is detected by sensors, indicating if and how many group of the DNA incorporated deoxynucleotide and the dNTP were added to the complementary DNA strand.14 Furlani et al.: Sequencing of Nucleic Acids: from the First Human Genome ... 270 Acta Chim. Slov. 2021, 68, 268–278 Figure 2: Schematic representation of NGS workflow. Steps common to different NGS platforms are DNA fragmentation (that is required or not depending on the library type), library preparation, massive parallel sequencing, bioinformatics analysis, and variant annotation and interpretation.27 Sequencing generates billions of “reads” that are computationally aligned to the reference genome to assemble linear consensus sequence. The number of reads in which a particular base/variant appears is known as a read depth and determines the confidence with which a particular base/ variant is called.12 A particular single nucleotide variant should appear in more than 10 reads (meaning read depth to be at least 10 x) to be regarded as a genuine genomic variant.12 Illumina, Ion Torrent, and Beijing Genomics Institute tion cycle, the device determines the inserted dNTP using (BGI) are currently the main NGS companies on the mar-

Sequencing of Nucleic Acids: from the First Human Genome to Next

DNA Sequencing

DNA Sequencing - I

DNA Sequencing  Genetically Inherited Diseases

Bioinformatics: a Perspective

Sequencing Genomes

BIMM 143 Genome Informatics I Lecture 13

Next Generation Sequencing for SARS-Cov-2

Next Generation Sequencing: Principle and Applications

Identification of Genetic Variation Using Next-Generation Sequencing

BIMM 143 Genome Informatics I Lecture 14

Genomics: a Perspective

Cancer Genome Sequencing and Its Implications for Personalized Cancer Vaccines