Platforms for Biomarker Analysis Using High-Throughput Approaches In
UNIT 2. BIOMARKERS: PRACTICAL ASPECTS

CHAPTER 7. Platforms for biomarker analysis using high-throughput approaches in genomics, transcriptomics, proteomics, metabolomics, and bioinformatics

B. Alex Merrick, Robert E. London, Pierre R. Bushel, Sherry F. Grissom, and Richard S. Paules

Summary

Global biological responses that reflect disease or exposure biology are kinetic and highly dynamic phenomena. While high-throughput DNA sequencing continues to drive genomics, the possibility of more broadly measuring changes in gene expression has been a recent development manifested by a diversity of technical platforms. Such technologies measure transcripts, proteins and small biological molecules, or metabolites, and respectively define the fields of transcriptomics, proteomics and metabolomics that can be performed on a cell-, tissue-, or organism-wide basis. Bioinformatics is the discipline that derives knowledge from the large quantity and diversity of biological, genetic, genomic and gene expression data by integrating computer science, mathematics, statistics and graphic arts. Gene, protein and metabolite expression profiles can be thought of as snapshots of the current, poorly-mapped molecular landscape. The ultimate aim of genomic platforms is to fully map this landscape to more completely describe all of the biological interactions within a living system, during disease and toxicity, and define the behaviour and relationships of all the components of a biological system. The development of databases and knowledge bases will support the integration of data from multiple domains, as well as computational modelling. This chapter will describe the technical platform methods involving DNA sequencing, mass spectrometry, nuclear magnetic resonance combined with separation systems, and bioinformatics to derive genomic and gene expression data and include the relevant bioinformatic tools for analysis. These genomic, or omics, platforms should have wide application to epidemiological studies.

Introduction

The sequencing of the human genome stands as one of the major scientific achievements of the twentieth century. It embodies a defining moment in modern biology by which most high-throughput technologies are compared for size, scope, and complexity. Beginning in 1990, it took roughly a decade for the first draft of the human genome to be completed. By 2003, about 99% of all gene-containing regions were described, numbering about 20 500 genes (1), although some regions of the genome, such as centromeres, telomeres and gene deserts, continue to undergo characterization and study. Data from the human genome project have provided a generalized human map of the three billion nucleotides comprising the DNA of a few human subjects. However, studies on the variations (polymorphisms) in human DNA sequences are currently underway; samples from 270 individuals of multiethnic backgrounds are being used in a consortium called the International HapMap Project for haplotype mapping (http://www.hapmap.org) (2). The goal is to identify the patterns of single nucleotide polymorphism (SNP) groups, called haplotypes or haps, among individual human beings. In addition, interpretation of the human genome has been greatly enhanced by the DNA sequencing of many other genomes that allow comparison of genetic organization, evolution and function. Nearly 300 genomes have been completely sequenced and range from unicellular organisms, like E. coli and S. cerevisiae, to model invertebrate organisms, such as Drosophila melanogaster and C. elegans, to several mammalian species for which the completed and ongoing genome projects are all available online (3).

Although the conception of the idea for sequencing the human genome is relatively recent, the project could not have occurred without the preceding decades of biological and technological developments. Particularly noteworthy among these contributions are early cytogenetics and chromosomal studies at the beginning of the twentieth century by Morgan and colleagues, the discovery of DNA structure by Watson and Crick in 1953, DNA cloning in 1973 by Berg and Cohen, the DNA sequencing reaction in 1975 by Sanger, reverse transcriptase in 1970 and restriction endonucleases in 1971, and the polymerase chain reaction (PCR) by Mullis in 1983 (4). A new century now begins with an era of “omics,” those fields describing a multitude of genomic functions aimed at further deciphering the biological meaning of sequences in the human genome.

The purpose of this chapter is to describe the technical platforms in genomics, transcriptomics, proteomics, metabolomics and bioinformatics that could be useful in epidemiologic studies. These analytical platforms favour high sample throughput and generation of large data sets.

Omes and omics

Gene expression constantly changes during health, adaptation, toxicity, disease and aging. While the genetic blueprint of an individual is relatively static, the various levels of gene expression to form and operate a complex organism are dynamically regulated, structurally complex and spatially determined. At any point in time, only a portion of a genome is expressed in specific cells and tissues. At the mRNA level of gene expression, the transcriptome represents all genes transcribed at any one moment, and the proteome is the complement of proteins making up cells and tissues. Small molecules and metabolites comprise the metabolome. The global study of each gene expression level is suffixed with “omics,” such as transcriptomics, proteomics and metabolomics. Figure 7.1 suggests a sequence of gene expression based on genomic DNA sequences that are dynamically reflected in changes of transcripts, proteins and metabolites. Each level of gene expression (represented by upward, curved, dotted lines) has the opportunity to feed back and influence other levels reflective of highly integrated, multicellular processes in cells, tissues and organisms. Studies of each gene expression area utilize very different technical platforms to maximize large scale coverage of the transcriptome, proteome and metabolome. Technical platforms may involve mass parallel analysis using robotics, miniaturization, automation and computer processing. The integration of the many levels of gene expression by bioinformatics or computational biology is often referred to as systems biology. Bioinformatics represents the application of mathematics to biochemistry and molecular biology, using statistics, computer science and artificial intelligence to design algorithms to derive biological meaning from gene expression data.

Figure 7.1. The interdependence of the cellular metabolome, proteome, transcriptome, and genome. Each type of characterization provides a functional indication of the activity of the preceding set of molecules (solid lines). Conversely, there will be some degree of feedback regulation built into the system (dotted lines).

Genomics

Chromosomal abnormalities are responsible for many developmental defects and malignancies, and include rearrangements in genomic DNA or changes in copy number, such as deletions, duplications and amplifications. Identification of genomic changes and mutations that underlie disease relies on comparisons of DNA sequences between affected and unaffected individuals. Finding disease-causing chromosomal abnormalities by genomic analysis is confounded [...]

[...] such as those in p53 in the Li-Fraumeni syndrome (5) and ATM in ataxia telangiectasia (6), have been found by sectional resequencing of genomic DNA or PCR-amplified DNA or RNA. However, de novo sequencing of individuals using automated Sanger-based capillary electrophoresis systems has so far been practical for only small regions of the human genome containing candidate genes.

Recent advances in nucleic acid sequencing technologies using massive parallel sequencing, called next-generation sequencing, [...]

[...] chromosomal aberrations and mutations that affect phenotype. DNA sequencing does have inherent advantages in achieving single-base resolution and importantly for de novo analysis of samples without the prior knowledge of existing DNA sequence required for fabricated sequence platforms (8). New sequencing technologies that are high-throughput and low-cost while maintaining high accuracy and completeness are in continued development (9). New platforms often integrate real-time (RT) PCR and may incorporate microelectrophoresis, sequencing by hybridization, mass spectrometry, high-density oligonucleotide arrays, or incorporation of nanopore technology to bring within sight the goal of routine human genome sequencing (10) for personalized medicine (11).

There are several alternatives to whole-genome assessment of chromosomal abnormalities that do not involve traditional DNA sequencing. Alternative platforms involve selective genotyping of very focused genomic loci (short sections of DNA) that are potentially related to disease susceptibility. Haplotype mapping data have been extremely useful in providing candidate genomic regions with high polymorphic variation. Hundreds of thousands of loci can be very rapidly genotyped using BeadArray platforms, a technology based upon direct hybridization of whole-genome-amplified (WGA) genomic DNA to BeadArrays of locus-specific, 50mer oligonucleotide sequences (12). As such, genome-wide association studies (GWAS)