The Human Genome: Structure and Function of Genes and Chromosomes
Total Page:16
File Type:pdf, Size:1020Kb
3 The Human Genome: Structure and Function of Genes and Chromosomes Over the past 20 years, remarkable progress has been DNA STRUCTURE: A BRIEF made in our understanding of the structure and func- REVIEW tion of genes and chromosomes at the molecular level. More recently, this has been supplemented by DNA is a polymeric nucleic acid macromolecule an in-depth understanding of the organization of the composed of three types of units: a five-carbon sugar, human genome at the level of its DNA sequence. deoxyribose; a nitrogen-containing base; and a phos- These advances have come about in large measure phate group (Fig. 3–1). The bases are of two types, through the applications of molecular genetics and purines and pyrimidines. In DNA, there are two genomics to many clinical situations, thereby provid- purine bases, adenine (A) and guanine (G), and two ing the tools for a distinctive new approach to med- pyrimidine bases, thymine (T) and cytosine (C). ical genetics. In this chapter, we present an overview Nucleotides, each composed of a base, a phosphate, of the organization of the human genome and the and a sugar moiety, polymerize into long polynu- aspects of molecular genetics that are required for an cleotide chains by 5Ј–3Ј phosphodiester bonds understanding of the genetic approach to medicine. formed between adjacent deoxyribose units (Fig. This chapter is not intended to provide an extensive 3–2). In the human genome, these polynucleotide description of the wealth of new information about chains (in their double-helix form) are hundreds of gene structure and regulation. To supplement the millions of nucleotides long, ranging in size from ap- information discussed here, Chapter 4 describes proximately 50 million base pairs (for the smallest many experimental approaches of modern molecular chromosome, chromosome 21) to 250 million base genetics that are becoming critical to the practice and pairs (for the largest chromosome, chromosome 1). understanding of human and medical genetics. The anatomical structure of DNA carries the The increased knowledge of genes, and of their chemical information that allows the exact transmis- organization in the genome, has had an enormous sion of genetic information from one cell to its impact on medicine and on our perception of human daughter cells and from one generation to the next. physiology. As Nobel laureate Paul Berg stated At the same time, the primary structure of DNA presciently at the dawn of this new era: specifies the amino acid sequences of the polypep- Just as our present knowledge and practice of medicine tide chains of proteins, as described later in this relies on a sophisticated knowledge of human anatomy, chapter. DNA has elegant features that give it these physiology, and biochemistry, so will dealing with dis- properties. The native state of DNA, as elucidated ease in the future demand a detailed understanding of by James Watson and Francis Crick in 1953, is a the molecular anatomy, physiology, and biochemistry of the human genome. We shall need a more double helix (Fig. 3–3). The helical structure detailed knowledge of how human genes are organized resembles a right-handed spiral staircase in which its and how they function and are regulated. We shall also two polynucleotide chains run in opposite direc- have to have physicians who are as conversant with the tions, held together by hydrogen bonds between molecular anatomy and physiology of chromosomes pairs of bases: A of one chain paired with T of the and genes as the cardiac surgeon is with the structure and workings of the heart. other, and G with C (see Fig. 3–3). Consequently, 17 18 THOMPSON & THOMPSON GENETICS IN MEDICINE Purines Pyrimidines NH2 O C N C N C HN C CH3 CH HC C C CH N N O N O Figure 3–1. H The four bases of H _ 5' Base DNA and the general structure of a O P O CH Adenine (A) Thymine (T) 2 O nucleotide in DNA. Each of the _ four bases bonds with deoxyribose O C C H H (through the nitrogen shown in H H red) and a phosphate group to form C C O NH2 3' the corresponding nucleotides. OH H C N C HN C N CH Phosphate Deoxyribose CH C C C CH N H2N N O N H H Guanine (G) Cytosine (C) knowledge of the sequence of nucleotide bases on complementary strands, in accordance with the one strand automatically allows one to determine sequence of the original template strands (Fig. 3–4). the sequence of bases on the other strand. The Similarly, when necessary, the base complementarity double-stranded structure of DNA molecules allows allows efficient and correct repair of damaged DNA them to replicate precisely by separation of the molecules. two strands, followed by synthesis of two new THE CENTRAL DOGMA: DNA : RNA : PROTEIN _ O 5' end _ Genetic information is contained in DNA in the O P O chromosomes within the cell nucleus, but protein O synthesis, during which the information encoded in Base 1 the DNA is used, takes place in the cytoplasm. This H2C 5' O compartmentalization reflects the fact that the human C C H H organism is a eukaryote. This means that human H H 3' C C cells have a genuine nucleus containing the DNA, O H which is separated by a nuclear membrane from the _ cytoplasm. In contrast, in prokaryotes like the intesti- O P O nal bacterium Escherichia coli, DNA is not enclosed O within a nucleus. Because of the compartmentaliza- Base 2 tion of eukaryotic cells, information transfer from the H2C 5' O nucleus to the cytoplasm is a very complex process C C H H that has been a focus of attention among molecular H H 3' C C and cellular biologists. O H The molecular link between these two related _ types of information (the DNA code of genes and the O P O amino acid code of proteins) is ribonucleic acid O (RNA). The chemical structure of RNA is similar to Base 3 that of DNA, except that each nucleotide in RNA has H2C 5' O a ribose sugar component instead of a deoxyribose; 3' end C C H H in addition, uracil (U) replaces thymine as one of H H 3' C C the pyrimidines of RNA (Fig. 3–5). An additional OH H difference between RNA and DNA is that RNA in Figure 3–2. A portion of a DNA polynucleotide chain, showing most organisms exists as a single-stranded molecule, the 3Ј–5Ј phosphodiester bonds that link adjacent nucleotides. whereas DNA exists as a double helix. The Human Genome: Structure and Function of Genes and Chromosomes 19 5' C 3' G 3.4 Å Figure 3–3. The structure of DNA. Left, A two-dimensional representation of the two complementary strands of DNA, showing the AT and GC base G C pairs. Note that the orientation of the two strands is antiparallel. Right, The double-helix model of DNA, as pro- posed by Watson and Crick. The hori- zontal “rungs” represent the paired T 34 Å bases. The helix is said to be right- A handed because the strand going from lower left to upper right crosses over the opposite strand. (Based on Watson JD, Crick FHC [1953] Molecular structure of nucleic acids—A structure C for deoxyribose nucleic acid. Nature G 171:737–738.) 3' Hydrogen bonds 5' 20 Å The informational relationships among DNA, Genetic information is stored in DNA by means of RNA, and protein are intertwined: DNA directs the a code (the genetic code, discussed later) in which the synthesis and sequence of RNA, RNA directs the sequence of adjacent bases ultimately determines the synthesis and sequence of polypeptides, and specific sequence of amino acids in the encoded polypeptide. proteins are involved in the synthesis and metabo- First, RNA is synthesized from the DNA template lism of DNA and RNA. This flow of information is through a process known as transcription. The RNA, referred to as the “central dogma” of molecular carrying the coded information in a form called mes- biology. senger RNA (mRNA), is then transported from the nucleus to the cytoplasm, where the RNA sequence is decoded, or translated, to determine the sequence of amino acids in the protein being synthesized. The 3' 5' G C process of translation occurs on ribosomes, which are G C C G cytoplasmic organelles with binding sites for all of the A T interacting molecules, including the mRNA, involved G C in protein synthesis. Ribosomes are themselves made A T T A up of many different structural proteins in association A T with a specialized type of RNA known as ribosomal G C T A RNA (rRNA). Translation involves yet a third type of C G RNA, transfer RNA (tRNA), which provides the A T molecular link between the coded base sequence of the G C T A T A mRNA and the amino acid sequence of the protein. C 3' 5' G G C G C A T A T T A T A O T A T A O G C G C C _ 5' Base G C G C HN CH O P O CH C G C G 2 O _ A T A T O C C G C G C C CH H H A T A T O H H N C T A T A 3' C T A T A H OH OH 5' 3' 5' 3' Uracil (U) Phosphate Ribose Figure 3–4. Replication of a DNA double helix, resulting in two Figure 3–5.