
Thorax 1990;45:52-56

Molecular biology and respiratory disease 1 (Series editor: R A Stockley)

Basic principles

Thorax: first published as 10.1136/thx.45.1.52 on 1 January 1990.

C A Owen, R A Stockley

Lung Immunobiochemical Research Laboratory, General Hospital, Birmingham B4 6NH
C A Owen
R A Stockley
Address for reprint requests: Dr R A Stockley, General Hospital, Birmingham B4 6NH.

Disease processes in the respiratory tract are similar to those that occur in other organs. Congenital, malignant, and inflammatory diseases are common, though our understanding of these disease processes remains superficial. In recent years there has been an increasing interest in the basic mechanisms that lead to the presence of acute, chronic, or fatal lung disease. This interest is more than academic, for the understanding of the basic mechanisms offers an opportunity to predict and manipulate the disease process by specific methods. In addition, the lessons learned have major implications for similar disease processes in other organs.

Many lung diseases clearly have a genetic component and this provides an early clue to the mechanisms. Cystic fibrosis is the most common genetic disorder in the Western world and identification of the gene has recently been accomplished. The protein that it codes for has been predicted;¹ it has properties of a transmembrane protein and a potential role in chloride transport. A single codon deletion is found in 70% of the patients² and would impair ATP binding of the cytoplasmic region of the protein, thereby affecting its function. Further mutations are expected to account for the remaining 30% of the patients. In time treatment may thus become more direct or even curative rather than supportive as at present. Such a step has already been taken with α1 antitrypsin deficiency, where replacement therapy is now under way.³ In this disorder the protein deficiency had been identified some 20 years before the genetic defect was characterised. Because the gene product was known, however, the gene could be rapidly identified, whereas in cystic fibrosis the methods have been indirect and hence relatively slow. These different approaches to gene identification will be discussed later.

Other examples of conditions where preliminary molecular biological studies have revealed a potential genetic clue to pathogenesis include the relation of major deletions of the short arm of chromosome 3 to small cell lung cancer,⁴ defects of chromosome 21 affecting superoxide dismutase to a predisposition to oxygen toxicity,⁵ and defects of elastin gene expression to emphysema.⁶

This is the first of a series of articles aimed at illustrating the diverse ways in which molecular biology has already had an impact in respiratory disease and its potential in future management and diagnosis. The series is not meant to be definitive but will give broad examples of the technology and its application. The purpose of this first article is to introduce the reader to some of the terminology and basic principles that will appear in subsequent articles. Key words and phrases have been highlighted. Clearly such an introductory text can do no more than outline the basic principles. Further details can be obtained from an array of standard textbooks, including simplified texts such as Gene Structure and Expression by John D Hawkins (Cambridge University Press, 1985).

Gene structure and function

The genetic material of all organisms other than RNA viruses (for example, retroviruses) is deoxyribonucleic acid (DNA). In nucleated cells (eukaryotes) most of the DNA is found in the chromosomes within the nucleus. Genes are the functional units of DNA located on chromosomes and the complete collection of genes of the organism is referred to as the genome.

There are two classes of genes. Structural genes contain the information to determine the amino acid sequences of proteins, whereas regulatory genes do not produce proteins but regulate transcription of structural genes.

The genes of eukaryotes are composed of double stranded DNA; that is, two linear DNA strands wound around each other in the form of a double helix. Each DNA strand is made up of a combination of four similar but distinct nucleotides. Nucleotides have a common deoxyribose sugar, which is linked at the 5' position to a phosphate group and at the opposite (1') position to one of four nitrogenous bases, adenine (A), guanine (G), cytosine (C), or thymine (T) (fig 1). The nucleotides are covalently linked by a 3'-5' phosphodiester bond. Thus the DNA strand has a backbone composed of alternating deoxyribose and phosphate groups with a free 5' phosphate group at one end (5' end) and a free 3' hydroxyl group at the other (3' end). The bases project at right angles from the backbone and the double helix is stabilised by hydrogen bonds between the bases of the opposite strands. Base pairing is fixed and can occur only between A and T and between C and G (fig 2). Thus the sequence of one DNA strand determines that of the other, and the two strands are referred to as complementary. One strand is known as the coding strand and its complementary partner as the template (see below). The order of the nucleotides in a structural gene ultimately determines the amino acid sequence of its protein product.
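The fixed pairing rules described above (A with T, C with G) mean that one strand can be derived mechanically from the other. The following is a minimal Python sketch of that idea, not part of the original article. The function name and the example sequence are illustrative assumptions, and the sketch relies on the antiparallel orientation of the two strands shown in figure 2 (one strand runs 5' to 3', its partner 3' to 5').

```python
# Pairing rules stated in the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand_5_to_3: str) -> str:
    """Return the complementary strand, also written 5' to 3'.

    Each base is paired with its partner; the result is reversed because
    the two strands of the double helix run in opposite directions.
    """
    return "".join(PAIRS[base] for base in reversed(strand_5_to_3.upper()))


# Hypothetical coding strand, written 5' to 3' (illustrative only).
coding = "ATGCAGGA"
template = complementary_strand(coding)
print(template)  # TCCTGCAT
```

Applying the function twice returns the original sequence, which is one way of seeing that each strand fully determines the other.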

Figure 1 Structure of the nucleotide component of DNA showing the positions of the 3' hydroxyl and 5' phosphate groups. [Labelled parts: phosphate group, deoxyribose sugar, nitrogenous base.]

Figure 2 Diagrammatic representation of the structure of DNA. Hatching indicates the deoxyribose-phosphate backbone of the molecule. Nucleotide bases: A-adenine; C-cytosine; T-thymine; G-guanine. [The coding (anti-sense) strand is drawn 5' to 3' and the template (sense) strand 3' to 5'.]

By convention, the initial letters of the bases of the nucleotides are used as abbreviations in writing DNA sequences and only the sequence of the coding strand is given, with the 5' end to the left. Sequences to the left of a given nucleotide are said to be 5' or "upstream" and those to the right are 3' or "downstream."

Genes in eukaryotes vary in length from a few hundred to several thousand base pairs (several kilobases). The protein coding sequences are divided into blocks (exons), which are interspersed with non-coding regions of uncertain function (introns), as shown in figure 3. At each end of the gene there are flanking sequences, which regulate the rates of transcription to the respective messenger RNA (mRNA) and hence the subsequent protein products.

Figure 3 Diagrammatic representation of a typical mammalian gene. [Labelled parts: enhancer sequence, promoter region with CAAT box and TATA box, start of transcription, exons, introns, poly A signal, transcription termination sequence.]

The 5' flanking sequences contain a promoter region, which extends for a distance of about 100 base pairs and is essential for the expression of most classes of structural genes. There are two fixed elements of this region, the TATA and CAAT boxes. The TATA box (which usually has the nucleotide sequence TATAAAA) is located 25-35 base pairs "upstream" of the first exon and the CAAT box (GGTCAATCT) generally occurs a further 50 base pairs upstream. Both of these elements are believed to be important for the attachment of the enzyme RNA polymerase, which is responsible for the initiation of transcription of DNA to RNA. Transcription is also modified by enhancer sequences. In contrast to promoters, these sequences do not have to occupy a specific position in relation to the genes they influence, and in addition they are often tissue specific. For example, the immunoglobulin gene enhancer functions only in B lymphocytes and will increase the expression of other genes that have been inserted only in the B cells, whether upstream or downstream from the enhancer. Finally, in the region 3' to the end of the gene there is a transcription termination sequence, which causes the enzyme RNA polymerase to cease transcription. (A triplet such as TAG within the coding sequence is a termination codon, which ends translation rather than transcription.)

DNA replication and the polymerase chain reaction

During division of the parent cell the DNA is precisely replicated so that each daughter cell will contain a full complement of DNA identical to that of the parent cell. This is a complex process in which numerous enzymes play a part. Initially the two strands of DNA are separated for a distance at a replication fork (fig 4). This is followed by pairing of deoxyribonucleoside triphosphates with the bases in each strand. These nucleotides are joined together by the enzyme DNA polymerase. The synthesis of the new strand always proceeds in the 5' to 3' direction. The replication fork then moves further along the parent strands of DNA and the steps are repeated until complete copies of both strands have been synthesised. Each daughter DNA molecule contains one strand from the parent molecule and one newly synthesised strand. This type of replication is therefore termed semi conservative.

Figure 4 Schematic representation of DNA replication. [Labelled parts: replication fork, DNA polymerase, complementary base pairing of nucleotide triphosphates.]

The process of DNA replication forms the basis for the polymerase chain reaction, which has been developed recently and greatly improves the prospects of prenatal diagnosis of human genetic disorders as well as the identification of as little as one strand of DNA or mRNA. In the polymerase chain reaction DNA or mRNA (converted to DNA by the enzyme reverse transcriptase) is amplified over a million fold within a few hours to facilitate its use in standard molecular biology techniques, such as Southern blotting with hybridisation to radioactive probes (see next article).

The principle of the method is illustrated in figure 5. Initially, the genomic double stranded DNA is heated (denatured) to separate the strands. The resulting single strands of DNA are then incubated with two single stranded oligonucleotide primers complementary to sequences flanking the region of the DNA to be studied. The primers anneal to the heat denatured DNA, and DNA synthesis proceeds in the 5' to 3' direction with a thermostable DNA polymerase after the addition of deoxyribonucleotide triphosphates (dNTPs). This is followed by further heat denaturation, and the newly synthesised strand as well as the original DNA combine with additional primers and act as templates for further rounds of DNA synthesis, amplification increasing exponentially.

Figure 5 Schematic representation of the polymerase chain reaction. [Steps shown: tissue or sample; genomic DNA extracted; double stranded DNA containing the sequence of interest is heated; single stranded DNA; primers, DNA polymerase, and nucleotides added; DNA replication synthesises complementary sequences including the sequence of interest; the process is repeated to amplify the sequence of interest.]

Although the polymerase chain reaction has been applied initially to study fetal DNA it can also be used to detect viral pathogens even in the presence of a large excess of host DNA. The technique has been used recently to detect low copy numbers of the human immunodeficiency virus⁷ and the human papilloma virus⁸ in human tissues. The power of the technique provides many potential uses, though its ability to amplify a single strand of DNA produces major problems with contamination and makes it difficult to interpret the results. In addition, there is increasing risk of introducing experimental mutations during the replication process, which may lead to incorrect conclusions concerning the nature of the DNA.
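The "over a million fold" amplification quoted above follows from repeated doubling. The short Python sketch below works through that arithmetic under an idealised assumption that every template is copied once per cycle. The function names and the efficiency parameter are illustrative and do not come from the article; real reactions fall short of perfect doubling.

```python
import math


def copies_after(cycles: int, starting_copies: int = 1, efficiency: float = 1.0) -> float:
    """Idealised number of target copies after a given number of cycles.

    efficiency is the assumed fraction of templates copied in each cycle
    (1.0 means perfect doubling); it is a modelling assumption.
    """
    return starting_copies * (1.0 + efficiency) ** cycles


def cycles_needed(fold: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles giving at least `fold` amplification."""
    return math.ceil(math.log(fold) / math.log(1.0 + efficiency))


print(copies_after(20))          # 1048576.0
print(cycles_needed(1_000_000))  # 20
```

Under these assumptions about 20 cycles are enough for a million fold increase, which is consistent with amplification being completed within a few hours. The same arithmetic explains the contamination problem: a single stray template molecule is amplified just as efficiently as the intended one.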

Gene transcription

The expression of the information contained in DNA is mediated by the synthesis of a second type of nucleic acid, ribonucleic acid (RNA). Like DNA, RNA is composed of four nucleotide monomers. There are two major differences: in RNA the sugar is ribose instead of deoxyribose and the base thymine in DNA is replaced by uracil (U). RNA is generally a single stranded molecule but it hybridises with complementary sequences in other RNA molecules and in DNA (A pairs with U in the RNA, or with T in the DNA).

RNA is synthesised from a template of DNA by the process of transcription, which is similar to DNA replication except that only one of the strands of DNA is used as a template (the template or sense strand), as shown in figure 6. After separation of the two DNA strands, the RNA nucleotides A, C, G, and U pair with complementary bases on the template strand of the DNA. The enzyme RNA polymerase binds to the DNA promoter region and moves towards the 3' end of the coding strand, catalysing the formation of phosphodiester bonds between the nucleotides so that the RNA grows in the 5' to 3' direction. This process continues until the termination sequence is reached. The base sequence of the RNA transcript (the primary transcript) is the same as that of the coding (anti-sense) strand of DNA (with U replacing T) and therefore contains both introns and exons.

Figure 6 Schematic representation of DNA to RNA transcription. [Steps shown: RNA polymerase binds to the promoter region of the template strand of the gene; transcription proceeds in the 5' to 3' direction to give the primary RNA transcript in the nucleus; a cap and poly A tail are added; introns are excised and exons spliced together; the messenger RNA passes to the cytoplasm.]

The primary transcript is modified before it leaves the nucleus. Firstly, the 5' end is "capped" by the addition of a methylated guanine nucleotide to its end. This step is essential for the formation of the subsequent mRNA-ribosome complex. Secondly, the 3' end is modified by the addition of 50-200 adenine nucleotides to give the poly A tail, which stabilises the mRNA against subsequent nuclease degradation. The poly A tail is present only on the mRNA molecule and is often used in vitro to separate mRNA from other types of RNA. Finally, the non-coding sequences, the introns, are excised and the exons are spliced together to produce the mature mRNA with contiguous coding sequences.

Translation

The mRNA molecule enters the cytoplasm and attaches to the ribosome (the site of protein synthesis). The base sequence of the mRNA is converted into the linear sequence of amino acids of the protein product. This is mediated by the genetic code, the principal feature being that three consecutive bases (a codon) code for a single amino acid. There are 64 different permutations of the four possible bases in the three base codon, although there are only 20 amino acids. The genetic code is therefore said to be redundant; that is, two or more codons exist for most amino acids. For example UCC, UCA, UCU, and UCG all code for serine. The exceptions are methionine and tryptophan, which are represented by single codons (AUG and UGG). There are three codons that do not code for amino acids (UAA, UAG, and UGA) but act as termination signals to end translation.

In any mRNA there are clearly three possible ways in which the base sequence can be divided into codons; the correct option is referred to as the reading frame. In eukaryotes the initiation signal for translation is usually the codon AUG nearest the 5' end of the mRNA. Thereafter successive codons are "read" to give the specified amino acid sequence until a termination codon is reached. Details of the individual steps of the translation process may be found in standard textbooks of biochemistry.
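The two steps described above, copying the coding strand into mRNA with U replacing T and then reading successive codons from the first AUG, can be sketched in a few lines of Python. This is an illustrative sketch rather than anything given in the article. The example sequence is hypothetical, amino acids are written with their standard one letter codes, and the 64 entry table is the standard genetic code packed into a string in U, C, A, G order.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a termination codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """The mRNA has the sequence of the coding strand with U replacing T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a termination codon is reached."""
    start = mrna.find("AUG")  # the initiation codon sets the reading frame
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("GGATGTCCTGGTAAGC")  # hypothetical coding strand, 5' to 3'
print(mrna)             # GGAUGUCCUGGUAAGC
print(translate(mrna))  # MSW (methionine, serine, tryptophan)
```

The table has 64 entries covering 20 amino acids and three termination codons, which is the redundancy the text refers to: in this table methionine and tryptophan each appear once, whereas serine appears six times (the four UC codons named above plus AGU and AGC).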

Common genetic defects

A change in the base sequence of DNA is referred to as a mutation. Mutations may arise spontaneously or may be induced by some insult, either physical (for example, ultraviolet or gamma irradiation) or chemical (for example, nitrous acid or acridines). The consequences of such a mutation depend on its location within the gene or controlling sequences; a mutation in the remaining (non-coding) sequences will not usually alter the phenotype of the organism.

A single point mutation results in either a substitution, deletion or insertion of a base within a DNA sequence. A base substitution within a coding sequence may result in an amino acid substitution within the protein. This change may have no effect on the function or amount of the resulting protein unless it is at a critical site. For example, the most common phenotype associated with α1 antitrypsin deficiency and emphysema (PiZ) is the result of a single base substitution in exon V of the gene (GAG→AAG), which in turn leads to a change in the amino acid residue 342 from glutamic acid to lysine.⁹ A base substitution that results in an alternative codon for the same amino acid (for example, CUU leucine→CUC leucine) will be silent.

A mutation that results in the insertion or deletion of one or two bases will alter the reading frame from that point onwards and this is termed a frameshift mutation. If bases are deleted the reading frame moves forward in the 3' direction (3' frameshift), whereas insertions mean that the reading frame is retarded (5' frameshift). These frameshifts may lead to the synthesis of a completely different protein. Commonly mutations result in the formation of premature stop codons and hence truncated forms of the protein that may not be secreted from the cell; for example, the null Hong Kong mutant form of the α1 antitrypsin gene.¹⁰

1 Riordan JR, Rommens JM, Kerem BS, et al. Identification of the cystic fibrosis gene: cloning and characterisation of complementary DNA. Science 1989;245:1066-73.
2 Kerem BS, Rommens JM, Buchanan JA, et al. Identification of the cystic fibrosis gene: genetic analysis. Science 1989;245:1073-9.
3 Wewers DM, Casolaro A, Sellers SE, et al. Replacement therapy for α-1-antitrypsin deficiency associated with emphysema. N Engl J Med 1987;316:1055-62.
4 Whang-Peng J, Bunn PA, Kao-Shan CS. A non random chromosomal abnormality, del 3p (14-23), in human small cell lung cancer. Cancer Genet Cytogenet 1982;6:119-34.
5 Ackerman AD, Fackler JC, Tuck-Muller CM, et al. Partial monosomy 21, diminished activity of superoxide dismutase and pulmonary oxygen toxicity. N Engl J Med 1988;318:1666-9.
6 Olsen DR, Fazio MJ, Shamban AT, et al. Cutis Laxa: reduced elastin gene expression in skin fibroblast cultures as determined by hybridisation with a homologous cDNA and an exon 1 specific oligonucleotide. J Biol Chem 1988;263:6465-7.
7 Ou CY, Mitchell SW, Krebbs J. DNA amplification for direct detection of human immunodeficiency virus-I (HIV-1) in DNA of peripheral mononuclear cells. Science 1988;239:295-7.
8 Shibata DK, Arnheim N, Martin WJ. Detection of human papilloma virus in paraffin-embedded tissue using the polymerase chain reaction. J Exp Med 1988;167:225-30.
9 Kidd VJ, Bruce Wallace R, Itakura K, et al. α-1-Antitrypsin deficiency detection by direct analysis of the mutation in the gene. Nature 1983;304:230-4.
10 Sifers RN, Brashears-Macatee S, Kidd VJ, et al. A frameshift mutation results in a truncated α-1-antitrypsin that is retained within the rough endoplasmic reticulum. J Biol Chem 1988;263:7330-5.

Glossary

The foregoing text should provide basic knowledge of gene structure and function. Although key words and phrases have been highlighted the following glossary summarises these and others for subsequent use throughout the series.

Anneal: Joining together of two complementary strands of nucleic acids by hydrogen bonding between the base pairs of the opposite strands.
Anti-sense strand of DNA: DNA strand not transcribed; has the same sequence as the RNA transcript (with T in place of U).
Capping: Addition of a methylated guanine residue to the 5' end of an RNA transcript.
Coding strand of DNA: DNA strand that is not transcribed (see "anti-sense strand").
Codon: Three consecutive bases that either code for an amino acid or function as a signal for initiation or termination of translation.
Complementary base pairing: Hydrogen bonding between two bases of opposite strands of nucleic acids.
Denaturation: Separation of two complementary strands of nucleic acid by heating or treatment with an alkali.
DNA polymerase: Enzyme that catalyses DNA synthesis.
Enhancer: DNA sequence that modifies gene transcription.
Exon: Segment of the gene that codes for part of the amino acid sequence of the protein product.
Frameshift mutation: Insertion or deletion of a base pair in a gene, resulting in the disruption of the normal reading frame.
Gene: Functional unit of DNA that gives rise to a specific protein product.
Genome: Total genetic material of a cell.
Intron: Segment of a gene which does not code for the amino acid sequence of the protein product.
Messenger RNA (mRNA): Primary transcript that has been modified (including excision of introns) and contains the genetic information to code for a protein product.
Mutation: A change in the base sequence of DNA.
Nuclease: Enzyme that catalyses the degradation of nucleic acids by breaking phosphodiester bonds.
Nucleotide: Monomeric unit of nucleic acid.
Poly A tail: Fifty to 200 adenine units added to the 3' end of most messenger RNAs, stabilising the molecule against degradation by nucleases.
Point mutation: Mutation caused by the substitution of one base pair for another.
Primary transcript: RNA synthesised from a DNA template that is unmodified and contains introns.
Primer: Short single stranded DNA molecule complementary to a flanking sequence of a template DNA strand, acting as a growing point for the addition of nucleotides during DNA synthesis.
Promoter: Specific regulatory DNA sequence essential for gene transcription.
Reverse transcriptase: Enzyme purified from retroviruses that catalyses the synthesis of single stranded complementary DNA (cDNA) with messenger RNA as the template.
RNA polymerase: Enzyme catalysing the synthesis of RNA from the template DNA strand during the process of transcription.
Sense strand of DNA: DNA strand that is transcribed.
Template DNA strand: See "sense strand".
Transcription: Transfer of genetic information from double stranded DNA to single stranded RNA.
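As a worked illustration of the point mutation and frameshift mutation entries, the Python sketch below applies a base substitution and a single base deletion to a short mRNA and compares the translated products. It is a sketch under stated assumptions, not the authors' method: the mRNA is hypothetical and only borrows the GAG to AAG change reported for the PiZ variant, amino acids use the standard one letter codes, and translation is simply started at the first base.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks a termination codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate from the first base, stopping at a termination codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


normal = "AUGGAGCUGACUGGU"                    # hypothetical: AUG GAG CUG ACU GGU
substituted = normal[:3] + "A" + normal[4:]   # GAG -> AAG (glutamic acid -> lysine)
deleted = normal[:4] + normal[5:]             # one base lost: reading frame shifts

print(translate(normal))       # MELTG
print(translate(substituted))  # MKLTG
print(translate(deleted))      # MG
print(CODON_TABLE["CUU"], CODON_TABLE["CUC"])  # L L (a silent substitution)
```

The substitution changes a single residue, whereas the deletion alters every codon downstream and, in this example, brings a UGA termination codon into frame so that the product is truncated.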