<<

42

Carrier Growing end of Polymer with added subunit

+

Carrier Recycled

Free carrier

Monomer

(a)

Hydrolysis +

_ H+ + OH

(b) H20H 0 Figure 2.10 and ; and to a carrier that subsequently transfers the monomer to . (a) , , and nucleic acids the end of the growing . ( b) A macromolecule is consist of monomers (subunits) linked together by covalent bonds. disassembled by hydrolysis of the bonds that join the monomers Free monomers do not simply react with each other to become together. Hydrolysis is the splitting of a bond by . All of these . Rather, each monomer is first activated by attachment reactions are catalyzed by specific .

include , which are the precursors of polysaccha- involved in energy storage, such as ATP; regulatory mole- rides; amino acids, which are the precursors of proteins; cules such as cyclic AMP; and metabolic waste products , which are the precursors of nucleic acids; and such as . fatty acids, which are incorporated into . 3. Metabolic intermediates ( ). The in a R E V I E W have complex chemical and must be syn- 1. What properties of a are critical to ? thesized in a step-by-step sequence beginning with spe- 2. Draw the structures of four different functional groups. cific starting materials. In the cell, each series of chemical How would each of these groups alter the solubility of a reactions is termed a . The cell starts molecule in water? with compound A and converts it to compound B, then to compound C, and so on, until some functional end (such as an building block of a pro- tein) is produced. The compounds formed along the pathways leading to the end products might have no 2.5 | Four Types of per se and are called metabolic intermediates . 4. Molecules of miscellaneous function. This is obviously a Biological Molecules broad category of molecules but not as large as you might The macromolecules just described can expect; the vast bulk of the dry weight of a cell is made up be divided into four types of organic of macromolecules and their direct precursors. The mole- molecules: , lipids, pro- The The Chemical Basis of Life cules of miscellaneous function include such substances as teins, and nucleic acids. The localization of these molecules vitamins, which function primarily as adjuncts to pro- in a number of cellular structures is shown in an overview in teins; certain or amino acid hormones; molecules Figure 2.11. Chapter Chapter 2 43

Chromatin Carbohydrates in nucleus Carbohydrates (or glycans as they are often called) include simple sugars (or ) and all larger molecules DNA constructed of building blocks. Carbohydrates function

Protein Carbohydrate primarily as stores of chemical energy and as durable building

Carbohydrate materials for biological construction. Most sugars have the general formula (CH 2O) n. The sugars of importance in cellu- lar have values of n that range from 3 to 7. Sugars grain containing three are known as , those with four in chloroplast carbons as , those with five carbons as , those with six carbons as , and those with seven carbons as Plasma membrane .

The of Simple Sugars Protein Each sugar molecule consists of a backbone of carbon linked together in a RNA Protein linear array by single bonds. Each of the carbon atoms of the backbone is linked to a single hydroxyl group, except for one Ribosome DNA Microtubules that bears a carbonyl (C PO) group. If the is Lipid located at an internal position (to form a ketone group), the Protein sugar is a , such as , which is shown in Figure 2.12 a. Mitochondrion Protein If the carbonyl is located at one end of the sugar, it forms an Carbohydrate group and the molecule is known as an , as Lipid exemplified by , which is shown in Figure 2.12b–f. DNA Because of their large numbers of hydroxyl groups, sugars RNA tend to be highly water soluble. Although the straight-chained formulas shown in Figure Figure 2.11 An overview of the types of biological molecules that 2.12a,b are useful for comparing the structures of various sug- make up various cellular structures. ars, they do not reflect the fact that sugars with five or more carbon atoms undergo a self-reaction (Figure 2.12 c) that con- verts them into a closed, or ring-containing, molecule. The ring forms of sugars are usually depicted as flat ( planar ) struc- tures (Figure 2.12 d) lying perpendicular to the plane of the paper with the thickened line situated closest to the reader. The H and OH groups lie parallel to the plane of the paper,

H H

6 6 H C OH C O CH 2OH O H CH 2OH H 5 H CH OH C O H C OH H 5 O H 2 C H C OH H O H HO H H 4H 1 H O C HO C H HO C H H 1 C O C C OH H H H H 4 OH H H O HO3 2 OH HO HO H C OH H C OH O OH C C HO H H C H C C H OH 3 2 H O H H C OH H C OH H OH H O 2.5 H C OH H C OH Four Types of Biological Molecules Biological of Types Four

H H D-Fructose D-Glucose D-Glucose α-D-Glucose α-D-Glucose α-D-Glucose (Ring Formation) () (Chair form) (Ball-and-stick chair) (a) (b) (c) (d) (e) (f)

Figure 2.12 The structures of sugars. (a) Straight-chain formula of lying perpendicular to the page with the thickened line situated closest fructose, a ketohexose [keto, indicating the carbonyl (yellow), is located to the reader and the H and OH groups projecting either above or internally, and because it consists of six carbons]. ( b) Straight- below the ring. The basis for the designation ␣-D-glucose is discussed chain formula of glucose, an aldohexose (aldo because the carbonyl is in the following section. ( e) The chair conformation of glucose, which located at the end of the molecule). ( c) Self-reaction in which glucose depicts its three-dimensional structure more accurately than the is converted from an open chain to a closed ring (a ring). flattened ring of part d. ( f ) A ball-and-stick model of the chair (d ) Glucose is commonly depicted in the form of a flat (planar) ring conformation of glucose. 44 projecting either above or below the ring of the sugar. In rations exist that cannot be superimposed on one another. actual fact, the sugar ring is not a planar structure, but usually These two molecules (termed stereoisomers or enantiomers ) have exists in a three-dimensional conformation resembling a chair essentially the same chemical reactivities, but their structures are (Figure 2.12 e,f ). mirror images (not unlike a pair of right and left human hands). By convention, the molecule is called D-glyceraldehyde if Stereoisomerism As noted earlier, a carbon atom can the hydroxyl group of carbon 2 projects to the right, and L- bond with four other atoms. The arrangement of the groups glyceraldehyde if it projects to the left (Figure 2.13 c). Because around a carbon atom can be depicted as in Figure 2.13 a with it acts as a site of stereoisomerism, carbon 2 is referred to as an the carbon placed in the center of a tetrahedron and the asymmetric carbon atom. bonded groups projecting into its four corners. Figure 2.13 b As the backbone of sugar molecules increases in length, so depicts a molecule of glyceraldehyde, which is the only too does the number of asymmetric carbon atoms and, conse- aldotriose. The second carbon atom of glyceraldehyde is quently, the number of stereoisomers. Aldotetroses have two linked to four different groups (—H, —OH, —CHO, and asymmetric carbons and thus can exist in four different con- —CH 2OH). If the four groups bonded to a carbon atom are figurations (Figure 2.14). Similarly, there are 8 different all different, as in glyceraldehyde, then two possible configu- aldopentoses and 16 different aldohexoses. The designation of each of these sugars as D or L is based by convention on the arrangement of groups attached to the asymmetric carbon a atom farthest from the aldehyde (the carbon associated with the aldehyde is designated C1). If the hydroxyl group of this carbon projects to the right, the aldose is a D-sugar; if it proj- ects to the left, it is an L-sugar. The enzymes present in living cells can distinguish between the D and L forms of a sugar. C d b Typically, only one of the stereoisomers (such as D-glucose and L-fucose) is used by cells. c The self-reaction in which a straight-chain glucose mole- (a) cule is converted into a six-membered ( pyranose ) ring was shown in Figure 2.12 c. Unlike its in the open chain, the C1 of the ring bears four different groups and thus be- comes a new center of asymmetry within the sugar molecule. CHO CHO Mirror Because of this extra asymmetric carbon atom, each type of pyranose exists as ␣ and ␤ stereoisomers (Figure 2.15). By C C H H convention, the molecule is an ␣-pyranose when the OH

CH 2OH OH OH CH 2OH group of the first carbon projects below the plane of the ring, and a ␤-pyranose when the hydroxyl projects upward. The difference between the two forms has important biological consequences, resulting, for example, in the compact shape of and starch molecules and the extended conformation (b) of (discussed later). CHO CHO Linking Sugars Together Sugars can be joined to one an- H C OH OH C H other by covalent glycosidic bonds to form larger molecules. CH OH CH OH Glycosidic bonds form by reaction between carbon atom C1 2 2 of one sugar and the hydroxyl group of another sugar, gener- D-Glyceraldehyde L-Glyceraldehyde ating a —C—O—C— linkage between the two sugars. As (c) discussed below (and indicated in Figures 2.16 and 2.17), sug- ars can be joined by quite a variety of different glycosidic Figure 2.13 Stereoisomerism of glyceraldehyde. (a) The four groups bonded to a carbon atom (labeled a, b, c, and d) occupy the four corners of a tetrahedron with the carbon atom at its center. ( b) Glyceraldehyde is the only three-carbon aldose; its second carbon atom is bonded to four different groups (—H, —OH, —CHO, and —CH 2OH). As a CHO CHO CHO CHO result, glyceraldehyde can exist in two possible configurations that are HCOH HOCH HCOH HOCH not superimposable, but instead are mirror images of each other as in- HCOH HCOH HOCH HOCH dicated. These two stereoisomers (or enantiomers) can be distinguished by the configuration of the four groups around the asymmetric (or CH 2OH CH 2OH CH 2OH CH 2OH

chiral ) carbon atom. Solutions of these two rotate plane- D-Erythrose D-Threose L-Threose L-Erythrose

The The Chemical Basis of Life polarized light in opposite directions and, thus, are said to be optically active. ( c) Straight-chain formulas of glyceraldehyde. By Figure 2.14 Aldotetroses. Because they have two asymmetric carbon convention, the D- is shown with the OH group on the right. atoms, aldotetroses can exist in four configurations. Chapter Chapter 2 45

6 CH 2OH CH 2OH 5 CH 2OH C OH H O OH H H H O H 1 H C H C H OH H 4 OH H OH H O HO H HO HO OH 3C 2C H OH H OH H OH β-D-Glucopyranose α-D-Glucopyranose

Figure 2.15 Formation of an ␣- and ␤-pyranose. When a molecule molecule. By convention, the molecule is an ␣-pyranose when the OH of glucose undergoes self-reaction to form a pyranose ring (i.e., a six- group of the first carbon projects below the plane of the ring, and a ␤- membered ring), two stereoisomers are generated. The two isomers are pyranose when the hydroxyl group projects upward. in equilibrium with each other through the open-chain form of the

bonds. Molecules composed of only two sugar units are disac- be composed of many different combinations of sugar units, charides (Figure 2.16). serve primarily as readily these carbohydrates can play an informational role; that is, they available energy stores. , or table sugar, is a major com- can serve to distinguish one type of cell from another and help ponent of , which carries chemical energy from one mediate specific interactions of a cell with its surroundings. part of the plant to another. , present in the milk of most mammals, supplies newborn mammals with fuel for early Polysaccharides By the middle of the nineteenth century, growth and development. Lactose in the diet is hydrolyzed by it was known that the blood of people suffering from diabetes the , which is present in the plasma membranes had a sweet due to an elevated level of glucose, the key of the cells that line the intestine. Many people lose this sugar in energy metabolism. Claude Bernard, a prominent enzyme after childhood and find that eating dairy products French physiologist of the period, was looking for the cause of causes digestive discomfort. diabetes by investigating the source of blood sugar. It was Sugars may also be linked together to form small chains assumed at the time that any sugar present in a human or an called (oligo ϭ few). Most often such chains had to have been previously consumed in the diet. are found covalently attached to lipids and proteins, converting Working with dogs, Bernard found that, even if the them into and glycoproteins, respectively. Oligosac- were placed on a diet totally lacking carbohydrates, their blood charides are particularly important on the glycolipids and gly- still contained a normal amount of glucose. Clearly, glucose coproteins of the plasma membrane, where they project from could be formed in the body from other types of compounds. the cell surface (see Figure 4.4 c). Because oligosaccharides may After further investigation, Bernard found that glucose enters the blood from the . Liver , he found, con- tains an insoluble polymer of glucose he named glycogen . Sucrose Bernard concluded that various food materials (such as pro- 6 CH OH teins) were carried to the liver where they were chemically 2 1 converted to glucose and stored as glycogen.Then, as the body H 5 O H HOCH O H 2 needed sugar for fuel, the glycogen in the liver was trans- 4 H 1(α) 2 5 OH H H HO formed to glucose, which was released into the bloodstream to O 6 HO 3 2 3 4 CH 2OH satisfy glucose-depleted tissues. In Bernard’s hypothesis, the H OH OH H balance between glycogen formation and glycogen breakdown (a) in the liver was the prime determinant in maintaining the rel- atively constant ( homeostatic ) level of glucose in the blood. 2.5 Lactose Bernard’s hypothesis proved to be correct. The molecule 6 6 Molecules Biological of Types Four CH 2OH CH 2OH he named glycogen is a type of —a polymer of HO 5 O H 5 O OH sugar units joined by glycosidic bonds.

4 H 1(β) O 4 H 1 OH H OH H Glycogen and Starch: Nutritional Polysaccharides Glycogen is a H H 3 2 3 2 H branched polymer containing only one type of monomer: glu- H OH H OH cose (Figure 2.17 a). Most of the sugar units of a glycogen (b) molecule are joined to one another by ␣(1 → 4) glycosidic bonds (type 2 bond in Figure 2.17 a). Branch points contain a Figure 2.16 Disaccharides. Sucrose and lactose are two of the most sugar joined to three neighboring units rather than to two, as common disaccharides. Sucrose is composed of glucose and fructose in the unbranched segments of the polymer. The extra neigh- joined by an ␣(1 → 2) linkage, whereas lactose is composed of glucose bor, which forms the branch, is linked by an ␣(1 → 6) glyco- and joined by a ␤(1 → 4) linkage. sidic bond (type 1 bond in Figure 2.17 a). 46

1

Glycogen 2 (a)

2

(b) Starch

3

(c) Cellulose

Figure 2.17 Three polysaccharides with identical sugar monomers extended. Whereas glycogen and starch are energy stores, cellulose but dramatically different properties. Glycogen ( a), starch ( b), and molecules are bundled together into tough fibers that are suited for cellulose ( c) are each composed entirely of glucose subunits, yet their their structural role. Colorized electron micrographs show glycogen chemical and physical properties are very different due to the distinct granules in a liver cell, starch grains (amyloplasts) in a plant seed, and ways that the monomers are linked together (three different types of cellulose fibers in a plant cell wall; each is indicated by an arrow. linkages are indicated by the circled numbers). Glycogen molecules [P HOTO INSETS : ( TOP ) D ON FAWCETT /P HOTO RESEARCHERS , are the most highly branched, starch molecules assume a helical INC .; ( CENTER ) J EREMY BURGESS /P HOTO RESEARCHERS , I NC .; arrangement, and cellulose molecules are unbranched and highly (BOTTOM ) B IOPHOTO ASSOCIATES /P HOTO RESEARCHERS , I NC .]

Glycogen serves as a storehouse of surplus chemical en- Most bank their surplus chemical energy in the ergy in most animals. Human skeletal muscles, for example, form of starch , which like glycogen is also a polymer of glu- typically contain enough glycogen to fuel about 30 minutes of cose. Potatoes and cereals, for example, consist primarily of moderate activity. Depending on various factors, glycogen starch. Starch is actually a mixture of two different polymers, typically ranges in molecular weight from about one to four and amylopectin. Amylose is an unbranched, helical The The Chemical Basis of Life million daltons. When stored in cells, glycogen is highly molecule whose sugars are joined by ␣(1 → 4) linkages (Fig- concentrated in what appears as dark-, irregular gran- ure 2.17 b), whereas amylopectin is branched. Amylopectin ules in electron micrographs (Figure 2.17 a, right). differs from glycogen in being much less branched and having Chapter Chapter 2 47 an irregular branching pattern. Starch is stored as densely packed granules, or starch grains , which are enclosed in membrane- bound ( plastids ) within the plant cell (Figure 2.17 b, right). Although animals don’t synthesize starch, they possess an enzyme ( ) that readily hydrolyzes it.

Cellulose, , and Glycosaminoglycans: Structural Polysac- charides Whereas some polysaccharides constitute easily di- gested energy stores, others form tough, durable structural materials. Cotton and linen, for example, consist largely of cellulose , which is the major component of plant cell walls. Cotton textiles owe their durability to the long, unbranched cellulose molecules, which are ordered into side-by-side aggregates to form molecular cables (Figure 2.17 c, right panel) that are ideally constructed to resist pulling (tensile) forces. Like glycogen and starch, cellulose consists solely of glucose monomers; its properties differ dramatically from these other polysaccharides because the glucose units are joined by ␤(1 → 4) linkages (bond 3 in Figure 2.17 c) rather than ␣(1 → 4) link- ages. Ironically, multicellular animals (with rare exception) lack the enzyme needed to degrade cellulose, which happens to be the most abundant organic material on Earth and rich in chemical energy. Animals that “make a living” by digesting Figure 2.18 Chitin is the primary component of the outer skeleton cellulose, such as termites and sheep, do so by harboring of this grasshopper. ( ANTHONY BANNISTER /G ALLO IMAGES / and protozoa that synthesize the necessary enzyme, © C ORBIS ) cellulase. Not all biological polysaccharides consist of glucose monomers. Chitin is an unbranched polymer of the sugar N-acetylglucosamine, which is similar in structure to glucose are found in the spaces that surround cells, and their structure but has an acetyl amino group instead of a hydroxyl group and function are discussed in Section 7.1. The most complex bonded to the second carbon atom of the ring. polysaccharides are found in plant cell walls (Section 7.6).

CH 2OH HO H Lipids H Lipids are a diverse group of nonpolar biological molecules OH H HO OH whose common properties are their ability to dissolve in or- ganic , such as chloroform or , and their in- H HNCOCH 3 ability to dissolve in water—a property that explains many of N-Acetylglucosamine their varied biological functions. Lipids of importance in cel- lular function include , , and . Chitin occurs widely as a structural material among inverte- brates, particularly in the outer covering of , spiders, Fats Fats consist of a molecule linked by ester and crustaceans. Chitin is a tough, resilient, yet flexible mate- bonds to three fatty acids; the composite molecule is termed a rial not unlike certain . Insects owe much of their suc- triacylglycerol (Figure 2.19 a). We will begin by considering cess to this highly adaptive polysaccharide (Figure 2.18). the structure of fatty acids . Fatty acids are long, unbranched 2.5 Another group of polysaccharides that has a more complex hydrocarbon chains with a single carboxyl group at one end Four Types of Biological Molecules Biological of Types Four structure is the glycosaminoglycans (or GAGs ). Unlike other (Figure 2.19 b). Because the two ends of a molecule polysaccharides, they have the structure —A—B—A—B—, have a very different structure, they also have different prop- where A and B represent two different sugars. The best-studied erties. The hydrocarbon chain is hydrophobic, whereas the GAG is heparin, which is secreted by cells in the lungs and carboxyl group (—COOH), which bears a negative charge other tissues in response to tissue injury. Heparin inhibits at physiological pH, is hydrophilic. Molecules having blood coagulation, thereby preventing the formation of clots both hydrophobic and hydrophilic regions are said to be that can block the flow of blood to the heart or lungs. Heparin amphipathic ; such molecules have unusual and biologically accomplishes this feat by activating an inhibitor (antithrom- important properties. The properties of fatty acids can be bin) of a key enzyme (thrombin) that is required for blood appreciated by considering the use of a familiar product: soap, coagulation. Heparin, which is normally extracted from pig which consists of fatty acids. In past centuries, soaps were tissue, has been used for decades to prevent blood clots in made by heating animal in strong alkali (NaOH or KOH) patients following major surgery. Unlike heparin, most GAGs to break the bonds between the fatty acids and the glycerol. 48

Glycerol Fatty moiety acid tail O Water

CH 2 O C O

CH O C O

CH 2 O C

(a)

O H H H H H H H H H H H H H H H H H

HOC C C C C C C C C C C C C C C C C HC

H H H H H H H H H H H H H H H H H (b)

Figure 2.20 Soaps consist of fatty acids. In this schematic drawing of a soap micelle, the nonpolar tails of the fatty acids are directed inward, where they interact with the greasy matter to be dissolved. The negatively charged heads are located at the surface of the micelle, where they interact with the surrounding water. Membrane proteins, which also tend to be insoluble in water, can also be solubilized in this way by extraction of membranes with detergents. Tristearate (c)

As a result, greasy materials are converted into complexes (micelles ) that can be dispersed by water (Figure 2.20). Fatty acids differ from one another in the length of their hydrocarbon chain and the presence or absence of double bonds. Fatty acids present in cells typically vary in length from 14 to 20 carbons. Fatty acids that lack double bonds, such as stearic acid (Figure 2.19 b), are described as saturated ; those possessing double bonds are unsaturated . Naturally occurring fatty acids have double bonds in the cis configuration. Double Linseed bonds (of the cis configuration) (d) Figure 2.19 Fats and fatty acids. (a) The basic structure of a triacyl- H H C H glycerol (also called a or a neutral fat). The glycerol moiety, C C as opposed to C C indicated in orange, is linked by three ester bonds to the carboxyl groups of three fatty acids whose tails are indicated in green. ( b) Stearic C C H C acid, an 18-carbon saturated fatty acid that is common in animal fats. cis trans (c) Space-filling model of tristearate, a triacylglycerol containing three identical stearic acid chains. ( d ) Space-filling model of linseed oil, a produce kinks in a fatty acid chain. Consequently, the more triacylglycerol derived from flax seeds that contains three unsaturated double bonds that fatty acid chains possess, the less effectively fatty acids (linoleic, oleic, and linolenic acids). The sites of unsaturation, which produce kinks in the molecule, are indicated by the these long chains can be packed together. This lowers the yellow-orange bars. temperature at which a fatty acid-containing lipid melts. Tri- stearate, whose fatty acids lack double bonds (Figure 2.19 c), is a common component of animal fats and remains in a solid state well above room temperature. In contrast, the profusion Today, most soaps are made synthetically. Soaps owe their of double bonds in vegetable fats accounts for their liquid The The Chemical Basis of Life grease-dissolving capability to the fact that the hydrophobic state—both in the plant cell and on the grocery shelf—and for end of each fatty acid can embed itself in the grease, whereas their being labeled as “polyunsaturated.” Fats that are liquid at the hydrophilic end can interact with the surrounding water. room temperature are described as . Figure 2.19 d shows Chapter Chapter 2 49 the structure of linseed oil, a highly volatile lipid extracted Because they lack polar groups, fats are extremely insolu- from flax seeds, that remains a liquid at a much lower temper- ble in water and are stored in cells in the form of dry lipid ature than does tristearate. Solid shortenings, such as mar- droplets. Since lipid droplets do not contain water as do glyco- garine, are formed from unsaturated vegetable oils by gen granules, they represent an extremely concentrated storage chemically reducing the double bonds with atoms (a fuel. In many animals, fats are stored in special cells ( adipocytes ) process termed hydrogenation ). The hydrogenation process whose is filled with one or a few large lipid droplets. also converts some of the cis double bonds into trans double Adipocytes exhibit a remarkable ability to change their volume bonds, which are straight rather than kinked. This process to accommodate varying quantities of fat. generates partially hydrogenated or trans-fats. A molecule of fat can contain three identical fatty acids Steroids Steroids are built around a characteristic four- (as in Figure 2.19 c), or it can be a mixed fat , containing more ringed hydrocarbon skeleton. One of the most important than one fatty acid species (as in Figure 2.19 d ). Most natural steroids is , a component of animal cell membranes fats, such as olive oil or butterfat, are mixtures of molecules and a precursor for the synthesis of a number of steroid hor- having different fatty acid species. mones, such as testosterone, progesterone, and estrogen (Fig- Fats are very rich in chemical energy; a gram of fat con- ure 2.21). Cholesterol is largely absent from plant cells, which tains over twice the energy content of a gram of carbohydrate is why vegetable oils are considered “cholesterol-free,” but (for reasons discussed in Section 3.1). Carbohydrates function plant cells may contain large quantities of related compounds. primarily as a short-term, rapidly available energy source, whereas fat reserves store energy on a long-term basis. It is es- Phospholipids The chemical structure of a common phos- timated that a person of average size contains about 0.5 kilo- pholipid is shown in Figure 2.22. The molecule resembles a grams (kg) of carbohydrate, primarily in the form of glycogen. fat (triacylglycerol), but has only two fatty acid chains rather This amount of carbohydrate provides approximately 2000 than three; it is a diacylglycerol . The third hydroxyl of the glyc- kcal of total energy. During the course of a strenuous day’s erol backbone is covalently bonded to a group, exercise, a person can virtually deplete his or her body’s entire which in turn is covalently bonded to a small polar group, such store of carbohydrate. In contrast, the average person contains as choline, as shown in Figure 2.22. Thus, unlike fat mole- approximately 16 kg of fat (equivalent to 144,000 kcal of cules, phospholipids contain two ends that have very different energy), and as we all know, it can take a very long time to properties: the end containing the phosphate group has a dis- deplete our store of this material. tinctly hydrophilic character; the other end composed of the two fatty acid tails has a distinctly hydrophobic character. Be- cause phospholipids function primarily in cell membranes, and because the properties of cell membranes depend on their CH 3 components, they will be discussed further in CH 3 CH 3 Sections 4.3 and 15.2 in connection with cell membranes.

CH 3 CH 3

HO Phosphate OH Cholesterol – CH 3 CH 3 O + H3C N CH 2 CH 2 OP O CH 2 CH O CH 3 3 O H H H H H H H H H H H H H H H H H Choline HC O C C C C C C C C C C C C C C C C C C H H H H H H H H H H H H H H H H H H 2.5 O O H H H H H H H H H H H H H H H H H OH H C O C C C C C C C C C C C C C C C C C C H

Testosterone 2 Molecules Biological of Types Four CH 3 H H H H H H H H H H H H H H H H H

Polar Glycerol Fatty head group backbone acid chains

Figure 2.22 The phospholipid . The molecule HO consists of a glycerol backbone whose hydroxyl groups are covalently Estrogen bonded to two fatty acids and a phosphate group. The negatively charged phosphate is also bonded to a small, positively charged choline Figure 2.21 The structure of steroids. All steroids share the basic group. The end of the molecule that contains the phosphorylcholine is four-ring skeleton. The seemingly minor differences in chemical struc- hydrophilic, whereas the opposite end, consisting of the fatty acid tail, ture between cholesterol, testosterone, and estrogen generate profound is hydrophobic. The structure and function of phosphatidylcholine and biological differences. other phospholipids are discussed at length in Section 4.3. 50 Proteins quence of amino acids that gives the molecule its unique prop- erties. Many of the capabilities of a protein can be understood Proteins are the macromolecules that carry out virtually all of by examining the chemical properties of its constituent amino a cell’s activities; they are the molecular tools and machines acids. Twenty different amino acids are commonly used in the that make things happen. As enzymes, proteins vastly acceler- construction of proteins, whether from a virus or a human. ate the rate of metabolic reactions; as structural cables, pro- There are two aspects of amino acid structure to consider: that teins provide mechanical support both within cells and which is common to all of them and that which is unique to outside their perimeters (Figure 2.23 a); as hormones, growth each. We will begin with the shared properties. factors, and activators, proteins perform a wide variety of regulatory functions; as membrane receptors and transporters, proteins determine what a cell reacts to and what types of sub- The Structures of Amino Acids All amino acids have a carboxyl stances enter or leave the cell; as contractile filaments and group and an amino group, which are separated from each molecular motors, proteins constitute the machinery for other by a single carbon atom, the ␣-carbon (Figure 2.24 a,b). biological movements. Among their many other functions, In a neutral aqueous solution, the ␣-carboxyl group loses its proteins act as , serve as , form blood clots, and exists in a negatively charged state (—COO Ϫ), and absorb or refract light (Figure 2.23 b), and transport substances the ␣-amino group accepts a proton and exists in a positively ϩ from one part of the body to another. charged state (NH 3 ) (Figure 2.24 b). We saw on page 44 that How can one type of molecule have so many varied func- carbon atoms bonded to four different groups can exist in two tions? The explanation resides in the virtually unlimited mo- configurations ( stereoisomers ) that cannot be superimposed on lecular structures that proteins, as a group , can assume. Each one another. Amino acids also have asymmetric carbon atoms. protein, however, has a unique and defined structure that en- With the exception of , the ␣-carbon of amino acids ables it to carry out a particular function. Most importantly, bonds to four different groups so that each amino acid can ex- proteins have shapes and surfaces that allow them to interact ist in either a D or an L form (Figure 2.25). Amino acids used selectively with other molecules. Proteins, in other words, ex- in the synthesis of a protein on a ribosome are always L-amino hibit a high degree of specificity . It is possible, for example, acids. The “selection” of L-amino acids must have occurred for a particular DNA-cutting enzyme to recognize a segment very early in cellular and has been conserved for of DNA containing one specific sequence of eight nucleotides, billions of years. , however, use D-amino acids while ignoring all the other 65,535 possible sequences com- in the synthesis of certain small , including those of posed of this number of nucleotides. the cell wall and several antibiotics (e.g., gramicidin A). During the process of protein synthesis, each amino acid The Building Blocks of Proteins Proteins are polymers becomes joined to two other amino acids, forming a long, made of amino acid monomers. Each protein has a unique se- continuous, unbranched polymer called a polypeptide chain .

(a) (b)

The The Chemical Basis of Life Figure 2.23 Two examples of the thousands of biological structures recognition; and ( b) the lenses of eyes, as in this spider, which focus composed predominantly of protein. These include ( a) feathers, light rays. ( A: D ARRELL GULIN /G ETTY IMAGES ; B: T HOMAS which are in birds for thermal insulation, flight, and sex SHAHAN /P HOTO RESEARCHERS , I NC .) Chapter Chapter 2 51

R The amino acids that make up a polypeptide chain are joined

H + H by bonds that result from the linkage of the carboxyl α C H N C group of one amino acid to the amino group of its neighbor, − H O O with the elimination of a molecule of water (Figure 2.24 c). A polypeptide chain composed of a string of amino acids joined (a) by peptide bonds has the following backbone:

Side Chain H R

+ − O H R O H R HNαC C O H H CN C C N C HH O N H CCN H C C

Amino group Carboxyl group H R O H R O (b)

R' R" The “average” polypeptide chain contains about 450 amino acids. The longest known polypeptide, found in the H NC C OH H NC C OH muscle protein titin, contains more than 30,000 amino acids. + Once incorporated into a polypeptide chain, amino acids are HH O HH O termed residues . The on one end of the chain, the

H2O N-terminus , contains an amino acid with a free (unbonded) ␣-amino group, whereas the residue at the opposite end, the ' R R" C-terminus , has a free ␣-carboxyl group. In addition to amino acids, many proteins contain other types of components that HNC C N C C OH are added after the polypeptide is synthesized. These include HH O H H O carbohydrates (to form glycoproteins), -containing groups (to form metalloproteins) and organic groups (e.g., Peptide bond flavoproteins). (c) The Properties of the Side Chains The backbone, or main Figure 2.24 Amino acid structure. Ball-and-stick model ( a) and chemical formula ( b) of a generalized amino acid in which R can be chain, of the polypeptide is composed of that part of each any of a number of chemical groups (see Figure 2.26). ( c) The amino acid that is common to all of them. The or formation of a peptide bond occurs by the condensation of two amino R group (Figure 2.24), bonded to the ␣-carbon, is highly acids, drawn here in the uncharged state. In the cell, this reaction variable among the 20 building blocks, and it is this variability occurs on a ribosome as an amino acid is transferred from a carrier that ultimately gives proteins their diverse structures and ac- (a tRNA molecule) onto the end of the growing polypeptide chain tivities. If the various amino acid side chains are considered (see Figure 11.49). together, they exhibit a large variety of structural features, ranging from fully charged to hydrophobic, and they can par- ticipate in a wide variety of covalent and noncovalent bonds. As discussed in the following chapter, the side chains of the “active sites” of enzymes can facilitate (catalyze) many differ- ent organic reactions. The assorted characteristics of the side chains of the amino acids are important in both intra molecu- D- L-Alanine lar interactions, which determine the structure and activity of 2.5 the molecule, and inter molecular interactions, which deter- COOH COOH Molecules Biological of Types Four Mirror mine the relationship of a polypeptide with other molecules, including other polypeptides (page 61). C C CH CH Amino acids are classified on the character of their side 3 3 chains. They fall roughly into four categories: polar and H NH NH H 2 2 charged, polar and uncharged, nonpolar, and those with unique properties (Figure 2.26).

1. Polar, charged. Amino acids of this group include , , , and . These four amino Figure 2.25 Amino acid stereoisomerism. Because the ␣-carbon of acids contain side chains that become fully charged; that all amino acids except glycine is bonded to four different groups, two is, the side chains contain relatively strong organic acids stereoisomers can exist. The D and L forms of alanine are shown. and bases. The ionization reactions of glutamic acid and 52 Polar charged + NH 3 C NH + – CH NH NH O O 2 3 HC NH + – C CH O O CH 2 2 CH C CH CH C NH 2 CH 2 2 CH CH CH CH 2 2 CH 2 2 2 + – + – + – + – + – H3N C C O H3N CC O H3N CC O H3N CC O H3N CC O H O H O H O H O H O Aspartic acid Glutamic acid Lysine Arginine (Asp or D) (Glu or E) (Lys or K) (Arg or R) (His or H)

Properties of side chains (R groups):

Hydrophilic side chains act as acids or bases which tend to be fully charged (+ or –) under physiologic conditions. Side chains form ionic bonds and are often involved in chemical reactions.

Polar uncharged OH O NH 2 C O NH 2 OH CH C 3 CH 2 CH C CH CH 2 H OH CH 2 2 2 + – + – + – + – + – H3N CC O H3N CC O H3N CC O H3N CC O H3N CC O H O H O H O H O H O (Ser or S) (Thr or T) (Gln or Q) (Asn or N) (Tyr or Y) Properties of side chains:

Hydrophilic side chains tend to have partial + or – charge allowing them to participate in chemical reactions, form H-bonds, and associate with water.

Nonpolar CH 3

CH 3 S NH CH 3 CH 3 CH CH CH C CH CH 3 CH 3 2 2 CH CH H C CH CH CH CH 3 CH 2 3 2 2 2 + – + – + – + – + – + – + – H3N CC O H3N CC O H3N CC O H3N C C O H3N CC O H3N CC O H3N CC O H O H O H O H O H O H O H O Alanine (Ala or A) (Val or V) (Leu or L) (Ile or I) (Met or M) (Phe or F) (Trp or W)

Properties of side chains:

Hydrophobic side chain consists almost entirely of C and H atoms. These amino acids tend to form the inner core of soluble proteins, buried away from the aqueous medium. They play an important role in membranes by associating with the lipid bilayer.

Side chains with unique properties SH CH 2 CH 2 H CH – 2 CH 2 CH C O +H N CC O – +H N CC O – N 3 3 O + H O H O H2 Glycine (Gly or G) (Cys or C) (Pro or P) Side chain consists only of hydrogen Though side chain has polar, uncharged Though side chain has hydrophobic atom and can fit into either a hydrophilic character, it has the unique property of character, it has the unique property or hydrophobic environment.Glycine often forming a with another of creating kinks in polypeptide resides at sites where two polypeptides cysteine to form a disulfide link. chains and disrupting ordered come into close contact. secondary structure.

Figure 2.26 The chemical structure of amino acids. These 20 amino are arranged into four groups based on the character of their side

The The Chemical Basis of Life acids represent those most commonly found in proteins and, more chains, as described in the text. All molecules are depicted as free specifically, those encoded by DNA. Other amino acids occur as the amino acids in their ionized state as they would exist in solution at result of a modification to one of those shown here. The amino acids neutral pH. Chapter Chapter 2 53 – O OH O O form hydrogen bonds with other molecules including C C water. These amino acids are often quite reactive. In- – CH 2 CH 2 + cluded in this category are asparagine and glutamine (the OH + H CH + CH amides of aspartic acid and glutamic acid), threonine, 2 H 2 N C C N C C serine, and tyrosine. 3. H H O H H O Nonpolar. The side chains of these amino acids are hydrophobic and are unable to form electrostatic bonds or (a) interact with water. The amino acids of this category are H alanine, valine, leucine, isoleucine, tryptophan, phenylala- + nine, and methionine. The side chains of the nonpolar NH NH 2 2 amino acids generally lack and . They CH CH 2 2 vary primarily in size and shape, which allows one or an-

CH 2 CH 2 other of them to pack tightly into a particular space – CH OH CH + within the core of a protein, associating with one another 2 + 2 + H CH H CH as the result of van der Waals forces and hydrophobic 2 2 interactions. N C C N C C 4. The other three amino acids —glycine, proline, and H H O H H O cysteine—have unique properties that separate them from (b) the others. The side chain of glycine consists of only a Figure 2.27 The ionization of charged, polar amino acids. (a) The hydrogen atom, and glycine is a very important amino side chain of glutamic acid loses a proton when its acid for just this reason. Owing to its lack of a side chain, group ionizes. The degree of ionization of the carboxyl group depends glycine residues provide a site where the backbones of two on the pH of the medium: the greater the hydrogen concentration polypeptides (or two segments of the same polypeptide) (the lower the pH), the smaller the percentage of carboxyl groups that can approach one another very closely. In addition, are present in the ionized state. Conversely, a rise in pH leads to an in- glycine is more flexible than other amino acids and allows creased ionization of the proton from the carboxyl group, increasing parts of the backbone to move or form a hinge. Proline is the percentage of negatively charged glutamic acid side chains. The pH ␣ at which 50 percent of the side chains are ionized and 50 percent are unique in having its -amino group as part of a ring unionized is called the pK, which is 4.4 for the side chain of free (making it an imino acid). Proline is a hydrophobic amino glutamic acid. At physiologic pH, virtually all of the glutamic acid acid that does not readily fit into an ordered secondary residues of a polypeptide are negatively charged. ( b) The side chain of structure, such as an ␣ helix (page 55), often producing lysine becomes ionized when its amino group gains a proton. The kinks or hinges. Cysteine contains a reactive sulfhydryl greater the hydroxyl ion concentration (the higher the pH), the smaller (—SH) group and is often covalently linked to another the percentage of amino groups that are positively charged. The pH cysteine residue, as a disulfide (—SS—) bridge . at which 50 percent of the side chains of lysine are charged and 50 percent are uncharged is 10.0, which is the pK for the side chain of free Cysteine lysine. At physiologic pH, virtually all of the lysine residues of a polypeptide are positively charged. Once incorporated into a polypep- H H O H H O tide, the pK of a charged group can be greatly influenced by the N C C N C C surrounding environment.

CH 2 CH 2

SH S Oxidation + – + 2H + 2e SH Reduction S lysine are shown in Figure 2.27. At physiologic pH, the

side chains of these amino acids are almost always present CH 2 CH 2 2.5 in the fully charged state. Consequently, they are able to

N C C N C C Molecules Biological of Types Four form ionic bonds with other charged species in the cell. For example, the positively charged arginine residues of H H O H H O proteins are linked by ionic bonds to the nega- tively charged phosphate groups of DNA (see Figure 2.3). Disulfide bridges often form between two that Histidine is also considered a polar, charged amino acid, are distant from one another in the polypeptide backbone or though in most cases it is only partially charged at physi- even in two separate polypeptides. Disulfide bridges help sta- ologic pH. In fact, because of its ability to gain or lose a bilize the intricate shapes of proteins, particularly those pres- proton in physiologic pH ranges, histidine is a particu- ent outside of cells where they are subjected to added physical larly important residue in the active site of many proteins and chemical . (as in Figure 3.13). Not all of the amino acids described in this section are 2. Polar, uncharged. The side chains of these amino acids found in all proteins, nor are the various amino acids distrib- have a partial negative or positive charge and thus can uted in an equivalent manner. A number of other amino acids 54 are also found in proteins, but they arise by alterations to the (Figure 2.28 a). In contrast, the nonpolar residues are situated side chains of the 20 basic amino acids after their incorpora- predominantly in the core of the molecule (Figure 2.28 b). The tion into a polypeptide chain. For this reason they are called hydrophobic residues of the protein interior are often tightly posttranslational modifications (PTMs) . Dozens of differ- packed together, creating a type of three-dimensional jigsaw ent types of PTMs have been documented. The most wide- puzzle in which water molecules are generally excluded. spread and important PTM is the reversible addition of a Hydrophobic interactions among the nonpolar side chains of phosphate group to a serine, threonine, or tyrosine residue. these residues are a driving force during Lysine acetylation is another widespread and important PTM (page 64) and contribute substantially to the overall stability affecting thousands of proteins in a mammalian cell. PTMs of the protein. In many enzymes, reactive polar groups project can generate dramatic changes in the properties and function into the nonpolar interior, giving the protein its catalytic ac- of a protein, most notably by modifying its three-dimensional tivity. For example, a nonpolar environment can enhance ionic structure, level of activity, localization within the cell, life interactions between charged groups that would be lessened span, and/or its interactions with other molecules. The pres- by competition with water in an aqueous environment. Some ence or absence of a single phosphate group on a key regula- reactions that might proceed at an imperceptibly slow rate in tory protein has the potential to determine whether or not a water can occur in millionths of a second within the protein. cell will behave as a cancer cell or a normal cell. Because of PTMs, a single polypeptide can exist as a number of distinct The Structure of Proteins Nowhere in biology is the inti- biological molecules. mate relationship between form and function better illus- The ionic, polar, or nonpolar character of amino acid side trated than with proteins. The structure of most proteins is chains is very important in protein structure and function. completely defined and predictable. Each amino acid in one of Most soluble (i.e., nonmembrane) proteins are constructed so these giant macromolecules is located at a specific site within that the polar residues are situated at the surface of the mole- the structure, giving the protein the precise shape and reactiv- cule where they can associate with the surrounding water and ity required for the job at hand. Protein structure can be de- contribute to the protein’s solubility in aqueous solution scribed at several levels of organization, each emphasizing a

(a) (b)

Figure 2.28 Disposition of hydrophilic and hydrophobic amino acid primarily within the center of the protein, particularly in the vicinity residues in the soluble protein cytochrome c. (a) The hydrophilic side of the central group. (I LLUSTRATION , I RVING GEIS . I MAGE

The The Chemical Basis of Life chains, which are shown in green, are located primarily at the surface FROM IRVING GEIS COLLECTION /H OWARD HUGHES MEDICAL of the protein where they contact the surrounding aqueous medium. INSTITUTE . R IGHTS OWNED BY HHMI. R EPRODUCED BY (b) The hydrophobic residues, which are shown in red, are located PERMISSION ONLY .) Chapter Chapter 2 55 differences in amino acid sequence in the same protein among related . The degree to which changes in the pri- mary sequence are tolerated depends on the degree to which the shape of the protein or the critical functional residues are disturbed. The first amino acid sequence of a protein was deter- mined by and co-workers at Cambridge University in the early 1950s. Beef insulin was chosen for the study because of its availability and its small size—two polypeptide chains of 21 and 30 amino acids each. The of insulin was a momentous feat in the newly emerging field of . It revealed that proteins, the most complex molecules in cells, have a definable sub- structure that is neither regular nor repeating, unlike those of polysaccharides. Each particular polypeptide, whether insulin Figure 2.29 Scanning electron micrograph of a red blood cell from a or some other species, has a precise sequence of amino acids person with sickle cell anemia. Compare with the micrograph of a nor- that does not vary from one molecule to another. With the mal red blood cell of Figure 4.32 a. (C OURTESY OF J. T.T HORNWAITE , advent of techniques for rapid DNA sequencing (see Section B. F. C AMERON , AND R. C. L EIF .) 18.14), the primary structure of a polypeptide can be deduced from the sequence of the encoding gene. In the past few years, the complete sequences of the of hundreds of organisms, including humans, have been deter- mined. This information will eventually allow researchers to different aspect and each dependent on different types of in- learn about every protein that an can manufacture. teractions. Customarily, four such levels are described: pri- However, translating information about primary sequence mary , secondary , tertiary , and quaternary . The first, primary into knowledge of higher levels of protein structure remains a structure, concerns the amino acid sequence of a protein, formidable challenge. whereas the latter three levels concern the organization of the molecule in space. To understand the Secondary Structure All matter exists in space and therefore and biological function of a protein it is essential to know how has a three-dimensional expression. Proteins are formed by that protein is constructed. linkages among vast numbers of atoms; consequently their shape is complex. The term conformation refers to the three- Primary Structure The primary structure of a polypeptide is dimensional arrangement of the atoms of a molecule, that is, the specific linear sequence of amino acids that constitute the to their spatial organization. Secondary structure describes the chain. With 20 different building blocks, the number of dif- conformation of portions of the polypeptide chain. Early ferent polypeptides that can be formed is 20 n, where n is the studies on secondary structure were carried out by Linus Paul- number of amino acids in the chain. Because most polypep- ing and Robert Corey of the California Institute of Technology. tides contain well over 100 amino acids, the variety of possible By studying the structure of simple peptides consisting of a sequences is essentially unlimited. The information for the few amino acids linked together, Pauling and Corey con- precise order of amino acids in every protein that an organism cluded that polypeptide chains exist in preferred conforma- can produce is encoded within the of that organism. tions that provide the maximum possible number of hydrogen As we will see later, the amino acid sequence provides bonds between neighboring amino acids. the information required to determine a protein’s three- Two conformations were proposed. In one conformation, dimensional shape and thus its function. The sequence of the backbone of the polypeptide assumed the form of a cylin- amino acids, therefore, is all-important, and changes that arise drical, twisting spiral called the alpha ( ␣) helix (Figure 2.5 in the sequence as a result of genetic mutations in the DNA 2.30a,b ). The backbone lies on the inside of the helix, and the Four Types of Biological Molecules Biological of Types Four may not be readily tolerated. The earliest and best-studied side chains project outward. The helical structure is stabilized example of this relationship is the change in the amino acid by hydrogen bonds between the atoms of one peptide bond sequence of that causes the sickle cell anemia . and those situated just above and below it along the spiral This severe, inherited anemia results solely from a single (Figure 2.30 c). The X-ray diffraction patterns of actual pro- change in amino acid sequence within the hemoglobin mole- teins produced during the 1950s bore out the existence of the cule: a nonpolar valine residue is present where a charged glu- ␣ helix, first in the protein keratin found in hair and later in tamic acid is normally located. This change in hemoglobin various oxygen-binding proteins, such as myoglobin and structure can have a dramatic effect on the shape of red hemoglobin (see Figure 2.34). Surfaces on opposite sides of an blood cells, converting them from disk-shaped cells to sickle- ␣ helix may have contrasting properties. In water-soluble pro- shaped cells (Figure 2.29), which tend to clog small blood teins, the outer surface of an ␣ helix often contains polar vessels, causing pain and life-threatening crises. Not all amino residues in contact with the , whereas the surface acid changes have such a dramatic effect, as evidenced by the facing inward typically contains nonpolar side chains. 56 Figure 2.30 The . (a) Tubular repre- ␣ N sentation of an helix. The atoms of the main C O chain would just fit within a tube of this CR H radius. ( b) The helical path around a central axis O N C C H taken by the polypeptide backbone in a region of C N C C O N H ␣ helix. Each complete (360 Њ) turn of H R C the helix corresponds to 3.6 amino acid residues. N C C H N C N CO The distance along the axis between adjacent C R H C residues is 1.5 Å. ( c) The arrangement of the C H N C O atoms of the backbone of the ␣ helix and the hy- N C R H C C C drogen bonds that form between amino acids. N H N C O R Because of the helical rotation, the peptide bonds C H C C of every fourth amino acid come into close prox- N H C N imity. The approach of the carbonyl group N C R C C C O (C PO) of one peptide bond to the imine group H 3.6 N (H—N) of another peptide bond results in the C R C H O residues C formation of hydrogen bonds between them. The N H N Hydrogen C C O hydrogen bonds (orange bars) are essentially par- N C H R bond C C allel to the axis of the cylinder and thus hold the C O H N turns of the chain together. ( A: F ROM JAYNATH N H C R C C H R. B ANAVER AND AMOS MARITAN . F IGURE N O C R C CREATED BY TIMOTHY LEZON . R EPRINTED C H WITH PERMISSION FROM THE ANNUAL O REVIEW OF , V OLUME 36; ©2007, BY ANNUAL REVIEWS , I NC .) (a) (b) (c)

The second conformation proposed by Pauling and Corey of hydrogen bonds, but these are oriented perpendicular to the was the beta ( ␤)-pleated sheet , which consists of several seg- long axis of the polypeptide chain and project across from one ments of a polypeptide lying side by side (Figure 2.31 a). Un- part of the chain to another (Figure 2.31 c). The strands of a ␤ like the coiled, cylindrical form of the ␣ helix, the backbone of sheet can be arranged either parallel or antiparallel (as in Fig- each segment of polypeptide (or ␤ strand ) in a ␤ sheet as- ure 2.31 a) to one another. Like the ␣ helix, the ␤ sheet has sumes a folded or pleated conformation (Figure 2.31b). Like also been found in many different proteins. Because ␤ strands the ␣ helix, the ␤ sheet is also characterized by a large number are highly extended, the ␤ sheet resists pulling (tensile) forces.

Figure 2.31 The ␤-pleated sheet. (a) Tubular R R R R

representation of an antiparallel ␤ sheet. C C C C ( ) Each polypeptide of a ␤ sheet assumes an C N C N C N C N b N C N C N C N C extended but pleated conformation referred to as C C C C a ␤ strand. The pleats result from the location of R R R R the ␣-carbons above and below the plane of the (b) sheet. Successive side chains (R groups in the figure) project upward and downward from the backbone. The distance along the axis between adjacent residues is 3.5 Å ( c) A ␤-pleated sheet consists of a number of ␤ strands that lie parallel to one another and are joined together by a regu- lar array of hydrogen bonds between the carbonyl and imine groups of the neighboring backbones. Neighboring segments of the polypeptide backbone may lie either parallel (in → the same N-terminal C-terminal direction) o or antiparallel (in the opposite N-terminal → 7.0A C-terminal direction). ( A: F ROM JAYNATH R. (a) (c) BANAVER AND AMOS MARITAN . F IRURE CRE -

The The Chemical Basis of Life ATED BY TIMOTHY LEZON . R EPRINTED WITH PERMISSION FROM FROM THE IRVING GEIS COLLECTION /H OWARD HUGHES THE ANNUAL REVIEW OF BIOPHYSICS , VOLUME 36; © 2007, BY MEDICAL INSTITUTE . R IGHTS OWNED BY HHMI. R EPRODUCTION ANNUAL REVIEWS , I NC .; B. C: I LLUSTRATION , I RVING GEIS . I MAGE BY PERMISSION ONLY .) Chapter Chapter 2 57

Figure 2.32 A ribbon model of ribonuclease. The regions of ␣ helix are depicted as spirals and ␤ strands as flattened ribbons with the Figure 2.33 An X-ray diffraction pattern of myoglobin. The pattern arrows indicating the N-terminal → C-terminal direction of the of spots is produced as a beam of X-rays is diffracted by the atoms in polypeptide. Those segments of the chain that do not adopt a regular the protein crystal, causing the X-rays to strike the film at specific sites. secondary structure (i.e., an ␣ helix or ␤ strand) consist largely of loops Information derived from the position and intensity (darkness) of the and turns. Disulfide bonds are shown in yellow. (H AND DRAWN BY spots can be used to calculate the positions of the atoms in the protein JANE S. R ICHARDSON .) that diffracted the beam, leading to complex structures such as that shown in Figure 2.34. (C OURTESY OF JOHN C. K ENDREW .)

Silk is composed of a protein containing an extensive amount The detailed tertiary structure of a protein is usually de- of ␤ sheet; silk fibers are thought to owe their strength to this termined using the technique of X-ray .5 In architectural feature. Remarkably, a single fiber of spider silk, this technique (which is described in more detail in Sections which may be a tenth the thickness of a human hair, is roughly 3.2 and 18.8), a crystal of the protein is bombarded by a thin five times stronger than a steel fiber of comparable weight. beam of X-rays, and the radiation that is scattered (diffracted) Those portions of a polypeptide chain not organized into by the electrons of the protein’s atoms is allowed to strike a ra- an ␣ helix or a ␤ sheet may consist of hinges, turns, loops, or diation-sensitive plate or detector, forming an image of spots, finger-like extensions. Often, these are the most flexible por- such as those of Figure 2.33. When these diffraction patterns tions of a polypeptide chain and the sites of greatest biological are subjected to complex mathematical analysis, an investiga- activity. For example, molecules are known for their tor can work backward to derive the structure responsible for specific interactions with other molecules (antigens); these in- producing the pattern. teractions are mediated by a series of loops at one end of the For many years it was presumed that all proteins had a antibody molecule (see Figures 17.15 and 17.16). The various fixed three-dimensional structure, which gave each protein its types of secondary structures are most simply depicted as unique properties and specific functions. It came as a surprise 2.5 shown in Figure 2.32: ␣ helices are represented by helical rib- to discover over the past decade or so that many proteins of Four Types of Biological Molecules Biological of Types Four bons, ␤ strands as flattened arrows, and connecting segments higher organisms contain sizable segments that lack a defined as thinner strands. conformation. Examples of proteins containing these types of unstructured (or disordered ) segments can be seen in the mod- Tertiary Structure The next level above secondary structure els of the PrP protein in Figure 1 on page 67 and the histone is tertiary structure, which describes the conformation of the tails in Figure 12.13 c. The disordered regions in these proteins entire polypeptide. Whereas secondary structure is stabilized are depicted as dashed lines in the images, conveying the primarily by hydrogen bonds between atoms that form the peptide bonds of the backbone, tertiary structure is stabilized 5 by an array of noncovalent bonds between the diverse side The three-dimensional structure of small proteins can also be determined by nuclear magnetic resonance (NMR) , which is not discussed in this chains of the protein. Secondary structure is largely limited to text (see the supplement to the July issue of Struct. Biol. , 1998, Nature a small number of conformations, but tertiary structure is vir- Struct. Biol. 7:982, 2000, and Biochem. Sci. 34: 601, 2009 for reviews of tually unlimited. this technology). Figure 1 a, page 67, shows an NMR-derived structure. 58 fact that these segments of the polypeptide (like pieces of fraction patterns such as that shown in Figure 2.33. The pro- spaghetti) can occupy many different positions and, thus, can- tein they reported on was myoglobin. not be studied by X-ray crystallography. Disordered segments Myoglobin functions in muscle tissue as a storage site for tend to have a predictable amino acid composition, being oxygen; the oxygen molecule is bound to an iron atom in the enriched in charged and polar residues and deficient in hy- center of a heme group. (The heme is an example of a pros- drophobic residues. You might be wondering whether pro- thetic group , i.e., a portion of the protein that is not composed teins lacking a fully defined structure could be engaged in a of amino acids, which is joined to the polypeptide chain after useful function. In fact, the disordered regions of such pro- its assembly on the ribosome.) It is the heme group of myo- teins play key roles in vital cellular processes, often binding to that gives most muscle tissue its reddish color. DNA or to other proteins. Remarkably, these segments often The first report on the structure of myoglobin provided a undergo a physical transformation once they bind to an ap- low-resolution profile sufficient to reveal that the molecule propriate partner and are then seen to possess a defined, was compact (globular) and that the polypeptide chain was folded structure. folded back on itself in a complex arrangement. There was no Most proteins can be categorized on the basis of their evidence of regularity or symmetry within the molecule, such overall conformation as being either fibrous proteins , which as that revealed in the earlier description of the DNA double have an elongated shape, or globular proteins , which have a helix. This was not surprising considering the singular func- compact shape. Most proteins that act as structural materials tion of DNA and the diverse functions of protein molecules. outside living cells are fibrous proteins, such as collagens and The earliest crude profile of myoglobin revealed eight elastins of connective tissues, keratins of hair and skin, and rod-like stretches of ␣ helix ranging from 7 to 24 amino acids silk. These proteins resist pulling or shearing forces to which in length. Altogether, approximately 75 percent of the 153 they are exposed. In contrast, most proteins within the cell are amino acids in the polypeptide chain is in the ␣-helical con- globular proteins. formation. This is an unusually high percentage compared with that for other proteins that have since been examined. Myoglobin: The First Globular Protein Whose Tertiary Structure No ␤-pleated sheet was found. Subsequent analyses of myo- Was Determined The polypeptide chains of globular proteins globin using additional X-ray diffraction data provided a are folded and twisted into complex shapes. Distant points on much more detailed picture of the molecule (Figures 2.34 a the linear sequence of amino acids are brought next to each and 3.16). For example, it was shown that the heme group is other and linked by various types of bonds. The first glimpse situated within a pocket of hydrophobic side chains that pro- at the tertiary structure of a globular protein came in 1957 motes the binding of oxygen without the oxidation (loss of through the X-ray crystallographic studies of John Kendrew electrons) of the iron atom. Myoglobin contains no disulfide and his colleagues at Cambridge University using X-ray dif- bonds; the tertiary structure of the protein is held together

(a) (b)

Figure 2.34 The three-dimensional structure of myoglobin. (a) The indicated in red). The positions of all of the molecule’s atoms, other tertiary structure of whale myoglobin. Most of the amino acids are part than hydrogen, are shown. ( A: I LLUSTRATION , I RVING GEIS . I MAGE

The The Chemical Basis of Life of ␣ helices. The nonhelical regions occur primarily as turns, where the FROM IRVING GEIS COLLECTION /H OWARD HUGHES MEDICAL polypeptide chain changes direction. The position of the heme is indi- INSTITUTE . R IGHTS OWNED BY HHMI. R EPRODUCED BY cated in red. ( b) The three-dimensional structure of myoglobin (heme PERMISSION ONLY ; B: K EN EWARD /P HOTO RESEARCHERS .) Chapter Chapter 2 59 exclusively by noncovalent interactions. All of the noncovalent protein (Figure 2.36). Some domains have been found only in bonds thought to occur between side chains within proteins— one or a few proteins. Other domains have been shuffled hydrogen bonds, ionic bonds, and van der Waals forces—have widely about during evolution, appearing in a variety of pro- been found (Figure 2.35). Unlike myoglobin, most globular teins whose other regions show little or no evidence of an evo- proteins contain both ␣ helices and ␤ sheets. Most impor- lutionary relationship. Shuffling of domains creates proteins tantly, these early landmark studies revealed that each protein with unique combinations of activities. On average, mam- has a unique tertiary structure that can be correlated with its malian proteins tend to be larger and contain more domains amino acid sequence and its biological function. than proteins of less complex organisms, such as flies and . Protein Domains Unlike myoglobin, most eukaryotic pro- teins are composed of two or more spatially distinct modules, Dynamic Changes within Proteins Although X-ray crystallo- or domains , that fold independent of one another. For exam- graphic structures possess exquisite detail, they are static im- ple, the mammalian enzyme phospholipase C, shown in the ages frozen in time. Proteins, in contrast, are not static and central part of Figure 2.36, consists of four distinct domains, inflexible, but capable of considerable internal movements. colored differently in the drawing. The different domains of a Proteins are, in other words, molecules with “moving parts.” polypeptide often represent parts that function in a semi-in- Because they are tiny, nanometer-sized objects, proteins can dependent manner. For example, they might bind different be greatly influenced by the energy of their environment. factors, such as a coenzyme and a or a DNA strand Random, small-scale fluctuations in the arrangement of the and another protein, or they might move relatively independ- bonds within a protein create an incessant thermal motion ent of one another. Protein domains are often identified with within the molecule. Spectroscopic techniques, such as nu- a specific function. For example, proteins containing a PH do- clear magnetic resonance (NMR), can monitor dynamic main bind to membranes containing a specific phospholipid, whereas proteins containing a chromodomain bind to a methylated lysine residue in another protein. The functions of Bacterial a newly identified protein can usually be predicted by the do- phospholipase C mains of which it is made. Troponin C Many polypeptides containing more than one are thought to have arisen during evolution by the fusion of that encoded different ancestral proteins, with each domain representing a part that was once a separate molecule. Each domain of the mammalian phospholipase C molecule, for ex- ample, has been identified as a homologous unit in another

van der Waals forces

Mammalian phospho- CH lipase C CH 3 CH 3 CH 3 CH CH CH 3 3 3 CH

Hydrogen bond 2.5

C OH O NH Synapto- 2 Molecules Biological of Types Four C tagmin

Recoverin O + – CH 2 CH 2 CH 2 CH 2 NH 3 OC CH 2 Figure 2.36 Proteins are built of structural units, or domains. The Ionic bond mammalian enzyme phospholipase C is constructed of four domains, indicated in different colors. The catalytic domain of the enzyme is shown in blue. Each of the domains of this enzyme can be found inde- pendently in other proteins as indicated by the matching color. (F ROM Figure 2.35 Types of noncovalent bonds maintaining the LIISA HOLM AND CHRIS SANDER , S TRUCTURE 5:167, © 2007, conformation of proteins. WITH PERMISSION FROM ELSEVIER .) 60 movements within proteins, and they reveal shifts in hydrogen bonds, waving movements of external side chains, and the full rotation of the aromatic rings of tyrosine and phenylalanine residues about one of the single bonds. The important role that such movements can play in a protein’s function is illus- trated in studies of acetylcholinesterase, the enzyme responsi- ble for degrading acetylcholine molecules that are left behind following the transmission of an impulse from one nerve cell to another (Section 4.8). When the tertiary structure of acetylcholinesterase was first revealed by X-ray crystallogra- phy, there was no obvious pathway for acetylcholine molecules to enter the enzyme’s catalytic site, which was situated at the bottom of a deep gorge in the molecule. In fact, the narrow entrance to the site was completely blocked by the presence of a number of bulky amino acid side chains. Using high-speed Figure 2.37 Dynamic movements within the enzyme computers, researchers have been able to simulate the random acetylcholinesterase. A portion of the enzyme is depicted movements of thousands of atoms within the enzyme, a feat here in two different conformations: (1) a closed conformation that cannot be accomplished using experimental techniques. (left) in which the entrance to the catalytic site is blocked by the These molecular dynamic (MD) indicated that presence of aromatic rings that are part of the side chains of tyrosine movements of side chains within the protein would lead to the and phenylalanine residues (shown in purple) and (2) an open rapid opening and closing of a “gate” that would allow acetyl- conformation (right) in which the aromatic rings of these side chains have swung out of the way, opening the “gate” to allow acetylcholine choline molecules to diffuse into the enzyme’s catalytic site molecules to enter the catalytic site. These images are constructed using (Figure 2.37). The X-ray crystallographic structure of a pro- computer programs that take into account a host of information about tein (i.e., its crystal structure ) can be considered an average the atoms that make up the molecule, including bond lengths, bond structure, or “ground state.” A protein can undergo dynamic angles, electrostatic attraction and repulsion, van der Waals forces, etc. excursions from the ground state and assume alternate confor- Using this information, researchers are able to simulate the movements mations that are accessible based on the amount of energy that of the various atoms over a very short time period, which provides the protein contains. images of the conformations that the protein can assume. An animation Predictable (nonrandom) movements within a protein that of this image can be found on the Web at http://mccammon.ucsd.edu are triggered by the binding of a specific molecule are described (C OURTESY OF J. A NDREW MCCAMMON .) as conformational changes . Conformational changes typically involve the coordinated movements of various parts of the mol- ecule. A comparison of the polypeptides depicted in Figures 3 a and 3 b on page 82 shows the dramatic homodimer , whereas a protein composed of two nonidentical that occurs in a bacterial protein (GroEL) when it interacts subunits is a heterodimer . A ribbon drawing of a homodimeric with another protein (GroES). Virtually every activity in which protein is depicted in Figure 2.38 a. The two subunits of the a protein takes part is accompanied by conformational changes protein are drawn in different colors, and the hydrophobic within the molecule (see http://molmovdb.org to watch exam- residues that form the complementary sites of contact are in- ples). The conformational change in the protein that dicated. The best-studied multisubunit protein is hemoglo- occurs during muscle contraction is shown in Figures 9.60 and bin, the O 2-carrying protein of red blood cells. A molecule of 9.61. In this case, binding of myosin to an molecule leads to human hemoglobin consists of two ␣-globin and two ␤-globin a small (20 Њ) rotation of the head of the myosin, which results polypeptides (Figure 2.38 b), each of which binds a single in a 50 to 100 Å movement of the adjacent actin filament. The molecule of oxygen. Elucidation of the three-dimensional importance of this dynamic event can be appreciated if you con- structure of hemoglobin by Max Perutz of Cambridge Uni- sider that the movements of your body result from the additive versity in 1959 was one of the early landmarks in the study of effect of millions of conformational changes taking place within molecular biology. Perutz demonstrated that each of the four the contractile proteins of your muscles. globin polypeptides of a hemoglobin molecule has a tertiary structure similar to that of myoglobin, a fact that strongly sug- Quaternary Structure Whereas many proteins such as myo- gested the two proteins had evolved from a common ancestral globin are composed of only one polypeptide chain, most are polypeptide with a common O 2-binding mechanism. Perutz made up of more than one chain, or subunit . The subunits also compared the structure of the oxygenated and deoxy- may be linked by covalent disulfide bonds, but most often they genated versions of hemoglobin. In doing so, he discovered are held together by noncovalent bonds as occur typically be- that the binding of oxygen was accompanied by the move- tween hydrophobic “patches” on the complementary surfaces ment of the bound iron atom closer to the plane of the heme of neighboring polypeptides. Proteins composed of subunits group.This seemingly inconsequential shift in position of a sin- The The Chemical Basis of Life are said to have quaternary structure . Depending on the pro- gle atom pulled on an ␣ helix to which the iron is connected, tein, the polypeptide chains may be identical or nonidentical. which in turn led to a series of increasingly larger movements A protein composed of two identical subunits is described as a within and between the subunits. This finding revealed for the Chapter Chapter 2 61 first time that the complex functions of proteins may be carried different proteins, each with a specific function, become phys- out by means of small changes in their conformation. ically associated to form a much larger multiprotein complex . One of the first multiprotein complexes to be discovered and Protein–Protein Interactions Even though hemoglobin studied was the pyruvate dehydrogenase complex of the bac- consists of four subunits, it is still considered a single protein terium Escherichia coli , which consists of 60 polypeptide chains with a single function. Many examples are known in which constituting three different enzymes (Figure 2.39). The en- zymes that make up this complex catalyze a series of reactions connecting two metabolic pathways, and the TCA cycle (see Figure 5.7). Because the enzymes are so closely as- sociated, the product of one enzyme can be channeled directly to the next enzyme in the sequence without becoming diluted in the aqueous medium of the cell. Multiprotein complexes that form within the cell are not necessarily stable assemblies, such as the pyruvate dehydroge- nase complex. In fact, most proteins interact with other proteins in highly dynamic patterns, associating and dissociating de- pending on conditions within the cell at any given time. Interacting proteins tend to have complementary surfaces. Of- ten a projecting portion of one molecule fits into a pocket within its partner. Once the two molecules have come into close (a) contact, their interaction is stabilized by noncovalent bonds. The reddish-colored object in Figure 2.40 a is called an SH3 domain, and it is found as part of more than 200 differ- ent proteins involved in molecular signaling. The surface of an SH3 domain contains shallow hydrophobic “pockets” that be- come filled by complementary “knobs” projecting from an- other protein (Figure 2.40 b). A large number of different structural domains have been identified that, like SH3, act as adaptors to mediate interactions between proteins. In many cases, protein–protein interactions are regulated by modifica- tions, such as the addition of a phosphate group to a key amino acid, which act as a switch to turn on or off the protein’s ability to bind a protein partner. As more and more complex molecular activities have been discovered, the importance of

(b) Figure 2.38 Proteins with quaternary structure. (a) Drawing of transforming growth factor- ␤2 (TGF- ␤2), a protein that is a dimer composed of two identical subunits. The two subunits are colored yel- low and blue. Shown in white are the cysteine side chains and disulfide bonds. The spheres shown in yellow and blue are the hydrophobic 2.5 residues that form the interface between the two subunits. ( b) Drawing of a hemoglobin molecule, which consists of two ␣-globin chains and Molecules Biological of Types Four two ␤-globin chains (a heterotetramer) joined by noncovalent bonds. (a)20 nm (b) When the four globin polypeptides are assembled into a complete he- moglobin molecule, the kinetics of O 2 binding and release are quite Figure 2.39 Pyruvate dehydrogenase: a multiprotein complex. (a) different from those exhibited by isolated polypeptides. This is because Electron micrograph of a negatively stained pyruvate dehydrogenase the binding of O 2 to one polypeptide causes a conformational change complex isolated from E. coli. Each complex contains 60 polypeptide in the other polypeptides that alters their affinity for O 2 molecules. ( A: chains constituting three different enzymes. Its molecular mass FROM S. D AOPIN ET AL ., S CIENCE 257:372, COURTESY OF DAVID approaches 5 million daltons. ( b) A model of the pyruvate dehydroge- R. D AVIES , © 1992, REPRODUCED WITH PERMISSION FROM AAAS; nase complex. The core of the complex consists of a cube-like cluster of B: I LLUSTRATION , I RVING GEIS . I MAGE FROM IRVING GEIS dihydrolipoyl transacetylase molecules. Pyruvate dehydrogenase dimers REPRODUCED WITH PERMISSION FROM AAAS. C OLLECTION / (black spheres) are distributed symmetrically along the edges of the HOWARD HUGHES MEDICAL INSTITUTE . R IGHTS OWNED BY cube, and dihydrolipoyl dehydrogenase dimers (small gray spheres) are HHMI. R EPRODUCED BY PERMISSION ONLY .) positioned in the faces of the cube. (C OURTESY OF LESTER J. R EED .) 62 being studied are introduced into the same yeast cell. If the yeast cell tests positive for a particular reporter protein, which is indicated by an obvious color change in the yeast cell, then the two proteins in question had to have interacted within the yeast . Interactions between proteins are generally studied one at a time, producing the type of data seen in Figure 2.40. In re- cent years, a number of research teams have set out to study protein–protein interactions on a global scale. For example, one might want to know all of the interactions that occur among the 14,000 or so proteins encoded by the genome of the fruit fly Drosophila melanogaster . Now that the entire genome of this has been sequenced, virtually every gene within the genome is available as an individual DNA segment that can be cloned and used as desired. Consequently, it should be possible to test the proteins encoded by the fly (a) genome, two at a time, for possible interactions in a Y2H as- say. One study of this type reported that, of the millions of X X possible combinations, more than 20,000 interactions were Ar g X detected among 7048 fruit-fly proteins tested. X Although the Y2H has been the mainstay in the study of protein–protein interactions for more than 15 years, it is an indirect assay (see Figure 18.27) and is fraught with uncer- Pro Pro tainties. On one hand, a large percentage of interactions known to occur between specific proteins fail to be detected in these types of . The Y2H assay, in other words, gives a significant number of false negatives. The Y2H assay is also (b) known to generate a large number of false positives; that is, it indicates that two proteins are capable of interacting when it is Figure 2.40 Protein–protein interactions. (a) A model illustrating known from other studies that they do not do so under normal the complementary molecular surfaces of portions of two interacting conditions within cells. In the study of fruit-fly proteins de- proteins. The reddish-colored molecule is an SH3 domain of the scribed above, the authors used computer analyses to narrow enzyme PI3K, whose function is discussed in Chapter 15. This domain the findings from the original 20,000 interactions to approxi- binds specifically to a variety of proline-containing peptides, such as the one shown in the space-filling model at the top of the figure. The mately 5000 interactions in which they had high confidence. proline residues in the peptide, which fit into hydrophobic pockets on Overall, it is estimated that, on average, each protein encoded in the surface of the enzyme, are indicated. The polypeptide backbone of the genome of a eukaryotic organism interacts with about five the peptide is colored yellow, and the side chains are colored green. different protein partners. According to this estimate, human (b) Schematic model of the interaction between an SH3 domain and a proteins would engage in roughly 100,000 different interactions. peptide showing the manner in which certain residues of the peptide fit The results from large-scale protein–protein interaction into hydrophobic pockets in the SH3 domain. ( A: F ROM HONGTAO studies can be presented in the form of a network, such as that YU AND STUART SCHREIBER , C ELL 76:940, © 1994, WITH shown in Figure 2.41. This figure displays the potential bind- PERMISSION FROM ELSEVIER .) ing partners of the various yeast proteins that contain an SH3 domain (see Figure 2.40 a) and illustrates the complexities of such interactions at the level of an entire organism. In this interactions among proteins has become increasingly appar- particular figure, only those interactions detected by two en- ent. For example, such diverse processes as DNA synthesis, tirely different assays (Y2H and a biochemical technique that ATP formation, and RNA processing are all accomplished by involves purification and analysis of protein complexes) are “molecular machines” that consist of a large number of inter- displayed, which makes the conclusions much more reliable acting proteins, some of which form stable relationships and than those based on a single technology. Those proteins that others transient liasons. Several hundred different protein have multiple binding partners, such as Las17 situated near complexes have been purified in large-scale studies on yeast. the center of Figure 2.41, are referred to as hubs . Hub proteins Most investigators who study protein–protein interac- are more likely than non-hub proteins to be essential proteins, tions want to know whether one protein they are working that is, proteins that the organism cannot survive without. with, call it protein X, interacts physically with another pro- Some hub proteins have several different binding interfaces tein, call it protein Y. This type of question can be answered and are capable of binding a number of different binding part- The The Chemical Basis of Life using a genetic technique called the yeast two-hybrid (Y2H) ners at the same time. In contrast, other hubs have a single assay, which is discussed in Section 18.7 and illustrated in Fig- binding interface, which is capable of binding several different ure 18.27. In this technique, genes encoding the two proteins partners, but only one at a time. Examples of each of these Chapter Chapter 2 63

Vrp1 Bck1 Yhl002w ously unknown. What do these proteins do? One approach to Bni1 Ubp7 Prk1 determining a protein’s function is to identify the proteins Bnr1 Srv2 Ykl105c with which it associates. If, for example, a known protein has Ark1 Pbs1 Ydl146w Abp1 been shown to be involved in DNA replication, and an un- Myo5 Fun21 Myo3 Fus1 known protein is found to interact with the known protein, Ydr409w Sho1 Yfr024c Bzz1 Las17 then it is likely that the unknown protein is also part of the Yir003w Ypr171w Sla1 cell’s DNA-replication machinery. Thus, regardless of their Bbc1 limitations, these large-scale Y2H studies (and others using Ygr136w Acf2 Yer158c different assays not discussed) provide the starting point to Ygr268c Rvs167 Ysc84 explore a myriad of previously unknown protein–protein in- Boi1 Ynl09w teractions, each of which has the potential to lead investiga- Ypr154w Yjr083c tors to a previously unknown biological process. Ymr192w

Yor197w Ypl249c Protein Folding The elucidation of the tertiary structure of Figure 2.41 A network of protein–protein interactions. Each red myoglobin in the late 1950s led to an appreciation for the line represents an interaction between two yeast proteins, which are complexity of protein architecture. An important question indicated by the named black dots. In each case, the arrow points from immediately arose: How does such a complex, folded, asym- an SH3 domain protein to a target protein with which it can bind. The metric organization arise in the cell? The first insight into this 59 interactions depicted here were detected using two different types of problem began with a serendipitous observation in 1956 by techniques that measure protein–protein interactions. (See Trends Christian Anfinsen at the National Institutes of . An- Biochem. Sci . 34:1, and 579, 2009, for discussion of the validity of finsen was studying the properties of ribonuclease A, a small protein–protein interaction studies.) (F ROM A.H.Y. T ONG , ET AL ., enzyme that consists of a single polypeptide chain of 124 295:323, 2002, COPYRIGHT © 2002. REPRINTED WITH amino acids with 4 disulfide bonds linking various parts of the PERMISSION FROM AAAS.) chain. The disulfide bonds of a protein are typically broken (reduced) by adding a reducing agent, such as mercap- types of hub proteins are illustrated in Figure 2.42. The hub toethanol, which converts each disulfide bridge to a pair of protein depicted in Figure 2.42 a plays a central role in the sulfhydryl (—SH) groups (see drawing, page 53). To make all process of , while that shown in Figure 2.42 b of the disulfide bonds accessible to the reducing agent, Anfin- plays an equally important role in the process of cell division. sen found that the molecule had to first be partially unfolded. Aside from obtaining a lengthy list of potential interac- The unfolding or disorganization of a protein is termed tions, what do we learn about cellular activities from these denaturation , and it can be brought about by a variety of types of large-scale studies? Most importantly, they provide a agents, including detergents, organic solvents, radiation, heat, guideline for further investigation. Genome-sequencing proj- and compounds such as urea and guanidine chloride, all of ects have provided scientists with the amino acid sequences of which interfere with the various interactions that stabilize a a huge number of proteins whose very existence was previ- protein’s tertiary structure. 2.5 Four Types of Biological Molecules Biological of Types Four

(a) (b) Figure 2.42 Protein–protein interactions of hub proteins. (a) The Cdc28 binds a number of different proteins (Cln1-Cln3) at the same enzyme RNA polymerase II, which synthesizes messenger in interface, which allows only one of these partners to bind at a time. the cell, binds a multitude of other proteins simultaneously using (F ROM DAMIEN DEVOS AND ROBERT B. R USSELL , C URR . O PIN . multiple interfaces. ( b) The enzyme Cdc28, which phosphorylates STRUCT . B IOL . 17:373, © 2007, WITH PERMISSION OF ELSEVIER .) other proteins as it regulates the cell division cycle of budding yeast. 64 When Anfinsen treated ribonuclease molecules with open to more than one interpretation. For the sake of simplic- mercaptoethanol and concentrated urea, he found that the ity, we will restrict the discussion to “simple” proteins, such as preparation lost all of its enzymatic activity, which would be ribonuclease, that consist of a single domain. One fundamen- expected if the protein molecules had become totally un- tal issue that has been extensively debated is whether all of the folded. When he removed the urea and mercaptoethanol from members of a of unfolded proteins of a single the preparation, he found, to his surprise, that the molecules species fold along a similar pathway or fold by means of a di- regained their normal enzymatic activity. The active ribonu- verse set of routes that somehow converge upon the same na- clease molecules that had re-formed from the unfolded pro- tive state. Recent studies that have simulated folding with the tein were indistinguishable both structurally and functionally aid of specialized supercomputers appear to have swung the from the correctly folded (i.e., native ) molecules present at the consensus toward the idea that single-domain proteins fold beginning of the (Figure 2.43). After extensive along a single dominant pathway characterized by a unique, study, Anfinsen concluded that the linear sequence of amino relatively well-defined transition state (see Figure 2.45). acids contained all of the information required for the forma- Another issue that has been roundly debated concerns the tion of the polypeptide’s three-dimensional conformation. Ri- types of events that occur at various stages during the folding bonuclease, in other words, is capable of self-assembly . As process. In the course depicted in Figure 2.44 a, protein fold- discussed in Chapter 3, events tend to progress toward states ing is initiated by interactions among neighboring residues of lower energy. According to this concept, the tertiary struc- that lead to the formation of much of the secondary structure ture that a polypeptide chain assumes after folding is the ac- of the molecule. Once the ␣ helices and ␤ sheets are formed, cessible structure with the lowest energy, which makes it the subsequent folding is driven by hydrophobic interactions that most thermodynamically stable structure that can be formed bury nonpolar residues together in the central core of the pro- by that chain. It would appear that evolution selects for those tein. According to an alternate scheme shown in Figure 2.44 b, amino acid sequences that generate a polypeptide chain capa- the first major event in protein folding is the hydrophobic col- ble of spontaneously arriving at a meaningful in a lapse of the polypeptide to form a compact structure in which biologically reasonable time period. the backbone adopts a native-like topology. Only after this There have been numerous controversies in the study of collapse does significant secondary structure develop. Recent protein folding. Many of these controversies stem from the studies indicate that the two pathways depicted in Figure 2.44 fact that the field is characterized by highly sophisticated ex- lie at opposite extremes, and that most proteins probably fold perimental, spectroscopic, and computational procedures that by a middle-of-the-road scheme in which secondary structure are required to study complex molecular events that typically formation and compaction occur simultaneously. These early occur on a microsecond timescale. These efforts have often yielded conflicting results and have generated data that are

Collapse

Unfolding (urea + mercaptoethanol) Unfolded Secondary Native structure (a) Refolding

Collapse

Figure 2.43 Denaturation and refolding of ribonuclease. A native ribonuclease molecule (with intramolecular disulfide bonds indicated) Unfolded Secondary Native is reduced and unfolded with ␤-mercaptoethanol and 8 M urea. After structure removal of these , the protein undergoes spontaneous refolding. (b) The The Chemical Basis of Life (F ROM C. J. E PSTEIN , R. F. G OLDBERGER , AND C. B. A NFINSEN , COLD SPRING HARBOR SYMP . Q UANT . B IOL . 28:439, 1963. Figure 2.44 Two alternate pathways by which a newly synthesized or REPRINTED WITH PERMISSION FROM COLD SPRING HARBOR denatured protein could achieve its native conformation. Curled LABORATORY PRESS .) segments represent ␣ helices, and arrows represent ␤ strands. Chapter Chapter 2 65 The Role of Molecular Chaperones Not all proteins are able to assume their final tertiary structure by a simple process of self- assembly. This is not because the primary structure of these proteins lacks the required information for proper folding, but rather because proteins undergoing folding have to be pre- vented from interacting nonselectively with other molecules in the crowded compartments of the cell. Several families of proteins have evolved whose function is to help unfolded or misfolded proteins achieve their proper three-dimensional conformation. These “helper proteins” are called molecular chaperones , and they selectively bind to short stretches of hydrophobic amino acids that tend to be exposed in nonnative Figure 2.45 Along the folding pathway. The image on the left shows proteins but buried in proteins having a native conformation. the native tertiary structure of the enzyme acyl-. The im- Figure 2.46 depicts the activities of two families of molec- age on the right is the transition structure, which represents the state of ular chaperones that operate in the of eukaryotic cells. the molecule at the top of an energy barrier that must be crossed if the Molecular chaperones are involved in a multitude of activities protein is going to reach the native state. The transition structure con- within cells, ranging from the import of proteins into or- sists of numerous individual lines because it is a set (ensemble) of ganelles (see Figure 8.47 a) to the prevention and reversal of closely related structures. The overall architecture of the transition protein aggregation. We will restrict the discussion to their structure is similar to that of the native protein, but many of the finer actions on newly synthesized proteins. Polypeptide chains are structural features of the fully folded protein have yet to emerge. Con- synthesized on ribosomes by the addition of amino acids, one version of the transition state to the native protein involves completing at a time, beginning at the chain’s N-terminus (step 1, Figure secondary structure formation, tighter packing of the side chains, and 2.46). Chaperones of the Hsp70 family bind to elongating finalizing the burial of hydrophobic side chains from the aqueous sol- vent. (F ROM K. L INDORFF -L ARSEN , ET AL , T RENDS BIOCHEM . S CI . polypeptide chains as they emerge from an exit channel 30:14, 2005, FIG . 1 B. © 2005, WITH PERMISSION FROM ELSEVIER . within the large subunit of the ribosome (step 2). Hsp70 IMAGE FROM CHRISTOPHER DOBSON .) chaperones are thought to prevent these partially formed polypeptides (i.e., nascent polypeptides) from binding to other proteins in the cytosol, which would cause them either to ag- gregate or misfold. Once their synthesis has been completed folding events lead to the formation of a partially folded, tran- (step 3), many of these proteins are simply released by the sient structure that resembles the native protein but lacks chaperones into the cytosol where they spontaneously fold many of the specific interactions between amino acid side chains into their native state (step 4). Other proteins are repeatedly that are present in the fully folded molecule (Figure 2.45). bound and released by chaperones until they finally reach their If the information that governs folding is embedded in a fully folded state. Many of the larger polypeptides are trans- protein’s amino acid sequence, then alterations in this se- ferred from Hsp70 proteins to a different type of chaperone quence have the potential to change the way a protein folds, called a chaperonin (step 5). Chaperonins are cylindrical pro- leading to an abnormal tertiary structure. In fact, many muta- tein complexes that contain chambers in which newly synthe- tions responsible for inherited disorders have been found to sized polypeptides can fold without interference from other alter a protein’s three-dimensional structure. In some cases, macromolecules in the cell. TRiC is a chaperonin thought to the consequences of protein misfolding can be fatal. Two ex- assist in the folding of up to 15 percent of the polypeptides amples of fatal neurodegenerative that result from ab- synthesized in mammalian cells. The discovery and mecha- normal protein folding are discussed in the accompanying nism of action of Hsp70 and chaperonins are discussed in Human Perspective. depth in the Experimental Pathways on page 80. 2.5 mRNA mRNA Four Types of Biological Molecules Biological of Types Four

Ribosome Native polypeptide Nascent polypeptide Chaperone 1 (Hsp70)

3 4 2

Figure 2.46 The role of molecular chaperones in encouraging Chaperonin (TRiC) protein folding. The steps are described in the text. (Other families of chaperones are known but are not discussed.) 5 66 THE HUMAN PERSPECTIVE Protein Misfolding Can Have Deadly Consequences In April 1996 a paper was published in the medical journal Lancet that composed solely of protein. He called the protein a prion . This “protein generated widespread alarm in the of Europe. The paper only” hypothesis, as it is called, was originally met with considerable described a study of 10 persons afflicted with Creutzfeld-Jakob disease skepticism, but subsequent studies by Prusiner and others have provided (CJD), a rare, fatal disorder that attacks the brain, causing a loss of mo- overwhelming support for the proposal. It was presumed initially that tor coordination and dementia. Like numerous other diseases, CJD can the prion protein was an external agent—some type of virus-like parti- occur as an inherited disease that runs in certain families or as a sporadic cle lacking . Contrary to this expectation, the prion protein form that appears in individuals who have no family history of the dis- was soon shown to be encoded by a gene (called PRNP ) within the cell’s ease. Unlike virtually every other inheritable disease, however, CJD can own chromosomes.The gene is expressed within normal brain tissue and also be acquired . Until recently,persons who had acquired CJD had been encodes a protein designated PrP C (standing for prion protein cellular) recipients of organs or products that were donated by a person that resides at the surface of nerve cells.The precise function of PrP C re- with undiagnosed CJD. The cases described in the 1996 Lancet paper mains a mystery. A modified version of the protein (designated PrP Sc , had also been acquired, but the apparent source of the disease was con- standing for prion protein scrapie) is present in the brains of humans taminated beef that the infected individuals had eaten years earlier.The with CJD. Unlike the normal PrP C, the modified version of the protein contaminated beef was derived from cattle raised in England that had accumulates within nerve cells, forming aggregates that kill the cells. contracted a neurodegenerative disease that caused the animals to lose In their purified states, PrP C and PrP Sc have very different phys- motor coordination and develop demented behavior. The disease be- ical properties. PrP C remains as a monomeric molecule that is soluble came commonly known as “mad cow disease.” Patients who have ac- in solutions and is readily destroyed by protein-digesting en- quired CJD from eating contaminated beef can be distinguished by zymes. In contrast, PrP Sc molecules interact with one another to form several criteria from those who suffer from the classical forms of the dis- insoluble fibrils that are resistant to enzymatic digestion. Based on ease.To date, roughly 200 people have died of CJD acquired from con- these differences, one might expect these two forms of the PrP pro- taminated beef, and the numbers of such deaths have been declining. 1 tein to be composed of distinctly different sequences of amino acids, A disease that runs in families can invariably be traced to a faulty but this is not the case. The two forms can have identical amino acid gene, whereas diseases that are acquired from a contaminated source sequences, but they differ in the way the polypeptide chain folds to can invariably be traced to an infectious agent. How can the same dis- form the three-dimensional protein molecule (Figure 1). Whereas a ease be both inherited and infectious? The answer to this question has PrP C molecule consists largely of ␣-helical segments and interconnect- emerged gradually over the past several decades, beginning with ob- ing coils, the core of a PrP Sc molecule consists largely of ␤ sheet. servations by D. Carleton Gajdusek in the 1960s on a strange malady It is not hard to understand how a polypeptide might be that once afflicted the native population of Papua, New Guinea. Gaj- less stable and more likely to fold into the abnormal PrP Sc conforma- dusek showed that these islanders were contracting a fatal neurode- tion, but how is such a protein able to act as an infectious agent? Ac- generative disease—which they called “kuru”—during a funeral ritual cording to the prevailing hypothesis, an abnormal prion molecule in which they ate the brain tissue of a recently deceased relative. Au- (PrP Sc ) can bind to a normal protein molecule (PrP C) and cause the topsies of the brains of patients who had died of kuru showed a dis- normal protein to fold into the abnormal form. This conversion can tinct , referred to as spongiform encephalopathy , in which be shown to occur in the test tube: addition of PrP Sc to a preparation certain brain regions were riddled with microscopic holes (vacuola- of PrP C can convert the PrP C molecules into the PrP Sc conformation. tions), causing the tissue to resemble a . According to this hypothesis, the appearance of the abnormal protein It was soon shown that the brains of islanders suffering from kuru in the body—whether as a result of a rare misfolding event in the case were strikingly similar in microscopic appearance to the brains of per- of sporadic disease or by exposure to contaminated beef—starts a sons afflicted with CJD.This observation raised an important question: chain reaction in which normal protein molecules in the cells are Did the brain of a person suffering from CJD, which was known to be gradually converted to the misshapen prion form as they are recruited an inherited disease, contain an infectious agent? In 1968, Gajdusek into growing insoluble fibrils. The precise mechanism by which pri- showed that when extracts prepared from a biopsy of the brain of a per- ons lead to neurodegeneration remains unclear. son who had died from CJD were injected into a suitable laboratory an- CJD is a rare disease caused by a protein with unique infective imal, that animal did indeed develop a spongiform encephalopathy properties. Alzheimer’s disease (AD), on the other hand, is a com- similar to that of kuru or CJD. Clearly, the extracts contained an infec- mon disorder that strikes as many as 10 percent of individuals who tious agent, which at the time was presumed to be a virus. are at least 65 years of age and perhaps 40 percent of individuals who In 1982, Stanley Prusiner of the University of California, San are 80 years or older. Persons with AD exhibit memory loss, confu- Francisco, published a paper suggesting that, unlike viruses, the infec- sion, and a loss of reasoning ability. CJD and AD share a number of tious agent responsible for CJD lacked nucleic acid and instead was important features. Both are fatal neurodegenerative diseases that can occur in either an inherited or sporadic form. Like CJD, the brain of a person with Alzheimer’s disease contains fibrillar deposits 1 On the surface, this would suggest that the epidemic has run its course, but of an insoluble material referred to as amyloid (Figure 2). 2 In both there are several reasons for public health officials to remain concerned. For one, studies of tissues that had been removed during surgeries in England indicate that thousands of people are likely to be infected with the disease without ex- 2It should be noted that the term amyloid is not restricted to the abnormal pro- hibiting symptoms (discussed in Science 335:411, 2012). Even if these individu- tein found in AD. Many different proteins are capable of assuming an abnormal

The The Chemical Basis of Life als never develop clinical disease, they remain potential carriers who could pass conformation that is rich in ␤-sheet, which causes the protein monomers to ag- CJD on to others through blood transfusions. In fact, at least two individuals are gregate into characteristic amyloid fibrils that bind certain dyes. Amyloid fibrils believed to have contracted CJD after receiving blood from a donor harboring are defined by their molecular structure in which the ␤-strands are oriented per- the disease. These findings underscore the need to test blood for the presence of pendicular to the long axis of the fibrils. The PrP Sc -forming fibrils of prion dis- the responsible agent (whose nature is discussed below). eases are also described as amyloid. Chapter Chapter 2 67

(a) (b)

Figure 1 A contrast in structure. (a) Tertiary structure of the normal fold very differently. As a result of the differences in folding, PrP C C (PrP ) protein as determined by NMR spectroscopy. The orange por- remains soluble, whereas PrP Sc produces aggregates that kill the cell. tions represent ␣-helical segments, and the blue portions are short (The two molecules shown in this figure are called conformers because ␤ strands. The yellow dotted line represents the N-terminal portion of they differ only in conformation.) ( A: F ROM R. R IEK , ET AL ., FEBS the polypeptide, which lacks defined structure. ( b) A proposed model 413, 287, F IG . 1. © 1997, WITH PERMISSION FROM ELSEVIER . I MAGE Sc of the abnormal, infectious (PrP ) prion protein, which consists largely FROM KURT WÜTHRICH . B: R EPRINTED FROM S. B. P RUSINER , of ␤-sheet. The actual tertiary structure of the prion protein has not TRENDS BIOCHEM . S CI . 21:483, 1996 C OPYRIGHT 1996, WITH been determined. The two molecules shown in this figure are formed PERMISSION FROM ELSEVIER .) by polypeptide chains that can be identical in amino acid sequence but diseases, the fibrillar deposits result from the self-association of a Over the past two decades, research on AD has been dominated polypeptide composed predominantly of ␤ sheet. There are also by the amyloid hypothesis , which contends that the disease is caused many basic differences between the two diseases: the proteins that by the production of a molecule, called the amyloid ␤-peptide (A␤). form the disease-causing aggregates are unrelated, the parts of the A␤ is originally part of a larger protein called the amyloid precursor brain that are affected are distinct, and the protein responsible for AD protein (APP ), which spans the nerve . The A ␤ pep- is not considered to be an infectious agent (i.e., it does not spread in tide is released from the APP molecule following cleavage by two a contagious pattern from one person to another, although it may specific enzymes, ␤-secretase and ␥-secretase (Figure 3). The length spread from cell to cell within the brain). of the A ␤ peptide is somewhat variable. The predominant species

Normal Alzheimer’s

Neurofibrillary tangles

Neuron 2.5 Four Types of Biological Molecules Biological of Types Four

Amyloid plaque Neurofibrillary tangle (NFT) (a)

Figure 2 Alzheimer’s disease. (a) The defining characteristics of brain Perspective, are composed of misfolded tangles of a protein called tau that tissue from a person who died of Alzheimer’s disease. ( b) Amyloid plaques is involved in maintaining the microtubule organization of the nerve cell. containing aggregates of the A ␤ peptide appear extracellularly (between Both the plaques and tangles have been implicated as a cause of the dis- nerve cells), whereas neurofibrillary tangles (NFTs) appear within the ease. ( A: © T HOMAS DEERINCK , NCMIR/P HOTO RESEARCHERS , I NC . cells themselves. NFTs, which are discussed at the end of the Human B: © A MERICAN HEALTH ASSISTANCE FOUNDATION .) 68 as the underlying basis of the disease. The strongest argument Lumen of endoplastic reticulum γ− Secretase (becomes extracellular space) against the amyloid hypothesis is the weak correlation that can exist between the number and size of amyloid plaques in the brain and the severity of the disease. Elderly persons who show little or no sign of memory loss or dementia can have relatively high levels of amyloid deposits in their brain and those with severe disease can have little or no amyloid deposition. All of the currently on the market for the treatment of NH2 Amyloid precursor protein Aβ COOH (APP) peptide AD are aimed only at management of symptoms; none has any effect on stopping disease progression. With the amyloid hypothesis as the guiding influence, researchers have followed three basic strategies in the pursuit of new drugs for the prevention and/or reversal of men- Aβ40 tal decline associated with AD. These strategies are (1) to prevent peptide the formation of the A ␤42 peptide in the first place; (2) to remove β− Secretase the A ␤42 peptide (or the amyloid deposits it produces) once it has been formed; and (3) to prevent the interaction between A ␤ mole- cules, thereby preventing the formation of both and fibril- lar aggregates. Before examining each of these strategies, we can Aβ42 consider how investigators can determine what type of drugs might peptide be successful in the prevention or treatment of AD. Cytoplasm One of the best approaches to the development of treatments for human diseases is to find laboratory animals, particularly mice, Figure 3 Formation of the A ␤ peptide. The A ␤ peptide is carved that develop similar diseases, and use these animals to test the effec- from the amyloid precursor protein (APP) as the result of cleavage by tiveness of potential therapies. Animals that exhibit a disease that two enzymes, ␤-secretase and ␥-secretase. It is interesting that APP mimics a human disease are termed animal models . For whatever rea- and the two secretases are all proteins that span the membrane. Cleav- son, the brains of aging mice show no evidence of the amyloid de- age of APP occurs inside the cell (probably in the endoplasmic reticu- posits found in humans, and, up until 1995, there was no animal lum), and the A ␤ product is ultimately secreted into the space outside model for AD. Then, in that year, researchers found that they could of the cell. The ␥-secretase can cut at either of two sites in the APP create a strain of mice that developed amyloid plaques in their brain molecule, producing either A ␤40 or A ␤42 peptides, the latter of which and performed poorly at tasks that required memory. They created is primarily responsible for production of the amyloid plaques seen in this strain by genetically the mice to carry a mutant hu- Figure 2. ␥-Secretase is a multisubunit enzyme that cleaves its man APP gene, one responsible for causing AD in families. These substrate at a site within the membrane. genetically engineered ( transgenic ) mice have proven invaluable for testing potential therapies for AD. The greatest excitement in the field of AD therapeutics has centered on the second strategy men- tioned above, and we can use these investigations to illustrate some has a length of 40 amino acids (designated as A ␤40), but a minor of the steps required in the development of a new . species with two additional hydrophobic residues (designated as In 1999, Dale Schenk and his colleagues at Elan Pharmaceuticals A␤42) is also produced. Both of these peptides can exist in a soluble published an extraordinary finding. They had discovered that the for- form that consists predominantly of ␣ helices, but A ␤42 has a ten- mation of amyloid plaques in mice carrying the mutant human APP dency to spontaneously refold into a very different conformation gene could be blocked by repeatedly injecting the animals with the that contains considerable ␤-pleated sheet. It is the misfolded A ␤42 very same substance that causes the problem, the aggregated A ␤42 version of the molecule that has the greatest potential to cause dam- peptide. In effect, the researchers had immunized (i.e., vaccinated) the age to the brain. A ␤42 tends to self-associate to form small com- mice against the disease. When young (6-week-old) mice were immu- plexes (oligomers) as well as large aggregates that are visible as fibrils nized with A ␤42, they failed to develop the amyloid brain deposits as in the . These amyloid fibrils are deposited out- they grew older. When older (13-month-old) mice whose brains al- side of the nerve cells in the form of extracellular amyloid plaques ready contained extensive amyloid deposits were immunized with the (Figure 2). Although the issue is far from settled, a body of evidence A␤42, a significant fraction of the fibrillar deposits was cleared out of suggests that it is the soluble oligomers that are most toxic to nerve the nervous system. Even more importantly, the immunized mice per- cells, rather than the insoluble aggregates. Cultured nerve cells, for formed better than their nonimmunized littermates on memory-based example, are much more likely to be damaged by the presence of sol- tests. uble intracellular A ␤ oligomers than by either A ␤ monomers or ex- The dramatic success of these experiments on mice, combined tracellular fibrillar aggregates. In the brain, the A ␤ oligomers appear with the fact that the animals showed no ill effects from the immu- to attack the synapses that connect one nerve cell to another and nization procedure, led government regulators to quickly approve a eventually lead to the death of the nerve cells. Persons who suffer Phase I clinical trial of the A ␤42 vaccine. A Phase I clinical trial is the from an inherited form of AD carry a mutation that leads to an in- first step in testing a new drug or procedure in humans and usually creased production of the A ␤42 peptide. Overproduction of A ␤42 comes after years of preclinical testing on cultured cells and animal can be caused by possession of extra copies (duplications) of the APP models. Phase I tests are carried out on a small number of subjects gene, by mutations in the APP gene, or by mutations in genes and are designed to monitor the safety of the therapy and the optimal (PSEN1 , PSEN2 ) that encode subunits of ␥-secretase. Individuals dose of the drug rather than its effectiveness against the disease. The The Chemical Basis of Life with such mutations exhibit symptoms of the disease at an early age, None of the subjects in two separate Phase I trials of the A ␤ typically in their 50s. The fact that all mutations associated with vaccine showed any ill-effects from the injection of the amyloid pep- these inherited, early-onset forms of AD lead to increased produc- tide. As a result, the investigators were allowed to proceed to a Phase tion of A ␤42 is the strongest argument favoring amyloid formation II clinical trial, which involves a larger group of subjects and is Chapter Chapter 2 69 designed to obtain a measure of the effectiveness of the procedure persons who are at very high risk of developing AD before they de- (or drug). This particular Phase II trial was carried out as a random- velop symptoms. The first clinical trial of this type was begun in ized, double-blind, placebo-controlled study. In this type of study: 2012 as several hundred individuals who would normally be destined 1. the patients are randomly divided into two groups that are treated to develop early-onset AD (due to mutations in the PSEN1 gene) ␤ similarly except that one group is given the curative factor (pro- were treated with an anti-A antibody in the hopes of blocking the tein, antibodies, drugs, etc.) being investigated and the other future buildup of amyloid and preventing the disease. group is given a placebo (an inactive substance that has no thera- Drugs have also been developed that follow the other two peutic value), and strategies outlined above. Alzhemed and scyllo-inositol are two ␤ 2. small molecules that bind to A peptides and block molecular aggre- the study is double-blinded , which means that neither the gation and fibril formation. Clinical trials have failed to demonstrate researchers nor patients know who is receiving treatment and that either drug is effective in stopping disease progression in pa- who is receiving the placebo. tients with mild to moderate AD. The third strategy outlined above The Phase II trial for the A ␤ vaccine began in 2001 and en- is to stop production of A ␤ peptides. This can be accomplished by rolled more than 350 individuals in the United States and Europe inhibiting either ␤- or ␥-secretase, because both enzymes are re- who had been diagnosed with mild to moderate AD. After receiving quired in the pathway that cleaves the APP precursor to release the in- two injections of synthetic ␤-amyloid (or a placebo), 6 percent of the ternal peptide (Figure 3). Pharmaceutical companies have had great subjects experienced a potentially life-threatening inflammation of difficulty developing a ␤-secretase inhibitor that is both potent and the brain. Most of these patients were successfully treated with small enough to enter the brain from the bloodstream. A number of steroids, but the trial was discontinued. potent ␥-secretase inhibitors have been developed that block the Once it had become apparent that vaccination of patients with production of all A ␤ peptides, both in cultured nerve cells and in A␤42 had inherent risks, it was decided to pursue a safer form of im- transgenic AD mice. But there is a biological problem that has to be munization therapy, which is to administer antibodies directed overcome with this class of inhibitors. In addition to cleaving APP, against A ␤ that have been produced outside the body. This type of ␥-secretase activity is also required in a key signaling pathway in- approach is known as passive immunization because the person does volving a protein called Notch. Two of the most promising ␥-secre- not produce the therapeutic antibodies themselves. Passive immu- tase inhibitors, flurizan and semagacestat, have both failed to show nization with an anti-A ␤42 antibody (called bapineuzumab) had already proven capable of restoring memory function in transgenic mice and was quickly shown to be safe, and apparently effective, in Phase I and II clinical trials. The last step before government ap- proval is a Phase III trial, which typically employs large numbers of subjects (a thousand or more at several research centers) and com- pares the effectiveness of the new treatment against standard ap- proaches. The first results of the Phase III trials on bapineuzumab Healthy were reported in 2008 and were disappointing; there was little or no brain evidence that the antibody provided benefits in preventing the pro- gression of the disease. Despite these findings, a large number of dif- ferent antibodies targeting amyloid peptides are currently in clinical trials. Given the impact of AD on human health and the large amount of money that could be earned from this type of drug, phar- maceutical companies are willing to take the risk that one of these immunologic strategies will exhibit some therapeutic value. Meanwhile, a comprehensive analysis of some of the patients who had been vaccinated with A ␤42 in the original immunization trial from 2001 was also reported in 2008. Analysis of this patient group indicated that the A ␤42 vaccination had had no effect on pre- venting disease progression. It was particularly striking that in sev- Alzheimer’s eral of these patients who had died of severe dementia, there were disease virtually no amyloid plaques left in their brains.This finding strongly suggests that removal of amyloid deposits in a patient already suffer- 2.5 ing the symptoms of mild-to-moderate dementia does not stop dis- ease progression. These results can be interpreted in more than one Molecules Biological of Types Four way. One interpretation is that the amyloid deposits are not the cause Figure 4 A neuroimaging technique that reveals the presence of of the symptoms of dementia. An alternate interpretation is that ir- amyloid in the brain. These PET (positron emission tomography) reversible toxic effects of the deposits had already occurred by the scans show the brains of two individuals that have ingested a radioac- time immunization had begun and it was too late to reverse the dis- tive compound, called Amyvid, that binds to amyloid deposits and ap- ease course using treatments that remove existing amyloid deposits. pears red in the image. The top shows a healthy brain and the bottom a It is important to note, in this regard, that the formation of amyloid brain from a patient with AD, revealing extensive amyloid build-up. deposits in the brain begins 10 or more years before any clinical Amyloid deposits in the brain can be detected with this technique in symptoms of AD are reported. It is possible that, if these treatments persons who show no evidence of cognitive dysfunction. Such had started earlier, the symptoms of the disease might never have ap- symptom-free individuals are presumed to be at high risk of going on peared. Recent advances in brain-imaging procedures now allow cli- to develop AD. Those who lack such deposits can be considered nicians to observe amyloid deposits in the brains of individuals long at very low risk of the disease in the near future. (C OURTESY OF ELI before any symptoms of AD have developed (Figure 4). Based on LILLY /A VID PHARMACEUTICALS .) these studies, it may be possible to begin preventive treatments in 70 any benefit in stopping AD progression. In addition to its lack of ef- much better with cognitive dysfunction and neuronal loss than does the ficacy, semagacestat caused adverse side effects that were probably a concentration of amyloid plaques. Given that mutations in genes in result of blockade of the Notch pathway. The goal of drug designers the A ␤ pathway are clearly a cause of AD, and yet it is the NFT burden is to develop a compound (e.g., begacestat) that blocks APP cleavage that correlates with cognitive decline, it would appear that both A␤ but does not interfere with cleavage of Notch. and NFTs must be involved in AD etiology. Many researchers Taken collectively, the apparent failure of all of these drugs, believe that A ␤ deposition somehow leads to NFT formation, but aimed at various steps in the formation of A ␤-containing aggregates the mechanism by which this might occur remains unknown. and amyloid deposition, has left the field of AD therapeutics with- It is evident from this discussion that a great deal of work on out a clear plan for the future. Some pharmaceutical companies are AD has been based on transgenic mice carrying human AD genes. continuing to develop new drugs aimed at blocking the formation of These animals have served as the primary preclinical subjects for amyloid aggregates, whereas others are moving in different direc- testing AD drugs, and they have been used extensively in basic re- tions. These findings also raise a more basic question: Is the A ␤ pep- search that aims to understand the disease mechanisms responsible tide even part of the underlying mechanism that leads to AD? It for the development of AD. But many questions have been raised as hasn’t been mentioned, but A ␤ is not the only misfolded protein to how accurately these animal models mimic the disease in humans, found in the brains of persons with AD. Another protein called tau, particularly the sporadic human cases in which affected individuals which functions as part of a nerve cell’s (Section 9.3), lack the mutant genes that cause the animals to develop the corre- can develop into bundles of tangled cellular filaments called neu- sponding disorder. In fact, one of the most promising new drugs at rofibrillary tangles (or NFTs) (Figure 2) that interfere with the the time of this writing is one that acts on NFTs rather than ␤-amy- movement of substances down the length of the nerve cell. NFTs loid. In this case, the drug methylthioninium chloride (brand name form when the tau molecules in nerve cells become excessively phos- Rember), which dissolves NFTs, was tested on a group of more than phorylated. Mutations in the gene that encodes tau have been found 300 patients with mild to moderate AD in a Phase II trial. The drug to cause a rare form of dementia (called FTD), which is character- was found to reduce mental decline over a period of one year by an ized by the formation of NFTs. Thus, NFTs have been linked to de- average of 81 percent compared to patients receiving a placebo. The mentia, but they have been largely ignored as a causative factor in drug is now being tested in a larger Phase III study, but no prelimi- AD pathogenesis, due primarily to the fact that the transgenic AD nary results have yet been reported. Other compounds that inhibit mouse models discussed above do not develop NFTs. If one extrap- one of the enzymes (GSK-3) that adds phosphate groups to the tau olates the results of these mouse studies to humans, they suggest that protein are also being investigated as AD therapeutics. Clinical trials NFTs are not required for the cognitive decline that occurs in pa- of one GSK-3 inhibitor, valproate, have been stopped due to adverse tients with AD. At the same time, however, autopsies of the brains effects. (The results of studies on these and other treatments can be of humans who died of AD suggest that NFT burden correlates examined at www.alzforum.org/dis/tre/drc)

The Emerging Field of With all of the at- and high-speed computers are used to perform large-scale tention on genome sequencing in recent years, it is easy to lose studies on diverse arrays of proteins. This is the same basic ap- sight of the fact that genes are primarily information storage proach that has proven so successful over the past decade in units, whereas proteins orchestrate cellular activities. Genome the study of genomes. But the study of proteomics is inher- sequencing provides a kind of “parts list.” The human genome ently more difficult than the study of because pro- probably contains between 20,000 and 22,000 genes, each of teins are more difficult to work with than DNA. In physical which can potentially give rise to a variety of different pro- terms, one gene is pretty much the same as all other genes, teins. 6 To date, only a fraction of these molecules have been whereas each protein has unique chemical properties and han- characterized. dling requirements. In addition, small quantities of a particu- The entire inventory of proteins that is produced by an lar DNA segment can be expanded greatly using readily organism, whether human or otherwise, is known as that or- available enzymes, whereas protein quantities cannot be in- ganism’s proteome . The term proteome is also applied to the creased. This is particularly troublesome when one considers inventory of all proteins that are present in a particular tissue, that many of the proteins regulating important cellular cell, or cellular . Because of the sheer numbers of processes are present in only a handful of copies per cell. proteins that are currently being studied, investigators have Traditionally, protein have sought to answer sought to develop techniques that allow them to determine a number of questions about particular proteins. These in- the properties or activities of a large number of proteins in a clude: What specific activity does the protein demonstrate in single experiment. A new term— proteomics —was coined to vitro, and how does this activity help a cell carry out a partic- describe the expanding field of protein . This ular function such as cell locomotion or DNA replication? term carries with it the concept that advanced technologies What is the protein’s three-dimensional structure? When does the protein appear in the development of the organism 6There are a number of ways that a single gene can give rise to more than one and in which types of cells? Where in the cell is it localized? Is polypeptide. Two of the most prominent mechanisms, alternative splicing and the protein modified after synthesis by the addition of chem- posttranslational modification, are discussed in other sections of the text. It can ical groups (e.g., or sugars) and, if so, how does

The The Chemical Basis of Life also be noted that many proteins have more than one distinct function. Even myoglobin, which has long been studied as an oxygen-storage protein, has this modify its activity? How much of the protein is present, recently been shown to be involved in the conversion of (NO) to and how long does it survive before being degraded? Does the Ϫ nitrate (NO 3 ). level of the protein change during physiologic activities or as Chapter Chapter 2