Paper No. : 04 Genetic engineering and recombinant DNA technology Module : 24 Understanding the architecture of Proteins, protein to proteome

Principal Investigator: Dr Vibha Dhawan, Distinguished Fellow and Sr. Director, The Energy and Resources Institute (TERI), New Delhi

Co-Principal Investigator: Prof S K Jain, Professor, of Medical Biochemistry Jamia Hamdard University, New Delhi

Paper Coordinator: Dr Mohan Chandra Joshi, Assistant Professor, Jamia Millia Islamia, New Delhi

Content Writer: Dr. Pravir Kumar, Professor, Department of Biotechnology, Delhi Technological University

Content Reviewer: Dr Rohini Muthuswami, Associate Professor, Jawaharlal Nehru University

Genetic engineering and Recombinant DNA technology Biotechnology Understanding the architecture of Proteins, protein to proteome

Description of Module

Subject Name Biotechnology

Paper Name Genetic engineering and Recombinant DNA technology

Module Name/Title Understanding the architecture of Proteins, protein to proteome

Module Id 24

Pre-requisites

Objectives

Keywords

e-Content

Introduction: Amino acids and protein

Amino acids are compounds made up of carbon, oxygen, hydrogen and nitrogen (and, in a few cases, sulphur). They are the monomers of proteins and consist of a carboxyl group, an amino group, a hydrogen atom and a distinctive side chain, all bonded to the α-carbon. In an α-amino acid, the amino and carboxylate groups are bonded to the same α-carbon. The various α-amino acids differ in the side chain (R-group) attached to their α-carbon. The common structure of an amino acid is:

Figure 1: Common structure of an amino acid.

This structure is common to all but one of the α-amino acids (proline, whose side chain is also bonded to the amino nitrogen). The R-group, or side chain, attached to the α-carbon is different in each amino acid. In the simplest case, the R-group is a hydrogen atom and the amino acid is glycine.

Figure 2: Structure of glycine and proline.

In α-amino acids both the carboxyl and the amino group are bonded to the same carbon atom. However, many naturally occurring amino acids that are not found in proteins have structures that differ from the α-amino acids. In these compounds the amino group is attached to a carbon atom other than the α-carbon, and they are called β, γ, δ or ε amino acids depending on the carbon atom to which the amino group is attached.

Standard and non-standard amino acids

More than 300 amino acids are present in cells; however, only 22 are incorporated into proteins ribosomally. Such amino acids are known as standard or proteinogenic amino acids. Some amino acids are more abundant in proteins than others: lysine and leucine, for example, are among the most abundant residues in a typical protein, while a few others are comparatively rare. Standard L-α-amino acids are specified by three-letter codons. α-Amino acids can be classified by the properties of their side chains, in particular their polarity, or tendency to interact with H2O at physiological pH. The polarity of the side chain varies widely, from non-polar and hydrophobic to highly polar and hydrophilic.

1) Amino acids with non-polar side chains

Among the standard amino acids, nine have non-polar side chains: alanine, glycine, leucine, valine, proline, isoleucine, phenylalanine, methionine and tryptophan. Proline differs from the other members because its side chain is bonded to both the nitrogen and the α-carbon atoms. Tryptophan and phenylalanine have aromatic side chains: the side chain of tryptophan contains an indole ring, whereas that of phenylalanine contains a phenyl ring.

2) Amino acids with uncharged polar side chain

Six amino acids have uncharged polar side chains: threonine, asparagine, serine, glutamine, tyrosine and cysteine. Three of them, threonine, serine and tyrosine, carry a hydroxyl group on the side chain. Cysteine is similar to serine but contains a thiol (sulfhydryl) group in place of the OH group.

3) Amino acids with charged polar side chain

Positively charged R group: Arginine and lysine have side chains that contain positively charged groups at neutral pH. Lysine has an amino group, whilst arginine contains a guanidinium group. Histidine contains an imidazole group, an aromatic ring.

Negatively charged R group: Glutamate and aspartate have acidic side chains that carry negatively charged groups at physiological pH.
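The side-chain classes described above can be captured as a small lookup table. The Python sketch below is illustrative only: the names `SIDE_CHAIN_CLASS`, `RESIDUE_TO_CLASS` and `class_composition` are hypothetical, and the grouping follows this module's classification of the 20 standard amino acids (other textbooks place residues such as glycine, cysteine or histidine differently).

```python
from collections import Counter

# One-letter codes grouped as in this module's classification (assumed mapping):
# nonpolar: Ala, Gly, Leu, Val, Pro, Ile, Phe, Met, Trp
# uncharged polar: Thr, Asn, Ser, Gln, Tyr, Cys
# positively charged: Arg, Lys, His; negatively charged: Asp, Glu
SIDE_CHAIN_CLASS = {
    "nonpolar": "AGLVPIFMW",
    "uncharged polar": "TNSQYC",
    "positively charged": "RKH",
    "negatively charged": "DE",
}

# Invert the table so each residue maps to its class.
RESIDUE_TO_CLASS = {
    residue: side_chain_class
    for side_chain_class, residues in SIDE_CHAIN_CLASS.items()
    for residue in residues
}


def class_composition(sequence):
    """Count residues per side-chain class; unrecognised letters count as 'other'."""
    counts = Counter(RESIDUE_TO_CLASS.get(res, "other") for res in sequence.upper())
    return dict(counts)


# Hypothetical seven-residue peptide used only as an example input.
example = class_composition("MKTAYDE")
```

For the made-up peptide `MKTAYDE`, the function reports two non-polar, two uncharged polar, one positively charged and two negatively charged residues.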

Figure 3: Amino acids with polar and non-polar side chains.

The structures of the 20 standard amino acids, both hydrophilic and hydrophobic, are shown in the figures below.

Figure: Structures of hydrophilic amino acids.

Figure 4: Structures of hydrophobic amino acids.

Non-standard amino acids

Apart from the 22 proteinogenic amino acids, all other amino acids are not ribosomally incorporated into proteins and are known as non-standard amino acids. Selenocysteine and pyrrolysine extend the classical set of 20. Selenocysteine is introduced during protein biosynthesis rather than created through post-synthetic modification. It contains selenium in place of the sulphur of its structural analogue, cysteine. Because selenocysteine is incorporated into polypeptides during translation, it is referred to as the 21st amino acid; it is specified by the stop codon UGA. Similarly, pyrrolysine, which resembles lysine and is found in some bacterial proteins, is considered the 22nd amino acid. It is encoded by the UAG codon.
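As a rough illustration of this dual use of UGA and UAG, the sketch below treats both as ordinary stop codons unless the caller states that recoding is active. Real recoding depends on additional signals in the mRNA and on dedicated tRNAs, none of which is modelled here; `RECODED`, `read_codon` and the `recoding_active` flag are hypothetical names introduced for this example.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Stop codons that can be reinterpreted, with three-letter abbreviations.
RECODED = {"UGA": "Sec", "UAG": "Pyl"}


def read_codon(codon, recoding_active=False):
    """Classify one codon; the input is not validated as a real codon."""
    codon = codon.upper().replace("T", "U")  # accept DNA-style input
    if recoding_active and codon in RECODED:
        return RECODED[codon]
    if codon in STOP_CODONS:
        return "Stop"
    return "sense codon"
```

Under this simplification, `read_codon("UGA")` is read as a stop signal, while `read_codon("UGA", recoding_active=True)` is read as selenocysteine; UAA remains a stop codon in both cases.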

Figure 5: Structures of Nonstandard amino acids.

Protein structural organization

Proteins are unbranched polymers constructed from the 22 standard α-amino acids. They have four levels of structural organization. The primary structure, the amino acid sequence, is specified by genetic information. As the polypeptide chain folds, it forms localized arrangements of adjacent amino acids that constitute the secondary structure. The overall 3-D shape that a polypeptide assumes is known as its tertiary structure. Proteins that contain two or more polypeptide subunits are said to have quaternary structure.

Figure 6: Structural organization of protein.

1) Primary structure

The primary structure of a polypeptide is its amino acid sequence, in which the amino acids are connected by peptide bonds. The primary structure determines the higher levels of structural organization.

2) Secondary structure

The most common types of secondary structure are the α-helix and the β-pleated sheet. Both are stabilized by H-bonds between the carbonyl and N-H groups of the polypeptide backbone. The α-helix is a rigid, rod-like structure that forms when a polypeptide chain twists into a helical conformation. The screw sense of an α-helix can be right-handed or left-handed; however, right-handed helices are energetically more favourable. An α-helix is usually 10-15 amino acid residues long. Intrachain H-bonds form between the N-H group of each amino acid and the carbonyl group of the amino acid four residues away. β-pleated sheets form when two or more polypeptide chain segments line up side by side. Each individual segment is referred to as a β-strand.
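The "four residues away" hydrogen-bonding pattern of the α-helix can be enumerated in a few lines of Python. This is a minimal sketch for an idealised helix; the function name `helix_hbond_pairs` and the 1-based residue numbering are assumptions made for illustration.

```python
def helix_hbond_pairs(n_residues):
    """List (carbonyl_residue, amide_residue) pairs for an ideal alpha-helix.

    The C=O of residue i is paired with the N-H of residue i + 4,
    using 1-based residue numbers.
    """
    return [(i, i + 4) for i in range(1, n_residues - 3)]


# A helix of 12 residues, within the 10-15 residue range quoted in the text.
pairs = helix_hbond_pairs(12)
```

For a 12-residue helix this gives eight backbone hydrogen bonds, from the pair (1, 5) to the pair (8, 12); a segment shorter than five residues yields none.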

3) Tertiary structure

The term tertiary structure refers to the unique 3-D conformation that a globular protein assumes as a result of the interactions between the side chains in its primary structure. All the information needed to fold a protein into its native tertiary structure is contained within the primary structure of the peptide chain itself. The following types of covalent and non-covalent interactions stabilize the tertiary structure of a protein:

Non-covalent interactions:
• H-bonds
• Electrostatic interactions
• Hydrophobic interactions
• Van der Waals interactions

Covalent bonds: The most prominent covalent bonds in tertiary structure are the disulfide bonds present in several extracellular proteins. Cysteine has a thiol group and is unique among the 20 amino acids in that it often forms a disulfide bond with another cysteine residue through the oxidation of their thiol groups.
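Because each disulfide bond consumes two cysteine residues, the number of cysteines in a sequence sets a simple upper bound on the number of disulfide bonds it can form. The sketch below only counts residues; it does not predict which cysteines actually pair, and the function names and the example sequence are hypothetical.

```python
def cysteine_positions(sequence):
    """Return 1-based positions of cysteine (C) in a one-letter sequence."""
    return [i for i, res in enumerate(sequence.upper(), start=1) if res == "C"]


def max_disulfide_bonds(sequence):
    """Upper bound on disulfide bonds: one bond per pair of cysteines."""
    return len(cysteine_positions(sequence)) // 2


# Made-up peptide with three cysteines, so at most one disulfide bond.
positions = cysteine_positions("ACDCKC")
upper_bound = max_disulfide_bonds("ACDCKC")
```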

4) Quaternary structure

Numerous proteins, such as haemoglobin, are made up of two or more polypeptide chains; each polypeptide chain is known as a subunit. Subunits in a multimeric protein may be identical or quite different. Polypeptide subunits assemble to form the quaternary structure, which is stabilized by non-covalent interactions, including hydrophobic interactions, electrostatic interactions and H-bonds, as well as by covalent cross-links.

Further, protein folding and conformation rely on the following factors:
• Rotation and R-group orientation
• Weak non-covalent interactions
• Protein shape and sequence of amino acids
• Conformation and free energy
• Denaturation and renaturation of proteins
• Factors such as heat, pH, chemicals and time

Post-translational modification

Acetylation

Acetylation is a reaction in which an acetyl functional group is introduced into a chemical compound; conversely, deacetylation is the removal of an acetyl group. More specifically, acetylation refers to the substitution of an acetyl group for an active hydrogen atom, which results in an acetoxy group when that hydrogen belongs to a hydroxyl group. A reaction in which the hydrogen atom of a hydroxyl group (OH) is replaced by an acetyl group (CH3CO) yields a specific ester, the acetate. Acetic anhydride is commonly used as an acetylating agent that reacts with free hydroxyl groups. For example, it is used in the synthesis of aspirin, heroin and THC-O-acetate.

Phosphorylation

Phosphorylation is the addition of a phosphoryl group (PO3) to a molecule. In biology, phosphorylation and its counterpart, dephosphorylation, are critical for many cellular processes. A large fraction of proteins are temporarily phosphorylated, as are many sugars, lipids and other molecules. Phosphorylation is especially important for protein function because this modification activates (or deactivates) almost half of the enzymes, thereby regulating their function. Protein phosphorylation is considered the most abundant post-translational modification in eukaryotes.

Hydroxylation

Hydroxylation is a chemical process that introduces a hydroxyl (OH) group into an organic compound. In biochemistry, hydroxylation reactions are often facilitated by enzymes called hydroxylases. Hydroxylation is crucial for the oxidative degradation of organic compounds in air. It is also pivotal in detoxification, because it converts lipophilic compounds into hydrophilic products that are more readily removed by the kidneys or liver and excreted. Some drugs, including steroids, are activated or deactivated by hydroxylation.

DNA methylation

DNA methylation is an important biochemical process in which methyl groups are added to the DNA molecule. Methylation can alter the activity of a DNA segment without changing its sequence. When located in a gene promoter, it typically acts to repress gene transcription. It plays a decisive role in normal development and is associated with numerous key processes, including X-chromosome inactivation, genomic imprinting, carcinogenesis, repression of transposable elements and aging.

Carboxylation

In biochemistry, carboxylation is a post-translational modification of glutamate residues to γ-carboxyglutamate in proteins. It occurs primarily in proteins involved in the blood clotting cascade and is required for these proteins to function. Carboxylation occurs in the liver and is performed by γ-glutamyl carboxylase. The carboxylase requires vitamin K as a cofactor and performs the reaction in a processive manner. γ-Carboxyglutamate binds calcium, which is essential for its activity. For example, in prothrombin, calcium binding allows the protein to associate with the plasma membrane in platelets, bringing it into close proximity with the proteins that cleave prothrombin to active thrombin after injury.

Glycosylation

Glycosylation is a form of co-translational and post-translational modification. In biochemistry, glycosylation refers in particular to the enzymatic process that attaches glycans to proteins, lipids or other organic molecules. This process generates one of the fundamental biopolymers found in cells. In it, a carbohydrate (the glycosyl donor) is attached to a hydroxyl or other functional group of another molecule (the glycosyl acceptor). The majority of proteins synthesized in the rough endoplasmic reticulum (RER) undergo glycosylation. It is an enzyme-directed, site-specific process, and it also occurs in the cytoplasm and nucleus as the O-GlcNAc modification. Five classes of glycans are produced:

• N-linked glycans, attached to a nitrogen of asparagine or arginine side-chains. N-linked glycosylation requires the participation of a special lipid called dolichol phosphate.
• O-linked glycans, attached to the hydroxyl oxygen of serine, threonine, tyrosine, hydroxylysine or hydroxyproline side-chains, or to oxygens on lipids such as ceramide.
• Phospho-glycans, attached through the phosphate of a phospho-serine.
• C-linked glycans, a rare form of glycosylation in which a sugar is added to a carbon on a tryptophan side-chain.
• Glypiation, the addition of a GPI anchor that links proteins to lipids through glycan linkages.

Ubiquitination

The addition of ubiquitin to a substrate protein is called ubiquitination or, less frequently, ubiquitylation. Ubiquitination affects proteins in many ways: it can mark them for degradation by the proteasome, alter their cellular location, affect their activity and promote or prevent protein interactions. It involves three crucial steps, activation, conjugation and ligation, which are carried out by ubiquitin-activating enzymes (E1), ubiquitin-conjugating enzymes (E2) and ubiquitin ligases (E3), respectively. The outcome of this sequential cascade is to bind ubiquitin to lysine residues on the protein substrate via an isopeptide bond, to cysteine residues through a thioester bond, to serine and threonine residues through an ester bond, or to the N-terminus of the protein via a peptide bond.
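The enzyme cascade and the attachment chemistry described above can be summarised as two small lookup structures. This Python sketch simply restates the text in tabular form; the names `UBIQUITIN_CASCADE`, `UBIQUITIN_LINKAGE` and `linkage_for` are hypothetical and do not come from any library.

```python
# Ordered steps of the cascade and the enzyme class responsible for each.
UBIQUITIN_CASCADE = [
    ("E1", "activation"),
    ("E2", "conjugation"),
    ("E3", "ligation"),
]

# Bond formed between ubiquitin and each attachment site on the substrate.
UBIQUITIN_LINKAGE = {
    "lysine": "isopeptide bond",
    "cysteine": "thioester bond",
    "serine": "ester bond",
    "threonine": "ester bond",
    "N-terminus": "peptide bond",
}


def linkage_for(site):
    """Look up the bond type for an attachment site named in this module."""
    return UBIQUITIN_LINKAGE.get(site, "not listed in this module")


steps_in_order = [step for _, step in UBIQUITIN_CASCADE]
```

For example, `linkage_for("lysine")` returns the isopeptide bond, the linkage listed first in the text above.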
