The Role of Local Versus Nonlocal Physicochemical Restraints in Determining Protein Native Structure

Jeffrey Skolnick and Mu Gao

The tertiary structure of a native protein is dictated by the interplay of local secondary structure propensities, hydrogen bonding, and tertiary interactions. It is argued that the space of known protein topologies covers all single domain folds and results from the compactness of the native structure and excluded volume. Protein compactness combined with the chirality of the protein's side chains also yields native-like Ramachandran plots. It is the many-body, tertiary interactions among residues that collectively select for the global structure that a particular protein sequence adopts. This explains why the recent advances in deep-learning approaches that predict protein side-chain contacts, the distance matrix between residues, and sequence alignments are successful. They succeed because they implicitly learned the many-body interactions among protein residues.

Address
Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, GA 30332, United States

Corresponding authors: Skolnick, Jeffrey ([email protected]), Gao, Mu ([email protected])

Current Opinion in Structural Biology 2021, 68:1–8
This review comes from a themed issue on Sequences and topology
Edited by Nir Ben-Tal and Andrei Lupas
https://doi.org/10.1016/j.sbi.2020.10.008
0959-440X/© 2020 Elsevier Ltd. All rights reserved.

Determination of the native fold of a protein from its amino acid sequence is a fundamental question in biophysics. Given the validity of the Anfinsen hypothesis that the native conformation of the protein is at a global free energy minimum, the native structure is the best compromise of local conformational preferences and tertiary interactions [1]. There are three factors that determine the global fold of a protein. First and foremost is the peptide bond that provides hydrogen bonding; this is the key directional term driving protein folding [2]. One might also expect the peptide bond to provide local chain stiffness. Second, there are the amino acid side chains, which provide the sequence specificity of the protein. Amino acids are chiral, with the majority of native residues comprised of L-amino acids, which have a preference for right-handed α-helices and a right-handed twist of β-sheets [2]. Third, native protein structures are well packed, compact structures whose average internal packing density is comparable to crystalline solids [3]. Thus, the interplay of conformational preferences as provided by the local in sequence interactions among the side chains and backbone atoms, hydrogen bonding and chain compactness driven by tertiary interactions dictates the particular native structure a protein adopts. But to what extent do the structural properties of proteins depend solely on the intrinsic residue propensities with complete neglect of their interactions with other residues in the protein [4,5]? If this term were dominant, then local geometric and sequence specific effects would be most important. Conversely, what general features of protein structures arise from chain compactness and excluded volume with complete neglect of sequence specificity [6,7]? If compactness were to play a major role in dictating the general geometric and topological features of proteins, then perhaps many-body interactions between protein residues provide the specificity responsible for which structure a protein adopts, with local residue interactions providing the conformational bias to its secondary structure. This contribution explores these issues.

Despite the intuitive view that local peptide sequence interactions dominate, as illustrated in Figure 1, we argue that it is chain compactness which explains many general protein features, most especially the space of protein folds, with local sequence specific effects providing a major, but not deterministic, driving force for secondary structure. Another key component is many-body, nonlocal in sequence interactions. While the importance of many-body interactions was recognized for decades [8], the advent of deep-learning approaches enables the consideration of a wide spectrum of many-body effects that ultimately select the structure a given sequence adopts [9–12,13,15,16].

At one extreme, one might imagine that every protein family adopts a unique protein fold which bears no statistically similar structural relationship to evolutionarily unrelated proteins [17]. That is, the structure adopted by a given protein family is truly unique. Alternatively, due to inherent stereochemical restraints, perhaps there are a limited number of structurally distinct folds that single protein domains adopt [18,19,20]. This is consistent with the observation of convergent folds and the fact that very diverse sequences adopt similar protein structures [21].

Fold convergence could occur either because the space of single domain protein structures is inherently limited, or it could just reflect the evolutionary process by which all contemporary proteins emerged from a small set of ancestral peptides that were subsequently replicated, modified and shuffled to assemble all native folds [22–25]. Perhaps the latter occurred but was subject to the inherent global stereochemical constraints of proteins. This would result in the population of the same types of global folds independent of how they are generated. However, their relative population might be dictated by the initial set of primordial peptides. By studying quasi-spherical, random, flexible chains comprised of random sequences with protein-like average composition of L-amino acids compacted into the same volume as native proteins, those protein properties that result from compaction and excluded volume were explored [26]. Remarkably, the topology of every single domain native protein (that is, when the local chain details are ignored, e.g. a helix is replaced by a smooth curve down its principal axis) is found in the structural library of compact, quasi-spherical proteins. This also implies that the structural space of single domain proteins is likely complete. What is more striking is that these compact, quasi-spherical random proteins exhibit protein-like distributions of virtual bond angles; this result was not built into the model. While their backbone Ramachandran (φ,ψ) plots are more diffuse than native proteins, they occupy very similar regions (Box 1). Thus, in compact structures, backbone geometry and protein side chain chirality generate the approximate native (φ,ψ) distribution without consideration of additional factors. These results suggest that the space of global protein folds and local geometric structures arise from protein compactness and local side chain geometry and nothing more. When artificial proteins containing native-like secondary structures are folded, as expected, they also cover the space of single domain proteins [26].

Recent work on homopolymeric proteins containing equal numbers of D-residues and L-residues, demi-chiral proteins, shows that when compacted they too contain all observed native folds, that is, the space of native folds is complete [27]. But due to defects in chain packing caused by the formation of local secondary structures, they contain all native-like ligand-binding pockets, including native active site geometries with L-amino acids. At what point following the origin

[38]. Subsequently, several groups demonstrated that

Figure 1. Panels: (a) Compact shape; (b) Fold. Illustrating the interactions that dictate the (a) compact shape and (b) overall structural fold of a protein sequence. In (a), hydrogen bonds (dashed orange lines) yield α-helices or β-strands (purple cartoons). Together with various protein side-chain interactions (van der Waals representation), hydrophobic (white), positive (blue), negative (red), and polar (green), these lead to the compactness of a protein. The protein backbone (ball-and-stick representation, carbon (cyan), nitrogen (blue), and oxygen (red) atoms) is characterized by the φ,ψ torsion angles. In (b), many-body, long-range interactions among residues (ball-and-stick representations) give rise to the specific protein fold (grey cartoon), as demonstrated for E. coli dihydrofolate reductase [57]. Snapshot images were created with VMD [58].

Box 1
Ramachandran plot: The rotation of a protein backbone chain can be characterized by two dihedral angles called (φ,ψ), see Figure 1, left panel. The two-dimensional plot of this pair of angles is known as the Ramachandran plot, which shows energetically preferred angle regions, largely dictated by steric constraints.
Free-modeling: This is the most challenging category of protein targets in the CASP. A free-modeling target does not have a homolog whose atomic structure has been experimentally determined and identified in the Protein Data Bank. Hence, it is a hard target for homology-based structure prediction approaches such as 'threading' and sequence profile methods. Predicted structures in this category are generally of lower quality than when related template structures can be identified.
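
As an illustration of the backbone dihedral angles defined in Box 1, the following minimal Python sketch computes a signed dihedral angle from four atomic positions and, from it, the (φ,ψ) pair of each residue. It is not taken from the review. The function names `dihedral` and `phi_psi` and the per-residue dictionary layout with keys "N", "CA" and "C" are assumptions made for this example; the only facts relied on are the standard definitions of φ as the dihedral C(i-1), N(i), CA(i), C(i) and ψ as the dihedral N(i), CA(i), C(i), N(i+1).

```python
import math

def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in (-180, 180], about the p1-p2 bond.

    Uses the atan2 form: atan2(|b2| * b1.(b2 x b3), (b1 x b2).(b2 x b3)),
    where b1, b2, b3 are the three consecutive bond vectors.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))

def phi_psi(residues):
    """Return a list of (phi, psi) tuples, one per residue.

    `residues` is a hypothetical input layout: a list of dicts mapping the
    backbone atom names "N", "CA", "C" to (x, y, z) coordinates. phi is
    undefined (None) for the first residue, psi for the last.
    """
    angles = []
    for i, res in enumerate(residues):
        phi = psi = None
        if i > 0:
            phi = dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(residues) - 1:
            psi = dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"])
        angles.append((phi, psi))
    return angles
```

Collecting the (φ,ψ) pairs returned by `phi_psi` over many residues and plotting ψ against φ gives a Ramachandran plot of the kind discussed in the text. Pure standard-library arithmetic is used here only to keep the sketch self-contained; an array library would be the more usual choice for whole structures.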
