Research Article Prediction of Four Kinds of Simple Supersecondary Structures in Protein by Using Chemical Shifts

Total Page:16

File Type:pdf, Size:1020Kb

Research Article Prediction of Four Kinds of Simple Supersecondary Structures in Protein by Using Chemical Shifts Hindawi Publishing Corporation e Scientific World Journal Volume 2014, Article ID 978503, 5 pages http://dx.doi.org/10.1155/2014/978503 Research Article Prediction of Four Kinds of Simple Supersecondary Structures in Protein by Using Chemical Shifts Feng Yonge College of Science, Inner Mongolia Agriculture University, Hohhot 010018, China Correspondence should be addressed to Feng Yonge; [email protected] Received 7 May 2014; Revised 3 June 2014; Accepted 4 June 2014; Published 18 June 2014 Academic Editor: Hao Lin Copyright © 2014 Feng Yonge. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Knowledge of supersecondary structures can provide important information about its spatial structure of protein. Some approaches have been developed for the prediction of protein supersecondary structure. However, the feature used by these approaches is primarily based on amino acid sequences. In this study, a novel model is presented to predict protein supersecondary structure by use of chemical shifts (CSs) information derived from nuclear magnetic resonance (NMR) spectroscopy. Using these CSs as inputs of the method of quadratic discriminant analysis (QD), we achieve the overall prediction accuracy of 77.3%, which is competitive with the same method for predicting supersecondary structures from amino acid compositions in threefold cross-validation. Moreover, our finding suggests that the combined use of different chemical shifts will influence the accuracy of prediction. 1. Introduction support vector machine [10]. And the overall accuracy of 78% was achieved. The features of these studies were mainly The prediction of protein structure is always one of the derived from the amino acid compositions or dipeptide most important research topics in the field of bioinformatics. compositions. However, it is very difficult to predict the spatial structure Nuclear magnetic resonance (NMR) technique plays an directly from the protein sequence. Therefore, the prediction important role in the determination of three-dimensional of supersecondary structure is an important step in the biological macromolecule structures. NMR chemical shifts prediction of protein spatial structure. The supersecondary encode subtle information about the local chemical environ- structural motifs are composed of a few secondary structural mentofnuclearspins.Formanyyears,therehasbeengrowing elements (namely, or ) connected by loops. At present, interest to access this information and utilize it for biomolec- therearefourkindsofsimplesupersecondarystructures, ular structure determination [11, 12]. Recent progress was namely, -loop-, -loop-, -loop-,and-loop-.These made by combining chemical shifts with protein structure motifs play an important role in protein folding and stability prediction programs [13–20], showing that chemical shifts because a large number of motifs exist in protein spatial information is a power parameter for the determination of structure. Many researches have focused on exploring meth- protein structure. In this paper, we utilized chemical shifts ods for protein supersecondary structure prediction [1, 2]. In as parameters to predict four kinds of simple supersecondary 1995, Sun et al. predicted protein supersecondary structure structures in protein by the method of quadratic discriminant and achieved an accuracy of between 70 and 80% by using analysis. Using the benchmark dataset, we achieved the neural networks [3]. Chou and Blinn presented a method for average of sensitivity of 76.3% and specificity of 74.3% and predicting beta turns [4–6], alpha turns [7],andallthetight the overall prediction accuracy of 77.3% in threefold cross- turns [6]. Cruz et al. identified -hairpin and non--hairpin validation by using six CSs (, ,,,,)asfeatures. [8].HuandLiidentifiedfourkindsofsimplesupersecondary Moreover, we have performed the prediction by combining structures in 2088 proteins and achieved an accuracy of the different chemical shifts as features. Results showed 78∼83 % [9]. Zou et al. also predicted four kinds of sim- that the redundant information has great influence on the ple supersecondary structures from 3088 proteins by using accuracy. 2 The Scientific World Journal 1 1 2. Materials and Methods =( − − Σ ) ln 2 2 ln 2.1. Database. The chemical shifts of all nuclei (, , 1 1 ,,,) in proteins were extracted from re-referenced −( − − Σ ). ln 2 2 ln protein chemical shift database (namely, RefDB [21]). The following steps were performed to construct the dataset. (4) Firstly, only proteins with six nuclei assigned CSs were considered. Secondly, only proteins with the supersecondary The result can be generalized to four groups directly and structures information in ArchDB40 [22]wereavailable.We described as follows. finally utilized the PISCES program [23]toremovethehighly Set similar sequences. After strictly following the aforemen- 1 = − V − Σ tioned procedures, 114 proteins were obtained which have V ln V 2 2 ln V both CSs and supersecondary structures. Among 114 proteins, (5) 92% (105 sequences) proteins have less than 25% sequence (V = , , ,) , identity, and the sequence identity of the remains ranges from 25 to 30%. The appendix lists 114 proteins used in this study. where Finally, we obtained 90 -loop- (HH), 89 -loop- (HE), 97 = (−)Σ−1 (−) , -loop- (EH), and 122 -loop- (EE) motifs, including the V V V V (6) - link and - hairpin. where V denotes the number of samples in group V, V is the square mahalanobis distance between and V with respect {, , 2.2. Feature Parameter. In the four data subsets to ΣV (note: V and |ΣV| are calculated in training set), and V , } ,wecalculatedtheaveragedCSsofsixnucleifora denotes chemical shift values of six nuclei :{} averaged sequence of length using the following formula: over group V; |ΣV| is the determinant of matrix ΣV. The six-dimensional vector V can be written as 1 = ∑ CS, (1) V () 1 =1 V = ∑ , (7) V =1 where =,,,,,. Therefore, a sequence can be V = , , , =,,,, ΣV converted into a six-dimensional vector :{}. where ; , ; isthecovariancematrixof6×6dimension, quantifying correlations between the chemical shifts of six nuclei: 2.3. Prediction Algorithm. Todesignanefficientandaccurate V V V predicted algorithm the key step is in protein supersecondary 1,1 1,2 ⋅⋅⋅ 1,6 [ ] structure prediction. The quadratic discriminant analysis [V V ⋅⋅⋅ V ] [ 2,1 2,2 2,6] [24] is a power algorithm that has been widely applied in [ ] ΣV = [ . ] , (8) genomic and proteomic bioinformatics. Thus, we used it here [ . ] to perform prediction. [ V V V ] 6,1 6,2 ⋅⋅⋅ 6,6 [ ] 2.4. Quadratic Discriminant Analysis (QD). For a sequence to be classified, we calculated the averaged CSs of six nuclei where the element using (1). So, the sequence is converted into a six-dimensional 1 :{} V () () vector : , = ∑ ( −V )( −V ). (9) V ={}(=,, ,, ,). (2) Here V = , , ,; , = , ,,,,. From (4)and(5), we have concluded Here we integrated six-dimensional vector by using quadratic = −. discriminant analysis function. Consider a sequence is (10) classified into four groups (, , , ). The discrimi- ( |) nant analysis function between group and group is defined Itcanbeeasilyprovedthat is the maximum ( |) V = by of V ,if is the maximal one in V ( , , , ). Then, we predict that belongs to group . = ln ( |)−ln ( |). (3) According to Bayes’ Theorem, we deduce 2.5. Correction in the Error Allowed Scope. Asequence is predicted for four kinds of supersecondary struc- ∼ − Σ tures by using (1) (10). If is the maximal one in 1 = , , , = ln − − ln ( ), then we predict that belongs 2 2 Σ to group . However, there are slight differences among The Scientific World Journal 3 ( = , ,). , To correct predicted results, we Table 1: The predicted accuracies by using six CSs as features (3-fold definethecoefficientoftheerrorallowedscopeas cross-validation). − Class SN (%) SP (%) Average Average = corr wro , (%) (11) structure <0.4 SN (%) SP (%) total corr HH 73.0 71.0 where denotes belonging to itself class , denotes corr wro 75.8 78.1 being predicted another class . For example, if is the EH 76.3 74.3 77.3 HE 69.0 66.7 super-secondary structure of ,then corr is and wro is the maximum among , , . EE 87.5 81.4 2.6. Performance Evaluation. In statistical prediction, inde- pendent dataset test, cross-validation test, and jackknife test can be used to examine a predictor for its effectiveness in indicating that CSs are highly informative with regard to practical application. Among the three test methods, the jack- supersecondary structures. knife test is deemed to be the least arbitrary that can always Generally speaking, chemical shift measurements can yield a unique result for a given benchmark dataset [25]and be incomplete for a multitude of reasons. Often, chemical has been widely used to examine the performance of various shifts can only be assigned partially or are missing. To assess predictors [26–37].However,inthisstudywehaveusedthe theimpactofincompletechemicalshiftassignments,we threefold cross-validation to examine the performance of our performed the prediction by using the combination of the method; in order to reduce the computational time, we ran- different chemical shifts as features. The results are shown in domly divided the training set into three parts, two of which Table 2. are for training and the rest
Recommended publications
  • Protein Structure: Data Bases and Classification
    Protein Structure: Data Bases and Classification Ingo Ruczinski Department of Biostatistics, Johns Hopkins University Reference Bourne and Weissig Structural Bioinformatics Wiley, 2003 More References 1 Structural Proteins Membrane Proteins Globular Proteins 2 Terminology • Primary Structure • Secondary Structure • Tertiary Structure • Quatenary Structure • Supersecondary Structure • Domain • Fold Hierarchy of Protein Structure Helices α 3.10 π Amino acids/turn: 3.6 3.0 4.4 Frequency ~97% ~3% rare H-bonding i, i+4 i, i+3 i, i+5 3 α-helices α-helices α-helices have handedness: α-helices have a dipole: β-sheets 4 β-sheets Have a right-handed twist! β-sheets Can form higher level structures! Super Secondary Structure Motifs 5 What is a Domain? Richardson (1981): W ithin a single subunit [polypeptide chain], contiguous portions of the polypeptide chain frequently fold into compact, local semi-independent units called domains. More About Domains • Independent folding units. • Lots of within contacts, few outside. • Domains create their own hydrophobic core. • Regions usually conserved during recombination. • Different domains of the same protein can have different functions. • Domains of the same protein may or may not interact. Why Look for Domains? Domains are the currency of protein function! 6 Domain Size • Domains can be between 25 and 500 residues long. • Most are less than 200 residues. • Domains can be smaller than 50 residues, but these need to be stabilized. Examples are the zinc finger and a scorpion toxin. Two Very Small Domains A Humdinger of a Domain 7 What’s the Domain? (Part 1) What’s the Domain? (Part 2) Homology and Analogy • Homology: Similarity in characteristics resulting from shared ancestry.
    [Show full text]
  • Water in Protein Structure Prediction
    Water in protein structure prediction Garegin A. Papoian†‡, Johan Ulander†‡§, Michael P. Eastwood†¶, Zaida Luthey-Schultenʈ, and Peter G. Wolynes†,†† †Department of Chemistry and Biochemistry and Center for Theoretical Biological Physics, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0371; and ʈDepartment of Chemistry, University of Illinois at Urbana–Champaign, Urbana, IL 61801 Contributed by Peter G. Wolynes, December 4, 2003 Proteins have evolved to use water to help guide folding. A interactions play an important role not only in binding interfaces physically motivated, nonpairwise-additive model of water-medi- but in folding of monomeric proteins. ated interactions added to a protein structure prediction Hamilto- We use the associative memory (AM) Hamiltonian molecular nian yields marked improvement in the quality of structure pre- dynamics model as a starting point (14–16). This Hamiltonian diction for larger proteins. Free energy profile analysis suggests has two principal components: general polymer physics-based that long-range water-mediated potentials guide folding and terms that are sequence independent, collectively called ‘‘back- smooth the underlying folding funnel. Analyzing simulation tra- bone,’’ and sequence-dependent knowledge-based distance- jectories gives direct evidence that water-mediated interactions dependent additive potentials, collectively denoted as AM͞C facilitate native-like packing of supersecondary structural ele- (AM͞contact). The AM part describes interactions between all ments. Long-range pairing of hydrophilic groups is an integral part pairs of residues that are separated in sequence between 3 and of protein architecture. Specific water-mediated interactions are a 12 residues. It uses a set of nonhomologous memory proteins to universal feature of biomolecular recognition landscapes in both build a funneled energy landscape by matching fragments.
    [Show full text]
  • A Deep Learning Approach for Protein-Ligand Interaction Prediction
    bioRxiv preprint doi: https://doi.org/10.1101/2019.12.20.884841; this version posted September 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. SSnet: A Deep Learning Approach for Protein-Ligand Interaction Prediction Niraj Verma,y,x Xingming Qu,z,x Francesco Trozzi,y,x Mohamed Elsaied,{,x Nischal Karki,y,x Yunwen Tao,y,x Brian Zoltowski,y,x Eric C. Larson,z,x and Elfi Kraka∗,y,x yDepartment of Chemistry, Southern Methodist University, Dallas TX USA zDepartment of Computer Science, Southern Methodist University, Dallas TX USA {Department of Engineering Management and Information System, Southern Methodist University, Dallas TX USA xSouthern Methodist University E-mail: [email protected] Abstract Computational prediction of Protein-Ligand Interaction (PLI) is an important step in the modern drug discovery pipeline as it mitigates the cost, time, and resources re- quired to screen novel therapeutics. Deep Neural Networks (DNN) have recently shown excellent performance in PLI prediction. However, the performance is highly dependent on protein and ligand features utilized for the DNN model. Moreover, in current mod- els, the deciphering of how protein features determine the underlying principles that govern PLI is not trivial. In this work, we developed a DNN framework named SSnet that utilizes secondary structure information of proteins extracted as the curvature and torsion of the protein backbone to predict PLI. We demonstrate the performance of SSnet by comparing against a variety of currently popular machine and non-machine learning models using various metrics.
    [Show full text]
  • Predicting Protein Secondary and Supersecondary Structure
    29 Predicting Protein Secondary and Supersecondary Structure 29.1 Introduction............................................ 29-1 Background • Difficulty of general protein structure prediction • A bottom-up approach 29.2 Secondary structure ................................... 29-5 Early approaches • Incorporating local dependencies • Exploiting evolutionary information • Recent developments and conclusions 29.3 Tight turns ............................................. 29-13 29.4 Beta hairpins........................................... 29-15 29.5 Coiled coils ............................................. 29-16 Early approaches • Incorporating local dependencies • Predicting oligomerization • Structure-based predictions • Predicting coiled-coil protein interactions Mona Singh • Promising future directions Princeton University 29.6 Conclusions ............................................ 29-23 29.1 Introduction Proteins play a key role in almost all biological processes. They take part in, for example, maintaining the structural integrity of the cell, transport and storage of small molecules, catalysis, regulation, signaling and the immune system. Linear protein molecules fold up into specific three-dimensional structures, and their functional properties depend intricately upon their structures. As a result, there has been much effort, both experimental and computational, in determining protein structures. Protein structures are determined experimentally using either x-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. While
    [Show full text]
  • Supersecondary Structure Motifs and De Novo Protein Structure Prediction
    Supersecondary Structure Motifs and De Novo Protein Structure Prediction Bryn Reinstadler Jennifer Van Amarda Shehu Williams College George Mason University George Mason University 31 July 2012 Abstract Defining physically-realistic structural fragments with which to assemble novel protein conformations is a central component of the ab initio protein structure prediction problem. In this work we pursue supersecondary structure motifs. The motifs are extracted from analysis of non-redundant experimental protein structures from the Protein Data Bank. Analysis of the motifs shows that they are present in all known protein folds and may therefore allow focusing de novo structure prediction to native folds. Our results show that these motifs can be promising for ab initio structure prediction. 1 Introduction Because of the difficulty of empirically deriving protein structure through means such as x-ray crystallography or NMR, the methods of ab initio structure prediction and de novo protein design have come into popularity in recent years. Ab initio structure prediction is based on Anfinsen’s premise that a protein's structure is based on the primary amino acid structure of the protein and the environment of the protein. [1] A protein's conformation, then, is the lowest free energy conformation given a set of global constraints that are provided by the environment. Thus, using the primary amino acid structure of a protein and a rough simulation of its environment, it is computationally possible to derive possible native conformations of a protein. [2] [3] Unfortunately, though computationally possible it is also quite computationally difficult. Structure prediction engines such as those found in prediction competitions like CASP or CAPRI often produce tens of thousands of candidate structures before refining their candi- dates through the use of energy functions and geometric hashing.
    [Show full text]
  • Supersecondary Structures (Structural Motifs)
    SupersecondarySupersecondary StructuresStructures (structural(structural motifs)motifs) VariousVarious SourcesSources Lecture 2 Biochemistry 4000 Slide 1 Supersecondary Structures (Motifs) Supersecondary Structures (Motifs): Combinations of secondary structures in specific geometric arrangements ● Simple supersecondary structures consisting of 3 or fewer secondary structures considered here ● Large supersecondary structures (ie. Greek Key and Jelly-Roll) consisting of more than 3 secondary structures will be considered along with tertiary structures and folds ● Why? Large supersecondary structures can be domains. Simple supersecondary structures are typically composed of two secondary structures (ie. strands or helices) and a turn (or loop) Helix-turn-helix DNA binding motif Lecture 2 Biochemistry 4000 Slide 2 Supersecondary Structures (Motifs) Supersecondary Structures Hierarchy of Protein Structure (Motifs): Combinations of α β secondary structures in specific geometric arrangements Simple supersecondary structures are the 'building blocks' of …. αα βαβ ββ a) Complex Supersecondary Structures b) Tertiary Structure / Domain folds ββββ Simple supersecondary structures αααα are relatively stable components of Greek 4 helix βαβαβ βββ the 'Molten Globule' protein folding Key bundle β intermediate Rossman β-meander fold Lecture 2 Biochemistry 4000 Slide 3 β-hairpin Simple and Common supersecondary structure β-hairpin: two antiparallel β-strands joined by a turn or loop ● Very small supersecondary structure (typically less than 10 residues)
    [Show full text]
  • Super Secondary Structures of Proteins with Post-Translational Modifications in Colon Cancer
    molecules Article Super Secondary Structures of Proteins with Post-Translational Modifications in Colon Cancer Dmitry Tikhonov 1,2 , Liudmila Kulikova 1, Arthur Kopylov 3 , Kristina Malsagova 3, Alexander Stepanov 3 , Vladimir Rudnev 2 and Anna Kaysheva 3,* 1 Institute of Mathematical Problems of Biology RAS-the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia; [email protected] (D.T.); [email protected] (L.K.) 2 Institute of Theoretical and Experimental Biophysics, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia; [email protected] 3 V.N. Orekhovich Institute of Biomedical Chemistry, 119121 Moscow, Russia; [email protected] (A.K.); [email protected] (K.M.); [email protected] (A.S.) * Correspondence: [email protected]; Tel.: +79-199-175-017 Academic Editors: Susy Piovesana and Rudy J. Richardson Received: 1 June 2020; Accepted: 6 July 2020; Published: 9 July 2020 Abstract: New advances in protein post-translational modifications (PTMs) have revealed a complex layer of regulatory mechanisms through which PTMs control cell signaling and metabolic pathways, contributing to the diverse metabolic phenotypes found in cancer. Using conformational templates and the three-dimensional (3D) environment investigation of proteins in patients with colorectal cancer, it was demonstrated that most PTMs (phosphorylation, acetylation, and ubiquitination) are localized in the supersecondary structures (helical pairs). We showed that such helical pairs are represented on the outer surface of protein molecules and characterized by a largely accessible area for the surrounding solvent. Most promising and meaningful modifications were observed on the surface of vitamin D-binding protein (VDBP), complement C4-A (CO4A), X-ray repair cross-complementing protein 6 (XRCC6), Plasma protease C1 inhibitor (IC1), and albumin (ALBU), which are related to colorectal cancer developing.
    [Show full text]
  • Protein Structure
    Proteins: an introduction Life is the result of the chemical activity of proteins (with nucleic acids essentially confined to encoding and producing them) A cross-section through Escherichia coli shows the diversity and density of macromolecules forming a cell flagella pili Howard Berg David Goodsell Life is the result of the chemical activity of proteins (with nucleic acids essentially confined to encoding and producing them) A cross-section through a human cell, from the plasma membrane to the nucleus The enlargement shows cell-surface gycoproteins, the sub-membraneous cytoskeleton (primarily actin filaments), a microtubulus, ribosomes (in pink; they are everywhere except the nucleus) relative size of E. coli David Goodsell Life is the result of the chemical activity of proteins (with nucleic acids essentially confined to encoding and producing them) A cross-section through a human cell, from the plasma membrane to the nucleus The enlargement shows the endoplasmatic reticulum (‚rough ER‘), Golgi stacks, and a clathrin cage assembling for vesicle budding David Goodsell Life is the result of the chemical activity of proteins (with nucleic acids essentially confined to encoding and producing them) A cross-section through a human cell, from the plasma membrane to the nucleus The enlargement shows a mitochondrion David Goodsell Life is the result of the chemical activity of proteins (with nucleic acids essentially confined to encoding and producing them) A cross-section through a human cell, from the plasma membrane to the nucleus The enlargement shows a nuclear pore, the pernuclear and intranuclear cytoskeleton, chromatin, spliceosomes and other RNP particles David Goodsell Proteins are extremely diverse David Goodsell Proteins are extremely diverse David Goodsell What is a protein? • protein - 1844, from Fr.
    [Show full text]
  • Structural Bioinformatics
    Course plan 2015-2016 Academic Year Qualification Master's Degree Structural Bioinformatics Descriptive details concerning the subject: · Name of the subject: Structural Bioinformatics · Code : 30169 · Type of subject: Optativa · Module: SBI · Credits: 5 ECTS · Total hours: 125.0 · Year: 1st · Term: 2nd · Coordination: Baldo Oliva · Department: Ciències Experimentals i de la Salut · Building: Campus Mar Teaching staff: · Group: 1 · Language: English · Lecturer: Baldo Oliva · Department: Ciències Experimentals i de la Salut Subject presentation: · Presentation: The main goal of the course is to learn the basic concepts on the structure of macromolecules, more specifically of proteins. Also the principles of crystallography are given in order to gain a deeper understanding on the 3- dimensional data. The main features relating sequence and protein fold will be used on the prediction of secondary and terciary structure of proteins and on its evaluation. Finally, the relationship between sequence/structure and function of proteins will be analysed. Prerequisites in order to follow the itinerary: 1) Knowledge in Bioinformatics: alignment algorithms (Needleman-Wunsch and Smith-Watermann) and substitution matrices (PAM and BLOSUM); Hidden Markov Models and Neural Networks; Linux and windows operative systems; Perl programming 2) Knowledge in Mathematics: Derivatives and Integrals in n-dimensions; Taylor polinomia; Linear algebra and trigonometry; main concepts of statistics and probabilities: combinatorial analysis, permutations, Bayes theorem, Gaussian and Poisson distributions; numerical analysis, optimization and fitting of functions. 3) Knowledge in Physics: Newton classical mechanics; Thermodynamics, Gibbs energy, entropy, enthalpy and basic statistical mechanics. 4) Knowledge in Chemistry: Chemical structure, conformation, configuration and estereoisomers; Reactivity, transition state, electrophilic and nucleophilic attacks; Electronic densitty, molecular orbitals, quantum numbers, Shroedingen function and wave function; Steady state and equilibrium.
    [Show full text]
  • Secondary Structure Prediction for RNA Binding Domain in RNP Proteins Identifies Pap As the Main Structural Motif
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Elsevier - Publisher Connector Volume 257, number 2, 373-376 FEB 07796 November 1989 Secondary structure prediction for RNA binding domain in RNP proteins identifies Pap as the main structural motif A. Ghetti, C. Padovani, G. Di Cesare and C. Morandi rst~tuto di Science Bio~ogi~~e, university di Verona, Strada le Grazie, 37334 Verma, Italy Received 23 August 1989 In eukaryotic cells transcript processing is strictly dependent upon binding of specific proteins. Nuclear RNA binding proteins share a common domain, which is involved in RNA binding. In order to characterize RNP-RNA interactions we have performed a secondary structure prediction based both on statistical algorithms and comparative analysis of different proteins. A high conservation for secondary structure propensity between different RNPs was observed. Ribonucle~c protein; Structure prediction I. INTRODUCTION polypeptide) confers with the protein the affinity for single-stranded nucleic acids. Further analysis of UP1 In recent years, it has been demonstrated that sequence domain revealed that it contains a tandem eukaryotic mRNAs are always associated in the duplicate of 90 residues. cytoplasm as well as in the nucleoplasm with specific Later works identified the UPl’s subdomain in other proteins which play an important role in transcript nuclear proteins from different sources: polyadenylate metabolism [ 1,2]. binding proteins [lo-121, Cl hnRNP [13], nucleolin After transcription by Pal II, the synthesized [ 141, proteins bound to small nuclear RNAs (snRNP) heterogeneous nuclear RNA (hnRNA) tightly [15-171 and others. associates with a set of proteins called ‘heterogeneous A common characteristic of these proteins is their nuclear ribonucleic proteins’ (hnRNP) to form a ability to bind single-stranded nucleic acids in vitro and NAfprotein complex in a ‘nucleosome-like’ structure, RNA in vivo.
    [Show full text]
  • Local Protein Structures Bernard Offmann, Manoj Tyagi, Alexandre De Brevern
    Local Protein Structures Bernard Offmann, Manoj Tyagi, Alexandre de Brevern To cite this version: Bernard Offmann, Manoj Tyagi, Alexandre de Brevern. Local Protein Structures. Current Bioinfor- matics, Benthams Science, 2007, 2, pp.165-202. inserm-00175058 HAL Id: inserm-00175058 https://www.hal.inserm.fr/inserm-00175058 Submitted on 11 May 2010 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. HAL authorLocal manuscript Protein Structures (Offmann, Tyagi & de Brevern) Current Bioinformatics 2007;2:165-202 Preprint for Current Bioinformatics 2007 HAL author manuscript inserm-00175058, version 1 Local Protein Structures Offmann B. 1, Tyagi M. 1+ & de Brevern A.G. 2* 1 Laboratoire de Biochimie et Génétique Moléculaire, Université de La Réunion, 15, avenue René Cassin, BP7151, 97715 Saint Denis Messag Cedex 09, La Réunion, France 2 Equipe de Bioinformatique Génomique et Moléculaire (EBGM), INSERM UMR-S 726, Université Paris Diderot, case 7113, 2, place Jussieu, 75251 Paris, France * Corresponding author: mailing address: Dr. de Brevern A.G., Equipe de Bioinformatique Génomique et Moléculaire (EBGM), INSERM UMR-S 726, Université Paris Diderot, case 7113, 2, place Jussieu, 75251 Paris, France E-mail : [email protected] Tel: (33) 1 44 27 77 31 Fax: (33) 1 43 26 38 30 key words: secondary structure, protein folds, structure-sequence relationship, structural alphabet, protein blocks, molecular modeling, ab initio .
    [Show full text]
  • Amino Acid Distribution Rules Predict Protein Fold: Protein Grammar for Beta-Strand Sandwich-Like Structures
    Biomolecules 2015, 5, 41-59; doi:10.3390/biom5010041 OPEN ACCESS biomolecules ISSN 2218-273X www.mdpi.com/journal/biomolecules/ Article Amino Acid Distribution Rules Predict Protein Fold: Protein Grammar for Beta-Strand Sandwich-Like Structures Alexander Kister Department of Mathematics, Rutgers University, Piscataway, NJ 08854, USA; E-Mail: [email protected] Received: 13 October 2014 / Accepted: 31 December 2014 / Published: 23 January 2015 Abstract: We present an alternative approach to protein 3D folding prediction based on determination of rules that specify distribution of “favorable” residues, that are mainly responsible for a given fold formation, and “unfavorable” residues, that are incompatible with that fold, in polypeptide sequences. The process of determining favorable and unfavorable residues is iterative. The starting assumptions are based on the general principles of protein structure formation as well as structural features peculiar to a protein fold under investigation. The initial assumptions are tested one-by-one for a set of all known proteins with a given structure. The assumption is accepted as a “rule of amino acid distribution” for the protein fold if it holds true for all, or near all, structures. If the assumption is not accepted as a rule, it can be modified to better fit the data and then tested again in the next step of the iterative search algorithm, or rejected. We determined the set of amino acid distribution rules for a large group of beta sandwich-like proteins characterized by a specific arrangement of strands in two beta sheets. It was shown that this set of rules is highly sensitive (~90%) and very specific (~99%) for identifying sequences of proteins with specified beta sandwich fold structure.
    [Show full text]