RaptorX: Exploiting Structure Information for Protein Alignment by Statistical Inference
Jian Peng and Jinbo Xu*


PROTEINS: Structure · Function · Bioinformatics
Prediction Methods and Reports

Toyota Technological Institute at Chicago, Chicago, Illinois

ABSTRACT

This work presents RaptorX, a statistical method for template-based protein modeling that improves alignment accuracy by exploiting structural information in a single template or in multiple templates. RaptorX consists of three major components: single-template threading, alignment quality prediction, and multiple-template threading. This work summarizes the methods used by RaptorX and presents an analysis of its CASP9 results, aiming to identify the major bottlenecks of RaptorX and of template-based modeling and, hopefully, directions for further study. Our results show that template structural information helps considerably with both single-template and multiple-template protein threading, especially when closely related templates are unavailable, and that there is still much room for improvement in both alignment and template selection. The RaptorX web server is available at http://raptorx.uchicago.edu.

Proteins 2011; 79(Suppl 10):161–171. © 2011 Wiley-Liss, Inc.

Key words: single-template threading; multiple-template threading; alignment quality prediction; probabilistic alignment; multiple protein alignment; CASP.

The authors state no conflict of interest.
Grant sponsor: National Institutes of Health; Grant number: R01GM089753; Grant sponsor: National Science Foundation; Grant number: DBI-0960390
*Correspondence to: Jinbo Xu, Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL 60637. E-mail: [email protected].
Received 6 April 2011; Revised 25 July 2011; Accepted 19 August 2011
Published online 12 September 2011 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/prot.23175

INTRODUCTION

RaptorX is totally different from our previous threading program RAPTOR, which aligns a sequence to a template by using linear programming to minimize a given threading scoring function [1,2]. By contrast, RaptorX uses a statistical learning method to design a new threading scoring function, aiming to better measure the compatibility between a target sequence and a template structure. In addition to single-template threading, RaptorX also has a multiple-template threading component and contains a new module for alignment quality prediction. Our results show that RaptorX indeed has much better alignment accuracy than RAPTOR [3,4].

RaptorX is designed to address two "alignment" challenges facing template-based protein modeling. One is how to align a target to its template when they have a sparse sequence profile (i.e., the homologs do not carry a sufficient amount of information). In this case, a profile-based alignment method may not work well. The other is how to improve sequence-template alignment accuracy by using the more reliable template structural alignments as a bridge when at least two similar templates are available for a target. RaptorX addresses these two challenges by exploiting template structure information in several unique ways.

Many homology modeling and protein threading methods have been developed for sequence-template alignment [5-20]. These methods have two major issues in dealing with a distantly related sequence and template. One is that these methods use a linear scoring function to guide the alignment of a sequence to its template. A linear function cannot deal well with correlation among protein features, although many features are indeed correlated (e.g., secondary structure vs. solvent accessibility). The other issue is that these methods depend heavily on the sequence profile. The sequence profile has proved to be very powerful in detecting remote homologs and generating accurate alignments, as demonstrated by the excellent HHpred program [7]. However, the information in a sequence profile may not be sufficient, especially when the profile is sparse. To address these two issues, RaptorX uses a nonlinear scoring function to combine homologous information (i.e., the sequence profile) and structure information in a very flexible way. When the proteins under consideration have high-quality sequence profiles, RaptorX relies more on profile information; otherwise, it relies more on structure information to improve alignment accuracy.

When multiple similar templates are available for a single target, RaptorX improves pairwise target-template alignment accuracy by exploiting template structural similarity (i.e., geometrical information), in addition to increasing alignment coverage for the target (by copying the most similar regions from different templates). Nearly all existing multiple-template methods generate an alignment between the target and its multiple templates either by simply assembling pairwise alignments into a single multiple alignment using the target as an anchor [21,22] or by using multiple sequence alignment tools such as T-Coffee [23], MUSCLE [24], and ProbCons [25]. Neither way effectively uses template structural alignments, which are more reliable than sequence-template alignments, as a bridge to build sequence-to-multiple-template alignments, so they usually do not fare much better than when only a single template is available for the target. By contrast, RaptorX takes advantage of template structural alignments to align a target sequence simultaneously to multiple templates through a novel statistical inference method. Two multiple protein alignment programs, 3DCoffee [26] and PROMAL3D [27], which can also be used to align a single sequence to multiple templates, do take advantage of template structural alignments in building a multiple alignment, but our experimental results show that they do not fare well when the target and templates are not closely related [28].

METHODS

Following PSI-BLAST and HHpred, we use NEFF to measure the amount of information in the sequence profile of a protein. NEFF ranges from 1 to 20 and can be interpreted as the expected number of amino acid substitutions at each sequence position. A sparse sequence profile (i.e., a profile with a small NEFF value) usually leads to less accurate secondary structure prediction and less accurate alignment [29].

Single-template protein threading

We use a probabilistic model to formulate the pairwise sequence-template alignment problem. See Peng and Xu [3,4] for the technical details. Our probabilistic model uses a regression-tree-based nonlinear scoring function to measure the similarity between two proteins. A regression tree consists of a collection of rules to calculate the probability of an alignment. One rule can be as simple as "if (mutation score < -50), then the log-likelihood of … (-50 < mutation score < -10) and (secondary structure score > 0.9) and (solvent accessibility score > 0.6), then the log-likelihood of two residues being aligned is ln 0.7". Our scoring function is much more sensitive than the widely used linear function, because our function can model protein feature correlation. Our method also enables us to use different criteria to align different regions of the sequence and template. This is analogous to the position-specific scoring matrix, which has different mutation potentials for the same amino acid at different positions.

Our method differs from others in that we use NEFF to adjust the relative importance of homologous and structure information so that the former will not dominate the latter. When proteins have a large NEFF, RaptorX relies more on sequence profile information; otherwise, it relies more on structure information. Our method uses both context-specific and position-specific gap penalties and then uses NEFF to determine their relative importance. If NEFF is large, we rely more on the position-specific gap penalty derived from the alignment of sequence homologs; otherwise, we rely more on the context-specific gap penalty. Our context-specific gap penalty depends on (predicted) secondary structure type, (predicted) solvent accessibility, amino acid identity, hydropathy count, and whether or not a residue is in the core region.

Alignment quality prediction

We predict the absolute quality of a pairwise sequence-template alignment using a neural network and then use the predicted quality to rank all the templates for a specific target. The quality of a pairwise sequence-template alignment is defined as the TMscore [30] of the 3D model built from this alignment by MODELLER (with default parameters). However, our method does not need to actually build a 3D model in order to predict alignment quality, because only information in the alignment is used for quality prediction. This saves the time needed for 3D model building, and, thus, we can predict alignment quality very quickly. Our old RAPTOR program uses an SVM method to predict the number of correctly aligned positions in an alignment [31]. RaptorX differs from RAPTOR in that RaptorX predicts TMscore, a better quality measure, and also uses a better set of alignment features.

Let A(i) denote the target residue aligned to the ith template residue. A(i) is empty if the template residue is not aligned to any target residue. Let PSSM and PSFM denote the position-specific scoring matrix and position-specific frequency matrix for the template and the sequence, respectively. PSSM and PSFM are two slightly different representations of a sequence profile. Let PSSMi and PSFMi denote the sequence profiles at template position i and sequence position i, respectively. Both PSSMi and PSFMi are a vector of 20 real values encoding