PDF Bioinformatics Glossary


FR3 Bioinformatics Primer v1, June 23, 2014
FR3 Bioinformatics Primer: Glossary of terms*
(*from various sources including Discovering Genomics, Proteomics, & Bioinformatics 2nd edition, Campbell and Heyer; The Internet and the New Biology by Peruski and Peruski; and Bioinformatics for Dummies by Claverie and Notredame)

Accession number: identification number given to every DNA and protein sequence submitted to NCBI or an equivalent database.
Algorithm: step-by-step procedure for solving a problem (e.g. aligning two sequences) or computing a quantity (e.g. %GC). Typically written in Perl or another computer language.
Alignment: representation of two or more protein or nucleotide sequences where identical amino acids or nucleotides are in the same columns while missing amino acids or nucleotides are replaced with gaps.
Allele frequency: prevalence of a gene variant in a population.
Annotate: a genome is annotated once it has been analyzed for gene content. A gene is considered annotated if it has been assigned information pertaining to a cellular role.
Antibody array: an antibody microarray is a specific form of protein microarray in which a collection of capture antibodies is spotted and fixed on a solid surface such as a glass, plastic or silicon chip, for the purpose of detecting antigens.
Antisense technology: molecular method that uses a nucleic acid sequence complementary to an mRNA so that the two bind and the mRNA is effectively neutralized.
Array: an orderly pattern of objects. In genomic studies, there are microarrays and macroarrays. Microarrays are small spots of DNA or protein, and the identity of the spotted material is known. Macroarrays are bacterial, yeast or similar colonies on plates used to determine functional consequences of genomic manipulations.
Anonymous FTP: a file transfer protocol (FTP) that allows the retrieval of files from public sites.
Archive: a collection of data, text, programs or other electronic information stored for other parties to access or retrieve, typically without charge.
ASCII: stands for American Standard Code for Information Interchange; sets a unique numeric definition for each of the most commonly used characters in Western language, including numerals, letters and some symbols. An ASCII set is these characters and their numbers. In the context of a file, an ASCII file is one that contains only text characters: numbers, letters and standard punctuation.
Bacterial artificial chromosome (BAC): cloning vector replicated in bacteria that can hold an insert of about 150,000 base pairs.
BAM file: a binary version of a SAM file; the file extension is .bam.
BankIt: a computer program developed by the NCBI for submitting your own sequences to GenBank.
Binomial: a probability model for counting the number of 'successes' in an experiment with two possible outcomes.
Bioinformatics: a field of study that extracts biological information from large data sets such as sequences, protein interactions, microarrays, etc. This field also includes the area of data visualization.
Biological process: coined by Gene Ontology to describe broad cellular outcomes, such as mitosis or energy production, that are accomplished by ordered assemblies of molecules.
BLAST: the protein and nucleic acid sequence search engine developed at NCBI that allows you to search sequence databases. BLASTn searches for nucleotide sequences, BLASTp searches for amino acid sequences; BLAST2 compares 2 sequences.
BLOSUM, blocks substitution matrix: a popular substitution matrix for aligning protein sequences.
Bootstrap analysis: a statistical technique used to assess the reliability of the branching structure in phylogenetic trees.
Bowtie: software that maps short DNA sequences to a reference genome.
Browser: client software that reads HTML-formatted documents and displays them on your computer screen.
cDNA: see complementary DNA.
CDS: acronym for 'coding sequence'; abbreviation used at NCBI to indicate which bases constitute the open reading frame.
Cellular component: a term coined by Gene Ontology to describe subcellular structures, locations and macromolecular complexes such as nucleus, telomere, and mitotic spindles.
ChIP-Seq, chromatin immunoprecipitation sequencing: combination of chromatin immunoprecipitation and massively parallel sequencing used to survey interactions between protein, DNA and RNA.
Chromatogram: a four-colored graph produced from nonradioactive dideoxy sequencing methods including cycle sequencing.
Clade: group of related species and their common ancestors.
Cladogram: phylogenetic tree showing the relationship between the species.
Client: in its simplest form a client is a software program that an individual uses to send requests to a server. In a broader definition it can mean a computer or computer program that requests a service of a host computer or program.
Clone: noun or verb. A clone is any molecule/cell/organism present in more than one identical copy. To clone something means to produce more than one copy of the original molecule/cell/organism.
ClustalW: the most popular program for making multiple sequence alignments.
Clustering: action of grouping sequences according to their similarity.
Clusters of orthologous groups (COG): NCBI compilations of evolutionarily related gene sequences from several microbial genomes. This site allows you to search by gene or cellular role and produce dendrograms to show sequence similarities.
Comparative genomics: analysis based on comparisons of entire genomes.
Complementary DNA (cDNA): DNA synthesized from an mRNA template by reverse transcriptase.
Consensus: a pseudo sequence where each residue summarizes (using the majority rule) the content of a column in a multiple sequence alignment.
Conserved domain (CD): a domain that has been retained during evolution, presumably due to its essential role within the protein's structure. Conserved domain searches are a part of the BLAST search.
Constitutive: always active; constitutive promoters or genes are active at all times.
Contiguous (contigs): overlapping DNA segments that as a collection form a longer and gapless segment of DNA.
Correlation coefficient: a measure of the correspondence between two sets of data.
Covariance analysis: the study of a sequence that follows the evolution of two different positions and looks for some correlation.
Coverage: based on the number of bases sequenced in a genome, the coverage represents how many times an average base was sequenced; finished genomes frequently have 8X coverage.
Cufflinks: software that accepts aligned RNAseq reads and estimates the relative abundances of transcripts. Also tests for differential expression between samples.
Cy3 and Cy5: fluorescent green and red dyes, respectively, that are commonly used for microarray experiments.
Curate: oversee/annotate/manipulate gene data (often manually). Includes fine annotation of genes, filtering of genes according to certain parameters, and construction and manipulation of gene models.
Dendrogram: another name for phylogenetic tree. The ClustalW guide tree is often referred to as a dendrogram; it is a file with a .dnd extension.
Draft sequence: a description of the degree of confidence in a DNA sequence. When the DNA has been sequenced only 4 times it is described as a 'draft sequence', as opposed to a 'finished sequence', which has been sequenced 8 times. With a draft sequence about 95% of the genes should be identifiable.
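Three of the quantities named in the entries above are simple enough to compute directly: %GC (see Algorithm), a majority-rule consensus (see Consensus), and average coverage (see Coverage). The Python sketch below is only an illustration; the function names and example inputs are assumptions and do not come from the primer, and breaking consensus ties by first occurrence is an arbitrary choice.

```python
from collections import Counter


def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def consensus(aligned):
    """Majority-rule consensus of equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    # zip(*aligned) yields one tuple of residues per alignment column;
    # the most frequent residue in each column goes into the consensus.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def coverage(n_reads, read_length, genome_size):
    """Average number of times each base of the genome was sequenced."""
    return n_reads * read_length / genome_size


if __name__ == "__main__":
    print(gc_percent("ATGC"))
    print(consensus(["ACGT", "ACGA", "TCGA"]))
    print(coverage(1_000_000, 100, 12_500_000))
```

With these hypothetical inputs, one million 100-base reads over a 12.5 Mb genome correspond to the 8X figure the glossary associates with finished genomes.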
Distance matrix: matrix that contains all the pairwise distances between sequences in a data set. Distance matrices are often used to compute phylogenetic trees. Not a substitution matrix.
DNA microarray or DNA chip: synonyms for gene sequences spotted on glass slides used to measure simultaneously the level of transcription of many genes.
Domain (computational): a network or portion thereof that operates under a single administrative umbrella such as an institution, a group of institutions, a geographical region, or a country.
Domain (molecular): a protein unit of at least 50 amino acids that can fold on its own; can also be used for protein sequences that occur in various contexts.
Dot matrix: dot plot, a method for representing the similarity between two sequences without using an alignment.
Downstream: a relative direction for a DNA sequence; towards the 3' end.
dsDNA: double stranded DNA, formed when two complementary DNA sequences bind to each other.
dsRNA: double stranded RNA, formed when two complementary RNA sequences bind to each other.
E value (Expect value): when performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value is the number of matches with an equal or better score expected by chance in a database of that size; when it is small it approximates the probability that the similarity is due to chance. Therefore E-values are best when they are small (e.g. 1 x 10^-12).
EC number, Enzyme Commission number: a numerical classification scheme for enzymes based on the chemical reactions they catalyze.
EBI, European Bioinformatics Institute: the European counterpart to the NCBI in the U.S.
EMBL, European Molecular Biology Laboratory: this acronym refers to the nucleotide database that the laboratory maintains.
Entrez: the NCBI database querying system that is similar to SRS at the EBI.
Epitope: a part of an antigen that is recognized by the immune system.
... common method of transferring files across networks and between remote computers.
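The Distance matrix entry above can be made concrete with a small Python sketch. It assumes the sequences are already aligned to equal length and uses the p-distance (the proportion of compared positions that differ) purely for simplicity; the primer does not say which distance measure to use, and the function names here are hypothetical.

```python
def p_distance(a, b):
    """Proportion of aligned, ungapped positions at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    # Skip columns where either sequence has a gap character.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Square matrix of all pairwise p-distances between aligned sequences."""
    n = len(seqs)
    return [[p_distance(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for row in distance_matrix(["ACGT", "ACGA", "TCGA"]):
        print(row)
```

The result is symmetric with zeros on the diagonal, which is the form that distance-based tree-building methods take as input.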
EST, expressed sequence tag: a short subset of a cDNA sequence.
Expansionist: a person who takes a
GA II: Illumina sequencing platform that generates 40 Mb sequence per run with paired read lengths