Book of Abstracts Belgrade BioInformatics Conference 2016

20-24 June 2016, Belgrade, Serbia

UNIVERSITY OF BELGRADE
FACULTY OF MATHEMATICS
Nenad Mitić, editor
Belgrade BioInformatics Conference 2016
Book of abstracts
Belgrade, June 20th-24th

The conference is organized by the Bioinformatics Research Group, University of Belgrade - Faculty of Mathematics (http://bioinfo.matf.bg.ac.rs). Co-organizers of the conference are: Faculty of Agriculture, Faculty of Biology, Faculty of Chemistry, Faculty of Physical Chemistry, Institute for Biological Research “Siniša Stanković”, Institute for General and Physical Chemistry, Institute for Medical Research, Institute of Molecular Genetics and Genetic Engineering, Vinča Institute of Nuclear Sciences, Mathematical Institute of SASA, Belgrade, and COST - European Cooperation in Science and Technology.

The conference is financially supported by

– Ministry of Education, Science and Technological Development of Republic of Serbia
– Central European Initiative (CEI)
– Telekom Srbija
– SevenBridges Genomic
– RNIDS - Register of National Internet Domain Names of Serbia
– Genomix4Life

Publication of this Book of abstracts is financed by the Ministry of Education, Science and Technological Development of Republic of Serbia

Publisher: Faculty of Mathematics, University of Belgrade
Printed in Serbia, by DonatGraf, Belgrade

Serbian National Library Cataloguing in Publication Data
Faculty of Mathematics, Belgrade
Book of Abstracts: Belgrade BioInformatics Conference 2016, 20-24 June 2016. – Book of abstracts
Nenad Mitić, editor. XIX+151 pages, 24cm.

Copyright © 2016 by Faculty of Mathematics, University of Belgrade
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior permission of the publisher.

ISBN: 978-86-7589-108-6

Number of copies printed: 200

International Advisory Committee

Vladik Avetisov: The Semenov Institute of Chemical Physics, RAS, Moscow, Russia
Vladimir Brusić: School of Medicine and Bioinformatics Center, Nazarbayev University, Kazakhstan and Department of Computer Science, Metropolitan College, Boston University, USA
Michele Caselle: Department of Physics, Torino University, Torino, Italy
Radu Constantinescu: Department of Physics, University of Craiova, Craiova, Romania
Oxana Galzitskaya: Group of bioinformatics, Institute of Protein Research of the RAS, Russia
Madhavi Ganapathiraju: Department of Biomedical Informatics, University of Pittsburgh, USA
Mikhail Gelfand: A.A. Kharkevich Institute for Information Transmission Problems, RAS, Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
Ernst Walter Knapp: Fachbereich Biologie, Chemie, Pharmazie/Institute of Chemistry and Biochemistry, Freie Universität Berlin, Germany
Sergey Kozyrev: Steklov Mathematical Institute, Moscow, Russia
Zoran Obradović: Center for Data Analytics and Biomedical Informatics, Temple University, USA
Yuriy L. Orlov: Institute of Cytology and Genetics SB RAS, Novosibirsk State University, Russia
George Patrinos: Department of Pharmacy, University of Patras, Greece
Nataša Pržulj: Department of Computing, Imperial College London, UK
Paul Sorba: Laboratory of Theoretical Physics and CNRS, Annecy, France
Bosiljka Tadić: Department of Theoretical Physics, Jozef Stefan Institute, Ljubljana, Slovenia
Peter Tompa: VIB Structural Biology Research Center, Flanders Institute for Biotechnology (VIB), Belgium
Silvio Tosatto: Department of Biomedical Sciences, University of Padova, Italy
Edward Trifonov: Weizmann Institute of Science, University of Haifa, Haifa, Israel
Matthias Ullmann: Structural Biology/Bioinformatics, Universität Bayreuth, Germany
Bane Vasić: The University of Arizona, Department of Electrical and Computer Engineering, Bios Institute for Collaborative Bioresearch, USA
Sergey Volkov: Bogolyubov Institute for Theoretical Physics, Kiev, Ukraine
Ioannis Xenarios: SIB Swiss Institute of Bioinformatics, Switzerland

BelBI2016, Belgrade, June 2016.

International Programme Committee

Miloš Beljanski: Institute for General and Physical Chemistry, University of Belgrade, Serbia
Erik Bongcam-Rudloff: Division of Molecular Genetics, Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Sweden
Antonio Cappuccio: Immunity and Cancer, Institut Curie, France
Oliviero Carugo: Faculty of Science, University of Pavia, Italy
Boris Delibašić: Faculty of Organizational Sciences, University of Belgrade, Serbia
Zsuzsanna Dosztanyi: Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
Branko Dragovich: Institute of Physics, Mathematical Institute SANU, Belgrade, Serbia
Marko Djordjević: Faculty of Biology, University of Belgrade, Serbia
Olgica Djurković-Djaković: Institute for Medical Research, University of Belgrade, Serbia
Lajos Kalmar: Department of Veterinary Medicine, Cambridge Veterinary School, Cambridge, UK
Eija Korpelainen: CSC IT Center for Science, Finland
Ilija Lalović: Faculty of Natural Sciences and Mathematics, Banja Luka, Bosnia and Herzegovina
Nenad Mitić: Faculty of Mathematics, University of Belgrade, Serbia
Mihajlo Mudrinić: Vinča Institute of Nuclear Sciences, University of Belgrade, Serbia
Zoran Ognjanović: Mathematical Institute SANU, Serbia
Gordana Pavlović-Lažetić: Faculty of Mathematics, University of Belgrade, Serbia
Marco Punta: Pierre and Marie Curie University, France
Predrag Radivojac: Department of Computer Science and Informatics, Indiana University, USA
Ana Simonović: Institute for Biological Research Siniša Stanković, Belgrade, Serbia
Jerzy Tiuryn: Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland
Andrew Torda: Center for Bioinformatics, University of Hamburg, Germany
Alessandro Treves: SISSA-Cognitive Neuroscience, Trieste, Italy
Nevena Veljković: Institute for Nuclear Sciences VINCA, University of Belgrade, Serbia
Igor V. Volovich: Department of Mathematical Physics, Steklov Mathematical Institute, RAS, Moscow, Russia
Snežana Zarić: Faculty of Chemistry, University of Belgrade, Serbia

Local Organizing Committee

Bojana Banović: Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Serbia
Miloš Beljanski: Institute for General and Physical Chemistry, University of Belgrade, Serbia
Branko Dragovich: Co-Chair, Institute of Physics, Mathematical Institute SANU, Belgrade, Serbia
Marko Djordjević: Faculty of Biology, University of Belgrade, Serbia
Olgica Djurković-Djaković: Institute for Medical Research, University of Belgrade, Serbia
Jelena Guzina: Faculty of Biology, University of Belgrade, Serbia
Jovana Kovačević: Faculty of Mathematics, University of Belgrade, Serbia
Saša Malkov: Faculty of Mathematics, University of Belgrade, Serbia
Mirjana Maljković: Faculty of Mathematics, University of Belgrade, Serbia
Vesna Medaković: Faculty of Chemistry, University of Belgrade, Serbia
Nenad Mitić: Co-Chair, Faculty of Mathematics, University of Belgrade, Serbia
Ivana Morić: Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Serbia
Mihajlo Mudrinić: Vinča Institute of Nuclear Sciences, University of Belgrade, Serbia
Vesna Pajić: Faculty of Agriculture, University of Belgrade, Serbia
Mirjana Pavlović: Institute for General and Physical Chemistry, University of Belgrade, Serbia
Gordana Pavlović-Lažetić: Co-Chair, Faculty of Mathematics, University of Belgrade, Serbia
Jelena Samardzić: Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Serbia
Ana Simonović: Institute for Biological Research Siniša Stanković, Belgrade, Serbia
Miomir Stanković: Mathematical Institute of the Serbian Academy of Sciences and Arts, Belgrade, Serbia
Biljana Stojanović: Faculty of Mathematics, University of Belgrade, Serbia
Aleksandra Uzelac: Institute for Medical Research, University of Belgrade, Serbia


Preface

The first International Belgrade BioInformatics Conference (BelBI 2016) takes place in Belgrade, Serbia, 20-24 June 2016. It grew out of the communities of previous conferences held in Belgrade: Data Mining in Bioinformatics (DMBI, 2012) and Theoretical Approaches to Bioinformation Systems (TABIS, 2010, 2013). It is organized by the Bioinformatics group of the University of Belgrade, Faculty of Mathematics, in cooperation with several other institutions from Belgrade (Faculty of Agriculture, Faculty of Biology, Faculty of Chemistry, Faculty of Physical Chemistry, Institute for Biological Research “Siniša Stanković”, Institute for General and Physical Chemistry, Institute for Medical Research, Institute of Molecular Genetics and Genetic Engineering, Vinča Institute of Nuclear Sciences, and Mathematical Institute of the Serbian Academy of Sciences and Arts) and COST (European Cooperation in Science and Technology) Action BM1405.

The main purpose of the BelBI 2016 conference is to illuminate different aspects of bioinformation systems: from theoretical approaches to modeling different phenomena in life sciences, to the information technologies necessary for analyzing and understanding the huge amount of data generated, to applications of computer science and informatics in precision medicine, the search for new remedies against debilitating diseases, and drug development.

The conference focuses on three main research fields including (but not limited to) the following topics:

1. Theoretical Approaches to BioInformation Systems:
– Structure and function of DNA, RNA and proteins
– Gene expression and the genetic code
– Neurons and cognition
– Biological networks

2. Bioinformatics and Data Mining for OMICs Data:
– Data mining methods, algorithms, and applications in life sciences and precision medicine
– Big data and data science
– Data analytics, pattern recognition and machine learning in data analysis
– Software and tools in genomics, proteomics, metabolomics, transcriptomics, epigenomics, etc.
– Sequence analysis
– Predictive Models for OMICs data
– Bioinformatics databases and algorithms

3. Biomedical Informatics will focus on information applied to or studied in the context of biomedicine:
– Translational Bioinformatics
– Disease Models & Epidemiology
– Predictive Modeling and Analytics in Healthcare
– Biomedical Imaging and Data Visualization
– Biomedical/Health database integration and management
– Biomedical data/text mining

The conference program contains keynote lectures, invited talks, and selected oral and poster presentations. We did our best to bring together scientists from Europe and beyond, and we hope to provide a pleasant and stimulating place for gathering and exchanging ideas in bioinformatics and related fields.

We thank all the colleagues who accepted our invitation to serve on the International Advisory, Program and Organizing Committees. We also thank all the colleagues who accepted our invitation to present their research. The book of abstracts of all the presentations is in our hands. We thank the Ministry of Education, Science and Technological Development of Republic of Serbia for financially supporting publication of this book of abstracts. We also thank our sponsors (Ministry of Education, Science and Technological Development of Republic of Serbia, Central European Initiative (CEI), Telekom Srbija, SevenBridges Genomic, RNIDS - Register of National Internet Domain Names of Serbia, and Genomix4Life) and all others who helped us in making this event happen.

June 2016

Program co-chairs:
Branko Dragovich
Gordana Pavlović-Lažetić
Nenad Mitić

BelBI2016 Conference program

Monday, June 20th

Morning session Location: Rectors hall, Rectorate of the University of Belgrade

9:00-10:00 Registration

10:00-10:15 Opening ceremony

Chair: Branko Dragovich

10:15-11:00 Keynote speaker – Mikhail Gelfand (Moscow State University, Russia) Epigenetic state and spatial structure of chromatin

11:00-11:45 Welcome cocktail

Chair: Gordana Pavlovic-Lazetic

11:45-12:30 Keynote speaker – Vladimir Brusic (Nazarbayev University, Kazakhstan) Elemental metabolomics for improving human health

12:30-13:15 Keynote speaker – Vladimir Uversky (University of South Florida, USA) Intrinsically disordered proteins in salted water and in the thick soup

13:20-15:00 Lunch (Hotel ”Palace”)

Afternoon session: Location - Hotel ”Palace”, Conference hall

Chair: Mikhail Gelfand

15:00-15:35 Invited Speaker: Alexandre Morozov (Rutgers University, USA) Biophysical models of protein evolutionary dynamics

15:35-16:10 Invited Speaker: Vladik Avetisov (The Semenov Institute of Chemical Physics, RAS, Russia) Complex landscapes and ultrametricity in a biological context

16:10-16:40 coffee break

TABIS Session: Hotel ”Palace”, Conference hall

Chair: Alexandre Morozov

16:40-17:00 Aleksandr Bugay (Joint Institute for Nuclear Research, Moscow, Russia) Radiation Induced Dysfunctions in the Working Memory Performance Studied by Neural Network Modeling

17:00-17:20 Hanen Masmoudi (Higher institute of Biotechnology of Sfax, Tunisia) Model selection in biomolecular pathways

17:20-17:40 Silvia Grigolon (The Francis Crick Institute, United Kingdom) Identifying relevant positions in proteins by Critical Variable Selection

17:40-18:00 Jelena Guzina (University of Belgrade, Faculty of Biology, Serbia) Transcription initiation by alternative sigma factors

18:00-18:20 Bojana Blagojevic (Institute of Physics, Belgrade, Serbia) Achieving a rapid expression of toxic (but useful) molecules within cell

18:20-18:40 Andjela Rodic (University of Belgrade, Faculty of Biology, Serbia) Examining regulation of restriction-modification systems by quantitative modeling

DMBI Session - Hotel ”Palace”, Banquet hall

Chair: Jovana Kovacevic

16:40-17:00 Urs Lahrmann (Fraunhofer Institute for Toxicology and Experimental Medicine, Regensburg, Germany) Combined genomic and transcriptomic characterization of single disseminated prostate cancer cells

17:00-17:20 Miroslava Cuperlovic-Culf (National Research Council of Canada, Ottawa, Canada) Genome-scale Modelling, Metabolomics and Cheminformatics analysis guiding the Discovery of Antifungal Metabolites for Crop Protection

17:20-17:40 Milos Busarcevic (United World College of the Adriatic, Duino, Italy) Transcriptome data mining results support observed changes in host lipid metabolism during experimental toxoplasmosis

17:40-18:00 Jovana Kovacevic (University of Belgrade, Faculty of Mathematics, Serbia) One structured output learning method for protein function prediction

18:00-18:20 Davorka Jandrlic (Faculty of Mechanical Engineering, University of Belgrade, Serbia) The influence of amino acids physicochemical properties and frequencies on identifying MHC binding ligands

18:20-18:40 Vladimir Babenko (Institute of Cytology and Genetics, Novosibirsk, Russia) Clustering of CpG-rich elements in gene dense regions

Tuesday, June 21st

COST session Morning session: Location - Hotel ”Palace”, Conference hall

Chair: Nevena Veljkovic

9:00- 9:35 Invited Speaker: Peter Tompa (Flanders Institute for Biotechnology (VIB), Belgium) The role of structural disorder in protein degradation

9:35-10:10 Invited Speaker: Silvio Tosatto (University of Padova, Italy) Non-globular proteins: Towards an understanding of the ”dark matter” in the protein universe

10:10-10:45 Invited Speaker: Oxana Galzitskaya (Institute of Protein Research of the Russian Academy of Sciences, Russia) Molecular mechanism of Aβ amyloid formation

10:45-11:15 coffee break

Chair: Oliviero Carugo

11:15-11:50 Invited Speaker: Marco Punta (Pierre and Marie Curie University, France) Intrinsically disordered protein families

11:50-12:25 Invited Speaker: Alexandre de Brevern (University Paris Diderot, France) On flexibility, deformability and mobility of protein structures in the light of a structural alphabet

12:25-13:00 Invited Speaker: Ioannis Xenarios (SIB Swiss Institute of Bioinformatics, Switzerland) From biocuration to model predictions and back

13:00-14:00 Sponsors presentation (SevenBridges)

14:00-15:00 Lunch (Hotel ”Palace”)

Afternoon session: TABIS Session: Hotel ”Palace”, Banquet hall

Chair: Bosiljka Tadic

15:00-15:35 Invited Speaker: Antonio Celani (The Abdus Salam International Centre for Theoretical Physics, Trieste, Italy) Infomax strategies for an optimal balance of exploration and exploitation

15:35-16:10 Invited Speaker: Aleksandar Stojmirovic (Johnson & Johnson comp., USA) Networks of Co-expression Modules

16:15-16:35 Asja Jelic (The Abdus Salam International Centre for Theoretical Physics, Trieste, Italy) Networks of interaction in moving animal groups and collective changes of direction

16:35-16:55 Anastasia Anashkina (Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Russia) Bioinformatics Basis for the ”Molecular Tweezers” Construction

16:55-17:25 coffee break

17:25-19:00 Poster Session (Hotel ”Palace”, Banquet hall)

HI Session - Hotel ”Palace”, Conference hall

Chair: Olgica Djurkovic-Djakovic

15:00-15:35 Invited Speaker: Ralf Bundschuh (The Ohio State University, USA) Quantifying genome-wide DNA methylation from MethylCap-Seq data and its applications in cancer

15:35-15:55 Paolo Paradisi (Institute of Information Science and Technologies, Pisa, Italy) Complexity measures based on intermittent events in brain EEG data

15:55-16:15 Ivan Jovanovic (VINCA Institute, University of Belgrade, Serbia) Could integrative bioinformatic approach predict the circulating miRs that have significant role in pancreatic tissue in type 2 diabetes?

16:15-16:35 Nikola Milosevic (University of Manchester, UK) Hybrid methodology for information extraction from tables in clinical literature

16:35-16:55 Petar Velickovic (University of Cambridge, UK) Viral: Real-world competing process simulations on multiplex networks

16:55-17:25 coffee break

17:25-19:00 Poster Session (Hotel ”Palace”, Banquet hall)

Wednesday, June 22nd

Morning session: Location - Hotel ”Palace”, Conference hall

Chair: Vladimir Brusic

8:30- 9:05 Invited Speaker: Zoran Obradovic (Temple University, USA) Effectiveness of Multiple Blood Cleansing Interventions in Sepsis

9:05- 9:40 Invited Speaker: Natasa Przulj (Imperial College London, UK) Network Data Integration Enables Precision Medicine

9:40-10:15 Invited Speaker: Nitesh Chawla (University of Notre Dame, USA) Leveraging Electronic Medical Records for Personalized and Population Healthcare

11:00- EXCURSION

Thursday, June 23rd

Morning session: Location - Hotel ”Palace”, Conference hall

Chair: Marko Djordjevic

9:00- 9:35 Invited Speaker: Yuriy L. Orlov (Novosibirsk State University, Russia) Comparative analysis of plant genome structure and antisense transcripts

9:35-10:10 Invited Speaker: Paul Sorba (Laboratory of Theoretical Physics and CNRS, France) Symmetry and Minimum Principle: a basis for the Genetic Code

10:10-10:45 Invited Speaker: Konstantin Severinov (Rutgers University, USA) The Influence of Copy-Number Maintenance Mechanisms of Targeted Extrachromosomal Genetic Elements on the Outcome of CRISPR-Cas Defense

10:45-11:15 coffee break

Chair: Vladimir Uversky

11:15-11:50 Invited Speaker: Bosiljka Tadić (Jozef Stefan Institute, Slovenia) Algebraic Topology Analysis of Brain Graphs Emanating from Social Communications

11:50-12:25 Invited Speaker: Erik Bongcam-Rudloff (Swedish University of Agricultural Sciences, Sweden) Next Generation Biotechnologies, the bad and the good: a look into the future

12:25-13:00 Invited Speaker: Andrea Ciliberto (IFOM-IEO, Italy) Adapt or die. Investigating the molecular basis of cell variability

13:00-13:30 Sponsors presentation (Genomix4Life, Pearson)

13:30-15:00 Lunch (Hotel ”Palace”)

Afternoon session TABIS Session: Hotel ”Palace”, Banquet hall

Chair: Paul Sorba

15:00-15:35 Invited Speaker: Branko Dragovich (Mathematical Institute SASA, Serbia) Ultrametric Approach to Bioinformation Systems

15:35-15:55 Natasa Djurdjevac Conrad (Zuse Institute Berlin, Germany) A new random-walk-based approach for finding co-expression modules in biological networks

15:55-16:15 Natasa Misic (Lola Institute, Belgrade, Serbia) Standard Genetic Code vs Vertebrate Mitochondrial Code: Nucleon Balances and p-Adic Distances

16:15-16:35 Ozal Mutlu (Marmara University, Istanbul, Turkey) Structural Characterization of the Trypanosoma brucei CK2A1-HDAC1/HDAC2 Interactions by Molecular Modeling and Protein-Protein Docking

16:35-17:10 coffee break

Chair: Yuriy L. Orlov

17:10-17:30 Tamara Dimitrova (Macedonian Academy of Sciences and Arts, Macedonia) Analysis of network structural characteristics through vertex characteristics in directed networks

17:30-17:50 Balázs Szalkai (Eötvös Loránd University, Budapest, Hungary) Graph Theoretical Analysis Reveals: Women's Brains Are Better Connected than Men's

17:50-18:10 Balint Varga (Eötvös Loránd University, Budapest, Hungary) Comparative Connectomics: Mapping the Inter-Individual Variability of Connections within the Regions of the Human Brain

18:10-18:30 Yair Lakretz (Tel Aviv University, Israel) The perceptual structure of the phoneme manifold

20:00- Conference Dinner

DMBI/HI Session - Hotel ”Palace”, Conference hall

Chair: Noël Malod-Dognin

15:00-15:35 Invited Speaker: Jan Baumbach (University of Southern Denmark, Denmark) Computational Breath Analysis - Non-invasive detection of biomarkers in exhaled air and bacterial vapor

15:35-15:55 Ana Simonovic (Institute for Biological Research, University of Belgrade, Serbia) Identification of genes involved in morphogenesis in vitro in Centaurium erythraea Rafn. as a model organism

15:55-16:15 Richard Roettger (University of Southern Denmark, Odense, Denmark) On the clustering of biomedical datasets - a data-driven perspective

16:15-16:35 Milan Vukicevic (University of Belgrade, Faculty of Organizational Sciences, Serbia) White-Box Predictive Algorithms for Predicting Disease States on Gene Expression Data From Component Based Design to Meta Learning

16:35-17:10 coffee break

Chair: Dragan Matic

17:10-17:30 Dragana Dudic (University of Belgrade, Faculty of Agriculture, Serbia) Mining PMMoV genotype-pathotype association rules from public databases

17:30-17:50 Ana Jelovic (University of Belgrade, Faculty of Transport and Traffic Engineering, Serbia) Filtering of repeat sequences in genomes

17:50-18:10 Milana Grbic (University of Banja Luka, Faculty of Science and Mathematics, Bosnia and Herzegovina) Improving 1NN strategy for classification of some prokaryotic organisms

18:10-18:30 Sanja Brdar (Institute for research and development of information technology in biosystem, University of Novi Sad, Serbia) Non-negative Matrix Factorization for Integrative Clustering of Bioinformatics Data

20:00- Conference Dinner

Friday, June 24th

Morning session TABIS Session: Hotel ”Palace”, Banquet hall

Chair: Vladik Avetisov

9:00- 9:35 Invited Speaker: Sergei Kozyrev (Steklov Mathematical Institute, Russia) Dark states in quantum photosynthesis

9:35-10:10 Invited Speaker: Argyris Nicolaidis (Aristotle University of Thessaloniki, Greece) A Quantum Approach to the DNA Structure

10:10-10:30 coffee break

Chair: Robert Waterhouse

10:30-11:05 Invited Speaker: Sergey Volkov (Bogolyubov Institute for Theoretical Physics, Ukraine) DNA polymorphism as a tool for genetic information implementation

11:05-11:25 Polina Kanevska (Bogolyubov Institute for Theoretical Physics, Ukraine) Mechanism of unusual flexibility of DNA TATA-box

11:25-11:45 Ana Stanojevic (University of Belgrade, Faculty of Physical Chemistry, Serbia) Mathematical Modeling of the Hypothalamic-Pituitary-Adrenal Axis Dynamics in Rats

11:45-12:05 Alina-Maria Streche (University of Craiova, Department of Physics, Romania) Chaos and symmetry in mathematical neural flow models

13:00-15:00 Lunch (Hotel ”Palace”)

16:00-17:30 Round Table Discussion: perspectives and cooperation

17:30 Closing ceremony

DMBI/HI Session - Hotel ”Palace”, Conference hall

Chair: Alexandre de Brevern

9:00- 9:35 Invited Speaker: Robert Waterhouse (University of Geneva, Switzerland) OrthoDB: an evolutionary perspective to interpreting genomics data

9:35-10:10 Invited Speaker: Goran Nenadic (University of Manchester, UK) What is bioinformatics made from: understanding database and software usage through literature mining

10:10-10:30 coffee break

Chair: Jan Baumbach

10:30-11:05 Invited Speaker: Noël Malod-Dognin (Imperial College London, UK) Patient Specific Network Data Integration Enables Precision Medicine in Cancer

11:05-11:25 Invited Speaker: Nevena Veljkovic (VINCA Institute, University of Belgrade, Serbia) Transcription factors interaction inference based on sequence feature representations

11:25-11:45 Zeljko Popovic (University of Novi Sad, Faculty of Sciences, Serbia) DORMANCYbase developing a bioinformatics database on molecular regulation of animal dormancy

11:45-12:05 Milena Banjevic (Natera Inc., San Carlos, USA) SNP-Based Noninvasive Prenatal Screening using Cell-Free DNA for Detection of Fetal Chromosome Abnormalities

13:00-15:00 Lunch (Hotel ”Palace”)

16:00-17:30 Round Table Discussion: perspectives and cooperation (Hotel ”Palace”, Conference hall)

17:30 Closing ceremony (Hotel ”Palace”, Conference hall)


Table of Contents

A. Invited speakers

Complex landscapes and ultrametricity in a biological context ...... 1 Vladik A. Avetisov

Computational Breath Analysis - Non-invasive detection of biomarkers in exhaled air and bacterial vapor ...... 2 Jan Baumbach

Next Generation Biotechnologies, the bad and the good: a look into the future ...... 3 Erik Bongcam-Rudloff

On flexibility, deformability and mobility of protein structures in the light of a structural alphabet ...... 4 Tarun Narwani, Pierrick Craveur, Nicolas Shinada, Hubert Santuz, Joseph Rebehmed, Catherine Etchebest, and Alexandre G. de Brevern

Elemental metabolomics for improving human health ...... 7 Ping Zhang, Constantinos Georgiou, and Vladimir Brusic

Quantifying genome-wide DNA methylation from MethylCap-Seq data and its applications in cancer ...... 9 Ralf Bundschuh

Infomax strategies for an optimal balance of exploration and exploitation . 10 Antonio Celani

Leveraging Electronic Medical Records for Personalized and Population Healthcare ...... 11 Nitesh V. Chawla

Adapt or Die. Investigating the molecular basis of cell variability...... 12 Andrea Ciliberto

Ultrametric Approach to Bioinformation Systems ...... 13 Branko Dragovich

Molecular mechanism of Aβ amyloid formation ...... 14 Oxana V. Galzitskaya, Olga M. Selivanova, Alexey K. Surin, Victor V. Marchenkov, Ulyana F. Dzhus, Elizaveta I. Grigorashvili, Mariya Yu. Suvorina, Anna V. Glyakina, and Nikita V. Dovidchenko

Epigenetic state and spatial structure of chromatin ...... 15 Mikhail Gelfand

Dark states in quantum photosynthesis ...... 16 Sergei Kozyrev

Patient Specific Network Data Integration Enables Precision Medicine in Cancer ...... 17 Noël Malod-Dognin

Biophysical models of protein evolutionary dynamics ...... 18 Alexandre Morozov

What is bioinformatics made from: understanding database and software usage through literature mining ...... 19 Goran Nenadić

A Quantum Approach to the DNA Functioning ...... 20 Argyris Nicolaidis

Effectiveness of Multiple Blood Cleansing Interventions in Sepsis ...... 21 Zoran Obradovic

Comparative analysis of plant genome structure and antisense transcripts . 22 Salwa E.S. Mohamed, Oxana B. Dobrovolskaya, Vladimir N. Babenko, Khaled Salem, Ming Chen, and Yuriy L. Orlov

Network Data Integration Enables Precision Medicine ...... 23 Nataša Pržulj

Intrinsically disordered protein families ...... 24 Marco Punta

The Influence of Copy-Number Maintenance Mechanisms of Targeted Extrachromosomal Genetic Elements on the Outcome of CRISPR-Cas Defense ...... 25 Konstantin Severinov, Iaroslav Ispolatov, and Ekaterina Semenova

Symmetry and minimum principle: a basis for the genetic code? ...... 26 Paul Sorba

Networks of Co-expression Modules ...... 27 Aleksandar Stojmirović

Algebraic Topology Analysis of Brain Graphs Emanating from Social Communications ...... 28 Bosiljka Tadić and Miroslav Andjelković

The role of structural disorder in protein degradation ...... 29 Peter Tompa

Non-globular proteins: Towards an understanding of the ”dark matter” in the protein universe ...... 30 Silvio C.E. Tosatto

Intrinsically disordered proteins in salted water and in the thick soup .... 31 Vladimir N. Uversky

Transcription factors interaction inference based on sequence feature representations ...... 32 Nevena Veljković

DNA polymorphism as a tool for genetic information implementation .... 33 Sergey N. Volkov

OrthoDB: an evolutionary perspective to interpreting genomics data ..... 35 Robert M. Waterhouse, Evgenia V. Kriventseva, and Evgeny M. Zdobnov

From biocuration to model predictions and back ...... 36 Ioannis Xenarios

B. Speakers in sessions

Bioinformatics Basis for the ”Molecular Tweezers” Construction ...... 39 Anastasia Anashkina and Alexei Nekrasov

Clustering of CpG-rich elements in gene dense regions ...... 43 Vladimir Babenko, Irina Chadaeva, and Yuriy Orlov

SNP-Based Noninvasive Prenatal Screening using Cell-Free DNA for Detection of Fetal Chromosome Abnormalities ...... 45 Milena Banjevic, Allison Ryan, and Styrmir Sigurjonsson

Achieving a rapid expression of toxic (but useful) molecules within cell ... 46 Bojana Blagojevic, Magdalena Djordjevic, and Marko Djordjevic

Non-negative Matrix Factorization for Integrative Clustering of Bioinformatics Data ...... 47 Sanja Brdar

Radiation Induced Dysfunctions in the Working Memory Performance Studied by Neural Network Modeling ...... 48 Aleksandr Bugay

Transcriptome data mining results support observed changes in host lipid metabolism during experimental toxoplasmosis ...... 49 Miloš Busarčević, Aleksandar Trbovich, Ivan Milovanović, Aleksandra Uzelac, and Olgica Djurković-Djaković

Genome-scale Modelling, Metabolomics and Cheminformatics analysis guiding the Discovery of Antifungal Metabolites for Crop Protection ...... 52 Miroslava Cuperlovic-Culf

Analysis of network structural characteristics through vertex characteristics in directed networks ...... 53 Tamara Dimitrova

A new random-walk-based approach for finding co-expression modules in biological networks ...... 54 Natasa Djurdjevac Conrad

Improving 1NN strategy for classification of some prokaryotic organisms . 55 Milana Grbić, Aleksandar Kartelj, Dragan Matić and Vladimir Filipović

Identifying relevant positions in proteins by Critical Variable Selection ... 57 Silvia Grigolon

Transcription initiation by alternative sigma factors ...... 58 Jelena Guzina and Marko Djordjevic

The influence of amino acids physicochemical properties and frequencies on identifying MHC binding ligands ...... 59 Davorka R. Jandrlić, Nenad S. Mitić, and Mirjana D. Pavlović

Networks of interaction in moving animal groups and collective changes of direction ...... 61 Asja Jelić

Filtering of repeat sequences in genomes ...... 62 Ana Jelović, Miloš Beljanski, and Nenad Mitić

Could integrative bioinformatic approach predict the circulating miRs that have significant role in pancreatic tissue in type 2 diabetes? ...... 63 Ivan Jovanovi´c, Maja Zivkoviˇ ´c, Jasmina Jovanovi´c, Tamara Djuri´c, and Aleksandra Stankovi´c

Mechanism of unusual flexibility of DNA TATA-box ...... 68 Polina Kanevska and Sergey Volkov

One structured output learning method for protein function prediction ... 69 Jovana Kovacevic, Predrag Radivojac, and Gordana Pavlović-Lažetić

Combined genomic and transcriptomic characterization of single disseminated prostate cancer cells ...... 70 Stefan Kirsch, Urs Lahrmann, Miodrag Guzvic, Zbigniew T. Czyz, Giancarlo Feliciello, Bernhard Polzer and Christoph A. Klein

The perceptual structure of the phoneme manifold ...... 72 Yair Lakretz, Evan-Gary Cohen, Naama Friedmann, Gal Chechik, and Alessandro Treves

Model selection in biomolecular pathways ...... 73 Hanen Masmoudi

Hybrid methodology for information extraction from tables in the biomedical literature ...... 74 Nikola Milosevic, Cassie Gregson, Robert Hernandez, and Goran Nenadic

Standard Genetic Code vs Vertebrate Mitochondrial Code: Nucleon Balances and p-Adic Distances ...... 79 Nataša Ž. Mišić

Structural Characterization of the Trypanosoma brucei CK2A1-HDAC1/HDAC2 Interactions by Molecular Modeling and Protein-Protein Docking ...... 80 Ozal Mutlu

Mining PMMoV genotype-pathotype association rules from public databases ...... 81 Vesna Pajić, Bojana Banović, Miloš Beljanski, and Dragana Dudić

Complexity measures based on intermittent events in brain EEG data .... 87 Paolo Paradisi, Marco Righi, Massimo Magrini, Maria Chiara Carboncini, Alessandra Virgillito, and Ovidio Salvetti

DORMANCYbase - developing a bioinformatics database on molecular regulation of animal dormancy ...... 92 Popović Željko D., Kadlecsik Tamás, Fazekas Dávid, Ari Eszter, Korcsmáros Tamás, Uzelac Iva, Avramov Miloš, Krivokuća Nikola, Kitanović Nevena, and Kokai Dunja

Examining regulation of restriction-modification systems by quantitative modeling ...... 94 Andjela Rodic and Marko Djordjevic

On the clustering of biomedical datasets - a data-driven perspective ...... 95 Richard Roettger

Identification of genes involved in morphogenesis in vitro in Centaurium erythraea Rafn. as a model organism ...... 96 Ana Simonović, Milan Dragićević, Giorgio Giurato, Biljana Filipović, Sladjana Todorović, Milica Bogdanović, Katarina Ćuković, and Angelina Subotić

Mathematical Modeling of the Hypothalamic-Pituitary-Adrenal Axis Dynamics in Rats ...... 98 Ana Stanojević, Vladimir Marković, Željko Čupić, Stevan Maćešić, Vladana Vukojević, and Ljiljana Kolar-Anić

Chaos and symmetry in mathematical neural flow models ...... 99 Rodica Cimpoiasu, Radu Constantinescu, and Alina Streche

Graph theoretical analysis reveals: Women's brains are better connected than men's ...... 100 Balázs Szalkai, Bálint Varga, and Vince Grolmusz

Comparative Connectomics: Mapping the Inter-Individual Variability of Connections within the Regions of the Human Brain ...... 101 Bálint Varga

Viral: Real-world competing process simulations on multiplex networks .. 102 Petar Veličković, Andrej Ivašković, Stella Lau, and Miloš Stanojević

White-Box Predictive Algorithms for Predicting Disease States on Gene Expression Data - From Component Based Design to Meta Learning ...... 107 Milan Vukicevic, Sandro Radovanovic, Boris Delibasic, and Milija Suknovic

C. Poster session

Machine learning-based approach to help diagnosing Alzheimer’s disease through spontaneous speech analysis ...... 111 Jelena Graovac, Jovana Kovačević, and Gordana Pavlović-Lažetić

Targeted resequencing in diagnostics of inherited genetic disorders ...... 112 Jelena Kusic-Tisma, Nikola Ptáková, A. Divac, M. Ljujic, Lj. Rakicevic, M. Tesic, N. Antonijevic, S. Kojic, Milan Macek Jr., and D. Radojkovic

A biologically-inspired model of visual word recognition ...... 114 Yair Lakretz, Naama Friedmann, and Alessandro Treves

Crystallographic study on CH/O interactions of aromatic CH donors within proteins ...... 119 J. Lj. Dragelj, Ivana M. Stanković, D. M. Božinovski, T. Meyer, Dušan Ž. Veljković, Vesna B. Medaković, Ernst Walter Knapp, and Snežana D. Zarić

Dynamics of Escherichia coli type I-E CRISPR spacers over 42,000 years .. 120 Ekaterina Savitskaya, Anna Lopatina, Sofia Medvedeva, Mikhail Kapustin, Sergey Shmakov, Alexey Tikhonov, Irena I. Artamonova, and Konstantin Severinov

De Novo Transcriptome Sequencing of Verbascum thapsus L. to Identify Genes Involved in Metal Tolerance ...... 121 Filis Morina, Marija Vidović, Ana Sedlarević, Ana Simonović, and Sonja Veljović-Jovanović

De Novo Transcriptome Sequencing of Pelargonium zonale L. to Identify Genes Involved in UV-B and High Light Response ...... 123 Marija Vidović, Filis Morina, Ana Sedlarević, Ana Simonović, and Sonja Veljović-Jovanović

Protein Interaction Network Construction and Analysis Using the Quantitative Proteomics Data ...... 125 Ozal Mutlu and Nagihan Gulsoy

An optimal promoter description for bacterial transcription start site detection ...... 127 Milos Nikolic, Tamara Stankovic, and Marko Djordjevic

Chronic Treatment with Fluoxetine Led to Alterations in the Rat Hippocampal Proteome ...... 128 Ivana Perić, Dragana Filipović, Victor Costina, and Peter Findeisen

A web-based tool for prediction of effects of single amino acid substitutions outside conserved functional protein domains ...... 130 Vladimir Perovic, Ljubica Mihaljevic, Branislava Gemovic, and Nevena Veljkovic

Protein-protein interaction prediction method based on principal component analysis of amino acid physicochemical properties ...... 131 Neven Sumonja, Nevena Veljkovic, Sanja Glisic, and Vladimir Perovic

Basic Sequence Alignment Based Screening for Alternative Mannanase Producing Bacteria ...... 132 Bojan D. Petrović and Zorica D. Knežević-Jugović

Theoretical study on the role of aromatic amino acids in stability of amyloids ...... 138 Dragan B. Ninković, Dušan P. Malenov, Predrag V. Petrović, Edward N. Brothers, Shuqiang Niu, Michael B. Hall, Milivoj Belić, and Snežana D. Zarić

Construction of Amyloid PDB Files Database ...... 139 Ivana Stanković and Snežana Zarić

Search for small RNAs associated with CRISPR/Cas ...... 144 Tamara Stankovic, Jelena Guzina, Magdalena Djordjevic, and Marko Djordjevic

A novel approach for dealing with spatial/temporal edges within molecular interaction networks...... 145 Ruth A Stoney, Ryan Ames, Goran Nenadic, David L Robertson∗, and Jean-Marc Schwartz ∗Shared last/corresponding authors

Gene expression in schizophrenia patients and non-schizophrenic individuals infected with Toxoplasma gondii ...... 147 Aleksandra Uzelac, Tijana Štajner, Miloš Busarčević, Ana Munjiza, Milutin Kostić, Čedo Miljević, Dušica Lečić-Toševski, Nenad Mitić, Saša Malkov, and Olgica Djurković-Djaković

Propensities of amino acid toward certain secondary protein structure types: comparison of different statistical methods...... 149 Dušan Ž. Veljković, Saša Malkov, Vesna B. Medaković, and Snežana D. Zarić

Botryosphaeriaceae on Aesculus hippocastanum in Serbia ...... 150 Milica Zlatković, Nenad Keča, Michael Wingfield, Fahimeh Jami, and Bernard Slippers

Botryosphaeriaceae on Sequoia sempervirens in Serbia ...... 151 Milica Zlatković, Nenad Keča, Michael Wingfield, Fahimeh Jami, and Bernard Slippers

Author Index ...... 153

List of participants ...... 153

Sponsors

A. INVITED SPEAKERS

Complex landscapes and ultrametricity in a biological context

Vladik A. Avetisov

The Semenov Institute of Chemical Physics of the Russian Academy of Sciences, Kosygina 4, Moscow, Russia [email protected]

Abstract

In general, the control functions relevant to the cooperative behavior of systems of many interacting units can be perceived as landscapes. In biology, landscapes can specify the energy (i.e., they are to be minimized), or they can afford a measure of fitness (i.e., they are to be maximized). For some problems, the landscape has few dimensions (e.g., when it represents the potential of a specific unit). For other problems, the landscape has many dimensions, such as when the behavior of a representative point describes the positions of all of the individual units of a many-body system. As in geography, landscapes may be flat or rugged. Flat landscapes are the simplest to consider, but rugged landscapes are of the greatest interest due to their complexity and rich behavior. Here, I discuss rugged landscapes, focusing on proteins and evolutionary systems. These examples are representative of cases in which one attempts to describe the time behavior of complex systems with inherently conflicting interactions over a wide range of time scales. In fact, the protein energy landscape is too complicated to reconstruct in detail the entire process by which local excitations at the protein active site are transmitted into the directed movements of large protein fragments. The same problems appear when we consider a community of species whose evolution in high-dimensional genomic space is specified by a rugged fitness landscape. In any case, one needs to make some simplifications in order to describe multi-scale dynamic behavior on extremely complex landscapes. In this respect, it seems that ultrametric random processes, which are inherently multi-scale, open up new perspectives [1],[2].

Keywords: complex landscapes, ultrametric diffusion, protein dynamics, evolution

References

1. Avetisov, V. A., Bikulov, A. Kh., Zubarev, A. P.: Ultrametric random walk and dynamics of protein molecules, Proceedings of the Steklov Institute of Mathematics, 285, 3-25. (2014)
2. Avetisov, V. A., Zhuravlev, Yu. N.: An evolutionary interpretation of the p-adic ultrametric diffusion equation, Doklady Mathematics, 75 (3), 453-455. (2007)

Computational Breath Analysis - Non-invasive detection of biomarkers in exhaled air and bacterial vapor

Jan Baumbach

Dept. of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, Odense, Southern Denmark 5230, Denmark [email protected]

Abstract

Volatile organic compounds are emitted by all living cells and tissues. We seek to non-invasively 'sniff' biomarker molecules that are predictive for the biomedical fate of individual patients or cell cultures. This holds great promise for moving the therapeutic window to earlier stages of disease progression. While portable devices for exhaled volatile metabolite measurement exist, we face the traditional biomarker research barrier: a lack of robustness hinders translation to the world outside laboratories. To move from biomarker discovery to validation, from separability to predictability, we have developed several bioinformatics methods for computational breath analysis, which have the potential to redefine non-invasive biomedical decision making by rapid and cheap matching of decisive medical patterns in exhaled air. We aim to provide a supplementary diagnostic tool complementing classic urine, blood and tissue samples. In the presentation, we will review the state of the art, study some clinical application examples, highlight existing challenges, and introduce new data mining methods for identifying exhaled biomarkers.

Next Generation Biotechnologies, the bad and the good: a look into the future

Erik Bongcam-Rudloff

SLU Global Bioinformatics Centre, SLU, Uppsala, Sweden [email protected]

Abstract

Life sciences have undergone an immense transformation in recent years, where advances in genomics, epigenetics, proteomics and other high-throughput techniques produce floods of raw data that need to be stored, analysed and interpreted in various ways.

NGS technology massively parallelises nucleotide sequencing procedures, making the sequencing of genomes and of transcriptomes much faster and cheaper than ever before. The new technologies are, however, posing massive (bio-)informatics challenges that require new ways of thinking and novel solutions.

During my talk I will present new technologies and briefly discuss the pros and cons of this development. I will also discuss some of the aspects of our scientific publishing culture that are a hindrance to modern, efficient analysis of data. Finally, I will briefly present some exciting new initiatives that are working to create research solutions to face the challenges that the new technologies generate.

On flexibility, deformability and mobility of protein structures in the light of a structural alphabet

Tarun Narwani1, Pierrick Craveur1, Nicolas Shinada1,2, Hubert Santuz1, Joseph Rebehmed3, Catherine Etchebest1, and Alexandre G. de Brevern2

1 INSERM UMR S 1134, DSIMB, Univ Paris Diderot, Sorbonne Paris Cité, INTS, lab of excellence GR-Ex, 6 rue Alexandre Cabanel, 75013 Paris, France {arun.narwani, pierrick.craveur, nicolas.shinada, hubert.santuz, catherine.etchebest, alexandre.de-brevern}@inserm.fr 2 Discngine, 79 Avenue Ledru Rollin, 75011 Paris, France [email protected] 3 Department of Computer Science and Mathematics, Lebanese American University, Byblos 1h401 2010, Lebanon [email protected]

Abstract

The function of a protein is directly dependent on its 3-dimensional structure. Visualization tools have oversimplified our views of protein structures. Often, proteins are considered as macromolecules in which repetitive structures are rigid while the connecting loops are flexible or even disordered. In silico approaches are interesting tools to tackle this critical question of protein flexibility. Moreover, they allow criteria other than B-factors to be applied to define flexibility [1].

We have previously developed different structural alphabets (SAs) [2], [3]. They are libraries of small protein fragments that are able to approximate every part of protein structures, making them more precise than classical secondary structures. A more precise and complete description of protein backbone conformation can be obtained using SAs for structural analysis, from the definition of ligand binding sites to the superimposition of protein structures [4]. SAs are also well suited to predicting protein flexibility from the sequence [5]. We have also used them to analyse the dynamics of protein structures in the case of a transmembrane protein [6] and for integrins implicated in pathologies [7], [8].

Here, we have selected a representative set of 169 protein structures describing equally the four main SCOP classes [9]. Three independent Molecular Dynamics (MD) simulations of 50 ns have been performed for each system using the GROMACS software [10] with classical parameters of the AMBER99sb force field. All the simulations quickly reached a stable state, i.e. a plateau.

Each simulation was analysed through classical secondary structure assignment (with DSSP [11]) and Protein Blocks (PBs [2]) assignment (with the PBxplore tool [12]). Notably, from the PB assignment, an entropy value, Neq, can be computed; it is a quantitative value reflecting the local conformational changes [2]. Of course, classical MD analyses, e.g. root mean square fluctuation (RMSf), have also been performed. From the original protein structures, normalized B-factors and relative solvent accessibility have been computed with in-house tools and DSSP [11].

Using this large and diverse set of protein dynamics enables us to efficiently analyse the evolution of local protein structures. For instance, at a global level, the correlation between normalized B-factors and RMSf is 0.43. In contrast, when the average value per PB (16 in total) is computed, this correlation rises to 0.98, underlining that PBs are of great interest for analysing protein flexibility.

Interestingly, the correlation of Neq values with normalized B-factors and RMSf is merely 0.41 and 0.46, respectively. This behaviour stems from the fact that Neq captures local conformational variations (at residue level) while RMSf is computed for the overall structure every time. Therefore, we encounter cases where RMSf values are very high (flexibility) while Neq remains low (rigidity). Such cases may look odd, as flexibility corresponds to large movements and so should be associated with high Neq values. However, they may arise from confusion between flexibility, deformability and mobility. Biologically, such cases could correspond to rigid regions (mobility) enclosed between two flexible regions (deformability).

Protein Blocks are thus of great interest to analyse MD simulations, and their coupling to various experimental data (e.g. B-factors) allows understanding general and specific behaviours of local protein conformations found in various proteins.
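As an illustration of the Neq measure discussed above, the following minimal sketch computes the equivalent number of PBs per residue, Neq = exp(-Σ f_x ln f_x), where f_x is the frequency of PB x at a given position over the simulation frames. The function name and the toy input are hypothetical and are not taken from PBxplore or from the authors' in-house tools; the sketch only assumes that each MD frame has already been translated into a PB string of equal length.

```python
import math
from collections import Counter


def neq_per_position(pb_sequences):
    """Return Neq for every position of a set of PB-encoded frames.

    pb_sequences: list of equal-length strings, one PB assignment per frame.
    Neq = exp(-sum_x f_x * ln f_x), with f_x the frequency of PB x at the
    position across all frames (1 = a single PB, 16 = all PBs equally likely).
    """
    if not pb_sequences:
        raise ValueError("need at least one frame")
    length = len(pb_sequences[0])
    if any(len(seq) != length for seq in pb_sequences):
        raise ValueError("all frames must have the same length")

    n_frames = len(pb_sequences)
    neq_values = []
    for i in range(length):
        counts = Counter(seq[i] for seq in pb_sequences)
        entropy = -sum(
            (c / n_frames) * math.log(c / n_frames) for c in counts.values()
        )
        neq_values.append(math.exp(entropy))
    return neq_values


# Hypothetical toy trajectory: 4 frames, 3 residues (not real simulation data).
frames = ["mdg", "mdp", "mdg", "mdm"]
neq = neq_per_position(frames)
print(neq)
```

In this toy input the first two positions never change PB, so their Neq is 1, while the third position alternates between three PBs and gets a Neq between 2 and 3. A residue can therefore have a low Neq even when it moves a lot in space, which is the distinction between mobility and deformability drawn in the text.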

Many complementary analyses were performed using PBs. The following are some of the primary findings. PBs d and m (cores of repetitive structures) show a higher tendency to resist changes, while many others are less conserved, e.g. PB g (for "coil"), which changes to another PB 40% of the time during the dynamics. Clustering of the different behaviours of the PBs has been performed and reveals that some changes are not expected but are often found. In a significant number of cases, PB g changes to PB p during the majority of the simulation time, and sometimes even to PB m. It is noteworthy that PB p is associated with loops connecting an α-helix to a β-strand, while PB m denotes stable α-helical regions. We will present here a complete summary of all our results and the perspectives of this work.

Keywords: bioinformatics, protein structures, secondary structures, flexibility, deformability, statistics, Protein Blocks

Acknowledgments

This work was supported by grants from the French Ministry of Research, University of Paris Diderot Paris 7, French National Institute for Blood Transfusion (INTS), French Institute for Health and Medical Research (INSERM). AdB also acknowledges the Indo-French Centre for the Promotion of Advanced Research / CEFIPRA for collaborative grants (number 5302-2). This study was supported by grants from Laboratory of Excellence GR-Ex, reference ANR-11-LABX-0051. The labex GR-Ex is funded by the program Investissements d'avenir of the French National Research Agency, reference ANR-11-IDEX-0005-02. The authors were granted access to high performance computing (HPC) resources at the French National Computing Center CINES under grant no. c2013-037147 funded by the GENCI (Grand Equipement National de Calcul Intensif).

References

1. Craveur, P., Joseph, A.P., Esque, J., Narwani, T.J., Noël, F., Shinada, N., Goguet, M., Leonard, S., Poulain, P., Bertrand, O., Faure, G., Rebehmed, J., Ghozlane, A., Swapna, L.S., Bhaskara, R.M., Barnoud, J., Téletchéa, S., Jallu, V., Cerny, J., Schneider, B., Etchebest, C., Srinivasan, N., Gelly, J.-C., de Brevern, A.G.: Protein flexibility in the light of structural alphabets, Frontiers in Molecular Biosciences - Structural Biology, 2:20. (2015)
2. de Brevern, A.G., Etchebest, C., Hazout, S.: Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks, Proteins, 41:271-287. (2000)
3. Bornot, A., Etchebest, C., de Brevern, A.G.: A new prediction strategy for long local protein structures using an original description, Proteins, 76:570-87. (2009)
4. Joseph, A.P., Agarwal, G., Mahajan, S., Gelly, J.-C., Swapna, L.S., Offmann, B., Cadet, F., Bornot, A., Tyagi, M., Valadi, H., Schneider, B., Etchebest, C., Srinivasan, N., de Brevern, A.G.: A short survey on Protein Blocks, Biophysical Reviews, 2:137-145. (2010)
5. de Brevern, A.G., Bornot, A., Craveur, P., Etchebest, C., Gelly, J.-C.: PredyFlexy: Flexibility and Local Structure prediction from sequence, Nucleic Acids Res, 40:W317-22. (2012)
6. de Brevern, A.G., Wong, H., Tournamille, C., Cartron, J.-P., Colin, Y., Le Van Kim, C., Etchebest, C.: A structural model of seven transmembrane helices receptor, Duffy Antigen / Receptor for Chemokines (DARC), Biochim Biophys Acta, 1724:288-306. (2005)
7. Jallu, V., Poulain, P., Fuchs, P.F., Kaplan, C., de Brevern, A.G.: Modeling and Molecular Dynamics of HPA-1a and -1b Polymorphisms: Effects on the Structure of the β3 Subunit of the αIIbβ3 Integrin, PLoS One, 7:e47304. (2012)
8. Jallu, V., Poulain, P., Fuchs, P.F., Kaplan, C., de Brevern, A.G.: Modeling and molecular dynamics simulations of the V33 variant of the integrin subunit β3: structural comparison with the L33 (HPA-1a) and P33 (HPA-1b) variants, Biochimie, 105:84-90. (2014)
9. Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: SCOP: a structural classification of proteins database for the investigation of sequences and structures, J Mol Biol., 247:536-40. (1995)
10. Pronk, S., Páll, S., Schulz, R., Larsson, P., Bjelkmar, P., Apostolov, R., Shirts, M.R., Smith, J.C., Kasson, P.M., van der Spoel, D., Hess, B., Lindahl, E.: GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit, Bioinformatics, 29:845-854. (2013)
11. Kabsch, W., Sander, C.: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features, Biopolymers, 22(12):2577-2637. (1983)
12. Poulain, P. and collaborators: A program to explore protein structures with Protein Blocks, https://github.com/pierrepo/PBxplore. (2015)

Elemental metabolomics for improving human health

Ping Zhang1, Constantinos Georgiou2, and Vladimir Brusic3

1 Menzies Health Institute Queensland, Griffith University, Australia [email protected] 2 Department of Chemistry, Agricultural University of Athens, Greece [email protected] 3 School of Medicine and Bioinformatics Center, Nazarbayev University, Kazakhstan [email protected]

Abstract

Bulk organic elements (C, H, N, and O) make up 96% of the human body. Other elements essential for life include macroelements (Ca, Cl, K, Mg, Na, P, and S), which make up 3% of the human body, and microelements (Co, Cu, Cr, Fe, I, Mn, Mo, Se, Zn), which are present in lower quantities. A number of elements are present in the human body in significant quantities (Al, Ba, Rb, and Ti) but have no known biological function. Some elements (As, Br, Ni, Si, Sn, Sr, V, B, Cd, Li, Pb) are thought to be necessary for optimal functioning and good health of the organism since they modulate the function of essential elements. Other elements are present in human bodies at extremely low (ultratrace) concentrations. Some elements (such as As, Be, Cr, Cd, Hg, Pb) are potent toxins if they exceed homeostatic levels or if they are present in the form of toxic compounds, such as hexavalent chromium, Cr(VI). Elements are bioavailable to humans through food, water, environmental and occupational exposure, and medical treatment. They circulate, accumulate, or disperse throughout the environment and the food chain.

The development of mass spectrometry and measurement standards in recent years has enabled us to precisely measure the quantities of more than 70 trace and ultratrace elements and their isotopes in a variety of inorganic and organic samples. Elemental profiling has applications in food science (assessment of food quality and safety), nutrition (healthy diet, deficiencies), medicine (screening, diagnostics, and toxicology), hydrology (safety and health properties of drinking water), geology, ecology, environmental science, forensics, and even anthropology. Elemental profiling has mainly focused on identifying the levels of individual elements in biological samples. Elemental profiles can also be used for classification of biological samples. Examples of elemental profile use include distinguishing wild from domestic rabbit meat, distinguishing cherry tomatoes that originate from different geographic regions, distinguishing organically produced from conventionally produced eggs, and identifying profiles characteristic of blood samples from normal, obese, metabolic syndrome, and type 2 diabetic individuals. Elemental profiling can be used for identifying the nutritional needs of individuals and defining a personalized diet. Also, changes in elemental profiles can be monitored to follow the progression of disease even in pre-clinical stages.

Rather than observing the behavior of one element at a time, elements must be observed in a systemic way, conceptually similar to systems biology, so that the effects of variation of multiple elements can be correlated to observed outcomes, and the intervention can be correlated to the desired outcomes. Advanced statistical techniques and machine learning methods are needed to advance this field. The combination of databases and advanced algorithms will enable predictive modeling that will interpret complex effects based on combinations of elements and their interactions rather than on the behavior of a single element. Elemental metabolomics is the emerging field that focuses on the study of elemental profiles and their use for advancing applications in health and development of living organisms.

Quantifying genome-wide DNA methylation from MethylCap-Seq data and its applications in cancer

Ralf Bundschuh

The Ohio State University, Department of Physics, Chemistry & Biochemistry, Division of Hematology, USA [email protected]

Abstract

DNA methylation is an epigenetic mark with direct impact on gene regulation which is known to be aberrant in many cancers. There are several techniques that allow interrogating DNA methylation on a genome-wide basis. One such technique that presents an especially good tradeoff between diversity of covered genomic regions, resolution, and cost compatible with large patient cohorts is the preferential capture of methylated DNA using the MBD2 domain followed by high throughput sequencing (MethylCap-Seq). However, the readout of this technique is relative coverage of different genomic regions and thus quite indirect, requiring computational analysis tools to interpret the data. In this talk I will present our computational approaches to extracting DNA methylation from MethylCap-Seq data, which include quality control, methylation calling on individual CpGs and on predefined genomic regions, global feature methylation summaries, and identification of significantly differentially methylated genomic regions. I will also present a use case of our approach in a large study of patients with acute myeloid leukemia (AML).

Infomax strategies for an optimal balance of exploration and exploitation

Antonio Celani

The Abdus Salam International Centre for Theoretical Physics, Department of Quantitative Life Sciences, Trieste, Italy [email protected]

Abstract

Information theory branched out from the science of reliable communication to diverse domains that include decision theory and neural and cellular biology. Infomax postulates that acquisition and transmission of information is a general functional principle, which has been applied to the visual system, transduction pathways, evolution, biological adaptation and regulation, training of neural networks, decision and search processes. While specific applications are successful, it remains generally unclear under what conditions information constitutes a valid functional proxy. Here, we consider the classical multi-armed bandit decision problem, which features arms (slot-machines) of unknown probabilities of success and a player trying to maximize cumulative reward by choosing the sequence of arms to play. The model captures the crux of the dilemma between exploitation and exploration, and optimal bounds and strategies are known. We introduce two novel Infomax strategies, Info-id and Info-p, which optimally gather information on the unknown identity of the best arm and on the highest mean reward among the arms, respectively. We investigate the two strategies analytically and numerically and compare their performance to optimal bounds. Strikingly, we find that Info-p performs optimally, whilst Info-id is vastly suboptimal, even though it gathers more information on the identity of the best arm. The cost and value of information are quantified via rate-distortion arguments. The results demonstrate the crucial role of the nature of information acquired by Infomax, which suggests new general approaches.
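To make the bandit setting described above concrete, here is a minimal simulation sketch of a Bernoulli multi-armed bandit. It does not implement Info-id or Info-p, whose definitions are not given in the abstract; instead it uses the standard UCB1 index rule as a stand-in baseline. The arm probabilities, horizon and seed are arbitrary illustrative assumptions.

```python
import math
import random


def ucb1(success_probs, horizon, seed=0):
    """Play a Bernoulli bandit with the standard UCB1 rule (a baseline,
    not one of the Infomax strategies named in the abstract).

    Returns the number of pulls per arm and the gap between the best
    arm's expected total reward and the reward actually collected.
    """
    rng = random.Random(seed)
    k = len(success_probs)
    pulls = [0] * k   # how often each arm was played
    wins = [0] * k    # how often each arm paid out
    total_reward = 0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: play every arm once
        else:
            # exploitation term (empirical mean) + exploration bonus
            arm = max(
                range(k),
                key=lambda a: wins[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        reward = 1 if rng.random() < success_probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total_reward += reward

    regret = horizon * max(success_probs) - total_reward
    return pulls, regret


# Hypothetical arms with success probabilities unknown to the player.
pulls, regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(pulls, regret)
```

The exploration bonus shrinks as an arm is played more often, so the player gradually shifts from exploring all arms to exploiting the one with the highest empirical mean. This is the exploration-exploitation trade-off against which any information-based strategy would be compared.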

Leveraging Electronic Medical Records for Personalized and Population Healthcare

Nitesh V. Chawla

University of Notre Dame, IN, USA [email protected]

Abstract

Personalized healthcare and precision medicine are introducing novel opportunities to leverage data about an individual to deliver personalized health and wellness profiles to that individual. This data includes electronic medical records, genomics, lifestyle, and environmental data. However, there are fundamental challenges, from collecting such data at an individual level, to integrating it, to developing algorithms and tools that deliver the promise and outcome of personalized healthcare. Our research program is focused on these challenges. I will present our research on developing personalized disease risk profiles from electronic medical records (EMR), leveraging the phenotypes from EMR to guide genetic association discovery among diseases, and finally bringing together the spectrum of data, from EMR to lifestyle, to guide a patient-centered population health management framework.

Adapt or Die. Investigating the molecular basis of cell variability

Andrea Ciliberto

Quantitative Biology of Cell Division Unit, IFOM, Via Adamello 16, Milan, Italy [email protected]

Abstract

Cancer cells are highly proliferative. Drugs that impair cell proliferation (antimitotic drugs) are indeed quite effective in treating cancer. Their mechanism of action is quite well understood. Several antimitotic drugs impair microtubule polymerization and thus do not allow cells to segregate their chromosomes. Cells arrested in their division cycle, however, will not remain arrested forever. Some will die and some will adapt, resuming proliferation. The choice between these two fates is stochastic, with virtually identical cells taking different paths. With a combination of mathematical models and live-cell imaging, we have analyzed the phenomenon of adaptation, and we have investigated possible sources of variability that contribute to determining cell fate under constant drug treatment.

Ultrametric Approach to Bioinformation Systems

Branko Dragovich

1 Institute of Physics, University of Belgrade, Belgrade, Serbia [email protected] 2 Mathematical Institute SANU, Kneza Mihaila 36, Belgrade, Serbia

Abstract

Ultrametricity is related to metric spaces with ultrametric distance, which is characterized by the strong triangle inequality

d(x, y) ≤ max{d(x, z), d(z, y)}

Ultrametrics is appropriate for the description of similarity (nearness) between elements of some information sets and, in particular, between information in biological systems. In this talk I will present basic properties of ultrametric distance and, in particular, the p-adic one. Then I will show that the set of 64 codons in the genetic code has an ultrametric structure which is suitably described by p-adic distance, where p=5 and p=2 [1, 2]; see also a similar approach [3]. Some other properties of the genetic code can also be expressed in terms of p-adic distance. I will also discuss some other examples of ultrametric biological information systems and point out their importance for the foundation of an ultrametric information theory.

Keywords: ultrametrics, p-adic distance, bioinformation, genetic code
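The p-adic distance mentioned in the abstract above can be sketched in a few lines. For integers a ≠ b, the p-adic distance is p^(-v), where v is the largest power of p dividing a - b. The mapping of nucleotides to 5-adic digits (C=1, A=2, U=3, G=4) and the choice of the first codon position as the least significant digit are illustrative assumptions made for this sketch; the abstract itself only states that p=5 and p=2 are used.

```python
from fractions import Fraction
from itertools import product


def p_adic_valuation(n, p):
    """Largest v such that p**v divides the non-zero integer n."""
    if n == 0:
        raise ValueError("valuation of 0 is infinite")
    n, v = abs(n), 0
    while n % p == 0:
        n //= p
        v += 1
    return v


def p_adic_distance(a, b, p):
    """p-adic distance |a - b|_p = p**(-v_p(a - b)); zero when a == b."""
    if a == b:
        return Fraction(0)
    return Fraction(1, p ** p_adic_valuation(a - b, p))


# Assumed digit assignment for illustration only.
DIGIT = {"C": 1, "A": 2, "U": 3, "G": 4}


def codon_to_int(codon):
    """Encode a codon as a 3-digit base-5 integer; the first nucleotide
    is the least significant digit, so codons sharing their leading
    nucleotides end up 5-adically close."""
    return sum(DIGIT[nt] * 5 ** i for i, nt in enumerate(codon))


def is_ultrametric(points, p):
    """Check d(x, y) <= max(d(x, z), d(z, y)) for all triples."""
    d = {(a, b): p_adic_distance(a, b, p) for a in points for b in points}
    return all(
        d[x, y] <= max(d[x, z], d[z, y])
        for x in points for y in points for z in points
    )


codons = ["".join(c) for c in product("CAUG", repeat=3)]
codes = [codon_to_int(c) for c in codons]

print(p_adic_distance(codon_to_int("CUU"), codon_to_int("CUC"), 5))  # 1/25
print(p_adic_distance(codon_to_int("CUU"), codon_to_int("CAU"), 5))  # 1/5
print(p_adic_distance(codon_to_int("CUU"), codon_to_int("AUU"), 5))  # 1
```

Under this encoding, two codons that share their first two nucleotides are at the smallest non-zero 5-adic distance (1/25), and `is_ultrametric(codes, 5)` can be used to confirm that the strong triangle inequality holds for all 64 codons.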

References

1. Dragovich, B., Dragovich, A.: A p-Adic Model of DNA Sequence and Genetic Code. p-Adic Numbers, Ultrametric Analysis and Applications, 1 (1), (2009). arXiv:q-bio.GN/0607018.
2. Dragovich, B., Dragovich, A.: p-Adic Modelling of the Genome and the Genetic Code. The Computer Journal, 53(4), 432–442 (2010). arXiv:0707.3043[q-bio.OT].
3. Khrennikov, A., Kozyrev, S.: Genetic Code on a Diadic Plane. Physica A: Stat. Mech. Appl., 381, 265–272 (2007).

Molecular mechanism of Aβ amyloid formation

Oxana V. Galzitskaya, Olga M. Selivanova, Alexey K. Surin, Victor V. Marchenkov, Ulyana F. Dzhus, Elizaveta I. Grigorashvili, Mariya Yu. Suvorina, Anna V. Glyakina, and Nikita V. Dovidchenko

Institute of Protein Research, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia [email protected]

Abstract

It has been demonstrated using Aβ40 and Aβ42 recombinant and synthetic peptides that their fibrils are formed of complete oligomer ring structures. Such ring structures have a diameter of about 8-9 nm, an oligomer height of about 2-4 nm and an internal ring diameter of about 3-4 nm. Oligomers associate in a fibril in such a way that they interact with each other, overlapping slightly. There are differences in the packing of oligomers in fibrils of recombinant and synthetic Aβ peptides. The principal difference is in the degree of orderliness of ring-like oligomers, which leads to the generation of morphologically different fibrils. The most ordered association of ring-like structured oligomers is observed for the recombinant Aβ40 peptide. Less ordered fibrils are observed with the synthetic Aβ42 peptide. The fibril fragments most protected from the action of proteases have been determined by tandem mass spectrometry. It was shown that, unlike those of Aβ40, fibrils of Aβ42 are more protected, showing less ordered organization than Aβ40 fibrils. Thus, the mass spectrometry data agree with the electron microscopy data and structural models presented in our work.

Epigenetic state and spatial structure of chromatin

Mikhail Gelfand1,2

1 A.A. Kharkevich Institute for Information Transmission Problems, RAS [email protected]
2 Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia

Abstract

Recent advances in large-scale experimental techniques, such as RNA-Seq, ChIP-Seq, Hi-C and others, provide data for integrated analysis of chromatin 3D state, epigenetic markers, and gene expression. Not surprisingly, these turned out to be highly interlinked. Contacting chromatin regions tend to carry similar histone modifications, and gene expression in such regions tends to be correlated. On a finer scale, topologically associating domains (TADs) also seem to depend on histone modifications and transcription. Indeed, TADs are enriched in repressive chromatin markers, whereas inter-TAD regions are enriched in active markers and highly transcribed genes. Moreover, differences in TAD structure between cell lines are accompanied by corresponding differences in transcription. These observations seem to indicate that active gene expression is the driving force behind the formation of the TAD structure. Finally, there are preliminary indications that regions forming many distant contacts are also enriched in active markers and actively transcribed genes.

Dark states in quantum photosynthesis

Sergei Kozyrev

Steklov Mathematical Institute, Department of mathematical physics, Moscow, Russia [email protected]

Abstract

A model of quantum photosynthesis will be discussed. The photosynthetic system in the model is described by a three-level quantum system (which describes excitons) interacting with three different quantum fields, or reservoirs (light, phonons and the sink field corresponding to absorption of excitons); moreover, one of the levels of the system is degenerate. The degeneracy leads to excitation of the so-called dark states (dark-state polaritons). Since the interactions of the degenerate state of the system with two different reservoirs are different, the spaces of dark states for these fields are also different. This makes it possible to manipulate these dark states, in particular by using spectroscopy. We conjecture that this model describes the phenomenon, known from spectroscopic experiments, of quantum coherences observed in photosynthetic systems.

Patient Specific Network Data Integration Enables Precision Medicine in Cancer

Noël Malod-Dognin

Department of Computing, Imperial College London, London, United Kingdom [email protected]

Abstract

Motivation. We are faced with a flood of molecular and clinical networked data. The recent advances in experimental technologies have resulted in the accumulation of large amounts of patient-specific “omics” and clinical datasets, which provide complementary information on the same disease. The challenge is how to mine these complex data systems to gain new insight into diseases and to improve therapeutics, in particular in the context of highly heterogeneous diseases such as cancer.

Method. We introduce a versatile data fusion (integration) framework that can effectively integrate somatic mutation data, molecular interaction networks and drug chemical data to address three key challenges in cancer research: (1) stratification of patients into groups having different clinical outcomes, (2) prediction of driver genes whose mutations trigger the onset and development of cancers, and (3) repurposing of drugs treating particular cancer patient groups. Our new framework is based on graph-regularised non-negative matrix tri-factorization, a machine learning technique for co-clustering heterogeneous datasets. We apply our framework to ovarian cancer data to simultaneously cluster patients, genes and drugs by utilising all datasets.

Results. We demonstrate superior performance of our method over the state-of-the-art method, Network-based Stratification, in identifying three patient subgroups that have significant differences in survival outcomes and that are in good agreement with other clinical data. Also, we identify potential new driver genes that we obtain by analysing the gene clusters enriched in known drivers of ovarian cancer progression. We validated the top scoring genes identified as new drivers through database search and biomedical literature curation. Finally, we identify potential candidate drugs for repurposing that could be used in treatment of the identified patient subgroups by targeting their mutated gene products. We validated a large percentage of our drug-target predictions by using other databases and through literature curation.

Biophysical models of protein evolutionary dynamics

Alexandre Morozov

Rutgers University, USA [email protected]

Abstract

High-throughput sequencing and other modern molecular biology tools have made it possible to track organismal evolution in unprecedented detail. As a result, we are now closer to understanding the fundamental principles involved in genome-scale evolution of proteins and protein interaction networks. Protein-protein interactions mediate numerous cellular processes, including metabolism, immune response, signaling, replication, and gene regulation. These interactions can rapidly evolve in response to perturbations in the protein physico-chemical environment, such as changes in the concentration or chemical composition of the protein’s binding targets. Several recent studies have underscored the pivotal role of folding stability in protein evolutionary dynamics. Here I will focus on how structural coupling between folding and binding gives rise to evolutionary coupling between the traits of folding stability and binding strength. Using evolutionary models inspired by protein biophysics, I will show how these protein traits can emerge as evolutionary spandrels, that is, features that are by-products of the selection on some other trait, rather than direct targets of adaptation. In particular, proteins can evolve strong binding interactions that have no functional role but merely serve to stabilize the protein if its misfolding is deleterious. Furthermore, such proteins may have divergent fates, evolving to bind or not bind their targets depending on random mutational events. These observations may explain the abundance of apparently non-functional interactions among proteins assayed using high-throughput protein-protein binding screens. For the common class of proteins with both functional binding and deleterious misfolding, evolution appears to be predictable at the level of biophysical traits: adaptive paths are constrained to first gain extra folding stability and then partially lose it as the novel binding function emerges, as frequently observed in protein engineering experiments. Overall, our findings lead to improved understanding of evolution of proteins and protein interaction networks in both cellular and in vitro contexts.

What is bioinformatics made from: understanding database and software usage through literature mining

Goran Nenadić

1 School of Computer Science, University of Manchester, Institute of Biotechnology & Health eResearch Centre, Manchester, UK [email protected]
2 Mathematical Institute of SASA, Belgrade, Serbia

Abstract

Computational resources such as databases and software are central to bioinformatics research. They are often described within the biomedical literature, either when introduced to the community or when used as part of the methods. Using text mining to process the entire available literature could help reveal the patterns of database and software usage. Our group has developed a methodology to identify such resource mentions in full-text articles and construct networks of resources that can indicate their links and relative usage, both over time and within the sub-disciplines of bioinformatics, biology and medicine. For example, the bioinformatics literature has a high variability of new resources as novel resource development takes place, while database and software usage within biology and medicine is more stable and conservative. Half of all mentions refer to only 133 resources (top 5%), which seem to represent the core of current bioinformatics. In some sub-disciplines, the top 100 resources account for 96% of all mentions in the literature. While such resources could be interpreted as a proxy definition of a particular area, it is apparent that many long-established resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT) while some are instead seeing rapid growth (e.g., the GO, R). We will illustrate the changes in the bioinformatics resourceome by looking into specific journals and examining the ’long-tail’ of resources that are infrequently mentioned.
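The concentration statistics quoted above (the share of mentions covered by the top-ranked resources) can be computed along the following lines. This is only a sketch: the mention counts are made-up toy values, not data from the study, and the function names are hypothetical.

```python
def top_k_share(mention_counts, k):
    """Fraction of all mentions accounted for by the k most-mentioned resources."""
    ranked = sorted(mention_counts.values(), reverse=True)
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


def resources_needed(mention_counts, share):
    """Smallest number of top-ranked resources whose mentions reach the given share."""
    ranked = sorted(mention_counts.values(), reverse=True)
    total = sum(ranked)
    running = 0
    for rank, count in enumerate(ranked, start=1):
        running += count
        if running >= share * total:
            return rank
    return len(ranked)


# Toy counts invented for illustration only.
TOY_COUNTS = {"BLAST": 50, "R": 30, "GO": 10, "ToolA": 5, "ToolB": 3, "ToolC": 2}

if __name__ == "__main__":
    print(top_k_share(TOY_COUNTS, 2))
    print(resources_needed(TOY_COUNTS, 0.5))
```

Applied to real mention counts, `resources_needed(counts, 0.5)` would give the size of the "core" set covering half of all mentions.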

A Quantum Approach to the DNA Functioning

Argyris Nicolaidis

Aristotle University of Thessaloniki, Theoretical Physics Department, Greece [email protected]

Abstract

We put forward the notion that DNA is an information processing system, receiving, registering and transferring information. In the pursuit of an inherent logic in DNA functioning, we explore the possibility that quantum logic might serve this purpose. We use the quantum formalism to describe the DNA dynamics and, as a byproduct, we obtain the DNA vacuum. The DNA vacuum, in clear analogy to the quantum vacuum, is a collection of virtual DNA bases. An essential aspect of DNA functioning is the complementarity relation R, which binds the pairs A-T and G-C and generates the replication process. Further, in an effort to codify DNA, we introduce a numbering, assigning a specific natural number to each individual DNA strand. This numbering allows a quantitative measure of the difference among the various DNA strands. Considering also that the four DNA bases constitute an "alphabet", we may take on the task of examining whether DNA is a "language".
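The abstract does not specify the numbering scheme, so the following Python sketch uses a hypothetical one purely for illustration: each base is read as a digit from 1 to 4 (A=1, C=2, G=3, T=4, an assumed assignment) and the strand as a bijective base-4 numeral, which gives every strand a distinct natural number. The complementarity relation R is modelled as the usual A-T, G-C pairing.

```python
# Complementarity relation R: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical digit assignment; the scheme used in the talk is not given in the abstract.
DIGIT = {"A": 1, "C": 2, "G": 3, "T": 4}


def complement(strand):
    """Return the base-by-base complementary strand under R."""
    return "".join(COMPLEMENT[base] for base in strand)


def strand_number(strand):
    """Assign a natural number to a strand (bijective base-4 numeral, digits 1..4)."""
    number = 0
    for base in strand:
        number = 4 * number + DIGIT[base]
    return number


def strand_difference(first, second):
    """One possible quantitative difference: the absolute difference of strand numbers."""
    return abs(strand_number(first) - strand_number(second))


if __name__ == "__main__":
    print(complement("ATGC"), strand_number("ACGT"), strand_difference("AA", "AC"))
```

Because the digits run from 1 to 4 rather than 0 to 3, strands of different lengths never collide, for example "A" maps to 1 and "AA" to 5.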

Effectiveness of Multiple Blood Cleansing Interventions in Sepsis

Zoran Obradovic

Laura H. Carnell Professor of Data Analytics, Data Analytics and Biomedical Informatics Center, Computer and Information Sciences Department, Statistics Department, Temple University, PA, USA [email protected]

Abstract

Sepsis is a serious, life-threatening condition that presents a growing problem in medicine, but there is still no satisfying solution for treating it. Several blood cleansing approaches recently gained attention as promising interventions that target the main site of problem development, the blood. The focus of this study is an evaluation of the theoretical effectiveness of hemoadsorption therapy and pathogen reduction therapy. This is evaluated using a mathematical model of murine sepsis, and the results of over 2,200 configurations of single and multiple intervention therapies simulated on 5,000 virtual subjects suggest the advantage of pathogen reduction over hemoadsorption therapy. However, a combination of the two approaches is found to take advantage of their complementary effects and outperform either therapy alone. The conducted computational experiments provide unprecedented evidence that the combination of the two therapies synergistically enhances the positive effects beyond the simple superposition of the benefits of the two approaches. Such a characteristic could have a profound influence on the way sepsis treatment is conducted.

Results reported in this talk were published in the April 21 issue of Scientific Reports by Nature Publishing Group and were obtained in collaboration with Ivan Stojkovic, Mohamed Ghalwash and Xi Hang Cao. The study is funded by the DARPA Dialysis-like Therapy program.

Comparative analysis of plant genome structure and antisense transcripts

Salwa E.S. Mohamed1, Oxana B. Dobrovolskaya2,3, Vladimir N. Babenko2, Khaled Salem1, Ming Chen4, and Yuriy L. Orlov2,3

1 Genetic Engineering and Biotechnology Research Institute, Sadat City, Egypt
2 Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
3 Novosibirsk State University, Novosibirsk, Russia [email protected]
4 Zhejiang University, Hangzhou, China

Abstract

Analysis of next generation sequencing data on plant genomes in specialized databases is a challenging problem of computer genomics. Pairs of RNA molecules transcribed from partially or entirely complementary loci are called cis-natural antisense transcripts (cis-NATs), and they play key roles in the regulation of gene expression in many organisms including plants. A promising experimental tool for profiling sense and antisense transcription is strand-specific RNA sequencing. Earlier, identification of the chromatin signature of cis-NATs in Arabidopsis indicated a connection between cis-NAT transcription and chromatin modification in plants. An analysis of small-RNA sequencing data showed that 4% of cis-NAT pairs produce putative cis-NAT-induced siRNAs. To address the statistical analysis of plant genome sequencing data, we developed a set of computer programs to define antisense transcripts and miRNA genes based on available sequencing data. Text complexity as a measure of context dependencies was applied to nucleotide sequences containing antisense transcripts in plants, as we did previously for monomer analysis. We searched for homologous regulatory regions in model plant genomes. We have analyzed data from PlantNATsDB (Plant Natural Antisense Transcripts DataBase), which is a platform for annotating and discovering NATs by integrating various data sources (Chen et al., 2012). It contains about 70 plant species. The database provides an integrative, interactive and information-rich web graphical interface to display multidimensional data and facilitate research and the discovery of functional NATs. Available information on the transcription factors for each species was retrieved from the Plant Transcription Factor Database. The phenomena of antisense transcription and miRNA interference need further annotation in newly sequenced genomes. We have compared gene structure in natural cis-antisense transcripts for wheat and related plant genomes, taking into account genes responsible for stress tolerance.

Keywords: plant genomes, transcription, sequencing, databases, wheat, genomics
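The definition of cis-NATs given above (transcripts from overlapping loci on opposite strands) suggests a simple coordinate-based screen. The Python sketch below is not the authors' software; the `Transcript` record, the 1-based inclusive coordinates, and the `min_overlap` parameter are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical transcript record with 1-based inclusive coordinates.
Transcript = namedtuple("Transcript", "name chrom start end strand")


def overlap_length(a, b):
    """Number of bases shared by two transcripts (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def find_cis_nat_pairs(transcripts, min_overlap=1):
    """Return (name, name, overlap) for transcript pairs overlapping on opposite strands."""
    pairs = []
    for i, a in enumerate(transcripts):
        for b in transcripts[i + 1:]:
            shared = overlap_length(a, b)
            if a.strand != b.strand and shared >= min_overlap:
                pairs.append((a.name, b.name, shared))
    return pairs


# Toy annotation invented for illustration.
TOY = [
    Transcript("t1", "chr1", 100, 500, "+"),
    Transcript("t2", "chr1", 400, 900, "-"),
    Transcript("t3", "chr1", 450, 600, "+"),
    Transcript("t4", "chr2", 100, 500, "-"),
]

if __name__ == "__main__":
    print(find_cis_nat_pairs(TOY))
```

The all-pairs loop is quadratic in the number of transcripts; for genome-scale annotations a sorted sweep or an interval tree would be the more practical choice.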

Acknowledgements The work is supported in part by RFBR (15-04-05371; 16-54-53064).

Network Data Integration Enables Precision Medicine

Nataša Pržulj

Computer Science Department, University College London, United Kingdom [email protected]

Abstract

We are faced with a flood of molecular and clinical data. Various biomolecules interact in a cell to perform biological function, forming large, complex systems. Large amounts of patient-specific datasets are available, providing complementary information on the same disease type. The challenge is how to mine these complex data systems to answer fundamental questions, gain new insight into diseases and improve therapeutics. Just as computational approaches for analyzing genetic sequence data have revolutionized biological understanding, the expectation is that analyses of networked "omics" and clinical data will have similar ground-breaking impacts. However, dealing with these data is nontrivial, since many questions we ask about them fall into the category of computationally intractable problems, necessitating the development of heuristic methods for finding approximate solutions.

We develop methods for extracting new biomedical knowledge from the wiring patterns of large networked biomedical data, linking network wiring patterns with function and translating the information hidden in the wiring patterns into everyday language. We introduce a versatile data fusion (integration) framework that can effectively integrate somatic mutation data, molecular interactions and drug chemical data to address three key challenges in cancer research: stratification of patients into groups having different clinical outcomes, prediction of driver genes whose mutations trigger the onset and development of cancers, and re-purposing of drugs for treating particular cancer patient groups. Our new methods stem from network science approaches coupled with graph-regularised non-negative matrix tri-factorization, a machine learning technique for co-clustering heterogeneous datasets.

Intrinsically disordered protein families

Marco Punta

Centre for Evolution and Cancer, The Institute of Cancer Research, London, United Kingdom [email protected]

Abstract

Intrinsically disordered proteins have been reported to be on average less conserved in sequence than the structured regions of the proteome. As a consequence, a lot of effort has focused on the identification of short (often < 10 aa) conserved linear motifs known to carry diverse functional traits such as, for example, post-translational modification sites and fold-upon-binding interaction regions [1]. Such short linear motifs can arise independently in non-homologous sequences. At the same time, a number of longer, evolutionarily related, conserved disordered regions are known and have been referred to as ’disordered domains’ [2]. Some of these have already been integrated into protein family databases. In this work, we annotate a set of as yet unclassified long homologous intrinsically disordered regions (disordered families) within the UniProtKB database. We generate multiple sequence alignments for each family and look for evidence of their functions in the literature.

References

1. Dinkel et al. Nucleic Acids Res. 44:D294-300 (2016)
2. Tompa et al. Bioessays. 31:328-35 (2009)

The Influence of Copy-Number Maintenance Mechanisms of Targeted Extrachromosomal Genetic Elements on the Outcome of CRISPR-Cas Defense

Konstantin Severinov1,2, Iaroslav Ispolatov3, and Ekaterina Semenova1

1 Waksman Institute of Microbiology, Rutgers, the State University of New Jersey, Piscataway, NJ 08854, USA [email protected]
2 Skolkovo Institute of Science and Technology, Skolkovo 143025, Russia
3 Department of Physics, University of Santiago de Chile, Santiago, Chile

Abstract

Prokaryotic type I CRISPR-Cas systems respond to the presence of mobile genetic elements such as plasmids and phages in two different ways. CRISPR interference efficiently destroys foreign DNA harbouring protospacers fully matching CRISPR RNA spacers. In contrast, even a single mismatch between a spacer and a protospacer can render CRISPR interference ineffective but causes primed adaptation, that is, efficient and specific acquisition of additional spacers from foreign DNA into the CRISPR array of the host. It has been proposed that the interference and primed adaptation pathways are mediated by structurally different complexes formed by the effector Cascade complex on matching and mismatched protospacers. We will review experimental evidence and present a simple mathematical model that shows that when plasmid copy number maintenance/phage genome replication is taken into account, the two apparently different outcomes of the CRISPR-Cas response can be accounted for by just one kind of effector complex on both targets. The results underscore the importance of taking the biology of the targeted genome into account when considering the consequences of CRISPR-Cas system action.

Symmetry and minimum principle: a basis for the genetic code?

Paul Sorba

Laboratory of Theoretical Physics and CNRS, Annecy, France [email protected], [email protected]

Abstract

The importance of the notion of symmetry in physics is well established: could it also be the case for the genetic code? In this spirit, a model for the genetic code based on continuous symmetries and entitled the "Crystal Basis Model" was proposed a few years ago and applied to different problems, such as the elaboration and verification of sum rules for codon usage probabilities, relations between physico-chemical properties of amino acids, and some predictions [1]. Defining in this context a "bio-spin" structure for the nucleotides and codons, the interaction within a codon-anticodon pair can simply be represented by a (bio)spin-spin potential. Then, imposing the minimum energy principle, an analysis of the evolution of the genetic code can be performed, in good agreement with the generally accepted scheme. A more precise study of this interaction model provides information on codon bias, consistent with data [2]. This work was done in collaboration with A. Sciarrino, Università di Napoli, Italy.

References

1. See, e.g., L. Frappat, A. Sciarrino and P. Sorba, J. Biol. Phys. 27, 1-34 (2001); ibid. 28, 17-26 (2002); Phys. Lett. A 311, 264-269 (2003).
2. A. Sciarrino and P. Sorba, BioSystems 107, 113-117 (2012); ibid. 111, 175-180 (2013); ibid. 141, 20-30 (2016).

Networks of Co-expression Modules

Aleksandar Stojmirović

Janssen R & D, LLC, Systems Pharmacology & Biomarkers, Immunology TA, Pennsylvania, United States of America [email protected]

Abstract

We present an approach to reduce the complexity of human tissue transcriptomic datasets by constructing networks of co-expression modules. We illustrate our proposed method using public data sets: three large datasets generated from liver, omental adipose and subcutaneous adipose samples collected from morbidly obese subjects, and a dataset of terminal ileum biopsies taken from pediatric subjects. Providing scaffolds for projection of data from other sources, module networks facilitate integrative analyses and provide insight into biological functions and cell compositions of the profiled tissues.

Algebraic Topology Analysis of Brain Graphs Emanating from Social Communications

Bosiljka Tadić and Miroslav Andjelković

Jozef Stefan Institute, Department of Theoretical Physics, Ljubljana, Slovenia [email protected]

Abstract

In recent years, mapping brain imaging data onto brain networks and analyzing them objectively with graph theory methods has provided a new framework for better understanding functional brain connections, e.g., those related to information processing, cognitive control, the perception of space, time, numbers, and languages, or the presence of disease [1–3]. On the other hand, the architecture of brain connections underlying human social behavior remains largely unexplored. Here, we use algebraic topology of graphs to analyze higher-order structures occurring in the functional brain networks in spoken communications. In particular, we consider the correlations among sets of EEG signals recorded during speaker-listener communications [4, 5]. The analysis reveals the organization of the active areas in the speaker's and the listener's brains as well as the composition of the cross-brain correlations. The higher-order structures are recognized by the presence of simplexes (cliques of potentially high dimension, or topological level) and their complexes. The structural complexity of these brain networks is quantified by the number of simplexes and shared faces at each topological level and the entropy related to the node population at each level. We show how the shifts in these topology measures vary with the quality of the speaker-listener communication, which depends on the communicated content.

Keywords: functional brain networks, EEG data, algebraic topology of graphs
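Counting simplexes at each topological level can be sketched as follows in Python. This is an illustration under stated assumptions, not the authors' pipeline: the graph is obtained by thresholding a correlation matrix (the threshold and both function names are hypothetical), and a q-simplex is taken to be a clique of q + 1 nodes.

```python
def graph_from_correlations(corr, threshold):
    """Build an undirected graph linking i and j when |corr[i][j]| >= threshold (assumed rule)."""
    n = len(corr)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i][j]) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def count_simplices(adjacency):
    """Return {q: number of q-simplices}, where a q-simplex is a clique of q + 1 nodes."""
    counts = {}
    nodes = sorted(adjacency)

    def extend(clique_size, candidates):
        # Record the current clique, then grow it with each remaining candidate.
        q = clique_size - 1
        counts[q] = counts.get(q, 0) + 1
        for i, v in enumerate(candidates):
            # Only later candidates adjacent to v are kept, so each clique is counted once.
            extend(clique_size + 1, [w for w in candidates[i + 1:] if w in adjacency[v]])

    for i, v in enumerate(nodes):
        extend(1, [w for w in nodes[i + 1:] if w in adjacency[v]])
    return counts


if __name__ == "__main__":
    # A triangle (1, 2, 3) with a pendant edge (3, 4).
    toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    print(count_simplices(toy))
```

The enumeration visits every clique, so its cost grows quickly with clique size; it is meant for small graphs such as channel-level EEG networks.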

References

1. Sporns, O.: Structure and function of complex brain networks. Dialogues Clin. Neurosci., Vol. 15, No. 3, 247-262 (2013)
2. De Vico Fallani, F., Richiardi, J., Chavez, M., Achard, S.: Graph analysis of functional brain networks: practical issues in translational neuroscience, arXiv:1406.7931
3. Zeng, L.-L. et al.: Identifying major depression using whole-brain functional connectivity: a multivariate pattern analysis, Brain, Vol.? (2012)
4. Kuhlen, A.K. et al.: Content-specific coordination of listeners’ to speakers’ EEG during communication, Frontiers Human Neurosci., Vol. 6, 266 (2012)
5. Andjelković, M. et al.: Towards understanding the social impact to functional brain connections and the formation of super-structures. Preprint (2016).

The role of structural disorder in protein degradation

Peter Tompa

VIB Structural Biology Research Center, Vrije Universiteit Brussel, Belgium [email protected]

Abstract

Structurally disordered proteins (IDPs) are prevalent in the proteome and often function by partner recognition and induced folding. Frequently, their recognition elements consist of a short sequence of residues, termed Eukaryotic Linear Motifs (ELMs). ELMs represent an underappreciated functional element of the proteome, because their study lags far behind that of domains. Here I will give an overview of the motif field, including an assessment of the total number of motifs in the human proteome [1], followed by an analysis of the role of motifs in protein degradation. Protein turnover is regulated by specific signals (degrons), which we suggest to have a "tripartite" nature [2, 3]. Tripartite degrons comprise: (1) a primary degron that specifies substrate recognition by cognate E3 ubiquitin ligases, (2) secondary site(s) comprising a single, or multiple neighboring, poly-ubiquitinated lysine(s), and (3) a segment that initiates substrate unfolding at the 26S proteasome. By collecting and analyzing all relevant cases (124 instances of 18 degron types), we show that primary and secondary degrons are short motifs that tend to fall into locally disordered regions, whereas the tertiary degron is a disordered segment in the vicinity of the secondary one that is responsible for effective proteasomal engagement. The importance of degron motifs in disordered regions is shown by the high incidence of their disease-causing mutations and their involvement in protein degradation mediated by mono-ubiquitination [4].

References

1. Tompa, P., Davey, N. E., Gibson, T. J., and Babu, M. M. (2014) A million peptide motifs for the molecular biologist. Mol. Cell 55: 161-169.
2. Guharoy, M., Bhowmick, P., Sallam, M. and Tompa, P. (2016) Tripartite degrons confer diversity and specificity on regulated protein degradation in the ubiquitin-proteasome system. Nature Comm. 7: 10239.
3. Guharoy, M., Bhowmick, P., Tompa, P. (2016) Design principles involving protein disorder facilitate specific substrate selection and degradation by the ubiquitin-proteasome system. J Biol Chem. [Epub]
4. Braten, O. et al. (2016) Numerous proteins with unique characteristics are degraded by the 26S proteasome following monoubiquitination. Cell [submitted]

Non-globular proteins: Towards an understanding of the "dark matter" in the protein universe

Silvio C.E. Tosatto

Dept. of Biomedical Sciences, University of Padova, Italy [email protected]

Abstract

Non-globular proteins (NGPs) encompass different molecular phenomena that defy the traditional sequence-structure-function paradigm. NGPs include intrinsically disordered regions, tandem repeats, aggregating domains, low-complexity sequences and transmembrane domains. Although growing evidence suggests that NGPs are central to many human diseases, functional annotation is very limited. It was recently estimated that close to 40% of all residues in the human proteome lack functional annotation and many of these are NGPs. Several computational developments in the field of NGPs will be discussed. The MobiDB (Potenza et al., NAR database issue 2015; http://mobidb.bio.unipd.it/) and RepeatsDB (Di Domenico et al., NAR database issue 2014; http://mobidb.bio.unipd.it/) databases have been recently established to annotate intrinsically disordered and structurally repeated proteins, respectively. Both can be easily accessed through web services. A large-scale analysis of intrinsic disorder data has shown interesting differences among both predictors and experimental sources. Preliminary data on tandem repeat structures also help explain their lack of annotation in protein domain databases. Last but not least, a newly established European research network focusing on NGPs aims to bring light into this (dark) corner of the protein universe.

Intrinsically disordered proteins in salted water and in the thick soup

Vladimir N. Uversky1,2,3

1 Department of Molecular Medicine, University of South Florida, Tampa, USA [email protected]
2 Department of Biology, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
3 Laboratory of Structural Dynamics, Stability and Folding of Proteins, Institute of Cytology, Russian Academy of Sciences, St. Petersburg, Russian Federation

Abstract

Intrinsically disordered proteins (IDPs) lack stable tertiary and/or secondary structure under physiological conditions in vitro. Computational studies revealed that they are highly abundant in nature, as 25-30% of eukaryotic proteins are mostly disordered, and > 50% of eukaryotic proteins and > 70% of signaling proteins have long disordered regions. Often, these IDPs are involved in regulation, signaling and control pathways, where binding to multiple partners and high-specificity/low-affinity interactions play a crucial role. It is suggested that functions of IDPs may arise from the specific disorder form, from inter-conversion of disordered forms, or from transitions between disordered and ordered conformations. The choice between these conformations is determined by the peculiarities of the protein environment, and many IDPs possess an exceptional ability to fold in a template-dependent manner. IDPs are highly abundant among hub proteins. They are associated with alternative splicing. This association helps proteins to avoid folding difficulties and provides a novel mechanism for developing tissue-specific protein interaction networks. Numerous IDPs are commonly associated with such devastating maladies as cancer, cardiovascular disease, amyloidoses, neurodegenerative diseases, and diabetes. Novel strategies for drug discovery are based on these proteins. The vast majority of in vitro experiments with IDPs are traditionally performed under the relatively ideal thermodynamic conditions of low protein and moderate salt concentrations. However, the concentration of macromolecules, including proteins, nucleic acids, and carbohydrates, within a cell can be as high as 400 g/L, creating a crowded medium with considerably restricted amounts of free water. The volume occupied by the macromolecular co-solutes is unavailable to other molecules, giving rise to the so-called excluded volume effects. Although it is believed that excluded volume can affect the behavior of biological macromolecules and protein-protein interactions, the accumulated data support the notion that many IDPs preserve their mostly disordered state in crowded environments.

Disclaimer: This work was supported in part by a grant from the Russian Science Foundation RSCF No. 14-24-00131.

Transcription factors interaction inference based on sequence feature representations

Nevena Veljković

Center for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Mihaila Petrovica Alasa 14, 11001, Belgrade, Serbia [email protected]

Abstract

Being central to most biological processes, protein-protein interactions (PPI) represent an important class of targets for human therapeutics. Transcriptional regulation, which occurs mostly via PPI and which is often deregulated in cancer and complex diseases, is at the forefront of this type of drug discovery. Understanding how biomolecules recognize each other complements information on protein binding in a way that brings us necessary insights into how both high affinity and high specificity are achieved. Long-range intermolecular interactions play an important role in the recognition between protein macromolecules and drugs and therapeutic targets. In my talk, I will describe computational methods for PPI prediction that consider long-range interaction properties of the sequence. Predictors based on spectral representation of a sequence and pseudo amino acid composition that efficiently decipher PPI involved in transcriptional regulation will be presented.
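As a rough illustration of the two feature families named above, the following sketch computes a power spectrum and a simplified pseudo amino acid composition for one sequence. It is not the predictor described in the talk: the Kyte-Doolittle hydropathy scale is an assumed stand-in for whatever residue property the author uses, and the function names `power_spectrum` and `pseudo_aac`, the lag count, and the weight are hypothetical.

```python
import cmath

# Kyte-Doolittle hydropathy values: an illustrative stand-in scale (assumption).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}
AMINO_ACIDS = sorted(KD)


def power_spectrum(seq):
    """Power spectrum (naive DFT) of the mean-centred numeric signal of a sequence."""
    x = [KD[a] for a in seq]
    mean = sum(x) / len(x)
    x = [v - mean for v in x]
    n = len(x)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(x))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def pseudo_aac(seq, max_lag=3, weight=0.1):
    """20 composition terms plus max_lag sequence-order terms, normalised to sum to 1."""
    freqs = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    x = [KD[a] for a in seq]
    thetas = []
    for lag in range(1, max_lag + 1):
        # Mean squared property difference between residues separated by `lag`.
        diffs = [(x[i] - x[i + lag]) ** 2 for i in range(len(x) - lag)]
        thetas.append(sum(diffs) / len(diffs))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


spectrum = power_spectrum("MKTAYIAKQR")       # hypothetical peptide
features = pseudo_aac("MKTAYIAKQRQISFVK")     # hypothetical peptide
```

The sequence-order terms capture correlations between residues several positions apart, which is one simple way to encode longer-range sequence properties in a fixed-length vector.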

DNA polymorphism as a tool for genetic information implementation

Sergey N. Volkov

Bogolyubov Institute for Theoretical Physics, NAS of Ukraine, Kiev, 03680 Ukraine [email protected]

Abstract

Accuracy of genetic information implementation in living cells is largely due to the peculiarities of the structure and variability of the DNA double helix. The regulation of genetic activity, stability and security of genetic texts, reading and translation of genetic information: all of these important biological processes take place because of the unique properties of the DNA double helix, which distinguish it from other cellular molecules. One of these key properties of the DNA molecule is the polymorphism of the double helix, through which this molecule has the ability to change its secondary structure under the influence of some external factors or depending on the nucleotide sequence. The localized deformations arising in this case provide a broad palette of tools in the processes of genetic information realization. Besides, the restructuring of the double helix under threshold deformation of DNA allows preserving the genetic texts and protecting them from possible emergencies in the cell.

The role of localized and threshold deformations caused by the polymorphic properties of the double helix has been under thorough research lately [1, 2]. These deformations have a sufficiently large amplitude of structural element deviations from their equilibrium positions in the double helix and therefore cannot be understood in terms of the elastic rod model, which is valid for the study of DNA mechanics in the harmonic approximation. On the other hand, all-atom modeling frequently cannot explain the mechanism of complex processes of DNA deformation.

In the report, an approach for the consideration of conformation-dependent deformations of the DNA macromolecule is presented. The transformation of DNA structure is considered in the frame of a two-component model. One model component (external) describes the macromolecule deformation as in the elastic rod model, while the other component (internal) describes the conformational changes of the macromolecule monomer units. Both components are considered as interconnected on the paths of certain conformational transformation [3, 4].

The developed approach allows studying the physical mechanisms of the localized restructuring of the double helix due to the action of small molecules, regulatory proteins, and external forces on DNA structure. The obtained results give a consistent interpretation of the observed effects of the deformability of TATA-box, A-tract, allosteric proximal and distinct effects, and also the threshold character of DNA unzipping and overstretching. The approach provides the possibility to predict the sizes and energies of local deformation of the double helix at the location of some definite nucleotide sequences from their conformational states.

Theoretical study of threshold deformations in DNA unzipping and subsequent molecular dynamics simulations [5] confirm the opinion that double-stranded DNA in solution is a highly organized system with definite degrees of protection of its structure against extreme situations. The results obtained could also be useful for the development of modern technologies in the field of molecular medicine and DNA-based engineering.

References

1. Ch. Prevost, M. Takahashi, R. Lavery, ChemPhysChem, 10 (2009) 1399.
2. P.D. Dans et al., Nucleic Acids Research, 40 (2012) 10668.
3. S.N. Volkov, Bioph. Bulletin, 7 (2000) 7; Ibid, 12 (2003) 5; J. Biol. Phys., 31 (2005) 323.
4. P.P. Kanevska, S.N. Volkov, Ukr. J. Phys., 51 (2006) 1001.
5. S.N. Volkov, A.V. Solovyov, Eur. Phys. J. D., 54 (2009) 657; S.N. Volkov et al., J. Phys.: Condens. Mat. 24 (2012) 0351043.

OrthoDB: an evolutionary perspective to interpreting genomics data

Robert M. Waterhouse, Evgenia V. Kriventseva, and Evgeny M. Zdobnov

University of Geneva Medical School & Swiss Institute of Bioinformatics, Rue Michel-Servet 1, 1211 Geneva, Switzerland {Robert.Waterhouse,Evgenia.Kriventseva,Evgeny.Zdobnov}@unige.ch

Abstract

The OrthoDB [1] hierarchical catalog of orthologs represents a comprehensive resource of comparative genomics data that delineates the evolutionary histories of millions of genes from thousands of bacteria (3669) and hundreds of plants (33), fungi (227), and animals (331). Users may browse the catalog at www.orthodb.org to view extensive mapped gene functional annotations and quantified evolutionary traits. To facilitate large-scale evolutionary and/or functional genomics research projects, dynamic data queries may be performed through the dedicated application programming interface, or the OrthoDB software may be employed to compute tailored orthology datasets. Additionally, OrthoDB's sets of Benchmarking Universal Single-Copy Orthologs, BUSCOs [2], provide a rich source of data to assess the quality and completeness of genome assemblies and their gene annotations.

OrthoDB resources and tools enable extensive orthology-based genome annotation and interpretation in a comparative genomics framework that incorporates the growing numbers of sequenced genomes. Orthology is a cornerstone of comparative genomics, and such approaches are well-established as immensely valuable for gene discovery and characterization, offering evolutionarily-qualified hypotheses on gene function by identifying "equivalent" genes in different species. Orthology-based approaches therefore provide an important evolutionary perspective to interpreting the increasing quantities of genomics data, and OrthoDB offers both the ability to run custom analyses and to query extremely comprehensive sets of orthology classifications.

Keywords: OrthoDB, BUSCO, orthology, gene function, gene evolution, genome

References

1. Kriventseva, EV., Tegenfeldt, F., Petty, TJ., Waterhouse, RM., Simão, FA., Pozdnyakov, IA., Ioannidis, P., Zdobnov, EM.: OrthoDB v8: update of the hierarchical catalog of orthologs and the underlying free software. Nucleic Acids Res. 43(Database issue):D250-6. (2015)
2. Simão, FA., Waterhouse, RM., Ioannidis, P., Kriventseva, EV., Zdobnov, EM.: BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 31(19):3210-2. (2015)

From biocuration to model predictions and back

Ioannis Xenarios

SIB Swiss Institute of Bioinformatics, Center for Integrative Genomics, University of Lausanne, Switzerland [email protected]

Abstract

We are in a time where sequencing is paramount in biological and medical research (at least that's what the press is talking about), but piecing together the genomic information from the regulation to its functional impact is a major challenge for biology. My presentation will describe the work of unsung heroes who are painstakingly biocurating the scientific literature and creating resources such as UniProtKB/Swiss-Prot and others that are making life easier for hundreds of thousands of scientists. I will then present some applications that use these resources in the system modeling arena, demonstrating that these models could be predictive and useful for targeting certain types of experimental design and discovering novel treatments. The presentation will also stress the importance of proper infrastructure and expertise that are needed to enable this type of research, as well as the clear necessity of continued international collaboration to achieve these goals.

SPEAKERS IN SESSIONS

Bioinformatics Basis for the "Molecular Tweezers" Construction

Anastasia Anashkina1 and Alexei Nekrasov2

1 Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilovstr.,32, 119991 Moscow, Russia [email protected] 2 M.M. Shemyakin and Yu.A. Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Miklukho-Maklaya str., 16/10, 117997 Moscow, Russia alexei [email protected]

Abstract. The aim of this work is the creation of a potential field that describes the interaction between amino acid residues. We used the basic unit of the protein sequence previously proposed in the ANIS method, which is a block of five adjacent amino acid residues. We introduced a new classification of amino acid residues (information type) depending on the residue position with respect to the local extrema of the occupancy profile of the protein sequences. We have calculated 20x20 contact matrices for each pair of informational types of residues and for distances between Cα atoms of contacting residues from 3 Å up to 15 Å. The resulting set of matrices has a 37-fold excess of "information importance" over the previously known contact matrix. The proposed approach makes it possible to design "molecular tweezers", which can be attached to a variety of molecular identification systems, such as green fluorescent protein.

Keywords: ANIS method, informational structure, contact matrices

1. Introduction

Potential fields that effectively describe the interactions between protein molecules are very important both for basic science and for applications. The aim of this work is the creation of a potential field that describes the interaction between amino acid residues and that has more "informative value" than the potential field obtained by the Voronoi-Delaunay method. To solve this problem, we used the basic unit of the protein sequence previously proposed in the ANIS method, which is a block of five adjacent amino acid residues [1]. This block has been named the information unit of protein, and it has been shown [1, 2] that information units can determine the structural organization of the local polypeptide chain. The density of the information units in a protein sequence determines the effectiveness of interactions within the polypeptide chain [3].

2. Classification of Amino Acid Residues

In this paper, we introduced a new, additional classification of amino acid residues (information type) depending on the residue position with respect to the local extrema of the occupancy profile of the protein sequences [1]. Figure 1 shows the amino acid sequence of the protein and the corresponding "occupancy profile" of information units (see [4] for description of the ANIS method). We have considered residue positions at the points of local maxima and minima of the occupancy profile, the two residues adjacent on the right and left of these positions, and residues outside local extrema of the occupancy profile (eleven informational types of residue).

Fig. 1. Classification of amino acids according to the local extrema of the occupancy profile. M is the amino acid residue at the local minimum position; ML1, MR1 are positions of residues shifted one position to the left or right from the minimum; ML2, MR2 are residue positions shifted two positions to the left or right from the minimum; P is the amino acid residue position at the local maximum of the occupancy profile; PL1, PR1 are residue positions shifted one position to the left or right from the maximum; PL2, PR2 are residue positions shifted two positions to the left or right from the maximum; U denotes the other positions of the occupancy profile.
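The labelling scheme in Fig. 1 can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes strict local extrema (no plateaus), leaves the two end positions without an extremum test, and resolves overlaps by letting labels nearer to an extremum override farther ones. The function name `classify_positions` and the example profile are hypothetical.

```python
def classify_positions(profile):
    """Label each position as M/P (local minimum/maximum), as a neighbour up to
    two positions away (ML1, MR2, PL1, ...), or as U for everything else."""
    n = len(profile)
    labels = ["U"] * n
    extrema = []
    for i in range(1, n - 1):
        if profile[i] < profile[i - 1] and profile[i] < profile[i + 1]:
            extrema.append((i, "M"))
        elif profile[i] > profile[i - 1] and profile[i] > profile[i + 1]:
            extrema.append((i, "P"))
    # Assign far neighbours first so nearer neighbours and the extrema override them.
    for shift in (2, 1):
        for i, kind in extrema:
            if i - shift >= 0:
                labels[i - shift] = f"{kind}L{shift}"
            if i + shift < n:
                labels[i + shift] = f"{kind}R{shift}"
    for i, kind in extrema:
        labels[i] = kind
    return labels


# Hypothetical occupancy profile with one minimum (index 3) and one maximum (index 8).
example_labels = classify_positions([5, 4, 3, 1, 3, 4, 5, 6, 9, 6, 5])
```

For this toy profile every one of the eleven informational types appears exactly once, which makes it a convenient check on the labelling logic.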

3. Contact Matrices

We have calculated the information structure of each protein chain sequence (occupancy profiles) from the set of 11,000 protein-protein complexes. All residues were classified by information type according to the residue position in the occupancy profile. In this paper we defined amino acids to be in contact if the distance between atoms of these amino acids (except hydrogen) is in the interval from 2.0 to 3.4 Å. Such a "hard" condition was used to avoid accidental contacts. Contact matrices were calculated for all possible pairs of residue information types, 66 matrices of 20x20 in total. Contact maps (Figure 2) show that a change of information type changes the surface describing the frequency of contacts between residues. Thus the residue information type is an important factor determining the specificity of the interaction between amino acid residues in protein-protein complexes. The "information value" of the contact matrices was compared with that of a matrix obtained by Voronoi-Delaunay tessellation, and it was shown that the average information entropy for the contact matrices is 0.2, and for the matrix obtained by Voronoi-Delaunay tessellation [4, 5] it is 7.5. This means that the addition of information types for residues increases the specificity of contact in the matrices describing the interaction in protein-protein complexes.

Fig. 2. Contact matrices for pairs of residues with different informational types: residues located at position ML1 with residues in positions ML2, ML1, M, MR1, MR2 (top row) and PL2, PL1, P, PR1, PR2 (bottom row). Contact matrices are shown as contour lines on a logarithmic scale.
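The bookkeeping in this section can be checked with a short sketch. Eleven informational types give 11 x 12 / 2 = 66 unordered type pairs, and the ratio of the two quoted entropies, 7.5 / 0.2 = 37.5, is consistent with the 37-fold figure in the abstract. The code below is only an illustration: the helper names are hypothetical, and Shannon entropy in bits is an assumed reading of "information entropy", since the text does not give the exact definition.

```python
import math
from itertools import combinations_with_replacement

TYPES = ["M", "ML1", "MR1", "ML2", "MR2", "P", "PL1", "PR1", "PL2", "PR2", "U"]

# Unordered pairs of the 11 types, including same-type pairs: 11 * 12 / 2 = 66.
type_pairs = list(combinations_with_replacement(TYPES, 2))


def in_contact(atoms_a, atoms_b, lo=2.0, hi=3.4):
    """True if any pair of heavy atoms (x, y, z) is between lo and hi angstroms apart."""
    return any(lo <= math.dist(p, q) <= hi for p in atoms_a for q in atoms_b)


def shannon_entropy(matrix):
    """Shannon entropy in bits of a count matrix normalised to probabilities."""
    total = sum(sum(row) for row in matrix)
    probs = [c / total for row in matrix for c in row if c > 0]
    return -sum(p * math.log2(p) for p in probs)


entropy_ratio = 7.5 / 0.2  # entropies quoted in the text
```

A uniform 2x2 count matrix has an entropy of 2 bits under this definition; the more sharply peaked a contact matrix is, the lower its entropy.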


4. Conclusions

The best tool for specific protein binding is monoclonal antibodies. However, the binding site in the target protein cannot be determined by the investigator and is selected by the immune system. We offer a tool for designing a specific binding site for recognition of a researcher-defined area on the surface of the target protein. The proposed approach makes it possible to design "molecular tweezers", which can be attached to a variety of molecular identification systems, such as green fluorescent protein. Thus, it is possible to identify individual molecules in biological systems. In addition, by choosing a binding site in the target protein, it is possible to block certain of its functions. It may also be an important tool in biomedical research and medical practice. The data obtained in the studies open up new opportunities to create fundamentally new artificial proteins for binding and regulation in protein engineering and various biotechnological applications.

Acknowledgments

This work was supported by the Russian Foundation for Basic Research, grant 15-04-99605a, and by a grant of the RAS Presidium program of fundamental research in strategic directions of science development "Fundamental problems of mathematical modelling" (Program code: II.4), project "Mathematical model of natural polypeptide chains spatial organization, based on information content of protein sequence".

References

1. Nekrasov AN, Anashkina AA, Zinchenko AA. A new paradigm of protein structural organization. Institute of Physics, Belgrade. (2014)
2. Nekrasov AN. Entropy of Protein Sequences: An Integral Approach. J. Biomol. Struct. Dyn. Vol. 20, 87-92. (2002)
3. Nekrasov AN, Zinchenko AA. Structural Features of the Interfaces in Enzyme-Inhibitor Complexes. J. Biomol. Struct. Dyn. Vol. 28, 85-96. (2010)
4. Anashkina A, Kuznetsov E, Esipova N, Tumanyan V. Comprehensive statistical analysis of residues interaction specificity at protein-protein interfaces. Proteins Struct. Funct. Bioinforma. Vol. 67, 1060-77. (2007)
5. Medvedev N. The algorithm for three-dimensional voronoi polyhedra. J. Comput. Phys. Vol. 67, 223-9. (1986)

Clustering of CpG-rich elements in gene dense regions

Vladimir Babenko, Irina Chadaeva, and Yuriy Orlov

Institute of Cytology and Genetics, Lavrentieva str 10, Novosibirsk, 630090, Russia [email protected]

Abstract

Due to the fourfold depletion of CG dinucleotides in the human genome caused by targeted methylation, they represent a highly specific marker for open chromatin. We sought to elucidate its functional relevance by considering the location specifics of CG-rich clusters, namely CpG islands (CGIs) and Alu retrotransposons. Chromosome-wise resolution that displays the association of genes and CGIs was reported previously.

We report the strong domain-wide association of genes and CGIs across 30,000 non-overlapping 100 kb bins. Nearly half of the genome is void of both genes and CGIs (43%), while 33% of bins contain both elements, so the discordant bins comprise only 24% of the genome, implying high significance of their non-random association. We ascertained that the major cause of this correlation is the joint affinity to chromatin accessibility, assessed as DNase Hypersensitive Sites (DHS) density and chromatin state. Both genes and CGIs demonstrate high correlation with DHS and open-state chromatin distribution genome-wide. Alu clusters also demonstrated distinct affinity to open chromatin. Using chromatin signatures inferred from topologically associated domains comprising the 3D map of the human genome [1], we elucidated that genes, CGIs and CG-rich AluY are preferentially clustered in gene-dense chromatin of A1 type.

Besides non-methylated, mostly promoter-linked CpG islands, which are inherently associated with DHS, we found that methylated CGIs also maintain strong affinity to the accessible chromatin and DHS hotspots, implying that the vast majority of them maintain functionality.

The striking massive instances of highly clustered CG-rich elements are underscored by chromosome 19, which features 2.5-fold densities of genes, CGIs and Alus compared with the closest chromosome density. Notably, skipping A2, only the A1 gene-dense open chromatin type is present on chromosome 19, while the single alternative A2 is 1.5 times more abundant genome-wide [1].

While the phenomenon of GC-rich gene-dense regions has long been recognized, we approached it using specific distribution patterns and large-scale chromatin analysis.

The elevated density of genes, CGIs, and Alus in open chromatin implies complex interactions among them in the process of gene functioning. As one elaboration of the large-scale viewpoint, we speculate on the observation that both 5' and 3' gene regions are encompassed by CG-rich stretches in gene-dense regions specifically. We discuss how it may accommodate their expression in a range of ways.
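The bin-level concordance reported above (43% of bins with neither element, 33% with both, 24% discordant) can be computed with a sketch like the one below. It is an illustration under stated assumptions, not the authors' pipeline: each feature is reduced to a single coordinate, the helper names `bin_occupancy` and `concordance` are hypothetical, and the coordinates are made up.

```python
def bin_occupancy(positions, chrom_length, bin_size=100_000):
    """Mark which non-overlapping bins of a chromosome contain at least one feature."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    occupied = [False] * n_bins
    for pos in positions:
        occupied[pos // bin_size] = True
    return occupied


def concordance(genes, cgis):
    """Fractions of bins holding both feature types, neither, or exactly one."""
    n = len(genes)
    both = sum(g and c for g, c in zip(genes, cgis))
    neither = sum((not g) and (not c) for g, c in zip(genes, cgis))
    return {"both": both / n, "neither": neither / n,
            "discordant": (n - both - neither) / n}


# Hypothetical coordinates on a 1 Mb toy chromosome (ten 100 kb bins).
gene_bins = bin_occupancy([50_000, 150_000, 250_000, 950_000], 1_000_000)
cgi_bins = bin_occupancy([60_000, 160_000, 350_000], 1_000_000)
summary = concordance(gene_bins, cgi_bins)
```

With real annotations, the same three fractions could be compared with the values expected if genes and CGIs were placed independently, which is what makes a low discordant fraction informative.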

References

1. Rao SS, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, Sanborn AL, Machol I, Omer AD, Lander ES, Aiden EL. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159(7):1665-1680

SNP-Based Noninvasive Prenatal Screening using Cell-Free DNA for Detection of Fetal Chromosome Abnormalities

Milena Banjevic, Allison Ryan, and Styrmir Sigurjonsson

Natera Inc., 201 Industrial Rd, San Carlos, CA 94070 USA [email protected]

Abstract

A single-nucleotide polymorphism (SNP)-based noninvasive prenatal test (Panorama™, Natera, San Carlos, CA) detects aneuploidy in cell-free DNA from maternal blood as early as nine weeks' gestation. The fetal fraction limit for aneuploidy detection is 2.8% (measured post-PCR in the sequencing data). Over 1000 tests per day are performed internationally, with overall accuracy of aneuploidy detection > 99.5% and fetal sex determination accuracy of 100%. In this test, SNPs are amplified using a targeted massively multiplexed PCR (mmPCR) approach, and then sequenced using next-generation sequencing (NGS). Natera's proprietary model of allelic data takes into account sample, amplification, sequencing and SNP target set characteristics (process characteristics). Using this model, fetal chromosomal copy numbers are inferred from the allelic sequencing data using iterative variational Bayesian and maximum likelihood methods. Here we will demonstrate adherence of the model to real world data based on a data set with known copy numbers and associated mother and child genotypes. We present the performance of the algorithm as a function of key sample characteristics for a large real sample data set with known chromosomal copy numbers. We will also show the performance of the algorithm in a simulated environment in which we can perform stress tests on a wide range of process characteristics.

Keywords: non-invasive prenatal testing, variational Bayesian methods, NIPT
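The maximum likelihood idea can be illustrated with a deliberately simplified toy model; it is not Natera's proprietary model and ignores amplification and sequencing characteristics. It assumes SNPs where the mother is homozygous AA and the fetus carries at least one B allele, a known fetal fraction f, and binomially distributed read counts. All function names, hypotheses, and read counts are hypothetical.

```python
import math


def b_allele_fraction(f, fetal_copies, fetal_b_copies):
    """Expected B-allele fraction in cell-free DNA at a SNP where the mother is AA.

    The mother contributes 2 * (1 - f) A copies; the fetus contributes
    f * fetal_copies copies, of which f * fetal_b_copies carry the B allele.
    """
    return f * fetal_b_copies / (2 * (1 - f) + f * fetal_copies)


def log_likelihood(counts, p):
    """Binomial log-likelihood of (b_reads, total_reads) pairs, constant term dropped."""
    return sum(b * math.log(p) + (n - b) * math.log(1 - p) for b, n in counts)


def best_hypothesis(counts, f):
    """Return the copy-number hypothesis with the highest likelihood, plus all scores."""
    hypotheses = {"disomy (AB)": (2, 1), "trisomy (AAB)": (3, 1), "trisomy (ABB)": (3, 2)}
    scores = {name: log_likelihood(counts, b_allele_fraction(f, c, b))
              for name, (c, b) in hypotheses.items()}
    return max(scores, key=scores.get), scores


# Hypothetical read counts: 20 SNPs, 1000 reads each.
call_a, _ = best_hypothesis([(50, 1000)] * 20, f=0.1)
call_b, _ = best_hypothesis([(95, 1000)] * 20, f=0.1)
```

At a fetal fraction of 10% the expected B-allele fractions of the three hypotheses differ by only a fraction of a percentage point in some cases, which hints at why deep targeted sequencing and a careful noise model matter at low fetal fractions.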

References

1. Zimmermann, B., Hill, M. et al. (2012) Noninvasive prenatal aneuploidy testing of chromosomes 13, 18, 21, X, and Y, using targeted sequencing of polymorphic loci. Prenat. Diagn., 32:1233-1241
2. Ryan A., Hunkapiller N., Banjevic M. et al. (2015) Validation of an Enhanced Version of a Single-Nucleotide Polymorphism-Based Noninvasive Prenatal Test for Detection of Fetal Aneuploidies. Fetal Diagnosis and Therapy.
3. http://www.natera.com/science-informatics

Achieving a rapid expression of toxic (but useful) molecules within cell

Bojana Blagojevic1, Magdalena Djordjevic1, and Marko Djordjevic2

1 Institute of Physics Belgrade, University of Belgrade, Pregrevica 118, 11080 Belgrade, Serbia {bojanab,magda}@ipb.ac.rs 2 Institute of Physiology and Biochemistry, Faculty of Biology, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia [email protected]

Abstract

The restriction-modification (RM) system is a rudimentary bacterial immune system, whose main ingredients are the restriction enzyme (R), which cuts specific DNA sequences, and the methyltransferase (M), which methylates and consequently protects the same DNA sequences from cleavage. While R is useful as it can cut the virus DNA, it is also potentially toxic as it can cut the unprotected host genome, so that R and M expression is tightly controlled by a control (C) protein. We developed a biophysical model of gene expression regulation in RM systems, and applied it to EcoRV, which has divergent RC and M promoters [1]; this is, to our knowledge, the first quantitative model of divergent system architecture. The main feature of EcoRV is that RC and M promoters overlap, which, in addition to C protein binding, controls the system transcription. We show that EcoRV features meet three design principles that we propose: the time-delayed expression of R with respect to M, the fast transition of R from OFF to ON state, and the increased steady-state stability of R. We show that perturbing EcoRV features diminishes the design principles, and moreover consistently increases the M to R ratio, preventing a balance between the toxic molecule and its antidote. Based on the analysis of R-M system control, we propose a novel synthetic gene circuit [2], which combines the transcription control of R-M systems with the transcript processing exhibited by CRISPR/Cas systems [2]. Our goal is to propose an optimal strategy for rapidly generating toxic molecules within a cell [2].

Keywords: R-M systems, divergent promoters, transcript processing
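The first design principle, delayed expression of R with respect to M, can be illustrated with a toy kinetic model. This is not the biophysical model of the abstract: it ignores promoter overlap entirely, assumes that M is produced constitutively while C and R share a promoter that C activates cooperatively, and uses arbitrary rate constants. The function names and all parameter values are hypothetical.

```python
def simulate(n_steps=2000, dt=0.01, a_m=2.0, basal=0.2, vmax=2.0, K=0.5, n=2, d=1.0):
    """Euler integration of a toy model: M is constitutive, while C and R share
    a promoter that C activates cooperatively (Hill exponent n)."""
    m = c = r = 0.0
    times, m_traj, r_traj = [], [], []
    for step in range(n_steps + 1):
        times.append(step * dt)
        m_traj.append(m)
        r_traj.append(r)
        rc_rate = basal + vmax * c**n / (K**n + c**n)  # shared RC promoter activity
        m += dt * (a_m - d * m)
        c += dt * (rc_rate - d * c)
        r += dt * (rc_rate - d * r)
    return times, m_traj, r_traj


def half_time(times, values):
    """First time at which a trajectory reaches half of its final value."""
    target = 0.5 * values[-1]
    return next(t for t, v in zip(times, values) if v >= target)


times, m_traj, r_traj = simulate()
delay = half_time(times, r_traj) - half_time(times, m_traj)
```

In this sketch the delay arises because the RC promoter starts at a low basal rate and only accelerates once enough C has accumulated, so M has time to protect the host genome before R reaches high levels.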

References

1. Rodic A., Blagojevic B., Zdobnov E., Djordjevic M., Djordjevic M.: Design principles of restriction-modification systems: ensuring safe and efficient host establishment, submitted (2016).
2. Rodic A., Blagojevic B., Zdobnov E., Djordjevic M., Djordjevic M.: to be submitted (2016).
3. Djordjevic M., Djordjevic M., Severinov K., Biology Direct, 7:24 (2012).

Non-negative Matrix Factorization for Integrative Clustering of Bioinformatics Data

Sanja Brdar

BioSense - Institute for research and development of information technology in biosystem, University of Novi Sad, Serbia [email protected]

Abstract

In bioinformatics, integrative approaches are motivated by the desired improvement of robustness, stability and accuracy. Clustering, the prevailing technique for preliminary and explorative analysis of experimental data in genomics, may benefit from integration across multiple partitions. Different partitions can be inferred from different initializations, algorithms, parameters, feature subsamples, item subsamples, similarity/distance functions or heterogeneous data sources. To overcome users’ dilemma of selecting a data partition among many possible ones, we developed a technique that infers separate clusters from diverse inputs and then fuses them by means of non-negative matrix factorization (NMF). The proposed fusion technique is evaluated within the scope of functional genomics, where it contributes to an increase of the quality of clusters with respect to enrichment of their associated gene function. The landscape of integrative clustering algorithms is further explored by comprehensive comparison of the partitions generated by NMF and 5 alternative ensemble algorithms on 30 cancer genomics microarrays. Here, on high-dimensional microarray data, integrative clustering enhances the stability of final clusters that correspond to different types or subtypes of cancer. Finally, the current research on regularized NMF for integrative clustering will be presented, as well as possible applications to the analysis of metagenomic data, where the microbial diversity assessment may also benefit from ensemble clustering.
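One common way to fuse partitions with NMF, shown here only as a sketch of the general idea and not as the technique developed by the author, is to build a co-association matrix from the input partitions and factorize it with the Lee-Seung multiplicative updates. The helper names, the random initialisation, the iteration count, and the three input partitions are all assumptions.

```python
import random


def co_association(partitions):
    """Fraction of partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    return [[sum(p[i] == p[j] for p in partitions) / len(partitions)
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for V ~ W H under the Frobenius norm."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h


# Hypothetical input partitions of five items; cluster ids need not match across runs.
partitions = [[0, 0, 1, 1, 1], [0, 0, 0, 1, 1], [1, 1, 0, 0, 0]]
consensus = co_association(partitions)
w, h = nmf(consensus, k=2)
labels = [max(range(2), key=lambda a: row[a]) for row in w]  # fused cluster per item
```

Working on the co-association matrix sidesteps the label-matching problem, because two partitions that agree on the grouping but use different cluster ids produce identical matrices.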

Radiation Induced Dysfunctions in the Working Memory Performance Studied by Neural Network Modeling

Aleksandr Bugay

Joint Institute for Nuclear Research, Laboratory of Radiation Biology, Moscow, Russia bugay [email protected]

Abstract

The synchronization of neuronal activity within a specific network is required for cognitive performance. Normal performance of a neural network may be disturbed by various external factors. Among them, galactic cosmic radiation remains one of the most poorly studied while posing a potential risk for the central nervous system in long-term space travel. In ground-based experiments, exposure to heavy ion radiation induces pronounced deficits in cognitive functions [1]. Biological neural network simulations have been applied recently for the quantification of related phenomena in the hippocampus [2]. We study neural activity in the prefrontal cortex that is responsible for short-term retention of information about the object (object working memory). The model neural network contains two principal types of cells, pyramidal neurons (excitatory population) and interneurons (inhibitory population), connected to each other by synapses with GABA, AMPA and NMDA receptors. Further, we apply a phenomenological approach by using interpolated values of dose-dependent changes in basic structural elements of neurons (synaptic receptors, ion channels, etc.) according to known experimental data. The simulation of network spatiotemporal dynamics was performed for simple cognitive tasks. It is demonstrated that radiation-induced alterations in the properties of synaptic receptors cause loss of stability for specific patterns of activity. This instability arises when the radiation dose exceeds a threshold. The proposed theoretical approach provides a tool for the estimation of cognitive impairments caused by ionizing radiation.

Keywords: biological neural networks, radiation biology

References

1. Greene-Schloesser, D., et al.: Radiation-induced brain injury: a review, Frontiers in oncology, Vol. 2, 118 (2012).
2. Sokolova, I.V., et al.: Proton radiation alters intrinsic and synaptic properties of CA1 pyramidal neurons of the mouse hippocampus, Radiation Research, Vol. 183, 208-218 (2015).

Transcriptome data mining results support observed changes in host lipid metabolism during experimental toxoplasmosis

Miloš Busarčević1,2, Aleksandar Trbovich1, Ivan Milovanović1, Aleksandra Uzelac1, and Olgica Djurković-Djaković1

1 Center of Excellence for Food- and Vector-borne Zoonoses, Institute for Medical Research, University of Belgrade, Dr. Subotića 4, 11129 Belgrade, Serbia 2 United World College of the Adriatic, via Trieste 29, 34011 Duino, Italy [email protected]

Abstract

Toxoplasma gondii is considered one of the most successful parasites on Earth due to its omnipresence and widest array of hosts, including all mammals. The genus comprises a single species infective for all hosts, with limited genetic diversity in Europe and North America where all isolates belong to three clonal genotypes (type I, II and III). However, a wider genetic diversity characterized by non-clonal, atypical strains is found in South America and Africa, and is thought to be related to the presence of diverse Felidae as the only definitive host in which sexual reproduction, and consequently, genetic recombination, occur. In intermediate hosts, T. gondii occurs in two forms, the metabolically active, rapidly proliferating tachyzoite, which characterizes acute infection, and the (so-called) metabolically inert encysted bradyzoite, characteristic of chronic infection; the parasite readily converts between the two in response to the hospitality or hostility of the host environment (mostly depending on the immune response) but is never eliminated from the infected host.

Human infection is widespread but not clinically significant (mild and self-limiting) except in populations with an incompetent immune system such as the unborn baby and immunosuppressed individuals, such as those infected by HIV or organ and tissue transplant recipients, in which it may cause life-threatening disease. Treatment options have not advanced much for decades and there is still no drug able to eliminate encysted parasites, thus there is an urgent need for new drugs.

Interestingly, T. gondii is not capable of synthesizing cholesterol (Chl), and thus depends on uptake of host Chl for its own development (1). We thus aimed to investigate Chl metabolism during T. gondii infection in the hope of finding prospective new drug targets.

The aim of this study was to examine the effects of T. gondii on Chl metabolism in murine models of acute and chronic toxoplasmosis at the biological and molecular level. For this purpose, we have mined seven published microarray datasets of murine brain homogenates and lymphocytes (from peripheral blood or peritoneum) during acute infection with T. gondii type I, II and III strains (2, 3), for the expression levels of genes relevant for Chl metabolism, including its biosynthetic pathway and export (KEGG pathways). Experimental validation of these findings was performed by assessing the serum lipid status during acute and chronic murine toxoplasmosis, as well as by analyzing the transcript levels of relevant genes in brain and liver homogenates.

In the brain of mice infected with type II parasites, the data (day 8 post infection, p.i.) revealed down-regulation of most of the 13 genes from the Chl biosynthetic pathway starting from farnesyl-PP, and of two transcriptional activators of this pathway, Srebf1 and Srebf2. On the other hand, Soat1 and Soat2, as well as Cyp7b1 and Abcg5, were up-regulated. While up-regulation of Soat1 and Soat2 reflects an increase in Chl esterification, up-regulation of Cyp7b1, which catalyzes the first reaction in the Chl catabolic pathway of extrahepatic tissues towards bile acids, and of the bile acid export protein Abcg5, reflects removal of Chl from a metabolically active pool. The expression profile of these genes in murine brain during infection with type I parasites (d5 p.i.) was similar but less pronounced than in type II infection. Peripheral lymphocytes seemed to show a similar expression profile, since down-regulation of several genes from the Chl biosynthesis pathway and up-regulation of genes involved in its esterification were observed during infection with both type I and type II parasites. Furthermore, in infection with type II parasites, up-regulation of Abcg4, a Chl export protein, and of liver X receptor alpha (Nr1h3) was revealed. Finally, in peritoneal cells of mice infected with T. gondii type I, II and III parasites, mining for differentially expressed genes involved in Chl biosynthesis and transport revealed an over two-fold increase in several genes, including Hmgcr, Fdft1, Sqle and Ldlr, while the expression of ApoE was reduced more than six-fold. These results may be interpreted as decreased expression of genes involved in Chl biosynthesis and increased expression of genes involved in Chl esterification and transport outside of the brain. All the described changes may ultimately result in a drop in cell Chl content.
Experimental validation of the results of data mining showed that in a murine model of toxoplasmosis induced by type II parasites, acute infection (d14 p.i.) was associated with a decrease in the transcription of genes involved in Chl biosynthesis in both the brain and the liver. In contrast, in chronic infection (d42 p.i.), an increase in Chl metabolism was observed in both tissues. This was associated with changes at the biological level as well: we observed a decrease in total serum Chl and HDL levels in acute infection, while both were unchanged in chronic infection.

In summary, the decrease in Chl content in both the brain and periphery (liver, peritoneal lymphocytes), and the decrease in Chl reverse transport we observed in acute T. gondii infection, correspond to the gene expression data obtained via data mining. We propose that the observed changes in Chl metabolism are part of the host defense response. In acute infection, the host responds by attempting to deprive the parasite of Chl, which is necessary for tachyzoite proliferation and development. In contrast, in chronic infection, when the parasite has converted into its metabolically less active cyst form, these metabolic changes are not prominent because of the established balance between the parasite and its host. It thus seems that Chl metabolism during T. gondii infection may be a novel target for therapeutic agents and should be further investigated.

Keywords: cholesterol, data mining, murine infection, Toxoplasma gondii, transcriptome

References

1. Coppens, I., Sinai, AP., Joiner, KA: Toxoplasma gondii Exploits Host Low-Density Lipoprotein Receptor-Mediated Endocytosis for Cholesterol Acquisition. J. Cell. Biol. 149,1:167-180 (2000)
2. Hill, RD., Gouffon, JS., Saxton, AM., Su, C: Differential Gene Expression in Mice Infected with Distinct Toxoplasma Strains. Infect. Immun. 80,3:968-974 (2012)
3. Jia, B., Lu, H., Liu, Q., Yin, J., Jiang, N., Chen, Q: Genome-wide comparative analysis revealed significant transcriptome changes in mice after Toxoplasma gondii infection. Parasit. Vectors. 4,6:161 (2013)

Genome-scale Modelling, Metabolomics and Cheminformatics analysis guiding the Discovery of Antifungal Metabolites for Crop Protection

Miroslava Cuperlovic-Culf

National Research Council of Canada, Department for Information Communication Technologies, Ottawa, Canada [email protected]

Abstract

Fusarium head blight (FHB), also known as scab or tombstone, is a devastating disease of wheat, barley, oats and other small-grain cereals, as well as corn, caused primarily by Fusarium graminearum. Several cultivars of wheat have developed some level of resistance to FHB. Resistance to this fungal pathogen includes specific metabolic responses to inoculation. A number of published metabolomics studies have determined the major metabolic changes induced by the pathogen in resistant and susceptible plants. The function of the majority of these metabolites in resistance remains, however, unknown. In this work we have compiled all metabolites determined to selectively accumulate following FHB inoculation in resistant plants. The characteristics, as well as possible functions and targets, of these plant metabolites are investigated using cheminformatics approaches. A particular focus has been on the likelihood of these metabolites targeting specific proteins and acting as drug-like molecules. Interesting targets in Fusarium graminearum have been determined using COBRA analysis of a genome-scale model of growth and toxin production in Fusarium graminearum. Results of computational analyses of the binding properties of several representative metabolites to homology models of these target proteins are presented. The activity of several of these compounds has been experimentally confirmed in fungal growth inhibition assays.
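
As an illustration of what a drug-likeness screen can look like, the sketch below applies Lipinski's rule of five to precomputed molecular descriptors. The abstract does not state which drug-likeness criteria were used, so the choice of this rule, the `Descriptors` container and the example values are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical input format)."""
    mol_weight: float       # molecular weight in Da
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """A compound is commonly accepted if it breaks at most one limit."""
    return lipinski_violations(d) <= max_violations


# Made-up descriptor values, not taken from the study.
small = Descriptors(mol_weight=180.2, log_p=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Descriptors(mol_weight=720.0, log_p=6.1, h_bond_donors=7, h_bond_acceptors=12)
print(is_drug_like(small), is_drug_like(large))
```

In practice the descriptors would come from a cheminformatics toolkit rather than being typed in by hand.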

Analysis of network structural characteristics through vertex characteristics in directed networks

Tamara Dimitrova

Macedonian Academy of Sciences and Arts, Research Center for Computer Science and Information Technologies, Skopje, Macedonia [email protected]

Abstract

We propose a unified framework for introducing novel similarity characteristics of networks and for describing some well-known ones. These characteristics are computed for effective brain networks, and conclusions about the brain networks are derived from the correlations found among them.

A new random-walk-based approach for finding co-expression modules in biological networks

Natasa Djurdjevac Conrad

Zuse Institute Berlin, Germany [email protected]

Abstract

Co-expression modules are sets of biological entities (such as genes or proteins) which interact with each other and have highly correlated expression patterns. Changes of activity in these modules can be used as robust biomarkers for early diagnosis or sub-type classification of major diseases.

We consider the problem of finding co-expression modules in undirected networks, i.e. how to identify connected groups of nodes with similar and relatively high (with respect to the rest of the network) node weights. Compared to classical module identification, this task is more complex in the sense that it combines network topology with the imposed node information.

In this talk, we will present our novel method for analyzing such networks based on a new type of time-continuous random walk (RW) process [1], with transition rules that take into account both node weights and a node's neighborhood. We will show that for such a process, co-expression modules correspond to metastable sets and that, in contrast to standard spectral clustering approaches, this leads to much more prominent gaps in the spectrum of the adapted process. This enables better identification of metastable sets. We will discuss dynamical properties of the new RW process and show how they contribute to co-expression module identification, improving upon previous methods [2]. Finally, we will present recent biological results achieved with our method in the context of cancer analysis based on NGS data on the STRING PPI network.

Keywords: co-expression modules, network analysis, time-continuous random walk, metastable sets
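
The idea of letting node weights steer a random walk can be sketched in a simplified, discrete-time form. The transition rule below (jump to a neighbour with probability proportional to that neighbour's weight) is an assumption in the spirit of data-biased walks [2]; it is not the time-continuous process of the talk, and the toy graph and weights are made up.

```python
def biased_transition_matrix(adjacency, weights):
    """Row-stochastic matrix: P[i][j] is proportional to weights[j] for neighbours j of i."""
    n = len(adjacency)
    matrix = []
    for i in range(n):
        total = sum(weights[j] for j in range(n) if adjacency[i][j])
        if total == 0:
            # Isolated node: the walker stays where it is.
            row = [1.0 if j == i else 0.0 for j in range(n)]
        else:
            row = [weights[j] / total if adjacency[i][j] else 0.0 for j in range(n)]
        matrix.append(row)
    return matrix


def stationary_distribution(matrix, iterations=2000):
    """Power iteration on the lazy walk (half step, half stay) to avoid oscillation."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        stepped = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * a + 0.5 * b for a, b in zip(pi, stepped)]
    return pi


# Toy network: a high-weight triangle (nodes 0-2) with a low-weight tail (nodes 3-4).
adjacency = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
weights = [5.0, 5.0, 5.0, 1.0, 1.0]
P = biased_transition_matrix(adjacency, weights)
pi = stationary_distribution(P)
print([round(p, 3) for p in pi])
```

In this toy case the walker spends most of its time inside the high-weight triangle, which is the kind of behaviour that makes such a group of nodes stand out as a candidate module.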

References

1. Sarich, M. and Conrad Djurdjevac, N. et al.: Modularity revisited: A novel dynamics-based concept for decomposing complex networks. Journal of Computational Dynamics, 1(1):191-212, (2014).
2. Komurov, K. et al.: Use of data-biased random walks on graphs for the retrieval of context-specific networks from genomic data. PLoS computational biology, 6(8):e1000889, (2010).

Improving 1NN strategy for classification of some prokaryotic organisms

Milana Grbić1, Aleksandar Kartelj2, Dragan Matić1 and Vladimir Filipović2

1 Faculty of Science and Mathematics, University of Banja Luka, Mladena Stojanovića 2, 78000 Banja Luka, Republic of Srpska, Bosnia and Herzegovina [email protected], [email protected]
2 Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia [email protected], [email protected]

Abstract

Classification algorithms are intensively used in discovering new information in large sets of biological data. During the classification process, the classifier uses a set of training instances with known classes in order to learn how to predict the class of an instance with an unknown class. For classifying biological data, a number of commonly used classification tools exist. However, in classification tasks which involve nominal attributes, these tools often do not obtain results of satisfactory quality, since mathematical operations and relations cannot be directly applied to symbolic values. As a consequence, the classifiers ignore nominal attributes and form the classification model based solely on numerical attributes, which leads to inaccurate and unreliable results.

This problem often appears in K-nearest neighbor (KNN) classification. KNN is based on a distance function that measures the difference or similarity between two instances. In KNN, it is assumed that the class of a test instance is equal to the most frequent class of the nearby instances with respect to the distance function, e.g. the Euclidean distance function. When the problem includes many nominal attributes, the standard Euclidean distance can become burdened by the large number of irrelevant attributes, consequently producing inaccurate classification results. In these cases, if a KNN classifier is to be applied, a new distance function between attributes needs to be defined.

The dataset of prokaryotic organisms analysed in this paper contains a total of 30 attributes, of which 11 are nominal. Earlier experiments indicated that commonly used classification tools which use the NN strategy mostly ignore nominal attributes and form the classification based only on numerical ones. The classification therefore becomes inaccurate.

In this paper we examine several metrics which can be applied to the nominal attributes of the analysed dataset, and for each metric we apply the appropriate 1NN strategy. Additionally, we perform attribute selection by formulating it as an optimization problem and solving it with the Electromagnetism (EM) metaheuristic algorithm. The proposed EM uses 1NN as an underlying classifier and implements precisely adjusted operators for the optimization process.
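
To make the role of a distance on nominal attributes concrete, here is a minimal sketch of 1NN with a mixed distance: the overlap metric (0 if equal, 1 otherwise) on nominal attributes and a range-normalized difference on numerical ones. The overlap metric is only one possible choice and is an assumption here, since the abstract does not list the metrics examined; the attribute names, values and class labels are hypothetical.

```python
def mixed_distance(a, b, nominal_idx, ranges):
    """Euclidean-style distance mixing overlap (nominal) and scaled differences (numeric)."""
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in nominal_idx:
            total += 0.0 if x == y else 1.0      # overlap metric
        else:
            span = ranges[i]
            diff = abs(x - y) / span if span else 0.0
            total += diff * diff
    return total ** 0.5


def predict_1nn(train, labels, query, nominal_idx):
    """Return the label of the training instance closest to the query."""
    ranges = {}
    for i in range(len(query)):
        if i not in nominal_idx:
            column = [row[i] for row in train]
            ranges[i] = max(column) - min(column)
    nearest = min(
        range(len(train)),
        key=lambda k: mixed_distance(train[k], query, nominal_idx, ranges),
    )
    return labels[nearest]


# Hypothetical instances: (genome size, GC content, Gram stain, oxygen requirement)
train = [
    (4.6, 50.8, "negative", "facultative"),
    (4.2, 43.5, "positive", "aerobic"),
    (1.8, 38.0, "positive", "anaerobic"),
]
labels = ["A", "B", "B"]
nominal_idx = {2, 3}
query = (4.5, 49.0, "positive", "aerobic")
print(predict_1nn(train, labels, query, nominal_idx))
```

The query is numerically closest to the first instance, but once the two nominal attributes are counted, the second instance becomes the nearest neighbour.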

In order to justify the proposed approach, comprehensive experiments are performed on the dataset of prokaryotic organisms. The obtained results are compared with the results of other classification methods from the literature.

Keywords: bioinformatics; classification; nearest neighbor; data mining

Identifying relevant positions in proteins by Critical Variable Selection

Silvia Grigolon

Lincoln’s Inn Fields Laboratory, The Francis Crick Institute, London, United Kingdom [email protected]

Abstract

Evolution in its course found a variety of solutions to the same optimisation problem. The advent of high-throughput genomic sequencing has made available extensive data from which, in principle, one can infer the underlying structure on which biological functions rely.

In this talk, I will present a new method aimed at extracting sites encoding structural and functional properties from a set of protein primary sequences, namely a Multiple Sequence Alignment [1]. The method, called Critical Variable Selection, is based on the idea that subsets of relevant sites correspond to subsequences that occur with a particularly broad frequency distribution in the dataset. By applying this algorithm to in silico sequences, to the Response Regulator Receiver and to the Voltage Sensor Domain of Ion Channels, I will show that this procedure recovers not only information encoded in single-site statistics and pairwise correlations, but also captures dependencies going beyond pairwise correlations. The method proposed here is complementary to Statistical Coupling Analysis [2], in that the most relevant sites predicted by the two methods markedly differ. We find robust and consistent results for datasets as small as a few hundred sequences, which reveal a hidden hierarchy of sites that is consistent with present knowledge of biologically relevant sites and evolutionary dynamics. This suggests that Critical Variable Selection is able to identify in a Multiple Sequence Alignment a core of sites encoding functional and structural information.
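
One way to put a number on how broad the frequency distribution of subsequences is can be sketched as follows. The score used here, the entropy of the distribution of observed frequencies, is an assumption made for illustration; the abstract does not give the exact criterion, and the toy alignment is made up.

```python
from collections import Counter
from math import log


def relevance(alignment, sites):
    """Entropy of the frequency distribution of subsequences restricted to `sites`.

    With m_k the number of distinct subsequences seen exactly k times among M
    sequences, the score is -sum_k (k*m_k/M) * log(k*m_k/M).
    """
    subsequences = ["".join(seq[i] for i in sites) for seq in alignment]
    occurrences = Counter(subsequences)        # how often each subsequence occurs
    m = Counter(occurrences.values())          # how many subsequences occur k times
    total = len(alignment)
    score = 0.0
    for k, m_k in m.items():
        p = k * m_k / total
        score -= p * log(p)
    return score


# Toy alignment of four sequences with three sites each.
alignment = ["AAC", "AAG", "ABC", "BBG"]
print(relevance(alignment, (0, 1)), relevance(alignment, (2,)))
```

For sites (0, 1) the subsequences occur with two different frequencies (one twice, two once), giving a positive score; for site 2 both letters occur equally often, so the frequency distribution is as narrow as possible and the score is zero.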

References

1. S. Grigolon, S. Franz, M. Marsili, Mol. BioSyst., 2016, DOI: 10.1039/C6MB00047A.
2. N. Halabi, O. Rivoire, S. Leibler, R. Ranganathan, Cell, 138(4):774-86, 2009.

Transcription initiation by alternative sigma factors

Jelena Guzina and Marko Djordjevic

Faculty of Biology, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia {jelenag,dmarko}@bio.bg.ac.rs

Abstract

The ECF (ExtraCytoplasmic Function) subfamily is the largest and most diverse group of alternative σ factors within the σ70 protein family [1]. Although physiologically highly important, the mechanisms of transcription initiation for ECF σ factors are poorly examined. Namely, the current paradigm of ECF σ functioning, which assumes promoter rigidness/absence of mix-and-matching, is based on very limited data, centered around a subset of (canonical) σ factors with experimentally established promoter recognition specificity.

To gain a more comprehensive insight into ECF σ functioning, besides the canonical σ factors we also investigate much less studied ECF σ subgroups and the group outliers found in recently sequenced bacteriophages [2]. More precisely, by employing extensive computational comparison of diverse ECF σs and their corresponding promoters, we aim to infer DNA and protein recognition motifs involved in transcription initiation.

The analysis identifies the -10 element extension in phage ECF σ promoters, where a comparison with bacterial σ factors points to a putative 6-aa motif just C-terminal of domain σ2, responsible for this interaction. Interestingly, a similar protein motif is found C-terminal of domain σ2 in canonical ECF σ factors, at a position suitable for interaction with a conserved DNA motif further upstream of the -10 element. Moreover, phiEco32 ECF σ lacks the recognizable -35 element and σ4 domain identified in the homologous phage 7-11, indicating that -35 element interactions can be compensated by the extended -10 element [3].

Overall, our results reveal a larger flexibility in ECF σ promoter recognition than previously recognized. The putative non-canonical σ-promoter interactions, along with promoter element complementation, imply the possibility that the mix-and-matching mechanism, a hallmark of the σ70 housekeeping factors, may also apply to the ECF group.

Keywords: ECF sigma, bacterial promoters, transcription initiation, σ70 family
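
A common first step when looking for conserved DNA recognition motifs is to measure the per-position information content of aligned promoter elements. The sketch below does this for a handful of made-up hexamers; it is a generic illustration, not the comparison pipeline used in this work, and it applies no small-sample correction.

```python
from collections import Counter
from math import log2


def information_content(sites, alphabet="ACGT"):
    """Information content (bits) at each position of equal-length aligned sites."""
    length = len(sites[0])
    result = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sites)
        n = sum(counts.values())
        entropy = -sum((c / n) * log2(c / n) for c in counts.values())
        # Maximum is log2(4) = 2 bits for a fully conserved DNA position.
        result.append(log2(len(alphabet)) - entropy)
    return result


# Hypothetical aligned -10-like elements, for illustration only.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
ic = information_content(sites)
print([round(v, 2) for v in ic])
```

Positions where all sequences agree score the full 2 bits, while positions with variation score lower; runs of high-scoring positions flanking a known element are candidates for an extension of that element.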

References

1. Staroń A, Sofia HJ, Dietrich S, Ulrich LE, Liesegang H, Mascher T. 2009. Molecular microbiology 74:557-581.
2. Guzina J, Djordjevic M., BMC Evolutionary Biology 15: S(1) (2015).
3. Guzina J, Djordjevic M., submitted for publication (2016).

The influence of amino acids physicochemical properties and frequencies on identifying MHC binding ligands

Davorka R. Jandrlić1, Nenad S. Mitić2, and Mirjana D. Pavlović3

1 Faculty of Mechanical Engineering, Department for Mathematics, University of Belgrade, Serbia [email protected]
2 Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia [email protected]
3 Institute of General and Physical Chemistry, Studentski trg 12/V, 11000 Belgrade, Serbia [email protected]

Abstract

Binding of proteolyzed fragments of proteins to MHC molecules is an essential and the most selective step that determines T cell epitopes. Therefore, prediction of MHC-peptide binding is central to anticipating potential T cell epitopes and is of immense relevance in vaccine design. The large quantity of protein fragments experimentally tested as potential epitopes, together with MHC allele polymorphism, has prompted the development of many computational methods for epitope identification, thus reducing laboratory work and costs. Although some available methods have reasonable accuracy, there is no guarantee that all models produce good quality predictions [1].

Here, new models for quantitatively and qualitatively predicting MHC-binding ligands, which use different amino acid properties, are presented. The models were built in two steps. In the first step, a new approach is presented for identifying the physicochemical properties most relevant for classifying peptides into MHC-binding and non-binding ligands. For that purpose, classification models that take into account the physicochemical properties of amino acids and their frequencies are developed. The developed classification models are rule based and use the k-means clustering technique for extracting the most important physicochemical properties. The obtained results indicate that the physicochemical properties of amino acids contribute significantly to the peptide-binding affinity and that different alleles are characterized by different sets of physicochemical properties. In the second step, results from these models are used as input features for two machine learning models based on the support vector machine technique, one for the classification and one for the regression problem. The resulting models have shown performance comparable to, or in some cases better than, two of the currently best available predictors: NetMHCpan and SMM PMBEC [2].

The new models could be used as a complement to the best existing methods.
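
To illustrate how amino acid frequencies and a physicochemical property can be turned into numerical features for a peptide, the sketch below builds a 21-dimensional vector: 20 residue frequencies plus mean hydropathy on the Kyte-Doolittle scale. The choice of this particular scale, the feature layout and the example peptide are assumptions; the encoding schemes actually used in the presented models are not described in the abstract.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KYTE_DOOLITTLE)


def encode_peptide(peptide):
    """Return 20 residue frequencies followed by the mean hydropathy."""
    n = len(peptide)
    frequencies = [peptide.count(aa) / n for aa in AMINO_ACIDS]
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in peptide) / n
    return frequencies + [mean_hydropathy]


# Arbitrary 9-residue peptide, for illustration only.
features = encode_peptide("ALAKAAAAV")
print(len(features), round(features[-1], 3))
```

Vectors of this kind could then be passed to a clustering step or to a support vector machine; that part is omitted here.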

Keywords: MHC binding prediction, encoding schemas, k-means clustering, SVM classification and regression

References

1. Brusić V., Bajić V.B., Petrovsky N., Computational methods for prediction of T-cell epitopes: a framework for modelling, testing, and applications, Methods, 34(4), 436-443, (2004)
2. Nielsen M., Zhang H., Lundegaard C., Pan-specific MHC class I predictors: a benchmark of HLA class I pan-specific prediction methods, Bioinformatics, (2009)

Networks of interaction in moving animal groups and collective changes of direction

Asja Jelić

The Abdus Salam International Centre for Theoretical Physics (ICTP), Department for Quantitative Life Sciences, Trieste, Italy [email protected]

Abstract

Animal groups on the move are a paradigmatic example of collective behaviour in social species. The most striking features of this collective motion are sudden coherent changes in the travel direction of the whole group.

Such coordinated collective behaviour requires fast and robust transfer of information among individuals in order to prevent loss of cohesion. However, little is known about the mechanism by which natural groups achieve this robustness. Furthermore, collective directional switching often emerges not as a response to an external alarm cue, but spontaneously from the intrinsic fluctuations in individual behaviour. In particular, the role of the underlying structure of the communication network in these events is not yet clear.

In this talk, I will present an experimental and theoretical study of spontaneous collective turns in natural flocks of starlings [1, 2]. We automatically track the 3D positions and velocities of all individuals in flocks of up to 600 birds for the whole duration of a turning event [3]. This enables us to analyse the changes in the individual behaviour of every group member and reveal the emergent dynamics of turning. We show that spontaneous turns start from the individuals located at the elongated tips of the flocks, and then propagate across the whole group. We find that birds on the tips deviate from the mean direction much more persistently than other individuals, indicating that persistent localized fluctuations are a trigger for collective directional switching. Moreover, our analysis reveals two crucial ingredients which enhance the effect of such noise, leading to collective changes of state: the non-symmetric nature of interaction between individuals and the presence of heterogeneities in the topology of the network.

References

1. Attanasi, A. et al.: Information transfer and behavioural inertia in starling flocks. Nature Physics 10, 691-696 (2014)
2. Attanasi A. et al.: Emergence of collective changes in travel direction of starling flocks from individual birds' fluctuations. J. Royal Soc. Interface 12 (108), 20150319
3. Attanasi, A. et al.: GReTA - a novel Global and Recursive Tracking Algorithm in three dimensions. IEEE Trans. Pattern Anal. Mach. Intell., vol.37 (2015)

Filtering of repeat sequences in genomes

Ana Jelović1, Miloš Beljanski2, and Nenad Mitić3

1 Faculty of Transport and Traffic Engineering, University of Belgrade, 305 Vojvode Stepe, 11000 Belgrade, Serbia [email protected]
2 Institute of General and Physical Chemistry, Studentski trg 12, 11000 Belgrade, Serbia [email protected]
3 Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia [email protected]

Abstract

Finding repeat sequences in nucleic acids and proteins is of great importance in biology. A number of tools are able to efficiently extract these sequences. If we search for repeated sequences in a completely random computer-generated sequence of any meaningful length, we will still find a large number of matches. Extracting all repeated sequences from a genome will therefore yield a mixture of sequences that are important for its function and organization and randomly occurring sequences that are effectively noise.

We developed a method for efficiently estimating the probability that a group of found repeated sequences occurs at random, and an accompanying program that finds the repeated sequences and then filters them based on a given probability threshold. What makes our method different from existing ones is that we group the results not only by repeat length but also by number of occurrences. Even short repeated sequences that occur many times may be statistically significant, while longer repeated sequences occurring just a few times may not be. For the large number of repeated sequences that can be found in a genome when the minimal sequence length is relatively low, our method provides a significant gain in performance and quality of results compared to outputting all the found sequences. The method can be applied to both nucleic acid and protein sequences.

We have found that, as expected, longer repeated sequences mostly have a higher probability of being statistically significant, but also, counterintuitively, that for some viruses, for example, shorter repeated sequences are more important than the longer ones.
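
Why both repeat length and number of occurrences matter can be seen from a rough back-of-the-envelope model. The sketch below uses a Poisson approximation for the number of times one specific word of a given length appears in a uniformly random sequence. This is not the estimation method of the abstract, which is not described in detail; it ignores overlapping matches, uneven base composition and the fact that many different words are tested at once, and the sequence length is made up.

```python
from math import exp, factorial


def expected_occurrences(seq_len, repeat_len, alphabet_size=4):
    """Expected count of one specific word in a uniform random sequence."""
    return (seq_len - repeat_len + 1) / alphabet_size ** repeat_len


def prob_at_least(count, lam):
    """Poisson probability of observing at least `count` occurrences."""
    below = sum(exp(-lam) * lam ** i / factorial(i) for i in range(count))
    return 1.0 - below


seq_len = 10_000                      # hypothetical sequence length
lam_short = expected_occurrences(seq_len, 6)
lam_long = expected_occurrences(seq_len, 12)

# Two occurrences of a given 6-mer are unremarkable by chance ...
print(prob_at_least(2, lam_short))
# ... but a 6-mer seen a dozen times, or a 12-mer seen twice, is not.
print(prob_at_least(12, lam_short), prob_at_least(2, lam_long))
```

A filter along these lines would keep a group of repeats only if its chance probability falls below a chosen threshold, which is why a short repeat with many occurrences can pass while a longer repeat with few occurrences may be rejected.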

Could integrative bioinformatic approach predict the circulating miRs that have significant role in pancreatic tissue in type 2 diabetes?

Ivan Jovanović1, Maja Živković1, Jasmina Jovanović2, Tamara Djurić1, and Aleksandra Stanković1

1 VINČA Institute of Nuclear Sciences, University of Belgrade, Laboratory for Radiobiology and Molecular Genetics, Mike Petrovica Alasa 12-14, 11001 Belgrade, Serbia {ivanj,majaz,tamariska,alexas}@vin.bg.ac.rs
2 Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia [email protected]

Abstract. The action of microRNAs (miRs) as post-transcriptional regulators of gene expression is being recognized as one of the critical processes that affect type 2 diabetes (T2D) progression. Cell-cell signaling via paracrine or even endocrine routes is mediated by miRs released from human tissue. Therefore, the aim of our study was to bioinformatically predict, from microarray gene expression analysis of whole blood, the miRs that play a role in pancreas β cell functioning in human T2D. We have demonstrated that gene expression signatures identified in whole blood correspond to the miR expression changes specific for the pancreas tissue during insulin resistance. Further experimental studies should follow in order to characterize the described effects as early prognostic biomarkers of insulin resistance and T2D.

Keywords: type 2 diabetes, microRNA, microarray gene expression, bioinformatic integrative approach

1. Introduction

Type 2 diabetes (T2D) is a complex disease generally characterized by insulin resistance and increased hepatic glucose production. The rapidly increasing prevalence of T2D is motivating the intensive search for biomarkers of the disease as well as novel therapeutic targets. The action of microRNAs (miRs) as post-transcriptional regulators of gene expression is being recognized as one of the critical processes that affect T2D progression. Therefore, these small, non-coding RNAs, which regulate gene expression predominantly by promoting the degradation of mRNA, exhibit great biomarker and therapeutic potential. Also, it has been described that all human cells can release miRs, which mediate cell-cell signaling via paracrine or even endocrine routes. Recently, microarray whole genome expression data and miR target predictions from multiple prediction algorithms were linked using a multivariate statistical technique called Co-Inertia analysis (CIA) in order to predict miR activity and to associate specific miRs with different diseases. These studies have shown that the CIA method provides good quality predictions of miR activity. It was suggested that CIA is complementary to other previously described prediction approaches and thus could predict miRs not identified by others. So far, this integrative approach has not been used for the analysis of circulating miRs that may originate from pancreatic tissue in T2D. Therefore, the aim of our study was to bioinformatically predict, from microarray gene expression analysis of whole blood, the miRs that play a role in pancreas β cell functioning in human T2D.

2. Materials and Methods

2.1. Gene expression data

The gene expression data set used in our study was downloaded from www.ncbi.nlm.nih.gov/geo/ (Gene Expression Omnibus database), accession number: GSE26168. The total mRNA expression of whole blood from T2D patients and healthy controls was profiled on the Illumina HumanRef-8 v3.0 expression beadchip. The data were initially background subtracted, and quantile normalization was performed prior to the analysis.
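
Quantile normalization, the preprocessing step mentioned above, can be sketched in a few lines. This is a generic textbook version written for illustration, not the exact routine applied to GSE26168; ties are broken by input order rather than averaged, and the numbers are made up.

```python
def quantile_normalize(columns):
    """Give every sample (column) the same distribution of values.

    columns: list of equal-length lists, one list of gene values per sample.
    """
    n_genes = len(columns[0])
    sorted_columns = [sorted(col) for col in columns]
    # Mean of the r-th smallest value across all samples.
    rank_means = [
        sum(col[r] for col in sorted_columns) / len(columns)
        for r in range(n_genes)
    ]
    normalized = []
    for col in columns:
        order = sorted(range(n_genes), key=lambda g: col[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two hypothetical samples with three genes each.
samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(samples)
print(normalized)
```

After normalization each sample contains exactly the same set of values, and only the assignment of values to genes differs between samples.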

2.2. Co-Inertia analysis

CIA was used to link microarray gene expression data (8 T2D patients and 8 controls) and miR target predictions from multiple prediction algorithms in order to associate specific miR activity with T2D. This multivariate statistical technique simultaneously analyzes two connected data tables. The tables are treated as two sets of measurements on the same objects, the genes. One of the tables is the mRNA gene expression table of g genes from n samples, and the other contains the predicted target counts of all miRs for the same g genes. Non-symmetric correspondence analysis (NSC) was used as the ordination method of CIA; it summarizes each data table in a low-dimensional space by projecting the samples onto axes which maximize the variances of the coordinates of the projected points. CIA performs two simultaneous NSCs on the two linked tables and identifies pairs of axes from the two datasets which are maximally covariant. This unsupervised method was used for visual inspection of the data. By further use of Between Group Analysis (BGA), which forces an ordination to be carried out on groups of samples rather than individual samples, CIA was directed to find the maximum covariance between the gene expression differences between groups of samples and the miR-gene target frequency tables. For the specified split in the data that contrasts T2D and control samples, we obtained a ranked list of miR motifs. CIA generates as many miR rank lists as there are target prediction algorithms used. The most extreme values of the ranking lists (top 20 and last 20) were used for the prediction of upregulated and downregulated miRs in T2D. Lists were combined using consistency among the methods, according to a previous study. The complete analysis was performed with the MADE4 R package.
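
The final step, combining the extremes of several ranked lists by consistency, can be sketched as follows. The agreement rule used here (a miR must appear among the extremes of at least `min_agree` lists) is an assumption, since the text only refers to a previous study for the exact procedure; the function name, algorithm names and miR labels are placeholders.

```python
from collections import Counter


def consensus_extremes(rankings, n_extreme=20, min_agree=2):
    """Find miRs that fall in the top or bottom of several ranked lists.

    rankings: dict mapping an algorithm name to its ranked list of miR names.
    Returns (top_hits, bottom_hits), each sorted alphabetically.
    """
    top = Counter()
    bottom = Counter()
    for ranked in rankings.values():
        top.update(ranked[:n_extreme])
        bottom.update(ranked[-n_extreme:])
    top_hits = sorted(m for m, c in top.items() if c >= min_agree)
    bottom_hits = sorted(m for m, c in bottom.items() if c >= min_agree)
    return top_hits, bottom_hits


# Placeholder rankings from three hypothetical algorithms.
rankings = {
    "algoA": ["miR-a", "miR-b", "miR-c"],
    "algoB": ["miR-a", "miR-c", "miR-b"],
    "algoC": ["miR-b", "miR-a", "miR-c"],
}
top_hits, bottom_hits = consensus_extremes(rankings, n_extreme=1, min_agree=2)
print(top_hits, bottom_hits)
```

Which end of a list corresponds to upregulation and which to downregulation depends on how the group contrast is defined, so that mapping is left to the caller.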

2.3. miR target prediction

Five sequence-based miR target prediction algorithms were used for CIA: TargetScan and TargetScanS, PicTar4way and PicTar5way, and miRanda, according to Madden et al. Each of these sequence-based prediction algorithms utilizes the complementarity with the miR seed and the cross-species conservation in its predictions. The miR target prediction data for CIA input, extracted from these databases, were organized in gene/miR frequency tables of counts of predicted targets per gene for each of the algorithms. The gene/miR frequency tables for sequence-based predictions originated from the TargetScan website http://www.targetscan.org/ (version 4.1), the UCSC genome browser track for pictar4way and pictar5way http://genome.ucsc.edu/, and from miRBase for miRanda (http://microrna.sanger.ac.uk/sequences/).

3. Results

CIA was first used in an unsupervised manner for the purpose of data exploration. Figure 1 shows an example of unsupervised CIA using the Pictar5Way target prediction program. The plot is in two parts and depicts a correspondence analysis of T2D patients and control samples, and the miRs associated with the gene expression pattern characteristic of the two groups of samples. The observed split in the data shows a clear difference between the gene expression profiles of the analyzed groups (Figure 1). CIA performed in conjunction with correspondence analysis and between group analysis produced five ranked lists of miRs associated with the specific gene expression profile in T2D. Using consistency among methods, we characterized potentially upregulated and downregulated circulating miRs responsible for the whole blood gene expression template in T2D (data not shown).

The clear clustering of T2D samples and controls shown in Figure 1 indicates that whole blood genome expression (from the microarray experiment) is homogeneous within T2D patients and differs from that of control samples. This makes the data suitable for further, supervised CIA. Our preliminary results indicate successful prediction of miRs from blood and the applicability of our approach to selecting T2D-associated miRs as potential new molecular biomarkers for this disease. By inspecting the results, along with literature mining, we found that two of the highly ranked miRs (Table 1) are important factors in pancreas β cell proliferation in response to hyperglycemia and insulin resistance, which is the hallmark of T2D.

Table 1. Ranking of the selected miRs according to CIA performed with the 5 prediction algorithms, representing the top and last 20 miRs. P4W: Pictar4Way; P5W: Pictar5Way; TS: TargetScan; TSS: TargetScanS

miR      P4W  P5W  TS  TSS  Miranda  Predicted regulation in T2D
miR-375  6- 112-UP
miR-184  81 177- DOWN

BelBI2016, Belgrade, June 2016. 65 Ivan Jovanovi´c et al.

Fig. 1. Axes of the unsupervised CIA performed on the whole genome gene expression data of T2D patients and Controls. The gene/miR frequency table generated with Pictar5Way was used to make this figure.

4. Discussion and conclusion

Blood miR expression patterns have been reported for various human diseases, with disease-specific signatures. In one of the first studies, it was shown by sequencing that patients with T2D have a significantly altered expression profile of serum miRs. This approach was also favored in the detection of miRs in blood and other body fluids of T2D patients. Using a bioinformatic approach that combines microarray gene expression and miR target prediction from multiple prediction algorithms, we have associated specific circulating miRs with T2D. In this study, we focused on two of the most noteworthy miRs, functionally associated in a network within the miR pathway that coordinately regulates the compensatory proliferation of pancreatic β cells in T2D. miR-184 is unique in pancreatic islets as the most downregulated miR during insulin resistance. It has been described that miR-184 acts as an inhibitor of Ago2. Increased expression of Ago2 facilitates the function of the already upregulated miR-375 in suppressing genes, including the growth suppressor Cadm1 in vivo, thus inducing the proliferation of β cells and accommodating the elevated demand for insulin. Therefore, the miR-184/miR-375 network is an essential component of the compensatory response that regulates the proliferation of β cells with respect to insulin sensitivity and metabolic stress. The most important finding of our study is that whole blood gene expression signatures reflect the miR expression changes specific to pancreatic tissue during insulin resistance. This is the first bioinformatic study showing that tissue-released miRs affect whole blood gene expression in T2D. Although there is still debate about the hormone-like effect of extracellular miRs in the blood, the results of our study suggest that certain circulating miRs could be systemic biomarkers of pancreatic tissue changes in T2D.
The results of our predictions are also in agreement with microarray expression results of circulating miRs in T2D. The obtained results are of great importance for understanding the complexity of miR nature. Here we also demonstrate the crucial need for integrative bioinformatic concepts in further research into the molecular processes of T2D. Finally, further experimental studies should follow in order to characterize the described effects as early prognostic biomarkers of insulin resistance and T2D.

References

1. Guo, H., Ingolia, NT., Weissman, JS., Bartel, DP.: Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature, Vol. 466, No. 7308, 835-40. (2010)
2. Chen, K., Rajewsky, N.: The evolution of gene regulation by transcription factors and microRNAs. Nature Reviews Genetics, Vol. 8, No. 2, 93-103. (2007)
3. Turchinovich, A., Samatov, TR., Tonevitsky, AG., Burwinkel, B.: Circulating miRNAs: cell-cell communication function? Frontiers in Genetics, Jun 28;4:119. (2013)
4. Madden, SF., Carpenter, SB., Jeffery, IB., Björkbacka, H., Fitzgerald, KA., O'Neill, LA., Higgins, DG.: Detecting microRNA activity from gene expression data. BMC Bioinformatics, 11:257. (2010)
5. Mulrane, L., Madden, SF., Brennan, DJ., Gremel, G., McGee, SF., McNally, S., et al.: miR-187 is an independent prognostic factor in breast cancer and confers increased invasive potential in vitro. Clinical Cancer Research, Vol. 18, No. 24, 6702-13. (2012)
6. Jovanović, I., Živković, M., Jovanović, J., Djurić, T., Stanković, A.: The co-inertia approach in identification of specific microRNA in early and advanced atherosclerosis plaque. Medical Hypotheses, Vol. 83, No. 1, 11-5. (2014)
7. Arora, A., Simpson, DA.: Individual mRNA expression profiles reveal the effects of specific microRNAs. Genome Biology, Vol. 9, No. 5, R82. (2008)
8. Karolina, DS., Armugam, A., Tavintharan, S., Wong, MT., Lim, SC., Sum, CF., Jeyaseelan, K.: MicroRNA 144 impairs insulin signaling by inhibiting the expression of insulin receptor substrate 1 in type 2 diabetes mellitus. PLoS One, Vol. 6, No. 8, e22839. (2011)
9. Culhane, AC., Perrière, G., Considine, EC., Cotter, TG., Higgins, DG.: Between-group analysis of microarray data. Bioinformatics, Vol. 18, No. 12, 1600-8. (2002)
10. Culhane, AC., Thioulouse, J., Perrière, G., Higgins, DG.: MADE4: an R package for multivariate analysis of gene expression data. Bioinformatics, Vol. 21, No. 11, 2789-90. (2005)
11. Keller, A., Leidinger, P., Vogel, B., Backes, C., ElSharawy, A., Galata, V., et al.: miRNAs can be generally associated with human pathologies as exemplified for miR-144. BMC Medicine, 12:224. (2014)
12. Chen, X., Ba, Y., Ma, L., Cai, X., Yin, Y., Wang, K., et al.: Characterization of microRNAs in serum: a novel class of biomarkers for diagnosis of cancer and other diseases. Cell Research, Vol. 18, No. 10, 997-1006. (2008)
13. Collares, CV., Evangelista, AF., Xavier, DJ., Rassi, DM., Arns, T., Foss-Freitas, MC., et al.: Identifying common and specific microRNAs expressed in peripheral blood mononuclear cell of type 1, type 2, and gestational diabetes mellitus patients. BMC Research Notes, 6:491. (2013)
14. Zampetaki, A., Kiechl, S., Drozdov, I., Willeit, P., Mayr, U., Prokopi, M., et al.: Plasma microRNA profiling reveals loss of endothelial miR-126 and other microRNAs in type 2 diabetes. Circulation Research, Vol. 107, No. 6, 810-7. (2010)
15. Kong, L., Zhu, J., Han, W., Jiang, X., Xu, M., Zhao, Y., et al.: Significance of serum microRNAs in pre-diabetes and newly diagnosed type 2 diabetes: a clinical study. Acta Diabetologica, Vol. 48, No. 1, 61-9. (2011)
16. Tattikota, SG., Rathjen, T., McAnulty, SJ., Wessels, HH., Akerman, I., van de Bunt, M., et al.: Argonaute2 Mediates Compensatory Expansion of the Pancreatic β Cell. Cell Metabolism, Vol. 19, No. 1, 122-134. (2014)

Mechanism of unusual flexibility of DNA TATA-box

Polina Kanevska and Sergey Volkov

Bogolyubov Institute for Theoretical Physics, 14-b, Metrolohichna str. Kyiv, 03680, Ukraine [email protected]

Abstract

DNA is a macromolecule with a variety of secondary structures, which vary depending on nucleotide content and sequence, solution conditions, and interaction with proteins or ligands [1]. The variations in helix geometry serve as conformational information for regulatory proteins and enzymes. That is why studying conformational transformations of the DNA double helix is an approach to understanding the mechanisms of many genetic processes. In our previous work we revealed that a polymorphic macromolecule can have a specific deformation mechanism, due to the appearance of localized conformational excitations [2]. In this case macromolecular deformation occurs because of the interrelation between internal (conformational) and external (elastic) components, and its energy cost is smaller than that of the same deformation in the elastic approach, the worm-like chain (WLC). The internally induced mechanism of macromolecular deformation is used to describe the anomalously large bending of a DNA fragment with a specific base pair sequence (the TATA-box), which is a functionally important part of a gene. It is argued that a small decrease in bending stiffness leads to a drastic increase in the magnitude of the bend, with the energy falling below the elastic energy of an equal bend. The bending stiffness parameter can be a stronger regulator of the bend value than WLC predicts. Our approach agrees with a recent exploration of DNA polymorphism, which demonstrates bimodality of the regulatory fragment [3].

Keywords: DNA TATA-box, conformational information, flexibility
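For orientation, the purely elastic WLC baseline that the abstract compares against can be sketched in a few lines of Python. The sketch uses the standard WLC result for a uniformly bent segment, E/(k_B T) = (P / 2L) θ². The default persistence length of 50 nm, the 0.34 nm rise per base pair and the example bend are illustrative assumptions, not values from the abstract, and the sketch does not model the conformational mechanism proposed by the authors.

```python
import math

def wlc_bend_energy_kT(theta_rad, length_nm, persistence_length_nm=50.0):
    """Elastic energy, in units of k_B*T, of uniformly bending a WLC segment.

    E / (k_B T) = (P / (2 L)) * theta^2, where P is the persistence length,
    L the contour length of the segment and theta the total bend angle.
    """
    return persistence_length_nm / (2.0 * length_nm) * theta_rad ** 2

# Illustrative inputs (assumptions): an 8 bp fragment at 0.34 nm per base pair,
# bent by 80 degrees.
fragment_length_nm = 8 * 0.34
energy_stiff = wlc_bend_energy_kT(math.radians(80.0), fragment_length_nm)
# Halving the bending stiffness halves the elastic cost of the same bend.
energy_soft = wlc_bend_energy_kT(math.radians(80.0), fragment_length_nm,
                                 persistence_length_nm=25.0)
```

In this elastic picture the energy scales only linearly with stiffness, which is the behaviour the abstract argues is too weak to explain the observed TATA-box bending.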

References

1. Saenger, W.: Principles of Nucleic Acid Structure, 200–241. Springer, New York (1984)
2. Kanevska, P.P. and Volkov, S.N.: Intrinsically induced deformation of a DNA macromolecule, Ukr. J. Phys., 51, 1003–1009. (2006)
3. Dans, P.D. and Perez, A. and Faustino, I. and Lavery, R. and Orozco, M.: Exploring polymorphisms in B-DNA helical conformations, Nucleic Acids Res, 40(21), 10668–10678 (2012)

One structured output learning method for protein function prediction

Jovana Kovacevic1, Predrag Radivojac2, Gordana Pavlović-Lažetić1

1 Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia {jovana,gordana}@matf.bg.ac.rs 2 Department of Computer Science and Informatics, Indiana University, 150 South Woodlawn Avenue, Bloomington, Indiana, USA [email protected]

Abstract

The task of structured output learning is to learn a function that enables prediction of complex objects such as sequences, trees or graphs for a given input. One of the problems where such methods can be applied is protein function prediction, where the aim is to find one or more functions that the protein performs in a cell according to its characteristics, such as its primary sequence, phylogenetic information, protein-protein interactions, etc. The space of all known protein functions is defined by a directed acyclic graph known as Gene Ontology (GO), where each node represents one function and each edge encodes a relationship such as is-a, part-of, etc. Each output, on the other hand, represents a subgraph of GO that is consistent in the sense that it contains a protein's functions propagated to the root of the ontology. In this research, we developed a structured output predictor that determines protein function according to the histogram of 4-grams that appear in the protein's sequence. The predictor is based on the machine learning method of structural support vector machines (SSVM), which generalizes the well-known SVM optimizers to structured outputs. Adjusting SSVM to this specific problem required the development of an optimization algorithm that maximizes an objective function over the vast set of all possible consistent subgraphs of protein functional terms, as well as a careful choice of loss functions. We tested the proposed method on sets of proteins from five different organisms and investigated the influence of protein origin on the quality of function prediction.

Keywords: Protein function prediction, structured output learning
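Two ingredients named in the abstract, the 4-gram histogram used as input and the consistency of an output subgraph, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the parent-map representation of the ontology and the toy term identifiers are hypothetical, and the SSVM training itself is not shown.

```python
from collections import Counter

def kmer_histogram(sequence, k=4):
    """Count every overlapping k-gram in a protein sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def propagate_to_root(terms, parents):
    """Return the consistent term set: `terms` plus all of their ancestors.

    `parents` maps each term to the list of its direct parent terms
    (an assumed representation of the ontology graph).
    """
    consistent = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in consistent:
            consistent.add(term)
            stack.extend(parents.get(term, []))
    return consistent

# Hypothetical miniature ontology; identifiers are placeholders, not real GO terms.
toy_parents = {
    "leaf": ["mid_a", "mid_b"],
    "mid_a": ["root"],
    "mid_b": ["root"],
    "root": [],
}
hist = kmer_histogram("MKVLMKVL")
labels = propagate_to_root({"leaf"}, toy_parents)
```

Because a term may have several parents in a directed acyclic graph, the propagation visits each ancestor once and returns a set rather than a path.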

Combined genomic and transcriptomic characterization of single disseminated prostate cancer cells

Stefan Kirsch1, Urs Lahrmann1, Miodrag Guzvic2, Zbigniew T. Czyz1, Giancarlo Feliciello1, Bernhard Polzer1 and Christoph A. Klein1,2

1 Fraunhofer ITEM - Project Group Personalized Tumor Therapy, Am Biopark 9, 93053 Regensburg, Germany [email protected] 2 University of Regensburg - Experimental Medicine and Therapy Research, Franz-Josef-Strauß-Allee 11, 93053 Regensburg, Germany

Abstract

The most frequent cause of death among prostate cancer patients is the manifestation of bone metastases. It is therefore of high therapeutic relevance to identify how and when metastatic growth is induced. Using disseminated cancer cells (DCCs) may provide an opportunity to dissect the early stages of systemic cancer and enable detection of critical therapeutic targets. To achieve a comprehensive characterization of these cells, we developed a method for combined whole genome and whole transcriptome analysis [1]. We analyzed in total 36 samples (24 EPCAM-positive DCCs; 1 single cell and 5 cell pools of the VCaP prostate cancer cell line; 6 cells from healthy donors as controls). After isolation and quality control by PCR-based QC assays, DCCs were subjected to combined whole genome and whole transcriptome amplification using the Ampli1 WGA and Ampli1 WTA approach [1]. WGA products were then hybridized on high-resolution SurePrint aCGH arrays and analyzed with the Agilent Genomic Workbench Software. WTA products were sequenced on the Roche 454 GS FLX+ system and analyzed using an in-house bioinformatics pipeline that included quality control, read mapping, differential expression analysis, pathway analysis, fusion gene prediction and variant calling. Comprehensive transcript expression and cluster analysis revealed different groups within our cell collective. Expression of classical cancer marker genes such as KLK3 (PSA) or AR could only be identified in one of those groups, while the others showed a strongly different expression signature. A further comparison of expression data, mutational profile and aCGH data uncovered group-specific features and allowed us to link copy number alterations to corresponding changes in gene expression dosage.

Keywords: bioinformatics, single cell, whole genome, whole transcriptome


References

1. Klein, C.A., Seidl, S., Petat-Dutter, K., Offner, S., Geigl, J.B., Schmidt-Kittler, O., Wendler, N., Passlick, B., Huber, R.M., Schlimok, G., Baeuerle, P.A., Riethmüller, G.: Combined transcriptome and genome analysis of single micrometastatic cells. Nat Biotechnol. 2002 Apr;20(4):387-92.

The perceptual structure of the phoneme manifold

Yair Lakretz1,2,3, Evan-Gary Cohen1, Naama Friedmann1, Gal Chechik2, and Alessandro Treves3

1 Tel-Aviv University, Tel-Aviv, Israel [email protected] 2 Bar Ilan University (Israel) 3 SISSA, 265 Via Bonomea, Trieste, Italy

Abstract

Theories of phoneme representation have been based on the notion of "subphonemic features", i.e. variables such as place of articulation, voicing and nasalization, some binary and some multi-valued, that can be taken to characterize the production, and with some modifications also the perception, of different phonemes. However, perceptual confusion rates between phonemes cannot be simply explained by the number of different values taken by their subphonemic features. Moreover, assuming a discrete nature for these variables is incongruent with the continuous, analog neural processes that underlie the production and perception of phonemes, and with the remarkable cross-linguistic differences observed, which make the notion of a universal phonemic space rather implausible. As a first step towards a plausible neuronal theory of how phoneme representations may self-organize in each individual upon language learning, we describe methods to derive, from behavioral or neural data, distinct "weights" for different features. Such weights provide a data-driven metric for the perceptual or motor phoneme manifold. We find that they differ by more than an order of magnitude, and differ across languages, pointing at the need to go beyond the classical digital description of phonemes.

Model selection in biomolecular pathways

Hanen Masmoudi

Higher institute of Biotechnology of Sfax, Tunisia [email protected]

Abstract

We present here a generalized Expectation-Maximization (GEM) algorithm [1] for learning the parameters of finite Gaussian mixture distributions within a graphical modeling framework. The GEM algorithm iterates between three steps. As in the EM algorithm, there is an expectation (E) step, in which a function for the expected log-likelihood is constructed and evaluated using the current parameter estimates, and a maximization (M) step, in which the parameters maximizing this function are computed. Here, we added a third (G) step, in which the M-estimates are updated based on the Lauritzen formula [2], given a graph that indicates the relationships between nodes. We apply the GEM algorithm to biomolecular interaction networks. In such networks, arcs represent probabilistic relationships (regulation, interaction) between nodes or variables (proteins, genes, molecules, ...). A simulation study of the signal transduction network of a simple biomolecular pathway of the epidermal growth factor receptor (EGFR) protein [3] was conducted, and we demonstrate that the GEM algorithm allows the classification of the data into each Gaussian distribution cluster and permits the selection of the best network that fits the data.

Keywords: EM algorithm, EGFR, Bayesian Network, Selection
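The E and M steps that GEM shares with ordinary EM can be illustrated with a minimal pure-Python sketch for a one-dimensional, two-component Gaussian mixture. This is plain EM only: the graph-based G step built on the Lauritzen formula is not reproduced here, and the function names, initial values and simulated data are assumptions made for illustration.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """Plain EM for a 1-D finite Gaussian mixture (no graph-based G step)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # guard against a collapsing component
    # Hard cluster assignment taken from the responsibilities of the last E step.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, labels

# Simulated data (an assumption): two well-separated clusters of 200 points each.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)] + [random.gauss(5.0, 1.0) for _ in range(200)]
means, variances, weights, labels = em_gmm_1d(
    data, means=[min(data), max(data)], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
```

The returned labels correspond to the classification of data points into mixture components mentioned in the abstract; comparing candidate graphs would require an additional scoring step that is outside this sketch.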

References

1. Dempster, A.P., Laird, N.M. and Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. B. 39, 1-38. (1977).
2. Lauritzen, S.L.: Graphical models, Oxford University Press. (1996).
3. Ben Hassen, H., Masmoudi, A. and Rebai, A.: Causal inference in Biomolecular Pathways using a Bayesian network approach and an Implicit method. J. Theor. Biol. 4, 717-724. (2008).

Hybrid methodology for information extraction from tables in the biomedical literature

Nikola Milosevic1, Cassie Gregson2, Robert Hernandez2, and Goran Nenadic1

1 School of Computer Science, University of Manchester, Oxford Road, Manchester, UK {nikola.milosevic,gnenandic}@manchester.ac.uk 2 AstraZeneca plc, Cambridge, UK {cassie.gergson,rob.hernandez}@astrazeneca.com

Abstract. Scientific literature, especially in the biomedical domain, is growing exponentially. Text mining can provide methods and tools that help professionals handle large amounts of literature. However, most current approaches focus on the textual body of the article, usually ignoring tables and figures. In this paper, we present a hybrid methodology that uses machine learning and a set of heuristic rules for information extraction from tables in the literature. In a case study, the method achieved an F1-score of 83.94% for extracting the number of patients together with the names of participant groups from clinical trial publications.

Keywords: health informatics, text mining, table mining, clinical trials

1. Introduction

The literature in the biomedical domain is growing exponentially. Currently, there are over 25 million articles indexed in MEDLINE. The fields of natural language processing and text mining have developed methods and tools that help with processing large amounts of literature and retrieving information of interest. However, most current approaches are limited to the textual body of articles, usually ignoring figures and tables. Yet authors of scientific literature use tables to present detailed information about the settings and results of their experiments. Tables are also used for other purposes, where authors need to present relatively large amounts of multidimensional information in a compact manner [13].

Tables may have various structures, and the information can be presented in a vast variety of formats. The structure of the table defines the relationships between cells. In order to "read" the table correctly, these relationships and the roles of the cells need to be recognized. Current representational models of tables mainly focus on visualization, which makes automated table processing a complex task, with a need to disentangle the visual structure before analyzing the presented data.

Previous work has addressed detecting tables in documents (PDF, text and HTML) using optical character recognition [6], machine learning algorithms such as decision trees [10] and Support Vector Machines [12], and heuristics [16]. Recognizing functional areas of the table (headers and data areas) has been done mainly using machine learning methods such as decision trees [2] or conditional random fields [15]. Some work has also been done in the areas of information retrieval [5], information extraction [9, 13] and question answering [15]. However, most of these approaches examined the general domain and were limited to visually simple tables.

In this paper, we introduce a methodology for information extraction from tables in the biomedical domain and present a case study on extracting the number of patients and participant groups from clinical trial literature.

Fig. 1. Example of a baseline characteristic table reporting number of participants (PMC2147028)

2. Method

Our approach consists of six steps: (1) table detection, (2) functional processing, (3) structural processing, (4) semantic tagging, (5) pragmatic processing, (6) syntactic processing and information extraction. As a data set to test our method we used clinical documents stored as open access in PubMedCentral3. As these documents are in XML format, it is trivial to detect tables by searching for a particular XML tag. We therefore focus on other steps.
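The table detection step can be sketched with the Python standard library. The sketch assumes that tables are wrapped in `table-wrap` elements, as in the JATS schema used for PMC open-access articles; the sample document, the function name and the returned fields are hypothetical and only illustrate searching for a particular XML tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified article; real PMC files are much richer.
SAMPLE = """<article><body>
<sec><p>Some text.</p>
<table-wrap id="T1"><label>Table 1</label><caption><p>Baseline characteristics</p></caption>
<table><thead><tr><th>Group</th><th>N</th></tr></thead>
<tbody><tr><td>Placebo</td><td>25</td></tr></tbody></table></table-wrap>
</sec></body></article>"""

def find_tables(xml_text):
    """Return (id, caption text) for every <table-wrap> element in the document."""
    root = ET.fromstring(xml_text)
    found = []
    for wrap in root.iter("table-wrap"):
        caption = wrap.find("caption")
        caption_text = "".join(caption.itertext()).strip() if caption is not None else ""
        found.append((wrap.get("id"), caption_text))
    return found

tables = find_tables(SAMPLE)
```

The caption is collected here because later steps, such as pragmatic classification, use caption words as features.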

2.1. Functional processing

The aim of functional processing is to detect the basic roles of cells. A cell can be a column header, a row header, a super-row (a row, or part of a row header, that further categorizes the row headers) or a data cell. In order to detect the functional roles of the cells, we used a set of heuristics based on a cell's position, its neighbors, content type, surrounding XML tags and XML attributes (such as span).

2.2. Structural processing

During structural processing, the relationships between cells are recognized, including relationships to navigational cells such as headers, stubs and super-rows. We used a set of heuristics based on a cell's function, structure, content and position, and on the table's structure, to disentangle the table's structure and the inter-cell relationships [8]. Information about each cell's content, position, function and relationships is stored in a database.

3 http://www.ncbi.nlm.nih.gov/pmc/


2.3. Semantic tagging

Once the data are stored in the database, we enrich them by annotating the cells' content with concepts and semantic types from the UMLS [1]. We have developed a dictionary-based concept tagging method to annotate text using UMLS, WordNet, DBPedia and vocabularies represented in the Simple Knowledge Organization System model [7].

2.4. Pragmatic processing

The main purpose of pragmatic processing is to identify the table where the information of interest is located and to reduce the number of false positives. We propose a machine learning classification method that aims to determine the purpose of a given table and what kinds of information are stored in it. We classified tables into those reporting baseline characteristics, adverse events, inclusion/exclusion criteria, and others. We tested a number of machine learning algorithms, including Naive Bayes, SVM, decision trees, random tree and random forest, in the Weka toolkit [4]. As features, we used words and semantic annotations from the caption and from the column and row headings, together with the number of rows and the number of columns.

2.5. Syntactic processing and information extraction

In this step, we designed rules to find lexical cues in cells and extract information from related cells. The cells are syntactically processed, and a template-based information extraction approach was implemented. Value patterns were examined using regular expressions, and the appropriate part of the value is extracted. Since CONSORT recommends reporting trial baseline characteristics and demographic information in tables [11], we experimented with the extraction of the number of patients per participant group. This number may be reported as the total number of patients in the caption of the table or, alternatively, per participant group in headings or data cells. We created a data set of 200 articles, each containing at least one baseline characteristic table, and split it into training and testing sets containing 100 articles each. The rules were crafted by examining and testing on the training set in an iterative manner. The output contained a reference to the article from which the number was extracted, a label (in this case "Number of patients"), the participant group name (extracted from the header) and the extracted number.
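The rule-based extraction described in this step might look roughly like the following Python sketch. The regular expression, the function name and the sample header cells are hypothetical and are not the rules used in the study; the sketch only shows how a lexical cue such as "n = 25" in a header cell can yield a group name and a number.

```python
import re

# Hypothetical value pattern: matches cues such as "n = 25", "N=130" or "(n=42)".
N_PATTERN = re.compile(r"\bn\s*=\s*(\d+)", re.IGNORECASE)

def extract_group_sizes(header_cells):
    """Extract (group name, number of patients) from column header cells."""
    results = []
    for cell in header_cells:
        match = N_PATTERN.search(cell)
        if match:
            # Text before the cue is taken as the participant group name.
            group = cell[:match.start()].strip(" (,:")
            results.append({
                "label": "Number of patients",
                "group": group,
                "value": int(match.group(1)),
            })
    return results

# Illustrative header row of a baseline characteristics table.
rows = extract_group_sizes(["Characteristic", "Placebo (n = 25)", "Treatment, N=27"])
```

A single pattern like this covers only one presentation style; handling totals in captions or counts in data cells would need further rules.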

3. Results

On the evaluation set, our method performs functional analysis with a precision of 0.9425, a recall of 0.9428 and an F1-score of 0.9426. Relationships between cells were recognized with a precision of 0.9238, a recall of 0.9744 and an F1-score of 0.9484. The results of the experiments performed for the pragmatic classification are given in Table 1.


Table 1. Results of the pragmatic four class classification experiments

Algorithm            Precision  Recall  F-Score
Naive Bayes          0.943      0.943   0.943
Bayesian Networks    0.938      0.939   0.938
C4.5 decision trees  0.944      0.945   0.944
Random tree          0.905      0.903   0.904
Random Forests       0.948      0.948   0.948
SVM                  0.967      0.966   0.966

The results of the manual evaluation of information extraction for the number of patients are given in Table 2.

Table 2. Results of information extraction for number of patients

          Precision  Recall  F-Score
Training  0.900      0.839   0.868
Testing   0.894      0.791   0.839

Accumulation of errors over the steps affects the final performance, as well as presentation types that are hard to generalize. However, the performance of information extraction from tables is promising and reliable over the range of complex tables.
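The reported F-scores can be cross-checked from the precision and recall values, since the F1-score is their harmonic mean. The short Python sketch below does this for the figures given in this section, taking the recall for cell relationships as 0.9744; the dictionary keys are labels chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# Precision/recall pairs as reported in the text and in Table 2.
reported = {
    "functional analysis": (0.9425, 0.9428),
    "cell relationships": (0.9238, 0.9744),
    "extraction, training": (0.900, 0.839),
    "extraction, testing": (0.894, 0.791),
}
scores = {name: f1_score(p, r) for name, (p, r) in reported.items()}
```

Rounded to the precision used in the text, the recomputed values agree with the reported F-scores.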

4. Discussion

We presented a hybrid approach for information extraction from tables that is composed of six steps. Our approach uses machine learning and heuristic rules. Even though some previous approaches reported slightly better accuracy [3, 14], they were limited to standardized tables with a pre-defined structure. Our approach does not make assumptions about the table's structure and can be applied to any kind of table. The first three steps of our approach are domain independent. Semantic tagging, pragmatic processing and the information extraction rules are domain and task dependent. The results of the case study indicate that information can be reliably extracted from complex tables, in particular if such information is combined with data mined from the main text.

References

1. Bodenreider, O.: The unified medical language system (umls): integrating biomedical terminology. Nucleic acids research 32(suppl 1), D267–D270 (2004)
2. Chavan, M.M., Shirgave, S.: A methodology for extracting head contents from meaningful tables in web pages. In: Communication Systems and Network Technologies (CSNT), 2011 International Conference on. pp. 272–277. IEEE (2011)
3. Embley, D.W., Tao, C., Liddle, S.W.: Automating the extraction of data from html tables with unknown structure. Data & Knowledge Engineering 54(1), 3–28 (2005)
4. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: an update. ACM SIGKDD explorations newsletter 11(1), 10–18 (2009)
5. Hearst, M.A., Divoli, A., Guturu, H., Ksikes, A., Nakov, P., Wooldridge, M.A., Ye, J.: Biotext search engine: beyond abstract search. Bioinformatics 23(16), 2196–2197 (2007)
6. Kieninger, T.G., Strieder, B.: T-recs table recognition and validation approach. In: AAAI Fall Symposium on Using Layout for the Generation, Understanding and Retrieval of Documents (1999)
7. Milosevic, N.: Marvin: Semantic annotation using multiple knowledge sources. arXiv preprint arXiv:1602.00515 (2016)
8. Milosevic, N., Gregson, C., Hernandez, R., Nenadic, G.: Extracting patient data from tables in clinical literature: Case study on extraction of bmi, weight and number of patients. In: Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2016). vol. 5, pp. 223–228 (2016)
9. Mulwad, V., Finin, T., Syed, Z., Joshi, A.: Using linked data to interpret tables. COLD 665 (2010)
10. Ng, H.T., Lim, C.Y., Koo, J.L.T.: Learning to recognize tables in free text. In: Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics. pp. 443–450. Association for Computational Linguistics (1999)
11. Schulz, K.F., Altman, D.G., Moher, D.: Consort 2010 statement: updated guidelines for reporting parallel group randomised trials. BMC medicine 8(1), 1 (2010)
12. Son, J.W., Lee, J.A., Park, S.B., Song, H.J., Lee, S.J., Park, S.Y.: Discriminating meaningful web tables from decorative tables using a composite kernel. In: Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT'08. IEEE/WIC/ACM International Conference on. vol. 1, pp. 368–371. IEEE (2008)
13. Tengli, A., Yang, Y., Ma, N.L.: Learning table extraction from examples. In: Proceedings of the 20th international conference on Computational Linguistics. p. 987. Association for Computational Linguistics (2004)
14. Wang, X.F.: Research on information extraction based on web table structure and ontology. Applied Mechanics and Materials 321, 2254–2259 (2013)
15. Wei, X., Croft, B., McCallum, A.: Table extraction for answer retrieval. Information retrieval 9(5), 589–611 (2006)
16. Yildiz, B., Kaiser, K., Miksch, S.: pdf2table: A method to extract table information from pdf files. In: IICAI. pp. 1773–1785 (2005)

Standard Genetic Code vs Vertebrate Mitochondrial Code: Nucleon Balances and p-Adic Distances

Nataša Ž. Mišić

Lola Institute, Kneza Višeslava 70a, Belgrade, Republic of Serbia [email protected]

Abstract

The standard genetic code (SGC) is crucial to our understanding not only of the origin of life, but also of the link between the physical and biological realms, with information as an ultimate unifying concept [1]. The nature, origin, and evolution of the genetic code is an enigma in itself. Its resolution has been approached through three main, not necessarily mutually exclusive, theories, which suggests that we still have not found an adequate set of physicochemical and/or biological factors that ensured the emergence of SGC. Our approach includes some nonstandard concepts: shCherbak's arithmetical regularities of nucleon numbers of SGC constituents [2] and Dragovich's p-adic modeling of the vertebrate mitochondrial code (VMC) [3]. In addition to the well-known nucleon balances [1, 2], we introduced a new type based on the aggregate nucleon number of an amino acid and its corresponding codon. By giving a Euclidean representation of the 5-adic model of SGC and VMC, as well as of the nucleotide 2-adic distances, we visualized their inherent symmetries. A comparison of both types of nucleon balances for these two genetic codes shows that the regularities are more strongly present in SGC than in VMC, despite the fact that the latter is more symmetrical. The fact that VMC is the simplest genetic code system among all extant organisms examined so far makes this result more significant. We also show that the mean value of the aggregate nucleon numbers of SGC agrees more accurately with the previously defined self-similarity constant [1] than that of VMC, which altogether indicates that the optimization of SGC under primordial conditions and of VMC in highly developed organisms was driven by partially different mechanisms.

Keywords: genetic code, origin, codon degeneracy, amino acid nucleon balances, p-adic distances.
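The p-adic distances mentioned in the abstract can be illustrated with a short Python sketch. The p-adic distance between two integers a and b is p^(-k), where p^k is the highest power of p dividing a - b. The mapping of codons to integers used below (digits C=1, A=2, U=3, G=4, with the first nucleotide as the least significant base-5 digit) is an assumption made for illustration and should be checked against the cited p-adic model [3] before use.

```python
def p_adic_valuation(n, p):
    """Largest k such that p**k divides the nonzero integer n."""
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def p_adic_distance(a, b, p):
    """p-adic distance |a - b|_p between two integers."""
    if a == b:
        return 0.0
    return float(p) ** -p_adic_valuation(abs(a - b), p)

# Assumed digit assignment and digit order (illustrative, see lead-in).
DIGIT = {"C": 1, "A": 2, "U": 3, "G": 4}

def codon_to_int(codon):
    """Map a codon to an integer with base-5 digits, first nucleotide least significant."""
    return sum(DIGIT[nt] * 5 ** i for i, nt in enumerate(codon))

# Codons sharing their first two nucleotides differ by a multiple of 25.
d_same_prefix = p_adic_distance(codon_to_int("CUU"), codon_to_int("CUC"), 5)
d_first_differs = p_adic_distance(codon_to_int("CUU"), codon_to_int("AUU"), 5)
# 2-adic distance between the digits assumed for C and U.
d_c_u = p_adic_distance(DIGIT["C"], DIGIT["U"], 2)
```

Under this assumed encoding, codons that differ only in the third position are 5-adically close, which is the kind of structure a Euclidean representation of the model would make visible.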

References

1. Mišić, N. Ž.: Nested Numeric/Geometric/Arithmetic Properties of shCherbak's Prime Quantum 037 as a Base of (Biological) Coding/Computing. Neuroquantology, Vol. 9, No. 4, 702-715, (2011).
2. shCherbak, V.I.: The Arithmetical Origin of the Genetic Code. In: Barbieri, M. (ed.): The Codes of Life: The Rules of Macroevolution, Springer, 153-188, (2008).
3. Dragovich, B., Dragovich, A.: p-Adic Modelling of the Genome and the Genetic Code. Computer Journal, Vol. 53, No. 4, 432-442, (2010).

Structural Characterization of the Trypanosoma brucei CK2A1-HDAC1/HDAC2 Interactions by Molecular Modeling and Protein-Protein Docking

Ozal Mutlu

Marmara University, Faculty of Arts and Sciences, Department of Biology, 34722, Goztepe, Istanbul, Turkey [email protected]

Abstract

Post-translational modifications of the histone tails by various mechanisms, including phosphorylation, methylation, acetylation and ubiquitination, have important effects on gene regulation, development and disease [1]. One of these modifications is acetylation, which is modulated by histone deacetylases (HDACs); targeting HDACs is an active topic in the treatment of cancer and of some neurological and parasitic diseases [2]. In this work, we characterized the interaction of the Trypanosoma brucei gambiense class I histone deacetylases (HDAC1 and HDAC2) with the casein kinase 2 alpha 1 (CK2A1) catalytic domain by protein-protein docking. Because no crystal structures of the enzymes are available, their 3D structures were first determined by homology modeling using MODELLER v9.16. Protein-protein docking and optimization were then conducted using three different servers (HADDOCK, ZDOCK and PyDock). In the end, we decided to use only the HADDOCK server because of its higher prediction value. We then docked the proteins again, predicted the interaction interface using the HADDOCK server, and selected the best complexes based on the total server score, the orientation of the desired region, and the solvation free energy (∆Gcomplex) from the PDBePISA server (http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html). In conclusion, understanding the binding mode and interaction interface of HDAC-CK2A1 could be a potent option for inactivating histone deacetylation by dissecting the protein-protein interaction, for the treatment of parasitic diseases and for selective drug design.

Keywords: histone deacetylase, protein-protein docking, interaction interface, Trypanosoma brucei gambiense

References

1. Mersfelder, E.L., Parthun, M.R.: The Tale Beyond the Tail: Histone Core Domain Modifications and the Regulation of Chromatin Structure. Nucleic Acids Research. 34(9):2653-2662, (2006).
2. Thomas, E.A.: Involvement of HDAC1 and HDAC3 in the Pathology of Polyglutamine Disorders: Therapeutic Implications for Selective HDAC1/HDAC3 Inhibitors. Pharmaceuticals. 7(6):634-661, (2014).

Mining PMMoV genotype-pathotype association rules from public databases

Vesna Pajić1, Bojana Banović2, Miloš Beljanski3 and Dragana Dudić1

1 Center for Data Mining and Bioinformatics, Faculty of Agriculture, University of Belgrade Nemanjina 6, 11080 Zemun, Serbia {svesna, ddragana}@agrif.bg.ac.rs 2 Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Vojvode Stepe 444a, 11000 Belgrade, Serbia [email protected] 3 Institute for General and Physical Chemistry, University of Belgrade, Studentski Trg 16, 11080 Belgrade, Serbia [email protected]

Abstract. In order to utilize knowledge hidden in public databases, we applied several data mining techniques to PMMoV sequences from the NCBI nucleotide database, with the aim of characterizing this virus at the molecular level. The dataset consists of 231 collected nucleotide sequences. We identified three distinct genotype variants (namely TG, GA and GG) based on the nucleotide combinations at significant positions within subgroups of sequences. Those positions were further confirmed using the EM algorithm. Information about the pathotype was known for only 40% of the studied sequences, and the distribution of pathotypes was very imbalanced. Nevertheless, using an Apriori-type algorithm, two strong rules were mined (confidence 0.96 and 0.93). The analysis showed that hidden knowledge can be disclosed and put to use through data mining approaches such as class association analysis and cluster analysis.

Keywords: clustering, class association rules, PMMoV

1. Introduction

With new sequencing technologies, the field of genomics is growing fast, and so is the amount of data behind it. Most of that data is publicly available through different data sources: a recent review of molecular biology databases [1] reports as many as 1685 relevant resources, each containing a large amount of data. The NCBI nucleotide database contains sequences from multiple sources, including GenBank with 190,250,235 sequences⁴, RefSeq with 92,936,289 sequences⁵, and PDB with 117,240 sequences⁶. Although a vast amount of sequence data is available, there is a huge and mostly unrealized potential in analyzing it. In this research we chose Pepper Mild Mottle Virus (PMMoV) as an in silico plant

⁴ http://www.ncbi.nlm.nih.gov/genbank/statistics/
⁵ http://www.ncbi.nlm.nih.gov/refseq/statistics/
⁶ http://www.rcsb.org/pdb/statistics/holdings.do

virus model to test what kind of information one can extract from publicly available nucleotide sequences by using bioinformatics tools and a data mining approach. PMMoV is a tobamovirus responsible for diminishing pepper yields. Until 2005, soil treatment against PMMoV consisted of the application of methyl bromide, an ozone-depleting chemical. By utilizing publicly available PMMoV sequence data, one could explore the life cycle, pathogenicity and virulence potential of the virus, as well as plant resistance mechanisms, in order to develop more eco-friendly alternatives for suppressing the virus in the field. We analyzed the nucleotide content and single nucleotide variations of the available sequences with several data mining techniques, and compared the results with information on virus pathogenicity found in the literature [2–4]. The overall aim was to detect some of the existing relations between nucleotide content and pathotype, which could potentially be used for future monitoring of the virus and its pathogenicity.

2. Data

At the time of the analysis, a total of 231 PMMoV nucleotide sequences were available in the NCBI database; 13 of them were complete genomes, 150 corresponded to the coat protein, 62 to the 126K replicase small subunit, 6 to the 183K replicase large subunit and 7 to the 30K cell-to-cell movement protein. They constituted dataset D1 and were aligned using Clustal X 2.1⁷, with later manual correction in MEGA 6⁸. We used the seqinr package in R to determine the profile sequence, with respect to which all other analyses were conducted.
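To make the idea of a profile sequence concrete, the following is a minimal Python sketch, not the authors' R/seqinr code. It assumes the profile is a simple majority consensus per alignment column (an assumption; the text does not state the rule used) and reports the columns where at least one sequence deviates from it. The function names are hypothetical.

```python
from collections import Counter


def profile_sequence(aligned):
    """Majority nucleotide per alignment column; gaps ('-') are ignored."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    profile = []
    for column in zip(*aligned):
        counts = Counter(nt for nt in column if nt != "-")
        # Ties are resolved by first occurrence in the column.
        profile.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(profile)


def variable_positions(aligned, profile):
    """1-based columns where some non-gap nucleotide differs from the profile."""
    return [
        i + 1
        for i, column in enumerate(zip(*aligned))
        if any(nt != "-" and nt != profile[i] for nt in column)
    ]


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "ATGA"]  # made-up sequences
    consensus = profile_sequence(toy_alignment)
    print(consensus, variable_positions(toy_alignment, consensus))
```

Columns flagged in this way are candidates for the informative positions analyzed later; which of them are informative is a separate question.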

There were 94 sequences (40%) in dataset D1 for which information on the pathotype (one of five pathotypes, P0, P1, P12, P123 and P1234, described in the literature [5, 6]) was available either in papers or in the NCBI database. For the purpose of mining genotype-pathotype association rules, these sequences, along with the information about genotype (determined in this research) and pathotype, were extracted into another dataset, D2.

Dataset D1 was additionally split into groups and subgroups based on the part of the genome the sequences covered (Table 1). The whole genome sequences were then divided into subsequences corresponding to the same nucleotide positions (np) that the subgroups covered, so each subgroup gained 13 more sequences obtained from the whole genome sequences.

3. Tools, Methods and Algorithms

To carry out the data mining tasks we used the algorithm implementations in WEKA 3.6.10⁹. Bioinformatics analyses were performed using the Bioconductor package

⁷ http://www.clustal.org/clustal2/
⁸ http://www.megasoftware.net/
⁹ http://www.cs.waikato.ac.nz/ml/weka/


Table 1. Number of sequences and np for each subgroup

Group  Subgroup  Number of  Number of   First np in  Last np in
                 sequences  np covered  the genome   the genome
1      1.1       67         200         612          810
       1.2       50         768         481          1248
2      2         22         190         1622         1811
3      3         18         790         4015         4805
4      4.1       15         779         4909         5682
       4.2       110        476         5685         6157
       4.3       160        209         5841         6047

in R. After aligning the sequences of each subgroup, single nucleotide variations (SNVs) were determined with an in-house script. Comprehensive analyses of the SNVs revealed several informative np in each subgroup. Based on the combination of nucleotides at these np, the sequences could be divided into disjoint sets. Information about the sets and the significant positions determined is shown in Table 2. Analysis of the relationships among these sets, for the sequences spanning more

Table 2. Nucleotide positions that divide sequences into disjoint sets in each group

Group  np in the genome                      Set label  Nucleotide combination  Number of
                                                        (short mark)            sequences
1.1    639; 669                              G1.1-1     GT (G)                  52
                                             G1.1-2     AC (A)                  15
1.2    565; 566; 708; 1125                   G1.2-1     TGGA (T)                34
                                             G1.2-2     GTTG (G)                16
2      1638; 1647                            G2-1       TA (T)                  12
                                             G2-2       CG (C)                  9
3      4107; 4131; 4392; 4395;               G3-1       GACGTCCA (G)            10
       4516; 4560; 4650; 4698                G3-2       AGTACTTG (A)            8
4.1    4929; 4963; 5085; 5151;               G4.1-1     CGACACGG (C)            10
       5244; 5487; 5557; 5611                G4.1-2     TAGGGTAA (T)            5
4.2    5763; 5819; 5837; 5996; 6002; 6011;   G4.2-1     CTTACTGATGC (C)         86
       6038; 6062; 6100; 6101; 6127          G4.2-2     TCCTTCTTATT (T)         24
4.3    5996; 6002; 6038                      G4.3-1     ACG (A)                 127
                                             G4.3-2     TTT (T)                 35

than one group was used for the detection of three distinct genotype variants, which were additionally confirmed with cluster analysis.

In order to confirm the determined genotype variants and to disclose other existing similarities between sequences, we performed cluster analysis on the D1 dataset using the Expectation Maximization (EM) algorithm [7]. The optimum number of clusters was estimated via 10-fold cross-validation.

Rules associating genotypes with pathotypes of the virus were learned with the Apriori algorithm [8] from dataset D2, with a specified minimum support of 0.1 and minimum confidence of 0.9. We used a modification of the original algorithm which combines the association rule technique with the classification rule technique [9], allowing the algorithm to focus on association rules useful for determining predefined classes.
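The class association step can be sketched as follows. This is a simplified Python illustration, not the WEKA implementation the authors used: candidate antecedents are grown level by level with Apriori-style pruning, and a rule "antecedent ==> class" is kept when it meets the minimum support and confidence. Measuring rule support as the fraction of all records matching both antecedent and class is an assumption made here, and the toy records are synthetic (only the TG and GA counts mirror the reported rules; everything else is made up).

```python
from itertools import combinations


def class_association_rules(records, class_attr, min_support=0.1,
                            min_confidence=0.9, max_len=2):
    """Return (antecedent, class, n_antecedent, n_both, confidence) tuples."""
    n = len(records)
    items = sorted({(a, v) for r in records for a, v in r.items()
                    if a != class_attr})
    rules, frequent_prev = [], None
    for size in range(1, max_len + 1):
        frequent_now = set()
        for cand in combinations(items, size):
            if len({a for a, _ in cand}) < size:
                continue  # an attribute cannot take two values at once
            # Apriori pruning: every smaller subset must already be frequent.
            if frequent_prev is not None and any(
                    sub not in frequent_prev
                    for sub in combinations(cand, size - 1)):
                continue
            covered = [r for r in records
                       if all(r.get(a) == v for a, v in cand)]
            if len(covered) / n < min_support:
                continue
            frequent_now.add(cand)
            for label in sorted({r[class_attr] for r in covered}):
                hits = sum(1 for r in covered if r[class_attr] == label)
                confidence = hits / len(covered)
                if hits / n >= min_support and confidence >= min_confidence:
                    rules.append((cand, label, len(covered), hits,
                                  round(confidence, 2)))
        frequent_prev = frequent_now
    return rules


def toy_records():
    """Synthetic genotype/pathotype records, for illustration only."""
    rows = ([("TG", "P12")] * 43 + [("TG", "P0")] * 2 +
            [("GA", "P123")] * 27 + [("GA", "P12")] * 2 +
            [("GG", "P12")] * 10 + [("GG", "P123")] * 10)
    return [{"genotype": g, "pathotype": p} for g, p in rows]


if __name__ == "__main__":
    for rule in class_association_rules(toy_records(), "pathotype"):
        print(rule)
```

Restricting the consequent to the class attribute is what distinguishes class association rules from general association rules: the search space shrinks and every rule found can be read directly as a classification statement.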

4. Results

4.1. Determination of Genotypes

We analyzed the relationship among sets G1.1-1, G1.1-2, G1.2-1 and G1.2-2 for the 50 sequences in Group 1 covering both subgroups 1.1 and 1.2. Three distinct genotype variants were determined based on the nucleotide content at sites 565, 566, 639, 669, 708 and 1125: Genotype variant GA, Genotype variant TG and Genotype variant GG.

Evaluation using the EM algorithm resulted in three clusters in subgroup 1.1, based on the already emphasized positions 639 and 708, corresponding to the determined genotype variants. In subgroup 1.2, four clusters were mined, one of which contained only one sequence, with Genotype variant GG. The distinction between the three remaining clusters was based on np 565 and np 552. The separation at the previously emphasized position 565 extracts Genotype variant TG. For the clusters formed on the newly obtained position 552, all sequences in one cluster had Genotype variant GG, while all sequences with Genotype variant GA were in the other cluster, which also contained sequences with Genotype variant GG.

Assuming that the information revealed about Group 1 can be transferred to whole genomes (and therefore to isolates), we classified the whole genome sequences based on the determined genotype variants (Table 3).

Cluster analysis of the sequences from Groups 2 and 3 and subgroup 4.1 did not reveal any new similarities among sequences, but in subgroup 4.2 it revealed three clusters which corresponded to the defined genotype variants. The three clusters obtained using the EM algorithm were separated by np 6002, which identifies Genotype variant GA, and by the newly observed np 5975, which can be used to distinguish Genotype variant GG from Genotype variant TG.
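The assignment of a genotype variant from the six Group 1 sites can be sketched as below. The nucleotide combinations and short marks are those listed in Table 2; the naming rule (short mark of subgroup 1.2 followed by short mark of subgroup 1.1) is our reading of Table 3 and is not stated explicitly in the text, so it should be treated as an assumption. The function names and the dictionary input format are hypothetical.

```python
# Informative positions and short marks for subgroups 1.1 and 1.2 (Table 2).
SET_1_1 = {"positions": (639, 669), "marks": {"GT": "G", "AC": "A"}}
SET_1_2 = {"positions": (565, 566, 708, 1125),
           "marks": {"TGGA": "T", "GTTG": "G"}}

KNOWN_VARIANTS = {"TG", "GG", "GA"}


def short_mark(nt_at, spec):
    """Short mark for the nucleotides found at the informative positions.

    nt_at maps a genome position (np) to the nucleotide observed there.
    Returns None when the combination is not one listed in Table 2.
    """
    combination = "".join(nt_at[pos] for pos in spec["positions"])
    return spec["marks"].get(combination)


def genotype_variant(nt_at):
    """Genotype variant name, or None if the sites match no known variant."""
    mark_12 = short_mark(nt_at, SET_1_2)
    mark_11 = short_mark(nt_at, SET_1_1)
    if mark_12 is None or mark_11 is None:
        return None
    # Assumed naming rule: 1.2 mark first, then 1.1 mark (inferred from Table 3).
    name = mark_12 + mark_11
    return name if name in KNOWN_VARIANTS else None


if __name__ == "__main__":
    example = {565: "T", 566: "G", 708: "G", 1125: "A", 639: "G", 669: "T"}
    print(genotype_variant(example))
```

Returning None for unseen combinations keeps the sketch from silently assigning a variant to sequences that fall outside the three observed patterns.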

4.2. Genotype-pathotype associations

Applying class association analysis we found two strong rules:


Table 3. Distribution of whole genome sequences into disjoint sets determined by the analysis of Groups 1.1-4.3, and the determined genotype variants (for simplified representation, short marks are used instead of set labels)

Sequence                      1.1  1.2  2    3    4.1  4.2  4.3  Genotype
Spain-1989-P12|NC_003630.1    G    T    T    G    C    C    A    TG
Spain-1989-P12|M81413.1       G    T    T    G    C    C    A    TG
Japan-2005-P0|AB113117.1      G    T    T    G    C    C    A    TG
Japan-2005-P0|AB113116.1      G    T    T    G    C    C    A    TG
Japan-2002-P0|AB069853.1      G    T    T    G    C    C    A    TG
Japan-1997-P1234|AB000709.2   G    T    T    G    C    C    A    TG
China-2006|AY859497           G    T    T    G    C    C    A    TG
Brasil-2010|AB550911.1        G    T    T    G    C    C    A    TG
India-2014-P12|KJ631123.1     G    T    T    G    C    C    A    TG
Japan-2003-P1234|AB276030.1   G    G    G    A    T    C    A    GG
SouthKorea-2005|AB126003.1    G    G    G    A    T    C    A    GG
Japan-2007-P12|AB254821.1     G    G    G    A    T    C    A    GG
Spain-2002-P123|AJ308228      A    G    G    A    T    T    T    GA

Genotype variant=TG 45 ==> Pathotype=2 43 conf:(0.96)
Genotype variant=GA 29 ==> Pathotype=3 27 conf:(0.93)

The rules clearly indicate that almost all (43 out of 45) sequences that have Genotype variant TG also have pathotype P12, and that 27 out of 29 sequences having Genotype variant GA also have pathotype P123.

For sequences having Genotype variant GG, np 5921 (found with a classification method) can be discriminative for pathotype prediction: if the sequence has T or C at this position it is of pathotype P12, while if it has A it is of pathotype P123.
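Read together, the two mined rules and the observation about np 5921 amount to a small decision procedure, sketched below in Python. The function names are hypothetical and the sketch only restates the rules reported above; it is not a validated predictor, and the rules themselves hold with confidence below 1.

```python
def rule_confidence(n_antecedent, n_both):
    """Confidence of a rule: fraction of antecedent matches that also match the class."""
    return n_both / n_antecedent


def predict_pathotype(genotype, nt_5921=None):
    """Pathotype suggested by the reported rules, or None if no rule applies."""
    if genotype == "TG":
        return "P12"
    if genotype == "GA":
        return "P123"
    if genotype == "GG":
        # For variant GG the nucleotide at np 5921 is used as discriminator.
        if nt_5921 in ("T", "C"):
            return "P12"
        if nt_5921 == "A":
            return "P123"
    return None


if __name__ == "__main__":
    print(round(rule_confidence(45, 43), 2), round(rule_confidence(29, 27), 2))
    print(predict_pathotype("GG", nt_5921="A"))
```

The confidence values follow directly from the counts in the rules: 43/45 rounds to 0.96 and 27/29 rounds to 0.93.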

5. Conclusion

The clustering and class association analysis of 231 PMMoV sequences available at NCBI showed some regularities which potentially can be used for molecular monitoring of virus genotype-pathotype association.

References

1. Rigden D. J., Fernández-Suárez X. M., Galperin M. Y.: The 2016 database issue of Nucleic Acids Research and an updated molecular biology database collection. Nucleic Acids Research, 44, D1-D6. (2016)
2. Gilardi P, Wicke B, Castillo S, de la Cruz A, Serra MT, García Luque I. Resistance in Capsicum spp. against the tobamoviruses. In: Pandalai SG, ed. Recent research developments in virology, Vol. 1. India: Transworld Research Network, 547-558. (1999)
3. Genda Y, Kanda A, Hamada H, Sato K, Ohnishi J, Tsuda S. Two amino acid substitutions in the coat protein of Pepper mild mottle virus are responsible for overcoming the L4 gene mediated resistance in Capsicum spp. Phytopathology 97, 787-793. (2007)
4. Y. Antignus, O. Lachman, M. Pearlsman, L. Maslenin, A. Rosner. A new pathotype of Pepper mild mottle virus (PMMoV) overcomes the L4 resistance genotype of pepper cultivars. Plant Dis. 92, 1033-1037. (2008)
5. Boukema I. W. Resistance to TMV in Capsicum chacoense Hunz. is governed by allele of the L-locus. Capsicum Newsl. 3, 47-48 (1984)
6. Sawada H., Takeuchi S., Hamada H., Kiba A., Matsumoto M., Hikichi Y.: A new tobamovirus-resistance gene L1a, of sweet pepper (Capsicum annuum L.). J. Jpn. Soc. Hortic. Sci. 73, 552-557 (2004)
7. Dempster A. P., Laird N. M., and Rubin D. B., Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society B 39: 1-38. (1977)
8. Atluri, G., Gupta, R., Fang, G., Pandey, G., Steinbach, M., Kumar, V., Association analysis techniques for bioinformatics problems, Bioinformatics and Computational Biology, 1-13. (2009)
9. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules, Proceedings of the 20th International Conference on Very Large Databases, 487-499. (1994)

Complexity measures based on intermittent events in brain EEG data

Paolo Paradisi1,2, Marco Righi1, Massimo Magrini1, Maria Chiara Carboncini3, Alessandra Virgillito3, and Ovidio Salvetti1

1 Institute of Information Science and Technologies (ISTI-CNR), Via G. Moruzzi 1, I-56124 Pisa, Italy [email protected]
2 Basque Center for Applied Mathematics (BCAM), Alameda de Mazarredo 14, E-48009 Bilbao, Basque Country, Spain
3 Department of Neuroscience, University of Pisa, via Paradisa 2, I-56126 Pisa, Italy

Abstract. In this work we discuss the application of the complexity approach to the study of physiological signals. In particular, a theoretical framework based on the ubiquitous emergence of fractal intermittency in complex signals is introduced. This approach is based on the ability of the cooperative micro-dynamics of complex systems to trigger meta-stable self-organized states. The meta-stability is strictly connected with the emergence of an intermittent point process displaying anomalous non-Poisson statistics and driving the fast transition events between successive meta-stable states. As a consequence, the estimation of features related to intermittent events can be used to characterize the ability of the complex system to trigger self-organized structures.

We introduce an algorithm for the processing of complex signals that is based on the fractal intermittency paradigm, thus focusing on the detection and scaling analysis of intermittent events in human electroencephalograms (EEGs). We finally discuss the application of this approach to real EEG recordings and introduce the preliminary findings.

Keywords: signal processing, complexity, fractal intermittency, brain, electroencephalogram (EEG), disorders of consciousness

1. Introduction

Human physiology is a prototypical example of complexity, and the brain is surely its most important instance. The brain is composed of elementary units, the neurons, each strongly connected with many other neurons through highly nonlinear interactions, given by chemically activated electrical signals traveling along the inter-neuron links (axons and dendrites). The nonlinear dynamics at the level of single neurons (i.e., the threshold mechanism for the electrical discharges generating spikes and bursts) are greatly enhanced by the complex link topology, but at the same time some kind of ordering, or self-organizing, principle triggers the formation of global cooperativity. The overall picture is that of a complex network with a huge number of nodes (the neurons) and links with a very complicated topology. It is then not surprising that brain dynamics display a very rich landscape of different behaviors and a very efficient plastic behavior, characterized by a rapid and efficient capability of response to rapid changes in

the external environment. Due to this variety, the attempt to characterize brain functioning with a relatively low number of parameters is a fascinating problem and a very active topic in brain research. This topic involves different disciplines, ranging from biology and medicine to the non-equilibrium statistical physics of complex systems, network analysis and information science.

2. The complexity paradigm

The complexity paradigm involves a modeling approach that is complementary to the microscopic approach, which extends the micro-dynamics of single units to the network level via properly modeled node-node interactions. Following the paradigm of emerging properties, the complexity approach simply focuses on the modeling of self-organized large-scale structures emerging from the cooperative dynamics of the complex network. The main idea is that self-organized structures are the essential actors in the global dynamics of complex systems and play a crucial role in the response of the system to external stimuli. As a consequence, the statistical indicators extracted from the data analysis also usually refer to some global property associated with the large-scale, global, dynamical evolution of coherent or self-organized structures.

2.1. Complexity and fractal intermittency

Even if a universally accepted definition of complexity does not yet exist, complex systems often display the following features:
(1) a complex system is multi-component, with a large number of degrees of freedom, i.e., many functional units or nodes. As said above, these units interact with each other and their dynamics are strongly nonlinear;
(2) non-linearity and a multi-component structure are not enough to define complexity: the dynamics must be cooperative and trigger the emergence of self-organized structures;
(3) self-organized states display long-range space-time correlations (slow power-law decay);
(4) self-organized states are meta-stable, with relatively long life-times and fast transition events between two successive states, denoted in the following as crucial events.

Crucial events determine a fast memory drop, while the self-organized structures remain strongly correlated until their decay. The sequence of crucial events, marking the transitions among self-organized states, is an emergent dynamics described as a birth-death point process of self-organization. Feature (4) in the above list is thus the basic property allowing for a description of complexity in terms of intermittent signals. Due to the fast memory drop occurring during the fast transitions, the self-organized states are often independent of each other, as are the crucial transition events. This is denoted as the renewal condition. In this case, the sequence of crucial events is described by a renewal point process. A very general observation is that a complex (cooperative) system is characterized by long life-times that are statistically distributed according to an inverse power-law. The life-times correspond to the time between two successive crucial events and are also denoted as inter-event times or Waiting Times (WTs).
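As a small numerical illustration of waiting times with an inverse power-law distribution, the Python sketch below derives WTs from a list of event times, draws synthetic WTs from a pure power-law with exponent mu above a cutoff tau_min, and recovers mu with the standard maximum-likelihood estimator for a continuous power-law. This is a generic textbook estimator chosen here for illustration; it is not the scaling analysis used by the authors, and tau_min, the sample size and the function names are assumptions.

```python
import math
import random


def waiting_times(event_times):
    """Inter-event times between successive (sorted) event times."""
    return [t2 - t1 for t1, t2 in zip(event_times, event_times[1:])]


def sample_power_law(mu, tau_min, n, rng):
    """Draw n waiting times with density proportional to tau**(-mu), tau >= tau_min.

    Uses inverse-transform sampling; requires mu > 1.
    """
    return [tau_min * (1.0 - rng.random()) ** (-1.0 / (mu - 1.0))
            for _ in range(n)]


def estimate_mu(wts, tau_min):
    """Maximum-likelihood estimate of the power-law exponent above tau_min."""
    tail = [w for w in wts if w >= tau_min]
    return 1.0 + len(tail) / sum(math.log(w / tau_min) for w in tail)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    synthetic = sample_power_law(mu=2.5, tau_min=1.0, n=20000, rng=rng)
    print(estimate_mu(synthetic, tau_min=1.0))
```

On real data the estimate depends strongly on the choice of tau_min and on whether the renewal condition holds, so a sketch like this is only a first sanity check.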


In this work we discuss an approach to complexity based on the modeling of the time intermittency emerging from the underlying cooperative dynamics. In particular, the emergence of a renewal point process whose WT distribution is an inverse power-law, ψ(τ) ∼ 1/τ^µ, is denoted as fractal intermittency [6, 8, 9, 14, 15]. The distribution ψ(τ) and the exponent µ are emerging properties and, thus, a signature of complex behavior. In Fig. 1 we report a synthetic scheme qualitatively explaining the connection between self-organization, cooperation and non-Poisson renewal processes. Poisson renewal processes always emerge in the case of independent systems, whatever the micro-dynamics of the single nodes. As a consequence, a departure from Poisson statistics reveals some kind of cooperation among the nodes of the network. For power-law distributed WTs, µ is then used as an indicator of complexity, essentially being a measure of the ability of the system's dynamics to trigger global self-organized structures. In particular, complexity is identified with a condition of very slow decay in ψ(τ), corresponding to the range µ < 3. Conversely, feature (3) is the starting point for a description of complexity in terms of spatial and topological indicators (e.g., the degree distribution of a complex network, avalanche size distribution).

Fig. 1. Comparison of Poisson (non-complex) and non-Poisson (complex) processes.

3. Crucial events and fractal intermittency in the brain

Meta-stability is a basic feature of information processing in the brain neural network. Fingelkurts and Fingelkurts recognized that rapid changes in the electroencephalogram (EEG), called Rapid Transition Processes (RTPs), mark passages between two quasi-stationary periods, each one corresponding to a different neural assembly [1, 2], and are the signature of brain self-organization. RTPs and neural assemblies are then prototypes of crucial events and meta-stable self-organized states, respectively. The algorithm for the automatic detection of RTP events in EEG data was developed in Ref. [2] and exploited by the authors of Refs. [3–5, 7, 9, 10, 16] to characterize the complexity of the intermittent events. By exploiting a scaling detection method, the EDDiS method ([14] and references therein), these authors found that brain dynamics display fractal intermittency. In particular, it was shown that the fractal intermittency approach is able to reveal the integrated (Rapid Eye Movement, REM) and segregated (Non-REM) stages during sleep, in agreement with the consciousness state of the subjects [9, 10, 16].


In the intermittency-based analysis here proposed, a key aspect is the definition of events, which needs to be further studied in order to extend the above analysis to different experimental and clinical conditions.

4. Signal processing for intermittent complex systems

The results obtained by applying the algorithms cited above, in particular:
(i) the RTP event detection algorithm [2];
(ii) the EDDiS method for the evaluation of the diffusion scaling H, whose relationship with the index µ is known when the renewal condition is positively validated [3, 6, 8, 14];
are very promising from the perspective of potential clinical applications to neurological disorders. However, RTP events are defined only for some experimental conditions.

In this work we investigate the key aspect of the event definition. We propose an algorithm involving a more general definition of event that is able to detect and discriminate events with different neuro-physiological origins. The proposed method essentially extends the technique introduced and applied in Refs. [11–13]. This method allows the extraction of different kinds of crucial events marking sudden increases of activity in given frequency bands. This makes it possible to derive different definitions of events and to build a very flexible algorithm that can be exploited in different experimental conditions.

We assume that the signals have already been pre-processed for artifact cleaning. The software tool is then divided into different modules:

(1) splitting of the single EEG channel into different frequency bands;
(2) detection of crucial events and high-activity epochs in the different frequency bands by using a thresholding method;
(3) building of a spatio-temporal map of events;
(4) extraction of some specific kind of events from the event map;
(5) estimation of the complexity of these events of interest, both for single EEG channels and for global events.
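Modules (2) and (3) above can be sketched in a few lines of Python. The sketch assumes that the band-limited amplitude of each channel has already been computed (module 1 is not shown) and uses a threshold of mean plus k standard deviations, which is a hypothetical choice: the text only says that a thresholding method is used. The function names, the parameter k and the (channel, band) keys are likewise assumptions.

```python
from statistics import mean, pstdev


def detect_events(band_amplitude, k=2.0):
    """Return (threshold, onsets) for one band-amplitude time series.

    An event onset is the first sample of each run above the threshold,
    here taken as mean + k * standard deviation (an assumed rule).
    """
    threshold = mean(band_amplitude) + k * pstdev(band_amplitude)
    onsets = []
    above = False
    for i, value in enumerate(band_amplitude):
        now_above = value > threshold
        if now_above and not above:
            onsets.append(i)  # upward threshold crossing
        above = now_above
    return threshold, onsets


def event_map(band_signals, k=2.0):
    """Map each (channel, band) key to the list of its event onsets."""
    return {key: detect_events(signal, k)[1]
            for key, signal in band_signals.items()}


if __name__ == "__main__":
    # Made-up amplitude series with two bursts of activity.
    toy = [0.0] * 20 + [10.0] * 3 + [0.0] * 20 + [10.0] * 2 + [0.0] * 20
    print(detect_events(toy))
```

The onset indices from such a map can then be converted to times and fed to a waiting-time analysis, separately per channel or pooled across channels for global events.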

Despite its apparent simplicity, this algorithm is very flexible and powerful. Being based on the classical Fourier approach and on splitting the EEG signal into standard frequency bands, this approach allows for a clearer link between the event detection algorithm and its neuro-physiological interpretation. In this sense, a particular kind of brain event should be recognized as a neural correlate of some increased neuro-physiological activity.

Finally, we will discuss some applications to real EEG data in different conditions (wake, sleep). Some preliminary results on subjects with disorders of consciousness will be presented.

References

1. A. A. Fingelkurts, A. A. Fingelkurts, Brain-Mind Operational Architectonics Imaging: Technical and Methodological Aspects. Open Neuroimag. J. 2 (2008) 73-93.
2. A.Y. Kaplan, A.A. Fingelkurts, A.A. Fingelkurts, B.S. Borisov, B.S. Darkhovsky, Nonstationary nature of the brain activity as revealed by EEG/EMG: methodological, practical and conceptual challenges. Signal Process. 85 (2005) 2190-2212.
3. P. Allegrini, D. Menicucci, R. Bedini, L. Fronzoni, A. Gemignani, P. Grigolini, B.J. West, P. Paradisi, Spontaneous brain activity as a source of ideal 1/f noise, Phys. Rev. E 80 (2009), 061914.
4. P. Allegrini, P. Paradisi, D. Menicucci, A. Gemignani, Fractal complexity in spontaneous EEG metastable state transitions: new vistas on integrated neural activity. Frontiers in Physiology 1, 128 (2010).
5. P. Allegrini, D. Menicucci, R. Bedini, A. Gemignani, P. Paradisi, Complex intermittency blurred by noise: Theory and application to neural dynamics. Phys. Rev. E 82 (2010) 015103.
6. P. Paradisi, R. Cesari, A. Donateo, D. Contini, P. Allegrini, Diffusion scaling in event-driven random walks: an application to turbulence. Rep. Math. Phys. 70 (2012) 205-220.
7. P. Allegrini, P. Paradisi, D. Menicucci, R. Bedini, A. Gemignani, L. Fronzoni, Noisy cooperative intermittent processes: From blinking quantum dots to human consciousness. J. Phys.: Conf. Series 306 (2011) 012027.
8. P. Paradisi, R. Cesari, A. Donateo, D. Contini, P. Allegrini, Scaling laws of diffusion and time intermittency generated by coherent structures in atmospheric turbulence. Nonlinear Processes in Geophysics 19 (2012) 113-126; P. Paradisi et al., Corrigendum, Nonlinear Processes in Geophysics 19 (2012) 685.
9. P. Paradisi, P. Allegrini, A. Gemignani, M. Laurino, D. Menicucci, A. Piarulli, Scaling and intermittency of brain events as a manifestation of consciousness, AIP Conf. Proc. 1510 (2013), 151-161.
10. P. Allegrini, P. Paradisi, D. Menicucci, M. Laurino, R. Bedini, A. Piarulli, A. Gemignani, Sleep unconsciousness and breakdown of serial critical intermittency: New vistas on the global workspace. Chaos, Solitons and Fractals 55 (2013) 32-43.
11. C. Navona, U. Barcaro, E. Bonanni, F. Di Martino, M. Maestri, L. Murri, An automatic method for the recognition and classification of the A-phases of the cyclic alternating pattern, Clin. Neurophysiol. 113 (2002), 1826-1833.
12. U. Barcaro, E. Bonanni, M. Maestri, L. Murri, L. Parrino, M.G. Terzano, A general automatic method for the analysis of NREM sleep microstructure, Sleep Med. 5 (2004), 567-576.
13. M. Magrini, A. Virgillito, U. Barcaro, L. Bonfiglio, G. Pieri, O. Salvetti, M.C. Carboncini, An automatic method for the study of REM sleep microstructure, Int. Workshop on Computational Intelligence for Multimedia Understanding (IWCIM 2015), Prague, 29-30 October 2015, DOI: 10.1109/IWCIM.2015.7347066 [IEEE Xplore Digital Library]
14. P. Paradisi, P. Allegrini, Scaling law of diffusivity generated by a noisy telegraph signal with fractal intermittency, Chaos, Solitons and Fractals 81 (2015), 451-462.
15. P. Paradisi, G. Kaniadakis, A.M. Scarfone, The emergence of self-organization in complex systems: Preface, Chaos, Solitons and Fractals 81 (2015) 407-411.
16. P. Allegrini, P. Paradisi, D. Menicucci, M. Laurino, A. Piarulli, A. Gemignani, Self-organized dynamical complexity in human wakefulness and sleep: Different critical brain-activity feedback for conscious and unconscious states, Phys. Rev. E 92, (2015) 032808.

DORMANCYbase: developing a bioinformatics database on molecular regulation of animal dormancy

Popović Željko D.1,2, Kadlecsik Tamás2, Fazekas Dávid2, Ari Eszter2, Korcsmáros Tamás3, Uzelac Iva1, Avramov Miloš1, Krivokuća Nikola1, Kitanović Nevena1, and Kokai Dunja1

1 University of Novi Sad, Faculty of Sciences, Department of Biology and Ecology, Trg Dositeja Obradovića 3, 21000 Novi Sad, Serbia [email protected]
2 Eötvös Loránd University, Department of Genetics, Pázmány Péter stny. 1/C, H-1117 Budapest, Hungary [email protected]
3 TGAC, The Genome Analysis Centre, Gut Health and Food Safety Programme, Institute of Food Research, Norwich Research Park, Norwich, Norfolk, NR4 7UH, UK [email protected]

Abstract

Dormancy is a period in an organism's life cycle when growth, development and physical activity are temporarily arrested. It involves changes at the behavioral, morphological, physiological, biochemical and molecular levels that, taken together, increase the stress tolerance of organisms and help them survive harsh environmental conditions. In the last two decades, the application of -omic and other modern technologies has led to an exponential expansion of scientific data on the molecular background of dormancy. However, the use of these data is difficult and limited due to the lack of an organized system for storing them. In that light, developing a dedicated database of gene and protein expression during animal dormancy, named DORMANCYbase, will provide a unique source of functional expression data derived from the scientific literature. DORMANCYbase will be available for free on the website www.dormancybase.org, and linked to other relevant databases such as NCBI, UniProt, DDBJ etc. Not only will the database allow scientists to browse information, but they will also be able to submit their own research data. Excluding data from massively parallel -omic platforms, the database currently contains nearly 1000 RNA and protein sequences from 63 different animal species, and includes a wide range of information regarding the type of dormancy, life stage, organ/tissue/cell type, methodology used for analysis and expression level, as well as DOI numbers and URLs of the entered publications. By analyzing all these data, we expect to define common groups of genes which participate in the regulation of certain dormant states and also in the response to diverse kinds of stress. The results will enable scientists to compare gene and protein sets expressed in various dormancies and organisms, both useful and harmful from a human point of view. Furthermore, molecular data from DORMANCYbase will allow researchers to identify both conserved and specific molecular

92 BelBI2016, Belgrade, June 2016. DORMANCYbase developing a bioinformatics database ... processes in different resting phases, as well as explore functional networks of genes and their products for a given type of dormancy. Keywords: dormancy, database, gene, protein, expression, bioinformatics

Examining regulation of restriction-modification systems by quantitative modeling

Andjela Rodic and Marko Djordjevic

Faculty of Biology, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia {andjela.rodic,dmarko}@bio.bg.ac.rs

Abstract

Bacterial restriction-modification (RM) systems encode a restriction enzyme (R), which cuts specific DNA sequences, and a methyltransferase (M), which methylates the same sequences, thus protecting them from cutting. Expression of these enzymes has to be tightly coordinated to provide defense against foreign DNA without damaging host DNA, which is often accomplished by control (C) protein driven regulation.

The main technical difficulty in directly observing R and M expression is synchronizing the plasmid entry into the bacterial cells. To resolve this difficulty, our collaborators performed the first single-cell measurements of the in vivo dynamics of R and M expression, for the Esp1396I RM system. We developed a quantitative model of the system dynamics, in which we used statistical thermodynamics to model transcription regulation of the system promoters; this was then used as an input for dynamical modeling, which predicts the change of the enzyme amounts in a cell. The model successfully reproduces the main experimentally observed features of the expression dynamics: the significant delay of R with respect to M expression, and a high peak in M expression at early times [1].

We use a similar modeling approach to perturb characteristic features of the AhdI RM system, and show that its design may be explained by the following principles: a delayed R expression, a fast transition from the "OFF" to the "ON" state, and a stable steady state. We use these design principles to propose an explanation for the extremely high binding cooperativity and dimerization constant observed in AhdI [1], and propose that these principles should be of general applicability to RM systems.
Keywords: restriction-modification system, regulation, control protein

References

1. Morozova, N., Sabantsev, A., Bogdanova, E., Fedorova, Y., Majkova, A., Vediajkin, A., Rodic, A., Djordjevic, M., Khodorkovskiy, M., Severinov, K.: Nucleic Acids Research, 44, 790-800. (2015)
2. Rodic, A., Blagojevic, B., Zdobnov, E., Djordjevic M., Djordjevic M., submitted. (2016)

On the clustering of biomedical datasets - a data-driven perspective

Richard Roettger

University of Southern Denmark, Department of Mathematics and Computer Science, Odense, Denmark [email protected]

Abstract

Nowadays, scientists of virtually all disciplines are confronted with an increasing supply of information; this is especially true for biomedical research, where recent advances in wet-lab technologies have led to a sheer explosion in the wealth, quality, and amount of available data. A typical first step in analyzing these large datasets is the so-called cluster analysis, which unravels the inherent structure of the data by grouping similar objects together.

Despite being a long-standing problem, conducting a cluster analysis is anything but straightforward. A multitude of decisions have to be made, all requiring a deep understanding of the underlying methods; decisions such as feature extraction, similarity calculation, clustering tool selection and parameter optimization often overwhelm the practitioner. Here, well-structured and objective guidelines are widely missing, especially on a larger scale.

To attack these challenges, we have developed ClustEval, a fully integrated and automated cluster evaluation framework. The power of this framework allowed us to conduct a massive, objective and fully reproducible clustering comparison analysis consisting of several million evaluations. This massive data-driven background of structured clustering results allowed us to provide a highly demanded overview of the field and to carefully derive guidelines for the clustering of biomedical datasets, which we recently published in Nature Methods.

Based on this effort, we want to present ClustEval and our most recent findings, and furthermore aim to evaluate the future perspectives for improving the overall quality and usability of cluster analyses. All results and the framework are freely available: http://clusteval.sdu.dk/

Identification of genes involved in morphogenesis in vitro in Centaurium erythraea Rafn. as a model organism

Ana Simonović1, Milan Dragićević1, Giorgio Giurato2, Biljana Filipović1, Sladjana Todorović1, Milica Bogdanović1, Katarina Ćuković1, and Angelina Subotić1

1 Institute for Biological Research "Siniša Stanković", University of Belgrade, Bul. Despota Stefana 142, 11000 Belgrade, Serbia {ana.simonovic, mdragicevic, biljana.nikolic, slatod, milica.bogdanovic, heroina}@ibiss.bg.ac.rs 2 Genomix4Life Srl, Spin-Off of the Laboratory of Molecular Medicine and Genomics, University of Salerno, Baronissi (SA), Italy [email protected]

Abstract

Centaurium erythraea is an endangered medicinal plant with great regeneration potential and developmental plasticity in vitro [1]. Identification of genes involved in organogenesis and somatic embryogenesis (SE) is the first step towards elucidation of the molecular mechanisms underlying centaury's morphogenic plasticity. RNA from leaves (L), roots (R), embryogenic calli (EC), globular somatic embryos (GSE), cotyledonary somatic embryos (CSE) and adventitious buds (AB) was sequenced, resulting in 29-37 million reads/sample. Sequencing, de novo transcriptome assembly using Trinity, and annotation were performed by Genomix4Life. The reference transcriptome (142 Mbp) contained 160,839 Trinity transcripts comprising 105,726 "genes". Of the 160,839 transcripts, 44,288 had BLAST hits, 26,435 had GO Slim annotation, and 9,552 had GO mapping. The top-hit species was Coffea canephora. Relative expression was computed by aligning high-quality reads to the Trinity transcripts and is presented as TMM-FPKM. In each sample ≥30,000 transcripts were expressed. Transcripts involved in different morphogenetic paths were filtered using R. Potential SE markers (FPKM ≥1 in EC or GSE and ≥8x higher FPKM in EC or GSE than in L, R and AB) included 1989 sequences, such as LRR receptor-like PK, germin-like proteins, TFs WRKY, AINTEGUMENTA and others. There were 1203 transcripts important for later SE development, including seed storage proteins and expansins. Finally, 727 transcripts with at least 8x higher FPKM in AB than in other samples were considered important for organogenesis.

This work was supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia, Project TR-31019.
Keywords: organogenesis, somatic embryogenesis, RNA sequencing, transcriptome
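As a minimal illustration of the marker-selection step, the following Python sketch applies the two SE-marker thresholds quoted above to a table of FPKM values. The authors performed this filtering in R; the Python rendering, the dictionary layout, the function names `is_se_marker` and `select_se_markers`, the toy values, and the reading that a single embryogenic sample (EC or GSE) must satisfy both thresholds against the maximum of L, R and AB are all assumptions made here for illustration.

```python
# Hypothetical sketch of the SE-marker filter. Sample names follow the abstract;
# the table layout and the exact reading of the thresholds are assumptions.
EMBRYOGENIC = ("EC", "GSE")
BACKGROUND = ("L", "R", "AB")


def is_se_marker(fpkm, min_fpkm=1.0, fold=8.0):
    """Return True if some embryogenic sample passes both thresholds."""
    background_max = max(fpkm[s] for s in BACKGROUND)
    return any(
        fpkm[s] >= min_fpkm and fpkm[s] >= fold * background_max
        for s in EMBRYOGENIC
    )


def select_se_markers(table, **kwargs):
    """Keep the transcript IDs whose expression profile passes the filter."""
    return [tid for tid, fpkm in table.items() if is_se_marker(fpkm, **kwargs)]


# Toy FPKM values, invented purely for illustration.
toy = {
    "t1": {"L": 0.5, "R": 0.2, "AB": 1.0, "EC": 9.0, "GSE": 3.0, "CSE": 2.0},
    "t2": {"L": 5.0, "R": 4.0, "AB": 6.0, "EC": 10.0, "GSE": 12.0, "CSE": 1.0},
    "t3": {"L": 0.0, "R": 0.0, "AB": 0.0, "EC": 0.4, "GSE": 0.9, "CSE": 0.1},
}
print(select_se_markers(toy))
```

Comparing against `fold * background_max` rather than dividing avoids a division by zero when a transcript is not expressed at all in L, R and AB; the absolute FPKM floor then prevents such transcripts from passing on noise alone.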


References

1. Filipović, B.K., Simonović, A.D., Trifunović, M.M., Dmitrović, S.S., Savić, J.M., Jevremović, S.B., Subotić, A.R.: Plant regeneration in leaf culture of Centaurium erythraea Rafn. Part 1: The role of antioxidant enzymes. Plant Cell Tiss Organ Cult, 121(3), 703-719. (2015)

Mathematical Modeling of the Hypothalamic-Pituitary-Adrenal Axis Dynamics in Rats

Ana Stanojević1, Vladimir Marković1, Željko Čupić2, Stevan Maćešić1, Vladana Vukojević3, and Ljiljana Kolar-Anić1,2

1 University of Belgrade, Faculty of Physical Chemistry, Studentski trg 12-16, 11158 Belgrade, Serbia {ana.stanojevic, vladimir.markovic, stevan.macesic, lkolar}@ffh.bg.ac.rs 2 University of Belgrade, Institute of Chemistry, Technology and Metallurgy, Department of Catalysis and Chemical Engineering, Njegoševa 12, 11000 Belgrade, Serbia [email protected] 3 Karolinska Institute, Department of Clinical Neuroscience, Center for Molecular Medicine CMM L8:01, 17176 Stockholm, Sweden [email protected]

Abstract

The hypothalamic-pituitary-adrenal (HPA) axis is a dynamic regulatory network of biochemical reactions that integrates and synchronizes the functions of the nervous and endocrine systems at the organism level. In order to describe how this vast network of biochemical interactions operates, we have developed a nonlinear eleven-dimensional stoichiometric model that concisely describes the key biochemical transformations that comprise the HPA axis in rats. In a stoichiometric model of a biochemical system, the outcomes of complex biochemical pathways are succinctly described by stoichiometric relations. In this representation, substances that initiate, i.e. enter, a pathway are regarded as reactants; substances that are generated in a pathway are regarded as products; and the rates at which products of a pathway appear are jointly proportional to the concentrations of the reactants. In order to derive rate constants for specific biochemical reaction pathways, we have resorted to our recently developed nonlinear reaction model that concisely describes biochemical transformations in the HPA axis in humans. In this way, a mathematical framework is developed that describes, in the form of a system of ordinary differential equations (ODEs), the integration of the biochemical pathways that constitute the HPA axis on a chemical kinetics basis. This, in turn, allows us to use numerical simulations to investigate how the underlying biochemical pathways are intertwined to give an integral HPA axis response at the organism level to a variety of external or internal perturbators of the HPA dynamics. Because the HPA axis is a nonlinear dynamical network, its response is complex and often cannot be intuitively predicted; stoichiometric modeling can therefore be harnessed to gain additional insights into the dynamical functioning of this complex neuroendocrine system.
Keywords: Hypothalamic-pituitary-adrenal (HPA) axis, rats, nonlinear dynamical network, system of ordinary differential equations
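To make the rate law stated in the abstract concrete (product formation rates jointly proportional to reactant concentrations), consider a single hypothetical step A + B -> C with rate constant k, for which d[C]/dt = k[A][B] and d[A]/dt = d[B]/dt = -k[A][B]. The Python sketch below builds such right-hand sides from a list of reactions. It is a generic mass-action illustration only: the species A, B, C, the rate constant, and the function names are invented here and are not taken from the eleven-dimensional HPA model.

```python
# Generic mass-action sketch. The species and the rate constant are made up
# and are NOT part of the authors' HPA model.
def mass_action_rhs(conc, reactions):
    """Time derivatives of all species under mass-action kinetics.

    conc      -- dict: species name -> concentration
    reactions -- list of (reactants, products, k), where reactants and
                 products are dicts: species name -> stoichiometric coefficient
    """
    dcdt = {name: 0.0 for name in conc}
    for reactants, products, k in reactions:
        # Rate is proportional to the product of reactant concentrations.
        rate = k
        for name, coeff in reactants.items():
            rate *= conc[name] ** coeff
        for name, coeff in reactants.items():
            dcdt[name] -= coeff * rate
        for name, coeff in products.items():
            dcdt[name] += coeff * rate
    return dcdt


def euler_step(conc, reactions, dt):
    """Advance all concentrations by one explicit Euler step of size dt."""
    dcdt = mass_action_rhs(conc, reactions)
    return {name: conc[name] + dt * dcdt[name] for name in conc}


toy_reactions = [({"A": 1, "B": 1}, {"C": 1}, 2.0)]  # A + B -> C, k = 2
state = {"A": 1.0, "B": 0.5, "C": 0.0}
print(mass_action_rhs(state, toy_reactions))
```

Explicit Euler is used only to keep the sketch short; a nonlinear, possibly stiff system such as the one described would normally be integrated with an adaptive ODE solver.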

Chaos and symmetry in mathematical neural flow models

Rodica Cimpoiasu, Radu Constantinescu, and Alina Streche

University of Craiova, 13 A.I.Cuza, 200585 Craiova, Romania {rodicimp, rconsta, maria.alina2009}@yahoo.com

Abstract

There are many mathematical models that try to explain the propagation of the neural flow in terms of nonlinear dynamical systems. Our paper presents some results related to two such models. The first model consists of a system of 3 nonlinear ODEs of Lorenz type, and the second is described by a generalized nonlinear Boussinesq equation. In the first case, the neural flow is modeled by an electronic circuit generating chaotic signals, while in the second case the propagation of nerve pulses through neurons is likened to sound propagation in cylindrical bio-membranes. The symmetry group method yields important information about each of the two systems. We will focus on the chaotic behavior and control techniques for the first example, and on specific solitary wave solutions for the evolution described by the second system.
Keywords: nerve propagation models, Lie symmetry, chaos, solitary wave
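For readers unfamiliar with Lorenz-type systems, the sketch below integrates the classic Lorenz equations with a fixed-step fourth-order Runge-Kutta scheme. It is offered only as a generic example of a three-dimensional chaotic ODE system: the abstract does not give the authors' equations, so the specific right-hand side, the conventional parameter values (sigma = 10, rho = 28, beta = 8/3), and all function names are assumptions.

```python
# Classic Lorenz system as a stand-in for a "Lorenz-type" model; the actual
# equations and parameters used by the authors are not given in the abstract.
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the classic Lorenz system."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step for an autonomous system."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = f(state)
    k2 = f(shifted(state, k1, dt / 2))
    k3 = f(shifted(state, k2, dt / 2))
    k4 = f(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def trajectory(start, steps, dt=0.01):
    """Return the list of states visited, including the starting point."""
    points = [start]
    for _ in range(steps):
        points.append(rk4_step(lorenz, points[-1], dt))
    return points


path = trajectory((1.0, 1.0, 1.0), steps=1000)
print(len(path), path[-1])
```

Running `trajectory` from two starting points that differ by a tiny amount and comparing the resulting paths is a simple way to observe the sensitive dependence on initial conditions that characterises chaotic behavior.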


Graph theoretical analysis reveals: Women's brains are better connected than men's

Balázs Szalkai, Bálint Varga, and Vince Grolmusz

Eötvös Loránd University, Protein Information Technology Group, Pázmány Péter sétány 1/C, Budapest XI, Hungary {szalkai,balorkany,grolmusz}@pitgroup.org

Abstract

Deep graph-theoretic ideas in the context of the graph of the World Wide Web led to the definition of Google's PageRank and the subsequent rise of the most popular search engine to date. Brain graphs, or connectomes, are being widely explored today. We believe that non-trivial graph-theoretic concepts, as happened in the case of the World Wide Web, will lead to discoveries illuminating the structural and functional details of animal and human brains. In the present work we have examined brain graphs, computed from the data of the Human Connectome Project, recorded from male and female subjects between ages 22 and 35.

Significant differences were found between the male and female structural brain graphs: we show that the average female connectome has more edges, is a better expander graph, has larger minimal bisection width, and has more spanning trees than the average male connectome. Since the average female brain weighs less than the male brain, these properties show that the female brain has, in a sense, better graph-theoretical properties than the male brain [1].
Keywords: sex, brain, graph, graph theory, bioinformatics, MRI
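One of the quantities compared above, the number of spanning trees, can be computed exactly with Kirchhoff's matrix-tree theorem: it equals the determinant of the graph Laplacian with any one row and the corresponding column deleted. The Python sketch below illustrates that theorem on small toy graphs using exact rational arithmetic. It is not the authors' pipeline; the function name and the example graphs are assumptions, and for graphs of connectome size one would more plausibly work with the logarithm of the determinant in floating point.

```python
from fractions import Fraction


def count_spanning_trees(n, edges):
    """Count spanning trees of an undirected graph on nodes 0..n-1
    using Kirchhoff's matrix-tree theorem."""
    # Build the graph Laplacian L = D - A.
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    # Delete the last row and column, then take the determinant
    # by Gaussian elimination with exact fractions.
    m = [row[:-1] for row in lap[:-1]]
    size = n - 1
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)


complete_k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
cycle_c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(count_spanning_trees(4, complete_k4), count_spanning_trees(4, cycle_c4))
```

The two toy graphs share the same four nodes, so the comparison shows how adding edges raises the spanning-tree count; the expander and bisection-width measures mentioned in the abstract are not covered by this sketch.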

References

1. Szalkai, B., Varga, B., Grolmusz, V.: Graph Theoretical Analysis Reveals: Women's Brains Are Better Connected than Men's. PLOS ONE, DOI: 10.1371/journal.pone.0130045 (2015)

Comparative Connectomics: Mapping the Inter-Individual Variability of Connections within the Regions of the Human Brain

Bálint Varga

Eötvös Loránd University, Budapest, Hungary [email protected]

Abstract

The human braingraph, or connectome, is a description of the connections of the brain: the nodes of the graph correspond to small areas of the gray matter, and two nodes are connected by an edge if a diffusion MRI-based workflow finds fibers between those brain areas. We have constructed 1015-vertex graphs from the diffusion MRI brain images of 395 human subjects and compared the individual graphs with respect to several different areas of the brain. The inter-individual variability of the graphs within different brain regions was discovered and described. We have found that the frontal and the limbic lobes are more conservative, while the edges in the temporal and occipital lobes are more diverse. Interestingly, a "hybrid" conservative and diverse distribution was found in the paracentral lobule and the fusiform gyrus. Smaller cortical areas were also evaluated: the precentral gyri were found to be more conservative, and the postcentral and superior temporal gyri to be very diverse.


Viral: Real-world competing process simulations on multiplex networks

Petar Veličković, Andrej Ivašković, Stella Lau, and Miloš Stanojević

Computer Laboratory, University of Cambridge, Cambridge CB3 0FD, UK {pv273,ai294,sl715,ms2239}@cam.ac.uk

Abstract. Accurate modelling of spreading processes represents a crucial challenge of modern bioinformatics, particularly in the context of predicting the consequences of epidemics (e.g. the proportion of population infected at the critical point). A wide variety of frameworks have been established; especially, recent developments in multiplex networks allow for integrating several competing spreading processes and modelling their interactions more directly. However, the research developments so far have primarily been evaluated on randomly-generated networks and assumptions on network dynamics that are unlikely to correspond to actual human psychology. As a decisive step towards controlled experiments of this kind, we present Viral, a multiplex-network-guided system for real-world simulations of the competing processes of epidemics and awareness in modern society, based around a lightweight distributed Android application and a centralised simulation server, both of which are simple to set up and configure. Extensive logging facilities are provided for analysing the simulation results.
Keywords: multiplex networks, competing processes, epidemics, awareness, real-world simulations, Android

1. Introduction

Traditionally, epidemics modelling has been performed by way of single-layered networks, representing humans as nodes with a set of possible states they can be in (susceptible-infected-recovered (SIR) and its varieties being a popular choice) and allowing for disease to spread along the links of the network, representing pairs of people that come into physical contact. However, incorporating several competing processes into the model via multiplex networks [1] has been a topic of plentiful related research in recent years [2–6], showing an emergence of previously unseen important phenomena in epidemics-related networks.

Informally, a multiplex network is a multi-layered graph in which each layer is built over the same set of nodes, and there may exist edges between nodes in different layers. Here the nodes usually represent individuals in a population, while the layers usually correspond to the different processes under study.

This framework has thus far been almost exclusively applied to generated networks (common choices include Erdős-Rényi random graphs [7] and Barabási-Albert scale-free networks [8]), and assumptions on the network dynamics (such as the Markov property) that may not always correspond to human psychology are often made. With the primary aim of providing a complementary tool that allows researchers to further verify their predictions on real-world controlled experiments, we have developed Viral during the 24-hour Hack Cambridge hackathon (https://www.hackcambridge.com/), where it was commended as one of the top seven projects (out of ∼100 participating teams from top tier universities).

2. Multiplex network model

We consider a multiplex network setup with two layers (for the epidemics and awareness processes, respectively) over the same set of nodes, corresponding to individuals in the population. An SIS (susceptible-infected-susceptible) process is assumed for the epidemics layer, while a UAU (unaware-aware-unaware) process is assumed for the awareness layer (akin to the model used in [2]).

Along the awareness layer, knowledge of an epidemic can spread between individuals that exchange information. This is modelled implicitly: the individuals are allowed to communicate verbally and via social networks, requiring no additional state to be maintained for supporting it.

The layers influence one another in two critical ways: 1) a susceptible individual that is aware of an epidemic can get vaccinated, thus diminishing their probability of infection; 2) an individual that becomes infected will, with a fixed probability, become aware. The full network dynamics are illustrated by Fig. 1.

In order to discourage the "pack behaviour" in which awareness immediately fully spreads and everyone gets immunised early on, a novel component of our system encourages a proportion of the population to behave carelessly, by assigning them the negative role of an infector, whose purpose is to get as much of the population infected as possible until the round ends. All other (human) nodes are simply tasked with staying healthy until the end of the round.

[Figure: two-layer diagram with an awareness layer (node states U, A) and an epidemics layer (node states S, I, V); inter-layer transitions are labelled "immunisation" and "self-awareness".]

Fig. 1. Illustration of the underlying network dynamics assumed by Viral. Nodes take part in both the epidemics layer (SIS + vaccinated) and the awareness layer (UAU).


3. Implementation

Viral consists of two core components: the server and the Android application. The Android application represents a node in the network and broadcasts its current geolocation to the server, which is used to compute distances between nodes and obtain transmission probabilities for modelling the epidemics layer. The server simulates the state transitions (such as a change in awareness or physical state) and sends the node’s updated state to the Android application.

3.1. Server

The server communicates with the Android applications as well as simulates and maintains the network state. It also periodically appends the network state into a log file for the current session and provides a visualisation tool that displays the most recent state (created in publication-ready TikZ format; examples can be seen in the synthetic experiments' outputs in Section 4.2).

Simulating the epidemics layer is achieved by maintaining a matrix M of inverse-exponential distances between all pairs of nodes with

$$M_{ij} = k\,e^{-\lambda d_{ij}} \qquad (1)$$

where k > 0 and λ are server parameters, and d_{ij} is the great circle distance between the locations of node i and node j. The probability of activation for edge i ↔ j is given by normalising:

$$P_{ij} = \frac{M_{ij}}{\sum_{i,j} M_{ij}} \qquad (2)$$

This means that the likelihood of infection increases as the proximity between nodes increases, corresponding to an assumption of airborne transmission. An edge activated between a susceptible and an infected node leads to the susceptible node becoming infected with a specified probability (also a server parameter).
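Equations (1) and (2) can be sketched in a few lines of Python. The block below is an illustration under stated assumptions, not the Viral server code (which is not reproduced in the text): the haversine formula is used for the great circle distance, the normalising sum is taken over unordered pairs i < j, and the function names, the default parameter values, and the example coordinates are all hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def great_circle_distance(p, q):
    """Haversine distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def activation_probabilities(positions, k=1.0, lam=0.01):
    """Return {(i, j): P_ij} for all unordered node pairs i < j.

    Assumption: the sum in equation (2) runs over unordered pairs.
    """
    weights = {
        (i, j): k * math.exp(
            -lam * great_circle_distance(positions[i], positions[j])
        )
        for i, j in combinations(range(len(positions)), 2)
    }
    total = sum(weights.values())
    if weights and total == 0.0:
        raise ValueError("all weights underflowed to zero; use a smaller lam")
    return {pair: w / total for pair, w in weights.items()}


# Invented example coordinates (latitude, longitude in degrees).
positions = [(52.2050, 0.1190), (52.2051, 0.1190), (52.2100, 0.1190)]
for pair, prob in activation_probabilities(positions).items():
    print(pair, round(prob, 4))
```

Note that after the normalisation in equation (2) the factor k cancels, so in this form only λ controls how sharply the activation probability falls off with distance.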

3.2. Android application

The Android application consists of two main graphical components (Fig. 2):
– Initial screen: the first prompt which becomes visible to the user once the application is started; it allows the user to provide the hostname and port of a Viral server;
– Main screen: the screen responsible for showing all the necessary information received from the server, as well as allowing user input where necessary.

Once the hostname and port are provided via the initial screen, all the necessary components of the application are initialised. Thereafter, messages from the server can trigger updates to the main screen. Concurrently, when the position of the device is changed, its new geolocation is submitted to the server. In addition, the user can enter (and potentially be shown) a round-unique code in order to initiate vaccination; this code can be shared among users, implicitly simulating the awareness layer.


Fig. 2. A variety of screenshots of the Viral Android application. Left-to-right: the initial screen, followed by three different states of the main screen.

4. Usage

4.1. Installation

The full source code of Viral is hosted on the corresponding author's GitHub profile, at https://github.com/PetarV-/viral, and is licensed under the MIT license. The source may be downloaded as an archive from GitHub, or the repository may be directly cloned by running the following command within a terminal:

$ git clone https://github.com/PetarV-/viral.git

Detailed instructions for compiling and configuring the server, as well as setting up the Android application and configuring the synthetic clients used for the runs below, are provided in the README file of the repository.

4.2. Synthetic experiments

While the primary purpose of Viral is creating data from a controlled and real environment, it also supports the addition of bots (virtual participants), whom the server does not distinguish from users. In the current model, the bots perform random walks and periodically send position updates to the server. No other behaviour is given to the bots, other than them vaccinating themselves if they have access to the valid vaccine code and have the human role.

We have run our application on purely synthetic data for preliminary measurements. Some interesting cases of network behaviour (with different network parameters) can be seen in Fig. 3.

5. Conclusions

In this applications note we have presented Viral, a utility for performing real-world controlled experiments on epidemics spreading with configurable parameters, taking advantage of the Android platform and multiplex networks. To the best of our knowledge, it is the first of its kind, and should serve as both a valuable tool for bioinformaticians and a potential reference implementation for future advancements in the area of real-world simultaneous spreading process simulation. In particular, the choice and number of processes being considered should be extendable to other cases, such as simultaneously considering multiple transmission paths of a single disease [5] or multiple diseases [6]. We believe that the awareness component is also vital, and the framework provided by Viral for implicitly simulating it should prove highly valuable in all future extensions. Furthermore, we hope that the human/infector model considered in Section 2 will be a valuable first step towards accurately simulating the fact that a large proportion of the population acts fairly carelessly in the presence of an epidemic.

[Figure: three network diagrams of round endings, with nodes labelled by awareness state (U or A).]

Fig. 3. Examples of round endings. The first diagram shows a situation with parameters corresponding to a typical flu-like epidemic. The second diagram corresponds to a pandemic-like scenario, in which everybody who is not vaccinated becomes infected. The third diagram corresponds to a severe epidemic with ineffective vaccines. The node colours correspond to the colour coding from Fig. 1, and the link intensities correspond to proximities in the epidemics layer.

References

1. Kivelä, M., Arenas, A., Barthelemy, M., Gleeson, J. P., Moreno, Y., & Porter, M. A. (2014). Multilayer networks. Journal of Complex Networks, 2(3), 203-271.
2. Granell, C., Gómez, S., & Arenas, A. (2014). Competing spreading processes on multiplex networks: awareness and epidemics. Physical Review E, 90(1), 012808.
3. Buono, C., Alvarez-Zuzek, L. G., Macri, P. A., & Braunstein, L. A. (2014). Epidemics in partially overlapped multiplex networks. PloS one, 9(3), e92200.
4. Zhao, D., Wang, L., Li, S., Wang, Z., Wang, L., & Gao, B. (2014). Immunization of epidemics in multiplex networks. PloS one, 9(11), e112018.
5. Zhao, D., Li, L., Peng, H., Luo, Q., & Yang, Y. (2014). Multiple routes transmitted epidemics on multiplex networks. Physics Letters A, 378(10), 770-776.
6. Azimi-Tafreshi, N. (2015). Cooperative epidemics on multiplex networks. arXiv preprint arXiv:1511.03235.
7. Erdős, P., & Rényi, A. (1959). On random graphs. Publicationes Mathematicae Debrecen, 6, 290-297.
8. Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509-512.

White-Box Predictive Algorithms for Predicting Disease States on Gene Expression Data: From Component Based Design to Meta Learning

Milan Vukicevic, Sandro Radovanovic, Boris Delibasic, and Milija Suknovic

University of Belgrade, Faculty of Organizational Sciences, Jove Ilica 154, Belgrade, Serbia milan.vukicevic, sandro.radovanovic, boris.delibasic, [email protected]

Abstract

The white-box, or reusable-component-based, approach to the design and application of predictive algorithms was recently proposed and offers numerous advantages over traditional (black-box) design: development of algorithms on a common basis, seamless design of a large number of hybrid algorithms, fair performance comparison, increased interpretability of the results, and easier adoption in practice. In this paper we will showcase the possibilities of white-box clustering algorithm design for predicting disease states based on gene expression microarray data in two ways. First, we will design a large number of hybrid clustering algorithms in order to better adapt the models to the data at hand. Second, we will exploit the models and their evaluations to build a meta-learning system that will allow efficient performance estimation of models on new gene expression microarray data.
Keywords: white-box, clustering, gene expression, microarray, disease state prediction


POSTER SESSION

Machine learning-based approach to help diagnosing Alzheimer’s disease through spontaneous speech analysis

Jelena Graovac, Jovana Kovačević, and Gordana Pavlović Lažetić

Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia {jgraovac,jovana,gordana}@matf.bg.ac.rs

Abstract

Alzheimer's disease and other dementias have been recognized as a major public health problem among the elderly in developing countries. We address this issue by exploring automatic noninvasive techniques for diagnosing patients through analysis of spontaneous, conversational speech. The techniques we propose are variants of the n-gram based kNN and SVM machine learning techniques. Since we use byte-level n-grams, we do not use any language-dependent information, including word boundaries, character case, white-space characters or punctuation [1].

Twelve adults diagnosed with dementia of Alzheimer type (DAT) participated in the study. All DAT participants were interviewed at an adult day care center for people with Alzheimer's disease or dementia in Novi Sad, the only institution of its kind in Serbia. All interviews were audio-taped, transcribed verbatim by a trained researcher, and checked for accuracy by the authors. Means for the Mini-Mental Status Exam distinguished the two groups: moderate and mild.

Our plan is to compile a control dataset based on interviews with healthy elderly people who do not differ significantly in age, sex or education level from the DAT participants. We plan to compare DAT and healthy elderly participants to test how well our techniques discriminate between these groups. We also plan to distinguish between the two groups of DAT participants. We have already performed some preliminary experiments along these lines, with promising results.

We hope that our techniques will prove promising as additional diagnostic and prognostic tools that may help earlier diagnosis of DAT and the determination of its degree of severity.
Keywords: dementia of Alzheimer type, automatic diagnostics, natural language processing, machine learning
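To illustrate what a byte-level n-gram representation looks like in practice, the following Python sketch builds normalised n-gram frequency profiles directly from the UTF-8 bytes of a transcript and assigns a label by nearest neighbour. It is a minimal sketch under stated assumptions: the abstract does not specify the dissimilarity measure, the value of n, or the profile length, so the relative-difference measure, the defaults, the function names and the toy strings below are all chosen here for illustration only.

```python
from collections import Counter


def byte_ngram_profile(text, n=3, top=None):
    """Normalised frequencies of byte-level n-grams of a UTF-8 encoded text."""
    data = text.encode("utf-8")
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    total = sum(counts.values())
    most_common = counts.most_common(top)  # top=None keeps every n-gram
    return {gram: c / total for gram, c in most_common}


def profile_dissimilarity(p, q):
    """Sum of squared relative frequency differences over the union of n-grams."""
    score = 0.0
    for gram in set(p) | set(q):
        a, b = p.get(gram, 0.0), q.get(gram, 0.0)
        score += ((a - b) / ((a + b) / 2)) ** 2
    return score


def nearest_label(text, labelled_texts, n=3):
    """1-nearest-neighbour label for `text` among (label, text) examples."""
    target = byte_ngram_profile(text, n)
    return min(
        labelled_texts,
        key=lambda item: profile_dissimilarity(
            target, byte_ngram_profile(item[1], n)
        ),
    )[0]


# Toy strings with neutral labels, invented purely for illustration.
examples = [("A", "aaaa aaaa"), ("B", "bbbb bbbb")]
print(nearest_label("aaa", examples, n=2))
```

Because the profile is computed over raw bytes, spaces and punctuation are treated like any other byte and no tokenisation or case folding is needed, which is the language-independence property the abstract refers to.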

References

1. Thomas, Calvin et al.: Automatic detection and rating of dementia of Alzheimer type through lexical analysis of spontaneous speech. Mechatronics and Automation, 2005 IEEE International Conference. Vol. 3. IEEE (2005)

Targeted resequencing in diagnostics of inherited genetic disorders

Jelena Kusic-Tisma1, Nikola Ptáková2, A. Divac1, M. Ljujic1, Lj. Rakicevic1, M. Tesic3,4, N. Antonijevic3,4, S. Kojic1, Milan Macek Jr.2, and D. Radojkovic1

1 Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Vojvode Stepe 444a, 11000 Belgrade, Serbia [email protected]
2 Department of Biology and Medical Genetics, University Hospital Motol and 2nd School of Medicine, Charles University, Prague, Czech Republic
3 Clinic for Cardiology, Clinical Center of Serbia, Belgrade, Serbia
4 Faculty of Medicine, University of Belgrade, Belgrade, Serbia

Abstract

Next-generation sequencing technologies have made genetic testing a powerful and cost-effective new tool in the diagnostics of inherited diseases with locus and allelic heterogeneity. In this study we performed targeted resequencing in two groups of patients with different disorders: cystic fibrosis (CF, ORPHA586) and hypertrophic cardiomyopathy (HCM, ORPHA217569). Patients with CF were analyzed using the CFTR MASTR kit (Multiplicom). The technology is based on multiplex PCR amplification of the coding regions of the CFTR gene and selected intronic variants, resulting in 48 amplicons with an average size of 460 bp. The generated amplicon library was paired-end sequenced on a MiSeq system (Illumina Inc., San Diego, CA) using MiSeq Reagent Kit v2 (2x250 cycles). Fastq files produced by library sequencing were processed with Sequencing Pilot software v 3.5.0 (JSI MedicalSystems). Data analysis included trimming of the PCR primer sequences from the reads. Evaluation of detected variants for disease relevance was based on the CFTR databases http://www.genet.sickkids.on.ca/app and http://www.cftr2.org. Patients with HCM were analyzed with the TruSight Cardiomyopathy Sequencing Panel (Illumina). Library preparation was based on target enrichment by hybridization. The target region covered 46 genes of interest (246 kb in total). The resulting library was paired-end sequenced on a MiSeq system (Illumina Inc., San Diego, CA) using MiSeq Reagent Kit v2 (2x250 cycles). Alignment of Fastq files and variant calling were done on the instrument by MiSeq Reporter software v2.5. Generated VCF files were annotated with VariantStudio (v2.2). Detected variants were assessed for pathogenicity using the guidelines of the American College of Medical Genetics and Genomics [1]. NGS technology in combination with a well-characterized, clinically relevant gene variation database is a good alternative to time-consuming stepwise testing of genes with large allelic heterogeneity such as CFTR.
The absence of such databases for HCM renders variants in insufficiently studied genes difficult to interpret, increasing the likelihood that they are classified as variants of unknown significance (VUS) and that the genetic basis of the disease remains undetermined.

References

1. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in Medicine (2015) 17, 405–423, doi:10.1038/gim.2015.30

A biologically-inspired model of visual word recognition

Yair Lakretz1, Naama Friedmann1, and Alessandro Treves2

1 Tel-Aviv University, Tel-Aviv, Israel [email protected] 2 SISSA, 265 Via Bonomea, Trieste, Italy

Abstract. We present a computational model of visual word recognition. The model is biologically inspired, incorporating plausible cortical dynamics, thus adding to previous studies, which have used connectionist or ’box-and-arrow’ type models. We begin by exploring several methods to represent letter identities in an artificial neural network, in order to identify the method that best agrees with experimental findings and computational constraints. In the self-organization process of a multilayer neural network, letter-identity and letter-position representations are further processed to create word representations. These correspond to word memories in an orthographic lexicon, as described in neuropsychological models, and function as attractors of the neural network. Simulations show normal reading by the network in the absence of noise or deficits. When noise or deficits are introduced, the network produces failures such as letter transposition or letter substitution, which are similar to those made by dyslexics with letter-position dyslexia and letter-identity dyslexia, respectively.

Keywords: Reading, attractor neural networks, dyslexia

1. Introduction

Reading is a complex skill. It requires the brain to perform multiple processes such as graphical pattern recognition, extraction of meaning, word production and more, all in parallel and in a strikingly short time. The first stages of the process of reading include the encoding of letter identities, letter position processing, and the composition of letters into words. Neuropsychological studies have shown that these functions can be selectively impaired and give rise to specific dyslexias [1]. Most importantly for the current study, a dyslexia has been identified in which letter position encoding is impaired [2–4]. Several computational models for visual word recognition (VWR) have been proposed in the literature (see [5] for a review). Although insightful and comprehensive, these models shed little light on how the brain performs these tasks. We hereby present a model that brings cognitive models together with plausible brain dynamics. These are modeled in an attractor dynamics network consisting of graded-response neurons with a threshold-linear activation function [6]. The model addresses the question of how these processes are executed at the neuronal level, including possible failures in processing due to noise or deficits.


2. The model

2.1. Letter representations

This section investigates the very first stage of reading, from the printed word to the level of letter representation. The activation created by a printed-letter input in an early visual stage is modeled by ascribing a list of factors to a letter from all possible graphical features in a letter (figure 1A). Factors in turn create activation in a higher layer, which we name the letter layer. Letter representations are then used to compose written words at the orthographic lexicon.

Fig. 1. (A) Example of visual factors in the representation of a Hebrew letter. (B) Letter similarity among all letter pairs in Hebrew as judged by Hebrew readers. Scores are between 1 and 10. (C) Multidimensional scaling of all 27 Hebrew letters.

In order to reduce interference in memory retrieval between words in the lexicon, letter representations should be as little correlated as possible. We examine and compare several methods for the generation of letter representations in this early stage of reading. All methods are taken to be simple abstractions of possible neuronal processes in the brain:

Constituting factors. This method assumes a two-layer architecture: a factor layer and a letter-representation layer. Each feature in the graphical form of a letter is represented in the model as a unit in the factor layer. Letters that have the same features will hence share the corresponding active units. Each unit in the factor layer in turn creates activation in a predefined random subset of units in the letter layer.

Renormalization. As in the first method, each printed-letter input creates activation in a factor layer. In this case, however, the contribution of each factor to the final representation is scaled by a factor that is inversely proportional to its appearance in other letters. Salient features will therefore have higher weight in the final letter representation. In addition, a competition between neighboring features occurs, leaving in that neighborhood only the features that are most salient.

Intermediate sub-network layer. A third layer between the factor and letter layers is added. This layer is composed of several sub-networks; each sub-network corresponds to a receptive field (RF), which is a surrounding of neighboring cells. The size of the RF is a parameter of the model. The optimal value of this parameter will be investigated below. Each factor is connected to a random subset of units in the sub-network layer. The size of this subset (UPF, units per factor) is another parameter of the model, to be determined later. The connections between the sub-network layer and the letter representation layer are set in the following way: each weight between a unit in the sub-network layer and the representation layer is inversely proportional to the accumulated activation in that sub-network unit across all letters. That is, popular units in the sub-network are less dominant in the final pattern of activation. Therefore, similar to the second method, salient features have higher weight in the letter representation than features with high occurrence.
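The inverse weighting used in the third method can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: receptive fields are ignored, every factor contributes one unit of activation, and the function name, the default sizes and the toy letter-to-factor mapping are hypothetical.

```python
import random


def build_letter_representations(letters, n_units=200, upf=20, seed=0):
    """Build letter patterns with weights inverse to accumulated activation.

    letters: dict mapping a letter name to the set of its factor names.
    upf: number of sub-network units each factor projects to.
    """
    rng = random.Random(seed)
    factors = sorted({f for fs in letters.values() for f in fs})
    # Each factor projects to a fixed random subset of `upf` sub-network units.
    projection = {f: rng.sample(range(n_units), upf) for f in factors}

    # Raw activation of every sub-network unit for every letter.
    raw = {}
    for name, fs in letters.items():
        act = [0.0] * n_units
        for f in fs:
            for u in projection[f]:
                act[u] += 1.0
        raw[name] = act

    # Accumulated activation of each unit across all letters.
    total = [sum(raw[name][u] for name in letters) for u in range(n_units)]
    # Weight inversely proportional to the accumulated activation.
    weight = [1.0 / t if t > 0 else 0.0 for t in total]
    reps = {name: [a * w for a, w in zip(act, weight)]
            for name, act in raw.items()}
    return reps, total, weight
```

Units driven by a factor that occurs in many letters accumulate more activation and therefore receive a smaller weight, which is the sense in which popular units are less dominant in the final pattern.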

Figure 2 presents correlation matrices for the 27 letters in the Hebrew alphabet for the three methods. For each correlation matrix (top), the result of a corresponding full-cue retrieval test is presented (bottom). A full-cue test is done by presenting the network with a full cue of the printed letter and counting the number of times successful retrieval occurs. Results show that low values in the correlation matrix correspond to high full-cue retrieval performance, and that the intermediate sub-network layer method achieves the best performance. We therefore focus on this method in what follows.

Fig. 2. Correlation matrices and full-cue retrieval test results for the three methods. (A) Constituting factors (B) Renormalization (C) Sub-network Layer (RF=1, UPF=100).

Note, however, that in addition to low correlations between letter representations, we require that similarity between these representations corresponds to letter similarity as judged by readers. That is, taking both constraints into consideration, letter representations cannot be completely orthogonal. Figure 1B presents average similarities between letters as judged by 30 subjects. In this test, subjects were asked to judge similarities among all letter pairs in the Hebrew alphabet. We use these data to determine the optimal model parameters by choosing the values that: (a) maximize the correlation between letter similarities according to the model and the experimental data; and (b) minimize the mean correlation between letter representations of the model.

2.2. Word representations and the learning stage of the network

A full description of the composition of words from letters is beyond the scope of this report. The process of composition is done in two steps. First, for each serial letter, letter identity and position are composed together. This is done through a competition process between letter-identity and letter-position activations. Next, all resulting letter-in-their-position representations are composed together. This is done by a similar competitive process, eventually creating the desired word representation. Importantly, units that encode letter position are the same units that encode letter identity, which is presumably the case in the brain.

The final word representations undergo a follow-up self-organization process, which reduces redundant correlations between the representations. In this process, a multilayer neural network, endowed with Hebbian learning and synaptic scaling, is repeatedly presented with word patterns in a random order. The resulting word representations are finally stored in the final layer of the network, which we name the word layer, and function as its attractor states.

2.3. Architecture and dynamics

The complete architecture of the model is a multilayer network, starting at the factor layer and ending at the word layer. Units in the network are graded-response neurons, that is, a positive continuous variable $V_i$ that is proportional to the activity of the neuron is assigned to every unit. This is in accordance with an interpretation of $V_i$ as a mean firing rate. The updating of the network assumes a threshold-linear activation function: $V(t) = g\,(h(t) - \theta)\,\Theta(h(t) - \theta)$, where $\Theta$ is the Heaviside step function; $h(t)$ is the local field, which in the word layer amounts to summation over all excitatory inputs, $h_i = V_{\mathrm{input}} + \sum_j W_{ij} V_j$; $\theta$ is the threshold below which there is no output; $g$ is a gain factor; and $W_{ij}$ are the synaptic weights as defined below. Each update step is followed by a competitive process which brings the sparseness of the network to a constant value. The sparseness $a$ is defined as $a = \left(\sum_i V_i / N\right)^2 / \left(\sum_i V_i^2 / N\right)$, which in the limit of the binary case is equivalent to the fraction of active units. We set this value to $a = 0.25$, which is in the range of plausible cortical values. This competitive process represents inhibitory feedback regulation on the activation of the network, and it operates by adjusting the threshold and gain parameters of the threshold-linear activation function. Connections between neurons at the word layer are set according to a covariance Hebbian rule: $W_{ij} = \frac{1}{a} \sum_\mu \xi_i^\mu (\xi_j^\mu - \bar{\xi})$, where $\xi^\mu$ is the $\mu$'th word pattern and $\bar{\xi}$ is the mean across all words. After the learning stage described above is over, new words can be presented to the network. Activations created by the printed word flow from the factor layer, in a feed-forward manner, to the word layer, finally converging according to the above dynamics. The resulting pattern can then be compared to the stored memory patterns in the lexicon.
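The quantities defined in this section can be written out as a short Python sketch. It is a simplified illustration rather than the simulation code: the competitive adjustment of threshold and gain is omitted, self-connections are excluded by assumption, and the mean activity is taken as a single scalar over all words and units, which is one possible reading of the definition above.

```python
def threshold_linear(h, gain=1.0, theta=0.0):
    """V = g * (h - theta) where h exceeds theta, and 0 otherwise."""
    return [gain * (x - theta) if x > theta else 0.0 for x in h]


def sparseness(v):
    """a = (sum_i V_i / N)^2 / (sum_i V_i^2 / N)."""
    n = len(v)
    mean_sq = sum(x * x for x in v) / n
    return (sum(v) / n) ** 2 / mean_sq if mean_sq > 0 else 0.0


def covariance_weights(patterns, a):
    """W_ij = (1/a) * sum_mu xi_i^mu * (xi_j^mu - xi_bar).

    Assumptions: no self-connections, xi_bar is a scalar mean activity.
    """
    n = len(patterns[0])
    xi_bar = sum(sum(p) for p in patterns) / (len(patterns) * n)
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * (p[j] - xi_bar) / a
    return w


def update(v, w, v_input, gain=1.0, theta=0.0):
    """One synchronous step: h_i = V_input_i + sum_j W_ij V_j, then threshold."""
    n = len(v)
    h = [v_input[i] + sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
    return threshold_linear(h, gain, theta)
```

For a binary pattern such as [1, 1, 0, 0], `sparseness` returns 0.5, the fraction of active units, in agreement with the binary limit mentioned in the text.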


Since similarities among letter identities and among letter positions are incorporated in the model, the network exhibits several phenomena: under noise conditions, a printed-word input can cause the network to converge to an incorrect attractor, which corresponds to the printed word with transposed positions of letters, or to a word in which one letter is replaced by a similar one. These phenomena depend on the amount of noise and deficits presented to the network, and correspond to errors as described in dyslexia [2, 7, 8]. Further work is required to compare error statistics of the network to those found in reading test results.

3. Summary

We have presented a biologically-inspired computational model of visual word recognition. We have explored several methods for the representation of letter identity. The method that achieved the best performance was that of adding an intermediate layer with sub-networks, which correspond to visual receptive fields. The optimal parameter values of this method were determined by two constraints: (a) low correlations between letter representations (to improve the memory capacity of the neural network); and (b) high correlation between similarity relations among letter representations in the model and those found in behavioral tests. Simulations in the absence of noise or deficits show almost perfect retrieval of word memories. When noise or deficits are presented, the network exhibits reading errors such as letter transposition or substitution, similarly to dyslexics. A full report of the results of the simulations will be presented elsewhere.

References

1. Friedmann, Naama and Coltheart, Max and Bar-On, A and Ravid, D: Types of developmental dyslexia, Handbook of communication disorders: Theoretical, empirical, and applied linguistics perspectives, eds A. Bar-On and D. Ravid (De Gruyter Mouton)
2. Friedmann, Naama and Gvion, Aviah: Letter position dyslexia, Cognitive Neuropsychology, Vol. 18, No. 8, pp. 673–696, Taylor & Francis, 2001
3. Friedmann, Naama and Rahamim, Einav: Developmental letter position dyslexia, Journal of Neuropsychology, Vol. 1, No. 2, pp. 201–236, Wiley Online Library, 2007
4. Friedmann, Naama and Rahamim, Einav: What can reduce letter migrations in letter position dyslexia?, Journal of Research in Reading, Vol. 37, No. 3, pp. 297–315, Wiley Online Library, 2014
5. Norris, Dennis: Models of visual word recognition, Trends in Cognitive Sciences, Vol. 17, No. 10, pp. 517–524, Elsevier, 2013
6. Treves, Alessandro: Graded-response neurons and information encodings in autoassociative memories, Physical Review A, Vol. 42, No. 4, pp. 2418, APS, 1990
7. Brunsdon, Ruth and Coltheart, Max and Nickels, Lyndsey: Severe developmental letter-processing impairment: A treatment case study, Cognitive Neuropsychology, Vol. 23, No. 6, pp. 795–821, Taylor & Francis, 2006
8. Friedmann, Naama and Biran, Michal and Gvion, Aviah: Patterns of visual dyslexia, Journal of Neuropsychology, Vol. 6, No. 1, pp. 1–30, Wiley Online Library, 2012

Crystallographic study on CH/O interactions of aromatic CH donors within proteins

J. Lj. Dragelj1, Ivana M. Stanković2, D. M. Božinovski3, T. Meyer1, Dušan Ž. Veljković3, Vesna B. Medaković3, Ernst Walter Knapp1, and Snežana D. Zarić3

1 Fachbereich Biologie, Chemie, Pharmazie/Institute of Chemistry and Biochemistry, Freie Universität Berlin, Fabeckstrasse 36A, Berlin, Germany
2 ICTM, University of Belgrade, Njegoševa 12, Belgrade, Serbia
3 Department of Chemistry, University of Belgrade, Studentski trg 16, Belgrade, Serbia [email protected]

Abstract

CH/O interactions represent weak hydrogen bonds that stabilize protein structures, where they contribute up to 25% of the total number of detected hydrogen bonds. Previously, we showed that CH/O interactions do not show a strong preference for linear contacts and that the energy of CH/O interactions of aromatic CH donors depends on the type of atom or group in the ortho-position to the interacting CH group [1, 2]. In this work, CH/O interactions of aromatic CH donors within proteins have been studied by analyzing the data in the Protein Data Bank (PDB) and by quantum chemical calculations of electrostatic potentials. The CH/O interactions were studied between three aromatic amino acids (phenylalanine, tyrosine and tryptophan) and several acceptors. The analysis of the distribution of the C-H···O angle in the crystal structures from the PDB indicates no preference for linear CH/O interactions between aromatic donors and acceptors in protein structures. Although there is no tendency for linear CH/O interactions, there is no significant number of bifurcated CH/O interactions. The analyses also indicate an influence of simultaneous classical hydrogen bonds. The influence is particularly observed in the case of tyrosine. The hydroxyl group of the aromatic ring of tyrosine plays an important role by forming a simultaneous classical hydrogen bond along with a CH/O interaction in the ortho-position to the OH substituent. These investigations could help in future studies of CH/O interactions in proteins or other proteic systems.

Keywords: Aromatic amino acids; CH/O interactions; Hydrogen bond; PDB

References

1. Veljković, D. Ž., Janjić, G. V., Zarić, S. D.: Are C-H···O interactions linear? The case of aromatic CH donors. CrystEngComm. 13, 5005-5010. (2011)
2. Dragelj, J. Lj., Janjić, G. V., Veljković, D. Ž., Zarić, S. D.: Crystallographic and ab initio study of pyridine C-H···O interactions: linearity of the interactions and influence of pyridine classical hydrogen bonds. CrystEngComm. 15, 10481-10489. (2013)

Dynamics of Escherichia coli type I-E CRISPR spacers over 42,000 years

Ekaterina Savitskaya1,2, Anna Lopatina2,3, Sofia Medvedeva1,3, Mikhail Kapustin1, Sergey Shmakov1, Alexey Tikhonov6, Irena I. Artamonova7,8,9, and Konstantin Severinov1,2,3,4,5

1 Skolkovo Institute of Science and Technology, Skolkovo, Russia [email protected]
2 Institute of Molecular Genetics, Russian Academy of Sciences, Moscow, Russia
3 Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia
4 Waksman Institute of Microbiology, Rutgers, the State University of New Jersey, USA
5 Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
6 Zoological Institute, Russian Academy of Sciences, St. Petersburg, Russia
7 N.I. Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
8 A.A. Kharkevich Institute of Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
9 M.V. Lomonosov Moscow State University, Faculty of Bioengineering and Bioinformatics, Moscow, Russia

Abstract

CRISPR-Cas systems defend prokaryotes against mobile genetic elements such as plasmids and phages. During the adaptation stage of the CRISPR-Cas immunity mechanism, new invader-derived sequences are integrated into genomic CRISPR arrays as spacers between CRISPR repeats. We compared spacers associated with type I-E E. coli CRISPR repeats from a baby Asiatic elephant from the Moscow zoo and from the baby mammoth Lyuba, which died about 42,000 years ago [1]. A PCR-based method with partially overlapping primers complementary to the type I-E E. coli CRISPR repeat was developed. High-density Illumina sequencing of the amplicons revealed tens of thousands of unique spacers. To reduce the diversity of spacers, k-means hierarchical clustering was applied. Surprisingly, most of these spacer clusters were common to the ancient and modern samples, indicating a lack of spacer turnover during the time separating the Lyuba mammoth and the present. Partial reconstruction of CRISPR arrays using known reference E. coli strains was performed. Most of the reconstructed arrays are unchanged between the mammoth and elephant samples. Thus, despite its adaptive potential, the immune repertoire of E. coli CRISPR-Cas system spacers did not significantly change in the course of 42,000 years.

Keywords: bioinformatics, CRISPR-Cas systems
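The step from amplicon reads to a list of unique spacers can be sketched as follows. This is an illustrative simplification, not the pipeline used in the study: it requires exact matches to the repeat, whereas real reads would need tolerance for sequencing errors and repeat variants, and the repeat passed in is whatever sequence the caller supplies.

```python
from collections import Counter


def extract_spacers(read: str, repeat: str) -> list:
    """Return the sequences flanked by a repeat on both sides within one read."""
    parts = read.split(repeat)
    # parts[0] precedes the first repeat and parts[-1] follows the last one,
    # so only the inner pieces are complete spacers.
    return [p for p in parts[1:-1] if p]


def count_unique_spacers(reads, repeat):
    """Count how often each distinct spacer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(extract_spacers(read, repeat))
    return counts
```

Comparing the key sets of two such counters, one per sample, gives the spacers shared between samples and those unique to each, which is the kind of comparison the abstract describes at the level of spacer clusters.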

References

1. Mueller, T. and Latreille, F.: Ice baby. Nat. Geogr. 215, 30–51 (2009).

De Novo Transcriptome Sequencing of Verbascum thapsus L. to Identify Genes Involved in Metal Tolerance

Filis Morina1, Marija Vidović1, Ana Sedlarević1, Ana Simonović2, and Sonja Veljović-Jovanović1

1 Institute for Multidisciplinary Research (IMSI), University of Belgrade, Serbia {filis,marija,ana.sedlarevic,sonjavel}@imsi.rs 2 Institute for Biological Research ’Sinisa Stankovic’, University of Belgrade, Serbia [email protected]

Abstract

Verbascum thapsus is a pioneer species and a successful colonizer of metal-polluted soils. Recently, we observed different degrees of metal tolerance at the physiological level in V. thapsus populations originating from metal-polluted and unpolluted soils [1, 2]. The aim of our work was de novo transcriptome assembly and annotation of V. thapsus leaf tissue using data collected from an RNA-Sequencing experiment. These results would enable identification of genes crucial for metal tolerance and redox homeostasis in this species. Sequencing, transcriptome assembly and annotation were done by Genomix4Life. Using ultra-high-throughput paired-end RNA sequencing on the Illumina platform, 45 million reads were obtained. The high-quality reads were used as input for transcriptome assembly with Trinity. The assembled transcriptome of 69 Mbp had a GC content of 41.37%, and its 73520 transcripts were grouped into 52204 "genes". The average and median contig lengths were 938 bp and 598 bp, respectively, and the N50 was 1160 bp. At least 41084 genes were expressed with FPKM ≥ 2. Of over 70,000 transcripts, 2,722 were Blasted without hits, 13,033 had Blast hits, 95 had Gene Ontology (GO) Slim annotation, 481 sequences had GO mapping, and 38,760 were matched in InterProScan but not blasted. The top-hit species was Sesamum indicum, which belongs to the same order, Lamiales, as V. thapsus. The sequences of the assembled transcripts were translated into proteins with Transdecoder based on a minimum open reading frame length of 100 aa. Transcripts were described in terms of their associated cellular component, biological process and molecular function. Functional annotation of 2524 expressed sequence tags was obtained using B2GO. This exhaustive annotation may offer a suitable platform for functional genomics, particularly useful for V. thapsus as a non-model species.

Keywords: Blast2GO, de novo transcript sequence annotation, Verbascum thapsus
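For readers unfamiliar with the assembly statistics quoted above, the following Python sketch shows how mean length, median length and N50 are computed from a list of contig lengths. The function names are illustrative; in practice such values are reported by the assembler's own statistics tools.

```python
from statistics import mean, median


def n50(lengths):
    """Largest length L such that contigs of length >= L cover half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def assembly_stats(lengths):
    """Summarize a list of contig lengths (in bp)."""
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        "n50": n50(lengths),
    }
```

Because N50 weights contigs by their length, it is typically larger than both the mean and the median, as in the values reported in the abstract (N50 of 1160 bp versus a mean of 938 bp and a median of 598 bp).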


References

1. Morina, F. and Jovanović, L. and Prokić, L. and Veljović-Jovanović, S.: Environmental Science and Pollution Research, DOI 10.1007/s11356-016-6177-4. (2016)
2. Morina, F. and Vidović, M. and Kukavica, B. and Veljović-Jovanović, S.: Botanica Serbica, 39(2). (2015)

De Novo Transcriptome Sequencing of Pelargonium zonale L. to Identify Genes Involved in UV-B and High Light Response

Marija Vidović1, Filis Morina1, Ana Sedlarević1, Ana Simonović2, and Sonja Veljović-Jovanović1

1 Institute for Multidisciplinary Research (IMSI), University of Belgrade, Serbia {marija,filis,ana.sedlarevic,sonjavel}@imsi.rs
2 Institute for Biological Research ’Siniša Stanković’, University of Belgrade, Serbia [email protected]

Abstract

The variegated Pelargonium zonale cv. "Frank Headley" is a periclinal chimera with white leaf margins, caused by the lack of functional chloroplasts in the mesophyll. Our previous ultrastructural, biochemical and physiological characterizations of the photosynthetic and non-photosynthetic leaf tissues revealed significant differences related to sugar, phenolic and antioxidative metabolism and to stomatal regulation [1, 2]. High light intensity and UV-B radiation induced different antioxidative and phenolic responses in these two tissues. The aim of our study was de novo transcriptome assembly and annotation of green leaf tissue of P. zonale. Using ultra-high-throughput paired-end RNA sequencing on the Illumina HiSeq 2500 platform, 43 million reads were obtained. The high-quality reads were joined and then used as input for transcriptome assembly with Trinity. The preliminary assembled transcriptome comprised about 73 Mbp in 83012 transcripts grouped into 60087 "genes". The mean GC content was 43.18%, the average contig length was 879 bp and the N50 was 1167 bp. At least 41084 genes were expressed at 2 FPKM or more. The sequences of the assembled transcripts were translated into proteins with Transdecoder based on a minimum open reading frame length of 100 aa. The software Blast2GO was used to associate a function with the set of identified transcripts. Of the 83000 transcripts, 7903 had Blast hits, while 5223 had Gene Ontology annotation (biological process, molecular function and cellular component). The majority of genes encoding enzymes involved in the response to UV-B radiation and high light intensity were identified. Further work should extend to the characterization of non-photosynthetic tissue, in order to further explore the tissue-specific regulation of antioxidative and phenolic metabolism.

Keywords: Blast2GO, de novo transcript sequence annotation, Pelargonium zonale
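The expression threshold quoted above relies on FPKM (fragments per kilobase of transcript per million mapped fragments). A minimal Python sketch of the standard formula and of the threshold filter follows; the function names and the input dictionaries are illustrative, and the abstract does not specify which tool computed the values or whether they were summarized per gene or per transcript.

```python
def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def expressed(counts, lengths, threshold=2.0):
    """Return the sorted ids of features whose FPKM is at least `threshold`.

    counts: dict id -> number of mapped fragments
    lengths: dict id -> feature length in bp
    """
    total = sum(counts.values())
    return sorted(
        t for t, c in counts.items()
        if fpkm(c, lengths[t], total) >= threshold
    )
```

For example, 100 fragments on a 1000 bp transcript in a library of one million mapped fragments give an FPKM of 100, so such a transcript would pass the threshold of 2 comfortably.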

References

1. Vidović, M. and Morina, F. and Milić, S. and Albert, A. and Zechmann, B. and Tosti, T. and Winkler, J. B. and Veljović Jovanović, S.: Plant Physiology & Biochemistry, 93, 44–55. (2015)


2. Vidović, M. and Morina, F. and Milić, S. and Vuleta, A. and Zechmann, B. and Prokić, Lj. et al.: Plant Biology, doi: 10.1111/plb.12429. (2015)

Protein Interaction Network Construction and Analysis Using the Quantitative Proteomics Data

Ozal Mutlu and Nagihan Gulsoy

Marmara University, Faculty of Arts and Sciences, Department of Biology, 34722, Goztepe, Istanbul, Turkey [email protected]

Abstract

Protein-protein interaction networks comprise complex molecular interactions among proteins related to many diseases and biological systems, including signaling, cellular growth, differentiation, cell death, and environmental and pathogenic stimuli. Understanding the complex system of protein associations could help to identify new protein-protein networks and biomarker proteins under disease and external stimulation conditions. In this study, we constructed and analyzed protein interaction networks from quantitative proteomics data of ZnO nanoparticle-exposed dermal fibroblasts to understand cellular responses and toxicity mechanisms. In the first step of the computational studies, the protein list with UniProt ID numbers was searched in STRING ver10.0 [1], and the network was then visualized and analyzed with Cytoscape ver3.2.1 [2] for general topological features. Based on gene ontology, five different biological processes were found: nonsense-mediated mRNA decay, protein localization to organelle, mitotic cell cycle phase transition, response to oxidative stress, and response to unfolded proteins. Cytoscape analysis showed that the top three proteins according to betweenness centrality were HSP90, RPL3 (60S ribosomal protein L3) and glutathione reductase. The network of proteins with decreased expression consisted of fewer proteins, with one central module related to cell growth, reproduction, cycle control and death. Taken together, the network results indicate that fibroblast cells respond to ZnO nanoparticles by activating endoplasmic stress and oxidative stress mechanisms. These results will help to understand the toxicity mechanisms of nanoparticles at the proteome level.

Keywords: protein-protein interaction network, String database, Cytoscape software, toxicoproteomics

Acknowledgements: This work was supported by the Marmara University (BAP FEN-C-DRP-110412-0102).
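Betweenness centrality, the measure used above to rank proteins, counts how many shortest paths between other node pairs pass through a node. The following Python sketch implements Brandes' algorithm for an undirected, unweighted graph. It is a generic illustration, not the Cytoscape implementation, and it returns unnormalized values, whereas network analysis tools often report a normalized score.

```python
from collections import deque


def betweenness_centrality(graph):
    """Unnormalized betweenness for an undirected, unweighted graph.

    graph: dict mapping each node to an iterable of its neighbours;
    every neighbour must also appear as a key.
    """
    score = {v: 0.0 for v in graph}
    for source in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}    # number of shortest paths from source
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                      # breadth-first search
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                      # back-propagate path dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}
```

In a three-node chain P1, P2, P3 (hypothetical node names), only the middle node lies on a shortest path between two others, so it receives a score of 1 and the end nodes receive 0.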

References

1. Szklarczyk, D. and Franceschini, A. and Wyder, S. et al.: STRING v10: Protein-Protein Interaction Networks, Integrated Over The Tree Of Life. Nucleic Acids Research, 43(Database issue), D447–D452, (2015)


2. Shannon, P. and Markiel, A. and Ozier, O. et al.: Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Research, 13(11), 2498–2504, (2003).

An optimal promoter description for bacterial transcription start site detection

Milos Nikolic, Tamara Stankovic, and Marko Djordjevic

Faculty of Biology, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia {milos.nikolic,dmarko}@bio.bg.ac.rs

Abstract

Accurately detecting transcription start sites (TSS) in bacteria is a starting point for understanding transcription regulation. It is an essential component of many bioinformatics applications, such as gene and operon prediction. Consequently, improving TSS prediction, which is a classical bioinformatics problem, is necessary, since currently available methods show poor accuracy. Different TSS prediction approaches use very different descriptions of the bacterial promoter structure. Which promoter features should be included in TSS recognition, and how their accuracy impacts the search, is therefore unclear. We address these questions using the examples of σ70 and σE (an alternative sigma factor) in E. coli. We find that the -35 element, which is considered exchangeable, contributes to the search accuracy as much as the ubiquitous -10 element (σ70) or more (σE). Furthermore, sequences upstream of the canonical -10 element notably contribute to the search accuracy, despite their relatively low conservation. The sequence of the spacer between the -35 and -10 promoter elements, which is commonly included in TSS detection, notably decreases the search accuracy for σ70 promoters, but improves the search accuracy for σE promoters. Overall, optimally implemented promoter features yield as much as a ∼50% reduction in false positives for σ70, compared to the standard promoter structure implemented in TSS searches [1].

Keywords: σ70 promoters, σE promoters, transcription start site detection, tran- scription initiation, promoter specificity

References

1. Nikolic, M. and Stankovic, T. and Djordjevic, M.: submitted (2016).

Chronic Treatment with Fluoxetine Led to Alterations in the Rat Hippocampal Proteome

Ivana Perić1, Dragana Filipović1, Victor Costina2, and Peter Findeisen2

1 Vinča Institute of Nuclear Sciences, University of Belgrade, Serbia {ivanap,dragana}@vinca.rs
2 Institute for Clinical Chemistry, Medical Faculty Mannheim of the University of Heidelberg, University Hospital Mannheim, Germany {victor.costina,peter.findeisen}@medma.uni-heidelberg.de

Abstract

Fluoxetine (Flx) is the first-line treatment for depression and anxiety [1]; however, the precise mechanism of its action remains elusive. We therefore aimed to identify protein expression changes regulated by Flx in the rat hippocampus using proteomics. Fluoxetine hydrochloride (15 mg/kg/day) was administered to adult male Wistar rats for 3 weeks, and protein patterns from hippocampal cytosolic, nuclear, and mitochondrial fractions were identified by one-dimensional gel electrophoresis followed by nano LC-MS/MS. All differential proteins were functionally annotated according to biological process and molecular function using UniProt and Blast2GO. Using this approach, we compared Flx-treated rats with vehicle-treated control rats.

The comparative study revealed that 67, 61 and 4 proteins were down-regulated and 168, 32 and 79 proteins were up-regulated in the cytosolic, nuclear and mitosol fractions, respectively. The prevalent biological processes of down-regulated proteins were, as expected, cellular process and single-organism process in all three fractions. For up-regulated proteins, besides cellular and single-organism process, the prevalent processes were biological regulation in the cytosolic fraction and metabolic process in the nuclear and mitosol fractions. The molecular functions of down- and up-regulated proteins were binding and catalytic activity in all hippocampal fractions. Pathway analysis of these differential proteins using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database showed that down-regulated proteins were mainly involved in amino acid biosynthesis and nucleotide metabolism, while up-regulated proteins participated mainly in amino acid biosynthesis, fatty acid metabolism, glycolysis/gluconeogenesis and signaling pathways.

The observed differences in protein expression patterns between the cellular compartments indicate that Flx led to alterations in the hippocampal proteome. This approach has provided new insight into the effects of Flx treatment on protein expression in a key brain region associated with stress response and memory.

Keywords: proteomics, fluoxetine, rat brain, bioinformatics

References

1. Tacke, U.: Fluoxetine: an alternative to the tricyclics in the treatment of major depression? Am J Med Sci 298:126–129. (1989)

A web-based tool for prediction of effects of single amino acid substitutions outside conserved functional protein domains

Vladimir Perovic, Ljubica Mihaljevic, Branislava Gemovic, and Nevena Veljkovic

Centre for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia [email protected]

Abstract

Single nucleotide polymorphisms (SNPs) are recognized as the main cause of human genetic variability. A non-synonymous SNP (nsSNP) is a single base change in the coding region of a gene that results in a single amino acid substitution (SAP) in the corresponding protein product. nsSNPs can significantly alter protein function and thus the cellular and organismal phenotype. The main challenge ahead is to differentiate "neutral" SNPs from "pathogenic" SNPs that confer susceptibility to Mendelian disorders, common complex diseases, and cancers. Tools for predicting these functional effects are mostly phylogeny-based, and as such have high accuracy in predicting disease-associated mutations at conserved positions in protein sequences. However, we have recently shown that their accuracy is significantly lower when classifying variations outside conserved functional domains (CFDs).

We developed a new tool for predicting the effects of single amino acid substitutions outside conserved functional domains, based on a model that relies on the informational spectrum method for sequence analysis. It is implemented in the Java programming language and is available as a user-friendly web service. As input, it takes the position of the variation and the substituted amino acid, and as output it provides a binary classification with a graphical representation of the variation.

Our tool was trained and tested on datasets of gene variations in the epigenetic regulators ASXL1, DNMT3A, EZH2, and TET2, a set of key biomarkers in myeloid malignancies. It significantly outperformed the state-of-the-art tools PolyPhen-2 and SIFT.

Protein-protein interaction prediction method based on principal component analysis of amino acid physicochemical properties

Neven Sumonja, Nevena Veljkovic, Sanja Glisic, and Vladimir Perovic

Centre for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia [email protected]

Abstract

Protein-protein interactions (PPI) are of utmost importance for processes in the cell and are key to understanding protein functions. Here, we propose a method for solving the PPI binary classification problem based on sequence information only. Numerous general sequence-dependent methods, despite using only a few amino acid (AA) physicochemical descriptors for sequence feature representation, have proven efficient in identifying novel PPIs. For that purpose, we propose a group of entirely novel AA descriptors, defined from 531 amino acid properties in the AAindex database by means of principal component analysis (PCA). As a first step in the sequence analysis, each sequence was transformed into a vector of numbers using the new AA feature representation. Then, the autocovariance function (ACF) was applied to the obtained vectors. Finally, amino acid composition (AAC) was combined with the ACF vector. Random forest models were trained and tested on independent sets containing yeast and human PPIs extracted from the Protein Interaction Network Analysis platform (PINA). In terms of computational efficiency and predictive performance, our approach outperformed similar state-of-the-art sequence-based methods. Being robust and fast, the model presented here can deepen our insight into the interactions of different model organisms.
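The feature construction described above can be sketched as follows. This is only an illustration under stated assumptions: `toy_scale` is an invented single descriptor (not one of the PCA-derived descriptors), the autocovariance uses one common definition (mean-centred products averaged over each lag), and the function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def autocovariance(values, max_lag):
    """Autocovariance of a numeric sequence for lags 1..max_lag (max_lag < len(values))."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]

def composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    counts = Counter(seq)
    return [counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS]

def sequence_features(seq, scale, max_lag):
    """Encode the sequence with one descriptor, then join ACF and AAC features."""
    encoded = [scale[aa] for aa in seq]
    return autocovariance(encoded, max_lag) + composition(seq)

# Hypothetical descriptor: each amino acid mapped to its index in AMINO_ACIDS.
toy_scale = {aa: float(i) for i, aa in enumerate(AMINO_ACIDS)}
features = sequence_features("ACDACD", toy_scale, max_lag=2)
```

With several descriptors, the ACF part would be computed once per descriptor and the results concatenated; the resulting vectors could then be passed to any random forest implementation.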

Basic Sequence Alignment Based Screening for Alternative Mannanase Producing Bacteria

Bojan D. Petrović and Zorica D. Knežević-Jugović

Faculty of Technology and Metallurgy, University of Belgrade, Karnegijeva 4, 11000 Belgrade, Serbia [email protected]

Abstract. The idea behind this work was to explore the genetic potential within publicly available data and collect the output for future research on novel mannanase-producing organisms. Mannanolytic enzymes can be applied in multiple industrial setups, but the present interest was narrowed down exclusively to applications relevant for improved detergent formulations. Based on the patent data available to date, protein sequences of two Bacillus sp. enzymes were probed for similarity against the non-redundant protein database available via the National Center for Biotechnology Information (NCBI). Thereafter, sequences were realigned and analyzed to compare the sequences of particular interest and to assess conserved regions and variations between species. Our results suggest that the bacterial strains of interest should be tested for mannanase activity and, if applicable, optimized for commercial enzyme production, as this would not be in conflict with the currently relevant patents.

Keywords: industrial enzymes, mannanase, detergents, sequence alignment

1. Introduction

Mannanase (mannan endo-1,4-beta-mannosidase, EC 3.2.1.78) is an enzyme that catalyses the random hydrolysis of (1→4)-beta-D-mannosidic linkages in mannans, galactomannans and glucomannans. The broad substrate specificity of β-mannanases enables a plethora of applications: as a hydrolytic agent in the detergent industry, in biobleaching of pulp and paper, in the improvement of animal feeds and in slime control, as well as in some emerging pharmaceutical applications [1]. The use of commercial enzyme preparations in detergents is a growing market, but for mannanase there is only one commercially available product: Mannaway®, produced by Novozymes (patent rights held by Novo Nordisk A/S, Denmark). This work uses the data from the patent datasheets and the fast-growing collection of sequences annotated in NCBI databases to propose alternative mannanase-producing organisms not yet covered by patent protection.

2. Materials and Methods

Protein sequence data were extracted from the relevant patent [2], and independent Basic Local Alignment Search Tool (BLAST) searches were performed using those sequences as queries. Search results that met the criteria given below (section 2.1) were examined further, and the corresponding protein coding domain sequences (CDS) were retrieved. Unless specified otherwise, all data manipulation was done as in our previous work, which used a similar methodology for multiple sequence alignment and comparison [3]. From the above-mentioned patent documentation, two protein sequences were chosen based on the producing organism. We chose to focus on Bacillus species (excluding well-known and commercially exploited producers such as B. subtilis, B. circulans, B. agaradhaerens, B. halodurans, B. licheniformis and B. cereus) because bacteria from the Bacilli group are already characterized as suitable producing organisms for the mannanase enzyme, providing a high activity yield. The relevant patents claim mannanase sequences and/or segments of mannanase sequences derived from the given strain (Bacillus sp. I633), as well as other wild-type and recombinant sequences that have a homology of 60% or more to the sequences claimed in those patents. Table 1 shows detailed data about the query sequences. The first query has a glycosyl hydrolase family 5 (GH5) cellulase domain, while the second has a GH5 cellulase domain and a cellulose binding module (CBM).

Table 1. Mannanolytic enzymes from Bacillus sp. extracted from the patent datasheets.

Nr  Organism           Length [AA]  GenBank Accession  Reference
1   Bacillus sp. I633  490          AAQ31834.1         Seq. #2 in [2]
2   Bacillus sp. I633  476          AAQ31835.1         Seq. #4 in [2]

2.1. BLAST

A standard protein-protein BLAST search [4] was performed using the queries given above. The parameters were as follows: the database was set to "non-redundant protein sequences (nr)", the organism filter was set to exclude the Bacillus/Staphylococcus group (taxid:1385), and the "blastp" algorithm was chosen with the following threshold values: max. target sequences: 100, expect threshold: 10, word size: 6, max. matches in a query range: 0. From each search result obtained (available in the supplementary material), subjects were picked according to the following criteria: "query cover" ≥ 70% and "ident" within the [45%, 60%) range. If multiple loci from the same source organism that encode the same protein product met these criteria, the sequence with the highest total score was chosen. The final list of sequences designated for downstream analyses is given in Table 2.
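The selection criteria above can be sketched as a small filter. This is a minimal illustration, not the script used in the study: the `Hit` record, the `select_hits` function and the example hits (organisms and accessions prefixed `HYP_`) are all hypothetical, and grouping is done by organism only, which simplifies the "same organism, same protein product" rule.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    organism: str
    accession: str
    total_score: float
    query_cover: float  # percent
    ident: float        # percent

def select_hits(hits, min_cover=70.0, ident_range=(45.0, 60.0)):
    """Keep hits with cover >= min_cover and ident in [lo, hi); best total score per organism."""
    lo, hi = ident_range
    best = {}
    for hit in hits:
        if hit.query_cover < min_cover or not (lo <= hit.ident < hi):
            continue
        current = best.get(hit.organism)
        if current is None or hit.total_score > current.total_score:
            best[hit.organism] = hit
    return sorted(best.values(), key=lambda h: h.total_score, reverse=True)

# Hypothetical hits, invented for illustration only.
hits = [
    Hit("Organism A", "HYP_A1", 444, 79, 56),
    Hit("Organism A", "HYP_A2", 300, 75, 50),  # same organism, lower total score
    Hit("Organism B", "HYP_B1", 500, 95, 60),  # ident of 60% falls outside [45, 60)
    Hit("Organism C", "HYP_C1", 388, 65, 45),  # query cover below 70%
    Hit("Organism D", "HYP_D1", 340, 94, 45),
]
selected = select_hits(hits)
```

The half-open identity range keeps candidates below the 60% homology threshold named in the patent claims while still requiring a recognisable similarity to the query.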

Table 2. Sequences filtered for further analyses, with BLAST data. The first three entries were obtained using query nr 1, while the remaining three were obtained using query nr 2.

Description                                                     Max score  Total score  Query cover  E value  Ident  NCBI Accession
mannan endo-1,4-beta-mannosidase (beta-mannanase)               444        444          79%          7e-148   56%    KHD15024.1
  (1,4-beta-D-mannan mannanohydrolase)
  [Clostridium butyricum]
endoglucanase [Pseudobacteroides cellulosolvens]                400        400          70%          1e-131   55%    WP_036945506.1
hypothetical protein [Deinococcus misasensis]                   388        388          93%          7e-126   45%    WP_051963127.1
1,4-beta-glucanase [Thermoanaerobacter cellulolyticus]          342        481          94%          2e-101   45%    WP_045165560.1
endo-1,4-beta-glucanase [Caldicellulosiruptor kronotskyensis]   340        476          94%          7e-101   45%    WP_013429869.1
endo-1,4-beta-glucanase [Caldicellulosiruptor bescii]           338        473          94%          6e-100   45%    WP_015908242.1

2.2. Sequence Alignment and Comparative Analyses

All the sequences were aligned using the Clustal W program implemented as an accessory application in BioEdit (version 7.2.5). Alignment files are available in the suppl. material. The aligned sequences underwent comparative analyses: a) Amino Acid Composition and b) Position Entropy were computed within the BioEdit platform. c) Conserved Regions Analysis was done within the BioEdit platform, using the following parameters: minimum segment length (actual for each sequence): 10, maximum average entropy: 0.5, gaps limited to 1 per segment, contiguous gaps limited to 1 in any segment. d) The Phylogenetic Tree among species, based on the average number of amino acid substitutions per site between species, was constructed using the unweighted pair group method with arithmetic mean (UPGMA) implemented in the Mega 3.1 software. The difference count matrix used to generate the tree is available in the suppl. material.
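To make analyses b) and c) concrete, the following sketch computes a per-column Shannon entropy and reports low-entropy stretches. It is not BioEdit's implementation: the natural logarithm, the treatment of gaps as an ordinary symbol, and the stricter per-column threshold (instead of a segment average with gap limits) are assumptions made to keep the example short, and the function names are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (natural log) of one alignment column; a gap counts as a symbol."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def alignment_entropy(aligned):
    """Entropy of every column of equally long aligned sequences."""
    return [column_entropy(col) for col in zip(*aligned)]

def conserved_segments(entropy, min_len=10, max_entropy=0.5):
    """Half-open (start, end) runs of at least min_len columns, each with entropy <= max_entropy."""
    segments, start = [], None
    for i, h in enumerate(entropy + [float("inf")]):  # sentinel closes a trailing run
        if h <= max_entropy:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments

# Toy alignment, invented for illustration.
aligned = ["ACDEFG", "ACDEYG", "ACDKWG"]
entropy = alignment_entropy(aligned)
segments = conserved_segments(entropy, min_len=3)
```

A fully conserved column has entropy 0, and a column with three different residues in three sequences has entropy ln 3, so lower values indicate stronger conservation.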

3. Results and Discussion

Amino acid composition is shown in Table 3. Even though the two groups of three sequences (with and without the CBM unit) differed in length by more than a factor of two, the interspecies variation is within naturally occurring differences, some of which may be important for enzyme stability and efficacy. As suggested by the positional entropy distribution (result not shown; Figure S3 available in the suppl. material), most of the putative conserved sites were clustered within the N-terminal side of the sequence alignment, up to the 600th residue. Conserved regions were found only within the GH5 domain. It is indicative that these six sites (Figure 1) are all clustered within the GH5 cellulase domain and may be a key basis for the catalytic activity. Some data suggest that a Glu residue within the conserved region might be involved in the catalytic mechanism [5]. Our data, shown in Figure 1, also pinpoint strongly conserved Asn, Val and Trp residues in the first conserved site; an EVHD motif and a strongly conserved Thr in the second; Asn, Ile, Glu, Gly and Trp in the third; Asp, Trp and Gly in the fourth; Asp, Pro, Asn, Phe and an HMY motif in the fifth; and finally an IGEF motif, as well as strongly conserved Leu and His residues, in the sixth conserved region. All of these spots are possibly heavily implicated in the catalytic mechanism, substrate specificity and native structure stability of the GH5 cellulase domain. Unlike the GH5 domain, the CBM units do not have any conserved spots, even though these modules generally share a large homology among species.

Table 3. Amino acid frequency analysis. All the frequencies are given in molar percent. EMW - estimated molecular weight of the protein (in kDa).

Amino acid  C. butyricum       P. cellulosolvens  D. misasensis      T. cellulolyticus  C. kronotskyensis  C. bescii
            gi723448912        gi739074211        gi917356415        gi771511740        gi503195208        gi506388523
Ala         7.02               8.31               11.07              7.00               7.48               7.42
Cys         1.06               1.36               0.60               0.55               0.55               0.54
Asp         7.66               6.36               3.42               5.51               5.46               5.49
Glu         4.04               2.93               3.02               4.01               3.98               3.86
Phe         2.34               2.93               3.82               2.68               2.42               2.47
Gly         10.21              10.02              8.85               7.40               7.56               7.65
His         1.28               1.47               1.81               1.34               1.33               1.31
Ile         7.66               8.07               4.83               6.45               6.55               6.41
Lys         8.09               8.07               4.23               5.59               5.69               5.64
Leu         5.32               6.11               5.23               5.74               5.38               5.41
Met         2.34               2.93               2.21               1.49               1.71               1.62
Asn         8.94               7.82               9.26               7.32               7.25               7.11
Pro         1.28               2.69               3.22               5.98               6.24               6.41
Gln         2.98               2.44               3.02               2.91               2.65               2.70
Arg         0.85               0.98               2.62               2.91               2.81               2.86
Ser         7.45               7.82               10.26              8.42               8.81               8.89
Thr         6.81               6.85               9.86               8.89               9.12               9.20
Val         5.74               7.09               7.04               7.47               6.63               6.72
Trp         2.77               3.18               3.02               3.15               3.12               3.09
Tyr         6.17               1.96               3.02               5.19               5.30               5.18
EMW         51.8               43.9               53.0               140.4              141.2              142.2
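For readers who want to reproduce the kind of frequencies listed in Table 3, a molar percent composition can be computed as sketched below. The function name and the toy sequence are hypothetical; the values in the table itself were obtained with BioEdit.

```python
from collections import Counter

def molar_percent(seq):
    """Molar percent of each residue type in a protein sequence (alignment gaps ignored)."""
    residues = seq.replace("-", "")
    counts = Counter(residues)
    total = len(residues)
    return {aa: round(100.0 * n / total, 2) for aa, n in sorted(counts.items())}

# Toy sequence, invented for illustration.
toy_composition = molar_percent("MKK-LAG")
```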

Fig. 1. Conserved regions within the GH5 domain of mannanolytic enzymes from selected taxa.

This observed diversity provides a phylogenetic overview of the selected taxa based on the mannanolytic CDS (Figure 2). However, it does not make for a good evolutionary marker, since this phylogeny does not fully correspond with the NCBI Taxonomy data. Such a result may be explained by the critically strong impact of environmental factors on the evolution of cellulolytic enzymes in general [6].

Fig. 2. UPGMA phylogenetic tree. This phylogeny does not fully represent the current generally accepted classification of the selected taxa.

4. Conclusion

This work, while utilizing simple and free bioinformatics tools, has provided a total of six potential candidate microorganisms for mannanase production, in a manner that would not compromise the patent protection currently active for Bacillus sp. mannanase sequences. All the species mentioned in this work have putative or proven mannanolytic protein sequences annotated publicly, and we found no reports in the literature that these bacterial strains could be pathogenic. This makes them solid candidates for mannanase production, as an alternative to well-known producers.

5. Supplementary Material

Raw sequence files, original full BLAST results scraped from the NCBI server, as well as proprietary alignment files, supplementary figure and the relevant patent booklet can be found online at http://db.tt/XNSDuXmy.

References

1. Chauhan, P. S. et al: Mannanases: microbial sources, production, properties and potential biotechnological applications. Appl Microbiol Biotechnol, 93: 1817–1830. (2012)
2. Kauppinen, M. S. et al: Novel mannanases. WO1999064619A2. (1999). [Online]. Available: http://www.google.co.ug/patents/WO1999064619A2?cl=en
3. Prekovic, S. et al: Bioinformatical and mathematical comparative analysis of ClpP exons and protein sequence. In: Zakrzewska, J. and Zivic, M. and Andjus, P. (eds.): Regional Biophysics Conference 2012, Book of Abstracts, Serbian Biophysical Society, Belgrade, 110. (2012)
4. Altschul, S. F. et al: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25: 3389–3402. (1997)
5. Py, B. et al: Cellulase EGZ of Erwinia chrysanthemi: structural organization and importance of His98 and Glu133 residues for catalysis. Protein Eng, 4(3): 325–333. (1991)
6. Aspeborg, H. et al: Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5). BMC Evolutionary Biology, 12:186. (2012)

Theoretical study on the role of aromatic amino acids in stability of amyloids

Dragan B. Ninković1,2, Dušan P. Malenov4, Predrag V. Petrović1,2, Edward N. Brothers2, Shuqiang Niu3, Michael B. Hall3, Milivoj Belić2 and Snežana D. Zarić2,4

1 Innovation Center, Department of Chemistry, University of Belgrade, Studentski trg 12–16, Belgrade, Serbia 2 Science Program, Texas A&M University at Qatar, Texas A&M Engineering Building, Education City, Doha, Qatar 3 Department of Chemistry, Texas A&M University, College Station, TX 77843-3255, USA 4 Department of Chemistry, University of Belgrade, Studentski trg 12–16, Belgrade, Serbia [email protected]

Abstract

Various neurodegenerative disorders, such as Alzheimer's and Parkinson's diseases, have been associated with amyloid fibril plaques. A widespread belief is that aromatic amino acid residues are crucial in the formation of the plaques, since they frequently occur in natural amyloids. However, it has been shown that amyloids can be formed from aliphatic peptides as well, and the issue is still under study. In the last few years, numerous studies have used various experimental and computational methods to investigate the role of aromatic amino acid residues in amyloid plaque formation.

We studied the influence of aromatic amino acids on amyloid formation using DFT methods to calculate interaction energies of peptide model systems with and without aromatic residues. We also analyzed the contributions of aliphatic-aliphatic, aromatic-aliphatic, and aromatic-aromatic interactions to the total interaction energy and the stability of the structure. The studied peptides were based on the crystal structures of amyloids available from the Cambridge Structural Database.

In model systems with aromatic amino acids, the calculations showed that the aliphatic-aliphatic contribution is the weakest, followed by aromatic-aromatic, while aromatic-aliphatic interactions make the strongest contribution to the total interaction energies. In model systems without aromatic amino acids, containing only peptides made of aliphatic amino acids, the interactions are as strong as in systems with aromatic amino acids.

The results of the calculations indicate similar stability of amyloids with and without aromatic amino acids, which supports findings that aromatic amino acids are not essential for amyloid formation.

Keywords: amyloids, DFT, noncovalent interactions

Construction of Amyloid PDB Files Database

Ivana Stanković and Snežana Zarić

1 ICTM, University of Belgrade, Njegoševa 12, Belgrade, Serbia ivana [email protected]
2 Department of Chemistry, University of Belgrade, Studentski trg 12-16, Belgrade, Serbia [email protected]

Abstract. Amyloids are insoluble proteins with a cross-β structure found as deposits in many diseases. They have been extensively examined structurally, but there is no unique structural database for amyloid proteins resolved at atomic resolution. Here, we present an amyloid database constructed using a keyword criterion as well as structural features of amyloids described in the literature. The search filter was implemented in Python. The total number of structures is 109. This database can support further general structural and statistical analysis of amyloids, since knowledge of the molecular basis can lead to an understanding of disease mechanisms related to amyloid proteins.

Keywords: database, protein structure, amyloid

1. Introduction

Amyloids are insoluble proteins with a cross-β structure found as deposits in many diseases, such as Alzheimer's, Parkinson's and Creutzfeldt-Jakob diseases and type II diabetes. They are also found in normal tissues (nails, spider webs, silk) because of their strong fibrillar nature. Among functional nanostructured materials with a significant impact in nanotechnology and biological environments, amyloid fibrils have attracted great attention because of their unique architectures and exceptional physical properties.

Short polypeptides, of at least 4 amino acids [1], self-assemble into β-sheets via backbone hydrogen atoms; several β-sheets then interact with each other in a parallel fashion via polypeptide side chains, forming long, linear, unbranched protofilaments with an axis nearly perpendicular to a polypeptide strand. Several protofilaments, the number being specific to the particular amyloid protein, form fibrils. All amyloid proteins, independently of their sequence, form a very similar structure, the cross-β structure, made of parallel arrays of β-strands. These structures differ only in the inter-sheet spacing, which depends on the side chain size, and in the morphology of the fibril [2].

Amyloids have been extensively examined structurally on an individual basis [3–5], but so far there is no systematic structural analysis of all resolved structures in the literature. There is no unique structural database for amyloid proteins resolved at atomic resolution. The Protein Data Bank (PDB) consists of nearly 120 000 3D shapes of proteins, nucleic acids and complex assemblies [6]. The PDB contains amyloid structures, but they are hard to find by a simple one-criterion search. PDB files are often not uniform in their use of the amyloid keyword. The molecules in .pdb files are often labeled by another name referring to an amyloid precursor or a disease, while the word amyloid may be mentioned only within descriptions such as the publication title, publication keywords or title section.

Another difficulty in constructing an amyloid database is that amyloid proteins exist in different conformations depending on conditions. In solution they may exist in a non-amyloidal conformation, forming helical or random coil secondary structure with no parallel fragments forming fibrils [7].

Here, we present an amyloid database constructed using a keyword criterion as well as structural criteria.

2. Methodology

Amyloid protein 3D structures were searched for in the Protein Data Bank (PDB) and in the Cambridge Structural Database (CSD). The search criterion for the CSD was any acyclic polypeptide 4 residues long with a nearly β-sheet structure. Eight structures were found, but with no proof of self-assembly in the published papers. The amyloid PDB subdatabase was made by searching the PDB for the keyword amyloid and for precursor names. Only β secondary structures or extended ones were taken. There are 109 structures found in the PDB, resolved by X-ray crystallography, solid-state NMR or solution NMR.

2.1. Online Search

The online search on the website http://www.rcsb.org/pdb/home/home.do gave us a list of PDB IDs of potential amyloid structures according to the name keyword. The search picked every structure in which a desired keyword appears. The keywords were amyloid and 38 amyloid precursor names. The precursor names were published recently in an editorial of Amyloid, The Journal of Protein Folding Disorders (Tables 1, 2 and 3 in [8]). These are all the known naturally occurring amyloids. By searching for files that contain the keyword amyloid, we also include the synthetic amyloids, described by keywords such as amyloid-like, amyloid-related and amyloidogenic.

We got 1218 structures in total. It is difficult to pick all the amyloid structures without also picking non-amyloid ones: not every structure mentioning the amyloid keyword is in fact an amyloid. The filtration steps in the next sections deal with the structural features of amyloids.

2.2. Excluding Helical Structures

It appears that amyloids are not exclusively β structures; there are also coil and extended peptides which pack in a parallel manner, forming long fibrils perpendicular to the peptide axis. This is why, in the first step of the filtration, we excluded only the helical structures, leaving the β-sheets and coils. The filtration was made using the TCL scripting language [9] command get structure incorporated in the VMD software [10]. The result was a total of 241 structures. This is still not the final database, as it contains non-parallel, globular protein arrangements.

2.3. Excluding Non-parallel Structures

We defined an amyloid structure as a structure which does not contain more than 1 non-parallel peptide fragment for every fragment in the whole structure. This is because there are highly ordered structures with alternating parallel and tilted fragments, as in PDB ID: 4UBZ [11]; thus amyloids can contain non-parallel fragments. On the other hand, there are parallel fragments in non-amyloidogenic structures, as they may contain β-sheets made of parallel β-strands, but these are mostly globular proteins. We distinguished them from amyloids as structures which contain more than 1 non-parallel fragment for each fragment, Fig. 1.

Fig. 1. Criterion for distinguishing amyloid structures from non-helical structures: an amyloid possesses at most 1 non-parallel fragment for each fragment in the whole structure.

Flat fragments were defined according to the Ramachandran backbone torsion angles found in the structures of 8 amyloid-β fragments published in [12]. Among these structures, there are β-sheets as well as curved coil fragments, with total torsion angle ranges of (-156°, -103°) for the ϕ angle and (104°, 154°) for the ψ angle. We expanded these ranges to include the fully extended peptide conformation, (ϕ, ψ) = (-180°, 180°), so the final ranges were ϕ = (-180°, -103°) and ψ = (104°, 180°). Furthermore, a fragment must be at least 4 amino acids long. The criterion for the parallelity of fragments was also taken from the 8 structures in [12]. In these structures the maximal difference in the distance between two Cα atoms belonging to two parallel fragments is 1.5 Å, Fig. 2.

For the purpose of this final structural filtration, the .pdb files were downloaded from http://files.rcsb.org/pub/pdb/data/structures/divided/pdb, and the .pdb1 files containing the biological assembly information were downloaded from http://files.rcsb.org/pub/pdb/data/biounit/coordinates/divided. This is important because both translating the crystallographic unit cell in all three directions and completing the biological assembly structure must be done in order to complete the amyloid structure and find all the parallel fragments. Homemade scripts for the downloading and structural filtration were written in the Python programming language [13], and the MDAnalysis Python library [14] was used for PDB file parsing.

Fig. 2. Criterion for parallel fragments: the distance between two Cα atoms belonging to two parallel fragments must differ by at most 1.5 Å, as found in the amyloid-β structures resolved in [12].
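The geometric criteria above can be sketched in a few lines of Python. This is an illustration and not the authors' script: the torsion angles and Cα coordinates are assumed to be already extracted (for example with MDAnalysis), all function names are hypothetical, and `are_parallel` implements only one possible reading of the 1.5 Å criterion, namely that the distances between corresponding Cα atoms of two equally long fragments vary by at most 1.5 Å.

```python
import math

PHI_RANGE = (-180.0, -103.0)  # degrees, final range from the text
PSI_RANGE = (104.0, 180.0)    # degrees, final range from the text
MIN_LENGTH = 4                # minimum fragment length in residues

def is_flat(phi, psi):
    """True if both backbone torsion angles lie in the flat (extended) region."""
    return PHI_RANGE[0] <= phi <= PHI_RANGE[1] and PSI_RANGE[0] <= psi <= PSI_RANGE[1]

def flat_fragments(torsions, min_length=MIN_LENGTH):
    """Half-open (start, end) residue index ranges of consecutive flat residues.

    torsions is a list of (phi, psi) pairs; None marks an undefined angle
    (for example at chain termini) and breaks a fragment.
    """
    fragments, start = [], None
    for i, (phi, psi) in enumerate(torsions):
        if phi is not None and psi is not None and is_flat(phi, psi):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                fragments.append((start, i))
            start = None
    if start is not None and len(torsions) - start >= min_length:
        fragments.append((start, len(torsions)))
    return fragments

def are_parallel(ca_a, ca_b, tolerance=1.5):
    """Assumed reading: distances between corresponding Cα atoms vary by <= tolerance (Å)."""
    distances = [math.dist(p, q) for p, q in zip(ca_a, ca_b)]
    return max(distances) - min(distances) <= tolerance

# Toy data, invented for illustration.
torsions = [(None, 130.0), (-120.0, 130.0), (-140.0, 150.0), (-110.0, 120.0),
            (-150.0, 110.0), (-60.0, -45.0), (-120.0, 130.0), (-120.0, None)]
strand_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
strand_b = [(0.0, 4.8, 0.0), (3.8, 4.8, 0.0), (7.6, 5.5, 0.0)]
strand_c = [(0.0, 4.8, 0.0), (3.8, 6.0, 0.0), (7.6, 7.2, 0.0)]
```

In the toy data, residues 1 to 4 form the only flat fragment, `strand_b` runs parallel to `strand_a`, and `strand_c` is tilted away from it.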

3. Results and Discussion

The resulting database consists of 109 structures. The database was confirmed by visual inspection of the 241 non-helical structures found by the TCL scripting search. According to the geometric parameters we considered (flat fragments, whether β-sheets or coils, and the number of non-parallel fragments for each fragment), there are 5 classes of amyloid PDB structures: U-shape with β-sheets connected by unstructured coils, β-sheets packed in a flat fashion, β-sheets packed in a tilted fashion, coil structure packed in a flat fashion, and coil structure packed in a tilted fashion. These arrangements of amyloid structures are all found in the review on amyloid states [15], classified according to the facial and directional alignment of the interacting β-sheets.

4. Conclusion

An atomic-resolution amyloid structural data bank was made by searching the Protein Data Bank. The criteria were based on both the amyloid name keyword and the structural features of amyloids described in the literature. The total number of structures was 109 on 25 March 2016. This number will grow as new amyloid structures are resolved crystallographically and by NMR spectroscopy. This database can support further general structural and statistical analysis of amyloids, since knowledge of the molecular basis can lead to an understanding of disease mechanisms related to amyloid proteins.

References

1. Lakshmanan, A. and Cheong, D. W. and Accardo, A. and Di Fabrizio, E. and Riekel, C. and Hauser, C. A.: Aliphatic peptides show similar self-assembly to amyloid core sequences, challenging the importance of aromatic interactions in amyloidosis. Proc. Natl. Acad. Sci. U.S.A., 110, 519–524. (2013)
2. Harrison, R. S. and Sharpe, P. C. and Singh, Y. and Fairlie, D. P.: Amyloid peptides and proteins in review. Physiol Biochem Pharmacol, 159:1–77. (2007)
3. Jakob T. Nielsen and Morten Bjerring and Martin D. Jeppesen and Ronnie O. Pedersen and Jan M. Pedersen and Kim L. Hein and Thomas Vosegaard and Troels Skrydstrup and Daniel E. Otzen and Niels C. Nielsen: Unique Identification of Supramolecular Structures in Amyloid Fibrils by Solid-State NMR Spectroscopy. Angew. Chem. Int. Ed., 48, 2118–2121. (2009)
4. Charles H. Davis and Max L. Berkowitz: Interaction Between Amyloid-β (1–42) Peptide and Phospholipid Bilayers: A Molecular Dynamics Study. Biophysical Journal 96, 785–797. (2009)
5. Das, P. and Kang, S-g. and Temple, S. and Belfort, G.: Interaction of Amyloid Inhibitor Proteins with Amyloid Beta Peptides: Insight from Molecular Dynamics Simulations. PLoS ONE 9(11): e113041. (2014)
6. Berman, H. M. and Henrick, K. and Nakamura, H.: Announcing the worldwide Protein Data Bank. Nature Structural Biology 10 (12): 980. (2003)
7. Martino Calamai and Fabrizio Chiti and Christopher M. Dobson: Amyloid Fibril Formation Can Proceed from Different Conformations of a Partially Unfolded Protein. Biophysical Journal 89, 4201–4210. (2005)
8. Nomenclature 2014: Amyloid fibril proteins and clinical classification of the amyloidosis. Amyloid, 21(4): 221–224, Editorial. (2014)
9. http://www.tcl.tk/
10. Humphrey, W. and Dalke, A. and Schulten, K.: VMD-Visual Molecular Dynamics. J Molec Graphics 14, 33–38. (1996)
11. Lu Yu and Seung-Joo Lee and Vivien C. Yee: Crystal Structures of Polymorphic Prion Protein β1 Peptides Reveal Variable Steric Zipper Conformations. Biochemistry, 54, 3640–3648. (2015)
12. Jacques-Philippe Colletier and Arthur Laganowsky and Meytal Landau and Minglei Zhao and Angela B. Soriaga and Lukasz Goldschmidt and David Flot and Duilio Cascio and Michael R. Sawaya and David Eisenberg: Molecular basis for amyloid-β polymorphism, PNAS, 108, 16938-16943. (2011)
13. http://www.python.org/
14. Michaud-Agrawal, N. and Denning, E. J. and Woolf, T. B. and Beckstein, O.: MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. J. Comput. Chem. 32, 2319–2327. (2011)
15. David Eisenberg and Mathias Jucker: The Amyloid State of Proteins in Human Diseases, Cell 148. (2012)

Search for small RNAs associated with CRISPR/Cas

Tamara Stankovic1, Jelena Guzina1, Magdalena Djordjevic2, and Marko Djordjevic1

1 Institute of Physiology and Biochemistry, Faculty of Biology, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia {tamaras,jelenag,dmarko}@bio.bg.ac.rs
2 Institute of Physics Belgrade, University of Belgrade, Pregrevica 118, 11080 Belgrade, Serbia [email protected]

Abstract

CRISPR/Cas is an advanced heritable defense system against viruses and plasmids, which was recently found in bacteria and archaea [1]. These systems consist of clusters of regularly interspaced palindromic repeats (the CRISPR array) and of CRISPR-associated (Cas) proteins. In this research we focus on Type II CRISPR/Cas systems and the small RNAs associated with them. These small RNAs play crucial roles in CRISPR/Cas functioning, such as processing CRISPR transcripts. Detecting them is, however, hard, as they are poorly conserved even in closely related bacterial strains. Moreover, they are typically expressed under non-standard and ill-characterized conditions, which obscures their identification from dRNA-seq experiments (still limited in bacteria). Here we use state-of-the-art transcription start site detection methods that we developed, together with an optimized implementation of transcription terminator detection, to detect CRISPR/Cas-associated small RNAs in Type II systems [2].

References

1. Horvath, P. and Barrangou, R.: Science 327: 167 (2010)
2. Guzina, J. and Stankovic, T. and Zdobnov, E. and Djordjevic, M.: Detection of CRISPR/Cas associated small RNAs, in preparation (2016).

A novel approach for dealing with spatial/temporal edges within molecular interaction networks.

Ruth A Stoney1,2, Ryan Ames3, Goran Nenadic2, David L Robertson∗1, and Jean-Marc Schwartz1 ∗Shared last/corresponding authors

1 Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
2 School of Computer Science, University of Manchester, Manchester, M13 9PT, UK
3 Wellcome Trust Centre for Biomedical Modelling and Analysis, University of Exeter, RILD Level 3, Exeter, EX2 5DW

Abstract

Functional networks are biological models, often used to explore the function of molecules such as proteins or transcription factors. Their aim is to use topological clustering to link molecules with shared cellular functions (e.g. metabolising sugar, building proteins), with applications such as identification of potential drug targets and understanding disease pathologies. Within these networks, biological molecules are represented as nodes and molecular interactions as edges. Clusters in the network represent functionally similar molecules and are referred to as functional modules [1, 2].

A great deal of research has focused on computational methods used to form clusters based on topological features [3–5]. However, such networks assume that edges are constant (non-dynamic), and therefore that it is always appropriate to use all of a node's edges rather than a subset. This assumption is often inaccurate in biological systems. Simply because two proteins can interact does not mean that they will interact in every context [1, 6]; some interactions may only take place under specific cellular conditions. Combining sets of edges which may not co-occur in the cell (due to spatial/temporal separation) may result in incorrect clustering. Evidence for this comes from discrepancies in community detection between networks created from different data types [7].

To deal with the issue of spatial/temporal edges we developed a new method using yeast pathways [8]. Pathways represent small, experimentally validated sets of protein-protein interactions that have been observed under particular cellular conditions. Information passed into a pathway is assumed to affect all nodes and to be shared simultaneously (the whole pathway will respond to external stimuli as a single unit). The nodes within a pathway may therefore be considered as a single pathway object.

We have created a novel model in which pathways are used as nodes in the network, representing units of cellular activity. Our model shares the aim of bringing together functionally related molecules and interactions; however, reproducing the molecular models' method of linking nodes by physical interactions is impractical for pathway nodes.


Pathways are employed in one or more functions within the cell, which can be identified using gene enrichment analysis. Functions may be assigned to pathways with greater confidence than to functional modules because the risk of spatial/temporal edges is removed. Cellular function is not divided discretely into pathways; rather, pathways work together and share functions. We used shared functionality between pathways to create the edges in our network. The resulting network contains clusters of functionally related pathways, where each pathway represents a set of interacting proteins. This method achieves the goal of clustering interacting proteins while avoiding the issues faced by previous molecular methods. Since the publication of this paper, work on human systems has begun, with the goal of exploring the link between function and disease.
Keywords: bioinformatics, functional network, cluster analysis
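As an illustration only, the idea of linking pathways by shared functionality can be sketched as follows. The abstract does not state how shared functions are scored, so the edge weight (number of shared enriched functions), the `min_shared` threshold, the function name and the toy data are all assumptions made for this sketch, not the authors' implementation.

```python
from itertools import combinations


def build_pathway_network(pathway_functions, min_shared=1):
    """Return weighted edges between pathways that share enriched functions.

    pathway_functions: dict mapping a pathway name to the set of functions
    assigned to it (for example by gene enrichment analysis).
    The weight of an edge is the number of shared functions (an assumption).
    """
    edges = {}
    for a, b in combinations(sorted(pathway_functions), 2):
        shared = pathway_functions[a] & pathway_functions[b]
        if len(shared) >= min_shared:
            edges[(a, b)] = len(shared)
    return edges


# Hypothetical toy annotation, not taken from the study.
pathway_functions = {
    "glycolysis": {"carbohydrate metabolism", "ATP generation"},
    "TCA cycle": {"ATP generation", "aerobic respiration"},
    "ribosome biogenesis": {"translation"},
}
edges = build_pathway_network(pathway_functions)
```

In this toy case only the two energy-related pathways become connected; a clustering step could then be run on the resulting pathway-level graph.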

References

1. Chen, J. and Yuan, B.: Detecting functional modules in the yeast protein-protein interaction network. Bioinformatics [Internet]. 2006 [cited 2015 Mar 22];22:2283–90. Available from: http://www.ncbi.nlm.nih.gov/pubmed/16837529
2. Vidal, M. and Cusick, M. E. and Barabási, A.-L.: Interactome networks and human disease. Cell [Internet]. 2011 [cited 2013 Nov 7];144:986–98. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3102045&tool=pmcentrez&rendertype=abstract
3. Blondel, V. and Guillaume, J.: Fast unfolding of communities in large networks. J. Stat. . . . [Internet]. 2008 [cited 2014 Jul 13];1–12. Available from: http://iopscience.iop.org/1742-5468/2008/10/P10008
4. Song, J. and Singh, M.: How and when should interactome-derived clusters be used to predict functional modules and protein function? Bioinformatics [Internet]. 2009 [cited 2014 Jun 26];25:3143–50. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3167697&tool=pmcentrez&rendertype=abstract
5. Wang, J. and Li, M. and Deng, Y. and Pan, Y.: Recent advances in clustering methods for protein interaction networks. BMC Genomics [Internet]. 2010 [cited 2014 Oct 26];11 Suppl 3:S10. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2999340&tool=pmcentrez&rendertype=abstract
6. Hyduke, D. R. and Palsson, B. Ø.: Towards genome-scale signalling network reconstructions. Nat. Rev. Genet. [Internet]. Nature Publishing Group; 2010;11:297–307. Available from: http://dx.doi.org/10.1038/nrg2750
7. Ames, R. M. and Macpherson, J. I. and Pinney, J. W. and Lovell, S. C. and Robertson, D. L.: Modular biological function is most effectively captured by combining molecular interaction data types. PLoS One [Internet]. 2013 [cited 2015 Jan 9];8:e62670. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3643936&tool=pmcentrez&rendertype=abstract
8. Stoney, R. and Ames, R. and Nenadic, G. and Robertson, D. and Schwartz, J.: Disentangling the multigenic and pleiotropic nature of molecular function. BMC Syst. Biol. 2015;9.

Gene expression in schizophrenia patients and non-schizophrenic individuals infected with Toxoplasma gondii

Aleksandra Uzelac1, Tijana Štajner1, Miloš Bušarčević1, Ana Munjiza2, Milutin Kostić2, Čedo Miljević2, Dušica Lečić-Toševski2, Nenad Mitić3, Saša Malkov3, and Olgica Djurković-Djaković1

1 Center of Excellence for Food- and Vector-borne Zoonoses, Institute for Medical Research, University of Belgrade, Dr. Subotića 4, 11129 Belgrade, Serbia [email protected]
2 Institute of Mental Health, School of Medicine, University of Belgrade, Palmotićeva 37, 11000 Belgrade, Serbia
3 Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia {nenad,smalkov}@matf.bg.ac.rs

Abstract

There is an increasing body of data suggesting an association between infection with the protozoan parasite Toxoplasma gondii and schizophrenia. In this study, we employed a combination of data mining and bioinformatics to investigate whether any genes from loci which harbor schizophrenia-associated SNPs, as determined in a GWAS by Ripke et al. [1], are associated with the immune response to Toxoplasma gondii infection. After extracting a list of genes from the loci, we examined the expression of their murine homologs in response to acute infection with T. gondii in brain homogenates and lymphocytes of experimentally infected animals by mining microarray data published by Jia et al. [2]. Of the 208 unique protein-coding genes in schizophrenia-associated loci that we were able to cross-reference with both sets of microarray data, 108 differed in expression by at least 30% with respect to controls. Functional annotation clustering using the algorithm included in the DAVID bioinformatics resources 6.7 database confirmed that the statistically most significant annotation cluster was indeed enriched with genes which code for proteins with immune functions. Based on these results, we selected the following genes: HLA-DQA1, TAP1, TAP2, PSMB8, EGFL8, LY6G6C, C4A and CFB, which are all located in the MHC region on chromosome 6, for validation by real-time PCR. Their expression is being assayed in the peripheral blood of schizophrenia patients infected and not infected with T. gondii and of the corresponding non-schizophrenic controls. Preliminary results suggest that the expression of HLA-DQA1 and TAP2 in response to T. gondii infection is indeed altered in schizophrenia patients. We are also currently investigating whether the infection itself or an altered immune response to the infection can be correlated with the patients' Positive and Negative Syndrome Scale (PANSS) scores and thereby with the clinical presentation of schizophrenia.
Keywords: bioinformatics, data mining, gene expression, Toxoplasma gondii
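The 30% expression filter mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "differed by at least 30%" means a relative change against the control mean in either direction; the function name, the threshold parameter and all expression values below are hypothetical and do not come from the microarray data of Jia et al.

```python
def differentially_expressed(infected, control, threshold=0.30):
    """Return genes whose expression differs from control by >= threshold.

    infected, control: dicts mapping gene symbol -> mean expression value.
    The difference is taken relative to the control value (an assumption).
    """
    hits = []
    for gene, ctrl in control.items():
        if gene not in infected or ctrl == 0:
            continue  # skip genes that cannot be compared
        relative_change = abs(infected[gene] - ctrl) / ctrl
        if relative_change >= threshold:
            hits.append(gene)
    return sorted(hits)


# Hypothetical expression values for three murine homologs.
control = {"Tap1": 100.0, "Psmb8": 80.0, "Egfl8": 50.0}
infected = {"Tap1": 260.0, "Psmb8": 100.0, "Egfl8": 30.0}
hits = differentially_expressed(infected, control)
```

Here Tap1 (up by 160%) and Egfl8 (down by 40%) pass the filter, while Psmb8 (up by 25%) does not.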


References

1. Ripke, S. et al: Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat. Genet. 45(10):1150-9. (2013)
2. Jia, B. and Lu, H. and Liu, Q. and Yin, J. and Jiang, N. and Chen, Q.: Genome-wide comparative analysis revealed significant transcriptome changes in mice after Toxoplasma gondii infection. Parasit.Vectors. 4,6:161 (2013)

Propensities of amino acid toward certain secondary protein structure types: comparison of different statistical methods

Dušan Ž. Veljković1, Saša Malkov2, Vesna B. Medaković1, and Snežana D. Zarić1

1 Department of Chemistry, University of Belgrade, Studentski trg 16, Belgrade, Serbia [email protected]
2 Faculty of Mathematics, University of Belgrade, Studentski trg 16, Belgrade, Serbia

Abstract

The conformational preferences of amino acids are of great importance for understanding conformational interactions in proteins. When used as propensities, these preferences can be helpful in predicting secondary and tertiary structures of proteins. Several statistical studies were performed in order to calculate amino acid propensities [1, 2]. In our previous work we carried out a study of amino acid propensities using a statistical method [3, 4]. Based on that study, the preferences of amino acids towards certain secondary structures classify amino acids into four groups: α-helix preferrers, strand preferrers, turn and bend preferrers, and His and Cys (these two amino acids do not show a clear preference for any secondary structure). Amino acids in the same group have similar structural characteristics at their Cβ and Cγ atoms that predict their preference for a particular secondary structure.

In this work, other statistical methods for the calculation of amino acid propensities were compared to the statistical method used in our previous work. The comparison was made on the basis of correlation coefficients (ρ(s,p)). The results show that although the methods are similar, there are some significant differences, resulting in a more explicit connection between our classification and amino acid chemical structure. Application of our statistical approach allows for stricter conclusions, without misjudgment of the amino acids' preferences.
Keywords: amino acids, preferences, correlations, classification
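For orientation, one common way of expressing such a propensity is the Chou-Fasman-style ratio of the frequency of a secondary structure among residues of one amino acid to its frequency among all residues. The sketch below uses that ratio and a plain Pearson coefficient to compare two scales; both choices are assumptions for illustration, since the abstract does not define the statistical method of [3, 4] or ρ(s,p), and the toy residue list is hypothetical.

```python
from collections import Counter
from math import sqrt


def propensities(residues):
    """Chou-Fasman-style propensity for each (amino acid, structure) pair.

    residues: list of (amino_acid, structure) tuples, one per residue.
    """
    pair_counts = Counter(residues)
    aa_counts = Counter(aa for aa, _ in residues)
    ss_counts = Counter(ss for _, ss in residues)
    total = len(residues)
    return {
        (aa, ss): (n / aa_counts[aa]) / (ss_counts[ss] / total)
        for (aa, ss), n in pair_counts.items()
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical residues: A = Ala, V = Val; H = helix, E = strand.
toy = [("A", "H"), ("A", "H"), ("A", "E"), ("V", "E"), ("V", "E"), ("V", "H")]
p = propensities(toy)
```

A value above 1 (here Ala in helix) indicates that the amino acid occurs in that structure more often than expected by chance; scales produced by two methods over the same pairs could then be compared with `pearson`.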

References

1. Chou P. Y., Fasman G. D.: Conformational parameters for amino acids in helical, beta-sheet, and random coil regions calculated from proteins. Biochemistry, 211-222. (1974)
2. Levitt M.: Conformational preferences of amino acids in globular proteins. Biochemistry, 4277-4285. (1978)
3. Malkov S. N., Živković M. V., Beljanski M. V., Hall M. B., Zarić S. D.: A reexamination of the propensities of amino acids towards a particular secondary structure: classification of amino acids based on their chemical structure. J. Mol. Model. 769-775. (2008)
4. Malkov S. N., Živković M. V., Beljanski M. V., Stojanović S. Đ., Zarić S. D.: A reexamination of correlations of amino acids with particular secondary structures. The Protein Journal, 74-86. (2009)

Botryosphaeriaceae on Aesculus hippocastanum in Serbia

Milica Zlatković1, Nenad Keča1, Michael Wingfield2, Fahimeh Jami2, and Bernard Slippers2

1 University of Belgrade-Faculty of Forestry, Kneza Višeslava 1, Belgrade, Serbia [email protected]
2 Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Cnr Lynwood and University roads, Pretoria, South Africa

Abstract

Horse chestnut (Aesculus hippocastanum L.) is a large, long-lived, deciduous tree endemic to the southern part of the Balkan Peninsula, in South Eastern Europe [1]. The seeds of this tree are widely used in the medicinal and pharmaceutical industries. Because of its large hand-shaped leaves and attractive white flowers, A. hippocastanum is a highly valuable street and shade tree commonly planted across Europe. In recent years, A. hippocastanum trees in Serbia have exhibited die-back of shoots, shoot cankers and necrotic lesions in the lower parts of the stems. Samples were collected from the symptomatic tissues in Belgrade, Serbia, from 2009 to 2015. The consistently isolated fungal colonies were grey and Botryosphaeriaceae-like [2], and the aim of this study was to identify them. Based on morphology of the asexual morph and phylogeny of DNA sequence data for the internal transcribed spacer (ITS), translation elongation factor 1α (TEF 1-α), β-tubulin-2 (BT2) and large subunit (LSU) gene regions, the isolates were identified as Botryosphaeria dothidea, Neofusicoccum parvum, Diplodia mutila and Dothiorella sarmentorum. A. hippocastanum is in danger of extinction due to the population decline caused by the invasive leaf miner moth Cameraria ohridella, and the species has been listed in the IUCN red list of threatened plants [3]. This study adds to the knowledge on the identity of Botryosphaeriaceae as potential pathogens of this important and threatened tree.
Keywords: Botryosphaeriaceae, multigene phylogeny, identification, Aesculus hippocastanum

References

1. Jovanović, B.: Dendrologija. IV izmenjeno izdanje. Univerzitet u Beogradu. Beograd. (1985)
2. Zlatković, M. and Keča, N. and Wingfield, M. J. and Jami, F. and Slippers, B.: Botryosphaeriaceae associated with the die-back of ornamental trees in Serbia, Antonie van Leeuwenhoek, International Journal of General and Molecular Microbiology, 109: 543–564. (2016)
3. Khela, S.: Aesculus hippocastanum. The IUCN Red List of Threatened Species. (2013). Accessed on 28 April 2016.

Botryosphaeriaceae on Sequoia sempervirens in Serbia

Milica Zlatković1, Nenad Keča1, Michael Wingfield2, Fahimeh Jami2, and Bernard Slippers2

1 University of Belgrade-Faculty of Forestry, Kneza Višeslava 1, Belgrade, Serbia [email protected]
2 Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Cnr Lynwood and University roads, Pretoria, South Africa

Abstract

Coastal redwood (Sequoia sempervirens) is an evergreen, large, long-lived tree native to Western North America. It is the only species in the genus Sequoia and is an important timber tree valued for its beauty and for its lightweight timber, which is resistant to decay and fire damage. S. sempervirens is in danger of extinction due to its population decline, and the species has been listed in the IUCN red list of threatened plants [1]. In Serbia, the only known S. sempervirens tree is planted in the botanical garden "Jevremovac" in Belgrade. In autumn 2011, the tree exhibited branch flagging associated with branch and shoot cankers, with the leaves remaining attached. Tissue samples associated with these symptoms were plated on Malt Extract Agar (MEA). One week later, fast-growing, grey fungal colonies resembling those of the Botryosphaeriaceae spp. [2] were obtained, and the aim of this study was to identify them. Morphology of the asexual morph and phylogenetic inference based on DNA sequence data for the internal transcribed spacer (ITS), translation elongation factor 1α (TEF 1-α), β-tubulin-2 (BT2) and large subunit (LSU) gene regions showed that the isolates represented Diplodia mutila, Neofusicoccum parvum and Botryosphaeria dothidea. In its natural range, S. sempervirens grows in coastal areas with a moist climate. The newly emerging die-back of this tree in Serbia might be connected with the recent drought periods, which could have provided the stressful conditions that are typically associated with opportunistic infections by Botryosphaeriaceae.
Keywords: Botryosphaeriaceae, multigene phylogeny, identification, Sequoia sempervirens

References

1. Farjon, A. and Schmid, R.: Sequoia sempervirens. The IUCN Red List of Threatened Species. (2013) Accessed on 26 April 2016.
2. Zlatković, M. and Keča, N. and Wingfield, M. J. and Jami, F. and Slippers, B.: Botryosphaeriaceae associated with the die-back of ornamental trees in Serbia, Antonie van Leeuwenhoek, International Journal of General and Molecular Microbiology, 109: 543–564. (2016)

Author Index

Ames, Ryan, 145
Anashkina, Anastasia, 39
Andjelković, Miroslav, 28
Antonijevic, N., 112
Ari, Eszter, 92
Artamonova, Irena, 120
Avetisov, Vladik, 1
Avramov, Miloš, 92
Babenko, Vladimir, 22, 43
Banjevic, Milena, 44
Banović, Bojana, 81
Baumbach, Jan, 2
Belić, Milivoj, 138
Beljanski, Miloš, 62, 81
Blagojevic, Bojana, 46
Božinovski, D., 119
Bogdanović, Milica, 96
Bongcam-Rudloff, Erik, 3
Brdar, Sanja, 47
Brothers, Edward, 138
Brusic, Vladimir, 7
Bugay, Aleksandr, 48
Bundschuh, Ralf, 9
Bušarčević, Miloš, 49, 147
Carboncini, Maria Chiara, 87
Celani, Antonio, 10
Chadaeva, Irina, 43
Chawla, Nitesh, 11
Chen, Ming, 22
Ciliberto, Andrea, 12
Cimpoiasu, Rodica, 99
Cohen, Evan-Gary, 72
Constantinescu, Radu, 99
Costina, Victor, 128
Craveur, Pierrick, 4
Cuperlovic-Culf, Miroslava, 52
Czyz, Zbigniew, 70
Čupić, Željko, 98
Ćuković, Katarina, 96
de Brevern, Alexandre G., 4
Delibasic, Boris, 107
Dimitrova, Tamara, 53
Divac, A., 112
Djordjevic, Magdalena, 46, 144
Djordjevic, Marko, 46, 58, 94, 127, 144
Djurdjevac Conrad, Natasa, 54
Djurić, Tamara, 63
Djurković-Djaković, Olgica, 49, 147
Dobrovolskaya, Oxana, 22
Dovidchenko, Nikita V., 14
Dragelj, J., 119
Dragićević, Milan, 96
Dragovich, Branko, 13
Dudić, Dragana, 81
Dzhus, Ulyana F., 14
Etchebest, Catherine, 4
Fazekas, Dávid, 92
Feliciello, Giancarlo, 70
Filipović, Biljana, 96
Filipović, Dragana, 128
Filipović, Vladimir, 55
Findeisen, Peter, 128
Friedmann, Naama, 72, 114
Gal Chechik, 72
Galzitskaya, Oxana V., 14
Gelfand, Mikhail, 15
Gemovic, Branislava, 130
Georgiou, Constantinos, 7
Giurato, Giorgio, 96
Glisic, Sanja, 131
Glyakina, Anna V., 14
Graovac, Jelena, 111
Grbić, Milana, 55
Gregson, Cassie, 74
Grigolon, Silvia, 57
Grigorashvili, Elizaveta I., 14
Grolmusz, Vince, 100
Gulsoy, Nagihan, 125
Guzina, Jelena, 58, 144
Guzvic, Miodrag, 70
Hall, Michael, 138
Hernandez, Robert, 74
Ispolatov, Iaroslav, 25
Ivašković, Andrej, 102
Jami, Fahimeh, 150, 151
Jandrlić, Davorka, 59
Jelić, Asja, 61
Jelović, Ana, 62
Jovanović, Ivan, 63
Jovanović, Jasmina, 63
Kadlecsik, Tamás, 92
Kanevska, Polina, 68
Kapustin, Mikhail, 120
Kartelj, Aleksandar, 55
Keča, Nenad, 150, 151
Kirsch, Stefan, 70
Kitanović, Nevena, 92
Klein, Christoph, 70
Knapp, Ernst Walter, 119
Knežević-Jugović, Zorica, 132
Kojic, S., 112
Kokai, Dunja, 92
Kolar-Anić, Ljiljana, 98
Korcsmáros, Tamás, 92
Kostić, Milutin, 147
Kovačević, Jovana, 69, 111
Kozyrev, Sergei, 16
Kriventseva, Evgenia, 35
Krivokuća, Nikola, 92
Kušic-Tišma, Jelena, 112
Lahrmann, Urs, 70
Lakretz, Yair, 72, 114
Lau, Stella, 102
Lečić-Toševski, Dušica, 147
Ljujic, M., 112
Lopatina, Anna, 120
Maćešić, Stevan, 98
Macek, Milan, 112
Magrini, Massimo, 87
Malenov, Dušan, 138
Malkov, Saša, 147, 149
Malod-Dognin, Noël, 17
Marchenkov, Victor V., 14
Marković, Vladimir, 98
Masmoudi, Hanen, 73
Matić, Dragan, 55
Medaković, Vesna, 119, 149
Medvedeva, Sofia, 120
Meyer, T., 119
Mišić, Nataša, 79
Mihaljevic, Ljubica, 130
Miljević, Čedo, 147
Milosevic, Nikola, 74
Milovanović, Ivan, 49
Mitić, Nenad, 59, 62, 147
Mohamed, Salwa, 22
Morina, Filis, 121, 123
Morozov, Alexandre, 18
Munjiza, Ana, 147
Mutlu, Ozal, 80, 125
Narwani, Tarun, 4
Nekrasov, Alexei, 39
Nenadic, Goran, 19, 74, 145
Nicolaidis, Argyris, 20
Nikolic, Milos, 127
Ninković, Dragan, 138
Niu, Shuqiang, 138
Obradovic, Zoran, 21
Orlov, Yuriy, 22, 43
Pajić, Vesna, 81
Paradisi, Paolo, 87
Pavlović Lažetić, Gordana, 69, 111
Pavlović, Mirjana, 59
Perić, Ivana, 128
Perovic, Vladimir, 130, 131
Petrović, Bojan, 132
Petrović, Predrag, 138
Polzer, Bernhard, 70
Popović, Željko, 92
Pržulj, Nataša, 23
Ptáková, Nikola, 112
Punta, Marco, 24
Radivojac, Predrag, 69
Radojkovic, D., 112
Radovanovic, Sandro, 107
Rakicevic, Lj., 112
Rebehmed, Joseph, 4
Righi, Marco, 87
Robertson, David, 145
Rodic, Andjela, 94
Roettger, Richard, 95
Ryan, Allison, 44
Salem, Khaled, 22
Salvetti, Ovidio, 87
Santuz, Hubert, 4
Savitskaya, Ekaterina, 120
Schwartz, Jean-Marc, 145
Sedlarević, Ana, 121, 123
Selivanova, Olga M., 14
Semenova, Ekaterina, 25
Severinov, Konstantin, 25, 120
Shinada, Nicolas, 4
Shmakov, Sergey, 120
Sigurjonsson, Styrmir, 44
Simonović, Ana, 96, 121, 123
Slippers, Bernard, 150, 151
Sorba, Paul, 26
Stanković, Aleksandra, 63
Stanković, Ivana, 119, 139
Stankovic, Tamara, 127, 144
Stanojević, Ana, 98
Stanojević, Miloš, 102
Stojmirović, Aleksandar, 27
Stoney, Ruth, 145
Streche, Alina, 99
Subotić, Angelina, 96
Suknovic, Milija, 107
Sumonja, Neven, 131
Surin, Alexey K., 14
Suvorina, Mariya Yu., 14
Szalkai, Balázs, 100
Štajner, Tijana, 147
Tadić, Bosiljka, 28
Tesic, M., 112
Tikhonov, Alexey, 120
Todorović, Sladjana, 96
Tompa, Peter, 29
Tosatto, Silvio, 30
Trbovich, Aleksandar, 49
Treves, Alessandro, 72, 114
Uversky, Vladimir, 31
Uzelac, Aleksandra, 147
Uzelac, Iva, 92
Uzelac, Aleksandra, 49
Varga, Bálint, 100, 101
Veličković, Petar, 102
Veljković, Dušan, 119, 149
Veljković, Nevena, 32, 130, 131
Veljović-Jovanović, Sonja, 121, 123
Vidović, Marija, 121, 123
Virgillito, Alessandra, 87
Volkov, Sergey, 33, 68
Vukicevic, Milan, 107
Vukojević, Vladana, 98
Waterhouse, Robert, 35
Wingfield, Michael, 150, 151
Xenarios, Ioannis, 36
Zarić, Snežana, 119, 138, 139, 149
Zdobnov, Evgeny, 35
Zhang, Ping, 7
Zlatković, Milica, 150, 151
Živković, Maja, 63

List of participants

1. Anastasia Anashkina, Engelhardt Institute of Molecular Biology, Laboratory of computational methods for system biology, Russian Academy of Sciences, Moscow, Russia
2. Vladik Avetisov, The Semenov Institute of Chemical Physics, RAS Moscow, Russia
3. Miloš Avramov, University of Novi Sad, Faculty of Sciences, Department of Biology and Ecology, Serbia
4. Vladimir Babenko, Institute of Cytology and Genetics, Novosibirsk, Russia
5. Milena Banjević, Natera, Department of Statistical Research, San Carlos, California, United States of America
6. Bojana Banović, Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Serbia
7. Jan Baumbach, Head of the Computational Biology Group, Dept. of Mathematics and Computer Science (IMADA), University of Southern Denmark (SDU), Denmark
8. Miloš Beljanski, Institute for General and Physical Chemistry, University of Belgrade, Serbia
9. Bojana Blagojević, Institute of Physics Belgrade, Serbia
10. Erik Bongcam-Rudloff, Division of Molecular Genetics, Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Sweden
11. Sanja Brdar, BioSense Institute for research and development of information technology in biosystem, University of Novi Sad, Serbia
12. Alexandre de Brevern, University Paris Diderot, Sorbonne Paris Cite, Paris, France
13. Vladimir Brusić, School of Medicine and Bioinformatics Center, Nazarbayev University, Kazakhstan and Department of Computer Science, Metropolitan College, Boston University, USA
14. Aleksandr Bugay, Joint Institute for Nuclear Research, Laboratory of Radiation Biology, Moscow, Russia
15. Ralf Bundschuh, The Ohio State University, Department of Physics, Chemistry & Biochemistry, Division of Hematology, USA
16. Miloš Bušarčević, Center of Excellence for Food and Vector-borne Zoonoses, Institute for Medical Research, University of Belgrade, Serbia; United World College of the Adriatic, Duino, Italy
17. Oliviero Carugo, Faculty of Science, University of Pavia, Italy
18. Antonio Celani, The Abdus Salam International Centre for Theoretical Physics, Trieste, Italy
19. Nitesh Chawla, Frank M. Freimann Professor of Computer Science and Engineering, University of Notre Dame, USA
20. Andrea Ciliberto, IFOM-IEO, Italy
21. Miroslava Cuperlovic-Culf, National Research Council of Canada, Department for Information Communication Technologies, Ottawa, Canada
22. Tamara Dimitrova, Macedonian Academy of Sciences and Arts, Research Center for Computer Science and Information Technologies, Skopje, Macedonia
23. Marko Djordjević, Faculty of Biology, University of Belgrade, Serbia
24. Branko Dragovich, Institute of Physics, Mathematical Institute SANU, Belgrade, Serbia
25. Dragana Dudić, Faculty of Agriculture, Belgrade, Serbia
26. Nataša Djurdjevac Conrad, Zuse Institute Berlin, Germany
27. Olgica Djurković-Djaković, Institute for Medical Research, University of Belgrade, Serbia
28. Oxana Galzitskaya, Group of bioinformatics, Institute of Protein Research of the RAS, Russia
29. Vladimir Gašić, Institute of Molecular Genetics and Genetic Engineering, Belgrade, Serbia
30. Mikhail Gelfand, A.A. Kharkevich Institute for Information Transmission Problems, RAS, Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
31. Giorgio Giurato, Genomix4Life, Italy
32. Sanja Glišić, Institute of Nuclear Sciences VINCA, Center for Multidisciplinary Research, Belgrade, Serbia
33. Jelena Graovac, Faculty of Mathematics, Department of Computer Science, University of Belgrade, Serbia
34. Milana Grbić, Faculty of Science and Mathematics, University of Banja Luka, Bosnia and Herzegovina
35. Silvia Grigolon, Lincoln's Inn Fields Laboratory, The Francis Crick Institute, London, United Kingdom
36. Jelena Guzina, Faculty of Biology, University of Belgrade, Serbia
37. Maja Gvozdenov, Institute of Molecular Genetics and Genetic Engineering, Laboratory for Molecular Biology, University of Belgrade, Serbia
38. Andrej Ivašković, Faculty of Computer Science and Technology, Cambridge, United Kingdom
39. Davorka Jandrlić, Faculty of Mechanical Engineering, Department for Mathematics, University of Belgrade, Serbia
40. Asja Jelić, The Abdus Salam International Centre for Theoretical Physics (ICTP), Department for Quantitative Life Sciences, Trieste, Italy
41. Ana Jelović, Faculty of Transport and Traffic Engineering, Department of General and Applied Mathematics, University of Belgrade, Serbia
42. Tihomir Jovanić, School of Electrical Engineering, Belgrade, Serbia
43. Ivan Jovanović, Vinča Institute of Nuclear Sciences, University of Belgrade, Department for Radiobiology and Molecular Genetics, Serbia
44. Jasmina Jovanović, Faculty of Mathematics, Belgrade University, Serbia
45. Polina Kanevska, Bogolyubov Institute for Theoretical Physics, Kyiv, Ukraine
46. Jelena Kostić, Institute of Molecular Genetics and Genetic Engineering, Laboratory for Molecular Biology, Belgrade, Serbia
47. Jovana Kovačević, Faculty of Mathematics, Belgrade University, Department for Computer Science, Serbia
48. Sergei Kozyrev, Steklov Mathematical Institute, Moscow, Russia
49. Jelena Kušić-Tišma, Institute of Molecular Genetics and Genetic Engineering, Laboratory for Molecular Biology, Belgrade, Serbia
50. Urs Lahrmann, Fraunhofer Institute for Toxicology and Experimental Medicine, Project Group Personalized Tumor Therapy, Regensburg, Germany
51. Yair Lakretz, Tel-Aviv University, Israel
52. Ilija Lalović, Faculty of Natural Sciences and Mathematics, Banja Luka, Bosnia and Herzegovina
53. Mladen Lazarević, Seven Bridges Genomics, Serbia
54. Mirjana Maljković, Faculty of Mathematics, University of Belgrade, Serbia
55. Saša Malkov, Faculty of Mathematics, University of Belgrade, Serbia
56. Noël Malod-Dognin, Imperial College London, Department of Computing, UK
57. Mina Mandić, Institute of Molecular Genetics and Genetic Engineering, Laboratory for Microbial Molecular Genetics and Ecology, Belgrade, Serbia
58. Hanen Masmoudi, Higher Institute of Biotechnology of Sfax, Tunisia
59. Dragan Matić, Faculty of Science and Mathematics, University of Banja Luka, Bosnia and Herzegovina
60. Vesna Medaković, Faculty of Chemistry, University of Belgrade, Serbia
61. Sofia Medvedeva, Skolkovo Institute of Science and Technology, Skolkovo, Russia
62. Sanja Mijalković, Seven Bridges Genomics, Serbia
63. Nikola Milošević, University of Manchester, School of Computer Science, United Kingdom
64. Nataša Mišić, Lola Institute, Belgrade, Serbia
65. Nenad Mitić, Faculty of Mathematics, University of Belgrade, Serbia
66. Ivana Morić, Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Serbia
67. Filis Morina, Institute for Multidisciplinary Research, Department of Life Sciences, University of Belgrade, Serbia
68. Alexandre Morozov, Rutgers University, USA
69. Ozal Mutlu, Marmara University, Faculty of Arts and Sciences, Department for Biology, Istanbul, Turkey
70. Giovanni Nassa, Genomix4Life, Italy
71. Goran Nenadić, School of Computer Science, University of Manchester, Institute of Biotechnology & Health eResearch Centre, Manchester, UK; Mathematical Institute of SASA, Belgrade, Serbia
72. Argyris Nicolaidis, Aristotle University of Thessaloniki, Greece
73. Miloš Nikolić, Faculty of Biology, University of Belgrade, Serbia
74. Zoran Obradović, Center for Data Analytics and Biomedical Informatics, Temple University, USA
75. Zoran Ognjanović, Mathematical Institute SASA, Serbia
76. Yuriy L. Orlov, Institute of Cytology and Genetics SB RAS, Novosibirsk State University, Russia
77. Vesna Pajić, University of Belgrade, Faculty of Agriculture, Department for Mathematics and Physics, Center for Data Mining and Bioinformatics, Belgrade, Serbia
78. Paolo Paradisi, Institute of Information Science and Technologies, National Research Council (ISTI CNR), Department for Signals and Images Laboratory (SI-Lab), Pisa, Italy
79. Mirjana Pavlović, Institute for General and Physical Chemistry, University of Belgrade, Serbia
80. Gordana Pavlović-Lažetić, Faculty of Mathematics, University of Belgrade, Serbia
81. Ivana Perić, Vinča Institute of Nuclear Sciences, Laboratory of Molecular Biology and Endocrinology, Belgrade, Serbia
82. Vladimir Perović, Centre for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia
83. Jelena Petković, Institute of Molecular Genetics and Genetic Engineering, Belgrade, Serbia
84. Marko Petković, Seven Bridges Genomics, Serbia
85. Bojan Petrović, Faculty of Technology and Metallurgy, Department for Biochemical Engineering and Biotechnology, University of Belgrade, Serbia
86. Željko Popović, University of Novi Sad, Faculty of Sciences, Department of Biology and Ecology, Serbia
87. Nataša Pržulj, Department of Computing, Imperial College London, UK
88. Marco Punta, Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
89. Krsto Radanović, University of Banja Luka, Faculty of Sciences, Department for Biology, Bosnia and Herzegovina
90. Miloje Rakočević, Mathematical Institute SASA, Serbia
91. Andjela Rodić, University of Belgrade, Faculty of Biology, Serbia
92. Richard Roettger, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
93. Jelena Samardžić, Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Serbia
94. Milica Selaković, University of Belgrade, Faculty of Mathematics, Serbia
95. Konstantin Severinov, Rutgers University, Department of Molecular Biology and Biochemistry, Waksman Institute of Microbiology, USA
96. Ana Simonović, Institute for Biological Research Siniša Stanković, Belgrade, Serbia
97. Paul Sorba, Laboratory of Theoretical Physics and CNRS, Annecy, France
98. Ivana Stanković, Institute of Chemistry, Technology and Metallurgy, University of Belgrade, Serbia
99. Tamara Stanković, Institute of Physiology and Biochemistry, Faculty of Biology, University of Belgrade, Serbia
100. Ana Stanojević, University of Belgrade, Faculty of Physical Chemistry, Serbia
101. Biljana Stojanović, Faculty of Mathematics, University of Belgrade, Serbia
102. Aleksandar Stojmirovic, Janssen R & D, LLC, Systems Pharmacology & Biomarkers, Immunology TA, USA
103. Ruth Stoney, University of Manchester, United Kingdom
104. Alina-Maria Streche, University of Craiova, Department of Physics, Romania
105. Neven Šumonja, Vinča Institute of Nuclear Sciences, Centre for Multidisciplinary Research and Engineering, Belgrade, Serbia
106. Balázs Szalkai, Eötvös Loránd University, Budapest
107. Bosiljka Tadić, Department of Theoretical Physics, Jozef Stefan Institute, Ljubljana, Slovenia
108. Peter Tompa, VIB Structural Biology Research Center, Flanders Institute for Biotechnology (VIB), Belgium
109. Vladanka Topalović, Institute of Molecular Genetics and Genetic Engineering, Laboratory for Human Molecular Genetics, Belgrade, Serbia
110. Silvio Tosatto, Department of Biomedical Sciences, University of Padova, Italy
111. Vladimir Uversky, Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, USA
112. Iva Uzelac, University of Novi Sad, Faculty of Sciences, Department of Biology and Ecology, Novi Sad, Serbia
113. Aleksandra Uzelac, Institute for Medical Research, Center of Excellence for Food- and Vector-borne Zoonoses, Department for Parasitology, Belgrade, Serbia
114.
Balint Varga, Eotv¨ os¨ Lorand´ University, Budapest, Hungary 115. Petar Veliˇckovi´c, University of Cambridge, Faculty of Computer Science and Technology, Cambridge, United Kingdom 116. Aleksandar Veljkovi´c, Faculty of Mathematics, University of Belgrade, Ser- bia 117. Nevena Veljkovi´c, Institute for Nuclear Sciences VINCA, University of Bel- grade, Serbia 118. Sergey Volkov, Bogolyubov Institute for Theoretical Physics, Kiev, Ukraine 119. Milan Vukiˇcevi´c, University of Belgrade, Faculty of Organizational Sciences, Serbia 120. Robert Waterhouse, Department of Genetic Medicine and Development, Medical School, University of Geneva, Swiss Institute of Bioinformatics, Switzer- land 121. Ioannis Xenarios, SIB Swiss Institute of Bioinformatics, Switzerland 122. Ping Zhang, Griffith University, Southport, Australia 123. Milica Zlatkovi´c, University of Belgrade-Faculty of Forestry, Belgrade, Ser- bia

SPONSORS

Ministry of Education, Science and Technological Development of Republic of Serbia