Thesis Reference

Total Page:16

File Type:pdf, Size:1020Kb

Thesis: An interactive and integrative view of glycobiology

ALOCCI, Davide

Abstract

Glycosylation is one of the most abundant post-translational modifications (PTM) and the primary cause of microheterogeneity in proteins (glycoforms). It consists of the addition of a sugar moiety on the surface of a protein, and it has been suggested that over 50% of mammalian cellular proteins are typically glycosylated. Despite its importance, glycobiology is still lacking concrete bioinformatics support which, in the last two decades, has boosted other major ’omics’ like proteomics, transcriptomics and genomics. Additionally, the inherent complexity of carbohydrates and the use of many orthogonal experimental techniques to elucidate a structure have slowed down the annotation of novel glycans, leaving glycobiology fragmented internally. This thesis aims at the creation of a collection of integrative, explorative and knowledge-based tools to reconstitute the puzzle of biological evidence produced by different glycomics experiments and, at the same time, fill the gap with other ’omics’.

Reference

ALOCCI, Davide. An interactive and integrative view of glycobiology. Thèse de doctorat : Univ. Genève, 2018, no. Sc. 5280

URN : urn:nbn:ch:unige-1122950
DOI : 10.13097/archive-ouverte/unige:112295
Available at: http://archive-ouverte.unige.ch/unige:112295

Disclaimer: layout of this document may differ from the published version.

UNIVERSITÉ DE GENÈVE, FACULTÉ DES SCIENCES
Département d’informatique
Dr. Frédérique Lisacek

An Interactive and Integrative View of Glycobiology

THESIS presented to the Faculty of Science of the University of Geneva to obtain the degree of Doctor of Science, specialisation in bioinformatics, by Davide Alocci of Orbetello (Italy)

Thesis No. 5280
GENEVA 2018

Résumé en Français (English translation)

Glycosylation is one of the most abundant post-translational modifications (PTM) and the leading cause of microheterogeneity in proteins (glycoforms).
It consists of the addition of a sugar molecule (glycan) to the surface of a protein; moreover, it has been suggested that more than half of mammalian cellular proteins are glycosylated.

Despite its importance, glycobiology remains isolated from the other fields of biology, and the bioinformatics resources needed for its growth are scattered, poorly standardised and rarely integrated. Yet over the last two decades, bioinformatics and data integration have played a crucial role in the other ’omics’ disciplines such as proteomics, transcriptomics and genomics. In addition, the intrinsic complexity of glycans and the reliance on a wide range of orthogonal experimental techniques to elucidate their structures slow down the discovery of new glycans and contribute to the internal fragmentation of glycobiology.

This research aims to create a set of tools to reconstitute the puzzle of the different sources of experimental data produced in glycomics, as well as to re-establish the link with the other ’omics’ resources. To characterise the specific aim of each tool, we have classified them into three categories: integrative, explorative and knowledge-based.

Integrative tools were designed to combine information that is found in different databases and cannot be integrated through a simple hyperlink. This category is dominated by two main tools: GlyS3 and EpitopeXtractor. The first searches for glycan substructures; the second determines which glycan epitopes are contained in one or more complete glycans on the basis of a reference set.

Explorative tools allow the user to analyse data through a graphical and interactive approach.
The first software in this category is PepSweetener, which facilitates the manual annotation of glycopeptides. The second is Glynsight which, through a dedicated web interface, makes it possible to visualise and compare glycan profiles. Finally, we also took up the Glydin’ resource, which lets the user navigate a network of glyco-epitopes to highlight possible relatedness between ligands in sugar-protein interaction data.

The last category concerns knowledge-based tools. The central platform, called GlyConnect, is an integrated dashboard for the analysis of glycomics and glycoproteomics data.

In conclusion, all the tools presented in this thesis are part of the Glycomics@ExPASy initiative, which aims to spread the use of bioinformatics in glycobiology and to emphasise the relationship between glycobiology and protein-oriented bioinformatics.

Abstract

Glycosylation is one of the most abundant post-translational modifications (PTM) and the primary cause of microheterogeneity in proteins (glycoforms). It consists of the addition of a sugar moiety on the surface of a protein, and it has been suggested that over 50% of mammalian cellular proteins are typically glycosylated. Despite its importance, glycobiology is still lacking concrete bioinformatics support which, in the last two decades, has boosted other major ’omics’ like proteomics, transcriptomics and genomics. Additionally, the inherent complexity of carbohydrates and the use of many orthogonal experimental techniques to elucidate a structure have slowed down the annotation of novel glycans, leaving glycobiology fragmented internally. This thesis aims at the creation of a collection of tools to reconstitute the puzzle of biological evidence produced by different glycomics experiments and, at the same time, fill the gap with other ’omics’.
To characterise the aim of each tool, we have created three different categories: integrative, explorative and knowledge-based. Integrative tools have been designed to combine specific pieces of information which are stored in different databases and cannot be integrated by a simple connection. The category is populated by two different tools: GlyS3 and EpitopeXtractor. The first performs glycan substructure search, whereas the second determines which glycan epitopes are contained in one or more glycans based on a curated collection. Explorative tools enable the user to perform data analysis with a graphical and interactive approach, speeding up the analysis process. The first tool of this category is PepSweetener, which facilitates the manual annotation of glycopeptides. Then, we have Glynsight, which proposes a way to visualise and compare glycan profiles with a dedicated web interface. To conclude, we mention Glydin’, a network visualisation of almost 600 glycan epitopes where connections exemplify substructure relationships. The last category covers the knowledge-based tools and consists of GlyConnect, an integrated dashboard for glycomics and glycoproteomics analysis. All tools presented in this thesis are part of the Glycomics@ExPASy initiative, which aims at spreading the use of bioinformatics in glycobiology and stressing the relation between glycobiology and protein-oriented bioinformatics.

Acknowledgements

Firstly, I would like to express my sincere gratitude to my advisor Dr Frédérique Lisacek for the continuous support of my PhD study and related research, and for her patience and motivation. She always supported my ideas, promoting the exploration of new technologies. Besides my advisor, I would like to thank the rest of my thesis committee: Prof. Bastien Chopard, Prof. Amos Bairoch, and Prof. Serge Perez, for their insightful comments and encouragement.
I would like to thank my fellow labmates for the stimulating discussions during coffee breaks and for all the fun we have had in the last four years. Additionally, I would like to thank the Swiss Institute of Bioinformatics and all the partners of the Gastric Glyco Explorer that have given me the possibility to reach the end of this PhD. Finally, I would like to thank my family and my friends for supporting me spiritually throughout the writing of this thesis and in my life in general.

Contents

Résumé en Français
Abstract
Acknowledgements
Contents
1 Introduction
  1.1 Bioinformatics
  1.2 Data Integration
    1.2.1 Data discovery
    1.2.2 Data exploitation
    1.2.3 Data provenance
  1.3 Data integration strategies
    1.3.1 Service oriented architecture
    1.3.2 Link integration
    1.3.3 Data warehousing
    1.3.4 View integration
    1.3.5 Model-driven service oriented architecture
    1.3.6 Integration applications
    1.3.7 Workflows
    1.3.8 Mashups
    1.3.9 The Semantic Web and RDF
  1.4 Data integration in bioinformatics
    1.4.1 Standards
    1.4.2 Ontologies
    1.4.3 Formats and reporting guidelines
    1.4.4 Identifiers
    1.4.5 Visualisation
  1.5 Glycomics
  1.6 Data integration in glycomics
    1.6.1 Glycan formats
    1.6.2 Glycan graphical representations
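
The abstract above mentions glycan substructure search, the task GlyS3 addresses, without showing what the operation involves. As a rough illustration only, the following Python sketch treats a glycan as a rooted tree of residues and tests whether a smaller pattern tree occurs inside it. The class `Residue`, the functions `contains` and `matches_at`, and the linkage labels such as `"a1-3"` are assumptions made for this example; they are not taken from GlyS3 or from the thesis, and they say nothing about how GlyS3 is actually implemented.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Residue:
    """One building block (monosaccharide) of a glycan tree (hypothetical model)."""
    name: str
    # Each child is (linkage label, child residue); labels are illustrative only.
    children: List[Tuple[str, "Residue"]] = field(default_factory=list)


def _assign(node_children, pattern_children) -> bool:
    """Backtracking: map every pattern child to a distinct, matching node child."""
    if not pattern_children:
        return True
    (p_link, p_child), rest = pattern_children[0], pattern_children[1:]
    for i, (n_link, n_child) in enumerate(node_children):
        if n_link == p_link and matches_at(n_child, p_child):
            remaining = node_children[:i] + node_children[i + 1:]
            if _assign(remaining, rest):
                return True
    return False


def matches_at(node: Residue, pattern: Residue) -> bool:
    """True if the pattern can be laid over the glycan with its root on `node`."""
    if node.name != pattern.name:
        return False
    return _assign(node.children, pattern.children)


def contains(glycan: Residue, pattern: Residue) -> bool:
    """True if the pattern occurs anywhere in the glycan tree."""
    if matches_at(glycan, pattern):
        return True
    return any(contains(child, pattern) for _, child in glycan.children)


# N-glycan core (Man3GlcNAc2), rooted at the reducing-end GlcNAc.
core = Residue("GlcNAc", [
    ("b1-4", Residue("GlcNAc", [
        ("b1-4", Residue("Man", [
            ("a1-3", Residue("Man")),
            ("a1-6", Residue("Man")),
        ])),
    ])),
])

# Query: a mannose carrying an a1-6 and an a1-3 mannose (branch order is irrelevant).
trimannose = Residue("Man", [("a1-6", Residue("Man")), ("a1-3", Residue("Man"))])

print(contains(core, trimannose))                                      # True
print(contains(core, Residue("Man", [("a1-2", Residue("Man"))])))      # False
```

The sketch compares residue names and linkage labels exactly and ignores everything else a real glycan encoding would carry (anomeric configuration, substituents, undetermined linkages), so it is only meant to make the notion of a substructure query concrete.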
Recommended publications
  • A Comprehensive Workflow for Variant Calling Pipeline Comparison and Analysis Using R Programming
    www.ijcrt.org © 2020 IJCRT | Volume 8, Issue 8 August 2020 | ISSN: 2320-2882 A COMPREHENSIVE WORKFLOW FOR VARIANT CALLING PIPELINE COMPARISON AND ANALYSIS USING R PROGRAMMING 1Mansi Ujjainwal, 2Preeti Chaudhary 1MSc Bioinformatics, 2MTech Bioinformatics, 1Amity Institute of Biotechnology 1Amity University, Noida, India Abstract: The aim of the article is to provide a variant calling workflow and an analysis protocol for comparing the results of two variant calling platforms. The variant calling pipelines used here are predominantly used for calling variants in human whole exome data and whole genome data. The result of a variant calling pipeline is a set of variants (SNPs, insertions, deletions etc.) present in the sequencing data. Each pipeline calls some variants that intersect with those of the other pipeline and some that are unique to it. The intersecting and unique variants can further be distinguished on the basis of their reference SNP ID and grouped on the basis of their annotation. The number of variants called can be very large, depending upon the size and complexity of the data. R programming packages and the Ubuntu command shell can be used to differentiate and analyse the variants called by each type of pipeline. Index Terms – Whole Exome Sequencing, Variant Calling, Sequencing data, R programming I. INTRODUCTION The Human Genome Project, started in 1990, is the single most significant project in the field of biomedical sciences and biology. The project set out to change how we see biology and medicine, and to sequence the complete genome of Homo sapiens as well as several microorganisms including Escherichia coli and Saccharomyces cerevisiae, and metazoans such as Caenorhabditis elegans.
  • Communicate the Future PROCEEDINGS HYATT REGENCY ORLANDO 20–23 MAY | ORLANDO, FL and Summit.Stc.Org #STC18
    Communicate the Future PROCEEDINGS HYATT REGENCY ORLANDO 20–23 MAY | ORLANDO, FL www.stc.org and summit.stc.org #STC18 SOCIETY FOR TECHNICAL COMMUNICATION The papers published in these proceedings were reproduced from originals furnished by the authors. The authors, not the Society for Technical Communication (STC), are solely responsible for the opinions expressed, the integrity of the information presented, and the attribution of sources. The papers presented in this publication are the works of their respective authors. Minor copyediting changes were made to ensure consistency. STC grants permission to educators and academic libraries to distribute articles from these proceedings for classroom purposes. There is no charge to these institutions, provided they give credit to the author, the proceedings, and STC. All others must request permission. All product and company names herein are the property of their respective owners. © 2018 Society for Technical Communication 9401 Lee Highway, Suite 300 Fairfax, VA 22031 USA +1.703.522.4114 www.stc.org Design and layout by Avon J.
  • Property Graph Vs RDF Triple Store: a Comparison on Glycan Substructure Search
    RESEARCH ARTICLE Property Graph vs RDF Triple Store: A Comparison on Glycan Substructure Search Davide Alocci1,2, Julien Mariethoz1, Oliver Horlacher1,2, Jerven T. Bolleman3, Matthew P. Campbell4, Frederique Lisacek1,2* 1 Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland, 2 Computer Science Department, University of Geneva, Geneva, 1227, Switzerland, 3 Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland, 4 Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia Abstract Resource description framework (RDF) and Property Graph databases are emerging technologies that are used for storing graph-structured data. We compare these technologies through a molecular biology use case: glycan substructure search. Glycans are branched tree-like molecules composed of building blocks linked together by chemical bonds. The molecular structure of a glycan can be encoded into a directed acyclic graph where each node represents a building block and each edge serves as a chemical linkage between two building blocks. In this context, Graph databases are possible software solutions for storing glycan structures and Graph query languages, such as SPARQL and Cypher, can be used to perform a substructure search. Glycan substructure searching is an important feature for querying structure and experimental glycan databases and retrieving biologically meaningful data. OPEN ACCESS Citation: Alocci D, Mariethoz J, Horlacher O, Bolleman JT, Campbell MP, Lisacek F (2015) Property Graph vs RDF Triple Store: A Comparison on Glycan Substructure Search. PLoS ONE 10(12): e0144578. doi:10.1371/journal.pone.0144578 Editor: Manuela Helmer-Citterich, University of Rome Tor Vergata, ITALY Received: July 16, 2015
  • Genomic Alignment (Mapping) and SNP / Polymorphism Calling
    Genomic alignment (mapping) and SNP / polymorphism calling Jérôme Mariette & Christophe Klopp http://bioinfo.genotoul.fr/ Bioinfo Genotoul platform – Since 2008 ● 1 Roche 454 ● 1 MiSeq ● 2 HiSeq – Providing ● Data processing for quality control ● Secure data access to end users http://bioinfo.genotoul.fr/ http://ng6.toulouse.inra.fr/ Bioinfo Genotoul: Services – High speed computing facility access – Application and web-server hosting – Training – Support – Project partnership Genetic variation http://en.wikipedia.org/wiki/Genetic_variation Genetic variation, variations in alleles of genes, occurs both within and among populations. Genetic variation is important because it provides the “raw material” for natural selection. http://studentreader.com/genotypes-phenotypes/ Types of variations ● SNP: single nucleotide polymorphism ● CNV: copy number variation ● Chromosomal rearrangement ● Chromosomal duplication http://en.wikipedia.org/wiki/Copy-number_variation http://en.wikipedia.org/wiki/Human_genetic_variation The variation transmission ● Mutation: in molecular biology and genetics, mutations are changes in a genomic sequence: the DNA sequence of a cell's genome or the DNA or RNA sequence of a virus (http://en.wikipedia.org/wiki/Mutation). ● Mutations are transmitted if they are not lethal. ● Mutations can impact the phenotype. Genetic markers and genotyping ● A set of SNPs is selected along the genome. ● The phenotypes are collected for individuals. ● The SNPs are genotyped
  • Genetics 211 - 2018 Lecture 3
    Genetics 211 - 2018 Lecture 3 High Throughput Sequencing Pt II Gavin Sherlock January 23rd 2018 Long “Synthetic Reads” aka Moleculo: Genomic DNA; Fragment; Size Select (10kb); Polish, ligate amplification adaptors; ~10 kb DNA; Dilute to 500 molecules per well; Amplify, fragment, add sequencing adaptors; Pool; Sequence; Separate, based on well barcode; Remove barcodes, assemble 10kb fragments; Assemble genome from 10kb fragments Synthetic Read Characteristics 10x Genomics • Similar in concept to CPT-Seq from last week’s paper • Idea is to uniquely barcode reads that derive from a long molecule - ~50-100kb • 10x Chromium system automates much of the process for you [Figure labels: ~10 HMW gDNA molecules per GEM; 10x Barcoded Beads; HMW gDNA; Oil] Benefits of 10x • Correct placement in difficult to align regions: Paralog A, Paralog B • Haplotype phasing Using Hi-C data to aid assemblies • Hi-C is a proximity ligation method, aimed at reconstructing the 3 dimensional structure of a genome • Originally developed with the idea of looking at how the genome of an organism for which a good reference exists is physically organized • But, probability of intrachromosomal contacts is much higher than that of interchromosomal contacts.
  • Efficient Support for Data-Intensive Scientific Workflows on Geo
    Efficient support for data-intensive scientific workflows on geo-distributed clouds Luis Eduardo Pineda Morales To cite this version: Luis Eduardo Pineda Morales. Efficient support for data-intensive scientific workflows on geo-distributed clouds. Computation and Language [cs.CL]. INSA de Rennes, 2017. English. NNT : 2017ISAR0012. tel-01645434v2 HAL Id: tel-01645434 https://tel.archives-ouvertes.fr/tel-01645434v2 Submitted on 13 Dec 2017 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. To Lucía, who did not let me give up. Acknowledgements One page is not enough to thank all the people who, in one way or another, have helped, taught, and motivated me along this journey. My deepest gratitude to them all. To my amazing supervisors Gabriel Antoniu and Alexandru Costan, words fall short to thank you for your guidance, support and patience during these 3+ years. Gabriel: Thank you! For your trust and support went often beyond the academic. Alex: it has been my honor to be your first PhD student, I hope I met the expectations. To my family, Lucía, for standing by my side every time, everywhere; you are the driving force in my life, I love you.
  • Apache Taverna
    Apache Taverna http://taverna.incubator.apache.org/ Donal Fellows (@donalfellows, http://orcid.org/0000-0002-9091-5938), Stian Soiland-Reyes (@soilandreyes, http://orcid.org/0000-0001-9842-9718), Alan R Williams (@alanrw, http://orcid.org/0000-0003-3156-2105) Collaborations Workshop 2015-03-26 This work is licensed under a Creative Commons Attribution 3.0 Unported License. Taverna Workflow Ecosystem • Workflow Language: SCUFL2 (and t2flow) • Workflow Engine: Taverna • Used in… – Taverna Command Line Tool – Taverna Server – Taverna Workbench • Allied services – myExperiment, workflow repository – Service Catalographer, service catalog software • Instantiated as BioCatalogue, BiodiversityCatalogue, … NERSC Workflow Day [Figure: Map of the Taverna Ecosystem, showing the Taverna Ruby client, Taverna Online, application-specific portals, Taverna Player, Taverna Lite, Taverna Workbench, Taverna Command Line Tool, Taverna Server (REST and SOAP APIs), components, the Taverna Core activity engine with its plugins, the workflow repository, service catalogs and many services] Users, Scientific Areas, Projects: Taverna In Use Taverna Users Worldwide Taverna Uses: Scientific Areas • Biodiversity: BioVeL project • Digital Preservation: SCAPE project • Astronomy: AstroTaverna product • Solar Wind Physics: HELIO project • In silico Medicine: VPH-Share project Biodiversity: BioVeL • Virtual e-Laboratory for
  • Introduction to Bioinformatics (Elective) – SBB1609
    SCHOOL OF BIO AND CHEMICAL ENGINEERING DEPARTMENT OF BIOTECHNOLOGY Unit 1 – Introduction to Bioinformatics (Elective) – SBB1609 I. HISTORY OF BIOINFORMATICS Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. As an interdisciplinary field of science, bioinformatics combines computer science, statistics, mathematics, and engineering to analyze and interpret biological data. Bioinformatics has been used for in silico analyses of biological queries using mathematical and statistical techniques. Bioinformatics derives knowledge from computer analysis of biological data. These can consist of the information stored in the genetic code, but also experimental results from various sources, patient statistics, and scientific literature. Research in bioinformatics includes method development for storage, retrieval, and analysis of the data. Bioinformatics is a rapidly developing branch of biology and is highly interdisciplinary, using techniques and concepts from informatics, statistics, mathematics, chemistry, biochemistry, physics, and linguistics. It has many practical applications in different areas of biology and medicine. Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems. "Classical" bioinformatics: "The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information." The National Center for Biotechnology Information (NCBI 2001) defines bioinformatics as: "Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline.
  • A Review of Scalable Bioinformatics Pipelines
    Data Sci. Eng. (2017) 2:245–251 https://doi.org/10.1007/s41019-017-0047-z A Review of Scalable Bioinformatics Pipelines Bjørn Fjukstad · Lars Ailo Bongo Received: 28 May 2017 / Revised: 29 September 2017 / Accepted: 2 October 2017 / Published online: 23 October 2017 © The Author(s) 2017. This article is an open access publication Abstract Scalability is increasingly important for bioinformatics analysis services, since these must handle larger datasets, more jobs, and more users. The pipelines used to implement analyses must therefore scale with respect to the resources on a single compute node, the number of nodes on a cluster, and also to cost-performance. Here, we survey several scalable bioinformatics pipelines and compare their design and their use of underlying frameworks and infrastructures. We also discuss current trends for bioinformatics pipeline development. Keywords Pipeline · Bioinformatics · Scalable · Infrastructure · Analysis services Scalability is increasingly important for these analysis services, since the cost of instruments such as next-generation sequencing machines is rapidly decreasing [1]. The reduced costs have made the machines more available which has caused an increase in dataset size, the number of datasets, and hence the number of users [2]. The backend executing the analyses must therefore scale up (vertically) with respect to the resources on a single compute node, since the resource usage of some analyses increases with dataset size. For example, short sequence read assemblers [3] may require TBs of memory for big datasets and tens of CPU cores [4]. The analysis must also scale out (horizontally) to take advantage of compute clusters and clouds.
  • Meeting Booklet
    LS2 Annual Meeting 2019 “Cell Biology from Tissue to Nucleus” Meeting Booklet 14-15 February, University of Zurich, Campus Irchel LS2 - Life Sciences Switzerland Design by Dagmar Bocakova WELCOME ADDRESS Dear colleagues and friends It is with great pleasure that we invite you to the LS² Annual Meeting 2019, held on the 14th and 15th of February, 2019 at the Irchel Campus of the University of Zurich. This is a special year, as it is the 50th anniversary of USGEB/LS2, which we will celebrate with a Jubilee Apéro on Thursday 14th. The LS² Annual Meeting brings together scientists from all nations and backgrounds to discuss a variety of Life Science subjects. The meeting this year focusses on the cell in health and disease, with plenary talks on different aspects of cell biology and a symposium on live cell imaging approaches. You will be able to hear the latest, most exciting findings in several fields, from Molecular and Cellular Biosciences, Proteomics, Chemical Biology, Physiology, Pharmacology and more, presented by around 30 invited speakers and 45 speakers selected from abstracts in one of the seven scientific symposia and five plenary lectures. Two novelties highlight the meeting this year: The "PIs of Tomorrow" session, in which selected postdocs will present their research to a jury of professors, will be for the first time a plenary session. In addition, poster presenters will be selected to give flash talks in the symposia with the spirit to further promote young scientists. Join us for the poster session with more than 130 posters, combined with a large industry exhibition and the Jubilee Apéro.
  • Programming Models to Support Data Science Workflows
    UNIVERSITAT POLITÈCNICA DE CATALUNYA (UPC) BARCELONATECH COMPUTER ARCHITECTURE DEPARTMENT (DAC) Programming models to support Data Science workflows PH.D. THESIS 2020 | SPRING SEMESTER Author: Cristián RAMÓN-CORTÉS VILARRODONA Advisors: Dra. Rosa M. BADIA SALA, Dr. Jorge EJARQUE ARTIGAS ”Apenas él le amalaba el noema, a ella se le agolpaba el clémiso y caían en hidromurias, en salvajes ambonios, en sustalos exasperantes. Cada vez que él procuraba relamar las incopelusas, se enredaba en un grimado quejumbroso y tenía que envulsionarse de cara al nóvalo, sintiendo cómo poco a poco las arnillas se espejunaban, se iban apeltronando, reduplimiendo, hasta quedar tendido como el trimalciato de ergomanina al que se le han dejado caer unas fílulas de cariaconcia. Y sin embargo era apenas el principio, porque en un momento dado ella se tordulaba los hurgalios, consintiendo en que él aproximara suavemente sus orfelunios. Apenas se entreplumaban, algo como un ulucordio los encrestoriaba, los extrayuxtaba y paramovía, de pronto era el clinón, la esterfurosa convulcante de las mátricas, la jadehollante embocapluvia del orgumio, los esproemios del merpasmo en una sobrehumítica agopausa. ¡Evohé! ¡Evohé! Volposados en la cresta del murelio, se sentían balpamar, perlinos y márulos. Temblaba el troc, se vencían las marioplumas, y todo se resolviraba en un profundo pínice, en niolamas de argutendidas gasas, en carinias casi crueles que los ordopenaban hasta el límite de las gunfias.” Julio Cortázar, Rayuela Dedication This work would not have been possible without the effort and patience of the people around me.
  • ADMI Cloud Computing Presentation
    ECSU/IU NSF EAGER: Remote Sensing Curriculum Enhancement using Cloud Computing ADMI Cloud Workshop June 10th – 12th 2016 Day 1 Introduction to Cloud Computing with Amazon EC2 and Apache Hadoop Prof. Judy Qiu, Saliya Ekanayake, and Andrew Younge Presented By Saliya Ekanayake 6/10/2016 Cloud Computing • What’s Cloud? Defining this is not worth the time. Ever heard of The Blind Men and The Elephant? If you still need one, see NIST definition next slide. The idea is to consume X as-a-service, where X can be computing, storage, analytics, etc. X can come from 3 categories: Infrastructure-as-a-Service, Platform-as-a-Service, Software-as-a-Service. Classic Computing: my washer, my bleach, I wash. Cloud Computing, IaaS: rent a washer or two or three, my bleach, I wash. PaaS: I tell: comforter dry clean, shirts regular clean. SaaS: put my clothes in and they magically appear clean the next day. The Three Categories • Software-as-a-Service: provides web-enabled software. Ex: Google Gmail, Docs, etc. • Platform-as-a-Service: provides scalable computing environments and runtimes for users to develop large computational and big data applications. Ex: Hadoop MapReduce • Infrastructure-as-a-Service: provides virtualized computing and storage resources in a dynamic, on-demand fashion. Ex: Amazon Elastic Compute Cloud The NIST Definition of Cloud Computing • “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” On-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf • However, formal definitions may not be very useful.