Treball Fi De Grau

TREBALL DE FI DE GRAU TFG TITLE: Development of a GUI for InterMineR and Cytoscape to make biological databases FAIR. DEGREE: Biomedical Engineering Degree AUTHOR: Celia Sánchez Laorden ADVISORS: Dr Rachel Lyne Dr Gos Micklem SUPERVISOR: Dr Santiago Marco DATE: June 13, 2021 Títol: Creació d’una interfície gràfica d’usuari entre InterMineR i Cytoscape per fer que les bases de dades siguin FAIR. Autor: Celia Sánchez Laorden Directors: Dr Rachel Lyne Dr Gos Micklem Supervisor: Dr Santiago Marco Data: 13 de juny de 2021 Resum InterMine és un sistema del tipus magatzem de dades que permet crear grans bases de dades biològiques fàcilment des de fonts heterogènies en serveis web RESTful. El sistema proporciona eines estadístiques i d’anàlisis de dades potents. InterMine com a plataforma té una interfície web des d’on l’usuari pot fer cerques i incorpora recursos especialitzats com mètodes d’anàlisis d’enriquiment de conjunts de gens. InterMineR és una de les biblioteques informàtiques que dóna suport a la plataforma des de l’entorn de programació de R Studio. En aquest projecte, s’han implementat noves funcionalitats per permetre que l’usuari treballi amb col·leccions i conjunts de dades creats des de el servei web, anomenats llistes. Les noves funcions han estat incorporades i publicades en la nova versió del paquet a Bioconductor. L’objectiu principal d’aquest projecte ha estat la creació d’una interfície gràfica d’usuari (GUI) per fer cerques a les bases de dades d’InterMine i generar visualitzacions de xarxes Cytoscape. S’ha desenvolupat un programari executable en entorns d’es- criptori. Aquest permet als usuaris reduir la complexitat de les dades i extreure’n significat amb fins de recerca. S’espera que permeti analitzar amb més profun- ditat les fonts d’InterMine i augmentar-ne la FAIR (dades trobables, accessibles, interoperables i reutilitzables en anglès). Title : Development of a graphical user interface for InterMineR and Cytoscape to make biological databases FAIR. Author: Celia Sánchez Laorden Advisors: Dr Rachel Lyne Dr Gos Micklem Supervisor: Dr Santiago Marco Date: June 13, 2021 Overview InterMine is a biological data warehousing system for creating large biological databases of heterogenous data sources easily in RESTful web services. It provides powerful statistical and analysis tools. InterMine as a platform has a web interface in which user can run searches and incorporates specialized resources such as enrichment statistics. InterMineR is one of the software ’client’ libraries that interfaces the biological databases built using the InterMine platform with a programming environment, in this case R. In this project, new functionalities have been implemented to enable the user to work with stored collections of data and saved results sets created from a web-service, called lists. The new functions have been merged and published in the new Bioconductor version of the package. The main aim of this project is to create a graphical user interface (GUI) to query the InterMine databases and generating Cytoscape network visualizations. We developed a software that runs on desktop environment. It allows users to reduce the complexity of the data and extract meaning from it. It will allow data in InterMines to be further analysed - increasing its FAIRness. First of all, I would like to show appreciation to Dr Gos Micklem, for giving me the opportunity to conduct this project in his Lab. I would like to express my most sincere thanks of gratitude to Dr Rachel Lyne for her guidance, advice and assistance. This work would not have been possible without her intense commitment and dedication. I cannot fail in my remarks to express appreciation for the coworking with Yo Yehudi, Daniela Butano and Adrián Bazaga. Finally, I would also like to thank to Dr Santiago Marco, for his advice and support. CONTENTS Project Origin and Motivation .......................... 1 CHAPTER 1. Introduction ............................... 2 1.1. Scope ......................................... 2 1.2. Span ......................................... 2 1.2.1. Description of the span of the project .................. 3 1.2.2. Requirements of the project ....................... 3 1.2.3. Limitations of the project ......................... 4 1.2.4. Deliverables of the project ........................ 5 1.2.5. Criterion of acceptation of the results .................. 5 1.2.6. Restrictions of the project ........................ 5 1.2.7. Initial known risks ............................. 6 CHAPTER 2. Background ............................... 7 2.1. State of the art .................................... 7 2.1.1. InterMine as a Data Warehouse ..................... 7 2.1.2. Setting up an InterMine: making data FAIR ............... 8 2.1.3. Cytoscape: graph analysis software ................... 9 2.2. Project Environment ................................ 10 CHAPTER 3. Scan of the Market ........................... 11 CHAPTER 4. Design Engineering ........................... 13 4.1. Improvements of the core InterMineR R package ............... 13 4.1.1. Preliminary Project Study ........................ 13 4.1.2. Proposed Solution ............................ 13 4.2. Interfacing with Cytoscape ............................ 14 4.2.1. Preliminary Project Study ........................ 14 4.2.2. Proposed Solution ............................ 14 4.3. Tutorials ....................................... 15 4.3.1. Preliminary Project Study ........................ 15 4.3.2. Proposed Solution ............................ 15 4.4. Documentation ................................... 15 4.4.1. Preliminary Project Study ........................ 15 4.4.2. Proposed Solution ............................ 16 CHAPTER 5. Detail Engineering ............................ 17 5.1. Technologies involved ............................... 17 5.2. Design ........................................ 19 5.2.1. Improvements of the core of InterMineR ................ 19 5.2.2. Interfacing InterMineR and Cytoscape with Shiny ........... 23 5.2.3. Set Query Tab ............................... 25 5.2.4. Run your Query Tab ........................... 28 5.2.5. Visualize your Results Tab ........................ 28 5.2.6. Overlay Additional Data Tab ....................... 30 5.2.7. Saved Networks tab ........................... 31 5.2.8. Customer Support ............................ 32 5.2.9. GitHub Repository ............................ 33 5.3. Results ........................................ 33 5.3.1. Improvements of the core of InterMineR ................ 33 5.3.2. Use cases for the InterMineR-Cytoscape Shiny Interface ....... 35 CHAPTER 6. Organization ............................... 41 6.1. Technique pre-feasibility study .......................... 41 6.2. Schedule of Execution ............................... 42 CHAPTER 7. Economic pre-feasibility study .................... 43 7.1. Cost study ...................................... 43 CHAPTER 8. Environmental Impact .......................... 45 CHAPTER 9. License .................................. 46 CHAPTER 10. Future Extensions ........................... 47 Conclusions ...................................... 48 Bibliography ...................................... 49 APPENDIX A. Task 1: Improving InterMineR ............... 2 A.1. Classes ........................................ 2 A.1.1. ListManager-class.R ........................... 2 A.1.2. webservice-class.R ............................ 2 A.1.3. InterMineR-class.R ............................ 2 A.2. Methods and Functions .............................. 3 A.2.1. initInterMine ................................ 3 A.2.2. list_manager ................................ 3 A.2.3. ListManager-methods.R ......................... 3 A.3. Documentation ................................... 7 A.3.1. ListManager-class.Rd .......................... 7 A.3.2. ListManager-class documentation rendered to HTML. ........ 9 A.4. Results ........................................ 10 APPENDIX B. Task 2: Shiny Interface .................... 23 B.1. Welcoming message ................................ 23 B.2. Interface simplified code .............................. 23 B.2.1. Simplified code of the app.R script structure .............. 23 B.2.2. Simplified code of the tab dashboard structure of: ........... 25 B.3. InterMineR fragments of code .......................... 26 B.3.1. Selecting the InterMine and getting the data model: initInterMine, listMines, getModel ............................ 26 B.3.2. Get the information of the templates pre-defined in InterMine and get the query obtained in a template: getTemplates, getTemplatesQuery 26 B.3.3. Constraints: setConstraints ....................... 27 B.3.4. Initialize a new InterMineR query or modify an existing list query: setQuery .................................. 29 B.3.5. Get the summary of constraints: summary ............... 29 B.3.6. Get the results: runQuery ........................ 30 B.4. Cytoscape fragments of code ........................... 31 B.4.1. Visualize tab ................................ 31 B.4.2. Overlay tab ................................ 35 B.4.3. Saved Networks tab ........................... 46 B.5. User guide ...................................... 47 APPENDIX C. Use-cases for the InterMineR-Cytoscape Shiny In- terface ................................ 56 C.1. HumanMine use-case. ............................... 56 C.1.1. Workflow A: ................................ 56 C.1.2. Workflow B: ................................ 57 C.2. CovidMine use-case. ...............................

Treball Fi De Grau

Description of the Biomart Package

The Biomart User's Guide

Variation Data in Ensembl Plants

Original Article the Reactome Biomart

The Biomart User's Guide

The Biomart User's Guide

Biomart–Biological Queries Made Easy Damian Smedley, Syed Haider, Benoit Ballester, Richard Holland, Darin London, Gudmundur Thorisson, Arek Kasprzyk

The Biomart Community Portal

The Biomart User's Guide

The Biomart User's Guide

The Biomart Community Portal

Advanced Genomic Data Mining