Bioinformatics Methods for the Genetic and Molecular Characterisation Of

Total Page:16

File Type:pdf, Size:1020Kb

Bioinformatics Methods for the Genetic and Molecular Characterisation Of BIOINFORMATICS METHODSFORTHE GENETICAND MOLECULAR CHARACTERISATION OFCANCER Dissertation zur Erlangung des Grades des Doktors der Naturwissenschaften der Naturwissenschaftlich-Technischen Fakultäten der Universität des Saarlandes vo n DANIELSTÖCKEL Universität des Saarlandes, MI – Fakultät für Mathematik und Informatik, Informatik b e t r e u e r PROF.DR.HANS-PETERLENHOF 6. dezember 2016 Daten der Verteidigung Datum 25.11.2016, 15 Uhr ct. Vorsitzender Prof. Dr. Volkhard Helms Erstgutachter Prof. Dr. Hans-Peter Lenhof Zweitgutachter Prof. Dr. Andreas Keller Beisitzerin Dr. Christina Backes Dekan Prof. Dr. Frank-Olaf Schreyer This thesis is also available online at: https://somweyr.de/~daniel/ dissertation.zip ii a b s t r ac t Cancer is a class of complex, heterogeneous diseases of which many types have proven to be difficult to treat due to the high ge- netic variability between and within tumours. To improve ther- apy, some cases require a thorough genetic and molecular charac- terisation that allows to identify mutations and pathogenic pro- cesses playing a central role for the development of the disease. Data obtained from modern, biological high-throughput exper- iments can offer valuable insights in this regard. Therefore, we developed a range of interoperable approaches that support the analysis of high-throughput datasets on multiple levels of detail. Mutations are a main driving force behind the development of cancer. To assess their impact on an affected protein, we de- signed BALL-SNP which allows to visualise and analyse single nucleotide variants in a structure context. For modelling the ef- fect of mutations on biological processes we created CausalTrail which is based on causal Bayesian networks and the do-calculus. Using NetworkTrail, our web service for the detection of deregu- lated subgraphs, candidate processes for this investigation can be identified. Moreover, we implemented GeneTrail2 for uncovering deregulated processes in the form of biological categories using enrichment approaches. With support for more than 46,000 cat- egories and 13 set-level statistics, GeneTrail2 is the currently most comprehensive web service for this purpose. Based on the ana- lyses provided by NetworkTrail and GeneTrail2 as well as know- ledge from third-party databases, we built DrugTargetInspector, a tool for the detection and analysis of mutated and deregulated drug targets. We validated our tools using a Wilm’s tumour expression dataset and were able to identify pathogenic mechanisms that may be responsible for the malignancy of the blastemal tumour subtype and might offer opportunities for the development of novel therapeutic approaches. iii zusammenfassung Krebs ist eine Klasse komplexer, heterogener Erkrankungen mit vielen Unterarten, die aufgrund der genetischen Variabilität, die zwischen und innerhalb von Tumoren herrscht, nur schwer zu behandeln sind. Um eine bessere Therapie zu ermöglichen ist daher in einigen Fällen eine sorgfältige, genetische und moleku- lare Charakterisierung nötig, welche es erlaubt die Mutationen und pathogenen Prozesse zu identifizieren, die eine zentrale Rol- le während der Krankheitsentwicklung spielen. Daten aus mo- dernen, biologischen Hochdurchsatzexperimenten können hier- bei wertvolle Einsichten liefern. Daher entwickelten wir eine Rei- he interoperabler Ansätze, welche die Analyse von Hochdurch- satzdatensätzen auf mehreren Detailstufen unterstützen. Mutationen sind die treibende Kraft hinter der Entstehung von Krebs. Um ihren Einfluss auf das betroffene Protein beur- teilen zu können, entwarfen wir die Software BALL-SNP, wel- che es erlaubt einzelne Single Nucleotide Variations in einer Kris- tallstruktur zu visualisieren und analysieren. Um den Effekt ei- ner Mutation innerhalb eines biologischen Prozesses modellie- ren zu können, erstellten wir CausalTrail, das auf kausalen bayes- schen Netzwerken und dem do-calculus basiert. Unter der Ver- wendung von NetworkTrail, unserem Web-Service zur Detektion deregulierter Subgraphen, können Prozesse identifiziert werden, die als Kandidaten für eine solche Untersuchung dienen können. Zur Detektion deregulierter Prozesse in der Form von biologi- schen Kategorien mittels Enrichment-Ansätzen implementierten wir GeneTrail2. GeneTrail2 unterstützt mehr als 46.000 Kategorien und 13 Statistiken zur Berechnung von Enrichment-Scores. Basie- rend auf den Analysemethoden von NetworkTrail und GeneTrail2, sowie dem Wissen aus Drittdatenbanken konstruierten wir Drug- TargetInspector, ein Werkzeug zur Detektion und Analyse von mu- tierten und deregulierten Wirkstoffzielen. Wir validierten unsere Werkzeuge unter Verwendung eines Wilm’s Tumor Expressionsdatensatzes, für den wir pathogene Mechanismen identifizieren konnten, die für die Malignität des blastemreichen Subtyps verantwortlich sein können und biswei- len die Entwicklung neuartiger, therapeutischer Ansätze ermög- lichen könnten. v ACKNOWLEDGEMENTS The thesis you hold in your hands marks the end of a six year journey. When I started my PhD, a fledgling Master of Science, I thought Iknew everything about science. Surely I would be done in three years tops and surely my work would be nothing but revolutionising. All the oth- ers were certainly just slacking off! Oh, how naïve I was back then! And with the passage of time I learned a lot. Some of the things I learned concerned science, while other did not. I learned that succeeding as a researcher not only takes a lot of theoretical knowledge, but also a lot of dedication and hard work. I learned that the solution to some prob- lems cannot be rushed, but instead requires many small and deliberate steps. But most importantly, I learned that science is not an one man show, but a team effort. Because of this I would like to take the time to thank the persons without which this thesis would not have been possible. First of all, I would like to thank my advisor Prof. Dr. Hans-Peter Lenhof for giving me the opportunity to work at his chair. Hans-Peter also gave me the freedom to independently pursue the research topics I was interested in. If, however, a problem arose he would selflessly sacrifice his precious time to help with finding a solution. For thisIam deeply grateful. Next, thanks is in order for Prof. Dr. Andreas Keller for reviewing my thesis, bringing many interesting research opportunities to my attention and the energy and enthusiasm he is infusing into the people that have the pleasure to work with him. Naturally, this work would have been impossible without my fam- ily and, for obvious reasons, without my parents Thomas and Sabine Stöckel. It was them who woke my interest in science and technology. As long as I can remember they supported and encouraged me to pur- sue the career I have chosen for myself and, thus, without them I would not have been able to achieve what I achieved. Similarly, I would like to thank Charlotte De Gezelle for her patience whenever I postponed yet another vacation because: “I am done soon™!” Whenever I would become frustrated and needed a shoulder to cry on I could rely on her having an open ear for me. You truly have made my world a brighter place! Of course, there is another group of people who’s importance I can- not stress enough. My proofreaders Lara Schneider, Tim Kehl, Fabian Müller, Florian Schmidt, and Andreas Stöckel fought valiantly against spelling mistakes, layout issues, and incomprehensible prose. Thank you for setting my head straight whenever I was again convinced that what I wrote was perfectly clear. Hint: it wasn’t. Equally, I would like to thank my coworkers Alexander Rurainski, Christina Backes, Anna- Katharina Hildebrandt, Oliver Müller, Marc Hellmuth, Lara Schneider, Patrick Trampert, and Tim Kehl for the pleasant and familial working environment. Much of the work presented in this thesis has been a joint endeavour in which they had integral parts. Similarly, I would like to vii praise all the awesome people from the Saarland University, as well as the Universities Mainz and Tübingen that I had the pleasure to work with. A special shoutout goes to my colleagues and good friends from the MPI for Computer Science for the many enjoyable discussions dur- ing lunch. A thank you is also in order for my students, although I am not certain whether they enjoyed being taught by me as much as I en- joyed the opportunity to teach them. Finally, I am grateful for all my friends and the time we spent together playing board games, celebrat- ing, or simply hanging out. Without you, I would certainly not have managed to pull through and finish this thesis. viii co n t e n t s CONTENTS Contents ix List of Figures xi List of Tables xiii List of Notations xv 1 Introduction 1 1.1 Motivation .......................... 2 1.2 Overview ............................ 4 2 Biological Background 9 2.1 Cancer ............................. 11 2.2 Personalised Medicine ................... 18 2.3 Biological Assays ...................... 20 2.4 Wilm’s Tumour Data .................... 30 3 Biological Networks 33 3.1 Graph Theory ......................... 34 3.2 Types of Biological Networks . 36 3.3 Causal Bayesian Networks . 40 3.4 Gaussian Graphical Models . 55 3.5 Deregulated Subgraphs . 62 4 Enrichment Algorithms 81 4.1 Statistical Tests ........................ 82 4.2 A Framework for Enrichment Analysis . 86 4.3 Evaluation .......................... 99 4.4 Hotelling’s T 2-Test . 107 5 Graviton 113 5.1 The World Wide Web . 114 5.2 The Graviton Architecture . 123 5.3 GeneTrail2 . 133 5.4 NetworkTrail . 152 5.5 Drug Target Inspector . 157 6 BALL 163 6.1 The Biochemical Algorithms Library . 165 6.2 ballaxy ............................ 168 6.3 PresentaBALL . 170 6.4 BALL-SNP . 171 6.5 Summary . 173 7 Conclusion 175 ix co n t e n t s 7.1 Summary . 175 7.2 Discussion . 177 A Mathematical Background 179 A.1 Combinatorics and Probability Theory . 179 A.2 Machine Learning . 186 B Wilm’s Dataset - Supplementary Tables 191 C Enrichment Evaluation - Results 199 C.1 Enrichments on Synthetic Categories . 199 C.2 Enrichments on Reactome Categories . 201 D GeneTrail2 - Tables 203 D.1 Supported Organisms . 203 D.2 List of Human Categories . 203 Bibliography 207 x list of figures LISTOFFIGURES 1.1 Common causes of death in Germany (2014).
Recommended publications
  • The Hitchhiker's Guide to Graph Exchange Formats
    The Hitchhiker’s Guide to Graph Exchange Formats Prof. Matthew Roughan [email protected] http://www.maths.adelaide.edu.au/matthew.roughan/ Work with Jono Tuke UoA June 4, 2015 M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 1 / 31 Graphs Graph: G(N; E) I N = set of nodes (vertices) I E = set of edges (links) Often we have additional information, e.g., I link distance I node type I graph name M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 2 / 31 Why? To represent data where “connections” are 1st class objects in their own right I storing the data in the right format improves access, processing, ... I it’s natural, elegant, efficient, ... Many, many datasets M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 3 / 31 ISPs: Internode: layer 3 http: //www.internode.on.net/pdf/network/internode-domestic-ip-network.pdf M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 4 / 31 ISPs: Level 3 (NA) http://www.fiberco.org/images/Level3-Metro-Fiber-Map4.jpg M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 5 / 31 Telegraph submarine cables http://en.wikipedia.org/wiki/File:1901_Eastern_Telegraph_cables.png M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 6 / 31 Electricity grid M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 7 / 31 Bus network (Adelaide CBD) M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 8 / 31 French Rail http://www.alleuroperail.com/europe-map-railways.htm M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 9 / 31 Protocol relationships M.Roughan (UoA) Hitch Hikers Guide June 4, 2015 10 / 31 Food web M.Roughan (UoA) Hitch Hikers
    [Show full text]
  • A DSL for Defining Instance Templates for the ASMIG System
    A DSL for defining instance templates for the ASMIG system Andrii Kovalov Dissertation 2014 Erasmus Mundus MSc in Dependable Software Systems Department of Computer Science National University of Ireland, Maynooth Co. Kildare, Ireland A dissertation submitted in partial fulfilment of the requirements for the Erasmus Mundus MSc Dependable Software Systems Head of Department : Dr Adam Winstanley Supervisor : Dr James Power Date: June 2014 Abstract The area of our work is test data generation via automatic instantiation of software models. Model instantiation or model finding is a process of find- ing instances of software models. For example, if a model is represented as a UML class diagram, the instances of this model are UML object diagrams. Model instantiation has several applications: finding solutions to problems expressed as models, model testing and test data generation. There are sys- tems that automatically generate model instances, one of them is ASMIG (A Small Metamodel Instance Generator). This system is focused on a `problem solving' use case. The motivation of our work is to adapt ASMIG system for use as a test data generator and make the instance generation process more transparent for the user. In order to achieve this we provided a way for the user to interact with ASMIG internal data structure, the instance template graph via a specially designed graph definition domain-specific language. As a result, the user is able to configure the instance template in order to get plausible instances, which can be then used as test data. Although model finding is only suitable for obtaining test inputs, but not the expected test outputs, it can be applied effectively for smoke testing of systems that process complex hierarchic data structures such as programming language parsers.
    [Show full text]
  • Compiler Support for Operator Overloading and Algorithmic Differentiation in C++
    Compiler Support for Operator Overloading and Algorithmic Differentiation in C++ Zur Erlangung des Grades eines Doktors der Naturwissenschaften (Dr.rer.nat.) genehmigte Dissertation von Alexander Hück, M.Sc. aus Wiesbaden Tag der Einreichung: 16.01.2020, Tag der Prüfung: 28.02.2020 1. Gutachten: Prof. Dr. Christian Bischof 2. Gutachten: Prof. Dr. Martin Bücker Darmstadt – D 17 Computer Science Department Scientific Computing Compiler Support for Operator Overloading and Algorithmic Differentiation in C++ Doctoral thesis by Alexander Hück, M.Sc. 1. Review: Prof. Dr. Christian Bischof 2. Review: Prof. Dr. Martin Bücker Date of submission: 16.01.2020 Date of thesis defense: 28.02.2020 Darmstadt – D 17 Bitte zitieren Sie dieses Dokument als: URN: urn:nbn:de:tuda-tuprints-115226 URL: http://tuprints.ulb.tu-darmstadt.de/11522 Dieses Dokument wird bereitgestellt von tuprints, E-Publishing-Service der TU Darmstadt http://tuprints.ulb.tu-darmstadt.de [email protected] Die Veröffentlichung steht unter folgender Creative Commons Lizenz: Namensnennung – Keine Bearbeitungen 4.0 International (CC BY–ND 4.0) The publication is under the following Creative Commons license: Attribution – NoDerivatives 4.0 International (CC BY–ND 4.0) Erklärungen laut Promotionsordnung §8 Abs. 1 lit. c PromO Ich versichere hiermit, dass die elektronische Version meiner Dissertation mit der schriftli- chen Version übereinstimmt. §8 Abs. 1 lit. d PromO Ich versichere hiermit, dass zu einem vorherigen Zeitpunkt noch keine Promotion versucht wurde. In diesem Fall sind nähere Angaben über Zeitpunkt, Hochschule, Dissertationsthe- ma und Ergebnis dieses Versuchs mitzuteilen. §9 Abs. 1 PromO Ich versichere hiermit, dass die vorliegende Dissertation selbstständig und nur unter Verwendung der angegebenen Quellen verfasst wurde.
    [Show full text]
  • Lesson 8 Graph Visual Analytics: Visualising and Analysing Network Data
    ISSS608 Visual Analytics and Applications 10/14/2016 Lesson 8: Visualising and Analysing Network Data Lesson 8 Graph Visual Analytics: Visualising and Analysing Network Data Mentor: Dr. Kam Tin Seong Associate Professor of Information Systems (Practice) School of Information Systems, Singapore Management University Content • Introduction to Graph Visual Analytics • Graph Visualisation in Actions • Basic Principles of Graph – Network data sets – Graph data format • Network Visualisation and Analysis – Network visualisation and analysis process model – Graph layouts and visual attributes – Network metrics • Network visualisation and analysis tools – NodeXL – Gephi 2 8-1 ISSS608 Visual Analytics and Applications 10/14/2016 Lesson 8: Visualising and Analysing Network Data Network in real World • Physical – Transportation (i.e. road, port, rail, etc) – Utility (electricity, water, gas, network cable, etc) – Natural (river, etc) • Abstract – Social media (i.e. e-mail, Facebook, Twitter, Wikipedia, etc) – Organisation (i.e. NGO, politics, customer-company, staff-to-staff, criminal, terrorist, disease, etc) 3 Classical Graph Theory • The Seven Bridges of Königsberg is a historically notable problem in mathematics. Its negative resolution by Leonhard Euler in 1735 laid the foundations of graph theory and prefigured the idea of topology. Source: http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg 4 8-2 ISSS608 Visual Analytics and Applications 10/14/2016 Lesson 8: Visualising and Analysing Network Data Classical Graph Visualisation and
    [Show full text]
  • Lesson 8 Graph Visualisation and Analysis: Methods and Applications
    IS428 Visual Analytics for Business Intelligence 10/8/2016 Lesson 8: Graph Visualisation and Analysis Lesson 8 Graph Visualisation and Analysis: Methods and Applications Mentor: Dr. Kam Tin Seong Associate Professor of Information Systems (Practice) School of Information Systems, Singapore Management University Content • Motivation of graph visualisation • Graph Visualisation in Actions • Typology of network data • Graph data file formats • Graph Layout Techniques • Network visualisation and analysis process model • Network Metrics • Network visualisation and analysis tools – Gephi – NodeXL 8-1 IS428 Visual Analytics for Business Intelligence 10/8/2016 Lesson 8: Graph Visualisation and Analysis Network in real World • Physical – Transportation (i.e. road, port, rail, etc) – Utility (electricity, water, gas, network cable, etc) – Natural (river, etc) • Abstract – Social media (i.e. e-mail, Facebook, Twitter, Wikipedia, etc) – Organisation (i.e. NGO, politics, customer-company, staff-to-staff, criminal, terrorist, disease, etc) 3 Classical Graph Theory • The Seven Bridges of Königsberg is a historically notable problem in mathematics. Its negative resolution by Leonhard Euler in 1735 laid the foundations of graph theory and prefigured the idea of topology. Source: http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg 4 8-2 IS428 Visual Analytics for Business Intelligence 10/8/2016 Lesson 8: Graph Visualisation and Analysis Classical Graph Visualisation and Analysis • Using sociogram, also know as network graph Early 20th-century
    [Show full text]
  • WEB-BASED DRAWING SOFTWARE for GRAPHS in 3D and TWO LAYOUT ALGORITHMS FARSHAD BARAHIMI Bachelor of Science, Shahid Bahonar Unive
    WEB-BASED DRAWING SOFTWARE FOR GRAPHS IN 3D AND TWO LAYOUT ALGORITHMS FARSHAD BARAHIMI Bachelor of Science, Shahid Bahonar University of Kerman, 2010 A Thesis Submitted to the School of Graduate Studies of the University of Lethbridge in Partial Fulfillment of the Requirements for the Degree MASTER OF SCIENCE Department of Mathematics and Computer Science University of Lethbridge LETHBRIDGE, ALBERTA, CANADA c Farshad Barahimi, 2015 WEB-BASED DRAWING SOFTWARE FOR GRAPHS IN 3D AND TWO LAYOUT ALGORITHMS FARSHAD BARAHIMI Date of Defense: April 20, 2015 Dr. Stephen Wismath Supervisor Professor Ph.D. Dr. Robert Benkoczi Thesis Examination Assistant Professor Ph.D. Committee Member Dr. Joy Morris Thesis Examination Associate Professor Ph.D. Committee Member Dr. Howard Cheng Chair, Associate Professor Ph.D. Thesis Examination Committee Dedication Dedicated to my family whose value for me can not be expressed in words. iii Abstract A new web-based software system for visualization and manipulation of graphs in 3D, named We3Graph is presented with a focus on accessibility, customizability for applica- tions of graph drawing, usability and extendibility. The software system allows multiple users to work on the same graph at the same time and is accessible through web browsers. The software can be extended using plugins written in any programming language and custom render engines written in the Javascript language. Also two new algorithms are proposed to answer the following question, previously raised in [53]: Given a graph G with n vertices, V = fv1;v2;:::;vng, and given a set of n distinct points P = fp1; p2;:::; png each with integer coordinates in three di- mensions, can G be drawn crossing-free on P with vi at pi and with a number of bends polynomial in n and in a volume polynomial in n and the dimension of P? iv Acknowledgments I want to express my gratitude to Dr.
    [Show full text]
  • Unravelling Graph Exchange File Formats
    1 Unravelling Graph-Exchange File Formats Matthew Roughan and Jonathan Tuke Abstract—A graph is used to represent data in which the this context is not purely about syntax. Exchange also requires relationships between the objects in the data are at least as common definitions of the meaning of the attributes. important as the objects themselves. Over the last two decades nearly a hundred file formats have been proposed or used On the other hand, file size is not a primary consideration. to provide portable access to such data. This paper seeks to Hence many exchange formats pay little attention to this and review these formats, and provide some insight to both reduce related details (e.g., read/write performance). the ongoing creation of unnecessary formats, and guide the We concentrate on exchange formats, but some of the development of new formats where needed. formats considered here were not originally developed with exchange in mind, but have become de facto exchange formats I. INTRODUCTION through use. In these cases we see reversals of objectives XCHANGE of data is a basic requirement of scientific compared to some purpose-built exchange formats. We shall E research. Accurate exchange requires portable file for- therefore consider a large range of such features for compari- mats, where portability means the ability to transfer (without son, noting as we do so that as exchange of very large datasets extraordinary efforts) the data both between computers (hard- becomes important, the requirements will change. ware and operating system), and between software (different Many of the formats presented may seem obsolete.
    [Show full text]
  • XML-Entscheidungsbäume Für Applikationen Jan Oevermann (46594)
    XML-Entscheidungsbäume für Applikationen Jan Oevermann (46594) Projektarbeit an der Hochschule Karlsruhe WS 2013/14 2 Inhaltsverzeichnis Einführung ................................................................................................ 5 Abstract ................................................................................................................................ 5 Begriffe ................................................................................................................................. 6 UseCases ................................................................................................. 7 Entscheidungsbäume .............................................................................. 8 Darstellung ........................................................................................................................... 8 Anwendungen ...................................................................................................................... 8 Herausforderungen .................................................................................. 9 Trennung von Logik und Inhalt ............................................................................................ 9 Freigabeprozess ................................................................................................................... 9 Wiederverwendung .............................................................................................................. 9 Workflow ...............................................................................................
    [Show full text]
  • Optimisation of Clustering Algorithms for the Identification of Customer
    FAKULTAT¨ FUR¨ MATHEMATIK UNIVERSITAT¨ KARLSRUHE Optimisation of Clustering Algorithms for the Identification of Customer Profiles from Shopping Cart Data Optimierung von Clusterverfahren zur Identifikation von K¨auferprofilen aus Warenkorbdaten Diplomarbeit von: Stefanie Nagel Matrikel Nummer: 1190189 Karlsruhe, October 17, 2008 Ausgef¨uhrtunter der Leitung von Referentin/Betreuerin: Koreferent: Prof. Dr. Dorothea Wagner Prof. Dr. Willy D¨orfler Fakult¨atf¨urInformatik Fakult¨at f¨urMathematik Universit¨at Karlsruhe Universit¨at Karlsruhe betreuender Mitarbeiter: Robert G¨orke Fakult¨atf¨urInformatik Universit¨at Karlsruhe I Eidesstattliche Erkl¨arung Hiermit versichere ich, dass ich die vorliegende Diplomarbeit ohne Hilfe Dritter und nur mit den angegebenen Quellen und Hilfsmitteln angefertigt habe. Diese Arbeit hat in gleicher oder ¨ahnlicher Form noch keiner Pr¨ufungsbeh¨ordevorgelegen. Karlsruhe, February 25, 2009 Stefanie Nagel I Contents Eidesstattliche Erkl¨arung I Contents III List of Figures VII List of Tables IX 1 Introduction 1 2 Motivation 3 2.1 Finding Frequent Itemsets with FP-Growth . 4 2.2 Generating Association Rules . 10 2.3 Improving Association Rules . 10 2.4 Possibilities for Improving This Procedure . 11 3 Graph Modelling 13 3.1 Building the Weights . 13 3.2 Community . 14 3.3 The Number of Clusterings . 17 4 Quality Indices 19 4.1 Coverage . 19 4.2 Performance . 20 4.3 Modularity . 20 4.4 Extending Coverage . 21 4.5 Extending Performance . 21 4.6 Extending Modularity . 22 4.7 An Example . 26 5 Network Analysis 31 5.1 Visualisation of the Graph . 31 5.2 Degree Distribution . 32 5.3 Betweenness Centrality . 32 5.4 Clustering Coefficients . 34 5.5 k-Core Analysis .
    [Show full text]
  • The Rocs Handbook
    The Rocs Handbook Tomaz Canabrava Andreas Cord-Landwehr The Rocs Handbook 2 Contents 1 Introduction 6 1.1 Goals, Target Audience, and Workflows . .6 1.2 Rocs in a Nutshell . .7 1.2.1 Graph Documents . .7 1.2.2 Edge Types . .7 1.2.3 Node Types . .7 1.2.4 Properties . .8 1.3 Tutorial . .8 1.3.1 Generating the Graph . .8 1.3.2 Creating the Element Types . .8 1.3.3 The Algorithm . .8 1.3.4 Execute the Algorithm . .9 2 The Rocs User Interface 10 2.1 Main Elements of the User Interface . 10 2.2 Graph Editor Toolbar . 11 3 Scripting 12 3.1 Executing Algorithms in Rocs . 12 3.1.1 Control Script Execution . 12 3.1.2 Script Output . 12 3.1.3 Scripting Engine API . 13 4 Import and Export 14 4.1 Exchange Rocs Projects . 14 4.1.1 Import and Export of Graph Documents . 14 4.1.1.1 Trivial Graph File Format . 14 4.1.1.1.1 Format Specification . 14 4.1.1.1.2 Example . 15 4.1.1.2 DOT Language / Graphviz Graph File Format . 15 4.1.1.2.1 Unsupported Features . 15 4.1.1.2.2 Example . 15 The Rocs Handbook 5 Graph Layout 16 5.1 Laying out graphs automatically in Rocs . 16 5.1.1 Force Based Layout . 16 5.1.1.1 Radial Tree Layout . 17 6 Credits and License 18 4 Abstract Rocs is a graph theory tool by KDE. The Rocs Handbook Chapter 1 Introduction This chapter provides an overview of the core features and the typical workflows.
    [Show full text]
  • Overview of Existing Software Tools for Graph Matching
    Technical Report Pattern Recognition and Image Processing Group Institute of Computer Aided Automation Vienna University of Technology Favoritenstr. 9/183-2 A-1040 Vienna AUSTRIA Phone: +43 (1) 58801-18351 Fax: +43 (1) 58801-18392 E-mail: [email protected] URL: http://www.prip.tuwien.ac.at/ PRIP-TR-130 September 21, 2013 Overview of Existing Software Tools for Graph Matching Katrin Lasinger Abstract This report was created in the course of the seminar \Selected Chapters in Image Process- ing" at the Vienna University of Technology under the supervision of Walter Kropatsch. The report aims to give an overview of publicly available software toolkits for graph match- ing. Existing graph datasets for graph matching benchmarking as well as common data structures to store graphs are presented. Libraries and toolkits for exact and inexact match- ing are covered in the report and their implemented algorithms are stated. 1 Introduction Graph matching algorithms have been used for pattern recognition tasks since the early seventies. The paper "Thirty years of graph matching in Pattern Recognition" by Conte et al. [2] gives a survey of the literature in graph matching algorithms and types of applications of graph-based tech- niques up to the early 2000's. Algorithms can be divided into exact and inexact matching approaches. The first group covers approaches like graph and subgraph isomorphism that require strict correspondences between two graphs or at least their subgraphs. Inexact matching algorithms have less strict constraints and tolerate some differences between the graphs. Some reasons why this might be necessary are noisy data or intrinsic variability of the patterns.
    [Show full text]
  • Lesson 10 Visualising and Analysing Network Data
    ISSS608 Visual Analytics and Applications 11/18/2015 Lesson 10: Visualising and Analysing Network Data Lesson 10 Visualising and Analysing Network Data Mentor: Dr. Kam Tin Seong Associate Professor of Information Systems (Practice) School of Information Systems, Singapore Management University Content • Introduction to Graph Visual Analytics • Graph Visualisation in Actions • Basic Principles of Graph – Network data sets – Graph data format • Network Visualisation and Analysis – Network visualisation and analysis process model – Graph layouts and visual attributes – Network metrics • Network visualisation and analysis tools – NodeXL – Gephi 2 10-1 ISSS608 Visual Analytics and Applications 11/18/2015 Lesson 10: Visualising and Analysing Network Data Network in real World • Physical – Transportation (i.e. road, port, rail, etc) – Utility (electricity, water, gas, network cable, etc) – Natural (river, etc) • Abstract – Social media (i.e. e-mail, Facebook, Twitter, Wikipedia, etc) – Organisation (i.e. NGO, politics, customer-company, staff-to-staff, criminal, terrorist, disease, etc) 3 Classical Graph Theory • The Seven Bridges of Königsberg is a historically notable problem in mathematics. Its negative resolution by Leonhard Euler in 1735 laid the foundations of graph theory and prefigured the idea of topology. Source: http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg 4 10-2 ISSS608 Visual Analytics and Applications 11/18/2015 Lesson 10: Visualising and Analysing Network Data Classical Graph Visualisation and Analysis • Using
    [Show full text]