The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing
Pages: 16
File type: PDF, size: 1020 KB
Recommended publications
NetworkX Tutorial
Source: https://github.com/networkx/notebooks. Minor corrections: JS, 27.02.2019. Creating a graph: create an empty graph with no nodes and no edges with import networkx as nx and G = nx.Graph(). By definition, a Graph is a collection of nodes (vertices) along with identified pairs of nodes (called edges, links, etc.). In NetworkX, nodes can be any hashable object, e.g. a text string, an image, an XML object, another Graph, a customized node object, etc. (Note: Python's None object should not be used as a node, as it is used to determine whether optional function arguments have been assigned in many functions.) Nodes: the graph G can be grown in several ways. NetworkX includes many graph generator functions and facilities to read and write graphs in many formats. To get started, though, we'll look at simple manipulations. You can add one node at a time (G.add_node(1)), add a list of nodes (G.add_nodes_from([2, 3])), or add any nbunch of nodes. An nbunch is any iterable container of nodes that is not itself a node in the graph (e.g. a list, set, graph, file, etc.). For example, H = nx.path_graph(10) followed by G.add_nodes_from(H) makes the nodes of H nodes of G. In contrast, you could use the graph H as a node in G.
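Collected into one runnable sketch (assuming a recent NetworkX release; the node values are the tutorial's own examples):

import networkx as nx

# Create an empty undirected graph
G = nx.Graph()

# Add a single node, then a list of nodes
G.add_node(1)
G.add_nodes_from([2, 3])

# Any iterable of nodes (an "nbunch") also works, e.g. the nodes of another graph
H = nx.path_graph(10)        # path graph on nodes 0..9
G.add_nodes_from(H)          # copies H's nodes into G
G.add_node(H)                # alternatively, use the graph H itself as a single node

print(G.number_of_nodes())   # 11: the integers 0..9 plus the graph H as one node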
Apache Flink™: Stream and Batch Processing in a Single Engine
Apache Flink™: Stream and Batch Processing in a Single Engine. Paris Carbone (KTH & SICS Sweden), Stephan Ewen (data Artisans), Seif Haridi (KTH & SICS Sweden), Asterios Katsifodimos (TU Berlin & DFKI), Volker Markl (TU Berlin & DFKI), Kostas Tzoumas (data Artisans). Abstract: Apache Flink is an open-source system for processing streaming and batch data. Flink is built on the philosophy that many classes of data processing applications, including real-time analytics, continuous data pipelines, historic data processing (batch), and iterative algorithms (machine learning, graph analysis), can be expressed and executed as pipelined fault-tolerant dataflows. In this paper, we present Flink's architecture and expand on how a (seemingly diverse) set of use cases can be unified under a single execution model. 1 Introduction. Data-stream processing (e.g., as exemplified by complex event processing systems) and static (batch) data processing (e.g., as exemplified by MPP databases and Hadoop) were traditionally considered two very different types of applications. They were programmed using different programming models and APIs, and were executed by different systems (e.g., dedicated streaming systems such as Apache Storm, IBM Infosphere Streams, Microsoft StreamInsight, or Streambase versus relational databases or execution engines for Hadoop, including Apache Spark and Apache Drill). Traditionally, batch data analysis made up the lion's share of the use cases, data sizes, and market, while streaming data analysis mostly served specialized applications. It is becoming more and more apparent, however, that a huge number of today's large-scale data processing use cases handle data that is, in reality, produced continuously over time.
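The unified-dataflow idea can be sketched with Flink's present-day Python API (PyFlink), which postdates this paper and is used here purely as an assumed illustration; the element values and job name are made up:

from pyflink.datastream import StreamExecutionEnvironment

# One execution environment serves both bounded (batch-like) and unbounded sources.
env = StreamExecutionEnvironment.get_execution_environment()

# A bounded, in-memory source stands in for a real stream (e.g. a Kafka topic).
events = env.from_collection([("page_a", 1), ("page_b", 1), ("page_a", 1)])

# The same pipelined dataflow works whether the input is finite or continuous:
# key by page, then keep a running sum of the counts.
counts = (
    events
    .key_by(lambda e: e[0])
    .reduce(lambda a, b: (a[0], a[1] + b[1]))
)

counts.print()                          # emit running results to stdout
env.execute("unified_dataflow_sketch")  # job name is illustrative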
NetworkX: Network Analysis with Python
NetworkX: Network Analysis with Python. Salvatore Scellato. Full tutorial presented at the XXX SunBelt Conference; "NetworkX introduction: Hacking social networks using the Python programming language" by Aric Hagberg & Drew Conway. Outline: 1. Introduction to NetworkX; 2. Getting started with Python and NetworkX; 3. Basic network analysis; 4. Writing your own code; 5. You are ready for your project! Introduction to NetworkX: network analysis. Vast amounts of network data are being generated and collected. Sociology: web pages, mobile phones, social networks. Technology: Internet routers, vehicular flows, power grids. How can we analyze these networks? Introduction to NetworkX: "Python package for the creation, manipulation and study of the structure, dynamics and functions of complex networks." • Data structures for representing many types of networks, or graphs • Nodes can be any (hashable) Python object, edges can contain arbitrary data • Flexibility ideal for representing networks found in many different fields • Easy to install on multiple platforms • Online up-to-date documentation • First public release in April 2005. Introduction to NetworkX, design requirements: • Tool to study the structure and dynamics of social, biological, and infrastructure networks • Ease-of-use and rapid development in a collaborative, multidisciplinary environment • Easy to learn, easy to teach • Open-source tool base that can easily grow in a multidisciplinary environment with non-expert users
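A short sketch of the kind of basic analysis the tutorial builds toward (assuming NetworkX is installed; the toy social graph is invented for illustration):

import networkx as nx

# Build a small undirected "social" graph from edge pairs
G = nx.Graph([("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("cat", "dan")])

# Basic structural measures
print(G.number_of_nodes(), G.number_of_edges())  # 4 nodes, 4 edges
print(dict(G.degree()))                          # degree of every node
print(nx.density(G))                             # fraction of possible edges present
print(nx.shortest_path(G, "ann", "dan"))         # e.g. ['ann', 'cat', 'dan']
print(nx.clustering(G))                          # local clustering coefficients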
Graph Database Fundamental Services
Bachelor Project, Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Cybernetics. Graph Database Fundamental Services. Tomáš Roun. Supervisor: RNDr. Marko Genyk-Berezovskyj. Field of study: Open Informatics. Subfield: Computer and Informatic Science. May 2018. Acknowledgements: I would like to thank my advisor RNDr. Marko Genyk-Berezovskyj for his guidance and advice. I would also like to thank Sergej Kurbanov and Herbert Ullrich for their help and contributions to the project. Special thanks go to my family for their never-ending support. Declaration: I declare that the presented work was developed independently and that I have listed all sources of information used within it in accordance with the methodical instructions for observing the ethical principles in the preparation of university theses. Abstract: The goal of this thesis is to provide an easy-to-use web service offering a database of undirected graphs that can be searched based on the graph properties. In addition, it should also allow computing properties of user-supplied graphs with the help of graph libraries and generating graph images. Last but not least, we implement a system that allows bulk adding of new graphs to the database and computing their properties.
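The excerpt mentions graph libraries without naming one, so the following is only an assumed illustration, using NetworkX, of computing the kind of searchable properties such a service might store for each graph:

import networkx as nx

def graph_properties(G: nx.Graph) -> dict:
    """Compute a few searchable properties for an undirected graph."""
    props = {
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        "density": nx.density(G),
        "connected": nx.is_connected(G),
        "bipartite": nx.is_bipartite(G),
    }
    # Diameter is only defined for connected graphs
    props["diameter"] = nx.diameter(G) if props["connected"] else None
    return props

print(graph_properties(nx.petersen_graph()))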
Gephi Tools for Network Analysis and Visualization
Frontiers of Network Science, Fall 2018. Class 8: Introduction to Gephi, tools for network analysis and visualization. Boleslaw Szymanski. Class plan, main topics: overview of tools for network analysis and visualization; installing and using Gephi; Gephi hands-on labs. Tools overview (listed alphabetically). Tools for network analysis and visualization can be characterized by their computing model and interface (desktop GUI applications; API/code libraries and Web services; Web GUI front-ends for cloud, distributed or HPC settings), their extensibility model (only by the original developers, or by other users/developers via add-ins, modules, additional packages, etc.), their source availability model (open-source or closed-source), and their business model (free of charge or commercial). CINET, CyberInfrastructure for NETwork science: accessed via a Web-based portal (http://cinet.vbi.vt.edu/granite/granite.html); supported by grants, no charge for end users; aims to provide researchers, analysts, and educators interested in Network Science with an easy-to-use cyber-environment that is accessible from their desktop and integrates into their daily work; users can contribute new networks, data, algorithms, hardware, and research results; primarily for research, teaching, and collaboration; no programming experience is required. Cytoscape, network data integration, analysis, and visualization: a standalone GUI application...
Evaluation of SPARQL Queries on Apache Flink
Applied Sciences, article. SPARQL2Flink: Evaluation of SPARQL Queries on Apache Flink. Oscar Ceballos (1), Carlos Alberto Ramírez Restrepo (2), María Constanza Pabón (2), Andres M. Castillo (1,*) and Oscar Corcho (3). (1) Escuela de Ingeniería de Sistemas y Computación, Universidad del Valle, Ciudad Universitaria Meléndez Calle 13 No. 100-00, Cali 760032, Colombia; [email protected]. (2) Departamento de Electrónica y Ciencias de la Computación, Pontificia Universidad Javeriana Cali, Calle 18 No. 118-250, Cali 760031, Colombia; [email protected] (C.A.R.R.); [email protected] (M.C.P.). (3) Ontology Engineering Group, Universidad Politécnica de Madrid, Campus de Montegancedo, Boadilla del Monte, 28660 Madrid, Spain; ocorcho@fi.upm.es. * Correspondence: [email protected]. Abstract: Existing SPARQL query engines and triple stores are continuously improved to handle more massive datasets. Several approaches have been developed in this context proposing the storage and querying of RDF data in a distributed fashion, mainly using the MapReduce programming model and Hadoop-based ecosystems. New trends in Big Data technologies have also emerged (e.g., Apache Spark, Apache Flink); they use distributed in-memory processing and promise to deliver higher data processing performance. In this paper, we present a formal interpretation of some PACT transformations implemented in the Apache Flink DataSet API. We use this formalization to provide a mapping to translate a SPARQL query to a Flink program. The mapping was implemented in a prototype used to determine the correctness and performance of the solution. The source code of the project is available on GitHub under the MIT license.
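For intuition about the kind of mapping the paper formalizes, here is a toy illustration in plain Python over an in-memory triple list (it is not the SPARQL2Flink prototype and does not use the Flink DataSet API): a triple pattern becomes a filter, and a variable shared between two patterns becomes a join.

# Toy RDF-style triples: (subject, predicate, object)
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
    ("bob", "age", "25"),
]

# SPARQL-like query: SELECT ?p ?a WHERE { ?x knows ?p . ?p age ?a }
# Pattern 1 (?x knows ?p) -> a filter on the predicate
knows = [(s, o) for (s, p, o) in triples if p == "knows"]
# Pattern 2 (?p age ?a) -> another filter
ages = [(s, o) for (s, p, o) in triples if p == "age"]

# Shared variable ?p -> join the two intermediate "datasets" on it
results = [(p, a) for (_, p) in knows for (p2, a) in ages if p == p2]

print(results)  # [('bob', '25')]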
Gephi-Poster-Sunbelt-July10.Pdf
The Open Graph Viz Platform. Gephi is a new open-source network visualization platform. It aims to create a sustainable software and technical ecosystem, driven by a large international open-source community, who shares common interests in networks and complex systems. Download Gephi at http://gephi.org. The rendering engine can handle networks larger than 100K elements and guarantees responsiveness. Designed to make data navigation and manipulation easy, it aims to fulfill the complete chain from data importing to aesthetics refinements and interaction. Particular focus is made on the software usability and interoperability with other tools. A lot of effort is made to facilitate community growth, by providing tutorials, plug-in development documentation, support and student projects. Current developments include Dynamic Network Analysis (DNA) and spigots (Emails, Twitter, Facebook …) import. Goals: create the Photoshop of network visualization, by combining a rich set of built-in features and a friendly user interface; design a modular and extensible software architecture, facilitating plug-in development, reuse and mashup; build a large, international and diverse open-source community. Gephi aims at being sustainable, open to many kinds of users, and creating a large, international and diverse open-source community. Community: join the network, participate in designing the roadmap, get help to quickly code plug-ins or simply share your ideas. Features highlight: select, move, paint, resize, connect, group, just with the mouse; export in SVG/PDF to include in your infographics. Metrics: betweenness, eigenvector, closeness, eccentricity, diameter, average shortest path. Architecture: the modular architecture allows developers to add and extend features with ease by developing plug-ins. Gephi can also be used as a Java library in other applications to build, for instance, a layout server.
PlantUML Language Reference Guide (Version 1.2021.2)
Drawing UML with PlantUML. PlantUML Language Reference Guide (Version 1.2021.2). PlantUML is a component that allows you to quickly write: sequence diagrams, use case diagrams, class diagrams, object diagrams, activity diagrams, component diagrams, deployment diagrams, state diagrams, and timing diagrams. The following non-UML diagrams are also supported: JSON data, YAML data, network diagrams (nwdiag), wireframe graphical interfaces, ArchiMate diagrams, Specification and Description Language (SDL), Ditaa diagrams, Gantt diagrams, MindMap diagrams, Work Breakdown Structure diagrams, mathematics with AsciiMath or JLaTeXMath notation, and entity relationship diagrams. Diagrams are defined using a simple and intuitive language. Sequence diagrams, basic examples: the sequence "->" is used to draw a message between two participants. Participants do not have to be explicitly declared. To have a dotted arrow, you use "-->". It is also possible to use "<-" and "<--". That does not change the drawing, but may improve readability. Note that this is only true for sequence diagrams; rules are different for the other diagrams.
@startuml
Alice -> Bob: Authentication Request
Bob --> Alice: Authentication Response
Alice -> Bob: Another authentication Request
Alice <-- Bob: Another authentication Response
@enduml
Declaring participants: if the keyword participant is used to declare a participant, more control over that participant is possible. The order of declaration will be the (default) order of display. Using these other keywords to declare participants
Travels of Baron Munchausen
Songs for Scribus: Travels of Baron Munchausen. How should I disengage myself? I was not much pleased with my awkward situation – with a wolf face to face; our ogling was not of the most pleasant kind. If I withdrew my arm, then the animal would fly the more furiously upon me; that I saw in his flaming eyes. In short, I laid hold of his tail, turned him inside out like a glove, and flung him to the ground, where I left him. On January 3, 2011, the first working day of the year, OSP gathered around Scribus. We wanted to explore framerendering, bootstrapping Scribus, and to play around with the incredible adventures of Baron von Munchausen. The idea was to produce some kind of experimental result, a follow-up to our earlier attempts to turn a frog into a prince. One day we asked Scribus team members what their favourite Scribus feature was. After some hesitation they pointed us to the magical framerender. Framerender: the Framerender is an image frame with a wrapper, a GUI, and a configuration scheme. External programs are invoked from inside of Scribus, and their output is placed into the frame! By default, the current Scribus is configured to host the following friends: LaTeX, LilyPond, gnuplot, dot/GraphViz and POV-Ray. Today we are working with LilyPond. We are also creating custom tools that generate PostScript/PDF via ImageMagick and other command line tools. LilyPond: LilyPond is a music engraving program, devoted to producing the highest-quality sheet music possible.
Performing Knowledge Requirements Analysis for Public Organisations in a Virtual Learning Environment: a Social Network Analysis
Journal of Information Technology & Software Engineering. Fontenele et al., J Inform Tech Softw Eng 2014, 4:2. DOI: 10.4172/2165-7866.1000134. ISSN: 2165-7866. Research Article, Open Access. Performing Knowledge Requirements Analysis for Public Organisations in a Virtual Learning Environment: A Social Network Analysis Approach. Fontenele MP (1,*), Sampaio RB (2), da Silva AIB (2), Fernandes JHC (2) and Sun L (1). (1) University of Reading, School of Systems Engineering, Whiteknights, Reading, RG6 6AH, United Kingdom. (2) Universidade de Brasília, Campus Universitário Darcy Ribeiro, Brasília, DF, 70910-900, Brasil. Abstract: This paper describes an application of Social Network Analysis methods for identification of knowledge demands in public organisations. Affiliation networks established in a postgraduate programme were analysed. The course was executed in a distance education mode and its students worked in public agencies. Relations established among course participants were mediated through a virtual learning environment using Moodle. Data available in Moodle may be extracted using knowledge discovery in databases techniques. Potential degrees of closeness existing among different organisations and among researched subjects were assessed. This suggests how organisations could cooperate for knowledge management and also how to identify their common interests. The study points out that closeness among organisations and research topics may be assessed through affiliation networks. This opens up opportunities for applying knowledge management between organisations and creating communities of practice.
Portable Stateful Big Data Processing in Apache Beam
Portable stateful big data processing in Apache Beam. Kenneth Knowles, Apache Beam PMC, Software Engineer @ Google. https://s.apache.org/ffsf-2017-beam-state. [email protected] / @KennKnowles. Flink Forward San Francisco 2017. Agenda: 1. What is Apache Beam? 2. State. 3. Timers. 4. Example and little demo. What is Apache Beam? TL;DR: DAGs, DAGs, DAGs. [Slide: a timeline of DAG-based dataflow systems and papers from 2004 to 2016, including the MapReduce, FlumeJava, MillWheel and Dataflow Model papers, Hadoop, Apache Storm, Apache Spark, Apache Samza, Apache Gearpump (incubating), Apache Apex (incubating), Heron, Apache Flink, Cloud Dataflow and Apache Beam.] The Beam vision: write a pipeline once, for example input.apply(Sum.integersPerKey()) in Java or input | Sum.PerKey() in Python, and run it on any supported runner: Apache Flink, Apache Spark, Apache Apex, Apache Gearpump (incubating) (local, on-prem, cloud) or Cloud Dataflow (fully managed). Connectors fit the same vision, for example input | KafkaIO.read() in Python, backed by class KafkaIO extends UnboundedSource { … } in Java. The Beam model: a Pipeline is a graph of PTransforms applied to PCollections (bounded or unbounded), and it answers four questions. What are you computing? (read, map, reduce). Where in event time? (event time windowing). When in processing time are results produced? (triggers). How do refinements relate? (accumulation mode). What are you computing? Read (parallel connectors to …), ParDo (per element …), Grouping (group …), Composite …
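A rough sketch of the per-key state the talk is about, written against the Apache Beam Python SDK (an assumption; the talk's own demo is not shown here, the element values are invented, and runner support for the state API varies):

import apache_beam as beam
from apache_beam.coders import VarIntCoder
from apache_beam.transforms.userstate import ReadModifyWriteStateSpec


class RunningTotalFn(beam.DoFn):
    """Keeps a per-key running total using Beam's state API."""
    TOTAL = ReadModifyWriteStateSpec('total', VarIntCoder())

    def process(self, element, total=beam.DoFn.StateParam(TOTAL)):
        key, value = element
        new_total = (total.read() or 0) + value   # state starts out empty
        total.write(new_total)
        yield key, new_total


with beam.Pipeline() as p:
    (
        p
        | beam.Create([('a', 1), ('b', 5), ('a', 2)])
        | beam.ParDo(RunningTotalFn())
        | beam.Map(print)   # e.g. ('a', 1), ('b', 5), ('a', 3)
    )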
Apache Flink Fast and Reliable Large-Scale Data Processing
Apache Flink: Fast and Reliable Large-Scale Data Processing. Fabian Hueske, @fhueske. What is Apache Flink? A distributed data flow processing system: focused on large-scale data analytics; real-time stream and batch processing; easy and powerful APIs (Java / Scala); robust execution backend. What is Flink good at? It's a general-purpose data analytics system: real-time stream processing with flexible windows; complex and heavy ETL jobs; analyzing huge graphs; machine learning on large data sets; and more. Flink in the Hadoop ecosystem. [Stack diagram: libraries (Table API, ML library, Gelly, Apache MRQL, Apache SAMOA) sit on the DataSet API (Java/Scala) and DataStream API (Java/Scala), above the Flink core (optimizer, stream builder, runtime); deployment environments include embedded, local, cluster, YARN and Apache Tez; data sources include HDFS, Hadoop IO, HBase, Kafka, Flume, HCatalog, JDBC, S3 and RabbitMQ.] Flink in the ASF: Flink entered the ASF about one year ago (04/2014 incubation, 12/2014 graduation) and has a strongly growing community. [Chart: unique git committers (without manual de-duplication) from Nov. 2010 to Dec. 2014.] Where is Flink moving? A "use-case complete" framework to unify batch and stream processing: data streams (Kafka, RabbitMQ, ...) and "historic" data (HDFS, JDBC, ...) feed analytical workloads such as ETL, relational processing, graph analysis, machine learning and streaming data analysis. Goal: treat batch as a finite stream. Programming model & APIs: how to use Flink? Unified Java & Scala APIs: fluent and mirrored APIs in Java and Scala; Table API