SGL: a Domain-Specific Language for Large-Scale Analysis of Open-Source
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Empirical Study on the Usage of Graph Query Languages in Open Source Java Projects
Empirical Study on the Usage of Graph Query Languages in Open Source Java Projects Philipp Seifer Johannes Härtel Martin Leinberger University of Koblenz-Landau University of Koblenz-Landau University of Koblenz-Landau Software Languages Team Software Languages Team Institute WeST Koblenz, Germany Koblenz, Germany Koblenz, Germany [email protected] [email protected] [email protected] Ralf Lämmel Steffen Staab University of Koblenz-Landau University of Koblenz-Landau Software Languages Team Koblenz, Germany Koblenz, Germany University of Southampton [email protected] Southampton, United Kingdom [email protected] Abstract including project and domain specific ones. Common applica- Graph data models are interesting in various domains, in tion domains are management systems and data visualization part because of the intuitiveness and flexibility they offer tools. compared to relational models. Specialized query languages, CCS Concepts • General and reference → Empirical such as Cypher for property graphs or SPARQL for RDF, studies; • Information systems → Query languages; • facilitate their use. In this paper, we present an empirical Software and its engineering → Software libraries and study on the usage of graph-based query languages in open- repositories. source Java projects on GitHub. We investigate the usage of SPARQL, Cypher, Gremlin and GraphQL in terms of popular- Keywords Empirical Study, GitHub, Graphs, Query Lan- ity and their development over time. We select repositories guages, SPARQL, Cypher, Gremlin, GraphQL based on dependencies related to these technologies and ACM Reference Format: employ various popularity and source-code based filters and Philipp Seifer, Johannes Härtel, Martin Leinberger, Ralf Lämmel, ranking features for a targeted selection of projects. -
Use Splunk with Big Data Repositories Like Spark, Solr, Hadoop and Nosql Storage
Copyright © 2016 Splunk Inc. Use Splunk With Big Data Repositories Like Spark, Solr, Hadoop And Nosql Storage Raanan Dagan, May Long Big Data Architect, Splunk Disclaimer During the course of this presentaon, we may make forward looking statements regarding future events or the expected performance of the company. We cauJon you that such statements reflect our current expectaons and esJmates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward- looking statements made in the this presentaon are being made as of the Jme and date of its live presentaon. If reviewed aer its live presentaon, this presentaon may not contain current or accurate informaon. We do not assume any obligaon to update any forward looking statements we may make. In addiJon, any informaon about our roadmap outlines our general product direcJon and is subject to change at any Jme without noJce. It is for informaonal purposes only and shall not, be incorporated into any contract or other commitment. Splunk undertakes no obligaon either to develop the features or funcJonality described or to include any such feature or funcJonality in a future release. 2 Agenda Use Cases: Fraud With Solr, Splunk, And Splunk AnalyJcs For Hadoop Business AnalyJcs With Cassandra, Splunk Cloud, And Splunk AnalyJcs For Hadoop Document Classificaon With Spark And Splunk Network IT With Kaa And Splunk Kaa Add On Demo 3 Fraud With Solr, Splunk, And Splunk AnalyJcs For Hadoop Use Case: Fraud – Why Apache Solr Apache Solr is an open source enterprise search plaorm from the Apache Lucene API. -
Personal Information Backgrounds Abilities Highlights
Yi Du Computer Network Information Center, Chinese Academy of Sciences Phone:+86-15810134970 Email:[email protected] Homepage: http://yiducn.github.io/ Personal Information Gender: Male Date of Graduate:July, 2013 Address: 4# Building, South 4th Street Zhong Guan Cun. Beijing, P.R.China. 100190 Backgrounds 2015.12~Now Department of Big Data Technology and Application Development, Computer Network Information Center Beijing, China Job Title: Associate Professor Research Interest: Data Mining, Visual Analytics 2015.09~2016.09 School of Electrical and Computer Engineering, Purdue University USA Job Title: Visiting Scholar Research Interest: Spatio-temporal Visualization, Visual Analytics 2013.09~2015.12 Scientific Data Center, Computer Network Information Center, CAS Beijing, China Job Title: Assistant Professor Research Interest: Data Processing, Data Visualization, HCI 2008.09~2013.09 Institute of Software Chinese Academy of Sciences Beijing, China Major: Computer Applied Technology Doctoral Degree Research Interest: Human Computer Interaction(HCI), Information Visualization 2004.08-2008.07 Shandong University Jinan Major: Software Engineering Bachelor's Degree Abilities Master the design and development of data science system, including data collecting, wrangling, analyzing, mining, visualizing and interacting. Master analyzing and mining of large-scale spatio-temporal data. Experiences in coding with Java, JavaScript. Familiar with Python and C++. Experiences in MongoDB, DB2. Familiar with MS SQLServer and Oracle. Experiences in traditional data mining and machine learning algorithms. Familiar with Hadoop, Titan, Kylin, etc. Highlights Participated in an open source project Gephi, contributed several plugins with over 3000 lines of code. An analytics platform named DVIZ, in which I played the leading role, gained several prizes in China. -
Hortonworks Cybersecurity Platform Administration (April 24, 2018)
Hortonworks Cybersecurity Platform Administration (April 24, 2018) docs.cloudera.com Hortonworks Cybersecurity April 24, 2018 Platform Hortonworks Cybersecurity Platform: Administration Copyright © 2012-2018 Hortonworks, Inc. Some rights reserved. Hortonworks Cybersecurity Platform (HCP) is a modern data application based on Apache Metron, powered by Apache Hadoop, Apache Storm, and related technologies. HCP provides a framework and tools to enable greater efficiency in Security Operation Centers (SOCs) along with better and faster threat detection in real-time at massive scale. It provides ingestion, parsing and normalization of fully enriched, contextualized data, threat intelligence feeds, triage and machine learning based detection. It also provides end user near real-time dashboarding. Based on a strong foundation in the Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF) stacks, HCP provides an integrated advanced platform for security analytics. Please visit the Hortonworks Data Platform page for more information on Hortonworks technology. For more information on Hortonworks services, please visit either the Support or Training page. Feel free to Contact Us directly to discuss your specific needs. Except where otherwise noted, this document is licensed under Creative Commons Attribution ShareAlike 4.0 License. http://creativecommons.org/licenses/by-sa/4.0/legalcode ii Hortonworks Cybersecurity April 24, 2018 Platform Table of Contents 1. HCP Information Roadmap ......................................................................................... -
Towards an Integrated Graph Algebra for Graph Pattern Matching with Gremlin
Towards an Integrated Graph Algebra for Graph Pattern Matching with Gremlin Harsh Thakkar1, S¨orenAuer1;2, Maria-Esther Vidal2 1 Smart Data Analytics Lab (SDA), University of Bonn, Germany 2 TIB & Leibniz University of Hannover, Germany [email protected], [email protected] Abstract. Graph data management (also called NoSQL) has revealed beneficial characteristics in terms of flexibility and scalability by differ- ently balancing between query expressivity and schema flexibility. This peculiar advantage has resulted into an unforeseen race of developing new task-specific graph systems, query languages and data models, such as property graphs, key-value, wide column, resource description framework (RDF), etc. Present-day graph query languages are focused towards flex- ible graph pattern matching (aka sub-graph matching), whereas graph computing frameworks aim towards providing fast parallel (distributed) execution of instructions. The consequence of this rapid growth in the variety of graph-based data management systems has resulted in a lack of standardization. Gremlin, a graph traversal language, and machine provide a common platform for supporting any graph computing sys- tem (such as an OLTP graph database or OLAP graph processors). In this extended report, we present a formalization of graph pattern match- ing for Gremlin queries. We also study, discuss and consolidate various existing graph algebra operators into an integrated graph algebra. Keywords: Graph Pattern Matching, Graph Traversal, Gremlin, Graph Alge- bra 1 Introduction Upon observing the evolution of information technology, we can observe a trend arXiv:1908.06265v2 [cs.DB] 7 Sep 2019 from data models and knowledge representation techniques being tightly bound to the capabilities of the underlying hardware towards more intuitive and natural methods resembling human-style information processing. -
My Steps to Learn About Apache Nifi
My steps to learn about Apache NiFi Paulo Jerônimo, 2018-05-24 05:36:18 WEST Table of Contents Introduction. 1 About this document . 1 About me . 1 Videos with a technical background . 2 Lab 1: Running Apache NiFi inside a Docker container . 3 Prerequisites . 3 Start/Restart. 3 Access to the UI . 3 Status. 3 Stop . 3 Lab 2: Running Apache NiFi locally . 5 Prerequisites . 5 Installation. 5 Start . 5 Access to the UI . 5 Status. 5 Stop . 6 Lab 3: Building a simple Data Flow . 7 Prerequisites . 7 Step 1 - Create a Nifi docker container with default parameters . 7 Step 2 - Access the UI and create two processors . 7 Step 3 - Add and configure processor 1 (GenerateFlowFile) . 7 Step 4 - Add and configure processor 2 (Putfile) . 10 Step 5 - Connect the processors . 12 Step 6 - Start the processors. 14 Step 7 - View the generated logs . 14 Step 8 - Stop the processors . 15 Step 9 - Stop and destroy the docker container . 15 Conclusions . 15 All references . 16 Introduction Recently I had work to produce a document with a comparison between two tools for Cloud Data Flow. I didn’t have any knowledge of this kind of technology before creating this document. Apache NiFi is one of the tools in my comparison document. So, here I describe some of my procedures to learn about it and take my own preliminary conclusions. I followed many steps on my own desktop (a MacBook Pro computer) to accomplish this task. This document shows you what I did. Basically, to learn about Apache NiFi in order to do a comparison with other tool: • I saw some videos about it. -
Licensing Information User Manual
Oracle® Database Express Edition Licensing Information User Manual 18c E89902-02 February 2020 Oracle Database Express Edition Licensing Information User Manual, 18c E89902-02 Copyright © 2005, 2020, Oracle and/or its affiliates. This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable: U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are "commercial computer software" or “commercial computer software documentation” pursuant to the applicable Federal -
Large Scale Querying and Processing for Property Graphs Phd Symposium∗
Large Scale Querying and Processing for Property Graphs PhD Symposium∗ Mohamed Ragab Data Systems Group, University of Tartu Tartu, Estonia [email protected] ABSTRACT Recently, large scale graph data management, querying and pro- cessing have experienced a renaissance in several timely applica- tion domains (e.g., social networks, bibliographical networks and knowledge graphs). However, these applications still introduce new challenges with large-scale graph processing. Therefore, recently, we have witnessed a remarkable growth in the preva- lence of work on graph processing in both academia and industry. Querying and processing large graphs is an interesting and chal- lenging task. Recently, several centralized/distributed large-scale graph processing frameworks have been developed. However, they mainly focus on batch graph analytics. On the other hand, the state-of-the-art graph databases can’t sustain for distributed Figure 1: A simple example of a Property Graph efficient querying for large graphs with complex queries. Inpar- ticular, online large scale graph querying engines are still limited. In this paper, we present a research plan shipped with the state- graph data following the core principles of relational database systems [10]. Popular Graph databases include Neo4j1, Titan2, of-the-art techniques for large-scale property graph querying and 3 4 processing. We present our goals and initial results for querying ArangoDB and HyperGraphDB among many others. and processing large property graphs based on the emerging and In general, graphs can be represented in different data mod- promising Apache Spark framework, a defacto standard platform els [1]. In practice, the two most commonly-used graph data models are: Edge-Directed/Labelled graph (e.g. -
Handling Data Flows of Streaming Internet of Things Data
IT16048 Examensarbete 30 hp Juni 2016 Handling Data Flows of Streaming Internet of Things Data Yonatan Kebede Serbessa Masterprogram i datavetenskap Master Programme in Computer Science i Abstract Handling Data Flows of Streaming Internet of Things Data Yonatan Kebede Serbessa Teknisk- naturvetenskaplig fakultet UTH-enheten Streaming data in various formats is generated in a very fast way and these data needs to be processed and analyzed before it becomes useless. The technology currently Besöksadress: existing provides the tools to process these data and gain more meaningful Ångströmlaboratoriet Lägerhyddsvägen 1 information out of it. This thesis has two parts: theoretical and practical. The Hus 4, Plan 0 theoretical part investigates what tools are there that are suitable for stream data flow processing and analysis. In doing so, it starts with studying one of the main Postadress: streaming data source that produce large volumes of data: Internet of Things. In this, Box 536 751 21 Uppsala the technologies behind it, common use cases, challenges, and solutions are studied. Then it is followed by overview of selected tools namely Apache NiFi, Apache Spark Telefon: Streaming and Apache Storm studying their key features, main components, and 018 – 471 30 03 architecture. After the tools are studied, 5 parameters are selected to review how Telefax: each tool handles these parameters. This can be useful for considering choosing 018 – 471 30 00 certain tool given the parameters and the use case at hand. The second part of the thesis involves Twitter data analysis which is done using Apache NiFi, one of the tools Hemsida: studied. The purpose is to show how NiFi can be used for processing data starting http://www.teknat.uu.se/student from ingestion to finally sending it to storage systems. -
Introduction to Graph Database with Cypher & Neo4j
Introduction to Graph Database with Cypher & Neo4j Zeyuan Hu April. 19th 2021 Austin, TX History • Lots of logical data models have been proposed in the history of DBMS • Hierarchical (IMS), Network (CODASYL), Relational, etc • What Goes Around Comes Around • Graph database uses data models that are “spiritual successors” of Network data model that is popular in 1970’s. • CODASYL = Committee on Data Systems Languages Supplier (sno, sname, scity) Supply (qty, price) Part (pno, pname, psize, pcolor) supplies supplied_by Edge-labelled Graph • We assign labels to edges that indicate the different types of relationships between nodes • Nodes = {Steve Carell, The Office, B.J. Novak} • Edges = {(Steve Carell, acts_in, The Office), (B.J. Novak, produces, The Office), (B.J. Novak, acts_in, The Office)} • Basis of Resource Description Framework (RDF) aka. “Triplestore” The Property Graph Model • Extends Edge-labelled Graph with labels • Both edges and nodes can be labelled with a set of property-value pairs attributes directly to each edge or node. • The Office crew graph • Node �" has node label Person with attributes: <name, Steve Carell>, <gender, male> • Edge �" has edge label acts_in with attributes: <role, Michael G. Scott>, <ref, WiKipedia> Property Graph v.s. Edge-labelled Graph • Having node labels as part of the model can offer a more direct abstraction that is easier for users to query and understand • Steve Carell and B.J. Novak can be labelled as Person • Suitable for scenarios where various new types of meta-information may regularly -
The Query Translation Landscape: a Survey
The Query Translation Landscape: a Survey Mohamed Nadjib Mami1, Damien Graux2,1, Harsh Thakkar3, Simon Scerri1, Sren Auer4,5, and Jens Lehmann1,3 1Enterprise Information Systems, Fraunhofer IAIS, St. Augustin & Dresden, Germany 2ADAPT Centre, Trinity College of Dublin, Ireland 3Smart Data Analytics group, University of Bonn, Germany 4TIB Leibniz Information Centre for Science and Technology, Germany 5L3S Research Center, Leibniz University of Hannover, Germany October 2019 Abstract Whereas the availability of data has seen a manyfold increase in past years, its value can be only shown if the data variety is effectively tackled —one of the prominent Big Data challenges. The lack of data interoperability limits the potential of its collective use for novel applications. Achieving interoperability through the full transformation and integration of diverse data structures remains an ideal that is hard, if not impossible, to achieve. Instead, methods that can simultaneously interpret different types of data available in different data structures and formats have been explored. On the other hand, many query languages have been designed to enable users to interact with the data, from relational, to object-oriented, to hierarchical, to the multitude emerging NoSQL languages. Therefore, the interoperability issue could be solved not by enforcing physical data transformation, but by looking at techniques that are able to query heterogeneous sources using one uniform language. Both industry and research communities have been keen to develop such techniques, which require the translation of a chosen ’universal’ query language to the various data model specific query languages that make the underlying data accessible. In this article, we survey more than forty query translation methods and tools for popular query languages, and classify them according to eight criteria. -
Full-Graph-Limited-Mvn-Deps.Pdf
org.jboss.cl.jboss-cl-2.0.9.GA org.jboss.cl.jboss-cl-parent-2.2.1.GA org.jboss.cl.jboss-classloader-N/A org.jboss.cl.jboss-classloading-vfs-N/A org.jboss.cl.jboss-classloading-N/A org.primefaces.extensions.master-pom-1.0.0 org.sonatype.mercury.mercury-mp3-1.0-alpha-1 org.primefaces.themes.overcast-${primefaces.theme.version} org.primefaces.themes.dark-hive-${primefaces.theme.version}org.primefaces.themes.humanity-${primefaces.theme.version}org.primefaces.themes.le-frog-${primefaces.theme.version} org.primefaces.themes.south-street-${primefaces.theme.version}org.primefaces.themes.sunny-${primefaces.theme.version}org.primefaces.themes.hot-sneaks-${primefaces.theme.version}org.primefaces.themes.cupertino-${primefaces.theme.version} org.primefaces.themes.trontastic-${primefaces.theme.version}org.primefaces.themes.excite-bike-${primefaces.theme.version} org.apache.maven.mercury.mercury-external-N/A org.primefaces.themes.redmond-${primefaces.theme.version}org.primefaces.themes.afterwork-${primefaces.theme.version}org.primefaces.themes.glass-x-${primefaces.theme.version}org.primefaces.themes.home-${primefaces.theme.version} org.primefaces.themes.black-tie-${primefaces.theme.version}org.primefaces.themes.eggplant-${primefaces.theme.version} org.apache.maven.mercury.mercury-repo-remote-m2-N/Aorg.apache.maven.mercury.mercury-md-sat-N/A org.primefaces.themes.ui-lightness-${primefaces.theme.version}org.primefaces.themes.midnight-${primefaces.theme.version}org.primefaces.themes.mint-choc-${primefaces.theme.version}org.primefaces.themes.afternoon-${primefaces.theme.version}org.primefaces.themes.dot-luv-${primefaces.theme.version}org.primefaces.themes.smoothness-${primefaces.theme.version}org.primefaces.themes.swanky-purse-${primefaces.theme.version}