Expression and Efficient Processing of Fuzzy Queries in a Graph

View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by HAL-Rennes 1 Expression and Efficient Processing of Fuzzy Queries in a Graph Database Context Olivier Pivert, GrégorySmits, Virginie Thion To cite this version: Olivier Pivert, GrégorySmits, Virginie Thion. Expression and Efficient Processing of Fuzzy Queries in a Graph Database Context. Fuzz-IEEE'15 : 24th IEEE International Conference on Fuzzy Systems, Aug 2015, Istanbul, Turkey, France. pp.8, 2015. <hal-01140195> HAL Id: hal-01140195 https://hal.inria.fr/hal-01140195 Submitted on 8 Jul 2016 HAL is a multi-disciplinary open access L'archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinéeau dépôtet àla diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiésou non, lished or not. The documents may come from émanant des établissements d'enseignement et de teaching and research institutions in France or recherche fran¸caisou étrangers,des laboratoires abroad, or from public or private research centers. publics ou privés. Expression and Efficient Processing of Fuzzy Queries in a Graph Database Context Olivier Pivert, Gregory´ Smits, Virginie Thion Rennes 1 University / IRISA Lannion, France Email: fOlivier.Pivert,Gregory.Smits,[email protected] Abstract—Graph databases have aroused a large interest in language, driven by a running example, which is also used to the last years thanks to their large scope of potential applications present the FUDGE language in the Section III. We then present (e.g. social networks, biomedical networks, data stemming from the SUGAR system in Section IV, and focus on implementation the web). In a similar way as what has already been proposed issues. Related work is discussed in Section V. Section VI in relational databases, defining a language allowing a flexible recalls the contributions and outlines some perspectives. querying of graph databases may greatly improve usability of data. This paper focuses on the notion of fuzzy graph database and describes a fuzzy query language that makes it possible to I. BACKGROUND NOTIONS handle such database, which may be fuzzy or not, in a flexible way. This language, called FUDGE, is an extension of the CYPHER A. Graph databases language used for querying graph databases in a crisp way in the Neo4j graph database management system. It can be A graph database management system enables managing used to express preference queries on fuzzy graph databases. data for which the structure of the schema is modeled as The preferences concern i) the content of the vertices of the a graph (nodes are entities and edges are relations between graph and ii) the structure of the graph. The FUDGE language is entities), and data is handled through graph-oriented operations implemented in a system called SUGAR, that we present in this and type constructors [9]. Among the existing systems, let us article. We also discuss some implementation issues of the FUDGE mention AllegroGraph [1], InfiniteGraph [2], Neo4j [3] and language in SUGAR. Sparksee [5]. There are different models for graph databases (see [9] for an overview), including the attributed graph (aka. INTRODUCTION property graph) aimed to model a network of entities with embedded data. In this model, nodes and edges may contain Much work has been done about fuzzy querying of rela- data in attributes (aka. properties). tional databases, cf. for instance [21] or [31], which led in particular to a fuzzy extension of the SQL language, called SQLf [13]. However, even though relational databases are still B. Fuzzy graphs widely used, the need to handle complex data has led to the A graph is a pair (V; R), where V is a set and R is emergence of other types of data models. In the last few years, a relation on V . The elements of V (resp. R) correspond a new concept has started to attract a lot of attention in the to the vertices (resp. edges) of the graph. Similarly, any database world, namely that of graph databases (see e.g. [9], fuzzy relation ρ on a set V can be regarded as defining a [35], [7]). The basic purpose of graph DB is to efficiently weighted graph, or fuzzy graph [33], [29], where the edge manage networks of entities where each node is described by (x; y) 2 V × V has weight or strength ρ(x; y) 2 [0; 1]. a set of characteristics (e.g. a set of attributes), and each edge As noted in [36], the fuzzy relation ρ may be viewed as a represents a link between entities. Such a database model has fuzzy subset on V × V , which allows us to use much of the many potential applications, e.g. for modeling social networks, formalism of fuzzy sets. For example, we can say that ρ ⊆ ρ RDF data, cartographic databases, bibliographic databases, etc. 1 2 if 8(x; y); ρ1(x; y) ≤ ρ2(x; y). Some notable properties Such a model may be extended into the notion of a fuzzy graph that can be associated with fuzzy relations are reflexivity database where a degree may be attached to edges in order (ρ(x; x) = 1; 8x), symmetry (ρ(x; y) = ρ(y; x)), transitivity to express the “intensity” of any kind of gradual relationship (ρ(x; z) ≥ max min(ρ(x; y); ρ(y; z))). (e.g., likes, is friends with, is about). Graph databases, which y may be fuzzy or not, raise new challenges in terms of flexible An important operation on fuzzy relations is composition. querying since two aspects may be involved in the preferences Assume ρ1 and ρ2 are two fuzzy relations on V . Thus, that a user may express: i) the content of the nodes and ii) the composition ρ = ρ1 ◦ ρ2 is also a fuzzy relation on V s.t. structure of the graph. ρ(x; z) = maxy min(ρ1(x; y); ρ2(y; z)): The composition operation can be shown to be associative: (ρ1 ◦ ρ2) ◦ ρ3 = In this paper, we present the FUDGE language and the ρ1 ◦ (ρ2 ◦ ρ3): The associativity property allows us to use the SUGAR system implementing this language, that make it pos- notation ρk = ρ◦ρ◦:::◦ρ for the composition of ρ with itself sible to query a graph database in a flexible way. The paper is k − 1 times. In addition, following [36], we define ρ0 to be organized as follows. In section I, we present some background s. t. ρ0(x; y) = 0; 8(x; y). notions about graph databases, fuzzy set theory, fuzzy graphs and the fuzzy querying of graph database. In Section II, we Remark 1: Fuzzy graphs as defined above may be gener- recall the basis of the query algebra underlying the FUDGE alized to the case where a fuzzy set of vertices is considered. Then, denoting by F the fuzzy subset of V considered, 3) Preference combination: Different types of connectives the corresponding fuzzy graph is defined as (V; F; ρF ). may be considered for combining conditions about the content In this case, we let ρF be a relation on V defined as or the structure of the graph: “flat” (min, max, arithmetic mean, ρF (x; y) = min(ρ(x; y); µF (x); µF (y)) where µF denotes etc.), weighted (weighted mean, OWA, quantified proposition, the membership function attached to F . In the following, we etc, see [24]), or hierarchical. only consider the simple case of a crisp set of vertices. II. FUZZY GRAPH DATABASES AND FUZZY ALGEBRA If ρ is symmetric, we shall say that (V; ρ) is an undirected graph. Otherwise, we shall refer to (V; ρ) as a directed graph. In this section, we recall the main elements of the algebra Without loss of generality, we consider directed graphs in the that constitutes the foundations of the FUDGE language. The following. whole algebra is presented in detail in [32]. C. Fuzzy preferences on graph DBs A. Data model In this section, we describe the main elements that may We are interested in fuzzy graph databases where nodes appear in a fuzzy query addressed to a graph database. Two and edges can carry data (e.g. key-value pairs in attributed types of preferences have to be considered: those on content graphs, see Section I-A). So, we first propose an extension of and those on structure. the definition of a fuzzy graph into that of a fuzzy data graph. 1) Preferences on the node content: The idea is to ex- Definition 1 (Fuzzy data graph): Let E be a set of labels. press flexible conditions about attributes associated with nodes A fuzzy data graph G is a quadruple (V; R; κ, ζ), where V is and/or vertices of the graph. An example is: “find the people a finite set of nodes (each node n is identified by n:id), R = S who are young, highly educated, and live in Eastern Europe” e2Efρe : V × V ! [0; 1]g is a set of labeled fuzzy edges (assuming that each node contains information about the age, between nodes of V , and κ (resp. ζ) is a function assigning a education level, address, etc., of the person it corresponds to). (possibly structured) value to nodes (resp. edges) of G. Compound conditions may also be expressed using a large In the following, a graph database is meant to be a fuzzy data range of fuzzy connectives. We do not get into more detail as graph. The following example illustrates this notion. this aspect has been much studied in a relational context [31]. Example 1: Fig. 1 is an example of a fuzzy data graph 2) Preferences on the graph structure : Hereafter, we inspired from DBLP1, with some fuzzy edges (with a degree in describe the concepts of fuzzy graph theory that appear the brackets), and crisp ones (degree equal to 1).

Load more