Recurring Retrieval Needs in Diverse and Dynamic Dataspaces: Issues and Reference Framework

Barbara Catania, Francesco De Fino, Giovanna Guerrini University of Genoa, Italy {firstname.lastname}@dibris.unige.it

ABSTRACT is, a set of smart information services offered by a Processing information requests over heterogeneous municipality to users that, e.g., retrieves information dataspaces is very challenging because aimed at guar- about attractions, points of interests, and shops. A anteeing user satisfaction with respect to conflicting user might, for instance, ask for the authors of the fig- requirements on result quality and response time. In urative artworks she is watching (i.e., they are located [3], it has been argued that, in dynamic contexts pre- close to her position), together with information about venting substantial user involvement in interpreting other places where such authors are currently expos- requests, information on similar requests recurring over ing. In this specific context, data may come from dif- time can be exploited during query processing. ferent datasets, are heterogeneous and very dynamic, In this paper, referring to a graph-based modeling as the user may quickly change her position or the en- of dataspaces and requests, we propose a preliminary vironment itself (including data) might be different at approach in this direction centered on the enabling different instants of time. concept of Profiled Graph Query Pattern (PGQP) as Processing complex requests on diverse and dynamic an aggregation of information on past evaluations of sources requires: (i) request interpretation; (ii) source a set of previously executed queries. The information selection; (iii) actual processing of the request on the maintained in PGQP is not query results, as in mate- relevant, and usually big in size, sources. The overall rialized queries, but can include different kinds of data process is costly and, nevertheless, it may not guar- and . antee user satisfaction on the obtained result since the request (i) could be incorrectly interpreted, (ii) could be processed on inaccurate, incomplete, unre- 1. INTRODUCTION liable data, or (iii) could require a processing time Motivations. The last years have been character- inadequate to its urgency. ized by a tremendous growth of available informa- To make processing time more adequate to the ur- tion, coming from information sources that are highly gency of the request, approximate query processing heterogeneous in terms of structure, semantic rich- approaches can be considered, by providing a fast an- ness, and quality. Sources can indeed contain unstruc- swer at the cost of a lower accuracy [4]. Additionally, tured data as well as data with well-defined structure, to improve the interpretation of the request, and re- strongly correlated and semantically complex but rel- lated source selection, one possibility is to rely on user atively static data (e.g., Linked Open Data), highly involvement. Novel paradigms for exploratory user- dynamic user-generated data. This information is, data interactions, that emphasize user context and however, a resource currently exploited much below interactivity with the goal of facilitating exploration, its potential because of the difficulties for the users interpretation, retrieval, and assimilation of informa- in accessing it. Users would like to find answers to tion, are being developed. Such solutions, however, complex requests expressing relationships among the do not seem adequate in dynamic contexts, character- entities of their interest, but they are able to specify ized by urgent requests and by communication means requests only vaguely because they cannot reasonably hampering user interaction. know the format and structure of the data that encode Other innovative approaches should therefore be de- the information relevant to them. vised, relying on different kinds of information for find- As an example, consider a smart city explorer, that ing possibly approximate answers to complex infor- mation needs, even vaguely and imprecisely specified, operating on the full spectrum of relevant content in a dataspace of highly heterogeneous and poorly con- trolled sources. In [3], to overcome difficulties related to heterogeneity and dynamic nature, the exploitation of additional information, in terms of user profile and 2017, Copyright is with the authors. Published in the Workshop Proceedings of the EDBT/ICDT 2017 Joint Conference (March 21, request context, data and processing quality, similar 2017, Venice, Italy) on CEUR-WS.org (ISSN 1613-0073). Distri- requests recurring over time, is suggested. bution of this paper is permitted under the terms of the Creative In this paper, we focus on similar requests recurring Commons license CC-by-nc-nd 4.0 over time for improving approximate processing of re- spond to previously computed results, rather it ranges quests that we assume already interpreted and rep- from query results to any other higher level informa- resented according to a given formalism. Recurring tion collected during query processing (e.g., query- retrieval needs are very common in dynamic contexts, to-data mappings). In specifying the framework, we such as, for instance, during or after an exceptional identify the need for PGQP-aware approximate query event (environmental emergencies or flash mobbing processing techniques and PGQP management issues initiatives), in the context of users belonging to the raised by the framework. Each instantiation of the same community or that are in the same place, pos- framework will lead to the definition of a specific PGQP- sibly at different times. The information needs are aware query processing approach for a given set of widespread among different users, because induced by information associated with PGQPs. Approximated the event, the interests of the community, and the results can be generated for two main reasons: due place, respectively. We aim at taking advantage of the to the dynamic nature of the dataspace, information experience gained by prior processing in new searches associated with PGQPs may change between two dis- for limiting interpretation errors and response time, tinct executions of the same query; PGQPs are se- thus reducing the possibility of producing unsatisfac- lected based on similarity criteria with respect to the tory answers. query at hand. Recurring retrieval needs have been considered in The paper finally proposes a specific instantiation query processing for a very long time in terms of ma- of the framework, in the context of which the various terialized views (see, e.g., [7, 8]) The idea is to precom- components and concepts are formalized, and algo- pute the results of some recurring queries as materi- rithms for the most relevant tasks are specified. In alized views, select some of such views (view selection this instantiation, we formally define PGQPs but we problem) and re-use them (view-based query process- leave to future work the association of PGQPs with ing) for the execution of new requests. Unfortunately, information collected in prior executions. the usage of materialized views in the contexts de- scribed above suffers of some problems: (i) views as- Paper organization. The proposed framework is sociate a result to a given query, however in the ref- presented in Section 2 and a specific instantiation of erence contexts we may be interested in associating the framework is defined in Section 3. Section 4 con- other or additional information (e.g., the set of used cludes by discussing a number of open issues we are data sources or, under pay-as-you-go integration ap- investigating in this work-in-progress. proaches [6], query-to-data mappings); (ii) view up- dates are very frequent in dynamic environments, re- ducing the efficiency of the overall system; (iii) view- 2. PGQP-BASED FRAMEWORK based query processing techniques usually rely on a The framework we propose is based on a graph- precise semantics while heterogeneity tends to favour based representation of dataspaces, interpreted as a approximation-based approaches. collection of (possibly) schemaless data sources, and of Alternative usages of the concept of recurring re- user retrieval needs, expressed as entity relationships trieval needs in query processing have been recently queries over such dataspaces [14]. Specifically, we as- provided but, also in this case, precise query pro- sume to deal with data sources represented in terms cessing is taken into account. For example, in [13], of partially specified graphs, where some information, common needs are used with the aim of expediting associated with nodes, may be unknown [2]. Similarly, the processing of graph queries against a of each request is represented in terms of a graph, speci- graphs, by reducing the number of subgraph isomor- fying entities and relationships of interest, which may phism tests to be performed. contain node variables, in order to characterize infor- mation to be retrieved [2]. As an example, Figure 2 Contributions. This paper presents the preliminary (a) presents a graph query Q corresponding to the in- results of an on-going work aiming at exploiting in- formation need “the authors of the figurative artworks formation about similar requests recurring over time the user is watching (i.e., they are located close to her in approximate query processing of requests executed position), together with such artworks, a biography of over diverse and dynamic dataspaces. We refer to a the authors, and information about places in Genoa graph-based representation of dataspaces, interpreted where the authors are currently exposing their art- as a collection of (possibly) schemaless data sources, works”. Notice that, since the user request is related and on a graph-based language for modeling user re- to her current position, it needs to be interpreted and trieval needs, expressed as entity relationships queries processed before such position changes. over such dataspaces. We propose a framework for For representing and managing (sets of) recurring representing and managing (sets of) recurring user re- user requests, the framework relies on the enabling quests, and for exploiting them in approximate query concept of Profiled Graph Query Pattern (PGQP). Each processing. The proposed framework is based on the PGQP corresponds to a graph, associated with data or concept of Profiled Graph Query Pattern (PGQP). Like metadata related to the past evaluation of the queries materialized views, each profiled graph query pattern it represents. Such information does not necessar- corresponds to a kind of index value associated with ily correspond to previously computed results, thus, information related to the past evaluation of the queries PGQPs do not necessarily correspond to materialized it represents; however, differently from materialized views. Rather, any kind of metadata collected dur- views, such information does not necessarily corre- ing query processing, as the set of used data sources, Figure 1: PGQP-based framework

query-to-data mappings under pay-as-you-go integra- cific PGQP-based processing algorithms should be de- tion approaches [6], or result quality information, can signed, depending on the information associated with be considered. As an example, Figure 2 presents a PGQPs. On the other hand, query Qsub, which repre- graph query Q and two PGQPs φ1 and φ2. sents the residual part of Q, has to be executed based Independently from the information associated with on a traditional graph query processing algorithm (see, PGQPs, the proposed framework aims at addressing e.g. [17]). At the end of the processing, all partial re- three main problems which can be stated as follows sults are merged through the fusion operator θ and (see Figure 1 for an overall picture). the result is returned to the user.

Referring to Figure 2(a), Qφ1 is processed using in- PGQP-based query decomposition. Given a query formation associated with φ1 while Qsub is processed Q and a set S of PGQPs, defined in terms of graph by a traditional query processing algorithm. queries, determine whether it is possible to rely on S for executing Q. This is possible if Q can be approx- PGQP set update. For each graph query Q, exe- imated by a new query, obtained by composing a set cuted by the system without taking into account of Q subgraphs, which best match PGQPs, with the PGQPs, information related to its processing is col- portion of Q which cannot be represented in terms of lected (depending on the aim of the approach) and S. More formally, we look for a subgraph Qsub of Q, used, together with Q, for updating the set of avail- a PGQP subset S⊆ ⊆ S, and a fusion operator θ such able PGQPs at the end of the processing. that Q can be rewritten in a new approximate query As an example, Figure 2(d) shows PGQP φ1 ex-

Qa = θ({Qφi |φi ∈ S⊆},Qsub), where Qφi denotes the tended with the part of the query executed by a tra- (sub)graph of Q for which an approximate match with ditional query processing algorithm. (a subgraph of) φi exists, and Qa satisfies some op- timality criteria. That is, PGQPs in S⊆ are selected The framework described above can be instantiated according to a similarity function δ which quantifies in several ways. As an example, consider an instan- how close the query graph and each PGQP are. tiation targeted to source selection. In this case, the With reference to Figure 2, various decompositions information associated with PGQP φ1 could be the of Q are possible based on φ1 and φ2. Assuming φ1 sources exploited during the past processing of the is more similar to Q than φ2, the decomposition of Q queries it represents to retrieve the relevant informa- can return: (i) S⊆ = {φ1}; (ii) Qsub defined as shown tion. Thus, for instance, information relevant for φ1 in Figure 2(a); (iii) θ defined as the join between the may come from a dynamic data source s1 contain-

partial results of Qφ1 (tuples for variables ?a,?m,?b) ing information about art exhibitions of various artists and Qsub (tuples for variables ?a,?b). Notice that, by (left part of φ1) and a static data source s2 (such as interpreting PGQPs as views, traditional view selec- ) containing biographies of artists (right part). tion algorithms would not have selected any PGQP, An important difference with respect to the material- due to the approximate matches between “Genoa” and ized view approach is thus that the selected PGQP is “GE” and “Figurative” and “Figurative Art”. Note not associated with query results (that would have re-

that, even if in this example Qφ1 totally matches φ1, quired a frequent recomputation, given the dynamicity partial matches are also possible. of data sources, e.g., in art exhibits) but the sources contributing data to the result. PGQP-based process- PGQP-aware query processing. In order to opti- ing would rely on this information for targeting to

mize Qa evaluation, we can rely on information asso- these sources the execution of Qφ1 . As an effect of ciated with PGQPs in S⊆. More precisely, each graph the execution of Q, a third data source s3 containing

query Qφi can be executed taking into account in- museum catalogues with artwork positions is identi- formation related to its prior execution, that is, in- fied, and associated with the updated PGQP (shown formation associated with φi in S. To this aim, spe- in Figure 2(d)). Figure 2: Example PGQP: (a) graph query Q; (b) PGQP φ1; (c) PGQP φ2; (d) PGQP φ1 after the update phase

3. A PRELIMINARY PGQP-AWARE conjunctive queries and we denote with UACQ the re- PROCESSING ALGORITHM sulting language [1, 2]. Formally, an UACQ query Q is the union of expressions Q1, ...Qk where each Qh, In this section, we select suitable models for datas- 1 ≤ h ≤ k, is an expression of the form paces and user requests, we formalize the concept of ^ PGQP and we discuss their use and management in a ans(z1, ..., zn) ← (xi, ai, yi) preliminar instantiation of the framework. 1≤i≤m

3.1 Dataspace and User Requests such that the graph underlying Qh is acyclic, m > 0, Data Space Representation. Under the reference each xi and yi is a node variable or a constant (1 ≤ context, data spaces may contain redundant or even i ≤ m), each ai ∈ Σ (1 ≤ i ≤ m), and each zi is some missing information. Furthermore, the heterogeneous variable xj or yj (1 ≤ i ≤ n, 1 ≤ j ≤ m). The query nature of data leads to schemaless representations. To Q is Boolean if ans(), i.e. n = 0. take care of these features, we assume the data space UACQ queries can always be interpreted as graph is represented in terms of a collection of graph pattern (pattern) queries Q = (, z) where  is the graph pat- , i.e., graphs where nodes can also be labeled tern underlying Q and z = (z1, ..., zn). with variables, for representing situations in which the The semantics of a UACQ query Q = (, z) when ex- node identity is not known [1]). ecuted over a graph pattern database π = (N,E) can be defined as the set of answers which are returned Definition 1. Let N be a set of node labels. Let for each instance of the graph pattern database. Ac- Vnode be a set of node variables. Let Σ be an arbitrary cording to [2], we call them certain answers and we (finite or infinite) set of symbols. A Graph Pattern compute them as: Database over Σ is a pair π = (N,E) where: (i) N ⊆ n certainΣ(Q, π) = {v ∈ N |π |= [v/z]} N ∪Vnode is the finite set of nodes; (ii) E ⊆ N ×Σ×N is the set of edges. 2 where, given two graph patterns π1 and π2, π1 |= π2 (π1 implies π2) if [[π1]]Σ ⊆ [[π2]]Σ. The semantics of a graph pattern database corre- sponds to all graphs which are instances of the pat- 3.2 Profiled Graph Query Patterns tern and it is defined via homomorphisms. Given a A PGQP represents in a compact way a set of graph 0 0 graph G = (N ,E ) and a graph pattern database queries executed in the past. In order to provide a for- π = (N,E), a homomorphism h : π → G is a mapping mal definition of PGQPs, we propose to consider union 0 h : N → N that maps labels and variables used in π as aggregate operator of the component queries and to to labels used in G such that: represent a PGQP as a UACQ query. Since PGQPs 1. h (n) = n, for every node label n ∈ N; and are used for the approximate evaluation of queries, the 1 issue also arises of determining whether each PGQP 2. for every edge (n1, e, n2) ∈ E, there is a path represents, in a reasonably good way, (a portion of) a (h(n1), e, h(n2)) in G. graph query Q. If this is the case, φ becomes a candi- date for the PGQP-aware processing of Q and we say The semantics of a graph pattern w.r.t. the label- that Q matches the PGQP. PGQP candidates can be ing alphabet Σ is [[π]]Σ = {G over Σ|G |= π}, where selected through approximate subgraph matching [12] G |= π holds if a homomorphism h : π → G exists. and the usage of a graph similarity function.

Graph Query Language. In this work, for the sake Definition 2. Let Q1 = (N1,E1),Q2 = (N2,E2) ∈ of simplicity and for guaranteeing a tractable com- UACQ. An Approximate Subgraph Match between Q1 0 0 bined and data complexity (fundamental requirements and Q2 is a bijection mapping λ : N1 ↔ N2, where 0 for dealing with massive in size graph databases), we Ni ⊆ Ni, i = 1, 2. An Approximate Subgraph Match- consider graph queries specified as unions of acyclic ing Function is a function m that, for each pair of Algorithm 1 PGQP-based query decomposition Algorithm 2 PGQP set update INPUT INPUT Q: graph query Q: graph query S = {φ1, φ2, ..., φn}: set of PGQPs S = {φ1, φ2, ..., φn}: set of PGQPs δm: a graph similarity function induced by an approxi- vc: maximum allowed transformation cost mate subgraph matching function m OUTPUT vs: minimum allowed graph similarity S: an updated set of PGQPs OUTPUT APPROACH (S⊆,Qsub), such that S⊆ ⊆ S and Qsub is a subgraph of 1: Let φh ∈ S such that CT (Q, φh) = min1≤i≤nCT (Q, φi) Q 2: if (CT (Q, φh) ≤ vc) then APPROACH 3: φh = T (Q, φh) 1: Let φk ∈ S such that δm(Q, φk) = max1≤i≤nδm(Q, φi) 4: S = S\{φh} ∪ {φh} 2: if δm(Q, φk) > vs then 5: else return ({Q },Q − Q ) 3: φk φk 6: S = S ∪ {Q} 4: else 7: return S 5: return ({},Q)

Qφ and then all remaining non connected nodes be- Q1,Q2 ∈ UACQ, returns an approximate subgraph longing to Qφ). This consideration guides us in the match from Q1 to Q2.A Graph Similarity Function definition of a preliminary decomposition algorithm is a metric function δ : UACQ × UACQ → [0, 1] such (see Algorithm 1). Given a query Q and a set of that, for each Q1,Q2 ∈ UACQ, δ(Q1,Q2) quantifies PGQPs S, the PGQP φk providing the highest simi- the similarity between Q1 and Q2. δ is induced by an larity value with respect to Q is selected and, if such Approximate Subgraph Matching Function m (denoted value is greater than a given threshold (thus, Q and φk by δm) if the value assigned to (Q1,Q2) quantifies the 0 0 are close enough), the binary decomposition described similarity between subgraphs Q1 and Q2 involved in above is returned. Otherwise, no decomposition is ap- the match m(Q1,Q2). 2 plied. PGQPs and PGQP matches can now be defined. 3.4 PGQP Set Update In order to update the PGQP store, we rely on a Definition 3. A Profiled Graph Query Pattern φ graph transformation function. is a graph query in UACQ. Let δm be a graph sim- ilarity function induced by an approximate subgraph Definition 4. A Graph Transformation Function matching function m. A graph query Q matches a is a function T : UACQ × UACQ → UACQ such PGQP φ, with similarity v ∈ (0, 1], if δm(Q, φ) = v. that, for each query Q and PGQP φ, Q is a match of Qφ denotes the (sub)graph of Q involved in the match. T (Q, φ), based on δm. 2 Q is a total match of φ if Qφ = Q, it is a partial match otherwise. 2 Transformation functions can be defined in terms of a small set of operations, previously defined in the In the following, we assume that function δm is pro- context of the so called tree editing problem [16], which vided by the system and used during PGQP-based at least contains node relabelling, deletion and inser- query decomposition and PGQP set update. tion. Each transformation function can be defined by specifying which operations should be applied to 3.3 PGQP-based Query Decomposition a given PGQP for each node and edge of the query In the following, we present a preliminary PGQP- Q at hand, in such a way that Q, after the trans- based query decomposition approach, defined assum- formation, matches the resulting PGQP. An example ing to: (i) select only one PGQP (thus, |S⊆| = 1); (ii) of graph transformation is the function that, given a merge partial results thought relational operators. PGQP φ and a query Q, inserts into φ all nodes and Several approximate graph matching functions have edges in Q − Qφ that have not been mapped into φ been proposed so far which can be used as basis for nodes by δm. defining similarity functions (see, e.g., [9, 10] for some In order to identify the best transformation, a graph recent papers and [15] for a survey). A relevant exam- cost function can be defined as follows. ple, due to its flexibility, is represented by the Similar- ity Flooding function [11]. Based on the intuition that Definition 5. A Graph Cost Function for a trans- similar nodes have similar neighbors, node similar- formation function T is a function CT : UACQ × ity values are computed relying on information about UACQ → R, which, for each graph query Q and PGQP node neighbors through an iterative fixpoint compu- φ, assigns a cost value to T (Q, φ). 2 tation. The selection metric, used in Selection Flood- ing to identify the best mapping, can be used (after Similarly to the tree editing problem, a graph cost normalization of values between 0 and 1) as a graph function quantifies the transformation effort by assign- similarity function. Details about PGQP definition in ing a cost to each applied editing operation. As an ex- terms of the Similarity Flooding function can be found ample, given a graph query Q and a PGQP φ, cost 1 in [5]. can be assigned to the insertion of each new node and According to Definition 3, each PGQP φ divides a new edge in φ; a cost proportional to their distance graph query Q into two subqueries, namely Qφ and can be assigned to each node n in Q which is mapped 0 Q − Qφ (computed by removing from Q first edges in into a node v in φ by δm. Algorithm 2 presents a simple approach for PGQP current and should be indeed used for updating the set update based on the notions introduced above. In PGQP set? What about the tradeoff between the size particular, it updates the input set of PGQPs by first of the PGQP set, the query processing cost, and the selecting the PGQP leading to the lowest transforma- effectiveness (in terms of result quality) of the overall tion cost for the input graph query Q. If such cost is query approach? lower than a given threshold, the PGQP set is updated by replacing the selected PGQP with the PGQP ob- 5. REFERENCES tained by the transformation. Otherwise, the PGQP [1] P. Barcel´o. Querying graph databases. In Proc. set is updated by inserting Q as a new PGQP. of PODS, pp. 175–188. ACM, 2013. Example 1. Consider the graph query Q and the [2] P. Barcel´o, L. Libkin, and J. L. Reutter. PGQP set S = {φ1, φ2} presented in Figure 2. Con- Querying graph patterns. In Proc. of PODS, pp. sider the graph similarity function δm induced by the 199–210. ACM, 2011. Similarity Flooding algorithm [11] (see [5] for details), [3] B. Catania et Al. Wearable queries: adapting a minimum similarity threshold vs = 0.7, and a max- common retrieval needs to data and users. In imum cost threshold vc = 3. Proc. DBRank Workshop. ACM, 2013. Q is processed as follows. Algorithm 1 first com- [4] B. Catania and G Guerrini. Approximate putes the mapping and the similarity between Q and queries with adaptive processing. Advanced each φi, relying on δm. Partial approximate subgraph Query Processing, Intelligent Systems Reference matches can be generated from Q to both φ1 and φ2. Library (36), pp. 237-269. Springer, 2013. Assume the following graph similarities are computed: [5] B. Catania, F. De Fino, and G. Guerrini. δm(Q, φ1) = 0.83 and δm(Q, φ2) = 0.40. PGQP φ1 is Recurring retrieval needs in diverse and selected because it is the most similar to Q. Since sim- dynamic dataspaces: issues and reference ilarity is greater than vs, decomposition is applied and framework, extended version. Technical Report, the pair ({Qφ1 },Q − Qφ1 ) is returned for processing. University of Genoa, Italy, 2016. At the end of the processing, Algorithm 2 is used [6] A. Das Sarma, X. Dong, and A. Halevy. for updating the PGQP store. Suppose that CT (Q − Bootstrapping Pay-As-You-Go Qφ1 , φ1) = 2.56 and CT (Q − Qφ2 , φ2) = 3. PGQP φ1 systems.In Proc. of SIGMOD. ACM, 2008. is selected because of the lowest transformation cost. [7] W. Fan, X. Wang, and Y. Wu. Answering Since the transformation cost is lower than vc, PGQP pattern queries using views. IEEE Trans. φ1 is updated by inserting all nodes and edges in Q − Knowl. Data Eng., 28(2):326–341, 2016. Q not mapped into φ nodes by δ . 3 φ1 1 m [8] F. Goasdou´eet Al. View selection in databases. PVLDB, 5(2): 97–108, 2011. 4. OPEN ISSUES [9] J. He et Al. Assessing single-pair similarity over The work reported in this paper represents just the graphs by aggregating first-meeting first step towards the design of approximate query pro- probabilities. Inf. Syst. 42: 107-122, 2014. cessing approaches for recurring retrieval needs. Sev- [10] D. Koutra et Al. DeltaCon: Principled eral issues still need to be investigated. Some of them Massive-Graph Similarity Function with are discussed in what follows. Attribution. TKDD 10(3): 28, 2016 Framework instantiation. In this paper, we consid- [11] S. Melnik, H. Garcia-Molina, and E. Rahm. ered a simple notion of approximate subgraph match- Similarity flooding: A versatile graph matching ing, defined by relaxing the notion of subgraph iso- algorithm and its application to schema morphism. Other approaches can however be con- matching. In Proc. of ICDE, pp. 117–128. IEEE, sidered, which allow more flexible types of approxi- 2002. mation (see, e.g., [15] for a survey). More effective [12] Y. Tian and J.M. Patel. TALE: A Tool for query decomposition approaches should also be de- Approximate Large Graph Matching. In Proc. of signed which increase the number of generated sub- ICDE, pp. 963-972, 2008 queries, together with specific optimality criterias; op- [13] J. Wang, N. Ntarmos, and P. Triantafillou. timization approaches should be developed which take Indexing query graphs to speedup graph query into account the number and the shape of the identi- processing. In Proc. of EDBT, pages 41–52. fied subqueries for selecting the most efficient PGQP- 2016. based execution plan. [14] G. Weikum. Data and knowledge discovery. Information associated with PGQPs. As already GRDI 2020, 2011. discussed, several types of information can be associ- [15] Y. Wu. Extending graph homomorphism and ated with PGQPs. For each type of information, spe- simulation for real life graph matching. PhD cific PGQP-based query processing algorithms should Thesis, 2010. be devised. PGQP management. In this paper, we assumed to [16] K. Zhang and D. Shasha. Simple fast algorithms deal with an initial set of PGQPs and we only prelim- for the editing distance between trees and inarily discuss how to modify it taking into account a related problems. SIAM Journal on Computing, new graph query execution. There are however sev- 18(6):1245–1262, 1989. eral further issues to be taken into account for PGQP [17] P. Zhao, J. Han. On graph query optimization management, e.g., how to decide that a query is re- in large networks. PVLDB, 3(1): 340-351, 2010.