A Fast Location Service for Partial Spatial Replicas

Yun Tian and Philip J. Rhodes
Department of Computer and Information Science
University of Mississippi
University, MS, USA 38677
Email: [email protected], [email protected]

Abstract—This paper describes a design and implementation of a distributed high-performance partial spatial replica location service. Our replica location service identifies the set of partial replicas that intersect with a region of interest, an important component of partial spatial replica selection. We find that using an R-tree data structure is superior to relying on a relational database alone when handling spatial queries. We have also added a collection of optimizations that together improve performance. In particular, database Query Aggregation and using a Morton curve during R-tree construction produce significant performance gains. Experimental results show that the proposed partial spatial replica location service scales well for multi-client and distributed large spatial queries, queries that return more than 10,000 replicas. Individual servers with one million pieces of replica metadata in the backend database can support up to 100 clients concurrently when handling large spatial queries. Our previous work solved the same problem using an unmodified Globus Toolkit, but the work described here modifies and extends existing Globus Toolkit code to handle spatial metadata operations.

Keywords-Globus RLS; R-tree; replica; spatial;

I. INTRODUCTION

Replication and Replica Selection are widely used in distributed systems to help distribute both data and the computation performed upon it. The Globus Toolkit includes a Replica Location Service (RLS) that provides users with a flexible mechanism to discover replicas that are convenient for a particular computation [1], [2], [3], [4], [5]. By providing multiple copies of the same dataset, replicas can increase both I/O bandwidth and options for scheduling computation. Recently, we have been working on the problem of partial replica selection for spatial data.

A spatial dataset associates data values with locations in an n-dimensional domain and is commonly used in various fields of engineering and the sciences. For example, a climatologist might use spatial data produced by simulation to predict temperature changes for the Gulf of Mexico over a long time period. An agronomist might use spatial data to represent soil humidity for a large tract of land. The size of spatial datasets is growing rapidly due to advances in measuring instrument technology and more accessible and less expensive computational power.

The Geographical Information Systems (GIS) community has been actively investigating distributed data access for some time [6], [7], [8]. GIS data generally refers to the surface of the earth, and is usually two dimensional. Information such as road locations, municipal boundaries, bodies of water, etc., can be represented using a sequence of points denoting the feature. Although this type of data is certainly spatial, our own research is focused more on volumetric data. For example, Computational Fluid Dynamics (CFD) simulations commonly represent three or four dimensional volumes using rectilinear grids of data points that span the volume. Conceptually, such datasets (and subsets extracted from them) often have the shape of a rectangular prism, while GIS data may have an entirely irregular shape.

The work described here was performed within the context of the Granite Scientific Database System, which provides efficient access to spatial datasets stored on local or remote disks [9], [10]. Granite allows its users to specify spatial subsets of larger n-dimensional volumes for retrieval. It takes advantage of UDT [11], a UDP based reliable data transfer protocol that is well suited to the transfer of large data volumes over high bandwidth-delay product connections. The combination of UDT and Granite provides fast access to subsets of datasets stored remotely. If a computation needs only a comparatively small subset of a larger volume, accessing that subset remotely may be much faster than moving the entire volume. The flexibility provided by this additional option is especially welcome in heterogeneous environments, where the hardware that best suits a computation (e.g. a GPU) might be located far from the dataset.

The Magnolia component of the Granite system is intended to integrate Granite's unique spatial capabilities with existing Grid software. Replica selection for partial spatial replicas is an important part of Magnolia, and requires both a way to discover the partial replicas that intersect with a spatial query and a model of access time that allows Magnolia to choose between them. The Granite system already incorporates the concept of a storage model, which can be used to infer disk access costs. This paper concentrates on the problem of efficiently determining the set of partial replicas that intersect with the spatial query bounds given by the user.

In a previous paper [12], we described the Globus Toolkit R-tree (GTR-tree), an implementation of an important spatial data structure on top of an existing grid infrastructure. In that work, an R-tree [13] is constructed in the Globus Toolkit RLS backend relational database, and metadata describing spatial replicas is managed via an unmodified Globus Toolkit.

The GTR-tree is one possible implementation of a Spatial Replica Location Service (SRLS), which represents the spatial information of replicas in a grid or other distributed system, associating the spatial extent of replicas with their physical addresses. When a query region is submitted to an SRLS, it must return the set of replicas that intersect with that region. A data grid may contain many SRLS instances, each with a different set of replica metadata, and each able to service multiple clients simultaneously. This allows not only the metadata, but also the intersection computation, to be distributed across a grid.

Using an R-tree for an SRLS implementation is very effective since we can selectively traverse the tree, greatly reducing the number of required intersection tests. Implementing the R-Tree data structure on top of an unchanged Globus implementation eases deployment and avoids compatibility issues, but at some cost to performance. In this paper, we describe the Mortonized Aggregated Query (MAQ) R-Tree, a new SRLS implementation that is now directly integrated into the Globus Toolkit. Although we made minor changes to the Globus source code, the MAQR-tree implementation is independent of Globus and could easily be ported to other grid systems. We have also added a collection of optimizations that together improve performance over previous work by a substantial margin. Namely, we have improved upon the GTR-tree by changing the table design in the backend database, and by aggregating several queries into one larger query, which reduces overhead. We now also use the Morton Space-filling Curve during R-tree construction, which improves spatial locality. Experiments evaluating multi-client and distributed spatial queries demonstrate good scalability, and a very substantial improvement over previous results.

The rest of the paper is organized as follows. Section II presents related work. In section III, we further describe the problem that we are addressing. Section IV analyzes our previous GTR-tree implementation. The MAQR-tree implementation is presented in section V. In section VI, the advantages and usefulness of the proposed techniques are validated through experiments. We conclude by summarizing this work and pointing out future research directions in section VII.

II. RELATED WORK

Replica selection in grid computing has been addressed by many researchers. Chervenak et al. described the Giggle framework, including RLS requirements, implementation, and performance results [2]. Cai et al. proposed a Peer-to-Peer Replica Location Service (P-RLS) with the properties of self-organization, fault-tolerance and improved scalability [3]. More recently, Chervenak et al. systematically described the Globus RLS framework design, implementation, performance and scalability evaluation, and its production deployments [5].

In the GIS context, Wu et al. described a framework for a spatial data catalog implemented on top of a grid and P2P system [6]. Wei and Di et al. implemented a Grid-enabled catalogue service for GIS data by adapting the Open Geospatial Consortium (OGC) Catalogue Service for Web Specification and the ebXML Registry Information Model (ebRIM) [7], [8]. The OGC publishes a number of standards for GIS data and applications, but these are not well suited for other types of spatial data, especially those that entail three or more dimensions. For this reason, databases such as PostGIS [14] that implement the OGC GIS standard are not easily applied to other types of scientific data or metadata.

Narayanan et al. described GridDB-Lite, a middleware which provides basic scientific database support for data-driven applications in the grid [15]. To expedite selection of the data of interest, they implemented an R-tree via summary index files and detailed index files in their indexing service. In another paper [16], Narayanan presented a generic framework to support partial data replication and data reordering. Weng et al. described a partial replica selection algorithm for serving range queries on multidimensional datasets [17]. Our work differs from these efforts in two important respects. First, our MAQR-tree integrates an R-tree into the Globus toolkit to support spatial metadata operations. Second, we investigated the efficient representation of an R-tree constructed in a relational database.

To manage spatial metadata for fast retrieval, we chose the R-Tree [13] data structure for our system from among several other spatial data structures, including the Quadtree, Octree, Kd-tree and UB-tree. We require a data structure which returns the collection of replica Minimal Bounding Rectangles (MBRs) that intersect with a query region. OLAP methods like the UB-tree [18] return a collection of points (records) that are contained within a query region, making them inconvenient for our purposes. As described by Kamel et al., Quadtrees and Octrees typically divide the spatial replicas into smaller blocks, thus generating more partial spatial replicas during construction of the tree [19]. Kd-trees [20] have a similar problem because they require us to divide the space with a splitting hyperplane. However, the R-tree and related methods (e.g. the X-tree [21]) allow for efficient retrieval of the replica MBRs that intersect with a query region, without splitting replicas into smaller pieces.

Section V-C1 describes our use of the Morton Space-filling curve, also known as z-ordering. The Morton Space-filling curve and Hilbert curve are widely used in multidimensional access methods [19], [22]. Faloutsos et al. compared the Hilbert curve and the Morton curve. Their work concluded that the Hilbert curve has a better distance-preserving mapping than the Morton curve, but the Hilbert curve is more complex to calculate [23]. Both the Morton and Hilbert curves are found in commercial database systems. Microsoft SQL Server 2008 used the Hilbert curve to determine the value of a cell identifier, thus maintaining good spatial locality [24]. Oracle has used the Morton curve in their products for some time [22]. The UB-Tree has also been integrated into the kernel of a relational database system to support spatial indexing; a UB-tree combines a B-Tree with the Morton curve, mapping multi-dimensional data to one-dimensional space using the Morton value of the data points [18]. In P2P systems, Ganesan et al. [25] used the Morton curve to reduce multi-dimensional data to one dimension, then partitioned the dataset according to contiguous Morton value ranges. Because the Morton value can be simply computed by bit-interleaving, in this paper we use the Morton curve to re-order the 3D replicas and then construct the MAQR-tree.

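As a concrete illustration of the bit-interleaving just mentioned, the following minimal sketch interleaves the bits of a 3D replica center into a single 64-bit key that can be used to sort replicas before tree construction. The class and method names are ours, not taken from the MAQR-tree sources; the constants are the standard "magic bits" used for 21-bit-per-axis Morton codes.

    // Sketch of a 3D Morton (z-order) encoding by bit interleaving, as used to
    // order replica center points before bottom-up tree construction.
    public final class Morton3D {

        /** Spread the low 21 bits of v so consecutive bits land 3 positions apart. */
        private static long spreadBits(long v) {
            v &= 0x1FFFFFL;                        // 21 bits per axis is enough for a 2048^3 domain
            v = (v | (v << 32)) & 0x1F00000000FFFFL;
            v = (v | (v << 16)) & 0x1F0000FF0000FFL;
            v = (v | (v << 8))  & 0x100F00F00F00F00FL;
            v = (v | (v << 4))  & 0x10C30C30C30C30C3L;
            v = (v | (v << 2))  & 0x1249249249249249L;
            return v;
        }

        /** Interleave the bits of (x, y, z) into a single 63-bit Morton value. */
        public static long encode(long x, long y, long z) {
            return spreadBits(x) | (spreadBits(y) << 1) | (spreadBits(z) << 2);
        }

        public static void main(String[] args) {
            // Center of a replica MBR inside the 2048^3 dataset domain.
            System.out.println(Morton3D.encode(1024, 512, 7));
        }
    }
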

III. ALGORITHM OVERVIEW

Before a computation begins in a grid, partial spatial replica selection must address two subproblems. The first subproblem is to efficiently identify the set of partial replicas that intersect the subvolume required by the computation in a distributed system. Second, from that set, we want to select the "best" replicas to minimize transfer time. This requires a metric based on factors such as disk and network bandwidth and latency, data storage organization, etc. Our current work focuses only on the first subproblem.

Figure 1 illustrates how the intersecting replicas are identified in the system. Spatial replicas and queries are commonly represented as MBRs, which approximate the extent of an object in space using minimum and maximum values for each dimension. In the 2D case, a spatial replica or query can be represented as {xmin, ymin, xmax, ymax}. These MBRs identify the region of the larger data space that each replica represents. Intersection tests are performed to determine which replicas intersect with the spatial query. In the worst case, each spatial replica in the catalog will be examined against the spatial query, which is very computationally expensive.

Figure 1. Intersected Replicas in 2D. The dotted square represents the spatial query, and intersected replicas are shown as shaded rectangles.

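The per-replica test itself is simple; the cost comes from repeating it for every replica in the catalog. The sketch below shows the axis-aligned MBR intersection test in Java, assuming integer bounds per dimension; the class and field names are illustrative, not the paper's.

    // Minimal sketch of the axis-aligned MBR intersection test described above.
    public class MBR {
        final int[] min;   // minimum coordinate in each dimension
        final int[] max;   // maximum coordinate in each dimension

        public MBR(int[] min, int[] max) {
            this.min = min;
            this.max = max;
        }

        /** Two MBRs intersect iff they overlap (or touch) in every dimension. */
        public boolean intersects(MBR other) {
            for (int d = 0; d < min.length; d++) {
                if (min[d] > other.max[d] || max[d] < other.min[d]) {
                    return false;   // separated along dimension d
                }
            }
            return true;
        }

        public static void main(String[] args) {
            MBR query   = new MBR(new int[]{50, 50, 50}, new int[]{100, 100, 100});  // query Q3
            MBR replica = new MBR(new int[]{90, 40, 60}, new int[]{140, 80, 120});
            System.out.println(query.intersects(replica));   // true: overlap in x, y and z
        }
    }
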
However, an R-tree can be used on each grid node to organize the metadata of all spatial replicas and to prune the search space. We define the out degree of a tree node as the actual number of child nodes. Each node of an R-tree can have a variable number of child nodes, up to a maximum value that we will refer to as the fanout. For our purposes, leaf nodes contain a reference to partial replicas, along with an MBR for each replica. Internal nodes contain references to child tree nodes, along with an MBR that encloses the MBRs of all the child nodes. During search, an internal node's MBR can be tested for intersection with the query rectangle, allowing large sections of the tree to be pruned.

IV. LIMITATIONS OF THE GTR-TREE

Our previous approach was implemented on top of the Globus Toolkit RLS. In particular, we repurposed the Physical File Name (PFN) to be a reference to a tree node. We then associated two string attributes "MBR" and "Info" with that PFN in order to represent its spatial extent and references to its children respectively. To enable distributed queries in a grid, we used the GRAM API to invoke the R-tree traversal routine, which was located on the same machine as the RLS server. Query results can be returned by using GridFTP or a Java socket.

The approach described above is still subject to some disadvantages. First, the fanout of the GTR-tree is limited by the size of the string attribute in the RLS backend database. Although R-trees of large fanout can be represented in the database by introducing more user-defined attributes, we propose a more elegant solution in section V-C2.

Second, the GTR-tree node representation was not optimized and was constrained by the design of the existing table structure in the backend database. In particular, one tree node was stored in three different tables, which made queries and updates inefficient due to an expensive join operation.

Third, although the GTR-tree dramatically prunes the search space, some amount of metadata was still sent from the server code to the query routine via a local socket, which incurs some overhead. Also, because we needed to use existing Globus tools, we incurred additional latencies associated with GRAM and GridFTP when performing a distributed query.

V. A NEW APPROACH

In this paper, we modified the Globus Toolkit RLS to support spatial metadata management in a grid. We evaluated the performance of an R-tree stored in a relational database, and some related factors which influence the performance of spatial queries or updates, including Fanout, the Morton Space-filling Curve and Query Aggregation.

A. RLS in the Globus Toolkit

Communication between client and server in the Globus RLS uses a simple string-based RPC protocol [3], [26], which relies on the Grid Security Infrastructure (GSI) and the globus_io socket layer from the Globus Toolkit [2]. Method names, parameters and results are all encoded as null terminated strings.

An RPC method invocation in the Globus RLS includes several steps. First, the client sends the method name and all the arguments to the server. A thread on the server searches through a method array for an element matching the requested method name. After the matching method is invoked, execution results are sent back to the client. It follows that we can easily extend the functionality of the RLS by adding new entries to the method array, where each entry includes a method name and a reference to the code itself.

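The RLS server is written in C and dispatches through a static method array, but the extension mechanism can be illustrated with a small, purely hypothetical dispatch table: supporting a new spatial operation amounts to registering one more name/handler pair. All names below are ours.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.Function;

    // Conceptual sketch of the method-dispatch idea described above; illustrative only.
    public class RpcDispatcher {
        private final Map<String, Function<String[], String>> methods = new HashMap<>();

        public void register(String name, Function<String[], String> impl) {
            methods.put(name, impl);           // adding an entry extends the server
        }

        public String invoke(String name, String[] args) {
            Function<String[], String> m = methods.get(name);
            return (m == null) ? "ERROR: unknown method" : m.apply(args);
        }

        public static void main(String[] args) {
            RpcDispatcher server = new RpcDispatcher();
            // A new spatial query method is just another entry in the table.
            server.register("spatial_query", a -> "replicas intersecting " + String.join(" ", a));
            System.out.println(server.invoke("spatial_query",
                    new String[]{"50", "50", "50", "100", "100", "100"}));
        }
    }
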
B. Modifying The RLS

The current work extends the RLS in Globus Toolkit 5.0.3 for spatial replica selection in several ways. First, we added a new table to the backend database, allowing all tree nodes to be stored in a single database table. Second, we added spatial replica query and insertion methods to the RPC method array on the server, to service the spatial requests from the clients. Third, we implemented new methods on the server to communicate with its backend database, where query aggregation is used. Fourth, we added the spatial replica query and insertion functionality to the RLS client tool, client C API and Java API of the existing RLS.

C. The MAQR-tree

To distinguish it from the previous GTR-tree implementation of an SRLS, we call our new work the MAQR-tree. Two critical factors influence tree performance. First, we consider how to select sets of MBRs to be siblings of a single parent, sometimes called clustering [19]. Ideally, sibling nodes would be nearby in space, and also stored close together in the underlying database to improve I/O performance. The space-filling curve is a useful tool for ordering child MBRs in a manner that addresses these concerns. The other critical factor is how to efficiently represent the tree structure in the RLS database. Reducing the storage volume for the tree representation and improving access speed are both important goals here.

1) Morton R-tree: To construct the Morton R-tree, all spatial replicas are first sorted according to the Morton values of their centers. Then we construct the R-tree in a bottom-up manner based on the specified fanout and the out degree of the tree node [27]. Replicas whose MBRs are associated with adjacent Morton values will be clustered into the same tree node. We evaluate the MAQR-tree performance with and without the Morton re-ordering in section VI.

2) New Tree Node Representation: We added a new table to the RLS backend database to store the metadata of all spatial replicas. Figure 2 describes the representation of the MAQR-tree, in which columns x_min, y_min, x_max and y_max describe the MBR of replicas.

[Figure 2 consists of three panels: (a) the layout of 2D spatial replicas, with the partial spatial replicas (leaves of the R-tree) at the far left and the root node at the far right; (b) an R-tree with fanout (maximum out degree) of 6 and out degree of 3 built for the replicas in (a), with leaves 7, 8, 9, 13, 14, 15, 19, 20, 21 under internal nodes 1, 2, 3 and root 0; and (c) the MAQR-tree node representation for the tree in (b), one row per node with columns Node_id (int), num_child (int), x_min, y_min, x_max, y_max (int) and address (varchar), e.g. node 0 with 3 children, MBR (0,0)-(4,6) and a null address, and leaf node 21 with MBR (1,1)-(2,6) and address "gsiftp://node1..../d8.bin".]
Figure 2. An MAQR-tree example on a single grid node. The number in each tree node indicates the tree node id.

Given a current node id pid, the tree fanout f, and out degree d, we can compute all its child ids using:

    first_child_id = pid × f + 1                          (1a)
    last_child_id = first_child_id + f − 1                (1b)
    last_occupied_child_id = first_child_id + d − 1       (1c)

Also, given a child id cid, we can calculate its parent id pid by:

    pid = ⌊(cid − 1) / f⌋                                 (2)

To construct an R-tree with n spatial replicas, the MAQR-tree node ids on each level can be determined by using equation 1 and the tree height H. During tree construction, note that only d replicas are clustered into the same tree node; the remaining (f − d) unused node ids and all their descendant ids are reserved for tree updates, with d ≤ f.

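Equations (1) and (2) make the parent/child relationship purely arithmetic, so no child or parent ids need to be stored. The small sketch below (method names are ours) implements these computations and checks them against the fanout-6, out-degree-3 example of figure 2.

    // Sketch of the implicit MAQR-tree node addressing from equations (1) and (2).
    // f is the fanout and d the out degree, with d <= f.
    public final class NodeIds {

        public static long firstChildId(long pid, long f) {                 // equation (1a)
            return pid * f + 1;
        }

        public static long lastChildId(long pid, long f) {                  // equation (1b)
            return firstChildId(pid, f) + f - 1;
        }

        public static long lastOccupiedChildId(long pid, long f, long d) {  // equation (1c)
            return firstChildId(pid, f) + d - 1;
        }

        public static long parentId(long cid, long f) {                     // equation (2)
            return (cid - 1) / f;   // integer division gives the floor for cid >= 1
        }

        public static void main(String[] args) {
            long f = 6, d = 3;                                // values used in figure 2
            System.out.println(firstChildId(0, f));           // 1
            System.out.println(lastOccupiedChildId(0, f, d)); // 3
            System.out.println(parentId(21, f));              // 3 (node 21 is a child of node 3)
        }
    }
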

Table I
EXPERIMENTAL GRID NODE CHARACTERISTICS

    OS                 2.6.18
    Processor          Intel Xeon SE 2.40GHz
    Cores              16
    Memory             24G
    Java version       1.6.0_23
    Globus version     v5.0.3
    unixODBC version   v2.2.11
    MySQL database     v5.0.77
    MyODBC library     v3.51.26

Table II
SPATIAL EXTENT OF REPRESENTATIVE QUERIES

    3D Query   MBR (String)
    Q1         "1400 1400 1400 1400 1400 1400"
    Q2         "1500 1500 1600 1510 1520 1640"
    Q3         "50 50 50 100 100 100"
    Q4         "100 100 100 200 200 200"
    Q5         "200 200 200 400 400 400"
    Q6         "800 900 1000 1000 1100 1400"
    Q7         "1024 1024 1024 1324 1324 1324"
    Q8         "1600 1600 1600 2048 2048 2048"
    Q9         "900 900 1000 1400 1100 1400"

Table III
NUMBER OF REPLICAS INTERSECTED WITH QUERY IN OUR GRID

              Q1     Q2     Q3     Q4     Q5      Q6      Q7      Q8      Q9
    Average   1995   2578   1288   3405   10667   15990   20181   22351   26523
    Maximum   2063   2667   1350   3534   10865   16221   20484   22493   26822
    Minimum   1911   2518   1243   3307   10527   15786   19976   22193   26347

Table IV
ADVANTAGE OF MAQR-TREE QUERY OVER PURE RELATIONAL

                                   Data1     Data2     Data3     Data4
    Total Number (N)               1000000   1000000   1000000   1000000
    Average number returned (AR)   11645     4649      1816      2642
    Average speedup (AS)           1.9       4.36      7.75      8.13


3) Advantages of The New Representation: There are several advantages associated with the new tree representation, relative to our previous work and other designs.

• Node ID Representation: We use an integer to represent the node id. In the underlying database, performance is improved when tree nodes are retrieved using integer keys, especially after a relational database index has been constructed on the column Node_id.
• Table Simplification: Metadata of spatial replicas is stored in a single table, and no relational join is performed during a query.
• No explicit representation of child and parent ids: The storage size of a tree node is dramatically decreased, resulting in a more compact backend database, because no stored reference to the children is required. Instead we can calculate all child ids on the fly using equation 1.
• Arbitrarily large fanout values: The fanout of the MAQR-tree is no longer limited by the size of a table column. We can create a MAQR-tree with a fanout of more than one thousand, for instance. Although a large number of node ids are reserved in the tree, the storage utilization is not influenced by the tree fanout.
• Query Aggregation: Database queries can be aggregated into a single database transaction using a range of tree node ids.
• Support for insertion and deletion: The new MAQR-tree is not a Packed R-tree. Instead it is dynamic, and allows insertion and deletion of replicas in the tree.

Because the MAQR-tree is stored in the RLS's backend relational database, we are able to take advantage of the provided functionality. In addition to the convenient data access the RDBMS provides, we can also use the atomic transaction mechanism to maintain consistency when the server is simultaneously processing both queries and updates.

D. MAQR-Tree Traversal And Query Aggregation

We use a linked list queue to implement the MAQR-tree Breadth-first Search. Each queue element stores the integer node id of a tree node whose MBR intersects the query MBR. To reduce the latency cost associated with each replica query, we aggregate a group of replica reads into one larger read when the RLS server retrieves information from the backend database. We call this technique query aggregation.

Under the MAQR-tree representation, during the tree traversal, a tree node id is first dequeued and all its child information is retrieved by an I/O module. The I/O module next computes the first child id and the last child id using equation 1. Then it performs a range query on the table column Node_id, and all child information is retrieved in one transaction. We constructed a relational index on the table column Node_id, so the range query on that column is fast.

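A compact sketch of this traversal is shown below. It reuses the MBR class sketched in section III; the NodeStore interface and fetchChildren method are hypothetical stand-ins for the I/O module, which issues a single range query over Node_id so that all children arrive in one transaction. Without aggregation, each child would cost a separate round trip to the RDBMS; the range query amortizes that overhead, which is what section VI-C measures.

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Queue;

    // Sketch of the breadth-first MAQR-tree traversal with query aggregation.
    // TreeNode, NodeStore and fetchChildren() stand in for the paper's I/O module,
    // which issues one range query such as
    //   SELECT * FROM <tree table> WHERE Node_id BETWEEN first AND last;
    public class MaqrTraversal {

        interface NodeStore {
            /** Return all existing tree nodes whose id lies in [firstId, lastId]. */
            List<TreeNode> fetchChildren(long firstId, long lastId);
        }

        static class TreeNode {
            long id;
            int numChild;      // 0 for a leaf (a partial replica)
            MBR mbr;           // MBR class sketched in section III
            String address;    // replica address for leaves, null for internal nodes
        }

        /** Return the addresses of all replicas whose MBR intersects the query MBR. */
        static List<String> search(NodeStore store, long fanout, long rootId, MBR query) {
            List<String> hits = new ArrayList<>();
            Queue<Long> queue = new ArrayDeque<>();   // ids of nodes still to be expanded
            queue.add(rootId);
            while (!queue.isEmpty()) {
                long pid = queue.poll();
                long first = pid * fanout + 1;        // equation (1a)
                long last = first + fanout - 1;       // equation (1b)
                // One aggregated range query retrieves every existing child at once.
                for (TreeNode child : store.fetchChildren(first, last)) {
                    if (!child.mbr.intersects(query)) continue;        // prune this subtree
                    if (child.numChild == 0) hits.add(child.address);  // leaf: report replica
                    else queue.add(child.id);
                }
            }
            return hits;
        }
    }
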
VI. EVALUATION

The Distributed Research Testbed (DiRT) is a multi-site instrument for developing and evaluating the performance of distributed software. We constructed a fifteen-node grid on DiRT using a version of Globus Toolkit 5.0.3 modified to include our spatial replica location service. In the grid, eight nodes are located at the University of Mississippi, while seven of them are located at the University of Florida. The characteristics of each grid node are described in table I. The proposed service uses the unixODBC manager and the MyODBC driver to communicate with the MySQL server.

We used independent datasets on each grid node. One million 3D replicas were generated on each grid node using two separate Java Random objects. One object randomly generates the center point of the 3D rectangle in the entire dataset domain (a 2048³ cube). Another Random object randomly generates the length of the replica along each axis, which is bounded in the range of 1 to P. The following tests use P = 512, unless otherwise noted.

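The sketch below illustrates this generation procedure. The seeds, class name, and the exact clamping of replicas to the domain boundary are our assumptions, not taken from the paper's test code.

    import java.util.Random;

    // Sketch of the synthetic replica generation described above: one Random draws
    // the center of each 3D replica inside the 2048^3 domain, a second Random draws
    // the edge length along each axis, bounded between 1 and P.
    public class ReplicaGenerator {
        private static final int DOMAIN = 2048;

        public static void main(String[] args) {
            int p = 512;                       // maximum edge length, P
            Random centers = new Random(17);   // arbitrary seeds for the sketch
            Random lengths = new Random(42);

            for (int i = 0; i < 5; i++) {      // the paper generates one million per grid node
                int[] min = new int[3];
                int[] max = new int[3];
                for (int d = 0; d < 3; d++) {
                    int center = centers.nextInt(DOMAIN);
                    int half = (1 + lengths.nextInt(p)) / 2;
                    min[d] = Math.max(0, center - half);
                    max[d] = Math.min(DOMAIN - 1, center + half);
                }
                System.out.printf("replica %d: (%d,%d,%d)-(%d,%d,%d)%n",
                        i, min[0], min[1], min[2], max[0], max[1], max[2]);
            }
        }
    }
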
The same nine representative spatial queries used in [12] were also used for testing in this paper. Queries include one point query and eight rectangular queries of various sizes. The extent of these nine spatial queries, relative to the entire dataset (a 2048³ cube), is shown in table II. For each query, table III shows the average number (per grid node) of replicas that intersect that query, and the maximum and minimum number of replicas returned in the grid. All of the following results were collected as an average of five runs to reduce the effect of filesystem caching, network behavior and other variables.

When we tested the performance of various MAQR-tree fanout and out degree values, we found that a MAQR-tree with an out degree of 50 yields the best performance on average. In the following tests, we used the MAQR-tree with an out degree of 50 and a fanout of 70.

A. Experimental Verification

We performed a series of experiments to verify the usefulness of the MAQR-tree and the effectiveness of the tree representation in the database.

We verified that an R-tree in a relational database is useful for spatial metadata by comparing MAQR-tree performance with a non-tree implementation that relies heavily on the underlying relational database. For this pure relational implementation, we used the database schema shown in figure 3. Here, MBR coordinates are stored as separate integer fields in the database, making them directly available in SQL queries. Figure 4 shows an SQL implementation of a spatial intersection query using this schema. Indexes were built for all fields.

    Column Name:  id    x_min   y_min   x_max   y_max   address
    Type:         int   int     int     int     int     varchar
Figure 3. The schema used for the pure relational implementation; components of the MBR are stored as integers.

    select address from t_spatial_rep where
      (not ((x_min > qxmin and x_min > qxmax) or x_max < qxmin)
       and not ((y_min > qymin and y_min > qymax) or y_max < qymin));
Figure 4. SQL statement for conducting spatial queries without a MAQR-tree. Used with the schema shown in figure 3, this SQL statement can retrieve replicas that intersect with a bounding rectangle {qxmin, qymin, qxmax, qymax}. Performance was found to be worse than the MAQR-Tree on average.

To carefully examine the advantage of the MAQR-tree over the pure relational implementation, we performed the comparison on four different datasets. We generated these datasets randomly as described before, but with different values for P, which determines the maximum size of replicas. This allows us to control the average number of replicas that intersect the query region in the different datasets. The advantage of the MAQR-tree query over the pure relational query is presented in table IV.

In table IV, the row "AR" gives the average number of replicas returned per representative query on each dataset. The row "AS" is calculated by averaging, over the queries, the ratio of MAQR-tree query performance to that of the pure relational implementation. We observed that the MAQR-tree query can be up to 8.13 times quicker than the pure relational implementation.

The reason for this advantage is very likely the multi-dimensional nature of the MAQR-tree. For the pure relational implementation, even if an index can be used to quickly find the set of replicas with appropriate x values, a y index cannot help with pruning this set further because it contains all the replicas in the database. A time consuming exhaustive search is therefore necessary. In contrast, the MAQR-tree query is able to drastically prune the set of replicas under consideration as it encounters multidimensional MBRs at each level of the tree, improving performance enormously. Table IV also shows that the MAQR-tree has the greatest advantage when the number of replicas returned is smaller. These more selective queries exhibit more aggressive pruning while traversing the tree, which enhances performance.

We also tested the insertion cost with the two implementations. The pure relational implementation relies entirely on the underlying database to support insertion. For the MAQR-tree, we must perform several steps in order to insert a replica. Once the proper tree location has been determined, all siblings (sharing the same parent) after this location must increment their node ids by one in order to make space for the insertion. In the rare case that no space is available, a node split operation must be performed [28], taking roughly 100 milliseconds in our tests. Next, the parent and ancestor MBRs are updated to account for the new node.

We tested MAQR-tree insertions with various groups of replica MBRs, including cubic MBRs of size 50³, 100³, 200³, and 300³. We inserted these cube replicas into different regions of the entire domain. It takes an average of 30 milliseconds to insert one replica into a MAQR-tree. The pure relational approach is slightly quicker, taking an average of 18 milliseconds to insert, which is 1.67 times faster than the MAQR-tree. Overall, the MAQR-tree approach is effective for both scientific database and replica selection applications, which involve intensive queries and relatively few updates.

B. Morton R-tree

We evaluated the effect of introducing the Morton Space-filling curve to the MAQR-tree implementation, as described in section V-C1, and found that adding this technique improves performance by more than 30 times on average. The results presented in this paper use the MAQR-tree, but we expect slightly better query performance when applying the Hilbert curve in the R-tree construction. Experiments show that the Hilbert R-tree outperforms the MAQR-tree by 7% on dataset Data1 in table IV.

By associating each MBR with a Morton value, spatially adjacent replicas are mapped to similar Morton values. Therefore, they can be grouped into the same or neighboring parent tree nodes. In this way, the Morton value helps the enclosing MBRs to stay local, resulting in reduced "dead space" and overlap in the tree nodes [28]. In short, the enclosing MBR is more efficiently used, and the constructed R-tree is more compact. This decreases the number of nodes that must be examined for intersection, thereby improving performance.

C. Query Aggregation

We evaluated the benefit of aggregating the queries made to the underlying database, as described in section V-D. Comparing MAQR-tree query performance with and without this feature activated demonstrates an average 15x performance improvement for Query Aggregation. This improvement is largely due to amortizing the overhead associated with each RDBMS query over many child nodes. Tests with cold and warm filesystem caches lead us to believe that it is software costs, rather than disk access costs, that are most directly addressed with this technique.


D. Comparison with the GTR-tree

To make a fair comparison of the MAQR-tree with our previous GTR-tree, we tested on a single grid node using the same dataset. We compared the efficiency of the tree representation and associated techniques by sorting all replicas for both the MAQR-tree and the old GTR-tree with the same out degree. We observed that the performance of the MAQR-tree is on average 24.5 times better than the previous GTR-tree. We attribute these speedups to the advantages of the MAQR-tree over the previous GTR-tree described in sections V-C3 and IV.

E. Multi-client and Distributed Experiments

We wrote a multi-threaded client program using our C API and the pthreads library. We conducted the multi-client tests with the server and clients on the same machine, on which our threaded client program allows us to specify how many client threads are created simultaneously. In the distributed query, we used as many remote machines as possible, submitting queries from a grid node at the University of Mississippi to grid nodes at the University of Florida. The results of the multi-client and distributed queries are shown in figure 5. Figure 5(a) shows that server performance scales nearly linearly as the number of simultaneous client threads increases. In (b) we see that performance also scales well as we increase the number of server nodes and total number of replicas. Each server maintains one million replicas. With fifteen nodes we are handling fifteen million replicas, but execution time is less than five times as long as for one million replicas.

Figure 5. (a) Multi-client experiment of the spatial replica location service: execution time (milliseconds) for queries Q3, Q4 and Q5 versus the number of client threads (1 to 100). (b) Distributed query using one to fifteen grid nodes: execution time (milliseconds) for queries Q3, Q4 and Q5 versus the number of grid nodes. The multi-client test in (a) shows that server performance scales well as the number of simultaneous client threads increases. In (b) we see that performance scales well as we increase the number of server nodes and total number of replicas. In each test, the total number of replicas is the number of nodes multiplied by one million.

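The multi-client test program itself uses the C API with pthreads. Purely for illustration, the same pattern expressed with Java threads might look like the sketch below, where SpatialQueryClient is a hypothetical placeholder for the client API, not a real Globus or MAQR-tree interface.

    import java.util.concurrent.CountDownLatch;

    // Illustrative sketch of the multi-client experiment: N threads submit the same
    // spatial query concurrently and the total elapsed time is recorded.
    public class MultiClientTest {

        interface SpatialQueryClient {
            void query(String mbr);   // e.g. "50 50 50 100 100 100" for Q3
        }

        static long runClients(int numThreads, SpatialQueryClient client, String mbr)
                throws InterruptedException {
            CountDownLatch done = new CountDownLatch(numThreads);
            long start = System.currentTimeMillis();
            for (int i = 0; i < numThreads; i++) {
                new Thread(() -> {
                    client.query(mbr);   // each thread issues one spatial query
                    done.countDown();
                }).start();
            }
            done.await();                // wait for all client threads to finish
            return System.currentTimeMillis() - start;
        }

        public static void main(String[] args) throws InterruptedException {
            SpatialQueryClient stub = mbr -> { /* would call the replica location service */ };
            System.out.println("elapsed ms: " + runClients(100, stub, "50 50 50 100 100 100"));
        }
    }
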

VII. CONCLUSION

Our eventual goal is a system where both data and computation are truly distributed. We envision applications where only a subvolume of a larger dataset is required for a computation, and where that subvolume may be available from various combinations of different sources. If we hope to provide more than simple batch computation for such applications, we must be able to rapidly identify the set of partial replicas that intersect with the required subvolume, and then choose an optimal combination of replicas to read from. This paper presents work that rapidly identifies the set of intersecting replicas in a distributed environment.

Our previous work on this topic [12] solved this problem with an implementation that lay entirely on top of an unmodified Globus Toolkit. This approach has the advantage of easy deployment using existing software infrastructure, but must also pay a performance penalty. The work described here examined the performance benefits of an approach that modifies the Globus source code to provide a more efficient implementation. We also provided important new functionality in the form of insertion and update operations.

With regard to the internal implementation of the R-tree data structure and associated operations, we have also added several improvements. Ordering tree node children according to the Morton curve was very effective, as was Query Aggregation. The net result is performance roughly 24.5 times better than the GTR-tree implementation. Experiments show that a single server with one million replicas in its database can scale up to 100 client threads when handling large spatial queries.

There are several avenues for future research. We may continue our optimization efforts by replacing the database with our own storage module, giving greater control over disk behavior. The problem of selecting an optimal combination of partial replicas must also be addressed, along with the development of suitable performance metrics. Lastly, we entertain the possibility of using GPUs to not only support user computation directly, but to also assist in computation intensive selection and management tasks.

ACKNOWLEDGMENT

This work was supported by the National Science Foundation under grants CCF-0541239 and CRI-0855136.

REFERENCES

[1] S. Vazhkudai, S. Tuecke, and I. Foster, "Replica selection in the Globus data grid," in CCGrid. IEEE Computer Society, 2001, pp. 106–113.

[2] A. Chervenak, E. Deelman, I. Foster, L. Guy, W. Hoschek, A. Iamnitchi, C. Kesselman, P. Kunszt, M. Ripeanu, B. Schwartzkopf, H. Stockinger, K. Stockinger, and B. Tierney, "Giggle: a framework for constructing scalable replica location services," in Proceedings of the 2002 ACM/IEEE Conference on Supercomputing. Baltimore, Maryland: IEEE Computer Society Press, 2002, pp. 1–17.

[3] M. Cai, A. Chervenak, and M. Frank, "A peer-to-peer replica location service based on a distributed hash table," in Proceedings of the 2004 ACM/IEEE Conference on Supercomputing. IEEE Computer Society, 2004, p. 56.

[4] Globus.org, "Globus Toolkit 5.0.3 RLS User's Guide." [Online]. Available: http://www.globus.org/toolkit/docs/5.0/5.0.3/data/rls/user/#rlsUser

[5] A. Chervenak, R. Schuler, M. Ripeanu, A. Amer, S. Bharathi, I. Foster, A. Iamnitchi, and C. Kesselman, "The Globus replica location service: design and experience," IEEE Transactions on Parallel and Distributed Systems, vol. 20, no. 9, pp. 1260–1272, 2009.

[6] W. Peng-fei, F. Yu, C. Bin, and W. Xi, "GloSDC: A framework for a global spatial data catalog," in Geoscience and Remote Sensing Symposium, 2008 (IGARSS 2008), IEEE International, vol. 2. IEEE, 2009.

[7] Y. Wei, L. Di, B. Zhao, G. Liao, A. Chen, Y. Bai, and Y. Liu, "The design and implementation of a grid-enabled catalogue service," in International Geoscience and Remote Sensing Symposium, vol. 6, 2005, p. 4224.

[8] L. Di, A. Chen, W. Yang, Y. Liu, Y. Wei, P. Mehrotra, C. Hu, and D. Williams, "The development of a geospatial data grid by integrating OGC web services with Globus-based grid technology," Concurrency and Computation: Practice and Experience, vol. 20, no. 14, pp. 1617–1635, 2008.


[9] P. J. Rhodes, X. Tang, R. D. Bergeron, and T. M. Sparr, "Iteration aware prefetching for large multidimensional scientific datasets," in SSDBM 2005: Proceedings of the 17th International Conference on Scientific and Statistical Database Management. Berkeley, CA, USA: Lawrence Berkeley Laboratory, 2005, pp. 45–54.

[10] P. J. Rhodes and S. Ramakrishnan, "Iteration aware prefetching for remote data access," in Proceedings of the 1st International Conference on e-Science and Grid Computing, H. Stockinger, R. Buyya, and R. Perrott, Eds., 2005, pp. 279–286.

[11] Y. Gu and R. Grossman, "UDT: UDP-based data transfer for high-speed wide area networks," Computer Networks, vol. 51, no. 7, pp. 1777–1799, 2007.

[12] Y. Tian and P. J. Rhodes, "The Globus Toolkit R-tree for partial spatial replica selection," in Proceedings of the 2010 11th IEEE/ACM International Conference on Grid Computing. Brussels, Belgium: IEEE, Oct. 2010, pp. 169–176.

[13] A. Guttman, "R-trees: A dynamic index structure for spatial searching," in SIGMOD '84: Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data. New York, NY, USA: ACM, 1984, pp. 47–57.

[14] PostGIS webpage. [Online]. Available: http://postgis.refractions.net/

[15] S. Narayanan, T. Kurc, U. Catalyurek, and J. Saltz, "Database support for data-driven scientific applications in the grid," Parallel Processing Letters, vol. 13, no. 2, pp. 245–272, 2003.

[16] S. Narayanan, U. Catalyurek, T. Kurc, V. Kumar, and J. Saltz, "A runtime framework for partial replication and its application for on-demand data exploration," in High Performance Computing Symposium (HPC 2005), SCS Spring Simulation Multiconference, 2005.

[17] L. Weng, U. Catalyurek, T. Kurc, G. Agrawal, and J. Saltz, "Servicing range queries on multidimensional datasets with partial replicas," in Cluster Computing and the Grid, 2005 (CCGrid 2005), IEEE International Symposium on, vol. 2. IEEE, 2005, pp. 726–733.


[18] F. Ramsak, V. Markl, R. Fenk, M. Zirkel, K. Elhardt, and R. Bayer, "Integrating the UB-tree into a database system kernel," in VLDB 2000: Proceedings of the 26th International Conference on Very Large Data Bases, Cairo, Egypt, A. E. Abbadi, M. L. Brodie, S. Chakravarthy, U. Dayal, N. Kamel, G. Schlageter, and K.-Y. Whang, Eds. Morgan Kaufmann, 2000, pp. 263–272.

[19] I. Kamel and C. Faloutsos, "Hilbert R-tree: An improved R-tree using fractals," in Proceedings of the International Conference on Very Large Databases, 1994, pp. 500–509.

[20] J. L. Bentley, "Multidimensional binary search trees used for associative searching," Communications of the ACM, vol. 18, pp. 509–517, September 1975.

[21] S. Berchtold, D. Keim, and H. Kriegel, "The X-tree: An index structure for high-dimensional data," Readings in Multimedia Computing and Networking, vol. 12, p. 451, 2002.

[22] V. Gaede and O. Günther, "Multidimensional access methods," ACM Computing Surveys (CSUR), vol. 30, no. 2, pp. 170–231, 1998.

[23] C. Faloutsos and S. Roseman, "Fractals for secondary key retrieval," in Proceedings of the Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM, 1989, pp. 247–252.

[24] Y. Fang, M. Friedman, G. Nair, M. Rys, and A. Schmid, "Spatial indexing in Microsoft SQL Server 2008," in Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, 2008, pp. 1207–1216.

[25] P. Ganesan, B. Yang, and H. Garcia-Molina, "One torus to rule them all: multi-dimensional queries in P2P systems," in Proceedings of the 7th International Workshop on the Web and Databases (WebDB '04). New York, NY, USA: ACM, 2004, pp. 19–24.

[26] Globus.org, "Replica Location Service RPC Protocol Description," 2003. [Online]. Available: http://www.globus.org/toolkit/docs/5.0/5.0.3/data/rls/developer/rpcprotocol.pdf

[27] S. Leutenegger, M. Lopez, and J. Edgington, "STR: A simple and efficient algorithm for R-tree packing," in Data Engineering, 1997: Proceedings of the 13th International Conference on, 1997, pp. 497–506.


[28] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, "The R*-tree: an efficient and robust access method for points and rectangles," ACM SIGMOD Record, vol. 19, no. 2, pp. 322–331, 1990.