The RLR-Tree: A Reinforcement Learning Based R-Tree for Spatial Data

Tu Gu1, Kaiyu Feng1, Gao Cong1, Cheng Long1, Zheng Wang1, Sheng Wang2
1School of Computer Science and Engineering, Nanyang Technological University, Singapore
2DAMO Academy, Alibaba Group
[email protected], {kyfeng,gaocong,c.long,wang_zheng}@ntu.edu.sg, [email protected]

ABSTRACT

Learned indices have been proposed to replace classic index structures like the B-Tree with machine learning (ML) models. They require replacing both the indices and the query processing algorithms currently deployed by databases, and such a radical departure is likely to encounter challenges and obstacles. In contrast, we propose a fundamentally different way of using ML techniques to improve the query performance of the classic R-Tree without changing its structure or its query processing algorithms. Specifically, we develop reinforcement learning (RL) based models to decide how to choose a subtree for insertion and how to split a node, instead of relying on hand-crafted heuristic rules as the R-Tree and its variants do. Experiments on real and synthetic datasets with up to 100 million spatial objects clearly show that our RL based index outperforms the R-Tree and its variants.

PVLDB Reference Format:
Tu Gu, Kaiyu Feng, Gao Cong, Cheng Long, Zheng Wang, Sheng Wang. The RLR-Tree: A Reinforcement Learning Based R-Tree for Spatial Data. PVLDB, 14(1): XXX-XXX, 2020.
doi:XX.XX/XXX.XX

1 INTRODUCTION

To support efficient processing of spatial queries, such as range queries and KNN queries, for application scenarios such as Google Maps and Pokémon Go with hundreds of millions of objects and a large number of user queries, spatial databases have for decades relied on carefully designed indexes. The R-Tree [15] is arguably the most popular spatial index that prunes irrelevant data for queries. R-Trees have attracted extensive research interest [2–4, 6, 8, 13, 17, 18, 23, 24, 27, 29, 31–33, 35, 37, 40, 42] and are widely used in popular databases such as PostgreSQL and MySQL.

The learned index has been proposed in [21], which proposes a recursive model index (RMI) for indexing 1-dimensional data by learning a cumulative distribution function (CDF) to map a search key to a rank in a list of ordered data objects. To address the limitations of the RMI, such as the lack of support for updates, several learned indices have been proposed based on the RMI. The idea of learned indices has also been extended to spatial data [24, 31, 40] and multi-dimensional data [6, 8, 27]. These indices usually map spatial data points to a uniform rank space (e.g., using a space filling curve), and then learn the CDF for a particular dataset. Despite the success of these learned indices in improving the performance of some types of queries, they still have various limitations, e.g., they can only handle spatial point objects and limited types of spatial queries, some only return approximate query results, and they either cannot handle updates or need a periodic rebuild to retain high query efficiency (detailed discussions are in Section 2). These limitations, together with the requirement that learned indices replace the index structures and query processing algorithms currently used by database systems, make it less likely for learned indices to be deployed immediately in current database systems.

In this work, we consider a fundamentally different way of using machine learning to improve the R-Tree, rather than learning a CDF for spatial data. We aim to use machine learning techniques to construct an R-Tree in a data-driven way for better query efficiency. Specifically, instead of relying on hand-crafted heuristic rules for the ChooseSubtree and Split operations, which are two key operations for building an R-Tree, we propose to learn from data and train machine learning models for the two operations. Note that we do not need to modify the basic structure of the R-Tree, and thus all currently deployed query processing algorithms remain applicable to our proposed index. This would make it easier for the learning based index to be deployed by current databases.

To motivate our idea, we next revisit the two key operations ChooseSubtree and Split. When inserting a new spatial object, an R-Tree needs to invoke the ChooseSubtree operation iteratively, i.e., choosing the child node into which to insert the new data object, until a leaf node is reached. If the number of entries in a node reaches the capacity, the Split operation is invoked to divide the entries into two groups. Many R-Tree variants have been proposed, which mainly differ in their strategies for the insertion of new objects (i.e., ChooseSubtree) as well as their algorithms for splitting a node (i.e., Split). Almost all these strategies are based on hand-crafted heuristics. However, there is no single heuristic strategy that dominates all the others in terms of query performance. To illustrate this, we generate a dataset with 1 million uniformly distributed data points and construct four R-Tree variants using four different Split strategies, namely linear [15], quadratic [15], Greene's [14] and R*-Tree [3]. We run 1,000 random range queries and rank the four indices based on the query processing time of each individual query. We observe that no single index has the best performance for all the queries. For example, Greene's Split has the best query performance for 50% of the queries while R*-Tree Split is the best for 49% of the queries. The observation that no single index dominates the others holds on datasets with other distributions and on real-life datasets as well, although the top performers may differ across datasets.

This observation motivates us to develop machine learning models to decide how to choose a subtree for insertion and how to split a node, instead of relying on hand-crafted heuristic rules. Furthermore, we observe that the ChooseSubtree and Split operations can be considered as two sequential decision making problems. Therefore, we model them as two Markov decision processes (MDPs) [30] and propose to use RL to learn models for the two operations.

However, it is a very challenging journey to make the idea work. The first challenge is how to formulate ChooseSubtree and Split as MDPs appropriately: how should we define the states, actions, and reward signals for each MDP? Specifically, 1) Designing the action space. A naive idea could be to define an action as one of the existing heuristic strategies. However, this idea does not work: we observed from our experiments that different strategies would often make the same decision, which leaves little room for improvement. Other possible ideas include defining larger action spaces, but this would increase the difficulty of training the model. 2) Designing the state. As the number of entries varies across different R-Tree nodes, it is challenging to represent the state of a node for both operations. 3) Designing the reward signal. It is nontrivial to design a function that evaluates the reward of past actions during model training so as to encourage the RL agent to take "good" actions.

The second challenge is how to use RL to address the defined MDPs. For instance, in the construction of an R-Tree, node overflow (and thus the Split operation) occurs less frequently than the ChooseSubtree operation. Therefore, only a few state transitions for Split operations are generated, making it difficult for the RL agent to learn useful information. Moreover, a previous decision (ChooseSubtree or Split) may affect the structure of the R-Tree in the future. Therefore, a "good" action may receive a bad reward due to some bad actions that were made previously. This makes it even more challenging to learn a good policy to solve the two MDPs.

Our method. To this end, we propose the RL based R-Tree, namely the RLR-Tree. In the RLR-Tree, we carefully model the MDPs of the operations ChooseSubtree and Split, and propose a novel RL-based method to learn policies to solve them. The learned policies replace the hand-crafted heuristic rules that are commonly used for building R-Trees. The RLR-Tree possesses several salient features: (1) Its models are trained on a small sample of a given dataset and scale successfully to datasets of very large scale. Furthermore, models trained on one data distribution can even be applied to data with a different distribution and achieve query performance improvements of up to 66% compared to the R-Tree. (2) It achieves up to 95% better query performance than the R-Tree and its variants built using one-by-one insertion methods, i.e., R-Trees that are designed for a dynamic environment. (3) Like the R-Tree, the RLR-Tree can handle spatial objects of different types, such as points or rectangles, and all query processing algorithms for R-Trees remain applicable. In contrast, to the best of our knowledge, all existing learned indices [6, 8, 24, 27, 31, 40] can only handle point objects when used for spatial data, and they only focus on limited types of queries, e.g., range queries, as they need a new algorithm for each type of query.

In summary, we make the following contributions:

(1) We propose to train machine learning models to replace heuristic rules in the construction of an R-Tree to improve its query efficiency. To the best of our knowledge, this is the first work that uses machine learning to improve the R-Tree without modifying its structure, so that all currently deployed query processing algorithms remain applicable and the proposed index can be easily deployed by current databases.

(2) We model the operations of how to choose a subtree for insertion and how to split a node as two Markov decision processes, and carefully design their states, actions and reward signals. We also present some of our unsuccessful design attempts.

(3) We design an effective and efficient learning process that learns good policies to solve the MDPs. The learning process enables us to apply our RL models trained with a small dataset to build an R-Tree for up to 100 million spatial objects.

(4) We conduct extensive experiments on both real and synthetic datasets. The experimental results show that our proposed index achieves up to 95% better query performance for range queries and 96% for KNN queries than the R-Tree and its variants built using one-by-one insertion methods.

2 RELATED WORK

2.1 Spatial Indices

Spatial indices are generally classified into three categories, namely data partitioning based, space partitioning based, and mapping based. We further divide the data partitioning based methods into "R-Tree and its variants" and "packing R-Tree".

Data partitioning based indices: R-Tree and its variants. This category of indices partitions the dataset into subsets and indexes the subsets. Typical examples include the R-Tree and its variants, such as the R*-Tree [3], R+-Tree [37], and RR*-Tree [4]. An R-Tree is usually built through a one-by-one insertion approach, and is maintained by dynamic updates. When a spatial object is inserted, the ChooseSubtree operation is based on some heuristic rule, such as minimizing the minimum bounding rectangle (MBR) area enlargement, minimizing the MBR perimeter increment, or minimizing the MBR overlap increment among all child nodes of the current node. A node is split based on some heuristic rule when it contains more entries than its capacity. As discussed in Section 1, none of these heuristic rules has dominant query performance, either for ChooseSubtree or for Split. This partly motivates us to design learning-based solutions for the ChooseSubtree and Split operations in this work.

There exist R-Tree variants that exploit the query workload or special properties of the data. For example, QUASII [29] and CUR-Tree [33] utilize the query workload to build an R-Tree. In [18], an R-Tree index is built for datasets containing a small number of large spatial objects by locating them near the root node. These extensions are orthogonal to our work.

Data partitioning based indices: Packing. An alternative way of building the R-Tree is to pack data points into leaf nodes and then build an R-Tree bottom up. The original R-Tree and almost all of its variants are designed for a dynamic environment, being able to handle insertions and deletions, while packing methods are for static databases [17]. Packing needs to have all the data before building the index, which is not always available in many situations. The existing packing methods explore different ways of sorting spatial objects to achieve a better ordering of the objects and eventually better packing. Some ordering methods [2] are based on the coordinates of objects, such as STR [23], TGS [13] and the lowx packed R-Tree [35], while other ordering methods are based on space filling curves such as z-ordering [28, 32], Gray coding [9] and the Hilbert curve [10].

Other spatial indices. In space partitioning based indices, such as the kd-Tree [5] and Quad-Tree [12], the space is recursively partitioned until the number of spatial objects in a partition reaches a threshold for an index node. In mapping based indices, the spatial dimensions are transformed to a 1-dimensional space based on a space filling curve, and the data objects can then be ordered sequentially and indexed by a B+-Tree. This design is supported in Microsoft SQL Server.

2.2 Learned Indices

Learned one-dimensional indices. The idea behind a learned index is to learn a function that maps a search key to the storage address of a data object. The idea of learned indices was first introduced by [21], which proposes the Recursive Model Index (RMI) to learn the data distribution in a hierarchical manner. It essentially learns a cumulative distribution function (CDF) using a neural network to predict the rank of a search key. The idea inspired a number of follow-up studies [7, 11, 19] on learned indices.

Learned spatial indices. Inspired by the idea of learned one-dimensional indices, several learned spatial indices have been proposed. The Z-order model [40] extends the RMI to spatial data by using a space filling curve to order data points and then learning the CDF to map the key of a data point to its rank. The recursive spatial model index (RSMI) [31] further develops the Z-order idea [40] and the RMI. RSMI first maps the data points to a rank space using the rank space-based transformation technique [32]. Then, RSMI employs a multi-dimensional data partitioning technique to partition the space in a hierarchical manner, and an index is learned for each partition. LISA [24] is a disk-based learned index that achieves low storage consumption and good query performance. It partitions the data space with a grid, numbers the grid cells, and learns a data distribution based on this numbering. Similar to the Z-order model [40] and RSMI [31], Flood [27] also maps a dataset to a uniform rank space before learning a CDF. Differently, it utilizes the workload to optimize the learning of the CDF, and it learns the CDF of each dimension separately. Tsunami [8] extends Flood to better utilize the workload, overcoming Flood's limitations in handling skewed workloads and correlated data. The ML-Index [6] generalizes the idea of the iDistance scaling method [16] to map point objects to a one-dimensional space and then learns the CDF.

Remark. These learned indices all aim to learn a CDF for a particular dataset to replace the traditional indices. The idea of our proposed RLR-Tree is fundamentally different: instead of learning any CDF, we aim to train RL models to learn effective policies for the ChooseSubtree and Split operations of the R-Tree index. Furthermore, these learned indices have the following limitations compared with our solution. First, they can only handle point objects when used for spatial data. In contrast, our proposed method is able to handle any spatial data, such as rectangular objects. Second, they all need customized algorithms to handle each type of query. They focus on certain types of queries, and it is not clear how they can process other types of queries. For example, some of them [6, 8, 27] do not consider KNN queries, an important type of spatial query; some learned indices [24] extend their range query algorithm to handle KNN queries by issuing a series of range queries until 푘 points are found, but the query performance largely depends on the size of the region used. In contrast, the RLR-Tree simply uses the existing R-Tree query processing algorithms to handle different types of queries. Third, some of these learned indices [31, 40] return approximate query results with bounded errors, whereas our solution returns accurate results for all types of queries. Fourth, both Flood and Tsunami need the query workload as input, while we do not assume the query workload to be known. Finally, updates are not discussed for Flood, Tsunami or the ML-Index. Although RSMI [31] and LISA [24] can handle updates, their models have to be retrained periodically to retain good query performance. In contrast, our proposed RLR-Tree readily handles updates without the need to keep retraining the models.

2.3 Applications of Reinforcement Learning

To the best of our knowledge, no existing work uses RL to improve the R-Tree index or its variants. Different from typical applications of RL where there is only one type of agent, our problem has two different intermingled operations, ChooseSubtree and Split. In this sense, our work is related to multi-agent reinforcement learning (MARL), such as MARL for the prisoner's dilemma [1], collaborative MARL [38], and traffic control [22].

RL has been successfully applied to database applications, such as database tuning tasks [25, 39, 44], similarity search [41], join order selection [43], index selection [36], and QD-Tree for data partitioning [42]. In particular, QD-Tree utilizes a query workload to improve the kd-Tree: an RL model is trained to learn a policy that makes partitioning decisions to minimize the number of disk-based block accesses for a given query workload. It depends on a workload and cannot handle dynamic data updates. Compared with these RL applications, our RL setting has two types of intermingled operations, and it is more challenging to design the actions, states and reward functions.

3 PRELIMINARY AND PROBLEM

3.1 Preliminary

An R-Tree [15] is a balanced tree for indexing multi-dimensional objects, such as coordinates, rectangles, and polygons. Each tree node can contain at most 푀 entries, which is referred to as the capacity. Each node (except for the root node) must also contain at least 푚 entries. Each entry in a non-leaf node consists of a reference to a child node and the minimum bounding rectangle (MBR) of all entries within this child node. Each leaf node contains entries, each of which consists of a reference to an object and the MBR of that object. Therefore, a query that does not intersect the bounding rectangle of a node cannot intersect any of the objects contained in it.

To insert an object into an R-Tree, the tree is traversed with two important operations: ChooseSubtree and Split. Starting from the root node, ChooseSubtree is iteratively invoked to decide in which subtree to insert a new data object, until a leaf node is reached. The data object is inserted into the leaf node and its corresponding MBR is updated accordingly. If the number of entries in a leaf node exceeds the capacity, the Split operation is invoked to divide the objects into two groups: one remains in the original leaf node and the other becomes a new leaf node. The Split operation may need to be propagated upwards: an entry referring to the new leaf node is added to the parent node, which may overflow and need to be split recursively.

The query performance of an R-Tree is highly dependent on how the R-Tree is built. Intuitively, to process queries efficiently, the MBRs of the nodes should not cover too much empty space and should not overlap with each other too much. To this end, many R-Tree variants have been proposed with different ChooseSubtree and Split strategies.

3.2 Goal of this paper

As presented in Section 1, most of the existing R-Tree variants adopt hand-crafted ChooseSubtree and Split strategies, and no strategy has dominant query performance in all cases. Motivated by this, we aim to learn to build an R-Tree in a data-driven way: we learn models based on reinforcement learning to replace the hand-crafted heuristic rules for ChooseSubtree and Split, i.e., we use the learned models to decide how to choose a subtree for insertion and how to split an overflowing node. The new index is called the RLR-Tree.

4 RLR-TREE

The process of inserting a new object into an R-Tree is essentially a combination of two typical sequential decision making processes. In particular, starting from the root, it needs to decide which child node to insert the new object into at each level of a top-down traversal (ChooseSubtree). It also needs to decide how to split an overflowing node and divide its entries in a bottom-up traversal (Split).

Reinforcement learning (RL) has been proven to be effective in solving sequential decision making problems. Therefore, we propose to model the insertion of a new object as a combination of two Markov decision processes (MDPs) and adopt RL to learn policies for the ChooseSubtree and Split operations, respectively. However, as discussed in the Introduction, it is challenging to design RL-based approaches for ChooseSubtree and Split. In this section, we present both our final designs for the two MDPs and the other representative designs that we explored to formulate the problem. Specifically, we present how to choose a subtree with RL in Section 4.1 and how to split an overflowing node in Section 4.2. Finally, we present how to combine the two learned models to construct the RLR-Tree in Section 4.3.

4.1 ChooseSubtree

To insert a new object into the R-Tree, we need to conduct a top-down traversal starting from the root. In each node, we need to decide which child node to insert the new object into. To choose a subtree with RL, we formulate this problem as an MDP. We then present how to train a model to learn a policy for the MDP.

4.1.1 MDP Formulation. An MDP consists of four components, namely states, actions, transitions, and rewards. We proceed to explain how states and actions are represented in our model, and then present the reward signal design, which is particularly challenging.

MDP: State Space. A state captures the environment that is taken into account for decision making. For ChooseSubtree, it is natural for a state to be derived from the tree node whose child nodes are candidates for hosting the new object. The challenging question is: what kind of information should we extract from the tree node to represent the state?

Intuitively, as we need to decide which child node to insert the new object into, it is necessary to capture how each child node would change if we added the new object to it. Possible features that capture the change of a child node 푁 include: (1) Δ퐴푟푒푎(푁, 표), the area increase of the MBR of 푁 if we add the new object 표 into 푁; (2) Δ푃푒푟푖(푁, 표), the perimeter increase of the MBR of 푁 if we add 표 into 푁; and (3) Δ푂푣푙푝(푁, 표), the increase of the overlap between 푁 and the other child nodes after 표 is inserted into 푁. Furthermore, it is helpful to know the occupancy rate of the child node, denoted by 푂푅(푁), which is the ratio of the number of entries to the capacity. A child node with a high occupancy rate is more likely to overflow in the future.

Given these features, a straightforward idea is to compute them for every child node and concatenate them to represent the state. However, the number of child nodes varies across different nodes, making it difficult to represent a state with a vector of fixed length. One way to address this is padding, i.e., appending zeros to the features of the child nodes to obtain a 4·푀 dimensional vector, as there are four features and at most 푀 child nodes. However, as most of the nodes in an R-Tree are not full, the padded representations are likely to contain many zeros. The padded zeros add noise and mislead the model, resulting in poor performance, which is confirmed by our preliminary experiments.

To address the challenge, we propose to use only a small subset of the child nodes to define the state. This is because most of the child nodes are not good candidates for hosting the new object, as inserting the new object may greatly enlarge their MBRs. Here we can deploy heuristics to prune unpromising child nodes from the state space, and our RL agent will not consider them when representing a state. Our state representation is designed as follows. We first sort the child nodes by their area increase and retrieve the top-푘 child nodes, where sorting by area increase is based on empirical findings. Then, for each retrieved child node, we compute the four features Δ퐴푟푒푎(푁, 표), Δ푃푒푟푖(푁, 표), Δ푂푣푙푝(푁, 표), and 푂푅(푁). We concatenate the features of the 푘 child nodes to obtain a 4·푘 dimensional vector that represents the state. 푘 is a parameter to be determined empirically. Note that to make the representations of different states comparable, the increases of area, perimeter and overlap are normalized by the maximum corresponding value among the 푘 child nodes.

Remark. It is natural to consider including more features in the state, for instance global information, such as the tree depth and the size of the tree, and local information, such as the depth of the tree node or the coordinates of the boundary of its MBR. However, our experiments show that these features do not improve the performance of our model while making model training slower. The four features that we use are sufficient to train our model to make good subtree choices, as shown in our experiments. Designing and evaluating other state features is a useful direction for future work.
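For illustration, the following is a minimal Python sketch of how the 4·푘-dimensional ChooseSubtree state could be assembled. The rectangle helpers, function names, and the zero-padding used when a node has fewer than 푘 children are illustrative assumptions and do not correspond to the actual (C++) implementation.

from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])

def perimeter(r: Rect) -> float:
    return 2.0 * ((r[2] - r[0]) + (r[3] - r[1]))

def enlarge(r: Rect, o: Rect) -> Rect:
    # MBR of r after adding object o
    return (min(r[0], o[0]), min(r[1], o[1]), max(r[2], o[2]), max(r[3], o[3]))

def overlap(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)

def choose_subtree_state(children: List[Rect], entries: List[int],
                         o: Rect, k: int, capacity: int):
    """Return (4*k state vector, indices of the top-k candidate children)."""
    d_area = [area(enlarge(c, o)) - area(c) for c in children]
    # keep the k children whose MBR area increases the least
    top = sorted(range(len(children)), key=lambda i: d_area[i])[:k]
    rows = []
    for i in top:
        new_mbr = enlarge(children[i], o)
        d_peri = perimeter(new_mbr) - perimeter(children[i])
        d_ovlp = sum(overlap(new_mbr, children[j]) - overlap(children[i], children[j])
                     for j in range(len(children)) if j != i)
        rows.append([d_area[i], d_peri, d_ovlp, entries[i] / capacity])
    # normalise the three increase features by their maximum over the k candidates
    for f in range(3):
        m = max(r[f] for r in rows) or 1.0
        for r in rows:
            r[f] /= m
    state = [v for r in rows for v in r]
    state += [0.0] * (4 * k - len(state))  # pad when the node has fewer than k children
    return state, top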

MDP: Action Space. An action is a possible decision that can be made by the RL agent. Many R-Tree variants have been proposed with different ChooseSubtree strategies, such as minimizing the increase of area, perimeter, or overlap. A straightforward idea is therefore to treat these cost functions as the actions, i.e., the agent decides which cost function to use to choose the subtree. We explored this idea extensively, trying different combinations of cost functions as well as different state space designs. However, according to our experimental results, this idea is not effective. Table 1 reports the average relative node accesses for processing 1,000 random range queries on three datasets of different distributions, namely Skew, Gaussian and Uniform (relative node access is defined in the Experiments section). Intuitively, if the relative node access is smaller than 1, the index requires fewer node accesses than the R-Tree, i.e., the index outperforms the R-Tree. We observe from Table 1 that compared with the classic R-Tree, an MDP model with the cost functions as actions achieves an improvement of less than 2% in query efficiency, which is much worse than our final design (to be presented later). A possible reason, observed in our experiments, is that in 90% of the nodes in an R-Tree the different cost functions end up with the same subtree choice, which leaves little room for improvement.

Table 1: Performance of cost function based action space.

                       Relative node access
                       Skew    Gaussian    Uniform
Use cost functions     0.98    0.98        1.00
Our final design       0.29    0.08        0.56

After much effort, we gave up the idea of using existing cost functions as the action space. Instead, we propose to train the RL agent to directly decide which child node to insert the new object into. Based on this idea, one design is to have all child nodes comprise the action space. However, this incurs two challenges: 1) the number of child nodes differs across nodes, and 2) the action space is large. Considering all child nodes as actions leads to a large action space with many "bad actions", which makes the exploration during model training ineffective and inefficient. To address these challenges, we use the same idea as in the state space design, i.e., we remove the child nodes that cannot be good actions, identified using an existing heuristic rule, namely those whose MBRs would be greatly enlarged if the new object were inserted. Furthermore, recall that in designing the state space we retrieve the top-푘 child nodes in terms of area increase to represent a state. To make the action space consistent with the state representation, we define the action space A = {1, . . . , 푘}, where action 푎 = 푖 means that the RL agent chooses the 푖-th retrieved child node to host the new object.

MDP: Transition. In the ChooseSubtree operation, given a state (a node in the R-Tree) and an action (inserting the new object into a child node), the RL agent transits to the chosen child node. If the child node is a leaf node, the agent reaches a terminal state.

MDP: Reward Signal. A reward associated with a transition is feedback indicating the quality of the action taken at a given state; a larger reward indicates better quality. Since our objective is to learn to build an R-Tree that processes queries efficiently, the reward signal is expected to reflect the improvement of query performance.

In the process of ChooseSubtree, it is challenging to directly evaluate whether an action taken at a state is good or not, because the new object has not yet been fully inserted into the R-Tree. A straightforward idea is that, after the new object has been inserted, we use the R-Tree to process a set of random range queries, and the inverse of the cost (e.g., the number of accessed nodes) for processing the queries is set as the reward shared by all state-action pairs encountered in the insertion of the new object. At first glance, the agent seems to be encouraged to take actions that lead to an R-Tree that processes range queries by accessing as few nodes as possible. However, this is not the case for the following reasons: (1) A previous action may affect the structure of the R-Tree in the future, which may further affect query performance. Therefore, a "good" action may receive a poor reward due to some bad actions made previously. (2) More importantly, as we aim to learn to construct an R-Tree that outperforms its competitors, we are interested in knowing what kind of actions makes the resulting R-Tree better than a competitor, and what kind of actions makes it worse. The aforementioned reward signal cannot distinguish the two types of actions, making it ineffective for the agent to learn a good policy to outperform the competitors. (3) As more objects are inserted into the R-Tree, the average number of accessed nodes naturally increases. Therefore, the reward signal becomes weaker and weaker, which makes it difficult for the model to learn useful information in the late stages of training.

Inspired by these observations, we design a novel reward signal for ChooseSubtree. The high-level idea is that we maintain a reference tree with fixed ChooseSubtree and Split strategies. The reference tree serves as a competitor and can be any existing R-Tree variant. The reward signal is computed based on the gap between the costs of processing random queries with the reference tree and with the RLR-Tree. Specifically, the reward signal is designed as follows:

(1) We maintain an R-Tree, namely the RLR-Tree, that uses RL to decide which child node to insert the new object into, and adopts a pre-specified Split strategy.
(2) We maintain a reference tree which adopts a pre-specified ChooseSubtree strategy and the same Split strategy as the RLR-Tree.
(3) We synchronize the reference tree with the RLR-Tree, so that they have the same tree structure.
(4) Given 푝 new objects {표1, . . . , 표푝}, we insert them into both the reference tree and the RLR-Tree.
(5) After the 푝 objects are inserted, we generate 푝 range queries of predefined sizes whose centers are at the 푝 objects, respectively.
(6) We process the 푝 range queries with both the reference tree and the RLR-Tree. We compute the normalized node access rate, which is defined as (# accessed nodes) / (tree height), i.e., the number of nodes accessed for answering a range query divided by the tree height. Let 푅 and 푅′ be the normalized node access rates of the RLR-Tree and the reference tree, respectively. We compute 푟 = 푅′ − 푅, the gap between the two rates, as the reward signal. The higher 푟 is, the fewer nodes the RLR-Tree needs to access to process the range queries compared with the reference tree.
(7) All the transitions encountered in the insertion of the 푝 objects share the same reward 푟.
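The reward computation in steps (5)–(7) can be sketched as follows in Python, assuming hypothetical per-tree counters count_nodes(q) that return the number of nodes touched when answering a range query q, together with the tree height; averaging over the 푝 queries is our reading of the description above.

def normalized_node_access_rate(count_nodes, height, queries):
    # (# accessed nodes) / (tree height), averaged over the p range queries
    return sum(count_nodes(q) for q in queries) / (len(queries) * height)

def choose_subtree_reward(rlr_count_nodes, rlr_height,
                          ref_count_nodes, ref_height, queries):
    # r = R' - R: positive when the RLR-Tree accesses fewer nodes than the reference tree
    R = normalized_node_access_rate(rlr_count_nodes, rlr_height, queries)
    R_ref = normalized_node_access_rate(ref_count_nodes, ref_height, queries)
    return R_ref - R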

With this reward design, we are able to distinguish good actions from bad actions: a positive reward means that the RLR-Tree handled the recent 푝 insertions well, as it requires fewer node accesses to process the queries than the reference tree does. Moreover, as the reference tree is periodically synchronized with the RLR-Tree, we avoid the effect of previous actions. Therefore, maximizing the accumulated reward is equivalent to encouraging the agent to take the actions that make the RLR-Tree outperform the competitors.

4.1.2 Training the Agent for ChooseSubtree. We have formulated the MDP for ChooseSubtree. Next, we present how to use RL to learn a policy for this MDP.

Deep-푄-Network (DQN) Learning. Deep Q-learning is a commonly used model-free RL method. It uses a Q-function 푄∗(푠, 푎) to represent the expected accumulated reward that the agent can obtain if it takes action 푎 in state 푠 and then follows the optimal policy until it reaches a terminal state. The optimal policy takes the action with the maximum 푄-value in any state. The Deep-푄-Network (DQN) [26] approximates the Q-function 푄∗(푠, 푎) with a deep neural network 푄(푠, 푎; Θ) with parameters Θ. In our model, we adopt deep Q-learning with experience replay [26] for learning the 푄-functions. Given a batch of transitions (푠, 푎, 푟, 푠′), the parameters of 푄(푠, 푎; Θ) are updated with a gradient descent step by minimizing the mean square error (MSE) loss function shown in Equation 1:

    퐿(Θ) = Σ_{(푠,푎,푟,푠′)} [ ( 푟 + 훾 · max_{푎′} 푄ˆ(푠′, 푎′; Θ−) − 푄(푠, 푎; Θ) )² ],    (1)

where 훾 is the discount rate and 푄ˆ(·; Θ−) is the frozen target network.

Training the Agent. Now we are ready to present how to train the agent for ChooseSubtree. Figure 1 shows the training process for ChooseSubtree, and the detailed procedure is presented in Algorithm 1.

Figure 1: RL ChooseSubtree Model Training.

Algorithm 1: DQN Learning for ChooseSubtree
1   Input: A training dataset;
2   Output: Learned action-value function 푄(푠, 푎; Θ);
3   Initialize 푄(푠, 푎; Θ), 푄ˆ(푠, 푎; Θ−);
4   for 푒푝표푐ℎ = 1, 2, . . . do
5       Replay memory M ← ∅;
6       for every 푝 objects {표1, . . . , 표푝} in the dataset do
7           푇푟 ← 푇푟푙, 푆퐴 ← ∅, 푅푄 ← ∅;
8           for 표푖 ∈ {표1, . . . , 표푝} do
9               Insert 표푖 into 푇푟;
10              푁 ← the root of 푇푟푙;
11              while 푁 is non-leaf do
12                  푠 ← state representation of 푁 and 표푖;
13                  푎 ← an action selected by 휖-greedy based on 푄-values;
14                  푁 ← 푎, 푆퐴 ← 푆퐴 ∪ {(푠, 푎)};
15              Insert 표푖 into 푁, split until no node overflows;
16              푅푄 ← 푅푄 ∪ {a range query centered at 표푖};
17          푟 ← compute reward with queries in 푅푄;
18          Add (푠, 푎, 푟, 푠′) for every (푠, 푎) ∈ 푆퐴 into memory;
19          Draw samples from memory and update Θ in 푄(·; Θ);
20          Periodically synchronize 푄ˆ(·; Θ−) with 푄(·; Θ);

We first initialize the main network 푄(푠, 푎; Θ) and the target network 푄ˆ(푠, 푎; Θ−) with the same random weights (line 3). Each epoch first resets the replay memory (line 5) and then performs a sequence of insertions of the objects in the training dataset (lines 6–20). Specifically, for every 푝 objects {표1, . . . , 표푝}, we synchronize the structure of 푇푟 with 푇푟푙 (line 7). For each 표푖 of the 푝 objects, we first insert it into the reference tree (line 9). Then a top-down traversal of the RLR-Tree is conducted (lines 10–15). At each level, we compute the state representation (line 12) and use 휖-greedy to choose an action based on the 푄-values (line 13), until we reach a terminal state (a leaf node). The transitions are stored in 푆퐴 (line 14). At the leaf node, we insert the new object and use the same Split strategy as the reference tree in a bottom-up pass to ensure that no node overflows (line 15). Meanwhile, we generate a range query of a predefined size centered at 표푖 and add the query to 푅푄 (line 16). When the 푝 objects have been inserted, we compute the reward with the queries in 푅푄 (line 17). All transitions encountered in the insertions of the 푝 objects share the same reward 푟 and are pushed into the replay memory (line 18). Then we draw a batch of transitions randomly from the replay memory and use the batch to update the parameters of the main network 푄(·; Θ) as DQN does (line 19). The parameters of the target network 푄ˆ are periodically synchronized with 푄 (line 20).

Remark. It is possible that the new object to be inserted is fully contained in one of the child nodes. If we add the new object into such a child node, the MBRs of all child nodes are not affected. Therefore, it is unnecessary to make the agent consider such cases. In our implementation, when such a case happens, we do not pass the state representation to the model but directly choose the child node that contains the new object.
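Since the models are trained with PyTorch (Section 5.1), the Q-network and the gradient step of Equation 1 used in line 19 of Algorithm 1 can be sketched roughly as follows. The network width (one hidden layer of 64 SELU units), batch size and discount factor follow Section 5.1; the replay-memory layout, the simplified terminal-state handling and all variable names are our own illustrative assumptions.

import random
import torch
import torch.nn as nn

class QNet(nn.Module):
    # one hidden layer of 64 SELU units, one Q-value per action a = 1..k
    def __init__(self, state_dim: int, k: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.SELU(), nn.Linear(64, k))

    def forward(self, s):
        return self.net(s)

def epsilon_greedy(q: QNet, state, k: int, eps: float) -> int:
    # action selection of line 13 in Algorithm 1
    if random.random() < eps:
        return random.randrange(k)
    with torch.no_grad():
        return int(q(torch.tensor(state, dtype=torch.float32)).argmax())

def dqn_update(q, q_target, memory, optimizer, batch_size=64, gamma=0.95):
    """One gradient step on the MSE loss of Equation 1 over a replay mini-batch
    of (s, a, r, s') tuples (terminal-state handling omitted for brevity)."""
    batch = random.sample(memory, batch_size)
    s  = torch.tensor([t[0] for t in batch], dtype=torch.float32)
    a  = torch.tensor([t[1] for t in batch], dtype=torch.int64)
    r  = torch.tensor([t[2] for t in batch], dtype=torch.float32)
    s2 = torch.tensor([t[3] for t in batch], dtype=torch.float32)
    q_sa = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * q_target(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Under the settings of Section 5.1, the optimizer could be, for example, torch.optim.Adam(q.parameters(), lr=0.003), and the target network can be refreshed every 30 updates with q_target.load_state_dict(q.state_dict()).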
4.1.3 Time Complexity. In our analysis, the additional computation cost associated with the use of neural networks in an RLR-Tree is treated as a constant. Assume the RLR-Tree has a size of 푆 and a height of ℎ. Inserting an object into the RLR-Tree encounters ℎ − 1 states. At each state, it takes 푂(푘·푀) time to retrieve the top-푘 child nodes and 푂(푀) time to compute the features for each child node. Therefore, the overall time complexity is 푂(ℎ·푘·푀). As a comparison, ChooseSubtree takes 푂(ℎ·푀) time in the R-Tree.

4.2 Split

The top-down traversal ends at a leaf node. If the leaf node is not full, the new object is inserted. Otherwise, the leaf node has to be split into two nodes, and the Split operation may be propagated upwards. In this subsection, we present how to model Split as an MDP and how to train a model to learn a policy to solve it.

4.2.1 MDP Formulation. Similar to ChooseSubtree, we explored different ideas for designing the state, action, transition and reward signal of the MDP for Split. Due to space limitations, we only present the final design here.

MDP: State Space. For Split, it is natural for a state to be derived from the node that overflows. A straightforward idea is to make the state representation capture the goodness of all possible splits, so that the agent can decide how to split the node. However, since an overflowing node contains 푀 + 1 entries, there are 2^(푀+1) − 2 possible splits in total, and it is impractical to reflect all of them in the state representation.

In order to avoid considering so many possible splits, we adopt an idea similar to the R*-Tree [4]: we first sort the entries with respect to their projection onto each dimension. For each sorted sequence, we consider the split at the 푖-th element (푚 ≤ 푖 ≤ 푀 + 1 − 푚), where the first 푖 entries are assigned to the first group and the remaining 푀 + 1 − 푖 entries are assigned to the second group. This means that for each sorted sequence, we only consider 푀 + 2 − 2·푚 splits. Next, we discard the splits that create two overlapping nodes. We sort the remaining splits in ascending order of total area and pick the top-푘 splits to construct the state representation. Specifically, for each split, we consider four features: the areas and the perimeters of the two nodes created by the split. We concatenate the features of all 푘 splits to generate a 4푘-dimensional vector that represents the state. Note that the areas and perimeters are normalized by the maximum area and perimeter among all splits, so that each dimension of the state representation falls in (0, 1].

MDP: Action Space. Similar to ChooseSubtree, in order to make the action space consistent with the candidate splits used to represent the state, we define the action space as A = {1, . . . , 푘}, where 푘 is the number of splits used to represent the state. An action 푎 = 푖 means that the 푖-th split is adopted.

MDP: Transitions. In Split, given a state (a node in the R-Tree) and an action (a possible split), the agent transits to the state that represents the parent of the node. If the node does not overflow, it is a terminal state.

MDP: Reward Signal. The reward signal for Split is similar to that of ChooseSubtree. We maintain a reference tree that is periodically synchronized with the RLR-Tree, and we use the gap between the costs of processing random queries with the two trees as the reward signal. The difference is that here the RLR-Tree adopts the same ChooseSubtree strategy as the reference tree and uses the RL agent to decide how to split an overflowing node.

4.2.2 Training the Agent for Split. Compared with ChooseSubtree, training the agent for Split is a more challenging task. To insert an object into the R-Tree, ChooseSubtree is iteratively invoked at each level of the top-down traversal, whereas Split is invoked only when a node overflows. Therefore, only a few transitions for Split are available for training, making it difficult for the agent to learn a good policy in a reasonable time.

To tackle this challenge, we design a new method for the agent to interact with the environment, so that the Split operation is frequently invoked. Intuitively, if all the nodes in the R-Tree are full, the insertion of a new object definitely causes a tree node to split. Inspired by this, we propose to first build an R-Tree in which most of the nodes are full, so that node splits are frequently encountered. We generate such almost-full R-Trees of different sizes, and use the transitions caused by inserting the remaining objects of the dataset for training.

Figure 2 shows the framework for training for Split, and the procedure is presented in Algorithm 2. In each epoch, we run 푝푎푟푡푠 − 1 iterations to train the agent (lines 5–24). In particular, in the 푗-th iteration, we use the first 푗/푝푎푟푡푠 of the objects in the dataset to construct an R-Tree 푇푏푎푠푒 (line 7). For each of the remaining objects, if inserting it into 푇푏푎푠푒 causes splits, we add it to 푂푡푟푎푖푛, which will later be used to trigger splits for training (line 9); otherwise, the object is inserted into 푇푏푎푠푒 (line 10). In this way, most of the nodes in 푇푏푎푠푒 are likely to be full, and the objects in 푂푡푟푎푖푛 are likely to cause splits. After 푇푏푎푠푒 is constructed, we start training with the objects in 푂푡푟푎푖푛 (lines 11–24). For every 푝 objects, we first synchronize 푇푟 and 푇푟푙 with 푇푏푎푠푒 (line 12). This makes 푇푟 and 푇푟푙 have the same, almost-full structure. Then for each of the 푝 objects, we insert it into the reference tree 푇푟 directly with the pre-specified ChooseSubtree and Split strategies (line 14). For the RLR-Tree, we use the same ChooseSubtree strategy as the reference tree to reach a leaf node 푁 (line 15). If 푁 overflows, we generate a range query of a predefined size centered at 표푖 and add the query to 푅푄 (line 16). Then we iteratively split 푁 and move to its parent until 푁 does not overflow (lines 17–20). For each node 푁, we compute the state representation and use 휖-greedy to select an action based on the 푄-values (lines 18–19). The transitions are stored in 푆퐴 (line 20). When the 푝 objects have been processed, we compute the reward with the queries in 푅푄 (line 21). The transitions encountered in processing the 푝 objects share the same reward (line 22). Then we draw a batch of random transitions from the replay memory and use it to update the parameters of the main network (line 23). The parameters of the target network 푄ˆ(·; Θ−) are periodically synchronized with the main network 푄(·; Θ) (line 24).

Remark. When we split a node into two, the ideal case is that the two nodes do not overlap with each other, so that the number of node accesses for processing a query can be reduced. It is more challenging when there are many candidate splits that generate two nodes with zero overlap, as we need to carefully consider how to break the tie. As a result, we treat this special case in the exploration of the agent as follows: if there exists at most one candidate split that generates two non-overlapping nodes, we do not compute the state representation or choose an action based on the Q-values, but simply select the split with the minimum overlap. We only use RL to decide how to split the node when more than one split generates non-overlapping nodes. In this way, the agent is trained more effectively.
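The candidate-split generation of Section 4.2.1, including the tie-breaking rule of the remark above, could look roughly like the Python sketch below. It reuses the illustrative MBR helpers from the earlier ChooseSubtree sketch, sorts only by the lower coordinate of each dimension for brevity, and is not the actual implementation.

def mbr_of(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def candidate_splits(entries, m, k):
    """entries: MBRs of the M+1 entries of an overflowing node.
    Returns (forced_split, top_k); exactly one of the two is not None."""
    M = len(entries) - 1
    cands = []
    for dim in (0, 1):
        order = sorted(range(len(entries)), key=lambda i: entries[i][dim])
        for i in range(m, M + 2 - m):            # split positions m .. M+1-m
            g1 = [entries[j] for j in order[:i]]
            g2 = [entries[j] for j in order[i:]]
            r1, r2 = mbr_of(g1), mbr_of(g2)
            cands.append({"overlap": overlap(r1, r2),
                          "total_area": area(r1) + area(r2),
                          "groups": (order[:i], order[i:]),
                          "features": [area(r1), perimeter(r1), area(r2), perimeter(r2)]})
    zero_ovlp = [c for c in cands if c["overlap"] == 0.0]
    if len(zero_ovlp) <= 1:
        # at most one non-overlapping option: pick the minimum-overlap split directly
        return min(cands, key=lambda c: (c["overlap"], c["total_area"])), None
    # otherwise the RL agent chooses among the top-k zero-overlap splits by total area
    return None, sorted(zero_ovlp, key=lambda c: c["total_area"])[:k]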

Algorithm 2: DQN Learning for Split
1   Input: A training dataset;
2   Output: Learned action-value function 푄(푠, 푎; Θ);
3   Initialize 푄(푠, 푎; Θ), 푄ˆ(푠, 푎; Θ−);
4   for 푒푝표푐ℎ = 1, 2, 3, . . . do
5       for 푗 = 1, 2, 3, . . . , (푝푎푟푡푠 − 1) do
6           푂푡푟푎푖푛 ← ∅;
7           Construct 푇푏푎푠푒 with the first 푗/푝푎푟푡푠 of the objects;
8           for 표 in the remaining objects do
9               if inserting 표 into 푇푏푎푠푒 causes a split then add 표 to 푂푡푟푎푖푛;
10              else insert 표 into 푇푏푎푠푒;
11          for every 푝 objects {표1, . . . , 표푝} ⊆ 푂푡푟푎푖푛 do
12              푇푟푙 ← 푇푏푎푠푒, 푇푟 ← 푇푏푎푠푒, 푆퐴 ← ∅, 푅푄 ← ∅;
13              for 표푖 ∈ {표1, . . . , 표푝} do
14                  Insert 표푖 into 푇푟;
15                  Top-down traversal on 푇푟푙 to a leaf node 푁;
16                  if 푁 overflows then 푅푄 ← 푅푄 ∪ {random query};
17                  while 푁 overflows do
18                      푠 ← state representation of 푁;
19                      푎 ← an action selected by 휖-greedy based on 푄-values;
20                      푁 ← 푁's parent, 푆퐴 ← 푆퐴 ∪ {(푠, 푎)};
21              푟 ← compute reward with queries in 푅푄;
22              Add (푠, 푎, 푟, 푠′) for all (푠, 푎) ∈ 푆퐴 to memory;
23              Draw samples from memory and update 푄(·; Θ);
24              Periodically synchronize 푄ˆ(·; Θ−) with 푄(·; Θ);

Figure 2: RL Split Model Training.

The training process for Split has two advantages. Firstly, using different fractions of objects to build an "almost-full" base tree helps to learn a more general model. Secondly, building the "almost-full" base tree and periodically resetting 푇푟 and 푇푟푙 to the base tree ensures that the Split operation is consistently invoked at a high frequency. This makes the training process more efficient.

4.2.3 Time Complexity. Similar to ChooseSubtree, the additional computation cost associated with the use of neural networks is treated as a constant. If a node overflows, at most 푂(ℎ) Split operations are invoked. In each Split operation, we first sort the entries along each dimension, which takes 푂(푀 log 푀) time. Then it takes 푂(푘·(푀 − 2푚)·푀) time to retrieve the top-푘 splits with minimum total area. Finally, it takes 푂(푘·푀) time to compute the four features for the 푘 splits. Therefore, the overall time complexity of an RLR-Tree Split is 푂(ℎ·푘·(푀 − 2푚)·푀). As a comparison, the Split operation takes 푂(ℎ·푀²) time in the classic R-Tree.

4.3 The Combined Model

We have presented how to train the DQN models for ChooseSubtree and Split separately. As we aim to learn to build an R-Tree with good query efficiency, we expect the two agents to be closely tied together and to help each other achieve better performance. One idea for combining the two models is to make the agents share information, such as the observed states and selected actions, with each other, which may enhance the learned policy. However, this is not the case in our problem. Recall that we specially design a learning process for Split, as node overflow occurs infrequently in the construction of an R-Tree. Therefore, the learning processes of ChooseSubtree and Split deal with different data via different procedures, making it difficult to exchange information between the two agents.

Motivated by this, we propose to train the two agents alternately. Specifically, in odd epochs, the RL agent for ChooseSubtree undergoes training while the Split strategy is fixed to that of the RLR-Tree; in even epochs, the agent for Split undergoes training while the ChooseSubtree strategy is fixed to that of the RLR-Tree.

With the learned models for ChooseSubtree and Split, the final RLR-Tree can be built using the one-by-one insertion method. Figure 3 shows the procedure of inserting an object into the RLR-Tree. Specifically, in the top-down traversal, we compute the state representation of a node and use the model trained for ChooseSubtree to select the subtree corresponding to the action with the maximum 푄-value. When a node overflows, we compute its state representation and use the model trained for Split to choose the split corresponding to the action with the maximum 푄-value.

Figure 3: Inserting an object into the RLR-Tree.

5 EXPERIMENTS

We evaluate the performance of RL ChooseSubtree, RL Split, and the RLR-Tree.

5.1 Experimental Setup than the R-Tree, and smaller RNA values indicates better query Datasets. We use 3 synthetic datasets used in previous work on performance compared with the R-Tree. Note that all the indices spatial indices [2, 31, 32], and 2 large real-life datasets. compared use the same sequence of object insertion. Parameter settings. Table 2 shows a list of parameters and their (1) Skew (SKE): It is a synthetic dataset consisting of small corresponding values tested in our experiments. The default settings squares of a fixed size. The 푥 and 푦 coordinates of the squares are bold. For all R-Tree variants evaluated in this paper, we maintain centers are randomly generated from a uniform distribution a maximum of 50 and a minimum of 20 child nodes per tree node. in the range [0, 1] and then “squeezed” in the 푦-dimension, ( ) ( 9) that is, each square center 푥,푦 is replaced by 푥,푦 ; Table 2: Parameters and Values (2) Gaussian (GAU): It is a synthetic dataset consisting of small squares of a fixed size. The coordinates of the center ofa Parameters Values Data distribution SKE, GAU, UNI square are (푥,푦) where 푥 and 푦 are randomly generated from Dataset size (million) 1, 5, 10, 20, 100 = Training set size (thousand) 25, 50, 100, 200 a Gaussian distribution with mean 휇 0.5 and standard Training query size (%) 0.005, 0.01, 2 deviation 휎 = 0.2; Testing query size (%) 0.005, 0.01, 0.05, 0.1, 0.5, 1, 2 (3) Uniform (UNI): It is a synthetic dataset consisting of small Action space size 푘 2, 3, 5, 10 squares of a fixed size. The 푥 and 푦 coordinates of the cen- ters of the squares are randomly generated from a uniform The DQN models for both ChooseSubtree and Split contain 1 distribution in the range [0, 1]; hidden layer of 64 neurons with SELU [20] as the activation function. (4) OSM China (CHI): It is a real dataset that contains more than In the training process, the learning rate is set to be 0.003 for RL 98 million points in China extracted from OpenStreetMap; ChooseSubtree and 0.01 for RL Split. The initial value of 휖 is set (5) OSM India (IND): It is a real dataset that contains more than to be 1 and the decay rate is set to be 0.99. The value of 휖 is never 100 million points in India extracted from OpenStreetMap. allowed to be less than 0.1 in order to maintain a certain degree of exploration throughout model training. The replay memory can By following previous work, all spatial data objects in synthetic contain at most 5,000 (푠, 푎, 푟, 푠′) tuples. Network update is done by datasets fall within the unit square. first sampling a batch of 64 tuples from the replay memory. Then Queries. To measure the reward for our proposed methods, we Θ is updated by using gradient descent of the MSE loss function to propose to run range queries over both a reference R-Tree and close the gap between the Q-value predicted by Θ and the optimal the RLR-Tree built by the proposed method. To support this, we Q-value derived from Θ−. The discount factor is set to be 0.95 for generate range queries of different sizes. For model testing, we RL ChooseSubtree and 0.8 for RL Split. Synchronization of Θ− with randomly generate 1,000 range queries for each query size ranging Θ is done once every 30 network updates. from 0.005% to 2% of the whole region. Note that queries used for During the model training for RL ChooseSubtree (resp. RL Split), training and testing are completely different. the deterministic splitting (resp. insertion) rules are set to be the Baselines. 
As the objective of this work is to use machine learn- same as that used by the reference tree which is minimum overlap ing techniques to improve on the performance of the R-Tree that partition (resp. minimum node area enlargement). We train the RL supports updates in dynamic environments, we compare with the ChooseSubtree and Split models for 20 and 15 epochs, respectively, R-Tree and its variants that are designed for this purpose. Baselines and set 푝푎푟푡푠 in Algorithm 2 to be 15, i.e., the training dataset is used in the experiments include R-Tree [15] which is also the ref- divided into 15 equal parts. The action space size 푘 for both RL erence tree used for model training, R*-Tree [3] and RR*-Tree [4], ChooseSubtree and RL Split is set to be 2 by default. Note that the which is reported to have the best query performance among R-Tree trivial case of 푘 = 1 simply gives us the reference tree. variants built using one-by-one insertion for supporting queries We train our models on NVIDIA Tesla V100 SXM2 16 GB GPU us- in dynamic environments. Note that we do not compare with the ing PyTorch 1.3.1. All indices including both RLR-Tree and baselines packing R-Trees that are designed for the static databases [17] or the are coded using C++. learned indices as they are not designed for dynamic environment as well discussed in Section 2. 5.2 Experimental Results Measurements. We run all the indices in main memory for ease We first report the evaluation result ofRL ChooseSubtree and RL of comparisons. Note that it is straightforward to place the data Split, then the performance result of the proposed RLT-Tree, and blocks in external memory. For measurements of performance, we finally other experimental results. consider both running time and the number of node access. We find that both measures yield qualitatively consistent result because all 5.2.1 RL ChooseSubtree. We report the average RNA of the RL the indices compared in this study are R-Tree based. We only report ChooseSubtree on three synthetic datasets by varying the query re- the number of node accesses in this paper. Note that the number gion size and dataset size, respectively. Figure 4a shows that the RL of node accesses can also serve as a performance indicator for an ChooseSubtree model outperforms R-Tree consistently over differ- external memory based implementation of the R-Tree indices. More ent query sizes. The best RNA is 0.07 and observed on GAU dataset specifically, we report the average relative nodes accesses (RNA). for query size 0.005%. We also observe that the performance of RL For each query, the RNA of an index is computed by the ratio of the ChooseSubtree gets better as query size decreases. This is because number of nodes accessed for it to answer the query to the number a query with a larger size intersects with a larger number of tree of nodes accessed for an R-Tree to answer the same query. RNA nodes, and more tree nodes will then be traversed when answering value below 1 indicates that the index query performance is better such a query. In this case, all R-Tree based indices need to visit Tu Gu1, Kaiyu Feng1, Gao Cong1, Cheng Long1, Zheng Wang1, Sheng Wang2 a relatively larger portion of the data and the difference between lowest RNA is 0.2 which is observed for data size 20 million and different index techniques will diminish. Consider an extreme case query size 0.005%. On the other hand, the least significant model where a range query covers the entire data space. 
Parameter settings. Table 2 shows the list of parameters and the corresponding values tested in our experiments; unless otherwise stated, we use the default settings of a training set size of 100 thousand, a training query size of 0.01%, and an action space size k = 2. For all R-Tree variants evaluated in this paper, we maintain a maximum of 50 and a minimum of 20 child nodes per tree node.

Table 2: Parameters and Values

Parameters                      Values
Data distribution               SKE, GAU, UNI
Dataset size (million)          1, 5, 10, 20, 100
Training set size (thousand)    25, 50, 100, 200
Training query size (%)         0.005, 0.01, 2
Testing query size (%)          0.005, 0.01, 0.05, 0.1, 0.5, 1, 2
Action space size k             2, 3, 5, 10

The DQN models for both ChooseSubtree and Split contain 1 hidden layer of 64 neurons with SELU [20] as the activation function. In the training process, the learning rate is set to 0.003 for RL ChooseSubtree and 0.01 for RL Split. The initial value of ε is set to 1 and the decay rate is set to 0.99; ε is never allowed to drop below 0.1 in order to maintain a certain degree of exploration throughout model training. The replay memory can contain at most 5,000 (s, a, r, s′) tuples. A network update is done by first sampling a batch of 64 tuples from the replay memory; Θ is then updated by gradient descent on the MSE loss to close the gap between the Q-value predicted by Θ and the target Q-value derived from Θ−. The discount factor is set to 0.95 for RL ChooseSubtree and 0.8 for RL Split. Θ− is synchronized with Θ once every 30 network updates.

During the model training for RL ChooseSubtree (resp. RL Split), the deterministic splitting (resp. insertion) rule is set to be the same as that used by the reference tree, namely minimum overlap partition (resp. minimum node area enlargement). We train the RL ChooseSubtree and Split models for 20 and 15 epochs, respectively, and set parts in Algorithm 2 to 15, i.e., the training dataset is divided into 15 equal parts. The action space size k for both RL ChooseSubtree and RL Split is set to 2 by default. Note that the trivial case of k = 1 simply gives us the reference tree.

We train our models on an NVIDIA Tesla V100 SXM2 16 GB GPU using PyTorch 1.3.1. All indices, including both the RLR-Tree and the baselines, are coded in C++.
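The following PyTorch sketch mirrors the training configuration described above (a 64-unit SELU hidden layer, ε-greedy exploration with a floor of 0.1, a 5,000-tuple replay memory, batches of 64, MSE loss against a target network synchronized every 30 updates). It is a schematic reconstruction, not the authors' code: the state dimension, the optimizer choice, the granularity of the ε decay, and the tensor layout of replay tuples are assumptions.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNet(nn.Module):
    """DQN with one hidden layer of 64 SELU units, as described in the setup."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.SELU(), nn.Linear(64, n_actions))

    def forward(self, state):
        return self.net(state)

STATE_DIM, K = 8, 2                         # STATE_DIM is an assumed feature size; K is the action space size
GAMMA, LR, SYNC_EVERY = 0.95, 0.003, 30     # RL ChooseSubtree; RL Split uses GAMMA=0.8, LR=0.01
theta = QNet(STATE_DIM, K)
theta_minus = QNet(STATE_DIM, K)
theta_minus.load_state_dict(theta.state_dict())
optimizer = torch.optim.SGD(theta.parameters(), lr=LR)   # plain gradient descent; optimizer choice assumed
replay = deque(maxlen=5000)                 # at most 5,000 (s, a, r, s') tuples, each stored as tensors
epsilon, n_updates = 1.0, 0

def select_action(state):
    """ε-greedy choice among the k shortlisted candidates."""
    if random.random() < epsilon:
        return random.randrange(K)
    with torch.no_grad():
        return int(theta(state).argmax())

def network_update():
    """One DQN update: batch of 64, MSE loss against the target network Θ⁻."""
    global epsilon, n_updates
    if len(replay) < 64:
        return
    s, a, r, s_next = map(torch.stack, zip(*random.sample(replay, 64)))
    q = theta(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * theta_minus(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    n_updates += 1
    epsilon = max(0.1, epsilon * 0.99)          # decay by 0.99, never below 0.1 (schedule granularity assumed)
    if n_updates % SYNC_EVERY == 0:
        theta_minus.load_state_dict(theta.state_dict())   # synchronize Θ⁻ with Θ every 30 updates
```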

5.2 Experimental Results

We first report the evaluation results of RL ChooseSubtree and RL Split, then the performance results of the proposed RLR-Tree, and finally other experimental results.

5.2.1 RL ChooseSubtree. We report the average RNA of RL ChooseSubtree on the three synthetic datasets by varying the query region size and the dataset size, respectively. Figure 4a shows that the RL ChooseSubtree model outperforms the R-Tree consistently over different query sizes. The best RNA is 0.07, observed on the GAU dataset for query size 0.005%. We also observe that the performance of RL ChooseSubtree gets better as the query size decreases. This is because a query with a larger size intersects a larger number of tree nodes, and more tree nodes are then traversed when answering such a query. In this case, all R-Tree based indices need to visit a relatively large portion of the data and the difference between different index techniques will diminish. Consider an extreme case where a range query covers the entire data space: then all tree nodes will be traversed when answering the query, irrespective of the index used, and thus the RNA will be close to 1.

[Figure 4: Performance of RL ChooseSubtree. Panels: (a) Different Query Sizes, (b) Different Dataset Sizes; y-axis: RNA; datasets: SKE, GAU, UNI.]

Figure 4b shows that the RL ChooseSubtree model outperforms the R-Tree consistently over different data sizes. It is remarkable that although the training dataset size is only 100,000, the trained model can be applied on large datasets successfully. Furthermore, we observe that the model performance gets better on larger datasets. This could be because, as the dataset gets larger, the index tree becomes larger and more RL based subtree selections are done on average. Hence, the accumulated benefits from the RL based ChooseSubtree enable the RL ChooseSubtree model to outperform the R-Tree more significantly.

We observe that RL ChooseSubtree has the most remarkable performance on the GAU dataset, compared with the datasets of the other distributions. The best RNA is as low as 0.06 on the dataset of size 100 million and query size 0.005%, i.e., the proposed RL ChooseSubtree operation outperforms the R-Tree by up to 94%. A possible explanation is that under the GAU distribution, spatial data objects are dense in the region close to the center of the data space, resulting in a large degree of overlap among tree nodes; hence many of these overlapping nodes are processed when answering queries in this region. As the RL model learns the relationship between overlapping nodes (reflected in the state space) and query performance (reflected in the reward), it is expected to learn to select a suitable action to alleviate this problem.
5.2.2 RL Split. To evaluate the query performance of the RL Split model, we conduct experiments on the 3 synthetic datasets of different sizes by running range queries of different sizes.

[Figure 5: Performance of RL Split. Panels: (a) Different Query Sizes, (b) Different Dataset Sizes; y-axis: RNA; datasets: SKE, GAU, UNI.]

When making comparisons among the data distributions, model performance is the most remarkable on the SKE dataset. The lowest RNA is 0.2, observed for data size 20 million and query size 0.005%. On the other hand, the least significant model performance is observed on the UNI dataset. This trend is consistent with RL ChooseSubtree.

Figure 5a shows that RL Split outperforms the R-Tree consistently over different query sizes, and the improvement can be up to 80%. We also observe that the RL Split model improves more over the R-Tree as the query size decreases; the possible reasons are similar to those discussed in Section 5.2.1 for RL ChooseSubtree. Figure 5b shows that the RL Split model outperforms the R-Tree consistently over datasets of different sizes. Note that the RL Split model is trained on a small training dataset with only 100,000 data objects, but can be applied on large datasets successfully. Moreover, we also observe that as the dataset size increases from 1 million to 20 million, the improvement over the R-Tree generally increases on all 3 distributions. However, model performance slightly deteriorates as the dataset size increases from 20 million to 100 million. The reason might be the inherent complexity of Split: node splitting results in the creation of 2 new nodes, and as the dataset size increases, the increase in tree height further adds to this complexity because more nodes may be split in one Split process, so the RL Split model trained on 100,000 data objects would have more room for improvement.

5.2.3 RLR-Tree. This experiment evaluates the query performance of the RLR-Tree, which is based on the RL ChooseSubtree and RL Split operations.

The enhanced training process. We first compare the performance of RL ChooseSubtree, RL Split and the RLR-Tree, which uses the enhanced training process, on the 3 synthetic datasets and 2 real datasets. As highlighted in Table 3, by applying the enhanced training process, the RLR-Tree outperforms both RL ChooseSubtree and RL Split on all datasets.

Table 3: RL ChooseSubtree, RL Split and RLR-Tree

                   SKE    GAU    UNI    CHI    IND
RLR-Tree           0.21   0.06   0.54   0.56   0.65
RL ChooseSubtree   0.29   0.08   0.56   0.60   0.67
RL Split           0.22   0.28   0.58   0.63   0.71

Range queries. We evaluate the query performance of the RLR-Tree on all 5 datasets using range queries of different sizes. As shown in Figures 6a and 6b, the RLR-Tree outperforms the three baselines, the R-Tree, the R*-Tree and the RR*-Tree, on all the synthetic datasets. The best performance is observed on the GAU dataset, where the RLR-Tree outperforms the RR*-Tree by 78.6% and the R*-Tree by 92% for query size 0.005%. Figures 6c and 6d show that the RLR-Tree outperforms the R*-Tree and the RR*-Tree by up to 27.3% and 22.9%, respectively. On all 5 datasets, the RLR-Tree has more significant advantages over the baselines for smaller query sizes. This observation is consistent with RL ChooseSubtree and RL Split, and possible reasons have been discussed in Section 5.2.1. For query size 2%, all the indexes have similar performance, which is similar to the results reported in previous work [32].

[Figure 6: RLR-Tree Performance (Range Queries). Panels: (a) Different Query Sizes, (b) Different Datasets, (c) CHI Dataset, (d) IND Dataset; y-axis: RNA; indices: RLR-Tree, RR*-Tree, R*-Tree.]

KNN queries. To evaluate the performance of the RLR-Tree for types of queries that are not used in the model training process, we look into K-Nearest-Neighbor (KNN) queries, a very popular type of spatial query. A KNN query returns the K nearest objects to a given query point. In our experiments, we consider KNN queries with different K values, i.e., K ∈ {1, 5, 25, 125, 625}, with K = 1 being the default setting. For each K value, 1,000 uniformly distributed query points are randomly generated in the data space. We use the algorithms proposed in [34] to compute KNN queries accurately. In Figure 7, we observe that the RLR-Tree outperforms the R*-Tree and the RR*-Tree by up to 93.5% and 30.4%, respectively. The RLR-Tree outperforms all the baselines in almost all the cases except for the SKE dataset, and its advantage over the RR*-Tree is more significant on the two real datasets. We also observe that the relative query performance of the RLR-Tree to the R-Tree gets better for larger K values. The RR*-Tree follows a similar trend, but not the R*-Tree.

[Figure 7: RLR-Tree Performance (KNN Queries). Panels: (a) Different K Values, (b) Different Datasets; y-axis: RNA; indices: RLR-Tree, RR*-Tree, R*-Tree.]

The finding that the RLR-Tree also outperforms the baselines is particularly interesting: the RLR-Tree is designed and trained to optimize the performance of range queries, rather than KNN queries. Additionally, we compute the reward and design the state space for the RLR-Tree in a way such that the model used to build the index is trained to minimize the number of node accesses when answering range queries, not KNN queries. However, despite these design features that do not favour KNN queries, the RLR-Tree still has the best performance most of the time for KNN queries.
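For readers unfamiliar with KNN processing over an R-Tree, the sketch below shows a generic best-first traversal that pops entries from a priority queue ordered by minimum distance to the query point. It is a textbook-style illustration rather than the exact branch-and-bound algorithm of [34], and the node interface (`children`, `is_leaf`, `mbr`) is an assumption.

```python
import heapq
import itertools

def min_dist(point, mbr):
    """Minimum Euclidean distance from a point to a rectangle (xmin, ymin, xmax, ymax)."""
    px, py = point
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - px, 0.0, px - xmax)
    dy = max(ymin - py, 0.0, py - ymax)
    return (dx * dx + dy * dy) ** 0.5

def knn(root, point, k):
    """Best-first KNN search over an R-Tree; returns the k nearest data entries."""
    counter = itertools.count()                    # tie-breaker so the heap never compares nodes
    heap = [(0.0, next(counter), root, False)]     # (distance, tie, item, is_data_entry)
    result = []
    while heap and len(result) < k:
        dist, _, item, is_data = heapq.heappop(heap)
        if is_data:
            result.append((dist, item))            # data entries pop in non-decreasing distance order
            continue
        for child in item.children:                # children of a leaf are the data entries themselves
            heapq.heappush(heap, (min_dist(point, child.mbr), next(counter), child, item.is_leaf))
    return result
```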

5.2.4 The effect of parameters.

The Value of k. The first parameter we look into is the action space size k. As the top k candidates are shortlisted to form the action space, the value of k has a direct impact on the model performance, and it is important to evaluate the tradeoff between k and model performance. On one hand, when k is larger, more actions are available for the trained model to select from. On the other hand, model performance can be adversely affected when the action space is large, as the trained model may not do a good job of filtering out "bad" candidates. In order to find a good k value, we test different values of k on the 3 synthetic datasets of size 500,000.

[Figure 8: Effect of Parameters. Panels: (a) Varying k Values (RNA vs. k), (b) Varying Training Set Size (training time in seconds), (c) Varying Training Set Size (RNA), (d) Varying Training Query Size (RNA); datasets: SKE, GAU, UNI.]

We use RL ChooseSubtree as an example to show the effect of k on query performance, as shown in Figure 8a. We observe qualitatively similar trends on all 3 synthetic datasets and make 2 observations. Firstly, recall that k = 1 is the trivial case that generates the reference tree. By including one additional candidate in the action space, i.e., when k = 2, we achieve a significant query performance improvement for RL ChooseSubtree; the best improvement is 56% on the GAU dataset. Secondly, we observe that RL ChooseSubtree has the best result at k = 2 for all 3 synthetic datasets. As the value of k increases, model performance deteriorates gradually, which aligns with our expectation. When the value of k approaches and exceeds 10, the RL ChooseSubtree model starts to fail to outperform the R-Tree. We observe similar trends for RL Split on the three datasets and do not report the details here due to space limitations.
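To make the role of k concrete, the sketch below shows one plausible way a ChooseSubtree action space could be formed: shortlist the top-k child entries by the reference tree's insertion heuristic (minimum area enlargement, as stated in Section 5.1) and let the Q-network pick among them. The ranking criterion, the state-construction interface, and the helper names are assumptions for illustration, not the paper's exact procedure.

```python
def area(mbr):
    xmin, ymin, xmax, ymax = mbr
    return (xmax - xmin) * (ymax - ymin)

def enlargement(mbr, obj_mbr):
    """Area increase of mbr if it is enlarged to also cover obj_mbr."""
    xmin, ymin, xmax, ymax = mbr
    oxmin, oymin, oxmax, oymax = obj_mbr
    merged = (min(xmin, oxmin), min(ymin, oymin), max(xmax, oxmax), max(ymax, oymax))
    return area(merged) - area(mbr)

def choose_subtree(node, obj_mbr, q_network, make_state, k=2):
    """Shortlist the top-k children by minimum area enlargement, then let the DQN pick one."""
    candidates = sorted(node.children, key=lambda c: enlargement(c.mbr, obj_mbr))[:k]
    state = make_state(candidates, obj_mbr)            # paper-specific state features (assumed interface)
    scores = q_network(state)                          # one Q-value per shortlisted candidate
    action = int(scores[: len(candidates)].argmax())   # greedy choice at index-building time
    return candidates[action]
```

With k = 1 the sorted shortlist degenerates to the reference tree's own choice, which is why the paper treats k = 1 as the trivial case.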
Training Dataset Size. This experiment evaluates the effect of the training data size on the performance of the RLR-Tree. We would expect better query performance if we trained the RL ChooseSubtree and RL Split models on the whole dataset; however, training is very slow on a large dataset. Instead, we propose to train the models on a sample of the data and apply the models on the whole dataset to build indices. Figures 8b and 8c show the model training time and the model performance for RL ChooseSubtree when we vary the training dataset size. The model training time for different data distributions is almost the same for the same training dataset size. As expected, the query performance of the trained models improves as the training dataset size increases from 25,000 to 100,000, as shown in Figure 8c. However, we observe that the query performance becomes stable after the dataset size reaches 100,000, and the improvement of using a dataset of 200,000 over 100,000 is not significant. The model training time increases significantly with the size of the training dataset, as shown in Figure 8b. The results for RL Split are qualitatively similar to those for RL ChooseSubtree. Therefore, we set the training dataset size to 100,000 by default, which achieves a good tradeoff between training time and query performance. With the default setting, the training time for RL ChooseSubtree is 2.8 hours. Note that although high training time is a common characteristic of learned indices [31], the RLR-Tree only needs to be trained once on a sample of data, and the learned models can then be used on the whole dataset to build the index, whereas most other learned indices need to train their models on the whole dataset.

Training Query Size. The training query size has to be set very carefully, as it plays a key role in reward computation. This experiment evaluates the effect of the training query size on the query performance. Figure 8d shows the query performance of RL ChooseSubtree on the three synthetic datasets for training queries of different sizes. We observe that the query performance of RL ChooseSubtree becomes worse when using the largest training query size, i.e., 2%. On the GAU dataset, the model performance with a training query size of 2% is even 300% poorer than that with our default setting of 0.01%. On the other hand, when using a training query size of 0.005%, model performance is on par with our default setting on the GAU dataset and shows slightly poorer results on the SKE and UNI datasets. Based on these results, we set the training query size to 0.01% by default. We observe similar trends for RL Split and use the same default query size.

5.2.5 Other Experiments.

Index construction time and size. The index construction time is similar for datasets of the same size, irrespective of the data distribution. Therefore, we use the GAU dataset as an example to show the index construction time. As shown in Figure 9, the index construction time increases almost linearly with the dataset size. Among the indices, the RR*-Tree needs the shortest construction time and the RLR-Tree takes the longest. This is because the computation of state metrics and Q-values in our RL based indexing method incurs a high running time cost. However, it is still acceptable to take slightly more than 3 hours to construct an RLR-Tree containing 100 million spatial objects, and we expect the RLR-Tree construction time to be shorter when using better GPU devices.

[Figure 9: Index construction time for GAU datasets; x-axis: dataset size (million), y-axis: time (s); indices: RLR-Tree, RR*-Tree, R*-Tree.]

For index size, the RLR-Tree and the other baselines have almost the same size. Therefore, we report the RLR-Tree sizes for datasets of different sizes. As shown in Table 4, the RLR-Tree size increases linearly with the dataset size.

Table 4: Index size for GAU datasets

Dataset size (million)   1    5     10    20    100
RLR-Tree size (MB)       39   195   390   780   3900

Training Dataset Distribution. This experiment explores the possibility of training a model on one data distribution and then applying the trained model to datasets with different distributions. For this purpose, we train an RL ChooseSubtree model using the UNI dataset and apply it on the SKE and GAU datasets.

[Figure 10: RL ChooseSubtree for Different Model Training Distributions. Panels: (a) GAU Distribution (UNI vs. GAU training), (b) SKE Distribution (UNI vs. SKE training); y-axis: RNA.]

Figure 10 shows that the RL ChooseSubtree model trained on the UNI dataset outperforms the R-Tree on both the GAU and SKE datasets for all query sizes. However, its query performance is consistently worse than that of the models trained on the same data distribution as the dataset on which they are applied. The performance gap between UNI training and GAU training on the GAU dataset is significant, whereas the gap is generally smaller on the SKE dataset, especially for larger query sizes. We leave more in-depth studies of training and applying RL models across different data distributions to future work.

6 CONCLUSIONS AND FUTURE WORK

We propose the RLR-Tree to improve on the R-Tree and its variants designed for a dynamic environment. Experimental results show that the RLR-Tree is able to outperform the RR*-Tree, the best R-Tree variant, by up to 78.6% for range queries and up to 30.4% for KNN queries. Although the models of the RLR-Tree are trained on a small training dataset of size 100,000, the trained models are readily applied on large datasets.

This work takes a first step towards using machine learning to improve on the R-Tree, and we believe it opens a few promising directions for future work: 1) further explore and refine the design of states, action space and reward; 2) extend the idea to index enriched spatial data, such as spatio-temporal data, moving objects, and spatial-textual data; 3) explore other machine learning models.

REFERENCES
[1] Nicolas Anastassacos, Stephen Hailes, and M Musolesi. 2020. Partner selection for the emergence of cooperation in multi-agent systems using reinforcement learning. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20). Advancement of Artificial Intelligence (AAAI).
[2] Lars Arge, Mark De Berg, Herman Haverkort, and Ke Yi. 2008. The priority R-tree: A practically efficient and worst-case optimal R-tree. ACM Transactions on Algorithms (TALG) 4, 1 (2008), 1–30.
[3] Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. 1990. The R*-tree: an efficient and robust access method for points and rectangles. In Proceedings of the 1990 ACM SIGMOD international conference on Management of data. 322–331.
[4] Norbert Beckmann and Bernhard Seeger. 2009. A revised R*-tree in comparison with related index structures. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. 799–812.
[5] Jon Louis Bentley. 1975. Multidimensional binary search trees used for associative searching. Commun. ACM 18, 9 (1975), 509–517.
[6] Angjela Davitkova, Evica Milchevski, and Sebastian Michel. 2020. The ML-Index: A Multidimensional, Learned Index for Point, Range, and Nearest-Neighbor Queries. In EDBT. 407–410.
[7] Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, et al. 2020. ALEX: an updatable adaptive learned index. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 969–984.
[8] Jialin Ding, Vikram Nathan, Mohammad Alizadeh, and Tim Kraska. 2020. Tsunami: A learned multi-dimensional index for correlated data and skewed workloads. arXiv preprint arXiv:2006.13282 (2020).
[9] Christos Faloutsos. 1986. Multiattribute hashing using gray codes. In Proceedings of the 1986 ACM SIGMOD international conference on Management of data. 227–238.
[10] Christos Faloutsos and Shari Roseman. 1989. Fractals for secondary key retrieval. In Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems. 247–252.
[11] Paolo Ferragina and Giorgio Vinciguerra. 2020. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. Proceedings of the VLDB Endowment 13, 10 (2020), 1162–1175.
[12] Raphael A. Finkel and Jon Louis Bentley. 1974. Quad trees: a data structure for retrieval on composite keys. Acta Informatica 4, 1 (1974), 1–9.
[13] Yván J García R, Mario A López, and Scott T Leutenegger. 1998. A greedy algorithm for bulk loading R-trees. In Proceedings of the 6th ACM international symposium on Advances in geographic information systems. 163–164.
[14] Diane Greene. 1989. An implementation and performance analysis of spatial data access methods. In Proceedings. Fifth International Conference on Data Engineering. IEEE Computer Society, 606–607.
[15] Antonin Guttman. 1984. R-trees: A dynamic index structure for spatial searching. In Proceedings of the 1984 ACM SIGMOD international conference on Management of data. 47–57.
[16] Hosagrahar V Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, and Rui Zhang. 2005. iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Transactions on Database Systems (TODS) 30, 2 (2005), 364–397.
[17] Ibrahim Kamel and Christos Faloutsos. 1993. On packing R-trees. In Proceedings of the second international conference on Information and knowledge management. 490–499.
[18] Kothuri Venkata Ravi Kanth, Divyakant Agrawal, Ambuj K Singh, and Amr El Abbadi. 1997. Indexing non-uniform spatial data. In Proceedings of the 1997 International Database Engineering and Applications Symposium (Cat. No. 97TB100166). IEEE, 289–298.
[19] Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. 2020. RadixSpline: a single-pass learned index. In Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management. 1–5.
[20] Günter Klambauer, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. 2017. Self-normalizing neural networks. Advances in Neural Information Processing Systems 30 (2017), 971–980.
[21] Tim Kraska, Alex Beutel, Ed H Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The case for learned index structures. In Proceedings of the 2018 International Conference on Management of Data. 489–504.
[22] Lior Kuyer, Shimon Whiteson, Bram Bakker, and Nikos Vlassis. 2008. Multiagent reinforcement learning for urban traffic control using coordination graphs. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 656–671.
[23] Scott T Leutenegger, Mario A Lopez, and Jeffrey Edgington. 1997. STR: A simple and efficient algorithm for R-tree packing. In Proceedings 13th International Conference on Data Engineering. IEEE, 497–506.
[24] Pengfei Li, Hua Lu, Qian Zheng, Long Yang, and Gang Pan. 2020. LISA: A learned index structure for spatial data. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 2119–2133.
[25] Xi Liang, Aaron J Elmore, and Sanjay Krishnan. 2019. Opportunistic view materialization with deep reinforcement learning. arXiv preprint arXiv:1903.01363 (2019).
[26] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529–533.
[27] Vikram Nathan, Jialin Ding, Mohammad Alizadeh, and Tim Kraska. 2020. Learning multi-dimensional indexes. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 985–1000.
[28] Jack A Orenstein. 1986. Spatial query processing in an object-oriented database system. In Proceedings of the 1986 ACM SIGMOD international conference on Management of data. 326–336.
[29] Mirjana Pavlovic, Darius Sidlauskas, Thomas Heinis, and Anastasia Ailamaki. 2018. QUASII: query-aware spatial incremental index. In 21st International Conference on Extending Database Technology (EDBT).
[30] Martin L Puterman. 2014. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons.
[31] Jianzhong Qi, Guanli Liu, Christian S Jensen, and Lars Kulik. 2020. Effectively learning spatial indices. Proceedings of the VLDB Endowment 13, 12 (2020), 2341–2354.
[32] Jianzhong Qi, Yufei Tao, Yanchuan Chang, and Rui Zhang. 2018. Theoretically optimal and empirically efficient R-trees with strong parallelizability. Proceedings of the VLDB Endowment 11, 5 (2018), 621–634.
[33] Kenneth A Ross, Inga Sitzmann, and Peter J Stuckey. 2001. Cost-based unbalanced R-trees. In Proceedings Thirteenth International Conference on Scientific and Statistical Database Management, SSDBM 2001. IEEE, 203–212.
[34] Nick Roussopoulos, Stephen Kelley, and Frédéric Vincent. 1995. Nearest neighbor queries. In Proceedings of the 1995 ACM SIGMOD international conference on Management of data. 71–79.
[35] Nick Roussopoulos and Daniel Leifker. 1985. Direct spatial search on pictorial databases using packed R-trees. In Proceedings of the 1985 ACM SIGMOD international conference on Management of data. 17–31.
[36] Zahra Sadri, Le Gruenwald, and Eleazar Leal. 2020. Online index selection using deep reinforcement learning for a cluster database. In 2020 IEEE 36th International Conference on Data Engineering Workshops (ICDEW). IEEE, 158–161.
[37] Timos Sellis, Nick Roussopoulos, and Christos Faloutsos. 1987. The R+-Tree: A Dynamic Index for Multi-Dimensional Objects. Technical Report.
[38] Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z Leibo, Karl Tuyls, et al. 2018. Value-decomposition networks for cooperative multi-agent learning based on team reward. In Proceedings of the 17th international conference on autonomous agents and multiagent systems. International Foundation for Autonomous Agents and Multiagent Systems, 2085–2087.
[39] Immanuel Trummer, Junxiong Wang, Deepak Maram, Samuel Moseley, Saehan Jo, and Joseph Antonakakis. 2019. SkinnerDB: Regret-bounded query evaluation via reinforcement learning. In Proceedings of the 2019 International Conference on Management of Data. 1153–1170.
[40] Haixin Wang, Xiaoyi Fu, Jianliang Xu, and Hua Lu. 2019. Learned Index for Spatial Queries. In 2019 20th IEEE International Conference on Mobile Data Management (MDM). IEEE, 569–574.
[41] Zheng Wang, Cheng Long, Gao Cong, and Yiding Liu. 2020. Efficient and effective similar subtrajectory search with deep reinforcement learning. Proceedings of the VLDB Endowment 13, 12 (2020), 2312–2325.
[42] Zongheng Yang, Badrish Chandramouli, Chi Wang, Johannes Gehrke, Yinan Li, Umar Farooq Minhas, Per-Åke Larson, Donald Kossmann, and Rajeev Acharya. 2020. Qd-tree: Learning data layouts for big data analytics. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 193–208.
[43] Xiang Yu, Guoliang Li, Chengliang Chai, and Nan Tang. 2020. Reinforcement learning with tree-LSTM for join order selection. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 1297–1308.
[44] Ji Zhang, Yu Liu, Ke Zhou, Guoliang Li, Zhili Xiao, Bin Cheng, Jiashu Xing, Yangtao Wang, Tianheng Cheng, Li Liu, et al. 2019. An end-to-end automatic cloud database tuning system using deep reinforcement learning. In Proceedings of the 2019 International Conference on Management of Data. 415–432.