Graphical Abstract

HiVision: Rapid Visualization of Large-Scale Spatial Vector Data

Mengyu Ma, Ye Wu, Xue Ouyang, Luo Chen, Jun Li, Ning Jing

arXiv:2005.12489v2 [cs.GR] 24 Jan 2021

[Graphical abstract: contrasts the visualization flow of traditional data-driven approaches (query the objects in the requested range and resolution, plot each object, then merge the plotted objects into the image for display) with the visualization flow of HiVision (judge the spatial topology between each pixel and the datasets to obtain the pixel values of the image for display).]

Highlights

HiVision: Rapid Visualization of Large-Scale Spatial Vector Data

Mengyu Ma, Ye Wu, Xue Ouyang, Luo Chen, Jun Li, Ning Jing

• Proposes a display-driven model for large-scale data visualization

• Designs a spatial-index-based optimization for real-time data visualization

• Proposes a hybrid-parallel architecture for enhanced data processing

• Implements an open-source tool for rapid visualization of large-scale spatial vector data

HiVision: Rapid Visualization of Large-Scale Spatial Vector Data

Mengyu Ma, Ye Wu, Xue Ouyang, Luo Chen, Jun Li and Ning Jing

College of Electronic Science, National University of Defense Technology, Changsha 410073, China

ARTICLE INFO

Keywords: vector data visualization; big data; display-driven computing; parallel computing; real-time

ABSTRACT

Rapid visualization of large-scale spatial vector data is a long-standing challenge in Geographic Information Science. In existing methods, the computation overheads grow rapidly with data volumes, leading to the incapability of providing real-time visualization for large-scale spatial vector data, even with parallel acceleration technologies. To fill the gap, we present HiVision, a display-driven visualization model for large-scale spatial vector data. Different from traditional data-driven methods, the computing units in HiVision are pixels rather than spatial objects to achieve real-time performance, and efficient spatial-index-based strategies are introduced to estimate the topological relationships between pixels and spatial objects. HiVision can maintain exceedingly good performance regardless of the data volume due to the stable pixel number for display. In addition, an optimized parallel computing architecture is proposed in HiVision to ensure the ability of real-time visualization. Experiments show that our approach outperforms traditional methods in rendering speed and visual effects while dealing with large-scale spatial vector data, and can provide interactive visualization of datasets with billion-scale points/segments/edges in real time with flexible rendering styles. The HiVision code is open-sourced at https://github.com/MemoryMmy/HiVision with an online demonstration.

Funding: This document is the result of research projects funded by the National Natural Science Foundation of China under Grants No. 41871284, 41971362 and U19A2058, and by the Natural Science Foundation of Hunan Province under Grant No. 2020JJ3042.

Author contributions: Mengyu Ma designed and implemented the algorithm; Mengyu Ma, Ye Wu and Luo Chen performed the experiments and analyzed the data; Jun Li and Ning Jing contributed to the construction of the experimental environment; Mengyu Ma and Xue Ouyang wrote the paper.

[email protected] (M. Ma); [email protected] (Y. Wu); [email protected] (X. Ouyang); [email protected] (L. Chen); [email protected] (J. Li); [email protected] (N. Jing)

ORCID(s): 0000-0002-7510-5638 (M. Ma)

1. Introduction

There has been an explosion in the amounts of spatial data in recent years, owing to the development of data acquisition technology, the prevalence of location-based services, and so on (Yao and Li, 2018). Visualization makes intricate data more intuitive to human readers and is therefore important for discovering implicit information and supporting further decision-making (Maceachren, Gahegan, Pike, Brewer, Cai, Lengerich and Hardisty, 2004). For example, effective visualization of taxi trajectories can help people better understand the urban transportation system and find strategies to reduce the number of accidents and traffic jams (Zuchao, Min, Xiaoru, Junping and Huub, 2013); a scatter plot of the nationwide road network can help the government expose isolated areas and plan and construct new roads. As an important type of spatial data, spatial vector data are abstractions of real-world geographical entities, generally expressed as points, linestrings or polygons (areas) (Tong, Ben, Liu et al., 2013). In the big data era, the problem of efficient spatial vector data visualization becomes even more prominent, as visualizing spatial vector data involves a rasterizing process that can be extremely time-consuming when the data scale is large. Rapid visualization of large-scale spatial vector data has become a severe challenge in Geographic Information Science (GIS).

With the development of computer hardware, there has been an expansion in processor numbers, and parallel computing has become increasingly important for processing large-scale spatial data. Recently, optimizing data-intensive and computing-intensive spatial analysis with high-performance computing technologies has become a hot research topic in GIS (Yao and Li, 2018). Parallel computing is an effective way to accelerate the visualization process, and representative works (Gao, Wang, Li and Shen, 2005; Tang, 2013; Guo, Guan, Xie, Wu, Luo and Huang, 2015; Guo, Huang, Guan, Xie and Wu, 2017) show that it can achieve greatly improved performance compared with traditional serial methods. In addition, with the emergence of various parallel computing models (e.g., MPI, OpenMP, Hadoop, Storm, Spark), a series of high-performance spatial vector data visualization frameworks have been proposed and have seen some success (e.g., HadoopViz (Eldawy, Mokbel and Jonathan, 2016) and GeoSparkViz (Yu, Zhang and Sarwat, 2018)).

However, few existing methods can support real-time visualization of large-scale spatial vector data, even with high-performance computing technologies. Figure 1 presents the general processing flow of existing visualization methods: first, each spatial object in the requested range is plotted according to the image resolution; a merge step then generates the final raster image. The computational scale of this data-driven processing flow expands rapidly with the volume of spatial objects in the image range, so it suffers a performance drop in big data scenarios and cannot meet real-time requirements.

[Figure: given the requested range and resolution, the datasets are queried for Object1 ... Objectn, each object is plotted, and the plotted objects are merged into the image for display.]
Figure 1: Processing flow of data-driven spatial vector data visualization.

To address the scale issue, we present HiVision, a display-driven vector data visualization model, as shown in Figure 2. In HiVision, the computing units are pixels rather than spatial objects, and the algorithms focus on determining the actual value of each pixel from the relevant spatial objects: as the number of pixels in the image range is limited and stable, the computational complexity of HiVision remains stable while dealing with spatial data of different scales. In addition, efficient spatial-index-based strategies are introduced to estimate the spatial topological relationships between pixels and spatial objects, and thus determine the value of each pixel. To the best of our knowledge, this is the first attempt at rapid visualization of vector data whose performance is largely insensitive to data volume.

[Figure: given the requested range and resolution, the spatial topology between each pixel (Pixel11 ... PixelNM) and the datasets is judged to obtain the pixel values (Value11 ... ValueNM) of the image for display.]
Figure 2: Spatial vector data visualization processing flow in HiVision.

The contributions of this paper can be summarized as follows.

• Implements an open-source tool for rapid visualization of large-scale spatial vector data. HiVision can be used to provide an interactive exploration of massive raw spatial vector data with flexible rendering styles and better visual effects, so as to discover implicit information and parameter settings for further processing.

• Designs a display-driven vector data visualization approach that reduces the computational complexity dramatically (from O(n) to O(log n)). HiVision calculates visualization results directly using a parallel per-pixel approach with efficient fine-grained spatial indexes. Our approach provides new research ideas for many related fields (e.g., map cartography (Kraak and Ormeling, 2013), spatial analysis and data visualization).


• Carries out extensive experimental evaluations and provides an online demonstration. Experiments show that HiVision dramatically outperforms traditional data-driven methods in tile rendering speed and is capable of handling billion-scale spatial vector data. In addition, the demonstration verifies that a normal 4-core CPU with 32 GB memory can perform interactive visualization of 10-million-scale spatial objects using HiVision.

The rest of this paper proceeds as follows. Section 2 highlights the state of the art of big spatial data visualization. In Sections 3 and 4, the techniques of HiVision are described in detail. The experimental results are presented and analyzed in Section 5, with an online demonstration of HiVision introduced in Section 6. Conclusions are drawn in Section 7.

2. Related Work

2.1. Big Spatial Data

There are many studies on big spatial data that discuss the challenges brought by data volume and diverse application requirements. Most of the studies focus on solving the spatial query problems that emerge when processing large-scale spatial data (Bellur, 2014; Fries, Boden, Stepien and Seidl, 2014; Aly, Mahmood, Hassan, Aref, Ouzzani, Elmeleegy and Qadah, 2015; Zhu, Huo and Qiu, 2015; Eldawy, Yuan, Mokbel and Janardan, 2013; Scitovski, 2018). Typically, Aly et al. (2015) presented a workload-aware big spatial data partitioning strategy for range and k-nearest-neighbor queries which achieved an order-of-magnitude enhancement compared with state-of-the-art systems; Eldawy et al. (2013) proposed a set of efficient MapReduce algorithms for some basic spatial analysis operations, including polygon union, farthest/closest pair, skyline query and convex hull. Besides, many high-performance frameworks/systems have been proposed to process or analyze big spatial data, among them ScalaGiST (Lu, Chen, Ooi, Vo and Wu, 2014), Sphinx (Eldawy, Elganainy, Bakeer, Abdelmotaleb and Mokbel, 2015), Hadoop-GIS (Aji, Wang, Vo, Lee, Liu, Zhang and Saltz, 2013), SpatialHadoop (Eldawy and Mokbel, 2015), GeoSpark (Yu, Wu and Sarwat, 2015) and Simba (Xie, Li, Yao, Li, Chen, Zhou and Guo, 2016). However, none of them is capable of providing visualization of large-scale spatial data in real time.

2.2. Big Spatial Data Visualization

Spatial data visualization, as an important means of spatial analysis, is a core issue in map cartography. To visualize large-scale spatial data, the Open Geospatial Consortium (OGC) has provided the standard Web Map Tile Service (WMTS) (OpenGIS, 2010), in which pre-rendered or run-time-computed georeferenced map images are organized into a tile-pyramid structure and transferred as map tiles over the Internet. The tile-pyramid is a multi-resolution data structure widely used for map browsing on the web. At the lowest level of the tile-pyramid (level 0), a single tile summarizes the whole map; at each higher zoom level z there are up to 4^z tiles. Each tile has the same size of n × n pixels, and tiles at the same level cover geographic ranges of equal extent. However, existing solutions to large-scale map data visualization are not ideal due to the following problems: 1) (long generating time) on the one hand, it can take a long time to render a tile that intersects large amounts of spatial objects; on the other hand, it may take dozens of hours or even more to slice all the tiles needed for free exploration of one spatial dataset; 2) (massive tiles) a world-scale tile-pyramid with zoom levels 0 to 16 contains billions of tiles, requiring TB-level storage; 3) (inflexible styles) the style of rendered tiles cannot be changed; in other words, a new set of tiles has to be re-generated if one wants to change the style.

Several studies focus on improving tile rendering performance, a typical yet important benchmark of spatial data processing. In the field of map cartography, there are many mature tools for spatial data visualization, such as Mapnik (A), GeoServer (B,b) and MapServer (B,a). These tools are widely used for generating maps due to their efficient rendering algorithms and rich rendering styles. To further improve the rendering performance for large-scale spatial data, researchers have provided several parallel methods adopting various acceleration technologies. For example, Gao et al. (2005) presented a parallel multi-resolution volume rendering algorithm for visualizing large data sets, in which the raw data is converted to a wavelet tree to achieve load-balanced rendering. Tang (2013) proposed a parallel construction method for large circular cartograms based on graphics processing units (GPUs) and achieved significant acceleration. To achieve load balance, Guo et al. (2015) developed a spatially adaptive decomposition approach for polyline and polygon visualization that divides the visualization domain into unequally sized sub-domains entailing approximately the same computational intensity. Guo et al. (2017) proposed an approach to vector data rendering that uses the parallel computing capability of many-core GPUs, with a balancing allocation strategy to take full advantage of all processing cores of the GPU.


Table 1
Performance of data-driven methods and HiBuffer (Ma et al., 2018).

Algorithm            | 40,927 linestrings | 208,067 linestrings | 597,829 linestrings | 21,898,508 linestrings
Parallel Method1 (a) | 9.0 s              | 38.8 s              | 332.3 s             | Failed
Parallel Method2 (b) | 15.4 s             | 128.2 s             | 936.9 s             | Failed
Parallel Method3 (c) | 12.3 s             | 75.5 s              | 661.9 s             | Failed
Parallel Method4 (d) | 17.2 s             | 220.8 s             | 2813.4 s            | Failed
PostGIS              | 34.9 s             | 295.8 s             | 2380.2 s            | Failed
QGIS                 | 129 s              | 2788 s              | Failed              | Failed
ArcGIS               | 139 s              | 2365 s              | Failed              | Failed
HiBuffer             | <1 s               | <1 s                | <1 s                | <1 s

(a) (Shen, Chen, Wu and Jing, 2018)  (b) (Fan, Ji, Gu and Sun, 2014)  (c) (Huang, 2013)  (d) (Wang, Zhao, Wang, Chen and Cao, 2016)

OmniSci (OmniSci, 2020) is a GPU-based analytics platform which allows users to interactively query and visualize large-scale spatial datasets. It can process billions of points and millions of polygons and generate custom pointmaps, heatmaps, choropleths, scatterplots and other visualizations, enabling zero-latency visual interaction at any scale. Eldawy et al. (2016) proposed a MapReduce-based framework, HadoopViz, for visualizing big spatial data, and experiments showed that HadoopViz can efficiently produce giga-pixel images for billions of input records. Yu et al. (2018) proposed a big spatial data visualization framework, GeoSparkViz, which takes advantage of the in-memory architecture of Spark, and experiments verified that GeoSparkViz can generate a gigapixel image of 1.3 billion taxi trips in 5 minutes on a four-node commodity cluster.

In addition, researchers have proposed several other approaches to improve data visualization effects. Yang, Wong, Yang, Kafatos and Li (2005) listed several possible techniques to improve the performance of data exploration on Web-based GIS, including pyramids and hash indices for large images, multi-threading, data caching and binary compression. To manage massive map tiles, Wan, Huang and Peng (2016) developed a tile storage approach based on a NoSQL database. To provide interactive exploration of large-scale spatial data while avoiding generating all the image tiles, Ghosh, Eldawy and Jais (2019) proposed an adaptive image data index, which pre-generates image tiles for the regions where spatial objects are dense; other typical methods generate tile-like intermediate variables through precomputing so that requested tiles can be computed on the fly (Liu, Jiang and Heer, 2013; Pahins, Stephens, Scheidegger and Comba, 2016; Lins, Klosowski and Scheidegger, 2013). Vector tile technology (Wikipedia, 2019) has become a popular approach in recent years to visualize large-scale spatial vector data; it transfers packets of geographic data rather than images to clients and can change map styles without generating new tiles. However, as it involves complex cartographic generalization operations, generating vector tiles is more time-consuming than generating image tiles.

To summarize, the existing solutions to rapid visualization of large-scale vector data are normally data-driven, with computational scales expanding rapidly with the volume of spatial objects; as a result, it is difficult to provide visualization of large-scale vector data in real time.

2.3. Display-driven Computing

Display-driven computing (DisDC) is a computing model that is especially suitable for data-intensive problems in GIS. In DisDC, the computing units are pixels rather than spatial objects. The core issue in DisDC is to identify the spatial topological relationships between pixels and spatial objects and thus determine the value of each pixel for display. DisDC has broad application and research prospects in big data analysis.

In our previous works (Ma, Wu, Luo, Chen, Li and Jing, 2018; Ma, Wu, Chen, Li and Jing, 2019), the primary idea of DisDC was first proposed and applied to solve some basic analysis problems in GIS. We successively brought forward HiBuffer and HiBO to provide interactive buffer and overlay analysis of large-scale spatial data. In (Ma et al., 2018), we conducted an experiment on a high-performance server, in which the optimized parallel buffer analysis methods proposed in recent years and popular GIS software programs were discussed and compared (key results are summarized in Table 1), and the display-driven buffer analysis method, HiBuffer, was deployed and tested in the same hardware environment. Experiments verified that HiBuffer reduces computation time by up to orders of magnitude while dealing with large-scale spatial vector data, and that DisDC has significant advantages compared with data-driven computing (DataDC). In this paper, we apply DisDC to the field of rapid visualization of large-scale spatial vector data to explore its effects.


3. Methodology

In this section, the key ideas for spatial vector data visualization in HiVision are introduced. Given that the core task of visualizing spatial vector data is to rasterize the spatial objects and render the final raster images for display, we apply DisDC to plot spatial point, linestring and polygon objects in HiVision. To provide better visual effects, points, linestrings and boundaries of polygons are generally plotted with widths, and an anti-aliasing process is needed (see Figure 3). In HiVision, we process each pixel of the final raster image as an independent computing unit, and spatial indexes are utilized to identify the spatial topological relationships between pixels and spatial objects, thus determining the value of pixels for display. We design a DisDC-oriented vector data organization structure for data visualization. Specifically, for points, linestrings and polygon edges, we propose a visualization method named Spatial-Index-Based Visualization (SIBV); and for the filling problem in polygon visualization, we present the Spatial-Index-Based Filling (SIBF) algorithm.

[Figure panels: (a) point objects; (b) linestring objects; (c) polygon objects.]
Figure 3: Plot spatial objects for visualization.

3.1. Data Organization

The core issue in DisDC is to identify the spatial topological relationships between pixels and spatial objects, so as to calculate the value of each pixel for display. To support rapid visualization of large-scale spatial vector data using DisDC, we design a specialized data organization structure in HiVision. Spatial indexes are widely used to organize spatial data so that efficient access to spatial objects can be guaranteed. The R-tree, an efficient tree data structure widely used for indexing and querying spatial data, can be built efficiently by grouping nearby objects and representing them with their Minimum Bounding Rectangle (MBR) at the next higher level of the tree (Choubey, Chen and Rundensteiner, 1999). In HiVision, an R-tree-based design is proposed to organize spatial objects. As real-world spatial objects may have complex structures and different shapes that are difficult to represent accurately by MBRs, using the MBR of each spatial object directly as an R-tree record node can cause low query performance in the display-driven analyzing process. Accordingly, linestring and polygon objects are separated into segments and edges to be stored in the R-tree indexes in HiVision.

As shown in Figure 4, for point and linestring objects, we create R-tree indexes with points and segments as the node types; for polygon objects, which involve a filling step, we design a multi-level index architecture (MLIA). In MLIA, each edge of the polygon objects is stored as a segment in RtreeE and the polygon MBRs are stored as boxes in RtreeMBR. In particular, to support spatial judging in SIBF, two operations are executed: 1) node information (IsLevel) is included in RtreeE to identify whether the edge is parallel to the x-axis; 2) for the edges which monotonically increase or decrease, a segment cutting process is adopted (see Figure 5).

[Figure: mapping from raw data to R-tree record nodes. A point with ID and other attributes becomes an RtreeP record node (point(x, y), ID); each segment of a linestring becomes an RtreeL record node (segment(x1, y1, x2, y2), ID); each edge of a polygon becomes an RtreeE record node (segment, IsLevel, ID); each polygon MBR becomes an RtreeMBR record node (box(minx, miny, maxx, maxy), ID).]

Figure 4: Vector data organization in HiVision.

[Figure panels: (a) edges that reverse direction at an endpoint (cutting process not required); (b) edges that monotonically increase or decrease (cutting process required; a tolerance t, with t << segment length, is used).]
Figure 5: Segment cutting for polygon edges.
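To make this organization concrete, the sketch below shows how a segment-level R-tree of the kind described above could be built and queried with Boost.Geometry. HiVision is implemented in C++ on Boost (see Section 5), but this exact code, and the type and function names in it, are illustrative assumptions rather than the project's actual implementation.

#include <boost/geometry.hpp>
#include <boost/geometry/index/rtree.hpp>
#include <cstdint>
#include <utility>
#include <vector>

namespace bg  = boost::geometry;
namespace bgi = boost::geometry::index;

using BPoint   = bg::model::point<double, 2, bg::cs::cartesian>;
using BSegment = bg::model::segment<BPoint>;
using BBox     = bg::model::box<BPoint>;

// RtreeL-style record node: one segment of a linestring plus the object ID.
using SegNode  = std::pair<BSegment, uint32_t>;
using SegRtree = bgi::rtree<SegNode, bgi::quadratic<16>>;

// Split a linestring into segments and insert them as individual record nodes,
// mirroring the "one node per segment" organization described above.
void index_linestring(SegRtree& rtree, const std::vector<BPoint>& line, uint32_t id) {
    for (size_t i = 0; i + 1 < line.size(); ++i)
        rtree.insert(SegNode(BSegment(line[i], line[i + 1]), id));
}

// Example query used by display-driven analysis: the segment nearest to a pixel
// center that also intersects a bounding box around the pixel.
bool nearest_in_box(const SegRtree& rtree, const BPoint& pixel, const BBox& box, SegNode& out) {
    std::vector<SegNode> hits;
    rtree.query(bgi::nearest(pixel, 1) && bgi::intersects(box), std::back_inserter(hits));
    if (hits.empty()) return false;
    out = hits.front();
    return true;
}

The exact distance test against the returned record (DISTANCE(Tmp, P) in Algorithm 1 below) would then be performed with bg::distance on the pixel center and the candidate segment.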


3.2. Data Visualization

3.2.1. SIBV for Point, Linestring and Polygon Edges

As points, linestrings and polygon edges are visualized with widths, visualizing them can be regarded as generating spatial buffers (Sommer and Wade, 2006) of the objects. Different from general spatial buffer analysis, which identifies areas by surrounding geographic features within a given spatial distance, for visualization the widths of spatial objects are measured in pixels. In SIBV, we extend the buffer generation method of HiBuffer (Ma et al., 2018) to visualize points, linestrings and the boundaries of polygon objects; moreover, we design a super-sampling approach for anti-aliasing: as shown in Figure 6, the pixel P is split into four sub-pixels and a sample is taken from each sub-pixel center. The value of P is generated by weighting the values of the sub-pixels (Figure 7 shows the improvement of visual effects in SIBV with the anti-aliasing approach).

[Figure: pixel P divided into four sub-pixels P1, P2, P3 and P4, each sampled at its center.]
Figure 6: Super-sampling of pixel P for anti-aliasing in SIBV.

[Figure panels: (a) before anti-aliasing; (b) after anti-aliasing.]
Figure 7: Improvement of visual effects with anti-aliasing in SIBV.
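The weighting itself can be as simple as an alpha blend by sub-pixel coverage. The helper below is a hypothetical illustration of that step (HiVision's actual rendering code may weight sub-pixels differently); it takes the 0-4 coverage count produced by Algorithm 1.

// Blend an object color into a background pixel according to how many of the
// four sub-pixel samples fall inside the plotted region (0-4).
struct RGBA { unsigned char r, g, b, a; };

RGBA blend_subpixels(RGBA background, RGBA object, int covered /* 0..4 */) {
    const float w = covered / 4.0f;   // coverage weight of the object color
    RGBA out;
    out.r = static_cast<unsigned char>(object.r * w + background.r * (1.0f - w));
    out.g = static_cast<unsigned char>(object.g * w + background.g * (1.0f - w));
    out.b = static_cast<unsigned char>(object.b * w + background.b * (1.0f - w));
    out.a = static_cast<unsigned char>(object.a * w + background.a * (1.0f - w));
    return out;
}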

The details of SIBV are shown in Algorithm 1, and the query boxes used in SIBV are illustrated in Figure 8. Two main factors are considered to optimize the algorithm: 1) the super-sampling process should only be used in the color transition regions, as it increases the amount of calculation; 2) when an R-tree is used, intersect operators work well for queries using bounding boxes rather than other shapes, and nearest-neighbor search has much higher computational complexity than the bounding-box query. We introduce R1 (= R − √2/4 × Rz) and R2 (= R + √2/4 × Rz). If the distance from P to the nearest spatial object, defined as D, is less than R1, all the sub-pixels of P are in the zones of rasterized spatial objects; if D is between R1 and R2, P belongs to the color transition regions; otherwise, P belongs to the background. The query process in SIBV can be divided into two steps:

Step 1. Determine whether P is in the buffer area of spatial objects with R1 as radius. We introduce InnerBox1 and OuterBox1 to deal with different situations in this step: if there are many spatial objects within distance R1 of P, we query the spatial objects intersecting InnerBox1, as a high density of spatial objects in the neighborhood is very likely to intersect the inner box; and if there are few spatial objects in the neighborhood of P, we use OuterBox1 to filter out the spatial objects which are far from P.


[Figure: nested query boxes around pixel P: InnerBox1 (half-width r) inside OuterBox1, which is inside OuterBox2.]
Figure 8: Query boxes for calculating pixel P with N as radius in SIBV (Rz: resolution at zoom level Z).

Step 2. OuterBox2 is used as a filter to determine whether P belongs to the color transition regions. If so, we calculate the number of sub-pixels that are in the plotting region. Specifically, the number of sub-pixels in the plotting region is used as an indicator of the degree to which P belongs to the zones of rasterized spatial objects.

Algorithm 1: Spatial-Index-Based Visualization

Input: Pixel P, zoom level Z, radius N (pixels) and spatial index Rtree (RtreeP, RtreeL or RtreeE).
Output: 0 - 4 (0: P belongs to the background region; 1 - 3: P belongs to color transition regions; 4: P totally belongs to the zones of rasterized spatial objects).
 1: Rz ← RESOLUTION(Z)
 2: R ← N × Rz
 3: R1 ← R − √2/4 × Rz
 4: R2 ← R + √2/4 × Rz
 5: r ← √2/2 × R1
 6: InnerBox1 ← BOX(P.x − r, P.y − r, P.x + r, P.y + r)
 7: Tmp ← satisfying Rtree.INTERSECT(InnerBox1).LIMIT(1)
 8: if Tmp is not null then return 4
 9: else
10:   OuterBox1 ← BOX(P.x − R1, P.y − R1, P.x + R1, P.y + R1)
11:   Tmp ← satisfying Rtree.INTERSECT(OuterBox1) and Rtree.NEAREST(P)
12:   if Tmp is not null && DISTANCE(Tmp, P) ≤ R1 then return 4
13:   else
14:     OuterBox2 ← BOX(P.x − R2, P.y − R2, P.x + R2, P.y + R2)
15:     Tmp ← satisfying Rtree.INTERSECT(OuterBox2) and Rtree.NEAREST(P)
16:     if Tmp is not null && DISTANCE(Tmp, P) ≤ R2 then
17:       P1 ← POINT(P.x − 1/4 × Rz, P.y + 1/4 × Rz)    ⊳ Super-sampling for anti-aliasing.
18:       P2 ← POINT(P.x + 1/4 × Rz, P.y + 1/4 × Rz)
19:       P3 ← POINT(P.x − 1/4 × Rz, P.y − 1/4 × Rz)
20:       P4 ← POINT(P.x + 1/4 × Rz, P.y − 1/4 × Rz)
21:       return Σ_{i=1..4} (DISTANCE(Tmp, Pi) ≤ R ? 1 : 0)
22:     return 0
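For concreteness, a compact C++ sketch of the per-pixel classification in Algorithm 1 is given below, reusing the SegRtree, SegNode, BPoint and BBox types from the sketch at the end of Section 3.1 (add #include <cmath>). It is an illustrative assumption of how the steps could map onto Boost.Geometry calls, with hypothetical names, not the project's actual code.

// Returns 0 (background), 1-3 (color transition, number of covered sub-pixels) or 4 (fully covered).
int sibv_classify(const SegRtree& rtree, const BPoint& p, double rz, int n_pixels) {
    const double R  = n_pixels * rz;
    const double R1 = R - std::sqrt(2.0) / 4.0 * rz;
    const double R2 = R + std::sqrt(2.0) / 4.0 * rz;
    const double r  = std::sqrt(2.0) / 2.0 * R1;
    const double x = bg::get<0>(p), y = bg::get<1>(p);

    // Step 1a: anything intersecting the inner box means P is fully covered.
    BBox inner(BPoint(x - r, y - r), BPoint(x + r, y + r));
    if (rtree.qbegin(bgi::intersects(inner)) != rtree.qend()) return 4;

    // Step 1b: nearest object inside OuterBox1, then an exact distance test against R1.
    BBox outer1(BPoint(x - R1, y - R1), BPoint(x + R1, y + R1));
    std::vector<SegNode> hit;
    rtree.query(bgi::nearest(p, 1) && bgi::intersects(outer1), std::back_inserter(hit));
    if (!hit.empty() && bg::distance(p, hit.front().first) <= R1) return 4;

    // Step 2: OuterBox2 filters transition candidates; count covered sub-pixel centers.
    BBox outer2(BPoint(x - R2, y - R2), BPoint(x + R2, y + R2));
    hit.clear();
    rtree.query(bgi::nearest(p, 1) && bgi::intersects(outer2), std::back_inserter(hit));
    if (hit.empty() || bg::distance(p, hit.front().first) > R2) return 0;

    int covered = 0;
    const double d = rz / 4.0;
    const BPoint sub[4] = {BPoint(x - d, y + d), BPoint(x + d, y + d),
                           BPoint(x - d, y - d), BPoint(x + d, y - d)};
    for (const BPoint& s : sub)
        if (bg::distance(s, hit.front().first) <= R) ++covered;
    return covered;   // 1-3 fall in the color transition region.
}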

188 3.2.2. SIBF for Polygon Filling 189 We design the SIBF to determine whether the pixel P is inside the polygon objects, so as to visualize the zones 190 inside polygon objects. The details of SIBF are shown in Algorithm 2, we use the RtreeMBR to find the candidate 191 polygons and then measure the spatial relationship between the pixel and each candidate polygon one by one until the 192 polygon which contains the pixel is found. We apply the ray casting algorithm (Shimrat, 1962) to determine whether 193 a pixel is inside a polygon. To be more specific, given a pixel and a polygon, a segment (QuerySegment) is drawn

M Ma et al.: Preprint submitted to Elsevier Page 8 of 19 HiVision: Rapid Visualization of Large-Scale Spatial Vector Data

from the MBR boundary of the polygon to the pixel, parallel to the x-axis; RtreeE is then used to calculate how many times the segment crosses the edges of the polygon (edges parallel to the x-axis are treated as invalid edges). The pixel is classified as inside the polygon if the number of crossings is odd, and outside if it is even. The result holds for polygons with inner rings. Moreover, as a longer QuerySegment may intersect large numbers of edges belonging to other polygons and thus degrade performance, two optimizations are made to minimize the length of the QuerySegment: 1) polygons with smaller x spans are used for spatial judging preferentially (line 2 in Algorithm 2); 2) the horizontal segment from the pixel to the closer vertical edge of the polygon MBR is used as the QuerySegment (lines 6-9). Once the spatial relationships are determined, we can render the pixels inside polygon objects according to the given styles. In the current implementation, both monochromatic color filling and pattern filling are supported. Figure 9 shows the visual effects of polygon objects in HiVision.

Algorithm 2: Spatial-Index-Based Filling

Input: Pixel P, RtreeE and RtreeMBR.
Output: True or False (whether P is in polygons).
 1: TmpMBR ← satisfying RtreeMBR.INTERSECT(P)
 2: SORT(TmpMBR)    ⊳ Polygon with smaller x span has higher priority.
 3: for v ∈ TmpMBR do
 4:   EdgeCount ← 0
 5:   vMinx ← v.Box.minx, vMaxx ← v.Box.maxx
 6:   if P.x − vMinx < vMaxx − P.x then
 7:     QuerySegment ← SEGMENT(vMinx, P.y, P.x, P.y)
 8:   else
 9:     QuerySegment ← SEGMENT(P.x, P.y, vMaxx, P.y)
10:   TmpS ← satisfying RtreeE.INTERSECT(QuerySegment)
11:   for s ∈ TmpS do
12:     if (not s.IsLevel) && s.ID == v.ID then
13:       EdgeCount++
14:   if EdgeCount is odd then return True
15: return False
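The odd/even crossing test that Algorithm 2 relies on can be written in a few lines. The sketch below is a generic, self-contained illustration of the rule on a polygon ring given as a vertex list; it ignores the R-tree filtering, the QuerySegment shortening and the IsLevel/ID bookkeeping of Algorithm 2, and the names are hypothetical.

#include <vector>

struct Pt { double x, y; };

// Returns true if point p lies inside the polygon ring `ring` (first vertex not repeated).
// A horizontal ray is cast from p towards +x and the number of edge crossings is counted;
// an odd count means "inside". Testing every ring of a polygon (outer and inner) and
// combining the parities also handles polygons with holes.
bool point_in_ring(const Pt& p, const std::vector<Pt>& ring) {
    bool inside = false;
    for (size_t i = 0, j = ring.size() - 1; i < ring.size(); j = i++) {
        const Pt& a = ring[i];
        const Pt& b = ring[j];
        // Edge (a, b) crosses the line y = p.y between its endpoints, to the right of p.
        // Edges parallel to the x-axis never satisfy (a.y > p.y) != (b.y > p.y),
        // matching the "invalid edges" rule above.
        if ((a.y > p.y) != (b.y > p.y) &&
            p.x < (b.x - a.x) * (p.y - a.y) / (b.y - a.y) + a.x)
            inside = !inside;
    }
    return inside;
}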

[Figure panels: (a) monochromatic color filling; (b) pattern filling.]
Figure 9: Visualization of polygon objects in HiVision.

3.3. Superiority Analysis

HiVision outperforms traditional data-driven solutions in the following two aspects:

• (Low computational complexity) Let n be the number of spatial objects to visualize. Owing to the observation resolution limit of human eyes (Bachman, 2014) and the processing capacity limit of browsers, the number of pixels calculated for display on a screen has an upper limit and can be regarded as a constant factor. In data-driven solutions, each object is computed and analyzed successively, so the total time complexity is O(n). In contrast, the computing units in HiVision are pixels, and we introduce R-tree indexes to accelerate the process of finding the objects that determine the value of each pixel; as a result, the time complexity is reduced to O(log(n)). A rough worked example is given after this list.

• (Easy to parallelize) Real-world spatial datasets have spatially unbalanced distributions, which creates challenges for efficient parallel processing. With DataDC, complex partitioning and merging strategies generally need to be designed; however, few strategies are capable of handling all kinds of spatial distributions with good load balancing. In contrast, as the time complexity of our algorithm is O(log(n)) per pixel, our approach is less sensitive to spatial distributions, and simply partitioning the analysis task by dividing the pixels equally can achieve good load balancing.
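As a rough, illustrative order-of-magnitude comparison (our own back-of-the-envelope arithmetic, not a measured result): consider one 256 × 256 tile over a dataset of n = 10^9 objects, all of which fall in the tile range. A data-driven renderer plots and merges on the order of n objects, whereas the display-driven approach performs one O(log n) index probe per pixel:

\[
256 \times 256 \times \log_2\!\left(10^{9}\right) \approx 65{,}536 \times 30 \approx 2 \times 10^{6} \ \text{index probes}
\quad \text{vs.} \quad \sim 10^{9} \ \text{plot-and-merge operations.}
\]

The per-tile cost of the display-driven flow depends only on the tile size and the index depth, which is consistent with the nearly flat rendering times reported in Section 5.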

4. Architecture

To provide interactive exploration of large-scale spatial vector data, we design a high-performance parallel processing architecture as illustrated in Figure 10. The DisDC-oriented vector data organization structure is stored as memory-mapped files (Wikipedia, 2020), which do not need to be fully loaded into memory while being accessed. The architecture of HiVision adopts the browser-server application model. In HiVision, visualization results are organized into the tile-pyramid structure with 256 × 256 pixels as the tile size. When a user browses the spatial datasets, tiles in the display range are rendered on the fly according to the rendering styles. The server side of the architecture consists of three parts: the Multi-Thread Visualization Server (MTVS), the In-Memory Messaging Framework (IMMF) and the Hybrid-Parallel Visualization Engine (HPVE).

[Figure: users access the WMTS exposed by the Multi-Thread Visualization Server (Parse Tasks, Render Tiles, Pattern Library, Register Vector Data, Data Meta Info); the In-Memory Messaging Framework (Task Pool, Result Pool) connects it to the Hybrid-Parallel Visualization Engine, whose multi-threaded compute processes load the spatial indexes of the DisDC-oriented data organization and schedule the parallel visualization tasks.]

Figure 10: Architecture of HiVision.
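Since tiles are addressed by zoom level and row/column in the 256 × 256 tile-pyramid, each task must recover the geographic range of the tile and the per-pixel resolution (RESOLUTION(Z) in Algorithm 1) before the per-pixel loop starts. The helper below shows one common Web-Mercator (EPSG:3857) convention for this computation; it is an illustrative assumption, as the exact tiling scheme and constants used by HiVision are not spelled out here.

#include <cmath>

// Web-Mercator world half-extent in meters (EPSG:3857).
constexpr double kWebMercatorMax = 20037508.342789244;
constexpr int    kTileSize       = 256;

struct TileBounds { double min_x, min_y, max_x, max_y, resolution; };

// Geographic range and per-pixel resolution of tile (z, x, y) in an XYZ scheme
// with the origin at the top-left corner of the world extent.
TileBounds tile_bounds(int z, int x, int y) {
    const double world     = 2.0 * kWebMercatorMax;
    const double tile_span = world / std::pow(2.0, z);    // meters covered by one tile
    TileBounds b;
    b.min_x = -kWebMercatorMax + x * tile_span;
    b.max_x = b.min_x + tile_span;
    b.max_y =  kWebMercatorMax - y * tile_span;
    b.min_y = b.max_y - tile_span;
    b.resolution = tile_span / kTileSize;                  // meters per pixel at zoom z
    return b;
}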

4.1. Multi-Thread Visualization Server

In MTVS, the spatial data visualization service is encapsulated as a WMTS, which can easily be added to web maps as a raster layer. The visualization of each tile is treated as an independent task. The Parse Tasks process analyzes tile requests and generates visualization tasks in the Task Pool; in particular, the following types of tiles do not lead to new tasks: 1) tiles that are not within the spatial scope of the data MBRs; 2) tiles that were previously processed and whose visualization results are still preserved in the Result Pool; 3) tiles with malformed request expressions. The Render Tiles process fetches visualization results from the Result Pool once the visualization is done in HPVE and renders tiles according to the style provided by users. The Pattern Library stores the patterns used for the pattern filling style of polygon objects. MTVS provides a data registration interface, and in the Register Vector Data process MTVS creates spatial indexes and writes dataset meta-data (e.g., MBRs) to Data Meta Info. In addition, multi-thread technology is adopted in MTVS to improve concurrency.

4.2. In-Memory Messaging Framework

To reduce message transmission time, tasks, results and control messages are delivered in memory without disk I/O in IMMF. IMMF is implemented on Redis, a distributed, in-memory key-value database. The Task Pool is a first-in-first-out queue that stores the requested tile tasks; the tasks are pushed to the list by MTVS and popped by the task processors in HPVE, and the operations are executed in blocking mode to avoid repeated allocation of tasks.


Table 2
Experimental environment.

Item   | Description
CPU    | 4 × 32 cores, Intel(R) Xeon(R) [email protected] GHz
Memory | 4 × 256 GB
OS     | CentOS 7.1

The Result Pool stores the visualization results in a key-value structure: the key identifies the tile request expression, while the value stores the visualization result (a two-dimensional array indicating the different zones for the rendering process). Once a task is finished in HPVE, the visualization result is written to the Result Pool, and a task completion message is then sent to MTVS through Redis subscribe/publish. To avoid overwhelming memory consumption, visualization results are set with an expiry time window, and expired results are cleaned up if memory usage reaches the upper limit.
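The Redis primitives involved are ordinary list, key-value and pub/sub operations. The sketch below shows how a tile task could travel through such a task pool and result pool using the hiredis client; the key names (task_pool, result:<tile>, tile_done) and the expiry value are illustrative assumptions, not HiVision's actual key schema.

#include <hiredis/hiredis.h>
#include <string>

// MTVS side: enqueue a tile task; completion is observed via the tile_done channel.
void push_task(redisContext* c, const std::string& tile_key) {
    redisReply* r = (redisReply*)redisCommand(c, "LPUSH task_pool %s", tile_key.c_str());
    freeReplyObject(r);
}

// HPVE side: blocking pop of the next task, render it, then publish completion.
void worker_step(redisContext* c) {
    // BRPOP blocks until a task is available, so no task is allocated twice.
    redisReply* task = (redisReply*)redisCommand(c, "BRPOP task_pool 0");
    if (task && task->type == REDIS_REPLY_ARRAY && task->elements == 2) {
        std::string tile_key = task->element[1]->str;
        std::string zones = /* render the tile: per-pixel zone array */ "";
        // Store the result with an expiry window to bound memory use.
        redisReply* r1 = (redisReply*)redisCommand(
            c, "SET result:%s %b EX 600", tile_key.c_str(), zones.data(), zones.size());
        freeReplyObject(r1);
        // Notify MTVS that the tile is ready.
        redisReply* r2 = (redisReply*)redisCommand(c, "PUBLISH tile_done %s", tile_key.c_str());
        freeReplyObject(r2);
    }
    if (task) freeReplyObject(task);
}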

4.3. Hybrid-Parallel Visualization Engine

HPVE adopts a hybrid MPI-OpenMP parallel processing model and a dynamic task partitioning strategy to achieve real-time exploration of large-scale spatial vector data. In HiVision, each tile is partitioned by lines and processed with multiple OpenMP threads within one MPI process. When a user browses the spatial datasets, the tasks are generated in a streaming way and handled on a first-in/first-served basis. An MPI process is suspended if no tasks are assigned or its assigned tasks are finished. Tasks are dynamically allocated to suspended MPI processes, and if no free MPI processes are available, the extra tasks are temporarily stored in the Task Pool waiting for idle processes.
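A minimal sketch of such a hybrid worker is shown below: each MPI rank runs a blocking task loop (as in the previous sketch), and the rows of a 256 × 256 tile are split across OpenMP threads. This is an illustrative assumption about how the pieces fit together (classify_pixel is a hypothetical stub standing in for Algorithms 1 and 2), not the engine's actual code.

#include <mpi.h>
#include <omp.h>
#include <vector>

// Hypothetical per-pixel kernel standing in for Algorithms 1 and 2 (stubbed here).
int classify_pixel(int /*px*/, int /*py*/, int /*z*/, int /*tx*/, int /*ty*/) { return 0; }

// Render one 256 x 256 tile: rows are distributed across OpenMP threads,
// so a tile is finished by a single MPI process using all of its cores.
std::vector<int> render_tile(int z, int tx, int ty) {
    constexpr int kTile = 256;
    std::vector<int> zones(kTile * kTile);
    #pragma omp parallel for schedule(static)
    for (int row = 0; row < kTile; ++row)
        for (int col = 0; col < kTile; ++col)
            zones[row * kTile + col] = classify_pixel(col, row, z, tx, ty);
    return zones;
}

int main(int argc, char** argv) {
    // MPI_THREAD_FUNNELED: only the main thread of each rank talks to MPI/Redis.
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    // Each rank would loop here: blocking-pop a tile task from the Task Pool,
    // call render_tile(), and write the zone array to the Result Pool.
    MPI_Finalize();
    return 0;
}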

5. Experimental Evaluation

In this section, we conduct several experiments to evaluate the performance of HiVision. First, we compare HiVision with typical data-driven methods that have been popular in recent years; then, we test the ability of HiVision to support interactive exploration of large-scale spatial vector data; moreover, we carry out an experiment to analyze the influence of request rates on interactive visualization in HiVision; finally, the parallel scalability of HiVision is tested by running with varying numbers of MPI processes and OpenMP threads.

All the experiments are conducted on a cluster with four nodes (Table 2). The computer code of HiVision is implemented in C++, based on Boost C++ 1.64, MPICH 3.04, GDAL 2.1.2 and Redis 3.2.12. The data-driven methods are based on Hadoop 2.9.2, Spark 2.3.3, SpatialHadoop 2.4.2, GeoSpark 1.2.0 and Mapnik 3.0.22. In HiVision, the spatial indexes can be constructed quickly (Fernández, 2018), and the experiments are conducted on the pre-built DisDC-oriented vector data organization structure. The spatial indexes can be used to provide interactive buffer and overlay analysis as well (Ma et al., 2018, 2019). To deploy HiVision on the cluster, we keep a copy of the index files on each cluster node so that all the processes can access the spatial vector data efficiently. Table 3 shows the datasets used in the experiments, all of which are at the planetary level. P1 is from OpenCellID¹, the world's largest collaborative community project collecting GPS positions of base stations. The other datasets are from OpenStreetMap, a digital map database built through crowdsourced volunteered geographic information. L7, P2 and A2 respectively contain all the linestrings, points and polygons on the planet from OpenStreetMap, and each of them contains more than 1 billion segment/point/edge items.

5.1. Experiment 1. Outperforming Data-driven Methods

To highlight the superiority of HiVision, we compare it with three typical data-driven methods, namely HadoopViz, GeoSparkViz and Mapnik. All the methods are deployed on the four-node cluster. HadoopViz and GeoSparkViz are implemented on Hadoop and the in-memory Spark, respectively, with well load-balanced task partition strategies; given a spatial dataset, these methods generate all the tiles of the given zoom levels. Mapnik is an open-source toolkit for rendering maps; its inputs are the spatial objects in the tile range and the rendering styles, and its output is the rendered map tile. In the experiment, the tile rendering tasks are stored in a queue and multiple Mapnik rendering processes are launched to process the tasks successively; in total, 128 Mapnik rendering processes are started on the cluster.

¹ https://opencellid.org


Table 3
Datasets used in the experiments.

Dataset                               | Type       | Records       | Size
L1: OSM postal code areas boundaries  | Linestring | 171,226       | 65,334,342 segments
L2: OSM boundaries of cemetery areas  | Linestring | 193,076       | 1,800,980 segments
L3: OSM sporting areas boundaries     | Linestring | 1,767,137     | 18,969,047 segments
L4: OSM water areas boundaries        | Linestring | 8,419,324     | 376,208,235 segments
L5: OSM parks green areas boundaries  | Linestring | 9,961,896     | 454,636,308 segments
L6: OSM roads and streets             | Linestring | 72,339,945    | 717,048,198 segments
L7: OSM all linestrings on the planet | Linestring | 106,269,321   | 1,578,947,752 segments
P1: OpenCelliD cell tower locations   | Point      | 40,719,479    | 40,719,478 points
P2: OSM all points on the planet      | Point      | 2,682,401,763 | 2,682,401,763 points
A1: OSM buildings                     | Polygon    | 114,796,734   | 689,197,342 edges
A2: OSM all polygons on the planet    | Polygon    | 177,662,806   | 2,077,524,465 edges

HiVision is set to run with 128 MPI processes and 2 OpenMP threads in each process. For each dataset, we generate tiles of zoom levels 1, 3, 5, 7 and 9 with the different methods; the numbers of tiles in these levels are 4, 64, 1024, 16384 and 262144, respectively.

Figure 11 shows the total rendering time of all the tiles at zoom levels 1, 3, 5, 7 and 9 with the different methods. GeoSparkViz shows the highest performance among the data-driven methods: for all the datasets, it takes less time than the other data-driven methods. Comparing HiVision and GeoSparkViz, GeoSparkViz performs better when the dataset scale is small (L1-L4), while HiVision outperforms GeoSparkViz on the larger datasets (L5-L7, P1-P2, A1-A2). For the billion-scale datasets L7, P2 and A2, HiVision shows high performance, respectively taking about 38.33% (= 268.61 s ÷ 700.83 s), 15.36% (= 398.32 s ÷ 2593.44 s) and 17.42% (= 590.62 s ÷ 3390.10 s) of the rendering time of GeoSparkViz on each dataset. From L1 to L7 the data size increases sequentially, yet there is no significant uptrend in the tile rendering time of HiVision; in contrast, the tile rendering time of the data-driven methods expands rapidly with the data scale. Surprisingly, with HiVision, L7, the largest linestring dataset with more than 1 billion segments, yields better performance than L4, which has a much smaller scale. The experimental results show that HiVision is less sensitive to data volumes. In addition, HiVision produces better visual effects: neither HadoopViz nor GeoSparkViz includes anti-aliasing or polygon filling, and polygon objects are treated as linestring objects in those methods. Mapnik, as a mature map cartography tool, can visualize spatial objects with various styles; however, Mapnik failed to process the billion-scale datasets L7, P2 and A2. In conclusion, compared with traditional data-driven methods, HiVision delivers higher performance with better visual effects when dealing with large-scale spatial vector data.

Figure 12 shows the tile rendering speed at different zoom levels. As illustrated by the results, the data-driven methods achieve high tile rendering speed at high zoom levels, but the speed decreases rapidly at low zoom levels. This is because the number of spatial objects in a tile range can be extremely large when the zoom level is low, and with DataDC each object needs to be plotted and merged successively to generate the final visual effect. Given a large-scale spatial dataset, as the number of spatial objects in a view is unpredictable, it is difficult to provide efficient visualization of the dataset at all zoom levels using data-driven methods; by contrast, with the display-driven HiVision, the tile rendering speed remains stable across zoom levels, so it is easy to provide interactive exploration of the dataset at all zoom levels. Compared with data-driven methods, HiVision shows obvious advantages when the density of spatial objects is high, but tends to be slower when the density is low. In future work, we will consider combining display-driven and data-driven computing to provide interactive spatial visual analysis with better performance.

5.2. Experiment 2. Visualizing Large-Scale Vector Data in Real Time

In this experiment, we test the ability of HiVision to support real-time exploration of large-scale spatial vector data. HiVision is set to run with 64 MPI processes and 4 OpenMP threads in each process. For each dataset, we generate 10000 tile rendering tasks through a test program, which randomly requests tiles from zoom levels 3 to 15. HiVision is configured with no cache preserved in the Result Pool to ensure that each requested tile leads to a new task in the Task Pool, so that the performance of the visualization engine is evaluated more precisely.

[Figure: grouped bar chart of total rendering time (s) for HiVision (display-driven) and HadoopViz, GeoSparkViz and Mapnik (data-driven) on datasets L1-L7, P1-P2 and A1-A2; Mapnik failed on L7, P2 and A2.]

Figure 11: Total rendering time of generating all the tiles in zoom levels 1, 3, 5, 7 and 9.

Figure 13(a) shows the total rendering time of 10000 tiles on the different datasets using HiVision. Among all the datasets, A2 yields the poorest performance, with a speed of 356.69 tiles/s; as the number of tiles displayed on a screen is generally no more than 100 (much fewer than the number of tiles A2 generates per second), it is possible to perform real-time visualization with HiVision on all the datasets. As shown in Figure 13(b), the distribution of the rendering time of each tile on the different datasets is visualized through box plots ('×' marks the average rendering time of each tile). For A2, which yields the poorest performance, most tiles are rendered within 0.24 s. Assume that a browser requests 100 different tiles of A2: as there are 64 running MPI processes in total, all 100 tasks will be processed in two rounds, with 28 (= 64 processes × 2 − 100 tasks) MPI processes suspended in the second round, and all the tasks will most likely be completed in less than 0.48 s (= 0.24 s × 2). In conclusion, HiVision is able to provide interactive exploration of large-scale spatial vector data.

5.3. Experiment 3. Impact of Request Rate

In the other experiments, all the tile requests are dispatched to HiVision simultaneously, which means that the request rate is effectively infinite, and HiVision keeps running at full load until all the tasks are finished. In practical applications, tile requests are generally generated at much lower rates. In this experiment, HiVision is set to run with 64 MPI processes and 4 OpenMP threads in each process, which means there are 64 tile rendering processes that can render 64 tiles at the same time (redundant tasks are stored in the Task Pool waiting for idle processes). The number of tiles requested per second is therefore set to multiples of 64, with the request rate set to 128, 256, 512, 1024, 2048, 4096 and infinity (INF) tiles/s. For each rate, we generate 10000 random tile tasks from zoom levels 3 to 15.

348 5.4. Experiment 4. Parallel Scalability 349 To evaluate the parallel scalability, HiVision is respectively tested to run on 4, 8, 16, 32, 64, 128 and 256 MPI 350 processes with 1, 2, 4 and 8 OpenMP threads in each process. For each pair of MPI processes and OpenMP threads,


[Figure panels (a)-(k): tile rendering speed (tiles/s) of HiVision, HadoopViz, GeoSparkViz and Mapnik at zoom levels 1, 3, 5, 7 and 9 on datasets L1-L7, P1-P2 and A1-A2; Mapnik failed on L7, P2 and A2.]
Figure 12: Tile rendering speed of different zoom levels.

[Figure panels: (a) time consumed to render 10000 tiles and number of tiles rendered per second for each dataset; (b) box plots of the rendering time of each tile.]
Figure 13: Tile rendering time of HiVision on different datasets.


[Figure panels (a)-(k), one per dataset, with performance limits: L1 1357.11, L2 1329.82, L3 1188.54, L4 717.92, L5 1057.94, L6 785.56, L7 703.87, P1 1238.77, P2 480.73, A1 683.49 and A2 356.69 tiles/s.]
Figure 14: Rendering time of each tile with different request rates in HiVision.

5.4. Experiment 4. Parallel Scalability

To evaluate the parallel scalability, HiVision is tested running on 4, 8, 16, 32, 64, 128 and 256 MPI processes with 1, 2, 4 and 8 OpenMP threads per process. For each combination of MPI processes and OpenMP threads, we generate 1000 random tile requests of different zoom levels on each dataset. The experimental results are plotted in Figure 15.

We first analyze the rendering time of 1000 tiles. HiVision achieves nearly linear parallel acceleration when the process count is at or below 32; the parallel acceleration decreases when the process count exceeds 32, especially when running with 8 OpenMP threads in each process. This is because the increase in process count intensifies the resource competition, and the competition is even more intense when running with multiple threads; for example, in Figure 15(d), the rendering time of 1000 tiles with 8 threads increases as the process count grows from 128 to 256. We then analyze the average rendering time of each tile (the lines in the figure) with different numbers of OpenMP threads. As shown in the figures, multi-thread parallel processing reduces the rendering time of a tile when the resource competition is not intense. Surprisingly, for L1, L2 and P1, which have smaller scales, running with 2 threads yields weaker performance than 1 thread even when the resource competition is not intense, because the initialization cost of multiple threads outweighs the parallel acceleration. Based on the experimental results and this analysis, a conclusion can be drawn about deploying HiVision in the given hardware environment: 1) if the number of requested tiles is high, 256 processes × 1 thread is suggested; 2) if the number of requested tiles is low, 32 processes × 8 threads is suggested, as this setting gives a low response time for each tile.


[Figure 15, panels (a)–(k), one per dataset L1–L7, P1, P2, A1, and A2: rendering time of 1000 tiles and average rendering time of each tile (seconds, log scale) with 1, 2, 4, and 8 OpenMP threads, plotted against the number of MPI processes (4 to 256).]

Figure 15: Parallel performance of HiVision with different numbers of MPI processes and OpenMP threads.

Table 4
Environment of the online demo.

Item                Description
CPU                 4 cores, Intel(R) Xeon(R) [email protected]
Memory              32 GB
Operating System    CentOS 7

Table 5
Datasets of China.

Dataset          Type        Records      Size
China roads      Linestring  21,898,508   163,171,928 segments
China points     Point       20,258,450   20,258,450 points
China farmland   Polygon     10,520,644   133,830,561 edges

6. Online Demo

An online demonstration of HiVision is provided at https://github.com/MemoryMmy/HiVision. The 10-million-scale datasets (see Table 5) used in the demonstration are provided by map service providers. As the datasets are not openly published, the raw datasets are obfuscated by adding offsets. Note that the current demonstration is deployed on a stand-alone server with a 4-core CPU and 32 GB of memory (see Table 4), a configuration comparable to an up-to-date personal computer. Even so, as illustrated in the demonstration, HiVision can still provide interactive visualization of the 10-million-scale datasets.

7. Conclusions and Future Work

In this paper, we present HiVision, a display-driven visualization model for interactive exploration of large-scale spatial vector data. Different from traditional methods, the computing units in HiVision are pixels rather than spatial objects, which makes the approach far less sensitive to data volumes. Several experiments are designed and conducted to evaluate different aspects of system performance: experiment 1 shows that, compared with traditional data-driven methods, our approach delivers higher performance with better visual effects; experiment 2 demonstrates the ability of HiVision to provide interactive exploration of large-scale spatial vector data; experiment 3 analyzes the impact of the tile request rate and shows that even higher performance can be achieved in practical applications than in the benchmark settings; experiment 4 tests the parallel scalability of HiVision, and the results show that HiVision achieves high parallel acceleration as long as resource competition is not intense. Moreover, an online demonstration of HiVision is provided on the Web, verifying that HiVision is capable of handling 10-million-scale spatial data even when deployed on a personal computer. Our future work will focus on extending HiVision to support more complex visualization styles and on applying HiVision to the field of map cartography.

Source code availability

The source code of HiVision, including test data and user manuals, is available for download from GitHub (https://github.com/MemoryMmy/HiVision).




[Figure 16, panels (a) China POI, (b) China roads, (c) China farmlands, and (d) pattern filling for polygon objects.]

Figure 16: Visualized results of the online demo.



