QueryGuard: Privacy-Preserving Latency-Aware Query Optimization for Edge Computing
Runhua Xu∗, Balaji Palanisamy† and James Joshi‡
School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, USA
∗[email protected], [email protected], [email protected]

Abstract—The emerging edge computing paradigm has enabled applications with low response time requirements to meet their quality of service needs by moving computations to the edge of the network, geographically closer to the end-users and end-devices. Despite the low latency advantages provided by the edge computing model, there are significant privacy risks associated with the adoption of edge computing services for applications dealing with sensitive data. In contrast to cloud data centers, where system infrastructures are managed through strict and regularized policies, edge computing nodes are scattered geographically and may not have the same degree of regulatory and monitoring oversight. This can lead to higher privacy risks for the data processed and stored at the edge nodes, thus making them less trusted. In this paper, we show that a direct application of traditional performance-based query optimization techniques in edge computing can lead to unexpected data disclosure risks at the edge nodes. We propose a new privacy-preserving latency-aware query optimization framework, QueryGuard, that tackles the privacy-aware distributed query processing problem while simultaneously optimizing the queries for latency. Our experimental evaluation demonstrates that QueryGuard achieves better performance in terms of execution time and memory usage than conventional distributed query optimization techniques while also enforcing the required constraints related to data privacy.

Index Terms—query optimization; data privacy; latency awareness; edge computing; database management

I. INTRODUCTION

The emerging edge computing paradigm has enabled applications with low response time requirements to meet their quality of service needs by moving computations to the edge of the network, geographically closer to the end-users and end-devices [1]–[3]. A key distinguishing feature of edge computing is its ability to store and process large amounts of data on servers and computing units located closer to the data sources, such as sensors and mobile devices. This enables it to provide low latency services for highly interactive applications. For instance, augmented reality applications with real-time computation requirements can be deployed as edge computing applications to meet their response time requirements. In a highly distributed data processing environment such as the Internet of Things (IoT), edge computing provides a natural solution for the decentralized, low-latency computation of data generated by IoT devices.

Distributed database management approaches offer an efficient way to manage and process large amounts of decentralized data for data-driven applications [4]–[11]. However, adopting distributed database techniques in edge computing brings new challenges and concerns. First, in contrast to cloud data centers, where cloud servers are managed through strict and regularized policies, edge nodes may not have the same degree of regulatory and monitoring oversight. This may lead to higher privacy risks than in cloud servers. In particular, when dealing with a join query, traditional distributed query processing techniques sometimes ship selected or projected data to different nodes, some of which may be untrusted or semi-trusted. Thus, such techniques may lead to greater disclosure of private information within the edge nodes. Traditional query processing techniques in distributed environments aim at optimizing queries by using the most efficient query plan and do not focus on addressing the privacy concerns in distributed settings [5], [6], [8]–[11]. Although cryptography-based solutions have been adopted in database systems in [12], [13], such schemes either incur a significant computation cost for implementing the crypto operations or require the use of a trusted third party to support the operation [4], [7]. Second, as edge computing nodes are scattered geographically with varying degrees of network connectivity in terms of bandwidth and latency, optimizing distributed query processing in edge computing requires a special emphasis on network latency, compared to traditional query processing where the emphasis is primarily on minimizing the query computation time.

In this paper, we propose QueryGuard, a privacy-preserving, latency-aware query optimization framework, to tackle the challenges of join query optimization in an edge computing environment. Our proposed work deals with both the privacy concerns and the latency optimization for distributed join query processing in edge computing environments. The proposed query optimization mechanism generates optimal query execution plans that ensure users' privacy preferences on their sensitive data stored in edge nodes during the query execution phase. In particular, the mechanism automatically controls the movement of sensitive data in a cross-site join operation so as to avoid the sensitive data being stored at or disclosed to an untrustworthy node in the decentralized computing infrastructure. The proposed query optimization framework also optimizes for the latency of the join queries by dynamically considering the network characteristics of the edge computing environment. We experimentally evaluate the performance of the proposed query optimization techniques, and the results demonstrate that the proposed methods achieve better performance in terms of execution time and memory usage compared to conventional distributed query optimization techniques while also enforcing the privacy constraints.

II. BACKGROUND AND MOTIVATION

Edge computing refers to computing infrastructures that enable computations and data processing tasks to be performed at the edges of the network, closer to the data sources, allowing low latency applications to meet their short response time requirements [2]. Fig. 1 illustrates an example where several edge nodes are distributed at the edge of the Internet. This helps provide storage and computing services for IoT sensors closer to the edge of the network.

A join query over $t_1$, $s_1$, $s_2$ may be requested from a mobile application at the edge node $E_5$. As there exist multiple query plans for this join query, the query optimizer chooses the optimal query plan that minimizes the query execution time. For instance, there are three possible join query statements for the query, based on the order of the join operation: $t_1 \bowtie s_1 \bowtie s_2$, $t_1 \bowtie s_2 \bowtie s_1$, and $s_1 \bowtie s_2 \bowtie t_1$. For each query, several query execution plan candidates can be generated from the query statement based on different permutations of the join order, projection, and selection operations. In an edge computing environment, each query plan incurs a different query processing and latency cost, as streams/relations are stored and placed in different edge nodes.

A. Latency-aware Query Optimization

Even though traditional distributed query processing techniques could be used to manage stream/relational data, they are not directly suitable for applications that require low response time, such as real-time interactive applications; as a result, such approaches yield sub-optimal performance in edge computing environments. Suppose that a mobile application providing an audio-based walking guide is used by people who are blind for navigation; when they are walking around, even a small delay may result in a serious deviation from the correct path. Such latency-sensitive applications usually have a higher requirement on latency optimization, which can be supported by using an edge computing based approach.

As another example to illustrate the benefits of adopting edge computing for applications requiring low latency query processing, suppose that edge nodes $E_1$, $E_2$, $E_3$ are located in

TABLE I
SIMULATED LOCATIONS AND THEIR NETWORK LATENCY

Location         Distance         Latency (ms)                        Result
$E_1$ - $E_2$    $d_{e_1,e_2}$    $0.022 \cdot d_{e_1,e_2} + 4.862$   $t_{e_1,e_2} = 5.082$
$E_2$ - $E_3$    $d_{e_2,e_3}$    $0.022 \cdot d_{e_2,e_3} + 4.862$   $t_{e_2,e_3} = 5.202$
$E_3$ - $E_1$    $d_{e_3,e_1}$    $0.022 \cdot d_{e_3,e_1} + 4.862$   $t_{e_3,e_1} = 5.192$
$A$ - $B$        $d_{a,b}$        $0.022 \cdot d_{a,b} + 4.862$       $t_{a,b} = 26.862$

† Note that the latency of network traffic is estimated based on the distance using a linear model, $y = 0.022x + 4.862$, with coefficient of determination $R^2 = 0.907$, proposed in [14].
‡ The distance between the data center and the city is assumed to be 1000 miles, while the distance between edge nodes is 10, 20, and 15 miles, respectively.

TABLE II
SIMULATED PARAMETER SETTINGS AND VALUES

Symbol       Value         Description
$t$          30 min        Time interval of the query
$v_{e_1}$    1 KB/min      Speed of stream data generation at edge $E_1$
$v_{e_2}$    2 KB/min      Speed of stream data generation at edge $E_2$
$v_{e_3}$    3 KB/min      Speed of stream data generation at edge $E_3$
$v_{net}$    100 Mbit/s    Ethernet speed
$T$          10 ms         Query time on a single machine

(i) Edge-based approach. The total estimated time includes the maximum time of shipping data from $E_2$, $E_3$ to $E_1$ and the time of query processing in $E_1$, as follows:

$$t_{edge} = \max_{i \in \{e_2, e_3\},\; j \in \{(e_1,e_2),\,(e_1,e_3)\}} \left( \frac{v_i t}{v_{net}} + t_j \right) + T,$$

where $v_i$ represents the data generation speed and $t_j$ is the latency time.

(ii) Cloud-based approach. The total estimated time includes the maximum time of shipping data from $E_1$, $E_2$ and $E_3$ to $B$, the time of query processing in $B$, and the time of returning query results:

$$t_{cloud} = \max_{i \in \{e_1, e_2, e_3\}} \left( \frac{v_i t}{v_{net}} + t_{a,b} \right) + T + \sum_{i} \frac{v_i t}{v_{net}} + t_{a,b},$$

where $v_i$ represents the data generation speed.

As a result, using the values in Table II, the estimated query time of the cloud-based approach ($t_{cloud}$) is about 84.818 ms, while that of the edge-based approach ($t_{edge}$) is about 22.223 ms. This indicates that the cloud-based approach incurs nearly 4 times the cost of the edge-based approach.

B. Privacy-preserving Query Processing

Although edge computing provides significant advantages in tackling latency issues for short response time applications, it also brings new privacy challenges when deployed in distributed, semi-trusted computing environments.
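Returning briefly to the latency comparison of Section II-A, the two estimates can be checked with a short Python sketch that evaluates both formulas from the linear latency model of Table I and the parameters of Table II. Several details here are assumptions rather than statements from the text: 1 KB is read as 1024 bytes and 100 Mbit/s as 100 × 2^20 bits per second (the reading under which the reported 22.223 ms and 84.818 ms are reproduced), each source node in the edge case is paired with its own link to E1, the sum in the cloud formula runs over all three edge nodes, and all function and variable names are illustrative.

```python
# Sketch of the Section II-A estimates. Unit conventions and names are assumptions.
BITS_PER_KB = 1024 * 8                  # assumed: 1 KB = 1024 bytes
V_NET_BITS_PER_MS = 100 * 2**20 / 1000  # assumed: 100 Mbit/s with 1 Mbit = 2**20 bits

T_INTERVAL_MIN = 30                     # t: time interval of the query (Table II)
T_QUERY_MS = 10.0                       # T: query time on a single machine (Table II)
RATE_KB_PER_MIN = {"e1": 1, "e2": 2, "e3": 3}  # v_i (Table II)

# Distances in miles (Table I notes); only the links used by the formulas are listed.
DISTANCE_MILES = {("e1", "e2"): 10, ("e1", "e3"): 15, ("a", "b"): 1000}


def latency_ms(link):
    """Linear latency model from Table I: y = 0.022 * x + 4.862."""
    return 0.022 * DISTANCE_MILES[link] + 4.862


def transfer_ms(node):
    """Time to ship the data a node generated during the interval: v_i * t / v_net."""
    bits = RATE_KB_PER_MIN[node] * T_INTERVAL_MIN * BITS_PER_KB
    return bits / V_NET_BITS_PER_MS


def edge_time_ms():
    """E2 and E3 ship their data to E1 in parallel, then E1 runs the query."""
    shipping = max(transfer_ms(n) + latency_ms(("e1", n)) for n in ("e2", "e3"))
    return shipping + T_QUERY_MS


def cloud_time_ms():
    """All edge nodes ship to B, B runs the query, and the results are returned."""
    t_ab = latency_ms(("a", "b"))
    upload = max(transfer_ms(n) + t_ab for n in RATE_KB_PER_MIN)
    download = sum(transfer_ms(n) for n in RATE_KB_PER_MIN) + t_ab
    return upload + T_QUERY_MS + download


if __name__ == "__main__":
    print(f"t_edge  = {edge_time_ms():.3f} ms")   # 22.223
    print(f"t_cloud = {cloud_time_ms():.3f} ms")  # 84.818
```

Under these assumptions the transfer terms are small (about 7 ms for the largest stream), so the gap between the two approaches comes mostly from the 26.862 ms city-to-data-center latency, which the cloud-based estimate pays twice.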