EdgeCourier: An Edge-hosted Personal Service for Low-bandwidth Document Synchronization in Mobile Cloud Storage Services

Pengzhan Hao, Yongshu Bai, Xin Zhang, and Yifan Zhang
Department of Computer Science
SUNY Binghamton
Binghamton, NY

ABSTRACT

Using cloud storage to automatically back up content changes when editing documents is an everyday scenario. We demonstrate that current cloud storage services can cause unnecessary bandwidth consumption in this common scenario, especially for office suite documents. Specifically, even with an incremental synchronization approach in place, existing cloud storage services still incur whole-file transmission every time the document file is synchronized. We analyze the causes of the problem in depth, and propose EdgeCourier, a system to address it. We also propose the concept of edge-hosted personal service (EPS), which has many benefits, such as helping deploy EdgeCourier easily in practice. We have prototyped the EdgeCourier system, deployed it in the form of EPS in a lab environment, and performed extensive experiments for evaluation. Evaluation results suggest that our prototype system can effectively reduce document synchronization bandwidth with negligible overhead.

CCS Concepts

•Networks → Cloud computing; •Human-centered computing → Mobile computing; Ubiquitous and mobile computing systems and tools; Mobile devices;

Keywords

Mobile computing; Edge computing; Cloud storage; File synchronization; Unikernel

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
SEC '17, October 12–14, 2017, San Jose / Silicon Valley, CA, USA
© 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-5087-7/17/10. $15.00
DOI: https://doi.org/10.1145/3132211.3134447

1. INTRODUCTION

In this paper, we investigate achieving low-bandwidth document content synchronization for a popular scenario enabled by smart mobile devices (e.g., smartphones and tablets) and cloud storage services: a user edits documents on his smartphones/tablets. The content changes made by the user are synchronized to cloud storage as they are generated, and can be further synchronized in real time to other devices owned by the user or his collaborators. We call this common usage scenario cloud-storage-backed mobile document editing.

Low-bandwidth sync is important for cloud-storage-backed mobile document editing. However, our motivation study (§3.2) shows that existing cloud storage services incur high network traffic on mobile devices when synchronizing document files, especially for office suite documents [9, 10, 34, 47, 63], such as word processing, spreadsheet, and presentation documents, which constitute the most commonly used editable document formats in practice [18, 33]. Specifically, most cloud storage services transmit the whole document file from the mobile device to cloud storage every time the user saves the file, even for a small change in the file (e.g., a single character addition). This behavior can cause a high amount of network traffic for mobile users, considering that the "save" operation is common and happens frequently when users edit documents. We call this common problem the whole-file-sync problem in the cloud-storage-backed document editing scenario.

To solve the problem, we develop EdgeCourier, a system that improves the cloud-storage-backed document editing experience by significantly lowering the network traffic generated by mobile devices when synchronizing office document file changes with cloud storage services. In the following, we introduce how EdgeCourier works by summarizing the three major challenges of solving the whole-file-sync problem, and our corresponding designs for addressing them.

First, the proper way to solve the whole-file synchronization issue is to use an incremental synchronization approach, with which only the changes made since the last synchronization, instead of the whole file, are transmitted. However, existing techniques for this purpose, such as chunking/deduplication [5, 8, 16, 52, 57, 64] and delta-encoding [20, 36, 50, 58, 65], have virtually no effect on office documents. The reason is that, for most office applications, a small edit in an office document can result in a substantial change in its binary format. Moreover, the performance of existing incremental sync approaches highly depends on choosing the right configuration (e.g., chunk/block size) for the traffic of interest. However, the current incremental sync methods deployed on cloud storage services treat all sync traffic in the same way, and hence they also do not work well for synchronizing general document traffic. More details are discussed in §3.

To address the first challenge, we design and implement an office-document-aware incremental sync approach, which we name ec-sync. Our approach is based on the observation that different types of office documents are all standard ZIP files (with different extensions), which are compressed archives containing a defined structure of sub-documents [68, 70]. When a small edit is made to a document on a mobile device, the edit is recorded (in plain text) in one of the sub-documents. Thus, the general idea of ec-sync is to capture the actual edits by comparing the latest version of the sub-document containing the edits against the last-synced version, and to transmit only the actual edits. Although the idea of ec-sync is straightforward, making it work effectively and efficiently was not trivial. There were several notable difficulties. For example, when comparing the versions of the sub-document, using existing implementations of the diff utility [31] would contribute significant computation and time overheads to ec-sync if the document size is large. To address this problem, we developed ec-diff, a new diff tool specifically designed for the cloud-storage-backed document editing scenario that is able to achieve good performance regardless of the sizes of the underlying files. We discuss the details of ec-sync and the associated difficulties later in §4.1.

The second challenge for EdgeCourier is that all the possible incremental sync approaches require adding the corresponding processing on cloud storage servers, which can greatly hinder them from being deployed in practice for two reasons: one reason is that these approaches would undoubtedly cause a significant load increase on the cloud storage servers given their central processing nature; the other is that it is hard, and would also take a long time, for all the cloud storage service providers to adopt such approaches.

Our approach to addressing the above challenge is to move the corresponding processing from cloud storage servers to network edge nodes (e.g., wireless access points, in-home routers, and cellular towers). We propose the concept of edge-hosted personal service (EPS for short). An EPS runs on a network edge node, and performs a specific functionality for mobile users. A mobile user can start/stop his own EPS instance(s) on the edge node to which he is connected. One advantage of EPS is that it can help distribute the functionalities that are normally deployed in data centers to the network edge, to enjoy the benefits of fast deployment and data center load reduction.

[...] the fact that edge nodes are usually embedded devices with limited computational resources, achieving high resource efficiency is critical to EPS. Second, since multiple users' data are processed in the same edge node, an EPS runtime that can help protect users' data security and privacy is important. Our approach to fulfilling the above two goals is to use Unikernel as the EPS runtime environment, since Unikernels are known for their extremely light weight and perfect resource isolation [43–46].

The third major challenge for EdgeCourier is that, to deploy EdgeCourier, changes also need to be made on the mobile device side. However, requiring modifications to either mobile apps or the mobile OS is not practical for deploying EdgeCourier on millions of mobile devices already in use. To address this issue, we take advantage of our prior work StoArranger [12, 13], a practical system framework running on mobile devices that can rearrange, coordinate, and transform cloud storage accesses from mobile apps.

To summarize, our contributions are as follows.
• Through a detailed measurement study, we identify the common whole-file-sync problem, which incurs high sync bandwidth in the popular cloud-storage-backed mobile document editing scenario, and we demonstrate that it cannot be effectively solved by the existing incremental sync approaches.
• We design ec-sync, an effective office-document-aware incremental sync approach, and EdgeCourier, a system that integrates ec-sync and the associated tools (e.g., ec-diff) for low-bandwidth mobile document synchronization in cloud storage services.
• We propose the concept of edge-hosted personal service (EPS), which has many useful application scenarios and benefits. For example, we demonstrate that EPS can help deploy the functionalities of EdgeCourier without requiring changes on cloud storage services.
• We prototype the EdgeCourier system with real hardware. We explore using Unikernel to serve as the runtime environment of EPS, which is a promising approach to deploying EPS instances on resource-constrained edge nodes, and we report the corresponding experiences.
• We evaluate the EdgeCourier system with real-world experiments, which show that our system can achieve its goals with little overhead.
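
The general idea behind ec-sync, as described in this section (office documents are ZIP archives of plain-text sub-documents, and only the edits to the changed sub-document need to be sent), can be illustrated with a minimal Python sketch. This is not the authors' implementation: the helper names (`make_archive`, `read_members`, `subdocument_deltas`) are hypothetical, the archive contents are toy stand-ins rather than real office markup, and Python's standard `difflib` is used only as a placeholder for ec-diff, whose design is not covered here.

```python
import difflib
import io
import zipfile


def make_archive(members):
    """Build an in-memory ZIP archive from a {name: text} mapping (toy document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, text in members.items():
            zf.writestr(name, text)
    return buf.getvalue()


def read_members(archive_bytes):
    """Return {name: decoded text} for every sub-document in the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return {n: zf.read(n).decode("utf-8") for n in zf.namelist()}


def subdocument_deltas(last_synced, latest):
    """Diff only the sub-documents whose content changed between two versions."""
    old, new = read_members(last_synced), read_members(latest)
    deltas = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name, ""), new.get(name, "")
        if before != after:
            # difflib is a stand-in here; the paper uses its own ec-diff tool.
            deltas[name] = "".join(difflib.unified_diff(
                before.splitlines(keepends=True),
                after.splitlines(keepends=True),
                fromfile=name, tofile=name))
    return deltas


# Toy "document": one large text sub-document plus an unchanged one.
body = "".join(f"<p>paragraph {i}</p>\n" for i in range(2000))
edited = body.replace("paragraph 250<", "paragraph 250!<", 1)  # one-character edit

v1 = make_archive({"word/document.xml": body, "word/styles.xml": "<styles/>\n"})
v2 = make_archive({"word/document.xml": edited, "word/styles.xml": "<styles/>\n"})

deltas = subdocument_deltas(v1, v2)
delta_size = sum(len(d.encode("utf-8")) for d in deltas.values())
print("changed sub-documents:", sorted(deltas))
print("delta bytes:", delta_size, "| whole-file bytes:", len(v2))
```

In this toy setup, only the edited sub-document yields a delta, and that delta is much smaller than the whole archive that a whole-file sync would transmit. A real deployment would also have to apply the delta and rebuild the archive on the receiving side, which this sketch leaves out.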