An Empirical Analysis of a Large-Scale Mobile Cloud Storage Service
Total Page:16
File Type:pdf, Size:1020Kb
An Empirical Analysis of a Large-scale Mobile Cloud Storage Service Zhenyu Liy, Xiaohui Wangy, Ningjing Huangy, Mohamed Ali Kaafarz, Zhenhua Li?, Jianer Zhou{, Gaogang Xiey, Peter Steenkiste] yICT-CAS, zCSIRO Data61, ?Tsinghua Uni., {CNIC-CAS, ]CMU {zyli, wangxiaohui, huangningjing, xie}@ict.ac.cn, [email protected], [email protected], [email protected], [email protected] ABSTRACT personal cloud storage market is estimated to have a com- Cloud storage services are serving a rapidly increasing num- pound annual growth rate of 33.1% between 2015 and 2020 ber of mobile users. However, little is known about the dif- [5]. Major players such as Google, Microsoft, Apple, Baidu, ferences between mobile and traditional cloud storage ser- and Dropbox are competing to offer users the best qual- vices at scale. In order to understand mobile user access ity of service while keeping their costs low. Cloud storage behavior, we analyzed a dataset of 350 million HTTP re- providers are working hard to meet their growing number of quest logs from a large-scale mobile cloud storage service. mobile users. For instance, Dropbox redesigned its mobile This paper presents our results and discusses the implica- app, adding new functionality, and it tapped mobile ISPs for tions for system design and network performance. Our key improved user experience [4]. observation is that the examined mobile cloud storage ser- Both improving user experience and keeping costs low vice is dominated by uploads, and the vast majority of users can benefit from an in-depth understanding of the following rarely retrieve their uploads during the one-week observa- two system aspects: tion period. In other words, mobile users lean towards the • User behavior pattern, including workload variation, usage of cloud storage for backup. This suggests that delta session characteristics, and usage patterns (e.g., occa- encoding and chunk-level deduplication found in traditional sional vs. heavy use). Insight gained can help opti- cloud storage services can be reasonably omitted in mobile mize performance and reduce cost at both the server and scenarios. We also observed that the long idle time between client. chunk transmissions by Android clients should be shortened • Data transmission performance, where factors that in- since they cause significant performance degradation due to crease latency must be identified and addressed to im- the restart of TCP slow-start. Other observations related to prove user QoE (Quality of Experience), a key factor in session characteristics, load distribution, user behavior and user loyalty. engagement, and network performance. Unfortunately, little is known about these two system as- pects in mobile cloud storage services. Recent seminal work Keywords examined user behavior in Dropbox [12, 8] and Ubuntu One Mobile cloud storage; user behavior; TCP performance [16]. Other research focuses on the traffic inefficiencies re- sulting from the synchronization of frequent local changes 1. INTRODUCTION [21, 22]. However, all these studies are specific to traditional PC clients, rather than mobile users. While some work noted With the abundant and pervasive personal content genera- the unique challenges of synchronizing data of mobile users tion witnessed today, the use of cloud storage for storing and in cloud storage services [10], both the user access behavior sharing personal data remotely is increasing rapidly. The and network performance properties in large-scale mobile cloud storage services remain unexplored. Indeed, mobile Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or users might behave quite differently from PC-based users distributed for profit or commercial advantage and that copies bear this notice when using cloud storage services. For instance, mobile and the full citation on the first page. Copyrights for components of this work users may modify file content less often due to the inconve- owned by others than ACM must be honored. Abstracting with credit is per- mitted. To copy otherwise, or republish, to post on servers or to redistribute to nience of editing files on mobile terminals. Also, the proper- lists, requires prior specific permission and/or a fee. Request permissions from ties of mobile devices might limit transmission performance [email protected]. due to connectivity, power and even software constraints. IMC 2016, November 14-16, 2016, Santa Monica, CA, USA This paper fills this void by examining a unique dataset c 2016 ACM. ISBN 978-1-4503-4526-2/16/11. $15.00 collected from a large-scale mobile cloud storage service DOI: http://dx:doi:org/10:1145/2987443:2987465 for a one-week period. The dataset consists of 349 million posit a smart auto backup function that defers uploads HTTP request logs generated by 1.1 million unique mobile in order to reduce the peak load and reduce cost. Some users. Our analysis first separates the series of requests of cost-effective storage solutions for infrequently accessed ob- each user into more fine-grained sessions with the session jects, like f4 [26], can also easily reduce cost. In addition, interval empirically learned from the dataset. We then char- providers can leverage the distinct usage patterns of users for acterize session attributes and usage patterns. Finally, we ex- effective in-app advertisement. Finally, to improve network, amine data transmission performance and diagnose the per- the effect of long idle intervals between chunks should be formance bottlenecks using packet-level traces collected at mitigated by using larger chunks or batching the transmis- both the server and client sides. Our main findings and their sion of multiple chunks. implications are as follows: Another contribution of this work is that we have • Sessions and burstiness. The inter-file operation time made our dataset publicly available for the community at of individual users follows a two-component Gaussian http://fi.ict.ac.cn/data/cloud.html. We hope the dataset will mixture model, where one component captures the in- be used by other researchers to further validate and investi- session intervals (mean of 10s), and the other corre- gate usage patterns in mobile cloud storage services. sponds to the inter-session intervals (mean of 1 day). The remainder of this paper is organized as follows. Sec- This model allows us to characterize user behavior at ses- tion 2 describes the background and dataset, while Section sion level. For example, users store and retrieve files in 3 presents an in-depth analysis of user behavior. Section 4 a bursty way, as they tend to perform all file operations investigates data transmission performance. We discuss the within a short time period at the beginning of sessions, caveats of our analysis in Section 5, and related work in Sec- and then wait for data transmission to finish. tion 6. Finally, Section 7 concludes our work. • Storage-dominated access behavior. In a single ses- sion, mobile users either only store files (68.2% of 2. DATASET AND WORKLOAD sessions), or only retrieve files (29.9% of sessions), but This section begins with an overview of the mobile cloud they rarely perform both. The mixture-exponential dis- storage service that we examine in this paper, followed by tribution model for average file size reveals that 91% of the description of the dataset in use and its limitations. Fi- the storage sessions store files that are around 1.5 MB nally, we present the temporal pattern of workload from mo- in size. Surprisingly, over half of the mobile users are bile devices, which will facilitate the understanding of sys- predominantly interested in the cloud storage for back- tem artifacts. ing up their personal data, and they seldom retrieve data. In contrast, PC-based users are far more likely to fully exploit both storage and retrieval processes. 2.1 Overview of the Examined Service • Distinct engagement models. User engagement exhibits The service that we examined is one of the major personal a bimodal distribution, where users either return soon to cloud storage services in China, serving millions of active the service or remain inactive for over one week. No- users per day. The service is very similar to other state-of- tably, about 80% of the users that use multiple mobile the-art cloud storage services, like Google Drive and Mi- store, retrieve, delete terminals will not return to download their uploads in the crosoft OneDrive. Users are offered to and share 1 following week. On the other hand, users that use both files through either a PC client or a mobile app. mobile and PC clients account for 14.3% of users, and This paper focuses on the first two operations (i.e., store and they are more likely to retrieve their uploads very soon. retrieve), because the last two operations (i.e., delete and share) do not go through the storage front-end servers, where • Device type effects on performance. The mobile device we collected our data. type (either Android or iOS) has a significant impact on Users are allowed to select multiple files to store or re- chunk-level transmission performance. We show that the trieve at one time. Uploading a file does not automatically root cause lies in the idle time between chunk transmis- delete the local copies of the files. That said, a local copy of sions. A large TCP idle time might trigger the restart of the uploaded file might be kept on the device. HTTP is used TCP slow-start. As a result, as many as 60% of idle in- to move data between cloud servers and clients. The basic tervals on Android devices restart TCP slow-start for the object for a HTTP request is a chunk of data with a fixed size next chunk, compared with only 18% for iOS.