
[MOBILE PLATFORMS]

Jinghao Shi, Edwin Santos, and Geoffrey Challen University at Buffalo

Editors: Sharad Agarwal and Marco Gruteser

WHY AND HOW TO USE PHONELAB

While app marketplaces have enabled large-scale app-level experimentation, medium-scale experimentation with the platform code implementing the app interface and providing core device services remains difficult for academic researchers. But this is where many of the ideas currently being explored by the mobile systems community must be evaluated, including new networking protocols, security mechanisms, storage abstractions, and energy management strategies. To enable these experiments, we built and are operating PhoneLab, a 175-smartphone testbed where real users run experimental Android platform builds on their primary devices. We are eager to make PhoneLab useful to the mobile systems community. To aid in this effort, this article discusses why PhoneLab might be useful for your research and provides an overview of how to use the testbed, including examples drawn from our group’s current projects.

32 GetMobile October 2015 | Volume 19, Issue 4 [MOBILE PLATFORMS]

The world’s two billion mobile devices together constitute the largest, most powerful, and most widely distributed system ever built. In many ways, the incredible success of the smartphone is built on decades of research by the mobile systems community. But new challenges and opportunities alike await next-generation mobile apps and systems that attempt to harness this increasingly powerful and pervasive network, and researchers must continue to find ways to make these ubiquitous devices even more effective.

An important part of the smartphone success story has been the emergence of mobile app marketplaces, such as the Play Store. These distribution channels have played a significant role in democratizing development by allowing a single developer to easily reach a global audience. In the same way, these marketplaces are also playing an increasingly important role in enabling large-scale mobile systems research. Researchers can now make experimental apps available to billions of users, and rely on the fact that even a tiny fraction of scientists willing to participate in their project can amount to a representative, statistically significant and globally distributed sample of thousands or even tens of thousands of users. As an example, the Netalyzr tool¹ distributed by ICSI has been downloaded 24,000 times by users in 120 countries. And if the app provides a tangible benefit it may attract an enormous number of users, like the Carat² personalized energy management app, which was installed more than 750K times. We expect app marketplaces to continue to play a vital role in allowing the mobile systems community to evaluate ideas at scale.

But what about the smartphone platform itself? Platform codebases are the millions of lines of code that are responsible for performing crucial tasks such as determining the device’s location, choosing which available network to use, and managing limited resources such as energy – tasks of obvious interest to the mobile systems research community. Smartphone platforms cannot be altered via apps distributed by app marketplaces, and they also limit the access of these apps to useful information and device capabilities – sometimes for security or privacy reasons, but at other times simply because the information is considered too difficult or not useful enough to expose.

’s and Apple’s platform codebases are proprietary, making experimentation impossible without insider support. But while the Android Open Source Project (AOSP) makes it possible to create fully functional custom Android platform images for select devices, many roadblocks still await researchers attempting to deploy modified platform images at even modest scale. Experimental participants must be recruited and, if their behavior is to be representative, convinced to swap their primary smartphone for (and port their number to) a new device. Data collection tools must be built and tested, and the platform changes themselves must be developed and deployed. These experimental barriers are making it difficult for today’s academic mobile systems research community to have a real impact on the most important parts of the most successful mobile systems platform: the smartphone.

To simplify smartphone platform experimentation we are operating PhoneLab, a

1 http://netalyzr.icsi.berkeley.edu/

2 http://carat.cs.berkeley.edu/


smartphone platform testbed (https://www.phone-lab.org). PhoneLab is both a group of people and an ongoing community effort to maintain a well-instrumented version of the AOSP useful for research purposes. PhoneLab consists of 175 participants that run custom Android platform images on their primary smartphone. Our PhoneLab platform image is open to the mobile systems community to instrument and modify. We provide the participants, reliable data collection tools, and increasingly well instrumented AOSP platform sources. You provide the exciting research questions and new ideas that help us learn more about and improve these ubiquitous mobile devices.

In this article our goal is to provide an overview of PhoneLab’s capabilities in the hope that you will find it helpful in enabling your research studies. We start with some information about the testbed, describe the process of performing a PhoneLab experiment, and then use some examples drawn from our ongoing projects to provide tangible examples of what PhoneLab can do.

PARTICIPANTS, PLANS, PHONES

PhoneLab is a smartphone platform testbed located at the University at Buffalo (UB). 175 participants – primarily UB students, faculty, and staff – receive discounted Sprint service and a 5 smartphone to use as their primary device. In exchange, they run a modified Android platform image containing instrumentation and potentially novel changes and features.

PhoneLab participants were originally required to be UB affiliates. Recently we have relaxed that policy, but almost all participants are still UB students, faculty and staff. We have not yet polled the demographics of our 2014-15 participants, but a 2013-14 survey indicated that PhoneLab participants were well distributed between genders and across age brackets. While they may not be entirely representative of the billions of smartphone users worldwide, or even the millions of those located in the United States, the testbed is not just a bunch of undergraduates or Ph.D. students.

Because PhoneLab has always been open to any UB affiliate willing to agree to our terms of service and make the initial payment, we have no way to control participant demographics. We do know that we have participants in many different departments on campus, providing a reasonable level of on-campus spatial coverage which can be important to certain studies.

To simplify testbed administration, we operate PhoneLab as a Sprint corporate-liable service plan. PhoneLab participants pay $45 per month for an all-you-can-eat Sprint service plan which includes unlimited voice, messaging and data. Technically, resources are pooled and capped over all participants, but in practice the caps are generous enough to never limit participant consumption. Cooperation from Sprint allows us to create an attractive price point for our participants, while they offload all payment processing and customer support to us. Participants purchase service directly from PhoneLab in six-month installments, with invoice and payment processing done using Square. Sprint’s coverage in the Buffalo area is fairly good, including LTE mobile data coverage in many locations.

Currently all participants use the Google smartphone running a custom Android platform image based on the Android Open Source Project (AOSP). Currently our PhoneLab branch is based on AOSP version 5.1 “Lollipop,” an upgrade from version 4.4.4 that participants received in mid-September 2015. In previous academic years participants used the (2012-13), Samsung (2013-14), and Google Nexus 5 (2014-15). Compared to these previous smartphones the Nexus 5 seems far superior, and has helped address battery lifetime and network quality complaints registered by first- and second-year participants.

We are aware that in some ways PhoneLab represents an unusually homogeneous environment for smartphone experiments. All participants live in the same geographic area, carry the same smartphone model, and have service provided by the same carrier. While this homogeneity helps simplify testbed management, it can make certain experiments difficult to perform. For example, our group has recently begun working on several projects intended to help developers cope with performance differences caused by the fact that Android runs on 18K devices that are attached to networks with orders of magnitude performance variation. Given its device and location homogeneity, PhoneLab is a relatively poor environment to investigate this challenge.

EXPERIMENTAL PROCESS

To use PhoneLab, you first download and modify our fork of the AOSP. Next, you work with PhoneLab administrators and developers to test your changes. After your changes are validated to be safe and effective, they are distributed to participants and data collection begins. Finally, after verifying that your experiment has been cleared for human subjects compliance, you will be provided the data that it generated. Here we review each step in the process in more detail.

Instrumenting or Modifying the PhoneLab Platform:

Experimenting on PhoneLab begins with making changes to the Android platform. PhoneLab hosts a mirror of the AOSP allowing experimenters to use Google’s well-documented Android platform build process as well as their repo and Gerrit source control tools. Only the repo URL used during the process of downloading the platform sources is different – the rest of the workflow is identical.

We divide PhoneLab experiments into two categories: instrumentation and modification. Instrumentation consists of additional logging that records but does not alter how Android operates. Modifications modify Android itself, possibly by altering existing services or adding new features. Of course, modification experiments will usually need to generate data to evaluate their changes, and so will require instrumentation as well.

Instrumentation changes are usually fairly well-contained and unlikely to break participant smartphones, and so we fast-track approval of these experiments. Instrumentation is also an incremental process, and new useful logging messages are preserved in future PhoneLab builds. In contrast, modifications usually create larger patches, can be harder for our team to understand, and may break things. As a result, modifications are inspected more carefully before being tested and released, and require more coordination with the PhoneLab team. Unlike instrumentation, most modification experiments will only


FIGURE 1. Experimentation on PhoneLab: Experimenters begin using PhoneLab by locally developing and testing their platform changes. Once they are ready, their experiment is tested by a group of PhoneLab developers and then eventually deployed to PhoneLab participants. Researchers can request data generated by their experiment or common PhoneLab instrumentation. Data requests specify tags and date ranges and must document that the experiment has been reviewed for human subjects compliance.

be run for a limited period of time, at which point their changes will be reverted.

PhoneLab instrumentation is done using the familiar Android logcat interface, which allows logging in Java, C++ and C files – all the primary languages used by the AOSP. Standard logcat messages include a timestamp, log level (info, warning, etc.), process and thread identifiers, a tag identifying the type of message, and a message payload. The PhoneLab platform includes data collection that captures all logcat messages generated by the device, caches them locally, and then uploads them to our servers when the smartphone is plugged in and charging and has network access. Log post-processing splits the combined logs into per-device per-tag files and adds a device identifier to each log entry.

We have taken two steps to help organize and facilitate PhoneLab instrumentation. First, we have established a convention that log tags include a category and subcategory which together hint at their purpose. Examples include Network-Wifi, Power-Battery and Power-Screen. Log tags also include an institution identifying who contributed the instrumentation. Standardizing the format of log tags is important since log tags are how researchers identify requested PhoneLab data at the conclusion of their experiment. Second, we require in almost all cases that log messages be formatted in JSON for easy cross-language deserialization. PhoneLab provides experimenters with helper functions to validate that log messages contain well-formatted JSON. We also encourage developers to wrap all logging in a try-catch block for safety.

At present our PhoneLab platform sources already include a variety of useful information. Our website (https://www.phone-lab.org) contains more information about all the tags that we are currently collecting, but they record commonly needed information such as location updates; battery level changes; network transitions, measurements, and signal strength changes; and app installation and removal. Some of this information is available to apps through Android’s API, but we have found it easier to embed it in our platform image. In other cases, PhoneLab instrumentation reflects our own research interests. For example, we have instrumented the SQLite database library used by many apps, the system call interface, and also added a variety of tags to certain core elements. In all these cases, we are also recording information not typically available to apps.

Initially we had concerns about the log volume produced by instrumentation on frequently executed code paths. But we have found Android’s logging infrastructure to be surprisingly robust and low overhead, allowing us to add log messages to almost any part of the system. For example, although this was not a well thought out piece of logging, at one point we were successfully recording every screen refresh without a noticeable performance impact. However, there are certainly places where logging would generate too much output, and the impact of new instrumentation on logging volume is something we note during testing. Frequently a more intelligent approach to logging can reduce overhead while maintaining fidelity.

Eventually, with additional contributions from the mobile systems community, we hope to achieve a thoroughly instrumented Android platform source generating almost all commonly needed measurements. Once we reach that point, many experimenters may discover that their logging needs are met by instrumentation that has already been contributed by other researchers and they can skip ahead to requesting existing data. Researchers are also free to utilize our existing instrumentation for their own purposes, regardless of whether they run their platform on PhoneLab or not, and we


are happy to provide our server-side tools to anyone who wants to franchise PhoneLab at their own institution.
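
The logging conventions described in this section (structured tags, JSON payloads, and post-processing into per-device per-tag groups) can be illustrated with a short Python sketch. This is not PhoneLab's actual tooling. The `Category-Subcategory-Institution` tag layout, the tab-separated `device<TAB>tag<TAB>payload` line format, and every function name below are assumptions made purely for illustration.

```python
import json
import re
from collections import defaultdict

# Assumed tag layout: Category-Subcategory-Institution, e.g. "Power-Battery-PhoneLab".
# The article only states that tags carry a category, a subcategory, and an
# institution; the separator and ordering used here are hypothetical.
TAG_PATTERN = re.compile(r"^[A-Za-z]+-[A-Za-z]+-[A-Za-z]+$")


def is_valid_tag(tag):
    """Return True if the tag matches the assumed three-part layout."""
    return bool(TAG_PATTERN.match(tag))


def parse_payload(payload):
    """Return the payload as a dict, or None if it is not a JSON object."""
    try:
        value = json.loads(payload)
    except ValueError:
        return None
    return value if isinstance(value, dict) else None


def split_logs(lines):
    """Group records by (device, tag), mimicking per-device per-tag files.

    Each input line is assumed to look like "device<TAB>tag<TAB>json-payload".
    Malformed lines are skipped instead of raising an exception.
    """
    groups = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) != 3:
            continue
        device, tag, payload = parts
        record = parse_payload(payload)
        if not is_valid_tag(tag) or record is None:
            continue
        record["device"] = device  # attach the device identifier to each record
        groups[(device, tag)].append(record)
    return dict(groups)


# Hypothetical sample input: two valid battery records, one valid Wi-Fi record,
# one line with a non-JSON payload, and one line with a malformed tag.
sample = [
    'dev1\tPower-Battery-PhoneLab\t{"level": 87}',
    'dev1\tPower-Battery-PhoneLab\t{"level": 86}',
    'dev2\tNetwork-Wifi-PhoneLab\t{"rssi": -60}',
    'dev2\tNetwork-Wifi-PhoneLab\tnot json',
    'dev2\tbadtag\t{"x": 1}',
]
groups = split_logs(sample)
print(sorted(groups))
```

Skipping malformed lines rather than raising mirrors the advice to keep logging from breaking anything else. A request for specific tags could then be served by selecting the matching groups.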

Testing and Distributing Experiments:

Once a researcher has completed their experimental changes, we perform three rounds of testing before changes are distributed to participants. First, we use Jenkins to do continuous integration testing to catch errors that break the build process. Second, Jinghao Shi, the PhoneLab team member overseeing the distribution process, tests them on his own device. Third, we distribute changes to a half-dozen PhoneLab developers who are also participants. After one or two days have passed, a preliminary data archive is made available to the experimenter so that they can confirm that PhoneLab is collecting the data they need, and so that we can estimate how much data the experiment is generating. At the same time, developers are polled to ensure that they have not noticed any regressions during the testing period. Developer beta testing has successfully caught several bugs that would have irritated participants, including a change that broke the Android alarm clock and logging bugs in SQLite that broke almost every Android app. In both cases these regressions were quickly identified and addressed.

FIGURE 2. An Overview of PhoneLab

Note that we also rely on developers to identify more subtle regressions, such as poor interface responsiveness and increased energy consumption, rather than relying on automated tools. So far this approach has worked well, combined with proactive engagement with experimenters over issues such as energy management. In cases where we anticipate an experiment may produce high energy overhead, we ask researchers to self-limit their experiment using Android’s built-in energy measurement tools. For example, in an upcoming high-overhead network measurement experiment, researchers have agreed to limit the energy consumed by their modifications to 10% of the device’s battery capacity.

After an experiment clears the testing process, we distribute it to participants using the same over-the-air (OTA) platform update capabilities provided by Android devices and commonly used by carriers. Once a platform update is stored on the device, Android provides an API to initiate the process of rebooting into recovery mode, applying the selected update, and then rebooting again into the modified platform image. Built-in PhoneLab tools handle detecting when updates are available, transferring them to the device, and determining when to apply the update safely without disturbing the participant. Frequently even PhoneLab developers miss the update process and don’t realize that they are running a new image until they are asked to comment on its stability. Once smartphones reboot into the new platform image, new instrumentation begins generating data or new features provided by platform modifications are activated.

PhoneLab experimentation can also include an app along with platform changes. Depending on the experiment, it may be more natural to implement parts of it as an app, and in other cases the app provides the user interface to a new platform interface or features – such as in the Jouler experiment described below. Experimenters are free to distribute their app by asking participants to install it from the Google Play Store. Alternatively, we can bundle their app into the platform update itself, which ensures that it is installed on every PhoneLab smartphone and allows it to run with elevated permissions if needed. In both cases, future updates to the app that do not require platform modifications can be distributed using the Play Store.

Given that Android platform instrumentation and modification is a slow process, our goal is to run PhoneLab experiments as soon as they have passed testing. In many cases multiple data instrumentation experiments can be combined and run concurrently or alongside a single modification experiment. However, multiple platform modifications may conflict with each other and need to be run serially. We determine this on an ad-hoc basis during experiment testing.

Data Request and Analysis:

In the best case, researchers may find that PhoneLab has already been logging the data they need for their analysis. In other cases, additional instrumentation will need to be run on the testbed for a period of time to generate enough data for analysis. In either case, to access collected PhoneLab data researchers must provide documentation that their experiment has been approved for human subjects compliance – most commonly by their university’s Institutional Review Board (IRB). (Note that experiments need only be approved by a researcher’s local IRB and do not require further IRB review at the University at Buffalo.) Given the ability of platform changes to record an enormous amount of sensitive data about participants’ behavior, the IRB is a critical safeguard to ensure that this data is used and handled appropriately. However, under PhoneLab’s own testbed-specific IRB, additional IRB approval is only required when the data is used, not before it is collected.

With IRB approval researchers may request time-delimited archives for a set of tags collected from all PhoneLab devices by emailing PhoneLab administrators. Currently we do not accept blanket requests for all logged information. By default we limit tag requests to the tags that we have


documented as part of our instrumentation, but researchers may request additional tags if we determine that they are useful and the IRB approves them for release.

PhoneLab data is structured to allow researchers to import and process the data using a wide variety of log-processing tools, with the choice specific to their expertise and the type of analysis they are performing. Given the individual nature of this decision and the common format of the PhoneLab data, we have yet to devote significant resources to developing a common post-processing toolchain.

EXAMPLE EXPERIMENTS

To make the process of using PhoneLab more concrete, we next present two examples of ongoing experiments that our research group, blue (https://blue.cse.buffalo.edu), is performing using PhoneLab. Our examples include one modification experiment and one instrumentation experiment.

SQLite Logging:

SQLite is a widely deployed embedded database library that is heavily used on Android. Android both uses SQLite internally and provides interfaces encouraging apps to use SQLite to persist their own structured data. Unsurprisingly, query workloads experienced by mobile app embedded databases are unlike those experienced by database servers supporting big data applications. And the performance requirements are also quite different between interactive mobile apps trying to conserve energy and transaction-processing systems where throughput is key and energy only a second-order concern.

To measure SQLite behavior on Android we instrumented the SQLite library to record all queries and deployed our changes on PhoneLab. To protect participants’ privacy, our instrumentation strips out as much personally identifying data as possible and records only hashes of prepared statement arguments. We have completed an initial investigation using publicly available data released by 11 PhoneLab developers who agreed to release complete trace data from their phones for the month of March 2015. We collected 254 participant-days of data which contained roughly 45M queries, and an analysis of this initial dataset appeared at TPCTC in August 2015.³ We are continuing our experiment on the full testbed and have begun talks with the Transaction Processing Performance Council (TPC) on using them as the basis for a new database benchmark aimed at mobile systems: TPC-MOBILE.

This instrumentation experiment provides a good example of PhoneLab’s ability to use the platform to “get underneath” apps by instrumenting commonly used libraries and interfaces, allowing us to record information about all apps used by participants without modifying any. We have used this capability in multiple other ways, including to study filesystem access patterns (by modifying the bionic C library) and user interface behavior (by modifying Android’s base UI elements).

Jouler:

Smartphone energy management has been a continual challenge. Poor battery lifetimes may frustrate users, and wasted energy represents a missed opportunity to improve performance or otherwise enhance the user experience. Significant improvements in energy management would not only help satisfy users, but would also help support new categories of energy-hungry apps that current smartphones struggle to support.

We began studying smartphone energy management by using PhoneLab to record detailed energy consumption information in the first year that the testbed was publicly available. Our data confirmed what earlier studies had shown: that large variations exist between smartphone users and apps in terms of energy consumption and charging habits. As a result, it is difficult or impossible for a single energy management policy embedded in the platform to meet the needs and expectations of all smartphone users.

Instead, we rearchitected energy management on Android by applying the classic systems design principle of separating policy from mechanism. Our system, which we call Jouler, creates a new Android API allowing certain apps called energy managers to manage overall and per-app energy consumption on behalf of users. By exposing existing energy control mechanisms that were previously only available inside the platform, Jouler allows a wide variety of new energy management policies to be deployed using the same app marketplaces used to distribute other Android apps.

We have performed a small-scale test of Jouler’s platform modifications on PhoneLab developers and achieved promising results: six out of seven participants were able to increase their battery lifetime using one of three new energy management policies we provided. We are in the process of distributing and evaluating this modification on the entire PhoneLab testbed. As Jouler shows, modification experiments frequently begin with data collection required to design new approaches. In this case, because Jouler’s modifications added a new feature, we also needed to distribute an app to participants during the study to take advantage of the new Jouler interface. Existing PhoneLab battery level logging was used to benchmark energy consumption during the modification experiment to determine the impact of the new policies enabled by Jouler.

OTHER SMARTPHONE EXPERIMENTATION TOOLS

We believe that PhoneLab is the only public testbed allowing researchers to modify a smartphone platform codebase on hundreds of real devices. The LiveLabs⁴ testbed at Singapore Management University is the most similar facility. An ambitious project far exceeding PhoneLab in scale, LiveLabs is an urban lifestyle innovation platform using thousands of participants to explore emerging scenarios. LiveLabs benefits from robust government support and a large engineering team, but is more focused on industrial experimentation rather than academic research.

A variety of projects have attempted to perform experimentation using apps, either to directly collect data of interest to researchers (MobilLab http://mobilab.eecs.umich.edu/), as a platform for more generalized experimentation (Seattle https://seattle.poly.edu/html/), or by providing

3 https://blue.cse.buffalo.edu/papers/tpctc2015-pocketdata/

4 http://centres.smu.edu.sg/livelabs/


library support for experimentation (MobiLyzr http://mobilyzer-project.mobi/). As we have acknowledged, app marketplaces are powerful new tools for researchers interested in doing large-scale experimentation. However, the natural need for platforms to isolate apps and protect user privacy sometimes prohibits useful data collection, and it is difficult for these approaches to prototype new platform features. Both of the example PhoneLab experiments we present above could not be distributed using app marketplaces.

SUSTAINABILITY AND FUTURE

PhoneLab is unlike other systems testbeds such as PlanetLab, EmuLab, and MoteLab in that it involves actual human participants rather than just machines. Initially this presented a sustainability problem, as our original plan recruited participants using expensive and unscalable financial incentives. However, we have taken several steps to improve the testbed’s sustainability in hopes that we can continue to operate this resource for the mobile systems community for years to come with limited additional funding.

The main change was to the incentive model. In previous years, first-year participants were provided with one year of free service as an incentive to join the study, but this incentive proved ineffective, expensive, and actually may have damaged PhoneLab’s early representativeness. Rather than continuing for multiple years to amortize the initial investment, most participants left the study after the free year ended. In addition, we also suspect that a non-negligible fraction did not use the free smartphone as their primary device as instructed, instead treating it as a free Android device to fool around with or ignore altogether. In hindsight, this seems unsurprising: given something for free, people treat it as having little value. But when we discovered this after the second year of the study, it gave us the opportunity to make changes that have dramatically improved our ability to scale and sustain PhoneLab.

In the third year we ended the free-year incentive and began charging all participants. The $45 per month participants pay is almost sufficient to cover the amount PhoneLab pays to Sprint, making the study self-supporting and allowing us to scale out to a potentially unlimited number of participants. Because everyone wants a free smartphone, charging for service has made it somewhat more difficult to recruit participants. But the ownership our new participants take over their smartphone has produced better participants that actually use their PhoneLab smartphone as their primary device.

Because participants do not have any direct relationship with Sprint, we do serve as a point of contact for problems with broken or malfunctioning devices. For the last two years of the project we have employed a full-time testbed administrator to provide on-campus support to participants, and work with Sprint to address certain billing issues. We have made enough progress in streamlining testbed management to be able to eliminate this position going forward, but support and management are two parts of PhoneLab that still scale poorly with the number of participants. As a result, we have no plans to increase the size of PhoneLab through more recruiting – at least not until we can acquire additional funding to support a management position. But it is also unclear whether it is worth the large amount of effort to scale PhoneLab past its current size as a medium-size testbed. We suspect that for most research studies, 175 participants are as good as 500 or even 1,000.

Despite our success in making the testbed more sustainable, PhoneLab’s future is uncertain. It has taken us longer than planned to deliver a working smartphone platform testbed, and both the IRB approval and difficulty modifying the Android platform serve as largely unavoidable barriers to entry. However, PhoneLab is beginning to attract external users. We have just completed an experiment on behalf of researchers at the International Computer Science Institute in Berkeley investigating usage and are working with groups at both the University of Michigan and UCSD that have experiments (on network handover and smartphone security, respectively) in the final stages of approval. In other cases, researchers have decided, correctly or not, that medium-scale experiments or data collection are unnecessary to evaluate new ideas and chosen to conduct small-scale experiments instead.

At the end of the day, a medium-scale human-facing testbed like PhoneLab will always be more expensive to operate than a network of machines and more difficult to use than running experiments on a handful of graduate students. Continuing the project past September 2016 will require additional funding and community support. But until then, PhoneLab is available to help you evaluate your new smartphone platform ideas at scale. We hope that this article has helped explain why and how you might use our testbed, and we look forward to running your experiment soon.

Geoffrey Challen is an assistant professor at the University at Buffalo, where he directs PhoneLab and leads blue, a research group investigating aspects of smartphone and distributed systems, including energy efficiency, programming models, hardware-software codesign, crowdsourcing and performance.

Jinghao Shi is a third-year PhD student at the University at Buffalo. His research interests lie broadly in wireless networks and mobile systems. He received a BCS degree from the University of Science and Technology of China in 2011.

Edwin Santos is a fourth-year undergraduate student studying computer engineering and is one of the systems administrators for PhoneLab. He is also contracted as a developer for a startup by the name of Argyle Technology Group.
