Managing GPU Buffers for Caching More Apps in Mobile Systems

Sejun Kwon, Sungkyunkwan University, [email protected]
Sang-Hoon Kim, KAIST, [email protected]
Jin-Soo Kim, Sungkyunkwan University, [email protected]
Jinkyu Jeong, Sungkyunkwan University, [email protected]

ABSTRACT
Modern mobile systems cache apps aggressively in order to respond quickly to a user's request to launch an app. Since the amount of usable memory is critical to the number of cacheable apps, it is important to maximize memory utilization. Meanwhile, modern mobile apps make use of graphics processing units (GPUs) to accelerate their graphic operations and to provide a better user experience. In resource-constrained mobile systems, GPU cannot afford its own private memory but shares the main memory with CPU. This leads to a considerable amount of main memory being allocated for GPU buffers, which are used for processing GPU operations. These GPU buffers are, however, not managed effectively, so inactive GPU buffers occupy a large fraction of memory and decrease memory utilization.

This paper proposes a scheme to manage GPU buffers to increase memory utilization in mobile systems. Our scheme identifies inactive GPU buffers by exploiting the state of an app from a user's perspective, and reduces their memory footprint by compressing them. Our design carefully prevents GPU-specific issues from causing unpleasant overheads. Our evaluation on a running prototype with realistic workloads shows that the proposed scheme can secure up to 215.9 MB of extra memory from 1.5 GB of main memory and increase the average number of cached apps by up to 31.3%.

1. INTRODUCTION
Recently, mobile systems, such as smartphones and tablets, have been overtaking PCs in personal computing environments. Due to the adoption of open platforms, users can easily download and run whichever applications (or apps for short) they want from app stores [24]. In this regard, it is important for mobile systems to carefully manage the underlying resources so that the demands of running many and various apps are satisfied on resource-constrained hardware.

Memory is one of the most important resources to be managed carefully in mobile systems to satisfy these demands within its limited capacity. Many mobile platforms (e.g., Android) provide app caching as a means to achieve high responsiveness of app launching. Caching an app means keeping the app in memory in a paused state instead of killing it when the user leaves, since launching the app by spawning a new process takes a long time due to the initialization overheads in the app and the system. Accordingly, when a user wants to use a cached app again, the app can quickly respond to the user's request by resuming execution rather than being restarted from scratch. Therefore, it is crucial to cache as many apps as possible within a given limited amount of memory. Since caching an app costs the memory allocated to the app, many studies have been proposed to improve the utilization of the limited memory space, thereby increasing the number of cached apps [3, 10, 14, 16].

Among these previous studies, compressing in-memory data is a compelling solution for increasing memory utilization. Due to app caching, a large portion of memory is actually inactive but allocated for keeping the state of cached apps. By compressing such inactive memory, the system can secure more free memory and utilize it for caching more apps, caching more file data, and so forth. In practice, many smartphone manufacturers have employed similar approaches in their state-of-the-art smartphones [21, 22]. Android, one of the most popular mobile platforms, also encourages adopting memory compression in low memory conditions [3].

However, these approaches to compressing inactive in-memory data, usually referred to as a compression cache, miss an opportunity to secure more memory. The compression cache has targeted virtual memory (VM) pages, which contain the stacks and heaps of apps. In modern mobile systems, many apps leverage a graphics processing unit (GPU) to accelerate graphic operations for a high-quality user interface and realistic 3-D graphic scenery, which requires a large amount of GPU buffers along with the VM pages. These GPU buffers, however, are managed differently from the VM pages, so the conventional compression cache can neither track nor compress unused GPU buffers.

This paper proposes a scheme to compress unused GPU buffers to secure more free memory. Unlike VM pages, it is difficult to select inactive GPU buffers, since the input/output memory management unit (IOMMU) page table, with which GPU accesses a GPU buffer, does not provide the functionality to track page accesses. Moreover, the traditional demand paging approach cannot be applied to GPU buffers since GPUs are usually incapable of recovering from a page fault [19]. For these reasons, our scheme exploits the state of an app from a user's perspective. If an app goes to the background, it implies that the GPU buffers belonging to the app have become inactive, since the background app is invisible to the user and usually does not issue graphic operations. Thus, the GPU buffers of the app can be compressed to reduce their memory footprint. If the app is brought back to the foreground, any compressed GPU buffers should be decompressed, since the buffers are about to become active. In addition, we carefully manage the amount of compressed GPU buffers to control the unfairness caused by the significant variance in GPU buffer size among apps.

Figure 1: Shared memory architecture common in mobile systems

The proposed scheme is implemented on a Google Nexus 5 smartphone running the Android platform, and is evaluated with various popular applications. Evaluation results show that the proposed scheme can provide up to 215.9 MB of extra memory, which increases the average number of cached apps by up to 31.3% without imposing significant overheads associated with compression and decompression.

The rest of this paper is organized as follows. The following section describes the background and related work. Section 3 presents the motivation of this work. Section 4 describes the proposed scheme. Section 5 shows the experimental results of the proposed scheme using a running prototype. Finally, Section 6 concludes this paper and outlines future work.

2. BACKGROUND AND RELATED WORK
2.1 Background
2.1.1 Process Management in Android
Android [2] is one of the most popular platforms for mobile devices. It is comprised of the Android framework and a customized Linux kernel. The Android framework provides apps with a runtime environment on top of the kernel and manages the apps and the environment according to user interactions. The kernel controls access to hardware devices and manages system-wide resources such as CPU and memory.

Android manages apps according to the Android app life cycle [4], which is tailored to user behavior on mobile devices. It is known that users interact with many apps momentarily [9, 11, 23], which makes it burdensome for an end-user to manage the life of each app explicitly. Instead, Android caches apps as long as the system has enough free memory, and terminates one or more cached apps automatically when free memory runs low. When a user leaves an app, the app is paused and sent to the background. If the user launches the app again, it is brought back to the foreground immediately by resuming it, thereby providing a fast response time for the app launch. Meanwhile, the framework estimates the importance of each app. Many reports have revealed that app usage has temporal locality [9, 20, 23, 26]. In this sense, the framework maintains an LRU list of cached apps, named the cached app list, and estimates the importance of an app based on the LRU distance of the app in the list. The estimated importance is provided to an in-kernel memory reclamation module called the Low Memory Killer (LMK). When free memory in the system is low, LMK terminates the least important app and reclaims the memory associated with it.

An app might be cached or not according to the memory status and the recency of app use. An app is launched via app start if it is not cached, or via app resume if it is cached. The app resume is considered more efficient than the app start in terms of user experience and energy, since the app start involves many software activities including spawning a process to host the app, loading data from a storage device, and initializing the app.
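To make the selection policy concrete, the following is a minimal userspace sketch of LRU-distance-based victim selection. It is an illustration only: the struct, the importance mapping, and pick_victim() are hypothetical, not the actual Android framework or LMK code.

    /* A sketch of LRU-distance-based app importance. The real framework
     * assigns per-process adjustment values that the in-kernel Low Memory
     * Killer compares against free-memory thresholds; this model keeps
     * only the ordering property described in the text. */
    #include <stdio.h>

    #define MAX_CACHED_APPS 8

    struct cached_app {
        const char *name;
        int lru_distance;   /* 0 = most recently used */
    };

    /* Lower importance = better victim for the Low Memory Killer. */
    static int importance(const struct cached_app *app)
    {
        return MAX_CACHED_APPS - app->lru_distance;
    }

    static const struct cached_app *pick_victim(const struct cached_app *apps,
                                                int n)
    {
        const struct cached_app *victim = &apps[0];
        for (int i = 1; i < n; i++)
            if (importance(&apps[i]) < importance(victim))
                victim = &apps[i];
        return victim;
    }

    int main(void)
    {
        struct cached_app apps[] = {
            { "Gmail", 0 }, { "Camera", 3 }, { "Angry Birds", 6 },
        };
        /* The app deepest in the LRU list is terminated first. */
        printf("victim: %s\n", pick_victim(apps, 3)->name);
        return 0;
    }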

2.1.2 GPU Buffer
A GPU buffer is a memory object that is used for the input and output of GPU operations. GPU reads data from a GPU buffer, processes it, and writes the result to a GPU buffer. Thus, GPU buffers usually contain GPU commands, textures for surfaces, and frame buffers.

There are two types of GPU memory architectures, according to the location where GPU buffers are allocated. One is the private GPU memory architecture, in which GPU buffers are allocated from private GPU memory that is very fast and optimized for concurrent access from many GPU cores. This type of architecture is prevalent in traditional desktops and servers, which demand very high GPU performance, and most GPU memory management studies have been conducted on this architecture [15, 17, 18, 28].

In contrast, mobile systems usually employ a shared memory architecture in which GPU buffers are dynamically allocated from the main memory. Figure 1 depicts the shared memory architecture. GPU cannot afford private memory in mobile devices, since the devices are usually built on a system-on-a-chip (SoC) architecture in which resources are highly limited. GPU references a GPU buffer via an IOMMU page table, which is private to a process. Like the process page table, the IOMMU page table is set up according to the current on-going GPU operation. Throughout the paper, we assume the shared GPU memory architecture.

A GPU buffer is managed by the graphic device driver. To create a GPU buffer, the device driver allocates buffer pages and maps them into a GPU address space via the IOMMU page table. The mapping in the MMU page table is updated by the page fault handler on a per-page basis to amortize mapping overheads. Contrary to the MMU page table, however, once a GPU buffer is allocated, the mappings for all pages in the buffer are established together, since GPU is not capable of demand paging [19]. Also, a buffer page is usually a superpage comprised of a number of physically contiguous 4 KB pages, to reduce the IOMMU mapping overhead and to improve input/output translation lookaside buffer (IOTLB) efficiency [5, 32]. An allocated GPU buffer is indexed by its GPU-side address in its owner process.

Figure 2: Components related to GPU buffers
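The bookkeeping just described can be pictured with a small C sketch. All type and field names here are hypothetical stand-ins for what a real driver keeps per buffer; the paper does not publish its exact data structures.

    /* A sketch of per-process GPU buffer bookkeeping implied by the text
     * and Figure 2 (hypothetical names; actual drivers differ in detail). */
    #include <stddef.h>
    #include <stdint.h>

    struct buffer_page {              /* one superpage of the buffer        */
        uintptr_t phys;               /* first of N contiguous 4 KB frames  */
        size_t    size;               /* e.g., 1 MB, 64 KB, or 4 KB         */
        struct buffer_page *next;
    };

    struct gpu_buffer {
        uint64_t  gpu_addr;           /* GPU-side virtual address: the key
                                         by which the owner process indexes
                                         this buffer                        */
        struct buffer_page *pages;    /* superpages, all mapped in the IOMMU
                                         page table at allocation time      */
        unsigned long *handles;       /* one zsmalloc handle per 4 KB chunk,
                                         filled in when the buffer is
                                         compressed (Section 4.3)           */
        size_t    nr_chunks;
        int       compressed;         /* nonzero once pages are reclaimed   */
    };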
2.2 Related Work
Due to the advantages of caching apps, many researchers have proposed schemes for increasing memory utilization and caching more apps in mobile systems. Data compression is a compelling technique to increase the effective amount of memory by reducing the memory footprint of data. For decades, great effort has been devoted to data compression techniques from both hardware and software points of view. We focus here on software approaches.

Wilson's early work [29] suggested inserting an additional level into the memory hierarchy that keeps data in a compressed form. This idea was developed further into the compression cache, in which uncompressed memory acts as a cache of compressed data. Rarely referenced data is compressed to reduce its memory footprint, whereas frequently accessed data is kept uncompressed, so that the overheads incurred by compression and decompression are minimized [8, 27]. The work also attempted to adaptively balance the amount of memory between compressed and uncompressed data. These approaches are, however, outdated for modern memory management systems, which usually manage orders of magnitude more memory.

Recently, ZRAM (previously known as compcache) [6, 10] has exploited the key idea of the compression cache. Instead of inserting a layer into the memory hierarchy, ZRAM offloads data compression and decompression to the block device layer for the sake of modularity and simplicity. ZRAM introduces a general-purpose virtual block device that stores data in RAM in a compressed form, and utilizes the block device as a swap device. This approach has been accepted into the Linux kernel mainline and is used for dealing with high memory demands in recent mobile systems [3, 21, 22]. Zswap [13] employs a similar approach; unlike ZRAM, however, Zswap can utilize a physical block device as well as the main memory to store compressed in-memory data. Yang et al. [31] employed a similar idea for embedded devices. Kim et al. [16] applied memory deduplication to mobile systems and proposed several techniques to reduce the CPU costs associated with finding identical pages. However, these studies only focus on the VM pages containing stacks, heaps, memory-mapped files, and so forth, and do not take into account GPU buffers, which occupy a considerable amount of memory in mobile systems.

3. MOTIVATION
The trend in mobile devices toward larger and higher-resolution displays also drives the size of GPU buffers to be larger. Since the GPU buffer is allocated from the main memory in the shared GPU memory architecture, GPU buffers occupy a considerable fraction of the main memory. However, traditional memory management schemes have disregarded GPU buffers and do not reclaim them even if they are inactive.

Figure 3 presents the fraction of GPU buffers in the memory footprint, measured by the unique set size (USS), of popular Android apps when the apps are in the foreground and in the background. We modified the stock kernel to track the size of the GPU buffers of each app. We present USS instead of the resident set size (RSS), since the USS of an app represents the caching cost of the corresponding app more accurately. The number on top of each bar indicates the memory footprint of the app in MBs.

When the apps are in the foreground, they use a large fraction of memory as their GPU buffers, whose size ranges from 14.1% to 72.2% (36.4% on average) of the USS. This is because a foreground app actively utilizes GPU to accelerate graphic operations for drawing elements on the screen. In particular, game apps (Candy Crush Saga, Subway Surfer, and Angry Birds in this example) allocate GPU buffers extensively.

When the apps go to the background, many apps voluntarily deallocate most of their GPU buffers. The remaining fraction of GPU buffers is, however, not negligible for some apps in spite of the deallocation. Meanwhile, some apps do not release but keep most of their GPU buffers. This behavior makes sense in that the app might have to create the GPU buffers again upon the next resume if the buffers were released, thereby delaying the resume time of the app. For these reasons, 26.2% of the memory footprint of a background app is occupied by GPU buffers on average.

The GPU buffers belonging to a background app are not likely to be used in the near future, since the background app is usually paused and does not request GPU operations. For this reason, the GPU buffers belonging to background (or cached) apps are one of the major culprits of decreased memory utilization. If the GPU buffers were reclaimed and utilized for caching more apps, we could greatly improve memory utilization. However, traditional memory management schemes, such as paging and the compression cache [10, 13], neither consider nor manage the memory allocated for GPU buffers.

4. APPROACH
The observations described in the previous section motivated us to actively manage the GPU buffers of background apps at the system level. This section describes in detail how we manage GPU buffers and what we have considered.

Figure 3: A breakdown of memory footprint in popular Android apps: (a) in the foreground; (b) in the background

4.1 Design Considerations
Our key idea is to reclaim the GPU buffers associated with apps dormant in the background. To realize this idea, we first considered swapping GPU buffers out to external storage. However, the GPU buffers of an app can amount to hundreds of MBs, which implies significant I/O traffic to the storage device. This is not desirable in mobile systems, as they are usually equipped with flash memory-based storage that wears out as data is written. Also, carving out a part of the storage device and dedicating it as swap space is impractical for resource-constrained mobile systems.

Instead, our approach is to compress GPU buffers rather than swap them out. We chose this approach because we expected GPU buffers to be highly compressible, since the data in the buffers usually represents textures or GPU commands, in which regularity is inherent [30]. Our preliminary evaluation shows that the compression ratio (i.e., the ratio of the compressed size to the original size) of GPU buffers is 25.4% on average, which confirms this expectation in practice (see Section 5 for details).

Figure 4: Overall system architecture

There are three challenges in compressing a GPU buffer. First, the proposed scheme should be transparent to apps as well as to the framework, so that the scheme can be employed without modifying them. This is critical, as it is impractical to control or modify the large number of published apps. Second, we have to pick the proper GPU buffers to compress. Otherwise, any gains obtained from the compression can be offset by the overhead incurred by performing the compression and decompression. It goes without saying that this overhead should be minimized as well. Finally, we must consider any influence on user experience, since the end-users of mobile systems are very sensitive to latency [25].

4.2 System Architecture Overview
Figure 4 presents the overall architecture of the proposed scheme. Recall that the Android framework notifies the kernel whenever an update to the cached app list occurs. Our scheme identifies an app to compress (i.e., one dormant in the background) by hooking this notification. If such an app is found, its GPU buffers are compressed so that their memory footprint is reduced. Subsequent GPU operations from the cached app are postponed until the app is brought back to the foreground, to prevent unnecessary decompression.

The compressed buffers are decompressed in the reverse order of the compression. When an app is brought back to the foreground, the system identifies the app by hooking the notification from the framework, and grants GPU operations to the app. If there is a pending operation, or when the app requests a GPU operation, the kernel restores the compressed GPU buffers of the app and then dispatches the GPU operation so that GPU processes it.

An app in the system can be assigned a state according to the status of its GPU buffers. Figure 5 outlines the states and the transitions among them. When an app is launched and is in the foreground, the app is in the S0 state. When the app is sent to the background, its state changes to S1 and the app becomes a candidate for GPU buffer compression. As the GPU buffers of the app are compressed, it transitions to S2 (partially compressed) or S3 (fully compressed). In the S1 to S3 states, GPU operations from the app are not scheduled but suspended. When the app is brought to the foreground, its state changes to S4, in which an issued GPU operation can be processed. If the app has any pending GPU operation or requests a new GPU operation, its state changes to S0 after all compressed GPU buffers are decompressed.

Figure 5: State transitions of an app
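The state machine of Figure 5 can be summarized in code. The following sketch uses the state names from the figure; the events and transition rules are paraphrased from the text above, not taken from the actual implementation.

    /* A compact model of the app states in Figure 5 (paraphrased). */
    enum app_state {
        S0_FOREGROUND,            /* GPU operations scheduled normally   */
        S1_TO_COMPRESS,           /* in background, compression pending  */
        S2_PARTIALLY_COMPRESSED,  /* some buffers compressed (Fair cap)  */
        S3_FULLY_COMPRESSED,      /* all buffers compressed              */
        S4_FOREGROUND_COMPRESSED, /* visible again; decompress on first
                                     pending or newly issued operation   */
    };

    enum app_event { EV_TO_BACKGROUND, EV_COMPRESS_SOME, EV_COMPRESS_ALL,
                     EV_TO_FOREGROUND, EV_GPU_OP };

    static enum app_state step(enum app_state s, enum app_event e)
    {
        switch (e) {
        case EV_TO_BACKGROUND: return S1_TO_COMPRESS;
        case EV_COMPRESS_SOME: return s == S1_TO_COMPRESS
                                      ? S2_PARTIALLY_COMPRESSED : s;
        case EV_COMPRESS_ALL:  return S3_FULLY_COMPRESSED;
        case EV_TO_FOREGROUND: return s == S0_FOREGROUND
                                      ? s : S4_FOREGROUND_COMPRESSED;
        case EV_GPU_OP:        /* in S1-S3 the op is merely suspended;
                                  in S4 it triggers full decompression  */
                               return s == S4_FOREGROUND_COMPRESSED
                                      ? S0_FOREGROUND : s;
        }
        return s;
    }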

4.3 GPU Buffer Compression
Our scheme relies on the notification from the framework to identify a candidate app for compression, and the notification is invoked in the system server context, which maintains the framework. Hence, performing the time-consuming compression in this context can impair system integrity as well as system performance by delaying other time-critical management tasks. Thus, we separate the buffer compression from the app notification. When a notification indicates that an app needs to be compressed, the notification returns immediately after inserting the app into the compression pending list shown in Figure 4. The list is sorted by the distance from the tail of the cached app list, so that its head points to the app at the LRU position of the cached app list. When the system becomes idle and the compression pending list is not empty, the kernel wakes up a kernel thread called the compress daemon, which performs the GPU buffer compression. The compress daemon removes an app from the head of the compression pending list and compresses the GPU buffers associated with the app. The compress daemon iterates this routine until the pending list becomes empty. The compress daemon is set to run with the lowest priority (i.e., a nice value of 19) so that the CPU-intensive data compression does not affect user experience by competing for CPU with a foreground activity visible to the user.

To compress a GPU buffer, the daemon scans the buffer page list in the GPU buffer and compresses each buffer page in the list. Due to the IOMMU overhead and IOTLB efficiency, a buffer page is usually a superpage comprised of a number of physically contiguous 4 KB pages. The daemon compresses the buffer page at a 4 KB granularity and produces compressed data chunks. These chunks, having various sub-page sizes, are stored with the zsmalloc memory allocator [12], which is optimized for data like the compressed chunks. When the daemon requests memory space for a chunk, the zsmalloc allocator returns a handle for the specified size. The daemon copies the compressed chunk to the space pointed to by the handle and keeps the handle in the handle array in the GPU buffer descriptor. The buffer page is returned to the kernel when the entire buffer page has been compressed and stored.

Before freeing a buffer page, any mapping to the buffer page should be updated accordingly. The buffer pages are mapped in the process page table on the CPU side and in the IOMMU page table on the GPU side. The process page table entries corresponding to the buffer page are updated to be invalid, so that a later reference to the page triggers the page fault handler, which deals with the compressed page. For the IOMMU page table entry, we can skip the unmapping, since a further access to the page will not occur: no GPU operation from the app is scheduled until the subsequent buffer decompression updates the mapping properly.
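A userspace sketch of the daemon's inner loop is shown below. LZ4_compress_default() and LZ4_COMPRESSBOUND() are the real liblz4 one-shot API, standing in for the kernel's LZ4; store_chunk(), unmap_cpu_pte(), and free_page_to_kernel() are hypothetical stand-ins for zsmalloc and the kernel page allocator.

    /* Sketch of compressing one superpage at a 4 KB granularity. */
    #include <lz4.h>
    #include <stdlib.h>

    #define CHUNK 4096          /* compression granularity: one 4 KB page */

    extern unsigned long store_chunk(const char *data, int len); /* ~zsmalloc */
    extern void unmap_cpu_pte(char *superpage, size_t size);     /* CPU PTEs
                                   invalidated; IOMMU PTEs left stale on
                                   purpose (no GPU op is scheduled)        */
    extern void free_page_to_kernel(char *superpage, size_t size);

    /* Returns the number of chunks produced; handles must have room for
     * size / CHUNK entries. */
    static size_t compress_buffer_page(char *superpage, size_t size,
                                       unsigned long *handles)
    {
        char dst[LZ4_COMPRESSBOUND(CHUNK)];
        size_t n = 0;

        for (size_t off = 0; off < size; off += CHUNK) {
            int clen = LZ4_compress_default(superpage + off, dst,
                                            CHUNK, sizeof(dst));
            if (clen <= 0)
                abort();                 /* cannot happen: dst capacity is
                                            LZ4_compressBound(CHUNK)       */
            handles[n++] = store_chunk(dst, clen);
        }
        unmap_cpu_pte(superpage, size);  /* later CPU access will fault    */
        free_page_to_kernel(superpage, size); /* reclaim only after every
                                            chunk is safely stored         */
        return n;
    }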
4.4 GPU Buffer Decompression
It is impossible to decompress compressed pages on demand, because GPUs for mobile systems are generally incapable of handling a memory fault [19]. When a page fault occurs on CPU, execution can resume after recovering from the fault via a page fault handler. In contrast, many GPUs in mobile devices consider a page fault unrecoverable and reset the GPU cores upon a fault. This incapability leads to a design in which all GPU buffers must be ready prior to issuing any GPU operation to GPU. In other words, the compressed buffers of an app should be completely decompressed when the app comes back to the foreground.

The restoration of compressed GPU buffers is done in the reverse order of the buffer compression. For each compressed buffer page in the GPU buffers of an app, the corresponding buffer page is allocated from the system memory allocator and filled by decompressing the compressed chunks pointed to by the zsmalloc handles. After recovering the buffer page, the chunks pointed to by the handles are freed. The restoration is iterated over every compressed GPU buffer. We deliberately employ this per-buffer-page restoration policy to reduce the memory overhead of the decompression. A compressed chunk cannot be deallocated until it has been restored to a buffer page. If the entire GPU buffer were allocated at once, the system would have to hold the GPU buffer and its compressed chunks at the same time, which incurs a high memory overhead. In our policy, in contrast, the allocation of a buffer page and the deallocation of its compressed chunks overlap, so the required maximum memory footprint is lowered substantially.
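A matching sketch of the per-buffer-page restoration follows. Again, the helpers are hypothetical; LZ4_decompress_safe() is the real liblz4 call. The key property is visible in the loop: each chunk is freed right after it is restored, so the peak overhead is bounded by roughly one superpage.

    /* Sketch of restoring one superpage from its compressed chunks. */
    #include <lz4.h>
    #include <stddef.h>
    #include <stdlib.h>

    #define CHUNK 4096

    extern char *alloc_superpage(size_t size);           /* contiguous pages */
    extern int   load_chunk(unsigned long handle, char *dst, int cap);
    extern void  free_chunk(unsigned long handle);       /* ~ zs_free        */
    extern void  update_iommu_pte(char *superpage, size_t size);

    static char *restore_buffer_page(const unsigned long *handles, size_t n)
    {
        char compressed[LZ4_COMPRESSBOUND(CHUNK)];
        char *superpage = alloc_superpage(n * CHUNK);

        for (size_t i = 0; i < n; i++) {
            int clen = load_chunk(handles[i], compressed, sizeof(compressed));
            int dlen = LZ4_decompress_safe(compressed, superpage + i * CHUNK,
                                           clen, CHUNK);
            if (dlen != CHUNK)
                abort();            /* a stored chunk always decodes to 4 KB */
            free_chunk(handles[i]); /* overlap dealloc with restoration      */
        }
        update_iommu_pte(superpage, n * CHUNK); /* CPU PTEs are fixed lazily
                                            by the page fault handler        */
        return superpage;
    }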

As a buffer page is restored in an app, the corresponding entry in the IOMMU page table is updated accordingly. Since GPU always accesses GPU buffers via virtual addresses rather than physical addresses, relocating buffer pages does not cause any problems for GPU-side accesses. Contrary to the IOMMU page table, the process page table is updated later by the page fault handler on a per-page basis to amortize the mapping overhead. When a page is referenced, a page fault occurs and the page fault handler is invoked. The handler finds the corresponding GPU buffer descriptor by looking up an index that maps the page address to the corresponding GPU buffer. If that GPU buffer is compressed, the handler restores it. Then, the corresponding page table entry is updated to point to the buffer page in the GPU buffer.

Decompressing GPU buffers upon a request for a GPU operation makes sense in that a GPU operation requires every GPU buffer to be ready. This policy, however, cannot filter out unnecessary buffer decompression initiated by an app in the background. Usually, an app in the background does not request GPU operations, since any graphical update is not visible to the user. However, we found that some apps generate GPU operations even though they are in the background, and thereby trigger buffer decompression. These GPU operations can be deferred in most cases, because their outcome is unlikely to be user-perceivable until the app comes to the foreground. For this reason, we can eliminate unnecessary buffer decompression by considering the status of apps (i.e., foreground or background). Thanks to the framework, the kernel is aware of the status of each app. We modified the original decompression policy to consider the status of an app as follows: if an app in the background requests a GPU operation, the operation is not scheduled to GPU but postponed until the app comes to the foreground. When the app comes to the foreground, any pending operations are dispatched to the GPU operation queue, which triggers the buffer decompression accordingly. In this way, we can prevent unnecessary buffer decompression. We verified that this scheme neither impairs the correctness of apps nor incurs any noticeable delay on an app's resume throughout the evaluation. However, there might be concerns about apps that generate a number of GPU operations in the background. For example, an app can utilize GPU for general-purpose computations with OpenCL [1]. We believe that the GPU device driver and the framework can identify this kind of GPU access and let these GPU operations be scheduled to GPU after decompressing the GPU buffers of the app.
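The deferral policy can be captured in a few lines. This is a sketch with hypothetical names, not the actual driver hook:

    /* Sketch: GPU work from a background app is queued instead of
     * triggering decompression; the queue is drained on foreground. */
    #include <stdbool.h>
    #include <stddef.h>

    struct gpu_op;
    extern bool app_in_foreground(int pid);
    extern void enqueue_pending(int pid, struct gpu_op *op);  /* defer       */
    extern struct gpu_op *dequeue_pending(int pid);           /* NULL: empty */
    extern void decompress_all_buffers(int pid);              /* Section 4.4 */
    extern void dispatch_to_gpu(struct gpu_op *op);

    void submit_gpu_op(int pid, struct gpu_op *op)
    {
        if (!app_in_foreground(pid)) {
            /* Output is invisible: defer instead of decompressing now. */
            enqueue_pending(pid, op);
            return;
        }
        decompress_all_buffers(pid);   /* no-op if nothing is compressed */
        dispatch_to_gpu(op);
    }

    /* Called on the framework's foreground notification. */
    void on_app_foreground(int pid)
    {
        struct gpu_op *op;
        while ((op = dequeue_pending(pid)) != NULL)
            submit_gpu_op(pid, op);    /* first op triggers decompression */
    }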
4.5 Taming Overheads
Unlike the decompression, we do not have to compress all GPU buffers at once, nor insist on a one-size-fits-all policy. In fact, compressing all GPU buffers yields the most extra memory, but it may result in inefficiency and unfairness among apps. Assume a trending game app that is frequently used. If its GPU buffers are compressed whenever the app goes to the background, the GPU buffers are repeatedly compressed and decompressed as the app moves back and forth between the foreground and background. This is undesirable, since the compression and decompression expend precious system resources. Also, if the app utilizes a large amount of GPU buffers, it will always experience a long delay to decompress all GPU buffers whenever it is resumed; this is unfair to the app.

In this sense, we give each cached app a cap and adjust the cap dynamically with respect to the likelihood of the app's reuse. Fewer GPU buffers are compressed for an app that is more likely to be used soon. Research on human behavior on mobile devices indicates that recency is one of the simplest yet most effective metrics to predict the likelihood of an app's reuse [9, 20, 23, 26]. Thus, we employ a linear capping policy that sets the cap for the foreground app to 0 MB and increases the cap linearly as the app shifts toward the LRU position of the cached app list. We chose the linear policy for the sake of simplicity; evaluation of more sophisticated policies is left as future work.

We limit the maximum value of the cap so that the delay incurred while decompressing buffers does not interfere much with user experience. According to research on human-computer interaction [25], a user notices a delay when it exceeds 150 ms and begins to be annoyed when it exceeds one second. For this reason, 575 ms, the median of these two thresholds, is a reasonable choice for the maximum allowable delay incurred by the decompression. This 575 ms is equivalent to around 100 MB of uncompressed data in our evaluation system with the LZ4 compression algorithm. Thus, we pick 100 MB as the hard limit of the cap in our system. The proper value for the limit could differ on other systems, but it can be obtained by the same approach.

To incorporate the policy, we modified the compress daemon to compress GPU buffers incrementally. As Android caches up to eight apps, the cap is increased by 12.5 MB for each app from the MRU position of the cached app list. When the compress daemon picks up an app from the compression pending list, it compares the current cap with the amount currently compressed. If the compressed amount is lower than the cap, the compress daemon compresses more GPU buffers until the compressed amount exceeds the cap or no uncompressed buffer remains.
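The capping policy reduces to a small amount of arithmetic. The following sketch uses the constants from the text (12.5 MB per LRU step, 100 MB hard limit, roughly 575 ms of LZ4 decompression on our test device); the helper functions are hypothetical:

    /* The linear capping policy in code form (a sketch). */
    #include <stddef.h>

    #define MB            (1024 * 1024UL)
    #define CAP_STEP      (25 * MB / 2)   /* 12.5 MB per LRU position  */
    #define CAP_HARD_MAX  (100 * MB)      /* ~575 ms of decompression  */

    /* lru_index: 0 for the foreground/MRU app, growing toward LRU. */
    static size_t compression_cap(unsigned int lru_index)
    {
        size_t cap = (size_t)lru_index * CAP_STEP;
        return cap < CAP_HARD_MAX ? cap : CAP_HARD_MAX;
    }

    /* Incremental pass of the compress daemon over one app. */
    extern size_t compressed_bytes(int pid);
    extern size_t compress_some_buffers(int pid, size_t budget);

    void compress_up_to_cap(int pid, unsigned int lru_index)
    {
        size_t cap = compression_cap(lru_index);
        while (compressed_bytes(pid) < cap) {
            if (compress_some_buffers(pid, cap - compressed_bytes(pid)) == 0)
                break;                /* no uncompressed buffers remain */
        }
    }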

5. EVALUATION
5.1 Methodology
We implemented a proof-of-concept prototype of the proposed scheme on a Google Nexus 5 smartphone running Android 4.4.3 (KitKat) and the Linux kernel 3.4.0. The smartphone is built around a Qualcomm MSM8974 Snapdragon 800 system-on-a-chip (SoC), which is equipped with a quad-core 2.26 GHz CPU, an Adreno 330 GPU, and 2 GB of RAM.

We evaluated the scheme with micro-benchmarks and an in-house benchmark that launches apps in a given order. We ran the benchmark with a workload comprised of 256 app launches of 22 popular apps belonging to various categories, according to the smartphone usage reported by the LiveLab study [23]. Table 1 lists the apps we used and their composition in the workload (a sketch of one way to generate such a launch sequence appears at the end of this subsection).

Table 1: The list of apps in the evaluation workload and their fractions

  Category        Apps                                            %
  Productivity    Adobe Reader, Evernote, Calendar                6.6
  News            ESPN Sports Center, CNN, The Weather Channel    4.3
  Games           Angry Birds, Subway Surfer, Candy Crush Saga   18.4
  Communication   Gmail, Hangout, Facebook, Google+,             38.3
                  Flipboard, Feedly
  References      Google Maps                                     6.6
  Multimedia      Movie, Camera, Gallery, Youtube                11.7
  Shopping        Play Store                                      4.7
  Browser         Browser                                         9.4

As a baseline, we used the stock vanilla kernel, which is referred to as Van. We evaluated two compression configurations—Full and Fair—both of which use the LZ4 compression algorithm [7]. Full refers to the compression policy that compresses all GPU buffers at once, whereas Fair refers to the one that compresses incrementally with the cap described in Section 4.5. Recall that Fair compresses 12.5 MB at a time and up to 100 MB per app.

Throughout the evaluation, we used the performance CPU frequency governor to eliminate unmanageable performance variance. Also, the device was connected to the network over a 5 GHz WiFi channel not interfered with by other access points or nearby devices. Other parameters and configurations were set to their defaults unless otherwise specified.
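For illustration, a driver program could generate a launch sequence matching the Table 1 fractions as follows. This is one plausible sketch, not the authors' actual in-house benchmark:

    /* Generate a reproducible 256-launch sequence weighted by the
     * Table 1 category fractions (illustrative only). */
    #include <stdio.h>
    #include <stdlib.h>

    struct category { const char *name; double percent; };

    static const struct category mix[] = {
        { "Productivity", 6.6 },   { "News", 4.3 },        { "Games", 18.4 },
        { "Communication", 38.3 }, { "References", 6.6 },  { "Multimedia", 11.7 },
        { "Shopping", 4.7 },       { "Browser", 9.4 },
    };

    static const char *pick_category(void)
    {
        double r = 100.0 * rand() / ((double)RAND_MAX + 1), acc = 0;
        for (size_t i = 0; i < sizeof(mix) / sizeof(mix[0]); i++) {
            acc += mix[i].percent;
            if (r < acc)
                return mix[i].name;
        }
        return mix[0].name;   /* fractions sum to 100 */
    }

    int main(void)
    {
        srand(42);            /* fixed seed => reproducible order */
        for (int i = 0; i < 256; i++)
            printf("%3d: launch an app from %s\n", i, pick_category());
        return 0;
    }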

Figure 6: The amount of GPU buffers and their compressibility

Figure 7: Average app start time of apps

Figure 8: Comparison of the average app resume time

5.2 Micro-benchmark Analysis
First, we estimate the feasibility of the GPU buffer compression. We measured the size of the original GPU buffers and of their compressed counterparts when an app is in the background. Figure 6 compares the two; note that the y-axis is in log scale. The size varies greatly across apps, ranging from 2.3 MB (Camera) to 173.5 MB (Candy Crush Saga). These GPU buffers are compressed to between 0.02 MB and 68.5 MB, which occupies only 25.4% of the original size on average. A further analysis reveals that the high compressibility comes from a high fraction of zero pages in the buffers, which compress very well. In fact, we tried to exploit these zero pages by handling them in a separate way. However, it turned out that LZ4 already deals with zero pages very efficiently in terms of time and space, returning scarce benefit in spite of the engineering effort. Thus, we decided not to handle zero pages separately.

The next evaluation concerns the time to launch an app. To measure the time, we recorded the screen of the device as 30 frames-per-second (FPS) video while launching an app via app start and app resume. Then we analyzed the video frame-by-frame offline and obtained the time to launch the app. The given FPS of the video provides a 33.3 ms resolution in time. Figures 7 and 8 summarize the average and standard deviation of four launch times of each app. The results from Full and Fair are omitted from the app start analysis (Figure 7), as the app start is irrelevant to the buffer compression. Also, Figure 7 has a different scale on the y-axis compared to Figure 8, because app start times and resume times lie in quite different ranges.

In every app, the app start takes longer than its app resume, by 1.3x to 28.7x. This suggests that caching an app can significantly improve user experience, and that an optimization for mobile systems has to keep this in mind. In most cases, the resume time of Full is increased by less than 100 ms compared to Van. However, the apps with large GPU buffers experience a noticeable increase in their resume time, which takes up to 0.67 second. This implies that the GPU buffer compression does not impair user experience in most cases, but can cause an undesirable latency increase for an app using a large amount of GPU buffers. We believe that this result justifies the necessity of the Fair policy.

5.3 Analysis with Realistic Workload
First of all, we report the number of app resumes and the number of app kills caused by the LMK. These are popular and practical metrics for measuring the overall system performance of mobile systems. We conducted the evaluation on two memory size configurations to further understand the sensitivity of the scheme to the system memory size. We compared the original memory configuration (2 GB) with a reduced configuration (1.5 GB) obtained by limiting the system memory with a kernel boot parameter. Figure 9 summarizes the results.

Figure 9: Count of app resumes among the entire workload

On both memory configurations, Full outperforms Van and Fair. Specifically, Full increases the number of app resumes by up to 26.8% while decreasing the number of app kills by the LMK by 68.1% compared to Van. This is because Full provides more effective memory to the system by reducing the memory footprint of GPU buffers, thereby allowing the system to cache more apps. Fair positions itself between Van and Full, as it also provides extra memory to the system, but not as much as Full does. From these results, we can confirm that the proposed buffer compression scheme is practical and effective, and that it can greatly improve overall user experience as well as system performance. For the rest of the paper, we present the results obtained from the 1.5 GB memory configuration, since they show the gains of our scheme more clearly, yet the trends are the same as those from the 2 GB configuration.

To understand the implications of the GPU buffer compression for user experience, we analyze the app resume time while running the workload. Unfortunately, we cannot perform the video analysis described in the micro-benchmark analysis, since it is infeasible for this workload; a single workload run takes around 100 minutes, which implies we would have to skim a 100 minute-long video and analyze 256 app launches frame-by-frame. Instead, we use the launch time reported by the Android framework. Although it is not as accurate as the video analysis, we believe it helps illustrate the implications of our scheme. Figure 10 summarizes the results. A box represents the average resume time, and the bar at the top of a box indicates its standard deviation. Note that an app without a box was not resumed during the benchmark, and an app without a bar was resumed only once.

For most apps, Van and Full take the minimum and the maximum resume time, respectively, and Fair falls between them. Specifically, the apps experience a longer resume time in Full than in Van, increased by 122.8 ms on average and by 1039.6 ms in the worst case. With the Fair policy, on the other hand, the resume time is increased by only 50.2 ms on average and 352.4 ms in the worst case. This result shows that the Fair policy effectively tames the latency caused by the decompression, thereby providing fairness among apps in terms of user experience. The apps that do not follow this pattern (such as Camera, ESPN Score, Feedly, Flipboard and Play Store) can be explained by the confidence of their samples; they have only one app resume, or overlapping confidence ranges that prevent us from drawing a conclusion.

To quantify the memory saving obtained by the GPU buffer compression, we sampled the original buffer size and the current buffer size of apps every second while running the workload. The current buffer size is obtained by aggregating the size of the uncompressed GPU buffers and the space allocated by the zsmalloc allocator. We obtained the memory saving by subtracting the current buffer size from the original buffer size. Figure 11 shows the cumulative ratio of the samples over the amount of memory saving. We observe that a substantial memory saving of up to 215.9 MB can be obtained using Full. Fair saves less memory than Full, up to 130.8 MB, but the saved amount is still considerable. The median memory saving is 63.7 MB for Fair and 112.2 MB for Full.
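The sampled metric is a simple difference, shown here as a sketch with hypothetical accessors:

    /* Memory saving per app, as defined in the text: the current buffer
     * size aggregates uncompressed GPU buffers and zsmalloc space. */
    #include <stddef.h>

    extern size_t original_buffer_bytes(int pid);
    extern size_t uncompressed_buffer_bytes(int pid);
    extern size_t zsmalloc_bytes(int pid);

    static size_t memory_saving(int pid)
    {
        size_t current = uncompressed_buffer_bytes(pid) + zsmalloc_bytes(pid);
        return original_buffer_bytes(pid) - current;
    }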

Figure 10: Average app resume time during the workload

Figure 11: Memory saving obtained by the buffer compression

Figure 13: Breakdown of CPU time per mode

The framework can utilize the saved memory for caching more apps. To further understand the influence on app caching, we collected the number of cached apps every second while running the workload, and summarized the results in Figure 12. Recall that Android caches up to eight apps. Every configuration maintains a decent number of cached apps. However, Van caches fewer apps than Fair and Full, as it cannot utilize the extra memory saved by the buffer compression. Specifically, the average number of cached apps is 4.38, 5.22, and 5.75 for Van, Fair, and Full, respectively. From this result, we can confirm that the GPU buffer compression allows the system to cache more apps.

Figure 12: The number of cached apps during the workload

Lastly, we analyze the CPU time of the configurations to assess the energy efficiency of the proposed scheme. Figure 13 shows a breakdown of the CPU time measured while running the workload. Surprisingly, we found that the CPU utilization of Fair and Full is comparable to or 1.2% lower than that of Van, which is counter-intuitive since data compression and decompression consume CPU cycles. Specifically, Full spends less CPU time in user mode than Van. We attribute this reduction of CPU time to the increased number of app resumes: an app start spends more CPU time than an app resume. Even if the buffer compression might contribute some time in system mode, the reduced number of app starts improves the overall CPU time. As a result, Fair ends up using more time in both user and system modes than Full, even though it compresses less.

6. CONCLUSION AND FUTURE WORK
We found that GPU buffers are not managed well even though they occupy a considerable amount of memory in mobile systems. We proposed a novel scheme to reduce the memory footprint of GPU buffers by compressing them. Also, we provided a technique to tame the overhead incurred by the decompression. The evaluation using a realistic workload confirms that the proposed scheme can provide substantial extra memory, so that the system can cache more apps, thereby improving user experience and even energy consumption.

The GPU in mobile systems lacks the capability to handle page faults, making an on-demand decompression approach impossible. We hope that the insight provided by our work motivates GPU manufacturers to consider on-demand memory fault handling capability when designing GPUs for mobile systems. We are working on differentiating the compression policy according to the characteristics of the types of GPU buffers (e.g., texture, command, and so forth). We leave customizing the memory allocator to store the compressed buffers more efficiently as future work.

7. ACKNOWLEDGEMENTS
This work was supported partly by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIP) (No. 2013R1A2A1A01016441) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2014R1A1A2054658).

8. REFERENCES
[1] OpenCL. https://www.khronos.org/opencl.
[2] Android. http://www.android.com.
[3] Android. Low RAM. https://source.android.com/devices/tech/low-ram.html.
[4] Android. Managing the activity lifecycle. http://developer.android.com/training/basics/activity-lifecycle/index.html, October 2013.
[5] M. Ben-Yehuda, J. Xenidis, M. Ostrowski, K. Rister, A. Bruemmer, and L. Van Doorn. The price of safety: Evaluating IOMMU performance. In Proc. OLS, pages 9–20, July 2007.
[6] A. F. Briglia, A. Bezerra, L. Moiseichuk, and N. Gupta. Evaluating effects of cache memory compression on embedded systems. In Proc. OLS, pages 53–64, June 2007.
[7] Y. Collet. LZ4: extremely fast compression algorithm. http://code.google.com/p/lz4, 2013.
[8] F. Douglis. The compression cache: Using on-line compression to extend physical memory. In Proc. Winter USENIX Conference, pages 519–529, 1993.
[9] H. Falaki, R. Mahajan, S. Kandula, D. Lymberopoulos, R. Govindan, and D. Estrin. Diversity in smartphone usage. In Proc. MobiSys, pages 179–194, 2010.
[10] N. Gupta. compcache: Compressed caching for Linux. http://code.google.com/p/compcache.
[11] D. Hackborn. Multitasking the Android way. http://android-developers.blogspot.kr/2010/04/multitasking-android-way.html, 2010.
[12] S. Jennings. Zsmalloc: memory allocator for compressed pages. http://lwn.net/Articles/474880/, Jan. 2012.
[13] S. Jennings. Zswap: Compressed swap caching. http://lwn.net/Articles/528817, 2013.
[14] J. Jeong, H. Kim, J. Hwang, J. Lee, and S. Maeng. Rigorous rental memory management for embedded systems. ACM Trans. Embedded Computing Systems, 12(1s), 2013.
[15] F. Ji, H. Lin, and X. Ma. RSVM: A region-based software virtual memory for GPU. In Proc. PACT, pages 269–278, 2013.
[16] S. Kim, J. Jeong, and J. Lee. Selective memory deduplication for cost efficiency in mobile smart devices. IEEE Trans. Consumer Electronics, 60(2):276–284, May 2014.
[17] Y. Kim, J. Lee, J.-E. Jo, and J. Kim. GPUdmm: A high-performance and memory-oblivious GPU architecture using dynamic memory management. In Proc. HPCA, pages 546–557, Feb. 2014.
[18] J. Lee, M. Samadi, and S. Mahlke. VAST: The illusion of a large memory space for GPUs. In Proc. PACT, pages 443–454, 2014.
[19] J. Menon, M. De Kruijf, and K. Sankaralingam. iGPU: Exception support and speculative execution on GPUs. In Proc. ISCA, pages 72–83, 2012.
[20] A. Rahmati, C. Shepard, C. Tossell, M. Dong, Z. Wang, L. Zhong, and P. Kortum. Tales of 34 iPhone users: How they change and why they are different. Technical Report TR-2011-0624, 2011.
[21] Samsung Electronics, Co. Galaxy S3. http://www.samsung.com/global/galaxys3/.
[22] Samsung Electronics, Co. Galaxy S4 - life companion. http://www.samsung.com/global/microsite/galaxys4/.
[23] C. Shepard, A. Rahmati, C. Tossell, L. Zhong, and P. Kortum. LiveLab: Measuring wireless networks and smartphone users in the field. ACM SIGMETRICS Performance Evaluation Review, 38(3):15–20, January 2011.
[24] The Nielsen Company. Smartphones: So many apps, so much time. http://www.nielsen.com/us/en/insights/news/2014/smartphones-so-many-apps--so-much-time.html, July 2014.
[25] N. Tolia, D. G. Andersen, and M. Satyanarayanan. Quantifying interactive user experience on thin clients. Computer, 39(3):46–52, Mar. 2006.
[26] I. Trestian, S. Ranjan, A. Kuzmanovic, and A. Nucci. Measuring serendipity: Connecting people, locations, and interests in a mobile 3G network. In Proc. IMC, pages 267–279, 2009.
[27] I. C. Tuduce and T. Gross. Adaptive main memory compression. In Proc. USENIX ATC, pages 237–250, 2005.
[28] K. Wang, X. Ding, R. Lee, S. Kato, and X. Zhang. GDM: Device memory management for GPGPU computing. In Proc. SIGMETRICS, pages 533–545, 2014.
[29] P. R. Wilson. Operating system support for small objects. In Proc. ECOOP-OOOSWS, pages 80–86, Oct. 1991.
[30] P. R. Wilson, S. F. Kaplan, and Y. Smaragdakis. The case for compressed caching in virtual memory systems. In Proc. USENIX ATC, June 1999.
[31] L. Yang, R. P. Dick, H. Lekatsas, and S. Chakradhar. Online memory compression for embedded systems. ACM Trans. Embedded Computing Systems, 9(3):27:1–27:30, Mar. 2010.
[32] B.-A. Yassour, M. Ben-Yehuda, and O. Wasserman. On the DMA mapping problem in direct device assignment. In Proc. SYSTOR, 2010.