Surface Compression Using Dynamic Color Palettes

Surface Compression Using Dynamic Color Palettes Ayub A. Gubran Felix Huang Tor M. Aamodt University of British Columbia Carnegie Mellon University University of British Columbia Vancouver, BC, Canada Pittsburgh, PA, United States Vancouver, BC, Canada [email protected] [email protected] [email protected] ABSTRACT framebuffer surfaces, which are composited to a single Off-chip memory traffic is a major source of power and surface for screen display. Also GPUs can use frame- energy consumption on mobile platforms. A large amount buffer surfaces as inputs to additional rendering stages, of this off-chip traffic is used to manipulate graphics e.g., render to texture and deferred shading; as a result, framebuffer surfaces. To cut down the cost of accessing any given application may utilize one or more frame- off-chip memory, framebuffer surfaces are compressed buffer surfaces. to reduce the bandwidth consumed on surface manipu- This work studies a large set of Android workloads to lation when rendering or displaying. infer the compression properties of framebuffer surfaces In this work, we study the compression properties generated by mobile UI, 2D and 3D applications. Our of framebuffer surfaces and highlight the fact that sur- study found that framebuffers from different classes of faces from different applications have different compres- workloads have different compression properties. We sion characteristics. We use the results of our analysis exploit these properties to propose an effective palette- to propose a scheme, Dynamic Color Palettes (DCP), based framebuffer compression scheme that focuses on which achieves higher compression rates with UI and common UI and 2D applications. In addition, we ex- 2D surfaces. ploit temporal coherence in graphics, where applications DCP is a hardware mechanism for exploiting inter- exhibit minor changes between frames that can be ex- frame coherence in lossless surface compression; it im- ploited for compression. plements a scheme that dynamically constructs color Using temporal coherence, and by focusing on com- palettes, which are then used to efficiently compress mon uses cases, we propose and evaluate our Dynamic framebuffer surfaces. To evaluate DCP, we created an Color Palettes (DCP) technique. DCP uses palette based extensive set of OpenGL workload traces from 124 An- compression and focuses on reducing the traffic caused droid applications. We found that DCP improves com- by framebuffer operations in UI and 2D applications. pression rates by 91% for UI and 20% for 2D applica- To evaluate our compression scheme, we created an ex- tions compared to previous proposals [1, 2]. We also tensive set of workloads from 124 Android applications. evaluate a hybrid scheme that combines DCP with a We show that by combining DCP with other compres- generic compression scheme [1], and found that com- sion techniques [1], DCP is able to improve compression pression rates improve over previous proposals [1, 2] by rates between 83% and 161% across UI, 2D and 3D ap- 161%, 124% and 83% for UI, 2D and 3D applications, plications. respectively. This paper makes the following contributions: 1. Characterizes compression properties of framebuffer arXiv:1903.06658v1 [cs.GR] 19 Jan 2019 1. INTRODUCTION surfaces from user-interface (UI) as well as non-UI Off-chip memory traffic, including that of framebuffer 2D and 3D applications; surfaces, is one of the major sources of power consumption on mobile systems-on-chip (SoCs). In some cases, 2. Uses characterization results to propose and eval- the energy consumption to access data on the off-chip uate dynamic color palettes (DCP), a compression memory can dominate that from computations [3]. In technique that offers higher compression rates for this work, we study the properties of framebuffer sur- common UI and non-UI 2D applications; faces and propose a set of unique compression techniques to reduce the bandwidth consumed by frame- 3. Proposes two DCP variations that dynamically choose buffer operations. an optimal palette size based on the frequencies of In graphics rendering, a framebuffer surface is an off- the values in color palettes; chip memory space that contains pixels generated by the graphics processing unit (GPU) and then used by 4. Evaluate our compression schemes using an exten- the display controller to read pixels to the screen. In sive set of workloads created from the OpenGL some cases, the display controller operates on multiple traces of 124 Android applications. Legends Refresh rates vary by App Level Application 1 Application 2 Application 3 application and user OS Level activity OS Graphics Libraries / Graphic API calls HW Level CPU (Software Rendering) / GPU (Hardware Rendering) Device Screen Compression! Compression! @ screen refresh 1 rate (i.e., 60 FPS) Render Framebuffer 1 Render Framebuffer 2 Render Framebuffer 3 Compression! Called upon Display updating any 2 Controller of the System Compositor (e.g., GPU or CPU Display framebuffer 3 framebuffers Android’s SurfaceFlinger) Composition Figure 1: The life-cycle of a surface from rendering to display. In 1 , applications render to their corresponding framebuffers. In 2 , a compositor combines the surfaces generated by different applications. In 3 , the composited surface is used and displayed on the screen by the display controller. 2. BACKGROUND AND RELATED WORK Surface compression differs from texture compression in that both encoding, as well as decoding, are per- 2.1 The life-cycle of a framebuffer surface formed in real-time. Also opposite to surface compres- Figure 1 summarizes the life-cycle of a frame surface sion schemes, most texture compression algorithms are in contemporary mobile systems (Android Ice Cream lossy [6, 11, 12]. Sandwich 4.0 and later [4]). Another crucial aspect of surface compression is ran- Figure 1 shows a typical scenario of drawing mul- dom accessibility. Techniques like Run-Length Encod- tiple surfaces simultaneously from multiple processes: ing (RLE) are unable to provide such accessibility. How- the status bar, Facebook, and the navigation bar. Each ever, it important to be able to randomly access a sur- process independently renders to its own surface ( 1 ); face when used for sampling (e.g., used as a texture), for example, Facebook renders a new surface when the resizing, or composition. Compression algorithms have user scrolls or clicks, while the navigation bar updates used block-based schemes to enable random access for the corresponding surface when the user clicks on one their simplicity and practicality. Block-based compres- of its buttons. sion mechanisms define preconfigured compression sizes For display, a system compositor, such as Surface- that allow random access to compressed surfaces. Block- Flinger [4] in Android, combines surfaces from multiple based mechanisms have been used for compressing inte- applications before sending them to the screen ( 2 ). ger (e.g., RGBA) surfaces [1], floating-point surfaces [13, The compositor actively monitors the surfaces of all ap- 14] and depth buffers [9, 14]. plications and when a process updates a surface, the The work by Rasmusson et al. [1] (which we refer compositor subsequently updates the composited sur- to by RAS) evaluated several surface compression pro- face. Simultaneously, the display controller hardware posals [15, 16, 17] and compared them against their continuously reads the composited surface to the screen technique. RAS is a lossless block-based compression at 60 frames per second (FPS) or higher ( 3 ). Note that technique for integer buffers that encodes the difference because using the same surface for updates and screen between adjacent pixel values. RGB pixels are con- refresh operations can cause artifacts, such as flickering verted to the Y cocg (luminance-chrominance) format, and tearing, double (or triple) buffering is used [5]. to increase compression efficiency. We compare against The example in Figure 1 shows how a surface can be RAS in this paper since it reports better compression used and re-used multiple times and this is why it is results versus prior work. important to reduce the overhead of framebuffer ma- We also evaluate our scheme against the compression nipulation through compression. scheme proposed by Nvidia [2]. In this scheme, for each block going to memory, the algorithm checks if 4×2 2.2 Surface compression techniques pixels in sub-blocks within a block are identical. If so, Surface compression is used to reduce off-chip mem- the block is compressed 1:8. When that is not possible, ory traffic, which can improve performance and/or re- the algorithm then checks if 2×2 regions have identical duce energy consumption. Graphics pipeline implemen- colors, if so the block is compressed 1:4, otherwise the tations utilize compression for textures [6], surfaces [1, block remains uncompressed. This algorithm works well 7, 8], depth [9] and vertex data [10]. with regions of identical color values, as the case with Many of the compression techniques use lossy com- UI surfaces. pression as well on lossless compression. For framebuffer Other compression work includes the work by Daniel- surfaces, lossless compression is used to avoid error ac- son [18], which proposes using a dictionary-based com- cumulation upon reading then re-writing surfaces (as is pression in which the operating system and/or program the case with composition). specify the colors to configure a dictionary. In contrast to Danielson's work, our work exploits temporal co- In this work, we focus on developing an effective scheme herence to dynamically construct dictionaries (palettes) for compressing UI and 2D framebuffer surfaces. The avoiding the need for software changes. reason for this choice is that studies found users to Another work by Shim et al. [19] use a dictionary- spend 70% of their time running UI applications [21, based compression mechanism targeted at display buffer 22], where over half of the time is spent on web brows- compression. Shim et al.'s approach compresses sur- ing, messaging and social media alone.

Surface Compression Using Dynamic Color Palettes

High End Visualization with Scalable Display System

Opengl Distilled / Paul Martz

Indexed Color

A New Steganographic Method for Palette-Based Images

Understanding Image Formats and When to Use Them

Deconstructing Hardware Usage for General Purpose Computation on Gpus

Content Aware Texture Compression

AMD Radeon E8860

The Opengl Framebuffer Object Extension

Order Independent Transparency in Opengl 4.X Christoph Kubisch – [email protected] TRANSPARENT EFFECTS

CES 2016 Exhibitor Listing As of 1/19/16

Amiga Graphics Reference Card 2Nd Edition