Analysis of the Effective Use of Thread-Level Parallelism in Mobile Applications: A Preliminary Study on iOS and Android Devices

Ethan Bogdan, Swarthmore College, ebogdan1
Hongin Yun, Swarthmore College, hyun1

Abstract

Following the model of earlier studies of desktop software, we perform a survey of thread-level parallelism in common mobile applications. In particular, we study the Android and iOS platforms through a representative sample of third-party software in their respective app stores. Ultimately, we conclude that multiple cores may not be necessary for the majority of mobile experiences, observing that iOS apps tend to have slightly greater average parallelism, with slightly lower variance, than Android apps.

Keywords: multi-core, thread-level parallelism, mobile apps

1. Introduction

In chip engineering, there are two broad strategies for increasing computational speed: design processors that can sustain higher clock rates, and design systems that incorporate more processing cores. The former strategy has more or less hit a plateau, as overheating and the limitations of modern cooling systems have come to present an inherent physical barrier. Therefore, in recent years, the majority of time and energy has instead gone into developing more parallel hardware: multiprocessors containing several cores each, and computer systems that contain one or more of these multiprocessing chips. However, while this frontier continues to advance, its computational benefits remain limited by the structure of the software that it serves: to take advantage of parallel hardware, developers must practice good parallel design patterns.

We suspect that many systems do not fully utilize their parallel capacities, and, in this paper, we seek to extend the existing body of research in this area into the realm of mobile devices.

2. Background and related work

As a field of computer science, parallel design has a relatively long history, dating back to the early days of servers and networked computers. We draw our inspiration from a couple of more recent studies.

2.1 Parallelism on desktop workstations

In 2010, Flautner et al. studied a range of desktop applications running on Microsoft Windows 7 and Apple's OS X Snow Leopard, analyzing them for parallelism. [10] Using the metric of Thread Level Parallelism, or TLP (see section 3.1), they concluded that 2-3 cores were more than sufficient for most applications, and that current desktop applications were not fully utilizing multi-core architectures.

Other studies in a similar vein date back to 2000, when Flautner et al. first investigated the thread-level parallelism and interactive response time of desktop applications. [9] This study was done when multiprocessing was prevalent mostly in servers and had only just begun to enter desktop machines. While servers were considered to be a natural fit for multiprocessing, due to the parallel nature of serving multiple clients, the benefits of multiprocessing for desktop applications were not obvious. Now, as multiprocessor systems are entering the smartphone market, we believe it is a natural extension of these studies to ask whether the benefits of multiprocessing are fully realized on mobile devices.

2.2 Multi-processing in Android phones

Since its initial release in May 2007, Android and the devices that run Android OS have evolved rapidly. The first commercially available phone to run Android was the HTC Dream, released on October 22, 2008. [3] The HTC Dream had a Qualcomm MSM7201A chipset, including an ARM11 application processor, an ARM9 modem, and high-performance digital signal processors. [12] In 2010, Google and several handset manufacturers launched a line of smartphones and tablets as their flagship Android devices under the name Nexus. The Nexus One, manufactured by HTC in January 2010, was released with Android 2.1 and had a Qualcomm QSD 8250 with a single-core Qualcomm Scorpion CPU. [5] The world's first Android device with a multi-core processor was the LG Optimus 2X, which was equipped with an NVIDIA Tegra 2 system-on-a-chip and a 1 GHz dual-core processor. [4]

The latest Samsung Galaxy S4 and Galaxy Note 3, released respectively in March and September 2013, are equipped with multiple-core processors. All versions of the Galaxy S4 and Note 3 are equipped with quad-core processors, and flagship models shipped to certain markets are even equipped with octa-core processors. Both the Galaxy S4 and the Galaxy Note 3 can run the latest Android 4.3 Jelly Bean. [1] [2]

2.3 Multi-processing in iPhones

Historically, Apple has been very tight-lipped about the internal components of its products, iPhone processors included. However, with a little bit of investigative work, various third parties have determined that the first two iPhone models, released in 2007 and 2008 respectively, both had single-core processors clocked at around 412 MHz. In 2009, with the release of the purportedly fast 3GS model (the "S" stood for "speed"), the iPhone got a boost to 600 MHz. Empirical data shows that subsequent iPhone models have had variable-speed processors, ranging from 750 MHz at the low end (of the iPhone 4) to 1.3 GHz at the high end (of the iPhone 5S). Speculation has it that these models use their variable clock rates to conserve energy whenever possible, an important consideration on mobile devices. [8]

It wasn't until the iPhone 4S that Apple first began using dual-core processors, with the introduction of their A5 chip in 2011. They included an improved version of this chip, along with a three-core graphical processor, in the iPhone 5.

Interestingly, it had been a year earlier, in 2010, that Apple released the first version of iOS to support multi-tasking, iOS 4. This release was further notable for including Grand Central Dispatch, a new technology that offered developers a simple, high-level interface for efficient thread management. The operating system first shipped on the iPhone 4, which, with its single-core A4 processor, could not support genuine parallelism. Nevertheless, the A4 was sufficiently fast to achieve an illusion of concurrency, and the new operating system and multi-threading technology helped pave the way for the advent of the 4S the following year.

3. Methodology

In the interests of consistency, we have tried to model our methodology on the aforementioned work by Flautner et al. In this section, we will summarize that approach, then go on to discuss the particularities of working with Android and iOS systems. It is important to note that we carried out all of our research using actual, physical phones. This hardware allowed a far higher degree of accuracy and transparency in our data collection than any kind of emulation would have offered.

3.1 Metrics

There are several different common metrics for quantifying parallelism in computer systems. [9] [10] One simple and intuitive metric is Machine Utilization, which is a measure of the percentage of total processing resources that gets used during execution. The formula for Machine Utilization is shown in Equation 1:

\[ \text{Machine Utilization} = \frac{\sum_{i=1}^{n} c_i \, i}{n} \qquad (1) \]

In this equation, n is the number of thread contexts in the subject machine, and c_i is the fraction of time during which exactly i threads (i = 0, ..., n) were executing concurrently. If all processors in the machine were fully utilized during the execution of the benchmark, Machine Utilization would be 1. This intuitive metric is, however, not suitable for the type of study we conducted. Applications on mobile devices tend to incur a significant amount of idle time, when no threads are being executed on any of the processors, due to a high degree of user interactivity and I/O activity.

We therefore decided to use Thread Level Parallelism (TLP), the same metric that Flautner et al. used in their pioneering research. TLP is a variation of Machine Utilization that factors out idle time. The formula for TLP is given in Equation 2:

\[ \text{TLP} = \frac{\sum_{i=1}^{n} c_i \, i}{1 - c_0} \qquad (2) \]

As in Equation 1, n is the number of thread contexts in the subject machine, and c_i is the fraction of time during which i threads were executing concurrently. TLP output values will fall between 1 and n.

One last caveat we had to consider was how exactly to measure c_0, the idle time of the system. While this measurement might be straightforward on a relatively simple operating system, both Android and iOS are constantly running a host of background processes. Some of these processes are indispensable to the active app, while others may serve entirely unrelated functions of the system (for example, they might monitor cellular connectivity). Our goal was to measure each app in as much isolation as possible, but, since we had no sure way of determining which background processes were a part of the app and which weren't, we ran all of our TLP calculations twice. The first time, we assumed that all background processes constituted "idle" time for the foreground app, while the second time we assumed that they were "active." The reality is most likely somewhere in the middle (see section 5 for further discussion).

3.2 Benchmarks

We conducted two separate sets of experiments. The first was intended to encompass a broad and representative cross-section of applications currently available on the Android Market and iTunes app stores. Our sampling policy involved selecting three applications from each of seven popular categories:

1. Business
2. Entertainment
3.
5. Media
6. News

[...] to a development system via USB. Systrace, which runs as a Python script on the development machine, then establishes a debugging connection via the Android Debug Bridge [...]
