An Experimental Evaluation of Datacenter Workloads on Low-Power Embedded Micro Servers
Yiran Zhao, Shen Li, Shaohan Hu, Hongwei Wang, Shuochao Yao, Huajie Shao, Tarek Abdelzaher
Department of Computer Science, University of Illinois at Urbana-Champaign
{zhao97, shenli3, shu17, hwang172, syao9, hshao5, …}@illinois.edu
Proceedings of the VLDB Endowment, Vol. 9, No. 9 (2016)

ABSTRACT

This paper presents a comprehensive evaluation of an ultra-low-power cluster built upon Intel Edison based micro servers. The improved performance and high energy efficiency of micro servers have driven both academia and industry to explore the possibility of replacing conventional brawny servers with a larger swarm of embedded micro servers. Existing attempts mostly focus on mobile-class micro servers, whose capacities are similar to mobile phones. We, on the other hand, target sensor-class micro servers, which are originally intended for use in wearable technologies, sensor networks, and the Internet-of-Things. Although sensor-class micro servers have much less capacity, they are touted for minimal power consumption (< 1 Watt), which opens new possibilities for achieving higher energy efficiency in datacenter workloads. Our systematic evaluation of the Edison cluster and comparisons to conventional brawny clusters involve careful workload selection and laborious parameter tuning, which ensures maximum server utilization and thus fair comparisons. Results show that the Edison cluster achieves up to 3.5× improvement in work-done-per-joule for web service applications and data-intensive MapReduce jobs. In terms of scalability, the Edison cluster scales linearly in throughput for web service workloads, and also shows satisfactory scalability for MapReduce workloads despite coordination overhead.

1. INTRODUCTION

The rising demand for cloud services has been continuously driving the expansion of datacenters, which not only puts excessive pressure on power supply and cooling infrastructure, but also causes inflated energy costs. For industry giants like Google, Microsoft, and Yahoo!, up to 50% of the three-year total cost of datacenters is attributed to power consumption [50]. When amortized into the monthly total cost of ownership, energy-related costs can account for up to a third [33]. Therefore, reducing datacenter power consumption has become a hot research topic.

To reduce datacenter energy cost, power proportionality [47] is one major solution studied and pursued by both academia and industry. Ideally, it allows datacenter power draw to proportionally follow the fluctuating amount of workload, thus saving energy during non-peak hours. However, current high-end servers are not energy-proportional and have a narrow power range between idling and full utilization [43], which is far from ideal. Therefore, researchers try to improve energy proportionality using solutions such as dynamic provisioning and CPU power scaling. The former relies on techniques such as Wake-On-LAN [36] and VM migration [24] to power servers on and off remotely and dynamically. However, the opportunities to go into a hibernation state are few, while waking up servers on demand and migrating VMs incur additional costs and unpredictability, which make these techniques less efficient [23]. The latter solution saves energy using Dynamic Voltage/Frequency Scaling (DVFS) [34], which reduces CPU voltage (and therefore power) and lowers frequency as utilization drops. But even if the CPU power consumption is proportional to workload, other components such as memory, disk, and motherboard still consume the same energy [47]. Thus the energy proportionality delivered by DVFS is not satisfactory.

Despite the large body of research on the aforementioned complex solutions, significant energy savings are rarely reported; the best scenarios achieve only up to 30% energy reduction [26]. However, when running cloud services on energy-efficient embedded devices, the energy savings can exceed 70% in some applications [21, 53].
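To make the limits of CPU-only power scaling concrete, the short sketch below models a server whose CPU power scales linearly with utilization while memory, disk, and motherboard draw constant power. The wattage figures are illustrative assumptions chosen for this example, not measurements reported in the paper.

    # Toy model of server power when only the CPU scales with load (DVFS-style).
    # All wattage figures are illustrative assumptions, not measured values.
    CPU_IDLE_W = 20.0    # assumed CPU power at idle
    CPU_PEAK_W = 120.0   # assumed CPU power at full utilization
    OTHER_W = 130.0      # assumed constant draw of memory, disk, motherboard

    def server_power(utilization):
        """Total server power for a given CPU utilization in [0.0, 1.0]."""
        cpu = CPU_IDLE_W + (CPU_PEAK_W - CPU_IDLE_W) * utilization
        return cpu + OTHER_W

    peak = server_power(1.0)
    for u in (0.0, 0.1, 0.3, 0.5, 1.0):
        p = server_power(u)
        print(f"utilization {u:>4.0%}: {p:6.1f} W ({p / peak:5.1%} of peak)")

Under these assumed numbers, a server at 10% utilization still draws about 64% of its peak power, which illustrates the non-proportionality described above.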
Compared to conventional high-end servers, embedded micro servers offer three advantages:

1. System components of embedded micro servers are innately better balanced [42]. For a modern high-end CPU, power consumption increases super-linearly with speed, and a large part of it is devoted to branch prediction, speculative execution, and out-of-order execution [50]. Imbalanced memory, network, and I/O bandwidth often leave the CPU under-utilized, causing high-end servers to be less energy efficient.

2. Individual node failure has a far less significant impact on micro clusters than on high-end clusters, simply because the number of nodes in micro clusters is larger. Another point made in [29] is that in the event of load increase due to node failure and load redistribution, Xeon cores experience more QoS degradation than small Atom cores if the workload goes beyond the sustainable point.

3. The deployment of micro clusters requires much less sophisticated power and cooling infrastructure. The immense power draw of conventional datacenters entails significant costs for designing and building power and cooling support. Thanks to the low-power nature of embedded devices, much of this investment can be considerably reduced [20, 51].

In this paper, we push the energy efficiency even further by building our cluster with sensor-class Intel Edison devices [17], and we conduct more extensive benchmark evaluations. Although the Edison device is typically used for wearable technologies, the Internet-of-Things, and sensor networks, we show that a larger cluster of such sensor-class micro servers can collectively offer significant processing capacity for datacenter workloads when managed properly, while achieving more work-done-per-joule.
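Work-done-per-joule, the efficiency metric used throughout the paper, can be illustrated with a minimal calculation: the amount of completed work divided by the energy consumed, where energy is average power multiplied by elapsed time. The cluster labels, power draws, and runtimes below are placeholders for illustration only, not results from the Edison or Dell clusters.

    # Minimal work-done-per-joule calculation for a benchmark run.
    # All numbers below are placeholders, not measurements from the paper.
    def work_done_per_joule(work_units, avg_power_watts, elapsed_seconds):
        """work_units can be requests served, records processed, jobs done, ..."""
        energy_joules = avg_power_watts * elapsed_seconds
        return work_units / energy_joules

    # Hypothetical runs of the same job on two different clusters.
    low_power_cluster = work_done_per_joule(1_000_000, avg_power_watts=35.0,
                                            elapsed_seconds=1800.0)
    high_end_cluster = work_done_per_joule(1_000_000, avg_power_watts=400.0,
                                           elapsed_seconds=300.0)
    print(f"low-power cluster: {low_power_cluster:.2f} units/J")
    print(f"high-end cluster : {high_end_cluster:.2f} units/J")
    print(f"efficiency ratio : {low_power_cluster / high_end_cluster:.2f}x")

In this hypothetical example the same job finishes faster on the high-end cluster, but the low-power cluster completes more work per unit of energy, which is the trade-off the paper quantifies.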
To evaluate our sensor-class Edison cluster and compare it with a high-end Dell cluster, we run two major categories of datacenter workloads: on-line web services and off-line data analysis. For the first category, we set up the cluster to take on the major roles (web servers and cache servers) in a standard web service configuration, and have clients generate HTTP requests to test throughput and response delay. For the second category, we choose to run well-known MapReduce jobs on the widely used Hadoop framework [2]. In addition, we optimize some MapReduce jobs on both platforms to make sure that server capacity is as fully utilized as possible. The scalability of the Edison cluster is also evaluated for both categories of workloads. We show a detailed performance comparison of the Edison cluster and the Dell cluster with emphasis on the metric of work-done-per-joule. Our meaningful and fair comparison results can be valuable for related systems research.

The rest of the paper is organized as follows. In Section 2 we discuss related work. Section 3 gives an overview of our testbed and measurement methodology. Then we benchmark and compare an individual Edison server and a Dell server to understand the per-node performance gap in Section 4. In Section 5 we run various datacenter workloads on the clusters and compare energy efficiency in terms of work-done-per-joule. Section 6 analyzes and compares the total cost of ownership under a simple model.

2. RELATED WORK

Some prior approaches argue that it is more efficient to use all the nodes to complete a job faster and shut down the entire cluster afterwards. Berkeley Energy Efficient MapReduce [27] divides the cluster into an interactive zone and a batch zone, where the former takes MapReduce Interactive Analysis (MIA) workloads and the latter is responsible for non-interactive jobs and is often put into a low-power state. PowerNap [39] proposes a new server architecture that is able to rapidly change the power state of different components to reduce idle power draw. However, generally speaking, switching power states and migrating workloads inevitably increase overhead and make it harder to guarantee service-level agreements.

More recently, the idea of building datacenters based on low-power embedded systems has become popular. Some of these non-traditional server platforms are equipped with low-power processors, such as ARM-based CPUs [38, 41, 49], Intel Atom CPUs [33, 43, 46, 29, 25], or even embedded CPUs [21, 50]. In addition to low-power CPUs, further energy efficiency can be achieved by exploiting low-power flash storage [25, 21] or highly customizable FPGAs [40]. In industry, Applied Micro [16, 18, 19] and AMD [15] also target low-power ARM-based servers for datacenter workloads. In academia, several projects have explored using micro servers in order to gain more work-done-per-joule than conventional servers. Based on the CPU and memory capacity of the micro servers used in related work, we divide these platforms into two categories: 1) mobile-class servers, and 2) sensor-class servers. As shown in Table 1, the first five rows are categorized as mobile-class micro servers, since their capacity is similar to that of a typical mobile phone. The last two rows in the table represent sensor-class micro servers, in that they have considerably lower specifications in terms of CPU core counts, clock rates, and RAM sizes. Sensor-class micro servers achieve ultra-low power consumption (< 1 W) by sacrificing considerable computational power. Within the mobile-class category, [38, 43] run database benchmarks (query processing, OLAP, OLTP) and compare with conventional servers in terms of performance per watt. [38, 25] run MapReduce workloads on their micro servers to compare work-done-per-joule with conventional servers. In [29], mobile cores are used to run Internet-scale web search, and query speed and price efficiency are compared with conventional servers.
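As a concrete illustration of the web-service measurement methodology described above (clients issuing HTTP requests while throughput and response delay are recorded), the following is a minimal closed-loop load-generator sketch using only the Python standard library. The target URL, worker count, and duration are hypothetical parameters, not the configuration used in the paper's experiments.

    # Minimal closed-loop HTTP load generator: worker threads issue requests
    # back-to-back and record per-request latency; throughput and latency
    # statistics are reported at the end. Parameters below are hypothetical.
    import threading
    import time
    import urllib.request

    TARGET_URL = "http://10.0.0.1/"   # hypothetical web/cache front-end
    NUM_WORKERS = 16                  # hypothetical client concurrency
    DURATION_S = 30                   # hypothetical test duration

    latencies = []
    lock = threading.Lock()
    stop_at = time.time() + DURATION_S

    def worker():
        while time.time() < stop_at:
            start = time.time()
            try:
                with urllib.request.urlopen(TARGET_URL, timeout=5) as resp:
                    resp.read()
            except OSError:
                continue              # count only successful requests
            elapsed = time.time() - start
            with lock:
                latencies.append(elapsed)

    threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    latencies.sort()
    if latencies:
        print(f"throughput : {len(latencies) / DURATION_S:.1f} req/s")
        print(f"median     : {latencies[len(latencies) // 2] * 1000:.1f} ms")
        print(f"99th pct   : {latencies[int(len(latencies) * 0.99)] * 1000:.1f} ms")

Each worker issues requests back-to-back, so raising the worker count increases the offered load; the same structure can be pointed at either a web-server or a cache-server front end.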