Microbenchmarks in Big Data
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Overview of the SPEC Benchmarks
9 Overview of the SPEC Benchmarks Kaivalya M. Dixit IBM Corporation “The reputation of current benchmarketing claims regarding system performance is on par with the promises made by politicians during elections.” Standard Performance Evaluation Corporation (SPEC) was founded in October, 1988, by Apollo, Hewlett-Packard,MIPS Computer Systems and SUN Microsystems in cooperation with E. E. Times. SPEC is a nonprofit consortium of 22 major computer vendors whose common goals are “to provide the industry with a realistic yardstick to measure the performance of advanced computer systems” and to educate consumers about the performance of vendors’ products. SPEC creates, maintains, distributes, and endorses a standardized set of application-oriented programs to be used as benchmarks. 489 490 CHAPTER 9 Overview of the SPEC Benchmarks 9.1 Historical Perspective Traditional benchmarks have failed to characterize the system performance of modern computer systems. Some of those benchmarks measure component-level performance, and some of the measurements are routinely published as system performance. Historically, vendors have characterized the performances of their systems in a variety of confusing metrics. In part, the confusion is due to a lack of credible performance information, agreement, and leadership among competing vendors. Many vendors characterize system performance in millions of instructions per second (MIPS) and millions of floating-point operations per second (MFLOPS). All instructions, however, are not equal. Since CISC machine instructions usually accomplish a lot more than those of RISC machines, comparing the instructions of a CISC machine and a RISC machine is similar to comparing Latin and Greek. 9.1.1 Simple CPU Benchmarks Truth in benchmarking is an oxymoron because vendors use benchmarks for marketing purposes. -
Hypervisors Vs. Lightweight Virtualization: a Performance Comparison
2015 IEEE International Conference on Cloud Engineering Hypervisors vs. Lightweight Virtualization: a Performance Comparison Roberto Morabito, Jimmy Kjällman, and Miika Komu Ericsson Research, NomadicLab Jorvas, Finland [email protected], [email protected], [email protected] Abstract — Virtualization of operating systems provides a container and alternative solutions. The idea is to quantify the common way to run different services in the cloud. Recently, the level of overhead introduced by these platforms and the lightweight virtualization technologies claim to offer superior existing gap compared to a non-virtualized environment. performance. In this paper, we present a detailed performance The remainder of this paper is structured as follows: in comparison of traditional hypervisor based virtualization and Section II, literature review and a brief description of all the new lightweight solutions. In our measurements, we use several technologies and platforms evaluated is provided. The benchmarks tools in order to understand the strengths, methodology used to realize our performance comparison is weaknesses, and anomalies introduced by these different platforms in terms of processing, storage, memory and network. introduced in Section III. The benchmark results are presented Our results show that containers achieve generally better in Section IV. Finally, some concluding remarks and future performance when compared with traditional virtual machines work are provided in Section V. and other recent solutions. Albeit containers offer clearly more dense deployment of virtual machines, the performance II. BACKGROUND AND RELATED WORK difference with other technologies is in many cases relatively small. In this section, we provide an overview of the different technologies included in the performance comparison. -
Power Measurement Tutorial for the Green500 List
Power Measurement Tutorial for the Green500 List R. Ge, X. Feng, H. Pyla, K. Cameron, W. Feng June 27, 2007 Contents 1 The Metric for Energy-Efficiency Evaluation 1 2 How to Obtain P¯(Rmax)? 2 2.1 The Definition of P¯(Rmax)...................................... 2 2.2 Deriving P¯(Rmax) from Unit Power . 2 2.3 Measuring Unit Power . 3 3 The Measurement Procedure 3 3.1 Equipment Check List . 4 3.2 Software Installation . 4 3.3 Hardware Connection . 4 3.4 Power Measurement Procedure . 5 4 Appendix 6 4.1 Frequently Asked Questions . 6 4.2 Resources . 6 1 The Metric for Energy-Efficiency Evaluation This tutorial serves as a practical guide for measuring the computer system power that is required as part of a Green500 submission. It describes the basic procedures to be followed in order to measure the power consumption of a supercomputer. A supercomputer that appears on The TOP500 List can easily consume megawatts of electric power. This power consumption may lead to operating costs that exceed acquisition costs as well as intolerable system failure rates. In recent years, we have witnessed an increasingly stronger movement towards energy-efficient computing systems in academia, government, and industry. Thus, the purpose of the Green500 List is to provide a ranking of the most energy-efficient supercomputers in the world and serve as a complementary view to the TOP500 List. However, as pointed out in [1, 2], identifying a single objective metric for energy efficiency in supercom- puters is a difficult task. Based on [1, 2] and given the already existing use of the “performance per watt” metric, the Green500 List uses “performance per watt” (PPW) as its metric to rank the energy efficiency of supercomputers. -
The High Performance Linpack (HPL) Benchmark Evaluation on UTP High Performance Computing Cluster by Wong Chun Shiang 16138 Diss
The High Performance Linpack (HPL) Benchmark Evaluation on UTP High Performance Computing Cluster By Wong Chun Shiang 16138 Dissertation submitted in partial fulfilment of the requirements for the Bachelor of Technology (Hons) (Information and Communications Technology) MAY 2015 Universiti Teknologi PETRONAS Bandar Seri Iskandar 31750 Tronoh Perak Darul Ridzuan CERTIFICATION OF APPROVAL The High Performance Linpack (HPL) Benchmark Evaluation on UTP High Performance Computing Cluster by Wong Chun Shiang 16138 A project dissertation submitted to the Information & Communication Technology Programme Universiti Teknologi PETRONAS In partial fulfillment of the requirement for the BACHELOR OF TECHNOLOGY (Hons) (INFORMATION & COMMUNICATION TECHNOLOGY) Approved by, ____________________________ (DR. IZZATDIN ABDUL AZIZ) UNIVERSITI TEKNOLOGI PETRONAS TRONOH, PERAK MAY 2015 ii CERTIFICATION OF ORIGINALITY This is to certify that I am responsible for the work submitted in this project, that the original work is my own except as specified in the references and acknowledgements, and that the original work contained herein have not been undertaken or done by unspecified sources or persons. ________________ WONG CHUN SHIANG iii ABSTRACT UTP High Performance Computing Cluster (HPCC) is a collection of computing nodes using commercially available hardware interconnected within a network to communicate among the nodes. This campus wide cluster is used by researchers from internal UTP and external parties to compute intensive applications. However, the HPCC has never been benchmarked before. It is imperative to carry out a performance study to measure the true computing ability of this cluster. This project aims to test the performance of a campus wide computing cluster using a selected benchmarking tool, the High Performance Linkpack (HPL). -
A Study on the Evaluation of HPC Microservices in Containerized Environment
ReceivED XXXXXXXX; ReVISED XXXXXXXX; Accepted XXXXXXXX DOI: xxx/XXXX ARTICLE TYPE A Study ON THE Evaluation OF HPC Microservices IN Containerized EnVIRONMENT DeVKI Nandan Jha*1 | SaurABH Garg2 | Prem PrAKASH JaYARAMAN3 | Rajkumar Buyya4 | Zheng Li5 | GrAHAM Morgan1 | Rajiv Ranjan1 1School Of Computing, Newcastle UnivERSITY, UK Summary 2UnivERSITY OF Tasmania, Australia, 3Swinburne UnivERSITY OF TECHNOLOGY, AustrALIA Containers ARE GAINING POPULARITY OVER VIRTUAL MACHINES (VMs) AS THEY PROVIDE THE ADVANTAGES 4 The UnivERSITY OF Melbourne, AustrALIA OF VIRTUALIZATION WITH THE PERFORMANCE OF NEAR bare-metal. The UNIFORMITY OF SUPPORT PROVIDED 5UnivERSITY IN Concepción, Chile BY DockER CONTAINERS ACROSS DIFFERENT CLOUD PROVIDERS MAKES THEM A POPULAR CHOICE FOR DEVel- Correspondence opers. EvOLUTION OF MICROSERVICE ARCHITECTURE ALLOWS COMPLEX APPLICATIONS TO BE STRUCTURED INTO *DeVKI Nandan Jha Email: [email protected] INDEPENDENT MODULAR COMPONENTS MAKING THEM EASIER TO manage. High PERFORMANCE COMPUTING (HPC) APPLICATIONS ARE ONE SUCH APPLICATION TO BE DEPLOYED AS microservices, PLACING SIGNIfiCANT RESOURCE REQUIREMENTS ON THE CONTAINER FRamework. HoweVER, THERE IS A POSSIBILTY OF INTERFERENCE BETWEEN DIFFERENT MICROSERVICES HOSTED WITHIN THE SAME CONTAINER (intra-container) AND DIFFERENT CONTAINERS (inter-container) ON THE SAME PHYSICAL host. In THIS PAPER WE DESCRIBE AN Exten- SIVE EXPERIMENTAL INVESTIGATION TO DETERMINE THE PERFORMANCE EVALUATION OF DockER CONTAINERS EXECUTING HETEROGENEOUS HPC microservices. WE ARE PARTICULARLY CONCERNED WITH HOW INTRa- CONTAINER AND inter-container INTERFERENCE INflUENCES THE performance. MoreoVER, WE INVESTIGATE THE PERFORMANCE VARIATIONS IN DockER CONTAINERS WHEN CONTROL GROUPS (cgroups) ARE USED FOR RESOURCE limitation. FOR EASE OF PRESENTATION AND REPRODUCIBILITY, WE USE Cloud Evaluation Exper- IMENT Methodology (CEEM) TO CONDUCT OUR COMPREHENSIVE SET OF Experiments. -
I.MX 8Quadxplus Power and Performance
NXP Semiconductors Document Number: AN12338 Application Note Rev. 4 , 04/2020 i.MX 8QuadXPlus Power and Performance 1. Introduction Contents This application note helps you to design power 1. Introduction ........................................................................ 1 management systems. It illustrates the current drain 2. Overview of i.MX 8QuadXPlus voltage supplies .............. 1 3. Power measurement of the i.MX 8QuadXPlus processor ... 2 measurements of the i.MX 8QuadXPlus Applications 3.1. VCC_SCU_1V8 power ........................................... 4 Processors taken on NXP Multisensory Evaluation Kit 3.2. VCC_DDRIO power ............................................... 4 (MEK) Platform through several use cases. 3.3. VCC_CPU/VCC_GPU/VCC_MAIN power ........... 5 3.4. Temperature measurements .................................... 5 This document provides details on the performance and 3.5. Hardware and software used ................................... 6 3.6. Measuring points on the MEK platform .................. 6 power consumption of the i.MX 8QuadXPlus 4. Use cases and measurement results .................................... 6 processors under a variety of low- and high-power 4.1. Low-power mode power consumption (Key States modes. or ‘KS’)…… ......................................................................... 7 4.2. Complex use case power consumption (Arm Core, The data presented in this application note is based on GPU active) ......................................................................... 11 5. SOC -
Srovnání Kvality Virtuálních Strojů Assessment of Virtual Machines Performance
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by DSpace at VSB Technical University of Ostrava VŠB - Technická univerzita Ostrava Fakulta elektrotechniky a informatiky Katedra Informatiky Srovnání kvality virtuálních strojů Assessment of Virtual Machines Performance 2011 Lenka Novotná Prohlašuji, že jsem tuto bakalářskou práci vypracovala samostatně. Uvedla jsem všechny literární prameny a publikace, ze kterých jsem čerpala. V Ostravě dne 20. dubna 2011 ……………………… Lenka Novotná Ráda bych poděkovala vedoucímu bakalářské práce, Ing. Petru Olivkovi, za pomoc, věcné připomínky a veškerý čas, který mi věnoval. Abstrakt Hlavním cílem této bakalářské práce je vysvětlit pojem virtualizace, virtuální stroj, a dále popsat a porovnat dostupné virtualizační prostředky, které se využívají ve světovém oboru informačních technologií. Práce se především zabývá srovnáním výkonu virtualizačních strojů pro desktopové operační systémy MS Windows a Linux. Při testování jsem se zaměřila na tři hlavní faktory a to propustnost sítě, při které jsem použila aplikaci Iperf, ke změření výkonu diskových operací jsem využila program IOZone a pro test posledního faktoru, který je zaměřen na přidělování CPU procesů, jsem použila známé testovací aplikace Dhrystone a Whetstone. Všechna zmiňovaná měření okruhů byla provedena na třech virtualizačních platformách, kterými jsou VirtualBox OSE, VMware Player a KVM s QEMU. Klíčová slova: Virtualizace, virtuální stroj, VirtualBox, VMware, KVM, QEMU, plná virtualizace, paravirtualizace, částečná virtualizace, hardwarově asistovaná virtualizace, virtualizace na úrovní operačního systému, měření výkonu CPU, měření propustnosti sítě, měření diskových operací, Dhrystone, Whetstone, Iperf, IOZone. Abstract The main goal of this thesis is explain the term of Virtualization and Virtual Machine, describe and compare available virtualization resources, which we can use in worldwide field of information technology. -
Fair Benchmarking for Cloud Computing Systems
Fair Benchmarking for Cloud Computing Systems Authors: Lee Gillam Bin Li John O’Loughlin Anuz Pranap Singh Tomar March 2012 Contents 1 Introduction .............................................................................................................................................................. 3 2 Background .............................................................................................................................................................. 4 3 Previous work ........................................................................................................................................................... 5 3.1 Literature ........................................................................................................................................................... 5 3.2 Related Resources ............................................................................................................................................. 6 4 Preparation ............................................................................................................................................................... 9 4.1 Cloud Providers................................................................................................................................................. 9 4.2 Cloud APIs ...................................................................................................................................................... 10 4.3 Benchmark selection ...................................................................................................................................... -
On the Performance of MPI-Openmp on a 12 Nodes Multi-Core Cluster
On the Performance of MPI-OpenMP on a 12 nodes Multi-core Cluster Abdelgadir Tageldin Abdelgadir1, Al-Sakib Khan Pathan1∗ , Mohiuddin Ahmed2 1 Department of Computer Science, International Islamic University Malaysia, Gombak 53100, Kuala Lumpur, Malaysia 2 Department of Computer Network, Jazan University, Saudi Arabia [email protected] , [email protected] , [email protected] Abstract. With the increasing number of Quad-Core-based clusters and the introduction of compute nodes designed with large memory capacity shared by multiple cores, new problems related to scalability arise. In this paper, we analyze the overall performance of a cluster built with nodes having a dual Quad-Core Processor on each node. Some benchmark results are presented and some observations are mentioned when handling such processors on a benchmark test. A Quad-Core-based cluster's complexity arises from the fact that both local communication and network communications between the running processes need to be addressed. The potentials of an MPI-OpenMP approach are pinpointed because of its reduced communication overhead. At the end, we come to a conclusion that an MPI-OpenMP solution should be considered in such clusters since optimizing network communications between nodes is as important as optimizing local communications between processors in a multi-core cluster. Keywords: MPI-OpenMP, hybrid, Multi-Core, Cluster. 1 Introduction The integration of two or more processors within a single chip is an advanced technology for tackling the disadvantages exposed by a single core when it comes to increasing the speed, as more heat is generated and more power is consumed by those single cores. -
Power Efficiency in High Performance Computing
Power Efficiency in High Performance Computing Shoaib Kamil John Shalf Erich Strohmaier LBNL/UC Berkeley LBNL/NERSC LBNL/CRD [email protected] [email protected] [email protected] ABSTRACT 100 teraflop successor consumes almost 1,500 KW without After 15 years of exponential improvement in microproces- even being in the top 10. The first petaflop-scale systems, sor clock rates, the physical principles allowing for Dennard expected to debut in 2008, will draw 2-7 megawatts of power. scaling, which enabled performance improvements without a Projections for exaflop-scale computing systems, expected commensurate increase in power consumption, have all but in 2016-2018, range from 60-130 megawatts [16]. Therefore, ended. Until now, most HPC systems have not focused on fewer sites in the US will be able to host the largest scale power efficiency. However, as the cost of power reaches par- computing systems due to limited availability of facilities ity with capital costs, it is increasingly important to com- with sufficient power and cooling capabilities. Following this pare systems with metrics based on the sustained perfor- trend, over time an ever increasing proportion of an HPC mance per watt. Therefore we need to establish practical center’s budget will be needed for supplying power to these methods to measure power consumption of such systems in- systems. situ in order to support such metrics. Our study provides The root cause of this impending crisis is that chip power power measurements for various computational loads on the efficiency is no longer improving at historical rates. Up until largest scale HPC systems ever involved in such an assess- now, Moore’s Law improvements in photolithography tech- ment. -
CS 267 Dense Linear Algebra: History and Structure, Parallel Matrix
Quick review of earlier lecture •" What do you call •" A program written in PyGAS, a Global Address CS 267 Space language based on Python… Dense Linear Algebra: •" That uses a Monte Carlo simulation algorithm to approximate π … History and Structure, •" That has a race condition, so that it gives you a Parallel Matrix Multiplication" different funny answer every time you run it? James Demmel! Monte - π - thon ! www.cs.berkeley.edu/~demmel/cs267_Spr16! ! 02/25/2016! CS267 Lecture 12! 1! 02/25/2016! CS267 Lecture 12! 2! Outline Outline •" History and motivation •" History and motivation •" What is dense linear algebra? •" What is dense linear algebra? •" Why minimize communication? •" Why minimize communication? •" Lower bound on communication •" Lower bound on communication •" Parallel Matrix-matrix multiplication •" Parallel Matrix-matrix multiplication •" Attaining the lower bound •" Attaining the lower bound •" Other Parallel Algorithms (next lecture) •" Other Parallel Algorithms (next lecture) 02/25/2016! CS267 Lecture 12! 3! 02/25/2016! CS267 Lecture 12! 4! CS267 Lecture 2 1 Motifs What is dense linear algebra? •" Not just matmul! The Motifs (formerly “Dwarfs”) from •" Linear Systems: Ax=b “The Berkeley View” (Asanovic et al.) •" Least Squares: choose x to minimize ||Ax-b||2 Motifs form key computational patterns •" Overdetermined or underdetermined; Unconstrained, constrained, or weighted •" Eigenvalues and vectors of Symmetric Matrices •" Standard (Ax = λx), Generalized (Ax=λBx) •" Eigenvalues and vectors of Unsymmetric matrices -
A Synthetic Benchmark
A synthetic benchmark H J Curnow and B A Wichmann Central Computer Agency, Riverwalk House, London SW1P 4RT National Physical Laboratory, Teddington, Middlesex TW11 OLW Computer Journal, Vol 19, No 1, pp43-49. 1976 Abstract A simple method of measuring performance is by means of a benchmark pro- gram. Unless such a program is carefully constructed it is unlikely to be typical of the many thousands of programs run at an installation. An example benchmark for measuring the processor power of scientific computers is presented: this is compared with other methods of assessing computer power. (Received December 1974) An important characteristic of computers used for scientific work is the speed of the central processor unit. A simple technique for comparing this speed for a variety of machines is to time some clearly defined task on each one. Unfortunately the ratio of speeds obtained varies enormously with the nature of the task being performed. If the task is defined informally in words, large variations can be caused by small differences in the tasks actually performed on the machines. These variations can be largely overcome by using a high level language to specify the task. An additional advantage of this method is that the efficiency of the compiler and the differences in machine architecture are automatically taken into account. In any case, most scientific programming is performed in high level languages, so these measurements will be a better guide to the machine’s capabilities than measurements based on use of low level languages. An example of the use of machine-independent languages to measure processing speed appears in Wichmann [7] which gives the times taken in microseconds to execute 42 basic statements in ALGOL 60 on some 50 machines.