Performance Tuning of Scientific Applications
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Developer Guide
Red Hat Enterprise Linux 6 Developer Guide An introduction to application development tools in Red Hat Enterprise Linux 6 Last Updated: 2017-10-20 Red Hat Enterprise Linux 6 Developer Guide An introduction to application development tools in Red Hat Enterprise Linux 6 Robert Krátký Red Hat Customer Content Services [email protected] Don Domingo Red Hat Customer Content Services Jacquelynn East Red Hat Customer Content Services Legal Notice Copyright © 2016 Red Hat, Inc. and others. This document is licensed by Red Hat under the Creative Commons Attribution-ShareAlike 3.0 Unported License. If you distribute this document, or a modified version of it, you must provide attribution to Red Hat, Inc. and provide a link to the original. If the document is modified, all Red Hat trademarks must be removed. Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law. Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries. Linux ® is the registered trademark of Linus Torvalds in the United States and other countries. Java ® is a registered trademark of Oracle and/or its affiliates. XFS ® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries. MySQL ® is a registered trademark of MySQL AB in the United States, the European Union and other countries. Node.js ® is an official trademark of Joyent. -
Analysing CMS Software Performance Using Igprof, Oprofile and Callgrind
Analysing CMS software performance using IgProf, OPro¯le and callgrind L Tuura1, V Innocente2, G Eulisse1 1 Northeastern University, Boston, MA, USA 2 CERN, Geneva, Switzerland Abstract. The CMS experiment at LHC has a very large body of software of its own and uses extensively software from outside the experiment. Understanding the performance of such a complex system is a very challenging task, not the least because there are extremely few developer tools capable of pro¯ling software systems of this scale, or producing useful reports. CMS has mainly used IgProf, valgrind, callgrind and OPro¯le for analysing the performance and memory usage patterns of our software. We describe the challenges, at times rather extreme ones, faced as we've analysed the performance of our software and how we've developed an understanding of the performance features. We outline the key lessons learnt so far and the actions taken to make improvements. We describe why an in-house general pro¯ler tool still ends up besting a number of renowned open-source tools, and the improvements we've made to it in the recent year. 1. The performance optimisation issue The Compact Muon Solenoid (CMS) experiment on the Large Hadron Collider at CERN [1] is expected to start running within a year. The computing requirements for the experiment are immense. For the ¯rst year of operation, 2008, we budget 32.5M SI2k (SPECint 2000) computing capacity. This corresponds to some 2650 servers if translated into the most capable computers currently available to us.1 It is critical CMS on the one hand acquires the resources needed for the planned physics analyses, and on the other hand optimises the software to complete the task with the resources available. -
Mairjason2015phd.Pdf (1.650Mb)
Power Modelling in Multicore Computing Jason Mair a thesis submitted for the degree of Doctor of Philosophy at the University of Otago, Dunedin, New Zealand. 2015 Abstract Power consumption has long been a concern for portable consumer electron- ics, but has recently become an increasing concern for larger, power-hungry systems such as servers and clusters. This concern has arisen from the asso- ciated financial cost and environmental impact, where the cost of powering and cooling a large-scale system deployment can be on the order of mil- lions of dollars a year. Such a substantial power consumption additionally contributes significant levels of greenhouse gas emissions. Therefore, software-based power management policies have been used to more effectively manage a system’s power consumption. However, man- aging power consumption requires fine-grained power values for evaluating the run-time tradeoff between power and performance. Despite hardware power meters providing a convenient and accurate source of power val- ues, they are incapable of providing the fine-grained, per-application power measurements required in power management. To meet this challenge, this thesis proposes a novel power modelling method called W-Classifier. In this method, a parameterised micro-benchmark is designed to reproduce a selection of representative, synthetic workloads for quantifying the relationship between key performance events and the cor- responding power values. Using the micro-benchmark enables W-Classifier to be application independent, which is a novel feature of the method. To improve accuracy, W-Classifier uses run-time workload classification and derives a collection of workload-specific linear functions for power estima- tion, which is another novel feature for power modelling. -
A Statistical Performance Model of the Opteron Processor
A Statistical Performance Model of the Opteron Processor Jeanine Cook Jonathan Cook Waleed Alkohlani New Mexico State University New Mexico State University New Mexico State University Klipsch School of Electrical Department of Computer Klipsch School of Electrical and Computer Engineering Science and Computer Engineering [email protected] [email protected] [email protected] ABSTRACT m5 [11], Simics/GEMS [25], and more recently MARSSx86 [5]{ Cycle-accurate simulation is the dominant methodology for the only one supporting the popular x86 architecture. Some processor design space analysis and performance prediction. of these have not been validated against real hardware in However, with the prevalence of multi-core, multi-threaded the multi-core mode (SESC and M5), and the others that architectures, this method has become highly impractical as are validated show large errors of up to 50% [5]. All cycle- the sole means for design due to its extreme slowdowns. We accurate multi-core simulators are very slow; a few hundred have developed a statistical technique for modeling multi- thousand simulated instructions per second is the most op- core processors that is based on Monte Carlo methods. Us- timistic speed with the help of native mode execution. Fur- ing this method, processor models of contemporary archi- ther, the architectures supported by many of these simula- tectures can be developed and applied to performance pre- tors are now obsolete. diction, bottleneck detection, and limited design space anal- ysis. To date, we have accurately modeled the IBM Cell, the To address these issues and to satisfy our own desire for Intel Itanium, and the Sun Niagara 1 and Niagara 2 proces- performance models of contemporary processors, we devel- sors [34, 33, 10]. -
Power Modeling
CHAPTER 5 POWER MODELING Jason Mair1, Zhiyi Huang1, David Eyers1, Leandro Cupertino2, Georges Da Costa2, Jean-Marc Pierson2 and Helmut Hlavacs3 1Department of Computer Science, University of Otago, New Zealand 2Institute for Research in Informatics of Toulouse (IRIT), University of Toulouse III, France 3Faculty of Computer Science, University of Vienna, Austria 5.1 Introduction Power consumption has long been a concern for portable consumer electronics, with many manufacturers explicitly seeking to maximize battery life in order to improve the usability of devices such as laptops and smart phones. However, it has recently become a concern in the domain of much larger, more power hungry systems such as servers, clusters and data centers. This new drive to improve energy efficiency is in part due to the increasing deployment of large-scale systems in more businesses and industries, which have two pri- mary motives for saving energy. Firstly, there is the traditional economic incentive for a business to reduce their operating costs, where the cost of powering and cooling a large data center can be on the order of millions of dollars [18]. Reducing the total cost of own- ership for servers could help to stimulate further deployments. As servers become more affordable, deployments will increase in businesses where concerns over lifetime costs pre- viously prevented adoption. The second motivating factor is the increasing awareness of the environmental impact—e.g. greenhouse gas emissions—caused by power production. Reducing energy consumption can help a business indirectly reduce their environmental impact, making them more clean and green. The most commonly adopted solution for reducing power consumption is a hardware- based approach, where old, inefficient hardware is replaced with newer, more energy effi- COST Action 0804, edition. -
Understanding the Linux Kernel, 3Rd Edition by Daniel P
1 Understanding the Linux Kernel, 3rd Edition By Daniel P. Bovet, Marco Cesati ............................................... Publisher: O'Reilly Pub Date: November 2005 ISBN: 0-596-00565-2 Pages: 942 Table of Contents | Index In order to thoroughly understand what makes Linux tick and why it works so well on a wide variety of systems, you need to delve deep into the heart of the kernel. The kernel handles all interactions between the CPU and the external world, and determines which programs will share processor time, in what order. It manages limited memory so well that hundreds of processes can share the system efficiently, and expertly organizes data transfers so that the CPU isn't kept waiting any longer than necessary for the relatively slow disks. The third edition of Understanding the Linux Kernel takes you on a guided tour of the most significant data structures, algorithms, and programming tricks used in the kernel. Probing beyond superficial features, the authors offer valuable insights to people who want to know how things really work inside their machine. Important Intel-specific features are discussed. Relevant segments of code are dissected line by line. But the book covers more than just the functioning of the code; it explains the theoretical underpinnings of why Linux does things the way it does. This edition of the book covers Version 2.6, which has seen significant changes to nearly every kernel subsystem, particularly in the areas of memory management and block devices. The book focuses on the following topics: • Memory management, including file buffering, process swapping, and Direct memory Access (DMA) • The Virtual Filesystem layer and the Second and Third Extended Filesystems • Process creation and scheduling • Signals, interrupts, and the essential interfaces to device drivers • Timing • Synchronization within the kernel • Interprocess Communication (IPC) • Program execution Understanding the Linux Kernel will acquaint you with all the inner workings of Linux, but it's more than just an academic exercise. -
Contributions of Hybrid Architectures to Depth Imaging: a CPU, APU and GPU Comparative Study
Contributions of hybrid architectures to depth imaging : a CPU, APU and GPU comparative study Issam Said To cite this version: Issam Said. Contributions of hybrid architectures to depth imaging : a CPU, APU and GPU com- parative study. Hardware Architecture [cs.AR]. Université Pierre et Marie Curie - Paris VI, 2015. English. NNT : 2015PA066531. tel-01248522v2 HAL Id: tel-01248522 https://tel.archives-ouvertes.fr/tel-01248522v2 Submitted on 20 May 2016 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. THESE` DE DOCTORAT DE l’UNIVERSITE´ PIERRE ET MARIE CURIE sp´ecialit´e Informatique Ecole´ doctorale Informatique, T´el´ecommunications et Electronique´ (Paris) pr´esent´eeet soutenue publiquement par Issam SAID pour obtenir le grade de DOCTEUR en SCIENCES de l’UNIVERSITE´ PIERRE ET MARIE CURIE Apports des architectures hybrides `a l’imagerie profondeur : ´etude comparative entre CPU, APU et GPU Th`esedirig´eepar Jean-Luc Lamotte et Pierre Fortin soutenue le Lundi 21 D´ecembre 2015 apr`es avis des rapporteurs M. Fran¸cois Bodin Professeur, Universit´ede Rennes 1 M. Christophe Calvin Chef de projet, CEA devant le jury compos´ede M. Fran¸cois Bodin Professeur, Universit´ede Rennes 1 M. -
Best Practice Guide - Generic X86 Vegard Eide, NTNU Nikos Anastopoulos, GRNET Henrik Nagel, NTNU 02-05-2013
Best Practice Guide - Generic x86 Vegard Eide, NTNU Nikos Anastopoulos, GRNET Henrik Nagel, NTNU 02-05-2013 1 Best Practice Guide - Generic x86 Table of Contents 1. Introduction .............................................................................................................................. 3 2. x86 - Basic Properties ................................................................................................................ 3 2.1. Basic Properties .............................................................................................................. 3 2.2. Simultaneous Multithreading ............................................................................................. 4 3. Programming Environment ......................................................................................................... 5 3.1. Modules ........................................................................................................................ 5 3.2. Compiling ..................................................................................................................... 6 3.2.1. Compilers ........................................................................................................... 6 3.2.2. General Compiler Flags ......................................................................................... 6 3.2.2.1. GCC ........................................................................................................ 6 3.2.2.2. Intel ........................................................................................................ -
Pipenightdreams Osgcal-Doc Mumudvb Mpg123-Alsa Tbb
pipenightdreams osgcal-doc mumudvb mpg123-alsa tbb-examples libgammu4-dbg gcc-4.1-doc snort-rules-default davical cutmp3 libevolution5.0-cil aspell-am python-gobject-doc openoffice.org-l10n-mn libc6-xen xserver-xorg trophy-data t38modem pioneers-console libnb-platform10-java libgtkglext1-ruby libboost-wave1.39-dev drgenius bfbtester libchromexvmcpro1 isdnutils-xtools ubuntuone-client openoffice.org2-math openoffice.org-l10n-lt lsb-cxx-ia32 kdeartwork-emoticons-kde4 wmpuzzle trafshow python-plplot lx-gdb link-monitor-applet libscm-dev liblog-agent-logger-perl libccrtp-doc libclass-throwable-perl kde-i18n-csb jack-jconv hamradio-menus coinor-libvol-doc msx-emulator bitbake nabi language-pack-gnome-zh libpaperg popularity-contest xracer-tools xfont-nexus opendrim-lmp-baseserver libvorbisfile-ruby liblinebreak-doc libgfcui-2.0-0c2a-dbg libblacs-mpi-dev dict-freedict-spa-eng blender-ogrexml aspell-da x11-apps openoffice.org-l10n-lv openoffice.org-l10n-nl pnmtopng libodbcinstq1 libhsqldb-java-doc libmono-addins-gui0.2-cil sg3-utils linux-backports-modules-alsa-2.6.31-19-generic yorick-yeti-gsl python-pymssql plasma-widget-cpuload mcpp gpsim-lcd cl-csv libhtml-clean-perl asterisk-dbg apt-dater-dbg libgnome-mag1-dev language-pack-gnome-yo python-crypto svn-autoreleasedeb sugar-terminal-activity mii-diag maria-doc libplexus-component-api-java-doc libhugs-hgl-bundled libchipcard-libgwenhywfar47-plugins libghc6-random-dev freefem3d ezmlm cakephp-scripts aspell-ar ara-byte not+sparc openoffice.org-l10n-nn linux-backports-modules-karmic-generic-pae -
AMD's Early Processor Lines, up to the Hammer Family (Families K8
AMD’s early processor lines, up to the Hammer Family (Families K8 - K10.5h) Dezső Sima October 2018 (Ver. 1.1) Sima Dezső, 2018 AMD’s early processor lines, up to the Hammer Family (Families K8 - K10.5h) • 1. Introduction to AMD’s processor families • 2. AMD’s 32-bit x86 families • 3. Migration of 32-bit ISAs and microarchitectures to 64-bit • 4. Overview of AMD’s K8 – K10.5 (Hammer-based) families • 5. The K8 (Hammer) family • 6. The K10 Barcelona family • 7. The K10.5 Shanghai family • 8. The K10.5 Istambul family • 9. The K10.5-based Magny-Course/Lisbon family • 10. References 1. Introduction to AMD’s processor families 1. Introduction to AMD’s processor families (1) 1. Introduction to AMD’s processor families AMD’s early x86 processor history [1] AMD’s own processors Second sourced processors 1. Introduction to AMD’s processor families (2) Evolution of AMD’s early processors [2] 1. Introduction to AMD’s processor families (3) Historical remarks 1) Beyond x86 processors AMD also designed and marketed two embedded processor families; • the 2900 family of bipolar, 4-bit slice microprocessors (1975-?) used in a number of processors, such as particular DEC 11 family models, and • the 29000 family (29K family) of CMOS, 32-bit embedded microcontrollers (1987-95). In late 1995 AMD cancelled their 29K family development and transferred the related design team to the firm’s K5 effort, in order to focus on x86 processors [3]. 2) Initially, AMD designed the Am386/486 processors that were clones of Intel’s processors. -
The Severest of Them All: Inference Attacks Against Secure Virtual Enclaves Jan Werner Joshua Mason Manos Antonakakis UNC Chapel Hill U
The SEVerESt Of Them All: Inference Attacks Against Secure Virtual Enclaves Jan Werner Joshua Mason Manos Antonakakis UNC Chapel Hill U. Illinois Georgia Tech [email protected] [email protected] [email protected] Michalis Polychronakis Fabian Monrose Stony Brook University UNC Chapel Hill [email protected] [email protected] ABSTRACT and Communications Security (AsiaCCS ’19), July 9–12, 2019, Auckland, The success of cloud computing has shown that the cost and con- New Zealand. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/ venience benefits of outsourcing infrastructure, platform, and soft- 3321705.3329820 ware resources outweigh concerns about confidentiality. Still, many businesses resist moving private data to cloud providers due to in- 1 INTRODUCTION tellectual property and privacy reasons. A recent wave of hardware Of late, the need for a Trusted Execution Environment has risen virtualization technologies aims to alleviate these concerns by of- to the forefront as an important consideration for many parties in fering encrypted virtualization features that support data confiden- the cloud computing ecosystem. Cloud computing refers to the use tiality of guest virtual machines (e.g., by transparently encrypting of on-demand networked infrastructure software and capacity to memory) even when running on top untrusted hypervisors. provide resources to customers [52]. In today’s marketplace, con- We introduce two new attacks that can breach the confidentiality tent providers desire the ability to deliver copyrighted or sensitive of protected enclaves. First, we show how a cloud adversary can material to clients without the risk of data leaks. At the same time, judiciously inspect the general purpose registers to unmask the computer manufacturers must be able to verify that only trusted computation that passes through them. -
Improving the Approach to Linux Performance Analysis an Analyst Point of View
Improving the Approach to Linux Performance Analysis An analyst point of view Jose Santos Guanglei Li IBM’s Linux Technology Center IBM’s China Development Lab [email protected] [email protected] Abstract 1 Introduction In-depth Linux kernel performance analysis As the complexity of the Linux kernel in- and debugging historically has been a focus creases, so does the complexity of the prob- which required resident kernel hackers. This lems that impact performance on a produc- effort not only required deep kernel knowledge tion system. In addition, the hardware sys- to analyze the code, but also required program- tems on which Linux runs today is also increas- ming skills in order to modify the code, re-build ing in complexity, scale, and size. Customers the kernel, re-boot and extract and format the are also running more robust and challenging information from the system. workloads in production environments and are A variety of powerful Linux system tools expecting more reliability, stability, and perfor- are emerging which provide significantly more mance from the underlying operating system flexibility for the analyst and the system owner. and hardware platform. The end result of all This paper highlights an example set of per- this complexity is that new performance bar- formance problems which are often seen on a riers continue to emerge which in many cases system, and focuses on the methodology, ap- are increasingly difficult to analyze and under- proach, and steps that can be used to address stand. each problem. In the past, many of these problems required the Examples of a set of pre-defined SystemTAP expertise of kernel developers to create special- “tapsets” are provided which make it easier ized one-off tools that were specific to the prob- for the non-programmer to extract information lem at hand and are of little use to other prob- about kernel events from a running system, ef- lems.