Quantitative Application Data Flow Characterization for Heterogeneous Multicore Architectures


Quantitative Application Data Flow Characterization for Heterogeneous Multicore Architectures

S. Arash Ostadzadeh

DISSERTATION

for the degree of doctor at Technische Universiteit Delft, by authority of the Rector Magnificus, prof. ir. K.Ch.A.M. Luyben, chairman of the Board for Doctorates, to be defended in public on Tuesday [...] December [...] at [...] by

Sayyed Arash OSTADZADEH
Master of Science in Computer Engineering, Ferdowsi University of Mashhad
born in Mashhad, Iran

This dissertation has been approved by the promotor: Prof. dr. K.L.M. Bertels

Composition of the doctoral committee:
Rector Magnificus, chairman
Prof. dr. K.L.M. Bertels, Technische Universiteit Delft, promotor
Prof. dr. ir. H.J. Sips, Technische Universiteit Delft
Prof. Dr.-Ing. Michael Hübner, Ruhr-Universität Bochum, Germany
Prof. Dr.-Ing. Mladen Berekovic, Technische Universität Braunschweig, Germany
Prof. dr. Henk Corporaal, Technische Universiteit Eindhoven
Prof. dr. ir. Dirk Stroobandt, Universiteit Gent, Belgium
Dr. G.K. Kuzmanov, Technische Universiteit Delft
Prof. dr. ir. F.W. Jansen, Technische Universiteit Delft, reserve member

S. Arash Ostadzadeh
Quantitative Application Data Flow Characterization for Heterogeneous Multicore Architectures
With a summary in Dutch.

Subject headings: Dynamic Binary Instrumentation, Application Partitioning, Hardware/Software Co-design.

The cover images are abstract artworks created by the Agony drawing program developed by Kelvin (http://www.kelbv.com/agony.php).

Copyright © S. Arash Ostadzadeh
ISBN 978-94-6186-095-8

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the permission of the author.
Printed in The Netherlands

Dedicated to my dear parents

Abstract

Recent trends show a steady increase in the utilization of heterogeneous multicore architectures in order to address the ever-growing need for computing performance. These emerging architectures pose specific challenges with regard to their programmability. In addition, they require efficient application mapping schemes to fully harness their processing power and avoid bottlenecks. In this respect, it is of critical importance to analyse application behaviour, and the data communication between tasks, in particular.

In this dissertation, we present a profiling framework that helps developers to gain an insight into the behaviour of an application. The presented profiling framework is generic and not restricted to a particular platform, application, or purpose. We utilize this framework with the primary goal of mapping applications onto a heterogeneous multicore architecture.

The framework includes a memory access profiling toolset, called Quad, that provides quantitative information regarding the memory accesses in an application. Quad utilizes Dynamic Binary Instrumentation (DBI) to detect the actual data dependencies that occur between the tasks of an application at runtime. Additionally, it also provides accurate memory access measurements, such as the amount of data transferred between tasks and the memory size required for their communication. Such information can be utilized to identify critical parts of an application, to highlight coarse-grained parallelism opportunities, and to guide code optimizations.

As a proof of concept to substantiate the usefulness of the extracted profiling information, we utilize the main output of Quad, the Quantitative Data Usage (QDU) graph, as the input model to formulate a general application partitioning problem. The formulation of this intractable problem is flexible and accommodates different design objectives and constraints.
Subsequently, we propose a heuristic algorithm to find high quality partitions of an application in a reasonable amount of time. In addition to the complexity analysis of the proposed algorithm, we present a thorough theoretical analysis of the application partitioning problem. In order to evaluate the quality of the solutions, we developed a test bench for generating synthetic QDU graphs and compared the results against the optimal partitions obtained using an exhaustive search. The comparison results show that the proposed heuristic algorithm is able to provide optimal or near-optimal solutions.

To further prove the applicability of the profiling framework, we investigate in detail the utilization of the framework in practice, by mapping two real applications onto a heterogeneous reconfigurable architecture. To achieve this goal, we propose a hardware/software partitioning methodology that introduces the concept of merging tightly-coupled tasks based on the data communication analysis. Moreover, the profiling information is utilized to fine-tune the applications and optimize their data flow. The obtained results show a performance increase of [...]% and [...]%.

Acknowledgements

My interest in computers dates back to [...], when I managed to get my hands on a Commodore [...]. I can still vividly recall the day my brother came up with a magic box in his hands. All it needed was just a "poke" to make my already hypnotized eyes poke out! And yes, I do remember the magical number after all these years! POKE [...],<color code: [...]-[...]> and bingo… you have the desired border color! Simple, but it was more than enough to cast a spell on me. If I am where I am standing today, it is because of you, Shervin. I decided to study computer science because I was enchanted by your programming skills and enthusiasm. I will never forget all those good times when I used to sit beside you, trying to learn something new about computers.
You are not only a dear friend and a true brother to me, but also a perennial source of inspiration and fortitude. Thanks for your selfless support and encouragement through all these years.

This thesis is the outcome not only of my endeavor over the last years, but also of the kind guidance, assistance and support of several individuals, to all of whom I am deeply grateful. Words fail to stand for the deep gratitude that I wish to express to all of you. I would like to stress the fact that the order in which I acknowledge the names is not representative of the value that I place on their roles in this respect.

First of all, I would like to sincerely thank my advisor and promotor, Prof. Koen Bertels, who gave me the chance to step into the PhD journey. Koen, I greatly value your continuous support, commitment, and patience, which immensely influenced my research view. You gave me the opportunity to develop myself in different aspects and to have a vision for future research. I am grateful for your dedication to guide me along the entire journey, our fruitful discussions, and the freedom that you granted me to pursue the research work. I am also thankful for your invaluable comments on my thesis.

I would like to extend my gratitude to my defense committee for the time that they invested in reading the thesis manuscript. I appreciate their insightful discussions and suggestions to improve the quality of this thesis.

I am indebted to my dear friend, Faisal, for all the proofreading of my thesis. Faisal, I value the time we spent on the research collaboration; but above that, I highly appreciate your genial friendship. Thanks for the helping hand whenever I needed you. My appreciation also goes to Roel for the comments and discussions on our collaborative research work. Roel, thanks for helping me during the recent years in Holland. I would like to express my gratitude to Imran for his valuable contributions to the extension of this research work.
Imran, you are a smart, dedicated, hardworking researcher that anyone would cherish working with. At the same time, you are a modest and trustworthy friend. I would also like to thank Kamana for her friendship and support. I am grateful to Carlo for all the proofreading of my manuscripts and his comments over the last few years. I would also like to acknowledge Valery and Marco for their efforts to improve the Quad toolset, Andrew for kindly proofreading the abstract and propositions of the thesis, Roel and Moa for their translation into Dutch.

My appreciation goes to Iranian friends in Holland who helped me settle down here and made me feel at home: Mahmood, Mojtaba, Alireza, Javad, Mehdi, Behnaz, Mahyar, Rahim, Hamed, Asad, Roya, Mehdi, Gholam Reza, Vahid, Azadeh, Ashkan, Ghazaleh, Sepideh, Mohammad Reza, Reza, Masoud, Mohammad, Amin, Behzad, Hossein, Hadi, Arash, Mohamad Reza, Ali, Hossein, and other friends that I have failed to name here. A special thanks goes to Alireza and Javad for tolerating me when I was falling asleep where I was not supposed to! … thanks for being supportive through all these years. Mahyar, I appreciate all your invaluable support and kindness.

I would also like to acknowledge my present and former colleagues in the Computer Engineering research group at TU Delft: Zubair, Luyi, Thomas, Dimitrios, Sebastian, Jae, Tariq, Fakhar, Seyab, Aqeel, Mafalda, Innocent, Laiq, Bogdan, Omar, Hamid, Roel, Saleh, Elena, Cuong, Vlad, Razvan, Muhammad, Chunyang, and Demid. I am grateful to Lidwina for taking care of all the administrative work during these years. I also wish to thank Bert, Erik and Eef for their technical support.

I would like to take this opportunity to express my sincere appreciation to all my teachers who have taught me since I went to school, and to all my wonderful friends in Iran for their prayers, kind words, and moral support.
Finally, I wish to express my deepest gratitude to my dear parents for their endless love, support, and commitment throughout my life. Mom, Dad, your incomparable love gave me the strength to overcome all the troubles that I faced in my life. You were my one and only motivation to stay and complete this journey. Mom, Dad, I endured just to see the smile on your face, which means more than the world to me.