Design of a Parallel Multi-Threaded Programming Model for Multi-Core Processors


DESIGN OF A PARALLEL MULTI-THREADED PROGRAMMING MODEL FOR MULTI-CORE PROCESSORS

By Muhammad Ali Ismail

Thesis submitted for the Degree of Doctor of Philosophy

PhD Thesis, Batch: 2008-2009
Project Advisor: Prof. Dr. Shahid Hafeez Mirza
Project Co-supervisor: Prof. Dr. Talat Altaf

Department of Computer and Information Systems Engineering
NED University of Engineering & Technology
University Road, Karachi - 75270, Pakistan
2011

Certificate

Certified that the thesis entitled “DEVELOPMENT OF A NEW PARALLEL MULTI-THREADED PROGRAMMING MODEL FOR MULTI-CORE PROCESSORS”, which is being submitted by Mr. Muhammad Ali Ismail for the award of the degree of Doctor of Philosophy in the Computer & Information Systems Engineering Department of NED University of Engineering and Technology, is a record of the candidate’s own original work carried out by him under our supervision and guidance. The work incorporated in this thesis has not been submitted elsewhere for the award of any other degree.

Prof. Dr. Talat Altaf, Dean (ECE), NEDUET, PhD Co-supervisor
Prof. Dr. Shahid Hafeez Mirza, Professor, UIT, PhD Supervisor

Acknowledgements

In the first place, I would like to thank the Almighty Allah for His countless blessings. In fact, all praise and glory belong to Him, and none has the right and worth to be worshipped but He. Next, I would like to acknowledge my home university, NED University of Engineering and Technology, for giving me the opportunity and funding to conduct this PhD research. I would also like to express my gratitude to my mentor and supervisor, Prof. Dr. Shahid Hafeez Mirza, for his generous supervision.
His continuous support, encouragement, guidance, advice and comments kept me headed in the right direction to complete this research. I am also very grateful to my co-supervisor, Prof. Dr. Talat Altaf, for his very kind advice, support and motivation throughout my PhD research. Many thanks to my department, Computer and Information Systems Engineering, including my colleagues and its administrative and technical staff, for providing me with such a supportive and productive work environment. Last but not least, special thanks to my family, particularly my parents, for their endless prayers and support.

CONTENTS

Abstract
List of Publications
List of Figures
List of Tables
1. Introduction
  1.1. Contributions of Dissertation
    1.1.1. Multi-level Cache System for Multi-core Processors ("LogN+1" and "LogN" Cache Models)
    1.1.2. Multi-level Cache Simulator for Multi-core processors ("MCSMC")
    1.1.3. Multi-threaded Parallel Programming Model for Multi-core processors ("SPC3 PM")
  1.2. The Thesis Organization
2. Motivation and Challenges with Multi-Core Processors
  2.1. Architectural Challenges
    2.1.1. Memory Hierarchy
      2.1.1.1. Cache Levels
      2.1.1.2. Synchronization
      2.1.1.3. False Sharing
      2.1.1.4. Spinning
      2.1.1.5. Communication Minimization
    2.1.2. Architectural Support for Compilers / Programming Models
  2.2. Software Challenges
    2.2.1. Parallel Programming Models
    2.2.2. Parallel Algorithm Models
      2.2.2.1. Data Parallel Models
      2.2.2.2. Task Graph Model
      2.2.2.3. Work Pool Model
      2.2.2.4. Master-Slave Model
      2.2.2.5. Pipeline or Producer-Consumer Model
    2.2.3. Decomposition Techniques
      2.2.3.1. Recursive Decomposition
      2.2.3.2. Data Decomposition
      2.2.3.3. Exploratory Decomposition
      2.2.3.4. Speculative Decomposition
    2.2.4. Levels of Parallelism
    2.2.5. Compiler Optimization
      2.2.5.1. Parallelism
      2.2.5.2. Removal of Data Dependencies
      2.2.5.3. Memory Space
    2.2.6. Related Tools for Performance and Parallel Debugging
    2.2.7. Regular and Irregular Problems
  2.3. Performance and Scalability Issues
  2.4. Summary
3. 'LogN+1' and 'LogN' Cache Model, A Binary Tree Based Cache System for Multi-Core Processors
  3.1. Present 3-level Cache System and Related Improvements for Multi-core Processors
  3.2. 'LogN+1' and 'LogN' Cache Model
    3.2.1. Design Concept
    3.2.2. Cache Hierarchy and Cache Size
    3.2.3. Cache Hierarchy and Cache frequency (Cycle Time)
  3.3. Performance Evaluation
    3.3.1. Average Cache Access Time
    3.3.2. Probability of Cache Hits
    3.3.3. Result Analysis
  3.4. Summary
4. Queuing Modeling of 'LogN+1' and 'LogN' Cache Models
  4.1. Queuing Theory and Kendall’s Notation
  4.2. M/D/C/K-FIFO Queuing Model for LogN+1 and LogN Cache Model
    4.2.1. Basic Model
    4.2.2. Performance Equations
      4.2.2.1. Average Data Request Rate
      4.2.2.2. Average Cache Utilization
      4.2.2.3. Average Individual Cache Access Time
      4.2.2.4. Average Request Queue Length
      4.2.2.5. Overall Average Cache System Access Time
  4.3. Queuing Model for 3-Level Cache system
  4.4. Performance Evaluation
    4.4.1. LogN+1 Model
    4.4.2. LogN Model
    4.4.3. Present 3-Level Cache System
    4.4.4. Result Analysis
  4.5. Summary
5. Simulation of 'LogN+1' and 'LogN' Cache Models Using 'MCSMC'
  5.1. Cache Simulation
  5.2. MCSMC (Multi-level Cache Simulator for Multi-Cores)
    5.2.1. Input Parameters Set
    5.2.2. Software Modules
      5.2.2.1. Cache Architecture Generator
      5.2.2.2. Program Scheduler
      5.2.2.3. Trace Generator
      5.2.2.4. Replacement Policy Module
      5.2.2.5. Results Generation
    5.2.3. Serial / Parallel Execution of MCSMC
    5.2.4. Comparison with CACTI Cache Simulator
  5.3. Performance Evaluation
    5.3.1. Simulation Environment
    5.3.2. Result Analysis
  5.4. Summary
6. SPC3 PM; A Multithreaded Parallel Software Development Environment for Multi-Core Processors
  6.1. Currently Available Parallel Programming Tools
    6.1.1. Commercially Available Multi-Core Application Development Aids
      6.1.1.1. Intel's Multi-Core Application Development Aids
      6.1.1.2. Microsoft’s Multi-Core Application Development Aids
      6.1.1.3. Sun's Multi-Core Application Development Aids
      6.1.1.4. Other Commercial Multi-Core Application Development Aids
    6.1.2. Other Standard Shared Memory Programming Approaches Used for Multi-core processors
      6.1.2.1. Erlang
      6.1.2.2. POSIX Thread (Pthreads)
      6.1.2.3. OpenMP
    6.1.3. Research Oriented Multi-Core Application Development Tools
    6.1.4. Current Multi-Core Research Groups
    6.1.5. Summary
  6.2. Key Features of SPC3 PM
  6.3. Design Concepts
    6.3.1. Design Issues with Multi-Core Programming
    6.3.2. Task Based Parallelism
    6.3.3. Thread Level Parallelism
    6.3.4. Decomposition Techniques
    6.3.5. Task Scheduling
    6.3.6. Execution Modes
    6.3.7. Types of Problem Supported
    6.3.8. Data Sharing
    6.3.9. Compilation
  6.4. Programming with SPC3 PM
    6.4.1. Rules for Task Decomposition
    6.4.2. Properties of a Task
    6.4.3. Program Structure
    6.4.4. SPC3 PM Library
      6.4.4.1. Serial Function
      6.4.4.2. Parallel Function
      6.4.4.3. Concurrent Function
  6.5. Performance Evaluation
    6.5.1. Matrix Multiplication Algorithm
    6.5.2. Serial Function
    6.5.3. Parallel Function
    6.5.4. Concurrent Function
  6.6. Summary
7. Solving Travelling Salesman Problem using SPC3 PM
  7.1. Travelling Salesman Problem (TSP)
    7.1.1. TSP applications
    7.1.2. TSP solutions
      7.1.2.1. Exact Algorithms
      7.1.2.2. TSP Heuristics
      7.1.2.3. Meta-Heuristics
      7.1.2.4. Hyper-Heuristics
  7.2. Lin-Kernighan Heuristic
    7.2.1. Basic Lin-Kernighan Heuristic Algorithm (LKH)
    7.2.2. Modified Lin-Kernighan Heuristic Algorithm (LKH-1)
    7.2.3. Lin-Kernighan Heuristic Algorithm with General k-opt Sub-move (LKH2)
  7.3. LKH-2 Software
    7.3.1. Execution of LKH-2 Software
    7.3.2. Flow Chart for LKH-2 Software Processing
  7.4. Parallelization of LKH-2 Software using SPC3 PM
    7.4.1. Flow Chart for Parallel LKH-2 Software Processing Parallelized using SPC3 PM
  7.5. Performance Evaluation
    7.5.1. TSP Library (TSPLIB)
    7.5.2. Result Analysis
  7.6. Summary
8. Conclusions and Future Work
  8.1. Summary
  8.2. Future work
Appendix A: List of TSP instances in TSPLIB
References

Abstract

With the arrival of Chip Multi-Processors (CMPs), every processor now has built-in parallel computational power, which can be fully utilized only if the program in execution is written accordingly. Also, existing memory systems and parallel development tools do not provide adequate support for general-purpose multi-core programming and are unable to utilize all available cores efficiently.
Recommended publications
  • Optimizing Applications for Multicore by Intel Software Engineer Levent Akyil Welcome to the Parallel Universe
    Letter to the Editor by parallelism author and expert James Reinders
    Are You Ready to Enter a Parallel Universe: Optimizing Applications for Multicore, by Intel software engineer Levent Akyil
    Welcome to the Parallel Universe
    Contents
    Think Parallel or Perish, BY JAMES REINDERS
    James Reinders, Lead Evangelist and a Director with Intel® Software Development Products, sees a future where every software developer needs to be thinking about parallelism first when programming. He first published “Think Parallel or Perish” three years ago. Now he revisits his comments to offer an update on where we have gone and what still lies ahead.
    Parallelization Methodology
    The four stages of parallel application development addressed by Intel® Parallel Studio.
    Writing Parallel Code Safely, BY PETER VARHOL
    Writing multithreaded code to take full advantage of multiple processors and multicore processors is difficult. The new Intel® Parallel Studio should help us bridge that gap.
    Are You Ready to Enter a Parallel Universe: Optimizing Applications for Multicore, BY LEVENT AKYIL
    A look at parallelization methods made possible by the new Intel® Parallel Studio, designed for Microsoft Visual Studio* C/C++ developers of Windows* applications.
  • Andrzej Nowak - Bio
    Multi-core Architectures
    Theme: Towards Reconfigurable High-Performance Computing, Lecture 2
    Andrzej Nowak, CERN openlab (Geneva, Switzerland)
    Inverted CERN School of Computing (iCSC 2008), 3-5 March 2008, CERN
    Andrzej Nowak - Bio
    2005-2006 Intel Corporation: IEEE 802.16d/e WiMax development; Linux kernel performance optimizations research
    2006 Master Engineer diploma in Computer Science: Distributed Applications & Internet Systems; Computer Systems Modeling
    2007-2008 CERN openlab: Multi-core technologies; Performance monitoring; Systems architecture
    Introduction
    Objectives: Explain why multi-core architectures have become so popular; Explain why parallelism is such a good bet for the near future; Provide information about multi-core specifics; Discuss the changes in computing landscape; Discuss the impact of hardware on software
    Contents: Hardware part; Software part; Outlook
    THE FREE RIDE IS OVER
    Recession looms?
    Moore’s Law (1)
    An observation made in 1965 by Gordon Moore, the co-founder of Intel Corporation:
    Fundamentals of scalability
    Scalability: “readiness for enlargement”
    Good scalability: Additional
  • Efficiency, Energy Efficiency and Programming of Accelerated HPC Servers: Highlights of PRACE Studies
    Efficiency, energy efficiency and programming of accelerated HPC servers: Highlights of PRACE studies
    Lennart Johnsson, Department of Computer Science, University of Houston, and School of Computer Science and Communications, KTH
    To appear in Springer Verlag “GPU Solutions to Multi-scale Problems in Science and Engineering”, 2011
    Abstract
    During the last few years the convergence in architecture for High-Performance Computing systems that took place for over a decade has been replaced by a divergence. The divergence is driven by the quest for performance, cost-performance and in the last few years also energy consumption that during the life-time of a system have come to exceed the HPC system cost in many cases. Mass market, specialized processors, such as the Cell Broadband Engine (CBE) and Graphics Processors, have received particular attention, the latter especially after hardware support for double-precision floating-point arithmetic was introduced about three years ago. The recent support of Error Correcting Code (ECC) for memory and significantly enhanced performance for double-precision arithmetic in the current generation of Graphic Processing Units (GPUs) have further solidified the interest in GPUs for HPC.
    In order to assess the issues involved in potentially deploying clusters with nodes consisting of commodity microprocessors with some type of specialized processor for enhanced performance or enhanced energy efficiency or both for science and engineering workloads, PRACE, the Partnership for Advanced Computing in Europe, undertook a study that included three types of accelerators, the CBE, GPUs and ClearSpeed, and tools for their programming. The study focused on assessing performance, efficiency, power efficiency for double-precision arithmetic and programmer productivity.
  • Download (633Kb)
    Ct: A New Paradigm for Data Parallel Computing
    Hans-Christian Hoppe, Intel Visual Computing Institute, Intel Labs
    using material from Anwar Ghuloum, CJ Newburn, Michael McCool and Stefanus Du Toit, Performance and Productivity Libraries, Developer Products Division, Software and Services Group
    Legal Disclaimer
    INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
    Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products.
    Intel, Intel Core and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Copyright © 2009. Intel Corporation. http://intel.com/software/products
    Software & Services Group, Developer Products Division. Copyright © 2009, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
  • General Purpose Programming on Modern Graphics Hardware
    GENERAL PURPOSE PROGRAMMING ON MODERN GRAPHICS HARDWARE
    Robert Fleming, B.Sc.
    Thesis Prepared for the Degree of MASTER OF SCIENCE, UNIVERSITY OF NORTH TEXAS, May 2008
    APPROVED: Robert Renka, Major Professor; Armin Mikler, Committee Member; Tom Jacob, Committee Member; Krishna Kavi, Chair of the Department of Computer Science and Engineering; Oscar Garcia, Dean of the College of Engineering; Sandra L. Terrell, Dean of the Robert B. Toulouse School of Graduate Studies
    Fleming, Robert. General Purpose Programming on Modern Graphics Hardware. Master of Science (Computer Science), May 2008, 90 pp., 1 table, 3 figures, references, 124 titles.
    I start with a brief introduction to the graphics processing unit (GPU) as well as general-purpose computation on modern graphics hardware (GPGPU). Next, I explore the motivations for GPGPU programming, and the capabilities of modern GPUs (including advantages and disadvantages). Also, I give the background required for further exploring GPU programming, including the terminology used and the resources available. Finally, I include a comprehensive survey of previous and current GPGPU work, and end with a look at the future of GPU programming.
    Copyright 2008 by Robert Fleming
    To Wanda and Cheesepuff, my partners in crime.
    TABLE OF CONTENTS
    LIST OF TABLES AND ILLUSTRATIONS
    Chapters
    1. MOTIVATION
    1.1 What Kinds of Computation Suit the
  • Next Generation Developer Tools from Intel
    Next Generation Developer Tools from Intel: Focusing on new Performance Analysis Tool
    4th Parallel Tools Workshop, HLRS, September 2010
    Software & Services Group, Developer Products Division. Copyright © 2010, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
    Developer Tools of Intel, Today
    Fully Supported Developer Products: … and numerous unsupported tools like PIN, PTU, AVX Emulator, CnC freely available from whatif.intel.com and other public sites
    Intel® Parallel Studio (Windows Only): New Version 2011 Released Sep 2nd!
    Intel® Parallel Advisor: Demystifies and speeds parallel application design; Direct user where to parallelize; Explorer & Modeler tools give parallelism design insight and analysis; Proposes parallelism scheme best suited for application; Summary view for decision-making
    Intel® Parallel Composer: C++ Compiler, libraries, debugger plug-in; Intel® Parallel Debugger Extension: Simplify debugging parallel code; A family of Parallel models (New!): Set of portable, reliable, future proof parallel models for both data and task parallelism, includes Intel TBB, Cilk Plus; Support for Intel Array Building Blocks; Intel® IPP, OpenMP* included
    Intel® Parallel Inspector: Dynamic Memory & thread Analysis for serial and parallel code; Finds thread data races & deadlocks; Finds memory leaks and corruption
    Intel® Parallel Amplifier: Parallel performance analyzer; Find both serial and parallel performance bottlenecks; Scale application performance with more processor cores; No special compilers or builds necessary
  • Gossamer: a Lightweight Approach to Using Multicore Machines
    Gossamer: A Lightweight Approach to Using Multicore Machines
    Item Type: text; Electronic Dissertation
    Authors: Roback, Joseph Anthony
    Publisher: The University of Arizona.
    Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
    Download date: 06/10/2021 14:59:16
    Link to Item: http://hdl.handle.net/10150/194468
    GOSSAMER: A LIGHTWEIGHT APPROACH TO USING MULTICORE MACHINES by Joseph Anthony Roback
    A Dissertation Submitted to the Faculty of the DEPARTMENT OF COMPUTER SCIENCE In Partial Fulfillment of the Requirements For the Degree of DOCTOR OF PHILOSOPHY In the Graduate College, THE UNIVERSITY OF ARIZONA, 2010
    STATEMENT BY AUTHOR
    This dissertation has been submitted in partial fulfillment of requirements for an advanced degree at the University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library. Brief quotations from this dissertation are allowable without special permission, provided that accurate acknowledgment of source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College when in his or her judgment the proposed use of the material is in the interests of scholarship. In all other instances, however, permission must be obtained from the author.
    SIGNED: Joseph Anthony Roback
    ACKNOWLEDGEMENTS
    I cannot adequately express my gratitude to my advisor, Dr.
  • Software Plattform Embedded Systems 2020
    SPES Software Plattform Embedded Systems 2020
    Description of the case study “Multi-core and Many-core Evaluation”
    Version: 1.0
    Project name: SPES 2020
    Responsible: Richard Membarth
    QA responsible: Mario Körner, Frank Hannig
    Created on: 18.06.2010
    Last modified: 18.06.2010 16:09
    Release status: Confidential for partners; Project-public; X Public
    Processing status: in progress; submitted; X completed
    Further product information. Created by: Richard Membarth. Contributors: Frank Hannig, Mario Körner, Wieland Eckert
    Change log. No. 1, date 22.06.10, version 1.0, changed chapters: all, description of change: final report preparation
    Review log. The following table gives an overview of all reviews of this document, both self-reviews and reviews by independent quality assurance. Columns: Date, Reviewed version, Remarks, Reviewer, New product status
    Contents
    1 Evaluation Application and Criteria
    1.1 2D/3D Image Registration
    1.2 Checklist
    1.3 Profiling
    1.4 Parallelization Approaches
    2 Multi-Core Frameworks
    2.1 OpenMP
    2.2 Cilk++
    2.3 Threading Building Blocks
    2.4 RapidMind
    2.5 OpenCL
    2.6 Discussion
    3 Many-Core Frameworks
    3.1 RapidMind
    3.2 PGI Accelerator
    3.3 OpenCL
    3.4 CUDA
    3.5 Intel Ct
    3.6 Larrabee
    3.7 Related Frameworks
    3.7.1 Bulk-Synchronous GPU Programming
    3.7.2 HMPP Workbench
    3.7.3 Goose
    3.7.4 YDEL for CUDA
    3.8 Discussion
    4 Conclusion
    Bibliography
    Abstract
    In this study, different parallelization frameworks for standard shared memory multi-core processors as well as parallelization frameworks for many-core processors like graphics cards are evaluated.
  • Cross-Platform Software Optimization with Intel's Ct Technology
    Cross-platform Software Optimization with Intel’s Ct Technology
    Software and Services. Copyright © 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
    Agenda: What is Intel’s Ct Technology? How does Intel’s Ct Technology work? How to port C++ code to Intel’s Ct Technology?
    What is Intel’s Ct Technology?
    Throughput and Visual Computing Applications
    [Figure: “RMS” (Recognition, Mining, Synthesis) applications. Axes: performance (KIPS, MIPS, GIPS, TIPS; IPS = instructions per second) versus dataset size (kilobytes, megabytes, gigabytes, terabytes). Labels: Single Core, Multi-core, Tera-scale; Text, Multi-Media, 3D & Video, RMS; Entertainment, Learning & Travel, Personal Media Creation and Management, Health.]
    Software Opportunities and Risks
    Opportunities: SW Performance, ISV Differentiation (with increasing performance). Many-core architecture offers unprecedented opportunities for differentiation:
    Model-based applications: New functionalities enriching the user’s experience
    Improved quality: Higher resolution/accuracy in results
    Increased usability: Raw power to fuel more sophisticated usage models
    Risk: Programmer "Headaches" (Reduced Productivity):
    Data races: New class of bugs that increase exponentially with degree of parallelism
    Performance tuning: Programmers can expect to spend most time here
    Forward scaling: Anticipating future HW enhancements
    Modularity: Difficult to compose parallel programs
    Proprietary Tools: Reduces choice and challenges build infrastructure
    Throughput Computing SW Technologies. Visual computing software technologies make it easier:
    Industry Standard APIs: DirectX*, OpenGL*
    Native Tools: SSE/AVX/Co-processor Native Compilers, OpenMP*, Intel® Threading Building Blocks, etc.
  • Intel® Software Development Tools Intel® Parallel Studio Seminar
    From Serial to Parallel: Intel® Software Products for HPC
    Hubert Haberstock, Technical Consulting Engineer
    Software & Services Group, Developer Products Division. Copyright © 2010, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
    Agenda
    09:15 Welcome and opening of proceedings (Assintel)
    09:30 Parallel architecture: the development of hardware (Intel Italy)
    10:00 Parallel Programming, today and tomorrow (Intel)
    11:05 From serial to parallel: Intel High-Performance Tools (Intel); Intel Parallel Studio (C. Fiorillo)
    13:30 A case study (C. Fiorillo)
    14:15 Parallel programming methods and tools (Intel)
    15:00 Application optimization (C. Fiorillo)
    16:00 Wrap up, Q&A, seminar evaluation
    Intel Software Tools - Parallel Design Cycle
    [Figure: design cycle from Serial to Parallel. Stages: Architectural Analysis (visualization of applications and the system); Introducing Parallelism (highly optimizing compilers delivering scalable solutions); Validating Correctness (detect latent programming to address unique challenges); Performance Tuning (tune for performance and scalability).]
    Intel® VTune™ Analyzer 9.1
    Identifies hard to find performance bottlenecks
    "The Intel VTune Performance Analyzer took a multi-day task and turned it into a sub-day task." Randy Camp, VP, Software R&D, MUSICMATCH Inc.
    Features: Tune process or thread parallel code; Low overhead sampling; Graphical call graph; View results on source or assembly
    Applications: System-wide Analysis; Finding hotspots; Tuning libraries, drivers and applications; Remote Data Collector for Windows*/Linux*; Programming Language and Compiler Independent; Supports latest Intel Processors
    Windows* Linux* Mac* IA32 Intel64 IA64 Multicore √ √ √ √ √ √
  • Qilin: Exploiting Parallelism on Heterogeneous Multiprocessors with Adaptive Mapping
    Qilin: Exploiting Parallelism on Heterogeneous Multiprocessors with Adaptive Mapping
    Chi-Keung Luk, Software Pathfinding and Innovations, Software and Services Group, Intel Corporation, Hudson, MA 01749, [email protected]
    Sunpyo Hong, Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, [email protected]
    Hyesoon Kim, College of Computing, School of Computer Science, Georgia Institute of Technology, Atlanta, GA 30332, [email protected]
    ABSTRACT
    Heterogeneous multiprocessors are increasingly important in the multi-core era due to their potential for high performance and energy efficiency. In order for software to fully realize this potential, the step that maps computations to processing elements must be as automated as possible. However, the state-of-the-art approach is to rely on the programmer to specify this mapping manually and statically. This approach is not only labor intensive but also not adaptable to changes in runtime environments like problem sizes and hardware/software configurations. In this study, we propose adaptive mapping, a fully automatic technique to map computations to processing elements on a CPU+GPU machine.
    1. INTRODUCTION
    Multiprocessors have emerged as mainstream computing platforms nowadays. Among them, an increasingly popular class are those with heterogeneous architectures. By providing processing elements (PEs) of different performance/energy characteristics on the same machine, these architectures could deliver high performance and energy efficiency [14]. The most well-known heterogeneous architecture today is probably the IBM/Sony Cell architecture, which consists of a Power processor and eight synergistic processors [26]. In the personal computer (PC) world, a desktop now has a multicore CPU and a GPU, exposing multiple levels of