A Methodology for Efficient Multiprocessor System-On

Total Page:16

File Type:pdf, Size:1020Kb

A Methodology for Efficient Multiprocessor System-On A Methodology for Efficient Multiprocessor System-on-Chip Software Development Von der Fakult¨at f¨ur Elektrotechnik und Informationstechnik der Rheinisch–Westf¨alischen Technischen Hochschule Aachen zur Erlangung des akademischen Grades eines Doktors der Ingenieurwissenschaften genehmigte Dissertation vorgelegt von M.Sc. Jianjiang Ceng aus Zhejiang/China Berichter: Univ.-Prof. Dr. rer. nat. Rainer Leupers Univ.-Prof. Dr.-Ing. Gerd Ascheid Tag der m¨undlichen Pr¨ufung: 28.04.2011 Diese Dissertation ist auf den Internetseiten der Hochschulbibliothek online verf¨ugbar. Contents 1 Introduction 1 1.1 Background ...................................... 1 1.2 MPSoCSoftwareDevelopmentChallenge . ........ 3 1.2.1 ProgrammingModel .............................. 4 1.2.2 ApplicationParallelization . ...... 6 1.2.3 MappingandScheduling . .. 6 1.2.4 CodeGeneration ................................ 7 1.2.5 ArchitectureModelingandSimulation . ...... 7 1.2.6 Summary .................................... 8 1.3 ThesisOrganization. .. ... .. .. .. .. .. ... .. .. .. .. .. ..... 8 2 Related Work 11 2.1 GeneralPurposeMulti-Processor . ........ 11 2.1.1 Cell ....................................... 12 2.2 GeneralPurposeGraphicsProcessingUnit . ......... 13 2.3 Embedded Multi-Processor System-on-Chip . .......... 14 2.3.1 MPCore ..................................... 14 2.3.2 IXP ....................................... 14 2.3.3 OMAP...................................... 15 2.3.4 CoMPSoC.................................... 17 2.3.5 SHAPES..................................... 19 2.3.6 Platform2012.................................. 22 2.3.7 HOPES ..................................... 23 2.3.8 Daedalus..................................... 26 2.3.9 MPA....................................... 29 2.3.10 TCT....................................... 32 2.4 Summary ........................................ 34 3 Methodology Overview 35 3.1 MPSoC Application Programming Studio (MAPS) . ...... 35 3.1.1 ApplicationModeling. .. 36 3.1.2 ArchitectureModeling . .. 37 3.1.3 Profiling..................................... 38 3.1.4 Control/DataFlowAnalysis . ... 38 3.1.5 Partitioning .................................. 39 i ii Contents 3.1.6 Scheduling&Mapping . 40 3.1.7 CodeGeneration ................................ 40 3.1.8 High-LevelSimulation . .. 41 3.1.9 TargetMPSoCPlatform . 42 3.1.10 Integrated Development Environment . ........ 42 3.2 ContributionofThisThesis . ...... 42 4 Architecture Model 45 4.1 Performance Estimation and Architecture Model . ........... 45 4.2 RelatedWork ..................................... 46 4.3 ProcessingElementModel . ..... 47 4.3.1 CCompilationProcess . .. 47 4.3.2 OperationCostTable. .. 49 4.3.3 ComputationalCostEstimation . ..... 50 4.4 ArchitectureDescriptionFile. ........ 51 4.4.1 ProcessingElementDescription . ...... 52 4.4.2 ArchitecturalDescription. ..... 52 4.5 Summary ........................................ 54 5 Profiling 55 5.1 RelatedWork ..................................... 56 5.2 ProfilingProcessOverview. ...... 57 5.3 TraceGeneration ................................. ... 58 5.3.1 LANCEFrontend................................ 58 5.3.2 Instrumentation............................... .. 59 5.3.3 ProfilingRuntimeLibrary . ... 60 5.3.4 TraceFile .................................... 61 5.4 Post-Processing ................................. .... 62 5.4.1 IRProfileGeneration. .. 63 5.4.2 SourceProfileGeneration . ... 66 5.4.3 CallGraphGeneration . .. 68 5.5 Summary ........................................ 69 6 Control/Data Flow Analysis 71 6.1 ControlFlowGraph ................................ .. 72 6.2 StatementControlFlowGraph . ..... 73 6.3 StatementControl/DataFlowGraph . ....... 74 6.3.1 DataDependenceAnalysis. .. 75 6.4 Weighted Statement Control/Data Flow Graph . ......... 76 6.4.1 NodeWeightAnnotation. 77 6.4.2 ControlEdgeAnnotation. .. 77 6.4.3 DataEdgeAnnotation ............................. 78 6.5 Summary ........................................ 78 7 Partitioning 79 7.1 RelatedWork ..................................... 79 7.2 TaskGranularity ................................. ... 81 7.3 CoupledBlock.................................... .. 83 7.3.1 CBDefinition.................................. 83 ii Contents iii 7.3.2 SchedulabilityConstraint. ...... 84 7.3.3 DataLocalityConstraint. .... 84 7.4 CBGenerationandWSCDFGPartitioning . ....... 86 7.4.1 WSCDFGPartition............................... 86 7.4.2 PartitionOptimality . ... 87 7.4.3 CAHCAlgorithm................................ 89 7.5 GlobalAnalysis ................................... .. 95 7.6 UserRefinement .................................... 96 7.7 Summary ........................................ 97 8 High-Level Simulation 99 8.1 RelatedWork ..................................... 99 8.2 MVPOverview ..................................... 102 8.3 MVPSimulator..................................... 103 8.3.1 VirtualProcessingElement . ... 103 8.3.2 GenericSharedMemory . 106 8.3.3 UserInterfaceandVirtualPeripheral . ...... 107 8.4 SoftwareToolChain ............................... ... 108 8.4.1 ExecutionEnvironment . 108 8.4.2 CodeGeneration ................................ 109 8.5 DebugMethod ..................................... 113 8.6 Summary ........................................ 114 9 Case Study 115 9.1 JPEGEncoder ..................................... 115 9.1.1 TCTTools ................................... 115 9.1.2 Profiling..................................... 116 9.1.3 Partitioning .................................. 116 9.1.4 Simulation Result and Further Refinement . ....... 118 9.2 H.264BaselineDecoder. .... 119 9.2.1 TargetPlatform................................ 119 9.2.2 Profiling..................................... 121 9.2.3 Partitioning .................................. 122 9.2.4 TargetCodeGeneration . 122 9.2.5 SimulationResults ............................. 124 9.3 Summary ........................................ 126 10 Summary and Outlook 127 10.1Summary ........................................ 127 10.2Outlook ........................................ 128 AMAPSArchitectureDescriptionXMLSchema 131 A.1 ComponentLibrary................................. 131 A.2 ArchitectureModel ................................. 134 B Pseudo Code for Profiling 137 B.1 CodeInstrumentation . ... .. .. .. .. .. ... .. .. .. .. .. .. .... 137 B.2 FunctionCallCostCalculation . ....... 138 B.3 IRStatementExecutionCostCalculation . ......... 138 iii iv Contents B.4 CallGraphGeneration . .... 139 C MAPS Application Profile XML Schema 141 DPseudoCodeforControlDataFlowAnalysis 143 D.1 SCFGConstruction................................ ... 143 D.2 SCDFGConstruction............................... ... 144 D.3 DataDependenceAnalysis . .... 144 D.3.1 StaticDataAnalysis ............................. 144 D.3.2 DynamicAnalysis................................ 145 D.4 WSCDFGConstruction .............................. 146 D.4.1 NodeWeightAnnotation. 147 D.4.2 ControlEdgeWeightAnnotation . ... 147 D.4.3 DataEdgeWeightAnnotation. 148 Bibliography 151 iv Chapter 1 Introduction 1.1 Background Our lifestyles have been greatly changed by the various new electronic devices, which continuously appear in today’s market. For example, Personal Navigation Assistants (PNAs) help people to find driving direction without the need to read a map; Personal Digital Assistants (PDAs) are used by people to carry a vast amount of information with them just inside a pocket; Portable Multimedia Players (PMPs) allow people to enjoy music and video contents while they travel; and, last but not least, mobile/smart phones have fundamentally changed the way in which people are connected with each other. Behind these different devices, it is the advancement of technologies, especially the rapidly progressing semiconductor technology, that makes the development and production of such devices possible. Figure 1.1 shows a statistic of the transistor count of some Intel processors introduced in the past forty years. !! ! " # ; < = < = @A < C; C E < GC HI : ?? > B > > >F D DJ ¡ ¡ ¨ £ ¢ ¤ ¦ ¥ ¥ ¨ £ ¢ ¤ ¦ ¥ ¥ !! !+ " # I CAC = @ < C < < < I C : ? ? ? ¡ F B >F K> B > F > F L LM N O D O ¡ ¤ ¥ ¨ £ ¢ ¤ ¦ ¥ ¡ ¥ ¨ £ ¢ ¤ ¦ ¥ ¡ ¥ !! !* " # ¡ ¡ ¢ £ ¤ ¦ § ¥ ¢ £ ¤ ¦ ¨ ¨ ¥ 6 3 ¡ !! ! " ) # 9 ¢ £ ¤ ¦ ¨ ¨ ¨ ¥ 7 ¡ 8 ¢ £ ¤ ¦ ¥ 1 § © 7 !! !( " # 6 4 5 4 © 3 !! ! " ' # 2 © 1 0 © © !! !& " # © © § § © © !! !% " # Z [ [ Z P Y `S Q TU ^^QS X\ U ]S Q TU ^^QS _R !! !$ " # , - ./ +( + ! + +*! +* ++! ++ $!!! $!! $! ! $! ' ) )' ' ' ' ' Z P Y Q S TU V WX U R Figure 1.1: Moore’s Law & Transistor Count of Some Intel Processors 1 2 CHAPTER 1. INTRODUCTION It can be seen that the semiconductor industry has been advancing in a spectacular speed as predicted by Moore’s law [1], i.e., the number of transistors that can be fabricated on a single silicon chip doubles every two years. In order to utilize the massive amount of transistors, several years ago designers started integrating multiple processor cores into one chip. One of the reasons that caused this change is power dissipation. Since a processor’s power dissipation increases dramatically when its clock frequency [2] is raised, it would soon be necessary to use water cooling to cool a system down, if cranking up clock frequency is used as the only way to improve performance. By employing multiple processors, the overall computational power is improved without increasing the clock speed; hence, a balance between performance and energy consumption can be achieved. Today, multiprocessing
Recommended publications
  • An Introduction to Analysis and Optimization with AMD Codeanalyst™ Performance Analyzer
    An introduction to analysis and optimization with AMD CodeAnalyst™ Performance Analyzer Paul J. Drongowski AMD CodeAnalyst Team Advanced Micro Devices, Inc. Boston Design Center 8 September 2008 Introduction This technical note demonstrates how to use the AMD CodeAnalyst™ Performance Analyzer to analyze and improve the performance of a compute-bound program. The program that we chose for this demonstration is an old classic: matrix multiplication. We'll start with a "textbook" implementation of matrix multiply that has well-known memory access issues. We will measure and analyze its performance using AMD CodeAnalyst. Then, we will improve the performance of the program by changing its memory access pattern. 1. AMD CodeAnalyst AMD CodeAnalyst is a suite of performance analysis tools for AMD processors. Versions of AMD CodeAnalyst are available for both Microsoft® Windows® and Linux®. AMD CodeAnalyst may be downloaded (free of charge) from AMD Developer Central. (Go to http://developer.amd.com and click on CPU Tools.) Although we will use AMD CodeAnalyst for Windows in this tech note, engineers and developers can use the same techniques to analyze programs on Linux. AMD CodeAnalyst performs system-wide profiling and supports the analysis of both user applications and kernel- mode software. It provides five main types of data collection and analysis: • Time-based profiling (TBP), • Event-based profiling (EBP), • Instruction-based sampling (IBS), • Pipeline simulation (Windows-only feature), and • Thread profiling (Windows-only feature). We will look at the first three kinds of analysis in this note. Performance analysis usually begins with time-based profiling to identify the program hot spots that are candidates for optimization.
    [Show full text]
  • Software Optimization Guide for Amd Family 15H Processors (.Pdf)
    Software Optimization Guide for AMD Family 15h Processors Publication No. Revision Date 47414 3.06 January 2012 Advanced Micro Devices © 2012 Advanced Micro Devices, Inc. All rights reserved. The contents of this document are provided in connection with Advanced Micro Devices, Inc. (“AMD”) products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. The infor- mation contained herein may be of a preliminary or advance nature and is subject to change without notice. No license, whether express, implied, arising by estoppel or other- wise, to any intellectual property rights is granted by this publication. Except as set forth in AMD’s Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right. AMD’s products are not designed, intended, authorized or warranted for use as compo- nents in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD’s product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice. Trademarks AMD, the AMD Arrow logo, and combinations thereof, AMD Athlon, AMD Opteron, 3DNow!, AMD Virtualization and AMD-V are trademarks of Advanced Micro Devices, Inc.
    [Show full text]
  • AMD Codexl 1.7 GA Release Notes
    AMD CodeXL 1.7 GA Release Notes Thank you for using CodeXL. We appreciate any feedback you have! Please use the CodeXL Forum to provide your feedback. You can also check out the Getting Started guide on the CodeXL Web Page and the latest CodeXL blog at AMD Developer Central - Blogs This version contains: For 64-bit Windows platforms o CodeXL Standalone application o CodeXL Microsoft® Visual Studio® 2010 extension o CodeXL Microsoft® Visual Studio® 2012 extension o CodeXL Microsoft® Visual Studio® 2013 extension o CodeXL Remote Agent For 64-bit Linux platforms o CodeXL Standalone application o CodeXL Remote Agent Note about installing CodeAnalyst after installing CodeXL for Windows AMD CodeAnalyst has reached End-of-Life status and has been replaced by AMD CodeXL. CodeXL installer will refuse to install on a Windows station where AMD CodeAnalyst is already installed. Nevertheless, if you would like to install CodeAnalyst, do not install it on a Windows station already installed with CodeXL. Uninstall CodeXL first, and then install CodeAnalyst. System Requirements CodeXL contains a host of development features with varying system requirements: GPU Profiling and OpenCL Kernel Debugging o An AMD GPU (Radeon HD 5000 series or newer, desktop or mobile version) or APU is required. o The AMD Catalyst Driver must be installed, release 13.11 or later. Catalyst 14.12 (driver 14.501) is the recommended version. See "Getting the latest Catalyst release" section below. For GPU API-Level Debugging, a working OpenCL/OpenGL configuration is required (AMD or other). CPU Profiling o Time-Based Profiling can be performed on any x86 or AMD64 (x86-64) CPU/APU.
    [Show full text]
  • AMD Codexl 1.8 GA Release Notes
    AMD CodeXL 1.8 GA Release Notes Contents AMD CodeXL 1.8 GA Release Notes ......................................................................................................... 1 New in this version .............................................................................................................................. 2 System Requirements .......................................................................................................................... 2 Getting the latest Catalyst release ....................................................................................................... 4 Note about installing CodeAnalyst after installing CodeXL for Windows ............................................... 4 Fixed Issues ......................................................................................................................................... 4 Known Issues ....................................................................................................................................... 5 Support ............................................................................................................................................... 6 Thank you for using CodeXL. We appreciate any feedback you have! Please use the CodeXL Forum to provide your feedback. You can also check out the Getting Started guide on the CodeXL Web Page and the latest CodeXL blog at AMD Developer Central - Blogs This version contains: For 64-bit Windows platforms o CodeXL Standalone application o CodeXL Microsoft® Visual Studio®
    [Show full text]
  • Instruction-Based Sampling: a New Performance Analysis Technique for AMD Family 10H Processors
    Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors Paul J. Drongowski AMD CodeAnalyst Project Advanced Micro Devices, Inc. Boston Design Center 16 November 2007 1. Introduction Software applications must use computational resources efficiently in order to deliver best results in a timely manner. This is especially true for time-sensitive applications such as transaction processing, real-time control, multi-media and games. A program profile is a histogram that reflects dynamic program behavior. For example, a profile shows where a program is spending its time. Program profiling helps software developers to meet performance goals by identifying performance bottlenecks and issues. Profiling is most effective when a developer can quickly identify the location and cause of a performance issue. Instruction-Based Sampling (IBS) is a new profiling technique that provides rich, precise program performance information. IBS is introduced by AMD Family10h processors (AMD Opteron Quad- Core processor “Barcelona.”) IBS overcomes the limitations of conventional performance counter sampling. Data collected through performance counter sampling is not precise enough to isolate performance issues to individual instructions. IBS, however, precisely identifies instructions which are not making the best use of the processor pipeline and memory hierarchy. IBS collects a wide range of performance information in a single program run, making it easier to conduct performance testing. The AMD CodeAnalyst performance analysis tool suite supports IBS and correlates the instruction-level IBS information with processes, program modules, functions and source code. IBS in combination with CodeAnalyst helps a developer to find, analyze and ameliorate performance problems. This technical note is a brief introduction to Instruction-Based Sampling.
    [Show full text]
  • AMD Codexl 1.1 GA Release Notes
    AMD CodeXL 1.1 GA Release Notes Thank you for using CodeXL. We appreciate any feedback you have! Please use our CodeXL Forum to provide your feedback. You can also check out the Getting Started guide on the CodeXL Web Page and the latest CodeXL blog at AMD Developer Central - Blogs This version contains: CodeXL Visual Studio 2012 and 2010 packages and Standalone application, for 32-bit and 64-bit Windows platforms CodeXL for 64-bit Linux platforms Kernel Analyzer v2 for both Windows and Linux platforms Note about 32-bit Windows CodeXL 1.1 Upgrade Error On 32-bit Windows platforms, upgrading from previous version of CodeXL using the CodeXL 1.1 installer will remove the previous version and then display an error message without installing CodeXL 1.1. The recommended method is to uninstall previous CodeXL before installing CodeXL 1.1. If you ran the 1.1 installer to upgrade a previous installation and encountered the error mentioned above, ignore the error and run the installer again to install CodeXL 1.1. Note about installing CodeAnalyst after installing CodeXL for Windows CodeXL can be safely installed on a Windows station where AMD CodeAnalyst is already installed. However, do not install CodeAnalyst on a Windows station already installed with CodeXL. Uninstall CodeXL first, and then install CodeAnalyst. System Requirements CodeXL contains a host of development features with varying system requirements: GPU Profiling and OpenCL Kernel Debugging o An AMD GPU (Radeon HD 5xxx or newer) or APU is required o The AMD Catalyst Driver must be installed, release 12.8 or later.
    [Show full text]
  • AMD Codexl 1.9 GA Release Notes
    AMD CodeXL 1.9 GA Release Notes Contents AMD CodeXL 1.9 GA Release Notes ......................................................................................................... 1 New in this version .............................................................................................................................. 2 System Requirements .......................................................................................................................... 2 Getting the latest Radeon software release ......................................................................................... 4 Note about installing CodeAnalyst after installing CodeXL for Windows ............................................... 4 Fixed Issues ......................................................................................................................................... 4 Known Issues ....................................................................................................................................... 5 Support ............................................................................................................................................... 7 Thank you for using CodeXL. We appreciate any feedback you have! Please use the CodeXL Forum to provide your feedback. You can also check out the Getting Started guide on the CodeXL Web Page and the latest CodeXL blog at AMD Developer Central - Blogs This version contains: For 64-bit Windows platforms o CodeXL Standalone application o CodeXL Microsoft® Visual Studio® 2010
    [Show full text]
  • AMD Technology & Software
    AMD Technology & Software Justin Boggs, Sr. Developer Relations Engineer [email protected] Q3’06 Agenda • Desktop Overview • Processors & Roadmap • Software Architecture & Performance • Desktop Platforms • Future AMD Technology Directions • AMD Developer Resources • Call to Action 2 AMD Technology & Software AMD Desktop Overview AMD Desktop Advantage • Built from the ground up, AMD x86 processor technology makes it possible to improve responsiveness to changing business needs AMD64 offers flexibility by supporting 32- and 64-bit applications across desktop, mobile and server applications Direct Connect Architecture enables increased performance, scalability and improved multi-tasking AMD Dual Core provides enhanced performance without increasing power requirements AMD Cool ‘n’ Quiet™ decreases overall power consumption by optimizing performance on demand Enhanced Virus Protection adds an extra level of virus protection to your security solution 4 AMD Technology & Software AMD64™ Powerful 64-Bit Computing • Technology that gives total backward compatibility with leading-edge 32-bit computing performance. • Technology that paves the way to multi-core computing with cutting-edge communications technology. • Technology is more than 64-bit computing— it’s also about next-generation architecture. • Technology that allows software developers to create new functionality for end users. • Technology that solves real problems. 5 AMD Technology & Software AMD Direct Connect Architecture Direct Connect Architecture moves more data more efficiently,
    [Show full text]
  • Improving Program Performance with AMD Codeanalyst for Linux® Paul J. Drongowski AMD Codeanalyst Team May 9, 2007
    Improving program performance with AMD CodeAnalyst for Linux® Paul J. Drongowski AMD CodeAnalyst team May 9, 2007 AMD CodeAnalyst for Linux® Identify and investigate performance hot-spots Easy to use graphical user interface – Simplified configuration of performance events – GUI is consistent for cross-platform use Low overhead, system-wide data collection – Can profile dynamically loaded modules – Can profile kernel mode drivers as well as applications Results are displayed in configurable views – Source and disassembly annotated with performance data – Rates and ratios assist interpretation of results 2 May 9, 2007 Using AMD CodeAnalyst for Linux AMD CodeAnalyst for Linux Open source – Uses oprofile for data collection – Can import oprofile data – CodeAnalyst GUI is Qt-based Java profiling support – Version 2.5: Modified oprofile driver and daemon – Version 2.6: Modified daemon only 3 May 9, 2007 Using AMD CodeAnalyst for Linux Presentation outline AMD CodeAnalyst in action – Time-based profiling – Event-based profiling – Tuning Downloading AMD CodeAnalyst On-line resources 4 May 9, 2007 Using AMD CodeAnalyst for Linux AMD CodeAnalyst in action 5 May 9, 2007 Using AMD CodeAnalyst for Linux Performance tuning cycle Measure performance Change Identify program hot-spot Identify cause 6 May 9, 2007 Using AMD CodeAnalyst for Linux Kinds of analysis Identify hot-spot: Time-based profiling – Where is the application or system spending time? Identify cause: Event-based profiling – How is the application behaving on the CPU? Memory?
    [Show full text]
  • Efficient Implementation of Finite Element Operators on Gpus
    FRIEDRICH-ALEXANDER-UNIVERSITAT¨ ERLANGEN-NURNBERG¨ INSTITUT FUR¨ INFORMATIK (MATHEMATISCHE MASCHINEN UND DATENVERARBEITUNG) Lehrstuhl f¨ur Informatik 10 (Systemsimulation) Efficient Implementation of Finite Element Operators on GPUs Severin Strobl Studienarbeit Efficient Implementation of Finite Element Operators on GPUs Severin Strobl Studienarbeit Aufgabensteller: Prof. Dr. C. Pflaum Betreuer: Prof. Dr. C. Pflaum Bearbeitungszeitraum: 01.01.2008 – 02.09.2008 Erkl¨arung: Ich versichere, dass ich die Arbeit ohne fremde Hilfe und ohne Benutzung anderer als der angegebenen Quellen angefertigt habe und dass die Arbeit in gleicher oder ¨ahnlicher Form noch keiner anderen Prufungsbeh¨ ¨orde vorgelegen hat und von dieser als Teil einer Prufungs-¨ leistung angenommen wurde. Alle Ausfuhrungen,¨ die w¨ortlich oder sinngem¨aß ubernommen¨ wurden, sind als solche gekennzeichnet. Erlangen, den 1. September 2008 . Abstract Graphics processing units (GPUs) of the current generation are highly programmable using interfaces like NVIDIA’s Compute Unified Device Architecture (CUDA) while of- fering a massive computational power for a relatively low price. The solution of systems of linear equations described by stencils representing finite element operators by iterative solvers is a application which can utilize these benefits of GPUs. This thesis describes the implementation of a parallelized Gauss-Seidel solver achieving up to 40 times the perfor- mance of a corresponding solver on standard CPUs. The solver is part of a library named GPUsolve, which offers the infrastructure to integrate this GPU-enabled application into object oriented C++ code. 4 Contents 1 Introduction 7 2 GPU Programming 8 2.1 GPU Hardware .................................... 8 2.2 Shader Programming ................................. 9 2.2.1 BrookGPU .................................. 10 2.2.2 Sh ......................................
    [Show full text]
  • Innovatives Supercomputing in Deutschland
    inSiDE • Vol. 8 No.1 • Spring 2010 Innovatives Supercomputing in Deutschland Editorial Contents Welcome to this new issue of InSiDE the physics of galactic nuclei and the the journal on innovative Supercomput- global parameter optimization of a News ing in Germany published by the Gauss physics-based united-residue force field PRACE: Ready for Implementation 4 Center for Supercomputing. for proteins. New JUVIS Visualization Cluster 7 The focus of the last months all over Projects are an important driving fac- Europe was on the European HPC initia- tor for our community. They provide Applications tive PRACE (Partnership for Advanced a chance to develop and test new Direct Numerical Simulation of Computing in Europe) and the Gauss methods and tools. They allow further Active Separation Control Devices 8 Center for Supercomputing has played improving existing codes and building a very active and important role in the tools we need in order to harness Numerical Simulation and Analysis of Cavitating Flows 14 this process. As a result of all these the performance of our systems. activities PRACE is now ready for the A comprehensive section of projects Modelling Convection over West Africa 18 implementation phase. Here you find an describes among other the recent ad- overview of the results achieved in this vances in the UNICORE 6 middleware Driving Mechanisms of the Geodynamo 26 Editorial issue. In our autumn edition we will be software. This is a good example how Contents able to report on the next steps which a nationally funded project that was The Maturing of Giant Galaxies by Black Hole Activity 32 will include the founding of a formal supported by European projects finally European organization.
    [Show full text]
  • AMD Codexl 1.3 GA Release Notes
    AMD CodeXL 1.3 GA Release Notes Thank you for using CodeXL. We appreciate any feedback you have! Please use the CodeXL Forum to provide your feedback. You can also check out the Getting Started guide on the CodeXL Web Page and the latest CodeXL blog at AMD Developer Central - Blogs This version contains: For Windows for 32-bit and 64-bit Windows platforms o CodeXL Standalone application o CodeXL Microsoft® Visual Studio® 2010 extension o CodeXL Microsoft® Visual Studio® 2012 extension o CodeXL Remote Agent For 64-bit Linux platforms o CodeXL Standalone application o CodeXL Remote Agent Note about 32-bit Windows CodeXL 1.3 Upgrade Error On 32-bit Windows platforms, upgrading from CodeXL 1.0 using the CodeXL 1.3 installer will remove the previous version and then display an error message without installing CodeXL 1.3. The recommended method is to uninstall previous CodeXL versions before installing CodeXL 1.3. If you ran the 1.3 installer to upgrade a previous installation and encountered the error mentioned above, ignore the error and run the installer again to install CodeXL 1.3. Note about installing CodeAnalyst after installing CodeXL for Windows CodeXL can be safely installed on a Windows station where AMD CodeAnalyst is already installed. However, do not install CodeAnalyst on a Windows station already installed with CodeXL. Uninstall CodeXL first, and then install CodeAnalyst. System Requirements CodeXL contains a host of development features with varying system requirements: GPU Profiling and OpenCL Kernel Debugging o An AMD GPU (Radeon HD 5000 series or newer, desktop or mobile version) or APU is required.
    [Show full text]