SEVENTH FRAMEWORK PROGRAMME Research Infrastructures
Total Page:16
File Type:pdf, Size:1020Kb
SEVENTH FRAMEWORK PROGRAMME Research Infrastructures INFRA-2007-2.2.2.1 - Preparatory phase for 'Computer and Data Treat- ment' research infrastructures in the 2006 ESFRI Roadmap PRACE Partnership for Advanced Computing in Europe Grant Agreement Number: RI-211528 D8.3.2 Final technical report and architecture proposal Final Version: 1.0 Author(s): Ramnath Sai Sagar (BSC), Jesus Labarta (BSC), Aad van der Steen (NCF), Iris Christadler (LRZ), Herbert Huber (LRZ) Date: 25.06.2010 D8.3.2 Final technical report and architecture proposal Project and Deliverable Information Sheet PRACE Project Project Ref. №: RI-211528 Project Title: Final technical report and architecture proposal Project Web Site: http://www.prace-project.eu Deliverable ID: : <D8.3.2> Deliverable Nature: Report Deliverable Level: Contractual Date of Delivery: PU * 30 / 06 / 2010 Actual Date of Delivery: 30 / 06 / 2010 EC Project Officer: Bernhard Fabianek * - The dissemination level are indicated as follows: PU – Public, PP – Restricted to other participants (including the Commission Services), RE – Restricted to a group specified by the consortium (including the Commission Services). CO – Confidential, only for members of the consortium (including the Commission Services). PRACE - RI-211528 i 25.06.2010 D8.3.2 Final technical report and architecture proposal Document Control Sheet Title: : Final technical report and architecture proposal Document ID: D8.3.2 Version: 1.0 Status: Final Available at: http://www.prace-project.eu Software Tool: Microsoft Word 2003 File(s): D8.3.2_addn_v0.3.doc Written by: Ramnath Sai Sagar (BSC), Jesus Labarta Authorship (BSC), Aad van der Steen (NCF), Iris Christadler (LRZ), Herbert Huber (LRZ) Contributors: Eric Boyer (CINES), James Perry (EPCC), Paul Graham (EPCC), Mark Parsons (EPCC), Alan D Simpson (EPCC), Willi Homberg (FZJ), Wolfgang Gürich( FZJ), Thomas Lippert (FZJ), Radoslaw Januszewski (PSNC), Jonathan Follows (STFC), Igor Kozin (STFC), Dave Cable (STFC), Hans Hacker (LRZ), Volker Weinberg (LRZ), Johann Dobler (LRZ), Christoph Biardzki (LRZ), Reinhold Bader (LRZ), Momme Allalen (LRZ), Jose Gra- cia (HLRS), Vladimir Marjanovic (BSC), Guillaume Colin De Verdière (CEA), Cal- vin Christophe (CEA), Hervé Lozach (CEA), Jean-Marie Normand (CEA), Sadaf Alam (CSCS), Adrian Tineo (CSCS), Tim Stitt (CSCS), Neil Stringfellow(CSCS), Giovanni Erbacci (CINECA), Giovanni Foiani (CINECA), Carlo Cavazzoni (CINECA), Filippo Spiga (CINECA), Kimmo Koski (CSC), Jussi Heikonen (CSC), Olli-Pekka Lehto (CSC), Lennart Johnsson (KTH), Lilit Axner (KTH) Reviewed by: Thomas Eickermann (FZJ), Miroslaw Kupzyk (PSNC) Approved by: Technical Board PRACE - RI-211528 ii 25.06.2010 D8.3.2 Final technical report and architecture proposal Document Keywords and Abstract Keywords: PRACE, HPC, Research Infrastructure Abstract: This document describes the activities in Work Package 8 Task 8.3 (WP8.3) updating and analysing results reported in D8.3.1 for the dif- ferent WP8 prototypes. The document also suggests potential architec- tures for future machines, the level of performance we should expect and areas where research efforts should be dedicated. Copyright notices © 2010 PRACE Consortium Partners. All rights reserved. This document is a project docu- ment of the PRACE project. All contents are reserved by default and may not be disclosed to third parties without the written consent of the PRACE partners, except as mandated by the European Commission contract RI-211528 for reviewing and dissemination purposes. All trademarks and other rights on third party products mentioned in this document are ac- knowledged as own by the respective holders. PRACE - RI-211528 iii 25.06.2010 D8.3.2 Final technical report and architecture proposal Table of Contents Project and Deliverable Information Sheet ......................................................................................... i Document Control Sheet....................................................................................................................... ii Document Keywords and Abstract..................................................................................................... iii Table of Contents ................................................................................................................................. iv List of Figures ....................................................................................................................................... vi List of Tables......................................................................................................................................... ix References and Applicable Documents ............................................................................................... x Executive Summary .............................................................................................................................. 1 1 Introduction .................................................................................................................................. 2 1.1 Scope and Structure of the Report .............................................................................................. 3 2 WP8 prototypes and Research Activities.................................................................................... 4 2.1 Prototypes ...................................................................................................................................... 4 2.1.1 eQPACE......................................................................................................................... 4 2.1.2 BAdW-LRZ/GENCI-CINES Phase1 (CINES Part) ........................................................ 5 2.1.3 BAdW-LRZ/GENCI-CINES Phase2 (LRZ Part) ............................................................ 6 2.1.4 Intel Many Integrated Core (MIC) architecture ............................................................ 7 2.1.5 ClearSpeed-Petapath..................................................................................................... 8 2.1.6 Hybrid Technology Demonstrator ................................................................................. 9 2.1.7 Maxwell FPGA ............................................................................................................ 10 2.1.8 XC4-IO......................................................................................................................... 11 2.1.9 SNIC-KTH.................................................................................................................... 12 2.1.10 RapidMind ................................................................................................................... 13 2.2 Research activities....................................................................................................................... 14 2.2.1 PGAS language compiler............................................................................................. 14 2.2.2 Research on Power Efficiency ..................................................................................... 16 2.2.3 Parallel GPU............................................................................................................... 17 2.2.4 Performance Predictions ............................................................................................. 18 3 WP8 evaluation results............................................................................................................... 21 3.1 Performance experiments on prototype hardware................................................................... 21 3.1.1 Reference performance................................................................................................ 21 3.1.2 Numerical issues.......................................................................................................... 25 3.1.3 Accelerated Programming Languages and Compilers ................................................ 28 3.1.4 FPGA experiments....................................................................................................... 37 3.1.5 LRZ + CINES (Phase 1) .............................................................................................. 41 3.1.6 LRZ + CINES (Phase 2) .............................................................................................. 44 3.1.7 Intel MIC Architecture................................................................................................. 48 3.1.8 Petapath experiments................................................................................................... 50 3.2 Hybrid Programming Models .................................................................................................... 53 3.2.1 MPI+OpenMP ............................................................................................................. 53 3.2.2 MPI+CUDA................................................................................................................. 55 3.2.3 MPI + CellSs ............................................................................................................... 58 3.3 Intra-Node Bandwidth................................................................................................................ 58 3.3.1 Triads (RINF1) benchmark results (BAdW-LRZ) ........................................................ 58 3.3.2 Random Access Results................................................................................................ 59 3.3.3 Host to accelerator