CRAY XC30 System User Workshop

CRAY XC30 System User Workshop — 2015/06/15

Exercise preparation
The complete set of exercise programs is located under /work/Samples/workshop2015. Please copy it to your own working directory under /work.

Agenda
13:30 - 13:45  Cray XC30 system overview (hardware, software)
13:45 - 14:00  Site configuration of the Cray XC30 system (hardware, software)
14:00 - 14:10  Break
14:10 - 14:50  XC30 programming environment; exercises
14:50 - 15:00  Break
15:00 - 15:10  What is MPI?
15:10 - 15:50  A simple MPI program; exercises
15:50 - 16:00  Break
16:00 - 16:20  Explanation of the major MPI functions; exercises
16:20 - 16:50  Optimization by rewriting code; exercises
16:50 - 17:00  How to learn more advanced usage
17:00 - 17:30  Questions and answers

CRAY System Roadmap (XT to XC30)
Cray XT3 "Red Storm" (2005), Cray XT4 (2006), Cray XT5 (2007), Cray XE6 (2010), Cray XC30 (2012). The XK system with GPUs and the XMT line are based on the Cray XT infrastructure (XMT2, fall 2011: larger memory, higher bandwidth, enhanced RAS, new performance features).

CRAY XC30 system configuration (1) — original configuration → from 2014/12/25
● Number of nodes: 360 nodes (720 CPUs, 5,760 cores) → 360 nodes (720 CPUs, 8,640 cores)
● Theoretical peak performance: 119.8 TFLOPS → 359.4 TFLOPS
● Total main memory: 22.5 TB → 45 TB
(Figure: overall system diagram — front-end service nodes (login nodes), system management node with the SMW management console and System/SDB disks, FC switch, secondary storage (magnetic disk array) and system disks, connected by 4x QDR InfiniBand, 8 Gbps Fibre Channel, and 1 GbE or 10 GbE links to the university network.)

System configuration (compute nodes) — original configuration → after the upgrade
● Number of nodes: 360 nodes (720 CPUs, 5,760 cores) → 360 nodes (720 CPUs, 8,640 cores)
● Total theoretical peak performance: 119.8 TFLOPS → 359.4 TFLOPS
● Total main memory: 22.5 TB → 47 TB
● CPU: Intel Xeon E5-2670, 2.6 GHz, 8 cores → Intel Xeon E5-2690 v3, 2.6 GHz, 12 cores (2 CPUs per node)
● CPU theoretical peak performance: 166.4 GFLOPS → 499.2 GFLOPS
● Node theoretical peak performance: 332.8 GFLOPS → 998.4 GFLOPS
● Node main memory: 64 GB (8 GB DDR3-1600 ECC DIMM x8) → 128 GB (16 GB DDR4-2133 ECC DIMM x8)
● Node memory bandwidth: 102.4 GB/s → 136.4 GB/s

TOP500 (http://www.top500.org/)
TOP500 is a project that regularly ranks and evaluates the 500 fastest computer systems in the world. Started in 1993, it publishes an updated list of supercomputers twice a year. Its goal is to provide a reliable basis for tracking and analyzing trends in high-performance computing (HPC), and systems are ranked with the LINPACK benchmark. The list is compiled by researchers at the University of Mannheim, the University of Tennessee, and Lawrence Berkeley National Laboratory, and is announced at the International Supercomputing Conference (ISC) in June and the Supercomputing Conference (SC) in November each year.
On this system the measured LINPACK performance corresponds to 77.3% of the theoretical peak (2.78137e+05 GFLOPS, i.e. about 278.1 TFLOPS, against a peak of 359.4 TFLOPS).

T/V        N        NB   P   Q   Time       Gflops
WR01R2C4   2234880  192  90  96  26755.52   2.78137e+05

This value would correspond to roughly 196th place on the latest TOP500 list (Nov. 2014).

Cray XC30 System Cabinet / Cray XC30 Compute Cabinet (photos)

XC30 chassis: blower cabinet
Hot-swap blower assembly, N+1 configurations, low pressure and velocity of air, low noise (TN26-6 standard, 75 dB per cabinet). (Figures: blower cabinet exploded view and blower assembly.)

Cray XC30 system cooling (figure)

Packaging of the Cray XC30 system
● Cabinet: one cabinet holds 3 chassis (192 nodes), uses side-to-side cooling airflow, and packs twice the number of nodes of the previous-generation Cray XE6 system into 1.5 times the cabinet width.
● Chassis: one chassis holds 16 blades, for a total of 64 nodes (128 CPU sockets); the Aries interconnect runs over the backplane.
● Blade: one blade holds 4 nodes (8 CPU sockets in total) and one Aries router chip.

Cray XC30 Compute Blade (left side), Cray XC30 IO Blade (left side), and the Cray XC30 compute node with the Aries high-speed router chip (figures)

Cray Network Evolution
● SeaStar: a router chip that scales to 250,000 cores, with highly efficient routing and low contention. Used in Red Storm and the Cray XT3/XT4/XT5/XT6 systems.
● Gemini: more than 100x improvement in message throughput and more than 3x improvement in latency; supports PGAS and a global address space; scalability improved to more than one million cores. Used in the Cray XE6 system.
● Aries: high bandwidth and a reduced hop count, up to 10x or more improvement; very efficient routing and low contention; electro-optical signaling. Used in the Cray XC30 system.

Hierarchical all-to-all Dragonfly network topology
(Figure: torus topology, Dragonfly group, and flattened butterfly topology.)
● The whole system is organized as a hierarchical all-to-all; this system uses two levels.
● All-to-all links are formed and the endpoints are connected directly; within a single chassis there are 15 possible routes (a direct link plus 14 two-hop paths through the other Aries chips).
● The hop count is uniform, and the global bandwidth is improved.

CRAY XC30 system overview
The Cray XC30 series is a new generation of supercomputer built from the next-generation Aries interconnect, the Dragonfly network topology, high-performance Intel Xeon processors (adopted for the first time in Cray's supercomputer line), an integrated storage solution, and advanced software such as the Cray OS and programming environment. It is a groundbreaking system that realizes Cray's vision of adaptive supercomputing, in which upgrades to next-generation processors and the use of various accelerators are also possible.

Dragonfly Class-2 topology — JAIST system
● Rank-1: within a chassis, 16 Aries chips are connected over the backplane, and 4 nodes connect to a single Aries.
● Rank-2: 6 backplanes are connected with copper wires within a two-cabinet group.
● Rank-3: pairs of Aries connect to optical fibers to interconnect the groups.

Advantages of the Dragonfly network
● A new network topology based on the newly developed Aries chip and an all-to-all design.
● Higher bandwidth and lower latency than the previous-generation Cray XE6 system.
● Markedly better performance when real applications perform all-to-all communication across many nodes (what matters is the communication performance of the system as a whole, not a two-node point-to-point measurement).
● Roughly 20x the all-to-all bandwidth.

System configuration (storage) (figure)

System configuration (Lustre file system)
The storage of this system is built on the Lustre file system. Lustre is a parallel file system consisting of metadata servers, storage management servers, and clients; by distributing and managing files in parallel it balances the load and achieves high responsiveness. It is well suited to workloads in which a large number of nodes perform I/O, such as data input and output from parallel applications using MPI.
(Figure: /work (200 TB) on a DDN S2A9900 RAID array, organized as tiers of 8 data disks plus 2 parity disks each.)

Parallel I/O performance of /work
● The parallel I/O performance of /work (DDN SFA12000, Lustre file system), measured with the IOR benchmark, is as follows:
  clients = 32 (1 per node), repetitions = 3, xfersize = 1 MiB, blocksize = 64 GiB, aggregate filesize = 2048 GiB

  Operation   Max (MiB)   Min (MiB)   Mean (MiB)   Std Dev   Max (OPs)   Min (OPs)   Mean (OPs)
  write       11234.56    6176.17     9536.07      2375.86   11234.56    6176.17     9536.07
  read        10662.81    6742.32     9270.11      1790.50   10662.81    6742.32     9270.11

  Max Write: 11234.56 MiB/sec (11780.29 MB/sec)
  Max Read:  10662.81 MiB/sec (11180.76 MB/sec)
● Settings: api = POSIX, access = file-per-process
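
For reference, an IOR run matching the parameters above could be launched roughly as follows. This is only a sketch: the IOR binary location and the output path are placeholders, and option spellings can differ between IOR versions.

  # 32 clients with one process per node, POSIX API, file-per-process access,
  # 1 MiB transfers, 64 GiB per process, 3 repetitions (paths are placeholders).
  aprun -n 32 -N 1 ./IOR -a POSIX -F -t 1m -b 64g -i 3 -o /work/your-name/ior_testfile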

I/O performance of /xc30-work
● The I/O performance of /xc30-work (NFS on the file server) is as follows: write 88 MB/s, read 106 MB/s.

  Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
  Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
  xc30-0          63G 85778  85 88123   3 39613   3 87725  76 106167  4 293.1   0
                      ------Sequential Create------ --------Random Create--------
                      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                   16   993   4  1275   0   237   1   228   1   865   1   473   2

● bonnie++ 1.03e was used for the measurement.

XC30 system operating system: CLE
● Based on open-source Linux and optimized by Cray.
● Scalability is emphasized in the design of every feature: system scalability, application scalability, and external-interface scalability.
● Software is tailored to the role of each node: dedicated software for login nodes, compute nodes, and system service nodes.
● Implemented on top of the high-speed network infrastructure: an Aries-specific high-speed protocol, plus InfiniBand and TCP/IP support.
● All functions needed for operation are integrated hierarchically, including the integration of third-party software.

Cray Programming Environment
● Optimized Fortran, C, and C++: the Cray compilers.
  • Cray Fortran compiler: automatic optimization and parallelization, Fortran 2008 standard compliance, full coarray support with debugger support.
  • Cray C compiler: automatic optimization and parallelization, full UPC support with debugger support, OpenACC 1.0 compliance (Fortran, C).
● Cray Scientific and Math Libraries (CSML), communication libraries, and I/O libraries.
● Performance analysis tools, program optimization tools, and the Cray developer toolkit.
● Aggressive enhancement plans going forward: OpenACC 2.0, OpenMP 4.0, C++11, inline assembly, Intel Xeon Phi.

Cray Programming Environment Distribution — focus on differentiation and productivity
● Programming languages: Fortran, C, C++, Chapel, Python.
● Programming models: distributed memory (Cray MPT: MPI, SHMEM), shared memory (OpenMP 3.0), PGAS and global view (UPC (CCE), CAF (CCE), Chapel).
● Compilers: Cray Compiling Environment (CCE), GNU, third-party compilers (Intel, PGI).
● Tools: environment setup (Modules); debuggers (lgdb, gdb) and debugging support tools (Fast Track Debugger (CCE with DDT), Abnormal Termination Processing, STAT, comparative debugger — under development); Cray Performance Monitoring and Analysis Tool.
● Optimized scientific libraries: LAPACK, ScaLAPACK, BLAS (libgoto), iterative refinement, Cray Adaptive FFTs (CRAFFT), FFTW, Cray PETSc and Cray Trilinos (with CASK).
● I/O libraries: NetCDF, HDF5.
(The original slide also marks each component as Cray-developed, licensed ISV software, 3rd-party packaging, or Cray added value to a 3rd-party product.)

Job submission and execution environment (batch subsystem)
● Applications are executed by submitting them to the batch subsystem: from an interactive session, jobs can be run through the batch subsystem (PBS Pro) using the aprun command. Concrete usage is explained later in the programming part.
(Figure: job-launch flow — aprun on the login node, the ALPS daemons (apbridge, apwatch, apsys, apsched, apinit, apshepherd), the PBS Pro scheduler, server, and job queues, and the user application processes fanned out across compute nodes running CLE.)

The file systems on the login nodes are organized as follows (all under the root directory):
● /opt: libraries, header files, and similar. Because compilers, libraries, and so on are loaded with the module command described later, you normally do not need to reference paths under /opt directly.
● /tmp: temporary area.
● /work/appli: applications from third-party vendors.
● /work: temporary area for users.

Points to note when using the Cray XC30 system
● The /work file system is available as temporary space. When using it, create a directory named after your user name, for example: mkdir /work/testuser-name
● Executables built with the compilers on the login nodes can be run on the compute nodes.
● Do not run executables built for the compute nodes on the login nodes.
● The home directory (/home) is not available on the compute nodes, so place all files needed for job execution under /work.

How to log in to the Cray XC30 system
● ssh <user name>@xc30
● Sample program directory: xc30:/work/Samples/workshop2015
● Queue name: SEMINAR
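
To illustrate the batch workflow described above, a minimal PBS Pro job script might look like the following. This is only a sketch: the resource-request line (mppwidth) and the program name my_prog are assumptions that depend on the site's PBS Pro configuration and on your own code; it is not taken from the workshop material.

  #!/bin/bash
  #PBS -q SEMINAR              # queue name used for this workshop
  #PBS -N sample_job           # job name
  #PBS -l mppwidth=24          # assumed resource request for 24 PEs (syntax varies by PBS Pro setup)
  cd $PBS_O_WORKDIR            # move to the directory (under /work) from which the job was submitted
  aprun -n 24 ./my_prog        # launch 24 processes of the program on the compute nodes

Such a script would be submitted with qsub and monitored with qstat.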

Introduction to Parallel Programming

Topics covered in this part of the workshop
● Programming environment
● What is MPI?
● A simple MPI program
● Explanation of the major MPI functions
● Numerical integration with the trapezoidal rule
● How to learn more advanced usage

Programming environment

Selecting a programming environment: the module command
● Overview of the module command: a command tool for dynamically switching the environment settings needed to develop and run software.
● Programming environments for the Cray, Intel, and GNU compilers:
  • Cray environment module, PrgEnv-cray: Cray compiler (default)
  • Intel environment module, PrgEnv-intel: Intel compiler
  • GNU environment module, PrgEnv-gnu: GNU compiler
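
For example, switching from the default Cray environment to the Intel environment and rebuilding a program with the Cray compiler wrappers might look like this. This is a sketch: ftn and cc are the standard Cray PE compiler drivers, and the source file names are placeholders.

  module list                           # show the currently loaded modules (PrgEnv-cray is loaded by default)
  module swap PrgEnv-cray PrgEnv-intel  # switch to the Intel programming environment
  ftn -o sample sample.f90              # Fortran: ftn invokes the compiler selected by the PrgEnv module
  cc  -o sample sample.c                # C: cc likewise links MPI and the Cray libraries automatically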