Many Core Computing Solutions
Total Page:16
File Type:pdf, Size:1020Kb
SGI® Many core computing Solutions Gábor Lehoczki Silicon Computers Kft. [email protected] 1 Nvidia Tesla series Features M2075 M2090 K10 K20 K20X Number and Type of 1 Fermi GPU 1 Fermi GPU 2 Kepler GK104s 1 Kepler GK110 GPU Peak double precision Gigaflops floating point 515 665 190 1.17 1.31 performance Gigaflops Gigaflops (95 Gflops per GPU) Tflops Tflops Peak single precision 4577 Gigaflops floating point 1030 1331 3.52 3.95 performance Gigaflops Gigaflops (2288 Gflops per GPU) Tflops Tflops Memory bandwidth 150 177 320 GB/sec 208 250 (ECC off) GBytes/sec GBytes/sec (160 GB/sec per GPU) GB/sec GB/sec 8 GB Memory size (GDDR5) 6 GB 6 GB 5 GB 6 GB (4 GB per GPU) 3072 CUDA cores 448 512 2496 2688 (1536 per GPU) 2 Intel® Xeon Phi™ Coprocessor 5110P 3120A 3120P 7120P 7120X 5120D Code Name Knights Corner # of Cores 60 57 57 61 61 60 Clock Speed 1.053 GHz 1.1 GHz 1.1 GHz 1.238 GHz 1.238 GHz 1.053 GHz Peak double precision floating 1.011 point performance TFlops Peak single precision floating 2.022 point performance TFlops Cache 30 MB 28.5 MB 28.5 MB 30.5 MB 30.5 MB 30 MB Max TDP 225 W 300 W 300 W 300 W 300 W 245 W Max Memory Size 8 GB 6 GB 6 GB GB GB 8 GB (16 channels) 16 16 Max Memory Bandwidth 320 GB/s 5 GB/s 5 GB/s 5.5 GB/s 5.5 GB/s 5.5 GB/s 3 SGI Development Suite SGI Intel Rogue Wave • For Linux Performance Composer XE TotalView software Suite Team development • Supports 2-4 developers • Technical Computing • Software development • Dynamic source code, Performance for SGI tool suite tuned for thread, and memory Systems application performance debugging for C, C++ and code robustness and Fortran HPC • Consists of SGI applications Accelerate, SGI MPI, • Consists of C++ & SGI REACT, SGI Fortran compilers, Math • Consists of TotalView, UPC Kernel Library, MemoryScape, Integrated Performance ReplayEngine, and • Tech support via SGI Primitives, Threading CUDA debugging Support contract Building Blocks • Includes 1 yr tech • Includes 1 yr tech support support 4 4 Intel® Xeon Phi™ compared to a GPU • Intel® Xeon Phi™ is a multi-core architecture with the following characteristics : – based on Pentium 4 in-order cores with a vector extension – OpenMP programming – “UV on a chip” – PCIe x16 Gen 2 (~ 6.7 GB/s) – Ease-of-Use • NVIDIA Fermi GPU use streaming multi-processors with currently 32 SIMD engines – A very high number of threads (10s of thousands) hides latency to memory – Minimal context switching time between threads – HW thread dispatcher – PCIe x16 Gen (~ 11.3 GB/s) [Kepler 10] 5 GPU-ACCELERATED APPLICATIONS 01 Research: Higher Education and Supercomputing CASINO Code for performing Quantum Monte Carlo GPUs to determine protein binding sites Borrows-Wheeler Transformation, only calculations of a wide variety of pricing and iray rendering in Maximus supported GPU simulation using Element 3D Accuweather Storyteller Weather graphics Real-time COMPUTATIONAL CHEMISTRY AND BIOLOGY (QMC) electronic structure calculations for Allows fast processing of large ligand Dynamic Programming, Multi-GPU support Intuvision Panoptes 3.0 Video Analytics Object risk measures on a broad range of asset Yes CUDA 3D object based particle system Faster effects Yes Single only NUMERICAL ANALYTICS finite and periodic systems databases Yes recognition and change detection Yes classes and derivatives Inventor 3D mechanical design, documentation, and Single only POPULAR GPU-ACCELERATED APPLICATIONS ASUCA Regional atmospheric model Entire model PHYSICS TBD Yes Single only UGENE Opensource Smith-Waterman for SSE/ LuciadLightspeed Geospatial Visualization and Existing models code in C# supported product simulation Maxon Cinema 4D 3D modeling, animation, and CATALOG | MAR14 | 11 Yes 06 Defense and Intelligence CP2K Program to perform atomistic and BUDE Molecular docking program Empirical Free CUDA, Suffix array based repeats finder and Analysis Geospatial Situational Awareness Single transparently, with minimal code changes, Uses BIM for intelligent building rendering Increased model complexity at interactive Video Copilot Optical CAM - SE Global atmosphere model for climate 06 Computational Finance molecular simulations of solid state, liquid, Energy Forcefield Single only dotplot only Supports multiple backends including components to improve design accuracy rates Flares research 07 Manufacturing: CAD and CAE molecular and biological systems Core Hopping Rapid screening of novel cores to Fast short read alignment Yes MotionDSP Ikena ISR Real-time Full Motion Video CUDA and OpenCL, Switches transparently Single only Single only Lens flares plug-in for After Effects Faster effects Dynamical core Yes COMPUTATIONAL FLUID DYNAMICS DBCSR (space matrix multiply library) Yes improve WideLM Fits numerous linear models to a fixed and WAMI between multiple GPUs and CPUS Revit Building Information Modeling (BIM) for NewTek Lightwave 3D modeling, animation, and Single only COSMO Regional atmospheric model Entire model COMPUTATIONAL STRUCTURAL MECHANICS GAMESS-UK The general purpose ab initio drug properties design and response enhancement and analytics depending on the deal support and load structural engineering, and construction rendering Increased model complexity at interactive Video Copilot Twitch Video effects plug-in for After Yes COMPUTER AIDED DESIGN molecular GPU accelerated application Yes Parallel linear regression on multiple Real-time super-resolution-based video factors Modeling (BIM) to design, build, and rates Effects Faster effects Single only GEOS-5 Global climate model Entire model Yes ELECTRONIC DESIGN AUTOMATION electronic structure program for performing FastROCS Molecule shape comparison application similarly-shaped models enhancement, filtering, mosaicing, video Yes maintain higher-quality, more energyefficient Single only EDITING HOMME Dynamical core for global atmospheric 09 Media and Entertainment SCF-, DFT- and MCSCF-gradient Real-time shape similarity searching/ Yes analytics, and transcoding Manufacturing: CAD and CAE buildings Otoy Octane Render GPU Renderer CUDA-based APPLICATION DESCRIPTION SUPPORTED model ANIMATION, MODELING AND RENDERING calculations comparison POPULAR GPU-ACCELERATED APPLICATIONS Yes COMPUTATIONAL FLUID DYNAMICS Single only GPU final-frame rendering Yes FEATURES MULTI-GPU SUPPORT Dynamical core Yes COLOR CORRECTION AND GRAIN (ss|ss) type integrals within calculations Yes CATALOG | MAR14 | 05 Nervve ViD SrX Video Analytics Object recognition APPLICATION DESCRIPTION SUPPORTED NVIDIA Iray A ready-to-integrate, physically-based, Pixologic Sculptris 3D sculpting Increased model Adobe Photoshop CC Image editing Over 30 effects HYCOM Ocean circulation model Dynamical core MANAGEMENT using Hartree-Fock ab initio methods Interactive Molecule NUMERICAL ANALYTICS and tracking Yes FEATURES MULTI-GPU SUPPORT photorealistic rendering solution complexity at interactive for smoother image Single only COMPOSITING, FINISHING AND EFFECTS and density functional theory. Supports Visualizer APPLICATION DESCRIPTION SUPPORTED NerVve Visual Search Altair AcuSolve General purpose CFD software Iray Interactive; Iray Photoreal; Iray rates manipulation in Mercury Graphics Engine Meteo Earth Weather graphics Real-time Single only EDITING organics and inorganics. Molecular visualization based on a raytracing engine FEATURES MULTI-GPU SUPPORT Solution (NVSS) Linear equation solver Yes Cluster. Fast interactive ray tracing; Single only Single only MITgcm Ocean circulation model Dynamical core ENCODING AND DIGITAL DISTRIBUTION Yes Written for use only on GPUs Yes ArrayFire Comprehensive GPU function library Video/Image Live and Forensic Search Video and ANSYS Fluent General purpose CFD software Physically-based, global-illumination Redshift Renderer GPU-accelerated, biased renderer Adobe Premiere Pro CC Video editing Mercury Single only ON-AIR GRAPHICS GAMESS-US Computational chemistry suite used to Molegro Virtual Docker 6 Method for performing high Hundreds of functions for math, signal/ Image content search Yes Radiation heat transfer model, linear rendering; Distributed cluster rendering CUDA-based GPU final-frame rendering Yes Playback Engine for real-time NEMO Ocean circulation model Entire model (GYRE ON-SET, REVIEW AND STEREO TOOLS simulate atomic and molecular electronic accuracy image processing, statistics, and more. OpCoast SNEAK Electromagnetic signals equation solver Yes Side Effects Houdini 3D modeling, animation, and video editing & accelerated rendering config) Yes SIMULATION structure flexible molecular docking Yes propagation Yes Dassault Systemes CATIA rendering Maximus-supported GPU simulation using Yes NIM Dynamical core for global atmospheric WEATHER AND CLIMATE FORECASTING Libqc with Rys Quadrature Algorithm, Energy grid computation, pose evaluation Mathematica Wolfram A symbolic technical modeling for complex urban and terrain Autodesk Moldflow Plastic mold injection software - Live Rendering OpenCL Apple Final Cut Pro Video editing Faster effects model 13 Oil and Gas Hartree-Fock, MP2 and CCSD and guided differential evolution computing language environments Linear equation solver Single only Photorealistic rendering Interactive. Fully integrated Single only Single only Dynamical core Yes POPULAR GPU-ACCELERATED APPLICATIONS Yes Single only and development environment Ray tracing, DTED and remote sensing CPFD Barracuda-VR and in CATIA V6. The Foundry Mari 3D paint Increased model Avid Media Composer Video editing Faster video Weather Central Fusion CATALOG | MAR14 | 01 Gaussian PIPER Protein Docking Protein-protein docking