SGI®

Many core computing Solutions

Gábor Lehoczki Silicon Computers Kft. [email protected]

1 Nvidia Tesla series

Features M2075 M2090 K10 K20 K20X Number and Type of 1 Fermi GPU 1 Fermi GPU 2 Kepler GK104s 1 Kepler GK110 GPU

Peak double precision Gigaflops floating point 515 665 190 1.17 1.31 performance Gigaflops Gigaflops (95 Gflops per GPU) Tflops Tflops

Peak single precision 4577 Gigaflops floating point 1030 1331 3.52 3.95 performance Gigaflops Gigaflops (2288 Gflops per GPU) Tflops Tflops

Memory bandwidth 150 177 320 GB/sec 208 250 (ECC off) GBytes/sec GBytes/sec (160 GB/sec per GPU) GB/sec GB/sec 8 GB Memory size (GDDR5) 6 GB 6 GB 5 GB 6 GB (4 GB per GPU) 3072 CUDA cores 448 512 2496 2688 (1536 per GPU)

2 Intel® Xeon Phi™ Coprocessor 5110P 3120A 3120P 7120P 7120X 5120D Code Name Knights Corner # of Cores 60 57 57 61 61 60 Clock Speed 1.053 GHz 1.1 GHz 1.1 GHz 1.238 GHz 1.238 GHz 1.053 GHz Peak double precision floating 1.011 point performance TFlops Peak single precision floating 2.022 point performance TFlops Cache 30 MB 28.5 MB 28.5 MB 30.5 MB 30.5 MB 30 MB Max TDP 225 W 300 W 300 W 300 W 300 W 245 W

Max Memory Size 8 GB 6 GB 6 GB GB GB 8 GB (16 channels) 16 16 Max Memory Bandwidth 320 GB/s 5 GB/s 5 GB/s 5.5 GB/s 5.5 GB/s 5.5 GB/s

3 SGI Development Suite

SGI Intel Rogue Wave • For Performance Composer XE TotalView software Suite Team development • Supports 2-4 developers

• Technical Computing • Software development • Dynamic source code, Performance for SGI tool suite tuned for thread, and memory Systems application performance debugging for C, C++ and code robustness and Fortran HPC • Consists of SGI applications Accelerate, SGI MPI, • Consists of C++ & SGI REACT, SGI Fortran compilers, Math • Consists of TotalView, UPC Kernel Library, MemoryScape, Integrated Performance ReplayEngine, and • Tech support via SGI Primitives, Threading CUDA debugging Support contract Building Blocks • Includes 1 yr tech • Includes 1 yr tech support support

4 4 Intel® Xeon Phi™ compared to a GPU • Intel® Xeon Phi™ is a multi-core architecture with the following characteristics : – based on Pentium 4 in-order cores with a vector extension – OpenMP programming – “UV on a ” – PCIe x16 Gen 2 (~ 6.7 GB/s) – Ease-of-Use • NVIDIA Fermi GPU use streaming multi-processors with currently 32 SIMD engines – A very high number of threads (10s of thousands) hides latency to memory – Minimal context switching time between threads – HW thread dispatcher – PCIe x16 Gen (~ 11.3 GB/s) [Kepler 10]

5 GPU-ACCELERATED APPLICATIONS

01 Research: Higher Education and Supercomputing CASINO Code for performing Quantum Monte Carlo GPUs to determine protein binding sites Borrows-Wheeler Transformation, only calculations of a wide variety of pricing and iray rendering in Maximus supported GPU simulation using Element 3D Accuweather Storyteller Weather graphics Real-time COMPUTATIONAL CHEMISTRY AND BIOLOGY (QMC) electronic structure calculations for Allows fast processing of large ligand Dynamic Programming, Multi-GPU support Intuvision Panoptes 3.0 Video Analytics Object risk measures on a broad range of asset Yes CUDA 3D object based particle system Faster effects Yes Single only NUMERICAL ANALYTICS finite and periodic systems databases Yes recognition and change detection Yes classes and derivatives Inventor 3D mechanical design, documentation, and Single only POPULAR GPU-ACCELERATED APPLICATIONS ASUCA Regional atmospheric model Entire model PHYSICS TBD Yes Single only UGENE Opensource Smith-Waterman for SSE/ LuciadLightspeed Geospatial Visualization and Existing models code in C# supported product simulation Maxon 3D modeling, animation, and CATALOG | MAR14 | 11 Yes 06 Defense and Intelligence CP2K Program to perform atomistic and BUDE Molecular docking program Empirical Free CUDA, Suffix array based repeats finder and Analysis Geospatial Situational Awareness Single transparently, with minimal code changes, Uses BIM for intelligent building rendering Increased model complexity at interactive Video Copilot Optical CAM - SE Global atmosphere model for climate 06 Computational Finance molecular simulations of solid state, liquid, Energy Forcefield Single only dotplot only Supports multiple backends including components to improve design accuracy rates Flares research 07 Manufacturing: CAD and CAE molecular and biological systems Core Hopping Rapid screening of novel cores to Fast short read alignment Yes MotionDSP Ikena ISR Real-time Full Motion Video CUDA and OpenCL, Switches transparently Single only Single only Lens flares plug-in for After Effects Faster effects Dynamical core Yes COMPUTATIONAL FLUID DYNAMICS DBCSR (space matrix multiply library) Yes improve WideLM Fits numerous linear models to a fixed and WAMI between multiple GPUs and CPUS Revit Building Information Modeling (BIM) for NewTek Lightwave 3D modeling, animation, and Single only COSMO Regional atmospheric model Entire model COMPUTATIONAL STRUCTURAL MECHANICS GAMESS-UK The general purpose ab initio drug properties design and response enhancement and analytics depending on the deal support and load structural engineering, and construction rendering Increased model complexity at interactive Video Copilot Twitch Video effects plug-in for After Yes COMPUTER AIDED DESIGN molecular GPU accelerated application Yes Parallel linear regression on multiple Real-time super-resolution-based video factors Modeling (BIM) to design, build, and rates Effects Faster effects Single only GEOS-5 Global climate model Entire model Yes ELECTRONIC DESIGN AUTOMATION electronic structure program for performing FastROCS Molecule shape comparison application similarly-shaped models enhancement, filtering, mosaicing, video Yes maintain higher-quality, more energyefficient Single only EDITING HOMME Dynamical core for global atmospheric 09 Media and Entertainment SCF-, DFT- and MCSCF-gradient Real-time shape similarity searching/ Yes analytics, and transcoding Manufacturing: CAD and CAE buildings Otoy Octane Render GPU Renderer CUDA-based APPLICATION DESCRIPTION SUPPORTED model ANIMATION, MODELING AND RENDERING calculations comparison POPULAR GPU-ACCELERATED APPLICATIONS Yes COMPUTATIONAL FLUID DYNAMICS Single only GPU final-frame rendering Yes FEATURES MULTI-GPU SUPPORT Dynamical core Yes COLOR CORRECTION AND GRAIN (ss|ss) type integrals within calculations Yes CATALOG | MAR14 | 05 Nervve ViD SrX Video Analytics Object recognition APPLICATION DESCRIPTION SUPPORTED NVIDIA Iray A ready-to-integrate, physically-based, Pixologic 3D sculpting Increased model Adobe Photoshop CC Image editing Over 30 effects HYCOM Ocean circulation model Dynamical core MANAGEMENT using Hartree-Fock ab initio methods Interactive Molecule NUMERICAL ANALYTICS and tracking Yes FEATURES MULTI-GPU SUPPORT photorealistic rendering solution complexity at interactive for smoother image Single only COMPOSITING, FINISHING AND EFFECTS and density functional theory. Supports Visualizer APPLICATION DESCRIPTION SUPPORTED NerVve Visual Search Altair AcuSolve General purpose CFD software Iray Interactive; Iray Photoreal; Iray rates manipulation in Mercury Graphics Engine Meteo Earth Weather graphics Real-time Single only EDITING organics and inorganics. Molecular visualization based on a raytracing engine FEATURES MULTI-GPU SUPPORT Solution (NVSS) Linear equation solver Yes Cluster. Fast interactive ; Single only Single only MITgcm Ocean circulation model Dynamical core ENCODING AND DIGITAL DISTRIBUTION Yes Written for use only on GPUs Yes ArrayFire Comprehensive GPU function library Video/Image Live and Forensic Search Video and ANSYS Fluent General purpose CFD software Physically-based, global-illumination Redshift Renderer GPU-accelerated, biased renderer Adobe Premiere Pro CC Video editing Mercury Single only ON-AIR GRAPHICS GAMESS-US Computational chemistry suite used to Molegro Virtual Docker 6 Method for performing high Hundreds of functions for math, signal/ Image content search Yes Radiation heat transfer model, linear rendering; Distributed cluster rendering CUDA-based GPU final-frame rendering Yes Playback Engine for real-time NEMO Ocean circulation model Entire model (GYRE ON-SET, REVIEW AND STEREO TOOLS simulate atomic and molecular electronic accuracy image processing, statistics, and more. OpCoast SNEAK Electromagnetic signals equation solver Yes Side Effects 3D modeling, animation, and video editing & accelerated rendering config) Yes SIMULATION structure flexible molecular docking Yes propagation Yes Dassault Systemes CATIA rendering Maximus-supported GPU simulation using Yes NIM Dynamical core for global atmospheric WEATHER AND CLIMATE FORECASTING Libqc with Rys Quadrature Algorithm, Energy grid computation, pose evaluation Mathematica Wolfram A symbolic technical modeling for complex urban and terrain Autodesk Moldflow Plastic mold injection software - Live Rendering OpenCL Apple Final Cut Pro Video editing Faster effects model 13 Oil and Gas Hartree-Fock, MP2 and CCSD and guided differential evolution computing language environments Linear equation solver Single only Photorealistic rendering Interactive. Fully integrated Single only Single only Dynamical core Yes POPULAR GPU-ACCELERATED APPLICATIONS Yes Single only and development environment Ray tracing, DTED and remote sensing CPFD Barracuda-VR and in CATIA V6. The Foundry Mari 3D paint Increased model Avid Media Composer Video editing Faster video Weather Central Fusion CATALOG | MAR14 | 01 Gaussian PIPER Protein Docking Protein-protein docking Development environment for CUDA and inputs Barracuda Network rendering complexity at interactive effects, unique stereo 3D Studio Research: Higher Education and Supercomputing (In development) program Molecule docking TBD OpenCL Yes Fluidized bed modeling software Linear equation Yes rates capabilities Weather graphics Real-time Single only COMPUTATIONAL CHEMISTRY AND BIOLOGY Predicts energies, molecular structures, PyMol User-sponsored molecular visualization Yes PCI Geomatics GXL Geospatial Visualization Image solver, particle Bunkspeed Suite Easy to use photorealistic renderingSingle only Single only WRF Regional numerical weather prediction Molecular Dynamics and vibrational frequencies of molecular system on an open-source foundation MATLAB by Mathworks GPU acceleration for orthorectification and additional calculations software 10 | POPULAR GPU-ACCELERATED Grass Valley Edius Video editing Faster effects model APPLICATION DESCRIPTION SUPPORTED systems Lines: 460% increase MATLAB (high-level image processing Single only Iray-based ray-tracing. Animation support. APPLICATIONS CATALOG | MAR14 Single only WSM5, WSM3, Ice Microphysics models Yes FEATURES MULTI-GPU SUPPORT Joint NVIDIA, PGI and Gaussian Cartoons: 1246% increase technical computing language) Yes FluiDyna Culises for Network rendering COLOR CORRECTION AND GRAIN Harris Velocity Video editing Faster effects Single WSI TrueView Max Weather graphics Real-time ACEMD GPU simulation of molecular mechanics collaboration Surface: 1746% increase Support for 200+ of most used MATLAB GAIA Multi-GPU, Multi-Machine distributed OpenFOAM Yes MANAGEMENT only Single only force fields, implicit and explicit solvent Yes Spheres: 753% increase functions (incl. Signal Processing, Image object store providing SQL style query Solver library for general purpose CFD RTT DeltaGen Redefines high-end 3D visualization APPLICATION DESCRIPTION SUPPORTED Quantel Qube Broadcast video editing Faster video POPULAR GPU-ACCELERATED APPLICATIONS Written for use only on GPUs Yes GPAW Real-space grid DFT code written in C and Ribbon: 426% increase Processing, Communications Systems, etc) capability, advanced geospatial query software and FEATURES MULTI-GPU SUPPORT effects, unique stereo 3D CATALOG | MAR14 | 13 AMBER Suite of programs to simulate molecular Python Single only Yes capability,heatmap generation, and Linear equation solvers Yes realtime interaction. This latest version Adobe SpeedGrade CC Color grading Real-time capabilities Oil and Gas dynamics on biomolecule Electrostatic poissonequation, VEGA ZZ Molecular Modeling Toolkit Virtual logP, NMath Premium GPU-accelerated math and distributed rasterization services FluiDyna LBultra General purpose CFD software gives users a broad suite of robust new grading and finishing with Single only APPLICATION DESCRIPTION SUPPORTED PMEMD Explicit Solvent and GB Implicit orthonormalizing of vectors, residual molecular surface values Single only statistics for Built for GPUs, scales to many machines Lattice-Boltzmann solver Yes features to truly revolutionize processes Lumetri Deep Color Engine Sony Vegas Pro Video editing Faster video effects FEATURES MULTI-GPU SUPPORT Solvent minimization method (rmm-diis) VMD Visualization and analyzing large biomolecular .NET, automatically detects the presence and many cards without end user query Prometech Particleworks Particle-based CFD and help increase visual quality, speed, and Single only and encoding Single only Acceleware Yes Yes systems in 3-D graphics of a CUDA-enabled GPU at runtime modification or pre-meditation software Implicit and explicit solvers Single only flexibility. Assimilate Scratch Color grading and finishing ENCODING AND DIGITAL DISTRIBUTION AxRTM CHARMM MD package to simulate molecular Jaguar A high-performance ab initio package High quality rendering, large structures and seamlessly redirects appropriate Yes Vratis ARAEL General purpose CFD software based Interactive ray tracing and global Accelerated debayering for real-time APPLICATION DESCRIPTION SUPPORTED AxKTM dynamics on biomolecule for performing gas and solution phase (100M atoms), analysis and visualization computations to it. Computational Finance on illumination. Integration with Siemens digital finishing FEATURES MULTI-GPU SUPPORT Seismic Processing RTM, Kirchhoff, control source, Implicit (5x), Explicit (2x) Solvent simulations tasks, multiple GPU support for display of Automatically offloads computations to the APPLICATION DESCRIPTION SUPPORTED FVM with OpenFOAM compatibility TeamCenter. Cluster support Realtime Single only Cinnafilm Tachyon Standards conversion Video electromagnetism, forward modeling via OpenMM. Native CUDA port in TBD Yes molecular orbitals GPU FEATURES MULTI-GPU SUPPORT Linear equation solver Yes & Offline Production Process Integration Blackmagic DaVinci procession and encoding Yes Yes development LATTE Density matrix computations CU_BLAS, SP2 Yes Single only Aaon Benfield Turbostream Ltd. CFD software for turbomachinery and scene building. Scene Analysis, Xplore Resolve Digimetrics Aurora Automated video and audio test CGGV GeoVation Seismic Processing Multiple Yes Algorithm Yes Bioinformatics PHYSICS Pathwise™ flows Explicit solver Yes DeltaGen, SDK for DeltaGen Color grading Real-time color correction, real-time and algorithms (RTM, etc) Yes DESMOND High-speed molecular dynamics MOLCAS Methods for calculating general electronic APPLICATION DESCRIPTION SUPPORTED APPLICATION DESCRIPTION SUPPORTED Specialized platform for real-time hedging, Vratis SpeedIT extreme Yes denoising measurment ffA Geoteric Seismic Interpretation Attributes simulations of biological systems on structures in molecular systems in both FEATURES MULTI-GPU SUPPORT FEATURES MULTI-GPU SUPPORT valuation, pricing and risk management for OpenFOAM PTC Creo Parametric Parametric design solution Yes Video encoding and video processing Single only calculations, geobodies conventional commodity clusters. ground and excited states BarraCUDA Sequence mapping software Alignment Chroma Lattice Quantum Chromodynamics (LQCD) Spreadsheet-like modeling interfaces, Solver library for general purpose CFD suite Anti-aliasing, better lighting and enhanced Cinnafilm Dark Energy Application and plug-in for Elemental Live Live streaming video processing and extraction The code uses novel parallel algorithms CU_BLAS Single only of short sequencing reads, Wilson-clover fermions, Krylov solvers, Python-based scripting environment and software shaded-with-edges mode. GRID Support image encoding Yes and numerical techniques to achieve high MOLPRO Used for accurate ab initio quantum alignment of indels with gap openings and Domain-decomposition Grid middleware Linear equation solvers Yes Single only enhancement Video encoding and video processing Yes ffA SVI Pro Seismic Interpretation Attributes performance and accuracy chemistry calculations extensions Yes Yes RESEARCH CFD DEVELOPMENTS Siemens PLM NX and Image de-noising and restoration. Elemental Server File-based video processing and calculations, geobodies Yes Density-fitted MP2 (DF-MP2), density fitted Yes ENZO 3D block-structured AMR code for Hanweck Associates Real-time options analytical FEFLO (GMU-Lohner) General purpose CFD Teamcenter Simultaneous active and background encoding Video encoding and video processing Yes extraction DL-POLY Simulate macromolecules, polymers, ionic local correlation methods (DF-RHF, DFKS), DFT CUDASW++ Open source software for Smith- cosmological structure formation engine (Volera) Real-time options analytics engine software for Product lifecycle management solutions rendering isovideo Viarte Video standards conversion CUDA- Yes systems, etc on a distributed memory Yes Waterman Accelerated magneto hydrodynamics Yes compressible and incompressible flows from design to simulation to production to Yes accelerated video procession and ffA SEA3D Pro Seismic Interpretation Attributes parallel computer MOPAC2013 Semiempirical Quantum Chemistry protein database searches on GPUs solvers Murex MACS Analytics Implicit and explicit solver Yes service Digital Vision Nucoda Color grading Real-time color encoding calculations, geobodies Two-body forces, Link-cell pairs, Ewald Pseudodiagonalization, full diagonalization, Parallel search of Smith-Waterman Yes Library SD++ (StanfordJameson) Design software, NX, and PLM viewer correction Single only Yes extraction SPME forces, Shake VV and density matrix assembling database GTC Simulates microturbulence and transport in Analytics library for modeling valuation and General purpose CFD software for applications, TcVis and Active Workspace Marquise Technologies MainConcept CUDA Yes Yes Single only Yes magnetically confined fusion plasma risk for derivatives across multiple asset compressible flows. Client. Rain H.264/AVC Encoder SDK GeoStar Seismic Suite Seismic Processing Multiple Folding@Home A distributed computing project that NWChem Calculations Triples part of Reg-CCSD(T), CUSHAW Parallelized short read aligner Parallel, Electron push and shift (accounting for classes. Explicit solver Yes Single only Color grading CUDA-based real-time color correction H.264 video encoder Video encoding and video algorithms (RTM, etc) Yes studies CCSD and accurate long read aligner for >80% of run time) Market standard models for all asset S3D (Sandia and Oak POPULAR GPU-ACCELERATED APPLICATIONS Single only processing Yes Headwave Suite Seismic Interpretation Attributes protein folding, misfolding, aggregation, and EOMCCSD task schedulers large genomes Yes classes paired with the most efficient Ridge NL) CATALOG | MAR14 | 09 Quantel Pablo Rio Color grading and finishing Real Sorenson Squeeze Video transcoding application and calculations, Volume Rendering Yes related diseases Yes Yes GTS Simulates microturbulence and the motion resolution methods (Monte Carlo Direct numerical solver (DNS) for turbulent ELECTRONIC DESIGN AUTOMATION time color correction Yes plug-In Video encoding and video processing Yes HUE HUEspace Seismic Interpretation Interpretation Powerful distributed computing molecular Octopus Used for ab initio virtual experimentation 04 | POPULAR GPU-ACCELERATED of charged particles and interactions in simulations and Partial Differential combustion APPLICATION DESCRIPTION SUPPORTED Red Digital Cinema Snell Alchemist on development platform Yes dynamics system; implicit solvent and and quantum chemistry calculations APPLICATIONS CATALOG | MAR14 fusion plasma Equations) Chemistry model Yes FEATURES MULTI-GPU SUPPORT REDCINE-X Demand OpenGeo Solutions folding Full GPU support for ground-state, GATK The Genome Analysis Toolkit or GATK Push and shift for both electron and ion Yes DualSPHysics SPH-based CFD software SPH model Agilent Technologies ADS Simulation tool for design Color grading CUDA-accelerated debayering Single Video standards conversion Video procession and OpenSeis Yes real-time calculations; Kohn-Sham is a software package developed at the dynamics Numerical Algorithms Yes of RF, microwave only encoding Yes Seismic Processing Spectral Decomposition Yes GPUGrid.net A distributed computing project that Hamiltonian, orthogonalization, subspace Broad Institute to analyse next-generation Yes Group (NAG) NASA FUN3D General purpose CFD software Linear and high speed digital circuits SGO Mistika Color grading and finishing Real-time Telestream Vantage Video transcoding and Panorama Tech Seismic Processing, Modeling uses diagonalization, poisson solver, time resequencing data. The toolkit offers a wide MILC Lattice Quantum Chromodynamics (LQCD) Random number generators, Brownian equation solver Single only Signal integrity simulation Single only color correction and finishing Single only processing Video encoding and video processing YesMultiple algorithms (RTM, etc) Yes GPUs for molecular simulations propagation variety of tools, with a primary focus on codes simulate how elemental particles bridges, and PDE solvers 08 | POPULAR GPU-ACCELERATED Agilent Technologies The Pixel Farm PFClean Image restoration and ON-AIR GRAPHICS Paradigm Echos RTM Seismic Processing RTM High-performance all- biomolecular yes variant discovery and genotyping as well as are formed and bound by the “strong force” Monte Carlo and PDE solvers Single only APPLICATIONS CATALOG | MAR14 EMPro remastering CUDA-based image processing APPLICATION DESCRIPTION SUPPORTED algorithm Yes simulations; explicit solvent and binding Q-CHEM Computational chemistry package designed strong emphasis on data quality assurance. to create larger particles like protons and RMS Catastrophic risk modeling for FSI COMPUTATIONAL STRUCTURAL MECHANICS Modeling and simulation environment for acceleration FEATURES MULTI-GPU SUPPORT Paradigm SKUA Reservoir Modeling Faults, Horizons Yes for HPC clusters Variant Calling; > 70X speedup Yes neutrons (earthquakes, hurricanes, terrorism, APPLICATION DESCRIPTION SUPPORTED analyzing 3D EM effects of high speed and Single only Avid On-air Graphics Motion Graphics Real-time and Flow Simulation Grid Yes GROMACS Simulation of biochemical molecules with Various features including RI-MP2 TBD GPU-BLAST Local search with fast k-tuple heuristic Staggered fermions, Krylov solvers, infectuous diseases) FEATURES MULTI-GPU SUPPORT RF/Microwave components COMPOSITING, FINISHING AND EFFECTS graphics rendering Single only Paradigm Geophysical complicated bond interactions QUICK QUICK is a GPU-enabled ab intio quantum Protein alignment according to BLASTP Single only Gauge-link fattening Risk analytics Yes Abaqus/Standard Simulation and analysis tool for FDTD solver Yes APPLICATION DESCRIPTION SUPPORTED Brainstorm Estudio Virtual Sets and motion graphics VoxelGeo Implicit (5x), Explicit (2x) Solvent Yes chemistry software package mCUDA-MEME Ultrafast scalable motif discovery Yes Tanay ZX Lib (Fuzzy structural ANSYS Nexxim Circuit simulation engine for FEATURES MULTI-GPU SUPPORT Real-time graphics rendering Single only Seismic Interpretation Volume Rendering, Horizon HALMD Large-scale simulations of simple and Running Hartree-Fock and DFT energy on algorithm PIConGPU A relativistic Particle-in-Cell code that Logic) mechanics RF/analog/ ABSoft Neat Video Video noise reduction plug-in Chyron HyperX On-air graphics Real-time graphics Flattening Yes complex liquids GPU, Supports s, p, d, f orbitals on energy based on MEME describes the dynamics of a plasma by Financial analytics and data mining library Monte Direct sparse solver Yes mixed-signal IC design; IBIS-AMI analysis CUDA-based acceleration Single only rendering Single only Roxar RMS Reservoir Modeling Multi GPU Simple fluids and binary mixtures (pair calculation, HF gradient with s,p,d orbital Scalable motif discovery algorithm based computing the motion of electrons and ions Carlo simulations, pricing of vanilla ANSYS Mechanical Simulation and analysis tool for speedup with GPU computing. Adobe After Effects CC Motion graphics and effects Chyron LEX On-air graphics Real-time graphics capabilities via HUEspace Yes potentials, high-precision NVE and NVT, support, GPU-based ERI generator on MEME subject to the Maxwell-Vlasov equation. and exotic options, fixed income analytics, structural AMI analysis Single only 3D raytracing engine based on NVIDIA rendering Single only Seismic City Prestack dynamic correlations) Yes Yes Simulation of laser-wakefield acceleration data mining mechanics ANSYS HFSS Simulation tool for modeling 3-D full- OptiX Harris Inscriber On-air graphics Real-time graphics Interpretation Single only POPULAR GPU-ACCELERATED APPLICATIONS MUMmerGPU High-throughput local sequence of electrons Yes Direct and iterative solvers Yes wave Yes rendering Single only Seismic Processing Multiple algorithms (RTM, etc) HOOMD-Blue Particle dynamics package written CATALOG | MAR14 | 03 alignment Yes SciComp, Inc Derivative pricing (SciFinance) Monte Impetus Afea Predicts large deformations of electromagnetic fields in high-frequency Autodesk Flame Monarch dScript 3D Virtual sets Real-time graphics Yes grounds TeraChem Quantum chemistry software designed to program QUDA Library for Lattice QCD calculations using Carlo and PDE pricing models Single only structures and high-speed electronic components. Premium rendering Single only SpectraSeis Seismic Processing Full elastic wave- up for GPUs run on NVIDIA GPU Aligns multiple query sequences against GPUs POPULAR GPU-ACCELERATED APPLICATIONS and components exposed to extreme Transient solver Yes Finishing and color grading Post-production Monarch Twister Pro Virtual Sets and motion equation imaging and Written for use only on GPUs Yes Full GPU-based solution; Performance reference sequence in parallel CUDA supports the following fermion CATALOG | MAR14 | 07 loading conditions CST Microwave Studio integrated end-to-end graphics Real-time graphics rendering Single only analysis of microseismic fracture data LAMMPS Classical molecular dynamics package compared to GAMESS CPU version TBD formulations: Wilson,Wilson-clover,Twisted Xcelerit SDK Software Development Kit (SDK) to Linear equation solver Yes (MWS) toolset for 3D VFX, editorial, and color Monarch Virtuoso Virtual Sets and motion graphics Yes Lennard-Jones, Gay-Berne, Tersoff, and Yes NVBIO NVBIO is an open source C++ library mass,Improved staggered (asqtad or HISQ) boost LS-DYNA Implicit Multiphysics simulation package 3D EM modeling and simulation Transient solver grading Real-time graphics rendering Single only Stoneridge Technologies many more potentials Materials Science of reusable components designed to and Domain wall the performance of Financial applications used Linear equation solver Yes Yes Single Only Pixel Power Clarity On-air graphics Real-time GAMPACK Yes APPLICATION DESCRIPTION SUPPORTED accelerate bioinformatics applications using Yes (e.g. Monte-Carlo, Finite-difference) with MSC Nastran Simulation and analysis tool for Delcross Savant Simulation tool for installed antenna Autodesk Smoke Finishing and editing Faster effects graphics rendering Single only Reservoir Simulation GPU Algebraic MultiGrid NAMD Designed for high-performance simulation FEATURES MULTI-GPU SUPPORT CUDA. RAMSES Simulates astrophysical problems on minimum changes to existing code structural performance and antenna-to-antenna Single only RT Software tOG On-air graphics Real-time graphics Package Yes of large molecular systems gWL-LSMS Materials code for investigating the Data structures, algorithms, and utility different scales (e.g. star formation, C++ programming language, crossplatform (back-end mechanics coupling Boris FX Continuum rendering Single only Terraspark Insight Earth Seismic Interpretation Full electrostatics with PME and most effects routines useful for building complex galaxy dynamics, cosmological structure generates CUDA and Direct sparse solver Yes High-frequency solver Yes Complete Vizrt Viz Engine On-air graphics Real-time graphics Horizon orientation attributes; automated simulation features; 100M atom capable of temperature on magnetism computational genomics applications on formation) optimized CPU code), supports Windows MSC Marc Simulation and analysis tool for structural EMSS FEKO 3D EM modeling and simulation MoM Visual effects plug-in Faster effects Single only rendering Single only fault extraction, Curvature Attributes Yes Generalized Wang-Landau method Yes CPU-GPU systems CUDA acceleration is applied for radiative and Linux operating systems. mechanics solver Single only Cinnafilm Dark Energy Wasp 3D Beehive On-air graphics and virtual sets Yes OpenMM Library and application for molecular PEtot First principles materials code that Yes transfer for reionization, and the Yes Direct sparse solver Yes Remcom XFdtd 3D EM modeling and simulation Plug-in Real-time graphics rendering Single only Tsunami RTM Seismic Processing RTM algorithm dynamics for HPC with GPUs computes the behavior of the electron NVBowtie A largely complete implementation of the hydrodynamic solver using AMR Synerscope’s Synerscope OptiStruct Simulation and analysis tool for structural FDTD solver Yes Color management All features accelerated with 12 | POPULAR GPU-ACCELERATED Yes Implicit and explicit solvent, custom forces Yes structures of materials Bowtie2 aligner on top of NVBIO Yes Data Visualization mechanics SPEAG SEMCAD-X 3D EM modeling and simulation CUDA Yes APPLICATIONS CATALOG | MAR14 WesterGeco Omega2 Test Drive the World’s Density functional theory (DFT) plane wave Good coverage of Bowtie2 features and Chemora Chemora is a system for performing Visual big data exploration and insight tools Graphical Direct and iterative solvers Yes FDTD solver Yes CoreMelt complete Visual effects plug-in Faster ON-SET, REVIEW AND STEREO TOOLS RTM Fastest Accelerator – Free! pseudopotential calculations comparable quality results simulations of systems described exploration of large network DS Exsight Uses Abaqus Standard for GPU Rocketick RocketSim Verilog simulation Verilog effects Single only APPLICATION DESCRIPTION SUPPORTED Seismic Processing Multiple algorithms (RTM, etc) Take the GPU Test Drive, a free and easy way to Yes Yes by differential equations running on datasets including geo-spatial and computing Direct sparse solver Single only simulation Yes eyeon Fusion Effects and compositing Faster effects FEATURES MULTI-GPU SUPPORT Yes experience accelerated computing on GPUs. You QMCPACK Solves the many-body Schrodinger REACTA A modified version of GCTA with improved accelerated computational clusters temporal components DS DesignSight Uses Abaqus Standard for GPU Media and Entertainment Single only 3ality Intellicam 3D stereo camera adjustment CUDA-For more information on GPU-accelerated can equation computational performance, support for Chemora embeds the equations’ Single only computing Direct sparse solver Single only ANIMATION, MODELING AND RENDERING GenArts Monsters GT Visual effects plug-in Faster based imaging Single only applications please visit,www.nvidia.com/teslaapps run your own application or try one of the preloaded for electronic structures using a quantum Graphics Processing Units (GPUs), and computational kernels into dynamically QuantAlea’s Alea.cuBase PAM-CRASH Implicit Multiphysics simulation APPLICATION DESCRIPTION SUPPORTED effects Single only BlueFish 4K Review Review and approval of 4K © 2014 NVIDIA Corporation. All rights reserved. ones, all running on a remote cluster. Try it today. Monte Carlo method additional features. The purpose of REACTA compiled loop nests shaped for input size F# package used Linear equation solver Single only FEATURES MULTI-GPU SUPPORT GenArts Sapphire Visual effects plug-in Faster content Real-time Single only NVIDIA, the NVIDIA logo, and CUDA, are www.nvidia.com/gputestdrive Main features Yes is to quantify the contribution of genetic and GPU structure F# package enabling a growing set of F# NX Nastran Simulation and analysis tool for structural + effects Single only Colorfront On-Set Dailies Review and approval of 4K trademarks and/or registered trademarks of 02 | POPULAR GPU-ACCELERATED Quantum Espresso/ variation to phenotypic variation for complex Yes capability to run on a GPU. mechanics NVIDIA iray ROBUSKEY Chroma keyer plug-in Faster effects content Real-time Yes NVIDIA Corporation. All company and product names APPLICATIONS CATALOG | MAR14 PWscf traits. 06 | POPULAR GPU-ACCELERATED F# for GPU accelerators Yes Linear equation solver Single only 3D modeling, animation, and rendering iray Single only eyeon Dimension 3D stereoscopic workflow Real- are trademarks or registered trademarks of the Quantum Chemistry An integrated suite of computer codes GRM creation, REML analysis, Regional APPLICATIONS CATALOG | MAR14 Altimesh’s Hybridizer C# Multi-target C# framework COMPUTER AIDED DESIGN interactive, photorealistic and Neat Video Open FX Video noise reduction plug-in time Single only respective owners with which they APPLICATION DESCRIPTION SUPPORTED for electronic structure calculations and Heritability (including multi-GPU) Defense and Intelligence for data parallel APPLICATION DESCRIPTION SUPPORTED physically correct rendering Faster effects Single only Lightcraft Previzion On-set workflow Real-time Single are associated. Features, pricing, availability, and FEATURES MULTI-GPU SUPPORT materials modeling at the nanoscale Yes APPLICATION DESCRIPTION SUPPORTED computing. FEATURES MULTI-GPU SUPPORT Yes NewBlueFX Video only specifications are all subject to change without Abinit Allows to find total energy, charge density PWscf package: linear algebra (matix SeqNFind SeqNFind® is a powerful tool suite that FEATURES MULTI-GPU SUPPORT C# with translation to GPU or Multi-Core AutoCAD 2D and 3D CAD design, drafting, modeling, 3D modeling, animation, and Essentials Tweak RV Review and approval of 4K content Real- notice. MAR14 and electronic structure of systems made of multiply), explicit computational kernels, addresses the need for complete and DigitalGlobe Advanced Xeon architectural drawing, and engineering rendering Increased model complexity, larger scenes, Video effects plug-in Faster effects Single only time Single only electrons and nuclei within DFT 3D FFTs accurate alignments of many small Ortho Series Yes software. Supports Open GL. Native DWG™ windowed stereo NewBlue Titler Pro Video titling plug-in Faster effects SIMULATION Local Hamiltonian, non-local Hamiltonian, Yes sequences against entire genomes utilizing Geospatial Visualization Image orthorectification Yes MISYS Global Risk Regulatory compliance and support. Single only Single only APPLICATION DESCRIPTION SUPPORTED LOBPCG algorithm, diagonalization/ VASP First principles materials code that a unique hardware/software cluster system Eternix Blaze Terra Geospatial Visualization 3D enterprise wide Surface, mesh, and solid modeling tools, Autodesk Motion Builder Character animation and Pixelan AnyFX Video effects plug-in Faster effects FEATURES MULTI-GPU SUPPORT orthogonalization. Contains BigDFT. computes the behavior of electronic for facilitating bioinformatics research in visualization of geospatial data Yes risk transparency package model documentation tools, parametric motion capture Increased model complexity at Single only 3DAliens Glu3d SPH fluid simulation Faster Yes structures based on quantum theory Next Generation sequencing and genomic Exelis (ITT) ENVI Geospatial Visualization Image Risk Analytics Yes drawing capabilities. Native DWG™ interactive Re:VisionEffects Visual effects plug-in Faster effects simulation Single only ACES III Takes best features of parallel Hybrid Hartree-Fock DFT functionals comparisons. orthorectification MiAccLib High Speed Multi-Algorithm Search Engine support. rates Single only Blastcode Kilton/ implementations of quantum chemistry including exact exchange Hardware and softare for reference (custom builds only) library providing high speed text string Single only Single only Red Giant Effects Suite Visual effects plug-ins Faster Megaton methods for electronic structure Yes assembly, BLAST, SW HMM, and de novo Yes search with scalability of searching text AutoCAD Design Suite AutoCAD 2014 software, plus 3D sculpting Increased model effects Single only Physics-based simulation plug in Faster simulation Integrating scheduling GPU into SIAL Visualization and Docking Software assembly MrGeo Geospatial Visualization Terrain analytics Yesand/or keywords on hundreds millions of tools to create, complexity at interactive Red Giant Magic Bullet Single only programming language and SIP runtime APPLICATION DESCRIPTION SUPPORTED Yes GeoWeb3d Desktop Geospatial Visualization 3D records and/or text data. capture, connect, and showcase designs rates Looks Jawset TurbulenceFD Physics-based simulation plug environment FEATURES MULTI-GPU SUPPORT SOAP3 GPU-based software for aligning short visualization of geospatial data Yes Exact Text match Search, Approximate\ 2D/3D display of designs, interactive 3D Single only Color and finishing tools Faster effects Single only in Faster simulation Single only, Yes Amira 5 A multifaceted software platform reads with a reference sequence.It can find Incogna GIS Geospatial Visualization Image Similarity Text Search, Wild Card presentation with realistic materials, Cebas finalRender GPU Renderer CUDA-based GPU The Foundry HIERO Shot management, conform multi-GPU ADF Density Functional Theory (DFT) software for visualizing, manipulating, and all alignments with k mismatches, where k processing on Tesla cloud servers Text Search, Proximity & Percentage rendering-ray tracing Rendering Yes and review available in package that enables first-principles understanding Life Science and bio-medical is chosen from 0 to 3 Object recognition Text Search, MultiKeyword and Single only CentiLeo GPU Render GPU Renderer CUDA-based timeline late Q2/Q3 electronic structure calculations data Short read alignment tool that is not Yes MultiColumnMultiKeyword Text Search, Autodesk 3ds Max 3D animation creative toolset for GPU Rendering Single only Better interactivity Single only WEATHER AND CLIMATE FORECASTING Fock Matrix, Hessians Yes 3D visualization of volumetric data and heurisitic based, reports all answers Intergraph Motion Video RadixSort Text modeling, Chaos V-Ray RT GPU Renderer CUDA interactive The Foundry and APPLICATION DESCRIPTION SUPPORTED BigDFT Implements density functional theory surfaces Yes Analyst Yes animation, simulation, and rendering for and final-frame GPU NUKEX FEATURES MULTI-GPU SUPPORT by solving the Kohn-Sham equations Single only SOAP3-dp SOAP3-dp: Ultra-fast GPU-based tool for Video filters and mosaic’ing - Geo-fuses SunGard Adaptiv games, film, and motion graphics Rendering Compositing tools with 3D tracker Faster effects Accuweather Cinemative describing the electrons in a material BINDSURF A virtual screening methodology that short read alignment via index-assisted FMV analytics with intelligence data Analytics 63D modeling, mesh and surface modeling, Yes Single only HD DFT; Daubechies wavelets, part of Abinit Yes uses dynamic programming Full motion video ortho mosaic processing Single A flexible and extensible engine for fast improved Nitrous viewport performance, Jawset TurbulenceFD Physics-based simulation plug-Video Copilot Software Weather graphics Real-time Single only Software Development Ecosystem for Intel Xeon Phi Open Source Commercial

Compiler gcc (kernel build only, not Intel Parallel Studio XE (C++ & Fortran) for applications), Intel Cluster Studio XE (C++ & Fortran) Python CAPS HMPP compiler (beta) Debugger gdb Intel Debugger Rogue Wave TotalView (beta) Allinea DDT Libraries TBB (in Intel Studio XE) Intel MKL, Intel MPI, OpenMP, Intel IPP, MPICH2 Cilk™ Plus (in Intel Studio XE products), FFTW NAG NetCDF Rogue Wave IMSL

Profiling & Analysis Intel VTune Amplifier XE Tools Intel Trace Analyzer & Collector Intel Inspector XE Workload Scheduler Altair PBS Professional, Adaptive Computing Moab

System Management SGI Management Center Bright Cluster Manager (beta)

For more information: http://software.intel.com/en-us/articles/intel-and-third-party-tools-and-libraries-available-with-support-for-intelr-xeon-phitm 7 Next generation XEON: Haswell

8 Next-next generation HPC XEON

PHI

9 “Pyramid 1U” Model C1104G-RP5 Chassis Profile 1U standard-depth Servers/System One dual-socket Chipset Intel® C600 Two Intel® Xeon® processor E5-2600 Max. Processors series Max. CPU TDP 115W Memory Slots 8 DIMM slots 1600/1333/1066/800 MHz DDR3 ECC Memory Type Reg

Max. Hard Disk Drives 4 x 2.5" drives Three PCIe 3.0 x16 Expansion Slot double-width slots One external PCIe 3.0 x8 low-profile slot

Networking (Onboard) Dual-Port GigE controller (Intel® I350) NVIDIA GPU accelerator support: IPMI Remote Integrated IPMI 2.0 • 3x K10, K20, K20X Management NVIDIA graphics GPU support: • Quadro Plex 7000 Power Supply 1800W Redundant* Platinum Level Intel accelerator support: • 3x Phi

10 10 “Pyramid 2U” Model C2110G-RP5 Chassis Profile 2U standard-depth Servers/System One dual-socket Chipset Intel® C600

Max. Processors Two Intel® Xeon® processor E5-2600 series Max. CPU TDP 130W Memory Slots 8 DIMM slots Memory Type 1600/1333/1066/800 MHz DDR3 ECC Reg

Max. Hard Disk Drives 10 x 2.5” drives Four internal PCIe 3.0 x16 double-width Expansion Slot (optional riser card for 2 additional external PCIe 3.0 x16 double width slots), One external PCIe 3.0 x8 low profile and one PCIe 2.0 x4 full height

Networking (Onboard) Dual-Port GigE controller (Intel® I350) NVIDIA GPU accelerator support: • 4x K10, K20, K20X IPMI Remote Management Integrated IPMI 2.0 NVIDIA graphics GPU support: • Quadro Plex 7000 Intel accelerator support: Power Supply 1800W Redundant** Platinum Level • 4x Phi

11 11 SGI UV 2000 Within a Single Standard 19” Rack

Up to: • 64 socket/512 cores/1024 threads ® ® - Or 34 CPU + 30 Intel Xeon Phi™ - Or 36 CPU + 28 NVIDIA® Tesla® K20,K20X, or K40 (2 partitions)

• 16TB memory • 63 x16 PCIe Gen 3 Links

12 12 SGI UV and Scale-out Compared Feature Standard Scale-out Servers SGI UV

Architecture Scale Out Scale Up Reference Terms Cluster Single System Image Distributed Memory Shared Memory

System Limit 16 cores, 0.5 TB memory 2048 cores, 64 TB memory

CPU X86 Intel® Xeon® or AMD Opteron™ Intel® Xeon®

Memory, Storage, Industry standard Industry Standard networking

Interconnect or Ethernet or InfiniBand SGI NUMAlink System Fabric

Hardware Package or Blades or Rackmount, 19” rack Blades (UV2000) or Building Block Rackmount(UV20), 19” rack

Software Off the Shelf Off the Shelf

Applications Small scale single apps Small or large single apps cluster or “MPI” apps Cluster or “MPI” apps also

Cost Lowest cost x86 architecture 20-75% > scale-out, ~1/3 the cost of ‘Big Iron’.

13 13 SGI® Value Propositions

• Long history of working with accelerators • “Home-brewed” – Geometry Engine, TPU (Tensor Processor Unit) • FPGA’s (RASC™ technology), GPUs, SOCs • 50 application experts focused on it • Accelerators in both a scale-up and scale-out environment • 32 Intel® Xeon® Phi™ coprocessors in SGI® UV™ 2000 (COSMOS and TGAC) • SGI: the only vendor able to deliver hybrid scale-out and scale-up solutions • Everything you need in a powerful solution • Factory-integrated, tested, rack level delivery – plug in and go • Starter kits • Worldwide customer support • Fully-managed, with SGI Management Center and Performance Suite software

14 14 SGI® Integrated Clusters for Intel Xeon Phi  A complete, managed GPU solution of software and hardware, all you need for a powerful deployment • Hardware stack • Rackable™ C2108 head node • Either/or Rackable C1104G or C2110G compute nodes • Infiniband NICs and Switch • Ethernet switch for management • Add SGI InfiniteStorage solutions, either direct-attached to the head node or SGI NAS  Software stack • SGI Management Center software • Altair PBSpro Load Management software  Complete Factory Test & Integration

15 SGI® Management Center Premium Edition offers Accelerator Monitoring and Management!

• Single system management console with remote server monitoring and control • Ease of use, full system management including GUI and CLI • Policy driven, fine grained power load management for green computing • Advanced fault, event, and alert management for improved reliability • Advanced capabilities including Accelerator management, BIOS management, and high availability

16 SMC Premium Edition– Accelerator Monitoring

Detailed GPU reporting enables high performance systems to remain at their peak performance

17 Optimized for HPC & Big Data Solutions

• SGI® ICE™ X

• SGI UV™ 2000

18 SGI ICE X Compute Node with GPU

• Two Intel® Xeon® processor E5-2600 series

• Two NVIDIA® TESLA® K20

• FDR InfiniBand - single or dual plane

• Four DDR3 DIMMs per socket @ 1600 MT/s

• Up to one 2.5” SATA HDD/ SSD drives

• Liquid cold sinks

One dual socket node w/ FDR + 1:1 processor to coprocessor ratio two coprocessors in = Balanced Throughput one blade slot!

19 19 SGI ICE X Compute Node with Phi • Two Intel® Xeon® processor E5-2600 series

• Two Intel® Xeon Phi™ 5120D Coprocessor

• FDR InfiniBand - single or dual plane

• Four DDR3 DIMMs/ socket @ up to 1600 MT/s • Expect 1866 MT/s support w/1 DPC populated (w/ Ivy Bridge-EP) • Up to one 2.5” SATA HDD/ SSD drive

• Liquid cold sinks

One dual socket node w/ FDR + 1:1 processor to coprocessor ratio two coprocessors in = Balanced Throughput one blade slot! 20 Balanced 20 throughput SGI Cold Sink Technology – Twin Blades

21 SGI Meets All Accelerator Needs!

• SGI has the longest history with accelerators • Accelerators have gone mainstream, we are supporting them across our product line • The only vendor to be able to deploy scale- up, scale-out and hybrid landscapes with accelerators everywhere.

22 22 SGI ICE: Sample ICE Customers

BAW - Bundesanstalt für KU Leuven Sikorsky Wasserbau Mazda Motor Corporation Skoda Auto Bridgestone McLaren Motor Racing Ltd TI-09 Central Research Inst of Mercedes-Benz GP Tokyo University – ISSP Electric Pwr Ind MPO Total CSIR - Centre for NASA Ames Toyota Mathematical Modeling and NASA Langley U.S. Air Force - AFB Compute NIIFI U.S. Navy/NRL Exeter University NIMS Universidade Catolica GENCI NMCAC Universite Paul Sabatier – HLRN Berlin NOAA (CSC) CICT/Calmip HLRN Hannover NTNU University of Arizona ICHEC Onera University of Hyderabad ICR ORNL University of Oxford Idaho National Laboratory Pontificia Universidade University of São Paulo – IFREMER Catolica – PUC-Rio USP/IAG Imperial College Rosshydromet University of Rio de Janeiro – IMSc Scientific Analysis Group NACAD INMET Semiconductor Energy Labs University of Rio Grande do JAMSTEC Sul – CESUP Korean Air Force US Army TACOM

23 Hungary 2011-2014

NIIFI supercomputers for non-profit customers:

• Cluster: SGI ICE Debrecen (18TFlops, 1536 core; 6TB ram) • Cluster + GPU: HP Szeged (14TFlops, 2304 core; 5,6TB ram, 6xM2070) • SMP/ccNUMA: SGI UV Pécs (10TFlops, 1152 core, 6TB ram) http://www.niif.hu/szolgaltatasok/szuperszamitastechnika/altalanos_ismerteto

24 Hungary 2015

• Cluster: 200+ Tflops (200pc GPU or PHI)

• Cluster: 30+ GPU or PHI

• Cluster: SGI ICE Debrecen (18TFlops, 1536 core; 6TB ram) • Cluster + GPU: HP Szeged (14TFlops, 2304 core; 5,6TB ram, 6xM2070) • SMP/ccNUMA: SGI UV Pécs (10TFlops, 1152 core, 6TB ram)

25 ©2012 SGI 26