Accelerating Research and Development Using the Titan Supercomputer


Fernanda Foertter, HPC User Support, ORNL. ORNL is managed by UT-Battelle for the US Department of Energy.

What is the Leadership Computing Facility (LCF)?
• Collaborative DOE Office of Science program at ORNL and ANL
• Mission: provide the computational and data resources required to solve the most challenging problems
• 2 centers / 2 architectures to address diverse and growing computational needs of the scientific community
• Highly competitive user allocation programs (INCITE, ALCC)
• Projects receive 10x to 100x more resource than at other generally available centers
• LCF centers partner with users to enable science and engineering breakthroughs (Liaisons, Catalysts)

The OLCF has delivered five systems and six upgrades to our users since 2004
• Increased our system capability by 10,000x
• Strong partnerships with computer designers and architects
• Worked with users to scale codes by 10,000x
• Science delivered through strong user partnerships to scale codes and algorithms
System timeline: Phoenix X1 (2004; later X1e), Jaguar XT3 (2005; dual-core upgrade, doubled size), Jaguar XT4 (2007; quad-core upgrade), Jaguar XT5 (2008; 6-core upgrade), Titan XK7 (2012; GPU upgrade)

Science breakthroughs at the OLCF: selected science and engineering advances over the period 2003-2013
• Astrophysicists discover supernova shock-wave instability, Astrophys. J. (2003), 116 citations (3/2014)
• Researchers solved the 2D Hubbard model and presented evidence that it predicts HTSC behavior, Phys. Rev. Lett. (2005), 105 citations (3/2014)
• First-principles flame simulation provides crucial information to guide design of fuel-efficient clean engines, Proc. Combust. Inst. (2007), 78 citations (3/2014)
• Largest simulation of a galaxy's worth of dark matter showed for the first time the fractal-like appearance of dark matter substructures, Nature (2008), 326 citations (3/2014)
• World's first continuous simulation of 21,000 years of Earth's climate history, Science (2009), 254 citations (3/2014)
• Biomass as a viable, sustainable feedstock for hydrogen production for fuel cells, Nano Letters (2011) and J. Phys. Chem. Lett. (2010), 71 and 74 citations, respectively
• Demonstrated that three-body forces are necessary to describe the long lifetime of 14C, Phys. Rev. Lett. (2011), 28 citations (3/2014)
• Calculation of the number of bound nuclei in nature, Nature (2012), 36 citations (3/2014)
• Global warming preceded by increasing carbon dioxide concentrations during the last deglaciation, Nature (2012), 64 citations (3/2014)
• MD simulations show the selectivity filter of a trans-membrane ion channel is sterically locked open by hidden water molecules, Nature (2013)

No more free lunch
Herb Sutter, Dr. Dobb's Journal: http://www.gotw.ca/publications/concurrency-ddj.htm

Power is THE problem
Power consumption of the 2.3 PF (peak) Jaguar: 7 megawatts, equivalent to that of a small city (5,000 homes). Using traditional CPUs is not economically feasible: a 20 PF+ system would draw roughly 30 megawatts (30,000 homes).
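A back-of-the-envelope scaling from Jaguar's numbers (our arithmetic, not from the slides) shows why a CPU-only successor was ruled out:

```latex
% Naive linear scaling from Jaguar's measured figures:
\frac{7\ \text{MW}}{2.3\ \text{PF}} \approx 3\ \text{MW/PF}
\quad\Rightarrow\quad
20\ \text{PF} \times 3\ \text{MW/PF} \approx 60\ \text{MW}
```

Even allowing roughly a factor-of-two efficiency gain, which brings the estimate down to the 30 MW quoted on the slide, the power bill alone makes a CPU-only 20 PF machine impractical, which motivates the GPU discussion that follows.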
Why GPUs? Hierarchical parallelism: high performance and power efficiency on the path to exascale
• Hierarchical parallelism improves scalability of applications
• Expose more parallelism through code refactoring and source code directives; doubles performance of many codes
• Heterogeneous multicore processor architecture: use the right type of processor for each task (the CPU is optimized for sequential multitasking, the GPU accelerator for many simultaneous tasks)
• Data locality: keep data near the processing; the GPU has high bandwidth to local memory for rapid access and a large internal cache
• Explicit data management: explicitly manage data movement between CPU and GPU memories
• 10x performance per socket, 5x more energy-efficient systems

Titan: #2 on the Top500 list, 8.2 megawatts, 27 PF (peak), 17.59 PF (Linpack)

Roadmap to Exascale: our science requires that we advance computational capability 1000x over the next decade
• 2012: Titan, 27 PF, 600 TB DRAM
• 2017: OLCF-4, 100-250 PF, 4,000 TB memory
• 2022: OLCF-5, 1 EF
What are the challenges? Figures flagged alongside the roadmap include power (20 MW budgets versus >20 MW on current trends) and resilience (6-day targets).

Requirements gathering: DOE/SC and the LCFs support a diverse user community
• Science benefits and impact of future systems are examined on an ongoing basis
• LCF staff have been actively engaged in community assessments of future computational needs and solutions
• Computational science roadmaps are developed in collaboration with leading domain scientists
• Detailed performance analyses are conducted for applications to understand future architectural bottlenecks
• Analysis of INCITE, ALCC, Early Science, and Center for Accelerated Application Readiness (CAAR) project history and trends
Represented research areas span the science categories Life Sciences (bioinformatics, biophysics, biology, medical science, neuroscience, proteomics, systems biology), Chemistry (chemistry, physical chemistry), Computer Science, Earth Science (climate, geosciences), Engineering (aerodynamics, bioenergy, combustion, turbulence), Fusion Energy (fusion plasma physics), Materials (materials science, nanoelectronics, nanomechanics, nanophotonics, nanoscience), Nuclear Energy (nuclear fission, nuclear fuel cycle), and Physics (accelerator physics, astrophysics, atomic/molecular physics, condensed matter physics, high energy physics, lattice gauge theory, nuclear physics, solar/space physics).

Requirements process
• Surveys are a "lagging indicator" that tend to tell us what problems the users are seeing now, not what they expect to see in the future
• https://www.olcf.ornl.gov/wp-content/uploads/2013/01/OLCF_Requirements_TM_2013_Final.pdf

OLCF User Requirements Survey: key findings
• Memory bandwidth was reported as the greatest need
• Local memory capacity was not a driver for most users, perhaps in recognition of cost trends
• 76% of users said there is still a moderate to large amount of parallelism to extract in their code, but...
• 85% of users rated the difficulty of extracting that parallelism as moderate to difficult; it often requires application refactoring, which highlights training needs and community-based efforts for application readiness
Rankings from OLCF users (1 = not important, 5 = very important):
• Memory bandwidth: 4.4
• Flops: 4.0
• Interconnect bandwidth: 3.9
• Archival storage capacity: 3.8
• Interconnect latency: 3.7
• Disk bandwidth: 3.7
• WAN network bandwidth: 3.7
• Memory latency: 3.5
• Local storage capacity: 3.5
• Memory capacity: 3.2
• Mean time to interrupt: 3.0
• Disk latency: 2.9
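The last two survey findings, that most codes still have parallelism left to extract but that extracting it usually means refactoring, can be made concrete with a minimal CUDA sketch (an illustrative example of ours, not an OLCF code; the kernel, array names and sizes are placeholders): a serial loop is rewritten as a kernel, and data movement between CPU and GPU memories is managed explicitly, as described on the "Why GPUs?" slide above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// A serial loop of the form  for (i = 0; i < n; ++i) y[i] += a * x[i];
// refactored so each GPU thread handles one element (exposed parallelism).
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += a * x[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

    // Explicit data management: allocate device memory and copy inputs over.
    float *dx = nullptr, *dy = nullptr;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Hierarchical parallelism: a grid of thread blocks, each with many threads.
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 3.0f, dx, dy);

    // Copy the result back to host memory and check one value.
    cudaMemcpy(hy.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("y[0] = %f (expected 5.0)\n", hy[0]);

    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```

The directives path mentioned on the same slide would instead annotate the original loop (for example with OpenACC pragmas) rather than rewriting it as a kernel; both routes expose the same parallelism.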
Center for Accelerated Application Readiness (CAAR)
• We created CAAR as part of the Titan project to help prepare applications for accelerated architectures
• Goals: work with code teams to develop and implement strategies for exposing hierarchical parallelism in our users' applications; maintain code portability across modern architectures; learn from and share our results
• We selected six applications from across different science domains and algorithmic motifs

CAAR plan
• Comprehensive team assigned to each app: OLCF application lead, Cray engineer, NVIDIA developer, and others (other application developers, local tool/library developers, computational scientists)
• Single early-science problem targeted for each app; success on this problem is the ultimate metric for success
• Particular plan of attack differs for each app: WL-LSMS depends on an accelerated ZGEMM, while CAM-SE requires pervasive and widespread custom acceleration
• Multiple acceleration methods explored: WL-LSMS with CULA, MAGMA, and custom ZGEMM; CAM-SE with CUDA and directives
• Two-fold aim: maximum acceleration for the model problem, and determination of an optimal, reproducible acceleration path for other applications

Early science challenges for Titan
• WL-LSMS: illuminating the role of material disorder, statistics, and fluctuations in nanoscale materials and systems
• LAMMPS: a molecular dynamics simulation of organic polymers for applications in organic photovoltaic heterojunctions, de-wetting phenomena, and biosensors
• CAM-SE: answering questions about specific climate change adaptation and mitigation scenarios; realistically representing features like precipitation patterns/statistics and tropical storms
• S3D: understanding turbulent combustion through direct numerical simulation with complex chemistry
• Denovo: discrete ordinates radiation transport calculations that can be used in a variety of nuclear energy and technology applications
• NRDF: radiation transport, important in astrophysics, laser fusion, combustion, atmospheric dynamics, and medical imaging, computed on AMR grids

Effectiveness of GPU acceleration?
OLCF-3 early science codes, performance on Titan: Cray XK7 vs. Cray XE6 (performance ratio*)
• LAMMPS (molecular dynamics): 7.4
• S3D (turbulent combustion): 2.2
• Denovo (3D neutron transport for nuclear reactors): 3.8
• WL-LSMS (statistical mechanics of magnetic materials): 3.8
Titan: Cray XK7 (Kepler GPU plus AMD 16-core Opteron CPU). Cray XE6: 2x AMD 16-core Opteron CPUs.
*Performance depends strongly on the specific problem size chosen.

Additional applications from community efforts
Current performance measurements on Titan: Cray XK7 vs. Cray XE6 (performance ratio*)
• AWP-ODC (seismology): 2.1
• DCA++ (condensed matter physics): 4.4
• QMCPACK (electronic structure): 2.0
• RMG (DFT, real-space multigrid; electronic structure): 2.0
• XGC1 (plasma physics for fusion energy R&D): 1.8
Titan: Cray XK7 (Kepler GPU plus AMD 16-core Opteron CPU). Cray XE6: 2x AMD 16-core Opteron CPUs.
*Performance depends strongly on the specific problem size chosen.
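One concrete instance of the library path named in the CAAR plan above (WL-LSMS depending on an accelerated ZGEMM) is offloading the double-complex matrix multiply to cuBLAS. The sketch below is ours, with placeholder sizes and data, not code from WL-LSMS:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cuComplex.h>
#include <cublas_v2.h>

// Offload C = alpha*A*B + beta*C (double complex) to the GPU, the kind of
// ZGEMM call that dominates WL-LSMS runtime.
int main() {
    const int n = 512;                                   // placeholder dimension
    const size_t bytes = (size_t)n * n * sizeof(cuDoubleComplex);

    std::vector<cuDoubleComplex> hA(n * n, make_cuDoubleComplex(1.0, 0.0));
    std::vector<cuDoubleComplex> hB(n * n, make_cuDoubleComplex(0.0, 1.0));
    std::vector<cuDoubleComplex> hC(n * n, make_cuDoubleComplex(0.0, 0.0));

    cuDoubleComplex *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);  cudaMalloc(&dB, bytes);  cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dC, hC.data(), bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    cuDoubleComplex alpha = make_cuDoubleComplex(1.0, 0.0);
    cuDoubleComplex beta  = make_cuDoubleComplex(0.0, 0.0);

    // ZGEMM on the device: column-major storage, no transposes.
    cublasZgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    printf("C[0] = (%f, %f)\n", cuCreal(hC[0]), cuCimag(hC[0]));

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

Built with nvcc and linked against -lcublas; swapping this call for a tuned or custom ZGEMM is exactly the kind of comparison (CULA vs. MAGMA vs. custom) described in the plan.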
Recommended publications
  • ORNL Debuts Titan Supercomputer
    Reporter (ORNL Retiree Newsletter), December 2012/January 2013, Science. ORNL debuts Titan supercomputer. ORNL has completed the installation of Titan, a supercomputer capable of churning through more than 20,000 trillion calculations each second (20 petaflops) by employing a family of processors called graphics processing units, first created for computer gaming. Titan will be 10 times more powerful than ORNL's last world-leading system, Jaguar, while overcoming power and space limitations inherent in the previous generation of high-performance computers. ORNL is now home to Titan, the world's most powerful supercomputer for open science, with a theoretical peak performance exceeding 20 petaflops (quadrillion calculations per second). (Image: Jason Richards) Titan, which is supported by the DOE, will provide unprecedented computing power for research in energy, climate change, efficient engines, materials and other disciplines, and pave the way for a wide range of achievements in science and technology. "Titan will provide unprecedented computing power for research in energy, climate change, materials and other disciplines to enable scientific leadership." The Cray XK7 system contains 18,688 nodes, with each holding a 16-core AMD Opteron 6274 processor and an NVIDIA Tesla K20 graphics processing unit (GPU) accelerator. Titan also has more than 700 terabytes of memory. The combination of central processing units, the traditional foundation of high-performance computers, and more recent GPUs will allow Titan to occupy the same space as its Jaguar predecessor while using only marginally more electricity. "One challenge in supercomputers today is power consumption," said Jeff Nichols, associate laboratory director for computing and computational sciences.
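    A rough consistency check of the performance figures (our arithmetic, using commonly quoted per-device peaks of about 1.31 TF for the K20X GPU and about 0.14 TF for the 16-core Opteron 6274, numbers not stated in the article):

```latex
18{,}688 \times \left(\underbrace{1.31\ \text{TF}}_{\text{GPU}}
                    + \underbrace{0.14\ \text{TF}}_{\text{CPU}}\right)
  \approx 27\ \text{PF (peak)}
```

    This matches the 27 PF peak and 17.59 PF Linpack figures quoted earlier in this document; the article's "20 petaflops" is the round number used in the announcement.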
  • Petaflops for the People
    Petaflops for the People. Thousands of researchers have used facilities of the Advanced Scientific Computing Research (ASCR) program and its Department of Energy (DOE) computing predecessors over the past four decades. Their studies of hurricanes, earthquakes, green-energy technologies and many other basic and applied science problems have, in turn, benefited millions of people. They owe it mainly to the capacity provided by the National Energy Research Scientific Computing Center (NERSC), the Oak Ridge Leadership Computing Facility (OLCF) and the Argonne Leadership Computing Facility (ALCF). These ASCR installations have helped train the advanced scientific workforce of the future. Postdoctoral scientists, graduate students and early-career researchers have worked there, learning to configure the world's most sophisticated supercomputers for their own various and wide-ranging projects. Cutting-edge supercomputing, once the purview of a small group of experts, has trickled down to the benefit of thousands of investigators in the broader scientific community. Today, NERSC, at Lawrence Berkeley National Laboratory;

    SPOTLIGHT: NERSC, EXTREME-WEATHER NUMBER-CRUNCHING. Certain problems lend themselves to solution by computers. Take hurricanes, for instance: they're too big, too dangerous and perhaps too expensive to understand fully without a supercomputer. Using decades of global climate data in a grid comprised of 25-kilometer squares, researchers in Berkeley Lab's Computational Research Division captured the formation of hurricanes and typhoons and the extreme waves that they generate. Those same models, when run at resolutions of about 100 kilometers, missed the tropical cyclones and resulting waves, up to 30 meters high. Their findings, published in Geophysical Research Letters, demonstrated the importance of running climate models at higher resolution.
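    The resolution gap described above is more expensive than it may look. A standard scaling argument (ours, not from the article): going from roughly 100 km to 25 km grid squares multiplies the number of grid columns by 16 and, through the timestep stability limit, roughly quadruples the number of steps, so

```latex
\frac{\text{cost}_{25\,\text{km}}}{\text{cost}_{100\,\text{km}}}
  \approx \left(\frac{100}{25}\right)^{2} \times \frac{100}{25} = 64
```

    which is why resolving tropical cyclones routinely requires leadership-class machines.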
  • Safety and Security Challenge
    SAFETY AND SECURITY CHALLENGE: TOP SUPERCOMPUTERS IN THE WORLD, FEATURING TWO OF DOE'S!! Summary: The U.S. Department of Energy (DOE) plays a very special role in keeping you safe. DOE has two supercomputers in the top ten supercomputers in the whole world. Titan is the name of the supercomputer at the Oak Ridge National Laboratory (ORNL) in Oak Ridge, Tennessee. Sequoia is the name of the supercomputer at Lawrence Livermore National Laboratory (LLNL) in Livermore, California. How do supercomputers keep us safe and what makes them in the Top Ten in the world? In fields where scientists deal with issues from disaster relief to the electric grid, simulations provide real-time situational awareness to inform decisions. DOE supercomputers have helped the Federal Bureau of Investigation find criminals, and the Department of Defense assess terrorist threats. Currently, ORNL is building a computing infrastructure to help the Centers for Medicare and Medicaid Services combat fraud. An important focus lab-wide is managing the tsunamis of data generated by supercomputers and facilities like ORNL's Spallation Neutron Source. In terms of national security, ORNL plays an important role in national and global security due to its expertise in advanced materials, nuclear science, supercomputing and other scientific specialties. Discovery and innovation in these areas are essential for protecting US citizens and advancing national and global security priorities. [Image captions: Titan Supercomputer at Oak Ridge National Laboratory; Lawrence Livermore's Sequoia ranked No.] Background: ORNL is using computing to tackle national challenges such as safe nuclear energy systems and running simulations for lower costs for vehicle
  • Lessons Learned in Deploying the World's Largest Scale Lustre File System
    Lessons Learned in Deploying the World's Largest Scale Lustre File System. Galen M. Shipman, David A. Dillow, Sarp Oral, Feiyi Wang, Douglas Fuller, Jason Hill, Zhe Zhang. Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA. {gshipman,dillowda,oralhs,fwang2,fullerdj,hilljj,...}@ornl.gov

    Abstract: The Spider system at the Oak Ridge National Laboratory's Leadership Computing Facility (OLCF) is the world's largest scale Lustre parallel file system. Envisioned as a shared parallel file system capable of delivering both the bandwidth and capacity requirements of the OLCF's diverse computational environment, the project had a number of ambitious goals. To support the workloads of the OLCF's diverse computational platforms, the aggregate performance and storage capacity of Spider exceed that of our previously deployed systems by a factor of 6x (240 GB/sec) and 17x (10 petabytes), respectively. Furthermore, Spider supports over 26,000 clients concurrently accessing the file system, which exceeds our previously deployed systems by nearly 4x.

    1 Introduction: The Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory (ORNL) hosts the world's most powerful supercomputer, Jaguar [2, 14, 7], a 2.332 Petaflop/s Cray XT5 [5]. OLCF also hosts an array of other computational resources such as a 263 Teraflop/s Cray XT4 [1], visualization, and application development platforms. Each of these systems requires a reliable, high-performance and scalable file system for data storage. Parallel file systems on leadership-class systems have traditionally been tightly coupled to single simulation platforms. This approach had resulted in the deployment of a dedicated file system for each computational platform.
  • Titan: a New Leadership Computer for Science
    Titan: A New Leadership Computer for Science. Presented to the DOE Advanced Scientific Computing Advisory Committee, November 1, 2011. Arthur S. Bland, OLCF Project Director.

    Office of Science statement of mission need
    • Increase the computational resources of the Leadership Computing Facilities by 20-40 petaflops
    • INCITE program is oversubscribed
    • Programmatic requirements for leadership computing continue to grow
    • Needed to avoid an unacceptable gap between the needs of the science programs and the available resources
    • Approved: Raymond Orbach, January 9, 2009
    • The OLCF-3 project comes out of this requirement

    INCITE is 2.5 to 3.5 times oversubscribed (chart covering 2007-2012)

    What is OLCF-3?
    • The next phase of the Leadership Computing Facility program at ORNL
    • An upgrade of Jaguar from 2.3 petaflops (peak) today to between 10 and 20 PF by the end of 2012, with operations in 2013
    • Built with Cray's newest XK6 compute blades
    • When completed, the new system will be called Titan

    Cray XK6 compute node characteristics
    • AMD Opteron 6200 "Interlagos" 16-core processor @ 2.2 GHz
    • Tesla M2090 "Fermi" @ 665 GF with 6 GB GDDR5 memory
    • Host memory: 32 GB 1600 MHz DDR3
    • Gemini high-speed interconnect
    • Upgradeable to NVIDIA's next-generation "Kepler" processor in 2012
    • Four compute nodes per XK6 blade, 24 blades per rack

    ORNL's "Titan" system
    • Upgrade of existing Jaguar Cray XT5
    • Cray Linux Environment
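    Taking the XK6 node characteristics above at face value, a rough peak estimate (our arithmetic, assuming 4 double-precision FLOPs per cycle per Interlagos core and the 18,688-node count quoted elsewhere in this document, neither of which is stated in this excerpt):

```latex
16 \times 2.2\ \text{GHz} \times 4\ \tfrac{\text{FLOP}}{\text{cycle}} \approx 141\ \text{GF (CPU)},\qquad
141 + 665 \approx 806\ \text{GF per node},\qquad
18{,}688 \times 0.806\ \text{TF} \approx 15\ \text{PF}
```

    That lands inside the stated 10-20 PF target, with the further jump to roughly 27 PF coming from the planned Kepler upgrade.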
  • Musings (Rik Farrow, Opinion)
    Musings. Rik Farrow, Opinion. Rik is the editor of ;login:. [email protected]

    While preparing this issue of ;login:, I found myself falling down a rabbit hole, like Alice in Wonderland. And when I hit bottom, all I could do was look around and puzzle about what I discovered there. My adventures started with a casual comment, made by an ex-Cray Research employee, about the design of current supercomputers. He told me that today's supercomputers cannot perform some of the tasks that they are designed for, and used weather forecasting as his example. I was stunned. Could this be true? Or was I just being dragged down some fictional rabbit hole? I decided to learn more about supercomputer history.

    Supercomputers. It is humbling to learn about the early history of computer design. Things we take for granted, such as pipelining instructions and vector processing, were important inventions in the 1970s. The first supercomputers were built from discrete components (that is, transistors soldered to circuit boards) and had clock speeds in the tens of nanoseconds. To put that in real terms, the Control Data Corporation's (CDC) 7600 had a clock cycle of 27.5 ns, or in today's terms, 36.4 MHz. This was CDC's second supercomputer (the 6600 was first), but it included instruction pipelining, an invention of Seymour Cray. The CDC 7600 peaked at 36 MFLOPS, but generally got 10 MFLOPS with carefully tuned code. The other cool thing about the CDC 7600 was that it broke down at least once a day.
  • Unlocking the Full Potential of the Cray XK7 Accelerator
    Unlocking the Full Potential of the Cray XK7 Accelerator. Mark D. Klein (National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA) and John E. Stone (Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA).

    Abstract: The Cray XK7 includes NVIDIA GPUs for acceleration of computing workloads, but the standard XK7 system software inhibits the GPUs from accelerating OpenGL and related graphics-specific functions. We have changed the operating mode of the XK7 GPU firmware, developed a custom X11 stack, and worked with Cray to acquire an alternate driver package from NVIDIA in order to allow users to render and post-process their data directly on Blue Waters. Users are able to use NVIDIA's hardware OpenGL implementation, which has many features not available in software rasterizers. By eliminating the transfer of data to external visualization clusters, time-to-solution for users has been improved tremendously. In one case, XK7 OpenGL rendering has cut turnaround time

    [From the body text:] ... [2], [3], [4], and for high fidelity movie renderings with the HVR volume rendering software. In one example, the total turnaround time for an HVR movie rendering of a trillion-cell inertial confinement fusion simulation [5] was reduced from an estimate of over a month for data transfer to and rendering on a conventional visualization cluster down to just one day when rendered locally using 128 XK7 nodes on Blue Waters. The fully-graphics-enabled GPU state is currently considered an unsupported mode of operation by Cray, and to our knowledge Blue Waters is presently the only Cray system currently running in this mode.
  • This Is Your Presentation Title
    Introduction to GPU/Parallel Computing. Ioannis E. Venetis, University of Patras (www.prace-ri.eu).

    Introduction to High Performance Systems. Wait, what? Aren't we here to talk about GPUs? And how to program them with CUDA? Yes, but we need to understand their place and their purpose in modern High Performance Systems. This will make it clear when it is beneficial to use them.

    Top 500 (June 2017):
    1. Sunway TaihuLight (National Supercomputing Center in Wuxi, China; NRCPC): Sunway MPP, Sunway SW26010 260C 1.45 GHz; 10,649,600 CPU cores, no accelerator cores; Rmax 93,014.6 TFlop/s, Rpeak 125,435.9 TFlop/s, 15,371 kW
    2. Tianhe-2 (MilkyWay-2) (National Super Computer Center in Guangzhou, China; NUDT): TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200 GHz, TH Express-2, Intel Xeon Phi 31S1P; 3,120,000 CPU cores, 2,736,000 accelerator cores; Rmax 33,862.7 TFlop/s, Rpeak 54,902.4 TFlop/s, 17,808 kW
    3. Piz Daint (Swiss National Supercomputing Centre (CSCS); Cray Inc.): Cray XC50, Xeon E5-2690v3 12C 2.6 GHz, Aries interconnect, NVIDIA Tesla P100; 361,760 CPU cores, 297,920 accelerator cores; Rmax 19,590.0 TFlop/s, Rpeak 25,326.3 TFlop/s, 2,272 kW
    4. Titan (DOE/SC/Oak Ridge National Laboratory, United States; Cray Inc.): Cray XK7, Opteron 6274 16C 2.200 GHz, Cray Gemini interconnect, NVIDIA K20x; 560,640 CPU cores, 261,632 accelerator cores; Rmax 17,590.0 TFlop/s, Rpeak 27,112.5 TFlop/s, 8,209 kW
    5. Sequoia (DOE/NNSA/LLNL, United States; IBM): BlueGene/Q, Power BQC 16C 1.60 GHz, custom interconnect; 1,572,864 CPU cores, no accelerator cores; Rmax 17,173.2 TFlop/s, Rpeak 20,132.7 TFlop/s, 7,890 kW

    How do
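    Since the excerpt asks how GPUs are programmed with CUDA, here is a minimal sketch (ours, not taken from the course slides) that queries the devices a CUDA program would target; the printed fields are a small subset of cudaDeviceProp:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print a few properties of each CUDA device visible to this process.
int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("CUDA devices found: %d\n", count);

    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: %s\n", d, prop.name);
        printf("  Streaming multiprocessors: %d\n", prop.multiProcessorCount);
        printf("  Clock rate:                %.2f GHz\n", prop.clockRate / 1.0e6);
        printf("  Global memory:             %.1f GB\n",
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        printf("  Memory bus width:          %d bits\n", prop.memoryBusWidth);
    }
    return 0;
}
```

    On an accelerated node such as Titan's XK7, this reports the single Kepler device that kernels are launched on; on a CPU-only node it simply reports zero devices.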
  • Jaguar Supercomputer
    Jaguar Supercomputer. Jake Baskin '10, Jānis Lībeks '10. Jan 27, 2010.

    What is it? Currently the fastest supercomputer in the world, at up to 2.33 PFLOPS, located at Oak Ridge National Laboratory (ORNL). Leader in "petascale scientific supercomputing". Uses massively parallel simulations. Modeling: climate, supernovas, volcanoes, cellulose. (Image: http://www.nccs.gov/wp-content/themes/nightfall/img/jaguarXT5/gallery/jaguar-1.jpg)

    Overview: processor specifics, network architecture, programming models, NCCS networking, Spider file system, scalability.

    The guts
    • 84 XT4 and 200 XT5 cabinets
    • XT5: 18,688 compute nodes, 256 service and I/O nodes
    • XT4: 7,832 compute nodes, 116 service and I/O nodes

    Compute nodes (XT5)
    • 2 Opteron 2435 "Istanbul" (6-core) processors per node
    • 64 KB L1 instruction cache and 64 KB L1 data cache per core
    • 512 KB L2 cache per core
    • 6 MB L3 cache per processor (shared)
    • 8 GB of DDR2-800 RAM directly attached to each processor by an integrated memory controller
    (Source: http://www.cray.com/Assets/PDF/products/xt/CrayXT5Brochure.pdf)

    How are they organized?
    • 3-D torus topology
    • XT5 and XT4 segments are connected by an InfiniBand DDR network
    • 889 GB/sec bisectional bandwidth

    Programming models. Jaguar supports these programming models: MPI (Message Passing Interface), OpenMP (Open Multi-Processing), SHMEM (SHared MEMory access library), PGAS (Partitioned Global Address Space).

    NCCS networking. Jaguar usually performs computations on large datasets. These datasets have to be transferred to ORNL. Jaguar is connected to ESnet (Energy Sciences Network, scientific institutions) and Internet2 (higher education institutions). ORNL owns its own optical network that allows 10 Gb/s to various locations around the US.
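    Jaguar itself is a CPU-only system, but the MPI model listed above carries over directly to its GPU-accelerated successor Titan, where each rank also binds to a GPU. The hybrid MPI + CUDA skeleton below is a sketch of ours under that assumption; the reduction payload is a placeholder, not a real workload:

```cuda
#include <cstdio>
#include <mpi.h>
#include <cuda_runtime.h>

// Hybrid MPI + CUDA skeleton: each rank selects a GPU, would do local GPU work,
// and results are combined across ranks with MPI.
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Bind this rank to a GPU. On Titan's XK7 nodes there is one GPU per node,
    // so ranks sharing a node round-robin over whatever devices are present.
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev > 0) cudaSetDevice(rank % ndev);

    // Placeholder for GPU work: here just a per-rank partial value.
    double local = 1.0 * rank, total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks = %d, devices seen by rank 0 = %d, sum = %g\n",
               size, ndev, total);

    MPI_Finalize();
    return 0;
}
```

    The same skeleton also accommodates the OpenMP model listed above, with threads inside each rank driving the CPU cores while the GPU handles the offloaded kernels.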
  • Hardware Complexity and Software Challenges
    Trends in HPC: hardware complexity and software challenges. Mike Giles, Oxford University Mathematical Institute and Oxford e-Research Centre. ACCU talk, Oxford, June 30, 2015.

    Question: how many cores in my Dell M3800 laptop? 1-10? 10-100? 100-1000? Answer: 4 cores in the Intel Core i7 CPU + 384 cores in the NVIDIA K1100M GPU. Peak power consumption: 45 W for the CPU + 45 W for the GPU.

    Top500 supercomputers: really impressive, 300x more capability in 10 years!

    My personal experience: 1982: CDC Cyber 205 (NASA Langley); 1985-89: Alliant FX/8 (MIT); 1989-92: Stellar (MIT); 1990: Thinking Machines CM5 (UTRC); 1987-92: Cray X-MP/Y-MP (NASA Ames, Rolls-Royce); 1993-97: IBM SP2 (Oxford); 1998-2002: SGI Origin (Oxford); 2002-today: various x86 clusters (Oxford); 2007-today: various GPU systems/clusters (Oxford); 2011-15: GPU cluster (Emerald @ RAL); 2008-today: Cray XE6/XC30 (HECToR/ARCHER @ EPCC); 2013-today: Cray XK7 with GPUs (Titan @ ORNL).

    Top500 supercomputers: power requirements are raising the question of whether this rate of improvement is sustainable.
  • EVGA Geforce GTX TITAN X Superclocked
    EVGA GeForce GTX TITAN X Superclocked. Part Number: 12G-P4-2992-KR. The EVGA GeForce GTX TITAN X combines the technologies and performance of the new NVIDIA Maxwell architecture in the fastest and most advanced graphics card on the planet. This incredible GPU delivers unrivaled graphics, acoustic, thermal and power-efficient performance. The most demanding enthusiast can now experience extreme resolutions up to 4K and beyond. Enjoy hyper-realistic, real-time lighting with advanced NVIDIA VXGI, as well as NVIDIA G-SYNC display technology for smooth, tear-free gaming. Plus, you get DSR technology that delivers a brilliant 4K experience, even on a 1080p display.

    Specifications: base clock 1127 MHz; boost clock 1216 MHz; memory clock 7010 MHz effective; CUDA cores 3072; bus type PCI-E 3.0; memory detail 12288 MB GDDR5; memory bit width 384 bit; memory speed 0.28 ns; memory bandwidth 336.5 GB/s. Dimensions: height 4.376 in (111.15 mm), length 10.5 in (266.7 mm), width dual slot.

    Key features: NVIDIA Dynamic Super Resolution technology, NVIDIA MFAA technology, NVIDIA GameWorks technology, NVIDIA GameStream technology, NVIDIA G-SYNC ready, Microsoft DirectX 12, NVIDIA GPU Boost 2.0, NVIDIA Adaptive Vertical Sync, NVIDIA Surround technology, NVIDIA SLI ready, NVIDIA CUDA technology, OpenGL 4.4 support, OpenCL support, HDMI 2.0, DisplayPort 1.2 and dual-link DVI, PCI Express 3.0.

    Resolution and refresh: max monitors supported 4; 240 Hz max refresh rate; max analog 2048x1536; max digital 4096x2160.

    Requirements: 600 watt or greater power supply.**** PCI Express, PCI Express 2.0 or PCI Express 3.0 compliant motherboard with one graphics slot. An available 6-pin PCI-E power connector and an available 8-pin PCI-E power connector. Windows 8 32/64-bit, Windows 7 32/64-bit, Windows Vista 32/64-bit.

    **Support for HDMI includes GPU-accelerated Blu-ray 3D support (Blu-ray 3D playback requires the purchase of a compatible software player from CyberLink, ArcSoft, Corel, or Sonic), x.v.Color, HDMI Deep Color, and 7.1 digital surround sound.
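    The memory bandwidth line follows directly from the listed memory clock and bus width (our arithmetic, shown as a consistency check):

```latex
7010\ \text{MT/s} \times \frac{384\ \text{bit}}{8\ \text{bit/byte}}
  = 7010 \times 48\ \text{MB/s} \approx 336.5\ \text{GB/s}
```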
  • A Comprehensive Performance Comparison of Dedicated and Embedded GPU Systems
    DUJE (Dicle University Journal of Engineering) 11:3 (2020), pages 1011-1020. Research Article. A Comprehensive Performance Comparison of Dedicated and Embedded GPU Systems. Adnan Ozsoy, Department of Computer Engineering, Hacettepe University, [email protected] (ORCID: https://orcid.org/0000-0002-0302-3721). Article history: received 26 May 2020; received in revised form 29 June 2020; accepted 9 July 2020; available online 30 September 2020. Keywords: NVIDIA Jetson, embedded GPGPU, CUDA.

    Abstract: General purpose usage of graphics processing units (GPGPU) is becoming increasingly important as graphics processing units (GPUs) get more powerful and see widespread usage in performance-oriented computing. GPGPUs are mainstream performance hardware in workstation and cluster environments, and their behavior in such setups is highly analyzed. Recently, NVIDIA, the leading hardware and software vendor in GPGPU computing, started to produce more energy-efficient embedded GPGPU systems, the Jetson series GPUs, to make GPGPU computing more applicable in domains where energy and space are limited. Although the architecture of the GPUs in Jetson systems is the same as in traditional dedicated desktop graphics cards, the interaction between the GPU and the other components of the system, such as main memory, central processing unit (CPU), and hard disk, is a lot different than in traditional desktop solutions. To fully understand the capabilities of the Jetson series embedded solutions, in this paper we run several applications from many different domains and compare the performance characteristics of these applications on both embedded and dedicated desktop GPUs. After analyzing the collected data, we have identified certain application domains and program behaviors in which the Jetson series can deliver performance comparable to dedicated GPU performance.