Trends in High-Performance Computing


Novel Architectures. Editors: Volodymyr Kindratenko, [email protected]; Pedro Trancoso, [email protected]

Trends in High-Performance Computing
By Volodymyr Kindratenko and Pedro Trancoso

HPC system architectures are shifting from the traditional clusters of homogeneous nodes to clusters of heterogeneous nodes and accelerators.

We can infer the future of high-performance computing (HPC) from the technologies developed today to showcase the leadership-class compute systems—that is, the supercomputers. These machines are usually designed to achieve the highest possible performance in terms of the number of 64-bit floating-point operations per second (flops). Their architecture has evolved from early custom design systems to the current clusters of commodity multisocket, multicore systems. Twice a year, the supercomputing community ranks the systems and produces the Top-500 list (www.top500.org), which shows the world's 500 highest performing machines. As we now describe, the technologies used in the top-ranked machines give a good indication of the architecture trends.

The Top-500

The November 2010 Top-500 list of the world's most powerful supercomputers stressed two noticeable developments in HPC: the advent of HPC clusters based on graphical processing units (GPUs) as the dominant petascale architecture, and the rise of China as a dominant player in the supercomputing arena.

To build more powerful machines, system architects are moving away from traditional clusters of homogeneous nodes to clusters of heterogeneous nodes (CPU+GPU). As the sidebar "Architecture of Current Petaflops Systems" describes, three out of seven systems that achieved over one quadrillion flops (petaflops) on the standard benchmark tool Linpack (www.top500.org/project/linpack) are Nvidia GPU-based, and two out of the three largest systems are deployed in China.

The number one system on the November 2010 Top-500 list is Tianhe-1A, deployed at the National Supercomputing Center in Tianjin. It achieves about 2.57 petaflops on Linpack with a theoretical peak performance of 4.7 petaflops. The number three system is Nebulae, deployed at the National Supercomputing Centre in Shenzhen. It achieves 1.27 petaflops on the Linpack benchmark with a theoretical peak of almost 3 petaflops. The highest performing US system, Jaguar, deployed at the Oak Ridge National Laboratory, is number two on the Top-500 list, achieving 1.75 petaflops with theoretical performance of 2.3 petaflops. The number four system on the November 2010 Top-500 list is the Japan-built Tsubame 2.0 system, which is a GPU-based system. Europe's most powerful system is the Tera-100, which is ranked at number six and deployed at the French Atomic and Alternative Energies Commission.

The Tianhe-1A is composed of 7,168 nodes—each containing two Intel Xeon X5670 hex-core (Westmere) processors and one Nvidia Tesla M2050 GPU. The Jaguar is a more traditional Cray XT system consisting of 18,688 compute nodes containing dual hex-core AMD Opteron 2435 (Istanbul) processors. The highest-performing GPU-based US-built system, deployed at Lawrence Livermore National Laboratory, is number 72 on the Top-500 list. We've yet to see any substantially larger GPU-based HPC systems deployed in the US. Instead, we see many research centers deploying small and mid-range GPU systems, such as the US National Science Foundation (NSF) Track 2D Keeneland system, which is in its initial deployment phase at the Georgia Institute of Technology (number 117 on the Top-500 list), or the Forge GPU HPC cluster currently being developed at the US National Center for Supercomputing Applications at the University of Illinois.

The Keeneland project in particular is interesting in that a part of the effort is devoted to developing the software infrastructure necessary to utilize the GPUs in an HPC environment. It's also NSF's first system funded under the Track 2 experimental/innovative design program.

The Chinese HPC community's approach to designing high-end systems is certainly worth noting. In terms of raw performance, adding a GPU to a conventional HPC cluster node can quadruple its peak performance, or even increase it by an order of magnitude when using 32-bit arithmetic. But the increased peak performance doesn't necessarily translate into sustained application performance. As an example, take Tianhe-1A, with sustained 2.57 petaflops on Linpack and theoretical peak performance of 4.7 petaflops. Its efficiency in terms of sustained versus peak performance is 2.57/4.7 = 0.55. Jaguar, on the other hand, achieves 1.75 petaflops on Linpack with theoretical performance of 2.3 petaflops. Thus, its efficiency is 1.75/2.3 = 0.76, or 38 percent above Tianhe-1A. The difference is even more pronounced when considering real scientific workloads.

Also, software availability for large-scale GPU-based systems is limited. While numerous applications have been ported to GPU-based systems over the past two years, many of the widely used scientific supercomputing codes that have been developed over the past 20 years have yet to be rewritten for GPUs. Many such applications have outlived several generations of computer architectures; it would be irrational to scrap all the prior work each time a new and drastically different architecture is presented, even if its performance is significantly higher. Not surprisingly, we have yet to see any applications that can take advantage of Tianhe-1A's computing power.

The Green and Graph Lists

As the "Metrics and Benchmarks" sidebar describes, organizations use various metrics and applications to evaluate computing systems. In the past, supercomputers were developed for the single highest performance goal (as measured by Linpack), but current system development is driven by two additional factors: power consumption and complex applications. Traditional approaches to achieving performance have reached an unbearable power consumption cost. As such, the community has proposed a second list—the Green-500 list (www.green500.org)—that ranks systems according to their performance/power metric. The Green-500 list sorts systems from the Top-500 list by their power efficiency; it also shows the trend toward heterogeneous [...]

Sidebar: Architecture of Current Petaflops Systems

The November 2010 Top-500 list included seven petaflops systems:

1. Tianhe-1A, deployed at the National Supercomputing Center in Tianjin, China, achieves about 2.57 petaflops on Linpack with a theoretical peak performance of 4.7 petaflops. Tianhe-1A is composed of 7,168 nodes, each containing two Intel Xeon EM64 X5670 six-core (Westmere) processors and one Nvidia Tesla M2050 GPU. The system uses a proprietary interconnect.

2. Jaguar, deployed at the Oak Ridge National Laboratory, US, achieves 1.75 petaflops with theoretical performance of 2.3 petaflops. Jaguar is a Cray XT system consisting of 18,688 compute nodes containing dual six-core AMD Opteron 2435 (Istanbul) processors, or 224,256 processor cores total. The system uses a proprietary interconnect.

3. Nebulae, deployed at the National Supercomputing Centre in Shenzhen, China, achieves 1.27 petaflops on the Linpack benchmark with a theoretical peak of nearly 3 petaflops. Nebulae is composed of about 4,700 compute nodes containing Intel EM64 Xeon six-core X5650 (Westmere) processors, or 120,640 processor cores total, and Nvidia Tesla C2050 GPUs. The system uses a Quad Data Rate InfiniBand interconnect.

4. Tsubame 2.0, deployed at the Tokyo Institute of Technology, Japan, achieves 1.19 petaflops on the Linpack benchmark with a theoretical peak of about 2.3 petaflops. It consists of 1,442 compute nodes containing (mostly) six-core Intel Xeon X5670 (Westmere) CPUs, or 73,278 processor cores in total, and three Nvidia Tesla M2050 GPUs per node. The system uses a Quad Data Rate InfiniBand interconnect.

5. Hopper, deployed at the US National Energy Research Scientific Computing Center, achieves just over a petaflops on Linpack, with a theoretical peak of almost 1.3 petaflops. Hopper is a Cray XE6 system consisting of 6,384 nodes containing 12-core AMD Opteron processors, or 153,216 processor cores in total. The system uses a custom interconnect.

6. Tera-100, deployed at the Alternative Energies and Atomic Energy Commission, France, achieves 1.05 petaflops on Linpack with a theoretical peak of 1.25 petaflops. The system consists of 4,300 bullx S series server nodes based on eight-core Intel EM64 Xeon 7500 (Nehalem) processors, or 138,368 processor cores total. The system uses a Quad Data Rate InfiniBand interconnect.

7. Roadrunner, deployed at Los Alamos National Laboratory, US, was the first supercomputer in the world to achieve over one petaflops. It currently stands at 1.04 petaflops on Linpack, with a theoretical peak of 1.38 petaflops. Its main workforce is a nine-core IBM PowerXCell 8i experimental chip. The system consists of 12,960 PowerXCell 8is and 6,480 dual-core AMD Opterons, or 122,400 processor cores total, and uses a Voltaire InfiniBand interconnect.

Sidebar: Metrics and Benchmarks (excerpt)

Supercomputer systems are evaluated according to [...] power consumption for executing the benchmark on the system. It thus uses flops per watt (total power consumption for the program's execution).
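To make the efficiency arithmetic above concrete, here is a minimal sketch (in C, not from the article) that computes the sustained-versus-peak ratio from the Rmax/Rpeak figures quoted in the column; the "38 percent" gap falls out as the ratio of the two efficiencies.

```c
/* Sketch: Linpack efficiency (sustained Rmax / theoretical Rpeak), using the
 * petaflops figures quoted in the article. Illustrative only. */
#include <stdio.h>

struct hpc_system { const char *name; double rmax_pf; double rpeak_pf; };

int main(void) {
    struct hpc_system s[] = {
        { "Tianhe-1A", 2.57, 4.7 },   /* sustained vs. peak, from the article */
        { "Jaguar",    1.75, 2.3 },
    };
    double eff[2];
    for (int i = 0; i < 2; i++) {
        eff[i] = s[i].rmax_pf / s[i].rpeak_pf;          /* 0.55 and 0.76 */
        printf("%-10s efficiency = %.2f\n", s[i].name, eff[i]);
    }
    /* Jaguar relative to Tianhe-1A: ~1.4, the roughly 38 percent gap in the text */
    printf("Jaguar / Tianhe-1A = %.2f\n", eff[1] / eff[0]);
    return 0;
}
```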
Recommended publications
  • Lessons Learned in Deploying the World's Largest Scale Lustre File System
Lessons Learned in Deploying the World's Largest Scale Lustre File System
Galen M. Shipman, David A. Dillow, Sarp Oral, Feiyi Wang, Douglas Fuller, Jason Hill, Zhe Zhang
Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
{gshipman,dillowda,oralhs,fwang2,fullerdj,hilljj,[email protected]

Abstract
The Spider system at the Oak Ridge National Laboratory's Leadership Computing Facility (OLCF) is the world's largest scale Lustre parallel file system. Envisioned as a shared parallel file system capable of delivering both the bandwidth and capacity requirements of the OLCF's diverse computational environment, the project had a number of ambitious goals. To support the workloads of the OLCF's diverse computational platforms, the aggregate performance and storage capacity of Spider exceed that of our previously deployed systems by a factor of 6x - 240 GB/sec, and 17x - 10 Petabytes, respectively. Furthermore, Spider supports over 26,000 clients concurrently accessing the file system, which exceeds our previously deployed systems by nearly 4x.

1 Introduction
The Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory (ORNL) hosts the world's most powerful supercomputer, Jaguar [2, 14, 7], a 2.332 Petaflop/s Cray XT5 [5]. OLCF also hosts an array of other computational resources such as a 263 Teraflop/s Cray XT4 [1], visualization, and application development platforms. Each of these systems requires a reliable, high-performance and scalable file system for data storage. Parallel file systems on leadership-class systems have traditionally been tightly coupled to single simulation platforms. This approach had resulted in the deployment of a dedicated file system for each computational [...]
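As a quick back-of-the-envelope check (mine, not part of the paper), the 6x bandwidth, 17x capacity, and roughly 4x client-count factors quoted in the abstract imply the following scale for the previously deployed file systems:

```c
/* Sketch: implied scale of the pre-Spider file systems, derived only from the
 * improvement factors quoted in the abstract. Illustrative arithmetic. */
#include <stdio.h>

int main(void) {
    double spider_bw_gbs  = 240.0;    /* GB/s, from the abstract */
    double spider_cap_pb  = 10.0;     /* PB */
    double spider_clients = 26000.0;

    printf("previous bandwidth ~ %.0f GB/s\n", spider_bw_gbs / 6.0);    /* ~40   */
    printf("previous capacity  ~ %.1f PB\n",   spider_cap_pb / 17.0);   /* ~0.6  */
    printf("previous clients   ~ %.0f\n",      spider_clients / 4.0);   /* ~6500 */
    return 0;
}
```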
  • Titan: A New Leadership Computer for Science
Titan: A New Leadership Computer for Science
Presented to the DOE Advanced Scientific Computing Advisory Committee, November 1, 2011. Arthur S. Bland, OLCF Project Director, Office of Science.

Statement of Mission Need
• Increase the computational resources of the Leadership Computing Facilities by 20-40 petaflops.
• The INCITE program is oversubscribed (2.5 to 3.5 times over 2007-2012), and programmatic requirements for leadership computing continue to grow.
• Needed to avoid an unacceptable gap between the needs of the science programs and the available resources.
• Approved by Raymond Orbach, January 9, 2009; the OLCF-3 project comes out of this requirement.

What is OLCF-3
• The next phase of the Leadership Computing Facility program at ORNL.
• An upgrade of Jaguar from 2.3 petaflops (peak) today to between 10 and 20 PF by the end of 2012, with operations in 2013.
• Built with Cray's newest XK6 compute blades. When completed, the new system will be called Titan.

Cray XK6 compute node characteristics
• AMD Opteron 6200 "Interlagos" 16-core processor @ 2.2 GHz.
• Tesla M2090 "Fermi" @ 665 GF with 6 GB GDDR5 memory.
• Host memory: 32 GB 1600 MHz DDR3.
• Gemini high-speed interconnect; upgradeable to NVIDIA's next-generation "Kepler" processor in 2012.
• Four compute nodes per XK6 blade, 24 blades per rack.

ORNL's "Titan" system
• Upgrade of the existing Jaguar Cray XT5.
• Cray Linux Environment [...]
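A rough sanity check on the 10-20 PF target (my sketch, not from the slides): the 665 GF GPU figure is given above, but the ~140.8 GF double-precision peak used here for the 16-core 2.2 GHz Interlagos socket is an outside assumption.

```c
/* Sketch: how many XK6 nodes a 10-20 PF peak implies, using the 665 GF GPU
 * figure from the slides and an ASSUMED ~140.8 GF CPU peak per node
 * (16 cores x 2.2 GHz x 4 DP flops/cycle -- not stated in the slides). */
#include <stdio.h>

int main(void) {
    double gpu_tf  = 0.665;                  /* Tesla M2090, from the slides */
    double cpu_tf  = 16 * 2.2 * 4 / 1000.0;  /* assumption: ~0.1408 TF */
    double node_tf = gpu_tf + cpu_tf;

    printf("per-node peak   ~ %.3f TF\n", node_tf);
    printf("nodes for 10 PF ~ %.0f\n", 10000.0 / node_tf);
    printf("nodes for 20 PF ~ %.0f\n", 20000.0 / node_tf);
    /* Jaguar's 18,688 nodes at this rate would sit near 15 PF, inside the
     * 10-20 PF range quoted for the upgrade. */
    printf("18,688 nodes    ~ %.1f PF\n", 18688 * node_tf / 1000.0);
    return 0;
}
```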
  • Musings (Rik Farrow, Opinion)
Musings
Rik Farrow, Opinion. Rik is the editor of ;login:. [email protected]

While preparing this issue of ;login:, I found myself falling down a rabbit hole, like Alice in Wonderland. And when I hit bottom, all I could do was look around and puzzle about what I discovered there. My adventures started with a casual comment, made by an ex-Cray Research employee, about the design of current supercomputers. He told me that today's supercomputers cannot perform some of the tasks that they are designed for, and used weather forecasting as his example. I was stunned. Could this be true? Or was I just being dragged down some fictional rabbit hole? I decided to learn more about supercomputer history.

Supercomputers
It is humbling to learn about the early history of computer design. Things we take for granted, such as pipelining instructions and vector processing, were important inventions in the 1970s. The first supercomputers were built from discrete components—that is, transistors soldered to circuit boards—and had clock speeds in the tens of nanoseconds. To put that in real terms, the Control Data Corporation's (CDC) 7600 had a clock cycle of 27.5 ns, or in today's terms, 36.4 MHz. This was CDC's second supercomputer (the 6600 was first), but included instruction pipelining, an invention of Seymour Cray. The CDC 7600 peaked at 36 MFLOPS, but generally got 10 MFLOPS with carefully tuned code. The other cool thing about the CDC 7600 was that it broke down at least once a day.
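The cycle-time-to-frequency conversion in that passage is simple arithmetic; a tiny sketch (mine, not Farrow's) makes the numbers explicit. Treating the 36 MFLOPS peak as roughly one floating-point result per cycle is an inference from the figures above, not a statement in the column.

```c
/* Sketch: CDC 7600 clock arithmetic from the column's figures.
 * 27.5 ns cycle -> ~36.4 MHz; 10 MFLOPS sustained vs ~36 MFLOPS peak. */
#include <stdio.h>

int main(void) {
    double cycle_ns = 27.5;
    double mhz = 1000.0 / cycle_ns;        /* period in ns -> frequency in MHz */
    printf("clock ~ %.1f MHz\n", mhz);     /* ~36.4 MHz */

    double peak_mflops = 36.0, typical_mflops = 10.0;
    printf("sustained/peak ~ %.0f%%\n", 100.0 * typical_mflops / peak_mflops);
    return 0;
}
```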
  • Germany's Top500 Businesses
Germany's Top500: Digital Business Models in Search of Business

Contents: Foreword; Insight 1: Slow growth rates yet high sales; Insight 2: Not enough revenue is attributable to digitization; Insight 3: EU regulations are weakening innovation; Insight 4: The German federal government could turn into an innovation driver; Conclusion.

Foreword
Large German companies are on the lookout. Their purpose: to find new growth prospects. While revenue increases of more than 5 percent on average have not been uncommon for Germany's 500 largest companies in the past, that level of growth has not occurred for the last four years.

The reasons are obvious. With their high export rates, Germany's industrial companies continue to be major players in the global market. Their exports have, in fact, been so high in the past that it is now increasingly difficult to sustain their previous rates of growth.

Accenture regularly examines Germany's largest companies on the basis of their ranking in "Germany's Top500," a list published every year in the German daily newspaper DIE WELT. These 500 most successful German companies generate revenue of more than one billion Euros. In previous years, they were the engines of the German economy.

This study is intended to examine critically the opportunities arising at the beginning of a new era of technology. Accenture uses four insights to not only describe the progress that has been made in digital markets, but also suggest possible steps companies can take to overcome weak growth.

The four insights in detail: Insight 1: Despite high levels of sales, growth among Germany's [...]
  • Jaguar Supercomputer
Jaguar Supercomputer
Jake Baskin '10, Jānis Lībeks '10. Jan 27, 2010.

What is it? Currently the fastest supercomputer in the world, at up to 2.33 PFLOPS, located at Oak Ridge National Laboratory (ORNL). Leader in "petascale scientific supercomputing". Uses massively parallel simulations. Modeling: climate, supernovas, volcanoes, cellulose.

Overview: processor specifics, network architecture, programming models, NCCS networking, the Spider file system, and scalability.

The guts: 84 XT4 and 200 XT5 cabinets. XT5: 18,688 compute nodes and 256 service and I/O nodes. XT4: 7,832 compute nodes and 116 service and I/O nodes.

Compute nodes (XT5): two Opteron 2435 Istanbul (6-core) processors per node; 64 KB L1 instruction cache and 64 KB L1 data cache per core; 512 KB L2 cache per core; 6 MB shared L3 cache per processor; 8 GB of DDR2-800 RAM directly attached to each processor by an integrated memory controller.

How are they organized? A 3-D torus topology; the XT5 and XT4 segments are connected by an InfiniBand DDR network with 889 GB/sec bisectional bandwidth.

Programming models: Jaguar supports MPI (Message Passing Interface), OpenMP (Open Multi-Processing), SHMEM (SHared MEMory access library), and PGAS (Partitioned Global Address Space); a minimal hybrid MPI/OpenMP example is sketched after this excerpt.

NCCS networking: Jaguar usually performs computations on large datasets, which have to be transferred to ORNL. Jaguar is connected to ESnet (Energy Sciences Network, scientific institutions) and Internet2 (higher education institutions). ORNL owns its own optical network that allows 10 Gb/s to various locations around the US.

(Images and node specifications: http://www.nccs.gov/wp-content/themes/nightfall/img/jaguarXT5/gallery/jaguar-1.jpg and http://www.cray.com/Assets/PDF/products/xt/CrayXT5Brochure.pdf)
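To illustrate the first two programming models in that list (a generic sketch, not code from the presentation), here is a minimal hybrid MPI + OpenMP program of the kind typically launched with one MPI rank per node or socket and OpenMP threads across the cores:

```c
/* Sketch: minimal hybrid MPI + OpenMP "hello" in the style used on Cray XT
 * systems. Build with e.g. "mpicc -fopenmp hello.c" (toolchain-dependent). */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);                 /* one MPI process per node/socket */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    #pragma omp parallel                    /* OpenMP threads across the cores */
    {
        printf("rank %d/%d, thread %d/%d\n",
               rank, size, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```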
  • Performance Evaluation of the Intel Sandy Bridge Based NASA Pleiades Using Scientific and Engineering Applications
Performance Evaluation of the Intel Sandy Bridge Based NASA Pleiades Using Scientific and Engineering Applications
Subhash Saini, Johnny Chang, Haoqiang Jin
NASA Advanced Supercomputing Division, NASA Ames Research Center, Moffett Field, California 94035-1000, USA
{subhash.saini, johnny.chang, haoqiang.jin}@nasa.gov

Abstract: We present a performance evaluation of Pleiades based on the Intel Xeon E5-2670 processor, a fourth-generation eight-core Sandy Bridge architecture, and compare it with the previous third-generation Nehalem architecture. Several architectural features have been incorporated in Sandy Bridge: (a) four memory channels as opposed to three in Nehalem; (b) memory speed increased from 1333 MHz to 1600 MHz; (c) a ring to connect the on-chip L3 cache with cores, system agent, memory controller, and QPI agent and I/O controller to increase the scalability; (d) a new AVX unit with wider vector registers of 256 bit; (e) integration of PCI-Express 3.0 controllers into the I/O subsystem on chip; (f) new Turbo Boost version 2.0 where the base frequency of the processor increased from 2.6 to 3.2 GHz; and (g) QPI link [...]

[From the introduction:] ... based on the Many Integrated Core (code-named Knight's Corner) architecture and Yellowstone [1, 5, 6]. New and extended features of the Sandy Bridge architecture are:
a) A ring to connect the on-chip L3 cache with cores, system agent, memory controller, and QPI agent and I/O controller to increase the scalability. L3 cache per core has been increased from 2 MB to 2.5 MB.
b) A new micro-ops (L0) cache that caches instructions as they are decoded. The cache is direct mapped and can store 1.5 K micro-ops.
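Items (a) and (b) translate directly into theoretical per-socket memory bandwidth; the sketch below is mine, not the paper's, and assumes the standard 8 bytes per transfer of a 64-bit DDR3 channel.

```c
/* Sketch: theoretical per-socket DDR3 bandwidth = channels x MT/s x 8 bytes.
 * Channel counts and speeds are from the abstract; 8 B per transfer is the
 * usual 64-bit DDR3 channel width (an assumption, not stated in the paper). */
#include <stdio.h>

static double bw_gbs(int channels, double mts) {
    return channels * mts * 8.0 / 1000.0;   /* GB/s, decimal units */
}

int main(void) {
    printf("Nehalem      : %.1f GB/s\n", bw_gbs(3, 1333.0));  /* ~32.0 */
    printf("Sandy Bridge : %.1f GB/s\n", bw_gbs(4, 1600.0));  /* ~51.2 */
    return 0;
}
```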
  • President Lands at Moffett Field During Silicon Valley Visit (by Huong Nguyen and Jessica Culler)
Fall 2011 - A Quarterly Publication

Thousands turn out to tour airborne observatory at Ames
The Stratospheric Observatory for Infrared Astronomy (SOFIA) visited NASA Ames and provided a rare opportunity to tour the airborne observatory in October. News media and Ames employees were invited to tour on Friday, Oct. 14 and the public on Saturday, Oct. 15. On Friday, an estimated 2,500 people, including representatives from CNET, Fox News, KQED (PBS), New Scientist, Space.com, Mountain View Patch and the San Mateo Daily Journal, attended the event. On Saturday, an estimated 5,500 people visited Ames to tour SOFIA. See page 6 for a feature about the SOFIA visit. (NASA photo by Dominic Hart)

President lands at Moffett Field during Silicon Valley visit
By Huong Nguyen and Jessica Culler
President Barack Obama's arrival onboard Air Force One on Sunday, Sept. 25, 2011, at Moffett Federal Airfield marked his first landing at NASA Ames. Center Director Pete Worden met President Barack Obama upon his arrival. "I had the honor to meet President Obama when he arrived at Moffett Federal Airfield," said Worden, who, along with San Jose Mayor Chuck Reed and Mountain View Mayor Jac Siegel, greeted the president. "It was fitting that the president came to Silicon Valley to talk about his job creation plan, given how critical Silicon Valley is to the future of the U.S. economy," Worden said. NASA Ames is an integral component of the world-renowned innovation economy and actively participates in the valley's technological and scientific evolution. (Photo credit: Official White House Photo by Pete Souza)
  • Towards Exaflop Supercomputers (PowerPoint presentation)
Towards Exaflop Supercomputers
Prof. Mateo Valero, Director of BSC, Barcelona. (Presented in Cartagena, Colombia, May 18-20.)

National University of Defense Technology (NUDT) Tianhe-1A
• Hybrid architecture. Main node: two Intel Xeon X5670 2.93 GHz 6-core Westmere processors (12 MB cache) and one Nvidia Tesla M2050 448-ALU (14 SIMD units) 1150 MHz Fermi GPU; 32 GB memory per node. In addition, 2,048 Galaxy "FT-1000" 1 GHz 8-core processors.
• Number of nodes and cores: 7,168 main nodes * (2 sockets * 6 CPU cores + 14 SIMD units) = 186,368 cores (not including the 16,384 Galaxy cores).
• Peak performance (DP): 7,168 nodes * (11.72 GFLOPS per core * 6 CPU cores * 2 sockets + 36.8 GFLOPS per SIMD unit * 14 SIMD units per GPU) = 4,701.61 TFLOPS.
• Linpack performance: 2.507 PF (53% efficiency).
• Power consumption: 4.04 MWatt.
Source: http://blog.zorinaq.com/?e=36

Top 10 (Rmax and Rpeak in Gflop/s):
1. Tianjin, China; Xeon X5670 + NVIDIA; 186,368 procs; Rmax 2,566,000; Rpeak 4,701,000
2. Oak Ridge Nat. Lab.; Cray XT5, 6 cores; 224,162 procs; Rmax 1,759,000; Rpeak 2,331,000
3. Shenzhen, China; Xeon X5670 + NVIDIA; 120,640 procs; Rmax 1,271,000; Rpeak 2,984,300
4. GSIC Center, Tokyo; Xeon X5670 + NVIDIA; 73,278 procs; Rmax 1,192,000; Rpeak 2,287,630
5. DOE/SC/LBNL/NERSC; Cray XE6, 12 cores; 153,408 procs; Rmax 1,054,000; Rpeak 1,288,630
6. Commissariat a l'Energie Atomique (CEA); Bull bullx super-node S6010/S6030; 138,368 procs; Rmax 1,050,000; Rpeak 1,254,550
7. DOE/NNSA/LANL; QS22/LS21 cluster, PowerXCell 8i / Opteron, Infiniband; 122,400 procs; Rmax 1,042,000; Rpeak 1,375,780
8. National Institute for Computational Sciences / University of Tennessee; Cray XT5-HE, 6 cores; 98,928 procs; Rmax 831,700; Rpeak 1,028,850
9. Forschungszentrum Juelich (FZJ); Blue Gene/P Solution; 294,912 procs; Rmax 825,500; Rpeak 825,500
10. DOE/NNSA/LANL/SNL; Cray XE6, 8-core; 107,152 procs; Rmax 816,600; Rpeak 1,028,660

Looking at the Gordon Bell Prize
• 1 GFlop/s; 1988; Cray Y-MP; 8 processors. Static finite element analysis.
• 1 TFlop/s; 1998; Cray T3E; 1024 processors. Modeling of metallic magnet atoms, using a variation of the locally self-consistent multiple scattering method.
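The peak-performance formula in the slide reduces to a few multiplications; this sketch (mine, not from the presentation) reproduces the roughly 4,700 TFLOPS peak (small rounding differences aside), the quoted 53% Linpack efficiency, and the flops-per-watt ratio implied by the 4.04 MW figure.

```c
/* Sketch: Tianhe-1A peak, Linpack efficiency and energy efficiency, computed
 * from the numbers quoted in the slides. Illustrative only. */
#include <stdio.h>

int main(void) {
    double nodes = 7168.0;
    double cpu_gf_core = 11.72, cores = 6.0, sockets = 2.0;   /* per node */
    double gpu_gf_simd = 36.8,  simd_units = 14.0;            /* per GPU  */

    double node_gf = cpu_gf_core * cores * sockets + gpu_gf_simd * simd_units;
    double peak_tf = nodes * node_gf / 1000.0;                /* ~4701 TF */

    double linpack_tf = 2507.0;                               /* 2.507 PF */
    double power_mw   = 4.04;                                 /* MWatt    */

    printf("peak        ~ %.1f TF\n", peak_tf);
    printf("efficiency  ~ %.0f%%\n", 100.0 * linpack_tf / peak_tf);   /* ~53 percent */
    printf("energy eff. ~ %.0f MFLOPS/W\n",
           linpack_tf * 1e6 / (power_mw * 1e6));   /* TF->MFLOPS over MW->W: ~620 */
    return 0;
}
```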
  • Financing the Future of Supercomputing: How to Increase Investment in High Performance Computing in Europe
Innovation Finance Advisory Studies
Financing the future of supercomputing: How to increase investment in high performance computing in Europe

Prepared for: DG Research and Innovation and DG Connect, European Commission.
By: Innovation Finance Advisory, European Investment Bank Advisory Services.
Authors: Björn-Sören Gigler, Alberto Casorati and Arnold Verbeek. Supervisor: Shiva Dustdar.
Contact: [email protected]
Consultancy support: Roland Berger and Fraunhofer SCAI.
© European Investment Bank, 2018. All rights reserved. All questions on rights and licensing should be addressed to [email protected]

Foreword
"Disruptive technologies are key enablers for economic growth and competitiveness"

The Digital Economy is developing rapidly worldwide. It is the single most important driver of innovation, competitiveness and growth. Digital innovations such as supercomputing are an essential driver of innovation and spur the adoption of digital innovations across multiple industries and small and medium-sized enterprises, fostering economic growth and competitiveness. Applying the power of supercomputing combined with Artificial Intelligence and the use of Big Data provides unprecedented opportunities for transforming businesses, public services and societies.

High Performance Computers (HPC), also known as supercomputers, are making a difference in the everyday life of citizens by helping to address the critical societal challenges of our times, such as public health, climate change and natural disasters. For instance, the use of supercomputers can help researchers and entrepreneurs to solve complex issues, such as developing new treatments based on personalised medicine, or better predicting and managing the effects of natural disasters through the use of advanced computer simulations.
  • The Outer Limits
Welcome to the outer limits
Budapest, March 19, 2013. M.Sc. Jiří Hlaváč, HPC consultant and sales manager for CEE, [email protected]. (©2012 Silicon Graphics International Corp.; presented only under non-disclosure agreement.)

Jiří Hlaváč: 51 years old, 4 children. MSc. Computers (1986). Development of PC OSs for Tesla Czech (1986-1989). Own SW company (1986-1991). Owner of the SGI distributor in Czechoslovakia (1991-1995). Employee at the SGI Czech office (1995-now): Technical Director, Academic Sales, Enterprise Sales; HPC Consultant (2001-2011); Sales Manager for Central + East Europe (2005-now).

SGI = Experts @ HPC: structural mechanics (implicit and explicit), computational fluid dynamics, electro-magnetics, computational chemistry (quantum mechanics, molecular dynamics), computational biology, seismic processing, reservoir simulation, rendering / ray tracing, climate / weather, ocean simulation, data analytics.

SGI = Focus on every detail (here: power consumption).

SGI = Frontier @ Research: SGI won the HPCwire Readers' Choice Award (Nov 2012) for "Top Supercomputing Achievement" for its contribution to the NASA Ames Pleiades supercomputer, and the HPCwire Editor's Choice Award (Nov 2012) for "Best use of HPC in 'edge HPC' application" for Wikipedia historical mapping and exploration on UV 2000. SGI stock is growing.

Advanced energy exploration and production. Total: world's largest commercial HPC system (2.3 PF), SGI ICE.
  • Efficient Object Storage Journaling in a Distributed Parallel File System Presented by Sarp Oral
Efficient Object Storage Journaling in a Distributed Parallel File System
Presented by Sarp Oral. Sarp Oral, Feiyi Wang, David Dillow, Galen Shipman, Ross Miller, and Oleg Drokin. FAST'10, Feb 25, 2010.

A demanding computational environment:
• Jaguar XT5: 18,688 nodes, 224,256 cores, 300+ TB memory, 2.3 PFlops.
• Jaguar XT4: 7,832 nodes, 31,328 cores, 63 TB memory, 263 TFlops.
• Frost (SGI Ice): 128-node institutional cluster.
• Smoky: 80-node software development cluster.
• Lens: 30-node visualization and analysis cluster.

Spider:
• Fastest Lustre file system in the world; demonstrated bandwidth of 240 GB/s on the center-wide file system.
• Largest scale Lustre file system in the world; demonstrated stability and concurrent mounts on major OLCF systems: Jaguar XT5, Jaguar XT4, the Opteron dev cluster (Smoky), and the visualization cluster (Lens).
• Over 26,000 clients mounting the file system and performing I/O.
• General availability on Jaguar XT5, Lens, Smoky, and GridFTP servers.
• Cutting-edge resiliency at scale; demonstrated resiliency features on Jaguar XT5: DM Multipath and Lustre router failover.

Designed to support peak performance: [chart of read and write bandwidth (GB/s) over January 2010; max data rates (hourly) on 1/2 of the available storage controllers.]

Motivations for a center-wide file system:
• Building dedicated file systems for platforms does not scale [...]
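The per-node figures implicit in the system list above can be recovered with a little division (my arithmetic, not the slides'), assuming memory is spread evenly across nodes.

```c
/* Sketch: per-node core and memory figures implied by the Jaguar totals
 * listed in the slides (even distribution of memory assumed). */
#include <stdio.h>

static void per_node(const char *name, double nodes, double cores, double mem_tb) {
    printf("%s: %.0f cores/node, ~%.0f GB memory/node\n",
           name, cores / nodes, mem_tb * 1000.0 / nodes);
}

int main(void) {
    per_node("Jaguar XT5", 18688, 224256, 300.0);  /* 12 cores, ~16 GB */
    per_node("Jaguar XT4", 7832,  31328,  63.0);   /*  4 cores,  ~8 GB */
    return 0;
}
```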
  • NASA Announces a New Approach to Earth Science Data Analysis 20 April 2010
NASA Announces A New Approach To Earth Science Data Analysis
20 April 2010

(PhysOrg.com) -- The way we analyze planet Earth will never be the same, thanks to a new initiative at NASA that integrates supercomputers with global satellite observations and sophisticated models of the Earth system in an online collaborative environment. As part of its celebration of Earth Week, NASA unveiled the NASA Earth Exchange (NEX) at a "Green Earth" public forum held at the NASA Exploration Center, Moffett Field, Calif.

By making NEX available, NASA expects to better enable scientists to collaboratively conduct research and address the impacts of changes in climate and land use patterns on ecosystems. NEX will link NASA's supercomputing resources with massive Earth system data sets, and provide a collection of tools for analysis and visualization.

"Currently, it can require months for scientists to gather and analyze global-scale data sets, due to computing limitations, data storage requirements and network bandwidth constraints", said [...] than ten hours.

NEX uses a new approach for collaboration among scientists and science teams working to model the Earth system and analyze large Earth observation datasets. Using on-line collaboration technologies, NEX will bring together geographically dispersed multi-disciplinary groups of scientists focused on global change research. Scientists will be able to build custom project environments containing the datasets and software components needed to solve complex Earth science problems. These project environments, built using virtualization technology, will be highly portable and reusable and will automatically capture the entire analysis process, including the data and processing steps required to replicate the results in an open and transparent way. For example, results from the processing of the global Landsat data would be available to scientists with the additional expertise required to analyze rates of urbanization, deforestation, or biodiversity impacts.