Supercomputing's Exaflop Target


Technology | DOI:10.1145/1978542.1978549 | Tom Geller

The twin challenges of parallelism and energy consumption are enlivening supercomputers' progress.

[Photo caption: The Top500's leading supercomputer, the K Computer, is more powerful than its five closest competitors combined.]

Asia has come out swinging at the top of June 2011's Top500 list, which rates the world's fastest computers based on the LINPACK benchmark. Leading the list is the K Computer, which achieved 8.2 quadrillion floating-point operations per second (petaflops) to give Japan its first appearance in the much-coveted number-one position since November 2004. It knocks China's Tianhe-1A, at 2.6 petaflops, to second place. The U.S.'s Jaguar (1.75 petaflops) was pushed from second to third place. China's Nebulae (1.27 petaflops) dropped from third to fourth, and Japan's Tsubame 2.0 (1.19 petaflops) slipped from fourth to the fifth position.

Asia's success comes at a time when national governments are reconsidering the value of these ratings—and what defines supercomputing success. The U.S. President's Council of Advisors on Science and Technology (PCAST) recently warned that a focus on such rankings could "crowd out … fundamental research in computer science and engineering." Even Jack Dongarra, the list's founder and a computer science professor at the University of Tennessee, believes the rankings need to be seen in a larger context. "You can't judge a car solely by how many RPMs its engine can do," he says, pointing to the Graph500 list and HPC Challenge Benchmark as other sources of comparative supercomputing data.

Computer scientists are also skeptical about the value of such rankings. Petaflops are not the same as useful work; practical applications need to both preserve and take advantage of their power.
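For context on where the headline numbers come from: a Top500 entry is obtained by timing a High-Performance LINPACK (HPL) run and dividing a conventional operation count, 2/3·n³ + 2·n² for a dense n-by-n solve, by the elapsed time. The sketch below only illustrates that conversion; the problem size and runtime are invented round numbers, not figures from any machine on the list.

```c
/* Minimal sketch: turning an HPL-style run into a "petaflops" figure.
 * The 2/3*n^3 + 2*n^2 operation count is the standard HPL convention;
 * the problem size and wall-clock time below are made-up illustrative
 * values, not measurements from any real system. */
#include <stdio.h>

int main(void) {
    double n = 10000000.0;          /* hypothetical matrix order */
    double seconds = 28.0 * 3600.0; /* hypothetical wall-clock time */

    double flop_count = (2.0 / 3.0) * n * n * n + 2.0 * n * n;
    double flops = flop_count / seconds;   /* sustained rate, flop/s */

    printf("operations: %.3e\n", flop_count);
    printf("sustained:  %.2f petaflops\n", flops / 1.0e15);
    return 0;
}
```

The same arithmetic also shows why the ranking says little about "useful work": any code that does fewer floating-point operations per byte of data touched than a dense solve will sustain a far smaller fraction of the machine's peak.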
Xuebin Chi, director of the Supercomputing Center at the Chinese Academy of Sciences, points out that "programming [for the CPU/GPU combinations now popular in supercomputing] is more difficult than for commodity CPUs," and that "changing from sequential to parallel code is not easy." He predicts that such a transition would take three to five years. But even with superb programming, real-world aspects of data delivery and error correction could significantly reduce application speeds from those reported by the Top500.

Regardless, the November 2010 Top500 list's release, with China's Tianhe-1A in first place, spurred political discussions about national commitment to high performance computing (HPC) throughout the world. In the U.S., 12 senators cited Tianhe-1A in a letter to President Obama warning that "the race is on" to develop supercomputers capable of 1,000 petaflops (1 exaflop). In asking for funding, they wrote that "Our global competitors in Asia and Europe are already at work on exascale computing technology … we cannot afford to risk our leadership position in computational sciences."

David Kahaner, founding director of the Asian Technology Information Program, believes that Tianhe-1A is the leading edge of a Chinese push to not only increase supercomputing speeds, but also domesticate production. "It represents a real commitment from the Chinese government to develop supercomputing and the infrastructure to support it," he says. "A Chinese domestic HPC ecosystem is evolving. Domestic components are being developed and incorporated, and their use is likely to increase." Dongarra agrees, noting, "The rate at which they're doing that is something we've not seen before with other countries."

CPUs vs. CPU/GPU Hybrids

If the June Top500 list had been a foot race, the K Computer would have lapped the competition. At 8.2 petaflops, it wields more power than the next five supercomputers combined. The K Computer's name alludes to the Japanese word "kei" for 10 quadrillion, and represents the researchers' desired performance goal of 10 petaflops.

Top500 List, June 2011

Rank  Computer      Nation
1     K Computer    Japan
2     Tianhe-1A     China
3     Jaguar        U.S.
4     Nebulae       China
5     TSUBAME 2.0   Japan
6     Cielo         U.S.
7     Pleiades      U.S.
8     Hopper        U.S.
9     Tera-100      France
10    Roadrunner    U.S.

Source: http://www.top500.org

Aside from national aspirations, the Top500 list reveals technical trends in HPC research worldwide. Most notable is an increased use of general-purpose graphics processing units (GPUs) in a hybrid configuration with CPUs. The June 2011 list includes 19 supercomputers that use GPU technology; the June 2010 list contained just 10.

GPUs contain many more cores than CPUs, allowing them to perform a larger number of calculations in parallel. While originally used for graphics tasks, such as rendering every pixel in an image, GPUs are increasingly applied to a wide variety of data-intensive calculations. "If you peek a little bit further into graphics problems, they look a lot like supercomputing problems," says Sumit Gupta, manager of Tesla Products at NVIDIA. "Modeling graphics is the same as modeling molecule movement in a chemical process."

In today's supercomputers, GPUs provide the brute calculation power, but rely heavily on CPUs for other tasks. For example, the number-two Tianhe-1A contains two six-core Intel Xeon X5670 CPUs for each 448-core Tesla M2050 GPU (14,336 to 7,168); it also contains a much smaller number of eight-core Chinese-built Feiteng CPUs (2,048). Altogether, GPUs in Tianhe-1A contribute approximately three million cores—30 times as many as are in its CPUs.

But speed is not simply a matter of throwing more cores into the mix, as it is not easy to extract all of their processing power. First, data must be queued and managed to feed them—and to put the results together when they come out. "You need the CPU to drive the GPU," Chi explains. "If your problem can't be fit into the GPU itself, data will need to move frequently between the two, hurting performance." Dongarra corroborates this, saying, "The speed of moving data to the GPU and the speed of computing it once it's there are so mismatched that the GPU must do many computations with it before you see benefits."
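The mismatch Dongarra describes can be made concrete with a back-of-the-envelope estimate. The sketch below assumes nominal, not measured, figures: roughly 515 Gflop/s double-precision peak for a Tesla M2050 and about 8 GB/s theoretical bandwidth for a PCIe 2.0 x16 link. It estimates how much arithmetic each byte shipped to the GPU must feed before the transfer time is amortized.

```c
/* Back-of-the-envelope look at the CPU-to-GPU data-movement mismatch.
 * Figures are nominal peaks used as assumptions, not measurements:
 * ~515 Gflop/s double precision for a Tesla M2050, ~8 GB/s for a
 * PCIe 2.0 x16 link. */
#include <stdio.h>

int main(void) {
    double gpu_peak_flops = 515e9; /* assumed M2050 double-precision peak, flop/s */
    double pcie_bw_bytes  = 8e9;   /* assumed PCIe 2.0 x16 bandwidth, bytes/s */

    /* Flops the GPU could retire in the time it takes to move one byte in. */
    double flops_per_byte = gpu_peak_flops / pcie_bw_bytes;

    /* Example: shipping 1 GB of input over the link. */
    double bytes = 1e9;
    double transfer_s = bytes / pcie_bw_bytes;

    printf("break-even intensity: ~%.0f flops per transferred byte\n", flops_per_byte);
    printf("moving 1 GB takes ~%.3f s; the GPU must perform ~%.1e flops on it\n",
           transfer_s, flops_per_byte * bytes);
    printf("just to match the time spent on the transfer.\n");
    return 0;
}
```

With these assumptions the break-even point is on the order of 60 floating-point operations per transferred byte, which is why problems that cannot stay resident on the GPU see little benefit from its raw speed.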
The K Computer, though, bucks this trend. Unlike Tianhe-1A and other recent large supercomputers, it does not utilize GPUs or accelerators. The K Computer uses 68,544 SPARC64 VIIIfx CPUs, each with eight cores, for a total of 548,352 cores. And the Japanese engineers plan to boost the K Computer's power by increasing the number of its circuit board-filled cabinets from 672 to 800 in the near future.

ACM-ICPC World Finals

Students from Zhejiang University won the 2011 ACM International Collegiate Programming Contest World Finals in Orlando, FL, in May, beating student teams from 104 universities in the IBM-sponsored competition. En route to its championship, the Zhejiang University students used IBM's large-scale analytics and cloud computing technologies to solve eight out of 11 programming problems in five hours.

The only other team to solve eight problems, the University of Michigan, won a gold medal for finishing second. The Michigan team's success was due to several factors, says coach Kevin Compton, associate professor in the school's electrical engineering and computer science department. "First, they are very talented programmers and problem solvers," he says. "A couple of them began entering programming competitions in high school." In addition, between the time the regional contests were held in October 2010 and the World Finals, the Michigan team practiced intensely.

The Michigan team also created an efficient strategy for solving problems, with team members having clearly defined roles, Compton says. "Each member had a fairly specialized role, either as a coder or debugger and tester," he says. "By following this strategy, they were able to avoid getting bogged down on a particular problem."

Tsinghua University and St. Petersburg University finished in third and fourth place, respectively, winning gold medals. The other top 12 finalists, each of which received a silver medal, were Nizhny Novgorod State University (5th), Saratov State University (6th), Friedrich-Alexander-University Erlangen-Nuremberg (7th), Donetsk National University (8th), Jagiellonian University in Krakow (9th), Moscow State University (10th), Ural State University (11th), and University of Waterloo (12th).

—Bob Violino

Asia's Ascension

Asian researchers appear to be well positioned to exploit GPUs for massively parallel supercomputing. Kahaner believes China's relative isolation from Western influences may have led to economics that favor such innovations. "They're not so tightly connected with U.S. vendors who have their own perception of things," he says. "Potential bang for the buck is very strong in Asia, especially in places like China or India, which are very price-sensitive markets."

[…] especially when compared with traditional expectations. "Look at Moore's law," he says. "Computers will get about 100 times faster in 10 years. But going from petascale to exascale in 10 years is a multiple of a thousand." Having said that, he notes that it has been done before—twice. "We went from gigaflops in 1990 to teraflops in about 10 years, and […]
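The scaling comparison quoted above is easy to check with a little arithmetic. The snippet below, assuming nothing beyond the figures in the text, converts both growth rates into equivalent doubling times.

```c
/* Quick check of the scaling comparison in the text: a 100x speedup per
 * decade implies a doubling time of about 10/log2(100) ~= 1.5 years,
 * while going from petascale (1e15 flop/s) to exascale (1e18 flop/s) in
 * the same decade requires 1,000x, i.e. roughly a 1-year doubling time. */
#include <math.h>
#include <stdio.h>

int main(void) {
    double years = 10.0;
    double moore_factor = 100.0;     /* "about 100 times faster in 10 years" */
    double exa_factor = 1e18 / 1e15; /* petascale to exascale = 1000x */

    printf("doubling time at  100x per decade: %.2f years\n",
           years / (log(moore_factor) / log(2.0)));
    printf("doubling time at 1000x per decade: %.2f years\n",
           years / (log(exa_factor) / log(2.0)));
    return 0;
}
```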