LINPACK Benchmark

Center for Information Services and High Performance Computing (ZIH)
Performance Analysis of Computer Systems
Benchmarks: TOP 500, Stream, and HPCC

Nöthnitzer Straße 46, Raum 1026, Tel. +49 351 - 463 - 35048
Holger Brunst ([email protected])
Matthias S. Mueller ([email protected])


Summary of Previous Lecture

Different workloads:
– Test workload
– Real workload
– Synthetic workload

Historical examples of test workloads:
– Addition instruction
– Instruction mixes
– Kernels
– Synthetic programs
– Application benchmarks


Excursion on Speedup and Efficiency Metrics

These metrics compare sequential and parallel algorithms.

Speedup: S_n = T_1 / T_n
– n is the number of processors
– T_1 is the execution time of the sequential algorithm
– T_n is the execution time of the parallel algorithm on n processors

Efficiency: E_p = S_p / p
– It estimates how well the p processors are utilized in solving a given problem
– Usually between zero and one. Exception: super-linear speedup (discussed later)


Amdahl's Law

Amdahl's law gives the maximum expected improvement of an overall system when only part of the system is improved. With serial fraction s and parallelizable fraction p:
– Serial execution time = s + p
– Parallel execution time = s + p/n
– Speedup: S_n = (s + p) / (s + p/n)
– Normalizing with respect to the serial time (s + p = 1) gives S_n = 1 / (s + p/n)
– Speedup drops off rapidly as the serial fraction increases
– The maximum possible speedup is 1/s, independent of the number of processors n!

Bad news: if an application has only 1% serial work (s = 0.01), you will never see a speedup greater than 100. So why do we build systems with more than 100 processors? What is wrong with this argument?
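As a quick numerical check of the 1/s bound, here is a minimal C sketch (not part of the original slides) that evaluates the normalized form S_n = 1 / (s + (1 - s)/n) for s = 0.01 and growing processor counts:

#include <stdio.h>

/* Amdahl's law: speedup on n processors for serial fraction s,
 * with the total serial execution time normalized to 1. */
static double amdahl_speedup(double s, int n)
{
    return 1.0 / (s + (1.0 - s) / n);
}

int main(void)
{
    const double s = 0.01;                            /* 1% serial work */
    const int procs[] = { 1, 10, 100, 1000, 10000 };
    const int nprocs = sizeof(procs) / sizeof(procs[0]);

    for (int i = 0; i < nprocs; i++)
        printf("n = %5d  ->  speedup = %7.2f\n",
               procs[i], amdahl_speedup(s, procs[i]));

    printf("limit for n -> infinity: %.2f\n", 1.0 / s);
    return 0;
}

For s = 0.01 the speedup saturates just below 100 no matter how many processors are added, which is exactly the 1/s ceiling quoted above.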
Popular and Historic Benchmarks

Popular benchmarks:
– Eratosthenes sieve algorithm
– Ackermann's function
– Whetstone
– LINPACK
– Dhrystone
– Lawrence Livermore Loops
– TPC-C
– SPEC


Workload Description

Level of detail of the workload description - examples:
– Most frequent request (e.g. addition)
– Frequency of request types (instruction mix)
– Time-stamped sequence of requests
– Average resource demand (e.g. 20 I/O requests per second)
– Distribution of resource demands (not only the average, but the full probability distribution)


Characterization of Benchmarks

There are many metrics, and each one has its purpose. Ranging from computer hardware to applications:
– Raw machine performance: Tflop/s
– Microbenchmarks: Stream
– Algorithmic benchmarks: Linpack
– Compact apps/kernels: NAS benchmarks
– Application suites: SPEC
– User-specific applications: custom benchmarks


Comparison of Different Benchmark Classes

Class         Coverage   Relevance   Identify problems   Time evolution
Micro         0          0           ++                  +
Algorithmic   -          0           +                   ++
Kernels       0          0           +                   +
SPEC          +          +           +                   +
Apps          -          ++          0                   0


SPEC Benchmarks: CPU 2006

Application benchmarks with different metrics:
– Integer, floating point
– Standard and rate
– Base, peak
Run rules govern how results are produced and reported.


Stream

Stream Benchmark
– Author: John McCalpin ("Mr Bandwidth")
– John McCalpin, "Memory Bandwidth and Machine Balance in High Performance Computers", IEEE TCCA Newsletter, December 1995
– http://www.cs.virginia.edu/stream

STREAM measures memory bandwidth with the operations:
– Copy:  a(i) = b(i)
– Scale: a(i) = s*b(i)
– Add:   a(i) = b(i) + c(i)
– Triad: a(i) = b(i) + s*c(i)

STREAM2 measures memory hierarchy bandwidth with the operations:
– Fill:  a(i) = 0
– Copy:  a(i) = b(i)
– Daxpy: a(i) = a(i) + q*b(i)
– Sum:   sum += a(i)
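To make the Triad operation above concrete, the following is a minimal C sketch of a triad bandwidth measurement in the spirit of STREAM. The array length, repetition count, and timer are illustrative assumptions, not the official benchmark code; the reference implementation is available at the URL above.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N      (20 * 1000 * 1000)   /* array length; must be much larger than the caches */
#define NTIMES 10                   /* take the best of several repetitions */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    const double s = 3.0;
    if (!a || !b || !c) return 1;

    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double best = 1e30;
    for (int k = 0; k < NTIMES; k++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < N; i++)            /* Triad: a(i) = b(i) + s*c(i) */
            a[i] = b[i] + s * c[i];
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double dt = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        if (dt < best) best = dt;
    }

    /* Triad touches three arrays per iteration: two loads (b, c) and one store (a). */
    double bytes = 3.0 * sizeof(double) * (double)N;
    printf("Triad: %.1f MB/s (1 MB = 10^6 B)\n", bytes / best / 1e6);
    printf("checksum: %f\n", a[0] + a[N - 1]);  /* keep the compiler from dropping the loop */

    free(a); free(b); free(c);
    return 0;
}

The benchmark reports the best rate over the repetitions in MB/s with 1 MB = 10^6 bytes, which is the unit used in the result table below.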
Stream 2 Properties

[Slide content (properties of the STREAM2 kernels) not preserved in this extraction.]


Stream Results: TOP 10

STREAM Memory Bandwidth --- John D. McCalpin, [email protected]
Revised Tue Jul 25 10:10:14 CST 2006
All results are in MB/s --- 1 MB = 10^6 B, *not* 2^20 B

Machine ID                   ncpus       COPY      SCALE        ADD      TRIAD
SGI_Altix_4700                1024  3661963.0  3677482.0  4385585.0  4350166.0
SGI_Altix_3000                 512   906388.0   870211.0  1055179.0  1119913.0
NEC_SX-7                        32   876174.7   865144.1   869179.2   872259.1
NEC_SX-5-16A                    16   607492.0   590390.0   607412.0   583069.0
NEC_SX-4                        32   434784.0   432886.0   437358.0   436954.0
HP_AlphaServer_GS1280-1300      64   407351.0   400142.0   437010.0   431450.0
Cray_T932_321024-3E             32   310721.0   302182.0   359841.0   359270.0
NEC_SX-6                         8   202627.2   192306.2   190231.3   213024.3
IBM_System_p5_595               64   186137.0   179639.0   200410.0   206243.0
HP_Integrity_SuperDome         128   154504.0   152999.0   169468.0   170833.0


Stream 2 Results

[Figure: DAXPY bandwidth, a(i) = b(i) + alpha*c(i), plotted against log_2(loop length) from 0 to 20 for NEC_Azusa_Intel_Itanium (efc) and Pentium4_1400MHz (ifc).]


Linpack and TOP500
(Slides courtesy of Jack Dongarra)

LINPACK Benchmark

The Linpack Benchmark is a measure of a computer's floating-point rate of execution. It is determined by running a computer program that solves a dense system of linear equations. Over the years the characteristics of the benchmark have changed a bit; in fact, there are three benchmarks included in the Linpack Benchmark report.

– Dense linear system solved with LU factorization using partial pivoting
– Operation count: 2/3 n^3 + O(n^2)
– Benchmark measure: MFlop/s
– The original benchmark measures the execution rate of a Fortran program on a matrix of size 100x100.


Output From the Linpack 100 Benchmark

When the Linpack Fortran n = 100 benchmark is run, it produces the following kind of results:

Please send the results of this run to:

Jack J. Dongarra
Computer Science Department
University of Tennessee
Knoxville, Tennessee 37996-1300
Fax: 865-974-8296
Internet: [email protected]

norm. resid      resid           machep          x(1)            x(n)
1.67005097E+00   7.41628980E-14  2.22044605E-16  1.00000000E+00  1.00000000E+00

times are reported for matrices of order 100
   dgefa      dgesl      total      mflops     unit       ratio
times for array with leading dimension of 201
   1.540E-03  6.888E-05  1.609E-03  4.268E+02  4.686E-03  2.873E-02
   1.509E-03  7.084E-05  1.579E-03  4.348E+02  4.600E-03  2.820E-02
   1.509E-03  7.003E-05  1.579E-03  4.348E+02  4.600E-03  2.820E-02
   1.502E-03  6.593E-05  1.568E-03  4.380E+02  4.567E-03  2.800E-02
times for array with leading dimension of 200
   1.431E-03  6.716E-05  1.498E-03  4.584E+02  4.363E-03  2.675E-02
   1.424E-03  6.694E-05  1.491E-03  4.605E+02  4.343E-03  2.663E-02
   1.431E-03  6.699E-05  1.498E-03  4.583E+02  4.364E-03  2.676E-02
   1.432E-03  6.439E-05  1.497E-03  4.588E+02  4.360E-03  2.673E-02

(dgefa = factorization time, dgesl = solve time, total = total time, mflops = Mflop/s rate.)


Linpack Benchmark Over Time

In the beginning there was the Linpack 100 Benchmark (1977):
– n = 100 (80 KB); a size that would fit in all the machines
– Fortran; 64-bit floating-point arithmetic
– No hand optimization (only compiler options)

Linpack Benchmark: computer at the top of the list over time, for n = 100:

Year  Computer                                     Processors  Cycle time  Mflop/s
2006  Intel Pentium Woodcrest (3 GHz)                       1  3 GHz          3018
2005  NEC SX-8/1 (1 proc)                                   1  2 GHz          2177
2004  Intel Pentium Nocona (1 proc, 3.6 GHz)                1  3.6 GHz        1803
2003  HP Integrity Server rx2600 (1 proc, 1.5 GHz)          1  1.5 GHz        1635
2002  Intel Pentium 4 (3.06 GHz)                            1  3.06 GHz       1414
2001  Fujitsu VPP5000/1                                     1  3.33 nsec      1156
2000  Fujitsu VPP5000/1                                     1  3.33 nsec      1156
1999  CRAY T916                                             4  2.2 nsec       1129
1995  CRAY T916                                             1  2.2 nsec        522
1994  CRAY C90                                             16  4.2 nsec        479
1993  CRAY C90                                             16  4.2 nsec        479
1992  CRAY C90                                             16  4.2 nsec        479
1991  CRAY C90                                             16  4.2 nsec        403
1990  CRAY Y-MP                                             8  6.0 nsec        275
1989  CRAY Y-MP                                             8  6.0 nsec        275
1988  CRAY Y-MP                                             1  6.0 nsec         74
1987  ETA 10-E                                              1  10.5 nsec        52
1986  NEC SX-2                                              1  6.0 nsec         46
1985  NEC SX-2                                              1  6.0 nsec         46
1984  CRAY X-MP                                             1  9.5 nsec         21
1983  CRAY 1                                                1  12.5 nsec        12
1979  CRAY 1                                                1  12.5 nsec       3.4

Linpack 1000 (1986):
– n = 1000 (8 MB); wanted to see higher performance levels
– Any language; 64-bit floating-point arithmetic
– Hand optimization OK

Linpack TPP (1991) (Top500; 1993):
– Any size (n as large as you can; n = 10^6 needs 8 TB and takes roughly 6 hours)
– Any language; 64-bit floating-point arithmetic
– Hand optimization OK
– Strassen's method is not allowed (it confuses the operation count and rate)
– A reference implementation is available

In all cases the result is verified by checking that the scaled residual
   ||Ax - b|| / (||A|| ||x|| n eps) = O(1).
The operation count is fixed at 2/3 n^3 for the factorization and 2 n^2 for the solve.


What is LINPACK NxN?

LINPACK NxN benchmark:
– Solves a system of linear equations by some method
– Allows the vendors to choose the problem size for the benchmark
– Measures the execution time for each problem size

LINPACK NxN report:
– Nmax: the size of the chosen problem run on a machine
– Rmax: the performance in Gflop/s for the chosen problem size
– N1/2: the size at which half of the Rmax execution rate is achieved
– Rpeak: the theoretical peak performance in Gflop/s for the machine

[Figure: achieved rate versus problem size N, reaching Rmax at Nmax and half of Rmax at N1/2.]

LINPACK NxN is used to rank the TOP500 fastest computers in the world.
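As a worked example tying these numbers together, here is a short C sketch (not from the slides; the problem size and elapsed time are invented for illustration) that converts a measured solve time into the reported Gflop/s figure using the fixed operation count 2/3 n^3 + 2 n^2, and evaluates the scaled residual used for verification:

#include <stdio.h>

/* Gflop/s as reported by the Linpack NxN/TOP500 benchmark: the operation
 * count is fixed at 2/3*n^3 (factorization) + 2*n^2 (solve). */
static double linpack_gflops(double n, double seconds)
{
    double flops = (2.0 / 3.0) * n * n * n + 2.0 * n * n;
    return flops / seconds / 1e9;
}

/* Scaled residual ||Ax - b|| / (||A|| * ||x|| * n * eps);
 * the run is accepted when this value is O(1). */
static double scaled_residual(double resid, double norm_a,
                              double norm_x, double n, double eps)
{
    return resid / (norm_a * norm_x * n * eps);
}

int main(void)
{
    /* Hypothetical run: n = 100,000 solved in 500 seconds. */
    double n = 1.0e5, t = 500.0;
    printf("Rate: %.1f Gflop/s\n", linpack_gflops(n, t));

    /* Residual check with the machep and resid values from the Linpack 100
     * output above; the norms ||A|| and ||x|| are set to 1 here only for
     * illustration (the real benchmark computes them from the data). */
    double eps = 2.22044605e-16;
    printf("Scaled residual: %.2f\n",
           scaled_residual(7.41628980e-14, 1.0, 1.0, 100.0, eps));
    return 0;
}

Under these assumptions the sketch reports about 1333 Gflop/s; Rmax in the TOP500 list is obtained in exactly this way, from Nmax and the measured run time.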