MPI on Aurora


An Overview of Aurora, Argonne's Upcoming Exascale System
ALCF Developers Session
Colleen Bertoni, Sudheer Chunduri
www.anl.gov

AURORA: An Intel-Cray System
§ Intel/Cray machine arriving at Argonne in 2021
§ Sustained performance greater than 1 exaflops

AURORA: A High-level View
§ Hardware architecture:
  § Intel Xeon processors and Intel Xe GPUs
  § Greater than 10 PB of total memory
  § Cray Slingshot network and Shasta platform
§ IO:
  • Uses Lustre and the Distributed Asynchronous Object Store (DAOS)
  • Greater than 230 PB of storage capacity and 25 TB/s of bandwidth
§ Software (under the Intel oneAPI umbrella):
  § Intel compilers (C, C++, Fortran)
  § Programming models: DPC++, OpenMP, OpenCL
  § Libraries: oneMKL, oneDNN, oneDAL
  § Tools: VTune, Advisor
  § Python

NODE-LEVEL HARDWARE

The Evolution of Intel GPUs
[Figure: Intel GPU generations roadmap. Source: Intel]

Intel GPUs
§ Intel integrated GPUs have been used for over a decade in:
  § Laptops (e.g. MacBook Pro)
  § Desktops
  § Servers
§ Recent and upcoming integrated generations:
  § Gen 9 – used in Skylake-based nodes
  § Gen 11 – used in Ice Lake-based nodes
§ Gen 9 double-precision peak performance: 100–300 GF
  § Low by design due to power and space limits
§ The future Intel Xe (Gen 12) GPU series will provide both integrated and discrete GPUs
[Figure: layout of architecture components for an Intel Core i7-6700K desktop processor (91 W TDP, 122 mm²)]

Intel GPU Building Blocks
§ EU (execution unit): instruction fetch, thread arbiter, dispatch and instruction cache, 2 SIMD FPUs, branch and send units
§ Subslice: 8 EUs plus a sampler and dataport, with L1 and L2 caches
§ Slice: 24 EUs (3 subslices), a 512 KB L3 data cache per slice, and 64 KB of shared local memory per subslice

Gen 9 IRIS (GT4) GPU Characteristics

  Characteristic               Value              Notes
  Clock                        1.15 GHz
  Slices                       3
  EUs                          72                 3 slices × 3 subslices × 8 EUs
  Hardware threads             504                72 EUs × 7 threads
  Concurrent kernel instances  16,128             504 threads × SIMD-32
  L3 data cache size           1.5 MB             3 slices × 0.5 MB/slice
  Max shared local memory      576 KB             3 slices × 3 subslices × 64 KB/subslice
  Last-level cache size        8 MB
  eDRAM size                   128 MB
  32b float FLOPS              1152 FLOPS/cycle   72 EUs × 2 FPUs × SIMD-4 × (MUL + ADD)
  64b float FLOPS              288 FLOPS/cycle    72 EUs × 1 FPU × SIMD-2 × (MUL + ADD) = 331.2 DP GFLOPS
  32b integer IOPS             576 IOPS/cycle     72 EUs × 2 FPUs × SIMD-4
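The per-cycle figures in the table convert to peak rates by multiplying by the clock; as a check, the double-precision entry works out as follows:

```latex
\[
\underbrace{72}_{\text{EUs}} \times \underbrace{1}_{\text{FPU}} \times \underbrace{2}_{\text{SIMD-2}} \times \underbrace{2}_{\text{MUL}+\text{ADD}}
= 288~\text{FLOP/cycle}
\]
\[
288~\text{FLOP/cycle} \times 1.15~\text{GHz} = 331.2~\text{DP GFLOPS}
\]
```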
Aurora Compute Node
§ 2 Intel Xeon (Sapphire Rapids) processors
§ 6 Xe-architecture-based GPUs (Ponte Vecchio)
§ 8 Slingshot fabric endpoints
§ Unified memory architecture across CPU and GPU
§ All-to-all connectivity within the node
§ Low latency and high bandwidth
[Figure: node schematic with two CPUs plus DRAM, six Xe GPUs, and Slingshot endpoints]

NETWORK INTERCONNECT

Cray Slingshot Network
§ Dragonfly topology:
  § 3-hop network topology
  § Groups connected globally all-to-all with optical cables
  § Flattened butterfly within each group (intra-group)
§ High-bandwidth switch:
  § 64 ports at 200 Gb/s in each direction
  § Total bandwidth of 25.6 Tb/s per switch
[Figure: global links between groups; switches within a group connecting to nodes]

Rosetta Switch HPC Features
§ QoS – traffic classes
  § Class: a collection of buffers, queues, and bandwidth
  § Intended to provide isolation between applications via traffic shaping
§ Aggressive adaptive routing
  § Expected to be more effective for a 3-hop dragonfly, since congestion information is closer at hand
§ Multi-level congestion management
  § To minimize the impact of congested applications on others
§ Very low average and tail latency
§ High-performance multicast and reductions
[Figure: Rosetta switch, 64 ports × 200 Gb/s per direction]

Slingshot Quality of Service Classes
§ Highly tunable QoS classes
  § Priority, ordering, routing biases, minimum bandwidth guarantees, maximum bandwidth constraints, etc.
§ Supports multiple, overlaid virtual networks:
  § High-priority compute
  § Standard compute
  § Low-latency control & synchronization
  § Bulk I/O
  § Scavenger-class background traffic
§ Jobs can use multiple traffic classes
§ Provides performance isolation for different types of traffic
  § Smaller reductions can avoid getting stuck behind large messages
  § Less interference between compute and I/O
[Figure: bandwidth sharing among three traffic types, without QoS vs. with QoS]

Congestion Management
§ Hardware automatically tracks all outstanding packets
  § Knows what is flowing between every pair of endpoints
§ Quickly identifies and controls causes of congestion
  § Pushes back on sources (injection points) just the right amount
  § Frees up buffer space for other traffic
  § Other traffic is not affected and can pass stalled traffic
  § Avoids head-of-line blocking across the entire fabric
§ Fast and stable across a wide variety of traffic patterns
  § Suitable for dynamic HPC traffic
§ Performance isolation between applications in the same QoS class
  § Applications are much less impacted by other traffic on the network – predictable runtimes
  § Lower mean and tail latency – a big benefit for apps with global synchronization

GPCNeT (Global Performance and Congestion Network Tests)
§ A new benchmark developed by Cray in collaboration with Argonne and NERSC
§ Publicly available at https://github.com/netbench/GPCNET
§ Goals:
  § Proxy for real-world application communication patterns
  § Measure network performance under load
  § Look at both mean and tail latency
  § Look at interference between workloads
  § How well does the network perform congestion management?

Congestion Impact in Real Systems
[Figures: measured congestion impact across systems]
§ InfiniBand does better than Aries
§ Impact worsens with scale and taper
§ Slingshot does really well

MPI on Aurora
§ Intel MPI & Cray MPI – MPI 3.0 standard compliant
§ The MPI library will be thread-safe and allow applications to use MPI from individual threads
  § Efficient MPI_THREAD_MULTIPLE (locking optimizations)
§ Asynchronous progress in all types of nonblocking communication:
  § Nonblocking send-receive
  § Nonblocking collectives
  § One-sided
§ Hardware-optimized collective implementations
§ Topology-optimized collective implementations
§ Supports the MPI tools interface
  § Control variables
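As a minimal sketch of what these features look like from application code, the following requests MPI_THREAD_MULTIPLE and overlaps a nonblocking allreduce with local work. This is standard MPI 3.0 only; nothing Aurora-specific is assumed.

```cpp
// Minimal MPI 3.0 sketch: thread-multiple initialization plus a
// nonblocking allreduce overlapped with local work.
// Compile with e.g. `mpicxx overlap.cpp`.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    // Ask for full thread support; the library reports what it granted.
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        std::fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<double> local(1024, rank), global(1024, 0.0);

    // Nonblocking collective (MPI 3.0): starts the reduction and returns.
    MPI_Request req;
    MPI_Iallreduce(local.data(), global.data(), 1024, MPI_DOUBLE,
                   MPI_SUM, MPI_COMM_WORLD, &req);

    // ... independent local work goes here; with asynchronous progress
    // the reduction advances in the background ...

    MPI_Wait(&req, MPI_STATUS_IGNORE);
    MPI_Finalize();
    return 0;
}
```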
IO

Aurora IO Subsystem Overview
§ Two-tiered approach to storage:
  § DAOS provides high-performance storage for compute datasets and checkpoints
  § Lustre provides the traditional parallel file system
§ Distributed Asynchronous Object Store (DAOS):
  § Provides the main storage system for Aurora
    • Greater than 230 PB of storage capacity
    • Greater than 25 TB/s of bandwidth
  § A new storage paradigm that provides significant performance improvements
    • Enables new data-centric workflows
    • Provides a POSIX wrapper with relaxed POSIX semantics
§ Lustre:
  § Traditional PFS
  § Provides the global namespace for the center
  § Compatibility with legacy systems
  § Storage for source code, binaries, and libraries
  § Storage of "cold" project data
[Figure: compute nodes with NVM storage and compute groups, login and service nodes, and gateway nodes on the Slingshot fabric; DAOS<>Lustre/GPFS dataset movement over InfiniBand to HDD-based Lustre and GPFS storage tiers holding libs, binaries, and the namespace]

I/O – User Perspective
§ Users see a single namespace, which lives in Lustre
  § "Links" point to DAOS containers within the /project directory
  § DAOS-aware software interprets these links and accesses the DAOS containers
§ Data resides in a single place (Lustre or DAOS)
  § Explicit data movement; no auto-migration
§ Users keep:
  § Source and binaries in Lustre
  § Bulk data in DAOS
[Figure: /mnt tree with regular PFS directories and files under lustre (users, libs, projects – e.g. .shrc, moon.mpg, mkl.so, hdf5.so) and DAOS pools PUUID1/PUUID2 holding an HDF5 container (Simul.h5), a POSIX container (Simul.out), and an MPI-IO container (Result.dn); links carry extended attributes such as daos/hdf5://PUUID1/CUUID1, daos/POSIX://PUUID2/CUUID2, and daos/MPI-IO://PUUID2/CUUID3]
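Because DAOS exposes an MPI-IO container type, application I/O can stay on the standard MPI-IO interface and be pointed at a DAOS container through one of the links above. A minimal sketch using only standard MPI-IO calls; the file path is hypothetical, standing in for such a link:

```cpp
// Minimal MPI-IO sketch: each rank writes its block of a shared file
// with a collective write. Standard MPI-IO only; on Aurora the same
// calls would target a DAOS MPI-IO container reached via a Lustre link.
// The path below is hypothetical.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1024;
    std::vector<double> data(n, static_cast<double>(rank));

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/project/Apollo/simul.out",  // hypothetical link
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    // Rank-dependent offset so blocks land contiguously in the file.
    MPI_Offset offset = static_cast<MPI_Offset>(rank) * n * sizeof(double);
    MPI_File_write_at_all(fh, offset, data.data(), n, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```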
SOFTWARE

Pre-exascale and Exascale US Landscape

  System       Delivery   CPU + Accelerator vendor
  Summit       2018       IBM + NVIDIA
  Sierra       2018       IBM + NVIDIA
  Perlmutter   2020       AMD + NVIDIA
  Aurora       2021       Intel + Intel
  Frontier     2021       AMD + AMD
  El Capitan   2022       AMD + X

§ Heterogeneous computing (CPU + accelerator)
§ Varying vendors

Three Pillars

  Simulation           Data                     Learning
  HPC languages        Productivity languages   Productivity languages
  Directives           Big data stack           DL frameworks
  Parallel runtimes    Statistical libraries    Statistical libraries
  Solver libraries     Databases                Linear algebra libraries

§ Shared across all three pillars: compilers, performance tools, debuggers; math libraries, C++ standard library, libc; I/O, messaging; containers, visualization; scheduler; Linux kernel, POSIX

PROGRAMMING MODELS

Heterogeneous System Programming Models
§ Applications will be using a variety of programming models for exascale:
  § CUDA
  § OpenCL
  § HIP
  § OpenACC
  § OpenMP
  § DPC++/SYCL
  § Kokkos
  § Raja
§ Not all systems will support all models
§ Libraries may help you abstract away some programming models

Aurora Programming Models
§ Programming models available on Aurora:
  § OpenCL
  § OpenMP
  § DPC++/SYCL
  § Kokkos
  § Raja

Mapping of Existing Programming Models to Aurora

  Existing model        Aurora model (vendor-supported)
  OpenMP w/o target     OpenMP
  OpenMP with target    OpenMP
  OpenACC               OpenMP
  OpenCL                OpenCL
  MPI + CUDA            DPC++/SYCL
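Per the mapping above, CUDA kernels would move to DPC++/SYCL on Aurora. A minimal sketch of what such a kernel looks like, assuming only standard SYCL 2020 and no Aurora-specific extensions:

```cpp
// Minimal DPC++/SYCL sketch of the kind of kernel a CUDA code maps to:
// a vector add on the default device. Standard SYCL 2020 only.
// Compile with e.g. `dpcpp vadd.cpp`.
#include <sycl/sycl.hpp>
#include <vector>
#include <iostream>

int main() {
    constexpr size_t n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    sycl::queue q;  // default selector: picks a GPU if one is present
    {
        sycl::buffer A{a}, B{b}, C{c};
        q.submit([&](sycl::handler& h) {
            sycl::accessor ra{A, h, sycl::read_only};
            sycl::accessor rb{B, h, sycl::read_only};
            sycl::accessor wc{C, h, sycl::write_only, sycl::no_init};
            // Equivalent of a CUDA kernel launch over n work-items.
            h.parallel_for(sycl::range<1>{n},
                           [=](sycl::id<1> i) { wc[i] = ra[i] + rb[i]; });
        });
    }   // buffer destruction copies results back to the host vectors

    std::cout << "c[0] = " << c[0] << "\n";  // expect 3
    return 0;
}
```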
Recommended publications
  • Petaflops for the People
    PETAFLOPS SPOTLIGHT: NERSC housands of researchers have used facilities of the Advanced T Scientific Computing Research (ASCR) program and its EXTREME-WEATHER Department of Energy (DOE) computing predecessors over the past four decades. Their studies of hurricanes, earthquakes, NUMBER-CRUNCHING green-energy technologies and many other basic and applied Certain problems lend themselves to solution by science problems have, in turn, benefited millions of people. computers. Take hurricanes, for instance: They’re They owe it mainly to the capacity provided by the National too big, too dangerous and perhaps too expensive Energy Research Scientific Computing Center (NERSC), the Oak to understand fully without a supercomputer. Ridge Leadership Computing Facility (OLCF) and the Argonne Leadership Computing Facility (ALCF). Using decades of global climate data in a grid comprised of 25-kilometer squares, researchers in These ASCR installations have helped train the advanced Berkeley Lab’s Computational Research Division scientific workforce of the future. Postdoctoral scientists, captured the formation of hurricanes and typhoons graduate students and early-career researchers have worked and the extreme waves that they generate. Those there, learning to configure the world’s most sophisticated same models, when run at resolutions of about supercomputers for their own various and wide-ranging projects. 100 kilometers, missed the tropical cyclones and Cutting-edge supercomputing, once the purview of a small resulting waves, up to 30 meters high. group of experts, has trickled down to the benefit of thousands of investigators in the broader scientific community. Their findings, published inGeophysical Research Letters, demonstrated the importance of running Today, NERSC, at Lawrence Berkeley National Laboratory; climate models at higher resolution.
    [Show full text]
  • GPU Developments 2018
    GPU Developments 2018 2018 GPU Developments 2018 © Copyright Jon Peddie Research 2019. All rights reserved. Reproduction in whole or in part is prohibited without written permission from Jon Peddie Research. This report is the property of Jon Peddie Research (JPR) and made available to a restricted number of clients only upon these terms and conditions. Agreement not to copy or disclose. This report and all future reports or other materials provided by JPR pursuant to this subscription (collectively, “Reports”) are protected by: (i) federal copyright, pursuant to the Copyright Act of 1976; and (ii) the nondisclosure provisions set forth immediately following. License, exclusive use, and agreement not to disclose. Reports are the trade secret property exclusively of JPR and are made available to a restricted number of clients, for their exclusive use and only upon the following terms and conditions. JPR grants site-wide license to read and utilize the information in the Reports, exclusively to the initial subscriber to the Reports, its subsidiaries, divisions, and employees (collectively, “Subscriber”). The Reports shall, at all times, be treated by Subscriber as proprietary and confidential documents, for internal use only. Subscriber agrees that it will not reproduce for or share any of the material in the Reports (“Material”) with any entity or individual other than Subscriber (“Shared Third Party”) (collectively, “Share” or “Sharing”), without the advance written permission of JPR. Subscriber shall be liable for any breach of this agreement and shall be subject to cancellation of its subscription to Reports. Without limiting this liability, Subscriber shall be liable for any damages suffered by JPR as a result of any Sharing of any Material, without advance written permission of JPR.
    [Show full text]
  • 1 Intel CEO Remarks Pat Gelsinger Q2'21 Earnings Webcast July 22
    Intel CEO Remarks Pat Gelsinger Q2’21 Earnings Webcast July 22, 2021 Good afternoon, everyone. Thanks for joining our second-quarter earnings call. It’s a thrilling time for both the semiconductor industry and for Intel. We're seeing unprecedented demand as the digitization of everything is accelerated by the superpowers of AI, pervasive connectivity, cloud-to-edge infrastructure and increasingly ubiquitous compute. Our depth and breadth of software, silicon and platforms, and packaging and process, combined with our at-scale manufacturing, uniquely positions Intel to capitalize on this vast growth opportunity. Our Q2 results, which exceeded our top and bottom line expectations, reflect the strength of the industry, the demand for our products, as well as the superb execution of our factory network. As I’ve said before, we are only in the early innings of what is likely to be a decade of sustained growth across the industry. Our momentum is building as once again we beat expectations and raise our full-year revenue and EPS guidance. Since laying out our IDM 2.0 strategy in March, we feel increasingly confident that we're moving the company forward toward our goal of delivering leadership products in every category in which we compete. While we have work to do, we are making strides to renew our execution machine: 7nm is progressing very well. We’ve launched new innovative products, established Intel Foundry Services, and made operational and organizational changes to lay the foundation needed to win in the next phase of our company’s great history. Here at Intel, we’re proud of our past, pragmatic about the work ahead, but, most importantly, confident in our future.
    [Show full text]
  • Intel 2019 Year Book
    YEARBOOK 2019 POWERING THE FUTURE Our 2019 yearbook invites you to look back and reflect on a memorable year for Intel. TABLE OF CONTENTS 2019 kicked off with the announcement of our new p4 New CEO. Evolving culture. Expanded ambitions. chief executive, Bob Swan. It was followed by a stream of notable news: product announcements, technology p6 More data. More storage. More processing. breakthroughs, new customers and partnerships, p10 Innovation for the PC user experience and important moves to evolve Intel’s culture as the company entered its sixth decade. p12 Self-driving cars hit the road p2 p16 AI unlocks the power of data It’s a privilege to tell the Intel story in all its complexity and humanity. Looking through these pages, the p18 Helping customers push boundaries breadth and depth of what we’ve achieved in 12 p22 More supply to meet strong demand months is substantial, as is the strong foundation we’ve built for even greater impact in the future. p26 Next-gen hardware and software to unlock it p28 Tech’s future: Inventing and investing I hope you enjoy this colorful look at what’s possible when more than 100,000 individuals from every p32 Reinforcing the nature of Moore’s Law corner of the globe unite to change the world – p34 Building for the smarter future through technologies that make a positive difference to our customers, to society, and to people’s lives. — Claire Dixon, Chief Communications Officer NEW CEO. EVOLVING CULTURE. EXPANDED AMBITIONS. 2019 was an important year in Intel’s transformation, with a new chief executive officer, ambitious business priorities, an aspirational culture evolution, and a farewell to Focal.
    [Show full text]
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]
  • PC Technician Essentials PC Anatomy
    Motherboards Prepared & Presented by Professor Gupta Motherboard The primary component of a computer is the motherboard (sometimes called the "mainboard"). The motherboard is the hub which is used to connect all of the computer's essential components. Motherboards ■ Everything that makes a computer a computer must be attached to the motherboard. From the CPU to storage devices, from RAM to printer ports, the motherboard provides the connections that help them work together. ■ The motherboard is essential to computer operation in large part because of the two major buses it contains: the system bus and the I/O bus. Together, these buses carry all the information between the different parts of the computer. ■ Components: Socket 775 processor; Dual-channel DDR2 memory slots; Heat sink over North Bridge; 24-pin ATX v2.0 power connector; South Bridge chip; PCI slots; PCI Express x16 slot; PCI Express x1 slot; CMOS battery; Port cluster; SATA host adapter; Floppy drive controller; PATA host adapter; 4-pix ATX12 power connector; Mounting holes. Prepared & Presented by Professor Gupta 301-802- 9066 AT & ATX Motherboard form factor The term "form factor" is normally used to refer to the motherboard's geometry, dimensions, arrangement, and electrical requirements. In order to build motherboards which can be used in different brands of cases, a few standards have been developed: • AT baby/AT full format is a format used in the earliest 386 and 486 PCs. This format was replaced by the ATX format, which shape allowed for better air circulation and made it easier to access the components; • ATX: The ATX format is an upgrade to Baby-AT.
    [Show full text]
  • Ushering in a New Era: Argonne National Laboratory & Aurora
    Ushering in a New Era Argonne National Laboratory’s Aurora System April 2015 ANL Selects Intel for World’s Biggest Supercomputer 2-system CORAL award extends IA leadership in extreme scale HPC Aurora Argonne National Laboratory >180PF Trinity NNSA† April ‘15 Cori >40PF NERSC‡ >30PF July ’14 + Theta Argonne National Laboratory April ’14 >8.5PF >$200M ‡ Cray* XC* Series at National Energy Research Scientific Computing Center (NERSC). † Cray XC Series at National Nuclear Security Administration (NNSA). 2 The Most Advanced Supercomputer Ever Built An Intel-led collaboration with ANL and Cray to accelerate discovery & innovation >180 PFLOPS (option to increase up to 450 PF) 18X higher performance† >50,000 nodes Prime Contractor 13MW >6X more energy efficient† 2018 delivery Subcontractor Source: Argonne National Laboratory and Intel. †Comparison of theoretical peak double precision FLOPS and power consumption to ANL’s largest current system, MIRA (10PFs and 4.8MW) 3 Aurora | Science From Day One! Extreme performance for a broad range of compute and data-centric workloads Transportation Biological Science Renewable Energy Training Argonne Training Program on Extreme- Scale Computing Aerodynamics Biofuels / Disease Control Wind Turbine Design / Placement Materials Science Computer Science Public Access Focus Areas Focus US Industry and International Co-array Fortran Batteries / Solar Panels New Programming Models 4 Aurora | Built on a Powerful Foundation Breakthrough technologies that deliver massive benefits Compute Interconnect File System 3rd Generation 2nd Generation Intel® Xeon Phi™ Intel® Omni-Path Intel® Lustre* Architecture Software >17X performance† >20X faster† >3X faster† FLOPS per node >500 TB/s bi-section bandwidth >1 TB/s file system throughput >12X memory bandwidth† >2.5 PB/s aggregate node link >5X capacity† bandwidth >30PB/s aggregate >150TB file system capacity in-package memory bandwidth Integrated Intel® Omni-Path Architecture Processor code name: Knights Hill Source: Argonne National Laboratory and Intel.
    [Show full text]
  • Architectural Trade-Offs in a Latency Tolerant Gallium Arsenide Microprocessor
    Architectural Trade-offs in a Latency Tolerant Gallium Arsenide Microprocessor by Michael D. Upton A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Electrical Engineering) in The University of Michigan 1996 Doctoral Committee: Associate Professor Richard B. Brown, CoChairperson Professor Trevor N. Mudge, CoChairperson Associate Professor Myron Campbell Professor Edward S. Davidson Professor Yale N. Patt © Michael D. Upton 1996 All Rights Reserved DEDICATION To Kelly, Without whose support this work may not have been started, would not have been enjoyed, and could not have been completed. Thank you for your continual support and encouragement. ii ACKNOWLEDGEMENTS Many people, both at Michigan and elsewhere, were instrumental in the completion of this work. I would like to thank my co-chairs, Richard Brown and Trevor Mudge, first for attracting me to Michigan, and then for allowing our group the freedom to explore many different ideas in architecture and circuit design. Their guidance and motivation combined to make this a truly memorable experience. I am also grateful to each of my other dissertation committee members: Ed Davidson, Yale Patt, and Myron Campbell. The support and encouragement of the other faculty on the project, Karem Sakallah and Ron Lomax, is also gratefully acknowledged. My friends and former colleagues Mark Rossman, Steve Sugiyama, Ray Farbarik, Tom Rossman and Kendall Russell were always willing to lend their assistance. Richard Oettel continually reminded me of the valuable support of friends and family, and the importance of having fun in your work. Our corporate sponsors: Cascade Design Automation, Chronologic, Cadence, and Metasoft, provided software and support that made this work possible.
    [Show full text]
  • TOP500 Supercomputer Sites
    7/24/2018 News | TOP500 Supercomputer Sites HOME | SEARCH | REGISTER RSS | MY ACCOUNT | EMBED RSS | SUPER RSS | Contact Us | News | TOP500 Supercomputer Sites http://top500.org/blog/category/feature-article/feeds/rss Are you the publisher? Claim or contact Browsing the Latest Browse All Articles (217 Live us about this channel Snapshot Articles) Browser Embed this Channel Description: content in your HTML TOP500 News Search Report adult content: 04/27/18--03:14: UK Commits a 0 0 Billion Pounds to AI Development click to rate The British government and the private sector are investing close to £1 billion Account: (login) pounds to boost the country’s artificial intelligence sector. The investment, which was announced on Thursday, is part of a wide-ranging strategy to make the UK a global leader in AI and big data. More Channels Under the investment, known as the “AI Sector Deal,” government, industry, and academia will contribute £603 million in new funding, adding to the £342 million already allocated in existing budgets. That brings the grand total to Showcase £945 million, or about $1.3 billion at the current exchange rate. The UK RSS Channel Showcase 1586818 government is also looking to increase R&D spending across all disciplines by 2.4 percent, while also raising the R&D tax credit from 11 to 12 percent. This is RSS Channel Showcase 2022206 part of a broader commitment to raise government spending in this area from RSS Channel Showcase 8083573 around £9.5 billion in 2016 to £12.5 billion in 2021. RSS Channel Showcase 1992889 The UK government policy paper that describes the sector deal meanders quite a bit, describing a lot of programs and initiatives that intersect with the AI investments, but are otherwise free-standing.
    [Show full text]
  • Intel 8080 Oral History
    Oral History Panel on the Development and Promotion of the Intel 8080 Microprocessor Participants: Steve Bisset Federico Faggin Hal Feeney Ed Gelbach Ted Hoff Stan Mazor Masatoshi Shima Moderated by: Dave House Recorded: April 26, 2007 Mountain View, California CHM Reference number: X4021.2007 © 2007 Computer History Museum Oral History Panel on Intel 8080 Microprocessor Dave House: Welcome to the video history of the MCS-80, or the 8080 microprocessor. We have with us today the team responsible for developing those products, and I'd like to start out, first of all, to introduce myself. I'm Dave House. I arrived just before the 8080 was introduced at Intel, so I was not part of the development team. But I'll be the MC today, and I'm going to ask some of the new members to our panel to introduce themselves and give their background. Ed Gelbach: I joined Intel in late, mid-1971. We had developed the microprocessor at that point. However, it was being used generally as a calculator type chip, and we were looking for ways to expand it. Prior to joining Intel, I worked for Texas Instruments for about ten years. And prior to that, I was at General Electric. House: And you grew up in? Gelbach: Southern California. House: Southern California beach boy. Gelbach: I graduated from USC [University of Southern California], and worked for TI [Texas Instruments] in Los Angeles predominantly. I moved to Texas for a short period of time, and then moved up here [Northern California] in 1971. House: Okay. Steve Bisset joined our team.
    [Show full text]
  • History of Micro-Computers
    M•I•C•R•O P•R•O•C•E•S•S•O•R E•V•O•L•U•T•I.O•N Reprinted by permission from BYTE, September 1985.. a McGraw-Hill Inc. publication. Prices quoted are in US S. EVOLUTION OF THE MICROPROCESSOR An informal history BY MARK GARETZ Author's note: The evolution of were many other applica- the microprocessor has followed tions for the new memory a complex and twisted path. To chip, which was signifi- those of you who were actually cantly larger than any that involved in some of the follow- had been produced ing history, 1 apologize if my before. version is not exactly like yours. About this time, the The opinions expressed in this summer of 1969, Intel was article are my own and may or approached by the may not represent reality as Japanese calculator manu- someone else perceives it. facturer Busicom to pro- duce a set of custom chips THE TRANSISTOR, devel- designed by Busicom oped at Bell Laboratories engineers for the Jap- in 1947, was designed to anese company's new line replace the vacuum tube, of calculators. The to switch electronic sig- calculators would have nals on and off. (Al- several chips, each of though, at the time, which would contain 3000 vacuum tubes were used to 5000 transistors. mainly as amplifiers, they Intel designer Marcian were also used as (led) Hoff was assigned to switches.) The advent of assist the team of Busi- the transistor made possi- com engineers that had ble a digital computer that taken up residence at didn't require an entire Intel.
    [Show full text]
  • The GPU Computing Revolution
    The GPU Computing Revolution From Multi-Core CPUs to Many-Core Graphics Processors A Knowledge Transfer Report from the London Mathematical Society and Knowledge Transfer Network for Industrial Mathematics By Simon McIntosh-Smith Copyright © 2011 by Simon McIntosh-Smith Front cover image credits: Top left: Umberto Shtanzman / Shutterstock.com Top right: godrick / Shutterstock.com Bottom left: Double Negative Visual Effects Bottom right: University of Bristol Background: Serg64 / Shutterstock.com THE GPU COMPUTING REVOLUTION From Multi-Core CPUs To Many-Core Graphics Processors By Simon McIntosh-Smith Contents Page Executive Summary 3 From Multi-Core to Many-Core: Background and Development 4 Success Stories 7 GPUs in Depth 11 Current Challenges 18 Next Steps 19 Appendix 1: Active Researchers and Practitioner Groups 21 Appendix 2: Software Applications Available on GPUs 23 References 24 September 2011 A Knowledge Transfer Report from the London Mathematical Society and the Knowledge Transfer Network for Industrial Mathematics Edited by Robert Leese and Tom Melham London Mathematical Society, De Morgan House, 57–58 Russell Square, London WC1B 4HS KTN for Industrial Mathematics, Surrey Technology Centre, Surrey Research Park, Guildford GU2 7YG 2 THE GPU COMPUTING REVOLUTION From Multi-Core CPUs To Many-Core Graphics Processors AUTHOR Simon McIntosh-Smith is head of the Microelectronics Research Group at the Univer- sity of Bristol and chair of the Many-Core and Reconfigurable Supercomputing Conference (MRSC), Europe’s largest conference dedicated to the use of massively parallel computer architectures. Prior to joining the university he spent fifteen years in industry where he designed massively parallel hardware and software at companies such as Inmos, STMicro- electronics and Pixelfusion, before co-founding ClearSpeed as Vice-President of Architec- ture and Applications.
    [Show full text]