Summarizing CPU and GPU Design Trends with Product Data

Total Page:16

File Type:pdf, Size:1020Kb

Summarizing CPU and GPU Design Trends with Product Data Summarizing CPU and GPU Design Trends with Product Data Yifan Sun, Nicolas Bohm Agostini, Shi Dong, and David Kaeli Northeastern University Email: fyifansun, agostini, shidong, [email protected] Abstract—Moore’s Law and Dennard Scaling have guided the products. Equipped with this data, we answer the following semiconductor industry for the past few decades. Recently, both questions: laws have faced validity challenges as transistor sizes approach • Are Moore’s Law and Dennard Scaling still valid? If so, the practical limits of physics. We are interested in testing the validity of these laws and reflect on the reasons responsible. In what are the factors that keep the laws valid? this work, we collect data of more than 4000 publicly-available • Do GPUs still have computing power advantages over CPU and GPU products. We find that transistor scaling remains CPUs? Is the computing capability gap between CPUs critical in keeping the laws valid. However, architectural solutions and GPUs getting larger? have become increasingly important and will play a larger role • What factors drive performance improvements in GPUs? in the future. We observe that GPUs consistently deliver higher performance than CPUs. GPU performance continues to rise II. METHODOLOGY because of increases in GPU frequency, improvements in the thermal design power (TDP), and growth in die size. But we We have collected data for all CPU and GPU products (to also see the ratio of GPU to CPU performance moving closer to our best knowledge) that have been released by Intel, AMD parity, thanks to new SIMD extensions on CPUs and increased (including the former ATI GPUs)1, and NVIDIA since January CPU core counts. 1st, 2000. In total, the dataset consists of 4031 products (2102 CPUs and 1929 GPUs). For each product, we record its release I. INTRODUCTION date and properties, such as process size, die size, transistor Moore’s Law [8] and Dennard Scaling [2] have guided count, base frequency and thermal design power (TDP). the development of the semiconductor industry. Moore’s Law A large number of GPU products (e.g., NVIDIA Tesla K80) predicted that the number of transistors on a chip would double have more than one GPU chip on a single PCB. For these every 18 months. Dennard scaling stated that the energy products, we only consider the properties of one of its chips. consumption of a chip would stay in proportion to the size of As an exception, the emerging Multi-Chip-Module (MCM) the chip. Shrinking transistor sizes (i.e., reducing the process packaging technology effectively incorporates multiple chips size) would allow us to increase the computing capabilities of in one package. We consider the multiple chips in a MCM the device without consuming more energy. Moore’s Law and package (e.g., AMD Threadripper CPUs) as a single chip. Dennard Scaling charted a promising future for computers. As Because MCM packages are usually programmer transparent, we reduce the transistor size, the performance of computing a programmer can use multiple chips in an MCM package as if devices would exponentially increase over time. they are a single chip. The development of interposer technolo- Recently, the semiconductor industry has faced unprece- gies can also provide promising communication bandwidth dented challenges. As the transistor sizes is approaching the between MCM chips. With recent technologies, MCM-CPUs practical limitations of physics, reducing transistor sizes can and MCM-GPUs can be as efficient as single large chips [1], hardly improve performance. To guide future semiconductor [4]. Another large group of products considered in this study arXiv:1911.11313v2 [cs.DC] 13 Jul 2020 research, we need to revisit these historical development trends and find new directions that can drive performance represent integrated CPU-GPU devices. We consider such de- improvement. vices as CPUs since the CPUs are the dominating components The Central Processing Unit (CPU) and Graphic Processing in these devices. We only consider the CPU on an integrated Unit (GPU) are the two mainstream options for general- CPU-GPU chip when calculating their theoretical computing purpose computing. The former primarily serves as the pro- capability. We use the Floating Point Operations Per Second cessor for general tasks executed on mobile devices, personal (FLOPS) or Tera-FLOPS (TFLOPS) as the metrics to evaluate computers, and servers. The latter takes on compute-intensive their theoretical single-precision and double-precision comput- tasks for graphics rendering, big-data analytics, signal pro- ing capabilities. cessing, artificial intelligence, and physics simulation [7]. To For device frequency, we only consider the base clocks. improve computing system performance, we need to better We do not consider the boost clocks as devices cannot con- understand CPU and GPU design trends. tinuously work at the boost clock frequencies. We estimate To study the trends, we collect and analyze the CPU and 1The data sources include www.techpowerup.com, wikichip.org, and vendor GPU data from public technical specifications of released websites. Process Size 220 nm 150 nm 110 nm 80 nm 55 nm 40 nm 28 nm 20 nm 14 nm 10 nm 250 nm 180 nm 130 nm 90 nm 65 nm 45 nm 32 nm 22 nm 16 nm 12 nm 7 nm 10 10 10 CPU 10 GPU 9 9 10 10 Transistors 8 Transistors 8 10 10 7 7 10 10 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020 (a) Transistor count of CPUs. (b) Transistor count of GPUs. Fig. 1. Moore’s Law is still valid for both CPUs and GPUs. AMD NVIDIA 800 7 10 600 6 10 400 Die Size (mm^2) 200 Density (Transistor per mm^2) 5 10 7 nm 90 nm80 nm65 nm55 nm45 nm40 nm32 nm28 nm22 nm20 nm16 nm14 nm12 nm10 nm 250 nm220 nm180 nm150 nm130 nm110 nm 2000 2004 2008 2012 2016 2020 Process Size Release Date Fig. 2. Transistor Scaling. Fig. 3. GPU Die Sizes. III. FINDINGS energy consumption using the Thermal Design Power (TDP). A. Moore’s Law We admit that TDP may not be the best metric to evaluate a device’s power consumption. Chips can easily consume more Researchers have been predicting the end of Moore’s Law power than their TDP specification for a short periods of time. for a long time [5], [6], [10], [11]. However, Figure. 1 Chips may also maintain energy consumption under TDP by suggests the number of transistors on a chip is still increasing shutting down parts of the circuit. However, TDP can be a exponentially over time. For CPUs, we lack critical data reasonable estimation of the average power consumption when from the years 2014 to 2017 because Intel stopped releasing a device runs regular workloads at its base clock. transistor count and die size data since their 7th generation Core CPU. However, AMD restarted to produce high-end Although we have most of the product data, we miss the CPUs with large die-size recently. We can observe that the critical data of CPUs produced by Intel since 2014. Intel CPU transistor scaling trend is continuing to follow the pre- stopped revealing its CPU transistor counts and the die sizes 2014 trend. Also, Figure. 1 suggests that vendors tend to since the 7th generation Core CPU. Therefore, when present- use new CMOS technologies in high-end products first. Low- ing detailed studies that are related to transistor counts and end products may continue to use an older version of the die sizes (Figure. 2, 3, 4), we focus on the GPU data. Also, in CMOS process when the high-end products switch to a new these analyses, we only show devices that are manufactured by technology node. the Taiwan Semiconductor Manufacturing Company (TSMC). The next question we need to ask is, ”What drives increases We select TSMC because TSMC has produced the highest in transistor count?” In Figure. 2, we see that transistor scaling number of chips in our dataset. We only consider chips from is playing an essential role in increasing the die density (num- a single foundry since different foundries tend to use different ber of transistors per area). For the devices produced within a measurements when computing their process sizes. specific process size, the variance in terms of transistor density 3 10 1.0 1500 0.8 2 10 0.6 1000 1 0.4 10 Watt Per mm^2 500 0.2 Base Clock (MHz) Watt Per Billion Transistors 0 10 0.0 7 nm 7 nm 7 nm 90 nm 80 nm 65 nm 55 nm 40 nm 32 nm 28 nm 16 nm 14 nm 12 nm 90 nm 80 nm 65 nm 55 nm 40 nm 32 nm 28 nm 16 nm 14 nm 12 nm 90 nm 80 nm 65 nm 55 nm 40 nm 32 nm 28 nm 16 nm 14 nm 12 nm (a) Watt per billion transistor vs. process size. (b) Watt per mm^2 vs. process size. (c) Base clock vs. process size. Fig. 4. Process Size vs. (a) Energy consumption per billion transistor. (b) Energy consumption per area. (c) Base clock speed. Vendor AMD NVIDIA other nodes. The most recent 7nm technology has an unusually high energy density, exceeding the 98 percentile of the energy Die Size 300.0 600.0 900.0 density of all other nodes. It is still too early to conclude 100 the end of Dennard Scaling at the 7nm node since there have been only eight 7nm GPUs produced at the time of this report. 50 We expect chip manufacturers to develop solutions to improve energy efficiency. 20 Despite the challenges to continue Dennard Scaling, fre- quency continues to increase with shrinking process sizes (as 10 shown in Figure.
Recommended publications
  • CFD Analyses of a Notebook Computer Thermal Management
    PREPRINT. 1 Ilker Tari and Fidan Seza Yalçin, "CFD Analyses of a Notebook Computer Thermal Management System and a Proposed Passive Cooling Alternative, IEEE Transactions on Components and Packaging Technologies, Vol. 33, No. 2, pp. 443-452 (2010). CFD Analyses of a Notebook Computer Thermal Management System and a Proposed Passive Cooling Alternative Ilker Tari, and Fidan Seza Yalcin H Fin height, mm. Abstract— A notebook computer thermal management system L Heat sink vertical length, mm. is analyzed using a commercial CFD software package (ANSYS Nu Nusselt Number. Fluent). The active and passive paths that are used for heat Pr Prandtl Number. dissipation are examined for different steady state operating Ra Rayleigh Number. conditions. For each case, average and hot-spot temperatures of Re Reynolds Number. the components are compared with the maximum allowable T Temperature, °C or K. operating temperatures. It is observed that when low heat W Heat sink width, mm. dissipation components are put on the same passive path, the 2 increased heat load of the path may cause unexpected hot spot g Gravitational acceleration, m/s . 2 temperatures. Especially, Hard Disk Drive (HDD) is susceptible h Convection heat transfer coefficient, W/(m ·K). to overheating and the keyboard surface may reach k Thermal conductivity, W/(m·K). ergonomically undesirable temperatures. Based on the analysis q Heat transfer rate, W. results and observations, a new component arrangement s Fin spacing, mm. considering passive paths and using the back side of the LCD screen is proposed and a simple correlation based thermal Greek Symbols analysis of the proposed system is presented.
    [Show full text]
  • ATI Radeon™ HD 4870 Computation Highlights
    AMD Entering the Golden Age of Heterogeneous Computing Michael Mantor Senior GPU Compute Architect / Fellow AMD Graphics Product Group [email protected] 1 The 4 Pillars of massively parallel compute offload •Performance M’Moore’s Law Î 2x < 18 Month s Frequency\Power\Complexity Wall •Power Parallel Î Opportunity for growth •Price • Programming Models GPU is the first successful massively parallel COMMODITY architecture with a programming model that managgped to tame 1000’s of parallel threads in hardware to perform useful work efficiently 2 Quick recap of where we are – Perf, Power, Price ATI Radeon™ HD 4850 4x Performance/w and Performance/mm² in a year ATI Radeon™ X1800 XT ATI Radeon™ HD 3850 ATI Radeon™ HD 2900 XT ATI Radeon™ X1900 XTX ATI Radeon™ X1950 PRO 3 Source of GigaFLOPS per watt: maximum theoretical performance divided by maximum board power. Source of GigaFLOPS per $: maximum theoretical performance divided by price as reported on www.buy.com as of 9/24/08 ATI Radeon™HD 4850 Designed to Perform in Single Slot SP Compute Power 1.0 T-FLOPS DP Compute Power 200 G-FLOPS Core Clock Speed 625 Mhz Stream Processors 800 Memory Type GDDR3 Memory Capacity 512 MB Max Board Power 110W Memory Bandwidth 64 GB/Sec 4 ATI Radeon™HD 4870 First Graphics with GDDR5 SP Compute Power 1.2 T-FLOPS DP Compute Power 240 G-FLOPS Core Clock Speed 750 Mhz Stream Processors 800 Memory Type GDDR5 3.6Gbps Memory Capacity 512 MB Max Board Power 160 W Memory Bandwidth 115.2 GB/Sec 5 ATI Radeon™HD 4870 X2 Incredible Balance of Performance,,, Power, Price
    [Show full text]
  • The Central Processing Unit(CPU). the Brain of Any Computer System Is the CPU
    Computer Fundamentals 1'stage Lec. (8 ) College of Computer Technology Dept.Information Networks The central processing unit(CPU). The brain of any computer system is the CPU. It controls the functioning of the other units and process the data. The CPU is sometimes called the processor, or in the personal computer field called “microprocessor”. It is a single integrated circuit that contains all the electronics needed to execute a program. The processor calculates (add, multiplies and so on), performs logical operations (compares numbers and make decisions), and controls the transfer of data among devices. The processor acts as the controller of all actions or services provided by the system. Processor actions are synchronized to its clock input. A clock signal consists of clock cycles. The time to complete a clock cycle is called the clock period. Normally, we use the clock frequency, which is the inverse of the clock period, to specify the clock. The clock frequency is measured in Hertz, which represents one cycle/second. Hertz is abbreviated as Hz. Usually, we use mega Hertz (MHz) and giga Hertz (GHz) as in 1.8 GHz Pentium. The processor can be thought of as executing the following cycle forever: 1. Fetch an instruction from the memory, 2. Decode the instruction (i.e., determine the instruction type), 3. Execute the instruction (i.e., perform the action specified by the instruction). Execution of an instruction involves fetching any required operands, performing the specified operation, and writing the results back. This process is often referred to as the fetch- execute cycle, or simply the execution cycle.
    [Show full text]
  • An Evolution of Mobile Graphics
    AN EVOLUTION OF MOBILE GRAPHICS Michael C. Shebanow Vice President, Advanced Processor Lab Samsung Electronics July 20, 20131 DISCLAIMER • The views herein are my own • They do not represent Samsung’s vision nor product plans 2 • The Mobile Market • Review of GPU Tech • GPU Efficiency • User Experience • Tech Challenges • Summary 3 The Rise of the Mobile GPU & Connectivity A NEW WORLD COMING? 4 DISCRETE GPU MARKET Flattening 5 MOBILE GPU MARKET Smart • In 2012, an estimated 800+ Phones million mobile GPUs shipped “Phablets” • ~123M tablets • ~712M smart phones Tablets • Will easily exceed 1B in the coming years • Trend: • Discrete GPU relatively flat • Mobile is growing rapidly 6 WW INTERNET TRAFFIC • Source: Cisco VNI Mobile INET IP Traffic growth Traffic • Internet traffic growth Year (TB/sec) rate (TB/sec) rate is staggering 2005 0.9 0.00 2006 1.5 65% 0.00 • 2012 total traffic is 2007 2.5 61% 0.01 13.7 GB per person 2008 3.8 54% 0.01 per month 2009 5.6 45% 0.04 2010 7.8 40% 0.10 • 2012 smart phone 2011 10.6 36% 0.23 traffic at 2012 12.4 17% 0.34 0.342 GB per person per month • 2017 smart phone traffic expected at 2.7 GB per person per month 7 WHERE ARE WE HEADED?… • Enormous quantity of GPUs • Large amount of interconnectivity • Better I/O 8 GPU Pipelines A BRIEF REVIEW OF GPU TECH 9 MOBILE GPU PIPELINE ARCHITECTURES Tile-based immediate mode rendering IA VS CCV RS PS ROP (TBIMR) Tile-based deferred IA VS CCV scene rendering (TBDR) RS PS ROP IA = input assembler VS = vertex shader CCV = cull, clip, viewport transform RS = rasterization,
    [Show full text]
  • 45-Year CPU Evolution: One Law and Two Equations
    45-year CPU evolution: one law and two equations Daniel Etiemble LRI-CNRS University Paris Sud Orsay, France [email protected] Abstract— Moore’s law and two equations allow to explain the a) IC is the instruction count. main trends of CPU evolution since MOS technologies have been b) CPI is the clock cycles per instruction and IPC = 1/CPI is the used to implement microprocessors. Instruction count per clock cycle. c) Tc is the clock cycle time and F=1/Tc is the clock frequency. Keywords—Moore’s law, execution time, CM0S power dissipation. The Power dissipation of CMOS circuits is the second I. INTRODUCTION equation (2). CMOS power dissipation is decomposed into static and dynamic powers. For dynamic power, Vdd is the power A new era started when MOS technologies were used to supply, F is the clock frequency, ΣCi is the sum of gate and build microprocessors. After pMOS (Intel 4004 in 1971) and interconnection capacitances and α is the average percentage of nMOS (Intel 8080 in 1974), CMOS became quickly the leading switching capacitances: α is the activity factor of the overall technology, used by Intel since 1985 with 80386 CPU. circuit MOS technologies obey an empirical law, stated in 1965 and 2 Pd = Pdstatic + α x ΣCi x Vdd x F (2) known as Moore’s law: the number of transistors integrated on a chip doubles every N months. Fig. 1 presents the evolution for II. CONSEQUENCES OF MOORE LAW DRAM memories, processors (MPU) and three types of read- only memories [1]. The growth rate decreases with years, from A.
    [Show full text]
  • On Heterogeneous Compute and Memory Systems
    ON HETEROGENEOUS COMPUTE AND MEMORY SYSTEMS by Jason Lowe-Power A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Sciences) at the UNIVERSITY OF WISCONSIN–MADISON 2017 Date of final oral examination: 05/31/2017 The dissertation is approved by the following members of the Final Oral Committee: Mark D. Hill, Professor, Computer Sciences Dan Negrut, Professor, Mechanical Engineering Jignesh M. Patel, Professor, Computer Sciences Karthikeyan Sankaralingam, Associate Professor, Computer Sciences David A. Wood, Professor, Computer Sciences © Copyright by Jason Lowe-Power 2017 All Rights Reserved i Acknowledgments I would like to acknowledge all of the people who helped me along the way to completing this dissertation. First, I would like to thank my advisors, Mark Hill and David Wood. Often, when students have multiple advisors they find there is high “synchronization overhead” between the advisors. However, Mark and David complement each other well. Mark is a high-level thinker, focusing on the structure of the argument and distilling ideas to their essentials; David loves diving into the details of microarchitectural mechanisms. Although ever busy, at least one of Mark or David were available to meet with me, and they always took the time to help when I needed it. Together, Mark and David taught me how to be a researcher, and they have given me a great foundation to build my career. I thank my committee members. Jignesh Patel for his collaborations, and for the fact that each time I walked out of his office after talking to him, I felt a unique excitement about my research.
    [Show full text]
  • Three-Dimensional Integrated Circuit Design: EDA, Design And
    Integrated Circuits and Systems Series Editor Anantha Chandrakasan, Massachusetts Institute of Technology Cambridge, Massachusetts For other titles published in this series, go to http://www.springer.com/series/7236 Yuan Xie · Jason Cong · Sachin Sapatnekar Editors Three-Dimensional Integrated Circuit Design EDA, Design and Microarchitectures 123 Editors Yuan Xie Jason Cong Department of Computer Science and Department of Computer Science Engineering University of California, Los Angeles Pennsylvania State University [email protected] [email protected] Sachin Sapatnekar Department of Electrical and Computer Engineering University of Minnesota [email protected] ISBN 978-1-4419-0783-7 e-ISBN 978-1-4419-0784-4 DOI 10.1007/978-1-4419-0784-4 Springer New York Dordrecht Heidelberg London Library of Congress Control Number: 2009939282 © Springer Science+Business Media, LLC 2010 All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com) Foreword We live in a time of great change.
    [Show full text]
  • NVIDIA Quadro P4000
    NVIDIA Quadro P4000 GP104 1792 112 64 8192 MB GDDR5 256 bit GRAPHICS PROCESSOR CORES TMUS ROPS MEMORY SIZE MEMORY TYPE BUS WIDTH The Quadro P4000 is a professional graphics card by NVIDIA, launched in February 2017. Built on the 16 nm process, and based on the GP104 graphics processor, the card supports DirectX 12.0. The GP104 graphics processor is a large chip with a die area of 314 mm² and 7,200 million transistors. Unlike the fully unlocked GeForce GTX 1080, which uses the same GPU but has all 2560 shaders enabled, NVIDIA has disabled some shading units on the Quadro P4000 to reach the product's target shader count. It features 1792 shading units, 112 texture mapping units and 64 ROPs. NVIDIA has placed 8,192 MB GDDR5 memory on the card, which are connected using a 256‐bit memory interface. The GPU is operating at a frequency of 1227 MHz, which can be boosted up to 1480 MHz, memory is running at 1502 MHz. We recommend the NVIDIA Quadro P4000 for gaming with highest details at resolutions up to, and including, 5760x1080. Being a single‐slot card, the NVIDIA Quadro P4000 draws power from 1x 6‐pin power connectors, with power draw rated at 105 W maximum. Display outputs include: 4x DisplayPort. Quadro P4000 is connected to the rest of the system using a PCIe 3.0 x16 interface. The card measures 241 mm in length, and features a single‐slot cooling solution. Graphics Processor Graphics Card GPU Name: GP104 Released: Feb 6th, 2017 Architecture: Pascal Production Active Status: Process Size: 16 nm Bus Interface: PCIe 3.0 x16 Transistors: 7,200
    [Show full text]
  • The Intel Microprocessors: Architecture, Programming and Interfacing Introduction to the Microprocessor and Computer
    Microprocessors (0630371) Fall 2010/2011 – Lecture Notes # 1 The Intel Microprocessors: Architecture, Programming and Interfacing Introduction to the Microprocessor and computer Outline of the Lecture Evolution of programming languages. Microcomputer Architecture. Instruction Execution Cycle. Evolution of programming languages: Machine language - the programmer had to remember the machine codes for various operations, and had to remember the locations of the data in the main memory like: 0101 0011 0111… Assembly Language - an instruction is an easy –to- remember form called a mnemonic code . Example: Assembly Language Machine Language Load 100100 ADD 100101 SUB 100011 We need a program called an assembler that translates the assembly language instructions into machine language. High-level languages Fortran, Cobol, Pascal, C++, C# and java. We need a compiler to translate instructions written in high-level languages into machine code. Microprocessor-based system (Micro computer) Architecture Data Bus, I/O bus Memory Storage I/O I/O Registers Unit Device Device Central Processing Unit #1 #2 (CPU ) ALU CU Clock Control Unit Address Bus The figure shows the main components of a microprocessor-based system: CPU- Central Processing Unit , where calculations and logic operations are done. CPU contains registers , a high-frequency clock , a control unit ( CU ) and an arithmetic logic unit ( ALU ). o Clock : synchronizes the internal operations of the CPU with other system components using clock pulsing at a constant rate (the basic unit of time for machine instructions is a machine cycle or clock cycle) One cycle A machine instruction requires at least one clock cycle some instruction require 50 clocks. o Control Unit (CU) - generate the needed control signals to coordinate the sequencing of steps involved in executing machine instructions: (fetches data and instructions and decodes addresses for the ALU).
    [Show full text]
  • UNIT 8B a Full Adder
    UNIT 8B Computer Organization: Levels of Abstraction 15110 Principles of Computing, 1 Carnegie Mellon University - CORTINA A Full Adder C ABCin Cout S in 0 0 0 A 0 0 1 0 1 0 B 0 1 1 1 0 0 1 0 1 C S out 1 1 0 1 1 1 15110 Principles of Computing, 2 Carnegie Mellon University - CORTINA 1 A Full Adder C ABCin Cout S in 0 0 0 0 0 A 0 0 1 0 1 0 1 0 0 1 B 0 1 1 1 0 1 0 0 0 1 1 0 1 1 0 C S out 1 1 0 1 0 1 1 1 1 1 ⊕ ⊕ S = A B Cin ⊕ ∧ ∨ ∧ Cout = ((A B) C) (A B) 15110 Principles of Computing, 3 Carnegie Mellon University - CORTINA Full Adder (FA) AB 1-bit Cout Full Cin Adder S 15110 Principles of Computing, 4 Carnegie Mellon University - CORTINA 2 Another Full Adder (FA) http://students.cs.tamu.edu/wanglei/csce350/handout/lab6.html AB 1-bit Cout Full Cin Adder S 15110 Principles of Computing, 5 Carnegie Mellon University - CORTINA 8-bit Full Adder A7 B7 A2 B2 A1 B1 A0 B0 1-bit 1-bit 1-bit 1-bit ... Cout Full Full Full Full Cin Adder Adder Adder Adder S7 S2 S1 S0 AB 8 ⁄ ⁄ 8 C 8-bit C out FA in ⁄ 8 S 15110 Principles of Computing, 6 Carnegie Mellon University - CORTINA 3 Multiplexer (MUX) • A multiplexer chooses between a set of inputs. D1 D 2 MUX F D3 D ABF 4 0 0 D1 AB 0 1 D2 1 0 D3 1 1 D4 http://www.cise.ufl.edu/~mssz/CompOrg/CDAintro.html 15110 Principles of Computing, 7 Carnegie Mellon University - CORTINA Arithmetic Logic Unit (ALU) OP 1OP 0 Carry In & OP OP 0 OP 1 F 0 0 A ∧ B 0 1 A ∨ B 1 0 A 1 1 A + B http://cs-alb-pc3.massey.ac.nz/notes/59304/l4.html 15110 Principles of Computing, 8 Carnegie Mellon University - CORTINA 4 Flip Flop • A flip flop is a sequential circuit that is able to maintain (save) a state.
    [Show full text]
  • Computer Organization & Architecture Eie
    COMPUTER ORGANIZATION & ARCHITECTURE EIE 411 Course Lecturer: Engr Banji Adedayo. Reg COREN. The characteristics of different computers vary considerably from category to category. Computers for data processing activities have different features than those with scientific features. Even computers configured within the same application area have variations in design. Computer architecture is the science of integrating those components to achieve a level of functionality and performance. It is logical organization or designs of the hardware that make up the computer system. The internal organization of a digital system is defined by the sequence of micro operations it performs on the data stored in its registers. The internal structure of a MICRO-PROCESSOR is called its architecture and includes the number lay out and functionality of registers, memory cell, decoders, controllers and clocks. HISTORY OF COMPUTER HARDWARE The first use of the word ‘Computer’ was recorded in 1613, referring to a person who carried out calculation or computation. A brief History: Computer as we all know 2day had its beginning with 19th century English Mathematics Professor named Chales Babage. He designed the analytical engine and it was this design that the basic frame work of the computer of today are based on. 1st Generation 1937-1946 The first electronic digital computer was built by Dr John V. Atanasoff & Berry Cliford (ABC). In 1943 an electronic computer named colossus was built for military. 1946 – The first general purpose digital computer- the Electronic Numerical Integrator and computer (ENIAC) was built. This computer weighed 30 tons and had 18,000 vacuum tubes which were used for processing.
    [Show full text]
  • Recommendation to Handle Bare Dies
    Recommendation to handle bare dies Rev. 1.3 General description This application note gives recommendations on how to handle bare dies* in Chip On Board (COB), Chip On Glass (COG) and flip chip technologies. Bare dies should not be handled as chips in a package. This document highlights some specific effects which could harm the quality and yield of the production. *separated piece(s) of semiconductor wafer that constitute(s) a discrete semiconductor or whole integrated circuit. International Electrotechnical Commission, IEC 62258-1, ed. 1.0 (2005-08). A dedicated vacuum pick up tool is used to manually move the die. Figure 1: Vacuum pick up tool and wrist-strap for ESD protection Delivery Forms Bare dies are delivered in the following forms: Figure 2: Unsawn wafer Application Note – Ref : APN001HBD1.3 FBC-0002-01 1 Recommendation to handle bare dies Rev. 1.3 Figure 3: Unsawn wafer in open wafer box for multi-wafer or single wafer The wafer is sawn. So please refer to the E-mapping file from wafer test (format: SINF, eg4k …) for good dies information, especially when it is picked from metal Film Frame Carrier (FFC). Figure 4: Wafer on Film Frame Carrier (FFC) Figure 5: Die on tape reel Figure 6: Waffle pack for bare die Application Note – Ref : APN001HBD1.3 FBC-0002-01 2 Recommendation to handle bare dies Rev. 1.3 Die Handling Bare die must be handled always in a class 1000 (ISO 6) clean room environment: unpacking and inspection, die bonding, wire bonding, molding, sealing. Handling must be reduced to the absolute minimum, un-necessary inspections or repacking tasks have to be avoided (assembled devices do not need to be handled in a clean room environment since the product is already well packed) Use of complete packing units (waffle pack, FFC, tape and reel) is recommended and remaining quantities have to be repacked immediately after any process (e.g.
    [Show full text]