Lec1:Arch-To-Layout

Total Page:16

File Type:pdf, Size:1020Kb

Lec1:Arch-To-Layout EE213 Digital Integrated Circuits II Lecture 01: From Architecture to Layout Prof. Pingqiang Zhou Chapter 4 Chapter Chapter 1 Chapter EE213 L01 From Architecture to Layout.1 Pingqiang, ShanghaiTech, 2017 Coping with Complexity How to design System-on-Chip or Microprocessor? Many millions (even billions!) of transistors Tens to hundreds of engineers The Core i7 die and major components. http://techreport.com/review/15818/intel-core-i7-processors Microcontroller-based system on a chip https://en.wikipedia.org/wiki/Atom_(system_on_chip) EE213 L01 From Architecture to Layout.2 Pingqiang, ShanghaiTech, 2017 Structured Design Hierarchy: Divide and Conquer Recursively system into modules Regularity Reuse modules wherever possible Ex: Standard cell library Modularity: well-formed interfaces Allows modules to be treated as black boxes Locality Physical and temporal EE213 L01 From Architecture to Layout.3 Pingqiang, ShanghaiTech, 2017 Design Abstraction Levels SYSTEM MODULE + GATE CIRCUIT Vin Vout DEVICE G S D n+ n+ EE213 L01 From Architecture to Layout.4 Pingqiang, ShanghaiTech, 2017 Design Abstractions Architecture: User’s perspective, what does it do? Instruction set, registers MIPS, x86, Alpha, PIC, ARM, … Microarchitecture Single cycle/multicycle, pipelined, superscalar? Ex: 80386, 80486, Pentium, Atom, AMD, Phenom, … Logic: how are functional blocks constructed Ripple carry, carry lookahead, carry select adders Circuit: how are transistors used Complementary CMOS, pass transistors, domino Physical: chip layout Objects: datapaths, memories, random logic EE213 L01 From Architecture to Layout.5 Pingqiang, ShanghaiTech, 2017 Gajski Y-Chart EE213 L01 From Architecture to Layout.6 Pingqiang, ShanghaiTech, 2017 MIPS Processor - An Example EE213 L01 From Architecture to Layout.7 Pingqiang, ShanghaiTech, 2017 1 - (MIPS) Architecture (Simple 32-bit RISC) Example: subset of MIPS processor architecture Drawn from Patterson & Hennessy, chapter 4 Consider 8-bit subset using 8-bit datapath Only implement 8 registers ($0 - $7) $0 hardwired to 00000000 8-bit program counter Interested in RISC? See http://baike.baidu.com/view/736344.htm EE213 L01 From Architecture to Layout.8 Pingqiang, ShanghaiTech, 2017 Instruction Set format example encoding 6 5 5 5 5 6 R add $rd, $ra, $rb 0 ra rb rd 0 funct 6 5 5 16 I beq $ra, $rb, imm op ra rb imm 6 26 J j dest op dest EE213 L01 From Architecture to Layout.9 Pingqiang, ShanghaiTech, 2017 Fibonacci (in C Language) f0 = 1; f-1 = -1 fn = fn-1 + fn-2 f = 1, 1, 2, 3, 5, 8, 13, … EE213 L01 From Architecture to Layout.10 Pingqiang, ShanghaiTech, 2017 Fibonacci (in Assembly Language) EE213 L01 From Architecture to Layout.11 Pingqiang, ShanghaiTech, 2017 Fibonacci (in Binary Code) 1st statement: addi $3, $0, 8. How do we translate this to machine language? format example encoding 6 5 5 5 5 6 R add $rd, $ra, $rb 0 ra rb rd 0 funct 6 5 5 16 I beq $ra, $rb, imm op ra rb imm 6 26 J j dest op dest EE213 L01 From Architecture to Layout.12 Pingqiang, ShanghaiTech, 2017 Fibonacci (in Binary Code) Machine language program EE213 L01 From Architecture to Layout.13 Pingqiang, ShanghaiTech, 2017 2 - (MIPS) Microarchitecture Multicycle marchitecture EE213 L01 From Architecture to Layout.14 Pingqiang, ShanghaiTech, 2017 Multicycle Controller EE213 L01 From Architecture to Layout.15 Pingqiang, ShanghaiTech, 2017 ALU Decoder Unit EE213 L01 From Architecture to Layout.16 Pingqiang, ShanghaiTech, 2017 3 - (MIPS) Logic Design Start at top level Hierarchically decompose MIPS into units Top-level interface 2-phase memread crystal ph1 memwrite clock oscillator generator ph2 MIPS 8 processor adr 8 external writedata reset memory 8 memdata EE213 L01 From Architecture to Layout.17 Pingqiang, ShanghaiTech, 2017 Top-Level MIPS Block Diagram EE213 L01 From Architecture to Layout.18 Pingqiang, ShanghaiTech, 2017 Hierarchical Design mips controller alucontrol datapath standard bitslice zipper cell library alu inv4x flop ramslice fulladder or2 and2 mux4 nor2 inv nand2 mux2 tri EE213 L01 From Architecture to Layout.19 Pingqiang, ShanghaiTech, 2017 HDLs Hardware Description Languages Widely used in logic design Verilog and VHDL Describe hardware using code Document logic functions Simulate logic before building Synthesize code into gates and layout - Requires a library of standard cells EE213 L01 From Architecture to Layout.20 Pingqiang, ShanghaiTech, 2017 Verilog Example – Full Adder mips controller alucontrol datapath standard bitslice8-Bit Adderzipper in ALU cell library alu inv4x flop ramslice fulladder or2 and2 mux4 nor2 inv nand2 mux2 tri EE213 L01 From Architecture to Layout.21 Pingqiang, ShanghaiTech, 2017 Verilog Example – Full Adder module fulladder(input a, b, c, output s, cout); sum s1(a, b, c, s); carry c1(a, b, c, cout); endmodule module carry(input a, b, c, output cout) assign cout = (a&b) | (a&c) | (b&c); endmodule EE213 L01 From Architecture to Layout.22 Pingqiang, ShanghaiTech, 2017 4 – (MIPS) Circuit Design Circuit design is concerned with arranging transistors to perform a particular logic function. How should logic be implemented? NANDs and NORs vs. ANDs and ORs? Fan-in and fan-out? How wide should transistors be? These choices affect speed, area, power Logic synthesis makes these choices for you Good enough for many applications Hand-crafted circuits are still better EE213 L01 From Architecture to Layout.23 Pingqiang, ShanghaiTech, 2017 Transistors as Switches Can view MOS transistors as electrically-controlled switches Source Gate Drain Source Gate Drain Polysilicon Polysilicon SiO2 SiO2 1 n+ n+ p+ p+ S D p bulk Si n bulk Si g = 0 g = 1 d d d OFF nMOS g ON s s s d d d OFF pMOS g ON s s s EE213 L01 From Architecture to Layout.24 Pingqiang, ShanghaiTech, 2017 CMOS Inverter A Y VDD 0 1 1 0 OFFON 01 A Y ONOFF A Y GND EE213 L01 From Architecture to Layout.25 Pingqiang, ShanghaiTech, 2017 CMOS NAND Gate A B Y 0 0 1 OFFON ONOFF 0 1 1 Y 01 1 0 1 A OFFON 1 1 0 10 B ONOFF EE213 L01 From Architecture to Layout.26 Pingqiang, ShanghaiTech, 2017 Complementary CMOS Complementary CMOS logic gates pMOS pull-up nMOS pull-down network network pMOS pull-up network inputs output a.k.a. static CMOS nMOS pull-down network Pull-up OFF Pull-up ON Pull-down OFF Z (float) 1 Pull-down ON 0 X (crowbar) EE213 L01 From Architecture to Layout.27 Pingqiang, ShanghaiTech, 2017 Series and Parallel a a a a a 0 0 1 1 nMOS: 1 = ON g1 g2 0 1 0 1 pMOS: 0 = ON b b b b b (a) OFF OFF OFF ON Series: both must be ON a a a a a 0 0 1 1 g1 g2 Parallel: either can be ON 0 1 0 1 b b b b b (b) ON OFF OFF OFF a a a a a g1 g2 0 0 0 1 1 0 1 1 b b b b b (c) OFF ON ON ON a a a a a g1 g2 0 0 0 1 1 0 1 1 b b b b b (d) ON ON ON OFF EE213 L01 From Architecture to Layout.28 Pingqiang, ShanghaiTech, 2017 Conduction Complement Complementary CMOS gates always produce 0 or 1 Ex: NAND gate Series nMOS: Y=0 when both inputs are 1 Thus Y=1 when either input is 0 Y Requires parallel pMOS A B Rule of Conduction Complements Pull-up network is complement of pull-down Parallel -> series, series -> parallel EE213 L01 From Architecture to Layout.29 Pingqiang, ShanghaiTech, 2017 CMOS NOR Gate A B Y 0 0 1 A 0 1 0 1 0 0 B 1 1 0 Y EE213 L01 From Architecture to Layout.30 Pingqiang, ShanghaiTech, 2017 Exercises - Compound Gates EE213 L01 From Architecture to Layout.31 Pingqiang, ShanghaiTech, 2017 Example: Carry Logic assign cout = (a&b) | (a&c) | (b&c); g1 a x a p1 b p2 b p4 b i4 c p3 i3 a p5 p6 g2 cn g4 cout a y cout c n3 i1 a n5 n6 c i2 g3 a n1 b n2 b n4 b z c Transistors? Gate Delays? EE213 L01 From Architecture to Layout.32 Pingqiang, ShanghaiTech, 2017 Gate-level Netlist module carry(input a, b, c, output cout) wireg1 x, y, z; a x b andg2 g1(x, g4a, b); a y and g2(y, a, coutc); c andg3 g3(z, b, c); b or zg4(cout, x, y, z); endmodulec This is a technology-independent structural description, because generic gates have been used and the actual gate implementations have not been specified. EE213 L01 From Architecture to Layout.33 Pingqiang, ShanghaiTech, 2017 Transistor-Level Netlist module carry(input a, b, c, output cout) wire i1, i2, i3, i4, cn; tranif1 n1(i1, 0, a); tranif1 n2(i1, 0, b); a p1 b p2 b p4 tranif1 n3(cn, i1, c); i4 c p3 i3 a p5 p6 tranif1 n4(i2, 0, b); cn tranif1 n5(cn, i2, a); cout c n3 i1 a n5 n6 tranif0 p1(i3, 1, a); i2 tranif0 p2(i3, 1, b); a n1 b n2 b n4 tranif0 p3(cn, i3, c); tranif0 p4(i4, 1, b); tranif0 p5(cn, i4, a); tranif1 n6(cout, 0, cn); tranif0 p6(cout, 1, cn); endmodule tranif1 corresponds to nMOS transistors that turn ON when the gate is 1 while tranif0 corresponds to pMOS transistors that turn ON when the gate is 0. EE213 L01 From Architecture to Layout.34 Pingqiang, ShanghaiTech, 2017 SPICE Netlist .SUBCKT CARRY A B C COUT VDD GND MN1 I1 A GND GND NMOS W=1U L=0.18U AD=0.3P AS=0.5P MN2 I1 B GND GND NMOS W=1U L=0.18U AD=0.3P AS=0.5P MN3 CN C I1 GND NMOS W=1U L=0.18U AD=0.5P AS=0.5P MN4 I2 B GND GND NMOS W=1U L=0.18U AD=0.15P AS=0.5P MN5 CN A I2 GND NMOS W=1U L=0.18U AD=0.5P AS=0.15P MP1 I3 A VDD VDD PMOS W=2U L=0.18U AD=0.6P AS=1 P MP2 I3 B VDD VDD PMOS W=2U L=0.18U AD=0.6P AS=1P MP3 CN C I3 VDD PMOS W=2U L=0.18U AD=1P AS=1P MP4 I4 B VDD VDD PMOS W=2U L=0.18U aAD=0.3Pp1 AS=1Pb p2 b p4 MP5 CN A I4 VDD PMOS W=2U L=0.18U AD=1P AS=0.3P i4 c p3 i3 a p5 p6 MN6 COUT CN GND GND NMOS W=2U L=0.18U AD=1P AS=1P cn MP6 COUT CN VDD VDD PMOS W=4U L=0.18U AD=2P AS=2P cout CI1 I1 GND 2FF c n3 i1 a n5 n6 CI3 I3 GND 3FF i2 CA A GND 4FF a n1 b n2 b n4 CB B GND 4FF CC C GND 2FF CCN CN GND 4FF CCOUT COUT GND 2FF .ENDS Transistors are specified by lines beginning with an M as follows: Capacitors are specified by lines beginning with C as follows: EE213 L01 From Architecture to Layout.35 Pingqiang, ShanghaiTech, 2017 5 – (MIPS) Physical Design Layout Schematic Routing Floorplan Placement “Electronic Design Automation: Synthesis, Verification, and Test (Systems on Silicon),” by Laung-Terng Wang, Yao-Wen Chang, and Kwang-Ting Cheng, Morgan Kaufmann Publishing, 2009.
Recommended publications
  • Super 7™ Motherboard
    SY-5EH5/5EHM V1.0 Super 7Ô Motherboard ************************************************ Pentium® Class CPU supported ETEQ82C663 PCI/AGP Motherboard AT Form Factor ************************************************ User's Guide & Technical Reference NSTL “Year 2000 Test” Certification Letter September 23, 1998 Testing Date: September 23, 1998 Certification Date: September 23, 1998 Certification Number: NCY2000-980923-004 To Whom It May Concern: We are please to inform you that the “SY-5EHM/5EH5” system has passed NSTL Year 2000 certification test program. The Year 2000 test program tests a personal computer for its ability to support the year 2000. The “SY-5EHM/5EH5: system is eligible to carry the NSTL :Year 2000 Certification” seal. The Year 2000 certification test has been done under the following system configuration: Company Name : SOYO COMPUTER INC. System Model Name : SY-5EHM/5EH5 Hardware Revision : N/A CPU Model : Intel Pentium 200/66Mhz On Board Memory/L2 Cache : PC100 SDRAM DIMM 32MBx1 /1MB System BIOS : Award Modular BIOS V4.51PG, An Energy Star Ally Copyright © 1984—98, EH-1A6,07/15/1998-VP3-586B- 8669-2A5LES2AC-00 Best regards, SPORTON INTERNATIONAL INC. Declaration of Conformity According to 47 CFR, Part 2 and 15 of the FCC Rules Declaration No.: D872907 July.10 1998 The following designated product EQUIPMENT: Main Board MODEL NO.: SY-5EH Which is the Class B digital device complies with 47 CFR Parts 2 and 15 of the FCC rules. Operation is subject to the following two conditions : (1) this device may not cause harmful interference, and (2) this device must accept any interference received, including interference that may cause undesired operation.
    [Show full text]
  • Open Final Thesis2.Pdf
    The Pennsylvania State University The Graduate School College of Engineering ARCHITECTURAL TECHNIQUES TO ENABLE RELIABLE AND HIGH PERFORMANCE MEMORY HIERARCHY IN CHIP MULTI-PROCESSORS A Dissertation in Computer Science and Engineering by Amin Jadidi © 2018 Amin Jadidi Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy August 2018 The dissertation of Amin Jadidi was reviewed and approved∗ by the following: Chita R. Das Head of the Department of Computer Science and Engineering Dissertation Advisor, Chair of Committee Mahmut T. Kandemir Professor of Computer Science and Engineering John Sampson Assistant Professor of Computer Science and Engineering Prasenjit Mitra Professor of Information Sciences and Technology ∗Signatures are on file in the Graduate School. ii Abstract Constant technology scaling has enabled modern computing systems to achieve high degrees of thread-level parallelism, making the design of a highly scalable and dense memory hierarchy a major challenge. During the past few decades SRAM has been widely used as the dominant technology to build on-chip cache hierarchies. On the other hand, for the main memory, DRAM has been exploited to satisfy the applications demand. However, both of these two technologies face serious scalability and power consumption problems. While there has been enormous research work to address the drawbacks of these technologies, researchers have also been considering non-volatile memory technologies to replace SRAM and DRAM in future processors. Among different non-volatile technologies, Spin-Transfer Torque RAM (STT-RAM) and Phase Change Memory (PCM) are the most promising candidates to replace SRAM and DRAM technologies, respectively. Researchers believe that the memory hierarchy in future computing systems will consist of a hybrid combination of current technologies (i.e., SRAM and DRAM) and non-volatile technologies (e.g., STT-RAM, and PCM).
    [Show full text]
  • Communication Theory II
    Microprocessor (COM 9323) Lecture 2: Review on Intel Family Ahmed Elnakib, PhD Assistant Professor, Mansoura University, Egypt Feb 17th, 2016 1 Text Book/References Textbook: 1. The Intel Microprocessors, Architecture, Programming and Interfacing, 8th edition, Barry B. Brey, Prentice Hall, 2009 2. Assembly Language for x86 processors, 6th edition, K. R. Irvine, Prentice Hall, 2011 References: 1. Computer Architecture: A Quantitative Approach, 5th edition, J. Hennessy, D. Patterson, Elsevier, 2012. 2. The 80x86 Family, Design, Programming and Interfacing, 3rd edition, Prentice Hall, 2002 3. The 80x86 IBM PC and Compatible Computers, Assembly Language, Design, and Interfacing, 4th edition, M.A. Mazidi and J.G. Mazidi, Prentice Hall, 2003 2 Lecture Objectives 1. Provide an overview of the various 80X86 and Pentium family members 2. Define the contents of the memory system in the personal computer 3. Convert between binary, decimal, and hexadecimal numbers 4. Differentiate and represent numeric and alphabetic information as integers, floating-point, BCD, and ASCII data 5. Understand basic computer terminology (bit, byte, data, real memory system, protected mode memory system, Windows, DOS, I/O) 3 Brief History of the Computers o1946 The first generation of Computer ENIAC (Electrical and Numerical Integrator and Calculator) was started to be used based on the vacuum tube technology, University of Pennsylvania o1970s entire CPU was put in a single chip. (1971 the first microprocessor of Intel 4004 (4-bit data bus and 2300 transistors and 45 instructions) 4 Brief History of the Computers (cont’d) oLate 1970s Intel 8080/85 appeared with 8-bit data bus and 16-bit address bus and used from traffic light controllers to homemade computers (8085: 246 instruction set, RISC*) o1981 First PC was introduced by IBM with Intel 8088 (CISC**: over 20,000 instructions) microprocessor oMotorola emerged with 6800.
    [Show full text]
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]
  • Brainiacs, Speed Demons, and Farewell; 12/29/1999 Page 1 of 2
    Brainiacs, Speed Demons, and Farewell; 12/29/1999 Page 1 of 2 Client Login Search MDR Home Vol 13, Issue 17 December 27, 1999 Brainiacs, Speed Demons, and Farewell Some Vendors Learn Later Than Others That Clock Speed Drives Performance As my final editorial for this august publication, I would like to reflect on how the industry has changed--and in some ways stayed the same--since one of my earliest editorials, discussing Brainiacs and Speed Demons (see MPR 3/8/93, p. 3). At that time, Digital's brand-new Alpha line, HP's PA-RISC, and the MIPS R4000 strove for high clock speeds, while IBM (Power), Sun (SuperSparc), and Motorola (88110) focused on high-IPC (instruction per cycle) designs. In 1993, Speed Demons used simple scalar or two-issue designs running at 100 to 200 MHz in state-of-the-art 0.8-micron IC processes; Brainiacs could issue three or four instructions per cycle but at no more than 66 MHz. In the subsequent seven years, better IC processes have greatly improved both the IPC and the cycle time of microprocessors, leading some vendors to claim to deliver the best of both worlds. But a chip becomes a Speed Demon through microarchitecture design philosophy, not IC process gains. The Speed Demon philosophy is best summed up by an Alpha designer who said that a processor's cycle time should be the minimum required to cycle an ALU and pass the result to the next instruction. The processor can implement any amount of complexity so long as it doesn't compromise this primary goal of ultimate speed.
    [Show full text]
  • Table of Contents
    Table Of Contents TABLE OF CONTENTS 1. INTRODUCTION 1.1. PREFACE ................................................................................. 1-1 1.2. KEY FEATURES ....................................................................... 1-1 1.3. PERFORMANCE LIST .............................................................. 1-3 1.4. BLOCK DIAGRAM..................................................................... 1-4 1.5. INTRODUCE THE PCI - BUS .................................................... 1-5 1.6. FEATURES ............................................................................... 1-5 1.7. What is AGP ............................................................................. 1-6 2. SPECIFICATION 2.1. HARDWARE ............................................................................. 2-1 2.2. SOFTWARE.............................................................................. 2-2 2.3. ENVIRONMENT ........................................................................ 2-2 3. HARDWARE INSTALLATION 3.1. UNPACKING............................................................................. 3-1 3.2. MAINBOARD LAYOUT.............................................................. 3-2 3.3. QUICK REFERENCE FOR JUMPERS & CONNECTORS .......... 3-3 3.4. SRAM INSTALLATION DRAM INSTALLATION.......................... 3-5 3.5. DRAM INSTALLATION.............................................................. 3-5 3.6. CPU INSTALLATION AND JUMPERS SETUP........................... 3-5 3.7. CMOS RTC & ISA CFG CMOS SRAM......................................
    [Show full text]
  • High Performance Embedded Computing in Space
    High Performance Embedded Computing in Space: Evaluation of Platforms for Vision-based Navigation George Lentarisa, Konstantinos Maragosa, Ioannis Stratakosa, Lazaros Papadopoulosa, Odysseas Papanikolaoua and Dimitrios Soudrisb National Technical University of Athens, 15780 Athens, Greece Manolis Lourakisc and Xenophon Zabulisc Foundation for Research and Technology - Hellas, POB 1385, 71110 Heraklion, Greece David Gonzalez-Arjonad GMV Aerospace & Defence SAU, Tres Cantos, 28760 Madrid, Spain Gianluca Furanoe European Space Agency, European Space Technology Centre, Keplerlaan 1 2201AZ Noordwijk, The Netherlands Vision-based navigation has become increasingly important in a variety of space ap- plications for enhancing autonomy and dependability. Future missions, such as active debris removal for remediating the low Earth orbit environment, will rely on novel high-performance avionics to support advanced image processing algorithms with sub- stantial workloads. However, when designing new avionics architectures, constraints relating to the use of electronics in space present great challenges, further exacerbated by the need for significantly faster processing compared to conventional space-grade central processing units. With the long-term goal of designing high-performance em- bedded computers for space, in this paper, an extended study and trade-off analysis of a diverse set of computing platforms and architectures (i.e., central processing units, a Research Associate, School of Electrical and Computer Engineering, NTUA, Athens. [email protected] b Associate Professor, School of Electrical and Computer Engineering, NTUA, Athens. [email protected] c Principal Researcher, Institute of Computer Science, FORTH, Heraklion. [email protected] d Avionics and On-Board Software Engineer, GMV, Madrid. e On-Board Computer Engineer, Microelectronics & Data Systems Division, ESA ESTEC, Noordwijk.
    [Show full text]
  • Hwacha Vector-Fetch Architecture Manual, Version 3.8.1
    The Hwacha Vector-Fetch Architecture Manual, Version 3.8.1 Yunsup Lee Colin Schmidt Albert Ou Andrew Waterman Krste Asanović Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2015-262 http://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-262.html December 19, 2015 Copyright © 2015, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. The Hwacha Vector-Fetch Architecture Manual Version 3.8.1 Yunsup Lee, Colin Schmidt, Albert Ou, Andrew Waterman, Krste Asanovic´ CS Division, EECS Department, University of California, Berkeley fyunsup|colins|aou|waterman|[email protected] December 19, 2015 Contents 1 Introduction 4 2 Data-Parallel Assembly Programming Models 5 2.1 Packed SIMD Assembly Programming Model . 5 2.2 SIMT Assembly Programming Model . 6 2.3 Traditional Vector Assembly Programming Model . 7 3 Hwacha Vector-Fetch Assembly Programming Model 9 4 Hwacha Instruction Set Architecture 12 5 Control Thread Instructions 13 5.1 Vector Configuration Instructions . 13 5.2 Vector Fetch Instruction . 14 5.3 Vector Move Instructions . 14 6 Worker Thread Instructions 15 6.1 Vector Strided, Strided-Segment Memory Instructions . 15 6.2 Vector Indexed Memory Instructions . 17 6.3 Vector Atomic Memory Instructions .
    [Show full text]
  • Motherboard/Processor/Memory Different Kinds of SLOTS Different
    Motherboard/Processor/Memory There are different slots and sockets for CPUs, and it is necessary for a motherboard to have the appropriate slot or socket for the CPU. Most sockets are square. Several precautions are taken to ensure that both the socket and the processor are indicated to ensure proper orientation. The processor is usually marked with a dot or a notch in the corner that is intended to go into the marked corner of the socket. Sockets and slots on the motherboard are as plentiful and varied as processors. The three most popular are the Socket 5 and Socket 7, and the Single Edge Contact Card (SECC). Socket 5 and Socket 7 CPU sockets are basically flat and have several rows of holes arranged in a square. The SECC connectors are of two types: slot 1 & slot 2. Design No of Pins Pin Rows Voltage Mobo Class Processor's supported Socket 1 169 3 5 Volts 486 80486SX, 80486DX, 80486DX2, 80486DX4 Socket 2 238 4 5 Volts 486 80486SX, 80486DX, 80486DX2, 80486DX4 Socket 3 237 4 5 / 3.3 Volts 486 80486SX, 80486DX, 80486DX2, 80486DX4 Socket 4 273 4 5 Volts 1st Generation Pentium Pentium 60-66, Pentium OverDrive Socket 5 320 5 3.3 Volts Pentium Pentium 75-133 MHz, Pentium OverDrive Socket 6 235 4 3.3 Volts 486 486DX4, Pentium OverDrive Design No of Pins Pin Rows Voltage Mobo Class Processor's supported Socket 7 321 5 2.5 / 3.3Volts Pentium 75-200 MHz, OverDrive, Pentium MMX Socket 8 387 5 (dual pattern) 3.1 / 3.3Volts Pentium Pro Pentium Pro OverDrive, Pentium II OverDrive Intel Slot 1 242 N/a 2.8 / 3.3Volts Pentium Pro / Pentium II Pentium II, Pentium Pro, Celeron.
    [Show full text]
  • Intel Server Boards S3000AHLX, S3000AH, and S3000AHV
    Intel® Server Boards S3000AHLX, S3000AH, and S3000AHV Technical Product Specification Intel Order Number: D72579-003 Revision 1.4 November 2008 Enterprise Platforms and Services Division Revision History Intel® Server Boards S3000AHLX, S3000AH, and S3000AHV TPS Revision History Date Revision Modifications Number August 2006 1.0 Initial external release. December 1.1 − Removed AHCI from BIOS setup menu. 2006 − Added “SPI/FWH Selection Header” section. − Revised memory section. − Updated S3000AH and S3000AHV SKU NIC controller from 82573V to 82573E. − Updated “Clear CMOS and System Maintenance Mode Jumpers” section. − Added Intel® Matrix Storage Technology feature support. March 2008 1.2 − Added 1333FSB processor support notification(page 16) − Added notification for SATA device when enabling RAID option(page 65) April 2008 1.3 - Added reference documents and correct error November 1.4 - Corrected MCH Memory Sub-System Overview section. 2008 - Modified Server Board Mechanical Drawings PCI slot voltage error. ii Revision 1.2 Intel® Server Boards S3000AHLX, S3000AH, and S3000AHV TPS Disclaimers Disclaimers Information in this document is provided in connection with Intel® products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications.
    [Show full text]
  • Rise Mp6 Microprocessor
    Rise™ mP6™ Microprocessor Low Power 2.0 V in .18 um Process Technology Doc#: MKTD6510.0 Version 0.0 Data Sheet ™ mmPP66 MMICROPROCESSOR ™ FOR WWINDOWS SSYSTEMS The Rise™ mP6™ processor is a sixth generation processor optimized for low–power, high-performance multimedia Windows™ applications. The innovative Rise™ mP6™ processor is the first superscalar, superpipelined, Pentium® MMX* compatible processor featuring 3 integer units, 3–way superscalar MMX technology, and a fully pipelined floating point unit. The innovative circuitry of the Rise™ mP6™ processor maximizes processing per clock cycle while requiring minimal power consumption – providing an ideal choice for cost–effective, power–efficient desktop and mobile Windows* 95, Windows* 98, and Windows NT* systems. ✝ RATED PERFORMANCE 366 MHZ 333 MHZ HOST BUS FREQUENCY 100 MHz 95 MHz CPU/HOST BUS RATIO 2.5:1 2.5:1 • x86 Instruction Set Enhanced with • Superscalar, Superpipelined MMX™ Technology Architecture − Superscalar MMX* Execution and Three • Pin Compatible with the Intel Pentium, Superpipelined Integer Units AMD* K6, AMD K6-2, and Cyrix MII* − Pipelined Floating Point Unit Processors (Socket 7) • − 296 Pin BPGA “Socket 7” Package Advanced Architectural Features − 387 Ball T2BGA® “Socket 7-like” Package − Advanced Data Dependency Removal − Split Voltage Planes Techniques − Innovative Instruction Decode and Branch • Advanced Power Management and Prediction Power Reduction Features − Dynamic Allocation of Resources − SMM Compatible • IEEE 1149.1 Boundary Scan − Clock Control − IEEE
    [Show full text]
  • 1. Introduction 1.1
    5AX 1. INTRODUCTION 1.1. PREFACE Welcome to use the 5AX motherboard. The motherboard is a Pipeline 512 KB CACHE Pentiumâ Processor based PC / AT compatible system with ISA bus and PCI Local Bus, and has been designed to be the fastest PC / AT system. There are some new features allow you to operate the system with the performance you want. This manual also explains how to install the motherboard for operation, and how to set up your CMOS CONFIGURATION with BIOS SETUP program. 1.2. KEY FEATURES q Pentiumâ Processor based PC / AT compatible mainboard with PCI / ISA / AGP Bus. q 5 PCI Bus slots, 2 ISA Bus slots, 1 AGP slot. q Supports : · Pentiumâ Processor 133 / 166 / 200 MHz ; MMX (166 / 200 / 233), K6-(166 / 200 / 233 / 266 / 300); · AMD K6-2 ( 266 / 300 / 333 / 350 / 380 / 400 / 450 / 475 / 500 / 550 ) K6-2+(450) K6-III(400 / 450 / 475 / 500 / 550) · Cyrix/IBM 6x86MX ( PR166 / PR200 / PR233 / PR266) ; MII-PR300 / PR333 / PR366 / PR400 Winchip 2-(200 / 225 / 233 / 266 / 300) · IDT Winchip 3-266 · RISE MP6-266 q Supports true 64 bits CACHE and DRAM access mode. q Supports 321 Pins (Socket 7) ZIF white socket on board. nd q Supports 512 KB Pipeline Burst Sync. 2 Level Cache. Introduction q CPU L1 / L2 Write-Back cache operation. q Supports 8 - 768 MB DRAM memory on board. q Supports 3*168 pin 64/72 Bit DIMM module. q Supports 2-channel Enhanced PCI IDE ports for 4 IDE Devices. q Supports 2*COM (16550), 1*LPT (EPP / ECP), 1*1.44MB Floppy port.
    [Show full text]