Ram K. Krishnamurthy Senior Principal Engineer
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
On the Hardware Reduction of Z-Datapath of Vectoring CORDIC
R. Stapenhurst*, K. Maharatna**, J. Mathew*, J.L. Nunez-Yanez* and D. K. Pradhan*
*University of Bristol, Bristol, UK; **University of Southampton, Southampton, UK
[email protected]
Abstract: In this article we present a novel design of a hardware optimal vectoring CORDIC processor. We present a mathematical theory to show that using bipolar binary notation it is possible to eliminate all the arithmetic computations required along the z-datapath. Using this technique it is possible to achieve three and 1.5 times reduction in the number of registers and adder respectively compared to classical CORDIC. Following this, a 16-bit vectoring CORDIC is designed for the application in Synchronizer for IEEE 802.11a standard. The total area and dynamic power consumption of the processor is 0.14 mm² and 700 μW respectively when synthesized in 0.18 μm CMOS library which shows its effectiveness as a low-area low-power processor.
... wordlength larger than 18-bits the hardware requirement of it becomes more than the classical CORDIC.
In this particular work we propose a formulation to eliminate all the arithmetic operations along the z-datapath for conventional two-sided vector rotation and thereby reducing the hardware while increasing the accuracy. Also the resulting architecture shows significant hardware saving as the wordlength increases. Although we stick to the 2’s complement number system, without loss of generality, this formulation can be adopted easily for redundant arithmetic and higher radix formulation. A 16-bit processor developed following this formulation requires 0.14 mm² area and consumes 700 μW dynamic power when synthesized in 0.18 μm CMOS library.
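For orientation, the classical vectoring CORDIC that the excerpt uses as its baseline can be sketched as follows. This is not the paper's z-datapath-free formulation; it is the textbook iteration, written in floating point rather than fixed-point hardware arithmetic, and the function name and the 16-iteration default are assumptions made for illustration. The `z` update inside the loop is the z-datapath arithmetic the paper sets out to eliminate.

```python
import math

def cordic_vectoring(x, y, iterations=16):
    """Classical vectoring-mode CORDIC (illustrative sketch, assumes x > 0).

    Rotates (x, y) onto the x-axis with shift-and-add micro-rotations and
    accumulates the total rotation angle in z.
    """
    z = 0.0
    for i in range(iterations):
        d = 1 if y < 0 else -1          # rotation direction that drives y toward 0
        shift = 2.0 ** -i               # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)       # z-datapath: one add/subtract per iteration
    # Each micro-rotation scales the vector by sqrt(1 + 2^-2i); undo that gain.
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(iterations))
    return x / gain, z

magnitude, angle = cordic_vectoring(3.0, 4.0)
print(magnitude, angle)  # close to 5.0 and atan2(4, 3)
```

In a hardware implementation the `math.atan(shift)` values would come from a small lookup table, which is part of the z-datapath cost the excerpt refers to.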
18-447 Computer Architecture Lecture 6: Multi-Cycle and Microprogrammed Microarchitectures
Prof. Onur Mutlu, Carnegie Mellon University, Spring 2015, 1/28/2015
Agenda for Today & Next Few Lectures
- Single-cycle Microarchitectures
- Multi-cycle and Microprogrammed Microarchitectures
- Pipelining
- Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, …
- Out-of-Order Execution
- Issues in OoO Execution: Load-Store Handling, …
Reminder on Assignments
- Lab 2 due next Friday (Feb 6)
  - Start early!
- HW 1 due today
- HW 2 out
- Remember that all is for your benefit
  - Homeworks, especially so
  - All assignments can take time, but the goal is for you to learn very well
Lab 1 Grades
[Histogram: number of students vs. grade]
- Mean: 88.0
- Median: 96.0
- Standard Deviation: 16.9
Extra Credit for Lab Assignment 2
- Complete your normal (single-cycle) implementation first, and get it checked off in lab.
- Then, implement the MIPS core using a microcoded approach similar to what we will discuss in class.
- We are not specifying any particular details of the microcode format or the microarchitecture; you can be creative.
- For the extra credit, the microcoded implementation should execute the same programs that your ordinary implementation does, and you should demo it by the normal lab deadline.
- You will get maximum 4% of course grade
- Document what you have done and demonstrate well
Readings for Today
- P&P, Revised Appendix C
  - Microarchitecture of the LC-3b
  - Appendix A (LC-3b ISA) will be useful in following this
- P&H, Appendix D
  - Mapping Control to Hardware
- Optional
  - Maurice Wilkes, “The Best Way to Design an Automatic Calculating Machine,” Manchester Univ.
SIMD Extensions
PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 12 May 2012 17:14:46 UTC
Contents
Articles: SIMD; MMX (instruction set); 3DNow!; Streaming SIMD Extensions; SSE2; SSE3; SSSE3; SSE4; SSE5; Advanced Vector Extensions; CVT16 instruction set; XOP instruction set
References: Article Sources and Contributors; Image Sources, Licenses and Contributors
Article Licenses: License
SIMD
Flynn's taxonomy:
                 Single instruction    Multiple instruction
Single data      SISD                  MISD
Multiple data    SIMD                  MIMD
Single instruction, multiple data (SIMD) is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data simultaneously. Thus, such machines exploit data level parallelism.
History
The first use of SIMD instructions was in vector supercomputers of the early 1970s such as the CDC Star-100 and the Texas Instruments ASC, which could operate on a vector of data with a single instruction. Vector processing was especially popularized by Cray in the 1970s and 1980s. Vector-processing architectures are now considered separate from SIMD machines, based on the fact that vector machines processed the vectors one word at a time through pipelined processors (though still based on a single instruction), whereas modern SIMD machines process all elements of the vector simultaneously.[1]
The first era of modern SIMD machines was characterized by massively parallel processing-style supercomputers such as the Thinking Machines CM-1 and CM-2. These machines had many limited-functionality processors that would work in parallel.
Datapath Design I Systems I
Systems I: Datapath Design I
Topics
- Sequential instruction execution cycle
- Instruction mapping to hardware
- Instruction decoding
Overview
How do we build a digital computer?
- Hardware building blocks: digital logic primitives
- Instruction set architecture: what HW must implement
Principled approach
- Hardware designed to implement one instruction at a time
  - Plus connect to next instruction
- Decompose each instruction into a series of steps
  - Expect that most steps will be common to many instructions
Extend design from there
- Overlap execution of multiple instructions (pipelining)
  - Later in this course
- Parallel execution of many instructions
  - In more advanced computer architecture course
Y86 Instruction Set
Instruction encodings (bytes 0 to 5):
  nop                 0 0
  halt                1 0
  rrmovl rA, rB       2 0   rA rB
  irmovl V, rB        3 0   8 rB   V
  rmmovl rA, D(rB)    4 0   rA rB  D
  mrmovl D(rB), rA    5 0   rA rB  D
  OPl rA, rB          6 fn  rA rB
  jXX Dest            7 fn  Dest
  call Dest           8 0   Dest
  ret                 9 0
  pushl rA            A 0   rA 8
  popl rA             B 0   rA 8
Function codes for OPl: addl 6 0, subl 6 1, andl 6 2, xorl 6 3
Function codes for jXX: jmp 7 0, jle 7 1, jl 7 2, je 7 3, jne 7 4, jge 7 5, jg 7 6
Building Blocks
Combinational Logic
- Compute Boolean functions of inputs
- Continuously respond to input changes
- Operate on data and implement control
[Figure: ALU with inputs A and B and function select fun; MUX with inputs 0 and 1]
Storage Elements
- Store bits
- Addressable memories
- Non-addressable registers
- Loaded only as clock rises
[Figure: register file with read ports A (srcA, valA) and B (srcB, valB), write port W (dstW, valW), and Clock]
Hardware Control Language
- Very simple hardware description language
- Can only express limited aspects of hardware operation
  - Parts we want to explore and modify
Data
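The instruction decoding step named in the topics can be illustrated with a small sketch built from the encoding table in the excerpt. It assumes that byte 0 packs the instruction code in its high nibble and the function code in its low nibble, and that byte 1 packs rA and rB the same way; the function and dictionary names are hypothetical.

```python
# Instruction codes taken from the Y86 encoding table in the excerpt.
ICODES = {
    0x0: "nop", 0x1: "halt", 0x2: "rrmovl", 0x3: "irmovl",
    0x4: "rmmovl", 0x5: "mrmovl", 0x6: "OPl", 0x7: "jXX",
    0x8: "call", 0x9: "ret", 0xA: "pushl", 0xB: "popl",
}
OPL_FUNCTIONS = {0: "addl", 1: "subl", 2: "andl", 3: "xorl"}
JXX_FUNCTIONS = {0: "jmp", 1: "jle", 2: "jl", 3: "je", 4: "jne", 5: "jge", 6: "jg"}

def decode_first_byte(byte0):
    """Return the mnemonic selected by the first instruction byte."""
    icode, ifun = byte0 >> 4, byte0 & 0xF   # assumed nibble order: icode, then ifun
    if icode == 0x6:
        return OPL_FUNCTIONS[ifun]
    if icode == 0x7:
        return JXX_FUNCTIONS[ifun]
    return ICODES[icode]

def decode_register_byte(byte1):
    """Split the register specifier byte into (rA, rB)."""
    return byte1 >> 4, byte1 & 0xF

print(decode_first_byte(0x61))      # subl
print(decode_register_byte(0x13))   # (1, 3)
```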
Treating a Company with a Crisis Manager
SEPTEMBER-OCTOBER 2016, Volume VIII. A magazine about economics, business and society. Price: €2.20
TREATING A COMPANY WITH A CRISIS MANAGER: The difficult road from red numbers to black
Trends and challenges in European logistics
The company blog: do it properly or not at all
It happened; you will fix it with our property insurance. MÔJ DOMOV property insurance: face with courage all the unforeseen situations that threaten your home. Môj domov, the most awarded property insurance, will resolve them for you quickly and fairly. allianzsp.sk, Infoline 0800 122 222
EDUCATION
Let us capture every talent, because Europe needs them
Great potential is hidden in children, but too often it remains unused. This is something the European Union simply cannot afford: a waste of human resources that makes people unhappy and is also a collective failure.
I do not mean only a gift for study. It is time to recognize a wide range of talent and skills. The work of the American researcher Howard Gardner, who identified many kinds of intelligence, from interpersonal to musical, from spatial to linguistic, logical or intrapersonal, is well known. Perhaps everyone agrees that talent is too often judged by fixed criteria that reflect neither its richness nor its complexity. We must open ourselves to the concept of talent and look further, beyond academic results. Unfortunately, schools still tend to concentrate on a narrow idea of ability: academic work.
We need talent for higher education, but also for the fields of vocational education and training.
A magazine about economics, business and society. Registered as a periodical by the Ministry of Culture of the Slovak Republic under registration number EV 3451/09, ISSN 1337-9798
Issue: September-October 2015
Published by: Goodwill Publishing, s. r. o., IČO: 44 635 770
Editorial address: GOODWILL, Nevädzová 5, 821 01 Bratislava
Ing. Juraj Filin, editor-in-chief and managing director, tel.: 0907 78 91 64
The Economic Impact of Moore's Law: Evidence from When It Faltered
Neil Thompson, Sloan School of Management, MIT¹
Abstract
“Computing performance doubles every couple of years” is the popular rephrasing of Moore’s Law, which describes the 500,000-fold increase in the number of transistors on modern computer chips. But what impact has this 50-year expansion of the technological frontier of computing had on the productivity of firms? This paper focuses on the surprise change in chip design in the mid-2000s, when Moore’s Law faltered. No longer could it provide ever-faster processors, but instead it provided multicore ones with stagnant speeds. Using the asymmetric impacts from the changeover to multicore, this paper shows that firms that were ill-suited to this change because of their software usage were much less advantaged by later improvements from Moore’s Law. Each standard deviation in this mismatch between firm software and multicore chips cost them 0.5-0.7pp in yearly total factor productivity growth. These losses are permanent, and without adaptation would reflect a lower long-term growth rate for these firms. These findings may help explain larger observed declines in the productivity growth of users of information technology.
¹ I would like to thank my PhD advisors David Mowery, Lee Fleming, Brian Wright and Bronwyn Hall for excellent support and advice over the years. Thanks also to Philip Stark for his statistical guidance. This work would not have been possible without the help of computer scientists Horst Simon (Lawrence Berkeley National Lab) and Jim Demmel, Kurt Keutzer, and Dave Patterson in the Berkeley Parallel Computing Lab. I gratefully acknowledge their overall guidance, their help with the Berkeley Software Parallelism Survey and their hospitality in letting me be part of their lab.
Curtiss-Wright to Display Rugged COTS Modules and System Solutions at Intel Developer Forum 2016
NEWS RELEASE
FOR IMMEDIATE RELEASE
Contact: John Wranovics (925) 640-6402
Curtiss-Wright to Display Rugged COTS Modules and System Solutions at Intel Developer Forum 2016
INTEL DEVELOPER FORUM 2016 (IDF16) – SAN FRANCISCO, Calif. (Booth #329) – August 16-18, 2016 – Curtiss-Wright’s Defense Solutions division will highlight its industry-leading open architecture rugged commercial-off-the-shelf (COTS) processing modules and subsystems along with its OpenHPEC™ Accelerator Suite of High Performance Embedded Computing (HPEC) software development tools for the aerospace and defense market at Intel Developer Forum 2016 San Francisco (IDF16: Booth #329). Featured will be demonstrations of glass cockpit applications running on rugged Intel processing modules and the industry’s first VITA 48.8-compliant Air Flow Through (AFT) rugged OpenVPX™ chassis. Curtiss-Wright will also display its Intel® Xeon® processor D-based 3U VPX CHAMP-XD1 and 6U VPX CHAMP-XD2 Digital Signal Processor (DSP) modules, which bring supercomputing-class processing to very compute-intensive C4ISR aerospace and defense applications such as radar processing, Signal Intelligence (SIGINT), and Electronic Warfare (EW).
The broad range of Intel-based rugged COTS solutions displayed will include:
Rugged Single Board Computer and DSP Modules:
3U VPX and XMC Mobile Xeon processor E3 v5 Modules: At IDF16 Curtiss-Wright is introducing two new small form factor COTS Single Board Computers (SBCs) based on Intel’s latest generation Mobile Xeon processor E3 v5 (formerly known as “Skylake-H”). The new rugged modules, the 3U OpenVPX™ VPX3-1220 and XMC-121 XMC processor mezzanine card, feature a low-power version of the Xeon processor to provide high performance quad-core x86 processing with integrated graphics at typically 50% the power levels of previous solutions.
LECTURE 5 Single-Cycle Datapath and Control
LECTURE 5: Single-Cycle Datapath and Control
PROCESSORS
In lecture 1, we reminded ourselves that the datapath and control are the two components that come together to be collectively known as the processor.
• Datapath consists of the functional units of the processor.
  • Elements that hold data.
    • Program counter, register file, instruction memory, etc.
  • Elements that operate on data.
    • ALU, adders, etc.
  • Buses for transferring data between elements.
• Control commands the datapath regarding when and how to route and operate on data.
MIPS
To showcase the process of creating a datapath and designing a control, we will be using a subset of the MIPS instruction set. Our available instructions include:
• add, sub, and, or, slt
• lw, sw
• beq, j
DATAPATH
To start, we will look at the datapath elements needed by every instruction. First, we have instruction memory. Instruction memory is a state element that provides read-access to the instructions of a program and, given an address as input, supplies the corresponding instruction at that address.
Code can also be written, e.g., self-modifying code.
DATAPATH
Next, we have the program counter or PC. The PC is a state element that holds the address of the current instruction. Essentially, it is just a 32-bit register which holds the instruction address and is updated at the end of every clock cycle.
Normally the PC increments sequentially, except for branch instructions.
The arrows on either side indicate that the PC state element is both readable and writeable.
DATAPATH
Lastly, we have the adder. The adder is responsible for incrementing the PC to hold the address of the next instruction.
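The three elements described above (instruction memory, PC, adder) can be modelled in a few lines. This is only a behavioural sketch: the function name is hypothetical, the instruction words are placeholder values rather than real MIPS encodings, and the increment of 4 assumes the usual 4-byte MIPS instruction size.

```python
def fetch(instruction_memory, pc):
    """Model one sequential instruction fetch.

    instruction_memory maps a byte address to a 32-bit instruction word;
    pc is the current value of the 32-bit program counter.
    """
    instruction = instruction_memory[pc]   # instruction memory: read by address
    next_pc = (pc + 4) & 0xFFFFFFFF        # adder: PC + 4, wrapped to 32 bits
    return instruction, next_pc

# Placeholder program: two arbitrary words, not real MIPS encodings.
memory = {0x0: 0x11111111, 0x4: 0x22222222}
pc = 0x0
instruction, pc = fetch(memory, pc)
print(hex(instruction), hex(pc))  # 0x11111111 0x4
```

A branch or jump would replace `next_pc` with a computed target, which is the exception to sequential incrementing noted in the excerpt.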
Upgrading and Repairing PCs, 21st Edition
UPGRADING AND REPAIRING PCs, 21st Edition
Scott Mueller
800 East 96th Street, Indianapolis, Indiana 46240
Contents at a Glance
Introduction
1 Development of the PC
2 PC Components, Features, and System Design
3 Processor Types and Specifications
4 Motherboards and Buses
5 BIOS
6 Memory
7 The ATA/IDE Interface
8 Magnetic Storage Principles
9 Hard Disk Storage
10 Flash and Removable Storage
11 Optical Storage
12 Video Hardware
13 Audio Hardware
14 External I/O Interfaces
15 Input Devices
16 Internet Connectivity
17 Local Area Networking
18 Power Supplies
19 Building or Upgrading Systems
20 PC Diagnostics, Testing, and Maintenance
Index
Upgrading and Repairing PCs, 21st Edition
Copyright © 2013 by Pearson Education, Inc.
All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of the information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions. Nor is any liability assumed for damages resulting from the use of the information contained herein.
ISBN-13: 978-0-7897-5000-6
ISBN-10: 0-7897-5000-7
Library of Congress Cataloging-in-Publication Data is on file.
Editor-in-Chief: Greg Wiegand
Acquisitions Editor: Rick Kughen
Development Editor: Todd Brakke
Managing Editor: Sandra Schroeder
Project Editor: Mandie Frank
Copy Editor: Sheri Cain
Indexer
Effectiveness of the MAX-2 Multimedia Extensions for PA-RISC 2.0 Processors
Ruby Lee, Hewlett-Packard Company
HotChips IX, Stanford, CA, August 24-26, 1997
Outline
- Introduction
- PA-RISC MAX-2 features and examples
  - Mix
  - Permute
  - Multiply with Shift&Add
  - Conditionals with Saturation Arith (e.g., Absolute Values)
- Performance Comparison with / without MAX-2
General-Purpose Workloads will include Increasing Amounts of Media Processing
[Figure: functionality vs. time (1980, 1990, 2000): Computation Tool; Communications Tool; Distributed Multimedia Real-time Information Access Tool]
Multimedia Extensions for General-Purpose Processors
- MAX-1 for HP PA-RISC (product Jan '94)
- VIS for Sun Sparc (H2 '95)
- MAX-2 for HP PA-RISC (product Mar '96)
- MMX for Intel x86 (chips Jan '97)
- MDMX for SGI MIPS-V (tbd)
- MVI for DEC Alpha (tbd)
Ideally, different media streams map onto both the integer and floating-point datapaths of microprocessors
[Figure: PA-RISC 2.0 Processor Datapath: images, video, graphics and audio streams mapped onto the general registers (GR), ALU, SMU, floating-point registers (FP), FMAC and memory]
Subword Parallelism in a General-Purpose Processor with Multimedia Extensions
[Figure: general registers holding subwords x1..x8 and y1..y8 feed two partitionable 64-bit ALUs, 8 ops / cycle]
Subword Parallel MAX-2 Instructions in PA-RISC 2.0
- Parallel Add (modulo or saturation)
- Parallel Subtract (modulo or saturation)
- Parallel Shift Right (1, 2 or 3 bits) and Add
- Parallel Shift Left (1, 2 or 3 bits) and Add
- Parallel Average
- Parallel Shift Right (n bits)
- Parallel Shift Left (n bits)
- Mix
- Permute
MAX-2 Leverages Existing Processing Resources
[Figure: integer and floating-point datapaths with general registers]
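To make the "Parallel Add (modulo or saturation)" entry concrete, the sketch below models a partitioned add on a 64-bit word in software. It is not HP's instruction definition: the four 16-bit lanes, the unsigned form of saturation, and all function names are assumptions chosen for illustration.

```python
LANES, LANE_BITS = 4, 16              # assumed layout: four 16-bit subwords per 64-bit word
LANE_MASK = (1 << LANE_BITS) - 1

def split(word):
    """Split a 64-bit word into its subwords, lowest lane first."""
    return [(word >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]

def pack(lanes):
    """Pack subwords (lowest lane first) back into one 64-bit word."""
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & LANE_MASK) << (LANE_BITS * i)
    return word

def parallel_add_modulo(a, b):
    """Lane-wise add; a carry out of a lane is discarded, never propagated."""
    return pack([(x + y) & LANE_MASK for x, y in zip(split(a), split(b))])

def parallel_add_saturate(a, b):
    """Lane-wise unsigned add; results clamp at the lane maximum instead of wrapping."""
    return pack([min(x + y, LANE_MASK) for x, y in zip(split(a), split(b))])

a = 0xFFFF_0001_8000_0002
b = 0x0001_0001_8000_0003
print(hex(parallel_add_modulo(a, b)))    # 0x200000005
print(hex(parallel_add_saturate(a, b)))  # 0xffff0002ffff0005
```

The two overflowing lanes wrap to zero under modulo arithmetic but clamp to 0xFFFF under saturation, which is the behaviour media code usually wants for pixel or sample values.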
New Intel-Powered Classmate Pc Design
Intel Corporation, 2200 Mission College Blvd., P.O. Box 58119, Santa Clara, CA 95052-8119
News Fact Sheet
CONTACTS:
Agnes Kwan, 408-398-2573, [email protected]
Nor Badron, +86 21 5460-4510 ext 2228, [email protected]
INTEL PROVIDES SNEAK PEEK OF NEW INTEL-POWERED CLASSMATE PC DESIGN
INTEL DEVELOPER FORUM, San Francisco, Aug. 20, 2008 – Intel is expanding its offerings for the Intel-powered classmate PC category by introducing a design that has tablet, touch screen and motion-sensing interaction features. There are a vast number of different education needs among the 1.3 billion students in the world; the new classmate PC design aims to create more choices to meet these varying learning needs.
“Understanding that there is no one-size-fits-all when it comes to education, we are passionate about transforming the way students learn,” said Lila Ibrahim, general manager of Intel’s Emerging Markets Platform Group. “We want to offer more choices to meet the diversity of student learning needs across the world.
“Our ethnographic research has shown us that students responded well to tablet and touch screen technology,” Ibrahim added. “The creativity, interactivity and user-friendliness of the new design will enhance the learning experiences for these children. This is important for both emerging and mature markets where technology is increasingly being seen as a key tool in encouraging learning and facilitating teaching.”
New Design, Same Philosophy
The new design is based on findings from ethnographic research and pilots from the past two years. The research pointed out that students naturally collaborate to learn in groups, and they will benefit from the mobility and flexibility of notebooks versus being tethered to their desks.
AI Chips: What They Are and Why They Matter
APRIL 2020
AI Chips: What They Are and Why They Matter
An AI Chips Reference
AUTHORS: Saif M. Khan, Alexander Mann
Table of Contents
Introduction and Summary
The Laws of Chip Innovation
Transistor Shrinkage: Moore’s Law
Efficiency and Speed Improvements
Increasing Transistor Density Unlocks Improved Designs for Efficiency and Speed
Transistor Design is Reaching Fundamental Size Limits
The Slowing of Moore’s Law and the Decline of General-Purpose Chips
The Economies of Scale of General-Purpose Chips
Costs are Increasing Faster than the Semiconductor Market
The Semiconductor Industry’s Growth Rate is Unlikely to Increase
Chip Improvements as Moore’s Law Slows
Transistor Improvements Continue, but are Slowing
Improved Transistor Density Enables Specialization
The AI Chip Zoo
AI Chip Types
AI Chip Benchmarks
The Value of State-of-the-Art AI Chips
The Efficiency of State-of-the-Art AI Chips Translates into Cost-Effectiveness
Compute-Intensive AI Algorithms are Bottlenecked by Chip Costs and Speed
U.S. and Chinese AI Chips and Implications for National Competitiveness
Appendix A: Basics of Semiconductors and Chips
Appendix B: How AI Chips Work
Parallel Computing
Low-Precision Computing
Memory Optimization
Domain-Specific Languages
Appendix C: AI Chip Benchmarking Studies
Appendix D: Chip Economics Model
Chip Transistor Density, Design Costs, and Energy Costs
Foundry, Assembly, Test and Packaging Costs
Acknowledgments
Center for Security and Emerging Technology
Introduction and Summary
Artificial intelligence will play an important role in national and international security in the years to come.