Designing a CPU CPU: “Central Processing Unit” Computer: CPU + Display + Optical Disk + Metal Case + Power Supply +

Total Page:16

File Type:pdf, Size:1020Kb

Designing a CPU CPU: “Central Processing Unit” Computer: CPU + Display + Optical Disk + Metal Case + Power Supply + Let’s build a computer! Designing a CPU CPU: “central processing unit” computer: CPU + display + optical disk + metal case + power supply + ... the difference between your computer Last lecture: circuit that implements an adder and a TV set This lecture: circuit that implements a CPU Introduction to Computer Science • Robert Sedgewick and Kevin Wayne • Copyright © 2008 • http://www.cs.Princeton.EDU/IntroCS 2 TOY Lite Primary Components of Toy-Lite CPU TOY machine. • 256 16-bit words of memory. opcode Rs Rd1 Rd2 Arithmetic and Logic Unit (ALU) • 16 16-bit registers. 1 8-bit program counter. 4 bits to specify • opcode Rs addr one of 16 registers 2 instruction types • Memory 16 instructions. • 8 bits to specify one of 256 memory words TOY-Lite machine. Toy-Lite Registers • 16 10-bit words of memory. opcode Rs Rd1 Rd2 4 10-bit registers. • 2 bits to specify • 1 4-bit program counter. opcode Rs addr one of 4 registers • 2 instruction types Processor Registers: Program Counter and Instruction Register 16 instructions. 4 bits to specify • one of 16 memory words “Control” Goal: CPU circuit for TOY-Lite (same design extends to TOY, your computer) 3 4 Review of Combinational Circuits Useful Combinational Circuits Controlled switch. Adder Gates. Incrementer (easy, add 0001) Sum-of products implementation of Boolean functions Bitwise AND, XOR (easy) Decoder Adder Shifter (clever, but we’ll skip details) Multiplexer 5 6 Decoder Decoder application: ALU! Decoder. [n-bit] TOY arithmetic • n address inputs, 2n data outputs. 1: add input buses Addressed output bit is 1; others are 0. 2: subtract • TOY opcode bits • Implements n Boolean functions 3: and 4: xor sub 5: shift left 6: shift right add x0 x1 x2 z0 z1 z2 z3 z4 z5 z6 z7 and 0 0 0 1 0 0 0 0 0 0 0 xor 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 Details: shift 0 1 1 0 0 0 1 0 0 0 0 • All circuits compute their result. Decoder lines AND all results. 1 0 0 0 0 0 0 1 0 0 0 • • “one-hot” OR collects answer. right 1 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 1 output bus 3-bit Decoder 7 8 Primary Components of Toy-Lite CPU Nuts and Bolts: Buses and Multiplexers Bus. Parallel wires connecting major component. ! Arithmetic and Logic Unit (ALU) • Ex. Carry register bits to ALU. • Ex. Carry register bits to memory Memory Multiplexer. Combinational circuit that selects among input buses. • Exactly one select line i is activated. • Copies bits from input bus i to output bus. Toy-Lite Registers Processor Registers: Program Counter and Instruction Register “Control” CPU is a circuit: everything is “connected” but wires that can be “on” can be selected by other wires. 9 10 Nuts and Bolts: Buses and Multiplexers A New Ingredient: Circuits With Memory Bus. Parallel wires connecting major component. Combinational circuits. • Ex. Carry register bits to ALU. • Output determined solely by inputs. • Ex. Carry register bits to memory • Ex: majority, adder, decoder, MUX, ALU. Multiplexer (MUX). Combinational circuit that selects among input buses. Sequential circuits. • Exactly one select line i is activated. • Output determined by inputs and current “state”. • Copies bits from input bus i to output bus. • Ex: memory, program counter, CPU. Ex. Simplest feedback loop. • Two controlled switches A and B, both connected to power, each blocked by the other. B • State determined by whichever switches first. Stable. • A Aside. Feedback with an odd number of switches is a buzzer (not stable). Doorbell: buzzer made with relays. CPU is a circuit: everything is “connected” but wires that can be “on” can be selected by other wires. 11 12 SR Flip-Flop Memory Overview x y x y SR Flip-flop. OR NOR Computers and TOY have several memory components. • Two cross-coupled NOR gates x + y ( x + y )! • Program counter and other processor registers. OR gate NOR gate A way to control the feedback loop. TOY registers (4 10-bit words in Toy-Lite). • write 0 write 1 • • Abstraction that "remembers" one bit. • Main memory (16 10-bit words in Toy-Lite). write 0 write 1 • Basic building block for memory and registers. memory bit Implementation. memory bit Use one flip-flop for each bit of memory. • read • Use buses and multiplexers to group bits into words. Access mechanism: when are contents available? • Processor registers: enable write. • Main memory: select and enable write. • TOY register: dual select and enable write need to be able to read two registers at once Caveat. Timing, switching delay. 13 14 Processor register Bit Memory Bit Interface Processor register bit. Extend a flip-flop to allow easy access to values. Memory and TOY register bits: Add selection mechanism. [ TOY PC, IR ] [ TOY main memory ] [ TOY registers ] 15 16 Memory Bit: Switch Level Implementation Processor Register Memory and TOY register bits: Add selection mechanism. Processor register. don't confuse with TOY register • Stores k bits. • Register contents always available on output bus. • If enable write is asserted, k input bits get copied into register. Ex 1. TOY-Lite program counter (PC) holds 4-bit address. Ex 2. TOY-Lite instruction register (IR) holds 10-bit current instruction. (4-bit) [ TOY PC, IR ] [ TOY main memory ] [ TOY registers ] 17 18 Processor Register Processor Register Processor register. don't confuse with TOY register Processor register. don't confuse with TOY register • Stores k bits. • Stores k bits. • Register contents always available on output bus. • Register contents always available on output bus. • If enable write is asserted, k input bits get copied into register. • If enable write is asserted, k input bits get copied into register. Ex 1. TOY program counter (PC) holds 8-bit address. Ex 1. TOY program counter (PC) holds 8-bit address. Ex 2. TOY instruction register (IR) holds 16-bit current instruction. Ex 2. TOY instruction register (IR) holds 16-bit current instruction. (4-bit) 19 20 Memory Bank Memory: Interface Memory bank. Bank of n registers; each stores k bits. (four 6-bit words) • 6-bit • Read and write information to one of n registers. Address inputs specify which one. • log2n address bits needed • Addressed bits always appear on output. • If write enabled, k input bits are copied into addressed register. Ex 0 (for lecture). 4-by-6 (four 6-bit words) Ex 1. Main memory bank. • TOY: 256-by-16 • TOY-Lite: 16-by-10 Ex 2. Registers. 2-bit • TOY: 16-by-16 6-bit • TOY Lite: 4-by-10 Two output buses. • 21 22 Memory: Component Level Implementation Memory: Switch Level Implementation Decoder plus memory selection: connect only to addressed word. (four 6-bit words) 23 24 TOY-Lite Memory Toy-Lite Registers 16 10-bit words 4 10-bit words • input connected to registers for “store” • Dual-ported to support connecting two different registers to ALU • output connected to registers for “load” • Input MUX to support input connection to ALU, memory, IR, PC addr connect to processor Instruction Register (IR) to ALU (in) to ALU (out) • MUX select to registers (in) to registers (out) to IR to memory, IR to PC 25 26 Primary Components of Toy-Lite CPU How To Design a Digital Device How to design a digital device. • Design interface: input buses, output buses, control wires. ! ALU • Determine components. • Determine datapath requirements: "flow" of bits. • Establish control sequence. ! Memory Warmup. Design a program counter (3 devices, 3 control wires). ! Registers Goal. Design TOY-Lite computer (10 devices, 27 control wires). ! Processor Registers: Program Counter and Instruction Register Not quite done. Need to be able to increment. “Control” 27 28 Program Counter: Interface Program Counter: Components Counter. Holds value that represents a binary number. Components. • Load: set value from input bus. • Register. • Increment: add one to value. • Incrementer. • Enable Write: make value available on output bus. • Multiplexer (to provide connections for both load and increment). Ex. TOY-Lite program counter (4-bit). 29 30 Program Counter: Datapath and Control Program Counter: Datapath and Control Datapath. Datapath. • Layout and interconnection of components. • Layout and interconnection of components. • Connect input and output buses. • Connect input and output buses. Control. Choreographs the "flow" of information on the datapath. Control. Choreographs the "flow" of information on the datapath. 31 32 Program Counter: Datapath and Control Program Counter: Datapath and Control Datapath. • Layout and interconnection of components. • Connect input and output buses. Control. Choreographs the "flow" of information on the datapath. 1. load: 3. increment: copy input to register output plus 1 available in MUX copy to register 2. enable write: 4. enable write: register contents available on output register contents available on output 33 34 Primary Components of Toy-Lite CPU How To Design a Digital Device How to design a digital device. • Design interface: input buses, output buses, control wires. ! ALU • Determine components. • Determine datapath requirements: "flow" of bits. • Establish control sequence. ! Memory Warmup. Design a program counter (3 devices, 3 control wires). ! Toy-Lite Registers Next. Design TOY-Lite computer (10 devices, 27 control wires). ! Processor Registers: Program Counter and Instruction Register “Control” 35 36 TOY-Lite: Interface TOY-Lite: Components CPU is a circuit. Interface: switches and lights. • set memory contents • set PC value • press RUN • [details of connection to circuit omitted] 37 38 TOY-Lite: Layout TOY-Lite Datapath Requirements: Fetch Basic machine operation is a cycle. • Fetch Fetch • Execute Fetch. • Memory[PC] to IR Execute • Increment PC Execute.
Recommended publications
  • On the Hardware Reduction of Z-Datapath of Vectoring CORDIC
    On the Hardware Reduction of z-Datapath of Vectoring CORDIC R. Stapenhurst*, K. Maharatna**, J. Mathew*, J.L.Nunez-Yanez* and D. K. Pradhan* *University of Bristol, Bristol, UK **University of Southampton, Southampton, UK [email protected] Abstract— In this article we present a novel design of a hardware wordlength larger than 18-bits the hardware requirement of it optimal vectoring CORDIC processor. We present a mathematical becomes more than the classical CORDIC. theory to show that using bipolar binary notation it is possible to eliminate all the arithmetic computations required along the z- In this particular work we propose a formulation to eliminate datapath. Using this technique it is possible to achieve three and 1.5 all the arithmetic operations along the z-datapath for conventional times reduction in the number of registers and adder respectively two-sided vector rotation and thereby reducing the hardware compared to classical CORDIC. Following this, a 16-bit vectoring while increasing the accuracy. Also the resulting architecture CORDIC is designed for the application in Synchronizer for IEEE shows significant hardware saving as the wordlength increases. 802.11a standard. The total area and dynamic power consumption Although we stick to the 2’s complement number system, without of the processor is 0.14 mm2 and 700μW respectively when loss of generality, this formulation can be adopted easily for synthesized in 0.18μm CMOS library which shows its effectiveness redundant arithmetic and higher radix formulation. A 16-bit as a low-area low-power processor. processor developed following this formulation requires 0.14 mm2 area and consumes 700 μW dynamic power when synthesized in 0.18μm CMOS library.
    [Show full text]
  • 18-447 Computer Architecture Lecture 6: Multi-Cycle and Microprogrammed Microarchitectures
    18-447 Computer Architecture Lecture 6: Multi-Cycle and Microprogrammed Microarchitectures Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 1/28/2015 Agenda for Today & Next Few Lectures n Single-cycle Microarchitectures n Multi-cycle and Microprogrammed Microarchitectures n Pipelining n Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, … n Out-of-Order Execution n Issues in OoO Execution: Load-Store Handling, … 2 Reminder on Assignments n Lab 2 due next Friday (Feb 6) q Start early! n HW 1 due today n HW 2 out n Remember that all is for your benefit q Homeworks, especially so q All assignments can take time, but the goal is for you to learn very well 3 Lab 1 Grades 25 20 15 10 5 Number of Students 0 30 40 50 60 70 80 90 100 n Mean: 88.0 n Median: 96.0 n Standard Deviation: 16.9 4 Extra Credit for Lab Assignment 2 n Complete your normal (single-cycle) implementation first, and get it checked off in lab. n Then, implement the MIPS core using a microcoded approach similar to what we will discuss in class. n We are not specifying any particular details of the microcode format or the microarchitecture; you can be creative. n For the extra credit, the microcoded implementation should execute the same programs that your ordinary implementation does, and you should demo it by the normal lab deadline. n You will get maximum 4% of course grade n Document what you have done and demonstrate well 5 Readings for Today n P&P, Revised Appendix C q Microarchitecture of the LC-3b q Appendix A (LC-3b ISA) will be useful in following this n P&H, Appendix D q Mapping Control to Hardware n Optional q Maurice Wilkes, “The Best Way to Design an Automatic Calculating Machine,” Manchester Univ.
    [Show full text]
  • Subchapter 2.4–Hp Server Rp5400 Series
    Chapter 2 hp server rp5400 series Subchapter 2.4—hp server rp5400 series hp server rp5470 Table 2.4.1 HP Server rp5470 Specifications Server model number rp5470 Max. Network Interface Cards (cont.)–see supported I/O table Server product number A6144B ATM 155 Mb/s–MMF 10 Number of Processors 1-4 ATM 155 Mb/s–UTP5 10 Supported Processors ATM 622 Mb/s–MMF 10 PA-RISC PA-8700 Processor @ 650 and 750 MHz 802.5 Token Ring 4/16/100 Mb/s 10 Cache–Instr/data per CPU (KB) 750/1500 Dual port X.25/SDLC/FR 10 Floating Point Coprocessor included Yes Quad port X.25/FR 7 FDDI 10 Max. Additional Interface Cards–see supported I/O table 8 port Terminal Multiplexer 4 64 port Terminal Multiplexer 10 PA-RISC PA-8600 Processor @ 550 MHz Graphics/USB kit 1 kit (2 cards) Cache–Instr/data/CPU (KB) 512/1024 Public Key Cryptography 10 Floating Point Coprocessor included Yes HyperFabric 7 Electrical Characteristics TPM estimate (4 CPUs) 34,500 AC Input power 100-240V 50/60 Hz SPECweb99 (4 CPUs) 3,750 Hotswap Power supplies 2 included, 3rd for N+1 Redundant AC power inputs 2 required, 3rd for N+1 Min. memory 256 MB Current requirements at 200V 6.5 A (shared across inputs) Max. memory capacity 16 GB Typical Power dissipation (watts) 1008 W Internal Disks Maximum Power dissipation (watts) 1 1360 W Max. disk mechanisms 4 Power factor at full load .98 Max. disk capacity 292 GB kW rating for UPS loading1 1.3 Standard Integrated I/O Maximum Heat dissipation (BTUs/hour) 1 4380 - (3000 typical) Ultra2 SCSI–LVD Yes Site Preparation 10/100Base-T (RJ-45 connector) Yes Site planning and installation included No RS-232 serial ports (multiplexed from DB-25 port) 3 Depth (mm/inches) 774 mm/30.5 Web Console (including 10Base-T port) Yes Width (mm/inches) 482 mm/19 I/O buses and slots Rack Height (EIA/mm/inches) 7 EIA/311/12.25 Total PCI Slots (supports 66/33 MHz×64/32 bits) 10 Deskside Height (mm/inches) 368 mm/14.5 2 Hot-Plug Twin-Turbo (500 MB/s) and 6 Hot-Plug Turbo slots (250 MB/s) Weight (kg/lbs) Max.
    [Show full text]
  • Analysis of GPGPU Programs for Data-Race and Barrier Divergence
    Analysis of GPGPU Programs for Data-race and Barrier Divergence Santonu Sarkar1, Prateek Kandelwal2, Soumyadip Bandyopadhyay3 and Holger Giese3 1ABB Corporate Research, India 2MathWorks, India 3Hasso Plattner Institute fur¨ Digital Engineering gGmbH, Germany Keywords: Verification, SMT Solver, CUDA, GPGPU, Data Races, Barrier Divergence. Abstract: Todays business and scientific applications have a high computing demand due to the increasing data size and the demand for responsiveness. Many such applications have a high degree of parallelism and GPGPUs emerge as a fit candidate for the demand. GPGPUs can offer an extremely high degree of data parallelism owing to its architecture that has many computing cores. However, unless the programs written to exploit the architecture are correct, the potential gain in performance cannot be achieved. In this paper, we focus on the two important properties of the programs written for GPGPUs, namely i) the data-race conditions and ii) the barrier divergence. We present a technique to identify the existence of these properties in a CUDA program using a static property verification method. The proposed approach can be utilized in tandem with normal application development process to help the programmer to remove the bugs that can have an impact on the performance and improve the safety of a CUDA program. 1 INTRODUCTION ans that the program will never end-up in an errone- ous state, or will never stop functioning in an arbitrary With the order of magnitude increase in computing manner, is a well-known and critical property that an demand in the business and scientific applications, de- operational system should exhibit (Lamport, 1977).
    [Show full text]
  • Datapath Design I Systems I
    Systems I Datapath Design I Topics Sequential instruction execution cycle Instruction mapping to hardware Instruction decoding Overview How do we build a digital computer? Hardware building blocks: digital logic primitives Instruction set architecture: what HW must implement Principled approach Hardware designed to implement one instruction at a time Plus connect to next instruction Decompose each instruction into a series of steps Expect that most steps will be common to many instructions Extend design from there Overlap execution of multiple instructions (pipelining) Later in this course Parallel execution of many instructions In more advanced computer architecture course 2 Y86 Instruction Set Byte 0 1 2 3 4 5 nop 0 0 addl 6 0 halt 1 0 subl 6 1 rrmovl rA, rB 2 0 rA rB andl 6 2 irmovl V, rB 3 0 8 rB V xorl 6 3 rmmovl rA, D(rB) 4 0 rA rB D jmp 7 0 mrmovl D(rB), rA 5 0 rA rB D jle 7 1 OPl rA, rB 6 fn rA rB jl 7 2 jXX Dest 7 fn Dest je 7 3 call Dest 8 0 Dest jne 7 4 ret 9 0 jge 7 5 pushl rA A 0 rA 8 jg 7 6 popl rA B 0 rA 8 3 Building Blocks fun Combinational Logic A A = Compute Boolean functions of L U inputs B 0 Continuously respond to input changes MUX Operate on data and implement 1 control Storage Elements valA A srcA Store bits valW Register W file dstW Addressable memories valB B Non-addressable registers srcB Clock Loaded only as clock rises Clock 4 Hardware Control Language Very simple hardware description language Can only express limited aspects of hardware operation Parts we want to explore and modify Data
    [Show full text]
  • Vax 6135 Car Vax Uk - Service Parts List
    VAX 6135 CAR VAX UK - SERVICE PARTS LIST Item Vax Part No. Description Price Code Service Item Item Vax Part No. Description Price Code Service Item 2 1-2-31-01-006 FILTER-FOAM-EXHAUST S1 CS 34 1-2-30-01-015 SEAL-FILTER/RESERVOIR K1 S 5 1-2-11-04-007 CABLE CLAMP L1 CS 35 1-3-15-03-007 FILTER HSG ASSY-UNIV-BLACK T3 S 29 1-2-31-01-002 FILTER DISC-BONDINI A1 CS 39 1-2-14-02-001 BALL VALVE N1 S 32 1-2-31-01-007 FILTER -FIBRE-MOULDED X1 CS 40 1-3-09-06-007 RESERVOIR ASSY-UNIV-BLACK W3 S 44 1-3-13-02-001 UPHOLSTERY TOOL Y1 CS 41 1-2-30-02-002 SEAL-CONICAL DUCT D2 S 47 1-2-39-01-010 CREVICE TOOL Y1 CS 42 1-2-30-02-003 SEAL-EXHAUST D2 S 48 1-3-13-01-001 DUSTBRUSH H2 CS 43 1-2-124731-12 RECOVERY BUCKET/SPIDER ASSY Z3 S 50 1-2-13-02-002 TUBE-EXTN-STAINLESS B3 CS 45 1-2-32-01-002 CASTOR-TWO WHEEL X1 S 52 1-3-18-01-022 HOSE & GRIP ASSY W3 CS 60 1-9-124423-00 TUBE-PUMP TO QRV A1 S 53 1-9-124962-00 WASH HEAD J3 CS 62 1-1-06-01-073 FEMALE QRV ASSY S2 S 54 1-9-124961-00 FLOOR BRUSH F3 CS 63 1-3-124422-00 ELBOW-QRV-ET408 C2 S 64 1-2-11-03-007 RETAINER CLIP-SPRING H1 CS 65 1-2-13-04-008 INLET TUBE-PVC A1 S 74 1-2-124594-00 UPHOLSTERY WASH TOOL K3 CS 66 1-3-124424-00 FIXING STRAP-ET408 L1 S 75 1-9-125226-00 TURBO TOOL-100mm R3 CS 67 1-7-124677-00 FILTER-WATER-INLET E2 S 77 1-9-125549-00 CREVICE TOOL-EXTRA LONG G2 CS 68 1-5-124419-00 PUMP-ET408 S3 S 101 1-2-124805-00 WATER TUBE ASSY-STRAIGHT Z2 CS 69 1-6-124425-00 SEAL-PUMP ET408 L1 S 3 1-3-124588-03 COVER-ACCESS-B R GREEN X1 NS 99 1-2-07-03-001 DUCT-CONICAL X1 S 6 1-2-36-04-004 CORD PROTECTOR A1 NS 102
    [Show full text]
  • CS 6290 Chapter 1
    Spring 2011 Prof. Hyesoon Kim Xbox 360 System Architecture, „Anderews, Baker • 3 CPU cores – 4-way SIMD vector units – 8-way 1MB L2 cache (3.2 GHz) – 2 way SMT • 48 unified shaders • 3D graphics units • 512-Mbyte DRAM main memory • FSB (Front-side bus): 5.4 Gbps/pin/s (16 pins) • 10.8 Gbyte/s read and write • Xbox 360: Big endian • Windows: Little endian http://msdn.microsoft.com/en-us/library/cc308005(VS.85).aspx • L2 cache : – Greedy allocation algorithm – Different workloads have different working set sizes • 2-way 32 Kbyte L1 I-cache • 4-way 32 Kbyte L1 data cache • Write through, no write allocation • Cache block size :128B (high spatial locality) • 2-way SMT, • 2 insts/cycle, • In-order issue • Separate vector/scalar issue queue (VIQ) Vector Vector Execution Unit Instructions Scalar Scalar Execution Unit • First game console by Microsoft, released in 2001, $299 Glorified PC – 733 Mhz x86 Intel CPU, 64MB DRAM, NVIDIA GPU (graphics) – Ran modified version of Windows OS – ~25 million sold • XBox 360 – Second generation, released in 2005, $299-$399 – All-new custom hardware – 3.2 Ghz PowerPC IBM processor (custom design for XBox 360) – ATI graphics chip (custom design for XBox 360) – 34+ million sold (as of 2009) • Design principles of XBox 360 [Andrews & Baker] - Value for 5-7 years -!ig performance increase over last generation - Support anti-aliased high-definition video (720*1280*4 @ 30+ fps) - extremely high pixel fill rate (goal: 100+ million pixels/s) - Flexible to suit dynamic range of games - balance hardware, homogenous resources - Programmability (easy to program) Slide is from http://www.cis.upenn.edu/~cis501/lectures/12_xbox.pdf • Code name of Xbox 360‟s core • Shared cell (playstation processor) ‟s design philosophy.
    [Show full text]
  • Evolving GPU Machine Code
    Journal of Machine Learning Research 16 (2015) 673-712 Submitted 11/12; Revised 7/14; Published 4/15 Evolving GPU Machine Code Cleomar Pereira da Silva [email protected] Department of Electrical Engineering Pontifical Catholic University of Rio de Janeiro (PUC-Rio) Rio de Janeiro, RJ 22451-900, Brazil Department of Education Development Federal Institute of Education, Science and Technology - Catarinense (IFC) Videira, SC 89560-000, Brazil Douglas Mota Dias [email protected] Department of Electrical Engineering Pontifical Catholic University of Rio de Janeiro (PUC-Rio) Rio de Janeiro, RJ 22451-900, Brazil Cristiana Bentes [email protected] Department of Systems Engineering State University of Rio de Janeiro (UERJ) Rio de Janeiro, RJ 20550-013, Brazil Marco Aur´elioCavalcanti Pacheco [email protected] Department of Electrical Engineering Pontifical Catholic University of Rio de Janeiro (PUC-Rio) Rio de Janeiro, RJ 22451-900, Brazil Leandro Fontoura Cupertino [email protected] Toulouse Institute of Computer Science Research (IRIT) University of Toulouse 118 Route de Narbonne F-31062 Toulouse Cedex 9, France Editor: Una-May O'Reilly Abstract Parallel Graphics Processing Unit (GPU) implementations of GP have appeared in the lit- erature using three main methodologies: (i) compilation, which generates the individuals in GPU code and requires compilation; (ii) pseudo-assembly, which generates the individuals in an intermediary assembly code and also requires compilation; and (iii) interpretation, which interprets the codes. This paper proposes a new methodology that uses the concepts of quantum computing and directly handles the GPU machine code instructions. Our methodology utilizes a probabilistic representation of an individual to improve the global search capability.
    [Show full text]
  • MPC5634M Microcontroller Data Sheet, Rev
    Freescale Semiconductor MPC5634M Rev. 9.2, 01/2015 MPC5634M Microcontroller Datasheet This is the MPC5634M Datasheet set consisting of the following files: • MPC5634M Datasheet Addendum (MPC5634M_AD), Rev. 1 • MPC5634M Datasheet (MPC5634M), Rev. 9 © Freescale Semiconductor, Inc., 2015. All rights reserved. Freescale Semiconductor MPC5634M_AD Datasheet Addendum Rev. 1.0, 01/2015 MPC5634M Microcontroller Datasheet Addendum This addendum describes corrections to the MPC5634M Table of Contents Microcontroller Datasheet, order number MPC5634M. 1 Addendum List for Revision 9 . 2 For convenience, the addenda items are grouped by 2 Revision History . 2 revision. Please check our website at http://www.freescale.com/powerarchitecture for the latest updates. The current version available of the MPC5634M Microcontroller Datasheet is Revision 9. © Freescale Semiconductor, Inc., 2015. All rights reserved. 1 Addendum List for Revision 9 4 Table 1. MPC5634M Rev 9 Addendum Location Description Section 4.11, “Temperature In “Temperature Sensor Electrical Characteristics” table, update the Min and Max value of Sensor Electrical “Accuracy” parameter to -20oC and +20oC, respectively. Characteristics”, Page 81 2 Revision History Table 2 provides a revision history for this datasheet addendum document. Table 2. Revision History Table Rev. Number Substantive Changes Date of Release 1.0 Initial release. 12/2014 MPC5634M_AD, Rev. 1.0 2 Freescale Semiconductor How to Reach Us: Information in this document is provided solely to enable system and software Home Page: implementers to use Freescale products. There are no express or implied copyright freescale.com licenses granted hereunder to design or fabricate any integrated circuits based on the Web Support: information in this document. freescale.com/support Freescale reserves the right to make changes without further notice to any products herein.
    [Show full text]
  • LECTURE 5 Single-Cycle Datapath and Control
    Single-Cycle LECTURE 5 Datapath and Control PROCESSORS In lecture 1, we reminded ourselves that the datapath and control are the two components that come together to be collectively known as the processor. • Datapath consists of the functional units of the processor. • Elements that hold data. • Program counter, register file, instruction memory, etc. • Elements that operate on data. • ALU, adders, etc. • Buses for transferring data between elements. • Control commands the datapath regarding when and how to route and operate on data. MIPS To showcase the process of creating a datapath and designing a control, we will be using a subset of the MIPS instruction set. Our available instructions include: • add, sub, and, or, slt • lw, sw • beq, j DATAPATH To start, we will look at the datapath elements needed by every instruction. First, we have instruction memory. Instruction memory is a state element that provides read-access to the instructions of a program and, given an address as input, supplies the corresponding instruction at that address. ­ Code can also be written, e.g., self-modifying code DATAPATH Next, we have the program counter or PC. The PC is a state element that holds the address of the current instruction. Essentially, it is just a 32-bit register which holds the instruction address and is updated at the end of every clock cycle. ­ Normally PC increments sequentially except for branch instructions The arrows on either side indicate that the PC state element is both readable and writeable. DATAPATH Lastly, we have the adder. The adder is responsible for incrementing the PC to hold the address of the next instruction.
    [Show full text]
  • Readingsample
    Embedded Robotics Mobile Robot Design and Applications with Embedded Systems Bearbeitet von Thomas Bräunl Neuausgabe 2008. Taschenbuch. xiv, 546 S. Paperback ISBN 978 3 540 70533 8 Format (B x L): 17 x 24,4 cm Gewicht: 1940 g Weitere Fachgebiete > Technik > Elektronik > Robotik Zu Inhaltsverzeichnis schnell und portofrei erhältlich bei Die Online-Fachbuchhandlung beck-shop.de ist spezialisiert auf Fachbücher, insbesondere Recht, Steuern und Wirtschaft. Im Sortiment finden Sie alle Medien (Bücher, Zeitschriften, CDs, eBooks, etc.) aller Verlage. Ergänzt wird das Programm durch Services wie Neuerscheinungsdienst oder Zusammenstellungen von Büchern zu Sonderpreisen. Der Shop führt mehr als 8 Millionen Produkte. CENTRAL PROCESSING UNIT . he CPU (central processing unit) is the heart of every embedded system and every personal computer. It comprises the ALU (arithmetic logic unit), responsible for the number crunching, and the CU (control unit), responsible for instruction sequencing and branching. Modern microprocessors and microcontrollers provide on a single chip the CPU and a varying degree of additional components, such as counters, timing coprocessors, watchdogs, SRAM (static RAM), and Flash-ROM (electrically erasable ROM). Hardware can be described on several different levels, from low-level tran- sistor-level to high-level hardware description languages (HDLs). The so- called register-transfer level is somewhat in-between, describing CPU compo- nents and their interaction on a relatively high level. We will use this level in this chapter to introduce gradually more complex components, which we will then use to construct a complete CPU. With the simulation system Retro [Chansavat Bräunl 1999], [Bräunl 2000], we will be able to actually program, run, and test our CPUs.
    [Show full text]
  • Automated Synthesis of Symbolic Instruction Encodings from I/O Samples
    Automated Synthesis of Symbolic Instruction Encodings from I/O Samples Patrice Godefroid Ankur Taly Microsoft Research Stanford University PLDI’2012 Page 1 June 2012 Need for Symbolic Instruction Encodings Symbolic Execution is a key l 1 : m o v eax, i n p 1 component of precise binary m o v c l , i n p 2 program analysis tools s h l eax, c l j n z l2 j m p l3 - SAGE, BitBlaze, BAP, etc. l 2 : d i v ebx, eax // Is this safe ? - Static analysis tools / / I s e a x ! = 0 ? l 3 : … PLDI’2012 Page 2 June 2012 Problem: Symbolic Instruction Encoding Bit Vector[X] Instruction Bit Vector[Y] Inp 1 ? Op1 Inpn Opm Problem: Given a processor and an instruction name, symbolically describe the input-output function for the instruction • Express the encoding as bit-vector constraints (ex: SMT-Lib format) PLDI’2012 Page 3 June 2012 So far, only manual solutions… • From the instruction architecture manual (X86, ARM, …) implemented by the processor Limitations: • Tedious, expensive - X86 has more than 300 unique instructions, each with ~10 OPCodes, 2000 pages • Error-prone - Written in English, many corner cases • Imprecise - Spec is often under-specified • Partial - Not all instructions are covered X86 spec for SHLD • Can we trust the spec ? PLDI’2012 Page 4 June 2012 Here: Automated Synthesis Approach Searches for a function f that respects truth table Partial Truth- table S Sample inputs Synthesis Function (C with in-lined engine f assembly) Goals: • As automated as possible so that we can boot-strap a symbolic execution engine on an arbitrary instruction
    [Show full text]