Addressing Modes



Addressing Modes / Advanced Architecture / Basic Architecture
Computer Organization and Assembly Languages, Yung-Yu Chuang, with slides by S. Dandamudi, Peng-Sheng Chen, Kip Irvine, Robert Sedgewick and Kevin Wayne

Basic microcomputer design
• The clock synchronizes CPU operations.
• The control unit (CU) coordinates the sequence of execution steps.
• The ALU performs arithmetic and logic operations.
• The memory storage unit holds instructions and data for a running program.
• A bus is a group of wires that transfers data from one part to another (data, address, control).
[Diagram: the CPU (registers, ALU, CU, clock) connected to the memory storage unit and I/O devices #1 and #2 by the data bus, control bus, and address bus.]

Clock
• The clock synchronizes all CPU and bus operations.
• A machine (clock) cycle measures the time of a single operation.
• The clock is used to trigger events.
• The clock cycle is the basic unit of time: 1 GHz → clock cycle = 1 ns.
• An instruction can take multiple cycles to complete; e.g., multiply on the 8088 takes 50 cycles.

Instruction execution cycle
• Fetch instruction
• Decode
• Fetch operands
• Execute
• Store output
[Diagram: the program counter and instruction queue drive the fetch from memory; operands are read from registers, the ALU executes, and the result and flags are written back.]

Multi-stage pipeline
• Pipelining makes it possible for the processor to execute instructions in parallel.
• Instruction execution is divided into discrete stages.
[Diagram: stages S1-S6 on a non-pipelined processor such as the 80386; I-1 occupies cycles 1-6 and I-2 occupies cycles 7-12, so many cycles are wasted.]

Pipelined execution
• More efficient use of cycles, greater throughput (the 80486 started to use pipelining).
• For k stages and n instructions, the number of required cycles is k + (n - 1), compared to k*n without pipelining.
[Diagram: with pipelining, I-2 enters stage S1 while I-1 is in S2, so the two instructions finish in 7 cycles instead of 12.]
• Pipelining requires buffers of instructions:
– Each buffer holds a single value.
– Ideal scenario: equal work for each stage; sometimes this is not possible.
– The slowest stage determines the flow rate of the entire pipeline.

Pipelined execution
• Some reasons for unequal work among stages:
– A complex step cannot be subdivided conveniently.
– An operation takes a variable amount of time to execute; e.g., operand fetch time depends on where the operands are located (registers, cache, or memory).
– The complexity of an operation depends on its type: an add may take one cycle, a multiply may take several cycles.

Pipelined execution
• If the operand fetch of I-2 takes three cycles, the pipeline stalls for two cycles.
• Stalls are caused by hazards, and they reduce overall throughput.
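Before looking at what stalls cost, it helps to quantify the ideal case: the k + (n - 1) formula above can be compared with the non-pipelined k*n directly. A small illustrative C sketch (not from the original slides, using a hypothetical 6-stage pipeline):

    #include <stdio.h>

    /* Ideal k-stage pipeline with no stalls: once the pipeline is full, one
       instruction completes per cycle, so n instructions need k + (n - 1) cycles. */
    static long pipelined_cycles(long k, long n)  { return k + (n - 1); }

    /* Non-pipelined: every instruction passes through all k stages alone. */
    static long sequential_cycles(long k, long n) { return k * n; }

    int main(void) {
        long k = 6, n = 1000;   /* assumed: 6 stages, 1000 instructions */
        printf("non-pipelined: %ld cycles\n", sequential_cycles(k, n));  /* 6000 */
        printf("pipelined:     %ld cycles\n", pipelined_cycles(k, n));   /* 1005 */
        return 0;
    }

For large n the speed-up approaches k, the number of stages, which is why stalls (and the hazards discussed next) matter so much.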
Wasted cycles (pipelined)
• When one of the stages requires two or more clock cycles, clock cycles are again wasted.
• For k stages and n instructions, the number of required cycles is k + (2n - 1).
[Diagram: stage S4 takes two cycles, so I-2 and I-3 back up behind I-1 in the earlier stages.]

Superscalar
• A superscalar processor has multiple execution pipelines. In the following, note that stage S4 has left and right pipelines (u and v).
• For k stages and n instructions, the number of required cycles is k + n.
[Diagram: stages S1-S3, the split stage S4 (u and v), and S5-S6; instructions I-1 to I-4 flow through with far fewer wasted cycles.]
• Pentium: 2 pipelines. Pentium Pro: 3.

Pipeline stages
• Pentium 3: 10
• Pentium 4: 20~31
• Next-generation micro-architecture: 14
• ARM7: 3

Hazards
• Three types of hazards:
– Resource hazards: occur when two or more instructions use the same resource; also called structural hazards.
– Data hazards: caused by data dependencies between instructions, e.g. a result produced by I1 is read by I2.
– Control hazards: sequential execution (the default) suits pipelining; altering control flow (e.g. branching) causes problems, introducing control dependencies.

Data hazards

    add r1, r2, #10   ; write r1
    sub r3, r1, #20   ; read r1

• The sub depends on r1, so its register-read stage must stall until the add writes back (fetch, decode, stall, reg, ALU, wb).

Data hazards
• Forwarding: provides the output result to the following instruction as soon as possible, removing the stall in the example above.

Control hazards

    bz  r1, target
    add r2, r4, 0
    ...
    target: add r2, r3, 0

• Instructions fetched after the branch may have to be discarded, stalling the pipeline until the branch outcome is known.

Control hazards
• Branches alter control flow:
– They require special attention in pipelining.
– Some instructions already in the pipeline must be thrown away; how many depends on when we know whether the branch is taken.
– Here the pipeline wastes three clock cycles, called the branch penalty.
– The branch penalty can be reduced by determining the branch decision early.

Control hazards
• Delayed branch execution effectively reduces the branch penalty:
– The instruction following the branch (the delay slot) is always fetched; rather than throwing it away, place a useful instruction there.
– Delay slot example (the add is moved into the delay slot):

    add    R2,R3,R4            branch target
    branch target      =>      add    R2,R3,R4
    sub    R5,R6,R7            sub    R5,R6,R7

Branch prediction
• Three prediction strategies:
– Fixed: the prediction is fixed; e.g. branch-never-taken (not proper for loop structures).
– Static: the strategy depends on the branch type, e.g. conditional branch: always not taken; loop: always taken.
– Dynamic: takes the run-time history of the branch to make more accurate predictions.

Branch prediction
• Static prediction improves prediction accuracy over fixed prediction:

  Instruction type       Distribution (%)   Prediction: taken?   Correct (%)
  Unconditional branch   70*0.4 = 28        Yes                  28
  Conditional branch     70*0.6 = 42        No                   42*0.6 = 25.2
  Loop                   10                 Yes                  10*0.9 = 9
  Call/return            20                 Yes                  20

  Overall prediction accuracy = 82.2%
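The dynamic strategy listed above relies on run-time history. One common realization, matching the two-bit, four-state scheme in the state diagram that appears with the slides further below, is a saturating counter kept per branch. This is only an illustrative C sketch, not code from the original material:

    #include <stdio.h>
    #include <stdbool.h>

    /* Two-bit saturating counter: states 0,1 predict "not taken"; 2,3 predict "taken". */
    typedef struct { unsigned state; } predictor_t;

    static bool predict(const predictor_t *p) { return p->state >= 2; }

    static void update(predictor_t *p, bool taken) {
        if (taken  && p->state < 3) p->state++;   /* move toward "predict taken"     */
        if (!taken && p->state > 0) p->state--;   /* move toward "predict not taken" */
    }

    int main(void) {
        predictor_t p = { 0 };
        /* A loop-like branch: taken nine times, then not taken once. */
        bool outcomes[10] = {1,1,1,1,1,1,1,1,1,0};
        int correct = 0;
        for (int i = 0; i < 10; i++) {
            if (predict(&p) == outcomes[i]) correct++;
            update(&p, outcomes[i]);
        }
        printf("correct predictions: %d/10\n", correct);   /* 7/10 from a cold start */
        return 0;
    }

Using two bits instead of one means a single surprise outcome (such as a loop exit) does not immediately flip the prediction.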
Branch prediction
• Dynamic branch prediction uses run-time history:
– The past n executions of the branch are recorded and used to make the prediction.
– Simple strategy: predict the next branch outcome as the majority of the previous n executions.
– Example, n = 3: if two or more of the last three branches were taken, the prediction is "branch taken".
– Depending on the type of instruction mix, this gives more than 90% prediction accuracy.

Branch prediction
• Impact of the past n branches on prediction accuracy (%), by type of mix:

  n    Compiler   Business   Scientific
  0    64.1       64.4       70.4
  1    91.9       95.2       86.6
  2    93.3       96.5       90.8
  3    93.7       96.6       91.0
  4    94.5       96.8       91.8
  5    94.7       97.0       92.0

Branch prediction
[State diagram: a two-bit predictor with states 00 and 01 ("predict no branch") and 10 and 11 ("predict branch"); each taken branch moves the state toward 11 and each not-taken branch moves it toward 00.]

Multitasking
• The OS can run multiple programs at the same time.
• Multiple threads of execution can run within the same program.
• A scheduler utility assigns a given amount of CPU time to each running program.
• Rapid switching of tasks:
– gives the illusion that all programs are running at once
– requires that the processor support task switching
– follows a scheduling policy, e.g. round-robin or priority

SRAM vs DRAM
[Diagram: the basic microcomputer block diagram again, now with a cache between the CPU and the memory storage unit.]

          Tran. per bit   Access time   Needs refresh?   Cost   Applications
  SRAM    4 or 6          1X            No               100X   cache memories
  DRAM    1               10X           Yes              1X     main memories, frame buffers

The CPU-Memory gap
• The gap between DRAM, disk, and CPU speeds keeps widening.
[Plot: disk seek time, DRAM access time, SRAM access time, and CPU cycle time in ns (log scale) from 1980 to 2000; CPU cycle time falls much faster than the memory and disk curves.]

Memory hierarchies
• Some fundamental and enduring properties of hardware and software:
– Fast storage technologies cost more per byte, have less capacity, and require more power (heat!).
– The gap between CPU and main memory speed is widening.
– Well-written programs tend to exhibit good locality.
• These properties suggest an approach for organizing memory and storage systems known as a memory hierarchy.

                         register   cache   memory    disk
  Access time (cycles)   1          1-10    50-100    20,000,000

Memory system in practice
• L0: registers
• L1: on-chip L1 cache (SRAM)
• L2: off-chip L2 cache (SRAM)
• L3: main memory
• Smaller, faster, and more expensive (per byte) storage devices sit higher in the hierarchy.

Reading from memory
• Multiple machine cycles are required when reading from memory, because it responds much more slowly than the CPU (e.g. 33 MHz). The wasted clock cycles are called wait states.
• The L1 data cache has a 1-cycle latency from the registers.
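To see why the hierarchy and locality matter, here is a rough average-access-time estimate built from the cycle counts in the table above; the hit rates are assumptions made up purely for illustration:

    #include <stdio.h>

    int main(void) {
        /* Access times (cycles) taken from the table above. */
        double t_cache = 5.0, t_mem = 75.0, t_disk = 20000000.0;
        /* Hit rates below are illustrative assumptions, not figures from the slides. */
        double h_cache = 0.95;     /* fraction of accesses that hit in the cache     */
        double h_mem   = 0.9999;   /* fraction of cache misses served by main memory */

        double avg = h_cache * t_cache
                   + (1.0 - h_cache) * (h_mem * t_mem + (1.0 - h_mem) * t_disk);
        printf("average access time: %.1f cycles\n", avg);   /* about 108.5 */
        return 0;
    }

Even with these generous hit rates, the rare trips to disk dominate the average, which is exactly the point of the locality properties listed above.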
Recommended publications
  • PIPELINING and ASSOCIATED TIMING ISSUES Introduction: While
    PIPELINING AND ASSOCIATED TIMING ISSUES Introduction: While studying sequential circuits, we studied Latches and Flip Flops. While Latches form the heart of a Flip Flop, we have explored the use of Flip Flops in applications like counters, shift registers, sequence detectors, sequence generators and the design of Finite State Machines. Another important application of latches and flip flops is in pipelining combinational/algebraic operations. To understand what pipelining is, consider the following example. Let us take a simple calculation which has three operations to be performed, viz. 1. add a and b to get (a+b), 2. get the magnitude of (a+b) and 3. evaluate log |(a + b)|. Each operation consumes a finite period of time; let us assume the three operations consume 40 nsec., 35 nsec. and 60 nsec. respectively. The process can be represented pictorially as in Fig. 1. Consider a situation where we need to carry this out for a set of 100 such pairs. In the normal course, doing it one pair at a time would take a total of 100 * 135 = 13,500 nsec. We can however reduce this time by realizing that the whole process is a sequential one. Let the values to be evaluated be a1 to a100 and the corresponding values to be added be b1 to b100. Since the operations are sequential, we can first evaluate (a1 + b1); while |(a1 + b1)| is being evaluated, the unit evaluating the sum is dormant, so we can use it to evaluate (a2 + b2), giving us both |(a1 + b1)| and (a2 + b2) at the end of another evaluation period.
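    Continuing the excerpt's arithmetic, the sketch below (an illustration, not part of the excerpt) compares the sequential time with a pipelined estimate in which the three stages are clocked at the slowest stage (60 nsec) and register overhead is ignored:

        #include <stdio.h>

        int main(void) {
            double stage_ns[3] = { 40.0, 35.0, 60.0 };  /* add, magnitude, log stages */
            int n = 100;                                /* number of (a, b) pairs     */

            double sum = stage_ns[0] + stage_ns[1] + stage_ns[2];  /* 135 nsec        */
            double slowest = 60.0;                                 /* clock period    */

            double sequential = n * sum;                /* 100 * 135 = 13,500 nsec    */
            double pipelined  = (n + 3 - 1) * slowest;  /* (n + stages - 1) * 60 nsec */

            printf("sequential: %.0f nsec\n", sequential);  /* 13500 */
            printf("pipelined : %.0f nsec\n", pipelined);   /*  6120 */
            return 0;
        }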
  • 2.5 Classification of Parallel Computers
    2.5 Classification of Parallel Computers
    2.5.1 Granularity. In parallel computing, granularity means the amount of computation in relation to communication or synchronisation. Periods of computation are typically separated from periods of communication by synchronization events. • fine level (same operations with different data) ◦ vector processors ◦ instruction level parallelism ◦ fine-grain parallelism: – Relatively small amounts of computational work are done between communication events – Low computation to communication ratio – Facilitates load balancing – Implies high communication overhead and less opportunity for performance enhancement – If granularity is too fine it is possible that the overhead required for communications and synchronization between tasks takes longer than the computation. • operation level (different operations simultaneously) • problem level (independent subtasks) ◦ coarse-grain parallelism: – Relatively large amounts of computational work are done between communication/synchronization events – High computation to communication ratio – Implies more opportunity for performance increase – Harder to load balance efficiently
    2.5.2 Hardware: Pipelining (was used in supercomputers, e.g. Cray-1). With N elements in the pipeline and L clock cycles per element, the calculation takes L + N cycles; without a pipeline it takes L * N cycles. Example of good code for pipelining:
        do i = 1, k
          z(i) = x(i) + y(i)
        end do
    Vector processors have fast vector operations (operations on arrays). The previous example is also good for a vector processor (vector addition), but recursion, for example, is hard to optimise for vector processors. Example: Intel MMX, a simple vector processor.
  • Microprocessor Architecture
    EECE416 Microcomputer Fundamentals: Microprocessor Architecture. Dr. Charles Kim, Howard University. Computer Architecture: Computer System = CPU (with PC, registers, SR) + Memory. Computer Architecture: ALU (Arithmetic Logic Unit), binary full adder. Microprocessor bus. Architecture by CPU+MEM organization: Princeton (or von Neumann) Architecture – MEM contains both instructions and data; Harvard Architecture – separate data MEM and instruction MEM, higher performance, better for DSP, higher MEM bandwidth. Princeton Architecture: Step (A): the address of the instruction to be executed next is applied; Step (B): the controller "decodes" the instruction; Step (C): following completion of the instruction, the controller provides the address, to the memory unit, at which the data result generated by the operation will be stored. Harvard Architecture. Internal memory ("registers"): external memory access is very slow; internal registers allow quicker retrieval and storage. Architecture by instructions and their execution: CISC (Complex Instruction Set Computer) – a variety of instructions for complex tasks, instructions of varying length; RISC (Reduced Instruction Set Computer) – fewer and simpler instructions, high-performance microprocessors, pipelined instruction execution (several instructions are executed in parallel). CISC: the architecture of prior to the mid-1980's – IBM 390, Motorola 680x0, Intel 80x86; a basic fetch-execute sequence to support a large number of complex instructions; complex decoding procedures; complex control unit; one instruction achieves a complex task.
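    As a toy illustration of the Princeton fetch-decode-execute sequence described above (the opcodes, addresses, and three-instruction program are hypothetical, not taken from the excerpt):

        #include <stdio.h>

        enum { HALT = 0, LOAD = 1, ADD = 2, STORE = 3 };   /* made-up opcodes */

        int main(void) {
            /* A single memory holds both instructions and data (von Neumann style). */
            int mem[16] = { LOAD, 10, ADD, 11, STORE, 12, HALT, 0,
                            0, 0, 7, 5, 0, 0, 0, 0 };
            int pc = 0, acc = 0, running = 1;

            while (running) {
                int opcode = mem[pc++];                 /* fetch            */
                switch (opcode) {                       /* decode + execute */
                case LOAD:  acc  = mem[mem[pc++]]; break;
                case ADD:   acc += mem[mem[pc++]]; break;
                case STORE: mem[mem[pc++]] = acc;  break;
                default:    running = 0;           break;   /* HALT */
                }
            }
            printf("mem[12] = %d\n", mem[12]);   /* 7 + 5 = 12 */
            return 0;
        }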
  • Instruction Pipelining (1 of 7)
    Chapter 5 A Closer Look at Instruction Set Architectures. Objectives • Understand the factors involved in instruction set architecture design. • Gain familiarity with memory addressing modes. • Understand the concepts of instruction-level pipelining and its effect upon execution performance. 5.1 Introduction • This chapter builds upon the ideas in Chapter 4. • We present a detailed look at different instruction formats, operand types, and memory access methods. • We will see the interrelation between machine organization and instruction formats. • This leads to a deeper understanding of computer architecture in general. 5.2 Instruction Formats (1 of 31) • Instruction sets are differentiated by the following: – Number of bits per instruction. – Stack-based or register-based. – Number of explicit operands per instruction. – Operand location. – Types of operations. – Type and size of operands. 5.2 Instruction Formats (2 of 31) • Instruction set architectures are measured according to: – Main memory space occupied by a program. – Instruction complexity. – Instruction length (in bits). – Total number of instructions in the instruction set. 5.2 Instruction Formats (3 of 31) • In designing an instruction set, consideration is given to: – Instruction length. • Whether short, long, or variable. – Number of operands. – Number of addressable registers. – Memory organization. • Whether byte- or word-addressable. – Addressing modes. • Choose any or all: direct, indirect or indexed. 5.2 Instruction Formats (4 of 31) • Byte ordering, or endianness, is another major architectural consideration. • If we have a two-byte integer, the integer may be stored so that the least significant byte is followed by the most significant byte or vice versa. – In little endian machines, the least significant byte is followed by the most significant byte.
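    The two-byte-integer byte-ordering point above is easy to check on any machine; a small C sketch (illustrative only):

        #include <stdio.h>
        #include <stdint.h>

        int main(void) {
            uint16_t value = 0x1234;              /* a two-byte integer       */
            uint8_t *bytes = (uint8_t *)&value;   /* view its bytes in memory */

            printf("byte 0 = 0x%02X, byte 1 = 0x%02X\n",
                   (unsigned)bytes[0], (unsigned)bytes[1]);
            printf("%s endian\n", bytes[0] == 0x34 ? "little" : "big");
            return 0;
        }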
  • Programming Model, Address Mode, HC12 Hardware Introduction
    EEL 4744C: Microprocessor Applications, Lecture 2: Programming Model, Address Mode, HC12 Hardware Introduction. Dr. Tao Li. Reading Assignment • Microcontrollers and Microcomputers: Chapter 3, Chapter 4 • Software and Hardware Engineering: Chapter 2 Or • Software and Hardware Engineering: Chapter 4 Plus • CPU12 Reference Manual: Chapter 3 • M68HC12B Family Data Sheet: Chapter 1, 2, 3, 4. Lecture 2 Part 1: CPU Registers and Control Codes. CPU Registers • Accumulators – Registers that accumulate answers, e.g. the A Register – Can work simultaneously as the source register for one operand and the destination register for ALU operations • General-purpose registers – Registers that hold data, work as source and destination register for data transfers and source for ALU operations • Doubled registers – An N-bit CPU in general uses N-bit data registers – Sometimes 2 of the N-bit registers are used together to double the number of bits, thus "doubled" registers • Pointer registers – Registers that address memory by pointing to specific memory locations that hold the needed data – Contain memory addresses (without offset) • Stack pointer registers – Pointer registers dedicated to variable data and return address storage in subroutine calls • Index registers – Also used to address memory – An effective memory address is found by adding an offset to the content of the involved index register • Segment registers – In some architectures, memory addressing requires that the physical address be specified in 2 parts • Segment part: specifies a memory page • Offset part: specifies a particular place in the page • Condition code registers – Also called flag or status registers – Hold condition code bits generated when instructions are executed, e.g.
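    The index-register idea above (effective address = index register contents + offset) can be mimicked with a small C sketch; the register name and values here are hypothetical and are not HC12 specifics:

        #include <stdio.h>
        #include <stdint.h>

        int main(void) {
            uint8_t memory[256] = { 0 };
            memory[0x45] = 0xAB;                  /* some data placed in "memory"        */

            uint16_t x_register = 0x40;           /* hypothetical index register content */
            uint16_t offset     = 0x05;           /* offset encoded in the instruction   */

            uint16_t effective = x_register + offset;   /* effective memory address      */
            printf("effective address = 0x%02X, data = 0x%02X\n",
                   (unsigned)effective, (unsigned)memory[effective]);   /* 0x45, 0xAB    */
            return 0;
        }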
  • Review of Computer Architecture
    Basic Computer Architecture. CSCE 496/896: Embedded Systems. Witawas Srisa-an. Review of Computer Architecture. Credit: Most of the slides are made by Prof. Wayne Wolf, who is the author of the textbook. I made some modifications to the notes for clarity. Assume some background information from CSCE 430 or equivalent. von Neumann architecture: Memory holds data and instructions. Central processing unit (CPU) fetches instructions from memory. Separate CPU and memory distinguishes a programmable computer. CPU registers help out: program counter (PC), instruction register (IR), general-purpose registers, etc. [Diagrams: the von Neumann architecture (memory unit, CPU with control unit and ALU, input and output units), and a CPU + memory example in which the PC holds address 200 and the IR holds the fetched instruction ADD r5,r1,r3.] Recalling pipelining. What is a potential problem with the von Neumann architecture? Harvard architecture: [Diagram: separate data memory and program memory, each with its own address and data connections to the CPU.] von Neumann vs. Harvard: Harvard can't use self-modifying code. Harvard allows two simultaneous memory fetches. Most DSPs (e.g. Blackfin from ADI) use Harvard architecture for streaming data: greater memory bandwidth; different memory bit depths between instruction and data; more predictable bandwidth. Today's processors: Harvard or von Neumann? RISC vs. CISC: Complex instruction set computer (CISC): many addressing modes; many operations. Reduced instruction set computer (RISC): load/store; pipelinable instructions. Instruction set characteristics: Fixed vs. variable length. Addressing modes. Number of operands. Types of operands. Tensilica Xtensa: RISC based, variable length, but not CISC. Programming model: registers visible to the programmer. Some registers are not visible (IR). Multiple implementations: Successful architectures have several implementations: varying clock speeds; different bus widths; different cache sizes, associativities, configurations; local memory, etc.
  • The Birth, Evolution and Future of Microprocessor
    The Birth, Evolution and Future of Microprocessor. Swetha Kogatam, Computer Science Department, San Jose State University, San Jose, CA 95192, 408-924-1000, [email protected]. ABSTRACT: The world's first microprocessor, the 4004, was co-developed by Busicom, a Japanese manufacturer of calculators, and Intel, a U.S. manufacturer of semiconductors. The basic architecture of the 4004 was developed in August 1969; a concrete plan for the 4004 system was finalized in December 1969; and the first microprocessor was successfully developed in March 1971. Microprocessors, which became the "technology to open up a new era," brought two outstanding impacts, "power of intelligence" and "power of computing". First, microprocessors opened up a new "era of programming" through replacing with software the hardwired logic based on IC's of the former "era of logic". At the same time, microprocessors allowed young engineers access to "power of computing" for the creative development of personal computers and computer games, which in turn led to growth in the software industry, and paved the way to the development of high-[...] [...] timed sequence through the bus system to output devices such as CRT screens, networks, or printers. In some cases, the terms 'CPU' and 'microprocessor' are used interchangeably to denote the same device. The different ways in which microprocessors are categorized are: a) CISC (Complex Instruction Set Computers) b) RISC (Reduced Instruction Set Computers) a) VLIW (Very Long Instruction Word Computers) b) Super scalar processors. 2. BIRTH OF THE MICROPROCESSOR: In 1970, Intel introduced the first dynamic RAM, which increased IC memory by a factor of four.
  • IBM Z/Architecture Reference Summary
    z/Architecture IBM® Reference Summary, SA22-7871-06. Seventh Edition (August, 2010). This revision differs from the previous edition by containing instructions related to the facilities marked by a bar under "Facility" in "Preface" and minor corrections and clarifications. Changes are indicated by a bar in the margin. References in this publication to IBM® products, programs, or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM program product in this publication is not intended to state or imply that only IBM's program product may be used. Any functionally equivalent program may be used instead. Additional copies of this and other IBM publications may be ordered or downloaded from the IBM publications web site at http://www.ibm.com/support/documentation. Please direct any comments on the contents of this publication to: IBM Corporation, Department E57, 2455 South Road, Poughkeepsie, NY 12601-5400 USA. IBM may use or distribute whatever information you supply in any way it believes appropriate without incurring any obligation to you. © Copyright International Business Machines Corporation 2001-2010. All rights reserved. US Government Users Restricted Rights: Use, duplication, or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Preface: This publication is intended primarily for use by z/Architecture™ assembler-language application programmers. It contains basic machine information summarized from the IBM z/Architecture Principles of Operation, SA22-7832, about the zSeries™ processors. It also contains frequently used information from IBM ESA/390 Common I/O-Device Commands and Self Description, SA22-7204, IBM System/370 Extended Architecture Interpretive Execution, SA22-7095, and IBM High Level Assembler for MVS & VM & VSE Language Reference, SC26-4940.
  • CS152: Computer Systems Architecture Pipelining
    CS152: Computer Systems Architecture, Pipelining. Sang-Woo Jun, Winter 2021. Large amount of material adapted from MIT 6.004 "Computation Structures", Morgan Kaufmann "Computer Organization and Design: The Hardware/Software Interface: RISC-V Edition", and CS 152 slides by Isaac Scherson. Eight great ideas: Design for Moore's Law; Use abstraction to simplify design; Make the common case fast; Performance via parallelism; Performance via pipelining; Performance via prediction; Hierarchy of memories; Dependability via redundancy. But before we start... Performance Measures: Two metrics when designing a system: 1. Latency: the delay from when an input enters the system until its associated output is produced. 2. Throughput: the rate at which inputs or outputs are processed. The metric to prioritize depends on the application: embedded system for airbag deployment? Latency. General-purpose processor? Throughput. Performance of Combinational Circuits: For combinational logic, latency = tPD and throughput = 1/tPD. [Figure: inputs X and Y feed combinational blocks F(X) and G(X), whose outputs feed H(X); while H computes, F and G are not doing work, just holding their output data. Is this an efficient way of using hardware? Source: MIT 6.004 2019 L12.] Pipelined Circuits: Pipelining by adding registers to hold F and G's output. Now F & G can be working on input Xi+1 while H is performing computation on Xi: a 2-stage pipeline! For input X during clock cycle j, the corresponding output is emitted during clock j+2, assuming ideal registers. [Figure: the same circuit with pipeline registers, assuming latencies of 15, 20 and 25 for F, G and H. Source: MIT 6.004 2019 L12.] Pipelined Circuits:
                            Latency            Throughput
        Unpipelined         45 (= 20+25)       1/45
        2-stage pipelined   50 (= 25+25, worse!)   1/25 (better!)
    [Source: MIT 6.004 2019 L12.] Pipeline conventions: Definition: A well-formed K-Stage Pipeline ("K-pipeline") is an acyclic circuit having exactly K registers on every path from an input to an output.
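    The latency and throughput numbers in the excerpt can be reproduced directly; this sketch (not from the slides) assumes F and G feed H in parallel, with propagation delays of 15, 20 and 25 ns:

        #include <stdio.h>

        static double max2(double a, double b) { return a > b ? a : b; }

        int main(void) {
            double tF = 15.0, tG = 20.0, tH = 25.0;    /* stage delays in ns */

            /* Unpipelined: H can start only when the slower of F and G is done. */
            double lat_unpiped = max2(tF, tG) + tH;    /* 45 ns                  */
            double thr_unpiped = 1.0 / lat_unpiped;    /* one result per 45 ns   */

            /* 2-stage pipeline: the clock period is the slowest stage; each result
               takes two cycles of latency, but a new result finishes every cycle.  */
            double period    = max2(max2(tF, tG), tH); /* 25 ns                  */
            double lat_piped = 2.0 * period;           /* 50 ns (worse)          */
            double thr_piped = 1.0 / period;           /* one per 25 ns (better) */

            printf("unpipelined: latency %.0f ns, throughput %.4f per ns\n",
                   lat_unpiped, thr_unpiped);
            printf("pipelined  : latency %.0f ns, throughput %.4f per ns\n",
                   lat_piped, thr_piped);
            return 0;
        }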
  • Computer Organization
    Chapter 12 Computer Organization. Central Processing Unit (CPU) • Data section ‣ Receives data from and sends data to the main memory subsystem and I/O devices • Control section ‣ Issues the control signals to the data section and the other components of the computer system [Figure 12.1: CPU with data and control sections connected by a bus to main memory and input/output devices, showing data flow and control.] CPU components • 16-bit memory address register (MAR) ‣ 8-bit MARA and 8-bit MARB • 8-bit memory data register (MDR) • 8-bit multiplexers ‣ AMux, CMux, MDRMux ‣ 0 on control line routes left input ‣ 1 on control line routes right input Control signals • Originate from the control section on the right (not shown in Figure 12.2) • Two kinds of control signals ‣ Clock signals end in "Ck" to load data into registers with a clock pulse ‣ Signals that do not end in "Ck" to set up the data flow before each clock pulse arrives [Figure 12.2 (and its expanded form): the CPU register bank (A, X, SP, PC, IR, temporaries T1-T6, and constant registers M1-M5) connected by ABus, BBus, and CBus to AMux, the ALU, and CMux, with MARA/MARB, MDR, and MDRMux on the memory side, the status bits N, Z, V, C (with ANDZ), and control signals such as LoadCk, MARCk, MDRCk, CCk, VCk, ZCk, NCk, MemRead, and MemWrite.]
  • Chapter 1: Microprocessor Architecture
    Chapter 1: Microprocessor architecture. ECE 3120 – Fall 2013. Dr. Mohamed Mahmoud. http://iweb.tntech.edu/mmahmoud/ [email protected] Outline: 1.1 Computer hardware organization; 1.1.1 Number System; 1.1.2 Computer hardware organization; 1.2 The processor; 1.3 Memory system operation; 1.4 Program Execution; 1.5 HCS12 Microcontroller. 1.1.1 Number System - Computer hardware uses binary numbers to perform all operations. - Human beings are used to the decimal number system. Conversion is often needed to convert numbers between the internal (binary) and external (decimal) representations. - Octal and hexadecimal numbers have shorter representations than the binary system. - The binary number system has two digits, 0 and 1. - The octal number system uses eight digits, 0 to 7. - The hexadecimal number system uses 16 digits: 0, 1, .., 9, A, B, C, .., F. - A prefix is used to indicate the base of a number. - Convert %01000101 to hexadecimal: $45, because 0100 = 4 and 0101 = 5. - Computers need to deal with signed and unsigned numbers. - The two's complement method is used to represent negative numbers. - A number with its most significant bit set to 1 is negative, otherwise it is positive. 1- Unsigned number: %1111 = 1 + 2 + 4 + 8 = 15; %0111 = 1 + 2 + 4 = 7. An unsigned N-bit number can hold values from 0 to 2^N - 1. 2- Signed number: %1111 is a negative number. To convert to decimal, calculate the two's complement: the two's complement = one's complement + 1 = %0000 + 1 = %0001 = 1, so %1111 = -1. %0111 is a positive number = 1 + 2 + 4 = 7.
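    The 4-bit two's-complement examples in the excerpt can be checked with a few lines of C (illustrative only):

        #include <stdio.h>

        /* Interpret the low 4 bits of v as a two's-complement (signed) number. */
        static int signed4(unsigned v) {
            v &= 0xF;
            return (v & 0x8) ? (int)v - 16 : (int)v;
        }

        int main(void) {
            printf("%%1111 -> unsigned %u, signed %d\n", 0xFu, signed4(0xF)); /* 15, -1 */
            printf("%%0111 -> unsigned %u, signed %d\n", 0x7u, signed4(0x7)); /*  7,  7 */
            printf("%%01000101 -> $%X\n", 0x45u);                             /* $45    */
            return 0;
        }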
  • Pep8cpu: a Programmable Simulator for a Central Processing Unit J
    Pep8CPU: A Programmable Simulator for a Central Processing Unit. J. Stanley Warford, Pepperdine University, 24255 Pacific Coast Highway, Malibu, CA 90265, [email protected]. Ryan Okelberry, Novell, 1800 South Novell Place, Provo, UT 84606, [email protected]. ABSTRACT: This paper presents a software simulator for a central processing unit. The simulator features two modes of operation. In the first mode, students enter individual control signals for the multiplexers, function controls for the ALU, memory read/write controls, register addresses, and clock pulses for the registers required for a single CPU cycle via a graphical user interface. In the second mode, students write a control sequence in a text window for the cycles necessary to implement a single instruction set architecture (ISA) instruction. The simulator parses the sequence and allows students to single step through its execution showing the color-coded data flow through the CPU. The paper concludes with a description of the use of the software in the Computer Organiza-[...] [...]baum [5]: application, high-order language, assembly, operating system, instruction set architecture (ISA), microcode, and logic gate. For a number of years we have used an assembler/simulator for Pep/8 in the Computer Systems course to give students a hands-on experience at the high-order language, assembly, and ISA levels. This paper presents a software package developed by an undergraduate student, now a software engineer at Novell, who took the Computer Organization course and was motivated to develop a programmable simulator at the microcode level. Yurcik gives a survey of machine simulators [8] and maintains a Web site titled Computer Architecture Simulators [9] with links to papers and internet sources for machine simulators.