Computer Organization: Structure of a Computer, Registers, Register Transfer

Total pages: 16

File type: PDF, size: 1020 KB

CS 150 - Spring 2004 – Lec 11: Computer Org I

Computer Organization

❚ Computer design as an application of digital logic design procedures
❚ Computer = processing unit + memory system
❚ Processing unit = control + datapath
❚ Control = finite state machine
❙ Inputs = machine instruction, datapath conditions
❙ Outputs = register transfer control signals, ALU operation codes
❙ Instruction interpretation = instruction fetch, decode, execute
❚ Datapath = functional units + registers
❙ Functional units = ALU, multipliers, dividers, etc.
❙ Registers = program counter, shifters, storage registers

Structure of a Computer

❚ Block diagram view
❙ Processor (central processing unit, CPU) connected to the memory system by address, data, and read/write lines
❙ CPU = control (instruction unit – instruction fetch and interpretation FSM) + data path (execution unit – functional units and registers)
❙ Control signals flow from control to the data path; data conditions flow back

Registers

❚ Selectively loaded – EN or LD input
❚ Output enable – OE input
❚ Multiple registers – group 4 or 8 in parallel
❚ OE asserted causes the FF state to be connected to the output pins; otherwise they are left unconnected (high impedance)
❚ LD asserted during a lo-to-hi clock transition loads new data into the FFs

Register Transfer

❚ Point-to-point connection
❙ Dedicated wires
❙ Muxes on the inputs of each register (rs, rt, rd, R4)
❚ Common input from a multiplexer
❙ Load enables for each register
❙ Control signals for the multiplexer
❚ Common bus with output enables
❙ Output enables and load enables for each register
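The LD/OE discipline and the common-bus transfer can be sketched in software. A minimal Python model (class names, widths, and the `None`-as-high-impedance convention are illustrative assumptions, not from the slides):

```python
class Register:
    """N-bit register with a load enable (LD) and an output enable (OE)."""

    def __init__(self, width=16):
        self.width = width
        self.state = 0        # flip-flop contents
        self.ld = False       # load enable
        self.oe = False       # output enable

    def drive(self):
        # OE asserted connects the FF state to the output pins;
        # otherwise the pins are left unconnected (high impedance -> None).
        return self.state if self.oe else None

    def clock_edge(self, bus):
        # LD asserted during a lo-to-hi clock transition loads new data.
        if self.ld and bus is not None:
            self.state = bus & ((1 << self.width) - 1)


def bus_transfer(src, dst, registers):
    """One register transfer over a common bus: assert exactly one OE and one LD."""
    for r in registers:
        r.ld = r.oe = False
    src.oe = True             # source drives the shared bus
    dst.ld = True             # destination captures it on the next clock edge
    bus = src.drive()
    for r in registers:
        r.clock_edge(bus)
```

Note the key invariant of the common-bus scheme: only one register may assert OE per transfer, or the bus would be driven by two sources at once.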
Register Files

❚ Collections of registers in one package
❙ Two-dimensional array of FFs
❙ Address used as an index to a particular word
❙ Separate read and write addresses, so a read and a write can be done at the same time
❚ 4 by 4 register file
❙ 16 D-FFs
❙ Organized as four words of four bits each
❙ Write-enable (load)
❙ Read-enable (output enable)

Memories

❚ Larger collections of storage elements
❙ Implemented not as FFs but as much more efficient latches
❙ High-density memories use 1-5 switches (transistors) per bit
❚ Static RAM – 1024 words, each 4 bits wide
❙ Once written, memory holds forever (not true for denser dynamic RAM)
❙ Address lines to select a word (10 lines for 1024 words)
❙ Read enable
❘ Same as output enable
❘ Often called chip select
❘ Permits connection of many chips into a larger array
❙ Write enable (same as load enable)
❙ Bi-directional data lines
❘ Output when reading, input when writing

Instruction Sequencing

❚ Example – an instruction to add the contents of two registers (Rx and Ry) and place the result in a third register (Rz)
❚ Step 1: get the ADD instruction from memory into an instruction register
❚ Step 2: decode the instruction
❙ The instruction in the IR has the code of an ADD instruction
❙ Register indices are used to generate output enables for registers Rx and Ry
❙ A register index is used to generate the load signal for register Rz
❚ Step 3: execute the instruction
❙ Enable the Rx and Ry outputs and direct them to the ALU
❙ Set up the ALU to perform the ADD operation
❙ Direct the result to Rz so that it can be loaded into the register

Instruction Types

❚ Data manipulation
❙ Add, subtract
❙ Increment, decrement
❙ Multiply
❙ Shift, rotate
❙ Immediate operands
❚ Data staging
❙ Load/store data to/from memory
❙ Register-to-register move
❚ Control
❙ Conditional/unconditional branches in program flow
❙ Subroutine call and return
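The register-file organization – separate read and write ports, write enable acting as a load, read enable acting as an output enable – can be sketched as follows. A hypothetical Python model sized like the 4-by-4 example; the method and parameter names are illustrative:

```python
class RegisterFile:
    """4-by-4 register file: four words of four bits each (16 D-FFs).

    Separate read and write addresses, so a read and a write
    can be performed at the same time.
    """

    def __init__(self, n_words=4, width=4):
        self.words = [0] * n_words
        self.mask = (1 << width) - 1   # 4-bit words -> 0xF

    def read(self, ra):
        # Read port: address RA selects which word drives the outputs
        # (read enable, i.e. output enable, assumed asserted here).
        return self.words[ra]

    def write(self, wa, data, we=True):
        # Write port: WE (the load enable) stores data at address WA
        # on the clock edge; extra high bits are simply not stored.
        if we:
            self.words[wa] = data & self.mask
```

Because the read and write ports are independent, `read(ra)` and `write(wa, …)` can model a same-cycle read and write to different words.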
Elements of the Control Unit (aka Instruction Unit)

❚ Standard FSM elements
❙ State register
❙ Next-state logic
❙ Output logic (datapath/control signaling)
❙ Moore or synchronous Mealy machine, to avoid loops unbroken by a FF
❚ Plus additional "control" registers
❙ Instruction register (IR)
❙ Program counter (PC)
❚ Inputs/outputs
❙ Outputs control the elements of the data path
❙ Inputs from the data path are used to alter the flow of the program (e.g., test if zero)

Instruction Execution

❚ Control state diagram (for each instruction)
❙ Reset
❙ Fetch instruction
❙ Decode
❙ Execute
❙ States: Reset → Init (initialize machine) → Fetch Instr. → XEQ Instr. → Branch (taken or not taken) / Load/Store / Register-to-Register → Incr. PC
❚ Instructions partitioned into three classes
❙ Branch
❙ Load/store
❙ Register-to-register
❚ Different sequence through the diagram for each instruction type

Data Path (Hierarchy)

❚ Arithmetic circuits are constructed in a hierarchical and iterative fashion
❙ Each bit in the datapath is functionally identical
❙ 4-bit, 8-bit, 16-bit, 32-bit datapaths
❙ Example: a full adder (FA) built from two half adders (HA) – inputs Ain, Bin, Cin; outputs Sum, Cout

Data Path (ALU)

❚ ALU block diagram
❙ Input: data (two 16-bit operands) and the operation to perform
❙ Output: 16-bit result of the operation and status information (e.g., N and Z flags)

Data Path (ALU + Registers)

❚ Accumulator
❙ Special register
❙ One of the inputs to the ALU
❙ Output of the ALU is stored back in the accumulator
❚ One-address instructions
❙ Operation and address of one operand
❙ Other operand and destination is the accumulator register
❙ AC <– AC op Mem[addr]
❙ "Single address instructions" (AC is an implicit operand)
❚ Multiple registers
❙ Part of the instruction is used to choose the register operands

Data Path (Bit-slice)

❚ Bit-slice concept: iterate to build n-bit wide datapaths
❙ Each slice holds one bit of the ALU (with carry in/out) and the registers (AC, R0); cascade slices 1 bit wide, 2 bits wide, and so on
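The hierarchy half adder → full adder → n-bit datapath can be sketched directly. A Python sketch assuming a ripple-carry composition of one functionally identical bit slice (the slides do not specify the carry scheme):

```python
def half_adder(a, b):
    # HA: sum = a XOR b, carry = a AND b
    return a ^ b, a & b

def full_adder(ain, bin_, cin):
    # FA built from two half adders, as in the hierarchy slide
    s1, c1 = half_adder(ain, bin_)
    sum_, c2 = half_adder(s1, cin)
    return sum_, c1 | c2

def ripple_add(a, b, width=16, cin=0):
    """Iterate one identical bit slice to build a width-bit adder."""
    total, carry = 0, cin
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    n = (total >> (width - 1)) & 1   # N status flag: sign bit of the result
    z = int(total == 0)              # Z status flag: result is zero
    return total, n, z, carry        # carry is the slice chain's carry-out
```

Widening the datapath is just a change of the `width` parameter, which is the point of the bit-slice idea.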
Instruction Path

❚ Program counter
❙ Keeps track of program execution
❙ Holds the address of the next instruction to read from memory
❙ May have an auto-increment feature, or use the ALU
❚ Instruction register
❙ Holds the current instruction
❙ Includes the ALU operation and the address of an operand
❙ Also holds the target of a jump instruction
❙ Immediate operands
❚ Relationship to the data path
❙ The PC may be incremented through the ALU
❙ The contents of the IR may also be required as an input to the ALU

Data Path (Memory Interface)

❚ Memory
❙ Separate data and instruction memory (Harvard architecture)
❘ Two address busses, two data busses
❙ Single combined memory (Princeton architecture)
❘ Single address bus, single data bus
❚ Separate memory
❙ ALU output goes to the data memory input
❙ Register input comes from the data memory output
❙ Data memory address comes from the instruction register
❙ Instruction register is loaded from the instruction memory output
❙ Instruction memory address comes from the program counter
❚ Single memory
❙ Address from the PC or the IR
❙ Memory output to the instruction and data registers
❙ Memory input from the ALU output

Block Diagram of Processor

❚ Register transfer view of the Princeton architecture
❙ Shows which register outputs are connected to which register inputs
❙ Arrows represent data flow; the others are control signals from the control FSM
❙ MAR may be a simple multiplexer rather than a separate register
❙ MBR is split in two (REG and IR)
❙ Load control for each register
❚ Register transfer view of the Harvard architecture
❙ Two MARs (PC and IR)
❙ Two MBRs (REG and IR)
❙ Load control for each register
❙ Data memory of 16-bit words; instruction memory of 8-bit words
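The Harvard-style memory interface can be summarized in a few lines of code: separate instruction and data memories, each with its own address source. A hypothetical Python sketch (class and method names are illustrative; sizes are arbitrary):

```python
class HarvardMachine:
    """Harvard-style memory interface: separate instruction and data
    memories, each with its own address and data bus."""

    def __init__(self, program, data_words=256):
        self.imem = list(program)      # instruction memory
        self.dmem = [0] * data_words   # data memory
        self.pc = 0                    # program counter
        self.ir = 0                    # instruction register

    def fetch(self):
        # Instruction memory address comes from the PC;
        # the instruction memory output loads the IR.
        self.ir = self.imem[self.pc]
        self.pc += 1                   # auto-increment (could also go through the ALU)

    def load(self, addr):
        # Data memory output feeds a register input; address comes from IR fields.
        return self.dmem[addr]

    def store(self, addr, alu_out):
        # ALU output goes to the data memory input.
        self.dmem[addr] = alu_out
```

A Princeton variant would collapse `imem` and `dmem` into one array and multiplex the address between the PC and the IR.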
A Simplified Processor Data-path and Memory

❚ Princeton architecture memory has only 255 words, with a display on the last one
❚ Register file
❚ Instruction register
❚ PC incremented through the ALU

Processor Control

❚ Synchronous Mealy machine
❚ Multiple cycles per instruction
❚ Modeled after MIPS (used in the 61C textbook by Patterson & Hennessy)
❙ Really a 32-bit machine
❙ We'll do a 16-bit version

Processor Instructions

❚ Three principal types (16 bits in each instruction)
❙ R(egister): op (3 bits), rs (3), rt (3), rd (3), funct (4)
❙ I(mmediate): op (3 bits), rs (3), rt (3), offset (7)
❙ J(ump): op (3 bits), target address (13)
❚ Some of the instructions
❙ add – R, op 0, funct 0: rd = rs + rt
❙ sub – R, op 0, funct 1: rd = rs - rt
❙ and – R, op 0, funct 2: rd = rs & rt
❙ or – R, op 0, funct 3: rd = rs | rt
❙ slt – R, op 0, funct 4: rd = (rs < rt)
❙ lw – I, op 1: rt = mem[rs + offset]
❙ sw – I, op 2: mem[rs + offset] = rt
❙ beq – I, op 3: pc = pc + offset, if (rs == rt)
❙ addi – I, op 4: rt = rs + offset
❙ j – J, op 5: pc = target address
❙ halt – op 7: stop execution until reset
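The three 16-bit formats can be packed and unpacked with shifts and masks. A Python sketch; the field widths follow the R/I/J layouts above, but placing `op` in the top three bits is an assumed bit order, since the excerpt does not pin it down:

```python
# R-type: op(3) rs(3) rt(3) rd(3) funct(4)
# I-type: op(3) rs(3) rt(3) offset(7)
# J-type: op(3) target(13)

def encode_r(op, rs, rt, rd, funct):
    return (op << 13) | (rs << 10) | (rt << 7) | (rd << 4) | funct

def decode(word):
    op = (word >> 13) & 0x7
    if op == 0:                        # R-type: add/sub/and/or/slt, chosen by funct
        return ('R', op, (word >> 10) & 0x7, (word >> 7) & 0x7,
                (word >> 4) & 0x7, word & 0xF)
    if op in (1, 2, 3, 4):             # I-type: lw/sw/beq/addi
        return ('I', op, (word >> 10) & 0x7, (word >> 7) & 0x7, word & 0x7F)
    return ('J', op, word & 0x1FFF)    # J-type: j (and halt, op 7)
```

For example, `encode_r(0, 1, 2, 3, 0)` produces the word for `add r3, r1, r2`.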
Tracing an Instruction's Execution

❚ Instruction: r3 = r1 + r2
❙ R-format: op = 0, rs = r1, rt = r2, rd = r3, funct = 0
❚ 1. Instruction fetch
❙ Move the instruction address from the PC to the memory address bus
❙ Assert memory read
❙ Move the data from the memory data bus into the IR
❙ Configure the ALU to add 1 to the PC
❙ Configure the PC to store the new value from ALUout
❚ 2. Instruction decode
❙ Op-code bits of the IR are input to the control FSM
❙ The rest of the IR bits encode the operand addresses (rs and rt)
❘ These go to the register file

Tracing an Instruction's Execution (cont'd)

❚ Instruction: r3 = r1 + r2
❚ Step 1
❚ 3.
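The fetch/decode/execute trace of `r3 = r1 + r2` can be walked through in code. A self-contained Python sketch of the three steps (the bit layout, 16-bit wraparound, and register-array model are illustrative assumptions):

```python
def trace_add(mem, pc, regs):
    """Fetch, decode, and execute one R-type add."""
    # 1. Instruction fetch: PC -> memory address bus; memory read -> IR;
    #    the ALU adds 1 to the PC and the PC stores the new value.
    ir = mem[pc]
    pc += 1
    # 2. Instruction decode: op-code bits feed the control FSM;
    #    the remaining bits index the register file (rs, rt) and name rd.
    op = (ir >> 13) & 0x7
    rs = (ir >> 10) & 0x7
    rt = (ir >> 7) & 0x7
    rd = (ir >> 4) & 0x7
    funct = ir & 0xF
    # 3. Execute: the register file drives rs and rt into the ALU,
    #    the ALU adds, and the result is loaded into rd.
    if op == 0 and funct == 0:
        regs[rd] = (regs[rs] + regs[rt]) & 0xFFFF
    return pc, regs

regs = [0, 7, 5, 0, 0, 0, 0, 0]                            # r1 = 7, r2 = 5
instr = (0 << 13) | (1 << 10) | (2 << 7) | (3 << 4) | 0    # add r3, r1, r2
pc, regs = trace_add([instr], 0, regs)
```

After the call, `regs[3]` holds the sum and `pc` has advanced past the instruction, matching the trace above.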