1 Computer Systems


CS2 Supplementary Notes 2000-01

1 COMPUTER SYSTEMS.

1.1 Digital Computers.

A digital computer is a programmable digital system possessing the following elements (each may be present in multiplicity):

1. Memory Unit: used to store programs and operands (data); usually location addressable.
2. Execution Unit: consisting of at least an ALU, which performs all data processing functions (logical and arithmetic).
3. Control Unit (CU): a synchronous stored-program programmable controller.
4. I/O Unit: provides communication with other systems (and the outside world).

The ALU and CU are in intimate communication: the CU monitors condition codes, generates ALU function selects, and controls the movement of data. The CU and ALU, together with possibly a few storage registers, are therefore often considered as a single entity called a central processing unit (CPU). In the early 1970s, LSI technology (MOS) made it possible to fabricate an entire CPU (admittedly a rather small and simple one) on a single chip. An integrated CPU is called a microprocessing unit (MPU) or simply a microprocessor.

Whether the CPU is a single chip, a board (e.g. in a minicomputer), or a number of boards, its connection to the rest of the machine is typically limited to one or two data buses (with associated address and control buses). Machines which have completely separate buses for instructions and operands are called Harvard architectures. An example is the first commercial microprocessor, the INTEL 4004 (in 1971), which had an 8-bit instruction bus and a 4-bit operand bus. Most machines, however, use only one bus to fetch instructions and read/write operands: this is the Princeton or von Neumann architecture.

The width in bits (or lines) of the data bus is an important characteristic of the CPU. In the 1970s it was possible to classify machines in the following way: large machines (mainframes) typically had 32 CPU data bus lines; minicomputers (e.g. DEC PDP 11) typically had 16; and microcomputers (e.g. original IBM PC) typically had 8. However, with VLSI, microprocessors with the performance and data bus width of minicomputers and small to medium mainframes are now in existence.

The width of the address bus determines the maximum number of locations which can be addressed; the set of all possible locations constitutes the address space of the CPU. The address space can be very large even in microprocessors: e.g. the Pentium has a space of 2^32 bytes or 4G bytes (1M := 1K × 1K; 1G := 1K × 1M). Since a standard memory SIMM has a capacity of only 128M bytes, 4G bytes would require 32 such SIMMs. This would be expensive even in a large system, and usually there are large sections of unpopulated address space (no physical devices using addresses in these regions).

The memory addressed via the CPU address bus (i.e. that in the Memory Units) is referred to as primary memory. Most installations have a much larger backing store or secondary memory, which is treated as a subsidiary external system and is accessed via specialised I/O units called secondary storage controllers.

The CPU can communicate not only with its primary memory, but also with its I/O units, which typically contain at least one or two registers addressable by it. Some CPUs have a separate address space for I/O devices. This can be done by the inclusion of one extra line, say I/O enable, but special I/O instructions are also needed (e.g. Pentium family). Other CPUs simply place their I/O devices in the memory address space, where they appear to the CPU like memory locations. This is called memory mapped I/O (e.g. MC68000 family).

1.2 Bus Communications.

We have seen how various computer subsystems are interfaced to a system address/data bus. Data transfer on a bus is a transaction involving one source subsystem and, usually, one destination.
Each such transfer, or bus cycle, is conducted under the control of the subsystem which is currently driving the address and control buses: the bus master for that cycle. We will consider only cycles involving two subsystems: one bus master and one bus slave.

Each bus master will, of course, be a subsystem with an internal controller, such as a CPU (or another device such as a DMA controller, DMAC), which is capable of driving the address bus. In the simplest systems the only potential bus master is the (single) CPU. In more complex systems several subsystems may have the ability to be bus masters. If several masters wish to use a shared bus, it is necessary to arbitrate between them. This is often done by a separate subsystem called a bus arbiter, to which each potential master can send a request for bus control. The arbiter grants the bus to one master at a time. There are several algorithms which can be used to decide which master will be issued with a grant: e.g. masters can be prioritised. Also, the arbiter may allow one master to hold the bus until it has finished its transfer, or it may allow each master only one cycle at a time before forcing rearbitration.

Slaves are, by definition, addressable devices. The address bus defines a system address space (an n-bit address bus gives a 2^n word address space). Each slave appears in this space as one or more addressable locations (usually the locations within a slave are contiguous). A slave may be a memory mapped I/O device with a couple of locations, or a large memory unit with millions. It is necessary for an address to specify not only which location within a slave is being addressed, but also which slave is involved. Each slave must be allocated a unique portion of the address space which it will occupy. The more significant bits of the address bus are usually used to select the slave in question. Often these bits are interpreted by a single central address decoder, which then sends an enable signal to the slave required.
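The use of the more significant address bits to select a slave can be illustrated with a minimal sketch. The bus width (16 bits), the number of select bits (3) and the function names are assumptions made purely for illustration; they do not describe any particular machine in these notes.

```python
# Hypothetical 16-bit address bus: the top 3 bits select a slave and the
# low 13 bits give the location within it. Both widths are assumptions.
ADDRESS_BITS = 16
SELECT_BITS = 3
OFFSET_BITS = ADDRESS_BITS - SELECT_BITS


def split_address(address):
    """Return (slave_number, offset_within_slave) for a bus address."""
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError("address does not fit on the address bus")
    slave = address >> OFFSET_BITS                # more significant bits
    offset = address & ((1 << OFFSET_BITS) - 1)   # less significant bits
    return slave, offset


def select_lines(address):
    """One enable line per slave; exactly one is active for a valid address."""
    slave, _ = split_address(address)
    return [n == slave for n in range(2 ** SELECT_BITS)]
```

For example, under these assumptions `split_address(0x2005)` gives `(1, 5)`: slave 1, location 5 within it. Every slave here occupies an equal share (2^13 locations) of the address space, which is the simplest possible arrangement.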
This "decoder" is not usually as simple as the decoder circuit discussed earlier, since different slaves can occupy different amounts of address space. A memory unit may require tens of millions of addresses, any one of which will cause it to be enabled, while an I/O interface may have as few as one or two. Thus one output of the address decoder may need to go active for tens of millions of different inputs, while another responds to only one. In any case, at most one output should go active at any one time.

[Figure: address decoder. Input: address bus (usually the upper lines). Outputs: selects to the different slaves; the active select enables its slave.]

Note that any master which cannot drive all address lines will be unable to access large areas of the address space. Note also that a master in one cycle can sometimes be a slave in another. Subsystems which have master/slave capability must have I/O interfaces to the address and control buses as well as the data bus.

When a bus master has control of the bus, it will activate the address bus, putting the address of the required location onto it. Various control lines are also necessary to manage the transfer. The master will usually need at least:

1) Read/Write Line (say H=read, L=write), which indicates to the slave whether the master is going to read from it or write to it.
2) Address Valid (say L=address valid), which indicates to all slaves when the value on the address bus is valid, to avoid ambiguities when, for example, the lines are in transition. The address decoder will usually be disabled (all outputs inactive) when address valid is inactive.

It cannot in general be assumed that the master knows how long the slave will need to identify its address, store data written to it, or present valid data requested from it. The problem is particularly acute with mass produced standard microprocessors, where a single type of device can be used in a myriad of different systems containing other subsystems of widely differing speeds.
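The behaviour of an address decoder serving slaves of unequal size, as described earlier in this section, can be sketched as follows. The memory map (slave names and address ranges) is entirely hypothetical, and `address_valid` is a logical value (True meaning "the address is valid"), not the electrical level of the line.

```python
# Hypothetical memory map for a 16-bit address bus. All names and ranges
# are assumptions: (slave name, first address, last address).
MEMORY_MAP = [
    ("ROM", 0x0000, 0x7FFF),     # 32K locations
    ("RAM", 0x8000, 0xBFFF),     # 16K locations
    ("UART", 0xFFF0, 0xFFF1),    # memory mapped I/O device, 2 locations
]


def decode(address, address_valid):
    """Return the select outputs as a dict; at most one is active (True)."""
    selects = {name: False for name, _, _ in MEMORY_MAP}
    if not address_valid:        # decoder disabled: all outputs inactive
        return selects
    for name, first, last in MEMORY_MAP:
        if first <= address <= last:
            selects[name] = True
            break                # ranges are assumed not to overlap
    return selects
```

In this sketch the ROM select goes active for 32768 different addresses while the UART select responds to only two, and an address in an unpopulated region (e.g. 0xC000) activates no output at all.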
Several strategies are used to overcome this timing uncertainty.

2 INSTRUCTIONS.

Instructions fetched and executed by the CPU vary from machine to machine. The collection of instructions which a particular CPU can execute comprises its instruction set. Although instructions are fetched, via the data bus, in binary form (machine code), it is normal to associate with each a mnemonic which describes its function; this, of course, is for human consumption only. An instruction may be several CPU words long, and so may involve more than one memory cycle to fetch. For example, a 68000 instruction can be anything from 1 to 5 words long (each instruction word must be stored in a 16-bit memory word).

Instructions in general can be classified as:

1) Control Instructions. Normally the CU executes instructions sequentially, but it is sometimes desirable to alter this. Branch or jump instructions direct the CU to begin executing at some location specified by the instruction itself. Simple branches may be unconditional or may depend on status inputs from e.g. the ALU (i.e. condition codes). Most CPUs save the condition codes from the last instruction in a special internal register called the status register (SR for short), also known as the condition code register (CCR for short). Other branching instructions may call subroutines or govern more sophisticated looping behaviour. Additionally, many machines have control instructions which can e.g. stop the CPU and wait for some external event, reset the rest of the system, etc.
2) Data Processing Instructions tell the CPU to operate on data.
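The interaction between condition codes and branch instructions, described under control instructions above, can be illustrated with a toy model. The class, its two flags and its method names are hypothetical and do not correspond to the instruction set of the 68000 or any other real CPU; each instruction is also assumed to occupy a single word.

```python
class TinyCPU:
    """Toy CPU with a program counter and a two-flag status register."""

    def __init__(self):
        self.pc = 0        # program counter: address of the next instruction
        self.z = False     # zero flag: last result was zero
        self.n = False     # negative flag: last result was negative

    def compare(self, a, b):
        """The ALU computes a - b; only the condition codes are kept."""
        result = a - b
        self.z = (result == 0)
        self.n = (result < 0)
        self.pc += 1       # normal sequential execution

    def branch_if_equal(self, target):
        """Conditional branch: taken only if the zero flag is set."""
        self.pc = target if self.z else self.pc + 1

    def branch_always(self, target):
        """Unconditional branch: always continue at the target."""
        self.pc = target
```

After `cpu = TinyCPU(); cpu.compare(5, 5); cpu.branch_if_equal(10)` the zero flag is set, so the branch is taken and `cpu.pc` is 10. Had the operands differed, execution would simply have continued with the next instruction in sequence.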