
General Purpose Processors - I

In this lesson the student will learn the following:
• Architecture of a General Purpose Processor
• Various Stages of Pipelines
• Basic Idea of Different Execution Units
• Branch Prediction

Pre-requisite

Digital Electronics

8.1 Introduction

The first single-chip microprocessor came in 1971 from Intel Corporation. It was called the Intel 4004 and it was the first single-chip CPU ever built; we can say it was the first general purpose processor. Today the terms microprocessor and processor are synonymous. The 4004 was a 4-bit processor, capable of addressing 1K of data memory and 4K of program memory, and it was meant to be used in a simple calculator. The 4004 had 46 instructions, used only 2,300 transistors in a 16-pin DIP, and ran at a clock of 740 kHz (eight clock cycles per CPU cycle of 10.8 microseconds). In 1975, Motorola introduced the 6800, a chip with 78 instructions and probably the first microprocessor with an index register. In 1979, Motorola introduced the 68000; although it had internal 32-bit registers and a 32-bit address space, its external bus was still 16 bits wide to keep hardware costs down. On the other hand, in 1976 Intel designed the 8085 with more instructions to enable/disable three added interrupt pins (and the serial I/O pins). They also simplified the hardware so that it used only a +5V supply, and added clock-generator and bus-controller circuits on the chip. In 1978, Intel introduced the 8086, a 16-bit processor which gave rise to the x86 architecture. It did not contain floating-point instructions. In 1980 the company released the 8087, the first math co-processor it had developed. Next came the 8088, the processor used in the first IBM PC. Even though IBM engineers at the time wanted to use the Motorola 68000 in the PC, the company already had the rights to produce the 8086 line (by trading rights to Intel for its bubble memory), it could use modified 8085-type components, and 68000-style components were much more scarce.

Table 1 Development History of Intel Microprocessors

Intel Processor   Year of Introduction   Initial Clock Speed   Number of Transistors   Circuit Line Width
4004              1971                   108 kHz               2,300                   10 micron
8008              1972                   500-800 kHz           3,500                   10 micron
8080              1974                   2 MHz                 4,500                   6 micron
8086              1978                   5 MHz                 29,000                  3 micron
8088              1979                   5 MHz                 29,000                  3 micron
Intel286          1982                   6 MHz                 134,000                 1.5 micron
Intel386          1985                   16 MHz                275,000                 1.5 micron
Intel486          1989                   25 MHz                1.2 million             1 micron
Pentium           1993                   66 MHz                3.1 million             0.8 micron
Pentium Pro       1995                   200 MHz               5.5 million             0.35 micron
Pentium II        1997                   300 MHz               7.5 million             0.25 micron
Celeron           1998                   266 MHz               7.5 million             0.25 micron
Pentium III       1999                   500 MHz               9.5 million             0.25 micron
Pentium 4         2000                   1.5 GHz               42 million              0.18 micron
Itanium           2001                   800 MHz               25 million              0.18 micron
Intel Xeon        2001                   1.7 GHz               42 million              0.18 micron
Itanium 2         2002                   1 GHz                 220 million             0.18 micron
Pentium M         2005                   1.5 GHz               140 million             90 nm

The development history of Intel family of processors is shown in Table 1. The Very Large Scale Integration (VLSI) technology has been the main driving force behind the development.

8.2 A Typical Processor

Fig. 8.2 The photograph of the processor (VIA C3)

The photograph and architecture of a modern general purpose processor from VIA (the C3; please refer to the lesson on Embedded Components 2) are shown in Fig. 8.2 and Fig. 8.3 respectively.


[Fig. 8.3 block diagram: the 12-stage pipeline (I, B, V, F, X, R, A, D, G, E, S, W) with the 64 KB 4-way I-cache and 128-entry 8-way I-TLB, 8-entry PDC, predecode, decode buffer, branch prediction (return stack, 3 BHTs, BTB), decode and translate stages with 4-entry instruction queues, bus unit and 64 KB 4-way L2 cache, ROM, register file, address calculation, D-cache and D-TLB, integer ALU, FP/MMX/3D and store-branch units, write-back, store buffers and write buffers]

Fig. 8.3 The architecture

Specification

Name: VIA C3™ in EBGA. VIA is the name of the company, C3 is the processor, and EBGA stands for Enhanced Ball Grid Array. The clock speed is 1 GHz.

Ball Grid Array (abbreviated BGA): a ball grid array is a type of microchip connection methodology. Ball grid array chips typically use a group of solder dots, or balls, arranged in concentric rectangles to connect to a circuit board. BGA chips are often used in mobile applications where Pin Grid Array (PGA) chips would take up too much space due to the length of the pins used to connect the chips to the circuit board.

[Fig. 8.4 shows common package types: SIMM, DIP, PGA and SIP]

Fig. 8.4 Pin Grid Array (PGA)

Fig. 8.5 Ball Grid Array

Fig. 8.6 The Bottom View of the Processor

The Architecture

The processor has a 12-stage integer pipelined structure:

Pipeline: This is a very important characteristic of a modern general purpose processor. A program is a set of instructions stored in memory. During execution the processor has to fetch these instructions from the memory, decode them and execute them, and this process takes a few clock cycles. To increase the speed of such processes the processor divides itself into different units: while one unit gets the instructions from the memory, another unit decodes them and some other unit executes them. This is called pipelining. It can be described as segmenting a functional unit such that it can accept new operands every cycle even though the total execution of an instruction may take many cycles. The pipeline works like a conveyor belt, accepting units until the pipeline is filled and then producing a result every cycle. The above processor has such a pipeline divided into 12 stages. There are four major functional groups: I-fetch, decode and translate, execution, and data cache.
• The I-fetch components deliver instruction bytes from the large I-cache or the external bus.
• The decode and translate components convert these instruction bytes into internal execution forms. If there is any branching operation in the program it is identified here and the processor starts getting new instructions from a different location.
• The execution components issue, execute, and retire internal instructions.
• The data cache components manage the efficient loading and storing of execution data to and from the caches, bus, and internal components.
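The conveyor-belt behaviour can be made concrete with a small simulation. The C sketch below is only illustrative: the five stage names and the pipeline depth are assumptions chosen for readability (the C3's integer pipe actually has 12 stages). Once the pipe fills, one instruction retires every cycle even though each instruction spends several cycles in flight.

#include <stdio.h>

#define STAGES 5   /* assumed depth for illustration; the VIA C3 integer pipe has 12 stages */

int main(void)
{
    const char *name[STAGES] = { "Fetch", "Decode", "Execute", "Memory", "Write" };
    int pipe[STAGES] = { 0 };          /* instruction number in each stage, 0 = empty */
    int next_instr = 1, retired = 0;

    for (int cycle = 1; cycle <= 10; cycle++) {
        if (pipe[STAGES - 1]) retired++;              /* instruction leaving the last stage completes  */
        for (int s = STAGES - 1; s > 0; s--)          /* everything moves one stage down the belt      */
            pipe[s] = pipe[s - 1];
        pipe[0] = next_instr++;                       /* a new instruction enters the pipe every cycle */

        printf("cycle %2d:", cycle);
        for (int s = 0; s < STAGES; s++)
            pipe[s] ? printf(" %s=I%d", name[s], pipe[s]) : printf(" %s=--", name[s]);
        printf("  retired=%d\n", retired);
    }
    return 0;
}

After the first few fill cycles the "retired" count increases by one every cycle, which is exactly the throughput benefit the text describes.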

Instruction Fetch Unit

Fig. 8.7 Instruction fetch unit (pipeline stages I, B, V): 64 KB 4-way I-cache, 128-entry 8-way I-TLB, 8-entry PDC, predecode, decode buffer

First three pipeline stages (I, B, V) deliver aligned instruction data from the I-cache (Instruction Cache) or external bus into the instruction decode buffers. The primary I-cache contains 64 KB organized as four-way set associative with 32-byte lines. The associated large I-TLB (Instruction Translation Look-aside Buffer) contains 128 entries organized as 8-way set associative.

TLB: A translation look-aside buffer is a table in the processor's memory that contains information about the pages in memory the processor has accessed recently. The table cross-references a program's virtual addresses with the corresponding absolute addresses in physical memory that the program has most recently used. The TLB enables faster computing because it allows the address processing to take place independently of the normal address-translation pipeline.

The instruction data is predecoded as it comes out of the cache; this predecode is overlapped with other required operations and, thus, effectively takes no time. The fetched instruction data is placed sequentially into multiple buffers. Starting with a branch, the first branch-target byte is left adjusted into the instruction decode buffer.
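A TLB can be pictured as a tiny cache that maps virtual page numbers to physical frame numbers. The C sketch below is a generic, fully associative lookup; the 4 KB page size, eight entries and field names are illustrative assumptions, not the C3's actual organisation (its I-TLB is 128-entry, 8-way set associative).

#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS   12          /* assume 4 KB pages      */
#define TLB_ENTRIES 8           /* illustrative size only */

struct tlb_entry { bool valid; uint32_t vpn; uint32_t pfn; };
static struct tlb_entry tlb[TLB_ENTRIES];

/* Translate a virtual address; returns true on a TLB hit. */
bool tlb_lookup(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn    = vaddr >> PAGE_BITS;
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);

    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *paddr = (tlb[i].pfn << PAGE_BITS) | offset;  /* hit: no page-table walk needed */
            return true;
        }
    }
    return false;   /* miss: a page-table walk would translate and refill an entry */
}

On a hit the physical address is available immediately; on a miss the normal address-translation path (a page-table walk) has to run, which is why a large TLB speeds up address processing.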

Instruction Decode Unit

Fig. 8.8 Instruction decode and translate unit (pipeline stages F, X): decode buffer, branch prediction (return stack, 3 BHTs, BTB), decode, translate, 4-entry instruction queues

Instruction bytes are decoded and translated into the internal format by two pipeline stages (F, X). The F stage decodes and “formats” an instruction into an intermediate format. The internal-format instructions are placed into a five-deep FIFO (First-In-First-Out) queue: the FIQ. The X stage “translates” an intermediate-form instruction from the FIQ into the internal microinstruction format. Instruction fetch, decode, and translation are made asynchronous from execution via a five-entry FIFO queue (the XIQ) between the translator and the execution unit.
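The decoupling role of the FIQ and XIQ can be sketched as a small ring-buffer FIFO: the producing stage keeps pushing as long as the queue is not full, and the consuming stage pops at its own pace, so the two run asynchronously. The five-entry depth matches the text; the structure and function names below are illustrative assumptions.

#include <stdbool.h>
#include <stdint.h>

#define QDEPTH 5                     /* five-deep queue, as in the text */

struct fifo { uint32_t slot[QDEPTH]; int head, tail, count; };

bool fifo_push(struct fifo *q, uint32_t instr)   /* producer: decode/translate stage */
{
    if (q->count == QDEPTH) return false;        /* full: the producer stalls        */
    q->slot[q->tail] = instr;
    q->tail = (q->tail + 1) % QDEPTH;
    q->count++;
    return true;
}

bool fifo_pop(struct fifo *q, uint32_t *instr)   /* consumer: the next unit          */
{
    if (q->count == 0) return false;             /* empty: the consumer waits        */
    *instr = q->slot[q->head];
    q->head = (q->head + 1) % QDEPTH;
    q->count--;
    return true;
}

As long as the queue is neither full nor empty, a slow consumer does not stall the producer and vice versa, which is exactly what decouples fetch/decode from execution.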

Branch Prediction

Fig. 8.9 Branch prediction logic: BHT (Branch History Table) and BTB (Branch Target Buffer), return stack, decode buffer, 4-entry instruction queue

Programs often invoke subroutines which are stored at a different location in the memory. In general the instruction fetch mechanism fetches instructions ahead of time, keeps them in the cache memory at different stages and sends them for decoding. In case of a branch, all such instructions need to be abandoned and a new set of instruction codes from the corresponding subroutine has to be loaded. Predicting the branch earlier in the pipeline saves the time spent flushing the current instructions and fetching new ones. Branch prediction is a technique that attempts to infer the proper next instruction address, knowing only the current one. Typically it uses a Branch Target Buffer (BTB), a small associative memory that watches the instruction cache index and tries to predict which index should be accessed next, based on branch history which is stored in another set of buffers known as the Branch History Table (BHT). This is carried out in the F stage.
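One classical way to build such a predictor is a table of 2-bit saturating counters (the BHT) indexed by low-order instruction-address bits, with the BTB remembering the target of taken branches. The C sketch below is this generic textbook scheme, not the C3's actual three-BHT design; the table sizes and names are assumptions.

#include <stdint.h>
#include <stdbool.h>

#define BHT_SIZE 256                 /* illustrative sizes, not the C3's */
#define BTB_SIZE 64

static uint8_t bht[BHT_SIZE];        /* 2-bit counters 0..3; >= 2 means predict taken */
static struct { uint32_t pc, target; bool valid; } btb[BTB_SIZE];

/* Predict: returns true (taken) with a target, or false (fall through). */
bool predict(uint32_t pc, uint32_t *target)
{
    bool taken = bht[pc % BHT_SIZE] >= 2;
    if (taken && btb[pc % BTB_SIZE].valid && btb[pc % BTB_SIZE].pc == pc) {
        *target = btb[pc % BTB_SIZE].target;
        return true;
    }
    return false;
}

/* Update with the actual outcome once the branch resolves. */
void update(uint32_t pc, bool taken, uint32_t target)
{
    uint8_t *c = &bht[pc % BHT_SIZE];
    if (taken  && *c < 3) (*c)++;                /* saturate at "strongly taken"     */
    if (!taken && *c > 0) (*c)--;                /* saturate at "strongly not taken" */
    if (taken) {                                  /* remember the target in the BTB   */
        btb[pc % BTB_SIZE].pc = pc;
        btb[pc % BTB_SIZE].target = target;
        btb[pc % BTB_SIZE].valid = true;
    }
}

A counter value of 2 or 3 means "predict taken"; each misprediction only nudges the counter, so a single anomalous outcome does not immediately flip the prediction.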

Integer Unit

Fig. 8.10 Integer unit pipeline: register file access (R), address calculation (A), D-cache and D-TLB access (D, G), integer ALU execute (E), store-branch (S) and write-back (W) stages, with the bus unit, 64 KB 4-way L2 cache, ROM, store buffers and write buffers

Decode stage (R): Micro-instructions are decoded, integer register files are accessed and resource dependencies are evaluated.
Addressing stage (A): Memory addresses are calculated and sent to the D-cache (Data Cache).
Cache Access stages (D, G): The D-cache and D-TLB (Data Translation Look-aside Buffer) are accessed and aligned load data is returned at the end of the G-stage.
Execute stage (E): Integer ALU operations are performed. All basic ALU functions take one clock except multiply and divide.
Store stage (S): Integer store data is grabbed in this stage and placed in a store buffer.
Write-back stage (W): The results of operations are committed to the register file.

Data-Cache and Data Path

Fig. 8.11 Data cache and data path: 64 KB 4-way D-cache with 128-entry 8-way D-TLB and 8-entry PDC, address calculation, register file, 64 KB 4-way L2 cache, ROM and Socket 370 bus unit

The D-cache contains 64 KB organized as four-way set associative with 32-byte lines. The associated large D-TLB contains 128 entries organized as 8-way set associative. The cache, TLB, and page directory cache all use a pseudo-LRU (Least Recently Used) replacement algorithm.

The L2-Cache Memory

Fig. 8.12 The L2 cache (64 KB 4-way) with the Socket 370 bus unit, register file, address calculation and D-cache/D-TLB

The contents of the L2 cache at any point in time are not contained in the two 64-KB L1 caches. As lines are displaced from the L1 caches (due to bringing in new lines from memory), the displaced lines are placed in the L2 cache. Thus, a future L1-cache miss on a displaced line can be satisfied by returning the line from the L2 cache instead of having to access the external memory.
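This fill policy is often called an exclusive (victim) arrangement: a line lives either in L1 or in L2, never in both. The self-contained C sketch below models it with tiny direct-mapped arrays; the sizes, mapping and names are illustrative assumptions, not the C3's 4-way organisation.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Tiny direct-mapped L1/L2 model -- sizes and mapping are illustrative only. */
#define L1_SETS 4
#define L2_SETS 16

struct line { bool valid; uint32_t tag; };
static struct line l1[L1_SETS], l2[L2_SETS];

/* Access one block address; returns where it was found. */
const char *access_block(uint32_t blk)
{
    struct line *a = &l1[blk % L1_SETS];
    if (a->valid && a->tag == blk) return "L1 hit";

    /* L1 miss: the line currently there is displaced into the L2 (victim fill). */
    if (a->valid) l2[a->tag % L2_SETS] = *a;

    struct line *b = &l2[blk % L2_SETS];
    const char *src = (b->valid && b->tag == blk) ? "L2 hit" : "memory";
    if (b->valid && b->tag == blk) b->valid = false;   /* line moves back up, keeping L1/L2 exclusive */

    a->valid = true; a->tag = blk;                     /* refill L1 */
    return src;
}

int main(void)
{
    uint32_t trace[] = { 0, 4, 0, 4 };                 /* 0 and 4 conflict in L1 but both fit in L2 */
    for (int i = 0; i < 4; i++)
        printf("block %u -> %s\n", trace[i], access_block(trace[i]));
    return 0;
}

Running it on the conflicting blocks 0 and 4 shows the second round of accesses being satisfied from the L2 instead of external memory.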

FP, MMX and 3D

Fig. 8.13 Execution units: FP and MMX/3D units alongside the integer ALU, feeding the execute (E), store-branch (S) and write-back (W) stages

FP: Floating Point Processing Unit
MMX: Multimedia Extension or Matrix Math Extension Unit
3D: Special set of instructions for 3D graphics capabilities

In addition to the integer execution unit, there is a separate 80-bit floating-point execution unit that can execute floating-point instructions in parallel with integer instructions. Floating-point instructions proceed through the integer R, A, D, and G stages. Floating-point instructions are passed from the integer pipeline to the FP unit through a FIFO queue. This queue, which runs at the processor clock speed, decouples the slower-running FP unit from the integer pipeline so that the integer pipeline can continue to process instructions overlapped with FP instructions. Basic arithmetic floating-point instructions (add, multiply, divide, square root, compare, etc.) are represented by a single internal floating-point instruction. Certain little-used and complex floating-point instructions (sin, tan, etc.), however, are implemented in microcode and are represented by a long stream of instructions coming from the ROM. These instructions “tie up” the integer instruction pipeline such that integer execution cannot proceed until they complete.

This processor contains a separate execution unit for the MMX-compatible instructions. MMX instructions proceed through the integer R, A, D, and G stages. One MMX instruction can issue into the MMX unit every clock. The MMX multiplier is fully pipelined and can start one non-dependent MMX multiply[-add] instruction (which consists of up to four separate multiplies) every clock. Other MMX instructions execute in one clock. Multiplies followed by a dependent MMX instruction require two clocks. Architecturally, the MMX registers are the same as the floating-point registers. However, there are actually two different register files (one in the FP unit and one in the MMX unit) that are kept synchronized by hardware.

There is a separate execution unit for some specific 3D instructions. These instructions provide assistance for graphics transformations via new SIMD (Single Instruction Multiple Data) single-precision floating-point capabilities. These instruction codes proceed through the integer R, A, D, and G stages. One 3D instruction can issue into the 3D unit every clock. The 3D unit has two single-precision floating-point multipliers and two single-precision floating-point adders. Other functions such as conversions, reciprocal, and reciprocal square root are also provided. The multiplier and adder are fully pipelined and can start a non-dependent 3D instruction every clock.
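The flavour of a packed MMX-style operation can be conveyed by emulating one in plain C. The sketch below treats a 64-bit word as four independent unsigned 16-bit lanes and adds them with saturation, in the spirit of the PADDUSW instruction mentioned in the questions below; it is only a software emulation, not how the hardware or any intrinsic API is written.

#include <stdint.h>
#include <stdio.h>

/* Emulate a packed add of four unsigned 16-bit lanes with saturation (PADDUSW-like). */
uint64_t packed_add_saturate(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (16 * lane)) & 0xFFFF;
        uint32_t y = (b >> (16 * lane)) & 0xFFFF;
        uint32_t s = x + y;
        if (s > 0xFFFF) s = 0xFFFF;              /* unsigned saturation instead of wrap-around */
        r |= (uint64_t)s << (16 * lane);
    }
    return r;
}

int main(void)
{
    /* lanes: 0x1000, 0x8000, 0xF000, 0x0001  +  0x2000, 0x9000, 0x2000, 0x0002 */
    uint64_t a = 0x0001F00080001000ULL, b = 0x0002200090002000ULL;
    printf("%016llx\n", (unsigned long long)packed_add_saturate(a, b));  /* 0003ffffffff3000 */
    return 0;
}

A hardware SIMD unit performs all four lane additions in a single clock, which is where the speedup over four scalar adds comes from.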

8.3 Conclusion

This lesson discussed the architecture of a typical modern general purpose processor (the VIA C3), which is similar to the x86 family of microprocessors from Intel. In fact this processor uses the same x86 instruction set as used by the Intel processors. It is a pipelined architecture. The general purpose processor architecture has the following characteristics:
• Multiple stages of pipeline
• More than one level of cache memory
• Branch prediction mechanism at an early stage of the pipeline
• Separate and independent processing units (Integer, Floating Point, MMX, 3D etc.)
• Because of the uncertainties associated with branching, the overall instruction execution time is not fixed (therefore it is not suitable for some real-time applications which need an exact execution time)
• It handles a very complex instruction set
• The overall power consumption, because of the complexity of the processor, is higher

In the next lesson we shall discuss the signals associated with such a processor.

8.4 Questions and Answers

Q1. Draw the architecture of a similar processor (say the P4) from the Intel family and study the various units.
Q2. What is meant by the superscalar architecture in the Intel family of processors? How is it different from/similar to a pipelined architecture?
Q3. What kind of instructions do you expect for MMX units? Are they SIMD instructions?
Q4. How do you evaluate sin(x) by hardware?
Q5. What is the method to determine the execution time for a particular instruction for such processors?
Q6. Enlist some instructions for the above processor.
Q7. What is the power consumption of this processor? How do you specify it?
Q8. Give the various logic level voltages for the VIA C3 processor.
Q9. How do you number the pins in an EBGA chip?
Q10. What are the advantages of EBGA over PGA?

Answers

Q1. Intel P4 Net-Burst architecture

[Block diagram of the Intel P4 NetBurst architecture: system bus and bus unit, optional 3rd level cache, 2nd level cache (8-way), 1st level cache (4-way), front end (fetch/decode, trace cache, microcode ROM), out-of-order execution core, retirement, and BTBs/branch prediction with branch history update; frequently and less frequently used paths are distinguished]

Q2. Superscalar architecture refers to the use of multiple execution units, to allow the processing of more than one instruction at a time. This can be thought of as a form of "internal multiprocessing", since there really are multiple parallel processors inside the CPU. Most modern processors are superscalar; some have more parallel execution units than others. It can be said to consist of multiple pipelines.

Q3. Some MMX instructions from the x86 family:
MOVQ: Move quadword
PUNPCKHWD: Unpack high-order words
PADDUSW: Add packed unsigned word integers with unsigned saturation
These are SIMD instructions.

Q4. (a) Look Up Table (b) Taylor Series (c) From the complex exponential
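Option (b) can be sketched in a few lines of C: sin(x) = x - x^3/3! + x^5/5! - ..., with each term derived from the previous one and summed until the terms become negligible. Real hardware would use a fixed number of terms, lookup tables or CORDIC-style iterations; this is purely a software illustration.

#include <stdio.h>
#include <math.h>   /* link with -lm; used only to compare against the library sine */

/* Taylor-series sine: term_{k+1} = -term_k * x^2 / ((2k)(2k+1)). */
double taylor_sin(double x)
{
    double term = x, sum = x;
    for (int k = 1; fabs(term) > 1e-12; k++) {
        term *= -x * x / ((2.0 * k) * (2.0 * k + 1.0));
        sum  += term;
    }
    return sum;
}

int main(void)
{
    printf("taylor_sin(0.5) = %.10f, library sin = %.10f\n",
           taylor_sin(0.5), sin(0.5));
    return 0;
}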

Q5. This is done by averaging the instruction execution time over various programming models, including latency and overhead. It is a statistical measure.

Q6. All x86 family instructions will work.

Q7. around 7.5 watts

Q8.

Parameter                          Min          Max      Units   Notes

VIL – Input Low Voltage            -0.58        0.700    V

VIH1.5 – Input High Voltage        VREF + 0.2   VTT      V       (2)

VIH2.5 – Input High Voltage        2.0          3.18     V       (3)

VOL – Low Level Output Voltage                  0.40     V       @IOL

VOH – High Level Output Voltage    VCMOS                 V       (1)

IOL – Low Level Output Current                  9        mA      @VCL

ILI – Input Leakage Current                     ±100     µA

ILO – Output Leakage Current                    ±100     µA

Q9. Refer Text

Q10. Refer Text

General Purpose Processors - II

Signals

In this lesson the student will learn the following:
• Signals of a General Purpose Processor
• Multiplexing
• Address Signals
• Data Signals
• Control Signals
• Bus Arbitration Signals
• Status Signal Indicators
• Sleep State Indicators
• Interrupts

Pre-requisite

Digital Electronics

9.1 Introduction

The input/output signals of a processor chip are the matter of discussion in this chapter. We shall take up the same VIA C3 processor as discussed in the last chapter. In the design flow of a processor, the internal architecture is determined and simulated for optimal performance.

[Fig. 9.1 design-flow diagram: application and environment requirement capture, instruction set design and coding, functional and non-functional estimation, and exploration of architectures lead from an initial abstract instruction set and initial architecture to the final instruction set, tools and final processor architecture, feeding the ASIC hardware flow and software tools flow for hardware implementation]

Fig. 9.1 The overall design flow for a typical processor

The basic architecture decides the signals. Broadly the signals can be classified as:
1. Address Signals
2. Data Signals
3. Control Signals
4. Power Supply Signals

Some of these signals are multiplexed in time to make the VLSI design easier and more efficient without affecting the overall performance.

Multiplexed in Time (known as Time Division Multiplexing)

A digital data transmission method that takes signals from multiple sources, divides them into pieces which are then placed periodically into time slots (clock cycles here), transmits them down a single path, and reassembles the time slots back into multiple signals at the remote end of the transmission.
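As a concrete illustration, many processors share one set of pins between address and data: in the first time slot the lines carry the address, which external logic latches under an ALE-style strobe, and in the next slot the same lines carry the data. The C sketch below only mimics this idea in software; the names and the simple two-slot cycle are illustrative assumptions, not the VIA C3's actual bus protocol.

#include <stdint.h>
#include <stdio.h>

/* One shared 32-bit "bus" carries address and data in successive time slots. */
static uint32_t bus;
static uint32_t latched_addr;          /* external address latch on the memory side */

void write_cycle(uint32_t addr, uint32_t data)
{
    /* time slot 1 (ALE asserted): the shared lines carry the address,
       which the external latch captures for the rest of the cycle     */
    bus = addr;
    latched_addr = bus;

    /* time slot 2 (ALE de-asserted): the same lines now carry the data */
    bus = data;
    printf("wrote 0x%08x to address 0x%08x\n", bus, latched_addr);
}

int main(void)
{
    write_cycle(0x1000, 0xDEADBEEF);   /* two time slots, one set of wires */
    return 0;
}

The saving is in pin count and wiring: the price is that a full transfer needs at least two time slots on the shared lines.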

Fig. 9.2 Bottom View of the Processor

9.2 Signals of VIA Processor discussed earlier

The following lines discuss the various signals associated with the processor.

A[31:3]# The address bus provides addresses for physical memory and external I/O devices. During cache inquiry cycles, A31#-A3# are used as inputs to perform snoop cycles. This is an output signal when the processor sends an address to the memory and I/O devices. It serves as both input and output during snoop cycles. It is synchronized with the Bus Clock (BCLK).

Snoop cycles: The term "snooping" commonly refers to at least three different actions.

• Inquire Cycles: These are bus cycles, initiated by external logic, that cause the processor to look up an address in its physical cache tags.

• Internal Snooping: These are internal actions by the processor (rather than external logic) that are taken during certain types of cache accesses in order to detect self-modifying code.
• Bus Watching: Some caching devices watch their address and data bus continuously while they are held off the bus, comparing every address driven by another bus master with their internal cache tags and optionally updating their cached lines on the fly, during write-backs by the other master.

A20M# A20 Mask causes the CPU to mask (force to 0) the A20 address bit when driving the external address bus or performing an internal cache access. A20M# is provided to emulate the 1 MByte address wrap-around that occurs on the 8086. Snoop addressing is not affected. It is an input signal. If it is not used it is connected to the power supply. This signal is not synchronized with the Bus Clock.
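The masking itself is just a bit operation on the outgoing physical address. The C one-liner below shows the effect: with A20 forced to 0, an address just above 1 MByte wraps back to the bottom of memory. The function name and the sample address are purely illustrative.

#include <stdint.h>
#include <stdio.h>

/* When A20M# is asserted, bit 20 of the physical address is forced to 0. */
static uint32_t apply_a20_mask(uint32_t addr, int a20m_asserted)
{
    return a20m_asserted ? (addr & ~(1u << 20)) : addr;
}

int main(void)
{
    uint32_t addr = 0x0010FFEF;        /* just above the 1 MByte boundary */
    printf("0x%08x -> 0x%08x\n",       /* wraps to 0x0000FFEF when masked */
           addr, apply_a20_mask(addr, 1));
    return 0;
}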

ADS# Address Strobe begins a memory/I/O cycle and indicates that the address bus (A31#-A3#) and transaction request signals (REQ#) are valid. This is an output signal during the addressing cycle and an input/output signal during transaction request cycles. It is synchronized with the bus clock.
Memory/I/O cycle: The memory and input/output data transfers (read or write) are carried out in different clock cycles. The address is first loaded on the address bus. The processor, being faster, waits till the memory or input/output is ready to send or receive the data through the data bus. Normally this takes more than one clock cycle.
Transaction Request Cycle: When an external device requests the CPU to transmit data, the request comes through this line.

BCLK Bus Clock: provides the fundamental timing for the CPU. The frequency of the input clock determines the operating frequency of the CPU’s bus. External timing is defined referenced to the rising edge of CLK. It is an Input clock signal.

BNR# Block Next Request: signals a bus stall by a bus agent unable to accept new transactions. This is an input or output signal and is synchronized with the bus clock.

BPRI# Priority Agent Bus Request arbitrates for ownership of the system bus. Input and is synchronized with the Bus clock. Bus Arbitration: At times external devices signal the processor to release the system address/data/control bus from its control. This is achieved by an external request which normally comes from the external devices such as a DMA controller or a Coprocessor.

BR[4:0]: Hardware strapping options for setting the processor's internal clock multiplier. These are set by strapping the wires to the supply or ground (sometimes they can be kept open to make them 1). This option divides the input clock.

BSEL[1:0]: Bus frequency select balls (BSEL 0 and BSEL 1) identify the appropriate bus speed (100 MHz or 133 MHz). It is an output signal.

BR0#: It drives the BREQ[0]# signal in the system to request access to the system bus.

D[63:0]#: Data Bus signals are bi-directional signals which provide the data path between the CPU and external memory and I/O devices. The data bus driver must assert DRDY# to indicate a valid data transfer. This is both input as well as output.

DBSY#: Data Bus Busy is asserted by the data bus driver to indicate the data bus is in use. This is both input as well as output.

DEFER#: Defer is asserted by target agent and indicates the transaction cannot be guaranteed as an in-order completion. This is an input signal.

DRDY#: Data Ready is asserted by data driver to indicate that a valid signal is on the data bus. This is both input and output signal.

FERR#: FPU Error Status indicates an unmasked floating-point error has occurred. FERR# is asserted during execution of the FPU instruction that caused the error. This is an output signal.

FLUSH#: Flush Internal Caches writing back all data in the modified state. This is an input signal to the CPU.

HIT#: Snoop Hit indicates that the current cache inquiry address has been found in the cache. This is both input as well as output signal.

HITM#: Snoop Hit Modified indicates that the current cache inquiry address has been found in the cache and dirty data exists in the cache line (modified state). (both input/output)

INIT#: Initialization resets integer registers and does not affect internal cache or floating point registers. (Input)

INTR: Maskable Interrupt. This is an input signal to the CPU.

NMI: Non-Maskable Interrupt (Input).

LOCK#: Lock Status is used by the CPU to signal to the target that the operation is atomic. An atomic operation is any operation that a CPU can perform such that all results will be made visible to each CPU at the same time and whose operation is safe from interference by other CPUs. For example, reading or writing a word of memory is an atomic operation.
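What LOCK# guarantees at the bus level is what high-level languages expose as atomic operations. As a rough software-level analogy (not the bus protocol itself), the C11 sketch below increments a shared counter atomically, so no other processor can ever observe a half-finished update.

#include <stdatomic.h>
#include <stdio.h>

/* A read-modify-write made atomic: conceptually the CPU asserts LOCK#
 * (or uses an equivalent cache lock) so no other agent can interleave. */
static atomic_int shared_counter = 0;

void add_one(void)
{
    atomic_fetch_add(&shared_counter, 1);   /* indivisible as seen by all CPUs */
}

int main(void)
{
    add_one();
    add_one();
    printf("counter = %d\n", atomic_load(&shared_counter));   /* prints 2 */
    return 0;
}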

NCHCTRL: The CPU uses this ball to control integrated I/O pull-ups. A resistor is to be connected here to control the current on the input/output pins.

PWRGD (power good) Indicates that the processor’s VCC is stable. It is an input signal.

REQ[4:0]#: Request Command is asserted by bus driver to define current transaction type.

RESET#: This is an input that resets the processor and invalidates internal cache without writing back.

RTTCTRL: The CPU uses this ball to control the output impedance.

RS[2:0]#: Response Status is an input that signals the completion status of the current transaction when the CPU is the response agent.

SLP#: Sleep, when asserted in the stop grant state, causes the CPU to enter the sleep state.

Different Sleep States

"Stop Grant" Power to CPU is maintained, but no instructions are executed. The CPU halts itself and may shut down many of its internal components. In Microsoft Windows, the "Standby" command is associated with this state by default.

"Suspend to RAM" All power to the CPU is shut off, and the contents of its registers are flushed to RAM, which remains on. This system state is the most prone to errors and instability.

"Suspend to Disk" CPU power shut off, but RAM is written to disk and shut off as well. In Microsoft Windows, the "Hibernate" command is associated with this state. Because the contents of RAM are written out to disk, system context is maintained. For example, unsaved files would not be lost following this.

"Soft Off" System is shut down, however some power may be supplied to certain devices to generate a wake event, for example to support automatic startup from a LAN or USB device. In Microsoft Windows, the "Shut down" command is associated with this state. Mechanical power can usually be removed or restored with no ill effects.

Processor "C" power states Processor "C" power states are also defined. These are typically implemented in laptop platforms only. Here the cpu consumes less power while still doing work, and the tradeoff comes between power and performance, rather than power and latency.

SMI#: System Management Mode (SMM) Interrupt forces the processor to save the CPU state to the top of SMM memory and to begin execution of the SMI service routine at the beginning of the defined SMM memory space. An SMI is a higher-priority interrupt than NMI.

STPCLK#: Stop Clock Input causes the CPU to enter the stop grant state.

TRDY#: Target Ready Input indicates that the target is ready to receive a write or write-back transfer from the CPU.

VID[3:0]: Voltage Identification Bus informs the regulatory system on the motherboard of the CPU Core voltage requirements. This is an output signal.

9.3 Conclusion

In this chapter the various signals of a typical general purpose processor have been discussed. Broadly we can classify them into the following categories.

Address Signals: They are used to address the memory as well as input/output devices. They are often multiplexed with other control signals. In such cases external bus controllers latch these address lines and make them available for a longer time to the memory and input/output devices while the CPU changes their state. The bus controllers drive their inputs, which are connected to the CPU, to high impedance so as not to interfere with the current state of these lines from the CPU.

Data Signals: These lines carry the data to and from the processor and memory or I/O devices. Transceivers are connected on the data path to control the data flow. The data flow might follow some bus transaction signals. These bus transaction signals are necessary to negotiate the speed mismatch between the input/output devices and the processor.

Control Signals: These can generally be divided into the following groups.

Read Write Control
Memory Write: The processor issues this signal while sending data to the memory.
Memory Read: The processor issues this signal while reading data from the memory.
I/O Read: The input/output read signal, which is generally preceded by some bus transaction signals.
I/O Write: The input/output write signal, which is generally succeeded by some bus transaction signals.
These read/write signals are generally not directly available from the CPU. They are decoded from a set of status signals by an external bus controller.

Bus Transaction Control

• Master versus Slave: the master sends the address over the bus; data can go either way between master and slave.

A bus transaction includes two parts: sending the address and receiving or sending the data. Master is the one who starts the bus transaction by sending the address. Slave is the one who responds to the address by sending data to the master if the master asks for data and receiving data from master if master wants to send data. These are controlled by signals like Ready, Defer etc.

Bus Arbitration Control

Control: the master initiates requests over the bus; data can go either way between the bus master and slave.

This is known as requesting to obtain access to the bus. It is achieved by the following lines.
Bus Request: The slave requests the access grant.
Bus Grant: Carries the grant signal.
Lock: For specific operations the bus requests are not granted, as the CPU might be doing some important operations.

Interrupt Control

In a multitasking environment the Interrupts are external signals to the CPU for emergency operations. The CPU executes the interrupt service routines while acknowledging the interrupts. The interrupts are processed according to their priority. More discussion is available in subsequent lessons.

Processor Control

These lines are activated when there is a power-on or when the processor comes up from a power-saving mode such as sleep. These are Reset, Test lines, etc. Some of the above signals will be discussed in the subsequent lessons.

9.4 Questions and Answers

Q1. What is maximum memory addressing capability of the processor discussed in this lecture?

Ans: The number of address lines is 32. Therefore it can address 2^32 locations, which is 4 GBytes.

Q2. What do you understand by POST in a desktop computer?

Ans: It is called Power On Self Test. This is a routine executed to check the proper functioning of Hard Disk, CDROM, Floppy Disk and many other on-board and off-board components while the computer is powered on.

Q3. Describe the various power-saving modes in a general purpose CPU?

Ans: Refer to: Sleep Mode in Text

Q4. What could be the differences in the design of a processor to be used in the following applications: Laptop, Desktop, Motor Control?

Ans: Laptop processor: should be a complex general purpose processor with low power consumption and various power-saving modes.

Desktop: High Performance processor which has no limit on power consumption.

Motor Control: Simple low power specialized processor with on-chip peripherals with Real Time Operating System.

Q5. What is the advantage of reducing the High state voltage from 5 V to 3.5 volts? What are the disadvantages?

Ans: It reduces the interference but decreases the noise accommodation.

Q6. What is the use of Power-Good signal?

Ans: It is used to know the quality of the supply inside the CPU. If it is not good there may be mal-operations and data loss.