
Computer Architectures

I/O subsystem

Miroslav Šnorek, Pavel Píša, Michal Štepanovský

Czech Technical University in Prague, Faculty of Electrical Engineering

English version partially supported by: European Social Fund Prague & EU: We invest in your future.

AE0B36APO Computer Architectures Ver.1.10

Topics of this lecture

● Computer subsystem interconnection
  ● Buses, examples of different platforms (VME)
● PC standard example
  ● PCI
  ● Why? Introduction for laboratory experiments
● PCI innovations
  ● PCIe
  ● Why? Introduction for laboratory experiments
● Further innovations
  ● HyperTransport, InfiniBand, QPI
  ● Why? For you to know

Motivation

Goal of today's lecture: understand interconnection technologies

Motivation – Intel – Only as an example

What is the main task of the I/O subsystem?

● Interconnection of subsystems inside the computer, connection of external peripherals, and connection of computers together
● Demands on the I/O subsystem: creating optimal data paths, especially critical for the most demanding peripherals (graphics cards, external memories)
● Possible solutions: compromises are necessary because of the price/performance ratio, wherever it is
  ● possible to share data paths, or
  ● advantageous to share them.

Single-cycle processor from lecture 2

Main memory is not part of the CPU…

[Figure: single-cycle processor datapath (PC, instruction memory, register file, ALU, data memory, sign extension, control unit)]

The address bus is 32 bits wide; the data path is 32 bits wide, as is the instruction! 64 signals are required to connect memory (control signals not counted).

Some other examples and solutions

● Do you know Parallel ATA (PATA)?
● Integrated Drive Electronics (IDE) or EIDE (Enhanced IDE) from Western Digital may be the better-known term
● ATA = Advanced Technology Attachment

• It has been the most widely used interface for connecting hard drives and optical drives
• 40-pin header -> 40 leads (16 of these for data)
• later, 80 leads were used for better signal integrity (shielding), but the 40-pin header was preserved

Serial ATA – Solution used today

● Serial ATA (SATA) is more widely used today
● SATA 1.0: 150 MB/s (PATA: 130 MB/s)
● SATA 2.0: 300 MB/s
● SATA 3.0: 600 MB/s
● SATA 3.2: about 2 GB/s

• Interface used today for connecting drives and optical units
• 7 leads only!!!

Pin | Mating | Function
1   | 1st    | Ground
2   | 2nd    | A+ (Transmit)
3   | 2nd    | A− (Transmit)
4   | 1st    | Ground
5   | 2nd    | B− (Receive)
6   | 2nd    | B+ (Receive)
7   | 1st    | Ground

http://en.wikipedia.org/wiki/Serial_ATA

Interfacing terminology – Important terms:

● Interface
  ● Common communication part shared by two systems, pieces of equipment or programs.
  ● Also includes the boundary and the supporting control elements necessary for their interconnection.
● Bus × point-to-point connection.
● Address, data, control bus.
● Multiplexed/separate bus.
● Processor, system, local, I/O bus.
● Bus cycle, bus transaction.
● Open collector, 3-state output, switched multiplexers

Reminder: bus × point-to-point connection

Bus – shared data path

Point-to-point connection

Remark: the logical topology as seen from inside the computer system (OS, programs) can differ from the physical topology.

Many different combinations in between exist in real systems.

PC architecture (circa 2000+) … based on PCI

Point-to-point

Buses

Source: http://computer.howstuffworks.com/pci.htm

PCIe architecture - bus is replaced by shared switch

[Figure: PC architecture based on PCI: microprocessor with L2 cache on the Back Side bus (BSB); Front Side bus (FSB) to the chipset with the memory controller (system memory, RAM), AGP graphic controller (with its own RAM), and bus bridges to the PCI bus (PCI devices) and the ISA bus (ISA devices)]

More details later …

PCIe = PCI Express

[Figure: PCIe architecture: microprocessor and RAM attached to the root complex; a switch connects the root complex to the end points]

Another bus example

Microcomputer control system VME

VME Bus (and system)

Students of KyR will surely meet it soon (avionics, railways, industry)

VME milestones

[Figure: VME system configurations: processor, memory, I/O controller and disc controller boards connected by the VME bus, with the VMX bus and VMS shown as additional interconnects]

Version           | Max. throughput
VME32 (IEEE-1014) | 40 MB/s
VME64             | 80 MB/s
VME2eSST          | 320+ MB/s

PC platform: some details concerning PCI

PCI signals – 32-bit version

[Figure: PCI device with its signal groups]

Address and data: AD[31::00], C/BE[3::0]#, PAR
Interface control: FRAME#, TRDY#, IRDY#, STOP#, DEVSEL#, IDSEL
Error reporting: PERR#, SERR#
Access arbitration (master only): REQ#, GNT#
System: CLK, RST#
Interrupts: INTA#, INTB#, INTC#, INTD#

PCI - architecture

[Figure: PCI bus with slots 1 to 4. The bus access arbiter has a separate REQ#/GNT# pair for each slot, each slot has its own IDSEL line, and AD[31::00], C/BE[3::0]# and the control signals are shared by all slots. The interrupt lines lead to the IRQ subsystem.]

PCI bus cycle example

AD bus is
- bidirectional and
- multiplexed (explanation later)

Bus-line electronics - buffer

bi-directional

uni-directional

IO output: TP and OC? Practical examples:

[Figure: two output stage schematics between Vdd and Vss with inputs A, B and output Out]

OC – Open Collector: internal structure of a NAND circuit equipped with an open-collector output. The pull-up resistor is connected externally.

TP – Totem Pole: internal structure of a two-input NAND chip from the TTL 7400 family.

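Study of conflict: bi-directional transfer on a single bus line

[Figure: gates connected to one bus line, first with parallel inputs, then with parallel outputs driving 1 and 0, so that the transistor to +Vcc in one output and the transistor to ground in the other are both switched on]

Inputs: no problem with parallel inputs.
Outputs: there can be a short circuit on the outputs!

Solution?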

One solution: output circuit as OC x tri-state

[Figure: two output stages connected to a bus line: an open-collector output with a pull-up to +Vcc driven by Vin, and a tri-state output controlled by OE]

OC – Open collector
3S – Tri-state output
OE = Output Enable
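The difference between the two output types can be illustrated with a small behavioural model. This is only a sketch under simplifying assumptions (ideal logic levels, no timing); the function names `open_collector_line` and `tri_state_line` are made up for this example.

```python
def open_collector_line(drivers_pull_low):
    """Bus line with open-collector outputs and an external pull-up.

    Each driver can only pull the line low (True) or release it (False).
    The pull-up resistor makes the line high when nobody pulls low,
    so several outputs can share the line without a short circuit.
    """
    return 0 if any(drivers_pull_low) else 1


def tri_state_line(drivers):
    """Bus line with tri-state outputs.

    `drivers` is a list of (output_enable, value) pairs. A driver with
    OE inactive is in high impedance and does not influence the line.
    Returns the line level, or None if no driver is enabled (floating
    line). Raises ValueError when enabled drivers disagree, which
    corresponds to a short circuit on a real bus.
    """
    active = {value for enabled, value in drivers if enabled}
    if len(active) > 1:
        raise ValueError("bus conflict: enabled outputs drive both 0 and 1")
    return active.pop() if active else None


if __name__ == "__main__":
    print(open_collector_line([False, True, False]))   # one driver pulls low
    print(tri_state_line([(True, 1), (False, 0)]))      # only one driver enabled
```

With open-collector outputs the conflict cannot occur by construction; with tri-state outputs it is the job of the bus protocol to make sure only one OE is active at a time.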

Bus switches and multiplexers

[Figure: bus switch types: two-port switch (A, B, OE), MUX/DeMUX (1A to 1B1 or 1B2, select S, OE) and bus exchange (1A1, 1A2, 1B1, 1B2, select S, OE), each with its control logic]

Source: Texas Instruments Digital Bus Switch Selection Guide http://www.ti.com/signalswitches

Data transfer synchronization

How to synchronize/recognize valid data on the bus?

● Possible transfer types:
  ● asynchronous,
  ● synchronous,
    – pseudo-synchronous,
  ● isochronous (not the focus of this lecture).

● Remark:
  ● these timing diagrams are usually called the bus protocol
  ● usually only one bus transfer cycle is analyzed

Synchronization options: synchronous transfer

Synchronization options: asynchronous transfer

How to slow down synchronous variant

What to do when the data source or target cannot process data at the clock rate?

Status signals allow the transfer participants (the initiator or the target of the transfer request) to request the insertion of WAIT cycles.

This is the pseudo-synchronous transfer variant.
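The idea of wait cycles can be sketched as a tiny clock-by-clock simulation. This is an illustrative model only: the function name `pseudo_synchronous_transfer` and the single `ready` status signal are assumptions made for this example, not part of any particular bus standard.

```python
def pseudo_synchronous_transfer(words, ready_per_cycle):
    """Transfer `words` one per clock cycle, stretched by wait cycles.

    `ready_per_cycle` holds one boolean per rising clock edge: True means
    the slower participant signals that it is ready, False inserts a wait
    cycle. Returns a list of (cycle, word) pairs, where the string "WAIT"
    stands in for the word on clock edges without a transfer.
    """
    trace = []
    pending = list(words)
    for cycle, ready in enumerate(ready_per_cycle):
        if not pending:
            break                                  # everything transferred
        if ready:
            trace.append((cycle, pending.pop(0)))  # data valid on this edge
        else:
            trace.append((cycle, "WAIT"))          # transfer stretched
    return trace


if __name__ == "__main__":
    for cycle, item in pseudo_synchronous_transfer(
            [0xA, 0xB], [True, False, False, True, True]):
        print(cycle, item)
```

The transfer still happens only on clock edges (as in the synchronous variant), but the number of clock cycles per word is no longer fixed.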

Bus/signal lines reuse/multiplexing

[Figure: PCI bus with slots 1 to 4, bus access arbiter, IDSEL lines, shared AD[31::00], C/BE[3::0]# and control signals, and the IRQ (interrupt) subsystem]

AD[31::00] signals are used for address transfer during initial transfer phase, then they are used for data

PCI terms and definitions I.

● Two devices participate in each bus transaction:
  ● Initiator (starts the transaction) × Target (obeys the request)
  ● Initiator = Bus Master,
  ● Target = Slave.
  ● The initiator and target roles do not directly determine which device is the data source and which is the receiver!
● What does it mean that a bus is multimaster?
  ● Several participants can act as Bus Master and initiate transfers!
  ● When bus signals are shared, transactions need to be serialized and only one device can act as master at any given time
  ● The bus arbiter is responsible for granting the bus to a requesting participant
● Who takes care of bus arbitration?
  ● One dedicated device or the bus backplane
  ● Decentralized arbitration is also possible for some buses (not PCI)

PCI terms and definitions II.

● The rising clock edge is the reference/trigger point for all phases of the bus cycle/transaction timing
● In most cases a bus cycle is formed by
  ● an address phase
  ● a data phase
● Premature termination of the data transfer is possible during a bus cycle
● Transfer synchronization itself is pseudo-synchronous

Bus cycle kind/direction – command – specified by C/BE

C/BE[3::0]# | Bus command (BUS CMD)
0000 | Interrupt Acknowledge
0001 | Special Cycle
0010 | I/O Read
0011 | I/O Write
0100 | Reserved
0101 | Reserved
0110 | Memory Read
0111 | Memory Write
1000 | Reserved
1001 | Reserved
1010 | Configuration Read (only 11 low addr bits for fnc and reg + IDSEL)
1011 | Configuration Write (only 11 low addr bits for fnc and reg + IDSEL)
1100 | Memory Read Multiple
1101 | Dual Address Cycle (more than 32 bits for address, i.e. 64-bit)
1110 | Memory Read Line
1111 | Memory Write and Invalidate

PCI bus memory write timing
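A transaction such as a memory write starts with an address phase in which the C/BE lines carry the bus command from the table above (0111 for Memory Write). A minimal sketch of decoding that 4-bit value follows; the names `PCI_COMMANDS` and `decode_command` are chosen for this example only.

```python
# Command codes as listed in the table above (value on C/BE[3::0]#
# during the address phase).
PCI_COMMANDS = {
    0b0000: "Interrupt Acknowledge",
    0b0001: "Special Cycle",
    0b0010: "I/O Read",
    0b0011: "I/O Write",
    0b0100: "Reserved",
    0b0101: "Reserved",
    0b0110: "Memory Read",
    0b0111: "Memory Write",
    0b1000: "Reserved",
    0b1001: "Reserved",
    0b1010: "Configuration Read",
    0b1011: "Configuration Write",
    0b1100: "Memory Read Multiple",
    0b1101: "Dual Address Cycle",
    0b1110: "Memory Read Line",
    0b1111: "Memory Write and Invalidate",
}


def decode_command(cbe):
    """Return the bus command name for a 4-bit command code."""
    if not 0 <= cbe <= 0b1111:
        raise ValueError("the command code is 4 bits wide")
    return PCI_COMMANDS[cbe]


if __name__ == "__main__":
    # Address phase of a memory write: the initiator drives 0111.
    print(decode_command(0b0111))
```

Such a lookup is handy when reading a logic analyzer capture: the value sampled on the C/BE lines in the first clock of a transaction identifies the kind of cycle.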

Some remarks and observations

● The length of the transferred data block is controlled by the FRAME# signal. The initiator negates (de-asserts) it before the last word transfer.
● A single-word or burst transfer can be delayed by inserting wait cycles (pseudo-synchronous synchronization)!
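These two observations can be sketched as a simplified trace of a write burst. The model below is an assumption-laden illustration, not a cycle-accurate PCI model: it ignores turnaround cycles, DEVSEL# and IRDY#, treats the initiator as always ready, and represents active-low signals as booleans (True = asserted). The function name `pci_write_burst` is made up for this example.

```python
def pci_write_burst(address, words, target_ready):
    """Trace a simplified PCI-style write burst.

    `target_ready` holds one boolean per clock after the address phase:
    True means the target is ready (TRDY# asserted), False inserts a
    wait cycle. Missing entries are treated as "ready".
    Returns tuples (cycle, frame_asserted, phase, value_on_AD).
    """
    # Address phase: FRAME# asserted, AD lines carry the address.
    trace = [(0, True, "address", address)]
    remaining = list(words)
    ready = iter(target_ready)
    cycle = 1
    while remaining:
        # FRAME# is de-asserted once only the last word is left.
        frame_asserted = len(remaining) > 1
        if next(ready, True):
            trace.append((cycle, frame_asserted, "data", remaining.pop(0)))
        else:
            trace.append((cycle, frame_asserted, "wait", None))
        cycle += 1
    return trace


if __name__ == "__main__":
    for row in pci_write_burst(0x1000, [1, 2, 3], [True, False, True, True]):
        print(row)
```

In the printed trace the second word is delayed by one wait cycle, and FRAME# is no longer asserted in the cycle that transfers the third (last) word.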

PCI bus memory read timing

This transaction is going to be captured and shown by a logic analyzer during the labs.

Interrupt acknowledge cycle

● The processor needs time to save the interrupted execution state so that the state can be restored after return from the service routine
● The interrupt controller provides a vector number (assigned to the asynchronous event), and the CPU is required to translate it to the start address of the interrupt service routine
● All these activities require some time
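The translation from a vector number to a service routine can be sketched as a table lookup. Everything specific here is an assumption for illustration: the base vector `VECTOR_BASE = 0x20`, the function names and the handler are hypothetical, and a real CPU stores routine addresses in a table in memory rather than Python callables.

```python
VECTOR_BASE = 0x20  # assumed vector number assigned to IRQ 0 (hypothetical)


def vector_for_irq(irq):
    """Vector number the interrupt controller would supply for `irq`."""
    return VECTOR_BASE + irq


def dispatch(vector, vector_table):
    """CPU side: translate the vector number to a routine and run it."""
    handler = vector_table[vector]
    return handler()


def keyboard_handler():
    # Hypothetical service routine; a real one would read the device.
    return "keyboard serviced"


if __name__ == "__main__":
    table = {vector_for_irq(1): keyboard_handler}
    print(dispatch(vector_for_irq(1), table))
```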

Interrupt acknowledge cycle timing

Some more details about standard PC PIC IRQ routing

[Figure: standard PC interrupt routing: Timer, Keyboard, CMOS RTC and the lines IRQ3 to IRQ7, IRQ10 to IRQ12, IRQ14 and IRQ15 feed the inputs of the cascaded PIC (i8259) chips; the INTA# to INTD# lines of the PCI slots are mapped to IRQX, IRQY, IRQZ and IRQW; the INTR output goes to the CPU]

Standard PC PIC interrupt vectors assignment

Common interrupt numbers for PC category computers:

IRQ    | Assignment
0      | Timer (for scheduler, timers)
1      | Keyboard
2      | i8259 cascade interrupt
8      | Real-time clock (CMOS wall time)
9      | Available or SCSI controller
10, 11 | Available
12     | Available or PS/2 mouse
13     | Available or arithmetic coprocessor
14     | 1st IDE controller
15     | 2nd IDE controller
3      | COM2
4      | COM1
5      | LPT2 or available
6      | Floppy disc controller
7      | LPT1
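The order of the rows above follows from the cascade: the second i8259 is attached to input 2 of the first one, so IRQ 8 to 15 are served between IRQ 1 and IRQ 3. A minimal sketch of that ordering, assuming the default fixed-priority mode of the controllers (function names are chosen for this example):

```python
def cascaded_priority_order(cascade_input=2, lines_per_pic=8):
    """IRQ numbers from highest to lowest priority for two cascaded PICs.

    The slave controller (IRQ 8..15) is connected to `cascade_input`
    of the master, so all its lines take the priority of that input.
    """
    order = []
    for master_line in range(lines_per_pic):
        if master_line == cascade_input:
            order.extend(range(lines_per_pic, 2 * lines_per_pic))
        else:
            order.append(master_line)
    return order


def highest_priority(pending):
    """Pick the pending IRQ that would be served first, or None."""
    for irq in cascaded_priority_order():
        if irq in pending:
            return irq
    return None


if __name__ == "__main__":
    print(cascaded_priority_order())
    print(highest_priority({4, 12}))   # PS/2 mouse wins over COM1
```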

Recapitulation of the bus description steps

● Notice: we started the PCI description with bus topology analysis and PCI signals, and then we moved to timing diagrams
  ● simpler cases first (read and write)
  ● then how bus access is arbitrated
  ● the special functions (interrupt acknowledge) last

● Doc. Šnorek recommends: always follow these steps when learning a new bus technology.

PCI bus timing laboratory exercise

Some more notes regarding the bus hardware realization

How signals are transferred

English term: signalling

Single-ended (asymmetric) versus differential (symmetric)
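Why the distinction matters can be shown with a toy numeric model: noise coupled equally onto both wires of a differential pair cancels in the receiver, whereas on a single-ended line it adds directly to the signal. All voltage levels below (in millivolts) and the function names are assumptions picked for illustration, not values from any standard.

```python
def single_ended_receive(signal_mv, noise_mv, threshold_mv=1650):
    """Receiver compares one wire against a fixed threshold."""
    return 1 if signal_mv + noise_mv > threshold_mv else 0


def differential_receive(bit, noise_mv, swing_mv=400, common_mode_mv=1200):
    """Receiver evaluates the difference between the two wires.

    The same noise voltage is added to both wires (common-mode noise),
    so it drops out of the subtraction.
    """
    half = swing_mv // 2
    delta = half if bit else -half
    plus = common_mode_mv + delta + noise_mv
    minus = common_mode_mv - delta + noise_mv
    return 1 if plus - minus > 0 else 0


if __name__ == "__main__":
    # A logic 0 (0 mV) hit by a 2000 mV disturbance.
    print(single_ended_receive(0, 2000))     # misread as 1
    print(differential_receive(0, 2000))     # still read as 0
```

The model only covers noise that affects both wires identically; it says nothing about noise that differs between the wires.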

Typical signaling levels and some speed considerations

[Figure: typical signal levels for differential signaling and for single-ended signaling]

PCIe

PCIe architecture

[Figure: PCIe architecture: microprocessor and RAM attached to the root complex; a switch connects the root complex to the end points]

Source: http://computer.howstuffworks.com/pci-express.htm

PCIe topology and components

[Figure: PCIe topology: CPUs and memory attached to the root complex; the root complex connects over PCI Express links to a PCI Express endpoint, a PCI Express-PCI bridge (to PCI/PCI-X) and a switch; the switch connects over PCI Express links to legacy endpoints and PCI Express endpoints]

Why switch to serial data transfer

PCIe transfers signaling, PCIe lanes

● A link interconnects the switch with exactly one device
● Differential AC signaling is used – two wires for a single direction
● Each link consists of one or more lanes
● A lane consists of two pairs of wires
  ● One pair for Tx and the other one for Rx
● Data are serialized using the 8b/10b code
● The separate pairs allow full-duplex operation/transfers

PCIe physical link layer

Differential full-duplex physical layer

8b/10b encoding provides enough edges for clock signal reconstruction/synchronization and a balanced number of ones and zeros. This ensures a zero DC component of the signal, so AC-only coupling is possible.

PCIe physical layer model

Source: Budruk, R., et al.: PCI Express System Architecture

PCIe characteristics

● 2.5 GHz clock frequency (2.5 GT/s); raw single-lane, single-direction bandwidth 250 MB/s (can be multiplied by parallel lanes – 2×, 4×, 8×)
● The effective data rate of a single-lane link is 200 MB/s, which is 2× … 4× more than for classic PCI
● The bandwidth is not shared: point-to-point interconnection
● Two pairs of wires, differential signaling
● Data are encoded (modulated) using the 8b/10b code
● Clocks of up to 10 GHz are expected thanks to technology advances
● PCI Express 2.x (2007) allows 5 GT/s (5 GHz clock)
● PCI Express 3.x (started in 2010) increases it to 8 GT/s
  ● Encoding changed from 8b/10b (20% of the bandwidth used by encoding) to "scrambling" and 128b/130b encoding (takes only 1.5% of the bandwidth)
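The bandwidth figures above follow from the transfer rate and the line code. A short sketch of the arithmetic (function names are chosen for this example; packet and protocol overhead, which reduces the 250 MB/s to the effective rate, is deliberately not modelled):

```python
def lane_bandwidth_mb_s(transfer_rate_gt_s, payload_bits, coded_bits):
    """Per-lane, per-direction bandwidth in MB/s after line coding.

    `payload_bits`/`coded_bits` describe the code, e.g. 8/10 for 8b/10b
    or 128/130 for 128b/130b.
    """
    bits_per_second = transfer_rate_gt_s * 1e9 * payload_bits / coded_bits
    return bits_per_second / 8 / 1e6


def link_bandwidth_mb_s(transfer_rate_gt_s, lanes, payload_bits=8, coded_bits=10):
    """Bandwidth of a link built from several parallel lanes."""
    return lanes * lane_bandwidth_mb_s(transfer_rate_gt_s, payload_bits, coded_bits)


if __name__ == "__main__":
    print(lane_bandwidth_mb_s(2.5, 8, 10))               # first generation, x1
    print(lane_bandwidth_mb_s(5.0, 8, 10))               # PCI Express 2.x, x1
    print(round(lane_bandwidth_mb_s(8.0, 128, 130), 1))  # PCI Express 3.x, x1
    print(link_bandwidth_mb_s(2.5, lanes=16))            # first generation, x16
```

For 2.5 GT/s with 8b/10b this gives the 250 MB/s per lane quoted above; with 128b/130b the coding overhead drops to 2/130, about 1.5%.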

[Figure: mainboard photo with labeled slots: PCIe x4, PCIe x16, PCIe x1, PCIe x16, 32-bit classic PCI slot]

Picture source: Wikipedia, LanParty nF4 Ultra-D mainboard from DFI

The PCIe board DB4CGX15 used in your semester work

● PCIe x1
● Altera EP4CGX15 BF14C6N FPGA
● EPCS16 configuration device
● 32 MB DDR2 SDRAM
● 20 I/O pins
● 4 input pins
● 2 user LEDs

PCIe communication protocol – hardware and software layers

The PCIe physical topology is serial, point-to-point and packet-oriented, but its logical view and behavior are the same as for PCI: a multi-master bus with the same enumeration and read/write cycles.

Data packet structure and transport layers

Source: http://zone.ni.com/devzone/cda/tut/p/id/3767#toc0

PCIe packet format

Specification highlights:
• Packet length from 4 B to 4096 B
• A PCIe packet has to be transferred as a whole (there is no option to preempt it when a higher-priority transfer is requested)
• Long packets can increase latencies in the system
• On the other hand, the overhead of short packets (frame/data) is considerable

Other, more advanced, architectures and PC future

What are future options?

● In PC development, 15 years is quite a long time and there have been significant successes, but what about the future?
● There are already other technologies in use (some in PC systems already), many originating from supercomputers:
  ● PCI-X,
  ● HyperTransport
  ● InfiniBand
  ● QPI
● More:
  ● Advanced Computer Architectures (AE4M36PAP) course

InfiniBand vs. HyperTransport

Miscellaneous information regarding buses

Nowadays, using a high-speed serial bus is quite common.

● FireWire
● DVI-D, HDMI: switch from analog VGA/RGB
● Replacement of slow RS-232 serial, PS/2. General purpose now: disc, video, etc.
● Serial Attached SCSI: switch from parallel (wide, differential) SCSI
● Switch from parallel ATA

Key characteristics of the dominant bus technologies

Parameter                         | Firewire         | USB 2.0                   | PCI Express   | SATA     | Serial Attached SCSI
Expected use                      | external         | internal                  | internal      | internal | both
Max. number of nodes              | 63               | 127                       | 1             | 1        | 4
No. of wires (fundamental only)   | 4                | 2                         | 4             | 4        | 4
Raw bandwidth                     | 50 MB/s, 100 MB/s | 0.2 MB/s, 1.5 MB/s, 60 MB/s | 250 MB/s (1x) | 300 MB/s | 300 MB/s
Hot-plug attach?                  | yes              | yes                       | (yes)         | yes      | yes
Max. bus distance                 | 4.5 m            | 5 m                       | 0.5 m         | 1 m      | 8 m
Official standard designation     | IEEE 1394        | USB                       | PCI-SIG       | SATA-IO  | T10 committee

Key characteristics – another view

Another comparison of bus standards