Interconnect & Communication Space, Time, & stuff…

What is the big I don’t see what deal with these is so exciting about things? the “back-side” either.

Lab 8 due Tomorrow (Wednesday)!

6.004 – Fall 2004 11/23/04 modified 11/22/2004 11:44 AM L21 – Communication 1 Computer System Technologies What’s the most important part of this picture?

DRAM Hard Disk Drives Linux Windows Mother boards XP

Flash Memory SDRAM LAN technology

ActiveX Graphics App Controls Acceleration Servers

6.004 – Fall 2004 11/23/04 L21 – Communication 2 Technology comes & goes; interfaces last forever Interfaces typically deserve more engineering attention than the technologies they interface… • Abstraction: should outlast many technology generations • Often “virtualized” to extend beyond original function (e.g. memory, I/O, services, machines) • Represent more potential value to their proprietors than the technologies they connect. Interface sob stories: • Interface “warts”: Windows “aux.c” bug, Big/little Endian wars • IBM PC debacle ... and many success stories: • IBM 360 Instruction set architecture; Postscript; Compact Flash; ... • Backplane buses

6.004 – Fall 2004 11/23/04 L21 – Communication 3 System Interfaces & Modularity

Ancient Times (Ad hoc connections) MEM MEM I/O Late 60s (Processor-dependent ) CPU I/O CPU DISK I/O DISK

MEM MEM I/O 80s (Processor-independent Bus)

CPU DISK I/O Back-side bus Today MEM Buses Galore Graphics MEM CPU I/O I/O MEM

CPU L2 $ “AGP” bus Front-side bus (PCI and EISA) ? DISK I/O I/O

6.004 – Fall 2004 11/23/04 L21 – Communication 4 Interface Standard: Backplane Bus

The backplane provides: Modular cards that plug into Power a common backplane: Common system clock CPUs Wires for communication Memories Bulk storage a address I/O devices d S/W? data operation start finish clock

BUS LINES

Printed Circuit Cards MODULE LOGIC

6.004 – Fall 2004 11/23/04 L21 – Communication 5 The Dumb Bus: ISA & EISA

Original primitive approach -- Just take the control ISA bus (Original IBM PC bus) - signals and data bus from Pin out and timing is nearly identical the CPU module, buffer it, to the 8088 spec. and call it a bus.

Ah, you forget, , S-100, SWTP SS-50, STB, , Apple 2E, …

6.004 – Fall 2004 11/23/04 L21 – Communication 6 Smarter “Processor Independent” Buses

NuBus, PCI… TERMINOLOGY – Isolate basic communication primitives from processor BUS MASTER – a module that architecture: initiates a bus transaction. • Simple read/write protocols (CPU, disk controller, etc.) • Symmetric: any module can become “Master” (smart I/O, multiple processors, etc) BUS SLAVE – a module that • Support for “plug & play” responds to a bus request. expansion I’ve been (Memory, I/O device, etc.) waiting here for hours and I Goal: vendor-independent still haven’t interface standard seen a bus cycle go by yet! BUS CYCLE – The period from when a transaction is requested until it is served.

6.004 – Fall 2004 11/23/04 L21 – Communication 7 Buses, Interconnect… what’s the big deal?

Aren’t buses simply logic circuits with long wires?

Wires: circuit theorist’s view: Wires: interconnect engineer’s view: Equipotential “nodes” of a Transmission lines. circuit. Finite signal propagation Instant propagation of v, i velocity. over entire node. Space matters. “space” abstracted out of Time matters. design model. Reality matters. Time issues dictated by RLC elements; wires are timeless.

6.004 – Fall 2004 11/23/04 L21 – Communication 8 Bus Lines as Transmission Lines

ANALOG ISSUES: • Propagation times ? ? ? ? ƒ Light travels about 1 ft / ns (about 7”/ns in a wire) • Skew ƒ Different points along the bus see the signals at different times • Reflections & standing waves ƒ At each interface (places where the propagation medium changes) the signal may reflect if the impedances are not matched. ƒ Make a transition on a long line – may have to wait many transition times for echos to subside.

TIME

6.004 – Fall 2004 11/23/04 L21 – Communication 9 Coping with Analog Issues...

We’d like our bus to be technology independent... • Self-timed protocols allow bus transactions to accommodate varying response times; • Asynchronous protocols avoid the need to pick a (technology- dependent) clock frequency.

BUT... asynchronous protocols are vulnerable to analog-domain problems, like the infamous

i WIRED-OR GLITCH: what happens i when a switch is opened???

COMMON COMPROMISE: Synchronous, Self-Timed protocols • Broadcast bus clock • Signals sampled at “safe” times * DEAL WITH: noise, clock skew (wrt signals)

6.004 – Fall 2004 11/23/04 L21 – Communication 10 Synchronous Bus Clock Timing

sample edge assertion edge

CLK

Signal at source Signal at destination

“Settling Time” “de-skew time”

Allow for several “round-trip” bus delays so that ringing can die down.

6.004 – Fall 2004 11/23/04 L21 – Communication 11 A Simple Bus Transaction

MASTER:MASTER: sample edge assertion edge 1)1) Chooses Chooses bus bus operation operation 2)2) Asserts Asserts an an address address CLK 3)3) Waits Waits for for a a slave slave to to answer.answer.

start (Master) SLAVE:SLAVE: 1)1) Monitors Monitors start start 2) Check address finish (Slave) 2) Check address 3)3) If If meant meant for for me me a)a) look look at at bus bus operation operation b) do operation operation WRITE (Master) b) do operation c)c) signal signal finish finish of of cycle cycle

address (Master) BUS:BUS: 1)1) Monitors Monitors start start 2)2) Start Start count count down down data (Master) 3)3) If If no no one one answers answers before before countercounter reaches reaches 0 0 then then “time“time out” out”

6.004 – Fall 2004 11/23/04 L21 – Communication 12 Multiplexed Bus: Write Transaction More efficient use of shared wires We let the address and data buses share the same sample edge assertion edge wires.

CLK Slave sends a status message by driving the start (Master) operation control signals when it finishes. Possible indications: finish (Slave) - request succeeded - request failed

operation WRITE (Master) OK (Slave) - try again

A slave can stall the write by address adr (Master) data (Master) waiting several cycles /data before asserting the finish signal.

6.004 – Fall 2004 11/23/04 L21 – Communication 13 Multiplexed Bus: Read Transaction

sample edge assertion edge On reads, we allot one cycle for CLK the bus to “turn around” (stop driving and begin receiving). It generally takes some time to start (Master) read data anyway.

A slave can stall the read (for (Slave) finish instance if the device is slow compared to the bus clock) by waiting several clocks before Turn around time operation READ (Master) OK (Slave) asserting the finish signal. These delays are sometimes address called “WAIT-STATES” adr (Master) Turn around time data (Slave) /data Throughput: 3+ Clocks/word

6.004 – Fall 2004 11/23/04 L21 – Communication 14 Block Write Transfers

CLK

start (Master)

finish (Slave)

operation WBLK 2 (M) CONT (M) CONT (M) CONT (M) CONT (M) OK (Slave)

address adr A (M) data[A] data[A+1] data[A+2] data[A+3] /data Block transfers are the way to get peak performance from a bus. A throughput of nearly 1 Clock/word is achievable on large blocks. Slaves must generate sequential addresses.

6.004 – Fall 2004 11/23/04 L21 – Communication 15 Block Read Transfers

CLK

start (Master)

finish (Slave)

operation RBLK 4 (M) HOLD (S) CONT (S) CONT (S) CONT (S) OK (Slave)

address adr A (M) data[A] (S) data[A+1] (S) data[A+2] (S) data[A+3] (S) /data

Block read transfers still require at least one cycle to turn-around the bus. More WAIT-STATES can be added if initial latency is high. The throughput is nearly 1 Clock/word on large blocks. Great for reading long cache lines!

6.004 – Fall 2004 11/23/04 L21 – Communication 16 Split-Transaction Bus Operation … you knew we’d work pipelining in somehow!

The bus master can post CLK several read requests before the first request is served. start (M1) (M2) Generally, accesses are served in the same order finish (S1) that they are requested.

Slaves must queue up operation Rd #1 (M1) Rd #2 (M2) OK #1 (S1) multiple requests, until master releases bus. address adr A1 (M1) adr A2 (M2) data[A1] (S1) /data The master must keep track of outstanding requests and their status.

Throughput: 2 Clocks/word, independent of read latency

6.004 – Fall 2004 11/23/04 L21 – Communication 17 Bus Arbitration: Multiple Bus Masters “Daisy-Chain Arbitration” Request

Request Request Request Request Grant In Grant In Grant In Grant In Grant Out Grant Out Grant Out Grant Out

Module 1 Module 2 Module 3 Module 4 ISSUES: • Fairness - Given uniform requests, bus cycles should be divided evenly among modules (to each, according to their needs…) • Bounded Wait – An upper bound on how long a module has to wait between requesting and receiving a grant • Utilization - Arbitration scheme should allow for maximum bus performance • Scalability - Fixed-cost per module (both in terms of arbitration H/W and arbitration time.

STATE OF THE ART ARBITRATION: N masters, log N time, log N wires.

6.004 – Fall 2004 11/23/04 L21 – Communication 18 Outside the box… The Network as an interface standard

ETHERNET: In the mid-70’s Bob EMERGING IDEA: Protocol Metcalf (at Xerox PARC, an MIT “stacks” that isolate alum) devised a bus for networking application-level interface computers together. from low-level physical devices:

Application Session

Transport TCP UDP Network IP • Bit-serial (optimized for long wires) Physical Token Ring • Asynchronous (no clock distribution) • Variable-length “packets”

6.004 – Fall 2004 11/23/04 L21 – Communication 19 Generalizing Buses… Communication Topologies

1-dimensional approaches: “Low cost networks” – constant cost/node

BUS ONE step for random message delivery (but only one message at a time)

RING

Θ(n) steps for random message delivery

6.004 – Fall 2004 11/23/04 L21 – Communication 20 Quadratic-cost Topologies

COMPLETE GRAPH:

Dedicated lines connecting each pair of communicating nodes. Θ(n) simultaneous communications.

B B B B 1 2 3 4 CROSSBAR SWITCH:

• Switch dedicated between each pair of A nodes 1 A • Each Ai can be connected to one Bj at any 2 time A 3 • Special cases: A • A = processors, B = memories 4 • A, B same type of node • A, B same nodes (complete graph)

6.004 – Fall 2004 11/23/04 L21 – Communication 21 Mesh Topologies

Θ ( n ) Thruput Θ (n) Latency Θ ( n ) Cost

8-Neighbor 4-Neighbor 2-Dimensional Meshes

Nearest-neighbor connectivity: Point-to-point interconnect - minimizes delays - minimizes “analog” effects Store-and-forward Θ ( n ) Thruput Θ (n3 ) Latency (some overhead associated Θ ( n ) Cost with communication routing)

3-D, 6-Neighbor Mesh

6.004 – Fall 2004 11/23/04 L21 – Communication 22 Logarithmic Latency Networks

HYPERCUBE (n-cube): Θ 1-cube Cost = (n log n) 2-cube Worst-case path length = Θ(log n) 3-cube

4-cube

BINARY TREE: Maximum path length is Θ(log n) steps; Cost/node constant.

6.004 – Fall 2004 11/23/04 L21 – Communication 23 Communication Topologies: Latency Theorist's view: • Each point-to-point link requires one hardware unit. • Each point-to-point communication requires one time unit. Theoretical Actual Topology $ Latency Latency 3 Complete Graph Θ (n 2 ) Θ (1) ≥Θ(n)

Crossbar Θ (n 2 ) Θ (1) Θ (n)

1D Bus Θ (n ) Θ (1) Θ (n)

2D Mesh Θ (n ) Θ (n) 3D Mesh Θ (n ) Θ (n3 ) Tree Θ (n ) Θ (log n ) ≥Θ(n3 ) N-cube Θ ( n log n ) Θ (log n ) ≥Θ(n3 )

IS IT REAL? • Speed of Light: ~ 1 ns/foot (typical bus propagation: 5 ns/foot) • Density limits: can a node shrink forever? How about Power, Heat, etc … ?

OBSERVATION: Links on Tree, N-cube must grow with n; hence time/link must grow.

6.004 – Fall 2004 11/23/04 L21 – Communication 24 Communications Futures Backplane Buses – standard for peripherals + easy hardware configurability + vendor-independent standards - serialized communications FireWire ISA - bottleneck as systems scale up EISA RDRAM NuBus PCI Specialized buses for memory, graphics, … SDRAM SBUS New-generation communications... USB • Log networks (trees, hypercubes, …) • 2D Meshes (IWARP, ...) • 3D Meshes … • 4-neighbor, 3D mesh (NuMesh Diamond lattice)

Space: the final frontier?

6.004 – Fall 2004 11/23/04 L21 – Communication 25