UNIX Network Programming

Overview of STREAMS

Douglas C. Schmidt

STREAMS Overview

• STREAMS is a flexible communication subsystem framework

  – Originally developed for Research UNIX

• STREAMS provides a uniform infrastructure for developing and configuring character-based I/O

  – e.g., networks, terminals, local IPC

• STREAMS supports the addition and removal of processing components at installation-time or run-time

  – Via user-level or kernel-level commands

STREAMS Overview cont'd

• The STREAMS paradigm is data-driven, not demand-driven

  – i.e., asynchronous in the kernel, yet synchronous at the application

• Supports both immediate and deferred processing

• Internally, data are transferred by passing pointers to messages

  – Goal is to reduce memory-to-memory copying overhead

STREAMS Benefits

• STREAMS provides an integrated environment for developing kernel-resident networking services

• STREAMS promotes definition of standard service interfaces

  – e.g., TPI and DLPI

• STREAMS supports dynamic "service substitution" controlled by user-level commands

STREAMS Benefits cont'd

• Message-based interfaces enable off-board protocol migration

• Permits layered and de-layered multiplexing

• More recent implementations take advantage of parallelism in the operating system and hardware

A Simple Stream

[Figure: an application process calls fd = open ("/dev/dev0"); in the kernel, the Stream runs from the Stream head's write and read queues down the write side (downstream) and up the read side (upstream) to the "dev0" driver's write and read queues.]

The Stream Head

• A "stream head" exports a uniform service interface to other layers of the UNIX kernel

  – Including the application "layer" running in user-space

• General stream head services include

  1. Queueing

     (a) Provides a synchronous interface to asynchronous STREAM devices

  2. Datagram- and stream-oriented data transfer

  3. Segmentation and reassembly of messages

  4. Event propagation

     – i.e., signals

A Module on a Stream

[Figure: an application process calls ioctl (fd, I_PUSH, "modx"); the Stream now runs from the Stream head's write and read queues through the "modx" module's write and read queues to the "dev0" driver's write and read queues.]

The Stream Head cont'd

• Stream head operations include

  – stropen

    . called from the file system layer to open a Stream

  – strclean

    . called from the file system layer to remove event cells from the Stream head when a file is closed

  – strclose

    . called from the file system layer to dismantle a Stream

  – strread

    . called from the file system layer to retrieve data messages coming upstream

The Stream Head cont'd

[Figure: in the kernel, the Stream head's stdata_t holds sd_wrq (the write queue), sd_pollist, and sd_siglist (a list of strevent cells); its queue_t pair runs strwsrv() and strrput(), and the q_next links lead down through the intermediate STREAM modules to the STREAM driver.]

• Stream head operations cont'd

  – strwrite

    . called from the file system layer to send data messages downstream

  – strioctl

    . called from the file system layer to perform control operations

  – strgetmsg

    . called from the system call layer to get a protocol or data message coming upstream

  – strputmsg

    . called from the system call layer to send a protocol or data message downstream

  – strpoll

    . called from the file system layer to check if pollable events are satisfied

Messages

• In STREAMS, all information is exchanged via messages

  – i.e., both data and control messages of various priorities

• A multi-component message structure is used to reduce the overhead of

  1. Memory-to-memory copying

     – i.e., via "reference counting"

  2. Encapsulation/de-encapsulation

     – i.e., via "composite messages"

• Messages may be queued at STREAM modules

Message Structure

[Figure: an mblk_t (with b_next, b_prev, b_cont, b_rptr, b_wptr, b_band, and b_datap fields) hangs off a queue_t; its b_datap points to a dblk_t (db_base, db_lim, db_ref, db_type) describing a dynamically allocated data buffer.]

Composite Message

[Figure: several mblk_t's chained through b_cont form one logical message; each b_datap references its own dblk_t and data buffer.]

Message Buffer Sharing

[Figure: two mblk_t's, each with its own b_rptr and b_wptr, point via b_datap at a single dblk_t whose db_ref = 2, so both share one underlying data buffer.]

STREAMS Message Types

    M_DATA     user data
    M_PROTO    protocol information
    M_PASSFP   pass file pointer
    M_IOCTL    user ioctl request
    M_BREAK    request line break
    M_SIG      signal process group
    M_DELAY    request transmit delay
    M_CTL      module-specific control message
    M_SETOPTS  set Stream head options
    M_RSE      reserved for RSE use

• Normal priority messages

  – M_DATA, M_PROTO, M_PASSFP, and M_IOCTL may be generated from user-level

  – Typically subject to flow control

STREAMS Message Types cont'd

    M_PCPROTO  protocol information
    M_FLUSH    flush queues
    M_IOCACK   acknowledge ioctl request
    M_IOCNAK   fail ioctl request
    M_COPYIN   request to copyin ioctl data
    M_COPYOUT  request to copyout ioctl data
    M_IOCDATA  reply to M_COPYIN and M_COPYOUT
    M_PCSIG    signal process group
    M_READ     read notification
    M_HANGUP   line disconnect
    M_ERROR    fatal error
    M_STOP     stop output immediately
    M_START    restart output
    M_STOPI    stop input immediately
    M_STARTI   restart input
    M_PCRSE    reserved for RSE use

• High priority messages

  – Typically not flow controlled

  – M_PCPROTO may be generated from user-level

  – Others passed between STREAM components

Queue Structure

[Figure: write and read queue_t pairs at the Stream head and the Stream driver, linked via q_next; each queue_t's q_qinfo points to a qinit holding the open(), close(), put(), and service() entry points, which in turn references a module_info (q_maxpsz, q_minpsz, q_hiwat, q_lowat defaults) and a module_stat; qband structures (qb_hiwat, qb_lowat, qb_count, qb_first, qb_last, qb_next) hang off the queue, whose own fields include q_link, q_flag, q_ptr, q_count, q_first, and q_last.]

Queue Structure cont'd

• queue_t

  – Primary data structure

    . Each module contains a write queue and a read queue

  – Stores information on flow control, scheduling, max/min interface sizes, linked messages, private data

• qinit

  – Contains put, service, open, close routines

• qband

  – Contains information on each additional message band (> 0 and < 256)

• module_info

  – Stores default flow control information

Queue and Message Linkage

[Figure: a queue_t's q_first/q_last anchor a doubly-linked list (b_next/b_prev) of messages — high priority messages first, then priority band messages (band 2, band 1, ...), then normal (band 0) messages; q_band and q_nband lead to a chain of qband structures whose qb_first/qb_last delimit each band's messages within the list.]

Queue Subroutines

• Four standard subroutines are associated with queues in a module or driver, e.g.,

  – open (queue_t *q, dev_t *devp, int oflag, int sflag, cred_t *cred_p);

    . Called when a Stream is first opened and on any subsequent opens

    . Passed a pointer to the new read queue

    . Also called any time a module is "pushed" onto the Stream

  – close (dev_t dev, int flag, int otyp, cred_t *cred_p);

    . Called when the last reference to a Stream is closed

    . Also called when a module is "popped" off the Stream

Queue Subroutines cont'd

• Standard subroutines cont'd

  – put (queue_t *q, mblk_t *mp)

    . Performs immediate processing

    . Supports synchronous communication services

      * i.e., further queue processing is blocked until put returns

  – service (queue_t *q)

    . Performs deferred processing

    . Supports asynchronous communication services

      * Uses the message queue available in a queue

    . Runs as a "weightless" process...

Queue Flow Control

[Figure: a typical non-multiplexed example — the application's write() sleeps until the Stream is "back-enabled"; there is typically no queueing on the Stream head's write queue; a module whose write queue has q_flag == QFULL exerts back-pressure up the Stream, above the driver's write queue.]

Flow Control and the service Procedure

• Typical non-multiplexed example

    int service (queue_t *q)
    {
        mblk_t *mp;

        while ((mp = getq (q)) != 0) {
            if (queclass (mp) == QPCTL || canputnext (q)) {
                /* Process message */
                putnext (q, mp);
            }
            else {
                putbq (q, mp);
                return (0);
            }
        }
        return (0);
    }

• put and service work together to support advisory flow control

• Flow control is more complex with multiplexers and concurrency

Flow Control and the put Procedure

• Typical put example

    int put (queue_t *q, mblk_t *mp)
    {
        if (queclass (mp) == QPCTL ||
            canputnext (q)) {
            /* Process message */
            putnext (q, mp);
        }
        else
            putq (q, mp);
        return (0);
    }

Flow Control and the canput Procedure

• canputnext is used by put and service routines to test advisory flow control conditions

• e.g.,

    int canputnext (queue_t *q)
    {
        find closest queue with a service procedure
        if (queue is full) {
            set flag for "back-enabling"
            /* Enables service routine */
            return (0);
        }
        return (1);
    }

• Note that non-MP systems may use canput...

putq

• The int putq (queue_t *, mblk_t *) function enqueues a message on a queue

  – It is typically called by a queue's put procedure when it can no longer proceed...

• putq automatically handles

  1. Priority-band allocation

  2. Priority-band message insertion

  3. Flow control

• Enqueueing a high priority message automatically schedules the queue's service procedure to run at some point

  – Differs on MP vs. non-MP systems

getq

• The mblk_t *getq (queue_t *) function dequeues a message from a queue

  – It is typically called by a queue's service procedure

• Messages are dequeued in priority order

  – i.e., higher priority messages are stored first in the queue!

• getq handles

  1. Flow control

  2. Back-enabling

• getq returns 0 when there are no available messages

Multiplexing

• STREAMS provides rudimentary support for multiplexing via multiplexor drivers

  – Unlike modules, multiplexors occupy a file-system node that can be "opened"

    . Rather than "pushed"

• Multiplexors may contain one or more upper and/or lower connections to other STREAM modules and/or multiplexors

  – Enables support for layered network protocol suites

    . e.g., "/dev/tcp"

• Note there is no automated support for propagating flow control across multiplexors

Multiplexor Links before

[Figure: the application opens two Streams — fd1 = open ("/dev/ip"); and fd2 = open ("/dev/eth"); — each with its own Stream head; one leads down to the "/dev/ip" multiplexor driver, the other to the "/dev/eth" driver.]

Multiplexor Links after

[Figure: after ioctl (fd1, I_LINK, fd2); the second Stream head is disabled and its queues are "borrowed" from the Stream head to serve as the lower multiplexor queues; the application's remaining Stream head now sits above the "/dev/ip" upper multiplexor, which is linked onto the "/dev/eth" driver.]

Internetworking Multiplexor

[Figure: several applications' Stream heads feed the upper queues of an internetworking multiplexor; its lower multiplexor queues connect through Stream drivers to the network interfaces.]

Driver Data Structure Linkage

[Figure: the Stream head's strdata (sd_strtab) and, from other kernel layers, the driver's cb_ops (d_str) point to a streamtab; its st_wrinit and st_rdinit fields reference the write and read qinit structures, each queue_t's q_qinfo points to its qinit, and each qinit's qi_minfo points to a module_info.]

Module Data Structure Linkage

[Figure: the same linkage for a STREAM module, reached through fmodsw (f_str) rather than cb_ops.]

Pipes and FIFOs

• In SVR4 and Solaris, the traditional local IPC mechanisms such as pipes and FIFOs have been reimplemented using STREAMS

• This has broadened the semantics of pipes and FIFOs in the following ways:

  1. Pipes are now bidirectional STREAM pipes

  2. Pipes and FIFOs now support both bytestream-oriented and message-oriented communication

     – ioctls exist to modify the behavior of STREAM descriptors to enable or disable many of the new features

  3. Pipes can be explicitly named and exported into the file system

  4. Pipes and FIFOs can be extended by having modules pushed onto them

Pipes and FIFOs cont'd

[Figure: a STREAM pipe — int fds[2]; pipe (fds); fork (); — connects two Stream heads' write and read queues back to back; a FIFO — mkfifo ("/tmp/fifo"); open ("/tmp/fifo", 0); open ("/tmp/fifo", 1); — is a single Stream head whose write queue feeds its own read queue.]

Mounted Streams and CONNLD

• In earlier generations of UNIX, pipes were restricted to allowing multiplexed communication

  – This is overly complex for many client/server-style applications

• SVR4 UNIX and Solaris provide a mechanism known as "Mounted Streams" that permits non-multiplexed communication

  – This provides semantics similar to UNIX domain sockets

  – However, Mounted Streams are more flexible since they incorporate other features of STREAMS

Mounted Streams and CONNLD cont'd

[Figure: the server creates a pipe — int fds[2]; pipe (fds); — attaches one end into the file system with fattach (fds[1], "/tmp/server_x"); and pushes the connection module with ioctl (fds[1], I_PUSH, "connld"); when a client calls open ("/tmp/server_x", O_RDWR); connld creates a fresh pipe, whose server end is retrieved with ioctl (fds[0], I_RECVFD, &recv_fd); the client and server then exchange data over this non-multiplexed mounted Stream with read()/write().]

Layered Multiplexing

[Figure: several applications' Stream heads feed an upper multiplexor, which feeds a lower multiplexor, which feeds the STREAM driver above the network interface.]

Layered Multiplexing cont'd

• Advantages

  – Shares resources such as control blocks

  – Supports standard OSI and Internet layering models

• Disadvantages

  – More processing involved to demultiplex in deep protocol stacks

  – May be more difficult to parallelize due to locks and shared resource contention

  – Hard to propagate flow control and QOS info across muxers

De-layered Multiplexing

[Figure: each application's Stream head sits atop its own column of STREAM modules; the per-connection columns converge only at the STREAM multiplexor driver above the network interface.]

De-layered Multiplexing cont'd

• Advantages

  – Less processing for deep protocol stacks

  – Potentially increased parallelism due to less contention and locking required

  – Easier to propagate flow control and QOS info

• Disadvantages

  – Violates layering (e.g., need a packet filter)

  – Replicates resources such as control blocks

STREAM Concurrency

• Modern versions of STREAMS support multi-processing

  – Since modern UNIX systems have multi-threaded kernels

• Different levels of concurrency support include

  1. Fine-grain

     * Queue-level

     * Queue-pair-level

  2. Coarse-grain

     * Module-level

     * Module-class-level

     * Stream-level

• Note, developers must use kernel locking primitives to provide mutual exclusion and synchronization

Concurrency Alternatives

[Figure: Layer Parallelism — between the Stream heads (connections C1 and C2) and the STREAM tail above the network devices or pseudo-devices, each layer (LP_XDR::svc, LP_TCP::put, LP_DLP::svc) has separate incoming and outgoing entry points, with a process or thread per layer driving the module, queue, and message objects.]

Concurrency Alternatives cont'd

[Figure: Connectional Parallelism — each connection (C1, C2) has its own process or thread that runs the entire stack (CP_XDR::put, CP_TCP::put, CP_IP::put, CP_DLP::svc) for its incoming and outgoing messages, between the Stream heads and the STREAM tail above the network devices or pseudo-devices.]

Concurrency Alternatives cont'd

[Figure: Message Parallelism — a process or thread is associated with each message, which flows through MP_XDR::put, MP_UDP::put, and MP_DLP::svc on its incoming or outgoing path, between the Stream heads and the STREAM tail above the network devices or pseudo-devices.]

STREAMS Evaluation

• Advantages

  – Portability, availability, stability

  – Kernel-mode efficiency

• Disadvantages

  – Stateless process architecture

    . i.e., cannot block!

  – Lack of certain support tools

    . e.g., standard demultiplexing mechanisms

  – Kernel-level development environment

    . Limited debugging support...

  – Lack of real-time scheduling for STREAMS processing...

    . Timers may be used for "isochronous" service

Summary

• STREAMS provides a flexible communication framework that supports dynamic configuration and reconfiguration of protocol functionality

• Module interfaces are well-defined and reasonably well-documented

• Support for multi-processing exists in Solaris 2.x