PacketTypes: Abstract Specification of Network Protocol Messages

Peter J. McCann and Satish Chandra
Bell Laboratories
263 Shuman Blvd., Naperville, IL 60566
{mccap, chandra}@research.bell-labs.com

Abstract

In writing networking code, one is often faced with the task of interpreting a raw buffer according to a standardized packet format. This is needed, for example, when monitoring network traffic for specific kinds of packets, or when unmarshaling an incoming packet for protocol processing. In such cases, a programmer typically writes C code that understands the grammar of a packet and that also performs any necessary byte-order and alignment adjustments. Because of the complexity of certain protocol formats, and because of the low level of programming involved, writing such code is usually a cumbersome and error-prone process. Furthermore, code written in this style loses the domain-specific information, viz. the packet format, in its details, making it difficult to maintain.

We propose to use the idea of types to eliminate the need for writing such low-level code manually. Unfortunately, types in programming languages, such as C, are not well-suited for the purpose of describing packet formats. Therefore, we have designed PacketTypes, a small packet specification language that serves as a type system for packet formats. PacketTypes conveniently expresses features commonly found in protocol formats, including layering of protocols by encapsulation, variable-sized fields, and optional fields. A compiler for this language generates efficient code for type checking a packet, i.e., matching a packet against a type. In this paper, we describe the design, implementation, and some uses of this language.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGCOMM'00, Stockholm, Sweden.
Copyright 2000 ACM 1-58113-224-7/00/0008...$5.00.

1 Introduction

Networking software is difficult to construct. Networking code in a system must interface with bare hardware in a network device, and at the same time implement complicated real-time algorithms. Furthermore, both the low-level data manipulation and the emphasis on high performance have (barring a few exceptions) limited the choice of an implementation language to C. As a result, development, testing, and deployment of new protocols is a slow and expensive process.

Motivated by this problem, significant research effort has recently been put into systematic software architectures and languages suited for constructing networking software. These efforts usually address a specific aspect of the complexity of networking code. Some approaches try to better express the modular structure and composition of protocol layers (e.g., x-kernel [11], Foxnet [6]). Others try to better express the reactive control within each layer (e.g., Esterel [5]). Yet others emphasize the verification aspect (e.g., Promela++ [3]), or focus on an object-oriented implementation (e.g., Morpheus [1], Prolac [14]). These efforts demonstrate that an implementation methodology or language more suited to the task at hand can help build cleaner and more robust implementations and can do so without exacting performance penalties.

One aspect of the complexity of writing networking code arises from the fact that the wire format of a network packet is fixed by standards. Packet formats are independent of any given machine's architecture to allow interoperability, and generally pack data as tightly as possible to minimize the header overhead per unit of actual content. To interpret a buffer containing a "raw" network packet, a programmer must write low-level code that understands the packet format and that also performs any necessary byte-order and alignment adjustments as per host requirements. We illustrate such low-level code by means of an example.

    ip_fw_chk(struct iphdr *ip) {
        struct tcphdr *tcp =
            (struct tcphdr *)((__u32 *)ip + ip->ihl);
        int offset =
            ntohs(ip->frag_off) & IP_OFFSET;
        ...
        switch (ip->protocol) {
        case IPPROTO_TCP:
            if (!offset) {
                src_port = ntohs(tcp->source);
                dst_port = ntohs(tcp->dst);
                ...

Figure 1: Excerpt from the firewall code in Linux (v2.0.32).

Figure 1 shows an abstracted form of the firewall code in the Linux networking module (net/). The code first computes the starting address of an IP packet's payload into the variable tcp, by adding the size of the header (field ihl) to the start of the packet. It then computes the offset of the

present IP fragment (footnote 1) into the variable offset; the macro ntohs converts a network byte order representation to the host byte order representation for a 16-bit quantity. If the protocol in the payload is TCP and offset is zero (first fragment), it extracts the source and destination port numbers from the TCP header. Later code (not shown) implements specific firewall policies based on this and other information.

Footnote 1: A long payload may be transmitted by IP as a series of IP fragments, each of which contains a segment of the original payload, and an indication of the offset of the segment in the original payload.

The style of programming involved here is not complicated, but does have some undesirable properties. A programmer must explicitly encode the layout of an IP packet using pointer arithmetic and bit operations. He must encode the correlation between the protocol field's value of IPPROTO_TCP and the payload being a TCP packet. He must also remember to convert any multi-byte numeric quantities between the network byte order and the host byte order at all appropriate places (footnote 2). While the TCP/IP suite does not have a complicated packet format, other protocols do, and writing code that interprets a packet belonging to one of those protocols can require a lot of cumbersome programming. For example, the Q.931 protocol [12] defines the format of ISDN signaling messages as rather complex hierarchical packets. A C implementation that parses a Q.931 message can run into thousands of lines of code, and it is easy to introduce coding errors in offsets, sizes, conditionals, or endianness.

Footnote 2: To put this in perspective, in the Linux implementation (v2.0.32), endianness related macros hton* and ntoh* together appear about 300 times in the net/ipv4.

The capability to interpret a network packet is key to many important networking applications, in addition to protocol processing. These applications include network monitoring, accounting, and security services. The only software architecture in the literature that addresses such applications is a packet filter. However, filter specifications essentially parse a packet in much the style of Figure 1, and are written in byte code (except for some higher-level expression languages available for TCP/IP). The need for a mechanism to build such applications quickly and correctly, even for complicated formats, has been our primary motivation.

At first blush, the concern for parsing a packet may not appear to be a problem at all: one might be able to simply overlay a raw buffer with an appropriately defined C struct or union, and read off the values from the fields of the structure. Because C supports bit fields in structures, this seems to be a plausible choice. However, the C type system is not adequate for describing packet formats for a number of reasons:

1. Protocol headers can contain fields whose sizes depend on the value of a previous field in the header. For example, the options field in an IP header can occupy between 0 and 40 bytes, depending on the value of a previous field ihl (header length). Variable-sized fields cannot be represented as static types (footnote 3). Similarly, optional fields are not supported by C structs.

2. C types do not support the notion of layering of protocols in a clean way. One possibility is to define the payload of a packet as a union of all possible types of packets the payload could carry. A problem that immediately presents itself is how to know in advance all possible types of payload a given packet may be asked to carry. Thus, this solution is not easily extensible. Another problem with C unions is that one must still test the discriminating field and choose the interpretation consistently. For example, if we represent an IP payload as a union of TCP and UDP types, we must still make the right choice based on the value of the protocol field.

3. Finally, because of possible alignment and byte order mismatches between a host machine and the network format, applications may still need to do adjustments before using a numeric value as a primitive type.

Footnote 3: By static types we mean data structures whose memory can be allocated at compile time. Pointer data structures can represent variable-sized objects, but cannot be allocated at compile time.

What about data types in higher-level programming languages, such as Standard ML? In Standard ML, one might express the logic in Figure 1 using the code fragment shown in Figure 2. Unlike in Figure 1, the programmer does not need to explicitly interpret the wire format.

    case (#payload ip) of
      TCP {source, dst, ...} =>
        if ((#offset ip) = 0) then
          ...

Figure 2: An ML-style description. The syntax #field var is field selection from a record. The term on the left of => is a pattern, here matching the datatype TCP.

There are good reasons why protocol code is not written in this way. First, unmarshaling from the wire format to the high-level data type could be expensive, because ML's representation of data types is not as close to machine representation as that of C. Second, defining layered packets in an extensible way is still difficult. Although Foxnet [6] implements TCP/IP in Standard ML, packets are exposed as C-like byte arrays. The code uses low-level marshaling and unmarshaling into SML records to access fields that are needed for protocol processing.

In this paper we show that the idea of types can nevertheless be a useful one in the context of networking applications. To this end, we propose PacketTypes, a small packet specification language to describe packet formats. The philosophy behind this language is to supply an external type system for packet formats. While an external type system cannot be enforced by the compiler for a host language, we show that by its disciplined use, a programmer can avoid the problems mentioned earlier. PacketTypes has the following salient features:

• Packet descriptions are expressed as types.

• The fundamental operation on packets is checking their membership in a type.

• Layering of protocols is expressed as successive specialization on types.

• Refinement of types creates new types, a facility useful for packet classification.

The role of PacketTypes is analogous to "yacc", in that it abstracts away the packet grammar into a separate specification language, and automatically creates recognizers for packets. PacketTypes can find application in a number of situations, such as the following:

• Specification of network monitoring code. The task of writing network monitoring code can be automated, and perhaps more importantly, sped up, by writing the specifications in PacketTypes.

• Packet classification. Type definitions written using PacketTypes can also serve as packet filter specifications, and can be much simpler to write than byte codes.

• Stub generation. The PacketTypes compiler can automatically generate interface code between the wire representation and a host language's representation, eliminating the need for low-level programming.

• Formal specification of packet formats. Instead of English descriptions and ASCII graphics, Request for Comments documents can use PacketTypes to adopt a formal approach to describing formats. This could also provide a basis for formal verification.

We have implemented this language, and based on it, have built a number of applications, including a network monitoring system that works for Q.931 protocol messages. We have also measured the performance of these applications on simulated, yet realistic workloads. Based on our experiments, we believe that the packet specification language can be implemented efficiently enough and serve a number of uses in real systems.

We describe the packet specification language in Section 2, and a case study on its application to network monitoring in Section 3. We describe the implementation of this language in Section 4. Section 5 describes uses of our language in stub generation and packet classification, and also gives performance results. Section 7 compares PacketTypes to related work.

2 PacketTypes: A Packet Specification Language

In this section we describe a specification language for packet formats, based on the notion that the layout of fields within a packet, plus a collection of constraints on those fields, can be considered to be a type. The semantics of a type is a subset of the universe of binary strings. As we will see below, this specification technique is similar in many respects to regular languages, but it adds a system of constraints over attributes of terms which is lacking in most automata-theoretic models.

We start with one primitive type, bit, and add the ability to define new types using just a few operators. Definitions have the form name := type. For example,

    byte := bit[8];
    bytestring := byte[];

defines the term byte to be a sequence of exactly eight bits, and defines the term bytestring to be an arbitrarily long sequence of bytes. The empty square bracket operator [] is therefore analogous to the Kleene closure operator *, but with the addition of a constant argument, it constrains the repetition to have exactly the given number of occurrences.

We also allow grouping and sequencing of terms to form structures. For example, the specification of an Internet Protocol (IP) packet might begin as:

    nybble := bit[4];
    short := bit[16];
    long := bit[32];
    ipaddress := byte[4];
    ipoptions := bytestring;

    IP_PDU := {
        nybble version;
        nybble ihl;
        byte tos;
        short totallength;
        short identification;
        bit morefrags;
        bit dontfrag;
        bit unused;
        bit frag_off[13];
        byte ttl;
        byte protocol;
        short cksum;
        ipaddress src;
        ipaddress dest;
        ipoptions options;
        bytestring payload;
    } ...

IP_PDU defines the fields of an Internet Protocol version 4 header [21]. This type imposes a structure on packets, but without additional constraints, it allows many sequences of bits that are not valid IP packets. The necessary constraints appear in a where clause following the sequence, as in:

    IP_PDU := {
        ...
    } where {
        version#value = 0x04;
        options#numbytes = ihl#value * 4 - 20;
        payload#numbytes = totallength#value - ihl#value * 4;
    }

These constraints specify that the version field is set to 4 (for IP version 4), and give the number of bytes occupied by the options and payload fields. We use the syntax field#attribute to reference specific attributes of the given fields. A partial list of attributes and their meaning is given in Table 1. Note that the attributes #numbits, #numbytes, and #numelems can take on only natural numbers as values, and any packet for which the corresponding constraints associate a negative value with these attributes should be rejected as ill-formed.

    Attribute   Meaning
    ---------   -------
    value       The natural number formed by concatenating all the bits of the field in network order.
    numbits     The total number of bits occupied by the field.
    numbytes    The total number of bytes occupied by the field.
    numelems    The number of elements in an array-type field.
    alt         For an alternative-type field, a collection of booleans indicating which alternative was chosen.

Table 1: Attributes and their meanings. See examples in the text for details.

A set of constraints in a where clause may follow any newly defined type. In the IP_PDU example, these constraints have been used to specify that a certain field will hold a certain constant value and to specify the length of variable-length

fields. In the absence of constraints, the repetition operator [] is treated "greedily," meaning that any following data that could belong to the repetition is assumed to do so. Constraints on the number of repetitions may come from within the repeated member, because each element must match the given type; from the where section of a structure that contains the repetition, as in the example above; or from outside the containing structure in the form of constraints on the length of the data object in which the structure appears.

In addition to defining types by the := operator, we also allow a construct known as refinement, which is represented by the :> operator. Refinement is used to add additional constraints to an already-specified type. For example, an Ethernet frame containing an IP packet might be specified as:

    macaddr := bit[48];

    Ethernet_PDU := {
        macaddr dest;
        macaddr src;
        short type;
        bytestring payload;
    }

    IPinEthernet :> Ethernet_PDU where {
        type#value = 0x0800;
        overlay payload with IP_PDU;
    }

Ethernet_PDU contains the specification of an Ethernet frame, and IPinEthernet shows how to layer IP on it by constraining the type to have the value 0x0800 and overlaying the IP_PDU definition onto the Ethernet payload.

The overlay...with constraint allows us to merge two type specifications by embedding one within the other, as is done when one protocol is encapsulated within another. Overlay constraints introduce additional substructure to an already existing field. Essentially, the constraints of the IP_PDU type will become a part of the payload member of the Ethernet frame, and must be checked before a packet can match IPinEthernet. Because IPinEthernet is grounded in a link-layer frame that might come from a device driver, this would be an appropriate representation on which to base a network monitoring application.

Sometimes it is useful to specify additional constraints on an overlayed type in a refinement, or to otherwise reference the fields of a sub-structure. For example, an IPinEthernet packet whose source address is 192.168.0.1 could be expressed as

    My_IPinEthernet :> IPinEthernet where {
        payload.srcaddr#value = 192.168.0.1;
    }

The notation payload.srcaddr is used to denote the IP source address of the packet. Here the dot notation is used to access the fields of an overlayed structure, but it may also be used in other contexts such as to reference the substructure of a definition that uses other definitions as member types. Note that if a field with substructure is overlayed, that structure is hidden by the new structure and the old fields are no longer accessible. This ensures that each dotted name will resolve to a unique member of the packet structure.

Finally, we allow the specification of alternative types using the alternation (|=) operator. This operator combines terms disjunctively, where a given bitstring matches the type if and only if it matches one of the constituent members. Earlier, in describing IP_PDU, we had defined ipoptions as a bytestring. We now provide a more precise definition of the ipoptions type:

    ipoptions := {
        NonEndOption neo[];
        EndOption eo[];
        bytestring padding;
    } where {
        eo#numelems <= 1;
    }

Here the options are specified as a sequence of NonEndOptions, followed by an optional EndOption, followed by padding. The [] operator along with the constraint eo#numelems <= 1 effectively ensures that there will be zero or one EndOptions. We use this idiom to denote optional fields.

While the EndOption is just a single byte whose value is zero, the NonEndOptions are more interesting and make use of the |= construct:

    NonEndOption |= {
        NoOperation nop;
        Security sec;
        LSRR lsrr;
        SSRR ssrr;
        RR rr;
        StreamID sid;
        Timestamp tstamp;
    }

The alternatives construct |= is syntactically similar to the definition construct := in that it consists of a list of type, name pairs. The repetition operator can be used as before and there may be a where clause following the members with additional constraints. In contrast to :=, however, this example states that a NonEndOption may take on any one of the formats described by the members.

The #alt attribute may be used in constraints to detect which alternative was chosen. For example, the alternatives list above defines seven boolean expressions, #alt @ nop through #alt @ tstamp, each of which is true if and only if the member took on the given alternative. We use a separate @ operator to compare the #alt attribute to one of the possible values in order to keep the space of such values distinct from other numeric or boolean values. In a where clause of the ipoptions structure, for instance, the boolean expression neo[0]#alt @ tstamp would be true if and only if the first option was a timestamp.

Clearly, the expressiveness of the language comes primarily from constraints. In general any kind of constraint, including relational operators >, < and boolean combinators may be used (see the appendix). However, for computability reasons, the language limits the kinds of constraints that it admits. Constraints that provide size information for variable-sized fields must be such that they reference attributes only of previously mentioned fields. An implementation should enforce this restriction.

Moreover, any given implementation may put further limitations on the kinds of constraints it supports; for example, it may require that lengths be given using very simple expressions such as the ones in IP_PDU. Limitations of our compiler are mentioned in Section 4.5.
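To make the notion of checking membership in a type concrete, the following hand-written Python sketch mirrors the IP_PDU where clause and the IPinEthernet refinement from this section. It is only an illustration under stated assumptions, not output of the PacketTypes compiler: the function names match_ip_pdu and match_ip_in_ethernet and the returned dictionary are hypothetical, and the buffer is assumed to hold exactly one packet with no link-layer padding. Each size is computed only from fields that appear earlier in the packet, which is the restriction the language places on size constraints.

```python
import struct

ETHERTYPE_IP = 0x0800  # value required by IPinEthernet for the `type` field


def match_ip_pdu(buf):
    """Return the fields of buf if it is a member of IP_PDU, else None."""
    if len(buf) < 20:                       # the fixed fields occupy 20 bytes
        return None
    version, ihl = buf[0] >> 4, buf[0] & 0x0F
    (totallength,) = struct.unpack_from("!H", buf, 2)  # network byte order
    if version != 0x04:                     # version#value = 0x04
        return None
    options_len = ihl * 4 - 20              # options#numbytes
    payload_len = totallength - ihl * 4     # payload#numbytes
    if options_len < 0 or payload_len < 0:  # negative sizes are ill-formed
        return None
    if len(buf) != 20 + options_len + payload_len:
        return None                         # the sizes must account for every byte
    return {
        "ihl": ihl,
        "protocol": buf[9],
        "src": buf[12:16],
        "dest": buf[16:20],
        "options": buf[20:20 + options_len],
        "payload": buf[20 + options_len:],
    }


def match_ip_in_ethernet(frame):
    """Refinement of Ethernet_PDU: type#value = 0x0800, payload overlaid with IP_PDU."""
    if len(frame) < 14:                     # dest (6) + src (6) + type (2)
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IP:
        return None
    return match_ip_pdu(frame[14:])         # the overlay's constraints must hold too
```

A caller would pass a captured frame to match_ip_in_ethernet and treat a None result as "the packet is not a member of the type". The strict length check reflects the view that a type denotes a set of binary strings; a real monitor might instead tolerate trailing padding.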

3 Network Monitoring Case Study

In this section, we consider in detail a network monitoring example taken from a real-life situation: monitoring of signaling messages for ISDN calls in switching equipment. The protocol under consideration is Q.931, the Layer-3 ISDN signaling, running over LAPD frames. The general format of a Q.931 message includes a single-byte protocol discriminator (8 for Q.931 messages), a call reference value to distinguish between different calls being managed over the same D channel, a message type, and various information elements (IEs) as required by the message type in question. Thus, the top-level description of a Q.931 packet is given as follows (the types cref, mtype and infoelem will be defined shortly):

    Q931control := {
        byte protdisc;
        cref callref;
        mtype messtype;
        infoelem elems[];
    } where {
        protdisc#value = 0%0001000;
    }

We wish to write a monitoring tool that taps Q.931 messages and prints out the fields of the message on a display device. We first describe the relevant portions of the format of a Q.931 packet, and alongside, show how PacketTypes supports the idioms that appear in it. We then compare the monitoring tool created by using PacketTypes to one written by hand in C to bring out the advantages of using a concise yet powerful description language.

The type cref contains one or more octets. Octets following the first octet in a cref denote an optional call reference value crv. The length of crv in octets is given in a portion of the first octet of cref, and could be 0 if crv is omitted. If crv is present, the most-significant bit of its first octet denotes a flag value. The packet types defined next capture this description.

    cref := {
        nybble reserved;
        nybble length;
        crv cr[];
    } where {
        cr#numelems <= 1;
        cr#numbytes = length#value;
    }

    crv := {
        bit flag;
        bit cvalue[];
    }

Here, the field cr is constrained to have zero or one occurrences. The constraint on the numbytes attribute of cr expresses the relation between the value of the length field and the number of octets in cr.

The type mtype is simple: one reserved bit followed by one seven-bit MessageType, which is an alternate list of bit patterns of seven bits each. In this list, instead of inventing names for fixed bit-patterns, we use the literals as types.

    MessageType |= {
        0%0000000 Escape;
        0%0000001 Alerting;
        0%0000010 Proceeding;
        ...
    }

The type infoelem is the most challenging part of this protocol. At the top level, an infoelem can be represented simply as follows:

    infoelem |= {
        Type1SingleOctetInfoelem t1;
        Type2SingleOctetInfoelem t2;
        MultiOctetInfoelem multi;
    }

infoelem is defined to be one of three possible alternatives. The first two alternatives are simple, one-octet types. The third, multi-octet type is quite complex, and is described next. The first and second alternatives can be distinguished from the third by their first bit, which is always set, whereas it is always clear in the third; they differ between themselves in their next three bits. Each MultiOctetInfoelem describes its own length, which enables iteration over a variable-sized array of infoelems (footnote 4).

Footnote 4: The iteration continues until no more infoelems can be matched.

MultiOctetInfoelem is the base for carrying IEs. It can carry a variety of IEs, such as Bearer Capability, Cause, Call State, etc. We focus only on Bearer Capability; other ones can be handled similarly. In PacketTypes, MultiOctetInfoelem is an alternatives list containing the various IEs:

    MultiOctetInfoelem |= {
        BearerCapability bc;
        ...
    }

Each IE is a refinement of the following "base" type:

    MultiOctetInfoelemBase := {
        0%0 zero;
        LongElemIDcs0 elemid;
        byte length;
        bytestring payload;
    } where {
        payload#numbytes = length#value;
    }

LongElemIDcs0 is another enumerated type expressed as a list of alternatives, each being a seven-bit pattern (footnote 5).

Footnote 5: The suffix cs0 corresponds to codeset 0. Codesets can be changed on the fly to get a different set of IEs in place. We do not support codesets at present (see Section 6).

IEs can be quite complicated. An ASCII picture of the Bearer Capability IE is shown in Figure 3. Its top level definition in our language is the following:

    BearerCapability :> MultiOctetInfoelemBase
    where {
        elemid#alt @ BearerCapabilityID;
        overlay payload with {
            BC_group3 g3;
            BC_group4 g4;
            BC_group5 g5[];
            BC_group6 g6[];
            BC_group7 g7[];
        } where {
            g5#numelems <= 1;
            g6#numelems <= 1;
            g7#numelems <= 1;
        }
    }
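Before turning to the individual octet groups, it may help to see what a recognizer for the top-level Q931control type has to do. The Python sketch below is a hand-written approximation of the types defined above, not generated code: the function name match_q931, the tuple representation of information elements, and the decision to reject a message whose last multi-octet IE is truncated are all assumptions made for illustration. Single-octet IEs are not split into their Type 1 and Type 2 variants.

```python
Q931_PROTDISC = 0b00001000  # protdisc#value for Q.931 messages (decimal 8)


def match_q931(buf):
    """Return the top-level fields of a Q931control message, or None."""
    if len(buf) < 3 or buf[0] != Q931_PROTDISC:
        return None
    cr_len = buf[1] & 0x0F                  # cref: second nybble is `length`
    pos = 2
    if len(buf) < pos + cr_len + 1:         # crv octets plus the mtype octet
        return None
    crv = buf[pos:pos + cr_len]             # cr#numbytes = length#value
    pos += cr_len
    flag = cvalue = None
    if crv:                                 # cr#numelems <= 1: present or absent
        flag = crv[0] >> 7                  # most significant bit of first octet
        cvalue = int.from_bytes(crv, "big") & ((1 << (8 * cr_len - 1)) - 1)
    messtype = buf[pos] & 0x7F              # one reserved bit, then MessageType
    pos += 1
    elems = []
    while pos < len(buf):                   # infoelem elems[]
        first = buf[pos]
        if first & 0x80:                    # first bit set: single-octet IE
            elems.append(("single", first & 0x7F, b""))
            pos += 1
            continue
        if pos + 2 > len(buf):              # multi-octet IE needs id and length
            return None
        end = pos + 2 + buf[pos + 1]        # payload#numbytes = length#value
        if end > len(buf):
            return None
        elems.append(("multi", first, buf[pos + 2:end]))
        pos = end
    return {"flag": flag, "cvalue": cvalue, "messtype": messtype, "elems": elems}
```

Because each multi-octet IE carries its own length, the loop can step over IEs it does not understand, which is what makes iteration over the variable-sized elems array possible.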

It is natural to describe the fields g3 to g7 of Bearer Capability in terms of octet groups. An octet group is a sequence of non-final octets followed by a final octet. A non-final octet has its most significant bit clear, whereas a final octet has it set.

    octetgroup := {
        nonfinaloctet nfo[];
        finaloctet fo;
    }

    nonfinaloctet := byte where {
        [0]#value = 0;
    }

    finaloctet := byte where {
        [0]#value = 1;
    }

The syntax [0]#value denotes the first element of the type byte, in other words, the zero'th bit.

Each of the types BC_group3 through BC_group7 can now be defined by overlaying an octetgroup with appropriate decompositions. We do not describe these further, as they are quite routine.

    |  8  |  7  |  6  |  5  |  4  |  3  |  2  |  1  | Octet
    |-----------------------------------------------|
    |                                               |
    |   Bearer Capability/Low Layer Compatibility   | 1
    |-----------------------------------------------|
    |                                               |
    |                    Length                     | 2
    |-----------------------------------------------|
    |  1  |  Coding   |                             |
    |     | Standard  |  information transfer cap.  | 3
    |-----------------------------------------------|
    |     | transfer  |                             |
    | 0/1 |   mode    |  information transfer rate  | 4
    |-----------------------------------------------|
    |     |                 |           |           |
    | 0/1 |    structure    |configura'n|establish't| 4a
    |-----------------------------------------------|
    |     |           |  information transfer rate  |
    |  1  | symmetry  | destination -> origination  | 4b
    |-----------------------------------------------|
    |     |    0 1    |                             |
    | 0/1 |  layer 1  | user info layer 1 protocol  | 5
    |-----------------------------------------------|
    |     |sync |Negot|                             |
    | 0/1 |async|posbl|          user rate          | 5a
    |-----------------------------------------------|
    |     |intermediat| NIC | NIC |flow |flow |     |
    | 0/1 |   rate    |on Tx|on Rx|on Tx|on Rx|  0  | 5b+
    |-----------------------------------------------|
    |     |headr|multi|     |LL ID|assgr|inbnd|     |
    | 0/1 |no hd|frame|mode |negot|assge|negot|  0  | 5b*
    |-----------------------------------------------|
    |     | number of | number of |                 |
    | 0/1 | stop bits | data bits |     parity      | 5c
    |-----------------------------------------------|
    |     |duplx|                                   |
    |  1  |mode |            modem type             | 5d
    |-----------------------------------------------|
    |     |    1 0    |                             |
    |  1  |  layer 2  | user info layer 2 protocol  | 6
    |-----------------------------------------------|
    |     |    1 1    |                             |
    |  1  |  layer 3  | user info layer 3 protocol  | 7
    |-----------------------------------------------|
    5b+ for V.110
    5b* for V.120

Figure 3: Bearer Capability Information Element

We can now specify a monitor for the wire format by refining a LAPD frame (definition is not shown, but assume it contains a payload field):

    Q931inLAPD :> LAPDframe where {
        overlay payload with Q931control;
    }

In addition to being a machine readable encoding of Figure 3, the types described above can be used to automatically construct monitoring code. The compiler generates print statements after each unit (or type) is matched. An important issue is the output format: the default print out may not be what is desired in a given monitoring system. One possibility is to specify a printf-like template string that defines the output format, and interpret it with the data values extracted from the packet. Another possibility is to generate a skeletal data-structure traversal routine in C and let the user insert print (or other) statements as needed.

By contrast, writing C code by hand to recognize and parse a Q.931 packet is tedious due to the large number of conditionals and bitwise operations needed to capture the description. One implementation of (roughly; see footnote 6) this functionality takes well over 10,000 lines of C. It is also harder to modify as specifications evolve (for example, a version of this protocol augments octet 3 of Figure 3 with an optional 3a octet). By contrast, a PacketTypes version of a subset of the same specification takes far fewer lines, by about a factor of four, and contains far more readable text than the corresponding C code.

We have produced a working implementation of a monitor for Q.931 over LAPD (although we have implemented only a few information elements), and also for IP packets over Ethernet. The latter includes complete parsing capability for IP options. We are investigating the possibility of constructing monitors for ASN.1 messages, by expressing ASN.1 encodings in our language.

Footnote 6: For example, the implementation does support multiple codesets, which we do not currently support.

4 Implementation

We have implemented a compiler for PacketTypes. The compiler works in four steps. In the first step, it takes as input a list of PacketTypes type definitions and produces a parse tree consisting of a node for each term in the input. In the second step, it performs semantic processing to resolve fieldnames and propagate constants, so that structures of fixed size can be recognized. In the third step, it produces an elaborated intermediate representation consisting of one node for each usage of a definition. Finally, the compiler generates code from the elaborated intermediate representation. Construction of the parse tree is routine and is not discussed further. The subsequent phases are described next. We also discuss efficiency considerations in the matching code, and limitations of our current approach.

4.1 Semantic Processing

In this step, the compiler performs a bottom-up "size propagation" so that structures of fixed size are annotated with their size. A structure has a fixed size if:

• it is a bit;

• it is a fixed size array of bits;

• it is a binary string literal;

• it is a definition list (:=) consisting only of fixed size members; or

• it is an alternatives list (|=) consisting only of fixed size members where all members have the same size.

A refinement has the same size as the structure it is refining. If the size of a structure cannot be determined at this time, it must be computed at run time; the compiler generates code for this computation as described later.

The compiler also resolves fieldnames during this step. At each occurrence of a fieldname in the constraints, it inserts a pointer to the member to which it refers. Fieldnames can also be "dotted", i.e., contain references to fields of substructures, e.g., payload.srcaddr. For each dotted fieldname, a search is performed backwards from the point at which it appears in the following manner:

• A backwards search is performed to find a constraint that overlays any fieldname which is a prefix of the target fieldname. If such an overlay is found, the overlaid structure or definition is searched for the rest of the fieldname.

• If no such overlay constraint is found, and this is a refinement of another type, the other type is searched for the fieldname as in the previous step.

• If this type is a definition or alternatives list, the member names are checked to see if any match the first term of the fieldname. If one is found, its definition is checked for the subsequent terms of the fieldname.

• An empty fieldname matches the entire structure in which it appears.

This algorithm implies that once a field is overlaid, its internal structure is no longer accessible. Only the newly overlaid fields may be accessed from that point on. It also shows that the order in which overlay constraints appear is significant.

4.2 Elaboration

In this phase, definitions are expanded so that each usage instance of a defined type is given its own data structure node. The later phases of the compiler use this data structure to store information specific to this instance of the type; for example, the offset of a particular type t will typically be different for each instance of t. For cases in which definitions are used in array types, only one node is inserted for each array instead of potentially unbounded replication. This has important implications for the code generation phase. Also, this phase identifies structural constraints, which are constraints that impose limits on the size of variable-length fields, and demultiplexing constraints, which are constraints comparing the value of a key field (see Section 4.4) to a constant.

4.3 Code Generation

The primary functionality of the generated code is to check if a packet belongs to a specified type. Therefore, for each definition or alternatives list, one routine is generated that checks for that type as well as any refinements of that type. Usually only one of these routines, corresponding to a given link-layer or device-specific form, will be of interest to the user, such as the Ethernet PDU from Section 2 or the LAPD frame from Section 3. This routine will search for the most refined type that matches a given packet, or stop after finding a single match for which the user has indicated interest.

During code generation, first each of the nodes in the elaborated intermediate representation is allocated named locations, henceforth referred to as registers, to hold the non-constant (or, run-time) information such as lengths and offsets of fields. The code also uses a set of scratch registers as needed. These registers are later mapped on to variables in the generated code. Next, the various language constructs are translated, as described below.

Definition List. For a definition list (:=), the compiler generates code that checks the constraints of each member of the type, and then checks the constraints of the where clause. Each member is checked recursively in the same manner. Note that some members may be variable-sized arrays and that each element of such an array may itself be of variable size. This necessitates an iteration strategy that first checks for any structural constraints on the number of elements or the number of bits that is occupied by the member, and then checks each element in turn to be sure that its constraints are satisfied. Iteration must stop when the member runs out of space, either by violating its structural constraint or by failing to leave enough room for all of the subsequent, constant-size members. Failure of an element to satisfy its constraints also terminates the array iteration.

Because each usage of a definition is given a fixed amount of storage space, all elements of an array share the same set of registers and only the last element touched is saved at any given moment. If values of another element are needed, they must be recomputed. Also, iteration over an array uses only local information. If an array uses up too much space and a later, variable-sized member is left without enough space, we do not attempt to backtrack and change the number of elements occupied by the first array. We believe this is an appropriate limitation for most real-world applications, because most protocols do not require implementations to perform backtracking during parsing.
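As a rough illustration of the register scheme and of definition-list matching in Section 4.3, the following hand-written C sketch shows what a matching routine might look like. Everything specific in it is an assumption made for the example: the type (a 4-bit version member, a 4-bit hlen member, and a variable-sized options member whose byte count is hlen * 4 - 1, with a where clause requiring version = 4), the struct hdr_regs, and the function name match_hdr. It is not the compiler's actual output.

```c
#include <assert.h>
#include <stddef.h>

/* "Registers": run-time offset and length for one usage of the type. */
struct hdr_regs {
    size_t options_off;   /* byte offset of the variable-sized member */
    size_t options_len;   /* byte count of the variable-sized member  */
};

/* Returns 1 if buf[0..len) matches the hypothetical type, else 0. */
static int match_hdr(const unsigned char *buf, size_t len, struct hdr_regs *r)
{
    unsigned version, hlen;

    assert(buf != NULL && r != NULL);
    if (len < 1)                  /* bounds check for the fixed-size members */
        return 0;
    version = buf[0] >> 4;        /* member 1: high 4 bits */
    hlen    = buf[0] & 0x0Fu;     /* member 2: low 4 bits  */

    /* Structural constraint (assumed): options occupies hlen * 4 - 1 bytes. */
    if (hlen == 0)
        return 0;
    r->options_off = 1;
    r->options_len = (size_t)hlen * 4u - 1u;
    if (r->options_len > len - r->options_off)   /* member must fit */
        return 0;

    /* where clause, checked after all members: version must equal 4. */
    return version == 4;
}
```

The registers are ordinary variables here, which mirrors the statement that registers are later mapped on to variables in the generated code.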

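The array iteration strategy of Section 4.3 (stop when the member runs out of space, reserve room for the subsequent constant-size members, never backtrack) can be sketched as follows. This is a minimal sketch under stated assumptions: the element layout (a one-byte tag, a one-byte length, then that many payload bytes), the element constraint (tag must be nonzero), and the names match_array, avail and reserve are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Iterates over variable-sized elements in buf[0..avail).
 * reserve = bytes that must remain for later constant-size members.
 * Returns the number of elements matched, or -1 if even the reserved
 * space does not fit; *used receives the bytes occupied by the array. */
static int match_array(const unsigned char *buf, size_t avail,
                       size_t reserve, size_t *used)
{
    size_t off = 0;   /* register: running offset within the member */
    int count = 0;    /* register: number of elements matched so far */

    assert(buf != NULL && used != NULL);
    if (reserve > avail)
        return -1;
    avail -= reserve;             /* space the array itself may occupy */

    while (avail - off >= 2) {
        size_t elem = 2 + (size_t)buf[off + 1];   /* size of this element */
        if (buf[off] == 0)        /* element constraint fails: stop */
            break;
        if (elem > avail - off)   /* element would run out of space: stop */
            break;
        off += elem;              /* state of earlier elements is not kept */
        count++;
    }
    *used = off;                  /* no backtracking: first fit is final */
    return count;
}
```

Because the loop keeps only a running offset and a count, values belonging to an earlier element would have to be recomputed by iterating again, as the text notes.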

Alternatives List. In the code generated for an alternatives list (|=), each member is checked in turn as before, but processing stops at the first member which is found to match. Also, instead of being cumulative, bit offsets are restarted from zero for each member. The constraints in the where section are checked as before. Note that for alternative lists the #alt attribute is available to see which alternative was matched. If a fieldname of a constraint happens to reference an alternative that was not selected, then the entire boolean expression in which it appears is taken to be false.

Refinement. Refinements of a type (:>) are checked immediately after the base type has been checked. Each refinement of a type is checked in turn: the new constraints introduced by the refinement are simply evaluated to see if they hold. If a match is found, further refinements of the matching type are checked before examining any sibling types. Thus, this approach seeks to find a most-refined match, although other policies can also be implemented. If the type definitions connected by :> are arranged as a tree, in which a :> b causes node a to be a child of node b in the tree, then our matching strategy can be seen as a preorder traversal of this tree.

Constraints. As mentioned earlier, structural constraints provide information about variable-length fields. The compiler uses registers to hold values of attributes computed at run time. A structural constraint generates code that reads the necessary attributes from previously matched fields (these attributes are held in registers) and either evaluates a boolean, or assigns a value to an attribute of some field. Non-structural constraints generate code that evaluates a boolean condition on #value or #alt attributes of fields. If any boolean condition evaluates to false, matching for the corresponding type fails and the execution attempts to match the packet against the next type in its traversal. Note that overlay constraints have no run-time significance by themselves, except that they can contribute additional structural and non-structural constraints.

The matching code that we generate can optionally perform rigorous bounds checking to ensure that each structure fits within the memory it is given. Such code is often missing from hand-coded implementations of networking protocols.

Along with matching code, we can optionally generate code to print out the values of fields as they are encountered. As shown in Section 3, this feature makes it easy to create network monitoring tools.

4.4 Efficiency Considerations

An optimization implicit in our matching scheme is to match common prefixes of composite types only once. For example, if there are two refinements of IP PDU, say TCPinIP and UDPinIP, at run-time, the matching process matches the IP PDU part only once, and then tries the additional constraints of TCPinIP and UDPinIP in sequence. This is a rather important optimization in the packet filter literature.

However, matching against a list of refinements could be slow if the implementation were to iterate through a large set of "sibling" types until a match was found. In many cases, these sibling types differ only in the value of one member or a set of members, which acts as a demultiplexing key. A similar situation also arises for alternatives lists, if most of the alternatives are either literal bit-strings or refinements of a common type with a demultiplexing key. Our compiler generates code to perform an optimized traversal in these cases. Instead of a sequential search, the compiler generates (when possible) a switch statement indexed by the demultiplexing key to proceed directly to the code to match a specific refinement or alternative.

[Figure 4 diagram: InfoElemBase (members elemid and payload), InfoElem, and the types BearerCapability, CallState and Cause, each labeled with an elemid value (0x1, 0x2 and 0x3 appear in the figure) and "overlay payload".]

Figure 4: Optimization of traversal of sibling types. The bold edges lead to types included in the alternatives list (|=) of InfoElem. The dotted edges lead to types derived from InfoElemBase using refinement (:>).

We illustrate this idea by an example. Figure 4 shows three types, BearerCapability, CallState and Cause, that form a part of the alternatives list of InfoElem. The three types also are refinements of InfoElemBase, and can be discriminated by the value of the elemid member. When matching a packet against type InfoElem, the base type can be checked just once. If the base type matches, the value of elemid can be used to determine which particular type to examine further (see footnote 7). A similar scheme could be followed to match a packet against a type "tree" rooted at InfoElemBase. If the local constraints of InfoElemBase match, we attempt to find a more specific match. Again, the value of elemid is used to directly go to the code that matches the particular type, bypassing a preorder traversal.

Footnote 7: In general, InfoElem could have other alternatives unconnected with the three types. In that case, if none of the three types match, the other alternatives are then tried in order.

The compiler uses demultiplexing constraints to identify opportunities for this optimization. For now, only equality comparisons and #alt @ constraints are supported as demultiplexing constraints. This is because the implementation strategy is to generate C switch statements, which are efficiently supported by a native C compiler. If there is more than one field used as the key in the demultiplexing constraints, a nested switch statement is generated. Efficient handling of ranges in demultiplexing constraints would require the implementation of scalable range matching algorithms [7]. Also, handling of dynamic insertion and deletion of endpoints in a general and efficient way would require dynamic code generation to produce code equivalent to a C switch. We present a more limited scheme for dynamic insertion and deletion of demultiplexing keys in Section 5.3.

Another important consideration is how to obtain fast code for member value references. In principle, we could generate code that can extract any number of bits from any offset in the packet. However, such code is likely to run slowly, because machine instruction sets support efficient loads only
from aligned addresses, and only of certain fixed numbers of bits (typically 8, 16, and 32). Fortunately, in practice most protocols allocate fields in a way that permits us to perform member value references efficiently. Our compiler makes certain assumptions to be able to generate efficient code for value references. It assumes that any #value that needs to be extracted from a packet will be of a fixed, constant size less than 32 bits, and that the value will not straddle an alignment boundary greater than this size. That is, fields having 32 bits or less will be completely contained within word-aligned four bytes, fields having 16 bits or less will be completely contained within a short-aligned two bytes, and so on. The compiler also optionally assumes that overlays are applied only to byte-aligned fields to avoid book-keeping for fractional octets.

4.5 Implementation Choices and Limitations

For simplicity, our compiler currently supports only limited forms of structural and demultiplexing constraints. Structural constraints must currently be written with a single attribute (#numelems, #numbits, or #numbytes) of a variable-length field on the left-hand side, and an operator =, <, or <=, followed by an integer expression on the right-hand side. Currently, the compiler supports only demultiplexing constraints that are of the form name #value = constant or name #alt @ constant.

Assumptions on alignment of members could be relaxed while maintaining the same level of performance for some special cases with the use of an alignment inference engine that could be easily constructed, given our intermediate representations and an indication of the expected alignment of the beginning of each packet. Members for which no alignment or size could be inferred would need to use more general, less efficient code.

While we have chosen to produce C code from our intermediate bytecode, this is not the only possible choice. We took this path to take advantage of the optimization done by the native C compiler, and because compilation overhead is currently not a concern in the primary application we are interested in, viz., network monitoring. We could also have produced specifications for other packet filters or produced executable code directly, as is done using dynamic code generation in DPF [10]. Because each packet type uses a fixed amount of space during matching, hardware compilation also appears to be feasible.

Currently, our compiler does not perform any optimizations on the generated bytecode. We could (and plan to) adopt an approach similar to that described for BPF+ [4]. This is desirable not only for dynamic code generation, where we do not have the benefit of a C optimizer, but also for some high-level transformations that a C compiler is not able to perform.

An unexpected benefit of generating C code is easy debugging, because we can use standard C debuggers to step through the generated code. Once we move to dynamically-generated object code, we will lose this facility. In fact, an option of generating C code is worth keeping around just for debugging the type definitions.

Our implementation strategy is geared towards batch compilation, in which all type definitions are presented to the compiler at once. In Section 5.3, we describe a (restricted) way in which we permit run-time addition of new types.

5 Applications and Performance

In this section we describe some sample scenarios in which PacketTypes can be used, and the performance of the generated code for those situations.

5.1 Network Monitoring Tool

We have already described an application of PacketTypes to rapid generation of network monitoring tools. The description outlined in Section 3 can be used to generate packet matching code that prints out the values encountered in each field. Because such a monitor is producing high-level output to a file or a screen, performance is not an important issue.

5.2 Stub Generator

In this section, we outline, by means of an example, how we can use PacketTypes to automatically obtain interface code between the wire representation and a host language's representation. We also compare the performance of such stubs (which were compiled using the native C compiler) with hand-written C stubs compiled using the same compiler (Sun Workshop 4.2, -xO2 option).

Reconsider the code from Figure 1. This code fragment (partially) unmarshals a TCP packet from an IP payload, and makes decisions based on some of the values found in the protocol data. In this respect, it resembles a packet filter, except it is interested in extracting some of the values in a packet rather than demultiplexing it. This code can be written as:

    ip_fw_chk(struct iphdr *ip) {
        struct tcpview {
            short src_port;
            short dst_port;
        } tv;
        ...
        if (match((void *) ip,
                  TCPinIP_firstfrag,
                  TCPVIEW,
                  (void *) &tv))
        ...

The function match takes four arguments, as shown in the figure. The first argument, ip, is simply a pointer to the packet buffer. The second argument, the type name TCPinIP_firstfrag, represents the encapsulation of a TCP datagram inside an IP packet that is a first fragment (IP offset is 0). Type names occur as integer constants in the C code. The third argument, a descriptor, specifies the set of field values that the programmer desires to extract (explained further shortly), and the final argument points to space where such extracted values must be stored. Note that the programmer did not have to write low-level C code to work with the wire representation. The style of this code is similar to the typeful style of Figure 2.

The match call entails two tasks. First, the packet is matched against the type TCPinIP_firstfrag. Second, the values desired by the programmer must be extracted and presented in the host language's representation. The mechanism we choose to do this is to use a descriptor. Each type name is associated with a few user-defined descriptors. A descriptor lists fields and the format in which the host language expects them. For example, use of the descriptor TCPVIEW implies
that the src and dst fields of TCP PDU should be used to populate struct tcpview in agreement with the host machine's alignment and byte order. The specification of the descriptor itself lists the field names that are desired, as well as the sizes in bits of the corresponding fields in the structure that hold the extracted values (e.g., we could ask for extracting a value that occupies 5 bits in the packet into an 8-bit char). For TCPVIEW, the view specification is given as follows:

    TCPVIEW of TCPinIP_firstfrag
    payload.src D 16 0
    payload.dst D 16 32

This specification lists the fields to be extracted in sequence; for each field, it gives the field name, address (A) or data (D) to be copied, bit size and the bit offset in the target space (see footnote 8). Notice that the descriptor does not assume, but rather explicitly specifies the bit offsets in target memory where the extracted data is to be stored. This is because we did not want to generate code specific to the layout that the C compiler on any given machine produces for the target structure (struct tcpview in the example). The generated code corrects the endianness of multi-byte quantities, so on any given machine, the user sees the views populated in the correct host byte order. Code generation can be done in a portable fashion, without regard to the endianness of the host machine.

Footnote 8: The purpose of providing the A alternative is that for payload of packets, one wants to retrieve a pointer to it rather than a copy.

We can also automatically generate complete views (or parse trees) from a packet specification, producing C structs that contain fields for every member of the input specification. To accommodate variable-sized objects, these views necessarily make assumptions about pointers and memory management that might not hold in every usage situation, but they serve as convenient starting points when most or all fields need to be extracted.

We used our compiler to obtain match functions that returned a boolean value and populated a view structure on success. We generated stubs for the following three types:

1. Type1 matches any Ethernet frame that holds an IP or an ARP packet. The view extracts Ethernet's type field.

2. Type2 matches Ethernet frames that are IP packets from host 135.1.152.73 to host 135.1.152.66. (We specialize the type IPinEthernet described earlier.) The view extracts IP's src and dest addresses.

3. Type3 matches TCP segments between the above two hosts that have source port 1014 and destination port 513. (We specialize type TCPinIP, which, in turn, constrains the protocol field of IP PDU to 6, and overlays its payload with TCP PDU.) The view extracts IP's src and dest addresses, and also TCP's src and dst port numbers.

We ran each of these stubs a large number of times on simulated data. We used three different packets for Type1 (of which 2 match), three different packets for Type2 (of which 1 matches) and four different packets for Type3 (of which 2 match). Each packet was tried on the stub 10 million times. Our experiments were carried out on a 300 MHz UltraSparc-IIi machine with 128M memory; all timings were collected by calling getrusage. Since we are working with a small amount of simulated data, we believe that the working set fits in the cache and that the timings are fairly indicative of the number of instructions executed.

The following table reports the execution time per call in nanoseconds for each of the stubs. The first two rows show the performance of the hand-written and automatically generated codes, respectively. The third row is explained further below.

                    Type1   Type2   Type3
    Hand-Written       91     105     158
    Generated         128     110     189
    Baseline           73      60      67

The generated code ran up to 40% slower than the hand-written code. We inspected the code to understand the reasons for this performance difference. In Type1, our compiler does not perform boolean short-circuit evaluation that the hand-written code does perform. When we applied this transformation (manually) to the generated code, it matched Type1 in 106 nanoseconds on average, reducing the performance gap substantially. The remaining difference in times in each of the three types is due to different control sequences generated by the native C compiler for automatically generated C code and for the handwritten C code (see footnote 9). We do not currently know the reason for this different behavior; in the future we plan to explore transformations on bytecode that will result in better control constructs being generated in the resulting object code.

Footnote 9: The sequence of loads and stores generated in the two cases was the same.

In many situations, for example in Figure 1, programmers write classification code inline. Therefore, it might be more appropriate to factor out the overhead of the function call, and of moving the data into a view structure, in the timings. We measured this overhead by running a "Baseline" version of the experiment, in which no classification logic is run inside the functions. We see that the cost of the instructions to evaluate the logic is only a small part of the total cost of each stub call. Therefore, for performance-critical applications, a desirable optimization to perform would be to inline the match call and eliminate the loads and stores to the view structure.

5.3 Packet Classifier

We also implemented the core of a packet classifier using the technique for matching types described above. In a typical deployment, a packet classifier requires the capability of dynamically adding and removing types of interest. The implementation outlined in Section 4, however, works only on a static collection of types.

We have added a feature to our system that allows us to dynamically add and remove types, albeit in a restricted way. We make use of the observation that, typically, packet classifiers are interested in demultiplexing structurally similar packets to a large number of endpoints: that is, the packets are almost of the same type, with a few fields working as the demultiplexing key. For example, for TCP connections, the demultiplexing key is a four-tuple consisting of the source and destination IP addresses and the source and destination TCP ports. We call such types parameterized types.
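One way to hold installed endpoints for such a four-tuple key is a chained hash table, which is the general approach this section goes on to describe. The sketch below is only an illustration under stated assumptions: the function names and parameter lists follow the prototypes given later in this section (insert_TCP, delete_TCP, lookup_TCP), but the table size NBINS, the hash function, the struct tcp_entry layout and the helper find_slot are hypothetical and are not taken from the authors' implementation.

```c
#include <assert.h>
#include <stdlib.h>

#define NBINS 64u   /* assumed, fixed number of hash bins */

struct tcp_entry {
    int ipsrc, ipdst, srcport, dstport;   /* the four-tuple key */
    void *endpoint;
    struct tcp_entry *next;               /* linked list within one bin */
};

static struct tcp_entry *bins[NBINS];

static unsigned hash_key(int a, int b, int c, int d)
{
    unsigned h = (unsigned)a;
    h = h * 31u + (unsigned)b;
    h = h * 31u + (unsigned)c;
    h = h * 31u + (unsigned)d;
    return h % NBINS;
}

/* Returns the link that points at the matching entry, or the NULL
 * link at the end of the bin's list if the key is not installed. */
static struct tcp_entry **find_slot(int ipsrc, int ipdst, int sp, int dp)
{
    struct tcp_entry **p = &bins[hash_key(ipsrc, ipdst, sp, dp)];
    while (*p && !((*p)->ipsrc == ipsrc && (*p)->ipdst == ipdst &&
                   (*p)->srcport == sp && (*p)->dstport == dp))
        p = &(*p)->next;
    return p;
}

void insert_TCP(int IPsrc, int IPdst, int srcport, int dstport, void *endpoint)
{
    struct tcp_entry **p = find_slot(IPsrc, IPdst, srcport, dstport);
    assert(endpoint != NULL);   /* NULL is reserved for "not found" */
    if (*p == NULL) {
        struct tcp_entry *e = malloc(sizeof *e);
        if (e == NULL)
            return;             /* out of memory: table left unchanged */
        e->ipsrc = IPsrc; e->ipdst = IPdst;
        e->srcport = srcport; e->dstport = dstport;
        e->next = NULL;
        *p = e;
    }
    (*p)->endpoint = endpoint;  /* new key, or overwrite an existing one */
}

void delete_TCP(int IPsrc, int IPdst, int srcport, int dstport)
{
    struct tcp_entry **p = find_slot(IPsrc, IPdst, srcport, dstport);
    struct tcp_entry *e = *p;
    if (e != NULL) {
        *p = e->next;           /* unlink, then release */
        free(e);
    }
}

void *lookup_TCP(int IPsrc, int IPdst, int srcport, int dstport)
{
    struct tcp_entry *e = *find_slot(IPsrc, IPdst, srcport, dstport);
    return e ? e->endpoint : NULL;
}
```

With a fixed number of bins, the average list length grows with the number of installed keys, so lookup cost rises gradually unless the table is resized.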

The use of demultiplexing parameters is similar to the idea of identifying a set of demultiplexing constraints presented earlier, but that technique relied on the presence of similar constraints in a number of types given statically at compile time. Here we are interested in generating parameter sets even for types that have no derived types at compile time, so we rely on the use of user pragmas in the type specification to identify parameterized types and their demultiplexing keys. Refinements of parameterized types can only be of certain kinds, essentially only different values of the demultiplexing key. These refinements can be installed in the system only through a specific interface at run time. Furthermore, no additional refinements are permitted. The compiler generates functions, with formal parameters for the demultiplexing key, to install or remove refinements of parameterized types. For the case of TCP packets, prototypes of such functions are the following:

    void insert_TCP(int IPsrc, int IPdst, int srcport,
                    int dstport, void *endpoint);

    void delete_TCP(int IPsrc, int IPdst, int srcport, int dstport);

The functions insert and delete allow the user to install and remove endpoints. Another function returns a previously installed endpoint corresponding to the demultiplexing key, or NULL if none is found. For TCP packets, it has the following prototype:

    void *lookup_TCP(int IPsrc, int IPdst, int srcport,
                     int dstport);

Typically, lookup is called only from inside the compiler-generated matching code.

At run time, a hash table for each parameterized type maintains a map of the demultiplexing key and the endpoint inserted for that key. Thus, the interface functions mentioned above map directly to insert, delete, and lookup functions of a hash table. The hash table approach is essentially the same as adopted in previous packet filter work [24, 10].

The traversal optimization presented in Section 4.4 and the run-time technique presented here are more closely related than they first appear to be. In Section 4.4, we used a C switch statement whenever a suitable demultiplexing key could be identified. A compiler would (in most cases) implement this switch statement by using a "jump" table of code addresses. If the cases of this switch statement were known only at run time, we could maintain a similar table ourselves, without compiler support. Furthermore, since the size or distribution of the cases is unknown, a hash table is a good choice for implementing this table. This is essentially the technique that we have adopted here.

In the remainder of this subsection, we describe the runtime performance of demultiplexing TCP sessions. Our experimental setup is the same as in the case of the stub generator. The workload that we measured is a set of four simulated TCP packets, which differ only in the TCP destination port number in Type3 above. For three of the four packets, we installed a filter that matches the respective packets. We also inserted endpoints for other types, but these types did not occur in the workload. However, the additional types influence the performance of hash table lookups, which is a more realistic scenario.

We timed a test run that simulates 10 million arrivals of a sequence of the four packets mentioned above. We performed this test with various numbers of filters installed, from 10 to 200 (of which only 3 filters match). All runs were conducted by a single user-level process. A realistic deployment of such a system would require the standard packet filter machinery of inserting and deleting filters from an operating system kernel, which was not our focus here.

[Figure 5 plot: time in nanoseconds (axis from 150 to 500) against the number of filters (axis from 0 to 200).]

Figure 5: Execution Times of the Packet Filter

Figure 5 shows the time in nanoseconds needed to determine an endpoint (if any) for each packet. As is evident from the figure, the performance of our system scales very well upon increasing the number of installed filters. The steps in the figure denote the increase in the average lengths of the linked list for each hash table bin (this could be reduced by resizing the hash table periodically). Better scalability could be achieved with an implementation of more advanced lookup and insertion algorithms [7].

The absolute time that we obtain per packet, about 350 nanoseconds, is very low compared to the per-packet cost of running virtual-machine-based packet classifiers (e.g., MPF, which costs several microseconds per packet) (see footnote 10). In our case, we run compiled C code to perform the match, so there is no interpretation overhead. On the other hand, our system would incur a higher expense when installing completely new parameterized types in the system, because we need to rerun the packet specification compiler and the C compiler (unless dynamic code generation is used). In a virtual-machine-based classifier, new filters can simply be downloaded in the form of bytecode into the kernel with a system call.

Footnote 10: If optional bounds checking on fields and overlaid structures is enabled, the times go up by about 50%, to approximately 500 nanoseconds per packet.

6 Limitations of PacketTypes

We have found PacketTypes quite adequate to express packet formats for typical ISO layer 3 and layer 4 protocols. However, sometimes there are consistency conditions on a packet that are at a semantic level beyond the scope of PacketTypes. One example is checking the value of a checksum field for whether it indeed corresponds to a standards-based checksum value for the packet. Another example is being able to respect ordering or uniqueness conditions in options, such as a hypothetical condition that there
is at most one appearance of the timestamp option in an IP packet. This situation is analogous to that in compilers: the yacc specification of a grammar cannot check for certain semantic conditions, which must be checked explicitly in a subsequent phase. In future work we will investigate adding universal and existential quantifiers to support more general conditions over array types.

Link-layer protocols sometimes have additional characteristics, such as character escaping in PPP [22], that are not considered in PacketTypes. Application-layer protocols are often ASCII-text based, and PacketTypes does not offer specific support to handle them. Another problem in expressing higher-layer formats all the way to link layer (using :> and overlay) is that we may need to convert from a datagram-oriented view of packets to a stream-oriented view, or we may need to re-assemble fragments of a lower-layer packet. These capabilities are beyond the scope of PacketTypes.

In certain protocols, the grammar of packets can essentially change dynamically depending on previously seen data. An example is the use of different sets of "information elements" that may change as a packet is interpreted. Similarly, ASN.1 packets can signal different encoding schemes to be used for the remainder of a packet, depending on values of certain fields. One approach is to keep a limited amount of state at run time, and identify certain type definitions as being applicable only in a certain state.

Another limitation of the language is that, while the user need not be aware of the endianness of his own machine, all specifications (in particular, the interpretation we have assigned to #value) assume that integers are represented as unsigned quantities in network byte order. This could easily be changed to any fixed byte ordering, but the expression

not support relational operators on arithmetic quantities. (b) CSN.1 is geared primarily for describing formats and for generating conforming packets for testing. It does not have a notion of views for extracting data from a packet. (c) PacketTypes pays significant attention to efficiency of the generated matching code, and makes a number of assumptions to be able to do so. (d) PacketTypes tries to provide a user with a syntax similar to C structs, whereas CSN.1's syntax is close to pure regular expression and grammar descriptions, which sometimes appear overly terse.

Packet Filters. Packet filtering systems also provide some way of specifying a packet format, which could be considered a type definition. Examples of packet filters include CSPF [17], BPF [16], BPF+ [4], MPF [24], DPF [10] and PathFinder [2]. Except for PathFinder and BPF+, filters are specified as low-level bytecode. Work by Jayaram and Cytron [13] used context-free grammars to specify filters in an attempt to gain composability and allow specification of variable-length fields. The generality of their approach forced them to use LR parsers to create recognizers for packets. Furthermore, since their language is a string of bits, semantic relationships between fields must be broken down to allowable strings of bits, which leads to hard-to-read specifications.

Data-structure Description Languages. The need for describing a data structure has arisen in many contexts in the past, but most importantly in distributed computation, such as remote procedure calls and component-based programming. The USC stub compiler [20] employs very precise descriptions of structure layouts, but lacks the ability to specify variable-length fields. Other format description languages such as ASN.1 [8] or XDR [23] allow for very flexible descriptions of structure, but do not provide control over layout adequate to be used for specification of arbitrary packet

of proto cols with variable byte orders, such as CORBA's

formats. Stub compiler languages such as CORBA IDL [19 ]

GIOP [19 ], which allows the byte order to change even within

or Flick[9]have similar limitations.

a single message, would require some mechanism for express-

Other. The idea of creating subtyp es by adding constraints ing this suchasa new #byteorder attribute. Similarly, rep-

on a typ e b ears resemblance to the work by Liskov and resentation of signed quantities could b e accommo dated by

Wing [15 ]. p erhaps adding #svalue attributes to stand for signed in-

terpretation of values.

Finally, PacketTypes do es not haveany primitives to ex-

8 Conclusion

press error conditions; either a packet matches or it fails.

Sometimes it is desirable for a packet sp eci cation to toler-

Interesting programming language concepts, suchastyp es,

ate certain kinds of errors. One approach mightbetoindi-

often go unused in systems software, b ecause of the deeply

cate the closest or \deep est" match and then indicate which

entrenched practice of writing low-level C co de. This leads

constraint failed to hold. Weintend to add such features in

to co de that is hard to write, maintain, and debug. This pa-

the future.

p er shows that even within this traditional co ding practice,

the idea of typ es has muchtoo er.

7 Related Work

Acknowledgments

CSN.1. PacketTypes is most closely related to the CSN.1

language [18 ], develop ed within the Europ ean Telecommu-

We thank the reviewers for their insightful comments on

nications Standards Institute. Similarly to PacketTypes ,

this work, and sp ecially our shepherd for his guidance in

CSN.1 sp eci es bit-strings that corresp ond to packet for-

preparing the nal version of this pap er. A previous version

mats by using regular expression and grammar constructs.

of this language was presented at the Workshop on Compiler

It also allows constructs similar to length and value at-

Supp ort for Systems Software, in May 1999.

tributes of PacketTypes . CSN.1 provides some desirable

features lacking in PacketTypes; for example, it directly

References

supp orts error handling by sp ecifying the error case bit-

strings along with the regular case bit-strings.

[1] Mark B. Abb ott and Larry L. Peterson. A language-based

PacketTypes di ers from CSN.1 in signi cantways. (a)

approach to proto col implementation. ACM Transactions

PacketTypes provides a more general constraint language

on Networking, 1(1):4{19, February 1993.

than CSN.1. For example, the CSN.1 core sp eci cation do es


[2] Mary L. Bailey, Burra Gopal, Michael A. Pagels, Larry L. Peterson, and Prasenjit Sarkar. PathFinder: A pattern-based packet classifier. In Proceedings of the First Symposium on Operating Systems Design and Implementation. USENIX Association, November 1994.

[3] Anindya Basu, Mark Hayden, Greg Morrisett, and Thorsten von Eicken. A language-based approach to protocol construction. In Proceedings of the ACM SIGPLAN Workshop on Domain Specific Languages (WDSL), Paris, France, January 1997.

[4] Andrew Begel, Steven McCanne, and Susan L. Graham. BPF+: Exploiting global data-flow optimization in a generalized packet filter architecture. Computer Communication Review, 29(4):123-34, October 1999.

[5] Gérard Berry and Georges Gonthier. The ESTEREL synchronous programming language: Design, semantics, implementation. Technical Report 842, École Nationale Supérieure des Mines de Paris, 1988.

[6] Edoardo Biagioni, Robert Harper, and Peter Lee. Signatures for a network protocol stack: A systems application of Standard ML. In Lisp and Functional Programming, 1994.

[7] Milind M. Buddhikot, Subash Suri, and Marcel Waldvogel. Space decomposition techniques for fast layer-4 switching. In Proceedings of the Sixth International Workshop on Protocols for High Speed Networks, pages 25-41. Kluwer Academic Publishers, August 1999.

[8] CCITT. Recommendation X.208: Specification of Abstract Syntax Notation One (ASN.1), 1988.

[9] Eric Eide, Kevin Frei, Bryan Ford, Jay Lepreau, and Gary Lindstrom. Flick: A flexible, optimizing IDL compiler. In Proceedings of PLDI '97, pages 44-56, 1997.

[10] Dawson R. Engler and M. Frans Kaashoek. DPF: Fast, flexible message demultiplexing using dynamic code generation. In Proceedings of ACM SIGCOMM '96 Conference on Applications, Technologies, Architectures and Protocols for Computer Communication, 1996.

[11] Norman C. Hutchinson and Larry L. Peterson. The x-Kernel: An architecture for implementing network protocols. Transactions on Software Engineering, 17(1):64-76, January 1991.

[12] International Telecommunication Union. Recommendation Q.931 - ISDN user-network interface layer 3 specification for basic call control, May 1998.

[13] Mahesh Jayaram and Ron K. Cytron. Efficient demultiplexing of network packets by automatic parsing. In Proceedings of the Workshop on Compiler Support for System Software (WCSSS), 1996.

[14] Eddie Kohler, M. Frans Kaashoek, and David R. Montgomery. A readable TCP in the Prolac protocol language. Computer Communication Review, 29(4):3-13, October 1999.

[15] Barbara H. Liskov and Jeannette M. Wing. A behavioral notion of subtyping. ACM Transactions on Programming Languages and Systems, 16(4), November 1994.

[16] Steven McCanne and Van Jacobson. The BSD packet filter: A new architecture for user-level packet capture. In 1993 Winter USENIX, pages 259-269, January 1993.

[17] Jeffrey C. Mogul, Richard F. Rashid, and Michael J. Accetta. The packet filter: An efficient mechanism for user-level network code. In Proceedings of the 11th ACM Symposium on Operating Systems Principles, pages 39-51, November 1987.

[18] Michael Mouly. CSN.1 Specification, Version 2. Cell & Sys, 1998.

[19] Object Management Group. CORBA/IIOP 2.2 Specification, 1998.

[20] Sean W. O'Malley, Todd A. Proebsting, and Allen B. Montz. USC: A universal stub compiler. In Proceedings of the SIGCOMM '94 Symposium, October 1994.

[21] J. Postel. RFC 791: Internet Protocol, September 1981. Obsoletes RFC0760. See also STD0005. Status: STANDARD.

[22] W. Simpson. RFC 1662: PPP in HDLC-like framing, July 1994. See also STD0051. Obsoletes RFC1549. Status: STANDARD.

[23] R. Srinivasan. RFC 1832: XDR: External data representation standard, August 1995. Status: DRAFT STANDARD.

[24] Masanobu Yuhara, Brian N. Bershad, Chris Maeda, and J. Eliot B. Moss. Efficient packet demultiplexing for multiple endpoints and large messages. In Proceedings of the 1994 Winter USENIX Conference, pages 153-165, January 1994.

Appendix

definition-list :
    [ definition ]*

definition :
    id := structure
    id :> structure
    id |= structure

structure :
    singleton-type ;
    { [ member ]* }
    { [ member ]* } where constraints

singleton-type :
    id
    id [ integer-constant ]
    id []

member :
    id id ;
    id id [ integer-constant ];
    id id [];

constraints :
    { constraint-list }
    singleton-constraint

constraint-list :
    [ singleton-constraint ]*

singleton-constraint :
    boolexpr ;

boolexpr :
    expr = expr
    expr > expr
    expr < expr
    expr >= expr
    expr <= expr
    expr != expr
    expr @ id
    boolexpr || boolexpr
    boolexpr && boolexpr

expr :
    integer-constant
    id [ [ . id ] | [ [ expr ] ] ]* # id
    expr + expr
    expr * expr
    expr / expr
    expr - expr
    ( expr )

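Section 6 states that #value interprets a field as an unsigned quantity in network byte order, and suggests that a #svalue attribute could stand for a signed interpretation. The following C sketch shows one way these two readings might differ for a field of up to 32 bits located at an arbitrary bit offset. It assumes that bits are numbered from the most significant bit of each byte and that signed fields use two's complement. The function names pt_value and pt_svalue are hypothetical and do not come from the PacketTypes compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; names and signatures are illustrative only. */

/* Reads nbits (1..32) starting at bit offset bitoff, most significant
 * bit first, which matches network byte order for whole-byte fields. */
uint32_t pt_value(const uint8_t *buf, size_t bitoff, unsigned nbits) {
    assert(buf != NULL && nbits >= 1 && nbits <= 32);
    uint32_t v = 0;
    for (unsigned i = 0; i < nbits; i++) {
        size_t bit = bitoff + i;
        unsigned b = (buf[bit / 8] >> (7 - bit % 8)) & 1u;
        v = (v << 1) | b;
    }
    return v;
}

/* Reads the same field as a two's-complement signed quantity (assumed
 * encoding) by subtracting 2^nbits when the field's top bit is set. */
int32_t pt_svalue(const uint8_t *buf, size_t bitoff, unsigned nbits) {
    uint32_t v = pt_value(buf, bitoff, nbits);
    if (v & (UINT32_C(1) << (nbits - 1)))
        return (int32_t)((int64_t)v - (INT64_C(1) << nbits));
    return (int32_t)v;
}
```

For a buffer beginning with the bytes 0x45 0x00 0xFF 0xFE, pt_value(buf, 0, 4) yields 4 and pt_value(buf, 4, 4) yields 5, while the 16-bit field at bit offset 16 reads as 65534 through pt_value and as -2 through pt_svalue. Because the helpers assemble the result bit by bit, the caller does not need to know the endianness of the host machine.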