Algorithms for Packet Classification
Pankaj Gupta and Nick McKeown, Stanford University
IEEE Network • March/April 2001. 0890-8044/01/$10.00 © 2001 IEEE

Abstract

The process of categorizing packets into "flows" in an Internet router is called packet classification. All packets belonging to the same flow obey a predefined rule and are processed in a similar manner by the router. For example, all packets with the same source and destination IP addresses may be defined to form a flow. Packet classification is needed for non-best-effort services, such as firewalls and quality of service: services that require the capability to distinguish and isolate traffic in different flows for suitable processing. In general, packet classification on multiple fields is a difficult problem. Hence, researchers have proposed a variety of algorithms which, broadly speaking, can be categorized as basic search algorithms, geometric algorithms, heuristic algorithms, or hardware-specific search algorithms. In this tutorial we describe algorithms that are representative of each category, and discuss which type of algorithm might be suitable for different applications.

Until recently, Internet routers provided only best-effort service, servicing packets in a first-come-first-served manner. Routers are now called on to provide different qualities of service to different applications, which means routers need new mechanisms such as admission control, resource reservation, per-flow queuing, and fair scheduling. All of these mechanisms require the router to distinguish packets belonging to different flows.

Flows are specified by rules applied to incoming packets. We call a collection of rules a classifier. Each rule specifies a flow to which a packet may belong based on some criteria applied to the packet header (shown in Fig. 1). To illustrate the variety of classifiers, consider some examples of how packet classification can be used by an Internet service provider (ISP) to provide different services. Figure 2 shows ISP1 connected to three different sites: enterprise networks E1 and E2, and a network access point¹ (NAP), which is in turn connected to ISP2 and ISP3. ISP1 provides a number of different services to its customers, as shown in Table 1.

¹ A network access point is a network site which acts as an exchange point for Internet traffic. ISPs connect to the NAP to exchange traffic with other ISPs.

| Service | Example |
|---|---|
| Packet filtering | Deny all traffic from ISP3 (on interface X) destined to E2. |
| Policy routing | Send all voice-over-IP traffic arriving from E1 (on interface Y) and destined to E2 via a separate ATM network. |
| Accounting and billing | Treat all video traffic to E1 (via interface Y) as highest priority and perform accounting for the traffic sent this way. |
| Traffic rate limiting | Ensure that ISP2 does not inject more than 10 Mb/s of e-mail traffic and 50 Mb/s of total traffic on interface X. |
| Traffic shaping | Ensure that no more than 50 Mb/s of Web traffic is injected into ISP2 on interface X. |

Table 1. Customer services provided by ISP1.

Table 2 shows the flows into which an incoming packet must be classified by the router at interface X. Note that the flows specified may or may not be mutually exclusive. For example, the first and second flows in Table 2 overlap. This is common in practice, and when no explicit priorities are specified, we follow the convention that rules closer to the top of the list take priority.

| Flow | Relevant packet fields |
|---|---|
| Email and from ISP2 | Source link-layer address, source transport port number |
| From ISP2 | Source link-layer address |
| From ISP3 and going to E2 | Source link-layer address, destination network-layer address |

Table 2. Flows into which an incoming packet must be classified by the router at interface X.

| Field | L4-SP | L4-DP | L4-PROT | L3-SA | L3-DA | L3-PROT | L2-SA | L2-DA | Payload |
|---|---|---|---|---|---|---|---|---|---|
| Width | 16b | 16b | 8b | 32b | 32b | 8b | 48b | 48b | |

The fields belong to the transport layer, network layer, and link layer headers. L2: Layer 2 (e.g., Ethernet); L3: Layer 3 (e.g., IP); L4: Layer 4 (e.g., TCP); DA: Destination address; SA: Source address; PROT: Protocol; SP: Source port; DP: Destination port.

Figure 1. This figure shows some of the header fields (and their widths) that might be used for classifying the packet. Although not shown in this figure, higher-layer (e.g., application-level) headers may also be used.

Problem Statement

Each rule of a classifier has d components. R[i] is the ith component of rule R, and is a regular expression on the ith field of the packet header. A packet P is said to match rule R if, $\forall i$, the ith field of the header of P satisfies the regular expression R[i]. In practice, a rule component is not a general regular expression but is often limited by syntax to a simple address/mask or operator/number(s) specification. In an address/mask specification, a 0 (1) at bit position x in the mask denotes that the corresponding bit in the address is a don't care (significant) bit. Examples of operator/number(s) specifications are eq 1232 and range 34-9339. Note that a prefix can be specified as an address/mask pair where the mask is contiguous (i.e., all bits with value 1 appear to the left of bits with value 0 in the mask). It can also be specified as a range of width equal to $2^t$, where $t = 32 - \text{prefixlength}$. Most commonly occurring specifications can be represented by ranges.

An example real-life classifier in four dimensions is shown in Table 3. By convention, the first rule R1 is of highest priority, and rule R7 is of lowest priority. Some example classification results are shown in Table 4.

Longest prefix matching for routing lookups is a special case of one-dimensional packet classification. All packets destined to the set of addresses described by a common prefix may be considered to be part of the same flow. The address of the next hop to which the packet should be forwarded is the associated action.
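The matching rule from the Problem Statement can be illustrated with a short sketch. It is only a linear scan over rules in priority order, written to make the definitions concrete, and is not one of the algorithms this tutorial surveys. The rule rows are the ones listed in Table 3 and the packets are P1 and P2 from Table 4; the helper names (`ip`, `addr_mask`, `prefix_to_range`, `classify`) and the mapping of the symbolic port `www` to 80 are assumptions made for this example.

```python
import ipaddress

def ip(s):
    """Dotted-quad string -> 32-bit integer."""
    return int(ipaddress.IPv4Address(s))

def addr_mask(spec):
    """Matcher for 'address/mask': mask bit 1 = significant, 0 = don't care."""
    addr, mask = (ip(part) for part in spec.split("/"))
    return lambda value: (value & mask) == (addr & mask)

def prefix_to_range(addr, prefix_len):
    """A prefix covers a range of width 2**t, where t = 32 - prefix_len."""
    t = 32 - prefix_len
    low = (ip(addr) >> t) << t
    return low, low + (1 << t) - 1

ANY = lambda value: True   # wildcard component '*'
WWW = 80                   # assumed numeric value of the symbolic port 'www'

# (name, [R[1], R[2], R[3], R[4]], action), highest priority first.
RULES = [
    ("R1", [addr_mask("152.163.190.69/255.255.255.255"),
            addr_mask("152.163.80.11/255.255.255.255"), ANY, ANY], "deny"),
    ("R2", [addr_mask("152.168.3.0/255.255.255.0"),
            addr_mask("152.163.200.157/255.255.255.255"),
            lambda port: port == WWW, lambda proto: proto == "udp"], "deny"),
    ("R5", [addr_mask("152.163.198.4/255.255.255.255"),
            addr_mask("152.163.160.0/255.255.252.0"),
            lambda port: port > 1023, lambda proto: proto == "tcp"], "permit"),
    ("R6", [addr_mask("0.0.0.0/0.0.0.0"),
            addr_mask("0.0.0.0/0.0.0.0"), ANY, ANY], "permit"),
]

def classify(packet, rules=RULES):
    """Return (name, action) of the first rule whose every component matches."""
    for name, components, action in rules:
        if all(match(field) for match, field in zip(components, packet)):
            return name, action
    return None

# Packet fields: (destination address, source address, destination port, protocol)
p1 = (ip("152.163.190.69"), ip("152.163.80.11"), WWW, "tcp")
p2 = (ip("152.168.3.21"), ip("152.163.200.157"), WWW, "udp")
print(classify(p1))                       # first match is R1
print(classify(p2))                       # first match is R2
print(prefix_to_range("152.168.3.0", 24)) # a /24 prefix as a range of width 2**8
```

Because rules are scanned from the top of the list, overlapping rules resolve to the highest-priority match, which is the convention adopted above. The cost of the scan grows linearly with the number of rules, which is what motivates the faster data structures discussed in the rest of the tutorial.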
In longest prefix matching, the length of the prefix defines the priority of the rule.

Figure 2. An example network of an ISP (ISP1) connected to two enterprise networks (E1 and E2) and to two other ISP networks across a network access point (NAP).

| Rule | Network-layer destination (address/mask) | Network-layer source (address/mask) | Transport-layer destination | Transport-layer protocol | Action |
|---|---|---|---|---|---|
| R1 | 152.163.190.69/255.255.255.255 | 152.163.80.11/255.255.255.255 | * | * | Deny |
| R2 | 152.168.3.0/255.255.255.0 | 152.163.200.157/255.255.255.255 | eq www | udp | Deny |
| R5 | 152.163.198.4/255.255.255.255 | 152.163.160.0/255.255.252.0 | gt 1023 | tcp | Permit |
| R6 | 0.0.0.0/0.0.0.0 | 0.0.0.0/0.0.0.0 | * | * | Permit |

Table 3. A real-life classifier in four dimensions.

| Packet header | Network-layer destination | Network-layer source | Transport-layer destination | Transport-layer protocol | Best matching rule, action |
|---|---|---|---|---|---|
| P1 | 152.163.190.69 | 152.163.80.11 | www | tcp | R1, deny |
| P2 | 152.168.3.21 | 152.163.200.157 | www | udp | R2, deny |
| P3 | 152.168.198.4 | 152.163.160.10 | 1024 | tcp | R5, permit |

Table 4. Example classification results.

Performance Metrics for Classification Algorithms

• Search speed: Faster links require faster classification. For example, links running at 10 Gb/s can bring 31.25 million packets/s (assuming minimum-sized 40-byte TCP/IP packets, i.e., $10^{10} / (40 \times 8) = 31.25 \times 10^{6}$).
• Low storage requirements: Small storage requirements enable the use of fast memory technologies like static random access memory (SRAM). SRAM can be used as an on-chip cache by a software algorithm and as on-chip SRAM for a hardware algorithm.
• Ability to handle large real-life classifiers.
• Fast updates: As the classifier changes, the data structure needs to be updated. We can categorize data structures into those which can add or delete entries incrementally, and those which need to be reconstructed from scratch each time the classifier changes. When the data structure is reconstructed from scratch, we call it preprocessing. The update rate differs among different applications: a very low update rate may be sufficient in firewalls where entries are added manually or infrequently, whereas a router with per-flow queues may require very frequent updates.
• Scalability in the number of header fields used for classification.
• Flexibility in specification: A classification algorithm should support general rules, including prefixes, operators (range, less than, greater than, equal to, etc.), and wildcards. In some applications, noncontiguous masks may be required.

Classification Algorithms

Background

For the next few sections, we will use the example classifier in Table 5 repeatedly. The classifier has six rules in two fields labeled F1 and F2; each specification is a prefix of maximum length 3 bits. We will refer to the classifier as $C = \{R_j\}$ and each rule $R_j$ as a 2-tuple: $\langle R_{j1}, R_{j2} \rangle$.

Definition 1: Given a set of N disjoint ranges $G = \{G_i = [l_i, u_i]\}$ that form a partition of the number line $[0, 2^W - 1]$, i.e., $l_i$ and $u_i$ are such that $l_1 = 0$, $l_i \le u_i$, $l_{i+1} = u_i + 1$, $u_N = 2^W - 1$; the range lookup problem is to find the range $G_P$ (and any associated information) that contains an incoming point $P$.
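To make Definition 1 concrete, here is a minimal sketch of one simple way to answer a range lookup: binary search over the sorted lower endpoints $l_i$. This is an illustration only and not necessarily the method the tutorial goes on to describe. The class name `RangeLookup`, the `(l, u, info)` tuple layout, and the sample ranges for $W = 3$ are assumptions made for the example.

```python
from bisect import bisect_right

class RangeLookup:
    """Range lookup over a partition of [0, 2**W - 1] (sketch of Definition 1)."""

    def __init__(self, ranges, W):
        # ranges: list of (l_i, u_i, info) tuples sorted by l_i.
        self.max_point = 2 ** W - 1
        if not ranges or ranges[0][0] != 0 or ranges[-1][1] != self.max_point:
            raise ValueError("ranges must start at 0 and end at 2**W - 1")
        for low, high, _ in ranges:
            if low > high:
                raise ValueError("each range needs l_i <= u_i")
        for (_, high, _), (next_low, _, _) in zip(ranges, ranges[1:]):
            if next_low != high + 1:
                raise ValueError("ranges must satisfy l_{i+1} = u_i + 1")
        self.ranges = ranges
        self.lows = [low for low, _, _ in ranges]

    def find(self, point):
        """Return the (l, u, info) tuple of the range containing point."""
        if not 0 <= point <= self.max_point:
            raise ValueError("point outside the number line")
        # Index of the last lower endpoint that is <= point.
        index = bisect_right(self.lows, point) - 1
        return self.ranges[index]

# Hypothetical partition of [0, 7] (W = 3) with made-up labels.
lookup = RangeLookup([(0, 1, "A"), (2, 2, "B"), (3, 6, "C"), (7, 7, "D")], W=3)
print(lookup.find(5))
```

Each query takes O(log N) comparisons and the structure stores N endpoints. Because the ranges form a partition, the last lower endpoint not exceeding the point always identifies the containing range, so no upper-endpoint check is needed.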