
IPv6 Tunneling Over an IPv4 Network

James M. Moscola, David Lim, Alan Tetley

Department of Computer Science, Washington University, Campus Box 1045, One Brookings Drive, Saint Louis, MO 63130

December 13, 2001

Abstract

Due to the growth of the internet, the current address space provided by IPv4, with only 4,294,967,296 addresses, has proven to be inadequate. Because of IPv4's shortcomings, a new protocol, IPv6, has been created to take its place. This new protocol, using its 128-bit address scheme (that's 7x10^23 addresses per square meter of earth!), should provide enough addresses for everyone's computer, refrigerator, and toaster to have a connection to the internet. To help facilitate the movement from an IPv4 internet to an IPv6 internet, we have created a module for the Field Programmable Port Extender (FPX) in accordance with RFC 1933. This module allows IPv6 packets coming from an IPv6 network to be packed into IPv4 packets, tunneled through an IPv4 network, and then unpacked at the other end of the tunnel before reentering an IPv6 network. This approach to incorporating the new IPv6 specification allows a progressive changeover of networks from IPv4 to the newer IPv6. The current implementation runs at 80 MHz.

1 Introduction

Due to the growth of the internet, the current address space provided by IPv4, with only 4,294,967,296 addresses, has proven to be inadequate. A new protocol, IPv6 [1], has been developed and promises to facilitate the continual growth of the internet community. IPv6 is capable of offering 2^128 internet addresses, which amounts to approximately 340 trillion trillion trillion addresses (no, that is not a typo; it is truly 340 times one trillion cubed). There are several ways to make the transition from the current IPv4 internet implementation to the newer IPv6 internet implementation. The first option, and also the least likely to happen, is to choose a day and have everyone with network hardware and software change their implementation. This approach is highly unlikely to succeed, and some might say even impossible. Another approach is to have new hosts and routers support both IPv4 and IPv6. This is a much more reasonable approach but still has some problems. Consider the situation where an IPv6 host is sending data to another IPv6 host. The host cannot predetermine the route the data takes along the way and therefore cannot guarantee that all networks the data passes through will support IPv6. A third approach is to allow both IPv4 and IPv6 networks to reside on the internet and tunnel IPv6 packets through IPv4 networks [2]. In other words, when an IPv6 packet is leaving an IPv6 domain and entering an IPv4 domain, the packet is encapsulated in an IPv4 packet and transmitted through the network. When the packet reaches the other end of the IPv4 network, the IPv4 headers are removed and the IPv6 packet can continue on to an IPv6 domain. A module for the Field Programmable Port Extender (FPX) [3][4] has been created that implements the third method described above. The module contains support for both ends of the tunnel and can both pack and unpack IPv6 packets into and from IPv4 packets. Figure 1 shows the layout of the tunneling modules between IPv6 and IPv4 networks.
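The encapsulation step described above can be sketched in software. The following is an illustrative model only, not the hardware implementation; field values such as the TTL are arbitrary choices here, and the helper names are ours:

```python
import struct

IPPROTO_IPV6 = 41  # 0x29: the IPv4 payload is an encapsulated IPv6 packet

def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words over the IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold end-around carry
    return (~total) & 0xFFFF

def encapsulate(ipv6_packet: bytes, src_ip: bytes, dst_ip: bytes) -> bytes:
    """Wrap an IPv6 packet in a minimal 20-byte IPv4 header, RFC 1933 style."""
    total_len = 20 + len(ipv6_packet)
    header = struct.pack('!BBHHHBBH4s4s',
                         0x45,          # version 4, header length 5 words
                         0,             # type of service
                         total_len,     # IPv4 length includes the header
                         0, 0,          # identification, flags/fragment offset
                         64,            # TTL (illustrative value)
                         IPPROTO_IPV6,
                         0,             # checksum placeholder
                         src_ip, dst_ip)
    csum = ipv4_checksum(header)
    # Insert the real checksum into bytes 10-11 of the header.
    return header[:10] + struct.pack('!H', csum) + header[12:] + ipv6_packet
```

At the far end of the tunnel, stripping the first 20 bytes recovers the original IPv6 packet.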


Figure 1: Layout of Tunneling modules between IPv6 and IPv4 networks

2 Field Programmable Port Extender (FPX)

The FPX is a reprogrammable logic device that provides a hardware platform for the user to deploy network modules. It acts as an interface between the line cards and the WUGS (Washington University Gigabit Switch) [5], and can be inserted between these devices as shown in Figure 2. The FPX is composed of two FPGAs: the Network Interface Device (NID) and the Reprogrammable Application Device (RAD) [6].

2.1 Network Interface Device (NID)

The NID controls how packet flows are routed to and from modules. It also provides mechanisms to dynamically load hardware modules over the network. The combination of these features allows modules to be dynamically loaded and unloaded without affecting the switching of other traffic flows or the processing of packets by other modules in the system.

Figure 2: Configuration for the WUGS, FPX, and the Line Cards

As shown in Figure 3, the NID has several components, all of which are implemented in FPGA hardware. It contains a four-port switch to transfer data between ports; Virtual Circuit lookup tables (VC) on each port in order to selectively route flows; a Control Cell Processor (CCP), which is used to process control cells that are transmitted and received over the network; logic to reprogram the FPGA hardware on the RAD; and synchronous and asynchronous interfaces to the four network ports that surround the NID.


Figure 3: Major Components of the FPX

2.2 FPX Reprogrammability

The RAD can be programmed and reprogrammed to hold user-defined network modules, and is connected to two SRAM and two SDRAM components (Figure 3). In order to reprogram the RAD over the network, the NID implements a reliable protocol that fills the contents of the on-board RAM with configuration data transmitted over the network. As each cell arrives, the NID uses the data and the sequence number in the cell to write data into the RAD Program SRAM. Once the last cell has been correctly received, the FPX holds an image of the reconfiguration bytestream that is needed to reprogram the RAD. At that time, another control cell can be sent to the NID to initiate the reprogramming of the RAD using the contents of the RAD Program SRAM. The FPX supports partial reprogramming of the RAD by allowing configuration streams to contain commands that only program a portion of the logic on the RAD. Rather than issue a command to reinitialize the device, the NID writes the frames of reconfiguration data to the RAD's reprogramming port. This feature enables the other modules on the RAD to continue processing packets during the partial reconfiguration. Similar techniques have been implemented in other systems using software-based controllers [7] [8].

3 Protocol Wrappers

Protocol Wrappers [9] [10] are used in the IP tunneling module to streamline and simplify the networking functions that process ATM cells and AAL5 frames directly in hardware. They use a layered design and consist of different processing circuits within each layer. The block diagram of the Protocol Wrappers is shown in Figure 4. At the lowest level, the Cell Processor processes raw ATM cells between network interfaces. At the higher level, the Frame Processor processes variable-length AAL5 frames. Different layers of abstraction are important for structuring a network because doing so allows applications to be implemented at specific levels where important details may be exposed and irrelevant details may be hidden. In this manner, an application that interacts with AAL5 frames can effectively use the Protocol Wrappers.


Figure 4: Block Diagram of IP Tunneling module in the Protocol Wrappers.

4 Implementation Details

Several processing components have been combined with the Frame Wrappers to implement an IPv6 over IPv4 tunneler for the FPX. An overview of the design is shown in Figure 5. The Frame Wrappers process incoming ATM cells to provide the interior components with full AAL5 frames. These internal components then check each frame to see if they should process the packet or pass it on. The Control Cell Processor (CCP) sits on the back end of the Frame Wrappers and receives AAL0 control cells which are used to configure tunnels. It passes the information from these cells to the Address Lookup component, where it is stored. When the IPv6 component receives an IPv6 packet, it queries the Address Lookup to see if the packet should be packed. This is determined by checking whether the destination IP is part of a known subnet. If a match is found, the packet must be packed, since its next hop is part of an IPv4 network. Otherwise the packet is routed like a normal IPv6 packet. The other end of the tunnel is handled by the IPv4 component. This component processes incoming IPv4 packets and checks the destination IP and the IP protocol field to determine whether the packet should be unpacked, and unpacks it if necessary. If not, the packet is routed like a normal IPv4 packet. Although both ends of the tunnel have been implemented, both ends do not have to use this implementation. Since RFC 1933 [2] has been followed, either end of the tunnel can be any other router or host that also follows this specification.


Figure 5: Flow Diagram for IP Tunneling Module

4.1 IPv6 Processor

The general design of the IPv6 component is shown in Figure 6. As frames enter the IPv6 component, they are first buffered into a FIFO (1). This is necessary since the Address Lookup component can take an indeterminate and variable amount of time to respond to lookup requests. An FSM (2) has been implemented to take care of this task. The machine watches SOF, EOF, and DataEn to buffer all valid words of data. In the case where part of a frame is dropped, the FSM sees consecutive SOFs without an EOF and clears the entire FIFO. This ends up dropping any previous packets already in the buffer; however, this design decision greatly reduces logic complexity, and it handles a case that should rarely occur anyway. The FSM also keeps track of the frame length using the input control signals and places it in a FIFO (3). This is the only way to know the length of non-IPv6 packets. A second FSM (4) controls the output of data. It is responsible for generating output SOF, EOF, and DataEn signals, as well as appropriate data, depending on the type of packet being passed out. When the machine senses that there is data in the input buffer, it moves the ATM header into another FIFO (5) and checks whether the packet is an IPv6 packet that should be processed. If not, the data is output immediately from the input buffer. Otherwise, the IPv6 header is moved into the header buffer. During this process, the component decrements the Hop Limit by 1 and records the length. If the Hop Limit is reduced to 0, the packet is dropped. In addition, as the destination IP is being moved, the FSM passes it out to the Address Lookup component. As soon as the Address Lookup responds, the FSM starts outputting data, starting with the ATM header. If a match was found, the component must pack this IPv6 packet into an IPv4 packet. Packing is done by inserting a valid IPv4 header (6) into the output before the IPv6 header and payload. The makeup of this header is shown in Figure 6.
The length is calculated by adding the payload length of the IPv6 packet to a constant 60 bytes for the IPv4 and IPv6 headers (the IPv4 length field includes the header, whereas the IPv6 length field does not). The source IP is the address of the FPX module, and the destination IP is the IPv4 address returned by Address Lookup. The checksum is calculated over the whole IPv4 header, including these values; however, it must be output in the third word. To achieve this, the checksum of the first three words and the source IP is calculated while the IPv6 header is being moved between the FIFOs. This is possible since all of these values are known except the length, which can be determined in the third word of the IPv6 header. The checksum is then finished by adding in the destination address as soon as it is output from the Address Lookup, in parallel with outputting the first word of the IPv4 header. Once the IPv4 header is output, the process is the same for both packed packets and IPv6 packets just being passed through. First, the IPv6 header is output from its FIFO. Then, the payload is output from the input buffer. The payload length determines the amount of data read, rather than just emptying the FIFO, because there could be another packet in the buffer waiting to be processed.
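The reason the checksum can be finished late is that one's-complement addition is associative and commutative, so the destination address can be folded in after a partial sum over the already-known fields. A sketch of that ordering, using hypothetical 16-bit header words (the checksum field itself is zero during the sum):

```python
def oc_add(a: int, b: int) -> int:
    """16-bit one's-complement addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)

def checksum(words):
    """IPv4 header checksum: complement of the one's-complement sum."""
    total = 0
    for w in words:
        total = oc_add(total, w)
    return (~total) & 0xFFFF

# Hypothetical header words known early (version/length, id, TTL/protocol,
# source IP), plus a destination IP that arrives late from Address Lookup.
known_words = [0x4500, 0x003C, 0x0000, 0x0000, 0x4029, 0x0A00, 0x0001]
dest_words  = [0x0A00, 0x0002]

# Accumulating the known fields first and folding the destination in last
# gives the same checksum as a single pass over the whole header.
early = 0
for w in known_words + dest_words:
    early = oc_add(early, w)
assert (~early) & 0xFFFF == checksum(known_words + dest_words)
```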


Figure 6: IPv6 Component Design

4.2 IPv4 Processor

The IPv4 Processor resides on the backside of the IPv6 Processor in the current implementation (Figure 5). Frames can enter the IPv4 Processor as IPv4 frames, IPv6 frames, or any other type of data that may be passing through the switch. The first thing the IPv4 Processor does when receiving data is check the version and the IP header length to decide if the frame is IPv4. If the frame is not IPv4, all data passes through the module without modification. Otherwise, a series of actions is taken; these actions can be followed in Figure 7. First, the time-to-live (TTL) field is checked for validity. If the TTL field is equal to zero, the packet's lifetime has expired and the packet is dropped. If the TTL field is not zero, it is decremented and a new IPv4 header checksum is calculated for the checksum field. The IPv4 header checksum itself is validated upon receiving an IPv4 packet; if the header checksum is invalid, the packet is dropped. Following this, the IPv4 Processor checks both the protocol field and the destination address field of the packet. If the protocol field is not equal to 0x29 (the encapsulated protocol is IPv6) or the destination address of the IPv4 packet is not equal to the address of the switch the module resides on, the rest of the IPv4 packet is sent to the Frame Processor without modification. However, if both the protocol field and the destination address match, the packet needs to be unpacked. To unpack the IPv6 packet from the IPv4 packet, the IPv4 headers are simply removed and the IPv6 hop limit is decremented. The IPv6 packet is then sent to the Frame Processor with no further modification.
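This decision sequence can be modeled in software. The sketch below assumes a header with no options and uses illustrative names, not the hardware's actual signals:

```python
def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum; returns 0 for a header with a valid checksum."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
        total = (total & 0xFFFF) + (total >> 16)
    return (~total) & 0xFFFF

def process_ipv4(packet: bytes, local_ip: bytes):
    """Return (action, payload) mimicking the IPv4 Processor's decisions."""
    version, ihl = packet[0] >> 4, packet[0] & 0x0F
    if version != 4 or ihl < 5:
        return ('forward', packet)        # not IPv4: pass through untouched
    if ipv4_checksum(packet[:ihl * 4]) != 0:
        return ('drop', None)             # invalid header checksum
    if packet[8] == 0:
        return ('drop', None)             # TTL expired
    if packet[9] == 0x29 and packet[16:20] == local_ip:
        # Tunnel endpoint: strip the IPv4 header, decrement the IPv6 hop limit.
        inner = bytearray(packet[ihl * 4:])
        if inner[7] == 0:
            return ('drop', None)
        inner[7] -= 1                     # hop limit is byte 7 of the IPv6 header
        return ('unpack', bytes(inner))
    return ('route', packet)              # ordinary IPv4 traffic
```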


Figure 7: IPv4 State Machine

4.3 Control Cell Processor

The Control Cell Processor (CCP) is responsible for two things: 1) adding (up to four) IPv4 tunnels that will later be used by the IPv6 processor for packing incoming IPv6 packets, and 2) updating the FPX IP address, which the IPv4 processor compares against the destination addresses of incoming IPv4 packets to see whether our module is the end of a tunnel. The CCP is the first module that gets an incoming cell. Therefore it is responsible for checking whether the cell is a control cell (VCI=35). If the incoming cell is not a control cell, the CCP just passes the cell through so that the IPv6 processor and the IPv4 processor will get it. If the incoming cell is a control cell, the CCP checks the opcode of the incoming cell. When it sees opcode 0x10, the CCP adds an IPv4 tunnel; when it sees opcode 0x12, the CCP updates the FPX IP address. If the CCP sees any other opcode, it simply passes the cell through. When the CCP sees opcode 0x10, it saves the incoming subnet, mask, and destination address for an IPv4 tunnel. For now, everything is stored in registers: the CCP enables 32-bit registers at the right time, latching first the subnet, then the mask, then the destination address of the IPv4 tunnel. Note that the subnet and the mask are both 128 bits long; they are therefore latched on four consecutive clocks, i.e., first the highest 32 bits of the subnet, then the next 32 bits, and so on. When the CCP sees opcode 0x12, it saves the incoming IP address by enabling the FPX IP address register. The finite state machine for the CCP is shown in Figure 8.
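The CCP's opcode dispatch might be modeled as follows. The word layout of the control cell payload is an assumption here, inferred from the latching order described above (four subnet words, four mask words, then the IPv4 destination); it is not taken from the cell format specification:

```python
class ControlCellProcessor:
    """Software sketch of the CCP's opcode handling (0x10 and 0x12)."""

    def __init__(self):
        self.tunnels = []   # up to four (subnet, mask, ipv4_dest) entries
        self.fpx_ip = 0     # compared against incoming IPv4 destinations

    def handle(self, opcode: int, words: list):
        if opcode == 0x10:
            # Add a tunnel: assemble the 128-bit subnet and mask from
            # four 32-bit words each, highest word first.
            subnet = (words[0] << 96) | (words[1] << 64) | (words[2] << 32) | words[3]
            mask   = (words[4] << 96) | (words[5] << 64) | (words[6] << 32) | words[7]
            if len(self.tunnels) < 4:
                self.tunnels.append((subnet, mask, words[8]))
        elif opcode == 0x12:
            # Update the FPX IPv4 address.
            self.fpx_ip = words[0]
        # Any other opcode: the cell is passed through unmodified.
```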

4.4 Lookup Engine

The address lookup module is responsible for returning the IPv4 address for the end of an IPv4 tunnel when given an IPv6 address by the IPv6 processor. It does so by going through each available tunnel. For each tunnel, the address lookup module first masks the IPv6 address sent to it with the IPv6 address mask for that tunnel, then compares the masked address with the subnet for that tunnel; if there is a match, it returns the tunnel's IPv4 address.


Figure 8: Finite state machine for Control Cell Processor

The address lookup module sits in idle until it gets an address request from the IPv6 processor; it then proceeds to latch in the 128-bit IPv6 address. Note that this takes four clock cycles because the address arrives 32 bits at a time. After it has latched the IPv6 address, the address lookup module proceeds to mask and then compare the IPv6 address against the available subnets. Note that this could have been done in one clock cycle; however, in an attempt to meet a 100 MHz clock rate, the masking is done in the first clock, followed by a two-cycle compare. If there is a match, the address lookup simply returns the corresponding IPv4 address to the IPv6 processor. If there is no match, the address lookup moves on to the next tunnel. The finite state machine of the address lookup is shown in Figure 9.
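The mask-then-compare loop can be written as a simple linear scan. The hardware spreads the masking and comparison over multiple clock cycles to meet timing, but the logic is the same; addresses here are plain 128-bit integers:

```python
def lookup(ipv6_addr: int, tunnels):
    """Linear scan over configured tunnels: mask the incoming IPv6 address,
    compare it to the tunnel's subnet, and return the tunnel's IPv4 endpoint
    on the first match."""
    for subnet, mask, ipv4_dest in tunnels:
        if ipv6_addr & mask == subnet:
            return ipv4_dest
    return None  # no match: route as an ordinary IPv6 packet
```

With at most four tunnels, a linear scan is adequate; a larger routing table would call for the real lookup engine discussed in Section 6.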

5 Results

The following sections present both the simulation and the synthesis results.

5.1 Simulation Results

A simulation testbench has been set up to test the functionality of our tunneling module. For creating ATM cells, we use the IPTestBench; details on using the IPTestBench can be found in Section 5 of the paper entitled Layered Protocol Wrappers for Internet Packet Processing in Reconfigurable Hardware [9]. ModelSim was used to send these cells through our module. Below are several output waveforms that show the module running in simulation. The waveforms in Figure 10 show several things. The first thing that happens is a control cell comes in to set the local IP address for the switch. In this example the IP address for the switch is set to 0xADD0ADD0, or 173.208.173.208. The next two incoming cells contain an IPv6 packet. Because there are currently no


Figure 9: Finite state machine for Address Lookup

entries in the lookup tables, the IPv6 lookup fails and the packet comes out without being packed. Following this, another control cell is sent in to add an entry to the lookup table. Then the same IPv6 packet from before is sent through the module again. However, this time the lookup succeeds and the IPv6 packet is encapsulated in an IPv4 packet. With the addition of the five IPv4 header words, the outgoing packet now consists of three ATM cells.

Figure 10: An IPv6 Packet passing through the module before and after the destination address has been added as a route to the lookup table

The next waveform, Figure 11, shows the output of the IPv6 module after it has encapsulated an IPv6 packet into an IPv4 packet. Notice the IPv6 header is still intact as part of the payload of the IPv4 packet. The destination address for this new IPv4 packet has been decided using the lookup tables and inserted into the packet. In this example, the lookup table has returned a value of 0x7F000001, or 127.0.0.1. In a real environment the destination address would always be a valid internet address and not the localhost; however, for simulation we chose this value.

Figure 11: A closeup of an IPv6 packet encapsulated in an IPv4 packet

The final waveform, Figure 12, shows the following sequence of events. The first cell that arrives at the module is a control cell to set the local IP address for the switch. In this example the IP address for the switch is set to 0x7F000001, or 127.0.0.1. Notice once again that the localhost address would not be used in a real environment. Following the control cell, three ATM cells containing an IPv4 packet arrive at the module. Because the destination address of the IPv4 packet is equal to the IP address of the switch (currently set to 127.0.0.1) and the protocol field is equal to 0x29, the module decides it needs to unpack the data from the IPv4 packet. Notice the incoming IPv4 packet consists of three ATM cells while the outgoing IPv6 packet consists of only two. This is because the five IPv4 header words are stripped away, shrinking the frame to only two ATM cells.

Figure 12: An IPv4 packet goes into the module; the encapsulated IPv6 packet is unpacked from the IPv4 packet and sent onto the network

5.2 Synthesis Results

The current hardware implementation is capable of running at 80 MHz. This amounts to approximately 2.5 Gb/s of data that can pass through our module (OC-48 speeds). The placement of the circuit on a Xilinx Virtex XCV1000E yields the following chip statistics:

• Maximum Frequency: 80 MHz

• Number of Slice Flip Flops: 5,049 out of 24,576 (20%)

• Total Number of LUTs: 4,430 out of 24,576 (18%)

• Number of Block RAMs: 15 out of 96 (15%)

• Total Equivalent Gate Count: 321,724
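The throughput figure follows directly from the clock rate multiplied by the word width of the datapath; the 32-bit width is an assumption here about the wrapper interface:

```python
# Throughput sanity check: clock rate times datapath width.
clock_hz = 80e6
word_bits = 32  # assumed width of the protocol wrapper datapath

throughput_gbps = clock_hz * word_bits / 1e9
assert abs(throughput_gbps - 2.56) < 0.01   # roughly OC-48 (~2.488 Gb/s)

# At the 100 MHz design goal, the same datapath would carry 3.2 Gb/s.
assert 100e6 * word_bits / 1e9 == 3.2
```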

6 Future Enhancements

The current tunneling module was designed with a placeholder for the IPv6 address lookup and tables. To improve upon the tunneling module, and to make it a truly useful module, we have designed it such that a new address lookup/routing table can be dropped in place of the current placeholder. With a real address lookup and real routing tables, this module could be extremely useful to anyone with an FPX looking to support tunneling. Another enhancement to the module would be support for ICMP packets. Currently, when IP packets are dropped in the IPv4 and IPv6 Processors, no ICMP messages are returned to the sender. ICMP is not a required part of either the IPv4 or the IPv6 protocol; however, including this functionality would make for a more robust switch. Finally, given more time to work on the design, we could achieve the 100 MHz goal of the project. This would allow us to process data at a whopping 3.2 Gb/s, well above OC-48 speeds.

References

[1] “Internet Protocol, Version 6 (IPv6) Specification.” Online: http://www.faqs.org/rfcs/rfc2460.html, Dec. 1998.

[2] “Transition Mechanisms for IPv6 Hosts and Routers.” Online: http://www.faqs.org/rfcs/rfc1933.html, Apr. 1996.

[3] J. W. Lockwood, J. S. Turner, and D. E. Taylor, “Field programmable port extender (FPX) for distributed routing and queuing,” in ACM International Symposium on Field Programmable Gate Arrays (FPGA’2000), (Monterey, CA, USA), pp. 137–144, Feb. 2000.

[4] J. W. Lockwood, N. Naufel, J. S. Turner, and D. E. Taylor, “Reprogrammable Network Packet Processing on the Field Programmable Port Extender (FPX),” in ACM International Symposium on Field Programmable Gate Arrays (FPGA’2001), (Monterey, CA, USA), pp. 87–93, Feb. 2001.

[5] T. Chaney, J. A. Fingerhut, M. Flucke, and J. S. Turner, “Design of a gigabit ATM switch,” Tech. Rep. WU-CS-96-07, Washington University in Saint Louis, 1996.

[6] D. E. Taylor, J. W. Lockwood, and N. Naufel, “RAD Module Infrastructure of the Field-programmable Port eXtender (FPX),” Tech. Rep. WUCS-01-16, Washington University, Department of Computer Science, July 2001.

[7] W. Westfeldt, “Internet reconfigurable logic for creating web-enabled devices.” Xilinx Xcell, Q1 1999.

[8] S. Kelem, “Virtex configuration architecture advanced user’s guide.” Xilinx XAPP151, Sept. 1999.

[9] F. Braun, J. W. Lockwood, and M. Waldvogel, “Layered protocol wrappers for internet packet processing in reconfigurable hardware,” Tech. Rep. WU-CS-01-10, Washington University in Saint Louis, Department of Computer Science, June 2001.

[10] F. Braun, J. Lockwood, and M. Waldvogel, “Reconfigurable router modules using network protocol wrappers,” in Proceedings of Field-Programmable Logic and Applications (to appear), (Belfast, Northern Ireland), pp. xx–xx, Aug. 2001.
