Modern Computer Networks: An Open Source Approach
Chapter 3: Internet Protocol Layer

Content

- 3.1 General Issues
- 3.2 Data-Plane Protocols: IPv4
- 3.3 Data-Plane Protocols: IPv6
- 3.4 Control-Plane Protocols: Address Management
- 3.5 Control-Plane Protocols: Error Reporting
- 3.6 Control-Plane Protocols: Routing
- 3.7 Control-Plane Protocols: Multicast

Copyright reserved 2001 (Lin & Hwang) 2

3.1 General Issues

- Service
- Addressing
- Forwarding
- Routing
- Security

[Figure: protocol stacks of a host, a DHCP server, a NAT server, and a router. Labels: TCP/UDP, routing protocols, ICMP, IP, ARP, NAT, routing table, IP address, subnet, default router, data link.]

Service

- Provides a host-to-host transmission service
- Connects several LANs into an internetwork
  - a network of networks
- “Internet”
  - the global internetwork to which most networks are connected

Internetwork

- An example of an internetwork

[Figure: hosts H1, H2, and H3 and routers R1, R2, and R3 joined by Ethernet, Fast Ethernet, Gigabit Ethernet, and wireless LAN links.]

Internet Service Model

- Connectionless
- Best effort delivery
  - packets may be lost
  - packets may be delivered out of order
  - duplicate copies of a packet may be delivered
  - packets can be delayed for a long time

Address

- A globally unique address for host identification
- Data link layer: a flat address
- Network layer: a hierarchical address
- Next-hop forwarding based on destination address

Deliver a packet

- How to deliver a packet?
  - Routing
    - Find a path from source to destination
    - Done by routing protocols
  - Forwarding
    - Forward packets at a router
    - Look up the next hop in the routing table and then forward

Forwarding at Data Plane

- Steps
  - Extract destination address
  - Look up destination address in routing table
    - Obtain the output interface from routing table
  - Forward the packet

Look up the routing table

- Issues
  - Speed and memory requirement
  - Good data structure
    - fast lookup and table update
    - low memory requirement
  - Classical approaches
    - Trie
    - Hash
    - Fast lookup table
    - Hardware implementation

Routing at Control Plane

- Task of routing
  - Select a path from the source to the destination
- Goal of routing
  - Efficient (low delay, high throughput, …)
  - Scalable
  - Stable
  - Robust
  - Fair

IP routing

- Hop-by-hop routing
  - Option: source routing
- Shortest path routing
- Available information
  - Global information vs. local information
- Information exchange
  - Flooding (broadcast) vs. neighbors only

Multicast

- Definition of a multicast
  - Communication between a group of hosts
  - Packets are sent to all group members
- Issues
  - Group membership
    - receivers of a multicast session
  - Multicast tree construction
    - Multiple point-to-point connections or a multicast tree
    - A multicast tree connects the source node to all destination nodes

Security of IP

- Aspects of network security
  - Access Control
    - Control who has the rights to access
  - Data Security
    - Encrypt messages transmitted
  - Intrusion Detection
    - Detect illegal break-ins

User-Plane Protocols and Mechanisms

3.2 Internet Protocol
3.3 Internet Protocol Version 6

3.2 Internet Protocol

- Addressing
- Subnetting
- Forwarding
- Packet format
- Fragmentation and re-assembly

IP Address

- A globally unique 32-bit address to identify a network interface
- A hierarchical address
  - consists of network id and host id
- A router usually has more than one interface, each with its own address
- A host may have more than one address

IP Address Notation

140.123.1.1 = 10001100 01111011 00000001 00000001

- Order transmitted in networks: big endian
  - 10001100 01111011 00000001 00000001
- Order stored in memory
  - Big endian: 10001100 01111011 00000001 00000001
  - Little endian: 00000001 00000001 01111011 10001100 (bytes reversed)

Class-ful IP Address

Class  Leading bits  Network id ends at  Address range
A      0             bit 7               1.0.0.0 to 127.255.255.255
B      10            bit 15              128.0.0.0 to 191.255.255.255
C      110           bit 23              192.0.0.0 to 223.255.255.255
D      1110          (multicast)         224.0.0.0 to 239.255.255.255
E      1111          (reserved)          240.0.0.0 to 255.255.255.255
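The class of an address follows from its leading bits alone. The sketch below is illustrative only; the helper names `ip4()` and `ip_class()` are hypothetical and addresses are assumed to be held in host byte order.

```c
#include <stdint.h>

/* Pack four dotted-decimal octets into a 32-bit value (host byte order).
 * Hypothetical helper, not part of any library. */
uint32_t ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Return the class letter of an address by testing its leading bits. */
char ip_class(uint32_t addr)
{
    if ((addr & 0x80000000u) == 0x00000000u) return 'A'; /* 0...  */
    if ((addr & 0xC0000000u) == 0x80000000u) return 'B'; /* 10..  */
    if ((addr & 0xE0000000u) == 0xC0000000u) return 'C'; /* 110.  */
    if ((addr & 0xF0000000u) == 0xE0000000u) return 'D'; /* 1110  */
    return 'E';                                          /* 1111  */
}
```

For example, `ip_class(ip4(140, 123, 1, 1))` yields `'B'` because 140 is 10001100 in binary.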

Reserved IP Addresses

- Host id = 0
  - denotes the network itself
- Host id = F…F
  - denotes the broadcast address of the network

IP Subnet

- Network address uniquely identifies a physical network
- A physical network consists of several LANs
  - Subnet mask is used to identify a subnet
  - Hosts in the same IP subnet talk directly without an intervening router
- For example
  - cs.ccu.edu.tw: 140.123.101.0
  - subnet mask: 255.255.255.0 or 140.123.101.0/24

IP Subnet

[Figure: four subnets joined by three routers.
- Subnet 140.123.1.0: H1 (140.123.1.1), H2 (140.123.1.2), R1 (140.123.1.250)
- Subnet 140.123.250.0: R1 (140.123.250.1), R2 (140.123.250.2), R3 (140.123.250.3)
- Subnet 140.123.2.0: R2 (140.123.2.250), H3 (140.123.2.1), H4 (140.123.2.2)
- Subnet 140.123.3.0: R3 (140.123.3.250), H5 (140.123.3.1)]

Classless IP Address

- Classful addressing:
  - Inefficient use of address space
    - A class B address is too large
    - A class C address is too small
  - Scalability: too many class C routing entries
- CIDR: Classless InterDomain Routing
  - network portion of address of arbitrary length
  - address format: a.b.c.d/x

Authority

ICANN: Internet Corporation for Assigned Names and Numbers
  - allocates addresses
  - manages DNS
  - assigns domain names, resolves disputes

IP Forwarding

- Aspects of forwarding
  - Packets from upper layer protocols
  - Packets from a network interface
- Routing table
  - Forwarding is based on routing table
  - Routing entry: (Destination/SubnetMask, NextHop)
  - Default router: (0.0.0.0/0, default router)

Packet Forwarding (at Host)

    If (NetworkAddress of the destination == My subnet address) then
        Transmit the packet directly to the destination
    Else
        Look up the routing table
        Deliver the packet to the default router
    End if

    Check if destination is in my subnet:
        If (((HostIP ^ DestinationIP) & SubnetMask) == 0)

Packet Forwarding (at Router)

    D = NetworkAddress of the destination
    Look up the routing table
    If the packet is to be delivered to the upper layer
        Deliver the packet to an upper layer protocol
    Else if the packet is to be delivered to a directly connected subnet
        Deliver the packet directly to the destination
    Else
        Deliver the packet to a next hop router
    End if
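The subnet test in the host pseudocode maps directly onto bit operations. This is a minimal sketch, assuming addresses and masks are 32-bit values in host byte order; `same_subnet()` and `next_hop_at_host()` are hypothetical names, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* XOR leaves 1-bits only where the two addresses differ; the mask keeps
 * just the network part, so a zero result means "same subnet". */
bool same_subnet(uint32_t host, uint32_t dst, uint32_t mask)
{
    return ((host ^ dst) & mask) == 0;
}

/* Host-side forwarding decision: deliver directly when the destination is
 * on the local subnet, otherwise hand the packet to the default router. */
uint32_t next_hop_at_host(uint32_t host, uint32_t dst, uint32_t mask,
                          uint32_t default_router)
{
    return same_subnet(host, dst, mask) ? dst : default_router;
}
```

With host 140.123.101.5 and mask 255.255.255.0, a packet to 140.123.101.30 goes out directly, while one to 140.113.250.5 goes to the default router.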

Table Look Up

- Longest prefix match
  - Organization A: 194.24.0.0/21
  - Organization B: 194.24.7.0/24
  - 194.24.7.10 matches 194.24.0.0/21 (21 bits) as well as 194.24.7.0/24 (24 bits)
  - Longest prefix: 194.24.7.0/24 is the right routing entry

Open Source Implementation: Packet Comes From Upper Layer

- Search cache first; if not found, search the routing table (FIB).

[Flow: ip_route_output() -> ip_route_output_key() -> found? yes: return; no: ip_route_output_slow()]
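Longest prefix match can be illustrated with a linear scan over a small table. This sketch is not how the Linux FIB works (the FIB hashes entries per prefix length); the `struct route` layout, `lpm_lookup()`, and the `next_hop` identifiers are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

struct route {
    uint32_t prefix;    /* network prefix, host byte order */
    int      len;       /* prefix length in bits, 0..32 */
    int      next_hop;  /* hypothetical next-hop identifier */
};

/* Build a netmask with `len` leading 1-bits (len == 0 gives 0). */
static uint32_t mask_of(int len)
{
    return len == 0 ? 0u : 0xFFFFFFFFu << (32 - len);
}

/* Return the matching entry with the longest prefix, or NULL if none. */
const struct route *lpm_lookup(const struct route *tbl, size_t n, uint32_t dst)
{
    const struct route *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if ((dst & mask_of(tbl[i].len)) == tbl[i].prefix &&
            (best == NULL || tbl[i].len > best->len))
            best = &tbl[i];
    }
    return best;
}

/* The two organizations from the slide plus a default route. */
const struct route demo_table[] = {
    { 0xC2180000u, 21, 1 },  /* 194.24.0.0/21 -> Organization A */
    { 0xC2180700u, 24, 2 },  /* 194.24.7.0/24 -> Organization B */
    { 0x00000000u,  0, 0 },  /* 0.0.0.0/0     -> default router */
};
```

Looking up 194.24.7.10 matches all three entries, and the /24 entry wins because it is the longest.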

Open Source Implementation: Routing Cache

[Figure: rt_hash_table, an array of hash buckets; each bucket's chain links rtable entries through u.rt_next.]

Open Source Implementation: Routing Table (FIB)

[Figure: fib_table (tb_data) points to fn_hash, which holds fn_zones[0] to fn_zones[32] and fn_zone_list. Each fn_zone has fz_next and a hash table fz_hash[..] of fib_node entries chained by fn_next. Each fib_node refers through fn_info to a fib_info, whose fib_nh holds nh_dev and nh_gw.]

IP Packet Format

Header rows (32 bits each; field widths in bits):

  Version (4) | Header Length (4) | Type of Service (8) | Packet Length in bytes (16)
  Identifier (16) | Flags (3) | Fragmentation Offset (13)
  Time-to-Live (8) | Upper Layer Protocol (8) | Header Checksum (16)
  Source IP Address (32)
  Destination IP Address (32)
  Options
  Data

IP Packet Format (cont.)

- Version Number
  - Current version is 4
  - Version for next generation IP is 6
- Header Length
  - In units of 4-byte words
- Type of Service (TOS)
  - Desired service of the packet

IP TOS

TOS byte: Precedence (3 bits) | Type of Service (4 bits) | R (1 bit, reserved)
New: used as DS codepoint.

Precedence, defined in RFC 791 (not implemented!):
  111: Network control
  110: Internetwork control
  101: CRITIC/ECP
  100: Flash override
  011: Flash
  010: Immediate
  001: Priority
  000: Routine

TOS, defined in RFC 1349 (partially implemented!):
  1000: minimize delay
  0100: maximize throughput
  0010: maximize reliability
  0001: minimize cost
  0000: normal service
  1111: maximize security

IP Packet Format (cont.)

- Packet Length
  - Total number of bytes (header + data)
  - Maximum is 65,535 bytes
- Identifier
  - Uniquely identifies an IP packet
- Flags
  - Low-order two bits: for fragmentation control
    - First bit: do not fragment
    - Last bit: more fragments

IP Packet Format (cont.)

- Fragmentation Offset
  - Position of the fragment, measured in units of 8 bytes
- Time-to-live (TTL)
  - Used as hop limit
  - Each router decreases TTL by one
  - If TTL reaches zero, an ICMP message is sent to the source
- Upper Layer Protocol
  - IP: 0, ICMP: 1, TCP: 6, UDP: 17

IP Packet Format (cont.)

- Header Checksum
  - 16-bit 1’s complement checksum
- Source Address (32 bits)
- Destination Address (32 bits)
- Options
  - loose source routing, strict source routing, record route, record timestamp
- Data
  - Payload from upper layers

Open Source Implementation: Checksum

- ip_fast_csum() function (src/include/asm_i386/checksum.h)
  - optimized by writing this function in assembly language
  - For 80x86 machines,
    - do the summation in 32-bit words first
    - The result is then copied to another register
    - Shift the registers so that each holds 16 bits in its low-order bits
    - Add up the registers
    - Taking the complement of the result gives the checksum

IP Fragmentation & Reassembly

- Limitation from data link layers
  - MTU (different link layers, different MTUs)
- An IP packet larger than the MTU of its data link layer needs to be “fragmented”
  - one packet becomes several small packets
  - Re-assembled only at the destination

[Figure: an IP packet too large for the link layer ("Help, cannot get through.") is split into IP fragments that fit ("Yes, can get through now.").]
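The same 16-bit one's complement sum can be written in portable C. This is a plain sketch for illustration, not the kernel's assembly-optimized ip_fast_csum(); the function name `ip_checksum()` is hypothetical, and the header is assumed to be given as raw bytes in network order with the checksum field zeroed when computing.

```c
#include <stddef.h>
#include <stdint.h>

/* Compute the Internet checksum over `len` bytes of header data. */
uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    /* Add the data as big-endian 16-bit words. */
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    if (len & 1)                       /* odd trailing byte, padded with 0 */
        sum += (uint32_t)hdr[len - 1] << 8;

    /* Fold the carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;             /* one's complement of the sum */
}
```

A receiver can verify a header by running the same function over it with the checksum field left in place; a valid header gives 0.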

Fragment Control

- Identify fragments of a packet
  - All fragments have the same identifier
- Know the position of a fragment
  - Recorded in fragmentation offset (13 bits)
- Know the end of a packet
  - The more-fragments bit of the last fragment is 0

IP Fragmentation Example

(a) Original packet
  Header: id=x, more=0, offset=0; 3200 bytes of data

(b) Fragments
  Header: id=x, more=1, offset=0; 1480 bytes of data
  Header: id=x, more=1, offset=185; 1480 bytes of data
  Header: id=x, more=0, offset=370; 240 bytes of data
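The numbers in the fragmentation example can be reproduced with a short planning routine. This is a sketch under stated assumptions: a 20-byte IP header without options, an MTU of 1500 bytes for the example, and hypothetical names `struct frag` and `ip_fragment_plan()`; it only computes offsets and lengths and does not copy data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct frag {
    uint16_t offset;  /* fragment offset in units of 8 bytes */
    uint16_t len;     /* payload bytes carried by this fragment */
    int      more;    /* more-fragments bit */
};

/* Plan the fragments for `payload` data bytes on a link with the given MTU.
 * Writes at most `max` entries to `out` and returns how many were written. */
size_t ip_fragment_plan(size_t payload, size_t mtu, struct frag *out, size_t max)
{
    const size_t hdr = 20;                 /* assumed header size, no options */
    assert(mtu >= hdr + 8);                /* need room for one 8-byte unit */
    size_t chunk = ((mtu - hdr) / 8) * 8;  /* largest multiple of 8 that fits */
    size_t n = 0, done = 0;

    while (done < payload && n < max) {
        size_t left = payload - done;
        size_t len = left > chunk ? chunk : left;
        out[n].offset = (uint16_t)(done / 8);
        out[n].len = (uint16_t)len;
        out[n].more = (done + len < payload);
        done += len;
        n++;
    }
    return n;
}
```

For 3200 bytes of data and an MTU of 1500, the plan is three fragments with offsets 0, 185, and 370 carrying 1480, 1480, and 240 bytes.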

Open Source Implementation: Fragmentation

- Upper layer protocol calls ip_queue_xmit()
- After routing is determined, call ip_queue_xmit2()
- ip_queue_xmit2() calls ip_fragment() if the packet length is larger than the MTU of the device
- ip_fragment()
  - A while loop is used to fragment the original packet into fragments
  - Size (in bytes) of a fragment, except the last one, is set to the largest multiple of 8 that fits within the MTU

Open Source Implementation: Re-assembly

[Flow: net_bh() -> ip_rcv() -> ip_route_input() -> ip_local_deliver()
- In ip_local_deliver(): more or offset is set? yes: ip_defrag(); no: ip_local_deliver_finish()
- In ip_defrag(): ip_find() -> ip_frag_queue() -> all fragments in? yes: ip_frag_reasm()
- In ip_find(): ipqhashfn() -> found in hash table? yes: return queue; no: ip_frag_create()]

Network Address Translation

- Why NAT?
  - Solution to IP address depletion
  - Private IP address (RFC 1597)
    - 10.0.0.0-10.255.255.255
    - 172.16.0.0-172.31.255.255
    - 192.168.0.0-192.168.255.255
- Network Address Translation Protocol
  - Network address translation (RFC 3022)
    - Allows hosts with private IP addresses to have Internet access
    - Short-term solution for IP address depletion
    - Also provides security for Intranet service

NAT Example

NAT Table
  10.2.2.2 ==> 140.123.101.30
  10.2.2.3:1175 ==> 140.123.101.30:6175

[Figure: packets crossing a router with NAT.
- Src: 10.2.2.2:1064, Dst: 140.113.250.5:80 becomes Src: 140.123.101.30:1064, Dst: 140.113.250.5:80
- Src: 10.2.2.3:1175, Dst: 140.113.54.100:21 becomes Src: 140.123.101.30:6175, Dst: 140.113.54.100:21]

Types of NAT

- NAT with a pool of global IP addresses
  - 10.2.2.2 ==> 140.123.101.30
  - 10.2.2.3 ==> 140.123.101.31
  - dynamic: translate IP address on demand
  - static: translate IP address with pre-configuration
- NAT with Port Address Translation (NAPT) of one global IP address
  - 10.2.2.2:1064 ==> 140.123.101.30:5064
  - 10.2.2.3:1175 ==> 140.123.101.30:6175
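A NAPT box keeps a table that maps each private (address, port) pair to a port on its single global address. The sketch below is a toy model with hypothetical names (`struct napt`, `napt_out()`, `napt_in()`) and a simple sequential port allocator; it is not the Linux connection-tracking code and ignores protocols, timeouts, and checksum updates.

```c
#include <stddef.h>
#include <stdint.h>

#define NAPT_MAX 16

struct napt_entry {
    uint32_t priv_ip;    /* private source address */
    uint16_t priv_port;  /* private source port */
    uint16_t pub_port;   /* port used on the global address */
};

struct napt {
    uint32_t pub_ip;     /* the single global IP address */
    uint16_t next_port;  /* next free global port (illustrative allocator) */
    size_t   n;          /* number of entries in use */
    struct napt_entry e[NAPT_MAX];
};

/* Outgoing packet: return the global port for (ip, port), creating a
 * mapping on demand. Returns 0 when the table is full. */
uint16_t napt_out(struct napt *t, uint32_t ip, uint16_t port)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->e[i].priv_ip == ip && t->e[i].priv_port == port)
            return t->e[i].pub_port;
    if (t->n == NAPT_MAX)
        return 0;
    struct napt_entry *ne = &t->e[t->n++];
    ne->priv_ip = ip;
    ne->priv_port = port;
    ne->pub_port = t->next_port++;
    return ne->pub_port;
}

/* Incoming packet: find the private (ip, port) behind a global port.
 * Returns 1 and fills *ip and *port on success, 0 if no mapping exists. */
int napt_in(const struct napt *t, uint16_t pub_port, uint32_t *ip, uint16_t *port)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->e[i].pub_port == pub_port) {
            *ip = t->e[i].priv_ip;
            *port = t->e[i].priv_port;
            return 1;
        }
    return 0;
}
```

The reverse lookup is what lets replies addressed to the global address reach the right private host; a packet arriving on an unmapped port is simply not translated.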

Types of NAT (cont.)

- Port redirection
  - Redirect all WWW service to a specific IP and private port number
    - DNS: www.cs.ccu.edu.tw ==> 140.123.101.38
    - NAT: 140.123.101.38:80 ==> 10.2.2.2:8080
- Transparent proxy
  - Force all WWW traffic through a proxy with cache
    - 140.123.101.38:80 ==> internal WWW proxy (10.1.1.1)
    - All HTTP requests go to the internal proxy

Problems with NAT

- Modify source IP and/or port number
- Modify IP header checksum
- Modify TCP checksum
- Application dependent modification
  - ICMP:
    - Basic NAT: ICMP checksum, query id (echo)
    - NAPT: ICMP packets that may contain an IP address
      - destination unreachable (3), source quench (4), redirect (5), time exceeded (11), IP header error (12)

Problems with NAT (cont.)

- Application Specific Gateways (ALGs)
  - FTP
    - PORT/PASV command has IP address:port in ASCII
    - Translating the IP address may change the packet size
      - If the new size is shorter, pad with zeroes
      - If the new size is longer, the TCP sequence number must change
      - Affects acknowledgment, congestion control, …
      - A special table is used to correct the TCP sequence and acknowledgment numbers
  - Others: SMTP, SNMP, …

NAT Open Source Implementation

- Source and destination NAT implementation in Linux iptables

[Figure: From interface -> PRE_ROUTING (Destination NAT) -> Routing Decision -> POST_ROUTING (Source NAT) -> To interface. Packets from the upper layer (TCP/UDP) pass through LOCAL_OUT (Destination NAT).]

NAT Open Source Implementation (cont.)

- Data structure
  - Hash table: ip_conntrack_hash[]
  - Hash function: hash_conntrack()
  - Linear search within a hashed list

NAT Open Source Implementation (cont.)

- NAT function flows

[Figure: call flow among ip_nat_out(), ip_nat_localout(), do_bindings(), do_masquerade(), ip_conntrack_in(), resolve_normal_ct(), ip_conntrack_find_get(), manip_pkt(), and upper_layer_protocol->manip_pkt().]

NAT Open Source Implementation (cont.)

- FTP ALG function flows

[Figure: call flow among do_bindings(), helper->help(), ftp_data_fixup(), ip_nat_seq_adjust(), mangle_rfc959_packet(), ip_nat_resize_packet(), and ip_nat_mangle_tcp_packet().]

3.3 Internet Protocol Version 6

- Changes from IPv4
- IPv6 Header
- IPv6 Extension Header
- IPv6 Fragmentation and Reassembly
- IPv6 Address Space

IPv6

- Problems with IPv4
  - Shortage of address space
  - Lack of Quality of Service guarantee
- New features of IPv6
  - Enlarged address space
  - Fixed header format helps speed processing/forwarding
  - Better support for Quality of Service
  - Auto-configuration
  - New “anycast” address: route to “best” of several replicated servers

IPv6 Header

Header rows (32 bits each; bit offsets 0, 4, 12, 16, 24, 31):

  Version | Traffic Class | Flow Label
  Payload Length | Next Header | Hop Limit
  Source Address (16 octets)
  Destination Address (16 octets)

IPv6 Header

- Version: 6
- Traffic class:
  - identify class of service
  - e.g., DiffServ (DS codepoint)
- Flow Label:
  - identify datagrams in same “flow”
- Next header:
  - identify upper layer protocol for data

Changes from IPv4

- Expanded Addressing Capabilities
  - from 32 bits to 128 bits (more levels and nodes)
  - improve multicast routing (“scope” field)
  - “anycast address”: send a packet to any one of a group of nodes
- Header Format Simplification
  - reduce bandwidth cost
- Extensions
  - more flexibility

Changes from IPv4 (cont.)

- Options
  - allowed, but outside of header, indicated by “Next Header” field
- Checksum
  - removed to reduce processing at routers
- Fragmentation
  - Not allowed at intermediate routers

IPv6 Extension Header Examples

(a) No extension header
  IPv6 Header (Next Header = TCP) | TCP Header | Data

(b) IPv6 header followed by a routing header
  IPv6 Header (Next Header = Routing) | Routing Header (Next Header = TCP) | TCP Header | Data

(c) IPv6 header followed by a routing header and a fragment header
  IPv6 Header (Next Header = Routing) | Routing Header (Next Header = Frag.) | Fragment Header (Next Header = TCP) | TCP Header | Data

IPv6 Extension Header (cont.)

- Order of extension headers
  - IPv6
  - Hop-By-Hop Options header (0)
  - Destination Options header (60)
  - Routing header (43)
  - Fragment header (44)
  - Authentication header (51)
  - Encapsulating Security Payload header
  - Destination Options header
  - Upper-layer header
    - TCP(6), UDP(17), Nothing(59)

IPv6 Extension Header (cont.)

- Not processed by intermediate routers
  - except hop-by-hop option header
- Processed strictly in order
- Each extension header occurs at most once
  - except Destination Options header, which occurs at most twice

Fragment Header

- Fragmentation is only performed by source
- Fragment header format (bit offsets 0, 8, 16, 29, 31):

  Next Header | Reserved | Fragment Offset | R | M
  Identifier

Fragmentation Example

(a) Original packet
  IPv6 Header | Fragment 1 Data | Fragment 2 Data | Fragment 3 Data

(b) Fragments
  IPv6 Header | Fragment Header | Fragment 1 Data
  IPv6 Header | Fragment Header | Fragment 2 Data
  IPv6 Header | Fragment Header | Fragment 3 Data

Packet Size Issue

- The MTU of every link must be >= 1280 bytes
  - Use Path MTU Discovery to discover MTUs greater than 1280 bytes
  - A node needs to accept a fragmented packet that is as large as 1500 octets

IPv6 addressing

- Three categories
  - Unicast
  - Multicast
  - Anycast
- Notation
  - 16-bit hex values separated by colons
    3FFD:3600:0000:0000:0302:B3FF:FE3C:C0DB
  - Consecutive null 16-bit numbers replaced by ::
    3FFD:3600:0:0:0:0:1:A => 3FFD:3600::1:A
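The "::" shorthand can be checked with the POSIX address conversion functions. This is a small sketch assuming a POSIX system that provides `inet_pton()` and `inet_ntop()` in `<arpa/inet.h>`; the wrapper names `ipv6_compress()` and `ipv6_same()` are hypothetical.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <string.h>

/* Parse an IPv6 address in any valid textual form and write it back in the
 * system's canonical form. Returns 0 on success, -1 on failure. */
int ipv6_compress(const char *text, char *out, size_t outlen)
{
    struct in6_addr a;
    if (inet_pton(AF_INET6, text, &a) != 1)
        return -1;
    return inet_ntop(AF_INET6, &a, out, (socklen_t)outlen) ? 0 : -1;
}

/* Return 1 when two textual forms denote the same 128-bit address. */
int ipv6_same(const char *x, const char *y)
{
    struct in6_addr a, b;
    if (inet_pton(AF_INET6, x, &a) != 1 || inet_pton(AF_INET6, y, &b) != 1)
        return 0;
    return memcmp(&a, &b, sizeof a) == 0;
}
```

On a typical Linux system, `ipv6_compress("3FFD:3600:0:0:0:0:1:A", buf, sizeof buf)` produces the lowercase compressed form `3ffd:3600::1:a`, and `ipv6_same()` confirms that the long and short notations from the slide are the same address.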

IPv6 Address Assignment

Prefix         Address Type                          Portion
0000 0000      Reserved (IPv4 compatibility)         1/256
0000 0001      Unassigned                            1/256
0000 001       Reserved for NSAP                     1/128
0000 010       Reserved for IPX                      1/128
0000 011       Unassigned                            1/128
0000 1         Unassigned                            1/32
0001           Unassigned                            1/16
001            Aggregatable Global Unicast Address   1/8
010            Unassigned                            1/8
011            Unassigned                            1/8
100            Unassigned                            1/8
101            Unassigned                            1/8
110            Unassigned                            1/8
1110           Unassigned                            1/16
1111 0         Unassigned                            1/32
1111 10        Unassigned                            1/64
1111 110       Unassigned                            1/128
1111 1110 0    Unassigned                            1/512
1111 1110 10   Link Local Unicast Address            1/1024
1111 1110 11   Site Local Unicast Address            1/1024
1111 1111      Multicast Address                     1/256

IPv6 Unicast Address

- Unicast Address without Internal Structure:
    Node Address
- Unicast Address with Subnet:
    Subnet Prefix | Interface ID
- Unicast Unspecified Address:
    0000 … 0000 (all zeros)
- Unicast Loopback Address:
    0000 … 0001

IPv6 Unicast Address (cont.)

- IPv4-compatible IPv6 Address: ::8C7B:65A0
    0000 … 0000 | 0000 (16 bits) | IPv4 Address (32 bits)
- IPv4-Mapped IPv6 Address: ::FFFF:8C7B:65A0
    0000 … 0000 | FFFF (16 bits) | IPv4 Address (32 bits)
- NSAP Addresses:
    0000001 | defined according to usage requirements
- IPX Addresses:
    0000010 | to be defined

Aggregatable Global Unicast Address

Field widths in bits: 3, 13, 8, 24, 16, 64

  P | TLA ID | RES | NLA ID | SLA ID | Interface ID

- P: Format Prefix (001)
- TLA: Top-Level Aggregation Identifier
- RES: Reserved
- NLA: Next-Level Aggregation Identifier
- SLA: Site-Level Aggregation Identifier
- Interface ID: Interface Identifier

IPv6 Unicast Address (cont.)

- Unicast Link-Local Address (10, 54, 64 bits):
    1111111010 | 0000 … 0 | Interface ID
- Unicast Site-Local Address (10, 38, 16, 64 bits):
    1111111011 | 0000 … 0 | Subnet ID | Interface ID

IPv6 Anycast Address

- Required Anycast Address (n, 128-n bits):
    subnet prefix | 0000 … 0

IPv6 Multicast Address

Format (8, 4, 4, 112 bits):

  11111111 | flag | scope | Group ID

- flag: 000T
  - T = 0: well-known multicast address
  - T = 1: transient multicast address
- scope: scope of multicast group
  - 0000: reserved
  - 0001: node-local scope
  - 0010: link-local scope
  - 0101: site-local scope
  - 1000: organization-local scope
  - 1110: global scope

IPv6 Multicast Address (cont.)

- Node-Local Scope
  - FF01:0:0:0:0:0:0:1  All Nodes Address
  - FF01:0:0:0:0:0:0:2  All Routers Address
- Link-Local Scope
  - FF02:0:0:0:0:0:0:1  All Nodes Address
  - FF02:0:0:0:0:0:0:2  All Routers Address
  - FF02:0:0:0:0:1:FFxx:xxxx  Solicited Node Address
- Site-Local Scope
  - FF05:0:0:0:0:0:0:2  All Routers Address
  - FF05:0:0:0:0:0:1:3  All DHCP Servers

Transition From IPv4 To IPv6

- Not all routers can be upgraded simultaneously
  - How will the network operate with mixed IPv4 and IPv6 routers?
- Two proposed approaches
  - Dual Stack: some routers with dual stack (v6, v4) can “translate” between formats
  - Tunneling: IPv6 carried as payload in IPv4 datagram among IPv4 routers

Chapter 3
Internet Protocol Layer
Part II

Ren-Hung Hwang

Control Plane Mechanisms

- Address Management
  - Address resolution
  - Address configuration
- Error reporting
  - Internet Control Message Protocol
- Routing
  - Intra-domain routing
  - Inter-domain routing
- Multicast

3.4 Address Management

- Address resolution
- Address configuration

Address Resolution

Address Resolution Protocol (ARP)

Address Resolution

- What is address resolution
  - Translate addresses between different layers
  - For example
    - host name to IP address
    - IP address to Ethernet address
- Why address resolution
  - MAC address vs. IP address

Address Resolution Protocol

- Protocol operation
  - Source node broadcasts an ARP request packet on the IP subnet
  - All nodes on the subnet receive the ARP request, but only the target node (or some designated server) answers with an ARP reply packet via unicast
  - Source node receives the reply and gets the MAC address of the target node
  - A cache (with timers) is used to speed up resolution

ARP Packet Format

Rows (32 bits each; bit offsets 0, 8, 16, 24, 31):

  Hardware Address Type | Protocol Address Type
  H. Addr Len | P. Addr Len | Operation Code
  Sender Hardware Address (0-3)
  Sender Hardware Addr (4-5) | Sender Protocol Addr (0-1)
  Sender Protocol Addr (2-3) | Target Hardware Addr (0-1)
  Target Hardware Address (2-5)
  Target Protocol Address

ARP Packet Format

- HARDWARE ADDRESS TYPE
  - Link types: Ethernet=0x0001
- PROTOCOL ADDRESS TYPE
  - Upper layer protocol identifier: IP=0x0800
- HADDR LEN
  - Length of the address of the link layer: Ethernet=6
- PADDR LEN
  - Length of the address of the network layer: IP=4

ARP Packet Format

- OPERATION
  - Operation code: ARP request=1, ARP reply=2, RARP request=3, RARP reply=4
- SENDER HADDR
  - Sender link layer address
- SENDER PADDR
  - Sender network layer address
- TARGET HADDR
  - Target link layer address, filled with zeros if unknown
- TARGET PADDR
  - Target network layer address
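The field values above are enough to assemble an ARP request for Ethernet and IPv4. This is a minimal sketch that fills a raw byte buffer in network byte order; `arp_build_request()` and `ARP_LEN` are hypothetical names, and the Ethernet frame around the packet is not built here.

```c
#include <stdint.h>
#include <string.h>

#define ARP_LEN 28  /* 8 fixed bytes + 2 * (6-byte MAC + 4-byte IP) */

/* Fill `buf` with an ARP request asking for the hardware address of `tpa`.
 * sha/spa are the sender's MAC and IP address; tpa is the target's IP. */
void arp_build_request(uint8_t buf[ARP_LEN], const uint8_t sha[6],
                       const uint8_t spa[4], const uint8_t tpa[4])
{
    buf[0] = 0x00; buf[1] = 0x01;  /* hardware address type: Ethernet */
    buf[2] = 0x08; buf[3] = 0x00;  /* protocol address type: IP */
    buf[4] = 6;                    /* hardware address length */
    buf[5] = 4;                    /* protocol address length */
    buf[6] = 0x00; buf[7] = 0x01;  /* operation code: ARP request */
    memcpy(buf + 8, sha, 6);       /* sender hardware address */
    memcpy(buf + 14, spa, 4);      /* sender protocol address */
    memset(buf + 18, 0, 6);        /* target hardware address: unknown */
    memcpy(buf + 24, tpa, 4);      /* target protocol address */
}
```

An ARP reply has the same layout with operation code 2 and the target hardware address filled in.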

Encapsulate ARP Packet into MAC Frame

- Protocol id: 0x0806
- Destination address of an ARP request packet: 0xFFFFFFFFFFFF

Reverse ARP (RARP)

- Allows a diskless workstation to discover its IP address
- Needs a RARP server on each network
- Bootp:
  - Uses UDP messages, which are forwarded over routers, to find the file server that holds the mapping

Open Source Implementation: ARP

- Data structure
  - Hash table: arp_table
  - Hash parameters: a primary key and device interface index
- Functions
  - arp_send(): sets up the ARP header and then transmits
  - arp_rcv(): deals only with reply or request operations
    - Request: calls ip_route_input(); if the route is local, calls arp_send() to send out an ARP reply. Otherwise, if the host is an ARP proxy, it also sends an ARP reply.
    - Reply: updates the ARP table
  - __neigh_lookup(): calls neigh_lookup() to search the ARP hash table; if not found, creates an entry
  - eth_rebuild_header() (old) or arp_solicit() calls arp_send()

Address Configuration

Dynamic Host Configuration Protocol (DHCP)

Address Configuration

- What is address configuration
  - Automatically and dynamically assign an IP address to a host
- Why address configuration
  - Setting an IP address by hand is error prone
  - Insufficient IP addresses: share IP addresses among hosts
  - Better network management

DHCP Protocol

- Dynamic Host Configuration Protocol
- DHCP is derived from BOOTP
  - Some fields are not for host configuration
- Operations
  - A host broadcasts a DHCPDISCOVER message
  - A DHCP server receives it and replies
  - Or a DHCP relay server receives it, forwards it to the DHCP server, gets the configuration, and relays it to the host
  - DHCP messages are sent over UDP (port 67)

State Diagram for DHCP Client

[Figure: DHCP client state diagram with states Initial, Offer, Request, Bind, Renew, and Rebind. Transitions are labeled with the triggering event (DHCPOFFER, DHCPACK, DHCPNACK, renewal expires, rebinding expires, lease expires) and the message sent in response (/DHCPDISCOVER, /DHCPREQUEST).]

DHCP Packet Format

Rows (32 bits each; bit offsets 0, 8, 16, 24, 31):

  Operation | Hard. Type | Hardware Len | Hops
  Transaction ID
  Seconds | B | Flags
  Client IP Address
  Your IP Address
  Server IP Address
  Router IP Address
  Client Hardware Address (16 octets)
  Server Host Name (64 octets)
  Boot File Name (128 octets)
  Options (variable)

DHCP Packet Format

- More information for host configuration
  - such as default router, subnet mask
  - encoded in the option field (code=55, length, parameter)

ID   Request Parameter
1    Subnet mask
3    Default gateway
6    DNS server
12   Host name
15   Domain name
17   Boot path
40   NIS domain name

DHCP Packet Format

- Options
  - Option field starts with three fields: code (53), length (1), type (1-7)

Type   DHCP Message
1      DHCPDISCOVER
2      DHCPOFFER
3      DHCPREQUEST
4      DHCPDECLINE
5      DHCPACK
6      DHCPNACK
7      DHCPRELEASE
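Options are a sequence of (code, length, value) triples, so finding the message type means walking that sequence until code 53 appears. The sketch below assumes the caller passes only the options area (after the fixed fields and magic cookie); `dhcp_find_option()` and `dhcp_message_type()` are hypothetical names. The pad (0) and end (255) codes are single-byte options defined by the DHCP options specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk the option list and return a pointer to the value of option `code`,
 * storing its length in *len. Returns NULL if the option is absent or the
 * buffer is truncated. */
const uint8_t *dhcp_find_option(const uint8_t *opt, size_t n,
                                uint8_t code, uint8_t *len)
{
    size_t i = 0;
    while (i < n) {
        uint8_t c = opt[i];
        if (c == 255)                      /* end option */
            break;
        if (c == 0) {                      /* pad option: no length byte */
            i++;
            continue;
        }
        if (i + 1 >= n || i + 2 + opt[i + 1] > n)
            break;                         /* truncated option */
        if (c == code) {
            *len = opt[i + 1];
            return &opt[i + 2];
        }
        i += 2u + opt[i + 1];              /* skip code, length, and value */
    }
    return NULL;
}

/* Return the DHCP message type (1..7 as in the table above) or -1. */
int dhcp_message_type(const uint8_t *opt, size_t n)
{
    uint8_t len = 0;
    const uint8_t *v = dhcp_find_option(opt, n, 53, &len);
    return (v != NULL && len == 1) ? v[0] : -1;
}
```

For the option bytes `53,1,1, 55,4,1,3,6,15, 255`, the message type is 1 (DHCPDISCOVER) and option 55 requests the subnet mask, default gateway, DNS server, and domain name.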

Open Source Implementation: DHCP Client

[Flow: ip_auto_config() -> ic_dynamic() -> ic_bootp_send_if() -> ic_dhcp_init_options()]

    struct bootp_pkt {          /* BOOTP packet format */
        struct iphdr iph;       /* IP header */
        struct udphdr udph;     /* UDP header */
        u8 op;                  /* 1=request, 2=reply */
        u8 htype;               /* HW address type */
        u8 hlen;                /* HW address length */
        u8 hops;                /* Used only by gateways */
        u32 xid;                /* Transaction ID */
        u16 secs;               /* Seconds since we started */
        u16 flags;              /* Just what it says */
        u32 client_ip;          /* Client's IP address if known */
        u32 your_ip;            /* Assigned IP address */
        u32 server_ip;          /* (Next, e.g. NFS) Server's IP address */
        u32 relay_ip;           /* IP address of BOOTP relay */
        u8 hw_addr[16];         /* Client's HW address */
        u8 serv_name[64];       /* Server host name */
        u8 boot_file[128];      /* Name of boot file */
        u8 exten[312];          /* DHCP options / BOOTP vendor extensions */
    };

3.5 Error Reporting

Internet Control Message Protocol (ICMP)

Error Control Protocol

- What is error control protocol
  - A protocol for reporting error or status of TCP/IP at a remote site (router or host)
- Why error control protocol
  - For monitoring the status of TCP/IP at each host/router
  - For reporting errors between hosts or routers

ICMP

- ICMP runs over IP

[Figure: an ICMP message (ICMP Header | ICMP Data) is carried as the IP Data of an IP packet (IP Header | IP Data).]

ICMPv4 Packet Format

- Type and Code are used to identify an error event
- Data reports the header and first 8 bytes of the error packet

Rows (32 bits each; bit offsets 0, 8, 16, 24, 31):

  Type | Code | Checksum
  Data

Type and Code

Type  Code  Description
0     0     Echo reply (ping)
3     0     Destination network unreachable
3     1     Destination host unreachable
3     2     Destination protocol unreachable
3     3     Destination port unreachable
3     4     Fragmentation needed and DF set
3     5     Source route failed
3     6     Destination network unknown
3     7     Destination host unknown
4     0     Source quench (congestion control)
5     0     Redirect (destination network)
5     1     Redirect (host)
8     0     Echo request (ping)
9     0     Router advertisement
10    0     Router discovery
11    0     TTL expired
12    0     Bad IP header

ICMPv4 Examples

- Echo Request/Reply
  - Source sends an echo request to a destination
  - Destination responds with an echo reply
  - Type and code of Echo Request and Reply are (8, 0) and (0, 0), respectively
  - ping uses echo request and reply
- Destination Unreachable (type=3)
  - Possible errors: network unreachable (code=0), host unreachable (code=1), protocol unreachable (code=2), port unreachable (code=3), source route failed (code=5), destination network unknown (code=6), destination host unknown (code=7)

ICMPv4 Examples (cont.)

  - If the do-not-fragment bit is set and the packet is larger than the link MTU, the router sends a fragmentation required (type=3, code=4) ICMP message to the source
  - Source Quench
    - when a buffer overflows, the router sends a source quench (type=4) to the source
  - Routing redirect
    - If a host forwards a packet to a wrong router, the router sends a redirect (type=5, code=0 or 1 for network/host) ICMP message to the source
  - IP header error
    - Wrong IP header, such as a wrong option field (type=12)

Copyright reserved 2001 (Lin & Hwang) 103 Copyright reserved 2001 (Lin & Hwang) 104 ICMPv4 Exampp(le (cont.) ICMPv6

‰ Time E xceed e d „ NtNew type an ddd code „ If TTL is less or equal to zero (after decrement), router sends ‰ Type 0..127: error report a Time Exceeded ((yptype=11 ) ICMP messa ge to source „ 1: Destination unreachable „ traceroute implementation „ 2: Packet too big ‰ traceroute sends an ICMP echo request with TTL=1 to the „ 3: Time Exceeded target machine „ 4: Parameter problem ‰ When the first router receives the message, it responds with a time exceeded message ‰ Type 128..255: informational ‰ traceroute then sends another echo request with TTL= 2 „ 128, 129: Echo request & reply ‰ The message passes the first router, but discarded by the „ 130, 131, 132: Multicast group membership management second router with a returned time exceeded message „ 133,134: Router solicitation and advertisement ‰ Traceroute repeats sending echo requests until it receives „ 135, 136: Neighbor solicitation and advertisement an echo reply from the target machine „ 137: Redirect

ICMPv6

Type  Code  Description
1     0     No route to destination
1     1     Communication with destination administratively prohibited
1     3     Address unreachable
1     4     Port unreachable
2     0     Packet too big
3     0     Hop limit exceeded in transit
3     1     Fragment reassembly time exceeded
4     0     Erroneous header field encountered
4     1     Unrecognized Next Header type
4     2     Unrecognized IPv6 option encountered
128   0     Echo request
129   0     Echo reply
130   0     Multicast Listener Query
131   0     Multicast Listener Report
132   0     Multicast Listener Done
133   0     Router Solicitation
134   0     Router Advertisement
135   0     Neighbor Solicitation
136   0     Neighbor Advertisement
137   0     Redirect

Open Source Implementation: ICMP

„ Error when forwarding IP packets
  ‰ ip_forward() -> icmp_send()
    „ TTL
    „ Strict source routing
    „ Route redirect
    „ destination unreachable (ICMP_FRAG_NEEDED)
„ Error when receiving IP packets
  ‰ ip_route_input_slow() -> ip_error() -> icmp_send()
    „ destination unreachable

Open Source Implementation: ICMP (cont.)

„ Receiving ICMP packets
  ‰ Control handlers: icmp_pointers[]
    „ icmp_unreach() for type 3, 4, 11, and 12
    „ icmp_redirect() for type 5
    „ icmp_echo() for type 8
    „ icmp_timestamp() for type 13
    „ icmp_address() for type 17
    „ icmp_address_reply() for type 18
    „ icmp_discard() for other types
  ‰ icmp_rcv() -> icmp_pointers
  ‰ ICMPv6
    „ icmpv6_send()
    „ icmpv6_rcv() -> icmpv6_echo_reply(), icmpv6_notify()

3.6 Routing

„ Principle
„ Intra-domain routing
„ Inter-domain routing

Routing

„ Link State Routing
„ Distance Vector Routing

Routing Principle

„ Task of routing
  ‰ Select a path from the source to the destination
„ Goal of routing
  ‰ Efficient (low delay, high throughput, …)
  ‰ Scalable
  ‰ Stable
  ‰ Robust
  ‰ Fair

Optimality of IP Routing

„ IP uses hop-by-hop routing (forwarding)
  ‰ Each router determines its own routing table
  ‰ Why will packets be delivered to their destinations along the optimal path?
    „ If k is an intermediate node on the optimal path from source node s to destination d
    „ then the part of that path from s to k is itself the optimal path from s to k
    „ A shortest path tree can be constructed from a source to the rest of the graph.

Routing Algorithm Classification

„ Global or decentralized information?
  ‰ Link State routing: uses the Dijkstra algorithm
  ‰ Distance Vector routing: uses the distributed Bellman-Ford algorithm
„ Static or dynamic (adaptive)?
  ‰ Fixed routing table, set up manually
  ‰ Routing table adapts to network status

The Shortest Path Algorithm

„ View a network as a graph
  ‰ Nodes are routers
  ‰ Edges are physical links
    „ Associated with a link cost: delay, congestion level, …
„ Find the least cost path
  ‰ Depends on information available

Link-State Routing

„ Routing information
  ‰ Global information is available by reliable broadcasting
  ‰ Dynamic: information exchanged when topology changes or periodically
„ Path calculation
  ‰ Dijkstra algorithm

Dijkstra Algorithm

For each v in V-{s} {
    If v is adjacent to s
        C(v) = lc(s,v), p(v) = s
    else
        C(v) = ∞
}
T = {s}
While (T ≠ V) {
    find w not in T s.t. C(w) is the minimum for all w in (V-T)
    T = T ∪ {w}
    For each v in V-T
        If (C(w)+lc(w,v) < C(v)) {
            C(v) = C(w)+lc(w,v)
            p(v) = w
        }
}

Dijkstra Algorithm Example

[Figure: five-node network A, B, C, D, E. Link costs consistent with the table below: A-B 4, A-C 1, B-C 2, B-D 1, C-D 3, C-E 1, D-E 1]

Iteration  T      C(B),p(B)  C(C),p(C)  C(D),p(D)  C(E),p(E)
0          A      4,A        1,A        ∞          ∞
1          AC     3,C                   4,C        2,C
2          ACE    3,C                   3,E
3          ACEB                         3,E
4          ACEBD
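The pseudocode above can be turned into a runnable sketch. The version below uses a priority queue instead of the linear "find w" scan, which is a common variation rather than the slide's exact procedure. The `GRAPH` link costs are an assumption read off the iteration table of the example, and the names `dijkstra` and `routing_table` are hypothetical.

```python
import heapq

# Link costs assumed from the example (inferred from the iteration table).
GRAPH = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 1},
    "C": {"A": 1, "B": 2, "D": 3, "E": 1},
    "D": {"B": 1, "C": 3, "E": 1},
    "E": {"C": 1, "D": 1},
}


def dijkstra(graph, source):
    """Return (cost, pred): least cost to each node and its predecessor on that path."""
    cost = {source: 0}
    pred = {}
    done = set()               # the set T of the pseudocode
    heap = [(0, source)]
    while heap:
        c, w = heapq.heappop(heap)
        if w in done:
            continue           # stale queue entry
        done.add(w)
        for v, link in graph[w].items():
            if v not in done and c + link < cost.get(v, float("inf")):
                cost[v] = c + link
                pred[v] = w
                heapq.heappush(heap, (cost[v], v))
    return cost, pred


def routing_table(graph, source):
    """Map each destination to (cost, next hop) as seen from `source`."""
    cost, pred = dijkstra(graph, source)
    table = {}
    for dest in graph:
        if dest == source:
            continue
        hop = dest
        while pred[hop] != source:   # walk back to the first hop after the source
            hop = pred[hop]
        table[dest] = (cost[dest], hop)
    return table


print(routing_table(GRAPH, "A"))
```

Walking the predecessor chain back to the source yields the next hop, which is what a forwarding table actually stores.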

Routing Table at Node A

Destination  Cost  NextHop
B            3     C
C            1     C
D            3     C
E            2     C

Distance Vector Algorithm

„ Routing information
  ‰ Only local information is known
    „ Knows status of adjacent links and routing information of adjacent nodes
  ‰ Dynamic: information exchanged when link cost or shortest path changes
„ Path calculation
  ‰ Bellman-Ford

Bellman-Ford Algorithm

While (1) {
    If x received route update message from y {
        For each (Dest, Distance) pair in y’s report {
            If (Dest is new) { /* Dest not in routing table */
                Add a new entry for destination Dest
                rt(Dest).distance = Distance+lc(x,y)
                rt(Dest).NextHop = y
            }
            else if ((Distance+lc(x,y))

Bellman-Ford Algorithm Example: Step 1

(Dt. = destination, C = cost, NH = next hop)

Node A            Node B            Node D
Dt.  C  NH        Dt.  C  NH        Dt.  C  NH
B    4  B         A    4  A         B    1  B
C    1  C         C    2  C         C    3  C
                  D    1  D         E    1  E

Bellman-Ford Algorithm Example: Step 2

(Dt. = destination, C = cost, NH = next hop)

Node A         Node B         Node C         Node D         Node E
Dt.  C  NH     Dt.  C  NH     Dt.  C  NH     Dt.  C  NH     Dt.  C  NH
B    3  C      A    3  C      A    1  A      A    4  C      A    2  C
C    1  C      C    2  C      B    2  B      B    1  B      B    2  D
D    4  C      D    1  D      D    2  E      C    2  E      C    1  C
E    2  C      E    2  D      E    1  E      E    1  E      D    1  D

Bellman-Ford Algorithm Example

„ Routing table of node A after convergence

Destination  Cost  NextHop
B            3     C
C            1     C
D            3     C
E            2     C
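The distance-vector exchange in this example can be reproduced with a short simulation. This is a sketch under stated assumptions: rounds are synchronous (every node processes its neighbours' tables from the previous round), the `GRAPH` link costs are inferred from the example tables, and the replacement rule (accept a route when it is strictly cheaper) is the standard Bellman-Ford relaxation, not necessarily the exact condition on the slide. The function names are hypothetical.

```python
# Link costs assumed from the example tables.
GRAPH = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 1},
    "C": {"A": 1, "B": 2, "D": 3, "E": 1},
    "D": {"B": 1, "C": 3, "E": 1},
    "E": {"C": 1, "D": 1},
}


def initial_tables(graph):
    """Step 1: every node only knows its directly connected neighbours."""
    return {x: {y: (c, y) for y, c in nbrs.items()} for x, nbrs in graph.items()}


def exchange(graph, tables):
    """One synchronous round of route updates; returns the new tables."""
    new = {x: dict(rt) for x, rt in tables.items()}
    for x in graph:
        for y, link in graph[x].items():           # report received from neighbour y
            for dest, (dist, _) in tables[y].items():
                if dest == x:
                    continue
                if dest not in new[x] or dist + link < new[x][dest][0]:
                    new[x][dest] = (dist + link, y)
    return new


def converge(graph):
    """Repeat exchanges until no table changes."""
    tables = initial_tables(graph)
    while True:
        new = exchange(graph, tables)
        if new == tables:
            return tables
        tables = new


step2 = exchange(GRAPH, initial_tables(GRAPH))
print(step2["A"])
print(converge(GRAPH)["A"])
```

After one exchange node A reaches D at cost 4 through C; one more round lowers it to 3 once C has learned the route to D through E.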

Problem with DV Routing

„ Phenomenon
  ‰ good news travels fast
  ‰ bad news travels slowly

[Figure: the example network after two different link-cost changes. Left: route updated in two iterations. Right: route updated in more than 25 iterations.]

Problem with DV Routing

„ Routing loop
  ‰ Due to the above phenomenon
  ‰ Loop formed before routing converged
„ Partial solutions
  ‰ Poisoned reverse
  ‰ Split horizon
  ‰ Hold down timer

Hierarchical Routing

„ Not a flat network: too many routing entries
„ Define an AS
  ‰ Routers within an AS are under the same administrative control
„ Routing within an AS and between AS’s
  ‰ Intra-domain routing
  ‰ Inter-domain routing

AS

„ The Internet consists of Autonomous Systems (AS) interconnected with each other:
  ‰ Stub AS: small corporation
  ‰ Multihomed AS: large corporation (no transit)
  ‰ Transit AS: provider
„ Two-level routing:
  ‰ Intra-AS: routing within an AS
  ‰ Inter-AS: routing between AS’s

An Example of Hierarchical Routing

[Figure: three domains. Domain A (routers A.1, A.2, A.3), Domain B (B.1, B.2, B.3, B.4), Domain C (C.1, C.2, C.3). Inter-domain routers (exterior gateways) connect the domains; intra-domain routers (interior gateways) stay within a domain.]

Example of Internet Routing Protocols

„ Intradomain routing
  ‰ RIP
  ‰ OSPF
„ Interdomain routing
  ‰ BGP-4

Intra-domain Routing

„ Routing Information Protocol (RIP)
„ Open Shortest Path First (OSPF)

Intra-domain Routing

„ What is intra-domain routing
  ‰ Routing within a domain (AS)
  ‰ Administrator decides the routing protocol
  ‰ Administrator has total control on all routers
„ Why intra-domain routing
  ‰ Maintain connectivity within a domain

Intra-domain Routing

„ Runs Interior Gateway Protocols (IGP)
„ Most common IGPs
  ‰ RIP: Routing Information Protocol
  ‰ OSPF: Open Shortest Path First

RIP

„ Originally designed for Xerox PARC Universal Protocol (used in XNS)
„ Adopted by UNIX and TCP/IP in 1982
  ‰ routed of BSD
„ RIP: RFC 1058 [1988]
„ RIPv2: RFC 1388 [1993]

RIP

„ Distance Vector routing
  ‰ use hop count as cost metric (up to 15)
  ‰ restrict size of the network to 15
  ‰ Exchange routing message (advertisement)
    „ every 30 seconds
  ‰ Each advertisement consists of up to 25 routes (destination nets)

RIPv2 Packet Format

0          8          16         24       31
Command  | Version  | Must be zero
Family of net 1     | Route Tag for net 1
Address of net 1
Subnet Mask for net 1
Next Hop for net 1
Distance to net 1
Family of net 2     | Route Tag for net 2
Address of net 2
Subnet Mask for net 2
Next Hop for net 2
Distance to net 2
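The RIPv2 layout above maps directly onto fixed-size binary records: a 4-byte header followed by one 20-byte entry per route. The sketch below packs a response message with Python's `struct` module. The function name `pack_response` and the sample addresses are hypothetical; the field widths follow the diagram, command value 2 is assumed to denote a response, and address family 2 is assumed to denote IP.

```python
import socket
import struct

RIP_HEADER = struct.Struct("!BBH")        # command, version, must be zero
RIP_ENTRY = struct.Struct("!HH4s4s4sI")   # family, route tag, address, mask, next hop, distance
MAX_ROUTES = 25                           # limit stated on the RIP slide


def pack_response(routes):
    """routes: list of (address, mask, next_hop, distance) with dotted-quad strings."""
    if len(routes) > MAX_ROUTES:
        raise ValueError("a RIP advertisement carries at most 25 routes")
    message = RIP_HEADER.pack(2, 2, 0)    # command 2 (response, assumed), version 2
    for address, mask, next_hop, distance in routes:
        message += RIP_ENTRY.pack(
            2,                            # address family: IP (assumed value)
            0,                            # route tag
            socket.inet_aton(address),
            socket.inet_aton(mask),
            socket.inet_aton(next_hop),
            distance,
        )
    return message


msg = pack_response([("140.123.240.0", "255.255.255.0", "140.123.1.250", 4)])
print(len(msg), msg.hex())
```

The `!` prefix selects network byte order, so the bytes come out in the order the diagram shows.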

RIP Packet Format and Stability

„ RIP packet format
  ‰ commands: request or reply, version number
  ‰ up to 25 destination addresses
„ Stability
  ‰ hop count limit: 15 (a metric of 16 means infinity)
  ‰ Stabilization Timer:
    „ allows RIP to learn all routes from its neighbors before sending full updates
  ‰ Split horizon
    „ no update on backward route (omits routes learned from that neighbor)
  ‰ Poison Reverse Update
    „ updates sent to a neighbor include routes learned from that neighbor but set the route metric to infinity

Routing Table of RIP

„ Taken from a Cisco router at cs.ccu.edu.tw

Destination        Gateway             Distance/Hop  Update timer  Flag  Interface
35.0.0.0/8         140.123.1.250       120/1         00:00:28      R     Vlan1
127.0.0.0/8        directly connected                              C     Vlan0
136.142.0.0/16     140.123.1.250       120/1         00:00:17      R     Vlan1
150.144.0.0/16     140.123.1.250       120/1         00:00:08      R     Vlan1
140.123.230.0/24   directly connected                              C     Vlan230
140.123.240.0/24   140.123.1.250       120/4         00:00:22      R     Vlan1
140.123.241.0/24   140.123.1.250       120/3         00:00:22      R     Vlan1
140.123.242.0/24   140.123.1.250       120/1         00:00:22      R     Vlan1
192.152.102.0/24   140.123.1.250       120/3         00:01:04      R     Vlan1
0.0.0.0/0          140.123.1.250       120/3         00:00:08      R     Vlan1
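Split horizon and poison reverse differ only in what a router tells the neighbor it learned a route from. The sketch below builds the advertisement for one neighbor either way. The function name `advertisement` and the table layout (destination mapped to metric and next hop) are hypothetical, and 16 is used as the RIP infinity metric.

```python
INFINITY = 16   # RIP metric meaning "unreachable"


def advertisement(table, neighbor, poison_reverse=False):
    """Build the routes advertised to `neighbor`.

    table maps destination -> (metric, next_hop).
    """
    routes = {}
    for dest, (metric, next_hop) in table.items():
        if next_hop == neighbor:
            # Route was learned from this neighbor.
            if poison_reverse:
                routes[dest] = INFINITY   # advertise it, but as unreachable
            continue                      # split horizon: omit it entirely
        routes[dest] = metric
    return routes


table = {"B": (3, "C"), "C": (1, "C"), "D": (3, "C"), "E": (2, "C")}
print(advertisement(table, "C"))
print(advertisement(table, "C", poison_reverse=True))
print(advertisement(table, "B"))
```

With plain split horizon the neighbor simply hears nothing about those destinations; with poison reverse it is told explicitly that they are unreachable through this router, which breaks two-node loops sooner at the cost of larger updates.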

Open Source Implementation

„ GNU Zebra Project (Zebra, routed, gated, …)
  ‰ Supports many routing protocols
    „ RIP, OSPF, BGP
  ‰ Runs routing daemon as user process
    „ Communicates with kernel via netlink

Routing Daemon and Kernel

[Figure: in user space, the routing manager handles protocol-specific packets. In kernel space, the kernel holds the routing table. Packets from NICs are split into control packets, passed up to the routing manager, and data packets, forwarded using the routing table.]

Overview of Zebra Routing Protocols

[Figure: the routing protocol daemons (RIPd, OSPFd, BGPd, RIPngd) exchange routing information with the Zebra daemon using the Zebra protocol (via socket interface); the Zebra daemon updates the kernel routing table.]

Zebra and Netlink/Rtnetlink

[Figure: the Zebra daemon reaches the kernel through ioctl, sysctl, proc fs, and netlink / rtnetlink.]

Client Server Interaction in Zebra

[Figure: Zebra IPv4 route message API.
Zebra server: makes the zebra server socket. Zebra server APIs: zsend_interface_{add,delete}, zsend_interface_address_{add,delete}, zsend_interface_{up,down}, zsend_ipv4_{add,delete}, zsend_ipv4_{add,delete}_multipath.
Zebra client: zclient_init() installs callback functions; zclient_connect opens the Zebra connection. Zebra client APIs: zapi_ipv4_{add, delete}. Callback functions: zebra_interface_add_read, zebra_interface_state_read, zebra_interface_address_{add,delete}_read.]

Zebra Client/Server Protocol

/* Structure for the zebra client. */
struct zclient
{
  /* Other data structures here */
  …
  /* Pointer to the callback functions. */
  int (*interface_add) (…);
  int (*interface_delete) (…);
  int (*interface_up) (…);
  int (*interface_down) (…);
  int (*interface_address_add) (…);
  int (*interface_address_delete) (…);
  int (*ipv4_route_add) (…);
  int (*ipv4_route_delete) (…);
};

RIP Daemon (ripd)

[Figure: ripd components: Initialization, Scheduling, Interface, RIP Peer, RIP core, Zebra client, table, routemap, offset, and the Zebra Daemon. Functions shown: rip_version, rip_default_metric, rip_timers, rip_route, rip_distance, rip_network, rip_neighbor, rip_passive_interface, ip_rip_version, ip_rip_authentication, rip_split_horizon, rip_peer_timeout, rip_peer_update, rip_peer_display.]

OSPF Features

„ Link-state routing protocol
„ Runs internal to a single Autonomous System
„ A shortest-path tree is constructed for the routing table
  ‰ Dijkstra algorithm
„ Support for equal-cost multipath routing
„ Support for TOS-based routing
„ Supports variable subnet length
  ‰ each route distributed has a destination and mask

OSPF Features (cont.)

„ Integrated uni- and multicast support:
  ‰ Multicast OSPF (MOSPF) uses the same topology database as OSPF
„ Two levels of hierarchy: areas within an AS
  ‰ Area: a group of contiguous networks and hosts
    „ Topology of an area is invisible from outside
  ‰ Routing in the AS takes place on two levels
    „ intra-area routing, inter-area routing

OSPF: Two Levels of Hierarchy

[Figure: a backbone containing an AS boundary router, a backbone router, and two area border routers; Areas A, B, and C each contain an internal router.]

OSPF Features (cont.)

„ External routing data is advertised through the AS
  ‰ Flooded without modification
  ‰ Two types of cost
    „ type 1: compatible with costs within area; cost to an external network is the sum of internal cost and external cost
    „ type 2: order of magnitude larger; cost to an external network is solely determined by external cost

OSPF

„ Features
  ‰ Supports stub areas to reduce broadcasting
    „ An area can be configured as stub when there is a single exit point from the area.
    „ Virtual links cannot be configured through stub areas.
    „ AS boundary routers cannot be placed internal to stub areas.
    „ No AS external advertisements are flooded into/through stub areas.

[Figure: example OSPF autonomous system with routers RT1 to RT12, networks N1 to N15, host H1, and backbone interfaces Ia and Ib, divided into Area 1, Area 2, and Area 3 (stub). Legend: internal router, area border router, AS boundary router.]

OSPF Hierarchy

„ Area border routers
  ‰ “summarize” distances to networks of their area
  ‰ advertise to other area border routers
„ Backbone routers
  ‰ run OSPF routing limited to backbone
„ Boundary routers
  ‰ connect to other AS’s

OSPF Example: Intra-area

„ Summarized area information advertised by RT3 and RT4 to backbone.

Network  Cost advertised by RT3  Cost advertised by RT4
N1       4                       4
N2       4                       4
N3       1                       1
N4       2                       3

OSPF Example: Inter-area

„ Backbone information advertised into area 1 by RT3 and RT4.

Destination  Cost advertised by RT3  Cost advertised by RT4
Ia, Ib       20                      27
N6           16                      15
N7           20                      19
N8           18                      18
N9-N11       29                      36
RT5          14                      8
RT7          20                      14

OSPF Example: Final Routing Table

„ RT4’s routing table

Destination  Path Type        Cost  Next Hop
N1           intra-area       4     RT1
N2           intra-area       4     RT2
N3           intra-area       1     direct
N4           intra-area       3     RT3
N6           inter-area       15    RT5
N7           inter-area       19    RT5
N8           inter-area       18    RT5
N9-N11       inter-area       36    RT5
N12          Type 1 external  16    RT5
N13          Type 1 external  16    RT5
N14          Type 1 external  16    RT5
N15          Type 1 external  23    RT5

OSPF Packet Format: Header

„ Five types of OSPF message

Type  Description
1     Hello
2     Database Description
3     Link State Request
4     Link State Update
5     Link State Acknowledgment

Common header

0          8          16         24       31
Version  | Type     | Packet Length
Router ID
Area ID
Checksum            | Authentication Type
Authentication
Authentication
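The common header above is 24 bytes long and can be packed and parsed with Python's `struct` module. The sketch below is illustrative only: the function names and sample router IDs are hypothetical, the checksum is left at zero instead of being computed, and version 2 is assumed for OSPF over IPv4.

```python
import socket
import struct

# version, type, packet length, router ID, area ID, checksum, auth type, authentication
OSPF_HEADER = struct.Struct("!BBH4s4sHH8s")

MESSAGE_TYPES = {
    1: "Hello",
    2: "Database Description",
    3: "Link State Request",
    4: "Link State Update",
    5: "Link State Acknowledgment",
}


def pack_header(msg_type, router_id, area_id, body_length=0):
    """Build a common header; the checksum is left as 0 in this sketch."""
    return OSPF_HEADER.pack(
        2,                                    # OSPF version (assumed: 2)
        msg_type,
        OSPF_HEADER.size + body_length,       # packet length includes the header
        socket.inet_aton(router_id),
        socket.inet_aton(area_id),
        0,                                    # checksum placeholder
        0,                                    # authentication type 0
        bytes(8),                             # 64-bit authentication field
    )


def describe(header):
    """Return (message type name, router ID, packet length) from a packed header."""
    version, msg_type, length, router_id, area_id, _, _, _ = OSPF_HEADER.unpack(header)
    return MESSAGE_TYPES.get(msg_type, "unknown"), socket.inet_ntoa(router_id), length


hello = pack_header(1, "10.0.0.4", "0.0.0.1")
print(describe(hello))
```

The two Authentication rows of the diagram together form the single 8-byte field at the end of the format string.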

OSPF LSAs

„ Routing information is carried by LSAs

LS Type  LS Name                     Originated by       Scope of Flood    Description
1        Router-LSAs                 All routers         Area              Describes the collected states of the router's interfaces to an area.
2        Network-LSAs                Designated router   Area              Contains the list of routers connected to the network.
3        Summary-LSAs (IP network)   Area border router  Associated areas  Describes routes to inter-area networks.
4        Summary-LSAs (ASBR)         Area border router  Associated areas  Describes routes to AS boundary routers.
5        AS-external-LSAs            AS boundary router  AS                Describes routes to other ASs.

OSPF Daemon of Zebra

[Figure: ospfd components: Initialization, Scheduling, Interface (ip_ospf_interface, ip_ospf_neighbor), OSPF core (ospf_router_id, network_area, show_ip_ospf_cmd), zclient, Zebra daemon, Link State Advertisement, Route, Network, Route Map (route_map_update, route_map_event), OSPF Flooding, LSDB, SPF calculation, ASE (AS external route calculation).]

Inter-domain Routing

„ Border Gateway Protocol (BGP)

Inter-domain Routing

„ Called Exterior Gateway Protocols (EGP)
„ Most common EGP
  ‰ BGP: Border Gateway Protocol

BGP Features

„ RFC 1771 (BGP-4)
„ “Path vector” routing
  ‰ loop free inter-domain routing between ASs
„ Runs over TCP with port 179
„ Routing table keeps all feasible paths
  ‰ Only advertises optimal path to neighbors

BGP Features (cont.)

„ Can be used within and between ASs
  ‰ multiple border routers (BGP speakers) within an AS
  ‰ IBGP: Interior BGP
    „ runs between routers in the same AS
    „ All BGP speakers within the AS must be fully meshed (through IGP protocol)
  ‰ EBGP: Exterior BGP
    „ runs between routers belonging to two different ASs

BGP Features (cont.)

„ Support information aggregation
  ‰ CIDR
  ‰ Confederation
    „ could also be used to allow multiple ASs within an AS
„ Policy routing at AS
  ‰ access-list permit or deny (route or path filtering)
„ Link cost metric
  ‰ combination of different metrics with the degree of preference (weight, loc pref, med, …)

BGP Messages

„ Open
  ‰ First message sent after connection
„ Keepalive
  ‰ Sent often enough to keep the timer from expiring
„ Update
  ‰ No periodic refresh of the entire table
  ‰ Advertise a single feasible route to a peer
  ‰ Withdraw multiple routes previously advertised
  ‰ Message contains path attributes and Network Layer Reachability Information (NLRI)
„ Notification
  ‰ sent when an error is detected

BGP Routing Algorithm

„ Path vector routing
  ‰ Different ASs may have different link cost metrics
  ‰ Loop free is very important
  ‰ Policy routing is preferred (different priorities, prohibit lists, …)
  ‰ AS_PATH of the path attribute

BGP Path Selection

„ Path selection
  (1) If Next_Hop is inaccessible, drop the update
  (2) Prefer largest LOCAL_PREF
  (3) Prefer shorter AS_PATH
  (4) Prefer lower origin code (igp
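The first three selection rules can be sketched as a filter followed by a lexicographic comparison. This is a simplified illustration, not a full BGP decision process: the `Path` class and `best_path` function are hypothetical names, the origin-code rule and any later tie-breakers are left out, and the sample paths (with LOCAL_PREF assumed to be 0 throughout) are made up for the demonstration.

```python
from dataclasses import dataclass


@dataclass
class Path:
    next_hop: str
    as_path: tuple
    local_pref: int = 0
    next_hop_reachable: bool = True


def best_path(paths):
    """Apply rules (1) to (3): drop unreachable next hops, then prefer the
    largest LOCAL_PREF, then the shortest AS_PATH."""
    candidates = [p for p in paths if p.next_hop_reachable]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (-p.local_pref, len(p.as_path)))


paths = [
    Path("139.175.56.165", (4780, 9739)),
    Path("140.123.231.103", (9918, 4780, 9739)),
    Path("140.123.231.100", (9739,)),
]
print(best_path(paths).next_hop)
```

Negating LOCAL_PREF in the sort key lets a single `min` call express "largest LOCAL_PREF first, then shortest AS_PATH".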

BGP PATH Attributes

„ Origin
  ‰ Defines the origin of the path information
    „ IGP, EGP, Incomplete (unknown, e.g., static route)
„ AS_PATH
  ‰ Ordered list or a set
„ Next_Hop
  ‰ IP of the next hop to the destination
  ‰ For a multiaccess network, the next hop could be a router other than the BGP speaker

BGP PATH Attributes (cont.)

„ LOCAL_PREF
  ‰ Indicates the preferred exit router within an AS
„ Multi_Exit_Disc (MED)
  ‰ When a router has multiple external links to the same AS, the link to the router with lower MED is preferred.

BGP Example

Network           Next Hop         LOCAL_PREF  Weight  Best?  PATH                  Origin
61.13.0.0/16      139.175.56.165               0       N      4780,9739             IGP
                  140.123.231.103              0       N      9918,4780,9739        IGP
                  140.123.231.100  0           0       Y      9739                  IGP
61.251.128.0/20   139.175.56.165               0       Y      4780,9277,17577       IGP
                  140.123.231.103              0       N      9918,4780,9277,17577  IGP
211.73.128.0/19   210.241.222.62               0       Y      9674                  IGP
218.32.0.0/17     139.175.56.165               0       N      4780,9919             IGP
                  140.123.231.103              0       N      9918,4780,9919        IGP
                  140.123.231.106              0       Y      9919                  IGP
218.32.128.0/17   139.175.56.165               0       N      4780,9919             IGP
                  140.123.231.103              0       N      9918,4780,9919        IGP
                  140.123.231.106              0       Y      9919                  IGP

Open Source Implementation

3.7 Multicast

„ Multicast Backbone (MBONE)
„ Internet Group Management Protocol (IGMP)
„ Distance Vector Multicast Routing Protocol (DVMRP)
„ Protocol-Independent Multicast (PIM)

Multicast

„ What is multicast?
„ Protocols
  ‰ Internet Group Management Protocol V2
  ‰ Distance Vector Multicast Routing Protocol
  ‰ Protocol-Independent Multicast (PIM) - Sparse Mode (SM)
„ Open Source Implementation
  ‰ Trace of IGMP
  ‰ Trace of DVMRP

Multicast

„ Communication among more than two parties
  ‰ Multi-party video conferencing
  ‰ Distance learning
„ Issues
  ‰ Maintain group member information
  ‰ Construct a multicast tree for packet transmission
  ‰ Many to many communication

Membership Management

„ IGMP

Internet Group Management Protocol (IGMPv2)

„ RFC 2236
„ Used by IP hosts to report multicast group memberships to routers
„ Enhances IGMPv1
  ‰ Querier election mechanism
  ‰ IGMPv2 Leave Group message
  ‰ Group-Specific Query message

Protocol Overview

„ Multicast router plays one of the two roles: Querier or Non-Querier
  ‰ Querier is responsible for maintaining membership information
  ‰ Router with the smallest IP address becomes the Querier
    „ Routers hear the Query messages and make the judgment
  ‰ Querier periodically sends General Query to solicit membership information
  ‰ A General Query is sent to 224.0.0.1 (ALL-SYSTEMS multicast group)

Protocol Overview (cont.)

„ When a host receives a General Query
  ‰ Delays a random time from the range of [0..Max Response Time] (starts a timer)
    „ Max Resp. Time is given in the Query message
  ‰ Sends a report with TTL=1 when the timer expires
  ‰ Report suppression
    „ If another host’s report is received, stops the timer and does not send the report
„ Similar when a host receives a Group-Specific Query

Protocol Overview (cont.)

„ When a router receives a report
  ‰ adds the group being reported to the list of multicast groups
  ‰ Sets timer for the membership to [Group Membership Interval].
    „ Deletes it if no reports are received before the timer expires
    „ Query is sent periodically
„ When a host joins a multicast group
  ‰ Sends an unsolicited report immediately

Protocol Overview (cont.)

„ When a host leaves a multicast group
  ‰ If it was the last host to reply to a Query, it should send a Leave Group message to the all-routers multicast address (224.0.0.2)
„ When a router receives a Leave Group message
  ‰ Sends Group-Specific Queries every [Last Member Query Interval] to the group being left for [Last Member Query Count] times.
  ‰ If no reports are received before [Last Member Query Interval], assumes no local members.

IGMPv2 message format

„ message format

0          8                 16                    31
Type     | Max. Resp. Time | Checksum
Multicast group Address

„ type
  0x11 = Membership Query
    - General Query
    - Group-Specific Query
  0x16 = Version 2 Membership Report
  0x17 = Leave Group
  0x12 = Version 1 Membership Report

IGMPv2 message format

„ Max Response Time
  - only in membership query message
  - set to zero in other messages
„ Checksum
  - 16-bit one’s complement
„ Group address
  - zero when sending a General Query
  - group address when sending a Group-Specific Query

IGMPv3

„ IETF draft-ietf-idmr-igmp-v3-05.txt
„ Adds support for “source filtering”
  ‰ A receiver may request to receive packets only from specific source addresses
  ‰ Select source addresses by INCLUDE or EXCLUDE
    „ IPMulticastListen(socket, interface, multicast-address, filter-mode, source-list)
    „ filter-mode: INCLUDE or EXCLUDE
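The 8-byte IGMPv2 message and its 16-bit one's complement checksum can be sketched as follows. The function names are hypothetical; the checksum routine is the usual Internet checksum, computed over the whole message with the checksum field set to zero.

```python
import socket
import struct

IGMP_MESSAGE = struct.Struct("!BBH4s")   # type, max resp. time, checksum, group address

MEMBERSHIP_QUERY = 0x11
V2_MEMBERSHIP_REPORT = 0x16
LEAVE_GROUP = 0x17


def internet_checksum(data):
    """16-bit one's complement of the one's complement sum of `data`."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                    # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def igmpv2_message(msg_type, max_resp_time, group):
    """Pack a message; `group` is a dotted quad, 0.0.0.0 for a General Query."""
    address = socket.inet_aton(group)
    unchecked = IGMP_MESSAGE.pack(msg_type, max_resp_time, 0, address)
    return IGMP_MESSAGE.pack(msg_type, max_resp_time, internet_checksum(unchecked), address)


general_query = igmpv2_message(MEMBERSHIP_QUERY, 100, "0.0.0.0")
report = igmpv2_message(V2_MEMBERSHIP_REPORT, 0, "224.1.2.3")
print(general_query.hex(), report.hex())
```

A receiver can validate a message by running the same checksum over all 8 bytes: a correct message yields zero.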

Multicast Routing Protocols

„ DVMRP
„ PIM-SM

Multicast Routing Protocols

„ Two types of multicast tree
  ‰ source-based tree
  ‰ core-based tree (shared tree)
  (What’s the difference: per (S,G) tree vs. per (*,G) tree)
„ Multicast protocols
  ‰ DVMRP
  ‰ PIM
    „ Sparse mode or
    „ Dense mode
  ‰ CBT
  ‰ MOSPF
  ‰ BGMP

Distance Vector Multicast Routing Protocol (DVMRP)

„ RFC 1075
„ Derived from RIP
  ‰ Relies on RIP for unicast routing
„ Widely used on the Mbone
  ‰ Enables incremental deployment of IP multicast since it supports tunnels
„ Constructs a source-based tree per source
  ‰ Provides a shortest path between source and receivers using the RPF algorithm

RPF Algorithm

„ Three steps
  ‰ Reverse Path Broadcast (RPB)
  ‰ Prune to a Reverse Path Multicast (RPM) tree
  ‰ Forwarding data uni-directionally

Reverse Path Broadcast (RPB)

„ Broadcast on the Reverse Path
  ‰ When a multicast packet is received
    „ Forward the packet on all of its outgoing links only if
      ‰ Packet arrives on the interface that is also the interface of the shortest path back to the sender
      ‰ Packet is not duplicated
    „ Otherwise, discard the packet

RPB Example

[Figure: RPB forwarding example. Legend: source, member, mrouter, router w/o member, Forward, Discard.]
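The reverse-path check amounts to one lookup in the unicast routing table. The sketch below is a minimal illustration: the function name `rpb_forward`, the interface names, and the table layout (source address mapped to the interface on the shortest path back to it) are hypothetical, and duplicate detection is not modelled.

```python
def rpb_forward(unicast_table, source, in_iface, all_ifaces):
    """Return the interfaces a multicast packet from `source` is forwarded on.

    unicast_table maps a source address to the interface this router would
    use to reach that source (the shortest path back to the sender).
    """
    if unicast_table.get(source) != in_iface:
        return []                                  # arrived off the reverse path: discard
    return [i for i in all_ifaces if i != in_iface]  # forward on all other links


table = {"10.0.0.1": "eth0"}
ifaces = ["eth0", "eth1", "eth2"]
print(rpb_forward(table, "10.0.0.1", "eth0", ifaces))
print(rpb_forward(table, "10.0.0.1", "eth1", ifaces))
```

Because every router accepts a given source's packets on exactly one interface, copies that loop around through other links are dropped without any per-packet state.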

Prune RPB Tree

„ Prune to RPM tree
  ‰ Routers that do not lead to any members send prune messages to upstream routers
  ‰ Routers know membership information via IGMP

Prune RPB Tree Example

[Figure: pruning example. Legend: source, member, mrouter, router w/o member, Forward, Prune.]

Example of a RPM tree

[Figure: resulting RPM tree. Legend: source, member, router w/ member, router w/o member, Forward.]

RPF Drawbacks and Benefits

„ Drawbacks
  ‰ First packet has to be flooded
  ‰ Periodic prune state refresh
  ‰ Routing state per (source, group) pair
„ Benefits
  ‰ guarantee efficient delivery
  ‰ easy to implement

Problems of DVMRP

„ Works well only for densely represented groups
  ‰ periodic broadcast will cause performance problems
„ Large amount of state information stored
  ‰ Information for forwarding
  ‰ Prune-state information
„ Not scalable

PIM-SM

„ Protocol Overview
„ Special Features
„ Packet Formats

Protocol Overviews

„ Documents
  ‰ RFC 2362
  ‰ IETF draft: draft-ietf-pim-sm-v2-new-01.txt
„ Terminologies
  ‰ DR: Designated Router
  ‰ RP: Rendezvous Point
  ‰ RPT: RP-based Tree
„ PIM-SM routes packets in three phases
  ‰ Phase one: RP tree
  ‰ Phase two: Register Stop
  ‰ Phase three: Shortest-Path Tree (Optional)

Phase One: RP Tree

„ Receiver
  ‰ Sends join message to DR using IGMP
  ‰ DR sends (*,G) PIM Join message to RP
    „ Reaches RP or converges on a router on the RPT
    „ Join message is sent periodically (otherwise it will time out)
„ Sender
  ‰ Sender sends a packet with multicast address as its destination to DR
  ‰ DR unicasts encapsulated packet to RP
    „ PIM Register packets
  ‰ RP decapsulates it and forwards it onto RPT

Phase One: RP Tree (Fig)

[Figure: members send Joins toward the RP, creating (*,G) state at routers A and B; the source's DR sends encapsulated packets to the RP, which multicasts them down the RP tree.]

Phase Two: Register Stop

„ Motivation
  ‰ Encapsulation and decapsulation are too expensive
„ Steps
  ‰ RP initiates an (S,G) source-specific Join to S
  ‰ All the routers on the path record the (S,G) multicast state
  ‰ Packets start to flow following the (S,G) tree to RP
  ‰ If the packet reaches a router with (*,G), do a short-cut to receivers.
  ‰ RP may now receive duplicate packets: native and encapsulated. RP discards the encapsulated packet.
  ‰ RP sends a Register-Stop message to DR of Source.
  ‰ RP forwards native packets to the RPT.

Phase Two: Register Stop (Fig)

[Figure: the RP sends a source-specific join toward the source's DR, creating (S,G) state along the path from the source to the RP.]

Phase Three: Shortest-Path Tree

„ Motivation
  ‰ From source to RP, then to receivers is too long.
„ Steps
  ‰ A receiver’s DR may optionally initiate a transfer from the RPT to a source-specific tree (SPT)
  ‰ It issues an (S,G) join to S. The join message may reach the source or converge at some router.
  ‰ It starts to receive two copies of packets. Drop the one from RPT.
  ‰ It then sends an (S,G) prune message to RP
    „ (S, G, rpt) prune
    „ Prune message reaches RP or converges at some router on RPT

Phase Three: Shortest-Path Tree (Fig)

[Figure: the member's DR sends a source-specific join toward the source, creating (S,G) state on the shortest path, and a source-specific prune toward the RP.]

Special Issues

„ Source-specific Joins
„ Multi-access Transit LANs
„ RP Discovery

Source-specific Joins

„ If a receiver sends a source-specific join using IGMPv3
  ‰ DR may omit performing a (*,G) join.
  ‰ Instead, DR issues a source-specific (S,G) join.
„ Multicast addresses for source-specific multicast
  ‰ 232.0.0.0 to 232.255.255.255
  ‰ Only source-specific joins will be accepted for groups in this range.

Multi-access Transit LANs

„ Problems on a LAN with more than one router
  ‰ Two or more routers issue (*,G) Joins
  ‰ Two or more routers issue (S,G) Joins
  ‰ A router issues a (*,G) Join while another router issues an (S,G) Join
„ Routers will observe duplicate join messages
  ‰ Use PIM Assert messages to elect a single forwarder for the LAN
    „ Choose the router that sends (S,G)
    „ Choose the router with best metric to RP or to source

RP Discovery

„ PIM-SM routers need to know how to map a group to an RP
  ‰ Use bootstrap mechanism
  ‰ In each PIM domain, a router is elected as the Bootstrap Router (BSR).
  ‰ Candidate RPs of the domain unicast their candidacy to the BSR.
  ‰ BSR decides an RP-set and periodically announces it in a bootstrap message to all routers.
  ‰ A router (DR) uses an order-preserving hash function to map the group address into the RP-set

DR Election

„ PIM-Hello messages are sent periodically on each PIM-enabled interface
  ‰ Hello messages are used to learn neighboring routers and elect a DR.
  ‰ Hello messages are sent to address 224.0.0.13
  ‰ Hello messages contain DR election priority and Generation Identifier fields
    „ A router with largest DR election priority will be the DR. Tie break by IP address (larger is preferred)
    „ Generation Identifier is randomly generated. A new GenID causes update of old Hello information and may cause a new election of DR.

BSR Election

„ A set of routers are configured as candidate bootstrap routers (C-BSRs)
  ‰ Bootstrap messages are used for BSR election and RP-set distribution
  ‰ A C-BSR with largest BSR priority is elected as the BSR. Tie break by IP address.

RP-set

„ A set of routers are configured as candidate RPs (C-RPs)
  ‰ Typically same as C-BSRs
„ Candidate RPs periodically unicast Candidate-RP-Advertisement messages (C-RP-Advs) to the BSR, which include
  ‰ C-RP address
  ‰ Group address and a mask to indicate the set of groups for which it prefers to be the RP
„ BSR forms the RP-set (for each group prefix)

Hash Function

„ A router maintains an up-to-date RP-set
„ Choose an RP for a group G based on
  ‰ Choose RPs from the RP-set whose Group-prefix is the longest that covers G
  ‰ Compute a value by
    Value(G,M,C(i)) = (1103515245 * ((1103515245 * (G&M)+12345) XOR C(i)) + 12345) mod 2^31
  ‰ Choose the RP with highest priority and value
  ‰ Tie break by IP address

Summary

„ Source-based tree
  ‰ Advantage
    „ Optimal path between sources and receivers
  ‰ Disadvantage
    „ Routing information for each (S,G) pair
„ Shared tree
  ‰ Advantage
    „ Less state in each router
  ‰ Disadvantage
    „ Non-optimal path between sources and receivers
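The hash formula on the Hash Function slide can be evaluated directly on the 32-bit integer forms of the addresses. In the sketch below the function names are hypothetical, all candidates are assumed to share the same priority and the same longest matching group prefix, and the tie break is assumed to favour the larger IP address.

```python
import ipaddress


def as_int(address):
    """Dotted-quad string -> 32-bit integer."""
    return int(ipaddress.IPv4Address(address))


def hash_value(group, mask, candidate):
    """Value(G, M, C(i)) from the slide, with G, M and C(i) as dotted quads."""
    g, m, c = as_int(group), as_int(mask), as_int(candidate)
    return (1103515245 * ((1103515245 * (g & m) + 12345) ^ c) + 12345) % 2**31


def choose_rp(group, mask, candidates):
    """Pick the candidate RP with the highest hash value.

    Ties are broken by the larger IP address (an assumption of this sketch).
    """
    return max(candidates, key=lambda c: (hash_value(group, mask, c), as_int(c)))


rps = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
print(choose_rp("224.1.1.1", "255.255.255.252", rps))
print(choose_rp("224.1.1.2", "255.255.255.252", rps))
```

Because only `G & M` enters the hash, groups that agree under the mask always map to the same RP, and every router that holds the same RP-set computes the same mapping independently.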


MBONE

„ A virtual network on top of the Internet
„ Provides multicast and real-time transmission technique
„ Characteristic of MBONE
‰ Bandwidth usage will not increase proportionally when group membership increases
„ Goal of MBONE
‰ Construct a testbed for multicast applications when there are no ubiquitous mrouters in the Internet

MBONE

„ Island
„ Mrouter
„ Tunnel

MBONE Structure

[Figure: MBONE structure with Islands A, B, C, and D and a tunnel; legend: member, mrouter, router without multicast capability]

MBONE Structure

„ Three components of MBONE:
‰ Island
‰ Mrouter
‰ Tunnel
„ Islands
‰ Networks with IP multicast capability
‰ Hosts in the same island can multicast directly without going through routers


MBONE Structure

„ Mrouter
‰ To solve problems caused by some routers that do not support multicast routing
‰ Runs mrouted (multicast routing daemon)
  „ determine routing path
  „ multicast packet transition

MBONE Structure

„ Tunnel
‰ Construct a virtual point-to-point link between local mrouter and remote mrouter
‰ Allow multicast traffic to pass through non-multicast capable routers
‰ Encapsulation

[Figure: the original multicast packet (multicast header, multicast data) preceded by a new IP header carrying the tunnel source and destination]
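The encapsulation step of a tunnel can be sketched as below. The sketch assumes plain IP-in-IP encapsulation (IP protocol number 4) with a 20-byte outer header and no options; the slides do not say which encapsulation mrouted uses, so treat this as one possible realization. The function names and the addresses in the usage note are illustrative.

```python
import socket
import struct

IPPROTO_IPIP = 4  # ASSUMPTION: tunnel uses IP-in-IP encapsulation


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement sum of the 16-bit words of an IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate(inner_packet: bytes, tunnel_src: str, tunnel_dst: str, ttl: int = 64) -> bytes:
    """Prepend a new unicast IP header addressed to the remote mrouter."""
    total_len = 20 + len(inner_packet)
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, total_len,      # version/IHL, TOS, total length
        0, 0,                    # identification, flags/fragment offset
        ttl, IPPROTO_IPIP, 0,    # TTL, protocol, checksum placeholder
        socket.inet_aton(tunnel_src),
        socket.inet_aton(tunnel_dst),
    )
    checksum = ipv4_checksum(header)
    header = header[:10] + struct.pack("!H", checksum) + header[12:]
    return header + inner_packet


def decapsulate(outer_packet: bytes) -> bytes:
    """Strip the outer header and return the original multicast packet."""
    if outer_packet[9] != IPPROTO_IPIP:
        raise ValueError("not an IP-in-IP packet")
    header_len = (outer_packet[0] & 0x0F) * 4
    return outer_packet[header_len:]
```

For example, `encapsulate(pkt, "192.0.2.1", "198.51.100.7")` yields a unicast packet that routers without multicast support can forward like any other, and `decapsulate` at the remote mrouter recovers `pkt` unchanged.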

MBONE Address

„ Multicast address
‰ Assigned to a multicast group
‰ Senders use it as destination IP address
„ Class D Address (224.0.0.0~239.255.255.255)
‰ High-order four bits are 1110
‰ 28-bit multicast group ID

MBONE Communication Protocol

„ Multicast Routing Protocols
‰ DVMRP, PIM-DM, PIM-SM, CBT, MOSPF, ...
„ IGMP
‰ A communication protocol between mrouter and hosts in a subnet
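The Class D layout on the MBONE Address slide (high-order bits 1110 followed by a 28-bit group ID) can be checked with two small helpers. The function names are hypothetical; only the standard `ipaddress` module is used.

```python
import ipaddress


def is_class_d(address: str) -> bool:
    """True when the four high-order bits of the IPv4 address are 1110."""
    value = int(ipaddress.IPv4Address(address))
    return (value >> 28) == 0b1110


def group_id(address: str) -> int:
    """Return the low 28 bits, the multicast group ID."""
    if not is_class_d(address):
        raise ValueError(f"{address} is not a Class D address")
    return int(ipaddress.IPv4Address(address)) & 0x0FFFFFFF
```

The bit test is equivalent to the range 224.0.0.0 to 239.255.255.255 given on the slide, so 224.0.0.13 passes while 223.255.255.255 and 240.0.0.0 do not.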


MBONE Application

„ Debug tool
‰ mtrace
‰ map-mbone
„ Basic software
‰ SDR (Session Directory)
‰ Wb (Whiteboard)
‰ VAT (Visual Audio Tool)
‰ VIC (Video Conference)

[Figure: SDR Tool]

Pitfalls and Misleading

„ MAC address, IP address, and domain name
„ Forwarding and routing
„ Classful IP and CIDR
„ DHCP and BOOTP
„ DHCP and IPv6 auto-configuration
„ Multicast tree and Steiner tree
„ MBONE and Video/Audio conferencing

Further Reading

„ IPv4
‰ V. Cerf and R. Kahn, “A protocol for packet network intercommunication,” IEEE Transactions on Communications, vol. 22, May 1974, pp. 637-648.
‰ J. B. Postel, “Internetwork Protocol Approaches,” IEEE Transactions on Communications, vol. 28, April 1980, pp. 604-611.
‰ Related RFCs:
  „ Standards: RFC 791 (standard), RFC 1122 (requirements for hosts)
  „ Reassembly: RFC 815
  „ Subnetting procedure: RFC 950
  „ TOS: RFC 2474
„ CIDR
‰ RFC 1519
„ NAT
‰ RFC 3022
‰ Terminology and considerations: RFC 2663

Further Reading

„ Fast Table Lookup
‰ M. Degermark, A. Brodnik, S. Carlsson, and S. Pink, “Small forwarding tables for fast routing lookups,” SIGCOMM’97, Oct. 1997, pp. 3-14.
‰ M. Waldvogel, G. Varghese, J. Turner, and B. Plattner, “Scalable high speed routing lookups,” SIGCOMM’97, Oct. 1997, pp. 25-36.
‰ B. Lampson, V. Srinivasan, and G. Varghese, “IP Lookups Using Multiway and Multicolumn Search,” INFOCOM’98, March 1998.
‰ V. Srinivasan and G. Varghese, “Faster IP lookups using controlled prefix expansion,” SIGMETRICS’98, June 1998.
‰ T. V. Lakshman and D. Stiliadis, “High speed policy-based packet forwarding using efficient multi-dimensional range matching,” SIGCOMM’98, Sept. 1998, pp. 203-214.
‰ V. Srinivasan, G. Varghese, S. Suri, and M. Waldvogel, “Fast scalable layer four switching,” SIGCOMM’98, Sept. 1998, pp. 191-202.
‰ V. Srinivasan, G. Varghese, and S. Suri, “Packet classification using tuple space search,” SIGCOMM’99, Sept. 1999.
‰ N-F Huang, S-M Zhao, J-Y Pan, and C-A Su, “A Fast Routing lookup scheme for gigabit switching routers,” INFOCOM’99, pp. 1429-1436.


Further Reading

„ IPv6
‰ http://playground.sun.com/pub/ipng/html/
‰ C. Huitema, IPv6: The new Internet Protocol, Prentice Hall, 1997.
‰ S. Bradner and A. Mankin, editors, IPng: Internet Protocol Next Generation, Addison-Wesley, 1995.
‰ Related RFC
  „ Spec: RFC 2460 (draft standard)
  „ Address: RFC 2373
  „ Path MTU discovery: RFC 1191 (IPv4), RFC 1981 (IPv6)
„ ARP
‰ Related RFC
  „ Spec: RFC 826
  „ More fault-tolerant approach: RFC 1029
„ DHCP
‰ RFC 2131

Further Reading

„ Routing
‰ C. Huitema, Routing in the Internet, Prentice Hall, 1995.
‰ C. Labovitz, G. R. Malan, F. Jahanian, “Internet routing instability,” SIGCOMM’97, Oct. 1997, pp. 115-126.
‰ RIP
  „ RFC 1058 (RIPv1)
  „ RFC 2453 (RIPv2)
‰ OSPF
  „ RFC 2328 (OSPFv2)
‰ BGP-4
  „ J. Stewart, BGP4: Interdomain Routing in the Internet, Addison-Wesley, 1999.
  „ RFC 1771
  „ RFC 2858: Multiprotocol Extensions for BGP-4

Further Reading

„ Multicast
‰ S. Paul, “Multicasting on the Internet and its applications,” Kluwer Academic Publishers, 1998.
‰ S. Deering and D. Cheriton, “Multicast routing in datagram internetworks and extended LANs,” ACM Transactions on Computer Systems, vol. 8, May 1990, pp. 85-110.
‰ S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C. Liu, and L. Wei, “The PIM architecture for wide-area multicast routing,” ACM/IEEE Transactions on Networking, vol. 4, April 1996, pp. 153-162.
‰ Related RFC
  „ IGMP: RFC 2236 (v2)
  „ DVMRP: RFC 1075
  „ PIM-SM: RFC 2362
  „ Mrouted (v3.8): http://parcftp.xerox.com/pub/net-research/ipmulti

Hands-on Exercises

„ Trace ip_route_input() and ip_route_output() in the source code of Linux. Describe how packets are forwarded to the next hop and to the upper layer.
„ Use Sniffer or similar software to observe fragments of a large IP packet.
„ Build an internetwork with three IPv6 subnets using Linux-based PCs.
„ Build a NAT server using a Linux-based PC.
„ Build a DHCP server using a Linux-based PC.


Hands-on Exercises

„ Use virtual route or traceroute to find out the infrastructure of your domain and routes to foreign countries.
„ Use Sniffer or similar software to find out how ping is implemented using ICMP messages.
„ Use Sniffer or similar software to find out how traceroute is implemented using ICMP messages.
„ Build a subnet with three Linux-based routers. Use Sniffer or similar software to observe the operation of RIP.

Hands-on Exercises

„ Install MBONE applications and run the sdr tool to create a multicast session with audio, video, and shared whiteboard.
„ Use Sniffer or similar software to observe the operation of DVMRP.

Written exercises

„ What problems would arise when two hosts use the same IP address and ignore the existence of each other?
„ Compare the addressing hierarchy in the telephone system with that in the Internet. (Hint: The telephone system uses geographical addressing.)
„ Why is fragmentation needed in IP? Which fields in the IP header are needed for fragmentation and reassembly?
„ What is the purpose of the identifier field in the IPv4 header? Will wrap-around be a problem? Give an example to show the wrap-around problem.
„ How would the timeout value of the ARP cache affect its performance?
„ Let A be a host with a private IP address which connects to the Internet through a NAT server. Can a host outside A's subnet telnet to A?

Written exercises

„ When an IP packet is fragmented, the loss of a single fragment causes the whole packet to be discarded. Consider an IP packet that contains 4800 bytes of data (from the upper layer) and is to be delivered to a directly connected destination. Consider two types of data link layer with different MTUs. Type A technology uses 5 bytes of header and has an MTU of 53 bytes (you may think of it as the ATM technology). Type B technology uses 18 bytes of header and has an MTU of 1518 bytes (say, it is Ethernet). Assume the frame loss rate of type A is 0.001 while that of type B is 0.01. Compare the packet loss rate under these two types of data link layer technology.
„ Discuss the difficulties of building connectionless service over a virtual circuit subnet, e.g., IP over ATM.


Written exercises

„ Consider the following LAN with one Ethernet switch, S, one intradomain router, R, and two hosts, X and Y. Assume switch S has just been powered on.
‰ Describe the routing and address resolution steps performed at X, Y, and S when X sends an IP packet to Y.
‰ Describe the routing and address resolution steps performed at X, Y, and S when Y replies with an IP packet to X.
‰ Describe the routing and address resolution steps performed at X, S, and R when X sends an IP packet to a host that is outside the domain. (Hint: do not forget to explain how X knows of the router R.)

[Figure: LAN with router R, switch S, and hosts X and Y, connected by Ethernet]

Written exercises

„ Consider the following network topology. Show how node A constructs its routing table using Link-State routing and Distance Vector routing, respectively.

[Figure: network topology with nodes A, B, C, D, E, and F; link costs 3, 1, 6, 4, 1, 10, 2]

„ Continuing with the previous question, suppose link A-B fails. How do LS and DV routing react to this change?
„ Compare the message complexity and convergence speed of LS and DV routing.

Written exercises

„ Suppose that a positive lower bound is known for all link costs. Design a new link state algorithm which can add more than one node into the set N at each iteration.
„ Distance vector routing algorithms are adopted in both intra-domain routing (e.g., RIP) and inter-domain routing (e.g., BGP), but are implemented with different concerns and additional features. Compare the differences between intra-domain routing and inter-domain routing when both of them use a distance vector algorithm.
„ In your opinion, how is quality of service better supported in IPv6?
„ Why is the order of IPv6 extension headers important, and why can it not be altered?
„ Describe the path MTU discovery procedure defined in RFC 1981.

Written exercises

„ Compare the differences between the IPv4 and IPv6 header formats. Identify the changes and explain why these changes were made.
„ In the IPv4 header, there is a protocol id field. What is the functionality of this field? Is there a corresponding field in the IPv6 header?
„ What are the major differences between link state routing and distance vector routing? What are the stability problems of distance vector algorithms and what are the possible solutions?
„ Consider the tunneling technique between two mrouters. Describe how a multicast packet is encapsulated in a unicast packet. How does the mrouter at the other side of the tunnel know it is an encapsulated packet?


Written exercises

„ Does DVMRP minimize the use of network bandwidth or the end-to-end delay to each destination? Will a node receive multiple copies of the same packet? If yes, propose a new protocol such that all nodes will receive only one copy.
„ PIM consists of two modes: dense mode and sparse mode. What are the differences between these two modes? Why define two modes?
„ In PIM-SM, how does a router know where the RP is for a newly joined member of a multicast group?
„ When a host sends a packet to a multicast group, how is the packet handled differently by the designated router under DVMRP and PIM-SM?

Written exercises

„ A multicast tree with minimized cost is called a Steiner tree. Why does none of the protocols proposed in IETF RFCs try to construct a Steiner multicast tree?
„ In general, we think that the cost of a source-based tree will be less than that of a shared-based tree. Do you agree or not? Why? Construct a counterexample to show that the cost of a source-based tree can actually be larger than that of a shared-based tree.

Written exercises

„ Show the multicast tree built by DVMRP in the following network topology.

[Figure: network topology with a source, routers R1, R2, and R3, and receivers A and B; link costs 2, 1, 2, 2, 2]
