1/19/11

DNS Caching DNS Name Resolution Exercises

Show the DNS resolution paths, assuming the DNS hierarchy shown in the figures and assuming caching: • Once a (any) name server learns mapping, it caches mapping • thumper.cisco.com looks up bas.cs.princeton.edu • to reduce latency in DNS translation • thumper.cisco.com looks up opt.cs.princeton.edu • Cache entries timeout (disappear) after some time (TTL) • thumper.cisco.com looks up cat.ee.princeton.edu • TTL assigned by the authoritative server responsible for the host name • thumper.cisco.com looks up ket.physics.princeton.edu • Local name servers typically also cache • bas.cs.princeton.edu looks up dog.ee.princeton.edu • TLD name servers to reduce visits to root name servers • opt.cs.princeton.edu looks up cat.ee.princeton.edu • all other name server referrals • both positive and negative results

1 2 Peterson & Davie 2nd. ed., pp. 627, 628

DNS Design Points DNS Design Points

DNS serves a core Internet function application application At which protocol layer does the DNS operate? transport •host, routers, and name servers communicate to transport resolve names (name to address translation) DNS serves a core Internet function •complexity at network’s “edge” At which protocol layer does the DNS operate? network network •host, routers, and name servers communicate to resolve names (name to address translation) link Why not centralize DNS? link •complexity at network’s “edge” •single point of failure physical •traffic volume physical •performance: distant centralized database Why not centralize DNS? •maintenance ➡doesn’t scale!

DNS is “exploited” for server load balancing, how?

3 4

1 1/19/11

DNS protocol, messages DNS protocol, messages

DNS protocol : query and reply messages, both with same message format Name, type fields for a query msg header . identification: 16 bit # for RRs in reponse query, reply to query uses to query same # . flags: records for - query or reply authoritative servers - recursion desired - recursion available additional “helpful” - reply is authoritative info that may be used

5 6

The Internet Network Layer Packet and Packet Header

Previously . . . the Internet is a packet switched network: Host, router network layer functions: data is parceled into packets each packet carries a destination address each packet is routed independently Transport layer: TCP, UDP packets can arrive out of order packets may not arrive at all Routing protocols Forwarding protocol (IP) • addressing conventions • path selection Just as with the postal system, the “content” you want to send must datagram format • RIP, OSPF, BGP • be put into an envelope and the envelope must be addressed Network • packet handling conventions layer forwarding “Signalling” The “envelope” in this case is the packet header table protocol (ICMP) • error reporting • router “signaling” Recall: protocols are rules (“syntax” and “grammar” ) governing communication between nodes Link layer: Ethernet, WiFi, SONET, ATM The format of a packet header is part of a protocol Physical layer: copper, fiber, radio, microwave

For packet forwarding on the Internet, the protocol is the (IP) 7 8

2 1/19/11

Encapsulation IPv4 Packet Header Format usually 20 bytes Each protocol has its own “envelope” usually IPv4 (without options) • each protocol attaches its header to the packet 4-bit 4-bit 8-bit • so we have a protocol wrapped inside another protocol 16-bit total length (bytes) IP fragmentation version hdr len Type of Service • each layer of header contains a protocol demultiplexing field to (bytes) (TOS) 3-bit identify the “packet handler” the next layer up, e.g., 16-bit Identification source Flags 13-bit Fragment Offset • protocol number M application

• port number Ht M transport max number 8-bit Time to H H 8-bit Protocol 16-bit header checksum n t M network remaining hops Live (TTL) H H H M l n t link (decremented at physical each router) 32-bit Source IP Address H H H M l n t link Hl Hn Ht M physical upper layer protocol to deliver payload to, 32-bit Destination IP Address switch e.g., ICMP (1), UDP (17), TCP (6) Options (if any) H H M e.g. timestamp, destination n t network Hn Ht M error H H H M record route message M application l n t link Hl Hn Ht M check segment H M taken, specify t transport physical Payload (e.g., TCP/UDP packet, max size?) header datagram/packet H H M route n t network H H H M frame l n t link router physical 9 10

Packet Forwarding IP Addressing: Introduction

Goal: deliver packets through routers from source to 223.1.1.1 destination IP address: 32-bit identifier for 223.1.2.1 • source node puts destination address in packet header host/router interface 223.1.1.2 • each router node on the Internet: 223.1.1.4 223.1.2.9 • looks up destination address in its routing table routing algorithm interface: connection between • we’ll study several path selection (i.e., routing) algorithms 223.1.2.2 host/router and physical link 223.1.1.3 223.1.3.27 • sends the packet to the next hop towards the destination local forwarding table • routes may change during session dest address output link • routers typically have 0100 3 0101 2 multiple interfaces • analogy: driving, asking directions 0111 2 1001 1 • host may have multiple 223.1.3.1 223.1.3.2 interfaces destination address in arriving packet’s header • IP address associated with 1 each interface 0111

3 2 223.1.1.1 = 11011111 00000001 00000001 00000001

223 1 1 1

11 12

3 1/19/11

Flat vs. Hierarchical Addressing IPv4 Addressing 4 Independent of physical hardware address Flat addressing: 1 2 2 2 32-bit number represented as dotted decimal: • each router needs 10 entries 3 2 in its routing table 4 - • for ease of reference 5 5 • each # is the decimal representation of an octet 6 6 7 7 Divided into two parts: 8 2 9 7 • network prefix, globally assigned 10 2 • route to network first 11 2 • host ID, assigned locally Hierarchical addressing: • hosts only need to know the default router, Example: 12.34.158.0/24 is a 24-bit network prefix with 28 host addresses usually its border router • each border router keeps in its routing table: • next hop to other networks 12 34 158 5 • all hosts within its own network 3.1 2.* 2.1 00001100 00100010 10011110 00000101 1.* 1.1 4.* 2.1 3.2 3.2 note that for routing table, we store the next hop address 3.3 3.3 instead of the interface number 3.4 3.4 Network (24 bits) Host (8 bits) 13 14

Subnets Classfull Addresses For the example network prefix: 12.34.158.0/24

223.1.1.1 • how many hosts can the network have?

A network can be further 223.1.2.1 divided into subnets 223.1.1.2 What is a good partition of the 32-bit address space 223.1.1.4 223.1.2.9 between the network and host parts?

What’s a subnet ? 223.1.2.2 • device interfaces with same 223.1.1.3 223.1.3.27 Historically . . . classfull addresses: subnet part of IP address Class A: 0*, very large /8 blocks (e.g., MIT has 18.0.0.0/8) • can physically reach each other LAN Class B: 10*, large /16 blocks (e.g,. UM has 141.213.0.0/16) without intervening router Class C: 110*, small /24 blocks (e.g., AT&T Labs has 192.20.225.0/24) 223.1.3.1 223.1.3.2 Class D: 1110*, multicast groups Class E: 11110*, reserved for future use

Problems: a network consisting of 3 subnets 1. the Goldilock problem: everybody wanted a Class B 2. address space usage became inefficient 3. routing table explosion 4. and then, address space became scarce… 15 16 • by 1992, half of Class B has been allocated, would have been exhausted by 3/94

4 1/19/11

Classless InterDomain Routing (CIDR) CIDR: Hierarchical Address Allocation

Network portion of address is of arbitrary length, Prefixes are key to Internet routing scalability determined by a prefix mask •address allocation by ICANN, ARIN/RIPE/APNIC and by ISPs •routing protocols and packet forwarding based on prefixes Uses two 32-bit numbers to represent a network address •today, routing tables contain ~150,000-200,000 prefixes network number = IP address + mask 12.0.0.0/16 IP address: Usually written as a.b.c.d/x, 12.1.0.0/16 12.3.0.0/24 : 12.4.0.0 00001100 00000100 00000000 00000000 where x is number of bits 12.2.0.0/16 12.3.1.0/24 : in the network portion of 12.3.0.0/16 : : : address: 12.4.0.0/15 mask: 12.0.0.0/8 12.3.254.0/24 255.254.0.0 11111111 11111110 00000000 00000000 : : 12.253.0.0/19 : 12.253.32.0/19 Another example: 12.253.64.0/19 Network Prefix for hosts 200.23.16.0/23 12.253.0.0/16 12.253.96.0/19 network host 12.254.0.0/16 12.253.128.0/19 prefix part 12.253.160.0/19 11001000 00010111 00010000 00000000 12.253.192.0/19 17 18

CIDR: Route Aggregation Longest Prefix Match: More specific routes

Hierarchical addressing allows efficient advertisement of routing information: ISPs-R-Us has a more specific route to Organization 1

Organization 0

Organization 0 200.23.16.0/23 200.23.16.0/23 “Send me anything Organization 1 “Send me anything with addresses 200.23.18.0/23 Organization 2 beginning with addresses Organization 2 beginning 200.23.20.0/23 . Fly-By-Night-ISP 200.23.16.0/20” 200.23.16.0/20” . 200.23.20.0/23 . Fly-By-Night-ISP . . Internet . . . . Organization 7 . . Internet Organization 7 . 200.23.30.0/23 200.23.30.0/23 ISPs-R-Us “Send me anything “Send me anything with addresses ISPs-R-Us beginning 199.31.0.0/16 with addresses Organization 1 or 200.23.18.0/23” beginning 200.23.18.0/23 199.31.0.0/16”

19 20

5 1/19/11

How are Packets Forwarded? Special IPv4 Addresses

Routers have forwarding tables forwarding table • maps each IP prefix to next-hop link(s) 4.0.0.0/8 • network identification: • entries can be statically configured 4.83.128.0/17 • 0s on host part, e.g. ,141.212.0.0 (cannot be used to send packets) • e.g., “map 12.34.158.0/24 to Serial0/0.1” 12.0.0.0/8 destination • directed broadcast: 12.34.158.0/24 12.34.158.5 • 0xffff on host part, e.g., 141.212.255.255 126.255.103.0/24 Destination-based forwarding • Broadcast to all hosts on network (141.212) (Not implemented?) • Packet has a destination address • limited broadcast: • Router identifies longest-matching prefix • 0xffffffff, received by all hosts on LAN, not forwarded beyond LAN outgoing link • this computer: Serial0/0.1 But, this doesn’t adapt • 0.0.0.0 to be used at startup to ask for one’s own IP address (RARP, • to failures deprecated) • to new equipment • loopback address: • to the need to balance load • 127.*.*.* (usually 127.0.0.1), named localhost • … • pkts sent to localhost traverse down the kernel networking code & back up That is where routing protocols come to application without traversing the network, useful for testing networking code in… [more on this in the next lectures]

21 22

PA1 Topics IPv4 Packet Header Format

usually 20 bytes usually IPv4 Replace calls to my “hidden” functions (without options) • leaving them “as is” is considered cheating! 4-bit 4-bit 8-bit 16-bit total length (bytes) IP fragmentation version hdr len Type of Service (bytes) (TOS) Stream socket data separation: 3-bit • Use records (data structures) to partition data stream 16-bit Identification Flags 13-bit Fragment Offset • Three ways to tell the end of a record max number 8-bit Time to • How do we implement variable length records? 8-bit Protocol 16-bit header checksum size of remaining hops Live (TTL) record (decremented at each router) A B C 4 32-bit Source IP Address upper layer protocol fixed length fixed length variable length to deliver payload to, 32-bit Destination IP Address record record e.g., ICMP (1), UDP (17), record TCP (6) Options (if any) e.g. timestamp, error How to populate and read a struct, e.g., when vip_input() calls record route check taken, specify Payload (e.g., TCP/UDP packet, max size?) header vif_pullup()? route

23 24

6 1/19/11

ICMP: Internet Control Message Protocol Traceroute and ICMP

Used by hosts & routers to Source sends a series of UDP communicate network-level Type Code description packets to dest When ICMP message arrives back at the source, information 0 0 echo reply (ping) • first pkt has TTL =1 source calculates RTT • error reporting: unreachable 3 0 dest. network unreachable • second pkt has TTL=2, etc. host, network, port, protocol • sent to a (probably) unused port 3 1 dest host unreachable Traceroute does this 3 times per TTL • echo request/reply (used by ping) 3 2 dest protocol unreachable number 3 3 dest port unreachable Stopping criterion: Network-layer “above” IP: 3 4 frag needed but DF set When n-th packet arrives at • UDP pkt eventually arrives at destination host 3 6 dest network unknown n-th router: • ICMP msgs carried in IP • Destination returns ICMP “destination port • router discards packet datagrams 3 7 dest host unknown unreachable” message (type 3, code 3) 8 0 echo request (ping) • and sends to source an ICMP • When source gets this ICMP message, stops 9 0 route advertisement message (type 11, code 0) sending UDP packets ICMP message: 10 0 router discovery • message includes IP address of • type router 11 0 TTL expired Time • code 12 0 bad IP header TTL=1 exceeded • plus first 8 bytes of IP datagram Time causing error destination source TTL=2 exceeded

25 26

IP Fragmentation & Reassembly IP Fragmentation and Reassembly

Network links have MTU (max Example: 4000 byte datagram transfer size) limit - the largest MTU = 1500 bytes unique per datagram possible link-level frame per source • different link types, different MTUs fragmentation: length ID fragflag offset in: one large datagram =4000 =x =0 =0 out: 3 smaller datagrams Large IP datagrams are split up (“fragmented”) within the 1480 bytes in offset = data field 1480/8 network reassembly • one datagram becomes length ID fragflag offset =1500 =x =MF =0 several datagrams One large datagram becomes • each with its own IP header several smaller datagrams •all but the last fragments must be in length ID fragflag offset • fragments are “reassembled” multiple of 8 bytes =1500 =x =MF =185 only at final destination •offsets are specified in unit of 8-byte chunks (why?) length ID fragflag offset • IP header bits used to identify =1040 =x =0 =370 and order related fragments 27 28

7 1/19/11

Fragmentation Considered Harmful Fragmentation Considered Harmful

Reason 1: Lose 1 fragment, lose Reason 2: Inefficient transmission Analysis: whole packet: • IP doesn’t have control over number of fragments • kernel has limited buffer space Example: • TCP can do buffer management better because it has more • but IP doesn’t know number of information •10 KB of data fragments per packet •sent as 1024 byte TCP segments •uses 10 IP packets, each 1064 bytes Alternatives to fragmentation: For example: •Suppose MTU is 1006 bytes • send only small datagrams (why not?) • Sender sends two pkts, L and S •each TCP packet is fragmented into 2 pkts of • do path MTU discovery and let TCP send the appropriate segment sizes 1006 bytes & 78 bytes • L is fragmented into 8 fragments • set DF flag •ends up sending 20 packets • S is fragmented into 2 fragments • router returns ICMP error message (type 3, code 4) if fragmentation necessary • Receiver has 8 buffer slots •If TCP had sent 966-byte segments, only • IPv6 enforces min MTU of 576 bytes, no fragmentation at routers need to send 11 pkts • Suppose fragments arrive in the following order: L1, L2, L3, L4, L5, L6, L7, S1, L8, S2 • Receiver’s buffer fills up after S1, both packets thrown away when reassembly timer times out 29 30

IPv6 IPv6 Addresses

Initial motivation: 32-bit address space exhaustion What does an IPv6 address look like? • 128 bits written as 8 16-bit integers seperated by ’:’ Additional motivation: • each 16 bit integer is represented by 4 hex digits • header format helps speed processing/forwarding • fixed-length 40 byte header (0.06% overhead) • header checksum: removed entirely to reduce processing time at each hop Example: FEDC:BA98:7654:3210:FEDC:BA98:7654:3210 • options: allowed, but outside of header, indicated by “next header” field • header changes to facilitate QoS: Abbreviations: • priority: identify priority among datagrams in flow (ToS bit) • actual: 1080:0000:0000:0000:0008:0800:200C:417A • flow label: identify datagrams in • skip 0’s: 1080:0:0:0:8:800:200C:417A the same “flow” (concept of “flow” not well defined, originally these • double ’::’: 1080::8:800:200C:417A but were “reserved” bits) not ::BA98:7654:: Next header identifies “upper layer” protocol or IPv6 options: IPv4: 10.0.0.1 • hop-by-hop option, destination option, can be written as 0:0:0:0:0:0:A00:1 or ::10.0.0.1 routing, fragmentation, authentication, encryption 31 32

8 1/19/11

Tunneling NAT: Network Address Translation Not all routers can be upgraded simultaneous • no “flag days” Motivation: a stop-gap measure to handle the IPv4 address exhaustion • How will the network operate with mixed IPv4 and IPv6 routers? problem • share a limited number (≥ 1) of global, static addresses by a number of local hosts Tunneling: IPv6 carried as payload in IPv4 datagram among IPv4 routers • local to global address binding done per connection, on-demand

A B E F rest of Logical view: tunnel local network Internet (e.g., home network) IPv6 IPv6 IPv6 IPv6 10.0.0/24 10.0.0.1

A B C D E F 10.0.0.4 Physical view: 10.0.0.2 IPv6 IPv6 IPv4 IPv4 IPv6 IPv6 138.76.29.7 Src:B Src:B Flow: X Flow: X 10.0.0.3 Src: A Dest: E Dest: E Src: A Dest: F Flow: X B-to-E: Flow: X Dest: F IPv6 inside Src: A Src: A All datagrams leaving local Datagrams with source or data Dest: F IPv4 Dest: F data network have same source NAT IP destination in this network have 10.0.0/24 address for A-to-B: data data E-to-F: address: 138.76.29.7, IPv6 source, destination (as usual) IPv6 33 different source port numbers 34

NAT: Network Address Translation NAT: Network Address Translation

NAT translation table 1: host 10.0.0.1 A NAT box functions: global addr local addr 2: NAT router sends datagram to changes datagram • replaces of every 138.76.29.7, 5001 10.0.0.1, 3345 128.119.40, 80 outgoing datagram to source addr from …… …… 10.0.0.1, 3345 to • remote hosts use as destination addr 138.76.29.7, 5001, S: 10.0.0.1, 3345 updates table D: 128.119.40.186, 80 10.0.0.1 • remember (in NAT translation table) every 1 to 2 S: 138.76.29.7, 5001 mapping D: 128.119.40.186, 80 10.0.0.4 10.0.0.2 • replaces in dest field 138.76.29.7 of every incoming datagram with corresponding S: 128.119.40.186, 80 4 S: 128.119.40.186, 80 3 D: 10.0.0.1, 3345 stored in the NAT D: 138.76.29.7, 5001 10.0.0.3 table 3: Reply arrives 4: NAT router • forwards modified datagrams into the local network dest. address: changes datagram 138.76.29.7, 5001 dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345

35 36

9 1/19/11

NAT: Network Address Translation

Advantages: • can change address of devices in local network without notifying outside world • devices inside local net not explicitly addressable by or visible to the outside world (a security plus)

Disadvantage: • devices inside local net not explicitly addressable by or visible to the outside world, making peer-to-peer networking that much harder • routers should only process up to layer 3 (port#’s are app layer objects) • address shortage should instead be solved by IPv6, instead NAT hinders the adoption of IPv6 (nothing wrong with that?)

Lesson: Be careful what you propose as a “temporary” patch, “temporary” solutions have a tendency to stay around beyond expiration date

“The evil that men do lives after them, the good is oft interred with their bones.” – Shakespeare, Julius Caesar37

10