AARNet's experiences using MPLS for protection

Internet2/NLANR Joint Techs Meeting Boulder, CO, USA 2002-07-28

Glen Turner, Network Engineer
Australian Academic & Research Network
glen.turner@.edu.au
http://www.aarnet.edu.au/

Topics

MPLS overview
Protection technology
AARNet's experiences with MPLS
Other interesting stuff if we have time

Coverage

MPLS is a big topic with multiple implementation choices at almost every turn

We only discuss some of the technology choices

● MPLS generic tagging, not ATM tagging

● RSVP and not LDP

● OSPF and not IS-IS

Coverage

We discuss the use of MPLS for protection, and do not discuss some other important uses of MPLS

● VPNs (and thus BGP)

● GMPLS, the integrated control layer for switching technologies

“How to speak Australian”

● Words with “or” → “our”, “z” → “s”

● SONET → SDH (slight framing difference)

● T1 → E1 (E1 is 2Mbps)

Topic MPLS overview

Label switching
Control plane protocols, RSVP
Routing protocols, OSPF

MPLS aims

Scalable IP traffic engineering

● Avoid the need for full IP network knowledge in the core

Virtual service

● By providing label switch paths exclusive to a customer

This presentation focuses on traffic engineering

● We are only beginning to experiment with VPNs

MPLS is a layer 2½ protocol

Without MPLS        With MPLS
7 Application       7 Application
6 Presentation      6 Presentation
5 Session           5 Session
4 Transport         4 Transport
3 Network           3 Network
2 Link              2½ MPLS
1 Physical          2 Link
                    1 Physical

Advantages of layer 2½

No complex next hop algorithm

● IP address lookup is expensive
  – Closest matching prefix versus table lookup

● IP next hop algorithm gets more complex with each new service
  – Policy routing
  – Multicast

Want GbE switch prices, not GbE router prices

New behaviours only affect edge routers

Advantages of layer 2½, cond
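To make the lookup-cost point above concrete, here is a minimal Python sketch contrasting the two lookups. The tables, prefixes, labels and interface names are hypothetical, and a real router would use a trie or TCAM rather than this linear scan, but the asymmetry is the point: an IP router must find the closest (longest) matching prefix, while a label switch router does one exact-match lookup on a fixed-length label.

```python
import ipaddress

# Hypothetical IP forwarding table: prefix -> output interface.
IP_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "pos1-0",
    ipaddress.ip_network("10.0.0.0/8"): "eth1",
    ipaddress.ip_network("10.4.0.0/16"): "eth0",
}

# Hypothetical label table: incoming label -> (outgoing label, interface).
LABEL_TABLE = {10: (20, "eth1"), 21: (11, "eth0")}


def ip_lookup(dst: str) -> str:
    """Closest (longest) matching prefix: every candidate prefix is examined."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in IP_TABLE if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return IP_TABLE[best]


def label_lookup(label: int) -> tuple:
    """Exact match on a fixed-length label: one indexed table access."""
    return LABEL_TABLE[label]


if __name__ == "__main__":
    print(ip_lookup("10.4.1.1"))   # eth0 (the /16 beats the /8 and the default)
    print(ip_lookup("10.9.9.9"))   # eth1
    print(ip_lookup("192.0.2.1"))  # pos1-0
    print(label_lookup(10))        # (20, 'eth1')
```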

No need to follow IP routing

● The shortest path may not be the best path

● Want policy
  – For traffic engineering
      ● Bandwidth
      ● Diverse routers and paths
  – For arbitrary customer requirements
      ● eg: the Australian Army doesn't want to be routed over links not owned by Australian-controlled telcos

Advantages of layer 2½, cond

Why MPLS for policy and not BGP?

● BGP is globally visible
  – Scalability: does Outer Mongolia need to know of an interface failure in outback Australia?
  – Can lose connectivity due to dampening, which is essential because of global visibility

● Not all reasonable policies can be expressed in BGP

Disadvantages of layer 2½

Another set of control protocols

● ATM: OAM, ILMI, PNNI

● 802.1Q VLANs: Virtual LAN registration protocol

● SDH/SONET

MPLS uses IP as its control and routing protocol

Layer 2.5 and protection

Network layer protection requires a network layer response

● Limited by convergence time of routing protocol

● Fast convergence and global visibility do not mix
  – BGP rate limiting is an expression of this

Layer 2.5 and protection

Link layer protection requires a link layer response

● These often have constrained topologies
  – SDH/SONET rings
  – 802.1D and parallel links

● They often use protection bandwidth inefficiently

● They often treat all network traffic as equally valuable

● Lack of network topology knowledge leads to poor decisions

Layer 2.5 and protection

Allows a pre-routed fallback path to be established

● Full topology awareness

Allows the link layer to switch to the fallback path

● Not globally visible

● Fast convergence

This could get messy upon multiple failures

● Run the interior routing protocol afterwards

Forwarding equivalence class

Another view of IP routing

● Step 1: Determine the forwarding equivalence class (FEC) from the IP header (or more)
  – Standard: destination IP address
  – Advanced: source IP address, multicast group, DSCP, TCP port, increasingly bizarre

● Step 2: Look up the FEC forwarding table to determine the output interface (ie: switch the packet)

Forwarding equivalence class, cond

IP router calculates forwarding equivalence class at every hop

● Expensive
  – Either in CPU time or hardware

● Extensive
  – IP forwarding table is big with frequent updates

● Difficult to alter for new behaviours
  – ASIC designers may not have anticipated the change (reverse path lookup, source-specific multicast)

Forwarding equivalence class, cond

MPLS switching

● Determine forwarding equivalence class at ingress

● Tag packet with a fixed-length label for this forwarding equivalence class

● Switch using the label at every subsequent hop to the egress
  – Tags are designed for hardware manipulation

Labels are not globally unique

Even one router can run multiple “label spaces”

– eth0, eth1 in LS1
– eth2, eth3 in LS2

Edge routers need distinct IP routing tables for each label space

● The key to MPLS VPNs

● We often want multiple routing tables and settle for policy routing instead

MPLS tag

A 32-bit header in front of the packet

Tag contains just enough information for forwarding and queuing

● Unlike the IPv4/IPv6 header, which carries a lot more

Tag has a hardware-friendly structure

MPLS tag, fields

Label, 20 bits

● Determines next-hop interface

Experimental (QoS), 3 bits

● Determines output interface queuing

S for “last of stack”, 1 bit

● S=1 on last header

Time to live, 8 bits

● Discard upon zero, otherwise decrement

MPLS tag, stacking

[ Tag, S=0 ][ Tag, S=0 ][ Tag, S=1 ][ Network-layer packet ]

An MPLS tagged packet can be tagged again (“stacked”)

● Allows Provider-Provider connections to maintain customer tags

● Simplifies design considerably

● Avoids the need for a global label space

MPLS tag, stacking and MTU
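The tag layout and the stacking rule described above can be sketched with Python's struct module. The field widths (20-bit label, 3-bit Exp, 1-bit S, 8-bit TTL) are those of the generic MPLS tag; the label values, the TTL of 64 and the 1500-byte link MTU are illustrative assumptions. The last function shows why stacking matters for the PMTU: each tag costs 4 bytes of what the link can carry.

```python
import struct


def encode_tag(label: int, exp: int, bottom: bool, ttl: int) -> bytes:
    """Pack one 32-bit MPLS tag: label (20 bits), Exp (3), S (1), TTL (8)."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (exp << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)  # network byte order


def decode_tag(data: bytes) -> tuple:
    """Unpack the first 4 bytes into (label, exp, bottom_of_stack, ttl)."""
    (word,) = struct.unpack("!I", data[:4])
    return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 0x1), word & 0xFF


def encode_stack(labels: list, exp: int = 0, ttl: int = 64) -> bytes:
    """Stack tags outermost first; only the last (innermost) tag has S=1."""
    return b"".join(
        encode_tag(label, exp, i == len(labels) - 1, ttl)
        for i, label in enumerate(labels)
    )


def path_mtu_after_tags(link_mtu: int, depth: int) -> int:
    """Each tag in the stack reduces the room left for the network-layer packet."""
    return link_mtu - 4 * depth


if __name__ == "__main__":
    stack = encode_stack([20, 1000, 16])
    print([decode_tag(stack[i:i + 4]) for i in range(0, len(stack), 4)])
    # [(20, 0, False, 64), (1000, 0, False, 64), (16, 0, True, 64)]
    print(path_mtu_after_tags(1500, 1), path_mtu_after_tags(1500, 2))  # 1496 1492
```

If the main path carries one tag and the protect path two, the PMTU silently shrinks by 4 bytes on a protection switch, which is why identical tag depths are preferable.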

The tag may reduce the size of the path maximum transmission unit (PMTU)

● TCP/IP stacks don't cope well with a change of PMTU
  – PMTU at establishment of the TCP connection determines the TCP MSS

● Best to ensure that main and protect paths have identical tag depths

Or the tag may not reduce the PMTU, if the link layer will let us flex the rules

MPLS operation

[Figure: MPLS path (mpls-path.dia)]

MPLS operation, cond
Label switch router

Incoming packet, look up incoming label map, which contains

● Incoming label

● MPLS opcode: PUSH, POP, SWAP, etc

● Forwarding equivalence class

● Link to outgoing next hop label entry

MPLS operation, cond
Label switch router

Incoming packet operations

● Extract label from top tag

● Lookup incoming label map

● Execute MPLS opcodes to manipulate tags

● Forward packet to outgoing processing

MPLS operation, cond
Label switch router

Outgoing packet, look up next hop label entry, which contains

● Outgoing label

● Outgoing interface

● Perhaps, outgoing per-hop queuing behaviour

MPLS operation, cond
Label switch router

Outgoing packet operations
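The label switch router steps above reduce to two table lookups and a stack operation. A minimal Python sketch, assuming a simplified incoming label map whose entries point directly at a next hop label entry; the swaps 10→20 and 21→11 echo the Linux example later in the talk, while labels 30, 40 and 500 and the interface names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NextHopLabelEntry:
    op: str                   # MPLS opcode: "SWAP", "PUSH" or "POP"
    out_label: Optional[int]  # label for the new top tag; None for POP
    out_if: str               # outgoing interface


# Incoming label map: incoming label -> next hop label entry (values illustrative).
INCOMING_LABEL_MAP = {
    10: NextHopLabelEntry("SWAP", 20, "eth1"),
    21: NextHopLabelEntry("SWAP", 11, "eth0"),
    30: NextHopLabelEntry("POP", None, "eth1"),
    40: NextHopLabelEntry("PUSH", 500, "pos2-0"),
}


def switch_packet(stack: List[int]) -> Tuple[List[int], str]:
    """Process a label stack (stack[0] is the top tag); return new stack and interface."""
    entry = INCOMING_LABEL_MAP[stack[0]]  # extract top label, look up incoming label map
    if entry.op == "SWAP":
        new_stack = [entry.out_label] + stack[1:]
    elif entry.op == "PUSH":
        # Simplification: keep the incoming label and push a new tag on top,
        # as when a packet enters a backup tunnel.
        new_stack = [entry.out_label] + stack
    elif entry.op == "POP":
        new_stack = stack[1:]
    else:
        raise ValueError(f"unknown opcode {entry.op}")
    return new_stack, entry.out_if


if __name__ == "__main__":
    print(switch_packet([10, 16]))  # ([20, 16], 'eth1')
    print(switch_packet([30, 16]))  # ([16], 'eth1')
    print(switch_packet([40]))      # ([500, 40], 'pos2-0')
```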

● Look up next hop label entry

● Create new tag containing outgoing label

● PUSH tag onto label stack

● Add to transmit queue on outgoing interface
  – Queuing discipline may depend upon
      ● Value in next hop forwarding entry
      ● Value determined from Exp bits, à la IP DSCP and weighted fair queuing + RED

MPLS operation, cond
Ingress label edge router

Incoming packet, look up forwarding equivalence class to next hop label entry (FTN), which contains

● forwarding equivalence class

● next hop label entry

MPLS operation, cond
Ingress label edge router

Incoming packet operations

● Determine forwarding equivalence class using “standard” IP forwarding
  – Basic: look up destination IP address in IP forwarding table
  – Advanced: policy routing, multicast routing, QoS routing, ...

● Use FEC to look up the forwarding equivalence class to next hop label entry table

● Process next hop label entry

MPLS operation, cond
Egress label edge router

Next hop label entry shows this router as the penultimate hop

Protocol-dependent actions to simulate label switch routers being real routers

● Decrement IP TTL

● Generate any ICMP which would have occurred

Forward the packet using the standard IP algorithm

Faking ICMP gives interesting results

Traceroute from Glen's home to www.internet2.edu

 1  sadial.sa.csiro.au  119.657 ms  129.673 ms  100.004 ms
 2  sa.gw.csiro.au  119.944 ms  129.829 ms  110.382 ms
 3  lis255.atm1-0.central.saard.net  131.917 ms  119.858 ms  109.980 ms
 4  sa-nsw.atm.net.aarnet.edu.au  139.715 ms  149.829 ms  140.002 ms
 5  vlan916.gbe3-0.sccn1.broadway.aarnet.net.au  149.941 ms  149.773 ms  149.968 ms
 6  pos1-0.sccn1.manoa.aarnet.net.au  349.907 ms  279.791 ms  289.963 ms
 7  pos2-0.sccn1.seattle.aarnet.net.au  279.866 ms  329.880 ms  279.904 ms
 8  Abilene-PWAVE.pnw-gigapop.net  279.870 ms  351.155 ms  328.555 ms
 9  dnvr-sttl.abilene.ucaid.edu  339.933 ms  339.861 ms  329.944 ms
10  kscy-dnvr.abilene.ucaid.edu  349.847 ms  339.622 ms  350.053 ms
11  ipls-kscy.abilene.ucaid.edu  339.756 ms  339.932 ms  339.903 ms
12  clev-ipls.abilene.ucaid.edu  339.884 ms  349.808 ms  339.963 ms
13  nycm-clev.abilene.ucaid.edu  349.752 ms  349.857 ms  339.969 ms
14  border-abilene-oc3.advanced.org  360.135 ms  359.857 ms  379.851 ms
15  www.internet2.edu  379.865 ms  359.838 ms  359.950 ms

Architectural issues

There is a lot of complexity at the edge

● Especially in the egress router

But we want the edge to be cheap, as there is a lot of it

There are no MPLS applications

ATM has applications

● (Today's bizarre but true fact) Links between 3G base stations and switching points are the most recent application to treat ATM as a transport layer

Even Ethernet has applications

● DEC Local Area Transport

There are no MPLS applications

MPLS exists only to carry other protocols

● The label edge routers must support the protocol

● This isn't new
  – All routers have to support the network layer protocol they are routing

The model is strained somewhat by the abuse of MPLS to carry Ethernet frames

Configuring a label switch router – Linux

Both eth0 and eth1 in label space 1

● mplsadm -L eth0:1
  mplsadm -L eth1:1

Configuring a label switch router – Linux

Configure label switching

● mplsadm -A -I gen:10:1 -O gen:20:ipv4:10.3.0.2 -B
  mplsadm -A -I gen:21:1 -O gen:11:ipv4:10.2.0.1 -B
  – -A -B: add and bind
  – -I: incoming on eth0, generic tag, label 10
  – -O: outgoing on eth1, generic tag, label 20, only if next hop is available

Configuring a label edge router – Linux

Configuration for the left-most router

Label space

● mplsadm -L eth0:1
  mplsadm -L eth1:1

Configuring a label ingress router – Linux

Ingress label edge router

Set forwarding equivalence class in routing subsystem

● route add -net 10.4.0.0/16 gw 10.2.0.2

Set FEC in MPLS subsystem

● mplsadm -A -B -O gen:10:eth0:ipv4:10.2.0.2 -f 10.4.0.0/16
  – Outgoing label of 10

Egress label edge router

● mplsadm -A -I gen:11:1

Configuring a label egress router – Linux

Egress label edge router

Incoming MPLS packets with label 11 are POPped and passed up to the IP routing system

● mplsadm -A -I gen:11:1

Configuring a label edge router – Linux

Label space

● mplsadm -L eth0:1
  mplsadm -L eth1:1

Ingress label edge router

● Forwarding equivalence class is determined by routing sub-system

● route add -net 10.4.0.0/16 gw 10.2.0.2
  mplsadm -A -B -O gen:10:eth0:ipv4:10.2.0.2 -f 10.4.0.0/16

Egress label edge router

● mplsadm -A -I gen:11:1

Representation

How should MPLS look to the network layer?

The preceding is not a good fit

● eth0 has multiple subnets

● eth0 can be partially down

● Routing protocols need considerable work

Representation

A tunnel seems a good fit

● Tunnels run between routers, making intermediate routers invisible

● Tunnels have MTU issues, as does MPLS

● Routing protocols understand tunnels

● Management systems' expectations are met
  – Interface either down or up
  – SNMP counters count something useful

Configuring a label ingress router with tunnels – Linux

Create MPLS tagging

● mplsadm -A -O gen:10:eth0:ipv4:10.3.0.2
  – -A -O: add outgoing label
      ● gen:10: generic tag with label 10
      ● eth0: outgoing interface
      ● ipv4:10.3.0.2: address of remote end of tunnel

Configuring a label ingress router with tunnels – Linux

Create a tunnel interface

● mplsadm -A -T mpls0
  – -A -T: add tunnel
      ● mpls0: tunnel interface name

Configuring a label ingress router with tunnels – Linux

Assign an IP address to the local end of the tunnel, using the same address as the Ethernet interface

● ifconfig eth0
    inet addr:10.2.0.1
  ifconfig mpls0 10.2.0.1 netmask 255.255.255.255
  – mpls0: tunnel interface to configure
  – 10.2.0.1: local-end IPv4 address

Configuring a label ingress router with tunnels – Linux

Bind outgoing label to tunnel

● mplsadm -B -O gen:10:eth0 -T mpls0
  – -B -O: bind outgoing label
      ● gen:10: generic tag with label 10
      ● eth0: interface
  – -T: tunnel
      ● mpls0: tunnel interface name

Configuring a label ingress router with tunnels – Linux

Forward traffic to mpls0 tunnel

● route add -net 10.4.0.0/16 gw 10.3.0.2 dev mpls0
  – 10.4.0.0/16: forwarding equivalence class
  – gw 10.3.0.2: remote tunnel-end address
  – dev mpls0: next hop interface

Configuring a label egress router with tunnels – Linux

Same as normal egress

● mplsadm -A -I gen:11:1

Configure label edge router – Linux

Configure the rightmost label edge router similarly

We want to do this automatically

● That is, to use a signalling protocol

Topic MPLS overview

Label switching
Control plane protocols, RSVP
Routing protocols, OSPF

IP-based signalling and routing

This is unusual: most link technologies develop their own signalling and routing

● Ethernet: bridge protocol data units, which carry
      ● 802.1D spanning tree
      ● 802.1Q virtual LAN registration protocol

● ATM: OAM, ILMI and PNNI

Signalling

LDP: Label distribution protocol

RSVP: Resource reservation protocol

We'll only discuss RSVP

RSVP

A soft-state protocol for establishing and maintaining IntServ QoS paths

● Sent Path message requests an IntServ path

● Received Resv message confirms an IntServ path request

RSVP, cond

New RSVP objects for MPLS paths

Path message

● LABEL_REQUEST: create a label switched path

● EXPLICIT_ROUTE: through these label switch routers

Resv message

● LABEL: inserts entry into label switch forwarding table

RSVP and traffic engineering

Sometimes don't want the shortest path

● A longer congestion-free path is always better than a shorter congested path

● Bizarre customer requirements
  – eg: ADF and links controlled by non-ANZUS telcos

● Diversity
  – Complex, as there are lots of failure modes

● Don't want to share core, cable, conduit, router, UPS, building, site, block, substation, road, flood plain, craft personnel, jurisdiction

RSVP and diversity

● RSVP has “resource affinities”, roughly 32 per label space
  – Enough for broad-brush use, say for a national backbone
  – AARNet doesn't use this

● Our use of MPLS is either too trivial or too complex

RSVP and degraded service
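A minimal sketch of how setup and holding priorities could drive admission on a degraded link. It assumes the RSVP-TE convention that priorities run from 0 (most important) to 7 (least important) and that a new path may pre-empt only paths whose holding priority is numerically greater than its setup priority. The traffic classes, bandwidths and the "least important first" victim order are illustrative choices, not AARNet's plan.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Lsp:
    name: str
    bandwidth: float  # Mbit/s
    setup: int        # 0 (most important) .. 7 (least important)
    holding: int      # 0 (most important) .. 7 (least important)


@dataclass
class Link:
    capacity: float   # Mbit/s, e.g. what survives a partial failure
    lsps: List[Lsp] = field(default_factory=list)

    def free(self) -> float:
        return self.capacity - sum(lsp.bandwidth for lsp in self.lsps)

    def admit(self, new: Lsp) -> List[Lsp]:
        """Admit `new`, pre-empting less important LSPs if needed.

        Returns the pre-empted LSPs; raises if even pre-emption cannot make room.
        """
        # Only LSPs whose holding priority is numerically greater (less important)
        # than the new LSP's setup priority are candidates, least important first.
        candidates = sorted((lsp for lsp in self.lsps if lsp.holding > new.setup),
                            key=lambda lsp: lsp.holding, reverse=True)
        free, victims = self.free(), []
        for lsp in candidates:
            if free >= new.bandwidth:
                break
            victims.append(lsp)
            free += lsp.bandwidth
        if free < new.bandwidth:
            raise RuntimeError(f"cannot admit {new.name}")
        for lsp in victims:
            self.lsps.remove(lsp)
        self.lsps.append(new)
        return victims


if __name__ == "__main__":
    link = Link(capacity=150)
    link.admit(Lsp("research", 60, 4, 4))
    link.admit(Lsp("multicast", 80, 7, 7))
    print([lsp.name for lsp in link.admit(Lsp("voice", 30, 0, 0))])  # ['multicast']
    print(link.free())  # 60
```

A pre-empted LSP is not lost; its head end would then try to re-signal it over whatever capacity remains elsewhere.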

RSVP has a Setup Priority and Holding Priority

● These allow established paths to be pre-empted by a new path

● AARNet is considering using these for recovery scenarios
  – So we can prioritise use of degraded capacity
  – eg: voice, commodity, research, quality video, multicast

RSVP failure

Hello protocol

● HELLO REQUEST

● HELLO ACK Detects

● Node down

● Node reboot
  – Thus needs instant path re-establishment

● All links between the two nodes have failed

RSVP node failure, cond
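A minimal sketch of the two Hello-based detections above. It assumes each Hello carries the sender's instance number (RSVP-TE Hellos carry a source instance that changes when a node restarts), and uses the 150ms Hello timeout quoted later in the talk as a default. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HelloNeighbour:
    """State kept about one RSVP neighbour, driven by its Hello messages."""
    timeout_ms: float = 150.0           # assumption: the talk's 150ms Hello timeout
    last_heard_ms: Optional[float] = None
    instance: Optional[int] = None      # neighbour's instance number, as last seen

    def receive(self, now_ms: float, src_instance: int) -> str:
        """Handle a HELLO REQUEST or HELLO ACK from the neighbour."""
        event = "ok"
        if self.instance is not None and src_instance != self.instance:
            event = "reboot"  # new instance: neighbour restarted and lost its LSP state
        self.instance = src_instance
        self.last_heard_ms = now_ms
        return event

    def poll(self, now_ms: float) -> str:
        """Run periodically; silence longer than the timeout means a failure."""
        if self.last_heard_ms is not None and now_ms - self.last_heard_ms > self.timeout_ms:
            return "down"  # node down, or every link between the two nodes has failed
        return "ok"


if __name__ == "__main__":
    n = HelloNeighbour()
    print(n.receive(0, 7))    # ok
    print(n.poll(100))        # ok
    print(n.poll(200))        # down
    print(n.receive(250, 8))  # reboot: paths must be re-established at once
```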

No “alarm hierarchy” of Hellos

● They run on every label switch path

Good

● Alarm hierarchies often fail
  – CPU overwhelmed by massive failure

Bad

● Bandwidth and CPU interrupts

● End-to-end, not segment-based

This won't do for GMPLS

Signalling configures a path in one direction

It is important that the other direction be established :-)

It should follow the same physical segments

● Balakrishnan, Padmanabhan, Fairhurst, et al, TCP performance implications of network path asymmetry
  – draft-ietf-pilc-asym-07

Topic MPLS overview

Label switching
Control plane protocols, RSVP
Routing protocols, OSPF

Requirements

We want to specify paths with

● Forwarding equivalence class

● Origin and destination node

● Path placement constraints

So the routing protocol needs to distribute

● Connectivity

● Path attributes to satisfy constraint calculations

Possible constraints
Routers
  – Support for prioritisation
  – Support for protocols
  – Available bandwidth
  – Link technologies
  – Protection switching technologies

Possible constraints
Links
  – Available bandwidth
  – Reliability
  – Colour
  – Cost
  – Membership of shared risk link group

OSPF implementation
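The link constraints listed above feed a constrained shortest path first (CSPF) calculation: prune every link that fails the constraints, then run ordinary Dijkstra on what is left. A minimal sketch in which only bandwidth and colour are modelled; the topology, TE metrics, bandwidths and colour bits are hypothetical, and links are one-way.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# (from, to, TE metric, unreserved bandwidth in Mbit/s, colour bitmask)
Link = Tuple[str, str, int, float, int]


def cspf(links: List[Link], src: str, dst: str, need_bw: float,
         colour_mask: int) -> Optional[Tuple[List[str], int]]:
    """Constrained shortest path first: prune, then run Dijkstra.

    A link survives pruning only if it has enough unreserved bandwidth and
    shares at least one colour bit with `colour_mask`.
    """
    graph: Dict[str, List[Tuple[str, int]]] = {}
    for a, b, metric, bw, colour in links:
        if bw >= need_bw and colour & colour_mask:
            graph.setdefault(a, []).append((b, metric))

    dist = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:  # first time dst is popped, d is its shortest distance
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1], d
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, metric in graph.get(node, []):
            nd = d + metric
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return None  # no path satisfies the constraints


if __name__ == "__main__":
    LINKS = [
        ("Sydney", "Seattle", 100, 150, 0b01),
        ("Sydney", "Manoa", 60, 150, 0b10),
        ("Manoa", "Seattle", 60, 40, 0b10),
    ]
    print(cspf(LINKS, "Sydney", "Seattle", 100, 0b11))  # (['Sydney', 'Seattle'], 100)
    print(cspf(LINKS, "Sydney", "Seattle", 30, 0b10))   # (['Sydney', 'Manoa', 'Seattle'], 120)
    print(cspf(LINKS, "Sydney", "Seattle", 200, 0b11))  # None
```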

Add new link state advertisement types which contain link attributes

These LSAs should be ignored by standard OSPF: they are “opaque”

There are three new Opaque LSAs, all identical except for flooding scope

Add an OSPF Hello option so neighbours can become Opaque LSA neighbours and pass Opaque LSAs

Structure of the opaque (huh?)

List of TLVs for routers and links

● Type, length, value

● Allows an unsupported variable to be silently ignored

Attributes are held in sub-TLVs

● TLVs within TLVs

Routers sub-TLV

● Router ID

Structure of the opaque

Link TLV

● Identity sub-TLVs
  – Link type: point-to-point, multi-point
  – Router ID of neighbour
  – Local interface IP address
  – Remote interface IP address

Structure of the opaque
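A minimal Python sketch of how a Link TLV nests its identity and traffic engineering sub-TLVs. The type numbers and the bytes-per-second bandwidth unit follow the OSPF traffic engineering specification (RFC 3630) as we understand it; the addresses, metric, bandwidths and colour are illustrative, and this is not how Zebra's ospfd is structured internally.

```python
import ipaddress
import struct
from typing import List, Tuple


def tlv(tlv_type: int, value: bytes) -> bytes:
    """Type (16 bits), Length (16 bits, excluding padding), Value padded to 4 bytes."""
    padding = b"\x00" * (-len(value) % 4)
    return struct.pack("!HH", tlv_type, len(value)) + value + padding


def ipv4(address: str) -> bytes:
    return ipaddress.IPv4Address(address).packed


def link_tlv(neighbour_id: str, local_ip: str, remote_ip: str, te_metric: int,
             max_bw_bps: float, max_rsv_bw_bps: float, colour: int) -> bytes:
    """Link TLV (type 2) built from sub-TLVs; bandwidths are sent in bytes/second."""
    sub_tlvs = [
        tlv(1, bytes([1])),                                     # link type: point-to-point
        tlv(2, ipv4(neighbour_id)),                             # link ID: neighbour router ID
        tlv(3, ipv4(local_ip)),                                 # local interface IP address
        tlv(4, ipv4(remote_ip)),                                # remote interface IP address
        tlv(5, struct.pack("!I", te_metric)),                   # TE metric, 32-bit cardinal
        tlv(6, struct.pack("!f", max_bw_bps / 8)),              # maximum bandwidth
        tlv(7, struct.pack("!f", max_rsv_bw_bps / 8)),          # maximum reservable bandwidth
        tlv(8, struct.pack("!8f", *[max_rsv_bw_bps / 8] * 8)),  # unreserved, per priority
        tlv(9, struct.pack("!I", colour)),                      # resource colour, 32-bit mask
    ]
    return tlv(2, b"".join(sub_tlvs))


def parse_tlvs(data: bytes) -> List[Tuple[int, bytes]]:
    """Split a run of TLVs into (type, value) pairs, skipping padding."""
    items, offset = [], 0
    while offset < len(data):
        tlv_type, length = struct.unpack_from("!HH", data, offset)
        items.append((tlv_type, data[offset + 4:offset + 4 + length]))
        offset += 4 + length + (-length % 4)
    return items


if __name__ == "__main__":
    encoded = link_tlv("10.3.0.2", "10.2.0.1", "10.2.0.2", 10, 100e6, 50e6, 0x1)
    [(outer_type, body)] = parse_tlvs(encoded)
    print(outer_type, [t for t, _ in parse_tlvs(body)])       # 2 [1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(struct.unpack("!f", dict(parse_tlvs(body))[6])[0])  # 12500000.0
```

A router that does not understand, say, sub-TLV 9 can still step over it using the length field, which is the "silently ignored" property of TLVs.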

Link TLV

● Traffic engineering sub-TLVs
  – Traffic engineering metric, 32-bit cardinal
  – Maximum bandwidth, 32-bit floating point
  – Maximum reservable bandwidth, 32-bit floating point
  – Unreserved bandwidth, 32-bit floating point for each of eight priorities
  – Resource colour, 32-bit mask

● A “colour” might be a DWDM channel, or an E1 timeslice within an E3, or a ...

Limitations - flooding

Traffic engineering values can change rapidly and repeatedly

● Available bandwidth

It is important to limit flooding

Opaque LSAs don't do this nearly as well as they could, as there are only three flooding scopes

Limitations - summarisation

It is difficult to summarise traffic engineering information

Thus areas are difficult to construct

But areas are vital in limiting flooding

Configuration – Zebra
Interface control
zebra.conf

● interface eth0
   bandwidth 100000
   description Link to LSR
   ip address 10.2.0.1/30

● interface eth1
   description Hosts
   bandwidth 100000
   ip address 10.1.0.1/16

● interface mpls0
   description Tunnel
   bandwidth 100000
   ip address 10.3.0.2/32
   no multicast
   ipv6 nd suppress-ra

Configuration – Zebra
OSPF router
ospfd.conf

● router ospf
   ospf router-id 10.2.0.1
   auto-cost reference-bandwidth 10000
   area 0 authentication message-digest
   network 10.1.0.0/16 area 0
   network 10.2.0.0/30 area 0
   network 10.3.0.2/32 area 0
   neighbor 10.3.0.2
   capability opaque
   mpls-te
   mpls-te router-address 10.2.0.1

Configuration – Zebra
OSPF interfaces
ospfd.conf

● interface eth0
   ip ospf network broadcast
   ip ospf authentication message-digest
   ip ospf message-digest-key ... md5 ...
   mpls-te link metric 0
   mpls-te link max-bw 1e+07
   mpls-te link max-rsv-bw 5e+06
   mpls-te link rsc-clsclr 0x1

● interface eth1
   ip ospf network broadcast
   ip ospf authentication message-digest
   ip ospf message-digest-key ... md5 ...

Configuration – Zebra
OSPF tunnel interface
ospfd.conf

● interface mpls0
   ip ospf network point-to-point
   ip ospf authentication message-digest
   ip ospf message-digest-key ... md5 ...

MPLS is improving OSPF

Dynamic shortest path first algorithms

– About 10% of full-DB Dijkstra

Hitless restart

– Removes the assumption that OSPF comes up in a quiescent network

Graceful handling of failure

– Database overflow
– Rate limiting
      ● Especially of flapping interfaces

Load sharing
[Figure: AARNet's US capacity]

AARNet's load share configuration – South

● interface POS1/0
   description Seattle-Sydney SDH
   ip address 192.231.212.34 255.255.255.252
   ip ospf cost 128
   mpls traffic-eng tunnels
   mpls traffic-eng backup-path Tunnel8204
   tag-switching ip
   pos ais-shut
   pos report lrdi
   ip rsvp bandwidth 150000 150000
   ...

AARNet's load share configuration – North

● interface POS2/0
   description Seattle-Manoa SDH
   ip address 192.231.212.162 255.255.255.252
   ip ospf cost 64
   mpls traffic-eng tunnels
   mpls traffic-eng backup-path Tunnel8203
   tag-switching ip
   pos ais-shut
   pos report lrdi
   ip rsvp bandwidth 150000 150000

OSPF design hints

Use current best practice

● Small area 0, consistent with TE
  – Area 0 has total network knowledge

● Using areas allows address aggregation
  – Most importantly, this aggregates network state

  – Addressing needs to be thought out in advance

[Figure: OSPF area design. Labels: Area 0, Area 1, Area 2; Manoa, Sydney, Seattle core; Wollongong core]

OSPF design hints, cond

● Loopback interface as router ID
  – Make this a /32

● Broadcast and loop media have an advantage
  – Only two routers in subnet (DR and BDR) track area state

● Don't redistribute
  – Use network statements
  – You'll end up with a lot of these, so use a Perl script

● Use MD5 authentication

Topic Protection technology
Fast re-route

Basic mechanism
Fast re-route

● Detect fault using
  – Link layer carrier loss
  – RSVP Hello timeout (150ms)

● Signal failure using RSVP ResvTear message

● Change to pre-established label switch path

● Recalculate optimal paths by running OSPF

RSVP messages

Path FAST_REROUTE

● Request a path protected with a fast re-route path

Path DETOUR

● Request a fast re-route path

Two modes of operation

J: LSP oriented

● Establish a detour LSP to protect one other LSP

● Upon failure switch packets to the detour LSP

Two modes of operation

C: Tunnel oriented

● Establish a tunnel to protect other tunnels

● Upon failure send the packets through the tunnel
  – Pushing onto the label stack

● One backup tunnel can protect many other tunnels

These don't interoperate. Ouch

OSPF run to clean up

A multiple failure may not lead to a sane topology

OSPF is run to route all active main and detour LSPs optimally

Need to rate limit how often this is done

● Else intermittent interface failures will use more CPU than they deserve

Tunnel-style fast re-route

The main LSP

● interface POS1/0
   description Seattle-Sydney fiber
   ip address 192.231.212.34 255.255.255.252
   mpls traffic-eng tunnels
   ! Seattle-Manoa protect
   mpls traffic-eng backup-path Tunnel8204
   tag-switching ip
   pos ais-shut
   pos report lrdi
   ip rsvp bandwidth 150000 150000
   ...

Tunnel-style fast re-route

The backup tunnel

● interface Tunnel8204
   description Seattle-Manoa backup
   ip unnumbered Loopback0
   tag-switching ip
   ! Loopback0 on manoa
   tunnel destination 192.231.212.148
   tunnel mode mpls traffic-eng
   tunnel mpls traffic-eng priority 0 0
   tunnel mpls traffic-eng path-option 1 explicit name sea-haw
   ...

Topic AARNet's experiences

Configuration
Protection
Measurement
Properties of international links
Future

“You are in one of a large number of tunnels, all seemingly alike”

The number of MPLS paths explodes quickly

It took some time and a lot of care to get all the tunnels established

Managing tunnels
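Signalling sets up each LSP in one direction only, and every main LSP should have a protect LSP, so both are worth checking automatically. A minimal sketch, assuming the LSP list has already been collected from the routers (for example over SNMP) into plain Python data; the router names and the tunnel names "lsp-a" and "lsp-b" are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Lsp = Tuple[str, str]  # (head-end router, tail-end router)


def missing_reverse(lsps: Set[Lsp]) -> Set[Lsp]:
    """LSPs whose reverse direction (tail -> head) has not been established."""
    return {(head, tail) for head, tail in lsps if (tail, head) not in lsps}


def missing_protection(protect_of: Dict[str, Optional[str]]) -> Set[str]:
    """Main LSP names whose protect LSP is absent (None) in the collected data."""
    return {main for main, protect in protect_of.items() if protect is None}


if __name__ == "__main__":
    lsps = {("Sydney", "Seattle"), ("Seattle", "Sydney"), ("Sydney", "Manoa")}
    print(missing_reverse(lsps))  # {('Sydney', 'Manoa')}
    print(missing_protection({"lsp-a": "lsp-a-protect", "lsp-b": None}))  # {'lsp-b'}
```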

Naming conventions

Only the beginning of automated tools

● These tend to be proprietary rather than general, and driven from a GUI rather than a database

We had to build a lot of our own tools

● SNMP program to check all LSPs had a reverse LSP

● Wanted to write more but there were insufficient router MIBs

Topic AARNet's experiences

Configuration
Protection
Measurement
Properties of international links
Future

Restoration

MPLS restoration performance should be worse than SDH restoration performance

● MPLS is end-to-end protection and the link latency is 80ms

● SDH has section protection, longest section is 40ms

Restoration

This was true in practice

● Still not long enough for a phone user to hang up

● Too long to be used to switch routers in and out of the working path
  – Want to do this for software upgrades

Performance under stress

MPLS restoration was better behaved than SDH when things fell apart

● AARNet's network management system has a sophistication that the SDH systems do not have
  – This leverages the work on monitoring generic IP links

● We could detect and isolate odd conditions before they threatened service

SDH in practice

SDH alarms can overwhelm the management console

● Some vendors have poor isolation between configuration and operation

Configuration errors are disturbingly common

No interlocks

● Put main circuit into loopback

● Put protect circuit into loopback

OSPF

Far too easy to cause OSPF-TE to fail

● Flapping interfaces drove CPU to 100%

● CPU then fails to generate OSPF Neighbour Hellos

● OSPF loses adjacencies

● CPU returns to 0%

● Repeat

OSPF, cond

Fixes

● Obvious solution is to rate limit repeated OSPF next-state output where the state inputs are the same
  – Router manufacturers have gone for simpler variants of this, such as rate limiting all state changes

OSPF, cond
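A minimal sketch of the "obvious solution" above: remember the output computed for each distinct input state and, within a hold-down, reuse it instead of recomputing when a flapping interface returns the network to a state already seen. The toy route calculation, interface names and 10 second hold-down are hypothetical, and real routers use simpler timers, as the slide notes.

```python
from typing import Callable, Dict, FrozenSet, Tuple

LinkStates = Dict[str, bool]  # interface name -> up?


class MemoisedRouteCalculation:
    """Reuse the routes computed for an input state if that state recurs
    within `hold_down_s`, instead of recomputing on every flap."""

    def __init__(self, compute: Callable[[LinkStates], object], hold_down_s: float):
        self.compute = compute
        self.hold_down_s = hold_down_s
        self.cache: Dict[FrozenSet[Tuple[str, bool]], Tuple[float, object]] = {}
        self.runs = 0  # how many full calculations were performed

    def routes(self, now_s: float, link_states: LinkStates) -> object:
        key = frozenset(link_states.items())
        cached = self.cache.get(key)
        if cached is not None and now_s - cached[0] < self.hold_down_s:
            return cached[1]  # same inputs seen recently: same output, no CPU spent
        self.runs += 1
        output = self.compute(link_states)
        self.cache[key] = (now_s, output)
        return output


def toy_spf(link_states: LinkStates) -> Tuple[str, ...]:
    """Stand-in for a real SPF run: just the sorted list of usable interfaces."""
    return tuple(sorted(name for name, up in link_states.items() if up))


if __name__ == "__main__":
    calc = MemoisedRouteCalculation(toy_spf, hold_down_s=10)
    for t in range(11):  # pos1-0 flaps every second for ten seconds
        calc.routes(t, {"pos1-0": t % 2 == 0, "pos2-0": True})
    print(calc.runs)  # 3: two distinct states, plus one refresh when the hold-down expires
```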

Fixes, cond

● Dynamic alternatives to the Dijkstra algorithm
  – Run time depends on the “importance” of the lost link, not the size of the total database
  – In practice, about 10% of the resources of the standard algorithm

Topic AARNet's experiences

Configuration
Protection
Measurement
Properties of international links
Future

Measurement

Traceroute and ping haven't been useful as performance-measuring tools since flow routing; MPLS nails the coffin shut

● It's all faked at the egress router, probably on the slow path

Active measurement

Need to allow for parallel paths

● Four adjacent IP addresses on measuring platforms

● Hashing will place these on differing paths to the same destination

Need to use a fast-path protocol

● Not ICMP

Be careful not to measure the measurement host

Active measurement
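Why the measuring platforms need several adjacent addresses: hash-based load sharing pins every packet of a flow to one path, so a single measurement flow only ever sees one of the parallel paths. A minimal sketch, assuming a hypothetical pair of paths and using SHA-256 as the hash; routers typically use a cheap hardware hash over header fields instead, but the property that matters (same flow, same path) is the same. With two paths, four addresses make it likely, though not certain, that both paths are exercised.

```python
import hashlib
import ipaddress

PATHS = ("north", "south")  # hypothetical pair of parallel paths


def path_for_flow(src: str, dst: str, protocol: int, sport: int, dport: int) -> str:
    """Pick a path by hashing the flow's addresses and ports.

    Every packet of a flow hashes the same way, so it stays on one path and
    arrives in order; different flows spread across the paths.
    """
    key = f"{src}|{dst}|{protocol}|{sport}|{dport}".encode()
    return PATHS[hashlib.sha256(key).digest()[0] % len(PATHS)]


if __name__ == "__main__":
    base = ipaddress.IPv4Address("192.0.2.10")  # hypothetical measurement host
    for offset in range(4):  # four adjacent source addresses, UDP rather than ICMP
        src = str(base + offset)
        print(src, path_for_flow(src, "198.51.100.1", 17, 5001, 5001))
```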

Loss

● Indicates major fault or congestion

Latency

● Indicates protection or misconfiguration
  – Measurement system needs to know nominal latency for main and protect paths

SNMP

Needed to detect protection event

Needed to detect loss of protect path

● RECOVERY Service:Tunnel8194(Broadway-Seattle)Backup Host:SCCN-Broadway-Router Address:162.231.212.20 State:OK Interface:OK–1 Date: Sat20Jul12:54:24.3

Useful for checking configuration

● Each label switch path has a reverse path

● Each main LSP has a protect LSP

MPLS load sharing in operation
[Figure: SCCN interfaces in Sydney. Sydney, NSW – Seattle, WA; Sydney, NSW – Manoa, HI – Seattle, WA]

MPLS load sharing in operation

Graphs are similar in shape but not in detail

● Load sharing is by hashing
  – As round robin would deliver every second packet out of order

● Sydney-Manoa traffic is not load-shared
  – As the southern path is Sydney-NZ-Fiji-Seattle-Manoa

MPLS traffic engineering in operation – Manoa
[Figure: typically file transfer traffic; label switched path Sydney, NSW – Manoa, HI]

MPLS traffic engineering in operation – Manoa
No load sharing Sydney-Manoa

● South path only used if the much more direct North path fails

Topic AARNet's experiences

Configuration
Protection
Measurement
Properties of international links
Future

SDH

SDH works well

But you can't build a genuine 99.999% availability with only one redundant path

Failure patterns

Many small single-segment failures

● These are usually intentional

● Software upgrades

● Maintain consistent active service age of equipment

● “Hits” of 50ms or less

● Better to blackhole this traffic rather than attempt a protection switch

● When should we declare a path failure?

● A long hold-down time avoids MPLS fast re-routes at the cost of a greater time to restore service upon a genuine segment failure

Failure patterns
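The hold-down trade-off described above can be put in numbers. A minimal sketch, assuming a hypothetical log of outage durations (SDH protection hits of up to 50ms, then one genuine break) and a hypothetical 100ms re-route time; in practice the hold-down would come from the capacity vendor's SDH protection switching figures.

```python
from typing import List


def reroutes_triggered(outages_ms: List[float], hold_down_ms: float) -> int:
    """Outages longer than the hold-down are declared path failures and re-routed;
    shorter hits are simply blackholed until the link recovers."""
    return sum(1 for outage in outages_ms if outage > hold_down_ms)


def worst_case_loss_ms(hold_down_ms: float, reroute_ms: float) -> float:
    """For a genuine failure, traffic is lost for the hold-down plus the switch time."""
    return hold_down_ms + reroute_ms


if __name__ == "__main__":
    outages = [20, 35, 50, 45, 30, 400]  # hypothetical outage log, in ms
    for hold_down in (0, 60, 200):
        print(hold_down, reroutes_triggered(outages, hold_down),
              worst_case_loss_ms(hold_down, reroute_ms=100))
    # 0 6 100
    # 60 1 160
    # 200 1 300
```

A hold-down just above the SDH switching time suppresses every small hit while adding little to the outage for a real break; a much longer one buys nothing further.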

Causes of major failures

– Physical break of cable

● SCCN had a cable break whilst maintaining the Protect segment

– Craft technician error

● Decommission or loopback of wrong link

Failures are often made worse

– Loopback test both segments simultaneously
– Insufficient CPU provisioning in control plane
– Network Management System fails when most needed

MPLS fast re-route tuning

Configuration

● Need to calculate a value for the MPLS fast re-route hold-down timer from the capacity vendor's SDH automatic protection switching tables
  – You'll get lots of small hits otherwise

International link interior routing design

International links are an obvious OSPF stub area

● With OSPF default pointing back towards NOC
  – BGP default might point towards a US ISP
  – Exterior default overrides interior default during normal operation

● A stub area is good as we want to isolate MPLS-TE information for an international link

H.323 configuration

H.323 gatekeeper should always reject calls to PoP console server modems

● Forcing calls to re-route to PSTN without needing a prefix

● Uni phone books never list that uni's prefix, to defeat VoIP toll bypass

Personal relationships are important

SCCN has been forthright and honest about failures

● Helps considerably to estimate risk of outage recurring

● US ISPs compare poorly

AARNet's unusual requirements interested the SCCN technical staff

This allowed us to build excellent relationships which have carried over into operation

Topic AARNet's experiences

Configuration
Protection
Measurement
Properties of international links
Future

Future intentions

Obviously fuzzy

Lots of trans-Pacific capacity becoming available

● SCCN (AU-US)

● AU-JP

● US-JP

Opportunity to construct protection against design and operational failure by international capacity providers

MPLS across undersea vendors

MPLS should be able to do multi-vendor protection better than SDH

● SDH has no segment visibility in this application

● No clocking issues

● No “profile” issues

MPLS VPNs look like a good idea

MPLS VPNs look attractive for virtual research networks

● For example, to research routing protocols

● When MPLS becomes a campus technology, it allows network policies smaller than an autonomous system
  – Not necessarily a good thing
  – Moves the complexity (does not remove the complexity)
  – At least the complexity is no longer seen in the global BGP routing table

MPLS VPNs look like a bad idea

VPNs are useful for “crunchy outside, soft inside” firewalled networks

● Do not need a firewall at each site

● Firewall configuration is simpler

Assumes that the “baddies” are on the outside, ROFL

MPLS configured as best effort offers no protection from denial of service attacks in the network “interior”

Use of MPLS to simplify BGP

A&R networks often want to offer transit to other A&R networks

Problem: BGP configuration for this can be complex, and the transit network gets caught up in this complexity

Solution: offer MPLS transit

Use of MPLS to simplify BGP

We often use policy routing because we want multiple routing instances in one router

● Operational nightmare, especially in protection scenarios

Can use MPLS to implement this

● Run two BGP instances, one per VPN

● Place interfaces in particular MPLS VPNs

MPLS monitoring

Routers don't provide nearly enough performance information

● Protection

● How long did protection take?

● What was the cost in CPU resource?

● Enabling capacity planning for protection

● End-to-end performance
  – Loss
  – Latency, especially changes

Topics if we have time

Quality of service
MPLS on Linux
GMPLS

Forwarding equivalence class and QoS

A forwarding equivalence class is mainly about routing

Quality of service is mainly about queuing

Two choices

Place differing QoS into differing FECs

● Label switch router uses label to infer forwarding and queuing

Place differing QoS into same FEC

● Use Experimental bits to mark 3 bits of class of service
  – Treating Exp similarly to IP's DSCP

Not really a choice, we can do both

It might make router configuration easier if the queuing discipline were always driven from the Exp bits

Even if Exp always has the same value for that forwarding equivalence class

Traffic engineering and QoS routing

Traffic engineering can be used to recover some qualities of service before others

● Recover voice services before recovering best-effort data before recovering video

See “reservation priority”

Experience with QoS so far

No harder than IP DiffServ :-)

● Same lack of coherent, total solution

● Same “will be supported in next version” issues

Topics if we have time

Quality of service
MPLS on Linux
GMPLS

MPLS on Linux
mpls-linux on sourceforge

● Kernel patches against 2.4
  – Compiles against 2.4.18-rc3-ac1 after about an hour's work

● Includes Nortel-based LDP, library and command line configuration. Supports tunnels and MPLS opcode programming. Over ATM, Ethernet and (with patches) PPP.

● Beta

MPLS on Linux

Zebra ospfd

● OSPF-TE with Opaque LSA

● CVS is usually more stable than releases

● Late beta

MPLS on Linux
rsvpd

● Not yet with Hellos and fast re-route

● Early beta, research orientation

MPLS on Linux

NIST Switch

● For BSD

● Reasonably complete MPLS, RSVP-TE implementation

● Web site suggested a Linux port could happen, but this was in 2001

● Current status?

Topics if we have time

Quality of service
MPLS on Linux
GMPLS

GMPLS expands MPLS's aims

World domination

● One control plane protocol
  – RSVP-TE

Controlling all switching mechanisms

● MPLS
  – ATM
  – Ethernet
  – RPR

● SDH/SONET

GMPLS expands MPLS's aims, cond

By viewing all switching as a special case of MPLS switching we can get a single

● Control layer
  – Not one per switching mechanism

● Management domain
  – Not one per vendor per switching mechanism

● Security mechanism
  – Not a billion passwords, all known and unalterable

That's all folks