Back to the Future Part 4: The Internet∗

Simon S. Lam

Department of Computer Sciences The University of Texas at Austin [email protected]

∗This is an edited transcript of the author's SIGCOMM 2004 keynote speech given on August 31, 2004. This work was sponsored in part by National Science Foundation grants ANI-0319168 and CNS-0434515 and Texas Advanced Research Program grant 003658-0439-2001.

Slide 1

[Slide 1: title slide.]

I thank the SIGCOMM Awards committee for this great honor. I would like to share this honor with my former and current students, and colleagues, who collaborated and coauthored papers with me. Each one of them has contributed to make this award possible. To all my coauthors, I thank you.

It is most gratifying for me to accept this award at a SIGCOMM conference because my relationship with the conference goes back a long way. Over 21 years ago, I organized the very first SIGCOMM conference, held on the campus of the University of Texas at Austin in March 1983. Let me tell you a little story about the first SIGCOMM.

Back in 1982-1983, the SIGCOMM chair was David Wood. The vice chair was Carl Sunshine. One day during the summer of 1982, Carl called me up on the phone. He said, "Simon, SIGCOMM is broke. We have a lot of red ink. How would you like to organize a SIGCOMM conference to make some money for us?" I said, "Okay, I will do it." Then, we talked a bit. Towards the end of our conversation, Carl said, "By the way, Simon, as I said, SIGCOMM is broke. So if you lose any money on the conference, you will have to take care of the loss yourself."

I was much younger then. The great thing about young people is that they have no fear. I don't remember that I ever worried about losing money. I arranged to use the Thompson conference center on campus, which charged us only a nominal fee. We had a great lineup of speakers. The pre-conference tutorial on network protocol design was by David Clark, a former SIGCOMM Award winner. The Keynote session had three speakers: [...], another former SIGCOMM Award winner; John Shoch of Xerox, who was in charge of [...] development at that time; and Louis Pouzin, yet another former SIGCOMM Award winner. We had 220 attendees, an excellent attendance for a first-time conference. We ended up making quite a bit of money. End of story.

Slide 2

[Slide 2: IP won the networking race. Many competitors in the past: SNA, DECnet, XNS; X.25, ATM, CLNP. Protocol stack: FTP, SMTP, HTTP, RTP, DNS over TCP/UDP over IP over LINK1, LINK2, ..., LINKn. IP provides end-to-end delivery of datagrams, best-effort service only; IP can use any link-layer technology that delivers datagrams (packets).]

IP won the networking race for data communications. However, as recently as ten years ago, before widespread use of the World Wide Web, it was not clear that IP would be the winner. The original ARPANET in the 1970s and later the Internet had many competitors in the past.

In industry, the competitors included SNA, IBM's Systems Network Architecture; DECnet, with the Digital Network Architecture; and XNS, which stands for Xerox Network Systems. SNA was developed around 1975. At that time, IBM was the 500-pound gorilla in the computer industry. IBM already had networking products used by numerous customers. Those early networks had a tree topology, with a mainframe computer connected to data concentrators and terminals. But SNA was designed to have a new architecture with a mesh topology similar to ARPANET's architecture. Back in 1975, very few people would have predicted that the small, experimental ARPANET, rather than SNA, would emerge 25 years later as winner of the networking race. In fact, when SNA was first developed, SNA was the acronym for Single Network Architecture. IBM had very ambitious goals. However, by the time SNA was announced to the public, the name was changed to Systems Network Architecture, possibly due to antitrust concerns.

IP also had strong competitors in the standards world. First, there was X.25 in the 1970s and 1980s. Then there was ATM in the 1990s. Also, there were ISO protocols such as CLNP (Connectionless Network Protocol).

IP is a very simple protocol. It provides end-to-end delivery of datagrams, also called packets. It provides no service guarantee to its users. In turn, IP expects no service guarantee from any link-layer technology it uses. Therefore, any network can connect to IP.

Ten years ago, when Internet applications were primarily email, ftp, and web, IP's simplicity was its greatest strength in fighting off competitors. In the future, however, IP's simplicity is possibly a liability, because the requirements of the Internet's future applications will be more demanding, particularly the requirements of interactive multimedia applications.

Slide 3

[Slide 3: IP's underlying model is a network of queues, a revolutionary change from the circuit model (Kleinrock 1961). Unreliable channels, limited buffer capacity; prone to congestion collapse. Each packet is routed independently using its destination IP address; no concept of a flow between source and destination, no flow state in routers. Plot: system throughput vs. offered load.]

IP's underlying model is a network of queues. Packets are transmitted from one node to another, leaving one queue and joining the next queue as they travel from source to destination. Each packet travels independently. There is no concept of a flow of packets belonging to the same session between source and destination processes.

I give credit to Len Kleinrock for proposing the network of queues model for data communications as an alternative to the circuit model in telephone networks. Some might argue that Kleinrock did not invent the name "packet switching." But he was definitely the first to propose the network of queues model, in his 1961 Ph.D. dissertation proposal.

In a real network, channels are unreliable. More importantly, buffer capacity for queueing is limited. Therefore, packets may be discarded because of buffer overflow. As a result, a network of queues is prone to congestion collapse, as illustrated in this figure, where system throughput decreases rapidly as offered load becomes large. This is a weakness of IP that we should keep in mind.

Slide 4

[Slide 4: Congestion collapse—ALOHA channel. The ALOHA System (Abramson 1970); Poisson process assumption; pure ALOHA throughput S = G e^(-2G) (Abramson); slotted ALOHA throughput S = G e^(-G) (Roberts); ARPANET Satellite System (1972).]

Congestion collapse was a subject of great interest to me when I was a Ph.D. student at UCLA. Congestion collapse of the ALOHA channel was the inspiration of my dissertation research. In the next several slides, I will show you a little bit of my early work in the 1970s. My early work had a lot of influence on my thinking, and may explain why I hold certain opinions later on in this talk.

I went to UCLA as a graduate student in 1969, when the first IMP of the ARPANET was being installed there. The next year, Norm Abramson presented his seminal paper on the ALOHA System at the Fall Joint Computer Conference. Abramson made use of a Poisson process assumption and obtained the well-known formula for the throughput of pure ALOHA.

The throughput formula for slotted ALOHA was an observation by Larry Roberts in 1972. By then, the ARPANET was already up and running. Roberts was looking for something else to do. Motivated by the ALOHA System, he started the ARPANET Satellite System project, later renamed the Packet Satellite project. I was one of the first graduate students to work on the project. Others include Bob Metcalfe, who was then at Xerox PARC, and Dick Binder, who was working on the ALOHA System.
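
The two formulas on Slide 4 are easy to check numerically. The short sketch below is my own illustration, not part of the original talk; it evaluates S = G e^(-2G) and S = G e^(-G) for a range of offered loads and prints their maxima, 1/(2e), about 0.184 at G = 0.5 for pure ALOHA, and 1/e, about 0.368 at G = 1 for slotted ALOHA (the 0.368 figure quoted with the Slide 6 experiment).

import math

def pure_aloha_throughput(g: float) -> float:
    """Abramson's pure ALOHA throughput S = G * e^(-2G)."""
    return g * math.exp(-2.0 * g)

def slotted_aloha_throughput(g: float) -> float:
    """Roberts' slotted ALOHA throughput S = G * e^(-G)."""
    return g * math.exp(-g)

if __name__ == "__main__":
    for g in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(f"G={g:4.2f}  pure={pure_aloha_throughput(g):.3f}  "
              f"slotted={slotted_aloha_throughput(g):.3f}")
    # Peaks: pure ALOHA at G = 0.5 -> 1/(2e) ~ 0.184; slotted at G = 1 -> 1/e ~ 0.368.
    print("max pure    =", round(pure_aloha_throughput(0.5), 3))
    print("max slotted =", round(slotted_aloha_throughput(1.0), 3))

Both curves fall toward zero as G grows, which is the same qualitative shape as the throughput-versus-offered-load sketch on Slide 3.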

Slide 5

[Slide 5: Congestion collapse—ALOHA channel (cont.). The early ALOHA System was vastly under-utilized. Backoff algorithm for slotted ALOHA (Kleinrock-Lam 1973): retransmit a collided packet randomly into one of K future time slots; the Poisson process assumption implies K → ∞.]

Congestion collapse was not a possibility discussed by ALOHA System researchers. In their early presentations, the ALOHA System was always said to work well. This is because the ALOHA channel was vastly under-utilized. In today's terminology, it was over-provisioned. One of my opinions is that over-provisioning covers up problems.

Also, with the Poisson process assumption, Abramson abstracted away in his analysis the need for a backoff algorithm. So the first thing Len Kleinrock and I did for slotted ALOHA was to introduce a realistic backoff algorithm for collided packets. In our algorithm, each collided packet is retransmitted randomly into one of K future time slots, where K is a parameter. Our analysis produced some interesting observations. First, the Poisson process assumption implies that K is infinity. In hindsight, this is obvious: it is not possible to get independent arrivals unless collided packets are retransmitted into the infinite future. Second, for a given channel throughput there is a K value that minimizes average delay.

Slide 6

[Slide 6: Congestion collapse—ALOHA channel (cont.). ASS Note 48 (Lam 1973); K = 15. Plot: channel throughput vs. time in slots.]

In early 1973, I ran a simulation to confirm my conjecture that ALOHA channels are unstable. I waited a long time to do this because, in those days, running a simulation was not as easy as today. Obviously, we didn't have ns. In fact, there was no simulation package of any kind to use. We had to write simulation programs from scratch.

My simulation results were disseminated as ASS Note 48, where ASS stands for ARPANET Satellite System. The experiment shown here was run for a fixed value of K equal to 15. The channel input rate was 0.35, which is less than the theoretical maximum value of 0.368 for slotted ALOHA. Here I have plotted channel throughput as a function of time in slots. Notice that congestion collapse occurred precipitously after about 3,000 time slots. I believe that this was the first experimental demonstration of congestion collapse and ALOHA instability.

Slide 7

[Slide 7: Congestion collapse—ALOHA channel (cont.). Adaptive backoff algorithm (Lam 1974): retransmit a packet with m previous collisions into K(m) slots, where K(m) is monotonically nondecreasing in m; K(1) = 10, K(m) = 150 for m ≥ 2.]

Subsequently, I developed a Markov chain model that explains ALOHA instability. I also proposed and investigated several adaptive backoff algorithms. This particular heuristic might look familiar to you. The idea is to use the number of previous collisions of a packet as an indication of current load. A packet with m previous collisions is retransmitted into an interval of K(m) slots, where K(.) is a monotonically nondecreasing function of m. In particular, K(m) increases rapidly as m increases from 1. Exponential backoff is a special case of this heuristic.

For the experimental results shown here, I used K(1) = 10, which is very close to optimal for a wide range of channel throughput. I used K(m) = 150 for m ≥ 2. There was no need for a larger K value because I found by analysis that K = 150 was sufficient to stabilize a slotted ALOHA system with 400 users, and this particular simulation had 400 users. I have plotted channel throughput as a function of time in slots. In this experiment, the input rate was about 0.32, and it jumped up to 1 for an interval of 200 time slots. As we can see, the adaptive backoff algorithm handled the input pulse easily. This is because the jump in K value from 10 to 150 was actually faster than exponential.
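
To make the retransmission rule concrete, here is a small sketch of how a sender might schedule a retransmission under the K(m) heuristic. It is my own illustration rather than code from the 1974 work; the only values taken from the talk are K(1) = 10 and K(m) = 150 for m ≥ 2, and the function and variable names are invented.

import random

def backoff_window(num_collisions: int) -> int:
    """K(m) from Slide 7: monotonically nondecreasing in the collision count m."""
    if num_collisions <= 0:
        return 1            # first attempt: send in the next slot
    if num_collisions == 1:
        return 10           # K(1) = 10
    return 150              # K(m) = 150 for m >= 2

def schedule_retransmission(current_slot: int, num_collisions: int) -> int:
    """Pick a retransmission slot uniformly at random within the next K(m) slots."""
    k = backoff_window(num_collisions)
    return current_slot + random.randint(1, k)

if __name__ == "__main__":
    random.seed(1)
    slot = 0
    for m in range(4):  # a packet that has suffered 0, 1, 2, 3 collisions so far
        slot = schedule_retransmission(slot, m)
        print(f"after {m} collision(s): retransmit in slot {slot}")

Choosing K(m) = 2^m instead (perhaps with a cap) gives classic binary exponential backoff, which is why the narrative calls exponential backoff a special case of this heuristic.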

Slide 8

[Slide 8: Congestion collapse—packet networks. Network of queues with limited buffers; static window flow control in TCP not sufficient (Lam, late 1970s). Plot: network throughput rate (packets/sec) vs. number of buffers at each node.]

As a graduate student, I was also very interested in the possibility of congestion collapse in the ARPANET. We knew that a network of queues model with limited buffer capacity would be prone to congestion collapse. So the real question was this: What about packet networks with window flow control? The original ARPANET had flow control using a message called RFNM, and the subsequent TCP had window flow control. From Little's law, we knew that window flow control provides congestion control to some extent. My conjecture was that static window flow control would not be sufficient to avoid congestion collapse unless each network node had a very large buffer capacity. I had to wait until 1979 before I had enough computing resources to run meaningful simulation experiments to validate this conjecture.

Here are some of my simulation results for a 7-node network. Each node is a router. The vertical axis is network throughput rate. The horizontal axis is the number of buffers in each router. There are three curves: the left for a network with 84 flow-controlled sessions, the middle for a network with 168 flow-controlled sessions, and the right for a network with 336 flow-controlled sessions.¹ The window size for each session was fixed at twice the number of hops between the session's source and destination. Each point on a curve represents the network throughput of a simulation experiment that ran for 150 seconds of simulated time. So points with low network throughput indicate that the networks were experiencing congestion collapse.

There are two ways to look at these results. First, for a fixed number of flow-controlled sessions (consider the curve on the right, for a network with 336 flow-controlled sessions), if the buffer capacity at each node is too small, congestion collapse would occur quickly. Second, consider a network with a fixed amount of buffer capacity, say 150 buffers per node. If the number of sessions is 168, the network will most likely not incur congestion collapse. But if the number of sessions increases to 336, then congestion collapse will occur quickly.

¹Flow-controlled sessions are labeled as "virtual channels" in the figure.

Slide 9

[Slide 9: Internet congestion control. Van Jacobson's algorithms for TCP congestion control (late 1980s): main reason for stability of the current Internet. UDP does not perform congestion control; preferred by voice and video applications. More and more voice and video traffic will impact Internet stability. I believe in differentiating voice and video flows as well as flow admission control.]

Even though I knew that the TCP window size needs to be adaptively controlled, I was not smart enough to design an effective adaptive algorithm. So several years went by, and after the Internet experienced real congestion collapse, the problem was solved by Van Jacobson, Raj Jain, and others. Van Jacobson's algorithms for TCP congestion control are generally acknowledged as the main reason for the stability of the current Internet, where more than 90% of the traffic is carried by TCP.

Nowadays, UDP, rather than TCP, is the transport protocol preferred by voice and video applications, because UDP does not perform congestion control. If the Internet has to carry more and more UDP traffic, its stability will be affected. One possible way to address this concern is to entice designers of multimedia applications to use TCP-friendly congestion control. This is advocated by Sally Floyd and others. This approach may work for the current Internet, which has only small amounts of voice and video traffic. However, in the future, if voice and video traffic become very large components of the Internet's traffic mix, I believe that differentiating voice and video flows, together with the use of flow admission control, will be a better approach.
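
A crude way to see why 150 buffers per node separates the 168-session case from the 336-session case is to bound the number of packets that static windows can keep in flight. The sketch below is my own back-of-the-envelope check, not the author's simulation model: the 2-hop average path length is an assumed, made-up value, and real behavior depends on topology and traffic details that this calculation ignores.

def worst_case_in_flight(num_sessions: int, avg_hops: float) -> float:
    """Each session's static window is twice its hop count (as in the Slide 8
    experiments), so at most 2 * hops packets per session are in flight at once."""
    return num_sessions * 2 * avg_hops

def buffer_pressure(num_sessions: int, nodes: int, buffers_per_node: int,
                    avg_hops: float = 2.0) -> float:
    """Ratio of worst-case in-flight packets to total buffer capacity.
    Values well above 1 mean windows alone cannot prevent buffer overflow.
    avg_hops = 2.0 is an assumption, not a number from the talk."""
    return worst_case_in_flight(num_sessions, avg_hops) / (nodes * buffers_per_node)

if __name__ == "__main__":
    for sessions in (84, 168, 336):
        print(sessions, "sessions -> pressure",
              round(buffer_pressure(sessions, nodes=7, buffers_per_node=150), 2))

With these assumed numbers, only the 336-session configuration can overcommit a network with 150 buffers per node, which is consistent with the qualitative behavior described above.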

Slide 10

[Slide 10: Efforts to extend/replace IPv4 (past 15 years): IP multicast; QoS support (IntServ, RSVP, DiffServ); the Active Networks research program of DARPA; IPsec (retrofitting IP with security); IPv6 (replacing IPv4: 128-bit IP address, flow concept to support QoS); Mobile IP; ...]

The work of Van Jacobson was the start of a new era of Internet research, beginning from the late 1980s, by a new generation of researchers. In SIGCOMM '88, Steve Deering's paper was the beginning of a large body of research on IP multicast. The fair queueing paper (by Demers, Keshav, and Shenker) in SIGCOMM '89 was probably the beginning of an even larger body of research on quality of service. Throughout the 1990s, the research community was most interested in improving, extending, or even replacing IPv4. These efforts also include the Active Networks research program funded by DARPA, IPsec with the objective of retrofitting IP with security measures, IPv6 with the objective of replacing IPv4, Mobile IP with the objective of supporting mobility, etc. For this audience, it is unnecessary for me to elaborate on any of these topics. It suffices to say that numerous protocols have been designed, and thousands of papers have been written and published, as well as Internet drafts and RFCs. The research community worked very hard on these topics. But as we know, very few of these efforts have been deployed to a significant extent. Of the ones on this list, I would say that DiffServ and IPsec have meaningful deployment on the Internet, perhaps because they can be quite useful even when deployed incrementally.

Slide 11

[Slide 11: Don't mess with IP? In recent years, the research community has moved on to other areas: P2P overlays; ad hoc networks, sensor networks, satellite networks; measurements. But the IP foundation, currently relying on over-provisioning, still needs work.]

The research community seems to have decided not to mess with IP any more. In recent years, many researchers have redirected their efforts to the application layer for multicast support. Some are trying to use the transport layer for QoS support. Most of the current Internet research efforts are concerned with either the application layer at the top, such as research on various P2P systems; the link layer at the bottom, such as research on sensor networks, wireless ad hoc networks, and satellite networks; or methodology for Internet measurements.

It is often said that with recent advances in DWDM (dense wavelength division multiplexing), the Internet core is over-provisioned, and therefore there is no need to worry about IP any more. Actually, I disagree with that statement. I think that while the Internet core is over-provisioned in the near future, we should take advantage of this window of opportunity to do research that will strengthen IP's foundation.

Slide 12

[Slide 12: Is P2P overlay a panacea? P2P overlays support many new distributed applications; a P2P overlay is inefficient in its use of underlying Internet resources; a P2P overlay does not directly address IP's foundational issues (stability, QoS).]

Let me digress and express a few opinions about P2P systems, since there is so much research interest in P2P overlays. I think that P2P research is an exciting research area, and P2P overlays will enable the Internet to support many new distributed applications. However, P2P overlays are somewhat inefficient in their use of underlying Internet resources, and they do not directly address IP's stability and QoS issues.

Imagine that the Internet protocol stack is a house with IP as its foundation. Adding another floor to the house will provide more space for new functions. However, adding another floor will not address foundational issues.

Slide 13

[Slide 13: IP as the future universal interface. Payload in the form of IP packets to enable new applications across different telecom networks: analog, digital, packet. Recent news: Microsoft unveiled plans for developing IPTV (October 2003); SBC to invest $6 billion on fiber to the home for TV services (June 2004).]

Why should we be concerned with IP's foundation? I believe that IP is on track to become the universal interface² for telecommunications. Like the previous technology transformation from analog to digital transport, we are in the midst of a technology transformation from digital to packet transport. There is no alternative to IP to serve as this universal packet interface.

The migration of voice traffic from telephone networks to IP networks has already begun. The next step is the migration of television services. You may have read about these two news items. In October 2003, Microsoft unveiled major plans for developing IPTV. More recently, I read that SBC would invest $6 billion on fiber to the home for television services to compete with cable companies.

²More specifically, the IP packet format.

Slide 14

[Slide 14: Changing network traffic mix. Much more voice and video traffic. Current traffic of a major telecommunications carrier: circuit-switched voice 1.2 petabytes/day; Internet traffic 1.5 petabytes/day. Television services over IP, back-of-the-envelope calculation: 4-8 Mbps, 10,000 seconds/day, 10^8 TV sets, giving 500-1000 petabytes/day to end users.]

While more than 90% of the current Internet traffic is TCP traffic, which is responsive to congestion, I believe the network traffic mix will change substantially in the future. Here are a few numbers to consider.

A major telecommunications carrier currently delivers about 1.2 petabytes of circuit-switched voice per day and about 1.5 petabytes of Internet traffic per day. Indeed, the amount of data is larger than the amount of voice traffic. But if voice continues its migration to the Internet, the voice traffic component would be significant.

What about television services? I don't have real data. But here are some back-of-the-envelope calculations: I used 4-8 Mbps for a high-quality video signal, such as MPEG-2 or MPEG-4, a round number of 10,000 seconds per day per television set (this is about two and three-quarter hours), and 100 million television sets. I got between 500 and 1,000 petabytes per day to end users. This amount will have to be distributed over a large number of networks, and I have not taken into account savings from using multicast. Still, the point here is that the amount of television traffic will be large. With the addition of both telephony and television traffic, the network traffic mix will be substantially different from today's traffic mix.

Slide 15

[Slide 15: A pragmatic approach. 1. Learn from the evolution of Ethernet. Ethernet technology today is very different from Ethernet technology 20 years ago: transmission rates of 10 Mbps, 100 Mbps, 1 Gbps, 10 Gbps; switching protocols including CSMA/CD on a cable, CSMA/CD on a hub, collision-free switching, and full-duplex point-to-point, both WAN and LAN; a variety of coding techniques and media. Only the Ethernet frame interface remains the same.]

For IP to become the universal interface, there is a pragmatic approach. First, we can learn from the evolution of Ethernet. Ethernet technology today is very different from Ethernet technology twenty years ago. In particular, 10 Gbps Ethernet is very different from 10 Mbps Ethernet. Only the Ethernet frame interface remains the same.

All kinds of technologies now co-exist under the Ethernet interface. Ethernet was originally designed for a large population of bursty users. For many years, Ethernet was synonymous with CSMA/CD. Ethernet is now also used for high-throughput point-to-point full-duplex communications, not only within a local area but also for wide-area communications.

Slide 16

[Slide 16: A pragmatic approach (cont.). 2. Accept in some form a competing idea that has been vanquished again and again by IP but refuses to die and go away: virtual circuit packet switching. X.25, 1970s-1990s [L. Roberts]; Frame Relay, 1980s-present; ATM, early 1990s-present; label switching and MPLS, late 1990s-present.]

Second, we should accept in some form a competing idea that has been vanquished again and again by IP but refuses to die and go away, namely, the virtual circuit packet switching idea. The virtual circuit idea first appeared in X.25, which was developed by a company called Telenet in the mid 1970s, and it became the first international standard for packet networks.

The CEO of Telenet was Larry Roberts. This is the same Larry Roberts who built the original ARPANET several years earlier. In 1973, when the ARPANET was still in its infancy, Larry left ARPA to be CEO of Telenet. Telenet would be the first public network to provide packet switching services, somewhat like today's ISPs. Larry was a visionary. But as an entrepreneur he was about 20 years too early.

X.25 was also used in other public packet networks, such as Datapac in Canada and Transpac in France. X.25 is gone now. But the virtual circuit idea reappeared in frame relay, which is a streamlined version of X.25. It reappeared again in ATM, and more recently in MPLS. All three technologies are working as link-layer technologies under IP.

Slide 17

[Slide 17: Real circuits also under IP. IP over SONET; GMPLS forwarding based on a TDM time slot, a wavelength, or an optical port; nesting of label switched paths (LSPs), reminiscent of the multiplexing/demultiplexing hierarchy in telephone networks.]

In addition to virtual circuits, we also have real circuit switching running under IP. IP over SONET is one example. Some time in the future, GMPLS will be another one. In GMPLS, forwarding can be based upon a particular time slot in a TDM frame, a particular wavelength, or a particular optical port. Therefore, a label-switched path in GMPLS is a real circuit.

The nesting of label-switched paths is essentially the same concept as the multiplexing/demultiplexing hierarchy in telephone networks.

So, virtual and real circuits, these old ideas from the telephone world, are alive and doing quite well in the link layer under IP. This is because they are useful. More specifically, they are used by carriers to provision Virtual Private Networks and to provision other carriers. In business, the customer is always right. If there is a need for something, it will not go away.

Slide 18

[Slide 18: IP should be a "big tent." To "rule" the world of communications, IP has to attend to the needs of new constituents (voice, video): multiple services to support diverse applications. For the research community, over-provisioning should be considered a temporary fix, not a permanent solution; while the core is over-provisioned, access paths to the core are not.]

For IP to eventually become the universal interface for telecommunications, i.e., to rule the world of communications, the IP layer itself will need to evolve to provide services that attend to the needs of new constituents, namely, voice and video traffic. When packet switching was first proposed in the 1960s, it was justified with the observation that data traffic is bursty, with a very high peak-to-average ratio, unlike voice traffic. But if the network traffic mix changes, with the addition of large amounts of voice and video traffic, we should be open minded about adopting a multiservice approach for IP.

Over-provisioning the core is currently a good solution for network operators. However, it should not be considered a solution by the research community. Furthermore, the access paths from end systems to the core are not over-provisioned and will require a better solution than best effort in order for IP to provide QoS end to end.

Slide 19

[Slide 19: The hourglass shape reconsidered. Although link-layer technologies are diverse, including virtual and real circuits, only best-effort service is available to Internet applications (email, WWW, ...).]

To summarize what I said in the last few slides, let's reconsider the hourglass shape of the Internet protocol stack. The Internet's link-layer technologies are diverse, including different kinds of virtual and real circuits. However, only the best-effort service is available to protocols and applications above IP. With this perspective, the shape of the Internet protocol stack is more like a drinking glass than an hourglass.

Slide 20

[Slide 20: IP needs a broader base. Two protocol stacks side by side, telephony, television, email, WWW, ... over TCP/UDP over IP: one with best effort only, the other with best effort plus CoS and FoS. CoS: Class-oriented Service; FoS: Flow-oriented Service.]

In the future, when telephony and television applications generate substantial amounts of traffic, the Internet protocol stack will become top heavy, and it will require a broader base for stability.

I have suggested two services in addition to the best-effort service for IP. The first is CoS, a class-oriented service, which may be DiffServ as it exists now or a revised version. The second is FoS, a flow-oriented service targeting voice and video traffic. The flow-oriented service should be a topic of investigation and discussion in the near future.

Slide 21

[Slide 21: What is flow-oriented service? Dynamic signaling, flow admission control, flow state, a quantitative QoS metric; flows subject to admission control rather than random packet drop; make use of virtual or real circuits in the link layer. Am I reviving IntServ? Not exactly. Per-flow state and dynamic signaling are not scalable! I know.]

In my mind, a flow-oriented service targeting voice and video traffic should have these four elements: dynamic signaling, flow admission control, flow state, and a quantitative QoS metric. Both voice and video flows are relatively long in duration. They should be subject to admission control rather than controlled by random packet drop. In particular, they should be differentiated from elastic traffic.

I also envision that a flow-oriented service in IP can make use of virtual or real circuits in the link layer if they are available along a flow's end-to-end path.

Am I reviving IntServ? Not exactly. I am trying to keep alive some useful ideas that are in IntServ. I am also fully aware that per-flow state and dynamic signaling are not scalable.

Slide 22

[Slide 22: Two good engineering ideas for voice and video. 1. Flow aggregation. Current examples: virtual paths in ATM, label stacks in MPLS, RSVP aggregation. Each flow is routed along a sequence of "virtual channels," each of which carries a flow aggregate. Flow aggregation reduces state information and signaling overhead, thus improving scalability; in the extreme case, a router has just two flow aggregates (voice and video) for each outgoing channel.]

So what do we do? There are two good engineering ideas (these are old ideas, not new ideas) that should be further investigated because, I believe, they are appropriate for voice and video.

The first idea is flow aggregation. This idea is old; it goes back to the virtual path concept in ATM. It reappeared in the nesting of label-switched paths in MPLS. Now RSVP also allows aggregation of reservations. With aggregation, a flow travels from its source to destination along a sequence of "virtual channels." Each virtual channel carries a flow aggregate.

Flow aggregation can be used to reduce both state information and signaling overhead in routers, thus improving scalability. The question is whether the improvements will be enough to make a flow-oriented service commercially attractive. I believe that there is potential for very substantial improvements. Ideally, a router may only need to keep track of the number of flows in a flow aggregate. In the extreme case, a router has just two flow aggregates for each outgoing link, one for voice and the other for video.

Even though the flow aggregation idea has been around for a long time, it is not yet well understood as a research issue. I believe that it is a promising approach. But I don't think we know how to make it work yet.

Slide 23

[Slide 23: Two good engineering ideas (cont.). 2. Statistical guarantee. A natural service guarantee for a voice or video flow is the flow's loss probability for a given packet delay bound: Prob[packet delay > x] < ε. A very hard problem; substantial research in the past, but it needs much more work to be applicable. Statistical multiplexing gain from flow aggregation.]

The second good engineering idea, I believe, is statistical guarantee. For a voice or video flow, a natural service guarantee is the flow's loss probability for a given packet delay bound:

Prob[packet delay > x] < ε.

For many years, we, as researchers, fell in love with deterministic guarantees because they have such elegant mathematics, and their solution is more or less complete. But network operators are more practical than us researchers, and they don't care about elegant mathematics. Statistical guarantee is a much harder problem with rather complicated mathematics. There has been substantial research in the past, but it is still far from having as clean and as complete a solution as the one for deterministic guarantees.

The two ideas, flow aggregation and statistical guarantee, have synergy. In particular, flow aggregation provides statistical multiplexing gain. For a given statistical guarantee, the bandwidth needed to provision a flow aggregate is less than the total bandwidth needed to provision individual flows.
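
To see the statistical multiplexing gain concretely, here is a small numerical sketch of my own, not from the talk, with made-up parameters: N independent on-off voice flows, each with peak rate p and activity probability a, provisioned either per flow at peak rate or as one aggregate whose capacity C satisfies a bufferless, rate-based stand-in for the guarantee above, Prob[aggregate rate > C] < ε, using a Gaussian approximation to the number of simultaneously active flows.

import math

def inverse_erf(y: float) -> float:
    """Crude inverse error function (Winitzki's approximation), good enough here."""
    a = 0.147
    ln_term = math.log(1.0 - y * y)
    t = 2.0 / (math.pi * a) + ln_term / 2.0
    return math.copysign(math.sqrt(math.sqrt(t * t - ln_term / a) - t), y)

def aggregate_capacity(n_flows: int, peak_rate: float, activity: float,
                       epsilon: float) -> float:
    """Capacity C with Prob[instantaneous aggregate rate > C] < epsilon,
    using a Gaussian approximation to the number of active flows."""
    mean = n_flows * activity
    std = math.sqrt(n_flows * activity * (1.0 - activity))
    z = math.sqrt(2.0) * inverse_erf(1.0 - 2.0 * epsilon)  # tail quantile
    return (mean + z * std) * peak_rate

if __name__ == "__main__":
    n, peak, activity, eps = 1000, 0.064, 0.4, 1e-3   # 64 kbps voice, 40% talk-spurts
    per_flow_total = n * peak                          # peak-rate provisioning
    agg = aggregate_capacity(n, peak, activity, eps)
    print(f"per-flow peak provisioning: {per_flow_total:.1f} Mbps")
    print(f"aggregate provisioning:     {agg:.1f} Mbps "
          f"({100 * agg / per_flow_total:.0f}% of the per-flow total)")

With these numbers the aggregate needs well under half of the peak-rate total. Deriving the guarantee seen by an individual flow inside the aggregate is a separate step, which is exactly the first research issue raised on Slide 24.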

Slide 24 t i h c r A

f o m

u y i t i d aggregation + x e e l

M statistical Major research issues p guarantee m o How to derive service guarantee of a flow C DiffServ w best o

from the service guarantee provided to a L effort flow aggregate? None Weak Strong Very strong Dynamic configuration and provisioning of Strength of Service Guarantee

virtual channels for flow aggregates. Sigcomm 2004 Keynote 25 How to efficiently compute the end-to-end S. S. Lam statistical guarantee to a flow, under practical assumptions? design, modeling, analysis exploited in IntServ. Using these ideas, I believe it is pos- approximation methods sible to find good solutions with medium complexity and a measurement-based techniques fairly strong, quantitative service guarantee.

Slide 26 Sigcomm 2004 Keynote 24 S. S. Lam

To apply the ideas of flow aggregation and statistical guar- Inter-provider QoS is a major antee, there are some major research issues to be addressed. How to derive the service guarantee of a flow from the ser- challenge vice guarantee provided to a flow aggregate? How to dy- namically configure and provision virtual channels for flow Business and legal issues aggregates? How to efficiently compute the end-to-end sta- Framework for competitive ISPs to tistical guarantee to a flow under practical assumptions? cooperate There are various models in the literature with differ- A quantitative QoS metric for inter-provider ent assumptions for deriving statistical guarantees. In my agreement opinion, the modeling and analysis work is not yet ready A small set of standardized traffic specs for for application. For efficient application, we should also voice and video investigate the use of approximation methods as well as . . . 3 measurement-based techniques. It would be nice to have a de facto standard! Slide 25 Sigcomm 2004 Keynote 26 The idea of this simple graph is borrowed from a paper S. S. Lam in IEEE Communications Magazine (June 2003) by Nico- las Christin and J¨org Liebeherr, but I have modified it. The Providing quality of service across multiple providers is a vertical axis represents the complexity of architecture, while major challenge because it involves business and legal issues the horizontal axis represents the strength of service guar- in addition to technical issues. While the research commu- antee provided. The three QoS architectures for IP, best ef- nity cannot directly address business and legal issues, we fort, DiffServ, and IntServ, with increasing complexity and can indirectly facilitate the resolution of such issues with an strength of guarantee, are shown in the figure. appropriate framework for end-to-end QoS deployment. I The fact that IntServ has not gained acceptance in the don’t have a framework to present today. As a start, I sug- marketplace because of its high complexity should not de- gest that the framework should include these two elements. ter us, as a research community, from another attempt at a First, I believe that a quantitative QoS metric is impor- flow-oriented service. The scalability issue can be addressed tant for inter-provider agreement. When a provider offers by flow aggregation and by targeting voice and video flows. a premium service to a customer or another provider, the A design based upon flow aggregation and statistical guar- quality of the premium service should be measurable. Sec- antee will benefit from statistical multiplexing that was not ond, I believe that the number of possible traffic specs for 3 voice and video should be limited and standardized. A small An afterthought: While a mathematical model for com- number of traffic specs will also reduce complexity and im- puting statistical guarantees is useful to have for provision- prove scalability. ing, it is perhaps not necessary for implementing the ideas of flow aggregation and statistical guarantee. An efficient It would be nice if a de facto standard emerges in the measurement technique for checking service conformance is future. So I started thinking: How do we get de facto stan- more important. dards? Slide 27 Conclusions For the research community, over- provisioning should not be considered a How to get one? 
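
The question of how per-segment guarantees combine end to end, raised on Slide 24 and implicit in the inter-provider metric suggested here, has at least one easy, if loose, answer. The following one-line bound is my own illustrative sketch, not a result from the talk: if segment i (a hop, a virtual channel carrying a flow aggregate, or a provider's network) promises a delay bound x_i violated with probability at most epsilon_i, the promises compose by a union bound.

% Composing per-segment statistical guarantees (illustrative sketch):
% segment i promises  P[D_i > x_i] < \varepsilon_i  for the flow's packets.
\[
  P\!\left[\sum_{i=1}^{n} D_i > \sum_{i=1}^{n} x_i\right]
  \;\le\; P\!\left[\bigcup_{i=1}^{n} \{D_i > x_i\}\right]
  \;\le\; \sum_{i=1}^{n} \varepsilon_i .
\]

So an end-to-end target Prob[delay > x] < ε can be met by budgeting x and ε across segments, with no independence assumptions. The bound is loose, and tightening it, as well as deriving a per-flow guarantee from an aggregate guarantee, is precisely the kind of analysis the narrative says is not yet ready for application.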
solution The SSL model To become the universal telecom interface, Application-driven—the need to secure web IP needs to be a “big tent” transactions A flow-oriented service needed to support Now SSL used for other applications as well television and telephony services The FedEx model QoS research is not done Profit-driven—someone takes risk Flow aggregation and statistical guarantee merit further investigation Possible scenario: Some large ISP takes risk and provides QoS services to a large

part of the Internet. Success leads to Sigcomm 2004 Keynote 28 universal global coverage and a de facto S. S. Lam standard.

Sigcomm 2004 Keynote 27 S. S. Lam Slide 29

I thought of two examples from the past. First, there is the SSL model. In this case, the de facto Conclusions (cont.) standard, namely, the SSL protocol, was application-driven. There was an urgent need to secure web transactions. The web’s success led to widespread adoption of SSL as a de From history facto standard. Now SSL is used for other applications as Internet—almost 30 years from initial research well. to commercial deployment Second, there is the FedEx model. I know that FedEx is Packet radio—about 25 years not a protocol. Nevertheless it is still a great example of QoS research began in late 1980s a new business model created by someone who took risk, motivated by the potential of profit, and succeeded. Widespread commercial deployment of Before FedEx, our primary mail delivery service was the QoS within 10 years! US Postal Service, which is a best effort service just like IP. Then someone by the name of Frederick Smith founded FedEx more than 30 years ago to provide guaranteed overnight delivery service. FedEx charged a lot more than the Postal Sigcomm 2004 Keynote 29 Service. Yet business people are willing to pay for guar- S. S. Lam anteed delivery. FedEx took a large risk because it had to build a complete infrastructure for end-to-end delivery (i.e., Lastly, we as researchers need to be more patient. History a large fleet of planes and trucks, together with people). shows that it takes many years to nurture and develop a new idea until widespread commercial deployment. The Internet A similar scenario could play out for the Internet: Some took almost 30 years, from the mid 1960s to the mid 1990s. ISP takes risk and provides QoS service to a large part of Packet radio took about 25 years, from 1973 to the late the Internet. If successful, it will lead to universal global 1990s. coverage and a de facto standard. Even though packet voice research started in the late 1970s, I believe that QoS research as we know it now be- Slide 28 gan in the late 1980s, about 15 years ago. Therefore to reach 25 years, widespread commercial deployment of QoS Here are my conclusions: is not due for another 10 years. First, over-provisioning covers up problems. We, as the networking research community, should not consider over- provisioning of the core as a solution. Acknowledgments Second, IP won the networking race for data communica- The author thanks Ken Calvert, Thomas Woo, Geoff Xie, tions. However, to become the universal telecom interface, and Yin Zhang for their constructive comments during the IP needs to be a big tent, as in politics. To address the needs preparation of this talk. The author also thanks X. Brian of new constituents, namely, voice and video, a flow-oriented Zhang for his help in preparing slides and Min Sik Kim for service is needed to support telephony and television services his help in formatting this paper. in the future. Third, QoS research is not done. I suggest flow aggrega- tion and statistical guarantee as two good engineering ideas that merit further investigation.