VoCCN: Voice-over Content-Centric Networks
Van Jacobson Diana K. Smetters Nicholas H. Briggs Michael F. Plass Paul Stewart James D. Thornton Rebecca L. Braynard Palo Alto Research Center Palo Alto, CA, USA {van,smetters,briggs,plass,stewart,jthornton,rbraynar}@parc.com
ABSTRACT Alice's VoIP Provider Bob's VoIP Provider
A variety of proposals call for a new Internet architecture 1 focused on retrieving content by name, but it has not been Signaling Path clear that any of these approaches are general enough to sup- port Internet applications like real-time streaming or email. Internet We present a detailed description of a prototype implemen- tation of one such application – Voice over IP (VoIP) – in Router A Router B a content-based paradigm. This serves as a good example to show how content-based networking can offer advantages 2 for the full range of Internet applications, if the architecture Media Path has certain key properties. IP IP Alice Bob Categories and Subject Descriptors Figure 1: Voice-over-IP data flows C.2.1 [Computer Systems Organization]: Network Architecture and Design; C.2.2 [Computer Systems Organization]: Network Protocols To investigate this question we have implemented VoCCN — a real-time, conversational, telephony ap- General Terms plication over Content-Centric Networking (CCN) [16, 17, 18] and found it to be simpler, more secure and more Design, Experimentation, Performance, Security scalable than its VoIP (Voice-over-IP) equivalent. Our implementation uses standard SIP [11] and RTP [10] 1. INTRODUCTION payloads which gives it complete and secure interop- There is widespread agreement that content –what erability with standard-conforming VoIP implementa- a user wants – should have a more central role in fu- tions via a simple, stateless, IP-to-CCN gateway. ture network architectures than it does in the Internet’s This paper describes how to map the existing VoIP current host-to-host conversation model [9, 5, 23, 4, 2, architecture into CCN while preserving security, inter- 15, 6, 19, 20, 22, 7, 3, 8, 16, 17, 18]. But while it operability, and performance. The mapping techniques is clear that architectures based on Pub-Sub and simi- are not unique to VoIP, but are examples of general lar data-oriented abstractions provide a good fit to the transformations that we believe can be applied to al- massive amounts of static content exchanged via the most any conversational Internet protocol. Through the World Wide Web and various P2P overlay networks, it example of VoCCN we explore the properties that can is less clear how well they fit more conversational traffic enable networking models focused on content to be more such as email, e-commerce transactions or VoIP. general than traditional conversational models. These new architectures can therefore deliver benefits for both static content and the full range of conversational com- munication applications that are important in the In- Permission to make digital or hard copies of all or part of this work for ternet today. personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific 2. VOIP BACKGROUND permission and/or a fee. ReArch’09, December 1, 2009, Rome, Italy. Voice over Internet Protocol (VoIP) is the dominant Copyright 2009 ACM 978-1-60558-749-3/09/12 ...$10.00. open protocol for Internet telephony. Figure 1 depicts
1 the components of a standard VoIP exchange. When Alice and Bob wish to make a phone call, their VoIP phones set up an audio link using the Session Initiation Protocol (SIP) [11] via what is termed the signaling path. As VoIP endpoints are often mobile or located Internet on dynamic IP addresses, signaling path exchanges are mediated by proxies – service providers or corporate Router A Router B
VoIP signaling gateways that receive and forward mes- Signaling & Media Path sages on behalf of their client endpoints. To place a call to Bob, Alice’s endpoint first contacts her SIP proxy CCN CCN who forwards the call invitation to Bob’s SIP proxy, as only Bob’s proxy knows the current IP address of Bob’s endpoint. The body of the invitation contains both in- Figure 2: Voice-over-CCN data flows formation about Alice and the RTP [10] address where she expects to receive audio (or other media streams) from Bob, should he accept the call. Bob’s accept of Thereareacoupleofproblemsthatmustbesolvedin theinvitecontainstheRTPaddressofwhereheex- order to map conversational protocols like SIP and RTP pects to receive audio from Alice which allows a direct, into a content-oriented model. First, we must support bi-directional media path between their endpoints. service rendezvous. To initiate a call, the caller’s phone VoIP media (voice, video, etc.) can be secured and must be able to request a connection with the callee, authenticated using either an encrypted form of RTP and get a confirmation response. This requires the (SRTP [12]), or by tunneling RTP inside another se- callee’s phone to offer a service contact point. In stan- cure network protocol (e.g., DTLS [14]). The encryp- dard IP, a port number serves as such a point where a tion keys are either set up via the signaling path, which process receives requests to which it dynamically gener- must then itself be encrypted, or in-band in the me- ates responses. To translate this into a content-oriented dia path (ZRTP [24]). Signaling path authentication model, we need on-demand publishing: the ability to and encryption can be done via wrapping the signaling request content that has not yet been published, route exchange in DTLS and relying on a Public Key Infras- that request to potential publishers, and have them cre- tructure (PKI) to authenticate the exchange, or using ate, and then publish, the desired content in response. a key agreement protocol such as Multimedia Internet Second, we must have a way to transition from this Keying (MIKEY [13]) embedded in the signaling mes- initial rendezvous to a bi-directional flow of conversa- sages. MIKEY has the advantage of providing end-to- tional data. In standard IP, there are packets (either end security and authenticating its own messages while designated packets in the rendezvous sequence, or all minimizing signaling path roundtrips, but does not it- TCP or UDP packets) that contain the information self provide confidentiality of the signaling pathway. In needed to name the destination to which replies should practice, however, the perceived difficulty and cost of be sent. For example, a TCP packet header contains configuring cryptographic keys and establishing a PKI a source IP address and port plus a protocol identifier. means that VoIP traffic is almost always unencrypted In a SIP exchange, the SDP content of the SIP message and unauthenticated. (shown in Figure 3) identifies the address to use for the media conversation. To translate this into a content- 3. ARCHITECTURE oriented model, we need constructable names:itmust be possible to construct the name of a desired piece of The complex data paths of Figure 1 result from a mis- content a priori, without having been given the name match between the user’s goal and the network’s means up front or having previously seen the content – the of achieving it: Alice simply wants to talk to Bob but service consumer must be able to figure out how to for- the network requires that the communication be ad- mulate a request that will reach the service provider. dressed to the IP address of Bob’s phone. All the in- This can be done if: frastructure in the Figure, together with the several ser- vices, devices and DNS name registrations that are not • There is a deterministic algorithm by which the shown, exist solely to map from the user/application’s data provider and consumer arrive at the same world view into the network’s world view. One strong (routable) name based on data available to both. driver for content-oriented networking is that this trans- • Consumers can retrieve content based on partially- lation (typically referred to as middleware) is not needed. specified names. Data should instead ideally flow directly from producer to interested consumer, as shown in Figure 2. Our The former requirement guarantees that consumer VoCCN prototype achieves this. and producer will arrive at the same name, and that
2 ing time with an identity (e.g., [email protected]), and INVITE sip:[email protected] SIP/2.0 registers to offer data in a namespace derived from that Via: SIP/2.0/CCN parc.com:5060 identity and the name of the SIP service (/ccnx.org/ From: Alice Briggs
3 Caller (Alice) Callee (Bob)
/domain/sip/bob/invite/EpkB(sk)/Esk(SIP INVITE message)
Data: Name:/domain/sip/bob/invite/EpkB(sk)/Esk(SIP INVITE message) Signature Info:
Content: Esk(SIP response message)
Interest: /domain/bob/call-id/rtp/seq-no
Data: Name:/domain/bob/call-id/rtp/seq-no Signature Info:
Figure 4: Protocol exchange.
multipoint calling removes the requirement that crementally deploy a VoCCN-based infrastructure, in- endpoints register their IP address every time they teroperating with VoIP calls coming in from or going to move, at least within the routing domain. the outside world and internal legacy VoIP phones. In this model, the VoCCN/VoIP proxy serves as the • The “identity” of a content infrastructure endpoint SIP proxy for external (and internal) inbound VoIP is represented by credentials located on that end- calls, and translates from VoIP packets (SIP and SRTP) point – i.e., a signing key, identifying content (e.g., to VoCCN packets (which again merely encapsulate SIP voice packets) that it creates. Management in a and SRTP for this application), and vice versa. voice content system is minimized, as provisioning On receiving a SIP or SRTP packet, the proxy merely a new endpoint consists only of giving that end- examines that packet and generates a corresponding point a credential. In contrast, VoIP configuration CCN Data packet whose name is determined based on requires setting the mapping from callee identity information in the original inbound packet header (see to endpoint IP address at multiple points in the below). It then forwards it into the CCN routing fabric network, each time that IP address changes. in response to an Interest from the other (CCN-aware) • Advanced services (voicemail, conference calling, endpoint involved in the call. The proxy then performs call logging and recording) can be easily built on the CCN-specific parts of the call on behalf of the legacy top of a content-based voice system as additional VoIP client – generating and sending an Interest in the components utilizing multipoint routing to follow next packet of the exchange, where the name used in the call requests or copy and process call contents. Interest is also computed as a function of information in the inbound VoIP packet. 3.2 VoCCN/VoIP Interoperability The proxy retains no state about the call but re- We have chosen to implement standard VoCCN by sponds only to received (CCN or VoIP) packets – the encapsulating standard VoIP protocols (SIP, SRTP), to call “state” is contained at the endpoints and in the In- enable direct interoperability between VoCCN and un- terests noted in the forwarding tables along the path to modified legacy VoIP implementations. Such interoper- the content source. The proxy also has limited par- ability can be achieved using a stateless VoCCN-VoIP ticipation in call security. Key exchange and media gateway.1 Such a gateway allows an organization to in- path encryption (if supported by the VoIP client) is 1 end-to-end, provided by standard MIKEY and SRTP. We present the design of such a gateway here, but have not yet implemented one. SIP signaling security for the VoIP portion of the call, if
4 available, is provided by legacy mechanisms between the LinPhone Packet Interarrival Time VoIP client and the VoCCN-VoIP gateway/SIP proxy. 1 The proxy provides signaling security for the VoCCN 0.9 portion of the call, which digitally signs all messages it 0.8 translates before sending them into the CCN infrastruc- 0.7 ture, and additionally encrypts and authenticates the 0.6 SIP messages between itself and the CCN endpoint. 0.5 0.4 4. IMPLEMENTATION 0.3 CDF: Proportion of Packets We implemented a proof-of-concept VoCCN client as 0.2 0.1 Stock LinPhone an extension to an open source Linux VoIP phone, Lin- CCN LinPhone - Encrypted 0 phone (version 3.0). We made Linphone exchange data 0 10 20 30 40 50 60 70 80 90 100 over CCN by taking advantage of the ability to plug new Packet Interarrival Time (ms) transports into libeXoSIP and liboRTP, the libraries it uses for SIP and RTP. Figure 5: Cumulative distribution of inter- Our CCN network layer is implemented as a content packet intervals, or jitter, for a 10-minute call. router, which runs on every CCN-aware node, together with an interface library that simplifies the process of writing content-based applications. Each VoCCN end- DTLS for its ability to perform a complete SIP exchange point runs a CCN content router. These routers ex- and key setup in a single round trip. change CCN packets via an overlay consisting of UDP To provide signaling path security, we implemented sent over preconfigured point-to-point or multicast links. a simple inline message encryption and authentication The new libeXoSIP and liboRTP transports use the in- scheme (shown in abbreviated form in Figure 4). The terface library to send and receive CCN packets through caller encrypts the SIP invite message using a randomly- the local content router. generated symmetric key, sk, and authenticates that en- Our CCN library and network stack, together with crypted message using HMAC, a symmetric-key MAC. an updated version of our Linphone client, is available The caller then encrypts sk under the public key of the as open source from [1]. callee (pkB in Figure 4). To construct a call-initiating Interest message, the caller includes as components in 4.1 Security the desired name not only information necessary to route All VoCCN Data packets in both the signaling and the Interest to the callee, but also both the encrypted the media paths are digitally signed, using per-user key session key block (EpkB(sk)) and the encrypted and au- pairs. CCN simplifies the problem of key distribution, thenticated SIP message (Esk(SIP INV IT E message)). easing deployment of end-to-end security. Public keys The callee, on receiving the Interest, decrypts the en- can be distributed via CCN itself, so obtaining the pub- crypted key block with its private key, recovers sk,and lic key for a VoCCN user can be as simple as request- uses it to verify and decrypt the SIP INVITE (which it- ing a piece of content with a predetermined name – self contains the first, separately authenticated, MIKEY e.g., /ccnx.org/users/alice/KEY. Such keys can be key exchange message to establish messaging path se- accepted on faith at first and remembered over time curity). The caller then uses sk to encrypt its SIP re- (key continuity), giving a historically-based notion of sponse message (contained in a signed Data packet). identity. They can be authenticated in-band by their This system offers much better security than most users, giving equivalent security to ZRTP [24]. Or they production VoIP deployments. These usually offer no can be published as CCN content signed by a publisher security, or at best encrypt the signaling path hop-by- trusted by both endpoints, securely binding that key to hop using tunnels between proxies. Our VoCCN imple- a CCN name derived from the user’s identity. This acts mentation provides end-to-end security for both signal- effectively as a digital certificate, allowing the construc- ing and media traffic. It also supports end-to-end me- tion of a CCN-based PKI. dia path security when interoperating with VoIP clients, To provide media path security, we ran all calls over through its encapsulation of MIKEY and SRTP. SRTP,2 using pre-existing hooks to integrate libsrtp, an open SRTP implementation, with Linphone. We 4.2 Performance implemented a MIKEY [13] library to perform key ex- To evaluate CCN’s ability to support timely delivery change in the signaling path. We selected MIKEY over of realtime data we looked at the packet arrival times 2 This provided both encryption and content authentication; for our VoCCN implementation. We used two Linux though the latter was redundant given CCN’s digital signa- 2.6.27 machines (a 3.4 GHz Intel P4 and a 2.66 GHz ture on each packet. Intel Core2 Duo) on either 100Mbs or 1Gbs switched
5 networks for this experiment. Figure 5 shows the distri- considerations for a network of information. In bution of inter-packet intervals, effectively packet jitter, ReArch, 2008. for a 10-minute voice call made using stock Linphone [4] H. Balakrishnan, K. Lakshminarayanan, S. Ratnasamy, S. Shenker, I. Stoica, and M. Walfish. (solid line) over UDP RTP, and our Linphone-based A Layered Naming Architecture for the Internet. In VoCCN client (dashed line) (expected interval 20 msec). SIGCOMM, 2004. Steplike appearance is due to the Linux kernel schedul- [5] H. Balakrishnan, S. Shenker, and M. Walfish. ing quantum (for a single process in the case of stock Semantic-Free Referencing in Linked Distributed Linphone, and 3 processes for VoCCN in this early pro- Systems. In IPTPS, February 2003. totype). The VoCCN call has slightly fewer packets at [6] M. Caesar, T. Condie, J. Kannan, K. Lakshminarayanan, I. Stoica, and S. Shenker. or below the expected inter-packet interval, and a small ROFL: Routing on Flat Labels. In SIGCOMM, 2006. number of long-interval packets at the tail. No packets [7] C. Dannewitz, K. Pentikousis, R. Rembarz, were lost by either stock Linphone or our VoCCN client, E. Renault, O. Strandberg, and J. Ubillos. Scenarios however a small number of VoCCN packets (less than and research issues for a network of information. In 0.1%) were dropped by Linphone for late arrival. With Proc. 4th Int. Mobile Multimedia Communications Conf., July 2008. almost equal delivery performance, VoCCN and VoIP [8] C. Esteve, F. L. Verdi, and M. F. Magalh˜aes. Towards have the same call quality. a new generation of information-oriented Each packet in the VoCCN exchange was individually internetworking architectures. In CONEXT, 2008. signed with a 1024-bit RSA key; interestingly this did [9] M. Gritter and D. R. Cheriton. An architecture for not noticeably impact CPU load or performance on our content routing support in the internet. In Usenix Symposium on Internet Technologies and Systems test machines. For more constrained environments, sig- (USITS), 2001. nature aggregation techniques (see [18]) or fast signing [10] IETF. RFC 1889 – RTP: A transport protocol for algorithms such as ESIGN [21] can be used to increase real-time applications. throughput and lower computational burden. [11] IETF. RFC 3261 – SIP: Session initiation protocol. [12] IETF. RFC 3711 – The Secure Real-time Transport Protocol (SRTP). 5. CONCLUSIONS [13] IETF. RFC 3830 – MIKEY: Multimedia Internet Content-oriented network architectures not only move KEYing. content scalably and efficiently, they can also implement [14] IETF. RFC 4347 – Datagram Transport Layer Security. IP-like conversational services like voice calls, email or [15] V. Jacobson. A New Way to Look at Networking, transactions. To demonstrate this we have implemented August 2006. http://video.google.com/videoplay? and tested a Voice-over-CCN prototype. The result is docid=-6972678839686672840&ei= functionally and performance equivalent to Voice-over- iUx3SajYAZPiqQLwjIS7BQ&q=tech+talks+van+ IP but substantially simpler in architecture, implemen- jacobson. tation and configuration. It is intrinsically more scal- [16] V. Jacobson. Making the Case for Content-Centric Networking: An Interview with Van Jacobson. ACM able because it does not require VoIP’s inbound SIP Queue, January 2009. proxy (with the associated signaling state concentra- [17] V. Jacobson. Special Plenary Invited Short Course: tion). Since CCN secures content rather than the con- (CCN) Content-Centric Networking. In Future nections it travels over, VoCCN does not require dele- Internet Summer School, August 2009. gation of either trust or keys to proxies or other network [18] V. Jacobson, D. K. Smetters, J. D. Thornton, M. Plass, N. Briggs, and R. Braynard. Networking intermediaries and thus is far more secure than VoIP. Named Content. In CoNext, 2009. Finally, thanks to certain affordances offered by CCN’s [19] T. Koponen, M. Chawla, B.-G. Chun, A. Ermolinskiy, structured naming, VoCCN is completely interoperable K. H. Kim, S. Shenker, and I. Stoica. A Data-Oriented with VoIP via simple, stateless gateways. (and Beyond) Network Architecture. In SIGCOMM, 2007. [20] B. Ohlman et al. First NetInf architecture description, Acknowledgements April 2009. http://www.4ward-project.eu/index. We thank Dirk Balfanz and Philippe Golle for useful php?s=file_download&id=39. input. [21] T. Okamoto and J. Stern. Almost Uniform Density of Power Residues and the Provable Security of ESIGN. In ASIACRYPT, 2003. 6. REFERENCES [22] D. Trossen. Conceptual architecture of PSIRP TM [1] Project CCNx . http://www.ccnx.org, Sep. 2009. including subcomponent descriptions (D2.2), June [2] W. Adjie-Winoto, E. Schwartz, H. Balakrishnan, and 2008. http://psirp.org/publications. J. Lilley. The Design and Implementation of an [23] M. Walfish, H. Balakrishnan, and S. Shenker. Intentional Naming System. SIGOPS Oper. Syst. Untangling the Web from DNS. In NSDI, 2004. Rev., 33(5):186–201, 1999. [24] P. Zimmermann, A. Johnston, and J. Callas. ZRTP: [3] B. Ahlgren, M. D’Ambrosio, M. Marchisio, I. Marsh, Media Path Key Agreement for Secure RTP. C. Dannewitz, B. Ohlman, K. Pentikousis, http://www.philzimmermann.com/zfone/ O. Strandberg, R. Rembarz, and V. Vercellone. Design draft-zimmermann-avt-zrtp-15.html.
6