VoCCN: Voice-over Content-Centric Networks

Van Jacobson Diana K. Smetters Nicholas H. Briggs Michael F. Plass Paul Stewart James D. Thornton Rebecca L. Braynard Palo Alto Research Center Palo Alto, CA, USA {van,smetters,briggs,plass,stewart,jthornton,rbraynar}@parc.com

ABSTRACT Alice's VoIP Provider Bob's VoIP Provider

A variety of proposals call for a new Internet architecture 1 focused on retrieving content by name, but it has not been Signaling Path clear that any of these approaches are general enough to sup- port Internet applications like real-time streaming or email. Internet We present a detailed description of a prototype implemen- tation of one such application – Voice over IP (VoIP) – in Router A Router B a content-based paradigm. This serves as a good example to show how content-based networking can offer advantages 2 for the full range of Internet applications, if the architecture Media Path has certain key properties. IP IP Alice Bob Categories and Subject Descriptors Figure 1: Voice-over-IP data flows C.2.1 [Computer Systems Organization]: Network Architecture and Design; C.2.2 [Computer Systems Organization]: Network Protocols To investigate this question we have implemented VoCCN — a real-time, conversational, telephony ap- General Terms plication over Content-Centric Networking (CCN) [16, 17, 18] and found it to be simpler, more secure and more Design, Experimentation, Performance, Security scalable than its VoIP (Voice-over-IP) equivalent. Our implementation uses standard SIP [11] and RTP [10] 1. INTRODUCTION payloads which gives it complete and secure interop- There is widespread agreement that content –what erability with standard-conforming VoIP implementa- a user wants – should have a more central role in fu- tions via a simple, stateless, IP-to-CCN gateway. ture network architectures than it does in the Internet’s This paper describes how to map the existing VoIP current host-to-host conversation model [9, 5, 23, 4, 2, architecture into CCN while preserving security, inter- 15, 6, 19, 20, 22, 7, 3, 8, 16, 17, 18]. But while it operability, and performance. The mapping techniques is clear that architectures based on Pub-Sub and simi- are not unique to VoIP, but are examples of general lar data-oriented abstractions provide a good fit to the transformations that we believe can be applied to al- massive amounts of static content exchanged via the most any conversational Internet protocol. Through the World Wide Web and various P2P overlay networks, it example of VoCCN we explore the properties that can is less clear how well they fit more conversational traffic enable networking models focused on content to be more such as email, e-commerce transactions or VoIP. general than traditional conversational models. These new architectures can therefore deliver benefits for both static content and the full range of conversational com- munication applications that are important in the In- Permission to make digital or hard copies of all or part of this work for ternet today. personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific 2. VOIP BACKGROUND permission and/or a fee. ReArch’09, December 1, 2009, Rome, Italy. Voice over Internet Protocol (VoIP) is the dominant Copyright 2009 ACM 978-1-60558-749-3/09/12 ...$10.00. open protocol for Internet telephony. Figure 1 depicts

1 the components of a standard VoIP exchange. When Alice and Bob wish to make a phone call, their VoIP phones set up an audio link using the Session Initiation Protocol (SIP) [11] via what is termed the signaling path. As VoIP endpoints are often mobile or located Internet on dynamic IP addresses, signaling path exchanges are mediated by proxies – service providers or corporate Router A Router B

VoIP signaling gateways that receive and forward mes- Signaling & Media Path sages on behalf of their client endpoints. To place a call to Bob, Alice’s endpoint first contacts her SIP proxy CCN CCN who forwards the call invitation to Bob’s SIP proxy, as   only Bob’s proxy knows the current IP address of Bob’s endpoint. The body of the invitation contains both in- Figure 2: Voice-over-CCN data flows formation about Alice and the RTP [10] address where she expects to receive audio (or other media streams) from Bob, should he accept the call. Bob’s accept of Thereareacoupleofproblemsthatmustbesolvedin theinvitecontainstheRTPaddressofwhereheex- order to map conversational protocols like SIP and RTP pects to receive audio from Alice which allows a direct, into a content-oriented model. First, we must support bi-directional media path between their endpoints. service rendezvous. To initiate a call, the caller’s phone VoIP media (voice, video, etc.) can be secured and must be able to request a connection with the callee, authenticated using either an encrypted form of RTP and get a confirmation response. This requires the (SRTP [12]), or by tunneling RTP inside another se- callee’s phone to offer a service contact point. In stan- cure network protocol (e.g., DTLS [14]). The encryp- dard IP, a port number serves as such a point where a tion keys are either set up via the signaling path, which process receives requests to which it dynamically gener- must then itself be encrypted, or in-band in the me- ates responses. To translate this into a content-oriented dia path (ZRTP [24]). Signaling path authentication model, we need on-demand publishing: the ability to and encryption can be done via wrapping the signaling request content that has not yet been published, route exchange in DTLS and relying on a Public Key Infras- that request to potential publishers, and have them cre- tructure (PKI) to authenticate the exchange, or using ate, and then publish, the desired content in response. a key agreement protocol such as Multimedia Internet Second, we must have a way to transition from this Keying (MIKEY [13]) embedded in the signaling mes- initial rendezvous to a bi-directional flow of conversa- sages. MIKEY has the advantage of providing end-to- tional data. In standard IP, there are packets (either end security and authenticating its own while designated packets in the rendezvous sequence, or all minimizing signaling path roundtrips, but does not it- TCP or UDP packets) that contain the information self provide confidentiality of the signaling pathway. In needed to name the destination to which replies should practice, however, the perceived difficulty and cost of be sent. For example, a TCP packet header contains configuring cryptographic keys and establishing a PKI a source IP address and port plus a protocol identifier. means that VoIP traffic is almost always unencrypted In a SIP exchange, the SDP content of the SIP message and unauthenticated. (shown in Figure 3) identifies the address to use for the media conversation. To translate this into a content- 3. ARCHITECTURE oriented model, we need constructable names:itmust be possible to construct the name of a desired piece of The complex data paths of Figure 1 result from a mis- content a priori, without having been given the name match between the user’s goal and the network’s means up front or having previously seen the content – the of achieving it: Alice simply wants to talk to Bob but service consumer must be able to figure out how to for- the network requires that the communication be ad- mulate a request that will reach the service provider. dressed to the IP address of Bob’s phone. All the in- This can be done if: frastructure in the Figure, together with the several ser- vices, devices and DNS name registrations that are not • There is a deterministic algorithm by which the shown, exist solely to map from the user/application’s data provider and consumer arrive at the same world view into the network’s world view. One strong (routable) name based on data available to both. driver for content-oriented networking is that this trans- • Consumers can retrieve content based on partially- lation (typically referred to as middleware) is not needed. specified names. Data should instead ideally flow directly from producer to interested consumer, as shown in Figure 2. Our The former requirement guarantees that consumer VoCCN prototype achieves this. and producer will arrive at the same name, and that

2 ing time with an identity (e.g., [email protected]), and INVITE sip:[email protected] SIP/2.0 registers to offer data in a namespace derived from that Via: SIP/2.0/CCN parc.com:5060 identity and the name of the SIP service (/ccnx.org/ From: Alice Briggs sip/alice/invite). A caller maps a SIP INVITE (ex- To: Bob Jacobs ample shown in Figure 3) into an Interest packet asking Call-ID: 1911287229 for new content from the callee. The network routes CSeq: 20 INVITE the Interest to the callee, which generates a piece of Content-Type: application/sdp Data with the requested name containing the SIP re- Max-Forwards: 70 sponse, thus completing SIP signaling in a single round User-Agent: /3.0.0 (eXosip2/3.1.0) trip. The use of structured names in CCN allows a Subject: Phone call simple mapping of the callee identity and service name Expires: 120 into the first part of the content name (used for hierar- Content-Length: 1477 chical, prefix-based routing) while unique identifiers for [... the request are added to distinguish the desired content o=alice 123456 654321 IN IP4 13.2.117.34 from any pre-existing content. Note that the entire SIP c=IN IP4 13.2.117.34 INVITE message is included in the requested content a=key-mgmt: mikey AQQFgE3dV+ACAA... name (potentially in encrypted form). The callee re- m=audio 7078 RTP/AVP 111 110038101 ceiving this Interest can unpack the name and generate ...] a SIP response as the content of the Data packet that Figure 3: Example of SIP INVITE message will satisfy the Interest. To transition from the SIP rendezvous into the media conversation with RTP, each phone takes information names will not depend on data not available to both exchanged in the rendezvous and uses it to construct a (such as the cryptographic digest of the content, impos- sequence of names for the individual packets of media sible for data that does not yet exist and will be gen- data. This is also illustrated in Figure 4, where we see erated on demand). The latter deals with the fact that that the call-id from the SIP exchange, together with without significant prearrangement to allow for a source the identity of the other party and a sequence number of shared randomness, such constructed names will not is used to construct the name of each fragment of me- be unique. By allowing flexibility in the query mecha- dia. Sequence numbering provides a simple way for data nism, we can allow for uniquely named content, while provider and consumer to arrive at the same unique matching it to deterministically generated queries. For name for each piece of content. These names can be example, allowing a query for a structured name that cryptographically anonymized to unlink them from the matches only the prefix of that name. SIP exchange and provide user privacy. In CCN, each fragment of content that may be pub- In the content delivery architecture of CCN, Inter- lished in the network has a hierarchically structured ests and Data flow in lock-step, each Interest retrieving name. Requests for content are expressed in Interest a single data packet. In a dispersed or high-latency net- packets, which specify the prefix of the name of the de- work, the round-trip latency can easily be large enough sired content and a set of rules by which to determine to delay reception times of media packets to the point what of the content under that prefix to return. CCN where they become unplayable. To solve this problem, routing tables use prefix matching to directly route in- we employ pipelining by sending Interests for multiple terests based on their name prefixes towards content future media packets. The CCN media receiver main- sources that have registered availability of content by tains some number of outstanding Interests in a media prefix. CCN does not require that data be published stream; when the stream is opened (or as network con- and registered with the infrastructure before it can be ditions change) it generates a number of Interests. Each retrieved; a request merely needs to start with a pre- time it receives content for that stream, it produces a fix registered with intervening routers for delivery to an new Interest, thereby maintaining the number of out- interested publisher. Such a publisher can then look at standing Interests in the pipeline. the request and create named content, or Data packets, dynamically in response. The network forwards match- 3.1 Advantages ing Data packets back along the path taken by the In- Our approach adds appealing properties beyond sim- terests that requested them, so content reaches the re- ply supporting an existing conversational protocol: quester and is never sent where it was not requested. For a detailed description of CCN and its current im- • Content networking infrastructures support multi- plementation, please see [18]. point routing – for example, automatically routing We map a SIP rendezvous onto CCN as shown in Fig- a call request to all likely places where it might be ure 4. Each phone endpoint is configured at provision- answered. This direct infrastructure support for

3 Caller (Alice) Callee (Bob)

Interest:

/domain/sip/bob/invite/EpkB(sk)/Esk(SIP INVITE message)

Data: Name:/domain/sip/bob/invite/EpkB(sk)/Esk(SIP INVITE message)      Signature Info: ,

Content: Esk(SIP response message)

Interest: /domain/bob/call-id/rtp/seq-no

Data: Name:/domain/bob/call-id/rtp/seq-no Signature Info: , Content: SRTP packet (encrypted audio) Interest: /domain/alice/call-id/rtp/seq-no    Data: Name:/domain/alice/call-id/rtp/seq-no Signature Info: , Content: SRTP packet (encrypted audio)

Figure 4: Protocol exchange.

multipoint calling removes the requirement that crementally deploy a VoCCN-based infrastructure, in- endpoints register their IP address every time they teroperating with VoIP calls coming in from or going to move, at least within the routing domain. the outside world and internal legacy VoIP phones. In this model, the VoCCN/VoIP proxy serves as the • The “identity” of a content infrastructure endpoint SIP proxy for external (and internal) inbound VoIP is represented by credentials located on that end- calls, and translates from VoIP packets (SIP and SRTP) point – i.e., a signing key, identifying content (e.g., to VoCCN packets (which again merely encapsulate SIP voice packets) that it creates. Management in a and SRTP for this application), and vice versa. voice content system is minimized, as provisioning On receiving a SIP or SRTP packet, the proxy merely a new endpoint consists only of giving that end- examines that packet and generates a corresponding point a credential. In contrast, VoIP configuration CCN Data packet whose name is determined based on requires setting the mapping from callee identity information in the original inbound packet header (see to endpoint IP address at multiple points in the below). It then forwards it into the CCN routing fabric network, each time that IP address changes. in response to an Interest from the other (CCN-aware) • Advanced services (voicemail, conference calling, endpoint involved in the call. The proxy then performs call logging and recording) can be easily built on the CCN-specific parts of the call on behalf of the legacy top of a content-based voice system as additional VoIP client – generating and sending an Interest in the components utilizing multipoint routing to follow next packet of the exchange, where the name used in the call requests or copy and process call contents. Interest is also computed as a function of information in the inbound VoIP packet. 3.2 VoCCN/VoIP Interoperability The proxy retains no state about the call but re- We have chosen to implement standard VoCCN by sponds only to received (CCN or VoIP) packets – the encapsulating standard VoIP protocols (SIP, SRTP), to call “state” is contained at the endpoints and in the In- enable direct interoperability between VoCCN and un- terests noted in the forwarding tables along the path to modified legacy VoIP implementations. Such interoper- the content source. The proxy also has limited par- ability can be achieved using a stateless VoCCN-VoIP ticipation in call security. Key exchange and media gateway.1 Such a gateway allows an organization to in- path encryption (if supported by the VoIP client) is 1 end-to-end, provided by standard MIKEY and SRTP. We present the design of such a gateway here, but have not yet implemented one. SIP signaling security for the VoIP portion of the call, if

4 available, is provided by legacy mechanisms between the LinPhone Packet Interarrival Time VoIP client and the VoCCN-VoIP gateway/SIP proxy. 1 The proxy provides signaling security for the VoCCN 0.9 portion of the call, which digitally signs all messages it 0.8 translates before sending them into the CCN infrastruc- 0.7 ture, and additionally encrypts and authenticates the 0.6 SIP messages between itself and the CCN endpoint. 0.5 0.4 4. IMPLEMENTATION 0.3 CDF: Proportion of Packets We implemented a proof-of-concept VoCCN client as 0.2 0.1 Stock LinPhone an extension to an open source VoIP phone, Lin- CCN LinPhone - Encrypted 0 phone (version 3.0). We made Linphone exchange data 0 10 20 30 40 50 60 70 80 90 100 over CCN by taking advantage of the ability to plug new Packet Interarrival Time (ms) transports into libeXoSIP and liboRTP, the libraries it uses for SIP and RTP. Figure 5: Cumulative distribution of inter- Our CCN network layer is implemented as a content packet intervals, or jitter, for a 10-minute call. router, which runs on every CCN-aware node, together with an interface library that simplifies the process of writing content-based applications. Each VoCCN end- DTLS for its ability to perform a complete SIP exchange point runs a CCN content router. These routers ex- and key setup in a single round trip. change CCN packets via an overlay consisting of UDP To provide signaling path security, we implemented sent over preconfigured point-to-point or multicast links. a simple inline message encryption and authentication The new libeXoSIP and liboRTP transports use the in- scheme (shown in abbreviated form in Figure 4). The terface library to send and receive CCN packets through caller encrypts the SIP invite message using a randomly- the local content router. generated symmetric key, sk, and authenticates that en- Our CCN library and network stack, together with crypted message using HMAC, a symmetric-key MAC. an updated version of our Linphone client, is available The caller then encrypts sk under the public key of the as open source from [1]. callee (pkB in Figure 4). To construct a call-initiating Interest message, the caller includes as components in 4.1 Security the desired name not only information necessary to route All VoCCN Data packets in both the signaling and the Interest to the callee, but also both the encrypted the media paths are digitally signed, using per-user key session key block (EpkB(sk)) and the encrypted and au- pairs. CCN simplifies the problem of key distribution, thenticated SIP message (Esk(SIP INV IT E message)). easing deployment of end-to-end security. Public keys The callee, on receiving the Interest, decrypts the en- can be distributed via CCN itself, so obtaining the pub- crypted key block with its private key, recovers sk,and lic key for a VoCCN user can be as simple as request- uses it to verify and decrypt the SIP INVITE (which it- ing a piece of content with a predetermined name – self contains the first, separately authenticated, MIKEY e.g., /ccnx.org/users/alice/KEY. Such keys can be key exchange message to establish messaging path se- accepted on faith at first and remembered over time curity). The caller then uses sk to encrypt its SIP re- (key continuity), giving a historically-based notion of sponse message (contained in a signed Data packet). identity. They can be authenticated in-band by their This system offers much better security than most users, giving equivalent security to ZRTP [24]. Or they production VoIP deployments. These usually offer no can be published as CCN content signed by a publisher security, or at best encrypt the signaling path hop-by- trusted by both endpoints, securely binding that key to hop using tunnels between proxies. Our VoCCN imple- a CCN name derived from the user’s identity. This acts mentation provides end-to-end security for both - effectively as a digital certificate, allowing the construc- ing and media traffic. It also supports end-to-end me- tion of a CCN-based PKI. dia path security when interoperating with VoIP clients, To provide media path security, we ran all calls over through its encapsulation of MIKEY and SRTP. SRTP,2 using pre-existing hooks to integrate libsrtp, an open SRTP implementation, with Linphone. We 4.2 Performance implemented a MIKEY [13] library to perform key ex- To evaluate CCN’s ability to support timely delivery change in the signaling path. We selected MIKEY over of realtime data we looked at the packet arrival times 2 This provided both encryption and content authentication; for our VoCCN implementation. We used two Linux though the latter was redundant given CCN’s digital signa- 2.6.27 machines (a 3.4 GHz Intel P4 and a 2.66 GHz ture on each packet. Intel Core2 Duo) on either 100Mbs or 1Gbs switched

5 networks for this experiment. Figure 5 shows the distri- considerations for a network of information. In bution of inter-packet intervals, effectively packet jitter, ReArch, 2008. for a 10-minute voice call made using stock Linphone [4] H. Balakrishnan, K. Lakshminarayanan, S. Ratnasamy, S. Shenker, I. Stoica, and M. Walfish. (solid line) over UDP RTP, and our Linphone-based A Layered Naming Architecture for the Internet. In VoCCN client (dashed line) (expected interval 20 msec). SIGCOMM, 2004. Steplike appearance is due to the Linux kernel schedul- [5] H. Balakrishnan, S. Shenker, and M. Walfish. ing quantum (for a single process in the case of stock Semantic-Free Referencing in Linked Distributed Linphone, and 3 processes for VoCCN in this early pro- Systems. In IPTPS, February 2003. totype). The VoCCN call has slightly fewer packets at [6] M. Caesar, T. Condie, J. Kannan, K. Lakshminarayanan, I. Stoica, and S. Shenker. or below the expected inter-packet interval, and a small ROFL: Routing on Flat Labels. In SIGCOMM, 2006. number of long-interval packets at the tail. No packets [7] C. Dannewitz, K. Pentikousis, R. Rembarz, were lost by either stock Linphone or our VoCCN client, E. Renault, O. Strandberg, and J. Ubillos. Scenarios however a small number of VoCCN packets (less than and research issues for a network of information. In 0.1%) were dropped by Linphone for late arrival. With Proc. 4th Int. Mobile Multimedia Communications Conf., July 2008. almost equal delivery performance, VoCCN and VoIP [8] C. Esteve, F. L. Verdi, and M. F. Magalh˜aes. Towards have the same call quality. a new generation of information-oriented Each packet in the VoCCN exchange was individually internetworking architectures. In CONEXT, 2008. signed with a 1024-bit RSA key; interestingly this did [9] M. Gritter and D. R. Cheriton. An architecture for not noticeably impact CPU load or performance on our content routing support in the internet. In Usenix Symposium on Internet Technologies and Systems test machines. For more constrained environments, sig- (USITS), 2001. nature aggregation techniques (see [18]) or fast signing [10] IETF. RFC 1889 – RTP: A transport protocol for algorithms such as ESIGN [21] can be used to increase real-time applications. throughput and lower computational burden. [11] IETF. RFC 3261 – SIP: Session initiation protocol. [12] IETF. RFC 3711 – The Secure Real-time Transport Protocol (SRTP). 5. CONCLUSIONS [13] IETF. RFC 3830 – MIKEY: Multimedia Internet Content-oriented network architectures not only move KEYing. content scalably and efficiently, they can also implement [14] IETF. RFC 4347 – Datagram . IP-like conversational services like voice calls, email or [15] V. Jacobson. A New Way to Look at Networking, transactions. To demonstrate this we have implemented August 2006. http://video.google.com/videoplay? and tested a Voice-over-CCN prototype. The result is docid=-6972678839686672840&ei= functionally and performance equivalent to Voice-over- iUx3SajYAZPiqQLwjIS7BQ&q=tech+talks+van+ IP but substantially simpler in architecture, implemen- jacobson. tation and configuration. It is intrinsically more scal- [16] V. Jacobson. Making the Case for Content-Centric Networking: An Interview with Van Jacobson. ACM able because it does not require VoIP’s inbound SIP Queue, January 2009. proxy (with the associated signaling state concentra- [17] V. Jacobson. Special Plenary Invited Short Course: tion). Since CCN secures content rather than the con- (CCN) Content-Centric Networking. In Future nections it travels over, VoCCN does not require dele- Internet Summer School, August 2009. gation of either trust or keys to proxies or other network [18] V. Jacobson, D. K. Smetters, J. D. Thornton, M. Plass, N. Briggs, and R. Braynard. Networking intermediaries and thus is far more secure than VoIP. Named Content. In CoNext, 2009. Finally, thanks to certain affordances offered by CCN’s [19] T. Koponen, M. Chawla, B.-G. Chun, A. Ermolinskiy, structured naming, VoCCN is completely interoperable K. H. Kim, S. Shenker, and I. Stoica. A Data-Oriented with VoIP via simple, stateless gateways. (and Beyond) Network Architecture. In SIGCOMM, 2007. [20] B. Ohlman et al. First NetInf architecture description, Acknowledgements April 2009. http://www.4ward-project.eu/index. We thank Dirk Balfanz and Philippe Golle for useful php?s=file_download&id=39. input. [21] T. Okamoto and J. Stern. Almost Uniform Density of Power Residues and the Provable Security of ESIGN. In ASIACRYPT, 2003. 6. REFERENCES [22] D. Trossen. Conceptual architecture of PSIRP TM [1] Project CCNx . http://www.ccnx.org, Sep. 2009. including subcomponent descriptions (D2.2), June [2] W. Adjie-Winoto, E. Schwartz, H. Balakrishnan, and 2008. http://psirp.org/publications. J. Lilley. The Design and Implementation of an [23] M. Walfish, H. Balakrishnan, and S. Shenker. Intentional Naming System. SIGOPS Oper. Syst. Untangling the Web from DNS. In NSDI, 2004. Rev., 33(5):186–201, 1999. [24] P. Zimmermann, A. Johnston, and J. Callas. ZRTP: [3] B. Ahlgren, M. D’Ambrosio, M. Marchisio, I. Marsh, Media Path Key Agreement for Secure RTP. C. Dannewitz, B. Ohlman, K. Pentikousis, http://www.philzimmermann.com/zfone/ O. Strandberg, R. Rembarz, and V. Vercellone. Design draft-zimmermann-avt--15.html.

6