A Case for SIP in Javascript", IEEE Communications Magazine , Vol
Total Page:16
File Type:pdf, Size:1020Kb
Copyright © IEEE, 2013. This is the author's copy of a paper that appears in IEEE Communications Magazine. Please cite as follows: K.Singh and V.Krishnaswamy, "A case for SIP in JavaScript", IEEE Communications Magazine , Vol. 51, No. 4, April 2013. A Case for SIP in JavaScript Kundan Singh Venkatesh Krishnaswamy IP Communications Department IP Communications Department Avaya Labs Avaya Labs Santa Clara, USA Basking Ridge, USA [email protected] [email protected] Abstract —This paper presents the challenges and compares the applications independent of the specific SIP extensions alternatives to interoperate between the Session Initiation supported in the vendor's gateway. Protocol (SIP)-based systems and the emerging standards for the Web Real-Time Communication (WebRTC). We argue for an We observe that decoupled development across services , end-point and web-focused architecture, and present both sides tools and applications promotes interoperability. Today, the of the SIP in JavaScript approach. Until WebRTC has ubiquitous web developers can build applications independent of a cross-browser availability, we suggest a fall back strategy for web particular service (Internet service provider) or vendor tool developers — detect and use HTML5 if available, otherwise fall (browser, web server). Unfortunately, many existing SIP back to a browser plugin. systems are closely integrated and have overlap among two or more of these categories, e.g., access network provider Keywords- SIP; HTML5; WebRTC; Flash Player; web-based (service) wants to control voice calling (application), or three communication party calling (application) depends on your telephony provider (service). The main motivation of the endpoint approach (also I. INTRODUCTION known as SIP in JavaScript) is to separate the applications In the past, web developers have resorted to browser (various SIP extensions and telephony features) from the plugins such as Flash Player to do audio and video calls in the specific tools (browsers or proxies) or VoIP service providers. browser and to interoperate with the legacy voice-over-IP In practice, this is not guaranteed because the web site may (VoIP) systems from the web. To avoid the third-party plugin restrict the hosted SIP-in-JavaScript application to only connect dependency, new standards for web real-time communication to its own proxy server. Moreover, several challenges make (WebRTC) are emerging to support the devices, codecs and this approach nearly impossible to work in practice. communication primitives natively in the browser. While the global voice communications use the Session Initiation The paper is organized as follows. Section II gives a Protocol (SIP) [1] for signaling, the HTML5 standard as part of background on related technologies. Section III compares the the WebRTC effort [2][3][4] keeps the signaling part outside two alternatives to interoperate between SIP and WebRTC. We the scope of the browser. present an implementation in Section IV. Section V describes the fall back strategy of using the Flash Player plugin by the Traditional voice systems are built around a business model web developers while WebRTC is being incrementally adopted that requires universal reach hence interoperability among by the browser vendors. Finally, we present our conclusions in multiple VoIP vendors and services is crucial. By contrast, Section VI. web based services are experimenting with business models capturing as many users within a single domain as possible. II. BACKGROUND AND RELATED WORK For instance, a user on a social networking web site is likely to Before describing the interoperability between SIP and communicate with others on that web site, but not on another WebRTC, we give a brief background on these systems and web site. Thus, every web site can choose its own rendezvous their differences from the interoperability point of view. (or signaling) protocol without worrying about interoperating with other web sites. We only need interoperability among a A. What is SIP? few browser vendors for the media path. SIP [1] is the IETF standard for establishing, managing and terminating Internet sessions including voice and video calls In practice, interoperability with legacy VoIP and telephone and conferences. As shown in Fig.1 (a), a SIP system uses systems is crucial. This is done in either the end-point browser other standards such as SDP (Session Description Protocol) for or a network gateway attached to the web server. In the former offer/answer of session negotiation, RTP (Real-time Transport end-point approach, the SIP stack runs in JavaScript in the Protocol) for media path transport, RTCP (Real-time Transport browser while using WebRTC for media path to connect with Control Protocol) for feedback and control of media path, another SIP device. In the latter approach, if the client-server optionally SRTP (secure RTP) with keys negotiated in SDP for signaling is standardized, the same web application or the media path security, and optionally ICE (Interactive gateway can be reused with mix-and-match interoperability on Connectivity Establishment) for traversal through intermediate several web sites. For the endpoint approach, the signaling is NATs and firewalls. The dotted red-line separates what is using SIP, whereas for the gateway approach, a custom programmed by the application developer and what is provided protocol maps to SIP in the backend. We argue for the endpoint by the platform. approach because it keeps the web developers build application application to connect to the service infrastructure. We compare these two approaches in the next section. SDP JSON codecs WebRTC API developer SIP SIP XML SDP TABLE I. INTEROPERABILITY DIFFERENCES IN SIP VS WEB RTC. (O) SRTP Codecs MEANS OPTIONAL ICE ICE SIP RTCP WS SRTP RTP ICE RTCP RTCP Property SIP WebRTC HTTP RTP Media transport RTP, SRTP (o) SRTP, new RTP profiles TCP UDP TCP UDP Session negotiation SDP, offer/answer SDP, trickle (a) Typical SIP end-point platform (b) Typical WebRTC end-point STUN (o), TURN (o), ICE (includes STUN, NAT traversal Figure 1. Comparision of typical SIP vs WebRTC application stack ICE (o) TURN) Media transport Separate: audio/video, Same path with all media path/connection RTP vs RTCP and control B. What is WebRTC? User trusts device and User trusts browser but Security model WebRTC represents the family of emerging standards service provider not web site Typically G.711, Mandatory Opus and within the WebRTC working group in W3C [3] and the IETF Audio codecs RTCWEB working group [2] to enable end-to-end browser G.729, G.722, Speex G.711, optionally others Typically H.261, Undefined yet but likely Video codecs communication for real-time media. Please refer to [4] for an H.263, H.264 VP8 and/or H.264 overview of WebRTC. It reuses existing standards such as mandatory SRTP (Secure RTP) for media transport, and mandatory ICE (Interactive Connectivity Establishment) for Even though the media path uses the same set of protocols, traversal through NATs and firewalls. The signaling messages achieving true media path interoperability between a SIP user are browser independent and are left to the application agent and a WebRTC capable browser is a challenge. Table I developer who would typically use HTTP (Hyper-Text summarizes the main differences between SIP and WebRTC Transfer Protocol) and WebSocket (WS) [5] for exchanging for interoperability based on the discussions in the IETF. call control and session description information among the WebRTC requires new RTP profiles to send multiple RTP participants. Fig.1 compares the typical SIP and WebRTC media streams as well as RTCP control packets multiplexed application stack. over the same transport path (or logical connection) so that the expensive step of negotiating an end-to-end path across The WebRTC API proposal [3] uses SDP, which enables an firewalls is done only once. SIP end-points use a variety of application developer to use it as is, or transform it to web- optional techniques for NAT traversal, whereas WebRTC friendly JSON (JavaScript Object Notation) or XML makes ICE mandatory. Instead of using the lock-step of SIP (eXtensible Markup Language). Optionally, the application can offer-answer, i.e., once an offer is made the end-point must include a SIP implementation in JavaScript and reuse all the wait for an answer before issuing another offer , WebRTC features provides by SIP. allows trickle or incremental change in the session description C. What is WebSocket? to promote incremental address gathering in ICE. WebSocket (WS) is an IETF protocol and an HTML5 API While there is some overlap in the mandatory audio codecs that allows creating a bi-directional client-server connection of WebRTC and the commonly used codecs in SIP systems, the from the JavaScript code in the browser to the web server. To working group is still undecided on the choice of mandatory traverse web proxies, it uses HTTP to initiate the first request video codecs. Moreover, the generic data channel proposed in and subsequently upgrades to a persistent connection via WebRTC has no clear equivalent in existing SIP systems. additional handshakes. Once the connection is established, it For the media path security, WebRTC requires SRTP. SIP allows sending any data with packet boundary in either systems using SRTP, while not ubiquitous, are nonetheless direction over TCP. increasing. The origin-based trust model suggests that the two D. Related Work in Interoperability browsers must be visiting (or share some content on) the same In the early days of WebRTC, the IETF rejected having SIP web site to establish a WebRTC session. Finally, the end-user in the browser and left the signaling to the application in can trust her browser, but not the web site she is visiting. JavaScript. Subsequent attempts to interwork between SIP Several open issues exist and are being debated in the IETF, systems and WebRTC enabled browsers fall in two categories: e.g., the choice of mandatory codecs and the granularity of the translation at the gateway or implementing SIP in JavaScript.