Webrtc in the Real World: STUN, TURN and Signaling HTML5 Rocks Webrtc in the Real World: STUN, TURN and Signaling
Total Page:16
File Type:pdf, Size:1020Kb
9/21/2015 WebRTC in the real world: STUN, TURN and signaling HTML5 Rocks WebRTC in the real world: STUN, TURN and signaling By Sam Dutton Published: November 4th, 2013 Updated: November 4th, 2013 Comments: 16 WebRTC enables peer to peer communication. BUT... WebRTC still needs servers: For clients to exchange metadata to coordinate communication: this is called signaling. To cope with network address translators (NATs) and firewalls. In this article we show you how to build a signaling service, and how to deal with the quirks of realworld connectivity by using STUN and TURN servers. We also explain how WebRTC apps can handle multiparty calls and interact with services such as VoIP and PSTN (aka telephones). If you're not familiar with the basics of WebRTC, we strongly recommend you take a look at Getting Started With WebRTC before reading this article. What is signaling? Signaling is the process of coordinating communication. In order for a WebRTC application to set up a 'call', its clients need to exchange information: Session control messages used to open or close communication. Error messages. Media metadata such as codecs and codec settings, bandwidth and media types. Key data, used to establish secure connections. http://www.html5rocks.com/en/tutorials/webrtc/infrastructure/ 1/22 9/21/2015 WebRTC in the real world: STUN, TURN and signaling HTML5 Rocks Network data, such as a host's IP address and port as seen by the outside world. This signaling process needs a way for clients to pass messages back and forth. That mechanism is not implemented by the WebRTC APIs: you need to build it yourself. We describe below some ways to build a signaling service. First, however, a little context... Why is signaling not defined by WebRTC? To avoid redundancy and to maximize compatibility with established technologies, signaling methods and protocols are not specified by WebRTC standards. This approach is outlined by JSEP, the JavaScript Session Establishment Protocol: The thinking behind WebRTC call setup has been to fully specify and control the media plane, but to leave the signaling plane up to the application as much as possible. The rationale is that different applications may prefer to use different protocols, such as the existing SIP or Jingle call signaling protocols, or something custom to the particular application, perhaps for a novel use case. In this approach, the key information that needs to be exchanged is the multimedia session description, which specifies the necessary transport and media configuration information necessary to establish the media plane. JSEP's architecture also avoids a browser having to save state: that is, to function as a signaling state machine. This would be problematic if, for example, signaling data was lost each time a page was reloaded. Instead, signaling state can be saved on a server. http://www.html5rocks.com/en/tutorials/webrtc/infrastructure/ 2/22 9/21/2015 WebRTC in the real world: STUN, TURN and signaling HTML5 Rocks JSEP architecture JSEP requires the exchange between peers of offer and answer: the media metadata mentioned above. Offers and answers are communicated in Session Description Protocol format (SDP), which look like this: v=0 o=- 7614219274584779017 2 IN IP4 127.0.0.1 s=- t=0 0 a=group:BUNDLE audio video a=msid-semantic: WMS m=audio 1 RTP/SAVPF 111 103 104 0 8 107 106 105 13 126 c=IN IP4 0.0.0.0 a=rtcp:1 IN IP4 0.0.0.0 a=ice-ufrag:W2TGCZw2NZHuwlnf a=ice-pwd:xdQEccP40E+P0L5qTyzDgfmW a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level a=mid:audio a=rtcp-mux a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:9c1AHz27dZ9xPI91YNfSlI67/EMkjHHIHORiClQe a=rtpmap:111 opus/48000/2 … Want to know what all this SDP gobbledygook actually means? Take a look at the IETF examples. http://www.html5rocks.com/en/tutorials/webrtc/infrastructure/ 3/22 9/21/2015 WebRTC in the real world: STUN, TURN and signaling HTML5 Rocks Bear in mind that WebRTC is designed so that the offer or answer can be tweaked before being set as the local or remote description, by editing the values in the SDP text. For example, the preferAudioCodec() function in apprtc.appspot.com can be used to set the default codec and bitrate. SDP is somewhat painful to manipulate with JavaScript, and there is discussion about whether future versions of WebRTC should use JSON instead, but there are some advantages to sticking with SDP. RTCPeerConnection + signaling: offer, answer and candidate RTCPeerConnection is the API used by WebRTC applications to create a connection between peers and communicate audio and video. To initialise this process RTCPeerConnection has two tasks: Ascertain local media conditions, such as resolution and codec capabilities. This is the metadata used for the offer and answer mechanism. Get potential network addresses for the application's host, known as candidates. Once this local data has been ascertained, it must be exchanged via a signaling mechanism with the remote peer. Imagine Alice is trying to call Eve. Here's the full offer/answer mechanism in all its gory detail: 1. Alice creates an RTCPeerConnection object. 2. Alice creates an offer (an SDP session description) with the RTCPeerConnection createOffer() method. 3. Alice calls setLocalDescription() with his offer. 4. Alice stringifies the offer and uses a signaling mechanism to send it to Eve. 5. Eve calls setRemoteDescription() with Alice's offer, so that her RTCPeerConnection knows about Alice's setup. 6. Eve calls createAnswer(), and the success callback for this is passed a local session description: Eve's answer. 7. Eve sets her answer as the local description by calling setLocalDescription(). 8. Eve then uses the signaling mechanism to send her stringified answer back to Alice. 9. Alice sets Eve's answer as the remote session description using setRemoteDescription(). http://www.html5rocks.com/en/tutorials/webrtc/infrastructure/ 4/22 9/21/2015 WebRTC in the real world: STUN, TURN and signaling HTML5 Rocks Strewth! Alice and Eve also need to exchange network information. The expression 'finding candidates' refers to the process of finding network interfaces and ports using the ICE framework. 1. Alice creates an RTCPeerConnection object with an onicecandidate handler. 2. The handler is called when network candidates become available. 3. In the handler, Alice sends stringified candidate data to Eve, via their signaling channel. 4. When Eve gets a candidate message from Alice, she calls addIceCandidate(), to add the candidate to the remote peer description. JSEP supports ICE Candidate Trickling, which allows the caller to incrementally provide candidates to the callee after the initial offer, and for the callee to begin acting on the call and setting up a connection without waiting for all candidates to arrive. Coding WebRTC for signaling Below is a W3C code example that summarises the complete signaling process. The code assumes the existence of some signaling mechanism, SignalingChannel. Signaling is discussed in greater detail below. var signalingChannel = new SignalingChannel(); var configuration = { 'iceServers': [{ 'url': 'stun:stun.example.org' }] }; var pc; // call start() to initiate function start() { pc = new RTCPeerConnection(configuration); // send any ice candidates to the other peer pc.onicecandidate = function (evt) { if (evt.candidate) signalingChannel.send(JSON.stringify({ 'candidate': evt.candidate })); http://www.html5rocks.com/en/tutorials/webrtc/infrastructure/ 5/22 9/21/2015 WebRTC in the real world: STUN, TURN and signaling HTML5 Rocks }; // let the 'negotiationneeded' event trigger offer generation pc.onnegotiationneeded = function () { pc.createOffer(localDescCreated, logError); } // once remote stream arrives, show it in the remote video element pc.onaddstream = function (evt) { remoteView.src = URL.createObjectURL(evt.stream); }; // get a local stream, show it in a self-view and add it to be sent navigator.getUserMedia({ 'audio': true, 'video': true }, function (stream) { selfView.src = URL.createObjectURL(stream); pc.addStream(stream); }, logError); } function localDescCreated(desc) { pc.setLocalDescription(desc, function () { signalingChannel.send(JSON.stringify({ 'sdp': pc.localDescription })); }, logError); } signalingChannel.onmessage = function (evt) { if (!pc) start(); var message = JSON.parse(evt.data); if (message.sdp) pc.setRemoteDescription(new RTCSessionDescription(message.sdp), function () { // if we received an offer, we need to answer if (pc.remoteDescription.type == 'offer') pc.createAnswer(localDescCreated, logError); }, logError); else pc.addIceCandidate(new RTCIceCandidate(message.candidate)); }; function logError(error) { http://www.html5rocks.com/en/tutorials/webrtc/infrastructure/ 6/22 9/21/2015 WebRTC in the real world: STUN, TURN and signaling HTML5 Rocks log(error.name + ': ' + error.message); } To see the offer/answer and candidate exchange processes in action, take a look at the console log for the 'singlepage' video chat example at simpl.info/pc. If you want more, download a complete dump of WebRTC signaling and stats from the chrome://webrtcinternals page in Chrome or the opera://webrtcinternals page in Opera. Peer discovery This is fancy way of saying — how do I find someone to talk to? For telephone calls we have telephone numbers and directories. For online video chat and messaging, we need identity and presence management systems, and a means for users to initiate sessions. WebRTC apps need a way for clients to signal to each other that they want to start or join a call. Peer discovery mechanisms are not defined by WebRTC and we won't go into the options here. The process can be as simple as emailing or messaging a URL: for video chat applications such as talky.io, tawk.com and browsermeeting.com you invite people to a call by sharing a custom link. Developer Chris Ball has built an intriguing serverlesswebrtc experiment that enables WebRTC call participants to exchange metadata by any messaging service they like, such as IM, email or homing pigeon. How can I build a signaling service? To reiterate: signaling protocols and mechanisms are not defined by WebRTC standards.