Mumble Protocol Release 1.2.5-Alpha
Nov 06, 2017

Contents

1 Contents
    1.1 Introduction
    1.2 Overview
    1.3 Protocol stack (TCP)
    1.4 Establishing a connection
        1.4.1 Connect
        1.4.2 Version exchange
        1.4.3 Authenticate
        1.4.4 Crypto setup
        1.4.5 Channel states
        1.4.6 User states
        1.4.7 Server sync
        1.4.8 Ping
    1.5 Voice data
        1.5.1 Packet format
        1.5.2 Codecs
        1.5.3 Whispering
        1.5.4 UDP connectivity checks
        1.5.5 Tunneling audio over TCP
        1.5.6 Encryption
        1.5.7 Variable length integer encoding

CHAPTER 1
Contents

1.1 Introduction

This document is meant to be a reference for the Mumble VoIP 1.2.X server-client communication protocol. It reflects the state of the protocol implemented in the Mumble 1.2.8 client and might be outdated by the time you are reading this. Be sure to check for newer revisions of this document at http://mumble-protocol.readthedocs.org/. This document is a constant work in progress.

1.2 Overview

Mumble is based on a standard server-client communication model.
It uses two channels of communication. The first is a TCP connection, which is used to reliably transfer control data between the client and the server. The second is a UDP connection, which is used for unreliable, low-latency transfer of voice data. Both are protected by strong cryptography; this encryption is mandatory and cannot be disabled. The TCP control channel uses TLSv1 AES256-SHA [1] while the voice channel is encrypted with OCB-AES128 [2].

While the TCP connection is mandatory, a missing UDP connection can be compensated for by tunnelling the UDP packets through the TCP connection, as described later in the protocol description.

[1] http://en.wikipedia.org/wiki/Transport_Layer_Security
[2] http://www.cs.ucdavis.edu/~rogaway/ocb/ocb-back.htm

Fig. 1.1: Mumble system overview
Fig. 1.2: Mumble crypto types

1.3 Protocol stack (TCP)

Mumble has a shallow and easy to understand stack. It uses Google’s Protocol Buffers [1] with simple prefixing to distinguish the different kinds of packets sent through a TLSv1 encrypted connection. This makes the protocol easy to extend.

[1] https://github.com/google/protobuf

Fig. 1.3: Mumble packet

The prefix consists of two bytes defining the type of the packet in the payload and four bytes stating the length of the payload in bytes, followed by the payload itself. The packet types available in the current protocol are listed in Table 1.1; all but UDPTunnel are simple protobuf messages. If not mentioned otherwise, all fields outside the protobuf encoding are big-endian.
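The prefixing described above can be sketched in a few lines of Python. This is a minimal illustration under the stated layout (2-byte type, 4-byte length, both big-endian), not code taken from the Mumble sources; the names PREFIX, frame_packet and parse_packets are hypothetical, and the payload is treated as opaque bytes rather than a real protobuf message.

```python
import struct

# Layout assumed from the text: 2-byte packet type, 4-byte payload length,
# both big-endian, followed by the payload bytes.
PREFIX = struct.Struct(">HI")


def frame_packet(packet_type: int, payload: bytes) -> bytes:
    """Prepend the 6-byte prefix to an already serialized payload."""
    return PREFIX.pack(packet_type, len(payload)) + payload


def parse_packets(buffer: bytes):
    """Split a receive buffer into complete (type, payload) pairs.

    Returns the list of complete packets and the unconsumed tail, which
    the caller would keep until more bytes arrive on the connection.
    """
    packets = []
    offset = 0
    while len(buffer) - offset >= PREFIX.size:
        packet_type, length = PREFIX.unpack_from(buffer, offset)
        end = offset + PREFIX.size + length
        if end > len(buffer):
            break  # prefix is complete but the payload is not yet
        packets.append((packet_type, buffer[offset + PREFIX.size:end]))
        offset = end
    return packets, buffer[offset:]
```

Because TCP delivers a byte stream rather than discrete messages, the parser returns any incomplete tail instead of raising an error, so a caller can append newly received bytes and try again.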
Table 1.1: Packet types

Type  Payload
0     Version
1     UDPTunnel
2     Authenticate
3     Ping
4     Reject
5     ServerSync
6     ChannelRemove
7     ChannelState
8     UserRemove
9     UserState
10    BanList
11    TextMessage
12    PermissionDenied
13    ACL
14    QueryUsers
15    CryptSetup
16    ContextActionModify
17    ContextAction
18    UserList
19    VoiceTarget
20    PermissionQuery
21    CodecVersion
22    UserStats
23    RequestBlob
24    ServerConfig
25    SuggestConfig

For the raw representation of each packet type see the attached Mumble.proto [2] file.

[2] https://raw.github.com/mumble-voip/mumble/master/src/Mumble.proto

1.4 Establishing a connection

This section describes the communication between the server and the client during connection establishment. Note that only the TCP connection needs to be established for the client to be connected. After this the client will be visible to the other clients on the server and able to send other types of messages.

Fig. 1.4: Mumble connection setup

1.4.1 Connect

As the basis for the synchronization procedure the client first has to establish the TCP connection to the server and perform a common TLSv1 handshake. To be able to use the complete feature set of the Mumble protocol it is recommended that the client provides a strong certificate to the server. This is not mandatory, however, as a client can connect to the server without providing a certificate. The server, on the other hand, must provide the client with its certificate, and it is recommended that the client checks it.

1.4.2 Version exchange

Once the TLS handshake is completed both sides should transmit their version information using the Version message. The message structure is described below.

Table 1.2: Version message

Version
    version     uint32
    release     string
    os          string
    os_version  string

The version field is a combination of major, minor and patch version numbers (e.g.
1.2.0) so that the major number takes two bytes and the minor and patch numbers take one byte each. The structure is shown in Table 1.3. The release, os and os_version fields are common strings containing additional information.

Table 1.3: Version field encoding (uint32)

Major    Minor   Patch
2 bytes  1 byte  1 byte

The version information may be used as part of the SuggestConfig checks, which usually refer to the standard client versions. The major changes between these versions are listed in Table 1.4. The release, os and os_version information is not interpreted in any way at the moment.

Table 1.4: Mumble version differences

Version  Major changes
1.2.0    CELT 0.7.0 codec support
1.2.2    CELT 0.7.1 codec support
1.2.3    CELT 0.11.0 codec
1.2.4    Opus codec support, SuggestConfig message

1.4.3 Authenticate

Once the client has sent the version it should follow this with the Authenticate message. The message structure is described in Table 1.5. This message may be sent immediately after sending the version message. The client does not need to wait for the server version message.

Table 1.5: Authenticate message

Authenticate
    username  string
    password  string
    tokens    string

The username and password are UTF-8 encoded strings. While the client is free to accept any username from the user, the server is allowed to impose further restrictions. Furthermore, if the client certificate has been registered with the server, the client is primarily known by the username it had when the certificate was registered. For more information see the server documentation.

The password must only be provided if the server is passworded, if the client provided no certificate but wants to authenticate to an account which has a password set, or to access the SuperUser account.
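Looking back at the version field from section 1.4.2, the layout in Table 1.3 amounts to plain bit shifting. The following is a minimal sketch under that layout, not code from the Mumble sources; the function names encode_version and decode_version are hypothetical.

```python
def encode_version(major: int, minor: int, patch: int) -> int:
    """Pack major (2 bytes), minor (1 byte) and patch (1 byte) into a uint32."""
    if not (0 <= major <= 0xFFFF and 0 <= minor <= 0xFF and 0 <= patch <= 0xFF):
        raise ValueError("version component does not fit its field")
    return (major << 16) | (minor << 8) | patch


def decode_version(value: int) -> tuple:
    """Split a packed uint32 version back into (major, minor, patch)."""
    return (value >> 16) & 0xFFFF, (value >> 8) & 0xFF, value & 0xFF


# Version 1.2.4 packs to 0x00010204.
packed = encode_version(1, 2, 4)
print(hex(packed), decode_version(packed))
```

The range check is a design choice of this sketch: a component that overflows its field would otherwise silently corrupt the neighbouring one.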
The third field contains a list of zero or more token strings. These act as passwords that may give the client access to certain ACL groups without it actually being a registered member of them. Again, see the server documentation for more information.

1.4.4 Crypto setup

Once the Version packets are exchanged the server will send a CryptSetup packet to the client. It contains the necessary cryptographic information for the OCB-AES128 encryption used in the UDP voice channel. The packet is described in Table 1.6. The encryption itself is described in a later section.

Table 1.6: CryptSetup message

CryptSetup
    key           bytes
    client_nonce  bytes
    server_nonce  bytes

1.4.5 Channel states

After the client has successfully authenticated, the server starts listing the channels by transmitting a partial ChannelState message for every channel on the server. These messages lack the channel link information, as the client does not yet have a full picture of all the channels. Once the initial ChannelState has been transmitted for all channels, the server updates the linked channels by sending new packets for these. The full structure of these ChannelState messages is shown below:

Table 1.7: ChannelState message

ChannelState
    channel_id    uint32
    parent        uint32
    name          string
    links         repeated uint32
    description   string
    links_add     repeated uint32
    links_remove  repeated uint32
    temporary     optional bool
    position      optional int32

The server must send a ChannelState for the root channel, which is identified by ID 0.

1.4.6 User states

After the channels have been synchronized the server continues by listing the connected users. This is done by sending a UserState message for each user currently on the server, including the user that is currently connecting.
Table 1.8: UserState message

UserState
    session           uint32
    actor             uint32
    name              string
    user_id           uint32
    channel_id        uint32
    mute              bool
    deaf              bool
    suppress          bool
    self_mute         bool
    self_deaf         bool
    texture           bytes
    plugin_context    bytes
    plugin_identity   string
    comment           string
    hash              string
    comment_hash      bytes
    texture_hash      bytes
    priority_speaker  bool
    recording         bool

1.4.7 Server sync

The client has now received a copy of the parts of the server state it needs to know about. To complete the synchronization the server transmits a ServerSync message containing the session id of the client's session, the maximum bandwidth allowed on this server, the server's welcome text, and the permissions the client has in the channel it ended up in. For more information please refer to the Mumble.proto file [1].

1.4.8 Ping

If the client wishes to maintain the connection to the server it is required to ping the server. If the server does not receive a ping for 30 seconds it will disconnect the client.

1.5 Voice data

The Mumble audio channel is used to transmit the actual audio packets over the network. Unlike the TCP control channel, the audio channel uses a custom encoding for the audio packets.