Introducing a Verified Authenticated Key Exchange Protocol over Voice Channels for Secure Voice Communication

Piotr Krasnowski1,2, Jerome Lebrun1 and Bruno Martin1 1Univ. Coteˆ d’Azur, I3S-CNRS, Sophia Antipolis, France 2BlackBoxSecu,´ Sophia Antipolis, France {krasnowski, lebrun, bruno.martin}@i3s.unice.fr

Keywords: Authenticated Key Exchange, Secure Voice Communications, Data over Voice, Vocal Verification, Crypto Phone, Tamarin Prover, Formal Protocol Verification.

Abstract: Increasing need for secure voice communication is leading to new ideas for securing voice transmission. This work relates to a relatively new concept of sending encrypted speech as pseudo-speech in audio domain over existing civilian voice communication infrastructure, like 2G-4G networks and VoIP. Such a setting is more universal compared to military “Crypto Phones” and can be opened for public evaluation. Nevertheless, secure communication requires a prior exchange of cryptographic keys over voice channels, without reliance on any Public Key Infrastructure (PKI). This work presents the first formally verified and authenticated key exchange (AKE) over voice channels for secure military-grade voice communications. It describes the operational principles of the novel com- munication system and enlists its security requirements. The voice channel characteristics in the context of AKE protocol execution is thoroughly explained, with a strong emphasis on differences to classical store- and-forward data channels. Namely a robust protocol has been designed specifically for voice channels with double authentication based on signatures and Short Authentication Strings (SAS). The protocol is detailed and analyzed in terms of fundamental security properties and successfuly verified in a symbolic model using Tamarin Prover.

1 INTRODUCTION

An increasing concern of privacy violation in voice communications has motivated the development of secure voice over IP (VoIP) communicators, with Telegram and Signal being the iconic examples12. However, these applications are inherently insecure against spying malware installed on the smart-phone (Scott-Railton et al., 2017). Parallely, military-grade applications requiring higher protection rely on ded- icated hardware, most commonly in the form of Figure 1: Encrypted voice over voice channel scheme. Crypto Phones. These closed and unverifiable solu- tions suffer from high costs and low flexibility, as typ- speech is recorded by (a) the headset’s microphone ically encrypted phones allow communications exclu- and then forwarded to (b) the encryption device (here sively over one kind of a voice channel, like GSM. called the Crypto Box). The Crypto Box processes The mentioned limitations encourage the search the speech and enciphers vocal parameters of the sig- for open solutions complementary to Crypto Phones, nal. The encrypted speech in the form of data stream combining flexibility and high protection provided by shaped into pseudo-speech audio signal is transmit- specialized hardware. A new idea, depicted on Fig.1, ted by () the audio link to the audio input of (d) the is based on voice encryption in the audio domain. The phone and sent through 2G-4G networks or VoIP. Fi- nally, the received pseudo-speech is deciphered by the 1https://signal.org paired Crypto Box on the other side of the channel. 2https://core.telegram.org In such a setting, voice encryption is performed outside of the phone, hence protecting against audio- to strong error correction and voice compression by recording malware. To limit the risk of a system cor- coders like MELP or Codec2. ruption, the Crypto Box has only analog input/output Secure speech enciphering requires a prior ex- interfaces to the headset and to the phone. However, change of session keys between the Crypto Boxes. for security reasons, it is necessary that other analog Due to system requirements, the key exchange can inputs of the phone (particularly the built-in micro- only be made through the same point-to-point voice phone) are blocked by a special case or removed. channel, which gives no practical possibility of From the system perspective, two Crypto Boxes adding an online trusted third party (TTP) or a certifi- are the end-points of a secured voice domain. Every- cate authority (CA). Such a limitation is a big concern thing in between, including mobile phones itself, are for users’ authentication. elements of a communication infrastructure that en- Research on secure key exchange between two ables voice transmission. The framework adds a new honest parties without any TTP led to the creation layer of security, protecting against spying malware of standards suitable for VoIP applications, as an ex- installed on the phone. Since all communications be- tension of the Real-Time Transport Protocol, called tween encrypting devices is done purely in the ana- ZRTP (Callas et al., 2011), and Multimedia Inter- log domain, the selection of the specific voice com- net KEYing (MIKEY) protocol (Arkko et al., 2004). munication technology is therefore a secondary issue. Especially ZRTP is interesting in the context of this Compatibility with most of the vocal communication work, because it provides authentication mechanism methods, like VoIP applications or 2G-4G networks, in the absence of any Public Key Infrastructure (PKI) significantly widens the range of usability scenarios. or a pre-shared secret. In these situations, authenti- The described setting, which is not intended for a cation is based on vocally comparing Short Authen- daily-usage, is of great interest for business, diplo- tication Strings (SAS). Unfortunately, having three matic and military services, who require secure com- modes of operation and extensive negotiation signal- munications in an unreliable environment and without ing, even ZRTP seems to be overly complex for com- the access to a confidential communication infrastruc- munication over voice channels. Moreover, none of ture. the protocols put a sufficient emphasis on resistance to strong message distortion or desynchronization in The major motivation in our approach is to secure low-bandwidth environment. voice communication even with untrusted phones, as these should not be actively involved in the setup of a To the best of authors’ knowledge, this is the first secure connection or store sensitive data. Instead, the paper focusing on authenticated key exchange (AKE) trust is given to Crypto Box manufacturers, respon- protocols over voice channels. The work aims at sible for software implementation or update policy. giving the understanding of the very specific chan- Though, the open framework enables various hard- nel constraints, leading to a protocol highly adapted ware solutions, including combining the phone and to voice channel characteristics and system require- the Crypto Box into one device. ments. The protocol provides double authentication in a single mode of operation, by signatures and vo- Producing encrypted speech in real-time appears cal comparison of SAS. In addition, it is flexible to be quite technically challenging. Firstly, the enough to support authentication of user who did not recorded speech is encoded into the vocal parame- yet share the signing public keys between each other, ters in a similar manner as during speech compres- with SAS-only authentication or unilateral signature sion. Later, the speech parameters are encrypted and authentication. Finally, the same protocol can be used mapped onto the audio waveform. This technique, to authenticate the exchange of signing public keys. called Data over Voice (DoV), proved its feasibil- ity in practical scenarios (Katugampala et al., 2004; Shahbazi et al., 2009; Dhananjay et al., 2010; Bian- cucci et al., 2013). However, since voice channels are 2 SYSTEM REQUIREMENTS designed to carry voice signal without much loss of perceptual quality, which is a different goal than the The need for hardware-based voice encryption is a re- transmission of data, the achievable bitrate for DoV sponse to an increased risk of being intercepted. Thus, typically is at most 2 kbps. Even in case of mod- a cryptographic scheme should reflect higher require- ern digital VoIP applications, the received voice is ments for secrecy and authentication. The first con- much distorted compared to the input signal, mak- cern is recording and analyzing the network traffic ing the transmission resembling a communication by omnipresent passive eavesdroppers. Active adver- over highly distortive analog channel. Sending en- saries controlling the network are more likely to block crypted voice with such constraints is possible thanks or distort communication, which is technically very simple. However, a powerful and knowledgeable ad- 3.1 Preliminaries versary who is able to analyze and synthesize a com- patible pseudo-speech may try to modify a message or Let us describe the key exchange between honest insert his own. Finally, in critical situations, the en- users Alice and Bob who know each other, without crypting device could be hijacked in order to extract any legitimate trusted third party participating. The long-term keys. On the other hand, in our work we operational framework requires that Alice and Bob assume that the encryption device does not allow any first need to establish a non-encrypted voice connec- intrusion into its internal memory during the opera- tion with a preferred voice application. Then, they can tion, so all ephemeral data stored on the device (and initiate a secure communication. The system model deleted after each protocol run) is considered secure. assumes that identity information used to make a call A design process of the protocol is motivated by (phone number, user account, credentials etc.) is inde- an anticipated user experience. However, due to se- pendent from the authentic user identity and from the vere constraints of the voice channel characteristics, identification number of the voice encryption hard- the biggest challenges are related to protocol com- ware. Only one running session at a time is possi- plexity, synchronization and robustness. A major bot- ble since each device cannot process more than one tleneck is a large message round-trip time, around 2 message simultaneously. Therefore, several kinds of seconds long, which causes the whole protocol run- Denial-of-Service (DoS) attacks, when the adversary time prohibitively long even in case of simple proto- tries to send multiple messages to a recipient, are not cols. Another limitation is a very small bandwidth effectively different than distorting or blocking the implying a reduction of the message size. More- channel. over, the protocol has to be robust against fading In highly unreliable channels like voice channels, and signal distortion, requiring a significant simpli- Alice and Bob are never sure of message delivery. fication of signalization and strong error correction Thus, several synchronization techniques are needed, mechanisms. Finally, in order to decrease battery i.e. repeat requests, retransmissions and time-outs. power consumption, cryptographic operations should For simplicity and space limitations, most details on be rather lightweight and optimized. When imple- synchronization will be omitted here. Additionally, menting, relying on popular and verified network se- thanks to strong error-detection coding, users are able curity libraries, like OpenSSL or NaCL, could be a to detect random channel errors and differentiate them strong practical advantage. from intentional malicious manipulations. Adaptation to hardware and channel constraints should not lead to significant relaxation of the security 3.2 Symbolic Model of the Protocol level. It will be detailed that the key exchange proto- col provides strong mutual agreement on the parame- The proposed protocol, that is presented on Fig- ters used for the derivation of the session key, putting ure 2 next page, relies on Ephemeral (Elliptic- a special emphasis on preventing Man-In-The-Middle Curve) Diffie-Hellman (EC)DHE exchange (Hanker- (MITM) attacks and achieving Perfect Forward Se- son et al., 2005), authenticated by signatures (existen- crecy (PFS). The crucial property of the protocol is to tially unforgeable and deterministic) or Short Authen- enable the authentication of peers, no matter if they tication Strings. Before the protocol starts, Alice and share a common secret or not. Bob agree on the elliptic curve and the lengths of keys A successful and fast key exchange is an indicator and nonces. Public verification keys should be pro- of sufficiently good channel conditions, that provide a vided to the recipients in an authenticated way before comfortable communication. Each received message the communication starts and are stored in the Crypto can be used to effectively estimate channel character- Box address book. However, in many real scenarios istics and to improve decoding capabilities. it is not possible to properly provide such a verifica- tion key. If the signature cannot be verified by the recipient, the protocol offers vocal verification as an 3 PROTOCOL DESCRIPTION alternative, which authenticates the speakers and the parameters used to derive the current session key. This section presents the symbolic model of the au- The protocol interaction consists of several steps: thenticated key exchange protocol over voice chan- the setup, the key exchange and authentication, the nels and provides a brief explanation. protocol acknowledgement and the optional vocal verification. Table 1 contains the glossary of terms used in the protocol specification, along with their bit- lengths. Table 1: Glossary.

Acronyms Definitions Bits ...... Unsecured call initiation ...... IDU fixed user identifier 32 vocal agreement on Alice ←−−−−−−−−−−−−−→ Bob random and unique protocol initialization NU 32 nonce ...... Setup...... KS Session Key 256 ∗ A/B role ∗ Short Authentication 1 : NA ←$ Z32 ←−−−−−−−−−−−−−→ NB ←$ Z32 SAS 32 negotiation String ∗ ∗ 2 : dA ←$ Z dB ←$ Z Short Authentication 256 256 (RA, RB) (128, 32) 3 : QA = dAGQB = dBG String seeds ∗ ∗ 4 : RA ←$ Z RB ←$ Z secret/public ECDHE 128 32 (d , Q ) (256, 256) U U key pair ...... Key exchange and authentication ...... ID , N , Q signing/verification 5 : −−−−−−−−−−−−−−−→A A A (SU , VU ) (256, 256) key pair h128(IDAkNAkQAkRA) ID , N , Q , R signature (signed 6 : ←−−−−−−−−−−−−−−−B B B B SignSU (·) 256 SignSB (N) with SU ) RA, SignSA (H) hX (·) hash function X 7 : Z = dAQB −−−−−−−−−−−−−−−→ Z = dBQA 8 : KS = h256(Zk•) KS = h256(Zk•) Setup: The negotiation stage has been considerably ...... Acknowledgment ...... simplified. Participants have to mutually agree ACK on starting the key exchange procedure, therefore 9 : ←−−−−−−−−−−−−−−− the actual key exchange protocol is preceded only ...... SAS comparison over Encrypted Channel ...... by fast and automatic role negotiation in order to SAS vocal 10 : SAS = h32() ←−−−−−−−−−−−−−→ SAS = h32() prevent mutual interference or logjams. Then, both comparison Alice and Bob choose a random private integer d, a random and unique nonce N, a random value R and Symbols: Q compute a public key . Unique nonce guarantees N ≡ ’B’kIDAkNAkQAkIDBkNBkQB the uniqueness of the triple (ID,Q,N). H ≡ ’A’kIDBkNBkQBkIDAkNAkQA • ≡ IDAkNAkIDBkNB Key Exchange and Authentication: In this stage ≡ R kR kID kQ kN Alice and Bob exchange values that are used to  A B B B B derive the Session Key (K ) and the SAS. Alice sends S Figure 2: Key exchange protocol over voice channels. her public ID, the nonce, the ephemeral public key and the hash, with RA included. Bob responds with his values, appends R , and additionally sends his if any of the users was not able to verify the signature. B It is assumed that the comparison process is authen- signature over all sent parameters required for KS calculation. Alice answers with her signature over ticated - users are able to recognize voice character- istics of the peer (timbre, tempo, etc.). The SAS is the same data and finally reveals RA. It is worth noting that the protocol permits a situation when the displayed on the Crypto Box as a short string of digits signature cannot be verified. If any of the recipients or words to be vocally uttered by the users. did not obtain a verification key corresponding to the sender’s ID, the signature is checked against channel errors but not processed further. 4 FORMAL VERIFICATION

Protocol Acknowledgment: When all cryptographic Verification of the protocol is performed in a sym- parameters are exchanged, voice encryption can be bolic model, where all cryptographic primitives are started. Encryption is initiated after a reception of assumed perfect and give the adversary no advantage Bob’s acknowledgment by Alice. The acknowl- (Dolev and Yao, 1983). In the analyzed scenario, it edgment is a confirmation of error-less message means that all parties generate truly random numbers, reception, so can be non-encrypted. signatures are unforgeable and ECDH parameters do not reveal any secret information. Formal symbolic Short Authentication String Comparison: Each verification can be considered as a first step of a proto- participant can request for vocally challenging SAS col analysis, paving the way to computational model equality with the peer. SAS comparison is obligatory verification (Goldwasser and Micali, 1984; Blanchet, 2012), in which the adversary gets the power to attack Table 2: Security properties verified by Tamarin in four au- cryptographic algorithms. thentication scenarios: (a) mutual signature authentication, A formal analysis in a symbolic model of the (b) unilateral signature authentication, (c) SAS vocal verifi- proposed protocol was done with Tamarin Prover cation and (d) no authentication. Authentication (Schmidt, 2012; Meier, 2013; Meier et al., 2013), a (a) (b) (c) (d) powerful and increasingly popular automatic verifica- scenario: tion tool designed at ETH Zurich.¨ Tamarin is based Session Key secrecy     on multiset rewriting (MSR) language and supports forward secrecy     generic Diffie-Hellman group operations. In addi- injective agreement     tion, Tamarin can model many cryptographic prim- reflection attack     key compromise itives like signatures or hashes and offers an impres-   - - sive database of examples, that makes the tool suitable impersonation for the protocol evaluation. authentication or authenticated SAS comparison are 4.1 Protocol Modeling proven to provide perfect forward secrecy and injec- tive agreement. Unilateral signature authentication Verification by Tamarin implies providing an abstract between two honest users who know each other guar- protocol model, which tries to faithfully express rel- antees the same level of security as mutual signature evant information from the security perspective, but authentication. Surprisingly, vocal verification does still being within the scope of feasibility of the analy- not protect against reflection attack, because the user sis. The protocol model code can be found in (Kras- can trivially compare SAS with herself. Table results nowski, 2019). Several protocol restrictions were re- indicate the importance of authentication - none of the laxed, allowing users to run multiple protocol instan- properties were verified if no authentication was per- tiations at the same time and to “forget” the verifica- formed. tion key of the peer. SAS verification is performed by a separate protocol rule which is not obligatory, sim- ulating a realistic case when users simply ignore it. 5 SECURITY CONSIDERATIONS Vocal challenging is modeled as communicating over an authenticated (not secret) channel, that is a chan- The following sections explain in more detail the nel which the adversary can intercept but not modify. protocol characteristics, providing several justifica- Last ACK message is skipped. tions and practical recommendations. It starts from the overview of fundamental protocol elements: the 4.2 Security Properties and Verification choice of public-based cryptography, the role of sig- Results natures and of Short Authentication Strings. Next sec- tion enlists potential protocol weaknesses and possi- The protocol model was checked against the Dolev- ble fixes. Yao adversary (Dolev and Yao, 1983), having a full control over the network and with the power to reveal 5.1 Discussion the long-term secret key of any user (ephemeral data is considered secure). Evaluation was done in four au- Public Key Agreement versus Symmetric Cryp- thentication configurations: mutual signature authen- tography: In exceptionally constrained resource tication between two honest users, unilateral signature devices, such as IoT sensors or RFID cards, a pursue authentication (when only one user can verify the sig- for ultra-lightweight key exchange protocols led to nature of the peer), vocal verification or no authenti- the shift from the public key encryption towards cation. symmetric encryption techniques (Echevarria et al., Verification focused on most critical security 2016; Lee et al., 2014; Baashirah and Abuzneid, properties: (perfect forward) secrecy and a mutual in- 2018). Even the ZRTP protocol offers a possibility of jective agreement (Lowe, 1997) on the Session Key. a key exchange in a lightweight preshared mode. In The protocol was also verified for resilience to reflec- this configuration, two entities share a secret which tion attacks (a user cannot accept her own identity as a is used to encrypt or refresh the keying material for peer) and for signing key compromise impersonation the new session. In order to achieve Perfect Forward (adversary can impersonate only corrupted users). Secrecy, the long-term secret should be regularly Results of protocol verification can be found in updated, desirably after each successful key exchange Table 2. Protocol configurations involving signature run. The update decision has to be mutual, otherwise risking one-side update and user desynchronization. Zimmermann, 1996). The device should encourage Unfortunately, in voice channels such a risk cannot the mutual SAS comparison by indicating a part of be eliminated, because the last update confirmation the SAS to pronounce and a part to hear from the peer. message may not be delivered. Decreasing the chance of desynchronization by sending more confirma- Signature-based Authentication: Signatures pro- tion messages would negatively affect the protocol vide device authentication and message integrity, sim- run-time. Another solution, based on on-the-fly ilarly to message authentication codes (MAC), which resynchronization mechanisms requires an online are simpler and easier to compute. Indeed, in some server keeping the track of all key updates or a costly scenarios choosing hash-based MAC instead of sig- and potentially unsecure “guessing” the long-term natures would be sufficient. However, signatures give parameters until decryption is successful (Baashirah wider flexibility, justifying higher computational cost. and Abuzneid, 2018). Finally, as was emphasized A natural advantage of signatures is that they do not before, in some scenarios the exchange of long-term require mutual agreement and secure exchange of a secret is not possible, limiting the usability of sym- long-term secret between two parties. Moreover, each metric cryptography. In the light of above-mentioned user keeps in memory only one private signing key, reasons and relatively smaller hardware restrictions used regardless of the receiver’s identity. In conse- compared to IoT sensors, public-based key exchange quence, if the user is corrupted, the adversary should scheme seems more adequate. be able to impersonate only that person. When one user cannot obtain a verification key Role of Short Authentication Strings: If the key ex- due to insecure environment, it is still possible to change is not interfered by a third party, both partic- achieve unilateral authentication (Boyd and Mathuria, ipants obtain the same Short Authentication String. 2003; Maurer et al., 2013; Dodis and Fiore, 2017). Challenging SAS vocally between honest users has a One-side authentication prevents MITM attacks, leav- twofold role. Firstly, it enables authentication of users ing only two possibilities: both honest users securely based on voice identification. Secondly, the inequality exchange a secret or the adversary is an authenticator of codes may indicate the presence of an active MITM (Maurer et al., 2013). It naturally implies that if the adversary. However, MITM manipulations would be users want to communicate and they know they can undetected if the adversary is somehow able to influ- perform unilateral authentication, the adversary can- ence or precompute the SAS value before the users. not interfere undetected in another way than prevent- Computation of the code depends on seed values ing the successful exchange. However, the user who RA and RB chosen randomly by honest users. Impor- failed to authenticate the peer is still complied to chal- tantly, Alice and Bob are forced to select seeds before lenge the SAS, because from her perspective it is the knowing the value of the peer - Alice by sending the only formal way to verify the absence of the MITM hash of RA in the first message and Bob by revealing manipulations. RB before RA. Such construction, inspired by (Pasini A signature key management policy, due to a lack and Vaudenay, 2006), prevents adaptive selection of of any PKI infrastructure, has a crucial impact on seeds by each party. The same rule applies to the ad- the system security and usability. This work points versary, who cannot predict the SAS value until it is out two possible schemes, decentralized and fully too late. The only hope for him is a random guess with centralized, which can be chosen depending on the a low probability of a success, or an extraction of RA needs. In a centralized system, keys are managed by from the hash sent in the first message by brute force an offline central authority, keeping the track of all search. For this last reason, the length of RA should records and being responsible for key distribution be considerably larger than RB. On the other hand, the and update. In a decentralized case, each user is difference of lengths is partially compensated by tak- entitled to generate her own key pair and distribute ing QBkNB as an additional input of the hash function. public keys to specific users in authenticated way. It is worth noting, that SAS value does not have to be Following the PGP model, sharing the key can be confidential, since it plays only the authentication role performed remotely based on speaker identification and cannot be modified without detection. and vocal authentication of the channel. Thus, the In practice, the security of vocal verification proposed protocol with SAS comparison gives the depends also on how users abide to it. The SAS possibility to authenticate the exchange of signature could be represented by a smaller number of simple verification keys. pictographs or easily pronounceable words, the same way as in the ZRTP which has the PGP Word List Identity Protection: In many situations protecting incorporated into its framework (Callas et al., 2011; the identity of the user is as important as securing the content of the speech. However, calling anybody with tection. To minimize the negative consequences of a a civilian communication networks is always associ- theft, the device should be protected against physical ated with revealing user metadata (i.e. phone num- tampering, making reverse engineering very difficult. ber, user credentials, location). Even if the metadata is publicly known, it may be advantageous to at least hide the identity of the encrypting device from passive 6 CONCLUSIONS eavesdroppers. It is possible to redesign the proposed protocol to Our work is the first attempt to resolve the problem of attain identity anonymity without the change of any cryptographic key exchange over voice channels for other substantial protocol property. The ID and the military-grade secure voice communications. It also signatures of Alice and Bob can be sent encrypted introduces challenges related to secure communica- with the key derived from DH secret exchanged dur- tions over voice channels like extremely small band- ing first message round-trip, in a similar manner as width, no guarantee of message delivery and the is- in the Initial Exchange of IKEv2 standard (Kaufman sue of battery consumption. The paper lists the secu- et al., 2010). The complexity of a protocol providing rity requirements posed to the system, like protecting anonymity would increase, since it will require addi- against interception and MITM attacks, emphasizing tional data encryption. It is also important to care- the importance of user authentication in absence of a fully evaluate the way the encryption key is derived trusted server. All these concerns and limitations jus- and how it is related to the the session key, in order to tify the need of a dedicated protocol instead of relying give no foothold for cryptanalysis. fully on standardized solutions. We proposed a simplified key exchange protocol 5.2 Possible Attacks and Threats between two honest parties which is based on the ephemeral elliptic curve Diffie-Hellman (ECDHE) Many protocol vulnerabilities are focusing on the se- protocol. The protocol offers two ways of authen- lection of specific cryptographic algorithms, its im- tication - by signatures and by vocally challenging plementation and finally on compliance to protocol the equality of the Short Authentication Strings. A rules. The biggest threat is posed by not respecting symbolic model of the protocol was analyzed using the obligation of SAS comparison by real users, open- Tamarin Prover in order to verify the crucial secu- ing a space for MITM attacks. rity properties as Perfect Forward Secrecy and mutual The capabilities of modern speech synthesizers agreement on the Session Key. The process of the ver- which exploit AI techniques to impersonate speaker’s ification was explained, pointing out the limitations voice (Gao et al., 2018) question the level of authen- of a symbolic analysis, particularly model simplifica- tication provided by voice recognition. Instead of tions and perfect cryptography assumption. breaking the SAS security, the adversary may simply Formal verification was followed by the discus- simulate or replay the speaker pronouncing the code sion of the protocol properties, like unilateral authen- (Shirvanian et al., 2018). The risk is amplified by the tication provided by one-side signature verification or fact that the voice sent is highly compressed and thus the role of vocal comparison in preventing MITM at- significantly differs from its real characteristics. For tacks. The analysis led to the observation that all ana- this reason, it is recommended to extend sequence lyzed techniques in themselves do not provide perfect comparison by contextual questions, like describing authentication, thus informal identity authentication the last watched movie, or to share personal informa- methods has to be introduced. tion known only by the peer but not by the adversary. Potential vulnerabilities and attacks on the sys- If honest users are capable to verify signatures of tem were also covered in this work. Several propo- each other and of achieving strong authentication, the sitions and practical solutions regarding key manage- adversary may try a downgrade-attack. It can be done ment, proper SAS comparison or identity protection simply by modifying users’ ID and imposing vocal can serve as a guide for engineers working on the im- verification. The problem may be partially solved by plementations of exchange protocols over voice chan- displaying the IDs along the SAS. However, the real nels or in similar scenarios. solution would be to force signature verification dur- ing each protocol by default. Finally, the proposed protocol cannot protect ACKNOWLEDGEMENTS against the consequences of a device being stolen or misused, giving the responsibility to the manufacturer This work is supported by grant DGA Cifre-Defense to provide strong enough password or biometric pro- program No 01D17022178 DGA/DS/MRIS. REFERENCES Exploring the Technical Challenges in Secure GSM and WLAN. IET. Arkko, J., Carrara, E., Lindholm, F., Norrman, K., Kaufman, C., Hoffman, P., Nir, Y., and Eronen, P. (2010). and Naslund, M. (2004). Mikey: Multimedia Internet Key Exchange Protocol Version 2 (IKEv2) Internet KEYing (Proposed standard RFC3830). (RFC7296). IETF. Network Working Group. Retrieved from Krasnowski, P. (2019). Tamarin code of the key https://tools.ietf.org/html/rfc3830. exchange over voice channels. Available at Baashirah, R. and Abuzneid, A. (2018). Survey on promi- https://github.com/PiotrKrasnowski/AKE over Voice. nent RFID authentication protocols for passive tags. Lee, J.-Y., Lin, W.-C., and Huang, Y.-H. (2014). A Sensors, 18(10):3584. lightweight authentication protocol for internet of Biancucci, G., Claudi, A., and Dragoni, A. F. (2013). Se- things. In 2014 International Symposium on Next- cure data and voice transmission over GSM voice Generation Electronics (ISNE), pages 1–2. IEEE. channel: Applications for secure communications. Lowe, G. (1997). A hierarchy of authentication specifica- In 2013 4th International Conference on Intelligent tions. In Proceedings 10th Computer Security Foun- Systems, Modelling and Simulation, pages 230–233. dations Workshop, pages 31–43. IEEE. IEEE. Maurer, U., Tackmann, B., and Coretti, S. (2013). Key Ex- Blanchet, B. (2012). Security protocol verification: Sym- change with Unilateral Authentication: Composable bolic and computational models. In Proceedings of security definition and modular protocol design. IACR the First international conference on Principles of Se- Cryptology ePrint Archive, 2013:555. curity and Trust, pages 3–29. Springer-Verlag. Meier, S. (2013). Advancing automated security protocol Boyd, C. and Mathuria, A. (2003). Protocols for authenti- verification. PhD thesis, ETH Zurich. cation and key establishment, volume 1. Springer. Meier, S., Schmidt, B., Cremers, C., and Basin, D. Callas, J., Johnston, A., and Zimmermann, P. (2011). (2013). The TAMARIN prover for the symbolic anal- ZRTP: Media path key agreement for unicast se- ysis of security protocols. In International Confer- cure RTP (RFC6189). IETF. Retrieved from ence on Computer Aided Verification, pages 696–701. https://tools.ietf.org/html/rfc6189. Springer. Dhananjay, A., Sharma, A., Paik, M., Chen, J., Kuppusamy, Pasini, S. and Vaudenay, S. (2006). SAS-based authenti- T. K., Li, J., and Subramanian, L. (2010). Hermes: cated key agreement. In Public Key Cryptography - data transmission over unknown voice channels. In PKC 2006, pages 395–409. Springer Berlin Heidel- Proceedings of the sixteenth annual international con- berg. ference on Mobile computing and networking, pages Schmidt, B. (2012). Formal analysis of key exchange proto- 113–124. ACM. cols and physical protocols. PhD thesis, ETH Zurich. Dodis, Y. and Fiore, D. (2017). Unilaterally-authenticated Scott-Railton, J., Marczak, B., Razzak, B. A., Crete- key exchange. In International Conference on Finan- Nishihata, M., and Deibert, R. (2017). Reckless ex- cial Cryptography and Data Security, pages 542–560. ploit: Mexican journalists, lawyers, and a child tar- Springer. geted with NSO spyware. Technical report. Dolev, D. and Yao, A. (1983). On the security of public key Shahbazi, A., Rezaie, A. H., Sayadiyan, A., and protocols. IEEE Transactions on information theory, Mosayyebpour, S. (2009). A novel speech-like sym- 29(2):198–208. bol design for data transmission through GSM voice channel. In 2009 IEEE International Symposium on Echevarria, J. J., Legarda, J., Larranaga,˜ J., and Ruiz- Signal Processing and Information Technology (IS- de Garibay, J. (2016). lwAKE: A lightweight Au- SPIT), pages 478–483. IEEE. thenticated Key Exchange for class 0 devices. In- ternational Journal of Distributed Sensor Networks, Shirvanian, M., Saxena, N., and Mukhopadhyay, D. (2018). 12(5):6236494. Short voice imitation man-in-the-middle attacks on Crypto Phones: Defeating humans and machines. Gao, Y., Singh, R., and Raj, B. (2018). Voice imperson- Journal of Computer Security, 26(3):311–333. ation using generative adversarial networks. In 2018 Zimmermann, P. (1996). PGPfone Owner’s Man- IEEE International Conference on Acoustics, Speech ual. . Retrieved from and Signal Processing (ICASSP), pages 2506–2510. http://web.mit.edu/network/pgpfone/manual/. IEEE. Goldwasser, S. and Micali, S. (1984). Probabilistic en- cryption. Journal of computer and system sciences, 28(2):270–299. Hankerson, D., Menezes, A. J., and Vanstone, S. (2005). Guide to elliptic curve cryptography. Computing Re- views, 46(1):13. Katugampala, N., Al-Naimi, K., Villette, S., and Kondoz, A. (2004). Real time data transmission over GSM voice channel for secure voice & data applications. In 2nd IEE Secure Mobile Commmunications Forum: