AUTHENTICATION TECHNIQUES FOR HETEROGENEOUS TELEPHONE NETWORKS

By BRADLEY GALLOWAY REAVES

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2017

© 2017 Bradley Galloway Reaves

For Sarah

ACKNOWLEDGMENTS

I am only writing this today because of the multitude of family, friends, teachers, and colleagues who helped get me here. This journey began in high school, when Mrs. Reid, my English teacher, suggested that I would make a good college professor. I wasn’t sure about the idea until my second programming class in college. I loved programming, so I would do the lab assignments at home, then show up in the lab to demonstrate the project to the TA. My work for the week was done, but I didn’t leave the lab. Instead, I stayed for the next few hours helping other students when they needed help with the programming assignments. It became the best part of my week, and I realized that there was no career I wanted more than to be a professor of computing. Having a goal and knowing what it takes to achieve it are two very different things. At the time I knew I needed a PhD, but nothing of what it took to get one. Luckily, I had wonderfully supportive professors and advisors who told me what it took, and one in particular helped me take the first steps toward a research career. Tommy Morris was a new professor at Mississippi State, and after teaching my digital design class offered me a (paid!) position in his research lab. I thought I’d be doing scut work, but he very quickly let me define my own project: analyzing the security of digital radios. Along the way, he taught me the basics of computer security, how to define and execute a research project, how the publication process works, and helped me win an NSF Graduate Fellowship. As I finished my master’s degree, I desperately wanted to learn how to do the best research I could, and I knew I still had so much to learn. By certain divine providence, I joined Patrick Traynor’s group at Georgia Tech.
Patrick Traynor has, over the course of the past six years, become the single greatest professional influence on my life. He has been not just an amazing academic advisor, but also an incomparable friend, confidant, occasional running coach, and an exemplar of what it means to live life fully. He taught me every aspect of my craft, including how to be an impactful teacher, mentor, manager, technologist, and researcher, and he

believed in me even when I did not believe in myself. He showed me by his example that as important and rewarding as it is to pour everything you have into your career, it is far more important to care for your family and put their needs first. If I am a fraction of the teacher, advisor, husband, and father that he is, I will lead a full and rich life. Very few doctoral courses follow a straight-line path, and my course was no exception. The greatest surprise and kismet occurred when Patrick Traynor moved to the University of Florida in 2014 to found the Florida Institute for CyberSecurity Research (FICS), and I joined him as a founding member of the group. While leaving Georgia Tech and Atlanta was bittersweet and meant leaving behind good friends and colleagues, UF provided bountiful opportunities that I would have been a fool to miss. At FICS, I had the pleasure of working with a group of friends and colleagues who were all dedicated not only to building one of the world’s finest computer security research groups, but also to having fun doing so. I know that what success I have had has been enabled by the support and camaraderie of the great group of students and faculty at FICS. There is nothing better than working with a group of friends you can rely upon in the trenches. UF provided a wonderful home to write my dissertation, but satisfying the University’s rigorous requirements as a transfer student would not have been possible without the support of the CISE graduate advisor, Adrienne Cook. I can say with no reservation that I would not have graduated without her tireless support and assistance. She not only regularly made the seemingly impossible happen, but her exceptional optimism and friendliness made every visit to her office a delight. I would also like to thank my dissertation committee for their helpful advice and guidance in completing this dissertation.
One of the lessons I learned from Patrick Traynor was that it is far better to work with others on projects than to work alone, and he proved that to me by grafting me into an extensive academic family. Kevin Butler was always ready with a brilliant insight or helpful comment to make my work much better than it would have been otherwise.

Will Enck showed me how wonderful collaboration can be and was always ready with encouragement. Patrick McDaniel gave me the advice I needed at precisely the right times. I’m also grateful to my family, who gave me everything I needed to be successful. My mother nurtured (and sometimes even bravely endured) my insatiable curiosity, while my father taught me to love learning how things work and building and making things, and gave me the courage to believe I can do anything I put my mind to. My stepmother Angela showed me what it means to love someone as your own. Pat Bradley, my grandmother, taught me to love reading and to find humor in everything. My grandmother Ann showed me what endless patience and self-sacrificing love truly mean, and my grandfather John demonstrated what discipline and a strong work ethic can achieve. Sarah Anderson Reaves has been with me since well before this long journey was even an idea. She has selflessly loved me through everything: a decade of rootlessness and the uncertainty of college and graduate school; the long days and sometimes even longer nights of coursework, research, and travel; self-doubt and delusions of grandeur; celebrations and disappointments; crises and opportunities. Through all of it she has been my greatest supporter and my best friend. This thesis and the degree it completes would never have happened were it not for the love she gave and the sacrifices she made, large and small. I am so very lucky I get to share my life with her. To everyone mentioned here, and all those who have helped me become who I am: thank you from the bottom of my heart.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Thesis Statement
  1.2 Contributions
  1.3 Organization
  1.4 Publications

2 BACKGROUND AND RELATED WORK
  2.1 The Modern SMS Ecosystem
  2.2 Telephone Network Background
    2.2.1 Landline Networks
    2.2.2 Cellular Networks
    2.2.3 VoIP
    2.2.4 Challenges to Authenticating Phone Calls
  2.3 Related Work
    2.3.1 Prior Work on SMS Use and Abuse
    2.3.2 Telephony Fraud and Detection
    2.3.3 Prior Work Authenticating Phone Calls
    2.3.4 Audio Quality Measurement

3 CHARACTERIZING THE SECURITY OF THE SMS ECOSYSTEM WITH PUBLIC GATEWAYS
  3.1 Methodology
    3.1.1 Public Gateways
    3.1.2 Crawling Public Gateways
    3.1.3 Additional Data Sources and Analyses
    3.1.4 Message Clustering
    3.1.5 Message Intentions
  3.2 Data Characterization
    3.2.1 Gateways and
    3.2.2 Infrastructure
    3.2.3 Geography
    3.2.4 Clusters
    3.2.5 SMS Usage
  3.3 Uses of SMS as a Secure Channel
    3.3.1 PII and Other Sensitive Information
    3.3.2 SMS Code Entropy
    3.3.3 Takeaways
  3.4 Abuses of SMS
    3.4.1 Gateways and PVA
    3.4.2 Detecting Gateways
    3.4.3 Abuse Campaigns in SMS
    3.4.4 Takeaways

4 DETECTING INTERCONNECT BYPASS FRAUD
  4.1 What is a Simbox?
    4.1.1 How Simbox Fraud Works
    4.1.2 Consequences of Simbox Operation
  4.2 Methodology
    4.2.1 Inputs to Ammit
    4.2.2 Detecting Unconcealed Losses
    4.2.3 Detecting Concealed Losses in GSM-FR
    4.2.4 Simbox Decision and SIM Detection
    4.2.5 Efficiency of Ammit
  4.3 Threat Model and Evasion
    4.3.1 Security Assumptions
    4.3.2 Evasion
  4.4 Experimental Setup
    4.4.1 Speech Corpus
    4.4.2 VoIP Degradation and Loss
    4.4.3 GSM Air Loss
    4.4.4 Simboxing SIM Detection Test
    4.4.5 Real Simbox Tests
    4.4.6 Technical Considerations
  4.5 Detection Results
    4.5.1 Simulated Call Analysis
    4.5.2 Detection of Real Simboxes
    4.5.3 Discussion
    4.5.4 Ammit Performance

5 PRACTICAL END-TO-END CRYPTOGRAPHIC AUTHENTICATION FOR TELEPHONY OVER VOICE CHANNELS
  5.1 Voice Channel Data Transmission
    5.1.1 Challenges to Data Transmission
    5.1.2 Modem Design
    5.1.3 Link Layer
    5.1.4 Framing and Error Detection
    5.1.5 Acknowledgment and Retransmission
    5.1.6 Naïve TLS over Voice Channels
  5.2 Security Model
  5.3 AuthLoop Protocol
    5.3.1 Design Considerations
    5.3.2 Protocol Definition
    5.3.3 Formal Verification
    5.3.4 Implementation Parameters
  5.4 Evaluation
    5.4.1 Prototype Implementation
    5.4.2 Modem Evaluation
    5.4.3 Link Layer Evaluation
    5.4.4 Handshake Evaluation
  5.5 Discussion
    5.5.1 Client Credentials
    5.5.2 Telephony Public Key Infrastructure
    5.5.3 Deployment Considerations

6 EFFICIENT IDENTITY AND CONTENT AUTHENTICATION FOR PHONE CALLS
  6.1 Security Model
  6.2 Protocol Design and Evaluation
    6.2.1 Enrollment Protocol
    6.2.2 Handshake Protocol
    6.2.3 Call Integrity Protocol
    6.2.4 Evaluation
  6.3 Speech Digest Design and Evaluation
    6.3.1 Construction
    6.3.2 Implementation and Evaluation
  6.4 System Implementation
  6.5 Results
    6.5.1 Experiment Setup
    6.5.2 Enrollment Protocol
    6.5.3 Handshake Protocol
    6.5.4 Speech Digest Performance
  6.6 Discussion

7 SUMMARY AND CONCLUSIONS

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

3-1 Message and Phone Number Count by Gateway
3-2 Gateway Message and Phone Number Count by Country
3-3 Types of Carriers Used by Gateways
3-4 Message Counts by Code Type
3-5 Code Randomness Statistics by Service
3-6 Message, URL Click, and Test Message Counts by Country
3-7 Similar Phone Number Counts by Gateway
3-8 Similar Number Counts by Carrier Type
3-9 Phishing Domains in Gateway Messages
3-10 VirusTotal Scans for URLs in Gateway Messages
5-1 TLS Handshake Sizes
5-2 Bit Error Rates
5-3 Link Layer Transmission of 2000 Bits
5-4 Handshake Completion Times

LIST OF FIGURES

2-1 SMS Ecosystem Diagram
2-2 Telephone Network Architecture
2-3 Effects of AMR Codec on Sample Audio
3-1 Cluster Sizes
3-2 Heatmaps of Codes
3-3 Gateway Number Lifetime Statistics
3-4 Maps Indicating Locations of Gateway Message Senders
3-5 Phishing SMS Message
3-6 SMS Phishing Page Screenshot
4-1 Typical and Simboxed Calls
4-2 Short Term Energy Loss Detection
4-3 GSM-FR PLC in Time and Cepstral Domain
4-4 Simbox Testbed Block Diagram
4-5 Laboratory Simbox Testbed
4-6 Call Detection vs. Loss Rate
4-7 SIM Detection Performance
4-8 Ammit Analysis Time
5-1 Example Modem Transmission
5-2 Link Layer State Machine
5-3 AuthLoop Authentication Protocol
5-4 AuthLoop Message Sizes
5-5 Telephony Public Key Infrastructure
6-1 Caller ID and Call Content Attacks
6-2 Enrollment Protocol
6-3 Handshake Protocol
6-4 Call Integrity Protocol
6-5 RSH Digest Process
6-6 RSH BER after Audio Degradation
6-7 RSH BER on Adversarial Audio
6-8 RSH Receiver Operating Characteristic
6-9 AuthentiCall Enrollment Time
6-10 AuthentiCall Handshake Time
6-11 Digest Performance on Real Calls
6-12 Prototype User Interface

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

AUTHENTICATION TECHNIQUES FOR HETEROGENEOUS TELEPHONE NETWORKS

By Bradley Galloway Reaves

August 2017

Chair: Patrick G. Traynor
Major: Computer Engineering

The global telephone network is relied upon daily by billions worldwide for reliable communications. Beyond their use for communications, telephones are increasingly used to identify users on the Internet, because almost every person globally has at least one phone number. Unfortunately, telephones are also plagued with fraud and abuse, making this use and many others insecure. This abuse is ultimately caused by the fact that the phone network offers no strong guarantees of identity, and addressing this problem is complicated by the fact that the network is composed of many different and largely incompatible technologies. In this study, we examine the poor state of authentication in telephone networks and provide new mechanisms to authenticate callers to each other. We begin by examining how the telephone network — specifically, text messaging — is being used to bolster claims of identity and authentication in Internet systems, finding that public gateways negate many of the supposed advantages of these techniques. We then turn our attention to interconnect bypass fraud, showing that while telephone networks cannot effectively determine the true origin of a phone call, we can provide mechanisms based on in-call audio measurements to detect so-called “simboxing fraud.” Finally, we develop two new systems, AuthLoop and AuthentiCall, to address call authentication. Both systems provide strong cryptographic authentication of callers. AuthLoop transmits this information through call audio, while AuthentiCall uses an auxiliary data channel to authenticate

both call endpoints and call content. In total, this thesis provides mechanisms to combat robocalling, phone phishing, and interconnect bypass fraud, preventing billions of dollars in fraud and restoring trust and confidence in the phone network.

CHAPTER 1
INTRODUCTION

Since its invention in the late nineteenth century, the telephone has revolutionized personal and professional communications. Even decades after the emergence of the Internet as a commodity communication network, the telephone remains an important communications system. The global telephone network supports 4.7 billion mobile users [1], 1 billion fixed-line subscribers [2], and at least 100 million VoIP lines [3]. One of the reasons for the continued relevance of the telephone is that it has changed drastically over the past century. Telephones have evolved from the original fixed landline analog telephones to add digital switching and dialing, mobile service, and finally Internet telephony (VoIP). When cellular networks enabled the Short Message Service (i.e., text messages), they made mobile instant messaging an essential part of everyday life. All of these new technologies were deployed concurrently with legacy systems, and in fact the phone network to this day remains interoperable with legacy equipment, like rotary-dial phones, that is decades old. These developments have changed how users interact with telephone systems, and new innovations have made many new services possible. They have also created new security issues. SMS spam is orders of magnitude easier to distribute than robodialing end users. Digital PBXs 1 have reduced the costs of small and large business telephony, but they are vulnerable to the same classes of vulnerabilities that affect traditional servers. VoIP providers have made elastically scaling telephony capacity simple and inexpensive, but they also facilitate frauds that rely on the ability to spoof caller ID and churn through numbers rapidly. Many of these issues are complicated by the network’s requirement to remain compatible with heterogeneous technologies.

1 Private Branch Exchanges are the telephony equivalent of a LAN switch or router.

In spite of these issues, heterogeneous telephony networks (including voice and text messaging) are still relied upon daily for the most sensitive transactions, including banking, finance, and sensitive information exchange. Telephones are also increasingly used to serve as authenticators for account creation and login for many online services. This situation is unfortunate because the telephone system offers no network support for authentication. In the general case, users, devices, and carriers have no non-asserted way of determining the endpoints of a call — a task difficult even for trained experts. This lack of authentication abets billions of dollars a year in fraud according to the Communications Fraud Control Association [4]. Authentication — the determination of identity — is one of the most fundamental properties any secure network must provide. Without strong authentication, network entities cannot reliably know whom they are speaking to. Without authentication, networks also face significant difficulty in detecting, preventing, or even attributing the source of fraudulent or abusive behavior. Telephone network operators realized this problem early in the history of the phone network [5], and as a result they developed strong techniques to authenticate users to the networks that serve them. In the case of cellular and VoIP networks, this involved cryptographic authentication of endpoints. However, this strong authentication only authenticates the phone to the network — not phones to other phones. As a result, pretending to be another party in the phone network is trivial in many cases. This lack of authentication prevents attributing the true source of phone scams, including voice spam and phishing, and is at the root of many of the problems in phone networks. In this thesis, we provide new techniques for authenticating users and calls in telephone networks.
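The asserted nature of caller identity can be made concrete with a small example. In SIP-based VoIP, the From header that ultimately feeds caller ID displays is supplied by the caller and, absent additional verification, is simply trusted. The following Python sketch (not from this dissertation; all addresses, numbers, and names are hypothetical) builds a minimal SIP INVITE whose From header asserts a number the caller does not own:

```python
# Hypothetical sketch: the SIP From header is asserted by the caller,
# not verified by the network. Every address and number here is invented.

def build_invite(spoofed_number: str, target: str) -> str:
    """Build a minimal SIP INVITE asserting an arbitrary caller ID."""
    return "\r\n".join([
        f"INVITE sip:{target} SIP/2.0",
        "Via: SIP/2.0/UDP attacker.example:5060",
        # Nothing checks that the caller controls this number or name.
        f'From: "Your Bank" <sip:{spoofed_number}@attacker.example>;tag=1',
        f"To: <sip:{target}>",
        "Call-ID: demo-call-1@attacker.example",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
        "", "",
    ])

invite = build_invite("+18005551234", "victim@carrier.example")
print(invite.splitlines()[2])  # the attacker-chosen From header
```

Countering exactly this kind of trivially forged assertion is the goal of the end-to-end authentication systems developed later in this thesis.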
We address three critical security problems: the abuse of SMS for authentication, interconnect bypass fraud, and caller ID spoofing. These problems represent billions of dollars in fraud caused by poor authentication in telephone networks. Our solutions to these problems have the potential to transform telephony from a

“weakest-link” network into a source of strong authentication not just for telephony but for Internet authentication as well.

1.1 Thesis Statement

The purpose of this work is to characterize authentication in telephone networks and provide practical, deployable systems that improve authentication for carriers, organizations, and end users. Accordingly, the central thesis of this dissertation is:

Weak authentication in heterogeneous telephone networks enables fraud and abuse of the network and other services. New authentication mechanisms can use call audio to detect or prevent fraud and provide stronger guarantees than the network currently provides.

1.2 Contributions

This thesis makes the following contributions:

1. Measurement of reliance on telephone networks for authentication: We show empirically that many organizations and users rely on the telephone network for authenticating new or existing accounts.

2. Detection of interconnect bypass fraud: We develop the Ammit system to detect interconnect bypass calls in real time with high accuracy.

3. End-to-end authentication for heterogeneous telephony networks: We develop the AuthLoop system to provide authentication of end users over the voice channel.

4. End-to-end authentication over data channels: We relax the assumptions made in AuthLoop and provide a novel call authentication system for endpoints that have a simultaneous data connection.

1.3 Organization

The remainder of this dissertation is organized as follows: Chapter 2 provides background on phone networks and a discussion of the prior work on telephone security. Chapter 3 characterizes how the SMS network is used for

authentication purposes and how that authentication is abused. Chapter 4 describes a mechanism to detect interconnect bypass fraud. Chapter 5 describes the AuthLoop system to authenticate phone calls using the voice channel of the call. Chapter 6 describes the AuthentiCall system to authenticate phone calls using an auxiliary data channel. Finally, Chapter 7 offers concluding remarks and directions for future research.

1.4 Publications

This dissertation is based on the following publications:

• Bradley Reaves, Nolen Scaife, Dave Tian, Logan Blue, Patrick Traynor, and Kevin Butler. Sending Out an SMS: Characterizing the Security of the SMS Ecosystem with Public Gateways. In Proceedings of the 37th IEEE Symposium on Security and Privacy, San Jose, CA, May 2016. (Acceptance Rate: 13.0%).

• Bradley Reaves, Ethan Shernan, Adam Bates, Henry Carter, and Patrick Traynor. Boxed Out: Blocking Cellular Interconnect Bypass Fraud at the Network Edge. In Proceedings of the 24th USENIX Security Symposium, 2015. (Acceptance Rate: 15.7%).

• Bradley Reaves, Logan Blue, and Patrick Traynor. AuthLoop: Practical End-to-End Cryptographic Authentication for Telephony over Voice Channels. In Proceedings of the 25th USENIX Security Symposium, Austin, TX, August 2016. (Acceptance Rate: 15.5%).

• Bradley Reaves, Logan Blue, Hadi Abdullah, Luis Vargas, Patrick Traynor, and Tom Shrimpton. AuthentiCall: Efficient Identity and Content Authentication for Phone Calls. In Proceedings of the 26th USENIX Security Symposium, Vancouver, BC, August 2017. (Acceptance Rate: 16.3%).

CHAPTER 2
BACKGROUND AND RELATED WORK

This thesis concerns end-to-end authentication of users and phone calls in the complex and diverse landscape of modern telephony. The first section of this chapter provides background information on text messaging, and the second provides background on telephone calls that will be needed to understand later chapters. The third section describes related work in the area of telephony security.

2.1 The Modern SMS Ecosystem

In this section, we describe at a high level how text messages are sent and received, with a special emphasis on recent developments that have greatly expanded the SMS ecosystem. This information provides background for Chapter 3, which describes how text messages are used for authenticating end users. Figure 2-1 shows the components of the modern SMS ecosystem in detail. Short Messaging Service Centers (SMSCs) route messages through carrier networks and are the heart of the SMS system [6]. These entities receive inbound text messages and handle delivery of these messages to mobile users in the network using a store-and-forward regime similar to email. When a mobile device sends or receives a text message, the message is encrypted between the phone and the base station serving the phone; however, once inside the core network the message is typically not encrypted. Text messages 1 are not just sent between individuals, but also by parties external to the network known as External Short Message Entities (ESMEs). ESMEs form an entire industry dedicated to facilitating the sending and receiving of messages for large-scale organizations for purposes as diverse as emergency alerts, donations to charities, or receiving one-time passwords [7]. These ESMEs act as gatekeepers and interfaces to

1 We use SMS and “text message” interchangeably.


Figure 2-1. While viewed as existing solely within cellular networks, the modern SMS ecosystem includes a wide variety of non-traditional carriers, ESME gateways and resellers, and OTT services. This evolution challenges old assumptions (e.g., that phone numbers represent mobile devices tied to a single identity) and creates new opportunities for interception. Accordingly, evaluating the state of this ecosystem is critical to understanding the security it provides.

SMS. Some have direct connections to SMSCs in carrier networks via SMPP (Short Message Peer-to-Peer) [8], while others resell such access purchased from other ESMEs. For example, the VoIP carrier Bandwidth.com provides SMS access to many third-party services. Recently, startups like Twilio [9], Nexmo [10], and Plivo [11] have begun to serve as ESMEs and provide easy-to-deploy, low-cost voice and SMS services. They serve a number of high-profile clients, including Coca-Cola and eBay. Just as the methods for SMS distribution have evolved over the past two decades, how end users receive SMS has evolved as well. Originally, SMS were delivered only to mobile phones or to ESMEs. With the advent of smartphones, this ecosystem is changing rapidly. Over-the-top networks like Burner [12], Pinger [13], and Google Voice [14] provide SMS and voice services over data networks (including cellular data as well as the Internet). Many of these services contract out to third-party ESMEs for service and do not actually act as ESMEs themselves. Additionally, messages that are delivered to a mobile device may not remain restricted to that device. Systems like Apple Continuity [15], Google Voice, Pushbullet [16], and MightyText [17] use local wireless networks or cloud services

to store and sync SMS from the receiving device to the user’s other devices. Millions of subscribers use these services to transfer their messages from their localized mobile device to be stored in the cloud. From a security perspective, the consequence of this modern SMS ecosystem is that a single SMS may be processed by many different entities — not just carriers — who in toto present a broad attack surface. Attacks against these systems may be technical in nature and take a form similar to publicized data breaches [18]–[21]. While to date there are no disclosed attacks against these SMS services, we note that there is precedent for infiltration of carrier networks [22]. Social engineering attacks are also possible. Mobile Transaction Authentication Numbers (mTANs)2 have been stolen using SIM swap attacks [23], where an attacker impersonates the victim to a carrier to receive a SIM card for the victim’s account, allowing the attacker to intercept security-sensitive messages. Attackers have also compromised accounts protected by one-time passwords delivered over SMS by impersonating the victim to set up number forwarding to an attacker-controlled device [24]. Accordingly, it is worth determining what kinds of data are being sent via SMS so that the consequences of future compromise are well understood. Chapter 3 measures how different entities implement security mechanisms via text messages through the use of public SMS gateways. As such, we are able to observe a wide array of services and their behavior through time. Additionally, because these gateways provide phone numbers to anonymous users, we are also able to measure the extent to which such gateways are being used for malicious purposes. This combined measurement helps to provide the research community with a more accurate and informed picture of the security of this space.
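The store-and-forward regime that SMSCs use, described at the start of this section, can be illustrated with a toy model: messages are queued per recipient and forwarded once the destination handset registers with the network. The following Python sketch is a deliberate simplification for illustration only; the class and its fields are invented for this example, not drawn from any real SMSC implementation:

```python
from collections import defaultdict, deque

class ToySMSC:
    """Toy store-and-forward SMSC: queue messages per recipient and
    deliver them once the destination handset is reachable."""

    def __init__(self):
        self.pending = defaultdict(deque)   # recipient -> queued messages
        self.online = set()                 # handsets currently attached
        self.delivered = []                 # (recipient, message) log

    def submit(self, sender, recipient, body):
        # Store first; forward immediately only if the handset is online.
        self.pending[recipient].append((sender, body))
        if recipient in self.online:
            self._flush(recipient)

    def attach(self, recipient):
        # Handset registers with the network; drain its queue.
        self.online.add(recipient)
        self._flush(recipient)

    def _flush(self, recipient):
        while self.pending[recipient]:
            sender, body = self.pending[recipient].popleft()
            self.delivered.append((recipient, f"{sender}: {body}"))

smsc = ToySMSC()
smsc.submit("+15550001111", "+15552223333", "Your code is 482910")
assert smsc.delivered == []        # recipient offline: message is stored
smsc.attach("+15552223333")        # handset attaches: message is forwarded
print(smsc.delivered)
```

The security-relevant point is visible even in this toy: the message body sits unprotected inside the network element between submission and delivery, which is why the breadth of entities handling a message matters.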


Figure 2-2. A high-level representation of modern telephony systems. In addition to voice being transcoded at each gateway, all identity mechanisms become asserted rather than attested as calls cross network borders. A strong end-to-end authentication system must be designed with all such limitations in mind.

2.2 Telephone Network Background

Now that we have described the modern SMS ecosystem, we can begin to discuss the different technologies that are used to facilitate telephone calls. Subscribers can receive service from mobile, landline, and VoIP networks, and calls to those subscribers may similarly originate from networks implementing any of the above technologies. In this section, we provide background on each of these technologies, with an emphasis on how each handles authentication, and how each technology interoperates with the global telephone network. Figure 2-2 provides a high-level overview of this ecosystem.

2 mTANs are used to authenticate mobile banking transactions via SMS in many countries, including Germany, South Africa, and Russia.

2.2.1 Landline Networks

Early landline networks were entirely analog throughout. Over time, what was an entirely analog network switched exclusively by human operators became an automated network and, later, a network entirely digital in the core. While landline networks now have a digital core, endpoints retain the same twisted-pair analog connection to the network that was used a century ago. Attacks against landline networks were the very beginnings of both offensive and defensive network security [5]. In landline networks, the principal security question was whether users could be accurately billed for the service they used. While many “attacks” were motivated entirely by curiosity, others had the sole motive of placing long-distance calls without paying toll charges. While operator assistance was still common, “phreakers” would social-engineer operators into connecting circuits or trunks to place calls that would not be billed correctly. After long-distance calls no longer required operator assistance, phreakers pored through Bell technical manuals to determine how in-band long-distance signaling worked and how it could be exploited to place toll-free calls. In fact, early phreakers even computerized and automated attacks that abused MIT’s internal phone network peering relationships to make unauthorized long-distance calls [25]. The Bell system and federal agents actively pursued these phreakers, developing automated anomaly detection systems to isolate and attribute these illicit calls. Exploitation of in-band signaling was a major motivation for the move to an all-digital, out-of-band signaling system known as SS7. The importance of SS7 to modern telephony cannot be overstated. Even though it was deployed in the 1980s, it still serves as a lingua franca for all of telephony; it is the primary mechanism carriers use to interconnect and share call information.
It has also served as a foundational core protocol upon which new network technologies and functionality have been built. Features like caller ID, call forwarding, and number

portability are facilitated by SS7. The SS7 protocol was also significantly expanded to support mobile telephony, especially mobility management. While improved security over in-band signaling was a major motivation for deploying SS7, its security improvement came primarily from isolation, because endpoints no longer had the ability to send signaling information to the network. In particular, SS7 did not provide mechanisms for confidentiality, integrity, authentication, or authorization in the protocol. In its initial deployment, SS7 was used among a relatively small number of trusted entities. However, with the advent of VoIP — and its associated proliferation of small carriers with access to SS7 — attacks became more realistic and practical. In recent years, attacks have been demonstrated against SS7 that allow tracking of mobile users [26], denial of service against endpoints, and redirection and interception of calls and text messages [27]. These attacks motivate the need for systems that can protect against intercepted or redirected calls, like those presented in this thesis. We note that SS7 attacks are not the only threats to plain-old telephone service (POTS) devices. Soon after the deployment of Caller ID to end consumers, phreakers developed so-called “orange boxes” to fabricate Caller ID information during a call. These devices send signaling tones to Caller ID boxes immediately after the call is established to change the Caller ID display to an attacker-controlled value. Orange boxes are simply another mechanism demonstrating that the phone network provides no trustworthy guarantee of the source of a phone call.

2.2.2 Cellular Networks

First generation (1G) cellular systems were the first to consider such mechanisms given the multi-user nature of wireless spectrum. Unfortunately, 1G authentication relied solely on the plaintext assertion of each user’s identity and was therefore subject to significant fraud [28]. Second generation (2G) networks (e.g., GSM) designed cryptographic mechanisms for authenticating users to the network. These protocols failed to authenticate the network to the user and led to a range of attacks against subscribers [29]–[32]. Third and fourth generation (3G and 4G) systems correctly implement mutual authentication between users and providers [33]–[35]. The Global System for Mobile Communications (GSM) is a suite of standards used to implement cellular communications. It is used by the majority of carriers in the US and throughout Europe, Africa, and Asia. GSM is a “second generation” (2G) cellular network and has evolved into the UMTS (3G) and LTE (4G) standards. GSM manages user access to the network by issuing users a small smartcard called a Subscriber Identity Module (SIM card) that contains identity and cryptographic materials. A carrier SIM card can be placed in any device authorized to operate on a carrier’s network. Because GSM networks cryptographically authenticate almost every network transaction, cellular network activity can always be attributed to a specific SIM card. In the past, the ability to clone a SIM card negated this guarantee; however, modern SIM cards now have hardware protections that prevent practical key recovery and card cloning. In addition to describing network functionality, the GSM standards also specify a method for encoding audio known as the GSM Full Rate (GSM-FR) codec [36]. Although designed for mobile networks, it is also used as a general-purpose audio codec and is frequently implemented in VoIP software. To avoid ambiguity, we use “GSM” or “air transmission” to mean GSM cellular networks and “GSM-FR” to indicate the audio codec. While this section has focused on GSM, which was in many ways a progenitor and model for all later systems, we note that security problems in newer standards have been discovered. UMTS, the 3G successor to GSM, has seen quite a bit of work on its security. Meyer and Wetzel found that a type of downgrade attack from UMTS to GSM security enables man-in-the-middle attacks against UMTS devices [37]. Kambourakis et al. find several signaling attacks that result in denial of service [38]. Arapinis et al.
identify attacks that can lead to tracking of mobile UMTS users [39]. LTE, the successor to UMTS, has also seen security research. Independently, Kim et al. [40] and Li et al. [41] identified attacks against VoLTE that allow for data theft, and in later work Tu et al. [42] identified similar attacks on SMS transmission in LTE networks. Shaik et al. disclose tracking attacks against LTE networks [43].

2.2.3 VoIP

Voice over Internet Protocol (VoIP) is a technology that implements telephony over IP networks such as the Internet. Two clients can complete a VoIP call using exclusively the Internet, or calls may be routed from a VoIP client to a PSTN line (or vice versa) through a VoIP gateway. Providers including Vonage and Google Voice offer both IP-only and IP-PSTN calls. The majority of VoIP calls are set up using a text-based protocol called the Session Initiation Protocol (SIP). One of the jobs of SIP is to establish which audio codec will be used for the call. Once a call has been established, audio flows between callers using the Real-time Transport Protocol (RTP), which is typically carried over UDP. The widespread deployment of VoIP has probably been the single most security-relevant development in the history of the phone network. First, most modern attacks — including phishing, robocalls, PBX hacking, and interconnect bypass fraud — are much simpler to execute, and much harder to detect, in VoIP networks than elsewhere. Most VoIP providers, for example, make it trivial to spoof caller ID information. VoIP call quality is affected by packet loss and jitter. Absent packets, whether they are the result of actual loss or jitter, cause gaps in audio. Such gaps are filled in with silence by default. Some VoIP clients attempt to improve on this standard behavior and implement Packet Loss Concealment (PLC) algorithms to fill in missing packets with repeated or generated audio. Specifically, PLC algorithms take advantage of the fact that speech waveforms are more or less stationary for short time periods, so clients can generate a plausible section of audio from previous packets. Many codecs have mandatory PLCs, although some are optional (as in the case of the G.711 audio codec) or are not implemented (as is frequently the case when GSM-FR is used outside of cellular networks).
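To make the repetition strategy concrete, the sketch below shows a deliberately simple PLC: each lost packet is replaced by the most recent good frame, attenuated on consecutive losses so that long gaps fade toward silence. This is illustrative only; the function name, frame layout, and fade factor are our own choices and do not correspond to any standardized PLC algorithm.

```python
from typing import List, Optional


def conceal_loss(frames: List[Optional[List[int]]],
                 frame_len: int = 160) -> List[List[int]]:
    """Fill gaps left by lost packets with repeated audio.

    Each frame is a list of PCM samples; None marks a lost packet.
    This trivial PLC repeats the last good frame, halving its gain
    on each consecutive loss so long gaps fade toward silence.
    """
    silence = [0] * frame_len
    last_good = silence  # nothing received yet: conceal with silence
    gain = 1.0
    out = []
    for frame in frames:
        if frame is not None:
            last_good, gain = frame, 1.0
            out.append(frame)
        else:
            gain *= 0.5  # fade on consecutive losses
            out.append([int(s * gain) for s in last_good])
    return out
```

Real implementations (e.g., the PLC mandated by some codecs) extrapolate pitch periods rather than whole frames, but the stationarity assumption they exploit is the same.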

Figure 2-3. A comparison of a 1-second chirp sweep from 300–3300 Hz (a) before and (b) after being encoded with the AMR-NB codec. Note that while the entirety of the signal is within the range of allowable frequencies for call audio, the received signal differs significantly from its original form. It is therefore critical that a high-fidelity mechanism for delivering data over a mobile audio channel be designed.
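The test signal in Figure 2-3(a) is a linear chirp. For reference, such a sweep can be generated in a few lines; this sketch assumes an 8 kHz sampling rate (the standard for narrowband telephony), and the function name is our own.

```python
import math


def linear_chirp(f0: float = 300.0, f1: float = 3300.0,
                 duration: float = 1.0, rate: int = 8000) -> list:
    """Generate a linear chirp whose instantaneous frequency sweeps f0 -> f1.

    Phase is the integral of instantaneous frequency:
        phi(t) = 2*pi * (f0*t + 0.5*k*t^2), with sweep rate k = (f1 - f0)/T.
    Returns float samples in [-1, 1].
    """
    n = int(duration * rate)
    k = (f1 - f0) / duration  # sweep rate in Hz per second
    return [math.sin(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / rate for i in range(n))]
```

Passing such a known reference through a codec and comparing the output, as in Figure 2-3, is exactly the intrusive style of measurement discussed in Section 2.3.4.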

Some VoIP software (including Asterisk) implements its own PLC algorithms, but does not activate them unless configured by an administrator.

2.2.4 Challenges to Authenticating Phone Calls

The previous subsections have provided background on the different technologies that make up the global PSTN. In this subsection, we describe why that mix of technology makes authenticating phone calls difficult. While performing similar high-level functionality (i.e., enabling voice calls), the global telephone network is built on a range of often incompatible technologies. From circuit-switched intelligent network cores to packet switching over the public Internet, all of these disparate entities rely on gateways to translate protocol information between incompatible networks. As a result, very little information beyond the voice signal actually propagates across the borders of these systems. In fact, because many of these networks rely on different codecs for encoding voice, one of the major duties of gateways between these systems is the transcoding of audio. Accordingly, voice encoded at one end of a phone call is unlikely to have the same (or even similar) bitwise representation when it arrives at the client side of the call. As evidence, the top plot of Figure 2-3 shows a sweep of an audio signal from 300 to 3300 Hz (all within the acceptable band) across 1 second.

The bottom plot shows the same signal after it has been encoded using the Adaptive Multi-Rate (AMR) audio codec used in cellular networks, resulting in a dramatically different message. This massive difference is a result of the voice-optimized audio codecs used in different telephony networks. Accordingly, successfully performing end-to-end authentication will require careful design for this non-traditional data channel. Beyond voice, other data that may be generated by a user or their home network is not guaranteed to be delivered or authenticatable end-to-end. That is, because the underlying technologies are heterogeneous, there is no assurance that information generated in one system is passed (let alone authenticated) to another. This has two critical implications. The first is that any proofs of identity a user may generate to their provider are not sent to the other end of the call. For instance, a mobile phone on a 4G LTE connection performs strong cryptographic operations to prove its identity to its provider. However, there exists no means to share such proofs with a callee within this system, let alone one in another provider’s network. Second, claims of identity (e.g., Caller ID) are sent between providers with no means of verifying said claims. As evidenced by greater than $7 billion in fraud in 2015 [44], it is extremely simple for an adversary to trick a receiver into believing any claim of identity. There is no simple solution, as calls regularly transit multiple intermediate networks between the source and destination. One of the few pieces of digital information that can optionally be passed between networks is the Caller ID. Unfortunately, the security value of this metadata is minimal — such information is asserted by the source device or network, but never validated by the terminating or intermediary networks. As such, an adversary is able to claim any phone number (and therefore identity) as its own with ease.
This process requires little technical sophistication, can be achieved with the assistance of a wide range of software and services, and is the enabler of greater than US$2 billion in fraud annually [4].
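The asserted nature of telephone identity is easy to see in VoIP signaling. The sketch below assembles a minimal SIP INVITE: the From header carrying the claimed caller identity is written entirely by the originator, and nothing in basic SIP verifies it. All addresses, tags, and numbers here are made-up examples, and real INVITEs carry additional headers (Via, Contact, SDP) omitted for brevity.

```python
def build_invite(claimed_number: str, dest_number: str, host: str) -> str:
    """Assemble a skeletal SIP INVITE.

    The From header is an unverified assertion by the sender; downstream
    networks typically copy it into Caller ID without validation.
    """
    return "\r\n".join([
        f"INVITE sip:{dest_number}@{host} SIP/2.0",
        f"From: <sip:{claimed_number}@{host}>;tag=abc123",  # asserted, never checked
        f"To: <sip:{dest_number}@{host}>",
        "Call-ID: example-call-id@sketch",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
        "", "",  # blank line terminates the header section
    ])
```

An adversary who controls the client or the originating provider simply sets `claimed_number` to any value, which is precisely the caller ID spoofing described above.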

2.3 Related Work

In this section, we highlight preceding work in the areas of telephony security addressed by this dissertation. We begin by describing prior work on SMS use and abuse, continue with work on detecting telephony fraud, and follow with work on authenticating phone calls. We conclude by briefly discussing single-ended audio techniques that will be used in Chapter 4.

2.3.1 Prior Work on SMS Use and Abuse

Prior measurement work has studied the underground economies [45] that drive spam [46]–[48], malware [49]–[51] and mobile malware [52]–[54], and other malicious behavior. While others have investigated SMS content and metadata in the context of SMS spam [55]–[58], this work is the first to expansively measure how SMS is used for security purposes by legitimate services. We note that much of the research in this area has been forced to rely on small datasets (some less than 2,000 messages [58]). Mobile two-factor authentication is increasing in popularity, with some eagerly heralding its arrival [59] and others cautioning that it may only provide a limited increase in security [60]. Much of the data we collected contained mobile two-factor authentication tokens sent over SMS. While SMS tokens are popular in many contexts, including mobile banking and finance [61], other approaches have been implemented in a variety of forms including keychain fobs [62], [63], one-time pads [64], [65], biometric scanners [66], [67], and mobile phones [68]–[70]. Analysis of individual systems has led to the discovery of a number of weaknesses, including usability concerns [71] and susceptibility to desktop [72] or mobile malware [73]–[78]. SMS-based tokens are especially vulnerable to link-layer attacks against the cellular network. These networks rely on vulnerable channel encryption [31], [79], [80], allow end devices to connect to illicit base stations [81]–[83], and are vulnerable to low-rate denial of service attacks [84], [85]. However, the majority of the infrastructure behind many two-factor authentication systems — the portions of the

system outside the cellular network — has not been previously explored from a security perspective. Dmitrienko et al. were the first to examine SMS messages to study the security of two-factor authentication schemes [77]. We greatly exceed the scope of their work in five important ways. First, our work presents a cohesive examination of the entire SMS infrastructure — from online services to end devices. Second, we focus on how online services use SMS well beyond two-factor authentication. Third, our data includes two orders of magnitude more services, and we identify and classify the intent of each message. Fourth, we provide a more detailed classification of two-factor authentication systems. Finally, our more rigorous entropy analysis of two-factor authentication PINs allows us to make strong claims for more than 30 services (instead of just 3), helping us to find egregious entropy problems in the popular WeChat and Talk2 services. Our emphasis on phone-verified accounts provides a separate contribution. Thomas et al. study the effects of phone-verified accounts at Google [86]. While they use datasets of purchased or disabled PVAs, we provide insight into PVA fraud from enabling services. While we confirm some of their observations, our data indicates their recommendations may prove ineffective at defeating PVA evasion.

2.3.2 Telephony Fraud and Detection

Telephony fraud detection is a well-studied problem, and efforts to fight telecommunications fraud have primarily depended on call records. Machine learning and data mining have been used extensively to detect fraudulent activity using call records [87]–[90]. Given the importance of the simboxing problem in affected countries, there are a number of commercial simbox detection products, as well as two published research papers [91], [92]. Most simbox detection systems use one of two techniques: test call generation and call record analysis. A few products use hybrid techniques [93], [94]. Test call generation approaches [95]–[99] use probes widely deployed in many networks to verify that the CLI (i.e., Caller ID) records on calls are correct — if a simbox is used, the

CLI record would indicate the MSISDN (i.e., the phone number) of the SIM card routing the call and not the originating probe. Test call methods only work for certain kinds of simboxing (when a simboxer sells services to another telecom, not through the common case of selling calling cards to consumers). By contrast, call record analysis detects all types of simboxing. Those approaches rely on the fact that SIMs used in simboxes have usage patterns distinct from legitimate customers [91], [100]–[103]. These techniques are prone to false positives and active evasion by simboxers. In recent work, Murynets et al. published a call record analysis approach that used machine learning to identify IMEIs (device identifiers) used by simboxes [92]. The authors’ published accuracy rates measure identifying individual calls (not simbox devices) only after simboxes are identified, and thus are not directly comparable to the accuracy figures for Ammit. Additionally, that work identifies IMEIs (which are an asserted — and thus spoofable — identifier) of devices only after a simbox makes dozens or hundreds of calls with a single SIM card; even if the work described in that paper is deployed, simboxing will continue to be profitable. Our work is an improvement over the state of the art because we can reliably detect simboxed calls using features inherent to simboxing at the time of the call, thus making simboxing unprofitable. The Pindr0p system combats telephony fraud by identifying callers using audio “fingerprints.” These fingerprints consist of noise characteristics and indicators of the different codecs used by the different PSTN and VoIP networks that route a call. For Pindr0p, capturing characteristics of the end-to-end call path is essential to identify repeat callers. For Ammit, it is sufficient to hear audio that has been degraded by any prior network.

2.3.3 Prior Work Authenticating Phone Calls

While a number of seemingly-cellular mechanisms have emerged to provide authentication between end users (e.g., Zphone, RedPhone) [104]–[114], these systems ultimately rely on a data/Internet connection to work, and are themselves vulnerable to a number of attacks

[115], [116]. Accordingly, there remains no end-to-end solution for authentication across voice networks (i.e., authentication with any non-VoIP phone is not possible). Mechanisms to deal with such attacks have had limited success. Websites have emerged with reputation data for unknown callers [117]; however, these sites offer no protection against Caller ID spoofing, and users generally access such information after such a call has occurred. Others have designed heuristic approaches around blacklists [118], speaker recognition [119]–[124], channel characterization [125], [126], post hoc call data records [127]–[130], and timing [131]. Unfortunately, the fuzzy nature of these mechanisms may cause them to fail under a range of common conditions including congestion and evasion. Authentication between entities on the Internet generally relies on the use of strong cryptographic mechanisms. The SSL/TLS suite of protocols is by far the most widely used, and helps provide attestable identity for applications as diverse as web browsing, email, instant messaging, and more. SSL/TLS is not without its own issues, including a range of vulnerabilities across different versions and implementations of the protocols [132]–[135], weaknesses in the model and deployment of Certificate Authorities [136]–[142], and usability [143]–[149]. Regardless of these challenges, these mechanisms provide more robust means to reason about identity than the approaches used in telephony. We will explore this idea further in Chapters 5 and 6.

2.3.4 Audio Quality Measurement

Chapter 4 is concerned with detecting simbox fraud, and the techniques used in that chapter belong to the long tradition of non-intrusive call quality measurement. Non-intrusive measurements are taken passively and without reference audio; this is in opposition to intrusive measurements [150], [151], which measure the degradation of a known reference signal. Traditional call quality metrics measure listener experience, and imperceptible degradations do not significantly affect these scores. These scores have been shown to vary widely based on random conditions, language choice [152], or

VoIP client [153]. The most widely used non-intrusive measurement standard is ITU specification P.563 [154], but other metrics have been developed for holistic quality measurements [155]–[157] and for individual artifacts like robotization [158] and temporal clipping [159]. Because call quality metrics like P.563 are only concerned with perceptible degradation and vary widely in results, they are unsuitable for detection of simbox fraud.

CHAPTER 3
CHARACTERIZING THE SECURITY OF THE SMS ECOSYSTEM WITH PUBLIC GATEWAYS

Text messaging has become an integral part of modern communications. First deployed in the late 1990s, the Short Messaging Service (SMS) now delivers upwards of 4.2 trillion messages around the world each year [160]. Because of its ubiquity and its perception as providing a secondary channel bound tightly to a user’s identity, a range of organizations have implemented security infrastructure that takes advantage of SMS in the form of one-time codes for two-factor authentication [68]–[70] and account validation [48]. The text messaging ecosystem has evolved dramatically since its inception, and now includes a much wider range of participants and channels by which messages are delivered to phones. Whereas phone numbers once indicated a specific mobile device as an endpoint and were costly to acquire, text messages may now pass through a range of different domains that never touch a cellular network before being delivered to a non-cellular endpoint. Moreover, these systems allow users to send and receive messages for free or low cost using numbers not necessarily tied to a mobile device, specific geographic area, or even a single customer. As such, they violate many of the assumptions upon which the previously mentioned security services were founded. In this chapter, we perform the first longitudinal security study of the modern text messaging ecosystem. Because of the public nature of many SMS gateways (i.e., messages are simply posted to their websites), we are able to gain significant insight into how a broad range of companies are implementing SMS-based security services. Moreover, these systems allow us to see the ways in which defenses such as phone-verified accounts

Text of this chapter is reprinted with permission from Bradley Reaves, Nolen Scaife, Dave Tian, Logan Blue, Patrick Traynor, and Kevin Butler. Sending Out an SMS: Characterizing the Security of the SMS Ecosystem with Public Gateways. In Proceedings of the 37th IEEE Symposium on Security and Privacy, San Jose, CA, May 2016. (Acceptance Rate: 13.0%).

(PVAs) are successfully being circumvented in the wild. Our work makes the following contributions:

• Largest public analysis of SMS data: While others have looked at aspects of SMS security in the past [77], [161], ours is the largest and longest study to date. Our analysis tracks over 400 phone numbers in 28 countries over the course of 14 months, resulting in a dataset of 386,327 messages. This dataset, which is orders of magnitude larger than any previous study of SMS, allows us to reason about the messaging ecosystem as a whole, which has not been possible in previous public studies.

• Evaluation of security posture of benign services: We observe how a range of popular services use SMS as part of their security architecture. While we find many services that attempt to operate in a secure fashion, we identify a surprising number of other services that send sensitive information in the clear (e.g., credit card numbers and passwords), include identifying information, and use low entropy numbers for their one-use codes. Because there is no guarantee that this channel is indeed separate, such observations create the potential for attacks.

• Characterization of malicious behavior via SMS gateways: We cluster and characterize the lifetime, volume, and content of the traffic seen in SMS gateways. Our analysis uncovers numerous malicious behaviors, including bulk spam and phishing. Most critically, our data shows that these systems are being used to support phone-verified account fraud, and the ways in which these systems are used make proposed mitigations from previous work [86] largely ineffective. We note that the very fact that some users are willing to intentionally direct text messages to public portals is obviously dangerous. We do not address this phenomenon and instead focus on the risks of compromise of the SMS channel. Because these messages are known by the recipient to be publicly available, this dataset is naturally not entirely representative of all SMS activity of a typical user. Nevertheless, this dataset enables the first public insights into issues such as PVA scams, SMS spam, and sensitive information sent by legitimate services. Furthermore, this data is widely available to the community for continued evaluation and measurement in the future. The remainder of the chapter is organized as follows: Section 3.1 discusses our collection and analysis methodology; Section 3.2 characterizes our dataset; Section 3.3

discusses our analysis of legitimate usage of SMS via the gateways; Section 3.4 discusses the malicious behaviors seen in our dataset.

3.1 Methodology

In this section, we describe the origins of our dataset, discuss some limitations of this dataset, discuss supplementary sources that give us additional insights into our SMS dataset, and finally describe the techniques we use to extract meaningful information from this dataset.

3.1.1 Public Gateways

In the previous section we noted that there are a number of organizations that process text messages, including carriers, ESMEs, resellers, and value-added services like message syncing. Within the category of ESMEs lies a niche class of operator: public SMS gateways. Many third-party entities (including cellular carriers) provide external public interfaces to send text messages (but not receive them). Example use cases include the convenience of an email gateway or the ability to use a web service to send a message to a friend after one’s mobile phone battery dies. While there are many public services for sending messages, they also have counterparts in public websites that allow anyone to receive a text message online. These systems publish telephone numbers that can receive text messages, and when a text message arrives at that number the web site publicly publishes the text message. These services are completely open — they require no registration or login, and it is clear to all users that any message sent to the gateway is publicly available. We recognized the research value of these messages for the potential to inform a data-driven analysis, and collected them over a 14-month period from 8 distinct public gateways that facilitate the receiving of text messages,1 listed in Table 3-1. These gateways have similar names that are potentially

1 Note that throughout the rest of the chapter we use the term “gateway” to refer exclusively to these receive-only SMS gateways.

Table 3-1. SMS gateways analyzed and the number of messages and phone numbers collected from each.
    Site                          Messages   Phone #s
    (1) receivesmsonline.net        81,313         38
    (2) receive-sms-online.info     69,389         59
    (3) receive-sms-now.com         63,797         48
    (4) hs3x.com                    55,499         57
    (5) receivesmsonline.com        44,640         93
    (6) receivefreesms.com          37,485         93
    (7) receive-sms-online.com      27,094         19
    (8) e-receivesms.com             7,107         14

confusing, so where appropriate we reference them by an assigned number 1–8 based on message volume. Despite their similar names, most of these services appear to be unaffiliated, and each has distinct hosting infrastructure. Gateways 4, 5, and 7 share 21 phone numbers, indicating a likely relationship between these gateways. These different services have essentially the same functionality, but advertise their intended use in different ways. For example, Gateway 2 claims to be “useful if you want to protect your privacy by keeping your real phone number away from spammers,” while Gateway 4 instructs users to “Enter the number where you want verify [sic] like Gmail, Yahoo, Microsoft, Facebook, Amazon, VK etc.” Gateway 7 has perhaps the most specific use case: “When your ex-wife wants to send you a text message.” Gateway 4 indicates that they expect users to use their service for account verification, while Gateways 2 and 7 simply advertise themselves as privacy services. We suspect that the business model of most of these websites relies on advertising revenue, and this is confirmed by at least Gateway 2, which prominently displays “almost all of [our income] comes from our online advertising” in a banner requesting that users disable their ad blocker. However, advertising is not the sole source of revenue for every system: Gateways 3, 4, 5, 6, and 8 sell private numbers for receiving SMS, while Gateways 4 and 5 actually sell verified Google Voice and WhatsApp accounts.

Ethical considerations. As researchers, our ultimate goal is to improve the security practices of users and organizations, but we must do so ethically. In particular, we should make every effort to respect the users whose data we use in our studies. A superficial ethical analysis would conclude that because it is clear that all messages sent to these gateways are public, and their use is strictly “opt-in,” users have no reasonable expectation of privacy in the collection and analysis of this data. While we believe this analysis to be true, the situation is more complex and requires further discussion, as there are a number of parties to these messages. In addition to users who knowingly provide a gateway number as their own phone number, other individuals and institutions (companies, charities, etc.) may send information to individuals, not knowing that the messages are delivered to a public gateway. While institutions rightfully have privacy rights and concerns, these differ from those of individuals. As we show in our results, the vast majority of the information that we collect is sent indiscriminately and automatically by organizations to a large number of recipients. These messages are unlikely to contain information that would negatively impact the institution if disclosed. Although we study bulk institutional messages, we do not further analyze those messages determined to be of a strictly personal nature. While those messages may have research value, we deliberately avoid them to prevent further propagating this data. Nevertheless, the use of gateways absolutely creates confidentiality and privacy concerns.
For example, when personally identifying information (PII) or account credentials are sent to a gateway (whether or not all parties are aware), the compromise of that information is immediate and irrevocable.2 Because we do not make our data available to others, this study does not change — in severity or duration — the harm done by the existence and use of the gateway. Furthermore, while in Section 3.3.1 we describe

2 Except perhaps by the gateway itself; however, it is clear from our data that gateways are not taking steps to prevent PII exposure.

a host of sensitive information found in the dataset, we do not publish, use, or otherwise take advantage of this information. In particular, we especially do not attempt to access accounts owned by gateway users or operators. We recognize that there are ethical questions raised not just with the collection of this data, but also by combining it with other data sources. Our data augmentation is sufficiently coarse-grained that no individual user of a gateway could be identified through our additional data.3 Geographic information not already disclosed in text messages was limited to country-scale records in the case of gateway users and city-scale in the case of gateway numbers (which in any case do not likely correlate with the location of the gateway operator). Overall, our hope is that this study will raise awareness of the risks of sending sensitive information over insecure media and prevent future harm.

Limitations. To the best of our knowledge, this chapter presents an analysis of the largest dataset of SMS published to date. However, there are some limitations to this data. First, because the messages are public, many services that use SMS (like mobile banking) are likely underrepresented in our dataset. For this reason, our findings about sensitive data appearing in SMS are likely underestimated. Second, because gateways change their phone numbers with regularity, it is unlikely that long-term accounts can be successfully created and maintained using these numbers, which may bias the number of services we observe in our dataset. Accordingly, those users are unlikely to enable additional security services like mobile two-factor authentication (2FA) using one-time passwords (OTP), further limiting our visibility to a wider range of services. These limitations mean that the overall distributions that we report may not generalize to

3 The one exception to this was an individual whose information was used (likely without his/her knowledge) to register a domain used in a phishing scam. This information was discovered via a routine WHOIS lookup after we identified the phishing domain.

broader populations. Nevertheless, we believe that this work provides useful conclusions for the security community.

3.1.2 Crawling Public Gateways

To gather messages from gateways, we developed a web crawler using the Scrapy [162] framework. Every 15 minutes, our crawler connected to each gateway, obtained new messages, and stored these in a database. We faced two challenges to accurately recording messages: ignoring previously crawled messages and recovering message received times. Ignoring previously crawled messages was difficult because gateways display the same messages for a considerable amount of time (days, months, or even years). A consequence of this is that our dataset contains messages that gateways received before our data collection began. In order to prevent storing the same messages repeatedly (and thus skewing the results), we discard previously crawled messages upon arrival by comparing the hash of a concatenation of the sender and receiver MSISDNs and the message content against hashes already in the database. If a match is found, the message sent times are compared to ensure that they were the same instance of that message, ensuring that messages that were repeatedly sent are still included in the data. Message times required finesse to manage because gateways report a relative time since the message was received (e.g., “3 hours ago”) instead of an ideal ISO-8601 timestamp [163]. Parsing these timestamps is fairly simple, but care must be taken when doing comparisons using these times, as the precision can vary (“3 minutes” vs. “3 days”). To ensure accuracy, we store and take into account the precision of every timestamp when comparing message timestamps.

3.1.3 Additional Data Sources and Analyses

Phone number analysis. After the scrapers pull the initial data from the gateways, we augment it with data from two outside sources. The first service, Twilio [9], provides a RESTful API for mobile, VoIP, and landline number lookups. Twilio resolves the number's country of origin, the national number format for that country,

and the number's carrier. Carrier information includes the carrier's name, the number's type, and the mobile network and country codes. Twilio is accurate and appropriately handles issues like number porting, which could otherwise cause inconsistencies in our data. The second service, OpenCNAM [164], provides caller identity information for North American numbers. This database contains a mapping between phone numbers and strings; carriers consult it to provide Caller ID information when connecting a call. OpenCNAM is therefore the most accurate public source of identity information for North American numbers. We obtained data from both Twilio and OpenCNAM for all the numbers hosted on the gateways, as well as for a subset of the numbers that contacted the hosted numbers.

URL analysis. We extracted 20,793 URLs from messages by matching URL regular expressions against each message in the dataset. Overall, this set contained 848 unique second-level domains and 1,055 unique base URLs (fully-qualified domain names and IP addresses). For each of these domains, we obtained domain registration data. A domain's WHOIS registration record contains useful metadata about the history of a domain, including its creation date. Because this data is distributed among registrars, it is not always available, and some fields may be restricted. We were able to obtain complete registration data for 532 of the second-level domains in our set. Due to the limited length of an SMS message, shortened URLs are often sent in these messages. The short URL is a hop between the user and the destination, allowing URL-shortening services to collect data about the users following the links. For each Bitly- and Google-shortened URL, we obtained statistics (e.g., number of clicks) when possible. The SMS gateway services do not publish data on their users, so this data represents one of the best insights into user demographics in our dataset.
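The extraction step can be sketched in a few lines of Python; the regular expression and helper functions below are illustrative assumptions, not the exact ones used in this work:

```python
import re
from urllib.parse import urlparse

# Illustrative URL pattern; the study's actual regular expressions are not published.
URL_RE = re.compile(r'https?://[^\s"\']+|\bwww\.[^\s"\']+', re.IGNORECASE)

def extract_urls(message):
    """Return all URL-like substrings found in an SMS message body."""
    return URL_RE.findall(message)

def base_url(url):
    """Fully-qualified domain name (or IP address) portion of a URL."""
    if not url.lower().startswith(('http://', 'https://')):
        url = 'http://' + url          # urlparse needs a scheme to find the netloc
    return urlparse(url).netloc.lower()

def second_level_domain(netloc):
    """Naive second-level domain: the last two labels of the hostname."""
    labels = netloc.split('.')
    return '.'.join(labels[-2:]) if len(labels) >= 2 else netloc

msg = "Your reset link: https://accounts.example.com/reset?t=abc123"
urls = extract_urls(msg)
print(urls)                                    # ['https://accounts.example.com/reset?t=abc123']
print(base_url(urls[0]))                       # accounts.example.com
print(second_level_domain(base_url(urls[0])))  # example.com
```

Note that the naive two-label rule miscounts multi-label public suffixes such as `.co.uk`; a production pipeline would consult the public suffix list.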

Finally, since these gateways freely accept and publicly post SMS messages, they represent an easy mechanism for delivering malicious messages, including phishing attempts or malicious URLs. VirusTotal [165] can provide valuable insight into the maliciousness of a given URL. We requested scans of each of the URLs via VirusTotal and collected the scan reports. If a URL had a previously-requested scan, we collected the cached scan and did not rescan the URL; due to the short lifetimes of some malicious domains, we anticipated that earlier scan results would be more accurate. For each product that VirusTotal uses to scan the URL, it reports whether the product alerted and, if so, the category of detection.

Personally-identifying information analysis. We searched the messages for personally-identifying information (PII) [166] using regular expressions. In particular, we searched for major credit card account numbers (e.g., Visa, Mastercard, American Express, Discover, JCB, and Diners Club). We further verified each match using the Luhn algorithm [167]. This algorithm performs a checksum and can detect small input errors in an account number. The checksum is built into all major credit card account numbers and can help distinguish, for example, a 16-digit Visa account number from a 16-digit purchase order number. This check is rudimentary, however, so we manually verified that the remaining matches appeared to be account numbers in context (i.e., the messages containing these numbers appeared to reference an account number). Furthermore, we checked strings of numbers to determine whether they were identification numbers such as US Social Security Numbers or national identifiers from Austria, Bulgaria, Canada, China, Croatia, Denmark, Finland, India, Italy, Norway, Romania, South Korea, Sweden, Taiwan, or the United Kingdom. We found no valid matches in our data.
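The Luhn check itself is a standard, easily reproduced algorithm; the following is a generic sketch rather than the code used in the study:

```python
def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right,
    sum the digits of the products with the remaining digits, and check
    that the total is divisible by 10."""
    if not number or not number.isdigit():
        return False  # reject empty or non-digit input
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:            # every second digit from the right
            d *= 2
            if d > 9:
                d -= 9            # equivalent to summing the two digits of the product
        total += d
    return total % 10 == 0

# A classic Visa-format test number passes; a one-digit error fails.
print(luhn_valid("4111111111111111"))  # True
print(luhn_valid("4111111111111112"))  # False
```

As the text notes, passing this check only filters candidates; context must still confirm that a 16-digit match is an account number and not, say, a purchase order number.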

3.1.4 Message Clustering

A major goal of this study is to determine what types of messages are sent via SMS and how service providers are using SMS. While machine learning techniques are available for this type of analysis and clustering (e.g., topic discovery and text clustering), scalability is a major problem when dealing with the large number of messages in our dataset. Accordingly, we explore other methods as described below.

Keyword analysis. As a first attempt, we automatically labeled messages in the dataset using searches in multiple languages for keywords such as "password," "email," and "verification." We found that these keywords are often overloaded and insufficient for successfully separating the data. For example, Talk2 [168] uses "verification code" for new account creation, while SMSGlobal [169] uses "verification code" for one-time passwords. Adding further complication, LiqPay [170] uses "password" for one-time passwords. Furthermore, we identified messages that referenced our keywords without containing any obvious authentication data; these are often informative messages about the keywords (e.g., "Do not disclose your password."). Conversely, some messages containing sensitive information did not include keywords. Ultimately, this experiment was unsuccessful, leading us to adopt a manual labeling approach.
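A minimal sketch of this keyword-labeling attempt follows. The keyword lists here are hypothetical stand-ins (the study's actual multi-language lists are not reproduced), and the example illustrates why the labels are ambiguous: the same keyword matches messages with different intents.

```python
# Hypothetical keyword lists in a few languages; illustrative only.
KEYWORDS = {
    "password": ["password", "contraseña", "passwort", "mot de passe"],
    "verification": ["verification", "verificación", "verifizierung"],
    "email": ["email", "e-mail", "correo"],
}

def keyword_labels(message):
    """Return the set of keyword categories matched in a message."""
    text = message.lower()
    return {label for label, words in KEYWORDS.items()
            if any(w in text for w in words)}

# The same label covers very different intents, so keyword matching
# cannot separate account creation from one-time-password messages:
print(keyword_labels("Your verification code is 1234"))        # {'verification'}
print(keyword_labels("Use verification code 9876 to log in"))  # {'verification'}
print(keyword_labels("Do not disclose your password."))        # {'password'}
```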

Clustering analysis. Through further analysis, we discovered that many messages from the same service provider share the same pattern. We manually reviewed messages and grouped similar messages together into “clusters.”4 The essence of our clustering algorithm is distance-based clustering [171]. However, we wanted a high-accuracy clustering algorithm with minimal and easily estimated

4 Our definition of this term should not be confused with the classic machine learning definition of “clustering.”

tuning parameters, ruling out k-means. We attempted to use an edit-distance metric to group similar messages into a connected graph (where edges are created between similar messages), but a pairwise algorithm exceeded the time and hardware available to the project. Instead, we noted that the messages we were interested in were virtually identical, apart from known common variable strings like codes or email addresses. By replacing these with fixed values, a simple lexical sort would group common messages together. We then identified cluster boundaries by finding where the normalized edit distance between two consecutive sorted messages fell below a threshold (0.9). Our threshold was empirically selected to conservatively yield correct clusters, and we were able to cluster all 386,327 messages in a few minutes with high accuracy. A more explicit statement of this process follows:

1. Load all messages.
2. Preprocess messages by replacing numbers, emails, and URLs with fixed strings.
3. Alphabetically sort preprocessed messages.
4. Separate messages into clusters by using an edit distance threshold to find dissimilar consecutive messages.
5. Manually inspect each cluster to label service providers, message types, etc. In this step, we culled clusters that had < 43 messages.5

Preprocessing is perhaps the most important step, because it allows us to avoid aligning messages from different service providers together. When naively sorting the original messages, the sort places together messages from various services that start with a verification code. We avoid this problem by replacing variable content with a fixed string, causing the final sort order to be related to the non-variable content of

5 We initially planned on labeling only clusters with more than 50 messages, but our labeling process resulted in more labeled clusters than expected.

the messages. Unlike traditional machine learning methods, our sorting-based clustering method is fast (minutes for our dataset). After clustering, we manually labeled each cluster, a time-consuming process that allowed us both to verify the correctness of the cluster generation and to guarantee correct labels. It is difficult to determine the intent of a message when it contains little context (e.g., "X is your Google verification code."). For the 100 most common services, we attempted to identify message intentions using those services' public documentation. Where this information was unavailable, we attempted to register accounts with the services to obtain messages and match these to our clusters. If we were still unable to determine the message type, we classified these with a generic label. We also define and apply labels based on the overall content of the message, including content such as PII or any sensitive, security-related information.

3.1.5 Message Intentions

Due to the lack of standardized terms for the intentions of the authentication and verification values sent via SMS, we divide the various message intentions into categories in this section. In this chapter, we use code to describe the value extracted from any message sent to a user for any of the below intentions. To our knowledge, there is no authoritative source for these intentions, despite their popularity. More than 261,000 (67.6%) of the messages contain a code, and the following categories enabled us to more accurately cluster our messages:

• Account creation verification: The message provides a code to a user from a service provider that requires an SMS verification during new account creation.

• Activity confirmation: The message provides a code to a user from a service provider asking for authorization of an activity (e.g., payment confirmation).

• One-time password: The message contains a code for a user login.

• One-time password for binding different devices: The message is sent to a user to bind an existing account with a new phone number or to enable the corresponding mobile application.

• Password reset: The message contains a code for an account password reset.

• Generic: We use this category for any codes to which we are unable to assign a more specific intent.

3.2 Data Characterization

In this section, we provide high-level information about our collected data. The dataset includes data from 8 gateways over 14 months. Overall, our dataset includes 386,327 messages sent from 421 phone numbers across 52 known carriers in 28 countries. Table 3-2 shows the message count for gateway phone numbers alongside the total number of gateway numbers by country.

3.2.1 Gateways and Messages

Table 3-1 shows the eight gateways we scraped, the number of messages from each, and the number of unique phone numbers hosted at each service during the collection time. The number of messages received by each gateway ranged from 7,107 to 69,389. The number of hosted numbers per service ranged from 14 to 93.

3.2.2 Infrastructure

We obtained detailed data from Twilio about the phone numbers in our dataset, as shown in Table 3-3. Twilio identified 52 carriers, of which 46 are mobile, three are VoIP, and three are labeled as landline carriers. We believe that the numbers seen from these "landline" carriers are actually mobile numbers mislabeled by Twilio, as all three carriers advertise both mobile and landline service. Furthermore, Twilio labels numbers from bandwidth.com as "mobile" numbers (this is not due to porting, as Twilio resolves porting scenarios); bandwidth.com is actually a VoIP provider. The numbers in this chapter are corrected to reflect this.

Table 3-2. Gateways have an international presence, with most message volume taking place in North America and Western Europe. The message count represents the number of messages sent to numbers in each country.

Country             Message Count   Number Count
United States       95,138          98
Canada              77,036          55
Germany             53,497          46
United Kingdom      44,039          75
Poland              16,103          15
Sweden              14,849          22
Spain               11,323          11
France               8,273          10
Russian Federation   7,344          -
Norway               6,674           8
Mexico               6,431           5
Romania              6,043           6
Australia            5,964          13
Belgium              5,253           3
India                5,064           2
Ukraine              4,363           3
Italy                4,326           3
Thailand             4,073           5
Hong Kong            3,251           7
Israel               1,971           5
Switzerland          1,722           3
Finland              1,714          13
Lithuania              520           1
Estonia                405           1
Ireland                331           3
Austria                158           2
Denmark                 54           1
Czech Republic           6           2
Belgium                  -           3

Table 3-3. Using Twilio-provided data, we obtained the carrier type for each of the carriers associated with sender and receiver numbers on the gateways.

Carrier Type   Amount   Percent of Total
Mobile         261      62.0%
VoIP           149      35.4%
Landline        11       2.6%

3.2.3 Geography

Twilio's number data also includes geolocation information for each number, which shows that our data spans 28 countries. The United States has the most gateway-controlled numbers, with 98 numbers receiving 95,138 messages, the most traffic of any country. Conversely, Lithuania had only one gateway-controlled number registered to it, the fewest of the countries in our data. The Czech Republic has the fewest messages sent to the gateway-controlled numbers registered to a country, with two numbers receiving only six messages. Interestingly, 9 of our numbers are associated with providers who service the Channel Islands, located off the coast of France with a total population of less than 170,000 people. Twilio data provides only the country of origin, so for all 153 numbers in the United States and Canada we obtained caller ID name (CNAM) data.6 We found that a majority of numbers (55.4%) have no CNAM data at all. For many numbers that do have data, the official record in the CNAM database is simply "CONFIDENTIAL," "WIRELESS CALLER," or "Unavailable." Note that "Unavailable" is the actual string that would be displayed to a user, not an indication of missing data in the database. The remaining messages are sent to phone numbers whose CNAM data places the number in one of 57 cities or 3 provinces (British Columbia, Ontario, and Quebec) in the United States or Canada. By message volume, the top locations are "Ontario", followed by Centennial, CO (in the Denver area); San Francisco, CA; Little Rock, AR; Airdrie, AB; Columbia, SC; San Antonio, TX; Detroit, MI; Cleveland, OH; and Washington, MD. There are several observations to make from these findings: first, numbers are drawn from well beyond what is likely each gateway's main location. Second, neither gateways nor users feel a need to use numbers based in large population centers. With the exception of Centennial, CO, all locations had four or fewer numbers,

6 CNAM data only covers the US and Canada.

regardless of population of the location. Gateways 4 and 5 registered the numbers in Centennial.

Figure 3-1. Cluster sizes are exponentially distributed, and so appear as a straight line when sorted and plotted on a log-log scale.

3.2.4 Clusters

We generated 44,579 clusters from our dataset. All clusters with more than 43 messages were manually tagged and analyzed, giving us 754 tagged clusters. These clusters represent the messages from the most popular services in our dataset. The tagged clusters represent only 1.7% of the total clusters, but they cover 286,963 messages (74.2%). Figure 3-1 supports this observation by showing the exponential distribution of the cluster sizes.

3.2.5 SMS Usage

As shown in Table 3-4, messages containing a code constitute the majority of our dataset at 67.6% of the total messages, reinforcing that a main usage of SMS in our data is verification and authentication.7 Account creation and mobile device binding codes are the largest subcategories, with 51.6% of the messages. Compared to other messages containing a code, one-time password messages are only 7.6% of messages. The URL

7 As we note in the previous section, these percentages are reflective of gateway messages, and may not necessarily be representative of broader SMS trends.

Table 3-4. We separated and labeled each cluster containing a code by the intent of the message. This table contains each of those labels and the number of messages in each, which total 74.2% of the messages in our dataset.

Tag            Messages   % Tagged
otp-dev        95,685     33.4%
code           52,872     18.5%
ver            52,181     18.2%
conf           38,521     13.4%
otp            21,919      7.6%
pw-reset        3,602      1.3%
ver-url         3,139      1.1%
advertising     2,999      1.0%
pw-reset-url    2,696      0.9%
test            2,612      0.9%
info            2,339      0.8%
otp-dev-url       863      0.3%
password          697      0.2%
code-url          676      0.2%
conf-ro           401      0.1%
otp-url           320      0.1%
stop              284      0.1%
username          178      0.06%
conf-url           92      0.03%

variations for these code messages are also rare, constituting only 2.6% of messages. This reflects that most services prefer plain codes over URLs, as URLs may not work well on older phones. Password reset messages comprise 1.3% of our dataset; the corresponding URL version takes another 1.0%. Interestingly, these password reset URLs overwhelmingly consist of Facebook results. A small part (0.8%) consists of "test" messages. These are messages that consist of text such as "Test," "Hello," or "Hi" with no other information. This category consists of large clusters of messages sent by individuals to verify that the service works as advertised and is currently functioning. The sender phone numbers, therefore, provide insight into users of the gateways. We explore this more fully in Section 3.4.

Finally, a few messages contain partial or complete usernames and passwords. These messages are particularly egregious because they may lead to account compromise and/or user identification. We discuss this further below.

3.3 Uses of SMS as a Secure Channel

In this section, we discuss the security implications we observed if any of the components of the SMS ecosystem described in Figure 2-1 are compromised. Although the usage we discuss in this section is benign, we observe the presence of PII and low code entropy, both of which are dangerous when available to an adversary in this ecosystem.

3.3.1 PII and other Sensitive Information

SMS has become a major component of global telecommunications, and its use by companies and other organizations is unsurprising. However, our dataset contained instances of companies using SMS to distribute payment credentials or other financial information, login credentials, and other personally identifiable information. We also see instances where gateways are used for sensitive services.

Financial information. We found several distinct instances of credit card numbers being distributed over SMS in our dataset. Two of these appear to be intentional methods of distributing new cards, while another two appear to be the result of commerce. We discovered these using our PII regular expressions. We also discovered several instances of CVV2 codes in our data; CVV2 codes are meant to verify that the user is in possession of the physical card at the time of purchase. We found that two services that provide "virtual" credit card numbers for access to mobile wallet funds distribute the numbers over SMS. These card numbers are "virtual" in the sense that they are not backed by a credit line, but they do appear to be persistent. The first service is Paytoo, based in the United States. We recovered three distinct cards from this service, along with additional messages containing balance updates, account numbers, and transaction identifiers. While password reset was handled over email, identifiers such as email, username, phone number, or account number could all be used for login.

The other service is iCashCard, based in India. It distributes a prepaid credit card account number over SMS; this card is protected by a PIN also distributed over SMS. Additional messages contained a separate PIN that allows account login with the phone number, meaning that access to SMS grants access to the entire payment credential and account. We found an additional credit card number, CVV, and expiration value from an unnamed service whose identity or purpose we could not determine. The message indicated that it was being sent to a user who had purchased a "package" of some sort, and it confirmed the purchase using the full credit card number. Incidentally, the purchaser's IP address was listed in the SMS, and that IP address appeared on a SANS blocklist for suspected bots and forum spammers. Our PII regular expressions discovered one final credit card number, present in a text message sent to a Mexican phone number. The message contains a reference to a Venezuelan bank and the cardholder's name, and includes the credit card number, the CVV2, and the expiration date. To determine the context for this message, we examined other messages from this sender and found what appeared to be an SMS-based mailing list for purchasing items on the black market in Venezuela; items for sale included US paper products (diapers, tissue), oil, and tires, as well as US dollars at non-official rates [172]. Our best hypothesis for the presence of the credit card is that a purchaser mistakenly sent payment information to the list instead of to the seller. Nevertheless, this highlights that highly sensitive enterprises rely on SMS. In addition to credit card information, we discovered one unidentified Polish service that includes a CVV2 code in its messages after registration for a prepaid service. Translated (by Google), these messages read: “Thank you for registering on the site prepaid.
Your CVV2 code is: 194” The financial information in our gateway data is not limited to credit card numbers. We found several instances of messages sent by a prepaid credit card provider in Germany,

PayCenter [173], which distributes bank account numbers (IBANs) in SMS messages. The same provider also sends a verification text to the user with a URL that includes the user's full name. The messages above indicate that some services unwisely transmit sensitive financial information over SMS.

Usernames and passwords. In scanning our labeled clusters, we identified several services that would allow user accounts to be compromised if SMS confidentiality is lost. The most prominent example is the Canadian international calling provider Boss Revolution [174]. Its user passwords are distributed via SMS, and usernames are simply the user's phone number; thus, an attacker with read access to these messages can compromise an account. Another example was the Frim messaging service [175], which also uses the user's phone number and a password distributed over SMS. Other services distributing usernames and passwords in SMS include eCall.ch (a Swiss VoIP provider) [176] and RedOxygen (a bulk SMS provider) [177]. Fortunately for users, most online services in our data do not distribute password information through SMS.

Password reset. Several organizations, including Facebook and the investment platform xCFD, distribute password reset information via SMS in addition to, or in place of, other methods. The most common password reset request in our data was for Facebook accounts. Upon investigating these messages (using only our own accounts), we found that they contained a URL that would allow a password reset with no other identifying information or authentication, not even a name or username. This would allow any adversary with access to the message (as it transits carrier networks, on the receiving device, or at any other entity that handles the message) to control the account. An adversary who has the username could cause reset messages to be sent for that account, allowing the adversary to take complete control of it. This highlights the consequences of a compromise of the SMS ecosystem.

Other personally identifiable information. We found numerous examples of PII, including addresses, zip codes, and email addresses. Email addresses are worth noting because an email address indicating an association between a phone number and an account could be used to tie codes or other authenticators sent to that device to the particular account. Our PII regular expressions identified 522 messages with emails; most of these were sent by live.com, gmail.com, inbox.ru, or pop.co (a hosting provider).

SMS activity from sensitive applications. Finally, we noticed several instances of messages appearing in the gateway data from organizations whose very nature is sensitive. The worst among these was the room-sharing service Airbnb. One of our messages contained the full address of the shared property (personal information obscured): “Airbnb reservation reminder: Jan 25-28 @

. : or ” Although we suspect that the owner of the property listed it in such a way that this data was revealed, the use of SMS gateways for these services is troubling, as it could facilitate real-world abuses. Other examples of sensitive applications include a large set of registrations with other telecommunications services, including popular phone services like Telegram, Viber, Line, Burner, and Frim. The presence of these services in gateway data may indicate the use of these gateways for "number chaining," a practice that allows phone-verified account evaders to acquire a large number of telephone numbers for free [48]. In addition, we see registration and activity in the gateway data for a number of bulk SMS services. This may indicate the use of gateway numbers to obtain access to bulk SMS services for the purpose of sending spam, in addition to potential use for number chaining.

Case study: QIWI wallet. We identified one service that uses most of the previously discussed problematic SMS practices: QIWI wallet, a Russian mobile wallet operated in partnership with VISA [178]. First, QIWI wallet sends email addresses in messages to bind emails to accounts. Second, this service also sends password reset codes

over SMS while allowing login with the user's phone number, meaning any reader of the message can reset the user's password. QIWI also provides VISA numbers for its users, and it sends partially-blinded card numbers and full CVV2 numbers through SMS. Such partially-blinded information can still be sensitive: knowing the last four digits of a credit card is sometimes used for over-the-phone authentication, and such information has been used in the past to target call centers [179]. More worrisome, QIWI seems to use two different blinding schemes, sometimes blocking the first and last four digits and other times blocking the middle eight digits of the card. If both blinding schemes are used for the same card, it would be possible to acquire all card information over SMS. This service also sends balance updates over SMS, which are also sometimes used for caller authentication. Finally, we found at least one message in our data corresponding to a QIWI blocked account notification; one possible reason for this is the use of the QIWI account (registered with the gateway number) for fraud or abuse.

3.3.2 SMS code Entropy

Our message dataset afforded us samples of codes sent by many services over SMS. These codes provide valuable phone verification capabilities to services that wish to increase the burden of obtaining an account (e.g., to prevent fraudulent account creation), and they provide a glimpse into the security of the code-generation schemes. We grouped the clusters containing codes by service and extracted the numeric code from each message. Overall, we extracted 35,942 authentication codes from 33 clusters across 25 services, as shown in Table 3-5. We first tested the entropy of each set of codes using a chi-square test. The chi-square test is a null hypothesis significance test; in our use case, it indicates whether the codes are uniformly generated between the lowest and highest value. A p-value less than 0.01 means that there is a statistically significant difference between the observed data and an ideal uniform distribution. Only 12 of 34 clusters (35%) had p > 0.05. We also measure the effect size w for each test, which indicates whether statistically significant

Table 3-5. The results of our statistical analysis of authentication codes from each service. Some services appear more than once in the data because their messages were split into multiple clusters (e.g., one for password resets and one for logins). This table presents the p-value and, if p < 0.05, whether the effect seen was large or medium according to established guidelines.

Service        Uniform?   p-value   Effect Size (w)   Effect?   Mean Code
Google         ✗          0.000     0.721             Large     547948
Google         ✗          0.000     0.793             Large     558380
Instagram      ✗          0.000     0.622             Large     503172
Instagram      ✗          0.000     0.574             Large     498365
Instagram      ✗          0.000     0.600             Large     497936
Jamba          ✗          0.000     6.009             Large     4719
LINE           ✗          0.000     0.595             Large     5476
LINE           ✗          0.000     0.519             Large     5530
LINE           ✗          0.000     0.530             Large     5442
Microsoft      ✗          0.000     2.929             Large     357494
Odnoklassniki  ✗          0.000     0.675             Large     433997
Origin         ✗          0.000     0.512             Large     502627
QQ             ✗          0.000     0.522             Large     505555
SMSGlobal      ✗          0.000     0.500             Large     5540
Talk2          ✗          0.000     1.327             Large     5732
Telegram       ✗          0.000     0.478             Medium    54961
Viber          ✗          0.000     8.138             Large     112075
WeChat         ✗          0.000     0.664             Large     4989
Alibaba        ✓          0.988                                 548652
Backslash      ✓          0.325                                 556223
Baidu          ✓          0.015                                 505165
BeeTalk        ✓          0.595                                 544719
Circle         ✓          0.080                                 506514
               ✓          0.461                                 5512
Google         ✓          0.917                                 501623
Hushmail       ✓          0.527                                 503161
LINE           ✓          0.698                                 5511
Origin         ✓          0.086                                 500739
RunAbove       ✓          0.427                                 494697
Skout          ✓          0.004                                 5492
Tuenti         ✓          0.981                                 5010
Weibo          ✓          0.395                                 512458
WhatsApp       ✓          0.022                                 543563


Figure 3-2. These figures present heatmaps of codes where the first two digits are represented on the y-axis and the last two digits are represented on the x-axis. Darker values represent higher frequencies of a code in our data. These figures show that WeChat (A) and Talk2 (B) exhibit an egregious lack of entropy in their authentication codes, while LINE (C) generates random codes without leading zeros.

differences are also substantively meaningful. We find that most effect sizes were large (w > 0.5), with only one medium (w > 0.3), indicating that our statistically significant differences were in fact meaningful. Finally, we confirmed that all tests performed had a statistical power of 0.98 or higher, indicating that our tests had a high likelihood of observing any effect present. Of the clusters, those belonging to the WeChat and Talk2 services had the least entropy among the authentication codes we analyzed. Not only did both services have p < 0.001 in the above chi-square test, but each service's codes also follow a specific pattern. We mapped the first two digits of each code against the last two digits and show these two services' codes in Figure 3-2.
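As a rough illustration of the statistics involved, the following stdlib-only sketch computes a chi-square goodness-of-fit statistic against a uniform distribution and Cohen's effect size w = sqrt(chi2 / N). The binning and the sample generators are assumptions for demonstration (our real per-service code sets are not reproduced here); a library such as scipy.stats.chisquare would additionally supply the p-value.

```python
import math
import random
from collections import Counter

def chi_square_uniform(codes, bins=100):
    """Chi-square statistic against a uniform distribution over
    [min(codes), max(codes)] using equal-width bins, together with
    Cohen's effect size w = sqrt(chi2 / N)."""
    lo, hi = min(codes), max(codes)
    width = (hi - lo + 1) / bins
    observed = Counter(min(int((c - lo) / width), bins - 1) for c in codes)
    n = len(codes)
    expected = n / bins
    chi2 = sum((observed.get(b, 0) - expected) ** 2 / expected
               for b in range(bins))
    return chi2, math.sqrt(chi2 / n)

random.seed(7)
# Hypothetical code sets: an ideal uniform generator vs. one biased
# toward low values.
uniform_codes = [random.randrange(10000) for _ in range(30000)]
biased_codes = [random.randrange(10000) if i % 2 else random.randrange(1000)
                for i in range(30000)]

_, w_uniform = chi_square_uniform(uniform_codes)
_, w_biased = chi_square_uniform(biased_codes)
print(f"uniform generator: w = {w_uniform:.2f}")  # well below 0.3: no meaningful effect
print(f"biased generator:  w = {w_biased:.2f}")   # above 0.5: a 'large' effect
```

Under Cohen's guidelines, w > 0.3 is a medium effect and w > 0.5 a large one, matching the thresholds used in the analysis above.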

WeChat. Until April 2015, WeChat's authentication codes followed the pattern (rand() * 16) mod 10000, which caused the stair-step, offset-by-16 heatmap in Figure 3-2A. The pattern could be explained by a random number generator with low entropy in the four least significant bits. This effectively reduced the possible space of 4-digit codes to 625. In April 2015, WeChat changed its code generation algorithm. We removed the

625 known-pattern codes from the WeChat set and recomputed the chi-square entropy test. The p-value increased to 0.761, with statistical power and effect size of 0.989 and 0.423, respectively, indicating that the new algorithm is likely producing uniformly-random codes.
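The reduction to 625 codes described above follows arithmetically from the pattern: since gcd(16, 10000) = 16, (rand() * 16) mod 10000 can only produce multiples of 16, and there are exactly 10000 / 16 = 625 of them below 10000:

```python
# Enumerate every value reachable by the pre-April-2015 pattern.
# (n * 16) % 10000 cycles with period 625, so the 4-digit code space
# collapses from 10,000 values to 625.
possible = {(n * 16) % 10000 for n in range(10000)}
print(len(possible))                        # 625
print(all(c % 16 == 0 for c in possible))   # True
print(min(possible), max(possible))         # 0 9984
```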

Talk2. This service has an extreme lack of entropy in its code-generation algorithm, as seen in Figure 3-2B. In particular, it appears to avoid digits 0, 1, 2, 5, and 8 in positions 1 and 3 of a 4-digit code. We made several attempts to reproduce this entropy pattern, but we were unable to produce a reasonable explanation for this dramatic reduction in entropy.
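A per-position digit census of the kind that surfaces this pattern can be sketched as follows. The generator below is synthetic, constructed only to mimic the observed bias; it is not Talk2's actual algorithm, which we could not reproduce.

```python
import random
from collections import Counter

def digit_census(codes, width=4):
    """Count, for each position of a zero-padded numeric code, how often
    each digit '0'-'9' appears. Digits that never appear at a position
    reveal a biased generator."""
    counts = [Counter() for _ in range(width)]
    for code in codes:
        for pos, ch in enumerate(str(code).zfill(width)):
            counts[pos][ch] += 1
    return counts

# Synthetic stand-in: the first and third digits (positions 1 and 3,
# counting from 1) are drawn only from a restricted alphabet.
random.seed(1)
allowed = "34679"  # i.e., digits 0, 1, 2, 5, 8 never appear there
codes = [random.choice(allowed) + random.choice("0123456789") +
         random.choice(allowed) + random.choice("0123456789")
         for _ in range(5000)]

census = digit_census(codes)
never_in_pos1 = sorted(d for d in "0123456789" if census[0][d] == 0)
print(never_in_pos1)  # ['0', '1', '2', '5', '8']
```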

Google. While the Google codes we harvested did not appear uniformly-random in our experiments, this appears to be caused by duplicate codes: when asked to resend a code, Google sends the same code again. This practice is potentially problematic because it indicates that Google codes have a long lifetime. Since messages on gateways may be accessible for weeks or months, it may be possible for an adversary who can identify the associated account to use an unclaimed code. Without access to the associated accounts, however, we were unable to determine the exact lifetime of Google's codes.

LINE. Although our experiments show LINE's codes are likely uniformly generated, the service does not generate codes with a leading zero, reducing the overall space of codes by 10%. This practice is common among our clusters, with 13 total clusters exhibiting this behavior. For comparison, we display LINE's codes in Figure 3-2C.

3.3.3 Takeaways

In this section, we explored the data that is exposed in the SMS channel for benign purposes. This is problematic if an adversary has access to SMS messages, as is the case with the gateways. We observed services that expose sensitive user data via SMS, including financial data, account information, password reset URLs, and personal information

such as physical and e-mail addresses. We then found that 65% of services that use SMS to deliver codes generate low-entropy codes, which may be predictable and grant unauthorized access to accounts. The design of such services is guided by an assumption that the SMS channel is secure from external observation, and our observations show that this assumption results in poor security design in those applications.

3.4 Abuses of SMS

Having explored how services attempt to use SMS as a secure channel, we now discuss what we observed about the security implications and evidence of abuse related to gateway activity. This includes phone verified account evasion, failed attempts at location anonymity, whether similar gateway numbers can be detected, and spam and fraud in the messages themselves.

3.4.1 Gateways and PVA

In this subsection, we discuss the relevance of our data to phone-verified accounts. In particular, we present evidence that the primary activity of the gateways we observe is evading phone verified account restrictions, and that existing countermeasures are ineffective.

Message activity statistics. In Section 3.2, we noted that more than half of the messages received by gateways are related to account verification. This vastly outweighed any other purpose of sending SMS. Beyond this information, message activity statistics also support this claim. The median number lifetime (the time from first message to last) in our dataset is 20 days, and the CDF of number lifetime is shown in Figure 3-3A. This lifetime is fairly short; in fact, 73.9% of numbers do not even last a full billing cycle (31 days). There are two likely explanations for the short lifetime: one is that services that facilitate PVA need to replace their numbers often as they exhaust their usefulness to create new accounts. The second is that many of these numbers are in carriers (especially

mobile carriers) that shut off numbers for anomalous message volume. These explanations are not necessarily mutually exclusive. To gain insight into this question, we computed the daily volume of messages for each phone number used by a gateway; we call this series the "daily activity" of the number. If these numbers were being primarily used for personal messages or informational activities (like signing up for advertising alerts), we would expect the daily activity of the number to be fairly constant across the lifetime of the number, or for there to be a "ramp up" period as new users discover the new line. Instead, we see almost the exact opposite behavior. To concisely express this, we computed skewness and kurtosis statistics of the daily activity of every number. Simply put, kurtosis is a statistic that indicates whether a series is "flat" or "peaky," while skewness indicates whether a peak falls closer to the middle, beginning, or end of a series. A skew within (−1, 1) indicates the peak falls in the middle of the series, while a positive skew indicates a peak that arrives "earlier" in the series. We plot the skewness and kurtosis for every number in Figure 3-3B. Note that we reverse the x-axis, so that the further left in the plot a number falls, the "earlier" its peak. Figure 3-3C shows the CDF of the daily activity skew, and we observe that approximately 60% of numbers have a skew towards early activity. This implies that most numbers have a high message volume early in their lifetime and, consequently, that most of the activity of a number has been completed by the time it is shut down. If carriers are disabling numbers (for exceeding a message rate cap, for example), they are doing so well after most numbers have seen their peak use. Likewise, if online services are considering a number invalid for phone verification, they are still permitting a high volume of registration requests for a number (in aggregate) before blacklisting the number.
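The skewness/kurtosis computation can be sketched as below. The daily activity series are fabricated, and treating each message's day index as a sample is our assumption; it reproduces the convention above, where an early burst of traffic yields a long right tail and hence positive skew.

```python
def activity_shape(daily_counts):
    """Skewness and excess kurtosis of a number's message-arrival days.
    Each message contributes its day index, so traffic concentrated early
    in the lifetime yields a long right tail (positive skew).
    scipy.stats.skew and scipy.stats.kurtosis give the same values."""
    days = [d for d, n in enumerate(daily_counts) for _ in range(n)]
    m = sum(days) / len(days)
    var = sum((x - m) ** 2 for x in days) / len(days)
    skew = sum((x - m) ** 3 for x in days) / len(days) / var ** 1.5
    kurt = sum((x - m) ** 4 for x in days) / len(days) / var ** 2 - 3
    return skew, kurt

early_burst = [90, 40, 10, 4, 2, 1, 1, 1, 1, 1]   # heavy early use, long decay
flat        = [15, 14, 15, 16, 15, 14, 15, 16, 15, 15]

s_early, k_early = activity_shape(early_burst)
s_flat, _ = activity_shape(flat)
print(s_early > 1)       # True: peak arrives early in the lifetime
print(-1 < s_flat < 1)   # True: constant activity shows no early/late bias
print(k_early > 0)       # True: a sharply peaked series, not a flat one
```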

User location leakage. Some gateways advertise their services towards users that may be seeking privacy or anonymity. Although SMS does not provide either of these properties, the use of a gateway may provide a sense of anonymity for a user registering

for a service. Shortened URLs (often provided in space-constrained SMS messages) leak information about the user clicking the link to the URL-shortening service. Using the statistics we collected from these services, we identified both the source and destination countries for each message; we also found that the users of these services are located in significantly different locations. We do not attempt to deanonymize, track, or identify any users; our data consists solely of publicly available aggregate click statistics. The number of clicks recorded ranged from 0–1,582,634 with a median of 10. This data represents any click on these URLs, not just those from the gateway pages. As a result, to prevent skewing our data with popular and spam messages, we focused on URLs with at most 10 clicks, since the incoming links expected by users of SMS gateways are likely clicked only a small number of times. We collected the countries associated with each of the remaining 2,897 clicks and aggregated the results. Figure 3-4C shows the total clicks for each country across all shortened URLs. We could not map 194 clicks because the specific country information was not available or the service identified that the request came from an anonymizing proxy service. Also in our data were "test" messages sent by users testing the services. These messages provide another window into the user base. Figure 3-4B and Table 3-6 show that the geographical extent of these users goes well beyond the home countries of gateway numbers. Users of gateways may not be aware that these URLs and messages leak metadata, and gateways do not adequately warn users of this danger. We consider the use of a gateway as an anonymizing service to be a subset of PVA evasion, because users are still attempting to evade phone verification, albeit with a different intent.

3.4.2 Detecting Gateways

As we have discussed above, these gateways facilitate PVA evasion, and the demographic data we can obtain about the users of these services clearly shows usage patterns consistent with PVA fraud. It is clear that in most cases even reputable, well-funded online services are not successfully defending against these gateways (and, similarly, against for-pay gateways).

Table 3-6. This table contains the counts of the geolocated sender phone numbers for each country alongside the number of URL clicks from users based in those countries and the number of test messages sent to those countries. This data underscores the variation between the users of the gateway services and the numbers sending messages to the gateways.

Country                Messages  Clicks  Test Messages
United States             95138     964            744
Canada                    77036       6             56
Germany                   53497      95             65
United Kingdom            44039      10             89
Poland                    16103      11             17
Sweden                    14849      29              9
Spain                     11323       5              1
France                     8273     478             20
Russian Federation         7344     276             14
Norway                     6674       1             11
Mexico                     6431      71             14
Romania                    6043     190              -
Australia                  5964       -             43
Belgium                    5253       3             10
India                      5064      81             13
Ukraine                    4363       4              -
Italy                      4326       4             11
Thailand                   4073       -              1
Hong Kong                  3251       -             13
Israel                     1971       6              6
Switzerland                1722       9             14
Finland                    1714     191              1
Lithuania                   520       1              -
Estonia                     405       -              2
Ireland                     331       2              3
Austria                     158       7              8
Denmark                      54       -              -
Czech Republic                6       -              3
Netherlands                   -     247             12
Portugal                      -      21              1
China                         -      10              6
Indonesia                     -       9              7
Nigeria                       -       5              7
Serbia                        -       5              1
Luxembourg                    -       5              -
Iran                          -       4             18
Japan                         -       4              -
Pakistan                      -       3             11
Moldova                       -       3              -
Turkey                        -       3              -
Malaysia                      -       2              8
Morocco                       -       2              1
Hungary                       -       2              -
Algeria                       -       2              -
Taiwan                        -       1            144
Saudi Arabia                  -       1              6
Ghana                         -       1              5
Brazil                        -       1              4
South Africa                  -       1              4
Egypt                         -       1              3
Bulgaria                      -       1              1
Vietnam                       -       1              1
Argentina                     -       1              -
Iceland                       -       1              -
Ivory Coast                   -       1              -
Jordan                        -       1              -
Myanmar                       -       1              -
Sri Lanka                     -       -              9
Iraq                          -       -              7
Singapore                     -       -              6
United Arab Emirates          -       -              5
Isle of Man                   -       -              4
Kuwait                        -       -              4
Bangladesh                    -       -              3
Lebanon                       -       -              3
New Zealand                   -       -              3
Cambodia                      -       -              2
Costa Rica                    -       -              1
Jamaica                       -       -              1
Maldives                      -       -              1
Oman                          -       -              1
Philippines                   -       -              1
Reunion Island                -       -              1
Slovakia                      -       -              1

Table 3-7. We analyzed the numbers from each gateway for similarity. In 7 of 8 gateways, at least 40% of the gateways' numbers were similar.

    Site                       Similar/Total  Percent
[1] receive-sms-online.info    15 / 59        25.4%
[2] receivesmsonline.net       16 / 38        42.1%
[3] e-receivesms.com            7 / 14        50.0%
[4] hs3x.com                   28 / 57        49.1%
[5] receivefreesms.com         52 / 93        55.9%
[6] receivesmsonline.com       38 / 93        40.9%
[7] receive-sms-online.com      8 / 19        42.1%
[8] receive-sms-now.com        20 / 48        41.7%

Although number lifetimes are short, the sheer volume of verification messages in our data indicates that evasion is still an effective driver of profit for gateways. PVA evasion is not new to online services. In particular, Google is acutely aware of this problem, having published a paper on the topic [86]. In that paper, Thomas et al. propose several strategies to detect PVA evasion, including blocking disreputable carriers, restricting how quickly numbers can verify accounts, and phone re-verification. In this section we explore the recommendations in [86] and discuss how our data shows that these recommendations are unlikely to be effective.

Carrier reputation. While we see only one of the carriers identified as abuse-prone in [86] (bandwidth.com), blacklisting blocks of numbers by carrier would not stop all PVA evasion. Carrier-based blocking is prohibitively expensive for all but the largest organizations. We obtained Twilio data for each number in our data set, and although the cost was relatively small ($0.005/lookup), scaling this (and additional number metadata such as CNAM and HLR data) to cover all of a business's customers represents a substantial cost. Furthermore, this kind of bulk blacklisting is difficult to enforce in the face of gateway services that maintain a large pool of numbers over many carriers. Online services that attempt to restrict the speed at which numbers can be reused for new accounts face an arms race against gateways.

Table 3-8. An analysis of the similarity of gateway numbers shows that the majority of numbers are in mobile carrier number blocks, not VoIP as we expected. As a result, attempting to block these number blocks may result in high false positives.

Carrier Type  Similar / Total  Percent
Mobile        159 / 184        86.4%
Landline        5 / 184         2.7%
VoIP           20 / 184        10.9%

Phone reputation. One option suggested in [86] for determining phone reputation is to create a service that shares abuse data between service providers. Although little information is given about how such a service could be created, we considered that it might be possible to blacklist abusive numbers if they are similar to each other. We conducted a self-similarity analysis of the phone numbers in our dataset to determine how numbers are purchased; if they are purchased in bulk, it may be possible to detect them. We analyzed all of the gateways' numbers to identify similar numbers using Hamming distance. We found that most carriers use similar numbers (i.e., those with a Hamming distance of 2 or less), and the results are shown in Table 3-7. Over 40% of a gateway's numbers were similar in 7 of 8 gateways; however, we found that most of these repeated numbers are in mobile carriers, not VoIP, as shown in Table 3-8. The data shows that the gateway numbers are in the carriers that are most likely to serve legitimate users, so attempting to block these numbers may result in a high false positive rate.
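The self-similarity criterion can be sketched as below. The numbers in the pool are fabricated, and restricting comparisons to one gateway's pool is our reading of the analysis behind Table 3-7.

```python
def hamming(a: str, b: str) -> int:
    """Number of digit positions at which two phone numbers differ;
    numbers of different lengths are treated as maximally dissimilar."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))

def similar_numbers(pool, threshold=2):
    """Numbers within Hamming distance `threshold` of at least one
    other number in the same gateway's pool."""
    flagged = set()
    for i, a in enumerate(pool):
        for b in pool[i + 1:]:
            if hamming(a, b) <= threshold:
                flagged.update((a, b))
    return flagged

pool = ["14045550123", "14045550129", "14045559123", "19195551000"]
print(sorted(similar_numbers(pool)))
# ['14045550123', '14045550129', '14045559123']
```

Numbers bought in bulk from one carrier block differ in only a few trailing digits and are flagged; the unrelated number is not.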

Phone re-verification. Phone number re-verification would fail if the number were checked again outside the expected lifetime of a gateway number. In [86], Thomas et al. saw a median number lifetime of one hour, a reasonable point at which to perform re-verification. In our dataset, however, we have seen that half of all gateway numbers last up to 20 days. Therefore, re-verification at any fixed interval is unlikely to be universally effective, since phone number longevity is not guaranteed.

3.4.3 Abuse Campaigns in SMS

Since gateways accept unsolicited messages, often do not filter messages, and are subject to users providing their numbers to various services, our data contains SMS from spam campaigns, phishing campaigns, and even one black market, as discussed in Section 3.3.1. In this section, we discuss these campaigns.

Spam campaigns. We found that 1.0% of tagged messages, across 32 clusters, related to advertising. Upon manual inspection, none of these appeared to be solicited messages, so we consider them spam. Of the advertising clusters we identified, 15 are UK-based financial services (e.g., payday loans, credit lines) from 14 numbers. Five are for distinct bulk messaging services. These services advertise gateways and the ability to avoid phone verification: "Using our service to create and verify accounts without your own phone number." Another six clusters are from a specific job staffing site and appear to be bulk messages related to a job search. Curiously, these messages contain a name and zip code. We expanded the search beyond the labeled clusters and found 282 messages in 107 clusters. These messages may be related to this organization testing their bulk SMS API. All of these messages were sent to a single gateway number within a seven-hour timespan, which is unusual when compared to other bulk message campaigns in our dataset. Finally, two of these messages have links to surveys via Bitly links. These links were created by user "smsautodialer", who has been a member since July 2015 and has shared over 2,802 Bitly links. The destination domain has a 0/65 detection ratio on VirusTotal. We were surprised at the low spam volume observed in public gateways, as they market themselves as a service for avoiding spam. SMS spam has been a major topic of research, but the volume of spam traffic in our dataset is lower than previously measured [161], [180].

Phishing campaigns. In contrast to spam, phishing messages attempt to trick the user into believing he/she is communicating with a legitimate entity (e.g., to steal service

credentials). These scams typically use "fast-flux" domain registrations to defeat domain blacklisting strategies. Therefore, the age of a domain at the time a message containing it arrives is of particular value; if the domain is new, it may indicate that the domain is malicious. We matched the timestamps of incoming SMS messages with the registration times of the domains included in each message. The fastest domain to appear in our dataset was danske-mobile*com,8 a domain that had been registered for only 11 hours before it appeared in an SMS message. The text of the message (translated from German) is "Dear Danske Bank customer, you have received an internal message" alongside the URL. We believe this to be a banking phishing message; however, we were unable to verify the URL's purpose. At the time of this writing, the specific host in the message returns a DNS NXDOMAIN error and the second-level domain returns a registrar parking page. The SMS gateway that received this message did not display the sender MSISDN number, instead replacing it with "DanskeBank," which may indicate number spoofing. Curiously, the domain WHOIS data shows detailed personal information (name, address, phone number) of the registrant, who is based in the United States. The real Danske Bank web site has registration data with contact information in its home country, Denmark. Given this domain's intended purpose, we believe that this data is either incorrect or stolen personal information, and we did not pursue the ownership further. In total, 8 domains appeared in messages after being registered for less than one day, as shown in Table 3-9. Only one of these domains was accessible via HTTP at the time of writing. That domain, phone-gps*com, throws an error and delivers a stack trace when no HTTP user-agent string is provided; when we provided one, it delivered empty content (0 bytes). This site, therefore, may be using user-agent strings to determine what content

8 We substitute an asterisk into suspicious URLs in this chapter to prevent PDF readers from inferring hyperlinks.

to deliver; however, we were not able to get the site to deliver any content using common user-agent strings for desktop and mobile browsers. The remaining 7 domains are all registered with contact addresses and registrars based in China and take the form of hyphen-separated English words. Since none of these domains had accessible hosts at the time of writing, we were unable to determine their purpose. Since we were unable to verify the intent of the above domains, we manually searched our dataset for a recently seen, newly registered domain. We found lostandfounds-icloud*com, a site designed to look like the legitimate "Find My iPhone" Apple service. Figure 3-5 shows the SMS message containing this URL, which also indicates a phishing attempt. The page's code appears to reject any user name or password entered into the fields (a common practice among phishing sites), and indeed, upon putting any content in these fields, the page returned the error seen in Figure 3-6. As of November 2015 (less than one month after the message arrived at the gateway), the site has been taken offline. Due to the necessity of retrieving working domains from newly obtained messages, this message appears later in our dataset than other messages we discuss in this chapter.

Table 3-9. Using domain WHOIS information, we measured the distance between the time a domain was first registered and the time a gateway first received a message containing a URL with this domain. In total, 8 domains appeared in messages within 24 hours of being registered.

Domain                Sender MSISDN  Time to First Message
danske-mobile*com     DanskeBank     0 days 11:41:02
location-message*com  243858234346   0 days 13:38:02
it-panels*com         16312237715    0 days 16:30:02
iurl-sms*com          14156537352    0 days 16:30:02
phone-gps*com         243858214490   0 days 18:41:03
url-sms*com           243858361940   0 days 18:47:03
location-device*com   243858097749   0 days 19:42:02
sms-new-page*com      243858289642   0 days 20:08:02
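The registration-to-first-message measurement reduces to a timestamp difference once the WHOIS creation time is parsed. The sketch below uses hypothetical parsed timestamps (only the 11-hour-41-minute danske-mobile*com delta echoes the data above); a real pipeline would parse WHOIS responses and gateway logs.

```python
from datetime import datetime, timedelta

# Hypothetical (domain, WHOIS creation time, first message time) records.
sightings = [
    ("danske-mobile*com", datetime(2015, 7, 1, 0, 0, 0),
                          datetime(2015, 7, 1, 11, 41, 2)),
    ("old-domain*com",    datetime(2014, 1, 5, 0, 0, 0),
                          datetime(2015, 7, 2, 9, 0, 0)),
]

def newly_registered(sightings, window=timedelta(days=1)):
    """Domains first seen in a message within `window` of registration,
    a signal of the fast-flux registrations discussed above."""
    return [dom for dom, created, seen in sightings
            if seen - created <= window]

print(newly_registered(sightings))  # ['danske-mobile*com']
```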

Other malicious behavior. Another empirical measure of the maliciousness of the URLs is scanning these URLs with security products. VirusTotal provides one such

Table 3-10. We requested VirusTotal scans for each extracted URL in our dataset. This table shows the number of detections for each product that detected a malicious URL. Overall, 417 URLs had at least one detection.

Product                   Detections
ADMINUSLabs                        1
AutoShun                         144
Avira                              7
BitDefender                       15
Blueliv                            5
C-SIRT                             1
CLEAN MX                          11
CRDF                               5
Dr.Web                            62
ESET                               6
Emsisoft                          23
Fortinet                          31
Google Safebrowsing               15
Kaspersky                          3
Malekal                            3
Malware Domain Blocklist          20
Malwarebytes hpHosts               1
ParetoLogic                       54
Phishtank                          1
Quttera                            2
SCUMWARE.org                       4
Sophos                            28
Spam404                            3
Sucuri SiteCheck                  94
TrendMicro                         1
Trustwave                         55
Web Security Guard                 1
WebsenseThreatSeeker              81
Webutation                         2
Yandex Safebrowsing                1

measure by requesting scans from multiple products. The full results are displayed in Table 3-10. VirusTotal returned 417 URLs with at least one detection. Only 3 URLs had 5 detections, and no URL had more than 5 detections. Of these detections, 508 were classified as "malicious site," 147 as "malware site," and 25 as "phishing site." Unsurprisingly, danske-mobile*com was not detected by any product, since this domain no longer appears to host any content, and it is unlikely that any of these products can identify phishing attempts using the metadata we previously discussed. Overall, abusive messages (spam, phishing, and malware) constituted only a small portion of our dataset, despite SMS abuse being billed as a major problem in the popular press. This is especially strange given that evading spam is something many of the gateways advertise, as we discussed in Section 3.1. Given previous reports on the pervasiveness of SMS spam, we believe that some entity in the SMS ecosystem is performing adequate spam filtering and that this problem may no longer be as severe as it once was.

3.4.4 Takeaways
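Tallying per-product and per-category detection counts amounts to aggregating verdicts across scan results. The result structure below is hypothetical (VirusTotal's actual response schema differs), so only the counting logic is illustrative.

```python
from collections import Counter

# Hypothetical per-URL scan results mapping product -> verdict.
scans = [
    {"Sophos": "malicious site", "Avira": "clean", "Dr.Web": "malware site"},
    {"Sophos": "clean", "Avira": "clean", "Dr.Web": "malicious site"},
    {"Sophos": "phishing site", "Avira": "malicious site", "Dr.Web": "clean"},
]

per_product = Counter()   # detections per product, as in Table 3-10
per_verdict = Counter()   # detections per category ("malicious site", ...)
for scan in scans:
    for product, verdict in scan.items():
        if verdict != "clean":
            per_product[product] += 1
            per_verdict[verdict] += 1

print(per_product["Dr.Web"])          # 2
print(per_verdict["malicious site"])  # 3
```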

In this section, we explored malicious uses of the SMS channel. First, we discussed how our data shows the prevalence of PVA evasion, given the stark contrast between gateway number locations and the locations of users interacting with the gateways. We then discussed the difficulty of detecting gateways through carrier blocking, due to cost and number lifetimes. Finally, we explored abuse campaigns via SMS and found that spam, phishing, and suspicious URLs are infrequent, which may indicate that SMS filtering at the gateways and in the network is sufficient.


Figure 3-3. Gateway number lifetime statistics. A) Only 25% of gateway-controlled numbers are used after one month; the median number lifetime is only 20 days. B) The skew and kurtosis of number lifetime indicate that 60% of numbers have a significant skew towards heavier use at the beginning of the lifetime, while the kurtosis indicates that these numbers see a sharp increase in activity followed by a steep decline. C) 60% of numbers show a strong tendency for heavy use early in the number's lifetime.


Figure 3-4. These maps visualize the sender phone number locations of A) all messages and B) test messages sent to the gateways. In C), we map the locations of users that have clicked Bitly- or Google-shortened URLs. These locations provide insight into both the services users are attempting to access and the gateway users themselves. Overall, the locations of the gateways' users differ significantly from those of the services sending messages, implying the primary purpose of these gateways is PVA fraud.

Apple Customer, Your lost iPhone has been found \ and temporarily switched ON. To view iPhone map location lostandfounds-icloud*com Apple

Figure 3-5. The phishing SMS message, as received by a gateway. This message is the first step in deceiving a user into providing his/her Apple ID credentials. We substituted in the asterisk to prevent accidental clicks.

Figure 3-6. The page delivered to the user after following a link provided in a phishing SMS. The site refuses any username and password combination provided and displays the error shown in this figure.

CHAPTER 4
DETECTING INTERCONNECT BYPASS FRAUD

Cellular networks provide digital communications for more than five billion people around the globe. As such, they represent one of the largest, most integral pieces of critical infrastructure in the modern world. Deploying these networks requires billions of dollars in capital from providers and often necessitates government subsidies in poorer nations, where such investments may not produce returns for many decades. As a means of maintaining these systems, international calls destined for such networks are often charged a significant tariff, which distributes the costs of critical but expensive cellular infrastructure to callers from around the world. Many individuals seek to avoid such tariffs by any means necessary through a class of attacks known as "interconnect bypass fraud". Specifically, by avoiding the regulated network interconnects and instead finding unintended entrances to the provider network, a caller can be connected while dramatically lowering his or her costs. Such fraud constitutes a "free rider" problem, a term from economics describing participants who enjoy the benefits of expensive infrastructure without paying to support it. The most common implementation of interconnect bypass fraud is known as simboxing. Enabled by VoIP-GSM gateways (i.e., "simboxes"), simboxing connects incoming VoIP calls to the local cellular voice network via a collection of SIM cards and cellular radios. Such calls appear to the network provider to originate from a customer phone and are delivered at the subsidized domestic rate, free of international call tariffs. Interconnect bypass fraud negatively impacts availability, reliability, and quality for legitimate consumers by

Text of this chapter is reprinted with permission from Bradley Reaves, Ethan Shernan, Adam Bates, Henry Carter, and Patrick Traynor. Boxed Out: Blocking Cellular Interconnect Bypass Fraud at the Network Edge. In Proceedings of the 24th USENIX Security Symposium, 2015. (Acceptance Rate: 15.7%).

creating network hotspots through the injection of huge volumes of tunneled calls into underprovisioned cells, and costs operators over $2 billion annually [4]. In this chapter, we discuss Ammit,1 a system for detecting simboxing designed to be deployed in a cellular network. Our solution relies on the fact that audio transmitted over the Internet before being delivered to the GSM network will be degraded in measurable, distinctive ways. We develop novel techniques and build on mechanisms from the Pindr0p call fingerprinting system [125] to measure these degradations by applying a number of lightweight signal processing methods to the received call audio and examining the results for distinguishing characteristics. These techniques rapidly and automatically identify simboxed calls and the SIMs used to make such connections, thereby allowing us to quickly shut down these rogue accounts. In so doing, our approach makes these attacks far less likely to be successful and stable, thereby largely closing these illegal entrances to provider networks. We make the following contributions:

• Identify audio characteristics useful for detecting simboxes: We identify features in simboxed call audio that make it easily differentiable from traditional GSM cellular calls and argue why such features are difficult for adversaries to avoid.

• Develop a rapid detection architecture for the network edge: We design and implement Ammit, a detection tool that uses signal processing techniques to identify illicitly tunneled VoIP audio in a GSM network, and demonstrate that our techniques can easily execute in real time. Such performance means that our solution can be practically deployed at the cellular network edge.

• Demonstrate a high detection rate for SIM cards used in simboxes: Through experimental analysis on a real simbox, we show that Ammit can quickly profile and terminate 87% of simboxed calls with no false positives. Such a high detection rate arguably makes interconnect bypass fraud uneconomical.

1 Ammit was an Egyptian funerary deity who was believed to separate pure and impure souls, preventing the latter from achieving immortality in the afterlife.

We note that our techniques differ significantly from related work, which requires either large-scale post hoc analysis [92] or serendipitous test calls to network probes [95]–[99]. Our approach is intended to be used in real time, allowing for rapid detection and elimination of simboxes. It should be noted that we are not attempting to combat the spread of inexpensive VoIP calls in this chapter. Traditional VoIP calls, which connect users through IP or a licensed VoIP-PSTN (Public Switched Telephone Network) gateway, are not considered a problem in countries that combat simboxes. Instead, we seek to prevent the creation of unauthorized entry points into private cellular networks that degrade performance for legitimate users and cost providers and governments two billion dollars annually. This is analogous to the problem of rogue Wi-Fi access points; simboxing prevents network administrators from controlling access to the network and can degrade service for other users. Moreover, as with other economic free-rider problems, failure to combat such behavior can lead to both underprovisioning and overuse of such networks, making quality and stability difficult to achieve [7]. Failure to combat simbox fraud may ultimately lead to higher prices and lower reliability for subsidized domestic calls in developing nations, where the majority of citizens can rarely afford such cost increases. The remainder of this chapter is organized as follows: Section 4.1 describes simbox operation and its consequences; Section 4.2 presents our detection methodology; Section 4.4 describes our experimental methodology; and Section 4.5 discusses our results.

4.1 What is a Simbox?

A simbox is a device that connects VoIP calls to a GSM voice (not data) network. A simple mental model for a simbox is a VoIP client whose audio inputs and outputs are connected to a mobile phone. The term "simbox" derives from the fact that the device requires one or more SIM cards to wirelessly connect to a GSM network. There is a strong legitimate market for these devices in private enterprise telephone networks. GSM-VoIP gateways are sold to enterprises to allow them to use a cellular

calling plan to terminate2 calls originating in an office VoIP network to mobile devices. This is typically a cost-saving measure, because the cost of maintaining a mobile calling plan is often lower than the cost of paying termination fees to deliver the VoIP call through a VoIP-PSTN provider (as well as the cost to the receiving party). Such a setup is done with the permission of a licensed telecommunications provider and only for domestic calls. This is in direct opposition to simboxers, who purchase subsidized SIM cards to deliver traffic to a local network without paying the legally mandated tariffs. Because there is high demand for GSM-VoIP gateways, they span a wide range of features and numbers of concurrent calls supported. Some gateways support limited functionality and only a single SIM card, while others hold hundreds of cards and support many audio codecs. Some simboxes used in simbox fraud rings are actually distributed, with one device holding hundreds of cards in a "SIM server" while one or more radio interfaces connect calls using the "virtual SIM cards" from the server. This allows for simple provisioning of SIM cards, as well as the ability to rotate the cards to prevent high-use or location-based fraud detection.

4.1.1 How Simbox Fraud Works

Simboxing is a lucrative attack. Because simboxers can terminate calls at local calling rates, they can significantly undercut the official rate for international calls while still making a handsome profit. In doing so, simboxers are effectively acting as an unlicensed and unregulated telecommunications carrier. Simboxers' principal costs include simbox equipment (which can represent an investment of up to $200,000 US in some cases), SIM cards for local cellular networks, airtime, and an Internet connection. Successfully combating this type of fraud can be accomplished by making any of these costs prohibitively high.
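The undercutting arithmetic can be made concrete with entirely hypothetical per-minute rates; the chapter gives no specific tariffs, so every figure below is an illustrative assumption.

```python
# All rates are hypothetical illustrations, in USD per minute.
official_intl_rate = 0.30  # regulated international rate, tariff included
local_rate         = 0.03  # subsidized domestic rate the simboxer pays
simbox_rate        = 0.15  # rate the simboxer charges upstream carriers

margin   = simbox_rate - local_rate          # simboxer profit per minute
undercut = official_intl_rate - simbox_rate  # savings passed upstream
print(f"margin ${margin:.2f}/min, undercut ${undercut:.2f}/min")
# margin $0.12/min, undercut $0.15/min
```

Raising any of the cost inputs (equipment seizures, SIM prices, airtime) shrinks the margin, which is the economic lever the countermeasures in this chapter aim at.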

2 In cellular and telephone networks, “terminating a call” has the counterintuitive meaning of “establishing a complete circuit from the caller to the callee.”


Figure 4-1. A typical international call (A) is routed through a regulated interconnect. Note that VoIP calls from services such as Skype that terminate on a mobile phone also pass through this regulated interconnect and are not the target of this research. A simboxed international call (gray box, subfigure B) avoids the regulated interconnect by routing the call to a simbox that completes the call using the local cellular network.

Figure 4-1 demonstrates in greater detail how simboxing compares to typical legitimate international call termination. Figure 4-1 shows two international call paths: a typical path (Figure 4-1A) and one simbox path (Figure 4-1B). In the typical case, when Alice calls Bob, her call is routed through the telephone network in her country (labeled “Foreign PSTN Core”) to an interconnect between her network and Bob’s network. The call is passed through the interconnect, routed through Bob’s domestic telephone network (“Domestic PSTN Core”) to Bob’s phone. If Alice and Bob are not in neighboring countries, there may be several interconnects and intermediate networks between Alice and Bob. The process essentially remains the same if Alice or Bob are using mobile phones. The interconnect in this scenario is crucial — interconnects

are heavily regulated and monitored to ensure both call quality and billing accuracy (especially for tariffs). In the simbox case, Alice's call is routed through her domestic telephone network, but rather than passing through a regulated interconnect, it is routed over IP to a simbox in the destination country. The simbox then places a separate call on the cellular network in the destination country and routes the audio from the IP call into the cellular call, which is routed to Bob through the domestic telephone network. In practice, simboxers execute this attack and profit in one of two ways. The most common method is for the simboxer to pose as a legitimate telecommunications company that offers call termination as a service to other telecom companies. As a call is routed through these intermediate networks, neither end user is aware that the call passes through a simbox. This agreement is analogous to a contract between two ISPs who have agreed to route traffic between their networks. While the end user has no knowledge of how his traffic is routed, the intermediate network owners profit from reduced prices for routed traffic. The second method simboxers use to profit is to offer discounted call rates directly to end consumers, primarily through the sale of international calling cards. Such cards carry a number that the user must dial before dialing the recipient's number; this number routes to a number provided by a VoIP provider that points to the simbox in the recipient's country. When the user calls the number on her calling card, the simbox answers, prompts her to dial the recipient's number, and then connects the call.

4.1.2 Consequences of Simbox Operation

The consequences of simboxing are significant to users who place simbox calls, users who share the cellular network with simboxers, and to cellular carriers and national governments.

As for the effects on users, Alice is likely unaware of the details of her call routing. However, Alice and Bob may both notice a degradation in quality, and Bob may notice that the Caller ID for Alice does not show her correct number. Bob may blame his local carrier for the poor call quality, so the carrier unfairly suffers in reputation. Other users in the same cell as the simbox also suffer negative consequences. Cellular networks are provisioned to meet the expected demand of mobile users who use the network only a fraction of the time, and accordingly a cell may only be able to support a few dozen simultaneous calls. When a simboxer sets up an unauthorized carrier and routes dozens of calls through a cell provisioned to support only a handful of simultaneous calls, the availability of that cell to service legitimate calls is significantly impaired. Connectivity within the cell may be further impaired by the dramatic increase in control traffic [181].

4.2 Methodology

Legitimate VoIP calls and other international calls enter a cellular network through a regulated interconnect or network border gateway. To halt simboxed calls, we only need to monitor incoming calls from devices containing a SIM card. Figure 4-1B shows the path of legitimate and simboxed audio, respectively, from the calling source to the final destination. In both cases, the tower believes it is servicing a voice call from a mobile phone. However, the audio received by the tower from a simboxed call will contain losses, indicating that the audio signal has traveled over an Internet connection, while the audio from a legitimate call will not contain these losses, having been recorded directly on the transmitting mobile phone. As discussed in Chapter 2, jitter and loss in Internet telephony manifest as unconcealed and concealed gaps of audio to the receiving client (the simbox, in this case). These features are inherent to VoIP transmission, and the only variant is the frequency of these events. All simboxed calls will have some amount of packet loss and jitter, so we design Ammit to detect these audio degradations. Because the audio transmitted to the mobile device could have originated from a variety of connection types, Ammit only analyzes audio received from mobile devices. If the mobile device is

a simbox, the characteristics of this audio will exhibit the loss patterns consistent with a VoIP connection, making the call distinguishable from audio recorded and sent by a mobile phone.

4.2.1 Inputs to Ammit

The most common codec supported by simboxes is G.711 [182]. The G.711 codec is computationally simple, royalty-free, and serves as a least common denominator in VoIP systems. It was originally developed in 1972 for digital trunking of audio in the PSTN, and it is still the digital encoding used in PSTN core networks. The original standard indicated that G.711 should insert silence when packets are delayed or lost, so we examine G.711 using this setting. Simboxers have a clear incentive to configure their simboxes to evade detection, and an obvious evasion strategy is to make the audio as close as possible to legitimate audio by using the GSM-FR codec for the VoIP link. Therefore, we show how Ammit accounts for this difficult case where GSM-FR is used with and without packet loss concealment. We discuss how Ammit addresses other evasion techniques in Section 4.3. In summary, Ammit must detect the two audio phenomena that are characteristic of VoIP transmission: concealed and unconcealed packet losses. The following subsections detail how Ammit detects these phenomena, but first we briefly describe the data that Ammit receives from the tower. In GSM, audio encoded with the GSM-FR codec is transmitted between a mobile station (MS, i.e., a phone) and a base transceiver station (BTS, i.e., a cell tower) using a dedicated traffic channel. The encoding used by GSM-FR causes certain bits in a frame to be more important than others. When an audio frame is transmitted, frame bits are separated by their importance. "Class 1" bits containing the most important parameters are protected by a parity check and error correcting codes, while "Class 2" bits are transmitted with no protection because errors in these bits have only a small effect on audio quality. The approach of protecting only some bits is a compromise

between audio quality and the cost of the error correcting code. When Class 1 bit errors cannot be corrected, the receiver erases (i.e., drops) the entire frame. When Class 2 bits are modified, the audio is modified, but the receiver has no mechanism to detect or correct these modifications; this is termed "bit error." Note that bit error and frame erasure are distinct concerns in GSM. The receiving device (MS or BTS) may use PLC to conceal a frame erasure. When a BTS erases a frame, it conceals the loss before forwarding the audio into the core network. Visibility into frame erasures motivates our choice to place Ammit at the tower. However, there are additional benefits to locating Ammit at a tower. Specifically, this allows for scalable detection of simboxes because a single Ammit instance is responsible only for the dozens of calls that pass through the tower instead of the thousands of concurrent calls in a region or nation. Finally, if Ammit has high confidence that a call is simboxed (as defined by a network administrator policy), ending the call at the tower is simpler than in other parts of the network. Such a policy would further frustrate the efforts of simboxers. It is also possible to deploy Ammit closer to the network core, perhaps at BSC or MSC nodes, but GSM loss information would need to be forwarded there. Ammit takes two inputs: a stream of GSM-encoded audio frames and a vector indicating which audio frames were erased (both of which can be collected by the BTS connecting the call). Ammit uses the frame erasure vector to ignore the effects of the air interface on the call audio. Ignoring erased frames ensures that losses on the air interface are not misinterpreted as losses caused by VoIP.

4.2.2 Detecting Unconcealed Losses

Ammit must detect two degradation types: unconcealed packet loss and concealed packet loss. To detect unconcealed loss, Ammit looks for portions of audio where the energy of the audio drops to a minimum value and then quickly rises again. This technique is also used in the Pindr0p system [125]. The following discussion describes the Pindr0p approach to detecting unconcealed losses, with additional implementation insight and details.

[Figure 4-2 appears here: a plot of short-term energy versus time (ms) for a clip of audio, marking a lost packet, a detected loss, a short true positive, and an undetectable loss.]

Figure 4-2. The short-term energy of speech can reveal silence insertion. Packet loss that falls in naturally silent sections of audio is undetectable.

Figure 4-2 demonstrates unconcealed packet loss in a clip of audio at 78 ms and 215 ms. At 78 ms, a packet is lost and silence begins. A short time later, at 90 ms, the energy rises again, indicating that a new packet containing speech has arrived. Because the time between the energy fall and rise is shorter than is typical in speech, Ammit marks that section of audio as containing a lost packet. While the intuition is simple, there are several challenges to using this technique to detect losses in simboxed audio. The first challenge is that many packet losses will occur during naturally silent audio, meaning that there will be no significant change in energy. This fact merely limits the number of detectable loss events. The second challenge is that speech regularly has short pauses, causing false positives. A third challenge is that because there is no guarantee that VoIP frames are fully contained within a single GSM frame, a VoIP loss can begin in the middle of a GSM frame. Finally, uncorrected packet

losses will have very low but non-zero energy because the pure silence is altered by bit errors in air transmission or by degradations within the simbox. The first step of detecting unconcealed packet loss is to compute the energy of the audio signal. Ammit uses Short Time Energy (STE) as its measure of signal energy. Short time energy is a frequently used metric in speech analysis [183]. STE is computed by taking small windows of data and summing the squared values of the signal in the window. More formally, STE can be written as

E_n = Σ_{i=n−N+1}^{n} (x(i) w(n − i))²    (4–1)

where x is the audio signal, w is the window function, n is the frame number, and N is the frame size. Ammit computes STE using a 10 ms audio frame, not the 20 ms frames used by GSM-FR and many other codecs, because 10 ms is the minimum frame size used by a VoIP codec, as standardized in RFC 3551 [184]. We follow the standard practice of using a Hamming window half the length of the frame with 50% overlap. Therefore, each STE measurement covers 5 ms of audio and overlaps the previous window by 2.5 ms. This fine-grained measurement of energy ensures that Ammit can detect packet loss that begins in the middle of a GSM air frame. With STE computed, Ammit then computes the lower envelope of the energy. In the presence of noise, the "silence" inserted into the VoIP audio will have non-zero energy. We define the lower envelope as the mean of the minimum energies found in the 10 ms frames, and we allow a tolerance around the minimum energy of 50% of the lower envelope mean (determined experimentally). Once Ammit has determined the lower envelope, it looks for energies that fall within the lower envelope tolerance but then rise again after a short number of energy samples. We experimentally chose 40 ms as the maximum duration of a sudden drop in packet energy, and

our experimental results reflect the fact that this period is shorter than the minimum pause length in standard speech (around 50–60 ms). Because this method simply looks for silence, it is effective for both codecs we study, and it is fundamentally suited to any codec that inserts silence in place of lost packets.

[Figure 4-3 appears here: the waveform of an audio clip (amplitude versus time in ms) showing the original signal followed by repeated, attenuated copies, and the corresponding cepstrum (magnitude versus quefrency in ms) with a peak at 20 ms.]

Figure 4-3. GSM-FR repeats and attenuates the last good frame to conceal packet loss. This results in a clear peak at 20 ms in the cepstrum of the audio that can be used to detect a simboxed call.

4.2.3 Detecting Concealed Losses in GSM-FR

Before we describe how Ammit detects GSM-FR packet loss concealment, we first describe GSM-FR PLC [185] at a high level. On the first frame erasure, the erased frame is replaced entirely by the last good frame. On each consecutive frame erasure, the previous frame is attenuated before replacing the erased frame. After 320 ms (16 frames) of consecutive frame erasures, silence is inserted. Attenuation of repeated frames is motivated by the fact that while speech is stationary in the short term, longer-term prediction of audio has a high error that users perceive as unnatural.
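The concealment procedure just described can be sketched in a few lines. The following Python sketch is illustrative only: it operates on already-decoded audio frames rather than GSM-FR parameter frames, and the function name and attenuation factor are our own assumptions, not values from the 3GPP reference implementation.

```python
def conceal(frames, erased, attenuation=0.8, max_repeats=16):
    """Illustrative GSM-FR-style PLC on decoded 20 ms frames (lists of
    samples). erased[i] is True when frame i was erased. The attenuation
    factor is an assumption; the real factors are in 3GPP TS 46.011."""
    out = []
    last_good = [0.0] * len(frames[0])   # silence until the first good frame
    run = 0                              # length of the current erasure burst
    for frame, lost in zip(frames, erased):
        if not lost:
            last_good = list(frame)
            run = 0
            out.append(list(frame))
        else:
            run += 1
            if run == 1:
                repl = list(last_good)                     # repeat last good frame
            elif run <= max_repeats:
                repl = [s * attenuation for s in out[-1]]  # attenuate previous frame
            else:
                repl = [0.0] * len(last_good)              # silence after 320 ms
            out.append(repl)
    return out
```

Note how a burst of erasures yields progressively quieter copies of the last good frame before decaying to silence; this repetition-with-attenuation pattern is precisely the artifact the cepstral analysis exploits.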

Repeating frames wholesale has the frequency-domain effect of introducing harmonics every 1/(20 ms) = 50 Hz [158]. Thus, there will be a spike in the cepstrum3 at the 20 ms quefrency. Because 50 Hz is well below human pitch, this is a distinctive indicator of GSM packet loss. Figure 4-3 shows a clip of audio to which GSM-FR packet loss concealment has been applied and the corresponding cepstrum. Note that the audio repeats (but is attenuated) every 20 ms, resulting in a peak in the cepstrum at 20 ms. To detect GSM-FR PLC, Ammit computes the cepstrum of a window of three frames of audio and looks for a coefficient amplitude in the 20 ms quefrency bin that is double the standard deviation of the amplitudes of the other cepstral coefficients and is not located in a silent frame.

4.2.4 Simbox Decision and SIM Detection

While concealed and unconcealed packet loss are measurable indicators of simboxing, there is a small false positive rate caused by the imperfection of our signal processing techniques. Accordingly, a single detected loss or concealed loss is not sufficient to conclude that a call originated from a simbox. Instead, we normalize the counts of loss events by the total number of frames in a call and consider a call simboxed if its loss event percentage is much higher than the average loss event percentage for legitimate audio. We show in the following section that this approach is effective for all but the highest quality VoIP links, which provide few loss events to detect. Even with this thresholding, some legitimate calls will occasionally be marked as simboxed. To ensure detection of simboxes with even improbably low loss rates, and to reduce the impact of false positives, we propose that the network keep track of the number of times a call placed from a SIM is marked as a simboxed call. We term this

3 A "cepstrum" is a signal representation defined as the inverse Fourier transform of the logarithm of the Fourier transform. A rough mental model is to think of the cepstrum as the "Fourier transform of the Fourier transform" of a signal. The domain of the function is termed "quefrency" and has units of seconds.

technique "SIM detection" and show in the following sections that by using this technique we can further discriminate legitimate subscribers from simboxers.

4.2.5 Efficiency of Ammit

Ammit is designed to analyze call audio in real time as it is received by the cellular tower, so the system must function efficiently using minimal computation and network resources. To accomplish this, we avoid costly analyses associated with machine learning or complex signal features, and instead apply simple threshold checks to processed audio signals. For each time window collected by Ammit, we apply two iterations of the Fast Fourier Transform (FFT) and a comparison operation against the distinguishing criteria noted above. The FFT is a well-known algorithm that runs in O(n log n) time and is used to analyze audio in real time for applications such as audio visualizers. We further verify empirically that these operations can be executed in real time in Section 4.5. In addition, any added load on the network will have minimal impact on overall throughput. While Traynor et al. [181] demonstrated that added signaling within the cellular network can cause a DDoS effect, Ammit sends only a single message to the HLR4 for any call flagged as simboxed. For this added messaging to affect the internal cellular network, a cell containing a simbox would have to simultaneously send significantly more messages than there are channels to handle cellular calls, which is not possible.

4.3 Threat Model and Evasion

To evade Ammit, simboxers must either compromise Ammit's measurement abilities or successfully prevent or hide VoIP losses. While simboxers will take every economically rational action to preserve their profitability, attempting to evade Ammit will be difficult

4 The Home Location Register is a central database in a cellular network that manages subscriber information.

and likely expensive. This will hold true even if simboxers are aware of Ammit's existence and detection techniques, and even if they are able to place arbitrary numbers of calls to test evasion strategies. In this section, we outline basic assumptions about our adversary. We then provide details about how Ammit can be expanded to address stronger adversaries that could defeat the prototype described in this chapter.

4.3.1 Security Assumptions

The effectiveness of Ammit relies on four reasonable assumptions that ensure Ammit cannot be trivially evaded by simboxers. First, we assume that the Ammit system (hardware and software) is no more accessible to the attacker than any other core network system (including routing and billing mechanisms). Second, we assume that Ammit will be used to analyze all call audio so that simboxers cannot evade a known evaluation period; we show in Section 4.5.4 that Ammit can analyze calls efficiently. Third, we assume that Ammit will report measurements to a single location (like the HLR) so that simboxers cannot evade Ammit by frequently changing towers. Finally, we recommend that Ammit be deployed widely throughout a carrier's infrastructure because a wider deployment provides fewer places for simboxes to operate.

4.3.2 Evasion

If the simboxer cannot avoid Ammit analysis, he must hide or prevent VoIP packet loss and jitter. Hiding packet loss and jitter has been the goal of over two decades of intense academic and industrial research, which has so far produced only good but algorithmically detectable solutions, including jitter buffering and loss concealment.
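To make the buffering trade-off concrete, the following Python sketch (with hypothetical function and parameter names of our own choosing) shows why a de-jitter buffer trades latency for loss: a packet that arrives after its playout deadline is as good as lost and surfaces as an audio gap.

```python
def playout_losses(send_ms, arrive_ms, buffer_ms):
    """Illustrative fixed jitter buffer: packet i, sent at send_ms[i],
    must arrive by send_ms[i] + buffer_ms to make its playout deadline;
    later arrivals are discarded and surface as audio gaps.
    Returns the number of packets that miss their deadline."""
    return sum(1 for s, a in zip(send_ms, arrive_ms) if a - s > buffer_ms)
```

With 20 ms packets, for example, a packet delayed by 35 ms is hidden by a 40 ms buffer but lost with a 10 ms buffer; hiding seconds of jitter therefore requires buffers that add seconds of latency.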

Extreme jitter buffering. VoIP clients (including simboxes) routinely use short audio buffers to prevent low levels of jitter from causing delays in playback. Simboxers could set the jitter buffer to a large value (say, several seconds of audio) to prevent jitter from causing noticeable audio artifacts. However, this would be intrusive to users, and Ammit could still detect true losses as well as the added false starts and double talk. While we leave the testing of this approach to future work, we briefly describe how high jitter

buffers could be detected by measuring the incidence of double talk. Double talk is the phenomenon where, after a lull in conversation, two users begin to talk (apparently) simultaneously. Because double talk increases with audio latency, increased double talk is indicative of increased latency. Because a large jitter buffer (combined with the already high call latency of an international call) will lead to higher than "normal" latency, detecting anomalous double talk will help detect simboxing. Detecting double talk is an important task in equipment quality testing, and ITU-T standard P.502 provides an off-the-shelf method for measuring it. Feasibility and appropriate thresholds can be determined using call data through simboxes and from legitimate subscribers. While such data is unavailable to outside researchers, it is available to the carriers who would field such a system.

Alternative PLC approaches. Ammit looks for brief silences as one signal of VoIP loss, so simboxers could replace the silence with noise or other audio. This is a well-known form of packet loss concealment. In general, PLC algorithms (like the GSM-FR PLC) fall into three categories: insertion, interpolation, and regeneration [186]. Although there are a number of algorithms in each category, the majority are published (and those that are not are often similar to those that are). All will have some artifacts that can lead to detection, and because the Pindr0p project has developed techniques to identify other codecs [125], we leave detecting other PLCs as future engineering work not essential to confirming our hypothesis that audio features can identify simboxes.
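For reference, the cepstral-peak test of Section 4.2.3, which any repetition-based PLC must defeat, can be sketched as follows. This is an illustrative Python/NumPy sketch with assumed parameter values (8 kHz audio, a three-frame window, a 2-sigma threshold) and it omits the silent-frame exclusion; it is not the deployed implementation.

```python
import numpy as np

def gsm_plc_suspect(window, rate=8000, quefrency_ms=20.0, k=2.0):
    """Illustrative cepstral test for GSM-FR-style concealment: frames
    repeated every 20 ms create a 50 Hz comb in the log spectrum and
    hence a peak at the 20 ms quefrency. Flags the window when that
    cepstral coefficient exceeds k standard deviations of the remaining
    coefficients. Thresholds are assumptions, not tuned values."""
    spectrum = np.abs(np.fft.rfft(window))
    cepstrum = np.abs(np.fft.irfft(np.log(spectrum + 1e-12)))
    q = int(rate * quefrency_ms / 1000)             # 160 samples at 8 kHz
    others = np.delete(cepstrum[1:len(cepstrum) // 2], q - 1)
    return cepstrum[q] > k * others.std()
```

A window that is exactly 20 ms-periodic (a repeated frame) is flagged because its dominant cepstral coefficient sits at the 20 ms quefrency; other PLC families leave different, but analogously measurable, artifacts.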

Improved link quality. In addition to jitter and loss concealment, simboxers could reduce losses and jitter with high-quality network links or a redundant transmission scheme, but there are several barriers to doing so. First, finding a reliable provider may not be possible given the poor connectivity conditions in simboxing nations. If a provider is available, the costs will likely be prohibitive. For example, in Kenya one can expect to pay $200,000 US per month for a high-quality 1 Gbps link [187]. This connection also guarantees little beyond the first routing hop. Beyond the costs, having a better quality

connection than many universities and businesses may draw undesirable scrutiny and attention to the simboxers. Even if a high-quality link is available, it would not remove degradations that occur before the call arrives at the entry point to the simbox.

Garbled frame transmission. Finally, simboxers could evade Ammit detection by failing to transmit valid GSM air frames when an IP frame is lost. In effect, Ammit would believe that all VoIP losses were air losses and would not detect VoIP losses. Ammit could detect this evasion by noting anomalous air loss patterns. Currently, conducting a simboxing operation requires the technical sophistication of a systems administrator. This evasion technique would require significant engineering resources (with expertise in embedded system design, implementation, and production) because GSM modems are typically sold as packages that accept an audio stream and high-level control commands (e.g., "place a call" or "send an SMS"). These tightly integrated chips are not capable of sending damaged packets on command. While the Osmocom baseband project [188] could provide a start for a custom radio, Osmocom targets inexpensive (though relatively rare) feature phone variants and would not be a turnkey GSM baseband for a custom simbox5. Finally, even if the simboxers develop such a modem, they would have to conceal all detectable artifacts from both the final VoIP step and any intermediate networks (like a caller's mobile network). For these reasons, this strategy would only be effective for the most motivated and well-funded simboxers. However, in the event that simboxers do pursue this strategy, we propose the following methodology to detect such an attack. Given the considerable difficulty of developing the attack as well as constructing a suitable test environment, we leave testing this detection methodology to future work. We hypothesize that this garbled packet

5 We pursued this line of research ourselves before finally purchasing a commercial simbox.

evasion strategy can be detected from anomalous air interface loss patterns, because simboxed calls will see the "typical" amount of loss plus the loss created by the simboxer. Loss patterns may be anomalous for improbable amounts of loss or for improbably bursty sequences of lost frames. These anomalies could be determined on a tower-by-tower basis to take into account local transmission conditions (like a tunnel affecting signal quality). Because mobile stations (i.e., phones) do not know which frames are erased when they arrive at the tower, simboxers will not be able to tune their loss rate to be within the bounds used by this strategy.

4.4 Experimental Setup

In this section, we describe how we characterize Ammit through simulation and test its effectiveness against a real simbox. We simulate simboxed calls by taking a corpus of recorded audio and passing it first through a VoIP simulator and then through a GSM air simulator (again, we use the term "air" to distinguish GSM cellular transmission). The GSM air simulator provides Ammit with both audio and a vector of GSM frame errors. To simulate legitimate calls, we pass the audio corpus through the air simulator only. We motivate the use of simulation in Section 4.4.6. We test Ammit against three simbox codec choices: G.711 with no packet loss concealment and GSM-FR with and without packet loss concealment (we discussed this choice in Section 4.2). We evaluate single simbox call detection and SIM detection at 1%, 2%, and 5% loss rates (we justify this choice later in this section).

4.4.1 Speech Corpus

The source of voice data for our experiments was the TIMIT Acoustic-Phonetic Continuous Speech corpus [189]. This is a de facto standard dataset for call audio testing. The TIMIT corpus consists of recordings of audio of 630 English speakers from 8 distinct

regions, each reading 10 "phonetically rich" standard sentences6. The recordings are 16 kHz 16-bit signed Pulse Code Modulation (PCM), which we downsample to 8 kHz to conform to telephone quality. For the single call detection tests, we concatenate the 10 sentences for each of the 462 speakers into 1 call per speaker, creating a dataset of 462 calls7. Each call is approximately 30 seconds in length. The SIM detection test requires a larger call corpus, so for 98 randomly selected speakers we generate 20 calls per speaker using permutations of the 10 sentences for each speaker (for a total of 1960 calls). Calls consist of only one speaker because Ammit analyzes each direction of the call separately.

4.4.2 VoIP Degradation and Loss

VoIP simulation takes TIMIT call audio as input and outputs audio that has been degraded by VoIP transmission. The simulator must convert the input audio from its original format (PCM) to the simulated VoIP codec (GSM-FR or G.711), simulate loss, implement packet loss concealment in the case of GSM-FR, and output the final degraded audio. We examine these steps in greater detail in this subsection.

Audio conversions. The input audio files, encoded using PCM, must be converted either to G.711 or to GSM-FR. We use the widely used open-source utility sox [190] for all codec transitions throughout the Ammit testing infrastructure. Note that these codec transitions are standard practice throughout PSTN and VoIP networks.

Packet loss modeling. We model Internet losses with the widely used [191] Gilbert-Elliot packet loss model [192]. The Gilbert-Elliot model is a 2-state Markov model that captures packet losses with bursty tendencies. A given channel is in either a "good" state or a "bad" state; if the channel is in the "bad" state, packets are dropped. The Gilbert-Elliot

6 N.B. We use a subset of 462 male and female speakers from all 8 regions 7 We set aside 12 of these calls as a training set to develop and verify our algorithms and set detection thresholds. These calls were not used for testing.

model can be described with two parameters: p, the likelihood that the channel enters the "bad" state, and r, the likelihood that the channel leaves the bad state. The parameter p controls the frequency of loss events, while r controls how long bursts last. We parameterize the model such that p is the target loss rate (for these experiments, 1%, 2%, and 5%) and r = 1 − p. This means that the higher the loss rate, the greater the tendency of losses to be bursty. Although jitter is a source of audio artifacts, we do not model jitter explicitly. Instead, because the audio symptoms of jitter and packet loss are the same (i.e., audio is not present when needed), we simply consider jitter a special case of packet loss, as is done by Jiang and Schulzrinne [191].

Loss rate justification. The reader may note that we are modeling loss rates that are considered high for Internet loss. Our model is justified for several reasons. First, typical Internet connection conditions in simboxing countries are of much lower quality than most of Europe, East Asia, or even North America experiences [187], [193], with loss rates often exceeding 10%. Second, because conditions can vary from hour to hour or even moment to moment, examining performance at higher-than-typical loss rates is justified [191].

G.711 processing. To implement VoIP loss in G.711 audio, we use a packet loss simulation tool from the G.711 reference implementation available in the ITU Software Tools Library [194]. This tool implements concealed and unconcealed loss on 16-bit 8 kHz PCM audio. We use sox to encode our input files to G.711 and back to 16-bit PCM before processing by the tool. This step is required because G.711 is a lossy codec, and the act of encoding and decoding irreversibly changes the audio. The tool takes a frame error vector as input, allowing us to use the Gilbert-Elliot model described above.

GSM-FR processing. We developed our own GSM-FR VoIP loss simulator in Matlab.
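Before detailing the simulator, we note that the parameterization above (p to enter the bad state, r = 1 − p to leave it, giving a stationary loss rate of p/(p + r) = p) is easy to sketch. The following Python fragment is illustrative only; the actual experiments use the ITU STL tool and our Matlab simulator, and the function name is our own.

```python
import random

def gilbert_elliot_losses(n_packets, p, seed=None):
    """Illustrative two-state Gilbert-Elliot loss model as parameterized
    above: p is the probability of entering the "bad" (lossy) state and
    also the target loss rate; r = 1 - p is the probability of leaving
    it, so higher loss rates yield burstier losses.
    Returns a loss vector (True = packet dropped)."""
    rng = random.Random(seed)
    r = 1.0 - p
    bad = False
    losses = []
    for _ in range(n_packets):
        # From "good", enter "bad" with prob p; from "bad", stay with prob 1 - r.
        bad = (rng.random() < p) if not bad else (rng.random() >= r)
        losses.append(bad)
    return losses
```

A loss vector of this form determines which frames the simulators drop, conceal, or replace with silence.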
All audio processing in this tool is done on GSM-FR encoded audio. The tool implements

the previously discussed packet loss model, the GSM-FR PLC as defined in 3GPP Standard 46.011 [185], and unconcealed packet loss by inserting GSM-FR silent frames.

4.4.3 GSM Air Loss

As we discuss in Section 4.4.6, we simulate simbox calls out of necessity. To simulate GSM cellular transmission (i.e., "air loss"), we modify a GSM Traffic Channel simulation model for Simulink [195]. This model takes frames of GSM-encoded audio and encodes them as transmission frames for transmission over a GSM traffic channel as specified in 3GPP Standard 45.003 [196]. The transmission encoding includes interleaving as well as the error correcting codes and parity checks applied to Class 1 bits (as discussed in Section 4.2). The model then simulates the modulation and transmission of the encoded frame using GMSK (Gaussian Minimum Shift Keying) in the presence of Gaussian white noise in the RF channel. This white noise is the source of random transmission errors in the model. The model then demodulates the transmitted channel frame, evaluates the error correcting codes, and computes the parity check to determine whether the frame is erased. Finally, the model outputs the received audio and a vector indicating which frames were erased. The channel model's signal-to-noise ratio is tuned to produce a frame erasure rate (FER) of 3% at the receiver, which is considered nominal according to 3GPP Standard 45.005 [197].

4.4.4 Simboxing SIM Detection Test

Our SIM detection mechanism is tailored to reduce the effect of a single false positive or false negative call judgment by examining multiple calls. To measure the effectiveness of this mechanism, we use 20 audio files from 98 unique speakers (for a total of 1960 calls) to simulate legitimate and simboxed calls using our GSM and VoIP simulators. We examine legitimate calls as well as simboxes covering all three codecs (GSM-FR, GSM-FR with PLC, and G.711) at 1%, 2%, and 5% loss rates. We model individual SIM cards as groups of 20 calls. For legitimate SIM cards, all calls from a particular speaker are assigned to a single SIM card, while simbox SIM cards consist of groups of randomly selected calls. This models the fact that simbox SIMs will rarely be used to provide service for the same user twice. We analyzed all legitimate and simboxed calls with Ammit, then computed the percentage of calls in each SIM card group that were marked as simboxed. We consider a SIM to be used in a simbox if at least 25% of the calls it makes are marked as simboxed by Ammit call analysis.

4.4.5 Real Simbox Tests

[Figure 4-4 appears here: a schematic of the experimental call path, from TIMIT audio played through a soft phone at a PlanetLab node, over the Internet via VoIP to the simbox, and over GSM to the base station.]

Figure 4-4. Our detection mechanisms are run against a real simbox deployment (Hybertone GoIP-1) communicating with a modified Range Networks OpenBTS base station.
We collect audio traces from calls made through a real simbox to validate our simulation experiments. Figure 4-4 shows a schematic diagram of our experimental setup. We use 100 randomly selected audio files from the single call detection corpus (discussed in Section 4.4.1) to model the original call source. The call path begins at a PJSIP soft phone at a

PlanetLab node located in Thailand, a country with major simboxing problems [198].8 This step emulates the arrival of a call to a simboxer. The call originates from a soft phone and is routed through an Asterisk PBX9 (not shown in the figure) to our Hybertone GoIP-1 simbox in the United States. Hybertone simboxes offer useful features for simboxing, including the ability to automatically change the broadcast IMEI number to evade filtering and detection systems like those presented in prior work [92]. Hybertone products have been advertised for sale specifically for simboxing [199], and entrepreneurs even sell value-added management consoles specifically for simboxers [200]. While the GoIP-1 supports several incoming codecs, it does not disclose which PLC algorithm it uses. We have determined experimentally that it uses a variant of the GSM-FR PLC. The simbox delivers the call to a cellular base station under our control. Our base station is a Range Networks Professional Development Kit running the OpenBTS 5.0 open-source base station software and Asterisk 11.7. This base station is a low-power research femtocell and allows us to record call audio digitally as the base station receives it, including frame erasure information. To determine false positives, we create control calls by playing the same 100 randomly selected audio files into a BLU Samba Jr. Plus feature phone and capturing the call audio at the base station. Figure 4-5 shows our base station and simbox experimental apparatus.

4.4.6 Technical Considerations

Our experimental setup uses both simulation and real simbox data we collect ourselves for several reasons. First, simulations provide the best way to examine

8 Note that Thailand is the only major simboxing country with a functional PlanetLab node at the time of writing.

9 A Private Branch Exchange (PBX) is a telephony switch analogous to an intelligent router in the Internet.


Figure 4-5. Our simbox experimental apparatus, including our OpenBTS GSM base station, mobile phone to model legitimate calls, and our GoIP-1 simbox.

the effects of codec choice, packet loss concealment, and loss rates reproducibly and accurately. Second, they allow us to build generic models of simboxes so that our detection mechanism is not tied to any particular simbox model. Third, because we use tools and models that are extensively studied, verified, and frequently used throughout the literature [125], [191], [192], [194], [195], [201], we can have confidence that our results are correct. We supplement our simulations with data collected through a commonly used simbox to support and confirm our simulation results. The reader will note that our real simbox calls were originated in a simboxing country, not terminated there. While simboxing is a global problem [92], we wanted to focus on areas where the problem is endemic and has a substantial impact. However, logistical, economic, and legal considerations prevented us from placing our simbox and research base station abroad. Instead, we capture the exact loss and jitter characteristics of the Internet connections in a simboxing country by originating the call there while terminating the call in our lab. Legal and privacy concerns prevent us from receiving simbox audio from mobile operators (since the audio would be from callers who could not give their consent for such use). However, we note that there are no additional privacy concerns created when an operator deploys Ammit in a real network. Operators regularly use automated techniques



Figure 4-6. Ammit detection depends on the loss rate and simbox codec used. For a 2% loss rate, Ammit detects over 55% of simboxed calls with less than a 1% false positive rate. This performance makes SIM detection (shown in Figure 4-7) very reliable.

to monitor call quality of ongoing conversations, and Ammit does no analysis that could be used to identify either the speakers or the semantic content of the call. Finally, we note that the use of TIMIT audio is extremely conservative; it presumes pristine audio quality before the call transits an IP link. In fact, there will be detectable degradations from the PSTN even before the VoIP transmission. Chief among these will be GSM-FR PLC applied if Alice calls from a mobile phone. Because mobile phones regularly see high loss rates,10 simboxers carrying mobile-originated traffic will be even more vulnerable to detection by Ammit than this methodology reflects.

4.5 Detection Results

This section demonstrates how Ammit detects simbox fraud. We first discuss Ammit's effectiveness at identifying a real simbox, followed by a discussion of the results of detecting simulated simboxed calls. We then examine how Ammit can be used to

10 Recall from 3GPP Standard 45.005 [197] that 3% loss is considered nominal.


Figure 4-7. Even with unusually high-quality network connections, Ammit can be used to identify SIM cards used for simboxing.

identify SIM cards used in simboxing fraud. Finally, we show that Ammit is fast enough to be effective in real networks.

4.5.1 Simulated Call Analysis

In this subsection, we evaluate Ammit's ability to detect individual simboxed calls and SIM cards used in simboxing. Figure 4-6 presents the percentage of simboxed calls detected for three simbox types at three different loss rates. At a still-plausible 5% loss rate, Ammit detects from 87% to 100% of simboxed calls. Lower detection rates at low loss rates are simply a result of fewer loss events for Ammit to detect. Even in the case of no packet loss concealment, however, Ammit still detects 15–66% of the simboxed calls at 1% and 2% loss. As discussed in the previous section, these loss rates include the effect of jitter, so loss rates as low as 1% and 2% are unlikely to be encountered often in practice [187], [193]. Finally, the lowest dotted line in Figure 4-6 shows the low (but non-zero) detection rate for the control group of simulated legitimate calls: less than 1% (0.87% to be exact).
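The SIM-level policy used in these experiments (group calls into SIMs of 20 and flag a SIM when at least 25% of its calls are marked as simboxed) is simple enough to sketch directly. The function below is an illustrative model of that policy, not Ammit's actual implementation; per-call verdicts are represented as 1 (marked simboxed) or 0.

```python
def flag_sims(call_results, group_size=20, threshold=0.25):
    """Group per-call verdicts (1 = marked simboxed) into SIMs of
    `group_size` consecutive calls; flag any SIM whose fraction of
    marked calls meets `threshold`."""
    flagged = []
    for start in range(0, len(call_results), group_size):
        group = call_results[start:start + group_size]
        flagged.append(sum(group) / len(group) >= threshold)
    return flagged

# A SIM with 5 of 20 calls marked (25%) is flagged; 4 of 20 (20%) is not.
print(flag_sims([1]*5 + [0]*15 + [1]*4 + [0]*16))  # [True, False]
```

A single false positive among 20 calls (5%) stays well under the threshold, which is the point of judging SIMs on groups of calls rather than on individual verdicts.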

Figure 4-7 shows the percentage of simbox SIM cards that can be automatically disabled at the threshold of 25% of calls. For a 5% loss rate, our policy can identify 100% of SIM cards used in simboxes. For calls using GSM-FR with packet loss concealment, our policy can also detect 100% of SIM cards. As the loss rates decrease, we identify fewer SIM cards for codecs without packet loss concealment. In the case of 2% loss, we identify 96% and 100% of SIMs used in GSM-FR and G.711 simboxes, respectively. In the case of 1% loss, we still identify 43% of G.711 SIMs and 28% of GSM-FR SIMs. Our threshold results in a false positive rate of 1% and was determined experimentally from a ROC curve (omitted for space reasons). To counter the effects of false positives, the operator could implement a simple policy allowing users to reactivate canceled SIMs after some verification. One possibility is requiring flagged users to verify the national ID numbers used to register the SIM card over the phone or in person at a sales agent.

4.5.2 Detection of Real Simboxes

We begin with the most important result: Ammit is effective at detecting real simboxes. We find Ammit can detect 87% of real simboxed calls with zero false positives on the call set. These figures are the result of running our GSM-FR packet loss concealment detection after tuning on simulated individual call data; improved detection may be possible at the cost of a low false positive rate. While simulations produce useful insights about Ammit's performance in a wide range of conditions, these results confirm our hypothesis that call audio can be used to effectively combat simbox fraud.

4.5.3 Discussion

We make three observations from the individual call simulations. First, the results show a clear relationship between the loss rate of a call and Ammit's ability to detect it. Second, Figure 4-6 shows the counterintuitive result that using GSM-FR packet loss concealment makes calls easier to detect. Even at a 1% loss rate, Ammit detects 30% of simboxed calls using GSM-FR PLC. Ammit is so effective at detecting concealed packet loss events because the GSM-FR PLC cepstral peak is distinctive and rare in speech. The

corollary to this finding is that simboxers will have an incentive to disable packet loss concealment, which will noticeably impair call quality and user acceptability. Third, the non-zero false positive rate means that discretion will be required when Ammit flags a call as simboxed. Our SIM detection results show that Ammit can be used not only to detect single calls but as part of a larger initiative against simboxing. At 2% and 5% loss, we can detect and disable a single SIM card after at most 20 calls. Even at 1% loss, we can still detect and disable many SIM cards. Given that SIM cards come at a non-trivial cost (either at a legitimate point of sale or on a black market), by reducing the lifetime of a SIM card we make simboxers unable to operate profitably. Finally, we make two observations from the real simbox results. First, we note that our simulations were effective for tuning Ammit before applying real data. This validates our methodological strategy. Second, our simulation false positive rates were conservatively high; while we saw 1% false positives on our simulated data, we saw no false positives on our real data.

4.5.4 Ammit Performance

To show that Ammit is scalable and performant, we examine the amount of time Ammit requires to analyze a call for concealed and unconcealed packet losses. Although in the previous subsections we analyzed Ammit's performance on 30-second calls, we hypothesize that longer analyses would lead to even better results, especially for calls with lower loss rates. We tested Ammit's performance on a set of 10 calls of approximately 30s, 60s, and 120s; we present the averages of 10 analyses of each call in Figure 4-8. We test Ammit on a late 2011 iMac with a quad-core 3.4 GHz Intel Core i7, 16GB RAM, and a 1TB solid state disk running OS X 10.9. Although this is capable hardware, the detection is done entirely in Matlab in a single thread, and the detection code is correct but far from optimal. Optimizing the Matlab code for efficiency would likely reduce analysis time. Beyond that, implementing Ammit in a more performant language


Figure 4-8. Ammit analyzes audio much faster than real time and is efficient enough to deploy in cell towers.

like C could reduce analysis time further. For a commercial implementation, code customized for a digital signal processor could improve performance further. Ammit may be deployed directly as a BTS or BSC software update or as inexpensive standalone hardware. As Figure 4-8 shows, the majority of analysis time is spent detecting concealed packet loss. Nevertheless, calls can be analyzed 150 times faster than real time, indicating that a single thread of execution could keep up with approximately 150 concurrent calls. Even our unoptimized code would be able to analyze all traffic at a tower in real time.
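The capacity claim follows from simple arithmetic: analyzing audio 150 times faster than real time means one thread keeps up with roughly 150 simultaneous calls. The sketch below makes that explicit; the 150x figure is from our measurements, while the thread count is purely illustrative.

```python
def concurrent_call_capacity(speedup=150, threads=1):
    """Calls analyzable in real time: each thread keeps up with
    `speedup` simultaneous calls when analysis runs `speedup`x
    faster than real time."""
    return speedup * threads

print(concurrent_call_capacity())          # 150 calls on the measured single thread
print(concurrent_call_capacity(threads=4)) # 600 on a hypothetical 4-thread deployment
```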

CHAPTER 5
PRACTICAL END-TO-END CRYPTOGRAPHIC AUTHENTICATION FOR TELEPHONY OVER VOICE CHANNELS

Modern telephony systems include a wide array of end-user devices. From traditional rotary PSTN phones to modern cellular and VoIP-capable systems, these devices remain the de facto trusted platform for conducting many of our most sensitive operations. Even more critically, these systems offer the sole reliable connection for the majority of people in the world today. Such trust is not necessarily well placed. Caller ID is known to be a poor authenticator

[202], [125], [203], and yet it is successfully exploited to enable over US$2 billion in fraud every year [4]. Many scammers simply block their phone number and exploit trusting users by asserting an identity (e.g., a bank, law enforcement, etc.), taking advantage of a lack of reliable cues and mechanisms to dispute such claims. Addressing these problems will require the application of lessons from a related space. The web experienced very similar problems in the 1990s, and developed and deployed the Transport Layer Security (TLS) protocol suite and the necessary support infrastructure to assist with the integration of more verifiable identity in communications. While by no means perfect and still an area of active research, this infrastructure helps to make a huge range of attacks substantially more difficult. Unfortunately, the lack of similarly strong mechanisms in telephony means that not even trained security experts can currently reason about the identity of other callers.

Text of this chapter is reprinted with permission from Bradley Reaves, Logan Blue, and Patrick Traynor. AuthLoop: Practical End-to-End Cryptographic Authentication for Telephony over Voice Channels. In Proceedings of the 25th USENIX Security Symposium, Austin, TX, August 2016. (Acceptance Rate: 15.5%).

In this chapter, we address this problem with AuthLoop.1 AuthLoop provides a strong cryptographic authentication protocol inspired by TLS 1.2. However, unlike other related solutions that assume Internet access (e.g., Silent Circle, RedPhone, etc. [104]–[112]), access to a secondary and concurrent data channel is not guaranteed in many locations (e.g., high-density cities, rural areas) nor for all devices, mandating that a solution to this problem be network agnostic. Accordingly, AuthLoop is designed for and transmitted over the only channel certain to be available to all phone systems: audio. The advantage of this approach is that it requires no changes to any network core, which would likely see limited adoption at best. Through the use of AuthLoop, users can quickly and strongly identify callers who may fraudulently claim to be organizations including their financial institutions and their government [4]. We make the following contributions:

• Design a complete transmission layer: We design the first codec-agnostic modem that allows for the transmission of data across audio channels. We then create a supporting link layer protocol to enable the reliable delivery of data across the heterogeneous landscape of telephony networks.

• Design AuthLoop authentication protocol: After characterizing the bandwidth limitations of our data channel, we specify our security goals and design the AuthLoop protocol to provide explicit authentication of one party (i.e., the “Prover”) and optionally weak authentication of the second party (i.e., the “Verifier”).

• Evaluate performance of a reference implementation: We implement AuthLoop and test it using three representative codecs — G.711 (for PSTN networks), AMR (for cellular networks) and Speex (for VoIP networks). We demonstrate the ability to create a data channel with a goodput of 500 bps and bit error rates averaging below 0.5%. We then demonstrate that AuthLoop can be run over this channel in an average of 9 seconds (which can be played below speaker

1 A name reminiscent of the "Local Loop" used to tie traditional phone systems into the larger network; we seek to tie modern telephony systems into the global authentication infrastructure that has dramatically improved transaction security over the web during the past two decades.

audio), compared to running a direct port of TLS 1.2 in an average of 97 seconds (a 90% reduction in running time).

The remainder of this chapter is organized as follows: Section 5.1 presents the details of our system, including lower-layer considerations; Section 5.2 discusses our security model; Section 5.3 formally defines the AuthLoop protocol and parameterizes our system based on the modem; Section 5.4 discusses our prototype and experimental results; and Section 5.5 provides additional discussion of our system.

5.1 Voice Channel Data Transmission

To provide end-to-end authentication across any telephone network, we need a way to transfer data over the voice channel. The following sections detail the challenges that must be addressed, how we implemented a modem that provides a base data rate of 500 bps, and how we developed a link layer to address channel errors. We conclude with a discussion of what these technical limitations imply for using standard authentication technologies over voice networks.

5.1.1 Challenges to Data Transmission

Many readers may fondly remember dial-up Internet access and a time when data transmission over voice channels was a common occurrence. In the heyday of telephone modems, though, most voice channels were connected over high-fidelity analog twisted pair. Although the voice channel was band limited and digital trunks used a low sample rate of 8kHz, the channel was quite “well behaved” from a digital communications and signal processing perspective. In the last two decades, telephony has been transformed. Cellular voice and Internet telephony now comprise a majority of all voice communications; they are not just ubiquitous, they are unavoidable. While beneficial from a number of perspectives, one of the drawbacks is that both of these modalities rely on heavily compressed audio transmission to save bandwidth. These compression algorithms — audio codecs — are technological feats, as they have permitted cheap, acceptable quality phone calls,


Figure 5-1. This 74 ms modem transmission of a single frame demonstrates how data is modulated and wrapped in headers and footers for synchronization.

especially given that they were developed during eras when computation was expensive. To do this, codec designers employed a number of technical and psychoacoustic tricks to produce audio acceptable to a human ear, and these tricks resulted in a channel poorly suited for (if not hostile to) the transmission of digital data. As a result, existing voice modems are completely unsuited for data transmission in cellular or VoIP networks. Voice codecs present several challenges to a general-purpose modem. First, amplitudes are not well preserved by voice codecs. This rules out many common modulation schemes, including ASK, QAM, TCM, and PCM. Second, phase discontinuities are rare in speech and are not effective for transmitting data through popular voice codecs. This rules out PSK, QPSK, and other modulation schemes that rely on correct phase information. Furthermore, many codecs lose phase information when encoding/decoding audio, preventing the use of efficient demodulators that require correct phase (i.e., coherent demodulators). Because of the problems with amplitude and phase modulation, frequency-shift modulation is the most effective technique for transmitting data through voice codecs. Even so, many codecs fail to accurately reproduce input frequencies, even those well within telephone voicebands (300–3400 Hz). Our physical layer protocol addresses these challenges.

5.1.2 Modem Design

The AuthLoop modem has three goals:

1. Support the highest bitrate possible
2. At the lowest error rate possible

3. In the presence of deforming codecs

We are not the first to address transmission of data over lossy compressed voice channels. Most prior efforts [204]–[206] have focused on transmission over a single codec, though one project, Hermes [207], was designed to support multiple cellular codecs. Unfortunately, that project only dealt with the modulation scheme and did not address system-level issues like receiver synchronization. Furthermore, the published code did not have a complete demodulator, and our own implementation failed to replicate their results. Thus, we took Hermes as a starting point to produce our modem. Most modems are designed around the concept of modulating one or more parameters (amplitude, frequency, and/or phase) of one or more sine waves. Our modem modulates a single sine wave using one of three discrete frequencies (i.e., it is a frequency-shift keying, or FSK, modem). The selection of these frequencies is a key design consideration, and our design was shaped by three criteria. First, our modem is designed for phone systems, so our choice of frequencies is limited to the 300–3400 Hz range because most landline and cellular phones are limited to those frequencies. Second, because we cannot accurately recover phase information for demodulation, our demodulation must be noncoherent; the consequence is that our chosen frequencies must be separated by at least the symbol transmission rate [208]. Third, each frequency must be an integer multiple of the symbol frequency. This ensures that each symbol completes a full cycle, and that each cycle begins and ends on a symbol boundary. This produces a continuous-phase modulation, which is critical because some voice codecs will produce artifacts or aliased frequencies in the presence of phase discontinuities. These constraints led to the selection of a 3-FSK system transmitting symbols at 1000 Hz using frequencies of 1000, 2000, and 3000 Hz.
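The continuous-phase property these constraints buy can be seen in a short sketch. This is an illustrative model, not the AuthLoop modem code: the symbol rate and tone frequencies are from the text, while the 8 kHz sample rate is a standard telephony assumption.

```python
import math

SAMPLE_RATE = 8000           # Hz; typical telephony sampling rate (assumed)
SYMBOL_RATE = 1000           # symbols/second, from the text
TONES = (1000, 2000, 3000)   # Hz; each an integer multiple of the symbol rate

def modulate(symbols):
    """Produce samples for a sequence of tone indices (0, 1, or 2).

    Because every tone completes a whole number of cycles per 1 ms
    symbol, each symbol starts and ends at phase zero, so switching
    tones never creates a phase discontinuity: continuous-phase 3-FSK."""
    samples_per_symbol = SAMPLE_RATE // SYMBOL_RATE  # 8 samples per symbol
    out = []
    for s in symbols:
        f = TONES[s]
        for n in range(samples_per_symbol):
            out.append(math.sin(2 * math.pi * f * n / SAMPLE_RATE))
    return out

wave = modulate([0, 2, 1])
# Every symbol boundary (samples 0, 8, 16, ...) lands exactly on phase 0.
```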
Unfortunately, 3-FSK will still fail to perform in many compressed channels simply because those channels distort frequencies, especially frequencies that change rapidly. To mitigate these issues with FSK, we use a differential modulation: bits are encoded not as

individual symbols, but by the relative difference between two consecutive symbols. For example, a "1" is represented by an increase between two consecutive frequencies, while a "0" is represented by a frequency decrease. Because we only have 3 frequencies available, we have to limit the number of possible consecutive increases or decreases to 2. Manchester encoding, where each bit is expanded into two "half-bits" (e.g., a "1" is represented by "10", and a "0" by "01"), keeps the number of consecutive increases or decreases within this limit. While these details cover the transmission of data, there are a few practical concerns that must be dealt with. Many audio codecs truncate the first few milliseconds of audio. In speech this is unnoticeable, and it simplifies the encoding. However, if the truncated audio carries data, several bits will be lost every transmission. This effect is compounded if voice activity detection (VAD) is used (as is typical in VoIP and cellular networks). VAD distinguishes between audio and silence, and when no audio is recorded in a call, VAD indicates that no data should be sent, saving bandwidth. However, VAD adds an additional delay before voice is transmitted again. To deal with early voice clipping by codecs and VAD, we add a 20 ms header and footer to each packet. This header is a 500 Hz sine wave; this frequency is orthogonal to the other 3 transmission frequencies and is half the symbol rate, meaning it can be used to synchronize the receiver before data arrives. A full modem transmission containing 17 bits of random data can be seen in Figure 5-1. To demodulate data, we must first detect that data is being transmitted. We distinguish silence from a transmission by computing the energy of the incoming signal using a short sliding window (i.e., the short-time energy). Then we locate the header and footer of a message to find the beginning and end of a data transmission.
Finally, we compute the average instantaneous frequency for each half-bit and take the difference between consecutive half-bits. An increase in frequency indicates a 1; a decrease indicates a 0.
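The differential Manchester scheme can be sketched end to end as follows. This is an illustrative model of the encoding rules (bit to half-bit pair to frequency step), not the AuthLoop modem code; starting from the middle tone is an assumption made so the three tone indices suffice.

```python
REF = 1  # start at the middle tone (illustrative assumption)

def encode(bits):
    """Differential Manchester: each bit becomes two half-bits
    (1 -> '10', 0 -> '01'); each '1' half-bit steps the tone up and
    each '0' steps it down. Returns the tone-index sequence (0..2).
    Manchester expansion guarantees at most 2 consecutive steps in
    one direction, so three tones are enough."""
    tones, cur = [], REF
    for b in bits:
        for half in ((1, 0) if b else (0, 1)):
            cur += 1 if half else -1
            tones.append(cur)
    return tones

def decode(tones):
    """Recover each bit from the direction of its first half-bit step."""
    bits, prev = [], REF
    for i in range(0, len(tones), 2):
        bits.append(1 if tones[i] > prev else 0)
        prev = tones[i + 1]
    return bits

msg = [1, 0, 1, 1, 0]
assert decode(encode(msg)) == msg  # round-trips, tones stay within 0..2
```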

5.1.3 Link Layer

Despite a carefully designed modem, reception errors will still occur. These are artifacts created by line noise, the channel codec, or an underlying channel loss (e.g., a lost IP packet). To address these issues, we developed a link layer to ensure reliable transmission of handshake messages. This link layer manages error detection, error correction, frame acknowledgment, retransmission, and reassembly of fragmented messages. Because error rates can sometimes be as high as several percent, a robust retransmission scheme is needed. However, because our available modem data rate is so low, overhead must be kept to a minimum. This rules out most standard transmission schemes that rely on explicit sequence numbers. Instead, our data link layer chunks transmitted frames into small individual blocks that may be checked and retransmitted individually if lost. We are unaware of other link layers that use this approach. The remainder of this subsection motivates and describes this scheme.

5.1.4 Framing and Error Detection

Most link layers are designed to transmit large frames (up to 12,144 bits for Ethernet), and these channels either use large (e.g., 32-bit) CRCs2 for error detection and retransmit the entire frame, or use expensive but necessary error-correcting schemes in lossy media like radio. Error-correcting codes recover damaged data by transmitting highly redundant data, often inflating the data transmitted by 100% or more. The alternative, sending large frames with a single CRC, was unlikely to succeed. To see why, note that:

P(C) = (1 - p)^l    (5–1)

2 A Cyclic Redundancy Check (CRC) is a common checksum formed by representing the data as a polynomial and computing the remainder of polynomial division. The polynomial divisor is a design parameter that must be chosen carefully.

where C is a "correct CRC" event, p is the probability of a single bit error, and l is the length of the CRC. For a 3% bit error rate, the probability of just a 32-bit CRC being undamaged is less than 38%, meaning nearly two-thirds of packets would be dropped for having a bad CRC independent of other errors. Even at lower loss rates, retransmitting whole frames for a single error would cause massive overhead. Instead, we divide each frame into 32-bit "blocks". Each block carries 29 bits of data and a 3-bit CRC. This allows short sections of data to be checked for errors individually and retransmitted, which is closer to optimal transmission. Block and CRC selection was not arbitrary, but rather the result of careful modeling and analysis. In particular, we aimed to find an optimal tradeoff between overhead (i.e., CRC length) and error detection. Intuitively, longer CRCs provide better error detection and reduce the probability of an undetected error. More formally, a CRC of length l can guarantee detection of up to h bit errors3 in a B-length block of data, and can detect more than h errors probabilistically [209]. The tradeoff is maximizing the block size and minimizing the CRC length while minimizing the probability of a loss in the frame and the probability of an undetected error U, represented by the following equations, which take into account the probability L of a lost frame and the probability S of a successful frame:

Pr(L) = 1 - Pr(S)    (5–2)
      = 1 - (1 - p)^B    (5–3)

Pr(U) = 1 - Σ_{i=0}^{h} (B choose i) p^i (1 - p)^(B-i)    (5–4)

3 The Hamming distance of the transmitted and received data
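These probabilities are easy to check numerically. The sketch below evaluates Equations 5-1 through 5-4 for the parameters quoted in the text (a 3% bit error rate against a 32-bit CRC, and 32-bit blocks at p = 0.003); the guaranteed-detection parameter h depends on the chosen CRC polynomial and is left as an input here.

```python
from math import comb

def p_crc_intact(p, l):
    """Eq. 5-1: probability an l-bit CRC arrives with no bit errors."""
    return (1 - p) ** l

def p_frame_lost(p, B):
    """Eq. 5-2/5-3: probability a B-bit block contains at least one error."""
    return 1 - (1 - p) ** B

def p_undetected(p, B, h):
    """Eq. 5-4: probability of more than h errors in a B-bit block,
    an upper bound on undetected error when up to h errors are
    guaranteed detected by the CRC."""
    return 1 - sum(comb(B, i) * p**i * (1 - p)**(B - i)
                   for i in range(h + 1))

# Sanity check from the text: at a 3% bit error rate, a 32-bit CRC
# itself survives intact less than 38% of the time.
print(round(p_crc_intact(0.03, 32), 3))  # 0.377
```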


Figure 5-2. Link layer state machine.

where p represents the probability of a single bit error. The probability of undetected error is derived from the cumulative binomial distribution. Using these equations and the common bit error rate of 0.3% (measured in Section 5.4), we selected 32-bit blocks with a 3-bit CRC. We chose the optimal 3-bit CRC polynomial according to Koopman and Chakravarty [209]. These parameters give a likelihood of undetected error of roughly 0.013%, which will rarely affect a regular user. Even a call center user would see a protocol failure due to an undetected bit error only once every two weeks, assuming 100 calls per day.

5.1.5 Acknowledgment and Retransmission

Error detection is only the first step of the error recovery process, which is reflected as a state machine in Figure 5-2. When a message frame is received, the receiver computes which blocks have an error and sends an acknowledgment frame (“ACK”) to the transmitter. The ACK frame contains a single bit for each block transmitted to indicate if the block was received successfully or not. Blocks that were negatively acknowledged are retransmitted; the retransmission will also be acknowledged by the receiver. This process will continue until all original blocks are received successfully. By using a single bit of acknowledgment for each block we save the overhead of using sequence numbers. However, even a single bit error in an ACK will completely

desynchronize the reassembly of correctly received data. Having meta-ACK and ACK retransmission frames would be unwieldy and inelegant. Instead, we transmit redundant ACK data as a form of error correction; we send the ACK data 3 times in a single frame and take the majority vote of any bits that conflict. The likelihood of a damaged ACK given a probability of bit error p and block count N is approximately:

3Np^2    (5–5)

instead of

1 - (1 - p)^N    (5–6)

Note that there are effectively four distinct types of frames: original data, ACK data, retransmission data, and error frames. We use a four-bit header to distinguish these frames; like ACK data, we send three copies of the header to ensure accurate recovery. We will explore more robust error-correcting codes in future work.

5.1.6 Naïve TLS over Voice Channels

With a modem and link layer design established, we can now examine how a standard authentication scheme, TLS 1.2, would fare over a voice channel. Table 5-1 shows the amount of data in the TLS handshakes of four popular Internet services: Facebook, Google, Bank of America, and Yahoo. These handshakes require from 41,000 to almost 58,000 bits to transmit, and this excludes application data and overhead from the TCP/IP and link layers. At 500 bits per second (the nominal speed of our modem), these transfers would require 83–116 seconds as a lower bound. From a usability standpoint, standard TLS handshakes are simply not practical for voice channels. Accordingly, a more efficient authentication protocol is necessary.

5.2 Security Model

Having demonstrated that data communication is possible but extremely limited via voice channels, we now turn our attention to defining a security model. The combination of our modem and this model can then be used to carefully design the AuthLoop protocol.

Table 5-1. TLS handshake sizes
Site Name        Total Bits    Transmission Time (seconds at 500 bps)
Facebook         41,544        83.088
Google           42,856        85.712
Bank of America  53,144        106.288
Yahoo            57,920        115.840
Average          48,866        97.732
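The transmission-time column is simply handshake size divided by the modem rate; a quick check (sizes from the table, 500 bps rate from the text):

```python
# Handshake sizes in bits, as measured in Table 5-1.
HANDSHAKE_BITS = {"Facebook": 41544, "Google": 42856,
                  "Bank of America": 53144, "Yahoo": 57920}
MODEM_BPS = 500  # nominal modem rate from the text

times = {site: bits / MODEM_BPS for site, bits in HANDSHAKE_BITS.items()}
avg = sum(HANDSHAKE_BITS.values()) / len(HANDSHAKE_BITS) / MODEM_BPS
print(times["Facebook"], avg)  # 83.088 97.732
```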

(0) Initiate Call
(1) V, N_V
(2) P, N_P, C_P, D(K_P^-, P, N_P)
(3) E(K_P^+, S), H(k, 'VRFY', #1, #2)
(4) H(k, 'PROV', #1, #2)
...
(n-1) H(k, V, N_V + n - 1)
(n) H(k, P, N_P + n)

Legend: V: Verifier (mobile); P: Prover (call center); C: Certificate; D: Digital Signature; E: Encryption; H: HMAC; K^+/K^-: Public/Private Key; k: Symmetric Key; N: Nonce; S: Pre-Master Secret

Figure 5-3. The AuthLoop authentication protocol. Solid arrows indicate the initial handshake message flows, and dotted arrows indicate subsequent authenticated "keep alive" messages. Note that #1 and #2 in messages 2 and 3 indicate that the contents of messages 1 and 2 are included in the calculation of the HMAC, as is done in TLS 1.2.
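The authenticated keep-alive messages in the figure are plain HMACs over a party identity and an incrementing nonce. A minimal sketch of that construction follows; the key size, hash choice, and byte encodings here are illustrative assumptions, not AuthLoop's wire format.

```python
import hashlib
import hmac

def keepalive(k: bytes, identity: str, nonce: int) -> bytes:
    """H(k, identity, N + i): HMAC over a party's identity and an
    incremented nonce, proving the holder of k is still on the call."""
    msg = identity.encode() + nonce.to_bytes(8, "big")
    return hmac.new(k, msg, hashlib.sha256).digest()

k = b"\x00" * 32   # symmetric key derived from the handshake (placeholder)
n_v = 7            # Verifier nonce from message (1) (placeholder)
tag = keepalive(k, "V", n_v + 1)
# The other party recomputes the tag with its copy of k to check liveness.
assert hmac.compare_digest(tag, keepalive(k, "V", n_v + 1))
```

Incrementing the nonce on every keep-alive is what prevents an adversary from replaying an earlier tag after silently dropping a participant from the call.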

The goal of AuthLoop is to mitigate the most common enabler of phone fraud: claiming a false identity via Caller ID spoofing. This attack generally takes the form of the adversary calling the victim and extracting sensitive information via social engineering. The attack could also be conducted by sending the victim a malicious phone number to call (e.g., via a spam text or email). An adversary may also attempt to perform a man-in-the-middle attack, calling both the victim and a legitimate institution and then hanging up either call when they wish to impersonate that participant. Finally, an adversary may attempt to perform a call forwarding attack, ensuring that correctly dialed numbers are redirected (undetected by the caller) to a malicious endpoint.

Assumptions. We base our design on the following assumptions. An adversary is able to originate phone calls from any telephony device (i.e., cellular, PSTN, or VoIP) and

spoof their Caller ID information to mimic any phone number of their choosing. Targeted devices will either display this spoofed number or, if they contain a directory (e.g., a contact database on a mobile phone), a name associated or registered with that number (e.g., “Bank of America”). The adversary can play arbitrary sounds over the audio channel, and may deliver either an automated message or interact directly with the targeted user. Lastly, the adversary may use advanced telephony features such as three-way calling to connect and disconnect parties arbitrarily. This model describes the majority of adversaries committing Caller ID fraud at the time of this work. Our scenario contains two classes of participants, a Verifier (i.e., the user) and a Prover (i.e., either the attacker or the legitimate identity owner). The adversary is active and will attempt to assert an arbitrary identity. As is common on the Web, we assume that Provers have certificates issued by their service provider4 containing their public key and that Verifiers may have weak credentials (e.g., account numbers, PINs, etc.) but do not have certificates. We seek to achieve the following security goals in the presence of this adversary:
1. (G1) Authentication of prover: The Verifier should be able to explicitly determine the validity of an asserted Caller ID and the identity of the Prover without access to a secondary data channel.
2. (G2) Proof of liveness: The Prover and Verifier will be asked to demonstrate that they remain on the call throughout its duration.
Note that we do not aim to achieve voice confidentiality. As discussed in Chapter 2, the path between two telephony participants is likely to include a range of codec transformations, making the bitwise representation of voice vary significantly between source and destination. Accordingly, end-to-end encryption of voice content is not currently possible given the relatively low channel bitrate and large impact of transcoding.

4 See Section 5.5 for details.

Solutions such as Silent Circle [112] and RedPhone [110] are able to achieve this guarantee strictly because they are VoIP clients that traverse only data networks and therefore do not experience transcoding. However, as we discuss in Section 5.5, our techniques enable the creation of a low-bandwidth channel that can be used to protect the confidentiality and integrity of weak client authentication credentials.
5.3 AuthLoop Protocol

This section describes the design and implementation of the AuthLoop protocol.
5.3.1 Design Considerations

Before describing the full protocol, this section briefly discusses the design considerations that led to the AuthLoop authentication protocol. As previously mentioned, we are constrained in that there is no fully-fledged Public Key Infrastructure (PKI), meaning that Verifiers (i.e., end users) do not universally possess a strong credential. Moreover, because we are limited to transmission over the audio channel, the AuthLoop protocol must be highly bandwidth efficient. The most natural choice for AuthLoop would be to adopt an authentication protocol such as Needham-Schroeder [210]. Reusing well-understood security protocols has great value. However, Needham-Schroeder is inappropriate because it assumes that both sides have public/private key pairs or can communicate with a third party for session key establishment. Goal G1 is therefore not practically achievable in real telephony systems if Needham-Schroeder is used. This protocol is also unsuitable as it does not establish session keys, meaning that achieving G2 would require frequent re-execution of the entire authentication protocol, which is likely to be highly inefficient. TLS can achieve goals G1 and G2, and already does so for a wide range of traditional applications on the Web. Unfortunately, the handshaking and negotiation phases of TLS 1.2 require significant bandwidth. As we demonstrate in Section 5.1, unmodified use of this protocol can require an average of 97 seconds before authentication can be completed. However, because it can achieve goals G1 and G2, TLS 1.2 is useful as a template for

our protocol, and we discuss what could be considered a highly-optimized version below. We note that while TLS 1.3 provides great promise for reducing handshaking costs, the current draft version requires more bandwidth than the AuthLoop protocol.
5.3.2 Protocol Definition

Figure 5-3 provides a formal definition for our authentication protocol. We describe this protocol below, and provide details about its implementation and parameterization (e.g., algorithm selection) in Section 5.3.4. The AuthLoop protocol begins immediately after a call is terminated.5 Either party, the Prover P (e.g., a call center) or the Verifier V (e.g., the end user) can initiate the call.
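The key derivation and challenge-response computations described below can be sketched with Python's standard hmac module. This is an illustrative mock-up under our own naming, not the dissertation's implementation: the ECC encryption of S, the certificate, the signature, and the real message encodings are all elided.

```python
import hashlib
import hmac
import os

def hmac_tag(key: bytes, *parts: bytes, bits: int = 80) -> bytes:
    """HMAC-SHA256 over the concatenated parts, truncated (AuthLoop uses 80 bits)."""
    mac = hmac.new(key, b"".join(parts), hashlib.sha256).digest()
    return mac[: bits // 8]

# Message 1: V -> P: verifier identity and a 96-bit nonce N_V.
n_v = os.urandom(12)
# Message 2: P -> V: prover nonce N_P (certificate C_P and signature elided).
n_p = os.urandom(12)

# V generates a 256-bit pre-master secret S and derives k = HMAC(S, N_P, N_V);
# S itself travels to P encrypted under P's public key (elided here).
s = os.urandom(32)
k = hmac.new(s, n_p + n_v, hashlib.sha256).digest()

msg1, msg2 = n_v, n_p  # stand-ins for the full contents of messages 1 and 2
# Message 3: V -> P: proves knowledge of k and binds messages 1 and 2.
vrfy = hmac_tag(k, b"VRFY", msg1, msg2)
# Message 4: P -> V: P derives the same k from the decrypted S and answers.
prov = hmac_tag(k, b"PROV", msg1, msg2)

def keep_alive(key: bytes, identity: bytes, nonce: bytes, i: int) -> bytes:
    """Liveness proof: HMAC over an incrementing nonce, as in messages (n-1), (n)."""
    counter = (int.from_bytes(nonce, "big") + i).to_bytes(13, "big")
    return hmac_tag(key, identity, counter)
```

The distinct 'VRFY' and 'PROV' labels ensure that neither side can replay the other's tag back to it.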

V then transmits its identity (i.e., phone number) and a nonce NV to P. Upon receiving this message, P transmits a nonce NP, its certificate CP, and signs the contents of the message to bind the nonce to its identity. Its identity, P, is transmitted via Caller ID and is also present in the certificate. V then generates a pre-master secret S, and uses S to generate a session key k, which is the result of HMAC(S, NP, NV). V then extracts P's public key from the certificate, encrypts S using that key, and then computes HMAC(k, 'VRFY', #1, #2), where 'VRFY' is a literal string, and #1 and #2 represent the contents of messages 1 and 2. V then sends the encrypted S and the HMAC to P. P decrypts the pre-master secret and uses it to similarly calculate k, after which it calculates HMAC(k, 'PROV', #1, #2), which it then returns to V. At this time, P has demonstrated knowledge of the private key associated with the public key included in its certificate, thereby authenticating the asserted identity. If the Prover does not provide the correct response, its claim of the Caller ID as its identity is rejected. Security goal G1 is therefore achieved. Moreover, P and V now share a session

5 This is the telephony term for “delivered to its intended destination” and signifies the beginning of a call, not its end.

key k, which can be subsequently used to provide continued and efficient proofs (i.e., HMACs over incrementing nonces) that they remain on the call, thereby achieving Goal G2. We note that the session key generation step between messages 2 and 3 can be extended to provide keys for protecting confidentiality and integrity (as is done in most TLS sessions). While these keys are not of value for voice communications (given the narrow bitrate of our channel), they can be used to protect client authentication credentials. We discuss this in greater detail in Section 5.5.
5.3.3 Formal Verification

We use the ProVerif v1.93 [211] automatic cryptographic protocol verifier to reason about the security of the AuthLoop handshake. ProVerif models protocols in the applied pi calculus, which it translates into Horn clauses to reason about secrecy and authentication in the Dolev-Yao setting. AuthLoop was represented by a total of 60 lines of code, and ProVerif verified the secrecy of the session key k.
5.3.4 Implementation Parameters

Figure 5-4 provides an accounting of every bit used in each message of the AuthLoop protocol. Given the tight constraints on the channel, we use the following parameters and considerations to implement our protocol as efficiently as possible while still providing strong security guarantees. We use elliptic curve cryptography for public key primitives. We used the Pyelliptic library for Python [212], which is a Python wrapper around OpenSSL. Keys were generated on curve sect283r1; keys on this curve provide security equivalent to 3456-bit RSA [213]. For keyed hashes, we use SHA-256 as the underlying hash function for HMACs. To reduce transmission time, we compute the full 256-bit HMAC and truncate the result to 80 bits. Because the security factor of HMAC is dependent almost entirely

on the length of the hash, this truncation maintains a security factor of 2^80 [214]. This

security factor is a commonly accepted safe value [215] for the near future, and as our data transmission improves, the security factor can increase as well.

Figure 5-4. AuthLoop message sizes.

While similar to TLS 1.2, we have made a few important changes to reduce overhead. For instance, we do not perform cipher suite negotiation in every session and instead assume the default use of AES256-GCM and SHA256. Our link layer header contains a bit field indicating whether negotiation is necessary; however, it is our belief that starting with strong defaults and negotiating only in the rare scenario where it is necessary is critical to saving bandwidth for AuthLoop. Similarly, we are able to exclude additional optional information (e.g., compression types supported) and the rigid TLS Record format to ensure that our overhead is minimized. We also limit the contents of certificates. Our certificates consist of a protocol version, the prover's phone number, claimed identification (i.e., a name), a validity period, a unique certificate identification number, the certificate owner's ECC public key, and a signature. Because certificate transmission comprises nearly half of the total transmission

time, we implemented two variants of AuthLoop: the standard handshake and a version with a verifier-cached certificate. Certificate caching enables a significantly abbreviated handshake. For certificate caching, we include a 16-bit certificate identifier that the verifier sends to the prover to identify which certificate is cached. We discuss how we limit the transmitted certificate chain size to a single certificate in Section 5.5. Finally, we keep the most security-sensitive parameters as defined in the TLS specification, including recommended sizes for nonces (96 bits). While our protocol implementation significantly reduces the overhead compared to TLS 1.2 for this application, there is still room for improvement. In particular, the encrypted pre-master secret requires 1224 bits for the 256-bit pre-master secret. This expansion is due to the fact that with ECC one must use a hybrid encryption model, the Integrated Encryption Scheme (IES), so a key share must be transmitted separately from the encrypted data. Pyelliptic also includes a SHA-256 HMAC of the ECC key share and encrypted data to ensure integrity of the message (which is standard practice in IES). Because the message already includes an HMAC, in future work we plan to save 256 bits (or 15% of the cached certificate handshake) by folding the HMAC of the ECC share into the message HMAC.
5.4 Evaluation

Previous sections established the need for a custom authentication protocol using a voice channel modem to provide end-to-end authentication for telephone calls. In this section, we describe and evaluate our prototype implementation. In particular, we characterize the error performance of the modem across several audio codecs, compute the resulting actual throughput after layer-2 effects are taken into account, and finally measure the end-to-end timing of complete handshakes.
5.4.1 Prototype Implementation

Our prototype implementation consists of software implementing the protocol, link layer, and modem running on commodity PCs. While we envision that AuthLoop will

eventually be a stand-alone embedded device or implemented in telephone hardware/software, a PC served as an ideal prototyping platform to evaluate the system. We implemented the AuthLoop protocol in Python using the Pyelliptic library for cryptography. We also implemented the link layer in Python. Our modem was written in Matlab, and that code is responsible for modulating data, demodulating data, and sending and receiving samples over the voice channel. We used the Matlab Engine for Python to integrate our modem with Python. Our choice of Matlab facilitated rapid prototyping and development of the modem, but the Matlab runtime placed a considerable load on the PCs running the prototype. Accordingly, computation results, while already acceptable, should improve for embedded implementations. We evaluate the modem and handshake using software audio channels configured to use one of three audio codecs: G.711 (µ-law), Adaptive Multi-Rate Narrowband (AMR-NB), and Speex. These particular codecs were among the most common codecs used for landline audio compression, cellular audio, and VoIP audio, respectively. We use the sox [190] implementations of G.711 and AMR-NB and the ffmpeg [216] implementation of Speex. We use software audio channels to provide a common baseline of comparison, as no VoIP client or cellular device supports all of these codecs. As link layer performance depends only on the bit error characteristics of the modem, we evaluate the link layer using a software loopback with tunable loss characteristics instead of a voice channel. This allowed us to fully and reproducibly test and evaluate the link layer.
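A software loopback with tunable loss can be approximated in a few lines. The sketch below is our own minimal stand-in (not the dissertation's Matlab/Python code): it flips each bit independently with a configurable probability and measures the resulting bit error rate over a 2,000-bit frame.

```python
import random

def loopback(frame: list[int], ber: float, rng: random.Random) -> list[int]:
    """Tunable-loss loopback: flip each bit independently with probability `ber`."""
    return [bit ^ (rng.random() < ber) for bit in frame]

def bit_error_rate(sent: list[int], received: list[int]) -> float:
    """Fraction of bit positions that differ between the two frames."""
    return sum(a != b for a, b in zip(sent, received)) / len(sent)

rng = random.Random(0)
frame = [rng.getrandbits(1) for _ in range(2000)]   # one 2,000-bit frame
received = loopback(frame, ber=0.01, rng=rng)
print(f"measured BER: {bit_error_rate(frame, received):.4f}")  # near the 1% target
```

Seeding the generator makes a given loss pattern reproducible, which is what enables the repeatable link layer experiments described here.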

Table 5-2. Bit error rates.
Codec    Average Bit Error  Std. Dev.
G.711    0.0%               0.0%
AMR-NB   0.3%               0.2%
Speex    0.5%               5%

Table 5-3. Link layer transmission of 2,000 bits.
Bit Error Rate  Transmission Time  Goodput
0.1%            4.086 s (0.004)    490 bps
1%              6.130 s (0.009)    326 bps
2%              11.652 s (0.007)   172 bps
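The goodput figures in Table 5-3 follow from dividing the 2,000 application bits by the measured transmission time; comparing against the optimal case (500 bps plus the link layer's 40 ms of header and footer, per Section 5.4.3) reproduces the reported fractions of optimal goodput to within rounding. A sketch:

```python
# Goodput for 2,000 application bits from Table 5-3's measured times, versus
# the optimal case: bits at 500 bps plus 40 ms of link-layer header and footer.
APP_BITS = 2_000
OPTIMAL_TIME = APP_BITS / 500 + 0.040        # 4.040 s
OPTIMAL_GOODPUT = APP_BITS / OPTIMAL_TIME    # roughly 495 bps

measured_seconds = {"0.1%": 4.086, "1%": 6.130, "2%": 11.652}

for ber, t in measured_seconds.items():
    goodput = APP_BITS / t
    print(f"BER {ber}: {goodput:.0f} bps, {goodput / OPTIMAL_GOODPUT:.1%} of optimal")
```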

5.4.2 Modem Evaluation

The most important characteristic of the modem is its resistance to bit errors. To measure bit error, we transmit 100 frames of 2,000 random bits6 each and measure the bit error after reception. Table 5-2 shows the average and standard deviation of the bit error rate for each codec. The modem saw no bit errors on the G.711 channel; this reflects the fact that G.711 is a high-quality channel with very minimal processing and compression. AMR-NB and Speex both saw minimal bit error as well, though Speex had a much higher variance in errors: one frame was truncated, resulting in a higher average error despite the fact that the other 99 frames were received with no error.
5.4.3 Link Layer Evaluation

The most important characteristic of the link layer is its ability to optimize goodput — the actual amount of application data transmitted per unit time (removing overhead from consideration). Table 5-3 shows, as a function of bit error rate, the transmission time and goodput of the protocol compared to the theoretical optimal transmission time and goodput. The

6 2,000 bits was chosen as the first “round” number larger than the largest message in the AuthLoop handshake.

Table 5-4. Handshake completion times.
Codec    Cached Certificate  Certificate Exchanged
G.711    4.463 s (0.000)     8.279 s (0.000)
AMR-NB   5.608 s (0.776)     10.374 s (0.569)
Speex    4.427 s (0.000)     8.279 s (0.000)
Average  4.844 s             8.977 s

optimal numbers are computed from the optimal bit time (at 500 bits per second) plus 40 ms of header and footer. The experimental numbers are the average over transmission of 50 messages of 2,000 bits each. The table shows that in spite of high bit error rates (up to 2%), the link layer is able to complete message transmission. Of course, the effect of bit errors on goodput is substantial at larger rates. Fortunately, low bit error rates (e.g., 0.1%) result in a minor penalty to goodput — only 5 bps lower than the optimal rate. Higher rates have a more severe impact, resulting in 65.8% and 34.7% of optimal goodput for 1% and 2% loss, respectively. Given our observed bit error rates of less than 0.5% for all codecs, these results demonstrate that our link layer retransmission parameters are set within an acceptable range.
5.4.4 Handshake Evaluation

To evaluate the complete handshake, we measure the time from handshake start to handshake completion from the verifier's perspective. We evaluate both variants of the handshake: with and without the prover sending a certificate. Handshakes requiring a certificate exchange will take much longer than handshakes without one; this is a natural consequence of simply sending more data. Table 5-4 shows the total handshake times for calls over each of the three codecs. These results are averaged over 10 calls each. Note that these times are corrected to remove the effects of instrumentation delays and artificial delays caused by interprocess communication among the different components of our prototype that would be removed or consolidated in deployment.

From the verifier's perspective, we find that cached-certificate exchanges are quite fast, averaging 4.844 seconds across all codecs. When certificates are not cached, our overall average time is 8.977 seconds. Differences in certificate exchange times across codecs are caused by the relative underlying bit error rate of each codec. G.711 and Speex have much lower error rates than AMR-NB, and this results in a lower overall handshake time. In fact, because those codecs saw no errors during the tests, their execution times were virtually identical. Most of the time spent in the handshake is spent transmitting messages over the voice channel: transmission time accounts for 99% of the handshake time, while computation and miscellaneous overhead average less than 50 milliseconds across all messages. This indicates that AuthLoop is computationally minimal and can be implemented on a variety of platforms.
5.5 Discussion

This section provides a discussion of client authentication, public key infrastructure, and deployment considerations for AuthLoop.
5.5.1 Client Credentials

Up until this point, we have focused our discussion on strong authentication of one party in the phone call (i.e., the Prover). However, clients already engage in a weaker “application-layer” authentication when talking to many call centers. For instance, when calling a financial institution or ISP, users enter their account number and additional values such as PINs and social security numbers. Without one final step, our threat model would allow an adversary to steal such credentials as follows: an adversary would launch a three-way call to both the victim client and the targeted institution. After passively observing the successful handshake, the adversary could capture the client's credentials (i.e., DTMF tone inputs) and hang up both ends of the call. The adversary could then call the targeted institution back, spoofing the victim's Caller ID, and present the correct credentials.
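As the next paragraph explains, the AuthLoop session key closes this hole by protecting the client's credentials in transit. A toy stdlib-only sketch of the idea (function and variable names are ours; a real deployment would use the AES-256-GCM default assumed in Section 5.3.4, not this HMAC-derived keystream):

```python
import hashlib
import hmac
import os

def protect_credentials(session_key: bytes, digits: str) -> bytes:
    """Toy encrypt-then-MAC of DTMF credential digits under a session key.

    Illustration only: handles at most 32 bytes of digits, and a real
    deployment would use AES-256-GCM rather than an HMAC keystream.
    """
    nonce = os.urandom(12)
    keystream = hmac.new(session_key, b"enc" + nonce, hashlib.sha256).digest()
    ct = bytes(a ^ b for a, b in zip(digits.encode(), keystream))
    tag = hmac.new(session_key, b"mac" + nonce + ct, hashlib.sha256).digest()[:10]
    return nonce + ct + tag

# A captured blob is useless in a later session: the new handshake derives a
# fresh session key, so the integrity check fails on replay.
k1, k2 = os.urandom(32), os.urandom(32)
blob = protect_credentials(k1, "123456#")
nonce, ct, tag = blob[:12], blob[12:-10], blob[-10:]
replay_tag = hmac.new(k2, b"mac" + nonce + ct, hashlib.sha256).digest()[:10]
assert not hmac.compare_digest(tag, replay_tag)  # rejected under the new key
```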

One of the advantages of TLS is that it allows for the generation of multiple session keys, for use not only in continued authentication, but also in the protection of data confidentiality and integrity. AuthLoop is no different. While the data throughput enabled by our modem is low, it is sufficient to carry encrypted copies of client credentials. Accordingly, an adversary attempting to execute the above attack would be unable to do so, because this sensitive information can easily be passed through AuthLoop (and is therefore useless in a second session). Moreover, because users are already accustomed to entering such information when interacting with these entities, the user experience could continue without any observable difference.
5.5.2 Telephony Public Key Infrastructure

One of the most significant problems facing SSL/TLS is its trust model. X.509 certificates are issued by a vast number of Certificate Authorities (CAs), whose root certificates can be used to verify the authenticity of a presented certificate. Unfortunately, the unregulated nature of who can issue certificates to whom (i.e., what authority does X have to verify and bind names to entity Y?) and even who can act as a CA has been known since the inception of the current Public Key Infrastructure [138]. This weakness has led to a wide range of attacks, enabling both the mistaken identification of domain owners and confusion as to which root-signed certificate can be trusted. Traditional certificates present another challenge in this environment: the existence of long verification chains, combined with the bitrate-limited audio channel, means that blind adoption of the Internet's traditional PKI model will simply fail if applied to telephony systems. As we demonstrated in our experiment in Table 5-1, transmitting the entirety of long certificate chains would simply be detrimental to the performance of AuthLoop. The structure of telephony networks leads to a natural, single-rooted PKI system. Competitive Local Exchange Carriers (CLECs) are assigned blocks of phone numbers by the North American Numbering Plan Association (NANPA), and ownership of these

Figure 5-5. The Telephony Public Key Infrastructure (TPKI). Unlike the Internet model, the TPKI has a single root (NANPA), which is responsible for all block allocation, and a limited second level of CLECs who administer specific numbers. Accordingly, only the certificate for the number claimed in the current call needs to be sent during the handshake.

blocks is easily confirmed through publicly posted resources such as NPA/NXX databases in North America. A similar observation was recently made in the secure Internet routing community, and resulted in the proposal of the Resource Public Key Infrastructure (RPKI) [217]. The advantage of this approach is that because all allocation of phone numbers is conducted under the ultimate authority of NANPA, all valid signatures on phone numbers must ultimately be rooted in a NANPA certificate. This Telephony Public Key Infrastructure (TPKI) reduces the length of certificate chains and allows us to easily store the root and all CLEC certificates in the US and associated territories (≈700 [218]) in just over 100 KiB of storage (1,600 bits per certificate × 700 certificates). Alternatively, if certificates are only needed for toll-free numbers, a single certificate for the company that administers all such numbers (i.e., Somos, Inc.) would be sufficient. Figure 5-5 shows the advantages of our approach. Communicating with a specific server (xyz.bankofamerica.com) may require the transmission of three or more certificates before identity can be verified. Additionally, the existence of different roots adds confusion

to the legitimacy of any claimed identity. Our proposed TPKI relies on a single NANPA root, and takes advantage of the relatively small total number of CLECs to require only a single certificate, for the calling number, to be transmitted during the handshake. Further specifying details of the proposed TPKI (e.g., revocation) presents an opportunity for interesting future work.
5.5.3 Deployment Considerations

As our experiments demonstrate that AuthLoop is bandwidth and not processor bound, we believe that these techniques can be deployed successfully across a wide range of systems. For instance, AuthLoop can be embedded directly into new handset hardware. Moreover, it can be used immediately with legacy equipment through external adapters (e.g., Raspberry Pi). Alternatively, AuthLoop could be loaded on mobile devices through a software update to the dialer, enabling large numbers of devices to immediately benefit. Full deployments have the opportunity to make audio signaling of AuthLoop almost invisible to the user. If AuthLoop is in-line with the call audio, the system can remove AuthLoop transmissions from the audio sent to the user. In other words, users will never hear the AuthLoop handshakes or keep-alive messages. While our current strategy is to minimize the volume of the signaling so as to not interrupt a conversation (as has been done in other signaling research [219]), we believe that the in-line approach will ultimately provide the greatest stability and least intrusive user experience. Lastly, we note that because AuthLoop is targeted across all telephony platforms, a range of security indicators will be necessary for successfully communicating authenticated identity to the user. However, given the limitations of space and the breadth of devices and their interfaces, we leave this significant exploration to our future work.

CHAPTER 6
EFFICIENT IDENTITY AND CONTENT AUTHENTICATION FOR PHONE CALLS

Telephones have remained of paramount importance to society since their invention 140 years ago, and they are especially important for sensitive business communications, whistleblowers and journalists, and as a reliable fallback when other communication systems fail. However, as we have discussed, these networks were never designed to provide end-to-end authentication or integrity guarantees. Adversaries with minimal technical ability regularly take advantage of this fact by spoofing Caller ID, a vulnerability enabling over $7 billion in fraud in 2015 [44]. More capable adversaries can exploit weaknesses in core network protocols such as SS7 to reroute calls and modify content [220]. Unlike the web, where mechanisms such as TLS protect data integrity and allow experts to reason about the identity of a website, the modern telephony infrastructure simply provides no means for anyone to reason about either of these properties. In this chapter, we discuss AuthentiCall, a system designed to provide end-to-end guarantees of authentication and call content integrity over modern phone systems (e.g., landline, cellular, or VoIP). While most phones have access to some form of data connection, that connection is often not robust or reliable enough to support secure VoIP phone calls. AuthentiCall uses this often low-bitrate data connection to mutually authenticate both parties of a phone call before the call is answered. Even in the worst case, this authentication adds at most a negligible 1.4 seconds to call establishment. Once a call is established, AuthentiCall binds the call audio to the original authentication using specialized, low-bandwidth digests of the speech in the call. These digests protect the integrity of call content and can distinguish legitimate audio

Text of this chapter is reprinted with permission from Bradley Reaves, Logan Blue, Hadi Abdullah, Luis Vargas, Patrick Traynor, and Tom Shrimpton. AuthentiCall: Efficient Identity and Content Authentication for Phone Calls. In Proceedings of the 26th USENIX Security Symposium, Vancouver, BC, August 2017. (Acceptance Rate: 16.3%).

modifications attributable to the network from 99% of maliciously tampered call audio, even while a typical user would expect to see a false positive only once every six years. Our system is the first to use these digests to ensure that received call audio originated from the legitimate source and has not been tampered with by an adversary. Most critically, AuthentiCall provides these guarantees for standard telephone calls without requiring changes to any core network. This chapter makes the following contributions:

• Designs channel binding and authentication protocols: We design and implement protocols that bind identities to phone numbers, mutually authenticate both parties of a phone call, and protect call content in transit.

• Evaluates robust speech digests for security: We show that proposed constructions for digesting speech data in systems that degrade audio quality can be made effective in adversarial settings in real systems.

• Evaluates call performance in real networks: Our prototype implementation shows that the techniques pioneered in AuthentiCall are practical and performant, adding only 1.4 seconds in the worst case to phone call establishment in typical settings.

We are not the first to address this problem [104]–[106], [110], [111], [125], [221], [222]. However, other approaches have relied upon weak heuristics, fail to protect phone calls using the public telephone network, are not available to end users, neglect to protect call content, are trivially evaded, or add significant delay to call establishment. AuthentiCall is the only system that authenticates phone calls and content with strong cryptography in the global telephone network with negligible latency and overhead. The remainder of this chapter is organized as follows: Section 6.1 describes our assumptions about adversaries and our security model in detail; Section 6.2 gives a formal specification of the AuthentiCall system; Section 6.3 discusses how analog speech digests can be used to achieve call content integrity; Section 6.4 provides details of the implementation of our system; Section 6.5 shows the results of our experiments; and Section 6.6 offers additional discussion.


Figure 6-1. Broad overview of attacks possible on Caller ID and call content in current telephony landscape.

6.1 Security Model

In order to authenticate voice calls and content, AuthentiCall must face adversaries with a range of capabilities. The simplest adversary will attempt to commit phone fraud by spoofing Caller ID when calling a target. An equivalent form of this attack occurs when the adversary tricks the target into calling an arbitrary number under the adversary's control (e.g., via spam or phishing) and claims to represent some other party (e.g., a financial institution). Additionally, this adversary may perform a call forwarding attack, which forces a target calling a legitimate number to be redirected to the adversary. Lastly, the adversary may place a voice call concurrent with other legitimate phone calls in order to create a race condition to see which call arrives at the destination first. In all of these cases, the goal of the adversary is to claim another identity for the purpose of extracting sensitive information (e.g., bank account numbers, usernames, and passwords). A more sophisticated adversary may gain access to a network core via vulnerabilities in systems such as SS7 [220] or improperly protected legal wiretapping infrastructure [22]. This adversary can act as a man-in-the-middle and is therefore capable of redirecting calls to an arbitrary endpoint, acting as an arbitrary endpoint, hanging up one side of a call at any point in time, and removing/injecting audio to one or both sides. Such an adversary is much more likely to require nation-state-level sophistication, but exists nonetheless. Examples of both classes of adversary are shown in Figure 6-1.

Given that the bitwise encoding of audio is unlikely to be the same at each endpoint, end-to-end encryption is not a viable means of protecting call content or integrity across the heterogeneous telephony landscape. Moreover, while we argue that the majority of phones have access to at least a low-bandwidth data connection, solutions that demand high-speed data access at all times (i.e., pure VoIP calls) do not offer solutions for the vast majority of calls (i.e., cellular calls). Finally, we claim no ability to make changes throughout the vast and disparate technologies that make up the core networks of modern telephony, and instead focus strictly on addressing this problem in an end-to-end fashion. We define four participants: the Caller (R), the Callee (E), the Server (S), and the Adversary (Adv). Callers and Callees will register with the AuthentiCall service as described in the next section and will generate credentials1 that include a public key. AuthentiCall will achieve the following security goals in the presence of the above-described adversaries:
1. (G1) Proof of number ownership: During the process of registration, R will actively demonstrate ownership of its claimed Caller ID to S before it receives a signed certificate.
2. (G2) Authentication of the caller: E will be able to cryptographically verify the identity of R prior to accepting an incoming call.
3. (G3) Authentication of the callee: R will be able to cryptographically verify the identity of E as soon as the call begins.
4. (G4) Integrity protection of call content: Both R and E will be able to verify that the analog voice content has not been meaningfully altered, and that new content has not been injected by a man in the middle. Additionally, both will be protected against concurrent call attacks.

1 The details of which are described in depth in Section 6.2.

5. (G5) Proof of liveness: Both R and E will be able to detect if the other party is no longer on the call, perhaps as the result of a man in the middle attempting to engage in the call after the initial authentication phase.

6.2 Protocol Design and Evaluation

In the previous section, we saw that AuthentiCall has five security goals to meet, and this section describes the three protocols that AuthentiCall uses to achieve these goals: the Enrollment, Handshake, and Call Integrity protocols. These protocols make use of certificates issued to each client that indicate that a particular client controls a specific phone number. In Chapter 5 we proposed a full public key infrastructure for telephony [222] called a "TPKI" that would have as its root the North American Numbering Plan Administration, with licensed carriers acting as certificate authorities. This PKI would issue an authoritative certificate that a phone number is owned by a particular entity, and AuthentiCall could enforce that calls take place between the entities specified in those certificates. While AuthentiCall can leverage the proposed TPKI, a fully-deployed TPKI is not necessary, as AuthentiCall can act as its own certificate authority (this is discussed further in the enrollment protocol).

All of these protocols make use of a client-server architecture, where an AuthentiCall server acts as either an endpoint or intermediary between user clients. There are several reasons for this design choice. First, having a centralized relay simplifies the development of AuthentiCall. Second, it allows the server to prevent abuses of AuthentiCall, like robodialing by a single party, by implementing rate limiting. The server can authenticate callers before allowing messages to be transmitted, providing a mechanism for banning misbehaving users. Finally, all protocols (including handshake and enrollment) implement end-to-end cryptography. Assuming the integrity of the AuthentiCall certificate authority infrastructure and the integrity of the client, no other entity in the AuthentiCall network can read or fabricate protocol messages. We also assume that all communications between clients and servers use a secure TLS configuration with server authentication.
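As a minimal illustration of this last assumption, a client-side configuration enforcing server-authenticated TLS can be sketched with Python's standard ssl module; the server name in the comment is hypothetical, and the minimum protocol version is an assumed policy choice:

```python
import ssl

# Build a client-side TLS context that verifies the server's certificate
# chain and host name (server authentication), as AuthentiCall assumes.
context = ssl.create_default_context()            # CERT_REQUIRED + hostname checks
context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocol versions

# The context would then wrap the client socket, e.g. (host name hypothetical):
#   with socket.create_connection(("authenticall.example.org", 443)) as sock:
#       with context.wrap_socket(sock, server_hostname="authenticall.example.org") as tls:
#           ...  # exchange protocol messages
```

Note that `create_default_context` already requires certificate validation and host-name checking for client use; no further configuration is needed for server authentication.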

Our protocols have another goal: no human interaction except for choosing to accept a call. There are two primary reasons for this. First, it is well established that ordinary users (and even experts) have difficulty executing secure protocols correctly [223]. Second, in other protocols that rely on human interaction, the human element has been shown to be the most vulnerable [115]. The following subsections detail the three protocols in AuthentiCall. The first protocol, the enrollment protocol, ensures that a given AuthentiCall user actually controls the phone number they claim to own (G1). The enrollment protocol also issues a certificate to the user. The second protocol, the handshake protocol, mutually authenticates two calling parties at call time (G2 and G3). The final protocol, the call integrity protocol, ensures the security of the voice channel and the content it carries (G4 and G5).

6.2.1 Enrollment Protocol

The enrollment protocol ensures that a client controls a claimed number and establishes a certificate that binds the identity of the client to a phone number. For our purposes, "identity" may be a user's name, organization, or any other pertinent information. Binding the identity to a phone number is essential because phone numbers are used as the principal basis of identity and routing in phone networks, and they are also used as such in AuthentiCall. The enrollment protocol is similar to other certificate issuing protocols but with the addition of a confirmation of control of the phone number. Figure 6-2 shows the details of the enrollment protocol. The enrollment protocol has two participants: a client C and an AuthentiCall enrollment server S. In message 1, C sends an enrollment request with S's identity, C's identity information, C's phone number, and C's public key. In message 2, the server sends a nonce NNet, the identities of C and S, and the phone numbers of C and S with a timestamp to ensure freshness and liveness and to provide a "token" for this particular authentication session.

(1) C → S (data): ID(C), PhNum(C), ID(S), K+C
(2) S → C (data): NNet, ID(C), PhNum(C), ID(S), PhNum(S), TS
(3) S → C (audio): NAudio
(4) C → S (data): NAudio, NNet, ID(C), PhNum(C), ID(S), TS, SignKC
(5) S → C (data): Cert(ID(C), PhNum(C), K+C, SignKS)

Figure 6-2. Our enrollment protocol confirms phone number ownership and issues a certificate.

In message 3, the server begins to confirm that C controls the phone number it claims. The number is confirmed when S places a call to C's claimed phone number. When the call is answered, S transmits the nonce NAudio over the voice channel. Having S call C is a critical detail because intercepting calls is far more difficult than spoofing a source number.2 Using a voice call is important because it will work for any phone – including VoIP devices that may not have SMS access.

In message 4, C sends both NNet and NAudio along with the IDs of the server and client, a timestamp, and a signature covering all other fields. This final message confirms three things: possession of NNet, the ability to receive a call (by providing NAudio), and possession by C of the private key KC by virtue of signing the message. In message 5, S replies with a signed certificate issued to C. This completes the enrollment protocol.
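A sketch of how S might validate message 4 is shown below. The field names, the freshness window, and the `verify_sig` callback are illustrative assumptions; the source does not fix a message encoding or a particular signature scheme, so signature verification is abstracted away:

```python
import time

# Hypothetical sketch of the server-side check of enrollment message 4.
# Signature verification is abstracted behind `verify_sig`, since the
# actual scheme and certificate format are defined by AuthentiCall's CA.
FRESHNESS_WINDOW = 60  # seconds; an assumed policy value

def check_enrollment_msg4(msg, expected_n_net, expected_n_audio, verify_sig):
    """Return True iff message 4 proves control of the number and key."""
    if msg["n_net"] != expected_n_net:        # proves receipt of message 2
        return False
    if msg["n_audio"] != expected_n_audio:    # proves the client answered the call
        return False
    if abs(time.time() - msg["ts"]) > FRESHNESS_WINDOW:  # replay protection
        return False
    # The signature covers all other fields, proving possession of the
    # private key corresponding to the enrolled public key.
    signed_fields = (msg["n_audio"], msg["n_net"], msg["id_c"],
                     msg["phnum_c"], msg["id_s"], msg["ts"])
    return verify_sig(msg["pubkey"], signed_fields, msg["sig"])
```

All three checks must pass before S issues a certificate in message 5; failing any one of them aborts enrollment.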

2 We will revisit the threat of call interception later in this subsection.

We note that this protocol is subject to the same limitations on certifying identity as every other Internet certificate authority. In particular, we will require an out-of-band process to verify identity for high-value certificates, and will require the ability to authenticate supporting documentation. AuthentiCall can also use other authoritative information sources like CNAM3 lookups to verify number ownership in some cases. While no system or process is perfect, these types of policies have been largely effective on the Internet.

We also note that this is a trust-on-first-use (TOFU) protocol. While the protocol is secure in the presence of passive adversaries on both the data and voice networks, if an adversary can actively intercept a call addressed to a victim phone number (and also supply any out-of-band identity confirmation), they may be able to obtain a certificate for a number they illicitly control. If a TPKI were deployed, this attack would not be possible. Even without a TPKI, the likelihood of a successful attack is limited, because the attack would eventually be detected by the legitimate owner when they attempt to register or authenticate using the legitimate number. To further protect against this attack, our protocol meets an additional goal: human interaction is not required for enrollment or for confirming control of the claimed phone number. This means that automatic periodic reverification of phone number control is possible. This is important to prevent long-term effects of a brief phone number compromise, but also for more mundane issues like when phone numbers change ownership.

3 CNAM is the distributed database maintained by carriers that maps phone numbers to the names presented in traditional caller ID. While spoofing a number is trivial, CNAM lookups occur out-of-band to call signaling and results could only be spoofed by a carrier, not a calling party.

(1) R → S: Call PhNum(E)
(2) S → E: Incoming call from R
(3) S → R: E ∈ AuthentiCall Users?
(4a) R → E: ID(R), PhNum(R), Cert(R), TS1, NR, DHR, SignKR
(4b) E → R: ID(E), PhNum(E), Cert(E), TS2, NE, DHE, SignKE
(5a) R → E: HMACKER1(msg4a, msg4b, "Caller")
(5b) E → R: HMACKER2(msg4a, msg4b, "Callee")

Figure 6-3. Our handshake protocol mutually authenticates both parties.

6.2.2 Handshake Protocol

The handshake protocol takes place when a caller intends to contact a callee. In this protocol, the caller places a voice call over the telephone network while simultaneously using a data connection to conduct the handshake protocol. The handshake protocol consists of two phases. The first indicates to the AuthentiCall server and the called party that a call is imminent. The second phase authenticates both parties on the call and establishes shared secrets. These secrets are only known end-to-end and are computed in a manner that preserves perfect forward secrecy. Figure 6-3 shows the handshake protocol.

The first phase consists of messages 1–3. In message 1, a caller R indicates to an AuthentiCall server S that R would like to place a call to the callee E. In message 2, S informs the callee E that an authenticated voice call is incoming. In message 3, S informs R whether E is an AuthentiCall user or not, but does not provide information about E's presence or availability. Message 3 has several aims. The first is to protect the privacy of E. A strawman mechanism to protect privacy is for AuthentiCall to provide no information about E until E agrees to accept the call. However, this presents a problem: if an adversary tampers with or blocks messages from E, it prevents E from participating in the handshake, and R would have to assume (in the

absence of outside knowledge) that E is not a participant in AuthentiCall. This would allow an adversary to evade AuthentiCall. To solve this problem, S simply indicates to R whether or not R should expect to complete an AuthentiCall handshake for this call if E is available and chooses to accept the call. This reveals only E's preference to authenticate a phone call, and nothing about her availability or whether she has even chosen to accept or reject a call. Protecting this information is important because if an unwanted caller knows that a user is available, they may call repeatedly or use that information in other undesirable ways (e.g., harassment or telemarketing). If message 3 indicates that E is an AuthentiCall user but E does not choose to accept the call, R must simply wait for the call request to time out. From R's perspective, this is no different from dialing and waiting for a busy signal or voicemail and should add little to no latency to the call. If message 3 indicates that E is not an AuthentiCall user, the protocol ends at this step and R is forced to fall back to an insecure call.

The second handshake phase authenticates R and E and consists of messages 4A-B and 5A-B. These messages are indicated by letters A and B because the messages contain the same fields for caller and callee respectively. They can be computed independently and sent in parallel, reducing round-trip latencies. Message 4 contains all information necessary for a Diffie-Hellman key establishment authenticated with a signature key defined in the certificate of R or E. It also contains identity information for R or E, the calling or called phone number, a timestamp, and a nonce. Each side also provides a Diffie-Hellman share, and the entire message is signed with the private key corresponding to the certificate issued by AuthentiCall. After message 4, both sides combine their Diffie-Hellman secret with the share they received to generate the derived secret.
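The exchange and key derivation can be sketched as follows. The group parameters are deliberately toy-sized and insecure, the HMAC-based key derivation is a plausible stand-in rather than AuthentiCall's actual construction, and the message 4 signatures are omitted:

```python
import hmac, hashlib, secrets

# Toy sketch of the key establishment in messages 4A/4B and the
# key-confirmation HMACs in messages 5A/5B. A real deployment would use a
# standardized Diffie-Hellman group and sign each message 4 payload.
P = 2**127 - 1   # a Mersenne prime; far too small for real use
G = 3

def dh_keypair():
    x = secrets.randbelow(P - 2) + 2      # private exponent
    return x, pow(G, x, P)                # (private, public share)

def derive_keys(shared_secret, ts_r, ts_e, n_r, n_e):
    """Derive directional keys from the DH result, timestamps, and nonces."""
    salt = b"%d|%d|%s|%s" % (ts_r, ts_e, n_r, n_e)
    prk = hmac.new(salt, shared_secret.to_bytes(16, "big"), hashlib.sha256).digest()
    k1 = hmac.new(prk, b"KER1", hashlib.sha256).digest()  # caller-to-callee key
    k2 = hmac.new(prk, b"KER2", hashlib.sha256).digest()  # callee-to-caller key
    return k1, k2

# Caller R and callee E each generate a share (message 4) ...
xr, dhr = dh_keypair()
xe, dhe = dh_keypair()
ts_r, ts_e = 1000, 1001
n_r, n_e = secrets.token_bytes(16), secrets.token_bytes(16)

# ... and both arrive at the same derived keys.
kr = derive_keys(pow(dhe, xr, P), ts_r, ts_e, n_r, n_e)
ke = derive_keys(pow(dhr, xe, P), ts_r, ts_e, n_r, n_e)
assert kr == ke

# Messages 5A/5B: key confirmation over both message-4 payloads, with a
# string distinguishing the caller's HMAC from the callee's.
msg4a, msg4b = b"<message 4A>", b"<message 4B>"
msg5a = hmac.new(kr[0], msg4a + msg4b + b"Caller", hashlib.sha256).digest()
msg5b = hmac.new(ke[1], msg4a + msg4b + b"Callee", hashlib.sha256).digest()
```

Because the private exponents are discarded after the call, compromise of a long-term signing key does not reveal past session keys, which is the forward-secrecy property claimed above.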
Each client then generates keys using the Diffie-Hellman result, the timestamps of both parties, and the nonces of both parties. Messages 5A and 5B contain an HMAC of messages 4A and 4B along with a string to differentiate message 5A from message 5B. The purpose of this message is to provide

key confirmation that both sides of the exchange have access to the keys generated after messages 4A and 4B. This message concludes the handshake protocol.

(0a) R → E: "Call Connected", TS1, HMACKER(TS1)
(0b) E → R: "Call Connected", TS2, HMACKER(TS2)
(1a) R → E: Index, Audio Digest 1, AuD2...AuD5, HMACKER(Preceding)
(1b) E → R: Index, Audio Digest 1, AuD2...AuD5, HMACKER(Preceding)
...
(Na) R → E: "Call Ended", TS3, HMACKER(TS3)
(Nb) E → R: "Call Ended", TS4, HMACKER(TS4)

Figure 6-4. Our call integrity protocol protects all speech content.

6.2.3 Call Integrity Protocol

The call integrity protocol binds the handshake conducted over the data network to the voice channel established over the telephone network. Part of this protocol confirms that the voice call has been established and confirms when the call ends. The remainder of the messages in this protocol exchange content authentication information for the duration of the call. This content integrity takes the form of short "digests" of call audio (we discuss these digests in detail in the following section). These digests are effectively heavily compressed representations of the call content; they allow for detection of tampered audio at a low bit rate. Additionally, the digests are exchanged by both parties and authenticated with HMACs. Figure 6-4 shows the details of the call integrity protocol. The protocol begins after the voice call is established. Both caller R and callee E send a message indicating that the

voice call is established. This message includes a timestamp and an HMAC of the timestamp. These messages are designed to prevent attacks where a call is redirected to another phone. One possible attack is an adversary maliciously configuring call forwarding on a target; the handshake would be conducted with the target, but the voice call would be delivered to the adversary. In such a case, the target would not send a "call established" message and the attack would fail. Once the voice call begins, each side sends the other audio digests at a regular interval. These messages are protected with an HMAC to prevent a network adversary from tampering with the audio digests. When the voice call ends, each side sends a "call concluded" message containing a timestamp with an HMAC. This alerts the endpoint to expect no more digests. It also prevents a man-in-the-middle from continuing a call that the victim has started and authenticated.

6.2.4 Evaluation

Our protocols use standard constructions for certificate establishment, certificate-based authentication, authenticated key establishment, and message authentication. We used ProVerif [224] to further analyze the handshake and enrollment protocols. The analysis verified that our handshake protocol establishes and never leaks the secret key. The protocol also provides authentication and perfect forward secrecy for both the caller and callee. The enrollment protocol is verified to never leak the private keys of either party. This property allows us to assert that both signatures and certificates cannot be forged.

6.3 Speech Digest Design and Evaluation

The previous section describes how AuthentiCall enrolls and authenticates users prior to a call. During a call, AuthentiCall needs a way to summarize speech content in order to authenticate audio using a low-bandwidth data connection. We term these summaries “speech digests.” A speech digest has two goals. First, it must accurately summarize the content of the call. However, it is not necessary for this summary to be lossless or

meaningful for human interpretation. We are also concerned more with semantics (i.e., words spoken) than we are with speaker voice characteristics (e.g., tone, identity) or extraneous features like noise. Second, the digest must be robust to non-semantic changes in audio. Because of ambient or electronic noise, intermittent loss, and the use of differing encodings throughout the phone network, the audio transmitted by a phone will not be the same as the audio received. In particular, the audio received is practically guaranteed not to be similar on a bit level to the audio sent by the phone. This means that common data digest approaches like cryptographic hashes will fail.

While the original phone system used analog transmission of voice, it is now common in every telephone network (landline, VoIP, cellular, etc.) for speech to be digitized and compressed using an audio codec. At network boundaries, it is common for audio to be decoded and recoded into a different codec (known as transcoding). Codecs used in the phone network are highly lossy and drastically distort the call audio, and so have the potential to significantly impact audio digest performance. In digital audio systems, voice data is encoded into discrete frames of 10-30 milliseconds of audio (depending on codec choice and other factors) that are then transmitted. Because some phone systems (especially cellular and VoIP) use lossy networks for transmission, frames are routinely lost. For example, loss rates of 4% are considered nominal for cellular voice [197]. Finally, for a digest scheme to be effective, the digests must be computed on the same audio, requiring time synchronization on both ends of the call to know where each digest should start and end. While we discuss how we achieve this synchronization in Section 6.3.2, we note that slight deviation in synchronization is likely, and we must use digests that account for all of the above realities.


Figure 6-5. This figure illustrates the digest construction described in Section 6.3.1. Audio digests summarize call content by taking one second of speech data, deriving audio features from the data, and compressing blocks of those features into a bit string.

To accomplish these goals, we leverage research from an area of signal processing that produces techniques that are known as “perceptual hashes” or “robust hashes.”4 Unlike cryptographic hashes, which change drastically with small changes in input, robust digests give very similar outputs for similar inputs. Robust digests have been developed for a wide domain of inputs, including music, images, and speech, but their applicability has remained limited. To our knowledge, this work presents one of the first uses of robust speech digests for security. The following subsections provide a description of the speech digests we use in AuthentiCall and a thorough analysis of the performance of these digests for telephone calls.

6.3.1 Construction

There are a number of constructions of speech digests, and they all use the following basic process. First, they compute derived features of speech. Second, they define a compression function to turn the real-valued features into a bit string. In this work, we use the construction of Jiao et al. [225], which they call RSH. We chose this technique over others because it provides good performance on speech at a low bitrate, among other properties. We note that the original work did not evaluate the critical case where an adversary can control the audio being hashed. Our evaluation shows that RSH maintains audio integrity in this crucial case. Finally, to our knowledge we are the first to use any robust speech digest in an authentication and integrity scheme.

Figure 6-5 illustrates how RSH computes a 512-bit digest for one second of audio. In the first step of calculating a digest, feature computation, RSH computes the Line Spectral Frequencies (LSFs) of the input audio. LSFs are commonly used in speech compression algorithms to represent the major frequency components of human voice (known as formants), which contain the majority of semantic information in speech. That is, LSFs represent phonemes — the individual sound units present in speech. While pitch is useful for speaker recognition, LSFs are not a perfect representation of all of the nuances of human voice. This is one reason why it is sometimes difficult for humans to confidently recognize voices over the phone. This means that the digest more accurately represents semantic content rather than the speaker's voice characteristics. This is important because a number of techniques are able to synthesize new speech that evades speaker recognition from existing voice samples [123], [226]. Finally, LSFs are numerically stable and robust to quantization — meaning that modest changes in input yield small changes in output.

4 In this work, we call these "robust digests" or "digests" to avoid confusion with cryptographic hashes.

In RSH, the input audio is grouped into 30ms frames with 25ms audio overlap between frames, and 10 line spectral frequencies are computed for each frame to create a matrix L. The second phase of digest computation involves compressing the large amount of information about the audio into a digest. Because audio rarely changes on millisecond time scales, the representation L is highly redundant. To compress this redundant data, RSH uses the two-dimensional discrete cosine transform (DCT). The DCT is related to the Fourier transform, is computationally efficient, and is commonly used in compression algorithms (e.g., JPEG, MP3). RSH computes the DCT over different sections of the matrix L to produce the final digest. RSH uses only the first eight DCT coefficients (corresponding to the highest-energy components, discarding high-frequency information). The compression function uses the DCT in the computation of the bitwise representation of the audio sample. The following process generates 8 bits of a digest; it is repeated 64 times to generate a 512-bit digest.

1. Obtain a window size w and two window start indexes l1 and l2 from the output of a keyed pseudorandom function.

2. Select from L two blocks of rows, B1 and B2, containing rows l1 : l1 + w and l2 : l2 + w respectively (including all columns).

3. Compress each of these blocks into eight coefficients using the DCT.

4. Set eight digest bits according to whether the corresponding coefficients of the first block (B1) are greater than the coefficients of the second block (B2).

We note that sections of audio are selected probabilistically; Reaves et al. show that the probability that a section of audio is not used in a digest is negligible [227]. This simply means that digests cover practically all content in the call.

An important consideration is to note that the digest is keyed. These digests are clearly not intended to be used for the same purposes as a cryptographic hash, and the

use of a key in these functions is for a different purpose than keying in a cryptographic construction. By using a pseudorandom function, digests become dependent on time. This dependence adds entropy to digest construction so that repeated phrases generate unique digests. It also has the advantage that it makes it difficult to compute digests for audio without knowledge of the key, which in AuthentiCall is derived during the handshake for each call. In AuthentiCall, digests themselves are also authenticated using an HMAC to guarantee digest integrity in transit.

Digests are computed by the caller and are received and verified by the callee. The verifying party computes the digest of the received audio, then computes the Hamming distance between the calculated and received digests. Because degradation of audio over a phone call is expected, digests will not match exactly. However, the Hamming distance between two audio digests — or bit error rate (BER) — is related to the amount of change in the audio. By setting an appropriate threshold on BER, legitimate audio can be distinguished from incorrect audio.

6.3.2 Implementation and Evaluation

Now that we have seen how RSH digests are computed, we can evaluate the properties of RSH digests. This includes the effects of legitimate transformations and the results of comparing digests of unrelated audio samples (as might be generated by an adversary). We also describe how we use digests to detect tampered audio. We implement RSH using Matlab, and we deploy it in our AuthentiCall prototype by using the Matlab Coder toolbox to generate C code that is compiled as an Android native code library. We use the TIMIT audio corpus [189], which is a standard test dataset for speech processing systems. It consists of high-fidelity recordings of 630 male and female speakers reading 10 English sentences constructed for phonetic diversity. Because RSH computes hashes of one second of audio, we split the TIMIT audio data into discrete seconds of audio corresponding to a unique section of audio from a speaker and sentence. This resulted in 22,487 seconds of unique audio.
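The compression function from Section 6.3.1 can be sketched as follows. This is a simplified illustration, not the RSH implementation: the LSF feature matrix L is assumed to be computed elsewhere, the window-size range is an assumption, and each block is collapsed to one vector before a 1-D DCT rather than applying the full 2-D DCT:

```python
import hashlib, hmac, math

def dct8(vector):
    """First eight DCT-II coefficients of a 1-D sequence (naive O(n^2))."""
    n = len(vector)
    return [sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, x in enumerate(vector)) for k in range(8)]

def rsh_digest(L, key):
    """Compress a feature matrix L (rows of 10 LSFs per frame) to 512 bits."""
    bits = []
    for round_no in range(64):                      # 64 rounds x 8 bits = 512
        # Keyed PRF output drives window size and the two start indexes.
        prf = hmac.new(key, round_no.to_bytes(2, "big"), hashlib.sha256).digest()
        w = 4 + prf[0] % 8                          # window size (assumed range)
        l1 = prf[1] % (len(L) - w)
        l2 = prf[2] % (len(L) - w)
        # Collapse each block of rows to one vector, then take 8 DCT coefficients.
        b1 = [sum(col) for col in zip(*L[l1:l1 + w])]
        b2 = [sum(col) for col in zip(*L[l2:l2 + w])]
        # One bit per coefficient comparison between the two blocks.
        bits += [int(c1 > c2) for c1, c2 in zip(dct8(b1), dct8(b2))]
    return bits

def ber(d1, d2):
    """Bit error rate: normalized Hamming distance between two digests."""
    return sum(a != b for a, b in zip(d1, d2)) / len(d1)

# Demo with a synthetic feature matrix (200 frames of one second of audio).
key = b"per-call key from the handshake"           # illustrative key
L_demo = [[math.sin(0.1 * i * (j + 1)) for j in range(10)] for i in range(200)]
digest = rsh_digest(L_demo, key)
```

Because the bits depend only on coefficient comparisons rather than exact values, small perturbations of L tend to flip few bits, which is the robustness property the scheme relies on.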

Robustness. Robustness is one of the most critical aspects of our speech digests, and it is important to show that these digests will not significantly change after audio undergoes any of the normal processes that occur during a phone call. These include the effects of various audio encodings, synchronization errors in audio, and noise. To test robustness, we generate modified audio from the TIMIT corpus and compare the BER of digests of standard TIMIT audio to digests of degraded audio. We first downsample the TIMIT audio to a sample rate of 8kHz, which is standard for most telephone systems. We used the sox [190] audio utility for downsampling and for adding delay to audio to model synchronization error. We also used sox to convert the audio to two common phone codecs, AMR-NB (Adaptive Multi-Rate Narrow Band) and GSM-FR (Groupe Spécial Mobile Full-Rate). We used GNU Parallel [228] to quickly compute these audio files. To model frame loss behavior, we use a Matlab simulation that implements a Gilbert-Elliot loss model [229]. Gilbert-Elliot models bursty losses using a two-state Markov model parameterized by the probabilities of individual and continued losses. We use the standard practice of setting the probability of an individual loss (p) and the probability of continuing a burst (1 - r) to the desired loss rate of 5% for our experiments. We also use Matlab's awgn function to add Gaussian white noise at a 30 decibel signal-to-noise ratio.

Figure 6-6 shows boxplots representing the distribution of BERs for each type of degradation tested. All degradations show a fairly tight BER distribution near the median with a long tail. We see that of the effects tested, 10ms delay has the least effect; this is a result of the fact that the digest windows the audio with a high overlap. For most digests, the addition of white noise also has little effect; this is because LSF analysis discards all frequency information except for the most important frequencies.
We see higher error rates caused by the use of audio codecs like GSM-FR and AMR-NB; these codecs significantly alter the frequency content of the audio. We can also see that a 5% loss rate has a negligible effect on the audio digests. Finally, we see that combining transcoding, loss, delay, and noise has an additive effect on the resulting digest error — in other words, the more

degradation that takes place, the higher the bit error. These experiments show that RSH is robust to common audio modifications.

Figure 6-6. These box plots show the distribution of digest bit error rates as a result of various audio degradations. These error rates are well below the rates seen for adversarial audio, shown in Figure 6-7.
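The Gilbert-Elliot loss process used in these experiments can be sketched as follows. Our simulation used Matlab; this Python sketch uses the parameterization from the text, where both the probability of entering a burst (p) and of continuing one (1 - r) are set to the 5% target loss rate:

```python
import random

# Two-state Markov (Gilbert-Elliot) model of bursty frame loss. With
# p = 1 - r = 0.05, the stationary fraction of lost frames is exactly 5%.
def gilbert_elliot(n_frames, p=0.05, one_minus_r=0.05, seed=1):
    """Return a list of booleans: True means the frame was lost."""
    rng = random.Random(seed)
    lost, in_burst = [], False
    for _ in range(n_frames):
        if in_burst:
            in_burst = rng.random() < one_minus_r   # continue the loss burst?
        else:
            in_burst = rng.random() < p             # start a new loss burst?
        lost.append(in_burst)
    return lost

losses = gilbert_elliot(200_000)
rate = sum(losses) / len(losses)   # empirical loss rate, close to 0.05
```

Unlike independent drops, this model clusters losses into bursts, which better matches the behavior of cellular and VoIP transport.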

Adversarial audio. While robustness is essential, the ultimate goal of these digests is to detect maliciously tampered or injected audio, which we term "adversarial audio." Such an analysis has not been previously performed. To validate the ability of RSH to detect adversarial audio, we compute the BER of digests of every pair of seconds of TIMIT audio discussed in the previous section. This dataset includes 252,821,341 pairs of single seconds of audio. For this test, we use the same key for every hash; this models the situation where an adversary can cause the target to receive audio of its choice but cannot modify the associated digest. We find that the mean BER between two distinct audio pairs is 0.478. A histogram and kernel density estimate of these values is shown in Figure 6-7. This plot shows that the bit error is normally distributed with a mean and median of 0.478 and 0.480, respectively. The expected bit error rate for two random bit strings is 50%, so the mean BER seen for RSH is close to the best possible distance between two adversarial digests.
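The 50% baseline referenced here can be illustrated with a quick simulation: the BER between digests of unrelated audio should approach that of independent random bit strings, whose mean BER is 0.5.

```python
import random

rng = random.Random(7)

def random_digest(bits=512):
    """An independent, uniformly random 512-bit digest."""
    return [rng.randrange(2) for _ in range(bits)]

def ber(d1, d2):
    """Normalized Hamming distance between two equal-length digests."""
    return sum(a != b for a, b in zip(d1, d2)) / len(d1)

pairs = [(random_digest(), random_digest()) for _ in range(2000)]
mean_ber = sum(ber(a, b) for a, b in pairs) / len(pairs)   # close to 0.5
```

The measured 0.478 mean for RSH on unrelated speech is thus only slightly below the theoretical optimum for distinguishing unrelated inputs.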

Figure 6-7. This graph shows the histogram and kernel density estimate of digest BER for adversarial audio over 250 million pairs of 1-second speech samples. While the majority of legitimately modified audio has digest BERs less than 35%, adversarial audio has digest BERs averaging 47.8%.

Because the TIMIT corpus contains speakers speaking several identical sentences, we can investigate the resilience of the digest to two more specific adversarial scenarios. First, we can look at whether using different speech from the same speaker can create a false positive. If so, this would be a serious problem because an adversary could use recorded words from the target speaker undetected. Second, we can determine whether a different speaker uttering the same words causes false positives. This test indicates to what extent the digest protects content rather than speaker characteristics. We found that digests from the same speaker speaking different content are accepted at practically the same rate as audio that differs in both speaker and content. At a BER detection threshold of 0.384 (derived and discussed in the following subsection), the detection rate for different content spoken by the same speaker is 0.901482, while the detection rate for different content spoken by a different speaker is 0.901215. However, identical phrases spoken by different speakers result in a much higher rate of collision and a detection rate of 0.680353. This lower detection rate is not a problem for AuthentiCall because it is still

high enough to detect modified call audio with high probability. More importantly, it indicates that RSH is highly sensitive to changes in call content.

Figure 6-8. The digest performance ROC graph shows that digests can easily distinguish between legitimate and substituted audio, even in the presence of transcoding, loss, delay, and noise. These results are computed over digests of a single second. The graph is scaled to show the extreme upper corner.
Threshold selection and performance. Distinguishing legitimate and illegitimate audio requires choosing a BER threshold to detect tampered audio. Because the extreme values of these populations overlap, a tradeoff between detection and false positives must be made. This tradeoff is best depicted in the ROC curve in Figure 6-8. This figure shows the true positive/false positive tradeoff measured on the adversarial audio and two legitimate modifications — GSM encoding alone, and a combination of GSM, AMR-NB, 5% frame loss, 10ms delay, and 30dB of white noise. This combination represents an approximate "worst case" of legitimate audio. Figure 6-8 shows excellent performance in terms of distinguishing audio. For GSM-only audio, we see an area under the curve of 0.998, and for the "worst case" audio, we see an area under the curve of 0.992. However, because digests will be used at a high rate (one per second), even with a very small false positive rate, alerting users on every individual detection would likely result in warning fatigue. As a result, the most important metric for evaluating a threshold is minimizing

146 the users’s likelihood of a false positive. This problem suggests trading o↵sensitivity to short changes in call content for a lower false positive rate. To reduce overhead and network load, AuthentiCall sends digests in groups of five. To provide high detection rates while limiting false positives, AuthentiCall alerts the user if any 3 out of 5 digests are greater than the BER threshold. We model true and false performance of this scheme as a set of five Bernouli trials — successful authentication for true positives and successful digest collision for false positives. Thus, we can compute 3-out-of-5 performance using the binomial distribution. After this analysis, we selected an individual-digest BER threshold of 0.384. This corresponds to an individual adversary audio true positive detection rate of 0.90, while presenting a 0.0058 false positive rate against our “worst-case” audio and a 0.00089 false positive rate against clean GSM-FR encoded audio. Using our “three-out-of-five” alerting scheme, the probability of detecting 3 or more seconds of tampered audio is 0.992. The

false positive rate is drastically reduced: the false positive rate is 1.96 × 10⁻⁶, and for clean GSM-FR audio the false positive rate is 7.02 × 10⁻⁹. This corresponds to a false alert on average every 425.1 hours of talk time for worst-case audio, and for GSM-FR audio one false positive every 118,766 hours. For reference, the average British mobile phone user only places 176 minutes per month of outbound calls [230]; assuming inbound and outbound talk time are roughly equal, the average user only places 70.4 hours of calls per year. This means that the average AuthentiCall user would only see a false alert once every six years.
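The 3-out-of-5 computation described above can be reproduced directly with the binomial distribution. A minimal sketch follows; small discrepancies from the reported figures come from rounding of the individual-digest rates:

```python
from math import comb

def three_of_five(p: float) -> float:
    """Probability that at least 3 of 5 independent Bernoulli trials succeed."""
    return sum(comb(5, k) * p**k * (1 - p)**(5 - k) for k in range(3, 6))

# True positive: individual adversarial-digest detection rate of 0.90.
detect = three_of_five(0.90)       # ~0.991, the reported 0.992 after rounding

# False positives against "worst-case" and clean GSM-FR audio.
fp_worst = three_of_five(0.0058)   # on the order of 2e-6
fp_gsm = three_of_five(0.00089)    # on the order of 7e-9

# Average talk time: 176 outbound minutes/month, doubled for inbound.
hours_per_year = 176 * 2 * 12 / 60             # 70.4 hours of calls per year
years_between_alerts = 425.1 / hours_per_year  # roughly six years between alerts
```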

Limitations. No security solution is perfect, and our use of audio digests has some limitations. The chief limitation is that audio digests cannot detect altered audio less than one second in length. This limitation is simply a result of the constraints of doing low-bitrate authentication of mutable, analog data.

While the digests are not perfect, we argue that they are secure against most adversaries. We note that audio digests have two purposes: 1) to provide a guarantee that the voice call established was the one that was negotiated in the handshake and 2) to guarantee that the voice content has not significantly changed during the call. These two goals deal with adversaries of different capabilities. In particular, intercepting and modifying call audio requires far more advanced access and capability than simply spoofing a caller ID during a handshake already occurring. Audio digests will detect the first scenario within five seconds of audio, and they will also quickly detect changes that affect any three seconds in five in the second scenario. In limited circumstances, it may be possible for a man-in-the-middle adversary to make small modifications to the received audio. For the second attack to be successful in the presence of these digests, a number of conditions must hold: First, the adversary can change no more than two seconds out of every five seconds of audio. Second, the adversary must change the audio in a way that would sound natural to the victim. This means that the changed audio would have to conform to both the current sentence pattern and the speaker's voice. While voice modification algorithms exist, modifying an existing sentence in an ongoing conversation is likely beyond the abilities of current natural-language processing. Finally, in addition to the substantial difficulty of meeting these limits, the adversary must also do all of this in soft real time. Nevertheless, a user is still not defenseless against such an attack. While we believe such attempts would likely be noticeable and suspicious to the human ear, users could also receive prompts from AuthentiCall when individual digests fail.
These prompts could recommend that the user ask the opposing speaker to elaborate on their prior point or to confirm other details, forcing the adversary to respond with enough tampered audio that the attack could be detected.

6.4 System Implementation

The previous sections described the protocol design and characterized our speech digests. In this section, we describe our AuthentiCall client and server implementation; in the following section, we evaluate its performance.

Server. Our server was implemented in Java, using Twilio’s Call API to call clients during the registration phase to share the audio nonce that confirms control of a phone number. Google Cloud Messaging (GCM) is used to generate a push notification to inform clients of incoming calls.

Client. Our prototype AuthentiCall client consists of an Android app, though we anticipate that in the future AuthentiCall will be available for all telephony platforms, including smartphones, VoIP phones, PBXs, and even landlines (with additional hardware similar in concept to legacy Caller ID devices). A TLS connection is used to establish a secure channel between client and server. We implement the AuthentiCall protocol in Java using the Spongy Castle library [231]. The audio digests were implemented in Matlab, compiled to C, and linked into the app as native code. In our implementation, digest protocol messages contain five seconds of audio digests. We use RSA-4096 as our public-key algorithm and SHA-3 as the underlying hash function for HMACs. To reduce handshake time, we use a standard set of NIST Diffie-Hellman parameters hardcoded into the client: the NIST 2048-bit MODP group with a 256-bit prime-order subgroup from RFC 5114 [232]. We also use the HMAC-based key derivation algorithm used by TLS 1.2 described in RFC 5869 [233]. Upon registration, the server issues the client an X.509 certificate. This consists of a user’s claimed identity, phone number, validity period, public key, and the signature of the CA.

Audio nonces. As described in Section 6.2, the AuthentiCall enrollment protocol sends a nonce through the voice channel to ensure that a client can receive a voice call. We use a 128-bit random nonce. In our implementation, the nonce is encoded as

touch-tones (DTMF⁵). DTMF tones were used because they are faithfully transmitted through every telephone system and were simple to send and detect. There are 16 possible touch-tone digits,⁶ so each tone can represent an encoded hexadecimal digit. These tones are transmitted for 200ms each with a 100ms pause between tones. This provides a bit rate of 13.3 bits per second for a nonce transmission time of 9.6 seconds. This transmission time comprises the bulk of the time spent in the enrollment protocol.

6.5 Results

Our AuthentiCall implementation allows us to test its performance in enrollment, call handshakes, and detecting modified call audio in real phone calls.

6.5.1 Experiment Setup

Before describing individual experiments, we describe our experiment testbed. The AuthentiCall server was placed on an Amazon Web Services (AWS) server located in Northern Virginia. We used the same network provider, AT&T, and the same cellular devices, Samsung Galaxy Note II N7100s, across all experiments. The enrollment and handshake experiments were carried out 20 times over both WiFi and 3G, and digest exchange tests were done 10 times using WiFi. Digest exchange was done over WiFi because this experiment was used to validate content protection, not delivery speed. In all experiments, calls used a 3G voice channel.

6.5.2 Enrollment Protocol

Our first experiments measure the user enrollment time. We measure the time from the instant a user begins enrollment to when the user receives the last protocol message, including all protocol messages and the audio nonce. For clients, enrollment is a one-time

5. Dual-Tone Multi-Frequency tones are the sounds made by dialing digits on a touch-tone phone.

6. Four DTMF tones are not available on consumer phones but provide additional functionality in some special phone systems.
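The DTMF encoding arithmetic above is easy to check. A back-of-the-envelope sketch (not the client code itself):

```python
# Each DTMF tone encodes one of 16 symbols, i.e. 4 bits.
BITS_PER_TONE = 4
TONE_MS, GAP_MS = 200, 100                   # 200 ms tone, 100 ms pause

tones = 128 // BITS_PER_TONE                 # 32 tones for a 128-bit nonce
seconds = tones * (TONE_MS + GAP_MS) / 1000  # 9.6 s transmission time
bitrate = 128 / seconds                      # ~13.3 bits per second
```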

process that is done before the first call can be placed, analogous to activating a credit card.

Figure 6-9. Enrollment takes less than 30 seconds and is a one-time process that may be done in the background.

Figure 6-9 shows the average time of enrollment (and standard error) using 3G and WiFi to exchange protocol messages. The main contributor to the enrollment time is the transmission of the audio nonce, which is used to establish ownership. Though the enrollment times over 3G and WiFi are 25 and 22 seconds respectively, this protocol requires no user interaction.

6.5.3 Handshake Protocol

We next measure the time to complete an entire handshake, including data messages and voice call setup. We note that voice call setup time is substantial, and requires many seconds even without AuthentiCall. We believe the most important performance metric is additional latency experienced by the end user. As shown in Figure 6-10, AuthentiCall only adds 1.07 seconds for WiFi or 1.41 seconds on 3G data to the total call establishment time (error bars indicate standard error). We believe that this will be unnoticeable to the user for several reasons. First, call establishment time varies significantly. This is normal network behavior, not an artifact introduced by AuthentiCall. In our 3G experiments our additional handshake time is approximately equal to the standard error in voice call

establishment.

Figure 6-10. AuthentiCall adds 1 to 1.41 seconds to phone call establishment, making the overhead effectively unnoticeable to users.

We also note that our test phones were in the same location connected to the same tower, so the voice call setup time is likely lower than that of a typical call. In fact, our measured times are very close to the published estimate of 6.5 seconds for call setup by the tower between both phones [234]. Finally, we note that this is substantially faster than AuthLoop [222], which takes nine seconds to perform authentication after call delivery.

6.5.4 Speech Digest Performance

Our final experiments evaluate our speech digest accuracy over real call audio. In these 10 calls, we play 10 sentences from 10 randomly selected speakers in the TIMIT corpus through the call, and our AuthentiCall implementation computed the sent and received digests. In total this represented 360 seconds of audio. For simplicity, a caller sends audio and digests, and a callee receives the audio and compares the received and locally computed digests. We also compared these 10 legitimate call digests with an “adversary call” containing different audio from the hashes sent by the legitimate caller. To compare our live call performance to the simulated audio from Section 6.3, we first discuss our individual-hash accuracy.
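The callee-side comparison just described can be sketched as follows; the digest sizes and function names here are illustrative assumptions, not the actual AuthentiCall implementation:

```python
THRESHOLD = 0.384  # individual-digest BER threshold selected earlier

def bit_error_rate(sent: bytes, received: bytes) -> float:
    """Fraction of differing bits between the sender's digest and the
    digest the callee computes locally over the received audio."""
    assert len(sent) == len(received)
    differing = sum(bin(a ^ b).count("1") for a, b in zip(sent, received))
    return differing / (8 * len(sent))

def digest_suspect(sent: bytes, received: bytes) -> bool:
    """A single digest is flagged when its BER exceeds the threshold;
    the user is alerted only when 3 of 5 digests in a group are flagged."""
    return bit_error_rate(sent, received) > THRESHOLD
```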

Figure 6-11 shows the cumulative distribution of BER for digests of legitimate audio calls and audio sent by an adversary. The dotted line represents our previously established BER threshold of 0.384. First, in testing with adversarial audio, we see that 93.4% of the individual fraudulent digests were detected as fraudulent. Our simulation results saw an individual digest detection rate of 90%, so our real calls see even better performance. Using our 3-out-of-5 standard for detection, we detected 96.7% of adversarial audio. This test shows that AuthentiCall can effectively detect tampering in real calls. Next, for legitimate calls, 95.5% of the digests were properly marked as authentic audio. Using our 3-out-of-5 standard, we saw no five-second frames that were marked as tampered. While our individual-hash false positive rate of 4.5% was low, we were surprised that the performance differed from our earlier evaluation on simulated degradations. Upon further investigation, we learned that our audio was being transmitted using the AMR-NB codec set to the lowest possible quality setting (4.75 kbps); this configuration is typically only used when reception is exceptionally poor, and we anticipate this case will be rare in deployment. Nevertheless, there are several mechanisms that can correct for this. One option would be to digest audio after compression for transmission (our prototype uses the raw audio from the microphone); such a scheme would reduce false positives partially caused by known-good transformations of audio. Another option is to simply accept these individual false positives. Doing so would result in a false alert on average every 58 minutes, which is still acceptable as most phone calls last only 1.8 minutes [235].

6.6 Discussion

We now discuss additional issues related to AuthentiCall.

Applications and use cases. AuthentiCall provides a mechanism to mitigate many open security problems in telephony. The most obvious problems are attacks that rely on Caller ID fraud, like the perennial “IRS scams” in the United States. Another problem

Figure 6-11. This figure shows that 93.4% of individual digests of adversarial audio are correctly detected while 95.5% of individual digests of legitimate audio are detected as authentic. Using a 3-out-of-5 detection scheme, 96.7% of adversarial audio is detected.

is that for sensitive transactions many institutions, including banks and utilities, have to use extensive and error-prone challenge questions to authenticate their users, and these challenges still fail to stop targeted social engineering attacks. AuthentiCall offers a strong method to authenticate users before and during a call, increasing security while reducing the time and effort required by customers and call center workers. Yet another valuable use case is emergency services. These services have faced prank “swatting” calls that endanger the lives of first responders [236] as well as denial-of-service attacks that have made it impossible for legitimate callers to receive help [237]. AuthentiCall provides a mechanism that would allow essential services to prioritize authenticated calls in such a scenario while answering other calls opportunistically. While

such a proposal would need to be reviewed by public policy experts and stakeholders, we provide a mitigation to a problem that has no clear solution.

Server deployment. AuthentiCall relies on a centralized server infrastructure to facilitate authenticated calls while minimizing abuse. AuthentiCall, including server infrastructure, could be provided by a carrier or an independent organization. While a centralized model is simplest for testing our hypothesis that auxiliary data channels can be used to authenticate traditional voice calls, we intend to study decentralized and privacy-preserving architectures in future work.

Cellular network load. Systems that make use of the cellular network must be careful not to increase signaling load on the network in a harmful way [238]–[240]. We believe that AuthentiCall will not cause network harm because in modern networks (3G and 4G), data signaling is no longer as expensive as a voice call, and simultaneous voice and data usage is now commonplace.

Certificate management. Any system that relies on certificates must address certificate revocation and expiration. AuthentiCall’s centralized model allows the server to deny use of any revoked certificate, drastically simplifying revocation compared to CRLs or protocols like OCSP. Similar to Let’s Encrypt [241], AuthentiCall certificates can have short lifetimes because certificate renewal using our enrollment protocol is fast and requires no human interaction. As mentioned in Section 6.2, AuthentiCall could also make use of the proposed Telephony PKI [222]. In this scenario, certificate lifetime would be determined by the TPKI, and revocation managed by a certificate revocation list (CRL) published by the TPKI.
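The short-lifetime approach can be sketched as a simple server-side check. The lifetime value and helper names below are illustrative assumptions, not AuthentiCall's actual policy:

```python
from datetime import datetime, timedelta, timezone

CERT_LIFETIME = timedelta(days=7)   # hypothetical short lifetime
revoked_serials: set[int] = set()   # server-side revocation set

def certificate_valid(serial: int, issued_at: datetime, now: datetime) -> bool:
    """With a centralized server, revocation is a set lookup rather than a
    CRL/OCSP round trip, and short-lived certificates age out quickly."""
    if serial in revoked_serials:
        return False
    return issued_at <= now < issued_at + CERT_LIFETIME
```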

Why IP data. We chose IP data over other channels because it provides reliable and fast data transmission for most existing devices including smartphones, VoIP phones, and even landlines if provided with suitable hardware. As an example, SMS as a transmission carrier would be impractical. Bandwidth is low, and delivery is slow (on average 6.4


seconds [242]) and not guaranteed [6]. In particular, the average time to send one SMS message is 6.4 seconds [242], meaning that AuthentiCall using SMS would require a minimum of 38.4 seconds — effectively increasing call setup time by a factor of 5.

Figure 6-12. Before the call is answered, AuthentiCall indicates if the call is authenticated or unauthenticated.
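The SMS arithmetic above can be checked quickly, assuming (as the 38.4 s figure implies) six sequential message deliveries for the handshake:

```python
SMS_DELIVERY_S = 6.4    # average SMS delivery time [242]
min_handshake_s = 38.4  # minimum handshake time over SMS

# The 38.4 s minimum implies a six-message handshake.
messages = min_handshake_s / SMS_DELIVERY_S
```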

Why not biometrics. Robust speech digests are a superior solution for content integrity compared to voice biometrics for several reasons. First, voice authentication is simply not secure in adversarial settings [226]. Second, voice biometrics would assume that the call consists of only a single party (e.g., speakerphones would not be supported). By contrast, audio digests are speaker independent and can be computed locally with no additional knowledge about the other party.

User interface. We have developed a complete working prototype of AuthentiCall for Android, including a preliminary simple user interface as shown in Figure 6-12. This is one of the first interfaces to indicate secure Caller ID; our prototype interface is intended to simply and clearly alert the user to the safety of the call. We note that indicating

security in a user interface requires great care [149], [243], and we intend to formally study interface design for AuthentiCall in future work.

CHAPTER 7
SUMMARY AND CONCLUSIONS

The global telephone network revolutionized communications and remains a critical infrastructure for society. Phones are used to confirm some of our most sensitive transactions. From coordination between energy providers in the power grid to corroboration of high-value transfers with a financial institution, we rely on telephony to serve as a trustworthy communications path. Despite its continued importance as well as its continued technological evolution, this network does not provide the security guarantees that we require of a trusted critical infrastructure. One of these missing yet critical guarantees is the authentication of the users of the network.

We began this thesis in Chapter 3 by showing how text messaging has become an important part of the security infrastructure. However, the SMS ecosystem has evolved significantly since its inception, and now includes a wide range of devices and participants external to traditional cellular providers. Public SMS gateways directly embody this change, and allow us not only to observe at scale how a range of providers are implementing security solutions via text messages, but also to gather evidence of how assumptions about SMS are being circumvented in the wild. Our measurements identify a range of popular services whose one-time messaging mechanisms should be improved, and additional entities who may be creating new opportunities for compromise by sending highly sensitive data (e.g., credit card numbers) via these channels. On the abuse side, we see the ease with which these gateways are being used to circumvent authentication mechanisms, and show that previously proposed mitigations to PVA fraud such as block banning are unlikely to be successful in practice. These measurements indicate that all providers relying on SMS as an out-of-band channel for authentication with strong ties to a user's identity should reevaluate their current solutions for this evolving space.
From text messaging, we moved to examine the fact that carriers cannot authenticate inbound calls in their networks. Cellular networks in developing nations rely on tariffs

collected at regulated interconnects in order to subsidize the cost of their deployment and operation. These charges can result in significant expense to foreign callers and create incentive for such callers to find less expensive, albeit unlawful, means of terminating their calls. Simboxes enable such interconnect bypass fraud by tunneling traffic from a VoIP connection into a provider network without proper authorization. In Chapter 4, we develop the Ammit tool, which allows us to detect simboxes based on measurable differences between true GSM and tunneled VoIP audio. Ammit uses fast signal processing techniques to identify whether individual calls are likely made by a simbox and then to develop profiles of SIM cards. This approach allows a provider to deactivate the associated SIMs rapidly and virtually eliminates the economic incentive to conduct such fraud. In so doing, we demonstrate that the subsidized rates that allow much of the developing world to be connected can be protected against the impact of this fraud.

We then moved to the problem of authenticating end points. In spite of the trust placed in phone networks, authentication between two end points across the technologically diverse phone network was previously not possible. In Chapter 5, we present AuthLoop to address this challenge. We began by designing a modem and supporting link-layer protocol for the reliable delivery of data over a voice channel. With the limitations of this channel understood, we then presented a security model and protocol to provide explicit authentication of an assertion of Caller ID, and discussed ways in which client credentials could be subsequently protected. Finally, we demonstrated that AuthLoop reduced execution time by over an order of magnitude on average when compared to the direct application of TLS 1.2 to this problem. In so doing, we have demonstrated that end-to-end authentication is indeed possible across modern telephony networks.
After AuthLoop, we examined how the AuthentiCall system could use auxiliary data channels to authenticate phone calls. AuthentiCall not only cryptographically authenticates both parties on the call, but also provides strong guarantees of the integrity

of conversations made over traditional phone networks. We achieve these ends through the use of formally verified protocols that bind low-bitrate data channels to heterogeneous audio channels. Unlike previous efforts, we demonstrate that AuthentiCall can be used to provide strong authentication before calls are answered, allowing users to ignore calls claiming a particular Caller ID that are unable or unwilling to provide proof of that assertion. Moreover, we detect 99% of tampered call audio with negligible false positives and only a worst-case 1.4-second call establishment overhead. In so doing, we argue that strong and efficient end-to-end authentication for phone networks is approaching a practical reality.

In this dissertation, we have demonstrated the ways in which telephones are used for authentication and provided new techniques to reduce or eliminate fraudulent use of phone networks. We began by demonstrating how text messages are being used for authenticating users, despite the fact that fraud is simple to conduct and difficult to prevent. We next turned to interconnect bypass fraud, where one telephone network misrepresents the source of an original call; we showed that this is detectable using features inherent to simboxed audio. We then addressed authenticating callers end to end, first with the AuthLoop system, which uses in-band data exchange to cryptographically authenticate end points over the voice channel. We then developed the AuthentiCall system to provide cryptographic mutual authentication and call integrity guarantees. In total, these efforts pave a way forward for an improved telephone network that provides the security guarantees that users expect and require.

REFERENCES

[1] eMarketer and AP, Number of mobile phone users worldwide from 2013 to 2019 (in billions), https://www.statista.com/statistics/274774/forecast-of-mobile-phone-users-worldwide/, 2015.

[2] ITU, Number of fixed telephone lines worldwide from 2000 to 2016, https://www.statista.com/statistics/273014/number-of-fixed-telephone-lines-worldwide-since-2000/, 2016.

[3] TIA, VoIP residential and business telephone lines in the United States from 2010 to 2018 (in millions), https://www.statista.com/statistics/615387/voip-telephone-lines-in-the-us/, 2015.

[4] Communications Fraud Control Association (CFCA), 2013 Global Fraud Loss Survey, http://www.cvidya.com/media/62059/global-fraud_loss_survey2013.pdf, 2013.

[5] P. Lapsley, Exploding the Phone. Grove Press, Feb. 2014, p. 448.

[6] P. Traynor, P. McDaniel, and T. La Porta, Security for Telecommunications Networks, ser. Advances in Information Security, ISBN 978-0-387-72441-6. Springer, Aug. 2008.

[7] P. Traynor, “Characterizing the Security Implications of Third-Party EAS Over Cellular Text Messaging Services,” IEEE Transactions on Mobile Computing (TMC), vol. 11, no. 6, pp. 983–994, 2012.

[8] SMS Forum, Short Message Peer to Peer Protocol Specification 5.0, 2003.

[9] Twilio, http://www.twilio.com, 2015.

[10] Nexmo, https://www.nexmo.com/, 2015.

[11] Plivo, https://www.plivo.com/, 2015.

[12] Burner App, http://www.burnerapp.com, 2015.

[13] Pinger, http://www.pinger.com, 2015.

[14] Google Voice, http://www.google.com/voice, 2015.

[15] Apple Continuity, https://support.apple.com/en-us/HT204681, 2015.

[16] Pushbullet, http://pushbullet.com, 2015.

[17] MightyText, http://mightytext.net, 2015.

[18] B. Krebs, Banks: Credit Card Breach at Home Depot, http://krebsonsecurity.com/2014/09/banks-credit-card-breach-at-home-depot/, Sep. 2014.

[19] U.S. Office of Personnel Management, Cybersecurity Incidents, https://www.opm.gov/cybersecurity/cybersecurity-incidents/, 2015.

[20] B. Krebs, Online Cheating Site AshleyMadison Hacked, http://krebsonsecurity.com/2015/07/online-cheating-site-ashleymadison-hacked/, Jul. 2015.

[21] ——, Experian Breach Affects 15 Million Consumers, http://krebsonsecurity.com/2015/10/experian-breach-affects-15-million-consumers/, Oct. 2015.

[22] Vassilis Prevelakis and Diomidis Spinellis, “The Athens Affair,” IEEE Spectrum, Jun. 2007.

[23] A. Tims, “‘SIM swap’ gives fraudsters access-all-areas via your mobile phone,” The Guardian, Sep. 2015.

[24] K. Campbell-Dollaghan, How Hackers Reportedly Side-Stepped Google’s Two-Factor Authentication, http://gizmodo.com/how-hackers-reportedly-side-stepped-gmails-two-factor-a-1653631338, Nov. 2014.

[25] Henry Lichstein, “Telephone Hackers Active,” The Tech, Nov. 1963. [Online]. Available: http://tech.mit.edu/V83/PDF/V83-N24.pdf.

[26] Tobias Engel, Tracking Mobile Phones, Berlin, 2008. [Online]. Available: http://berlin.ccc.de/~tobias/25c3-locating-mobile-phones.pdf.

[27] Karsten Nohl, SS7 Attack Update and Phone Phreaking, 2016. [Online]. Available: https://www.youtube.com/watch?v=BbPLscWQ1Bw.

[28] A. Ramirez, Theft through cellular ‘clone’ calls, http://www.nytimes.com/1992/04/07/business/theft-through-cellular-clone-calls.html, Apr. 1992.

[29] C.-H. Lee, M.-S. Hwang, and W.-P. Yang, “Enhanced privacy and authentication for the global system for mobile communications,” Wireless Networks, vol. 5, no. 4, pp. 231–243, 1999.

[30] Y. J. Choi and S. J. Kim, “An improvement on privacy and authentication in GSM,” in Proceedings of the Workshop on Information Security Applications (WISA), 2004.

[31] E. Barkan, E. Biham, and N. Keller, “Instant Ciphertext-Only Cryptanalysis of GSM Encrypted Communication,” Journal of Cryptology, vol. 21, no. 3, pp. 392–429, 2008.

[32] M. Toorani and A. Beheshti, “Solutions to the GSM security weaknesses,” in Proceedings of the Second International Conference on Next Generation Mobile Applications, Services, and Technologies (NGMAST), 2008, pp. 576–581.

[33] 3rd Generation Partnership Project, “A Guide to 3rd Generation Security,” Tech. Rep. TS 33.900, 2000.

[34] ——, “3G Security Principles and Objectives,” Tech. Rep. TS 33.120, 2001.

[35] ——, “IP Multimedia Subsystem (IMS),” Tech. Rep. TS 23.228, 2012.

[36] ——, “Full rate speech; Transcoding,” Tech. Rep. TS 46.010.

[37] U. Meyer and S. Wetzel, “A man-in-the-middle attack on UMTS,” Proceedings of the 2004 ACM Workshop on Wireless Security, p. 90, 2004.

[38] G. Kambourakis, C. Kolias, S. Gritzalis, and J. H. Park, “DoS Attacks Exploiting Signaling in UMTS and IMS,” Comput. Commun., vol. 34, no. 3, Mar. 2011.

[39] M. Arapinis, L. Mancini, E. Ritter, M. Ryan, N. Golde, K. Redon, and R. Borgaonkar, “New privacy issues in mobile telephony: Fix and verification,” in Proceedings of the 2012 ACM Conference on Computer and Communications Security, New York, NY, USA, 2012, pp. 205–216.

[40] H. Kim, D. Kim, M. Kwon, H. Han, Y. Jang, D. Han, T. Kim, and Y. Kim, “Breaking and Fixing VoLTE: Exploiting Hidden Data Channels and Mis-implementations,” pp. 328–339, 2015.

[41] C.-Y. Li, G.-H. Tu, C. Peng, Z. Yuan, Y. Li, S. Lu, and X. Wang, “Insecurity of Voice Solution VoLTE in LTE Mobile Networks,” in Proceedings of the 22nd ACM Conference on Computer and Communications Security, ACM, 2015.

[42] G.-H. Tu, C.-Y. Li, C. Peng, Y. Li, and S. Lu, “New Security Threats Caused by IMS-based SMS Service in 4G LTE Networks,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA: ACM, 2016, pp. 1118–1130.

[43] A. Shaik, R. Borgaonkar, N. Asokan, V. Niemi, and J.-P. Seifert, “Practical attacks against privacy and availability in 4G/LTE mobile communication systems,” in Proceedings of the 2016 Network and Distributed Systems Security Symposium (NDSS), 2016.

[44] M. Huffman, Survey: 11% of adults lost money to a phone scam last year, https://www.consumeraffairs.com/news/survey-11-of-adults-lost-money-to-a-phone-scam-last-year-012616.html, 2016.

[45] K. Thomas, D. Huang, D. Wang, E. Bursztein, C. Grier, T. J. Holt, C. Kruegel, D. McCoy, S. Savage, and G. Vigna, “Framing Dependencies Introduced by Underground Commoditization,” in Proceedings of the 14th Annual Workshop on the Economics of Information Security, 2015.

[46] C. Kanich, C. Kreibich, K. Levchenko, B. Enright, G. M. Voelker, V. Paxson, and S. Savage, “Spamalytics: An empirical analysis of spam marketing conversion,” in Proceedings of the 15th ACM Conference on Computer and Communications Security, ACM, 2008, pp. 3–14.

[47] C. Kanich, N. Weaver, D. McCoy, T. Halvorson, C. Kreibich, K. Levchenko, V. Paxson, G. M. Voelker, and S. Savage, “Show Me the Money: Characterizing Spam-advertised Revenue,” in USENIX Security Symposium, 2011, pp. 15–15.

[48] K. Thomas, D. McCoy, C. Grier, A. Kolcz, and V. Paxson, “Tracking Fraudulent Accounts: The Role of the Underground Market in Twitter Spam and Abuse,” in USENIX Security, 2013, pp. 195–210.

[49] B. Stone-Gross, M. Cova, L. Cavallaro, B. Gilbert, M. Szydlowski, R. Kemmerer, C. Kruegel, and G. Vigna, “Your Botnet is My Botnet: Analysis of a Botnet Takeover,” in Proceedings of the 16th ACM Conference on Computer and Communications Security, ser. CCS ’09, New York, NY, USA: ACM, 2009, pp. 635–647.

[50] C. Y. Cho, J. Caballero, C. Grier, V. Paxson, and D. Song, “Insights from the inside: A view of botnet management from infiltration,” in USENIX Workshop on Large-Scale Exploits and Emergent Threats (LEET), 2010.

[51] C. Grier, L. Ballard, J. Caballero, N. Chachra, C. J. Dietrich, K. Levchenko, P. Mavrommatis, D. McCoy, A. Nappa, A. Pitsillidis, N. Provos, M. Z. Rafique, M. A. Rajab, C. Rossow, K. Thomas, V. Paxson, S. Savage, and G. M. Voelker, “Manufacturing Compromise: The Emergence of Exploit-as-a-service,” in Proceedings of the 2012 ACM Conference on Computer and Communications Security, ser. CCS ’12, New York, NY, USA: ACM, 2012, pp. 821–832.

[52] A. P. Felt, M. Finifter, E. Chin, S. Hanna, and D. Wagner, “A Survey of Mobile Malware in the Wild,” in ACM Workshop on Security and Privacy in Mobile Devices, Chicago, Illinois, USA, Oct. 2011.

[53] Y. Zhou and X. Jiang, “Dissecting Android Malware: Characterization and Evolution,” in 2012 IEEE Symposium on Security and Privacy (SP), May 2012, pp. 95–109.

[54] C. Lever, M. Antonakakis, B. Reaves, P. Traynor, and W. Lee, “The Core of the Matter: Analyzing Malicious Traffic in Cellular Carriers,” in Proceedings of the 20th Network and Distributed System Security Symposium, San Diego, CA, Feb. 2013.

[55] I. Murynets and R. Piqueras Jover, “Crime Scene Investigation: SMS Spam Data Analysis,” in Proceedings of the 2012 ACM Conference on Internet Measurement Conference, ser. IMC ’12, New York, NY, USA: ACM, 2012, pp. 441–452.

[56] H. Tan, N. Goharian, and M. Sherr, “$100,000 Prize Jackpot. Call Now!: Identifying the Pertinent Features of SMS Spam,” in Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA: ACM, 2012, pp. 1175–1176.

[57] N. Jiang, Y. Jin, A. Skudlark, and Z.-L. Zhang, “Greystar: Fast and Accurate Detection of SMS Spam Numbers in Large Cellular Networks using Grey Phone Space,” in Proceedings of the 22nd USENIX Security Symposium, Washington, DC, USA: USENIX Association, 2013.

[58] A. Narayan and P. Saxena, “The Curse of 140 Characters: Evaluating the Efficacy of SMS Spam Detection on Android,” in Proceedings of the Third ACM Workshop on Security and Privacy in Smartphones & Mobile Devices, ser. SPSM ’13, New York, NY, USA: ACM, 2013, pp. 33–42.

[59] J. Atwood, Make Your Email Hacker Proof, http://blog.codinghorror.com/make- your-email-hacker-proof/, Apr. 2012.

[60] B. Schneier, “Two-factor Authentication: Too Little, Too Late,” Commun. ACM, vol. 48, no. 4, Apr. 2005.

[61] B. Reaves, N. Scaife, A. Bates, P. Traynor, and K. Butler, “Mo(bile) Money, Mo(bile) Problems: Analysis of Branchless Banking Applications in the Developing World,” in Proceedings of the USENIX Security Symposium (SECURITY), 2015.

[62] RSA SecurID Hardware Tokens, http://www.emc.com/security/rsa-securid/rsa-securid-hardware-tokens.htm, 2015.

[63] IdentityGuard Identity Authentication Platform, https://www.entrust.com/products/entrust-identityguard/, 2015.

[64] J. Leyden, Visa trials PIN payment card to fight online fraud, http://www.theregister.co.uk/2008/11/10/visa_one_time_code_card/, 2008.

[65] SiPix Imaging, Inc., World’s First ISO Compliant Payment DisplayCard using SiPix and SmartDisplayer’s Flexible Display Panel, http://www.businesswire.com/portal/site/google/index.jsp?ndmViewId=news_view&newsId=20060510006193&newsLang=en, 2006.

[66] A.-B. Stensgaard, Biometric breakthrough - credit cards secured with fingerprint recognition made feasible, http://www.ameinfo.com/58236.html, 2006.

[67] CardTechnology, UAE ID Card To Support Iris Biometrics, http://www.cardtechnology.com/article.html?id=20070423V0XCZ91L, 2007.

[68] F. Aloul, S. Zahidi, and W. El-Hajj, “Two factor authentication using mobile phones,” in IEEE/ACS International Conference on Computer Systems and Applications (AICCSA 2009), May 2009, pp. 641–644.

[69] D. DeFigueiredo, “The Case for Mobile Two-Factor Authentication,” IEEE Security Privacy, vol. 9, no. 5, pp. 81–85, Sep. 2011.

[70] Mobile Authentication, https://www.duosecurity.com/product/methods/duo-mobile, 2015.

[71] M. Adham, A. Azodi, Y. Desmedt, and I. Karaolis, “How to Attack Two-Factor Authentication Internet Banking,” in Financial Cryptography and Data Security, ser. Lecture Notes in Computer Science 7859, Springer Berlin Heidelberg, Apr. 2013, pp. 322–328.

[72] R. K. Konoth, V. van der Veen, and H. Bos, “How Anywhere Computing Just Killed Your Phone-Based Two-Factor Authentication,” in Proceedings of the 20th International Conference on Financial Cryptography and Data Security, 2016.

[73] C. Castillo, Spitmo vs Zitmo: Banking Trojans Target Android, http://blogs.mcafee.com/mcafee-labs/spitmo-vs-zitmo-banking-trojans-target-android, Sep. 2011.

[74] L. Koot, “Security of mobile TAN on smartphones,” Master’s Thesis, Radboud University Nijmegen, Nijmegen, Feb. 2012.

[75] C. Mulliner, R. Borgaonkar, P. Stewin, and J.-P. Seifert, “SMS-based one-time passwords: Attacks and defense,” in Detection of Intrusions and Malware, and Vulnerability Assessment, Springer, 2013, pp. 150–159.

[76] R. E. Koenig, P. Locher, and R. Haenni, “Attacking the Verification Code Mechanism in the Norwegian Internet Voting System,” in E-Voting and Identity, ser. Lecture Notes in Computer Science, J. Heather, S. Schneider, and V. Teague, Eds., Springer Berlin Heidelberg, Jul. 2013, pp. 76–92.

[77] A. Dmitrienko, C. Liebchen, C. Rossow, and A.-R. Sadeghi, “On the (In)Security of Mobile Two-Factor Authentication,” in Financial Cryptography and Data Security (FC14), Springer, Mar. 2014.

[78] J.-E. L. Eide, “SMS One-Time Passwords: Security in Two-Factor Authentication,” Master’s Thesis, University of Bergen, May 2015.

[79] A. Biryukov, A. Shamir, and D. Wagner, “Real Time Cryptanalysis of A5/1 on a PC,” in Proceedings of the 7th International Workshop on Fast Software Encryption, ser. FSE ’00, London, UK: Springer-Verlag, 2001, pp. 1–18.

[80] O. Dunkelman, N. Keller, and A. Shamir, “A Practical-time Related-key Attack on the KASUMI Cryptosystem Used in GSM and 3G Telephony,” in Proceedings of the 30th Annual Conference on Advances in Cryptology, ser. CRYPTO’10, Berlin, Heidelberg: Springer-Verlag, 2010, pp. 393–410.

[81] Z. Ahmadian, S. Salimi, and A. Salahi, “New attacks on UMTS network access,” in Wireless Telecommunications Symposium, 2009. WTS 2009, Apr. 2009, pp. 1–6.

[82] N. Golde, K. Redon, and R. Borgaonkar, “Weaponizing Femtocells: The Effect of Rogue Devices on Mobile Telecommunications,” in NDSS, 2012.

[83] A. Dabrowski, N. Pianta, T. Klepp, M. Mulazzani, and E. Weippl, “IMSI-catch me if you can,” in Proceedings of the 30th Annual Computer Security Applications Conference, 2014.

[84] P. Traynor, W. Enck, P. McDaniel, and T. La Porta, “Exploiting Open Functionality in SMS-Capable Cellular Networks,” Journal of Computer Security (JCS), vol. 16, no. 6, pp. 713–742, 2008.

[85] P. Traynor, P. McDaniel, and T. La Porta, “On Attack Causality in Internet Connected Cellular Networks,” in Proceedings of the USENIX Security Symposium (SECURITY), 2007.

[86] K. Thomas, D. Iatskiv, E. Bursztein, T. Pietraszek, C. Grier, and D. McCoy, “Dialing Back Abuse on Phone Verified Accounts,” in Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA: ACM, 2014, pp. 465–476.

[87] P. Burge and J. Shawe-Taylor, “An unsupervised neural network approach to profiling the behavior of mobile phone users for use in fraud detection,” Journal of Parallel and Distributed Computing, vol. 61, no. 7, pp. 915–925, Jul. 2001.

[88] K. C. Cox, S. G. Eick, G. J. Wills, and R. J. Brachman, “Visual data mining: Recognizing telephone calling fraud,” Data Mining and Knowledge Discovery, vol. 1, no. 2, pp. 225–231, Jun. 1997.

[89] C. S. Hilas and P. A. Mastorocostas, “An application of supervised and unsupervised learning approaches to telecommunications fraud detection,” Knowledge-Based Systems, vol. 21, no. 7, pp. 721–726, Oct. 2008.

[90] S. Qayyum, S. Mansoor, A. Khalid, K. Khushbakht, Z. Halim, and A. Baig, “Fraudulent call detection for mobile networks,” in 2010 International Conference on Information and Emerging Technologies (ICIET), 2010, pp. 1–5.

[91] A. H. Elmi, S. Ibrahim, and R. Sallehuddin, “Detecting SIM box fraud using neural network,” in IT Convergence and Security 2012, ser. Lecture Notes in Electrical Engineering 215, K. J. Kim and K.-Y. Chung, Eds., Springer Netherlands, Jan. 2013, pp. 575–582.

[92] I. Murynets, M. Zabarankin, R. Jover, and A. Panagia, “Analysis and detection of SIMbox fraud in mobility networks,” in 2014 Proceedings IEEE INFOCOM, Apr. 2014, pp. 1519–1526.

[93] Mobius fraud management, http://www.mobiusws.com/solutions/fraud-management/, 2014.

[94] ROC fraud management, http://www.subex.com/pdf/bypass-fraud.pdf, 2014.

[95] Araxxe SIM box detection, http://www.araxxe.com/SIM-box-detection.html, 2014.

[96] Meucci Solutions SIM box detection, http://www.meucci-solutions.com/solutions/fraud-and-revenue/sim-box-detection/, 2014.

[97] Mocean SIM box detector, http://www.mocean.com.my/SIM_box_detector_solution.php, 2014.

[98] Roamware SIM box detector, http://www.roamware.com/predictive_intelligence_sim_box_detector.php, 2014.

[99] Telenor SIM box detection, http://www.telenorglobal.com/wp-content/uploads/sites/4/2013/09/Global-SIM-Box-Detection1.pdf, 2014.

[100] Agilis International SIM box detection, http://www.agilisinternational.com/solutions/customer-analytics/risk-and-fraud-management/, 2014.

[101] CxB Solutions SIM box detection, http://www.cxbsolutions.com/html/sim_box_detection.html, 2014.

[102] FraudBuster SIMBuster, http://www.fraudbuster.mobi/new-simbuster-and-trafficchecker-deployment-in-africa/, 2014.

[103] XINTEC SIM box detector, http://www.xintec.com/fraud-management/sim-box-detector/, 2014.

[104] R. Bresciani, “The ZRTP Protocol Analysis on the Diffie-Hellman Mode,” Trinity College Dublin Computer Science Department, Tech. Rep. TCD-CS-2009-13, 2009.

[105] P. R. Zimmermann, The Zfone Project, http://zfoneproject.com/, 2016.

[106] R. Bresciani, “The ZRTP Protocol Security Considerations,” Laboratoire Spécification et Vérification, ENS Cachan, Tech. Rep. LSV-07-20, 2007.

[107] PGPfone - Pretty Good Privacy Phone, http://www.pgpi.org/products/pgpfone/, 2015.

[108] GSMK CryptoPhone, http://www.cryptophone.de/en/, 2015.

[109] Signal, https://itunes.apple.com/us/app/signal-private-messenger/id874139669?mt=8, 2015.

[110] RedPhone, https://play.google.com/store/apps/details?id=com.littlebytesofpi.linphonesip.

[111] P. Zimmermann, A. Johnston, and J. Callas, “ZRTP: Media Path Key Agreement for Unicast Secure RTP,” IETF, RFC 6189, 2011.

[112] Silent Circle, https://www.silentcircle.com/, 2015.

[113] I. Dacosta and P. Traynor, “Proxychain: Developing a Robust and Efficient Authentication Infrastructure for Carrier-Scale VoIP Networks,” in Proceedings of the USENIX Annual Technical Conference (ATC), 2010.

[114] I. Dacosta, V. Balasubramaniyan, M. Ahamad, and P. Traynor, “Improving Authentication Performance of Distributed SIP Proxies,” IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 22, no. 11, pp. 1804–1812, 2011.

[115] M. Shirvanian and N. Saxena, “Wiretapping via Mimicry: Short Voice Imitation Man-in-the-Middle Attacks on Crypto Phones,” in Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2014, pp. 868–879.

[116] M. Petraschek, T. Hoeher, O. Jung, H. Hlavacs, and W. Gansterer, “Security and usability aspects of Man-in-the-Middle attacks on ZRTP,” Journal of Universal Computer Science, no. 5, pp. 673–692.

[117] Directory of Unknown Callers, http://www.800notes.com/, 2015.

[118] Finally! No more annoying Robocalls and Telemarketers, http://www.nomorobo.com/, 2016.

[119] Z. Wu, A. Khodabakhsh, C. Demiroglu, J. Yamagishi, D. Saito, T. Toda, and S. King, “SAS: A speaker verification spoofing database containing diverse attacks,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 4440–4444.

[120] F. Alegre, G. Soldi, and N. Evans, “Evasion and obfuscation in automatic speaker verification,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 749–753.

[121] Z. Wu and H. Li, “Voice conversion and spoofing attack on speaker verification systems,” in Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), IEEE, 2013.

[122] F. Alegre and R. Vipperla, “On the vulnerability of automatic speaker recognition to spoofing attacks with artificial signals,” in Proceedings of the 20th European Signal Processing Conference (EUSIPCO), 2012, pp. 36–40.

[123] Y. Stylianou, “Voice transformation: A survey,” in Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing (ICASSP), 2009.

[124] Q. Jin, A. R. Toth, A. W. Black, and T. Schultz, “Is voice transformation a threat to speaker identification?” in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2008, pp. 4845–4848.

[125] V. Balasubramaniyan, A. Poonawalla, M. Ahamad, M. Hunter, and P. Traynor, “PinDr0p: Using Single-Ended Audio Features to Determine Call Provenance,” in Proceedings of the ACM Conference on Computer and Communications Security (CCS),2010.

[126] B. Reaves, E. Shernan, A. Bates, H. Carter, and P. Traynor, “Boxed Out: Blocking Cellular Interconnect Bypass Fraud at the Network Edge,” in Proceedings of the USENIX Security Symposium (SECURITY), 2015.

[127] S. Rosset, U. Murad, E. Neumann, Y. Idan, and G. Pinkas, “Discovery of Fraud Rules for Telecommunications-Challenges and Solutions,” in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), New York, NY, USA, 1999, pp. 409–413.

[128] B. Mathieu, S. Niccolini, and D. Sisalem, “SDRS: A Voice-over-IP Spam Detection and Reaction System,” IEEE Security & Privacy Magazine, vol. 6, no. 6, pp. 52–59, Nov. 2008.

[129] H. Mustafa, W. Xu, A.-R. Sadeghi, and S. Schulz, “You can SPIT, but you can’t hide: Spammer identification in telephony networks,” in 2011 Proceedings IEEE INFOCOM, 2011, pp. 41–45.

[130] N. Jiang, Y. Jin, A. Skudlark, W.-L. Hsu, G. Jacobson, S. Prakasam, and Z.-L. Zhang, “Isolating and analyzing fraud activities in a large cellular network via voice call graph analysis,” in Proceedings of the 10th international conference on Mobile systems, applications, and services (MobiSys), 2012, p. 253.

[131] H. Sengar, “VoIP Fraud: Identifying a wolf in sheep’s clothing,” in Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2014, pp. 334–345.

[132] B. Moeller and A. Langley, “TLS Fallback Signaling Cipher Suite Value (SCSV) for Preventing Protocol Downgrade Attacks,” Internet Engineering Task Force, Internet-Draft, 2014.

[133] J. Clark and P. C. Van Oorschot, “SoK: SSL and HTTPS: Revisiting past challenges and evaluating certificate trust model enhancements,” in Proceedings of the IEEE Symposium on Security and Privacy (S&P),2013,pp.511–525.

[134] T. Zoller, TLS & SSLv3 Renegotiation Vulnerability, http://www.g-sec.lu/practicaltls.pdf, 2009.

[135] Z. Durumeric, J. Kasten, D. Adrian, J. A. Halderman, M. Bailey, F. Li, N. Weaver, J. Amann, J. Beekman, M. Payer, and V. Paxson, “The matter of heartbleed,” in Proceedings of the 2014 Conference on Internet Measurement Conference (IMC), Vancouver, BC, Canada: ACM, 2014, pp. 475–488.

[136] R. Rivest and B. Lampson, SDSI: A Simple Distributed Security Infrastructure, http://research.microsoft.com/en-us/um/people/blampson/59-sdsi/webpage.html, 1996.

[137] C. Ellison, B. Frantz, B. Lampson, R. L. Rivest, B. Thomas, and T. Ylonen, “SPKI Certificate Theory,” IETF, RFC 2693, 1999.

[138] C. Ellison and B. Schneier, “Ten risks of PKI: What you’re not being told about public key infrastructure,” Computer Security Journal, vol. 16, no. 1, pp. 1–7, 2000.

[139] R. Holz, L. Braun, N. Kammenhuber, and G. Carle, “The SSL landscape: a thorough analysis of the X.509 PKI using active and passive measurements,” in Proceedings of the 2011 ACM SIGCOMM conference on Internet Measurement Conference (IMC), 2011, pp. 427–444.

[140] I. Dacosta, M. Ahamad, and P. Traynor, “Trust No One Else: Detecting MITM Attacks Against SSL/TLS Without Third-Parties,” in Proceedings of the European Symposium on Research in Computer Security (ESORICS),2012.

[141] L. S. Huang, A. Rice, E. Ellingsen, and C. Jackson, “Analyzing forged SSL certificates in the wild,” in Proceedings of the IEEE Symposium on Security and Privacy (SP), 2014.

[142] A. Bates, J. Pletcher, T. Nichols, B. Hollembaek, and K. R. Butler, “Forced perspectives: Evaluating an SSL trust enhancement at scale,” in Proceedings of the 2014 Internet Measurement Conference (IMC), ACM, 2014, pp. 503–510.

[143] E. Rescorla, SSL and TLS: Designing and Building Secure Systems. Addison-Wesley, 2001, p. 499.

[144] R. Dhamija, J. D. Tygar, and M. Hearst, “Why phishing works,” in Proceedings of the SIGCHI conference on Human Factors in Computing Systems (CHI), ser. CHI ’06, New York, NY, USA: ACM, 2006.

[145] S. E. Schechter, R. Dhamija, A. Ozment, and I. Fischer, “The emperor’s new security indicators,” in Proceedings of the IEEE Symposium on Security and Privacy (SP),2007.

[146] S. Egelman, L. F. Cranor, and J. Hong, “You’ve been warned: An empirical study of the effectiveness of web browser phishing warnings,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), 2008.

[147] J. Sobey, R. Biddle, P. van Oorschot, and A. S. Patrick, “Exploring user reactions to new browser cues for extended validation certificates,” in Proceedings of the European Symposium on Research in Computer Security (ESORICS),2008.

[148] D. Akhawe, B. Amann, M. Vallentin, and R. Sommer, “Here’s my cert, so trust me, maybe? Understanding TLS errors on the web,” in Proceedings of the 22nd International Conference on World Wide Web (WWW), 2013, pp. 59–70.

[149] D. Akhawe and A. P. Felt, “Alice in Warningland: A large-scale field study of browser security warning effectiveness,” in Proceedings of the 22nd USENIX Conference on Security, ser. SEC’13, Washington, D.C.: USENIX Association, 2013, pp. 257–272.

[150] “ITU standard P.800: Methods for subjective determination of transmission quality,” Aug. 1996.

[151] “ITU standard P.862: Perceptual evaluation of speech quality (PESQ),” Oct. 2007.

[152] A. Takahashi, A. Kurashima, and H. Yoshino, “Objective assessment methodology for estimating conversational quality in VoIP,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 6, pp. 1984–1993, 2006.

[153] S. Broom, “VoIP quality assessment: Taking account of the edge-device,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 6, 2006.

[154] “ITU standard P.563: Single-ended method for objective speech quality assessment,” Apr. 2004.

[155] T. Falk and W.-Y. Chan, “Single-ended speech quality measurement using machine learning methods,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 6, pp. 1935–1947, 2006.

[156] C. Hoene, H. Karl, and A. Wolisz, “A perceptual quality model intended for adaptive VoIP applications: Research articles,” Int. J. Commun. Syst., vol. 19, no. 3, pp. 299–316, Apr. 2006.

[157] L. Ding, Z. Lin, A. Radwan, M. S. El-Hennawey, and R. A. Goubran, “Non-intrusive single-ended speech quality assessment in VoIP,” Speech Communication, vol. 49, no. 6, pp. 477–489, Jun. 2007.

[158] M. Paping and T. Fahnle, “Automatic detection of disturbing robot voice and ping-pong effects in GSM transmitted speech,” in EUROSPEECH, 1997.

[159] A. Hines, J. Skoglund, A. Kokaram, and N. Harte, “Monitoring the effects of temporal clipping on VoIP speech quality,” in 14th Annual Conference of the International Speech Communication Association, ISCA, 2013.

[160] The Open University, 2014 Text Messaging Usage Statistics, http://www.openuniversity.edu/news/news/2014-text-messaging-usage-statistics, Dec. 2014.

[161] S. J. Delany, M. Buckley, and D. Greene, “SMS spam filtering: Methods and data,” Expert Systems with Applications, vol. 39, no. 10, pp. 9899–9908, 2012.

[162] Scrapy, http://scrapy.org, 2015.

[163] The International Organization for Standardization, ISO 8601 - Time and date format, http://www.iso.org/iso/home/standards/iso8601.htm, 2004.

[164] OpenCNAM, https://www.opencnam.com, 2015.

[165] VirusTotal, http://virustotal.com.

[166] E. McCallister, T. Grance, and K. Scarfone, NIST SP800-122: Guide to Protecting the Confidentiality of Personally Identifiable Information (PII), http://csrc.nist.gov/publications/nistpubs/800-122/sp800-122.pdf, 2010.

[167] H. P. Luhn, “Computer for verifying numbers,” US Patent 2950048, 1960.

[168] Talk2, http://talk2ph.com, 2015.

[169] SMSGlobal, https://www.smsglobal.com, 2015.

[170] LiqPay, https://www.liqpay.com, 2015.

[171] B. J. Frey and D. Dueck, “Clustering by passing messages between data points,” Science, vol. 315, no. 5814, pp. 972–976, 2007.

[172] N. Crooks, Venezuela, the Country With Four Exchange Rates, http://www.bloomberg.com/news/articles/2015-02-19/venezuela-the-country-with-four-exchange-rates, Feb. 2015.

[173] PayCenter, https://www.paycenter.de, 2015.

[174] Boss Revolution, https://www.bossrevolution.ca, 2015.

[175] Frim, http://fr.im, 2015.

[176] eCall, http://www.ecall.ch, 2015.

[177] RedOxygen, http://www.redoxygen.com, 2015.

[178] Visa QIWI Wallet, https://qiwi.ru, 2015.

[179] M. Honan, How Apple and Amazon security flaws led to my epic hacking, http://www.wired.com/2012/08/apple-amazon-mat-honan-hacking/all/, Aug. 2012.

[180] A. Skudlark, “Characterizing SMS Spam in a Large Cellular Network via Mining Victim Spam Reports,” AT&T Labs, Tech. Rep., Dec. 2014.

[181] P. Traynor, M. Lin, M. Ongtang, V. Rao, T. Jaeger, P. McDaniel, and T. La Porta, “On cellular botnets: Measuring the impact of malicious devices on a cellular network core,” in Proceedings of the 16th ACM conference on Computer and communications security, 2009.

[182] “ITU-T recommendation G.711,” Jun. 1990.

[183] M. Jalil, F. Butt, and A. Malik, “Short-time energy, magnitude, zero crossing rate and autocorrelation measurement for discriminating voiced and unvoiced segments of speech signals,” in Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE), 2013 International Conference on, May 2013, pp. 208–212.

[184] H. Schulzrinne and S. Casner, “RTP Profile for Audio and Video Conferences with Minimal Control,” IETF, RFC 3551, 2003.

[185] 3rd Generation Partnership Project, “Full rate speech; Substitution and muting of lost frames for full rate speech channels,” Tech. Rep. TS 46.011.

[186] C. Perkins, O. Hodson, and V. Hardman, “A survey of packet loss recovery techniques for streaming audio,” IEEE Network, vol. 12, no. 5, pp. 40–48, 1998.

[187] R. Les Cottrell, “Pinging Africa - a decade long quest aims to pinpoint the Internet bottlenecks holding Africa back,” IEEE Spectrum, vol. 50, no. 2, pp. 54–59, Feb. 2013.

[188] OsmocomBB GSM baseband, http://bb.osmocom.org/trac/, 2015.

[189] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. L. Dahlgren, and V. Zue, TIMIT Acoustic-Phonetic Continuous Speech Corpus. Philadelphia: Linguistic Data Consortium, 1993.

[190] SoX, http://sox.sourceforge.net/Main/HomePage, 2016.

[191] W. Jiang and H. Schulzrinne, “Comparison and optimization of packet loss repair methods on VoIP perceived quality under bursty loss,” in Proceedings of the 12th International Workshop on Network and Operating Systems Support for Digital Audio and Video, ser. NOSSDAV ’02, Miami, Florida, USA: ACM, 2002, pp. 73–81.

[192] G. Hasslinger and O. Hohlfeld, “The Gilbert-Elliott model for packet loss in real time services on the Internet,” in Measuring, Modelling and Evaluation of Computer and Communication Systems (MMB), 2008 14th GI/ITG Conference, Mar. 2008, pp. 1–15.

[193] Y. Wang, C. Huang, J. Li, and K. Ross, “Queen: Estimating packet loss rate between arbitrary internet hosts,” in Passive and Active Network Measurement, ser. Lecture Notes in Computer Science, S. Moon, R. Teixeira, and S. Uhlig, Eds., vol. 5448, Springer Berlin Heidelberg, 2009, pp. 57–66.

[194] ITU Software Tool Library Manual. Geneva: ITU, 2009.

[195] J. A. S. Molina, GSM traffic channel simulator, http://www.mathworks.com/matlabcentral/fileexchange/11078-gsm-traffic-channel-simulator, 2006.

[196] 3rd Generation Partnership Project, “Channel Coding,” Tech. Rep. TS 45.003.

[197] ——, “Radio transmission and reception,” Tech. Rep. TS 45.005.

[198] U. Ratana, “Telcos lose money to SIM fraud,” Phnom Penh Post,Feb.2014.

[199] Goip For Grey Route SIM Box, http://www.alibaba.com/product-detail/16-ports-gsm-gateway-goip-for_862885942.html.

[200] GoAntifraud, https://goantifraud.com/.

[201] A. M. White, A. R. Matthews, K. Z. Snow, and F. Monrose, “Phonotactic Reconstruction of Encrypted VoIP Conversations: Hookt on Fon-iks,” in Proceedings of the 2011 IEEE Symposium on Security and Privacy, 2011.

[202] D. Samfat, R. Molva, and N. Asokan, “Untraceability in mobile networks,” in Proceedings of the First Annual International Conference on Mobile Computing and Networking (MobiCom), 1995, pp. 26–36.

[203] TelTech, SpoofCard, http://www.spoofcard.com/, 2015.

[204] A. Tyrberg, “Data Transmission over Speech Coded Voice Channels,” Master’s Thesis, Linkoping University, 2006.

[205] M. A. Ozkan, B. Ors, and G. Saldamli, “Secure voice communication via GSM network,” in 2011 7th International Conference on Electrical and Electronics Engineering (ELECO), 2011, pp. II-288–II-292.

[206] N. N. Katugampala, K. T. Al-Naimi, S. Villette, and A. M. Kondoz, “Real-time end-to-end secure voice communications over GSM voice channel,” in 13th European Signal Processing Conference, 2005, pp. 1–4.

[207] A. Dhananjay, A. Sharma, M. Paik, J. Chen, T. K. Kuppusamy, J. Li, and L. Subramanian, “Hermes: Data transmission over unknown voice channels,” in Proceedings of the Sixteenth Annual International Conference on Mobile Computing and Networking, ser. MobiCom, New York, NY, USA: ACM, 2010.

[208] B. Sklar, Digital Communications: Fundamentals and Applications, 2nd ed. Upper Saddle River, NJ: Prentice Hall, Jan. 2001.

[209] P. Koopman and T. Chakravarty, “Cyclic redundancy code (CRC) polynomial selection for embedded networks,” in 2004 International Conference on Dependable Systems and Networks, Jun. 2004, pp. 145–154.

[210] R. Needham and M. Schroeder, “Using encryption for authentication in large networks of computers,” Communications of the ACM, vol. 21, no. 12, pp. 993–999, 1978.

[211] B. Blanchet, ProVerif: Cryptographic protocol verifier in the formal model, http://www.proverif.ens.fr/, 2016.

[212] Pyelliptic, https://pypi.python.org/pypi/pyelliptic, 2016.

[213] Certicom Research, SEC 2: Recommended Elliptic Curve Domain Parameters, Jan. 2010.

[214] M. Bellare, New Proofs for NMAC and HMAC Security without Collision- Resistance, Advances in Cryptology - CRYPTO ’06, 2006.

[215] National Institute of Standards and Technology, NIST Special Publication 800-107 Revision 1: Recommendation for Applications Using Approved Hash Algorithms, 2008.

[216] FFmpeg, https://www.ffmpeg.org, 2016.

[217] M. Lepinski, R. Barnes, and S. Kent, “An Infrastructure to Support Secure Internet Routing,” IETF, RFC 6480, 2012.

[218] Local Search Association, CLEC Information, http://www.thelsa.org/main/clecinformation.aspx, 2016.

[219] M. Sherr, E. Cronin, S. Clark, and M. Blaze, “Signaling Vulnerabilities in Wiretapping Systems,” IEEE Security & Privacy Magazine, vol. 3, no. 6, pp. 13–25, 2005.

[220] S. Alfonsi, Hacking Your Phone, http://www.cbsnews.com/news/60-minutes-hacking-your-phone/, 2016.

[221] H. Mustafa, A.-R. Sadeghi, S. Schulz, and W. Xu, “You Can Call But You Can’t Hide: Detecting Caller ID Spoofing Attacks,” in Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2014.

[222] B. Reaves, L. Blue, and P. Traynor, “AuthLoop: End-to-End Cryptographic Authentication for Telephony over Voice Channels,” 25th USENIX Security Symposium (USENIX Security 16), pp. 963–978, Aug. 2016.

[223] A. Whitten and J. D. Tygar, “Why Johnny Can’t Encrypt: A Usability Evaluation of PGP 5.0,” in Proceedings of the 8th USENIX Security Symposium, 1999.

[224] B. Blanchet, “An Efficient Cryptographic Protocol Verifier Based on Prolog Rules,” in Proceedings of the 14th IEEE Workshop on Computer Security Foundations, 2001.

[225] Y. Jiao, L. Ji, and X. Niu, “Robust Speech Hashing for Content Authentication,” IEEE Signal Processing Letters, vol. 16, no. 9, pp. 818–821, 2009.

[226] T. Kinnunen, Z.-Z. Wu, K. A. Lee, F. Sedlak, E. S. Chng, and H. Li, “Vulnerability of speaker verification systems against voice conversion spoofing attacks: The case of telephone speech,” in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2012, pp. 4401–4404.

[227] B. Reaves, L. Blue, H. Abdullah, L. Vargas, P. Traynor, and T. Shrimpton, “AuthentiCall: Efficient Identity and Content Authentication for Phone Calls,” in Proceedings of the 26th USENIX Security Symposium, 2017.

[228] O. Tange, “GNU Parallel - The Command-Line Power Tool,” ;login: The USENIX Magazine, vol. 36, no. 1, 2011.

[229] O. Hohlfeld, R. Geib, and G. Haßlinger, “Packet Loss in Real-time Services: Markovian Models Generating QoE Impairments,” in Quality of Service, 2008. IWQoS 2008. 16th International Workshop on, IEEE, 2008, pp. 239–248.

[230] Statista, Average monthly outbound minutes, https://www.statista.com/statistics/ 273902/average-monthly-outbound-mobile-voice-minutes-per-person-in-the-uk/, 2013.

[231] “Bouncy Castle Crypto API,” http://www.bouncycastle.org/, 2007.

[232] M. Lepinski and S. Kent, “Additional Diffie-Hellman Groups for Use with IETF Standards,” RFC Editor, RFC 5114, Jan. 2008.

[233] H. Krawczyk and P. Eronen, “HMAC-based Extract-and-Expand Key Derivation Function (HKDF),” IETF, RFC 5869, 2010.

[234] Qualcomm, Circuit-switched fallback: The first phase of voice evolution for mobile LTE devices, http://www.ericsson.com/res/docs/2012/the_first_phase_of_voice_evolution_for_mobile_lte_devices.pdf, 2012.

[235] Average call, https://www.statista.com/statistics/185828/average-local-mobile-wireless-call-length-in-the-united-states-since-1987/, 2012.

[236] D. Tynan, The terror of swatting: How the law is tracking down high-tech prank callers, https://www.theguardian.com/technology/2016/apr/15/swatting-law-teens-anonymous-prank-call-police, Apr. 2016.

[237] Teen’s iPhone hack gets him arrested for unleashing DDoS on 911 system, https://www.neowin.net/news/teens-iphone-hack-gets-him-arrested-for-unleashing-ddos-on-911-system, 2016.

[238] J. Serror, H. Zang, and J. C. Bolot, “Impact of paging channel overloads or attacks on a cellular network,” in Proceedings of the 5th ACM Workshop on Wireless Security,2006.

[239] P. P. Lee, T. Bu, and T. Woo, “On the detection of signaling DoS attacks on 3G/WiMax wireless networks,” Computer Networks, vol. 53, no. 15, pp. 2601–2616, 2009.

[240] W. Enck, P. Traynor, P. McDaniel, and T. La Porta, “Exploiting Open Functionality in SMS-Capable Cellular Networks,” in Proceedings of the 12th ACM conference on Computer and communications security, ACM, 2005, pp. 393–404.

[241] Let’s Encrypt, https://letsencrypt.org/, 2016.

[242] R. Pries, T. Hoßfeld, and P. Tran-Gia, “On the suitability of the short message service for emergency warning systems,” in 2006 IEEE 63rd Vehicular Technology Conference, IEEE, vol. 2, 2006, pp. 991–995.

[243] C. Amrutkar, P. Traynor, and P. van Oorschot, “An Empirical Evaluation of Security Indicators in Mobile Web Browsers,” IEEE Transactions on Mobile Computing (TMC), vol. 14, no. 5, pp. 889–903, 2015.

BIOGRAPHICAL SKETCH

Bradley Reaves is a computing researcher and educator. His research is dedicated to measuring and improving the security and privacy of computer systems, with a particular emphasis on telephone networks and software for mobile platforms. This work has addressed detection and measurement of mobile malware in the wild, identified systemic risks in developing world mobile money systems, and developed new techniques to distinguish legitimate and fraudulent phone calls. His work has been recognized with two best paper awards, and he was named an NSF Graduate Research Fellow in 2010. Bradley completed his Ph.D. in computer engineering in August 2017 at the University of Florida, where he served as the lead graduate student for the Florida Institute for CyberSecurity Research in 2016. He also earned an M.S. in computer science from the Georgia Institute of Technology and a B.S. and M.S. in computer engineering from Mississippi State University. After graduation, he accepted a position as a tenure-track assistant professor in the Department of Computer Science at North Carolina State University.
