Ph.D. Thesis

Covert Channels in Modern Computer Systems: The cases of Mobile and Cloud

Ken Block

College of Computer and Information Science

Northeastern University

Ph.D. Committee

Guevara Noubir Advisor, Northeastern University

Yunsi Fei Northeastern University

Engin Kirda Northeastern University

Ricardo Rodriguez External Member, Raytheon

July 2018

Abstract

Covert Channels have existed for centuries, from the time of Histiaeus to the modern day. Like its historical roots, the modern covert channel’s life cycle consists of identifying new attack vectors, developing countermeasures and continuing with the next thrust and parry cycle. The front line in this conflict now includes mobile devices and cloud computing centers.

In this thesis, we investigate the potential of a particular covert channel form, its performance limits and explore mitigation techniques and effectiveness. We emphasize permissionless, resource compromising channels where there is complexity faced by both the attacker and the provider / defender in creating and mitigating the attacks respectively. Two channels discussed herein utilize shared resources as the critical communications element while the third uses external entities to communicate with target devices. Furthermore, one channel leverages the physical structure of a mobile device. A second relies on an external source that freely communicates with a mobile device’s unprotected sensor while the third, a cloud platform-based attack, targets shared resources whose very existence provides economic benefit to the service provider.

Initially, we describe a privacy attack whereby a seemingly innocuous app receives and exfiltrates location information obtained indirectly from an external source despite user efforts to suspend all location acquisition and supporting services such as GPS, cellular, Wi-Fi, Bluetooth and NFC. Source locations may include stores, malls, railways, airports, hotels, cross-walks and bus stations. A location resident system encodes a unique ID that references position data and transmits it via magnetic field manipulation. A victim’s local magnetometer, available to any app without permission and functioning as the receiver, detects the encoded pattern in the presence of motion and other environmental noise. The pattern’s payload is transmitted off-board the Android device at a later time when communication services are enabled. We can therefore establish a partial history of device locations despite the user’s effort to prevent tracking, short of powering off the device. We achieve an aggregate location ID accuracy of 86% with a bit error rate of 1.5%.

Next, we form an ultrasonic, permissionless bridge between two co-resident Android™ apps using the speaker as the acoustic source and the accelerometer as the receiver. The MEMS sensors’ resonance behavior is exploited as an alternative to the permissions requisite microphone. Information is extracted by one app which is granted permission to access sensitive information but is blocked from external access. A second app is allowed external access but is prevented by Android protections from direct access to the sensitive information. This bridge enables sensitive information to flow to an eventual off-board destination, operating unconstrained by the Android system and without alerting the victim. We achieve bit error rates of 10⁻⁴ with channel capacity approaching 40 bits per second when applying performance boosting techniques such as a MIMO-like dual channel configuration and an Amplitude Shift Keying modulation scheme. These performance levels are very reasonable for acquiring personally identifiable and other sensitive information.

Finally, we consider an alternative family of channels that exploit cloud co-residency. Here we describe a stealthy channel where a hostile client-server application pair, masquerading as a legitimate hosted site with valuable content, exploits shared resources on a cloud server. We demonstrated this stealthy timing channel attack, achieving a worst-case native BER of 1.87 × 10⁻², improved to 5 × 10⁻⁴ by applying spreading gain. This channel, built on out of the box libraries and application configurations, executed continuously for 24 contiguous days in a major university Computer Science department datacenter. It shared the same highly dynamic environment that actively supported over 1000 virtual and physical nodes.

Contents

1 Introduction
  1.1 Motivation Summary for Thesis
  1.2 Mobile Device Position Identification Summary
  1.3 Mobile Device Sensitive Information Extraction Summary
  1.4 Cloud Platform Attack Summary
  1.5 Existing Work and Mitigation Attempts Overview
    1.5.1 Attack Vector Summary
    1.5.2 Mitigation
  1.6 Thesis Overview
  1.7 App Download Behavior and Enticement
    1.7.1 Age Based Demographics and Behaviors
    1.7.2 Trust
    1.7.3 Multi-app Consideration
    1.7.4 Nationality / Country Considerations
    1.7.5 Emerging Defense
    1.7.6 Legal Protections
    1.7.7 Attack Strategy and Summary

2 Permissionless Tracking Using Magnetic Fields
  2.1 Introduction
  2.2 Background and Motivation
  2.3 Threat Model
    2.3.1 Vulnerability
    2.3.2 Threat
    2.3.3 Attack
    2.3.4 Exploit
    2.3.5 Trust
    2.3.6 Data
  2.4 System Design
    2.4.1 Magnetometer Based Tracking System Overview
    2.4.2 Magnetic Flux Determination
    2.4.3 Challenges and Tradeoffs
    2.4.4 Design Decisions, Observations and Parametrics
    2.4.5 Coil and Electronics Design
  2.5 Code Selection and Payload Design
  2.6 Signal Processing
  2.7 Testing and Evaluation Approach
    2.7.1 Testing Methodology
    2.7.2 Magnetic Field Characteristics
  2.8 Testing Results
    2.8.1 Sampling Rates
    2.8.2 Processing
    2.8.3 Stationary Testing
    2.8.4 Walking Results
    2.8.5 Contiguous Identical Bit Assessment
  2.9 Mitigation
  2.10 Related Work
  2.11 Conclusion

3 Android Ultrasonic Covert Channel
  3.1 Introduction
  3.2 Background and Motivation
  3.3 Threat Model
    3.3.1 Vulnerability
    3.3.2 Threat
    3.3.3 Attack
    3.3.4 Exploit
    3.3.5 Trust
    3.3.6 Data
  3.4 System Design
    3.4.1 Challenges
    3.4.2 Solution Overview
    3.4.3 Phase I: Channel Identification
    3.4.4 Phase II: Data Transfer
    3.4.5 Key Stealth Factors
    3.4.6 Performance Boosting Design
  3.5 Testing and Evaluation Approach
    3.5.1 Test Environments
    3.5.2 Evaluation Approach
  3.6 Results
    3.6.1 Channel Identification Results
    3.6.2 Error Testing Results
    3.6.3 Device Family Uniformity
    3.6.4 Capacity, Throughput and Performance Boosting
    3.6.5 Capacity
    3.6.6 Throughput
    3.6.7 Performance Boosting Evaluation
  3.7 Mitigation
  3.8 Related Work
  3.9 Conclusion

4 Cloud Covert Channel
  4.1 Background and Motivation
  4.2 Threat Model
    4.2.1 Vulnerability
    4.2.2 Threat
    4.2.3 Attack
    4.2.4 Exploit
    4.2.5 Trust
    4.2.6 Data
  4.3 Approach and Modeling
    4.3.1 Approach
    4.3.2 Models
  4.4 Challenges
  4.5 Detailed Approach
    4.5.1 Impact of Web Page Size
    4.5.2 Discrimination
    4.5.3 Obfuscation
    4.5.4 Data Collection and Post Processing
    4.5.5 Data Analysis
    4.5.6 Protocols
    4.5.7 Spreading Gain
    4.5.8 Attack Sequencing
    4.5.9 Time Induced Errors
    4.5.10 Content
    4.5.11 Query Entropy / Duty
    4.5.12 Test Pattern
  4.6 Test Bed
    4.6.1 Node Distribution
    4.6.2 Platform Parametric Information
    4.6.3 Speed
  4.7 Results
    4.7.1 End To End Traffic
    4.7.2 Channel Results
    4.7.3 Analysis
  4.8 Mitigation
  4.9 Related Work
  4.10 Conclusion

5 Thesis Conclusion

List of Figures

1 System Design
2 System Building Blocks
3 Point In Space Magnetic Flux
4 Single Coil Circuit Design
5 Magnetic Fields, Galaxy S6
6 Signal Processing
7 Platform with Coil Exposed
8 Axial Static Position Readings
9 FFT - Low Frequency Content
10 Single Walk Magnetometer Readings
11 Signal Processing Chain
12 Sensor Response to 21050 Hz Tone Bursts on Galaxy S5
13 Single Device Covert Communications Channel System Design
14 Frequency Identification Sequence
15 Spectral Components, Channel Identification Sweep Pattern
16 Spectrum at Microphone
17 Coherence
18 Sensor Event Time Histogram
19 Synchronization Window
20 Pulse Width Identification Sequence
3.21 Frequency Identification, Resolution Effect
3.22 Data Transfer Sequence
3.23 Accelerometer Distribution
3.24 Environment Noise Spectra
3.25 Devices with Vulnerability
3.27 Error and Capacity Summary
3.28 Throughput vs. Pulse Width for Uncoded Communication
3.29 Performance Boosting
4.1 System View
4.2 Covert Channel Client-Server Context
4.3 Sequence Diagram
4.4 Network Topology
4.5 24 Hour Network Traffic
4.6 Endpoint Network Traffic
4.7 Web Page Retrieval Times, 1 Hr
4.8 Web Page Retrieval Times, 1 Day
4.9 Full View
4.10 Full View, Zoom
4.11 Retrieval Times Statistics, 1 Day
4.12 Client Clock Skew
4.13 Population Distributions, Ones and Zeros
4.14 Direct Derivation vs Adaptive Sweep
4.15 Enhanced BER with Gain
4.16 Scoring
4.17 Throughput vs BER
4.18 Spread Impact
4.19 Capacity Behaviors
4.20 Frame Throughput

List of Tables

1 Key Downloading Influences
2 Coil Parameters
3 Data Pattern
4 Sampling Rate Statistics (Msec)
5 Static Error Rate Summary
6 Error Summary
7 Data Pattern, Two Bit Limit
8 Two Bit Limit Summary
9 Correlation and Synchronization Notations
10 Device Audibility Test
11 Tested Devices by Environment
12 Devices with Indeterminate Channels
13 Error Rate Summary
14 Device Pool Identicality
15 Multichannel Axial Correlation
16 Sample index.html Sizes
17 Yahoo URL Size Distribution
18 Potential Network Competitive Sources
19 Platforms
20 Chip Distribution Statistics (Seconds)
21 Time Jitter
22 Daily Accuracy and CER Results
23 Chip Performance vs Threshold
24 Memoryless Test
25 Tangent Derived Error Rates
26 Effective Data Rates Including Spreading Effect
27 Channel Length, BER, Throughput Summary

Chapter 1

1 Introduction

Covert Channels have existed for centuries, from the time of Histiaeus to the modern day. Like its historical roots, the modern covert channel’s life cycle consists of identifying new attack vectors, developing countermeasures and continuing with the next thrust and parry cycle. In this thesis, we investigate the potential of a particular covert channel form, its associated performance limits and mitigation techniques as they apply to the most recent battlefield examples, mobile devices and cloud computing centers. These domains are highly visible and currently experiencing high growth, with the former expected to achieve 80% growth over three years [10] and the latter expected to achieve 21% growth over 2017 [52]. Our objective is to not only identify these vulnerabilities, but to offer pragmatic mitigation strategies in the context of adoption concerns.

We emphasize resource compromising channels where channel existence is enabled by the host and the attacker and provider / defender both face complexity in creating and mitigating the attacks respectively. One channel relies on an external source that freely communicates with a mobile device’s unprotected sensor while a second leverages a mobile device’s physical structure and component resonance behaviors. The third, a cloud platform-based attack, targets shared resources whose existence is attributable to a service provider’s economic benefit pursuits.

1.1 Motivation Summary for Thesis

The primary motivation of this thesis is to examine new covert channels, avoiding Maginot Line-like defenses, assess channel performance and develop mitigation strategies. Emphasis is placed on stealth, rather than absolute performance, taking advantage of host / provider imposed limitations. For example, in the Android channels, we exploit sensors performing an unintended behavior - acting as communications receivers. In the cloud case, we exploit commonly shared resources that the cloud operator has a natural reluctance to ‘un-share’. We created these compromises to expose platform security architecture limitations. In this regard, detection opportunities become limited, increasing the probability of information loss.

1.2 Mobile Device Position Identification Summary

Although Personally Identifiable Information (PII) leakage is at the forefront of the news cycle courtesy of Facebook™ [29, 37], location identification is also a significant privacy concern. While beneficial in many cases, location tracking elicits fear of law enforcement abuse [55]; abuse concerns are valid in other settings too, especially the private sector. Many examples exist, highlighted in later chapters, with Facebook™ garnering the most recent attention. Unfortunately, little progress has been made in mitigating this form of exposure.

Here, the permissionless magnetometer represents the attack surface of interest. Historically, permissionless attacks have targeted the Gyroscope [85] and the Accelerometer [22]. Although Google™ is active in sealing off potential avenues of attack, it appears reluctant to impose a new permission scheme that addresses all resources including the native device sensors.

Our objective is to track the user despite her attempts to prevent such efforts. Her tactics are assumed to include disabling location (GPS and cellular) services, Wi-Fi and NFC services to effectively disconnect from the ‘grid’. A seemingly innocuous app that is downloaded by the victim enables positional information to be obtained indirectly via radiated electromagnetic (EM) fields from a source with known GPS coordinates. These sources may include stores, malls, railways, airports, hotels, cross-walks and bus stations. In a collaborative manner, these locations would host a special platform that transmits encoded data via magnetic field manipulation that a victim’s Android device magnetometer can detect. These codes, which represent the unique location of the platform, are transmitted off-board the Android device at later opportunities when other communication services are active. We can therefore establish a partial history of where the device has traveled, which is potentially beneficial to commercial marketing, government agencies and malicious actors.

The permissionless nature of Android sensor access makes this channel difficult to discover. What exacerbates the problem is that the gait associated with movement has similar amplitude and frequency characteristics as the encoded data.
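To illustrate how a per-bit error rate relates to the chance of recovering a complete location ID, the following minimal sketch assumes an uncoded ID whose bit errors are independent. The function name and the ID lengths are hypothetical and chosen only for illustration; they do not describe the payload design or coding used in this thesis.

```python
def id_success_probability(bit_error_rate: float, id_bits: int) -> float:
    """Probability that an uncoded ID of `id_bits` bits arrives error free.

    Assumption for this sketch: bit errors are independent and no error
    correction is applied, so every bit must be received correctly.
    """
    if not 0.0 <= bit_error_rate <= 1.0:
        raise ValueError("bit_error_rate must be within [0, 1]")
    if id_bits < 0:
        raise ValueError("id_bits must be non-negative")
    return (1.0 - bit_error_rate) ** id_bits


if __name__ == "__main__":
    # Hypothetical ID lengths; longer IDs are less likely to survive intact.
    for n in (8, 16, 32):
        print(n, id_success_probability(0.015, n))
```

Under these assumptions the success probability falls geometrically with ID length, which is one reason payload length and coding matter for a noisy channel of this kind.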

1.3 Mobile Device Sensitive Information Extraction Summary

In this attack, our objective is to obtain PII and other sensitive information. We created a covert channel built upon an ultrasonic communications bridge between two co-resident Android apps. The bridge, which uses the speaker and the local sensor package, leverages the MEMS sensor’s resonance behavior, where the resonance frequency(ies) is a function of the sensor design as integrated into the device’s housing. Manufacturers typically design the MEMS operating band in the linear region, well below the resonance frequencies. However, if a stimulus generates frequency components in the non-linear region near a resonance point, sensor sensitivity increases, and the accelerometer behaves as an amplifier / resonator.

The communication bridge enables the flow of information from a trusted app to an untrusted app. The assumption is that the trusted app is granted access to sensitive information but has no external access. A representative example of this would be a password manager. Conversely, the second has external access but is not trusted with sensitive information. A game would typify this type of app.

This is an out-of-band communications pathway which is challenging to identify. The inaudibility and permissionless characteristics allow for stealthy operations, ‘hiding in plain sight’ with data flowing freely as it circumvents system defenses.
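The amplification near resonance can be illustrated with the textbook magnitude response of a second-order (mass-spring-damper) resonator. This is a minimal sketch under that modeling assumption; the resonance frequency and quality factor used below are placeholder values, not measurements of any device discussed in this thesis.

```python
import math


def resonator_gain(freq_hz: float, resonance_hz: float, q_factor: float) -> float:
    """Normalized gain of an ideal second-order resonator.

    Gain is 1.0 at DC and approximately q_factor at the resonance frequency.
    The model itself is an assumption made for illustration.
    """
    if resonance_hz <= 0.0 or q_factor <= 0.0:
        raise ValueError("resonance_hz and q_factor must be positive")
    r = freq_hz / resonance_hz  # frequency ratio f / f0
    return 1.0 / math.sqrt((1.0 - r * r) ** 2 + (r / q_factor) ** 2)


if __name__ == "__main__":
    F0_HZ = 20_000.0  # placeholder resonance frequency
    Q = 50.0          # placeholder quality factor
    for f in (100.0, 10_000.0, 19_000.0, 20_000.0):
        print(f, resonator_gain(f, F0_HZ, Q))
```

In this idealized model, a stimulus far below resonance is passed at roughly unity gain, while a tone at the resonance frequency is amplified by about the quality factor.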

1.4 Cloud Platform Attack Summary

This particular channel exploits a vulnerability created by the solution provider seeking to leverage platform shared resources for economic reasons. This attack is not found in traditional high affinity, proprietary data centers due to the policies blocking third party application deployment. In the cloud, commerce encourages collocation and resource sharing, including server resources such as L2 cache, communication buffers and networks. Here, we jam normal content traffic, using delays to represent communication symbols. We focus on stealth such that no direct TCP/IP communication pathway exists between the jammer, which is co-resident with the victim, and the victim, a publicly facing, innocuous, commercial content server. The jammer creates contention in the shared resources, in particular the network resources including the transmit buffer. When the interference server generates burst traffic, it affects the victim’s content retrieval time as observed by the information sink.

The cloud context offers a unique dilemma for the hosting provider versus the proprietary data center owner. In the first case, the provider is trying to maximize resource utilization, deploying as many applications as feasible on a single platform for economic reasons. To alleviate performance degradation, they utilize commonly shared resources such as cache. Unfortunately, common resources are often the attacker’s target so there is an inherent conflict. This is worsened by the traditional mitigation strategies such as injecting noise and delay. In the case of the proprietary datacenter, the owner is more amenable to emphasizing security and may be willing to ignore these performance boosters, utilizing these mitigation strategies to protect their system.
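As a rough illustration of how delays can carry symbols, the sketch below thresholds observed retrieval times into chips and recovers each bit by majority vote over repeated chips. The repetition-style spreading, the threshold value and all function names are assumptions made for this example; they are not necessarily the scheme evaluated in Chapter 4.

```python
from typing import List, Sequence


def spread(bits: Sequence[int], chips_per_bit: int) -> List[int]:
    """Repeat every bit chips_per_bit times (a simple repetition spreading)."""
    return [b for b in bits for _ in range(chips_per_bit)]


def chips_from_times(times_s: Sequence[float], threshold_s: float) -> List[int]:
    """Map each retrieval time to a chip: slow (jammed) -> 1, fast -> 0."""
    return [1 if t > threshold_s else 0 for t in times_s]


def despread(chips: Sequence[int], chips_per_bit: int) -> List[int]:
    """Recover bits by majority vote over each group of chips.

    An odd chips_per_bit avoids ties; with an even value a tie decodes as 0.
    """
    if chips_per_bit <= 0 or len(chips) % chips_per_bit != 0:
        raise ValueError("chip count must be a positive multiple of chips_per_bit")
    bits = []
    for i in range(0, len(chips), chips_per_bit):
        group = chips[i:i + chips_per_bit]
        bits.append(1 if 2 * sum(group) > chips_per_bit else 0)
    return bits


if __name__ == "__main__":
    # Hypothetical retrieval times in seconds; two chips are corrupted by noise.
    observed = [0.30, 0.10, 0.30, 0.10, 0.10, 0.30, 0.30, 0.30, 0.30]
    print(despread(chips_from_times(observed, threshold_s=0.20), chips_per_bit=3))
```

The design trade-off is the usual one for spreading: each additional chip per bit lowers the data rate in exchange for tolerance to individual timing errors.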

1.5 Existing Work and Mitigation Attempts Overview

1.5.1 Attack Vector Summary

Although covert channel research is pervasive and has patched many vulnerabilities, the attack surfaces that have persisted and have not been mitigated in the OS involve the three commonly provided sensors: accelerometer, gyroscope and magnetometer. Permissionless acoustic based channels to this point have targeted the gyroscope [64] and the accelerometer [89]. The stimulus is either high frequency, as in the case of the gyroscope, or sub 1000 Hz and may include the use of the vibrator [2], although the latter requires permissions and is not yet considered dangerous. Microphone based channels require permission, are considered dangerous by Android and typically involve an external source to complete the channel. Use of external resources contributes additional challenges due to the need for arranging proximity.

In the cloud domain, most attacks are side-channel centric where the actor monitors or exploits resource utilization or timing vulnerabilities, typically targeting the theft of cryptographic keys, library versions etc. [79]. Many have followed a ‘flush and reload’ pattern, consistent with same-platform side-channel attacks. Little work is covert channel specific.

1.5.2 Mitigation

The key defensive Android enhancements have included the deprecation of android.permission.ACCESS_SUPERUSER and the default enforcement mode configuration for SELinux, eliminating the granting of ‘su’ privileges, useful in accessing kernel data structures. Furthermore, adoption of runtime and deploy time permission checks enables the user to determine the potential for inappropriate resource utilization or provides an alert for activities unrelated to the app of interest. As a result, system resource manipulation-based channels such as the /proc attacks have essentially been eliminated.

Additional mitigation discussions invariably converge on assigning permissions to gain resource access requiring runtime and load time checks. In theory, this alerts the user to apps that suspiciously use resources. Unfortunately, this has not extended to sensors. Reducing the sampling rates is also advocated, which in turn reduces a potential channel’s communications rate. Here a balance is struck between app usefulness (i.e., activity monitoring) and potential exfiltration rates. For example, the utility of activity analysis and, to some extent, directional analysis degrades as the sampling rates decrease. At some point these would be rendered virtually useless. Although not yet in the literature, degrading sensor sensitivity yields a similar undesirable result.

Cloud channel prevention is typically limited to passive methods such as logging, where analytical methods such as Kolmogorov-Smirnov testing are used to focus on discovery [36]. In terms of research, there are two common cloud mitigation recommendations. First, turn off shared cache, i.e., L2 cache, and manage other shared resources. This drives the hosting deployment scheme to encourage resource affinity (binding), which is a significant financial disincentive. Another tactic is to dynamically alter switch paths or insert noise so as to randomly insert delay into the communications pathway [45]. Although plausible and perhaps desirable in proprietary data centers, this may have an adverse effect on Quality of Service in the cloud, which may increase the risk of customer flight when performance becomes unacceptable.
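For readers unfamiliar with Kolmogorov-Smirnov testing, the following is a minimal sketch of the two-sample statistic applied to timing observations, for example comparing a baseline set of response times against a suspect set. The function name and the sample data are hypothetical, and the way [36] applies the test is not reproduced here.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute difference between the two empirical
    cumulative distribution functions (0.0 = identical, 1.0 = disjoint).
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # Both empirical CDFs are step functions, so the maximum gap
    # occurs at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


if __name__ == "__main__":
    baseline_s = [0.10, 0.11, 0.12, 0.10, 0.13]  # hypothetical normal timings
    suspect_s = [0.10, 0.30, 0.12, 0.31, 0.29]   # hypothetical timings with added delay
    print(ks_statistic(baseline_s, suspect_s))
```

A large statistic suggests the two sets of timings were not drawn from the same distribution; turning it into a decision requires a significance threshold that depends on the sample sizes, which this sketch leaves out.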

1.6 Thesis Overview

The thesis structure consists of three main chapters. In Chapter 2, we describe a permissionless privacy attack where location is determined in the absence of native location-enabled or adjunctive services which support the inference of position. In Chapter 3, we describe a permissionless, ultrasonic covert channel residing on an Android device which was deemed by peers as not possible to create. In Chapter 4, we demonstrate a co-resident cloud based covert channel where there is no direct linkage between the adversary’s source and listener. Finally, in Chapter 5 we summarize our efforts in the conclusion.

We must highlight an important topic addressed in Section 1.7, titled "App Download Behavior and Enticement", which describes the influences that entice users to download potentially malicious apps. This is such a critical topic that we believe it deserves its own discussion since the existence of the mobile device covert channels depends on executing a successful deployment despite the Security Framework, which is designed to prevent many types of malicious attacks.

Table 1: Key Downloading Influences

Item                                                          % Respondents
Browsing through the platform’s official store                63%
Recommendation from others (word of mouth)                    50%
Browsing the store’s top Apps                                 34%
Previously installed by the manufacturer on the smartphone    20%
Through social networks                                       19%
General Internet browsing                                     16%
Searching via search engines (Google, Bing, etc.)             14%
Ads in newspapers or magazines                                7%
Reading blogs                                                 7%
Reading newsletters                                           6%
TV ads                                                        6%
Smartphone ads                                                6%
Radio ads                                                     3%
Other                                                         7%

1.7 App Download Behavior and Enticement

Malicious app installation is critical to two attacks described in this thesis and it is important to study user behaviors to exploit a prospective victim. In this section we describe the motivations behind a user installing apps in general. We maintain that the decision process, including the decision criteria in identifying and selecting these apps, does not vary between malicious and non-malicious apps as a priori knowledge of the former is rare.

Users commonly rely on a variety of influence sources in their app selection process. Consider the list in Table 1 based on Forrester research [28]. The top six items indicate that users rely on browsing / social impetus and prior experiences as major reasons for app acquisition. The social influence is particularly interesting since it can be manipulated via reviews and social application vehicles such as Instagram and Facebook for advertisements.

1.7.1 Age Based Demographics and Behaviors

Per Forrester [30], the aforementioned social media / networking applications, in addition to YouTube, are the dominant sources of online content for groups aged less than fifty years old. This demographic group is typically more daring in trusting social media and other online sources for recommendations. Twitter and Pinterest are more balanced across age related demographic groups and offer a more diverse access base. In all cases, the user community has a number of popular online vehicles as potential influencing sources.

According to the aforementioned Forrester report, on a weekly basis, at least 70% of all users under 40 years of age access social media sites. The existence of strong market pull conditions is illustrated by the high demand from the Millennial population where, when asked about new app excitement, 70% of the respondents indicated that they are always interested in new and exciting apps.

1.7.2 Trust

Now consider trust and how it relates to consumer behavior. In a study by Nielsen [70], 66% of survey respondents indicated that they trust online consumer opinions, suggesting that an intimate relationship with a known person isn’t necessary in influencing decisions. Apptentive’s online survey [83] raised two key points. First, 75% of those surveyed indicate that ratings are a key driver for downloading an app. Second, users are eight times more likely to download a highly rated app of an unknown brand than a poorly rated app of a known brand. These show a general willingness to try new apps based on reviews from unknown and untrusted sources, suggesting that users show little apprehension in selecting apps to download. A real-world example was offered by Bell [6], who highlighted a common method for scammers to profit from tricking users into downloading apps offering subscription scams, derived from the convergence of search ad selection and Search Engine Optimization.

1.7.3 Multi-app Consideration

With respect to downloading multiple apps, the attacker can adopt a technique that asks the user if she has interest in downloading another recommended app during the download process. The rationale is that once the victim has decided to download the first app, she may be predisposed to additional downloads, suggesting that the barrier to downloading multiple apps may not be high, especially before buyer’s remorse sets in. In the event that there is temporal separation, the Forrester study suggests a 20% return rate. It is imperative that the first app provide suitable functionality, be well featured and not crash, such that a bias for return is established and this probability isn’t reduced.

1.7.4 Nationality / Country Considerations

Not surprisingly, there are behaviors unique to each country as highlighted by Lim [57]. For example, in China, users are more likely to select the first app shown on a list by a factor of 9.27. In India, users are 3.35 times more likely to download education apps. In the U.S. there is significant interest in health and weather apps.

1.7.5 Emerging Defense

The attacker must be wary that in 2017, Google started policing app reviews and it is reported that fake reviews are being deleted [81]. Google allegedly monitors the rate at which new reviews occur, so the attacker must be cautious about an overly aggressive marketing campaign.

1.7.6 Legal Protections

Download behavior analysis is not complete without addressing how users respond to service agreements and privacy statements. In a 2017 study by Deloitte, which surveyed 2,000 U.S. consumers [14], three key conclusions summarizing user behaviors are noteworthy. First, 91% of respondents agreed to terms and conditions without reading them. Second, those aged 18-34 agree to terms and conditions before reading them 97% of the time. Third, there is a perception that the language is complicated and lengthy and that the respondents feel that all that is at risk are names and email addresses.

In addition, Obar [1] further demonstrated user behaviors by creating a social media website, Namedrop. Its terms and conditions included ‘gotcha’ clauses, one of which was that by agreeing to these, the user would have to give up their first-born child. Unfortunately, 98% missed these clauses.

These studies demonstrate that in general, users fail to read these agreements with the attention that they deserve. We conclude that these intended protections are ineffective.

1.7.7 Attack Strategy and Summary

Based on these aforementioned studies, the obvious attack strategy is to target social media sites as a source of enticement for users to download these innocuous looking apps. Purchasing reviews which are delivered at a below-the-threshold level will aid the attack. The specific marketing collateral approaches to execute this strategy are not documented herein as they are demographics specific. Furthermore, we suspect that there is little difference between marketing strategies and campaigns from non-malicious app tactics other than in off-line advertising.

In summary, we observe a general demand among the younger population segments and a trust model built upon on-line reviews and referrals from both friends and unknown sources. A carefully designed attack will factor in these behavioral attributes and successfully entice users to download apps, malicious or otherwise. Additional attributes can enhance attack success when considering geographical and cultural behaviors / preferences. Manipulating search return list positions and emphasizing education, health and weather spaces can increase the attack breadth. It is important to note that the same vehicle which enticed the victim to download the app can also accelerate the app’s demise via poor reviews. If the malicious apps perform as described and function without anomalies such that reviews are positive, they may be long-lived and pervasive. Finally, reliance on terms and conditions in general fails to have the desired effect, such that the attacker could even include information about their attack / intent with little concern for detection.

Chapter 2

2 Permissionless Tracking Using Magnetic Fields

2.1 Introduction

Location tracking has achieved significant attention in the research community [44, 96, 73, 67, 92] and great notoriety in the news cycle. In late 2017, Quartz [16] reported that Android devices sent location data to Google when they were within range of a new cell tower. Although Google denied using this information for any malicious purposes and stated that Android devices would henceforth no longer transmit the data, it is yet another example of realizing the fear of being tracked. Uber [31] recently removed a feature that tracked riders for several minutes post ride termination. In an example of an unintended consequence, Strava [9] identified that US soldiers might be tracked via GPS coordinates available through a fitness app. The U.S. judicial system is now involved in the discussion as the Supreme Court agreed in 2017 to hear Carpenter v. United States [60] where the government was using cell phone records to identify locations where the phone had been, further demonstrating that tracking data is available and an evolving legal concern. In 2018, the Wall Street Journal published an article explaining the lucrative and expanding business of selling location data [66]. Each of these represents the potential for tracking and abuse of location information. Most disconcerting is the reluctance and / or slow response to preventing these compromises.

This discussion presents a unique location compromise via a stealthy location attack built upon a smartphone magnetometer’s ability to detect small coded magnetic field fluctuations while the device is in motion. The attacker generates location identification information which is transmitted from an innocuous physical source. A device resident app listens to the sensor output, performs noise removal processing and decodes the resultant signal, which represents the transmitter’s location with commercial GPS accuracy. Combined with time, the attacker knows the when and where of the victim’s device despite her efforts to be temporarily disconnected from ‘the grid’ by disabling Wi-Fi, cellular, NFC, Bluetooth and GPS services. Even within a building, its accuracy exceeds Wi-Fi and cellular triangulation with these services enabled. One of the significant challenges to this attack is extracting the low-level signal from system and motion induced noise. System noise results from the magnetometer reporting scheme where the Hardware Abstraction Layer in some cases generates reports in a quasi-return-to-zero format. Motion noise induced by victim movement causes the readings to shift relative to the device's reference orientation. We evaluate channel viability by reducing system noise and accounting for motion noise stemming from carrying the device in a typical manner, i.e., mounted on a belt or inside a shoulder bag. This initial work addresses only these two conveyance modalities.

The attack is unique as it is an out-of-band, unilateral, non-persistent communications pathway that is difficult to detect. This permissionless attack is useful for commercial, law enforcement and unfortunately, malicious purposes with outgoing communications occurring asynchronously post data capture. Mitigation is challenging without modifying the Operating System to seek permissions for sensor use, changing the sampling rate or changing device sensitivity.

Our contributions to this Institutional Review Board (IRB) approved research can be summarized as follows:

• To the best of our knowledge, we are the first to report the use of magnetic field communications to compromise a victim’s location privacy.

• We designed, built, and evaluated a system that is transferable to real-world deployments and scalable to at least one million locations.

• The system is intended to identify location absent of Wi-Fi, cellular, GPS, Bluetooth and NFC services and the attack functions without the need for permissions, making detection difficult.

• We developed signal processing and coding techniques that address the substantial signal fading effects due to mobility in a very-low power magnetic near field.

• We evaluated the prototype on six Android devices, in an IRB-approved study with six participants.

• We achieved an aggregate location identification success rate of 86% with a bit error rate of 1.5% which is only ten times the stationary error rate.

• The solution’s position accuracy is controlled by the attacker rather than the mobile device’s capabilities.

The remainder of this chapter is organized as follows. In Sections 2.2 and 2.3, we describe the background, motivation and threat model for the attack. Section 2.4 details the system design and some of the practical limitations for this attack type. Section 2.7 describes the testing methodology and we describe the results of the two walking tests in Section 2.8. Section 2.9 describes mitigation options and we end with a related works discussion and conclusion in Section 2.10 and Section 2.11 respectively.

2.2 Background and Motivation

Location identification as a means to compromise privacy is a significant concern. Deriving location via Wi-Fi, cell tower triangulation and sensor exploitation is well researched, with typical accuracies of a few hundred yards and at best, tens of feet. GPS, with better accuracy, has limited functionality within buildings. Despite the advantages of location services and the economic benefit that beacons may provide, fear of actors such as law enforcement [56] engaging in user tracking remains. This fear has extended to other actors as noted by the 2018 Department of Homeland Security disclosure of "Stingrays" or International Mobile Subscriber Identity (IMSI) catcher discovery around Washington DC [68].

Other permissionless forms of location compromises may involve the gyroscope and the accelerometer. However, absent the use of recorded dead reckoning data to infer position, there is little research involving magnetometer specific attacks. Some directly related attacks are:

Figure 1: System Design

Magnetic mapping, Gozick [39], is used to identify a physical structure’s footprint by its magnetic fields. Assuming each building has a unique signature, the attacker can determine where a particular device has been. The limitations are the magnitude of collected data (device and site) and how effectively it correlates to known magnetic fingerprints.

In a short-range communications example, Matyunin [63] identified a communications channel using a PC’s disk drive as the source and the magnetometer from a second device as the sink where both devices are stationary. They achieved a bit rate of 4 bps at a distance of 4 cm.

Although the magnetometer is used in Narain’s [69] location inference attack, it is not the primary contributing sensor. The accelerometer and the gyroscope were used primarily to determine position using graph analysis referenced to the OpenStreetMap database. Its accuracy is limited to half the distance between map street junctions and cannot be applied in pedestrian, rail, ship or air travel contexts as velocity or its approximation is needed from an external source.

iBeacon™, an Apple technology, uses Bluetooth low energy (BLE) identifiers that a smartphone app can listen to, enabling device location, customer tracking, etc. In this case, the user enables tracking services, clearly willing to be tracked to enhance the shopping experience.

An additional vector of recent interest involves facial recognition, Gunther [40], or presentation attacks, Ramachandra [78]. However, identity matching remains problematic, primarily due to database completeness limitations. Without access to law enforcement types of databases, the attacks have limited effectiveness unless the presentation attack involves scanning social media databases containing pictures and executing resource intensive activities.

If one applies a permissionless constraint to limit detection, additional complexity is levied on the solution since all native transmitting services are disqualified from use. On the receiver side, this applies to the microphone, whose functionality is enabled with the granting of permissions. Even with permissions, what differentiates this from simply broadcasting an ultrasonic signal from the device’s speaker to an external device is threefold. First, the device becomes the transmitter, which suggests that it must transmit signals while active. This creates a significant power drain on the battery whereas reading the magnetometer consumes much less power. Secondly, the new receiver must be aligned with the acoustic wavefront either by direct radiation or incident reflections. Hence orientation is important. In the channel discussed herein, this latter concern lacks applicability since the EM fields are not blocked. Finally, devices may be subject to acoustic attenuation due to clothing, shoulder bags / pocketbooks, carrying belts and potentially skimming protection solutions.

Our motivation is to track the user, absent such limitations, despite her attempts to prevent such efforts. The solution is scalable, limited only by the willingness of the tracking entity to install a system to perform some type of privacy invasion.

2.3 Threat Model

This section describes the location ID threat model.

2.3.1 Vulnerability

There are two aspects to this attack’s vulnerability: the technical element and human behavior. A vulnerability exists in the Android space where direct reading of sensor data is not restricted when using the SensorManager class. Sensor access does not require a declaration in the AndroidManifest.xml file. With this permissionless access control (AC), the user is not alerted to sensor use at installation time nor at run time. This allows magnetometer data to be accessed without security limitations. This attack involves one-way communications with the magnetometer as a passive receiver without the need for permission dependent transmitting resources.

On the human side, we rely on social media and Google Play ratings to convince users to download the app. The app must be well rated, a requirement that is easy to satisfy by procuring high ratings, seeding a ‘like’ in social media via Facebook and invoking other means. In this manner, the illusion of trust is established and propagated.

2.3.2 Threat

The threat is in obtaining position information despite the victim's efforts to avoid leaking this information. It is assumed that she disables Wi-Fi, NFC, GPS, Bluetooth and cellular services. The device, however, remains powered on. This condition is consistent with placing the device in airplane mode in addition to disabling location acquisition and support services. The victim installs a seemingly innocuous app that functions even when placed in the background. All that remains is for the victim to move in proximity to the radiating sub-system. In some cases where the attacker is a government entity, the threat of field proximity is potentially pervasive, since access to vast resources permits widespread platform deployment.

2.3.3 Attack

The attack is driven by a select group of potentially malicious and benign actors. Those benefiting might include governments, law enforcement, marketing and sales analysts and the hosting entity. Government and law enforcement interests are based on the desire to track any number of individuals for location history purposes. Notification of their activities to third parties and data usage would be subject to jurisdictional laws. Marketing and sales analysts would seek to identify drive-by individuals for campaign targeting. Supplemental means to contact the target(s), e.g., via text messaging and email, might occur subsequent to a ‘hit’. Similarly, the hosting entity might desire to use this in support of in-store sales activities and broader campaigns.

The attack is enabled by placing a transmitter in high traffic locations. External triggers are used to detect human presence, either by a pressure switch or an optical loop that is interrupted by the victim. In a physical drive-by style attack, the target(s) enters the transmitter’s field of view and if the malicious app is present, the attack should succeed as it continuously monitors the magnetometer’s outputs via the Hardware Abstraction Layer as a registered listener. We discuss the method to entice potential users to download the app in Chapter 1, section 1.7.

The transmitter and the controller elements may be embedded in the floor, ceiling or a wall. The installation would be minor for new construction or store set-up and slightly more complicated for existing structures. The transmitter may also be located in kiosks, cross walks, bus, airplane and train terminals, limited by available resources.

2.3.4 Exploit

The innocuous app, masquerading as a legitimate function, e.g., a tasker, file explorer or calendar, is a registered sensor listener that records magnetometer data. When a synch frame is detected, the data is processed, stored and subsequently transmitted to an off-board colluding application when Wi-Fi or cellular services are enabled. We exploit the Android security framework, which currently does not require permissions or a listing in the Manifest file stating that an app intends to use the magnetometer. As a result, there is limited traceability of activity on this sensor. The magnetometer is additionally exploited as it responds to external magnetic field stimuli either as ambient values or from emitting sources. We transmit information with pulsewidths sufficiently long that the sensor sampling rates satisfy the Nyquist criterion for data reconstruction.

2.3.5 Trust

Trust is presumed in two cases. First, the victim believes that her movements are not tracked when disabling location related services. Second, she has downloaded an app that provides valuable functionality and whose app store ratings meet her satisfaction.

2.3.6 Data

In this attack, we actually create the data and determine if it is received and interpreted correctly by the victim's device. At a future time, we transmit the interpreted data off-board to a cooperative endpoint to ascertain the victim's location. The data represents a transposition of GPS coordinates.

2.4 System Design

2.4.1 Magnetometer Based Tracking System Overview

The system overview diagram is illustrated in Figure 1. We show a victim, oblivious to the platform beneath her feet, as she walks toward her destination. Alternatively, she is walking in proximity to an innocent looking kiosk. She is unaware that each of these structures houses a subsystem that generates coded yet harmless magnetic fields strong enough for her smartphone’s magnetometer to detect. She is also unaware that the ‘really cool’ app she downloaded is recording the magnetometer’s readings which, when processed, provide position data.

We envision that each of these system types is potentially located in stores, malls, transportation hubs, cross-walks or in other locations that victims would patronize or pass in close proximity. The smartphone’s app functions include recording magnetometer data and exfiltrating semi-processed data.

The nature of this app could support malicious or non-malicious activities. In the former case, the app supports location exposure, violating user privacy. In the latter case, the app can support targeted shopping as a commerce-oriented beacon. It may also enhance law enforcement activities such as persons of interest tracking.

In all cases, the system is placed in a predetermined location, where each location is assigned a unique ID. Since this attack is expected to scale to one million instances, the coding approach is important. Code selection must also account for long continuous equal-bit value sub-streams to avoid frequency content similar to the victim's gait.

Each station issues its unique code in perpetuity, triggered by an external event such as depressing a pressure switch on the floor or interrupting an optical loop. This ensures that the victim’s phone is within detection range of the coded electromagnetic (EM) field emissions from the platform coil(s). Based on anthropometric data [38], we set the transmission field strength at 37 to 41 inches above the plane of the coil(s) to be less than 30 µT. Although the only platform limitation is size, size influences EM field strength. The former is driven by physical limitations of the deployment site while the latter is a function of current, turn count, coil dimensions and the permitted power ceiling.

The magnetometer data stream is filtered to remove device anomalies, high frequencies and the effect of the human gait. Automatic Gain Control (AGC) is applied post filtering to compensate for uneven signal strength due to position within the magnetic field. The app either waits until it is within Wi-Fi range and transmits the individual identification to a participating server, or the data is aggregated and transferred in batch. In either case, the locations and times are now available, from which further attacks may be launched.

Figure 2 illustrates the major transmit side elements: the controller, switching circuitry and the platform hosting the coils. The first two are described in Section 2.4.5 while the platform is addressed in Section 2.4.4.

Figure 2: System Building Blocks (Controller, Switching, Platform)

2.4.2 Magnetic Flux Determination

Due to the asymmetric nature of our platform, we derive the magnetic flux density B at a point in space $P_{x,y,z}$ for a rectangular coil [84] of N turns, Figure 3. This vector consists of each of the axial flux density contributions $B_x$, $B_y$, $B_z$ at P as the target device passes through the magnetic field. This coordinate system is consistent with the three axis Cartesian coordinate system found on Android devices.

The magnitude is given by Equation 1.

\[ B = \sqrt{B_x^2 + B_y^2 + B_z^2} \tag{1} \]

where $B_x$, $B_y$, $B_z$ are the x, y and z plane flux contributions

Figure 3: Point In Space Magnetic Flux

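\[ B_x = N\,\frac{\mu_0 I}{4\pi} \sum_{\alpha=1}^{4} \left[ \frac{(-1)^{\alpha+1} z}{r_\alpha \left[ r_\alpha + d_\alpha \right]} \right] \tag{2} \]

\[ B_y = N\,\frac{\mu_0 I}{4\pi} \sum_{\alpha=1}^{4} \left[ \frac{(-1)^{\alpha+1} z}{r_\alpha \left[ r_\alpha + (-1)^{\alpha+1} C_\alpha \right]} \right] \tag{3} \]

\[ B_z = N\,\frac{\mu_0 I}{4\pi} \sum_{\alpha=1}^{4} \left[ \frac{(-1)^{\alpha} d_\alpha}{r_\alpha \left[ r_\alpha + (-1)^{\alpha+1} C_\alpha \right]} - \frac{C_\alpha}{r_\alpha \left[ r_\alpha + d_\alpha \right]} \right] \tag{4} \]

and α is a side (one of four), N, I, $\mu_0$ are the coil turn count, current and permeability respectively and where:

\[ C_1 = -C_4 = a + x \quad \text{and} \quad C_2 = -C_3 = a - x \tag{5} \]

\[ d_1 = d_2 = y + b \quad \text{and} \quad d_3 = d_4 = y - b \tag{6} \]

\[ r_1 = \sqrt{(a + x)^2 + (y + b)^2 + z^2} \tag{7} \]

\[ r_2 = \sqrt{(a - x)^2 + (y + b)^2 + z^2} \tag{8} \]

\[ r_3 = \sqrt{(a - x)^2 + (y - b)^2 + z^2} \tag{9} \]

\[ r_4 = \sqrt{(a + x)^2 + (y - b)^2 + z^2} \tag{10} \]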

a and b are half the length and half the width of the coil respectively and $C_\alpha$ and $d_\alpha$ are reference points enabling the derivation of $r_\alpha$, the Euclidean distance of each corner to P. These equations are the foundation for the design/parametrization of our system prototype. The key point is that $B_{x,y,z}$ scales linearly with N and I and varies inversely with $r_\alpha$.
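As a numerical illustration of Equations 1 to 10, the following Python sketch evaluates the flux density of a rectangular coil at a point above it. It is a minimal sketch under stated assumptions, not the prototype's design tool: the function name `rect_coil_flux`, the use of SI units, the 1 A drive current and the 39 inch evaluation height are illustrative choices that do not come from the text, while the coil dimensions and turn count are those of Table 2. The $B_z$ term is written in the form that reproduces the standard closed-form on-axis field of a rectangular loop.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of free space (T*m/A)

def rect_coil_flux(x, y, z, a, b, turns, current):
    """Return (Bx, By, Bz, |B|) in tesla at P(x, y, z).

    The coil lies in the z = 0 plane, centred on the origin, with
    half-length a along x and half-width b along y (all in metres).
    """
    # Reference terms of Equations 5 and 6, indexed alpha = 1..4.
    C = [a + x, a - x, -(a - x), -(a + x)]
    d = [y + b, y + b, y - b, y - b]
    # Corner distances of Equations 7 to 10.
    r = [
        math.sqrt((a + x) ** 2 + (y + b) ** 2 + z ** 2),
        math.sqrt((a - x) ** 2 + (y + b) ** 2 + z ** 2),
        math.sqrt((a - x) ** 2 + (y - b) ** 2 + z ** 2),
        math.sqrt((a + x) ** 2 + (y - b) ** 2 + z ** 2),
    ]
    k = turns * MU0 * current / (4.0 * math.pi)
    bx = by = bz = 0.0
    for i in range(4):
        s = (-1) ** (i + 2)  # (-1)^(alpha+1) with alpha = i + 1
        bx += s * z / (r[i] * (r[i] + d[i]))
        by += s * z / (r[i] * (r[i] + s * C[i]))
        bz += -s * d[i] / (r[i] * (r[i] + s * C[i])) - C[i] / (r[i] * (r[i] + d[i]))
    bx, by, bz = k * bx, k * by, k * bz
    return bx, by, bz, math.sqrt(bx * bx + by * by + bz * bz)

if __name__ == "__main__":
    INCH = 0.0254
    # Table 2 coil (67.5 x 24.5 inches, 65 turns); current and height are assumed.
    a, b = 67.5 / 2 * INCH, 24.5 / 2 * INCH
    _, _, _, magnitude = rect_coil_flux(0.0, 0.0, 39 * INCH, a, b, 65, 1.0)
    print(f"|B| above the coil centre: {magnitude * 1e6:.2f} uT per ampere")
```

Because the result is linear in both `turns` and `current`, the per-ampere figure can be scaled directly to any assumed drive current.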

2.4.3 Challenges and Tradeoffs

There are seven factors that drive the system design.

• Platform Size: We are limited by the magnetic field size which is a function of platform size. Hosts will want to limit the system’s physical footprint and make it imperceptible.

• Magnetometer Sampling Rates: We are limited by the device’s sensor sample rates. Low rates necessitate larger signal pulsewidth, which in turn increases transmission and in-the-field times.

• Speed: The speed at which humans can walk affects attack viability. If the velocity is too high, the code pattern may not be received in toto as the device passes through the magnetic field.

• Scale: The system must scale to support a large set of deployments which drives the payload length.

• Device Orientation: The position ID must be resolvable without regard to device orientation.

• Safety: The system must not radiate magnetic fields large enough to cause harm.

• Stealth: The system attack must be stealthy and function without GPS, Wi-Fi, NFC, Bluetooth and cellular location supporting capabilities during execution.

2.4.4 Design Decisions, Observations and Parametrics

• Scope: Since we are in the initial stages of this effort, the testing scope was limited to a belt and a shoulder bag. Other transport modalities which are useful in completing this effort include in-clothing pockets, in-hand and arm-band modalities. In addition, the transmitter was limited for this current series of experiments to floor operations, whereas in-wall, in-ceiling, kiosk resident and security tower-like structures are viable deployment alternatives.

• Sampling: The largest sampling period of any device tested was 18.9 msec. This limits the lower bound signaling pulsewidth to approximately 37.8 msec to avoid aliasing effects. We utilize 45 msec to account for future target devices within our prototype’s dimension limits.

• Data Frame Sizing: We utilize a synchronization preamble and a payload consisting of an ID information field and a validation field. ID length must factor in the number of unique locations and any necessary coding overhead.

\[ n = t \times 1/pw \tag{11} \]

\[ t = L/v \tag{12} \]

We define the longitudinal velocity of the human passing through the magnetic field, v; the length of the platform, L; the time in field, t; the pulsewidth, pw; the number of bits that can be detected while in the magnetic field, $bits_r$; the number of bits transmitted once triggered, $bits_t$; the frame length, n; the number of deployed platforms, p. Since Equations 11 and 12 must be solved simultaneously and if $bits_t = bits_r = n$, then using t, the maximum time in field, to solve for n: $t = n \times pw = L/v \rightarrow n = 1/pw \times L/v$.

• Device Velocity: Velocity affects the time within the magnetic field. From Bohannon [11], the nominal comfortable gait speed is ≈ 4.6 feet/sec.

• Physical Size: We assume platform dimensions of 8.7 feet x 36 inches. Since the magnetic flux lines extend beyond the physical boundary of the coil and due to the test environment’s physical constraints, we set the length as slightly less than 4.6 × pw × n, which represents a balance of minimal intrusiveness, physical constraints and meeting performance needs.

• Gait Amplitude Contribution: We found in our experiments that the deviation in magnetometer readings attributable to the gait was approximately equal to ±15 µT. This amplitude is similar to that induced by the signal.

• Distinguishing Gait from Data: Long contiguous snippets of equal valued data can be construed as a gait induced contribution. For example, 8 bits at 45 msec per bit is 360 msec, which in terms of frequency, is within the gait frequency band. As a result, the data must be coded to account for these occurrences. For this exercise, we intentionally shortened the permitted identical contiguous bit sub-streams and accepted the bit length penalty of a longer ID field.

• Scale: The key issue is determining the bit stream length to accommodate the limited time within the magnetic field and the sampling rate limitations while providing effective signal discrimination. We selected 1 million possible platforms, 20 bits uncoded, to provide a modest level of scaling.

• Safety: In a joint question and answer report [32], the U.S. National Institute of Environmental Health Sciences and the National Institutes of Health suggested that the levels associated with hair dryers (nominally 300 µT) and electric razors (nominally 100 µT) at operating distances are safe to humans in normal use. In a static position measurement taken below the knee, our maximum RMS field strength was approximately 44 µT. At range, our worst-case field strength is ≈ 20 µT. Exposure time in the appliance case is several minutes, while it is less than 5 seconds with ours.

• Stealth: The attack is permissionless, making detection difficult. Exfiltration to a third party, albeit needing permissions, is not the objective of this study.
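The sizing arithmetic in the Sampling, Data Frame Sizing, Device Velocity and Physical Size items can be reproduced with a short Python sketch. The function names are hypothetical, and the 47-bit frame used in the example is an assumption (a 15-chip preamble followed by a 32-bit payload); the sample period, pulsewidth, gait speed and platform length are the values quoted above.

```python
def min_pulsewidth_ms(max_sample_period_ms):
    """Shortest pulsewidth that still yields two samples per bit."""
    return 2.0 * max_sample_period_ms

def bits_in_field(length_ft, speed_ft_s, pulsewidth_s):
    """Equations 11 and 12 combined: n = (1 / pw) * (L / v)."""
    return (length_ft / speed_ft_s) / pulsewidth_s

def field_length_ft(n_bits, speed_ft_s, pulsewidth_s):
    """Distance walked while n_bits are transmitted: v * pw * n."""
    return speed_ft_s * pulsewidth_s * n_bits

if __name__ == "__main__":
    print(min_pulsewidth_ms(18.9))            # slowest device tested
    print(bits_in_field(8.7, 4.6, 0.045))     # bits received over the physical platform
    print(field_length_ft(47, 4.6, 0.045))    # field length for the assumed 47-bit frame
```

Under these assumptions the physical platform alone covers roughly 42 bits at the nominal gait speed, so the remainder of a 47-bit frame relies on the flux extending beyond the coil boundary, as noted in the Physical Size item.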

2.4.5 Coil and Electronics Design

Parameter                          Value
Dimensions (Inches)                67.5 x 24.5
Turns                              65
Height (Inches)                    1.25
Inductance (Henries)               0.024
Wire Length (Feet)                 993
Wire Nominal Resistance (Ohms)     6
Wire Gauge (AWG)                   18
Relative Permeability (µ/µ0)       1.00000043

Table 2: Coil Parameters

Coil Design. Table 2 highlights coil parametric information for the hand wound, air-gapped coil using a wooden frame as a bobbin and magnet wire as the conductor. These parameters approximate a real-world deployment.

Electronics Design. The electronic circuitry, Figure 4, consists of an Arduino based Linkit Smart 7688 series controller, two linear power supplies providing ±12.5 V voltage rails, voltage suppression / fly-back capability and solid-state switches, in a single pole, double throw configuration, which supply current to the coil(s). The Arduino controller enables the switches on a per bit basis where each bit has a duration of 45 msec. Individual bits are coded in a non-return-to-zero (NRZ) format. We selected NRZ since the magnetic field’s rise and fall time approached 10 msec, which potentially increases aliasing at the sensor sampling rates. We switch between the voltage rails to support rapid charging and discharging of the coil(s), L1, without the need for AC coupling.

Figure 4: Single Coil Circuit Design
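The quoted rise and fall time can be sanity-checked against the Table 2 coil parameters with a first-order model. The sketch below assumes the coil behaves as an ideal series RL circuit driven from a 12.5 V rail, with the wire resistance as the only resistance; switch, supply and fly-back effects are ignored, so the numbers are indicative only. The function names are hypothetical.

```python
import math

def time_constant_s(inductance_h, resistance_ohm):
    """RL time constant tau = L / R."""
    return inductance_h / resistance_ohm

def rise_time_10_90_s(tau_s):
    """10% to 90% rise time of a first-order step response: tau * ln(9)."""
    return tau_s * math.log(9.0)

def steady_state_current_a(rail_v, resistance_ohm):
    """Current once the coil is fully charged from one rail."""
    return rail_v / resistance_ohm

if __name__ == "__main__":
    tau = time_constant_s(0.024, 6.0)         # Table 2: 0.024 H, 6 ohms
    rise = rise_time_10_90_s(tau)
    print(f"tau = {tau * 1e3:.1f} ms, 10-90% rise = {rise * 1e3:.1f} ms")
    print(f"fraction of a 45 ms bit spent in transition: {rise / 0.045:.2f}")
    print(f"steady-state current: {steady_state_current_a(12.5, 6.0):.2f} A")
```

With these assumptions the transition occupies roughly a fifth of each 45 msec bit, which is in line with the observation that the rise and fall time approaches 10 msec.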

2.5 Code Selection and Payload Design

Coding selection presents a challenge since we are limited to 32 bits for data and checking based on the previously described physical constraints and synchronization framing. Concatenating Hamming codes [43] is insufficient since, for example, in an 8-4-4 Hamming scheme, one needs eight bits to get four bits and the scale and payload conditions are violated. Gold codes [50] and Kasami codes [98] do not allow enough preferred pairs to satisfy the scale objective while simultaneously satisfying the number of bits available. Compounding this is the situation when a stream of either 0 or 1 bits occurs, since these may be indistinguishable from gait attributed frequencies. To avoid this concern, we selected ID patterns that did not include identical bit sequences exceeding 3 bits. With a one million platforms target, we would need 23 bits in our code to get 20.

For demonstration purposes, we used an 8-bit parity sequence where each bit is the XOR of the corresponding bits of the 23-bit code sliced into three 8-bit words. We initially set the 24th bit to 0 in this calculation. Once completed, we set the 24th bit to the parity value of the 8-bit parity sequence itself. This allows us to perform a rudimentary check on the latter and allows us to detect small error counts. Of note, we violated the four identical bit rule for the 9-bit parity sequence, recognizing that this would potentially increase error rates. Consequently, the overall frame design consists of a synchronization header, the aforementioned ID field and a footer containing parity information. For synchronization, we adopted a spreading technique leveraging a Pseudo Noise (PN) sequence (MSEQ 15) in the header, which represents a balance in achieving short synchronization lengths, signal gain and minimizing pattern duplication.
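The parity construction described above can be sketched in a few lines of Python. The function names are hypothetical and the bit ordering is an assumption: the 23-bit ID comes first, followed by the 24th bit and then the 8-bit parity sequence. The first patterns listed in Table 3 are consistent with this layout.

```python
def xor_parity_byte(bits24):
    """XOR the three 8-bit words of a 24-bit string, column by column."""
    words = [int(bits24[i:i + 8], 2) for i in (0, 8, 16)]
    return words[0] ^ words[1] ^ words[2]

def build_payload(id23):
    """Return the 32-bit payload for a 23-bit ID given as a '0'/'1' string."""
    if len(id23) != 23 or set(id23) - {"0", "1"}:
        raise ValueError("ID must be a 23-character string of 0s and 1s")
    # The 24th bit is held at 0 while the 8-bit parity sequence is computed.
    parity_bits = format(xor_parity_byte(id23 + "0"), "08b")
    # The 24th bit is then set to the parity of the parity sequence itself.
    bit24 = str(parity_bits.count("1") % 2)
    return id23 + bit24 + parity_bits

def check_payload(payload32):
    """True when a received 32-bit payload is self-consistent."""
    return len(payload32) == 32 and build_payload(payload32[:23]) == payload32

if __name__ == "__main__":
    print(build_payload("00110111010001000100111"))
```

A single flipped bit anywhere in the payload makes `check_payload` return False, which matches the stated goal of detecting small error counts without correcting them.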

We could increase the frequency separation from the gait fundamental by limiting the number of contiguous identical bits to two. The number of such codewords follows a Fibonacci series with an n of 29 yielding a k of 20. These testing results are discussed in Section 2.8.5.
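The code lengths quoted for the run-length limits can be checked by counting the bit strings that satisfy each limit. The following Python sketch does this with a small dynamic program; the function names are hypothetical and the routine only counts valid codewords, it does not assign them to locations.

```python
def count_run_limited(n, max_run):
    """Number of n-bit strings in which no run of identical bits exceeds max_run."""
    # counts[j] = number of strings of the current length ending in a run of j + 1 bits
    counts = [2] + [0] * (max_run - 1)      # length 1: "0" and "1"
    for _ in range(n - 1):
        restart = sum(counts)               # append the opposite bit: run length resets to 1
        extend = counts[:-1]                # append the same bit: run length grows by 1
        counts = [restart] + extend
    return sum(counts)

def min_length(k_bits, max_run):
    """Smallest n whose run-limited codebook holds at least 2**k_bits entries."""
    n = 1
    while count_run_limited(n, max_run) < 2 ** k_bits:
        n += 1
    return n

if __name__ == "__main__":
    print(min_length(20, 3))   # runs limited to three identical bits
    print(min_length(20, 2))   # runs limited to two identical bits
```

For a limit of two, the counts are twice the Fibonacci numbers, which is the series referred to above; for a limit of three they follow the analogous three-term recurrence.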

More aggressively, we could include error correction by changing the header to a Barker 7 code, utilize a Hamming [31, 26, 3] code correcting one bit, then break the 0/1 pattern by bit stuffing while including the two-bit limit. We leave the optimal coding scheme for a future study.

2.6 Signal Processing

Android smartphone sensors such as the accelerometer and gyroscope are intended to respond to motion while the magnetometer is intended to respond to orientation shifts relative to a magnetic reference position. These sensors may be sensitive to external non-motion driven stimuli such as ultrasonic signals in the cases of the accelerometer and gyroscope and, in the case of the magnetometer, indigenous fields and external magnetic field manipulation. These, plus motion induced responses, are seen in Figure 5a and Figure 5b. The former illustrates all testing movement including the returns to the starting position while the latter shows, for a zoomed single walk, the visible signal on the X and Y axes. The effect of the gait is visible, predominantly in the Z and to a lesser extent, the Y and X axes. This is expected as the distance off the platform plane varies within hip flexion and extension ranges while underway. From a signal strength view, the gait amplitude may reach twice that of the signal while its period exceeds the bit’s pulsewidth.

Figure 5: Magnetic Fields, Galaxy S6. (a) Dynamic Test - 25 Walks, Macro View Including Turns. (b) Signal, Code and Gait from Single Walk. Both panels plot Sensor Reported Value against Sensor Event Report Time (Msec) for the X, Y and Z axes.

Figure 6 illustrates the process steps used to extract payload data. The sensor data is initially interpolated in increments of 1 msec since sensor reporting is non-uniform. In addition to the obvious sample rate differences, we found that among the LG, Nexus and ZETA devices, the Hardware Abstraction Layer reports a ‘zero’ after a scheduled sample. This behavior is observed in Figure 10 versus what is observed with the Galaxy S6, see Figure 5. We hold the last reported value when this occurs in order to eliminate this condition. This condition is detected when the next sample changes by more than the mean signal value over the entire run. Moving average or equivalent filters are not suitable as they smooth the signal. Once completed, low frequency components sourced by the gait motion are identified by an FFT. These are used to set the filter cut-off frequencies, which are typically below 2 Hz. A subsequent composite signal is generated and passed through an AGC process which compensates for signal strength variation due to differences in off-angle positions relative to the center of a coil. The AGC value at point s is shown in Equation 13 where τ is the selected threshold, $d_s$ is the post noise-cancelled value at s and $e_s$ is the energy at s.

\[ AGC_s = f(d_s, \tau, e_s) \tag{13} \]

A synchronization preamble hunt occurs post AGC processing. Using a matched filter based on the MSEQ15 pattern, we slide the filter over the AGC output and correlate at each AGC point. Correlation is computed using Equation 14, where l is the encoding scheme length, x[i] is the sensor measurement at i within l and EE[i] is the encoding scheme’s ith code value within l, e.g., -1 or 1 for each chip as needed.

\[ sync_{start} = \left| \sum_{i=1}^{l} \left( |x[i]| \right) \cdot EE[i] \right| \tag{14} \]

\[ \hat{s} = \underset{s \in \{s_1, s_2, s_3, \ldots, s_n\}}{\arg\max}\; sync_{start} \tag{15} \]

Once the start of the synchronization preamble $\hat{s}$ is determined, the preamble is stripped away and the payload is extracted. We apply discrimination processing in Equations 16 and 17, where we integrate the AGC output over the interval $t_s$ to $t_e$ and apply the result against a threshold τ to get the bit value.

\[ D = \int_{i=t_s}^{t_e} AGC_i \, dt \tag{16} \]

\[ Bit_s = \begin{cases} 1, & \text{if } D \geq \tau \\ 0, & \text{otherwise} \end{cases} \tag{17} \]

Error processing is executed with ID error rates and bit error rates subsequently determined.
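The preamble hunt and bit discrimination steps can be sketched as follows. This is a simplified illustration rather than the processing chain used in the study: the 15-chip pattern shown is one standard length-15 m-sequence and is assumed rather than taken from the text, the correlation uses signed samples and takes the magnitude of the sum, and the integral of Equation 16 is approximated by a sum over uniformly spaced samples. The names `find_sync`, `discriminate` and `samples_per_chip` are hypothetical.

```python
PREAMBLE_BITS = "000100110101111"  # assumed 15-chip m-sequence, not from the text

def chips(bits):
    """Map a '0'/'1' string to -1/+1 chip values."""
    return [1 if b == "1" else -1 for b in bits]

def find_sync(agc, code, samples_per_chip):
    """Slide the matched filter over the AGC output and return the best start index."""
    template = [c for c in code for _ in range(samples_per_chip)]
    length = len(template)
    best_start, best_value = 0, -1.0
    for s in range(len(agc) - length + 1):
        value = abs(sum(agc[s + i] * template[i] for i in range(length)))
        if value > best_value:              # keep the first maximum found
            best_start, best_value = s, value
    return best_start

def discriminate(agc, start, n_bits, samples_per_chip, tau=0.0):
    """Integrate each bit interval and compare against the threshold tau."""
    bits = []
    for k in range(n_bits):
        lo = start + k * samples_per_chip
        d = sum(agc[lo:lo + samples_per_chip])   # discrete form of the integral
        bits.append("1" if d >= tau else "0")
    return "".join(bits)

if __name__ == "__main__":
    payload = "00110111010001000100111100111101"   # pattern 0 of Table 3
    spc = 5
    frame = chips(PREAMBLE_BITS + payload)
    signal = [0.0] * 30 + [float(c) for c in frame for _ in range(spc)] + [0.0] * 10
    start = find_sync(signal, chips(PREAMBLE_BITS), spc)
    print(start, discriminate(signal, start + len(PREAMBLE_BITS) * spc, 32, spc))
```

The example uses a noise-free signal, so it only demonstrates the mechanics; gait and system noise would have to be removed first, as described above.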

Figure 6: Signal Processing (blocks: Magnetometer Data, Interpolate, FFT, Filter, Vector Gen, Apply AGC, Synchronize, Discriminate, Process Errors, Processed Data)
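The first two conditioning steps of the pipeline, removing the spurious ‘zero’ reports and resampling onto a 1 msec grid, can be sketched as follows. This is an illustration under assumptions: a report is treated as spurious only when it is exactly zero and differs from the previous retained value by more than the mean absolute reading, linear interpolation is used for resampling, and the function names are hypothetical.

```python
import math

def hold_last_value(values):
    """Replace spurious zero reports with the last retained reading."""
    threshold = sum(abs(v) for v in values) / len(values)  # mean signal value over the run
    cleaned = [values[0]]
    for v in values[1:]:
        previous = cleaned[-1]
        if v == 0 and abs(v - previous) > threshold:
            cleaned.append(previous)        # hold the last reported value
        else:
            cleaned.append(v)
    return cleaned

def resample_1ms(t_ms, values):
    """Linearly interpolate non-uniform samples onto a 1 msec grid.

    t_ms must be strictly increasing and contain at least two timestamps.
    """
    grid, out, j = [], [], 0
    t = math.ceil(t_ms[0])
    while t <= t_ms[-1]:
        # Advance to the segment [t_ms[j], t_ms[j + 1]] that contains t.
        while j + 1 < len(t_ms) - 1 and t_ms[j + 1] < t:
            j += 1
        t0, t1 = t_ms[j], t_ms[j + 1]
        w = (t - t0) / (t1 - t0)
        out.append(values[j] + w * (values[j + 1] - values[j]))
        grid.append(t)
        t += 1
    return grid, out

if __name__ == "__main__":
    raw = [40.0, 0.0, 42.0, 0.0, 41.0]
    print(hold_last_value(raw))
    print(resample_1ms([0, 10, 30], [0.0, 10.0, 30.0])[1][:6])
```

Holding the last value rather than averaging preserves the bit edges, which is the reason given above for avoiding moving average filters.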

2.7 Testing and Evaluation Approach

We successfully obtained IRB approval for human-in-the-loop experiments, enabling the use of test assistants to transport the test devices from differing manufacturers in proximity to the magnetic field. These assistants encompassed three men and three women of varying heights, weights and walking gait patterns.

Figure 7: Platform with Coil Exposed

2.7.1 Testing Methodology

We deployed an app on each device that recorded the magnetometer readings in each of the X, Y and Z directions. Each assistant carried all devices simultaneously and was tested with a tool belt and with a shoulder bag. Orientation was not explicitly controlled although in the case of the tool belt, the general orientation was vertical with the face pointing toward the participant’s torso. With the shoulder bag, the general orientation was horizontal with the face pointed up. There was no intention for any axis to be precisely parallel or perpendicular to the platform surface. These positions were selected to reflect commonly used orientations.

After enabling sensor recording with the test app, each assistant walks over the platform, Figure 7, 25 times while carrying all of the devices concurrently. A new ID position code, Table 3, was transmitted for each pass, yielding a total of 25 unique codes per assistant per device per test. The same position code sequences were used for each test. Each walk pass consisted of a synchronous series of events. Initially, the assistant would wait for a fixed period of time (seconds) until visually cued with warning signals followed by a ‘go’ signal. The assistant would subsequently traverse the platform, return to the starting position and wait for the next series of cues. This sequence is more demanding than a real deployment, where the emissions would be triggered by a pressure switch / optical sensor so that they occur while the victim is within the coil boundaries, whereas in these tests we relied on reaction time. After all walk passes were complete, the data was post processed to ascertain solution effectiveness.

We do not evaluate the effect of gender. Tests were performed on both sexes independent of vehicle. Our focus was to identify differences in performance with respect to noise where one noise contributor might be gender related gait.

Table 3: Data Pattern

Pattern #   Bit Pattern
0           00110111010001000100111100111101
1           00100010101001101010101000101110
2           00011011101010111001101100101010
3           00011000111010111001000001100011
4           10101000101011011100010111000001
5           11011001100100111011101011110000
6           11001001010011101101000101010111
7           01100011101000101001101101011011
8           11000111001011010101101110110000
9           01011000110011011101110101001001
10          11100010011100011010001100110001
11          01001010101000110001001111111011
12          11010100101100110111010100010011
13          10101010001101001011000000101110
14          10111011101101110101011001011010
15          10001100011100011010100001010101
16          01101001000100110110011100011100
17          10100111001010001010101100100101
18          11100011011011000111011011111001
19          11001101110001110010010000101110
20          10101011100011101000111110101011
21          00101010110110011001110001101111
22          11010100011010110110010011011011
23          10011001000101001000101100000111
24          00101110111001001001010101011110

2.7.2 Magnetic Field Characteristics

Magnetic field strength varies relative to position as devices move over the coil. Figure 8 illustrates the X, Y and Z readings of the measured field values at 16 inches above the center of the coil, moving from the edge closest to the starting position, ‘Start’, to the ending position, ‘End’. Measurements were taken by sliding the device along the parallel plane while the coil was transmitting signals. Positions 1st Q, Mid and 2nd Q denote the midpoints of the first half, the overall platform and the second half of the platform respectively. The measuring device was parallel to the surface of the platform and rotated 90°. The Z position was flat with the face up. In general, the symmetry is clear, with the worst-case max-to-min ratio approaching 3:1.

Figure 8: Axial Static Position Readings
(Plot: RMS value (µT), 0 to 45, versus position (Start, 1st Q, Mid, 2nd Q, End) for the X, Y and Z axes.)

2.8 Testing Results

This section describes our test results. Of the six devices evaluated, four produced satisfactory results. The two failures were the S7 and the Nexbit Robin.

2.8.1 Sampling Rates

Device sample rates are provided in Table 4. Aliasing was not a concern except in the case of the Robin where the standard deviation equaled the pulsewidth. Otherwise, the worst-case sample rates were greater than twice the 18.92 Hz signaling bit rate (45 msec pulsewidth).

Table 4: Sampling Rate Statistics (msec)

Mfr.      Device Model    Mean    Max       Min     STD
LG        4               8.39    18.92     4.18    0.07
Nexbit    Robin           7.41    1009.8    4.92    45.01
                          4.93    9.86      3.88    0.22
Samsung   S6              4.41    10.66     1.5     1.42
Samsung   S7              4.73    6.23      4.62    0.01
ZTE       Blade V8 Pro    5.01    6.84      4.92    0.014
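The aliasing argument can be checked numerically. The following minimal Python sketch converts the worst-case (Max) sample intervals from Table 4 into sample rates and compares each against twice the 18.92 Hz signaling rate quoted above. The function names, the dictionary layout and the label used for the row whose device name is missing from the table are illustrative assumptions, not part of the original tooling.

```python
# Worst-case (Max) sample intervals in msec, copied from Table 4.
MAX_INTERVAL_MS = {
    "LG 4": 18.92,
    "Nexbit Robin": 1009.8,
    "(unnamed row)": 9.86,  # device name is missing in Table 4
    "Samsung S6": 10.66,
    "Samsung S7": 6.23,
    "ZTE Blade V8 Pro": 6.84,
}

SIGNALING_RATE_HZ = 18.92  # signaling bit rate quoted in the text


def worst_case_rate_hz(max_interval_ms: float) -> float:
    """Lowest instantaneous sample rate implied by the longest interval."""
    return 1000.0 / max_interval_ms


def satisfies_nyquist(max_interval_ms: float,
                      signal_hz: float = SIGNALING_RATE_HZ) -> bool:
    """True when the worst-case sample rate exceeds twice the signaling rate."""
    return worst_case_rate_hz(max_interval_ms) > 2.0 * signal_hz


if __name__ == "__main__":
    for device, interval in MAX_INTERVAL_MS.items():
        rate = worst_case_rate_hz(interval)
        print(f"{device:18s} {rate:8.2f} Hz  nyquist_ok={satisfies_nyquist(interval)}")
```

Under these assumptions only the Robin's worst-case interval falls below the required rate, which agrees with the observation above.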

2.8.2 Processing

Figure 10 shows the magnetometer response for a given pass of the ZTE Blade V8 Pro device, with and without signaling. In the sequence times between 2000 and 8000 msec in Figure 10a, the straight-line periodicity of the gait is observable without signal emissions. The deviations after this range are test specific, as the subjects were asked to return to the starting position. The presence of signal is shown in Figure 10b between 3000 and 5500 msec for a similar walk. Note that the signal imposes only a minor amplitude deviation while underway and rides on top of the ambient (including gait) readings.

The RZ signature is eliminated prior to interpolation, which is followed by the removal of gait-related components and the contributions of turns, as shown in Figure 11a, using a multi-stage filtering scheme. The gait frequency spectrum for each DUT for a sample test assistant is provided in Figure 9. Although these frequencies are less than 5 Hz, the content is more noticeable at lower frequencies.

Figure 9: FFT - Low Frequency Content
(Plots: four panels, S6, ZTE, Nexus and LG, each showing X, Y and Z FFT strength versus frequency, 0 to 40 Hz.)

We transform the resultant tri-axial data, Figure 11a, into the composite signals shown in Figure 11b, using both Cartesian and spherical coordinate representations of the measured values associated with Section 2.4.2: the vector magnitude B, the inclination arccos(Bz/B) and the azimuth arctan(By/Bx). We select the output with the most fidelity prior to the application of AGC to create the final data. Since the orientation is neither controllable nor predictable and the axial sensor readings vary with orientation and position within the magnetic field, all three composite signals must be computed. This is evident from Figure 5b, where the Z axis has a strong gait component and little signal, X and Y have severe and moderate edge attenuation respectively due to slow fading of the near-field channel, and each exhibits asymmetrical yet opposing gait elements. Figure 11c illustrates an AGC output example superimposed on the composite signal.
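The three composite signals can be sketched in a few lines of Python. This is a minimal illustration of the vector, inclination and azimuth computations described above, not the processing chain used in the experiments. The function names and sample readings are made up, and atan2 is substituted for arctan(By/Bx) so that a zero Bx reading does not cause a division error.

```python
import math


def composite_signals(bx: float, by: float, bz: float) -> tuple[float, float, float]:
    """Return (magnitude, inclination, azimuth) for one tri-axial sample.

    Angles are in radians. atan2 replaces arctan(By/Bx) so that Bx == 0
    is handled and the quadrant is preserved (an assumption of this sketch).
    """
    magnitude = math.sqrt(bx * bx + by * by + bz * bz)
    if magnitude == 0.0:
        # Degenerate sample: the angles are undefined, so report zeros.
        return 0.0, 0.0, 0.0
    # Clamp the ratio to guard against rounding slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, bz / magnitude))
    inclination = math.acos(ratio)
    azimuth = math.atan2(by, bx)
    return magnitude, inclination, azimuth


def composite_series(samples):
    """Split a list of (x, y, z) readings into three composite series."""
    triples = [composite_signals(x, y, z) for x, y, z in samples]
    vector = [t[0] for t in triples]
    incl = [t[1] for t in triples]
    az = [t[2] for t in triples]
    return vector, incl, az


if __name__ == "__main__":
    readings = [(3.0, 4.0, 0.0), (0.0, 0.0, 5.0), (1.0, 1.0, 1.0)]  # made-up values
    for name, series in zip(("vector", "incl", "az"), composite_series(readings)):
        print(name, [round(v, 3) for v in series])
```

Selecting the series with the most fidelity would then be a separate step applied to the three outputs before AGC.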

Figure 10: Single Walk Magnetometer Readings
(Plots: (a) Signal-less Walk and (b) Walk with Signal, each showing the X, Y and Z sensor reported values versus sensor event report time, 0 to 10000 msec.)

Figure 11: Signal Processing Chain
(Plots: (a) Gait Removed, showing X, Y and Z sensor values; (b) Composite Signals, showing Vector, Incl and Az composite values; (c) Automated Gain Control, showing Vector and AGC; each versus sensor event report time, 94500 to 99000 msec.)

Table 5: Static Error Rate Summary

Mfr.      Device Model    BER
LG        4               1.67 × 10⁻³
Nexus     5               6.25 × 10⁻³
Samsung   S6              0
ZTE       Blade V8 Pro    1.67 × 10⁻³

2.8.3 Stationary Testing

Table 5 summarizes device static error rates for a 4800-bit test. The worst-case error rate is 6.25 × 10⁻³.

2.8.4 Walking Results

Table 6: Error Summary

Test Subject   Belt/Bag   Device      IDE   BE   SBE   DBE
A              Bag        LG-D415     5     8    4     0
A              Bag        Nexus 5     4     6    3     0
A              Bag        SM-G920T    5     36   2     1
A              Bag        Z978        1     1    1     0
A              Belt       LG-D415     2     2    2     0
A              Belt       Nexus 5     9     15   5     2
A              Belt       SM-G920T    2     2    2     0
A              Belt       Z978        0     0    0     0
B              Bag        LG-D415     1     1    1     0
B              Bag        Nexus 5     7     9    6     0
B              Bag        SM-G920T    5     57   1     0
B              Bag        Z978        1     1    1     0
B              Belt       LG-D415     4     46   1     0
B              Belt       Nexus 5     4     6    3     0
B              Belt       SM-G920T    1     2    0     1
B              Belt       Z978        0     0    0     0
C              Bag        LG-D415     2     4    1     0
C              Bag        Nexus 5     6     9    5     0
C              Bag        SM-G920T    3     32   1     0
C              Bag        Z978        0     0    0     0
C              Belt       LG-D415     1     1    1     0
C              Belt       Nexus 5     6     8    4     2
C              Belt       SM-G920T    2     2    2     0
C              Belt       Z978        2     2    2     0
D              Bag        LG-D415     7     8    6     1
D              Bag        Nexus 5     7     12   4     2
D              Bag        SM-G920T    2     3    1     1
D              Bag        Z978        1     2    0     1
D              Belt       LG-D415     5     53   2     0
D              Belt       Nexus 5     9     12   6     3
D              Belt       SM-G920T    1     1    1     0
D              Belt       Z978        0     0    0     0
E              Bag        LG-D415     5     7    3     2
E              Bag        Nexus 5     6     56   3     0
E              Bag        SM-G920T    2     3    1     1
E              Bag        Z978        0     0    0     0
E              Belt       LG-D415     7     9    5     2
E              Belt       Nexus 5     8     11   5     3
E              Belt       SM-G920T    3     34   1     0
E              Belt       Z978        2     2    2     0
F              Bag        LG-D415     2     2    2     0
F              Bag        Nexus 5     9     14   5     3
F              Bag        SM-G920T    5     48   1     1
F              Bag        Z978        0     0    0     0
F              Belt       LG-D415     5     8    3     1
F              Belt       Nexus 5     4     6    3     0
F              Belt       SM-G920T    4     33   2     0
F              Belt       Z978        0     0    0     0

Table 6 summarizes the testing results. Columns IDE and BE indicate the number of errors for a given device with respect to the 25 possible IDs (IDE) and 800 bits (BE). The worst-case ID error rate is 9 out of 25, which occurred on only one device type. This suggests a good confidence level that we can determine the precise location of a device that is in range of our coil. The S6 and ZTE are the best performers, where the worst-case correct identification rate between the two is 83%. The LG and Nexus follow with worst-case values of 77% and 61% respectively. Although the gender breakout is intentionally hidden from the reader, the results appear inconclusive with respect to gender orientation and locomotion.

Initially, we thought that there might be a bias based on gender due to physical and traditional conveyance modality. Although the mean height for men moves devices ≈ 5.5 inches further away from the surface of the platform, pocketbook / shoulder bag use may offset this gap, as the bottom of the book / bag is usually several inches above the iliac crest, which itself is estimated to be 2 inches above the location of a belt-carried device. The use of high heels further reduces the gap. Second, a bag provides additional device tilt such that the data may show an increased contribution from previously non-dominant axes. One might attempt to infer gender from axial data but once ‘vectorized’, the belt vs. bag results are indistinguishable. See Frimenko et al. [33] for more information on gender gait differences.

In some cases, there are low IDEs with large BEs. These typically occur when synchronization fails. Some failures may be attributed to the testing scheme which is dependent on visual cues to initiate a walk. Any delays by the test assistant may cause a partial preamble loss due to emissions starting prior to acceptable proximity to the coils. In a true deployment which would rely on physical detection methods for presence within the anticipated field, proximity induced synchronization failures would be mitigated.

The results validate our approach to identifying location despite the presence of static environmental magnetic fields and system noise sources such as those associated with actively carrying the device. With one exception, the non-stationary error rates are an order of magnitude worse than the corresponding stationary ones. We have included two additional columns, single bit errors (SBE) and double bit errors (DBE), where we track the occurrence of each for a given test. The ID success rate would exceed 94.8% if the coding scheme selected supported single bit error correction and 97% for double bit error correction.
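The aggregate figures can be reproduced from Table 6. The sketch below is a minimal Python tally, assuming 25 IDs of 32 bits per row (800 bits per test) and assuming that SBE and DBE count the IDs whose only errors were one or two bits respectively. The compact row encoding and the function names are illustrative choices, not the original analysis scripts.

```python
# Table 6, one line per (subject, vehicle). Each group of four numbers is
# IDE BE SBE DBE for the LG, Nexus, Samsung and ZTE devices, in that order.
TABLE6 = """
A Bag  5 8 4 0 | 4 6 3 0 | 5 36 2 1 | 1 1 1 0
A Belt 2 2 2 0 | 9 15 5 2 | 2 2 2 0 | 0 0 0 0
B Bag  1 1 1 0 | 7 9 6 0 | 5 57 1 0 | 1 1 1 0
B Belt 4 46 1 0 | 4 6 3 0 | 1 2 0 1 | 0 0 0 0
C Bag  2 4 1 0 | 6 9 5 0 | 3 32 1 0 | 0 0 0 0
C Belt 1 1 1 0 | 6 8 4 2 | 2 2 2 0 | 2 2 2 0
D Bag  7 8 6 1 | 7 12 4 2 | 2 3 1 1 | 1 2 0 1
D Belt 5 53 2 0 | 9 12 6 3 | 1 1 1 0 | 0 0 0 0
E Bag  5 7 3 2 | 6 56 3 0 | 2 3 1 1 | 0 0 0 0
E Belt 7 9 5 2 | 8 11 5 3 | 3 34 1 0 | 2 2 2 0
F Bag  2 2 2 0 | 9 14 5 3 | 5 48 1 1 | 0 0 0 0
F Belt 5 8 3 1 | 4 6 3 0 | 4 33 2 0 | 0 0 0 0
"""

IDS_PER_TEST = 25   # IDs transmitted per assistant per device per test
BITS_PER_ID = 32    # pattern length in Table 3


def parse_rows(text):
    """Expand the compact encoding into (subject, vehicle, IDE, BE, SBE, DBE) rows."""
    rows = []
    for line in text.strip().splitlines():
        subject, vehicle, rest = line.split(None, 2)
        for group in rest.split("|"):
            ide, be, sbe, dbe = (int(v) for v in group.split())
            rows.append((subject, vehicle, ide, be, sbe, dbe))
    return rows


def summarize(rows):
    """Aggregate ID accuracy, BER and the accuracy with 1- and 2-bit correction."""
    total_ids = len(rows) * IDS_PER_TEST
    total_bits = total_ids * BITS_PER_ID
    ide = sum(r[2] for r in rows)
    be = sum(r[3] for r in rows)
    sbe = sum(r[4] for r in rows)
    dbe = sum(r[5] for r in rows)
    return {
        "id_accuracy": 1 - ide / total_ids,
        "ber": be / total_bits,
        "accuracy_with_sbe_correction": 1 - (ide - sbe) / total_ids,
        "accuracy_with_dbe_correction": 1 - (ide - sbe - dbe) / total_ids,
    }


if __name__ == "__main__":
    for key, value in summarize(parse_rows(TABLE6)).items():
        print(f"{key:30s} {100 * value:6.2f}%")
```

With the table as transcribed here, the tally gives roughly 86.1% ID accuracy, a BER near 1.5%, and about 94.75% and 97.0% with single and double bit correction.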

Although the Galaxy S7’s sample rates were well within our operational parameters, it appears that the issue is poor sensitivity as it did not exhibit the dynamic range seen in the four attack prone devices. At this juncture we are unable to identify the sensor part number to examine its specifications.

2.8.5 Contiguous Identical Bit Assessment

The above results reflect data patterns prohibiting four or more contiguous identical bits. We replaced the data pattern with one that prohibits three or more contiguous identical bits, as illustrated in Table 7, to provide greater separation from the gait fundamental frequency. The results are provided in Table 8. The overall Bag result improves slightly, whereas the Belt result improves substantially. In the latter, errors are either non-existent or correctable with single bit correction schemes. We suspect that a belt offers greater structural coupling to the body vs. a bag, which floats, anchored at one spot, and may be susceptible to other noise sources.

Table 7: Data Pattern, Two Bit Limit

Pattern #   Bit Pattern
0           00100100101010101010110100100101
1           00100101010010011011001100100110
2           00100110101010101001010110101101
3           00101001010010101001011010110011
4           00101010101011001001100101011011
5           00101011010010110011011010101100
6           00101100101011001010010011001001
7           00101101010011001010010101001010
8           00110010101011010010010010101010
9           00110011010011010010010101001001
10          00110100101100110011010101001101
11          00110101010011010011011010101100
12          00110110101011001010110010011001
13          10010010101011010100101100110110
14          10010011010101101010110010100100
15          10010100101010110010010010101010
16          10010101101011010010010101001001
17          10010110101100110010011010101010
18          10011001010110110010100101001010
19          10011010110010110010101010101100
20          10011011010010010010101101001011
21          10100100110010010010110010101100
22          10100101010010101001001010101101
23          10100110101100101001001101010110
24          10101001101101001001010010101011
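The two-bit run limit on the Table 7 patterns can be checked mechanically. The following minimal Python sketch computes the longest run of identical bits in a pattern. The helper names are hypothetical, and only the first two Table 7 patterns are used as sample input.

```python
from itertools import groupby


def max_run(bits: str) -> int:
    """Length of the longest run of identical characters in a bit string."""
    return max((len(list(group)) for _, group in groupby(bits)), default=0)


def satisfies_limit(bits: str, limit: int = 2) -> bool:
    """True when no bit value repeats more than `limit` times in a row."""
    return max_run(bits) <= limit


if __name__ == "__main__":
    # Patterns 0 and 1 from Table 7.
    for pattern in ("00100100101010101010110100100101",
                    "00100101010010011011001100100110"):
        print(pattern, max_run(pattern), satisfies_limit(pattern))
```

A shorter maximum run means the slowest alternation in the transmitted pattern is faster, which is consistent with the stated goal of moving signal energy away from the gait fundamental.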

Table 8: Two Bit Limit Summary

Test Subject   Belt/Bag   Device      IDE   BE   SBE   DBE
B              Bag        LG-D415     2     4    1     0
B              Bag        Nexus 5     5     6    4     1
B              Bag        SM-G920T    6     93   0     0
B              Bag        Z978        0     0    0     0
B              Belt       LG-D415     0     0    0     0
B              Belt       Nexus 5     3     3    3     0
B              Belt       SM-G920T    0     0    0     0
B              Belt       Z978        0     0    0     0

2.9 Mitigation

Since Wi-Fi, GPS, Bluetooth, cellular and NFC are assumed to be disabled, the attack surface for this type of attack is reduced to the magnetometer. In the current Android security framework, the user is not notified of magnetometer usage. As such, the practical mitigation strategy scope is limited, short of power cycling.

Other than removal of the magnetometer, sampling rate modification may provide the most effective mitigation scheme. The mean sampling rates for the magnetometer were in the 150 Hz range. Decreasing this rate still allows non-malicious functionality while limiting the magnetometer as a covert or side channel medium due to the effectively reduced Nyquist frequency. A less aggressive approach is to randomize the sampling rate, which increases the ID and bit error rates for fixed-length pulsewidths. To sustain this type of channel, the attacker would need to increase the pulsewidth, either reducing the payload length or requiring a larger physical footprint, making the attack more challenging.
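To illustrate why a lower sampling rate constrains the attacker, the following minimal Python sketch applies the same criterion used in Section 2.8.1 (sample rate greater than twice the signaling rate) to a throttled sensor. The 20 Hz throttled rate is a hypothetical example value, the 32-bit payload length is taken from Table 3, and preamble or framing overhead is ignored.

```python
def min_pulsewidth_ms(sample_rate_hz: float) -> float:
    """Pulsewidth bound implied by the Nyquist criterion.

    The signaling rate (1 / pulsewidth) must stay below sample_rate / 2,
    so the pulsewidth must exceed 2 / sample_rate.
    """
    return 2000.0 / sample_rate_hz


def payload_duration_s(bits: int, pulsewidth_ms: float) -> float:
    """Time needed to send `bits` pulses of the given width."""
    return bits * pulsewidth_ms / 1000.0


if __name__ == "__main__":
    # 150 Hz is the mean rate from the text; 20 Hz is a hypothetical throttle.
    for rate in (150.0, 20.0):
        pw = min_pulsewidth_ms(rate)
        print(f"{rate:6.1f} Hz -> pulsewidth > {pw:6.1f} msec, "
              f"32-bit ID takes > {payload_duration_s(32, pw):.2f} s")
```

Under these assumptions, throttling from 150 Hz to 20 Hz stretches the shortest usable 32-bit transmission from under half a second to more than three seconds, which a walking victim would have to spend inside the field.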

Adopting the overdamped scheme of analog compasses of the prior century is interesting. This provides low pass filtering, exhibits non-linear behavior and reduces the signal-to-noise ratio. What is compelling is the difficulty in envisioning the practical need for a critically / under damped magnetometer since an analog equivalent should be acceptable.

Another possibility is to eliminate the magnetometer altogether, although some users would suffer as no alternative is available. Those who can communicate with the GPS constellation might not need this feature. Placing the phone next to a permanent magnet would limit the magnetometer's ability to act as a receiver but would unfortunately severely limit utility. A more practical solution is to reduce sensor sensitivity to approach 100 µT/LSB or less rather than 1 µT/LSB, which significantly degrades magnetometer resolution while retaining functionality.

Attenuating magnetic fields is challenging as it is affected primarily by the shielding material. In a first order approximation from [17], the attenuation α equals permeability × (T_S/D_S), where T_S/D_S is the ratio of the shield thickness to the length of the diagonal of the sheet or the diameter of the shield circle, depending on geometry. Since this ratio is less than one, the permeability of the material must be very high to provide effective shielding, as is the case with materials such as ferromagnetic alloys containing high Nickel concentrations. Materials suitable for RFI shielding such as Aluminum are ineffective in magnetic shielding applications.
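A short worked example makes the dependence on permeability concrete. The Python sketch below evaluates the first order approximation for two materials. The relative permeability values are order-of-magnitude assumptions chosen for illustration (they are not taken from [17] or from the experiments), as are the 1 mm thickness and 200 mm span.

```python
def shield_attenuation(relative_permeability: float,
                       thickness_mm: float,
                       span_mm: float) -> float:
    """First order approximation: permeability x (T_S / D_S)."""
    if span_mm <= 0:
        raise ValueError("span_mm must be positive")
    return relative_permeability * (thickness_mm / span_mm)


if __name__ == "__main__":
    # Assumed order-of-magnitude relative permeabilities, for illustration only.
    materials = {"high-nickel alloy": 80_000.0, "aluminum": 1.0}
    for name, mu_r in materials.items():
        alpha = shield_attenuation(mu_r, thickness_mm=1.0, span_mm=200.0)
        print(f"{name:18s} alpha = {alpha:g}")
```

With a thickness-to-span ratio of 0.005, only the high-permeability material yields an attenuation well above one, which mirrors the qualitative conclusion in the text.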

Monitoring frequencies between 5 Hz and 50 Hz in ≈ 2 second segments is resource intensive, and mitigation in real time is unlikely, either by identifying the participating app or by disconnecting all registered listeners, which is currently not a feature. Finally, querying the user for permission to use the sensor is an option, albeit unlikely given the lack of action taken historically when highlighted in prior works.

2.10 Related Work

Jin [48] developed a file sharing scheme which used the magnetometer to reduce the probability of proximate Man in the Middle attacks and limit eavesdropping from prospective attackers. Static device EMF readings are exchanged as a seed for secure communications. The operating range is less than 20 cm, far less than needed in our attack and inappropriate for dynamically controlled signals.

In Jiang [47], the authors use Amplitude Shift Keying (ASK) encoding for the ‘Pulse’ application intended for near field communications. They use multiple coils and ASK, yet this channel fails to operate at distances greater than 2 cm and stationary devices are assumed. Although the field strength is similar, ASK is challenging in our attack due to position-driven non-linearity of coil emissions.

Son’s [85] work demonstrated the effect of radiating acoustic energy at drones with power levels near 100 dBSPL, disrupting flight patterns by stimulating the gyroscope at its resonance frequency. In addition to the concern for the unprotected victim, additional power might be needed to penetrate clothing, leather pouches, pocketbooks etc. which offer significantly greater acoustic shielding at sensor resonant frequency(ies), making this attack implausible.

In Guri [41], data is transmitted via controlling a desktop computer’s resources (i.e., memory bus) at GSM, UMTS and LTE frequencies which are received by a smartphone at a distance in the 1 to 5.5-meter range. This requires the use of cellular services which we must avoid due to its location tracking capability.

Other sensor inter-device communications were demonstrated by Farshteindiker [27], whose covert channel utilizes an ‘implant’ to transmit ultrasonic waveforms, stimulating a smartphone’s gyroscope. Since the devices must be touching in a position-sensitive location to function, this application is impracticable.

Michalevsky et al. [65] developed PowerSpy, which used power levels and Dynamic Time Warping to yield 80% user route inference accuracy. There are two issues with this approach. Since the attack requires a priori knowledge of road structures overlaid with surveyed power levels, the data collection is substantial. Most importantly, active cellular services are needed, which is prohibited in our attack.

2.11 Conclusion

We demonstrated a zero permission, location identification attack on an Android device. By constructing a low power transmitter that emits GPS mapped location data, we leverage the magnetometer to bypass location privacy protection schemes. With motion compensation, we can determine location 86% of the time with a BER of 1.5%, which is only ten times the stationary error rate. The ID success rate would exceed 94.8% if the coding scheme selected supported single bit error correction and would reach 97% for double bit error correction.

Future work includes improving our understanding of this attack’s potential by including other devices, improving coil switching time for size reduction, experimentally evaluating suitable mitigation techniques and extending the data-collection using other modalities and transmitter configurations.

Chapter 3

3 Android Ultrasonic Covert Channel

3.1 Introduction

In the U.S., broadband usage has slowed with the increased consumption of smartphones [15]. Mobile device adoption is widespread, with 3.9 billion smartphone subscriptions sold worldwide through November 2016 and projected sales exceeding 6 billion by 2020 [26]. Smartphone sales alone in Q2 2016 reached 343.3 million units [46].

In order to enhance the smartphone user experience, manufacturers have historically and frequently incorporated technological enhancements. Examples include increasingly accurate MEMS sensors, cameras and microphone arrays in addition to more powerful processors, enhanced communication radios and high capacity storage chips. These enhancements, combined with user-friendly and incentive-driven App Stores, help fuel demand for greater functionality, including the control and storage of personal information. Unfortunately, this increases security concerns due to the appeal of acquiring such rich information for malicious purposes.

Google™ and Apple™ have tried to address these concerns by implementing multi-layered security architectures. For example, Android implements Application Sandboxing and a Permission based framework, enabling users to control and grant / deny access to sensitive resources. Security enhancements are regularly incorporated, such as the introduction of permission groups (normal and dangerous) and runtime authorization for dangerous permissions (as of Android 6.0). Nevertheless, vulnerabilities are periodically discovered, with malicious applications attempting to trick users by exploiting design and implementation flaws [8]. These exploits ([22, 74, 69]) continue to be difficult to detect.

Interestingly, Gartner, in early 2018 [34], observed a slowdown in sales, which it attributed to two key factors. First, upgrades from feature phones to smartphones have slowed down due to a lack of quality "ultra-low-cost" smartphones and second, replacement smartphone users seek quality models, hence they are extending the lifecycle of current devices. The expectation was that the manufacturers would continue to push high quality offerings, but they appear to have pursued higher margins at quality’s expense. The effect on covert channels is difficult to assess. Although poor quality sensor and processing chains would impact channel performance, we assume that user retention and pursuit of quality devices is advantageous for channel existence.

This chapter presents a stealthy covert channel built upon an ultrasonic communications bridge between two co-resident Android apps. The bridge, which uses the speaker and the local sensor package, leverages the smartphone’s resonance behavior, where the resonance points are a function of the sensor design as integrated into the device’s housing. Manufacturers typically design the operating band in the linear region, well below the resonance frequencies. However, if a stimulus generates frequency components in the non-linear region at or near a resonance point, sensor sensitivity increases and the accelerometer behaves as an amplifier / resonator.

There is a threefold benefit to the adversary. First, this is an out-of-band communications pathway. Second, the channel’s high frequency speaker emissions operate well beyond the voice band, limiting audibility. Third, its permissionless implementation greatly contributes to stealthy operations, ‘hiding in plain sight’ with data flowing freely as it circumvents system defenses.

Our contributions can be summarized as follows:

• To the best of our knowledge, we are the first to report the existence of a same-device, permissionless and ultrasonic, covert channel.

• Unique among Android covert channels, the transmitter monitors the received data which supports self-configuration, self-optimization, error correction and flow control.

• The channel is resilient to Android’s non-uniform event reporting effects.

• We developed an automated framework to discover, identify and characterize the channel since it exists only in very narrow bands of the spectrum and is device unique.

• We applied the framework on a set of 28 mobile devices (spanning 18 different models) and established channels on 13 devices (4 unique models).

• We evaluated channel capacity and throughput in three environments: a laboratory, the Xamarin TestCloud™ and the AWS Device Farm™ .

• We achieved device dependent theoretical Shannon capacity approaching 14 bps.

• Beyond our basic channel, we achieved 80% and 3× improvements with our multichannel and Amplitude Shift Keying enhancements.

The remainder of this chapter is organized as follows. In Section 3.2, we describe the exploitation opportunity. Section 3.4 details the system design and some of the practical limitations presented by Android devices for this attack type. Section 3.5 describes test environment details, device selection and measures of effectiveness. Results, including performance differences within identical device types, are noted in Section 3.6. In Section 3.6.4, we highlight measured performance and performance boosting techniques. Section 3.7 describes mitigation options and we end with a related works discussion followed by our conclusion in Section 3.8 and Section 3.9 respectively.

3.2 Background and Motivation

Smartphone covert channels follow either inter- or intra-device patterns. Channel endpoints in the former case reside in separate physical structures and face elaborate proximity challenges. For example, Do [23] used Frequency Shift Keying of ultrasonic waves to communicate with a pre-positioned, external sink. Farshteindiker [27] used a critically positioned, external ultrasonic device, affixed to the target Android device. In each case, position, external coupling and a participant unencumbered by Android limitations (i.e., permissions including microphone usage) were critical to establishing a channel.

With intra-device channels, the source and sink are co-located, exploiting same device resources for communication. Traditional tactics include setting manipulation, state modification, status manipulation and microphone based channels. These are becoming increasingly less feasible due to recent improvements in Android security. Establishing the default enforcement mode configuration for SELinux and deprecation of android.permission.ACCESS_SUPERUSER have eliminated the granting of ‘su’ privileges, useful in accessing kernel data structures. Further, adoption of runtime and deploy time permission checks enables the user to determine the presence of inappropriate resource consumption or unrelated activities. As a result, system resource manipulation attacks such as the /proc attacks described by Marforio [61] now have limited viability. Other attacks such as volume setting manipulation could be throttled with Operating System (OS) modifications that limit the setting change rate to be consistent with human behavior. Using the finger tap rate example, these channels would be limited to 7 changes per second [24], severely limiting performance.

Despite these aggressive security policies, attackers will continue to seek alternatives. If the objective remains to execute a stealthy same-device attack, then hiding from entities (including humans) that monitor permission requests, resource consumption or unusual shared system resource manipulation is vital. To date, Android covert channels circumvent at most two of these detection mechanisms. The closest examples to encompassing all three are the air-gapped works demonstrated by Al-Haiqi [2] and Deshotels [21] using the vibrator as the source. There are two shortcomings to this attack: the vibrator requires permissions, and the vibration may be audible either directly or from device movement relative to the physical surface it rests upon, i.e., friction rub.

Figure 12: Sensor Response to 21050 Hz Tone Bursts on
(Plots: amplitude, -1.0 to 1.0, and X and Y axis acceleration (m/s²), each versus time, 35000 to 36000 ms.)

In the channel proposed herein, we address all three conditions. We extend the two app approach offered by Marforio [61], Laland [53] and Gasior [35] but in our context, the first app, isolated from network connectivity, accesses sensitive information while promising privacy. This provides the illusion of comfort to the user. Examples include calendars, contact lists, journals, password file managers and credit card managers. The channel can be established even with Do’s [23] recommendation to ask for permissions for any resource request. Even if this scheme were applied to the device’s speakers, which currently do not require permissions, audio access can be justified as the need for an alerting mechanism.

The second app has sensor and network connectivity yet is blocked from direct access to sensitive information. Examples include games and fitness apps where sensor and network access are rationalized as functionally relevant.

This attack differs from Michalevsky [64], who reconstructed voice from sub-200 Hz signals, and Zhang [97], who used the accelerometer to process speech, intending to detect ‘hotwords’ via energy pattern identification. In both cases, the energy from the voice content was near to or below the sensor Nyquist frequency. Although some aliasing exists, sufficient energy to yield results is present in the passband. In our channel, the frequency identification suffers aliasing effects since the operating band is ≈ 2 orders of magnitude above the Nyquist frequency.
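The severity of the aliasing can be illustrated with a short calculation. The Python sketch below folds a tone into the first Nyquist zone under the simplifying assumption of ideal uniform sampling. The 200 Hz accelerometer rate is a hypothetical value chosen so that the 21,050 Hz tone sits roughly two orders of magnitude above the Nyquist frequency, and the function name is illustrative.

```python
def alias_frequency_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a tone after uniform sampling (folded into 0..fs/2)."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    sample_rate = 200.0  # hypothetical accelerometer event rate in Hz
    for tone in (21_050.0, 21_000.0, 20_990.0):
        print(f"{tone:8.0f} Hz tone appears at "
              f"{alias_frequency_hz(tone, sample_rate):5.1f} Hz")
```

Small changes in the carrier move the apparent frequency anywhere within the 0 to 100 Hz band, which is one reason identifying the carrier from sampled sensor data alone is unreliable.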

Besides aliasing, high frequency signals face another impactful challenge: filtering. Typical MEMS devices will apply a Low Pass Filter (LPF) to limit high frequency artifacts in the sensor signal chain. A simple first-order filter attenuates signals at 20 dB per decade, resulting in 40 dB of roll-off at our channel’s operating frequencies (≈ 20 kHz). The amount of power needed to detect the signal then becomes significant. This is supported by O’Reilly [80], who used MEMS accelerometers as guitar pickups which realized 40 dB attenuation at 20 kHz for each X, Y, Z axis above 1 kHz. Assuming that the pickups are exposed to an average stimulus of 60 dB of sound, near that of typical human voice, the power needed to achieve the in-band detection level is 100 dB. This is the equivalent audio power of standing next to a powered lawnmower. The 60 dB sensitivity level is consistent with Michalevsky [64], noting that the gyroscope could detect voice signals, containing much lower frequencies, at 57 dB.
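The roll-off arithmetic can be reproduced with the magnitude response of a first-order low pass filter. In the minimal Python sketch below, the 200 Hz corner frequency is an assumption: it is simply the value two decades below 20 kHz that yields the 40 dB figure, not a documented sensor parameter. The function names are illustrative.

```python
import math


def first_order_attenuation_db(freq_hz: float, corner_hz: float) -> float:
    """Attenuation of a first-order low pass filter, in dB (positive = loss)."""
    return 10.0 * math.log10(1.0 + (freq_hz / corner_hz) ** 2)


def required_level_db(detect_db: float, freq_hz: float, corner_hz: float) -> float:
    """Stimulus level needed so the filtered signal still reaches detect_db."""
    return detect_db + first_order_attenuation_db(freq_hz, corner_hz)


if __name__ == "__main__":
    corner = 200.0  # assumed corner frequency, two decades below 20 kHz
    loss = first_order_attenuation_db(20_000.0, corner)
    print(f"attenuation at 20 kHz: {loss:.1f} dB")
    print(f"level needed for a 60 dB detection threshold: "
          f"{required_level_db(60.0, 20_000.0, corner):.1f} dB")
```

Two decades above the corner the loss is about 40 dB, so a 60 dB in-band detection level translates to roughly 100 dB at the operating frequency, matching the estimate in the text.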

This discussion’s foundational concept is based on our discovery that under specific conditions, highly correlated responses are observable on the accelerometers when high frequency signals are Amplitude Modulated (AM) and emitted from a device’s speaker(s). The signal is amplified [25] when operating near the accelerometer’s mounted resonance frequency, yielding a response emulating an AM non-coherent detector. Figure 12 shows an example of this effect with a pattern of three 21,050 Hz AM tone bursts and the corresponding X axis accelerometer response. Significantly, the response shows the envelope, meaning that the channel receiver need not know the carrier frequency, eliminating the need for a priori knowledge of the operating frequency.
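The idea of recovering data from the envelope alone can be illustrated in software. The Python sketch below is an idealized model, not the physical mechanism: in the channel described here the sensor's mechanical resonance plays the role of the detector. It generates on-off keyed bursts of a 21,050 Hz carrier and recovers the bits with a rectify-and-average envelope detector that never uses the carrier frequency. The 48 kHz sample rate, 50 msec bit period, 10 msec averaging window and 0.3 threshold are all assumptions made for the example.

```python
import math

SAMPLE_RATE_HZ = 48_000  # assumed audio sample rate
CARRIER_HZ = 21_050      # carrier frequency from the Figure 12 example
WINDOW = 480             # 10 msec moving-average window (assumed)


def tone_bursts(bits, bit_ms=50):
    """On-off keyed carrier: a tone burst for each 1, silence for each 0."""
    per_bit = SAMPLE_RATE_HZ * bit_ms // 1000
    samples = []
    for i, bit in enumerate(bits):
        for n in range(per_bit):
            t = (i * per_bit + n) / SAMPLE_RATE_HZ
            samples.append(bit * math.sin(2 * math.pi * CARRIER_HZ * t))
    return samples


def envelope(samples, window=WINDOW):
    """Non-coherent detector: rectify, then smooth with a moving average."""
    out, acc = [], 0.0
    for i, s in enumerate(samples):
        acc += abs(s)
        if i >= window:
            acc -= abs(samples[i - window])
        out.append(acc / window)
    return out


def decode(env, bit_ms=50, threshold=0.3):
    """Sample the envelope at the end of each bit period and threshold it."""
    per_bit = SAMPLE_RATE_HZ * bit_ms // 1000
    return [1 if env[(i + 1) * per_bit - 1] > threshold else 0
            for i in range(len(env) // per_bit)]


if __name__ == "__main__":
    sent = [1, 0, 1, 1, 0]
    received = decode(envelope(tone_bursts(sent)))
    print("sent    ", sent)
    print("received", received)
```

Because the decoder operates only on the smoothed magnitude, changing CARRIER_HZ in the transmitter does not require any change in the receiver, which is the property highlighted above.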

3.3 Threat Model

This section describes the threat model for the ultrasonic channel. What would entice a user to download apps in general including malicious ones is discussed in Section 1.7.

3.3.1 Vulnerability

A vulnerability exists in the Android space where acquiring sensor readings is not restricted. With permissionless access control, the user is not alerted to sensor use at installation time nor at run time. This attack involves one-way communications with the accelerometer acting as a passive receiver.

3.3.2 Threat

The threat is in obtaining sensitive information from a trusted app. We assume that there are two apps participating in the attack. The source has access to sensitive information in its role masquerading as a password manager, a contact manager, etc. In this capacity, it is not designed to have external access. The second app, the sink, masquerades as a game or other app where external access is necessary or assumed. The threat manifests itself when a form of communication may exist between the two apps via non-traditional out-of-band methods.

3.3.3 Attack

The attack is driven by a select group of malicious actors. The attack is enabled by the victim who downloads multiple apps. Once installed, the attack’s foundation is established.

At certain times of day, such as between 1 AM and 5 AM, the device is typically resting in a stationary position. At this point the attack is initiated by the source, which prepares the channel. The source generates a frequency sweep to identify the carrier frequency and performs some sample tests to determine the optimal signal parameters. It packages the data and transmits it ultrasonically via the device speakers. The sink listens to the accelerometer, waiting for signals with certain characteristics that trigger communications processing. Upon detection of a valid frame, the sink decodes the information and repackages the data for transmission off-board at a later time.

3.3.4 Exploit

Most smartphones possess MEMS chips with sensors that provide accelerometer, gyroscope and magnetometer functionality. These chips are designed to detect and report perturbations depending on the sensor's unique functionality. In each case, the system OS depends on a linear response to perturbations. As a result, manufacturers design these chips with resonance frequencies (RF) that are much greater than the frequency components associated with any envisioned perturbation. However, as installed in the chassis, a number of factors affect the RF, potentially including secondary resonance effects. We exploit the chip as integrated into the smartphone by transmitting ultrasonic waves, not much higher than 20 kHz, through the device speaker(s). This in turn triggers responses near the RF, and the sensor output follows the shape of the stimulating wave. These emissions are inaudible to most individuals.

3.3.5 Trust

Trust is presumed in that the two apps perform their allegedly intended task as defined when selected for download. Otherwise they are candidates for removal. In addition, the victim trusts that her data is safe since it cannot be transmitted from the trusted app over the internet as it lacks the permissions to do so. Third, she has downloaded apps that provide valuable functionality and whose app store ratings meet her satisfaction.

3.3.6 Data

As mentioned, the data might encompass passwords from a password management app, personal information such as that used by fitness apps, customs (immigration) apps, contact management apps and many others.

3.4 System Design

3.4.1 Challenges

There are four key challenges in developing this channel.

Stealth: This requires operating without permissions, avoiding resources that may be monitored or that perform unusual operations, and remaining inaudible.

Device and Environmental Diversity: Device diversity affects frequency identification, bit rate and channel orthogonality. Operation is limited to small, device specific frequency bands with narrow coherence bandwidth, necessitating an autonomous solution to identify frequency and pulsewidth. Environmental conditions require signal extraction from noisy and noise prone accelerometers.

Configuration: To avoid attribution and linkage, the channel endpoints must function without direct communication.

Android Limitations: The channel must operate despite sensor sampling and event reporting intervals that are orders of magnitude larger than the carrier frequency's period, which violates the Nyquist criterion for the carrier. In addition, the Android system's sensor event reporting scheme is non-uniform and must be accounted for.

3.4.2 Solution Overview

Figure 13: Single Device Covert Communications Channel System Design (diagram blocks: App Store, Channel Terminus, Internet, Source, Sink, OS and APIs, Media Player, Sensor, Wi-Fi/Cellular, Smartphone)

As mentioned, we assign data theft and off-board communications to two separate apps. These apps communicate with one another via an ultrasonic bridge using only two shared, permissionless system resources: the speaker and the sensors. In typical operation, the bridge's source forms the high frequency wave and uses the Android MediaPlayer API to control tone transmission through the local speakers. The receiver (the sink) uses the SensorManager API to monitor the resultant accelerometer perturbations, then decodes the data and exfiltrates it to an off-board third party over a cellular or Wi-Fi network (Figure 13). Post installation, the attack occurs in two stages: 1) Channel Identification (Phase I), exclusively a source activity, which addresses channel parametrization (e.g., carrier frequency, pulse width); and 2) Data Transfer (Phase II), a source and sink activity, which addresses the compromised data transfer. There are three precondition assumptions: 1) both apps were willingly deployed by the victim; 2) the source has access to the speakers; and 3) the source has access to the compromised data.

3.4.3 Phase I: Channel Identification

The top level frequency ID process is illustrated in Figure 14. The key steps involve setting up the test sweep pattern, listening and processing, and selecting the optimal operational frequency based on the source's processing of the monitored accelerometer data.

Figure 14: Frequency Identification Sequence (participants: Source App, Sensor Suite; steps: Select Device Type; Select Volume, Wave Amplitude, Window Type; Generate Encoded Data Pattern; Generate Test Tones; Register With Sensor Suite; Play Test Tones; Monitor Sensor Data; Perform Data Processing/Correlate With Encoding Pattern; Identify Best Operation Frequencies)

Interestingly, the narrow coherence band favors our attack. Figure 16 shows the spectral response at the microphone to the frequency identification sweep described below. The corresponding accelerometer response is shown in Figure 17. The sensor responds only in very narrow frequency bands, which reveals its narrow coherence band. This is advantageous as the accelerometer appears to have negligible responses to nearby frequencies. However, this does not apply to interference from sub-10 Hz components due to motion etc.

The source synthesizes a frequency identification (FID) sweep pattern composed of a set of discrete, coded frequency subpatterns using short interval spacing (i.e., 50 Hz or less). The frequencies range from 22,050 Hz down to 16,000 Hz as shown in Figure 15 (see Section 3.4.5 for rationale). Note that an 800 Hz marker was included to support testing.

Figure 15: Spectral Components, Channel Identification Sweep Pattern (power in dB vs. frequency in Hz; annotations: 800 Hz Marker, Sweep Range)

Figure 16: Spectrum at Microphone

Figure 17: Coherence

We adopt a spreading technique commonly used in wireless Direct Sequence Spread Spectrum (DSSS) applications. Using a Pseudo Noise (PN) sequence in the FID subpattern improves signal and clock recovery processing in the presence of interference. This technique is useful for signal extraction as the correlation result is strong when matched to the sequence. The sequence (code) length depends on the spreading gain needed, per Equation (18).

\[ \text{Spreading Gain (dB)} = 10 \times \log_{10}(\text{code length}) \tag{18} \]

Using a Barker Code of length 11, Equation (19),

\[ [1, 1, 1, -1, -1, -1, 1, -1, -1, 1, -1] \tag{19} \]

(which is typically used in Wi-Fi DSSS), we can achieve ≈10 dB of spreading gain. M-Sequences, for example, can be used for generating flexible length PN sequences [18], achieving 93.3 dB or more with lengths of 2,147,483,647 and above. In a real attack, the adversary can dynamically select the length based on accelerometer noise measurements and the needed bandwidth. We generate the coded pattern based on bit and frequency values and modulate it using a Hanning window pattern. This window pattern was selected due to its limited distortion relative to others such as the Tukey and flat top window functions, although the latter two provide more energy. The resulting pattern is sent to the media player and played over the device's speaker(s). While transmitting, the source concurrently monitors the accelerometer's X, Y and Z axis responses and applies post-processing as follows:

• Generate a matched filter of temporal length equal to PN sequence length × pulsewidth, with sample points derived from the event times.

• Correlate the sliding received signal data with the dynamically created matched filter.

Correlation is computed for each encoded frequency sequence $f$ to obtain a $score_f$ value using Equation (20), where $l$ is the encoding scheme length, $x[i]$ is the sensor measurement at $i$ within $l$, $\mu_x$ is the mean of all $x$ over the sensor data length and $EE[i]$ is the encoding scheme's $i$th code value within $l$, e.g., -1 or 1 for each chip as needed.

\[ score_f = \left| \sum_{i=1}^{l} \left( |x[i]| - |\mu_x| \right) \cdot EE[i] \right| \tag{20} \]

• Identify the frequency, $\hat{f}$, yielding the maximum correlation magnitude using Equation (21). $\hat{f}$ is coincident with the best signal-to-noise ratio.

\[ \hat{f} = \arg\max_{f \in \{f_1, f_2, f_3, \ldots, f_n\}} score_f \tag{21} \]
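The scoring and selection steps above can be sketched in Python as follows. This is a minimal illustration on synthetic data, not the thesis implementation. It assumes one sensor sample per chip and reads the score as the magnitude of the sum of (|x[i]| − |µx|) · EE[i]. The function names, the three test frequencies and the reading values are hypothetical.

```python
def score(samples, code, mean_all):
    """Correlation score of one frequency's samples against the chip code."""
    return abs(sum((abs(x) - abs(mean_all)) * c for x, c in zip(samples, code)))

def best_frequency(samples_by_freq, code):
    """Return (f_hat, scores): the frequency with the largest score."""
    all_samples = [x for s in samples_by_freq.values() for x in s]
    mean_all = sum(all_samples) / len(all_samples)   # mean over all sensor data
    scores = {f: score(s, code, mean_all) for f, s in samples_by_freq.items()}
    return max(scores, key=scores.get), scores

BARKER_11 = [1, 1, 1, -1, -1, -1, 1, -1, -1, 1, -1]

# Synthetic data: only the 20,500 Hz sub-pattern modulates the sensor reading.
readings = {
    20_450: [0.10] * 11,
    20_500: [0.10 + 0.05 * c for c in BARKER_11],
    20_550: [0.10] * 11,
}
f_hat, scores = best_frequency(readings, BARKER_11)
```

Frequencies to which the sensor does not respond produce a flat reading whose score stays near zero, so the responsive frequency stands out by a wide margin.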

Correlation and Synchronization: The matched filter window is dynamically generated by the app(s) as it slides over the sensor reporting events. Matched filters are ideal signal representations, commonly used to extract the original signal in noisy environments. This contrasts with traditional filters, which may degrade the original signal during noise removal. Here, the sample count is based on the number of event reports within the sliding window, while the values are determined from the time positions relative to an ideal, uniformly spaced, time based sample set. As illustrated for a Samsung Galaxy S5 in Figure 18, non-uniform sampling rates are present, affecting the matched filter sample points over the coded window. Options included processing without correction or synthesizing evenly spaced events. However, Marvasti [62] asserted that non-uniform sampling intervals may be used in signal reconstruction if the average sampling rate satisfies the Nyquist rate. Therefore, the matched filter approach suffices provided this condition is met. Furthermore, the condition must hold for the event samples relative to the pulsewidth. If so, events are processed as is; otherwise, the pulsewidth is increased until the condition is met. This addresses the Android Nyquist sampling limitation.
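The pulsewidth adjustment can be sketched as a simple search. The sketch below assumes that the condition relative to the pulsewidth means an average of at least two sensor events per pulse. The thesis does not state this threshold, so `events_per_pulse` is a labeled assumption, as are the function names and the sample event times.

```python
def mean_interval(event_times_ms):
    """Average gap between consecutive (non-uniform) sensor events."""
    gaps = [b - a for a, b in zip(event_times_ms, event_times_ms[1:])]
    return sum(gaps) / len(gaps)

def smallest_usable_pulsewidth(event_times_ms, candidates_ms, events_per_pulse=2.0):
    """Smallest candidate pulsewidth whose average event count per pulse
    meets the (assumed) sampling condition; None if no candidate does."""
    avg = mean_interval(event_times_ms)
    for pw in sorted(candidates_ms):
        if pw / avg >= events_per_pulse:
            return pw
    return None

times = [0, 4, 10, 15, 22, 25, 30]   # hypothetical non-uniform event times (ms)
pw = smallest_usable_pulsewidth(times, [5, 10, 20, 50])
```

With a 5 ms average reporting interval, the 5 ms candidate is rejected and the search settles on 10 ms, mirroring the "increase until the condition is met" rule.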

The dot product of the matched filter with the signal window is computed. The result is normalized over the window’s event count and maintained for analysis.

Table 9 shows the notations defined for synchronization. We derive $T_{synch}$ (Equation 22), the temporal reference start point of the sequence's communication frame.

\[ T_{synch} = \max |R_j| \;\; \forall R \tag{22} \]

where

\[ R_j = CW_j \cdot Sv_j \tag{23} \]

Figure 19 illustrates this synchronization method applied to a device's X axis accelerometer. The window's amplitude was reduced in size to support visualization.

Table 9: Correlation and Synchronization Notations

Notation      Definition
T             The set of all sensor recorded times
j             The relative position of each time report in T
CW_j          The correlation window (matched filter) starting at position j
Sv_j ⊂ S      The set of sensor values S within a correlation window starting at j
T_synch       The time yielding the best synchronization estimate
R             The set of all correlations
R_j           The jth correlation result, where R_j = CW_j · Sv_j

Figure 18: Sensor Event Time Histogram (probability vs. reporting interval in msec)

Figure 19: Synchronization Window (relative amplitude of the sensor data and the correlation window vs. window sensor event time)
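The synchronization search can be sketched as below. The sketch builds the matched filter taps directly from the non-uniform event times, normalizes each dot product by the window's event count, and returns the event time with the largest correlation magnitude. It is a minimal illustration under stated assumptions: a length-7 Barker code stands in for the real synchronization sequence, the signal is noise free, and all names and values are hypothetical.

```python
def correlate_at(times, values, j, code, pulsewidth):
    """R_j: dot product of the matched filter starting at event j with the
    sensor values in that window, normalized by the window's event count."""
    start, total, count = times[j], 0.0, 0
    for t, v in zip(times[j:], values[j:]):
        chip = int((t - start) // pulsewidth)   # chip this event falls in
        if chip >= len(code):
            break
        total += code[chip] * v
        count += 1
    return total / count

def synchronize(times, values, code, pulsewidth):
    """Return (T_synch, |R_j|) for the window with the largest |R_j|."""
    span = len(code) * pulsewidth
    best_time, best_mag = None, -1.0
    for j, start in enumerate(times):
        if times[-1] - start < span:            # window no longer fits the data
            break
        mag = abs(correlate_at(times, values, j, code, pulsewidth))
        if mag > best_mag:
            best_time, best_mag = start, mag
    return best_time, best_mag

BARKER_7 = [1, 1, 1, -1, -1, 1, -1]   # short stand-in for the sync sequence
PW = 10                               # pulsewidth in ms (hypothetical)
times = [10 * k + d for k in range(20) for d in (0, 4)]   # gaps alternate 4 and 6 ms
values = [BARKER_7[(t - 50) // PW] if 50 <= t < 120 else 0 for t in times]
t_synch, peak = synchronize(times, values, BARKER_7, PW)
```

In this synthetic trace the coded frame begins at 50 ms, and the search recovers that start time even though the events are unevenly spaced.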

Pulse Width Determination: After obtaining the operating frequency(ies), the source synthesizes a new set of test patterns using those frequency(ies) while varying the pulsewidth. The source measures the bit error rate (BER) associated with each pulsewidth, calculates the throughput using Equation (27), selects the desired throughput level and uses the corresponding pulsewidth for Phase II. Operationally, pulsewidth selection is a function of the reporting rate, which is device limited per the provider's configuration and / or sensor characteristics. If narrow pulsewidths (≈10 msec or less) are desired, repetition codes, interpolation or other compensation techniques may be needed to account for lower sensor event reporting rates. The sequence of events is illustrated in Figure 20.

Figure 20: Pulse Width Identification Sequence (participants: Source App, Sensor Suite; steps: Select Device Type; Retrieve Frequency Identification Parameters; Generate Encoded Data Pattern For Each Pulse Width; Generate Test Tones; Register With Sensor Suite; Play Test Tones; Monitor Sensor Data; Perform Data Processing/Correlate With Encoding Pattern; Identify Lowest Pulse Width With Acceptable Chip/Bit Error Rate)
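As a worked illustration of the selection step, the sketch below evaluates the uncoded throughput expression of Section 3.5.2 for the Galaxy S6 bit error rates reported in Table 13 and picks the pulsewidth with the highest value. The 64 bit frame with 16 framing bits and the choice of "maximum throughput" as the selection rule are assumptions made for the example; the thesis leaves the desired throughput level to the attacker.

```python
def throughput(ber, pulsewidth_s, frame_bits, framing_bits):
    """Uncoded throughput in bits/s: raw bit rate x payload fraction x
    probability that a whole frame arrives without a bit error."""
    payload_fraction = (frame_bits - framing_bits) / frame_bits
    return (1.0 / pulsewidth_s) * payload_fraction * (1.0 - ber) ** frame_bits

def pick_pulsewidth(ber_by_pw_ms, frame_bits=64, framing_bits=16):
    """Return (best pulsewidth in ms, throughput per pulsewidth)."""
    rates = {pw: throughput(ber, pw / 1000.0, frame_bits, framing_bits)
             for pw, ber in ber_by_pw_ms.items()}
    return max(rates, key=rates.get), rates

# Galaxy S6 bit error rates from Table 13 (pulse width in msec -> BER).
s6 = {250: 0.0001, 200: 0.0001, 100: 0.0001, 50: 0.0616}
best_pw, rates = pick_pulsewidth(s6)
```

Under these assumptions the 100 msec setting wins at roughly 7.5 bits/s: halving the pulsewidth to 50 msec doubles the raw rate, but the higher BER makes an error-free 64 bit frame unlikely.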

Sweep Granularity: Initially, we used 50 Hz sweep increments to restrict transmission time and reduce power consumption. The risk of coarse increments is that a narrow channel is missed; consequently, most of our reported results can be considered a lower bound for channel identification and performance.

Figure 3.21: Frequency Identification, Resolution Effect (SM-G920T23 correlation value vs. frequency in kHz for the X, Y and Z axes): (a) Example, Freq ID Sweep at 50 Hz Increments; (b) Example, Freq ID Sweep at 10 Hz Increments

Apart from one comparison run, all of our FID testing was performed using 50 Hz resolution. For the comparison, we tested a Samsung Galaxy S6 with 10 Hz resolution; the results are shown in Figure 3.21b. There is some activity on X at 20,610 Hz which is not seen in the 50 Hz version (see Figure 3.21a). Sweep resolution is therefore a trade-off between detection, power consumption and effectiveness. In this case, 50 Hz resolution, at nearly 1400 pulses for sweep completion, was sufficient to identify the channel. At 10 Hz, ≈6700 pulses would encompass the sweep chip count.
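The pulse counts can be checked with a short calculation. The sketch assumes that the sweep covers 22,050 Hz down to 16,000 Hz with both endpoints included, that each test frequency carries one length-11 Barker sequence (one pulse per chip), and that the 800 Hz marker is not counted. These are assumptions for illustration, not details confirmed by the text.

```python
def sweep_frequencies(start_hz=22_050, stop_hz=16_000, step_hz=50):
    """Descending list of test frequencies, both endpoints included."""
    count = (start_hz - stop_hz) // step_hz + 1
    return [start_hz - i * step_hz for i in range(count)]

def sweep_pulses(step_hz, chips_per_frequency=11):
    """Total pulses in one sweep: one coded sub-pattern per test frequency."""
    return len(sweep_frequencies(step_hz=step_hz)) * chips_per_frequency

pulses_50 = sweep_pulses(50)   # 122 frequencies x 11 chips = 1342
pulses_10 = sweep_pulses(10)   # 606 frequencies x 11 chips = 6666
```

Under these assumptions the counts come to 1342 and 6666 pulses, in line with the approximate figures quoted for the 50 Hz and 10 Hz sweeps.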

Spreading / Coding: Spreading allows the channel to improve the error rate either by applying gain, even with a simplistic technique such as a repetition code, or by transiting over multiple operating frequencies. Coding reduces the error rate, which in turn improves channel performance to levels approaching theoretical capacity. We use spreading in our synchronization patterns but not in our payloads and other framing elements. We found that for some of the devices, coding was not necessary since our throughput was within the same order of magnitude as the theoretical capacity. The remainder could benefit from coding to reduce error rates.
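The gain that spreading contributes can be illustrated numerically. The sketch below computes the spreading gain as 10 × log10(code length) and checks the autocorrelation property that makes a Barker code a good synchronization pattern. It assumes the standard length-11 Barker code in ±1 form; the function names are hypothetical.

```python
import math

def spreading_gain_db(code_length):
    """Spreading gain in dB for a code of the given length."""
    return 10.0 * math.log10(code_length)

def aperiodic_autocorrelation(code):
    """Autocorrelation of a +/-1 code at lags 0 .. len(code) - 1."""
    n = len(code)
    return [sum(code[i] * code[i + lag] for i in range(n - lag)) for lag in range(n)]

BARKER_11 = [1, 1, 1, -1, -1, -1, 1, -1, -1, 1, -1]   # standard Barker-11
acf = aperiodic_autocorrelation(BARKER_11)
peak, worst_sidelobe = acf[0], max(abs(v) for v in acf[1:])

gain_barker = spreading_gain_db(11)            # about 10.4 dB
gain_long_mseq = spreading_gain_db(2**31 - 1)  # about 93.3 dB
```

The correlation peak of 11 against sidelobes of magnitude at most 1 is what lets the matched filter lock onto the pattern in noise, at the cost of sending 11 chips per coded symbol.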

Audibility Testing: Testing was performed on each device to identify practical audio output levels. We configured the device volume settings and synthesized tone amplitudes such that human evaluators were unable to hear the audio output within 0.5 meters in any direction. The hearing level evaluation summary is provided in Table 10. The 'Volume' column is the value relative to maximum device speaker volume. The values under 'Window Type' represent the relative amplitude of the carrier's sine wave. We used Hanning and Tukey windows to determine the energy vs. audibility tradeoff. Tukey windows, which provide additional energy due to their relative flatness compared to a Hanning window, induced more audible distortion at equivalent volumes than the Hanning window. We did not observe any situation where Hanning usage failed and Tukey use was successful; the contrary was true in the case of the HTC One M9 and the LG Nexus 7, where no Tukey volume setting met the criteria.

                                 Window Type
Device Model         Volume      Hanning     Tukey
HTC One M9           0.8         0.5         Audible
Huawei P8lite        0.9         0.9         0.1
LG Nexus 5           0.99        1.0         1.0
LG                   0.99        0.7         0.5
LG Nexus 7           0.6         0.1         Audible
LG Optimus L90       0.99        1.0         1.0
Samsung Galaxy S3    0.99        0.7         0.7
Samsung Galaxy S5    0.99        0.9         0.8
Samsung Galaxy S6    0.99        0.9         0.7
                     0.99        0.9         0.7

Table 10: Device Audibility Test

Interestingly, the Samsung family of phones generated little distortion at higher power levels, while others necessitated lower volumes. We attribute this to the unique characteristics of a manufacturer's complex audio chain (speaker, gain stages, filtering and frequency response).
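The energy side of the Hanning vs. Tukey tradeoff can be quantified with a short sketch that compares the sum of squared window samples. The Tukey taper fraction used in testing is not stated, so `alpha=0.5` is an assumption, as are the window length and the function names.

```python
import math

def hann(n_points):
    """Symmetric Hann (Hanning) window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def tukey(n_points, alpha=0.5):
    """Tukey window: flat top with cosine tapers over a fraction alpha."""
    if alpha <= 0:
        return [1.0] * n_points
    last = n_points - 1
    edge = alpha * last / 2.0            # taper length at each end, in samples
    window = []
    for n in range(n_points):
        if n < edge:
            window.append(0.5 * (1 + math.cos(math.pi * (n / edge - 1))))
        elif n > last - edge:
            window.append(0.5 * (1 + math.cos(math.pi * ((n - last) / edge + 1))))
        else:
            window.append(1.0)
    return window

def energy(window):
    return sum(w * w for w in window)

N = 1001
ratio = energy(tukey(N)) / energy(hann(N))
ratio_db = 10 * math.log10(ratio)
```

For this taper fraction the Tukey window carries roughly 1.8 times the energy of the Hanning window (about 2.6 dB), which is the additional energy that comes with the flatter, and in our tests more audible, pulse shape.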

3.4.4 Phase II: Data Transfer

Once the optimal channel parameters have been identified, the source packages the compromised data into transmission frames. Each frame consists of a preamble, the payload, a CRC field and an end of frame marker. The initial preamble consists of a synchronization sequence, parametric information and a shared secret magic number used by the sink to determine pulse width. We utilize an MSEQ31

\[ [1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0] \tag{24} \]

pattern because the higher error rate environments require additional gain. Payload coding (spreading) may be added if needed; an additional shared secret in the initial preamble designating the spreading gain configuration could suffice to signal it. Like the source in Phase I, the sink processes the received data using the matched filter and synchronizes to the transmission frame. It subsequently applies discrimination techniques to retrieve the parametric information and compromised data prior to repackaging for off-board transmission. The entire sequence is illustrated in Figure 3.22.

Figure 3.22: Data Transfer Sequence (participants: Source App, Sensor Suite, Sink App, Channel Terminus; steps: Generate Synch, Payload, CRC, EOF; Generate Test Tones; Play Test Tones; Monitor Sensor Data; Synch w MSEQ31 Sequence; Identify Pulse Width; Decode Payload; Encrypt; Generate New Message; Xmit)
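The MSEQ31 pattern of Equation (24) can be regenerated and checked with a short sketch. The listed bits satisfy the five-stage shift register recurrence s[n+5] = s[n+3] XOR s[n] with an all-ones seed; this recurrence was inferred from the listed sequence and is not stated in the text. The function names are hypothetical.

```python
MSEQ31 = [1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0,
          0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0]

def lfsr_sequence(length=31, seed=(1, 1, 1, 1, 1)):
    """Five-stage LFSR with recurrence s[n+5] = s[n+3] XOR s[n]."""
    state, out = list(seed), []
    for _ in range(length):
        out.append(state[0])
        state = state[1:] + [state[3] ^ state[0]]
    return out

def periodic_autocorrelation(bits):
    """Circular autocorrelation after mapping 1 -> +1 and 0 -> -1."""
    chips = [1 if b else -1 for b in bits]
    n = len(chips)
    return [sum(chips[i] * chips[(i + lag) % n] for i in range(n))
            for lag in range(n)]

generated = lfsr_sequence()
acf = periodic_autocorrelation(MSEQ31)
```

The circular autocorrelation is 31 at zero lag and -1 at every other lag, the two-valued profile characteristic of maximal length sequences. This sharp peak is what the sink exploits when it synchronizes to the frame.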

3.4.5 Key Stealth Factors

Zero Permissions and Detectability: Engaging only zero permission resources, i.e., the speaker, sensors and media players, is key. Detection risk is reduced by avoiding microphones, the vibrator, shared files and message passing Android APIs.

Audibility from Spectral Components: User hearing is another detection concern. We selected the operating frequency band in part based on speech studies by Beiter and Talley [5], who studied the auditory response of college age women. They observed hearing threshold changes of −160 dB/octave between 16 kHz and 20 kHz. More recently, Jungmee Lee et al. [54] demonstrated that a significant sound pressure level power increase (≥ 8 dB) is needed to normalize (flatten) hearing above 15 kHz.

Consequently, we selected 16 kHz as our lower frequency limit. Looking ahead, most vulnerabilities were uncovered at frequencies greater than 20 kHz, while all were above 17.7 kHz. Operating below 20 kHz depends on the attacker's risk tolerance. The upper limit was set below the Nyquist frequency, assuming the media player used a 48 kHz sampling rate. Although the channel is operable to 24 kHz, distortion is perceived at frequencies near the Nyquist frequency.

Audio Volume and Distortion: Certain devices produced audible artifacts during FID testing. We suspect that these intermittent clicks result from clipping, inter-modulation distortion or mechanical noise from speaker movement. They were noticeable in most devices under test (DUT) when sending tones at the maximum (unity) sine wave amplitude (SWA) and at full player and full speaker volume settings. In virtually all cases, reducing the SWA to 0.99 and the speaker volume to getStreamMaxVolume − 1 eliminated the artifacts.

3.4.6 Performance Boosting Design

We developed two techniques to enhance performance: Multichannel operation and Amplitude Shift Keying (ASK). For multichannel operation, performance improvement is a function of the total number of the device's contributing sensor axes. In this case, we demonstrated multi-axial, single sensor responses. For example, the Galaxy S6 and S5 X and Y accelerometer axes respond to different and non-harmonically overlapping ultrasonic frequencies. By generating a waveform representing a weighted summation (to avoid clipping) of independent axis-specific waves, we can evoke axial responses concurrently. Channel setup requires executing a series of tests with different weights, measuring axial BERs and identifying those factors yielding the highest aggregate capacity.

ASK feasibility resulted from accelerometer sensitivity to speaker power levels. With ASK, the bit rate (BR) is a function of the symbol rate and the log2 of the symbol set size, see Equation (25). Channel setup requires a series of tests in which the range (min to max) observed for each symbol does not overlap the range of any other symbol. The degree of amplitude separation drives the symbol count. The receiver can be informed of the symbol count and symbol amplitude range in the Phase II preamble. Advanced approaches can combine these techniques to further increase channel capacity.

\[ BR = \log_2(\#\text{symbol levels}) \times \text{Symbol Rate} \tag{25} \]
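A minimal sketch of the ASK setup check follows: it verifies that the accelerometer ranges observed for the candidate symbols do not overlap and, if so, computes the bit rate as log2 of the symbol count times the symbol rate. The four amplitude ranges and the 10 symbols/s rate are hypothetical values chosen for illustration.

```python
import math

def ask_bit_rate(symbol_levels, symbol_rate):
    """Bit rate = log2(number of symbol levels) x symbol rate."""
    return math.log2(symbol_levels) * symbol_rate

def levels_separable(observed_ranges):
    """True if no symbol's (min, max) response range overlaps another's."""
    ordered = sorted(observed_ranges)
    return all(prev[1] < nxt[0] for prev, nxt in zip(ordered, ordered[1:]))

# Hypothetical (min, max) accelerometer responses for four amplitude levels.
ranges = [(0.00, 0.02), (0.05, 0.08), (0.11, 0.15), (0.19, 0.24)]
rate = ask_bit_rate(len(ranges), symbol_rate=10) if levels_separable(ranges) else None
```

Four separable levels at 10 symbols per second give 20 bits/s, twice the rate of on/off signalling at the same pulsewidth. Noisier devices support fewer separable levels and gain correspondingly less.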

3.5 Testing and Evaluation Approach

3.5.1 Test Environments

Three primary test environments were used in this study: a development laboratory, the AWS Device Farm™ and the Xamarin TestCloud™. The laboratory test pool consisted of devices available to laboratory personnel. Within the lab environment, location and orientation (screen up) were fixed. Placement was near laboratory personnel performing normal activities. Within several hundred feet outside the site was an uncooperative environment that included construction projects, nearby rail lines and vehicular traffic from the urban surroundings. Less frequent perturbations emanated from the Android device directly, i.e., ringing and event alerts. We believe that the environment was well suited for examining covert channel potential due to its similarity to urban residential environments. Since the TestCloud and Device Farm environments were uncontrollable (i.e., orientation, ambient acoustic and vibrational noise, co-resident testing), the devices were tested as is. Device Farm and TestCloud devices were oriented similarly (screen up), verified by Z axis readings. At testing inception, TestCloud had over 1200 unique Android device types while Device Farm had more than 150. We also ran FID tests in a busy café to further evaluate channel robustness in highly frequented locations.

Figure 3.23: Accelerometer Distribution (histograms of accelerometer magnitude): (a) AWS (σ = .065); (b) Xamarin (σ = .019); (c) Café (σ = .017); (d) Lab (σ = .020)

Environmental Assessment: Microphone and accelerometer readings were recorded concurrently for a Samsung Galaxy S6 in each test environment. We assumed that calibration errors and component drift tracked similarly among identical device types. Since neither calibration data nor access to the physical devices themselves was available in the cloud cases, each measurement was relative to its last calibrated reference. Figure 3.23 and Figure 3.24 show the distribution of the accelerometer magnitudes ($\sqrt{x^2 + y^2 + z^2}$) and their corresponding ambient environments' frequency spectra respectively. Since the distributions are Gaussian in form, their standard deviation σ can be used to estimate the accelerometer noise. The recorded audio noise gain, P, can be used to estimate environmental noise levels. The AWS hosted devices exhibit high accelerometer noise (0.035 ≤ σ ≤ 0.15) and environmental noise gain (30 dB ≤ P ≤ 80 dB) relative to the Laboratory equivalent (i.e., three to eight orders of magnitude higher), suggesting that the AWS™ environment is very noisy.

Figure 3.24: Environment Noise Spectra (power gain in dB vs. frequency in kHz): (a) AWS; (b) Xamarin; (c) Café
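The two noise estimates can be sketched as follows: the accelerometer noise is taken as the standard deviation of the magnitude series, and a gain in dB is converted to a power ratio to express it in orders of magnitude. The sample readings and function names are hypothetical. The use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import math
from statistics import pstdev

def magnitude(sample):
    """Euclidean magnitude of one (x, y, z) accelerometer reading."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)

def accelerometer_noise(samples):
    """Standard deviation of the magnitude series (an estimate of sigma)."""
    return pstdev([magnitude(s) for s in samples])

def db_to_power_ratio(gain_db):
    """Convert a power gain in dB to a linear power ratio."""
    return 10 ** (gain_db / 10)

readings = [(0.0, 0.0, 9.8), (0.0, 0.0, 9.9), (0.0, 0.0, 9.7)]  # hypothetical
sigma = accelerometer_noise(readings)
low, high = db_to_power_ratio(30), db_to_power_ratio(80)   # 1e3 and 1e8
```

The conversion shows why a 30 dB to 80 dB gain corresponds to three to eight orders of magnitude in power.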

As a group, the Xamarin TestCloud™, café and laboratory had similar accelerometer and environmental noise levels. Note that the sensors report ultrasonic, audible and infrasonic signal components which all compete for bandwidth.

Collection: Phase I testing consisted of executing FID tests at least five times for each device under test (DUT). Follow-up bit error rate testing confirmed channel existence, thereby avoiding false positive results. Phase II testing consisted of a series of test and evaluate cycles until the aggregate payload size was at least ten times the inverse of the aggregate BER.

Automation: Calabash-android scripts, based on a user interface (UI) automation library for Android apps, controlled the device data collection process using the following test sequence: 1) Start the audio pattern, 2) Wait for audio pattern completion, 3) Upload the sensor files to our server, 4) Wait for upload completion and 5) Loop as configured.

Noise Mitigation: Each DUT underwent repetitive frequency identification tests to mitigate spurious environmental effects. Although results were averaged over the number of tests, smoothing effectiveness depends on perturbation magnitude and duration. Most environmentally induced noise is observable in the Z axis accelerometer when the device is at rest (assuming it rests with the screen up), but occasionally the perturbation couples over to X and Y, leading to inter-run variation. Sustained background noise would require additional processing such as noise cancellation, greater spreading, and larger pulse widths and window lengths. Mitigation is easier than in traditional communication systems due to the closed loop nature of the channel: the transmitter has access to the same sensor data as the receiver, allowing for dynamic channel parameter adjustment and / or frame retransmission.

Selection: We implemented our source and sink app pairs on 18 models of smartphones (a total of 28 physical devices). These models were selected based on their popularity, availability in the test environments, Android versions, and sensor sampling rate. Although we sought to target the most popular phones during the selection process, we saw no clear consensus market ranking. However, Suvarna and Top101news [87, 88] suggest that the Samsung S6 is one of the most popular smartphones, with a 5.47% global market share as of the first half of 2016. The same source writes that the , Galaxy S6 and Galaxy Note 4 comprise ≈20% of all Android devices in the US market.

Finding multiple instances of devices across environments was challenging. Balancing popularity, availability and repeatability needs, we reduced our sample size to the 28 devices. The specific list is available in Table 11.

3.5.2 Evaluation Approach

We applied classic communications theory to evaluate channel capacity and throughput. Shannon capacity provides a theoretical measure of effectiveness (MOE) using realized bit error rates. Throughput provides an MOE that includes additional implementation factors such as payload length and frame length. There are numerous methods to improve performance; two of them (multichannel and Amplitude Shift Keying) were implemented and are discussed in Section 3.6.4.

Table 11: Tested Devices by Environment
Columns: Device Model, Laboratory, TestCloud, Device Farm

HTC 10 *
HTC One M9 * *
Huawei 6 *
Huawei *
Huawei P8lite *
LG G5 *
LG Nexus 5 * *
LG Nexus 5x *
LG Nexus 7 *
LG Optimus L90 *
G3 * *
Motorola Nexus 6 *
OnePlus One * *
Samsung Galaxy Note 4 * * *
Samsung Galaxy S3 *
Samsung Galaxy S5 * * *
Samsung Galaxy S6 * * *
Z3 *

Note: * indicates evaluated within the specific environment

Theoretical Capacity: We assume a binary symmetric channel (BSC) [19] with a Bit Error Rate, BER. The Shannon channel capacity C provides a theoretically achievable upper bound on the channel throughput and is computed as shown in Equation (26), factoring in the pulsewidth PW. This theoretical upper bound is practically approachable with proper codes of large block length (e.g., turbo-codes, or LDPC codes) [58].

\[ C = \frac{1}{PW} \times \left( 1 + BER \times \log_2(BER) + (1 - BER) \times \log_2(1 - BER) \right) \tag{26} \]
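The capacity expression can be evaluated directly. The sketch below is a minimal illustration with one bit per pulse; the function name and the handling of the BER = 0 and BER = 1 limits are assumptions added so that the logarithm is never evaluated at zero. The example BERs are the Galaxy S6 entries of Table 13.

```python
import math

def bsc_capacity(ber, pulsewidth_s):
    """Shannon capacity (bits/s) of a binary symmetric channel, one bit per pulse."""
    if ber <= 0.0 or ber >= 1.0:
        return 1.0 / pulsewidth_s   # limit of the expression as BER -> 0 or 1
    per_use = 1.0 + ber * math.log2(ber) + (1.0 - ber) * math.log2(1.0 - ber)
    return per_use / pulsewidth_s

c_s6_50ms = bsc_capacity(0.0616, 0.050)    # Galaxy S6 at 50 msec
c_s6_100ms = bsc_capacity(0.0001, 0.100)   # Galaxy S6 at 100 msec
```

For the Galaxy S6 this gives about 13.3 bits/s at 50 msec and just under 10 bits/s at 100 msec. A BER of 0.5 yields zero capacity, as expected for a channel whose output is independent of its input.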

Throughput: Throughput, TH, offers an implementation and performance specific assessment of transmission rates. For an uncoded channel:

\[ TH = \frac{1}{PW} \times \frac{Length - FramingBits}{Length} \times (1 - BER)^{Length} \tag{27} \]

3.6 Results

This section presents the Channel Identification and Bit Error Rate testing results and concludes with a discussion covering intra-family similarity (identicality).

3.6.1 Channel Identification Results

Channel ID Results Summary

Four of the 18 unique device models evaluated (22%) tested positive for vulnerability to our attack. We observe that the vulnerability is not universally present within a product family: attacks succeeded on the Galaxy S6 and Galaxy S5 but not on the Galaxy S3, all Samsung products.

Devices with Positive Results

Figure 3.25 illustrates the sensor measurement correlation test results as a function of frequency. This highlights sensor sensitivity to the frequency sweep stimulus. Three Samsung devices, the S6, S5 and Note 4, exhibited positive responses to the attack as seen in Figures 3.25a to 3.25c respectively. Nexus 6 testing also revealed a potential channel, see Figure 3.25d. These plots provide graphical representations of Equation (20), showing each frequency's correlation score. The preferred operating point is the frequency associated with the largest excursion from 0.000. Channel presence was confirmed in all cases using bit error rate testing whose results are discussed in Section 3.6.2.

Devices with Indeterminate Results

Table 12: Devices with Indeterminate Channels

Device Model         API     Device Model         API
HTC One M9           22      LG Nexus 5x          23
HTC Ten              23      LG Nexus 7           17
Huawei HONOR 6       17      LG Optimus L90       19
Huawei Nexus 6       23      Motorola G3          22
Huawei P8lite        22      OnePlus One          19
LG G5                23      Samsung Galaxy S3    19
LG Nexus 5           22                           21

Figure 3.25: Devices with Vulnerability (correlation value vs. test frequency in kHz for the X, Y and Z axes): (a) Samsung Galaxy S6; (b) Samsung Galaxy S5; (c) Samsung Galaxy Note 4; (d) Motorola Nexus 6

We were unable to establish a channel using our frequency range in the fourteen devices listed in Table 12. In cases where the correlation values deviated from the nominal level yet were weak (less than ±0.004), we confirmed the false positive with BER testing. Occasionally, simultaneous peaks occurred in more than one sensor axis. This indicated a strong external force such as seen during significant impulse noise (shock) or a periodic external stimulus. Follow-up bit error rate testing is needed to identify false positive conditions, since the Note 4 and Nexus 6 have simultaneous peaks as a legitimate operating condition.

We show in Figure 3.26a an example of a failed test run for the HTCOneM. In this case, the sensor axes do not appear to have any sensitivity to this form of attack.

Figure 3.26a: A failed operating frequency test on Huawei P8lite (HUAWEIALE-L0422 group average correlation value vs. test frequency in kHz for the X, Y and Z axes)

Environmental Influence: All device types demonstrating susceptibility were proven in the laboratory and/or the Xamarin TestCloud™ and café. No channel could be established with these same devices in the AWS Device Farm™ due to noise levels. From Section 3.5, AWS™ had the highest sound pressure levels and accelerometer variance of all testing environments.

3.6.2 Error Testing Results

We executed BER testing, per Section 3.5.1, on each channel capable device at four pulse widths: 250, 200, 100, and 50 msec. If no error was found, we conservatively set the BER to $10^{-4}$.

Bit Error Rate Results Summary

The BER vs. pulse width results are provided in Figure 3.27a and summarized in Table 13. In general, the BER decreases as the pulse width increases. Excluding the Galaxy Note 4 and the Nexus 6, the devices (the S5 and S6) had BERs in the $10^{-3}$ or better range at 250 and 200 msec pulse widths. All, excluding the Galaxy S6, tail off below 200 msec with BERs worse than $10^{-2}$ as pulse widths approach 100 msec. At 50 msec, all exhibited degraded performance. Interestingly, the 250 msec BER for the Nexus 6 was worse (higher) than its 200 msec measurement, while the BER for the Note 4, S5 and S6 reached the $10^{-4}$ floor at 250 msec and degraded monotonically as the pulse width decreased.

Figure 3.27: Error and Capacity Summary (S5, S6, Note 4 and Nexus 6; pulsewidth in msec): (a) Bit Error Rate vs. Pulse Width; (b) Capacity (bits/sec) vs. Pulsewidth

Table 13: Error Rate Summary

Device Model             Pulse Width (Msec)    BER
Motorola Nexus 6         50                    0.1162
Motorola Nexus 6         100                   0.0552
Motorola Nexus 6         200                   0.0001
Motorola Nexus 6         250                   0.1780
Samsung Galaxy S5        50                    0.1898
Samsung Galaxy S5        100                   0.0240
Samsung Galaxy S5        200                   0.0001
Samsung Galaxy S5        250                   0.0001
Samsung Galaxy S6        50                    0.0616
Samsung Galaxy S6        100                   0.0001
Samsung Galaxy S6        200                   0.0001
Samsung Galaxy S6        250                   0.0001
Samsung Galaxy Note 4    50                    0.1965
Samsung Galaxy Note 4    100                   0.1444
Samsung Galaxy Note 4    200                   0.00832
Samsung Galaxy Note 4    250                   0.0001

Regarding the Galaxy Note 4, we anticipated that as an older device, it would have poorer performance than its test peers. This is apparent as we observe poor resolving capability below 250 msec.

Resident in the Xamarin cloud, the Nexus 6 exhibited unusual performance at 250 msec. This is worthy of additional study as it is the only device exhibiting counter-intuitive behavior.

3.6.3 Device Family Uniformity

Table 14: Device Pool Identicality

| Device Model | Series | Location | Test Case | X Freq (Hz) | Y Freq (Hz) | Z Freq (Hz) |
|---|---|---|---|---|---|---|
| Motorola Nexus 6 | - | Xamarin | 1 | 20500 | 17950 | 16600 |
| Motorola Nexus 6 | - | Xamarin | 2 | 17950 | 18400 | 17950 |
| Samsung Galaxy S5 | V | TF | 1 | 21050 | 17600 | 20100 |
| Samsung Galaxy S5 | V | TF | 2 | 21200 | 20650 | 17700 |
| Samsung Galaxy S5 | V | Xamarin | 3 | 20150 | 17950 | 20400 |
| Samsung Galaxy S6 | T | TF | 1 | 20500 | 19400 | 19400 |
| Samsung Galaxy S6 | T | TF | 2 | 20700 | 20550 | 21050 |
| Samsung Galaxy S6 | F | Xamarin | 3 | 20350 | 20950 | 20200 |
| Samsung Galaxy S6 | F | Xamarin | 4 | 16050 | 20650 | 16850 |
| Samsung Galaxy Note 4 | T | TF | 1 | 18250 | 18150 | 18150 |
| Samsung Galaxy Note 4 | F | Xamarin | 2 | 18150 | 18050 | 17950 |
| Samsung Galaxy Note 4 | F | Xamarin | 3 | 18000 | 18000 | 18000 |

Notes: TF = Test Facility, NA = Unremarkable Correlation, Boldface = Best Freq.

We compared the FID performance of three Galaxy S5s, four Galaxy S6s, three Galaxy Note 4s and two Nexus 6s to assess performance of similar device types. The tests were conducted in either the local test facilities (TF) or in the Xamarin TestCloud™. All device types, excluding the Nexus 6s, had at least one instance in both.

Each tested device exhibited positive FID responses (see Table 14), with the Nexus 6s exhibiting multi-frequency responses. In at least one case there was overlap in sensitive frequencies (18.4 kHz), albeit not at the peak sensitivities. Regarding the remaining devices, no two ‘like’ devices sampled demonstrated sensitivity to the same frequencies for a given axis. Within a specific device family, i.e., the Galaxy S6 (excluding the Nexus 6), the frequency differences are less than 1100 Hz. The sample size is too small to draw any conclusions regarding the extent of the differences. Differences were observed despite identical configurations, as in the ‘T’ series S6 and the ‘F’ series Note 4. This demonstrates the effect of element (component, Operating System, manufacturing process) variability between identical versions and variants. It also reinforces the viability of the self-identification method described earlier, allowing for attack independence.

Surprisingly, one of the Xamarin S6s and one of the Xamarin S5s had an axial Y component as the preferred operating frequency that didn’t occur in other phones. We checked the Z axis readings for orientation, and the gravity and magnitude levels were consistent. We suspect damage, or that a different or modified chipset was utilized.

3.6.4 Capacity, Throughput and Performance Boosting

This section includes a summary of baseline channel, multichannel and Amplitude Shift Keying performance.

3.6.5 Capacity

We computed the channel capacity from Equation (34) for each of the susceptible devices using the measured BERs. The results are illustrated in Figure 3.27b. The theoretically achievable capacity levels approach 14, 10, 6 and 6 bits/sec for the S6, Nexus 6, S5 and Note 4 respectively.
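As a rough cross-check of these levels, the sketch below assumes that Equation (34), which is not reproduced in this section, takes the standard binary symmetric channel form: one bit per pulse, with capacity equal to the pulse rate times $1 - H(\mathrm{BER})$. The function names are illustrative only. Under that assumption, the 50 msec BERs from Table 13 give values of roughly the same size as the levels quoted above.

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits; H(0) = H(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bsc_capacity_bps(pulse_width_ms, ber):
    """Capacity in bits/sec, ASSUMING one bit per pulse over a binary symmetric channel."""
    pulse_rate = 1000.0 / pulse_width_ms  # pulses per second
    return pulse_rate * (1.0 - binary_entropy(ber))

# BERs measured at the 50 msec pulse width (Table 13).
ber_at_50ms = {
    "Galaxy S6": 0.0616,
    "Nexus 6": 0.1162,
    "Galaxy S5": 0.1898,
    "Galaxy Note 4": 0.1965,
}

capacities = {name: bsc_capacity_bps(50, ber) for name, ber in ber_at_50ms.items()}
for name, capacity in capacities.items():
    print(f"{name}: {capacity:.1f} bits/sec")
```

The same helper can be evaluated at the other pulse widths to see where a lower BER no longer compensates for the slower pulse rate.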

3.6.6 Throughput

As with the capacity analysis, we derived the throughput from the BERs by applying Equation (35) with frame lengths ranging from 64 to 512 bits. We assumed lengths for the synchronization frame, CRC and end of frame marker (EOF) of 23, 16 and 8 bits respectively.

The best throughput, ≈ 4 bits/sec, occurs on the S6 (see Figure 3.28a) with a 64 bit frame length and a PW of 100 msec. The S5 (see Figure 3.28b) and Nexus 6 (see Figure 3.28d) similarly offer a throughput of ≈ 2 bps with a 200 msec pulse width, also with a frame length of 64 bits. The Note 4 throughput (see Figure 3.28c) peaks with a 250 msec PW and a frame length of 64 bits.

Given the discrepancy between channel capacity and throughput, it is clear that the channel is suboptimal as configured. Throughput degradation occurs at the narrowest pulse widths, which are typically coincident with the highest BERs. Utilizing coding techniques [58] at these particular rates would allow the throughput to approach theoretical capacity levels.

3.6.7 Performance Boosting Evaluation

Multichannel Performance

Since several devices exhibited multi-axial responses to non-overlapping and non-harmonically related frequencies, we sought to evaluate the feasibility of multichannel communications. To evaluate this hypothesis, we generated a multichannel test pattern for the Galaxy S6 by aggregating the individual X / Y axis peak response frequencies (20,500 Hz, 20,250 Hz) with amplitude weights of 1.0/0.0, 0.9/0.1, 0.8/0.2, . . . , 0.0/1.0. BER tests were subsequently executed on the aggregated pattern, yielding capacity gains nearly twice that of a single channel (see Figure 3.29a). Although suboptimal, this demonstrated the basic multichannel capability. Improved BERs may be realized by including peak to average power correction to minimize energy loss.

Figure 3.28: Throughput vs. Pulse Width for Uncoded Communication. (a) Galaxy S6; (b) Galaxy S5; (c) Note 4; (d) Nexus 6. Each panel plots throughput (bits/sec) against pulse width (250 to 50 msec) for frame lengths of 64, 128, 256 and 512 bits.

Figure 3.29: Performance Boosting. (a) Dual Channel: relative capacity vs. X/Y weight pairs (1.0/0.0 through 0.0/1.0) for the XY dual channel, X only and Y only; (b) ASK: sensor reported value (X axis) vs. sensor event report.
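The weighted two-tone aggregation used for the multichannel test can be sketched as follows. This is a minimal illustration, not the test harness itself: the 48 kHz sample rate, the 0.1 s duration and the function names are assumptions. The `papr_db` helper reports the peak to average power ratio, which is the quantity a peak to average power correction would target.

```python
import math

def dual_tone(fx_hz, fy_hz, wx, wy, sample_rate_hz=48000, duration_s=0.1):
    """Weighted sum of two sine tones; sample rate and duration are assumed values."""
    n = int(sample_rate_hz * duration_s)
    return [
        wx * math.sin(2 * math.pi * fx_hz * i / sample_rate_hz)
        + wy * math.sin(2 * math.pi * fy_hz * i / sample_rate_hz)
        for i in range(n)
    ]

def papr_db(samples):
    """Peak to average power ratio of a sample sequence, in dB."""
    peak = max(s * s for s in samples)
    mean = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(peak / mean)

# Weight pairs 1.0/0.0, 0.9/0.1, ..., 0.0/1.0 as described in the text.
weight_pairs = [(round(1.0 - k / 10, 1), round(k / 10, 1)) for k in range(11)]

# X and Y axis peak response frequencies quoted for the Galaxy S6.
patterns = {w: dual_tone(20500, 20250, *w) for w in weight_pairs}

for w in [(1.0, 0.0), (0.5, 0.5)]:
    print(w, f"PAPR = {papr_db(patterns[w]):.2f} dB")
```

Because the weights in each pair sum to one, the mixed signal never exceeds the full-scale amplitude of a single tone, but its average power drops, which is one way to see why the equal-weight mix has a higher PAPR than either tone alone.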

We analyzed potential cross coupling effects by calculating the Spearman [86] rank order correlation and Pearson product-moment correlation [99] coefficients for each weighted pair to identify any bleed-over, i.e., X affecting Y, Y affecting X etc. From Table 15, we see very low axial coupling. Ignoring directionality, the worst-case correlation magnitude was 0.119, suggesting that interference is inconsequential for this particular device-frequency pair.

Table 15: Multichannel Axial Correlation

| Weight (X/Y) | Correlation Type | Axial Pair | Coefficient | Two Tailed p Value |
|---|---|---|---|---|
| 0.0/1.0 | Pearson | XY | -0.119 | < $10^{-5}$ |
| 0.0/1.0 | Pearson | XZ | 0.018 | 0.0002 |
| 0.0/1.0 | Pearson | YZ | -0.001 | 0.7663 |
| 0.0/1.0 | Spearman | XY | -0.091 | < $10^{-5}$ |
| 0.0/1.0 | Spearman | XZ | 0.023 | < $10^{-5}$ |
| 0.0/1.0 | Spearman | YZ | -0.0014 | 0.767 |
| 1.0/0.0 | Pearson | XY | -0.079 | < $10^{-5}$ |
| 1.0/0.0 | Pearson | XZ | 0.001 | 0.827 |
| 1.0/0.0 | Pearson | YZ | 0.0074 | 0.106 |
| 1.0/0.0 | Spearman | XY | -0.0531 | < $10^{-5}$ |
| 1.0/0.0 | Spearman | XZ | 0.0013 | 0.7801 |
| 1.0/0.0 | Spearman | YZ | 0.0067 | 0.1421 |
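For readers who want to reproduce this kind of axial coupling check on their own sensor logs, the two coefficients can be computed as sketched below. This is a dependency-free illustration with hypothetical sample data; it does not compute the two tailed p values reported in Table 15, and it is not the tooling used for the thesis measurements.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank order correlation: Pearson applied to the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical axis readings, for illustration only.
x_axis = [0.1, 0.4, 0.2, 0.9, 0.5]
y_axis = [v ** 3 for v in x_axis]  # monotonic but non-linear in x
print(pearson(x_axis, y_axis), spearman(x_axis, y_axis))
```

Reporting both coefficients is useful because Spearman captures any monotonic dependence between axes, while Pearson only captures the linear part.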

Amplitude Shift Keying

We selected eight distinct, non-overlapping amplitude symbol levels to demonstrate ASK. Figure 3.29b shows the symbol amplitudes in 16 bit patterns separated by quiet periods for illustration purposes. With eight levels, each symbol carries three bits, so the corresponding bit rate (BR) boost from Equation (25) is three times the single amplitude rate.
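The factor of three follows from $\log_2 8 = 3$ bits per symbol. A minimal sketch of such a mapping is shown below; the evenly spaced amplitude levels and the function names are assumptions made for illustration, not the levels used in the experiment.

```python
import math

LEVELS = 8
BITS_PER_SYMBOL = int(math.log2(LEVELS))  # 3 bits per symbol for 8 levels

def bits_to_levels(bits):
    """Group bits (most significant first) into symbols with level index 0..7."""
    if len(bits) % BITS_PER_SYMBOL:
        raise ValueError("bit count must be a multiple of BITS_PER_SYMBOL")
    levels = []
    for i in range(0, len(bits), BITS_PER_SYMBOL):
        value = 0
        for b in bits[i:i + BITS_PER_SYMBOL]:
            value = (value << 1) | b
        levels.append(value)
    return levels

def levels_to_amplitudes(levels):
    """ASSUMED mapping: evenly spaced amplitudes in (0, 1], so no symbol is silent."""
    return [(level + 1) / LEVELS for level in levels]

symbols = bits_to_levels([1, 0, 1, 0, 0, 0, 1, 1, 1])
print(symbols)                        # [5, 0, 7]
print(levels_to_amplitudes(symbols))  # [0.75, 0.125, 1.0]
```

Nine bits are sent in three pulses here, where a single amplitude scheme would need nine.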

3.7 Mitigation

Adverse effects, resource consumption and channel flexibility drive mitigation effectiveness. Spectral limiting techniques, i.e., filtering frequencies above 20 kHz, diminish sound quality due to the hypersonic effect [75], discouraging those seeking higher fidelity sound. Reducing sensor reporting rates slows the transmission rate by forcing an increase in channel pulse width. Sensor fusion, where multiple sensor readings combine to infer a valid condition, may offer a solution. However, the channel’s frequency summation ability supports operating with concurrent sensor stimulation, thereby potentially thwarting this defense. Some a priori characterization of valid conditions is needed to generate the appropriate approach.

Other possibilities relate to sensing activities such as monitoring player content over time. Although resource intensive and complex, a defender could monitor the sound (spectral) content and perform envelope detection on data snippets sent to the device players. Preferably, the OS could monitor apps that use the speaker(s) and the sensors, reporting resource usage normalized over access time to ascertain anomalous behavior. Unfortunately, this might be untimely for real time detection of one-time exfiltration events.
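One hypothetical way a defender might monitor the spectral content of an audio snippet is to measure how much of its energy sits at a suspect near-ultrasonic frequency. The sketch below uses the Goertzel algorithm for this; that choice, the 48 kHz sample rate, the 960 sample snippet length and all names are assumptions for illustration, and no such detector was built or evaluated in this work.

```python
import math

def goertzel_power(samples, sample_rate_hz, freq_hz):
    """Squared DFT magnitude at the bin nearest freq_hz, via the Goertzel recurrence."""
    n = len(samples)
    k = round(n * freq_hz / sample_rate_hz)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def tone_energy_fraction(samples, sample_rate_hz, freq_hz):
    """Fraction of snippet energy at freq_hz (about 1.0 for a pure tone on a bin)."""
    total = sum(x * x for x in samples)
    if total == 0.0:
        return 0.0
    return 2.0 * goertzel_power(samples, sample_rate_hz, freq_hz) / (len(samples) * total)

def tone(freq_hz, sample_rate_hz=48000, n=960):
    """Test signal: a unit-amplitude sine of n samples."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]

ultrasonic = tone_energy_fraction(tone(20500), 48000, 20500)
audible = tone_energy_fraction(tone(1000), 48000, 20500)
print(f"20.5 kHz tone: {ultrasonic:.3f}, 1 kHz tone: {audible:.3f}")
```

A per-frequency probe of this kind is cheap, but a defender would still need to sweep the candidate band, since the sensitive frequency differs from device to device.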

Relying strictly on permissions for access control to prevent channel execution has historically been perilous. Monitoring all apps as resources are accessed and subsequently querying the user for direction depends on user sophistication and requires patience. For example, solutions such as Audroid [77] target audio covert channel mitigation. Defending against this particular attack would require approval each time there is a speaker-only request. These types of frequent query approaches may encourage the user to disable enforcement mode or, worse, root the device, which invites a broader set of attacks.

Even with the aforementioned defensive tactic and a sophisticated user, functional plausibility enhances obfuscation. Consider the source masquerading as a calendar app which utilizes an audible alarm for appointment alerts. In this case, it has a legitimate need to use the speaker. The sink, alternatively, needs permission for sensor access. However, if it masquerades as a game, a legitimate use, the defense collapses. If the source is blocked from sensor access, it could, cumbersomely and inefficiently, use device family historical data, transmitting across a large spectrum without feedback.

3.8 Related Work

Farshteindiker [27], based on Son’s [85] work, developed a channel that relies on an ‘implant’, an external device that stimulates the smartphone’s gyroscope with high frequency energy. Due to sensitivity to the sensor module location, this device contacts the victim smartphone at a predetermined position. Complexity is high, requiring a priori target information. Furthermore, arranging proximity is very challenging as the contact point between the two devices is specific and must be maintained throughout the attack.

Schlegel [82], Marforio [61], Yue [95] and Okhravi [72] addressed system setting manipulation. Examples included vibration setting, volume setting, screen state, socket creation / breakdown discovery, intent type, processor frequency and timing manipulation. Although channel data rates ranged from 3 to over 3000 bps, current Android security features (i.e., using SELinux) limit the ability to create such channels. Those attacks that are not mitigated by the OS may be rate-limited to the maximum rate of an equivalent human activity.

Microphone based inter-device channels also face the challenge of prearranged proximity. Examples include: Do [23], who used Audio Frequency Shift Keying operating within the 20 kHz to 22 kHz band to communicate between air gapped devices; Hanspac [42], who developed a mesh channel between two T400 series computers operating in the 18 kHz+ range; and O’Malley [74], who demonstrated inter-computer communications in the 20 kHz to 23 kHz band. Their limitations are similar to those of beacons, where acoustic shielding, incident angles and directionality affect channel viability and integrity.

Dey [22] and Das [20] demonstrated the existence of device signatures unique to the manufacturing process. Variation (see Section 3.6.3) in our channel frequencies supports their observations. Their results were based on bench testing, which is not suitable for covert channel applications. In our case, we rely on self-discovery to leverage this device uniqueness feature.

In an attempt to game usage monitoring applications for financial benefit, Trippel [89] attacked a Fitbit’s accelerometer in an application that gives incentives for use. In this case, high frequency signals were emitted while the device was at rest, yet the sensor interpreted the signals as if they were motion induced.

3.9 Conclusion

We demonstrated that a zero permissions, ultrasonic covert channel can be created and self-parametrized on certain Android devices using the speaker and the sensor suite as the channel’s endpoints. Individual axis channel capacity approached 14 bits per second, limited by the testing scope and device type. Furthermore, we demonstrated that we can increase the data rate by including multi-channel and Amplitude Shift Keying capabilities to yield rates up to and including 3 times the basic channel capability. The path forward includes broadening the test base to include next generation devices and evaluating suitable mitigation techniques. Developing the channel to operate while the victim is moving, as well as operating concurrently with audio / video source material, would enhance the channel’s stealth capability and provide an interesting extension to this work.

Chapter 4

4 Cloud Covert Channel

4.1 Background and Motivation

We have found that little interference based covert timing channel research is available in the literature. Coincidentally, we find explosive adoption of cloud based datacenters as the preferred means of hosting web-enabled applications. Our research focuses on the intersection of the two topics since the cloud data center presents an interesting system of systems structure to target given the richness of potential exfiltration information and, conversely, the sophistication of detection techniques used for malicious agent discovery. Many possibilities arise given a successful attack, ranging from basic exfiltration, to establishing ‘n’ information agents supporting aggregation, to covert distributed processing of complex problems.

Prior works have focused on establishing a timing channel using IP manipulation or detection, yet with little regard to stealth as needed in interference based channels (see Section 4.9). Although our architecture is based on the work of Bates [4], who developed an interference channel to create communications symbols, we incorporate a stealthier approach with victim content sizes three orders of magnitude smaller than prior work. We broaden channel stealth to include query timing and content diversity in terms of the malicious components and the victim.

In addition, there is an absence of channel performance analysis in the literature, whereas we provide a more extensive investigation into performance, including in the presence of large dynamic network swing effects. We additionally provide a theoretical analysis, also missing from prior work, applying communications theory to investigate channel capacity and throughput as well as to analyze the impact of time synchronization and spreading gain on a practical channel implementation in a real-world data center.

4.2 Threat Model

This section describes the threat model for the cloud channel.

4.2.1 Vulnerability

A vulnerability exists in co-resident / multi-tenant / cloud solutions where resources are shared across tenants, allowing for exposure to resource manipulation attacks. These typically include L2 (specifically lowest level cache) and the platform’s network transmit and receive buffers.

4.2.2 Threat

The threat, once data is acquired, is that innocent applications / platforms may be unwitting participants in a covert exfiltration process. The threat is derived from the host facility’s need to utilize platform resources effectively as its management attempts to increase the customer-to-platform ratio by deploying more applications on a single platform. This increases the opportunity for a malicious actor to land on a platform where a victim possessing the desirable attributes, such as home page size, sub-URL sizes and quantities, resides.

4.2.3 Attack

The basic attack model forms after the malicious actor determines that it has successfully deployed a server application that is co-resident with the intended victim. Two external time synchronized clients initiate concurrent accesses to the server application and the victim respectively. The channel server responds to queries with content sufficient to interfere with the victim’s normal transmission time or not, depending on the data symbol requiring transfer. The colluding client (sink) observes the presence of interference from the measured time it took to receive the complete response from the victim. Symbol discrimination is applied to the retrieval time.

4.2.4 Exploit

The shared network / transmit buffer is exploited. If the interference server’s content size matches the victim’s content size and the requests are made concurrently, then the content destined to be served up from the two sources is interleaved in the transmit buffer, causing the effective throughput for each request to be reduced. The time deltas may be enhanced by deploying unfair TCP protocols, manipulating the channel server’s TCP/IP stack or utilizing UDP and RTP as the transmission protocol.

4.2.5 Trust

The victim, as an operational application, presumes that the hosting service protects it from common attacks. Service Level Agreements (SLA) typically involve resource consumption / utilization, e.g., processor count, memory, disk space, network speed and availability, so the victim presumes to trust in the provider’s ability to execute. SLAs generally do not address covert / side channel attacks due to the difficulty in defending against such attacks. In terms of acquiring the potentially leaked data, the customer assumes that the host patches are current and anti-malware applications perform their functions as anticipated, leading them to believe that there is minimal risk of data theft. Although we do not delve into information acquisition, we assume it has taken place prior to exfiltration.

4.2.6 Data

The data might encompass passwords, financial accounts, billing information, customer lists, business plans and more implementation / configuration type information such as private keys, symmetric keys, library revision and patch numbers and other interesting information that can be stored in the cloud.

4.3 Approach and Modeling

We created an interference based covert timing channel within a major university’s data center, maintaining 24x7 operations for 24 contiguous days. Our objectives, in addition to stealth, included maximizing efficiency, observing the practical effects of time synchronization and developing a decoding scheme independent of recent timing.

4.3.1 Approach

Our setup consisted of four Virtual Machine nodes deployed as two client/server pairs. The first pair consisted of a publicly available innocuous commercial web application clone and a malicious web client acting as a decoding node and operating under a false flag of a legitimate client. The second pair included an interference source server and its client, both decoupled from the first pair, sharing only random seed and time of operation information. Both servers are co-resident on a datacenter host.

We introduce interference and malicious client queries at predefined intervals according to a pattern set by the exfiltration client. The resulting delivery time differences represent encoded exfiltrated data to the malicious web client. We utilize a coding-agnostic approach supporting multiple spreading gain selections.

Since covert effectiveness is conditional on the unpredictability of traffic and its non-stationarity, we introduced entropy in the form of randomly selected time slots where slot usage is a function of a pseudo-random secret seed shared by the client side applications. This seed determines in which time-slot requests are generated and decoded.
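The shared-seed idea can be illustrated with a short sketch in which both clients derive the same slot schedule independently, with no communication between them. The seed value, the number of intervals, the slots per interval and the use of Python’s `random.Random` are all assumptions for illustration; the generator used in the actual implementation is not specified here, and a real deployment would likely prefer a cryptographically strong generator.

```python
import random

def slot_schedule(seed, intervals, slots_per_interval):
    """Pick one pseudo-random slot index per interval from a shared secret seed."""
    rng = random.Random(seed)
    return [rng.randrange(slots_per_interval) for _ in range(intervals)]

# Hypothetical parameters: 5 intervals, 10 candidate slots in each.
source_slots = slot_schedule("shared-secret", intervals=5, slots_per_interval=10)
sink_slots = slot_schedule("shared-secret", intervals=5, slots_per_interval=10)

# Both sides arrive at the same schedule, so the sink knows when to listen.
print(source_slots == sink_slots)
```

An observer without the seed sees requests that land in an apparently arbitrary slot of each interval.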

4.3.2 Models

The system can be described via two primary models, communications and adversarial. The communications model includes all actors and the target environment whose topology must be conducive to an attack.

The adversarial model is important since it frames the attack method and aligns the key decisions involved in selecting the appropriate attack vector and the corresponding parametric information needed. These include channel capacity, bandwidth and efficiency / Bit Error Rate, good-put maximization and detectability.

Communications Model We target third party enterprise hosting solution providers with Internet connectivity. Target selection can be expanded to environments where the attacker already has a presence. In either case, the candidate environment must host an outwardly facing content server and meet our co-residency criteria.

We assume a facility where the interference server and the victim share the same infrastructure including host (see Figure 4.1). The information sink and exfiltration source may be externally deployed. The victim can service requests from anywhere including the information sink. Currently, the interference server only responds to requests from the exfiltration source. However, it may service other requests if its response is aligned with a non-interfering transmission window.

Adversarial Model Our approach is predicated on a sophisticated adversary’s attack whose configuration is patterned after Figure 4.2. Shared resources represent the attack surface. When these resources (e.g., host buffers and the network interface) experience contention, the resulting effects are exploited.

Figure 4.1: System View

Assuming the successful compromise of the victim, the exfiltration source initiates channel operations by directing burst requests to the interference server at prearranged random time intervals. These requests, reflecting the compromised data’s binary pattern, are synchronized with queries from the information sink. Both servers respond when they are able to do so. We avoid actively synchronizing the responses because predicting or controlling victim response times is impractical.

Although not the focus of this discussion, we address stealth from the perspective of plausible deniability. The interference server must appear to offer a legitimate service including support of traffic patterns that appear plausible.

Victim selection includes targeting a server that provides predictable content, expecting high access periodicity. The interference server must also provide predictable content and provide sufficient interference such that the information sink can discriminate between interference presence vs. absence. Channel detection becomes challenging since all content, content requests and operational behaviors will appear legitimate.

The implemented system required no root or sudo privileges beyond application installation and launch at the interference server and the two clients. Any root level access was decoupled from the exfiltration form, (i.e. data collection is separated from exfiltration). This eliminated the need for network stack or driver manipulation on which other exploits depend.

Figure 4.2: Covert Channel Client-Server Context

4.4 Challenges

The adversary faces numerous challenges. Beyond penetration and evading detection, she should make the channel robust and efficient to maximize the exfiltration rate. The channel must additionally ensure time synchronization, avoid defensive tactics and exhibit tolerance to unusual network / system behaviors and other sources of resource contention.

Time Synchronization: This channel relies on time synchronization to support decoding accuracy. The method used to maintain synch is critical due to issues with stability, time source defense mechanisms, synch request rate and Virtual Machine (VM) behavior. First, the stability of the time source may induce drift in client clocks since the service requests appear at differing times relative to absolute time. Second, the time source may be intolerant to frequent requests due to perceived Denial of Service attacks on the time server, e.g., NIST Internet Time Servers [71]. This leads to a third issue in that drift is introduced with low time synchronization request rates. Fourth, there may be a lag between the time update request and the slew adjustment.

Detection: The hosting enterprise and the victim are the two predominant actors interested in detecting the channel. The hosting enterprise’s interest lies in determining the nature of the information leakage. The victim’s interests lie in the aforementioned and in the elimination of requests that provide no economic benefit.

The victim has few, if any, courses of action since it has a limited view of the problem. Forensics efforts may be limited to server resource consumption and metrics collected relating to client behavior. Borders [12] identified a number of timing related behaviors that suggest the presence of a covert channel. In our channel, the information sink can make requests at random times for random web content from an outwardly facing server expecting highly repetitive queries. In doing so, the risk of discovery is very low. In effect, the victim may be able to detect that a breach occurred as opposed to the presence of the covert channel used to exfiltrate the stolen data.

The hosting enterprise has access to a wider range of tools (i.e., sniffing and intrusion detection tools). In addition, they can investigate inter-arrival times at any given node. Berk [7] introduced such a method, albeit with limited effectiveness. First, they must know that a channel exists. Second, they must know where to instrument the network, and the tools must have the ability to discriminate between multiple retrieval time distributions. With small time deltas and non-Gaussian distributions, this may present a discrimination challenge. Third, Berk assumes that the attacker chooses to maximize the exfiltration rate. One attack option is to trade exfiltration rate for entropy as a countermeasure. Fourth, the administrators, upon probing the burst requests and subsequent responses, should find nothing unusual since the content and periodicity were carefully selected to appear innocuous.

Prevention: Numerous techniques have been proposed to counter network covert timing channels, primarily in the form of delay modulation, fragmentation and noise introduction. Although not our focus, we discuss three of them below. In each of these cases the hosting facility faces economic disincentives as these measures negatively impact performance and / or Quality of Service.

In Wray [91], random delay insertion was shown to adversely affect the timing channel since it shifts the expected symbol operating ranges. Viability of this solution may be limited with co-residency since the delays are potentially symmetrical among all co-resident applications. Unless the delays are asymmetrical, the measurements shift evenly with no appreciable effect on the error rate. The effort to create dynamic asymmetric and acceptable QoS on each deployed application is not trivial.

Creating variable fragmentation that generates delay could also affect performance, albeit at the expense of normal traffic. In Kang [49], noise was introduced in the form of variable ACK delays. Each of these countermeasures would also need to be applied asymmetrically in order for the channel’s discrimination capabilities to break down; otherwise, QoS is impacted. Determining the asymmetric application of QoS will be challenging unless some suspicion exists as to which applications are the perpetrators.

Congestion and Path Volatility: Network congestion and path volatility contribute secondary effects unless the network is poorly designed and / or undersized. We assume that the effects of congestion are symmetrical on the portion of the network that is shared between the two servers. However, congestion observed on non-shared paths may contribute to channel errors.

Internal to the hosting facility, the use of load balancing is common, especially with high availability / load balanced systems. Here, jitter is typically queue depth driven. If sized correctly, the queue depth should be small enough such that request redirection provides minimal latency effects. In the rare case of redirection to another site or processing a SYN, SYN/ACK, ACK handshake, the channel might incur errors. Compensation could include ignoring the anomaly.

External to the facility, the channel is subject to Internet behaviors such as asymmetrical return paths and router table volatility. Errors may result in either of these cases.

Platform Resource Contention: Platform resource contention is typically a result of sharing or layering. In VM environments, we commonly see CPU, I/O and memory sharing effects. Examples of sharing exploits can be found in [93, 94, 90, 76].

Latency associated with layering primarily affects network and CPU performance. Since we are network oriented including the transmitting and receiving chain, we focus our discussion accordingly. Fortunately, network latency cannot be eliminated although applications such as VSphere’s VM-Latency-SRIOV [71] reduce the impact. As these effects become less discernible, the less attractive this type of channel becomes as an exfiltration solution.

4.5 Detailed Approach

This section describes the strategy used to perform channel operations. We discuss transmission rate as it relates to victim content size (e.g., web page size), discrimination, obfuscation, data collection, post processing and data analysis.

4.5.1 Impact of Web Page Size

After victim selection and confirmation of co-residency by techniques such as those identified in [4], we now define the exfiltration rate. In this channel, the rate is a function of the retrieval time $T_{r0}$ observed by the information sink in the absence of channel induced interference. Including a guard band of 100% in the design allows for $T_{r0}$ variation. The encoding / decoding slot size is the sum of $T_{r0}$ and the guard band. We assume that the retrieval time including interference is less than this sum. The exfiltration rate ceiling is now proportional to $1/(2 \times T_{r0})$.

Web page sizes vary substantially. Table 16 lists some popular websites and their corresponding home page sizes. These examples provide a range of sizes suitable for our use in selecting our web server’s index.html file and sub-pages.

Table 16: Sample index.html Sizes

| Site | Size |
|---|---|
| .com | 384.5 KB |
| cnn.com | 86 KB |
| ebay.com | 165 KB |
| espn.com | 234.8 KB |
| facebook | 26.9 KB |
| HDtracks.com | 1 MB |
| latimes.com | 299.3 KB |
| msnbc.com | 136 KB |
| twitter.com | 104 KB |
| yahoo.com | 324.7 KB |
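The slot sizing in this subsection reduces to simple arithmetic, sketched below. The sketch assumes one bit per slot, which turns the stated proportionality into an equality, and the 50 msec baseline retrieval time is a hypothetical value chosen for illustration rather than a measurement.

```python
def slot_size_s(tr0_s, guard_fraction=1.0):
    """Slot size: baseline retrieval time plus a guard band (100% by default)."""
    return tr0_s * (1.0 + guard_fraction)

def rate_ceiling_bps(tr0_s, guard_fraction=1.0):
    """Upper bound on the exfiltration rate, ASSUMING one bit per slot."""
    return 1.0 / slot_size_s(tr0_s, guard_fraction)

# Hypothetical baseline retrieval time of 50 msec.
tr0 = 0.05
print(slot_size_s(tr0))       # slot of about 0.1 s
print(rate_ceiling_bps(tr0))  # ceiling of about 10 bits/sec
```

Smaller victim pages shorten the baseline retrieval time and therefore raise the ceiling, which is why page size drives victim selection.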

4.5.2 Discrimination

As a timing channel, discrimination depends on distinguishing between flows with and without interference. Assuming the channel is time synchronized, we now must ensure sufficient traffic overlap of the victim content with interference source content for discrimination.

We define the retrieval time with 100% overlap as $T_{r1} = T_{r0} + O \times T_{r0}$, where $O \times T_{r0}$ is the magnitude of interference injection time as a function of the original non-interfering time. This is based on a first order approximation where we assume that for 100% concurrent flows, the new retrieval time attributable to overlap will be $2 \times T_{r0}$. Fairness is ignored in this approximation. If we then factor in measurement error due to $T_{skew}$, the clock skew between client clocks, then the discrimination resolution $T_{r1} - T_{r0}$ must be greater than $O \times T_{r0} \pm T_{skew}$, depending on the relative skew direction.
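A minimal sketch of how a sink might apply this model is given below. The midpoint threshold and the skew margin check are assumptions introduced for illustration; they are one plausible reading of the resolution requirement above, not the decoding scheme used in the experiments, and the timing values are hypothetical.

```python
def retrieval_time_with_overlap(tr0_s, overlap):
    """First order model from the text: T_r1 = T_r0 + O * T_r0."""
    return tr0_s + overlap * tr0_s

def decode_bit(measured_s, tr0_s, overlap=1.0):
    """ASSUMED rule: threshold midway between T_r0 and T_r1; slower means a 1."""
    threshold = tr0_s + 0.5 * overlap * tr0_s
    return 1 if measured_s > threshold else 0

def skew_margin_s(tr0_s, overlap, t_skew_s):
    """ASSUMED margin: half the symbol separation minus the worst-case clock skew."""
    return 0.5 * overlap * tr0_s - abs(t_skew_s)

# Hypothetical numbers: 50 msec baseline, 100% overlap, 10 msec skew.
tr0 = 0.05
print(retrieval_time_with_overlap(tr0, 1.0))  # about 0.1 s
print(decode_bit(0.09, tr0), decode_bit(0.055, tr0))
print(skew_margin_s(tr0, 1.0, 0.01) > 0)
```

A negative margin would indicate that clock skew alone could push a measurement across the threshold, so either the overlap or the synchronization quality would need to improve.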

4.5.3 Obfuscation

We focus on randomization due to the abundance of prior research in obfuscation techniques. We can increase entropy by reducing the probability of transmission within a specific time interval. This is accomplished by managing time slots and intervals.

The maximum number of non-overlapping transmissions in a one second period is 1/Trg, where Trg, expressed in seconds, is the sum of the retrieval time Tr0 and the guard band time. If the channel transmits once during this period, the probability of observing the channel at a given time is Trg in a one second period, assuming Trg < 1. We can further reduce the detection probability to Trg/n by expanding this period using a macro interval n, where n > 1 second.
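As a rough numerical illustration of the slot arithmetic above, the following minimal sketch computes Trg, the slot ceiling and the observation probability. The function names and the 50 msec value used for Tr0 are assumptions chosen for illustration only; they are not measurements from this work.

```python
def slot_time(tr0, guard_fraction=1.0):
    """Trg: retrieval time Tr0 plus a guard band given as a fraction of Tr0."""
    return tr0 * (1.0 + guard_fraction)


def max_transmissions(trg, period=1.0):
    """Upper bound on non-overlapping transmissions within `period` seconds."""
    return period / trg


def observation_probability(trg, macro_interval=1.0):
    """Chance that a random instant falls inside the one slot used per macro interval."""
    return trg / macro_interval


trg = slot_time(0.05)                      # assumed Tr0 of 50 msec with a 100% guard band
print(max_transmissions(trg))              # slots available in one second
print(observation_probability(trg))        # one transmission per one second period
print(observation_probability(trg, 4.0))   # same slot within a 4 second macro interval
```

Lengthening the macro interval lowers the observation probability linearly, at the cost of a proportionally lower exfiltration rate.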

Another method of increasing obfuscation is to use random content. The attacker can crawl the victim’s website to identify pages suitable for attack using size as the criterion. For example, we catalogued the yahoo.com domain by page size and found 239 URLs, sized as shown in Table 17:

Page Size Range    Number
< 100K             92
100–200K           40
200–300K           21
300–400K           44
400–500K           20
> 500K             22

Table 17: Yahoo URL Size Distribution

Although this breakdown is at a coarse level, further decomposition would yield collections of pages within narrower size ranges. The information sink need only randomly access pages of similar size to further confuse the defenders. The larger the sub-URL quantity, the more challenging the detection effort since the attack does not query the same page deterministically.

A third form of obfuscation is to modify the interference transmission timing scheme. Normally the information sink listens at a time specific to an interval-slot window. The information source has two choices. One is to not request a burst during the window when it intends to send a zero. The other is to introduce misdirection by requesting a burst in an additional slot or interval where the information sink is not listening.

4.5.4 Data Collection and Post Processing

We built a suite of tools to automate the data collection and post processing activities. The collection tools utilize tcpdump which is launched at the beginning of each experiment via crontab triggered scripts. We filter TCP/IP traffic for the specific source and destination IP addresses. The file size for a run is typically 407 MB.

Post processing involves minimizing the time needed for calculating retrieval times and determining bit value without creating a bottleneck. We execute tshark to generate timestamped GET and ‘200 OK’ pairs and sequence numbers. The resultant file size is typically 217 KB. Finally, we transfer both files to a remote mounted folder for final processing, scoring and archiving.

We maintained raw and post processed data sets, approximately 10 GB per day, for anomaly and trend analysis.

4.5.5 Data Analysis

We utilize three techniques for data analysis. The first is based on finding a midpoint between the retrieval times for all populations of ones and zeros. We apply this point to all raw times and assign a communications symbol, one or zero, depending on its relationship to the midpoint. The second approach involves sweeping through a range of times for each run and comparing the score against truth to derive the best time threshold for that particular run. The third approach leverages the latter’s sweep approach but determines best fit based on the slope of the calculated score versus threshold curve.

The first approach requires a large sample size to determine best fit. We analyzed 24 runs containing 1280 bits each and determined the median points. Subsequently, we applied the Kolmogorov–Smirnov test to the populations of ones and zeros. This test yielded H=1 and P=0.0032. We interpret this as rejecting, at better than the 1% significance level, the hypothesis that the distributions are Gaussian. The implication is that retrieval time behavior is influenced by other system level actors, which complicates the threshold selection process and makes this solution undesirable.

The second technique is useful as it optimizes an individual run although it relies on truth to determine the best threshold. This may necessitate specific bit patterns to be embedded in the exfiltration pattern so as to provide a calibration reference.
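The truth-based sweep can be sketched as follows. This is a minimal illustration under stated assumptions rather than the tooling used in this work: the function names are hypothetical, retrieval times are given in integer microseconds, a time above the threshold is decoded as a one, and the sample values are made up.

```python
def decode(times, threshold):
    """Map each retrieval time to a chip: above the threshold is a one."""
    return [1 if t > threshold else 0 for t in times]


def score(times, truth, threshold):
    """Number of decoded chips that match the known (truth) pattern."""
    return sum(1 for d, b in zip(decode(times, threshold), truth) if d == b)


def best_threshold(times, truth, candidates):
    """Sweep the candidate thresholds and keep the first one with the top score."""
    return max(candidates, key=lambda th: score(times, truth, th))


# Made-up retrieval times (microseconds); the 41000 entry is a slow zero.
times = [31000, 47000, 30000, 52000, 41000, 46000]
truth = [0, 1, 0, 1, 0, 1]
candidates = range(35000, 46000, 1000)
chosen = best_threshold(times, truth, candidates)
print(chosen, score(times, truth, chosen))
```

Because the sweep needs `truth`, a deployed channel would have to embed a known calibration pattern, as noted above.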

The third approach is applicable in near real-time and requires no knowledge of truth. As with the second technique, we apply a range of times to sweep through but in this case, we determine the number of ones (or zeros) each result would yield. We subsequently create a scoring curve with the results and calculate the slope at each time threshold increment. The time point where the absolute value of the slope is minimal represents the lowest error rate. If needed, we can interpolate between multiple minimal slope values to refine the selection and avoid taxing the information sink’s resources.

We seek the minimum value over the sample set s which corresponds to the minimal slope in Equation 29. Using Iverson Bracket notation:

$$f(s) = \left| \sum_{s} [M_i - (\alpha + s\beta)] - \sum_{s} [M_i - (\alpha + (s+1)\beta)] \right| \qquad (28)$$

$$\min\{ f(s), f(s+1), \ldots, f(s-n) \} \qquad (29)$$

where the controllable parameters are α, the sweep start time, and β, the time increment. Mi is the real time measurement for the sample i.
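One way to read Equations 28 and 29 in code is sketched below. It rests on several assumptions that are ours for illustration: each Iverson bracket is taken to be 1 when the measurement Mi exceeds the candidate threshold, the sweep window is assumed to bracket the gap between the two clusters, ties among minimal slopes are resolved by taking the middle step, and the sample times (integer microseconds) are made up. All function names are hypothetical.

```python
def count_ones(times, threshold):
    """Number of samples that would decode as ones at this threshold."""
    return sum(1 for m in times if m > threshold)


def slope_profile(times, alpha, beta, steps):
    """f(s) for s = 0 .. steps-1: absolute change in the ones count between
    the thresholds alpha + s*beta and alpha + (s+1)*beta."""
    counts = [count_ones(times, alpha + s * beta) for s in range(steps + 1)]
    return [abs(counts[s] - counts[s + 1]) for s in range(steps)]


def tangent_threshold(times, alpha, beta, steps):
    """Return the threshold at the middle of the minimal-slope steps."""
    f = slope_profile(times, alpha, beta, steps)
    lowest = min(f)
    flattest = [s for s, value in enumerate(f) if value == lowest]
    s_best = flattest[len(flattest) // 2]
    return alpha + s_best * beta


# Made-up retrieval times in integer microseconds, not measured data.
zeros = [30000, 31000, 32000, 33000, 34000]
ones = [45000, 46000, 47000, 48000, 49000]
print(tangent_threshold(zeros + ones, alpha=29500, beta=1000, steps=20))
```

No truth data enters the calculation; only the shape of the count-versus-threshold curve is used.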

4.5.6 Protocols

The system is built utilizing connection and connectionless protocols over IPv4. Web client-server exchanges follow the HTTP version 1.1 standard. UDP packets underlie the malicious client-server pair exchanges. The information sink issues an HTTP GET to the victim’s index page on each planned interval while the interference source concurrently requests packet bursts from the interference server.

We implemented the web server using the Ubuntu apache2 package. We use the Python 3.4 http.client library and its HTTPConnection class to establish the web client-server connection. The advantage of this library and class pair is that the SYN, SYN/ACK, ACK sequence is executed at the beginning of each experiment. Subsequent queries within an experiment’s duration, unless induced from an external source, do not invoke this handshake. This greatly reduces the timing uncertainty associated with establishing a connection for each query.
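The persistent-connection timing approach can be sketched with the standard library alone. This is an illustrative sketch, not the experiment code: the helper names, the loopback test server and its 1 KB page are assumptions made so that the example is self-contained.

```python
import http.client
import http.server
import threading
import time


def timed_get(conn, path="/"):
    """Return (status, elapsed seconds) for one GET on an already-open connection."""
    start = time.perf_counter()
    conn.request("GET", path)
    response = conn.getresponse()
    response.read()  # drain the body so the connection can be reused
    return response.status, time.perf_counter() - start


class _Page(http.server.BaseHTTPRequestHandler):
    """Stand-in for the victim's index page (hypothetical 1 KB body)."""
    protocol_version = "HTTP/1.1"  # keep-alive, so the TCP socket persists

    def do_GET(self):
        body = b"x" * 1024
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def demo(queries=3):
    """Time several GETs over a single TCP connection to a loopback server."""
    server = http.server.HTTPServer(("127.0.0.1", 0), _Page)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    try:
        conn.connect()  # the three-way handshake happens once, before timing
        return [timed_get(conn) for _ in range(queries)]
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns one `(status, seconds)` pair per query; because `connect()` is issued up front, none of the timed intervals includes connection setup.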

We limited customization by using the default socket libraries native to Ubuntu 14.04 LTS and avoided raw socket manipulation in the Python scripts.

Although we used UDP as the protocol for interference traffic, other protocols/mechanisms such as RTP or unfair TCP stacks are viable, provided that the selected protocol/content combination supports evading detection.

4.5.7 Spreading Gain

Applying spreading gain, a technique common in spread spectrum communications, allows us to reduce the error rate by manipulating the spread factor defined as the ratio of the chip rate to symbol rate.

We applied a repetition code pattern that supports a simple majority rules error correction scheme. This provides a trivial spreading gain solution and affords us the opportunity to combine error correction with capacity analysis in a lossy system and to study the effect of clustered chip patterns on performance.

We define the normalized throughput THc of a communication channel as:

THc = Cr/S (30)

where Cr is the chip rate and S is the spread factor.
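The repetition code and majority decision described above can be sketched as follows, together with Equation 30. The function names are hypothetical and the chip values are made up for illustration.

```python
def spread(bits, factor=5):
    """Repeat every bit `factor` times to form the chip sequence."""
    return [b for b in bits for _ in range(factor)]


def despread(chips, factor=5):
    """Majority vote over each group of `factor` chips (factor should be odd)."""
    bits = []
    for start in range(0, len(chips) - factor + 1, factor):
        group = chips[start:start + factor]
        bits.append(1 if sum(group) * 2 > factor else 0)
    return bits


def normalized_throughput(chip_rate, spread_factor):
    """Equation 30: THc = Cr / S."""
    return chip_rate / spread_factor


chips = spread([1, 0, 1])
chips[0] ^= 1   # two chip errors inside the first group of five
chips[3] ^= 1
print(despread(chips))
print(normalized_throughput(1280, 5))
```

With a spread factor of five, up to two chip errors per group are corrected, while the bit count drops from 1280 chips to 256 bits.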

4.5.8 Attack Sequencing

Figure 4.3: Sequence Diagram

The system information flow control is illustrated in Figure 4.3. Prior to any client-server interaction, a crontab triggered script launches tcpdump on the web client (sink) VM. The web client’s network card is not configured for promiscuous mode since the channel depends solely on query elapsed time measurements and this added functionality is unnecessary. Triggered by a second local crontab script, the controller initiates the channel’s operational sequences by passing to the web and UDP clients (exfiltration source) the start time, the exfiltrated (experimental) filename and the seed used by the pseudo-random generators for transmit/receive slot assignment. The web and UDP servers (victim and interference servers) respond in best effort time to their client requests.

The control method should not contribute to discovery. Inter-client linkage is minimal since the controller may be located within or external to the data center.

4.5.9 Time Induced Errors

This channel relies on time synchronization to support encoding and decoding accuracy. The method used to maintain sync is critical due to issues with stability, time source defense mechanisms, sync request rate and Virtual Machine (VM) behavior. First, the stability of the time source may induce drift in clients that it services since the requests appear at differing times. Second, the time source may be intolerant to frequent requests due to perceived Denial of Service attacks on the time server (e.g., NIST Internet Time Servers [71]). This leads to a third issue in that drift is introduced with low sync request rates. Fourth, there may be a lag between the time update request and the slew adjustment.

We define t1 and t0 as measured content retrieval times observed by the information sink with and without channel induced interference respectively. We denote τ as the midpoint between any two t0 and t1 observations and τm as the median time of all t1 and t0 measurements for a given run. Using τm as the decoding process discrimination threshold, there will be an error when t1 < τm or τm < t0.

The error rate, using Iverson Bracket [51] notation, is:

$$\text{Error Rate} = \frac{\sum_i [t_{1i} < \tau_m] + \sum_i [\tau_m < t_{0i}]}{n} \qquad (31)$$

where n is the total number of samples in a run, i is the ith sample, t1i is the retrieval time for the ith transmitted one and t0i is the retrieval time for the ith transmitted zero.
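Equation 31 translates directly into a few lines of code. The sketch below is illustrative only: the function name is hypothetical and the retrieval times (integer microseconds) are made-up values, not measurements.

```python
import statistics


def error_rate(t1, t0):
    """Equation 31 with tau_m taken as the median of all t1 and t0 samples."""
    tau_m = statistics.median(t1 + t0)
    errors = sum(1 for t in t1 if t < tau_m) + sum(1 for t in t0 if tau_m < t)
    return errors / (len(t1) + len(t0))


# Made-up samples: one fast one (38000) and one slow zero (44000).
t1 = [45000, 47000, 38000, 48000]
t0 = [30000, 31000, 44000, 32000]
print(error_rate(t1, t0))
```

Here the median threshold falls between the two misplaced samples, so each is decoded incorrectly.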

Since τ or τm track system behaviors and are inherently unstable, any variation, including the effect of clock skew, will affect the BER. We can compensate for skew by increasing the amount of interference since it widens the range between t1i and t0i. In addition, we can use spread spectrum coding techniques that increase robustness by improving the BER.

4.5.10 Content

Sizing. We are limited to web page sizes that are available from prospective victims. We configured the victim based on a popular web site (see Table 16 examples) using an index.html page size of 308 KB. This selection strikes a balance between typical sizes found in the wild, randomness, desired exfiltration rate and the need to minimize clock skew effects.

The interference payload incurs three limitations. The first is the minimal burst size. As with the wire speed issue, the deltas between HTTP traffic times with and without interference must be resolvable. Second, the value cannot be too large as to overlap the subsequent time slot’s start time. Third, we did not want to fragment the transmission. We were able to sub-optimally resolve times with 60% overlap as a starting point, which also satisfies the bleed-over concern. In addition, we set the individual packet size to be 1390 bytes to avoid fragmentation.

Interference Server Content Sizing and Diversity. We highlight content diversity on the interference server to further increase the difficulty in channel detection. Ideally, this would involve serving up content that changes dynamically and supports a wide variety of data values. An interesting solution takes the form of Yahoo Finance. When one executes a ‘wget’ on finance.yahoo.com, the resultant page size is ≈ 70 kB. For each financial ticker symbol added to the query, the page returned adds ≈ 2 kB. Also of benefit is the granularity of a per symbol impact, where the level of interference is selectable and can be tuned to the victim’s home page with a high degree of accuracy.

During the time window where the stock market opens, this value changes upon the reporting of a financial transaction for a given symbol. In this manner, the content naturally changes without influence from the attacker. Therefore, the strategy becomes making frequent yet randomized queries based on the desired entropy while randomizing the target financial symbol requests.

4.5.11 Query Entropy / Duty

We utilize a seeded pseudo random number generator to introduce start time jitter into the channel. A shared secret seed is provided to the two clients prior to each experiment’s kickoff. The generator determines in which of the available slots each client side request is to occur. The result is a unique start time for each interval. Since both clients were written in Python 3.4 and use the same seed, both random number generator outputs are identical.

For this experiment, one second was used as the macro interval and 100 msec as the slot time. The 100 msec value was derived from the victim web page retrieval time and the guard band. This yielded a 10% transmission duty cycle. The two clients query their respective servers based on the slot interval start times derived from random interval values.
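The shared-seed slot assignment can be sketched as below. This is a minimal illustration, assuming one randomly chosen slot per macro interval; the function name, the seed value and the use of `random.Random.randrange` are our assumptions and not necessarily the calls used by the clients.

```python
import random


def slot_schedule(seed, intervals, slots_per_interval=10,
                  slot_time=0.1, macro_interval=1.0):
    """Start-time offsets (seconds from experiment start), one slot per interval."""
    rng = random.Random(seed)  # private generator, so both clients stay in step
    starts = []
    for k in range(intervals):
        slot = rng.randrange(slots_per_interval)
        starts.append(k * macro_interval + slot * slot_time)
    return starts


# Both clients derive the same schedule from the shared (hypothetical) seed.
web_client = slot_schedule(seed=1234, intervals=5)
udp_client = slot_schedule(seed=1234, intervals=5)
print(web_client == udp_client)
print(0.1 / 1.0)  # duty cycle: one 100 msec slot per one second interval
```

An observer without the seed sees one transmission per interval at an apparently random offset, while the two clients remain aligned as long as their clocks agree.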

4.5.12 Test Pattern

We used a test pattern length of 1280 chips. The pattern consisted of alternating groups of equal distributions of five ‘1’s and five ‘0’s, i.e., ‘11111’ or ‘00000’. We applied this pattern for each experiment over the 24 day period. Pattern size and form enable simple error correction and capacity analysis since at no gain and full gain, the bit pattern length is 1280 and 256 bits respectively. The pattern length is short enough to support exfiltration and the secure export of the completed 407 MB tcpdump file to a remote server via sshfs prior to the start of the next experiment.

4.6 Test Bed

The tests were conducted at Northeastern University’s Computer Science Department’s (CCIS) data center (CSD), illustrated in Figure 4.4. The CSD network core consists of a VMware ESXi host and a pfSense VM cluster providing router and firewall functionality and other network services, e.g., DHCP, DNS and NTP. Connected to these clusters are the bulk of the data center servers. These servers are used for CSD research efforts and classroom activities. Bluejay, which hosts the experiment’s management controller and the Web and UDP client VMs, is connected via a dedicated link to this cluster. Additional switch clusters connected to the core enable access to a series of switch ports via a Cisco 5548 cluster. This cluster feeds a dual, multi-homed Cisco 6509 Catalyst switch. Hercules, which hosts the covert channel Web and UDP servers, connects to this switch via a 1 Gb/sec link.

4.6.1 Node Distribution

In Table 18, we offer a snapshot of active machines (physical or virtual) on the network. The two covert channel servers on Hercules compete with 234 other machines whose traffic traverses Babel. Bluejay competes with 949 other machines whose traffic traverses the core.

Figure 4.4: Network Topology

Node/Cluster      Machines (Physical/Virtual)
Babel             235
Core Traversal    950

Table 18: Potential Network Competitive Sources

4.6.2 Platform Parametric Information

Bluejay is a PowerEdge 2950 III with a 2.66 GHz quad core processor, 16 GB of RAM and 1GigE network support. Hercules is a Dell PowerEdge R720XD 2x16 server with 128 GB of RAM and 1GigE network support. The asymmetric sizing reflects the supposition that the server side is more robust due to its more demanding business purpose. Since the malicious client seeks to maintain as low a profile as possible and support a low total cost of ownership, we use a lower end server. Table 19 lists the contributing channel elements.

Component       Web Client    UDP Client    Web Server    UDP Server
Host            Bluejay       Bluejay       Hercules      Hercules
VM              VirtualBox    VirtualBox    VirtualBox    VirtualBox
Processor       0             0             0             0
Vendor ID       Intel         Intel         Intel         Intel
CPU Model       Xeon®         Xeon®         Xeon®         Xeon®
Model Type      E5430         E5430         E5-2650       E5-2650
Model Speed     2.66 GHz      2.66 GHz      2.00 GHz      2.00 GHz
CPU Cores       1             1             1             1
Memory          5012 MB       4690 MB       8177 MB       8177 MB
Disk Storage    256 GB        106 GB        100 GB        100 GB
OS              Ubuntu        Ubuntu        Ubuntu        Ubuntu
OS Version      14.04 LTS     14.04 LTS     14.04 LTS     14.04 LTS

Table 19: Platforms

4.6.3 Speed

Wire speed is a controllable factor which can be used to manipulate symbol threshold values and compensate for excessive clock skew.

Since inserting network control devices is impractical, we use ethtool to manipulate VM NIC speeds. For our experiments we implemented a 100 Mb/sec client side configuration to ensure that the retrieval time delta between bit types exceeded the client side clock skews. Server side speeds remained at 1000 Mb/sec. Delay was negligible as measured with pings between the clients and paired servers. These nominally measured at 1 msec, with peaks approaching 4 msec.

Bit throughput is defined as follows:

$$\text{Bits/Day} = 86400 \times S_c / \text{Interval} \qquad (32)$$

$$\text{Bits/Hour} = 3600 \times S_c / \text{Interval} \qquad (33)$$

where Sc is the ratio of slots utilized to slots available.

4.7 Results

This section describes our experimental results. We begin with the actual observed CSD network traffic, followed by discussion of endpoint measurements, the time jitter effect and channel scoring.

4.7.1 End To End Traffic

This section highlights the traffic observed at four key instrumented CSD network points as illustrated by the red circles in Figure 4.4. The Core observation point monitors system wide traffic. The pfSense–Core point monitors all core traffic going to the firewall that Bluejay shares. The remaining two points, Hercules–Babel and Bluejay–pfSense, monitor the two channel endpoints. We provide an analysis of a typical 24 hour period.

Core Traffic. Figure 4.5a illustrates the data center core traffic for a sample 24 hour observation period. Each five minute sample represents the average of the traffic over that particular sample’s range. We see the largest sample value of 110 MB/sec with mean and standard deviation approximately equal to 16 MB/sec and 15 MB/sec respectively. The measured dynamic range is approximately 30 to 1.

For the same sample period, the pfSense firewall’s interface traffic, see Figure 4.5b, reaches an approximate peak of 17 MB/sec with mean and standard deviation approximately equal to 2 MB/sec and 2.2 MB/sec, respectively. We observe a dynamic range of 125 to 1 during the same period. Both observations indicate the uneven nature of core traffic.

Figure 4.5: 24 Hour Network Traffic. (a) Core. (b) pfSense-Core.

Endpoint Traffic. Figure 4.6a illustrates Hercules’ ingress and egress traffic levels for the same 24 hour series of runs. The regular pattern seen at Hercules, the server side host, suggests that the large load observed by the core has little effect. Figure 4.6b, which illustrates Bluejay traffic, also suggests tolerance to core perturbations. This figure indicates that Bluejay traffic is dominated by the test channel.

Figure 4.6: Endpoint Network Traffic. (a) Hercules. (b) Bluejay. (c) 24 Hrs Bluejay Traffic - Zoom.

Now consider the period of time where the pfSense traffic load is greatest as seen in Figure 4.5b. The peak rates occur in samples 204 through 257 which correspond to hours 17 through 21. The results in Figure 4.6c appear to confirm that this load creates no appreciable effect in that there is no significant rate reduction. The small peak after the large excursion is attributable to offloading data to the remote server for post processing.

4.7.2 Channel Results

This section describes the hourly and daily results.

Figure 4.7: Web Page Retrieval Times, 1 Hr. (a) One Day, Full View. (b) Full View, Zoom.

Hourly Results. We show in Figure 4.7a the retrieval time results based on the chip pattern, with time separation of the interference and non-interference effects for one day’s aggregated runs. Note the small cluster around the mean highlighted by a horizontal line, indicative of system noise affecting the channel. In the magnified snippet, Figure 4.7b, we also observe an interesting pattern of increased separation within a given burst pattern, which might imply that there is some memory in the channel. This will be evaluated in a later section.

Figure 4.8: Web Page Retrieval Times, 1 Day. (a) Full View. (b) Full View, Zoom.

Daily Results. In Figure 4.8a, we observe the separation between the retrieval times associated with the ones and zeros for 24 contiguous runs executed for an entire day. In Figure 4.8b, we zoom in to illustrate the regular pattern for each of the day’s experiments in toto as a reconstructed waveform. Here we also see imperfect separation although its overall shape holds well. Figure 4.9 illustrates variation in mean, median and standard deviations. Although variation is expected, the large peak in standard deviation in the hour 2 to 4 range is noteworthy. Zooming into Figure 4.10, it is clear that these have little effect on the mean and median values for this series of experiments. This is encouraging since the scoring threshold is related to these values. However, the statistical values in Table 20 indicate that the interference mean is 2σ from the non-interference mean. This may have significant error implications. In particular, the maximum zero far exceeds the mean one, implying crossover. The sample described herein contains a number of outliers which may contribute to large standard deviations. Examples of this behavior are visible at the upper range as seen in Figure 4.8a. To mitigate their potential contribution to adjacent slot interference, we rely on the guard band to improve separation.

Figure 4.9: Full View
Figure 4.10: Full View, Zoom
Figure 4.11: Retrieval Times Statistics, 1 Day

Chip Truth    Mean       STD        Min       Max
one           0.04728    0.00714    0.0280    0.252
zero          0.03154    0.00842    0.0236    0.247

Table 20: Chip Distribution Statistics (Seconds)

Time Jitter. Figures 4.12a, 4.12b and Table 21 summarize the observed clock skew between the web and UDP clients for a sample size of 47000 clock update requests, equivalent to approximately 12 hours. The instrumented clocks vary up to 10 msec between one another. What is important is that the worst case clock skew is less than the measured difference between the interference and non-interference induced retrieval times for all but a small percentage of samples, see Figure 4.13. Interestingly, the approximate skew is similar to the 1σ value for each of the channel populations, 7.1 msec and 8.4 msec for ones and zeros respectively. The µ0 + σ0 retrieval times for all zeros and the µ1 − σ1 retrieval times for all ones are within 1 msec. This creates a very small discrimination error window which may degrade in the presence of large amounts of skew.

Statistic      Value
Mean           3.44e-5
Max            0.0078
Min            −0.00926
1 σ            0.00231
2 σ            0.00463
3 σ            0.00694
Sample Size    46962

Table 21: Time Jitter

Figure 4.12: Client Clock Skew. (a) Measurements. (b) Distribution.

4.7.3 Analysis

Thresholds. Identifying the scoring threshold may be accomplished via heuristic analysis, brute force techniques, qualitative methods or adaptive means.

Heuristics determine a value offering the best threshold score via a statistical mean or median from population distributions, see Figure 4.13. Brute force techniques involve sweeping through a range of values using truth data to determine accuracy. Qualitative determination is based on estimating a midpoint as illustrated in Figures 4.7a and 4.7b.

Figure 4.13: Population Distributions, Ones and Zeros

Alternatively, adaptive scoring may be used where analysis tools sweep through a range of thresholds, selecting the threshold corresponding to a relative minimum slope along the scoring curve.

The brute force sweep approach is well suited for static or low perturbation environments. Where large shifts in ambient network traffic occur, the system needs to recalibrate with new truth traffic which may require some out-of-band communication to avoid detection during set up. Alternatively, some truth data could be issued with each frame for the sink to establish a frame level threshold.

Table 22: Daily Accuracy and CER Results

Day      Best Threshold (Sec)    Raw Score    Accuracy (%)
1        0.039                   30375        98.88
2        0.039                   30437        99.08
3        0.039                   30407        98.98
4        0.04                    29073        98.75
5        0.039                   30450        99.12
6        0.04                    29947        97.48
7        0.04                    29244        95.20
8        0.04                    30329        98.73
9        0.039                   29749        96.84
10       0.04                    30388        98.92
11       0.04                    29908        97.36
12       0.04                    29960        97.53
13       0.039                   30363        98.84
14       0.04                    30406        98.98
15       0.04                    30410        98.99
16       0.039                   30435        99.07
17       0.039                   30421        99.03
18       0.039                   30443        99.10
19       0.04                    29128        98.94
20       0.04                    30388        98.92
21       0.04                    30291        98.60
22       0.04                    30342        98.77
23       0.04                    30110        98.01
24       0.04                    30446        99.11
Total                            723450       98.12

Now consider chip performance over the course of a 24 day period. Table 22 summarizes the combination of the sum of the correct scores for a given day and the common threshold value yielding that result. Three issues are apparent. First, the optimal retrieval time varies from day to day as it would vary from retrieval to retrieval. Second, we see a daily raw score ranging from a high of 30450 to a low of 29244. Again, system traffic densities and resource contention would affect the results. Finally, we see 13830 misses representing an aggregate CER between 1 × 10⁻² and 2 × 10⁻². This approach yields a solid method for threshold determination. The addition of coding with multi-bit correction would improve the result. The primary limitation is realized when multiple dynamic network swings occur within a frame’s transmission.

The adaptive method introduced herein automatically compensates for network traffic behavioral changes. In the presence of large shifts in ambient network traffic, the method can simply expand its sweep time window and re-execute the algorithm.

To demonstrate the sweep approach, we applied a range of candidate thresholds and recorded the total number correct for an individual run. We subsequently determined the percent correct versus truth. The results are summarized in Table 23 where we find the best results in the 39 to 40 msec range. This is consistent with the graphical separation observed between clusters in Figure 4.7b. We demonstrate the application of this technique over a 24 Hr period later in this section.

Threshold (Sec)    Total Correct    Percent Correct
0.035              29545            96.18
0.036              29877            97.26
0.037              30104            97.99
0.038              30250            98.47
0.039              30353            98.81
0.040              30380            98.89
0.041              30158            98.17
0.042              29355            95.56
0.043              28131            91.57
0.044              26866            87.45
0.045              25480            82.94

Table 23: Chip Performance vs Threshold

Memorylessness. We investigated the memoryless nature of the channel to determine if it is representative of a Binary Symmetric Channel. We measured the CER as it varied for contiguous chips of ones, contiguous chips of zeros and the transition points between ones and zeros and between zeros and ones, performing this analysis with 983040 samples with an aggregate CER of 1.5%.

Table 24 summarizes the group scores where a group represents a chip’s relative position in the test pattern. For example, group one is the first position (left to right) in the ‘11111’ pattern, group two is the second position and so on. This also applies to the zeros where group one is the first position in the ‘00000’ pattern etc.

With respect to the homogeneous group results, there is no indication of memory in the channel since neither the ones’ scores nor the zeros’ scores are monotonically increasing or decreasing with successive groups.

Now consider the two aforementioned transitions. Here we observe a shift in one to zero errors of 2.5%. Regarding zero to one scores, we observe a 0.3% shift in errors. Since they are well within the same order of magnitude, we see no clear indication of channel memory and in the aggregate, we cannot conclude that the channel has memory.

Group    Ones     Error Rate    Zeros    Error Rate
1        96516    0.01819       94718    0.03648
2        97623    0.00693       96349    0.01989
3        97831    0.00481       96710    0.01622
4        97682    0.00633       96928    0.014
5        97057    0.01269       96863    0.01466

Table 24: Memoryless Test

Tangent Analysis. The application of tangent analysis allows for decoding without the need for truth data. Table 25 summarizes the effectiveness of our tangent based scoring algorithm when applying Equations 28 and 29. Column 2, CERIter, represents the CER calculation resulting from a brute force sweep of times with the results compared against truth. The third column, CERTan, represents the CER after tangent analysis application. ∆CER represents the difference between these calculated error rates. Ideally this should be zero. We observe nine non-zero values, all but one (hour nine) with less than a 0.01 offset. We also observe that the best time thresholds for the two approaches are, with two exceptions, separated by at most 0.001, the increment used in the sweep process. This is clear from the THIter and THTan entries, where THIter is the best brute force time threshold and THTan is the derived tangent threshold. Although the scores typically match, the times may not, due to the symmetry associated with the absolute value calculation. Furthermore, the increment granularity influences the result as it may not align with the curve’s minimum slope point. In Figure 4.14 we observe the aforementioned deviations between the two techniques with the largest delta observed in hour nine, reflecting the peak ∆CER.

Table 25: Tangent Derived Error Rates

Time (Hr)    CERIter    CERTan     ∆CER       THIter (Sec)    THTan (Sec)
0000         0.99219    0.98906    0.00313    0.0390          0.04
0100         0.98672    0.98281    0.00391    0.0400          0.041
0200         0.98906    0.98906    0.00000    0.0390          0.04
0300         0.98906    0.98906    0.00000    0.0380          0.04
0400         0.99609    0.99609    0.00000    0.0400          0.04
0500         0.99297    0.99297    0.00000    0.0390          0.04
0600         0.99219    0.99219    0.00000    0.0390          0.039
0700         0.99375    0.99297    0.00078    0.0390          0.04
0800         0.98281    0.98281    0.00000    0.0390          0.04
0900         0.99062    0.97656    0.01406    0.0390          0.041
1000         0.99453    0.99375    0.00078    0.0390          0.04
1100         0.99219    0.99219    0.00000    0.0400          0.04
1200         0.98750    0.98750    0.00000    0.0400          0.04
1300         0.98281    0.98125    0.00156    0.0400          0.041
1400         0.99297    0.99297    0.00000    0.0400          0.04
1500         0.99375    0.99219    0.00156    0.0400          0.039
1600         0.99062    0.98281    0.00781    0.0400          0.041
1700         0.99219    0.99062    0.00156    0.0400          0.039
1800         0.98750    0.98750    0.00000    0.0390          0.04
1900         0.99062    0.99062    0.00000    0.0400          0.04
2000         0.99062    0.99062    0.00000    0.0400          0.04
2100         0.98750    0.98750    0.00000    0.0390          0.04
2200         0.99375    0.99375    0.00000    0.0400          0.04
2300         0.98984    0.98984    0.00000    0.0390          0.04

Figure 4.14: Direct Derivation vs Adaptive Sweep
Figure 4.15: Enhanced BER with Gain
Figure 4.16: Scoring

Network Interference Impact. In Figure 4.5, we observed peak traffic loads between hours 17 and 21. Now referencing Table 25, we see the average error rate for 24 hours is 0.0095 while the average for the 17 to 21 hour range is 0.00969 and the remaining error rate is 0.0093. Although the error rate is slightly elevated, we conclude that the effect is negligible, confirming that these load levels have no adverse effect on channel performance.

Gain and BER. We illustrate the effect of spreading gain on BER in Figure 4.15. Only two of the runs are non-zero and those with errors have a BER of better than 8 × 10⁻³. The mean over this sample is 4.87 × 10⁻⁴, which indicates that this approach yields a two orders of magnitude improvement.

Theoretical Capacity and Effective Throughput. Since earlier analysis indicated that the channel exhibits memoryless and symmetric behaviors, we can apply traditional communications theory methods to estimate the theoretical capacity and throughput. We first discuss the theoretical channel capacity as a function of BER for a memoryless channel. We assume a binary symmetric channel (BSC) with channel capacity Cc as follows:

$$C_c = 1 + BER \times \log_2(BER) + (1 - BER) \times \log_2(1 - BER) \qquad (34)$$
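Equation 34 can be evaluated with a short helper. The sketch below is illustrative; the function name is hypothetical, and the endpoint handling (capacity of 1 for a BER of exactly 0 or 1) follows the usual convention that 0 × log2(0) is taken as 0.

```python
import math


def bsc_capacity(ber):
    """Equation 34: capacity, in bits per channel use, of a binary symmetric channel."""
    if ber in (0.0, 1.0):
        return 1.0  # 0 * log2(0) is treated as 0 at the endpoints
    return 1.0 + ber * math.log2(ber) + (1.0 - ber) * math.log2(1.0 - ber)


for ber in (0.048, 0.009, 0.0005):
    print(ber, round(bsc_capacity(ber), 3))
```

The capacity falls quickly as the error rate rises and reaches zero at a BER of 0.5, where the output carries no information about the input.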

Capacity. We now apply the measured chip error rate to derive the theoretical capacity (TC) and the theoretical capacity with the application of spreading gain. We take the best, worst and mean individual results from the 24 day period, apply the capacity formula (34) and derive Table 26. Here, the last row represents the application of spreading gain equal to five. The use of spreading as a basic coding scheme improves the BER at the expense of a lower bit-rate, therefore yielding a lower net capacity. Clearly, spreading gain is expensive in that the new data rate is approximately 1/3 of the worst case value without its application. We further illustrate this trade-off in Figure 4.18.

Case                  Spread Factor    BER       Capacity    Capacity w Spread Factor
24 Day Worst Case     1                0.048     0.72        0.72
24 Day Best Case      1                0.009     0.93        0.93
24 Day Mean Case      1                0.019     0.87        0.87
24 Hr Sample          5                0.0005    0.994       0.249

Table 26: Effective Data Rates Including Spreading Effect

The channel capacity is approachable with codes typically of large block length (e.g., turbo codes or LDPC codes). Transmitting data packets over the uncoded channel results in a Frame Error Rate (FER) that is a function of the packet length L. An uncoded packet is lost if at least one of its bits is incorrect. Typically, each frame would include a checksum to detect errors, which should be accounted for in the computation of the channel goodput. We leave the investigation of optimal code performance to a future exercise.

Throughput Now consider the impact of frame length, which is a key driver in the throughput calculations from Equation 37 and is illustrated in Figure 4.17. There is a clear penalty attributable to length at higher BERs. The effect on throughput of adding spreading with an SF of 5 is seen in Table 27. The throughput, albeit low, significantly bests that without spreading gain for the same length. This concern is tempered when constructing a true frame, which would contain a header and a footer. At the short frame length, these would consume most of the frame such that the effective throughput would fall below the throughput realized by spreading.

$$THr = 1 - FER_c = (1 - \mathrm{BER})^{Length} \qquad (35)$$

$$THr = 1 - FER = (1 - \mathrm{BER})^{Length} \qquad (36)$$

$$THr = (1 - \mathrm{BER})^{Length} \qquad (37)$$

| Spread Factor | BER | Length | Throughput |
|---|---|---|---|
| 1 | $10^{-2}$ | 1280 | $10^{-3}$ |
| 1 | $10^{-2}$ | 128 | 0.276 |
| 5 | $4.9 \times 10^{-4}$ | 1280 | 0.107 |

Table 27: Channel Length, BER, Throughput Summary
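The relationship in Equation 37 can be sketched in a few lines. This is an illustrative sketch only. Dividing by the spread factor is an assumption made here, on the reasoning that each data bit occupies SF chips. It is consistent with the 0.276 and 0.107 entries in Table 27. The function names are hypothetical.

```python
def frame_success_probability(ber: float, length: int) -> float:
    """Probability that all `length` bits arrive intact: (1 - BER)^Length (Equation 37)."""
    return (1.0 - ber) ** length


def effective_throughput(ber: float, length: int, spread_factor: int = 1) -> float:
    """Assumed model: frame success probability scaled by 1/SF,
    because each data bit is sent as `spread_factor` chips."""
    return frame_success_probability(ber, length) / spread_factor


# (spread factor, BER, frame length) for two rows of Table 27.
rows = [
    (1, 1e-2, 128),
    (5, 4.9e-4, 1280),
]

for sf, ber, length in rows:
    thr = effective_throughput(ber, length, sf)
    print(f"SF={sf} BER={ber:g} Length={length}: throughput = {thr:.3f}")
```

Under these assumptions, the benefit of spreading comes entirely from the lower BER in the exponent, which outweighs the fixed 1/SF rate penalty once frames become long.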

Figure 4.17: Throughput vs BER Figure 4.18: Spread Impact Figure 4.19: Capacity Behaviors

Maximizing Throughput With Spreading and Length Considerations On Practical Frame Sizes We show in Figure 4.20 the effects of spreading gain and payload length, assuming an 8-bit preamble while varying the spreading factor (SF) between 1 and 5 and the CRC length between 16 and 32 bits. We see that when the SF equals 5, we realize a maximum throughput for each CRC length at a total frame length of approximately 300 bits.

Figure 4.20: Frame Throughput
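The trade-off behind Figure 4.20 can be explored with a simple search over payload sizes. The sketch below is not the model used to generate the figure. It assumes independent bit errors, treats the 8-bit preamble and the CRC as the only framing overhead, and uses the Table 27 BER of $4.9 \times 10^{-4}$ for an SF of 5. The function names `frame_goodput` and `best_frame` are hypothetical.

```python
def frame_goodput(payload_bits, crc_bits, ber, spread_factor, preamble_bits=8):
    """Useful payload bits delivered per transmitted chip (assumed model)."""
    total_bits = preamble_bits + payload_bits + crc_bits
    frame_ok = (1.0 - ber) ** total_bits          # no bit in the frame is wrong
    efficiency = payload_bits / total_bits        # share of the frame carrying payload
    return efficiency * frame_ok / spread_factor  # each bit costs spread_factor chips


def best_frame(crc_bits, ber, spread_factor, preamble_bits=8, max_payload=4096):
    """Exhaustively search payload sizes and return (total_frame_bits, goodput)."""
    best_payload = max(
        range(1, max_payload + 1),
        key=lambda n: frame_goodput(n, crc_bits, ber, spread_factor, preamble_bits),
    )
    total_bits = preamble_bits + best_payload + crc_bits
    return total_bits, frame_goodput(
        best_payload, crc_bits, ber, spread_factor, preamble_bits
    )


for crc_bits in (16, 32):
    total_bits, goodput = best_frame(crc_bits, ber=4.9e-4, spread_factor=5)
    print(f"CRC={crc_bits}: best total frame length = {total_bits} bits, "
          f"goodput = {goodput:.4f}")
```

Under these assumptions the optimum falls at a few hundred bits. Short frames are dominated by preamble and CRC overhead, while long frames are increasingly likely to contain at least one bit error.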

4.8 Mitigation

Numerous techniques have been proposed to counter network covert timing channels, primarily in the form of delay modulation, fragmentation and noise introduction. In each of these cases the hosting facility faces economic disincentives to adoption, as these measures may negatively impact performance and / or Quality of Service.

In Wray [91], the author addressed how random delay insertion adversely affects a channel by altering the expected symbol operating ranges. The viability of this solution may be limited if the two clients are situated within the hosting facility and slaved to the same time master, unless the delay is applied asymmetrically. The skew may also be overcome by adaptive detection methods. Furthermore, injecting variable delays that affect the two clients differently and in isolation presupposes either the unlikely discovery of their coupling or the selection of a brute force mitigation path. Ironically, if noise / delays are implemented, the ability to detect other types of timing channels is reduced due to the challenge of identifying the potential channel from the new noise floor, as in the case of Kolmogorov–Smirnov testing. As such, we find this approach to be unsupportable.

Creating variable fragmentation that leads to variable delay insertion could affect channel performance, albeit at the expense of normal traffic. This technique would need to be applied asymmetrically throughout the server-side host in order for the channel’s discrimination capabilities to break down. Customers that observe the highly fragmented traffic may infer that the hosting business has an inefficient implementation and might question its sophistication. The economic cost of degrading legitimate applications would need to be factored into any decision to adopt this strategy.

In Kang [49], noise was introduced in the form of variable ACK delays as a means to jam a covert timing channel. Given that our experiments target VM intensive data centers, this approach would lack the scalability necessary to be effective unless all traffic were funneled through one intermediary or intermediary cluster. The channel may be successfully degraded if the randomness were based on source IP address, but at the expense of QoS and performance, i.e., if the delay approaches a significant percentage of the retrieval time. If so, it has a corresponding negative effect on normal traffic performance by adversely influencing the Bandwidth Delay Product.

Datacenter proxies that translate encrypted data to plaintext before creating new ciphertext with minimal delay also fail, since they seek exfiltrated data buried in the content, which is not the case here.

In a related manner, one must consider content and functionality-oriented protocols. Lower level protocols such as RTP or unfair TCP stacks are also viable interference source candidates. Generally, if the application and the content/protocol combination is plausible, one would expect the combined profile to be accepted by those protecting the data center.

As mentioned earlier, proprietary datacenters have additional latitude to condition traffic without the economic fallout. In contrast, non-proprietary cloud environments, where QoS as it relates to customer satisfaction / retention is critical, would likely be forced to adopt less aggressive options.

4.9 Related Work

Timing covert channel research typically focuses either on development, excluding capacity and throughput analysis, or on detection. Within development, there is considerable topic breadth. In Bates’ Co-Residency article [3], the authors considered a co-resident watermarking attack pattern in a cloud based virtual machine leveraging an innocuous web server victim, a co-resident flooder, a UDP sink and a participating web client. Using an on/off scheme for flow watermarking, they developed a low-duty channel that operates for 250 msec of every two second period. A 16-bit checksum for every 64-bit block of flooder transmission enabled recovery when needed.

In Luo [59], the authors demonstrated a novel technique that uses a fixed number of TCP packets as the timing channel encoding scheme. Independent of time per se and consequently insensitive to jitter, the next message encoding only occurs after the source node receives the prior message’s ACK sequences. Experimentally, the PlanetLab resident test bed consisted of 9 nodes, each with 17545 Round Trip Times, and yielded an effective maximum data rate of 450 bits/sec. The concern here is that they rely on fragmentation, which is an indicator of poor network performance and may encourage further investigation.

In Yao et al., the authors performed a theoretical assessment of timing channel capacity and compared the results with a deployed construct. Intentionally ignoring the effects of synchronization, error correction and security, they evaluated a channel between two Shanghai Jiao Tong University campuses. Testing consisted of evaluating multiple data sets ranging in size from 64 to 1024 bytes. Their results indicated that error rates decrease as sending time intervals increase. Doubling the interval time reduced the errors by nearly two orders of magnitude.

Detection additionally offers a rich topic set. In Gianvecchio [36], the authors used estimated entropy rate and corrected conditional entropy to detect covert timing channels applied to HTTP traffic. Both tests were combined due to the unique sensitivity of each test. Combined, they were successful in detecting all the tested covert timing channels. They achieved a 0.01 False Negative rate and a True Positive (1-False Positive) of 1. This success rate results from the complementary capabilities of the two tests. The entropy test is sensitive to small changes in the distribution, but it fails when a covert timing channel’s distribution closely approximates that of legitimate traffic. In that case, the corrected conditional entropy test, which measures traffic ’regularity’, retains high accuracy. In contrast, if the distributions differ, the corrected conditional entropy test fails, yet the distribution changes are detectable by the entropy test.

In Cabuk [13], the authors implemented a storage channel between Purdue and Georgetown Universities and applied two timing channel detection techniques, ’e-similarity’ and compressibility. With respect to ’e-similarity’, a majority vote was applied to inter-arrival times over a set of seven different ’e’ values. They observed that the compressibility of inter-arrival packet times will be greater than with normal traffic channels. ’e-similarity’ failed to detect the timing channel in environments where the noise level reached 10%. On the other hand, compressibility additionally detected WWW and FTP-Data mix data sets in environments with noise levels of 25%.

4.10 Conclusion

We demonstrated the viability of a co-residency based covert timing channel using BER and capacity as the primary measures of effectiveness. The channel functioned reliably throughout its 24 day, 24x7 operation in a major university’s Computer Science department data center that supports over 1000 virtual and physical nodes. The channel exhibited a data rate of one bit/sec that is scalable proportionally to $1/T_r$, where $T_r$ is the content retrieval time. We characterized the channel’s performance in the presence of large dynamic network load swings and included an analysis of the impact of time synchronization and spreading gain. Finally, we introduced a decoding algorithm where knowledge of truth was unnecessary.

Chapter 5

5 Thesis Conclusion

We have demonstrated three covert / side channel attacks which support sensitive data exfiltration on Android™ and cloud platforms, all exploiting the unprotected platforms’ shared resources. Despite the presence of the Android Security Framework, we have attacked the permissionless attributes of sensors, targeting the accelerometer to move data between apps and the magnetometer as a receiver for location identification. The cloud attack targets the shared resources of a hosting platform, specifically the network transmit buffer of the unwitting victim, which affects HTTP query response timing.

We have also studied user download behaviors, concluding that installation of our channels’ malicious apps is realistically achievable. Surprisingly, it appears that users are more cavalier than anticipated in their download decisions.

Our location privacy attack is predicated on a seemingly innocuous app receiving and exfiltrating location information obtained indirectly from an external source despite efforts to suspend all location acquisition and supporting services such as GPS, cellular, Wi-Fi, Bluetooth and NFC. Source locations may include stores, malls, railways, airports, hotels, cross-walks and bus stations. A location resident system encodes a unique ID that references position data such as GPS coordinates and transmits it via magnetic field manipulation. The victim’s local magnetometer, available to any app, functions as the receiver and detects the ambient magnetic field disturbances caused by the encoded pattern in the presence of motion and other environmental noise. The pattern’s payload is transmitted off-board the Android device at a later time when communication services are enabled. We can therefore establish a partial history of device locations despite the user’s effort to prevent tracking, short of powering off the device. We are able to determine the location ID 86% of the time with a bit error rate of 1.5% using an uncoded pattern. We could reach an identification rate of 94.8% with a coding scheme designed for single bit error correction.

We have also created an ultrasonic, permissionless bridge between two co-resident Android apps using the speaker as the acoustic source and the accelerometer as the receiver. The MEMS sensor’s resonance behavior is exploited as an alternative to the microphone, which requires permission. Information is extracted by one app which is granted permission to access sensitive information but is blocked from external access. A second app is allowed external access but is prevented by Android protections from direct access to the sensitive information. This bridge enables sensitive information to flow to an eventual off-board destination. The Android Security Framework places no restraints on the bridge and raises no alerts to the victim. We achieved bit error rates of $10^{-4}$ with channel capacity approaching 40 bits per second when applying performance boosting techniques such as a MIMO-like dual channel configuration and an Amplitude Shift Keying modulation scheme. These performance levels are very reasonable for acquiring personally identifiable and other sensitive information.

Finally, we considered an alternative family of channels that exploit cloud co-residency. Here we described a stealthy channel where a hostile client-server application pair, masquerading as a legitimate hosted site with valuable content, exploits shared resources on a cloud server. This channel, built on out-of-the-box libraries and application configurations, executed continuously for 24 contiguous days in a major university Computer Science department datacenter. We emphasized stealth via increased randomization access and obfuscation via content diversity over conventional channels, and mitigated the high error rate environment using spreading gain to achieve throughputs of the same order of magnitude as the Shannon capacity while achieving worst case BERs of 2% with frame lengths of 1280 bits. The BER improved by two orders of magnitude with the application of spreading gain.

References

[1] Jonathan A. Obar. The biggest lie on the internet: Ignoring the privacy policies and terms of service policies of social networking services. 01 2016.

[2] A. Al-Haiqi, M. Ismail, and R. Nordin. A new sensors-based covert channel on android. The Scientific World Journal, 2014.

[3] Adam Bates, Benjamin Mood, Joe Pletcher, Hannah Pruse, Masoud Valafar, and Kevin Butler. Detecting co-residency with active traffic analysis techniques. In Proceedings of the 2012 ACM Workshop on Cloud Computing Security Workshop, CCSW ’12, 2012.

[4] Adam Bates, Benjamin Mood, Joe Pletcher, Hannah Pruse, Masoud Valafar, and Kevin Butler. On detecting co-resident cloud instances using network flow watermarking techniques. International Journal of Information Security, 2013.

[5] Robert Beiter and James Talley. High-frequency audiometry above 8000 hz. In Audiology/Audiologie, pages 207–214, May 1976.

[6] Karissa Bell. How apple’s app store turned into a scammer’s paradise. https://mashable.com/2017/06/12/apple-app-store-subcription-scams/#HOFG5zoMeqqN, 2017.

[7] Vincent Berk, Annarita Giani, and George Cybenko. Detection of covert channel encoding in network packet delays. Technical report, Dartmouth College, 2005.

[8] A. Bianchi, J. Corbetta, L. Invernizzi, Y. Fratantonio, C. Kruegel, and G. Vigna. What the app is that? deception and countermeasures in the android user interface. In 2015 IEEE Symposium on Security and Privacy, 2015.

[9] Eoin Blackwell. Fitness app strava published ’heat map’ details about secret military bases. https://www.huffingtonpost.com/entry/fitness-app-strava-published-heat-map-details-about-secret-military-bases_us_5a6e9f6be4b01fbbefb3315f, Jan 2018.

[10] Jim Blasingame. Mobile computing will dominate your future. https://www.nasdaq.com/article/mobile-computing-will-dominate-your-future-cm933354.

[11] Richard W Bohannon. Comfortable and maximum walking speed of adults aged 20 to 79 years: reference values and determinants. Age and ageing, 26(1):15–19, 1997.

[12] Kevin Borders and Atul Prakash. Web tap: Detecting covert web traffic. In Proceedings of the 11th ACM Conference on Computer and Communications Security, CCS ’04, pages 110–120, 2004.

[13] Serdar Cabuk, Carla E. Brodley, and Clay Shields. Ip covert timing channels: Design and detection. In Proceedings of the 11th ACM Conference on Computer and Communications Security, CCS ’04, 2004.

[14] Caroline Cakebread. You’re not alone, no one reads terms of service agreements. http://www.businessinsider.com/deloitte-study-91-percent-agree-terms-of- service-without-reading-2017-11.

[15] Pew Research Center. Mobile fact sheet. http://www.pewinternet.org/fact-sheet/mobile/, 2017. (visited: 2017-02-22).

[16] Keith Collins. Google collects android users’ locations even when location services are disabled. 2017.

[17] Magnetic Shield Corporation. How do magnetic shields work. www.magnetic-shield.com/pdf/how_do_magnetic_shields_work.pdf.

[18] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley and Sons Inc., 1991.

[19] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, 2006.

[20] Anupam Das, Nikita Borisov, and Matthew Caesar. Do you hear what i hear?: Fingerprinting smart devices through embedded acoustic components. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS ’14. ACM, 2014.

[21] Luke Deshotels. Inaudible sound as a covert channel in mobile devices. In 8th USENIX Workshop on Offensive Technologies (WOOT 14), San Diego, CA, 2014. USENIX Association.

[22] Sanorita Dey, Nirupam Roy, Wenyuan Xu, Romit Roy Choudhury, and Srihari Nelakuditi. AccelPrint: Imperfections of Accelerometers Make Smartphones Trackable. In Proceedings of NDSS, 2014.

[23] Quang Do, Ben Martini, and Kim-Kwang Raymond Choo. Exfiltrating data from android devices. Computers and Security, 48, 2015.

[24] Mahmut Eksioglu and Ali Iseri. An estimation of finger-tapping rates and load capacities and the effects of various factors. Human Factors, 57(4):634–648, 2015.

[25] Endevco. Practical understanding of key accelerometer specifications TP328. https://www.endevco.com/news/archivednews/2009/2009_09/TP328.pdf, 2009. (visited: 2016-10-16).

[26] Ericsson. Ericsson Mobility Report. https://www.ericsson.com/assets/local/mobility-report/documents/2016/ericsson-mobility-report-november-2016.pdf, 2016. (Retrieved January 22, 2017).

[27] Benyamin Farshteindiker, Nir Hasidim, Asaf Grosz, and Yossi Oren. How to phone home with someone else’s phone: Information exfiltration using intentional sound noise on gyroscopic sensors. In 10th USENIX Workshop on Offensive Technologies (WOOT 16), 2016.

[28] Jose Fernandes. App marketing: how to increase the downloads of your app. https://bloomidea.com/en/blog/app-marketing-how-increase-downloads-your-app.

[29] Seth Fiegerman. Congress grilled facebook’s mark zuckerberg for nearly 10 hours. what’s next? http://money.cnn.com/2018/04/12/technology/facebook-hearing-what-next/index.html.

[30] Gina Fleming. The state of consumers and technology: Benchmark 2017, us! https://cdn2.hubspot.net/hubfs/197229/The-State-Of-Consumers-And-Technology_-Benchmark-2017_-US%20(1).pdf, 2017.

[31] Fortune. Uber ditches tracking feature after concern over customer privacy. http://fortune.com/2017/08/29/uber-app-privacy-location-data/, 2017.

[32] Rebecca Frimenko, Cassie Whitehead, and Dustin Bruening. Electric and magnetic fields associated with the use of electric power. Technical report, National Institute of Environmental Health Sciences, National Institutes of Health, 2002.

[33] Rebecca Frimenko, Cassie Whitehead, and Dustin Bruening. Do men and women walk differently? a review and meta-analysis of sex difference in non-pathological gait kinematics. Technical report, INFOSCITEX CORP DAYTON OH, 2014.

[34] Gartner. Gartner says worldwide sales of smartphones recorded first ever decline during the fourth quarter of 2017. https://www.gartner.com/newsroom/id/3859963.

[35] W. Gasior and L. Yang. Exploring covert channel in android platform. In Cyber Security (CyberSecurity), 2012 International Conference on, pages 173–177, 2012.

[36] S. Gianvecchio and Haining Wang. An entropy-based approach to detecting covert timing channels. Dependable and Secure Computing, IEEE Transactions on, 2011.

[37] Ashley Gold. Facebook’s privacy woes grow in washington. https://www.politico.com/story/2018/06/04/david-cicilline-facebook-zuckerberg-lied-to-congress-620656.

[38] United States Government. 1988 anthropometric survey of u.s. personnel: Summary statistics interim report. http://www.dtic.mil/dtic/tr/fulltext/u2/a209600.pdf, 2006.

[39] Brandon Gozick, Kalyan Pathapati Subbu, Ram Dantu, and Tomyo Maeshiro. Magnetic maps for indoor navigation. IEEE Transactions on Instrumentation and Measurement, 60(12):3883 – 3891, 2011.

[40] Manuel Gunther, Laurent El Shafey, and Sebastien Marcel. 2d face recognition: an experimental and reproducible research survey. 2017.

[41] Mordechai Guri, Assaf Kachlon, Ofer Hasson, Gabi Kedma, Yisroel Mirsky, and Yuval Elovici. Gsmem: Data exfiltration from air-gapped computers over gsm frequencies. In 24th USENIX Security Symposium (USENIX Security 15), pages 849–864, Washington, D.C., 2015. USENIX Association.

[42] Michael Hanspach and Michael Goetz. On covert acoustical mesh networks in air. JCM, 8:758–767, 2013.

[43] W. C. Huffman and Richard A. Brualdi. Handbook of Coding Theory. Elsevier Science Inc., New York, NY, USA, 1998.

[44] K. Huguenin, I. Bilogrevic, J. S. Machado, S. Mihaila, R. Shokri, I. Dacosta, and J. P. Hubaux. A predictive model for user motivation and utility implications of privacy-protection mechanisms in location check-ins. IEEE Transactions on Mobile Computing, 17(4), 2018.

[45] C. Hunger, M. Kazdagli, A. Rawat, A. Dimakis, S. Vishwanath, and M. Tiwari. Understanding contention-based channels and using them for defense. In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pages 639–650, Feb 2015.

[46] IDC. Worldwide smartphone volumes relatively flat in q2 2016 marking the second straight quarter without growth. Press Release, July 2016. http://www.idc.com/getdoc.jsp?containerId=prUS41636516.

[47] Weiwei Jiang, Denzil Ferreira, Jani Ylioja, Jorge Goncalves, and Vassilis Kostakos. Pulse: Low bitrate wireless magnetic communication for smartphones. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp ’14, pages 261–265, New York, NY, USA, 2014. ACM.

[48] R. Jin, L. Shi, K. Zeng, A. Pande, and P. Mohapatra. Magpairing: Pairing smartphones in close proximity using magnetometers. IEEE Transactions on Information Forensics and Security, 11(6):1306–1320, June 2016.

[49] Myong H. Kang and Ira S. Moskowitz. A pump for rapid, reliable, secure communication. In Proceedings of the 1st ACM Conference on Computer and Communications Security, CCS ’93, 1993.

[50] Ke-Lin Du and M. N. S. Swamy. Wireless Communication Systems: From RF Subsystems to 4G Enabling Technologies. Cambridge University Press, New York, NY, USA, 2010.

[51] Donald E. Knuth. Two notes on notation. The American Mathematical Monthly, 99(5), 1992.

[52] Tom Krazit. Plenty of growth in store for cloud computing, according to gartner, will hit $186b in 2018. https://www.geekwire.com/2018/plenty-growth-store-cloud-computing-according-gartner-will-hit-186b-2018/.

[53] J. F. Lalande and S. Wendzel. Hiding privacy leaks in android applications using low-attention raising covert channels. In Availability, Reliability and Security (ARES), 2013 Eighth International Conference on, 2013.

[54] Junmee Lee et al. Behavioral hearing thresholds between 0.125 and 20 khz using depth-compensated ear simulator calibration. In Ear and Hearing, 33(3), Jul 2016.

[55] Eric Lichtblau. Police Are Using Phone Tracking as a Routine Tool. w.nytimes.com/2012/04/01/us/police-tracking-of-cellphones-raises-privacy-fears.html, 2012. (visited: 2017-11-22).

[56] Eric Lichtblau. Police are using phone tracking as a routine tool. The New York Times, 2012.

[57] S. L. Lim, P. J. Bentley, N. Kanakam, F. Ishikawa, and S. Honiden. Investigating country differences in mobile app user behavior and challenges for software engineering. IEEE Transactions on Software Engineering, 41(1):40–64, Jan 2015.

[58] Shu Lin and Daniel J. Costello. Error Control Coding, Second Edition. Prentice-Hall, Inc., 2004.

[59] Xiapu Luo, E.W.W. Chan, and R.K.C. Chang. Clack: A network covert channel based on partial acknowledgment encoding. In Communications, 2009. ICC ’09. IEEE International Conference on, 2009.

[60] Jennifer Lynch and Andrew Crocker. The supreme court finally takes on law enforcement access to cell phone location data: 2017 in review. 2017.

[61] Claudio Marforio, Hubert Ritzdorf, Aurélien Francillon, and Srdjan Capkun. Analysis of the communication between colluding applications on modern smartphones. In Proceedings of the 28th Annual Computer Security Applications Conference, ACSAC ’12, 2012.

[62] Marvasti. Nonuniform sampling: theory and practice. Kluwer Academic/Plenum Publishers, New York, 2001.

[63] N. Matyunin, J. Szefer, S. Biedermann, and S. Katzenbeisser. Covert channels using mobile device’s magnetic field sensors. In 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), pages 525–532, Jan 2016.

[64] Yan Michalevsky, Dan Boneh, and Gabi Nakibly. Gyrophone: Recognizing speech from gyroscope signals. In 23rd USENIX Security Symposium (USENIX Security 14). USENIX Association, 2014.

[65] Yan Michalevsky, Aaron Schulman, Gunaa Arumugam Veerapandian, Dan Boneh, and Gabi Nakibly. Powerspy: Location tracking using mobile device power analysis. In Jaeyeon Jung and Thorsten Holz, editors, USENIX Security Symposium, pages 785–800. USENIX Association, 2015.

[66] Christopher Mims. Your location data is being sold—often without your knowledge. Mar 2018.

[67] A. Mosenia, X. Dai, P. Mittal, and N. Jha. PinMe: Tracking a Smartphone User around the World. ArXiv e-prints, February 2018.

[68] Jason Murdock. Who is tracking d.c. cell phones? homeland security confirms finding surveillance devices in washington, 2018.

[69] S. Narain, T. D. Vo-Huu, K. Block, and G. Noubir. Inferring user routes and locations using zero-permission mobile sensors. In 2016 IEEE Symposium on Security and Privacy (SP), May 2016.

[70] Nielsen. Recommendations from friends remain most credible form of advertising among consumers; branded websites are the second-highest-rated form. http://www.nielsen.com/us/en/press-room/2015/recommendations-from-friends-remain-most-credible-form-of-advertising.html.

[71] NIST. Internet time servers. http://tf.nist.gov/tf-cgi/servers.cgi. ’Last checked 2015-04-15’.

[72] H. Okhravi, S. Bak, and S. T. King. Design, implementation and evaluation of covert channel attacks. In Technologies for Homeland Security (HST), 2010 IEEE International Conference on, pages 481–487, 2010.

[73] A. M. Olteanu, K. Huguenin, R. Shokri, M. Humbert, and J. P. Hubaux. Quantifying interdependent privacy risks with location data. IEEE Transactions on Mobile Computing, 16(3):829–842, March 2017.

[74] Samuel Joseph O’Malley and Kim-Kwang Raymond Choo. Bridging the air gap: Inaudible data exfiltration by insiders. In 20th Americas Conference on Information Systems - AMCIS, 2014.

[75] Tsutomu Oohashi, Emi Nishina, Manabu Honda, Yoshiharu Yonekura, Yoshitaka Fuwamoto, Norie Kawai, Tadao Maekawa, Satoshi Nakamura, Hidenao Fukuyama, and Hiroshi Shibasaki. Inaudible high-frequency sounds affect brain activity: Hypersonic effect. Journal of Neurophysiology, 83(6):3548–3558, 2000.

[76] Colin Percival. Cache missing for fun and profit. In Proc. of BSDCan 2005, 2005.

[77] Giuseppe Petracca, Yuqiong Sun, Trent Jaeger, and Ahmad Atamli. Audroid: Preventing attacks on audio channels in mobile devices. In Proceedings of the 31st Annual Computer Security Applications Conference, ACSAC 2015, pages 181–190, New York, NY, USA, 2015. ACM.

[78] Raghavendra Ramachandra and Christoph Busch. Presentation attack detection methods for face recognition systems: A comprehensive survey. ACM Comput. Surv., 50(1), March 2017.

[79] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In Proceedings of the 16th ACM Conference on Computer and Communications Security, CCS ’09, 2009.

[80] Rob O’Reilly, Alex Khenkin, and Kieran Harney. Sonic nirvana: Using mems accelerometers as acoustic pickups in musical instruments. http://www.analog.com/en/analog-dialogue/articles/mems-accelerometers-as-acoustic-pickups.html, Feb 2009. Accessed: Feb, 2017.

[81] Swaroop S. Genuine play store reviews are getting deleted by google assuming it as fake!, 2017.

[82] Roman Schlegel, Kehuan Zhang, Xiaoyong Zhou, Mehool Intwala, Apu Kapadia, and XiaoFeng Wang. Soundcomber: A Stealthy and Context-Aware Sound Trojan for Smartphones. In Proceedings of the 18th Annual Network & Distributed System Security Symposium (NDSS), 2011.

[83] Ashley Sefferman. Mobile ratings: The good, the bad, and the ugly, 2016.

[84] James C. Simpson, John E. Lane, Christopher D. Immer, and Robert C. Youngquist. Simple Analytic Expressions for the Magnetic Field of a Circular Current Loop. https://ntrs.nasa.gov/search.jsp?R=20010038494, 2001. (visited: 2017-8-20).

[85] Yunmok Son, Hocheol Shin, Dongkwan Kim, Youngseok Park, Juhwan Noh, Kibum Choi, Jungwoo Choi, and Yongdae Kim. Rocking drones with intentional sound noise on gyroscopic sensors. In Proceedings of the 24th USENIX Conference on Security Symposium, SEC’15, 2015.

[86] C. Spearman. The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101, 1904.

[87] Sid Suvarna. Most Popular Android Smartphones of 2016, According to AnTuTu. http://n4bb.com/most-popular-android-smartphones-2016, 2016. (visited: 2016-10-02).

[88] Top101news. Top Ten Best Selling Smartphones in the World. http://top101news.com/2015-2016-2017-2018/news/products/best-selling-smartphones-world/, 2015. (Retrieved October 2, 2016).

[89] Timothy Trippel, Ofir Weisse, Wenyuan Xu, Peter Honeyman, and Kevin Fu. WALNUT: Waging doubt on the integrity of MEMS accelerometers with acoustic injection attacks. In Proceedings of the 2nd Annual IEEE European Symposium on Security and Privacy, April 2017.

[90] Zhenghong Wang and R.B. Lee. Covert and side channels due to processor architecture. In Computer Security Applications Conference, 2006. ACSAC ’06. 22nd Annual, Dec 2006.

[91] J.C. Wray. An analysis of covert timing channels. In Research in Security and Privacy, 1991. Proceedings., 1991 IEEE Computer Society Symposium on, May 1991.

[92] Ping Xiong, Lefeng Zhang, and Tianqing Zhu. Semantic analysis in location privacy preserving. Concurrency and Computation: Practice and Experience, 28(6):1884–1899, April 2016.

[93] Yunjing Xu, Michael Bailey, Farnam Jahanian, Kaustubh Joshi, Matti Hiltunen, and Richard Schlichting. An exploration of l2 cache covert channels in virtualized environments. In Proceedings of the 3rd ACM Workshop on Cloud Computing Security Workshop, CCSW ’11, 2011.

[94] Peili Yang, Shien Ge, and Zhongqi Yang. Establishing covert channel on shared cache architecture. In Natural Computation (ICNC), 2014 10th International Conference on, Aug 2014.

[95] Mengchao Yue, William H. Robinson, Lanier Watkins, and Cherita Corbett. Constructing timing-based covert channels in mobile networks by adjusting cpu frequency. In Proceedings of the Third Workshop on Hardware and Architectural Support for Security and Privacy, HASP ’14. ACM, 2014.

[96] Sameh Zakhary and Abderrahim Benslimane. On location-privacy in opportunistic mobile networks, a survey. Journal of Network and Computer Applications, 103:157 – 170, 2018.

[97] Li Zhang, Parth H. Pathak, Muchen Wu, Yixin Zhao, and Prasant Mohapatra. Accelword: Energy efficient hotword detection through accelerometer. In Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys ’15. ACM, 2015.

[98] K. Sh Zigangirov. Theory of Code Division Multiple Access Communication. Wiley-IEEE Press, USA and Canada, 2004.

[99] Daniel Zwillinger and Stephen Kokoska. CRC Standard Probability and Statistics Tables and Formulae. Chapman and Hall/CRC, Boca Raton London New York Washington, D.C., 2000.
