1 An Empirical Evaluation of Bluetooth-based Decentralized in Crowds Hsu-Chun Hsiao, Chun-Ying Huang, Bing-Kai Hong, Shin-Ming Cheng, Hsin-Yuan Hu, Chia-Chien Wu, Jian-Sin Lee, Shih-Hung Wang, Wei Jeng

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Abstract— is being used by many coun- duration of such exposure, their mobility trajectories over time, tries to help contain COVID-19’s spread in a post-lockdown whether masks were worn, and whether the contact occurred world. Among the various available techniques, decentralized indoors or outdoors. contact tracing that uses Bluetooth received signal strength indi- cation (RSSI) to detect proximity is considered less of a privacy With the escalating scale of the COVID-19 pandemic, the risk than approaches that rely on collecting absolute locations via vast numbers of new cases identified via testing each day GPS, cellular-tower history, or QR-code scanning. As of October exceed the capacity of traditional contact tracing, the failures 2020, there have been millions of downloads of such Bluetooth- of which have been extensively reported [1]. Some local based contract-tracing apps, as more and more countries officially health services have even abandoned their former practice of adopt them. However, the effectiveness of these apps in the real world remains unclear due to a lack of empirical research that directly communicating with the close contacts of a case. Thus, includes realistic crowd sizes and densities. This study aims to governments are turning to digital contact tracing that utilizes fill that gap, by empirically investigating the effectiveness of mobile providers or mobile applications to help identify po- Bluetooth-based contact tracing in crowd environments with a tential contacts by means of GPS data, cell-tower connection total of 80 participants, emulating classrooms, moving lines, and history, QR-code scanning, and so forth. other types of real-world gatherings. The results confirm that Bluetooth RSSI is unreliable for detecting proximity, and that this However, large-scale collection of citizens’ digital footprints inaccuracy worsens in environments that are especially crowded. by governments has sparked concerns about mass surveillance. In other words, this technique may be least useful when it is Accordingly, to enhance the privacy of digital contact tracing, most in need, and that it is fragile when confronted by low- a number of decentralized approaches have been developed, cost jamming. Moreover, technical problems such as high energy capable of detecting close proximity between phones (and consumption and phone overheating caused by the contact- tracing app were found to negatively influence users’ willingness thus, presumably, those phones’ owners) without needing to to adopt it. On the bright side, however, Bluetooth RSSI may report each phone’s location to a centralized server. Many still be useful for detecting coarse-grained contact events, for of them use Bluetooth received signal strength indication example, proximity of up to 20m lasting for an hour. Based on our (RSSI) as a proxy for distance. The core of these apps is the findings, we recommend that existing contact-tracing apps can be periodic broadcasting via Bluetooth of anonymized, frequently re-purposed to focus on coarse-grained proximity detection, and that future ones calibrate distance estimates and adjust broadcast changing tokens (e.g., broadcasting every 100-270 ms and frequencies based on auxiliary information. changing every 15 minutes [2], [3]). These tokens contain no information about the phone or its owner, and each phone Keywords-Private contact tracing, COVID-19, Bluetooth RSSI stores the sent as well as received tokens. Thus, a user, Alice, arXiv:2011.04322v2 [cs.CY] 24 May 2021 who has tested positive can voluntarily publish all the tokens her phone sent during her potential transmission period (e.g., I.INTRODUCTION the past 14 days), such that another user, Bob, can download For decades, contact tracing has been known as an effective the tokens published by infected people and determine for method for controlling the spread of infectious diseases. In himself whether he has encountered any of them, and for how the traditional contact-tracing model, trained personnel use in- long, by matching them to his own phone’s received tokens. person or telephone interviews to identify and list those who Bob can then voluntarily seek help from health authorities have had meaningful exposure to diagnosed individuals during if his risk of exposure is considered high. Many apps of this a disease’s likely transmission period. The risk levels of those kind have built-in risk-estimation models, and may recommend contacts can then be determined based on factors such as the different measures (e.g., self-monitoring, commencing a 14- physical distances between them and the infected people, the day quarantine, or seeking medical attention) based on the risk levels they calculate. H.-C. Hsiao, H.-Y. Hu, C.-C. Wu J.-S. Lee, S.-H. Wang, W. Jeng are with As countries lift their lockdowns and reopen facilities, National Taiwan University, Taipei, Taiwan. there is a rush to deploy these Bluetooth-based decentralized C.-Y. Huang is with National Chiao Tung University, Hsinchu, Taiwan. S.-M. Cheng, B.-K. Hong are with National Taiwan University of Science contact-tracing apps. As of October 2020, some 20 countries and Technology, Taipei, Taiwan. and six U.S. states have adopted them. The EU is currently 2 planning a Europe-wide coronavirus-tracing network, based on example, contacts within 20m that last for at least an hour. The new infrastructure that will enable data-sharing between na- sampled phone users said they were more willing to use the tional contact-tracing apps [4]. In addition, Apple and Google similar apps 1) when in crowded environments and 2) while are planning to jointly release a new system called Exposure contact tracers from health departments were also using them. Notification Express, aimed at local health authorities that have not built their own apps [5]. II.BACKGROUNDAND RELATED WORK Nevertheless, few studies have gauged the effectiveness of We summarize several representative privacy-preserving these apps in practice, with existing evaluations of Bluetooth- contact-tracing methods and previous studies that review them. based contact-tracing techniques relying on either simulation with mathematical models, or experiments involving either mobile devices only, or a very limited number of partici- A. Privacy-preserving contact tracing for COVID-19 pants. Accordingly, the present work empirically investigates There are two major approaches to privacy-preserving con- the effectiveness of Bluetooth-based contact tracing, in two tact tracing. The first adopts a decentralized design intended phases. In Phase 1, we will examine whether Bluetooth RSSI to minimize the amount of data that needs to be sent to a can reliably predict distance in a controlled experimental centralized server. The other applies cryptographic algorithms setting. Then, in Phase 2, based on estimated distances be- to protect sensitive user data. tween pairs of participants’ phones over time, we compare a) Decentralized design: The former approach is exem- detected proximity and contact events in a semi-controlled plified by Safe Path [6], DP3T [7], and Covid-Watch [3]. event: a real-world academic gathering, the ground truth of Safe Path logs a user’s movement routes in his/her mobile which will be carefully recorded. We recruited a total of 80 device and only exports that data to health authorities if that participants to use one Bluetooth-based contact-tracing app, user is diagnosed with the virus. When this happens, the which we modified from Covid-Watch-TCN, in our controlled exported dataset is first redacted to ensure privacy, and then (30 participants) and semi-controlled (50 participants) settings. is broadcast to other users to allow them to self-determine Our modifications allowed us to collect ground-truth data their likelihood of having been exposed. Rather than capturing and the phones’ usage logs. After both experimental phases and storing route information, DP3T and Covid-Watch contin- were completed, we also conducted a followup survey with uously broadcasts ephemeral Bluetooth identities, which are participants to enrich the data we had already obtained. published to other relevant users in the event of a positive This paper will address the following questions: diagnosis, again to aid their individual risk calculations. Many of them have been deployed in the wild, but their effectiveness • RQ1. Can the app reliably estimate the distance between is yet to be investigated, partially due to the lack of empirical its users based on Bluetooth RSSI under different crowd studies. parameters, e.g., standing still vs. walking, with or with- b) Cryptographic algorithms: The second, cryptography- out physical barriers, varying interpersonal distances, and based approach has been adopted by Private Set Intersection the presence of jamming? (PSI) [8] and TraceSecure [9]. PSI enables two parties to • RQ2. How accurately does the app detect proximity compute the intersection of their data in a privacy-preserving and contact events in realistic crowd environments, as way, with only the common data values being revealed. The compared with the ground truth of such events? data used for computation contains only hashed location • RQ3. What are users’ perceptions and experiences of points, and location privacy can therefore be guaranteed. using these apps? TraceSecure, on the other hand, incorporates a public-key Our experimental results confirmed that Bluetooth RSSI based security protocol for message exchange and storage. was unreliable for detecting proximity, and revealed that such An optional homomorphic encryption scheme can be used to inaccuracy worsened in crowded environments. This implies further enhance data protection. While the cryptographic-based that this technique may be least useful when it is most needed, approach guarantees better privacy than the decentralized and fragile when confronted by low-cost jamming. Particu- one, it is less deployable on consumer devices and existing larly, the app failed to capture the majority of the proximity infrastructure. The present study therefore focuses only on events: Only 16 of 67 (24%) proximity events were detected decentralized contact tracing, as being more feasible to deploy by the app when setting a 2-meter distance threshold; and on the massive scale required for this purpose. only 19 out of 67 (28%) were detected even when no distance c) Implementation: Among the decentralized contact- threshold was set. In terms of user experience and perceptions, tracing apps, a number of them use Bluetooth RSSI as a proxy technical problems such as high energy consumption and for close proximity between devices, and therefore, between phone overheating caused by the app were found to negatively the owners of those devices. Early implementations such as the influence users’ willingness to adopt it. On the bright side, DP3T project and Covid-Watch had to confront Bluetooth in- the app captured 63% of contacts lasting one hour in a room teroperability issues between the two major mobile platforms, containing 50 participants and more than 150 other people. iOS and Android. In April 2020, the two major smartphone OS Divided by the operating systems, 80% of the Android devices providers, Apple and Google, announced that they had formed were able to be discovered by both nearby Android and iOS a coalition to release APIs that would help contact-tracing apps devices in about an hour. This implies that this technique may work across iOS and Android devices, with the first such APIs still be useful for detecting coarse-grained contact events, for appearing the following month. Android version 6.0 and iOS 3

13.7 and higher have supported the fundamental functions of ployed in the field, their effectiveness remains an open ques- Bluetooth contact tracing, including broadcasting and listening tion due to the lack of ground-truth information. Prior to our to tokens, at the OS level. To prevent abuse of these APIs, present effort to help fill that gap, an empirical study [17] the companies restricted each country to one official app. was conducted in April 2020 among a group of 48 soldiers in An increasing number of national projects have adopted the Germany. They were divided into five scenarios with different GAEN APIs, including Switzerland’s SwissCOVID, Italy’s moving patterns, with at most 10 people in any one scenario. , and Germany’s Corona-Warn-App. Since our use of A follow-up report documented the experimental protocol it in our experiments, Covid-Watch has also now switched and provided some preliminary analysis, but drew no clear to using the GAEN APIs. We plan to use our proposed conclusions and made no recommendations. methodology to evaluate GAEN-API-based apps once the Leith and Farrell conducted a series of studies [18] to GAEN APIs are enabled in Taiwan. empirically measure RSSI between mobile phones indoors and d) Privacy concerns: Tracing contacts by accessing de- outdoors, as well as on a bus and a tram, and considered vices’ relative locations (e.g., using Bluetooth signal reception) factors that could affect such signal strength, including dis- is considered more protective of user privacy than capturing tance, phone orientation, and absorption and/or reflection by absolute locations (e.g., via GPS). Several prior studies have surroundings such as building walls or even human bodies. investigated privacy threats such as replay attacks and de- Their follow-up measurement study recruited five participants anonymization attacks by state-level or resourceful adver- on a commuter bus to investigate the relationship between saries [10], [11], or have proposed advanced cryptographic Bluetooth attenuation and distance in an environment prone to solutions to enhance privacy [12]. While detailed privacy signal reflection. They made several recommendations for in- concerns are beyond the scope of our empirical study, we creasing the accuracy of Bluetooth-based contact tracing, such feel it should be noted that privacy enhancement beyond a as leaving phones on tables instead of keeping them in bags or certain level is likely to degrade the detection accuracy and pockets. Our study considers scenarios involving much larger other aspects of the performance of contact-tracing apps. Our groups of participants as well as jamming devices, which methodology and protocols used in this study will be useful in allows us to simulate crowded scenarios and to observe issues assessing whether the negative performance impacts of future that might only occur only or mostly in large groups of people, privacy-enhancement efforts outweigh their benefits. e.g., rapid battery depletion, interference, and interoperability problems between different phone models. In addition, we also collected users’ feedback after using a Bluetooth-based B. Evaluation of Bluetooth-based Contact Tracing contact-tracing app, particularly their perceptions and concerns Broadcasting of anonymized tokens to nearby devices is regarding privacy and usability. fundamental to the implementation of privacy-preserving con- Since our focus is on Bluetooth-related issues, the following tact tracing. In contemporary smartphones, the most widely studies were also important to our thinking, despite being deployed techniques that support such broadcasting are Blue- beyond the scope of our own research. Existing exposure- tooth and Wi-Fi. Of the two, however, developers prefer to notification apps often feature fixed thresholds for identifying use Bluetooth because it was originally designed to function contact events and calculating exposure risks. For example, if within ad hoc networks, and because it has also been widely a user has been in close proximity with a confirmed patient used for distance measurement and indoor positioning [13], for a sufficiently long time (e.g., 15 minutes), that user will [14]. By reading an RSSI reported by a receiver, an application be warned of potential exposure by the app. Wilson et al. [19] can estimate the distance between the receiving and sending proposed a calibrated measure of infection risk based on devices; and indoor positions can be calculated based on three empirical measurements, and devised a risk-scoring system or more RSSIs from fixed-location broadcasters or beacons. that aims to provide better quarantine recommendations. Some a) RSSI-based distance estimation: The major drawback scholars have evaluated the effectiveness of contact tracing to RSSI-based distance estimation is variation in its measure- via mathematical modeling and simulation, and compared it ment results, which can be caused by various environmental against other countermeasures such as or factors such as interference, emission power, and receiver lockdowns [20], [21]. Our findings can provide more realistic sensitivity, all of which introduce noise. Studies [15], [16] have parameters for contact tracing that can assist the refinement of shown that distances derived from RSSIs without calibration such models and simulations. can be quite diverse, even in a controlled-experiment scenario with no human participants and with the same settings on all devices. Some have proposed to augment Bluetooth with C. Worldwide Government Practices and User Perceptions other sensors to improve accuracy [14]. Nevertheless, we felt Despite the security and privacy issues involved in the that Bluetooth-based distance estimation has the potential to adoption of digital contact tracing, as of mid-October 2020, a provide helpful information to pandemic investigators, and considerable number of national and local governments have tested its performance for this purpose with human subjects either already introduced such measures in the fight against in controlled and uncontrolled settings, as explained in Sec- COVID-19, or are planning to do so. tion III. On the flipside, a relatively small number of govern- b) Empirical evaluation of contact tracing: Although ments have launched new human-based tracing services or many Bluetooth-based contact-tracing apps have been de- announced improvements to existing ones, and remain on 4 the fence regarding the adoption of digital contact tracing, On starting up, our modified app prompted each participant repeatedly citing concerns about uptake rates and false pos- to enter the unique ID that was assigned to him/her at the itives/negatives. The UK, for instance, is about to launch a beginning of the experiment, and to log device information coronavirus app that enables users to report symptoms and including the running operating system and phone model. book tests, but does not allow contact tracing [22]. While running, the app logged all the sent and received Several large-scale questionnaire surveys have been con- tokens, along with RSSI values, timestamps, and phone battery ducted to capture phone users’ general perceptions toward status. Specifically, when a device transmitted a token, the contact-tracing mobile apps. A multi-country survey of Eu- app logged the sent token with the unique ID of the device, rope and North America has shown high user acceptance of the current battery status, and a timestamp. When a token downloading such apps (74.8%) [23]. However, the results of was received from another participant’s device, on the other another survey, conducted in the U.S., suggest that support for hand, the app logged that received token with measured RSSI the policy of encouraging use of these apps is relatively weak and calculated distance; the unique ID of the device; and a (42%), as compared to traditional measures; but also that the timestamp. At the end of each task, the participants were asked implementation of decentralized data storage helps increase to click on a “Submit” button to upload the logged information acceptance [24]. A team in Jordan, meanwhile, reported that to our backend server. How the logged information was 71.6% of their respondents accepted the use of contact- processed for further analysis will be explained in Section IV. tracing technology, but only 37.8% actually used it [25]. It is noteworthy, therefore, that all of the respondents to our post- B. Phase I: Controlled Experiment study survey had actually experienced using such an app, and We assigned each participant a position and movement were thus able to provide meaningful, app-specific answers pattern (see Figure 2), such that the actual and Bluetooth- about their usage experience and privacy perceptions. RSSI-estimated distances between each pair of participants can be calculated and compared at multiple time-points. a) Research site and participants: The controlled experi- III.RESEARCH DESIGN ment, conducted in July 2020, comprised two scenarios: indoor Our research design comprises four elements, as shown in and outdoor. The indoor scenario utilized an empty classroom Table I. These are: 1) modification to the Covid-Watch-TCN 1, 411ft2 (131m2) in size, and the outdoor scenario, a covered mobile application; 2) a controlled experiment with 30 partic- patio measuring 5, 413ft2 (503m2). Both of them are 9.94ft ipants; 3) semi-controlled experiment with 50 participants in (3.03m) in height. the wild; and 4) a followup user survey administered to the We recruited 30 college students from our institutions. All participants from both experiments. the participants brought their personal mobile devices. Among these 30 devices, 14 were Android and the remainder, iOS. In both scenarios, we physically labeled each participant with A. App Customization and Configuration a unique ID and marked the floor with tape. Before starting To evaluate the effectiveness of Bluetooth-based decentral- the experiment, we provided instructions to all participants ized contact tracing, our experiments collected information explaining the research purpose and the overall experiment that would help us reconstruct the ground truth required for flow, and collected informed consent from all of them. Figure 1 conducting comparative analysis: e.g., the sender of each provides an overview of the experiment settings and process. token, and the distance between each pair of participants. b) Protocol: The indoor scenario was broken down into Some of this information was collected by the mobile app and five sessions, and the outdoor one into three, as shown in reported to a backend server; some was pre-assigned based on Table II. Both scenarios included three sessions—i.e., sessions our protocol; and some was derived from direct observation. 1, 3, and 4 in the indoor scenario, and sessions 6, 7, and 8 This subsection describes how we customized and configured in the outdoor scenario—that required the participants to hold an open-source app, and the following two subsections present their devices in hands and i) stand still, ii) equidistantly walk our protocol and observational approaches, respectively. in a given area, or iii) gradually move closer to each other. We modified the source code of Covid-Watch-TCN Expo- The other two sessions in the indoor scenario required the sure Notification App [3], whose Android and iOS versions are participants to stand still in a jammed environment charac- both open-source and can be found on GitHub. We modified terized by continuous sending of useless data by Bluetooth the version based on the TCN Coalition’s implementation. beacons (Session 2), and near a wall that divided the par- Since our use of it in our experiments, Covid-Watch has also ticipants into two groups (Session 5). To set up the jammed now switched to using the GAEN APIs, which nevertheless are environment, we placed six RaspberryPi 3 model b devices in not enabled in Taiwan. To minimize interference with the app’s the same room. Each of these devices emitted a unique, useless main functionality, we did not modify its token generation data via Bluetooth every 20 ms, i.e., five times faster than a algorithm, where tokens are sent every 100 ms and changed normal device. This emulates two cases: 1) the existence of every 15 minutes, but only locally logged data, and sent the a malicious user jamming the wireless channels by sending logs back to our server at the end of each task. We manually tokens at a higher frequency, and 2) a very crowded place inspected the app’s logic to identify the Android or iOS system containing an additional 30 (= 6 ∗ 5) phones. API calls that created or sent tokens, and inserted our logging In all, the eight sessions comprised 28 one-minute tasks. code before such calls. The multiple tasks within a session were set up to test how 5

TABLE I: Summary of our methodology RQ Element Sample Description Collected Data N/A App N/A The app was built on the source code of the Covid-Watch Exposure •Operating system customization Notification App. Device information was captured when a participant •Phone model and configuration registered as a user. RQ1 Controlled exper- 30 participants: The experiment was conducted in both indoor and outdoor scenarios. •Timestamp iment 14 Android devices; Participants held their devices and followed instructions to stand still, •Battery status 16 iOS devices walk along assigned paths, move closer to one another, etc. •Sent/Received tokens •Unique device IDs RQ1 Semi-controlled 50 participants: The experiment was performed during a conference in an auditorium. •RSSI values experiment 24 Android devices; Participants carried their devices when they were listening to a speech, •Calculated distances 26 iOS devices having a group discussion, taking a tea break, etc. •Ground-truth records RQ3 User survey 24 respondents A follow-up survey was developed to reveal the participants’ 1) technical •Survey results problems encountered during both experiments, 2) willingness to use contact-tracing apps, and 3) privacy concerns.

(a) The floor marked with tapes (b) The indoor scenario (a classroom) (c) The outdoor scenario (a covered patio) Fig. 1: Settings and participants in the controlled experiment

TABLE II: Session details of the pre-test. Note. d = 1.5 meters.

Scenario Task # OS(es) Distance(s) Session 1: Stand still 1-3 Android 0.5d, 1.0d, 1.5d4-6iOS 7-9 Mixed Session 2: Stand still in a jammed environment 10-12 Mixed 0.5d, 1.0d, 1.5d Indoor Session 3: Equidistantly walk in a given area 13-15 Mixed 0.5d, 1.0d, 1.5d Session 4: Gradually move closer 16 Mixed 0.5d Session 5: Stand still, and separated by a wall 17 Mixed 1.0d Session 6: Stand still 18-20 iOS 0.5d, 1.0d, 1.5d21-23Android 24-26 mixed Outdoor Session 7: Equidistantly walk in a given area Fig. 2: Example of a slide showing participants their positions 27 Mixed 1d Session 8: Gradually move closer and movement paths for a specific task. The letters represent 28 Mixed 0.5d users; the green circles, Android users; the black circles, iOS users; and the numerals, positions where each user stands.

Bluetooth signal propagation varied across devices 1) running different operating systems (i.e., Android, iOS, and mixed) • Distances. To simulate a real-world setting in which and 2) held at different distances from one another (i.e., 0.5d, people may or may not maintain social distancing, we 1d, 1.5d, where d = 1.5meters). asked the participants in sessions 1, 2, 3, and 6 to keep • Operating systems. In sessions 1 and 6, where the partic- a distance of 0.5d, 1.0d, or 1.5d (d=1.5 m) from each ipants stood still in both the indoor and outdoor scenar- other. This yielded data that subsequently allowed us to ios, they were first grouped by their devices’ operating compare the estimated distances generated by the app systems. After completing the tasks (i.e., tasks 1-6 and against the ground-truth distances, and thus to evaluate 18-23), the participants were taken out of these operating- the accuracy of the app’s proximity-detection techniques. system-based groups, and the experiment continued, any Due to time limitations, sessions 4, 5, 7, and 8 were further groupings being randomized. conducted with the participants attempting to maintain 6

a single fixed distance of 0.5d, 1.0d, 1.0d, and 0.5d, in, and Other. We also asked a yes/no question regarding respectively. whether the respondents had encountered upload failure during At the beginning of each task, the participants were the experiments. prompted to turn on the app, as well as their devices’ Bluetooth b) Section 2: Willingness to Use the Contact-tracing and GPS. We required all users to turn on GPS during the App: Section 2 aimed to capture how different technical experiment for consistency, because for Android version 6.0 factors and situations influenced the participants’ willingness and above, location services (e.g., GPS) need to be enabled to use the contact-tracing app. Its questions were divided into when performing Bluetooth scanning [26]. During each task, two groups. a series of slides would show the participants their assigned In the first, each of the seven answer options from Section positions and/or movement paths (Fig. 2). Once a task ended, 1 regarding technical problems was repeated, along with the the participants were asked to manually upload the logged data question, Will this technical problem affect your willingness to our backend server using the app. to use the app in the next six months? The respondent was then asked to select how much each problem s/he had selected C. Phase II: Semi-controlled Experiment would affect such willingness, on a three-point Likert-scale ranging from -2=gradually decrease to 0=not influenced, Participants were allowed to move freely within a large though an answer of N/A could also be given in place of a auditorium, and proximity events and contact events were scaled response. reconstructed based on video footage and on-site direct ob- The second group of items in Section 2 asked the respon- servation. dents to select how various non-technical conditions would a) Research sites and participants: We additionally con- affect their willingness to use the app over the following ducted a semi-controlled experiment at a cybersecurity sum- six months. These conditions were Regulation by law or mer school, which was held from July 27 through August 2, my school; Social influence from my family or colleagues; 2020 in a campus auditorium measuring roughly 4, 259ft2 2 Planning a trip domestically or abroad; Entering a crowd of (396m ) with a seating capacity of 250. more than 100 people; and Current use of such apps by epi- Out of the 216 attendees, we recruited 50 participants: 24 demic investigators. These were rated on a five-point Likert- were using Android devices and the other 26, iOS ones. The scale ranging from -2=gradually decrease to +2=gradually participants could be told apart from the other attendees by increase, plus an N/A option. their differently colored lanyards. c) Section 3: Privacy Considerations: In the final section b) Protocol: On the first day, we asked participants to of the questionnaire, the respondents were first asked about turn on our research app, Bluetooth, and GPS for at least 90 when and why they usually turned on their devices’ Bluetooth minutes, starting around 6:30pm; and on the second day, this and GPS functions. Then, they rated the statement My data was increased to 150 minutes, starting around 5pm. At the are secure and my privacy is protected while using the app points when the participants were asked to turn on the app, on a five-point Likert scale ranging from -2=strongly disagree they might be sitting still and listening to a speech, divided to +2=strongly agree, again with an N/A option. If a person’s into groups and taking part in discussions, or having a tea response revealed a negative attitude toward the app’s privacy break outside the auditorium. and security, i.e., was lower than 0, s/he would further be On the first day, one conference staff member was secretly asked to select from among the following list of five data- assigned to be “the source of the virus”. This individual turned security and privacy concerns: Data being tampered with; The on the app on his device, randomly passed by the participants, app developer or associates may take advantage of security and recorded these actions with a GoPro camera so that after weaknesses; The app developer or associates may use my data the experiment we could reconstruct his close contacts. In for other purposes; and My identity past contacts, or past addition, to enhance the analyzability of the collected sig- locations may be recognized. nals, four researchers observed and manually documented the We sent out the questionnaire to all 80 participants, but ground truth of proximity events in the auditorium, including in fact this represented only 78 individuals, as two had the IDs of the participants involved and the times when they participated in both experiments. Of these 78, 24 completed occurred. the questionnaire: a response rate of 30.8%.

D. User Survey E. Ethics After the experiments, we administered a questionnaire. Its This study was reviewed and approved in July 2020 by Na- three sections covered 1) technical problems the participants tional Taiwan University’s Research Ethics Office (equivalent had encountered during the experiments, 2) their attitudes to an Institutional Review Board in North America), and meets toward the use of the contact-tracing app, and 3) their attitudes all criteria for minimal-risk research (#202006HS001). toward personal privacy. a) Section 1: Technical Problems: In this section, the respondents could choose to agree with any or all of the F. Open Research Data following seven statements: Phone overheating, Seriously in- We will open the participant instructions shortly after pub- creased energy consumption, App crash, Unstable receiving lication, so that other research teams can reuse our protocols token, Phone performance negatively affected, Couldn’t log or reproduce our research. 7

IV. RESULT A. Data Processing and Analysis Based on the participants’ devices’ logs, we reconstructed a directed multigraph, on which a vertex represents a participant, and an edge from A to B represents a token sent by A and received by B. Each edge is labeled with a unique tuple (token, RSSI, timestamp) representing the corresponding token’s RSSI value and timestamp. Because tokens are sent every 100ms and changed every 15 minutes, the same token may be seen multiple times and have different RSSI values and timestamps. Tokens missing either sender or receiver information were removed. Missing sender information could have been caused Fig. 3: Transmission and reception statistics of Android and by technical glitches (e.g., device malfunctioning, phone over- iOS devices heating and network congestion), while missing receiver infor- mation could have been caused by any of the same factors, or simply by no device having received them. Tokens with 1) The Influence of Operating Systems: Figure 3 represents non-negative RSSI values or unrecognized sender/receiver IDs the transmission and reception statistics for all the devices were also removed. In all, around 524,000 tokens, representing used during the controlled experiment, classified by operating 86% of the total received, were removed. system. It shows that there was a significant difference between Because the participants all used their own devices, it was Android and iOS devices’ token-transmission capabilities. not possible for us to determine the root causes of all technical Most of the Android devices were able to transmit tokens to glitches; nor can we be certain that they were not specific both Android and iOS devices directly, while all but one of to our experiments. However, app-store reviews and news the iOS devices relied on nearby Android devices to broadcast reports reveal that many similar apps have struggled to resolve tokens to others. However, two out of nine Android devices similar glitches in real-world settings. Thus, it seems relatively and four out of 16 iOS devices failed to receive any packets unlikely that our settings and/or app modifications caused at all. them. According to the Covid-Watch-TCN app’s specifications, iOS versions 13.4 and older do not support discoverability B. Phase 1: Results of Distance Estimation in a Controlled between third-party iOS apps in the suspended or background- Environment running state if the devices’ screens are locked [3]. Therefore, an iOS device running version 13.4 or older should rely on an During the controlled experiment, five of the Android smart- Android device as a relay to broadcast Bluetooth packets when phones encountered technical issues, and thus data collected running the app in the background. On the other hand, iOS from them were corrupted and discarded. The remaining 25 devices running the app in the foreground exchange Bluetooth valid smartphones included nine Android and 16 iOS devices, packets with one another directly. and over the whole course of the experiment transmitted 700 Due to the iOS Bluetooth platform’s reliance on relays from unique tokens and received around 85,000 unique tuples of other devices, its RSSI and estimated-distance information (token, RSSI, timestamp), all of which were included in the cannot represent actual values. Therefore, we chose to focus analysis described below. only on directly transmitted tokens, i.e., Android-to-Android Estimated distances between pairs of devices were calcu- or Android-to-iOS, in our further analysis. lated directly by the Covid-Watch-TCN app based on Blue- 2) The Influence of Distance: Figure 4 illustrates the re- tooth RSSI data. lationships of the estimated and true distances between each The relation between measured RSSI and estimated dis- sender-receiver pair of Android devices in Session 1. In that tance, d, can be expressed as session, the participants stood still in an indoor environment, and within each task were 0.5d, 1d, or 1.5d apart. It can be seen

RSSI = −10n log10 d + A from the figure that the standard deviation of the estimated distance increased as the true distance increased, suggesting where n is the environment factor, and A is the reference that Bluetooth signals attenuate during transmission and be- signal strength at 1m. come more easily influenced by radio noise. The correlation In both Android and iOS Covid-Watch-TCN apps, n is coefficient between true distance and estimated distance was set to 2. The A value is determined based on the sender’s 0.68. transmission power level, encoded in Bluetooth tokens. When Figure 5 also indicates that the app tended to underestimate receiving a token, the app extracts the transmission power the distance between devices when the receiver was an An- level, and determines the value of A according to what range droid one, potentially leading to high numbers of false-positive that level falls within. Then, the app estimates based on the results. When the receiver was an iOS device, in contrast, the measured RSSI and A, using the equation shown above. app tended to overestimate the distance, potentially leading to 8

coefficient between true distance and estimated distance was just 0.26. Additionally, in Tasks 1-3, the app recorded transmission events for 137 out of 170 device pairs, a rate of 81%. However, when more participants joined the experiment in Tasks 7-9, the number of recorded pairs dropped by to 111, or 65%; i.e., one-third of the senders were no longer able to successfully transmit tokens to receivers, due to the “noise” caused by iOS devices in the immediate vicinity. 4) Power Consumption: On average, the phone battery dropped by 11.3% per hour in the uncontrolled experiment. As a reference, the per-hour drop was 14.3% in the semi- controlled experiment. We also observed a bigger battery drop in larger crowds: The per-hour drop for small and large groups Fig. 4: Estimated distance vs. ground truth, Session 1, Android were 10.4% and 29.6%, respectively. phones only C. Phase 2: Results of Proximity and Contact Detection in a Real-world Event Data collected in the semi-controlled experiment were also analyzed to evaluate the effectiveness of the Covid-Watch- TCN app in a spacious indoor environment. After data pre- processing, 41 valid devices were retained, including 24 An- droid and 17 iOS devices. Collectively, over the two days of the second experiment, they transmitted a total of 39,000 tokens and received 1.8 million. A proximity event was deemed to have occurred if 1) two devices were detected by the app as having exchanged tokens at below a particular estimated-distance threshold, and 2) the time at which this exchange was recorded as occurring by the app was within 15 minutes before or after the time at which the same event was recorded by the researchers observing the Fig. 5: Estimated distance vs. ground truth, Session 1, phones conference and/or the GoPro videos. with both systems running the app, both systems receiving A contact event between two devices was defined as a continuous proximity event lasting for a particular period, for example, 15 minutes. We further defined a strict contact event high numbers of false negatives. as one meeting the additional condition that every minute 3) The Influence of Background Radio Noise or Jamming: during the exposure period included at least one proximity Next, we tested if and how background radio noise or jamming event. affected token reception and RSSI. 1) Proximity Detection and Contact Tracing: There were Our results indicated that Bluetooth was susceptible to 67 proximity events documented by the four researchers during packet drop due to jamming. The devices received fewer to- the experiment and extracted from the GoPro videos. If we set kens when there were higher levels of background radio noise the distance threshold as 2m, commonly recommended as a due to jamming. Looking at Fig. 6, the number of received social-distancing measure, only 16 of these proximity events tokens was reduced in the presence of jamming. Additionally, were detected, implying a proximity detection rate of 24%. the app had a lower accuracy when the participants were in a However, even when no distance threshold was set (i.e., no jammed environment. By applying a Wilcoxon Rank Test with lower bound was placed on the RSSI value), the number of α = 0.05, we confirm that the estimation errors across these detected events only increased to 19, i.e., 28% of the total two types of settings were drawn from two non-distinguishable known to have occurred. distributions, implying that jamming did affect the app’s ability Under an exposure-duration rule of 15 minutes, meanwhile, to estimate distance. the app could only detect 7.5% of the relevant contact events. Even the app itself became a source of noise when a large Decreasing the exposure duration to 5 minutes and 1 minute number of app users were gathered in the same place. Tasks resulted in only slight increases in its contact-detection rate: 7-9 can be seen as more “noisy” conditions than Tasks 1- to 9.0% and 10.4%, respectively. And, when the strict contact 3, i.e., with 16 iOS devices placed between each pair of rule was applied, the app failed to detect any contact events at Android devices. As shown in Figure 5, as compared to the all. The proximity and contact detection rates at various RSSI previous three tasks’ results, estimates of distance in “noisy” and contact-duration thresholds are shown in Figure 7 and environments became more inaccurate in general, and even Figure 8, in which a measured signal with RSSI of -80 dB at distances of less than 2m. For these tasks, the correlation equates roughly to a 2m separation. 9

(a) 0.75m apart (b) 1.5m apart (c) 2.25m apart Fig. 6: Number of tokens received by each pair of devices in jammed vs. normal environments, with participants standing still

Fig. 9: Change over time in the number of unique recorded Fig. 7: Detection rates for contact events of three durations by sender-receiver pairs RSSI threshold, with proximity detection shown for compari- son

Fig. 10: The percentage of recorded unique sender-receiver pairs classified by the running operating system Fig. 8: Detection rates for strict-contact events of three dura- tions by RSSI threshold, with proximity detection shown for comparison contact events involving the “virus”, none were detected. 2) Exposure Duration: During the span of the semi- controlled experiment, each device sent at least one token to We also evaluated the proximity and contact detection rates every other device, and received at least one, with an average of “the source of the virus”. A total of 11 proximity events of around 1,000 tokens per device being sent, and about 38,000 with this individual were recorded, two via direct observation per device being received. and nine via screening of the GoPro videos. However, only Figure 9 illustrates trends in the number of unique recorded four of these 11 proximity events were recorded by the app, sender-receiver pairs over a 90-min period on the first day of despite none of them being fleeting. That is, exposure to the the experiment. This number steadily increased over time and “virus” lasted for at least 5 minutes in each case, according to converged to an upper bound of about 1,000, with 63% of all our observations. Moreover, among the 11 documented 5min- possible pairs represented. This indicates that, if one of its 10 users remains in an indoor environment long enough, the app V. PRACTICAL IMPLICATIONS will be able to discover most of the nearby devices. Our empirical findings have four important implications for We further classified the recorded device pairs according Bluetooth-based contact tracing, each of which is discussed in to their operating systems, as shown in Figure 10. Notice turn below. that most (> 80%) of the Android devices were able to be discovered by both nearby Android and iOS devices. As for A. Reliable or Unreliable Proximity Detection? iOS devices, 55% of them were discovered by iOS devices, with only 17% of them discovered by Android devices. Such First and foremost, RSSI does not produce reliable estimates results are consistent with our findings in the controlled of physical distance. Our comparison between RSSI estimates experiment that limitations of the iOS Bluetooth platform and measured distances showed that the former often spanned could significantly influence the transmission capability of iOS -11.3 to 11.7 db, resulting in errors of 0.27 to 3.85 times of the devices. ground truth. Unreliable distance estimates lead to inaccurate proximity or contact detection. While increasing the number of samples might reduce this D. Users’ Perceptions of App Usage variance, we also observed system bias caused by contextual Among the 24 questionnaire respondents, about half of them factors such as phone models and crowd size, in addition to indicated that they had encountered at least one technical those investigated in previous work, including wall geometry, problem, including the app crashing(n=7), unstable receiving phone orientation, and whether users were indoors or outdoors. tokens (n=6), phone overheating (n=4), unexpectedly high These system biases would be hard to eliminate in the absence energy consumption (n=3), login issues (n=3), and difficulty of extensive, detailed prior knowledge of the context in which uploading their devices’ token data (n=15). Nine out of the contact tracing would need to occur. 15 participants tried re-uploading and successfully uploaded Although the app was unreliable in estimating the distance the data eventually. between app users, information about whether they are in the Among these, the technical problem with the strongest same indoor location or not could still be useful to the broader negative influence on the respondents’ intentions to use the app contact-tracing process. was high energy consumption, followed by phone overheating. We also found that, although marginally more respondents B. The Influence of Crowds and Jamming mentioned experiencing crash problems (n=13) than inefficient Another interesting observation was that variance increased phone performance (n=12), the latter problem had a greater with the density of the crowd. That is, given the same number negative impact on their willingness to use the app. of participants in the same room, variance was lower when the The external conditions that the respondents selected most participants were farther away from each other. This could be often as likely to affect their willingness to use the contact- due to the status of human bodies as obstacles, as well as to tracing app were entering crowds of 100 people or more wireless channels becoming congested when all devices in the (n=19); current use of the app by epidemic investigators room are transmitting signals simultaneously. (n=19); regulations (n=16); and domestic-trip planning (n=16). In addition, the six Raspberry Pi devices that emitted tokens The top two of these conditions were also the most positively at a high rate in one session of our experiment, which we influential on the respondents’ willingness to use the contact- added to investigate possible jamming effects, had similar an tracing app. Some of the participants even rated the influence impact to more crowded conditions. That is, all else being held of regulations on their willingness to use the app as negative. equal, variance was higher in the more “noisy” environment Turning now to privacy issues, half our respondents stated that resulted from the inclusion of these extra devices. that their habits regarding Bluetooth and GPS functions would Reducing the token broadcast frequency (e.g., from 100ms not change in the wake of our experiments, whereas half said to 1s) in dense areas may alleviate packet loss due to inter- that they would. However, since our questionnaire did not ask ference but its effect on the proximity and contact detection about how/why Bluetooth and GPS usage impacted the re- remains to be investigated. spondents’ privacy concerns, we cannot make any conclusions about this split in attitudes. Surprisingly, only a small minority of our respondents C. Hidden Issue—the Power Drain and Other Glitches expressed a belief that, due to using the focal contact-tracing In all our experimental tasks, the participants’ phone bat- app, their data (n=4) or privacy (n=5) might be unsafe, with teries drained quickly regardless of brand. This excessive the others either deeming them to be safe, or expressing no consumption means that our research app would not be usable opinion on this matter. Among the minority, the top two data- in real-world scenarios, even if people were willing to try. security concerns cited were that the app developer might Some also complained that their phone overheated while take advantage of security weaknesses (n=4), and that the running the app. app developer did not build in sufficient protections (n=3). The reason for this extreme power consumption may be Their top privacy concerns included their past routes being attributable to how Bluetooth is handled in our app. The recognized (n=4), developers’ associations using the data for early version of Covid-Watch—along with many other apps other purposes (n=4), and the developer itself using the data implemented before the release of the Google-Apple Exposure for other purposes (n=3). Notification (GAEN) API—did not have native access to 11

Bluetooth, and had to use some hacks to bypass low-level However, it should be borne in mind that most of our restrictions. For example, iOS versions below 13.4 can only participants in the second experiment were students with send tokens when either the sender or the receiver is running information-engineering backgrounds and an interest in cyber- in the foreground. These hacks likely consume unnecessary security, who may have been less likely to worry about data- resources, including energy. Although we were unable to test security issues than an equivalent-sized sample of the general GAEN API-based apps, we anticipate that they will have better public. power efficiency. Some participants also experienced app crashes or hangs, F. Limitations and thus could not broadcast or submit tokens. Although this Our study has the following limitations and thus may not was likely caused by our rapid development cycle and lack be generalized to other settings. Additional studies are needed of testing on a variety of phone models and OSs, it is worth to address these limitations and to further improve Bluetooth- noting that similar issues have been reported by users of other based contact tracing. contact-tracing apps. • We did not have access to the GAEN APIs developed by Google and Apple, and our evaluation was limited to D. Potential Interoperability Issues a specific implementation (the TCN Coalition version of To be effective, apps need to be interoperable and produce Covid-Watch). consistent results regardless of what OSs, phone models, • We used just one RSSI-to-distance function in our exper- app configurations and implementations are involved. Our iment, so apps using different distance-estimation algo- experiments used the existing Android and iOS versions of rithms might produce different results. the same app, and about half of our participants used Android, • Our logging code may introduce additional overheads. and the rest used iOS. To emulate realistic scenarios, we did • Our experiments were of short duration, and were not not place any limits on the phone models involved, apart from representative of the full range of real situations. a requirement that all support Bluetooth. • Falsely identified non-contacts (false positives) were not We found asymmetric results across phone models and analyzed in our semi-controlled setting. versions. The differences we observed among phone models might be due to differences in Bluetooth chips, transceiver VI.CONCLUDING REMARKS modules, and signal-processing methods, among other factors. Digital contact tracing has strong potential to ease the This complicates interoperability by implying that each re- burdens of traditional contact tracing by narrowing down the ceiving phone may need to know the model and version of set of cases requiring further investigation. To achieve this each sending phone if the app’s detection accuracy is to be potential, however, it is important to understand the its limits in improved. real-world settings. In this study, we evaluated a type of digital We observed that iOS devices tended to overestimate dis- contact tracing that preserves user privacy through Bluetooth tances, while Android ones tended to underestimate them. The technology and decentralized design. Based on our findings, overestimation of distance by the iOS devices may have been we make the following recommendations, including some for caused by the calibration of the default reference RSSI value future research. (i.e., A) at 1m across both versions of the Covid-Watch-TCN When using existing/unmodified Bluetooth-based contact- app. For the Android version of the app, there were only three tracing apps, users should be aware of the inaccuracy of their possibilities for the value of A; and the iOS version had the estimates of distance and contact duration, and avoid making same possible values as the Android version, except that its important decisions based solely on such estimates. Also, users default value was greater by 10, i.e., Android is -67db and iOS should maintain social distancing even when these apps are on, -57db. This coarse ranges of transmission power levels could not merely as an extra layer of protection, but also to enhance lead to inaccuracy in distance estimates. the accuracy of their proximity-detection function. Google and Apple’s exposure-notification system [2] recom- When it is possible to change their detection logic, we rec- mends that Android devices be calibrated to a typical iPhone ommend using Bluetooth-based contact tracing for proximity according to their model designations. However, even with detection that is coarse-grained (e.g., within 20m for an hour) improved calibration to compensate biases due to inter-device rather than fine-grained (e.g., within 1m for 15 minutes, as differences, inaccuracies caused by environmental factors may a means of identifying social-distancing violations). Coarse- be difficult to eliminate in the absence of prior knowledge of grained proximity detection would tend to have similar out- the context. comes to digital approaches based on QR code scanning or cell tower history, insofar as it would capture the fact that a particular set of people were co-present with an infected E. Users’ Perception person (e.g., attended the same convention or ate in the same Our survey result may indicate that people’s willingness to restaurant), but preserve privacy. use contact-tracing apps is rooted in self-protection concerns When it is possible to modify their apps to collect addi- and/or a public-spirited desire to aid epidemic investigation tional information, designers should consider incorporating work, but that the enforced use of such apps might nevertheless mechanisms to reduce false/missed detection and resource provoke opposition. consumption. This could be achieved by taking additional 12 samples or collecting additional contextual information to [10] H. Cho, D. Ippolito, and Y. W. Yu, “Contact tracing mobile apps for calibrate distance estimates and broadcast frequencies. An app covid-19: Privacy considerations and related trade-offs,” arXiv preprint arXiv:2003.11511, 2020. might be able to collect some contextual information by itself, [11] L. Baumgärtner et al., “Mind the gap: Security & privacy risks of contact e.g., whether the phone is in an area dense with other phones, tracing apps,” arXiv preprint arXiv:2006.05914, 2020. or whether it has been placed in a bag. Further studies are [12] N. Trieu, K. Shehata, P. Saxena, R. Shokri, and D. Song, “Epi- one: Lightweight contact tracing with strong privacy,” arXiv preprint needed to evaluate the influence of such dynamic adjustment arXiv:2004.13293, 2020. on detection rates, interoperability, and privacy. In addition, it [13] S. Bertuletti, A. Cereatti, M. Caldara, M. Galizzi, and U. Della Croce, might be helpful for apps to encode certain sender information “Indoor distance estimated from bluetooth low energy signal strength: Comparison of regression models,” in IEEE Sensors Applications Sym- in addition to maximum transmission strength level in its posium, 2016. broadcasts, though such information should be narrowed down [14] X. Guo, N. Ansari, L. Li, and L. Duan, “A hybrid positioning system to a small number of common options to limit potential privacy for location-based services: Design and implementation,” IEEE Commu- nications Magazine, vol. 58, no. 5, pp. 90–96, 2020. abuse (e.g., tracking a particular device based on its unique [15] D. J. Leith and S. Farrell, “Coronavirus contact tracing: Evaluating maximum transmission strength). App designers may also the potential of using bluetooth received signal strength for proximity wish to consider augmenting other sensing technologies such detection,” SIGCOMM Comput. Commun. Rev., vol. 50, no. 4, p. 66–74, Oct. 2020. as ultrasound and Wi-Fi [14], [27]. As well as facilitating [16] J. Bay and H. Tan, “OpenTrace Calibration,” 2020. [Online]. Available: the development of better distance-estimation functions, added https://github.com/opentrace-community/opentrace-calibration contextual information should help those seeking to estimate [17] J. Ma, D. Neumann, F. Sattler, R. Schäfer, P. Wagner, and T. Wiegand, “Proximity tracing app: Report from the infection probabilities. measurement campaign 2020-04-09,” 2020. [Online]. Avail- On a broader picture, many more questions need to be an- able: https://github.com/pepp-pt/pepp-pt-documentation/blob/master/ swered before we can conclude: Can Bluetooth-based contact 12-proximity-measurement/2020-04-09-BW-report-epi-mod.pdf [18] “Testing Apps for COVID-19 Tracing (TACT),” 2020. [Online]. tracing help manual contact tracing? These include but are Available: https://down.dsg.cs.tcd.ie/tact/ not limited to investigating other potential attacks and privacy- [19] A. M. Wilson et al., “Quantifying sars-cov-2 infection risk within preserving countermeasures, improving app adoption, handling the google/apple exposure notification framework to inform quarantine recommendations,” medRxiv, 2020. [Online]. Available: https://www. panics and fatigues due to false positives, and estimating medrxiv.org/content/early/2020/09/17/2020.07.17.20156539 exposure risks based on collected contact information and [20] A. J. Kucharski et al., “Effectiveness of isolation, testing, contact epidemiological infectiousness models. tracing and physical distancing on reducing transmission of sars-cov- 2 in different settings,” medRxiv, 2020. [21] M. Abueg et al., “Modeling the combined effect of digital exposure noti- ACKNOWLEDGMENTS. fication and non-pharmaceutical interventions on the covid-19 epidemic This work was financially supported by the Ministry of in washington state,” medRxiv, 2020. [22] M. Allison, “The uk’s coronavirus app will launch Science and Technology (MOST) in Taiwan, under MOST without contact tracing,” https://www.imore.com/ 107-2221-E-009-028-MY3, 109-2636-E-002-021-, and 109- uks-coronavirus-app-will-launch-without-contact-tracing, 2020. 2636-H-002-002-. The authors would like to thank Xue-Yuan [23] S. Altmann et al., “Acceptability of app-based contact tracing for covid- 19: Cross-country survey evidence,” Available at SSRN 3590505, 2020. Gu and Kai-Lin Zhang for co-creating the application, Jia- [24] B. Zhang, S. Kreps, and N. McMurry, “Americans’ perceptions of Chi Huo and Bo-Rong Chen for implementing the jamming privacy and surveillance in the covid-19 pandemic,” 2020. devices, Yun-Chi Chang, Yu-Jen Chen, Li-Fei Kung, and Yu- [25] S. Abuhammad, O. F. Khabour, and K. H. Alzoubi, “Covid-19 contact- tracing technology: Acceptability and ethical issues of use,” Patient Ju Yang for helping conduct the experiments, and Yu-Wen Preference and Adherence, 2020. Huang for redesigning the instruction slides. [26] “Android 6.0 changes,” https://developer.android.com/about/versions/ marshmallow/android-6.0-changes.html#behavior-hardware-id, 2020. REFERENCES [27] P.-S. Loh, “Accuracy of Bluetooth-Ultrasound Contact Trac- ing,” 2020. [Online]. Available: https://www.novid.org/downloads/ [1] W. S. D. of Health, “Case investigation and contact tracing metrics,” 20200626-accuracy.pdf https://www.doh.wa.gov/Portals/1/Documents/1600/coronavirus/ data-tables/COVID-19-CaseInvestigationContractTracingReport.pdf, 2020. [2] “Exposure Notifications: Using technology to help public health authorities fight COVID-19,” 2020. [Online]. Available: https://www. google.com/covid19/exposurenotifications/ [3] “CovidWatch,” 2020. [4] A. Noll, “Eu plans international coronavirus tracing network,” https: //p.dw.com/p/3ioYf, 2020. [5] “Apple, google promise better coronavirus tracking with ’exposure notifications express’,” https://www.cnet.com/news/ apple-google-promise-better-coronavirus-tracking-with-exposure-notifications-express/, 2020. [6] R. Raskar et al., “Apps gone rogue: Maintaining personal privacy in an epidemic,” CoRR, vol. abs/2003.08567, 2020. [7] C. Troncoso et al., “Decentralized privacy-preserving proximity tracing,” IEEE Data Eng. Bull., vol. 43, no. 2, pp. 36–66, 2020. [8] A. Berke, M. Bakker, P. Vepakomma, K. Larson, and A. S. Pentland, “Assessing disease exposure risk with location data: A proposal for cryptographic preservation of privacy,” CoRR, vol. abs/2003.14412, 2020. [9] J. Bell, D. Butler, C. Hicks, and J. Crowcroft, “Tracesecure: Towards privacy preserving contact tracing,” CoRR, vol. abs/2004.04059, 2020. [Online]. Available: https://arxiv.org/abs/2004.04059