
Ethical issues in research using datasets of illicit origin

Daniel R. Thomas, Sergio Pastrana, Alice Hutchings, Richard Clayton, Alastair R. Beresford
Cambridge Cybercrime Centre, Computer Laboratory
University of Cambridge
United Kingdom
[email protected]

ABSTRACT

We evaluate the use of data obtained by illicit means against a broad set of ethical and legal issues. Our analysis covers both the direct collection, and secondary uses of, data obtained via illicit means such as exploiting a vulnerability, or unauthorized disclosure. We extract ethical principles from existing advice and guidance and analyse how they have been applied within more than 20 recent peer reviewed papers that deal with illicitly obtained datasets. We find that existing advice and guidance does not address all of the problems that researchers have faced and explain how the papers tackle ethical issues inconsistently, and sometimes not at all. Our analysis reveals not only a lack of application of safeguards but also that legitimate ethical justifications for research are being overlooked. In many cases positive benefits, as well as potential harms, remain entirely unidentified. Few papers record explicit Research Ethics Board (REB) approval for the activity that is described and the justifications given for exemption suggest deficiencies in the REB process.

CCS CONCEPTS

•Social and professional topics → Computing profession; Codes of ethics; Computing / technology policy; •General and reference → Surveys and overviews; •Applied computing → Law; •Networks → Network privacy and anonymity;

KEYWORDS

Ethics, law, leaked data, found data, unintentionally public data, data of illicit origin, cybercrime, Menlo Report

ACM Reference format:
Daniel R. Thomas, Sergio Pastrana, Alice Hutchings, Richard Clayton, and Alastair R. Beresford. 2017. Ethical issues in research using datasets of illicit origin. In Proceedings of IMC ’17, London, UK, November 1–3, 2017, 18 pages.
DOI: 10.1145/3131365.3131389

© 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM. This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of IMC ’17, November 1–3, 2017, http://dx.doi.org/10.1145/3131365.3131389.

1 INTRODUCTION

The scientific method requires empirical evidence to test hypotheses. Consequently, both the gathering, and the use of, data is an essential component of science and supports evidence-based decision making. Computer scientists make significant use of data to support research and inform policy, and this includes data which was obtained through illegal or unethical behaviour.

In this paper we consider the ethical and legal issues surrounding the use of datasets of illicit origin, which we define as data collected as a result of (i) the exploitation of a vulnerability in a computer system; (ii) an unintended disclosure by the data owner; or (iii) an unauthorized leak by someone with access to the data.

The collection, or use, of a dataset of illicit origin to support research can be advantageous. For example, legitimate access to data may not be possible, or the reuse of data of illicit origin is likely to require fewer resources than collecting data again from scratch. In addition, the sharing and reuse of existing datasets aids reproducibility, an important scientific goal. The disadvantage is that ethical and legal questions may arise as a result of the use of such data.

There is evidence that some researchers who use datasets of illicit origin consider ethical and legal issues, particularly through the introduction of ethical consideration sections into papers [83] and the development and use of institutional resources such as Research Ethics Boards (REBs) [26]. Unfortunately, our work shows that neither is common practice, and even where they are tackled, ethical and legal considerations often appear incomplete. It therefore follows that potential harms may have taken place which might otherwise have been mitigated or avoided. Research that lacks sufficient ethical consideration may still be ethical, but it is difficult to assess this.

General guidance, such as that provided by the Menlo Report [28], provides useful advice, but does not address all the issues that arise in using data of illicit origin. Academic discussions have taken place [32], and there are blog posts and other informal articles by academics on the topic [119, 124], but to date there is little in the way of detailed analysis, or a systematisation of knowledge, which explores this problem in depth.

The goal of this paper is to address this gap and provide a detailed evaluation of the use of data of illicit origin in peer-reviewed research, and to support the development of a more nuanced understanding of the issues and problems in this space. We do this by first reviewing previous work to identify the ethical (§2) and legal (§3) issues that can arise. We then analyse over 20 recent peer-reviewed papers which make use of data of illicit origin (§4), and systematize (Table 1) the ethical and legal decisions made against a common set of justifications, safeguards, potential harms and potential benefits (§5).

2 ETHICS

Ethical norms are constantly changing with research ethics developing over the course of the 20th century and becoming more prominent in our field in the 21st century. Previous work related to the ethical use of data of illicit origin spans a number of topics, including informed consent, human rights, releasing and using shared data, hacking, analysis techniques, ethical review, and Research Ethics Boards (REBs). We consider each of these in turn.

Informed consent: The earliest work on the ethics of computer-monitored data explored informed consent and emphasised the right of withdrawal as well as the importance of data anonymisation [89]. The first difficulty with data of illicit origin is that it is not always possible to meet these requirements. Acquiring consent from users involved in leaked data is challenging, particularly if they are involved in illegal activities [72]. In the case of data obtained from underground marketplaces, covert research without consent is necessary to understand what is traded due to the illegality of the goods bought or sold – consent could affect the results [101]. This is one of the exceptions for informed consent in the ethics statement of the British Society of Criminology, which states that “covert research may be allowed where the ends might be thought to justify the means” [23].

In cases where consent is possible, previous work has concluded that if informed consent has been given on the basis of a promise of confidentiality by the researcher, then researchers should take particular care to ensure that they are willing to keep the promises they make, particularly if doing so might require them to break the law [52].

Where informed consent is impossible to obtain, the Menlo Report recommends that the REB must protect the interests of the individuals [26]. Thus, the REB has a particularly important role to play in research which makes use of data of illicit origin where informed consent is not possible [23].

… to extra-judicial assassination [6] and hence care would need to be taken with data collected from online drug markets to ensure this did not result in such abuse.

Releasing and using shared data: The WECSR workshop in 2012 convened a panel of experts from different domains who agreed that research involving data of illicit origin would need to have a clear benefit to society [32]. They also argued that simply because data is public does not exempt research using such data from obtaining REB approval since it might contain personally identifiable information. This echoes the Menlo Report’s suggestion that the REB must protect the interests of individuals where informed consent is impossible.

Sharing of datasets is beneficial for data science, but the purpose and scope for using such data must be stated [19]. Allman and Paxson discussed the ethical issues of releasing data, using data released by others, and the interactions between providers and users of data [4]. A key ethical consideration in this context is privacy protection. It is likely that data of illicit origin was not intended for research purposes or public exposure, and thus it may not be anonymised. In such cases, the raw dataset should not be shared publicly, and research conducted with such data should aim to preserve privacy. Researchers who hold data of illicit origin should only provide details of their source or (as Allman and Paxson suggest) share data with verified researchers under a written acceptable usage policy. None of the papers we discuss later took this approach. Partridge argues that papers in network measurement research should have an ethics section, partly to increase the availability of examples of ethical reasoning [83]. We show in §5 that few papers using data of illicit origin have an ethics section.

Both Allman & Paxson, and Partridge warn against relying on the anonymisation of data since deanonymisation techniques are often surprisingly powerful. Robust anonymisation of data is difficult, particularly when it has high dimensionality, as the anonymisation is likely to lead to an unacceptable level of data loss [3].

Hacking and intervening: Hacking into computers to extract information is usually unethical [102] and illegal. Moore and Clayton considered ethical dilemmas in take-down research resulting from nine dilemmas they faced during their research. They considered the balance between reducing harm uncovered during measurements and the accuracy of such measurements, the dangers of telling criminals the flaws in their systems and the importance of ensuring that proposed interventions are likely to work [75]. Dittrich et al. provide two case studies on ethical decision making for remote