
A collage of images from the MegaFace data set, which scraped online photos. Images are obscured to protect people’s privacy. Image visualization by Adam Harvey (https://megapixels.cc), based on the MegaFace data set by Ira Kemelmacher-Shlizerman et al., based on the Yahoo Flickr Creative Commons 100 Million data set and licensed under Creative Commons Attribution (CC BY) licences.

THE ETHICAL QUESTIONS THAT HAUNT FACIAL-RECOGNITION RESEARCH

Journals and researchers are under fire for controversial studies using this technology. And a Nature survey reveals that many in this field think there is a problem. By Richard Van Noorden

(Nature’s news team is editorially independent from its publisher, Springer Nature.)

In September 2019, four researchers wrote to the publisher Wiley to “respectfully ask” that it immediately retract a scientific paper. The study, published in 2018, had trained algorithms to distinguish faces of Uyghur people, a predominantly Muslim minority ethnic group in China, from those of Korean and Tibetan ethnicity1.

China had already been internationally condemned for its heavy surveillance and mass detentions of Uyghurs in camps in the north-western province of Xinjiang — which the government says are re-education centres aimed at quelling a terrorist movement. According to media reports, authorities in Xinjiang have used surveillance cameras equipped with software attuned to Uyghur faces.

As a result, many researchers found it disturbing that academics had tried to build such algorithms — and that a US journal had published a research paper on the topic. And the 2018 study wasn’t the only one: journals from publishers including Springer Nature, Elsevier and the Institute of Electrical and Electronics Engineers (IEEE) had also published peer-reviewed papers that describe using facial recognition to identify Uyghurs and members of other Chinese minority groups.

The complaint, which launched an ongoing investigation, was one foray in a growing push by some scientists and human-rights activists to get the scientific community to take a firmer stance against unethical facial-recognition research.


It’s important to denounce controversial uses of the technology, but that’s not enough, ethicists say. Scientists should also acknowledge the morally dubious foundations of much of the academic work in the field — including studies that have collected enormous data sets of images of people’s faces without consent, many of which helped hone commercial or military surveillance algorithms. (A feature on page 347 explores concerns over algorithmic bias in facial-recognition systems.)

An increasing number of scientists are urging researchers to avoid working with firms or universities linked to unethical projects, to re-evaluate how they collect and distribute facial-recognition data sets and to rethink the ethics of their own studies. Some institutions are already taking steps in this direction. In the past year, several journals and an academic conference have announced extra ethics checks on studies.

“A lot of people are now questioning why the computer-vision community dedicates so much energy to facial-recognition work when it’s so difficult to do it ethically,” says Deborah Raji, a researcher in Ottawa who works at the non-profit Internet foundation Mozilla. “I’m seeing a growing coalition that is just against this entire enterprise.”

This year, Nature asked 480 researchers around the world who work in facial recognition, computer vision and artificial intelligence (AI) for their views on thorny ethical questions about facial-recognition research. The results of this first-of-a-kind survey suggest that some scientists are concerned about the ethics of work in this field — but others still don’t see academic studies as problematic.

Data without consent

For facial-recognition algorithms to work well, they must be trained and tested on large data sets of images, ideally captured many times under different lighting conditions and at different angles. In the 1990s and 2000s, scientists generally got volunteers to pose for these photos — but most now collect facial images without asking permission. For instance, in 2015, scientists at Stanford University in California published a set of 12,000 images from a webcam in a San Francisco café that had been live-streamed online2. The following year, researchers at Duke University in Durham, North Carolina, released more than 2 million video frames (85 minutes) of footage of students walking on the university campus3.

The biggest collections have been gathered online. In 2016, researchers at the University of Washington in Seattle posted a database, called MegaFace, of 3.3 million photos from the image-sharing site Flickr4. And scientists at Microsoft Research in Redmond, Washington, issued the world’s largest data set, MSCeleb5, consisting of 10 million images of nearly 100,000 individuals, including journalists, musicians and academics, scraped from the Internet.

In 2019, Berlin-based artist Adam Harvey created a website called MegaPixels that flagged these and other data sets. He and another Berlin-based technologist and programmer, Jules LaPlace, showed that many had been shared openly and used to evaluate and improve commercial surveillance products. Some were cited, for instance, by companies that worked on military projects in China. “I wanted to uncover the uncomfortable truth that many of the photos people posted online have an afterlife as training data,” Harvey says. In total, he says he has charted 29 data sets, used in around 900 research projects. Researchers often use public Flickr images that were uploaded under copyright licences that allow liberal reuse.
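That licence check is worth unpacking: it is a filter on photo metadata, not a consent mechanism. The minimal sketch below (in Python) illustrates the kind of filtering step such a collection pipeline might apply; the field names, licence codes and helper function are hypothetical assumptions for illustration, not the schema of Flickr’s API or of any data set named in this article.

```python
# Hypothetical sketch of the licence-filtering step described above.
# Field names and licence codes are illustrative assumptions, not the
# schema of Flickr's API or of any real data-set-building pipeline.

# Licence terms often treated as allowing "liberal reuse".
PERMISSIVE_LICENCES = {"CC-BY", "CC-BY-SA", "CC0"}

def keep_permissively_licensed(photos):
    """Return only photo records whose metadata declares a permissive licence.

    Note the gap the article highlights: a permissive copyright licence
    says nothing about whether the person pictured consented to
    biometric research.
    """
    return [p for p in photos if p.get("licence") in PERMISSIVE_LICENCES]

if __name__ == "__main__":
    scraped = [
        {"id": "0001", "licence": "CC-BY"},       # reusable with attribution
        {"id": "0002", "licence": "ALL-RIGHTS"},  # all rights reserved: excluded
        {"id": "0003", "licence": "CC0"},         # public-domain dedication
    ]
    kept = keep_permissively_licensed(scraped)
    print(f"kept {len(kept)} of {len(scraped)} photos:", [p["id"] for p in kept])
```

The objection critics raise is visible in the docstring: a filter like this can satisfy copyright terms while consent from the people pictured never enters the pipeline.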
After The Financial Times published an article on Harvey’s work in 2019, Microsoft and several universities took their data sets down. Most said at the time — and reiterated to Nature this month — that their projects had been completed or that researchers had requested that the data set be removed. Computer scientist Carlo Tomasi at Duke University was the sole researcher to apologize for a mistake. In a statement two months after the data set had been taken down, he said he had got institutional review board (IRB) approval for his recordings — which his team made to analyse the motion of objects in video, not for facial recognition. But the IRB guidance said he shouldn’t have recorded outdoors and shouldn’t have made the data available without password protection. Tomasi told Nature that he did make efforts to alert students by putting up posters to describe the project.

The removal of the data sets seems to have dampened their usage a little, Harvey says. But big online image collections such as MSCeleb are still distributed among researchers, who continue to cite them, and in some cases have re-uploaded them or data sets derived from them. Scientists sometimes stipulate that data sets should be used only for non-commercial research — but once they have been widely shared, it is impossible to stop companies from obtaining and using them.

In October, computer scientists at Princeton University in New Jersey reported identifying 135 papers that had been published after the Duke data set had come down and which had used it or data derived from it (see go.nature.com/3nlkjre). The authors urged researchers to set more restrictions on the use of data sets and asked journals to stop accepting papers that use data sets that had been taken down.

Legally, it is unclear whether scientists in Europe can collect photos of individuals’ faces for biometric research without their consent. The European Union’s vaunted General Data Protection Regulation (GDPR) does not provide an obvious legal basis for researchers to do this, reported6 Catherine Jasserand, a biometrics and privacy-law researcher at the Catholic University of Leuven in Belgium, in 2018. But there has been no official guidance on how to interpret the GDPR on this point, and it hasn’t been tested in the courts. In the United States, some states say it is illegal for commercial firms to use a person’s biometric data without their consent; Illinois is unique in allowing individuals to sue over this. As a result, several firms have been hit with class-action lawsuits.

The US social-media firm Facebook, for instance, agreed this year to pay US$650 million to resolve an Illinois class-action lawsuit over a collection of photos that was not publicly available, which it used for facial recognition (it now allows users to opt out of facial-recognition tagging). The controversial New York City-based technology company Clearview AI — which says it scraped three billion online photos for a facial-recognition system — has also been sued for violating this law in pending cases. And the US tech firms IBM, Google, Microsoft, Amazon and FaceFirst were also sued in Illinois for using a data set of nearly one million online photos that IBM released in January 2019; IBM removed it at around the time of the lawsuit, which followed a report by NBC News detailing photographers’ disquiet that their pictures were in the data set. Microsoft told Nature that it has filed to dismiss the case, and Clearview says it “searches only publicly available information, like Google or any other search engine”. Other firms did not respond to requests for comment.

Vulnerable populations

In the study on Uyghur faces published by Wiley1, the researchers didn’t gather photos from online, but said they took pictures of more than 300 Uyghur, Korean and Tibetan 18–22-year-old students at Dalian Minzu University in northeast China, where some of the scientists worked. Months after the study was published, the authors added a note to say that the students had consented to this. But the researchers’ assertions don’t assuage ethical concerns, says Yves Moreau, a computational biologist at the Catholic University of Leuven. He sent Wiley a request to retract the work last year, together with the Toronto-based advocacy group Tech Inquiry.

It’s unlikely that the students were told enough about the purpose of the research to have given truly informed consent, says Moreau.


But even if they did freely consent, he argues, human-rights abuses in Xinjiang mean that Wiley ought to retract the study to avoid giving the work academic credence.

Moreau has catalogued dozens of papers on Uyghur populations, including facial-recognition work and studies that gathered Uyghur people’s DNA. In December, he wrote an opinion article in Nature calling for all unethical work in biometric research to be retracted7.

His campaign has had some impact, but not quite to the extent he’d hoped. Publishers say the key issue is checking whether participants in studies gave informed consent. Springer Nature, for instance, said in December 2019 that it would investigate papers of concern on vulnerable groups along these lines, and that it had updated its guidance to editors and authors about the need to gain explicit and informed consent in studies that involve clinical, biomedical or biometric data from people. This year, the publisher retracted two papers on DNA sequencing8,9 because the authors conceded that they hadn’t asked Uyghur people for their consent, and it has placed expressions of concern on 28 others.

Wiley has also focused on informed consent. Last November, the publisher told Moreau and Tech Inquiry that it was satisfied that consent forms and university approval for the Dalian study were available, and so it stood by the research, which it felt could be firmly separated from the actions in Xinjiang. “We are aware of the persecution of the Uyghur communities,” Wiley said. “However, this article is about a specific technology and not an application of that technology.”

In December, however, the publisher opened a formal investigation, after Curtin University in Perth, Australia, where one of the authors is based, also asked for a retraction, saying it agreed that the work was ethically indefensible. This year, Wiley added a publisher’s note saying that the article “does appear to be in compliance with acceptable standards for conducting human subject research”. In September, after Moreau dug into the authors’ previous studies of facial recognition on Uyghurs and pointed out apparent inconsistencies in the year that the data sets had been gathered, Wiley placed an expression of concern on the study, saying that it was not clear when the data collection had taken place. The publisher told Nature that it now considers the matter closed after thorough investigation — but not everyone involved is content. “Curtin University maintains that the paper should be retracted,” deputy vice-chancellor Chris Moran told Nature. He said the university was still investigating the work.

Wiley says that after its conversations with Moreau, it updated its integrity guidelines to make sure that expected standards for informed consent are met and described in articles. Other publishers say that they have made adjustments, too. The IEEE says that this September, it approved a policy under which authors of articles on research involving human subjects or animals should confirm whether they had approval from an IRB or equivalent local review; editors determine whether research (on biometrics or other areas) involves human subjects.

But Moreau says that publishers’ focus on the details of consent is too narrow, and that they should also take a stand on the wider ethics of research. “We are talking about massive human-rights abuses,” he says. “At some point, Western publishers should say that there are some baselines above which they don’t go.” He suggests that publishers should set up independent ethics boards that can give opinions when questions such as these arise. (No publishers asked by Nature said that they had taken this step.) Universities and researchers who disapprove of human-rights abuses could also do more to express this by dropping their associations with questionable technology firms, says Kate Crawford, co-director of the AI Now Institute at New York University.

In the past year, there has been growing scrutiny of universities’ partnerships with companies or research programmes linked to mass surveillance in Xinjiang. The Massachusetts Institute of Technology (MIT) in Cambridge, for example, said it would review its relationship with the Hong Kong-based tech firm SenseTime after the US government — in the middle of a trade war with China — blacklisted the firm and other Chinese AI companies, such as Megvii in Beijing, over their alleged contributions to human-rights violations in Xinjiang. In 2018, SenseTime and MIT announced they had formed an “alliance on artificial intelligence”; MIT says that SenseTime had provided an undisclosed sum to the university without any restrictions on how it would be used, and that the university will not give it back. Both Megvii and SenseTime contest the US blacklisting. SenseTime says its technology has “never been applied for any unethical purposes”, and Megvii says it requires its clients “not to weaponize our technology or solutions or use them for illegal purposes”.

Academic conferences have been contentious, too. The Chinese Conference on Biometrics Recognition (CCBR) was held in Xinjiang’s capital, Ürümqi, in 2018. Anil Jain, a computer scientist at Michigan State University in East Lansing, sat on the conference’s advisory board and travelled there to give a speech. Some AI researchers, including Toby Walsh at the University of New South Wales in Sydney, Australia, later criticized Jain for this in stories reported by the New York City-based Coda magazine. Coda magazine also noted that Springer Nature sponsored the conference; the company said its role was limited to publishing CCBR proceedings and that it had strengthened its requirements for conference organizers to comply with the publisher’s editorial policies after concerns were raised about past content. And Jain challenged the critique, telling Nature that attending conferences in China “does not mean that … international conference participants, like me, condone these atrocities against minorities”. Growth in surveillance there shouldn’t be a reason to “curtail scientific exchange”, he said.

Jain remains on the advisory board for CCBR 2020–21; Springer Nature is still publishing the conference abstracts. And major international computer-vision conferences have continued to accept sponsorship from Chinese firms. Just after the blacklisting, SenseTime and Megvii sponsored the 2019 International Conference on Computer Vision, and Megvii sponsored the 2020 Conference on Computer Vision and Pattern Recognition, although its logo was removed from the conference’s website after the meeting occurred. “Conferences should avoid sponsors who are accused of enabling abuses of human rights,” reiterates Walsh. However, he notes that last year, the non-governmental organization Human Rights Watch in New York City withdrew initial allegations that Megvii facial-recognition technology was involved in an app used in Xinjiang. Conference organizers did not respond to a request for comment.

Ethical checkpoints

Questionable research projects have popped up in the United States, too. On 5 May, Harrisburg University in Pennsylvania posted a press release declaring that researchers there had developed facial-recognition software “capable of predicting whether someone is likely going to be a criminal”, with “80 percent accuracy and no racial bias”. The announcement triggered a wave of criticism, as had previous studies that hark back to the discredited work of nineteenth-century physiognomists. One notorious 2016 study reported that a machine-learning algorithm could spot the difference between images of non-criminals and those of convicted criminals that were supplied by a Chinese police department10.

Harrisburg University removed its press release on 6 May following the outcry, but left a dangling question: the press release had said that the work was to be published by Springer Nature in a book series (which the publisher later denied). On 22 June, more than 2,400 academics signed a letter from a group called the Coalition for Critical Technology (CCT), asking Springer Nature not to publish the work and calling on all publishers to refrain from publishing similar studies.


The letter pointed out that such studies are based on unsound science. It also noted that algorithmic tools that tell police where or who to target tend to provide a scientific veneer for automated methods that only exacerbate existing biases in the criminal justice system. Three days earlier, more than 1,400 US mathematicians had written a letter asking their colleagues to stop collaborating with police on algorithms that claim to help reduce crime, because of concerns about systemic racism in US law-enforcement agencies.

Springer Nature said the work was never accepted for publication: it had been submitted to a conference and rejected after peer review. (The authors, and Harrisburg University, declined to comment.)

Springer Nature was already under fire for a different paper, published in January in the Journal of Big Data, on detecting ‘criminal tendency’ in photos of criminals and non-criminals11. After researchers from the IEEE got in touch with ethical concerns, Margeret Hall, the paper’s co-author at the University of Nebraska Omaha, asked in June for the paper to be withdrawn. Hall says the now-retracted paper was “indefensible”. Springer Nature says the journal reviewed its processes and now requires authors to include statements on ethics approvals and consent when submitting manuscripts.

Nature survey

To get a wider sense of academic views on facial-recognition ethics, Nature this year surveyed 480 researchers who have published papers on facial recognition, AI and computer science. On some questions, respondents showed a clear preference. When asked for their opinions on studies that apply facial-recognition methods to recognize or predict personal characteristics (such as gender, sexual identity, age or ethnicity) from appearance, around two-thirds said that such studies should be done only with the informed consent of those whose faces were used, or after discussion with representatives of groups that might be affected (see ‘Facial recognition: a survey on ethics’).

[Infographic, “Facial recognition: a survey on ethics”. Nature surveyed nearly 500 researchers who work in facial recognition, computer vision and artificial intelligence about ethical issues relating to facial-recognition research; they are split on whether certain types of this research are ethically problematic and what should be done about concerns. Panels: who responded (480 respondents, by region); restrictions on image use; restrictions related to vulnerable populations; and attitudes on different uses of the technology. Questions and answers have been paraphrased for brevity; the full survey and results are available at go.nature.com/2uwtzyh.]

But on other issues, academics were split. Around 40% of the scientists in the survey felt that researchers should get informed consent from individuals before using their faces in a facial-recognition data set, but more than half felt that this wasn’t necessary. The researchers’ dilemma is that it’s hard to see how they can train accurate facial-recognition algorithms without vast data sets of photos, says Sébastien Marcel, who leads a biometrics group at the Idiap Research Institute in Martigny, Switzerland. He thinks that researchers should get informed consent — but in practice, they don’t. His own group doesn’t crawl the web for images, but it does use online image data sets that others have compiled. “A lot of researchers don’t want to hear about this: they consider it not their problem,” he says.


Schoolchildren walk beneath surveillance cameras in Xinjiang in western China. Greg Baker/AFP/Getty

Ed Gerstner, director of journal policy at Springer Nature, said the publisher was considering what it could do to discourage the “continued use” of image databases that don’t have explicit consent for their use in research from the people in the images.

Nature’s survey also asked researchers whether they felt that facial-recognition research on vulnerable populations — such as refugees or minority groups that were under heavy surveillance — could be ethically questionable, even if scientists had gained informed consent. Overall, 71% agreed; some noted it might be impossible to determine whether consent from vulnerable populations was informed, making it potentially valueless.

Some of those who disagreed, however, tried to draw a distinction between academic research and how facial recognition is used. The focus should be on condemning and restricting unethical applications of facial recognition, not on restricting research, they said. Ethicists regard that distinction as naive. “That’s the ‘I’m just an engineer’ mentality — and we’re well past that now,” says Karen Levy, a sociologist at Cornell University in Ithaca, New York, who works on technology ethics.

Some of the respondents in China said that they were offended by the question. “You should not say that in Xinjiang some groups are detained in camps,” wrote one. Just under half of the 47 Chinese respondents felt that studies on vulnerable groups could be ethically questionable even if scientists had gained consent, a lower proportion than respondents from the United States and Europe (both above 73%). One Chinese American AI researcher who didn’t want to be named said that a problem was a cultural split in the field. “The number of Chinese researchers at top conferences who actively support censorship and Xinjiang concentration camp[s] concerns me greatly. These groups have minimal contact with uncensored media and tend to avoid contact with those who don’t speak Mandarin, especially about social issues like this. I believe we need to find ways to actively engage with this community,” they wrote.

Nature asked researchers what the scientific community should do about ethically questionable studies. The most popular answer was that during peer review, authors of facial-recognition papers should be asked explicitly about the ethics of their studies. The survey also asked whether research that uses facial-recognition software should require prior approval from ethics bodies, such as IRBs, that consider research with human subjects. Almost half felt it should, and another quarter said it depended on the research.

Ethical reflection

Researchers who work on technology that recognizes or analyses faces point out that it has many uses, such as to find lost children, track criminals, access smartphones and cash machines more conveniently, help robots to interact with humans by recognizing their identities and emotions and, in some medical studies, to help diagnose or remotely track consenting participants. “There are a number of lawful and legitimate applications of face and biometric recognition which we need in our society,” says Jain.

But researchers must also recognize that a technology that can remotely identify or classify people without their knowledge is fundamentally dangerous — and should try to resist it being used to control or criminalize people, say some scientists. “The AI community suffers from not seeing how its work fits into a long history of science being used to legitimize violence against marginalized people, and to stratify and separate people,” says Chelsea Barabas, who studies algorithmic decision-making at MIT and helped to form the CCT this year. “If you design a facial-recognition algorithm for medical research without thinking about how it could be used by law enforcement, for instance, you’re being negligent,” she says.

Some organizations are starting to demand that researchers be more careful. One of the AI field’s premier meetings, the NeurIPS (Neural Information Processing Systems) conference, is requiring such ethical considerations for the first time this year. Scientists submitting papers must add a statement addressing ethical concerns and potential negative outcomes of their work. “It won’t solve the problem, but it’s a step in the right direction,” says David Ha, an AI researcher at Google in Tokyo. The journal Nature Machine Intelligence is also trialling an approach in which it asks the authors of some machine-learning papers to include a statement considering broader societal impacts and ethical concerns, Gerstner says.

Levy is hopeful that academics in facial recognition are waking up to the implications of what they work on — and what it might mean for their reputation if they don’t root out ethical issues in the field. “It feels like a time of real awakening in the science community,” she says. “People are more acutely aware of the ways in which technologies that they work on might be put to use politically — and they feel this viscerally.”

Richard Van Noorden is a features editor for Nature in London.

1. Wang, C., Zhang, Q., Liu, W., Liu, Y. & Miao, L. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 9, e1278 (2019).
2. Stewart, R., Andriluka, M. & Ng, A. Y. in Proc. 2016 IEEE Conf. on Computer Vision and Pattern Recognition 2325–2333 (IEEE, 2016).
3. Ristani, E., Solera, F., Zou, R. S., Cucchiara, R. & Tomasi, C. Preprint at https://arxiv.org/abs/1609.01775 (2016).
4. Nech, A. & Kemelmacher-Shlizerman, I. in Proc. 2017 IEEE Conf. on Computer Vision and Pattern Recognition 3406–3415 (IEEE, 2017).
5. Guo, Y., Zhang, L., Hu, Y., He, X. & Gao, J. in Computer Vision — ECCV 2016 (eds Leibe, B., Matas, J., Sebe, N. & Welling, M.) https://doi.org/10.1007/978-3-319-46487-9_6 (Springer, 2016).
6. Jasserand, C. in Data Protection and Privacy: The Internet of Bodies (eds Leenes, R., van Brakel, R., Gutwirth, S. & de Hert, P.) Ch. 7 (Hart, 2018).
7. Moreau, Y. Nature 576, 36–38 (2019).
8. Zhang, D. et al. Int. J. Legal Med. https://doi.org/10.1007/s00414-019-02049-6 (2019).
9. Pan, X. et al. Int. J. Legal Med. 134, 2079 (2020).
10. Wu, X. & Zhang, X. Preprint at https://arxiv.org/abs/1611.04135 (2016).
11. Hashemi, M. & Hall, M. J. Big Data 7, 2 (2020).
