COMMENTARY How many scientific papers are not original? Michael Lesk1 measurements by Sir Cyril Burt. Burt had Department of Library and Information Science, Rutgers University, New Brunswick, NJ 08901 studied what seemed to be a remarkable number of identical twins raised apart. His Is plagiarism afflicting science? In PNAS, Given the incentives, it is hardly surprising data were challenged soon after his death as Citron and Ginsparg (1) count the number of that some authors are attempting to exploit too good to be true; the original notes were authors who are submitting articles contain- the system. This can be surprisingly easy. gone, and his coworkers could not be found. ing text already appearing elsewhere. They Delgado et al. (7) explain how they created Although there has been argument back and report disturbing numbers of authors resort- a half-dozen fake papers, with several hun- forth, even his supporters have been de- ing to copying, particularly in some countries dred . One of the authors saw his fending him by saying he was careless rather where 15% of submissions are detected as count go up by a factor of 2 and than fraudulent and that other people containing duplicated material. I am on the his h-index increased from 10 to 15. Fans studying genetics and intelligence have found editorial board of an Institute of Electrical of bicycle racing may smile on reading that about the same level of correlation (10). and Electronic Engineers (IEEE) magazine, the fake papers were attributed to Alberto More recently, two economists, Carmen which also finds it useful to run all of the Pantini-Contador. Reinhart and Kenneth Rogoff, published submissions through a plagiarism filter. What Refereeing, at least for some journals, a claim that economic growth slowed in can be done about this? is pretty shaky. As cited by Citron and countries whose national debt exceeded In 1830, Charles Babbage deplored unreli- Ginsparg, Bohannon (8) submitted a fake ar- 90% of gross domestic product. After 2 y, able science. He discussed hoaxes, forgeries, ticle to more than 300 open access journals, they gave their spreadsheet to researchers data trimming, and “cooking” (selecting data and more than half accepted it. Following at the University of Massachusetts, who to match a theory) (2). Today, doubtful up, he found that one of these journals found several errors; for example, the first few papers may be plagiarized, invented, or mis- had plagiarized its own description from a countries in alphabetical order had been taken. This paper documents problems left out of the calculation. A corrected at one extreme: straightforward pla- One bright spot in the spreadsheet did not show the same abrupt giarism within one publisher. More com- Citron and Ginsparg slowdown in growth, but the original pa- plex deceptions can be found at the site paper is that plagiarism per had already been used to justify a retractionwatch.com, which includes, among change to budget-balancing policies in major other examples, invented or fraudulent is concentrated: they economies (11). data. Mistaken research was highlighted note that a small number Returning to the simpler problem of in an important study by Begley and Ellis, of authors produce a plagiarism, it can extend beyond individual who found that it was impossible to repli- papers. In 2009, a conference in Hainan, “ cate 47 of 53 oncology studies that they disproportionate China, called itself the International Joint ” attempted to repeat (3). At a time when share of the doubtful Conference on Artificial Intelligence. That important scientific questions are under at- submissions. name is very familiar to artificial intelligence tack, we need to improve confidence in researchersasthetitleofamajorconference our publications. reputable journal in the same subject area. held regularly since 1969. However, the How can we increase our level of trust in The scholarlyoa.com site attempts to catalog conference with the long history met in the ? In 2012, more than 2 the doubtful publishers and their journals. Pasadena in 2009; the Hainan conference just million papers were published (4). They ap- Much more common than completely fake borrowed the name. Perhaps it is not sur- pear in publications ranging from highly papers is the boosting of publication count by prising that the Hainan conference in- dividing one’s reports into multiple short cluded several papers that had come from the competitive and prestigious journals such “ as , Science, Lancet, and this journal, papers, an idea that has been called the least SCIGen chatterbot or some similar program. publishable unit” since the 1970s. Some pub- down to the predatory publishers listed in Here is a sentence from one (since lishers or conference organizers join in the “ scholarlyoa.com who will print pretty much removed from IEEE Xplore): Furthermore, manipulation. Whilhite and Fong describe anything for a fee. University faculty, in par- it explored a pervasive tool for enabling an editor who asked prospective authors ticular, are encouraged to publish because the pasteurization, which is used to show that to add citations to his journal to their reward systems often depend on publication context-free grammar and B-trees are largely articles to increase the impact factor of ” andcitationcountsaswaysofevaluating compatible. Chatterbot output can now be the journal (9). merit. The h-index is the modern equivalent detected automatically (12) and publish- of the old saying “Deans can’t read, they can Consequences ers find themselves, regrettably, forced to only count.” In some countries, having a pa- Deception and mistake can have real con- per accepted in a top journal can mean a cash sequences outside of science. For decades, the Author contributions: M.L. wrote the paper. bonus, with Zhejiang University offering a UK educational system emphasized the “11- The author declares no conflict of interest. $30,000 payment to an author who publishes plus” examination, justified by a belief in the See companion article on page 25. in Science or Nature (5, 6). inheritability of intelligence that came from 1Email: [email protected].

6–7 | PNAS | January 6, 2015 | vol. 112 | no. 1 www.pnas.org/cgi/doi/10.1073/pnas.1422282112 Downloaded by guest on September 30, 2021 use such software, as well as anti-copying with some combination of carrots and sticks, Some ignore the flag, and some say that what COMMENTARY utilities. encourage the institutions in all countries to they are doing is acceptable practice. These Plagarism would matter less if counting enforce standards? There are very few in- responses suggest that some additional articles was less significant than under- dividual scientists today, and approaching the response is needed (although Citron and standing them. ArXiv at least does not claim institutions might be the best way to affect Ginsparg do not say how many authors re- to referee submissions; anyone using it knows achangeinattitude. spond to the warning in which way). that they have to read and evaluate the con- For example, recently I received a request Nature published a discussion on plagia- tent for themselves. This, of course, trans- from someone in Asia who wanted to be a rism 2 y ago, and in it, Zhang and McIntosh fers the burden of judgment from a small postdoctoral researcher in our department in suggested keeping a blacklist of individuals number of referees to the much larger the United States. I took the first two para- (14). They note that this should be a number of potential readers. In addition, graphs of his research statement and found many of those readers may be students, or them on a commercial website of a US multipublisher effort and that it is unclear in a different discipline, and be less able company. Should I have told this to the head who would run it or pay for it (14). I would to evaluate a paper. This is why we have the of his institution? Right now, we don’tdo suggest one further step: identify depart- current publication system, but it is being that, partly out of politeness and partly out of ments, and perhaps institutions, where the abused by researchers who know that for fear of lawsuits. However, when Citron and problems are arising. Publishers should sug- some purposes, the main question being Ginsparg write that some of the people whose gest that they will blacklist the entire de- asked of a candidate for hiring or promo- plagiarism is detected reply by asking to be partment (or, if need be, the institution). tion is “how many articles?” told which parts were found to be copied, Intermediate forms of punishment are possi- Mere number of publications is not what presumably to learn how to evade detection ble, such as delaying publication rather than is really important. When challenged as a inthefuture,onedespairs. denying it entirely. “half-wit,” the Roman emperor Claudius, at For experimental studies, the move to re- In summary, this paper describes the scope least in the British Broadcasting Corpora- quiring data availability will be a step for- of plagiarism within arXiv. The good news tion version of his life, replied that it is ward. If an author did not actually write the is that the tools used to detect plagiarism quality rather than quantity of wits that paper under discussion, presumably that work effectively and efficiently, the copied matters (13). Similarly, the National Sci- author does not have the data behind it. The papers are concentrated by author and by ence Foundation asks those who submit data can be copied as well, but that offers country, and the copied papers are less cited. proposals to list five important and relevant another chance for automated tools to spot papers and not to attempt to drown the ref- the duplication, and one where paraphrasing The bad news is that the problem is real and erees in dozens (or hundreds) of articles. is more complicated. in some countries severe. ArXiv is now Fortunately, one bright spot in the Citron ArXivistryingtomotivateauthorsby identifying the papers that have substantial and Ginsparg paper is that plagiarism is flagging papers that contain overlap. Readers overlap and is waiting to see if that affects the concentrated: they note that a small number arethenonnoticethatthepaperhasa submissions. Perhaps the publishing com- of authors produce a disproportionate share problem; unfortunately, authors do not nec- munity as a whole should be preparing to see of the doubtful submissions. In addition, essarily react with shame or withdrawal. if stronger steps are needed. those articles are not the heavily cited ones, suggesting that they have less influence. Also, there are many important countries where 1 Citron DT, Ginsparg P (2015) Patterns of text 8 Bohannon J (2013) Who’s afraid of peer review? Science the plagiarism rate is low. Conversely, the reuse in a scientific corpus. Proc Natl Acad Sci USA 112: 342(6154):60–65. 25–30. 9 Wilhite AW, Fong EA (2012) Scientific publications. Coercive methodology of the paper relies on exact text 2 Babbage C (1830) Reflections on the Decline of Science in citation in . Science 335(6068):542–543. overlap; it will not detect, for example, an England, and on Some of Its Causes (B. Fellowes, London). 10 Plucker JA, Esping A, eds. (2014) The Cyril Burt affair. Human 3 Begley CG, Ellis LM (2012) Drug development: Raise standards for Intelligence: Historical Influences, Current Controversies, Teaching article translated from another language, nor – preclinical cancer research. Nature 483(7391):531 533. Resources. Available at: www.intelltheory.com. Accessed November 4 Reich ES (2013) Science publishing: The golden club. Nature one which paraphrases but adds nothing to 23, 2014. 502(7471):291–293. its source. 11 Krugman P (2013) How the case for austerity has crumbled. The 5 Davis P (2011) Paying for impact: Does the Chinese model make New York Review of Books. Available at: www.nybooks.com/articles/ sense? Available at: scholarlykitchen.sspnet.org/2011/04/07/paying- Possible Actions archives/2013/jun/06/how-case-austerity-has-crumbled. Accessed on for-impact-does-the-chinese-model-make-sense/. Accessed What can we do? This paper observes a November 24, 2014. November 27, 2014. 6 Shao J, Shen H (2011) The outflow of scientific papers from 12 Labbé C, Labbé D (2013) Duplicate and fake publications in the strong cultural connection with plagiarism: China: Why is it happening and can it be stemmed? Learn Publ scientific literature: How many SCIgen papers in computer science? there are some countries in which 15% of the 24(2):95–97. 94(1):379–396. submissions to arXiv are plagiarized, and 7 Lopez-Cozar E, Robinson-Garcia N, Torres-Solinas D (2013) 13 Pullman J (1976) I, Claudius [television production], director Wise H Manipulating citations and Google Scholar metrics: (British Broadcasting Corporation). others in which very few papers are copying Simple, easy and tempting. Available at: arxiv.org/abs/1212.0638. 14 Zhang Y, McIntosh I (2012) How to stop plagiarism: Blacklist from others. Can the scientific community, Accessed November 27, 2014. repeat offenders. Nature 481(7379):22.

Lesk PNAS | January 6, 2015 | vol. 112 | no. 1 | 7 Downloaded by guest on September 30, 2021