IKE ANTKARE, ONE OF THE GREAT STARS IN THE SCIENTIFIC FIRMAMENT

CYRIL LABBÉ

E ‹˜‡”•‹–›‘ˆ ”‡‘„Ž‡ǡ ƒ„‘”ƒ–‘”›Ǥ ”ƒ ‡ E-mail: Cyril.Labbe[at]imag.fr

Abstract: How “Ike Antkare” became one of the most highly cited scientists in the modern world and how you could become like him.

INTRODUCTION

Google Scholar is one of the most powerful tools that allow researchers to share and find scientific publications. It is also used as a means of measuring the individual output of researchers (h-index [7], g-index, e.g. [5], hm-index [8], etc.). Several tools (Scholarometer [4], Publish or Perish [6], Scholar H-Index Calculator [3], H-view [1]) compute these metrics using the data provided by Google Scholar.

Since the 8th of April 2010, these tools have allowed a certain Ike Antkare to become one of the most highly cited scientists of the modern world (see Appendix A, Figures 2-6). According to Scholarometer, “Ike Antkare” has 102 publications (almost all in 2009) and an h-index of 94, putting him in 21st position among the most highly cited scientists. This score is lower than that of Freud, in 1st position with an h-index of 183, but better than that of Einstein, in 36th position with an h-index of 84. Best of all, with respect to the hm-index, “Ike Antkare” holds the sixth position, outclassing all scientists in his field (computer science).

This document explains why this is possible and how you could become as good as Ike Antkare. The first section demonstrates how easily fake scientific documents can be generated on the necessary scale. The second section explains what has to be done for these documents to be indexed by Google/Google Scholar and thus to become visible.

THE HOLY GRAIL OF A LAZY SCIENTIST

Scigen [2] is an automatic generator of amazing and funny articles using the jargon of computer science.

ISSI NEWSLETTER VOL. 6. NR. 2. © International Society for Scientometrics and Informetrics

Figure 1: References between fake and real documents (diagram labels: “Real documents” and “Ike Antkare’s 101 documents”).
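The reference structure of Figure 1 can be written down as a small directed graph. The sketch below is only an illustration under stated assumptions: the identifiers (`fake-001`, `bridge-document`, `real-A`, ...) are hypothetical placeholders, and `indexable` is a deliberately simplified model of the rule stated in the text (a document becomes indexable once it references an already indexed one), not a description of how Google Scholar actually works.

```python
from collections import Counter

N_GENERATED = 100  # number of generated articles stated in the text

# Hypothetical identifiers; the real titles are not reproduced here.
generated = [f"fake-{i:03d}" for i in range(1, N_GENERATED + 1)]
bridge = "bridge-document"              # extra document citing only real articles
real = ["real-A", "real-B", "real-C"]   # placeholders for already indexed articles

# Reference lists: document -> documents it cites.
references = {bridge: list(real)}
for article in generated:
    # Each generated article cites the whole generated set (itself included)
    # plus the extra document.
    references[article] = generated + [bridge]


def indexable(references, already_indexed):
    """Simplified rule: a document is indexable if it cites an indexed one."""
    indexed = set(already_indexed)
    changed = True
    while changed:
        changed = False
        for doc, targets in references.items():
            if doc not in indexed and any(t in indexed for t in targets):
                indexed.add(doc)
                changed = True
    return indexed


def citation_counts(references, include_self=True):
    """Count incoming references per document."""
    counts = Counter()
    for source, targets in references.items():
        for target in targets:
            if include_self or target != source:
                counts[target] += 1
    return counts


indexed = indexable(references, real)
counts = citation_counts(references)
counts_no_self = citation_counts(references, include_self=False)

print(all(doc in indexed for doc in references))           # True
print(counts["fake-001"], counts[bridge])                   # 100 100
print(counts_no_self["fake-001"], counts_no_self[bridge])   # 99 100
```

Under this model, removing the extra document from every reference list leaves the generated set citing only itself, so none of it would ever become indexable.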

Scigen is based on a hand-written context-free grammar and has been developed by the PDOS research group at MIT CSAIL. It was initially aimed at testing the selection process of contributions submitted to apparently dubious conferences. Titles, authors, sections, bibliography, graphs and figures can be generated automatically, but titles and authors can also be chosen. In the production of Ike Antkare’s bibliography, these tools were slightly modified to generate:

► a list of n titles,
► n articles titled using the previous titles. Each article cited the whole set of the n articles (itself included),
► an html page, providing titles, abstracts and links to pdf files.

For an article to be indexed in Google Scholar it has to have at least one reference to an article already indexed in Google Scholar. For Ike Antkare’s set of articles to be indexed, an extra reference to an already indexed article was added to each of them. This was achieved by generating a document referencing only real articles¹ and by adding an extra reference to this document in each of the 100 generated articles (see Figure 1).

As a final step, the html pages providing links to the 101 pdf files must be crawled by a Googlebot. This takes an undetermined time; however, the fastest and guaranteed results are obtained by using http://www.google.com/addurl. Theory says that Ike Antkare’s h-index = g-index = hm-index = 100². But, as you know, theory and real world are often slightly different.

¹ Ike Antkare, Architecting E-Business Using Psychoacoustic Modalities. PhD thesis, United Saints of Earth, 2009.
² Or 99 without counting references of a document to itself.

CONCLUSION

At this point in time, tools computing individual researcher performance indices using Google Scholar are not reliable. This experiment shows how easily and to what extent computed values can be distorted.
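The theoretical values quoted above (100, or 99 when a document's references to itself are ignored) can be checked from the usual index definitions. The following is a minimal sketch, not the code used by any of the tools mentioned: `h_index` and `g_index` follow the standard definitions, `hm_index` follows our reading of Schreiber's fractional paper counting (each paper contributes 1/number-of-authors to an effective rank), and the citation lists are an assumption derived from the construction described in the text (101 single-author documents, each cited by the 100 generated articles).

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers total at least g**2 citations.

    Here g is capped at the number of papers.
    """
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


def hm_index(papers):
    """papers: iterable of (citations, n_authors) pairs.

    Assumed reading of the hm-index: the largest effective rank
    (sum of 1/n_authors over papers ranked by citations) that does not
    exceed the citation count of the paper at that rank.
    """
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)
    effective_rank, hm = 0.0, 0.0
    for cites, n_authors in ranked:
        effective_rank += 1.0 / n_authors
        if cites >= effective_rank:
            hm = effective_rank
        else:
            break
    return hm


# Assumed citation counts: 100 generated articles plus the extra document.
with_self = [100] * 101              # every document is cited 100 times
without_self = [100] + [99] * 100    # self-references removed

for cites in (with_self, without_self):
    single_author = [(c, 1) for c in cites]
    print(h_index(cites), g_index(cites), hm_index(single_author))
# 100 100 100.0
# 99 99 99.0
```

The Introduction reports an h-index of 94 from Scholarometer, against these theoretical values.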

It is worth noting that this distortion could have been easily achieved using names of real people, thus fostering or rather discrediting them. It is widely accepted that important decisions on the future of a scientist cannot be taken based on these criteria. Moreover, the case of Ike Antkare suggests that one should take a careful look, not only at documents, but also at documents citing documents.

ACKNOWLEDGEMENTS

The author would like to thank Yves Denneulin and Edward Arnold for their help.

REFERENCES

[1] http://hview.limsi.fr/, April 2010.

[2] http://pdos.csail.mit.edu/scigen/, April 2010.

[3] http://userscripts.org/scripts/show/59378, April 2010.

[4] Indiana University Bloomington. http://scholarometer.indiana.edu, April 2010.

[5] Egghe, L., Mathematical theory of the h- and g-index in case of fractional counting of authorship. JASIST, 59(10):1608-1616, 2008.

[6] Harzing, A.W., Publish or perish. Available at www.harzing.com/pop.html, 2010.

[7] Hirsch, J.E., An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences, 102(46):16569-16572, November 2005.

[8] Schreiber, M., To share the fame in a fair way, h_m modifies h for multi-authored manuscripts. New Journal of Physics, 10(4):040201, 2008.

APPENDIX A

Figure 2: Ike Antkare’s hm-index according to Scholarometer.


Figure 3: Ike Antkare’s h-index according to Scholarometer.

Figure 4: Ike Antkare’s performance indices according to Scholarometer.

Figure 5: Ike Antkare’s performance indices according to Publish or Perish.

Figure 6: Ike Antkare.

APPENDIX B

Appendix B: Fragments from two pages of a fake document generated using Scigen.
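Scigen is described in the text as being based on a hand-written context-free grammar. As a rough illustration of that idea only, the sketch below expands a toy grammar into random titles. The grammar rules, the function names (`expand`, `generate_title`) and the depth limit are all invented for this example and are not taken from Scigen.

```python
import random

# Toy grammar: dictionary keys are nonterminals, everything else is a terminal.
# These rules are invented for illustration and are NOT Scigen's grammar.
GRAMMAR = {
    "TITLE": [
        ["VERBING", "NOUN", "using", "ADJ", "NOUN"],
        ["a", "methodology", "for", "the", "study", "of", "NOUN"],
    ],
    "VERBING": [["architecting"], ["deconstructing"], ["harnessing"]],
    "ADJ": [["psychoacoustic"], ["probabilistic"], ["decentralized"]],
    "NOUN": [["e-business"], ["modalities"], ["compilers"], ["ADJ", "NOUN"]],
}


def expand(symbol, rng, depth=0, max_depth=8):
    """Recursively expand a symbol into a list of terminal words."""
    if symbol not in GRAMMAR:
        return [symbol]
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, prefer productions made only of terminals
        # so that recursive rules such as NOUN -> ADJ NOUN terminate.
        terminal_only = [
            p for p in productions if all(s not in GRAMMAR for s in p)
        ]
        productions = terminal_only or productions
    words = []
    for part in rng.choice(productions):
        words.extend(expand(part, rng, depth + 1, max_depth))
    return words


def generate_title(seed=None):
    """Generate one title; a fixed seed makes the result reproducible."""
    rng = random.Random(seed)
    return " ".join(expand("TITLE", rng)).capitalize()


# A list of n titles, as in the first generation step described in the text.
n = 5
titles = [generate_title(seed) for seed in range(n)]
for title in titles:
    print(title)
```

Seeding the generator per title keeps a run reproducible, which is convenient when the same list of titles must be reused to build the articles and the html page that links to them.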
