Trends in Gaming Indicators: On Failed Attempts at Deception and their Computerised Detection

Cyril Labbé

Université Grenoble Alpes - LIG - équipe Sigma

March 26, 2018

BIR-ECIR 2018

C. Labbé (UGA-LIG), Ike Antkare & Co, March 26, 2018

Table of Contents

1 Of Publications and Gaming
  Scientometrics: what for?
  Medley
  SCIgen, a Probabilistic Context Free Grammar

2 Of the use of fake publications
  h-index hacking
  Resume Padding
  Journal Hijacking

3 Detection of SCIgen papers
  Google Search
  SciDetect: Automatic detection

4 Automatic detection of questionable research papers
  Fact checking science
  Seek & Blastn tool

Scientometrics: what for? Ranking universities, journals and scientists

Who asks what:
  Librarian: What are the must-buys for my readers?
  Scientist: Where shall I submit my research?
  Research administration: Who shall I hire? Who deserves a promotion?
  Students: Where to study? With whom? In which country?
  Government: Who deserves investment? What for? Which scientific field?

Indicators:
  Impact Factor: average number of (....) over the last two years. Computed since 1975.
  h-index and variations (http://sci2s.ugr.es/hindex): h5-index, g-index, hm-index, a-index, hg-index, ar-index...
  ARWU: Academic Ranking of World Universities (Shanghai ranking), since 2003.
  Collaborative distance

Scientometrics: what for? Information systems for science

Scientific publications are at the heart of the system:
  Knowledge diffusion.
  Counting unit.

Increasing number of information sources:
  Publishers' repositories
  Open archives and dedicated social networks

Various characteristics:
  Free or toll access
  Peer-reviewed vs non-peer-reviewed

Various goals:
  Spreading knowledge / State of the art / Bibliometry / Scientometrics

Scientometrics: what for? Tools that count citations

Toll-based tools:
  Provided by publishers (..., Thomson Reuters);
  Based on publishers' catalogs (ACM, IEEE, Springer, Elsevier);
  Selected venues only (all peer reviewed).

Free tools (..., CiteSeerX, ...):
  Crawling the web / selected catalogs / added by users;
  Social media (Google+, Scholarometer, Microsoft Academics...).

Free tools that compute indicators:
  Publish or Perish; Scholarometer; Microsoft Academics; Google+; and many more...

Medley: cases and possible countermeasures

Cases:
  King Abdulaziz University: massively recruiting highly cited authors in a field.
  Hacking the peer-review process: peer-review rings to bypass real peer review and avoid rejection by gaining easy and quick acceptance.
  Academic search engine optimization: search engine spoofing [Lopez-Cozar et al., 2012, Beel and Gipp, 2010, Beel et al., 2010].
  Paper mills: pay for someone to write and present your paper at a conference/journal.
  The Holy Grail of a lazy scientist: automatic evaluation (and generation) of (real) scientific papers.

Possible countermeasures:
  Citation analysis: track down potential manipulations; h-index and self-citations [Bartneck and Servaas, 2011]; editors' misbehavior [Herteliu et al., 2017] and cartels [Fister jr et al., 2016].
  Content similarity: track down paper mills (authorship), scientific errors, content reuse.

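SCIgen, a Probabilistic Context Free Grammar
PCFG: Probabilistic Context Free Grammar

Sets of symbols
  Set of non-terminal symbols: $\mathcal{N} = \{SP, S, V, P\}$
  Set of terminal symbols: $\Sigma = \{$".", sing, dance, flight, seas, oceans, air, streets, hills, fields$\}$

Set of rules $\mathcal{R}_i$
  $\mathcal{R}_1$: $SP \to S\ .$   with $p(\mathcal{R}_1) = 1$
  $\mathcal{R}_2$: $S \to \text{We shall } V \text{ in the } P$   with $p(\mathcal{R}_2) = 1/4$
  $\mathcal{R}_4$: $S \to \text{We shall } V \text{ in the } P \text{ and in the } P \text{, } S$   with $p(\mathcal{R}_4) = 1/4$
  $\mathcal{R}_3$: $S \to S \text{, } S$   with $p(\mathcal{R}_3) = 1/2$
  $\mathcal{R}_{5..7}$: $V \to \text{sing} \mid \text{dance} \mid \text{flight}$   with $p(\mathcal{R}_i) = 1/3$, $i = 5..7$
  $\mathcal{R}_{8..13}$: $P \to \text{seas} \mid \text{oceans} \mid \text{air} \mid \text{streets} \mid \text{hills} \mid \text{fields}$   with $p(\mathcal{R}_i) = 1/6$, $i = 8..13$

Rules $\mathcal{R}_2$, $\mathcal{R}_3$ and $\mathcal{R}_4$: non-zero probability to $\infty$ (a derivation may never terminate).

Terminal string example:
  s: "We shall sing in the air and in the hills, We shall dance in the fields."
  $p(s) = \prod_j p(\mathcal{R}_j)$, the product running over the rules $\mathcal{R}_j$ used to derive $s$.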
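The toy grammar above is small enough to run. The following is a minimal Python sketch, not SCIgen's own code: it encodes the rules as (probability, right-hand side) pairs, samples sentences top-down, and computes the probability of a derivation as the product of the probabilities of the rules it uses. The names `GRAMMAR`, `expand`, `sample_sentence`, `derivation_probability` and the `max_expansions` safeguard are assumptions introduced for illustration. The safeguard is there because rule R3 doubles the number of S symbols, so an unbounded expansion is not guaranteed to stop.

```python
import random
from fractions import Fraction

# Toy grammar from the slide: non-terminal -> list of (probability, right-hand side).
GRAMMAR = {
    "SP": [(Fraction(1), ["S", "."])],                                   # R1
    "S": [
        (Fraction(1, 4), ["We", "shall", "V", "in", "the", "P"]),        # R2
        (Fraction(1, 2), ["S", ",", "S"]),                               # R3
        (Fraction(1, 4), ["We", "shall", "V", "in", "the", "P",
                          "and", "in", "the", "P", ",", "S"]),           # R4
    ],
    "V": [(Fraction(1, 3), [w]) for w in ("sing", "dance", "flight")],   # R5..7
    "P": [(Fraction(1, 6), [w]) for w in
          ("seas", "oceans", "air", "streets", "hills", "fields")],      # R8..13
}


class BudgetExhausted(Exception):
    """Raised when a derivation uses more expansions than allowed (hypothetical safeguard)."""


def expand(symbol, rng, budget):
    """Expand `symbol` top-down; return (tokens, probability of the derivation)."""
    if symbol not in GRAMMAR:          # terminal symbol: emitted as-is
        return [symbol], Fraction(1)
    if budget[0] == 0:
        raise BudgetExhausted(symbol)
    budget[0] -= 1
    rules = GRAMMAR[symbol]
    prob, rhs = rng.choices(rules, weights=[float(p) for p, _ in rules])[0]
    tokens = []
    for child in rhs:
        child_tokens, child_prob = expand(child, rng, budget)
        tokens.extend(child_tokens)
        prob *= child_prob
    return tokens, prob


def sample_sentence(seed, max_expansions=200):
    """Sample one sentence, retrying whenever a derivation exceeds the budget."""
    rng = random.Random(seed)
    while True:
        try:
            return expand("SP", rng, [max_expansions])
        except BudgetExhausted:
            continue


def derivation_probability(rule_probabilities):
    """p(s) = product of the probabilities of the rules used to derive s."""
    result = Fraction(1)
    for p in rule_probabilities:
        result *= p
    return result


# Derivation of the slide's example: R1, R4, V->sing, P->air, P->hills, R2, V->dance, P->fields.
EXAMPLE_PROBABILITY = derivation_probability([
    Fraction(1), Fraction(1, 4), Fraction(1, 3), Fraction(1, 6), Fraction(1, 6),
    Fraction(1, 4), Fraction(1, 3), Fraction(1, 6),
])
```

Under this reading of the rules, the example sentence has probability 1/31104. Calling `sample_sentence(seed)` returns a token list together with the probability of the derivation that produced it.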

SCIgen, a Probabilistic Context Free Grammar

SCIgen (2005), by J. Stribling, M. Krohn & D. Aguayo

"... maximize amusement, rather than coherence ..."

Structure of a generated paper: Title, Introduction, Model, Impl, Eval, RelatedWork, Concl, References
Structure of the introduction: Intro_A, Intro_A2, Intro_A3, Intro_closing

Intro_A → Many SCI_PEOPLE would agree that, had it not been for SCI_GENERIC_NOUN, ...
Intro_A → In recent years, much research has been devoted to the SCI_ACT; ...
Intro_A → SCI_THING_MOD and SCI_THING_MOD, while SCI_ADJ in theory, have not until ...
Intro_A → The SCI_ACT is a SCI_ADJ SCI_PROBLEM.
Intro_A → The SCI_ACT has SCI_VERBED SCI_THING_MOD, and current trends ...
Intro_A → The implications of SCI_BUZZWORD_ADJ SCI_BUZZWORD_NOUN have ...
...
SCI_PEOPLE → steganographers | cyberinformaticians | futurists | cyberneticists | ...
SCI_BUZZWORD_ADJ → omniscient | introspective | peer-to-peer | ambimorphic | ...

Of the use of fake publications: h-index hacking

Building a citation farm [Labbé, 2010]

[Figure: a modified SCIgen produces Ike Antkare's 101 documents (labelled 0, 1, ..., 100), shown alongside real documents.]

h-index hacking

Ike Antkare h-index [Labbé, 2010]
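For readers unfamiliar with the indicator being gamed here, this is a minimal Python sketch of the standard h-index definition: the largest h such that h documents have at least h citations each. The function name `h_index` and the citation counts in the usage note are illustrative assumptions, not data from [Labbé, 2010].

```python
def h_index(citation_counts):
    """Largest h such that at least h documents have h or more citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h
```

As a purely hypothetical illustration of why a citation farm works: if 101 generated documents each received 100 citations from one another, `h_index([100] * 101)` would be 100, without a single citation from genuine research.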

Resume Padding

[Figure: IEEE Xplore, 12 Nov. 2014]

[Figure: IEEE Xplore, 2 Feb. 2016]

Journal Hijacking

Beware: Hijacking (Lorem Ipsum). Jeffrey Beall, http://scholarlyoa.com

Detection of SCIgen papers: Google Search

Phrase search

Many SCI_PEOPLE would agree that, had it not been for SCI_GENERIC_NOUN, ...
In recent years, much research has been devoted to the SCI_ACT; ...
SCI_THING_MOD and SCI_THING_MOD, while SCI_ADJ in theory, have not until ...
The SCI_ACT has SCI_VERBED SCI_THING_MOD, and current trends ...
The implications of SCI_BUZZWORD_ADJ SCI_BUZZWORD_NOUN have ...
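Phrase search relies on the constant fragments that survive between the grammar's placeholders. The Python sketch below is an illustration only, and is not the method used by any search engine or by the tools presented here. It extracts those fixed fragments, which could be pasted as quoted search phrases, and also turns a template into a regular expression for local matching. The placeholder spelling `SCI_...` and the helper names `fixed_phrases` and `template_to_regex` are assumptions.

```python
import re

# A few SCIgen-style templates; SCI_* tokens stand for grammar non-terminals.
TEMPLATES = [
    "Many SCI_PEOPLE would agree that, had it not been for SCI_GENERIC_NOUN,",
    "In recent years, much research has been devoted to the SCI_ACT;",
    "The implications of SCI_BUZZWORD_ADJ SCI_BUZZWORD_NOUN have",
]

PLACEHOLDER = re.compile(r"SCI_[A-Z_]+")


def fixed_phrases(template, min_words=3):
    """Constant fragments of a template, usable as quoted search phrases."""
    parts = (p.strip(" ,;") for p in PLACEHOLDER.split(template))
    return [p for p in parts if len(p.split()) >= min_words]


def template_to_regex(template):
    """Regex matching the template with any non-empty text in place of each placeholder."""
    pieces = [re.escape(p) for p in PLACEHOLDER.split(template)]
    return re.compile(r".+?".join(pieces), re.IGNORECASE)
```

For instance, `fixed_phrases(TEMPLATES[0])` yields the fragment "would agree that, had it not been for", and `template_to_regex(TEMPLATES[0]).search(text)` reports whether a text contains an instance of the first template.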

SciDetect: Automatic detection

Automatic detection (classification) [Labbé and Labbé, 2013]

Inter-textual distance:
  $\delta(a,b)$ = proportion of different words (tokens) in the two texts.

Let $t$ be a text under test and
  $\delta_t = \min_{f \in \text{Fake}} \delta(t, f)$
its distance to the nearest text of the Fake (SCIgen) corpus.

If $\delta_t < \delta_{th}$ then the text is almost surely SCIgen generated, else non-SCIgen.

[Figure: hierarchical clustering dendrogram, distance scale from 0.0 to 0.7; labelled groups: SCIgen, Corpus Z, MLT.]
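The decision rule on this slide (distance to the nearest known fake, compared with a threshold) can be sketched in Python as follows. This is not SciDetect's code. The distance below is one common way to write an inter-textual distance, assumed here for illustration: token frequencies of the longer text are rescaled to the length of the shorter one, and the absolute differences are summed and normalised to the range 0 to 1. The tokenizer, the function names and the threshold value in the usage note are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-cased alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def intertextual_distance(tokens_a, tokens_b):
    """Distance in [0, 1]: 0 for identical frequency profiles, 1 for disjoint vocabularies."""
    if not tokens_a or not tokens_b:
        raise ValueError("both texts must contain at least one token")
    if len(tokens_a) > len(tokens_b):           # make A the shorter text
        tokens_a, tokens_b = tokens_b, tokens_a
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    n_a, n_b = len(tokens_a), len(tokens_b)
    scale = n_a / n_b                           # rescale B's frequencies to A's length
    vocabulary = set(freq_a) | set(freq_b)
    total = sum(abs(freq_a[w] - freq_b[w] * scale) for w in vocabulary)
    return total / (2 * n_a)


def classify(text, fake_corpus, threshold):
    """Label a text by its distance to the nearest text of a corpus of known fakes."""
    tokens = tokenize(text)
    nearest = min(intertextual_distance(tokens, tokenize(f)) for f in fake_corpus)
    label = "SCIgen" if nearest < threshold else "non-SCIgen"
    return label, nearest
```

A call such as `classify(paper_text, known_scigen_texts, threshold=0.5)` returns the label and the nearest distance; the value 0.5 is only a placeholder, and a real threshold would have to be calibrated on labelled corpora.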

SciDetect: Automatic detection. SCIgen papers and clones

SSME: Int. Conf. on Services Science, Management and Engineering, 2009. IEEE Xplore, indexed in Scopus and WoK.
  150 papers, 4 SCIgen and 1 duplicate.
  Official acceptance rate: 28%

SCIgen inside (publishers):
  120 IEEE (retracted or deleted)
  16 Springer (retracted)
  1 Elsevier (accepted-unpublished)

SCIgen inside (social networks):
  http://www.researchgate.net
  http://scholar.harvard.edu
  http://www.academia.edu

Other generators:
  Mathgen (http://thatsmathematics.com/mathgen/)
  The Postmodernism Generator (http://www.elsewhere.org/pomo/)
  scigen-physics (https://bitbucket.org/birkenfeld/scigen-physics)
  Auto. SBIR Grant Proposal Generator (http://www.nadovich.com/chris/randprop/)

SciDetect: Automatic detection

Mainstream press (2014): is it harming science?

Publishers withdraw more than 120 gibberish papers
Science publisher fooled by gibberish papers
Publier ou périr: faux articles pour faux congrès (Publish or perish: fake articles for fake conferences)
More Computer-Generated Nonsense Papers Pulled From Science Journals
Fraudulent scientific papers published, then withdrawn
Wissenschaftsverlag löscht 16 sinnfreie Artikel (Scientific publisher deletes 16 nonsensical articles)
Ike Antkare, le grand scientifique qui n'existait pas (Ike Antkare, the great scientist who did not exist)
How Gobbledygook Ended Up in Respected Scientific Journals
How computer-generated fake papers are flooding academia
Science Publishers Remove Papers Generated as a Hoax
Wieder ließen Fachverlage Nonsens ungeprüft durchgehen (Once again, specialist publishers let nonsense through unchecked)
Fake Research Papers: How Did More Than 120 'Gibberish' Computer-Generated Studies Get Published?

SciDetect: Automatic detection. No SCIgen paper in arXiv (Computer Science)

Automated screening: ArXiv screens spot fake papers [Ginsparg, 2014]
  Only stop-words
  PCA
  Supposed non-Zipfian

[Figure: image borrowed from [Ginsparg, 2014]]

SciDetect: Automatic detection. Related/Ongoing work

Detecting:
  Based on references [Xiong and Huang, 2009],
  Compression-based and ad-hoc classifier [Dalkilic et al., 2006],
  Ad-hoc similarity and classifier [Lavoie and Krishnamoorthy, 2010],
  Structural distances between texts [Fahrenberg et al., 2014],
  Phrase search [Springer, 2014],
  Topological properties [Amancio, 2015].

Spoofing [Beel and Gipp, 2010, Lopez-Cozar et al., 2012]; academic search engine optimisation [Beel et al., 2010].

SciDetect: Automatic detection

Springer-funded SciDetect: http://scidetect.forge.imag.fr

Press release, March 2015:
  "The open source software discovers text that has been generated with the SCIgen computer program and other fake-paper generators like Mathgen and Physgen."
  "SciDetect is highly flexible and can be quickly customized to cope with new methods of automatically generating fake or random text"

It does not cope with other problems:
  Peer review rings
  Paper mills
  Black market and authorship selling

Automatic detection of questionable research papers [Byrne and Labbé, 2017b, Byrne and Labbé, 2017a]

Scientific ethics: plagiarism, auto-plagiarism, content reuse...
  Detection: N-grams signature (hashing functions).
Non-sense detection: paper generators (SCIgen, physic-gen, MathGen...)
  Detection: authorship detection (inter-textual distance).

Need to detect questionable scientific results:
  Fabrication (making up data or results)
  Falsification (manipulating data or results)
  False or unsupported affirmations
  Genuine errors
  $\Rightarrow$ error spreading, wrong beliefs, research irreproducibility
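The "N-grams signature (hashing functions)" approach to content reuse mentioned on this slide can be sketched in Python as below. It is a generic illustration rather than the author's tooling: each word n-gram is hashed, a document's signature is the set of those hashes, and overlap is measured with the Jaccard index. The choice of word trigrams, SHA-1 truncated to 16 hex digits, and the function names are all assumptions.

```python
import hashlib
import re


def ngram_signature(text, n=3):
    """Set of truncated hashes of the word n-grams of a text."""
    words = re.findall(r"\w+", text.lower())
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {hashlib.sha1(g.encode("utf-8")).hexdigest()[:16] for g in grams}


def jaccard(signature_a, signature_b):
    """Share of n-gram hashes common to both signatures (0.0 when both are empty)."""
    if not signature_a and not signature_b:
        return 0.0
    return len(signature_a & signature_b) / len(signature_a | signature_b)
```

Two documents with a high `jaccard` score share many identical word sequences, which flags them for a manual reuse check; the score alone says nothing about who copied whom.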

Fact checking science. Starting point: striking similarities, obvious errors

Jennifer Byrne: first reported TPD52L2 (20 years ago).

5 publications from China:
  Single gene knockdown experiments.
  Human cancer cell lines.
  5 publications with obvious errors!

Conclusions highlight potential therapy:
  ...TPD52L2... novel therapeutic target for glioma treatment.
  ...TPD52L2... novel clues for oral squamous cell carcinoma therapy.
  ...TPD52L2... therapeutic approach for the treatment of breast cancer.
  ...TPD52L2 is indispensable in gastric cancer proliferation.
  ...TPD52L2 could be a novel therapeutic target for human liver cancer.

Fact checking science. Obvious errors: example

PMID: 25262828

Materials and methods
The shRNA sequence (5'-GCGGAGGGTTTGAAAGAATATCTCGAGATATTCTTTCAAACCCTCCGCTTTTTT-3') targeting TPD52L2 (NM_199360) was inserted into the pFH-L plasmid (Shanghai Hollybio, China). A scrambled shRNA that shared no homology with the mammalian genome (5'-CTAGCCCGGCCAAGGAAGTGCAATTGCATACTCGAGTATGCAATTGCACTTCCTTGGTTTTTTGTTAAT-3') was used as control.

Fact-check using blastn (NCBI)

  Query= SeqA (evalue = 10)
  Length=54
  Sequences producing significant alignments:
  ...
  > ... Homo sapiens tumor protein D52 like 2 (TPD52L2), ...
  Length=2230
  ...
  Query  1    GCGGAGGGTTTGAAAGAATAT  21
              |||||||||||||||||||||
  Sbjct  894  GCGGAGGGTTTGAAAGAATAT  914
  ...
  Query  28   ATATTCTTTCAAACCCTCCGC  48
              |||||||||||||||||||||
  Sbjct  914  ATATTCTTTCAAACCCTCCGC  894

  Query= SeqD (evalue = 10)
  Length=68
  Sequences producing significant alignments:
  ...
  > ... Homo sapiens NIN1/PSMD8 binding protein 1 homolog (NOB1)...
  Length=1775
  ...
  Query  9     GCCAAGGAAGTGCAATTGCATA  30
               ||||||||||||||||||||||
  Sbjct  1505  GCCAAGGAAGTGCAATTGCATA  1526
  ...
  Query  37    TATGCAATTGCACTTCCTTGG  57
               |||||||||||||||||||||
  Sbjct  1526  TATGCAATTGCACTTCCTTGG  1506

Summary:
  SeqA (shRNA declared as targeting TPD52L2): targets gene TPD52L2 (21/21).
  SeqD (scrambled shRNA declared as control): targets gene NOB1 (22/22).
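Each query above produces two hits on the same gene, the second one with decreasing subject coordinates. That is what one expects from a hairpin construct, in which a stem is followed by a loop and then by the reverse complement of the stem. The small Python sketch below checks this for the targeting sequence quoted in the slide. It does not call blastn; the helper names and the assumption that the loop is `CTCGAG` (read off the quoted sequence) are introduced here for illustration.

```python
# Targeting shRNA sequence as quoted in the Materials and methods excerpt (SeqA).
SEQ_A = "GCGGAGGGTTTGAAAGAATATCTCGAGATATTCTTTCAAACCCTCCGCTTTTTT"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return sequence.translate(COMPLEMENT)[::-1]


def split_hairpin(sequence, loop="CTCGAG"):
    """Split a hairpin at the first occurrence of the loop: (stem, remainder after the loop)."""
    index = sequence.find(loop)
    if index < 0:
        raise ValueError("loop not found")
    return sequence[:index], sequence[index + len(loop):]


stem, remainder = split_hairpin(SEQ_A)
# The part after the loop starts with the reverse complement of the stem,
# which is why the second alignment maps to the same region on the opposite strand.
antisense_matches = remainder.startswith(reverse_complement(stem))
```

Here `stem` is the 21-nucleotide segment aligned at query positions 1 to 21, and `reverse_complement(stem)` is the segment aligned at positions 28 to 48.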

Seek & Blastn tool: Seek & Blastn at a glance

(1) Facts extraction: named entity recognition, extract nucleotide sequences and their status.
  Input: the Materials and methods text (the shRNA sequence 5'-GCG...TTT-3' targeting TPD52L2; the scrambled shRNA 5'-CTA...AAT-3' used as control).
  Facts to check:
    Status      DNA Seq
    Targeting   GCG...TTT
    Non-Targ.   CTA...AAT

(2) Blastn call: the software gives the hit list.

(3) Comparison of the facts to check with the hit lists (Blastn results).
  Checked facts:
    Status      DNA Seq     Hit list
    Targ.       GCG...TTT   TPD52L2, ...
    Non-Targ.   CTA...AAT   NOB1, ...
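The three steps can be mocked up in a few lines of Python. This is a rough sketch under strong assumptions, not the Seek & Blastn implementation: sequences are found with a regular expression for the 5'-...-3' notation, the status is guessed from a hypothetical list of cue words in the same sentence, and the blastn call of step (2) is replaced by a hit list supplied by the caller. All names (`SEQUENCE`, `NON_TARGETING_CUES`, `extract_facts`, `check_fact`) are illustrative.

```python
import re

SEQUENCE = re.compile(r"5'-([ACGT]+)-3'")
# Hypothetical cue words suggesting a sequence is claimed to be a non-targeting control.
NON_TARGETING_CUES = ("scrambled", "no homology", "non-targeting", "control")

EXAMPLE_TEXT = (
    "The shRNA sequence (5'-GCGGAGGGTTTGAAAGAATATCTCGAGATATTCTTTCAAACCCTCCGCTTTTTT-3') "
    "targeting TPD52L2 (NM_199360) was inserted into the pFH-L plasmid (Shanghai Hollybio, China). "
    "A scrambled shRNA that shared no homology with the mammalian genome "
    "(5'-CTAGCCCGGCCAAGGAAGTGCAATTGCATACTCGAGTATGCAATTGCACTTCCTTGGTTTTTTGTTAAT-3') "
    "was used as control."
)


def extract_facts(text):
    """Step 1: list each nucleotide sequence with the status claimed in its sentence."""
    facts = []
    for sentence in re.split(r"(?<=\.)\s+", text):
        lowered = sentence.lower()
        for match in SEQUENCE.finditer(sentence):
            is_control = any(cue in lowered for cue in NON_TARGETING_CUES)
            facts.append({
                "sequence": match.group(1),
                "status": "non-targeting" if is_control else "targeting",
            })
    return facts


def check_fact(fact, hit_genes, claimed_gene):
    """Step 3: compare a claimed status with the genes in the hit list (step 2, supplied by the caller)."""
    if fact["status"] == "targeting":
        return "ok" if claimed_gene in hit_genes else "flag: claimed target not among hits"
    return "ok" if not hit_genes else "flag: control matches " + ", ".join(hit_genes)
```

With the hits shown on the previous slide (TPD52L2 for the targeting sequence, NOB1 for the scrambled control), `check_fact` accepts the first fact and flags the second, since a control claimed to share no homology with the genome should not match a human gene.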

Seek & Blastn tool: tests and results

Corpora used
  Problematic papers (CorpusP): 48 highly similar publications; in 38/48 (79%), nucleotide sequence(s) did not match their experimental use (blastn).
  Unknown papers (CorpusU): 154 papers, retrieved using CorpusP papers and the "PubMed similar" function.

Seek & Blastn performance
  In CorpusU, nucleotide sequences were extracted from 111/154 (73%) papers.
  Claims were not (correctly) identified for 19/341 (5.6%) sequences in CorpusP.
  The tool identified the 38/48 (79%) CorpusP papers that incorrectly employed sequences.

Error detection in scientific literature
  38 papers in CorpusP appear to have incorrectly employed nucleotide sequences.
  "Seek & Blastn" predicted that 30/154 (19%) CorpusU papers may have incorrectly employed nucleotide sequence reagent(s).

Seek & Blastn tool: related works and perspectives

Related works
  Detection of statistically flawed papers
  Fake news detection

Seek & Blastn perspectives
  Online tool: http://scigendetection.imag.fr/TPD52
  Avoid false positives, with a more in-depth analysis of sentences.

Retractions, error corrections
  A few retractions ($\approx$ 10), $\approx$ 50 to be treated
  Citation analysis (to be done)

Open access vs fee-based
  When fee-based, automatic download is not permitted.
  Paywalls hide both good and junk science.

Conclusion, future/ongoing works

Publication procedures, models and habits
  Why fake papers were accepted, published and ... sold.
  Traditional publishers vs open access.
  Blind management rules: incentives for malpractice, slicing, faked data, ...

Automatically identify and flag scientific errors/breakthroughs
  Mutual enrichment of two families of techniques (B+IR).
  Joint analysis of citations and text.

Measurement of perturbations... introduced by measuring science.

In the web today
  Automatic knowledge extraction/detection/generation.
  How to separate the wheat from the chaff... and scale up!

Thanks

References

Amancio, D. R. (2015). Comparing the topological properties of real and artificially generated scientific manuscripts. Scientometrics, 105(3):1763–1779.

Bartneck, C. and Servaas, K. (2011). Detecting h-index manipulation through self-citation analysis. Scientometrics, 87(1):85–98.

Beel, J. and Gipp, B. (2010). Academic search engine spam and google scholar's resilience against it. Journal of Electronic Publishing, 13(3).

Beel, J., Gipp, B., and Wilde, E. (2010). Academic search engine optimization (aseo). Journal of scholarly publishing, 41(2):176–190.

Byrne, J. A. and Labbé, C. (2017a). Fact checking nucleotide sequences in life science publications: The seek & blastn tool. In International Congress on Peer Review and Scientific Publication, Enhancing the quality and credibility of science, Chicago.

Byrne, J. A. and Labbé, C. (2017b). Striking similarities between publications from china describing single gene knockdown experiments in human cancer cell lines. Scientometrics, 110(3):1471–1493.

Dalkilic, M. M., Clark, W. T., Costello, J. C., and Radivojac, P. (2006). Using compression to identify classes of inauthentic texts. In Proceedings of the 2006 SIAM Conference on Data Mining.

Fahrenberg, U., Biondi, F., Corre, K., Jégourel, C., Kongshøj, S., and Legay, A. (2014). Measuring structural distances between texts. CoRR, abs/1403.4024.

Fister jr, I., Fister, I., and Perc, M. (2016). Toward the discovery of citation cartels in citation networks. Frontiers in Physics, 4.

Ginsparg, P. (2014). Automated screening: Arxiv screens spot fake papers. Nature, 508(7494):44–44.

Herteliu, C., Ausloos, M., Ileanu, B. V., Rotundo, G., and Andrei, T. (2017). Quantitative and qualitative analysis of editor behavior through potentially coercive citations. Publications, 5(2).

Labbé, C. (2010). Ike antkare, one of the great stars in the scientific firmament. International Society for Scientometrics and Informetrics Newsletter, 6(2):48–52.

Labbé, C. and Labbé, D. (2006). A tool for literary studies: intertextual distance and tree classification. Literary and Linguistic Computing, 21(3):311–326.

Labbé, C. and Labbé, D. (2013). Duplicate and fake publications in the scientific literature: how many scigen papers in computer science? Scientometrics, 94(1):379–396.

Lavoie, A. and Krishnamoorthy, M. (2010). Algorithmic Detection of Computer Generated Text. ArXiv e-prints.

Lopez-Cozar, E. D., Robinson-García, N., and Torres-Salinas, D. (2012). Manipulating google scholar citations and google scholar metrics: Simple, easy and tempting. arXiv preprint arXiv:1212.0638.

Xiong, J. and Huang, T. (2009). An effective method to identify machine automatically generated paper. In KESE '09. Pacific-Asia Conference, pages 101–102.