eISSN : 2287-4577 pISSN : 2287-9099 http://www.jistap.org Vol. 6 No. 3 September 30, 2018

A Combinational Method to Determining Identical Entities from Heterogeneous Knowledge Graphs

Evaluation of Websites of Public Libraries of India under Ministry of Culture: A Webometric Analysis

Anonymous and Non-anonymous User Behavior on Social Media: A Case Study of Jodel and Instagram

Rediscovering Forgotten Research: Sleeping Beauties at the University of Waterloo

Quantifying Quality: Research Performance Evaluation in Korean Universities

Indexed/Covered by SCOPUS, LISA, DOAJ, and CrossRef

General Information

Aims and Scope
The Journal of Information Science Theory and Practice (JISTaP) is an international journal that aims at publishing original studies, review papers, and brief communications on information science theory and practice. The journal provides an international forum for practical as well as theoretical research in the interdisciplinary areas of information science, such as information processing and management, knowledge organization, scholarly communication, and bibliometrics. JISTaP is published quarterly, issued on the 30th of March, June, September, and December. JISTaP is indexed in Scopus, the Korea Science Citation Index (KSCI), and KoreaScience by the Korea Institute of Science and Technology Information (KISTI), as well as CrossRef. The full text of this journal is available on the website at http://www.jistap.org


Publisher Korea Institute of Science and Technology Information 66, Hoegi-ro, Dongdaemun-gu, Seoul, Republic of Korea (T) +82-2-3299-6102 (F) +82-2-3299-6067 E-mail: [email protected] URL: http://www.jistap.org

Managing Editor: Suhyeon Yoo, Eungi Kim

Copy Editor: Ken Eckert

Design & Printing Company: SEUNGLIM D&C 4F, 15, Mareunnae-ro, Jung-gu, Seoul, Republic of Korea (T) +82-2-2271-2581~2 (F) +82-2-2268-2927 E-mail: [email protected]

Open Access and Creative Commons License Statement
All JISTaP content is Open Access, meaning it is accessible online to everyone, without fee and authors' permission. All JISTaP content is published and distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/). Under this license, authors reserve the copyright for their content; however, they permit anyone to unrestrictedly use, distribute, and reproduce the content in any medium as far as the original authors and source are cited. For any reuse, redistribution, or reproduction of a work, users must clarify the license terms under which the work was produced.

This paper meets the requirements of KS X ISO 9706, ISO 9706-1994 and ANSI/NISO Z39.48-1992 (Permanence of Paper)

2018 Copyright © Korea Institute of Science and Technology Information

Editorial Board

Co-Editors-in-Chief
Gary Marchionini, University of North Carolina, USA
Dong-Geun Oh, Keimyung University, Korea

Associate Editor
Kiduk Yang, Kyungpook National University, Korea
Taesul Seo, Korea Institute of Science and Technology Information, Korea

Managing Editor
Suhyeon Yoo, Korea Institute of Science and Technology Information, Korea
Eungi Kim, Keimyung University, Korea

Editorial Board Consulting Editors

Beeraka Ramesh Babu, University of Madras, India
Pia Borlund, University of Copenhagen, Denmark
France Bouthillier, McGill University, Canada
Kathleen Burnett, Florida State University, USA
Boryung Ju, Louisiana State University, USA
Noriko Kando, National Institute of Informatics, Japan
Shailendra Kumar, University of Delhi, India
Mallinath Kumbar, University of Mysore, India
Fenglin Li, Wuhan University, China
Thomas Mandl, Universität Hildesheim, Germany
Lokman I. Meho, American University of Beirut, Lebanon
Jin Cheon Na, Nanyang Technological University, Singapore
Dan O’Connor, Rutgers University, USA
Christian Schloegl, University of Graz, Austria
Ou Shiyan, Nanjing University, China
Paul Solomon, University of South Carolina, USA
Sujin Butdisuwan, Mahasarakham University, Thailand
Folker Caroli, Universität Hildesheim, Germany
Seon Heui Choi, Korea Institute of Science and Technology Information, Korea
Joy Kim, University of Southern California, USA
Kenneth Klein, University of Southern California, USA
M. Krishnamurthy, DRTC, Indian Statistical Institute, India
S.K. Asok Kumar, The Tamil Nadu Dr Ambedkar Law University, India
Hur-Li Lee, University of Wisconsin-Milwaukee, USA
P. Rajendran, SRM University, India
B. Ramesha, Bangalore University, India
Tsutomu Shihota, St. Andrews University, Japan
Ning Yu, University of Kentucky, USA
Wayne Buente, University of Hawaii, USA
Ina Fourie, University of Pretoria, South Africa
Helen Partridge, University of Southern Queensland, Australia


Table of Contents

Vol. 6 No. 3 September 30, 2018 JISTaP Journal of Information Science Theory and Practice• http://www.jistap.org

Articles

A Combinational Method to Determining Identical Entities from Heterogeneous Knowledge Graphs - Haklae Kim

Evaluation of Websites of Public Libraries of India under Ministry of Culture: A Webometric Analysis - Krishna Brahma, Manoj Kumar Verma

Anonymous and Non-anonymous User Behavior on Social Media: A Case Study of Jodel and Instagram - Regina Kasakowskij, Natalie Friedrich, Kaja J. Fietkiewicz, Wolfgang G. Stock

Rediscovering Forgotten Research: Sleeping Beauties at the University of Waterloo - Jeffrey Demaine

Quantifying Quality: Research Performance Evaluation in Korean Universities - Kiduk Yang, Hyekyung Lee

Call for Paper

Information for Authors

Research Paper
Journal of Information Science Theory and Practice
J Inf Sci Theory Pract 6(3): 06-15, 2018
eISSN : 2287-4577 pISSN : 2287-9099
https://doi.org/10.1633/JISTaP.2018.6.3.1

A Combinational Method to Determining Identical Entities from Heterogeneous Knowledge Graphs

Haklae Kim* Korea Institute of Science and Technology Information, Daejeon, Korea E-mail: [email protected]

ABSTRACT
With the increasing demand for intelligent services, knowledge graph technologies have attracted much attention. Various application-specific knowledge bases have been developed in industry and academia. In particular, open knowledge bases play an important role for constructing a new knowledge base by serving as a reference data source. However, identifying the same entities among heterogeneous knowledge sources is not trivial. This study focuses on extracting and determining exact and precise entities, which is essential for merging and fusing various knowledge sources. To achieve this, several algorithms for extracting the same entities are proposed and then their performance is evaluated using real-world knowledge sources.

Keywords: entity consolidation, knowledge extraction, knowledge graph, knowledge creation, knowledge interlinking

Received date: December 07, 2017
Accepted date: July 09, 2018

*Corresponding Author: Haklae Kim
Senior Researcher
Korea Institute of Science and Technology Information, 245 Daehak-ro, Yuseong-gu, Daejeon, 34141, Korea
E-mail: [email protected]

Open Access
All JISTaP content is Open Access, meaning it is accessible online to everyone, without fee and authors’ permission. All JISTaP content is published and distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/). Under this license, authors reserve the copyright for their content; however, they permit anyone to unrestrictedly use, distribute, and reproduce the content in any medium as far as the original authors and source are cited. For any reuse, redistribution, or reproduction of a work, users must clarify the license terms under which the work was produced.

© Haklae Kim, 2018 A Method to Determine Identical Entities

1. INTRODUCTION

With the increasing demand for intelligent services, knowledge graph technologies have attracted much attention for applications, ranging from question-answer systems to enterprise data integration (Gabrilovich & Usunier, 2016). A number of research efforts have already developed open knowledge bases such as DBpedia (Lehmann et al., 2009), Wikidata (Vrandecic, 2012), YAGO (Suchanek, Kasneci, & Weikum, 2007), and Freebase (Bollacker, Evans, Paritosh, Sturge, & Taylor, 2008). Most open knowledge bases heavily use Linked Data technologies for constructing, publishing, and accessing knowledge sources. Linked data is one of the core concepts of the Semantic Web, also called the Web of Data (Bizer, Cyganiak, & Heath, 2007; Gottron & Staab, 2014). It involves making relationships such as links between datasets understandable to both humans and machines. Technically, it is essentially a set of design principles for sharing machine-readable interlinked data on the Web (Berners-Lee, 2009). According to LODstats [1], 149B triples from 2,973 datasets have been published publicly, and 1,799,869 identical entity relations have already been made from 251 datasets. The standard method for stating a set of the same entities is to use the owl:sameAs property. This property is used to describe homogeneous instances that refer to the same object in the real world. It aims to indicate that two uniform resource identifier (URI) references actually refer to the same thing (Berners-Lee, 2009).

[1] http://lodstats.aksw.org/stats

Existing knowledge bases can be used to construct new ones to meet certain objectives, since constructing a new knowledge base from scratch is not easy. However, various issues arise when creating a new knowledge base by integrating multiple knowledge sources. One issue is whether the relationships in the existing knowledge base are always reliable. All individual instances of given knowledge sources should be identified and linked to these sources before integrating knowledge sources (Halpin, Hayes, McCusker, McGuinness, & Thompson, 2010). The problem of discovering the same entities in various data sources has been studied extensively; it is variously referred to as entity reconciliation (Enríquez, Mayo, Cuaresma, Ross, & Staples, 2017), entity resolution (Stefanidis, Efthymiou, Herschel, & Christophides, 2014), entity consolidation (Hogan, Zimmermann, Umbrich, Polleres, & Decker, 2012), and instance matching (Castano, Ferrara, Montanelli, & Lorusso, 2008). All of these approaches are very important for identifying the same relationships to extract and generate knowledge from different data sets. Entity consolidation for data integration at the instance level has attracted interest in the semantic web and linked data communities. It refers to the process of identifying the same entities across heterogeneous data sources (Hogan et al., 2012). The problem can be stated simply: different identifiers are used for identical entities scattered across different datasets in a web of data. Because redundancy causes an increase in noisy or unnecessary information across a distributed web of data, identifying the same items can be advantageous in that multiple descriptions of the same entity can mutually complete and complement each other (Enríquez et al., 2017).

This study proposes a combinational approach for extracting and determining the same entities from heterogeneous knowledge sources. It focuses on extracting exact and precise entity linkages, which is the key to merging and fusing various knowledge sources into new knowledge. The remainder of this paper is organized as follows. Section 2 presents a literature review of related works. Section 3 introduces research methods and basic principles of defining an entity pair from multiple knowledge bases. Section 4 introduces a formal model for entity consolidation and presents several strategies for extracting and identifying the same entities. Section 5 introduces implementations of proposed strategies with some examples. Section 6 addresses and discusses findings from the evaluation using real-world knowledge bases. Section 7 concludes this study and discusses future work.

2. RELATED WORK

A number of open knowledge bases already exist such as DBpedia, Freebase, Wikidata, and YAGO (Paulheim, 2017). Wikidata (Vrandecic, 2012) is a knowledge base about the world that can be read and edited by humans and machines with the Creative Commons Zero license (CC-0) [2]. Information from Wikidata is called items, which are comprised of labels, descriptions, and aliases in all languages of Wikipedia. Wikidata does not aim to offer a single truth about things; instead, it provides statements given in a particular context. DBpedia (Lehmann et al., 2009) is a structured, multilingual knowledge set from Wikipedia and is made freely available on the Web using semantic web and linked data technologies. It has developed into the central interlinking hub in the Web of linked data, because it covers a wide variety of topics and sets Resource Description Framework (RDF) links pointing to various external data sources. Freebase (Bollacker, Evans, Paritosh, Sturge, & Taylor, 2008) was a large collaborative and structured knowledge base harvested from diverse data sources. It aimed to create a global resource graph that allowed humans and machines to access common knowledge more effectively. Google developed a Knowledge Graph using Freebase. On the other hand, Knowledge Vault was developed by Google to extract facts, in the form of disambiguated triples, from the entire web (Dong et al., 2014). The main difference from other works is that it fuses together facts extracted from text with prior knowledge derived from the Freebase graph. YAGO (Suchanek et al., 2007) fuses multilingual knowledge with English WordNet to build a coherent knowledge base from Wikipedia in multiple languages.

[2] https://creativecommons.org/choose

Färber, Ell, Menne, and Rettinger (2015) analyse existing knowledge graphs based on 35 characteristics, including general information (e.g., version, languages, or covered domains), format and representation (e.g., dataset formats, dynamicity, or query languages), genesis and usage (e.g., provenance of facts, influence on other linked open data [LOD] datasets), entities (e.g., entity reference, LOD registration and linkage), relations (e.g., reference, relevance, or description of relations), and schema (e.g., restrictions, constraints, network of relations). According to the comparison of entities, most knowledge graphs provide human-readable identifiers; however, Wikidata provides entity identifiers that consist of “Q” followed by a specific number (Wang, Mao, Wang, & Guo, 2017). Most knowledge graphs are published in RDF and link their entities to entities of other datasets in the LOD cloud [3]. In particular, DBpedia and Freebase have a high degree of connectivity with other LOD datasets.

[3] http://lod-cloud.net/

Note that Google recently announced that it transferred data from Freebase to Wikidata, and it launched a new API for entity search powered by Google’s Knowledge Graph. Mapping tools [4] have been provided to increase the transparency of the publication process of Freebase content to integrate into Wikidata. Tanon, Vrandecic, Schaffert, Steiner, and Pintscher (2016) provided a method for migrating from Freebase to Wikidata with some limitations, including entity linking and schema mapping. This study provides comprehensive entity extraction techniques for interlinking from two knowledge sources. However, identifying the same entities from knowledge sources is not enough to integrate two knowledge bases. Various studies have investigated pragmatic issues of owl:sameAs in the context of the Web of Data (Halpin et al., 2010; Ding, Shinavier, Shangguan, & McGuinness, 2010; Hogan et al., 2012; Idrissou, Hoekstra, van Harmelen, Khalili, & den Besselaar, 2017). In particular, Hogan et al. (2012) discuss scalable and distributed methods for entity consolidation to locate and process names that signify the same entity. They calculate weighted concurrence measures between entities in the Linked Data corpus based on shared inlinks/outlinks and attribute values using statistical analyses. This paper proposes a combinational approach to extract identical entity pairs from heterogeneous knowledge sources.

[4] https://github.com/google/freebase-wikidata-converter

3. METHODOLOGY

3.1. Research Approach
This study proposes a method for extracting a set of identical entities from heterogeneous knowledge sources. An identical relationship between entities is determined from the properties of the entities and their values. The analysis is performed through a combination of several methods, each called a ‘strategy.’ In this paper, five strategies are introduced and are combined for extracting and verifying identical relationships of entities. Each strategy has its own advantages and disadvantages. For example, the consistency strategy is a simple method for extracting entities, but it returns a certain amount of ambiguity as noise, whereas the max confidence strategy reduces ambiguities by calculating a confidence score of entity pairs. Although the max confidence method is more useful for extracting entity pairs than the consistency method, the max confidence strategy is based on the entity pairs extracted by the consistency one. Therefore, each strategy can be used for individual purposes, and the strategies can also be combined to determine high-quality identical entity pairs.

3.2. A Formal Model of an Entity Pair
Let knowledge bases $K_1$ and $K_2$ contain a set of entities and properties, respectively. The set of entities is $K_i^E = \{K_i^{e_1}, \ldots, K_i^{e_n}\}$ and the set of properties in $K_i$ is $K_i^P = \{K_i^{P_1}, \ldots, K_i^{P_n}\}$. In addition, let $K_i^O = \{K_i^C, \ldots, K_i^P\}$ be the ontology schema of $K_i$, where $K_i^C$ is the set of classes and $K_i^P$ is the set of properties. Thus, entity pairs $EP(K_1, K_2)$ as a set of identical entities for given knowledge bases $K_1$ and $K_2$ are denoted as follows:

$$EP(K_1, K_2) = \{(K_1^{e_1}, K_2^{e_j}), \ldots, (K_1^{e_s}, K_2^{e_t})\}$$

where $K_1^{e_i}$ is identical to $K_2^{e_j}$. On the other hand, the schema alignment $K^O$ is aligned to its schemas, $K_1^O$ and $K_2^O$:

$$K^O = K_1^O \overset{\mathrm{align}}{\longleftrightarrow} K_2^O$$

where $K_1^C \overset{\mathrm{align}}{\longleftrightarrow} K_2^C$ is the class alignment and $K_1^P \overset{\mathrm{align}}{\longleftrightarrow} K_2^P$ is the property alignment for $K_1$ and $K_2$. In this sense, $K_1^{C_i} \overset{\mathrm{align}}{\longleftrightarrow} K_2^{C_j}$ means that $K_1^{C_i}$ is identical to $K_2^{C_j}$, and $K_1^{P_i} \overset{\mathrm{align}}{\longleftrightarrow} K_2^{P_j}$ means that the value of $P_i$ in $K_1$ corresponds to that of $P_j$ in $K_2$. Thus, according to $K^O$, a set of property mappings to the matching keys is defined as follows:

$$MK(K_1, K_2) = \{(K_1^{P_i}, K_2^{P_j}), \ldots, (K_1^{P_s}, K_2^{P_t})\}$$

4. STRATEGIES FOR ENTITY CONSOLIDATION

A number of approaches are available for identifying the same entities from heterogeneous knowledge bases (Hors & Speicher, 2014; Nguyen & Ichise, 2016; Moaawad, Mokhtar, & al Feel, 2017). This section addresses some methods to determine identical relationships from the extracted entities. Formal models of the strategies are introduced and their characteristics are also discussed.

4.1. Consistency Strategy
This strategy aims to extract a set of precise entities by mapping property values on specific knowledge bases. That is, to determine the consistency of $K_1^{e_i}$ and $K_2^{e_j}$ based on matching keys $MK$, two strategies, $S_I$ and $S_U$, are defined:

Strategy $S_I$: For $K_1^{e_m}$ and $K_2^{e_n}$ from $K_1$ and $K_2$, $\forall (K_1^{P_i}, K_2^{P_j}) \in MK(K_1, K_2)$, the $K_1^{P_i}$ value of $K_1^{e_m}$ is exactly equal to the $K_2^{P_j}$ value of $K_2^{e_n}$. Then, $(K_1^{e_m}, K_2^{e_n})$ is an identical entity pair, and the consistency determination is of the intersection strategy $S_I$.

Strategy $S_U$: For $K_1^{e_m}$ and $K_2^{e_n}$ from $K_1$ and $K_2$, $\exists (K_1^{P_i}, K_2^{P_j}) \in MK(K_1, K_2)$, the $K_1^{P_i}$ value of $K_1^{e_m}$ is exactly equal to the $K_2^{P_j}$ value of $K_2^{e_n}$. Then, $(K_1^{e_m}, K_2^{e_n})$ is an identical entity pair, and the consistency determination is of the union strategy $S_U$.

This strategy is based on the assumption that all knowledge sources are trustworthy: The knowledge in $K_i$ is precise and without defect. The identical relations $EP(K_1, K_2)$ extracted by this strategy are considered precise because the mapping of the property values is exact without bias. In practice, however, most open knowledge bases contain some defects, which may be caused by false recognition, inaccurate sources, or knowledge redundancy. Note that one entity can be interlinked to multiple entities of different knowledge sources (e.g., $\exists (K_1^{e_i}, K_2^{e_j}), (K_1^{e_s}, K_2^{e_t}) \in EP(K_1, K_2)$ with $K_1^{e_i} = K_1^{e_s}$ and $K_2^{e_j} \neq K_2^{e_t}$). This ambiguous pair might arise from a defect in the knowledge base $K_i$. For establishing high-quality linkages across heterogeneous knowledge sources, it is essential to extract confident $EP(K_1, K_2)$ by eliminating ambiguities to the greatest extent possible. Therefore, alternative strategies are proposed.

4.2. Max Confidence Strategy
This strategy calculates a confidence score for the entity pairs extracted by the consistency strategy to reduce the noise caused by defects and determines precise and confident entity pairs. The formal notation of this strategy is defined as follows:

Given matching keys $MK(K_1, K_2) = \{(K_1^{P_i}, K_2^{P_j}), \ldots, (K_1^{P_s}, K_2^{P_t})\}$, for $K_1^{e_m} \in K_1^E$ and $K_2^{e_n} \in K_2^E$, let $MK_m = \{(K_1^{P_{im}}, K_2^{P_{jm}}), \ldots, (K_1^{P_{sm}}, K_2^{P_{tm}})\}$ be the matched $MK(K_1, K_2)$, where $(K_1^{P_{im}}, K_2^{P_{jm}})$ indicates that the $K_1^{P_{im}}$ value of $K_1^{e_m}$ is exactly equal to the $K_2^{P_{jm}}$ value of $K_2^{e_n}$. Based on this, $MK_m \subseteq MK(K_1, K_2)$, and a confidence score of $(K_1^{e_m}, K_2^{e_n})$ is calculated by the following equation:

$$\mathrm{conf}(K_1^{e_m}, K_2^{e_n}) = \frac{\|MK_m\|}{\|MK(K_1, K_2)\|}$$

where $\|\cdot\|$ is the cardinality. A confidence score is thus assigned to each entity pair. For $(K_1^{e_i}, K_2^{e_j}), (K_1^{e_s}, K_2^{e_t}) \in EP(K_1, K_2)$, $(K_1^{e_i}, K_2^{e_j})$ is the confident identical entity pair where $\mathrm{conf}(K_1^{e_i}, K_2^{e_j}) > \mathrm{conf}(K_1^{e_s}, K_2^{e_t})$.

4.3. Threshold Filtering Strategy
The Max Confidence strategy allows ambiguous same-entity pairs to be filtered out; nonetheless, some entity pairs may have relatively high scores with low confidence levels. To solve this issue, a threshold is added to the extraction process: If an entity pair has been determined with the highest confidence but has a low score compared to other scores, it can be removed from the set of candidates. The threshold filtering strategy aims to improve the confidence level of extracted entity pairs by using a threshold score. Given a threshold $\theta$, for $(K_1^{e_i}, K_2^{e_j}), (K_1^{e_i}, K_2^{e_t}) \in EP(K_1, K_2)$, where $\mathrm{conf}(K_1^{e_i}, K_2^{e_j}) \geq \mathrm{conf}(K_1^{e_i}, K_2^{e_t})$ and $\mathrm{conf}(K_1^{e_i}, K_2^{e_j}) \geq \theta$, $(K_1^{e_i}, K_2^{e_j})$ is selected as the confident same entity pair.

4.4. One-to-One Mapping Strategy
This strategy simply extracts 1-1 entity pairs from heterogeneous knowledge sources by ignoring multiple
relations in which one identifier is matched to multiple identifiers of different sources. Formally, it is represented as $\forall (K_1^{e_i}, K_2^{e_j}) \in EP(K_1, K_2)$, $\nexists (K_1^{e_s}, K_2^{e_t}) \in EP(K_1, K_2)$ where $i = s$ or $j = t$. By applying one-to-one mapping, identical entity pairs $EP(K_1, K_2)$ have no ambiguous relations.

4.5. Belief-based Strategy
The four strategies introduced so far focus on inter-relations between entity pairs by comparing properties of knowledge bases, whereas they do not consider intra-relations in a certain pair. In other words, property values of entities in a certain pair should be checked for determining identical relations. The belief-based strategy aims to analyse property values of extracted entity pairs and is based on the Dempster-Shafer theory (Yager, 1987), also called the theory of evidence.

Given a set of same entity pairs $EP$, let $X_{EP}$ denote the set representing all possible states of an entity pair. Here, two cases are possible: The two entities are linked ($L$) or the two entities are not linked ($U$). Note that $X_{EP} = \{L, U\}$. Then, $2^{X_{EP}} = \{\emptyset, \{L\}, \{U\}, \{L, U\}\}$, where $\emptyset$ indicates the empty set, and $\{L, U\}$ indicates that it is uncertain whether they are linked. Therefore, a belief degree is assigned to each element of $2^{X_{EP}}$:

$$m: 2^{X_{EP}} \rightarrow [0, 1] \quad (1)$$

where $m$ is the degree of same belief, which is the basic belief assignment in the Dempster-Shafer theory. Then, each pair of knowledge sources has four hypotheses, and the formal model is represented as follows:

$$m(\emptyset) = 0 \quad (2)$$

Given $MK(K_1, K_2) = \{(K_1^{P_i}, K_2^{P_j}), \ldots, (K_1^{P_s}, K_2^{P_t})\}$ and $MK_m = \{(K_1^{P_{im}}, K_2^{P_{jm}}), \ldots, (K_1^{P_{sm}}, K_2^{P_{tm}})\}$, $m$ is assigned as follows:

$$m(\{L\}) = \frac{\|MK_m\|}{\|MK(K_1, K_2)\|} \quad (3)$$

$$m(\{U\}) = \frac{\|MK_{um}\|}{\|MK(K_1, K_2)\|} \quad (4)$$

where $MK_{um}$ represents the unmatched $MK(K_1, K_2)$, that is, the $K_1^{P_i}$ value of $K_1^{e_m}$ is not equal to the $K_2^{P_j}$ value of $K_2^{e_n}$. Uncertain pairs of knowledge sources are calculated by the following model:

$$m(\{L, U\}) = 1 - m(\{L\}) - m(\{U\}) \quad (5)$$

According to the theory of evidence, the basic belief assignment $m(A)$, $A \in 2^{X_{EP}}$, expresses the proportion of all relevant and available evidence that supports the claim that the actual state belongs to $A$. In this sense, a degree of belief is represented as a belief function rather than a Bayesian probability distribution.

5. IMPLEMENTATION OF THE BELIEF-BASED STRATEGY

The proposed strategies are developed in the entity extraction framework (Kim, Liang, & Ying, 2014), which extracts identical entities among heterogeneous knowledge sources. In particular, entity matching is carried out using configured property values for each entity pair. As illustrated in Fig. 1, it is comprised of several components: Preprocessor, for normalising entities and properties and extracting a set of URIs from knowledge sources; Matching, for matching extracted entities and properties based on exact and similarity measures; Optimisation, for better extracting a set of same entity pairs using several strategies; and Knowledge Base Management, which aims to create and interlink a knowledge base for the consolidation results.

[Figure: entity consolidation engine (Preprocessor, Matching, Optimisation) on top of knowledge bases management (metadata, knowledge bases, intermediate datasets).]
Fig. 1. Entity extraction framework. URI, uniform resource identifier.

Currently, this framework is being used for extracting relations from both Wikidata and Freebase. To identify the same entities from both knowledge sources, Wikipedia is the primary data source used to detect relations between Freebase and Wikidata. Therefore, for detecting source errors and identifying exact identical relationships, four strategies are implemented. In particular, those strategies are fully implemented in this framework: for example, the workflow of entity consolidation based on the Max Confidence strategy is shown in Fig. 2. It is designed to compute the Max Confidence for entity consolidation to reduce the noise
caused by defects and to obtain precise and confident same entity pairs.

Fig. 2. The algorithm of the Max Confidence strategy.

For the threshold strategy, a threshold score is set as 0.5 by default. After eliminating the set of pairs under the threshold score, the Max Confidence approach is applied. Furthermore, the belief-based approach is developed and applied by using the same datasets. As shown in Table 1, for the Persian soldier Pharnabazus II (https://en.wikipedia.org/wiki/Pharnabazus_II), Freebase (http://rdf.freebase.com/ns/m.01d89y) has 8 Wikipedia links, whereas Wikidata (https://www.wikidata.org/wiki/Q458256) has 20 Wikipedia links (Table 2). Note that the belief-based approach for the case shown in Tables 1 and 2 can be calculated as follows:

$$\mathrm{mass}(\emptyset) = 0$$
$$\mathrm{mass}(\{\mathrm{Link}\}) = \frac{\text{Matched Wikipedia Link Number}}{\text{Total Wikipedia Link Number}}$$
$$\mathrm{mass}(\{\mathrm{Unlink}\}) = \frac{\text{Unmatched Wikipedia Link Number}}{\text{Total Wikipedia Link Number}}$$
$$\mathrm{mass}(\{\mathrm{Link}, \mathrm{Unlink}\}) = 1 - \mathrm{mass}(\emptyset) - \mathrm{mass}(\{\mathrm{Link}\}) - \mathrm{mass}(\{\mathrm{Unlink}\})$$

Links are marked as matched or unmatched by comparing the Wikipedia links of the given identifiers. On the other hand, some links have no counterpart in the other knowledge base; in this case, the status is uncertain. Therefore, the belief-based approach for the given example is calculated:

$$\mathrm{mass}(\{\mathrm{Link}\}) = 3/8 = 0.375$$
$$\mathrm{mass}(\{\mathrm{Unlink}\}) = 5/8 = 0.625$$
$$\mathrm{mass}(\{\mathrm{Link}, \mathrm{Unlink}\}) = 0$$

As a result, for entity ‘m.01d89y,’ the belief degree for unlinking with entity ‘Q458256’ is much greater than the belief degree for linking. Therefore, we consider that ‘m.01d89y’ is different from ‘Q458256.’
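The mass assignment of Equations (2) to (5) can be reproduced with a few lines of Python. The following is only a minimal sketch, not the framework's implementation: the function name `belief_masses` and the dictionary keys are hypothetical, and the counts used (3 matched and 5 unmatched links out of 8) are those reported for the Freebase entity in Table 1.

```python
from fractions import Fraction


def belief_masses(matched: int, unmatched: int, total: int) -> dict:
    """Basic belief assignment over {empty, {L}, {U}, {L, U}}.

    Hypothetical helper: matched/unmatched/total are counts of matching
    keys (here, Wikipedia links) for one entity pair.
    """
    if total <= 0 or matched < 0 or unmatched < 0 or matched + unmatched > total:
        raise ValueError("counts must satisfy 0 <= matched + unmatched <= total")
    m_link = Fraction(matched, total)      # Eq. (3): ||MK_m|| / ||MK||
    m_unlink = Fraction(unmatched, total)  # Eq. (4): ||MK_um|| / ||MK||
    return {
        "empty": Fraction(0),                  # Eq. (2)
        "link": m_link,
        "unlink": m_unlink,
        "uncertain": 1 - m_link - m_unlink,    # Eq. (5)
    }


# Counts taken from Table 1 (entity m.01d89y): 3 matched, 5 unmatched, 8 links.
masses = belief_masses(matched=3, unmatched=5, total=8)
for name, value in masses.items():
    print(f"{name}: {float(value)}")
```

Exact fractions are used so that the four masses always sum to one without rounding error; converting to floats only for display is a design choice of this sketch.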

Table 1. An example of Freebase entity

Identifier | Wikipedia language | Wikipedia link | Matched
m.01d89y | en | http://en.wikipedia.org/wiki/Pharnabazos_II,_Satrap_of_Phrygia | Unmatched
m.01d89y | es | http://es.wikipedia.org/wiki/Farnabazo_I | Unmatched
m.01d89y | it | http://it.wikipedia.org/wiki/Farnabazo_II | Matched (1)
m.01d89y | ja | http://ja.wikipedia.org/wiki/%E3%83%95%E3%82%A1%E3%83%AB%E3%83%8A%E3%83%90%E3%82%BE%E3%82%B9_%28%E3%82%A2%E3%83%AB%E3%82%BF%E3%83%90%E3%82%BE%E3%82%B9%E3%81%AE%E5%AD%90%29 | Unmatched
m.01d89y | ca | http://ca.wikipedia.org/wiki/Farnabazos_I | Unmatched
m.01d89y | he | http://he.wikipedia.org/wiki/%D7%A4%D7%A8%D7%A0%D7%91%D7%96%D7%95%D7%A1_%D7%94%D7%A9%D7%A0%D7%99 | Matched (2)
m.01d89y | hr | http://hr.wikipedia.org/wiki/Farnabaz_I. | Unmatched
m.01d89y | el | http://el.wikipedia.org/wiki/%CE%A6%CE%B1%CF%81%CE%BD%CE%AC%CE%B2%CE%B1%CE%B6%CE%BF%CF%82_%CE%92%CE%84 | Matched (3)

The full uniform resource identifier of Freebase entity has ‘http://rdf.freebase.com/ns/’ with identifier, i.e., http://rdf.freebase.com/ns/m.01d89y.


Table 2. An example of Wikidata

Identifier | Wikipedia language | Wikipedia link | Matched
Q458256 | be_x_old | http://be-x-old.wikipedia.org/wiki/%D0%A4%D0%B0%D1%80%D0%BD%D0%B0%D0%B1%D0%B0%D0%B7_II | Uncertain
Q458256 | be | http://be.wikipedia.org/wiki/%D0%A4%D0%B0%D1%80%D0%BD%D0%B0%D0%B1%D0%B0%D0%B7_II | Uncertain
Q458256 | bg | http://bg.wikipedia.org/wiki/%D0%A4%D0%B0%D1%80%D0%BD%D0%B0%D0%B1%D0%B0%D0%B7_II | Uncertain
Q458256 | ca | http://ca.wikipedia.org/wiki/Farnabazos_II | Unmatched
Q458256 | de | http://de.wikipedia.org/wiki/Pharnabazos_II. | Uncertain
Q458256 | it | http://it.wikipedia.org/wiki/Farnabazo_II | Matched (1)
Q458256 | en | http://en.wikipedia.org/wiki/Pharnabazus_II | Uncertain
Q458256 | es | http://es.wikipedia.org/wiki/Farnabazo_II | Unmatched
Q458256 | fr | http://fr.wikipedia.org/wiki/Pharnabaze | Uncertain
Q458256 | he | http://he.wikipedia.org/wiki/%D7%A4%D7%A8%D7%A0%D7%91%D7%96%D7%95%D7%A1_%D7%94%D7%A9%D7%A0%D7%99 | Matched (2)
Q458256 | hr | http://hr.wikipedia.org/wiki/Farnabaz_II. | Unmatched
Q458256 | el | http://el.wikipedia.org/wiki/%CE%A6%CE%B1%CF%81%CE%BD%CE%AC%CE%B2%CE%B1%CE%B6%CE%BF%CF%82_%CE%92%CE%84 | Matched (3)
Q458256 | ja | http://ja.wikipedia.org/wiki/%E3%83%95%E3%82%A1%E3%83%AB%E3%83%8A%E3%83%90%E3%82%BE%E3%82%B9_(%E3%83%95%E3%82%A1%E3%83%AB%E3%83%8A%E3%82%B1%E3%82%B9%E3%81%AE%E5%AD%90) | Unmatched
Q458256 | nl | http://nl.wikipedia.org/wiki/Pharnabazus | Uncertain
Q458256 | no | http://no.wikipedia.org/wiki/Farnabazos | Uncertain
Q458256 | pl | http://pl.wikipedia.org/wiki/Farnabazos_II | Uncertain
Q458256 | ru | http://ru.wikipedia.org/wiki/%D0%A4%D0%B0%D1%80%D0%BD%D0%B0%D0%B1%D0%B0%D0%B7 | Uncertain
Q458256 | sh | http://sh.wikipedia.org/wiki/Farnabaz_II | Uncertain
Q458256 | sv | http://sv.wikipedia.org/wiki/Farnabazos | Uncertain
Q458256 | uk | http://uk.wikipedia.org/wiki/%D0%A4%D0%B0%D1%80%D0%BD%D0%B0%D0%B1%D0%B0%D0%B7 | Uncertain

The full uniform resource identifier of Wikidata entity has ‘https://www.wikidata.org/wiki/’ with identifier, i.e., https://www.wikidata.org/wiki/Q458256.
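Tables 1 and 2 compare two entities through their per-language Wikipedia links, which play the role of matching keys. The consistency, max confidence, threshold filtering, and one-to-one mapping strategies of Section 4 can be sketched over such keys as follows. This is an illustrative sketch under stated assumptions, not the author's implementation: all function names and the toy data are hypothetical, the pairwise comparison is brute force, and ambiguity in `max_confidence` is resolved per entity of the first knowledge base only.

```python
from collections import Counter
from typing import Dict, List, Tuple

Entity = Dict[str, str]               # property name -> property value
KnowledgeBase = Dict[str, Entity]     # entity identifier -> entity
MatchingKeys = List[Tuple[str, str]]  # MK: (property in K1, property in K2)
Pair = Tuple[str, str]


def matched_keys(e1: Entity, e2: Entity, keys: MatchingKeys) -> MatchingKeys:
    """MK_m: matching keys whose values are exactly equal in both entities."""
    return [(p1, p2) for p1, p2 in keys
            if p1 in e1 and p2 in e2 and e1[p1] == e2[p2]]


def consistency(k1: KnowledgeBase, k2: KnowledgeBase, keys: MatchingKeys,
                mode: str = "union") -> List[Pair]:
    """S_U ('union'): at least one key matches; S_I ('intersection'): all match."""
    if not keys:
        raise ValueError("at least one matching key is required")
    pairs = []
    for id1, e1 in k1.items():
        for id2, e2 in k2.items():
            n = len(matched_keys(e1, e2, keys))
            if (mode == "union" and n > 0) or (mode == "intersection" and n == len(keys)):
                pairs.append((id1, id2))
    return pairs


def confidence(e1: Entity, e2: Entity, keys: MatchingKeys) -> float:
    """conf = ||MK_m|| / ||MK||."""
    return len(matched_keys(e1, e2, keys)) / len(keys)


def max_confidence(pairs: List[Pair], k1: KnowledgeBase, k2: KnowledgeBase,
                   keys: MatchingKeys, threshold: float = 0.0) -> List[Pair]:
    """Drop pairs scoring below the threshold, then keep the best pair per K1 entity."""
    best: Dict[str, Tuple[str, float]] = {}
    for id1, id2 in pairs:
        score = confidence(k1[id1], k2[id2], keys)
        if score < threshold:
            continue
        if id1 not in best or score > best[id1][1]:
            best[id1] = (id2, score)
    return [(id1, id2) for id1, (id2, _) in best.items()]


def one_to_one(pairs: List[Pair]) -> List[Pair]:
    """Keep only pairs whose two identifiers each occur in exactly one pair."""
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return [(a, b) for a, b in pairs if left[a] == 1 and right[b] == 1]


# Invented toy data: property names are language codes, values stand in for links.
keys = [("en", "en"), ("it", "it"), ("he", "he"), ("el", "el")]
k1 = {"f1": {"en": "A", "it": "B", "he": "C", "el": "D"},
      "f2": {"en": "P", "it": "Q", "he": "R", "el": "S"}}
k2 = {"w1": {"en": "A", "it": "B", "he": "C", "el": "X"},
      "w2": {"en": "Z", "it": "B"},
      "w3": {"en": "P", "it": "Q", "he": "R", "el": "S"}}

union_pairs = consistency(k1, k2, keys, mode="union")
intersection_pairs = consistency(k1, k2, keys, mode="intersection")
filtered_pairs = max_confidence(union_pairs, k1, k2, keys, threshold=0.5)
unambiguous_pairs = one_to_one(union_pairs)
print(union_pairs, intersection_pairs, filtered_pairs, unambiguous_pairs)
```

In this toy example the union strategy returns an ambiguous link for `f1` (to both `w1` and `w2`), which the thresholded max confidence step resolves in favour of `w1`, while one-to-one mapping discards both candidates for `f1`.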

6. EVALUATION

6.1. Data Collection
Two knowledge bases (i.e., Wikidata and Freebase) are selected to demonstrate the proposed strategies. Wikidata and Freebase are receiving great attention from academia and industry for constructing their own knowledge bases, and there are realistic issues for data integration between two knowledge sources. It is essential to derive homogeneous entities for knowledge integration, since Wikidata and Freebase have been developed independently. A set of same entities between Freebase (2015-02-10) [5] and Wikidata (2015-02-07) is extracted via their own Wikipedia reference links (i.e., wiki-keys of Freebase and Wikipedia URLs of Wikidata). After pre-processing the collected datasets, 4,446,380 entities from Freebase and 15,403,618 entities from Wikidata are extracted with Wikipedia links. By using the consistency strategy (i.e., S1), 4,400,955 pairs are obtained from both knowledge sources.

[5] https://developers.google.com/freebase/

6.2. Results
The aim of applying different approaches for same-entity extraction is to generate links with the highest confidence between Freebase and Wikidata entities. The results differed slightly with the given datasets. Fig. 3 illustrates the results obtained using different mapping styles with the proposed strategies. Note that the consistency strategy obtains the largest number of entity pairs. Nonetheless, there are a number of 1-multiple/multiple-1/
12 A Method to Determine Identical Entities

4,410,000 4,400,955 4,400,000 4,395,258 4,395,542 4,390,429 4,390,000

4,380,000

4,370,000

4,360,000 4,357,469

4,350,000

4,340,000

4,330,000 Consistency Max Confidence (A+B) Threshold (0.5) One-to-one mapping Belief-based

Fig. 3. The result of extracting same entities between Freebase and Wikidata. multiple-multiple links which cause ambiguities as shown in precision scores are slightly differed among these strategies. Table 3. Without applying any approaches, the consistency Based on this result, a combination of each strategy can strategy possesses the largest ambiguity (0.37%). The one- reduce some ambiguities that are not removed using a single to-one mapping obviously holds the full confident same approach. On the other hand, both the precision and F1 entity mapping pairs. The Max Confidence, the Threshold score of the belief-based strategy are 99.1165 and 99.5563, Filtering (0.5 threshold), and the Belief-based strategies respectively. This demonstrates that the belief-based strategy show great effect on elimination of ambiguity. The number provides an extremely high matching quality. of mapping pairs based on belief degree is approximated to Note that Google has also constructed a mapping between that of Max Confidence. The belief degree greatly influences Freebase and Wikidata that was published in October the reduction in ambiguity in the multiple Freebase case but 2013. They detected 2,099,582 entity pairs with 2,096,745 not in the multiple Wikidata case. Freebase entities and 2,099,582 Wikidata entities. Fig. 4 As shown in Table 4, the precision and F1 score are 100 illustrates the result of identical entity pairs using the same percent for all strategies, because the set of matching pairs datasets from Freebase and Wikidata. The entity pairs from is extracted by using the Strategy SU, whereas both the all proposed strategies have some differences compared precision and F1 score are greater than 98.1371 percent, and to the Google result. Although they did not explicitly

Table 3. Composition of mapping results based on different strategies

| | Consistency | Max Confidence | Threshold Filtering | One-to-one mapping | Belief-based |
|---|---|---|---|---|---|
| 1 Freebase and 1 Wikidata | 4,384,747 | 4,390,685 | 4,390,423 | 4,390,423 | 4,352,022 |
| 1 Freebase and multiple Wikidata | 14,586 | 4,400 | 4,814 | 0 | 4,704 |
| Multiple Freebase and 1 Wikidata | 957 | 143 | 262 | 6 | 632 |
| Multiple Freebase and multiple Wikidata | 665 | 30 | 43 | 0 | 111 |
| Total | 4,400,955 | 4,395,258 | 4,395,542 | 4,390,429 | 4,357,469 |
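The ambiguity figure quoted in the text can be checked against Table 3 with a short sketch. It assumes that "ambiguity" means the share of mapping pairs that are not 1 Freebase to 1 Wikidata; that reading is an assumption, although it reproduces the 0.37% reported for the consistency strategy. The names `TABLE3` and `ambiguity_percent` are illustrative only.

```python
# Pair counts copied from Table 3, in row order:
# (1-to-1, 1 Freebase to multiple Wikidata, multiple Freebase to 1 Wikidata,
#  multiple-to-multiple)
TABLE3 = {
    "Consistency": (4_384_747, 14_586, 957, 665),
    "Max Confidence": (4_390_685, 4_400, 143, 30),
    "Threshold Filtering": (4_390_423, 4_814, 262, 43),
    "One-to-one mapping": (4_390_423, 0, 6, 0),
    "Belief-based": (4_352_022, 4_704, 632, 111),
}


def ambiguity_percent(counts):
    """Share (in percent) of pairs that are not 1-to-1 (assumed definition)."""
    _one_to_one, *ambiguous = counts
    return 100.0 * sum(ambiguous) / sum(counts)


if __name__ == "__main__":
    for name, counts in TABLE3.items():
        print(f"{name}: total={sum(counts):,}, "
              f"ambiguous={ambiguity_percent(counts):.4f}%")
```

Summing each column also reproduces the totals in the last row of the table, which is a quick consistency check on the extracted figures.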

Table 4. Matching quality of proposed strategies

| | Consistency | Max Confidence | Threshold Filtering | One-to-one mapping | Belief-based |
|---|---|---|---|---|---|
| Recall (%) | 100 | 100 | 100 | 100 | 100 |
| Precision (%) | 98.1371 | 98.2643 | 98.2580 | 98.3724 | 99.1165 |
| F1 score (%) | 99.0598 | 99.1246 | 99.1213 | 99.1795 | 99.5563 |
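The F1 scores in Table 4 are consistent with the standard harmonic-mean definition, F1 = 2PR / (P + R), applied to the reported precision and a recall of 100 percent. The sketch below assumes that standard definition (the paper's own formula is not shown in this section); the function name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision values copied from Table 4; recall is 100 % for every strategy.
PRECISION = {
    "Consistency": 98.1371,
    "Max Confidence": 98.2643,
    "Threshold Filtering": 98.2580,
    "One-to-one mapping": 98.3724,
    "Belief-based": 99.1165,
}

if __name__ == "__main__":
    for name, precision in PRECISION.items():
        print(f"{name}: F1 = {f1_score(precision, 100.0):.4f}")
```

Because recall is fixed at 100 percent, the ranking of strategies by F1 is the same as the ranking by precision.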

Fig. 4. A comparison of the Google result. (Stacked bar chart of identical and different entities relative to the Google mapping, per strategy. Identical entities (from Google): Consistency 2,089,245; Max Confidence 2,085,746; Threshold 2,086,021; Belief-based 2,081,374. Different entities (from Google): Consistency 33,704; Max Confidence 12,072; Threshold 13,885; Belief-based 9,367.)

announce how they extracted this result, it might use an exact matching of Wikipedia URLs. Applying the proposed strategies to the Google results, the identical mapping pairs are more than 99.51%. However, they include ambiguous results according to individual strategies. For example, the consistency strategy has the highest share of different entities (1.59%), whereas the belief-based strategy has the smallest (0.45%). In summary, the belief-based strategy can be considered an effective approach to reduce ambiguity for entity extraction. Note that the matching performance of the Google result is not evaluated, because they provided this dataset only once and did not update related data sources.

7. CONCLUSIONS

This study proposed several approaches for identifying the same entities from heterogeneous knowledge sources and evaluated these approaches by using Wikidata and Freebase. According to the evaluation results, the belief-based approach is most effective for reducing the ambiguous relations between the given datasets. Although the consistency strategy returned the largest number of pairs of the same relation, it also had the highest number of errors. Entity resolution is a popular topic in industry and academia. Currently, common and popular approaches for entity resolution focus on similarity-join techniques, but few studies have focused on belief-based approaches. The proposed belief-based same-entity extraction approach can be a new technique for measuring the matching degree of entity pairs.

Although this paper conducted an entity extraction using large-scale real-world datasets, more experiments are needed for integrating heterogeneous knowledge sources. Future work may explore alternative expanding algorithms for handling different property values and evaluate the impact of optimised approaches. Another potential area of research is to integrate heterogeneous knowledge into existing knowledge sources by instance matching techniques.

Research Paper
Journal of Information Science Theory and Practice
J Inf Sci Theory Pract 6(3): 16-24, 2018
eISSN : 2287-4577 pISSN : 2287-9099
https://doi.org/10.1633/JISTaP.2018.6.3.2

Evaluation of Websites of Public Libraries of India under Ministry of Culture: A Webometric Analysis

Krishna Brahma
Department of Library and Information Science, Mizoram University, Aizawl, India
E-mail: [email protected]

Manoj Kumar Verma*
Department of Library and Information Science, Mizoram University, Aizawl, India
E-mail: [email protected]

ABSTRACT
The purpose of this paper is to investigate the domain authority, number of webpages, and links, and to calculate the web impact factor, of six public libraries of India which are fully funded by the Ministry of Culture and under its administrative supervision. The data for the study were collected from the websites of the concerned libraries with the help of a suitable search engine, Open Site Explorer. The study found that the highest domain authority and page authority were recorded by Khuda Baksh Oriental Public Library and National Library, respectively. It further revealed that, except for two libraries, i.e., Khuda Baksh Oriental Public Library and Delhi Public Library, the internal equity-passing links and total internal links of the rest of the libraries are zero. National Library leads with the maximum total links and total equity-passing links, and also has the highest followed linking root domains, total linking root domains, and linking C blocks. The study concludes with the web impact factor, for which Central Secretariat Library records the maximum, followed by National Library and Khuda Baksh Oriental Public Library.

Keywords: webometric, websites, public libraries, Ministry of Culture, link analysis, web impact factor

Received date: December 08, 2017
Accepted date: August 08, 2018

*Corresponding Author: Manoj Kumar Verma
Assistant Professor
Department of Library and Information Science, Mizoram University, Aizawl, Mizoram 796004, India
E-mail: [email protected]

Open Access
All JISTaP content is Open Access, meaning it is accessible online to everyone, without fee and authors’ permission. All JISTaP content is published and distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/). Under this license, authors reserve the copyright for their content; however, they permit anyone to unrestrictedly use, distribute, and reproduce the content in any medium as far as the original authors and source are cited. For any reuse, redistribution, or reproduction of a work, users must clarify the license terms under which the work was produced.

© Krishna Brahma, Manoj Kumar Verma, 2018

1. INTRODUCTION

Libraries are the local gateway to national and global knowledge, and with the help of websites and the Internet, information is publicly available to all. People have become more in tune with the Internet, and the curiosity to access information as quickly and easily as possible has increased. Library websites are a primary source for finding information, since searching on the Internet is considered easier than going to the library and finding books on the shelf. In the age of digitization, many libraries have transformed from traditional to digital libraries. Under the National Mission on Libraries, the Ministry of Culture, Government of India, has taken the initiative to modernize and digitally link nearly 9,000 libraries across the country to provide access to books and information to potential readers (National Mission on Libraries). The aim of the mission is to transform India into a vibrant knowledge-based society. Culture plays an important role in the development agenda of any nation. The mandate of the Ministry of Culture revolves around functions like preservation and conservation of our cultural heritage and promotion of all forms of art and culture, both tangible and intangible. The functional spectrum of this ministry is wide, ranging from generating cultural awareness at a grassroots level to promoting cultural exchanges at an international level (Ministry of Culture).

The purpose of the present study is to examine the public libraries websites of India under the Ministry of Culture and rank them based on web impact factor (WIF). It explores the present status of link pages, web pages, domain authority, and WIF of the six public libraries websites which are under administrative supervision. Webometrics is an interesting and ongoing area of research in the field of library and information science, as many studies on websites of academic institutions, libraries, and organizations have been recorded for many years. The paper sets out to identify the most highly visited website and to check the number of root domains, web pages and authority, number of links, and WIF with the help of an optimization tool (Open Site Explorer), which is a search engine for links.

1.1. Webometrics

A website is a collection of related web pages, images, videos, or other digital assets that are addressed relative to a common uniform resource locator, often consisting of only the domain name or the IP address and the root path (‘/’) in an Internet protocol based network. A website is hosted on at least one web server accessible via a network such as the Internet or a private local area network (Babu, Jeyshankar, & Rao, 2010). Bjorneborn and Ingwersen (2004) defined webometrics as “the study of the quantitative aspects of the construction and use of information resources, structures, and technologies on the web, drawing on bibliometric and informetric approaches.” This definition covers the construction side and usage side of the web, which embrace the following four main areas of webometrics study: (1) web page content analysis; (2) weblink structure analysis (e.g., hyperlink, self-link, and external link); (3) web usage analysis (e.g., exploiting log files for user searching and browsing behavior); and (4) web technology analysis (including search engine performance) (Bjorneborn & Ingwersen, 2004). WIF, introduced by Ingwersen (1998), may be defined as the number of web pages in a web site receiving links from other web sites, divided by the number of web pages published in the site that are accessible to the crawler (Ingwersen, 1998).

1.2. National Mission on Libraries

The National Knowledge Commission gave ten recommendations on libraries in its 2011 report. Based on these recommendations, the Government of India set up the National Mission on Libraries under the Ministry of Culture, with the Raja Rammohun Roy Library Foundation as the central agency for administrative, logistics, planning, and budgeting purposes. The mission was launched by the Honourable President of India on 3 February 2014 (Ministry of Culture; National Mission on Libraries).

1.3. Public Libraries under the Ministry of Culture

The Ministry of Culture is the Indian government ministry charged with preservation and promotion of art and culture. The mission of the department is to preserve, promote, and disseminate all forms of art and culture. The Ministry of Culture exercises administrative supervision over six public libraries, which are listed below.

1.3.1. The National Library

After the independence of India, the National Library of India was accorded the status of an Institution of National Importance in Article 62 in the 7th schedule of the union list of the constitution of India. It is the country’s largest library and the library of public record, established in 1948 and located at Kolkata, West Bengal.

1.3.2. The Rampur Raza Library at Rampur

This is one of the largest libraries in Asia, established in 1774 and located at Rampur, Uttar Pradesh. It is a repository of Indo-Islamic cultural heritage and a treasure house of knowledge which is now managed by the Government of India. The library occupies the position of an autonomous institution of national importance under the Ministry of Culture, Government of India.

1.3.3. The Khuda Baksh Oriental Public Library

It is an autonomous institution fully funded by the Ministry of Culture, established in 1891 and located at Bihar. The Government of India declared the library an Institution of National Importance in 1969 by an act of parliament.

1.3.4. Delhi Public Library

It is an autonomous organization under the Ministry of Culture, established in 1951 and located at New Delhi. The library is a national depository library and the biggest library in Southeast Asia.

1.3.5. The Central Secretariat Library

It is one of the oldest libraries of the Government of India, funded and administered by the Ministry of Culture, established in 1891 and located at Kolkata, West Bengal.

1.3.6. Thanjavur Maharaja Serfoji’s Sarasvati Mahal Library

It is one of the oldest libraries in Asia, situated within the campus of the Thanjavur Palace located at Thanjavur, Tamil Nadu. The library started as a royal library for the private pleasure of the kings of Thanjavur, who ruled from 1535 to 1675. Since 1918, the library has been a possession of the state of Tamil Nadu. In 1983, the library was declared an Institution of National Importance.

2. LITERATURE REVIEW

Lee and Teh (2001) evaluated the content and design of 12 academic library websites of public and private institutions of higher learning in Malaysia and revealed that the academic libraries in Malaysia generally have set up well-designed and useful websites, while a few of the academic library websites have very simple and basic features. Noruzi (2005) evaluated the WIFs for Iranian universities. The study used the AltaVista search engine and from its output, counts of links to the websites of Iranian universities were calculated. The study found that overall, university websites have a low inlink WIF, which shows a significant correlation between the proportion of English-language pages at an institution’s site and the institution’s backlink counts. Qutab and Mahmood (2009) investigated the content of library websites in Pakistan and analyzed their navigational strengths and weaknesses. The authors surveyed 52 academic, special, public, and national libraries websites in Pakistan based on a 77-item checklist. The study found that no library website contained all items on the checklist; that websites such as Government College University, Lahore University of Management Sciences, University of Punjab, Air University, University of Management Technology, Agha Khan University, Karachi, and Quaid-e-Azam University, Islamabad have a good number of items included in the checklist; and that two items in the checklist were not found on any library website. Joicy and Varghese (2011) evaluated the websites of research and development institutions in India and their study revealed that a majority of the R&D institutions in India provide informative links to contacts, copyright, news and events, Right to Information, and history, but only a few websites provide opportunity for user interaction in the form of feedback, and a majority of the R&D institution websites are good for navigating and finding information.

Vijayakumar, Kannappanavar, and Kumar (2012) examined the web presence and links of SAARC countries. The authors used an advanced search facility, i.e., AltaVista, for data collection and the study revealed that India recorded the maximum number of 14,10,00,000 webpages, 58,20,000 external links, 1,18,00,000 internal links, and 9,83,00,000 overall links, followed by Pakistan and Sri Lanka for overall links. The countries which are maximally linked to the Indian domain are Pakistan (3,610), Sri Lanka (2,070), and Nepal (728), and Pakistan also occupied first place by receiving the maximum of 18,300 links from India compared with the other SAARC countries. Based on the WIF of external links, Sri Lanka occupied first place with 0.06495, followed by Pakistan and Bhutan; the highest WIF for overall links was occupied by India with 0.6971, and India also ranked first as per the WISER rank for SAARC countries. Mohamadesmaeil and Koohbanani (2012) studied the web usability evaluation of the Iran National Library website. The authors applied the library (attribute) method to develop a checklist of 11 criteria and 160 components and features, and evaluation survey methods were also applied to assess the usability of the website. The study identified the elements that are important in the design of national library websites and indicated that the lowest amount of usability was 6 points for appearance and the highest was 156 points for navigation. Walia and Gupta (2012) conducted a study on the WIF of select national libraries websites and their study revealed that among the selected national libraries, websites of national libraries of America, Australia, and Britain were more visible and hosted more content compared to the websites of India, Namibia, and South Africa. Thanuskodi (2012) conducted a webpage content analysis on selected Institutes of National Importance websites in India and revealed that only a few of the institute websites are up to date and the rest of the websites do not mention the time/date on their homepage; regarding general information about homepage features, more was found in IITs and less in both of the other institutes, i.e., Indian Statistical Institute and Indian Institute of Science, Bangalore. Based on the findings, Thanuskodi also suggested that institutes should provide more services like feedback and a sitemap to view the overall functions. Shukla and Tripathi (2014) investigated the backlinks of the Institutes of National Importance and Premier Management Institutions Library websites. The researchers retrieved backlinks by four search engines, Google, AlltheWeb, AltaVista, and Yahoo Site, and revealed that among the four search engines, Yahoo Site Explorer is more reliable, and the index page/homepage of library websites attracts the highest number of backlinks over other web pages of library websites. Chakravarty and Wasan (2015) also analyzed the library websites of higher educational institutes of India, for which the Google search engine was used; they calculated the WIF and R-WIF of ten library websites, correlated both formulas with Spearman’s Rank Correlation, and found very little difference between the two ranking methods. Verma and Devi (2016) studied the web content and design trends of the Indian Institutes of Management libraries websites, where they examined the information available on the library webpage of specific universities; for this purpose a checklist was designed and library webpages were evaluated. Verma and Brahma (2017b) examined the selected library consortium websites in India by analyzing total numbers of webpages, domain authority, equity links, internal and external links, and their WIF, and observed that the e-ShodhSindhu and DeLCon consortiums are the most popular among the selected consortia of India. Verma and Brahma (2017c) also conducted a study on webometric analysis of selected non-profit organizations of Assam and found that the Centre for North East Studies and Policy Research, Guwahati and AARANYAK are at the top rank among the selected non-profit organizations websites in Assam. Verma and Brahma (2017a) further conducted a study on webometric analysis of national libraries websites in South Asia, in which they analysed the number of web pages and link pages, calculated the WIF of national libraries websites, and ranked the websites as per the WIF. The study showed that the WIF of the National Library of India was the highest, followed by the National Library of Sri Lanka and the National Library of Bhutan, among the national library websites.

3. OBJECTIVES OF THE STUDY

1. Analyze the uniform resource locator of public libraries websites of India.
2. Calculate the number of webpages and domain authority of public libraries websites of India.
3. Examine the link-equity of public libraries websites of India.
4. Find out the internal and external link pages of public libraries websites of India.
5. Calculate the WIF of public libraries websites of India.

4. SCOPE OF THE STUDY

The scope of the present study is limited to six public libraries of India under the Ministry of Culture as listed in Table 1.

Table 1. List of libraries of national importance

| Libraries | Website | State/city | Year of establishment |
|---|---|---|---|
| The National Library | http://www.nationallibrary.gov.in/ | West Bengal | 1953 |
| The Rampur Raza Library at Rampur | http://razalibrary.gov.in/ | Uttar Pradesh | 1774 |
| The Khuda Baksh Oriental Public Library | http://kblibrary.bih.nic.in/ | Bihar | 1891 |
| Delhi Public Library | http://www.dpl.gov.in/ | New Delhi | 1951 |
| The Central Secretariat Library | http://www.csl.nic.in/ | Kolkata | 1891 |
| Thanjavur Maharaja Serfoji’s Sarasvati Mahal Library | http://www.sarasvatimahal.in/ | Tamil Nadu | 1918 |

From Ministry of Culture. (n.d.). Libraries & Manuscripts. Retrieved Jun 30, 2018 from http://www.indiaculture.nic.in/libraries-manuscripts.


5. METHODOLOGY

For the present study, data were collected from the websites of six selected public libraries of India during 18-20 November 2017 by using a suitable search engine, i.e., Open Site Explorer (www.opensiteexplorer.org), which counts the number of pages in websites and the number of pages linking to the websites. Open Site Explorer makes the gathering, sorting, and exporting of link data easier than ever. It is built with speed and accessibility at the forefront and provides a tremendous amount of information about the links to any page or site. The collected data were tabulated for exploration and findings of the study.

5.1. Method of Calculating WIF

Distribution of data by simple web impact factor (SWIF) has been calculated by the following formula:

$$\mathrm{SWIF} = \frac{\text{Total no. of links}}{\text{Total no. of webpages}}$$

Distribution of data by internal web impact factor (IWIF) has been calculated by the following formula:

$$\mathrm{IWIF} = \frac{\text{Total no. of internal links}}{\text{Total no. of webpages}}$$

Distribution of data by external web impact factor (EWIF) has been calculated by the following formula:

$$\mathrm{EWIF} = \frac{\text{Total no. of external links}}{\text{Total no. of webpages}}$$

6. DATA ANALYSIS

6.1. Domain Authority and Page Authority

Table 2 indicates general information in the websites of public libraries under the Ministry of Culture, which includes domain authority and page authority. Domain authority is a measure of the power of a domain name. It predicts the root domain’s ranking potential in search engines based on an algorithmic combination of all link metrics. The domain authority of The Khuda Baksh Oriental Public Library, with 63 (24.60%), was the highest, followed by National Library with 56 (21.87%) and Rampur Raza Library at Rampur with 42 (16.40%). A high page authority score means the page has the potential to rank well in search engine results; it predicts the page’s ranking potential in search engines based on an algorithmic combination of all link metrics. The page authority of National Library, with 64 (21.62%), was highest; Khuda Baksh Oriental Public Library with 58 (19.59%) recorded the second highest; and Rampur Raza Library at Rampur had 52 (17.56%).

6.2. Internal Equity-Passing Links, External Equity-Passing Links, and Total Equity-Passing Links

Table 3 depicts the internal equity-passing links, external equity-passing links, and total equity-passing links of websites of public libraries under the Ministry of Culture. The equity-passing links are the links which pass value from one page to another (from page A to page B). Internal equity-passing links are the links pointing to pages inside your website. Search engines generally consider passing ranking values. It was found that Delhi Public Library has the highest number of internal equity-passing links with 105 (94.59%), and the other five libraries are very low, with Khuda Baksh Oriental Public Library having six links and the rest zero links. On the other hand, based on the external equity-passing links, the National Library with 6,662 (28.70%) was recorded to be highest, followed by Central Secretariat Library with 6,067 (26.13%) and Khuda Baksh Oriental Public Library with 5,857 (25.23%). External equity-passing links are the links pointing from another domain to a page on your website. Search engines generally consider passing ranking values that come from external websites. That means the total equity-passing links

Table 2. Domain authority and page authority

| Serial no. | Libraries | Domain authority (%) | Page authority (%) |
|---|---|---|---|
| 1 | The National Library | 56 (21.87) | 64 (21.62) |
| 2 | The Rampur Raza Library at Rampur | 42 (16.40) | 52 (17.56) |
| 3 | The Khuda Baksh Oriental Public Library | 63 (24.60) | 58 (19.59) |
| 4 | Delhi Public Library | 40 (15.62) | 47 (15.87) |
| 5 | The Central Secretariat Library | 40 (15.62) | 50 (16.89) |
| 6 | Thanjavur Maharaja Serfoji’s Sarasvati Mahal Library | 15 (5.85) | 25 (8.44) |
| | Total | 256 (100) | 296 (100) |
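The percentages in parentheses are each library's share of the column total. The sketch below reproduces the domain authority column; it assumes the shares were cut to two decimals rather than rounded, which is an inference from the reported figures (for example, 56/256 = 21.875% is printed as 21.87) and not a method stated in the paper. The helper name `truncated_share` is hypothetical.

```python
# Domain authority scores copied from Table 2, in serial-number order.
DOMAIN_AUTHORITY = [56, 42, 63, 40, 40, 15]


def truncated_share(value: int, total: int) -> float:
    """Percentage share of `value` in `total`, cut (not rounded) to 2 decimals.

    Integer arithmetic is used so that the truncation is exact.
    """
    return (value * 10_000 // total) / 100


total = sum(DOMAIN_AUTHORITY)
shares = [truncated_share(score, total) for score in DOMAIN_AUTHORITY]

if __name__ == "__main__":
    print("total:", total)
    for score, share in zip(DOMAIN_AUTHORITY, shares):
        print(f"{score} ({share:.2f})")
```

Because of the truncation, the printed shares need not sum to exactly 100.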


Table 3. Internal equity-passing links, external equity-passing links, and total equity-passing links

| Serial no. | Libraries | Internal equity-passing links (%) | External equity-passing links (%) | Total equity-passing links (%) |
|---|---|---|---|---|
| 1 | The National Library | 0 | 6,662 (28.70) | 6,662 (28.56) |
| 2 | The Rampur Raza Library at Rampur | 0 | 4,593 (19.78) | 4,593 (19.69) |
| 3 | The Khuda Baksh Oriental Public Library | 6 (5.40) | 5,857 (25.23) | 5,863 (25.14) |
| 4 | Delhi Public Library | 105 (94.59) | 23 (0.09) | 128 (0.54) |
| 5 | The Central Secretariat Library | 0 | 6,067 (26.13) | 6,067 (26.01) |
| 6 | Thanjavur Maharaja Serfoji’s Sarasvati Mahal Library | 0 | 8 (0.03) | 8 (0.03) |
| | Total | 111 (100) | 23,210 (100) | 23,321 (100) |

Table 4. Total internal links, total external links, and total links

| Serial no. | Libraries | Total internal links (%) | Total external links (%) | Total links (%) |
|---|---|---|---|---|
| 1 | The National Library | 0 | 6,687 (28.74) | 6,687 (28.58) |
| 2 | The Rampur Raza Library at Rampur | 0 | 4,595 (19.75) | 4,595 (19.64) |
| 3 | The Khuda Baksh Oriental Public Library | 24 (18.60) | 5,865 (25.21) | 5,889 (25.17) |
| 4 | Delhi Public Library | 105 (81.39) | 28 (0.12) | 133 (0.56) |
| 5 | The Central Secretariat Library | 0 | 6,077 (26.12) | 6,077 (25.97) |
| 6 | Thanjavur Maharaja Serfoji’s Sarasvati Mahal Library | 0 | 11 (0.04) | 11 (0.04) |
| | Total | 129 (100) | 23,263 (100) | 23,392 (100) |
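The totals and percentage shares in Table 4 follow from simple arithmetic: each library's total links are its internal plus external links, and its share is that total divided by the sum over all six libraries. The following is a minimal sketch of that calculation, using the counts transcribed from Table 4; the function name `link_shares` is purely illustrative. Differences of 0.01 in the last digit can occur because the published percentages appear to be truncated rather than rounded.

```python
# Link counts per library, transcribed from Table 4: (internal links, external links).
TABLE_4 = {
    "The National Library": (0, 6687),
    "The Rampur Raza Library at Rampur": (0, 4595),
    "The Khuda Baksh Oriental Public Library": (24, 5865),
    "Delhi Public Library": (105, 28),
    "The Central Secretariat Library": (0, 6077),
    "Thanjavur Maharaja Serfoji's Sarasvati Mahal Library": (0, 11),
}


def link_shares(counts):
    """Return {library: (total_links, share_of_all_links_in_percent)}."""
    totals = {name: internal + external
              for name, (internal, external) in counts.items()}
    grand_total = sum(totals.values())
    return {name: (total, 100.0 * total / grand_total)
            for name, total in totals.items()}


shares = link_shares(TABLE_4)
# Print the libraries from most to fewest total links.
for name, (total, pct) in sorted(shares.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: {total} links, {pct:.2f}% of all links")
```

The same function could be applied to the equity-passing link counts in Table 3 by swapping in those figures.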

of the National Library are the highest at 6,662 (28.56%), with the Central Secretariat Library second at 6,067 (26.01%) and Khuda Baksh Oriental Public Library third at 5,863 (25.14%). Total equity-passing links are the sum of internal and external equity-passing links.

6.3. Total Internal Links, Total External Links, and Total Links
Table 4 illustrates the total internal links, total external links, and total links of the websites of public libraries under the Ministry of Culture. Internal links are hyperlinks from a webpage to another resource, such as a web page, image, or document, on the same website or domain. Delhi Public Library has the most internal links with 105 (81.39%), followed by Khuda Baksh Oriental Public Library with 24 (18.60%); the other four libraries have none, which is similar to the results in Table 3. External links, in contrast, are hyperlinks that point at any domain other than the domain the link exists on (source). The National Library has the most external links with 6,687 (28.74%), followed by the Central Secretariat Library with 6,077 (26.12%) and Khuda Baksh Oriental Public Library with 5,865 (25.21%). For total links (the total number of links to a site, which includes all types of links), the National Library occupies the top position with 6,687 (28.58%), the Central Secretariat Library is second with 6,077 (25.97%), and Khuda Baksh Oriental Public Library is third with 5,889 (25.17%).

6.4. Followed Linking Root Domains, Total Linking Root Domains, and Linking C Blocks
Table 5 depicts the followed linking root domains, total linking root domains, and linking C blocks of the websites of public libraries under the Ministry of Culture. Linking root domains are the number of unique domains linking to your domain or page. A followed linking root domain is a website that links to you. Total linking root domains are the number of web pages that link to you, including the followed linking root domains. Linking C blocks refers to the part of the IP address that differs between linking sites. The table clearly shows that the National Library is on top for followed linking root domains, total linking root domains, and linking C blocks with 139 (58.89%), 155 (58.49%), and 136 (59.91%) respectively, followed by Rampur Raza Library with 39 (16.52%), 41 (15.47%), and 32 (14.09%) respectively, while the third position is occupied by Khuda Baksh Oriental Public Library with 27 (11.44%), 31 (11.69%), and 28 (12.33%) respectively.
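To make the three measures in Section 6.4 more concrete, the sketch below counts them from a small list of backlinks. Everything in it is an assumption for illustration: the backlink records are hypothetical, the root domain is approximated naively as the last two labels of the host name, and a C block is taken in the sense commonly used by SEO tools, namely the first three octets of an IPv4 address (a /24 network). It is not the procedure of the tool used in the study.

```python
import ipaddress

# Hypothetical backlink records: (linking host, IPv4 address of that host, link is followed?)
BACKLINKS = [
    ("news.example.org", "203.0.113.10", True),
    ("blog.example.org", "203.0.113.25", False),
    ("library.example.net", "198.51.100.7", True),
    ("archive.example.com", "198.51.100.200", True),
    ("portal.example.edu", "192.0.2.44", False),
]


def root_domain(host):
    """Naive root domain: the last two labels of the host name.
    (Assumption for this sketch; real tools consult the Public Suffix List.)"""
    return ".".join(host.lower().split(".")[-2:])


def c_block(ip):
    """The /24 network containing an IPv4 address, i.e. its first three octets."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def link_metrics(backlinks):
    """Count unique followed root domains, all root domains, and C blocks."""
    total_roots = {root_domain(host) for host, _, _ in backlinks}
    followed_roots = {root_domain(host) for host, _, followed in backlinks if followed}
    c_blocks = {c_block(ip) for _, ip, _ in backlinks}
    return {
        "followed_linking_root_domains": len(followed_roots),
        "total_linking_root_domains": len(total_roots),
        "linking_c_blocks": len(c_blocks),
    }


print(link_metrics(BACKLINKS))
```

In this toy data two hosts share one root domain and two different hosts share one C block, which is why the counts are lower than the number of backlinks.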


Table 5. Followed linking root domains, total linking root domains, and linking C blocks

| Serial no. | Libraries | Followed linking root domains (%) | Total linking root domains (%) | Linking C blocks (%) |
|---|---|---|---|---|
| 1 | The National Library | 139 (58.89) | 155 (58.49) | 136 (59.91) |
| 2 | The Rampur Raza Library | 39 (16.52) | 41 (15.47) | 32 (14.09) |
| 3 | The Khuda Baksh Oriental Public Library | 27 (11.44) | 31 (11.69) | 28 (12.33) |
| 4 | Delhi Public Library | 14 (5.93) | 18 (6.79) | 12 (5.28) |
| 5 | The Central Secretariat Library | 14 (5.93) | 15 (5.66) | 14 (6.16) |
| 6 | Thanjavur Maharaja Serfoji’s Sarasvati Mahal Library | 3 (1.27) | 5 (1.88) | 5 (2.20) |
| | Total | 236 (100) | 265 (100) | 227 (100) |

Table 6. Web impact factor

| Serial no. | Libraries | IWIF | EWIF | SWIF | Ranking |
|---|---|---|---|---|---|
| 1 | The Central Secretariat Library | 0 | 121.54 | 121.54 | 1 |
| 2 | The National Library | 0 | 104.48 | 104.48 | 2 |
| 3 | The Khuda Baksh Oriental Public Library | 0.41 | 101.12 | 101.53 | 3 |
| 4 | The Rampur Raza Library at Rampur | 0 | 88.36 | 88.36 | 4 |
| 5 | Delhi Public Library | 2.23 | 0.59 | 2.82 | 5 |
| 6 | Thanjavur Maharaja Serfoji’s Sarasvati Mahal Library | 0 | 0.44 | 0.44 | 6 |

IWIF, internal web impact factor; EWIF, external web impact factor; SWIF, simple web impact factor.
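The web impact factors in Table 6 are ratios of link counts to the number of webpages, so they can be reproduced in a few lines. The sketch below uses the link totals from Table 4. The page counts are an assumption: they are not reported in this section and were back-calculated so that the ratios agree with Table 6 to within 0.01. The names `SiteCounts` and `web_impact_factors` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCounts:
    internal_links: int
    external_links: int
    pages: int  # ASSUMPTION: back-calculated from Tables 4 and 6, not reported in the text


SITES = {
    "The Central Secretariat Library": SiteCounts(0, 6077, 50),
    "The National Library": SiteCounts(0, 6687, 64),
    "The Khuda Baksh Oriental Public Library": SiteCounts(24, 5865, 58),
    "The Rampur Raza Library at Rampur": SiteCounts(0, 4595, 52),
    "Delhi Public Library": SiteCounts(105, 28, 47),
    "Thanjavur Maharaja Serfoji's Sarasvati Mahal Library": SiteCounts(0, 11, 25),
}


def web_impact_factors(site):
    """Return (IWIF, EWIF, SWIF) for one site.

    IWIF = internal links / pages, EWIF = external links / pages,
    SWIF = all links / pages, which equals IWIF + EWIF.
    """
    iwif = site.internal_links / site.pages
    ewif = site.external_links / site.pages
    return iwif, ewif, iwif + ewif


# Rank the sites by SWIF, highest first.
ranking = sorted(SITES, key=lambda name: web_impact_factors(SITES[name])[2], reverse=True)
for rank, name in enumerate(ranking, start=1):
    iwif, ewif, swif = web_impact_factors(SITES[name])
    print(f"{rank}. {name}: IWIF={iwif:.2f} EWIF={ewif:.2f} SWIF={swif:.2f}")
```

Because the published values appear to be truncated to two decimals, the rounded output of this sketch can differ from Table 6 by 0.01 in a few cells.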

6.5. WIF
Table 6 ranks the websites of public libraries under the Ministry of Culture by WIF, calculated as IWIF, EWIF, and SWIF. The WIF provides quantitative tools for ranking, evaluating, categorizing, and comparing web sites, top-level domains, and sub-domains. The WIF is a form of measurement used to determine the relative standing of web sites in particular fields or a country: for instance, academic web sites in a country. The higher the impact factor, the higher the perceived reputation of the web site (Noruzi, 2006). It is the number of webpages in a website receiving links from other websites, divided by the number of webpages published in the site that are accessible to the crawler. The SWIF is the ratio of links to the number of pages. IWIF is the ratio of internal links within the site to the number of pages. EWIF is the ratio of links made from external sites to the target site, to the number of pages at the site. The table shows that the Central Secretariat Library has the maximum EWIF and SWIF, both 121.54. The National Library occupies second place with an EWIF and SWIF of 104.48. The third position is occupied by Khuda Baksh Oriental Public Library with an EWIF of 101.12 and an SWIF of 101.53. The IWIF of Delhi Public Library is 2.23 and that of Khuda Baksh Oriental Public Library is 0.41, while the IWIF of the remaining four libraries is zero. Hence, from the table it can be easily understood that the overall WIF of the Central Secretariat Library is the highest, followed by the National Library and Khuda Baksh Oriental Public Library. The lowest WIF is that of Thanjavur Maharaja Serfoji’s Sarasvati Mahal Library.

7. MAJOR FINDINGS OF THE STUDY

1. The domain authority of Khuda Baksh Oriental Public Library, 63 (24.60%), was the highest, and the page authority of the National Library, 64 (21.62%), was the highest.
2. Delhi Public Library leads in internal equity-passing links (94.59%), while the National Library of India leads in external equity-passing links (28.70%) and total equity-passing links (28.56%).
3. Delhi Public Library has the most total internal links with 105 (81.39%), whereas the National Library of India has the most total external links (28.74%) and total links (28.58%).
4. The followed linking root domains, total linking root domains, and linking C blocks of the National Library of India were recorded on top with 139 (58.89%), 155 (58.49%), and 136 (59.91%) respectively.
5. The overall WIF of the Central Secretariat Library (EWIF & SWIF=121.54) was the highest, followed by the National Library (EWIF & SWIF=104.48) and Khuda Baksh Oriental Public Library (EWIF=101.12 & SWIF=101.53). The lowest WIF was that of Thanjavur Maharaja Serfoji’s Sarasvati Mahal Library.

8. DISCUSSION AND CONCLUSION

The Government of India has launched an initiative, the National Mission on Libraries, to modernize and digitally link public libraries across the country, under which six public libraries are under the administrative supervision of, and funded by, the Ministry of Culture. This paper aims to establish the present status of the websites of public libraries under the Ministry of Culture and provides information regarding those selected public libraries’ websites. Websites and the Internet are an integral part of library service across the world, and designing websites to be attractive, more interactive, and more user friendly is the duty of the web designer. The quality and visibility of these selected public library websites are improving, as shown by comparing the findings of a previous study (Jhamb & Ruhela, 2017) with those of the present study. The comparison shows an improvement in the findings, which is good for the website scores. The previous study shows that the highest domain authority, that of Khuda Baksh Oriental Public Library, was 62 and the highest page authority, that of the National Library of India, was 63, whereas the present study revealed that Khuda Baksh Oriental Public Library’s domain authority is 63 and the National Library of India’s page authority is 64. The previous study also found that the highest total equity-passing links and total links, those of the National Library of India, were 5,985 and 6,009 respectively, while the present study revealed 6,662 and 6,687 respectively. The results have thus changed between then and now. The previous study also found that the Central Secretariat Library ranked first with the highest EWIF as well as SWIF of 115.8 each, whereas the present study finds that the Central Secretariat Library has the maximum EWIF and SWIF of 121.54. This is an increase of 5.74 in both EWIF and SWIF from the previous study to the present study. The IWIF of Sarasvati Mahal Library ranked at the top with 23.25 in the previous study, but in the present study it is found to be zero and Delhi Public Library tops with the highest IWIF of 2.23.

This study will be helpful for the web designers of the institutions to make their websites more dynamic. In the same way, readers also will be able to compare and identify the most visited library websites. The findings showed that the overall WIF of the Central Secretariat Library was ranked at the top position. They also revealed that the internal equity-passing links and total internal links of four libraries (National Library, Rampur Raza Library, Central Secretariat Library, and Thanjavur Maharaja Serfoji’s Sarasvati Mahal Library) are zero, which is an indication of poor visibility. The IWIF of all six libraries is not in a very good state; it is therefore suggested that the internal links of the respective library websites be improved for better accessibility and visibility, and that the library websites be interlinked with each other so that their resources are used as desired.

REFERENCES

Babu, B. R., Jeyshankar, R., & Rao, P. N. (2010). Websites of central universities in India: A webometric analysis. Journal of Library & Information Technology, 30(4), 33-43.
Bjorneborn, L., & Ingwersen, P. (2004). Towards a basic framework for webometrics. Journal of the American Society for Information Science and Technology, 55(14), 1216-1227.
Chakravarty, R., & Wasan, S. (2015). Webometric analysis of library websites of Higher Educational Institutes (HEIs) of India: A study through Google search engine. DESIDOC Journal of Library & Information Technology, 35(5), 325-329.
Ingwersen, P. (1998). The calculation of web impact factors. Journal of Documentation, 54(2), 236-243.
Jhamb, G., & Ruhela, A. (2017). A webometric study of the websites of public libraries. International Journal of Library and Information Studies, 7(4), 83-89.
Joicy, A. J., & Varghese, R. R. (2011). Websites of research and development institutions in India: A webometric study. International Journal of Digital Library Services, 1(2), 90-104.
Lee, K. H., & Teh, K. H. (2001). Evaluation of academic library web sites in Malaysia. Malaysian Journal of Library & Information Science, 5(2), 95-108.
Ministry of Culture. (n.d.). About us. Retrieved Jun 30, 2018 from http://www.indiaculture.nic.in/about-us.
Ministry of Culture. (n.d.). Libraries & Manuscripts. Retrieved Jun 30, 2018 from http://www.indiaculture.nic.in/libraries-manuscripts.
Mohamadesmaeil, S., & Koohbanani, S. K. (2012). Web usability evaluation of Iran national library website. COLLNET Journal of Scientometrics and Information Management, 6(1), 161-174.
National Mission on Libraries. (n.d.). National Mission on Libraries. Retrieved Jun 30, 2018 from http://www.nmlindia.nic.in/
Noruzi, A. (2005). Web impact factors for Iranian universities. Webology, 2(1), 1-26. Retrieved Jun 30, 2018 from http://www.webology.org/2005/v2n1/a11.html
Noruzi, A. (2006). The web impact factor: A critical review. The Electronic Library, 24(4), 490-500.
Qutab, S., & Mahmood, K. (2009). Library web sites in Pakistan: an analysis of content. Electronic Library and Information Systems, 43(4), 430-445.
Shukla, A., & Tripathi, A. (2014). Backlinks analyses of institutes of national importance and premier management institutions library websites. Journal of International Academic Research for Multidisciplinary, 2(7), 560-575.
Thanuskodi, S. (2012). A webometric analysis of selected institutes of national importance websites in India. International Journal of Library Science, 1(1), 13-18.
Verma, M. K., & Brahma, K. (2017a). A webometric analysis of national libraries’ websites in South Asia. Annals of Library and Information Studies, 64(2), 116-124.
Verma, M. K., & Brahma, K. (2017b). Webometric analysis of selected library consortium websites of India: An evaluative study. In Proceedings of the 11th International CALIBER (pp. 328-341). Tamil Nadu: INFLIBNET.
Verma, M. K., & Brahma, K. (2017c). A webometric analysis of selected non-profit organizations (NGOS) of Assam. KIIT Journal of Library and Information Management, 4(1), 63-72.
Verma, M. K., & Devi, K. K. (2016). Web content and design trends of Indian Institutes of Management (IIMs) libraries website: An analysis. DESIDOC Journal of Library and Information Technology, 36(4), 220-227.
Vijayakumar, M., Kannappanavar, B. U., & Kumar, K. T. S. (2012). Webometric analysis of web presence and links of SAARC countries. DESIDOC Journal of Library & Information Technology, 32(1), 70-76.
Walia, P. K., & Gupta, M. (2012). Web impact factor of select national libraries’ websites. DESIDOC Journal of Library and Information Technology, 32(4), 347-352.

JISTaP http://www.jistap.org
Research Paper
Journal of Information Science Theory and Practice
J Inf Sci Theory Pract 6(3): 25-36, 2018
eISSN: 2287-4577 pISSN: 2287-9099
https://doi.org/10.1633/JISTaP.2018.6.3.3

Anonymous and Non-anonymous User Behavior on Social Media: A Case Study of Jodel and Instagram

Regina Kasakowskij*
Department of Information Science, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
E-mail: [email protected]

Natalie Friedrich
Department of Information Science, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
E-mail: [email protected]

Kaja J. Fietkiewicz
Department of Information Science, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
E-mail: [email protected]

Wolfgang G. Stock
Department of Information Science, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
E-mail: [email protected]

ABSTRACT
Anonymity plays an increasingly important role on social media. This is reflected by more and more applications enabling anonymous interactions. However, do social media users behave differently when they are anonymous? In our research, we investigated social media services meant for solely anonymous use (Jodel) and for widely spread non-anonymous sharing of pictures and videos (Instagram). This study examines the impact of anonymity on the behavior of users on Jodel compared to their non-anonymous use of Instagram as well as the differences between the user types: producer, consumer, and participant. Our approach is based on the uses and gratifications theory (U&GT) by E. Katz, specifically on the sought gratifications (motivations) of self-presentation, information, socialization, and entertainment. Since Jodel is mostly used in Germany, we developed an online survey in German. The questions addressed the three different user types and were subdivided according to the four motivation categories of the U&GT. In total 664 test persons completed the questionnaire. The results show that anonymity indeed influences users’ usage behavior depending on user types and different U&GT categories.

Keywords: user behavior, anonymity, social media, uses and gratifications theory, identifiability, user roles

Accepted date: September 12, 2018
Received date: May 03, 2018

*Corresponding Author: Regina Kasakowskij
Student
Department of Information Science, Heinrich Heine University Düsseldorf, Universitätsstraße 1, Düsseldorf 40225, Germany
E-mail: [email protected]

Open Access
All JISTaP content is Open Access, meaning it is accessible online to everyone, without fee and authors’ permission. All JISTaP content is published and distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/). Under this license, authors reserve the copyright for their content; however, they permit anyone to unrestrictedly use, distribute, and reproduce the content in any medium as far as the original authors and source are cited. For any reuse, redistribution, or reproduction of a work, users must clarify the license terms under which the work was produced.

© Regina Kasakowskij, Natalie Friedrich, Kaja J. Fietkiewicz, Wolfgang G. Stock, 2018 JISTaP Vol.6 No.3, 25-36

1. INTRODUCTION

As early as 1993, Peter Steiner portrayed the concept of online anonymity with his adage “On the Internet, nobody knows you’re a dog.” The cartoon features two dogs, one of them sitting on a chair in front of a computer and speaking the caption to a second dog sitting on the floor and listening. His cartoon marks a notable moment in the history of the Internet and symbolizes a certain understanding of privacy and personal identity on the web. You can hide your real personality behind the screen and create a new identity. Gender, age, looks: everything is up to you. Thus, facts about one’s self may be true, but alternatively they can be fabricated or exaggerated and used for legal or illegal purposes (Morahan-Martin & Schumacher, 2000; Jordan, 2002). However, this has changed over time through the privacy policies of some social networking services (SNSs) (Krishnamurthy & Wills, 2009; Peddinti, Ross, & Cappos, 2014). These enforce a real-name policy that requires users to reveal their legal name. On such SNSs, if we create an account by adding a profile picture and name, we make ourselves easily identifiable to others. Other SNSs omit this principle as they do not have a real-name policy or do not require creating a profile in general, for example, Yik Yak, 7 Cups, Blind, Jodel, and Whisper. A user can decide what kind of SNSs he or she wants to use and for what purpose. The recurring emergence of SNSs supporting anonymous usage indicates the existing demand for such an option and is a topic of contemporary importance (Zhang & Kizilcec, 2014; Peddinti et al., 2014; Scott & Orlikowski, 2014). It is therefore important to find out what the possible motives for using anonymous or non-anonymous SNSs are. Furthermore, it is interesting to know whether there are any differences in the use of such services depending on the different user roles.

1.1. Research Background
A popular approach to understanding mass communication is the uses and gratifications theory (U&GT) by Blumler and Katz (1974). This theory follows the approach that people use media to satisfy their specific needs in the form of gratifications. Based on Katz, Blumler, and Gurevitch (1973) and Blumler and Katz (1974), the theory places more focus on the audience instead of the actual sender by asking “what people do with media” rather than “what media does to people.” It assumes that members of the audience are not passive, but take an active role in interpreting and integrating media into their own lives (McQuail, 1994). It is furthermore suggested that SNSs are applied by users to satisfy their needs for self-presentation, information, socialization, or entertainment. In this study we investigate whether non-anonymous and anonymous SNSs are being actively applied to meet different needs of the users.

Usually there are three different ways of dealing with social media: We produce (e.g., create a post), consume (e.g., read a post), and participate (e.g., like or comment on a post). Therefore, we take on different roles when using SNSs, which leads to the theoretical constructs of different user types: producers, participants, and consumers. Shao (2009) assumes that every gratification is related to a specific user role. For example, content is produced to satisfy the need for self-presentation, it is consumed to satisfy the craving for information and entertainment, and, finally, users participate in order to interact socially.

According to Zimmer, Scheibe, and Stock (2018), user roles are not limited to a specific gratification type but each role may be pursued to obtain different types of gratification. Thus, a consumer, producer, or participant can satisfy his or her need for entertainment, information (Lee & Ma, 2012), and self-presentation, as well as socialization. In the role of consumer, users act only passively in social media. They listen to or watch occurrences on social media in order to be informed or entertained. In addition, a user can consume social media in order to identify herself or himself with other users as well as gain insights into the living conditions of others. In the role of a producer, users actively contribute to social media. They produce and send content on SNSs to represent themselves. Also, a user can produce content to inform or entertain others. In addition, a producer has the opportunity to make new acquaintances by addressing other users. In the role of a participant, users partake actively in social media, but their input is not as extensive as that of the producers. Since they participate in events on social media, they are simultaneously consumers. They comment on, like, or share content with other users to maintain social contacts and promote engaging topics through positive feedback. Users can also participate to share or complete their opinions about certain information. In addition, by participating in SNSs, users can help others in their self-expression (through, for example, likes or positive as well as negative comments and ratings). The different roles that users take on during their social media usage can lead to obtaining different gratifications. Therefore, we examine whether user behavior changes with respect to the role he or she assumes (i.e., consumer, producer, or participant).


[Figure 1: diagram linking anonymous and non-anonymous use of social media to the user roles (producer, consumer, participant) and the gratification types (self-presentation, information, socialization, entertainment).]

Fig. 1. Our research model.

Fig. 2. Post on Instagram (a) and posts on Jodel (b).

1.2. Research Questions and Objectives
This study examines the impact of anonymity on the behavior of users on Jodel compared to non-anonymous usage of Instagram as well as the possible differences between the different roles that the user can take on. We formulate two research questions:

What are the differences in social media usage motivation on anonymous and non-anonymous platforms?
Does the social media usage behavior change when considering the different user types (producers, participants, and consumers)?

As seen in Fig. 1, we distinguish two types of social media usage, namely non-anonymous and anonymous. Non-anonymous users are clearly identifiable by their real name or pseudonym (including artist name). Anonymous (also pseudo-anonymous) users are not identifiable. Pseudo-anonymous users are users who have no visible identifier or information that can be linked to them. However, this does not mean that messages cannot be traced back to their sources, because a user’s identifier is available to service providers or website administrators in the form of login IDs or IP addresses. It is not clear to other users who the real person is because there is no name or image connected to the profile.

In a further step, we assign different roles to the users. Here, a distinction is made between producers, consumers, and participants. All users are consumers, users producing content are producers, and users who react to posts in the form of likes, votes, or comments are participants. We assume that each user in each role applies social media to obtain certain gratifications. Following U&GT, we selected four gratification types: self-presentation, information, socialization, and entertainment (Katz et al., 1973; Blumler & Katz, 1974; McQuail, 1994; Zimmer et al., 2018). These are needs that a user with a particular role wants to satisfy by using anonymous or non-anonymous social media platforms.

To target users of anonymous and non-anonymous social media, we have chosen to study two mobile SNSs, namely Jodel and Instagram. Instagram represents an SNS in which users can be identifiable, whereas Jodel is an SNS where users remain anonymous. These two services were chosen because they are very successful as well as of high quality (Nowak, Jüttner, & Baran, 2018; Scholl, 2015). They both have a high number of active users and can therefore be considered suitable media for estimating a representative mass.

Instagram is a free online sharing service for photos and videos owned by Facebook Inc. It was developed in 2010 by Kevin Systrom and Mike Krieger. It is a combination of a microblog and an audiovisual platform. When creating a profile on Instagram, users can decide whether or not to use a real name and profile picture. With 800 million active users worldwide, Instagram is currently one of the most popular social media platforms (Sheldon & Bryant, 2016). We have chosen Instagram as an example of a platform that can be applied by non-anonymous users characterized by a high degree of identifiability.

Jodel is an anonymous mobile social media application that is mostly used by students. It was developed in 2014 by Alessio Borgmeyer in Aachen, Germany and quickly became popular in German-speaking countries. The free app allows users to send short messages that anyone in the community can read. Those short messages may contain jokes, opinions, questions, discussions, or (real-time) photos and can be seen by community members who are located within a radius of about ten kilometers. Each of these so-called “Jodel” can be positively or negatively evaluated (applying up- and down-votes) and commented on by other community members situated nearby. A vote is final and cannot be undone. If a Jodel receives a negative valuation of minus five, it is removed from the feed. The community is self-regulating and decides independently what it wants to see in the feed. One of Jodel’s features is “Karma,” which is displayed in the top right corner of the app (Thiele, 2015). According to the developers of Jodel, the karma points indicate how much good has been done for the Jodel community so far. Jodel’s use is always anonymous. There are no friends or followers. It only counts who is nearby (Nowak et al., 2018; Wielert, 2017). In general, Jodel is similar to Yik Yak, a formerly successful anonymous application which ceased operation in May 2017 (Kolodny, 2017). In Fig. 2 we can see an example of a post on Instagram (left hand side) and on Jodel (right hand side).

2. RELATED WORKS

Anonymity, even in the form of quasi-anonymity, offers users a new way of communicating and expressing themselves. Non-anonymous social media can put pressure on users to manifest themselves as consistent, optimistic, and competent all the time. If this is the case, our user behavior can change as soon as we are anonymous. Another possibility is that there is no behavioral change in the use of social media and, thus, no difference between identifiable and anonymous usage.

The fact that anonymity strongly influences people’s behavior has long been established by socio-psychological research. One of the most remarkable works was done by Zimbardo (1969). In a series of experiments he found out that people in an anonymous state develop a tendency towards greater aggression and violence. Katzer (2016) also indicates that the behavior of an individual changes when in a group and that anonymity promotes this process of deindividuation. Similar behavior can also be observed on SNSs. But do we only use anonymous social media to satisfy our need for aggression, violence, and immoral actions?

Several studies have shown how the state of anonymity affects online behavior (Bernstein et al., 2011; Postmes, Spears, & Lea, 1998; Seigfried-Spellar & Lankford, 2017; Saveski, Chou, & Roy, 2016; Black, Mezzina, & Thompson, 2015; Wielert, 2017; Wodzicki, Schwämmlein, Cress, & Kimmerle, 2011). Generally, they indicate that anonymity can have both a positive and a negative effect on user behavior. Negative influences include mob and antisocial behaviors, which are triggered by the “online disinhibition effect” (Suler, 2005). Disinhibition can also have a positive effect on communities and their online behavior. For example, anonymity can provide cover for intimate and open conversations. This is also stated by Peddinti et al. (2014), who found a correlation between content sensitivity and a user’s decision to be anonymous. Zhang and Kizilcec (2014) state that anonymous sharing is a popular choice, especially for controversial content. In addition, anonymity can encourage experimentation with new ideas or memes. Furthermore, under the mask of anonymity, failures (e.g., no reaction to threads) can be mitigated (Dibbell, 2010), whereas identifiability preserves the memory of failure and feelings of being ignored for a longer time. Black et al. (2015) and Saveski et al. (2016) found no significant differences in the usage behavior of anonymous and non-anonymous users on social media with regard to the content of posts. Only a slight increase in vulgarity usage was identified for anonymous users.

Seigfried-Spellar and Lankford (2017) go one step further, not distinguishing between anonymous and identifiable users, but focusing on individuals (posters, trolls, lurkers, confessors) on the anonymous social media platform Yik Yak. They suggest that there are differences in behavior in terms of online environment and morality of individuals who post, troll, confess, or passively lurk on anonymous social media.

So far no one has investigated how anonymity affects the user behavior of consumers, producers, and participants, also without reference to aggression, vulgarity, and violence. When disregarding aggression, violence, and anti-social or unrestrained behavior, is there a difference between anonymous and non-anonymous producers, consumers, and participants in terms of gratifications they seek when using a social media platform?

3. METHODS

To find out how anonymity affects user behavior in terms of the four motivation categories (self-presentation, information, socialization, and entertainment) and with regard to the three user roles (consumers, participants, and producers), a questionnaire for Jodel and Instagram users was created. Since Jodel is most popular in German-speaking countries, we restricted our investigation to German-speaking users and created an online survey in German (Fig. 3).

First, we inquired whether the survey participant has an Instagram and Jodel account. If the participant has an


I use Instagram, ...
Self-presentation (each item rated on a scale from 1 = disagree to 7 = fully agree; the distance between two values is always the same)
to identify myself with other users.
to present myself.
to present myself or help others representing themselves through likes or positive/negative comments.

Fig. 3. Sample question from the online survey (translated from German).

account on both platforms, we asked for the duration and the type of activity on each of them. In order to identify the different user types, we asked how frequently a user performs different actions on each platform (e.g., posting, voting, or commenting).

The following questions addressed the four motivation categories of the U&GT adjusted to the three user types. The first category, self-presentation, includes such factors as identifying oneself with other users, presenting oneself, or helping others in their self-presentation. The second motivation category, information, covers the questions whether the user distributes, receives, or complements news and information. For the third category, socialization, we asked if the user is on Instagram or Jodel to establish contacts and whether the user wants to gain insights into the lives of others. For the fourth category, entertainment, we asked if the users apply the service to entertain themselves or others and if they promote entertaining content. In order to exclude anonymous Instagram users, we asked if the user is registered with his or her real name on Instagram. Subsequently, we asked about the user’s attitude towards being anonymous on Instagram and Jodel, and how he or she values anonymity. Finally, we collected socio-demographic data (gender, birth year, and educational background).

To adequately measure Jodel and Instagram usage behavior of the survey participants we applied a 7-point Likert scale for the responses. Likert (1932) developed the principle of measuring attitudes by asking people to respond to a series of statements about a topic. The responses can be marked on a 7-point scale where 1 stands for disagreement and 7 for full agreement. This way it is possible to provide a neutral response (4) as well as a precise evaluation of the tendency of the answers.

The online survey was distributed in the period from October 31, 2017 to November 22, 2017 on various social media channels such as Facebook, Jodel, and Instagram. This ensured that both Instagram and Jodel users could be reached. Since Jodel is a location-based service, we artificially altered the location in order to distribute the survey throughout Germany. We focused on cities with universities or other institutions of higher education so that we could reach a large number of Jodel users (who are usually students). Based on these criteria 40 cities were selected to distribute the survey via Jodel.

The collected data were not normally distributed (Shapiro-Wilk test). Therefore, for the evaluation we calculated the median and the interquartile range for each investigated aspect distinguished by the application (non-anonymous Instagram and anonymous Jodel usage), user type (consumer, participant, and producer) and motivation category (information, self-presentation, entertainment, and socialization). We applied the Wilcoxon signed-rank test to investigate the differences between anonymous and non-anonymous usage behavior. The test was proposed by chemist and statistician Wilcoxon (1945) and is a nonparametric statistical test that uses two paired (dependent) samples to check the equality of the central tendencies in the underlying populations.

To determine the different user types, we referred to the posting, liking, or voting behavior. The consumer category included all users of the respective social media. Producers were the ones who generated content more than once a week. Participants were those who responded to content more than once a week. In order to investigate the difference between sought gratifications while using an anonymous and a non-anonymous platform, we selected users who were producers or participants on both platforms, Instagram and Jodel, for the statistical analysis with the Wilcoxon signed-rank test.
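The role assignment and the paired comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' actual analysis: the function names, the numeric "actions per week" encoding, and the sample ratings are hypothetical, and the p-value uses a normal approximation with tie correction, whereas statistical packages may use an exact distribution for small samples.

```python
import math
from collections import Counter
from statistics import NormalDist


def user_roles(posts_per_week, responses_per_week):
    """Assign roles as described in the text (numeric weekly counts are an assumption)."""
    roles = {"consumer"}            # every user counts as a consumer
    if posts_per_week > 1:          # generated content more than once a week
        roles.add("producer")
    if responses_per_week > 1:      # responded to content more than once a week
        roles.add("participant")
    return roles


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W, z, p) for paired samples x and y.

    Zero differences are dropped. p is two-sided and comes from a normal
    approximation with tie correction (no continuity correction), which is
    crude for very small samples.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    ties = Counter(abs(d) for d in diffs).values()
    variance = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in ties) / 48
    z = (w_plus - mean) / math.sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return min(w_plus, w_minus), z, p


# Hypothetical ratings of one statement by the same eight respondents.
instagram = [2, 3, 4, 2, 5, 3, 1, 4]
jodel = [4, 3, 6, 5, 5, 4, 2, 7]
w, z, p = wilcoxon_signed_rank(instagram, jodel)
print(f"W = {w}, z = {z:.2f}, p = {p:.3f}")
```

In this sketch only respondents who hold the same role on both platforms would be passed to `wilcoxon_signed_rank`, mirroring the selection of users who were producers or participants on both Instagram and Jodel.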

29 http://www.jistap.org JISTaP Vol.6 No.3, 25-36

4. RESULTS

Out of 746 respondents, 664 completed the questionnaire. Out of these, 224 (33.8%) respondents were male, 420 (63.4%) female, and 16 (2.4%) of another gender. 4.5% of the respondents were 14 to 17 years old, 71.3% of the respondents were in the age range of 18 to 24, 16.8% were 25 to 30 years old, and 6.8% were more than 30 years old. We had 426 consumers on Instagram and 424 on Jodel. Participants included 411 Instagram users and 422 Jodel users. The producers included 371 Instagram users and 351 Jodel users.

Table 1 shows the medians and interquartile ranges for consumers, producers, and participants regarding the sought gratification categories of self-presentation, information, socialization, and entertainment while using Instagram (not anonymous) and Jodel (anonymously). For the gratification type socialization there is no result for participants, because it could not be investigated. The reason is that anonymous participants cannot practically stay in contact with friends. They are not able to recognize their new or old friends, so they cannot be compared to participants who are identifiable.

Strong differences between anonymous and non-anonymous social media usage are particularly noticeable among participants who are motivated by information and among producers who are motivated by entertainment and self-presentation. Nearly all differences between anonymous and non-anonymous usage are statistically significant, with the exception of socialization by producers. In the following, we investigate the different gratification types for each user role more closely.

The results in Fig. 4 show boxplots for different gratification

Table 1. Sought gratifications of identifiable (Instagram) and anonymous (Jodel) consumers, producers and participants

                    Consumer (n=245)                  Participant (n=232)               Producer (n=174)
                    Instagram     Jodel               Instagram     Jodel               Instagram     Jodel
                    Median IQR    Median IQR  Sig a)  Median IQR    Median IQR  Sig a)  Median IQR    Median IQR  Sig a)
Self-presentation   3      3      4      3    ***     3      3      3      4    *       4      3      2      2    ***
Information         5      2      6      1    **      2      2      4      3    ***     3      3      4      3    ***
Socialization       5      2      4      3    ***     NA     NA     NA     NA   NA      2      3      2      3    NS b)
Entertainment       6      2      7      1    ***     4      2      5      3.5  ***     2      4      4      3    ***

Likert scale from 1 (“do not agree”) to 7 (“fully agree”). IQR, interquartile range; Sig, significance of difference; NA, not applicable; NS, not significant. *P<0.05, **P<0.01, ***P<0.001; a) Wilcoxon rank test; b) P>0.05.
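As a rough illustration of how the summary values in Table 1 can be obtained and compared, the following Python sketch computes a median and interquartile range for a list of Likert responses and then lists the comparisons whose medians differ by at least two scale points between the platforms. The medians are transcribed from Table 1; the function names are hypothetical, and the quartile method (`inclusive`) is an assumption, since the source does not state which quantile definition was used.

```python
from statistics import median, quantiles


def median_iqr(responses):
    """Median and interquartile range of Likert responses.

    The 'inclusive' quartile method is an assumption; other software may
    interpolate quartiles differently.
    """
    q1, _, q3 = quantiles(responses, n=4, method="inclusive")
    return median(responses), q3 - q1


# Medians transcribed from Table 1 as (Instagram, Jodel).
# Socialization for participants is not applicable and therefore omitted.
TABLE1_MEDIANS = {
    ("consumer", "self-presentation"): (3, 4),
    ("consumer", "information"): (5, 6),
    ("consumer", "socialization"): (5, 4),
    ("consumer", "entertainment"): (6, 7),
    ("participant", "self-presentation"): (3, 3),
    ("participant", "information"): (2, 4),
    ("participant", "entertainment"): (4, 5),
    ("producer", "self-presentation"): (4, 2),
    ("producer", "information"): (3, 4),
    ("producer", "socialization"): (2, 2),
    ("producer", "entertainment"): (2, 4),
}


def median_shift(table):
    """Jodel median minus Instagram median for every role and category."""
    return {key: jodel - insta for key, (insta, jodel) in table.items()}


shifts = median_shift(TABLE1_MEDIANS)
largest = sorted(key for key, diff in shifts.items() if abs(diff) >= 2)
for role, category in largest:
    print(role, category, shifts[(role, category)])
```

The three comparisons selected this way (participants seeking information, producers seeking entertainment, and producers seeking self-presentation) are the ones the text singles out as showing strong differences.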

[Figure 4 image: boxplots of Likert scale values (1 to 7) for self-presentation, information, socialization, and entertainment, each on Instagram and on Jodel.]

Fig. 4. Sought gratifications of identifiable (Instagram) and anonymous (Jodel) consumers.


[Figure 5 image: boxplots of Likert scale values (1 to 7) for self-presentation, information, socialization, and entertainment, each on Instagram and on Jodel.]

Fig. 5. Sought gratifications of identifiable (Instagram) and anonymous (Jodel) producers.

types distinguished between anonymous and identifiable social media usage by consumers. Considering the motivation category self-presentation, consumers on the anonymous platform are rather moderately motivated by this factor (median of 4 on the 7-point Likert scale), whereas on the non-anonymous platform (Instagram) the tendency is rather negative (median of 3). As for the motivation category information, consumers on both platforms seem to be driven by this factor. The need for information is, however, stronger when using Jodel (median of 6) than when using Instagram (median of 5). When using Jodel, consumers are rather neutral regarding socialization (median of 4), while when using Instagram they have a slightly higher interest in the living conditions and lifestyles of others (median of 5). Usage of both platforms is strongly driven by the need for entertainment, especially on Jodel where the median reaches the highest possible value of 7. Still, the use of Instagram is also strongly motivated by this factor (median of 6). All differences between Jodel and Instagram usage are statistically significant (at least at the level P<0.01).

All in all, the results show clear differences in usage behavior when using anonymous and non-anonymous platforms. When being anonymous (Jodel), consumers want to identify themselves with others and they seek a great deal of information and even more entertainment. While being identifiable (Instagram), the consumers are more likely to gratify their need for socialization.

The boxplots in Fig. 5 summarize the responses from producers on the two investigated platforms. It can be observed that producers do not use media in which they are anonymous to present themselves; this is shown by a very low median of 2 for Jodel. The non-anonymous usage of Instagram is more motivated by this factor with a median of 4, which is a rather neutral value, but the highest one for usage of Instagram in the role of producer. We observe that producers prefer not to use Instagram to spread information or news (median of 3). They are more neutral about this factor when using Jodel (median of 4). For the motivation category socialization both usage types by the producers, anonymous as well as non-anonymous ones, have a median of 2, meaning that producers do not seek social relationships on either of the platforms. For this motivation category there is no significant difference between the two usage types.

It can be observed that producers seek more entertainment on the anonymous platform than on the non-anonymous one (median of 4 for Jodel in contrast to median of 2 for Instagram). Thus, anonymous producers tend to entertain users more than while using a non-anonymous platform. However, the interquartile range for Instagram is very high (4), meaning that there are also producers who might be slightly motivated by this factor, as there are many answers in the spectrum between 1 and 5. Still, these results are very surprising, especially regarding the behavior of producers on the non-anonymous platform. On Instagram, they are rather “neutral” towards self-presentation, not very motivated by information, and very negatively opposed to socialization


[Figure 6 image: boxplots of Likert scale values (1 to 7) for self-presentation, information, and entertainment, each on Instagram and on Jodel.]

Fig. 6. Sought gratifications of identifiable (Instagram) and anonymous (Jodel) participants.

Table 2. Attitudes towards anonymity on Instagram and Jodel by non-anonymous Instagram users (n=132)

                                                  Instagram     Jodel
                                                  Median IQR    Median IQR  Sig a)
I (would) like to be anonymous on…                4      3      7      1    ***
I (would) dare more when (I were) anonymous on…   4      3      6      2    ***

Likert scale from 1 (“do not agree”) to 7 (“fully agree”). IQR, interquartile range; Sig, significance of difference. ***P<0.001; a) Wilcoxon rank test.

and entertainment. On Jodel they are neutral about information and entertainment, whereas for entertainment we can recognize a rather positive tendency (spectrum between 3 and 6). Finally, there appears to be no seeking for self-presentation and socialization on this platform.

When analyzing the results for participants (Fig. 6), we can recognize that there are again visible differences between anonymous and non-anonymous usage. With a median of 2 (and interquartile range between 1 and 3), participants on Instagram do not seem to want to spread or receive information at all, while on Jodel they are more neutral about it (median of 4 with a rather positive tendency). Participants seem to be little interested in self-presentation as well, as the median is 3 for both platforms (however, with a more positive tendency on Jodel). Finally, participants on Instagram are rather neutral regarding entertainment (median of 4) while on Jodel they are more motivated by this factor (median of 5).

Overall, participants are more likely to promote entertainment on an anonymous platform, as well as to help others in their self-presentation and to evaluate or comment on information, than on a non-anonymous one. Altogether the differences between anonymous and non-anonymous usage are significant.

The boxplots in Fig. 7 show the users’ attitudes towards being anonymous on Instagram and Jodel. According to the Wilcoxon rank test, the differences are significant (Table 2). As expected, identifiable users of Instagram have a neutral opinion (median of 4) regarding being anonymous on this platform. In contrast, they appreciate being anonymous on Jodel. The median of 7 on the Likert scale represents an unambiguous attitude. In contrast to the anonymous use of Jodel (median of 6), when anonymously using Instagram, the users would still not dare more when producing content or participating in exchanges (median of 4). This could indicate that the general type of social media platform, one being purely anonymous (Jodel) and the other enabling either anonymous or non-anonymous usage (Instagram), has an impact on user behavior. In general, anonymity seems to play a minor role on Instagram and, logically, a very high one on Jodel.


[Figure 7 image: boxplots of Likert scale values (1 to 7) for four statements: “I like to be anonymous on Instagram,” “I like to be anonymous on Jodel,” “I would dare more if I were anonymous (on Instagram),” and “I would dare more if I were anonymous (on Jodel).”]

Fig. 7. Attitudes towards being anonymous on social media.

5. DISCUSSION

Anonymity plays an increasingly important role on the Internet and on social media in particular. There are more and more applications that allow users to preserve their anonymity. Such applications, like Jodel, are particularly popular among students. Anonymity offers users new opportunities to express themselves in a community and to satisfy certain needs that one would suppress under other circumstances. There are already some studies that deal with anonymity on social media, but many questions remained open. With our study we tried to close this research gap. For this purpose we conducted an online survey and determined whether users are motivated by different aspects when taking on different roles (as producers, consumers, and participants) and whether this changes when using anonymity-based and non-anonymity-based platforms.

Regarding the three user roles the output of the survey shows that there are significant differences between anonymous and identifiable usage. Consumers, when being anonymous, seek some self-presentation (i.e., they try to identify themselves with others), as well as a large amount of information and entertainment, while when being identifiable they are looking for socialization (which is doomed to be rather unsuccessful when being anonymous). When being anonymous, the participants seek especially more entertainment and some information, whereas when being identifiable they are not interested in information at all, but instead a little bit in entertainment. Producers seek more information and entertainment when being anonymous, and more self-presentation when being identifiable. Except for socialization as a consumer and self-presentation as a producer, in the eight remaining cases the median values for sought gratifications are either higher or at least the same for anonymous usage (Jodel). Only for producers seeking socialization is there no statistically significant difference between anonymous and identifiable usage.

Do users change their usage behavior when they are anonymous? Previous studies showed that anonymity has different influences on the (online) behavior of people. Anonymity can promote negative behaviors such as aggression, antisociality, and violence (Zimbardo, 1969; Katzer, 2016; Suler, 2005) as well as positive behaviors such as intimacy, openness, the promotion of ideas, and concealment of failures (Peddinti et al., 2014; Zhang & Kizilcec, 2014). Similarly, there are some studies that found no significant differences between anonymity and non-anonymity (Black et al., 2015; Saveski et al., 2016). The results of this study show that anonymous and non-anonymous usage exhibit great significant differences for all user types. There are two considerable differences between anonymous and non-anonymous usage, where the difference of the median values equals 2. The first one is given between anonymous and non-anonymous usage by participants who are seeking information. When being anonymous, the participants tend to rate and comment more on information than when they are identifiable. The second major difference is apparent between anonymous and non-anonymous usage by producers seeking self-presentation. When being identifiable, producers tend to post more content in order to present themselves than when being anonymous. This outcome is not surprising, since when being anonymous one cannot present him or herself to the fullest extent as it is possible on non-anonymous platforms.

Especially for self-presentation there are (for all users) low median values (of 4 or less). Whether anonymous or identifiable, users rarely use social media to present themselves or identify themselves with others. This result is surprising, as self-presentation was often named as an important motivational factor (Shao, 2009; Shang, Chen, & Liao, 2006; Heinonen, 2011; Livingstone, 2008). However, there are also studies which indicated the opposite. Friedländer (2017), for instance, showed that only about 11 percent of all producers of social live streaming services name self-expression as one of their motives to produce content. As for the live streaming platform YouNow, Scheibe, Zimmer, and Fietkiewicz (2017) identified 18 percent of users applying this service because of self-presentation. As for Instagram and Jodel, this motivational factor seems to be barely relevant. Overall, users are more likely to consume media, both anonymously and non-anonymously, rather than produce content. This is in line with other research results on social media; e.g., Scheibe, Fietkiewicz, and Stock (2016) found for social live streaming services that about 60 percent of all users consume streams, but only 45 percent produce their own content.

Omitting aggressiveness, anti-social behavior, or violence, and focusing on the motives self-presentation, information, socialization, and entertainment, and considering different user roles, it can be seen that anonymous and non-anonymous usage more or less satisfies the needs for entertainment through consuming as well as rating and commenting (i.e., participating); however, not in producing content on a non-anonymous platform. Nearly all consumers and participants seek entertainment on both platforms. This is understandable, given the premise that users use social media to entertain and distract themselves from everyday life. Heinonen (2011) explains that entertainment is understood as an act for “relaxation or escape.”

When on an anonymous platform, users prefer to stay that way, whereas on non-anonymous ones they do not care about anonymity. In addition, on an anonymous platform users dare more to post or comment on and rate content. On a non-anonymous platform (Instagram), users would dare more to post or respond to posts if they were anonymous, though not to the same extent as they do on Jodel. Similar results were also discovered by Wielert (2017), who found that anonymity is of great importance to Jodel users, especially with regard to creating posts. In the case of votes, however, anonymity does not play a significant role. This is to be expected, since posts are considered much more personal statements than reviews in the form of likes or votes.

In summary it can be said that both systems, anonymous as well as non-anonymous ones, can be popular and successful (Nowak et al., 2018; Scholl, 2015). This is mainly due to the different user gratifications. As mentioned above, this is especially noticeable for participants and consumers, where the behavior between anonymous and non-anonymous usage is very different. For system developers, it would be interesting to implement a function allowing users to switch between anonymity and identifiability. With our results, the services can better estimate what form of use they should provide to approach a specific type of potential users. If they are looking for users who are supposed to rate and comment on content, they should consider ensuring an adequate level of privacy. In contrast, when their platform is supposed to be built upon user-generated content and revolve around the users themselves (self-presentation), they should focus on producers who do not necessarily look for anonymity, but prefer being identifiable (and, this way, being able to personally receive appreciation for content they created). This can be especially applicable to the so-called influencers or micro-celebrities who, through an intensive self-branding, create an “influential” online persona (Fietkiewicz, Dorsch, Scheibe, Zimmer, & Stock, 2018) and need to remain identifiable to their fans. This, however, does not hold for content creation in the categories of information or entertainment. Here, a good example of a successful platform for producing information could be Reddit, whereas for entertainment it could be the meme-sharing platforms 4chan or 9gag. On all of them the users are not required to reveal their identity and can publish content anonymously.

6. LIMITATIONS AND OUTLOOK

Until now some studies have covered the aspect of anonymity and non-anonymity regarding aggressiveness, antisociality, and violence on social media. Shao (2009) examined the adaptation of U&GT on social media with a focus on self-presentation, socialization, information, and entertainment. Still, there were no results combining both of these aspects so far.

Considering the median values for producers over all gratification types, which are only neutral (4) or below on the 7-point Likert scale, the question arises: What do producers want on social media, especially when they are not being anonymous? To better understand why non-anonymous producers tend to act passively in terms of the uses and gratifications of socialization, information, and entertainment, while anonymous producers tend to be only moderately active regarding information and entertainment, their motivation needs to be investigated more closely. Qualitative interviews should be an adequate method to disclose the motives of producers. In addition, one could change the consideration of the user groups in further studies. This study looked at and evaluated users who use both Instagram and Jodel. To exclude the possibility of usage patterns when using both systems, one might consider users who use only Jodel or only Instagram, but not both.

The limitations of our study concern the regional distribution of the survey. Since Jodel is an app developed in Germany, it is only well known in German speaking countries. Our survey was therefore created in German to reach the majority of the users of Jodel. That also means that the whole study, including the investigation of Instagram usage, is limited to German speaking users. To conduct a broader study it is necessary to design a survey in English and distribute it in other countries. However, this would pose another limitation, namely that users in other countries might be unfamiliar with Jodel. Therefore, an alternative to Jodel as a mobile app enabling solely anonymous usage should be investigated. Or perhaps Jodel will become as successful in the English-speaking world as formerly Yik Yak, since the company is planning to expand into the USA (Gruenderszene, 2017).

REFERENCES

Bernstein, M. S., Monroy-Hernández, A., Harry, D., André, P., Panovich, K., & Vargas, G. G. (2011). 4chan and /b/: An analysis of anonymity and ephemerality in a large online community. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (pp. 50-57). Palo Alto, CA: Association for the Advancement of Artificial Intelligence Press.
Black, E. W., Mezzina, K., & Thompson, L. A. (2015). Anonymous social media: Understanding the content and context of Yik Yak. Computers in Human Behavior, 57, 17-22.
Blumler, J., & Katz, E. (1974). The uses of mass communications: Current perspectives on gratifications research. Beverly Hills, CA: Sage.
Dibbell, J. (2010). Radical opacity. Technology Review, 113(5), 82-86.
Fietkiewicz, K. J., Dorsch, I., Scheibe, K., Zimmer, F., & Stock, W. G. (2018). Dreaming of stardom and money: Micro-celebrities and influencers on live streaming services. In G. Meiselwitz (Ed.). Social computing and social media. User experience and behavior. SCSM 2018 (pp. 240-253). Cham, Switzerland: Springer.
Friedländer, M. B. (2017). Streamer motives and user-generated content on social live-streaming services. Journal of Information Science Theory and Practice, 5(1), 65-84.
Gruenderszene (2017). Studenten-App Jodel erhält sechs Millionen: und will in die USA expandieren [Student app Jodel receives six million: and wants to expand into the US]. Retrieved Jun 30, 2018 from https://www.gruenderszene.de/allgemein/jodel-studenten-app-usa-millionen.
Heinonen, K. (2011). Consumer activity in social media: Managerial approaches to consumers’ social media behavior. Journal of Consumer Behavior, 10(6), 356-364.
Jordan, T. (2002). Cyberpower: The culture and politics of cyberspace and the Internet. New York, NY: Routledge.
Katz, E., Blumler, J. G., & Gurevitch, M. (1973). Uses and gratifications research. Public Opinion Quarterly, 37(4), 509-523.
Katzer, C. (2016). Cyberpsychologie: Leben im Netz: Wie das Internet uns verändert [Cyberpsychology: Life on the Net: How the Internet changes us]. Munich, Germany: Deutscher Taschenbuch Verlag.
Kolodny, L. (2017). Yik Yak shuts down after Square paid $1 million for its engineers. Retrieved Jun 30, 2018 from https://techcrunch.com/2017/04/28/yik-yak-shuts-down-after-square-paid-1-million-for-its-engineers/
Krishnamurthy, B., & Wills, C. E. (2009). On the leakage of personally identifiable information via online social networks. In Proceedings of the 2nd ACM Workshop on Online Social Networks (pp. 7-12). New York, NY: ACM.
Lee, C. S., & Ma, L. (2012). News sharing in social media: The effect of gratifications and prior experience. Computers in Human Behavior, 28(2), 331-339.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 1-55.
Livingstone, S. (2008). Taking risky opportunities in youthful content creation: Teenagers’ use of social networking sites for intimacy, privacy and self-expression. New Media & Society, 10(3), 393-411.
McQuail, D. (1994). Mass communication theory. London, UK: Sage.
Morahan-Martin, J., & Schumacher, P. (2000). Incidence and correlates of pathological internet use among college students. Computers in Human Behavior, 16(1), 13-29.
Nowak, P., Jüttner, K., & Baran, K. S. (2018). Posting content, collecting points, staying anonymous: An evaluation of Jodel. In G. Meiselwitz (Ed.). Social computing and social media. User experience and behavior. 10th International Conference, SCSM 2018, Held as Part of HCI International 2018 (pp. 67-86). Cham, Switzerland: Springer.
Peddinti, S. T., Ross, K. W., & Cappos, J. (2014). On the Internet, nobody knows you’re a dog: A Twitter case study of anonymity in social networks. In Proceedings of the Second ACM Conference on Online Social Networks (pp. 83-94). New York, NY: ACM.
Postmes, T., Spears, R., & Lea, M. (1998). Breaching or building social boundaries? SIDE-effects of computer-mediated communication. Communication Research, 25(6), 689-715.
Saveski, M., Chou, S., & Roy, D. (2016). Tracking the Yak: An empirical study of Yik Yak. In International AAAI Conference on Web and Social Media (ICWSM) (pp. 671-674). Palo Alto, CA: Association for the Advancement of Artificial Intelligence Press.
Scheibe, K., Fietkiewicz, K. J., & Stock, W. G. (2016). Information behavior on social live streaming services. Journal of Information Science Theory and Practice, 4(2), 6-20.
Scheibe, K., Zimmer, F., & Fietkiewicz, K. (2017). Das Informationsverhalten von Streamern und Zuschauern bei Social Live-Streaming Diensten am Fallbeispiel YouNow [The information behavior of streamers and viewers in social live streaming services in the case of YouNow]. Information: Wissenschaft & Praxis, 68(5-6), 352-364.
Scholl, H. (2015). Instant profits guide to Instagram success. Budapest: PublishDrive.
Scott, S. V., & Orlikowski, W. J. (2014). Entanglements in practice: Performing anonymity through social media. MIS Quarterly, 38(3), 873-893.
Seigfried-Spellar, K. C., & Lankford, C. M. (2017). Personality and online environment factors differ for posters, trolls, lurkers, and confessors on Yik Yak. Personality and Individual Differences, 124, 54-56.
Shang, R. A., Chen, Y. C., & Liao, H. J. (2006). The value of participation in virtual consumer communities on brand loyalty. Internet Research, 16(4), 398-418.
Shao, G. (2009). Understanding the appeal of user-generated media: A uses and gratification perspective. Internet Research, 19(1), 7-25.
Sheldon, P., & Bryant, K. (2016). Instagram: Motives for its use and relationship to narcissism and contextual age. Computers in Human Behavior, 58, 89-97.
Steiner, P. (1993). On the Internet, nobody knows you’re a dog [Cartoon]. The New Yorker, 69(20), 61.
Suler, J. (2005). The online disinhibition effect. International Journal of Applied Psychoanalytic Studies, 2(2), 184-188.
Thiele, P. (2015). Jodel-App: Karma-Punkte. Das steckt dahinter [Jodel App: Karma Points. That's behind it]. Retrieved Jun 30, 2018 from https://praxistipps.chip.de/jodel-app-karma-punkte-das-steckt-dahinter_43924
Wielert, E. (2017). Die Rolle von Anonymität und Lokalität in einem sozialen Netzwerk am Beispiel von Jodel [The role of anonymity and locality in a social network using the example of Jodel] (Unpublished bachelor’s thesis). Christian-Albrechts-University Kiel, Germany.
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80-83.
Wodzicki, K., Schwämmlein, E., Cress, U., & Kimmerle, J. (2011). Does the type of anonymity matter? The impact of visualization on information sharing in online groups. Cyberpsychology, Behavior, and Social Networking, 14(3), 157-160.
Zhang, K., & Kizilcec, R. F. (2014). Anonymity in social media: Effects of content controversiality and social endorsement on sharing behavior. In International AAAI Conference on Web and Social Media (ICWSM) (pp. 643-646). Palo Alto, CA: Association for the Advancement of Artificial Intelligence Press.
Zimbardo, P. G. (1969). The human choice: Individuation, reason, and order versus deindividuation, impulse, and chaos. Nebraska Symposium on Motivation, 17, 237-307.
Zimmer, F., Scheibe, K., & Stock, W. G. (2018). A model for information behavior research on social live streaming services (SLSSs). In G. Meiselwitz (Ed.). Social computing and social media: User experience and behavior. 10th International Conference, SCSM 2018 (pp. 429-448). Cham, Switzerland: Springer.

JISTaP http://www.jistap.org
Research Paper
Journal of Information Science Theory and Practice
J Inf Sci Theory Pract 6(3): 37-44, 2018
eISSN: 2287-4577 pISSN: 2287-9099
https://doi.org/10.1633/JISTaP.2018.6.3.4

Rediscovering Forgotten Research: Sleeping Beauties at the University of Waterloo

Jeffrey Demaine* Dana Porter Library, University of Waterloo, Waterloo, ON, Canada E-mail: [email protected]

ABSTRACT An academic article is normally cited within a few years of publication, after which interest falls off as the research field moves on. However, an article is sometimes ignored for many years only to attract interest after a long period of dormancy. Such articles are called “Sleeping Beauties.” A general characterization of this pattern has recently been defined and is used in this study to identify five Sleeping Beauties that were published by researchers at the University of Waterloo in the 1970s and 1980s. While a handful of studies have examined the occurrence of such Sleeping Beauties in specific fields of research or in a particular journal, none has yet identified these unusual articles in the context of the lasting impact of a university’s research. This study is therefore a novel application of the latest technique for identifying Sleeping Beauties. The possibilities for using this unusual citation pattern in raising the profile of a university’s research are discussed. Keywords: bibliometrics, sleeping beauties, beauty coefficient, research impact

Received date: June 22, 2018
Accepted date: August 31, 2018

*Corresponding Author: Jeffrey Demaine
Bibliometrics Librarian
Dana Porter Library, University of Waterloo, Waterloo, ON, Canada
E-mail: [email protected]

Open Access
All JISTaP content is Open Access, meaning it is accessible online to everyone, without fee and authors’ permission. All JISTaP content is published and distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/). Under this license, authors reserve the copyright for their content; however, they permit anyone to unrestrictedly use, distribute, and reproduce the content in any medium as far as the original authors and source are cited. For any reuse, redistribution, or reproduction of a work, users must clarify the license terms under which the work was produced.

© Jeffrey Demaine, 2018 JISTaP Vol.6 No.3, 37-44

1. INTRODUCTION

A university’s ability to advance a research front can be quantified by counting the citations to its published research. But it is not only the recent research coming from a university that advances a research front. Some papers are able to influence current research many years after they first appear. Papers published decades ago may find new relevance and become highly cited after many years of dormancy, suggesting that these ideas were ahead of their time. Publications that exhibit this pattern of delayed recognition are known as “Sleeping Beauties” (SBs) (van Raan, 2004). Although rare, they have been identified in such diverse research areas as physics (Redner, 2005), pediatrics (Završnik & Kokol, 2016), medicine and biological engineering (Huang, Hsu, & Ciou, 2015), and psychology (Lange, 2005; Ho & Hartley, 2017). This raises the question of whether SBs can also be found at specific institutions. The current study seeks to identify SBs in the research papers published by the faculty of the University of Waterloo located in Ontario, Canada.

In this paper we present a review of the literature about SBs and explore some of the reasons behind this unusual citation pattern. We implement the most advanced algorithm for evaluating the ‘surprisingness’ of an article’s rediscovery and use it in a case study of SBs published by researchers at the University of Waterloo.

One of the original SBs to be studied is the work of Einstein, Podolsky, and Rosen published in 1935. Known as the EPR paper, it was not extensively cited until some 60 years after it appeared (Fig. 1). Although Redner (2005) notes that the EPR paper was cited 36 times before 1980, the explosive growth of interest in this paper since 1990 is a hallmark of an idea that was ahead of its time. Though the concept of quantum entanglement presented in the EPR paper may have been of some theoretical interest throughout the twentieth century, it is only with the technological advances in quantum computing in recent years that this article has found new relevance. Although it is an old paper, it has become current and is now a central part of the evolving research front. A history of the implications of the EPR paper makes clear its relevance to quantum physics: “Due to its role in the development of quantum information theory, it is also near the top in [the] list of currently ‘hot’ papers” (Fine, 2017).

1.1. Drivers Behind the SB Citation Pattern

There are several explanations as to why an article would exhibit such an unusual pattern of citations. In some cases, SBs appear because the research in the article finds relevance in another discipline where it has an impact far greater than in its substantive field. Examining the metadata of the articles that cited the SBs and caused them to ‘awaken’ (so-called “Prince” articles), Braun, Glänzel, and Schubert (2010) as well as Teixeira, Vieira, and Abreu (2017) both found that 40% of the Princes were from a research field different from that of the SB that they awakened. In such a situation

[Figure 1: times cited per year (vertical axis, 0 to 450) by year (horizontal axis, 1935 to 2015).]

Fig. 1. Citations of the Einstein, Podolsky, and Rosen (1935) paper.

the SB pattern of citations can be seen as a snapshot of the transfer of ideas from one domain of knowledge to another. They also found that the Princes were consistently from journals with twice the journal impact factor of the journals in which the SBs appeared. Thus it appears that in many cases the dormancy of an SB is due in some respects to the relative obscurity of the journal in which it was published. It is only when subsequent research in higher-profile journals and/or fields picks up on the dormant article that the SB is awakened.

Secondly, the concepts outlined in a paper may be ahead of their time or run counter to the prevailing consensus of the research field. A study of SBs in the field of innovation studies found that the reasons for their delayed recognition vary and are as much due to the content of the SB as to the characteristics of the Prince (Teixeira et al., 2017). A long dormancy is sometimes due to resistance within the scientific community to the ideas described in the SB, and its awakening is sometimes attributable to the development of new conceptual models that can leverage the ideas of the SB. This aligns with the concept of a “paradigm shift” as described by Thomas S. Kuhn in his landmark book The structure of scientific revolutions (Kuhn, 1962). In this scenario, the SB serves as an indicator of the rapid evolution of a research field as it overturns outmoded ideas.

A third explanation for the sudden interest in a long-dormant paper is one of technological readiness. It may be that the ideas discussed in an SB paper are correct and/or relevant to the field, but the equipment required to test or implement those ideas is too expensive to be widely available or simply does not exist. The EPR paper illustrates just how it may take decades for the right combination of ideas and technological advancement to come together. Certainly few would have dismissed an article by Albert Einstein as being of no value. Indeed it was not actually dormant and it received a modest yet steady number of citations every year during the 1950s, 1960s, and 1970s. The discussion around this paper became known as the “EPR paradox” (the possibility of faster-than-light communication between two particles). Yet it was only in the late 1980s that the EPR paper began to be highly cited, coinciding with the emerging ability to exploit quantum entanglement in quantum computing, which made these ideas applicable.

Thus there are at least four reasons why an article should become an SB: It is hidden in a relatively obscure journal, it finds traction in a different field, it is too unorthodox to be immediately integrated into its proper field of research, or it is ahead of its time in terms of the technology required to apply the concepts it contains.

2. METHODOLOGY

There are a number of approaches to identifying SBs. When the concept was originally characterised, articles were evaluated according to features of their citation history. Glänzel and Garfield (2004) defined these “delayed recognition” papers as having been uncited for at least five years after publication, and then subsequently being cited at least 50 times in the following 15 years. Redner (2005) offered a simple rule-of-thumb for determining which papers qualify as SBs. He defines a “revived classic as a nonreview Physical Review article, published before 1961, that has received more than 250 citations and has a ratio of the average citation age to the age of the paper greater than 0.7.” This approach to describing SBs lends itself well to their identification in large databases by defining a few search parameters.

A more general technique for identifying SBs that does not rely on rule-of-thumb thresholds has been recently proposed by Ke, Ferrara, Radicchi, and Flammini (2015). Instead, the algorithm they propose (Equation 1) expresses how surprising the citations to an article are in relation to the number of years it has been dormant. The resulting number is called the “Beauty Coefficient” (BC).

$$B = \sum_{t=0}^{t_m} \frac{\dfrac{c_{t_m} - c_0}{t_m}\, t + c_0 - c_t}{\max\{1,\, c_t\}}$$

Equation 1. The Beauty Coefficient as described by Ke, Ferrara, Radicchi, and Flammini (2015).

This approach takes as its input five parameters:

The number of years since publication, t
The number of times an article was cited in its year of publication, c_0
The number of citations at year t, c_t
The number of years since publication until the year of maximum citation, t_m
The number of times an article was cited in its most highly-cited year, c_{t_m}

By calculating a sum over these five parameters for every year from publication to the year in which the article in question is most highly cited, a metric of the SB effect is obtained which expresses how surprising the resurgence of citations is. For an article which has received a steadily increasing number of citations year after year, yet another year of increased citations is not at all surprising, and consequently the article in question

would have a very low BC. Conversely, a paper that has been dormant for decades only to receive a sudden and large spike in citations is highly unusual and would therefore receive a high BC.

This technique is used in this study to identify SB papers that were published by faculty at the University of Waterloo. To calibrate our implementation of the Ke et al. (2015) algorithm, we quantify the rapid growth in citations to the EPR paper after 1987 and arrive at a BC of 2,333. This is very similar to the score of 2,258 calculated by Ke et al. (2015), the slight increase being due to differences in the journals indexed (and therefore the citations identified) between the Web of Science used by Ke et al. and the Scopus database used in the current study. In addition, the accumulation of new citations to the EPR paper in the three years since Ke et al. collected their data would naturally produce a higher BC.

To identify SBs at the University of Waterloo, citation frequency data (Demaine, 2018) was downloaded from Elsevier’s Scopus database in November 2017 for the period 1958 (when the university was founded) to 1998 (inclusive). While the University of Waterloo now publishes thousands of papers every year, its output at its founding was naturally very modest. For example, the first and only paper published by Waterloo in 1958 was “Decay of immediate memory with age” by Fraser. Since then the growth in publications from the university has been impressive, with the university publishing 4,341 papers in 2017 (Scopus, 2018).

Table 1. Papers by year of publication and number of times they have been cited

Publication year   No. of papers   Total cites   Average cites per paper
1958                           1            10                      10
1960                           4             6                       1.5
1961                           6             9                       1.5
1962                           4            65                      16.3
1963                          21           722                      34.4
1964                          39           322                       8.3
1965                          69           911                      13.2
1966                          83           903                      10.9
1967                         116         3,855                      33.2
1968                         156         2,726                      17.5
1969                         249         6,659                      26.7
1970                         301         6,141                      20.4
1971                         385         8,654                      22.5
1972                         432         8,658                      20.0
1973                         464         9,288                      20.0
1974                         524         9,180                      17.5
1975                         461         8,251                      17.9
1976                         470         8,250                      17.6
1977                         489        10,991                      22.5
1978                         490         8,053                      16.4
1979                         520        13,221                      25.4
1980                         553        10,343                      18.7
1981                         532        10,736                      20.2
1982                         572        13,845                      24.2
1983                         655        13,648                      20.8
1984                         859        18,217                      21.2
1985                         877        15,024                      17.1
1986                         921        17,423                      18.9
1987                         865        19,078                      22.1
1988                         910        23,326                      25.6
Total                     12,028       248,515          Average: 18.75

As no SB articles were found to have been published after 1987, this study will only examine the first 30 years of

[Figure 2: papers published per year (left axis, 0 to 1,000) and times cited (right axis, 0 to 25,000) by publication year, 1958 to 1988.]

Fig. 2. Publications by the University of Waterloo 1958 to 1988 and times cited.

the university’s development up to 1988. To provide some context for this analysis, we see that the university published 12,028 papers from 1958 to 1988. This is illustrated in Fig. 2 (with associated data in Table 1). We see that after a slow start in the early 1960s, the university was publishing nearly a thousand papers per year by the end of the 1980s. On average these twelve thousand papers have been cited 18.75 times, although this statistic is heavily skewed by highly cited outliers.

While much of the earlier work in this field has relied on programmatic approaches employing SQL to scan a local database of citation data for patterns that match certain threshold criteria (for example, in Redner’s 2005 study, over a century’s worth of publications of the American Physical Society were searched), the much smaller amount of data collected for this study permitted a manual approach to identifying SBs.

Working at the level of an institution (that is to say, for several thousand records), the citation-by-year data of Scopus can be exported as a delimited text file and then sorted using a spreadsheet application such as Microsoft Excel. From this point, the technique for identifying unusual citation patterns is straightforward: With successive columns listing the citations received in each year after publication for the rows of articles, a sorting is defined in which each successive year is a sorting level. With each sorting level ordered from lowest to highest, those articles with the lowest number of citations appear at the top of the list. One visually scans this layout for a long series of years after an article’s publication in which there were zero (or nearly zero) citations, followed by a more recent increase. The BC can then be calculated using the technique of Ke et al. (2015) for the small number of articles that are identified as having the characteristics of an SB.

Note that Ke et al. (2015) do not specify a threshold value for determining the significance of the BC: “There are no clear demarcation values that allow us to separate SBs from ‘normal’ papers: delayed recognition occurs on a wide and continuous range.” While this new metric measures the magnitude of the awakening, it does not offer a mechanism for determining whether a paper is an SB or not. This is due to the fact that they find that BCs exhibit a scale-free distribution when calculated for articles in both the Web of Science and American Physical Society databases (Redner, 2005). This implies that there is no characteristic value for the BC and that while it must be a positive value, it may range from null to an arbitrarily large number.

While the BC follows a scale-free distribution, it is also true that the greater the BC, the more sudden and surprising are the citations to a re-awakened article and the more that article fits the definition of an SB. Thus there must be a practical lower limit below which the recognition, delayed as it may be, is simply too small to signify any meaningful impact on the current research front. Given that Ke et al. (2015) considered a BC value of 30 as being “small,” we will use a threshold of 100 for the BC as the lower limit of what constitutes a meaningful SB.

3. RESULTS

We identified five articles that were published by faculty or graduate students of the University of Waterloo that exhibit a clear SB citation pattern (Table 2, Fig. 3). The earliest SB we discovered was published in 1971, and the most recent was published in 1987. As SBs are known to be quite rare, it is not surprising that there should only be a handful from any given institution.

A statistical description of these articles illustrates just how unusual they are. Besides the BC, the unusual nature of these articles is illustrated by calculating their average citation age. Note that the concept of “citation age” was defined by Redner (2005) as “The age of a citation is the difference between the year when a citation occurs and the publication year of the cited paper.” To arrive at an average citation age, the number of citations in a given year is multiplied by the number of years since publication to generate an age-weighted citation count. The overall total of the citation ages for all years is divided by the total number of citations to determine the average citation age.

Redner (2005) looked at a century’s worth of articles in the journal Physical Review and found an average citation age of 6.2 years. Most articles have most of their impact within a few years of publication. In contrast we see (Table 2) that the average citation age of these five articles from the University of Waterloo is considerably longer and ranges from 21.5 to 40.2 years (calculated up to the year of peak citation). Thus the peak of citations to these SB articles happens decades after most articles have had an impact on their field.

The most striking result is the 1974 article by Horndeski in the International Journal of Theoretical Physics. Since 2010 the growth of interest in this paper has been explosive, awakening in 2011 with 16 citations and reaching a peak of 153 citations only five years later. This is an ideal SB citation pattern. Note that this article was cited once in 1976, 1977, and 1983 by other researchers, indicating that the paper was indexed by citation databases and was potentially


Table 2. Five Sleeping Beauty articles published by the University of Waterloo

Year   Field              Total citations   Average citation age   Beauty Coefficient   Author. “Title” Journal
1974   Physics                        619                   40.2                2,434   Horndeski G. W. “Second-order scalar-tensor field equations in a four-dimensional space” International Journal of Theoretical Physics
1980   Physics                        186                   28.9                  315   Collins C. B., Glass E. N., Wilkinson D. A. “Exact spatially homogeneous cosmologies” General Relativity and Gravitation
1971   Physics                      1,180                   35.2                  286   Lovelock D. “The Einstein tensor and its generalizations” Journal of Mathematical Physics
1981   Computer science               116                   29.9                  117   Mark J. W., Todd T. D. “A nonuniform sampling approach to data compression” IEEE Transactions on Communications
1987   Computer science               119                   21.5                  103   Kilgour D. M., Hipel K. W., Fang L. “The graph model for conflicts” Automatica
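The average citation age column of Table 2 follows Redner’s (2005) definition: each year’s citations are weighted by the number of years since publication, and the age-weighted total is divided by the total number of citations. A minimal Python sketch of that arithmetic is given here for illustration only. The function name `average_citation_age`, the list-based input (index 0 is the publication year), and the sample citation history are assumptions made for this sketch; they are not the study’s data or tooling.

```python
def average_citation_age(citations_per_year):
    """Return the average citation age of one paper.

    citations_per_year[t] is the number of citations received t years
    after publication (t = 0 is the publication year itself).
    """
    total_citations = sum(citations_per_year)
    if total_citations == 0:
        raise ValueError("average citation age is undefined for an uncited paper")
    # Weight each year's citations by their age (years since publication).
    age_weighted = sum(age * count for age, count in enumerate(citations_per_year))
    return age_weighted / total_citations


# Invented citation history: dormant for years, then a late rise.
history = [0, 1, 0, 0, 0, 0, 2, 5, 12]
print(average_citation_age(history))
```

To reproduce a value “calculated up to the year of peak citation,” the list would simply be truncated at the peak year before being passed to the function.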

[Figure 3: citations per year (vertical axis, 0 to 160) by year (horizontal axis, 1971 to 2017) for five papers: Horndeski G.W. (1974), B = 2434; Collins C.B., Glass E.N., Wilkinson D.A. (1980), B = 315; Lovelock D. (1971), B = 286; Mark J.W., Todd T.D. (1981), B = 117; Kilgour D.M., Hipel K.W., Fang L. (1987), B = 103.]

Fig. 3. Citation history of five Sleeping Beauty papers from the University of Waterloo. The Beauty Coefficient for each paper is shown as the value “B.”

discoverable. After this, the Horndeski paper did not receive another citation for 28 years. Interestingly, and despite being four decades younger than the paper by Einstein, Podolsky, and Rosen (1935), citations to this article spiked so suddenly after 2010 that its BC of 2,434 even exceeds that of the EPR paper, which has a value of 2,333. While the EPR paper has been cited many more times than Horndeski’s 1974 paper, it is actually the latter that has had a more sudden and surprising impact on its research field.

Entitled “Second-order scalar-tensor field equations in a four-dimensional space,” the Horndeski paper proposes a highly theoretical reimagining of what gravity is. A review of the recent articles that cite it indicates that this paper has become central to the understanding of Galilean gravity models. Indeed the paper has become something of a name brand within this sub-specialty of cosmology. Of the citing articles, 100 use that researcher’s name to represent the associated concept: “Horndeski theories,” “Horndeski model,” and “Horndeski gravity.” A deeper analysis of the evolution of this very complex and theoretical topic in cosmological physics is beyond the scope of this paper. But from a bibliometric perspective, it is sufficient to note that the Horndeski paper lay dormant for a quarter of a century and in the relatively short period since its awakening in 2011 has suddenly become relevant to the understanding of gravity. This is an example of the second type of driver for the emergence of an SB, that of an idea that was ahead of its time. This represents, on a very small scale, what Thomas S. Kuhn described as a “paradigm shift” in science.

The citation history of the Horndeski and another


SB paper, Lovelock’s “The Einstein tensor and its generalizations” (1971) are entwined: Horndeski was the graduate student of Lovelock and the Horndeski paper cites Lovelock’s paper. Being an extrapolation of Lovelock’s ideas, the Horndeski (1974) paper took on new relevance once the Lovelock paper was awakened and it was through the latter that researchers were presumably led to the Horndeski paper. Indeed, these papers have been co-cited 68 times in the Scopus database as of March 2018.

Two articles in computer science from the 1980s are also somewhat surprising: Kilgour, Hipel, and Fang (1987), and Mark and Todd (1981). While they have each been cited more than 100 times, the pattern of citations to these articles is of a more gradual awakening rather than a sudden spike of interest. This moderates just how high their BC score can be. Still, they were both very much dormant for two decades, and the average age of the citations to these is 21.5 and 29.9 years, respectively, so their BCs are greater than 100 and they can both rightfully be considered SBs.

Note that while an article’s BC is related to the number of citations it receives, it is not strictly proportional. Consider two articles in physics: Lovelock (1971) has received 1,180 citations (as of November 2017, Scopus) and has a BC of 286. In contrast, Collins, Glass, and Wilkinson (1980) has received only 186 citations and yet has a higher BC of 315. This is because the surprisingness of the spike in citations to the Lovelock paper after 2001 is muted by the modest attention it received in the 1980s. We see that the BC algorithm of Ke et al. (2015) takes into account both the depth of the sleep and the suddenness of the awakening in determining how much of an SB a paper represents.

Given that SBs have been found in a wide range of research fields, and that Glänzel and Garfield (2004) found twice as many SBs in life sciences as in physics, it may seem curious that the delayed recognition papers from the University of Waterloo occur in only physics and computer science. This is no doubt a reflection of the history and research strengths of the university, which has no medical school and which was founded in 1958 with a focus on engineering, math, and computer science. This legacy continues to this day, with the university being ranked as having the 70th best program in engineering and technology (which includes computer science) in the world according to the 2018 QS World University Rankings (https://www.topuniversities.com/subject-rankings/2018). The natural sciences (which include physics) also do well, with the university’s program ranked 116th in the world. It is therefore not surprising that no articles from the social sciences were found amongst the SBs from the University of Waterloo, as this faculty is a smaller and more recent part of the organization.

4. CONCLUSION

By identifying SBs in the publication history of the University of Waterloo, we have uncovered the legacy of some of the research performed there decades ago. Rather than being forgotten, these unusual examples of scholarship have found new life, contributing to the research fronts of physics and computer science. Despite being a middle-sized university founded only 60 years ago, the University of Waterloo has produced a handful of SBs, including one even more surprising (in terms of the suddenness of its impact as measured by its BC) than the Einstein, Podolsky, and Rosen paper.

While the University of Waterloo is renowned within Canada for its research in such fields as computer science, nanotechnology, and engineering, it is not an exceptional institution in the context of higher education globally. It is therefore not unreasonable to expect that many other universities around the world have also produced research that has lain dormant for many years and that has recently been rediscovered. The technique outlined here could be used at other institutions to identify researchers who were ahead of their time.

The rationale for doing so is not simply esoteric. The discovery of this unusual citation pattern in the historical publications of an institution presents it with an opportunity to think about the use of bibliometrics in a new way. In contrast to the negative reputation that bibliometrics has gained as a result of its inappropriate use in judging faculty, SBs are a thoroughly positive application of bibliometrics because they celebrate work that has been overlooked. Indeed, the fairy tale analogy implied by the term “Sleeping Beauty” is not simply that the articles have been awakened, but that the story has a happy ending. For faculty who have been conditioned to view bibliometrics as merely a form of accounting, SBs demonstrate that bibliometrics can instead be used to construct a positive story about the use of citations in describing research.

How then can universities such as Waterloo capitalize on this uncommon research legacy? One approach would be to use SBs in a communications plan to highlight the most impactful research in the history of the university. This bibliometric technique is easily applied at any university with access to the appropriate databases. Once identified, the SBs in an institution’s publication record demonstrate the legacy of the groundbreaking research that was performed there.


ACKNOWLEDGMENTS

The author would like to thank Cal Murgu for his assistance and thoughtful comments.

REFERENCES

Braun, T., Glänzel, W., & Schubert, A. (2010). On Sleeping Beauties, Princes and other tales of citation distributions. Research Evaluation, 19(3), 195-202.
Collins, C. B., Glass, E. N., & Wilkinson, D. A. (1980). Exact spatially homogeneous cosmologies. General Relativity and Gravitation, 12(10), 805-823.
Demaine, J. (2018). DATA.xlsx (version 1). figshare. Retrieved Jun 30, 2018 from https://doi.org/10.6084/m9.figshare.6464840.v1.
Einstein, A., Podolsky, B., & Rosen, N. (1935). Can quantum-mechanical description of physical reality be considered complete? Physical Review, 47(10), 777-780.
Fine, A. (2017). Einstein-Podolsky-Rosen argument in quantum theory. The Stanford Encyclopedia of Philosophy. Retrieved Jun 30, 2018 from https://plato.stanford.edu/archives/win2017/entries/qt-epr/.
Fraser, D. C. (1958). Decay of immediate memory with age. Nature, 182(4643), 1163.
Glänzel, W., & Garfield, E. (2004). The myth of delayed recognition. The Scientist, 18(11), 8-9.
Ho, Y.-S., & Hartley, J. (2017). Sleeping Beauties in psychology. Scientometrics, 110(1), 301-305.
Horndeski, G. W. (1974). Second-order scalar-tensor field equations in a four-dimensional space. International Journal of Theoretical Physics, 10(6), 363-384.
Huang, T.-C., Hsu, C., & Ciou, Z.-J. (2015). Systematic methodology for excavating sleeping beauty publications and their princes from medical and biological engineering studies. Journal of Medical and Biological Engineering, 35(6), 749-758.
Ke, Q., Ferrara, E., Radicchi, F., & Flammini, A. (2015). Defining and identifying sleeping beauties in science. Proceedings of the National Academy of Sciences of the United States of America, 112(24), 7426-7431.
Kilgour, D. M., Hipel, K. W., & Fang, L. (1987). The graph model for conflicts. Automatica, 23(1), 41-55.
Kuhn, T. S. (1962). The structure of scientific revolutions. Chicago, IL: University of Chicago Press.
Lange, L. L. (2005). Sleeping beauties in psychology: Comparisons of "hits" and "missed signals" in psychological journals. History of Psychology, 8(2), 194-217.
Lovelock, D. (1971). The Einstein tensor and its generalizations. Journal of Mathematical Physics, 12(3), 498-501.
Mark, J. W., & Todd, T. D. (1981). A nonuniform sampling approach to data compression. IEEE Transactions on Communications, 29(1), 24-32.
Redner, S. (2005). Citation statistics from 110 years of Physical Review. Physics Today, 58(6), 49-54.
Scopus (2018). [Search string = AF-ID (“University of Waterloo” 60014171)]. Retrieved Aug 27, 2018 from https://www.scopus.com.
Teixeira, A. A. C., Vieira, P. C., & Abreu, A. P. (2017). Sleeping Beauties and their Princes in innovation studies. Scientometrics, 110(2), 541-580.
van Raan, A. F. J. (2004). Sleeping Beauties in science. Scientometrics, 59(3), 467-472.
Završnik, J., & Kokol, P. (2016). Sleeping Beauties in pediatrics. Journal of the Medical Library Association, 104(4), 313-314.

Research Paper
Journal of Information Science Theory and Practice
J Inf Sci Theory Pract 6(3): 45-60, 2018
eISSN : 2287-4577 pISSN : 2287-9099
https://doi.org/10.1633/JISTaP.2018.6.3.5

Quantifying Quality: Research Performance Evaluation in Korean Universities

Kiduk Yang*
Department of Library and Information Science, Kyungpook National University, Daegu, Korea
E-mail: [email protected]

Hyekyung Lee
Department of Library and Information Science, Kyungpook National University, Daegu, Korea
E-mail: [email protected]

ABSTRACT Research performance evaluation in Korean universities follows strict guidelines that specify scoring systems for publication venue categories and formulas for co-authorship credit allocation. To find out how the standards differ across universities and how they differ from bibliometric research evaluation measures, this study analyzed 25 standards from major Korean universities and rankings produced by applying standards and bibliometric measures such as publication and citation counts, normalized impact score, and h-index to the publication data of 195 tenure-track professors of library and information science departments in 35 Korean universities. The study also introduced a novel impact score normalization method to refine the methodology from prior studies. The results showed the university standards to be mostly similar to one another but quite different from citation-driven measures, which suggests the standards are not quite successful in quantifying the quality of research as originally intended. Keywords: bibliometrics, research assessment, impact score normalization, rank cluster analysis

Received date: September 06, 2018
Accepted date: September 14, 2018

*Corresponding Author: Kiduk Yang
Professor
Department of Library and Information Science, Kyungpook National University, 80 Daehak-ro, Buk-gu, Daegu 41566, Korea
E-mail: [email protected]

© Kiduk Yang, Hyekyung Lee, 2018 JISTaP Vol.6 No.3, 45-60

1. INTRODUCTION

Evaluation of research performance is not only a crucial component of faculty appraisal in universities but also of great importance to external entities that provide research funding or assess academic institutions. Within the university system, research performance is typically the basis for promotion and tenure. For funding agencies, the research performance of applicants is a key facet of proposal evaluation. For those who need to assess academic units, be it the universities, departments, or researchers, proper evaluation of research performance is mandatory for accurate appraisal.

The most common metric of research performance is the number of publications, especially those published in indexed journals. Publication count in indexed journals can be thought to represent the productivity of "quality" research since indexed journals are supposed to publish quality manuscripts. Indeed, to be accepted in indexed journals, a manuscript has to pass the quality standards of the editor-in-chief, an editor, and two or more reviewers. There are variations in quality, however, across and within journals. To further complicate things, the perspectives, expertise, and tendencies of reviewers can also affect the manuscript review outcome.

As is the case with many standards, these variations do not affect the evaluation outcomes of very strong or weak performers in a meaningful way. However, for mediocre performers, especially those on the borderline, such variations can result in significantly different outcomes. For instance, a borderline manuscript may get rejected by reviewers with high standards or subject expertise, whereas another borderline manuscript may get accepted by reviewers who are more lenient or less familiar with the subject area of the manuscript. Also, researchers who publish more will get rated higher than those who publish less regardless of the quality of their research, as long as they publish in indexed journals.

In other words, indexed journal publication count, despite its intention to capture both the productivity and quality of research, falls short of its goal by ignoring quality differences between journals and manuscripts. Nevertheless, it forms the basis for the majority of research performance evaluation practices across the globe. The peer review approach commonly used for promotion and tenure introduces qualitative assessments of publications to research performance evaluation, but it is a resource-intensive and subjective process that can be greatly influenced by peer reviewers' perspective, expertise, and inclinations. To circumvent subjective influences and streamline the research performance evaluation process, Korean universities employ prescriptive research assessment standards that aim to quantify research productivity and quality in a similar fashion to the indexed journal publication count.

Since each Korean university uses its own research assessment standard, we were curious to find out how the standards differ across universities and how they affect research evaluation outcomes. Furthermore, we were interested in discovering how well those standards measure research quality and what can be done to enhance the accuracy and sensitivity of such measures. To discover differences in university research assessment standards, we first examined and compared 25 standards from major Korean universities, after which we applied those 25 evaluation standards to the publication data of 195 tenure-track professors of library and information science (LIS) departments in 35 Korean universities to see how the standards affect the research evaluation outcome. To ascertain how well the standards measure research quality, we compared research performance scores computed according to university standards with scores generated by applying various bibliometric measures that estimate research impact, such as citation count, h-index, and impact factor. Based on these analyses, we assessed how robust the research evaluation standards of Korean universities are and explored how they may be modified to enhance the accuracy and sensitivity of research performance evaluation.

The rest of the paper is organized as follows: A review of prior research is presented next, followed by the description of methodology and discussion of results. The paper concludes with a summary of findings and suggestions for optimizing the research performance assessment approach.

2. RELATED RESEARCH

There have been numerous studies that investigated bibliometric measures for assessing research outcome, especially in the field of LIS. Budd and Seavey (1996), who evaluated LIS research in the United States, found no significant difference in publication counts across academic faculty ranks, directly contradicting the finding of an earlier study (Hayes, 1983). Adkins and Budd (2006) analyzed productivity rankings of authors and institutions based on publication data from Social Science Citation Index (SSCI) and found a statistically significant difference in publication and citation counts by faculty rank. Cronin and Meho (2006),


Table 1. Selected universities with faculty research performance assessment standards

| Type | Universities |
|---|---|
| Public (6) | Chonbuk National University (CNU); Chonnam National University (CNU3); Chungnam National University (CNU2); Incheon National University (INU); Kyungpook National University (KNU); Pusan National University (PNU) |
| Private (19) | Catholic University of Daegu (CUD); Cheongju University (CJU); Daegu University (DU); Dong-eui University (DEU); Dongduk Women's University (DWU); Ewha Womans University (EWU); Hannam University (HNU); Hansung University (HSU); Kangnam University (KNU2); Keimyung University (KMU); Konkuk University (KU); Kyonggi University (KGU); Kyungil University (KIU); Kyungsung University (KSU); Myongji University (MJU); Sangmyung University (SMU); Seoul Women's University (SWU); Sookmyung Women's University (SWU2); Sungkyunkwan University (SKKU) |

who found a positive relationship between citation count and h-index rankings of 31 influential information science faculty members, suggested that h-index could complement the citation count in research impact assessment.

Yang and Lee (2012), based on bibliometric analysis of 2,401 LIS faculty publications in Korea, found an increasing trend of collaboration, publications, and internationalization in the LIS field in Korea. In a follow-up study, they (Yang & Lee, 2013) analyzed the LIS faculty rankings produced by various bibliometric measures such as publication count, citation count, h-index, and g-index and found that while publication counts correlate with citation counts for productive authors, no correlation was found between publication and citation counts of authors with a small number of publications.

In another study, J. Lee and Yang (2015) investigated co-authorship credit allocation models via comparative analysis of rankings produced by applying co-authorship formulas to 1,436 Web of Science (WoS) papers published by 35 chemistry faculty members at Seoul National University. Noting differences in rankings across models, the authors suggested that authorship patterns in conjunction with citation counts are important factors for robust authorship models.

In a related study, H. Lee and Yang (2015) investigated the co-authorship allocation standards of Korean universities by comparative analysis of author rankings based on university standards and bibliometric co-authorship measures applied to Korean LIS faculty publications. The study found the harmonic method to be most similar to the university standards and concluded that even the most generous university standards of co-authorship allocation still penalized collaborative research by reducing each co-author's credits below those of single authors. In a follow-up study (H. Lee & Yang, 2017), they investigated publication venue score guidelines specified in university standards in conjunction with co-authorship allocation in a similar manner. Finding university standards to differ from bibliometric measures in some instances, the authors suggested incorporation of more granular impact assessment measures such as citation count and impact factor.

3. METHODOLOGY

3.1. Data Collection

Among the 35 universities with LIS departments in Korea, 25 universities with publicly accessible faculty research performance assessment standards were selected as the study sample. About a quarter of the sample were public and three quarters were private universities. Table 1 lists the selected universities. For the study, we extracted formulas for scoring publication venues (e.g., SSCI journal) and co-authorship contributions (e.g., 1/n for n authors) from the sample standards.

The study also extended prior research (Yang & Lee, 2012) to compile the publication data of 195 tenure-track professors in 35 LIS departments in Korea from 2001 to 2017, which consisted of 4,576 publications and 31,220 citations as of August 2018. The publications included 3,996 domestic journal papers with 20,799 citations, 402 international journal papers with 8,198 citations, and 178 international conference proceedings with 2,223 citations. Proceedings papers from international conferences were included in the data collection since they are considered to be an important publication venue for dynamic research


Table 2. Publication venue and impact score distributions

| Venue type | IF | CpD | CS | h5 | N/A | Total |
|---|---|---|---|---|---|---|
| Domestic journals | 159 | | | | 34 | 193 |
| International journals | 67 | 28 | 1 | 5 | 24 | 125 |
| International proceedings | | 28 | 0 | 6 | 59 | 93 |

IF, impact factor; CpD, citations per document; CS, CiteScore; h5, h5-index; N/A, missing.

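Table 3. An example of journal category score normalization

| Journal category | STD1 (before) | STD2 (before) | STD3 (before) | STD4 (before) | STD1 (after) | STD2 (after) | STD3 (after) | STD4 (after) |
|---|---|---|---|---|---|---|---|---|
| KCI | 10 | 100 | 100 | 400 | 100 | 100 | 100 | 100 |
| Scopus | 20 | 150 | 200 | 500 | 200 | 150 | 200 | 125 |
| SCI | 30 | 200 | 300 | 1,200 | 300 | 200 | 300 | 300 |

STD, university standard for journal category score; KCI, Korea Citation Index; SCI, Science Citation Index. "Before" and "after" refer to normalization.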

fields such as computer science and information science (Drott, 1995; Lisée, Larivière, & Archambault, 2008).

The collection of publication data proceeded as follows:

1. A faculty list was compiled from National LIS Department Faculty Address Books[1] and departmental websites.
2. Each name in the faculty list was used as a query to search National Research Foundation's Korean Researcher Information system to generate an initial list of publications.
3. The publication list was supplemented by searching National Research Foundation's Korea Citation Index (KCI) and NAVER Academic[2] to collect additional publications as well as bibliographic information and citation count for each publication.
4. The citation counts of international publications were updated by searching Google Scholar.

To estimate the quality or impact of publication venues bibliometrically, the study collected impact factor (IF) from KCI for domestic journals, and IF from WoS, Cites per Doc (CpD) from SCImago,[3] CiteScore from Scopus,[4] and h5-index from Google Scholar Metrics[5] for international journals and proceedings, in that order of preference. There were 193 domestic journals, 125 international journals, and 93 international proceedings in the study data, of which 117 did not have impact scores of any kind. The distribution of publication venues and impact scores used in the study is shown in Table 2.

[1] The National LIS Department Faculty Address Book is published annually by the Korea LIS Faculty Association.
[2] http://academic.naver.com/
[3] https://www.scimagojr.com/
[4] https://www.scopus.com/sources.uri
[5] https://scholar.google.com/citations?view_op=top_venues

3.2. Data Normalization

Although the research assessment standards include metrics for monographs and patents, the study focused on peer-reviewed journal papers, which are the most common form of scholarly communication. Since the university assessment standards for journal papers had similar journal categories (e.g., Science Citation Index [SCI]/SSCI, KCI, etc.) but different scoring scales, the standards were normalized by equalizing the KCI scores to make them comparable. Equation 1 and Table 3 illustrate the journal category score normalization process for the university standards.

$$SC(J_{norm}) = SC(J) \times \frac{100}{SC(J = KCI)} \qquad (1)$$

SC(J): original score for journal category J
SC(Jnorm): normalized score for journal category J
SC(J=KCI): original score for journal category KCI

Journal and proceedings impact scores also had to be normalized since IF, CpD, CiteScore, and h5-index are not directly comparable measures despite their intent to estimate the impact of publication venues. CpD, CiteScore, and IF should be similar to one another since they essentially represent the average citation count per document. Indeed, Table 4 shows the sample values for IF, CpD (i.e., 2-year CpD), and CiteScore to be similar. CiteScore, being averaged over three instead of two years like CpD2 and IF, is generally larger than the other two, but the three measures approximate one another (Fig. 1).
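The KCI-based rescaling of Equation 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `normalize_category_scores` and the dictionary layout are hypothetical, and the input values are the STD4 column of Table 3.

```python
def normalize_category_scores(standard):
    """Rescale one standard's journal category scores so that KCI = 100 (Equation 1)."""
    kci_score = standard["KCI"]  # SC(J=KCI): original score for the KCI category
    return {category: score * 100 / kci_score
            for category, score in standard.items()}


# STD4 from Table 3, on its original scale (hypothetical dict layout).
std4 = {"KCI": 400, "Scopus": 500, "SCI": 1200}
normalized = normalize_category_scores(std4)
print(normalized)
```

After this rescaling every standard shares the same KCI anchor, so a category score can be read as a percentage of the KCI score regardless of the scale a university originally used.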


Table 4. Journal impact scores comparison[6]

| Journal | IF | CpD2 | CS | h5 |
|---|---|---|---|---|
| Journal of Informetrics | 2.92 | 3.26 | 2.99 | 36 |
| Information Processing & Management | 2.39 | 2.75 | 2.83 | 35 |
| Journal of Association for Information Science & Technology | 2.23 | 2.75 | 2.74 | 53 |
| Scientometrics | 2.15 | 2.33 | 2.30 | 49 |
| Knowledge and Information Systems | 2.00 | 2.18 | 2.36 | 39 |
| Data & Knowledge Engineering | 1.69 | 1.99 | 2.24 | 26 |
| College & Research Libraries | 1.52 | 1.94 | 2.15 | 25 |
| Journal of Information Science | 1.37 | 1.31 | 2.74 | 22 |
| Journal of Academic Librarianship | 1.29 | 2.18 | 1.99 | 25 |
| Information Technology and Libraries | 1.03 | 1.43 | 1.33 | 16 |
| Information Research | 0.84 | 0.63 | 0.79 | 18 |
| Library Hi Tech | 0.76 | 1.14 | 1.39 | 20 |
| Malaysian Journal of Library & Information Science | 0.65 | 0.70 | 0.71 | 9 |
| The Library Quarterly | 0.56 | 0.98 | 1.18 | 16 |
| Electronic Library | 0.48 | 0.98 | 1.11 | 20 |
| Serials Review | 0.38 | 0.33 | 0.53 | 12 |
| Journal of Information Technology Research | 0.23 | 0.54 | 0.54 | 6 |

IF, impact factor; CpD2, 2-year citations per document; CS, CiteScore; h5, h5-index.

Fig. 1. Impact factor (IF), 2-year citations per document (CpD2), and CiteScore comparison. [Two panels plotting IF, CpD2, and CiteScore on a vertical scale of 0.00 to 3.50 for the sample journals J1 to J17.]

To compute the normalized impact score of publication venues, we mapped the impact scores to the sample set scale of 1 to 10 and added the average difference of normalized scores and the normalized IF (e.g., IF-h5) of a sample set. The steps outlined below describe the normalization process in detail:

1. For the sample set,
   a. Find the minimum and maximum values[7] of h5-index, CiteScore, CpD, and IF.
   b. Map h5-index, CiteScore, CpD, and IF to the scale of 1 to 10 using Equation 2[8] to compute h5n, CSn, CDn, and IFn.
   c. Compute the average differences of IFn and h5n (mIF-h5), IFn and CSn (mIF-CS), and IFn and CDn (mIF-CD).
2. For the study data,
   a. Select the first non-missing value from IF, CpD, CiteScore, and h5-index.
   b. If the score (e.g., h5-index) is greater than or equal to the sample set minimum, map it to the sample set scale of 1 to 10 using Equation 2 and add the average normalized score difference (e.g., mIF-h5) to compute the normalized impact score (IFnSC).
   c. If the score is less than the sample set minimum, divide it by the sample set minimum to compute IFnSC.
   d. If all scores are missing, set IFnSC to be 0.01.

The order of preference in Step 2a corresponds to the order of similarity to IF, while the least similar measure of h5-index drives the determination of the sample set to ensure the optimal approximation of the overall normalized impact score. Table 5 further illustrates the normalization process by example.

$$ISC_{norm} = \min(SC_{norm}) + \frac{(SC - \min(SC)) \times (\max(SC_{norm}) - \min(SC_{norm}))}{\max(SC) - \min(SC)} \qquad (2)$$

[6] Journal impact scores were collected in June 2018.
[7] min(SC) and max(SC) in Equation 2.
[8] Note that min(SCnorm) will be 1 and max(SCnorm) will be 10 when mapping to the scale of 1 to 10.

Table 5. Impact score normalization example

Sample set (A1-A15):

| | IF | CpD | CS | h5 | IFn | CDn | CSn | h5n | IF-CD | IF-CS | IF-h5 | IFnSC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A1 | 2.23 | 2.75 | 2.74 | 53 | 10.00 | 10.00 | 10.00 | 10.00 | 0.00 | 0.00 | 0.00 | 10.00 |
| A2 | 2.15 | 2.33 | 2.30 | 49 | 9.63 | 8.29 | 8.20 | 9.23 | 1.34 | 1.43 | 0.39 | 9.63 |
| A3 | 2.00 | 2.18 | 2.36 | 39 | 8.98 | 7.66 | 8.45 | 7.32 | 1.32 | 0.54 | 1.66 | 8.98 |
| A4 | 1.69 | 1.99 | 2.24 | 26 | 7.59 | 6.90 | 7.95 | 4.83 | 0.69 | -0.37 | 2.76 | 7.59 |
| A5 | 1.52 | 1.94 | 2.15 | 25 | 6.78 | 6.71 | 7.59 | 4.64 | 0.07 | -0.80 | 2.14 | 6.78 |
| A6 | 1.37 | 1.31 | 2.74 | 22 | 6.14 | 4.14 | 10.00 | 4.06 | 2.00 | -3.86 | 2.08 | 6.14 |
| A7 | 1.29 | 2.18 | 1.99 | 25 | 5.76 | 7.69 | 6.93 | 4.64 | -1.93 | -1.18 | 1.12 | 5.76 |
| A8 | 1.03 | 1.43 | 1.33 | 16 | 4.60 | 4.64 | 4.23 | 2.91 | -0.05 | 0.36 | 1.68 | 4.60 |
| A9 | 0.84 | 0.63 | 0.79 | 18 | 3.75 | 1.38 | 2.02 | 3.30 | 2.37 | 1.73 | 0.45 | 3.75 |
| A10 | 0.76 | 1.14 | 1.39 | 20 | 3.38 | 3.45 | 4.48 | 3.68 | -0.06 | -1.10 | -0.30 | 3.38 |
| A11 | 0.65 | 0.70 | 0.71 | 9 | 2.89 | 1.66 | 1.7 | 1.57 | 1.23 | 1.19 | 1.32 | 2.89 |
| A12 | 0.56 | 0.98 | 1.18 | 16 | 2.48 | 2.80 | 3.62 | 2.91 | -0.33 | -1.14 | -0.44 | 2.48 |
| A13 | 0.48 | 0.98 | 1.11 | 20 | 2.14 | 2.79 | 3.33 | 3.68 | -0.64 | -1.19 | -1.54 | 2.14 |
| A14 | 0.38 | 0.33 | 0.53 | 12 | 1.67 | 0.17 | 0.96 | 2.15 | 1.50 | 0.71 | -0.48 | 1.67 |
| A15 | 0.23 | 0.54 | 0.54 | 6 | 1.00 | 1.00 | 1.00 | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 |

Study data (B1-B6):

| | Raw score(s) | Normalized score | Mean difference | IFnSC |
|---|---|---|---|---|
| B1 | IF 2.92, CpD 3.26, CS 2.99, h5 36 | IFn 13.11 | | 13.11 |
| B2 | CpD 2.75, h5 50 | CDn 10.01 | mIF-CD 0.58 | 10.59 |
| B3 | CS 2.15 | CSn 7.59 | mIF-CS -0.28 | 7.31 |
| B4 | h5 33 | h5n 6.17 | mIF-h5 0.83 | 7.00 |
| B5 | 0.10 | 0.19 | | 0.19 |
| B6 | (all missing) | | | 0.01 |

IF, impact factor; CpD, citations per document; CS, CiteScore; h5, h5-index.

3.3. Data Analysis

To discover differences in research evaluation practices in Korean universities, we first normalized and compared 25 research assessment standards from major Korean universities. We then applied the evaluation standards to the LIS publication data to generate 25 sets of publication scores for each of 195 authors and 35 universities to produce author and university rankings according to each standard. The rankings produced by 25 standards were compared to ascertain how the differences in standards affect the research evaluation outcome.

In order to assess how well the standards measure research performance, we then compared the rankings by university assessment standards with the rankings generated from applying bibliometric measures, such as publication count, citation count, h-index, and impact factor, to publication data. Specifically, the author ranking by publication count is based on the number of articles an author has published, citation count ranking is based on the number of citations an author received, h-index ranking is based on the h-index of each author, and impact factor ranking is based on the sum of normalized impact scores of the author's publications. The university rankings are generated in a similar fashion by aggregating publication counts, citation counts, and impact scores as well as by computing the h-index for each university.
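The normalized impact score (IFnSC) that feeds the impact factor ranking is computed with Equation 2 and Steps 2a to 2d of Section 3.2. The following Python sketch shows one way to express that procedure. It rests on several assumptions that are not stated in the paper: the function names and dictionary layout are hypothetical, the sample set minimum and maximum for each measure are read from rows A15 and A1 of Table 5, and the mean differences are taken from the study data rows of Table 5 rather than recomputed.

```python
def rescale(score, lo, hi, new_lo=1.0, new_hi=10.0):
    """Equation 2: map a score from [lo, hi] linearly onto [new_lo, new_hi]."""
    return new_lo + (score - lo) * (new_hi - new_lo) / (hi - lo)


def normalized_impact_score(scores, sample_range, mean_diff):
    """Steps 2a-2d: scores maps a measure name to a value or None."""
    for measure in ("IF", "CpD", "CS", "h5"):       # Step 2a: order of preference
        value = scores.get(measure)
        if value is None:
            continue
        lo, hi = sample_range[measure]
        if value >= lo:                             # Step 2b: rescale, add mean difference
            return rescale(value, lo, hi) + mean_diff[measure]
        return value / lo                           # Step 2c: below the sample minimum
    return 0.01                                     # Step 2d: all scores missing


# Assumed sample set ranges (rows A15 and A1 of Table 5).
SAMPLE_RANGE = {"IF": (0.23, 2.23), "CpD": (0.54, 2.75),
                "CS": (0.54, 2.74), "h5": (6, 53)}
# Mean differences from normalized IF as printed in Table 5 (IF needs none).
MEAN_DIFF = {"IF": 0.0, "CpD": 0.58, "CS": -0.28, "h5": 0.83}

venue_b4 = {"h5": 33}                               # row B4 of Table 5
print(round(normalized_impact_score(venue_b4, SAMPLE_RANGE, MEAN_DIFF), 2))
```

Passing the ranges and mean differences in as arguments keeps the sketch independent of how the sample set is chosen, which the paper ties to the availability of h5-index values.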


The study computed Spearman's rank correlation to discover statistically significant differences between rankings, which reflect the differences in evaluation methods that generated those rankings. For rank correlation analysis, we examined rank cluster correlations since the overall rank correlation can mask the local differences (H. Lee & Yang, 2015). Four types of rank clusters were identified by ranking the scores by publication count (PCrank), citation count (CCrank), h-index (Hrank), and impact scores (ISrank) and partitioning the resulting rankings into groups (e.g., rank 1-40, rank 41-80, etc.). PCrank for authors, for instance, clusters authors with similar productivity level, while CCrank groups those with similar impact level and so on. The idea of rank partition types is to identify subgroups below author or institution level where local trends occur.

It should be noted that the study used the harmonic formula (Equation 3) in conjunction with bibliometric measures (e.g., citation count), which was shown to be closely correlated to the co-authorship credit allocation formula employed by universities (H. Lee & Yang, 2015, 2017), to fractionize the publication score according to co-author contributions. At the university level, the publication scores of co-authors affiliated with the university are summed up to arrive at each publication score. Table 6 formalizes the score computations described above.

$$hwt(au) = \frac{100 \times \frac{1}{r}}{1 + \frac{1}{2} + \cdots + \frac{1}{N}} \qquad (3)$$

r = rank of an author
N = number of authors

Table 6. Bibliometric score computation

| Score | Publication level | Author level | Institution level |
|---|---|---|---|
| University Standard Score | SC(p_au) = SCnorm(AS) × wt(au) | SC(au) = Σ_{p∈au} SC(p_au) | SC(in) = Σ_{au∈in} SC(au) |
| Publication Count Score | SC(p_au) = hwt(au) | SC(au) = Σ_{p∈au} SC(p_au) | SC(in) = Σ_{au∈in} SC(au) |
| Citation Count Score | SC(p_au) = cc(p_au) × hwt(au) | SC(au) = Σ_{p∈au} SC(p_au) | SC(in) = Σ_{au∈in} SC(au) |
| h-index Score | | SC(au) = hidx(au) | SC(in) = Σ_{au∈in} SC(au) |
| Impact Score | SC(p_au) = IFnSC(p_au) × hwt(au) | SC(au) = Σ_{p∈au} SC(p_au) | SC(in) = Σ_{au∈in} SC(au) |

au, author; in, institution (i.e., university); p_au, article by au; SCnorm(AS), normalized university assessment standard; wt(au), institution co-authorship weight; hwt(au), harmonic co-authorship weight; cc(p_au), citation count of p_au; hidx(au), h-index of au; IFnSC(p_au), normalized impact score of p_au's venue.

4. STUDY RESULTS

4.1. Analysis of University Standards

The university standards, which specify scoring guidelines for publication venues as well as co-authorship credit, follow a similar pattern of publication venue categorization and co-author contribution computation. Publication venues are classified into 10 journal categories along with domestic and international proceedings. Journal categories, in the order of importance, are Science, Nature, Cell (CNS), SSCI, Arts and Humanities Citation Index (A&HCI), SCI, Science Citation Index Expanded (SCIE), Scopus, KCI, Korea Citation Expanded (KCIE), non-indexed international journal, and non-indexed domestic journal.

Over two thirds of the universities (17 out of 25) group SSCI, A&HCI, and SCI into a single category and only one university differentiates between SSCI and A&HCI, thereby creating 7 major categories of CNS, WoS (SCI/SSCI/A&HCI), SCIE, KCI, KCIE, non-indexed international journal, and non-indexed domestic journal. In general, CNS journals count 1.5 times as much as WoS on average, WoS about 2.5 times KCI, and expanded category journals (i.e., SCIE, KCIE) are assigned about 80% of the non-expanded category scores while non-indexed journals and proceedings are given smaller fractions of the KCI score. Table 7, which lists normalized journal category score statistics, shows the widest range for CNS and high variabilities[9] for non-indexed journals and proceedings.

[9] Relative mean absolute deviation, computed by dividing mean absolute deviation by arithmetic mean, can be regarded as the variability normalized across different mean values.

Comparison of the current standards with an older set of standards (Table 8) reveals a definite trend of increasing importance for internationally indexed journals over domestic journals. When we compare the standards by university type, it appears that private universities and universities in the Seoul area give more weight to WoS journals on average though the scores vary more widely than for their counterparts (Table 9, Figs. 2 and 3).

The co-authorship credit allocation component of the university standards, which has been investigated extensively by prior research (J. Lee & Yang, 2015; H. Lee & Yang, 2015, 2017), was excluded from the study to simplify analysis and focus on university standards in practice. Instead of conducting a complex investigation of publication venue and co-authorship allocation compound effect, the study used the actual combination of venue scores and co-authorship credit formulas for each university standard while coupling bibliometric measures with the harmonic formula that has been shown to be most similar to university standard co-authorship credit formulas (J. Lee & Yang, 2015; H. Lee & Yang, 2015, 2017).
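The coupling of bibliometric measures with the harmonic formula (Equation 3 and Table 6) can be sketched as follows. This is a minimal illustration under assumptions: `rank` is read as the author's position in the byline, the factor of 100 is assumed to scale a sole author's weight to 100, and the paper records with keys `rank`, `n_authors`, and `citations` are hypothetical.

```python
def harmonic_weight(rank, n_authors, scale=100.0):
    """Harmonic co-authorship weight hwt(au) for the author at byline position
    `rank` on a paper with `n_authors` authors (Equation 3).
    scale=100 is an assumption: a sole author then receives a weight of 100."""
    harmonic_sum = sum(1.0 / k for k in range(1, n_authors + 1))
    return scale * (1.0 / rank) / harmonic_sum


def author_scores(papers):
    """Author-level publication count and citation count scores (Table 6)."""
    pub_score = 0.0
    cite_score = 0.0
    for paper in papers:
        weight = harmonic_weight(paper["rank"], paper["n_authors"])
        pub_score += weight                        # SC(p_au) = hwt(au)
        cite_score += paper["citations"] * weight  # SC(p_au) = cc(p_au) x hwt(au)
    return pub_score, cite_score


# Hypothetical publication records for one author.
papers = [
    {"rank": 1, "n_authors": 1, "citations": 4},   # sole author
    {"rank": 2, "n_authors": 2, "citations": 9},   # second of two authors
]
pub_score, cite_score = author_scores(papers)
print(round(pub_score, 2), round(cite_score, 2))
```

Because the weights of all co-authors on a paper sum to the scale value, a co-authored paper contributes the same total credit as a single-authored one, only split unevenly by byline position.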

Table 7. Normalized publication venue category score statistics

| Category | Average score | Minimum score | Maximum score | Median | SD | RMAD |
|---|---|---|---|---|---|---|
| CNS | 368.33 | 100 | 1000 | 300 | 222.13 | 0.46 |
| SSCI | 281.67 | 100 | 350 | 250 | 104.36 | 0.28 |
| A&HCI | 279.67 | 100 | 350 | 250 | 105.47 | 0.29 |
| SCI | 253.67 | 100 | 333 | 200 | 106.07 | 0.31 |
| SCIE | 207.33 | 100 | 333 | 200 | 83.53 | 0.29 |
| Scopus | 149.33 | 67 | 200 | 133 | 62.99 | 0.27 |
| KCI | 100 | - | - | - | - | - |
| KCIE | 84.8 | 50 | 100 | 90 | 17.21 | 0.18 |
| nIJ | 72.61 | 0 | 200 | 67 | 73.00 | 0.43 |
| nDJ | 34.58 | 0 | 100 | 38 | 40.78 | 0.66 |
| IP | 31.52 | 0 | 70 | 30 | 28.00 | 0.42 |
| DP | 15.65 | 0 | 33 | 17 | 14.44 | 0.49 |

SD, standard deviation; RMAD, relative mean absolute deviation; CNS, Science, Nature, Cell; SSCI, Social Science Citation Index; A&HCI, Arts and Humanities Citation Index; SCI, Science Citation Index; SCIE, Science Citation Index Expanded; KCI, Korea Citation Index; KCIE, Korea Citation Expanded; nIJ, non-indexed international journal; nDJ, non-indexed domestic journal; IP, international proceeding; DP, domestic proceeding.
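The RMAD column of Table 7 follows the definition given in the paper's footnote on variability: mean absolute deviation divided by the arithmetic mean. A minimal Python sketch of that definition is shown below; the function name `rmad` and the input scores are hypothetical and are not taken from the study data.

```python
from statistics import mean


def rmad(values):
    """Relative mean absolute deviation: mean absolute deviation / arithmetic mean."""
    center = mean(values)
    mean_abs_dev = mean([abs(v - center) for v in values])
    return mean_abs_dev / center


# Hypothetical normalized scores that three standards assign to one category.
category_scores = [100, 200, 300]
print(round(rmad(category_scores), 2))
```

Dividing by the mean makes the spread of a low-scoring category, such as domestic proceedings, comparable with that of a high-scoring one, such as CNS.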

Table 8. Comparison of 2014 and 2018 standards

| Category | Average score (2014) | Average score (2018) | Increase (∆) | Increase (%) |
|---|---|---|---|---|
| CNS | 340.43 | 368.33 | 27.90 | 8.20 |
| SSCI | 249.26 | 281.67 | 32.41 | 13.00 |
| A&HCI | 247.41 | 279.67 | 32.26 | 13.04 |
| SCI | 231.98 | 253.67 | 21.69 | 9.35 |
| SCIE | 196.48 | 207.33 | 11.05 | 5.63 |
| Scopus | 142.10 | 149.33 | 7.23 | 5.09 |
| KCI | 100 | 100 | 0.00 | 0.00 |
| KCIE | 83.02 | 84.8 | 1.78 | 2.14 |

CNS, Science, Nature, Cell; SSCI, Social Science Citation Index; A&HCI, Arts and Humanities Citation Index; SCI, Science Citation Index; SCIE, Science Citation Index Expanded; KCI, Korea Citation Index; KCIE, Korea Citation Expanded.

Table 9. Comparison of standards by university type and region

| Category | National (6) Average | National (6) SD | Private (19) Average | Private (19) SD | Seoul Area (11) Average | Seoul Area (11) SD | Provincial (14) Average | Provincial (14) SD |
|---|---|---|---|---|---|---|---|---|
| CNS | 423.61 | 339.38 | 350.88 | 180.15 | 351.52 | 186.70 | 381.55 | 252.68 |
| SSCI | 229.17 | 67.85 | 298.25 | 109.70 | 296.97 | 121.52 | 269.64 | 91.61 |
| A&HCI | 229.17 | 67.85 | 295.61 | 111.51 | 296.97 | 121.52 | 266.07 | 93.40 |
| SCI | 209.72 | 76.81 | 267.54 | 111.87 | 274.24 | 125.69 | 237.50 | 89.32 |
| SCIE | 184.72 | 65.07 | 214.74 | 88.88 | 223.94 | 94.14 | 194.64 | 75.22 |
| Scopus | 146.67 | 46.76 | 150.18 | 68.41 | 150.45 | 86.82 | 148.45 | 39.05 |
| KCI | 100 | - | 100 | - | 100 | - | 100 | - |
| KCIE | 90.00 | 15.49 | 83.16 | 17.79 | 82.88 | 17.48 | 86.31 | 17.49 |
| nIJ | 73.00 | 70.13 | 72.49 | 31.95 | 76.67 | 31.10 | 69.43 | 50.36 |
| nDJ | 40.78 | 33.36 | 32.62 | 26.44 | 25.08 | 22.63 | 42.05 | 29.82 |
| IP | 28.00 | 17.66 | 32.63 | 17.32 | 23.48 | 14.05 | 37.83 | 17.13 |
| DP | 14.44 | 10.89 | 16.03 | 9.39 | 11.77 | 8.33 | 18.69 | 9.61 |

SD, standard deviation; CNS, Science, Nature, Cell; SSCI, Social Science Citation Index; A&HCI, Arts and Humanities Citation Index; SCI, Science Citation Index; SCIE, Science Citation Index Expanded; KCI, Korea Citation Index; KCIE, Korea Citation Expanded; nIJ, non-indexed international journal; nDJ, non-indexed domestic journal; IP, international proceeding; DP, domestic proceeding.


Fig. 2. Comparison of standards by university type. [Plot of average normalized scores (vertical scale 50.00 to 450.00) for the categories CNS, SSCI, A&HCI, SCI, SCIE, Scopus, KCI, and KCIE, with series for All, National (6), and Private (19).] CNS, Science, Nature, Cell; SSCI, Social Science Citation Index; A&HCI, Arts and Humanities Citation Index; SCI, Science Citation Index; SCIE, Science Citation Index Expanded; KCI, Korea Citation Index; KCIE, Korea Citation Expanded.

Fig. 3. Comparison of standards by university region. [Plot of average normalized scores (vertical scale 50.00 to 400.00) for the categories CNS, SSCI, A&HCI, SCI, SCIE, Scopus, KCI, and KCIE.] CNS, Science, Nature, Cell; SSCI, Social Science Citation Index; A&HCI, Arts and Humanities Citation Index; SCI, Science Citation Index; SCIE, Science Citation Index Expanded; KCI, Korea Citation Index; KCIE, Korea Citation Expanded.

4.2. Analysis of Rankings by University Standards

To ascertain the statistical significance of differences among university standards, the study applied 25 university standards to the publication data of 195 LIS tenure-track faculty members in 35 universities in Korea to generate 25 sets of rankings for each author and university. As was done in prior studies (H. Lee & Yang, 2015, 2017), rankings were partitioned to isolate and identify "local" trends (e.g., mid-level productivity group) that can be masked in overall statistics. A common example of such is the "averaging effect" that overwhelms differences in a subgroup when averaging over total population. Aggregating publication data over institutions also tends to mask local trends since differences among authors can even out at the institution level (H. Lee & Yang, 2015, 2017).

Spearman's rank-order correlation results for author ranking clusters show evidence for the different research performance assessment standard employed by Chonbuk National University (CNU). This is likely due to the fact that CNU is the only university that awards SCIE journals 300 points (vs. a mean of 207) and one of three universities that give 200 points for Scopus journals (vs. a mean of 149). In addition, CNU's co-authorship credit allocation formula gives more weight to co-authored papers with the denominator of (n+1) instead of the more typical (n+2), where n is the number of authors. Table 10, which denotes the rank clusters where significant correlation values (i.e., rho) of standard pairs are below 0.5, and Fig. 4, which plots author rankings of PCrank clusters, clearly show CNU's standard to be more different than other university standards.

In Fig. 4, each line represents an author's publication scores computed by 25 university standards. The more similar the standards, the flatter the slope of the line will appear. Implications of different rank clusters are not totally clear at this point and require further study. Generally speaking, PCrank clusters of author rankings, which were arrived at by sorting author rankings by their publication count and partitioning them in groups of 40, identify author clusters of similar productivity, whereas CCrank clusters, sorted by citation count, identify author clusters of similar impact. More rho entries in the PCrank column than in the CCrank column in Table 10, for instance, imply that university standards will rank authors with a similar number of published articles more differently than those with a similar number of citations. More rho entries in the middle rank rows (e.g., rank 81-120) suggest that mid-level authors are more sensitive to differences in evaluation standards than top- or bottom-level authors are, which is consistent with findings from prior studies (H. Lee & Yang, 2017).

Institution rankings, in keeping with prior research findings, did not show significant differences and will not be included in the analysis other than to say that aggregation at the institution level can nullify any individual differences that may have existed at the author level. One may also posit that LIS departments in Korea have comparable levels of research performance. Fig. 5, which plots the rankings of LIS departments by university standards, is included for comparison purposes.
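The rank cluster analysis described in this section (sort authors by a partition measure such as publication count, split them into groups of 40, and compute Spearman's rho between two standards within each group) can be sketched in plain Python. This is an illustrative sketch, not the study's code: the function names are hypothetical, the toy data use six authors and clusters of three, and significance testing (the p-values reported in Table 10) is not included.

```python
def average_ranks(values):
    """Rank values in descending order (1 = highest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors.
    Raises ZeroDivisionError if either vector is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def cluster_correlations(partition_key, scores_a, scores_b, size=40):
    """Sort authors by partition_key, split into clusters, and compute rho per cluster."""
    order = sorted(range(len(partition_key)), key=lambda i: -partition_key[i])
    results = []
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        rho = spearman([scores_a[i] for i in idx], [scores_b[i] for i in idx])
        results.append((start + 1, start + len(idx), rho))
    return results


# Toy data: two hypothetical standards agree on the top authors, disagree below.
pub_counts = [10, 9, 8, 3, 2, 1]
standard_a = [100, 90, 80, 30, 20, 10]
standard_b = [100, 90, 80, 10, 20, 30]
overall_rho = spearman(standard_a, standard_b)
clusters = cluster_correlations(pub_counts, standard_a, standard_b, size=3)
print(round(overall_rho, 3), [(lo, hi, round(rho, 3)) for lo, hi, rho in clusters])
```

In the toy data the overall correlation stays fairly high even though the two standards order the lower cluster in exactly opposite ways, which is the kind of local difference that partitioning is meant to expose.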

Table 10. Spearman’s rank-order correlation of university standards10 PCrank ISrank CCrank Hrank .323* : EWU-CNU .438** : DEU-CNU .494** : DEU-CNU .369* : KMU-CNU .438** : HNU-CNU .494** : KSU-CNU .389* : SMU-CNU .438** : KSU-CNU .494** : HNU-CNU rank 1-40 .390* : SWU-CNU .493** : EWU-CNU .390* : KIU-CNU .392* : KNU-CNU .474** : DU-CNU .334* : HSU-CNU .385* : DU-CNU .338* : SKKU-CNU .398* : KGU-CNU .338* : INU-CNU .398* : CJU-CNU rank 41-80 .348* : SWU-CNU .398* : KNU2-CNU .370* : PNU-CNU .417** : CNU2-CNU .432** : CUD-CNU .315* : KU-CNU .316* : KU-CNU .329* : CNU2-CNU .316* : CNU2-CNU .393* : CUD-CNU .386* : CUD-CNU .425** : KNU2-CNU .401* : KGU-CNU .425** : KGU-CNU .401* : CJU-CNU rank 81-120 .425** : CJU-CNU .401* : KNU2-CNU .472** : DWU-CNU .447** : DW2-CNU .479** : HSU-CNU .456** : SKKU-CNU .491** : PNU-CNU .456** : INU-CNU .462** : PNU-CNU .495** : SWU-CNU .375* : ICN-CNU .314* : CNU2-CNU .375* : SKKU-CNU .315* : CUD-CNU .381* : MJU-CNU .350* : KU-CNU .421** : SWU-CNU .404** : CJU-CNU rank 121-160 .404** : KNU2-CNU .404** : KGU-CNU .428** : HSU-CNU .439** : DWU-CNU .455** : PNU-CNU

rank 161-195

*p<0.05, **p<0.01.

10 See Table 1 for university acronyms.


Fig. 4. Author rankings by university standards: PCrank clusters.


Fig. 5. Institution rankings.

4.3. Comparison of University Standards with Bibliometric Measures

In order to test the reliability and stability of university standards for research assessment, we compared the rankings by university standards with rankings by publication count, citation count, impact score, and h-index.11 When we examine evaluation measure pairs with rho below 0.5 (p<0.05) for author ranking clusters in Table 11, we can see that university standards tend to differ from citation-driven evaluation measures such as citation count, h-index, and impact score. Citation count (ccnt), being the most granular measure of impact, is the most prevalent entry in Table 11, and h-index (hidx), being the least granular measure that often produces identical ranks, appears frequently in CCrank and Hrank clusters, which we believe is partially due to its low granularity. The lack of SCH-pcnt and SCH-is pairs in Table 11 suggests that university standards are similar to publication count and impact score measures. This may be due to the fact that the majority of publications are articles published in domestic journals with similar impact scores.

As for bibliometric measure pairs, impact score and citation count (is-ccnt) show weak correlations in PCrank and ISrank clusters while h-index pairings occur in CCrank and Hrank clusters as with the university standards. The pcnt-hidx in top Hrank clusters (rank1-40, 41-80) suggests that authors with high h-index do not necessarily have high publication counts. In fact, we can see in the author rankings of the Hrank cluster plot (Fig. 7) that many authors in the top Hrank cluster (i.e., Hrank1-40) have low pcnt ranks (pcnt-har below 40) and those in the second Hrank cluster (i.e., Hrank41-80) have pcnt ranks above and below the rank interval of 41 to 80. In addition, there are many more data points for pcnt-har than for h-index even within the same rank range, which reflects the finer granularity of citation count that will also influence rho. The CCrank cluster plot in Fig. 6 shows similar patterns of rank discrepancies between citation count and h-index rankings. The granularity of ccnt-har over h-index, especially in the top cluster, and the wider rank spread of h-index in CCrank clusters are likely causes of the low rho values shown in the CCrank column of Table 11. Incidentally, school12 in CCrank and Hrank clusters exhibits similar patterns of rank spread and granularity differences with bibliometric measures.

There is much to be gleaned from the rank cluster analysis, but such is beyond the focus of this study. Both the rank cluster tables and plots clearly show that university standards differ from bibliometric measures in how they assess faculty research performance. Indeed, the university’s aim to introduce quality assessment into quantitative evaluation of research performance does not appear to have been effectively realized by the current standards.
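The rank cluster comparison described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual procedure: the function names, the standard labels (STD_A, STD_B), and all scores are hypothetical, ties are assumed to receive average ranks, and the significance test (p<0.05) is omitted. A cluster is taken to be the set of authors whose rank under a base measure falls in a given interval, and rho is then computed between two other measures within that cluster.

```python
from statistics import mean


def average_ranks(scores):
    """Rank scores in descending order (1 = best); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the tied run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one ranking is constant
    return cov / (var_x * var_y) ** 0.5


def cluster_rho(base, x, y, lo, hi):
    """rho between measures x and y for authors ranked lo..hi by the base measure."""
    base_rank = average_ranks(base)
    members = [i for i, r in enumerate(base_rank) if lo <= r <= hi]
    return spearman_rho([x[i] for i in members], [y[i] for i in members])


# Hypothetical scores for six authors (illustrative only, not the study's data).
pcnt = [10, 9, 8, 7, 6, 5]      # publication count, used as the base measure
ccnt = [5, 80, 20, 3, 9, 1]     # citation count
standards = {                   # scores under two made-up university standards
    "STD_A": [50, 45, 40, 35, 30, 25],
    "STD_B": [5, 80, 20, 3, 9, 1],
}

# rho between each standard and citation count within the top cluster (ranks 1-3).
rhos = {name: cluster_rho(pcnt, sch, ccnt, 1, 3) for name, sch in standards.items()}
below_half = sum(1 for rho in rhos.values() if rho < 0.5)
print(rhos, below_half)
```

In this toy data, STD_A orders the top three authors in the opposite direction from citation count, so it is the only standard counted as falling below the 0.5 threshold. Counting such standards per cluster is one way a tally like "SCH-ccnt (15)" could be produced.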

11 See Table 6 for bibliometric score formulas.

12 school in rank cluster plots represents KNU2 standard, which is most similar to the average standard shown in Table 8.
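The granularity argument can be made concrete with a small sketch. The code below uses the standard definition of the h-index (the largest h such that h papers have at least h citations each). The three authors and their citation lists are hypothetical and serve only to show how authors with very different citation counts can share the same h-index, which produces tied ranks.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h


# Hypothetical per-paper citation counts for three authors (illustrative only).
authors = {
    "A": [120, 4, 3, 1],
    "B": [9, 8, 3, 0],
    "C": [5, 4, 3, 2],
}

hidx = {name: h_index(cites) for name, cites in authors.items()}
ccnt = {name: sum(cites) for name, cites in authors.items()}
print(hidx)  # all three authors share one h-index value
print(ccnt)  # while their citation counts are all distinct
```

Here citation count separates the three authors into three distinct ranks, whereas h-index places them in a single tied rank. Ties of this kind reduce the number of distinct rank positions in a cluster and can affect rho.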


Fig. 6. Comparison of university standard and bibliometric measures: PCrank and CCrank clusters.


Fig. 7. Comparison of university standard and bibliometric measures: ISrank and Hrank clusters.


Table 11. Spearman’s rank-order correlation of research performance measures13

PCrank ISrank CCrank Hrank

SCH-ccnt (15) SCH-ccnt (23) SCH-hidx (19) SCH-hidx (25) is-ccnt pcnt-ccnt SCH-ccnt(14) pcnt-hidx rank 1-40 is-ccnt pcnt-ccnt is-ccnt

SCH-ccnt (17) SCH-ccnt (13) SCH-hidx (23) rank 41-80 is-ccnt ccnt-hidx pcnt-hidx is-hidx

is-ccnt SCH-ccnt (22) SCH-hidx (23) rank 81-120 is-ccnt pcnt-hidx ccnt-hidx

SCH-ccnt (24) SCH-ccnt (12) ccnt-hidx SCH-hidx (25) is-ccnt pcnt-hidx rank 121-160 is-hidx ccnt-hidx

rank 161-195 is-hidx ccnt-hidx

SCH, university standards; pcnt, publication count; ccnt, citation count; is, impact score; hidx, h-index.

13 Numbers in parentheses denote the number of university standards. SCH-ccnt (15), for instance, means there were 15 university standard and citation count ranking pairs with rho < 0.5.

5. DISCUSSION

Research performance evaluation in Korean universities follows strict guidelines that specify scoring systems for publication venue categories and formulas for weighting each author’s contribution to a co-authored publication. To find out how the standards differ across universities and how the differences affect the research evaluation outcome, we first examined 25 such standards, after which we applied the standards along with four bibliometric measures to the publication data of LIS faculty members in Korean universities to generate rankings by each evaluation measure. We then analyzed ranking differences by examining Spearman’s rank-order correlation outputs in rank clusters to ascertain the robustness of measures as well as to gain insights into how different measures influence the research evaluation outcome. We also introduced a novel impact score normalization method and analyzed the actual combination of publication venue scores and co-authorship credit formulas for each university in order to enhance the research methodologies employed in prior studies.

The results showed the university standards to be more or less similar to one another in general but different from citation-driven measures, which suggests the standards are not quite successful in quantifying the quality of research as originally intended. The standards not only do not differentiate between journals in the same category but also treat articles in the same journal as being of the same quality, thereby favoring in essence quantity over quality. In addition, the standards’ punitive scoring of co-authored publications discourages collaboration and multi-disciplinary research, which is the hallmark of a vibrant research community.

REFERENCES

Adkins, D., & Budd, J. (2006). Scholarly productivity of U.S. LIS faculty. Library & Information Science Research, 28, 374-389.

Budd, J. M., & Seavey, C. A. (1996). Productivity of U.S. library and information science faculty: The Hayes study revisited. Library Quarterly, 66(1), 1-20.

Cronin, B., & Meho, L. I. (2006). Using the h-index to rank influential information scientists. Journal of the American Society for Information Science and Technology, 57(9), 1275-1278.

Drott, M. C. (1995). Reexamining the role of conference papers in scholarly communication. Journal of the American Society for Information Science, 46(4), 299-305.

Hayes, R. M. (1983). Citation statistics as a measure of faculty research productivity. Journal of Education for Librarianship, 23(3), 151-172.

Lee, H., & Yang, K. (2015). Comparative analysis of Korean universities’ co-author credit allocation standards on journal publications. Journal of Korean Library and Information Science Society, 46(4), 191-205.

Lee, H., & Yang, K. (2017). Comparative analysis of Korean universities’ journal publication research performance evaluation standard. Journal of Korean Library and Information Science Society, 48(2), 295-322.

Lee, J., & Yang, K. (2015). Co-authorship credit allocation methods in the assessment of citation impact of chemistry faculty. Journal of the Korean Society for Library and Information Science, 49(3), 273-289.

Lisée, C., Larivière, V., & Archambault, E. (2008). Conference proceedings as a source of scientific information: A bibliometric analysis. Journal of the American Society for Information Science and Technology, 59(11), 1776-1784.

Yang, K., & Lee, J. (2012). Analysis of publication patterns in Korean library and information science research. Scientometrics, 93(2), 233-251.

Yang, K., & Lee, J. (2013). Bibliometric approach to research assessment: Publication count, citation count, & author rank. Journal of Information Science Theory and Practice, 1(1), 27-41.

Call for Paper
Journal of Information Science Theory and Practice (JISTaP)

We would like to invite you to submit or recommend papers to the Journal of Information Science Theory and Practice (JISTaP, eISSN: 2287-4577, pISSN: 2287-9099), a fast-track, peer-reviewed, no-fee open access academic journal published by the Korea Institute of Science and Technology Information (KISTI), a government-funded research institute providing STI services to support high-tech R&D for researchers in Korea. JISTaP marks a transition from the Journal of Information Management to an English-language international journal in the area of library and information science.

JISTaP aims at publishing original studies, review papers and brief communications on information science theory and practice. The journal provides an international forum for practical as well as theoretical research in the interdisciplinary areas of information science, such as information processing and management, knowledge organization, scholarly communication and bibliometrics.

We welcome materials that reflect a wide range of perspectives and approaches on diverse areas of information science theory, application and practice. Topics covered by the journal include: information processing and management; information policy; library management; knowledge organization; metadata and classification; information seeking; information retrieval; information systems; scientific and technical information service; human-computer interaction; social media design; analytics; scholarly communication and bibliometrics. Above all, we encourage submissions of catalytic nature that explore the question of how theory can be applied to solve real world problems in the broad discipline of information science.

Co-Editors in Chief : Gary Marchionini & Dong-Geun Oh

Please click the "Online Submission" link in the JISTaP website (http://www.jistap.org), which will take you to a login/ account creation page. Please consult the "Author's Guide" page to prepare your manuscript according to the JISTaP manuscript guidelines.

Any questions? Suhyeon Yoo (managing editor): [email protected]

Information for Authors

The Journal of Information Science Theory and Practice (JISTaP), which is published quarterly by the Korea Institute of Science and Technology Information (KISTI), welcomes materials that reflect a wide range of perspectives and approaches on diverse areas of information science theory, application and practice. JISTaP is an open access journal run under the Open Access Policy. See the section on Open Access for detailed information on the Open Access Policy.

A. Originality and Copyright
All submissions must be original, unpublished, and not under consideration for publication elsewhere. Once accepted for publication, articles are accessible to all users at no cost. If an article is used in other research, its source should be cited in an appropriate manner, and the content may be used only for noncommercial purposes under the Creative Commons license.

B. Peer Review
All submitted manuscripts undergo a single-blind peer review process in which the identities of the reviewers are withheld from the authors.

C. Manuscript Submission
Authors should submit their manuscripts online via the Article Contribution Management System (ACOMS). Online submission facilitates processing and reviewing of submitted articles, thereby substantially shortening the paper lifecycle from submission to publication. After checking the manuscript’s compliance with the Manuscript Guidelines, please follow the “Online Submission” hyperlink in the top navigation menu to begin the online manuscript submission process.

D. Open Access
Under KISTI’s Open Access Policy, authors can choose open access and retain their copyright or opt for the normal publication process with a copyright transfer. If authors choose open access, their manuscripts become freely available to the public under the Creative Commons license. Open access articles are automatically archived in KISTI’s open access repository (KPubS, www.kpubs.org). If authors do not choose open access, access to their articles will be restricted to journal users.

E. Manuscript Guidelines
Manuscripts that do not adhere to the guidelines outlined below will be returned for correction. Please read the guidelines carefully and make sure the manuscript follows the guidelines as specified. We strongly recommend that authors download and use the manuscript template in preparing their submissions.

Manuscript Guidelines

1. Page Layout : All articles should be submitted in single column text on standard Letter Size paper (21.59 × 27.94 cm) with normal margins.

2. Length : Manuscripts should normally be between 4,500 and 9,000 words (10 to 20 pages).

3. File Type : Articles should be submitted in Microsoft Word format. To facilitate the manuscript preparation process and speed up the publication process, please use the manuscript template.

4. Text Style :
• Use a standard font (e.g., Times New Roman) no smaller than size 10.
• Use single line spacing for paragraphs.
• Use footnotes to provide additional information peripheral to the text. Footnotes to tables should be marked by superscript lowercase letters or asterisks.

5. Title Page : The title page should start with a concise but descriptive title and the full names of authors along with their affiliations and contact information (i.e., postal and email addresses). An abstract of 150 to 250 words should appear below the title and authors, followed by keywords (4 to 6).

Author1 Affiliation, Postal Address. E-mail

Author2 Affiliation, Postal Address. E-mail

ABSTRACT A brief summary (150-250 words) of the paper goes here.

Keywords : 4 to 6 Keywords, separated by commas.

6. Numbered Type :

1. INTRODUCTION
All articles should be submitted in single column text on standard letter size paper (21.59 × 27.94 cm) with normal margins.[1] Text should be in 11-point standard font (e.g., Times New Roman) with single line spacing.

[1] Normal margin dimensions are 3 cm from the top and 2.54 cm from the bottom and sides.

2. SECTIONS
The top-level section heading should be in 14-point bold all uppercase letters.

2.1. Subsection Heading 1
The first-level subsection heading should be in 12-point bold with the first letter of each word capitalized.

2.1.1. Subsection Heading 2
The second-level subsection heading should be in 11-point italic with the first letter of each word capitalized.

7. Figures and Tables : All figures and tables should be placed at the end of the manuscript after the reference list. To note the placement of figures and tables in text, “Insert Table (or Figure) # here” should be inserted in appropriate places. Please use high resolution graphics whenever possible and make sure figures and tables can be easily resized and moved.

Figure

[Chart: x-axis "Num. of Paper" (1 to 67); y-axis "Num. of Faculty" (0 to 14)]

Fig. 1. Distribution of authors over publication count.

Table

Table 1. The title of table goes here

Study                      Time period study    Data
Smith Wesson (1996)        1970 - 1995          684 papers in 4 SSCI journals
Reeves [a] (2002)          1997 - 2001          597 papers in 3 SSCI journals
Jones Wilson [b] (2011)    2000 - 2009          2,166 papers in 4 SSCI journals

[a] Table footnote a goes here
[b] Table footnote b goes here

8. Acknowledgements : Acknowledgements should appear in a separate section before the reference list.

9. Citations : Citations in text should follow the author-date method (authors’ surname followed by publication year).
• Several studies found... (Barakat et al., 1995; Garfield, 1955; Meho & Yang, 2007).
• In a recent study (Smith & Jones, 2011)...
• Smith and Jones (2011) investigated...

10. Reference List : The reference list, formatted in accordance with the American Psychological Association (APA) style, should be alphabetized by the first author’s last name.

Journal article
• Author, A., Author, B., & Author, C. (Year). Article title. Journal Title, volume(issue), start page-end page.
• Smith, K., Jones, L. J., & Brown, M. (2012). Effect of Asian citation databases on the impact factor. Journal of Information Science Practice and Theory, 1(2), 21-34.

Book
• Author, A., & Author, B. (Year). Book title. Publisher Location: Publisher Name.
• Smith, K., Jones, L. J., & Brown, M. (2012). Citation patterns of Asian scholars. London: Sage.

Book chapter
• Author, A., & Author, B. (Year). Chapter title. In A. Editor, B. Editor, & C. Editor (Eds.), Book title (pp. xx-xx). Publisher Location: Publisher Name.
• Smith, K., & Brown, M. (2012). Author impact factor by weighted citation counts. In G. Martin (Ed.), Bibliometric approach to quality assessment (pp. 101-121). New York: Springer.

Conference paper
• Author, A., & Author, B. (Year). Article title. In A. Editor & B. Editor (Eds.), Conference title (pp. xx-xx). Publisher Location: Publisher Name.
• Smith, K., & Brown, M. (2012). Digital curation of scientific data. In G. Martin & L. J. Jones (Eds.), Proceedings of the 12th International Conference on Digital Curation (pp. 41-53). New York: Springer.

Online document
• Author, A., & Author, B. (Year). Article title. Retrieved month day, year from URL.
• Smith, K., & Brown, M. (2010). The future of digital library in Asia. Digital Libraries, 7, 111-119. Retrieved May 5, 2010, from http://www.diglib.org/publist.htm.


JISTaP Journal of Information Science Theory and Practice http://www.jistap.org

66, Hoegi-ro, Dongdaemun-gu, Seoul, Republic of Korea (ZIP code: 02456) Tel. +82-2-3299-6102 Fax. +82-2-3299-6067 http://www.jistap.org