eISSN : 2287-4577 pISSN : 2287-9099 http://www.jistap.org Vol.1 No.1 March 30, 2013 Journal of Information Science Theory and Practice

Indexed/Covered by KSCI, KoreaScience and CrossRef General Information

Aims and Scope

The Journal of Information Science Theory and Practice (JISTaP) is an international journal that aims at publishing original studies, review papers and brief communications on information science theory and practice. The journal provides an international forum for practical as well as theoretical research in the interdisciplinary areas of information science, such as information processing and management, knowledge organization, scholarly communication and bibliometrics. JISTaP will be published quarterly, issued on the 30th of March, June, September, and December. JISTaP is indexed in the Korea Science Citation Index (KSCI) and KoreaScience by the Korea Institute of Science and Technology Information (KISTI) as well as CrossRef. The full text of this journal is available on the website at http://www.jistap.org


Publisher Korea Institute of Science and Technology Information 245 , Yuseong-gu, Daejeon, Republic of Korea (T) 82-42-869-1615 (F) 82-42-869-1767 E-mail: [email protected] URL: http://www.jistap.org

Design & Printing Company: Visual Storm 53-2 Eunhaeng-dong, Jung-gu, Daejeon, Republic of Korea (T) 82-42-223-8581 (F) 82-42-223-8583 E-mail: [email protected]

Open Access and Creative Commons License Statement All JISTaP content is Open Access, meaning it is accessible online to everyone, without fee and authors’ permission. All JISTaP content is published and distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/). Under this license, authors reserve the copyright for their content; however, they permit anyone to unrestrictedly use, distribute, and reproduce the content in any medium as far as the original authors and source are cited. For any reuse, redistribution, or reproduction of a work, users must clarify the license terms under which the work was produced.

2013 Copyright Korea Institute of Science and Technology Information

Editorial Board

Co-Editors-in-Chief Gary Marchionini University of North Carolina, USA Dong-Geun Oh Keimyung University, Korea

Associate Editors Honam Choi Korea Institute of Science and Technology Information, Korea Kiduk Yang Kyungpook National University, Korea

Managing Editors Hea Lim Rhee Korea Institute of Science and Technology Information, Korea Yong-Gu Lee Keimyung University, Korea

Editorial Board Consulting Editors

Beeraka Ramesh Babu, University of Madras, India
France Bouthillier, McGill University, Canada
Kathleen Burnett, Florida State University, USA
Sujin Butdisuwan, Mahasarakham University, Thailand
Seon Heui Choi, Korea Institute of Science and Technology Information, Korea
Boryung Ju, Louisiana State University, USA
Noriko Kando, National Institute of Informatics, Japan
Joy Kim, University of Southern California, USA
Kenneth Klein, University of Southern California, USA
M. Krishnamurthy, DRTC, Indian Statistical Institute, India
S.K. Asok Kumar, The Tamil Nadu Dr Ambedkar Law University, India
Shailendra Kumar, University of Delhi, India
Mallinath Kumbar, University of Mysore, India
Hur-Li Lee, University of Wisconsin-Milwaukee, USA
Fenglin Li, Wuhan University, China
Lokman I. Meho, American University of Beirut, Lebanon
Jin Cheon Na, Nanyang Technological University, Singapore
Daniel O. O’Connor, Rutgers University, USA
P. Rajendran, SRM University, India
B. Ramesha, Bangalore University, India
Soo Young Rieh, University of Michigan, USA
Alice R. Robin, Indiana University Bloomington, USA
Tae-Sul Seo, Korea Institute of Science and Technology Information, Korea
Tsutomu Shihota, St. Andrews University, Japan
Paul Solomon, University of South Carolina, USA
Ning Yu, University of Kentucky, USA


Table of Contents


Letters from Editor’s Desk
Gary Marchionini, Dong-Geun Oh / Co-Editors-in-Chief

President’s Congratulatory Message
Young Seo Park / President of KISTI

Congratulatory Message
Jung-Il Jin / President of Korean Council of Science Editors

Articles
Domain Adaptation for Opinion Classification: A Self-Training Approach - Ning Yu
Bibliometric Approach to Research Assessment: Publication Count, Citation Count, & Author Rank - Kiduk Yang, Jongwook Lee
Information Needs and Seeking Behavior During the H1N1 Virus Outbreak - Shaheen Majid, Nor Ain Rahmat
A Study on Behavioral Traits of Library and Information Science Students in South India - S. Baskaran, B. Ramesh Babu, S. Gopalakrishnan
A Faceted Data Model for Bibliographic Integration Between MARC and FRBR - Seungmin Lee
Tracking down on 50-year History of Research about Information Management and Technology in Korea

Editorial Board of JISTaP

Information for Authors

Letters from Editor’s Desk

We are at a paradigm shift in the evolution of scholarly publishing. The goals of scholarly publishing have always been to share new knowledge while staking originality claims of the knowledge creator. Today, new models of scholarly publishing are emerging. In a globally connected electronic environment it is no longer easy to control access and distribution of publications. New models of open access, instantaneous annotation and commentary, multimedia components, and data archives that accompany published results bring a renaissance in knowledge production as well as substantial challenges to the entire publishing enterprise.

In the midst of these changes the Journal of Information Science Theory and Practice (JISTaP) arrives as a bridge between peer-reviewed journals and the emergent online amalgam of bits from authors, reviewers, readers, and annotators. Establishing and sustaining quality and reputation in a new journal will require attention to the global research community, to electronic technologies that allow perfect copies and instant mass distribution and multimedia forms of expressions, and to new models for including supplementary materials and storing the papers, data, and ancillary objects associated with the scholarly process. It will require efficient workflows and attention to new models of access and delivery.

This is truly an auspicious time for an innovative publishing enterprise and JISTaP is driven by several principles that will guide it in serving as a scholarly bridge and meeting these challenges. These principles include:

Peer review. We are in an age where relevance as the key element of selection is giving way to credibility and quality control. JISTaP is committed to ensuring that contributions are critically examined by scholars who are actively engaged in the area of research or practice discussed in the paper.

Broad view of information science. JISTaP welcomes both theoretical and applied work from scholars working across the full range of information and library science. It encourages diverse points of view and methodologies and active debate and commentary. This is especially important in a rapidly evolving field.

Global perspective. JISTaP aims to attract the brightest talent and best thinking from Asia and the entire world. JISTaP is an English language journal; however, it aims to manage name authority in novel ways and to represent global experience, expression, and thought from multiple cultural perspectives.

Open to change. JISTaP recognizes that scholarly publishing is in transition and is committed to trying new publication practices as long as they are in concert with the principles above. We welcome ideas from the information science community.

JISTaP has been launched as a paper-based journal with electronic access and will depend on an editorial board of distinguished scholars for peer review. We recognize that static text is not enough today and expect that authors will increasingly want to provide active figures and tables (e.g., a spreadsheet rather than a snapshot), multimedia (e.g., user interface prototypes that show dynamics of interaction; animations or videos of workflows), and underlying research data. We welcome suggestions for ways to encourage and manage these developments over the coming years. We are committed to providing extensive metadata to support findability and understanding. One innovation is the use of QR codes for each article to encode author information that will help uniquely identify authors and their institutions. We will add multilingual abstracts for articles and plan to incorporate annotations and social media that make articles and the journal living documents that drive research and practice in the information field. We welcome your participation, suggestions, and patience as we build this bridge across the interdisciplinary field and march from the limitation of paper-based publishing to the emerging hybrid of paper+electronic knowledge dissemination and preservation.

Gary Marchionini, Dong-Geun Oh / Co-Editors-in-Chief March 2, 2013

President’s Congratulatory Message

I would like to congratulate the Information Service Center of the Korea Institute of Science and Technology Information (KISTI) on the publication of the first issue of the Journal of Information Science Theory and Practice (JISTaP).

KISTI, actively advancing as a world-class information research institute, celebrated its 50th anniversary last year. KISTI originated from the Korea Science & Technology Information Center (KORSTIC) in 1962 and went through the eras of the Korea Institute for Industrial Economics & Trade (KIET), the Korea Institute of Industry and Technology Information (KINITI), and the Korea Research & Development Information Center (KORDIC). This journey to the establishment of KISTI was a great history of seeding the knowledge of science & technology and building advanced infrastructure.

Information dissemination is a primary mission of KISTI. The Journal of Information Management was published in this area, and JISTaP succeeds it. I believe that JISTaP will exemplify a globalized journal in the area of information science and technology.

A significant amount of time and effort was spent on the publication. The JISTaP staff selected renowned professionals for the editorial board and made efforts to call for and publish high-quality papers. The publication of JISTaP is an achievement of KISTI in collaboration with academic organizations leading the library and information science field.

Because JISTaP is an open access journal, it is open to the public for free. To meet the demands for globalization of domestic scientific journals as well as to invigorate digital information dissemination, KISTI will build a globalized publication life-cycle platform from the experience of JISTaP publication. By establishing such a platform, anyone will be able to publish scientific knowledge easily, users will be able to access the publication freely, and authors’ copyrights will be protected. This accomplishment will build an open access scientific communication environment.

Lastly, I would like to thank all the JISTaP staff who have contributed to the publication for their efforts.

Young Seo Park / President of KISTI March 11, 2013

Congratulatory Message

The Korean Council of Science Editors (KCSE) is pleased to congratulate the Korea Institute of Science and Technology Information (KISTI) on the initiative of creating an international journal, Journal of Information Science Theory and Practice (JISTaP), which will serve as a tool in sharing innovations, ideas, knowledge and expertise, thereby enhancing information science theory, application and practice.

Over the last fifty years, KISTI has laid the foundation for R&D by collecting science and technology information from countries all over the world and providing the information to researchers in Korea. Thus, KISTI has been playing a key role in enabling Korea to join the ranks of developed countries by actively responding to the rapidly evolving global science and technology paradigm. In keeping with KISTI’s long-term objectives, I believe that the publication of JISTaP will become a valuable platform for promoting collaboration and sharing research findings and insights of specialists and distinguished scholars around the world.

I believe that science and technology hold answers to fundamental questions we must address to prepare for the future. KCSE is the national network of science editors, who are all key researchers for the development of science and technology in Korea. KCSE has been working together with KISTI since its foundation to help learned societies create new synergies and collaboration opportunities. In this spirit, I warmly welcome the launch of JISTaP, a quarterly journal published by KISTI. JISTaP is a timely initiative that will support our common goal, which is to harness the power of science and technology for the benefit of all.

I would like to extend my sincere congratulations and best wishes for the success of JISTaP, and I hope it will contribute to the development of information technology and encourage those who are involved in information science.

Jung-Il Jin, Ph.D. / President of Korean Council of Science Editors March 9, 2013

Research Paper
J. of infosci. theory and practice 1(1): 10-26, 2013
http://dx.doi.org/10.1633/JISTaP.2013.1.1.1

Domain Adaptation for Opinion Classification: A Self-Training Approach

Ning Yu* School of Library and Information Science University of Kentucky, USA E-mail: [email protected]

ABSTRACT Domain transfer is a widely recognized problem for machine learning algorithms because models built upon one data domain generally do not perform well in another data domain. This is especially a challenge for tasks such as opinion classification, which often has to deal with insufficient quantities of labeled data. This study investigates the feasibility of self-training in dealing with the domain transfer problem in opinion classification by leveraging labeled data in non-target data domain(s) and unlabeled data in the target domain. Specifically, self-training is evaluated for effectiveness in sparse data situations and feasibility for domain adaptation in opinion classification. Three types of Web content are tested: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. Findings of this study suggest that, when there are limited labeled data, self-training is a promising approach for opinion classification, although the contributions vary across data domains. Significant improvement was demonstrated for the most challenging data domain, the blogosphere, when a domain transfer-based self-training strategy was implemented.

Keywords: Domain adaptation, Opinion classification, Self-training, Semi-supervised learning, Sentiment analysis, Machine learning

Received date: December 30, 2012
Accepted date: February 23, 2013
*Corresponding Author: Ning Yu, Assistant professor, School of Library and Information Science, University of Kentucky, USA, E-mail: [email protected]
Ning Yu, 2013

1. INTRODUCTION

The rapid growth of freely accessible and easily customizable Web 2.0 applications has made it easy and fun for people to share their experiences, knowledge, and opinions. Retail websites such as Amazon.com and review aggregators such as Yelp.com collect customer reviews on specific products or services, while blogs and social networking sites such as Twitter and Facebook allow users to publish opinions and share emotions on an infinite array of topics ranging from the benefits of eating blueberries to the U.S. presidential election. Being able to listen to and understand online voices plays an important role in today’s decision making for business practices, political campaigns, and daily life.

Since the late 1990s, researchers from different communities have been working in the area of sentiment analysis, which includes tasks such as differentiating opinions from facts (Wiebe, Wilson, Bruce, Bell, & Martin, 2004; Yang, Yu, & Zhang, 2007), detecting positive and negative polarity (Abbasi, Chen, & Salem, 2008; Pang, Lee, & Vaithyanathan, 2002), classifying fine-grained emotions (Bollen, Mao & Zeng, 2011; Yu, Kubler, Herring, Hsu, Israel & Smiley, 2012), and identifying other opinion properties (Tsou, Yuen, Kwong, Lai, & Wong, 2005; Ku & Chen, 2007). For any of these tasks, data pre-labeled with sentiment categories are essential for creating and evaluating sentiment analysis systems. However, the reality is that labeled data are usually limited, especially at the sub-document level. Although this shortage of sentiment-labeled data is less challenging in some domains (e.g., movie reviews) than in others (e.g., blog posts), simply borrowing labeled data from a non-target data domain often fails due to the domain transfer problem.

Domain transfer is a widely recognized problem for machine learning algorithms because models built via learning one data domain generally do not perform well in another data domain. Hence for each data domain, machine learning tends to start from scratch. But there may not be sufficient ‘ground truth’ (i.e., labeled data) in the target data domain for machine learning algorithms to rely on. While it is difficult to obtain sentiment-labeled data and manual annotation is tedious, expensive, and error-prone, unlabeled user-generated data are readily available. This paper therefore examines strategies to utilize both unlabeled data in the target domain and labeled data in other data domains to tackle the domain transfer problem. The specific machine learning methods explored in this research fall into the category of semi-supervised learning (SSL), which requires only limited labeled data to automatically label unlabeled data. SSL has achieved promising results in sparse data situations in various natural language processing (NLP) tasks, including topic classification and sentiment analysis, but SSL has seldom been examined for domain adaptation. Specifically, this study investigates applications of self-training for opinion classification in three types of Web content: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. An easily generalizable and highly adaptable SSL algorithm, self-training, is evaluated for its effectiveness in sparse data situations and domain adaptation.

2. BACKGROUND AND RELATED WORK

Two major sentiment analysis strategies exist in the sentiment analysis literature: the ad hoc rule-based approach, sometimes known as the lexicon-based approach (Ounis, Macdonald, & Soboroff, 2008), and the machine learning-based approach, sometimes known as the corpus-based approach. Both of these approaches benefit from the large number and great variety of sentiment-bearing features used as evidence in sentiment analysis. Such sentiment evidence can be knowledge-based (e.g., the more depressed a person feels, the more likely he/she will use the first-person word “I,” Pennebaker, 2011), statistical/empirical (e.g., high order n-grams), or style-based (e.g., “IMHO,” “:-)”). Since each source of sentiment evidence has its own characteristics and captures different aspects of sentiment, sentiment-bearing features from more than one source of evidence are often preferred. Most studies have suggested that a fusion of various sentiment-bearing features surpasses the use of any single subset of features (Chesley, Vincent, Xu & Srihari, 2006; Gamon, 2004; Hatzivassiloglou & Wiebe, 2000; Yang et al., 2007).

The machine learning approach is more practical in sentiment analysis than the ad hoc rule-based approach due to its fully automatic implementation and its ability to identify features that are not intuitive to humans. State-of-the-art topical supervised classification algorithms are often tailored for sentiment analysis in the following manner: 1) binary feature values (presence/absence) are used instead of frequency. This is motivated by the extreme brevity of the classification unit (e.g., tweets, reviews) and the characteristics of sentiment analysis, where occurrence frequency is less influential (i.e., a single occurrence of sentiment evidence is sufficient); and 2) a wider variety of evidence (e.g., linguistic features, links) is investigated in addition to auto-generated features (e.g., bag-of-words, n-grams). These supervised learning algorithms have achieved satisfactory results for sentiment analysis (Wiebe et al., 2004; Zhang & Yu, 2007).

The biggest limitation associated with supervised learning is that it is sensitive to the quantity and quality of the training data and may fail when training data are biased or insufficient. In contrast with supervised learning, which learns from labeled data only, semi-supervised learning (SSL) learns from both labeled and unlabeled data based on the idea that although unlabeled data hold no information about classes (e.g., “sentiment” and “non-sentiment”), they do contain information about the joint distribution over classification features. Compared with supervised learning, the value of SSL in sentiment analysis lies not only in its need for less labeled data but also in its ability to handle the domain dependency challenge: when there are labeled data in the non-target data domain only, an SSL algorithm can reduce the bias of the non-target data by increasing the number of labeled data from the target data domain. This aspect of SSL is very attractive for sentiment analysis in challenging data domains such as the blogosphere, which is short of high-quality sentiment-labeled data.

2.1. Semi-Supervised Learning and Self-Training

According to a survey of SSL by Zhu (2008), the most commonly used SSL algorithms include self-training, Expectation-Maximization (EM) with generative mixture models, co-training, Semi-Supervised Support Vector Machines (S3VMs), and graph-based methods. Except for S3VMs, all SSL algorithms have been found to be effective for sentiment analysis (Aue & Gamon, 2005; Pang & Lee, 2004; Yu & Kubler, 2010, 2011). This study focuses on self-training due to its easy generalization and high adaptability.

Self-training¹ is a wrapper SSL approach that can be applied to any existing system as long as a confidence score can be produced. Self-training keeps a system in a black box and avoids dealing with any inner complexities. The major steps in self-training are: (1) training an initial classifier on the labeled dataset; (2) applying this classifier to the unlabeled data and selecting the most confidently labeled data as determined by the classifier to augment the original labeled dataset; and (3) re-training the classifier by repeating the whole process from step (1). Simple pseudocode for self-training is shown in Figure 1.

Input: classifier C, a small set of labeled data L, a large amount of unlabeled data U

Loop until iteration procedure converges or loop for k iterations
    1) Train C on L;
    2) Apply trained C to U;
    3) Select top n results returned by C and add them back to L;

Output: Extended L and an updated C

Fig. 1 Bootstrapping Procedure in Self-training
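The bootstrapping procedure in Figure 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names (`binary_features`, `train_nb`, `self_train`), the toy sentences, the smoothed Naive Bayes stand-in for classifier C, and the values of n and k are all hypothetical. Sentences are represented by binary (presence/absence) token features, as described in Section 2, and the classifier's posterior probability serves as the confidence score.

```python
import math
from collections import Counter


def binary_features(sentence):
    """Binary (presence/absence) features: the set of lowercased tokens."""
    return frozenset(sentence.lower().split())


def train_nb(labeled):
    """Train a toy Naive Bayes stand-in for classifier C (an assumption).

    `labeled` is a list of (feature_set, label) pairs. Returns a function
    that maps a feature set to (predicted_label, confidence).
    """
    class_counts = Counter(label for _, label in labeled)
    token_counts = {label: Counter() for label in class_counts}
    vocab = set()
    for feats, label in labeled:
        token_counts[label].update(feats)
        vocab |= feats

    def predict(feats):
        log_scores = {}
        for label, n_docs in class_counts.items():
            score = math.log(n_docs / len(labeled))
            for tok in feats & vocab:  # ignore tokens never seen in L
                # Add-one smoothing over presence/absence of the token.
                score += math.log((token_counts[label][tok] + 1) / (n_docs + 2))
            log_scores[label] = score
        top = max(log_scores.values())
        probs = {lab: math.exp(s - top) for lab, s in log_scores.items()}
        total = sum(probs.values())
        best = max(probs, key=probs.get)
        return best, probs[best] / total  # posterior used as confidence

    return predict


def self_train(train, labeled, unlabeled, n=2, k=10):
    """Figure 1: train C on L, apply C to U, move the top n results into L."""
    labeled, pool = list(labeled), list(unlabeled)
    predict = train(labeled)                       # step 1
    for _ in range(k):
        if not pool:                               # nothing left to label
            break
        scored = [(predict(x), x) for x in pool]   # step 2
        scored.sort(key=lambda item: item[0][1], reverse=True)
        for (label, _conf), x in scored[:n]:       # step 3
            labeled.append((x, label))
        pool = [x for _, x in scored[n:]]
        predict = train(labeled)                   # retrain on extended L
    return labeled, predict


seed = [(binary_features("i love this great movie"), "subjective"),
        (binary_features("the film opens in paris"), "objective")]
unlabeled = [binary_features(s) for s in (
    "i love this lovely story",
    "the film opens in 1990",
    "a lovely story",
    "the cast arrives in 1990",
)]
extended, classifier = self_train(train_nb, seed, unlabeled, n=2, k=5)
print(len(extended), classifier(binary_features("i love this")))
```

Because `self_train` only needs a training function that returns a predictor with a confidence score, any classifier can be swapped in without changing the loop, which reflects the wrapper property described above. In a domain adaptation setting, the seed set would come from a non-target domain and the unlabeled pool from the target domain.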

1 Self-training, also known as mutual bootstrapping or self-teaching, is conceptually equal to the pseudo relevance feedback technique in information retrieval where the top n retrieved results to a given query are assumed to be relevant and are used to form a new query.


Self-training was originally adopted for sentiment lexicon expansion (Riloff & Jones, 1999) and has only recently been explicitly applied for sentence-level sentiment analysis (He & Zhou, 2011). The initial classifier C is either a simple rule-based classifier built using a few manually created opinion seeds or a supervised classifier trained on a few manually labeled data. Across several experiments carried out by Wiebe and Riloff (2005), a self-trained Naïve Bayes classifier using this procedure achieved the best recall with modest precision when classifying subjective sentences.

2.2. Domain Adaptation

Domain dependency may seem less problematic for sentiment analysis than topical classification since generic sentiment-bearing words such as “good” and “bad” are not limited to any particular domain. But there are few generic sentiment-bearing words and it is therefore necessary to extract sentiment-bearing features from the target data collection. These features are generally domain dependent and may not be reusable in another domain for several reasons: (1) there are specific sentiment-bearing words associated with different domains (e.g., “cheap” and “long-lasting” are frequently used in product reviews, but not in movie reviews); (2) different domains have different stylistic expectations for language use (e.g., news articles are less likely than blogs to use words such as “crappy” or “soooooooooo”); and (3) some sentiment-bearing words can be either positive or negative depending on the object (e.g., “small” may be positive in “a small camera” but negative in “a small memory card”). Since information used for sentiment analysis is typically lexical and lexical means of expressing sentiments may vary not only from domain to domain but also from register to register, a sentiment analysis strategy that works for one target data domain generally will not work for another data domain.

Most sentiment analysis systems borrow sentiment-labeled data directly from non-target data domains when there are few labeled data in the target domain or when the characteristics of the target domain make it difficult to detect sentiments if the non-target data appear to be “relevant to the application and of sufficient quantity” (Conrad & Schider, 2007, p. 235). This approach is especially common in opinion detection in the blogosphere. For example, Chesley et al. (2006) leveraged blog training data with non-blog training data containing relatively “pure” opinion information; additionally, most participants in TREC’s Blog track have crawled the Web to generate a great number of opinion-labeled training data. However, according to Aue and Gamon (2005), who compared four strategies for utilizing opinion-labeled data from one or more non-target domains, using non-target labeled data without an adaptation strategy is not as efficient as using labeled data from the target domain, even when the majority of labels are assigned automatically by a self-training algorithm.

Blitzer, Dredze and Pereira (2007) proposed a structural correspondence learning (SCL) algorithm for sentiment classification to reduce the classification error of a classifier trained with non-target data. The key to this domain adaptation strategy is to implicitly associate domain specific features in the target and non-target data domains with certain general features that are used frequently in both domains and are relevant to the opinion class. As a result, even if a feature in the target domain has never occurred in the non-target domain, the class label can be predicted by looking up its corresponding feature(s) in the non-target domain.

A study by Tan, Cheng, Wang, and Xu (2009) is the most similar in spirit to this research. It made use of general features in both the target and non-target domains to address the domain adaptation problem in opinion classification. Their approach differed from the study by Blitzer et al. (2007) in that only labeled data in the non-target domain were used with an SSL algorithm, EM-NB, that put more weight on target data for opinion classification. Regardless of their positive contributions to sentiment analysis, both of these domain adaptation strategies involve sophisticated and expensive methods for selecting general features and applying them to sentiment analysis. Believing sentiment is a sentence-level feature, this study conducts opinion classification on the sentence level, instead of on the document level as in Tan et al.’s work.


3. EXPERIMENTS

3.1. Selection of Datasets

Three types of text have been explored in prior sentiment analysis studies: news articles, online reviews, and online discourse in blogs or discussion forums. These texts differ from one another in terms of structure, text genre (e.g., level of formality), and the proportion of opinions each contains. A dataset of each type was selected in order to investigate the robustness and adaptability of SSL algorithms for opinion classification and to test the feasibility of SSL for domain adaptation. A small set of blog data was also used for parameter optimization. Several manually created opinion lexicons used in earlier studies were also collected in order to increase classification precision for data domains where opinion detection is particularly difficult.

One of the standard datasets in sentiment analysis is the movie review dataset created by Pang and Lee (2004).² It contains 5,000 subjective sentences or snippets from the Rotten Tomatoes³ pages and 5,000 objective sentences or snippets from IMDB⁴ plot summaries, all in lowercase. Sentences containing fewer than 10 tokens were excluded, and the dataset was labeled automatically by assuming opinion inheritance.

The news article dataset⁵ created by Wiebe, Bruce, and O’Hara (1999) is widely used as the gold-standard corpus in opinion detection research. They chose the Wall Street Journal portion of the Penn Treebank III (Marcus, Santorini, Marcinkiewicz, & Taylor, 1999) and manually augmented it with opinion-related annotations. According to their coding manual, subjective sentences are those expressing evaluations, opinions, emotions, and speculations. For this research, 5,297 subjective sentences and 5,174 objective sentences were selected based on the presence or absence of manually labeled subjective expressions.

The JDPA corpus⁶ (Kessler, Eckert, Clark, & Nicolov, 2010), a new opinion corpus released in 2010, consists of blog posts expressing opinions about automobiles and digital cameras. Opinions about named entities (e.g., “seat,” “lens”) were manually annotated. All sentences containing sentiment-bearing expressions were extracted and objective sentences were manually identified by eliminating subjective sentences that were not targeted to any labeled entities. This process produced 10,000 subjective sentences and 4,348 objective sentences. To balance the number of subjective and objective sentences, 4,348 subjective sentences were randomly selected from the original set of 10,000.

From 2006 through 2008, a dataset called Blogs06⁷ was used for tasks in TREC’s Blog track. Researchers at the University of Glasgow crawled the blogosphere over an 11-week period from December 2005 to February 2006 to create the Blogs06 collection (Ounis, Rijke, Macdonald, Mishne, & Soboroff, 2007). In this collection, permalink documents (i.e., Web pages containing a single blog post with its associated comments) were the retrieval and assessment units. For TREC’s Blog track opinion retrieval tasks, 50 topics (i.e., search queries and descriptions) were released every year, and each participant in the Blog track was to submit several retrieval runs, each run consisting of the top 1000 documents retrieved for each topic. The top documents retrieved across systems for each topic were then manually labeled as topically relevant, topically relevant but not opinion-bearing, and topically relevant and opinion-bearing (i.e., “positive,” “negative,” or “neutral”). Because topical relevance and opinion polarity would not be taken into consideration in this research, non-relevant data were ignored, and negative, positive, and mixed opinion data were combined into one opinion dataset.
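The two preparation steps described above, balancing the JDPA classes by random downsampling and merging the Blogs06 polarity labels into a single opinion class, can be sketched as follows. This is only an illustration: the function names, the random seed, the placeholder sentence IDs, and the label strings are hypothetical, and the actual TREC assessment codes and sampling procedure may differ.

```python
import random

# Hypothetical label strings; the actual TREC assessment codes may differ.
OPINION_LABELS = {"positive", "negative", "mixed"}


def balance_classes(subjective, objective, seed=0):
    """Randomly downsample the larger class to the size of the smaller one."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    size = min(len(subjective), len(objective))
    return rng.sample(subjective, size), rng.sample(objective, size)


def collapse_opinion_labels(judgments):
    """Keep documents judged opinion-bearing, ignoring their polarity."""
    return [doc_id for doc_id, label in judgments if label in OPINION_LABELS]


# Placeholder sentence IDs with the JDPA counts reported above.
subjective_ids = list(range(10000))
objective_ids = list(range(4348))
balanced_subj, balanced_obj = balance_classes(subjective_ids, objective_ids)
print(len(balanced_subj), len(balanced_obj))

# Toy document-level judgments for the label-merging step.
judgments = [("d1", "positive"), ("d2", "not_relevant"),
             ("d3", "mixed"), ("d4", "relevant_no_opinion")]
opinion_docs = collapse_opinion_labels(judgments)
print(opinion_docs)
```

Sampling without replacement keeps every selected sentence distinct, so the balanced subjective set is a strict subset of the original 10,000.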

2 This dataset can be downloaded from http://www.cs.cornell.edu/people/pabo/movie-review-data/, under subjectivity datasets.
3 http://www.rottentomatoes.com/
4 http://www.imdb.com/
5 This dataset can be downloaded from http://www.cs.pitt.edu/mpqa/databaserelease/
6 The license form for this dataset is available at: http://www.icwsm.org/data/JDPA-Sentiment-Corpus-Licence-ver-2009-12-17.pdf
7 This dataset can be purchased via this page: http://ir.dcs.gla.ac.uk/test_collections/access_to_data.html

14 Domain Adaptation for Opinion Classification

The Blogs06 collection is labeled at the document level and thus required manual labeling to prepare labeled data at the sentence level. In order to avoid bias caused by a particular topic, five TREC labeled opinion-bearing documents (1 positive, 1 negative and 3 mixed opinion) were randomly selected and manually examined for each of the 150 topics, for a total of 750 documents. Because machines cannot be expected to recognize trivial expressions of opinion about which humans are uncertain, emphasis was placed on identifying opinion expressions that contained explicit opinion cues. For example, in a product review, the sentence “I returned this product after a week” may indicate a negative opinion, but it may also state the fact that the product was returned because the reviewer received another as a gift. It is also reasonable to assume that explicit opinion cues may exist around ambiguous opinion expressions to support or explain them (e.g., “It is horrible! I returned this product after a week.”). Therefore, a sentence was labeled as an opinion only if strong traces of opinion cues were present. Sentences that made objective statements were labeled as non-opinion, and the remaining sentences in selected blog posts were ignored. All in all, 1,237 subjective sentences and 616 objective sentences were collected.

3.2. Domain Independent Opinion Lexicons

Several studies have suggested that the use of high-quality opinion lexicons can yield high precision for opinion detection. Therefore, it is advisable to apply these lexicons to boost the classification precision of the initial classifier for SSL runs, especially for difficult data domains such as blog posts. Accordingly, nine domain independent opinion lexicons that had proven useful in previous opinion mining studies were collected for use in these experiments.

Adjectives are often connected to the expression of attitudes and have been reported to have a positive and statistically significant correlation with subjectivity (Wiebe et al., 1999). Three adjective opinion lexicons were selected for this research: Index of General Inquirer (IGI) tag categories, a manually constructed list that contains 765 positive and 873 negative words (Stone, 1997); Colin adjectives, an opinion lexicon distributed by Hatzivassiloglou and Wiebe (2000), which includes manually and automatically identified semantic oriented adjectives, dynamic adjectives, and gradable adjectives; and strong semantic oriented adjectives in the subjectivity term list created by Wilson, Pierce and Wiebe (2003). Dynamic adjectives were separated from other Colin adjectives into an individual lexicon because of their unique features and their significant contributions.

Appraisal groups have also been suggested as useful in identifying what is called an appraisal expression, “a textual unit expressing an evaluative stance towards some target” (Bloom, Garg & Argamon, 2007, p. 308). Given the high cost of full syntactic parsing and the difficulty of fine-level analysis, this research used only the head adjectives, which are marked as positive or negative in the hand-built lexicon distributed by Bloom et al. (2007).

Although not as significant as adjectives, verbs have also been found to be good indicators of opinion information. Verb classes, categories for classifying verbs syntactically and/or semantically, are often used for culling opinionated verbs. Levin’s verb classes, developed on the basis of both intuitive semantic groupings and participation in valence, or diathesis, alternations (Levin, 1993), are the most popular verb classes used as opinion evidence. For this research, verbs from opinion-related Levin’s verb classes, including judgement (e.g., “abuse,” “acclaim”), complain (e.g., “hate,” “despise”), and psych (e.g., “amuse,” “admire,” “marvel (at)”), were selected. Similarly, FrameNet (Fillmore & Baker, 2001), which groups words, including verbs, according to conceptual structures, provides semantic frames such as communication (e.g., “indicate,” “convey”) as evidence of opinion (Breck, Choi, & Cardie, 2007). For this research, several frames were selected: “agree or refuse to act,” “be in agreement on assessment,” “desirability,” “experiencer (objective/subjective),” “judgment,” “opinion,” “prevarication,” and “statement.”

In addition to single words, opinion lexicons used in this research include patterns such as IU collocations (Yang et al., 2007) and bigrams. IU collocations are n-grams with first-person pronouns (e.g., “I,” “we”) and second-person pronouns (e.g., “you”)

15 http://www.jistap.org JISTaP Vol.1 No.1, 10-26

as anchor terms. During their experiments for TREC’s Blog track, Yang et al. (2007) found that IU collocations worked best as single features. The UMass Amherst Linguistics Sentiment Corpora (Constant, Davis, Potts, & Schwarz, 2009; Potts & Schwarz, 2008) consist of unigrams and bigrams gathered from online book reviews on Amazon[8] and online hotel reviews on TripAdvisor.[9] For each n-gram, total occurrence is reported for each review rating on an ordinal scale of 1 to 5, with 1 indicating a highly negative review and 5 indicating a highly positive review. In order to pick opinion n-grams, bigrams were excluded if they: contained domain stop words (e.g., book, hotel); occurred frequently at all rating levels; occurred more often at neutral ratings than at either positive or negative ratings; or contained digits or fewer than 3 characters. Only those n-grams appearing in both Amazon book reviews and TripAdvisor hotel reviews were retained.

Altogether, nine domain-independent opinion lexicons were utilized: appraisal semantic oriented adjectives,[10] gradable and semantic oriented Colin adjectives, dynamic adjectives,[11] IGI semantic oriented adjectives,[12] Wilson subjective terms,[13] Levin’s opinion-related verb class terms, FrameNet opinion-related category labels, IU collocations, and review bigrams.

3.3. Data Preprocessing

All words in datasets were converted to lower case, and numbers were replaced with the placeholder “#”. Unigrams and bigrams were generated for each sentence, and common stop words such as articles and prepositions were removed from unigrams. No stemming was conducted since the literature shows no clear gain from stemming in opinion detection; stemming may actually erase subtle opinion cues such as past tense verbs. For each sentence, nine lexicon scores were assigned, with each score corresponding to the total number of occurrences of terms from one particular lexicon.

As illustrated in Figure 2, each dataset was randomly split into three portions: 5% of the sentences were reserved as the evaluation set (E) and were withheld from all training runs; 90% were treated as unlabeled data (U); and i% (i = 1, 2, 3, 4 or 5) were treated as labeled data (L).

3.4. Experiment Design

In the experiments reported here, opinion detection was treated as a binary classification problem with two categories: subjective sentences (i.e., positive examples, or p) and objective sentences (i.e., negative examples, or n).

Two groups of experiments were conducted. One group of experiments applied self-training only with the target data domain to investigate the overall feasibility and effectiveness of self-training in opinion detection. The other group of experiments used opinion-labeled data from non-target data domains to examine the applicability of self-training for domain adaptation.

3.4.1 Design of Experiment 1: Basic Self-training

In order to test the effectiveness of self-training with respect to the number of available labeled data, each self-training opinion classifier was trained on the labeled dataset L (i% of the data) and the unlabeled dataset U. The corresponding baseline supervised opinion classifier was constructed using only L, and the fully supervised opinion classifier was constructed by treating all data in U and L as labeled

8 http://www.amazon.com/
9 http://www.tripadvisor.com/
10 The appraisal adjectives can be downloaded from http://lingcog.iit.edu/arc/appraisal_lexicon_2007a.tar.gz
11 The gradable and semantic oriented Colin adjectives and the dynamic adjectives can be downloaded from http://www.cs.pitt.edu/~wiebe/pubs/coling00/coling00adjs.tar.gz
12 The IGI words can be accessed at http://www.wjh.harvard.edu/~inquirer/inqdict.txt. Positive and negative words were extracted.
13 The Wilson subjective terms are included in the OpinionFinder package available at http://www.cs.pitt.edu/mpqa/opinionfinderrelease/. Strong subjective terms were extracted.
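The preprocessing steps of Section 3.3 (lowercasing, replacing numbers with “#”, generating unigrams and bigrams, and removing stop words from unigrams only) can be illustrated with a minimal Python sketch. The tokenizer, the short stop-word list, and the choice to build bigrams before stop-word removal are assumptions made for illustration; the study does not specify them.

```python
import re

# Hypothetical stop-word list (articles and prepositions); the exact list
# used in the study is not given.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "for", "with"}


def preprocess(sentence):
    """Return (unigrams, bigrams) for one sentence."""
    text = sentence.lower()
    # Replace every run of digits with the placeholder "#".
    text = re.sub(r"\d+", "#", text)
    # Simple tokenizer (an assumption): alphabetic words, optionally with an
    # apostrophe part, or the "#" placeholder.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?|#", text)
    # Bigrams come from the full token sequence (assumed); stop words are
    # dropped from the unigrams only.
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    unigrams = [t for t in tokens if t not in STOP_WORDS]
    return unigrams, bigrams


unigrams, bigrams = preprocess("I returned the lens after 2 weeks")
```

No stemming step appears in the sketch, in line with the decision reported in Section 3.3.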


[Figure 2 (diagram): the dataset is divided into a Labeled Set, an Unlabeled Set, and an Evaluation Set (5%); the panels show which portions are used by Baseline Supervised Learning, Semi-Supervised Learning, and Fully Supervised Learning.]

Fig. 2 Data Split for Semi-supervised Learning Runs, Baseline Supervised Learning Runs and Fully Supervised Learning Runs
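The split shown in Figure 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_dataset`, the fixed random seed, and the use of integer division for the portion sizes are hypothetical choices, not details taken from the study.

```python
import random


def split_dataset(sentences, i, seed=0):
    """Split sentences into evaluation (5%), unlabeled (90%), and labeled (i%)."""
    if i not in (1, 2, 3, 4, 5):
        raise ValueError("i must be 1, 2, 3, 4 or 5")
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # random assignment to portions
    total = len(shuffled)
    n_eval = total * 5 // 100
    n_unlabeled = total * 90 // 100
    n_labeled = total * i // 100
    evaluation = shuffled[:n_eval]
    unlabeled = shuffled[n_eval:n_eval + n_unlabeled]
    labeled = shuffled[n_eval + n_unlabeled:n_eval + n_unlabeled + n_labeled]
    return evaluation, unlabeled, labeled


# With 10,000 sentences and i = 1 the portions have 500, 9,000, and 100 items.
E, U, L = split_dataset(range(10000), i=1)
```

For a 10,000-sentence dataset such as the movie reviews, these sizes agree with the 100 labeled and 9,000 unlabeled sentences reported for the 1% setting.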

data. Performance of each self-training run was compared with the performance of both the baseline SL run and the full SL run.

Although both SVM and Naïve Bayes algorithms are widely used for document classification, the Naïve Bayes classifier was selected as the base classifier for this study because preliminary experiments showed that, even with a logistic model to output probability scores for the SVM classifier, the difference in probabilities is too small to select a small number of top classification predictions. The multinomial Naïve Bayes classifier in Weka (Hall, Frank, Holmes, Pfahringer, Reutemann & Witten, 2009) was used to run all the experiments.

For each sentence, both unigrams and bigrams were extracted as classification features. Higher order n-grams (i.e., n>=3) were not used because effective high order n-grams cannot be extracted from a small labeled dataset. Binary values (i.e., presence or absence) were applied for these features.

Other parameter settings included: (1) for all self-training runs, iterations stopped when there were no more unlabeled data; (2) for each iteration, a number of unlabeled examples u, smaller than U, were randomly extracted from the unlabeled dataset U for classifiers to predict opinion labels; and (3) for each iteration, opinion examples (p) and non-opinion examples (n) were added back to the labeled dataset. The ratio between p and n approximates the distribution of opinions and non-opinions in the labeled dataset. The optimized parameter values established in this group of experiments were used for the next group of experiments.

3.4.2 Design of Experiment 2: Domain Adaptation

Because movie review data are often labeled with sentiment classes and are reported to achieve great classification accuracy in the sentiment analysis literature, they were treated as the source data, while datasets for news articles and blog posts were treated as target data.

While the data split for the target domain was the same as that used in previous experiments, all sentences in the source domain, except for the 5% evaluation data, were treated as labeled data. For example, in order to identify opinion-bearing sentences from the blog dataset, all 9,500 movie review sentences and i% of blog sentences were used as labeled data, 90% of blog sentences were used as unlabeled data, and 5% were reserved as evaluation data. In addition, a parameter was added to gradually reduce the weight of non-blog examples in the training set during iterations, similar to the approach taken by Tan et al. (2009). To reduce bias caused by features specific to one non-target data domain, labeled data from two different non-target data domains were combined as training data for both supervised and semi-supervised learning algorithms (i.e., in co-training, two view classifiers were trained on two non-target domains).

In order to compare the benefits of employing


non-target labeled data to the benefits of using general opinion lexicons to deal with the domain transfer problem, another set of domain adaptation experiments used general opinion lexicons instead of borrowing opinion labeled sentences from other domains. In addition to the n-gram features, SL and SSL runs in this set used features from nine opinion lexicons to represent each in-domain sentence.

3.5. Evaluation Measures

Classification accuracy was used as the evaluation measure when comparing SSL and SL runs. Classification accuracy evaluates the overall correctness of a classifier and is calculated using the formula ACC = (a+d)/(a+b+c+d), where a and d are the numbers of correctly classified sentences in the two classes and b and c are the numbers of misclassified sentences.

In addition, two measures were adopted to determine whether performance increased when more unlabeled data were used and whether the contribution of unlabeled data decreased with the increase in available labeled data, as suggested in most SSL studies (e.g., Nigam & Ghani, 2000).

4. RESULTS AND DISCUSSION

4.1. Preliminary Experiments

Self-training runs with various parameter settings were conducted on TREC’s blog data to evaluate the impact of different experimental settings and to determine optimized parameters for all self-training runs.

4.1.1. Feature Selection

Two popular feature selection methods, information gain (IG) and chi-square (CHI), were investigated. When keeping all other parameters fixed and selecting the top 100 features, neither feature selection method contributed to SSL performance with labeled data from 1% to 5% of the total dataset. Because feature selection consumes computing time, especially when a new classification model must be built for each iteration, no feature selection was conducted for the subsequent experiments.

4.1.2. Unlabeled Data Available for Each Iteration

To decide how many unlabeled sentences u should be available to the classifier on each iteration, experiments were designed using 20, 75, 100 and all unlabeled sentences (i.e., approximately 1660 sentences). By computing the average improvement of self-training runs over corresponding baseline SL runs with 1% to 5% labeled data, it was found that self-training runs classifying all unlabeled sentences on each iteration decreased classification accuracy by 4.67%; self-training runs classifying 100 unlabeled sentences on each iteration increased baseline performance by 2.08%; self-training runs classifying 75 unlabeled sentences on each iteration did not improve baseline performance; and self-training runs classifying only 20 unlabeled sentences on each iteration increased baseline performance by 4.18%. For the following experiments, u was set to 20.

After p auto-labeled opinion sentences and n auto-labeled non-opinion sentences were selected and added to the labeled dataset, p+n unlabeled sentences can be drawn from U to replenish u or a new set of u can be generated from U. Experiments using TREC’s blog dataset indicated that replenishing u outperformed generating a new set of u by 11.87% in terms of classification accuracy. One explanation is that, for succeeding iterations, replenishing u kept those unlabeled sentences for which the classifier generated low prediction scores in the current iteration and forced the classifier to reclassify difficult sentences, while generating a new set of u allowed the classifier to select sentences that were easy to classify.

On the one hand, in order to avoid mislabeled data in the labeled dataset, only the most confidently labeled data should be selected, and a small value for p and n would be preferred. On the other hand, in order to reduce the number of iterations necessary for SSL to converge, a larger value for p and n would be preferred. Preliminary experiments compared the results of setting p and n either to one or to two and found no noticeable difference. For this reason, p and n were set at two for all experiments.

4.2. Basic Self-training

The first experiment examined the effectiveness of self-training that used only in-domain data. For the movie review, news, and blog data domains, the


performance of self-training was compared with the performance of SL runs, which used the same number of labeled sentences as well as those that used all data as labeled sentences.

Table 1 shows the classification accuracy of self-training and two supervised learning runs for movie reviews. The more labeled data provided for the baseline SL runs, the better the performance: With 100 labeled sentences, the baseline SL run achieved classification accuracy of only 63.80%; but with 500 labeled sentences, the supervised learning classifier achieved classification accuracy of 80.20%. The second row shows the performance of the simple self-training method using 100 to 500 labeled sentences and an additional 9000 unlabeled sentences. These self-training runs improved performance over the corresponding baseline supervised runs: For example, using 100 labeled sentences, self-training achieved a classification accuracy of 85.2% and outperformed the baseline SL by 33.5% in relative terms. Although the full SL run using all labeled data surpassed the simple self-training run by about 5.6% in relative terms (90.00% versus 85.20%), significant effort was saved by labeling only 100 sentences rather than 9,100. If approximately 30 seconds are needed to label each sentence, self-training saves 225 hours of labor of three human annotators; and if they are paid $15/hour, this saves almost $3,400. With 500 labeled sentences, self-training improved accuracy over the baseline supervised run by 6%, indicating that self-training is particularly beneficial when the number of labeled data is small.

Table 1. Classification Accuracy (%) of Self-training and SL Runs for Movie Reviews

                  # of Original Labeled Sentences
Run Type          100     200     300     400     500
Baseline SL       63.80   73.60   77.20   79.40   80.20
Self-training     85.20   86.60   87.00   87.20   85.20
Full SL           90.00   92.00   91.80   91.60   91.80

Note. Settings for self-training: u=20, p=2, n=2, n-grams=unigrams+bigrams. Full SL runs used an additional 9000 labeled sentences.

Overall, self-training, which iteratively labeled unlabeled data with one classifier, was effective for movie reviews and achieved performance close to fully supervised learning while saving the labor involved in labeling thousands of unlabeled sentences. Because news articles follow similar patterns, their results will not be shown here.

As shown in Table 2, none of the self-training runs proved beneficial in the blog domain. This is because the blog data domain is even more challenging than the news domain. The language used in blog posts is more informal than the language of the other two data domains, and blog writing contains a variety of opinion cues not found in movie reviews or news writing. Furthermore, because the JDPA blog data are focused on reviews of cars and cameras, opinion and non-opinion sentences share topic-related features; moreover, the average length for opinion and non-opinion sentences in blog posts is 17 words, shorter than that for movie reviews (23.5 words) or news articles (22.5 words). In fact, approximately one quarter of the sentences in the blog dataset had only 5 to 10 words. This poses an additional challenge because there is less information for the classifier in terms of the number of individual features.

With limited labeled data, the results of these experiments suggest that self-training can make effective use of unlabeled data for opinion detection in certain data domains (e.g., movie reviews) but not in others (e.g., news and blog data). One reason for the failure of self-training in the blog domains is the low classification accuracy of initial runs: The


Table 2. Classification Accuracy (%) of Self-training and SL Runs for Blog Posts

                  # of Original Labeled Sentences
Run Type          86      172     258     344     430
Baseline SL       55.05   58.95   61.93   64.69   66.06
Self-training     54.59   55.73   56.65   58.49   64.45
Full SL           71.56   73.17   72.71   72.94   72.48

Note. Settings for self-training: u=20, p=2, n=2, n-grams=unigrams+bigrams. Full SL runs used an additional 7740 labeled sentences.
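The self-training procedure behind the runs in Tables 1 and 2 (a pool of u unlabeled sentences, the p most confident opinion and n most confident non-opinion predictions moved into the labeled set on each iteration, and the pool replenished until no unlabeled data remain) can be sketched as follows. This is a simplified illustration, not the study's Weka setup: `TinyNaiveBayes` is a hypothetical stand-in for the multinomial Naïve Bayes classifier, and ranking the pool by the predicted opinion probability is an assumed way of choosing the most confident predictions. The sketch assumes p and n are at least 1.

```python
import math
import random
from collections import Counter


class TinyNaiveBayes:
    """Hypothetical minimal Naive Bayes over binary (presence) features."""

    def fit(self, docs, labels):
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.prior}
        for feats, y in zip(docs, labels):
            self.counts[y].update(set(feats))  # presence, not frequency
        self.vocab = set().union(*self.counts.values())
        return self

    def prob_opinion(self, feats):
        """P(label = 1 | feats) with add-one smoothing."""
        n_docs = sum(self.prior.values())
        log_scores = {}
        for c, feat_counts in self.counts.items():
            denom = sum(feat_counts.values()) + len(self.vocab)
            score = math.log(self.prior[c] / n_docs)
            for f in set(feats) & self.vocab:
                score += math.log((feat_counts[f] + 1) / denom)
            log_scores[c] = score
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        return weights.get(1, 0.0) / sum(weights.values())


def self_train(labeled, unlabeled, u=20, p=2, n=2, seed=0):
    """labeled: list of (features, label), 1 = opinion, 0 = non-opinion."""
    rng = random.Random(seed)
    labeled = list(labeled)
    reserve = list(unlabeled)
    rng.shuffle(reserve)
    pool = [reserve.pop() for _ in range(min(u, len(reserve)))]
    while True:
        docs, ys = zip(*labeled)
        clf = TinyNaiveBayes().fit(docs, ys)
        if not pool:  # stop when no unlabeled data remain
            return clf, labeled
        ranked = sorted(pool, key=clf.prob_opinion)
        negatives, rest = ranked[:n], ranked[n:]   # lowest opinion scores
        positives = rest[-p:] if rest else []      # highest opinion scores
        pool = rest[:len(rest) - len(positives)]   # hard cases stay in the pool
        labeled += [(f, 0) for f in negatives] + [(f, 1) for f in positives]
        while len(pool) < u and reserve:           # replenish u from U
            pool.append(reserve.pop())


seed_data = [(["great", "love"], 1), (["plot", "summary"], 0)]
raw = [["great", "movie"], ["plot", "story"], ["love", "it"], ["summary", "film"]]
clf, final_labeled = self_train(seed_data, raw, u=4, p=1, n=1)
```

Sentences that are not selected stay in the pool and are scored again on the next iteration, which corresponds to the “replenish u” option that Section 4.1.2 reports as the better-performing choice.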

performance of blog baseline classifiers was only slightly better than chance (50%) and decreased the quality of auto-labeled data.

4.3. Domain Adaptation

In order to deal with challenging data domains such as blog posts, one possible solution is to improve baseline accuracy for self-training by introducing high-quality features: for example, augmenting the feature set with domain independent opinion lexicons such as those which have been suggested as effective in creating high precision opinion classifiers. An alternative approach for dealing with challenging data domains is to borrow labeled data from one or more “easy” domains: for example, the use of movie review data in self-training applications for opinion detection in news article and blog domains.

4.3.1. Using Domain-Independent Opinion Lexicons

In addition to unigram and bigram features with binary values, nine lexicon features were added to the feature set. To avoid the possibility that the large number of n-gram features would weaken these nine lexicon features, the value of each lexicon feature (e.g., dynamic adjectives) was not binary but represented the total number of matches between lexicon terms and the words in a target sentence. For example, the value of Wilson lexicon features for the sentence “I like these two much better than the versions made for the Hong Kong market” is two because two Wilson lexicon terms, ‘like’ and ‘better,’ are used in this sentence. Redundancy between lexicons was not removed under the assumption that one word occurring in multiple lexicons makes it a strong opinion indicator. For example, ‘like’ is included in the Levin verb class lexicon, the FrameNet lexicon, and the Wilson lexicons, and its occurrences were counted when calculating values for all three lexicon features.

In Table 3 and Table 4, the baseline supervised learning runs using domain-independent opinion lexicon features (i.e., Baseline SL w/ Lexicon) produced higher classification accuracies than baseline supervised learning runs that did not use lexicon features (i.e., Baseline SL w/o Lexicon). However, self-training runs that used opinion lexicons (i.e., Self-training w/ Lexicon) did not generally improve the baseline run (i.e., Baseline SL w/ Lexicon); in some cases, performance was even lower than that of the corresponding self-training runs that did not use domain-independent opinion lexicon information (i.e., Self-training w/o Lexicon). For example, using opinion lexicon features with 86 labeled blog sentences, supervised learning yielded a classification accuracy of 63.76%, 8.71% higher in absolute value than the classification accuracy produced by the supervised learning run that made no use of opinion lexicon features; however, after self-training iterations, the performance of the former run decreased to 51.38%, 3.21% lower in absolute value than the accuracy of the corresponding self-training run without lexicon features (54.59%). This may be because, as a closer look at the distribution of opinion lexicon terms in the three datasets indicates, many opinion lexicon terms actually occur frequently in objective, non-opinion sentences.

Table 5 shows the number of unique opinion lexicon terms that appear in subjective and objective


Table 3. Classification Accuracy (%) of Self-training With and Without Opinion Lexicon Features for News Articles

                            # of Original Labeled Sentences
Run Type                    103     206     309     412     515
Baseline SL w/o Lexicon     60.50   64.31   69.47   69.47   71.38
Self-training w/o Lexicon   60.11   65.84   66.41   67.75   67.56
Baseline SL w/ Lexicon      66.60   70.42   70.99   72.14   72.52
Self-training w/ Lexicon    59.73   66.41   71.18   70.61   70.61

Note. Settings for self-training: u=20, p=2, n=2, n-grams=unigrams+bigrams.

Table 4. Classification Accuracy (%) of Self-training With and Without Opinion Lexicon Features for Blog Posts

                            # of Original Labeled Sentences
Run Type                    86      172     258     344     430
Baseline SL w/o Lexicon     55.05   58.95   61.93   64.69   66.06
Self-training w/o Lexicon   54.59   55.73   56.65   58.49   64.45
Baseline SL w/ Lexicon      63.76   64.68   63.53   66.51   67.89
Self-training w/ Lexicon    51.38   62.16   55.73   61.47   69.04

Note. Settings for self-training: u=20, p=2, n=2, n-grams=unigrams+bigrams.
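The lexicon features used in the “w/ Lexicon” runs are counts of matches between lexicon terms and the words of a sentence, with one count per lexicon. A minimal Python sketch of that counting is given below. The three miniature lexicons are hypothetical stand-ins containing only a few terms; the real lexicons are far larger, and pattern lexicons such as IU collocations and review bigrams are not covered here.

```python
import re

# Hypothetical miniature lexicons for illustration only.
LEXICONS = {
    "wilson": {"like", "better", "horrible"},
    "levin_verbs": {"like", "hate", "admire"},
    "framenet": {"like", "agree"},
}


def lexicon_features(sentence, lexicons=LEXICONS):
    """Return one count-valued feature per lexicon (not binary)."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    # A word found in several lexicons is counted once for each of them,
    # since redundancy between lexicons is deliberately kept.
    return {
        name: sum(1 for t in tokens if t in terms)
        for name, terms in lexicons.items()
    }


features = lexicon_features(
    "I like these two much better than the versions made for the Hong Kong market"
)
```

For the example sentence from Section 4.3.1, the Wilson count is two (“like” and “better”), and “like” also contributes one match each to the other two toy lexicons.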

data in the three data domains as well as the total occurrence of opinion lexicon terms in subjective and objective sentences. Although opinion lexicon terms are used more often in opinion sentences than in non-opinion sentences, their presence does not appear to be a strong indicator of opinions. For example, more than half of the opinion lexicon features that appear in opinion blog sentences also appear in non-opinion blog sentences. When considering their total occurrence, opinion lexicon terms are used in opinion sentences approximately three times as often as in non-opinion sentences in both the blog and news data domains; opinion lexicon terms are used in non-opinion sentences a little more than half as often as they are in opinion sentences in the movie review domain. This suggests that automatically created subjective and objective movie review data will not necessarily reflect opinion and non-opinion classes.

Table 5. Distribution of Domain Independent Opinion Lexicons

                    Movie Reviews      News Articles      Blog Posts
# of Matches        Non-Op   Op        Non-Op   Op        Non-Op   Op
Unique Terms        1076     1428      502      1127      459      753
Total Occurrence    4867     8596      1865     5576      1778     4668

Note. Non-Op=non-opinion; Op=opinion.
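The ratios discussed in the text can be recomputed directly from the counts in Table 5. The short Python sketch below does so; the dictionary layout and the function name `occurrence_ratios` are illustrative choices, while the numbers are copied from the table.

```python
# Counts from Table 5 as (non-opinion, opinion) pairs.
TABLE_5 = {
    "movie reviews": {"unique": (1076, 1428), "total": (4867, 8596)},
    "news articles": {"unique": (502, 1127), "total": (1865, 5576)},
    "blog posts": {"unique": (459, 753), "total": (1778, 4668)},
}


def occurrence_ratios(table):
    """Ratio of total lexicon-term occurrences: opinion / non-opinion."""
    ratios = {}
    for domain, rows in table.items():
        non_op, op = rows["total"]
        ratios[domain] = op / non_op
    return ratios


for domain, ratio in occurrence_ratios(TABLE_5).items():
    print(f"{domain}: opinion/non-opinion = {ratio:.2f}")
```

The ratio is close to 3 for news articles, somewhat lower for blog posts, and well under 2 for movie reviews, whose inverse (non-opinion over opinion) is a little more than one half.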


The ineffectiveness of opinion lexicons can be attributed to the fact that opinion features are often very sensitive to the context in which they occur. For example, “like” is included in three opinion lexicons and is therefore treated as a good opinion indicator, but when it is used in the sentence “the lens cap finally snaps into the front of the lens like other makers’ models,” it is no longer an opinion indicator. As a result, when there was a limited number of labeled data at the beginning of a self-training run, extra opinion lexicon features helped; however, with more and more unlabeled data labeled automatically and used to replenish the labeled dataset, the limitations of opinion lexicons were amplified, undermining overall performance.

4.3.2. Using Labeled Data in Non-Target Domain

A preliminary experiment on the use of movie review data was conducted on the news domain. This analysis was followed by a more in-depth investigation of the use of movie review data in the blog data domain.

From Movie Reviews to News Articles

This experiment tested an extreme situation where there were no labeled data available in the target data domain. To begin, 9,500 labeled movie review sentences were used to train a Naïve Bayes classifier. Although this classifier produced a fairly good classification accuracy of 89.2% on movie review data, its accuracy in a domain-transfer SL run on news data was poor (64.1%), demonstrating the severity of the domain transfer problem.

A self-training run starting with the same Naïve Bayes classifier trained on movie review data and using unlabeled data from the news domain (i.e., a domain-transfer SSL run) showed some improvement, achieving a classification accuracy of 75.1% that surpassed the domain-transfer SL run by more than 17% in relative terms with no extra effort for manual annotation. To further understand how well SSL handles the domain transfer problem, a full SL run that used all labeled news sentences was also performed. This full SL run achieved 76.9% classification accuracy, only 1.8% higher in absolute value than the domain-transfer SSL run, which had not used any labeled news data.

From Movie Reviews/News to Blog Posts

Domain transfer self-training runs for blog data combined all movie review data and i% labeled blog data to form the initial labeled dataset, and then followed the traditional self-training procedure. A control factor was introduced and investigated to gradually reduce the impact of out-of-domain data (i.e., movie reviews) on each iteration.

Table 6 reports the results of self-training runs to identify opinion sentences in blog posts, both with and without the use of movie review data, as well as corresponding baseline and fully supervised learning runs. The results for baseline SL runs without movie reviews and self-training without movie reviews show that self-training using only blog data decreases baseline SL performance. By keeping the same settings and adding more labeled data from the movie review domain, self-training with movie reviews increased the performance of SL runs by 12% to 15% and came closer to the performance of full SL runs, which used 90% of the labeled blog data. In the case of domain transfer runs, the number of available in-domain labeled data did not appear to have an impact on overall performance: neither supervised nor semi-supervised runs using movie review data produced higher classification accuracies with increasing numbers of labeled blog sentences. For example, the self-training run using movie review data yielded the same classification accuracy of 71.10% with as few as 86 or as many as 430 labeled blog sentences in the original training set. This may be due to the preponderance of movie review data available during training.

A control factor intended to reduce the bias of movie review data was added to weaken the effects of domain transfer gradually (i.e., a decrease of 0.001 on each iteration). The results reported for self-training runs with both movie review data and weight control show that these runs outperformed runs that did not use weight control by 1% to 3%, reaching and occasionally exceeding the performance of the full SL run.

Overall, for high-challenge data domains, adoption of domain independent opinion lexicons resulted in only minimal improvement, but applying simple self-training alone was promising for tackling domain transfer from the source domain of


Table 6. Classification Accuracy (%) of Self-training With and Without Labeled Movie Reviews

                                # of Original Labeled Sentences (Blog)
Run Type                        86      172     258     344     430
Baseline SL                     55.05   58.95   61.93   64.69   66.06
Self-training                   54.59   55.73   56.65   58.49   64.45
Baseline SL w/ m.r.             63.07   62.16   62.61   62.16   61.70
Self-training w/ m.r.           71.10   70.87   71.41   70.41   71.10
Self-training w/ m.r. w/ w.c.   72.94   72.94   72.48   71.56   71.79
Full SL                         71.56   73.17   72.71   72.94   72.48

Note. m.r. = Movie reviews. w.c. = Weight Control. Settings for self-training: u=20, p=2, n=2, n-grams=unigrams+bigrams. Full SL used an additional 7740 labeled blog sentences.
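The weight control (w.c.) setting in Table 6 lowers the influence of movie review examples by 0.001 on each self-training iteration. One possible reading of that schedule is sketched below in Python. Only the step of 0.001 per iteration comes from the text; the starting weight of 1.0, the linear subtraction, the floor at zero, and both function names are assumptions made for illustration.

```python
def source_weight(iteration, initial=1.0, step=0.001):
    """Weight of out-of-domain (movie review) examples at a given iteration.

    The initial weight and the floor at 0.0 are assumed; only the per-iteration
    decrease of 0.001 is stated in the text.
    """
    return max(0.0, initial - step * iteration)


def example_weights(is_source, iteration):
    """Per-example training weights: in-domain examples keep weight 1.0."""
    w = source_weight(iteration)
    return [w if src else 1.0 for src in is_source]


# Two movie review examples and one blog example at iteration 100.
weights = example_weights([True, True, False], iteration=100)
```

Such weights could be passed to any classifier that accepts per-example weights when the model is retrained on each iteration.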

movie reviews to the target domains of news articles and blog posts. Supported by the opinion feature distribution statistics in Table 5, one possible explanation for the success of movie reviews in helping to classify opinions in news articles and blogs is the rich opinion features in this data domain.

5. CONCLUSION

Sentiment is an important aspect of many types of information, and being able to identify and organize sentiments is essential for information studies. The shortage of labeled data has become a severe challenge for developing effective sentiment analysis systems. This study tackled this challenge by investigating a semi-supervised learning (SSL) approach, motivated by limited labeled data and the availability of plentiful unlabeled data. Specifically, this research investigated self-training strategies in dealing with the domain transfer problem by learning from unlabeled data in the target domain and labeled data in non-target domain(s).

To understand the feasibility and effectiveness of SSL for sentiment analysis, self-training was applied to three datasets from domains with different characteristics (i.e., movie reviews, news articles, and blog posts), and its performance varied across domains. For movie reviews, all self-training runs showed the advantage of using unlabeled data for opinion detection with both time and cost benefits. Due to the nature of the movie review data, opinion detection in movie reviews is an “easy” problem because it involves genre classification and thus relies, strictly speaking, on distinguishing movie reviews from plot summaries. For other manually created datasets that are expected to reflect real sentiment characteristics, self-training was impeded by low baseline precision and demonstrated only limited improvement. Blog posts are the most challenging domain, and blog data showed no benefits from implementing self-training. However, with the addition of out-of-domain labeled data (i.e., movie review data), self-training for identifying opinion sentences in blogs reached, and in some runs exceeded, the performance of fully supervised learning using all available labeled blog data. This promising result suggests great value in further exploration of SSL for domain adaptation, especially because of its easy implementation.

The contributions of this research are four-fold. First, the findings of this research indicate a general approach that can be adapted for use in existing sentiment analysis systems across data domains and across languages. These findings also provide valuable guidelines and evaluation baselines for later studies applying SSL algorithms in sentiment analysis. Second, there are several applications for automatically labeled data generated by the effective SSL strategies reported in this research: creating sentiment labeled corpora directly; providing candidates


for manual annotation; and extracting sentiment-bearing features. Third, the SSL strategies investigated in this research, especially those related to domain adaptation, are readily extensible to other text mining systems (e.g., genre identification). Finally, this research contributes to SSL research by expanding the spectrum of SSL applications to include sentiment analysis, confirming the effectiveness of SSL as a general approach for dealing with insufficient quantities of labeled data, and providing promising new approaches for domain adaptation.

Research Paper
JISTaP http://www.jistap.org
J. of infosci. theory and practice 1(1): 27-41, 2013
Journal of Information Science Theory and Practice
http://dx.doi.org/10.1633/JISTaP.2013.1.1.2

Bibliometric Approach to Research Assessment: Publication Count, Citation Count, & Author Rank

Kiduk Yang*
Department of Library and Information Science
Kyungpook National University, Republic of Korea
Email: [email protected]

Jongwook Lee
School of Library and Information Studies
Florida State University, USA
Email: [email protected]

ABSTRACT

We investigated how bibliometric indicators such as publication count and citation count affect the assessment of research performance by computing various bibliometric scores of the works of Korean LIS faculty members and comparing the rankings by those scores. For the study data, we used the publication and citation data of 159 tenure-track faculty members of Library and Information Science departments in 34 Korean universities. The study results showed correlation between publication count and citation count for authors with many publications but the opposite evidence for authors with few publications. The study results suggest that as authors publish more and more work, citations to their work tend to increase along with publication count. However, for junior faculty members who have not yet accumulated enough publications, citations to their work are of great importance in assessing their research performance. The study data also showed that there are marked differences in the magnitude of citations between papers published in Korean journals and papers published in international journals.

Keywords: Bibliometrics, Citation Analysis, Author Rank, Research Assessment

1. INTRODUCTION

In the fairy tale of “Snow White,” the evil queen asks the magic mirror the following question: “Mirror, mirror, on the wall, who is the fairest of them all?” From a research perspective, this is a loaded question that invites consideration of a range of issues on quality assessment. How do we assess beauty? Is it quantifiable? Is there an objective standard for beauty? After all, isn’t beauty in the eye of the beholder? Research assessment, being ultimately an exercise in quality assessment, shares much in common with assessment of beauty, although one may argue that research is much more tangible

Received date: November 25, 2012
Accepted date: February 22, 2013

*Corresponding Author: Kiduk Yang
Associate professor
Department of Library and Information Science
Kyungpook National University, Republic of Korea
E-mail: [email protected]

Open Access
All JISTaP content is Open Access, meaning it is accessible online to everyone, without fee and authors’ permission. All JISTaP content is published and distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/). Under this license, authors reserve the copyright for their content; however, they permit anyone to unrestrictedly use, distribute, and reproduce the content in any medium as far as the original authors and source are cited. For any reuse, redistribution, or reproduction of a work, users must clarify the license terms under which the work was produced.

Kiduk Yang, Jongwook Lee, 2013 27 JISTaP Vol.1 No.1, 27-41

than beauty and therefore easier to quantify.

When we evaluate a researcher, however, we often assess the person for his or her research potential rather than simply basing our judgments on the one-dimensional examination of existing research outcome. The determination of research potential encompasses consideration of three key aspects: capability, experience, and impact. The main component of research capability is the researcher’s knowledge and skill set, which is accrued initially by education and then by experience. Although its impact is secondary, the computational, organizational, and operational support infrastructure of the organization the researcher is affiliated with also influences research potential. The third component of research capability is networking ability, which grows in importance as collaboration becomes the norm rather than the exception in a modern day research environment.

Research experience, with the typical lifecycle of grant proposal, project management, and publication, is directly related to research capability in that it is the natural outcome of research capability. Research impact, on the other hand, is not necessarily proportional to research capability and experience. For instance, a capable researcher with plenty of experience may not have as much impact as a young researcher on the trail of a hot topic. If the research experience is the quantitative outcome of research capability, the research impact is the consequence of research quality and significance. Just as research capability and experience reinforce each other, research impact and experience feed off one another. The impact of a researcher, demonstrated by citations to and extension of his or her work in related studies, helps the researcher to obtain funding, which fuels his or her research productivity and thus increases the impact potential.

Among the three aspects of research potential, experience and impact are more readily measurable than capability since they are tangible outcomes of research activity as opposed to qualitative conditions for producing those outcomes. In fact, research experience and impact are components of research performance, which is typically the main target of assessment in bibliometric analysis. The bibliometric approach to research assessment, however, does not adequately capture all the facets of research performance. In bibliometric analysis, research quantity, i.e., how much research has been done, is usually measured in terms of the number of publications the research generates, and impact, i.e., how significant the research contribution is, is approximated by the number of citations to the publications that the research produces. To properly assess the quality of research, however, one must look at not only the count of citations but also the sources and contexts of citations so that the true impact of research that each citation implies can be ascertained.

1.1. Research Assessment Metrics

There are several research assessment metrics which are often used in bibliometric analysis. Citation count, i.e., the number of citations to a publication, is a document-level measure used to approximate the impact or importance of a paper, whereas publication count, i.e., the number of publications, is an author-level measure that represents an author’s research productivity. Another author-level measure that takes into consideration both the impact and productivity of a researcher is h-index (Hirsch, 2005). h-index is computed by sorting the publications of a given author in descending order of citation count and finding the highest rank at which the citation count of the publication is equal to or greater than the rank. In this way, an author with h-index of k is guaranteed to have k papers with at least k citations each. The strength of h-index lies in the fact that it requires many high impact papers to achieve a high score. In other words, neither the authors with many papers that are cited infrequently nor the authors with a few papers that are cited highly will receive high h-index scores.

h-index, however, is not good at differentiating among authors with similar publication and citation patterns but different citation magnitudes. As Table 1 illustrates, authors with higher citation counts at top ranks (e.g., author 1) can get the same h-index as other authors with fewer citations (e.g., author 2) as long as their citation counts near the h-index rank are similar. g-index, proposed by Egghe (2006), compensates for this weakness of


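Table 1. Example: h-index vs. g-index

(Author 1) h-index=5, g-index=8        (Author 2) h-index=5, g-index=6

P    CC    >h    CC+    >g             P    CC    >h    CC+    >g
1    20    1     20     1              1    8     1     8      1
2    10    2     30     4              2    7     2     15     4
3    8     3     38     9              3    6     3     21     9
4    8     4     46     16             4    6     4     27     16
5    5     5     51     25             5    5     5     32     25
6    5     6     56     36             6    5     6     37     36
7    5     7     61     49             7    5     7     42     49
8    4     8     65     64             8    4     8     46     64
9    4     9     69     81             9    4     9     50     81

Note. P = paper rank by descending citation count. CC = citation count. CC+ = cumulative citation count. The >h and >g columns give the thresholds that CC and CC+ must reach at each rank (the rank and the rank squared, respectively).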
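The h-index and g-index values in Table 1 can be reproduced with a short script. The following is a minimal Python sketch, not code from the study: the function names `h_index` and `g_index` are illustrative, and the g-index here is capped at the number of papers an author has, which is an assumption of this sketch.

```python
from itertools import accumulate


def h_index(citation_counts):
    """Highest rank r such that the r-th most cited paper has at least r citations."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cc in enumerate(ranked, start=1):
        if cc >= rank:
            h = rank
        else:
            break
    return h


def g_index(citation_counts):
    """Highest rank r such that the top r papers together have at least r**2 citations.

    Assumption of this sketch: r never exceeds the number of papers.
    """
    ranked = sorted(citation_counts, reverse=True)
    g = 0
    for rank, cumulative in enumerate(accumulate(ranked), start=1):
        if cumulative >= rank ** 2:
            g = rank
    return g


# Citation counts (CC column) of the two authors in Table 1.
author1 = [20, 10, 8, 8, 5, 5, 5, 4, 4]
author2 = [8, 7, 6, 6, 5, 5, 5, 4, 4]

print(h_index(author1), g_index(author1))  # 5 8
print(h_index(author2), g_index(author2))  # 5 6
```

Both authors have the same h-index because their fifth and sixth ranked papers have the same citation counts, while the cumulative counts used by the g-index reflect the larger citation totals of author 1.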

h-index by setting the cutoff at the highest rank at which the cumulative citation count is equal to or greater than the rank squared, thus taking into consideration the total number of citations for highly cited papers.

Another way to assess the quality of publications other than citation count is to consider the venue of publication. A paper published in a high impact journal can be regarded to be of higher quality than a paper published in a low impact journal. The popular metric for assessing the impact of a journal is the impact factor, which is computed by dividing the number of citations in a given year to papers published in a journal during the prior two years by the number of publications in those years (Equation 1). Impact factor, being the average number of citations to a paper for a journal, suffers from the same weakness as the citation counts that estimate the impact of a publication, in that both treat all citations to be of equal importance, which can be a gross oversight in reality.

$$\mathrm{IF} = \frac{\text{citation counts in year } Y \text{ to papers in } Y-1 \text{ and } Y-2}{\text{publication counts in } Y-1 \text{ and } Y-2} \tag{1}$$

Eigenfactor (Bergstrom, 2007) addresses this shortcoming of impact factor by estimating the importance of citing journals in a manner similar to Google’s PageRank approach. As can be seen in Figure 1, d1 and d2 both have two citations but one of the papers that cite d1 is much more important (indicated by its size) due to its own citation counts. Eigenfactor captures this property with a recursive link propagation algorithm (Equation 2). Unfortunately, computation of recursive link propagation measures such as Eigenfactor requires a complete set of the citation network, which is impractical if not impossible for most people. The computation, not to mention data collection, needed to apply an Eigenfactor-like algorithm at an author- or document-level is prohibitively complex, so it is doubtful whether such measures can be computed in a dynamic fashion even with inside access to citation databases such as Web of Science and Scopus.

Fig. 1 Link Propagation Example

$$R(P) = \sum_{i=1}^{k} \frac{R(P_i)}{C(P_i)} \tag{2}$$

1.2. Challenges for Research Assessment

Like the assessment of beauty, research assessment is an inherently subjective task that strives for objective standards by means of commonality. The


input, outcome, and methods for research assessment depend on who is evaluating whom for what purpose and in what context. At the same time, research assessment should be a consistent and methodological process that produces a valid and robust outcome. Consequently, the first issue in research assessment is whether to take a qualitative approach, in which subjective criteria suitable for the purpose and context of assessment can be applied to evaluate research performance in a comprehensive manner, or to take the quantitative approach that employs standard evaluation metrics to generate comparable assessment outcomes.

The qualitative approach has the advantages of human judgment, flexibility, and customization, but is a resource-intensive process with a lack of standardized criteria and methodology that can lead to inconsistent or biased results. The quantitative approach is a fairly standardized process which can be applied to evaluate and compare a large amount of data in an efficient manner. However, the evaluative outcome, which is based on only those facets of research performance that are readily quantifiable, is neither as holistic nor as personalized as that arrived at by the qualitative approach. Furthermore, the quantitative approach typically does not adequately take into consideration the differences among base units of evaluation (e.g., citations, publications), thus sacrificing the accuracy of assessment for the sake of simplification. On top of these challenges, properly assessing the contribution of each author for a collaborative work is a troublesome undertaking. Estimating author contributions by the order of authorship (i.e., author rank) in a multi-author paper is guesswork at best and does not always correspond to the true contributions that authors put forth to the publication in question.

In addition to the challenges inherent in assessment approaches, there are challenges with the sources of evaluation data, especially for the quantitative approach. The publication and citation data that feed into research performance assessment are collected from citation databases such as Web of Science, but citation databases suffer from lack of comprehensive coverage and standard data inclusion criteria that can lead to inconsistent outcomes (Meho & Yang, 2007; Yang et al., 2012). Moreover, citation databases are not yet very user-friendly for research assessment tasks that require more than raw publication and citation counts.

Faced with these challenges for research assessment, meaningful and consistent analysis of bibliometric data is no trivial task. Until the coverage, quality, and usability of citation databases are significantly enhanced, we must keep in mind that bibliometric indicators are only as reliable as their data sources and the methods employed to produce them. Although the approach to citation database enhancement is one of the core issues in our project (Yang & Meho, 2011), we focused on examining bibliometric measures for research assessment in the current study. Specifically, we investigated how robust different bibliometric indicators are in assessing research performance.

1.3. Study Design

In order to test the reliability and stability of bibliometric indicators (BI) for research assessment, we compared the rankings of faculty members by various BI scores, such as publication count, citation count, and h-index. By comparing the rankings, we hoped to gain insights into the aspects of research performance measured by BIs and to determine how robust the assessment may be.

1.3.1. Study Data

For the study data, we used the publication and citation data of 159 tenure-track faculty members of Library and Information Science (LIS) departments in 34 Korean universities (Yang & Lee, 2012). The study data included 2402 peer-reviewed papers published between 2001 and 2010, 2232 of which were Korean journal papers, 111 international journal papers, and 59 international conference papers. We collected 2811 citations to 871 papers (1531 papers had no citations), 1452 of which were citations to Korean journal papers, 1116 were to 93 international journals, and 243 citations were to 38 international conference proceedings.

We initially compiled the publication list of 146 faculty members from the National Research Foundation’s (NRF) Korean Researcher Information (KRI) system, which was supplemented by 4 author-supplied publication lists and publication information for 9 additional authors from the Korea Institute of Science and Technology Information’s (KISTI) Science and Technology Society Village (STSV) and Nurimedia’s DBPIA citation database. KRI publication data was then validated and supplemented by double-checking with STSV, DBPIA, and Naver’s Scholarly Publication Database service, after which Google Scholar was searched to update the international publication data (e.g., SSCI journal papers).

After all the publication data was compiled, we collected the citation data from KISTI’s Korean Science Citation Index (KSCI) and NRF’s Korea Citation Index (KCI). Since the KCI data appeared to be sparsely populated at the time of data collection, we used the KSCI to obtain citations to the five major Korean LIS journals and used KCI to obtain citations to other miscellaneous journal papers.1 The citations to international publications were collected from Web of Science and Google Scholar. The inclusion criteria for publication were as follows: For Korean publications, only the papers published in KCI journals were included.2 For international publications, only the papers published in peer-reviewed journals as indicated in Ulrich’s Periodicals and peer-reviewed conferences as verified in the conference websites were included.

2. RELATED RESEARCH

Although counting citations to estimate the quality of scholarly publication is fundamental to citation analysis (Garfield, 1979; Smith, 1981; Cronin, 1984), the effectiveness of citation count as a surrogate measure for publication quality has been questioned by researchers (MacRoberts & MacRoberts, 1996; Seglen, 1998). Limitations reported in literature range from the problems associated with limited data sources, sparse coverage of non-English publications, and omission of citations from non-journal sources (e.g., books, conferences), to many technical problems dealing with synonyms, homonyms, and authority control (Funkhouser, 1996; Meho & Yang, 2007; Seglen, 1998).

Meho and Yang (2007) conducted a citation study that further demonstrated the necessity of using multiple citation sources. The study used citations to more than 1,400 works by 25 library and information science faculty to examine the effects of adding Scopus and Google Scholar data on the citation counts and rankings of these faculty members as measured by Web of Science (WoS). The study found that the addition of Scopus citations to those of WoS significantly altered the relative ranking of faculty in the middle of the rankings. The study also found that Google Scholar stands out in its coverage of conference proceedings as well as international, non-English language journals. According to the authors, the use of Scopus and Google Scholar, in addition to WoS, reveals a more comprehensive and complete picture of the extent of the scholarly relationship between library and information science and other fields.

Despite criticisms, which are largely concerned with the comprehensiveness of citation data sources, proponents have reported the validity of citation counts in research assessments as well as the positive correlation between them and peer reviews and lists of publications. In citation studies that compared peer assessment to citation counts (Oppenheim, 1995; Holmes & Oppenheim, 2001), researchers found that peer ratings of academic departments are strongly correlated to the citation counts for the publications by the members of departments. In a study that compared the results of expert surveys with citations to 10 German-language journals, Schloegl and Stock (2004) found strong correlation (+0.7) between reading frequency and the regional impact factor,3 the impact factor of

1 KSCI covers only science and engineering journals whereas KCI covers journals in all disciplines. Since the five major LIS journals covered in KSCI make up the bulk of LIS publications, it is likely that other journals may be of non-LIS disciplines and covered in KCI rather than KSCI.
2 KCI journals are those journals selected by the NRF to be included in KCI. They are similar to ISI journals in that they are regarded as high quality publications. There are 4 KCI journals in the LIS field.
3 To adjust the journal impact factor for a given region, the regional impact factor was computed by adding journal self-citation counts and numbers of citations from regional journals to the numerator of the impact factor formula.
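The impact factor that footnote 3 adjusts (Equation 1) is a simple ratio, and the regional variant changes only the numerator. The following is a minimal Python sketch under stated assumptions: the function names are illustrative, the counts in the example are hypothetical and not taken from the study, and `regional_impact_factor` follows one reading of footnote 3 in which the numerator consists of journal self-citations plus citations from regional journals.

```python
def impact_factor(citations_in_year, publications_prior_two_years):
    """Equation 1: citations in year Y to papers from Y-1 and Y-2,
    divided by the number of papers published in Y-1 and Y-2."""
    if publications_prior_two_years <= 0:
        raise ValueError("publication count must be positive")
    return citations_in_year / publications_prior_two_years


def regional_impact_factor(self_citations, regional_citations,
                           publications_prior_two_years):
    """One reading of footnote 3 (an assumption of this sketch): the numerator
    is journal self-citations plus citations from regional journals."""
    return impact_factor(self_citations + regional_citations,
                         publications_prior_two_years)


# Hypothetical journal: 80 papers in the prior two years.
print(impact_factor(120, 80))               # 1.5
print(regional_impact_factor(10, 30, 80))   # 0.5
```

The two functions share a denominator, so any difference between the overall and regional scores of a journal comes entirely from which citations are counted.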


journals, while finding slightly negative correlation (-0.11) between reading frequency and the overall impact factor. In addition to giving more evidence to the validity of citation count as a measure of research impact, Schloegl and Stock’s study underscored the importance of appropriate application of citation analysis by showing how simple adjustment for region resulted in a much different outcome. Li et al. (2010), who conducted a study correlating the results from an expert survey of publications by researchers with citation-based author scores (e.g., h-index, g-index) using WoS, Scopus, and Google Scholar data, found that expert assessment of scholarly work is strongly correlated to automatic quantification of research performance by citation analysis. The authors cautioned, however, that the magnitudes of correlation, though statistically significant, were not at levels where citation-based indicators could substitute for expert judgments.

While the bulk of citation analysis studies has been focused on validating citation-based measures against the gold standard of human judgment, some researchers have explored the idea that not all citations are created equal (Cronin, 1984). Google’s PageRank (Brin & Page, 1998) can be regarded as an adaptation of Pinski and Narin’s algorithm to the setting of the Web to estimate the importance of web pages. A recent application of link-propagated citation weighting is the Eigenfactor score (Bergstrom, 2007), which calculates the impact of journals by aggregating citation weights that are computed in a manner similar to the PageRank score.

Aside from differentiating articles according to their importance or impact, a publication by multiple authors may be assigned a weight that corresponds to the contribution of each author. Zhang (2009) proposed a citation weighting scheme that multiplies the raw citation count by “co-author weight coefficients”4 based on author rank (i.e., the order of authors) to differentiate among contributions of multiple authors. Zhang extended an earlier reciprocal-rank based weighting proposal (Sekercioglu, 2008), which assigned the weight of 1/k to the kth ranked co-author, with the formula shown below, which linearly transformed the previously hyperbolic author weight distribution.5

$$c(k, n) = \frac{2(n-k+1)}{(n+1)(n-2)}, \quad \{n \ge 4,\ 2 \le k \le n-1\} \tag{3}$$

Literature on bibliometric assessment of Korean faculty research is limited mostly to studies that analyze the publication data. A few studies that make use of the citation count rely on Web of Science, which does not have the complete citation data for Korean publications. Chung (2009), who evaluated the scholarly work of 41 Korean LIS professors published between 2003 and 2007 (239 journal articles and 49 monographs), compared the publication counts of authors with publication counts weighted according to faculty evaluation guidelines used in typical Korean universities.6 Chung emphasized the importance of qualitative over quantitative analysis of scholarly publications; however, his study did not delve deeply into specifics of the qualitative approach beyond the simple application of somewhat arbitrary publication quality standards (i.e., faculty evaluation guidelines). Yang and Lee (2012), in an analysis of 2,401 publications authored by 159 Korean LIS professors between 2001 and 2010, ranked LIS departments in Korea by publication counts in various categories, such as domestic (i.e., Korean) papers, international papers, per faculty, and overall, to highlight the effect of different bibliometrics on evaluative outcomes.

4 First and corresponding authors are each given weights of 1 while the weights of remaining authors sum to one with co-author weights being inversely proportional to author ranks.
5 In the special case of c(2,3), the weight of 0.7 is assigned.
6 In many universities in Korea, faculty evaluation guidelines specify how research performance should be assessed. For instance, the guidelines may specify that articles published in SCI or SSCI journals receive 150 points while articles published in Korean journals of equivalent status obtain 100 points.
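The two co-author weighting schemes discussed above can be compared numerically. The following is a minimal sketch, assuming that Equation 3 reads c(k, n) = 2(n-k+1)/((n+1)(n-2)) for n ≥ 4 and 2 ≤ k ≤ n-1, and that the reciprocal-rank proposal simply assigns 1/k. The function names are illustrative and do not come from the cited papers.

```python
from fractions import Fraction


def reciprocal_rank_weight(k):
    """Reciprocal-rank weight: the k-th ranked co-author receives 1/k."""
    if k < 1:
        raise ValueError("author rank k must be >= 1")
    return Fraction(1, k)


def zhang_coauthor_weight(k, n):
    """Linear co-author weight of Equation 3 (as read here) for the
    k-th of n authors. Only defined for n >= 4 and 2 <= k <= n-1; the
    first and corresponding authors receive weight 1 (footnote 4)."""
    if n < 4 or not 2 <= k <= n - 1:
        raise ValueError("Equation 3 requires n >= 4 and 2 <= k <= n-1")
    return Fraction(2 * (n - k + 1), (n + 1) * (n - 2))


if __name__ == "__main__":
    n = 6  # hypothetical paper with six authors
    middle = [zhang_coauthor_weight(k, n) for k in range(2, n)]
    print("linear weights:    ", [str(w) for w in middle])
    print("sum of weights:    ", sum(middle))
    print("reciprocal weights:", [str(reciprocal_rank_weight(k)) for k in range(2, n)])
```

Under this reading, the weights of the co-authors ranked 2 to n-1 sum to exactly 1, which agrees with footnote 4. They decrease linearly in k, whereas the 1/k weights decrease hyperbolically.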

32 Bibliometric Approach to Research Assessment

3. STUDY RESULTS

In order to test the reliability and stability of bibliometric indicators (BI) for research assessment, we compared the rankings of faculty members by various BI scores, such as publication count, citation count, and h-index. By comparing the rankings, we hoped to gain insights into the aspects of research performance measured by BIs and to learn how robust the assessment may be.

3.1 Publication Count vs. Citation Count

We first compared the ranking of the authors by publication count (PC) with the ranking by citation count (CC). Table 2 shows the top 20 faculty members ranked by PC versus the top 20 by CC. It can be seen that in the left table where authors are ranked by PC, large PC and CC rank differences (rows in red) are due to high PC and low CC. In other words, at least 4 out of 20 most-published authors in the study sample have very few citations to their work despite the high numbers of papers that they produced. On the other hand, large rank differences in the right table, where authors are ranked by CC, are caused by a handful of highly cited papers. For example, the top ranked author (P075) had four papers cited 179, 30, 26, and 18 times respectively, and the second ranked author had a paper with 87 citations and another with 45 citations. Similarly, the fourth and fifth ranked authors (P008 and P033) had 29, 20, 19, and 19 citations and 27, 23, and 21 citations respectively, and so on. Another interesting fact is that these highly cited papers are all international publications, whereas most of the papers with few citations in the right table are published in Korean journals.

Table 2. Top 20 Authors by Publication Count (PC) and Citation Count (CC)

AuID  PC   CC  PC-rank  CC-rank  Rank Diff    AuID   CC  PC  CC-rank  PC-rank  Rank Diff
P091  63   77        1        8          7    P075  333  22        1       31         30
P044  49   38        2       14         12    P051  186  22        2       32         30
P077  48   78        3        7          4    P111  134  39        3        6          3
P042  46   29        4       20         16    P008  134  19        4       41         37
P041  44   38        5       15         10    P033  123  12        5       77         72
P111  39  134        6        3         -3    P133   86  37        6        8          2
P069  38    3        7      108        101    P077   78  48        7        3         -4
P133  37   86        8        6         -2    P091   77  63        8        1         -7
P089  37   28        9       23         14    P011   58   6        9      129        120
P037  36   57       10       10          0    P037   57  36       10       10          0
P018  36   37       11       16          5    P073   54  18       11       44         33
P110  35   45       12       12          0    P110   45  35       12       12          0
P006  34   25       13       28         15    P129   42  20       13       38         25
P131  33    9       14       74         60    P044   38  49       14        2        -12
P028  32   35       15       17          2    P041   38  44       15        5        -10
P050  32    1       16      138        122    P018   37  36       16       11         -5
P104  31   29       17       21          4    P028   35  32       17       15         -2
P015  30    1       18      139        121    P007   32  29       18       19          1
P007  29   32       19       18         -1    P101   32  15       19       58         39
P005  29   26       20       26          6    P042   29  46       20        4        -16

33 http://www.jistap.org JISTaP Vol.1 No.1, 27-41

To ascertain whether PC and CC measure the same or different aspects of research performance, we computed the Spearman’s rank order correlation. For the entire rank of 159, Spearman’s rho showed positive association (ρ = 0.7045). PC and CC both being the measures of research performance, an overall association between two indicators showing positive correlation seemed reasonable since pockets of differences are likely to be hidden when averaged over the entire ranking. To examine the rank differences at a finer grade, we computed the rank correlation for the rank intervals of 20 (Table 3). We can see clearly that PC and CC are not correlated at rank intervals, which indicates that PC and CC measure different aspects of research performance. Table 3 also shows that the strength of association gets weaker at lower rank intervals, which suggests the importance of citation counts for authors with low publication counts.

Table 3. Spearman’s Rank Order Correlation at Rank Intervals: Publication Count vs. Citation Count

Rank      ρ (PC-CC)
1-20         0.5677
21-40        0.0606
41-60        0.1321
61-80        0.4439
81-100       0.2465
101-120     -0.0116
121-140      0.0666
141-159      0.0320

(df=18, α=0.05, CV=0.447)

3.2. Author Rank Effect

Collaborative research projects produce publications with multiple authors. Typically, the first author is the main contributor with co-authors listed in the order of contribution amount. An exception to this format occurs when there is a corresponding author, who may sometimes be listed as the last author but whose contribution can be comparable to the first or the second author. A corresponding author may be someone akin to the principal investigator of a research project, who architected and directed the research that produced the publication, while the first author did the leg work and wrote the bulk of the paper. In such an instance, the contribution of the corresponding author should be counted heavily regardless of the author order. Zhang (2009), for example, assigns an author weight of 1 to both the first author and corresponding author, while remaining co-authors are assigned weights that diminish with the author order.

However, there is no guarantee regarding the contribution of the corresponding author and the order of authorship linearly corresponding to the contribution amount. Co-author contributions will probably vary from case to case, so the most accurate assessment should come from authors themselves. Since such information is impractical to collect on a large scale as well as being subject to personal bias and subjective interpretations, one must turn to readily available evidence, which is the order in which authors are listed in a publication. Based on the assumption that an author’s contribution to a collaborative work should correspond to what we call the “authorRank” (i.e. author order), we compared the raw citation count (CC) with citation counts weighted by their estimated contribution to the paper that is being cited. The first weighting formula (CC2) is a modified version of Zhang’s (2009) co-author weights. Since our study data did not include information on who the corresponding authors are, we assigned the first author the weight of 1 and the co-authors diminishing weights that sum to 1. The formula for CC2 is shown in Equation 4.

$$CC2 = CC \times auwt, \qquad auwt = \begin{cases} 1, & aurank = 1 \\ \dfrac{aucnt - aurank + 1}{0.5 \times aucnt \times (aucnt - 1)}, & aurank > 1 \end{cases} \qquad (4)$$

We also computed the second co-author weighting formula for citation count (CC3) using 1 over authorRank (Equation 5) and the third formula using 1 over author count (Equation 6). CC3 uses diminishing author weights, the sum of which can exceed the first author weight of 1, while CC4 assigns all co-authors the same weights that can be a small fraction of the first author weight for publications with many authors. An example shown in Table 4 illustrates the differences in the authorRank formulas.


$$CC3 = CC \times \frac{1}{aurank} \qquad (5)$$

$$CC4 = CC \times \frac{1}{aucnt} \qquad (6)$$

When we compared the rankings of authors by CC, CC2, CC3, and CC4, we observed only small differences in rankings across authorRank weight formulas (Table 5). We attribute this to the fact that only about half of the study data is co-authored (31% with 2 authors, 16% with 3 or more authors). For the 51% of the single author papers, authorRank weights can have no effect, thus the authorRank effect is muted when averaged out over the entire

Table 4. Example: AuthorRank Weights Applied to Citation Counts

Aurank Auwt CC CC2 CC3 CC4

1 1 10 10 10 10

2 4/10 10 4 5 2

3 3/10 10 3 3.3 2

4 2/10 10 2 2.5 2

5 1/10 10 1 2 2
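The worked example in Table 4 can be reproduced with a short script. This is a sketch under two assumptions: the CC2 weight follows Equation 4, and CC4 keeps the first author's weight at 1, as the first row of Table 4 indicates, even though Equation 6 as printed shows only the 1/aucnt factor. The function names are illustrative.

```python
from fractions import Fraction


def cc2_weight(aurank, aucnt):
    """Equation 4: first author gets 1; co-author weights decrease
    linearly with rank and sum to 1."""
    if aurank == 1:
        return Fraction(1)
    return Fraction(aucnt - aurank + 1) / (Fraction(1, 2) * aucnt * (aucnt - 1))


def cc3_weight(aurank, aucnt):
    """Equation 5: 1 over authorRank (aucnt is unused, kept for a uniform signature)."""
    return Fraction(1, aurank)


def cc4_weight(aurank, aucnt):
    """Equation 6 for co-authors: 1 over author count.
    Assumption: the first author keeps weight 1, as in Table 4."""
    return Fraction(1) if aurank == 1 else Fraction(1, aucnt)


def weighted_counts(cc, aurank, aucnt):
    """Return (CC2, CC3, CC4) for one author position on one paper."""
    return tuple(cc * w(aurank, aucnt) for w in (cc2_weight, cc3_weight, cc4_weight))


if __name__ == "__main__":
    # Table 4 setting: one paper with 5 authors and 10 citations.
    for rank in range(1, 6):
        cc2, cc3, cc4 = weighted_counts(10, rank, 5)
        print(rank, float(cc2), round(float(cc3), 1), float(cc4))
```

Exact fractions are used so that the property "co-author weights sum to 1" can be checked without rounding error. The printed CC3 values are rounded to one decimal place, as in Table 4.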

Table 5. Top 20 Authors by Citation Counts, Using AuthorRank Weights

AuID PC CC CC2 CC3 CC4 CC-rank CC2-rank CC3-rank CC4-rank

P075 22 333 332 239 163 1 1 1 1

P051 22 186 108 108 64 2 4 3 6

P111 39 134 132 91 84 3 2 4 3

P008 19 134 118 118 95 4 3 2 2

P033 12 123 80 78 74 5 6 6 4

P133 37 86 86 82 42 6 5 5 9

P077 48 78 75 72 63 7 8 8 7

P091 63 77 77 77 69 8 7 7 5

P011 6 58 58 58 58 9 9 9 8

P037 36 57 57 51 37 10 10 11 11

P073 18 54 54 54 42 11 11 10 10

P110 35 45 42 34 24 12 12 12 16

P129 20 42 42 31 27 13 13 15 13

P044 44 38 30 26 21 14 17 22 22

P041 49 38 37 30 22 15 14 16 19

P018 36 37 34 31 21 16 16 14 21

P028 32 35 35 33 24 17 15 13 15

P007 29 32 29 28 20 18 20 19 23

P101 15 32 30 29 23 19 18 18 18

P042 46 29 25 20 13 20 24 27 33


study data. One notable occurrence in the Spearman’s coefficient table (Table 6) is the low numbers in the CC-CC4 column, which suggests that the raw citation count, which passes the entire impact indicator of a given paper to all authors equally, is quite different from CC4, which passes only a fraction of the impact indicator amount to the co-authors.

To isolate the effect of authorRank, we excluded single-author publications and redid the rank comparisons with authorRank. There were 1136 out of 2402 papers that were co-authored, 65% of which were written by 2 authors and 35% with 3 or more authors. As can be seen in Table 7, the rank differences are more pronounced without the single author papers, but Spearman’s coefficients still

Table 6. Spearman’s Rank Order Correlation at Rank Intervals, Using AuthorRank Weights

Rank     ρ (CC-CC2)  ρ (CC-CC3)  ρ (CC-CC4)  ρ (CC2-CC3)  ρ (CC2-CC4)  ρ (CC3-CC4)
1-20         0.9085      0.9504      0.9263       0.9714       0.9143       0.9278
21-40        0.5474      0.4632      0.2752       0.8361       0.4316       0.6331
41-60        0.7895      0.6677      0.1789       0.9128       0.3574       0.4120
61-80        0.8376      0.4571      0.4632       0.5579       0.5173       0.7230
81-100       0.7955      0.5143      0.2346       0.7805       0.5008       0.2526
101-120      0.8421      0.6090      0.3429       0.7910       0.7188       0.8090
121-140      0.9474      0.7744      0.3729       0.7594       0.5699       0.5444
141-159      0.9959      0.9959      0.9876       0.9794       0.9732       0.9794

(df=18, α=0.05, CV=0.447)
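The interval-wise Spearman coefficients reported in these tables could be computed along the following lines. This is a minimal sketch, not the procedure used in the study: the paper does not state how ties were handled or how authors were re-ranked inside each interval. Here tied scores share the mean rank, and each interval is formed by sorting authors on the first indicator. The sample data are the PC and CC columns of the left half of Table 2; the function names are illustrative.

```python
from statistics import mean


def average_ranks(values):
    """Rank values in descending order (1 = largest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Tie-aware Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one indicator is constant
    return cov / (var_x * var_y) ** 0.5


def rho_by_interval(primary, secondary, width=20):
    """Sort authors by `primary`, split into rank intervals of `width`,
    and return (first_rank, last_rank, rho) for each interval."""
    order = sorted(range(len(primary)), key=lambda i: -primary[i])
    results = []
    for start in range(0, len(order), width):
        idx = order[start:start + width]
        rho = spearman_rho([primary[i] for i in idx], [secondary[i] for i in idx])
        results.append((start + 1, start + len(idx), rho))
    return results


if __name__ == "__main__":
    # PC and CC of the top 20 authors by PC (left half of Table 2).
    pc = [63, 49, 48, 46, 44, 39, 38, 37, 37, 36, 36, 35, 34, 33, 32, 32, 31, 30, 29, 29]
    cc = [77, 38, 78, 29, 38, 134, 3, 86, 28, 57, 37, 45, 25, 9, 35, 1, 29, 1, 32, 26]
    for first, last, rho in rho_by_interval(pc, cc):
        print(f"{first}-{last}: rho = {rho:.4f}")
```

The resulting value need not match the published coefficient for the 1-20 interval exactly, since the tie handling assumed here may differ from the authors' procedure.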

Table 7. Top 20 Authors by Citation Counts, Using AuthorRank Weights (aucnt>1)

AuID  PC   CC  CC2  CC3  CC4  CC-rank  CC2-rank  CC3-rank  CC4-rank
P075  20  298  297  204  128        1         1         1         1
P051  13  163   85   85   49        2         4         2         4
P111  24   96   94   53   46        3         2         4         2
P133  37   86   86   82   42        4         3         3         3
P033   8   70   27   25   21        5        10         9         6
P008  10   61   45   45   22        6         5         5         5
P037  25   37   37   31   17        7         6         6         7
P110  28   35   32   24   14        8         7        11         9
P129  14   31   31   20   16        9         8        15         8
P041  34   29   28   21   13       10         9        13        12
P018  27   29   26   23   13       11        12        12        11
P042  38   28   24   19   12       12        15        17        15
P044  27   27   20   15   10       13        19        20        18
P160   2   27    4    6    4       14        64        44        39
P138  12   27   27   27   14       15        11         7        10
P077  35   26   23   20   11       16        16        14        16
P039  13   26   26   26   13       17        13         8        13
P073   6   25   25   25   13       18        14        10        14
P017  16   24    7    9    6       19        42        31        27
P006  28   23   14   12    8       20        23        24        21


showed little differences and thus an insignificant effect of authorRank (Table 8). The strength of association, especially in the top rank interval, is weaker than when single-author papers were included, which may be due to a few authors who received many citations as co-authors. Restricting to co-authored papers reduced the total number of authors in the study data from 159 to 125, so the last row in the Spearman’s coefficient table spanned the rank interval of 5 instead of 20. The rhos in the rank interval of 121-125 turned out to be all statistically insignificant, but we considered the ranking difference in such a small interval to be not meaningful for gauging differences in bibliometric measures even if such outcomes were not spurious.

To isolate the authorRank effect further to the point of magnification, we excluded all publications from the study data where the faculty members in the study were listed as first authors. The resulting data subset included 107 authors that published 594 papers, where 61% were 2-author papers and 39% were papers with 3 or more authors. As expected, the rank differences became more pronounced with a restricted dataset (Table 9) with smaller rho across

Table 8. Spearman’s Rank Order Correlation at Rank Intervals, Using AuthorRank Weights (aucnt>1)

Rank     ρ (CC-CC2)  ρ (CC-CC3)  ρ (CC-CC4)  ρ (CC2-CC3)  ρ (CC2-CC4)  ρ (CC3-CC4)
1-20         0.8872      0.7293      0.8977       0.8782       0.9744       0.8962
21-40        0.8971      0.8541      0.8677       0.8541       0.8256       0.8135
41-60        0.8331      0.6075      0.5654       0.7759       0.8045       0.2286
61-80        0.7459      0.4586      0.5594       0.6211       0.8571       0.3895
81-100       0.7008      0.6361      0.8526       0.6677       0.7158       0.7789
101-120      0.9684      0.9386      0.9534       0.9323       0.9669       0.9534
121-125      0.3000      0.1000      0.5000      -0.5000       0.2000       0.7000

(df=18, α=0.05, CV=0.447), (df=3, α=0.05, CV=1.000)

Table 9. Author Rankings by Citation Counts Weighted by AuthorRank Weights (aurank>1)

AuID  PC   CC  CC2  CC3  CC4  CC-rank  CC2-rank  CC3-rank  CC4-rank
P075   5  187  186   93   93        1         1         1         1
P051   5  101   23   23   18        2         3         3         3
P111  20   85   83   42   42        3         2         2         2
P033   6   61   18   16   16        4         7         4         4
P160   9   27    4    5    4        5        31        14        18
P110  24   23   20   12   10        6         5         5         6
P129  10   22   22   11   11        7         4         6         5
P008   4   22    6    6    6        8        20        12        13
P044  20   21   14    9    8        9         9         8         8
P017   9   20    3    5    4       10        36        16        21
P035  12   19   19   10   10       11         6         7         7
P006  20   18    9    7    6       12        12        10        11
P041  14   16   15    8    8       13         8         9         9
P042  22   15   11    6    6       14        11        11        10
P037   7   12   12    6    6       15        10        13        12
P018  12   11    8    5    4       16        14        15        15
P077  16   10    7    4    4       17        18        18        16
P133  24    9    9    5    5       18        13        17        14
P104  14    9    5    4    3       19        25        23        23
P116  10    9    7    4    4       20        17        19        20


Table 10. Spearman’s Rank Order Correlation at Rank Intervals, Using AuthorRank Weights (aurank>1)

Rank     ρ (CC-CC2)  ρ (CC-CC3)  ρ (CC-CC4)  ρ (CC2-CC3)  ρ (CC2-CC4)  ρ (CC3-CC4)
1-20         0.5955      0.8556      0.7459       0.9083       0.9338       0.9639
21-40        0.5654      0.9143      0.6421       0.5789       0.7098       0.8030
41-60        0.4060      0.5008      0.5053       0.6451       0.8992       0.8647
61-80        0.3233      0.6902      0.5038       0.3474       0.7910       0.8286
81-100       0.9338      0.9173      0.9609       0.9850       0.9564       0.9444
101-107      0.6786      0.7143      0.6071       0.7857       0.2143       0.1786

(df=18, α=0.05, CV=0.447), (df=5, α=0.05, CV=0.786)

rank intervals (Table 10), which serves as evidence that non-primary author contributions should be treated differently from primary author contributions.

3.3. Publication Count and Citation Count vs. h-index and g-index

In addition to comparing publication count and citation count, which measure quantity and quality of research respectively, we investigated h-index and g-index, which consider both quantity and quality of research. The fact that rankings by publication count and h-index (p-h) and by publication count and g-index (p-g) show significant ranking differences in all rank intervals demonstrates inherent differences in what publication count measures and what h- or g-index measure (Table 11). Rankings by citation count and h-index (c-h) and by citation count and g-index (c-g) show significant ranking differences in all rank intervals except for the very top and very bottom rank intervals, which could reflect the tendency of citations to overwhelm the h- and g-index computations for authors with very high or low citation counts.

Table 12 shows the Spearman’s coefficients for the entire 159 authors with additional comparisons of authorRank weights, where suffixes correspond with authorRank formulas. For instance, h2 is ranking by h-index where citation counts are weighted with the author weight formula (CC2), h3 uses the author rank formula (CC3), and h4 uses the author count formula. The strength of association between publication count and h-/g-index using CC2 is consistently lower than for other ranking comparison pairs, which suggests that the CC2 (author weight) formula may be the most robust authorRank weighting formula used in the study.
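For readers unfamiliar with the two indices, the sketch below computes them from a list of per-paper citation counts using the standard definitions (Hirsch, 2005; Egghe, 2006). The g-index is capped at the number of papers, which is a common convention and an assumption here. The citation lists are hypothetical. Because both functions only compare and sum numbers, they also accept author-weighted citation counts such as CC2, which is how variants like h2 or g2 could be obtained.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the g most-cited papers together have at
    least g*g citations (capped at the number of papers)."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for position, cites in enumerate(ranked, start=1):
        total += cites
        if total >= position ** 2:
            g = position
    return g


if __name__ == "__main__":
    raw = [10, 8, 5, 4, 3]                # hypothetical raw citation counts
    weighted = [10, 4.0, 2.5, 1.0, 0.6]   # hypothetical author-weighted counts
    print("raw:     ", h_index(raw), g_index(raw))
    print("weighted:", h_index(weighted), g_index(weighted))
```

Down-weighting co-authored papers can only lower or preserve an author's h- and g-index, which is one reason rankings by h2, h3, and h4 may differ from rankings by the unweighted h-index.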

Table 11. Spearman’s Rank Order Correlation at Rank Intervals, Using PC/CC vs. h-/g-index

Rank ρ (p-h) ρ (p-g) ρ (c-h) ρ (c-g)

1-20 0.4211 0.4391 0.7173 0.9248

21-40 0.2195 0.2105 0.1293 0.3940

41-60 0.0947 0.0105 0.2286 0.2737

61-80 0.1880 0.2376 -0.0872 0.1774

81-100 0.1910 0.3008 -0.0015 0.1398

101-120 0.1820 0.2436 -0.1398 0.1308

121-140 -0.0707 0.1504 0.1353 0.1353

141-159 0.1414 0.2343 0.6367 0.6367


Table 12. Spearman’s Rank Order Correlation at Rank Intervals (Overall)

ρ (p-c)   ρ (p-h)   ρ (p-g)   ρ (c-h)    ρ (c-g)    ρ (h-g)
0.7169    0.5608    0.5609    0.8522     0.9137     0.9171

ρ (p-c2)  ρ (p-h2)  ρ (p-g2)  ρ (c2-h2)  ρ (c2-g2)  ρ (h2-g2)
0.5445    0.5445    0.4726    0.8315     0.8315     0.8496

ρ (p-c3)  ρ (p-h3)  ρ (p-g3)  ρ (c3-h3)  ρ (c3-g3)  ρ (h3-g3)
0.7096    0.5346    0.4870    0.8256     0.8416     0.8695

ρ (p-c4)  ρ (p-h4)  ρ (p-g4)  ρ (c4-h4)  ρ (c4-g4)  ρ (h4-g4)
0.6963    0.5266    0.4511    0.8267     0.8265     0.8602

4. CONCLUSION

We investigated how bibliometric indicators such as publication count and citation count affect the assessment of research performance by computing various bibliometric scores of the works of 159 Korean LIS faculty members and comparing the rankings by those scores. The study results showed correlation between publication count and citation count for authors with many publications but the opposite evidence for authors with few publications. This suggests that as authors publish more and more work, citations to their work tend to increase along with publication count. However, for junior faculty members who have not yet accumulated enough publications, citations to their work are of great importance in assessing their research performance.

The study data also showed that there are marked differences in the magnitude of citations between papers published in Korean journals and papers published in international journals. To say that this difference, which is over an order of magnitude in most cited papers, is due to the population size difference between Korean scholars and scholars in the world at large overlooks some important aspects of research impact. Specifically, we must keep in mind that the open access to a wide audience pool that most international journals enjoy increases the potential of research impact. In addition, research of significance should theoretically incur more citations than those exhibited in the study data regardless of its venue. Evidence to the contrary may be a reflection of citation behavior specific to Korean LIS researchers. Whether low citation counts to Korean journals reflect the impact of those journals or their environment, such as the size and characteristics of the user groups, remains to be seen. It may very well be that some of the papers published in Korean journals are of little interest to non-Korean scholars, in which case citation counts should be normalized accordingly. Papers with fewer than two citations, however, suggest low impact regardless of the size of the citation pool.

We also found that citation counts should be weighted according to authorRank for non-primary authors in multi-author papers. Though not conclusive, an author weighting formula that assigns decreasing weights to authorRank and sums to 1 may be the most robust approach to handling the authorRank effect. Another study finding is that h-index and g-index measure markedly different aspects of research performance than publication count and citation count. Although this finding is no surprise since it is in accordance with the original intention of h-index and g-index, the much weaker strength of association between publication/citation counts and h-/g-index than between publication count and citation count indicates that integrating the consideration of quantity and quality in research assessment produces quite a different outcome than comparing quality and quantity of research separately. The study demonstrated that a bibliometric approach to research assessment can produce different evaluation outcomes depending on how the data is analyzed. Such findings, even without the issue of data


problems, should serve as a reminder that bibliometric methods have limitations and we should take care in interpreting their results. Research performance, let alone research potential, has many facets that quantitative methods cannot fully capture. Even the aspects of research performance that are quantifiable are not necessarily measured by the conventional bibliometric measures in a robust and consistent manner. We must therefore continue the investigation into research assessment approaches that can incorporate a wider spectrum of research performance in an efficient and effective manner. One of the key steps in future research must be to compare various measures across boundaries for the purpose of normalization. Evaluation of research performance using document-level versus journal level measures (e.g., citation count vs. journal impact factor) and analysis of impact factor differences across disciplines and countries would be good places to begin exploring ways to normalize these measures that in some cases are like apples and oranges.

REFERENCES

Bergstrom, C. T. (2007). Eigenfactor: Measuring the value and prestige of scholarly journals. College & Research Libraries News, 68(5), 314-316.
Chung, J. S. (2009). A study on assessment of faculty performance in research achievement: A focus on library and information science field. Journal of the Korean BIBLIA Society for Library and Information Science, 20(2), 129-142.
Cronin, B. (1984). The citation process: The role and significance of citations in scientific communication. London: Taylor Graham.
Egghe, L. (2006). Theory and practice of the g-index. Scientometrics, 69(1), 131-152.
Funkhouser, E. T. (1996). The evaluative use of citation analysis for communications journals. Human Communication Research, 22(4), 563-574.
Garfield, E. (1979). Citation indexing: Its Theory and Application in Science. New York, NY: Wiley.
Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569-16572.
Holmes, A., & Oppenheim, C. (2001). Use of citation analysis to predict the outcome of the 2001 Research Assessment Exercise for Unit of Assessment (UoA) 61: Library and information management. Information Research, 6(2). Retrieved from http://informationr.net/ir/6-2/paper103.html
Li, J., Sanderson, M., Willett, P., Norris, M., & Oppenheim, C. (2010). Ranking of library and information science researchers: Comparison of data sources for correlating citation data, and expert judgments. Journal of Informetrics, 4, 554-556.
MacRoberts, M. H., & MacRoberts, B. R. (1996). Problems of citation analysis. Scientometrics, 36(3), 435-444.
Meho, L., & Yang, K. (2007). Impact of data sources on citation counts and rankings of LIS faculty: Web of Science vs. Scopus and Google Scholar. Journal of the American Society for Information Science and Technology, 58(13), 2105-2125.
Oppenheim, C. (1995). The correlation between citation counts and the 1992 Research Assessment Exercise Ratings for British library and information science university departments. Journal of Documentation, 51(1), 18-27.
Page, L., Brin, S., Motwani, R., & Winograd, T. (1998). The PageRank citation ranking: Bringing order to the Web. Retrieved from http://dbpubs.stanford.edu/pub/showDoc.Fulltext?lang=en&doc=1999-66&format=pdf
Schloegl, C., & Stock, W. G. (2004). Impact and relevance of LIS journals: A scientometric analysis of international and German-language LIS journals - Citation analysis versus reader survey. Journal of the American Society for Information Science and Technology, 55(13), 1155-1168.
Seglen, P. O. (1998). Citation rates and journal impact factors are not suitable for evaluation of research. Acta Orthopaedica Scandinavica, 69(3), 224-229.
Sekercioglu, C. H. (2008). Quantifying coauthor contributions. Science, 322(5900), 371-375.
Smith, L. C. (1981). Citation analysis. Library Trends, 30(1), 83-101.
Yang, K., & Lee, J. (2012). Analysis of publication patterns in Korean library and information science research. Scientometrics, 93(2), 233-251.
Yang, K., Lee, J., Choi, S. H., & You, B. J. (2012). Comparison and analysis of data coverage for citation index. Eighth International Conference on Webometrics, Informetrics and Scientometrics (WIS) & Thirteenth COLLNET Meeting. : COLLNET.
Yang, K., & Meho, L. (2011). Multi-faceted citation analysis for quality assessment of scholarly publications. Journal of the Korean Society for Information Management, 28(2), 79-96.
Zhang, C. T. (2009). A proposal for calculating weighted citations based on author rank. Embo Reports, 10(5), 416-417.

Research Paper
Journal of Information Science Theory and Practice
J. of infosci. theory and practice 1(1): 42-53, 2013
http://www.jistap.org
http://dx.doi.org/10.1633/JISTaP.2013.1.1.3

Information Needs and Seeking Behavior During the H1N1 Virus Outbreak

Shaheen Majid*
Wee Kim Wee School of Communication & Information
Nanyang Technological University, Singapore
Email: [email protected]

Nor Ain Rahmat
Wee Kim Wee School of Communication & Information
Nanyang Technological University, Singapore
Email: [email protected]

ABSTRACT

Timely access to quality healthcare information during an outbreak plays an important role in curtailing its spread. The aim of this study was to investigate the information needs and seeking behavior of the general public in Singapore during the H1N1 pandemic. A pre-tested questionnaire was used for data collection. The convenience snowball sampling method was used and 260 working adults and tertiary-level students participated in this study. The most crucial information needs of a majority of the participants were: symptoms of H1N1, causes of the infection, preventive measures, and possible treatments. Data analysis also revealed that mass media such as television, newspapers, and radio were most frequently used for seeking the needed information. The use of human information sources was also quite high while only a small number of the respondents accessed online news and healthcare websites. About three-quarters of the participants indicated that the gathered information helped them to stay vigilant and take necessary precautionary measures. A major problem identified by the participants in using H1N1 information was the lack of understanding of certain terms used in public communications. This paper suggests certain measures for strengthening health information communication during future outbreaks.

Keywords: Information Needs, Information Seeking Behavior, H1N1, Swine Flu, Influenza A, Information Sources, Singapore

1. INTRODUCTION

Pandemics caused by diseases such as H1N1, also referred to as Influenza A and Swine flu, are likely to spread around the world quickly due to rapid urbanization, increase in global travelling, and overcrowded conditions in big cities. In order to limit the spread of an outbreak, several organizations

Received date: December 16, 2012
Accepted date: February 13, 2013

*Corresponding Author: Shaheen Majid
Associate professor
Wee Kim Wee School of Communication & Information
Nanyang Technological University, Singapore
E-mail: [email protected]

Open Access
All JISTaP content is Open Access, meaning it is accessible online to everyone, without fee and authors’ permission. All JISTaP content is published and distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/). Under this license, authors reserve the copyright for their content; however, they permit anyone to unrestrictedly use, distribute, and reproduce the content in any medium as far as the original authors and source are cited. For any reuse, redistribution, or reproduction of a work, users must clarify the license terms under which the work was produced.

Shaheen Majid, Nor Ain Rahmat, 2013 42 Information Needs and Seeking Behavior

such as government departments, hospitals, international health institutions, and other healthcare agencies usually make concerted efforts to create awareness among the general public about the potential risks, disease symptoms, precautions, and possible treatments and interventions.

A pandemic may result in the deaths of large numbers of people, disturbing local and international travelling, straining healthcare services, seriously hurting economies, wasting precious resources, and disturbing the daily lives and activities of citizens. In 2009, the world was shocked by a new strain of the H1N1 virus and health experts found it difficult to predict with certainty how this strain would mutate to evolve into a new dangerous strain. In June 2009 the World Health Organization (WHO) declared that the H1N1 virus outbreak had climbed to a Phase 6 pandemic level, meaning that this virus was likely to spread globally. This triggered a worldwide panic and many countries started assessing their level of preparedness and began taking appropriate measures to combat the spread of this virus. A necessary step in these preparedness efforts was to provide up-to-date information to the general public about the spread of the H1N1 virus, necessary precautionary measures, and details on how to improve personal health and hygiene.

To stop any further spread of an outbreak, it is necessary that appropriate messages should be delivered to the general public through suitable communication channels. During an outbreak many countries provide health advisories to their citizens using a variety of channels such as television, radio, newspapers, posters, and the Internet to create awareness and to build social resilience (Thompson, 2003). A study by Walter et al. (2012) investigated the knowledge, attitudes, and behavior of the general public about the H1N1 outbreak and vaccination against this virus. They reported a significant difference in information-seeking behaviors of different population subgroups. However, they found that in all subgroups, conventional media sources such as television, radio, and newspapers were more frequently used than the Internet. Wong and Sam (2010) explored the H1N1-related information sources, information needs, and preferences of the general public during the peak of the H1N1 outbreak. Their study, based on computer-assisted telephone interviews of 1,050 respondents in Malaysia, showed that newspapers, television, family members, and healthcare providers were the main sources for seeking H1N1-related information. It was also revealed that the two major information needs during the outbreak were disease prevention and treatment.

As communities in many big cities are becoming very diverse, comprising different ethnic, cultural, and social groups, it is important that the language of the message and delivery channels should be chosen according to demographic composition. Yip et al. (2009) investigated the information-seeking behavior of limited-English proficient (LEP) Chinese in Washington State (USA) during the H1N1 outbreak. It was revealed that the major channels used for seeking H1N1 information were TV (81%), reading Chinese newspapers (69%), and community-based organizations (30%). Only 2 percent of the obtained information was from a public health system or hotline. The authors suggested that appropriate measures are desirable to reach out to different ethnic minorities to enhance their capacity to effectively respond to an outbreak. Lagasse et al. (2011) assessed the literacy level and readability of online communications about the H1N1 virus issued by the Centers for Disease Control and Prevention (CDC) during the first month of the outbreak. It was found that documents targeting non-technical audiences were text-heavy and densely-formatted while the vocabulary and writing style were in accordance with the targeted audience.

Gerwin (2012) argues that information disseminated to the general public by government agencies during a pandemic should be actionable. This means providing accurate facts and their proper interpretations which an individual should be able to use for making judgments and decisions. He further elaborates that information should allow an individual to consider risks to one’s self, his or her family, and community in an uncertain situation. However, Gerwin (2012) reports that a considerable proportion of the general public failed to receive and believe government messages on the safety and desirability of H1N1 vaccination. He also pointed

43 http://www.jistap.org JISTaP Vol.1 No.1, 42-53

out various factors that resulted in distorting the government messages on the H1N1 vaccine in different newspapers. Holmes et al. (2009) proposed that, given the important role played by mass media during an outbreak, there is an urgent need for public health agencies to build partnerships with journalists to help disseminate health information effectively. Another related problem is that although mass media and other sources provide quicker access to huge amounts of information, it may be more difficult for the general public to differentiate between fact and fiction (Gerwin, 2012).

In order to effectively meet the information needs of different segments of the society, it is important to adequately understand their information needs. Yang (2012) investigated knowledge levels among 371 college students of the H1N1 pandemic and availability of anti-viral vaccines. It was reported that most of the students overestimated their knowledge of the H1N1 virus although a majority of them were not familiar with basic facts about this outbreak. Caress et al. (2010) studied the information needs of respiratory patients, considered a high-risk group, and their family members about the H1N1 pandemic. It was revealed that the patients and their family members wanted more information about H1N1, although a majority of them had already received a leaflet on this outbreak. The respondents pointed out that they would like to receive more focused and in-depth information, particularly condition-specific information.

During the peak of the H1N1 pandemic, all major airports made special arrangements to screen out passengers with flu-like symptoms. They also provided essential information to passengers through advertisements, handouts, and public announcements to encourage them to undertake appropriate preventive measures to restrict the spread of the disease. Dickmann et al. (2011) used semi-structured interviews to study the adequacy of information provided to passengers and airport staff during the H1N1 outbreak. Their findings showed that the desire for additional information was associated with the higher level of concern; that is, participants with higher concerns about the H1N1 pandemic expressed a range of information needs. It was also reported that airport staff coming in contact with passengers travelling from high risk areas also showed a higher desire for H1N1 information. It is, therefore, desirable to develop appropriate information strategies for possible future outbreaks to adequately meet the information needs of suspected infected individuals as well as of other high risk groups (Dickmann et al., 2011).

In addition to mass media and other information sources, librarians and information professionals can also play a key role in providing current, relevant and accurate information to their patrons. Featherstone et al. (2012) investigated the support provided by librarians to meet information needs of healthcare administrators. It was revealed that emails and in-person requests were the most popular methods for approaching health librarians for acquiring the needed information. In addition, alerting services from reputable sources were also very useful in gathering reliable information. Therefore, libraries need to leverage their position as a primary source of trustworthy information by providing quick and easy access to credible information during an outbreak (Zach, 2011).

The availability of a variety of new information communication platforms such as blogs, social networks, text messaging, podcasts, online gaming, and virtual worlds has added a new dimension to health information dissemination during outbreaks (Macario et al., 2011). Tausczik et al. (2012) investigated the effectiveness of various media used for information seeking during the H1N1 outbreak by examining language used in blogs, newspaper articles, and the number of visits made to Wikipedia articles. The study revealed that the language used in blogs was strongly related to language used in newspapers on the same day. The number of visits to Wikipedia peaked shortly after the announcement of the H1N1 pandemic and then declined rapidly. The study showed that the public reaction to the H1N1 outbreak was rapid and short-lived. It was suggested that an analysis of web behavior can provide useful data about information seeking during an outbreak (Tausczik et al., 2012).

The above literature review suggests that the impact of any outbreak can be considerably reduced by providing the right information to the right person at the right time and in the right format.

44 Information Needs and Seeking Behavior

However, to achieve this objective, an adequate understanding of the information needs and seeking behavior of the general public during an outbreak like H1N1 is essential. Singapore was one of the Southeast Asian countries badly affected by the H1N1 outbreak. However, only limited research has been done in this region on the information seeking behavior of the general public during this outbreak. The main objective of this study was to bridge this gap and provide insight into this important subject area. Some aspects covered in this study were: information needs of the general public during the H1N1 pandemic, preference for different information sources, purposes of seeking information, use of health websites, and problems faced in using H1N1-related information. The findings of this study will be useful to health information communicators, hospitals, government health departments, social welfare departments, and other agencies involved in public safety and wellbeing. They can also use this knowledge to develop appropriate information strategies to keep the general public informed during an outbreak without creating unnecessary information overload.

2. METHODOLOGY

A pre-tested questionnaire was used for data collection. Several factors were considered while designing the questionnaire to appropriately measure perceptions, attitudes, and behavior of the study respondents. As the targeted population was the general public, efforts were made to avoid using technical jargon. To become familiarized with the topic, the researchers also consulted health related survey questionnaires from REACH quick poll, MyMailMoment quick poll and health literacy studies conducted by SingHealth [Singapore Health] and the Singapore Health Promotion Board. Visits were also made to neighbourhood clinics and other health-related agencies in Singapore to collect H1N1-related brochures, posters, and other materials to analyze their content. To further understand the topic and decide what areas to cover in the survey, informal interviews were conducted with four individuals who were directly or indirectly affected by H1N1. These included parents of a seven year old H1N1 patient and two tertiary-level students who had some H1N1 symptoms after returning from an overseas trip.

The questionnaire consisted of 5 sections containing 26 questions. The purpose of the first section was to explore the respondents’ healthcare lifestyle and general health awareness. The next two sections, containing 27 statements each, collected data about the pandemic-related information needs and information seeking behavior of the respondents. The next section was on problems faced by the respondents in understanding and using H1N1-related information. The final section of the questionnaire collected demographic information about the respondents. The questionnaire was reviewed and approved by the Institutional Review Board (IRB) of Nanyang Technological University, Singapore.

The study population included individuals aged 17 years and above from various ethnic groups living in Singapore. The scope of the study was, however, confined to two major groups: working adults and tertiary-level students as all organizations in Singapore were required by the government to regularly disseminate H1N1-related information to their staff and students. As a majority of the questions were not suitable for self-employed and non-working groups, they were excluded from the study population.

The convenience snowball sampling method was used for data collection. Students were approached during class breaks and were given extra copies for distribution to their friends. For working adults the questionnaire was distributed via friends, working colleagues, neighbors, and other contacts. Altogether 216 useable questionnaires were received and analyzed. The data collection work was completed in the first quarter of 2011.

3. FINDINGS

The following sections present results of the data analysis.

3.1. Respondents’ Demographic

Twenty-nine percent of the respondents were tertiary students while the remaining 71% were working adults. There were more male respondents (59%) than female (41%). The majority of the respondents belonged to the age groups 21 to 30 years (34%) and 31 to 40 years (24%). The percentage of the respondents in the age groups 20 years or less and 41 to 50 years was 19% and 14% respectively. Those aged over 50 years formed only 9% of the respondents.

The majority of the respondents were either Singaporeans (77%) or Singapore permanent residents (13%). The remaining 10% of the respondents were mainly from Malaysia, Indonesia, India, and China.

3.2. Healthcare Habits and Practices

As shown in Table 1, 84.3% of the respondents reported covering their mouth and nose while coughing or sneezing. The percentage of respondents washing their hands several times a day and taking body temperature when feeling sick was 78.7% and 77.8% respectively. However, less than 40% of the respondents were using hand sanitizers and going for annual medical check-ups. It was encouraging to note that a majority of the respondents were observing good personal hygiene which could play a crucial role in preventing spread of diseases.

3.3. Knowledge of the H1N1 Flu Virus

The participants were asked about their knowledge of the spread of the H1N1 virus, its symptoms, high risk groups, preventions, and available treatments. As shown in Table 2, a majority (mean score = 3.70) of the respondents knew how the H1N1 virus could spread from individuals infected with H1N1 influenza. This was consistent with the previous finding in which a majority of the respondents (84.3%) practiced good personal hygiene by covering their mouth and nose when coughing or sneezing. It was, however, observed that the mean scores

Table 1. Respondents’ Healthcare Habits and Practices (N=216)

S. No. Healthcare Measures Frequency Percent

1 I usually cover my mouth and nose while coughing or sneezing. 182 84.3%

2 I wash my hands several times in a day. 170 78.7%

3 I monitor my body temperature when I feel unwell. 168 77.8%

4 I eat balanced meals with plenty of fruits and vegetables. 142 65.7%

5 I exercise regularly. 104 48.2%

6 I use hand sanitizer quite frequently. 85 39.4%

7 I go for health check-up every year. 79 36.6%

Table 2. Respondents’ Knowledge of the H1N1 Virus (N=216)

S. No. Knowledge of H1N1 Mean Score (1~5) SD

1 I know how the H1N1 virus can spread from people with influenza. 3.70 1.07

2 I know the symptoms of H1N1. 3.56 1.02

3 I know who are the high risk groups of people for H1N1 infection 3.46 1.25

4 I know what steps I need to take to control spread of the H1N1 virus. 3.43 1.09

5 I know what treatments are available for H1N1 infection. 3.23 1.22
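The mean scores and standard deviations in Table 2 summarize ratings on a 5-point scale. As a minimal sketch of how such summary statistics are typically computed, the Python snippet below uses a small set of hypothetical ratings (the study's raw responses are not reported) and assumes the sample standard deviation; the paper does not state which SD formula was used.

```python
from statistics import mean, stdev

# Hypothetical 5-point ratings from ten respondents for a single statement.
# These values are illustrative only and are NOT data from the study.
ratings = [5, 4, 4, 3, 5, 2, 4, 3, 4, 3]


def summarize(scores):
    """Return (mean, sample standard deviation), each rounded to two decimals."""
    if any(s < 1 or s > 5 for s in scores):
        raise ValueError("ratings must lie on the 1-5 scale")
    # stdev() is the sample SD (n - 1 denominator); this choice is an assumption.
    return round(mean(scores), 2), round(stdev(scores), 2)


mean_score, sd = summarize(ratings)
print(f"Mean score = {mean_score:.2f}, SD = {sd:.2f}")
```

With only five scale points, an SD slightly above 1 (as in Table 2) indicates that individual ratings were spread over most of the scale even where the mean sits near the middle.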


for the remaining types of knowledge fell within a very narrow range of 3.23 to 3.56. On the whole it appeared that participants in this survey had a reasonable level of knowledge and awareness about the H1N1 virus.

At the initial stages of the H1N1 outbreak, several terms including ‘Swine flu’ and ‘Influenza A’ were used to describe this pandemic. In order to further investigate the respondents’ knowledge they were asked if the H1N1 virus could spread by eating pork, visiting a pig farm, or undertaking pig-related activities. It was interesting to note that 14% of the respondents believed that people can get H1N1 by consuming pork while another 29% thought that people engaged in pig-related activities can get infected. Based on these replies, it can be concluded that there were some myths and knowledge gaps in understanding of the H1N1 flu virus.

3.4. H1N1-related Information Needs

The respondents were asked to indicate the importance of different H1N1-related information needs, using a 5-point Semantic Differential scale where 1 represented the ‘least important’ and 5 the ‘most important’ information need (Table 3). The top 5 most important information needs were: prevention and control of H1N1 virus (mean score 4.01); symptoms of H1N1 (mean score 3.98); causes and treatments (mean score 3.85); spread of H1N1 in Singapore (mean score 3.77); and availability of medicines and vaccination in Singapore (mean score 3.76). On the other hand, the information needs receiving lowest mean scores were: origin of H1N1 virus (mean score 3.19); number of H1N1 recovered patients (mean score 3.19); and relationship between eating pork and H1N1 infection (mean score 3.15). As all the information needs fell in a small range of mean scores (3.15 - 4.01), it can be concluded that the respondents had very diverse information needs. A study by Wong and Sam (2010) also revealed that the two most important information needs during the Influenza A pandemic were prevention and treatment of H1N1 infection.

Table 3. Importance of H1N1-related Information Needs (N=216)

Ranking Information Need Importance Level Mean Score (1~5) SD

1 Prevention and control of H1N1 virus 4.01 1.18

2 H1N1 signs and symptoms 3.98 1.10

3 Causes and treatments of illness 3.85 1.19

4 Spread of H1N1 in Singapore 3.77 1.21

5 Availability of medicines and vaccination in Singapore against H1N1 and their side effects 3.76 1.27

6 Government’s advice for individuals having flu like symptoms 3.66 1.15

7 Information about proper procedure for washing hands 3.66 1.11

8 H1N1 vulnerable groups and the level of risk 3.65 1.24

9 H1N1 protection products and their availability at major retail outlets (e.g. masks, sanitizers, etc.) 3.59 1.20

10 Updated information about H1N1 cluster areas in Singapore 3.56 1.25

11 Updated information about who should get the H1N1 vaccine 3.55 1.23

12 Information about proper ways for putting on a mask 3.55 1.17

13 Updated information about current and future Pandemic Plan for Singapore 3.50 1.17

14 Procedure for seeking treatment of suspected H1N1 patients at Pandemic Preparedness Clinics (PPCs) or hospitals 3.49 1.26

15 Updated list of H1N1 affected countries 3.46 1.22

16 Updated number of H1N1 fatalities in Singapore and other countries. 3.34 1.25

17 Updated number of H1N1 infected cases across the world 3.21 1.25

18 Origin of H1N1 virus 3.19 1.32

19 Updated number of H1N1 infected patients who have recovered in Singapore 3.19 1.25

20 Facts about pig and eating pork in relation to H1N1 3.15 1.21

3.5. Information Seeking Behavior

The participants were asked to identify the sources they used for seeking the latest H1N1-related information during the peak period of the outbreak. They were given several options under the broad categories of human sources, print sources, media sources, on-line sources, and health related websites. A 5-point Semantic Differential scale was used for data collection where 1 represented ‘least frequently’ and 5 ‘most frequently’ used information sources.

3.5.1. Use of Human Information Sources

As shown in Table 4, the most frequently used human source for seeking H1N1-related information was friends (mean score 3.43), followed by family members (mean score 3.25) and colleagues (mean score 3.21). Family doctors were approached the least frequently (mean score 2.35) for seeking H1N1 information probably because they were only consulted when it was suspected that someone was infected by the virus. Previously Wong and Sam (2010) also reported more dependence on family members and healthcare providers for getting information about the H1N1 pandemic.

Through an open-ended option, the respondents were asked to identify other human sources used by them for getting H1N1 information. Two sources reported more frequently were school teachers and institutional Human Resource (HR) staff. In many institutions, HR departments were required to provide up-to-date information about H1N1 to their staff as well as instructions about certain precautionary measures such as temperature recording, use of face masks, and travel advisories. A further data analysis showed that students were more likely to obtain information from their friends and teachers while working adults were more likely to obtain it from their colleagues and HR department.

Table 4. Preferred Human Sources

Ranking Human Sources N Frequency Level Mean Score (1~5) SD

1 Friends 216 3.43 1.23

2 Family members 216 3.25 1.30

3 Colleagues 153 3.21 1.31

4 Family doctor 216 2.35 1.60

3.5.2. Print and Media Information Sources

It was found that in this category the top three most frequently used sources were mass media sources, i.e. television (mean score 4.02), newspapers (mean score 4.00), and radio (mean score 3.71) (Table 5). Information used from these sources included news, announcements, and healthcare related government advisories. On the other hand, the least frequently used sources were magazines, emails, circulars sent by schools or company management, and H1N1-related health pamphlets. This could be because the respondents were probably flooded with many emails and were getting almost the same instructions repeatedly from different sources. Magazines were the least frequently used source (mean score 2.20) probably due to their inability to provide up-to-date information. These findings are in line with previous studies which also report a preference for mass media sources such as television, newspapers, and radio in an outbreak situation (Walter et al., 2012; Wong & Sam, 2010; Thompson, 2003).

Table 5. Preferred Print and Media Sources

Ranking Print / Media Sources N Frequency Level Mean Score (1~5) SD

1 Television 216 4.02 1.14

2 Newspapers 216 4.00 1.09

3 Radio 216 3.71 1.31

4 Healthcare posters 216 3.08 1.41

5 Healthcare pamphlets 216 3.00 1.47

6 Emails/ circulars from schools/ companies 216 2.80 1.71

7 Magazines 216 2.20 1.53

3.5.3. Online Information Sources

On the whole, the respondents used online information sources less frequently (Table 6). Three comparatively more frequently used online sources were school websites (mean score 2.65), news websites (mean score 2.60), and company intranets (mean score 2.56). Interestingly, social networking websites and online databases were the least frequently used online information sources.

In response to an open-ended option for online information sources, some respondents indicated Yahoo and Google as other means for seeking H1N1-related information. As Yahoo and Google are search engines, they were not considered as information sources in this study.

3.5.4. Health related Websites

The three most frequently used health websites for seeking H1N1-related information were the

Table 6. Preferred Online Sources

Ranking Online Sources N Frequency Level Mean Score (1~5) SD

1 School website 63 2.65 1.81

2 News websites (CNA, BBC, etc.) 216 2.60 1.81

3 Company intranet 153 2.56 1.76

4 Social network websites (Twitter, Blogspot, etc.) 216 1.98 1.69

5 Online databases (e.g. PubMed, Factiva) 216 1.69 1.65
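The N column in Table 6 varies by source (63, 153, or 216). A plausible reading, stated here as an assumption rather than something the paper spells out, is that school websites were rated only by tertiary students and company intranets only by working adults. The short Python check below shows that these two subgroup sizes add up to the full sample and reproduce the 29% and 71% shares reported in the demographics section.

```python
TOTAL_RESPONDENTS = 216  # usable questionnaires reported in the methodology
STUDENTS = 63            # N for "School website" in Table 6 (assumed: students only)
WORKING_ADULTS = 153     # N for "Company intranet" in Table 6 (assumed: working adults only)


def share(count, total=TOTAL_RESPONDENTS):
    """Return count as a whole-number percentage of the full sample."""
    return round(100 * count / total)


# The two subgroups should account for every respondent.
subgroups_cover_sample = (STUDENTS + WORKING_ADULTS == TOTAL_RESPONDENTS)

student_share = share(STUDENTS)
adult_share = share(WORKING_ADULTS)
print(f"Students: {student_share}%, working adults: {adult_share}%")
```

Mean scores for items with N=63 or N=153 therefore describe one subgroup only and are not directly comparable with items rated by all 216 respondents.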


Singapore Ministry of Health website (mean score 2.82), a dedicated H1N1 website launched by the Singapore Ministry of Health, and the website of the Singapore Health Promotion Board (Table 7). However, certain international health-related websites such as the WHO (mean score 2.06) and CDC (mean score 1.87) websites were the least frequently used for seeking H1N1-related information. This was surprising as the WHO website provided very comprehensive coverage during the H1N1 outbreak. A probable explanation is that most respondents believed that local websites could provide more up-to-date and directly relevant information about H1N1. A previous study by Walter et al. (2012) also reported limited use of the Internet during the 2009 Influenza A outbreak in Germany.

3.5.5. Overall Most Preferred Information Sources

Table 8 presents a combined list of the top ten most preferred information sources for seeking H1N1-related information. The top three positions were occupied by mass media sources. This was probably because these sources provided very up-to-date H1N1-related news as well as government announcements and advisories. The next two important information sources were friends and family members. It is worth noting that, on the whole, online information sources and

Table 7. Preferred Health Web Sources (N=216)

Ranking Health Related Web Sources Frequency Level Mean Score (1~5) SD

1 Ministry of Health, Singapore (www.moh.gov.sg) 2.82 1.75

2 Dedicated H1N1 website, Ministry of Health, Singapore (now unavailable) 2.78 1.76

3 Health Promotion Board, Singapore (www.hpb.gov.sg) 2.66 1.70

4 Influenza A Home - Singapore Government Crisis News (now unavailable) 2.25 1.74

5 World Health Organization (WHO) (http://www.who.int/en/) 2.06 1.71

6 Centers for Disease Control and Prevention (CDC) (www.cdc.gov) 1.87 1.74

Table 8. Top 10 Sources for Seeking H1N1-related Information

Ranking Information Sources N Frequency Level Mean Score (1~5) SD Source Type

1 Television 216 4.02 1.14 Media

2 Newspapers (online/print) 216 4.00 1.09 Media

3 Radio 216 3.71 1.31 Media

4 Friends 216 3.43 1.23 Human

5 Family members 216 3.25 1.30 Human

6 Healthcare posters 216 3.08 1.41 Print

7 Healthcare pamphlets 216 3.00 1.47 Print

8 Colleagues 153 3.21 1.31 Human

9 Emails from school management 63 2.86 1.71 Print

10 Singapore Ministry of Health website 216 2.82 1.75 Website


health-related websites were least frequently used for seeking H1N1 information. This was particularly surprising as IT literacy in Singapore is quite high and more than 90% of the households have Internet access. It is possible that most of the H1N1-related information needs of the respondents were adequately met through mass media and people close to them, and therefore there was no pressing need to further search online sources and websites.

3.5.6. Purposes of Seeking H1N1-related Information

After indicating their preference for different information sources, the respondents were asked to indicate their purposes in seeking H1N1-related information. The respondents were allowed to select more than one option from the list provided. Though respondents were also given an open-ended option to indicate additional purposes, none of them provided any input.

As shown in Table 9, the two most important purposes for seeking H1N1-related information were ‘to remain vigilant and adjust precautionary measures accordingly’ and ‘to keep themselves informed of the latest news.’ It appeared that a majority of the respondents were concerned about the emergence of a new virus and wanted to protect themselves and family members against this virus. A considerable number of the respondents also revealed that they sought H1N1 information for its possible dissemination to other interested individuals. This indicated their sense of social responsibility and willingness to help other community members.

3.6. Problems in Using H1N1-related Information

One concern during the H1N1 outbreak was the ability of the general public to adequately understand terminology used in news and public communications. A list of 8 routinely used H1N1-related terms was provided in the questionnaire and the respondents were asked to point out difficult or confusing terms. It was found that a majority of the respondents (64%) were unable to understand the term ‘DORSCON’ alert levels, which indicate the risk of acquiring an infectious disease (Table 10). Two other difficult to understand terms or concepts were ‘Mitigation Phase’ and ‘Pandemic Business Continuity program.’ The mitigation phase started when the H1N1 flu was expected to be managed in a similar manner to a seasonal flu. The Pandemic Business Continuity program was started by the Singapore government during the H1N1 outbreak to help businesses prepare to deal with the effects of the flu pandemic. The terms adequately understood by a majority of the respondents were ‘social distancing,’ ‘contact tracing,’ ‘swine flu,’ and ‘Influenza A.’ This recognition could be because these terms appeared more frequently in the media and were more relevant to the respondents. Gerwin (2012) also argues that the general public is likely to face difficulties in adequately understanding terminology in information disseminated by various media channels.

The participants were also asked if they faced any problems in seeking information during the H1N1 outbreak. Although a majority of the respondents

Table 9. Purpose of Seeking H1N1 Information (multiple responses)

Ranking Purpose Frequency (N=216) Percent

1 To remain vigilant and adjust my own precautionary measures accordingly. 157 72.7%

2 To find out latest information related to H1N1 for my own personal use. 128 59.3%

3 To help someone who was looking for information. 82 38.0%

4 To prepare corporate advisory and circular related to H1N1 for distribution within my organisation. 55 25.5%

5 To find out information about H1N1 for my school assignments (only for students). 35 16.2%
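Because respondents could select more than one purpose, the percentages in Table 9 are each computed against the full sample of 216 and therefore sum to more than 100%. The Python sketch below reproduces the reported percentages from the frequencies; the short dictionary keys are paraphrased labels introduced here for illustration, not wording from the questionnaire.

```python
N_RESPONDENTS = 216  # usable questionnaires in the study

# Frequencies from Table 9; keys are shortened, paraphrased labels.
purpose_counts = {
    "remain vigilant": 157,
    "personal use": 128,
    "help someone else": 82,
    "corporate advisory": 55,
    "school assignment": 35,
}

# Each percentage uses the number of respondents as its base,
# not the total number of selections made.
percentages = {
    purpose: round(100 * count / N_RESPONDENTS, 1)
    for purpose, count in purpose_counts.items()
}

total_selections = sum(purpose_counts.values())
avg_purposes = round(total_selections / N_RESPONDENTS, 2)

for purpose, pct in percentages.items():
    print(f"{purpose:20s} {pct:5.1f}%")
print("Total selections:", total_selections)
print("Average purposes selected per respondent:", avg_purposes)
```

On these figures each respondent selected a little over two purposes on average, which is why the column of percentages should be read row by row rather than as shares of a whole.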


Table 10. Unfamiliar H1N1-related Terms

Ranking Term Frequency Percent

1 DORSCON alert level 138 63.9%

2 Mitigation phase 100 46.3%

3 Flu Pandemic Business Continuity Program 96 44.4%

4 Containment phase 66 30.6%

5 Social distancing 32 14.8%

6 Contact tracing 30 13.9%

7 Swine flu 20 9.3%

8 Influenza A 19 8.8%

disagreed with the listed statements, two problems pointed out by a considerable number of the participants were the availability of too much repetitive information through the Internet, and too many emails providing too much H1N1 information (Table 11). The next two problems also highlighted the issue of excessive information availability. It appeared that some respondents were facing information overload due to availability of excessive and repetitive information about H1N1 from multiple sources. This also explains our earlier findings that emails and circulars were less preferred choices for seeking H1N1 information as probably the respondents were already overwhelmed by receiving frequent and repetitive updates from different agencies.

4. CONCLUSION

Due to rapid urbanization and frequent air travel, there is a real danger that a local disease outbreak can easily become pandemic and spread all over the world within a very short period of time. In addition to other measures, creating awareness among the general public is essential to curtail the spread of any epidemic. To achieve this purpose, it is desirable that we should adequately understand the information needs and seeking behavior of people during an outbreak. Such knowledge would be very useful to national and international health agencies, local public communication departments, hospitals, charity and relief organizations, information and communication professionals, and other public service agencies to prepare appropriate information strategies for implementation during possible future epidemics.

It was found that among all the information sources, television, newspapers, and radio were the most preferred sources for obtaining H1N1-related information. Some previous studies also suggest that these sources are frequently used by the general public during epidemics and natural disasters.

Table 11. Information Seeking Problems (N=216)

S. No. Statement Agreed Disagreed

1 Too much repetitive information available through the Internet 46.5% 53.5%

2 Too frequent emails containing too much information. 40.1% 59.9%

3 Too many healthcare printed pamphlets sent by different agencies. 34.6% 65.4%

4 Too many updates and frequent changes in content. 30.1% 69.9%

5 Difficulty in adequately understanding H1N1 information. 28.4% 71.6%


Thus relevant agencies should take full advantage of the power of mass media for providing emergency alerts, creating awareness, and disseminating necessary advisories during crises. Such efforts are likely to improve public preparedness to act responsibly and cooperate in the efforts to limit the impact of any outbreak.

It is equally important that all agencies involved in public awareness during an outbreak should actively collaborate and come up with common strategies for providing timely, relevant, and accurate health information to different segments of society. It appeared in this study that some of the respondents were unhappy about receiving too much repetitive information from multiple agencies. The danger is that involvement of too many agencies in public health communication may result in confusion, information overload, or even information anxiety. It would be more appropriate if concerned agencies decide on their information-related roles and responsibilities during a crisis and make necessary preparations accordingly.

REFERENCES

Caress, A., Duxbury, P., Woodcock, A., Luker, K., Ward, D., Campbell, M., & Austin, L. (2010). Exploring the needs, concerns and behaviours of people with existing respiratory conditions in relation to the H1N1 ‘swine influenza’ pandemic: a multicentre survey and qualitative study. Health Technology Assessment, 14(34), 1-108.

Dickmann, P., Rubin, G., Gaber, W., Wessely, S., Wicker, S., Serve, H., & Gottschalk, R. (2011). New Influenza A/H1N1 (‘Swine Flu’): Information needs of airport passengers and staff. Influenza and other Respiratory Viruses, 5(1), 39-46.

Featherstone, R., Boldt, R., Torabi, N., & Konrad, S. (2012). Provision of pandemic disease information by health sciences librarians: A multisite comparative case series. Journal of the Medical Library Association, 100(2), 104-112.

Gerwin, L. (2012). The challenge of providing the public with actionable information during a pandemic. Journal of Law, Medicine & Ethics, 40(3), 630-654.

Holmes, B. J., Henrich, N., Hancock, S., & Lestou, V. (2009). Communicating with the public during health crises: Experts’ experiences and opinions. Journal of Risk Research, 12(6), 793-807.

Lagasse, L. P., Rimal, R. N., Smith, K. C., Storey, J., Rhoades, E., Barnett, D. J., & Links, J. (2011). How accessible was information about H1N1 flu? Literacy assessments of CDC guidance documents for different audiences. PLOS ONE, 6(10), 1-6.

Macario, E., Ednacot, E., Ullberg, L., & Reichel, J. (2011). The changing face and rapid pace of public health communication. Journal of Communication in Healthcare, 4(2), 145-150.

Tausczik, Y., Faasse, K., Pennebaker, J., & Petrie, K. (2012). Public anxiety and information seeking following the H1N1 outbreak: Blogs, newspaper articles, and Wikipedia visits. Health Communication, 27(2), 179-185.

Thompson, M. R., Heath, I., Ellis, B. G., Swarbrick, E. T., Wood, L. F., & Atkin, W. S. (2003). Identifying and managing patients at low risk of bowel cancer in general practice. British Medical Association Journal: Clinical Research Edition, 327(7409), 263-265.

Walter, D., Bohmer, M., Reiter, S., Krause, G., & Wichmann, O. (2012). Risk perception and information-seeking behaviour during the 2009/10 influenza A (H1N1) pdm09 pandemic in Germany. Euro Surveillance: European Communicable Disease Bulletin, 17(13).

Wong, L., & Sam, I. (2010). Public sources of information and information needs for pandemic influenza A (H1N1). Journal of Community Health, 35(6), 676-682.

Yang, Z. (2012). Too scared or too capable? Why do college students stay away from the H1N1 vaccine? Risk Analysis: An International Journal, 32(10), 1703-1716.

Yip, M., Ong, B., Painter, I., Meischke, H., Calhoun, B., & Tu, S. (2009). Information-seeking behaviors and response to the H1N1 outbreak in Chinese limited-English proficient individuals living in King County, Washington. American Journal of Disaster Medicine, 4(6), 353-360.

Zach, L. (2011). What do I do in an emergency? The role of public libraries in providing information during times of crisis. Science & Technology Libraries, 30(4), 404-413.

Research Paper
J. of infosci. theory and practice 1(1): 54-68, 2013
http://dx.doi.org/10.1633/JISTaP.2013.1.1.4
Journal of Information Science Theory and Practice (JISTaP), http://www.jistap.org

A Study on Behavioral Traits of Library and Information Science Students in South India

S. Baskaran*
Madras University Library, University of Madras, India
Email: [email protected]

B. Ramesh Babu
Dept. of Library and Information Science, University of Madras, India
Email: [email protected]

S. Gopalakrishnan
Madras Institute of Technology, Anna University, India
Email: [email protected]

ABSTRACT
Human behaviour normally depends on the environment of an incident and the time of its occurrence. The behaviour of people depends on many factors, and these behaviour traits are an important aspect of the Library and Information Science (LIS) field. Hence in this paper an attempt has been made to examine the behaviour traits of LIS students in South India. Out of 400 questionnaires distributed, 367 were returned, a response rate of 91.75%. Three aspects of student behaviour have been analysed in this survey: Work Environment, Natural Environment, and Social Environment. In the case of Work Environment the respondents were grouped as Workaholic, Impatience, Achievement oriented, Rash nature, and Punctuality. In respect to Natural Environment, the respondents were grouped as Complacent, Patience, Easygoing, and Relaxed. Last, in the Social Environment the respondents were grouped as Balancing nature, Magnanimity, Naturalistic, Assertive nature, Dependency, Lucrative, Lonely nature, and Time Based personality. Finally the authors conclude that LIS students need to possess these qualities and behaviours to work in different environments.

Keywords: LIS students, behaviour traits, work environment, natural environment, social environment, India

1. INTRODUCTION

There is widespread interest, discussion, and exploration globally regarding school improvement in one form or another. One of the often cited reasons for educational change is the need to prepare the young for participation in new economic and work environments, where the basis of employment is more flexible and the required skills tend to be higher order, more diverse, and continually changing. Today it is seen that academic skills and intelligences alone are not sufficient to cope with the

Received date: December 19, 2012
Accepted date: March 11, 2013

*Corresponding Author: S. Baskaran
Technical Officer
Madras University Library
University of Madras, India
E-mail: [email protected]

Open Access
All JISTaP content is Open Access, meaning it is accessible online to everyone, without fee and authors’ permission. All JISTaP content is published and distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/). Under this license, authors reserve the copyright for their content; however, they permit anyone to unrestrictedly use, distribute, and reproduce the content in any medium as far as the original authors and source are cited. For any reuse, redistribution, or reproduction of a work, users must clarify the license terms under which the work was produced.

S. Baskaran, B. Ramesh Babu, S. Gopalakrishnan, 2013

global competition. In the field of Library and Information Science (LIS), skills and competencies are needed, which demands that students behave properly in their learning and working environments. It has been established that LIS professionals should possess certain psychological qualities as a built-in capability of the profession. Hence multiple intelligences become mandatory coupled with proper behaviour, which in turn refers to behavioral psychology. Behaviour is a manner of behaving or conducting oneself. Conditioning, reinforcement, and punishment are key concepts used by behaviorists.

The profiles of behavioral traits are as follows (Friedman and Rosenman, 1959; McAdams, 1996; Vazquez-Carrasco and Foxall, 2006; Wright, 1996):

* Accommodating: concern for group accountability
* Assertiveness: a measure of generalized self acceptance and confidence
* Attitude: related to stability and poise
* Decisiveness: associated with taking control as well as self acceptance
* Energy Level: a tendency toward restlessness, activity, and drive
* Independence: individual preference rather than being directed by others
* Manageability: social responsibility and stability
* Objective Judgment: a sense of rational competence and objectivity
* Sociability: a measure of social presence and self-confidence

Stogdill (1948) and Mann (1959) reported that many earlier studies on traits attempted to identify personality characteristics that appear to differentiate one person from another. More recently people have tried looking at what combinations of traits might be good for a particular situation. There is some mileage in this. It appears possible to link clusters of personality traits to success in different situations (Wright, 1996).

McAdams (1996) suggests that personality traits, which deal with temporally and situationally invariant personal characteristics, distinguish different individuals and lead to consistencies in behavior across situations and over time. The service industries are always associated with direct personal contact, and hence with personality, temperament, and other internal factors (Feng and Zhang, 2009).

There are no studies reported in the context of LIS students and hence this study bridges the gap. In this paper an attempt has been made to study the behavioral traits of Library and Information Science (LIS) students and to categorize them based on ‘Work environment,’ ‘Natural environment,’ and ‘Social environment.’ The study has been carried out with the following objectives:

* To identify the behavioral traits of ‘Work Environment,’ ‘Natural Environment,’ and ‘Social Environment’ among Library and Information Science students in South India.
* To identify the differences in behavioral traits between male and female students in Library and Information Science.
* To compare the behavioral traits of the students of Library and Information Science in different geographical environments.

2. RESEARCH DESIGN

This study sought views on behavioural traits of master’s degree students in Library and Information Science in various universities of Southern India. For this purpose a structured questionnaire was administered among all 400 LIS students (the population) of LIS schools spread through four southern States, namely Tamil Nadu, Andhra Pradesh, Karnataka, and Kerala, and one Union Territory, Pondicherry. The survey is based on the census method. There are about 400 master’s students studying in the states of Southern India and the survey has been given to the entire population. Out of 400 questionnaires distributed, 367 were returned, a response rate of 91.75%.

The data collected from the respondents were analysed using the SPSS software package. The background information of the respondents is presented in Table 1.

Out of 367 respondents, 58.04% are male and


41.96% are female. 71.39% of respondents are in the age group below 25 years, 24.52% are between 25 and 29 years, and 4.09% are above 29 years. The largest number of respondents is from Karnataka (36.51%), followed by Andhra Pradesh (27.79%), Tamil Nadu (23.71%), and Kerala (10.63%). The lowest number of respondents (1.36%) belongs to Pondicherry, since there were only 5 PG students studying in that university at the time of the survey.

The university-wise distribution of respondents is shown in Table 2.

A total of 22 LIS Schools are listed in the four southern states, including one Union Territory. Out of these there are 7 each in Tamil Nadu and Karnataka, followed by 5 in Andhra Pradesh.

Table 1. Demographic Background Information about the Respondents

Description | Category | No. of Respondents | Percentage
Gender | Male | 213 | 58.04
Gender | Female | 154 | 41.96
Age | Below 25 years | 262 | 71.39
Age | Between 25 and 29 | 90 | 24.52
Age | Above 29 | 15 | 4.09
State | Tamil Nadu | 87 | 23.71
State | Andhra Pradesh | 102 | 27.79
State | Karnataka | 134 | 36.51
State | Kerala | 39 | 10.63
State | Pondicherry | 5 | 1.36
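The response rate and the percentage column of Table 1 follow directly from the reported counts. The short sketch below recomputes them; the helper name `percent` and the variable names are illustrative only and are not taken from the study, which used SPSS.

```python
def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)

# Counts reported in Section 2 and Table 1.
distributed, returned = 400, 367
state_counts = {
    "Tamil Nadu": 87,
    "Andhra Pradesh": 102,
    "Karnataka": 134,
    "Kerala": 39,
    "Pondicherry": 5,
}

response_rate = percent(returned, distributed)
state_share = {state: percent(n, returned) for state, n in state_counts.items()}

print(f"Response rate: {response_rate}%")
for state, share in state_share.items():
    print(f"{state}: {share}%")
```

Every percentage uses the 367 returned questionnaires as its base, not the 400 distributed.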

Table 2. University-wise Distribution of Respondents

State | University/Institution | Frequency | Percent
Tamil Nadu | Alagappa University | 12 | 3.27
Tamil Nadu | Bharathidasan University | 7 | 1.91
Tamil Nadu | University of Madras | 10 | 2.72
Tamil Nadu | Bishop Heber College | 17 | 4.63
Tamil Nadu | AVVM Sri Pushpam College | 10 | 2.72
Tamil Nadu | Madurai Kamaraj University | 10 | 2.72
Tamil Nadu | Annamalai University | 21 | 5.72
Andhra Pradesh | Sri Venkateswara University | 15 | 4.09
Andhra Pradesh | Andhra University | 30 | 8.17
Andhra Pradesh | Ambedkar University | 23 | 6.27
Andhra Pradesh | Sri Krishnadevaraya University | 18 | 4.90
Andhra Pradesh | Osmania University | 16 | 4.36
Karnataka | Bangalore University | 21 | 5.72
Karnataka | Mysore University | 24 | 6.54
Karnataka | Karnatak University | 24 | 6.54
Karnataka | Mangalore University | 16 | 4.36
Karnataka | Gulbarga University | 28 | 7.63
Karnataka | Kuvempu University | 16 | 4.36
Karnataka | Documentation Research and Training Centre (DRTC) | 5 | 1.36
Kerala | Calicut University | 18 | 4.90
Kerala | Kerala University | 21 | 5.72
Pondicherry | Pondicherry University | 5 | 1.36
Total | | 367 | 100


3. RESEARCH ANALYSIS: BEHAVIOR TRAITS OF LIS STUDENTS

Friedman and Rosenman (1959) first propounded the A/B type of behavioural pattern to describe certain kinds of individuals who, they believed, tended to be overrepresented as clients in their clinical practice. Based on their study, this paper examined the behavioral traits of LIS students in South India and categorized them under three different environments, namely ‘work environment,’ ‘natural environment,’ and ‘social environment.’ The work environment group is described in terms of working in the office, the natural environment in terms of naturally or habitually showing attitudes irrespective of the environment, and the social environment in regard to attitudes toward society.

In this study the behaviour traits of the LIS students have been examined in the three environments stated above. The number of variables taken up under each is:

* Work Environment: 16 variables
* Natural Environment: 16 variables
* Social Environment: 27 variables

3.1. Reliability Test
Reliability is concerned with the consistency of a variable. There are two identifiable aspects of this issue: external and internal reliability. Nowadays, the most common method of estimating internal reliability is Cronbach’s alpha ($\alpha$), which is roughly equivalent to the average of all possible split-half reliability coefficients for a scale (Zeller and Carmines, 1980). The usual formula is

$$\alpha = \frac{K}{K-1}\left(1 - \frac{\sum_{i=1}^{K}\sigma_i^2}{\sigma_x^2}\right) \qquad (1)$$

Here $K$ is the number of items; $\sum_{i=1}^{K}\sigma_i^2$ is the sum of the variances of the individual items; and $\sigma_x^2$ is the variance of the total score (Pedhazur and Schmelkin, 1991). As a result, alpha is most appropriately used when the items measure different substantive areas within a single construct. When the set of items measures more than one construct, coefficient omega_hierarchical is more appropriate (McDonald, 1999; Zinbarg et al., 2005).

Commonly accepted rules for describing internal consistency using Cronbach’s alpha (Cronbach and Shavelson, 2004) are $\alpha \ge 0.9$ (Excellent), $0.9 > \alpha \ge 0.8$ (Good), $0.8 > \alpha \ge 0.7$ (Acceptable), $0.7 > \alpha \ge 0.6$ (Questionable), $0.6 > \alpha \ge 0.5$ (Poor), and $0.5 > \alpha$ (Unacceptable).

Therefore Cronbach’s alpha has been calculated for the variables taken up for the three groups, and the values are shown in Table 3.

Table 3. Reliability Test

Environment | No. of Variables | Alpha Value
Work Environment | 16 | 0.7307
Natural Environment | 16 | 0.7271
Social Environment | 27 | 0.9019

The Cronbach’s alpha values indicate that the variables taken up for the study are acceptable in all three groups.

3.2. Work Environment Group
The nature of behaviour is time-bound in the case of the work environment. Opinions on 16 variables were collected on a five-point scale: “Strongly Agree,” “Agree,” “No opinion,” “Disagree,” and “Strongly Disagree.” The mean and standard deviation were calculated based on the opinions, and ranks were then assigned. The opinions, mean, standard deviation, and rank are shown in Table 4.

The mean values in Table 4 show that the highest value (4.14) is for “I am never late if I have an appointment,” whereas the same variable shows more variation, with a standard deviation of 1.20.

The scores obtained from the responses of the 367 LIS students on the 16 variables were subjected to factor analysis, and five factors emerged (Table 5).

As can be seen from Table 5, the variables are grouped into five components.
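The reliability coefficient of Equation (1) can be sketched in a few lines. This is a minimal illustration, not the authors' SPSS procedure: the function names are assumptions, the toy responses are invented purely to exercise the code, and only the three alpha values at the end are taken from Table 3.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for `items`, a list of K lists. Each inner list holds
    one item's scores, with respondents in the same order across items."""
    k = len(items)
    # Sum of the individual item variances (numerator term in Equation 1).
    item_variance_sum = sum(variance(scores) for scores in items)
    # Variance of each respondent's total score (denominator term).
    totals = [sum(answers) for answers in zip(*items)]
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def describe(alpha):
    """Rule-of-thumb label for an alpha value, as listed in Section 3.1."""
    bands = [(0.9, "Excellent"), (0.8, "Good"), (0.7, "Acceptable"),
             (0.6, "Questionable"), (0.5, "Poor")]
    for threshold, label in bands:
        if alpha >= threshold:
            return label
    return "Unacceptable"

# Hypothetical data: 3 items answered by 5 respondents on a 1-5 scale.
toy_items = [[4, 5, 3, 2, 4], [5, 5, 3, 1, 4], [4, 4, 2, 2, 5]]
alpha = cronbach_alpha(toy_items)
print(round(alpha, 4), describe(alpha))

# Alpha values reported in Table 3.
for group, value in {"Work": 0.7307, "Natural": 0.7271, "Social": 0.9019}.items():
    print(group, value, describe(value))
```

The sketch uses the sample variance for both the items and the totals. Any consistent choice gives the same alpha, because only the ratio of the two variances enters the formula.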


Table 4. Frequency Data on Behavioural Nature of the Respondents

S. No. | Variables | SA | A | N | D | SD | Mean | Std | R
W1 | I prefer to move around rapidly when I am not doing anything | 79 (21.53) | 157 (42.78) | 56 (15.26) | 42 (11.44) | 33 (8.99) | 3.47 | 1.10 | 8
W2 | I prefer to finish the tasks at hand as soon as possible | 133 (36.24) | 168 (45.78) | 49 (13.35) | 13 (3.54) | 4 (1.09) | 4.11 | 0.85 | 2
W3 | I am never late if I have an appointment | 173 (47.14) | 123 (33.51) | 42 (11.44) | 19 (5.18) | 10 (2.72) | 4.14 | 1.20 | 1
W4 | I tend to feel impatient with the rate at which most events take place | 53 (14.44) | 141 (38.42) | 103 (28.07) | 52 (14.17) | 18 (4.90) | 3.38 | 1.20 | 11
W5 | I have very few interests outside my work | 73 (19.89) | 146 (39.78) | 67 (18.26) | 50 (13.62) | 31 (8.45) | 3.41 | 0.97 | 9
W6 | I feel impatient when I don’t have any work in hand | 67 (18.26) | 179 (48.77) | 65 (17.71) | 36 (9.81) | 20 (5.45) | 3.59 | 1.06 | 6
W7 | I always feel rushed | 32 (8.72) | 148 (40.33) | 79 (21.53) | 86 (23.43) | 22 (5.99) | 3.16 | 1.06 | 15
W8 | I habitually have quick meals | 52 (14.17) | 133 (36.24) | 101 (27.52) | 57 (15.53) | 24 (6.54) | 3.29 | 1.12 | 12
W9 | Competition is my first choice | 138 (37.60) | 144 (39.24) | 57 (15.53) | 21 (5.72) | 7 (1.91) | 4.03 | 1.01 | 3
W10 | I enjoy doing two or more things simultaneously | 65 (17.71) | 173 (47.14) | 65 (17.71) | 52 (14.17) | 12 (3.27) | 3.59 | 1.09 | 7
W11 | I cannot relax without feeling guilt | 65 (17.71) | 126 (34.33) | 80 (21.80) | 65 (17.71) | 31 (8.45) | 3.27 | 1.08 | 13
W12 | I have always struggled to achieve more in less time | 105 (28.61) | 138 (37.60) | 59 (16.08) | 53 (14.44) | 12 (3.27) | 3.71 | 1.06 | 4
W13 | I am very particular to exhibit my superiority whenever I play | 63 (17.17) | 129 (35.15) | 101 (27.52) | 58 (15.80) | 16 (4.36) | 3.41 | 1.04 | 10
W14 | I have always lived the life of deadlines | 35 (9.54) | 87 (23.71) | 89 (24.25) | 122 (33.24) | 34 (9.26) | 2.82 | 1.24 | 16
W15 | I take it as a privilege to display or discuss my achievements or accomplishments whenever I get an opportunity to do so | 85 (23.16) | 168 (45.78) | 59 (16.08) | 41 (11.17) | 14 (3.81) | 3.69 | 1.20 | 5
W16 | I have never found sufficient time for the task at hand | 62 (16.89) | 132 (35.97) | 79 (21.53) | 52 (14.17) | 42 (11.44) | 3.21 | 1.15 | 14

SA - Strongly Agree; A - Agree; N - No opinion; D - Disagree; SD - Strongly Disagree; Mean - Arithmetic Mean; Std - Standard Deviation; R - Rank. Figures in parentheses are percentages.
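The rank column of Table 4 orders the sixteen items by their mean score, highest first. The sketch below reproduces that ordering from the printed means. The tie-breaking rule (items with equal means keep their table order) is an assumption chosen because it matches the printed ranks; the paper does not state it.

```python
# Mean scores as printed in Table 4, keyed by item number.
means = {
    "W1": 3.47, "W2": 4.11, "W3": 4.14, "W4": 3.38,
    "W5": 3.41, "W6": 3.59, "W7": 3.16, "W8": 3.29,
    "W9": 4.03, "W10": 3.59, "W11": 3.27, "W12": 3.71,
    "W13": 3.41, "W14": 2.82, "W15": 3.69, "W16": 3.21,
}

# sorted() is stable, so items with equal means (W6/W10, W5/W13)
# stay in their original table order.
ordered = sorted(means, key=lambda item: -means[item])
ranks = {item: position for position, item in enumerate(ordered, start=1)}

for item in means:
    print(item, means[item], ranks[item])
```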

Further, Eigen values for the same have been calculated. It can be seen that only the first five factors have Eigen values greater than 1. An Eigen value of 1 was the criterion for retention of a factor, which indicates that only the first five factors are to be extracted. It can be seen that the variances were more evenly distributed in the rotated sums of the squared loadings (11.945%, 11.424%, 11.376%, 10.744%, and 9.933% respectively; cumulative variance ratio 55.422%). This indicates that the five factors are interpretable.

The components were named based on the variables under each component: Workaholic, Impatience, Rash nature, Achievement oriented, and Dominating nature (labelled ‘Punctuality’ in Tables 6 to 8). Further, the number of persons and the gender under each component have been identified and are shown in Table 6.


Table 5. Result of Factor Analysis of Work Environment Variables

S. No. | Loading
W5 | .495
W9 | .562
W10 | .700
W15 | .713
W1 | .676
W4 | .699
W16 | .601
W6 | .731
W17 | .702
W8 | .616
W11 | .632
W12 | .733
W13 | .606
W14 | .470
W2 | .795
W3 | .789

Component | 1 | 2 | 3 | 4 | 5
Eigen value | 1.911 | 1.828 | 1.820 | 1.719 | 1.589
Cumulative variance ratio | 11.945 | 23.369 | 34.745 | 45.489 | 55.422

1-Workaholic 2-Impatience 3-Rash Nature 4-Achievement Oriented 5-Dominating Nature
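The retention rule and the variance ratios reported with Table 5 can be checked with a short sketch. It assumes the factor analysis was run on standardized variables (a correlation matrix), in which case the total variance equals the number of variables and each component's share is its Eigen value divided by 16. The variable names are illustrative.

```python
from itertools import accumulate

# Rotated Eigen values from Table 5 and the number of work environment items.
eigenvalues = [1.911, 1.828, 1.820, 1.719, 1.589]
n_variables = 16

# Retention criterion used in the text: keep factors with Eigen value > 1.
retained = [ev for ev in eigenvalues if ev > 1.0]

# Each retained factor's share of the total variance, and the running total.
variance_pct = [100 * ev / n_variables for ev in retained]
cumulative_pct = list(accumulate(variance_pct))

for number, (ev, pct, cum) in enumerate(
        zip(retained, variance_pct, cumulative_pct), start=1):
    print(f"Component {number}: eigenvalue {ev}, {pct:.3f}% (cumulative {cum:.3f}%)")
```

The computed shares agree with the printed ratios to within rounding of the three-decimal Eigen values.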

Table 6. Work Environment vs. Gender

Work Environment | Male No. | Male % | Female No. | Female % | Total No. | Total %
Workaholic | 55 | 14.99 | 24 | 6.54 | 79 | 21.53
Impatience | 34 | 9.26 | 33 | 8.99 | 67 | 18.26
Rash Nature | 39 | 10.63 | 19 | 5.18 | 58 | 15.80
Achievement oriented | 30 | 8.17 | 38 | 10.35 | 68 | 18.53
Punctuality | 55 | 14.99 | 40 | 10.90 | 95 | 25.89
Total | 213 | 58.04 | 154 | 41.96 | 367 | 100.00
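The percentages in Table 6 are shares of all 367 respondents, not shares within each gender. The sketch below makes the difference explicit by computing both from the printed counts. The within-gender figures are an added illustration and do not appear in the paper; all names are illustrative.

```python
# (male, female) counts per work environment group, from Table 6.
table6 = {
    "Workaholic": (55, 24),
    "Impatience": (34, 33),
    "Rash nature": (39, 19),
    "Achievement oriented": (30, 38),
    "Punctuality": (55, 40),
}

n_male = sum(m for m, _ in table6.values())
n_female = sum(f for _, f in table6.values())
n_total = n_male + n_female

# Shares of all respondents: the basis used for the printed Table 6 values.
share_of_all = {
    trait: (round(100 * m / n_total, 2), round(100 * f / n_total, 2))
    for trait, (m, f) in table6.items()
}

# Shares within each gender: an alternative basis, shown only for contrast.
share_within_gender = {
    trait: (round(100 * m / n_male, 2), round(100 * f / n_female, 2))
    for trait, (m, f) in table6.items()
}

for trait in table6:
    print(trait, share_of_all[trait], share_within_gender[trait])
```

Because males outnumber females in the sample (213 against 154), a group can hold a larger share of all respondents among males while holding a similar or smaller share within each gender.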

It can be seen from Table 6 that the “Punctuality” group works out to 25.89%, followed by the “Workaholic” group (21.53%).

From Table 6 it can be observed that ‘Workaholic’ (14.99%) and ‘Punctuality’ (14.99%) are equally important among the males, whereas ‘Punctuality’ (10.9%) and ‘Achievement oriented’ (10.35%) are almost equally important among the females. While ‘Rash nature’ is more common among males (10.63%) in the work environment, it has the least impact (5.18%) among females. Further, it can be seen that the ‘Workaholic’ nature persists among males (14.99%), whereas among females it is 6.54%. It is also found that the ‘Impatience’ nature in the work environment group is almost equal among males and females, i.e. 9.26% and 8.99% respectively.


Table 7. Work Environment vs. State of Respondents

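State | Measure | Workaholic M | Workaholic F | Impatient M | Impatient F | Rash Nature M | Rash Nature F | Achievement Oriented M | Achievement Oriented F | Punctuality M | Punctuality F | Total
Tamil Nadu | No. | 10 | 6 | 5 | 8 | 12 | 5 | 6 | 10 | 11 | 14 | 87
Tamil Nadu | % | 2.72 | 1.63 | 1.36 | 2.18 | 3.27 | 1.36 | 1.63 | 2.72 | 3.00 | 3.81 | 23.71
Pondicherry | No. | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 5
Pondicherry | % | 1.09 | 0 | 0 | 0 | 0.27 | 0 | 0 | 0 | 0 | 0 | 1.36
Andhra Pradesh | No. | 18 | 5 | 12 | 6 | 12 | 4 | 14 | 7 | 18 | 6 | 102
Andhra Pradesh | % | 4.90 | 1.36 | 3.27 | 1.63 | 3.27 | 1.09 | 3.81 | 1.91 | 4.90 | 1.63 | 27.79
Karnataka | No. | 21 | 7 | 16 | 14 | 13 | 5 | 10 | 10 | 24 | 14 | 134
Karnataka | % | 5.72 | 1.91 | 4.36 | 3.81 | 3.54 | 1.36 | 2.72 | 2.72 | 6.54 | 3.81 | 36.51
Kerala | No. | 2 | 6 | 1 | 5 | 1 | 5 | 0 | 11 | 2 | 6 | 39
Kerala | % | 0.54 | 1.63 | 0.27 | 1.36 | 0.27 | 1.36 | 0.00 | 3.00 | 0.54 | 1.63 | 10.63
Total | No. | 55 | 24 | 34 | 33 | 39 | 19 | 30 | 38 | 55 | 40 | 367
Total | % | 14.99 | 6.54 | 9.26 | 8.99 | 10.63 | 5.18 | 8.17 | 10.35 | 14.99 | 10.90 | 100.00

M - Male; F - Female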

The state-wise distribution of respondents under work environment is shown in Table 7.

The following skills have been identified as distinguishing between genders:

1. In Tamil Nadu, ‘Rash nature’ followed by ‘Punctuality’ are more frequent in males whereas ‘Punctuality’ followed by ‘Achievement oriented’ are priorities among females.
2. In Pondicherry all the respondents are male and all favoured the category of ‘Workaholic’ (1.09%) and ‘Rash nature’ (0.27%).
3. In Andhra Pradesh the male group dominates on ‘Workaholic’ and ‘Punctuality’ (4.9%) equally, followed by ‘Achievement oriented’ (3.81%). However, in females ‘Achievement oriented’ (1.91%) dominates, followed by ‘Punctuality’ (1.63%).
4. In Karnataka males fall under the category of ‘Punctuality’ (6.54%), followed by ‘Workaholic’ (5.72%), whereas ‘Punctuality’ and ‘Impatience’ (3.81%) are equally found among females.
5. In Kerala, the male group falls under ‘Workaholic’ and ‘Punctuality’ (0.54%), followed by ‘Impatience’ and ‘Rash nature’ (0.27%), whereas females have ‘Workaholic’ and ‘Punctuality’ (1.63%), followed by ‘Impatience’ and ‘Rash nature’ (1.36%).

From the data in Table 7, the top priority variables for the students are presented in Table 8.

Table 8. Work Environment - Top Priority Skills of Students in States vs. Gender

State | Male | Female
Tamil Nadu | Rash nature | Punctuality
Pondicherry | Workaholic | -
Andhra Pradesh | Workaholic | Achievement oriented
Karnataka | Punctuality | Punctuality
Kerala | Punctuality | Punctuality

3.3. Natural Environment
The sixteen variables selected to ascertain the natural environment behavioural traits among LIS professionals have been evaluated (Table 9). From the factor analysis of obtained scores, four factors emerged and the result is shown in Table 10.


Table 9. Natural Environment

S.No | Description | SA | A | U | D | SD | Mean | Std | Rank
N1 | I do not work under time pressure | 81 (22.1) | 149 (40.6) | 52 (14.2) | 69 (18.8) | 16 (4.4) | 2.44 | 1.204 | 9
N2 | I do not display or discuss either my achievements or accomplishments unless such exposure is demanded by the situation | 71 (19.3) | 169 (46.0) | 82 (22.3) | 33 (9.0) | 12 (3.3) | 1.87 | .850 | 15
N3 | I have never set deadlines for my accomplishments | 47 (12.8) | 166 (45.2) | 83 (22.6) | 54 (14.7) | 17 (4.6) | 1.83 | 1.006 | 16
N4 | I play for fun and relaxation | 109 (29.7) | 191 (52.0) | 39 (10.6) | 23 (6.3) | 5 (1.4) | 2.57 | 1.056 | 6
N5 | I relax whenever I want to do | 85 (23.2) | 188 (51.2) | 44 (12.0) | 34 (9.3) | 16 (4.4) | 2.51 | 1.196 | 8
N6 | I do not give much weightage to quantity in comparison to other measures of success | 56 (15.3) | 144 (39.2) | 86 (23.4) | 66 (18.0) | 15 (4.1) | 2.35 | 1.058 | 11
N7 | I prefer to concentrate on one task at a time | 102 (27.8) | 202 (55.0) | 44 (12.0) | 10 (2.7) | 9 (2.5) | 2.78 | 1.086 | 2
N8 | I enjoy my food by making no haste while eating | 87 (23.7) | 168 (45.8) | 61 (16.6) | 41 (11.2) | 10 (2.7) | 2.64 | 1.105 | 5
N9 | I never feel rushed | 58 (15.8) | 160 (43.6) | 88 (24.0) | 51 (13.9) | 10 (2.7) | 1.95 | .965 | 14
N10 | Leisure time is welcome after a spell of work | 105 (28.6) | 173 (47.1) | 57 (15.5) | 17 (4.6) | 15 (4.1) | 2.38 | 1.036 | 10
N11 | I am open in expressing my feelings | 108 (29.4) | 172 (46.9) | 49 (13.4) | 27 (7.4) | 11 (3.0) | 2.65 | 1.203 | 4
N12 | I have many interests outside my work | 91 (24.8) | 149 (40.6) | 72 (19.6) | 48 (13.1) | 7 (1.9) | 2.26 | 1.120 | 13
N13 | I am comfortable with the rate at which most events take place | 64 (17.4) | 175 (47.7) | 74 (20.2) | 41 (11.2) | 13 (3.5) | 2.55 | 1.082 | 7
N14 | I take appointments casually | 52 (14.2) | 177 (48.2) | 79 (21.5) | 46 (12.5) | 13 (3.5) | 3.09 | 1.148 | 1
N15 | I prefer to complete the tasks at hand slowly | 40 (10.9) | 126 (34.3) | 77 (21.0) | 96 (26.2) | 28 (7.6) | 2.27 | 1.056 | 12
N16 | I prefer to sit at one place when I am not doing anything | 41 (11.2) | 119 (32.4) | 75 (20.4) | 93 (25.3) | 39 (10.6) | 2.67 | 1.238 | 3


As can be seen from Table 10, the variables are grouped into four components. Eigen values were calculated for the same variables. The first four factors have Eigen values greater than 1. An Eigen value of 1 was the criterion for retention of a factor, which indicates that only the first four factors are to be extracted. It can be seen that the variances were more evenly distributed in the rotated sums of the squared loadings (12.676%, 12.508%, 11.811%, and 11.739% respectively; cumulative variance ratio 48.734%), which shows that the four factors are interpretable. The four components have been extracted and named as Complacent, Patience, Easygoing, and Relaxed.

Moreover, the number of respondents and the gender under each component are shown in Table 11. It is seen that individuals under each group are almost evenly distributed, ranging from 23.43% to 27.52%. Further, it can be seen that ‘Patience’ (27.52%) dominates and is followed by the ‘Complacent’ group (25.34%).

From Table 11 it can be seen that Easygoing (16.89%) and Patience (15.8%) respectively were given importance among the males, whereas Complacent (12.81%) and Patience (11.72%) were important among the females. Complacent and Relaxed (12.53% and 12.81%) are almost equal in importance among the males, whereas the females gave less importance to ‘Easygoing’ (6.81%). In general there is a contrast in the natural environment in the case of ‘Easygoing’, which is found more in males and less in females. Similarly, the ‘Complacent’ nature was more favored by female than male respondents.

Table 10. Result of Factor Analysis of Natural Environment

Description | Loading
N7 | .599
N9 | .446
N11 | .643
N13 | .689
N6 | .596
N14 | .428
N15 | .580
N16 | .723
N1 | .629
N2 | .673
N3 | .621
N4 | .686
N5 | .714
N8 | .469
N10 | .462
N12 | .485

Component | 1 | 2 | 3 | 4
Eigen value | 2.028 | 2.001 | 1.890 | 1.878
Cumulative variance ratio | 12.676 | 25.184 | 36.995 | 48.734

1-Complacent 2-Patience 3-Easy Going 4-Relaxed

Table 11. Natural Environment Components vs. Gender

Natural Environment | Male No. | Male % | Female No. | Female % | Total No. | Total %
Complacent | 46 | 12.53 | 47 | 12.81 | 93 | 25.34
Patient | 58 | 15.80 | 43 | 11.72 | 101 | 27.52
Easygoing | 62 | 16.89 | 25 | 6.81 | 87 | 23.71
Relaxed | 47 | 12.81 | 39 | 10.63 | 86 | 23.43
Total | 213 | 58.04 | 154 | 41.96 | 367 | 100.00


Regarding the skills relating to Natural Environment, the following have been identified among states and gender of the respondents:

1. In Tamil Nadu, ‘Patience’ (3.54%) is followed by ‘Easygoing’ (3.26%) for males. ‘Patience’ (3.81%) followed by ‘Complacent’ (3.26%) are strongest among females.
2. In Pondicherry all the respondents are male and all favoured the category of ‘Patience’ (1.08%) and ‘Relaxed’ (0.27%).
3. In Andhra Pradesh the male group dominates on ‘Easygoing’ (6.53%) and ‘Patience’ (5.44%). This is followed by ‘Complacent’ (4.35%). In females, ‘Relaxed’ (3.81%) is dominant, followed by ‘Complacent’ (2.72%).
4. In Karnataka males fall under the category of ‘Easygoing’ (6.53%), followed by ‘Relaxed’ (5.72%), whereas in females ‘Complacent’ and ‘Relaxed’ (4.08%) had equal importance, followed by ‘Patience’ (3.26%).
5. In Kerala the male group falls under the categories of ‘Patience,’ ‘Easygoing,’ and ‘Relaxed’ equally (0.54%), whereas ‘Patience’ (3.54%) is strongest, followed by ‘Complacent’ (2.72%), among females.

From Table 12, the top priority variables of the students are summarised and shown in Table 13.

3.4. Social Environment
Similar to the work environment and natural environment, the behavioural natures of LIS students in the case of Social Environment have been identified by making use of 27 variables (Table 14).

Table 12. Skills on Natural Environment vs. State

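States | Measure | Complacent M | Complacent F | Patience M | Patience F | Easy going M | Easy going F | Relaxed M | Relaxed F | Total
Tamil Nadu | No. | 10 | 12 | 13 | 14 | 12 | 7 | 9 | 10 | 87
Tamil Nadu | % | 2.72 | 3.26 | 3.54 | 3.81 | 3.26 | 1.90 | 2.45 | 2.72 | 23.7
Pondicherry | No. | 0 | 0 | 4 | 0 | 0 | 0 | 1 | 0 | 5
Pondicherry | % | 0 | 0 | 1.08 | 0 | 0 | 0 | 0.27 | 0 | 1.36
Andhra Pradesh | No. | 16 | 10 | 20 | 4 | 24 | 6 | 14 | 8 | 102
Andhra Pradesh | % | 4.35 | 2.72 | 5.44 | 1.08 | 6.53 | 1.63 | 3.81 | 2.17 | 27.79
Karnataka | No. | 20 | 15 | 19 | 12 | 24 | 8 | 21 | 15 | 134
Karnataka | % | 5.44 | 4.08 | 5.17 | 3.26 | 6.53 | 2.17 | 5.72 | 4.08 | 36.51
Kerala | No. | 0 | 10 | 2 | 13 | 2 | 4 | 2 | 6 | 39
Kerala | % | 0 | 2.72 | 0.54 | 3.54 | 0.54 | 1.08 | 0.54 | 1.63 | 10.62
Total | No. | 46 | 47 | 58 | 43 | 62 | 25 | 47 | 39 | 367
Total | % | 12.53 | 12.80 | 15.80 | 11.71 | 16.89 | 6.81 | 12.80 | 10.62 | 100

M - Male; F - Female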

Table 13. Natural Environment - Top Priority among States vs. Gender

State | Male | Female
Tamil Nadu | Patience | Patience
Pondicherry | Patience | -
Andhra Pradesh | Easygoing | Relaxed
Karnataka | Easygoing | Complacent
Kerala | Patience | Patience


Table 14. Social Environment

S.No | Description | E | G | A | P | U | Mean | Std | Rank
S1 | Do you just listen, or do you get invited to speak at gatherings? | 79 (21.5) | 166 (45.2) | 104 (28.3) | 10 (2.7) | 8 (2.2) | 2.19 | .878 | 26
S2 | Do you always use the phone? | 60 (16.3) | 131 (35.7) | 128 (34.9) | 34 (9.3) | 14 (3.8) | 2.49 | .997 | 19
S3 | Has your work been published? | 39 (10.6) | 83 (22.6) | 111 (30.2) | 96 (26.2) | 38 (10.4) | 3.03 | 1.153 | 1
S4 | Do you always tell people, “I never lie”? | 37 (10.1) | 104 (28.3) | 139 (37.9) | 48 (13.1) | 39 (10.6) | 2.86 | 1.107 | 5
S5 | Do your friends entrust you with their keys and money at parties? | 76 (20.7) | 134 (36.5) | 98 (26.7) | 25 (6.8) | 34 (9.3) | 2.47 | 1.166 | 20
S6 | Do you think working in groups wastes your time or encourages your best work? | 82 (22.3) | 166 (45.2) | 81 (22.1) | 33 (9.0) | 5 (1.4) | 2.22 | .939 | 25
S7 | Are your projects always due last week, or did your class project go public already? | 39 (10.6) | 128 (34.9) | 137 (37.3) | 41 (11.2) | 22 (6.0) | 2.67 | 1.010 | 13
S8 | Do you think someone else can fix it? | 27 (7.4) | 112 (30.5) | 148 (40.3) | 45 (12.3) | 35 (9.5) | 2.86 | 1.043 | 4
S9 | Do your friends ask you to balance their check books? | 53 (14.4) | 105 (28.6) | 129 (35.1) | 51 (13.9) | 29 (7.9) | 2.72 | 1.116 | 11
S10 | Do you think everything you see or hear is true? | 59 (16.1) | 106 (28.9) | 118 (32.2) | 63 (17.2) | 21 (5.7) | 2.68 | 1.109 | 12
S11 | Do policymakers call you for advice? | 67 (18.3) | 110 (30.0) | 114 (31.1) | 47 (12.8) | 29 (7.9) | 2.62 | 1.155 | 16
S12 | Do you think taking risks is too risky? | 44 (12.0) | 122 (33.2) | 125 (34.1) | 46 (12.5) | 30 (8.2) | 2.72 | 1.090 | 10
S13 | Do your friends earn great returns on your investment advice? | 62 (16.9) | 129 (35.1) | 121 (33.0) | 38 (10.4) | 17 (4.6) | 2.51 | 1.037 | 18
S14 | Do you keep the rulebook on your bedside table? | 41 (11.2) | 120 (32.7) | 107 (29.2) | 60 (16.3) | 39 (10.6) | 2.83 | 1.156 | 6
S15 | Do your friends admire the way you handled both traffic court and the Royal Court? | 55 (15.0) | 94 (25.6) | 128 (34.9) | 56 (15.3) | 34 (9.3) | 2.78 | 1.155 | 7
S16 | Do you think leaders are megalomaniacs? | 40 (10.9) | 91 (24.8) | 144 (39.2) | 56 (15.3) | 36 (9.8) | 2.88 | 1.104 | 3
S17 | Are you asked to chair committee meetings? | 41 (11.2) | 103 (28.1) | 119 (32.4) | 59 (16.1) | 45 (12.3) | 2.90 | 1.172 | 2
S18 | Do you like to be alone? | 61 (16.6) | 116 (31.6) | 95 (25.9) | 47 (12.8) | 48 (13.1) | 2.74 | 1.253 | 9
S19 | Do you handle deadlines by ignoring them or by doing your most creative work under pressure? | 60 (16.3) | 118 (32.2) | 128 (34.9) | 39 (10.6) | 22 (6.0) | 2.58 | 1.071 | 17
S20 | Do you celebrate random actions all day every day? | 45 (12.3) | 96 (26.2) | 160 (43.6) | 42 (11.4) | 24 (6.5) | 2.74 | 1.031 | 8
S21 | Do people ask you to leave the problem to them or to help them find the solution? | 70 (19.1) | 151 (41.1) | 117 (31.9) | 23 (6.3) | 6 (1.6) | 2.30 | .904 | 23
S22 | When someone says, “hello,” do you need to think for a moment about what language to reply in? | 66 (18.0) | 159 (43.3) | 82 (22.3) | 44 (12.0) | 16 (4.4) | 2.41 | 1.052 | 22
S23 | Do you always get someone to help you, or are you the one providing the help and advice? | 68 (18.5) | 162 (44.1) | 103 (28.1) | 28 (7.6) | 6 (1.6) | 2.30 | .912 | 24
S24 | Do you specialize in big-picture thinking? | 59 (16.1) | 142 (38.7) | 115 (31.3) | 40 (10.9) | 11 (3.0) | 2.46 | .985 | 21
S25 | Do your friends ask you to plan their weddings? | 51 (13.9) | 143 (39.0) | 94 (25.6) | 47 (12.8) | 32 (8.7) | 2.63 | 1.137 | 15
S26 | Do you think everyone should figure things out for themselves? | 46 (12.5) | 114 (31.1) | 152 (41.4) | 35 (9.5) | 20 (5.4) | 2.64 | 1.000 | 14
S27 | Do you get asked to help teach your friends? | 90 (24.5) | 168 (45.8) | 79 (21.5) | 20 (5.4) | 10 (2.7) | 2.16 | .949 | 27


The factor analysis of the obtained scores for social environment, under a rotated component matrix, and the eight components which emerged are presented in Table 15.

The Eigen values calculated for the above variables are also shown in Table 15. It can be seen that only the first eight factors have Eigen values greater than 1. An Eigen value of 1 was the criterion for retention of a factor, which indicates that only the first eight factors are to be extracted. Even though the variances were not evenly distributed in the rotated sums of the squared loadings (percentages range between 4.376% and 10.896%; cumulative variance ratio 60.659%), the Eigen values range between 1.181 and 2.942. This indicates that the eight factors are interpretable. The variables are grouped into eight components: Balancing nature, Magnanimity, Naturalistic, Assertive nature, Dependency, Lucrative, Loneliness, and Time Based activity.

The number of respondents and the gender under each component are shown in Table 16. It can be seen that persons under each group are fairly evenly distributed, ranging from 9.54% to 14.44%. Further, it can be seen that ‘Naturalistic’ and ‘Assertive nature’ (14.44%) were equally given importance in the Social environment, followed by the ‘Time based’ personality group (13.9%).

Table 15. Result of Factor Analysis of Social Environment

S. No. | Loading
S1 | .557
S3 | .694
S4 | .679
S7 | .498
S11 | .449
S13 | .462
S14 | .414
S10 | .485
S24 | .682
S25 | .735
S15 | .504
S16 | .575
S17 | .487
S19 | .479
S20 | .557
S26 | .575
S6 | .523
S21 | .606
S23 | .699
S27 | .649
S5 | .812
S9 | .590
S8 | .521
S12 | .556
S22 | .667
S18 | .816
S2 | .757

Component | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Eigen value | 2.942 | 2.332 | 2.317 | 2.277 | 2.050 | 1.818 | 1.461 | 1.181
Cumulative variance ratio | 10.896 | 19.531 | 28.112 | 36.546 | 44.140 | 50.873 | 56.283 | 60.659

1-Balancing Nature 2-Magnanimity 3-Naturalistic 4-Assertive Nature 5-Dependency 6-Lucrative 7-Lonely Nature 8-Time Based Personality


Table 16. Social Environment Components vs. Gender

Social Environment | Male No. | Male % | Female No. | Female % | Total No. | Total %
Balancing nature | 22 | 6.00 | 21 | 5.72 | 43 | 11.72
Magnanimity | 37 | 10.08 | 15 | 4.09 | 52 | 14.17
Naturalistic | 35 | 9.54 | 18 | 4.90 | 53 | 14.44
Assertive nature | 24 | 6.54 | 29 | 7.90 | 53 | 14.44
Dependency | 18 | 4.90 | 20 | 5.45 | 38 | 10.35
Lucrative | 22 | 6.00 | 13 | 3.54 | 35 | 9.54
Lonely nature | 27 | 7.36 | 15 | 4.09 | 42 | 11.44
Time Based personality | 28 | 7.62 | 23 | 6.27 | 51 | 13.90
Total | 213 | 58.04 | 154 | 41.96 | 367 | 100

From the table, it can be seen that Magnanimity (10.08%) and Naturalistic (9.54%) were prominent among the males, whereas Assertive nature (7.90%) and Time Based personality (6.27%) were given importance among the females. In males this was followed by Time Based personality (7.62%) and Lonely nature (7.36%), whereas in females it was 'Balancing nature' (5.72%) and 'Dependency' (5.45%). In general, there is a contrast in the case of Magnanimity, which features more in males and less in females. Similarly, 'Naturalistic' features more in males and less in females.

The skills of the respondents have been identified by state and sex as follows:

1. In Tamil Nadu, 'Magnanimity' and 'Naturalistic' (2.72%) are equally considered among male students, followed by 'Balancing nature' and 'Lonely nature' (1.36%), also in equal rank. 'Assertive nature' and 'Naturalistic' (2.18%) are equal among female respondents, followed by 'Balancing nature' and 'Magnanimity' (1.63%), also in equal rank.
2. In Pondicherry, all of the respondents are male and favoured the categories of 'Magnanimity' and 'Time Based' personality (0.54%) equally, followed by 'Assertive nature' (0.27%).
3. In Andhra Pradesh, the male group is dominated by 'Lonely nature' (3.81%), followed by 'Magnanimity' and 'Time Based' personality (both 3.00%). These are followed by 'Assertive nature' and 'Lucrative' (2.72%) equally. For females, 'Time based' personality (2.18%) is followed by 'Assertive nature' (1.09%).
4. In Karnataka, males fall under the category of 'Naturalistic' (5.72%), followed by 'Balancing nature' and 'Time based' personality (3%), whereas for females 'Dependency' (3.27%) and 'Assertive nature' (3%) dominate, followed by 'Naturalistic' and 'Lucrative' (both 1.63%).
5. In Kerala, the male group falls under the categories of 'Dependency' and 'Lucrative' (0.54%) equally, whereas among females 'Balancing nature' (2.45%) is followed by 'Assertive nature' (1.63%).

From Table 17, the top priority variables for the students are identified, and the results are presented in Table 18.

To give an overall view of the three environments (Work Environment, Natural Environment, and Social Environment), their relationship to the states of South India and to gender is shown in Table 19.

Rash nature, Workaholic, Punctuality, Patience, Easygoing, Magnanimity, Lonely Nature, Naturalistic, and Dependency are some of the behavioral traits thus existing among male LIS professionals. Similarly Punctuality, Achievement oriented, Patience, Relaxed, Complacent, Assertive, Time


Table 17. Social Environment Components vs. States vs. Gender

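Each cell gives the number of respondents, with the percentage in parentheses (M = male, F = female).

| States | Balancing Nature M | Balancing Nature F | Magnanimity M | Magnanimity F | Naturalistic M | Naturalistic F | Assertive Nature M | Assertive Nature F | Dependency M | Dependency F | Lucrative M | Lucrative F | Loneliness M | Loneliness F | Time Based Activity M | Time Based Activity F | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tamil Nadu | 5 (1.36) | 6 (1.63) | 10 (2.72) | 6 (1.63) | 10 (2.72) | 8 (2.18) | 3 (0.82) | 8 (2.18) | 3 (0.82) | 4 (1.09) | 4 (1.09) | 1 (0.27) | 5 (1.36) | 3 (0.82) | 4 (1.09) | 7 (1.91) | 87 (23.71) |
| Pondicherry | 0 (0.00) | 0 (0.00) | 2 (0.54) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 1 (0.27) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 2 (0.54) | 0 (0.00) | 5 (1.36) |
| Andhra Pradesh | 5 (1.36) | 1 (0.27) | 11 (3.00) | 1 (0.27) | 4 (1.09) | 2 (0.54) | 10 (2.72) | 4 (1.09) | 9 (2.45) | 2 (0.54) | 10 (2.72) | 3 (0.82) | 14 (3.81) | 7 (1.91) | 11 (3.00) | 8 (2.18) | 102 (27.79) |
| Karnataka | 11 (3.00) | 5 (1.36) | 14 (3.81) | 3 (0.82) | 21 (5.72) | 6 (1.63) | 9 (2.45) | 11 (3.00) | 4 (1.09) | 12 (3.27) | 6 (1.63) | 6 (1.63) | 8 (2.18) | 2 (0.54) | 11 (3.00) | 5 (1.36) | 134 (36.51) |
| Kerala | 1 (0.27) | 9 (2.45) | 0 (0.00) | 5 (1.36) | 0 (0.00) | 2 (0.54) | 1 (0.27) | 6 (1.63) | 2 (0.54) | 2 (0.54) | 2 (0.54) | 3 (0.82) | 0 (0.00) | 3 (0.82) | 0 (0.00) | 3 (0.82) | 39 (10.63) |
| Total | 22 (5.99) | 21 (5.72) | 37 (10.08) | 15 (4.09) | 35 (9.54) | 18 (4.90) | 24 (6.54) | 29 (7.90) | 18 (4.90) | 20 (5.45) | 22 (5.99) | 13 (3.54) | 27 (7.36) | 15 (4.09) | 28 (7.63) | 23 (6.27) | 367 (100.00) |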

Table 18. Top Priority Variables in Social Environment among States vs. Gender

| State | Male | Female |
|---|---|---|
| Tamil Nadu | Magnanimity | Assertive |
| Pondicherry | Magnanimity | - |
| Andhra Pradesh | Lonely nature | Time based personality |
| Karnataka | Naturalistic | Dependency |
| Kerala | Dependency | Balancing nature |
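How the top priority variables follow from the counts in Table 17 can be illustrated with a short sketch. The counts are copied from Table 17; the function name `top_traits` and the choice to return every tied trait are assumptions made for illustration, whereas Table 18 lists a single trait per cell.

```python
TRAITS = ["Balancing nature", "Magnanimity", "Naturalistic", "Assertive nature",
          "Dependency", "Lucrative", "Loneliness", "Time based activity"]

# Counts from Table 17: for each state, (male, female) pairs in TRAITS order.
COUNTS = {
    "Tamil Nadu": [(5, 6), (10, 6), (10, 8), (3, 8), (3, 4), (4, 1), (5, 3), (4, 7)],
    "Pondicherry": [(0, 0), (2, 0), (0, 0), (1, 0), (0, 0), (0, 0), (0, 0), (2, 0)],
    "Andhra Pradesh": [(5, 1), (11, 1), (4, 2), (10, 4), (9, 2), (10, 3), (14, 7), (11, 8)],
    "Karnataka": [(11, 5), (14, 3), (21, 6), (9, 11), (4, 12), (6, 6), (8, 2), (11, 5)],
    "Kerala": [(1, 9), (0, 5), (0, 2), (1, 6), (2, 2), (2, 3), (0, 3), (0, 3)],
}


def top_traits(state, gender):
    """Return every trait with the highest count for a state and gender ('M' or 'F')."""
    column = 0 if gender == "M" else 1
    counts = [pair[column] for pair in COUNTS[state]]
    best = max(counts)
    if best == 0:  # no respondents of this gender in this state
        return []
    return [trait for trait, count in zip(TRAITS, counts) if count == best]


for state in COUNTS:
    print(state, "| M:", top_traits(state, "M"), "| F:", top_traits(state, "F"))
```

Where two traits tie (for example, the male respondents in Tamil Nadu), the sketch reports both, which makes the ties behind the single entries in Table 18 visible.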

Table 19. Comparison of Top Priority Variables in Three Environment among States vs. Gender

| State | Work Environment (Male) | Work Environment (Female) | Natural Environment (Male) | Natural Environment (Female) | Social Environment (Male) | Social Environment (Female) |
|---|---|---|---|---|---|---|
| Tamil Nadu | Rash nature | Punctuality | Patience | Patience | Magnanimity | Assertive |
| Pondicherry | Workaholic | - | Patience | - | Magnanimity | - |
| Andhra Pradesh | Workaholic | Achievement oriented | Easy going | Relaxed | Lonely nature | Time based personality |
| Karnataka | Punctuality | Punctuality | Easy going | Complacent | Naturalistic | Dependency |
| Kerala | Punctuality | Punctuality | Patience | Patience | Dependency | Balancing nature |


based, Dependency, and Balancing nature exist among female LIS students.

4. CONCLUSIONS

It is often said that people's behaviour is purely based on the environment and differs from one profession to another and from person to person. Further, it depends on culture, sex, and state of mind. This study helped to identify the behavioral psychology of LIS students in South India. Attitude is of utmost importance, as it can make or mar the library professional par excellence. Optimism, enthusiasm, courage, confidence, sense of humor, empathy, sympathy, patience, altruism, and intellectual curiosity should be the focus of library professionals. Their qualities should in fact be a combination of the knowledge, skills, and attitudes that these authors also seek from the new breed of librarians. The respondents exhibit a sense of values and standards, a public service orientation, and above all a commitment to the fundamental values of access to information. This study also demonstrates that LIS students need to possess these qualities, including risk taking, adaptability, assertiveness, and willingness to embrace approaches from outside the library world.

This study discusses research results on the behavioral traits of Indian LIS students using descriptive and inferential statistics. Future studies can replicate its methods in various nations and/or regions with various groups of users in various types of libraries. Further studies and inferential analysis of the relationships between behavioral traits and customer satisfaction and loyalty can provide much more beneficial information for library management as well.

Research Paper
JISTaP http://www.jistap.org
Journal of Information Science Theory and Practice
J. of infosci. theory and practice 1(1): 69-82, 2013
http://dx.doi.org/10.1633/JISTaP.2013.1.1.5

A Faceted Data Model for Bibliographic Integration Between MARC and FRBR

Seungmin Lee* Chungnam National University Republic of Korea E-mail: [email protected]

ABSTRACT

Although MAchine Readable Cataloging (MARC) and Functional Requirements for Bibliographic Records (FRBR) are currently the most broadly used bibliographic structures for generating bibliographic data in the library community, each has its own weaknesses in describing information resources in diverse media. If the MARC format could be implemented in a structure that reflects the multi-layered characteristics of FRBR, its use could address current problems and limitations in resource description. The purpose of this research is to propose an alternative approach that can integrate the heterogeneous bibliographic structures of MARC and FRBR through the applications of facet and facet analysis. The proposed faceted data model is expected to function as a conceptual structure that can mediate between MARC data elements and FRBR attributes in order to utilize these structures in a more reliable and comprehensive way.

Keywords: MARC, FRBR, Facet, Facet analysis, Integration

1. INTRODUCTION

To handle the increase of information resources in diverse media, many in the library community have relied on tools such as Anglo-American Cataloging Rules (AACR) and MAchine Readable Cataloging (MARC) in order to manage and organize these resources. Currently, the MARC format is the most broadly used bibliographic standard for encoding and exchanging bibliographic data based on the descriptive rules provided by AACR. Although AACR and MARC format are suitable for more traditional resources such as books and printed materials, they may not be the most appropriate tools for describing new forms of resources, such as digital resources on the Web that are remotely

Open Access

All JISTaP content is Open Access, meaning it is accessible online to everyone, without fee and authors' permission. All JISTaP content is published and distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/). Under this license, authors reserve the copyright for their content; however, they permit anyone to unrestrictedly use, distribute, and reproduce the content in any medium as far as the original authors and source are cited. For any reuse, redistribution, or reproduction of a work, users must clarify the license terms under which the work was produced.

Received date: January 12, 2013
Accepted date: February 23, 2013

*Corresponding Author: Seungmin Lee
Assistant Professor
Department of Library and Information Science
Chungnam National University, Republic of Korea
E-mail: [email protected]

Seungmin Lee, 2013 69 http://www.jistap.org JISTaP Vol.1 No.1, 69-82

accessed. Because of its strict and rigid structure, the MARC format is limited in its ability to describe digital resources and cannot adequately represent the semantics and dynamic natures of those resources. MARC's ability to represent relationships among bibliographic entities with multi-layered characteristics is also problematic because of its linear and single-layered structure.

There is growing awareness of the need for a more flexible structure for bibliographic data which can handle a variety of resource media, represent the relationships among descriptive entities, and describe the multi-layered characteristics of digital resources. Currently, metadata, generally defined as data about data, is recognized as a powerful tool for generating bibliographic data and standardizing resource description. However, many communities have developed unique metadata standards to satisfy their own purposes. This tendency has led to a flood of heterogeneous purpose-specific metadata standards that can only be used in a specific community. It also leads to duplication of metadata records in different formats and inefficient use of existing records. Although those metadata standards are designed to provide standardized practices for resource description, they have failed to generate standardized resource description because of the lack of commonly accepted description rules.

To overcome these problems and to cope with the dynamic nature of new types of resources, the International Federation of Library Associations and Institutions (IFLA) proposed the Functional Requirements for Bibliographic Records (FRBR) model, which focuses on the organization of bibliographic elements and provides for multiple relationships among descriptive entities. Although FRBR can support the representation of multi-layered characteristics of resources, it also has several weaknesses as a bibliographic standard. For example, it does not provide sufficient data elements for resource description. And, because its strict hierarchical structure prescribes the relationships among entities and attributes, the predetermined relationships are also too rigid to provide the flexibility necessary to describe the dynamic nature of digital resources.

Many approaches have been explored in attempts to overcome these problems and to achieve interoperability between MARC and FRBR in order to combine the strengths of and complement the weaknesses of both MARC and FRBR. If the MARC format could be implemented in a structure capable of describing multi-layered characteristics, it could address current problems and limitations in resource description. These approaches are based on the approaches to metadata interoperability and focus on similarities between the two sets of descriptive elements. However, they have shown that many FRBR attributes do not have data elements that can be mapped directly to MARC and vice versa because of their unique characteristics.

The purpose of this paper is to propose an alternative approach that could interrelate MARC data elements and FRBR attributes by constructing a conceptual data model. It provides a set of core bibliographic elements through the applications of facet analysis in order to conceptually integrate MARC data elements and FRBR attributes.

2. APPROACHES TO INTEROPERABILITY

2.1 Current Approaches to Interoperability

The efforts of the library community to achieve interoperability between bibliographic structures and their applications have made use of several different approaches, mostly applied to achieve metadata interoperability. Although these approaches are focused on establishing semantic relationships between the components of metadata standards, they have adopted different methods that reflect the aspects of the standards on which they focus.

Chan and Zeng (2006) have grouped current approaches to interoperability into three categories based on the level of relationships established between metadata standards: schema level, record level, and repository level. In schema level approaches, the focus is on the elements of a schema that are independent of any application. Crosswalks, application profiles, and registries are methods included in this category. In record level approaches, metadata records are integrated through the mapping of elements. This level encompasses element mapping


and data integration methods. In repository level approaches, the objective is to map value strings associated with particular elements. Metadata repository and aggregation are representative methods in this third category.

Other researchers have contributed substantially to understanding metadata interoperability. Moen (2004) has argued that mechanisms for addressing metadata interoperability evince four main approaches: mapping, crosswalks, application profiles, and metadata registries. According to Moen, mapping is a process that identifies semantically equivalent elements in different standards, and crosswalks implement the basic rationale of element mapping, making mapping and crosswalks very similar approaches. Application profiles offer schema-level interoperability for sharing information about metadata standards in order to exchange and reuse elements. Metadata registries provide indexes to metadata terms and official definitions as well as to local variations and extensions that enable the reuse of existing elements.

Hodge (2005) has proposed major approaches to interoperability, including metadata frameworks, crosswalks, and metadata registries. The metadata framework approach integrates various standards into a single standardized scheme. It is a reference model which provides a conceptual structure into which other standards can be placed. A crosswalk matches the elements, semantics, and syntax of one standard with those of one or more other standards, where possible. A metadata registry is based on element mapping at the schema level. Hodge states that a metadata registry is a metadata database that stores terms and definitions of the components of metadata schemes and provides extensions of the terms in order to support reuse and exchange of elements.

Based on these approaches to metadata interoperability, several methods have been explored in attempts to integrate MARC and FRBR. Delsey (2002) mapped MARC21 data elements to FRBR attributes based on the direct element mapping approach. Aalberg (2005) refined FRBR attributes and created mapping tables to match FRBR elements to MARC data elements. However, these approaches have shown that many MARC data elements do not have equivalent FRBR attributes that can be directly mapped because they focus on similarities between the two sets of descriptive elements. In addition, these approaches attempted to map between MARC data elements and FRBR attributes without considering structural differences, although MARC has a single-layered structure and FRBR adopts a hierarchical structure with multi-layered attributes.

These difficulties in achieving interoperability between MARC and FRBR mainly result from the heterogeneity of bibliographic structures. Each structure has a different representation of descriptive entities, a different structural framework, and a different level of granularity from the other. For these reasons, current approaches have generally failed to achieve reliable interoperability between MARC and FRBR.

2.2 Methodology

MARC and FRBR are bibliographic systems with pre-determined structures. MARC can be considered as a bibliographic structure based on a standardized cataloging rule for resource descriptions. It contains more than 2,000 bibliographic entities under a strict and rigid structure. FRBR can also be considered as a cataloging rule with a conceptual structure for bibliographic description. However, it does not have specific entities but provides concepts that categorize bibliographic entities under the conceptual structure.

To achieve interoperability between these two heterogeneous structures and to fully utilize the advantages of each, it is necessary to adopt different approaches from those applied to metadata interoperability. Bibliographic entities conceptually represent general aspects of resources and consist of structures of bibliographic records, whereas metadata elements indicate specific aspects of information resources. For this reason, interoperability between MARC and FRBR may not utilize approaches based on direct mapping between elements in different metadata standards.

This research tried to construct a conceptual structure that would function as a data model which integrates descriptive aspects of information resources. This approach incorporates the strengths


of merging and mapping to make two heterogeneous bibliographic structures interoperable. In most cases, merging and mapping play important roles in integrating and achieving interoperability between heterogeneous structures.

Mapping can be defined as the process of establishing relationships between semantically equivalent elements in different structures (Kurth, Ruddy & Rupp, 2004). Thus, mapping refers to the process of associating elements of one set with elements of another set. In bibliographic description, mapping can provide conceptual connections among data elements in two or more bibliographic structures. These conceptual connections can establish simple relationships or associations among elements without any change or modification. Because the characteristics of the original elements are retained after mapping, the process can achieve interoperability between heterogeneous structures while retaining the unique characteristics of the original structures. However, mapping itself is not enough to achieve interoperability since it can only provide conceptual relationships at the data element level. It is necessary to establish relationships at the structure level to achieve full interoperability between two or more structures.

Merging generally takes two or more entities and reconstructs them into a single new entity (Tennant, 2004). Thus, if one entity is merged with another, they are combined to make a new structure. The merged entity does not retain the structures or characteristics of the original entities. Merging creates a new structure that is totally different from those of the original entities.

Bibliographic structures are usually heterogeneous structures with unique characteristics. This is the main reason that interoperability between different structures is obstructed. Although merging eliminates the heterogeneity of different structures and data elements, it can also eliminate the unique characteristics of each structure and data element by merging them into a new structure.

To mediate these limitations and problems, this research utilizes the strengths of both mapping and merging in order to integrate between two different bibliographic structures, instead of achieving interoperability between them. The integration process may require clear and comprehensive criteria that can be used as conceptual spaces for merging data elements in different structures. Applications of facet analysis are applied to set up the criteria for integrating the two sets of bibliographic structures and constructing a faceted data model.

3. APPLICATION OF FACET ANALYSIS

3.1 Characteristics of Facet Analysis

A facet, in its simple meaning, is a conceptual categorization. It generally refers to a concept group, consisting of generic terms, used as a general manifestation of a compound subject to denote components of the subject (Ranganathan, 1962). In the library community, a facet is defined as "clearly defined, mutually exclusive, and collectively exhaustive aspects, properties, or characteristics of a class or specific subject" (Taylor, 1992). From different perspectives, a facet also refers to a partitioning of vocabulary or grouping of terms obtained by the division of a subject discipline into homogeneous or semantically cohesive categories (Svenonius, 2000).

The process of partitioning domain vocabulary and generating facets is often called facet analysis (or faceting). It is a mental process involving analysis of a subject into its facets based on a set of postulates and principles, resulting in a knowledge structure with clearly delineated semantic relationships between concepts. This structure provides a framework to accommodate various types of concepts along with syntax rules for their combination (Kumar, 1987).

Facet analysis derives two processes: analysis and synthesis. Ranganathan (1967) demonstrated facet analysis as "the process of breaking down subjects into their elemental concepts." These concepts can be synthesized, which is the process of recombining concepts into subject strings or creating new compound terms.

Through the process of analysis and synthesis, facet analysis can be used as a tool to identify and represent relationships between concepts in a certain subject. Each facet contains a number of terms that will be considered to be conceptually equivalent (Harter, 1986). Based on the relationships of


concepts, the process also provides a framework of vocabulary with enough flexibility to include new subjects in it because it can synthesize or combine subjects with facets. Facet analysis also provides for the organization of concepts in modular hierarchies by separating unrelated or dissimilar concepts and grouping related or similar concepts. Thus, relevant concepts are identified by partitioning domain vocabulary into mutually exclusive facets (Priss & Jacob, 1998).

3.2 Types of Facets

Although the approaches of facet and facet analysis have many strengths in organizing and categorizing related concepts, it is difficult to define facets and to prescribe the semantic range of each facet. Because of the ambiguous or abstract semantics of facets, there are concepts which cannot be categorized exclusively into only one facet (Svenonius, 2000). Another problem is raised when two concepts with different attributes have the same label. Facet analysis does not provide a way to distinguish between these concepts, resulting in ambiguity of concept relationships.

To address these weaknesses and to support the organizing and categorizing process through facet and facet analysis, this research divides facets into two types according to their functions in bibliographic description: class facet and property facet.

Class facet refers to the fundamental attributes of every bibliographic entity in resource description. The semantic range of each class facet will be broad and abstract in order to encompass all the related concepts in resource description. It can also support the categorization process by encompassing entire concepts and categorizing related concepts with each other. Through these processes, class facet prescribes the semantic range of facet structures generated by facet analysis. Each class facet is located at the top of the structure and provides a semantic framework in which related concepts can be placed together.

Property facet can be considered as subclass of class facet. It is located under each of the related class facets and constructs hierarchical facet structure by specifying the semantic range of the class facets. It represents the category of the attributes of concepts used in resource description. This facet also defines representative aspects, properties, and characteristics of a bibliographic entity.

When applied to MARC and FRBR, the limitations of facet and facet analysis can be complemented by the bibliographic structures of MARC and FRBR. These bibliographic structures have categorized entities based on their meanings in bibliographic descriptions with hierarchical structure. The semantic range of each entity is predefined with concrete structure. In addition, the label of each bibliographic entity is mutually exclusive with each other. Therefore, when applied to bibliographic structures, facet analysis can provide a clear facet structure. In addition, the class facets and property facets can be applied to each entity contained in both structures.

However, the applications of facet and facet analysis can provide integration of bibliographic entities in both structures instead of interoperability. Interoperability is conceptually based on the direct mapping of entities with the same or similar meaning. In contrast, the bibliographic structures are designed to provide bibliographic rules and standards to generate bibliographic data, although they contain concrete entities.

By applying these different types of facets, semantic similarity between concepts can be clearly identified and the semantic relationships between concepts can be established in the facet structure. The constructed facet structure can support consistent extension of facets and facet analysis by representing concept relationships and functions as a faceted data model for integrating MARC and FRBR.

4. CONSTRUCTION OF FACETED DATA MODEL

4.1 Analysis of MARC and FRBR

The first step in the process of constructing faceted data model for integrating bibliographic structures involves the generation of facet vocabulary in different structures. This generation of vocabulary begins with analyzing and identifying bibliographic structures from which to extract concepts of elements and the concept relationships.
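The two facet types described in Section 3.2 can be pictured as a small two-level structure in which each class facet holds property facets, and each property facet gathers the MARC fields and FRBR attributes that share its meaning. The following is only a sketch under stated assumptions: the facet names "Title", "Title proper", "Identifier", and "Standard number" are hypothetical, as are the class and function names; MARC tags 245 (title statement) and 020 (ISBN) are real field tags, and the FRBR attribute labels are those listed for the manifestation entity.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyFacet:
    """A narrower facet placed under a class facet."""
    name: str
    marc_tags: list = field(default_factory=list)        # MARC field tags grouped here
    frbr_attributes: list = field(default_factory=list)  # FRBR attributes grouped here


@dataclass
class ClassFacet:
    """A broad, top-level facet that frames related property facets."""
    name: str
    properties: dict = field(default_factory=dict)

    def add(self, prop: PropertyFacet) -> PropertyFacet:
        self.properties[prop.name] = prop
        return prop


def facets_for_marc_tag(model, tag):
    """Return (class facet, property facet, FRBR attributes) for each facet holding the tag."""
    hits = []
    for class_facet in model:
        for prop in class_facet.properties.values():
            if tag in prop.marc_tags:
                hits.append((class_facet.name, prop.name, prop.frbr_attributes))
    return hits


# Hypothetical facet names, chosen only for illustration.
title = ClassFacet("Title")
title.add(PropertyFacet("Title proper", ["245"], ["manifestation title"]))
identifier = ClassFacet("Identifier")
identifier.add(PropertyFacet("Standard number", ["020"],
                             ["manifestation identifier (e.g. ISBN)"]))

model = [title, identifier]
print(facets_for_marc_tag(model, "245"))
```

In this arrangement the MARC field and the FRBR attribute are never mapped to each other directly; both are placed in the same property facet, which is the mediating role the text assigns to the facet structure.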


4.1.1 Analysis of MARC data elements and structure

MARC provides a standardized record structure for encoding and exchanging bibliographic data. As Moen and Benardino (2003) assert, MARC originated as a means to communicate bibliographic data about printed materials. However, it has evolved to address the representation of numerous bibliographic data types, including computer files, maps, serials, music, visual materials, and archival materials.

MARC consists of three main components in a bibliographic record: the leader, the directory, and the variable fields. These bibliographic components are enumerated in a predetermined structure. In addition, the MARC format uses a set of tags, indicators, delimiters, and subfield codes, which are applied to pre-determined MARC fields.

The MARC format is an analytical system with a linear structure that can fully describe bibliographic entities through application of almost 2,000 descriptive data elements. Although MARC analyzes data elements in detail, it simply enumerates those elements in a single-layered format that is pre-determined. This structural rigidity cannot fully support representation of resources with multi-layered bibliographic relationships. It is also problematic for MARC format to describe new types of digital resources because it was originated for traditional printed materials.

Another restriction on the MARC format is that its structure is based on the concept of main entry. Main entry (i.e., the 1XX field tag) relies on authorship of a work as the primary access point. With the advent of online catalogs, main entry is losing its importance because there are multiple possible access points other than author. The prescribed structure based on main entry also separates related data elements, thereby resulting in the possible duplication of data in the record.

4.1.2 Analysis of FRBR data elements and structure

FRBR provides a conceptual model for bibliographic records that defines logical relationships among bibliographic objects in terms of an entity-relationship model. FRBR defines the structure of a catalog record as a set of relationships among multiple entities, whereas MARC uses a linear and flat bibliographic structure. FRBR identifies three types of entities that are relevant to bibliographic objects: Group 1 (Work, Expression, Manifestation and Item); Group 2 (Person and Corporate body); and Group 3 (Concept, Object, Event and Place). As a further definition,

Table 1. Components of Directory

Variable Fields | Field tags | Descriptions
Control Field | 00X | Control number, date and time, etc.
Data Fields | 0XX | Control information, numbers, codes
Data Fields | 1XX | Main entry
Data Fields | 2XX | Titles, edition, imprint
Data Fields | 3XX | Physical description, etc.
Data Fields | 4XX | Series statements
Data Fields | 5XX | Notes
Data Fields | 6XX | Subject access fields
Data Fields | 7XX | Name, etc. added entries or series; linking
Data Fields | 8XX | Series added entries; holdings and locations
Data Fields | 9XX | Reserved for local implementation
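As a rough illustration of how the field-tag blocks in Table 1 partition a record, the following Python sketch maps a three-digit tag to its block description. The names `TAG_BLOCKS` and `describe_tag` are hypothetical helpers introduced only for this example; the descriptions themselves are taken from Table 1.

```python
# Block descriptions follow Table 1; all identifiers here are illustrative.
TAG_BLOCKS = {
    "0": "Control information, numbers, codes",
    "1": "Main entry",
    "2": "Titles, edition, imprint",
    "3": "Physical description, etc.",
    "4": "Series statements",
    "5": "Notes",
    "6": "Subject access fields",
    "7": "Name, etc. added entries or series; linking",
    "8": "Series added entries; holdings and locations",
    "9": "Reserved for local implementation",
}


def describe_tag(tag: str) -> str:
    """Return the Table 1 description for a three-digit MARC field tag."""
    if len(tag) != 3 or not tag.isdigit():
        raise ValueError(f"expected a three-digit tag, got {tag!r}")
    # 00X tags are control fields; every other tag is a data field
    # whose block is determined by its first digit.
    if tag.startswith("00"):
        return "Control field: control number, date and time, etc."
    return TAG_BLOCKS[tag[0]]


if __name__ == "__main__":
    for example in ("001", "100", "245", "856"):
        print(example, "->", describe_tag(example))
```

The lookup only needs the first digit of a data-field tag, which reflects the flat, enumerated structure described above: a tag says which block an element belongs to, not how it relates to other elements.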

74 A Faceted Data Model

Table 2. Entities Comprising FRBR Groups

Group | Entities | Attributes
Group 1 | Work | Work title, form or genre, date, performance medium, intended audience, etc.
Group 1 | Expression | expression title, form of the expression, language of the expression, type of score, scale of a map, etc.
Group 1 | Manifestation | manifestation title, publisher, date of publication, form of carrier, dimensions, manifestation identifier (e.g. ISBN), terms of availability, etc.
Group 1 | Item | location or call number, barcode, provenance, condition, access restrictions on an item, etc.
Group 2 | Person | names, dates, titles or other designations, etc.
Group 2 | Corporate body | name, number, place, date, other designation, etc.
Group 3 | Concept | Term
Group 3 | Object | Term
Group 3 | Event | Term
Group 3 | Place | Term
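The grouping in Table 2 can be written down as a small lookup structure. The sketch below is only an illustration of the three entity groups; `FRBR_GROUPS` and `group_of` are hypothetical names, and no attributes beyond the entity names in Table 2 are modeled.

```python
# Entity groups as listed in Table 2; identifiers are illustrative only.
FRBR_GROUPS = {
    "Group 1": ["Work", "Expression", "Manifestation", "Item"],
    "Group 2": ["Person", "Corporate body"],
    "Group 3": ["Concept", "Object", "Event", "Place"],
}


def group_of(entity: str) -> str:
    """Return the FRBR group that contains the given entity."""
    for group, entities in FRBR_GROUPS.items():
        if entity in entities:
            return group
    raise KeyError(f"unknown FRBR entity: {entity!r}")


if __name__ == "__main__":
    total = sum(len(entities) for entities in FRBR_GROUPS.values())
    print("entities in all groups:", total)
    print("Manifestation belongs to", group_of("Manifestation"))
```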

Group 1 comprises the products of intellectual or artistic endeavour that are named or described in bibliographic records: work, expression, manifestation, and item. Group 2 comprises those entities responsible for the intellectual or artistic content, the physical production and dissemination, or the custodianship of such products: person and corporate body. Group 3 comprises an additional set of entities that serve as the subjects of intellectual or artistic endeavour: concept, object, event, and place (IFLA Study Group on the Functional Requirements for Bibliographic Records, 1998).

Each entity in each of FRBR’s three groups can be expanded by using attributes which can serve as a means for users to formulate queries about a particular object (IFLA Study Group on the Functional Requirements for Bibliographic Records, 1998). A total of 97 attributes are defined in FRBR in terms of the characteristics of an entity, rather than as specific data elements.

The FRBR model focuses on the organization of data elements. It provides for multiple relationships among bibliographic entities by adopting a hierarchical structure. This hierarchical structure can clearly describe bibliographic relationships and deal with multi-layered characteristics of resources. In contrast, MARC generally consists of manifestation-level and item-level information, and the bibliographic elements are enumerated according to a linear structure. MARC also has bibliographic elements that correspond to work-level and expression-level elements in FRBR, but they are placed in fields related to authority files or uniform title. Therefore, the MARC format is a mixture of work, expression, manifestation, and item information in bibliographic records within a linear structure that cannot express explicit relationships between entities.

FRBR enhances the retrieval of digital resources because it contains attributes which can be specific to digital resources, such as system requirements, file characteristics, mode of access, and access address, which MARC does not clearly provide. However, the FRBR model does not provide sufficient data elements to fully describe bibliographic entities, even though it can support the representation of multi-layered characteristics of information resources. Also, it has a predetermined hierarchical structure, and the relationships among data elements are too rigid to provide the flexibility necessary when describing the dynamic nature of digital resources.

75 http://www.jistap.org JISTaP Vol.1 No.1, 69-82

4.2 Generating Facet Vocabulary of Bibliographic Entities
Once bibliographic structures have been analyzed, the next step is to generate facet vocabulary by extracting elements from the two structures. Each element has a unique meaning to represent a specific aspect of resources. However, the meaning is not just derived from the aspects of resources, but also from the context which is usually reflected in those structures. This context affects the semantics of elements and often causes the same element to be used in different ways. To address this contextual problem, the semantics of each element is considered along with the structural differences between the two structures when extracting elements.

The first step of the extraction process is the identification of superordinate elements in each structure. By identifying and comparing the elements with correspondents in another structure, commonly used superordinate elements in both structures can be considered as core elements because they share specific meanings in resource description. Then, all elements placed under superordinate elements in each structure are analyzed and extracted according to their semantics. These subordinate elements may have specific meanings to describe detailed aspects of resources. Thus, the types and quantities of these elements may vary across structures according to the levels of granularity. In addition, some elements may be used in more than one place in a structure with different labels. In this case, the extraction of subordinate elements only focuses on the semantics of the element based on the context, instead of the labels. By using the semantics as the criterion of element extraction, this duplicated use of the same element can be eliminated.

After these analyses, the extracted elements from the two structures are put together into the facet vocabulary and categorized according to their semantic similarities. This vocabulary reflects the semantic range of those structures and functions as the foundation of constructing a faceted data model.

Each shared meaning of superordinate elements functions as a class facet that connects different structures through the related subordinate elements placed under each superordinate element. The subordinate elements serve as property facets because they represent specific attributes and characteristics of each bibliographic element in different structures.

4.3 Concept Group Identification
The basic strategy of categorizing extracted elements is to identify groups of related concepts that could be potential facets. Application of facet analysis converts extracted elements into concepts and creates a comprehensive set of candidate facets.

The elements in MARC are distributed under 10 categories represented by each field from 0XX to 9XX. Although these 10 fields encompass all the MARC elements, the meaning of each field is too broad to represent the specific aspects of each resource. The elements substantially used for resource description are the delimiters that specify the meaning of each field. Thus, this research analyzed the meaning of delimiters and extracted MARC elements based on the analyzed meaning. The elements in MARC were grouped into seven categories: Author, Title, Subject, Publication, Description, Identifier, and Format.

The structure of FRBR is extremely different from that of MARC because FRBR was originally designed as a data model focusing on the organization of bibliographic entities. Thus, the FRBR model does not provide sufficient elements to fully describe resources. However, FRBR also contains many elements for describing and managing resources within a hierarchical structure. FRBR also has 10 superordinate elements (entities) grouped into three groups, Group 1 to Group 3. However, the meanings of the elements are very broad and ambiguous. Thus, this research considered the attributes of each entity that specify the aspects of a resource. The attributes in FRBR were also grouped into seven categories: Author, Title, Subject, Publication, Description, Identifier, and Format.

By comparing these superordinate elements extracted from both MARC and FRBR, this research creates seven class facets: Author, Title, Subject, Publication, Description, Identifier, and Format. In


the proposed data model structure, each class facet provides a space in which the related elements and attributes can be placed together according to their semantics.

Once class facets have been established, all other elements placed under each superordinate element in both structures are analyzed and facet analysis can extract the concept of each element. The extracted concepts are placed under each of the related class facets. These extracted concepts function as property facets which specify the semantic range of class facets.

Table 3. Example of Category of Facet (Class Facet Format only)

Class Facet | Property Facet
Format | Serials
Format | Musical works
Format | Image
Format | Cartographic
Format | Electronic resources
Format | Sound

4.4 Construction of a Faceted Data Model
Construction of a faceted data model was based on the generated facet vocabulary. Class facets derived from superordinate elements can have a broad range of semantics and may not be specific enough to represent particular characteristics of a resource. The concepts derived from subordinate elements function as property facets and are nested under class facets derived from superordinate elements. These property facets divide the broad meaning of the class facet into more specific units of semantics and establish semantic relationships between class facets and property facets. Property facets related to the same class facet were subsequently categorized based on their functional similarities.

Following the categorization of extracted concepts into class facets and property facets, a hierarchical structure of a faceted data model can be constructed. Class facets, which are commonly used in both structures, are placed at the top of the data model. Each class facet contains elements equivalent or similar to property facets from both MARC and FRBR. The class facet consists of property facets, which function to specify attributes or characteristics of the class facet. This group of facets prescribes the semantic concepts of different bibliographic structures. In addition, the group of facets describes the semantics of elements in both structures by specifying the function of the element under a specific context.

Based on semantically distinct categories of facets, a faceted data model was constructed, which categorizes constituent facets according to their functional roles. The structure of the faceted data model consists of three components: facet group (class facet and property facet), extracted concept, and bibliographic elements in both structures. The facet group has its own hierarchical structure of relationships between class facets and property facets. This group of facets has specific relationships with other components of the model which indicate how elements in bibliographic structures are conceptually connected with facets. By this step, the duplication of elements can be eliminated. In addition, the problem of the separation of the same or similar elements in different places in each structure can also be addressed. For example, both 245 and 246 fields represent a title of a work, but those fields are separated in the MARC system. FRBR also separates the attributes related to a title of a work into three places: title of work, title of expression, and title of the manifestation. Through the use of class facets and property facets, these separated and distributed elements can be grouped together in regard to their semantics.

In the proposed faceted data model, there are seven class facets at the top of the structure and 21 property facets under basic facets according to the semantic similarities between them (see Table 5). These facets can be assigned to each element in MARC and FRBR. Through these facets, both MARC and FRBR elements can be integrated and interoperable with each other.

The faceted data model is not intended to describe specific information resources but to provide


Table 4. Example of Facet Vocabulary (Class Facet Format only)

Class Facet | MARC | FRBR | Property Facet
Format | 254, 255, 256, 310, 321, 352, 342, 343, 362, 856 | Work, Expression, Manifestation | Serials: publication frequency; Serials: publication status; Musical works: performance medium; Musical works: score type; Musical works: key; Image: technique; Image: color; Cartographic: coordinates; Cartographic: scale; Cartographic: technique; Electronic resource: system; Electronic resource: access; Sound: technique; Sound: reproduction; Sound: playing

Table 5. Class Facets and Property Facets of the Proposed Faceted Data Model

Class Facets | Property Facets
Author | Person, Corporate Body, Meeting
Title | Title Statement, Series Statement
Subject | Classification Number, Keyword
Description | Edition, Summary, Representation
Identifier | Identifier
Publication | Publisher
Format | Serials, Musical Work, Cartographic Work, Computer File, Image, Microform, Electronic Resource, Sound Recording, Other Formats
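The two-level structure in Table 5 can be sketched as a mapping from each class facet to its property facets. This is only an illustrative encoding: `FACET_MODEL` and `class_facets_for` are hypothetical names, and the facet labels are copied from Table 5.

```python
# Class facets and their property facets, copied from Table 5.
# The dictionary and function names are illustrative only.
FACET_MODEL = {
    "Author": ["Person", "Corporate Body", "Meeting"],
    "Title": ["Title Statement", "Series Statement"],
    "Subject": ["Classification Number", "Keyword"],
    "Description": ["Edition", "Summary", "Representation"],
    "Identifier": ["Identifier"],
    "Publication": ["Publisher"],
    "Format": [
        "Serials", "Musical Work", "Cartographic Work", "Computer File",
        "Image", "Microform", "Electronic Resource", "Sound Recording",
        "Other Formats",
    ],
}


def class_facets_for(property_facet: str) -> list:
    """Return every class facet under which a property facet is nested."""
    return [
        class_facet
        for class_facet, properties in FACET_MODEL.items()
        if property_facet in properties
    ]


if __name__ == "__main__":
    n_class = len(FACET_MODEL)
    n_property = sum(len(props) for props in FACET_MODEL.values())
    print(n_class, "class facets,", n_property, "property facets")
```

Counting the entries gives a quick consistency check against the figures stated in the text (seven class facets and 21 property facets).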

a set of core bibliographic elements. Facets in the model can be connected to both the MARC and FRBR because these facets were extracted from the elements contained in the MARC and FRBR systems. If any facet in the data model can be connected with any of the corresponding MARC data elements and FRBR entities/attributes, a user can utilize MARC for detailed descriptive elements and FRBR for representation of bibliographic relationships.

4.5 Implementation of a Faceted Data Model
Elements of bibliographic structures are used to describe specific aspects of a resource and derive their values from the resource being described. However, the values associated with a resource are not values of the element per se, but of the concept that is the core meaning of the element. In this sense, an element is the representation of a concept. The format of an element can be changed


Table 6. Example of the Interoperable Elements on Faceted Data Model (Class Facet Title Only)

Class Facet | Property Facet | MARC Element | FRBR Element
Title | Title statement | 245 $a Title statement | Work: Title of work; Expression: Title of expression; Manifestation: Title of the manifestation
Title | Title proper | 246 $a Title proper; 246 $b Remainder of title | Work: Title of work; Expression: Title of expression; Manifestation: Title of the manifestation
Title | Uniform title | 130 $a Uniform title; 240 $a Uniform title; 730 $a Uniform title | Work: Title of work
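To make the linking role of facets in Table 6 concrete, the following minimal sketch stores each property facet together with the MARC and FRBR elements it connects, and then looks up equivalents in either direction. The `FacetLink` type and the two lookup functions are assumptions made for illustration, not part of the proposed model; the element labels come from Table 6.

```python
from typing import List, NamedTuple, Tuple


class FacetLink(NamedTuple):
    """One row of Table 6: a property facet and the elements it connects."""
    class_facet: str
    property_facet: str
    marc: Tuple[str, ...]               # MARC field and subfield labels
    frbr: Tuple[Tuple[str, str], ...]   # (entity, attribute) pairs


# FRBR title attributes at the three levels named in Table 6.
TITLE_LEVELS = (
    ("Work", "Title of work"),
    ("Expression", "Title of expression"),
    ("Manifestation", "Title of the manifestation"),
)

LINKS = [
    FacetLink("Title", "Title statement", ("245 $a",), TITLE_LEVELS),
    FacetLink("Title", "Title proper", ("246 $a", "246 $b"), TITLE_LEVELS),
    FacetLink("Title", "Uniform title", ("130 $a", "240 $a", "730 $a"),
              (("Work", "Title of work"),)),
]


def frbr_equivalents(marc_element: str) -> List[Tuple[str, str]]:
    """FRBR attributes sharing a property facet with a MARC element."""
    for link in LINKS:
        if marc_element in link.marc:
            return list(link.frbr)
    return []


def marc_equivalents(entity: str, attribute: str) -> List[str]:
    """MARC elements sharing a property facet with an FRBR attribute."""
    found: List[str] = []
    for link in LINKS:
        if (entity, attribute) in link.frbr:
            found.extend(link.marc)
    return found


if __name__ == "__main__":
    print(frbr_equivalents("240 $a"))
    print(marc_equivalents("Expression", "Title of expression"))
```

Because the lookup goes through the facet rather than through element labels, elements that are separated in each structure (for example 245 and 246 in MARC) end up grouped under the same class facet.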

according to the context of the associated bibliographic structure, as reflected in its structure and syntax, while the concept, which is the translated meaning of the element, is not changed regardless of differing contexts.

Implementation of the faceted data model is based on the contextual instantiations of facets in facet vocabulary that represent the semantic functions of elements in resource description. Thus, the faceted data model incorporates three primary components: the elements, the facets, and the functions of the facets. Each of the components is key to implementation of the faceted data model. Each component is connected to other components by specific relationships. A class facet identifies the core and broad meaning of an element, providing the semantics for the element. The property facet stands in for the roles of each element in resource description.

Each class facet, property facet, and element in both structures is closely related to the others. A property facet represents a specific part of the semantics of a class facet, and the meanings of elements are converted to property facets based on the prescribed semantic range of related class facets. Both property facets and the meanings of elements constitute the semantic range of the superordinate class facet.

In spite of their different functions, the class facet and property facet serve as containers that can hold similar yet heterogeneous concepts of elements which comprise the class facet. Using these two types of facets, the different functions of each facet and their relationships can be specified and differences in semantic ranges can be mediated.

Based on the semantically distinct categories illustrated in Figure 2, a faceted data model was constructed that categorizes constituent facets according to their functional roles. The facet structure consists of three components: a group of facets (class facet and property facet), roles of facets, and meanings of elements. A class facet represents the primary concept that subsumes related subordinate concepts represented by elements in both MARC and FRBR. A property facet contains concepts that describe specific aspects of the class facet and functions as a subordinate of the class facet. This set of facets has its own hierarchical structure of relationships between facets. A property facet represents a specific part of the semantics of a class facet, and a group of facets (class facet-property facet pair) clarifies the semantic range of bibliographic structures, which is determined by the meaning of each element in both structures.

The faceted data model is not intended to be used for any actual resource description, but to provide a set of facets that can relate elements from different bibliographic structures. Facets, which represent elements from different structures in a context-independent manner, can semantically connect the elements in both the MARC and FRBR because the facet is extracted from the elements contained in


[Diagram: Class Facet and Meaning of Element, connected by the relations "points to" and "represented by"]

Fig. 1 Relationship Between Class Facet and Corresponding Meaning of Element

those structures. This process is optimized to make elements in a bibliographic structure from different structures interoperable by identifying and linking elements with the same concept, and thus to establish semantic relationships between those structures. Ideally, the faceted data model generates a conceptual structure that can connect elements from different bibliographic structures on the basis of semantic, syntactic, and structural similarities by identifying the conceptual orientation of seemingly disparate elements within a single, context-independent data model.

5. CONCLUSION

This research has constructed a faceted data model for integrating heterogeneous bibliographic structures such as MARC and FRBR. This model is not intended to describe any specific information resources but to provide a set of facets used in bibliographic description. A set of semantics of elements in different structures was categorized according to their semantic similarities through the application of facet analysis.

This research specified two types of facet: class facet and property facet. Class facet was placed at the top of the structure and displays the framework of the faceted data model. Property facet, nested under each class facet, specifies the semantic range of the class facets.

These different types of facets are assigned to each element in both bibliographic structures based on the semantics of those elements. Through these assigned facets, the elements with the same facets can be connected with each other. This connection integrates both MARC and FRBR into the proposed faceted data model.

The faceted structure of the proposed data model also provides the capability of integrating semantic and structural interoperability by taking into account the contextual differences between MARC

[Diagram: Meaning of Element and Property Facet are each "parts of" Class Facet; Meaning of Element is "converted into" Property Facet]

Fig. 2 Relationships Across Different Types of Facets and Meaning of Element


[Diagram: nodes Class Facet, Bibliographic Structures, and Meaning of Element; relation labels "clarifies", "contains", "determines", and "describes"]

Fig. 3 Conceptual Structure of the Faceted Data Model

and FRBR. Thus, integration of heterogeneous bibliographic structures into this faceted data model can provide an alternative approach to utilize these structures in a more reliable and comprehensive way by reflecting the semantics of and the relationships between elements used in bibliographic structures.

REFERENCES

Aalberg, T. (2005). From MARC to FRBR: A case study in the use of the FRBR model on the BIBSYS database [PowerPoint slides]. Retrieved from www.fla.fi/frbr05/aalberg2BIBSYSfrbrized.pdf
Chan, L. M., & Zeng, M. L. (2006). Metadata interoperability and standardization - a study of methodology, Part I: Achieving interoperability at the schema level. D-Lib Magazine, 12(6). Retrieved from http://www.dlib.org/dlib/june06/chan/06chan.html
Delsey, T. (1998). The logical structure of the Anglo-American Cataloguing Rules-Part I, Drafted for the Joint Steering Committee for Revision of AACR. Retrieved from www.rda-jsc.org/docs/aacr.pdf
Harter, S. P. (1986). Online information retrieval: Concepts, principles, and techniques. New York: Academic Press.
Hodge, G. (2005). Metadata for electronic information resources: From variety to interoperability. Information Services & Use, 25(1), 35-45.
IFLA Study Group (1998). Functional requirements for bibliographic records, final report. IFLA Study Group on the Functional Requirements for Bibliographic Records.
Kumar, P. S. G. (1987). Introduction to colon classification. Nagpur, India: Dattsons.
Kurth, M., Ruddy, D., & Rupp, N. (2004). Repurposing MARC metadata: Using digital project experience to develop a metadata management design. Library Hi Tech, 22(2), 153-165.
Moen, W. E. (2004). Metadata interaction, integration, and interoperability [PowerPoint slides]. In National Information Standards Organization (NISO) Workshop: Metadata Practices on the Cutting Edge, Washington, DC.
Moen, W. E. & Benardino, P. (2003). Assessing metadata utilization: An analysis of MARC content designation use. In Proceedings of International


Conference on Dublin Core and Metadata Applications, DC-2003, Seattle, WA, (pp. 171-180). Retrieved December 12, 2012 from http://dcpapers.dublincore.org/ojs/pubs/article/view/745/741.
Priss, U., & Jacob, E. K. (1998). A graphical interface for faceted thesaurus design. In Proceedings of the 9th ASIS SIG/CR Classification Research Workshop, (pp. 107-118). Silver Spring, MD: American Society for Information Science.
Ranganathan, S. R. (1962). Elements of library classification. Bangalore: Sarada Ranganathan Endowment for Library Science.
Ranganathan, S. R. (1967). Prolegomena to library classification. New York: Asia Publishing House.
Svenonius, E. (2000). The intellectual foundation of information organization. (3rd ed.) Cambridge, MA: MIT Press.
Taylor, A. G. (1992). The organization of information. Englewood, CO: Libraries Unlimited.
Tennant, R. (2004). A bibliographic metadata infrastructure for the 21st century. Library Hi Tech, 22(2), 175-181.

Review Paper
JISTaP http://www.jistap.org
J. of infosci. theory and practice 1(1): 83-91, 2013
Journal of Information Science Theory and Practice
http://dx.doi.org/10.1633/JISTaP.2013.1.1.6

Tracking down on 50-year History of Research about Information Management and Technology in Korea

Dept. of Domestic Information Information Service Center Korea Institute of Science and Technology Information

Keywords: Journal of Information Science Theory and practice, Journal of Information Management, JISTaP, Korea Institute of Science and Technology Information, KISTI

1. INTRODUCTION

JISTaP (Journal of Information Science Theory and Practice) is a newly launched journal focusing on information management and technology. It replaces the existing domestic journal, Journal of Information Management, which was published for 50 years, with a new international one. Journal of Information Management was an intellectual property which comprehensively defined the role of KISTI (Korea Institute of Science and Technology Information) in science and technology, in areas such as information production, collection, organization, database building, information dissemination, and standardization.

This paper includes R&D outputs on information content and cases regarding information services designed and implemented by the information professionals of Korea to reflect changes in both scholarly communications and information technologies. Examples of its coverage include the many theories established in library and information science, research outputs produced in collaboration with other fields of study, and works of early foreign researchers.

To commemorate the birth of JISTaP, this paper sets out the direction and aim of JISTaP by providing an analytical review of the fifty years of Journal of Information Management. It is also intended to provide a guiding principle for future readers and contributors.

2. PUBLICATION STATUS

2.1 Publication History
As Korea Scientific and Technological Information

Received date: February 27, 2013
Accepted date: March 24, 2013

*Corresponding Author: Seon Heui Choi
Principal Researcher
Dept. of Domestic Information
Information Service Center, KISTI
Republic of Korea
E-mail: [email protected]

Open Access
All JISTaP content is Open Access, meaning it is accessible online to everyone, without fee and authors’ permission. All JISTaP content is published and distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/). Under this license, authors reserve the copyright for their content; however, they permit anyone to unrestrictedly use, distribute, and reproduce the content in any medium as far as the original authors and source are cited. For any reuse, redistribution, or reproduction of a work, users must clarify the license terms under which the work was produced.

KISTI, 2013 http://www.jistap.org JISTaP Vol.1 No.1, 83-91

Fig. 1 History of Journal of Information Management’s Cover

Center (KORSTIC) was established in 1962, KORSTIC was published from 1963 as a bulletin to advertise and publicize the organization, but was discontinued in December 1966 with Volume 3, Number 1. In 1971, KORSTIC was retitled Journal of Information Management and served as a professional magazine to propagate information trends and technological diffusion in addition to its previous role as an advertisement of the organization. Journal of Information Management, which was being issued bimonthly, was temporarily suspended for two years from 1986 to 1987 and then reissued in 1988. In the 1990s, it provided methodologies through comprehensive research on theories and practices in the field of information management as well as practical guidelines for information management professionals. As the journal became a quarterly publication, its contents grew more versatile. Along with academic articles, a series of analyses of information management institutes was published, and the number of external authors increased, diversifying the topics of submitted articles. As time went by, its groundwork as a journal publication hardened; articles with depth in many fields of study were submitted, and it finally became one of the Korea Research Foundation registered journals. Moreover, Journal of Information Management accommodated recent trends by applying the ISO standard process to its publications in 2008. In 2010, it was converted to an open access journal.

The fifty years of Journal of Information Management can be divided into three periods. In the first period (1963 ~ 1970), it was a publication focusing on the organization’s activities. In the second period (1971 ~ 1991), it served to introduce quality academic articles from in and out of the country and acted as an academic magazine introducing various information management organizations. The year 1992 marked the beginning of the third period, in which it matured into a professional academic journal.

2.2 Total Number of Articles
From 1963 to 2012, which marked the 50th year of Journal of Information Management, the total number of articles published was 915. This is approximately 21 articles per year over the 44 years in which articles were published. The number of articles published increased gradually, and the year with the most publications was 2011 (44 articles). The year 2008 with 41 articles, 2012 with 38, and 2010 with 37

84 Tracking down on 50-year History

Table 1. Publication history of Journal of Information Management

Year | Event
1963 | Publication of KORSTIC
1966 | Discontinuance of KORSTIC
1971 | Re-issued under the name of Journal of Information Management, a bimonthly
1986-87 | Publication halted
1988 | Re-issuance of Journal of Information Management expanded and distributed as a quarterly journal
1992 | Changed to Quarterly Publication
1996 | Registered at bureau of Public Information’s Regular Publication. (Registration No. 12641)
2007 | Appointed as Korean Research Foundation Registered academic journal
2010 | Changed to OpenAccess academic journal

articles followed. Note that no articles were published during 1967~1970 and 1986~1987. Table 2 illustrates the per-year publication numbers. Rearranging the data with a granularity of five-year periods, the numbers of published articles for the 3rd period (122 articles for 1973~1977) and the 4th period (103 for 1978~1982) increased drastically compared to those of the 1st (36) and 2nd (41) five-year periods. The total of 225 articles published during the 3rd and 4th periods amounted to about 1/4 of all articles (915),

Table 2. Journal of Information Management Per-year Publication Numbers

Publication Year | No. of Publications | 5-year total
1963 | 8 |
1964 | 18 |
1965 | 6 |
1966 | 4 |
1967 | - | 36
1968 | - |
1969 | - |
1970 | - |
1971 | 20 |
1972 | 21 | 41
1973 | 26 |
1974 | 26 |
1975 | 23 |
1976 | 23 |
1977 | 24 | 122
1978 | 24 |
1979 | 22 |
1980 | 17 |
1981 | 22 |
1982 | 18 | 103
1983 | 30 |
1984 | 25 |
1985 | 17 |
1986 | - |
1987 | - | 72
1988 | 9 |
1989 | 10 |
1990 | 8 |
1991 | 7 |
1992 | 16 | 50
1993 | 15 |
1994 | 14 |
1995 | 17 |
1996 | 14 |
1997 | 13 | 73
1998 | 13 |
1999 | 15 |
2000 | 14 |
2001 | 19 |
2002 | 24 | 85
2003 | 22 |
2004 | 22 |
2005 | 32 |
2006 | 31 |
2007 | 33 | 140
2008 | 41 |
2009 | 33 |
2010 | 37 |
2011 | 44 |
2012 | 38 | 193
Total | 915 |
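The five-year totals and averages quoted around Table 2 can be recomputed from the per-year counts. The sketch below is a simple arithmetic check; the variable and function names are illustrative, and years with no publications are simply omitted from the dictionary.

```python
# Per-year article counts transcribed from Table 2 (years without
# publications, 1967-1970 and 1986-1987, are omitted).
ARTICLES_PER_YEAR = {
    1963: 8, 1964: 18, 1965: 6, 1966: 4,
    1971: 20, 1972: 21, 1973: 26, 1974: 26, 1975: 23,
    1976: 23, 1977: 24, 1978: 24, 1979: 22, 1980: 17,
    1981: 22, 1982: 18, 1983: 30, 1984: 25, 1985: 17,
    1988: 9, 1989: 10, 1990: 8, 1991: 7, 1992: 16,
    1993: 15, 1994: 14, 1995: 17, 1996: 14, 1997: 13,
    1998: 13, 1999: 15, 2000: 14, 2001: 19, 2002: 24,
    2003: 22, 2004: 22, 2005: 32, 2006: 31, 2007: 33,
    2008: 41, 2009: 33, 2010: 37, 2011: 44, 2012: 38,
}


def period_totals(counts, first_year=1963, last_year=2012, width=5):
    """Sum the counts over consecutive periods of `width` years."""
    totals = {}
    for start in range(first_year, last_year + 1, width):
        end = start + width - 1
        totals[(start, end)] = sum(
            counts.get(year, 0) for year in range(start, end + 1)
        )
    return totals


totals = period_totals(ARTICLES_PER_YEAR)
grand_total = sum(ARTICLES_PER_YEAR.values())
active_years = len(ARTICLES_PER_YEAR)  # years with at least one article

if __name__ == "__main__":
    for (start, end), count in totals.items():
        print(f"{start}-{end}: {count}")
    print("total:", grand_total)
    print("average per active year:", round(grand_total / active_years, 1))
```

Dividing the grand total by the number of years that actually had publications, rather than by all 50 calendar years, is what yields the figure of roughly 21 articles per year.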


Fig. 2 Journal of Information Management yearly published Numbers

corresponding to 22.5 publications per year. The reason for the bigger volume of publications in these periods was that the journal was revived with a new title, Journal of Information Management, and a new bimonthly frequency.

Immediately after this period, the number of articles published seemed to dwindle. However, the trend reversed, with 140 articles for 2003~2007 and 193 for 2008~2012. The 2003~2012 period showed the biggest volume, 333 publications, accounting for 36.4% of all published articles (33.3 articles per year).

2.3 Authors
The total number of authors who contributed to Journal of Information Management is 1,341, and the average number of authors per article came out to be 1.5. Without duplicates, the total number of unique authors was 791. Table 3 lists the 28 authors who have contributed 6 or more articles. Yoon Hee-Yoon contributed the most articles. Yoon’s first submission was in 2002, and the author wrote a total of 13 articles. The main areas of the author’s works were public libraries (4 articles), scholarly communication (3 articles), and foreign journals, copyrights, and science and technology resources (2 articles each). Nam Young-Joon (12), Choe Seong-Yong (10), and Lee Jae-Yun (9) followed Yoon, as shown in Table 3.

Figure 3 shows the per-year article distribution graph of the most contributing authors to Journal of Information Management. It is evident that the ratio of external authors sharply increased from 5.5% to 65.5% starting in 1992, when the journal began to focus on scholarly research. Also, it seemed that the increase of external authors since 2002 had affected the change of keywords.

Comparing the appearance frequency of authors’ affiliations between KISTI and external organizations, KISTI comprised 36% (433 times) (see Figure 4). This shows that Journal of Information Management was a type of representative bulletin of the organization. The affiliations other than KISTI were universities 42% (538 times), other research institutes 14% (170 times), corporate libraries 5% (62 times), and a combination of public and school libraries, newspaper agencies, and academic societies 3% (39 times).

More specifically, the distribution among universities showed that [...] had the most appearances (67 times), which is approximately 12.4%. Chung-Ang University had 61 (11.3%), Chungnam National University 36 (6.7%), and Ewha Womans University 34 (6.3%). Also, Keimyung University and Daegu University both had 21 appearances (3.9% each). [...], Chonbuk University, Chonnam National University,

86 Tracking down on 50-year History

Table 3. 28 Top Ranking Authors who have contributed to Journal of Information Management

Author              Affiliation                     No. of Articles
Yoon, Hee-Yoon      Daegu University                13
Nam, Young-Joon     Chung-Ang University            12
Choe, Seong-Yong    KISTI(KORSTIC)                  10
Lee, Jae-Yun        Kyonggi University              9
Choi, Hee-Yoon      KISTI                           8
Kim, Hye-Sun        KISTI                           8
Kim, Sung-Won       Chungnam National University    8
Lee, Eung-Bong      Chungnam National University    8
Lee, Hyeon-Cheol    KISTI(KORSTIC)                  8
Nam, Tae-Woo        Chung-Ang University            8
Oh, Dong-Geun       Keimyung University             8
SaGong, Cheol       KISTI(KORSTIC)                  8
Yu, Ja-Gyeong       KISTI(KORSTIC)                  8
Ahn, Hyun-Soo       KT R&D Group                    7
Hwang, Hye-Kyong    KISTI                           7
Noh, Kyung-Ran      KISTI                           7
Seo, Tae-Sul        KISTI                           7
Yu, Gyeong-Hui      KISTI(KORSTIC)                  7
Kim, Eun-Sik        KISTI(KIET)                     6
Kim, Seong-Hee      Chung-Ang University            6
Kim, Suk-Young      KISTI                           6
Lee, Jee-Yeon       Yonsei University               6
Lee, Jeong-Il       KISTI(KORSTIC)                  6
Lee, U-Beom                                         6
Mok, Yeon-Gyun      KISTI(KIET)                     6
NamGung, Bong       KISTI(KORSTIC)                  6
Park, Hyun-Woo      KISTI                           6
Yoon, Cheong-Ok     Cheongju University             6

Fig. 3 Yearly Article Distribution of Top Ranking Authors


Busan University, and Kyungpook National University followed in that order.
Lastly, a total of 106 foreign authors appeared in Journal of Information Management over the last fifty years, comprising 13% of the total 791 authors (see Figure 5). Of their 102 articles, 96 were identified as translated articles; non-translated articles by foreign authors began to be submitted directly in 2007. During that time, because the main goal of the journal was to disseminate overseas technology transfer and trend information, many translated foreign articles were introduced to domestic readers.

2.4 Keywords
The use of keywords started mainly in 1992, and they were given in both Korean and English. Only English keywords were used for this analysis. A total of 1,903 keywords were identified, which is about 5.4 keywords per article. The most used keywords reflect the topics of interest of the contributors. Table 4 shows the most used keywords and their counts. Among all keywords, “Database” was the most used, with a count of 30, followed by “Digital Library” and “Electronic Journal”.
Figure 6 represents the distribution graph based on the keywords from Table 4. We observe that “Database”, “Digital Library”, “Electronic Journal”, “Univer-

Fig. 4 Ratio of Affiliation of Authors

Fig. 5 Foreign Authors in Journal of Information Management

Table 4. Keywords Used the Most

Keyword No. of Times Keyword No. of Times

Database 30 Collection Development 9

Digital Library 23 Copyright 8

Electronic Journal 16 Knowledge Management 8

Information Retrieval 15 Scholarly Communication 8

Internet 15 Science and Technology Information 8

University Library 15 Academic Library 7

Citation Analysis 14 KESLI 7

Metadata 14 KISTI 7

Public Library 14 Resource Sharing 7

Information Service 13 Thesaurus 7

Open Access 12 User Satisfaction 7
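A tally of the kind shown in Table 4 could be produced roughly as follows. This is only a minimal sketch: the sample keyword lists are hypothetical, and the case-insensitive matching and the once-per-article counting are assumptions, since the paper does not describe its normalization rules.

```python
from collections import Counter

# Hypothetical sample: one list of English keywords per article (not the paper's data).
articles = [
    ["Database", "Information Retrieval", "Thesaurus"],
    ["Digital Library", "Metadata", "Database"],
    ["database", "Internet"],
]


def tally_keywords(keyword_lists):
    """Count how many articles use each keyword.

    Assumption: keywords are compared case-insensitively and counted
    at most once per article.
    """
    counts = Counter()
    for keywords in keyword_lists:
        normalized = {kw.strip().lower() for kw in keywords}
        counts.update(normalized)
    return counts


counts = tally_keywords(articles)

# Average number of keywords per article, the analogue of the 5.4 figure in the text.
total_keywords = sum(len(kws) for kws in articles)
mean_per_article = round(total_keywords / len(articles), 1)

print(counts.most_common(3))
print(mean_per_article)
```

Sorting the resulting counts in descending order gives the ranking used in a frequency table such as Table 4.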


Keyword: yearly counts, 1992~2012 (years with no occurrence omitted)

Database: 4 3 3 1 3 3 1 1 2 1 2 2 2 1 1
Digital Library: 2 1 3 1 2 1 2 1 2 2 2 1 3
Electronic Journal: 1 1 2 1 1 1 2 3 1 2 1
Information Retrieval: 3 3 1 1 2 2 1 2
Internet: 3 2 2 4 1 1 1 1
University Library: 1 1 1 2 1 1 1 1 1 1 1
Citation Analysis: 3 1 1 2 2 2 1 2
Metadata: 1 1 1 2 2 1 2 2 1 1
Public Library: 1 1 1 2 1 3 2 3
Information Service: 1 1 1 1 1 1 1 1 2 1 1 1
Open Access: 1 1 2 1 1 3 1 2
Collection Development: 1 2 1 2 3
Copyright: 1 1 1 2 1 3 1
Knowledge Management: 1 1 2 1 1 1 1 1
Scholarly Communication: 1 1 1 1 2 1 2 1
Science and Technology Information: 1 1 1 2 1 1 1
Academic Library: 1 1 1 4
KESLI: 1 2 1 1 1 1
KISTI: 5 1 1
Resource Sharing: 2 1 2 1 1
Thesaurus: 1 1 1 1 1 1 1
User Satisfaction: 1 1 1 4

Fig. 6 Yearly Distribution of Top Ranking Keywords

sity Library”, and “Information Service” have been the most frequent topics of interest over the last 20 years (1992~2012). Dividing the keyword usage into two periods, 1992~2001 and 2002~2012, reveals that articles with more varied topics were submitted in the later period than in the former. “Metadata”, “Public Library”, “Open Access”, “Collection Development”, “Science and Technology Information”, “Scholarly Communication”, “Knowledge Management”, “Copyright”, “Academic Library”, “KESLI”, “KISTI”, “Resource Sharing”, “Thesaurus”, and “User Satisfaction” were the keywords that appeared more frequently in the later period. Some of these, such as “Public Library”, “Collection Development”, “Scholarly Communication”, “Knowledge Management”, “Academic Library”, and “User Satisfaction”, differed from those that frequently appeared in the former period. This phenomenon coincided with the period (2002) in which the number of external authors increased, as shown in Figure 3.
Table 5 shows the keyword distribution by subdivided time periods. In the first period, “Database” was used 7 times in the two-year span and 11 times in the following span (1994~1998). This shows that researchers in the field had a great interest in that keyword. The most used keywords in the successive periods were Database, Database, KISTI, Metadata, and Public Library, respectively. Keywords of a different tendency started appearing in 2004 and became mainstream after 2009.

3. CONCLUSIONS

This paper investigated changes in both scholarly communication and information technologies in Korea over 50 years by reviewing 915 articles published in the Journal of Information Management from 1963 through 2012. It also analyzed the authors and keywords of the articles published in that journal. The following summarizes the results of this study.
From 1963 to its 50th anniversary in 2012, the Journal of Information Management published a total of 915 articles, a yearly average of 21 articles (the average excludes years in which no article was published). The year with the most articles was 2011 (44 articles).


Table 5. Keyword Distribution per Section

Year        Ranking  Keyword                No. of times

1992~1993   1        Database               7
            2        Information Retrieval  6
            3        Automatic Indexing     3
            4        Citation Analysis      3
            5        Acquisition System     2

1994~1998   1        Database               11
            2        Internet               10
            3        Digital Library        7
            4        Hypertext              5
            5        SGML                   5

1999~2003   1        KISTI                  6
            2        Database               5
            3        Digital Library        5
            4        Authority Control      4
            5        Authority File         3

2004~2008   1        Metadata               8
            2        Open Access            8
            3        Digital Library        7
            4        University Library     7
            5        Citation Analysis      6

2009~2012   1        Public Library         9
            2        Scientific Data        5
            3        User Satisfaction      5
            4        Academic Library       4
            5        Digital Library        4
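A per-period ranking like Table 5 can be derived by assigning each keyword occurrence to its period and counting within the period. The sketch below uses the period boundaries of Table 5; the records and the function names are hypothetical and serve only to illustrate the grouping step.

```python
from collections import Counter, defaultdict

# Period boundaries as used in Table 5 (inclusive years).
PERIODS = [(1992, 1993), (1994, 1998), (1999, 2003), (2004, 2008), (2009, 2012)]


def period_of(year):
    """Return the (start, end) period containing the year, or None if outside 1992~2012."""
    for start, end in PERIODS:
        if start <= year <= end:
            return (start, end)
    return None


def top_keywords_by_period(records, n=5):
    """Rank keywords within each period.

    records: iterable of (year, keyword) pairs, one per keyword occurrence.
    Returns a dict mapping each period to its n most frequent (keyword, count) pairs.
    """
    counters = defaultdict(Counter)
    for year, keyword in records:
        period = period_of(year)
        if period is not None:
            counters[period][keyword] += 1
    return {period: counter.most_common(n) for period, counter in counters.items()}


# Hypothetical records, not the paper's data.
records = [
    (1992, "Database"), (1993, "Database"), (1993, "Information Retrieval"),
    (2010, "Public Library"), (2011, "Public Library"), (2012, "User Satisfaction"),
]
ranking = top_keywords_by_period(records, n=2)
print(ranking)
```

Because the periods have unequal lengths (two, five, five, five, and four years), raw counts from different periods are not directly comparable without normalizing by period length.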

The total number of authors who contributed to Journal of Information Management is 1,341 (791 once duplicates are removed), an average of 1.5 authors per article. Hee-Yoon Yoon contributed the most articles, 13, as a main author. In particular, the ratio of external authors increased rapidly from 2002, and this increase seems to have affected the change of keywords. By affiliation, KISTI-affiliated authors comprised 36% (433 times), universities 42% (538 times), research centers other than KISTI 14% (170 times), corporate libraries 5% (62 times), and regional libraries, elementary, middle and high school libraries, newspapers, academic societies and so on 3% (39 times). Among universities, which had the largest share, Yonsei University accounted for 12.4% of university appearances with 67 times. Of the foreign authors appearing in Journal of Information Management, a total of 106 authors were


involved in the past. Of the 102 articles by foreign authors, 96 appeared as translated items, and from 2007 non-translated articles began to be contributed. Keywords began to be used in earnest in Journal of Information Management from 1992. A total of 1,903 keywords were identified, an average of 5.4 keywords per article. Among all keywords, “Database” was the most used. The keywords introduced in the second half of the period were somewhat different from the previous themes, and this corresponds to the period of increased external authors since 2002.

The Journal of Information Management was published for many years in the field of library and information science in Korea. Its publication once ceased, but the journal was reissued as a better journal. JISTaP will develop based on the Journal of Information Management.
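The ratios quoted in this study can be re-derived from the raw counts it reports. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative choices, and only counts stated in the text are used.

```python
# Counts reported in the paper.
TOTAL_ARTICLES = 915
AUTHOR_APPEARANCES = 1341
UNIQUE_AUTHORS = 791
FOREIGN_AUTHORS = 106
ARTICLES_2003_2012 = 140 + 193      # 2003~2007 plus 2008~2012
UNIVERSITY_APPEARANCES = 538
CHUNG_ANG_APPEARANCES = 61


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


authors_per_article = round(AUTHOR_APPEARANCES / TOTAL_ARTICLES, 1)
recent_share = share(ARTICLES_2003_2012, TOTAL_ARTICLES)
foreign_share = share(FOREIGN_AUTHORS, UNIQUE_AUTHORS)
chung_ang_share = share(CHUNG_ANG_APPEARANCES, UNIVERSITY_APPEARANCES)

print(authors_per_article)   # average authors per article
print(recent_share)          # share of all articles published in 2003~2012
print(foreign_share)         # foreign authors among unique authors
print(chung_ang_share)       # Chung-Ang University among university appearances
```

The values agree with the 1.5 authors per article, 36.4%, about 13%, and 11.3% reported in the text.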

References

Choi, H. K. (1999). An analytical study on research patterns in library and information science. Journal of the Korean Society for Information Management, 16(3), 137-158.
Oh, S. H., & Lee, T. Y. (2005). Research trends of information science in Korea. Journal of the Korean Society for Information Management, 22(1), 167-189.
Sohn, J. P. (2003). An analytical study on research trends of library and information science in Korea: 1957~2002. Journal of Korean Library and Information Science Society, 34(3), 9-32.
Korea Institute of Science and Technology Information (2012). The 50th year of KISTI. Daejeon: Korea Institute of Science and Technology Information.

Editorial Board of JISTaP

Co-Editors-in-Chief

Gary Marchionini is the Dean and Cary C. Boshamer Professor in the School of Information and Library Science at the University of North Carolina at Chapel Hill. He teaches courses in human-information interaction, interface design and testing, and digital libraries. He has published over 200 articles, chapters and reports in a variety of books and journals. Professor Marchionini has had grants or research awards from the National Science Foundation, Council on Library Resources, the National Library of Medicine, the Library of Congress, Bureau of Labor Statistics, Kellogg Foundation, NASA, The National Cancer Institute, Microsoft, Google, and IBM among others. Professor Marchionini was Editor-in-Chief for the ACM Transactions on Information Systems (2002-2008) and is the editor for the Morgan-Claypool Lecture Series on Information Concepts, Retrieval, and Services. He has been program chair for ACM SIGIR (2005) and ACM/IEEE JCDL (2002) as well as general chair of ACM DL 96 and JCDL 2006. His current interests and projects are related to interfaces that support information seeking and information retrieval and usability of personal health records. He is currently PI on a grant from NSF focused on a search results framework that supports searches over multiple sessions and in collaboration.

Gary Marchionini [email protected]

Dr. Dong-Geun Oh, Ph.D., MLIS, MBA, is a Professor and Chairperson, Department of Library and Information Science, Kei-Myung University. His interests range from cataloging and classification to library management. He has played major roles in many academic societies in Korea as a vice president, EIC, and/or committee member, including the Korean Library and Information Science Society, and has served as a member of national committees in Korea, including those for the National Library for Children and Young Adults and for the Ministry of Education & Human Resources Development. He received “Korean Library Award 2001” (Korean Library Association), “Best Research Award 2002” and “Distinguished Achievement Award 2012” (Kei-Myung University), and “Academy Award 2009” (Korean Library and Information Science Society). He has been selected 6 times as “Bisa Best Research Professor” (Kei-Myung University) for his achievements in research and teaching. He has written and translated more than 40 books, and has published more than 70 articles in scholarly journals, including “Complaining Behavior of Public Library Users In South Korea” and “Developing and Maintaining a National Classification System”. He has also carried out more than 30 research projects, and has lectured many times at home and abroad, including in India, Singapore, and Japan.

Dong-Geun Oh [email protected]

Associate Editors

Dr. Honam Choi is a General Director of Information Service Center at Korea Institute of Science & Technology Information (KISTI). He worked as the 1st Chairman of Korea Special Library Association until the end of March 2010. He was also a Vice President of Korea Library Association from July 2009 through June 2011. As a general director of information service center, he is currently responsible for national S&T information services for Korean researchers, professors, students and stakeholders. Dr. Choi is interested in developing criteria for evaluating research output by employing the methods of citation data analysis. He has experience of building and analyzing citation data for the purpose of evaluating domestic scholarly journals published by learned societies in Korea. He has published about 20 journal articles and technical reports.

Honam Choi [email protected]

Dr. Kiduk Yang studied Computer Science as an undergraduate at the University of North Carolina at Chapel Hill (UNC-CH), after which he began a career as an application programmer and systems developer. While working over 14 years as an IT professional, he earned a Master’s degree and Ph.D. in Information Science at the School of Information and Library Science at UNC-CH. Dr. Yang began his academic career in the School of Library and Information Science at Indiana University Bloomington, where he established and directed the WIDIT (Web Information Discovery Integrated Tool) Research Laboratory. At present, Dr. Yang is an associate professor at the Department of Library and Information Science at Kyungpook National University in Korea. Dr. Yang’s research areas are information retrieval, with emphasis on leveraging human knowledge for information discovery on the Web, and bibliometrics, with a focus on a multi-faceted approach to quality assessment. His research projects include integration of information retrieval and knowledge organization, multi-faceted fusion approach to bibliometric analysis, fusion of data sources and retrieval methods, and targeted opinion detection in the blogosphere.

Kiduk Yang [email protected]

Managing Editors

Dr. Hea Lim Rhee is currently a senior researcher at Korea Institute of Science and Technology Information. She is also a managing editor of the Journal of Information Science Theory and Practice. She received her Doctor of Philosophy degree from the School of Information Sciences at the University of Pittsburgh in 2011. She received her master of science in information from the University of Michigan, where she specialized in archives and records management. Before coming to the United States, her undergraduate major was library and information science. She continued to pursue her studies in this area in her master’s program in Korea, specializing in East Asian archival studies. Her research interests include archival appraisal, government records, user studies, international archives, digital archives, records management, and information-seeking behavior. Her professional experience in archives and libraries includes work as a librarian at a main library at Ewha and as an intern for a library at Columbia University.

Hea Lim Rhee [email protected]

Yong-Gu Lee is an assistant professor at the Department of Library and Information Science, Keimyung University, Republic of Korea. He received his Master's degree and Ph.D. in Library and Information Science from Yonsei University, Korea. His research areas are information retrieval, automated text categorization, word sense disambiguation and digital libraries.

Yong-Gu Lee [email protected]

Editorial Board

Dr. B. Ramesh Babu is Professor in the Department of Library and Information Science, University of Madras. He was awarded the Dr. S.R. Ranganathan Memorial Gold Medal by the University of Mysore for the First Rank in the M. Lib. Sc. degree. He was awarded a Commonwealth Fellowship for Post-Doctoral research for the year 1999/2000. He has also visited France, Nepal, Bangladesh, Muscat and South Korea on academic invitation. He was awarded the C. D. Sharma Best Paper Award by the Indian Library Association for the Year 1999 and the READIT 2001 Best Paper Award by the IIT, Madras, IGCAR and MALA at the National Conferences. He was also conferred the Prof. Parvathaneni Gangadhara Rao Memorial Award for 2007 by the Potti Sreeramulu Telugu University, Hyderabad for significant contributions in the field of Library and Information Science. He was also conferred the Best Teacher and Researcher Award by the National Association of Indian Libraries (NAIL) for the year 2008 and the IATLIS-Motiwale Best LIS Teacher award in 2011. He has published more than 330 research papers in Indian and foreign journals, Festschrift volumes and national and international seminars/workshops on various aspects of Library and Information Science. He has organized a number of workshops, seminars and conferences. He is a resource person in various Distance Education Institutes and has prepared course materials and delivered lectures. He has delivered guest lectures in a number of Universities and Academic Staff Colleges in Andhra Pradesh, Tamilnadu, Karnataka, Maharashtra, Pondicherry and Orissa States.

Beeraka Ramesh Babu [email protected]

France Bouthillier has been director of the School of Information Studies, McGill University, since 2004 and an associate professor since 2002. Her teaching areas include the management of information services, business information, competitive intelligence and information and society. Her publications, presentations and research projects deal with information needs of small businesses, knowledge management, education of information professionals, competitive intelligence and ethnographic research. In 2003-2004 she was elected president of the Corporation of Professional Librarians in Quebec, and president of the Canadian Association of Information Science. Prior to her hiring at McGill, she had held various teaching and administrative positions in other universities and public agencies. Professor Bouthillier has worked as a consultant conducting various organizational studies in both the public and private sectors.

France Bouthillier [email protected]


Kathleen Burnett is a professor at the College of Communication and Information. She earned her B.A. (1978) summa cum laude in German Literature, with a minor in Philosophy, from the University of California, San Diego. She received her M.L.S. (1979) and Ph.D. (1989) in Library and Information Studies with a specialization in the History of Printing and Publishing from the University of California, Berkeley. Dr. Burnett conducts research on the social meaning of information, including the development and application of theoretical frameworks to empirical studies of information worlds, the examination of interaction in online education, anticipatory socialization in doctoral education, and the emergence of disciplinary identity in information science. Currently, she conducts studies of Latinas’ engagement with IT and scientists’ collaborative practices. She teaches courses in international and comparative information services, information behavior, and information education. She is a member of the ALISE Publication Committee and a past chair of the Statistical Advisory Board (2007). She was recently appointed an editor of the ALISE publication, Journal of Education for Library and Information Science (JELIS).

Kathleen Burnett [email protected]

Boryung Ju is an Associate Professor at the School of Library & Information Science, Louisiana State University. Her research interests include human-computer interaction, design and evaluation of user interfaces, usability analysis, human factors, and knowledge management. She received her Ph.D. in Information Studies from The Florida State University, her Master’s from Indiana University, and MA & BA degrees from Chung-Ang University in Seoul, Korea.

Boryung Ju [email protected]

Prof. Mallinath Kumbar obtained his B.L.I.Sc. in First Rank from Gulbarga University, M.L.I.Sc. in first class from Bangalore University, and M.A. in Political Science and Ph.D. degree in Library & Information Science from Karnataka University, Dharwad. He started his career as Assistant Librarian in December 1985 at Mangalore University; later he joined Kuvempu University, Shimoga, as Assistant Librarian, and he started the Department of Library and Information Science at Kuvempu University in 1993. In 1999 he joined the University of Mysore as Reader; in 2007 he was promoted to Professor in the Department of Library and Information Science, University of Mysore. Prof. Mallinath Kumbar has successfully guided 5 Ph.D & 2 Phil’s; he has published 113 research papers and 3 books. He has worked as organizing secretary and director for national & international conferences. He has visited abroad for academic assignments. Prof. Mallinath Kumbar received the State award for securing first Rank in B.L.I.Sc. from the Department of Youth Service, Govt of Karnataka, in 1984, and on 12 August 2012 he was felicitated with the State award for contributions to the field of Library & Information Science from the Government of Karnataka (India).

Mallinath Kumbar [email protected]

Shailendra Kumar holds a Ph.D. in Library and Information Science; he received his Master's degree in LIS and his B.Sc. from the University of Delhi. He is presently working as Associate Professor in the Department of Library and Information Science, University of Delhi, and was Head of Department from 2001 to 2004. He has more than 32 years of professional experience. He has also served IGNOU, CSIR and INSA at Delhi. He received the Young Information Scientist award in 1994, the Bharat Jyoti Award in 2006, the Fellowship Award of the Society of Information Science in 2008, and the IATLIS Best Teacher Award in LIS for his contributions to education in Library and Information Science. He pioneered the first V-Book in Library and Information Science in 2003 and subsequently published three more E-Books on LIS in India. He is a core member of the first e-course in LIS developed by the Consortium For Educational Communication (CEC) of UGC in India, and has participated in many teleconferencing programmes at IGNOU and CEC. He has to his credit more than 75 publications and five books. He has been chairperson and panelist for many international and national conferences. He has supervised 35 M.Phil and 10 Ph.D. degrees from the University of Delhi and other universities. His areas of interest are Library and Information Management Software, E-Learning and Scientometrics.

Shailendra Kumar [email protected]


Fenglin Li is a Professor in the Information Systems and E-commerce Department at the School of Information Management, Wuhan University, and a member of the China Electronic Commerce Association. He is the Deputy Director of the Research Center of Modern Service at Wuhan University. His academic interests include human information behavior, business information services, e-commerce, and the theory and application of service management. He received his Ph.D. in informatics from the School of Information Management, Wuhan University in 2002, and his M.S. in space physics from the School of Electronic Information, Wuhan University in 1998. He taught physics at a high school in Wuhan after graduating in Physics from Jianghan University in 1982. He has worked at Wuhan University since 1998, and the courses he teaches are the principle of computer, business process management, and modern service management. He currently conducts research on personalized, context-aware information service systems.

Fenglin Li [email protected]

Lokman I. Meho earned a B.A. and M.A. in Political Studies from the American University of Beirut (AUB) in 1991 and 1996, respectively. He also earned an MLS in 1996 and PhD in 2001 in Information and Library Science from North Carolina Central University and the University of North Carolina at Chapel Hill, respectively. Before joining AUB as the University Librarian, he was a tenured Associate Professor at the School of Library and Information Science at Indiana University Bloomington. Dr. Meho is the author of several major bibliographies, including The Kurds and Kurdistan, Libraries and Information in the Arab World, Kurdish Culture and Society, The Kurdish Question in U.S. Foreign Policy, and Censorship in the Arab World. He is also the author of several major studies that appeared mostly in the Journal of the American Society for Information Science and Technology. Dr. Meho’s research interests are in the areas of citation analysis, bibliometrics, scientometrics, and scholarly communication. His teaching interests include reference, online searching, social science information, and evaluation of library sources and services. He is the recipient of several teaching and best paper awards.

Lokman I. Meho [email protected]

Dr. Jin-Cheon Na is an associate professor in the Wee Kim Wee School of Communication & Information at Nanyang Technological University (NTU), Singapore. Before he joined NTU, he was a senior researcher at the Agency for Defense Development, Korea. He obtained his PhD in the Department of Computer Science at Texas A&M University. He has published more than 60 papers mainly in the areas of digital libraries, sentiment analysis, hypertext & hypermedia, document engineering, and knowledge organization. He is a member of the Editorial Board of International Journal of Organizational and Collective Intelligence (IJOCI) and Journal of Information Science Theory and Practice (JISTaP).

Jin-Cheon Na [email protected]

Daniel O. O’Connor has his MSLS and Ph.D. from Syracuse University and is currently an Associate Professor in the Department of Library and Information Science at Rutgers, where he teaches research methods in the undergraduate, masters and doctoral programs at the School of Communication & Information. He has served as an elected member of the Council of the American Library Association (ALA), chaired ALA’s Education Committee, chaired its Committee on Research and Statistics, and served as a member of ALA’s Committee on Accreditation. He was a co-conference chair and referee for the Association for Library and Information Science Education. At Rutgers, Dr. O’Connor is the chair of the New Brunswick Faculty Council, which includes all academic departments in central campuses. He has served as Co-Chair of the University Senate Budget Committee and as Chair of the New Brunswick Faculty Council Personnel Committee. He is the current President of the New Jersey State Conference of the American Association of University Professors, representing over 7,500 faculty and professional staff at 13 higher education institutions. He has held consulting positions with the New York Public Library and the Metropolitan Museum of Art.

Daniel O. O’Connor [email protected]


Alice Robbin is an associate professor of library and information science in the School of Library and Information Science at Indiana University Bloomington. She served as director and co-director of the Rob Kling Center for Social Informatics between 2004 and 2012. Her research interests include information policy, communication and information behavior in complex organizations, and the societal implications of the information age. In addition to her research on the effects of digital inequality and information seeking behavior on the Internet, she examines the political controversy over the federal reclassification of standards for racial and ethnic group data and, with colleagues, studies the evolution of e-government in Lebanon and the Internet censorship regulatory regime in China.

Alice Robbin [email protected]

Paul Solomon served as statistician, operations research analyst, and program analyst/evaluator in several agencies of the Federal Government. Following this he served on the faculty of the School of Information and Library Science at the University of North Carolina at Chapel Hill, including service as an administrator. Most recently he has been a member of the faculty of the School of Library and Information Science, University of South Carolina. He has also served as Fulbright Professor (Finland). Dr. Solomon’s research focuses on social studies of information, which try to understand what information is to people as they engage in life and work, in such contexts as schools, government, and universities, and with such populations as children, managers, chemists/chemical engineers, and the elderly. Dr. Solomon’s teaching has been focused on research methods, management, and user perspectives. He earned a B.S. in Business Administration from Penn State; an M.B.A. from the University of Washington; and M.L.S. and PhD degrees from the University of Maryland. He is the author of numerous information and library science publications, many of which have been widely cited and several of which have been recognized as ‘best papers’ by the American Society for Information Science and Technology.

Paul Solomon [email protected]

Information for Authors

The Journal of Information Science Theory and Practice (JISTaP), which is published quarterly by the Korea Institute of Science and Technology Information (KISTI), welcomes materials that reflect a wide range of perspectives and approaches on diverse areas of information science theory, application and practice. JISTaP is an open access journal run under the Open Access Policy. See the section on Open Access for detailed information on the Open Access Policy.

A. Originality and Copyright
All submissions must be original, unpublished, and not under consideration for publication elsewhere. Once accepted for publication, articles are accessible to all users at no cost. If an article is used in other research, its source should be indicated in an appropriate manner, and the content can be used only for non-commercial purposes under the Creative Commons License.

B. Peer Review
All submitted manuscripts undergo a single-blind peer review process in which the identities of the reviewers are withheld from the authors.

C. Manuscript Submission
Authors should submit their manuscripts online via http://www.jistap.org. Online submission facilitates processing and reviewing of submitted articles, thereby substantially shortening the paper lifecycle from submission to publication. After checking the manuscript’s compliance with the Manuscript Guidelines, please follow the “e-Submission” hyperlink in the top navigation menu to begin the online manuscript submission process.

D. Open Access
Under KISTI’s Open Access Policy, authors can choose open access and retain their copyright or opt for the normal publication process with a copyright transfer. If authors choose Open Access, their manuscripts become freely available to the public under the Creative Commons License. Open access articles are automatically archived in KISTI’s Open Access repository (OAK Central). If authors do not choose open access, access to their articles will be restricted to journal users.

E. Manuscript Guidelines Manuscripts that do not adhere to the guidelines outlined below will be returned for correction. Please read the guidelines carefully and make sure the manuscript follows the guidelines as specified. We strongly recommend that authors download and use the manuscript template in preparing their submissions.

Manuscript Guidelines

1. Page Layout : All articles should be submitted in single column text on standard letter size paper (21.59 x 27.94 cm) with normal margins.

2. Length : Manuscripts should normally be between 4,500 and 9,000 words (10 to 20 pages).

3. File Type: Articles should be submitted in Microsoft Word format. To facilitate manuscript preparation and speed up the publication process, please use the manuscript template.

4. Text Style : Use a standard font (e.g., Times New Roman) no smaller than size 10. Use single line spacing for paragraphs. Use footnotes to provide additional information peripheral to the text. Footnotes to tables should be marked by superscript lowercase letters or asterisks.

5. Title Page : The title page should start with a concise but descriptive title and the full names of authors along with their affiliations and contact information (i.e., postal and email addresses). An abstract of 150 to 250 words should appear below the title and authors, followed by keywords (4 to 6).

Author1 Affiliation, Postal Address. E-mail

Author2 Affiliation, Postal Address. E-mail

ABSTRACT A brief summary (150-250 words) of the paper goes here.

Keywords: 4 to 6 keywords, separated by commas.
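Before submission, the abstract and keyword limits stated above can be checked mechanically. The following is only a minimal sketch in Python: the function name `check_front_matter` is hypothetical, and it assumes that words are counted by splitting on whitespace and that keywords are separated by commas. The journal's own count may differ.

```python
def check_front_matter(abstract: str, keywords: str) -> list[str]:
    """Return a list of problems; an empty list means both limits are met.

    Assumption: a "word" is any whitespace-separated token, and keywords
    are given as one comma-separated string.
    """
    problems = []

    # Abstract length: 150 to 250 words, as stated in the Title Page guideline.
    n_words = len(abstract.split())
    if not 150 <= n_words <= 250:
        problems.append(f"abstract has {n_words} words (expected 150 to 250)")

    # Keyword count: 4 to 6, ignoring empty entries such as a trailing comma.
    kw = [k.strip() for k in keywords.split(",") if k.strip()]
    if not 4 <= len(kw) <= 6:
        problems.append(f"{len(kw)} keywords given (expected 4 to 6)")

    return problems
```

Calling `check_front_matter(abstract_text, "bibliometrics, citation analysis, open access, metadata")` returns an empty list when both limits are satisfied, and otherwise one message per violated limit.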

6. Numbered Type:

1. INTRODUCTION
All articles should be submitted in single column text on standard letter size paper (21.59 x 27.94 cm) with normal margins[1]. Text should be in 11-point standard font (e.g., Times New Roman) with single line spacing.

[1] Normal margin dimensions are 3 cm from the top and 2.54 cm from the bottom and sides.

2. SECTIONS
The top-level section heading should be in 14-point bold all uppercase letters.

2.1. Subsection Heading 1
The first-level subsection heading should be in 12-point bold with the first letter of each word capitalized.

2.1.1. Subsection Heading 2
The second-level subsection heading should be in 11-point italic with the first letter of each word capitalized.

7. Figures and Tables : All figures and tables should be placed at the end of the manuscript after the reference list. To note the placement of figures and tables in text, “Insert Table (or Figure) # here” should be inserted in appropriate places. Please use high resolution graphics whenever possible and make sure figures and tables can be easily resized and moved.

Figure

Fig. 1 Distribution of Authors Over Publication Count

Table

Table 1. The Title of Table Goes Here

Study                       Time Period    Study Data
Smith & Wesson (1996)       1970 - 1995    684 papers in 4 SSCI journals
Reeves[a] (2002)            1997 - 2001    597 papers in 3 SSCI journals
Jones & Wilson[b] (2011)    2000 - 2009    2,166 papers in 4 SSCI journals

[a] Table footnote a goes here
[b] Table footnote b goes here

8. Acknowledgments : Acknowledgments should appear in a separate section before the reference list.

9. Citations : Citations in text should follow the author-date method (authors’ surnames followed by publication year).
Several studies found ... (Barakat et al., 1995; Garfield, 1955; Meho & Yang, 2007).
In a recent study (Smith & Jones, 2011) ...
Smith and Jones (2011) investigated ...

10. Reference List : The reference list, formatted in accordance with the American Psychological Association (APA) style, should be alphabetized by the first author’s last name.

Journal article
Author, A., Author, B., & Author, C. (Year). Article title. Journal Title, volume(issue), start page-end page.
Smith, K., Jones, L. J., & Brown, M. (2012). Effect of Asian citation databases on the impact factor. Journal of Information Science Practice and Theory, 1(2), 21-34.

Book
Author, A., & Author, B. (Year). Book title. Publisher location: Publisher Name.
Smith, K., Jones, L. J., & Brown, M. (2012). Citation patterns of Asian scholars. London: Sage.

Book chapter
Author, A., & Author, B. (Year). Chapter title. In A. Editor, B. Editor, & C. Editor (Eds.), Book title (pp. xx-xx). Publisher location: Publisher Name.
Smith, K. & Brown, M. (2012). Author impact factor by weighted citation counts. In G. Martin (Ed.), Bibliometric approach to quality assessment (pp. 101-121). New York: Springer.

Conference paper
Author, A., & Author, B. (Year). Article title. In A. Editor & B. Editor (Eds.), Conference title (pp. xx-xx). Publisher location: Publisher Name.
Smith, K. & Brown, M. (2012). Digital curation of scientific data. In G. Martin & L. J. Jones (Eds.), Proceedings of the 12th International Conference on Digital Curation (pp. 41-53). New York: Springer.

Online document
Author, A., & Author, B. (Year). Article title. Retrieved month day, year from URL.
Smith, K. & Brown, M. (2010). The future of digital library in Asia. Digital Libraries, 7, 111-119. Retrieved May 5, 2010, from http://www.diglib.org/publist.htm.
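
For authors who keep their references as structured data, the journal article pattern above can be generated programmatically. The following is a minimal sketch in Python, not an official tool: the function name `format_journal_article` and its parameters are hypothetical, author names are assumed to be already in "Surname, Initials." form, and plain text cannot carry the italics that APA style applies to the journal title and volume.

```python
def format_journal_article(authors, year, title, journal, volume, issue, pages):
    """Build a reference string following the journal article pattern:
    Author, A., Author, B., & Author, C. (Year). Article title.
    Journal Title, volume(issue), start page-end page.

    Assumptions: `authors` is a non-empty list of "Surname, Initials."
    strings and `pages` is a (start, end) pair.
    """
    if len(authors) == 1:
        names = authors[0]
    else:
        # Join all but the last author with commas, then add ", & " + last.
        names = ", ".join(authors[:-1]) + ", & " + authors[-1]

    start, end = pages
    return f"{names} ({year}). {title}. {journal}, {volume}({issue}), {start}-{end}."


reference = format_journal_article(
    ["Smith, K.", "Jones, L. J.", "Brown, M."],
    2012,
    "Effect of Asian citation databases on the impact factor",
    "Journal of Information Science Practice and Theory",
    1,
    2,
    (21, 34),
)
```

With the inputs shown, `reference` reproduces the sample journal article entry given above.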

Journal of Information Science Theory and Practice (JISTaP) Call for Papers

We would like to invite you to submit or recommend papers to the Journal of Information Science Theory and Practice (JISTaP, eISSN: 2287-4577, pISSN: 2287-9099), a fast-track, peer-reviewed, no-fee open access academic journal published by the Korea Institute of Science and Technology Information (KISTI), which is a government-funded research institute providing STI services to support high-tech R&D for researchers in Korea. JISTaP marks a transition from the Journal of Information Management to an English-language international journal in the area of library and information science.

JISTaP aims at publishing original studies, review papers and brief communications on information science theory and practice. The journal provides an international forum for practical as well as theoretical research in the interdisciplinary areas of information science, such as information processing and management, knowledge organization, scholarly communication and bibliometrics.

We welcome materials that reflect a wide range of perspectives and approaches on diverse areas of information science theory, application and practice. Topics covered by the journal include: information processing and management; information policy; library management; knowledge organization; metadata and classification; information seeking; information retrieval; information systems; scientific and technical information service; human-computer interaction; social media design; analytics; scholarly communication and bibliometrics. Above all, we encourage submissions of a catalytic nature that explore the question of how theory can be applied to solve real-world problems in the broad discipline of information science.

Co-Editors in Chief : Gary Marchionini & Dong-Geun Oh

Please click the “e-Submission” link on the JISTaP website (http://www.jistap.org), which will take you to a login/account creation page. Please consult the “Author’s Guide” page to prepare your manuscript according to the JISTaP manuscript guidelines.

Any questions? Hea Lim Rhee (managing editor): [email protected]

Journal of Information Science Theory and Practice (JISTaP) Errata Leaf

Page    Error                     Correction
03      Beeraka Ramesha Babu      Beeraka Ramesh Babu
05      B. Ramesha Babu           B. Ramesh Babu
54      B. Ramesha Babu           B. Ramesh Babu
93      [email protected]         [email protected]

Page 95

Error:
Beeraka Ramesha Babu
[email protected]
Dr. Ramesha completed BLISc., and MLISc., from Bangalore University, Karnataka, INDIA and awarded Ph.D. from Karnatak University, Dharwad. He was started his carrier as a Librarian in the United Mission Degree College, Bangalore and joined the Karnatak University Library, Dharwad in 1999 as a Asst. Librarian. Subsequently he was moved to University of Madras, Chennai as a Lecturer. Presently he is working as a Associate Professor in the Dept. of Library and Information Science, Bangalore University, Bangalore. As of today he has participated more then 50 national and international conferences, seminars and workshops and published over 75 research papers both in national and international journals and conference proceedings. Completed one one Major and one minor Research proejct funded by UGC. Dr. Ramesha suved as resource person in many Academic Staff Colleges(ASC), workshops and symposia. He is a editor of many professional journals includes Indian Journal of Library and Information, Journal of Indian Library Association, KELPRO Bulletin and Consulting Editor of Journal of Information Science Theory and Practice (JISTaP), KISTI, Korea. He is a life member of many professional associations including ILA, IASLIC, IATLIS, FIC, MALA, KALA etc. His areas of interest includes Information Literacy, Design and Development of Digital Repositories and Application of ICT in LIS.

Correction:
Beeraka Ramesh Babu
[email protected]
Dr. B. Ramesh Babu is Professor in the Department of Library and Information Science, University of Madras. He has been awarded Dr. S.R. Ranganathan Memorial Gold Medal from the University of Mysore for the First Rank in M. Lib. Sc., degree. He has been awarded Commonwealth Fellowship for Post-Doctoral research for the year 1999/2000. He has also visited France, Nepal, Bangladesh, Muscat and South Korea on academic invitation. He has been awarded C. D. Sharma Best Paper Award by the Indian Library Association for the Year 1999 and READIT 2001 Best Paper Award by the IIT, Madras, IGCAR and MALA at the National Conferences. He has also been conferred Prof. Parvathaneni Gangadhara Rao Memorial Award for 2007 by the Potti Sreeramulu Telugu University, Hyderabad for the significant contributions in the field of Library and Information Science. He has also been conferred the Best Teacher and Researcher Award by the National Association of Indian libraries (NAIL) for the year 2008 and IATLIS- Motiwale Best LIS Teacher award in 2011. He has published more than 330 research papers in Indian and Foreign journals, Festschrift volumes and National and International seminars/workshops on various aspects of Library and Information Science. He has organized a number of workshops, seminars and conferences. He is a Resource person in various Distance Education Institutes and prepared course materials and delivered lectures. He has delivered Guest Lectures in a number of Universities and Academic Staff Colleges in Andhra Pradesh, Tamilnadu, Karnataka, Maharashtra, Pondicherry and Orissa States.

JISTaP Journal of Information Science Theory and Practice http://www.jistap.org

245 Daehangno, Yuseong-gu, Daejeon, Republic of Korea(ZIP code:305-806) Tel. (82)-42-869-1615 Fax. (82)-42-869-1767 http://www.jistap.org