Knowl. Org. 46(2019)No.3

KO KNOWLEDGE ORGANIZATION

Official Journal of the International Society for Knowledge Organization
ISSN 0943-7444
International Journal devoted to Concept Theory, Classification, Indexing and Knowledge Representation

Contents

Articles

Graham Freeman and Robert J. Glushko. Organization, Not Inspiration: A Historical Perspective of Musical Information Architecture (p. 161)

Lielei Chen and Hui Fang. An Automatic Method for Extracting Innovative Ideas Based on the Scopus® Database (p. 171)

Debashis Naskar and Subhashis Das. HNS Ontology Using Faceted Approach (p. 187)

Robin A. Moeller and Kim E. Becnel. “Why On Earth Would We Not Genrefy the Books?”: A Study of Reader-Interest Classification In School Libraries (p. 199)

Reviews of Concepts in KO

Wendy Korwin and Haakon Lund. Alphabetization (p. 209)

Emma Stuart. Flickr: Organizing and Tagging Images Online (p. 223)

Letters to the Editor

Guohua Xiao. Knowledge Ontology: A Tool for the Unification of Knowledge (p. 236)

Birger Hjørland. Annual Progress in Knowledge Organization (KO)? Annual Progress in Thesaurus Research? (p. 238)

Books Recently Published (p. 240)


KNOWLEDGE ORGANIZATION

This journal is the organ of the INTERNATIONAL SOCIETY FOR KNOWLEDGE ORGANIZATION (General Secretariat: Amos DAVID, Université de Lorraine, 3 place Godefroy de Bouillon, BP 3397, 54015 Nancy Cedex, France. E-mail: [email protected]).

Editors

Richard P. SMIRAGLIA (Editor-in-Chief), Institute for Knowledge Organization and Structure, Shorewood WI 53211 USA. E-mail: [email protected]

Joshua HENRY, Institute for Knowledge Organization and Structure, Shorewood WI 53211 USA.

Peter TURNER, Institute for Knowledge Organization and Culture, Shorewood WI 53211 USA.

J. Bradford YOUNG (Bibliographic Consultant), Institute for Knowledge Organization and Structure, Shorewood WI 53211, USA.

Editor Emerita

Hope A. OLSON, School of Information Studies, University of Wisconsin-Milwaukee, Milwaukee, Northwest Quad Building B, 2025 E Newport St., Milwaukee, WI 53211 USA. E-mail: [email protected]

Series Editors

Birger HJØRLAND (Reviews of Concepts in Knowledge Organization), Department of Information Studies, University of Copenhagen. E-mail: [email protected]

María J. LÓPEZ-HUERTAS (Research Trajectories in Knowledge Organization), Universidad de Granada, Facultad de Biblioteconomía y Documentación, Campus Universitario de Cartuja, Biblioteca del Colegio Máximo de Cartuja, 18071 Granada, Spain. E-mail: [email protected]

Editorial Board

Thomas DOUSA, The University of Chicago Libraries, 1100 E 57th St, Chicago, IL 60637 USA. E-mail: [email protected]

Melodie J. FOX, Institute for Knowledge Organization and Structure, Shorewood WI 53211 USA. E-mail: [email protected].

Jonathan FURNER, Graduate School of Education & Information Studies, University of California, Los Angeles, 300 Young Dr. N, Mailbox 951520, Los Angeles, CA 90095-1520, USA. E-mail: [email protected]

Claudio GNOLI, University of Pavia, Science and Technology Library, via Ferrata 1, I-27100 Pavia, Italy. E-mail: [email protected]

Ann M. GRAF, School of Library and Information Science, Simmons University, 300 The Fenway, Boston, MA 02115 USA. E-mail: [email protected]

Jane GREENBERG, College of Computing & Informatics, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104 USA. E-mail: [email protected]

José Augusto Chaves GUIMARÃES, Departamento de Ciência da Informação, Universidade Estadual Paulista–UNESP, Av. Hygino Muzzi Filho 737, 17525-900 Marília SP Brazil. E-mail: [email protected]

Michael KLEINEBERG, Humboldt-Universität zu Berlin, Unter den Linden 6, D-10099 Berlin. E-mail: [email protected]

Kathryn LA BARRE, School of Information Sciences, University of Illinois at Urbana-Champaign, 501 E. Daniel Street, MC-493, Champaign, IL 61820-6211 USA. E-mail: [email protected]

Devika P. MADALLI, Documentation Research and Training Centre (DRTC) Indian Statistical Institute (ISI), Bangalore 560 059, India. E-mail: [email protected]

Daniel MARTÍNEZ-ÁVILA, Departamento de Ciência da Informação, Universidade Estadual Paulista–UNESP, Av. Hygino Muzzi Filho 737, 17525-900 Marília SP Brazil. E-mail: [email protected]

Widad MUSTAFA el HADI, Université Charles de Gaulle Lille 3, URF IDIST, Domaine du Pont de Bois, Villeneuve d’Ascq 59653, France. E-mail: [email protected]

H. Peter OHLY, Prinzenstr. 179, D-53175 Bonn, Germany. E-mail: [email protected]

M. Cristina PATTUELLI, School of Information, Pratt Institute, 144 W. 14th Street, New York, New York 10011, USA. E-mail: [email protected]

K. S. RAGHAVAN, Member-Secretary, Sarada Ranganathan Endowment for Library Science, PES Institute of Technology, 100 Feet Ring Road, BSK 3rd Stage, Bangalore 560085, India. E-mail: [email protected].

Heather Moulaison SANDY, The iSchool at the University of Missouri, 303 Townsend Hall, Columbia, MO 65211, USA. E-mail: [email protected]

M. P. SATIJA, Guru Nanak Dev University, School of Library and Information Science, Amritsar-143 005, India. E-mail: [email protected]

Aida SLAVIC, UDC Consortium, PO Box 90407, 2509 LK The Hague, The Netherlands. E-mail: [email protected]

Renato R. SOUZA, Applied Mathematics School, Getulio Vargas Foundation, Praia de Botafogo, 190, 3o andar, Rio de Janeiro, RJ, 22250-900, Brazil. E-mail: [email protected]

Rick SZOSTAK, University of Alberta, Department of Economics, 4 Edmonton, Alberta, Canada, T6G 2H4. E-mail: [email protected]

Joseph T. TENNIS, The Information School of the University of Washington, Box 352840, Mary Gates Hall Ste 370, Seattle WA 98195-2840 USA. E-mail: [email protected]

Maja ŽUMER, Faculty of Arts, University of Ljubljana, Askerceva 2, Ljubljana 1000 Slovenia. E-mail: [email protected]


Organization, Not Inspiration: A Historical Perspective of Musical Information Architecture

Graham Freeman*, Robert J. Glushko**

*Dan School of Drama and Music, Queen's University, Harrison LeCaine Hall, 39 Bader Lane, Kingston, Ontario, Canada, K7L 3N6
**University of California, Berkeley, Cognitive Science Program, 140 Stephens Hall, Berkeley CA 94720, USA

Graham Freeman is a musicologist and technical writer in Toronto, Canada. He received his PhD from the University of Toronto and is currently teaching at Queen’s University and at George Brown College.

Bob Glushko is an adjunct full professor at the University of California at Berkeley in the Cognitive Science Program. He has had more than thirty years of experience in information systems and service design, content management, electronic publishing and ebooks, internet commerce, and human factors in computing systems. He founded or co-founded four companies, including Veo Systems in 1997, which pioneered the use of XML for electronic business. Veo’s innovations included the Common Business Library (CBL), the first native XML vocabulary for business-to-business transactions, and the Schema for Object-Oriented XML (SOX), the first object-oriented XML schema language.

Freeman, Graham and Robert J. Glushko. 2019. “Organization, not Inspiration: A Historical Perspective of Musical Information Architecture.” Knowledge Organization 46(3): 161-170. 28 references. DOI:10.5771/0943-7444-2019-3-161.

Abstract: The organization of musical resources in a piece of music is opaque for everyone but for those with the highest levels of musical education. For the average listener, the specific vocabulary of musical organization is usually replaced by metaphorical language relating to inspiration and musical affect, or by a social perspective that rids the music of its specific theoretical language and provides a more relatable perspective of the music as a historical and communal event. We examine the ways in which information architecture and organizational theory can surface the inner workings of music in a relatable and approachable way. We consider music as a series of design resources that composers draw upon and organize according to a series of constraints that create a sense of musical structure to which the listener can relate. After a general introduction to the literature relating to constraints and creativity, we use two historical anecdotes that provide accessible demonstrations of how musicians in the seventeenth and twentieth centuries organized their musical resources both for their own compositional needs and for the purposes of didactic communication.

Received: 23 September 2018; Revised: 26 January 2019; Accepted: 27 March 2019

Keywords: music, musical resources, constraints, organizing systems, composers, information

1.0 Introduction ity?;” “What genre?;” “Which structure?” If we have no mu- sical experience, we might ask even more fundamental ques- For those with no experience writing music, the process by tions like “Which notes should I use?;” “Should I use all the which it happens is opaque, with the catalyst for it being notes or just some of them?” or; “What should the duration something mysterious and difficult to define called “inspira- of each note I play be?” tion.” Igor Stravinsky (1998, 50) referred to it as a “hazy If we have every possible musical design resource at our emotive disturbance that sets the composer’s creative imag- disposal, it can be paralyzing even to imagine where to begin ination in motion.” But how do we take that “emotive dis- organizing them into music. This phenomenon is what psy- turbance” and create music from it? Once inspiration ar- chologist Barry Schwartz calls (2004) the paradox of choice, rives, we might ask any one of a number of very reasonable in which an overwhelming number of alternatives creates questions: “In what style should I compose?;” “What tonal- information overload that prevents any meaningful interac- 162 Knowl. Org. 46(2019)No.3 G. Freeman and R. J. Glushko. Organization, not Inspiration: A Historical Perspective of Musical Information Architecture tion with the information or resources presented. Even Stra- support.” Some of these organizing systems, like western vinsky can experience the anxiety this can produce (Stravin- tonality or Hindustani practices, have very long histories of sky 1998, 63): established practice, in which constraints have been care- fully selected, developed, and pruned over many hundreds I experience a sort of terror when, at the moment of years. 
Other practices, such as the music of the twenti- of setting to work and finding myself before the in- eth-century avant-garde, feature composers creating their finitude of possibilities that present themselves, I own individual organizing systems, often using different have the feeling that everything is permissible to me. constraints for every piece they compose. If everything is permissible to me, the best and the Musical organizing systems have been the subject of worst; if nothing offers me any resistance, then any several important studies in the field of knowledge organi- effort is inconceivable, and I cannot use anything as zation over the last several years. Representative examples a basis, and consequently every undertaking be- include Smiraglia, whose work (2002, 2017; Thomas and comes futile. Smiraglia 1998; Smiraglia and Graf 2017) has been perhaps the most valuable contribution to the classification of To overcome this anxiety, a composer must impose con- printed and recorded musical materials. Adcock (2001) has straints, and music is, therefore, rich in constraints that rein examined the challenge of creating classification systems in the number of options. At a very high level, these con- for printed and recorded music that increase the accessibil- straints determine what arrangements of musical resources ity of musical artifacts for the visually impaired. Wu and Shi are considered stylistically acceptable within a specific (2016) have met the challenge of exploring a classification genre or culture. More specifically, constraints are a way of system that provides precise categories and metadata for organizing musical material and limiting choices to prevent classical music recordings on the internet. 
Weissenberger’s cognitive overload and to help composers find a reasonable (2015) work on the classification of traditional music gen- path forward through the seemingly infinite number of res expands the concept of documented music to include musical options. As Stravinsky described it (1998, 65): alternative forms of knowledge such as oral/aural trans- mission of musical material. Abrahamsen (2003) has ex- My freedom thus consists in my moving about within plored the role of ontology in the neglected classifications the narrow frame that I have assigned myself for each of popular music in musicology. Finally, Lee (2017a, 2017b, one of my undertakings. I shall go even further: my 2019) has explored the distinction between scientific and freedom will be much the greater and more meaning- bibliographic classification and the challenges of classifica- ful the more narrowly I limit my field of action and tion systems for the various sub-categories of instrumental the more I surround myself with obstacles. Whatever music ensembles. diminishes constraint, diminishes strength. The more To complement the existing literature on the challenges constraints one imposes, the more one frees one’s of classifying of musical artifacts, we will explore some his- self of the chains that shackle the spirit. torical examples of classifying the musical material itself. This article is about considering music as architecture, in Form provides an excellent example of a musical con- which musical resources are assembled and constrained to straint. For much of the history of western art music create a structure in the form of a musical work. This per- (WAM), composers have used recognizable structures for spective of musical architecture is not designed to replace their music. 
Among these are sonata form, a large form di- traditional methods of music theory or analysis, but rather vided into three or four separate movements that are often to demonstrate that musical thought has some very specific unified by key or thematic material, and dance forms, which resonances with the architectural metaphor. In particular, were short musical structures having their historical origins we focus on the tools and materials of information archi- in socially performed dances. The benefit of these forms is tecture (IA). The purpose of showing the complementarity that their prescriptive structures provided composers with of these two disciplines is to demonstrate the ways in which a path along which they could develop their musical mate- musical thought resonates strongly with two fundamental rial and make it relatively familiar to the listener, like a con- principles of IA: 1) organizing resources for retrieval and tainer into which the musical material can be poured. use; and; 2) sensemaking, and that musicians are inherently The diversity of musical practice from different regions excellent information architects of their own design re- and eras demonstrates the various ways in which musicians sources. In addition, analyzing music as the result of sys- have constrained musical material. We call these sets of tematic architectural thinking is easier and more useful than constraints and the design decisions that embody them “or- viewing it as the product of opaque inspiration. ganizing systems” (Glushko 2016), “an intentionally ar- After some brief background on the theory of con- ranged collection of resources and the interactions they straints and IA, we will examine some different ways in Knowl. Org. 46(2019)No.3 163 G. Freeman and R. J. Glushko. 
Organization, not Inspiration: A Historical Perspective of Musical Information Architecture which musicians have created musical architectures that poses a model using generative constraints to show how resonate closely with IA. For the sake of brevity, we pro- the use of existing design resources at various levels of vide examples from WAM, but the agnostic nature of this granularity created by other artists can be squared against approach will prove to be well-suited to musical architec- copyright law. tures from any time or place. This approach has been de- Beyond the perspective that music is simply about ap- veloped during many years of teaching music history and plying basic constraints, cognitive science has been a vital theory to non-musicians enrolled in general music history tool in examining the complexity of musical understanding courses in a university setting, with continual modifica- and the multiple organizational levels on which musical de- tions and fine-tuning in response to the ongoing interac- cisions are made by composers and perceived by listeners. tions with students and their input. Musical sounds have no defined semantic meaning. Instead, musical structure operates on multiple structural levels sim- 2.0 Constraints and IA ultaneously, which requires the listener to extract complex information such as pitch, timbre, and duration. Pearce and Why is IA the appropriate framework for this discussion? Rohrmeier (2012) describe each element of musical infor- IA is strongly associated with website design, but Glushko mation as involving multiple cognitive processes. 
Compos- provides (2016, 115) a better definition as “designing an ab- ers, therefore, operate with the understanding that every stract and effective organization of information and then musical constraint they apply provokes in the listener, even exposing that organization to facilitate navigation and in- one with only a passing familiarity with the musical genre formation use.” According to Glushko, the process for cre- they are hearing, a complex set of cognitive processes that ating the organizing system that sits at the foundation of anticipate the fulfillment of their tacit expectations regard- IA is: ing the organization of fundamental musical resources. Finally, composers recognize that music is constrained – Selecting the resources that will be organized by the medium of its transmission. For those in the WAM – Organizing the resources according to rules or con- tradition, music is transmitted to other musicians typically straints through notation with the tacit understanding that the – Designing interfaces to the resources to facilitate re- symbols on the page represent the prescriptive procedures trieval and use, and for turning visual representations of sound into aural real- – Maintaining and adapting the system over time. izations. Western notation imposes its own constraints on what it is possible to document with visual symbols. Many IA provides the tools for the task of sensemaking in which composers have attempted to break free of these con- people use the resources they have organized to interpret straints with new notational systems that included ex- the world, survive their environments, be innovative with tended symbols, written instructions, graphic representa- existing resources, and even invent new resources. Arango tions of sound waves, and code-based notation. 
Contem- (2011) suggests that IA and traditional architecture share porary musicians in popular or electronic genres will fre- the common goal of intentionally designing environments quently use sound recordings of performances or rehears- to facilitate a specific goal. For traditional architecture, the als, face-to-face communication, or collaborative technol- goal is to create a habitable ; for IA, it is to help a ogy to transmit and document their organization of musi- user to navigate and put to use the overwhelming deluge cal material, often to avoid the difficulties of fitting the of information that surrounds us. piece into the constraints of traditional notation or of cre- Musicians facilitate understanding of their musical en- ating a bespoke notational representation. vironments by thinking of composition as the application Yet what do we do with these insights, and what does a of hierarchical constraints that progressively organize the better understanding of artistic creativity and musical per- musical design material. The current literature on creativity ception provide for us as listeners, observers, or partici- and innovation is rich with excellent analyses of the way in pants in that activity? By augmenting our musical perspec- which constraint-based thinking promotes creative prob- tive with tools derived from IA, we can provide additional lem solving. Stokes (2013) writes, for example, that there domain-agnostic insights that enhance the way we talk are four levels of constraints arranged in a hierarchy that about music and the lessons we can extract from it. To do descends from the breadth of genre and style to the gran- this, we need to tell two stories about how music resonates ularity of materials and resources. Constraints are then ei- with the principles of IA. 
The first, about the music of ther adhered to in order to maintain the expectations of Arnold Schoenberg, shows how resource selection and or- style and genre or they are broken for the purpose of in- ganization can lead to innovative new developments of novation and expanding into new creative areas. Building musical praxis. The second, about the theoretical work of on Stokes’ typology of constraints, Fishman (2015) pro- Jean-Philippe Rameau, shows how resource selection and 164 Knowl. Org. 46(2019)No.3 G. Freeman and R. J. Glushko. Organization, not Inspiration: A Historical Perspective of Musical Information Architecture organization can facilitate revolutionary new approaches are the terms consonance and dissonance.” This is largely to musical sensemaking. Our observations here remain the result of a constantly shifting definition of what con- within the scope of notation-based representations of mu- stitutes a dissonance, as we shall see below. A useful basic sic, but this approach is applicable to musical transmission definition of consonance is a combination of tones that in any representational media. sound settled, as though they are pleasant enough that they do not unto themselves suggest the need to move any fur- 3.0 General introduction to musical constraints ther to be nice to listen to. Dissonance, in contrast, is cre- ated by a combination of tones that sound harsh or unset- Charting a quick and accessible history of musical organiz- tled, as though they are imbued with a tension that requires ing systems in WAM is not an easy task. While it would be their continued movement to consonant combinations to convenient to have a tidy narrative in which musical devel- find resolution. 
Consonance and dissonance are, therefore, opments build upon one another and cascade along an ef- organizing principles with a defined semantic relationship ficient path like a sort of waterfall methodology, the truth between them, one that the composer can manipulate to is considerably messier and replete with many innovative, create and extend musical interest. Dissonance produces iconoclastic, nonlinear and sometimes even regressive de- the expectation of resolution to consonance, and compos- velopments. Nevertheless, it will prove helpful for our pur- ers will use that either to provide or deny the listener the poses to include a short guide to harmony in WAM since fulfillment of those expectations. According to Salimpoor the eighteenth-century, one that provides enough of a sum- et al. (2015), the use of dissonance to create the sense of mary to demonstrate the value of the IA perspective on the expectation in the listener is part of the composer’s power- history of theory while also providing a few tantalizing ful ability to strategically manipulate the listener’s response. hints for future work. The constraints that govern the balance between conso- The best place to begin is with the concept of tonality. nance and dissonance form the basis of tonality, which is Tonality has been a dominant framework for organizing the fundamental grammar of music during the common musical materials since the eighteenth-century, and it con- practice period. tinues that dominance in contemporary popular music. Consonance and dissonance are not absolute concepts, This makes tonality the primary concept that people use to and they are in no way universal. Over time, as well as understand western musical harmony, whether they know across geographies, the concepts of what sounds are con- anything about music theory or not. sonant or dissonant can vary considerably, even today. 
It is Tonality is an organizing system that governs the pro- the relationship between them, and how one progresses gression of musical events in a piece of music. Many theo- from one to the other, that constitutes the constraint. Mu- rists have proposed that the way the rules of tonality gov- sic that is entirely consonant might be immediately sono- ern musical events is similar to the way the rules of gram- rous for the ears, but the listener would quickly tire of the mar govern linguistic communication, with melodies, lack of tension, contrast, and interest in the musical mate- chords and other musical materials replacing those of lan- rials. Dissonance is, therefore, a necessary and vital part of guage such as verbs and nouns. musical organization, and it is how this relationship is man- The eighteenth- and early nineteenth-centuries consti- aged that has changed so significantly over time. tute what is often referred to as the “common practice pe- In the eighteenth-century of Mozart, dissonance was a riod,” which denotes a period in which general best prac- principle that was carefully governed by the strict con- tices regarding tonality prevailed throughout Europe and straints of tonality. Like the physical element of fire, it was dictated stylistically acceptable compositional practice. This the vital catalyst for creating energy, but also one that could approach to tonality also defined the parameters against destroy the entire architecture if left unchecked. Eight- which innovators would push to establish new means of eenth-century tonality, therefore, constrained dissonance musical expression and organization. Mozart, Beethoven to a transitional stage between moments of consonance. 
and Haydn, to name just a few examples, are late eight- Dissonant musical notes were to be preceded by conso- eenth- early nineteenth-century composers with diverse nant ones and should release the tension they created by compositional outputs for whom tonality constituted the subsequently resolving into consonance. Composers common fundamental language of their time. could, of course, strain against those constraints for the Perhaps the most significant constraint within the tonal purposes of artistic expression. Mozart’s String Quartet K. system is that concerning the relationship of consonance 465, aptly nicknamed “Dissonance,” is an excellent exam- and dissonance. The definitions of these terms are difficult ple of a work that briefly moves to the very edge of the to pin down. Indeed, Tenney has written (1988, 1) that tonal constraint on dissonance to produce a riveting and “There is nothing in the language of discourse about music dramatic musical effect, all the while remaining within the that is more burdened with purely semantic problems than parameters of the constraint. Mozart achieves this effect Knowl. Org. 46(2019)No.3 165 G. Freeman and R. J. Glushko. Organization, not Inspiration: A Historical Perspective of Musical Information Architecture with an extended introduction that emphasizes dissonance Hence, it seemed at first impossible to compose and a lack of harmonic direction that eventually moves to pieces of complicated organization or of great length. the home key in a cathartic resolution. One of the defining characteristics of music in the nine- Contrary to expectations, removing the constraints of to- teenth-century is the increasingly liberal way in which com- nality did not encourage creativity but effectively hindered posers treated the constraint on dissonance. By the end of it. 
his life, Beethoven had begun to infuse his music with a decidedly more expressive treatment of dissonance, though still within the bounds of the tonal constraints, as one can hear by comparing, for example, the early String Quartet No. 1 op. 18 in F Major to the much later op. 133 (“Große Fuge”). Later composers such as Richard Wagner, Claude Debussy and Alexander Scriabin found innovative and even revolutionary ways of incorporating new elements, such as exotic scales or harmonies, into the tonal material, such as in Debussy’s L’après-midi d’un faune or Scriabin’s Piano Sonata No. 9 (nicknamed “The Black Mass”), or of expanding dissonant events and denying their expected resolution in such a way as to create prolonged periods of extreme harmonic tension, such as in the Prelude to Tristan und Isolde by Richard Wagner. Each of these composers, to name just a few, expanded the tonal resources available to composers in a remarkable way. Yet these expanded resources remained bound, if loosely, to the tonal organizing system, and the constraints governing the concept of consonance and dissonance remained guiding principles within which composers worked. Strained though it was, tonality was not replaced as an organizing system by these radical musical works.

The most significant and revolutionary attempt to replace the organizing system of tonality with a new one came from the composer Arnold Schoenberg. In 1908, Schoenberg eliminated the distinction between consonance and dissonance altogether and treated all twelve notes in the octave as equal. Schoenberg’s new atonal music promised new sounds and textures that had never been used before in the European tradition, which would give composers an almost limitless palette of expressive resources, as is evident in one of his most famous works from this period, Pierrot Lunaire. Yet Schoenberg encountered the difficulty of organizing these sounds without the constraints of tonality. He was overwhelmed by the paradox of choice that comes with unbounded possibility. As he wrote in his essay “Composition with Twelve Tones” (1975):

Harmonic variation could be executed intelligently and logically only with due consideration of the fundamental meaning of the harmonies. Fulfillment of all these functions—comparable to the effect of punctuation in the construction of sentences, of subdivision into paragraphs, and of fusion into chapters—could scarcely be assured with chords whose constructive values had not as yet been explored.

This is an excellent demonstration of the fundamental premise of much critical work in the field of creativity, which tells us that constraints are both restrictive and generative, since they limit the number of available options to allow creators to follow established and ready-made paths to create coherent artistic works on a large scale. In the early 1920s, Schoenberg realized that to make atonality a viable compositional method, he needed to constrain the choices he could make with the material in such a way as to allow him to organize it effectively. He, therefore, devised a method that became known as serialism, in which each of the twelve tones in the octave is arranged in a row and then subject to various applied manipulations like playing it backwards (retrograde), upside-down (inversion), upside-down and backwards (retrograde inversion), etc. Serialism provided Schoenberg with the constraints that allowed him to gather his musical resources into larger units that could then be incorporated into larger and more traditional formal musical structures.

An IA perspective provides some insights into the significance of this series of shifts, as well as a demonstration of how musical innovation is often based on the manipulation of the ways in which design resources are organized. Music is sound organized according to certain parameters, notably pitch (the auditory sensation of pitches being “high” or “low”), time (how long pitches last and their arrangement into rhythmic groupings), timbre (the specific tone qualities of the sound), dynamics (how loud or quiet a note is), and form. Within these parameters, composers apply constraints or organizing principles to materials at the level of each parameter to create the foundation for a piece of music. For example, the sub-levels of pitch are domain-specific constraints such as melody, harmony, and counterpoint, while under time the sub-levels are rhythm, tempo, pulse, etc. Conventional choices for these parameter values define the familiar categories according to which musical sound is organized. Composers between the seventeenth and early twentieth centuries generally adhered to the tonal organizing system. In other words, Bach, Beethoven, Mozart and many others all created very different music, but still operated within a tightly constrained, shared set of parameter values that governed what constituted music in the eighteenth and nineteenth centuries. These properties are, of course, not only conventional, but arbitrary. Should we decide to do so, we could imagine others. We could, for example, distinguish musical sounds made by men from those made by women, or those made by

people named Ludwig from those made by people named Johann. For the most part, we do not do that, but only because those distinctions do not serve any practical purpose; that could change at any time to match whatever our priorities might happen to be. The conventional category of constraints is what we have classified as bound constraints in Table 1, meaning that composers operate within them and are bound to the communal determination of how they are organized. Schoenberg’s innovation was to recognize that these constraints are arbitrary, and that the constraints can be broken at will to produce new ones. Schoenberg’s realization demonstrates that composers can take an agnostic perspective that ignores the bound constraints, freeing them to arrange the musical resources any way they want.

Musical Resource | Bound Approach | Agnostic Approach
Pitch | Melody, Harmony, Counterpoint | Serial applications
Time | Rhythm, Tempo, Pulse | Rhythm, Tempo, Pulse
Timbre | Instrumentation | Instrumentation
Volume | Perceived loudness or quietness of sound | Perceived loudness or quietness of sound
Form | The structure of a piece of music based on repetition or development of musical material | The structure of a piece of music based on repetition or development of musical material

Table 1. Bound and agnostic constraints.

This presentation of the “bound” and “agnostic” constraints throws into stark relief the very interesting fact that Schoenberg’s innovation, as radical as it appears to be, is arguably less radical than one might imagine, for of the five parameters provided here, his serial method only applies to that of pitch. French composer Pierre Boulez, in a not-so-hagiographic article after Schoenberg’s death entitled “Schoenberg is Dead” (1968), criticized Schoenberg for failing to apply the serial method to the other musical parameters of duration, timbre, and dynamics. In particular, Boulez castigated Schoenberg’s continued reliance on traditional classical formal structures such as sonata form, which were strongly associated with tonality, to structure and organize his post-tonal serial music. Boulez’s condemnation of this regressive approach was sharp, as he asserted that the classical forms (272) “annihilate the possibilities of organization inherent in the new language” and create “maximum incoherence—a paroxysm in the absurdity of Schoenberg’s incompatibilities” (273).

The basic premise of Boulez’s criticism is that Schoenberg did not go far enough in breaking the constraints of the bound category in favour of agnostic constraints. Instead of exploiting new musical possibilities, Schoenberg was simply placing his new material in an old container, tainting his supposedly radical innovation with a regressive conservatism. And while Boulez writes using the idiom of music and art, we can draw some lessons from his criticisms by applying Glushko’s organizing system that sits at the heart of the IA perspective.

Selecting resource properties is the stage during which the composer decides what properties of resources are most important for their intended organization and use. For example, when we want to select a book that would be appropriate for a university course, its most important properties are “aboutness,” author, publication date, and other properties useful in assessing relevance. However, if we are in the business of warehousing and shipping books, it is more important to know a book’s size and weight. The selection stage is crucial, because one includes some properties while excluding others. This is precisely Boulez’s criticism of Schoenberg: of the five potential properties of music that Schoenberg could have selected for applying the serial method (pitch, time, timbre, dynamics, and form), Schoenberg limited his selection only to pitch and excluded the remaining four.

During the organizing stage, the selected resource properties and the range of values they might take define an organizing principle. In Schoenberg’s case, this stage is the creation of the serial method to act as a constraint on the pitch choices of a composition.

Designing the interactions with the organized resources is the stage at which Schoenberg specifies the possible ways to create a piece of music, and it is here that Boulez identified what he saw as Schoenberg’s greatest error. Boulez accused Schoenberg of failing to recognize the possibilities inherent in the revolutionary organization of the pitch materials, and of simply pasting these new resources into the design patterns and formal structures derived from tonal music, which created hybrid works that fulfilled the potential of neither the old nor the new idioms. The maintenance and expansion of Schoenberg’s serial method infused Boulez’s own music, as well as that of later generations of composers, which would embrace the idea of applying the serial method to all musical parameters to create what is sometimes referred to as total serialism.

Boulez’s powerful attack on Schoenberg’s approach to musical architecture provides us with a clear example of how the IA perspective can illuminate musical thought, as well as a way to use that perspective to talk about musical innovations in different musical practices at other historical periods. Figure 1 shows a summary of this brief historical example.

The tonal system has been the dominant organizing system in WAM since the early eighteenth century. It governs the musical resources relating to pitch and prescribes systematic treatments of dissonance, harmonic relationships, and chord progressions. While the tonal constraints of the early eighteenth century were very different from those of the late nineteenth century, the fundamental principles of tonality remain largely intact even today. Schoenberg began by loosening the constraints of tonality, but quickly discovered that an organizing system with no constraints at all is not conducive to creativity. He, therefore, created a rigorous organizing system he called serialism, in which all available notes within the octave are arranged into patterns and then subjected to various manipulations. Schoenberg’s serialism became another popular organizing system for composers in the twentieth century, as well as a point of origin for later composers to expand and adapt it. A summary of this organizing system according to Glushko’s structure for IA is as follows:

Selecting: Schoenberg selects pitch as the primary musical resource to organize.
Organizing: Schoenberg replaces tonal constraints with serial constraints.
Designing: Schoenberg uses the serial method to produce pieces for consumption and study, publishes writings explaining the method, and teaches the method to other composers.
Maintaining: Schoenberg adopts serialism as his primary method of composition, while other composers modify the approach to encompass additional resources.

Figure 1. Summary of the Information Architecture of Serialism.
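
The row manipulations named in the discussion of serialism (retrograde, inversion, retrograde inversion) can be illustrated with a short sketch. It is only a sketch under stated assumptions: the encoding of the twelve notes as integers 0 to 11, the choice to invert around the first note of the row, the function names, and the example row are all illustrative choices made here and are not drawn from the article or from Schoenberg's writings.

```python
from typing import List

# Assumed encoding: the twelve notes of the octave as integers 0..11 (0 = C).
Row = List[int]


def is_tone_row(row: Row) -> bool:
    """A row uses each of the twelve notes exactly once."""
    return sorted(row) == list(range(12))


def retrograde(row: Row) -> Row:
    """Play the row backwards."""
    return row[::-1]


def inversion(row: Row) -> Row:
    """Turn the row upside-down: mirror every note around the first note."""
    first = row[0]
    return [(2 * first - pitch) % 12 for pitch in row]


def retrograde_inversion(row: Row) -> Row:
    """Upside-down and backwards."""
    return retrograde(inversion(row))


# Hypothetical row chosen only for illustration, not a row from any real piece.
example_row: Row = [0, 2, 4, 6, 8, 10, 1, 3, 5, 7, 9, 11]

for name, form in [
    ("prime", example_row),
    ("retrograde", retrograde(example_row)),
    ("inversion", inversion(example_row)),
    ("retrograde inversion", retrograde_inversion(example_row)),
]:
    print(f"{name:>20}: {form}")
```

Each derived form is still a valid row, which is one way to see the method as a constraint: the operations rearrange the twelve notes without ever adding or omitting one.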

4.0 Sensemaking and tonality as IA

Sensemaking is the way in which humans organize resources in an attempt to impose meaning on the world in which they live. According to Glushko, we record, analyze, organize and reorganize resources and observations about those resources, both natural and artificial, as a way of navigating our way through the world and making sense of the resources it contains. Humans are quite good at sensemaking, and we are biologically hard-wired with this capacity to understand our environment by simplifying and categorizing our sensory inputs to avoid threats. More important than our biological sensemaking, however, is our intentional sensemaking, in which we organize the information in our environment and encode that information into architectures or knowledge structures that allow us both to impose meaning on those structures and facilitate our interaction with them. Weick et al. write (2005, 410) that the basis of sensemaking is organizing to “make sense of equivocal inputs and enact this sense back into the world to make that world more orderly.”

When we consider our definition of music as an organizing system of constrained sound materials, we see that music is an ancient and ongoing process of continuous sensemaking. Consequently, musicians are extraordinary sensemakers, as musical resources are constantly being organized, reorganized, and applied to suit various aesthetic preferences across time and geography. We will demonstrate this perspective of music with a by necessity very general description of a single development in the story of the tonal system in western art music in the seventeenth and eighteenth centuries. Tonality is the organizing system against which Schoenberg and Boulez reacted, so in a way this anecdote will act as a sort of prequel to the story told above.

In WAM, the lowest voice in the musical texture is the bass. The bass provides the fundamental foundation of the musical texture, as well as the names of the chords that are built on it. In the seventeenth century, the bass was considered so fundamental that musicians developed a system of musical notation in which only the bass voice was notated on a musical staff, while the remaining voices were sketched above the bass note using numbers. These numbers indicated the intervals or steps above the bass the musician needed to fill in with the upper voices. This practice, as well as the shorthand notation, was known as thoroughbass, and it reflected the primacy of the bass voice and the constraints it imposed on the remaining voices above.

In thoroughbass, each instance of a bass with a figuration above it was generally considered an independent entity. For example, a simultaneity consisting of the notes (ascending from the bass) C-E-G would be written as a C in notation and then the numbers 5/3 above, while a simultaneity consisting of the notes (again ascending from the bass) E-G-C would be written with an E in notation with the numbers 6/3 above (the numbers represent musical intervals, which can be calculated simply by counting letter names in accordance with the musical alphabet, which runs from A-G and then starts over again). C5/3 and E6/3 were considered separate entities, despite the fact that they consist of the same set of notes in different arrangements.

Thoroughbass is, therefore, an effective method for notating harmony for performers. Yet it is considerably more than that, for what appears to be a simple notational method for performance practice actually contains a complex and tacit framework of musical relationships, and it was incumbent on the performer to have a thorough understanding of these relationships in order to navigate thoroughbass in the correct way (Holtmeier 2007). Johann Friedrich Daube, in

1756, describing the qualifications required for performing thoroughbass, wrote (quoted in Holtmeier 2007, 8):

“Without a complete understanding of harmony it is impossible to play thoroughbass correctly.” According to Daube, this understanding includes: “(1) from whence most chords originate, (2) to where they may be connected, and (3) how, from the first chord, one can deduce subsequent ones.”

The architecture of knowledge the musician had to possess, and upon which their ability to understand thoroughbass relied, was, therefore, implicitly and tacitly understood to be already present. As Keiler has written (2013, 288), “it is clear enough what is meant, in a given treatise, but the limits or extent of the knowledge out of which practical rules are formulated—the mental origin of their practical formulation—is much harder to determine.” What thoroughbass practice, therefore, lacked, despite its rich tapestry of musical and compositional resources, was a rigorous and systematic approach to its theory and pedagogy. Since each simultaneity was considered a separate entity, there could be no single principle to systematize and govern the nearly infinite number of possible relationships between them in a way that could be effectively abstracted and theorized (Lester 2002). While the compositional possibilities in this approach were extensive, the pedagogy was complex, and instructional materials that attempted to summarize the thoroughbass approach for teaching were often catalogue-like treatises that dealt with classifying these simultaneities and governing their movement from one to the next. As just one example of these treatises, Der General-Bass in der Composition by Johann David Heinichen from 1728 is a treatise that lays out the author’s prescriptions for thoroughbass and weighs in at an impressive 960 pages.

In 1722, Jean-Philippe Rameau proposed a rigid logic for identifying harmonic simultaneities and imposing a systematic governance on their progression from one to the next. He did this by categorizing the vast array of independent simultaneities according to their similarities and common features. To use our example from above, Rameau considered the two simultaneities (ascending from the bass: C-E-G and E-G-C) to be two different figurations of the same harmonic entity. Since C-E-G consists entirely of thirds (C-E is a distance of a third and E-G is a distance of a third), and the third is considered a consonant interval, this arrangement of the notes was considered the “root” position, suggesting that it is the most stable and consonant arrangement of these notes, with the arrangement E-G-C being a variation of this entity. The note C is therefore the most important note in this set, as it is the note that gives the simultaneity its name and dictates its function, since we would now consider both C-E-G and E-G-C to be different figurations of “C” simultaneities. “C” is what Rameau referred to as the fundamental bass, which means that it is the most important note in each of the sets even if it does not sit in the actual bass voice. The concept of the fundamental bass constituted a principle that allowed Rameau to categorize the many separate harmonic simultaneities and assign them common identities based on their similarities. With a smaller number of categories of harmonic simultaneities, which we can now refer to as chords, Rameau was then able to analyze the way in which the entities within each larger category of chord behaved, draw general conclusions about their typical behavior, quantify that behavior by making explicit their general patterns, and subordinate those movements to a larger principle that governed them. This larger principle became the prescriptive grammar that constrained the behavior of chords to repeatable and coherent patterns, thereby creating the organizing system we today refer to as tonality, in which the relationships between consonance and dissonance are more rigidly subordinated to a prescriptive and iterative system. Figure 2 provides a summary of this brief historical example.

Weick, Sutcliffe and Obstfeld (2005) provide a model of the principles of sensemaking, some parts of which we will repurpose here to show how Rameau’s project represents an excellent example of how musical thought is a constant creation and revision of a musical information architecture.

4.1 Sensemaking organizes flux

Sensemaking begins with an undifferentiated stream of events that contains the potential for a seemingly infinite

In the seventeenth century, music for accompanying singers or other instrumentalists on the keyboard or lute was frequently written in a shorthand notation known as figured bass, in which a bass note appeared on the staff with numbers included to indicate the intervals of the notes that should be played above it. Thoroughbass, though perhaps having the appearance of being a simple notational convention, was an extremely complex practice that relied extensively on implicit understanding and tacit knowledge of musical conventions and logic, yet lacked iterative fundamental principles for organizing and communicating this understanding clearly and effectively. Rameau created a simpler and more accessible information architecture by imposing a unified theory of organization upon the complex practices of thoroughbass. Rameau’s concept of tonality subordinated disparate harmonic entities to fundamental principles of organization to make their relationships more explicit, observable, and iterative, thereby fundamentally altering the way in which subsequent generations of musicians perceived musical harmony.

Figure 2. Summary of the Information Architecture of Rameau.

cluster of potential actions, misdirection, mistakes, and successes. In the thoroughbass period, this stage is the potential information overload of the thoroughbass performance practice and treatises.

4.2 Sensemaking starts with noticing and bracketing

Events, actions and processes can be ongoing, but they have not necessarily been recognized as significant or part of a larger rule that would provide them some contextual logic. Rameau observed that frequently occurring patterns and tendencies towards certain idioms in thoroughbass suggested a tacit recognition of their adherence to a larger fundamental principle that had, at that point, remained unexplored and unarticulated.

4.3 Sensemaking is about labeling

Labeling is a way of suggesting plausible methods of managing information and executing actions based on that information. Labels gather up granular analysis of individual events and create larger and more frequently occurring patterns that can be memorized and recalled more easily. Labeling of musical resources was ubiquitous for the thoroughbass practitioners, and many of the thoroughbass treatises, such as the Heinichen mentioned above, consisted of heuristics for memorizing thoroughbass patterns and realizing them in performance. Rameau would expand this exercise in labeling by recognizing the implicit categories of the musical resources composers were already using.

4.4 Sensemaking is retrospective and social

Sensemaking uses data derived from experience to trace developments and interactions over time. Rameau derived his harmonic theory by applying deductive reasoning to existing practices, practices that had themselves been derived from earlier musical practices. As Lester observes (2002), many of the elements Rameau organized were recognized as significant by many other theorists across Europe even before Rameau was born, but they lacked a single deductive perspective to rein them into a unified system.

4.5 Sensemaking is about organizing through communication

Communication is how sensemaking becomes information architecture. By gathering unstructured information into an organizing system and publishing that organizing system in the Traité de l’harmonie in 1722, Rameau effectively communicated his sensemaking activity to others to facilitate their use of musical resources, fundamentally revolutionizing compositional pedagogy in the process.

By observing existing musical practices, applying deductive reasoning, and creating an organizing system that governed the application of musical resources, Rameau effectively engaged in sensemaking to create an information architecture and then facilitated wayfinding through that architecture by means of his theoretical pedagogy. As Lester writes (2002), Rameau’s sensemaking exercise was one of the most important revolutions in western musical thought, for it organized a vast array of disparate performance practices under a unifying principle derived from deductive reasoning. The fundamental bass approach to music analysis and composition has also had an enormous impact on everything from the pedagogical approach to WAM since the nineteenth century to that of jazz and popular music today.

Despite the theoretical revolution Rameau instigated, it is important to avoid the value judgement that would suggest that his tonal system is better than thoroughbass simply by virtue of having applied to it a systematic information architecture. Indeed, one could make the legitimate argument that Rameau’s project actually served to conceal and even eradicate a rich musical tradition by encouraging the perspective of thoroughbass as antiquated. As Lambe has written (2007), taxonomies, of which we must surely consider Rameau’s system to be an example, not only reveal information and make it visible, they also conceal and destroy by exclusion. For much of the twentieth century, thoroughbass was an unjustly neglected field of musical pedagogy, and it is only within the last few decades that the field of historically informed music theory has sought to rectify this by reconstituting the contextual implications of thoroughbass to reveal the insights into the historical repertoire that Rameau’s approach had obscured (Holtmeier 2007).

5.0 Conclusion

For Stravinsky, it was organization, not simply inspiration, that produced the catalyst for writing music, as when he wrote the following (1998, 51): “This appetite that is aroused in me at the mere thought of putting in order musical elements that have attracted my attention is not at all a fortuitous thing like inspiration, but as habitual and periodic, if not as constant, as a natural need.”

The impetus to composition was, therefore, the desire to organize musical materials in ways that caught his attention and were pleasing to him, in much the same way as composers in other musical practices seek to apply organizational methods that fulfill the expectations of specific styles and genres. In this paper, we have looked at musical inspiration as the fundamental principle of sound organized according to constraints so that we can examine more clearly and without recourse to advanced music theory how musicians make sense of their information environments. It is, therefore, our hope that this preliminary examination will inspire further work that will explore the application of the principles of information architecture to other musical genres and practices.

References

Abrahamsen, Knut Tore. 2003. “Indexing of Musical Genres: An Epistemological Perspective.” Knowledge Organization 30: 144-69.
Adcock, Lucy. 2001. “Building a Virtual Music Library: Towards a Convergence of Classification within Internet-based Catalogues.” Knowledge Organization 28: 66-74.
Arango, Jorge. 2011. “Architectures.” Journal of Information Architecture 3, no. 1. http://journalofia.org/volume3/issue1/04-arango/
Bawden, David and Lyn Robinson. 2009. “The Dark Side of Information: Overload, Anxiety, and Other Paradoxes and Pathologies.” Journal of Information Science 35, no. 2: 180-91. doi:10.1177/0165551508095781
Boulez, Pierre. 1968. “Schoenberg is Dead.” In Notes of an Apprenticeship, texts collected and presented by Paule Thévenin, trans. Herbert Weinstock. New York, A.A. Knopf, 268-76.
Fishman, Joseph P. 2015. “Creating Around Copyright.” Harvard Law Review 128: 334-404.
Glushko, Robert J., ed. 2016. The Discipline of Organizing, 4th ed. Sebastopol: O’Reilly Media.
Holtmeier, Ludwig. 2007. “Heinichen, Rameau, and the Italian Thoroughbass Tradition: Concepts of Tonality and Chord in the Rule of the Octave.” Journal of Music Theory 51, no. 1: 5-49.
Keiler, Allan. 2013. “The Problem of the Retrieval of Musical Knowledge: The Thoroughbass Tradition and Its Relationship to Rameau.” Journal of Music Theory 57, no. 2: 287-320. doi:10.1215/00222909-2323488
Lambe, Patrick. 2007. Organising Knowledge: Taxonomies, Knowledge and Organisational Effectiveness. Chandos Knowledge Management Series. Oxford: Chandos.
Lee, Deborah Theresa. 2017a. “Modelling Music: A Theoretical Approach to the Classification of Notated Western Art Music.” PhD diss., City, University of London. http://openaccess.city.ac.uk/17445/1/Lee%2C%20Deborah.pdf
Lee, Deborah. 2017b. “Numbers, Instruments and Hands: The Impact of Faceted Analytical Theory on Classifying Music Ensembles.” Knowledge Organization 44: 405-15.
Lee, Deborah, Lyn Robinson, and David Bawden. 2019. “Modeling the Relationship Between Scientific and Bibliographic Classification for Music.” Journal of the Association for Information Science and Technology 70: 230-41. doi:10.1002/asi.24120
Lester, Joel. 1992. Compositional Theory in the Eighteenth Century. Cambridge, MA: Harvard University Press.
Lester, Joel. 2002. “Rameau and Eighteenth-Century Harmonic Theory.” In Cambridge History of Music Theory, ed. Thomas Christensen. Cambridge: Cambridge University Press, 753-57. doi:10.1017/CHOL9780521623711.026
Pearce, Marcus and Martin Rohrmeier. 2012. “Music Cognition and Cognitive Sciences.” Topics in Cognitive Science 4: 468-84. doi:10.1111/j.1756-8765.2012.01226.x
Salimpoor, Valorie N., David H. Hald, Robert J. Zatorre, Alain Dagher, and Anthony Randal McIntosh. “Predictions and the Brain: How Musical Sounds Become Rewarding.” Trends in Cognitive Science 19, no. 2: 86-87.
Schoenberg, Arnold. 1975. Style and Idea: Selected Writings of Arnold Schoenberg, ed. Leonard Stein, trans. Leo Black. London: Faber.
Schwartz, Barry. 2004. The Paradox of Choice: Why More is Less. New York: HarperCollins.
Smiraglia, Richard P. 2002. “Musical Works and Information Retrieval.” Notes: The Quarterly Journal of the Music Library Association 58: 747-64.
Smiraglia, Richard P. 2017. Describing Music Materials: A Manual for Resource Description of Printed and Recorded Music and Music Videos. 4th ed. Lanham, MD: Rowman & Littlefield.
Smiraglia, Richard P. and Ann M. Graf. 2017. “From Music Cataloging to the Organization of Knowledge: An Interview with Richard P. Smiraglia.” Cataloging & Classification Quarterly 55, no. 5: 269-88. doi:10.1080/01639374.2017.1312653
Stokes, Patricia D. 2013. “Crossing Disciplines: A Constraint-Based Model of the Creative/Innovative Process.” Journal of Productivity and Innovative Management 31, no. 2: 247-58. doi:10.1111/jpim.12093
Stravinsky, Igor. 1998. Poetics of Music in the Form of Six Lessons, trans. Arthur Knodel and Ingolf Dahl. Cambridge, MA: Harvard University Press.
Thomas, David H. and Richard P. Smiraglia. 1998. “Beyond the Score:” Notes: The Quarterly Journal of the Music Library Association 54: 649-66.
Weick, Karl E., Kathleen M. Sutcliffe, and David Obstfeld. 2005. “Organizing and the Process of Sensemaking.” Organization Science 16: 409-21. doi:10.1287/orsc.1050.0133
Weissenberger, Lynnsey K. 2015. “Traditional Musics and Ethical Considerations of Knowledge and Documentation Processes.” Knowledge Organization 42: 290-5.
Wu, Dan and Jinsong Shi. 2016. “Classical Music Recording Ontology Used in a Library Catalog.” Knowledge Organization 43: 416-30.

Knowl. Org. 46(2019)No.3 171 Lielei Chen and Hui Fang. An Automatic Method for Extracting Innovative Ideas Based on the Scopus® Database

An Automatic Method for Extracting Innovative Ideas Based on the Scopus® Database Lielei Chen*, Hui Fang** Nanjing University, School of Electronic Science and Engineering, Nanjing 210023, China, *<[email protected]>, ** (corresponding author)

Lielei Chen received a bachelor’s degree in electronic and information engineering from Hohai University in Nanjing, China, in 2016. She is now a graduate student at the School of Electronic Science and Engineering, Nanjing University. Her research interests include natural language processing and information science.

Hui Fang received a bachelor’s degree in radio engineering (in 1990) and a master’s degree in signal processing (in 1993) from Southeast University in Nanjing, China, and a PhD in electroanalytical chemistry from Nanjing University in Nanjing, China, in 1998. He is now an associate professor at the School of Electronic Science and Engineering, Nanjing University and is affiliated with the State Key Laboratory of Analytical Chemistry for Life Science. His research interests include information processing, data mining, artificial intelligence, instruments and instrumentation, and bibliometrics.

Chen, Lielei and Hui Fang. 2019. “An Automatic Method for Extracting Innovative Ideas Based on the Scopus® Database.” Knowledge Organization 46(3): 171-186. 74 references. DOI:10.5771/0943-7444-2019-3-171.

Abstract: The novelty of knowledge claims in a research paper can be considered an evaluation criterion for papers to supplement citations. To provide a foundation for research evaluation from the perspective of innovativeness, we propose an automatic approach for extracting innovative ideas from the abstracts of technology and engineering papers. The approach extracts N-grams as candidates based on part-of-speech tagging and determines whether they are novel by checking the Scopus® database to determine whether they had ever been presented previously. Moreover, we discussed the distributions of innovative ideas in different abstract structures. To improve the performance by excluding noisy N-grams, a list of stop-words and a list of research description characteristics were developed. We selected abstracts of articles published from 2011 to 2017 with the topic of semantic analysis as the experimental texts. Excluding noisy N-grams, considering the distribution of innovative ideas in abstracts, and suitably combining N-grams can effectively improve the performance of automatic innovative idea extraction. Unlike co-word and co-citation analysis, innovative-idea extraction aims to identify the differences in a paper from all previously published papers.

Received: 26 October 2018; Revised: 20 February 2019; Accepted: 1 March 2019

Keywords: innovative ideas, research, corpus, Scopus®
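
The general shape of the approach summarized in the abstract (generate candidate N-grams, discard noisy ones, keep those not found in earlier literature) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it omits part-of-speech tagging, uses a tiny hypothetical stop-word list, and replaces the Scopus® lookup with a caller-supplied `seen_before` function backed here by an in-memory set. All names in the block are hypothetical.

```python
import re
from typing import Callable, List, Sequence, Set

# Hypothetical, deliberately tiny stop-word list (the paper develops its own lists).
STOP_WORDS: Set[str] = {
    "a", "an", "the", "of", "for", "to", "in", "on", "and", "we", "is", "are",
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens (hyphenated words stay whole)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def candidate_ngrams(tokens: Sequence[str], n_values: Sequence[int] = (2, 3)) -> List[str]:
    """Return N-grams that neither start nor end with a stop-word."""
    candidates: List[str] = []
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            candidates.append(" ".join(gram))
    return candidates


def novel_ngrams(
    abstract: str,
    seen_before: Callable[[str], bool],
    n_values: Sequence[int] = (2, 3),
) -> List[str]:
    """Keep candidates that the lookup reports as never seen before (order preserved)."""
    novel: List[str] = []
    for gram in candidate_ngrams(tokenize(abstract), n_values):
        if not seen_before(gram) and gram not in novel:
            novel.append(gram)
    return novel


# Stand-in for a database query: phrases assumed to exist in earlier publications.
prior_phrases = {"semantic analysis", "latent semantic analysis"}
result = novel_ngrams(
    "We propose quantum semantic analysis",
    seen_before=lambda phrase: phrase in prior_phrases,
    n_values=(2,),
)
print(result)
```

In this toy run the phrase "propose quantum" survives alongside "quantum semantic", which shows why additional filtering of noisy N-grams, such as the stop-word and research-description lists mentioned in the abstract, matters in practice.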

1.0 Introduction

Research evaluation is important for employing researchers, making grant decisions, and determining researcher promotions (Kosten 2016). Currently, one widely recognized method is citation analysis. Articles with high numbers of citations reflect their contribution to a certain extent. However, the use of citation-based indicators to evaluate research is the subject of much debate (Wu 2015). Articles with few citations may also be valuable despite currently having little impact (Garfield 1972). Garfield (1979) questioned the rationality of assessing the quality of publications based on only the number of citations. A research evaluation based solely on citations is not objective (Fiala et al. 2017).

Innovation is considered to be the soul of science; it promotes scientific research (Xu 2001), and the pursuit of innovation is closely related to social and scientific development. Innovation embodies the creation, evolution, exchange, and application of new ideas for the advancement of society (Rogers 1993). Therefore, innovativeness can reflect the contribution of individual scientific publications. To assess the innovativeness of a research paper, we should first extract its innovative ideas. In the academic stratification system, peer review plays a central role in the evaluation of academic work (Cole et al. 1974), and novelty is a major and frequently used criterion (Guetzkow et al. 2004). However, automatically evaluating the innovativeness of research remains difficult.

Innovation is considered one evaluation criterion of scientific papers. Methods for identifying the original and innovative works of research efficiently and accurately have been researched in recent years (Wieringa et al. 2006).

Existing innovation-idea identification methods for individual papers are based on context features extracted or learned from manually judged documents. However, these methods can extract ideas that a paper's authors believe to be innovative but that have, in reality, been proposed previously. To avoid this situation, we present an innovative-idea extraction method that checks a widely used document database to determine whether the ideas extracted from a paper are innovative.

In this study, we performed an analysis based on a series of aspects concerning innovative ideas, and as a result, we propose an automatic method for innovative-idea extraction that checks the innovativeness of the ideas extracted. We combined N-gram extraction and web search techniques to extract innovative ideas from papers without requiring any domain corpus assembled by experts. We selected abstracts of articles published from 2011 to 2017 with the topic of “semantic analysis” as our experimental texts and considered ideas that had not been proposed prior to an article’s publication year as innovative. Through experiments, we investigated factors that could improve the performance of this method and analysed the causes of its defects, thereby suggesting how the method can be further improved in future work. The proposed method provides a foundation technique for future evaluation of innovation in research papers. This work is an application of knowledge organization research.

2.0 Related work

2.1 Types of innovative ideas

To support paper evaluation criteria in the requirements engineering field, Wieringa and others (2006) classified research papers into six classes: “evaluation research,” “proposal of solution,” “validation research,” “philosophical research,” “opinion papers,” and “personal experience papers.” Among these classes, “proposal of solution” and “philosophical research” papers generally contain novel and original technologies or concepts. Frame (2008) divided technology innovations into the following three types: “derivative” (an extension of existing technology), “platform” (a new application of existing technology), and “breakthrough” (an entirely new technology). Mullins, Snizek, and Oehler (1988) proposed an analysis of innovation evaluation based on the structure of scientific papers (i.e., introduction, methods, and results). Based on that research, Dirk (1996) described a research work as a combination of established (E) or new (N) elements of “theory-methods-results” and suggested eight types to evaluate the novelty of original work, ranging from E-E-E (established theory, established methods, established results) to N-N-N (new theory, new methods, new results). This typological assessment was recommended for performing peer reviews of innovation (Dirk 1999).

2.2 Structure of abstracts

Mullins and others (1988) proposed an analysis of innovation evaluation based on the structure of scientific papers, including the introduction, methods, and results.

Abstract writing guidelines have been studied to improve the quality and consistency of abstracts. Milas-Bracović and Zajec (1989) suggested using the IMRAD format for an abstract, that is, introduction (I), methods (M), results (R), and discussion (D). Endres-Niggemeyer (1998) defined five moves that constitute the abstract of research articles: background (B), purpose (P), methodology (M), result (R), and discussion (D). By investigating abstracts from a variety of journals, researchers identified the most frequent abstract elements. Hartley and Betts (2009) showed that most paper abstracts in the social sciences included the goals, methods, results, and conclusions. Jamar, Šauperl and Bawden (2014) demonstrated that the most common combination of structural elements in the abstracts of technical sciences papers is moves B-M-R. Cross and Oppenheim (2006) found that moves M and R were present in all experimental abstracts.

By using this five-move framework, Kanoksilapatham (2013) provided a linguistic characterization of information presented in abstracts. The study indicated that move B functions by preparing the topic focus for readers and by highlighting the importance of topics using words or phrases such as “challenging,” “increasingly important,” and “improve” or by introducing the current development of the topic with present tense verbs such as “are,” “is,” “can,” and “exhibit.” Move P usually follows move B and is explicitly stated. The phrase “this study” is commonly found in this move, and the present tense, in active or passive voice, is preferred. Move M is typically expressed using research activity verbs such as “were conducted,” “was tested,” “estimated,” and “included” when the subject of the research is an experiment or algorithm. To express move R, the verbs “show” and “find” in either present or past tense are usually used. In move D, phrases such as “were attributed,” “is estimated,” and “should consider” are used to discuss the implications, significance, interpretations, and explanations of the results and findings.

2.3 Terminology extraction and the Stanford Parser

Terminology extraction methods mainly include linguistic (Chen and Cshen 1994; Justeson and Katz 1995), statistical (Church and Hanks 1990; Smadja 1993), and hybrid approaches (Bounhas and Slimani 2009; Maynard and Ananjadou 1999; Oliver and Vàzquez 2015). Linguistic approaches utilize the linguistic features of sentences such as parts of speech and structure to identify terms. Statistical approaches consider statistical indicators such as term frequency, mutual information variants, co-occurrence, TFIDF (term frequency–inverse document frequency), and other methods to measure the association of terms. Recent terminology extraction methods combine linguistic and statistical approaches into a hybrid method to achieve better performance. Hybrid measures first utilize linguistic analysis to extract all candidates and then apply statistical analysis for further selection.

Software including Word Segmenter, EnglishTokenizer, and Parser, developed by the Stanford Natural Language Processing Group, has been widely used in natural language processing (NLP) research (https://nlp.stanford.edu/software/). Here, we used the Stanford Part-Of-Speech Tagger (POS Tagger) (Toutanova et al. 2003) as a part-of-speech annotation tool to analyse sentences. POS Tagger assigns a part of speech to each word using tags such as noun (NN), verb (VB), and adjective (JJ). Table 1 shows the thirty-six POS tags used by Stanford POS Tagger (Taylor et al. 2003).

2.4 Innovative idea extraction

Innovation is an unusual recombination of prior knowledge (Nelson and Winter 1982; Basalla 1988; Weitzman 1996; Fleming 2001). The identification of innovative ideas in the scientific literature can be classified into those at sentence-level and at phrase-level based on the extraction unit (Wen, Xu, Lai and Wen 2005; Leng et al. 2013). There are two primary categories of extraction technology: feature-based methods (Wen, Wen, Xu and Pan 2005) and machine learning methods (Freitag 1998).

Feature-based methods use the linguistic features of the sentences in which the innovative ideas are located to extract the candidates. Dahl (2008) constructed a list of linguistic features that potentially indicate new research contributions to identify knowledge claims automatically. Wen, Wen, Xu and Pan (2005) established the information relationship between innovation and knowledge claims; thus, a feature-based knowledge spectrum was aggregated by related sentences to extract innovative ideas. However, one limitation of this method is that the feature-based rules are constructed manually by linguistics experts. In addition, the selected features and the rule formulations cannot cover all the linguistic phenomena of the target text.

Machine learning methods primarily create rules by learning from a pre-annotated corpus. Domain experts initially annotate the corpus using certain specifications; subsequently, a system trained on that corpus handles new texts automatically (Leng et al. 2013). Soderland (1997) proposed a knowledge extraction system that used covering algorithms and assembled a set of text analysis rules.

Tag    Part of speech                    Tag    Part of speech
CC     Coordinating conjunction          PRP$   Possessive pronoun
CD     Cardinal number                   RB     Adverb
DT     Determiner                        RBR    Adverb, comparative
EX     Existential there                 RBS    Adverb, superlative
FW     Foreign word                      RP     Particle
IN     Preposition                       SYM    Symbol
JJ     Adjective                         TO     Infinitival to
JJR    Adjective, comparative            UH     Interjection
JJS    Adjective, superlative            VB     Verb, base form
LS     List item marker                  VBD    Verb, past tense
MD     Modal                             VBG    Verb, gerund/present participle
NN     Noun, singular or mass            VBN    Verb, past participle
NNS    Noun, plural                      VBZ    Verb, 3rd ps. sg. present
NNP    Proper noun, singular             VBP    Verb, non-3rd ps. sg. present
NNPS   Proper noun, plural               WDT    Wh-determiner
PDT    Predeterminer                     WP     Wh-pronoun
POS    Possessive ending                 WP$    Possessive wh-pronoun
PRP    Personal pronoun                  WRB    Wh-adverb

Table 1. Part-of-speech tag set used in the Stanford POS Tagger.
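As a rough illustration of how tags such as those in Table 1 can be used to pick out noun phrases, the sketch below scans a list of (word, tag) pairs and collects maximal runs of adjectives followed by nouns. It is not the authors' implementation: the function name `noun_phrases` is hypothetical, the tags in the example were assigned by hand rather than produced by the Stanford POS Tagger, and the handling of the conjunction “and” described in Section 4.1 is left out.

```python
# Tags from Table 1 that this sketch treats as adjectives and nouns.
ADJ_TAGS = {"JJ", "JJR", "JJS"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def noun_phrases(tagged):
    """Return maximal 'adjective* noun+' runs from a list of (word, tag) pairs."""
    phrases, current, has_noun = [], [], False
    # A sentinel token with a non-noun tag flushes the final run.
    for word, tag in list(tagged) + [("", "END")]:
        if tag in NOUN_TAGS:
            current.append(word)
            has_noun = True
        elif tag in ADJ_TAGS and not has_noun:
            current.append(word)
        else:
            if has_noun:
                phrases.append(" ".join(current))
            # An adjective that follows a noun starts a new candidate run.
            current = [word] if tag in ADJ_TAGS else []
            has_noun = False
    return phrases


# Hand-tagged example sentence (tags are an assumption, not tagger output).
sentence = [
    ("This", "DT"), ("paper", "NN"), ("proposed", "VBD"), ("a", "DT"),
    ("new", "JJ"), ("search", "NN"), ("engine", "NN"), ("system", "NN"),
    ("based", "VBN"), ("on", "IN"), ("ontology", "NN"), ("of", "IN"),
    ("technological", "JJ"), ("resources", "NNS"), (".", "."),
]
print(noun_phrases(sentence))
```

Adjectives are only attached when a noun follows them, so a stray adjective before a verb does not produce a candidate.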

Huang and others (2012) transformed the problem of innovation extraction into a classification problem. Classification features such as word frequency, sentence length, and verb characteristics were selected to train the classifier. A machine learning method is faster than a manual method but requires sufficient training data. In addition, these supervised learning methods require manual training set annotation, and the system’s performance is affected by the marked corpus.

These methods extract potentially innovative ideas from paper abstracts but do not validate whether the extracted ideas are truly innovative. For example, the authors of a paper might deem their work to be innovative even though such work had been previously proposed, because the authors had not read the related previous paper. Thus, they describe the work as innovative using the linguistic features usually used to introduce innovative ideas. No matter how well the existing innovation extraction methods perform, they will identify such work as innovative when it is not. To ensure that the ideas extracted are innovative, we propose an automatic approach that combines N-gram extraction and web search techniques to extract and identify the innovative ideas of scientific papers. We used the Scopus® database as the official corpus to identify the extracted ideas. Our main contribution is to provide the foundation for a scientific paper content analysis and evaluation system.

3.0 Innovative ideas in research papers

The innovative ideas in research papers considered here are the ideas that have not appeared in previously published papers. Authors of research papers need to demonstrate that their works are rational. Therefore, they prefer to use terminology that is known. Otherwise, it is difficult for researchers to use academic papers to communicate. Authors also need to express the differences between their work and existing studies, as is the case for the examples given in the last paragraph of this section.

Here, innovative ideas are extracted according to their novelty, and they range from very small new ideas to major innovations. This pilot work aims to establish a convenient and reliable method to extract innovative ideas from research papers, and thus, the developed method provides innovative ideas for future works to further grade them.

The present work extracts the innovative ideas of a paper at the word level. Some were expressed by a single phrase, such as probabilistic latent semantic analysis (PLSA) in the paper by Hofmann (1999). Some of the other innovative ideas were expressed as a combination of two or more phrases. These phrases may be technologies presented before the paper was published or may be applications of existing technologies, indicating that many research studies were built upon previous endeavours. For example, Li and others (2016) first combined the Biterm topic model and K-means clustering algorithm when they sought to discover topics from blogs. In another example, Recchia and Louwerse (2016) applied cognitive science approaches to Indus to estimate the provenance of artefacts with unknown origins (the geographic origin of artefacts from the Indus Valley Civilization), an application of this technique that had not been previously proposed.

4.0 Data and methodology

Our innovative-idea extraction method was limited to the abstracts of each scientific paper; we did not analyse the full texts for the following reasons: 1) an abstract can represent the important content of the paper (Salager-Meyer 1990), and a well-written abstract can be considered key to understanding the original argument (Swales 1990). Therefore, the abstract can be employed as a summary of the main work of the whole research; 2) because abstracts are much shorter in length than the full text, judging innovative ideas from the abstract corpus is an efficient approach; 3) an English-language abstract can help overcome language barriers (Cross and Oppenheim 2006; Small et al. 2014). Many articles written in other languages also provide an English-language abstract containing the central themes to widen readers’ access to research; and, 4) access to the full texts of papers is often restricted for some journals; however, the abstracts of papers are always freely obtainable if the institution subscribes to a document database such as Scopus® that indexes the journals in which the papers appeared. Therefore, using abstracts broadened the scope of our investigation.

In this pilot work, we limited our investigation to technology and engineering papers, because the abstracts of theoretical research papers often include analysis that interferes with automatically extracting innovative ideas; thus, automatically extracting innovative ideas from abstracts of technology and engineering papers is both more feasible and simpler. Automatically extracting innovative ideas from theoretical research papers will be attempted in future studies. Specifically, we used semantic analysis papers to test our method, because it is not difficult for us to understand the content of papers in this area.

Papers on semantic analysis (excluding theoretical research papers) were used to exemplify the presented automatic innovative-idea extraction method. We downloaded 1,663 abstracts from Scopus®, limited to those whose title, abstract, or keywords contained “semantic analysis,” whose publication year was from 2011 to 2017, and whose document type was article or article in press. Our research objective addresses engineering papers. Therefore, we identified 1,014 articles that do have certain engineering innovation and excluded reviews, questionnaire analyses, comparisons of existing methods, and specific technology evaluations.

Ideas within a paper include what the authors intended to do and how they did it. As most papers published currently report positive results, ideas within a paper comprise not only what its authors intended but also what they achieved, including the purpose, technology, application, etc. The knowledge organization theory (Smiraglia and van den Heuvel 2013) shows that works are made up of ideas and that ideas are made up of concepts, which can be expressed by words. The expressions and applications of concepts have been extensively researched. Interested readers can refer to reviews (e.g., Dahlberg 2006; Hjørland 2017; Kleineberg 2017; Arboit 2018; Mazzocchi 2018) and the references therein. Therefore, our work automatically extracted innovative ideas at the word-level, that is, we extracted the N-grams that reflect the main content of the article and then judged whether the work is innovative. An N-gram is a contiguous sequence of N items; here, an N-gram is defined as a noun phrase, because the main concepts of sentences are carried primarily by noun phrases (Kamp 2008). N refers to the number of words the noun phrases contain; it is variable and determined by the extraction results of the Stanford POS Tagger.

We evaluated the performance of the automatic innovative-idea extraction method by comparing its results with manual judgements. The two authors of this work read the abstracts, provided their judgements on the research innovations in the corresponding papers, and retrieved the ideas using Scopus® to determine whether the ideas had emerged previously. We eliminated any disagreements by discussion to construct a final standard for assessing the automatic innovative-idea extraction method developed below. This tedious work was time consuming and limited the number of abstracts that could be used in the experiment.

We define φ(a) as the set of innovative ideas extracted by the automatic approach and φ(b) as the standard set based on the manual judgements used for comparison. Here, φ(a) contains three subsets: φ(a1) is the subset of innovative ideas that are completely included in the standard, i.e., φ(a1) = φ(a) ∩ φ(b); φ(a2) contains synonyms or different expressions of the elements in φ(b), and φ(a3) consists of noise candidates. The metrics used to evaluate the performance of the method are recall and precision. Recall is the proportion of the manually judged innovative ideas extracted by the automatic method, which can be notated as Recall = |φ(a) ∩ φ(b)|/|φ(b)|. Precision is the proportion of the automatically extracted innovative ideas that match the manually judged standard or their synonyms; in other words, Precision = |φ(a1) ∪ φ(a2)|/|φ(a)|.

Figure 1 shows that the automatic innovative-idea extraction method consists of the following four steps:

[Figure 1: flowchart. Input (abstract of research paper) → extracting N-grams using POS Tagger → N-grams of the abstract → excluding noise based on a stop-word corpus → filtered N-grams → combining N-grams for further potential innovative-idea candidates → filtered N-grams and their combinations → finding innovative ideas by checking the candidates using the Scopus® database → output (innovative ideas of the input paper).]

Figure 1. Flowchart of the automatic innovative-idea extracting method.

1) Extracting N-grams from each abstract using POS Tagger, provided by the Stanford Natural Language Processing Group;
2) Excluding noise based on a stop-word corpus, extended from the work of Liu and others (2015), and excluding N-grams not included in move P or move M;
3) Combining two N-grams if necessary;
4) Checking the extracted N-gram candidates using the Scopus® database to determine whether they represent innovative ideas.

More details of these four steps are explained below.

4.1 N-gram extraction

In the first step, we used the Stanford POS Tagger for POS tagging and extracted all the N-grams, excluding symbols, markers, numbers, special characters, and tokens tagged as, for example, personal pronouns (PRP) or determiners (DT). For the reasons explained above, the N-grams here were confined to noun phrases; thus, the extracted N-grams should take one of the following forms:

– A sequence of nouns (e.g., “collection,” “text classification,” “information retrieval technology”)
– Noun-grams following one or several adjectives (e.g., “conceptual representation,” “salient semantic analysis”)
– “Noun-grams” with “adjective-grams” and the conjunction “and.” This type of N-gram should be divided into two N-grams, because the conjunctions can be noise when retrieved from the database. There are two dividing situations:
  a) N-grams expressed as “adjective-gram(s) noun-gram(s)1 and noun-gram(s)2” should be divided into “adjective-gram(s) noun-gram(s)1” and “adjective-gram(s) noun-gram(s)2.” For example, the N-gram “characteristic extraction and detection” should be divided into “characteristic extraction” and “characteristic detection.”
  b) N-grams expressed as “adjective-gram(s)1 and adjective-gram(s)2 noun-gram(s)” should be divided into “adjective-gram(s)1 noun-gram(s)” and “adjective-gram(s)2 noun-gram(s).” For example, the N-gram “geo-tagged and time-tagged data” should be divided into “geo-tagged data” and “time-tagged data.”

4.2 Noise exclusion

The previous step obtained an N-gram set from each abstract. However, this set includes some descriptive adjectives and words that do not carry pertinent information and that are used only for writing and thus are not ideas. For example, the N-grams “ubiquitous network text,” “proposed machine learning algorithm,” and “well-known retrieval algorithm” contain adjectives such as “ubiquitous,” “proposed” and “well-known” that should be removed when they are the first words of N-grams, because they do not express an innovative idea. For the same reason, certain nouns such as “sample,” “method,” and “approach” should be removed when they are the last term of an N-gram. In addition, certain low-information writing words that can be used in papers in many domains should also be removed. Examples are “algorithm,” “framework,” and “importance,” which contain low or no specific concept information when used alone for writing purposes.

Liu and others (2015) assembled a set of noun-phrase filtering terms for the same purpose. Here, we extended their set and assembled two sets of stop-words: descriptive adjectives and terms used for writing purposes. Table A1 shows the descriptive adjective stop-words. When one such adjective is the first term of an N-gram, we remove it and retain the remaining words in the N-gram for subsequent steps. Table A2 shows two types of writing phrase stop-words.

In addition, some concepts are used in abstracts for enumeration following the phrase “such as.” When a sentence has an “A such as B, C and D” structure, the concepts B, C and D are generally attached to A. The focus of this sentence is concept A. If B, C and D are important, there should be other sentences describing them. Therefore, we can ignore B, C and D if they are not described elsewhere. Texts also contain concepts used for comparison following characterizations such as “other,” “different from,” “unlike,” and “in contrast to.” The concepts listed after these characterizations are not the main idea of the abstract and thus should also be removed. For example, in the sentence, “This paper introduces the construction of the Semantic Lexicon of Dermatology by using the theory and technology of Natural Language Processing (NLP) which can provide the database, such as automatic semantic analysis, word sense disambiguation, for NLP” (Zhou et al. 2016), the concepts following “such as” consist of an enumeration of NLP technology, which is not the focus of the sentence. In the sentence, “Unlike some traditional forecasting model based on several movie-related features, this paper comprehensively utilizes the real-time social media, microblog, to realize a more accurate weekly box office forecasting model” (Chen et al. 2016), the concepts following “Unlike” are existing technology used for comparison purposes and should be removed.

In addition, in the experiment, we found that the ideas are mainly distributed in the move P and move M portions of an abstract. The phrases listed in Table 2 are the characterizations that mark the sentence as the beginning of a research description and appear after move B, while the phrases listed in Table 3 are the characterizations that mark the sentence as the beginning of move R or the end of the research description. The proposed method excludes N-grams that are not included in move P and move M.

are designed; are developed; are presented; are proposed; are shown; design/methodology/approach; is designed; is developed; is presented; is proposed; is shown; materials and methods; methods/methods; our; the article; the article here; the paper; the present; the study; this article; this context; this contribution; this letter; this paper; this present; this publication; this research; this study; this work; was designed; was developed; was presented; was proposed; was shown; we; were designed; were developed; were presented; were proposed; were shown

Table 2. Characterizations in sentences that indicate the beginning of a research description.

analysis revealed that; as result; are demonstrated; comparative experiments; comparison experiments; conclusion; conclusions; contrast experiment; evaluation experiments; evaluation show; experimental data shows; experimental results; experimental study; experiments demonstrate; experiments on; experiments reveal; experiments show; experiments were performed; final conclusion; findings -; findings; findings indicate; for evaluation; for evaluation; in experiment; in experiments; in sum; is evaluated; is demonstrated; our experiment; our result; perform experiments; practical experiment; promising result; result; result achieved; result indicates that; result proves; results:; results are compared to; results demonstrate; results provide; results show; results show that; experimentation showed that that; results suggest; shows comparable performance; simulated experiment; the experiment; to illustrate; test showed that; was tested; we demonstrate; was evaluated; we evaluate; we perform; when compared to

Table 3. Characterizations in sentences that indicate the end of a research description.

4.3 N-gram combination

Considering that some innovative ideas are expressed as a combination of N-grams, we applied a rule-based approach to combine certain N-grams from the filtered N-gram set. Appertaining means that purpose and meaning occur together in one sentence (Thorleuchter 2008); thus, such innovative ideas might be represented by a combination of certain concepts that occur together in the same sentence. Therefore, we combine two filtered N-grams that are not new methods or concepts but are adjacent in a sentence that contains “of,” “to,” “with,” “by,” “for,” or a characterization word such as “based,” “utilize,” “apply,” “combine,” “and,” or “conducted” (Liu et al. 2015) and use the combination as a new N-gram in the following step.

We created this combination rule for two reasons. First, the association between elements in a sentence is stronger than associations in different sentences (Thorleuchter 2008), and within a sentence, the association is stronger between adjacent elements than between non-adjacent ones. Second, combination rules should be highly efficient, because there is a usage limitation per week for one Scopus® API key (see the next sub-section). Suppose one sentence contains M N-grams that must be combined. There would be M × (M – 1)/2 combinations between any two N-grams but only M – 1 combinations based on our rule (combining only adjacent N-grams). The number of combined N-grams of the former is M/2 times that of the latter.

4.4 Innovation judgement of N-grams

From the aforementioned steps, we now have a set of N-grams with filtered individual and combined concepts that must be classified into innovative ideas or existing concepts. The criterion used to judge these ideas as innovative ideas is that the idea should not have been previously proposed. We used all the abstracts collected by the Scopus® database as a corpus and retrieved the N-gram candidates from the Scopus® database automatically using the Scopus® application programming interface (API) to determine whether they were innovative ideas.

The combination of terms in Section 4.3 was realized in this step. For example, to check whether the combination of the Biterm topic model and K-means clustering algorithm in the study by Li and others (2016) is an innovative idea, we conducted a search of the two terms with the “and” operation in Scopus®.
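The adjacent-pair rule of Section 4.3 and the “and” search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names are hypothetical, the connecting-word condition of Section 4.3 is omitted, and restricting the search to abstracts with the `ABS(...)` field code is an assumed choice of query form. No request is sent; the sketch only builds the query string.

```python
from itertools import combinations


def adjacent_pairs(ngrams):
    """Pair each N-gram with its right-hand neighbour (M - 1 pairs)."""
    return list(zip(ngrams, ngrams[1:]))


def all_pairs(ngrams):
    """Every unordered pair of N-grams (M * (M - 1) / 2 pairs)."""
    return list(combinations(ngrams, 2))


def and_query(term_a, term_b):
    """Build a query string that requires both terms to occur in an abstract."""
    return f'ABS("{term_a}") AND ABS("{term_b}")'


# Filtered N-grams of one sentence, in order of appearance
# (taken from the example abstract in Section 4.5).
ngrams = ["search engine system", "ontology", "technological resources"]

m = len(ngrams)
print(len(adjacent_pairs(ngrams)), "adjacent pairs vs", len(all_pairs(ngrams)), "unrestricted pairs")
print("ratio:", len(all_pairs(ngrams)) / len(adjacent_pairs(ngrams)), "= M/2 =", m / 2)
print(and_query("Biterm topic model", "K-means clustering"))
```

Restricting combinations to neighbours keeps the number of queries linear in M, which matters when each candidate costs one request against a weekly quota.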

Scopus® APIs allows researchers to integrate content paper, new search engine system, ontology, technological and data from the Scopus® database into their own websites resources, system, database, ontology knowledge ware- and applications. Curated abstracts and citation data of all house, related conceptions, relationships, technological do- scholarly journals indexed by Scopus®, Elsevier’s abstract mains, semantic analysis, queries, heuristic search, techno- and citation database, can be retrieved using Scopus® APIs logical resources, intentions. (https://dev.elsevier.com/sc_apis.html/). There is an API After excluding noise and the N-grams not in move P or key for each API, and there is a usage quota enabled for each M, we obtained the following N-gram candidates: search API key per week (https://dev.elsevier.com/api_key_set- engine system, ontology, technological resources, ontology tings.html/). The quota for our abstract retrieval here is knowledge warehouse, technological domains, semantic 10,000 per week (i.e., we can send up to 10,000 retrieval re- analysis, queries, heuristic search, technological resources, quests every week to Scopus® using our API key). intentions. When using the Scopus® APIs to automatically retrieve Using the combination strategy mentioned in Section extracted N-grams from the Scopus® database to judge 4.3, we added the following combined N-grams to the N- whether the N-grams are innovative, we limited the re- gram candidates: “search engine system and ontology,” trieval scope to abstracts with publication dates before the “ontology and technological resources,” “semantic analysis publication year of the paper inspected. 
For a candidate idea in the abstract of a paper inspected (Pins), if no abstract of a paper published before the year of the publication of Pins mentioned the candidate idea, the idea was classified as innovative. Here, we checked whether an idea is innovative using the Scopus® platform rather than Web of Science, because we have not found any API for the latter platform.

4.5 An example

Here, we exemplify the proposed method with the following paper: “Dai, W, You, Y, Wang, W, Sun, Y, Li, T. (2011) Search engine system based on ontology of technological resources. Journal of Software, 6(9): 1729-1736.” Its abstract is as follows:

Internet has become a huge and updating information warehouse, and provides a new source for us to build a well technological resources sharing system to support our research work and development activities. However, the technological resources on Internet is usually diverse, professional and complex. They are difficult to be retrieved precisely and completely by traditional search engines. This paper proposed a new search engine system based on ontology of technological resources. In that system, a database with ontology knowledge warehouse was designed to store all related conceptions and the relationships of technological domains. By semantic analysis of users' queries and a heuristic search, the expected technological resources can be retrieved more precisely and completely to satisfy their intentions.

The N-grams extracted from its abstract are as follows: Internet, information warehouse, new source, technological resources sharing system, research work, development activities, technological resources, traditional search engines, [...] and queries,” “queries and heuristic search,” “heuristic search and technological resources.”

By retrieving the N-gram candidates from the Scopus® database automatically using the Scopus® API, we found that the following N-gram candidates had not appeared in the publication year of the example paper: “ontology knowledge warehouse,” “heuristic search and technological resources.”

Obviously, the combination N-gram candidate “heuristic search and technological resources” is not a specific innovative idea and is an error in the results. Building an ontology knowledge warehouse as a database that includes all related conceptions and relationships of the technological domain as the query conditions is an innovative idea (although it is a small new idea) in this example paper for precise and complete retrieval.

5.0 Results

From the 1,014 abstracts used in the experiment, 4,399 N-grams were finally extracted and classified automatically as innovative or non-innovative ideas by our method. In addition, 2,295 manually judged innovative ideas were used as the standard for evaluating the method. Among the 4,399 extracted innovative ideas, 2,272 matched the manually judged innovative ideas; thus, the precision was 51.6%. Among the 2,295 innovative ideas judged manually, 1,991 were extracted by the automatic method; thus, the recall was 86.8%.

5.1 Effects of noise exclusion

When stop-words and the terms for enumeration and comparison were not excluded, more resulting N-grams appeared combined with these stop-words and terms as noise, but they were classified as innovative ideas automatically, which reduced the precision to 35.8%.
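The precision and recall reported in Section 5.0 follow directly from the counts given there. As a minimal sketch (the function and variable names are illustrative and not taken from the paper), the arithmetic can be reproduced as follows:

```python
def ratio_percent(hits: int, total: int) -> float:
    """Return hits / total as a percentage rounded to one decimal place."""
    return round(100.0 * hits / total, 1)


# Counts as reported in Section 5.0
extracted = 4399          # N-grams classified as innovative by the method
extracted_matched = 2272  # extracted N-grams matching a manually judged idea
manual = 2295             # manually judged innovative ideas (the standard)
manual_found = 1991       # manually judged ideas recovered by the method

precision = ratio_percent(extracted_matched, extracted)
recall = ratio_percent(manual_found, manual)
print(f"precision = {precision}%, recall = {recall}%")
```

The two numerators differ in the source (2,272 versus 1,991); the sketch simply uses each count as reported.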

5.2 Mistakes of data and POS Tagger

Spelling mistakes in the original abstracts and the errors made by the POS Tagger during the N-gram extraction process using the Stanford NLP tool also reduced the performance of our method. Across all 1,014 of the experimental abstracts, seventy-three text mistakes were found, and examples of them appear in Table 4. Additionally, the POS Tagger made forty-five errors, and examples of these errors are listed in Table 5. When these mistakes and the subsequent noise combinations are included, the precision decreases to 49.8%. These errors prevented the extraction of three combined innovation ideas, which reduced the recall to 86.6%.

5.3 Location distribution of innovative ideas in abstracts

As shown in Table 6, when the text to be processed contained move B but did not contain move R, 5,780 N-grams were classified as innovative ideas by the automatic extraction method, reducing the precision to 39.4%. When the text to be processed contained move R but not move B, 5,681 N-grams were classified as innovative ideas automatically, reducing the precision to 40.1%. Only four and five abstracts mentioned innovative ideas in moves B and R, respectively. Recall increases slightly when the method considers move B or R, as shown in Table 6. The results show that limiting the text to be processed to that occurring between move B and move R excludes much interference and improves the efficiency and accuracy of our work.

Table 6 compares the results of the experiments discussed above. Because of the rule of combining two adjacent N-grams in the same sentence, when stop-words are not excluded, they sometimes prevent the combination of the two surrounding technology concepts; thus, these instances miss the chance to be judged as innovative ideas. Without removing stop-words, recall decreased to 80.3%. In addition, without combining two adjacent N-grams,

Errors | Correct
timesequential images | time sequential images
bag-of- word model | bag-of-word model
shot segmentationsin videos | shot segmentations in videos
spectralanalysis | spectral analysis
weightestimation algorithms | weight estimation algorithms
machine learning techniques In this study | machine learning techniques. In this study
models word sense disambiguationand | models word sense disambiguation and
event relation ship | event relationship
automated semantic analyses.We | automated semantic analyses. We
manyanonymity algorithms | many anonymity algorithms
concept-basedand | concept-based and

Table 4. Examples of textual errors in the original abstracts.

No | Sentence | Term | Wrong POS | Correct POS
1 | help the government offer more effective assistance | offer | NN | VB
2 | guide the lexicographer through his/her task | his/her | NN | PRP
3 | performance benefits from a syntactic-based definition | benefits | NNS | VB
4 | link identification numbers with a semantic enrichment process | link | NN | VB
5 | The PAM first extract the dominant color compositions | first | JJ | RB
5 | The PAM first extract the dominant color compositions | extract | NN | VB
6 | The BIOMedical Search Engine Framework | search | VB | NN
7 | (PLSA) method is developed to leverage attribute information | leverage | NN | VB
8 | develop a lexical database of Punjabi verbs leveraged in the form of a dictionary of verbs | leveraged | NN | VBN

Table 5. Examples of POS Tagger errors.

Process | Precision | Recall
Without removing stop-words | 35.8% | 80.3%
Including textual and parser tool errors | 49.8% | 86.6%
Containing text to be processed before move B | 39.4% | 86.9%
Containing text to be processed after move R | 40.1% | 87.0%
Without combining adjacent N-grams | 56.7% | 37.1%
Final result with 4 improvement steps | 51.6% | 86.8%

Table 6. Precision and recall of the proposed method under different conditions.

precision increased to 56.7% because of the reduction in noisy combinations, but recall decreased to 37.1%.

6.0 Discussion

This paper introduced an automatic approach to extract innovative ideas from the abstracts of technology and engineering research papers. The results show the feasibility and effectiveness of the method; however, its performance can still be improved.

One challenge is that the performance of the innovative-idea extraction method depends to some extent upon the quality of the abstracts. One type of quality in abstracts involves clarity of presentation (Timmer et al. 2003). Abstracts that clearly, concisely, and unambiguously present the main points of the research are ideal targets for our work. In reality, most of the abstracts used in this study proved to be sufficiently good to achieve the novelty extraction purpose, but there were some unsatisfactory examples for which the extraction failed. In addition, some unstructured abstracts lack obvious characteristics to identify the portion that describes the central work of the research. For example, some abstracts do not contain the features often used in the first sentence of the result or comparison descriptions of the research in abstracts, such as “experimental result” (see Table 3), which resulted in noise candidates and reduced the precision.

Synonymy caused by different authors’ writing styles is also a significant challenge in our work. Synonymy means that meanings can be expressed in several different forms (Miller et al. 1990), which leads to a problem in automatically judging innovative ideas; an N-gram in an abstract may be an alternative expression of an existing concept. However, the retrieved result for that N-gram from Scopus® indicates that it had not been proposed before the paper’s publication; thus, the N-gram is mistakenly identified as innovative. For example, “latent context features” (Ren and Wang 2016) has the same meaning as “context-based latent features;” however, the former expression could not be retrieved from Scopus® before the paper’s publication year; thus, the method misjudged this candidate as an innovative idea. Synonyms led to the greatest reduction in the method’s precision. Therefore, we plan to introduce WordNet (Miller et al. 1990) into future work to reduce the negative influence of synonyms.

The unique experimental tools, data, or platforms used in some research also form noise that reduced the precision. For example, Ben Aouicha and others (2016) exploited seventeen datasets for semantic similarity purposes and semantic relatedness evaluation. These datasets, including RG65, MC30, AG203, etc., had not been used in other research based on Scopus® retrieval, and, therefore, they were automatically classified as innovations by our method. Although using these seventeen datasets can be considered novel research to some extent, they are not innovative ideas by themselves, and including them increases the number of noise candidates and reduces precision.

One shortcoming of the method is the rule of combining only two adjacent N-grams in one sentence to express the potential innovative ideas. This limitation might miss innovative ideas that combine three or more technologies, that are described in several steps in different sentences, or that are represented as two non-adjacent concepts. For example, Renu and Mocko (2016) aimed to enable retrieval and knowledge sharing of text-based assembly process plans, and one innovative idea of their research lies in combining the four text-mining algorithms “word overlap,” “Jaccard score,” “term frequency-inverse document frequency,” and “latent semantic analysis;” however, this innovation cannot be extracted when only two N-grams are allowed to be combined. Another example is Yuan and others (2016), who introduced an approach to analyse and model relationships among image sequences and key postures. They described their work in four steps. Our automatic method does not extract innovative ideas reflected in a combination of several steps proposed in different sentences. To address this problem, in future research, we will attempt to combine efficiently both N-grams from one sentence and concepts emerging in different sentences.

Another shortcoming of the combination of N-grams is that, due to the search quota limitation of the Scopus® API, we did not recheck the retrieved results from Scopus®. Our method retrieves a combined concept using a strategy that relates its two terms using the operator “AND;” this approach returns all the abstracts that include the two N-grams. However, the two terms can appear in different sentences in some result abstracts; thus, the association of those two terms might be weak in those abstracts. The Scopus® retrieval rules allow researchers to use a location qualifier operator to limit the distance between two search terms in the abstract to a specific value. This capability is beneficial for limiting the two terms to one sentence. However, with this limitation, the retrieval results might miss similar work describing the combined concepts in different sentences. In the future, to improve the accuracy of the results, we plan to recheck the retrieval results by inspecting the association of the concepts included in combination candidates in the returned abstracts.

There is another reason for rechecking the returned abstracts in the future version of the method. There are two types of rules for retrieving N-grams in Scopus®: exact match and approximate match. Exact match uses braces ({}) around the phrase to be retrieved, and the results must contain the exact phrase that occurs between the braces. Approximate match uses quotes (“”) around the phrase to be retrieved, and in this case, the results contain the adjacent words of the phrase but might also contain punctuation between them. In addition, when an approximate match uses the singular form of a word in the strategy, the results may include its singular, plural, and possessive forms for most words. Thus, we use approximate match in our method, because doing so can reduce omissions caused by different word forms. However, because the approximate match method ignores punctuation, when we retrieve the N-grams “Word1 Word2 Word3,” for example, “Natural Language Processing,” punctuation might occur between the three continuous words in the returned documents, for example, “Natural Language, Processing,” which does not meet our expectations. We randomly inspected 1,027 retrieved phrases with more than two words in returned abstracts and found that eighteen results contained punctuation between the retrieved continuous terms, corresponding to an error rate of 1.75%. Therefore, we plan to recheck the N-grams in returned abstracts to determine whether punctuation exists in the continuous terms, which will help to ensure the consistency of strategies and returns. The rechecking work is time consuming but could increase the recall.

7.0 Implications

In contrast to co-word and co-citation analysis, which are used to investigate the relationships between papers, innovative-idea extraction reflects the differences in a paper from all previously published papers. Co-word and co-citation analysis are two major clustering methods used to delimit science subfields (Olmeda-Gómez et al. 2017) for exploring the structure of scientific literature (Small and Griffith 1974). One function is to detect the research front (Zitt and Bassecoulard 1994). The results show the differences between different classes of research papers and the similarities between papers in the same class. Our method characterizes the novelty of a paper by comparing it with all previous papers, even though the difference may be slight. In short, clustering methods show differences among papers at the class level, while our method shows differences among papers at the article level.

Research evaluations and scientific research policies that affect researchers’ careers influence researchers’ behaviours. For example, the policy that university funding should be based on only the number of publications, which was implemented in Australia in 1995, mostly led to greater publishing activity in low-quality journals (Butler 2003). Currently, research evaluation is mainly based on the number of research publications and citations. This notion encourages researchers to study hot topics, as papers on hot topics are more likely to be accepted for publication and to receive more citations. Such research belongs to normal science. Another kind of research is scientific “revolution,” that is, the creation of a paradigm shift (Kuhn 1962). Scientific “revolution” is caused by breakthroughs and is excellent science (Spier and Poland 2013). These studies can be recognized by identifying their differences from previous studies at the article level or by identifying the new knowledge that they provide to human society. Additionally, if research evaluation considers innovativeness, it will encourage researchers to pursue breakthroughs.

Innovation involves a paradox: innovation is important for science development, but ideas with a higher level of originality have a higher risk of rejection by audiences (Staw 1995; Cooper 2007; Mueller et al. 2012), even by academic journals (Starbuck 2003). Readers prefer normal-science work or innovative works with fewer new elements for two reasons. One reason is that existing research work has provided partial recognition for the contribution; the other reason is that the professional knowledge of audiences is occasionally not consistent with that in innovative works (Trapido 2015). Our method can help readers, including journal editors and reviewers, to know and consider carefully the innovative ideas of research papers.

This work is an extension of knowledge organization research. Mazzocchi (2018) proposed that there are two main items that characterize knowledge organization. One is the knowledge organization process, and the other is the knowledge organization system. This work is based on the knowledge organization theory and uses an abstract database of academic papers. Therefore, it is related to another item that characterizes knowledge organization: knowledge organization application.
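The retrieval strategies and the punctuation recheck described in the Discussion can be illustrated with a short sketch. It only builds query strings in the brace, quote, and “AND” forms named in the text and checks a returned abstract locally; it does not call the Scopus® API. All function names are hypothetical, and the local check is a simplifying assumption: it requires the exact word forms of the phrase, so it does not model the singular, plural, and possessive matching that approximate match performs.

```python
import re


def exact_phrase(phrase: str) -> str:
    """Exact match: the phrase is wrapped in braces."""
    return "{" + phrase + "}"


def approximate_phrase(phrase: str) -> str:
    """Approximate match: the phrase is wrapped in double quotes."""
    return '"' + phrase + '"'


def combined_query(first: str, second: str) -> str:
    """Relate the two N-grams of a combined candidate with AND."""
    return f"{approximate_phrase(first)} AND {approximate_phrase(second)}"


def occurs_without_punctuation(phrase: str, abstract: str) -> bool:
    """True if the words of `phrase` occur consecutively in `abstract`,
    separated only by whitespace (case-insensitive)."""
    words = [re.escape(word) for word in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, abstract, flags=re.IGNORECASE) is not None


# Error rate from the manual inspection reported in the Discussion
error_rate = round(100 * 18 / 1027, 2)

print(combined_query("heuristic search", "technological resources"))
print(occurs_without_punctuation(
    "Natural Language Processing",
    "Tools for natural language, processing pipelines and more."))
```

A returned abstract that fails `occurs_without_punctuation` would be treated as not containing the candidate phrase, which is the recheck the authors plan for a future version.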

8.0 Limitation

In the experiments of this work, we used a self-judged standard to assess the performance of the proposed method. Although we established this standard carefully and prudently, it is inevitable that bias is included in this standard, thus leading to some errors in the performance evaluation. However, comparisons of the performance of the method with those of different measures show that the method can be improved in appropriate ways. The discussion section also provides potential methods for further improvement. Together, these data and proposed improvements fulfil the aim of this paper: they show that automatically extracting innovative ideas from papers can be achieved and applied in practice, though the method needs further refinement.

9.0 Conclusions

We performed an experimental investigation into the innovative ideas that exist in semantic analysis papers. We proposed an automatic approach to extract the knowledge claims in abstracts and judge whether they are innovative by comparing them with retrievals from the Scopus® database. This approach does not require manually assembling a domain corpus. A list of stop-words and the characteristics of research descriptions were developed to exclude noise. By considering the distribution of text describing innovation in abstracts and excluding stop-words, the performance of our method was improved. We believe that with further improvement, our research will be helpful to the development of a research evaluation system.

References

Arboit, Aline Ellis. 2018. “Knowledge Organization: From Term to Concept, From Concept to Domain.” Knowledge Organization 45: 125–36.
Basalla, George. 1988. The Evolution of Technology. Cambridge: Cambridge University Press.
Ben Aouicha, Mohamed, Mohamed A. Hadj Taieb and Abdelmajid Ben Hamadou. 2016. “LWCR: Multi-Layered Wikipedia Representation for Computing Word Relatedness.” Neurocomputing 216: 816–43.
Bounhas, Ibrahim and Yahya Slimani. 2009. “A Hybrid Approach for Arabic Multi-Word Term Extraction.” In 2009 International Conference on Natural Language Processing and Knowledge Engineering. [Piscataway, N.J.]: IEEE, 429–36. doi:10.1109/NLPKE.2009.5313728
Butler, Linda. 2003. “Explaining Australia's Increased Share of ISI Publications the Effects of a Funding Formula Based on Publication Counts.” Research Policy 32: 143–55. doi:10.1016/S0048-7333(02)00007-0
Chen, Kuang-hua and Hsin-Hsi Chen. 1994. “Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and Its Automatic Evaluation.” In 32nd Annual Meeting of the Association for Computational Linguistics: Proceedings Of The Conference: 27-30 June 1994, New Mexico State University, Las Cruces, New Mexico, USA. [Morristown, N.J.]: Association for Computational Linguistics, 234–41.
Chen, Runyu, Wei Xu and Xinghan Zhang. 2016. “Dynamic Box Office Forecasting Based on Microblog Data.” Filomat 30: 4111–24.
Church, Kenneth W. and Patrick Hanks. 1990. “Word Association Norms, Mutual Information, and Lexicography.” Computational Linguistics 16: 22–9.
Cole, Jonathan R., Stephen Cole, and Donald D. Beaver. 1974. “Social Stratification in Science.” American Journal of Physics 42: 923–4.
Cooper, Robert G. 2007. “Managing Technology Development Projects.” IEEE Engineering Management Review 35: 67–76.
Cross, Cate and Charles Oppenheim. 2006. “A Genre Analysis of Scientific Abstracts.” Journal of Documentation 624: 428–46.
Dahl, Trine. 2008. “Contributing to the Academic Conversation: A Study of New Knowledge Claims in Economics and Linguistics.” Journal of Pragmatics 40: 1184–201.
Dahlberg, Ingetraut. 2006. “Knowledge Organization: A New Science?” Knowledge Organization 33: 11–9.
Dirk, Lynn. 1996. “From Laboratory to Scientific Literature: The Life and Death of Biomedical Research Results.” Science Communication 18: 3–28.
Dirk, Lynn. 1999. “A Measure of Originality: The Elements of Science.” Social Studies of Science 29: 765–76.
Endres-Niggemeyer, Brigitte. 1998. Summarizing Information: Including CD-Rom “SimSum”, simulation of summarizing, for Macintosh and Windows. Berlin: Springer.
Fiala, Jaroslav, Jiří J. Mareš, and Jaroslav Šesták. 2017. “Reflections on How to Evaluate the Professional Value of Scientific Papers and Their Corresponding Citations.” Scientometrics 112: 697–709.
Fleming, Lee. 2001. “Recombinant Uncertainty in Technological Search.” Management Science 47: 117–32.
Frame, J. Davidson. 2008. Review of Reinventing Project Management: The Diamond Approach to Successful Growth and Innovation, by Aaron J. Shenhar and Dov Dvir. Project Management Journal 39: 96. doi:10.1002/pmj.20027
Freitag, Dayne. 1998. “Multistrategy Learning for Information Extraction.” In Machine Learning: Proceedings of the Fifteenth International Conference (ICML '98), ed. Jude Shavlik. San Francisco: Morgan Kaufmann, 161–9.
Garfield, Eugene. 1972. “Citation Analysis as a Tool in Journal Evaluation.” Science 178: 471–9.
Garfield, Eugene. 1979. Citation Indexing. New York: Wiley.

Guetzkow, Joshua, Michele Lamont, and Gregoire Mallard. 2004. “What is Originality in the Humanities and the Social Sciences?” American Sociological Review 69: 190–212.
Hartley, James and Lucy Betts. 2009. “Common Weaknesses in Traditional Abstracts in the Social Sciences.” Journal of the Association for Information Science and Technology 60: 2010–8.
Hofmann, Thomas. 1999. “Probabilistic Latent Semantic Analysis.” In Uncertainty in Artificial Intelligence: Proceedings of the Fifteenth Conference (1999), July 30-August 1, 1999, Royal Institute of Technology (KTH), Stockholm, Sweden, ed. Kathryn B. Laskey and Henri Prade. San Francisco, CA: Morgan Kaufmann, 289–96.
Hjørland, Birger. 2017. “Classification.” Knowledge Organization 44: 97–128.
Huang, Ke-Chun, Charles C. Liu, Shung-Shiang Yang, Furen Xiao, Jau-Min Wong, Chun-Chih Liao, and I-Jen Chiang. 2012. “Classification of PICO Elements by Text Features Systematically Extracted from PubMed Abstracts.” In Proceedings: 2011 IEEE International Conference on Granular Computing; Kaohsiung, Taiwan, Nov. 8-10, 2011, ed. Tzung-Pei Hong. Piscataway, NJ: IEEE, 279–83.
Jamar, Nina, Alenka Šauperl and David Bawden. 2014. “The Components of Abstracts: The Logical Structure of Abstracts in the Areas of Materials Science and Technology and of Library and Information Science.” New Library World 115: 15–33.
Justeson, John S. and Slava M. Katz. 1995. “Technical Terminology: Some Linguistic Properties and an Algorithm for Identification in Text.” Natural Language Engineering 1: 9–27.
Kamp, Hans. 2008. “A Theory of Truth and Semantic Representation.” In Formal Semantics: The Essential Reading, ed. Paul Portner and Barbara H. Partee. Oxford: Blackwell, 189–222.
Kanoksilapatham, Budsaba. 2013. “Generic Characterisation of Civil Engineering Research Article Abstracts.” 3L Southeast Asian Journal of English Language Studies 19: 1–10.
Kleineberg, Michael. 2017. “Integrative Levels.” Knowledge Organization 44: 349–79.
Kosten, Joost. 2016. “A Classification of the Use of Research Indicators.” Scientometrics 108: 457–64.
Kuhn, Thomas S. 1962. The Structure of Scientific Revolutions. Chicago: University of Chicago Press.
Leng, Fuhua, Rujiang Bai, and Qingsong Zhu. 2013. “A Hybrid Semantic Information Extraction Method for Scientific Research Papers.” Library and Information Service 57: 112–9.
Li, Weijiang, Yanming Feng, Dongjun Li, and Zhengtao Yu. 2016. “Micro-Blog Topic Detection Method Based on BTM Topic Model and K-Means Clustering Algorithm.” Automatic Control and Computer Sciences 50: 271–7.
Liu, Haixia, James Goulding and Tim Brailsford. 2015. “Towards Computation of Novel Ideas from Corpora of Scientific Text.” In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2015, Porto, Portugal, September 7-11, 2015, Proceedings, Part II, ed. Annalisa Appice, Pedro Pereira Rodrigues, Vítor Santos Costa, João Gama, Alípio Jorge, and Carlos Soares. Lecture Notes in Computer Science 9285. Cham: Springer, 541–56. doi:10.1007/978-3-319-23525-7_33
Mazzocchi, Fulvio. 2018. “Knowledge Organization System (KOS): An Introductory Critical Account.” Knowledge Organization 45: 54–78.
Maynard, Diana and Sophia Ananiadou. 1999. “Identifying Contextual Information for Multi-Word Term Extraction.” In TKE '99: Terminology and Knowledge Engineering; Proceedings, Fifth International Congress on Terminology and Knowledge Engineering, 23-27 August 1999, Innsbruck, Austria, ed. Peter Sandrini. Vienna: TermNet, 212–21.
Milas-Bracović, Milica and Jasenka Zajec. 1989. “Author Abstracts of Research Articles Published in Scholarly Journals in Croatia (Yugoslavia): An Evaluation.” Libri 39: 303–18.
Miller, George A., Richard Beckwith, Christiane Fellbaum, Derek Gross and Katherine Miller. 1990. “Introduction to WordNet: An On-Line Lexical Database.” International Journal of Lexicography 3: 235–44.
Mueller, Jennifer S., Shimul Melwani and Jack A. Goncalo. 2012. “The Bias Against Creativity: Why People Desire but Reject Creative Ideas.” Psychological Science 23: 13–7.
Mullins, N., W. Snizek and K. Oehler. 1988. “The Structural Analysis of a Scientific Paper.” In Handbook of Quantitative Studies of Science & Technology, ed. A. F. J. van Raan. Amsterdam: North-Holland, 81–105.
Nelson, Richard R. and Sidney G. Winter. 1982. An Evolutionary Theory of Economic Change. Cambridge, MA: Harvard University Press.
Oliver, Antoni and Mercè Vàzquez. 2015. “TBXTools: A Free, Fast and Flexible Tool for Automatic Terminology Extraction.” In International Conference Recent Advances in Natural Language Processing Hissar, Bulgaria 7–9 September, 2015: Proceedings, ed. Galia Angelova, Kalina Bontcheva, and Ruslan Mitkov. Shoumen, Bulgaria: Incoma, 473–9.
Olmeda-Gómez, Carlos, Maria-Antonia Ovalle-Perandones, and Antonio Perianes-Rodríguez. 2017. “Co-word Analysis and Thematic Landscapes in Spanish Information Science Literature, 1985–2014.” Scientometrics 113: 195–217.
Recchia, Gabriel L. and Max M. Louwerse. 2016. “Archaeology Through Computational Linguistics: Inscription Statistics Predict Excavation Sites of Indus Valley Artifacts.” Cognitive Science 40: 2065–80.

Ren, Kai and Shi-Wen Wang. 2016. “Improved Convolutional Neural Network for Biomedical Word Sense Disambiguation with Enhanced Context Feature Modelling.” Journal of Digital Information Management 14: 342–50.
Renu, Rahul S. and Gregory Mocko. 2016. “Computing Similarity of Text-Based Assembly Processes for Knowledge Retrieval and Reuse.” Journal of Manufacturing Systems 39: 101–10.
Rogers, Debra M. Amidon. 1993. “Knowledge Innovation System: The Common Language.” Journal of Technology Studies 19: 2–8.
Salager-Meyer, Françoise. 1990. “Discoursal Flaws in Medical English Abstracts: A Genre Analysis per Research- and Text-Type.” Text 10: 365–84.
Smiraglia, Richard P. and Charles van den Heuvel. 2013. “Classifications and Concepts: Towards an Elementary Theory of Knowledge Interaction.” Journal of Documentation 69: 360–83.
Small, Henry, Kevin W. Boyack, and Richard Klavans. 2014. “Identifying Emerging Topics in Science and Technology.” Research Policy 43: 1450–67.
Small, Henry and Belver C. Griffith. 1974. “The Structure of Scientific Literature. I: Identifying and Graphing Specialties.” Science Studies 4: 17–40.
Smadja, Frank. 1993. “Retrieving Collocations from Text: Xtract.” Computational Linguistics 19: 143–77.
Soderland, Stephen G. 1997. “Learning Text Analysis Rules for Domain-Specific Natural Language Processing.” Ph.D. diss., University of Massachusetts.
Spier, Ray E. and Gregory A. Poland. 2013. “What is Excellent Science and How Does it Relate to What We Publish in Vaccine?” Vaccine 31: 5147–8.
Starbuck, William H. 2003. “Turning Lemons into Lemonade: Where is the Value in Peer Reviews?” Journal of Management Inquiry 12: 344–51.
Staw, Barry M. 1995. “Why No One Really Wants Creativity.” In Creative Action in Organizations, ed. Ford Cameron and Gioia Dennis. Thousand Oaks, CA: Sage, 161–6.
Swales, John M. 1990. Genre Analysis: English in Academic and Research Settings. Cambridge: Cambridge University Press.
Taylor, Ann, Mitchell Marcus and Beatrice Santorini. 2003. “The Penn Treebank: An Overview.” In Treebanks, ed. Anne Abeillé. Text, Speech and Language Technology 20. Dordrecht: Springer, 5–22. doi:10.1007/978-94-010-0201-1_1
Thorleuchter, Dirk. 2008. “Finding New Technological Ideas and Inventions with Text Mining and Technique Philosophy.” In Data Analysis, Machine Learning and Applications: Proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation e.V., Albert-Ludwigs-Universität Freiburg, March 7–9, 2007, ed. Christine Preisach, Hans Burkhardt, Lars Schmidt-Thieme, and Reinhold Decker. Berlin: Springer, 413–20.
Timmer, Antje, Lloyd R. Sutherland and Robert J. Hilsden. 2003. “Development and Evaluation of a Quality Score for Abstracts.” BMC Medical Research Methodology 3: 1–7. doi:10.1186/1471-2288-3-2
Toutanova, Kristina, Dan Klein, Christopher D. Manning, and Yoram Singer. 2003. “Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network.” In Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: Proceedings of The Conference and Associated Workshops. East Stroudsburg, PA: Association for Computational Linguistics, 252–9.
Trapido, Denis. 2015. “How Novelty in Knowledge Earns Recognition: The Role of Consistent Identities.” Research Policy 44: 1488–500.
Weitzman, Martin L. 1996. “Hybridizing Growth Theory.” American Economic Review 86: 207–12.
Wen, Youkui, Guohua Xu, Bonian Lai, and Hao Wen. 2005. Knowledge Element Mining. Xi’an: Xi’an Electronic Science & Technology University Press.
Wen, Youkui, Hao Wen, Duanyi Xu, and Longfa Pan. 2005. “Knowledge Element Mining in Knowledge Management.” Journal of the China Society for Scientific and Technical Information 24: 663–8.
Wieringa, Roel, Neil Maiden, Nancy Mead, and Colette Rolland. 2006. “Requirements Engineering Paper Classification and Evaluation Criteria: a Proposal and a Discussion.” Requirements Engineering 11: 102–7.
Wu, Zhiqian. 2015. “Average Evaluation Intensity: A Quality-Oriented Indicator for the Evaluation of Research Performance.” Library & Information Science Research 37: 51–60.
Xu, YanZhang. 2001. “Innovation: The Soul of Science.” Science & Technology Review 19: 8–11.
Yuan, Hejin, Chunhong Duo and Weihua Niu. 2016. “A Human Behavior Recognition Method Based on Latent Semantic Analysis.” Journal of Information Hiding and Multimedia Signal Processing 7: 489–98.
Zhou, Yang, Nan Xiang, Ruixiang Wang, Yao Liu, Xingliang Qi and Zhenguo Wang. 2016. “Construction of Semantic Lexicon of Dermatology.” ICIC Express Letters 10: 1725–30.
Zitt, Michel and Elise Bassecoulard. 1994. “Development of a Method for Detection and Trend Analysis of Research Fronts Built by Lexical or Co-citation Analysis.” Scientometrics 30: 333–51.


Appendix

Table A1 shows the descriptive adjective stop-words. When one such adjective is the first term of an N-gram, we remove it and retain the remaining words in the N-gram for subsequent steps. Table A2 shows the writing phrase stop-words, of which there are two types. The first type contains the words that should be removed when they are extracted as a single N-gram. The other type occurs when a word such as “sample,” “method,” or “approach” is the last term of an N-gram; in such cases, we remove the last word and retain the remaining words in the N-gram.
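The three removal rules described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the word sets are small hand-picked subsets of Tables A1 and A2, the set of trailing words contains only the three examples named in the text, each rule is applied once, and multi-word stop entries such as “high quality” are not handled. The function name `clean_ngram` is hypothetical.

```python
from typing import Optional

# Illustrative subsets only; the full lists are in Tables A1 and A2.
LEADING_ADJECTIVES = {"new", "novel", "proposed", "traditional"}  # Table A1
SINGLE_STOP_NGRAMS = {"method", "approach", "system", "algorithm"}  # Table A2
TRAILING_WORDS = {"sample", "method", "approach"}  # examples named in the text


def clean_ngram(ngram: str) -> Optional[str]:
    """Apply the stop-word rules to one N-gram; return None if it is dropped."""
    words = ngram.lower().split()
    # Rule 1: remove a descriptive adjective in first position.
    if words and words[0] in LEADING_ADJECTIVES:
        words = words[1:]
    # Rule 2: remove a writing-phrase word in last position of a longer N-gram.
    if len(words) > 1 and words[-1] in TRAILING_WORDS:
        words = words[:-1]
    phrase = " ".join(words)
    # Rule 3: drop the N-gram if what remains is a single stop entry (or nothing).
    if not phrase or phrase in SINGLE_STOP_NGRAMS:
        return None
    return phrase


for candidate in ["new search engine system", "heuristic search method", "novel approach"]:
    print(candidate, "->", clean_ngram(candidate))
```

Whether the rules should be applied repeatedly (for example, to strip two leading adjectives) is not stated in the text; the sketch applies each rule a single time.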

When the following words are the first terms of N-grams, remove the first word and retain the rest.
above; acceptable; accurate; additional; advanced; aforementioned; apparent; appropriate; arbitrary; available; bad; basic; best; bewildering; brief; careful; certain; challenging; chaotic; chief; chosen; collected; common; comprehensive; considerable; corresponding; creative; credible; critical; crucial; current; detailed; different; distinct; distinctive; developed; diverse; difficult; easy; effective; efficient; elaborate; emphasis; entire; essential; established; eventual; excellent; exhaustive; existing; existent; extracted; felicitous; few; final; first; following; follow-up; general; great; given; good; helpfulness; high quality; high-performance; high-quality; huge; important; improved; improper; incomplete; independent; inappropriate; incremental; insightful; insufficient; interesting; inventive; judicious; known; large; large-scale; large-granular; latest; longstanding; main; major; many; mass; mass-use; massive; meaningful; methodological; modern; more; most; namely; necessary; new; next; not; novel; numerous; obtained; off-the-shelf; old; only; overall; own; particular; personal; plausible; popular; possible; potential; powerful; practical; precise; pre-existing; previous; primary; prior; progress; promising; proposed; recorded; related; reasonable; recent; relevant; reliable; respective; rich; robust; same; satisfying; second; several; sharing; significant; simple; small-scale; so-called; so-called; special; specific; state-of-the-art; strong; subsequent; successful; such; sufficient; suitable; superior; then; total; tough; traditional; trend-breaking; turn; typical; ubiquitous; understandable; unique; unknown; unnecessary; unreliable; useful; usual; valid; valuable; various; vast; well-defined; well-established; well-known; whole; wrong

Table A1. Stop-words of descriptive adjectives used for writing purposes.

The following words should be removed when they are extracted as a single N-gram.
ability; absence; accomplishment; accuracy; achievement; activity; adaptation; addition; adequacy; advance; advantage; advantageous function; agenda; algorithm; amount; analogy; analysis; answer; application; approach; approximation; architecture; area; article; aspect; associate; assumption; attempt; attribute; background; basis; bulk; capacity; case; case study; category; cause; challenge; characteristic; class; code; coefficient; collocation; combination; comment; community; companionship; comparison; competence; competency; competitive advantage; complete procedure; completeness; completion; complex; complicated problem; computing precision; concept; conception; conclusion; condition; connotation; consolidated statement; construct; construction; content; contribution; convenience; core; core aim; core attribute; core feature; core idea; correlation; course; creativity; data; database; dataset; definition; description; design; detail; determination; development; difference; diffusion; dimension; disclosure; discovery; discussion; dissertation; diversity; domain; drawback; effectiveness; efficacy; efficiency; effort; enhancement module; enrichment; entity; essay; estimation; ethics; evaluation; event; eventual; everyday; evidence; exemplary tasks; example; experiment; explanation; explication; exploration; expression; extensibility; facet; facility; feature; feature guarantee; field; figure; flexibility; focus; form; formalism; formation; formed indicator; former; frame; framework; function; further; further improvement; further validation; gap; generation; goal; graph; group; heavy; heuristic; hiding; high correlation; hypotheses; idea; identification; image; impact; impetus; implementation; implication; importance; impossibility; improvement; inclusion; incompleteness; inconsideration; inconsistency; increase; individual; information; interconnection; innovative; input; insight; item; inter; interest; integration; interpretable way; investigation; issue; key factor; kind; knowledge; label; lack; latter; level; limitation; list; literature; manifest validity; mean; meaning; measurement; mechanism; medline; mistake; mix; merit; method; methodology; model; modelling; model parameter; module; multifold; multiplicity; need; node; notion; novelty; number; object; objective; ones; operation; opinion; order; original source; outlook; output; pace; paper; paradigm; parallel application; parameter; part; participant; people; performance; performance characteristic; period; phenomena; phrase; piece; platform; point; popular approach; portion; possibility; practicality; practice; precision value; precondition; pre-work; present; presence; present paper; principle; probe; problem; problem situation; procedure; process; processing; product attribute; project; promising approach; proposal; proposition; purpose; quality; quantity; question; range; raw data; realization; record; redundancy; reference; reflection; reformation; relation; relationship; relative improvement; relative value; reliability; remodelling; representation; requirement; research; research theme; research trend; resolution; respect; responsiveness; restriction; restructure; result; rigorous; role; rule; sample; satisfaction; scale; scientist; scope; score; sentence; set; shortcoming; signature; significance; similarity; simulation; single experiment; size; solution; specificity subtopic; standard measure; standing; step; stock; strategy; structure; study; subject; subject matter; success; superiority; supplement; support; survey; sustainable good performance; system; tally; target; task; technique; technologist; technology; technological problem; text; theme; theoretical principles; thesis; time; tool; toolkit; topic; training example; training set; transformation; truth; type; typicality; underlying mechanism; understanding; unnecessity; usability; usage; use; usefulness; user; validity; value; variability; variance; variety; vein; version; void; volume; way; weakness; weight; whole research; wide scope; word; work

When the following words are the last term of an N-gram, remove the last word and retain the rest.
approach; case; comparison; complexity; cost; efficiency; example; experiment; insight; measure; mechanism; method; methodology; one; problem; research; researcher; sample; score; stage; task; technique; theory; phenomenon; principle; property; quality

Table A2. Stop-words of terms used for writing purposes.
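The three rules attached to Tables A1 and A2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `clean_ngram` is hypothetical, the three sets hold only a handful of entries copied from the tables, and the order in which the rules are applied (and the final re-check of the reduced N-gram against the single-N-gram list) is an assumption, since the text does not specify it.

```python
from typing import Optional

# Small illustrative subsets of Tables A1 and A2 (assumption: the real
# lists would be loaded in full from the tables).
FIRST_TERM_STOPWORDS = {"novel", "new", "proposed", "efficient"}               # Table A1
SINGLE_NGRAM_STOPWORDS = {"approach", "method", "paper", "case study", "framework"}  # Table A2, type 1
LAST_TERM_STOPWORDS = {"approach", "method", "sample", "technique"}           # Table A2, type 2


def clean_ngram(ngram: str) -> Optional[str]:
    """Apply the stop-word rules to one N-gram; return None if it is discarded."""
    candidate = " ".join(ngram.lower().split())
    # Table A2, first type: the whole N-gram is a writing phrase.
    if candidate in SINGLE_NGRAM_STOPWORDS:
        return None
    words = candidate.split()
    # Table A1: drop a descriptive adjective in first position.
    if words and words[0] in FIRST_TERM_STOPWORDS:
        words = words[1:]
    # Table A2, second type: drop a generic last term, keeping the rest.
    if len(words) > 1 and words[-1] in LAST_TERM_STOPWORDS:
        words = words[:-1]
    remainder = " ".join(words)
    # Assumed extra step: re-check what is left against the single-N-gram list.
    if not remainder or remainder in SINGLE_NGRAM_STOPWORDS:
        return None
    return remainder


if __name__ == "__main__":
    for phrase in ["novel graph kernel", "topic model approach", "case study", "novel method"]:
        print(repr(phrase), "->", repr(clean_ngram(phrase)))
```

Keeping the three lists as separate sets mirrors the structure of the tables, so each rule can be extended independently.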

Knowl. Org. 46(2019)No.3 187 D. Naskar and S. Das. HNS Ontology using Faceted Approach

HNS Ontology Using Faceted Approach†
Debashis Naskar*, Subhashis Das**
*Polytechnic University of Valencia, Department of Computer Systems and Computation (DSIC), Valencia, Spain
**University of Trento, Department of Computer Science and Information Engineering (DISI), Trento, Italy

Debashis Naskar obtained a master’s degree in library and information science from the Documentation Research and Training Centre, Indian Statistical Institute, Bangalore in 2014. He is currently a PhD student at the Department of Computer Systems and Computation, Polytechnic University of Valencia (Spain). His current research interest is in sentiment analysis and emotion prediction in social networks.

Subhashis Das currently works as a post-doctoral researcher at the Department of Information Engineering and Computer Science, University of Trento (Italy). He obtained his PhD from the ICT Doctoral School, University of Trento. Subhashis does research in information science, computing in health science, arts and humanities and geoinformatics (GIS).

Naskar, Debashis and Subhashis Das. 2019. “HNS Ontology using Faceted Approach.” Knowledge Organization 46(3): 187-198. 44 references. DOI:10.5771/0943-7444-2019-3-187.

Abstract: The purpose of this research is to develop an ontology with subsequent testing and evaluation, for identifying utility and value. The domain that has been chosen is human nervous system (HNS) disorders. It is hypothesized here that an ontology-based patient records management system is more effective in meeting and addressing complex information needs of health-care personnel. Therefore, this study has been based on the premise that developing an ontology and using it as a component of the search interface in hospital records management systems will lead to more efficient and effective management of health-care. It is proposed here to develop an ontology of the domain of HNS disorders using a standard vocabulary such as MeSH or SNOMED CT. The principal classes of an ontology include facet analysis for arranging concepts based on their common characteristics to build mutually exclusive classes. We combine faceted theory with description logic, which helps us to better query and retrieve data by implementing an ontological model. Protégé 5.2.0 was used as ontology editor. The use of ontologies for domain modelling will be of acute help to doctors for searching patient records. In this paper we show how the faceted approach helps us to build a flexible model and retrieve better information. We use the medical domain as a case study to show examples and implementation.

Received: 3 November 2018; Revised: 18 February 2019; Accepted: 21 March 2019

Keywords: ontology, domain, faceted, information, data, health care, medical

† To access our ontology, download the OWL file and upload it into the WebProtégé tool. The following links will help you to download and access our ontology:
– The HumanNervousSystem.owl raw file can be downloaded from Google Drive: https://drive.google.com/file/d/1Aw7LPafkCYSaorxjPJMvC8Nr2es9iPVk/view
– The HumanNervousSystem.owl file can be accessed in WebProtégé: https://webprotege.stanford.edu/#projects/a5ba0b79-4141-4612-8252-4714a538cd6b/edit/Classes

1.0 Introduction

Ontology has been defined as the conceptualization of a domain. The term is somewhat ambiguous, insofar as it has been employed to refer both to an artifact and to a set of philosophical principles. Indeed, the term ontology has been used in a number of different senses in different scientific fields. Nonetheless, it is in its association with computational approaches that it has acquired importance and prominence in recent years. This is because when the term became popular in the 1990s, ontology was used as a new catchword for knowledge representation artifacts in expert systems. It is used in this field to refer to a detailed schema of a “slice of reality” based on known facts about that reality (domain). In the fields of information retrieval, content management and knowledge management, ontologies are increasingly being seen as tools for knowledge representation to facilitate, support and enhance the quality of resource discovery and information retrieval. Ontologies play an important role in the semantic web, and a number of ontologies have been developed in a wide range of domains, which is a clear indication of the growing recognition of the importance of ontologies (Naskar and Dutta 2016).

An area that has seen quite a few research papers on the application of ontology is the domain of health care and delivery. Khoo et al. (2011) have demonstrated that an ontology can support evidence-based medical practice and alert doctors to the range and quality of clinical data available to make informed treatment decisions. Shepherd and Sampalli (2012) have shown the use of ontologies as boundary objects that could help enhance the quality of health-care and delivery. Lee et al. (2004) have worked on automatic methods to identify treatment relations in medical ontology. Khoo and Na (2009) developed an ontology to represent the knowledge-base for a clinical decision support system for wound management. There is a considerable degree of interest among LIS professionals in the use of ontologies for domain modelling, as evident from the papers on the subject (Prieto-Díaz 2003).

The patient record management systems in use in many hospitals also suffer from limitations in terms of their ability to support complex searches; for example, consider a request for records of patients in a certain age group with certain specified symptoms and ailment, treated with a particular drug having some after effects. Such a complex query may be difficult to meet using the systems that are used in most hospitals. This is particularly evident in the records of patients that are maintained in hospitals. A major factor is that data input to patient records is made by health-care personnel of different types and levels, e.g., physicians, pathologists, surgeons, nurses, physiotherapists, etc. This leads to a considerable degree of inconsistency in the vocabulary/terminology used in describing symptoms, diseases, etc. Motivated by these contrasting observations on the effectiveness of using ontologies in different domains, we restructure the ontology for relevant purposes and attempt to improve delivery of quality health-care service. The purpose of this paper is to develop an ontology related to human nervous system (HNS) disorders in order to evaluate its utility and value. We perform complex queries to address relevant information needs of health-care personnel. Using an ontology as a component of the search interface in hospital records management systems will lead to more efficient and effective management of health-care.

It is proposed here to develop an ontology of the domain of HNS disorders using a standard vocabulary such as Medical Subject Headings (MeSH) or Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT). The principal classes of the ontology will include: HNS, diseases/disorders, diagnosis, treatments/therapy, symptoms, side effects, etc. In this paper, we demonstrate an ontology-based modelling of HNS disorders using Ranganathan’s faceted approach (1937), a well-known principle in library and information science, which is generally used for classifying different domains. Faceted classification is “the sorting of terms in a given field of knowledge into homogeneous, mutually exclusive facets, each derived from the parent universe by a single characteristic of division” (Vickery 1968), described in Ranganathan (1937) and implemented in Ranganathan (1989).

The rest of this paper is organized as follows. Section 2 reviews the related work. Section 3 describes the evolution of the faceted theory. Section 4 explains different requirements and methodologies of building ontologies. Section 5 shows the method of verification by implementation. Section 6 explains the process of evaluation by SPARQL queries. The final section concludes and explains the direction of our future work.

2.0 Literature review

There are a number of studies regarding the modelling of the medical domain that propose various opinions and methodologies for its detection.

2.1 Medical ontology

Some well-known researchers have built different ontology models related to a medical domain (brain tumor, nervous system) by proposing their opinions and methodologies. Khoo et al. (2000) developed a method to extract knowledge and to identify the information that is explicitly expressed in medical abstracts in the Medline database. They used Conexor's FDG parser to construct a syntactic parse tree for each target sentence and considered four medical domain areas related to heart disease, AIDS, depression and schizophrenia. Lee et al. (2003) developed an automatic method to identify semantic relations between the concepts in a medical domain from existing ontologies by using the UMLS (Unified Medical Language System) semantic net. Murugavalli and Rajamani (2006) carried out a high-speed parallel fuzzy C-means (FCM) algorithm for brain tumour segmentation for the clustering of both the sequential FCM and parallel FCM. The following year, Murugavalli and Rajamani (2007) came up with an improved implementation of a brain tumor detection technique using segmentation based on the neuro-fuzzy technique. Khoo et al. (2011) described an approach in which a training set of documents is used to build the ontology and a test set is then used to evaluate whether the ontology covers most of the relevant concepts and relations in the domain. They applied the UMLS semantic network, MeSH, and the National Cancer Institute (NCI) thesaurus as the base medical ontology, enriched with relations to link potential medical treatments with diseases. Shepherd and Sampalli (2012) built an ontology based on SNOMED-CT as a boundary object to bridge the semantic interoperability gap between members of multidisciplinary health-care teams caring for patients with chronic diseases. In recent years, different approaches have been proposed for learning ontologies in the medical domain. Likewise, Rios-Alvarado et al. (2015) proposed a new ontology learning approach that discovers hierarchical relations and axiom extraction over the medical domain. One interesting study by Alkhammash et al. (2016) designed an ontology associated with water quality and kidney diseases to assist physicians in predicting certain diseases such as the existence of stones, gravels and cancer. Puri et al. (2011) suggested an ontology-based approach to integrate heterogeneous healthcare data for building a recommendation system.

Recently, human nervous system ontology has gained more popularity. Hamilton et al. (2012) proposed an informatics infrastructure to describe neurons through a standard terminology. They also discussed current national and international efforts to address the complexity of neuronal types within the Neuroscience Information Framework (NIF) and the International Neuroinformatics Coordinating Facility (INCF) Neuron Registry initiative. Similarly, Imam et al. (2012) developed a knowledge model called Neuroscience Information Framework Standardized Ontologies (NIFSTD), which provides an extensive collection of standard neuroscience concepts along with their synonyms and relationships. Another interesting study was done by Köhler et al. (2016), defining a characteristic of the nervous system with principles of ontology. They followed nine steps (including thresholding, watershed segmentation, morphological operation) for detecting disease, matching the results against their existing database containing images of neurologic diseases. A recent study by Polavaram and Ascoli (2017) established an ontology-based search engine of interconnected hierarchies focusing on the main dimensions of animal species, anatomical regions, and cell types. They mapped each metadata term into the formal ontology that explicitly resolves all ambiguities caused by synonymy and homonymy.

2.2 Faceted approach

There is a considerable degree of interest among library and information science (LIS) professionals in the use of ontologies for domain modelling, as evident from the number of papers on the subject. Several studies have been carried out regarding the modeling of ontologies and have proposed faceted approaches for classifying, organizing and searching web documents. Earlier, Ellis and Vasconcelos (2000) used faceted classification in subject directories and search engines, and Yee et al. (2003) used it for retrieval of images from a large database collection. Broughton (2006) estimated the impact of faceted classification and used a faceted approach for the purpose of development of various information retrieval tools. She found that the faceted approach as a standard theory can function as a tool for browsing, for navigation and for retrieval. Correspondingly, work by Agostini et al. (2011) presented a formal framework to refine the original query for search and retrieval purposes by using general principles of faceted classification. They used the ALC (attributive language with complex concept negation) description logic to implement the facet engine as the main component of this method. ALC extends the core attributive language (AL) with complements: unlike AL, the complement of any concept is allowed, not just the complement of atomic concepts. From the point of view of permissible constructors, ALC would be equivalent to AL with concept union and full existential quantification (ALUE), although the latter name is not used. ALC concept expressions can include concept names, concept intersection, concept union, complement, existential and universal quantifiers, and individual names (Donini et al. 1997; Baader et al. 2003).

Prieto-Díaz (2003) proposed a faceted classification method to build an ontology for identifying and categorizing concepts. Similarly, by using an analytico-synthetic approach, Ghosh and Panigrahi (2015) developed an ontology in the library and information science domain to prove the relevance and importance of Ranganathan’s philosophy. To overcome semantic interoperability issues in a knowledge base system and to exploit the benefits offered by state-of-the-art technologies, Hasan et al. (2015) developed an ontology named Earthquake Engineering Research Projects and Experiments (EERPE) using a faceted approach. Another study, carried out by Das and Roy (2016), created a facet-based ontological framework on the brain tumor domain to retrieve and facilitate semantic query answering. Other notable work on the semantic web domain was influenced by faceted classification theory. For example, for the purpose of representing multiple classification criteria, Rodriguez-Castro et al. (2010) examined a simplified procedure to develop a faceted classification scheme (FCS) for domain specific concepts.

3.0 Evolution of faceted theory

According to Ranganathan’s faceted classification (1989), knowledge can be divided into five fundamental categories: “personality” (P), “matter” (M), “energy” (E), “space” (S) and “time” (T), well known as PMEST. The refined faceted theory proposed by Bhattacharyya (1981) consists of four categories: “discipline” (or domain) (D), “entity” (E), “property” (P) and “action” (A), plus another special category called “modifier” (m); this is known by the acronym DEPA. DERA, which stands for “domain,” “entity,” “relation” and “attribute,” is a faceted knowledge organization framework. It makes a provision for the organization of knowledge into facets by defining them as per their domains (Giunchiglia et al. 2014). In DERA, a domain consists of three elements, namely “entity” (E), “relation” (R) and “attribute” (A), i.e., D = <E, R, A>. We would like to describe the HNS ontology from the DERA perspective. In this ontology, the nervous system is a “domain” (D), which contains classes, relations between classes or objects, and attributes or characteristics for refining a class or entity. Entity is (Giunchiglia et al., 2014, #51), “an elementary component that consists of classes (categories) and their instances, having either perceptual correlates or only conceptual existence in a domain in context.” This entity definition is slightly different from Bhattacharyya’s definition of entity (Bhattacharyya 1975), although the main idea derives from it.

3.1 Advantages

The main advantage of the faceted approach is that it makes the logical relationships among concepts or groups of concepts explicit and avoids the limitations of traditional hierarchies. Some more advantages of the faceted approach are given below:

– Hospitable: the classes are easily extensible. New classes or schemas can be accommodated without any difficulties.
– Flexible: the classes are more flexible on the basis of creating structure, sharing with others to facilitate searching and navigating.
– Reusable: a facet-based ontology allows many different aspects and approaches to the items, which may be reusable for other related domains.
– Homogeneity: a faceted approach represents a group of concepts based on their homogeneous characteristic(s), which also solves the problem of polyhierarchy.
– Compact and complete: a faceted approach holds the complete structure of classes and subclasses and requires little space in comparison to other hierarchical knowledge organization systems.

3.2 Adaptation

Health-care information systems (HIS) are somewhat fragmented in terms of design and operation, as a result of successive projects that are not well coordinated or harmonized with the existing public health systems. A “bottom-up” approach for designing and implementing systems may also contribute to fragmentation within a system. Enterprise architecture (EA) is a common approach to develop a system that is more coordinated and integrated at a system level. EA has been described (Cameron and Malik 2013, 1) as “a well-defined practice for conducting enterprise analysis, design, planning and implementing by using a holistic approach at all times for the successful development and execution of the strategy.” EA offers a methodology and reusable architecture to systematically assist large-scale systems and helps to create a set of health architecture components that can be reused globally. EA could be a possible approach to design and develop health information systems for global health-care. However, some limitations that have appeared in the design tools and techniques for HIS applications include:

– lack of standardization, including the use of standards for data storage and interoperability;
– minimal interoperability between individual applications developed for a single solution;
– limited reuse of existing applications that are often engineered around a single application use case;
– lack of data integration as a result of different conceptual frameworks and lack of use of standards;
– poor data quality, often resulting from the lack of effective data use locally as well as poor data entry tools and training.

EA provides a methodology and reusable architectural assets that can assist in the development of complex, large-scale systems systematically and holistically, and can potentially create reusable architecture components for global health projects.

4.0 Methodology

In the past decade, ontologies have been used as a core in most knowledge-based applications (Kharbat and El-Ghalayini 2008). In the literature, several definitions of ontology are available. A definition is given by Benjamin et al. in the IDEF5 project (1994, 2):

“An ontology is a domain vocabulary together with a set of precise definitions, or axioms, that constrain the meaning of the terms in that vocabulary sufficiently to enable consistent interpretation of statements that use that vocabulary.”

Among other available definitions, probably the most relevant definition of ontology was proposed by Guarino (1998, 6): “a set of logical axioms designed to account for the intended meaning of a vocabulary.” In this definition, Guarino emphasized the role of logic as a way of representing an ontology. We believe that ontology has an important role to play in the general task of managing diverse information. Most of the work done in this domain is mainly focused on the design of an ontology for the information system. In contrast, we focus on designing the ontology and also on mapping it to an upper-level ontology. Therefore, the work described here was motivated by the following research questions:

1. How does one design an adaptive model that answers various queries for the health-care system?
2. How does one align the model with any upper-level ontology?

Many diverse situations related to hospital, patient, doctor and event will make it challenging to come up with a foolproof, simplified, and generalized query that will tackle every intricate situation. However, to minimize the challenge, we formulate our own steps to design the human nervous system (HNS) ontology, motivated by the work done by Gruninger et al. (1995). Our major focus was on the generation of axioms using description logic (DL) rather than using first order logic (FOL). DL possesses more advantages over FOL as it ensures more expressiveness of the model. Figure 1 shows all the steps that we followed to develop the HNS ontology. The steps are briefly enumerated below:

4.1 Steps in model generation

4.1.1 Domain analysis

In this process, we analyzed all components associated with the domain discourse (Guarino et al., 2009). Another task is to finalize the reference context in which we wanted to build the application. For example, we can build an application for the patient, doctor or hospital within the health-care domain. A feasibility study for the final application also needs to be undertaken in this step.

4.1.2 Identification of the terminology

We adopted a set of words for building an ontology. Words, in this context, are considered to be terms that represent particular concepts in a given natural language. For our work, technical terms have been collected from various literature published by different brain tumour associations and societies. As main sources of natural language terminology, we have selected standard vocabularies such as Medical Subject Headings (MeSH) and Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) 2017 International edition. Some of the terms have also been taken from a classification by The American Association of Neurological Surgeons.

4.1.3 Arrangement and alignment

The terminology collected during the previous step was analyzed for categorization and arrangement of terms according to their similarity and differences. We also analyzed which terms represent classes, properties and values. Here we considered only qualitative values. Qualitative values are property values that express a concept rather than a number. For example, if we use “male” or “female” as values for the property “gender,” then the terms express a qualitative value. Qualitative values are usually useful when codifying disease names, treatment names or particular medical procedures, or characteristics of or labels for a group of classes. Next, we divided terms into classes and formulated two more tasks. One is to arrange the terms in hierarchical order (superclass, subclass), and the second task is alignment with a top-level ontology (see Figure 2). A top-level ontology usually references information architecture, which enables interoperability when we need to integrate our model with others.

4.2 Design principles

An application has been developed for the health-care domain, which involves plenty of personal data. To tackle such sensitive personal information, we use the design principles of the common data model (CDM) as suggested by Reich et al. (2017). The CDM is designed to store observational data to allow for our experiments under the following principles:

– Suitability: The CDM aims to provide data organized in a way optimal for analysis rather than for the purpose of operational needs of health care providers or payers;
– Data protection: All data that might jeopardize the identity and protection of patients, such as names, precise birthdays etc., are limited. Exceptions are possible where the research expressly requires more detailed information, such as precise birth dates for the study of infants;

Figure 1. Steps followed to construct the ontology.

Figure 2. Alignment with the upper-level ontology.

– Design of domains: The domains are modelled in an entity-centric relational data model, where for each record the identity of the person and a date is captured as a minimum;
– Rationale for domains: Domains are identified and separately defined in an entity-relationship model if they have an analysis use case, and the domain has specific attributes that are not otherwise applicable. All other data can be preserved as an observation in an entity-attribute-value structure;
– Standardized vocabularies: To standardize the content of the records, the CDM relies on the standardized vocabularies containing all necessary and appropriate corresponding standard health-care concepts;
– Reuse of existing vocabularies: If possible, the concepts are leveraged from national or industry standardization or vocabulary definition organizations or initiatives, such as the National Library of Medicine, the Department of Veterans Affairs, the Centers for Disease Control and Prevention, National Health Service, etc.;
– Maintaining source codes: Even though all codes are mapped to the standardized vocabularies, the model also stores the original source code to ensure no information is lost;
– Technology neutrality: The CDM does not require a specific technology; it can be realized in any relational database, such as Oracle, SQL Server etc., or as SAS analytical datasets;
– Scalability: The CDM is optimized for data processing and computational analysis to accommodate data sources that vary in size, including databases with up to hundreds of millions of persons and billions of clinical observations;
– Backwards compatibility: All changes from previous CDMs are clearly delineated. Older versions of the CDM can be easily created from this CDMv5, and no information is lost that was present previously.

5.0 Implementation

The best way to verify a model or a theory is through implementation. As Fernández-López, Gomez and Juristo (1997, 34) said, “Obviously, if ontologies are to be used by computer, they have to be implemented.” We implemented our proposed framework through a graphical analytical platform, as shown in Figure 3. The faceted approach is adapted from the information science principle, which allows easy maintainability and encapsulation of data (entities) that will help in the creation of high-performance, generic and adaptive systems. The D = <E, R, A> facet was transformed into an OWL model in such a way that it could capture its uniqueness: “entity” (E) transforms to “owl:Class,” “relation” (R) transforms to “owl:ObjectProperty” and “attribute” (A) transforms to “owl:DatatypeProperty.” For example, in RDF/XML syntax it represents the class “clinicalFiniding” as [RDF/XML listing; label “Clinical finding”]. It represents the relation “addressCity” as: [RDF/XML listing]. And it represents the attribute “age” as: [RDF/XML listing].

Figure 3. Implementation architecture.

The actual implementation has been done in Protégé (https://protege.stanford.edu), a free, open-source ontology editor developed by the Stanford Center for Biomedical Informatics Research at the Stanford University School of Medicine. Protégé uses OWL ontologies, which […] properties) and classes. […] (patient, doctor) are interpreted as sets that contain individuals (patient x, doctor x). Figure 4 depicts the hierarchy of the HNS ontology on the left side of the figure, and on the right side, class visualization is represented using the ProtégéVOWL (http://vowl.visualdataweb.org/protegevowl.html) visualization tool, a Protégé plugin for the user-oriented visualization of ontologies. ProtégéVOWL implements […] ontology language (OWL) that are combined to a force-directed graph layout representing the ontology.

Figure 4. HNS ontology visualization.

For analytics and query visualization, we used GraphDB […] standards. Semantic graph databases (also called RDF triple stores) provide the core infrastructure for solutions where modelling agility, data integration, relationship exploration and cross-enterprise data publishing and consumption are important.

The connected graph is the final implementation of the model in the GraphDB platform. Figure 5 depicts a snapshot of the connected graph of the HNS ontology. From Figure 5, we can easily understand how one individual (e.g., Dr. Anirban Deep Banerjee) is connected with other related entities. Nodes of the same color represent entities that belong to the same class, and directed arrows depict how they are connected.

6.0 Evaluation

We checked: a) syntactic correctness and consistency; b) completeness and conciseness; and, c) empirical adequacy of the developed model. Syntactic correctness and consistency are checked by means of facilities offered by Protégé, and the HermiT OWL 2 reasoner has been used to check the consistency of the model as per description logic (DL) specifications and declarations. As described in Section 4, the methodology we employed ensures that the developed model is by construction complete and concise as per the required task.

The second part of the evaluation has been done with respect to the competency questions (CQs). This is one of the best methods to evaluate medical ontologies as suggested by Abacha et al. (2013) and Bezerra et al. (2013). Competency queries provided the way to check the “entity” (E) facet, “relation” (R) facet and “attribute” (A) facet together, which are embedded in the form of natural language in a given question; for example, a query like “Give a list of all the hospitals in x city which have facilities for the disabled.” Then from this natural language question we can derive:

Identification of general query pattern. Give me all X in Y AND WHERE.property.True.
Identification: Concepts and Properties. Entity: Hospital, City. Relation (R) addressCity: Hospital.name, City.name, and Attribute (A) facilityForDisable: Boolean.

We formalized the CQs according to the query language and retrieved the correct results. Three example queries of this kind are given below:

CQ1: Find all doctors’ names as well as the hospitals where they are available.
CQ2: Find all doctors’ names and their specialization along with the cities where they are available.
CQ3: Find all doctors’ names and their contact information.
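As a minimal sketch of how the D = <E, R, A> facets and the general query pattern "Give me all X in Y AND WHERE property is True" fit together, the example below uses plain Python over a toy set of triples rather than OWL, SPARQL or GraphDB. The terms Hospital, City, addressCity and facilityForDisable come from the text; the individuals (hospitalA, cityX, and so on), their values, and the function names are hypothetical and serve only as illustration.

```python
# Toy illustration of the D = <E, R, A> facets and the competency-query pattern.
# The individuals and values below are hypothetical sample data, not taken
# from the HNS ontology itself.

# Facet-to-OWL mapping as described in Section 5.0.
FACET_TO_OWL = {
    "E": "owl:Class",             # entity facet
    "R": "owl:ObjectProperty",    # relation facet
    "A": "owl:DatatypeProperty",  # attribute facet
}

# Schema terms named in the evaluation example, tagged with their facet.
SCHEMA = {
    "Hospital": "E",
    "City": "E",
    "addressCity": "R",
    "facilityForDisable": "A",
}


def type_triples(schema):
    """Return one (term, 'rdf:type', OWL construct) triple per schema term."""
    return [
        (term, "rdf:type", FACET_TO_OWL[facet])
        for term, facet in sorted(schema.items())
    ]


# Instance data as (subject, predicate, object) triples (hypothetical).
TRIPLES = [
    ("hospitalA", "rdf:type", "Hospital"),
    ("hospitalB", "rdf:type", "Hospital"),
    ("hospitalC", "rdf:type", "Hospital"),
    ("cityX", "rdf:type", "City"),
    ("cityY", "rdf:type", "City"),
    ("hospitalA", "addressCity", "cityX"),
    ("hospitalB", "addressCity", "cityX"),
    ("hospitalC", "addressCity", "cityY"),
    ("hospitalA", "facilityForDisable", True),
    ("hospitalB", "facilityForDisable", False),
    ("hospitalC", "facilityForDisable", True),
]


def all_x_in_y_where(triples, x_class, relation, y, attribute):
    """Answer 'Give me all X in Y AND WHERE attribute is True'."""
    facts = set(triples)
    # All individuals typed as the requested entity class (facet E).
    members = sorted(s for (s, p, o) in triples if p == "rdf:type" and o == x_class)
    # Keep those linked to y by the relation (facet R)
    # whose Boolean attribute (facet A) is True.
    return [
        s for s in members
        if (s, relation, y) in facts and (s, attribute, True) in facts
    ]


if __name__ == "__main__":
    for triple in type_triples(SCHEMA):
        print(triple)
    print(all_x_in_y_where(TRIPLES, "Hospital", "addressCity", "cityX",
                           "facilityForDisable"))
```

With this sample data, the query for hospitals in cityX with facilities for the disabled returns only hospitalA, since hospitalB lacks the facility and hospitalC is in another city. In a real triple store the same pattern would be written as a graph query over the ontology's own IRIs.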

Figure 5. Connected entities.

CQ1

SPARQL query 1.

CQ2

SPARQL query 2.

CQ3

SPARQL query 3.

7.0 Conclusion and future work

Representing active knowledge about the human nervous system (HNS) is both important and advantageous. A computer-based HNS ontology supports the work of researchers in gathering information on nervous system research and allows users across the world to access new scientific information quickly and efficiently. Shared knowledge improves research efficiency and effectiveness, because it helps to avoid unnecessary repetition of the same experiments or research. We have described how we built an ontology by using a faceted classification approach to enhance the accessing and retrieving of web content. Our ontology will facilitate the exact combination of the genetic and environmental factors involved as well as their individual influence on HNS. It will be of great help to doctors for searching patient records. Ultimately, such an initiative is aimed at the delivery of quality health-care service.

In the future, we wish to develop an ontology related to more specific diseases and assemble datasets from the best hospitals across different regions. This ontology will be used with a more advanced methodology to retrieve relevant and detailed information regarding patient records. This system will help to guide new medical practitioners as well as laymen who seek information.

References

Abacha, Asma Ben, Marcos Da Silveira and Cédric Pruski. 2013. “Medical Ontology Validation through Question Answering.” In 14th Conference on Artificial Intelligence in Medicine in Europe AIME, Murcia, Spain, May 29-June 1, 2013, ed. Niels Peek, Marín Morales, Luis Roque, and More Peleg. Berlin: Springer, 196-205.
Agostini, Alessandro, Devika P. Madalli and A. R. D. Prasad. 2011. “Faceted Approach to Diverse Query Processing.” In DiversiWeb-2011, Knowledge Diversity on the Web, Proceedings of the 1st International Workshop on Knowledge Diversity on the Web, Hyderabad, India, March 28, 2011, ed. Elena Simperl, Devika P. Madalli, Denny Vrandecic, and Enrique Alfonseca. CEUR Workshop Proceedings 762. Aachen: ceur-ws.org, 17-24. http://ceur-ws.org/Vol-762/paper4.pdf
Alkhammash, Eman, Waleed S. Mohamed, Amira S. Ashour and Nilanjan Dey. 2016. “Designing Ontology for Association between Water Quality and Kidney Diseases for Medical Decision Support System.” In VI International Conference Industrial Engineering and Environmental Protection IIZS, Zrenjanin, Serbia, October 13-14, 2016, Zrenjanin, Serbia: University of Novi Sad Technical faculty “Mihajlo Pupin” Zrenjanin, 302-11.
Baader, Franz, Diego Calvanese, Deborah McGuinness, Peter Patel-Schneider and Daniele Nardi, eds. 2003. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge: Cambridge University Press.
Benjamin, Perakath C., Christopher P. Menzel, Richard J. Mayer, Florence Fillion, Michael T. Futrell, Paula S. deWitte and Madhavi Lingineni. 1994. “IDEF5 Method Report.” Information Engineering for Concurrent Engineering. College Station, Tex.: Knowledge Based Systems, Inc. http://www.idef.ru/documents/Idef5.pdf
Bezerra, Camila, Fred Freitas and Filipe Santana. 2013. “Evaluating Ontologies with Competency Questions.” In Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), Atlanta, GA, USA, Nov 17-20, 2013. Washington, DC: IEEE Computer Society, 284-285. doi:10.1109/WI-IAT.2013.199
Bhattacharyya, Ganesh. 1975. “Fundamentals of Subject Indexing Languages.” In Ordering systems for global information networks: Proceedings of the Third International Study Conference on Classification Research, Bombay, India, January 6-11, 1975, ed. A. Neelameghan. Bangalore: Sarada Ranganathan Endowment for Library Science, 83-9.
Bhattacharyya, G. 1981. “Subject Indexing Language: Its Theory and Practice.” In Proceedings of the DRTC Refresher Seminar-13: New Developments in LIS in India, October 14-17, 1981. Bangalore: Indian Statistical Institute.
Broughton, Vanda. 2006. “The Need for a Faceted Classification as the Basis of All Methods of Information Retrieval.” Aslib Proceedings 58, no. 1/2: 49-72. doi:10.1108/00012530610648671
Cameron, Brian and Nick Malik. 2013. “A Common Perspective on Enterprise Architecture.” N.p.: The Federation of Enterprise Architecture Professional Organizations (FEAPO).
Das, Subhashis and Sayon Roy. 2016. “Faceted Ontological Model for Brain Tumour Study.” Knowledge Organization 43: 3-12.
Donini, Francesco M., Maurizio Lenzerini, Daniele Nardi and Werner Nutt. 1997. “The Complexity of Concept Languages.” Information and Computation 134, no. 1: 1-58. doi:10.1006/inco.1997.2625
Ellis, David and Ana Vasconcelos. 2000. “The Relevance of Facet Analysis for World Wide Web Subject Organization and Searching.” Journal of Internet Cataloging 2, no. 3-4: 96-114.
Fernández-López, M., A. Gomez-Perez, and H. Juristo. 1997. “METHONTOLOGY; From Ontological Art Towards Ontological Engineering.” In Proceedings of the Ontological Engineering AAAI-97 Spring Symposium Series. Stanford University, EEUU. 33-40. http://oa.upm.es/5484/1/METHONTOLOGY_.pdf
Ghosh, Shrabana and Pijushkanti Panigrahi. 2016. “Use of Ranganathan’s Analytico-Synthetic Approach In Developing A Domain Ontology In Library And Information Science.” Annals of Library and Information Studies 62, no. 4: 274-80.
Giunchiglia, Fausto, Biswanath Dutta and Vincenzo Maltese. 2014. “From Knowledge Organization to Knowledge Representation.” Knowledge Organization 41: 44-56.
Grüninger, Michael and Mark S. Fox. 1995. “Methodology for the Design and Evaluation of Ontologies.” In Proceedings of the 14th International Joint Conference on Artificial Intelligence IJCAI, Workshop on Basic Ontological Issues in Knowledge Sharing, Montreal, Canada, August 20-25, 1995. San Francisco: Morgan Kaufmann, 1-10.
Guarino, Nicola, ed. 1998. Formal Ontology in Information Systems: Proceedings of the First International Conference (FOIS'98), June 6-8, Trento, Italy. Amsterdam: IOS press.
Guarino, Nicola, Daniel Oberle, and Steffen Staab. 2009. “What Is an Ontology?” In Handbook on Ontologies, ed. S. Staab and R. Studer. Berlin: Springer, 1-17.
Hamilton, David J., Gordon M. Shepherd, Maryann E. Martone and Giorgio A. Ascoli. 2012. “An Ontological Approach to Describing Neurons and their Relationships.” Frontiers in Neuroinformatics 6: 15. doi:10.3389/fninf.2012.00015
Hasan, Rashedul, Feroz Farazi Oreste, Salvatore Bursi and Md Shahin Reza. 2015. “A Faceted Lightweight Ontology for Earthquake Engineering Research Projects and Experiments.” In Experimental Research in Earthquake Engineering: EU-SERIES Concluding Workshop, ed. Fabio Taucer and Roberta Apostolska. Geotechnical, Geological and Earthquake Engineering 35. Springer: Cham, 11-19. doi:10.1007/978-3-319-10136-1_2
Imam, Fahim T., Stephen Larson, Jeffery S. Grethe, Amarnath Gupta, Anita Bandrowski and Maryann E. Martone. 2012. “Development and Use of Ontologies Inside the Neuroscience Information Framework: A Practical Approach.” Frontiers in Genetics 3: 111.
Kharbat, Faten and Haya El-Ghalayini. 2008. “Building Ontology from Knowledge Base Systems.” In Data Mining in Medical and Biological Research, ed. Eugenia G. Giannopoulou. [N.p.]: IntechOpen. https://www.intechopen.com/download/pdf/5906
Khoo, Christopher S. G., Syin Chan and Yun Niu. 2000. “Extracting Causal Knowledge from a Medical Database Using Graphical Patterns.” In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, Hong Kong, October 03-06, 2000. Stroudsburg, PA: Association for Computational Linguistics, 336-43. doi:10.3115/1075218.1075261
Khoo, Christopher S. G. and Jin-Cheon Na. 2009. “Issues in Ontology Design for a Clinical Decision Support System.” In Asia-Pacific Conference on Library & Information Education and Practice (A-LIEP 2009). Tsukuba, Japan: University of Tsukuba. 127-36.
Khoo, Christopher S. G., Jin-Cheon Na, Vivian Wei Wang and Syin Chan. 2011. “Developing an Ontology for Encoding Disease Treatment Information in Medical Abstracts.” DESIDOC Journal of Library & Information Technology 31, no. 2: 103-15. doi:10.14429/djlit.31.2.8620
Köhler, Sebastian, Nicole A. Vasilevsky, Mark Engelstad, Erin Foster, Julie McMurry, Ségolène Aymé, Gareth Baynam, Susan M. Bello, Cornelius F. Boerkoel, Kym M. Boycott, Michael Brudno, et al. 2016. “The Human Phenotype Ontology in 2017.” Nucleic acids research 45, no. D1: D865-76. doi:10.1093/nar/gkw1039
Lee, Chew-Hung, Christopher Khoo and Jin-Cheon Na. 2004. “Automatic Identification of Treatment Relations for Medical Ontology Learning: An Exploratory Study.” In Knowledge Organization and the Global Information Society: Proceedings of the Eighth International ISKO Conference 13-16 July 2004 London, UK, ed. Ia C. McIlwaine. Advances in Knowledge Organization 9. Würzburg: Ergon, 245-50.
Lee, Chew-Hung, Jin-Cheon Na and Christopher Khoo. 2003. “Ontology Learning for Medical Digital Libraries.” In 6th International Conference on Asian Digital Libraries, Kuala Lumpur, Malaysia, December 8-12, 2003, ed. Tengku Mohd Tengku Sembok, Halimah Badioze Zaman, Hsinchun Chen, Shalini R. Urs, and Sung-Hyon Myaeng. Lecture Notes in Computer Science 2911. Berlin: Springer, 302-5. doi:10.1007/978-3-540-24594-0_29
Liaw, Siaw-Teng, Jane Taggart, Hairong Yu, Simon de Lusignan, Craig Kuziemsky, and Andrew Hayen. 2014. “Integrating Electronic Health Record Information to Support Integrated Care: Practical Application of Ontologies to Improve the Accuracy of Diabetes Disease Registers.” Journal of Biomedical Informatics 52: 364-372. doi:10.1016/j.jbi.2014.07.016
Murugavalli, S. and V. Rajamani. 2006. “A High Speed Parallel Fuzzy C-Mean Algorithm for Brain Tumor Segmentation.” BIME Journal 6, no. 1: 29-33.
Murugavalli, S., and V. Rajamani. 2007. “An Improved Implementation of Brain Tumor Detection Using Segmentation Based on Neuro Fuzzy Technique.” Journal of Computer Science 3, no. 11: 841-6. https://thescipub.com/PDF/jcssp.2007.841.846.pdf
Naskar, Debashis and Biswanath Dutta. 2016. “Ontology and Ontology Libraries: A Study from an Ontofier and an Ontologist Perspective.” In 19th International Symposium on Electronic Theses and Dissertations (ETD 2016): “Data and Dissertations”, Villeneuve d’Ascq, France, July 2016. https://hal.archives-ouvertes.fr/hal-01398427/
Polavaram, Sridevi and Giorgio A. Ascoli. 2017. “An Ontology-Based Search Engine for Digital Reconstructions of Neuronal Morphology.” Brain Informatics 4, no. 2: 123-34. doi:10.1007/s40708-017-0062-x
Puri, Colin A., Karthik Gomadam, Prateek Jain, Peter Z. Yeh and Kunal Verma. 2011. “Multiple Ontologies in Healthcare Information Technology: Motivations and Recommendation for Ontology Mapping and Alignment.” In 2nd International Conference on Biomedical Ontology ICBO, Buffalo, NY, USA, July 26, 2011, ed. Olivier Bodenreider, Maryann E. Martone, and Alan Ruttenberg. http://ceur-ws.org/Vol-833/paper70.pdf
Prieto-Díaz, Rubén. 2003. “A Faceted Approach to Building Ontologies.” In Proceedings Fifth IEEE Workshop on Mobile Computing Systems and Applications, Las Vegas, NV, USA, October 27-29, 2003. Washington, DC: IEEE, 458-65. doi:10.1109/IRI.2003.1251451
Ranganathan, S. R. 1937. Prolegomena to Library Classification. Madras Library Association.
Ranganathan, S. R. 1989. “Colon Classification.” 7th ed. rev. and ed. M.A. Gopinath. Bangalore: Sarada Ranganathan Endowment for Library Science.
Reich, Christian, Patrick Ryan, Rimma Belenkaya, Karthik Natarajan, and Clair Blacketer. 2017. “OMOP Common Data Model v5.1 Specifications.” https://github.com/OHDSI/CommonDataModel/wiki
Rios-Alvarado, Ana B., Ivan Lopez-Arevalo, Edgar Tello-Leal and Victor J. Sosa-Sosa. 2015. “An Approach for Learning Expressive Ontologies in Medical Domain.” Journal of Medical Systems 39, no. 8: 75. doi:10.1007/s10916-015-0261-z
Rodriguez-Castro, Bene, Hugh Glaser and Leslie Carr. 2010. “How to Reuse a Faceted Classification and Put It on the Semantic Web.” In The Semantic Web - ISWC 2010: 9th International Semantic Web Conference, ISWC 2010, Shanghai, China, November 7-11, 2010, Revised Selected Papers, Part I, ed. Peter F. Patel-Schneider, Yue Pan, Pascal Hitzler, Peter Mika, Lei Zhang, Jeff Z. Pan, Ian Horrocks and Birte Glimm. Lecture Notes in Computer Science 6496. Berlin: Springer, 663-678. doi:10.1007/978-3-642-17746-0_42
Shepherd, Michael and Tara Sampalli. 2012. “Ontology as Boundary Object.” In Categories, Contexts, and Relations in Knowledge Organization: Proceedings of the Twelfth International ISKO Conference, 6-9 August 2012, Mysore, India, ed. A. Neelameghan and K. S. Raghavan. Advances in Knowledge Organization 13. Würzburg: Ergon, 131-7.
Vickery, B. C. 1968. Faceted Classification: A Guide to Construction and Use of Special Schemes. London: Aslib.
Yee, Ka-Ping, Kirsten Swearingen, Kevin Li, and Marti Hearst. 2003. “Faceted Metadata for Image Search and Browsing.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Florida, USA, April 05-10, 2003. New York: ACM, 401-408. doi:10.1145/642611.642681

Knowl. Org. 46(2019)No.3 199 R. A. Moeller and K. Becnel. “Why on Earth would we not Genrefy the Books?”

“Why On Earth Would We Not Genrefy the Books?”: A Study of Reader-Interest Classification In School Libraries Robin A. Moeller*, Kim E. Becnel** Appalachian State University, Box 32086, Leadership and Educational Studies, 151 College Street, Boone NC 28608, *, **

Robin A. Moeller is an associate professor of library science at Appalachian State University in Boone, North Carolina and a former school librarian. She received her PhD in curriculum studies from Indiana University, Bloomington. Her research interests are visual representations of information as they apply to youth and schooling, as well as the reading habits and interests of children and young adults.

Kim Becnel is an associate professor of library science at Appalachian State University in Boone, North Carolina and a former youth services librarian. She earned her PhD in literature and language from the University of South Carolina and currently teaches and researches in the areas of youth literature and literacy and the intersection of technology and pedagogy.

Moeller, Robin A. and Kim Becnel. 2019. “Why on Earth would we not Genrefy the Books?” A Study of Reader-Interest Classification in School Libraries.” Knowledge Organization 46(3): 199-208. 23 references. DOI:10.5771/0943-7444-2019-3-199.

Abstract: Through their work as instructors in a master of library science program, the authors observed a sharp increase in students’ desire to adopt the reader-interest classification approach of genrefication for their school libraries’ fiction collections. In order to better understand this trend, the researchers interviewed seven school librarians regarding their motivations for genrefying their libraries’ fiction collections; the challenges they encountered during or after the genrefication process; and any benefits they perceived as having resulted from the implementation of genrefication. The data suggest that the librarians’ interests in genrefication stem mostly from the lack of time they have to help individual students find materials, and the lack of time students are given out of the instructional day to explore the libraries’ fiction collections. The participants felt that reclassifying the library’s fiction collection by genre gave students more ownership of the fiction collection and allowed them to find materials that genuinely interested them. The significant challenges the librarians faced in the reorganization process speak to challenges regarding the ways in which librarians attempt to provide access to diverse materials for all patrons.

Received: 11 January 2019; Revised: 10 April 2019; Accepted: 11 April 2019

Keywords: students, fiction collections, librarians, genrefication

1.0 Introduction

While the majority of library collections follow an established classification system, school librarians are taking a different approach to organizing their physical material collections with the hope that in doing so, they create a collection that is easier and more welcoming for students to use. Genrefication is a specific approach to library collection organization, which departs from the traditional approach of classifying library materials by their Dewey Decimal Classification (DDC) numbers into what Martinez-Ávila (2016, 234) called, “reader-interest classification.”

The process of genrefication is one in which the school librarian organizes the collection by subject rather than discipline, which Melvil Dewey used to develop his numerical organization system in 1876. The Follett Corporation (2019), a major school library materials and organization services distributor, noted that genrefication “is an increasingly popular way to support literacy efforts and engage school library readers.”

Librarians may choose to genrefy their entire collections, but many opt to focus solely on their fiction collections, as fiction is the least specifically organized in terms of the DDC. Historically, however, there have also been movements toward reader interest classification (RIC) of non-fiction collections (Martinez-Ávila 2017). The genres by which materials are classified are sometimes identified by vendor guides or by using other systems of classification, such as Metis or the Book Industry Standards and Communications (BISAC) Subject Headings List. Most often, genres are identified by the school librarian, sometimes with assistance from school community stakeholders, and are ideally reflective of the needs of the patron community. For example, librarians might use students’ terminologies to develop specific genres, such as “scary” instead of “horror,” or “love” instead of “romance.” They may also ask teachers to identify a popular assignment that requires students to find a specific type of book in the library, and that assignment name will become a genre label. Organizing the fiction collection in the traditional Dewey or Library of Congress Classification styles requires students looking for books to understand and use a “language” of pre-determined subject headings in order to search a database to find books they might be interested in reading. As Snipes (2015, 29) noted, “The use of a numbering system leaves little room for questions whereas a qualitative system of topic names is much more concrete in coverage.” In short, genrefying the fiction collection allows students to go directly to a section of materials that may interest them, but the question remains, how effective is this organizational approach for school libraries?

2.0 Literature review

Historically, RIC systems were seen as having developed as part of the user-orientation movement in library and information science and reflected the profession’s shift in focus on the accommodation of the patrons as opposed to the standards of librarianship (Martinez-Ávila and San Segundo 2013). As with any other type of patron, students and teachers in public schools often lack the “language” required of adept users of traditional classification systems, such as the DDC. Of such systems, Betts (1982, 63) wrote,

in creating a logical set of relationships between “subjects,” [systems] fail to take account of the (changing) interests which lead people to approach those subjects. Interests cross logical boundaries (as do books themselves at times) with the consequence that books which readers would wish to access by interest are often widely and inconveniently separated on shelves and in some instances one or other sequence may never be found. Conversely, books appear together on the shelves which have no relationship other than a formal academic one, to the benefit of no one in particular. The positive corollary of all this is that books should be grouped to reflect the actual or potential interest relationship between them, even if this means fragmenting the traditional classification sequence.

Truly, genrefication seeks to fragment the traditional DDC sequence and, arguably, fragment the traditional relationship of the patron with the collection. Martinez-Ávila (2017, 234) described RIC as “a more suitable arrangement for the reader because it … is more intuitive to use.” Martinez-Ávila, writing alone (2017) and with San Segundo (2013), further discussed how RIC systems became popular in the 1980s because of the perceived usefulness for the end user (the patron), and that the physical arrangement of the fiction collection was more important than classification, in that related aspects such as signage were imperative to the success of the re-organization.

Perhaps in response to the popularity of RIC in the 1980s, Sharon Baker’s research at this time focused, in part, on the use of RIC approaches for fiction collection organization. Baker and Shepherd (1987) surveyed historical literature regarding RIC and fiction collections and found that five essential principles developed to drive RIC for fiction. These included: 1) the notion that classification should make finding materials of interest easy for users; 2) that any subdivisions that might help users find material of interest should be utilized in classification; 3) the notion that classification itself should help expose users to authors’ works that might otherwise be overlooked; 4) fiction classification approaches should maintain an interfiled collection, rather than separating the collection into smaller subsections; and, 5) fiction classification approaches should not separate the works written by the same author. The latter two principles have been disputed amongst practitioners in the library community. Baker and Shepherd also analyzed five historical works that examined the extent to which readers found classified collections that utilized these principles to be helpful in finding reading material. While the studies reported that the classification systems were successful, Baker and Shepherd stressed the importance of further research about RIC for fiction collections.

Writing on her own, Baker (1988) further reported on her own examination of RIC for fiction collections in public libraries, in which she specifically sought to determine the extent to which fiction classification is helpful for readers to find books they’re seeking out, as well as authors they might have otherwise overlooked; and whether the size of a library’s collection and the method of fiction classification has an impact on the perceived success of the classification system. Using data from three public libraries, Baker found that fiction classification did help readers find books they wanted and introduced them to new authors. She also found that fiction classification is significantly more helpful to patrons when titles are physically grouped together, rather than simply labeled according to genre, and that only collections identified as “too large” need to institute “extra selection guidance” in the form of fiction classification (374).

Indeed, other institutions have benefited from this type of fiction classification. The seemingly simplistic nature of finding books and authors of interest by genre is a notion that has been adopted by the retail bookstore, where customers browse the shelves, looking for books by the topic under which they have been filed (Martinez-Ávila 2017). While some may consider this a positive aspect of genrefication, Pendergrass (2015) argued that retail bookstores sort their books by genre in order to force customers to browse to find the book they need so that they may find and purchase additional materials in the process. She suggested that the time it takes students to find the resource they need is valuable, and that students don’t have the luxury to peruse the collection to find what they need. Pendergrass also identified a common argument against genrefication: where there is a lack of consistent, uniform classification, patrons can become confused when trying to find material in other libraries. In addition to maintaining consistency, Pendergrass argued that school librarians should continue to classify their collections using the DDC, because the fluid nature of student populations requires regular reconsideration of established genres in the collection. Additionally, she noted that changes in school library staffing could create confusion and further need for reorganization. In other words, one librarian’s interpretation of how a book fits into a genre could very well be different than the next librarian’s interpretation. Pendergrass (2015) and Snipes (2015) also discussed how the intense time commitment required of school librarians to reorganize a collection takes away from pressing obligations such as student instruction and collaboration. While genrefication seems to be a simple way of connecting patrons to their reading interests, it may also be limiting for those materials that have a more complex story structure. For example, the novel series “Twilight” could be genrefied as horror, romance, suspense, gothic, coming-of-age or several other topics. Genrefication allows this novel to be classified by just one of these genres. This classification approach relies on potential readers being open to exploring multiple genres to find titles that might interest them. If readers commit to exploring only one or two genre sections, they might miss titles that cross genres and are difficult to classify.

Alternately, LaGarde (2015) argued that the idea that libraries need to follow the same method for collection organization is an outdated idea that needs to be replaced with the recognition that school libraries should be responsive to their students’ needs. She also noted that genrefication uses the terminology of patrons, not of library professionals, and that children are the ones who are using the library, not library professionals. Snipes (2015) reported that school librarians who support genrefication do so because they feel the process results in a collection that is student-centered, easier to browse, provides a closer arrangement of fiction and nonfiction, addresses and supports changes in the curriculum, exposes students to new authors, and enables the school librarian to become more familiar with the collection.

While benefits and drawbacks of genrefication have been written about by many practitioners (for example, see Jameson 2013; Kaplan et al. 2013; Miller 2013; Rodgers 2018), there is a dearth of contemporary empirical evidence that examines the effectiveness of genrefication for library patrons. In our practice as professors who have taught a master of library science (MLS) action research course, we continually noted the absence of research about genrefication. The topic of genrefication was a popular research topic for students who studied school librarianship, because action research requires the researcher to identify a problem of practice, develop an intervention by which to address the problem, and determine the extent to which the implementation was successful. Many of our students felt that enabling students to better find reading material of interest was a problem worthy of focus. With very little empirical research to inform their study, students were left to rely on the “best practices” of other librarians to design their own course of action and determine how well it worked for their libraries’ patrons. The purpose of this particular study is to examine how these practicing librarians approached genrefying their libraries’ fiction collections, and how they perceived the impacts of this process for their school stakeholders. Specifically, we wanted to know: 1) what are school librarians’ motivations for genrefying their libraries’ fiction collections; 2) what challenges did they encounter during or after the genrefication process; and, 3) what benefits do they perceive have resulted from the implementation of genrefication? The reason for classifying library materials is to provide patrons with a sort of standardized map, which allows them to systematically locate materials of interest. With genrefication seeming to be an increasingly prevalent organizational approach to fiction collections in school libraries, it is important to empirically examine the extent to which this approach is a successful method of systematic, methodical organization.

3.0 Methodology

As we had initially noticed an increase in the trend toward fiction genrefication amongst our graduate students, we asked a sample of those individuals who had conducted action research on genrefying their school libraries’ fiction collections, and who had graduated from our MLS program, to serve as our participants. Seven former students agreed to participate: two elementary school librarians (serving students ages five through eleven); three middle school librarians (serving students ages eleven through fourteen); and two high school librarians (serving students ages fourteen through eighteen). Of these seven participants, three had been practicing for two years, three had practiced for two and a half years, and one had served as a practicing librarian for four years. In preparation to gather data for their initial action research projects that they implemented in their final semester of their graduate MLS program, the librarians were required to first reorganize their fiction collections according to genre. Each librarian developed her own reorganization strategy and schedule, as well as her own set of genres, and determined the extent to which she would organize only parts of the fiction collection or the fiction collection as a whole.

choose a book stemmed from the feeling the librarians expressed regarding the tight schedules imposed by teachers. One participant described teachers’ approaches to exploring library fiction collections by saying, “We want them to go in and accomplish our task and then get out.” For the participants, this perceived lack of time for browsing or for purposefully navigating the collection using an online catalog resulted in students choosing books that they may not ultimately enjoy. One participant described how she sees this played out in her high school library. She de-
Once the reorganization was complete, the li- scribed how a teacher told her students: brarians gathered data to answer each of their own individ- ual research questions related to the genrefication of their “Class, you have 10 minutes to pick out a book.” And libraries’ fiction collections. At the point at which we en- we have over 16,000 books in our collection and it gaged the librarians as research participants, they had each just seemed like a hopeless cause, watching these kids completed their genrefication process and had at least four walk through the shelves just randomly pulling some- months of implementation to reflect on the process and de- thing out, looking at the cover and deciding based on termine the extent to which they found their genrefication- just the cover what they wanted, knowing nothing else related efforts successful. Each participant genrefied only about it .... With high school, they have so little time their fiction collection. Additionally, each of the participants in that library and I just thought there has to be a chose to continue classifying their collection according to better way. DDC but arranged materials by genre. We conducted semi-structured individual interviews Another concern regarding the minimal time students had with each participant in order to better understand her mo- to explore the library’s fiction collection was the lack of tivations for experimenting with genrefying the fiction col- time the participants had to provide students with reader’s lection and to understand the extent to which she found advisory services. One librarian noted, “I quickly realized the experiments to be “successful.” After transcribing the that a lot of the students would ask for the sports books, interviews, each researcher approached data analysis look- and where are the scary books, where are the animal books, ing for themes that emerged from the data set. 
After devel- and Dewey Decimal really lent itself to answer some of those oping their own sets of codes, the researchers came to- questions, but when we got to the fiction section, I could gether to discuss their results. Differences between the suggest authors, like Matt Christopher, but I couldn’t point codes identified by each researcher were discussed and re- them in necessarily the direction of the sports books.” solved so that agreement was reached regarding the signif- Similarly, the participants mentioned their own focus on icant themes that emerged from the participant data. One trying to help students develop and maintain an interest in example of a disagreement was the question of the extent pleasure reading. One librarian noted that she was, “trying to which librarians were able to offer more in-depth to keep the kids reading. Trying to keep that interest in reader’s advisory services to students because of the revised reading. The harder it is to find a book, the less likely organization of the library. One researcher concluded that they’re going to want to read it.” Many of the participants librarians expressed that they felt they had more time for noted how their students are transitioning from searching reader’s advisory, whereas the other researcher failed to see for reading material by level, as dictated by the use of com- this in her analysis. Returning to the data, it became evident mercial reading programs, such as Accelerated Reader to both researchers that a desire for more time for reader’s (AR). The participants observed that without the use of advisory was, indeed, a relevant finding. levels, students seemed to be struggling to find reading ma- terial. One librarian said: 4.0 Results They come into the library now, it shocks them ... 
I 4.1 Reasons for genrefying the fiction collection cannot tell you how many [students] I have had say, and input into approach “you know, this library is so different from the mid- dle school.” And I’ll say, “what is different about it?” Overwhelmingly, the participants said that their primary And they’ll start talking about AR and how in AR reason for genrefying their fiction collections was to help they could only look at certain shelves and here they students make quick connections to books they might find can look at all the shelves and they’ll even ask me, enjoyable. This focus on the time it took for students to “can I go to any of the books over there?” So its Knowl. Org. 46(2019)No.3 203 R. A. Moeller and K. Becnel. “Why on Earth would we not Genrefy the Books?”

    amazing to watch them cuz it’s like such a privilege to them.

Several participants indicated that another reason for genrefying their fiction collection was the pressure or suggestions they received from their school’s administrators or teachers to do so. One participant described:

    I will say that administration and teachers had mentioned [genrefying the fiction collection] and thought that it was one way to help students .… A lot of [teachers] have classroom libraries and they had set theirs up like how they do at the bookstores and I think they spoke to administration and so … they have had a lot of success with getting more students interested in reading with having it set up like that in their classroom.

Another participant described how a particular assignment given by a specific teacher helped encourage her to genrefy her fiction collection. She explained:

    Our sixth grade teacher does the 40 book challenge [an assignment that challenges students to read 40 books throughout the school year], and does it by genre, so it was going to be helpful to them, so there were kind of a number of reasons that I thought this was the best direction to go.

Many of the librarians also approached the decision to genrefy their fiction collection with the sense that doing so would help empower students to be able to select materials in a way that made sense to them. One participant described:

    The kids would come, asking, where’s a good mystery, I want realistic fiction, you know, they were sort of asking for it. So it seemed to make the most sense, if the students are asking for ... they don’t really realize that they’re asking for it, but that’s what they’re asking for, to set the library up such that they can find them a lot easier.

Another librarian said, “Students tell me they’re used to genres, like in their music cloud, social media groups, and gaming online choices. Even Netflix movies are genrefied.” Another participant put it succinctly, “everything else in these kids’ lives is genrefied. Why on earth would we not genrefy the books?”

When asked what sources of information informed their approach to genrefication, the participants overwhelmingly identified advice sought from professional peers and librarian blogs to have been the most helpful. Input from school stakeholders was also considered to be important information to consider when deciding how they would genrefy. The librarians indicated that they would have conversations with teachers about their thoughts regarding genres their students were interested in reading. Additionally, several participants sought the help of students in deciding which genres they would implement for their fiction collections.

4.2 Benefits of genrefying the fiction collection

Most participants indicated that a benefit of genrefication was the decreased time it took for students to locate a book of interest to them. One librarian said, “Students make comments about how much easier it is to locate books and have really enjoyed new favorite sections based on the labeling.” The participants also indicated that students are learning about, discussing, and engaging genres that they hadn’t previously. One participant noted, “I think [genrefication] opens students’ eyes up to the fact that there’s more out there than AR.” Additionally, the librarians observed that students were engaging in reading more as a social activity by talking to each other about what they were reading. One participant noted, “I think that [genrefying the fiction collection] makes the library more of a community hub.” Another said:

    There’s a lot more conversation between students about books because, you know, they’ll say, you like mystery books? Well there’s a good one over here I’ve read. So getting students involved in reading the books is the biggest positive.

The participants also acknowledged that the reorganization of the fiction collection imbued students with a sense of empowerment. One librarian explained:

    I think [students] feel a little bit like their voice is heard a little bit more. Like they can see there are things over here that I like. Like, I know I like mystery books and I can see visually now that there are mystery books here for me.

In reference to choosing books, another participant said:

    Now if they can do that without having to come to me. Like if they’re nervous or shy or uncomfortable doing that, they’ve got a place they can go that they could find without having to do that if they’re not quite comfortable.

Another participant made a similar observation in saying, “This gives them independence...they can wander, just kind of peruse, you know, and look for something.”

Nearly all of the librarians discussed the visual nature of genrefication and how those visuals have helped students locate books with greater ease. One participant described:

    It’s more visual I think to them. Like, those books were always there, but maybe they didn’t realize how many there were. We do have a pretty big section of horror and suspense or sports or whatever. It maybe used to get lost a little bit; they’d get all mixed up in everything and so there’s some things over there, they’ve always been there that speak to them, but now they can see them a little bit better .… They’re not hidden, all mixed in together.

Another participant said, “Students need very little direction from me once they become familiar with the layout to find the books they are interested in.” One librarian described how the visual nature of genre labels had been combined with the traditional Dewey author classification approach to provide a more effective location system:

    Before we just have fiction and like A-De or what not. While they are still organized by last name in the genre section, there is less to dig through and it’s less overwhelming. I think they are using the call number initials now even more than before. We always taught it but they would come in months later and still not know. I have not had anyone ask lately how to find the name or what it means.

The participants also saw benefits of genrefication for themselves. Most indicated that they held a desire for more quality time to engage students through in-depth reader’s advisory services, and that reorganizing the fiction collection in a way that gave students more ownership allowed them time to do just that. One librarian explained:

    When asked, “how’s it working?” I really enjoy it ... it’s really opened up my time as the librarian to help those reluctant readers find books because the kids that know what they like to read immediately know where to go and don’t need my help so I can spend my time with the kids that don’t love to read and I have to really dig in and find a book for them.

Another noted, “I’m able to have more conversations with kids about the books because we’re able to have more of an idea of what they like.” One participant described how, when the fiction collection was organized by Dewey Decimal Classification, she would have to send a student to look for books by a particular author, with the hope that the student would find a title by that author that would appeal to them. With a genrefied fiction collection, she said the difference is, “I don’t have to sell an author to them. I can sell a book to them.” The vast majority of the participants also noted that the process of genrefication allowed them to develop a much broader and deeper understanding of their fiction collection, which allowed them to identify gaps in their fiction collection as well as materials that needed to be deselected. These participants considered this outcome to be a significant benefit to the genrefication process. Interestingly, one participant in particular viewed the process of creating and maintaining a genrefied fiction collection through the lens of marketing. She noted, “I have a communications major and I knew that marketing matters and I knew that nothing was being marketed so I wanted to address the marketing issue .… We don’t market to try to bring readers in at all.”

4.3 Challenges associated with genrefying the fiction collection

Those challenges the participants identified as being associated with genrefying the fiction collection mostly referred to library administrative tasks. Specifically, the significant amount of time involved in the actual reorganization of the fiction collection was overwhelmingly identified by the librarians as the biggest challenge. Other administrative challenges the participants identified were changing item locations in the cataloguing and circulation system, defining and deciding which genres to use in the fiction collection, budgeting for processing materials, and classifying each item into a genre. With regard to deciding how to assign a genre to a book, one participant explained:

    I had to figure out where to go to find those answers, because sometimes you could read the back of the book and make that decision but other times you’re left guessing so I used a lot of Goodreads and Amazon reviews and tried to make my best judgment. I ended up moving books after I had genrefied them because I realized they were in the wrong spot.

Another participant noted, “our books are all over the place.” A few participants noted that they started using too many genres, which proved to be overwhelming to the librarians.

Participants were also challenged with questions regarding appropriate genres for their patron population. One librarian explained:

    We considered doing an urban section, or, like, do we do an LGBTQ section but then do you really want to call those groups out and separate them. They’re just part of the regular section. I didn’t want

    to, like, ostracize that so. You know, I have had some African American students look for some, like, the [unintelligible] novels, and ... I think, god, should we have done it but then I don’t know.

Similarly, another participant described how she struggled with genre-defining decisions:

    I’ve heard different things. Like, with the multicultural ... a lot of times the kids didn’t go to that section to read. They would have read those books, they would have been more likely to pick those books if they had been in Realistic [genre section] than pulled out separately.

A different participant cited a gender-specific example: “One of the librarians in the past had done, like a boy genre and a girl genre; don’t recommend that. There are no books for just boys and no books for just girls so stay away from that kind of stuff.” Another challenge for participants was deciding what should be genrefied. All but one of the participants genrefied their fiction collections, but many wondered if they should be turning their attention next to their libraries’ non-fiction collections and how their approach to genrefying that collection might look.

One challenge particular to students that emerged from the data was their disuse of the library catalog. Participants noted that, with the increased age of computers, and the time it took students to learn how to use the catalog, it became easier to eschew the use of the catalog altogether and rely solely on genre location. One librarian explained, “I had a teacher in here yesterday that said, ‘Nobody on Destiny [catalog interface]. We don’t have time. Just go find a book.’” Students’ inability to use the catalog prevented them from searching for a particular title, in the instance that they had ideas as to specific books they might want to read. This proved to be a challenge for the librarians as well, who took the time and care to properly maintain the catalog, ensuring that it would accurately reflect their libraries’ holdings. They had also carved out instructional time to teach students how to use the catalog. These efforts to maintain and teach students how to use a valuable tool to locate materials in the library seemed to go to waste, as students were instructed to only browse the fiction collection.

5.0 Discussion

The decision to adopt and implement a new classification system for one’s school library fiction collection is significant, as it has impact on not just the library staff and space, but also the entire school community and how they learn about general library organization. Due to the weight of this decision and the dearth of research about genrefication, we specifically wanted to understand from this study: 1) what are school librarians’ motivations for genrefying their libraries’ fiction collections; 2) what challenges did they encounter during or after the genrefication process; and, 3) what benefits do they perceive have resulted from the implementation of genrefication? The data present a picture of genrefication as a dynamic process constantly evolving to meet the needs of a fluid school community.

The participants’ motivations reflected those represented in the user orientation resurgence of the 1980s, which was to empower the patron with regard to their own information needs (Martinez-Ávila and San Segundo 2013). The participants’ responses, however, suggested a new facet of the RIC approach, in that time was the primary factor for their decision to reorganize their fiction collections. In other words, by physically arranging the fiction collection into concepts and genres with which students were familiar, librarians were empowering students to find books of potential interest more easily than they were able to do with the fiction collection being organized by DDC. Through their own observations and interactions with other school stakeholders, the participants identified that, in using DDC, students did not have the ability to explore the fiction collection and/or find reading material that interested them. These observations and interactions suggest that classroom teachers feel pressured for instructional time and do not feel they can prioritize students browsing the library fiction collection for reading material. While school librarians also participate in that instructional time, they have the additional charge of helping students develop an appreciation of lifelong learning, of which finding enjoyment in reading is a part. Thus, a tension existed between ensuring that instructional time was maximized for student learning and allowing students the freedom to explore library materials to help develop their interests and understanding of the world. The participants’ decisions to genrefy their fiction collections seemed to be, in part, a response to this tension, as a way to provide students with the opportunity to do more focused browsing in a short amount of time. Several researchers (Raqi and Zainab 2008; Reuter 2008; Montgomery 2014) have noted the importance of patrons browsing for materials when they are choosing a book to read. While the participants could not change the culture of the school, they could change how students interacted with the fiction collection. Participants also discussed how the demands on the school librarian’s time are such that they are often away from the physical collection, thereby leaving students to “fend for themselves” with regard to finding a book that may interest them. Through the connections they made between literary genres and those genres presented via music and video streaming services, the participants felt that they were appealing to students through their own tools and language to provide a kind of self-reader’s advisory service.

The most prevalent benefit identified by the participants was in response to their impetus for genrefying in the first place: reorganization had decreased the time students needed to explore the fiction collection and find a book they wanted to read. Not only did the students save time, but the librarians also discussed how they were able to be more judicious with their own time. They were able to give more extensive help to students who needed it, whether it was through reader’s advisory services or through technical or account assistance. The participants also discussed how genrefication gave students the opportunity to enjoy the fiction collection. They felt that students quickly gained a familiarity with their libraries’ fiction collections to the point that they could work independently and, thus, explore the fiction collection more thoroughly than before. Through the use of the students’ own terminology, or the terms they use to describe types of stories, and familiar physical organization, genrefication seemed to impact the way in which students learned about, considered, and discussed genres. Indeed, the participants reported that they observed more social engagement about reading between students and were more often engaged by students themselves to talk about their reading. Several researchers (for example, Guthrie et al. 1995; Smith and Wilhelm 2002; Baker and Wigfield 1999) have found social engagement to be a strong motivational factor in children’s decisions to read for pleasure. In addition to the social engagement observed by the librarians, students provided each other with recommendations of books, based on what they had read and the expressed interests of other students. In a sense, the students provided reader’s advisory services for each other, which suggests they felt confident in their reading and suggestions. This phenomenon also indicates that the students felt a kind of ownership or authority about the fiction collection. Perhaps one of the elements of genrefication that made this sense of authority possible was the visual organization of materials. The participants had created genre sections with accompanying signage that made sense to the students, by using a language and organizational scheme with which they were already familiar from other formats of information such as video and music apps. As Martinez-Ávila (2017, 235) noted, “The way books are physically arranged and how classes are displayed within the system have always been among the most important aspects of reader-interest classifications.” As students relied solely on signage and knowledge of genre arrangement to choose books, the data support Martinez-Ávila’s (2017) assertion that fiction collection arrangement is more important than how materials are classified.

The challenges described by the participants focused largely on administrative tasks. The most onerous of these was the one-time reorganization of the fiction collection. While the effort of this task is not to be downplayed, all of the participants indicated that this undertaking was worth the effort. Perhaps most interesting were the librarians’ struggles with whether, and how, to consider facets of culture as genres. These struggles suggest a broader tension related to the librarians’ desire to connect students with stories in which they can see themselves as well as with stories about people unlike them who have experiences different from their own. In writing about one public library’s response to a challenge over LGBTQ material, Lechtenberg (2018) described how that library decided to transition to using the BISAC system to reclassify the collection. She described the dangers related to censorship when a library reclassifies its collection with the goal of steering patrons to or away from material, based on particular topics. Similarly, the librarians in this study realized the dangerous waters into which they were swimming when they considered creating genre labels based on race, sexuality, and gender. If they did choose to label books as such, the librarians might potentially risk reducing characters and the stories in which they’re featured into a single-faceted human experience, when most characters and stories are actually multi-faceted. This caution is supported by Martinez-Ávila, San Segundo and Olson (2014, 151), who encouraged interrogating BISAC and other classification systems with regard to:

    the socio-cultural aspects of the systems, including the misrepresentation of marginalized groups, and the consequences that these misrepresentations could have for the social construction of identities regarding such sensitive matters as race, religion, and gender studies.

The librarians’ decisions not to move forward with these labels suggest their realization that doing so would restrict some students from engaging in quality literature that they otherwise might have found rewarding.

6.0 Implications

The reason that school librarians are embracing genrefication is simple: they feel that students are struggling to find fiction books that appeal to them. This finding begets another: that these participants’ focus was on helping students become readers, not actually teaching students how libraries are traditionally organized or using the catalog to find materials. As noted by most of the participants, the technological difficulties related to the catalog, and the lack of time to teach students how to use it, greatly influenced their decision to genrefy their fiction collections. This finding suggests that the larger school library shift to organizing by genre is rooted in schools’ focus on reading, instead of teaching students how libraries are organized or how to use a catalog to locate specific information. In the current educational culture of standardized testing in which reading gains are closely examined, this approach makes sense. Another product of the current stringent testing environment is the lack of time teachers allow for students to visit and peruse the library fiction collection. For their part, school librarians have seen the effects of this shift and have essentially thrown an anchor to students. Examining this finding from another perspective raises concerns about how students will continually interact with library collections. If students don’t learn in their formative years how libraries systematically classify information, how successful will they be in progressive schooling years? Will they know how to use library catalogs to find specific information? What are the consequences if they don’t learn in primary or secondary school? Additionally, similar research should examine the differences in genrefication approaches between librarians serving different age groups, a factor which was outside the scope of this study. Such research may expose unique challenges or considerations in librarians’ ability to create better access to reading materials for their students.

The data from this study suggest that physical organization and visual signage are imperative to the success of genrefication. As has been noted previously, reorganization, rather than reclassification, proved to be the pivotal factor in the participants’ experiences with genrefication. Whichever way librarians choose to classify and organize a collection, it is important to consider, as Martinez-Ávila (2017) wrote, that there are no objective organizational strategies. One genre identified by a librarian may mean something different to a patron or even another librarian. In other words, organization and classification systems tend to privilege one reader over another by using written and visual signifiers that are more familiar to certain readers and not all readers. That said, the data from this study show that students are taking ownership of the genrefied fiction collections in their school libraries by providing reader’s advisory services to other students. This practice suggests that students are becoming very familiar with the nature of items classified under specific genres, and where those items of interest are located within the fiction collection. The students’ ability and desire to provide reader’s advisory services to other students suggests that these students are becoming experienced and frequent users of the library’s fiction collection.

The reason that librarians classify and organize information is to provide patrons with a systematic method of finding information they need. This takes on a different meaning in the context of how librarians use this theory in education. As Martinez-Ávila, San Segundo and Olson (2014) noted in their analysis of BISAC as a new case of RIC, the benefit of such an approach is that it is supported by a centralized organization that has developed standards for this type of classification. The genrefication approach that is described in this paper adheres to no such centralization. Questions about genres, including whether or not they should be common, who decides what is a genre, and who is included in which genres, are questions that should be considered as school librarians move forward with genrefication. Currently, there is no common system of genres for school librarians to use when reorganizing and reclassifying, but booksellers such as Follett may change that (Follett Corporation 2018). Recently, this company formed an advisory board to help guide school librarians on how to genrefy their collections, including mutually agreed-upon genre standards that they could adopt for their own collection needs. While this would make classification of materials much easier for librarians, the data from this study suggest that local school community considerations were important for the participants in terms of the genres they selected. Pre-established genres may or may not assist librarians in reaching the needs of their students. Martinez-Ávila (2017, 65), in his description of the local versus global interests of RIC, suggested that, historically, locally developed RIC approaches “ended up as individual practices that were hard to standardize and reuse in subsequent projects” or, perhaps, with changing library administration, while those approaches developed with a global interest in mind ultimately failed patrons whose interests did not match those established by the centralized body. Researchers should continue to follow the development and implementation of genre standards in RIC of school library fiction collections in order to better understand if and how such an approach meets the reading needs of students.

References

Baker, Linda and Allan Wigfield. 1999. “Dimensions of Children’s Motivation for Reading and Their Relations to Reading Activity and Reading Achievement.” Reading Research Quarterly 34: 452-77.
Baker, Sharon L. 1988. “Will Fiction Classification Schemes Increase Use?” RQ 27, no. 3: 366-76.
Baker, Sharon L. and Gay W. Shepherd. 1987. “Classification Schemes: The Principles Behind Them and Their Success.” RQ 27, no. 7: 245-51.
Betts, Douglas. 1982. “Reader Interest Categories in Surrey.” In Alternative Arrangement: New Approaches to Public Libraries Stock, ed. Patricia Ainsley and Barry Totterdell. London: Association of Assistant Librarians, 60-77.

Follett Corporation. 2018. “Follett Forms Advisory Board to Help K-12 Schools Genrefy Libraries.” Accessed May 10. https://www.follett.com/news?articleid=15201.
Follett Corporation. 2019. “Genre Services.” Accessed March 5. https://www.follettlearning.com/professional-services/simplify/library-services/genre-services
Fister, Barbara. 2009. “The Dewey Dilemma.” Library Journal 134, no. 16: 22-5.
Guthrie, John T., William Schafer, Yuh Yin Wang, and Peter Afflerbach. 1995. “Relationships of Instruction to Amount of Reading: An Exploration of Social, Cognitive, and Instructional Connections.” Reading Research Quarterly 30, no. 1: 8-25.
Jameson, Juanita. 2013. “A Genre Conversation Begins.” Knowledge Quest 42, no. 2: 10.
Kaplan, Tali Balas, Sue Giffard, Jennifer Still-Schiff, and Andrea K. Dolloff. 2013. “One Size Does Not Fit All.” Knowledge Quest 42, no. 2: 30-7.
LaGarde, Jennifer. 2013. “Five More Conversations [About School Libraries] That I Don’t Want to Have Anymore.” Collected Magazine 11: 5-6.
Lechtenberg, Kate. 2018. “Could Genre-Based Classification Limit Intellectual Freedom?” Intellectual Freedom Blog (blog), Dec. 30. https://www.oif.ala.org/oif/?p=13666.
Martinez-Ávila, Daniel. 2017. “Reader-Interest Classification: An Alternate Arrangement for Libraries.” Knowledge Organization 44: 234-46.
Martinez-Ávila, Daniel. 2017. “Reader-Interest Classifications: Local Classifications or Global Industry Interest?” In The Organization of Knowledge: Caught Between Global Structures and Local Meaning, ed. Jack Andersen and Laura Skouvig, Bingley, UK: Emerald, 51-69.
Martinez-Ávila, Daniel and Rosa San Segundo. 2013. “Reader-Interest Classification: Concept and Terminology Historical Overview.” Knowledge Organization 40: 102-14.
Martinez-Ávila, Daniel, Rosa San Segundo and Hope A. Olson. 2014. “The Use of BISAC in Libraries as New Cases of Reader-Interest Classifications.” Cataloging & Classification Quarterly 52, no. 3: 137-55.
Miller, Kristie. 2013. “Ditching Dewey.” Library Media Connection 31, no. 6: 24-6.
Montgomery, Barbara. 2014. “A Case for Browsing: An Empowering Research Strategy for Elementary Learners.” Knowledge Quest 43, no. 2: E5-E9.
Moreillon, Judy, Jana Hunt, and Colleen Graves. 2013. “One Common Challenge - Two Different Solutions: Stories from Two Libraries.” Knowledge Quest 42, no. 2: 38-43.
Pendergrass, Devona J. 2013. “Dewey or Don’t We.” Knowledge Quest 42, no. 3: 56-9.
Raqi, Syahranah A. and A. N. Zainab. 2008. “Observing Strategies Used by Children When Selecting Books to Browse, Read or Borrow.” Journal of Educational Media & Library Sciences 45, no. 4: 483-503.
Reuter, Kara. 2008. “Teaching Effective Book-Selection Strategies and Inspiring Engaged Readers in the Library Media Center.” Library Media Connection 26, no. 7: 18-20.
Rodgers, Linda. 2018. “Give Your Circulation a Lift.” School Library Journal 64, no. 7: 24.

Knowl. Org. 46(2019)No.3 209 W. Korwin and H. Lund. Alphabetization

Alphabetization† ††

Wendy Korwin*, Haakon Lund**

*119 W. Dunedin Rd., Columbus, OH 43214, USA
**University of Copenhagen, Department of Information Studies, DK-2300 Copenhagen S Denmark

Wendy Korwin received her PhD in American studies from the College of William and Mary in 2017 with a dissertation entitled Material Literacy: Alphabets, Bodies, and Consumer Culture. She has worked as both a librarian and an archivist, and is currently based in Columbus, Ohio, United States.

Haakon Lund is Associate Professor at the University of Copenhagen, Department of Information Studies in Denmark. He is educated as a librarian (MLSc) from the Royal School of Library and Information Science, and his research includes research data management, system usability and users, and gaze interaction. He has presented his research at international conferences and published several journal articles.

Korwin, Wendy and Haakon Lund. 2019. “Alphabetization.” Knowledge Organization 46(3): 209-222. 62 references. DOI:10.5771/0943-7444-2019-3-209.

Abstract: The article provides definitions of alphabetization and related concepts and traces its historical development and challenges, covering analog as well as digital media. It introduces basic principles as well as standards, norms, and guidelines. The function of alphabetization is considered and related to alternatives such as systematic arrangement or classification.

Received: 18 February 2019; Revised: 15 March 2019; Accepted: 21 March 2019

Keywords: order, orders, lettering, alphabetization, arrangement

† Derived from the article of similar title in the ISKO Encyclopedia of Knowledge Organization Version 1.0; published 2019-01-10. Article category: KOS general issues.

†† This article could not have been written without the engagement and help provided by the editor, Birger Hjørland. The authors also thank two anonymous peer-reviewers for valuable feedback.

1.0 Definitions and explanation

Alphabetization1 is a kind of ordering. The Oxford English Dictionary (Oxford University Press 2018) defines ordering: 1a: “a. To place in order, give order to; to arrange in a particular order; to arrange methodically or suitably.” Ordering may be understood in two ways:

1. arranging items in a sequence according to some criterion;2
2. categorizing: grouping items with similar properties.

It is the first of these meanings that is relevant in relation to the term “alphabetization.” Besides alphabetical order as ordering criterion, other criteria such as chronological or systematic may be used for arranging items in a sequence.3 Both these meanings of ordering are often used synonymously with “sorting,”4 although sorting is often preferred for mechanical procedures, such as sorting algorithms.5

In general, the most common uses of ordered (or sorted) sequences are:

– making lookup or search efficient;
– making merging of sequences efficient;
– enabling processing of data (http://www.isko.org/cyclo/data) in a defined order.

Alphabetization is the process of establishing the alphabetical order of a set of items based on their names or headings.6 Alphabetical order is the arrangement of items by sorting strings of characters7 according to their position in a given alphabet.

In addition to conventions for ordering letters, conventions for other characters such as numbers, symbols, ideograms, and logograms, and for typographical issues such as lowercase and uppercase letters, should be differentiated. The overall term for this is “alphanumeric arrangement.”
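The definition of alphabetical order as sorting strings by the position of their characters in a given alphabet can be made concrete with a small sketch. It is illustrative only: the function names `alphabet_key` and `alphabetize` and the sample words are hypothetical, and the Danish letter order (a to z followed by æ, ø, å, as noted in section 4.2) is used simply because it shows that the result depends on which alphabet is chosen.

```python
import unicodedata

# Minimal sketch: alphabetical order as "position in a given alphabet".
# All names and sample data here are hypothetical illustrations.

ENGLISH = "abcdefghijklmnopqrstuvwxyz"
DANISH = ENGLISH + "æøå"  # æ, ø and å are the last three letters


def alphabet_key(word, alphabet):
    """Map each character of `word` to its position in `alphabet`.

    Characters that are not in the alphabet are placed after all letters;
    this is an arbitrary choice made for the sketch, not a filing rule.
    """
    position = {letter: index for index, letter in enumerate(alphabet)}
    normalized = unicodedata.normalize("NFC", word).lower()
    return [position.get(ch, len(alphabet)) for ch in normalized]


def alphabetize(words, alphabet):
    """Return `words` arranged by the alphabet positions of their characters."""
    return sorted(words, key=lambda w: alphabet_key(w, alphabet))


words = ["ål", "zebra", "æble", "abe", "øl"]
print(alphabetize(words, DANISH))
# ['abe', 'zebra', 'æble', 'øl', 'ål']
```

Sorting the same words by raw code point (`sorted(words)`) would place "ål" before "æble", which is one reason a sort key tied to a specific alphabet is needed.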


Examples:

– Books can be arranged according to titles, authors, languages, and other characteristics displayed in headings (as can representations of books in catalogs).
– Back-of-the book indexes may contain alphabetically arranged names and keywords referring to the pages on which these names are mentioned or on which information about the concepts corresponding to the keywords can be found.
– Entries in dictionaries and encyclopedias can be arranged according to headwords (in addition, indexes can be arranged according to derived or assigned terms or names).
– In reference lists (e.g., in this article and in all articles in the IEKO Encyclopedia) references are ordered alphabetically according to author and publication year.
– Computer sorted outputs from databases can be arranged according to many characteristics, including those mentioned in the examples above.
– Persons can be arranged according to their last names, first names, and occupations in a directory.
– Wine bottles may be arranged alphabetically in supermarkets according to, for example, country of origin or name of producer.

The process of alphabetizing headings starts by collocating those starting with the first letter in a given alphabet. Headings starting with the second letter are then collocated, and the process repeats through the last letter in the alphabet (in English this is mostly termed the A-Z order). Each collocated group is then arranged according to the second letter in the heading and so on, until the whole string of characters in the heading has been arranged (i.e., exact alphabetical order, cf. below).

Alphabetical order has been described as “unnatural and arbitrary” (Weinberger 2007, 26) rather than organic or intuitive. The reasons for this are:

1. alphabetical ordering is the ordering of items (or their representations) by the symbols used for their names or attributes. Because things and their attributes may have different names, a first kind of arbitrariness is involved;
2. because formal (rather than substantial) aspects are used in alphabetization, a second kind of “unnatural and arbitrary” order is involved. Books with similar titles might be kept together, even when they differ widely in their subjects. A translated book might also be separated from the same title published in its original language (although cataloging rules may apply the principle of uniform titles8).

Another issue arises with synonyms, which allow for the same concept to be expressed using different words (and therefore placed in different alphabetical locations). This is dealt with in library and information science by forms of “vocabulary control” (such as subject headings and thesauri). This issue will be dealt with in other articles in this encyclopedia.

Despite this “unnatural and arbitrary” order, alphabetization has proven itself extremely useful. It is a widespread practice valued for its ability to render large amounts of information readily accessible to users. Alphabetizing is such a firmly ingrained process in many cultures that users may scarcely notice the organizational scheme that helps them browse through record stores or locate icons on their computer desktop.9 Its history reveals, however, that alphabetization was not an inevitable development, nor was it a practice adopted wholesale from the moment of its invention. Instead, it has existed alongside, and has frequently been combined with or challenged by, other means of arrangement.

As we shall see below, alphabetization is often a complex operation that demands much more knowledge than just the twenty-six letters in the English alphabet and their conventional order.

2.0 History

The literature on alphabetization is limited but related to large literatures on the developments of alphabets (e.g., Drucker 1995), writing systems (e.g., Diringer 1962; Hooker 1990; and Daniels and Bright 1996) and, at the broadest level, human symbolic evolution (Lock and Peters 1999). Each specific writing system has its own literature and may pose specific problems to the development of standards for its representation and ordering.

Alphabetization requires that letters bear consistent names and, most importantly, a standard order. In non-phonetic languages like Japanese and Chinese, for instance, alphabetization is less entrenched, as their logographic and syllabic characters support multiple arrangement possibilities. The English term “alphabet,” on the other hand, embodies the very idea that it labels: the consistency and predictability of A, B, C. Michael Rosen explains that the word itself is constructed from the first two letters of the Greek alphabet, alpha and beta. He writes (2015, 395-6):

The alphabet is then the “alphabeta,” rather as if we were to call the number system the “one-two.” Tracing the route back we go to Latin “alphabetum,” back to the “alphabetos,” back to Phoenician “aleph” (“ox”) and “beth” (“house”) which were once pictograms. So, incredibly, the word “alphabet” contains within it the whole history of this particular alphabet or “ox-house.”

Lloyd Daly (1967), author of the most in-depth study of the history of alphabetization to date, notes that the practice became possible when the ancient Greeks adopted the Phoenician alphabet, along with its established letter order. Yet, for roughly five centuries, the Greeks found no need to develop alphabetization, relying instead on other forms of classification, or indeed no classification scheme at all, to compile their lists. Daly traces the earliest uses of alphabetization to the end of the third century BCE. On the islands of Kos and Kalymnos, he finds inscriptions recording participants in local cults in which individuals’ names were divided into sections and then arranged according to their first letter. The Alexandrian libraries provided an early occasion to apply the alphabetic principle more broadly, as scholars accumulated and needed to navigate amongst an expanding number of texts. Portions of the Pinakes, Callimachus’ partial library catalogue, classified works by subject and then, most likely, by author. As part of their literary study, scholars also produced glosses of words found in various texts. At first, they arranged these lists to reflect the order in which the terms appeared in a given work, but as the glosses grew to unwieldy proportions, they began to arrange them alphabetically by first letter.

In spite of these early examples, Daly stresses that adoption of alphabetic order was piecemeal, and favored mainly by scholars rather than public officials. Although he finds evidence that tax rolls and other administrative documents from the Ptolemaic and Graeco-Roman administration of Egypt reflected alphabetization to some extent, he also explains that “for each example cited, there are hundreds of documents where the principle might have been used but was not” (50). One particular gap is found in the administration of ancient Rome, where the alphabetic scheme, although known, was not adopted to organize army rolls or tax ledgers, whose large scale might have benefited from such a system.

2.1 Some challenges of alphabetization

In all the early instances uncovered by Daly, alphabetization was limited to arranging items based on their first letter. Eventually, scholars began to extend the practice to order entries according to their second and third letters, but it is not until the second century CE that Daly finds examples of exact or absolute alphabetical order in Galen’s Interpretation of Hippocratic Glosses.10 In general, its cumbersome nature prevented absolute order from gaining widespread acceptance until the end of the Middle Ages, in part due to the effort required and in part due to the availability of materials. When compiling a list, an alphabetizer needed to estimate ahead of time the amount of area required to accommodate the number of entries falling under a given letter. Expanding or combining lists thus required physically fitting new items into pre-allotted spaces, which sometimes resulted in creating sub-lists or squeezing new entries into the margins of existing documents.

Until the development of printing, the alphabetic principle was also limited by media. Extensive alphabetization projects depend upon the ability to manipulate entries individually, and this is often done by first composing these entries on provisional cards11 or slips. Both papyrus and parchment were too valuable to be used so ephemerally, and so until paper became cheaper and more abundant in the late fifteenth century, few efforts were made to apply alphabetization to its full potential. As Geoffrey Martin (2003, 16) writes, alphabetical indexes based on absolute order became common only “as the printed book established itself as an engine of scholarship,” and alphabetization “came into its own as a guide to the contents of the greatly expanded libraries that printing made both possible and necessary to the advancement of learning.”

Even in the age of the printed book, however, alphabetization remained one of many arrangement possibilities, and end users still needed to be guided in its application. When Robert Cawdrey published one of the first English dictionaries in 1604, his Table Alphabeticall, he explicitly instructed readers how to use it (quoted in Daly 1967, 91):

If thou be desirous… rightly and readily to understand, and to profit by this Table, and such like, then thou must learne the alphabet, to wit the order of the letters as they stand, perfectly without booke, and where every letter standeth: as (b) nere the beginning, (n) about the middest, and (t) toward the end.

Clearly, Cawdrey could not assume that his early seventeenth-century readers were familiar with the practice of locating information by consulting alphabetically arranged documents.

There is also a point to be made about alphabetical order in “word” books at a time when spelling was not standardized and much more fluid. Mulcaster’s Elementarie (1582) provides an example (http://www.bl.uk/learning/images/texts/dict/large1323.html). Words such as “chalenge,” “chauffinch,” “chearfull,” and “chearie” are spelled differently in modern English, and, therefore, fall in different places in the alphabetic sequence.

Writers and publishers of encyclopedias have also wrestled with presenting alphabetic schemes to their readers. Etymologically, encyclopedias offer “general education” or “instruction in a circle,” and most early authors sought to structure their works in ways that presented a coherent sum of human knowledge, stressing the internal relations between different fields of inquiry. Alphabetical arrangements, in contrast, disperse conceptually related terms based on the relative happenstance of the order of their letters, severing important connections between associated ideas. Richard Yeo (1991) has written about this tension and the ways that, at least since the 1728 release of Ephraim Chambers’ Cyclopaedia, editors have tried to resolve it using tools like subject indexes, cross-references, mixed thematic and alphabetical arrangements, and historical surveys. Chambers opted to combine systematic and alphabetical orders in his work, while the Encyclopaedia Britannica, in 1824, introduced longer historical dissertations on different branches of science to accompany its shorter, alphabetically-ordered entries.

The apparent objectivity of alphabetical order also obscures editorial decisions, such as whether one term should fall under the umbrella of another or merits its own treatment. Other practical concerns arise when a recently published volume contains entries that rely on concepts that follow them alphabetically and may not appear in print for years to come. Later entries might also be condensed to meet publication deadlines, space limits, and financial constraints. Yeo (40) points out that in the original Encyclopaedia Britannica, volumes dedicated to the letters A and B were granted 687 pages of text, while the remainder of the alphabet was condensed to occupy only 2,000 pages. Rather than necessarily offering order and ease of use, then, alphabetization is also capable of producing disorder, randomness, and opacity.

3.0 Some principles of alphabetization

Any arrangement scheme must take all elements of an index entry into consideration. Wellisch (1999) provides a detailed discussion of alphabetical arrangement and presents the following seven rules for ordering characters:

1. Headings shall be arranged exactly as written, printed or otherwise displayed. The arrangement of a heading among other headings should be based solely on the sequence of numbers in arithmetical order and on the sequence of the twenty-six letters of the English alphabet. The basic order of characters should be in the following sequence:
   – Spaces
   – Symbols other than numerals, letters and punctuation marks
   – Numerals (0 through 9)
   – Letters (A through Z)
2. Qualifying or explanatory terms are integral parts of a heading and should be arranged as any other words in the heading.
3. Headings beginning with identical words should be arranged in the following sequence. First: Single-word headings; Second: Multi-word headings, including headings with qualifiers.
4. Cross-references are not part of a heading, and therefore do not affect the arrangement of a heading.
5. Subheadings are normally arranged in alphanumeric sequence. They are subject to the same arrangement rules as the headings they modify. Function words at the beginning of subject headings should be arranged as any other words. They should not be disregarded.
6. An initial article in a heading should be treated as any other initial word. When it is deemed appropriate or desirable to arrange headings with initial articles by the word following the article (for example, in library catalogs where many title headings begin with an article) the heading may be structured to achieve the desired arrangement. Such structuring has two disadvantages: (a) it needs human intervention; and (b) the deletion of an article may distort the meaning of a heading, especially in titles.
7. Numbers in headings, whether at the beginning or within a heading, should be arranged in arithmetic order. Headings beginning with numbers written in Arabic numerals should be sorted in ascending arithmetic order before headings beginning with a letter sequence. Roman numbers (written by means of letters) should be arranged by their arithmetical value, among other numbers written in Arabic numerals. To achieve this, the sequence of letters must first be tagged as a number by human intervention, and may then be sorted as a Roman numeral, either manually or by an algorithm.

There are two overall basic forms of arrangements of headings, word-by-word and letter-by-letter (see Table 1). These two schemes differ in how they handle spaces and other non-letter characters and typographical forms. Word-by-word arrangement puts “nothing before something,” whereas letter-by-letter arrangement (“all through”) ignores spaces and punctuation between words. Wellisch (1999, 5; emphasis original) writes:

This method [letter-by-letter] is primarily used for the arrangement of headings in dictionaries, because it keeps different spellings of the same term together (for example, ground water, ground-water, groundwater). The application of this method violates, however, the provision of Section 3.1, and it is also subject to a number of different interpretations …. This method is therefore not recommended.

As shown in Table 1, these two styles of alphabetizing yield very different results, which is of great importance in long indexes. Most users of indexes do not think about the various ways entries may be alphabetized, and if an entry is not found in the place where they expect it, they may assume that a subject is not included. Using standards and orders that work for users is critical. Unfortunately, there are different, non-compatible standards and guidelines, as discussed below.

Word-by-Word          Letter-by-Letter (Strict interpretation)
N. E. Zenith Co.      networks
networks              New, Agnes
New, Agnes            New Brunswick
New Brunswick         N. E. Zenith Co.

Table 1. Simplified figure after Wellisch (1999, 6)

4.0 Standards, norms and guidelines12

The rules and standards governing the many aspects of alphabetization may be difficult to grasp. This is particularly true with the implementation of well-established national traditions for arranging names and headings in computer programs. This process is often (e.g., in Library of Congress Filing Rules as well as in this article) called “filing rules” (see note thirteen about the use of this term in classification research).

One challenge of alphabetization in computer software is establishing the method by which a system will encode alphanumeric characters. The encoding of characters in computer systems has been guided by both national and international standards, as well as by proprietary encoding schemes established by the various software houses, e.g., IBM, Microsoft, Apple Computer etc., leading to difficulties in interoperability between different software programs.

As an example, uppercase and lowercase letters are ordered separately following the 7-bit character set defined by the American Standard Code for Information Interchange (ASCII) (Table 2; for ASCII and bit see Appendix 1 and 2). This does not follow the traditional ordering of letters in the English alphabet, where uppercase and lowercase letters do have the same position in the alphabet.14 In a digital computer (or binary computer), each character is given a unique binary code. This means that an uppercase A has a different code than a lowercase a. According to ASCII, all uppercase letters appear in order first, followed by lowercase letters. Following this logic, all entries beginning with uppercase letters will be arranged before entries beginning with lowercase letters.

Table 2 illustrates the result of using the ASCII arrangement to encode characters compared to the example in Table 1. In the two leftmost columns, the arrangement follows the traditional English alphabetical order according to the guiding principles of word-by-word or letter-by-letter arrangements, with no distinction between uppercase and lowercase letters. In the rightmost column, the order follows the encoding scheme used in the ASCII character set.

Word-by-Word          Letter-by-Letter (Strict interpretation)    ASCII (unmodified)
N. E. Zenith Co.      networks                                    “New lamps for old”
networks              New, Agnes                                  N. E. Zenith Co.
New, Agnes            New Brunswick                               New Brunswick
New Brunswick         N. E. Zenith Co.                            networks

Table 2. Simplified figure after Wellisch (1999, 6).

In addition to encoding schemes, it has, therefore, been necessary to establish guides or collation15 rules for how letters should be ordered according to national alphabets. These language-specific rules reflect different cultural traditions for arranging alphabetic characters.

To add to the complexity, different institutions (e.g., libraries, publishing houses) also maintain specific traditions for how they arrange names and headings. This impacts the order of books on shelves, the arrangement of book indexes, and the display of search results in an OPAC.

Ordering practices have been guided by professional associations like the American Library Association, the Library of Congress, ARMA International (previously the

Association of Records Managers and Administrators), as well as by standardizing bodies such as the National Information Standards Organization (NISO) and International Organization for Standardization (ISO), among others.

Filing rules differ by the level of human intervention used to determine which part of the heading or name should be used for ordering. This involves an intellectual understanding of the actual meaning of the heading, i.e., to distinguish between a personal name, a place name, a subject etc. and arrange accordingly. The example below is taken from the Library of Congress Filing Rules (1980, 24), where headings with identical leading elements16 are arranged in the following order: person, place, corporate body, subject, title (leading element underlined):

George III, King of Great Britain, 1738-1820
George, Saint, d. 303
George, Alan
George (Ariz.)
George (Motor boat) [corporate body]
George, Lake, Battle of, 1755 [subject heading]
George [motion picture]

In this example, the leading element is in all cases identical and the list is then arranged according to type of heading.

Outside the scope of this article are the standards, rules, and guidelines suggesting what indexes are appreciated in a certain document or information system and how entries or headings should be formulated, e.g., back-of-the-book indexes, algorithmic search indexes, library OPACs, etc.

Figure 1 provides an overview of numerous standards, guidelines, and rules (the top box represents issues related to indexing, cataloging, and metadata that are beyond the scope of the present article17). In Figure 1, 4.1 and 4.2 denote standards guiding the encoding of characters and technical solutions for the implementation of filing rules in computer systems. 4.3 covers filing rules and the order of letters used within different domains. 4.1 to 4.3 are explained in detail below. The dotted box gives examples of guidelines and rules for the formulation of headings and indexes etc. “AACR 2ed.” is the Anglo American Cataloging Rules, probably the most widely used set of cataloguing rules globally. “RDA” is The Resource Description and Access cataloguing standard (Joint Steering Committee for RDA 2015) and is considered the successor to AACR2. These guidelines fall outside the scope of this article, but all mentioned standards and guidelines are included in the reference list.

4.1 Standards for encoding of alphanumerical characters

Presented below is a selection of US and international standards, mainly governing the encoding of the English written alphabet with later extensions allowing for encoding of alphabets using Latin script.
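Before turning to the individual standards, the effect of character encoding on ordering (Tables 1 and 2) can be illustrated with a short sketch. The key functions below are simplified assumptions written for this illustration; they are not taken from Wellisch, NISO, or any filing-rule standard. Word-by-word is approximated by lowercasing and comparing lists of words, letter-by-letter by keeping only letters and digits, and the "unmodified ASCII" column by comparing the raw byte values of each heading.

```python
import re

# Hypothetical sample: the four headings used in Table 1.
headings = ["New Brunswick", "networks", "N. E. Zenith Co.", "New, Agnes"]


def word_by_word_key(heading):
    # Lowercase, drop punctuation, and compare one whole word at a time,
    # so a shorter word files before a longer one ("nothing before something").
    return re.findall(r"[a-z0-9]+", heading.lower())


def letter_by_letter_key(heading):
    # Lowercase and keep only letters and digits, ignoring spaces
    # and punctuation between words ("all through").
    return "".join(re.findall(r"[a-z0-9]", heading.lower()))


def ascii_key(heading):
    # Compare raw 7-bit ASCII codes: space (32) < ',' (44) < '.' (46)
    # < 'A'-'Z' (65-90) < 'a'-'z' (97-122).
    return heading.encode("ascii")


for name, key in [
    ("word-by-word", word_by_word_key),
    ("letter-by-letter", letter_by_letter_key),
    ("ASCII (unmodified)", ascii_key),
]:
    print(name, sorted(headings, key=key))
```

Under these assumptions the first two arrangements reproduce the columns of Table 1, while the ASCII arrangement files every heading that begins with an uppercase letter before "networks", as in the rightmost column of Table 2.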

Top box (dotted): Guidelines and rules for the formulation of headings and indexes: AACR 2ed; RDA; ARMA; Chicago Manual of Style

4.3 Guidelines and rules for the sorting (filing) of alphanumeric characters: NISO; ALA filing rules; LC filing rules; ARMA; Chicago Manual of Style

4.2 Standards for the ordering of alphabets: BS/EN 13710; ISO/IEC 14651; UCA

4.1 Standards for the encoding of characters: ANSI X3.4 & ISO/IEC 646 & ISO/IEC 10646, including ISO/IEC 8859

Figure 1. Overview of standards, guidelines and rules.
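The distinction Figure 1 draws between encoding (4.1) and ordering (4.2) can be made concrete with a small sketch. The Python snippet below is only an illustration under stated assumptions: the sample names are invented, the Danish alphabet string and the folding of æ, ø and å into ae, o and a follow the descriptions given in Section 4.2, and the helper names (`code_point_key`, `danish_key`, `english_key`) are hypothetical. It is not an implementation of the UCA or of any standard listed in the figure.

```python
import unicodedata

# Invented sample headings, for illustration only.
NAMES = ["Åberg", "Zola", "Østergaard", "Andersen", "Ærø", "Olsen"]

# Danish alphabet: a-z followed by æ, ø, å as the three last letters.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
DANISH_RANK = {letter: rank for rank, letter in enumerate(DANISH_ALPHABET)}

# Folding into the English alphabet: æ -> ae, ø -> o, å -> a.
ENGLISH_FOLD = str.maketrans({"æ": "ae", "ø": "o", "å": "a"})


def _normalized(heading):
    """Compose characters (NFC) and lowercase, so 'Å' becomes the single unit 'å'."""
    return unicodedata.normalize("NFC", heading).lower()


def code_point_key(heading):
    """Order by raw character codes, i.e. by the encoding alone."""
    return [ord(ch) for ch in unicodedata.normalize("NFC", heading)]


def danish_key(heading):
    """Order by position in the Danish alphabet; unknown characters sort last."""
    return [DANISH_RANK.get(ch, len(DANISH_ALPHABET)) for ch in _normalized(heading)]


def english_key(heading):
    """Order after folding æ, ø, å into ae, o, a."""
    return _normalized(heading).translate(ENGLISH_FOLD)


by_code_point = sorted(NAMES, key=code_point_key)
by_danish = sorted(NAMES, key=danish_key)
by_english = sorted(NAMES, key=english_key)

print(by_code_point)
print(by_danish)
print(by_english)
```

The three lists come out differently: code-point order places Åberg before Ærø after Zola, the Danish key places Åberg last, and the English folding moves Åberg and Ærø to the front. This is the kind of divergence that ordering standards and tailorable collation algorithms are meant to manage.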

ANSI INCITS X3.4-1986: Information Systems – Coded Character Sets – 7-Bit American National Standard Code for Information Interchange (7-bit ASCII) was first published in 1963 and was adopted as the international standard ISO/IEC 646 in 1967. These two standards for 7-bit encoding are only presented here because of their historical importance for the early development of computers and the attempt to standardize the industry. The 7-bit character sets provided space for little more than the English alphanumeric characters, resulting in many national variants. To support a wider number of characters, the 8-bit family of encoding standards was developed, the first edition published in 1987 as ISO/IEC 8859. This family of standards is incorporated in ISO/IEC 10646, mentioned below.

A widely used character set is the Unicode Standard, which was first published in 1991 and whose most recent version, Unicode 11.0, was published in 2018 (Unicode Consortium 2018). Version 11.0 contains a repertoire of 137,439 characters covering 146 modern and historic scripts, as well as multiple symbol sets and emoji (see note 18). Unicode makes it possible to encode more than 1.1 million characters, thereby providing encoding of all existing alphabets, including letter-based as well as ideographic writing systems, but only a fraction of this set is currently in use. The Unicode standard is developed by the Unicode Consortium in tandem with ISO, and the most recent ISO standard is ISO/IEC 10646:2017 Information technology – Universal Coded Character Set (UCS). It corresponds to Unicode 10.0 but excludes some special characters and emoji symbols (see further Wikipedia, Universal Coded Character Set 2018).

Unicode is currently the most important issue relating to alphabetization, and it may deserve an independent entry in this encyclopedia (see Aliprand 2017 for an encyclopedia article in Encyclopedia of Library and Information Sciences). From a research-oriented perspective, two issues are crucial: 1) Unicode can be implemented by different character encodings and there seems to be a trade-off between the number of bytes used for each character and the space used, and thus the efficiency of the implementation; and, 2) philosophical and completeness criticisms. There has been a debate on such issues (see note 19). Among the issues raised is the relation between characters, graphemes and glyphs as units. Holmes (2003) has suggested that although Unicode is a success, a different approach would have worked much better for encoding text, documents and writing systems. The attempt to accommodate all the world's languages in one gigantic codespace means that it cannot take full advantage of the systematic graphical features of various writing systems. The criticisms of Unicode seem related to earlier versions and are possibly less relevant to its newer versions. It is, however, important to be open to possible limitations and biases in all kinds of standards and knowledge organization systems.

4.2 Standards and recommendations for the ordering of alphabets

According to Küster (1999, 21) the "ordering of letters is highly dependent on the cultural expectations." This author thus seems to strive for a multilingual approach to ordering. What might be expected as the correct alphabetical order in English is not the same in, for example, Danish. Besides the letters a to z, the Danish alphabet also comprises the letters æ, ø and å, and the ordering of the Danish alphabet is from a to å, meaning that æ, ø and å are the three last letters in the alphabet. This raises a number of questions about how to treat different national alphabets when dealing with multilingual information and software. These issues are both about securing correct order according to different national traditions and about how to incorporate or express letters from other alphabets in, for example, the English language.

Example: according to Wellisch (1999, 3) the Danish letters æ, ø and å should be arranged in the English alphabet as ae, o and a. Needless to say, this would have an effect on the arrangement of characters when following a Danish language-based system compared to an English language-based system, and subsequently also on the exchange of information between the two systems. This is not just of "academic interest" but relevant whenever Danish names appear in English reference lists, and of course similarly with every other language.

Standards such as BS/EN 13710: 2011 European Ordering Rules. Ordering of Characters from Latin, Greek, Cyrillic, Georgian and Armenian Scripts have been established to normalize this (see also Küster 2006, chapter 17.4).

The standards and recommendations mentioned here do not only deal with the ordering of letters but also define collation algorithms. Davis, Whistler and Scherer (2018, Section 1) describe the background to the Unicode Collation Algorithm (UCA) as follows:

"Collation varies according to language and culture: Germans, French and Swedes sort the same characters differently. It may also vary by specific application: even within the same language, dictionaries may sort differently than phonebooks or book indices. For non-alphabetic scripts such as East Asian ideographs, collation can be either phonetic or based on the appearance of the character. Collation can also be customized according to user preference, such as ignoring punctuation or not, putting uppercase before lowercase (or vice versa), and so on. Linguistically correct searching needs to use the same mechanisms: just as 'v' and 'w' traditionally sort as if they were the same base letter in Swedish, a loose search should pick up words with either one of them."

Collation rules have a wide impact on digital systems, from determining the simple alphabetical ordering of letters in an index to influencing how databases and search engines are organized and consequently behave when confronted with a search request submitted by a user.

One important function of UCA is, therefore, to provide a technical solution for implementing filing rules (see below in 4.3) in a software program. It is imperative to underline that the collation algorithm does not prescribe specific rules for how to arrange or file headings; it only governs the technical implementation of filing rules.

The international collation standard is ISO/IEC 14651, Information Technology, International String Ordering and Comparison, Method for Comparing Character Strings and Description of the Common Template Tailorable Ordering. It was developed in tandem with UCA. Furthermore, Wellisch (1999) and the LC Filing Rules (1980) prescribe the ordering of the English alphabet and the arrangement of non-English letters into the English alphabet.

4.3 Rules and guidelines for the arrangement of headings (filing rules)

Filing rules guide alphabetization, including the ordering and sorting of library catalogs, indexes, dictionaries, and directories (Wellisch 1999, v). These rules are published by both professional entities and organizations, e.g., national library bodies, library associations, publishing houses, etc. With this in mind, only a few important examples of guidelines are mentioned here.

Wellisch (1999), published by NISO, is an attempt to establish a set of common guidelines. According to the foreword, "this technical report seeks to make the alphanumeric arrangement of headings 'as easy as ABC'" (Wellisch 1999, v). The American Library Association (ALA) has published ALA Filing Rules (American Library Association 1980) and the Library of Congress has published LC Filing Rules (Library of Congress 1980; prepared by Rather and Biebel). Both are widely used within libraries, but alas they provide different solutions. For example, the ALA Filing Rules do not distinguish between types of headings (Bakewell 1972, 166); this differs from the LC Filing Rules (see the example in Section 4 of this article). The three recommendations above all advise a word-by-word arrangement.

Many book publishers follow their own alphabetizing styles. North American publishers often follow the guidelines in The Chicago Manual of Style (University of Chicago Press 2017, 944, §16.58), which call for letter-by-letter alphabetization: "Chicago, most university presses, and many other publishers have traditionally preferred the letter-by-letter system but will normally not impose it on a well-prepared index that has been arranged word by word." It is important to note that Chicago's choice of letter-by-letter alphabetization is in conflict with the word-by-word arrangement recommended by Wellisch (1999) and by both the ALA Filing Rules and the Library of Congress Filing Rules.

For use within the domain of Records and Information Management, ARMA International (ANSI 2005) publishes a set of guidelines. These guidelines advise a unit-by-unit approach for alphabetical filing, which differs from both letter-by-letter and word-by-word filing.

5.0 Alphabetic order versus other ordering criteria

In botany, Richards (2016, 66) explains that alphabetical arrangements of plants in herbaria were common by about 1596, but many other criteria were also used, like sorting plants with pleasant flowers from odorous plants and classifying plants according to their similarities and differences. This last principle led to hierarchical and more systematic approaches, for instance, organizing plants into genera and subdividing them into species. But these species and genera were not necessarily what we would see in modern scientific classifications. Sometimes plants were, for example, simply classified as trees, shrubs, or herbs. It is common knowledge that such different ordering principles were standardized by the taxonomy set up by Carl Linnaeus in his Systema Naturae (1735). Today it is the norm that such systematic arrangements are supplemented by alphabetic indexes for the easy location of a specific name.

Concerning the organization of knowledge in encyclopedias, Sundin and Haider (2013) write:

"The encyclopaedias that emerged around the time of the Enlightenment are said to have shifted knowledge's organizational principle; from the tree of knowledge to the alphabet. Yet despite the success of the alphabetic principle, it has not erased classification endeavours, in fact not even in the beginning. As Ann Blair [2010] points out, already d'Alembert defended the alphabetic principles in the Encyclopédie at the same time that he provided readers with an image of a tree of knowledge as a supplement to the alphabet."

Sundin and Haider then describe how the Swedish electronic encyclopedia Nationalencyklopedin, in addition to its alphabetical arrangement, also uses a Swedish bibliographic classification system, Klassifikationssystem för svenska bibliotek (SAB). However, the authors do not further examine the use of classification systems in contemporary encyclopedias, and although such systems are sometimes provided (e.g., in Encyclopedia Britannica's "Syntopicon: An Index to The Great Ideas" (1952) followed in 1974 by Propaedia, an "outline of knowledge," see Adler 2007), there is little evidence of their use and usefulness over alphabetical arrangements, indexes and internal references. However, such systematic outlines often form the basis for the overall editing of encyclopedias and the commission of articles. For the user, they may, therefore, provide a better overview and means to evaluate the coverage of the work.

In libraries, there have been controversies about the strengths and weaknesses of alphabetical subject catalogs versus systematic catalogs (see Hanson and Daily 1970 about the history of library catalogs). In The Organization of Knowledge in Libraries and the Subject-Approach to Books, Henry E. Bliss (1933) argues that a systematic subject-approach is required. Any attempt to apply a simple alphabetical subject-approach without a systematic organization of the plurality of knowledge subjects is rejected by Bliss (1933, 301) as a kind of "subject-index illusion" (see note 20). A mere listing of subjects, as provided by subject headings, would not be able to meet the principle of maximal efficiency that results from the strategies of collocation of closely related classes or subjects and subordination of the specific to the generic. This means that a differentiation (analysis) of subjects should only be considered as a necessary first step that needs to be succeeded by an integration (synthesis) of subjects into a well-structured knowledge organization system, as underlined by Bliss (1933, 104):

"Analytic division tends to dispersion. But synthesis, either collocative or systematic, places subjects in effectual relation and efficient organization. A collocative synthesis does not, however, forego analysis, which inevitably issues from subdivision; but it collocates the results of analytic subdivision. This is the very nature of systematic classification. It opposes the false theory that disorder and dispersion can be obviated or compensated by an alphabetic key or subject-index."

There are different ways of combining alphabetic and systematic order. One example is provided by the so-called "Cutter numbers" used by the Library of Congress, where alphabetic arrangement is a very significant aspect of the classification scheme (see note 21).

6.0 Conclusion

Research has demonstrated the complexities that may arise from using alphabetization: the apparently simple process can be quite difficult. To order headings and indexes alphabetically is not as straightforward as it may sound, depending on both cultural traditions and different approaches used in different domains or under different circumstances. The implementation of well-established filing rules in computer software has resulted in a number of different proprietary technical solutions established by software companies. What has characterized these has been a lack of interoperability, resulting in incompatible systems. The development of computers and software has been dominated by Anglo-American companies; hence, the default "computer" language has been and still is English. This has created a number of difficulties for supporting non-English alphabets, based on both Latin and non-Latin writing systems. Fortunately, the increase in computational power and decrease in storage cost have led to the development of new standards like Unicode, which can support all known writing systems. Unicode has now gained ground as the "default" standard for encoding characters, compatible with virtually all modern computer software. It now seems possible to support our culturally diverse writing systems and to achieve interoperability between different software systems. However, technical as well as philosophical questions persist: What happens when the most comprehensive standards prove impractical to use? And can any alphabetization standard ever function as a neutral tool, or will it always serve some cultures and domains better than others?

Notes

1. This entry is about written alphabets only. We are not addressing issues relating to unwritten languages or sign-languages. About the International Phonetic Alphabet see Brown (2013).
2. This first meaning of ordering corresponds to how WordNet 3.1 defines the noun "ordering": "S: (n) order, ordering (the act of putting things in a sequential arrangement) 'there were mistakes in the ordering of items on the list'." Küster (1999, 21; italics in original) made a distinction between sorting and ordering that conflicts with the other definitions presented here: "English terminology usually distinguishes between sorting and ordering. Sorting is primarily a service for users to facilitate their access to information by presenting it in a structured and predictable way, e. g. by subdividing the information by subject matter (by having several registers to a book, for instance), having multiple indices in a library etc. Ordering – the arrangement of information in alphabetical sequence – is in most cases an integral part of this procedure." But as we saw, the term ordering is not normally limited to alphabetization.
3. Even a random order may be used for some purposes, e.g., statistical sampling.

4. The Oxford English Dictionary (2018) defines sorting: "9. a. transitive. To arrange (things, etc.) according to kind or quality, or after some settled order or system; to separate and put into different sorts or classes; to classify; to assort."
WordNet 3.1 defines sorting (as a noun):
– S: (n) sort, sorting (an operation that segregates items into groups according to a specified criterion) "the bottleneck in mail delivery is the process of sorting"
– S: (n) classification, categorization, categorisation, sorting (the basic cognitive process of arranging into classes or categories)
– S: (n) sorting (grouping by class or kind or size)
ODLIS: Online Dictionary for Library and Information Science (Reitz 2004) defines sorting: "In a search of a online catalog or bibliographic database, the default display is normally alphabetical order by author or title, or reverse chronological order by publication date. However, in some online catalogs and databases, the user may select the sequence in which results will be displayed, usually from a list of options, either before or after the search is executed. Compare with ranking. See also: arrangement."
5. About algorithmic sorting see, for example, Knuth (1998), Christophersen (1997) and Wikipedia: "Sorting Algorithm" at https://en.wikipedia.org/wiki/Sorting_algorithm.
6. Wellisch (1999, 2) defines heading: "Any written, printed or otherwise visually displayed item, consisting of one or more words, that is to be arranged among other such items in a known order."
7. A character is the "smallest possible unit of arrangement: a space, letter, numeral, punctuation mark, or other symbol" (Wellisch 1999, 1). Later (in Section 4.1) it is mentioned that Unicode has met some difficulties with characters and that glyphs rather than characters may be needed as units in some scripts.
8. In practice, library catalogs will mostly apply the principle of uniform titles to ensure that a translation is entered under the original title to keep versions of the same work together.
9. One of the anonymous referees wrote: "Otherwise I thought this was a firm rebuttal of Weinberger and a challenge to the idea that alphabetical order is arbitrary, on that basis almost every ordering principle is, and even 'natural' orders need to seek consensus on the sequence (e.g. natural numbers in ascending order, elements in the periodic table by increasing atomic number and weight). What is a 'natural' order (such as the elements) may not be familiar to a lay audience in the manner of alphabetical order, and hence completely ineffective for retrieval."
10. Valerius Harpocration was, according to Keaney (1973), probably the first to use absolute alphabetization.
11. In this context, it seems relevant to mention that it was Carl Linnaeus (1707–1778) who invented the card index (cf. Mueller-Wille 2009). The card index served an important purpose: "Linnaeus had to manage a conflict between the need to bring information into a fixed order for purposes of later retrieval, and the need to permanently integrate new information into that order."
12. Besides the guidelines mentioned in this section, Chauvin (1977) should be mentioned.
13. In classification research, in particular in the facet-analytic tradition, the terms "citation order" and "filing order" are well established with the following meanings:
– "Citation order simply refers to the order in which notational elements are cited in a built notation. The most commonly applied rule is to cite the most specific concept first and then move in stages to the most general" (Batley 2005, 17).
– "Filing order, which establishes shelf order, is usually the opposite of the citation order, with general aspects of a subject shelved before more specific aspects. This makes intuitive sense: library users would expect broad aspects of a topic to be shelved before narrower aspects" (Batley 2005, 17).
In a printed telephone directory, for example, this terminology implies that the citation order would be the way the single entry is constructed (e.g., "Adams, John W. librarian #xx") while the filing order would be the alphabetical arrangement of the different entries.
14. ASCII-code order is also called ASCIIbetical order. In ASCII all uppercase letters come before lowercase letters; for example, Z precedes a (see the ASCII table in Appendix 1).
15. See also "Collation" in Wikipedia, the free encyclopedia: https://en.wikipedia.org/wiki/Collation
16. Headings are split into elements, where an element can consist of one or more words and is identified by punctuation marks etc.; e.g., a person's name consisting of a last name and a first name is split into two elements using the comma as delimiter.
17. The history of the AACR cataloging rules and the different editions can be seen in Joint Steering Committee for RDA (2009), http://www.rda-jsc.org/archivedsite/history.html. The latest version of the RDA is published by Joint Steering Committee for RDA (2015). Such rules belong to the field of (descriptive) cataloging (see Joudrey 2017). Publishers' guidelines (such as the Chicago Manual of Style (University of Chicago Press 2017)) are mainly constructed from practical experience, but there is a growing tendency to consider normative guidelines from the perspective of genre- and writing studies, thus contributing theoretical perspectives.
18. Most editions are published in electronic format as well as book form and have an ISBN; however, newer editions are not available in WorldCat or on Amazon.com, but a pdf can be generated from the unicode.org page. Details about the book publication and ordering information of Unicode standards may be found at http://www.unicode.org/book/aboutbook.html
19. The debate included Goundry (2001), "Why Unicode Won't Work on the Internet: Linguistic, Political, and Technical Limitations"; Whistler (2001), "Why Unicode Will Work On The Internet"; Peterson (2006), "Unicode in Japan. Guide to a Technical and Psychological Struggle"; and Searle (2002), "Unicode Revisited." There is also a section about this in Wikipedia's entry about Unicode: https://en.wikipedia.org/wiki/Unicode#Philosophical_and_completeness_criticisms
20. However, despite Bliss' criticism, the dictionary catalog had many followers, and there was a good deal of opposition to his view, most notably by John Metcalfe (1959).
21. Named after Charles Ammi Cutter, Cutter numbers are a method of representing words or names by using a decimal point followed first by a letter of the alphabet, then by one or more Arabic numerals. In Library of Congress (LC) call numbers, Cutter numbers function as book numbers and distinguish a particular work from others in the same class. "Example: Call number: Z733.U58G66 1991 contains Cutter number: .U58 [for the United States] and G66 [for Goodrum, the author]." (Example taken from https://www.itsmarc.com/crs/mergedProjects/cutter/cutter/definition_cutter_number_cutter.htm). Immroth (1971, 384) wrote that Cutter "is perhaps best known today for his alphabetic order of Cutter tables." Winke (2002) provides an overview of the current use of Cutter's Expansive Classification, of which only Cutter numbers and Cutter tables remain in general use.

References

Adler, Mortimer J. 2007. "Circle of Learning." The New Encyclopædia Britannica, 15th ed. Chicago: Encyclopædia Britannica.
Aliprand, Joan M. 2017. "Unicode Standard." In Encyclopedia of Library and Information Sciences, 4th ed., ed. John D. McDonald and Michael Levine-Clark. Boca Raton, FL: CRC Press, 7: 4662-71.
American Library Association. 1980. ALA Filing Rules. Chicago: American Library Association.
ANSI (American National Standards Institute). 1986. American National Standard for Information Systems. Coded Character Sets. 7-bit American National Standard Code for Information Interchange (7-Bit ASCII). New York: American National Standards Institute.
ANSI (American National Standards Institute). 2005. Establishing Alphabetic, Numeric and Subject Filing Systems. Lenexa, KS: ARMA International.
Bakewell, Kenneth Graham Bartlett. 1972. A Manual of Cataloguing Practice. Oxford: Pergamon.
Batley, Sue. 2005. Classification in Theory and Practice. Oxford: Chandos.
Blair, Ann. 2010. Too Much to Know: Managing Scholarly Information before the Modern Age. New Haven, CT: Yale University Press.
Bliss, Henry Evelyn. 1933. The Organization of Knowledge in Libraries and the Subject-Approach to Books. New York: Wilson.
Brown, Adam. 2013. "International Phonetic Alphabet." In The Encyclopedia of Applied Linguistics, ed. Carol A. Chapelle. 10 vols. Hoboken, NJ: Wiley-Blackwell, 5: 1-8. DOI: 10.1002/9781405198431.wbeal0565
BS/EN 13710: 2011. European Ordering Rules. Ordering of characters from Latin, Greek, Cyrillic, Georgian and Armenian scripts. London: British Standards Institution.
Chauvin, Yvonne. 1977. Pratique du Classement Alphabétique. 4th ed. Paris: Bordas.
Christophersen, Hans. 1997. "Alphabetisierung auf Computer. Prinzipien, Probleme und eine Lösungsverbesserung." http://www.rostra.dk/alphabet/alpha%5Fdt.htm
Daly, Lloyd W. 1967. Contributions to a History of Alphabetization in Antiquity and the Middle Ages. Brussels: Latomus.
Daniels, Peter T. and William Bright, eds. 1996. The World's Writing Systems. New York: Oxford University Press.
Davis, Mark, Ken Whistler and Markus Scherer. 2018. "Unicode Collation Algorithm (11.0.0)." Unicode Technical Standard 10. https://www.unicode.org/reports/tr10/
Diringer, David. 1962. Writing. London: Thames & Hudson.
Drucker, Johanna. 1995. The Alphabetic Labyrinth: The Letters in History and Imagination. London: Thames & Hudson.
Goundry, Norman. 2001. "Why Unicode Won't Work on the Internet: Linguistic, Political, and Technical Limitations." Technical Papers. http://www.hastingsresearch.com/net/04-unicode-limitations.shtml
Hanson, Eugene R. and Jay E. Daily. 1970. "Catalogs and Cataloging." In Encyclopedia of Library and Information Science, ed. Allen Kent and Harold Lancour. New York: Marcel Dekker, 4: 242-305.

Holmes, Neville. 2003. "The Problem with Unicode." Computer 36, no. 6: 116 + 114-115 [sic!]. doi:10.1109/MC.2003.1204385
Hooker, James T., ed. 1990. Reading the Past: Ancient Writing from Cuneiform to the Alphabet. London: British Museum Press.
Immroth, John Phillip. 1971. "Cutter, Charles Ammi." In Encyclopedia of Library and Information Science, ed. Allen Kent and Harold Lancour. New York, NY: Marcel Dekker, vol. 6: 380-7.
ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission). 1991. Information Technology, ISO 7-Bit Coded Character Set for Information Interchange. International Standard ISO/IEC 646. Geneva: International Organization for Standardization and International Electrotechnical Commission.
ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission). 1999. Information Technology, 8-Bit Single-Byte Coded Graphic Character Sets. International Standard ISO/IEC 8859. Geneva: International Organization for Standardization and International Electrotechnical Commission.
ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission). 2016. Information Technology, International String Ordering and Comparison, Method for Comparing Character Strings and Description of the Common Template Tailorable Ordering. 4th ed. International Standard ISO/IEC 14651. Geneva: International Organization for Standardization and International Electrotechnical Commission. Reference number: ISO/IEC 14651:2016(E).
ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission). 2017. Information Technology: Universal Coded Character Set (UCS). 5th ed. International Standard ISO/IEC 10646. Geneva: International Organization for Standardization and International Electrotechnical Commission. Reference number: ISO/IEC 10646:2017(E).
Joint Steering Committee for RDA. 2009. "A Brief History of AACR." http://www.rda-jsc.org/history.html
Joint Steering Committee for RDA. 2015. RDA: Resource Description and Access. 2015 Revision. Chicago: American Library Assn.
Joudrey, Daniel N. 2017. "Cataloging." In Encyclopedia of Library and Information Sciences, 4th ed., ed. John D. McDonald and Michael Levine-Clark. Boca Raton, FL: CRC Press, 2: 723-32.
Keaney, John J. 1973. "Alphabetization in Harpocration's Lexicon." Greek, Roman, and Byzantine Studies 14: 415–23.
Knuth, Donald E. 1998. Sorting and Searching. Vol. 3 of The Art of Computer Programming. 2nd ed. Boston: Addison-Wesley.
Küster, Marc Wilhelm. 1999. "Multilingual Ordering: The European Ordering Rules." In Multilinguale Corpora. Codierung, Strukturierung, Analyse. 11. Jahrestagung der Gesellschaft für Linguistische Datenverarbeitung, ed. Jost Gippert and Peter Olivier. Prag: Enigma, 21–33.
Küster, Marc Wilhelm. 2006. Geordnetes Weltbild. Die Tradition des alphabetischen Sortierens von der Keilschrift bis zur EDV. Eine Kulturgeschichte. Berlin: De Gruyter.
Library of Congress. 1980. Library of Congress Filing Rules, prepared by John C. Rather and Susan C. Biebel. Washington: Cataloging Distribution Service, Library of Congress.
Lock, Andrew and Charles R. Peters, eds. 1999. Handbook of Human Symbolic Evolution. Oxford: Blackwell.
Mackenzie, Charles E. 1980. Coded Character Sets: History and Development. Reading, MA: Addison-Wesley Publishing.
Martin, Geoffrey. 2003. "Alphabetization Rules." In International Encyclopedia of Information and Library Science. 2nd ed., ed. John Feather and Paul Sturges. London: Routledge, 15-17.
Metcalfe, John. 1959. Subject Classifying and Indexing in Libraries and Literature. Sydney: Angus & Robertson.
Mueller-Wille, Staffan. 2009. "Carl Linnaeus Invented the Index Card." Paper presented at the annual meeting of the British Society for the History of Science in Leicester, July 4, 2009. https://phys.org/news/2009-06-carl-linnaeus-index-card.html
Mulcaster, Richard. 1582. The First Part of the Elementarie which Entreateth Chefelie of the Right Writing of our English Tung. London: T. Vautroullier.
Oxford English Dictionary Online. 2018. S.v. "Ordering", accessed November 4.
Oxford English Dictionary Online. 2018. S.v. "Sorting", accessed November 4.
Peterson, Benjamin. 2006. "Unicode in Japan. Guide to a Technical and Psychological Struggle." Blog post. https://web.archive.org/web/20090627072117/http://www.jbrowse.com/text/unij.html
Reitz, Joan M. 2004. ODLIS: Online Dictionary for Library and Information Science. Santa Barbara, CA: Libraries Unlimited. https://www.abc-clio.com/ODLIS/odlis_a.aspx
Richards, Richard A. 2016. Biological Classification: A Philosophical Introduction. Cambridge: Cambridge University Press.
Rosen, Michael. 2015. Alphabetical: How Every Letter Tells a Story. Berkeley, CA: Counterpoint.
Searle, Steven J. 2002. "Unicode Revisited." http://tronweb.super-nova.co.jp/unicoderevisited.html
Sorting Algorithm. 2018. Wikipedia, accessed Nov. 3. https://en.wikipedia.org/wiki/Sorting_algorithm
Sundin, Olof and Jutta Haider. 2013. "The Networked Life of Professional Encyclopaedias: Quantification, Tradition, and Trustworthiness." First Monday 18, no. 6. https://firstmonday.org/article/view/4383/3686
Unicode. 2018. Wikipedia, accessed Nov. 3. https://en.wikipedia.org/wiki/Unicode
Unicode Consortium. 2018. The Unicode Standard. Version 11.0.0. Mountain View, CA: Unicode Consortium. http://www.unicode.org/versions/Unicode11.0.0/
Universal Coded Character Set. 2018. Wikipedia, accessed Nov. 3. https://en.wikipedia.org/wiki/Universal_Coded_Character_Set
University of Chicago Press. 2017. The Chicago Manual of Style, 17th ed. Chicago: University of Chicago Press.
Weinberger, David. 2007. Everything Is Miscellaneous: The Power of the New Digital Disorder. New York: Times Books.
Wellisch, Hans H. 1999. Guidelines for the Alphabetical Arrangement of Letters and Sorting of Numerals and Other Symbols. NISO Technical Report 1081-8006 vol. 3. Bethesda, MD: National Information Standards Organization.
Whistler, Ken. 2001. "Why Unicode Will Work on The Internet." Slashdot (blog), June 9. https://features.slashdot.org/story/01/06/06/0132203/why-unicode-will-work-on-the-internet
Winke, R. Conrad. 2002. "The Contracting World of Cutter's Expansive Classification." Library Resources & Technical Services 48: 122-9.
Wordnet Search - 3.1. s.v. "ordering." Accessed November 4.
Wordnet Search - 3.1. s.v. "sorting." Accessed November 4.
Yeo, Richard. 1991. "Reading Encyclopedias: Science and the Organization of Knowledge in British Dictionaries of Arts and Sciences, 1730-1850." Isis 82, no. 1: 24-49.

Appendix 1: ASCII Table

Appendix 2: Developments in character codes by bits

– One-bit character sets can have two possible characters. 2¹=2. 0 or 1. (This is the binary alphabet used by modern computers)
– Two bits character sets can have four possible characters. 2²=4. 00,01,10,11. (i.e. 0-3)
– Three bits character sets can have eight possible characters. 2³=8.
– Four bits character sets can have 16 possible characters. 2⁴=16. 0000,0001,0010,0011, etc. (i.e. 0-15)
– Five bits character sets can have 32 possible characters. 2⁵=32. Until about 1928 some 5-bit codes were used (e.g., Baudot code and Murray code)
– Six bits character sets can have 64 possible characters. 2⁶=64. In 1928 the BCD (“Binary-Coded Decimal”) 6 bits code was introduced with the IBM card, generally used for the upper-case letters, the numerals, some punctuation characters, and sometimes control characters.
– Seven bits character sets can have 128 possible characters. 2⁷=128. 0000000,0000001,0000010, etc. (i.e. 0-127). In 1963 the ASCII 7 bits code provides 128 different characters; in 1968 MARC-8 7 bits Library computer systems was introduced.
– Eight bits character sets can have 256 possible characters. 2⁸=256. 00000000,00000001,00000010, etc. (i.e. 0-255). In 1963 the Extended Binary Coded Decimal Interchange Code (EBCDIC) 8 bits code were developed for IBM computers.
– 16 bits character sets can have 2¹⁶ possible characters = 65,536
– 32 bits character sets can have 2³² possible characters = 4,294,967,296. In 1991 Unicode, packed into 8/16/32, but less than 21 bits are usable (=2,097,152 characters).
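The arithmetic in this appendix can be checked with a few lines of Python. The sketch below is illustrative only: the names `capacity`, `bit_patterns`, `BIT_WIDTHS`, and `MAX_CODE_POINT` are invented for this example and do not come from any standard. It computes the number of characters an n-bit code can represent (two to the power n), lists the bit patterns for a small width, and confirms that the highest Unicode code point, U+10FFFF, fits within 21 bits.

```python
# Widths discussed in the appendix (assumed list, chosen for illustration).
BIT_WIDTHS = [1, 2, 3, 4, 5, 6, 7, 8, 16, 32]

# Highest code point defined by Unicode (U+10FFFF).
MAX_CODE_POINT = 0x10FFFF


def capacity(bits: int) -> int:
    """Return how many distinct codes fit in `bits` binary digits."""
    return 2 ** bits


def bit_patterns(bits: int) -> list[str]:
    """List every code of the given width as a zero-padded binary string."""
    return [format(value, f"0{bits}b") for value in range(capacity(bits))]


if __name__ == "__main__":
    for width in BIT_WIDTHS:
        print(f"{width:>2} bits -> {capacity(width):,} possible characters")
    # All two-bit codes, in numeric order.
    print(bit_patterns(2))
    # Number of bits needed to store the highest Unicode code point.
    print(MAX_CODE_POINT.bit_length())
```

Enumerating patterns is only practical for small widths; for 16 or 32 bits, `capacity` alone gives the count without building the list.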

Knowl. Org. 46(2019)No.3 223 E. Stuart. Flickr: Organizing and Tagging Images Online

Flickr: Organizing and Tagging Images Online†
Emma Stuart
University of Wolverhampton, Wulfruna Street, Wolverhampton, WV1 1LY

Emma Stuart is a postdoctoral researcher at the University of Wolverhampton, who specializes in social media analysis. Her principal research interests are concerned with the types of images being posted to social networking sites and image-centric mobile apps, and the changing role of photography. Emma holds a PhD in information science from the University of Wolverhampton.

Stuart, Emma. 2019. “Flickr: Organizing and Tagging Images Online.” Knowledge Organization 46(3): 223-235. 106 references. DOI:10.5771/0943-7444-2019-3-223.

Abstract: Flickr was launched when digital cameras first began to outsell analog cameras, and people were drawn to the site for the opportunities it offered them to store, organize, and share their images, as well as for the connections that could be made with other like-minded people. This article examines the links between Flickr’s success and how images are organized within the site, as well as the types of people and organizations that use Flickr and their motivations for doing so. Factors that have contributed to Flickr’s decline in popularity are explored, and the article finishes with some suggestions for how Flickr could develop in the future, along with some conclusions for image organization.

Received: 25 March 2019; Accepted: 27 March 2019

Keywords: Flickr, images, tags, tagging

† Derived from the article of similar title in the ISKO Encyclopedia of Knowledge Organization Version 1.0; published 2019-03-14. Article category: KO in different contexts and applications

1.0 Introduction

Flickr (www.flickr.com), whose name comes from the English word “flick,” meaning to flick through something, is an image- and video-hosting website that was launched in 2004 by Stewart Butterfield and Caterina Fake. Whilst Flickr’s creators originally intended for it to be a massively multiplayer online game (called Game Neverending), it was the image sharing aspect of the game that unexpectedly became more popular, and so the original game idea was abandoned, thus allowing for the development of Flickr.

Flickr is credited as being one of the “first classic web 2.0 sites” (Van House et al. 2005; Cox 2008a; Cox, Clough and Siersdorfer 2010) as it provided the perfect mix of new and innovative features that piqued people’s interest at a time of significant change on the web. This change became known as web 2.0, and the term was widely used from 2004 up until around 2008 to refer to a fundamental shift in the way that people created and shared information online. Rather than being passive consumers of information on websites, the consumers themselves now began generating the content for websites such as YouTube (videos), Wikipedia (collaborative articles), and Twitter (thoughts and ideas). Similarly, with Flickr, it was the users of the website itself who generated the content (i.e., the images), and they were drawn to the site because of the innovative new features it offered such as the use of photostreams, tags, favorites, and groups (McCracken 2014). Flickr also provided a platform for people who were passionate about photography to share their images with other people who were also passionate about photography at the exact same time that digital cameras first began to outsell analog cameras (Weinberger 2007, 12). Thus, a new knowledge organization system was born, creating a place for the management and retrieval of people’s images. The timing for Flickr was perfect, and it soon became “one of the internet’s biggest repositories of photographs,”¹ thus making it an important digital cultural repository to explore and evaluate.

At the height of its popularity in around 2010, 3,000 images were being uploaded to Flickr every minute,² which equated to approximately 4.3 million images each day. It was a website that rode the web 2.0 wave extremely successfully, continually adding new features and listening to feedback from users, and it was always more popular than rival photo hosting sites such as Picasa, Photobucket, SmugMug, Shutterfly, and Photoshelter.

However, Flickr’s heyday now seems to be over, with the most recent statistics released by Flickr in 2014 stating that only 1 million images were now being uploaded each day.³ Flickr’s decrease in popularity seems to be the culmination of three main factors. Firstly, Flickr was acquired by Yahoo in 2005, and whilst the acquisition did not immediately cause adverse effects for Flickr, Stewart Butterfield (one of Flickr’s creators) nonetheless admits that Yahoo stifled innovation within the company, and he dramatically resigned in 2008,⁴ thus indicating that all was not well in the Flickr/Yahoo partnership. Secondly, Flickr failed to implement a successful mobile platform (Bowker 2017). Lastly, the rise of image-centric smartphone applications such as Instagram and Snapchat has drawn a large amount of attention away from Flickr (Bowker 2017). In the remainder of this article I will explore in more depth the links between Flickr’s success and how images are organized within the site, as well as the factors that have contributed to the site’s decrease in popularity, and how Flickr may adapt in the future to keep pace with a changing knowledge organization landscape.

2.0 Success and knowledge organization in Flickr

For a knowledge organization system to be accessible and usable, the knowledge contained within it has to be organized in some way (Soergel 2009). With Flickr, it is the users of the system itself that organize the digital images, and this is one of the site’s main success factors. In the context of Flickr, digital images are defined as, “a representation of an image stored in numerical form, for potential display, manipulation or dissemination” (Terras 2008, 6). The default view of images once uploaded in Flickr is the “photostream.” A user’s photostream displays all of their images sorted by upload date, with the most recently uploaded images at the start of the stream (Wilkinson 2007). Other users can follow photostreams by clicking a follow button whilst viewing a person’s image, and such follows tend to be reciprocated (Mislove et al. 2008). Images in Flickr can also be organized into sets and groups, based on whatever concepts users like (Stuart 2013). A set is a collection of images from a user’s photostream, and users tend to add images with a common theme into sets (e.g., images from a specific event or holiday). For personal information organization using traditional photo albums, photographs could only exist in one place at a time, whereas digital images can be placed in any number of Flickr sets at once. Images can also be placed into groups, where “like-minded users gather, discuss things, and share pictures” (Wilkinson 2007). Images can also be “favourited” by other users, and users can monitor statistics on the number of times their images have been viewed.

In addition to images being placed into sets, albums and groups, images can also be categorized according to what Berinstein (1996, 26) calls the visual and non-visual attributes of an image. The non-visual attributes of an image relate to its biographical elements, and in Flickr this equates to metadata that is embedded in images that have been taken with digital cameras. Metadata is “data about data,” and for digital images this can include information such as: the date and time that the image was taken; the make and model of camera or cameraphone used; shutter speed; specific settings that were used; focal length; and even GPS data (Bausch and Bumgardner 2006). This type of metadata can be useful in two main ways: it helps you remember how you achieved a shot that you are particularly proud of and may want to try to recreate; and it also tells other users how they can achieve a similar effect for their images.

The visual attributes of an image can relate either to the subject content of an image (i.e., what the image is “of” or “about”) or to object aspects such as colour, shape, perspective, composition, pattern etc. Whilst a person may add descriptors to their images that relate to attributes such as colour, shape, pattern, etc., these attributes are more commonly associated with content-based image retrieval (CBIR) where images are retrieved using automated systems that search at pixel level (Jansen 2008, 82). Flickr offers a similarity-based filter when searching for images and it also allows you to filter images by style and pattern. The subject content approach to categorizing and classifying an image tends to be based on what an image is “of” and “about,” and there is no standardized protocol for achieving this (unlike books, which can all be categorized according to, for example, the Dewey Decimal Classification). As such, a number of different approaches have been developed. One approach is called subject indexing (Graham 2001). This approach involves assigning terms to images that have been selected from a controlled vocabulary such as a subject heading list (e.g., the Library of Congress Subject Headings); a thesaurus (e.g., Art and Architecture Thesaurus (AAT) or the Thesaurus for Graphic Materials (TGM)); or a classification scheme (e.g., ICONCLASS) (Graham 2001).

The main drawback with the use of such systems is that they can only be used by subject-specialist, professional indexers, with it typically taking up to forty minutes to assign terms to one image (Eakins and Graham 1999), and the terms that are attached to images are often far removed from the retrieval needs of the end users. Whilst this was not a problem with traditional analogue picture libraries where images would be retrieved by staff for the end users, it is however more of a problem with web-based image databases where it is the end users themselves that search for the images they want. However, Flickr does not utilize controlled vocabularies, thesauri, or specific classification schemes as it is not generally subject-specialist, professional indexers that are attaching key terms to the images on Flickr; it is normal everyday people. Shirky (2005) describes this change as heralding a philosophical shift in indexing, and Rafferty and Hidderley (2007) describe it as a shift from a monologic to a dialogic indexing practice.

It is widely accepted that images are inherently more difficult to categorize than text, as, “a picture can mean different things to different people, and it will also mean different things to the same people at different times” (Graham 2001, 25). However, as the person who creates the image is generally the person who uploads it to Flickr, they are not, therefore, likely to struggle in deciding what key terms should be attached to their image. However, the key terms chosen by the image creator are not necessarily the same key terms that other end users will subsequently use to search for images within Flickr. This can be seen as one of the main problems on a user-generated site such as Flickr.

Flickr has been credited as “one of the Internet’s biggest repositories of photographs,” and Morville (2005) reiterates that findability is a key issue in a busy information environment. The main method of both categorizing and subsequently retrieving images on Flickr is via the use of user-generated tags (Wilkinson 2007).

2.1 Tagging and image retrieval

One of the main facets of knowledge organization is the process by which knowledge is organized, such as abstracting, indexing, cataloging, subject analysis, and classifying. The process of tagging can now also be added to this list. Whilst tagging was introduced in 2003 by a now discontinued social bookmarking website called Delicious (Cagnazzo 2018), Flickr was one of the first websites to fully adopt tagging and make it mainstream (Smith 2008, 9). Tagging is seen as one of the most successful phenomena generated by web 2.0 (Cagnazzo 2018). Tagging is the name given to the process whereby users assign keywords to web objects (Xu et al. 2006), and whilst tagging is not mandatory in Flickr, it is the main method for allowing images to be subsequently retrieved by other users, and images can have one or more tags assigned to them.

Tagging is a key part of the organizational structure on Flickr (Wilkinson 2007) and tags essentially organize, describe, comment on, and categorize resources, thus allowing the images to be retrieved at a later date. If a person tags all the images in their photostream that contain a sunset with the tag “sunset,” then when they subsequently click on the tag “sunset,” all their images of sunsets will be displayed to them; tags, therefore, act as links (Bausch and Bumgardner 2006). Similarly, a person may perform a global search within Flickr, and find images by all users that have been tagged with “sunset,” although there is no way of ultimately knowing whether all relevant images have been retrieved. Whilst it is possible to add tags to another user’s images (social/collaborative tagging), this has not proved to be a popular practice in Flickr (Marlow et al. 2006; Cox 2008a; Ding et al. 2009), presumably because images (especially if the images are photographs of friends or family) are regarded as quite personal items. In a social system such as Flickr, tagging can also be a means of attracting traffic to one’s images (Zollers 2007), thus facilitating interaction between users.

Unlike traditional classification and indexing, tagging is not hierarchical, although some systems may adopt the use of automatically generated related tags (Rafferty 2016). Flickr did introduce an auto-tagging feature in 2015; however, the tag suggestions were based on image recognition technology (i.e., analysis of the visual features within the image) rather than semantic relationships between words associated with the image. The auto-tagging feature attracted much controversy after images of black people were automatically given the tags “ape” and “animal.”⁵

Rorissa (2010) conducted a study to empirically test the similarities and differences between user-generated tags assigned to images on Flickr compared to controlled vocabulary assigned to images in general image collections by professional indexers. Overall it was found that there were significant differences between the two groups, and that user-generated tags and controlled vocabularies tend to have different underlying structures. Jörgensen (2003) points out that whilst controlled vocabularies can help to guide users to select appropriate terms to assign to their images, they nonetheless have a number of drawbacks, including the fact they tend to be narrow, expert-oriented vocabularies that use inflexible and pre-coordinated terms.

There is an extensive body of research that has suggested that tagging is utilized on Flickr for a combination of four main reasons, two of which centre around concepts of knowledge organization (i.e., social-organization and self-organization), and the other two centre around concepts of communication (i.e., social-communication and self-communication) (Van House et al. 2004; Van House et al. 2005; Van House 2007; Nov, Naaman and Ye 2009a; Nov, Naaman and Ye 2009b; Ames et al. 2010). Social-organization is where tags are utilized so that other users of Flickr are able to search for and retrieve images. Self-organization is using tags to categorize images to make it easier for oneself to find them in the future. Social-communication is where tags are used to express emotions or opinions, or to attract attention to images. Self-communication is the use of tags to aid with one’s own memory of events and for personal reflection.

There have been numerous studies that have looked at the types of tags that Flickr users apply to their images. Sigurbjörnsson and van Zwol (2008) found that in a collection of over fifty-two million publicly available Flickr images, users’ tags tended to describe the “where” (an image was taken), the “who” or the “what” (is in the image), and the

“when” (the image was taken). It was also found that the top five most frequently occurring tags were: 2006, 2005, wedding, party, and 2004. This finding suggests that tags in Flickr tend to follow a power law distribution whereby the majority of images are annotated with the same few tags (Mathes 2004; Sigurbjörnsson and van Zwol 2008).

In an analysis of 1.4 million Flickr tags, Ding et al. (2009) found that the most popular types of tags were dates, locations, colours, and seasons. However, Flickr users are rarely found to have more than twenty tags assigned to their images (Barton 2015, 54). Whilst Flickr has a global userbase, people tend to consider the wider Flickr community when tagging their images and generally opt to tag in English, which is less likely to exclude other users (Dotan and Zaphiris 2010).

Research on tagging has highlighted numerous weaknesses with its use, which ultimately mean that the knowledge organization systems that adopt tagging have certain limitations, including: misspellings and nonsensical tags (Aurnhammer et al. 2006; Guy and Tonkin 2006; Spiteri 2007); ambiguous and personalized tags (Guy and Tonkin 2006; Macgregor and McCulloch 2006); compound tags (Mathes 2004); tags that utilize abbreviations, initialisms and acronyms (Spiteri 2007); tags that use neologisms, slang, and jargon (Spiteri 2007); and polysemous words, synonyms, homonyms, and homographs (Aurnhammer et al. 2006; Golder and Huberman 2006).

All of these issues have led to criticism and the conclusion that tags impact negatively on retrieval precision (Macgregor and McCulloch 2006). On the flipside, however, it is also said that all of these issues contribute towards a true representation of knowledge (Macgregor and McCulloch 2006) and a rich end-user vocabulary (Rorissa 2010), and Spiteri (2007) suggests that the percentage of “problem tags” is actually very small. The adoption of semantic tagging (tagging content with URIs) is now seen as a way of overcoming some of the problems inherent with user-generated tags (Cagnazzo 2018).

Other research has suggested that tags are often more closely related to the motivation of the uploader, rather than relating to image content (Kennedy et al. 2007), with motivation to tag often being very different from the initial motivation to use a website (Stuart 2012), and people may also have more than one reason for tagging (Ames and Naaman 2007). Much of the literature on motivation for tagging distinguishes between tagging for one’s own organization and retrieval purposes or tagging so that other people are able to find the content in question (Hammond et al. 2005; Marlow et al. 2006; Heckner, Heilemann and Wolff 2009). In an investigation of motivation to upload and tag images in Flickr which included a sample of 3,462 images and 12,832 tags, it was found that overall motivation to upload and tag images was not related to the types of tags that users subsequently assigned to their images. Tags that generically described what images were “of” were found to be the most popular type of tag category (Stuart 2012).

Geotagging is an additional way of tagging images in Flickr, whereby latitude and longitude coordinates are attached to an image, thus allowing for exact geographical identification (Bausch and Bumgardner 2006).

Images in Flickr can also have titles and descriptions added to them, which can also aid in their retrieval. Titles are generally just a few words long and appear above an image, whereas descriptions appear below an image, and can be anything from a few sentences to an entire story about the image in question (Bausch and Bumgardner 2006). Whilst it is the tags, titles and descriptions that are attached to images that allow them to be subsequently retrieved in a search by another user, Lerman and Jones (2006) highlight the way in which Flickr users also find new images by browsing through their contacts’ photostreams (social browsing).

Whilst tagging is seen as one of the main success factors of Flickr, allowing multiple entry points to the retrieval of images, two further success factors will also be discussed: Flickr’s groups, games, and competitions feature; and its application programming interface (API).

2.2 Groups, games, and competitions

It is not obligatory for Flickr users to join groups; indeed, it tends to be the more committed members that do so (Cox, Clough and Siersdorfer 2011). However, the “groups” feature is one of the flagship features of Flickr and has contributed to its success (Negoescu et al. 2009). In Flickr, groups are where users who share a common interest come together to share images and have discussions, and many sub-cultures exist within Flickr groups (Cox 2008a).

Groups can be public (whereby anyone can see the photos within the group), public-invitation only (whereby an existing member of the group must send an invitation), and private (whereby the group would not show up in any searches, and again, an existing member would need to send an invite).

Sharing images with groups on Flickr was considered to be an important part of Flickr etiquette, although it has been claimed that 50% of Flickr users never post images to groups (Negoescu and Gatica-Perez 2008), and research by Stvilia and Jörgensen (2010) found that from their sample of Flickr users, 37% did not belong to any groups. Cox, Clough, and Siersdorfer (2011), in an investigation of 1,000 random Flickr groups, found that nearly 80% of groups had fewer than 100 members, with nearly 50% of groups having fewer than 100 photos. However, it tends to be the comments that are attached to photos that are the means of interaction between group members, rather than general group discussions (Cox, Clough and Siersdorfer 2011). Such is the cohesion in many groups that users come to view them as additional online communities (Holmes and Cox 2011). Images posted to groups also receive more exposure (Negoescu and Gatica-Perez 2008) and are, therefore, more likely to be added to people’s favorites and are more likely to receive comments and feedback from other members of the group.

Related to Flickr’s “groups” features are the many games and competitions that take place. These games and competitions tend to occur within a specific group that has been set up, and the overarching idea is to have fun while playing around with images (Mäyrä 2011), and for “awards” to be given to images that fulfill a certain criterion (Cox, Clough and Siersdorfer 2011). For example, in a “catch me if you can” game, an image submitted needs to match a previous image based on a specified attribute such as colour, shape, genre, etc., and then the challenge passes along the line to another person (Mäyrä 2011). One of the most well-known games on Flickr is Photoshop Tennis where two players successively edit the same image (Cox 2008b) using graphics software (McDonald 2007). Such “edits” may include: the addition of a figure or object into the picture; changing the head of a person in the image for another person’s head; changing colours; editing objects; or zooming out, whereby the image as it currently stands then becomes the cover of a book or the picture on a TV screen contained within a completely new image (Cox 2008b). Photoshop Tennis has no winners or losers or awards given for the “best image” created (although players can receive positive feedback and accolade via comments received); the main purpose of the contest is to collaboratively create images and to have fun (https://www.flickr.com/groups/pstennis/).

2.3 Application programming interface (API)

The Flickr application programming interface (API) allows Flickr users to access and interact with data on the Flickr website (Anderson 2007). The API can, therefore, be seen as an important information retrieval mechanism on Flickr, allowing users to retrieve and download vast amounts of images, and data relating to images and users. Whilst other photo sharing sites also had APIs available at the same time as Flickr (e.g., SmugMug, Photobucket), Flickr provided the most comprehensive documentation to accompany its API, thus making it more accessible to people. McWilliams (2008) also described the Flickr API as “the web services standard by which other APIs should be judged.”

The API proved to be invaluable for academic researchers that have needed to interrogate data such as: image and tag information (Lerman and Jones 2006; Lerman, Plangprasopchok and Wong 2007; van Zwol 2007; Prieur et al. 2008; Angus, Thelwall and Stuart 2008; Angus, Stuart and Thelwall 2010; Cox, Clough and Siersdorfer 2010; Dotan and Zaphiris 2010; Rorissa 2010; Stuart 2012) and user information (Mislove et al. 2008; Negoescu and Gatica-Perez 2008; Nov, Naaman and Ye 2008). The API has also been used to automatically add machine tags to images (McWilliams 2008), and also to create novel mash-ups such as earth album (a combination of Google Maps and Flickr images: www.earthalbum.com), and InfiniteComic (locates tweets and Flickr images based on supplied keywords and turns them into a comic strip: infinitecomic.com).

3.0 Flickr users and motivation

3.1 Flickr users

Cox (2008a) describes Flickr as encompassing all forms of photography: from people who could be defined as “snapshooters” or casual hobbyists (those taking photos for friends and family, often of touristic travel and the mundane); to people who would class themselves as serious amateurs or serious hobbyists (those with a wider audience of hobby contacts, and a shift in photo content away from personal interest to presenting a sample of “good” photos); through to semi-professional and professional photographers (those who have had formal photographic training and generally use photography as part of their job(s)).

Flickr is also widely used by a number of different organizations and cultural institutions, with one of the most notable being the Library of Congress. The Library of Congress has collaborated with Flickr to create “The Commons,” whereby images from cultural heritage institutions that have no known copyright restrictions can be shared, and Flickr users are invited to add tags and comments to the images (Springer et al. 2008). Allowing Flickr users to add tags and descriptions overcomes the problem of time-starved library staff having to annotate immense collections of images (Earle 2014). Therefore, this kind of collaboration is particularly important for “making historical and special format materials easier to find in order to be useful for educational and other pursuits” (Springer et al. 2008). This has sparked a number of other museums, libraries and archives to adopt similar practices using “The Commons” in order to also increase awareness and discoverability of their collections, with the Smithsonian being another notable institution that uses Flickr (Kalfatovic et al. 2008).

In an investigation of fifty-two cultural heritage institutes that have Flickr accounts, Beaudoin and Bosshard (2012) found that the predominant reason for using Flickr was to provide access to the images in their collections, and in many instances the institutions also thought that using

Flickr provided a better technical experience than placing their images into an in-house content management system, “providing the institution increased image storage capabilities, the ability to use their posted images in widgets and apps, and the service’s ease of use.” However, in terms of the types of images that the institutions actually posted to Flickr, it was found that over half of the images analysed were related to disseminating information about current events and exhibitions occurring rather than being images from the institutions’ collections.

3.2 Motivations for uploading to Flickr

In 2015, Flickr announced they had over ten billion images on their site,⁶ and, therefore, in addition to understanding the system features that have contributed towards Flickr’s success, it is also important to understand why so many people want to put their images on Flickr. Whilst people may be drawn to a website because of innovative features such as tagging and groups, there has to be a greater internal motivation at play when the use of that website ultimately involves sharing one’s images with friends, family, and potentially the general public.

Ames and Naaman (2007) developed a taxonomy of motivations for tagging images in systems such as Flickr, and this taxonomy includes four overarching categories that much of the literature on motivations for using web 2.0 systems also fits into, as well as the literature on the motivation to tag. The four overarching categories proposed by Ames and Naaman (2007) are: self-organisation, self-communication, social-organisation, and social-communication.

Motivation based on self-organisation is the drive to use Flickr as a place to store and organise photographs, either for long-term backup, or as a way of being able to easily access them at a future point. The fact that users of a site such as Flickr also “own” the content they are uploading (compared to sites where external information is merely being shared) is also likely to have implications for motivation, and people are, therefore, far more likely to be interested in managing and preserving their content (Nov, Naaman and Ye 2009). In addition, as more images are now being taken with cameraphones, many people worry that the images on their cameraphones will be lost when their phones are updated or upgraded, and so this is also a driving factor for uploading images to sites such as Flickr (David 2010). This is mirrored in the fact that the Apple iPhone is still the most popular camera for uploading images to Flickr.⁷ Motivation based on self-communication is centered around the desire to keep track of and document day-to-day life or one’s own development in a particular area (Ames and Naaman 2007).

uploaded. This could be to allow absent friends and family to keep up to date with one’s life. Or it could be to share images with people who have shared a mutual experience, such as attending a wedding, or party, or even a conference or work-based event (Kindberg et al. 2005a, 2005b). There is also the notion of “passive” contact with people, whereby people share and view photos online, because it is nice to see what certain people are up to but without the expectation of commenting or liking photos (Lin and Faste 2012).

Social-communication is the motivation to upload images to Flickr in the hope of drawing attention to them in order to gain likes, comments, feedback, accolades, or maybe even payment for the images in the hope they are licensed (Ames and Naaman 2007; Angus and Thelwall 2010). Images could also be uploaded as a conduit for connecting with other like-minded people who share similar interests or hobbies (Cox, Clough and Marlow 2008). Social-communication also covers motivations relating to self-expression and self-presentation (e.g., using Flickr to present an overall image of oneself, and to express one’s views and feelings) (Ames and Naaman 2007).

Stuart (2012), utilizing the framework proposed by Ames and Naaman (2007), conducted an investigation of 456 random Flickr users and asked why they upload their images to Flickr. The most popular reason cited via the use of a semi-structured survey was social-communication (31.75%), and this was for respondents who had expressed one sole reason for using Flickr. More specifically, social signaling/attention was expressed as the main motivating factor, whereby respondents were keen to receive advice and feedback on the photos they had uploaded in the hope of improving their photography techniques. Self-communication was the least popular motivating factor, with only twelve respondents reporting using Flickr solely for this purpose.

Linked to the motivation that some people upload their images to Flickr in the hope of attaining commercial gain from them (Angus and Thelwall 2010) is the fact that between 2008 and 2014, Flickr had a partnership with Getty Images (the stock photo agency). This partnership allowed Getty to contact photographers via Flickr if they wanted to pay to license their images. The partnership was extremely successful, with over 400,000 images being selected for commercial use.⁸ In 2014 this partnership came to an end, with no suggestion that it is likely to be renewed (Bowker 2017), signaling that Getty Images does not perhaps view its relationship with Flickr in as high a regard as it once did.

4.0 Failure to transform

After an inspection of articles written about Flickr since
its decline in popularity, three main contributing factors Social-organisation is the motivation to use Flickr in or- seem to reoccur when discussing its decrease in popularity: der to allow other people to see the images that have been the company’s acquisition with Yahoo; its failure to imple- Knowl. Org. 46(2019)No.3 229 E. Stuart. Flickr: Organizing and Tagging Images Online ment a successful mobile platform; and the advent of new often had to have the same type of phone as the sender.11 image-centric smartphone applications such as Instagram But with the arrival of more affordable monthly phone and Snapchat. contracts, people began to increasingly have data plans that Flickr was bought by Yahoo in 2005, and in the two years allowed them time to connect to the web (Stuart 2013). following its acquisition it went from strength to strength. Therefore, people began to upload their images to social However, whereas Flickr had initially started out as a com- networking sites such as Facebook and Flickr, and more pany that had paved the way for innovative new features, its recently via image-centric social media apps such as Insta- innovation slowly began to stagnate after the acquisition. gram and Snapchat. “All Yahoo cared about was the database its users had built Instagram is a photo and video-sharing network that and tagged. It didn’t care about the community that had was launched in 2010, and Snapchat is a multimedia mes- created it or (more importantly) continuing to grow that saging app that was launched in 2011. The core aspect of community by introducing new features” (Honan 2012). both of these apps is that they are designed for sharing Flickr staff had to spend a lot of time on things related to images captured on someone’s smartphone in a quick and the acquisition and in making sure that certain demands of engaging way. 
Indeed, Snapchat’s main feature is centered the acquisition were being met, and this, therefore, ham- around the fact that images quickly disappear on a recipi- pered their ability to spend time on creating innovative new ent’s handset once they have been viewed, thus positioning features. As a result of this, Flickr missed the perfect win- the app as the perfect conduit for sharing those “mun- dow of opportunity for introducing a successful mobile dane” and fleeting photographs that have limited appeal platform. In fact, Flickr’s mobile platform only became for long term archiving. Instagram images can have filters fully operational in 2017.9 This missed opportunity meant applied to them, which can drastically change the look and that Flickr became difficult for users to access via their feel of an image, transforming a bad image into one that smartphones (Bowker 2017), at the exact time when is more aesthetically pleasing. Such is the appeal of these smartphone usage was booming. By 2015, two thirds of the apps that it is now often claimed that the future of pho- UK population owned a smartphone, with 33% preferring tography lies in cameraphone apps (Eler 2012). to use their smartphones for accessing online content ra- It is especially bittersweet that image-centric social media ther than laptops and desktop computers.10 This rise in the apps such as Instagram and Snapchat have usurped a lot of use of smartphone usage also went hand in hand with the attention away from Flickr for the sharing of images and growth of smartphone photography. The fact that people videos, as it was Flickr that was originally a forerunner in the carry their smartphones with them everywhere they go more nuanced aspects of social networking. 
Flickr allowed means that they are always “at hand’” for capturing photos, for contacts to be marked as friends or family, and images and by 2009, 67% of UK households were using their could be shared with friends, family, or just a few specific smartphones as their main camera to take pictures with friends and family, they could also be shared with the public (Dutton, Helsper and Gerber 2009, 13). at large, or they could be marked as entirely private. This is Whereas taking a photograph was once generally set a more complex form of networking compared to the more aside for special events such as weddings, christenings, binary relationships seen in Instagram, where someone ei- birthdays, holidays, and family portraits (Murray 2008), ther is or is not a contact, and content is either shared with photographs are now increasingly being taken of the more the public at large or only with one’s contact list if the user’s mundane aspects of everyday life (Okabe 2004), as well as account is set to private (Honan 2012). Flickr has been increasing numbers of selfies (Walker 2005; Kedzior, Allen usurped at something it paved the way in. and Schroeder 2016). Whilst “mundane” photographs and We live in an increasingly visual world, and digital im- “selfies” are not as likely to make it into the prized family ages are a ubiquitous part of everyday life (Jörgensen album, they are however more likely to be shared with 2016), it, therefore, follows suit that even people who do friends and family in a more fleeting way, to generate hu- not class themselves as photographers nonetheless enjoy mour among friends and family (Meyer 2008), to let loved taking and sharing images. As such, apps such as Instagram ones know they are being thought of, or to add to personal and Snapchat are more likely to appeal to such people due common ground with someone (Kindberg et al., 2005a, to their ease and simplicity. 
In many ways, Flickr was a 2005b), and this kind of photography exchange lends it- placeholder, satisfying the desire that people had to show- self perfectly to the mobile platform. case the increased number of images that they were now Sharing images via MMS was slow to take off for two taking on their smartphones, and once a platform came main reasons: firstly, most people tended to have pay as along that was more specifically designed for the layman you go phones at that time and MMS messages were more image taker rather than the aspiring photographer, they expensive to send than a normal text message; secondly, in jumped ship. It seems unlikely that these people will ever order for an MMS to be successfully received, the recipient return to Flickr. 230 Knowl. Org. 46(2019)No.3 E. Stuart. Flickr: Organizing and Tagging Images Online
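The contrast drawn in this section between Flickr’s graded sharing options and Instagram’s binary model can be made concrete with a small sketch. The Python below is illustrative only, under stated assumptions: the names `Audience`, `Photo`, `FlickrContacts`, `flickr_can_view`, and `instagram_can_view` are hypothetical and are not taken from either service’s API. They simply encode the rules as summarised above from Honan (2012), and the option of sharing with “a few specific” people is left out for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Audience(Enum):
    """Hypothetical labels for the Flickr sharing levels described in the text."""
    PRIVATE = auto()
    FRIENDS = auto()
    FAMILY = auto()
    FRIENDS_AND_FAMILY = auto()
    PUBLIC = auto()


@dataclass
class Photo:
    owner: str
    audience: Audience


@dataclass
class FlickrContacts:
    """An owner's contacts; each may be marked as a friend, as family, or both."""
    friends: set = field(default_factory=set)
    family: set = field(default_factory=set)


def flickr_can_view(photo: Photo, viewer: str, contacts: FlickrContacts) -> bool:
    """Graded model: visibility depends on how the owner has marked the viewer."""
    if viewer == photo.owner or photo.audience is Audience.PUBLIC:
        return True
    if photo.audience is Audience.PRIVATE:
        return False
    is_friend = viewer in contacts.friends
    is_family = viewer in contacts.family
    if photo.audience is Audience.FRIENDS:
        return is_friend
    if photo.audience is Audience.FAMILY:
        return is_family
    return is_friend or is_family  # Audience.FRIENDS_AND_FAMILY


def instagram_can_view(owner: str, viewer: str, account_private: bool,
                       contact_list: set) -> bool:
    """Binary model: one account-wide switch and one undifferentiated contact list."""
    if viewer == owner or not account_private:
        return True
    return viewer in contact_list
```

In this sketch a photo marked `Audience.FAMILY` is visible to a contact marked as family but not to one marked only as a friend, a distinction that the binary model has no way to express because it knows only whether the viewer is on the contact list.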

5.0 The future

In April 2018 it was announced that Flickr had gone through another acquisition, with Yahoo (who themselves were acquired by Verizon/Oath in 2017) selling Flickr to fellow photo sharing and hosting site SmugMug.¹² Initial responses to the acquisition have been positive, with Flickr’s original creators Stewart Butterfield and Caterina Fake giving the thumbs up.¹³ The main reason behind the positive response is that companies such as Yahoo/Verizon/Oath have different priorities when it comes to thinking about the users of Flickr compared to SmugMug. A company such as Verizon/Oath is a multinational telecommunications conglomerate, whereas SmugMug is a fellow image sharing and hosting site. Whilst SmugMug is a paid for service that focuses on catering to semi-pro and professional photographers rather than social networking (Fleishman 2018), Flickr on the other hand (in addition to its paid for pro account option) also offers free accounts that come with one TB of storage. However, SmugMug is nonetheless a site that is passionate about photography, and that is a crucial difference when compared to Yahoo and Verizon/Oath.

One of the reasons attributed to Flickr’s demise was its failure to develop a successful mobile platform, and SmugMug’s CEO Don MacAskill already has ideas about how the mobile app can be improved (Fleishman 2018). The introduction of image-centric mobile apps such as Instagram and Snapchat has also been seen to contribute towards Flickr’s demise; however, MacAskill does not intend to try to compete with such apps. This is actually a smart move. It is unlikely that people who do not class themselves as photographers or aspiring photographers will return to Flickr, because apps such as Instagram and Snapchat fulfill their needs for a fun way to share their images in a much more instant way. Flickr is a more complex system, aimed at photographers, with more sophisticated image editing capabilities, and more opportunities for social networking related specifically to photography. MacAskill recognizes these differences, and rather than try to compete, he sees SmugMug and Flickr as a safe place for photographers to “do anything they want to do with their work,” with the ultimate aim being to create the technology that allows photographers to create the images that they want to create (Fleishman 2018).

Spiteri and Pennington (2018) advise that, “the internet is moving rapidly from the social web embodied in Web 2.0, to the semantic web (Web 3.0), where information resources are linked in such a way as to make them comprehensible to both machines and humans.” If machines can more easily understand information resources, they will be able to “build relationships between resources, enrich users’ experience and improve discoverability” (Cagnazzo 2018, 12). An example is put forward by Choudhury, Breslin, and Passant (2009), who explain that for resources tagged with terms such as “New York city, nyc and big apple, using one of the tags will only retrieve resources tagged with the exact match.” If, however, all of the tags could be linked to the same concept (uniquely identified by a URI), then all images tagged with at least one of the terms would be retrieved. Linking tags to URIs solves the issue of ambiguity, as tags are linked to unambiguous URIs (Cagnazzo 2018).

There have already been numerous studies proposing semi-automatic image annotation systems that attempt to enrich Flickr tags with semantic relationships (Rattenbury, Good and Naaman 2007; Im and Park 2015). Authors tend to caveat, however, that human assessment will always be needed in conjunction with semi-automated systems, as evidenced by the introduction of Flickr’s controversial auto-tagging feature in 2015. The same is likely to be true for the automatic addition of semantic tags. While semantic tags may be more accurate, and users may be given the option of which tags to include from a suggestion pool, it nonetheless remains that the average Flickr user is likely to be uninterested in using semantic tags, despite any benefits for future search and retrieval. SmugMug could, however, potentially persuade those photographers who are particularly keen to have their images found to adopt semantic tagging (perhaps via the use of a dedicated Flickr plugin, meaning that not all Flickr users would have semantic tagging forced upon them). SmugMug could highlight the benefits of semantic tagging in terms of search and retrieval and leave it up to the users to decide whether or not to use it.

6.0 Conclusions for knowledge organization

With reference to knowledge retrieval, and the fact that Flickr users may not necessarily search for images using the same search terms as those assigned by the image uploader, Cox (2008a) posits that most Flickr users are not likely to be searching Flickr with a specific “information need,” and are instead likely to be browsing for visual pleasure, thus rendering precision and recall irrelevant. This, therefore, creates a very different knowledge organization landscape than traditional classification indexing in specialist picture libraries and databases.

Findability is nonetheless still an important aspect of a site as large as Flickr, and tagging remains the dominant method of retrieving images. The section on tagging highlighted some of the problems inherent with tagging, and how it is these problems with tags that negatively affect their role in an information retrieval environment (Kim et al. 2008). Whilst the adoption of semantic tagging (tagging content with URIs) is seen as a way of overcoming some of the problems inherent with user-generated tags, semantic web requirements can often seem intimidating, and more user-friendly interfaces are needed (Cagnazzo 2018). As Flickr has always been a website that has been revered for its savvy design and easy to use interfaces (Tik 2005), it has the perfect vantage point to try to persuade at least some users to adopt semantic tagging. Professional and semi-professional photographers who are particularly keen to have their images found may be the user demographic who are most likely to be persuaded of the benefits of semantic tagging.

However, now that Flickr has been acquired by SmugMug, with their emphasis firmly placed on the role of the photographer rather than on profits, there is hope that Flickr will continue to grow its position as the biggest repository of images on the internet. With such a big repository of images comes a level of responsibility with regards to the content of the images stored within the system. Websites that become big and successful become arbiters in deciding what is and is not allowed on their systems. For social sites that become very successful, this has important political and societal implications, as it is the worldview of the sites’ owners that can begin to dictate the type of content that is allowed on the site. This can be reflected in the controversy in 2018 surrounding Facebook’s co-founder Mark Zuckerberg, who was accused of censoring content from Republican vloggers (Robertson 2018). Flickr has to be mindful about its approach to image censorship in order to achieve the right balance between protecting its user base and allowing freedom of expression.

Ultimately, Flickr’s main weapons for rising back up the popularity ranks are ones that it executed perfectly before Yahoo/Verizon/Oath came along: adapting quickly in the current internet age and listening to feedback from its loyal fan base. Now that Flickr has joined forces with SmugMug, it is envisaged that Flickr users will once again be placed at the forefront of any future endeavors.

Notes

1. Wray, Richard, and Bobbie Johnson. 2008. “I’m Off to Tend My Alpacas – Flickr Founder’s Exit Marks End of a Web Era.” The Guardian, June 20. https://www.theguardian.com/media/2008/jun/20/digitalmedia.yahoo
2. “5,000,000,000.” Flickr blog, September 19, 2010. http://blog.flickr.net/en/2010/09/19/5000000000/
3. Etherington, Darrell. 2014. “Flickr at 10: 1M Photos Shared Per Day, 170% Increase Since Making 1TB Free.” https://techcrunch.com/2014/02/10/flickr-at-10-1m-photos-shared-per-day-170-increase-since-making-1tb-free/
4. Kim, Eugene. 2014. “The Guy Who’s Trying to Build the Next Microsoft Wrote This Epic Resignation Letter When He Left Yahoo.” Business Insider, August 7. http://www.businessinsider.com/stewart-butterfield-epic-resignation-letter-2014-8?IR=T
5. Hern, Alex. 2015. “Flickr Faces Complaints Over ‘Offensive’ Auto-Tagging for Photos.” The Guardian, May 20. https://www.theguardian.com/technology/2015/may/20/flickr-complaints-offensive-auto-tagging-photos
6. Smith, Craig. 2018. “17 Interesting Flickr Stats and Facts (February 2018) by the Numbers.” Digital Marketing Ramblings, February 3. https://expandedramblings.com/index.php/flickr-stats/
7. Stapley, Jon. 2017. “Flickr Announces its Most Popular Cameras in 2017.” Digital Camera World, December 11. https://www.digitalcameraworld.com/news/flickr-announces-its-most-popular-cameras-in-2017
8. Galai, Noam. 2014. “Getty Images Announces the Termination of Their Partnership with Flickr.” Fstoppers, March 11. https://fstoppers.com/stock/getty-images-announces-termination-their-partnership-flickr-7998
9. Flickr. 2017. “Making Flickr Fully Responsive.” Flickr blog, April 18. http://blog.flickr.net/en/2017/04/18/making-flickr-fully-responsive/
10. Ofcom. 2015. “The UK Is Now a Smartphone Society.” August 6. https://www.ofcom.org.uk/about-ofcom/latest/media/media-releases/2015/cmr-uk-2015
11. “Lack of Text Appeal.” Economist 380 (2006): 56.
12. Flickr. 2018. “Together is Where Photographers Belong!” Flickr blog, April 20. https://blog.flickr.net/en/2018/04/20/together-smugmug-flickr/
13. Hawk, Thomas. 2018. “My Thoughts on the SmugMug Flickr Acquisition.” PetaPixel, April 21. https://petapixel.com/2018/04/21/thoughts-on-the-smugmug-flickr-acquisition/

References

Ames, Morgan, Dean Eckles, Mor Naaman, Mirjana Spasojevic and Nancy Van House. 2010. “Requirements for Mobile Photoware.” Personal and Ubiquitous Computing 14, no. 2: 95-109.
Ames, Morgan and Mor Naaman. 2007. “Why We Tag: Motivations for Annotation in Mobile and Online Media.” Paper presented at the SIGCHI Conference on Human Factors in Computing Systems, San Jose, April 28-May 3.
Anderson, Paul. 2007. “What is Web 2.0? Ideas, Technologies and Implications for Education.” JISC Technology and Standards Watch, February. http://www.jisc.ac.uk/media/documents/techwatch/tsw0701b.pdf

Angus, Emma, David Stuart and Mike Thelwall. 2010. “Flickr's Potential as an Academic Image Resource: An Exploratory Study.” Journal of Librarianship and Information Science 42: 268-78.
Angus, Emma, and Mike Thelwall. 2010. “Motivations for Image Publishing and Tagging on Flickr.” In Publishing in the Networked World: Transforming the Nature of Communication; 14th International Conference on Electronic Publishing 16-18 June 2010, Helsinki, Finland, ed. Turid Hedlund, Yasar Tonta. Helsinki: Hanken School of Economics, 189-204.
Angus, Emma, Mike Thelwall and David Stuart. 2008. “General Patterns of Tag Usage in Flickr Image Groups.” Online Information Review 32, no. 1: 89-101.
Aurnhammer, Melanie, Peter Hanappe and Luc Steels. 2006. “Integrating Collaborative Tagging and Emergent Semantics for Image Retrieval.” Paper presented at the Workshop on Collaborative Web Tagging, 15th International World Wide Web Conference (Edinburgh, Scotland, May 23-26, 2006). http://digital.csic.es/bitstream/10261/127833/1/Image%20Retrieval.pdf
Barton, David. 2015. “Tagging on Flickr as a Social Practice.” In Discourse and Digital Practices: Doing Discourse Analysis in the Digital Age, ed. Rodney Jones, Alice Chik, and Christoph A. Hafner. New York: Routledge, 48-65.
Bausch, Paul and Jim Bumgardner. 2006. Flickr Hacks: Tips and Tools for Sharing Photos Online. Sebastopol, CA: O’Reilly Media.
Beaudoin, Joan E. and Cécile Bosshard. 2012. “Flickr Images: What & Why Museums Share.” In ASIS&T 2012: Proceedings of the 75th ASIS & T Annual Meeting; Information, Interaction, Innovation, ed. Andrew Grove. Proceedings of the American Society for Information Science and Technology 49. Silver Spring, MD: ASIST.
Berinstein, Paula. 1996. Finding Images Online: Online User's Guide to Image Searching in Cyberspace. Wilton, CT: Pemberton.
Bowker, Daniela. 2017. “Flickr is on its Way Out. What Are Your Alternatives?” DIY Photography (blog). https://www.diyphotography.net/flickr-way-alternatives/
Cagnazzo, Laura. 2018. “Tagging the Semantic Web: Combining Web 2.0 and Web 3.0.” In Social Tagging in a Linked Data Environment, ed. Louise Spiteri and Diane Pennington. London: Facet, 11-38.
Choudhury, Smitashree, John G. Breslin, and Alexandre Passant. 2009. “Enrichment and Ranking of the Youtube Tag Space and Integration with the Linked Data Cloud.” In Proceedings of the International Semantic Web Conference, eds. Abraham Bernstein, David R. Karger, Tom Heath, Lee Feigenbaum, Diana Maynard, Enrico Motta and Krishnaprasad Thirunarayan. Berlin: Springer, 747-62.
Cox, Andrew M. 2008a. “Flickr: A Case Study of Web 2.0.” Aslib Proceedings 60: 493-516.
Cox, Andrew M. 2008b. “The Shaping of Mass Participation in Web 2.0: Photoshop Contest on Flickr.” Paper presented at the Workshop on Social Networking, Association of Internet Researchers 9th Annual Conference (Copenhagen, Denmark, October 15-18, 2008).
Cox, Andrew M, Paul Clough, and Jennifer Marlow. 2008. “Flickr: A First Look at User Behaviour in the Context of Photography as Serious Leisure.” Information Research 13, no. 1. http://informationr.net/ir/13-1/paper336.html
Cox, Andrew M, Paul Clough, and Stefan Siersdorfer. 2010. “Developing Metrics to Characterize Flickr Groups.” Journal of the American Society for Information Science and Technology 62: 493-506.
David, Gaby. 2010. “Camera Phone Images, Videos and Live Streaming: A Contemporary Visual Trend.” Visual Studies 25, no. 1: 89-98.
Ding, Ying, Elin K. Jacob, Zhixiong Zhang, Schubert Foo, Erjia Yan, Nicolas L. George, and Lijiang Guo. 2009. “Perspectives on Social Tagging.” Journal of the American Society for Information Science and Technology 60: 2388-401.
Dotan, Amir, and Panayiotis Zaphiris. 2010. “A Cross-Cultural Analysis of Flickr Users from Peru, Israel, Iran, Taiwan and the UK.” International Journal of Web Based Communities 6: 284-302.
Dutton, William H, Ellen J. Helsper, and Monica M. Gerber. 2009. The Internet in Britain: 2009. Oxford: Oxford Internet Institute.
Eakins, John P. and Margaret E. Graham. 1999. “Content-based Image Retrieval.” http://www.leeds.ac.uk/educol/documents/00001240.htm
Earle, Evan. 2014. “Crowdsourcing Metadata for Library and Museum Collections Using a Taxonomy of Flickr User Behavior.” Masters thesis, Cornell University.
Eler, Alicia. 2012. “Flickr Can't Go Back to What it Once Was.” Readwrite (blog), February 22. http://www.readwriteweb.com/archives/flickr_cant_go_back_to_what_it_once_was/
Fleishman, Glenn. 2018. “Yes, Flickr Has a Future – As ‘A Safe Place for Photographers.’” Fast Company, April 26. https://www.fastcompany.com/40563632/yes-flickr-has-a-future-as-a-safe-place-for-photographers
Golder, Scott A. and Bernardo A. Huberman. 2006. “Usage Patterns of Collaborative Tagging Systems.” Journal of Information Science 32: 198-208.
Graham, Margaret E. 2001. “The Cataloguing and Indexing of Images: Time for a New Paradigm?” Art Libraries Journal 26: 22-7.
Guy, Marieke and Emma Tonkin. 2006. “Folksonomies: Tidying Up Tags?” D-Lib Magazine 12, no. 1. http://www.dlib.org/dlib/january06/guy/01guy.html
Hammond, Tony, Timo Hannay, Ben Lund, and Joanna Scott. 2005. “Social Bookmarking Tools (I): A General Review.” D-Lib Magazine 11, no. 4. http://www.dlib.org/dlib/april05/hammond/04hammond.html
Heckner, Markus, Michael Heilemann, and Christian Wolff. 2009. “Personal Information Management vs. Resource Sharing: Towards a Model of Information Behaviour in Social Tagging Systems.” In Proceedings of the Third International Conference on Weblogs and Social Media, eds. Eytan Adar, Matthew Hurst, Tim Finin, Natalie Glance, Nicolas Nicolav and Belle Tseng. Menlo Park, CA: AAAI Press, 42-9. http://www.aaai.org/ocs/index.php/ICWSM/09/paper/download/212/407
Holmes, Paul, and Andrew M. Cox. 2011. “‘Every Group Carries the Flavour of the Admins’: Leadership on Flickr.” International Journal of Web Based Communities 7: 376-91.
Honan, Mat. 2012. “How Yahoo Killed Flickr and Lost the Internet.” Gizmodo (blog), May 15. https://gizmodo.com/5910223/how-yahoo-killed-flickr-and-lost-the-internet
Im, Dong-Hyuk and Geun-Duk Park. 2015. “Linked Tag: Image Annotation Using Semantic Relationships Between Image Tags.” Multimedia Tools and Applications 74: 2273-87.
Jansen, Bernard J. 2008. “Searching for Digital Images on the Web.” Journal of Documentation 64: 81-101.
Jörgensen, Corinne. 2003. Image Retrieval: Theory and Research. Lanham, MD: Scarecrow.
Jörgensen, Corinne. 2016. “Photos: Flickr, Facebook and Other Social Networking Sites.” In Managing Digital Cultural Objects: Analysis, discovery and retrieval, ed. Allen Foster and Pauline Rafferty. London: Facet, 143-81.
Kalfatovic, Martin R, Effie Kapsalis, Katherine P. Spiess, Anne Van Camp, and Michael Edson. 2008. “Smithsonian Team Flickr: A Library, Archives, and Museums Collaboration in Web 2.0 Space.” Archival science 8, no. 4: 267.
Kedzior, Richard, Douglas E. Allen, and Jonathan Schroeder. 2016. “The Selfie Phenomenon-Consumer Identities in the Social Media Marketplace.” European Journal of Marketing 50: 1767-72.
Kennedy, Lyndon, Mor Naaman, Shane Ahern, Rahul Nair, and Tye Rattenbury. 2007. “How Flickr Helps us Make Sense of the World: Context and Content in Community-Contributed Media Collections.” In Proceedings of the 15th ACM International Conference on Multimedia. New York, NY: ACM, 631-40. doi:10.1145/1291233.1291384
Kim, Hak Lae, Alexandre Passant, John G. Breslin, Simon Scerri, and Stefan Decker. 2008. “Review and Alignment of Tag Ontologies for Semantically-linked Data in Collaborative Tagging Spaces.” In Proceedings IEEE International Conference on Semantic Computing 2008: 4-7 August 2008, Santa Clara, California. Los Alamitos, CA: IEEE Computer Society 1: 315-22. doi:10.1109/ICSC.2008.79
Kindberg, Tim, Mirjana Spasojevic, Rowanne Fleck, and Abigail Sellen. 2005b. “I Saw This and Thought of You: Some Social Uses of Camera Phones.” In CHI '05 Extended Abstracts on Human Factors in Computing Systems. New York, NY: ACM, 1545-8. doi:10.1145/1056808.1056962
Lerman, Kristina, and Laurie Jones. 2006. “Social Browsing on Flickr.” Paper presented at the International Conference on Weblogs and Social Media, Boulder, Colorado, March 26-28.
Lerman, Kristina, Anon Plangprasopchok, and Chio Wong. 2007. “Personalizing Image Search Results on Flickr.” Paper presented at the Workshop on Intelligent Techniques for Web Personalization, 22nd AAAI Conference on Artificial Intelligence (Vancouver, Canada, July 22-26, 2007). https://arxiv.org/abs/0704.1676
Lin, Clifton and Haakon Faste. 2012. “Image Exploration and Social Relationships.” Paper presented at Conference on Human Factors in Computing Systems, Austin, Texas, May 5-10.
Macgregor, George, and Emma McCulloch. 2006. “Collaborative Tagging as a Knowledge Organisation and Resource Discovery Tool.” Library Review 55: 291-300.
Marlow, Cameron, Mor Naaman, danah boyd, and Marc Davis. 2006. “Position Paper, Tagging, Taxonomy, Flickr, Article, Toread.” Paper presented at the Workshop on Collaborative Web Tagging, 15th International World Wide Web Conference (Edinburgh, Scotland, May 23-26, 2006). https://www.danah.org/papers/WWW2006.pdf
Mathes, Adam. 2004. “Folksonomies-Cooperative Classification and Communication Through Shared Metadata.” Unpublished paper for Computer Mediated Communication LIs590MC, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html
Mäyrä, Frans. 2011. “Games in the Mobile Internet: Understanding Contextual Play in ‘Flickr’ and ‘Facebook’.” In Online Gaming in Context: The Social and Cultural Significance of Online Games, ed. Garry Crawford, Victoria K Gosling, and Ben Light. London: Routledge: 108-29.
McCracken, Harry. 2014. “Flickr Turns 10: The Rise, Fall, and Revival of a Photo-Sharing Community,” Time, February 10. http://time.com/6855/flickr-turns-10-the-rise-fall-and-revival-of-a-photo-sharing-community/
McDonald, David W. 2007. “Visual Conversation Styles in Web Communities.” In Proceedings of the 40th Annual Hawaii International Conference on System Sciences 3-6 January 2007 Big Island, Hawaii: ABSTRACTS and CD-ROM of Full Papers, ed. Ralph H. Sprague, Jr. Los Alamitos, CA: IEEE Computer Society, 76. doi:10.1109/HICSS.2007.605

McWilliams, Jeremy. 2008. “Developing an Academic Image Collection with Flickr.” Code4Lib Journal 3. http://journal.code4lib.org/articles/74
Meyer, Eric T. 2008. “Digital Photography.” In Handbook of Research on Computer Mediated Communication, ed. Sigrid Kelsey and Kirk St. Amant. Hershey, PA: Information Science Reference, 1: 791-803.
Mislove, Alan, Hema S. Koppula, Krishna Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2008. “Growth of the Flickr Social Network.” Paper presented at the Workshop on Online Social Networks, ACM SIGCOMM 2008 Conference (Seattle, USA, August 17-22, 2008). http://conferences.sigcomm.org/sigcomm/2008/workshops/wosn/papers/p25.pdf
Morville, Peter. 2005. Ambient Findability: What We Find Changes Who We Become. Sebastopol, CA: O’Reilly.
Murray, Susan. 2008. “Digital Images, Photo-Sharing, and Our Shifting Notions of Everyday Aesthetics.” Journal of Visual Culture 7, no. 2: 147-63.
Negoescu, Radu-Andrei, Brett Adams, Dinh Phung, Svetha Venkatesh, and Daniel Gatica-Perez. 2009. “Flickr Hypergroups.” In Proceedings of the 17th ACM International Conference on Multimedia. New York, NY: ACM, 813-5. doi:10.1145/1631272.1631421
Negoescu, Radu-Andrei, and Daniel Gatica-Perez. 2008. “Analyzing Flickr Groups.” In Proceedings of the 2008 International Conference on Content-Based Image and Video Retrieval. New York, NY: ACM, 417-26. doi:10.1145/1386352.1386406
Nov, Oded, Mor Naaman, and Chen Ye. 2009a. “Motivational, Structural and Tenure Factors that Impact Online Community Photo Sharing.” In Proceedings of the Third International Conference on Weblogs and Social Media, eds. Eytan Adar, Matthew Hurst, Tim Finin, Natalie Glance, Nicolas Nicolav and Belle Tseng. Menlo Park, CA: AAAI Press, 138-45. https://www.aaai.org/ocs/index.php/ICWSM/09/paper/viewFile/206/426
Nov, Oded, Mor Naaman, and Chen Ye. 2009b. “Analysis of Participation in an Online Photo-Sharing Community: A Multidimensional Perspective.” Journal of the American Society for Information Science and Technology 61: 433-637.
Okabe, Daisuke. 2004. “Emergent Social Practices, Situations and Relations Through Everyday Camera Phone Use.” Paper presented at the Mobile Communications and Social Change Workshop, 2004 International Conference on Mobile Communication (Seoul, Korea, October 18-19, 2004). http://www.itofisher.com/mito/archives/okabe_seoul.pdf
Prieur, Christophe, Dominique Cardon, Jean-Samuel Beuscart, Nicolas Pissard, and Pascal Pons. 2008. “The Strength of Weak Cooperation: A Case Study on Flickr.” http://arxiv.org/abs/0802.2317
Rafferty, Pauline. 2016. “Managing, Searching and Finding Digital Cultural Objects: Putting it in Context.” In Managing Digital Cultural Objects: Analysis, discovery and retrieval, ed. Allen Foster and Pauline Rafferty. London: Facet, 3-23.
Rafferty, Pauline, and Rob Hidderley. 2007. “Flickr and Democratic Indexing: Dialogic Approaches to Indexing.” Aslib Proceedings 59: 397-410.
Rattenbury, Tye, Nathaniel Good, and Mor Naaman. 2007. “Towards Automatic Extraction of Event and Place Semantics from Flickr Tags.” In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY: ACM, 103-10. doi:10.1145/1277741.1277762
Robertson, Adi. 2018. “Republican Lawmakers Keep Grilling Mark Zuckerberg About ‘Censoring’ Two Conservative Vloggers.” Verge (blog), April 11. https://www.theverge.com/2018/4/11/17225120/mark-zuckerberg-facebook-congress-cruz-blackburn-diamond-silk
Rorissa, Abebe. 2010. “A Comparative Study of Flickr Tags and Index Terms in a General Image Collection.” Journal of the American Society for Information Science and Technology 61, no. 11: 2230-42.
Shirky, Clay. 2005. “Ontology is Overrated: Categories, Links, Tags,” Clay Shirky’s writings About the Internet (blog). http://shirky.com/writings/ontology_overrated.html
Sigurbjörnsson, Börkur, and Roelof van Zwol. 2008. “Flickr Tag Recommendation Based on Collective Knowledge.” In Proceedings of the 17th International Conference on World Wide Web. New York, NY: ACM, 327-36. doi:10.1145/1367497.1367542
Smith, Gene. 2008. Tagging: People Powered Metadata for the Social Web. Berkeley, CA: New Riders.
Soergel, Dagobert. 2009. “Digital Libraries and Knowledge Organization.” In Semantic Digital Libraries, ed. Sebastian Ryszard Kruk and Bill McDaniel. Berlin: Springer, 9-39.
Spiteri, Louise F. 2007. “The Structure and Form of Folksonomy Tags: The Road to the Public Library Catalog.” Information Technology and Libraries 26: 13-25.
Spiteri, Louise, and Diane Pennington. 2018. “Introduction: The Continuing Evolution of Social Tagging.” In Social Tagging in a Linked Data Environment, ed. Louise Spiteri and Diane Pennington. London: Facet, 1-10.
Springer, Michelle, Beth Dulabahn, Phil Michel, Barbara Natanson, David Reser, David Woodward, and Helena Zinkham. 2008. “For the Common Good: The Library of Congress Flickr Pilot Project.” https://www.loc.gov/rr/print/flickr_report_final.pdf
Stuart, Emma. 2012. “Motivations to Upload and Tag Images vs Tagging Practice: An Investigation of The Web 2.0 Site Flickr.” PhD diss., University of Wolverhampton.
Stuart, Emma. 2013. “Organizing Photographs: Past and Present.” In New Directions in Information Organization, ed.

Jung-ran Park and Lynne Howarth. Bingley, England: van Zwol, Roelof. 2007. “Flickr: Who is looking?” In Pro- Emerald,137-55. ceedings of the IEEE/WIC/ACM International Conference on Stvilia, Besiki, and Corinne Jörgensen. 2010. “Member Ac- Web Intelligence (WI 2007)November 2-5, 2007, Fremont Mar- tivities and Quality of Tags in a Collection of Historical riott Hotel, Silicon Valley, USA, ed. Tsau Young (T.Y.) Lin, Photographs in Flickr.” Journal of the American Society for Laura Haas, Janusz Kacprzyk, Rajeev Motwani, Andrei Information Science and Technology 61: 2477-89. Broder, and Howard Ho. Los Alamitos, CA: IEEE Com- Terras, Melissa. 2008. Digital Images for the Information Profes- puter Society, 184-90. doi:10.1109/WI.2007.60 sional. Aldershot, England: Ashgate. Walker, Jill. 2005. “Mirrors and Shadows: The Digital Aes- Tik, Jan. 2005. “Why is Flickr So Successful.” Flickr Central thicisation of Oneself.” Paper presented at the Digital (blog), May 20. https://www.flickr.com/groups/cen Arts and Culture Conference, Copenhagen, Denmark, tral/discuss/36512/ December 1-3, 2005, https://bora.uib.no/handle/1956/ Van House, Nancy. 2007. “Flickr and Public Image Sharing: 1136 Distant Closeness and Photo Exhibition.” In Proceedings Weinberger, David. 2007. Everything is Miscellaneous: The Power of CHI ‘07 Conference on Human Factors in Computing Sys- of the New Digital Disorder. New York: Times Books. tems. New York, NY: ACM, 2717-22. doi:10.1145/1240 Wilkinson, David. 2007. Flickr Mashups. Indianapolis, IN: 866.1241068 Wrox/Wiley. Van House, Nancy, Marc Davis, Morgan Ames, Megan Xu, Zhichen, Yun Fu, Jianchang Mao, and Difu Su. 2006. Finn, and Vijay Viswanathan. 2005. 
“The Uses of Per- “Towards the Semantic Web: Collaborative Tag Sugges- sonal Networked Digital Imaging: An Empirical Study of tions.” Paper presented at the Workshop on Collabora- Cameraphone Photos and Sharing.” In CHI 2005: Tech- tive Web Tagging, 15th International World Wide Web nology, Safety, Community: Conference Proceedings; Conference on Conference (Edinburgh, Scotland, May 23-26, 2006). Human Factors in Computing Systems, Portland, Oregon, USA, http://www.ambuehler.ethz.ch/CDstore/www2006/ April 2-7 , ed Wendy Kellogg, Shumin Zhai, Carolyn www.rawsugar.com/www2006/13.pdf Gale, and Gerrit van der Veer. New York, NY: ACM. Zollers, Alla. 2007. “Emerging Motivations for Tagging: doi:10.1145/1056808.1057039 Expression, Performance, and Activism.” Paper pre- Van House, Nancy, Marc Davis, Yuri Takhteyev, Morgan sented at [WWW2007 Workshop: Tagging and Metadata Ames, and Megan Finn. 2004. “The Social Uses of Per- for Social Information Organization], World Wide Web sonal Photography: Methods for Projecting Future Im- Conference, Banff, Alberta, Canada, May 8-12, 2007. aging Applications.” http://people.ischool.berkeley.edu/ http://wwwconference.org/www2007/workshops/pa ~vanhouse/photo_project/pubs/vanhouse_et_al_2004 per_55.pdf b.pdf


Letters to the Editor

DOI:10.5771/0943-7444-2019-3-236

Knowledge Ontology: A Tool for the Unification of Knowledge

One important mission of knowledge organization is to construct a united KOS (knowledge organization system) that covers general and special domains and serves both experts and ordinary users. To this end, I propose the “knowledge ontology” project based on a human-needs driven knowledge classification model (Guohua Xiao 2013). Five related ontologies accessible at the URL (https://github.com/knowledgeontology/KO) are being constructed (in OWL format by Protégé (Mark A. Musen 2015)): “pure methods,” “pure technology,” “pure theory,” “social life,” “personal life.”

The “knowledge ontology” is based on the core hypothesis (the philosophy of “knowledge ontology,” Figure 1A): language is the clue linking personal life to the universe. The idea is similar to the “tree of knowledge” (Henriques, G. R. 2013), but the difference is that the “knowledge ontology” makes an explicit distinction between the real world and knowledge about the real world. There are two meanings of language: a broad one and a narrow one. The narrow one is, evidently, the symbolic language that we human beings use. Based on languages, we create a virtual world: knowledge. The broader understanding, however, considers the laws that nature and society abide by. For example, the living world is run by a language composed mainly of four letters, “ATGC.” The DNA language creates “the DNA knowledge,” which creates a higher language (neural language). The process is like a chain linking the universe to personal life. So, the higher-level language has the ability to reflect the lower language. Taking the living world as an example again: animals can perceive nature by vision and audition; animals use a neural language that is higher than the DNA language. In particular, we human beings use the highest-level language: the symbolic language. So, we have an ability that animals do not have: to explain the real world. In particular, we create linguistics to explain language with language itself.

Let us try to answer three basic questions (Smiraglia, Richard P. 2014) on knowledge organization in another way:

Q: How do I know?
A: By language (language is the foundation of knowing).
Q: What is?
A: By philosophy (including mathematics and aesthetics, tools to distinguish).
Q: How is it ordered?
A: By science of knowledge (to find structure).

I integrate language, philosophy and science of knowledge as the first ontology in the “knowledge ontology.” The three parts depend on one another; that is why computer scientists use knowledge graph technology to analyse NLP (natural language processing) by machine learning (methods, which can be considered philosophy here); meanwhile, NLP also supports knowledge graph construction.

Figure 1. A) The core philosophy of knowledge ontology B) The structure of knowledge ontology.
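The top levels of the structure that this letter describes for Figure 1B can also be sketched in code. The Python fragment below is only an illustration under stated assumptions: the labels are taken from the letter, but the nesting, the names `KNOWLEDGE_ONTOLOGY`, `flatten` and `share_structure`, and the choice of tuples and dictionaries are hypothetical and do not reproduce the OWL files in the project repository.

```python
# Hypothetical sketch of the top levels of the "knowledge ontology".
# Labels come from the letter; the data layout is an assumption.

# Division shared by "pure theory" and "application" (trichotomy rule).
TRICHOTOMY = ("sciences", "arts", "religions")

KNOWLEDGE_ONTOLOGY = {
    "pure methods": ("language", "philosophy", "science of knowledge"),
    "pure technology": ("material", "energy", "information"),
    "pure theory": TRICHOTOMY,
    # "application" groups the two life ontologies (dichotomy rule).
    "application": {
        "social life": TRICHOTOMY,
        "personal life": TRICHOTOMY,
    },
}


def flatten(tree):
    """Map each ontology name to its tuple of parts, ignoring grouping nodes."""
    flat = {}
    for name, value in tree.items():
        if isinstance(value, dict):
            flat.update(flatten(value))
        else:
            flat[name] = value
    return flat


def share_structure(tree, names):
    """Return True if all named ontologies are divided in the same way."""
    flat = flatten(tree)
    return len({flat[name] for name in names}) == 1


if __name__ == "__main__":
    print(list(flatten(KNOWLEDGE_ONTOLOGY)))
    print(share_structure(KNOWLEDGE_ONTOLOGY,
                          ["pure theory", "social life", "personal life"]))
```

In this sketch, `flatten` recovers the five ontologies named in the letter, and `share_structure` gives a simple way to check that “pure theory,” “social life” and “personal life” carry the same division, which is one reading of the isomorphism rule.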

Based on the core hypothesis described above, I designed the following rules to organize knowledge (Figure 1B):

1. Isomorphism:
Figure 1(A) describes the hierarchical structure of the world. The “pure methods” ontology is a special ontology that can be seen as a link between the real world and knowledge (it is a special spot, not involving any entities). So, we can see the four ontologies (except “pure methods”) in the “knowledge ontology” as four mappings of the real world. I will explain why the four ontologies are homogeneous with the real world using an example. For instance, knowledge about language and knowledge can be located as follows (the example is an extension of “The Brain Is A Knowledge Graph” (Guohua Xiao 2019)):

– In the real world: the potential ability for human beings to create language and knowledge
– Pure technology: knowledge graphs in artificial intelligence
– Pure theory: brain semantic networks
– Social life: applied psychology dealing with language barriers
– Personal life: knowledge on speech and communication

2. Dichotomy:
Dichotomy is easy to understand. In the human-needs driven model, I classify knowledge into scientific method and sciences. In the “knowledge ontology,” the dichotomy rule is used more widely, such as “pure theory” vs “application,” “social life” vs “personal life.” Moreover, the dichotomy rule is easy to use for analysing interdisciplinary subjects.

3. Trichotomy:
Material, energy and information are three aspects of the world, which is the source of trichotomy. So, I divide sciences into three parts: natural science, social science and cognitive science. Similarly, “personal life” can be divided too: natural needs, social needs and cognitive needs. There is a new application of the trichotomy rule in the “knowledge ontology”: “pure theory” or “application” can be divided into:

– Sciences (hard but real knowledge, based on mathematics): On how to understand the world (in “pure theory”) and how to change the world (in “application”)?
– Arts (soft but real knowledge, based on aesthetics): On how to constrain the understanding and changing above?
– Religions (void knowledge, not real, but also useful): We human beings must get a complete explanation of the whole world (due to the limitation of cognitive ability, we need religions)!

Next, I will briefly introduce the structure of the “knowledge ontology” (Figure 1B). “Pure methods” refers to theoretical methods mainly composed of languages, philosophy and science of knowledge, which are discussed above. “Pure technology” is close to technological science, which supplies tools but does not satisfy human needs directly. The three basics of “pure technology” are material, energy and information, which are integrated in a comprehensive technology: robotics. “Pure theory” and “application” (including “social life” and “personal life”) have the same structure, composed of sciences, arts and religions, whose relationships are discussed above. But “pure theory,” “social life” and “personal life” have different focuses. In particular, “personal life” refers to Maslow’s hierarchy of needs (Maslow, A.H. 1943), and these ontologies are homogeneous. That is why the model is human-needs driven. Detailed ontologies can be downloaded from the URL of the “knowledge ontology.”

Finally, in order to compare the “knowledge ontology” with two other important KOSs, Peirce's classification of sciences (https://en.wikipedia.org/wiki/Classification_of_the_sciences_(Peirce)) and Information Coding Classification (Ingetraut Dahlberg 2012), I annotated these two KOSs to the “knowledge ontology” (only high level terms are annotated at present, and annotation files can be downloaded from the URL of the “knowledge ontology”). Here, I just give an example of the annotations: in Peirce’s classification of sciences, there is an interesting class, “science of review.” I think it has a similar meaning to “religions,” so I annotate it to “religions” in the “knowledge ontology.”

In summary, I have described in this letter a theoretical framework of a tool for the unification of knowledge. The ontologies and annotations will be updated in the future, which will make the “knowledge ontology” and its annotations dynamic. I hope it is useful for both experts on knowledge organization (just another KOS) and ordinary users (a tool helping to choose a career). By the way, I thank Prof. Changle Zhou at Xiamen University for some inspirations from his open lectures.

References

Dahlberg, Ingetraut. 2012. “A Systematic New Lexicon of All Knowledge Fields Based on the Information Coding Classification.” Knowledge Organization 39: 142-50. doi:10.5771/0943-7444-2012-2-142

Guohua Xiao. 2013. “A Knowledge Classification Model Based on the Relationship Between Science and Human Needs.” Knowledge Organization 40: 77-8.
Guohua Xiao. 2019. “The Brain Is A Knowledge Graph.” Knowledge Organization 46: 71.
Henriques, G. R. 2013. “The Tree of Knowledge System and the Theoretical Unification of Psychology.” Review of General Psychology 7: 150-82.
Maslow, A. H. 1943. “A Theory of Human Motivation.” Psychological Review 50: 370-96.
Musen, Mark A. 2015. “The Protégé Project: A Look Back and a Look Forward.” AI Matters 1, no. 4: 4-12. doi:10.1145/2757001.2757003
Smiraglia, Richard P. 2014. The Elements of Knowledge Organization. Berlin: Springer.

Guohua Xiao
School of Computer Science, Fudan University, Shanghai 200433, China

Annual Progress in Knowledge Organization (KO)? Annual Progress in Thesaurus Research?

Earlier we had the publication Annual Review of Information Science and Technology, ARIST, published from 1966 to 2011. It belongs to a family of Annual Reviews that are very popular (and highly cited) in almost any discipline (and often such Annual Reviews exist in subfields too). I have always been interested in this kind of research synthesis (along with many other kinds). But it has struck me that they almost never live up to their names, or at least to what I expect from publications with such titles. They almost never consider progress in the same field year by year (this is also true for my own contributions in this genre, Hjørland 2007; Capurro and Hjørland 2003; Hjørland and Kyllesbech Nielsen 2001). (This does not make them an unnecessary scholarly genre, however; they are still very fruitful by presenting and reviewing publications in the field on a more or less regular basis).

Have a look at Table 1:

Year (publications)
2019 (1)     2003 (11)    1987 (7)     1972 (24)
2018 (10)    2002 (9)     1986 (17)    1971 (13)
2017 (9)     2001 (7)     1985 (10)    1970 (20)
2016 (13)    2000 (6)     1984 (11)    1969 (14)
2015 (12)    1999 (12)    1983 (14)    1968 (12)
2014 (10)    1998 (19)    1982 (22)    1967 (5)
2013 (11)    1997 (14)    1981 (20)    1966 (4)
2012 (9)     1996 (7)     1980 (13)    1965 (5)
2011 (9)     1995 (25)    1979 (15)    1964 (3)
2010 (22)    1994 (32)    1978 (18)    1962 (3)
2009 (10)    1993 (19)    1977 (13)    1961 (4)
2008 (12)    1992 (27)    1976 (18)    1960 (1)
2007 (14)    1991 (26)    1975 (20)    1958 (1)
2006 (17)    1990 (38)    1974 (16)    1957 (1)
2005 (15)    1989 (21)    1973 (13)    1947 (1)
2004 (15)    1988 (24)

Table 1. Publications indexed in Web of Science.

This table shows the number of publications indexed by Web of Science in the subcategory “information science and library science” containing the word “thesaurus” or “thesauri” in the title (total of 824 documents). Now my question is: what progress has been made concerning thesauri year by year by all these publications? Can we say that specific kinds of progress have been made each year, or each year with more than five publications, or could we characterize progress in thesaurus research for each five-year interval (including, of course, theoretical and metatheoretical contributions), or are all such ideas of identifying specific progress in thesaurus research problematic and unrealistic? I guess they are. One reason could be that we have a culture in which we do not expect publications to contribute new knowledge to the field, but just to write papers about something in the field. If this is the case, it is, of course, a sign of a crisis and a problematic scientific culture. In my opinion, this may also be related to another problem: that research too seldom takes its point of departure in the research literature and considers its knowledge base, including, of course, unsolved problems and problematic conceptions and methodologies. My main motivation to edit ISKO Encyclopedia of Knowledge Organization (IEKO) and the Reviews of Concepts in KO series in the present journal is to make it easier to orient oneself in the knowledge base of KO (including unsolved problems and problematic conceptions and methodologies).

I have not looked into these publications about thesauri year by year, but perhaps this letter may inspire somebody to do so, i.e., to look at the history of thesaurus research from this point of view.

References

Capurro, Rafael and Birger Hjørland. 2003. “The Concept of Information.” Annual Review of Information Science and Technology 37: 343-411. doi:10.1002/aris.1440370109
Hjørland, Birger. 2007. “Semantics and Knowledge Organization.” Annual Review of Information Science and Technology 41: 367-405. doi:10.1002/aris.2007.1440410115
Hjørland, Birger and Lykke Kyllesbech Nielsen. 2001. “Subject Access Points in Electronic Retrieval.” Annual Review of Information Science and Technology 35: 249-98.

Birger Hjørland
Department of Information Studies
University of Copenhagen
Email: [email protected]
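The arithmetic behind Table 1 in the letter above can be checked with a few lines of code. The Python sketch below transcribes the yearly counts from the table, confirms the stated total, and groups the counts into five-year intervals, one of the units the letter suggests for characterizing progress. The names `COUNTS`, `total` and `by_interval` are illustrative, and aligning the intervals on years divisible by five is an assumption; the letter does not specify where intervals begin.

```python
# Counts transcribed from Table 1: year -> number of indexed publications
# with "thesaurus" or "thesauri" in the title.
COUNTS = {
    2019: 1, 2018: 10, 2017: 9, 2016: 13, 2015: 12, 2014: 10, 2013: 11,
    2012: 9, 2011: 9, 2010: 22, 2009: 10, 2008: 12, 2007: 14, 2006: 17,
    2005: 15, 2004: 15, 2003: 11, 2002: 9, 2001: 7, 2000: 6, 1999: 12,
    1998: 19, 1997: 14, 1996: 7, 1995: 25, 1994: 32, 1993: 19, 1992: 27,
    1991: 26, 1990: 38, 1989: 21, 1988: 24, 1987: 7, 1986: 17, 1985: 10,
    1984: 11, 1983: 14, 1982: 22, 1981: 20, 1980: 13, 1979: 15, 1978: 18,
    1977: 13, 1976: 18, 1975: 20, 1974: 16, 1973: 13, 1972: 24, 1971: 13,
    1970: 20, 1969: 14, 1968: 12, 1967: 5, 1966: 4, 1965: 5, 1964: 3,
    1962: 3, 1961: 4, 1960: 1, 1958: 1, 1957: 1, 1947: 1,
}


def total(counts):
    """Sum the publication counts over all years."""
    return sum(counts.values())


def by_interval(counts, width=5):
    """Sum counts into intervals keyed by their first year.

    Assumption: intervals start on years divisible by `width`,
    so 1990 covers 1990-1994 when width is 5.
    """
    bins = {}
    for year, n in counts.items():
        start = year - year % width
        bins[start] = bins.get(start, 0) + n
    return dict(sorted(bins.items()))


if __name__ == "__main__":
    print("total:", total(COUNTS))
    for start, n in by_interval(COUNTS).items():
        print(f"{start}-{start + 4}: {n}")
```

The transcribed counts sum to the 824 documents reported in the letter. A tally of this kind says nothing about progress in the letter's sense; it only shows how the publication volume is distributed over the intervals one might choose to examine.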


Books Recently Published

Compiled by J. Bradford Young

DOI:10.5771/0943-7444-2019-3-240

Abedjan, Ziawasch, Lukasz Golab, Felix Naumann and Thorsten Papenbrock. 2019. Data Profiling. Synthesis Lectures on Data Management 52. San Rafael, CA: Morgan & Claypool.
Bhattacharyya, Siddhartha, Indrajit Pan, Abhijit Das and Shibakali Gupta. 2019. Intelligent Multimedia Data Analysis. Frontiers in Computational Intelligence 2. Berlin: De Gruyter.
Bouveyron, Charles, Gilles Celeux, T. Brendan Murphy and Adrian E. Raftery. 2019. Statistical Model-Based Clustering and Classification: With Applications in R. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.
Chakraborty, Chinmay. 2019. Advanced Classification Techniques for Healthcare Analysis. Advances in Medical Technologies and Clinical Practice. Hershey, PA: Medical Information Science Reference.
Chaudhary, S. K. 2019. Dewey Decimal Classification. Encyclopaedia of Teaching of Library Science 1. New Delhi: APH Publishing.
Dong, Guozhu. 2019. Exploiting the Power of Group Differences: Using Patterns to Solve Data Analysis Problems. Synthesis Lectures on Data Mining and Knowledge Discovery 16. San Rafael, CA: Morgan & Claypool.
Fischer, Anja. 2019. SAS Administration from the Ground up: Running the SAS9 Platform in a Metadata Server Environment. Cary, NC: SAS Institute.
Floridi, Luciano. 2019. The Logic of Information: A Theory of Philosophy as Conceptual Design. Oxford: Oxford University Press.
Garoufallou, Emmanouel, Fabio Sartori, Rania Siatri, and Marios Zervas, eds. 2019. Metadata and Semantic Research: 12th International Conference, MTSR 2018, Limassol, Cyprus, October 23-26, 2018, Revised Selected Papers. Communications in Computer and Information Science 846. Cham, Switzerland: Springer.
Graheli, Shanti, ed. 2019. Buying and Selling: The Business of Books in Early Modern Europe. Library of The Written Word 72. The Handpress World 55. Leiden: Brill.
Grajner, Martin and Guido Melchior, eds. 2019. Handbuch Erkenntnistheorie. Berlin: J.B. Metzler.
Grossmann, Reinhardt. The Existence of The World: An Introduction to Ontology. Routledge Library Editions: Metaphysics 3. Abingdon, Oxon.: Routledge.
Gurukkal, Rajan. 2019. History and Theory of Knowledge Production: An Introductory Outline. New Delhi: Oxford University Press.
Harm, Volker, Anja Lobenstein-Reichmann and Gerhard Diehl. 2019. Wortwelten: Lexikographie, Historische Semantik und Kulturwissenschaft. Lexicographica. Series Maior 155. Berlin: De Gruyter.
Holland, Jocelyn. 2019. The Lever as Instrument of Reason: Technological Constructions of Knowledge around 1800. New Directions in German Studies 25. New York: Bloomsbury Academic.
Husserl, Edmund. 2019. First Philosophy: Lectures 1923/24 and Related Texts from the Manuscripts (1920-1925), trans. Sebastian Luft and Thane M. Naberhaus. Vol. 14 of Collected Works. Dordrecht: Springer Nature. Translation of Erste Philosophie (1923/24).
Jagdish, Shraddha. 2019. Online Cataloguing in Library Science. Delhi: Swastik Publications.
Kabir, Mitt Nowshade. 2019. Knowledge-Based Social Entrepreneurship: Understanding Knowledge Economy, Innovation, and the Future of Social Entrepreneurship. Palgrave Studies in Democracy, Innovation, and Entrepreneurship for Growth. New York: Palgrave Macmillan.
Losee, Robert M. 2019. Predicting Information Retrieval Performance. Synthesis Lectures on Information Concepts, Retrieval, and Services 65. San Rafael, CA: Morgan & Claypool.
Losh, Elizabeth. 2019. Hashtag. Object Lessons. New York: Bloomsbury Academic.
Mahmood, Ahmed R. and Walid G. Aref. 2019. Scalable Processing of Spatial-Keyword Queries. Synthesis Lectures on Data Management 56. San Rafael, CA: Morgan & Claypool.
Meadows, Jack. 2019. Understanding Information. Berlin: K. G. Saur. Reprint of 2018 ed.
Nowicki, Robert K. 2019. Rough Set-Based Classification Systems. Studies in Computational Intelligence 802. Cham, Switzerland: Springer.
Pitteloud, Luca, and Evan Keeling, eds. 2019. Psychology and Ontology in Plato. Philosophical Studies Series 139. Cham, Switzerland: Springer.
Speed, Laura J., Carolyn O’Meara, Lila San Roque, and Asifa Majid, eds. 2019. Perception Metaphors. Converging Evidence in Language and Communication Research 19. Amsterdam: John Benjamins.


Publisher

Ergon – ein Verlag in der Nomos Verlagsgesellschaft mbH
Waldseestraße 3-5
D-76530 Baden-Baden
Tel. +49 (0)7221-21 04-667
Fax +49 (0)7221-21 04-27
Sparkasse Baden-Baden Gaggenau
IBAN: DE05 6625 0030 0005 0022 66
BIC: SOLADES1BAD

Editor-in-chief (Editorial office)

KNOWLEDGE ORGANIZATION
Journal of the International Society for Knowledge Organization
Richard P. Smiraglia, Editor-in-Chief
[email protected]

Instructions for Authors

Manuscripts should be submitted electronically (in Microsoft® Word format) in English only via ScholarOne at https://mc04.manuscriptcentral.com/jisko. Manuscripts that do not adhere to these guidelines will be returned to the authors for resubmission in proper form.

Manuscripts should be accompanied by an indicative abstract of approximately 250 words. Manuscripts of articles should fall within the range 6,000-10,000 words. Longer manuscripts will be considered on consultation with the editor-in-chief.

A separate title page should include the article title and the author’s name, postal address, and E-mail address. Only the title of the article should appear on the first page of the text. Contact information must be present for all authors of a manuscript. To protect anonymity, the author’s name should not appear on the manuscript.

Criteria for acceptance will be appropriateness to the field of knowledge organization (see Scope and Aims), taking into account the merit of the contents and presentation. It is expected that all successful manuscripts will be well-situated in the domain of knowledge organization, and will cite all relevant literature from within the domain. Authors are encouraged to use the KO literature database at http://www.isko.org/lit.html.

The manuscript should be concise and should conform to professional standards of English usage and grammar. Authors whose native language is not English are encouraged to make use of professional academic English-language proofreading services. We recommend Vulpine Academic Services ([email protected]).

Manuscripts are received with the understanding that they have not been previously published, are not being submitted for publication elsewhere, and that if the work received official sponsorship, it has been duly released for publication. Submissions are refereed, and authors will usually be notified within 6 to 8 weeks.

Under no circumstances should the author attempt to mimic the presentation of text as it appears in our published journal. Instead, please follow these instructions.

In Microsoft® Word please set the language preference (“Tools,” “Language”) to “English (US)” or “English (UK).”

The entire manuscript should be double-spaced, including notes and references.

The text should be structured with decimally-numbered subheadings (1.0, 1.1, 2.0, 2.1, 2.1.1, etc.). It should contain an introduction, giving an overview and stating the purpose, a main body, describing in sufficient detail the materials or methods used and the results or systems developed, and a conclusion or summary.

Author-generated keywords are not permitted.

Footnotes are not allowed. Endnotes are accepted only in rare cases and should be limited in number; all narration should be included in the text of the article. Do not use automatic footnote formatting. Instead, insert a superscript numeral (Format, Font, Superscript) and create the text of the note manually in a separate list at the end of the manuscript, before the reference list.

Paragraphs should include a topic sentence, a developed narrative and a conclusion; a typical paragraph has several sentences. Paragraphs with tweet-like characteristics (one or two sentences) are inappropriate.

Italics are permitted only for phrases from languages other than English, and for the titles of published works. Bold type is not permitted.

Em-dashes should not be used as substitutes for commas. Dashes must be inserted manually (Insert, Advanced Symbol, Em-dash) with no spaces on either side.

Do not use automatic formatting of any kind. To indent, use the ruler. Do not use tabs under any circumstances. For a bulleted list, indent the list using the ruler, then insert bullets (Insert, Advanced Symbol, bullet). Do not use automatically-numbered paragraphs.

Illustrations should be embedded within the document. Photographs (including color and half-tone) should be scanned with a minimum resolution of 600 dpi and saved as .jpg files. Tables should contain a number and caption at the bottom, and all columns and rows should have headings. All illustrations should be cited in the text as Figure 1, Figure 2, etc. or Table 1, Table 2, etc.

Examples of classification arrays should be configured as figures and set into the document as jpgs; they should not be entered as editable text.

Remove all active hyperlinks, including those from reference formatting software (if hovering over the text with a mouse produces a gray highlight, the text is hyperlinked; remove the link “Insert,” “Hyperlink,” “Remove link”).

Reference citations within the text should have the form: (Author year). For example, (Jones 1990). Specific page numbers are required for quoted material, e.g. (Jones 1990, 100). A citation with two authors would read (Jones and Smith 1990); three or more authors would be: (Jones et al. 1990). When the author is mentioned in the text, only the date and optional page number should appear in parentheses: “According to Jones (1990), …” or “Smith wrote (2010, 146): ….” A subsequent page reference to the same cited work (e.g., to Smith 2010) should have the form “(229).” There is never a comma before the date. In-text citations should not be routinely placed at the end of a sentence or after a quotation, but an attempt should be made to work them into the narrative. For example:

“Jones (2010, 114) reported statistically significant results.”
“Many authors report similar data; according to Matthews (2014, 94): ‘all seven studies report means within ±5%.’”

In-text citations should precede block quotations, and are never placed at the end of a block quotation.

References should be listed alphabetically by author at the end of the article. Reference lists should not contain references to works not cited in the text. Websites mentioned in passing in the text should be identified parenthetically with their URLs but not with references unless a specific page of a specific website is being quoted.

Author names should be given as found in the sources (not abbreviated, but also not fuller than what is given in the source). Journal titles should not be abbreviated. Multiple citations to works by the same author should be listed chronologically and should each include the author’s name. Articles appearing in the same year should have the following format: “Jones 2005a, Jones 2005b, etc.”

Proceedings must be identified fully by title, editor, and details of publication.

Journal issue numbers are given only when a journal volume is not through-paginated.

References for published electronic resources should be accompanied by either a URL or DOI but not in lieu of actual publication data; access dates are not allowed.

Unpublished electronic resources may use an access date in lieu of a date of publication. In cases of doubt, authors are encouraged to consult The Chicago Manual of Style 17th ed. (or online), author-date reference system (chapter 15).

Examples:

Dahlberg, Ingetraut. 1978. “A Referent-Oriented, Analytical Concept Theory for INTERCONCEPT.” International Classification 5: 142-51.
Howarth, Lynne C. 2003. “Designing a Common Namespace for Searching Metadata-Enabled Knowledge Repositories: An International Perspective.” Cataloging & Classification Quarterly 37, nos. 1/2: 173-85.
Pogorelec, Andrej and Alenka Šauperl. 2006. “The Alternative Model of Classification of Belles-Lettres in Libraries.” Knowledge Organization 33: 204-14.
Schallier, Wouter. 2004. “On the Razor’s Edge: Between Local and Overall Needs in Knowledge Organization.” In Knowledge Organization and the Global Information Society: Proceedings of the Eighth International ISKO Conference 13-16 July 2004 London, UK, edited by Ia C. McIlwaine. Advances in knowledge organization 9. Würzburg: Ergon Verlag, 269-74.
Smiraglia, Richard P. 2001. The Nature of ‘a Work’: Implications for the Organization of Knowledge. Lanham, Md.: Scarecrow.
Smiraglia, Richard P. 2005. “Instantiation: Toward a Theory.” In Data, Information, and Knowledge in a Networked World; Annual Conference of the Canadian Association for Information Science … London, Ontario, June 2-4 2005, ed. Liwen Vaughan. http://www.cais-acsi.ca/2005proceedings.htm.

Upon acceptance of a manuscript for publication, authors must provide a digital photo and a one-paragraph biographical sketch (fewer than 100 words). The photograph should be scanned with a minimum resolution of 600 dpi and saved as a .jpg file.

© Ergon – ein Verlag in der Nomos Verlagsgesellschaft, Baden-Baden 2019. All Rights reserved.

KO is published by Ergon.

Annual subscription 2019:
– Print + online (8 issues/ann.; unlimited access for your Campus via Nomos eLibrary) € 359,00/ann.
– Prices do not include postage and packing
– Cancellation policy: Termination within 3 months‘ notice to the end of the calendar year

Scope

The more scientific data is generated in the impetuous present times, the more ordering energy needs to be expended to control these data in a retrievable fashion. With the abundance of knowledge now available the questions of new solutions to the ordering problem and thus of improved classification systems, methods and procedures have acquired unforeseen significance. For many years now they have been the focus of interest of information scientists the world over.

Until recently, the special literature relevant to classification was published in piecemeal fashion, scattered over the numerous technical journals serving the experts of the various fields such as:

philosophy and science of science
science policy and science organization
mathematics, statistics and computer science
library and information science
archivistics and museology
journalism and communication science
industrial products and commodity science
terminology, lexicography and linguistics

Beginning in 1974, KNOWLEDGE ORGANIZATION (formerly INTERNATIONAL CLASSIFICATION) has been serving as a common platform for the discussion of both theoretical background questions and practical application problems in many areas of concern. In each issue experts from many countries comment on questions of an adequate structuring and construction of ordering systems and on the problems of their use in opening the information contents of new literature, of data collections and survey, of tabular works and of other objects of scientific interest. Their contributions have been concerned with

(1) clarifying the theoretical foundations (general ordering theory/science, theoretical bases of classification, data analysis and reduction)
(2) describing practical operations connected with indexing/classification, as well as applications of classification systems and thesauri, manual and machine indexing
(3) tracing the history of classification knowledge and methodology
(4) discussing questions of education and training in classification
(5) concerning themselves with the problems of terminology in general and with respect to special fields.

Aims

Thus, KNOWLEDGE ORGANIZATION is a forum for all those interested in the organization of knowledge on a universal or a domain-specific scale, using concept-analytical or concept-synthetical approaches, as well as quantitative and qualitative methodologies. KNOWLEDGE ORGANIZATION also addresses the intellectual and automatic compilation and use of classification systems and thesauri in all fields of knowledge, with special attention being given to the problems of terminology.

KNOWLEDGE ORGANIZATION publishes original articles, reports on conferences and similar communications, as well as book reviews, letters to the editor, and an extensive annotated bibliography of recent classification and indexing literature.

KNOWLEDGE ORGANIZATION should therefore be available at every university and research library of every country, at every information center, at colleges and schools of library and information science, in the hands of everybody interested in the fields mentioned above and thus also at every office for updating information on any topic related to the problems of order in our information-flooded times.

KNOWLEDGE ORGANIZATION was founded in 1973 by an international group of scholars with a consulting board of editors representing the world’s regions, the special classification fields, and the subject areas involved. From 1974-1980 it was published by K.G. Saur Verlag, München. Back issues of 1978-1992 are available from ERGON-Verlag, too.

As of 1989, KNOWLEDGE ORGANIZATION has become the official organ of the INTERNATIONAL SOCIETY FOR KNOWLEDGE ORGANIZATION (ISKO) and is included for every ISKO-member, personal or institutional in the membership fee.

Annual subscription 2019: Print + online (8 issues/ann.; unlimited access for your Campus via Nomos eLibrary) € 359,00/ann. Prices do not include postage and packing. Cancellation policy: Termination within 3 months’ notice to the end of the calendar year

Ergon – ein Verlag in der Nomos Verlagsgesellschaft mbH, Waldseestraße 3-5, D-76530 Baden-Baden, Tel. +49 (0)7221-21 04-667, Fax +49 (0)7221-21 04-27, Sparkasse Baden-Baden Gaggenau, IBAN: DE05 6625 0030 0005 0022 66, BIC: SOLADES1BAD

Founded under the title International Classification in 1974 by Dr. Ingetraut Dahlberg, the founding president of ISKO. Dr. Dahlberg served as the journal’s editor from 1974 to 1997, and as its publisher (Indeks Verlag of Frankfurt) from 1981 to 1997.

The contents of the journal are indexed and abstracted in Social Sciences Citation Index, Web of Science, Information Science Abstracts, INSPEC, Library and Information Science Abstracts (LISA), Library, Information Science & Technology Abstracts (EBSCO), Library Literature and Information Science (Wilson), PASCAL, Referativnyi Zhurnal Informatika, and Sociological Abstracts.