Communications

Hong Zhang, Linda C. Smith, Michael Twidale, and Fang Huang Gao

Seeing the Wood for the Trees: Enhancing Metadata Subject Elements with Weights

Subject indexing has been conducted in a dichotomous way in terms of what the information object is primarily about/of or not, corresponding to the presence or absence of a particular subject term, respectively. With more subject terms brought into information systems via social tagging, manual cataloging, or automated indexing, many more partially relevant results can be retrieved. Using examples from digital image collections and online library catalog systems, we explore the problem and advocate for adding a weighting mechanism to subject indexing and tagging to make web search and navigation more effective and efficient. We argue that the weighting of subject terms is more important than ever in today’s world of growing collections, more federated searching, and the expansion of social tagging. Such a weighting mechanism needs to be considered and applied not only by indexers, catalogers, and taggers but also incorporated into system functionality and metadata schemas.

Hong Zhang is PhD Candidate, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign; Linda C. Smith is Professor, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign; Michael Twidale is Professor, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign; and Fang Huang Gao is Supervisory Librarian, Government Printing Office.

Subjects as important access points have largely been indexed in a dichotomous way: what the object is primarily about/of or not. This approach to indexing is implicitly assumed in various guidelines for subject indexing. For example, the Dublin Core Metadata Element Set recommends the use of controlled vocabulary to represent subject in “keywords, key phrases, or classification codes.”1 Similarly, the Library of Congress practice, suggested in the Subject Headings Manual, is to assign “one or more subject headings that best summarize the overall contents of the work and provide access to its most important topics.”2 A topic is only “important enough” to be given a subject heading if it comprises at least 20 percent of a work. The exception is headings for named entities, which do not need to be 20 percent of the work when they are “critical to the subject of the work as a whole.”3 Although catalogers are aware of this weight information when they assign terms, it is left out of current library metadata schemas and practice.

A similar practice applies to subject indexing of nontextual objects. Selecting words to represent visual/aural symbolism is difficult. Subject indexing for art and cultural objects is therefore usually guided by Panofsky’s three levels of meaning (pre-iconographical, iconographical, and post-iconographical), further refined by Layne into “ofness” and “aboutness” at each level. Specifically, what can be indexed includes the “ofness” (what the picture depicts) as well as some “aboutness” (what is expressed in the picture) at both the pre-iconographical and iconographical levels.4 In practice, VRA Core 4.0, for example, defines subject subelements as:

Terms or phrases that describe, identify, or interpret the Work or Image and what it depicts or expresses. These may include generic terms that describe the work and the elements that it comprises, terms that identify particular people, geographic places, narrative and iconographic themes, or terms that refer to broader concepts or interpretations.5

Here again, no weighting or differentiating mechanism is included in describing the multiple elements. What is addressed is the “what” problem: What is the work of or about? Metadata schemas for images and art works such as VRA Core and CDWA focus on specificity and exhaustivity of indexing, that is, the precision and quantity of terms applied to a subject element. However, these schemas do not address the question of how much the work is of or about the item or concept represented by a particular keyword.

Recently, social tagging functions have been adopted in digital library and catalog systems to help support better searching and browsing. This introduces more subject terms into the system. Yet again, there is typically no mechanism to differentiate between the tags used for any given item. The exception is the few sites that use tag frequency information in their search interfaces.

As collections grow and more federated searching is carried out, the absence of weights for subject terms can cause problems in search and navigation. The following examples illustrate these problems. The rest of the paper reviews and discusses previous research and practice on weighting and outlines the issues that are critical in applying a weighting mechanism.

Examples of Problems

Exhaustive indexing: Digital library collections

A search query of “tree” can return thousands of images in several digital library collections. The results include images with a tree or trees as primary components mixed with images where a tree or trees, although definitely present, are minor components of the image. Figure 1 illustrates the point. These examples come from three different collections and either include the subject element of “tree” or are tagged with “tree” by users. There is no mechanism that catalogers or users have available to indicate that “tree” in these images is a minor component.

Note that we are not calling this out as an error in the professionally developed subject terms, nor indeed in the end-user-generated tags. Although particular images may have an incorrectly applied keyword, we want to talk about the vast majority, where the keyword quite correctly refers to a component of the image. Furthermore, such keywords referring to minor components of the image are extremely useful for other queries. This kind of exhaustive indexing of images enables the effective satisfaction of search needs, such as looking for pictures of “buildings, people, and trees” or “trees beside a river.” With large image collections, it becomes more important to satisfy such compound needs through combinations of searching and browsing. To enable them, metadata about minor subjects is essential.

However, without weights to differentiate subject keywords, users […] when people look at a particular item’s record, with the title and sometimes the description, we may very well determine that the picture is primarily of, say, a dog instead of trees. That is, the subject elements have to be interpreted based on the context of other elements in the record to convey the “primary” and “peripheral” subjects among the listed subject terms. However, search and navigation systems usually treat subject elements as context-free. In such a system, search efficiency will be largely impaired by “noise” items and by the inability to refine the scope, especially as the volume of items grows.

Lack of weighting also limits other potential uses of keywords or tags. For example, all the tags of all the items in a collection can be used to create a tag cloud as a low-cost way to contribute to a visualization of what a collection is “about” overall.6 Unfortunately, a laboriously developed set of exhaustive tags could give a very distorted overview of what the whole collection is about, although such tags are valuable for supporting searching and browsing within a large image collection. Extending our example, the tag “tree” may occur so frequently and be so prominent in the tag cloud that a user infers that this is mostly a botanical collection.

Selective indexing: LCSH in library catalogs

Although more extreme in the case of images in conveying the “ofness,” the same problem with multiple subjects also applies to text in terms of “aboutness.” The […] Manual, the first subject is always the primary one, while the second and others could be either a primary or nonprimary subject.8 This means that among these 126 books, there is no easy way to tell which books are “primarily” about “psychoanalysis and religion” unless the user goes through all of them. With the provided metadata, we do know that all books that have “psychoanalysis and religion” as the first subject heading are primarily about this topic. However, a book that has this same heading as its second subject heading may or may not be primarily about this topic. There is no way to indicate which it is in the metadata, nor in the search interface.

As this example shows, the Library of Congress manual attempts to acknowledge and make a distinction between primary and nonprimary subjects. However, in practice the attempt falls short of being really useful. Apart from the first entry, it is ambiguous whether subsequent entries are additional primary subjects or nonprimary subjects. Consequently, the search system, and in turn its users, cannot take full advantage of the care a cataloger takes in deciding whether an additional subject is primary or not.

Other information retrieval systems

The negative effect of current subject indexing without weighting on search outcomes has been identified by some researchers on particular information retrieval systems. In a study examining “the contribution of metadata to effective searching,”9