Taxonomies and metadata for digital asset management

Received (in revised form): 4th February, 2018

Heather Hedden is a senior vocabulary editor at Gale, a Cengage Company, where she edits the subject for Gale research database products and develops discipline taxonomies for the management of content for Cengage online educational products. Previously she was a full-time consultant through her own business, Hedden Information Management. She is the author of ‘The Accidental Taxonomist’, 2nd edn (Information Today Inc., 2016) and currently provides limited taxonomy consulting services, offers onsite workshops and teaches an online course in taxonomy creation.

Hedden Information Management, 98 East Riding Drive, Carlisle, MA 01741, USA. Tel: +1 978 467 5195

Abstract Taxonomies, which can aid in the retrieval of digital assets, are of various kinds, and different kinds (hierarchical, faceted and thesauri) are suited for different situations. Taxonomies may be implemented in a digital asset or content management system either for tagging assets or for categorising assets or for both, and there are different circumstances that favour tagging versus categorising. Standards for taxonomies include both guidelines of standards bodies for best practices in developing controlled vocabulary, and specifically thesauri, and various industry recommendations for metadata schema so that controlled vocabularies can be shared. Taxonomies typically form part of a larger metadata architecture. Therefore, taxonomy design and metadata design should go together.

KEYWORDS: categories, controlled vocabularies, metadata, search filters, tagging, taxonomies, thesauri

INTRODUCTION
A taxonomy provides a combination of benefits: more accurate and complete retrieval of digital assets than can be retrieved with search alone, and the ability to browse topics that are structured in a way to guide the user to select the most appropriate topics. As such, taxonomies can be powerful tools for managing and retrieving content. Nevertheless, not all taxonomies are the same in their structure. Some are hierarchical, some are faceted, some support search and some provide a combination. With a better understanding of taxonomies, the right kind of taxonomy can be implemented. Furthermore, taxonomies usually form part of a larger metadata architecture. Therefore, taxonomy design and metadata design should go together.

Taxonomies can refer to various related schemes for organising topics and, by extension, content and information. A view of taxonomies that is too limited can result in overlooking their broader applications and benefits. A traditional and limited view of taxonomies is that of hierarchical arrangements of terms for browsing from the top down, from the broadest terms to the most specific, as in inverse tree structures. These types of taxonomies are useful in certain subject domains and content retrieval use cases but are not suitable in other cases. It is also important not to confuse a taxonomy with a navigation scheme (as menu labels

380 Journal Of Digital Media Management Vol. 6, 4 380-387 © Henry Stewart Publications 2047-1300 (2018) Taxonomies and metadata for DAM

or a site map) as for a website, intranet or portal. Navigation schemes provide a guide to a site as it is structured, but they do not serve as an independent reference lookup tool as a taxonomy or metadata does.

TYPES OF TAXONOMIES AND USES
A broader view of taxonomies considers them as structured controlled vocabularies. A controlled vocabulary, in the context of information management, is defined by the standards body ISO as a ‘prescribed list of terms, headings or codes, each representing a concept’.1 A taxonomy may then refer to a kind of controlled vocabulary with some structure/relationships among the terms. ISO defines a taxonomy as a ‘scheme of categories and subcategories that can be used to sort and otherwise organise items of knowledge or information’, and in a note, explains that the simplest taxonomies may not have subcategories.2 In a broad sense, a taxonomy may be presented in the following forms:

• one or more sets of hierarchies of terms, whereby all individual terms are related to each other in hierarchical relationships;
• a set of terms related to each other by hierarchical, associative (related-term) or equivalence (synonym/variant) relationships, also known as a thesaurus; or
• a set of distinct types or facets of terms that are intended to be used in combination for search and retrieval of content. The structure is not necessarily the relationships between terms but the grouping of terms into facets.

Each of these examples of taxonomies is suited for a different purpose, both with respect to content types and content retrieval/management situations.

Hierarchical taxonomies may be suited for content and terms that naturally can be categorised and for a subject area with a defined scope. These could be a set of product types, industries, geographic places, academic disciplines, arts and crafts, occupations, organisational departments, news organised like newspaper sections, etc. Hierarchical taxonomies work well if the taxonomy is not too large, where the number of concepts is in the hundreds or fewer. The content retrieval situation for which hierarchical taxonomies are best suited is to provide guidance to nonexpert users who want to explore topics to discover what they are looking for. The taxonomy can even serve an instructional purpose to outline the subject domain for the users.

A thesaurus is the better option for content and terms that cannot neatly be categorised into a limited number of hierarchies, such as business-related activities or current trends in popular culture. In addition, a thesaurus is also more suitable for multiple, overlapping subject areas or domains with diverse content, such as topics of research reports, and for very large and growing controlled vocabularies, in the thousands or tens of thousands of terms.

Thesauri are best used for indexing and retrieval situations such as when detailed indexing is needed with highly specific terms, indexing is done by trained or professional human indexers or the users are subject-matter experts who will likely look for specific terms. Thesauri are useful to support searching, especially with search-support type-ahead/auto-complete features, and in user interfaces with full alphabetical browsing. There are also national and international standards for thesaurus creation, ISO 25964-1 and ANSI/NISO Z39.19-2005. Because thesauri are more complex than taxonomies, it takes a higher level of skill and expertise in library or information science to create them.

A taxonomy structured as a set of facets is best suited for managing and retrieving content that is of a somewhat unified or limited kind so that the content items share certain aspects which can be covered by shared taxonomy facets. These content items could be all the same type, such as product records, people records, customer documents, reports, marketing collateral, and content for digital publishing. The implementations for
which faceted taxonomies are best suited are more varied. Limiting/filtering/refining search results by facets is suitable both for novice and expert users. The use of taxonomy facets, as part of a larger set of metadata properties, is also suitable for the work of content managers or digital asset managers who have other workflow tasks, such as identifying records with certain rights or retention status, audience or market, and source or owner.

TAGGING VERSUS CATEGORISING WITH A TAXONOMY
There are different options for using a taxonomy in a digital asset management (DAM) system or content management system (CMS). Such a system typically provides a feature for tagging content with controlled vocabulary terms, which in this context might be called ‘tags’. Tags may be any keywords of the digital asset manager’s choice, or they may be restricted to terms from a controlled vocabulary/taxonomy. In some DAM systems or CMSs, there is also the option to categorise content items according to a limited number of predefined categories, often represented as virtual folders. If both features exist, a decision needs to be made about taxonomy implementation and use: whether the controlled vocabulary will be implemented as a controlled list of tags for tagging content, whether it will be implemented as a taxonomy of categories for categorising content items, or whether both methods will be implemented with different controlled vocabularies for each. The choice depends on various factors.

Tags and tagging are preferred under the following circumstances:

• if the workflow involves content files ‘travelling’ downstream to other applications or systems (as commonly seen in work-in-progress DAM systems) so that the tags are always associated with the content;
• if the controlled vocabulary is very large, because a large set of folders may be cumbersome to browse through;
• if the controlled vocabulary includes synonyms, which tend to be supported in lookup, but not in categories; and
• if multiple topics are relevant for a content item, because tagging supports assigning multiple tags per asset, in contrast to categories, which may be limited to only one per asset.

Categories (such as virtual folders) and categorising are preferred under the following circumstances:

• if a single preferred means of categorising (eg content type, discipline, brand) is preferred by the users;
• if the same set of users usually work in the same category, so that team members can regularly access their ‘go-to’ folder;
• if the files always stay in this repository rather than ‘travel’ downstream to other applications;
• if the taxonomy of folders is relatively small (and there is no need for synonyms);
• if there is the desirability for a hierarchical taxonomy but none of the metadata properties support it;
• if there are problems with user compliance in tagging for such terms; and
• if users clearly prefer category folders (based on use cases).

Both tags and categories can be implemented in the same system for the same repository of content if serving different purposes. For example, categories could be broader topics than any of the topics in the tags, or categories can be for a different method of categorising than covered in tags, such as content types instead of topics.

METADATA AND TAXONOMIES
There is significant overlap between taxonomy and metadata. The term metadata, otherwise known as ‘data about data’, refers to all the recorded, structured information about a content item, document, digital asset or webpage. Taxonomies (or more
generally, controlled vocabularies) are often, but not always, used for metadata, and much, but not all, of metadata utilises controlled vocabularies or taxonomies. Metadata properties or fields get filled or ‘populated’ with specific controlled vocabulary terms as appropriate for each individual content item.

Different types of metadata serve different purposes. The National Information Standards Organisation (NISO) defines three kinds of metadata: descriptive, structural and administrative.3 Descriptive metadata includes information on what a resource is about, expressed in keywords or short descriptions; and it also includes other descriptive information that could be used to look up and retrieve the item, such as title, author and document type. Administrative metadata describes information needed to manage a resource, such as its creation date, size, access rights, intellectual property rights and archival preservation information. Structural metadata describes the relationships of parts to one another, such as the sequence of content items in a series. There are other methods besides NISO for classifying metadata types, but most methods distinguish between metadata for aiding in search or discovery and retrieval of content and metadata for managing content.

Taxonomies are associated with the descriptive type of metadata, for two reasons. First, taxonomists, by the nature of their work, are focused on the goal of descriptive metadata, which is to help users find content. Secondly, descriptive metadata tends to use taxonomies more than other types of metadata do. If administrative or structural metadata properties require controlled vocabularies, these tend to be short, flat lists of values and not taxonomies.

Regardless of the type of metadata (descriptive, administrative or structural), a specific metadata element or property or field may either allow free text or require the user to select from a controlled list of options. A controlled vocabulary is, of course, a type of controlled list, but a controlled list may be simpler. For example, the controlled list for a metadata property may consist of just a pair of values, such as yes or no, male or female, or new or used, or it may consist of just three or four values, such as small, medium and large. These types of lists are sometimes not considered controlled vocabularies, because part of the definition of a controlled vocabulary is that a term is designated for a concept, and concept-naming decisions need to be made.

Controlled vocabularies of any size, including hierarchical taxonomies, may be used to support one or more descriptive metadata properties, especially a property that is called Subject, Topic or Descriptor. A taxonomist is not necessarily responsible for all metadata, so he or she needs to work in collaboration with a metadata architect, metadata librarian or content architect, especially in the blurred area of responsibility between short controlled vocabularies and long controlled lists. In addition to determining the metadata properties and their values, other decisions need to be made: whether assigning/tagging values from a specific metadata property is required or optional, whether a metadata property may hold only one value or can permit multiple values and whether the property will be displayed in the user interface for end-user search-and-retrieval purposes.

While the majority of taxonomies are implemented as metadata, if a taxonomy is implemented in a way that the terms, unlike other metadata, are not attached to a content item, then the taxonomy might not be utilised as metadata. An example would be navigational topics on a website, where the topics are hyperlinks to pages. Another example would be a taxonomy that is implemented to support dynamic auto-indexing or search, and executed ‘on the fly’, rather than being permanently attached to a record, and then it is not metadata.
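The decisions listed in this section (free text or controlled list, required or optional, single or multiple values, displayed to end users or not) can be recorded per property and checked when an asset is tagged. The following is only a minimal sketch in Python under stated assumptions: the class `MetadataProperty`, the function `validate` and the example properties and values are hypothetical illustrations, not part of any particular DAM system or CMS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetadataProperty:
    """One metadata property (field) and the decisions made about it."""
    name: str
    allowed_values: Optional[frozenset] = None  # None means free text is allowed
    required: bool = False
    multi_valued: bool = False
    displayed_to_end_users: bool = True


def validate(prop: MetadataProperty, values: list) -> list:
    """Return a list of problems found; an empty list means the values are acceptable."""
    problems = []
    if prop.required and not values:
        problems.append(f"{prop.name}: a value is required")
    if not prop.multi_valued and len(values) > 1:
        problems.append(f"{prop.name}: only one value is permitted")
    if prop.allowed_values is not None:
        for value in values:
            if value not in prop.allowed_values:
                problems.append(f"{prop.name}: '{value}' is not in the controlled list")
    return problems


# Hypothetical properties: a taxonomy-backed Subject, a short controlled list, and free text.
SUBJECT = MetadataProperty(
    "Subject",
    frozenset({"Marketing", "Product design", "Training"}),
    required=True,
    multi_valued=True,
)
CONDITION = MetadataProperty("Condition", frozenset({"new", "used"}))
TITLE = MetadataProperty("Title", required=True)

print(validate(SUBJECT, ["Marketing", "Training"]))  # no problems
print(validate(CONDITION, ["refurbished"]))          # value outside the controlled list
```

In this sketch a short controlled list such as new/used and a larger taxonomy-backed property are handled in the same way; only the size and upkeep of the value set differ, which mirrors the blurred boundary described above.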

FACETED TAXONOMIES AND METADATA
A faceted taxonomy comprises a set of facets, each facet containing an individual controlled vocabulary whose terms are generally not linked/related to terms in the other controlled vocabulary facets, but the combination of terms, each selected from a combination of facets, is used to tag the same set of content, and users limit the search/filter on terms in combination from various facets. Examples of facets may be Product/Service, Market Segment, Location, Content Type, Supplier, Channel, etc. The user interface may also include refinements/filters, which do not utilise taxonomy terms, such as author, price or date. Figures 1 and 2 provide examples of excerpts of faceted taxonomies.

A faceted taxonomy is a common type of taxonomy, whether for enterprise taxonomies, DAM taxonomies, or e-commerce or product review taxonomies. It is called a ‘taxonomy’ even though it differs from the classical hierarchical ‘tree’ type of taxonomy, because it involves controlled vocabulary and classification. The name for each facet plus the terms within the facet constitutes what is essentially a simple two-level hierarchy.

Each facet is also a metadata property/element. The taxonomist designing a faceted taxonomy is thus also designing metadata, or at least some of it. There are usually more metadata properties to describe the content beyond those which comprise the taxonomy facets, because metadata can serve additional purposes beyond helping users find content. Metadata may describe content for purposes of full identification, source citation or information on how the content can be used, including rights data. Designing the full set of metadata may be the responsibility of a metadata architect rather than a taxonomist.
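The way users ‘limit the search/filter on terms in combination from various facets’ can be illustrated with a short sketch in Python. It is a sketch under stated assumptions rather than a description of any real system: the asset records, the facet terms and the function names `filter_assets` and `facet_counts` are hypothetical. It also assumes that terms selected within one facet are treated as alternatives while selections in different facets must all match, which is a common convention but is not specified in this article.

```python
from collections import Counter

# Hypothetical assets, each tagged with terms from three facets.
ASSETS = [
    {"id": "a1", "Content Type": {"Photograph"}, "Location": {"Boston"}, "Channel": {"Web"}},
    {"id": "a2", "Content Type": {"Photograph"}, "Location": {"London"}, "Channel": {"Print", "Web"}},
    {"id": "a3", "Content Type": {"Video"}, "Location": {"Boston"}, "Channel": {"Web"}},
    {"id": "a4", "Content Type": {"Logo"}, "Location": {"London"}, "Channel": {"Print"}},
]


def filter_assets(assets, selections):
    """Keep the assets that match every facet in which a selection was made.

    Assumption: terms selected within one facet are alternatives (OR),
    while selections in different facets must all hold (AND).
    """
    return [
        asset for asset in assets
        if all(asset.get(facet, set()) & terms for facet, terms in selections.items())
    ]


def facet_counts(assets, facet):
    """Count how many assets carry each term of one facet."""
    return Counter(term for asset in assets for term in asset.get(facet, set()))


selected = {"Content Type": {"Photograph", "Video"}, "Channel": {"Web"}}
results = filter_assets(ASSETS, selected)
print([asset["id"] for asset in results])
print(facet_counts(results, "Location"))
```

The per-term counts computed by `facet_counts` correspond to the numbers that many faceted interfaces display beside each term, as in the parenthesised counts of Figure 2.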

Title Type: Feature Film, TV Movie, TV Series, TV Episode, TV Special, Mini-Series, Documentary, Video Game, Short Film, Video, TV Short

Genres: Action, Adventure, Animation, Biography, Comedy, Crime, Documentary, Drama, Family, Fantasy, Film-Noir, Game-Show, History, Horror, Music, Musical, Mystery, News, Reality-TV, Romance, Sci-Fi, Sport, Talk-Show, Thriller, War, Western

Title Groups: IMDb "Top 100", IMDb "Top 250", IMDb "Top 1000", Now-Playing, Oscar-Winning, Best Picture-Winning, Best Director-Winning, Oscar-Nominated, Emmy Award-Winning, Emmy Award-Nominated, Golden Globe-Winning, Golden Globe-Nominated, Razzie-Winning, Razzie-Nominated, National Film Board Preserved

Figure 1: Select facets from the Internet Movie Database. Source: www.imdb.com/search/titles.

Color Family: Dark Brown Wood (199), Dark Brown (900), Gray (341), Light Brown Wood (83), Light Brown (624), + See All

Cabinet Type: Base (2333), Pantry/Utility (528), Wall (2323)

Door Style: Raised Panel (3216), Recessed Panel (92), Shaker (1876)

Figure 2: Facets from Home Depot. Source: www.homedepot.com.

Meanwhile, there may be additional metadata properties beyond the scope and definition of ‘taxonomy’ that are nevertheless made available to the end user to filter/refine results alongside the other, taxonomy facets. These could be for author/creator, date, title keyword, text keyword, file format, etc. Sometimes the distinction between taxonomy facet and other metadata in this case is not so clear, such as for Document/Content Type, Audience or Language, when these properties utilise controlled vocabularies.

DESIGNING FACETS
Designing facets overlaps with designing a metadata schema, but facets are displayed to the end users for their interaction. So, facet design needs to take the user interface and user experience into consideration. Following are some issues in designing usable facets.

For a faceted taxonomy to best serve the user who is trying to find/discover content based on what it is and what it is about, the number of facets should be limited, perhaps 5 to 10, keeping in mind that there may be additional, non-taxonomy refinements, such as date. Additionally, user interface space limitations may make even fewer facets preferable. Because the facets are few, they should be presented in some logical order, not in a default alphabetical order.

As for the number of terms within a facet, ideally these are also somewhat limited, so that they are not too many to be viewed easily in a short list or in a scroll box (without too much scrolling). Often the first three or four terms within a facet are displayed, and there is an option to click on ‘more’ to see the full list of terms within that facet. Thus, the number of terms may be 2 to 25, with only one or two exception facets that have far more terms. This way, the user can more easily keep track of selections of terms from multiple facets. Exceptions for facets with more terms include alphabetical
lists of known (anticipated) entities, such as states or countries, and a larger generic topic facet.

Other considerations in designing facets for the DAM or CMS include the following:

• ability to select multiple values from within the same facet at once (typically by means of check boxes);
• including other metadata (not ‘taxonomy’) in the same set of displayed facets (date, creator, price, etc);
• having all generic facets, the same in all contexts or also having some category-specific facets; and
• supporting a hierarchy of terms (no more than two levels recommended) within a single facet.

Designing facets is an integral task to designing and specifying all descriptive metadata, of which a faceted taxonomy is part. Due to this overlap and blurred distinction between taxonomy facets and displayed metadata for filtering, it is a good idea to design the taxonomy and metadata specification together as an integrated strategy.

STANDARDS AND POLICIES FOR TAXONOMIES AND METADATA
Standards serve various purposes. Two leading purposes for standards are

• to ensure consistency and ease of use across different products or systems used by different users; and
• to ensure interoperability, the sharing or exchange of products/services/information.

Published standards for taxonomies and other controlled vocabularies are typically for the first purpose, of enabling consistency and ease of use, and this is by means of best practices guidelines for term format/style and relationship types between terms. The leading standards are ANSI/NISO Z39.19 (2005, renewed 2010) ‘Guidelines for Construction, Format, and Management of Monolingual Controlled Vocabularies’ and ISO 25964-1 (2011) ‘Thesauri and Interoperability with Other Vocabularies, Part 1: Thesauri for Information Retrieval’. There is a lot of overlap between the two. Taxonomies should be designed to follow such guidelines to the extent practical. The guidelines are especially relevant to correctly structuring hierarchical relationships. Such standards, however, are still not sufficient for any taxonomy implementation. Additional customised policies for the taxonomy maintenance and indexing use should be created as part of an overall taxonomy governance plan.

Metadata in general do not require such standards for ease of use. Rather, there are ‘standards’ of the second type, for interoperability for metadata, and these are known as metadata models or schema. There are a number of different published standards for different kinds of content, so each may be considered as just a suggestion. These include Dublin Core Metadata Elements for digital content, MARC (Machine-Readable Cataloging) for library materials and IPTC (International Press Telecommunications Council) for photographs, just to name a few. If an organisation does not have the need to exchange its fully tagged content externally, then there is no need to follow an established metadata schema. Rather, an organisation should develop its own internal, customised metadata schema. Like internal taxonomy policy, a metadata schema defines the policy for maintaining the metadata and how it should be applied.

Specifically, a metadata schema lists exactly what each of the metadata properties are, provides definitions for those properties
and spells out rules for use of those properties. Rules for metadata properties may include

• whether or not the property field is populated with terms from a controlled vocabulary;
• what the source of the entered terms is (automatically generated or human-created and by whom);
• whether applying a term from the property is required for each content item;
• whether only one term, a limited number or any number of terms can be applied within the same field; and
• whether the application of terms in one property is dependent on terms applied in another property.

These last three issues (required, number and dependency) are also relevant to facets in a faceted taxonomy.

DESIGNING TAXONOMIES ALONG WITH METADATA
As taxonomies and metadata are integrated, there may be uncertainty whether to start with creating the overall metadata strategy and schema and then build taxonomies as part of it as needed, or to start with creating a taxonomy and then, in the process, identify the various descriptive metadata. Ideally, the two are developed for implementation in combination, as part of an integrated strategy. An expert in taxonomy development (a taxonomist) and an expert in metadata design (a metadata architect), however, are usually not the same person.

A metadata architect (one who develops, implements and manages metadata strategy, architecture and policies) can acquire taxonomy-creation skills, and a taxonomist can acquire metadata architecture skills, or the two individual experts can work together on the same project. A smaller organisation, however, might not have both types of experts on staff. Whether such an organisation has a metadata architect or a taxonomist depends on the nature of the organisation’s content and content organisation needs.

Organisations that start with the metadata expertise and approach to information management tend to be those with significant needs in DAM (with image or other media collections), records management (in highly regulated industries), publishing or cultural preservation (museums or libraries). Organisations that start with the taxonomy expertise and approach include product or service providers, distributors and retailers (especially in e-commerce), and organisations focused on providing information resources.

In conclusion, well-designed taxonomies and metadata will facilitate the management and retrieval of digital assets or other content. As taxonomies and metadata are integrated, their design and development should also be an integrated process, whether undertaken by an individual or an interdisciplinary team. A good understanding of taxonomies and metadata is needed to choose the best type of taxonomy and select or design the appropriate metadata model.

References
1. International Organization for Standardization. (2013) ‘ISO 25964-2:2013 Information and documentation – Thesauri and interoperability with other vocabularies – Part 2: Interoperability with other vocabularies’, International Organization for Standardization, Geneva, p. 4.
2. Ibid., p. 14.
3. Riley, J. (2017) ‘Understanding Metadata: What is Metadata and What is it For?’, National Information Standards Organization (NISO), Baltimore, MD, p. 6.
