Expanding the system definitions and configurations ( and data structure)

Magan Arthur is the principal consultant at ACG — an independent consulting group for end-to-end planning and execution of innovative enterprise content management projects.

Keywords: taxonomy, metadata, data structure, data presentation, metadata templates, polyhierarchical structures

Abstract
This paper is part of a series of enterprise content management (ECM) best practices from ACG, an independent consulting group. The series provides practical tips and expert advice on planning, implementing and improving enterprise content management systems and their components. This paper focuses on taxonomy and data structures and is written from the point of view of the implementation team. It assumes you have some experience with the concept of metadata, but it is not an academic study: it tries to be hands-on, and intellectual only to the degree necessary to convey certain principles. It provides links to resources, some of which target more academic audiences. The complete ECM Best Practices Series from ACG is available at http://www.arthurconsultinggroup.com.

Magan Arthur, ACG, 60 Canyon Road, Fairfax, CA 94930, USA. Tel: +1 415 462 2979. Fax: +1 415 482 9304. Email: Magan@arthurconsultinggroup.com

INTRODUCTION

Taxonomy is the science of describing an object, in our case content or assets. In addition to describing the object, a taxonomy will also place it into a relationship with other content and group the content in logical collections or nodes of a hierarchy.1

Taxonomy is not a new term, and library science is a good 2,000 years old. The current renaissance is due to a growing understanding that file systems are not the right tool to manage and control access to the growing digital content repositories of companies, governments or any organization of even medium size. Stricter rules and regulations, specifically in the USA, require companies to improve the organization of their content, and taxonomy is a way to apply some very old and proven methods to a new form of managing content. While some old wisdom can help with the new challenges, there are various aspects of the new media that are not well covered in the age-old science. This paper will address both areas.

A taxonomy for a larger system will need to describe and group content from various sources in a way that is logical but also useful. This structure can become a complicated hierarchy with hundreds of nodes. If you plan for a larger system and you do not have a librarian on staff, you should seriously consider securing

# Henry Stewart Publications 1743–6559 (2005) Vol. 1, 4 279–297 JOURNAL OF DIGITAL ASSET MANAGEMENT 279 Arthur

the services of a consultant. In addition, almost every industry conference (AIIM, Henry Stewart DAM Symposium and others) has dedicated seminar tracks for taxonomy.

This paper starts with a clarification of terms. This is necessary because there are not yet generally accepted standards or even guidelines for the terms used in describing taxonomies or data structures. (Interestingly, you will find that building a larger taxonomy is largely about clarifying terms.)

I will then follow the order, also used by Ann Rockley in her outstanding book Managing Enterprise Content,2 of distinguishing between metadata for the classification of content and metadata for individual content. First I will discuss classification or categorization of content, and then provide ideas on how to build a metadata scheme for the individual assets or content (Ann Rockley refers to this as element metadata).

In the last part of this paper I will describe considerations in regard to expanding the system, which will touch on other data structures not commonly included in taxonomy discussions. These data structures include user groups and roles, security, ingestion and download folder structures, as well as other searchable indexes.

CLARIFICATION OF TERMS

Before getting into more detail I would like to clarify a few terms.

Taxonomy is a system of describing an object also through its relationship to other objects. Usually these relationships are expressed in a hierarchy. Administrative data (use, usage rights, status etc.) are usually not considered in these definitions. However, administrative data are very important for your system to function. There is a second meaning of the term taxonomy, which more broadly describes any data used to describe and classify content. It has become common to refer to any system used to find and describe digital content as a taxonomy.

Metadata is a wider term which, for the purpose of this paper, shall include any data about an object, both descriptive and administrative in nature (data about data).

Metadata structure is the system of metadata templates that will be used to classify, find and describe the objects of your system.

Collection will refer to any grouping of content, which could be a folder, a collection, or also a job or project.

Object will be any element that can be described with its own set of metadata: individual files, collections, folders, jobs, projects, user groups, users, upload or staging folders and more. One way to think of an object is as a row in a database, with the metadata as its columns.

Authoritative term is the term used to describe a node in the classification hierarchy. An authoritative term can have many synonyms or related terms, but it is chosen to represent all these concepts as the most identifiable term in the classification.

Parent/child relationship expresses the hierarchical relationship in a classification. “Mammal” is the parent of “human,” and “race” is the child of “human.”

Ontology is a term related to taxonomy and usually tries to explain any object from its place in the hierarchy of other objects. I will try to avoid


highly academic terms and instead use more descriptive language whenever possible.

TAXONOMY OR HIERARCHICAL STRUCTURE

General considerations

Before you start, it should be said that common sense is a very important element in this exercise. The end result should be a structure that is easy to use by end-users, content contributors and administrators alike. A classification system for all content of a large organization is the best-case scenario, but it might not be practical to maintain, as it requires ongoing work from staff with specialized skill sets. If your key concern is a useful classification or search system for the daily tasks of the average person, your energy could be better spent on refining or “harmonizing” a number of smaller and more targeted structures managed by tools that are more departmental.

Another important clarification is that your “enterprise taxonomy” is not necessarily tied to a software product (existing or planned). It makes a lot of sense to start with a piece of paper. The following questions can be mapped in a spreadsheet or a simple table.

Building the structure

What constitutes content in your organization and where is it?

As you brainstorm this question you will almost naturally start building a classification in a hierarchical structure (taxonomy). This structure will likely resemble the structure of any content management solutions already in use and/or your existing file systems. In most companies, however, there is no agreed enterprise structure for the file systems or for the different content management systems, digital or otherwise. Every department has different, sometimes poorly maintained, file folders. Independent of any software solution you have or will employ to manage all or part of that content, creating a map of the content in your organization is a very valuable exercise. Figure 1 shows a sample structure.

Different users will make different logical associations and search for the same content in different ways. While for the sales team “images” might include anything from photos to logos and graphics, these are very separate categories for the professional designer. In the example in Figure 1, it would make just as much sense to build the hierarchy as shown in Figure 2.

Figure 1: A sample structure
Marketing
  Product marketing: Data Sheets, Product Specification, Solution Overview, Power Point Slides, Product Videos, Product Shots, ...
  Trade Shows: Banner Ad, Event Specific (NAB 2005), ...
HR
  Benefits
    Forms: 401k, Life, Disability, ...
    Info Docs: 401k, Life, Disability, ...

Figure 2: Alternative structure
HR
  Benefits
    401k: Forms, Info Docs
    Life: Forms, Info Docs
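The point that the same content can reasonably be arranged in more than one hierarchy can be sketched in Python. The sketch assumes each item is tagged with a few attributes (the field names and sample titles are hypothetical). Figure 1 and Figure 2 then become two orderings of the same attributes rather than two separate filing schemes.

```python
"""Two hierarchies built from the same attribute-tagged items (cf. Figures 1 and 2)."""

# Hypothetical HR items; each is described by attributes, not by a fixed path.
ITEMS = [
    {"dept": "HR", "area": "Benefits", "doc_type": "Forms", "topic": "401k", "title": "401k enrollment form"},
    {"dept": "HR", "area": "Benefits", "doc_type": "Forms", "topic": "Life", "title": "Life insurance form"},
    {"dept": "HR", "area": "Benefits", "doc_type": "Info Docs", "topic": "401k", "title": "401k plan overview"},
    {"dept": "HR", "area": "Benefits", "doc_type": "Info Docs", "topic": "Life", "title": "Life insurance FAQ"},
]


def build_tree(items, levels):
    """Nest items into dicts following `levels`; the last level holds lists of titles."""
    tree = {}
    for item in items:
        node = tree
        for level in levels[:-1]:
            node = node.setdefault(item[level], {})
        node.setdefault(item[levels[-1]], []).append(item["title"])
    return tree


def render(tree, indent=0):
    """Return the tree as indented outline lines, two spaces per level."""
    lines = []
    for key, child in tree.items():
        lines.append("  " * indent + key)
        if isinstance(child, dict):
            lines.extend(render(child, indent + 1))
        else:
            lines.extend("  " * (indent + 1) + "- " + title for title in child)
    return lines


# Figure 1 order: document type before topic; Figure 2 order: topic before document type.
figure_1 = build_tree(ITEMS, ["dept", "area", "doc_type", "topic"])
figure_2 = build_tree(ITEMS, ["dept", "area", "topic", "doc_type"])

if __name__ == "__main__":
    print("\n".join(render(figure_1)))
    print("\n".join(render(figure_2)))
```

A real system needs far more than nested dictionaries, but the design point carries over: if items carry their classification as data, alternative views for different user groups are cheap to produce.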


To begin with, these details are not that important. The first goal is simply to identify all the content that is of value to your organization. As with any larger project, it is very important to have a general understanding of the scope and context. Only after that has been established will it make sense to decide in which areas more detail and organization will most benefit the organization.

As you think more about your specific situation, it will make sense to refine this general map. It is highly recommended that you involve the people who will ultimately use this system when you think about the following issues. This is not simply general good practice: involving the users is essential to capturing both the formal and the informal relationships and flows of your content.

Short-term versus long-term content

As content is ingested or cataloged into a more organized system, it will have to follow specific rules. This can be work-intensive and needs careful attention. It will probably not be necessary to catalog all content. Short-lived content with minimal potential for reuse is usually not valuable enough to be cataloged. For example, you might carefully catalog the high-resolution versions of your corporate identity images, but you will probably not need to catalog every low-resolution rendition, as any good digital asset management (DAM) system can easily create these on the fly.

How deep do you need to go?

In a library, every book gets a code that can be traced or browsed in the library’s classification system. In other words, the last level of the hierarchy tree is a book (see Figure 3).

Figure 3: Library classification
Technology > Software > Enterprise Software > Content Management > Ann Rockley, “Managing Enterprise Content” 2003, New Riders

The ability of technology to display search results intuitively and to refine searches with specific metadata can make this somewhat easier for a digital library, which does not need to take the hierarchy all the way down to individual items. To use our example from Figure 2, the structure in Figure 4 might suffice to narrow the search to just a few items that can then be displayed as a list or any other useful representation (eg thumbnails) from which any user can easily pick the desired content or asset.

Figure 4: Simplified structure
HR > Benefits > Policies, Procedures, ...

Identify non-unique labels and build a unique code

Another step in this exercise is to identify nodes in the hierarchy that have the same title or name but do not hold the same content. An example would be product specification. A marketing product spec and an engineering product spec will probably have quite different contents, but both can be found under the node “Product Specification.” Most software systems that manage a hierarchy will identify the elements of that hierarchy by a unique code. As we will discuss later, this has many advantages. It


is therefore sensible for you to start thinking of a unique code scheme for your own system (eg MAR_PROD_SPEC and ENG_PROD_SPEC).

Synonyms or “equivalence relationships”

You should note all synonyms that are commonly used by your target users. The best way to find out about these is to involve the users. Many systems fail because they are designed to fit the classification and are not built for and with the users. In most cases you would want to define the following set of data:

• unique code
• authoritative term
• synonyms (including abbreviations and maybe even common misspellings).

Using the marketing product specification example, this is illustrated in Table 1.

Parallel structures or “polyhierarchical taxonomies”

Typically a classification hierarchy is represented in a hierarchical folder-like structure. However, it is important to note that, unlike both the traditional library and the classic computer file system, the taxonomy is not a representation of the physical location of a file or asset. A relationship is built between the node of the hierarchy and the asset that is classified as part of that node. We need to think of this system more in terms of relational databases than of bookshelves.

This difference provides possibilities that the analog library cannot provide. For example, a file or asset can belong to more than one node in a hierarchy: it is as if the same book could be on different shelves at the same time. As previously mentioned, the way users will classify an object will vary depending on their specific perspective and need. Let’s look at the example of an advertising agency in Figure 5.

The issue with these duplications is the maintenance of the hierarchy. If, in our example, a new version of a logo is created, it can automatically populate to all links as long as both hierarchies are managed by the same content management tool (Figure 5). However, if the studio creates a brand new logo for a client, it also needs to be added to the marketing “client logos” collection. While this can be defined as a process, it adds complexity. You will therefore have to weigh the advantage of cross-references like the one above against the additional administrative overhead.
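The difference between referencing and copying can be illustrated with a minimal sketch, assuming a single store of asset records and hierarchy nodes that hold only asset IDs. The class, method and path names below are hypothetical and loosely follow the agency example in Figure 5.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    asset_id: str
    name: str
    version: int = 1


@dataclass
class Repository:
    """Hypothetical store: each asset is held once; hierarchy nodes hold only references."""
    assets: dict = field(default_factory=dict)  # asset_id -> Asset
    links: dict = field(default_factory=dict)   # node path -> set of asset_ids

    def add_asset(self, asset, *node_paths):
        self.assets[asset.asset_id] = asset
        for path in node_paths:
            self.link(asset.asset_id, path)

    def link(self, asset_id, node_path):
        # Classifying an asset under another node adds a reference, not a copy.
        self.links.setdefault(node_path, set()).add(asset_id)

    def new_version(self, asset_id):
        # One update to the single record is visible through every linked node.
        self.assets[asset_id].version += 1

    def browse(self, node_path):
        ids = self.links.get(node_path, set())
        return sorted((self.assets[i].name, self.assets[i].version) for i in ids)


repo = Repository()
repo.add_asset(Asset("A-001", "Client A logo"),
               "Studio/Clients/Client A/Logos", "Marketing/Client Logos/Client A")
repo.new_version("A-001")
print(repo.browse("Marketing/Client Logos/Client A"))  # [('Client A logo', 2)]

# A brand-new logo appears only where someone linked it.
repo.add_asset(Asset("B-001", "Client B logo"), "Studio/Clients/Client B/Logos")
print(repo.browse("Marketing/Client Logos/Client B"))  # []
```

The second call shows the overhead described above: nothing links a brand-new logo into the marketing hierarchy unless a defined process or a person does so.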

Table 1: Simplified database row for classification node
Code | Authoritative term | Synonyms
MAR_PROD_SPEC | Marketing Product Specification | Product Spec; Product Specification; Data Sheet; Specs; Spec Sheet; etc.
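A row like the one in Table 1 maps naturally onto a small record type. The sketch below assumes a simple case-insensitive term index; the engineering node and its synonyms are hypothetical, added only to show how one synonym can point to more than one unique code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationNode:
    code: str                # unique code, eg "MAR_PROD_SPEC"
    authoritative_term: str  # the term chosen to represent the node
    synonyms: tuple = ()     # abbreviations, variants, common misspellings


def normalize(term):
    """Case- and whitespace-insensitive form of a term (a deliberately simple choice)."""
    return " ".join(term.lower().split())


def build_term_index(nodes):
    """Map every normalized term, authoritative or synonym, to the codes that use it."""
    index = {}
    for node in nodes:
        for term in (node.authoritative_term, *node.synonyms):
            index.setdefault(normalize(term), set()).add(node.code)
    return index


NODES = [
    # Row from Table 1.
    ClassificationNode("MAR_PROD_SPEC", "Marketing Product Specification",
                       ("Product Spec", "Product Specification", "Data Sheet", "Specs", "Spec Sheet")),
    # Hypothetical engineering node that shares several labels.
    ClassificationNode("ENG_PROD_SPEC", "Engineering Product Specification",
                       ("Product Spec", "Product Specification", "Specs", "Spec Sheet")),
]

index = build_term_index(NODES)
print(sorted(index[normalize("spec  SHEET")]))  # ['ENG_PROD_SPEC', 'MAR_PROD_SPEC']
print(sorted(index[normalize("data sheet")]))   # ['MAR_PROD_SPEC']
```

Keeping the unique code as the key, rather than the label, is what lets two nodes share a label such as “Product Specification” without being confused.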


Figure 5: Duplication within classification
The Studio Hierarchy: Clients > Client A > Logos, Images, ...; Client B; ...
The Marketing Hierarchy: Marketing > Corporate identity, ...; Client Logos > Client A, Client B, ...

Project-based classification

It has become a generally accepted practice to have one authoritative classification structure that is carefully maintained by the administrator or librarian of the system. In addition to this structure, users might create specific sub-hierarchies that serve various purposes. As long as new content is also cataloged into the authoritative classification hierarchy, anybody can find it.

This system also makes sense specifically for project- or job-based collections of assets. These project folders serve a different purpose than the overall classification hierarchy. They are often short-lived or created ad hoc, but they are very useful for a user or a small group of users to find content quickly in a specific context. Think of projects like “spring catalog,” or also private folders for individual users that help them group assets or content arbitrarily for their own needs (shopping carts, light boxes etc).

Let me stress again that you are not duplicating files or assets. You are simply referencing the asset in different organizational structures. Figure 6 simplifies the logical flow of this relationship between a file, the database record and the representation through organizational hierarchies or folders.

Figure 6: Data representation

Multiple systems

In some cases you will not only duplicate the classification, but you will also have separate tools you use to manage the


content. An agency’s studio might use a simple image library like Canto Cumulus or Extensis Portfolio for internal organization. The agency might now try to use more sophisticated DAM tools like ClearStory’s ActiveMedia or Northplain’s Telescope for client-specific projects and services. Those and other more permanent duplications can be mapped with what is often called a crossover table or crosswalk.

Crosswalk

To manage any migration from one system to another, you will need to make sure to map any relevant data as well. It will be beneficial to create a map similar to the one in Table 2.

Table 2: Crosswalk
Description | Image Library | DAM Solution | Marketing Portal
Name | File Name (following our naming convention) | File Name (following customer naming convention) | File Name (following our naming convention)
Agency ID | N/A | File Name (following our naming convention) | N/A
Content Descriptors | Keywords | Subjects | Keywords

Taxonomy management tools

Identifying relevant and interesting content and managing that content are not necessarily tasks for the same system. After reading the prior sections, it should not be a surprise that there are software tools focused solely on managing taxonomies. These tools can read and feed the classification structures of various content management tools, and some even allow you to link in with other publicly available resources. The communication is mostly accomplished via Web Services or Application Program Interfaces (APIs).

For example, consider a schoolbook publisher interested in the history of a specific people. His local taxonomy might have terms that are close to what he is looking for, but it might not be a good fit for the history of a people. A domain connected to the local domain might be more on target (see Figure 7).

Figure 7: Related taxonomies
Local Structure: Human > Race > White, Black, ...; Animal
Connected Structure (related terms): Peoples > Negroid, Caucasian, Indo Germanic > Celtic, ...

If you plan on this most sophisticated way of managing polyhierarchical taxonomies, you should check out companies like:

• Synaptica: http://www.synaptica.com/
• Google (Enterprise Search Engines): http://www.google.com/enterprise/gsa/index.html
• Verity: http://www.verity.com/

For the advanced user, Seth Earley’s paper “Managing Multiple Facets and Polyhierarchical Taxonomies” is great reading.3

Before we move on to the next topic I


would like to reiterate that the success of your project is measured by how much it will help users, contributors and administrators to manage content. This will depend very much on two elements:

1. How much were the users involved in the planning of the system?
2. The quality of the user interface design and the usability of the tools and systems.

The first point is up to you. Below you will find a few points regarding using and representing the taxonomy.

Using or representing the structure

There are many software tools available today that will manage classification hierarchies (taxonomies). Any document management (DM), larger web content management (CM) or larger digital asset management (DAM) system will allow you to set up hierarchical structures to classify and manage content. As discussed, there are also tools that are used solely to manage the classification scheme. They can be used in combination with the systems that manage the content repository. In either case the key is how the administrator can manage the hierarchy and how users can use it. More than anything, that is a question of representation.

Representation

As mentioned earlier, a classification hierarchy is typically represented in a hierarchical folder-like structure. It is, however, important to note that, unlike the classic file system, this display is not a representation of the physical location of a file or asset.

In sophisticated systems, every node of the structure will become an object. As described earlier, an object is defined as something that can have metadata and therefore can be searched for. In other words, users can not only browse the hierarchy, but they can also search it.

We discussed the idea of unique codes and synonyms above. A good taxonomy tool will allow for searches such as “spec sheet.” Following our example from above, this search will find the codes MAR_PROD_SPEC and ENG_PROD_SPEC, because “spec sheet” is a synonym of the main or authoritative term “product specification.”

Table 3 looks at this search from the database perspective. The objects that the search for “spec sheet” would find can be expressed in simplified database rows.

Table 3: “Spec sheet” search result
Unique ID | Code | Eng_Auth_Term | Eng_Syn | Parent ID
Jh878371 | MAR_PROD_SPEC | Product Specification | Product Spec; Marketing Spec; Data Sheet; Specs; Spec Sheet; etc. | Jh673922 (the ID of Product Marketing)
Jb958403 | ENG_PROD_SPEC | Product Specification | Product Spec; Product Specification; Specs; Spec Sheet; Matrix; etc. | AK948322 (the ID of Product Documents)

Your search for “Spec Sheet” brought up the following choices. Check the term(s) that best match(es) your expected result and click submit to display the content associated with that term.

Root | Parent | Authoritative Term
Marketing | Product Marketing | Product Specification [ ]
Engineering | Product Documents | Product Specification [ ]

[Submit]

Figure 8: “Spec sheet” search result representation

Depending on the ability and flexibility of your tool, this can result in

any number of search result representations to your users. One common way is to present the root and the immediate parent of the term (Figure 8). In our example of a parallel structure, the search for “logo” will provide the result shown in Figure 9. In this case the user could find the same logos in different ways, but this redundancy is not an issue as long as it is not confusing.

Your search for “Logo” brought up the following choices. Check the term(s) that best match(es) your expected result and click submit to display the content associated with that term.

Root | Parent | Authoritative Term
Marketing | Marketing | Client Logos [ ]
Clients | Client A | Logos [ ]
Clients | Client B | Logos [ ]
Clients | Client C | Logos [ ]

[Submit]

Figure 9: “Logo” search result representation

Multi-language

The concept of hierarchy objects that are identified by unique codes is also the key to multi-language display. Table 4 shows how a database might represent such an object. This would allow a user to search for the term “spec sheet” in German (Technische Daten Beschreibung): either term would find the same content, because the content has a relationship to the classification term identified by the language-neutral unique ID “Jh878371.”

Table 4: Multilingual database row
Unique ID | Code | Eng_Auth_Term | Eng_Syn | Ger_Auth_Term | Ger_Syn
Jh878371 | MAR_PROD_SPEC | Product Specification | Spec Sheet | Technische Produkt Broschüre | Technische Daten Beschreibung

Subject domains and synonyms

Similar to the display elements above, a good interface of an advanced taxonomy

management tool should provide options to explore a term in a hierarchy. In addition to parents and children, this interface should display synonyms, links to related terms of connected domains and possibly translations into foreign languages.

Existing standards

A number of existing taxonomy standards are listed below. These, as well as taxonomies created by companies in your sector (vertical taxonomies), can be a very good starting point. Good places to find other people in your industry sector who might have started content classification projects are conferences4 or organizations.5

The following standards can offer more information:

• DCMI or Dublin Core (http://dublincore.org): the best-known taxonomy standard to date.
• IPTC (http://www.iptc.org/pages/index.php): a standard used to include metadata in image files, supported by many image software tools like Adobe Photoshop.
• EXIF (http://www.exif.org): a standard used by many digital cameras to embed metadata into photographs, similar to IPTC.
• XMP (http://www.adobe.com/products/xmp/main.html): an Adobe standard for embedding metadata into files created with Adobe applications.
• MS property tags (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/odc_wd2003_ta/html/odc_wdcustprop.asp): the Microsoft standard to embed metadata into files created with MS Office applications.
• SMPTE 335M: http://www.smpte-ra.org/mdd/rp210-2.pdf
• MXF: http://www.mxfig.org/index.php
• AAF: http://www.aafassociation.org

SMPTE 335M, MXF and AAF are metadata standards for television and broadcast.

• MPEG-7: http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm
• MPEG-21: http://www.chiariglione.org/mpeg/standards/mpeg-21/mpeg-21.htm

Unlike the better-known MPEG-1, MPEG-2 and MPEG-4, these two are metadata standards for multimedia file description and exchange.

Summary

For a larger system, just defining the basic taxonomy can take weeks. There is no reason to wait until a decision on a product or vendor has been made. This classification will be a very useful tool for any vendor or consultant working with you.

At this point it makes sense to remember the opening paragraph. Use common sense when building your


taxonomy. Many companies are realizing that content management at the enterprise level is a key strategy for long-term success. A simple inventory with a well-thought-through structure is a good first step. However, classification is only one step of the process. In most cases users will not traverse long, potentially complex hierarchies to look for content. They will want to search by typing some basic values into a search page. This kind of search will require metadata.

METADATA

Put simply, you should be able to assign data to a file that will be used to describe and, most importantly, to find the file. These data are called metadata. There are different schools of thought about how to group these data. I find the following grouping most helpful. Metadata can be:

• Information about the file (objective): file size, type, color space, bit rate etc.
• Information about the content (subjective or user-defined): author, location, target audience, topic etc.
• Administrative information: approval status, storage path, lifecycle status, use etc.
• Information about the file’s relationships: collections, parent documents, projects, jobs, inclusions etc.

Building metadata templates

You will find that different objects need different data to be described and classified. A video’s encoding type and compression are important, while MS Word documents do not need such information, although a value like “number of pages” might be helpful. I will provide a list of common data types and explanations below. If your system grows larger, you might find that a hierarchical composition of metadata templates for different categories of content makes a lot of sense. In the example shown in Figure 10, an MS Word document would have the following data: classification ID, notes, keywords, file type, author, topic, page count, last printed.

Figure 10: Hierarchical metadata templates
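One way to read Figure 10 is as template inheritance: each template adds its own fields to those of its parent. The following sketch assumes that model; how the Figure 10 fields are split across levels, and the video template, are illustrative assumptions rather than the exact content of the figure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetadataTemplate:
    name: str
    fields: List[str]
    parent: Optional["MetadataTemplate"] = None

    def all_fields(self):
        """Inherited fields first (root to leaf), then fields added at this level."""
        inherited = self.parent.all_fields() if self.parent else []
        return inherited + [f for f in self.fields if f not in inherited]


# Hypothetical split of the Figure 10 fields across three template levels.
all_content = MetadataTemplate("All content", ["classification ID", "notes", "keywords"])
document = MetadataTemplate("Document", ["file type", "author", "topic"], all_content)
word_doc = MetadataTemplate("MS Word document", ["page count", "last printed"], document)
video = MetadataTemplate("Video", ["file type", "encoding type", "compression"], all_content)

print(word_doc.all_fields())
# ['classification ID', 'notes', 'keywords', 'file type', 'author', 'topic',
#  'page count', 'last printed']
print(video.all_fields())
# ['classification ID', 'notes', 'keywords', 'file type', 'encoding type', 'compression']
```

The benefit of this arrangement is that a field shared by all content, such as the classification ID, is defined once; adding a new content category means adding only the fields that are specific to it.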


Following are a number of questions that can help you build your metadata structure.

What metadata do you need?

The key question is which data you need to assign to the different kinds of content or assets.

• Which data are needed for users and administrators to find an object? Users are not only internal staff; they could be channel partners, consumers, investors or the press. It will often make sense to clarify which user group will use which data to find assets.
• Which data are needed to provide information about the object that users need, but that are not used for searches? This could be the file size or general notes.
• Which data might be needed in the future to find content or objects in an archive, or in the later stages of their lifecycle as opposed to the earlier stages? In some cases it might initially make sense to assign just a small set of data to an asset because time is of the essence, for example for a fast turnaround of assets from a live event. Some of these assets might later become part of more permanent libraries and need additional metadata such as keywords or usage descriptions.

It is important to realize that there is a big difference between the data that can be assigned and the data that are really needed by the users. There is a limit on how much data the average user, and also the administrators, can work with.

How much data can you handle?

• How much data can the users handle? In this regard it is interesting to note that few people have ever used the advanced search features of Google. Most users have limited time and even shorter patience. To become a useful tool, any system needs to be easy to use. Some complexity can be overcome by good search user interface (UI) design (this will be discussed later), but there is a limit to how much data a user can be expected to provide to find the content they are looking for. This also depends on the level of sophistication and training. Many systems fail because the search pages are designed with dozens of options and qualifiers. Most often, less is more.
• How much data can the administrators handle? There are options to support the administrator or librarian in keeping order with metadata, classification and cross-references. These options are described in the next section. In many cases manual controls are necessary to keep the data “clean” and ensure searches reliably return all applicable objects. Thus, the administrator or librarian has an important job, which will be detailed in the section about data integrity below. As you design the system, you need to ensure that the administrative tasks do not become overwhelming or a bottleneck for the system’s efficiency. Of course, it is not solely the librarian who applies metadata. The processes of who assigns which data should be well defined and include the staff with the best context and motivation.

How is the data assigned or applied?

As defined above, there are different kinds of metadata. Below we go through the four categories and define who applies the data and how.
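The questions under “What metadata do you need?” can be recorded per field: whether the field is used for searching or only displayed, and from which lifecycle stage it becomes mandatory. The sketch below assumes three hypothetical stages and hypothetical field names, and checks a quickly cataloged live-event asset against the requirements of a later, more permanent stage.

```python
from dataclasses import dataclass

# Hypothetical lifecycle stages, in order.
STAGES = ["ingest", "active", "archive"]


@dataclass(frozen=True)
class FieldSpec:
    name: str
    searchable: bool     # used to find the object, or only shown once it is found
    required_from: str   # first lifecycle stage at which the field must be filled


FIELDS = [
    FieldSpec("title", searchable=True, required_from="ingest"),
    FieldSpec("event", searchable=True, required_from="ingest"),
    FieldSpec("file size", searchable=False, required_from="ingest"),
    FieldSpec("keywords", searchable=True, required_from="archive"),
    FieldSpec("usage description", searchable=True, required_from="archive"),
]


def search_fields():
    """Fields worth offering on a search page; the rest are display-only."""
    return [f.name for f in FIELDS if f.searchable]


def missing_fields(metadata, stage):
    """Fields required at `stage` (or earlier) that are absent or empty."""
    level = STAGES.index(stage)
    return [f.name for f in FIELDS
            if STAGES.index(f.required_from) <= level and not metadata.get(f.name)]


# A live-event photo cataloged quickly with a minimal set of data.
photo = {"title": "Keynote stage", "event": "NAB 2005", "file size": "4.2 MB"}
print(missing_fields(photo, "ingest"))   # []
print(missing_fields(photo, "archive"))  # ['keywords', 'usage description']
```

Keeping these decisions in one field list also makes the “less is more” question concrete: the search page offers only what search_fields returns.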


• Information about the file (objective): file size, type, color space, bit rate etc.
  These data can in most cases be extracted automatically. This is good, but unfortunately these data are the least useful in identifying a specific file or asset. They would typically be offered as an “advanced” search option or simply as information provided to users after they have found the content.

• Information about the content (subjective or user-defined): author, location, target audience, topic.
  Arguably these data are the most valuable for finding an asset by search terms. In most cases some of the subjective data are best provided by the content creator or someone very familiar with the context of the content. For example, the caption text of a photo or keywords describing the content are best defined by people closer to the context than the system administrator or general librarian. It could be the job of the latter to ensure the data have been assigned in the right format, but it is often hard for an “outsider” to ensure the data are correct. Data about the content that have to do with business rules, such as owner or usage rights, can then often be defined by more administrative roles. The task of assigning the right data is therefore often a collaborative effort. In many cases administrative information can be used to manage this workflow.

• Administrative information: approval status, storage path, lifecycle status, use etc.
  These data are controlled either by the system or by more administrative roles. A typical set-up would be to tag any newly ingested file automatically with a certain status, eg “new.” This will then allow a dedicated librarian, information architect or other dedicated role to search for data with a “new” status, perform the necessary tasks and “flag” the asset with the applicable status for the next step, eg “reviewed.” If your organization is large and has a well-defined content management strategy with a defined enterprise taxonomy, you might have a dedicated person to control or manage this aspect. Figure 11 shows a sample content flow through the different classification and metadata assignment steps.

• Information about the file’s relationships: collections, parent documents, projects, jobs, inclusions etc.
  As shown in Figure 11, these data can be assigned automatically by the system, eg by using a dedicated upload folder that assigns pre-defined relationships to a project folder. Other examples include the upload of compound documents such as Quark XPress or InDesign files. Those documents consist of many files that any good DAM system will link automatically in a parent/child relationship. Throughout the content lifecycle, an asset or file might be assigned to other collections, folders, jobs and the like by users or administrators. This will most likely happen manually. We touched on this issue previously, in the section on project-based classification. Data integrity is vital: it is key to ensure that all these data and relationships are useful and not confusing due to errors, omissions or misclassifications.

How can you ensure data integrity?

In the prior section we discussed options to automate the assignment of metadata. However, there is always a degree of human intervention necessary to fully classify and specifically describe visually
rich content. The processes defined for the assignment of data are therefore very important. A basic rule is that there should be an incentive for the person assigning the data to do so with careful attention. Another aspect of data integrity is control. We will discuss that issue shortly.

Figure 11: Classification and metadata assignment steps

Content management requires dedication and skill

No library would expect the average user to file books back onto the shelf: the margin of error would be too high. Following the example in Figure 11, the average creative worker can be expected to provide data vital for their work and to drop the file into a dedicated “hot” folder, but one should not expect anything more.

Managing content should be defined and articulated as part of the job description of the staff responsible for setting up hot folders and assigning key data. We mentioned information architects (IAs, sometimes also called cybrarians) and librarians earlier in this paper. These jobs require dedication and skill. One common reason why systems do not achieve the expected effectiveness is that companies underestimate this aspect and leave the crucial management of metadata and relationships to unqualified and poorly trained staff. The level of data integrity and the effectiveness of the system are only as good as the dedication and skill of the staff managing it. Technology plays only a limited role in this aspect. “Garbage in, garbage out” has never been more applicable. If the cost of qualified staff is not part of the calculations for the projected return on investment, the calculations are flawed.
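The status-based workflow and the hot-folder assignment of relationships described in this section can be illustrated with a small data model. What follows is only a minimal Python sketch under stated assumptions: the `Asset` class, the helper functions and the `HOT_FOLDER_RELATIONSHIPS` mapping are hypothetical names, and the status values “new” and “reviewed” are the only ones taken from the text. A real DAM system would keep these data in its own repository and expose them through its own search.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical: each dedicated upload ("hot") folder assigns pre-defined
# relationships, eg to a project, as soon as a file is dropped into it.
HOT_FOLDER_RELATIONSHIPS = {
    "/hot/spring_campaign": {"project": "Spring Campaign", "collection": "Marketing"},
}


@dataclass
class Asset:
    name: str
    # Objective data: usually extracted automatically (type, size, ...).
    file_info: dict = field(default_factory=dict)
    # Subjective data: best supplied by the creator or someone close to the content.
    content_info: dict = field(default_factory=dict)
    # Administrative data: controlled by the system or administrative roles.
    status: str = "new"
    # Relationships: collections, projects, parent documents, ...
    relationships: dict = field(default_factory=dict)


def ingest(name: str, folder: str, file_info: dict) -> Asset:
    """Create an asset, tag it 'new' and apply the hot folder's relationships."""
    asset = Asset(name=name, file_info=dict(file_info))
    # update() copies the key/value pairs, so the shared mapping is never mutated.
    asset.relationships.update(HOT_FOLDER_RELATIONSHIPS.get(folder, {}))
    return asset


def review_queue(assets: list[Asset]) -> list[Asset]:
    """The librarian's search: every asset still carrying the 'new' status."""
    return [a for a in assets if a.status == "new"]


def mark_reviewed(asset: Asset, caption: str) -> None:
    """Complete the subjective data and flag the asset for the next step."""
    asset.content_info["caption"] = caption
    asset.status = "reviewed"


assets = [
    ingest("logo_v2.eps", "/hot/spring_campaign", {"type": "EPS"}),
    ingest("beach.jpg", "/uploads/misc", {"type": "JPEG"}),
]
mark_reviewed(review_queue(assets)[0], "Spring campaign logo, version 2")
print([a.status for a in assets])          # ['reviewed', 'new']
print(assets[0].relationships["project"])  # Spring Campaign
```

Searching for assets whose status is still “new” is the same mechanism a librarian would use to find work waiting for review; in an actual system this would more likely be a saved query than a list comprehension.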


Controlled vocabulary

One tool administrators can use to ensure data integrity is controlled vocabulary. There are various ways to define a group of values for users to choose from, rather than letting them enter free text. These can be in the form of lists, hierarchies (again) or other controls. Both the value and the format of information can be controlled to some degree. While this can help to ensure data are assigned correctly, it will need additional thought and planning on the part of the system administrators. Controlled vocabulary options are highlighted in the list of common data types below.

Control

An administrator or librarian can take various measures to ensure that metadata have been assigned, and assigned correctly. We outlined two control steps in the process mapped in Figure 11. If control is not part of the data assignment tasks at the time of cataloging, an administrator can simply search for an ingestion or catalog date range and inspect submitted material randomly or systematically (eg every 10th cataloged file).

Most systems will also allow searching for omissions. For example, a librarian could search for all content that does not have the classification Ibid.

Control can also be distributed to the areas of the organization that have a stake in certain aspects of the system. For example, in the logo scenario from earlier, the marketing department could and should have a dedicated person to check for new logos created by the studio and add them to the marketing client logos folder.

While control and data integrity are very important for the usefulness and the adoption of the new tools and processes, an even more important element is the presentation of both search options and the resulting information and content.

How is the data represented?

This is a very important question. It is beyond the scope of this paper to discuss UI recommendations and considerations in detail. However, it is fair to say that the UI is the ultimate milestone by which the usefulness of our system will be measured.

We have discussed various quite complex topics. In order to address the various user needs sufficiently, a system will need a way to build UIs (even a medium-sized system will probably have more than one) that are flexible and adaptable to change. It also needs to accommodate some freedom of creativity on the part of the designers. Most out-of-the-box designs are targeted at one specific use case. If a tool does not provide some level of freedom in creating “customized” UIs, it is not a good tool for an enterprise application of any kind.

The figures used to depict hierarchies in this paper, for example, are all very dry and boring. There is no need to represent a hierarchy in such a way. Images or other more visual elements can become an intuitive and “fun” way to navigate. Fun is not necessarily a standard design guideline, but it should be. Flash and many other technologies allow users to move and adjust components on a website, thus creating a personalized experience that can actually be fun. This is a very important element of engaging the user and an invaluable contribution to user acceptance and the success of the entire project.

In general we have three kinds of interfaces to think about:


• search interfaces
• information display interfaces (search results)
• administrative or editing interfaces.

Below we will look at each in detail.

Search interfaces

A search actually starts before a user arrives at a search interface. It starts the moment a user clicks on a bookmark, enters a URL or chooses a specific application to look for something. It is therefore important to include these choices when analyzing the best interface approach for any system.

For example, when a user needs to pick a geographical location, this can be displayed as a long list or as an interactive map or animated globe that the user can click. The latter is sure to engage users much more intuitively; they might not even consider this “searching.” After the user has arrived in California, the search option could include entering a search term in a Google-like style or adding some “advanced” data values (dates, file type etc).

Saved searches are also a great tool to make it easier for users to find specific assets. For example, a saved search can define even a very complex query and present it as a simple link on an intranet site. Consider the text “New logos and graphics for use in PowerPoint presentations” as a link on the marketing page. Such a link can hide the complexity of a specific query, eg

Information display

How information is displayed is another important aspect of a good system. As with general interface design, the more targeted the information is for a specific user or use case, the better. When building display interfaces for users we need to know what they need to know. We need to understand both the kind of data that is required and the best format for that data. We will discuss the various commonly used metadata types and formats later. But the content itself can also have various renditions or proxies.

In the best case you will create a matrix of each user group and their requirements. Figure 12 is an example of a possible list for a user group.

Figure 12: User role display definition (user role title: Marketing. Images: lo-res thumbnail and at least one enlargement, no need to zoom or pan; information: color space, file size, file type, resolution, marketing usage rights ...; presentation alternative: tabulated list view. Video: animated thumbnail for visual recognition; lo-res stream or progressive download (Windows Media 9 or higher); information: shoot date, locations, available codecs ... Further entry: information: title, target audience, number of slides, creation date ...; presentation alternative: tabulated list view)

Administrative

Administrators or librarians often have very different needs from the average user. At the same time, they should also spend more time in training and should therefore have a much better understanding of what the system can do. In some cases, an administrative interface is not a web browser but a thick client. The difference is that a thick client is an application installed on a local computer to allow more sophisticated actions, such as batch uploads, batch assignment of data or editing of end-user interfaces.

A good system should allow administrators to build search pages on the fly by selecting from dozens of possible search values. These searches should be saved for later reuse or even published as links for end users. We discussed saved searches previously in this paper.

User and application security is another aspect of the administrator interface. Integration with existing user administration tools such as iPlanet, Microsoft Active Directory6 or other LDAP-based products is not the final answer to this issue. Many application-specific user administration tasks will have to be performed in any larger content management system. We will briefly mention how access control and security will also impact your taxonomy planning. But to complete this section on metadata, we will list the common data types and the common objects that are defined by these data.

Common data types

Descriptive

• Pick one or pick many lists
  These lists allow for controlled vocabulary. The biggest advantage of this data form is that it ensures the correct entry and spelling. It can be limiting if the list is not well defined. In some cases systems can allow users to add to a list.
• Numeric (numbers, dates, currency)
• Alphanumeric (codes and unique IDs)
• Yes/no (or Boolean)
  In this case it is important to verify that a system can also handle omissions. Administrators might need to search for assets where no value was assigned.
• Free text field
  Carries the highest risk of user error and misspelling, but is common for notes, keywords and captions.
• Controlled text fields
  You can control the length or format of entered text. For example, you can check that an entry follows the email address format.
• Hierarchical (with or without inheritance)
  Data types that are best expressed in a hierarchical manner often overlap with the classification area. However, it can make sense to allow users to pick one or multiple values from a hierarchical structure other than the main taxonomy. Inheritance means that a lower-level value will automatically inherit the values of the nodes above. A human is always also a mammal. That inheritance flows down the tree, but inheritance can also flow the other way round. Non-inheriting structures are more like computer file systems: you only get the value you pick.
  A good example for hierarchical
  inheriting metadata is keywords for image libraries. When looking for an image, hierarchies like these can be a great tool (Figure 13).

Figure 13: Hierarchical subjects for images (facets and sample values: Location: Urban, Out Doors, Wild Nature ...; Motive: Human, Animal, Mammal ...; Time of Year: Winter, Spring, Summer, Autumn; Light: Sun, Shade, Half Shade ...; Atmosphere: Fun, Romantic, Love ...)

Relationships

In addition to descriptive and administrative metadata, there are various relationships that can be important:

• Containers
  – collections
  – folders
  – jobs
  – projects
  These containers for content or assets are objects that can be searched and organized just as individual assets can.
• Parent/child
  An HTML page or a Quark XPress or Adobe InDesign document usually consists of a master template and various linked files.
• Lineage
  This relationship usually tracks reuse, as in a composite image (a new image created out of multiple photos), or the link between renditions that became individual assets. It is a mix of parent/child and peer-to-peer.
• Versioning
  Mostly a sequential linear relationship, but this can become a complex relationship between different versions of a file and versions of the metadata. Versions can become hierarchical tree structures if different versions continue to evolve in parallel.
• Peer-to-peer
  This relationship links assets without creating a new object like a folder or collection. An example is the domain relationship we discussed earlier in the paper (race is related to people).

WHAT ARE THE OBJECTS YOU WILL NEED TO ASSIGN METADATA TO?

As a final area of planning, you will have to think about the different elements that need metadata. We defined those elements as objects. Most of this paper focused on the classification and description of content. However, other objects will need to be classified and grouped, and often they need metadata for searching and administrative tasks. Here is a list of the most common elements that you will most likely have to include in your planning process:

• Files/content/assets
  – versions
• Containers
  – collections
  – folders
  – jobs
  – projects
• Users
• User groups
• Roles
• Upload or staging folders
• Nodes of the taxonomy tree (sometimes called subjects)

As previously discussed, a node of the taxonomy can become an object which can be searched and which can have metadata such as synonyms and translations.

TESTING

In the planning for a larger system, you should consider testing the system on a smaller scale. This is not always easy, because with limited content the users will often not find anything when searching for specific items. That defeats the purpose of providing a realistic testing environment. Effective pilot projects are therefore quite difficult
to realize. Over the years, I have observed that a phased implementation approach is the best alternative. For example, start with building an image library or enabling access for just one client through your services portal. The best first phases are those which are complete implementations with a limited but well-defined scope. They can provide valuable feedback and build the competency of everyone involved over time.

ACCESS CONTROL AND APPLICATION SECURITY

In closing, I wish to mention briefly one area that is not usually considered as part of the taxonomy or metadata structure: access control and application security. Even in a mid-size system of a few thousand assets, a user can be overwhelmed by the search options and the available information. Access control is not only a way to ensure that assets are not accessed by unauthorized users. It is also a way to hide some complexity from the users: they will only see what makes sense to them. For example, a salesperson will want the PDF of the marketing brochure. They do not need the native Quark XPress file of the same name, and they surely don’t need to see all the linked files that make up the end result. By assigning access rights according to roles, the information can be filtered to the most applicable set of content.

SUMMARY

The implementation of content management systems is best accomplished in phases. This is no different for complex taxonomy structures. There is much value in building a larger enterprise taxonomy. As mentioned at the outset, this can be in the form of a spreadsheet or table. The implementation, and with it the refinement of the details, can be accomplished in phases. These phases should not be isolated projects. They should follow a larger strategy or vision, but with each phase this strategy can and most likely will be adjusted to reflect “lessons learned.”7

References

1. This website defines the term taxonomy in more detail: http://www.mywiseowl.com/papers/Taxonomy.
2. Rockley, A. (2003) Managing Enterprise Content, New Riders Press, Indianapolis, IN.
3. http://www.earley.com/Earley_Report/ER_Managing_Multiple_Taxos.htm.
4. See eg http://www.gilbane.com/ or http://www.damusers.com.
5. See eg http://www.g-sam.org, http://www.aiim.org and especially http://www.cmpros.com.
6. http://www.microsoft.com/windows2000/server/evaluation/features/dirlist.asp.
7. Building a strategy for a unified content strategy is not easy, but there are several experienced consultants in the field. You can find independent professional advice at http://www.g-sam.org, http://www.aiim.org and especially http://www.cmpros.com.

Note: All URLs last accessed 3 May 2005.
