
Taxonomy Tools: Requirements and Capabilities

Joseph A. Busch, PPC Senior Principal

Today’s agenda

Time | Duration | Agenda
1:00-1:15 | 15 min | Introductions
1:15-2:00 | 45 min | Basics
2:00-3:00 | 60 min | Taxonomy Development Process
3:00-3:15 | 15 min | Coffee Break
3:15-4:00 | 45 min | Taxonomy Construction Tools
4:00-4:45 | 45 min | Exercise
4:45-5:00 | 15 min | Q&A, Closing

Learning Objectives:
• Ability to identify taxonomies by type, to choose the appropriate type for a product development application, and to articulate the benefits of the taxonomy for use in development of an information product.
• Understand basic taxonomy-related terminology.
• Demonstrate the ability to identify taxonomy term record elements.
• Demonstrate the ability to focus on the key concepts and build term records for a small taxonomy.

1. TAXONOMY BASICS

What taxonomy is: Systematics view

Biological taxonomy places an organism in one and only one place.

Kingdom: Animalia
Phylum: Chordata
Class: Mammalia
Order: Carnivora
Family: Canidae
Genus: Canis
Species: C. familiaris

Linnaeus …

What taxonomy is: Pragmatic view

But most of the time things belong to more than one category.
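
The difference between the two views can be sketched in a few lines of Python. This is only an illustration, not part of the original deck: the `broader` mapping and both helper functions are hypothetical names, and the category labels are taken from the dog example on this slide.

```python
from collections import defaultdict

# Hypothetical structure: each term maps to the set of its broader categories.
broader = defaultdict(set)

def add_broader(term, parent):
    """Record that `term` belongs under `parent`."""
    broader[term].add(parent)

def is_strict_hierarchy():
    """True only if every term has exactly one place (the systematics view)."""
    return all(len(parents) <= 1 for parents in broader.values())

# Systematics view: one and only one place.
add_broader("C. familiaris", "Canis")
print(is_strict_hierarchy())

# Pragmatic view: the same thing belongs to several categories.
for category in ("Pets", "Mammals", "Farm Animals"):
    add_broader("Dogs", category)
print(sorted(broader["Dogs"]))
print(is_strict_hierarchy())
```

Storing a set of parents per term, rather than a single parent, is the design choice that makes poly-hierarchy possible.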

[Figure: the same Linnaean hierarchy (Kingdom Animalia through Species C. familiaris), shown alongside the everyday categories Pets, Mammals, and Farm Animals, each of which contains Dogs.]

Other semantic schemes

Type | Remarks
Synonym Ring | A set of words/phrases that can be used interchangeably for searching. Example: Hypertension, High blood pressure
Controlled Vocabulary | A list of preferred and variant terms, with defined hierarchical and associative relationships. A taxonomy is a type of controlled vocabulary. Typically used for names of countries, individuals, organizations
Classification Scheme | An arrangement of knowledge that does not follow taxonomy rules. Usually enumerated; e.g., Dewey Decimal Classification
Thesaurus | A tool that controls and identifies the semantic relationships among terms.
Ontology | Resembles faceted taxonomy but uses richer semantic relationships among terms and attributes and strict specification rules.

Semantic schemes: Simple to complex

[Figure: semantic schemes arranged on a scale from Simple to Complex. Vocabulary labels: Synonym Rings, Authority Files, Classification Schemes, Thesauri, Taxonomies, Ontologies. Relationship labels: Equivalence, Hierarchical, Associative.]

Source: Amy Warner. and Taxonomies for a More Flexible (http://www.lexonomy.com/presentations/metadataAndTaxonomies.ppt)

Taxonomic metadata: e-Government example

[Figure: Metadata Elements (column headings) and the Controlled Vocabularies that supply their values.]

Metadata Element | Controlled Vocabulary values
Agency | 0001 Legislative; 1000 Judicial; 1100 Executive Office of Pres; 0003 Exec Depts; 1200 Agriculture; 1300 Commerce; 9700 Defense; 9100 Education; 8900 Energy; 7500 HHS; 7000 DHS; 8600 HUD; 1400 Interior; 1500 Justice; 1600 Labor; 1900 State; 6900 Transport; 2000 Treasury; 3600 Veterans; Ind Agencies; Intl Orgs
Form Type | Application; Approval; Claim; Information request; Information submission; Instructions; Legal filing; Payment; Procurement; Renewal; Reservation; Service request; Test; Other input; Other transaction
Industry | 00 Generic; 11 Agriculture; 21 Mining; 22 Utilities; 23 Construct; 31-33 Manuf; 42 Wholesale; 44-45 Retail; 48-49 Trans; 51 Info; 52 Finance; 54 Profession; 55 Mgmt; 56 Support; 61 Education; 62 Health Care; 71 Arts; 72 Hospitality; 81 Other Services; 92 Public Admin
Jurisdiction | Federal; State +; Local +; Other +
BRM Impact | Citizen Srvcs; Social Srvs; Defense; Disasters; Econ Dev; Education; Energy; Env Mgmt; Enf; Judicial; Correctional; Health; Security; Income Sec; Intelligence; Intl Affairs; Nat Resour; Transport; Workforce; Science; Delivery Support; Management
Keyword / Topic | Agriculture & food; Commerce; Communications; Education; Energy; Env pro; Foreign rels; Govt; Health & safety; Housing & comm dev; Labor; Law; Named grps; National def; Nat resources; Recreation; Sci & tech; Social pgms; Transport
Audience | All; General; Citizen; Business; Govt; Employee; Native American; Non-resident; Tourist; Special group

Standards

• Taxonomy
  – Z39.19-2003. Guidelines for the Construction, Format, and Management of Monolingual Thesauri
    • BT/NT
  – ISO 13250. Topic Maps
    • Topics, associations, occurrences
• Metadata
  – ISO 15836 and Z39.85-2007. Dublin Core Metadata Element Set.
    • 15 elements
  – FRBR. Functional Requirements for Bibliographic Records
    • Work → Expression → Manifestation → Item

Standards (2)

• Semantic web (interoperability)
  – RDF. Resource Description Framework.
    • Subject-predicate-object descriptions
  – ISO 11179. Metadata Registry (MDR).
    • Metadata-driven exchange of data in a heterogeneous environment, based on exact definitions of data.

Taxonomy definitions

Term | Definition
Concept | The characteristics of a real or imaginary object expressed as terms in the taxonomy.
Controlled Vocabulary | A list of terms that have been explicitly enumerated. The terms are controlled and published by a designated authority or authoritative source. If multiple terms are used to mean the same thing, one of the terms is identified as the Preferred Term in the Controlled Vocabulary and the other terms are listed as synonyms or aliases.
Facet | A grouping of concepts of the same inherent category. Examples of categories that may be used for grouping concepts into facets are: Audience, Channels, Components, Content Types, Functions, Industries, Intentions, Lifecycle, Location, Organization, Products, etc.
Taxonomy | The core metadata elements and the Controlled Vocabularies required to find, use, and manage content in a collection.

Some definitions associated with terms

Term | Definition
UID | The unique identifier for the concept.
Entry Term | The preferred term that is used to label a concept. An entry term is also known as a Descriptor.
Broader Term (BT) | A term to which another term (or multiple terms) are subordinate in a hierarchy.
Narrower Term (NT) | A term that is subordinate to another term or to multiple terms in a hierarchy.
Used For Term (UF) | Non-preferred term(s) that are equivalent to the Entry Term. Used For terms may be synonyms, aliases (such as abbreviations), and quasi-synonyms (such as more specific terms).
Related Term (RT) | A term that is associatively (but not hierarchically) linked to another term in a Controlled Vocabulary.
Scope Note (SN) | A note following a term explaining its source, rationale, coverage, specialized usage, or rules for assigning it.

Relationships

Relationship | Definition
Associative Relationship | A relationship between or among terms that leads from one term to other terms that are related to or associated with it. An Associative Relationship is a Related Term or cross-reference relationship.
Equivalence Relationship | A relationship between or among terms in a Controlled Vocabulary that leads to one or more terms that are to be used instead of the term from which the Reference is made. An Equivalence Relationship is a Used For Term relationship.
Hierarchical Relationship | A relationship between or among terms in a Controlled Vocabulary that depicts broader (generic) to narrower (specific) or whole-part relationships. A Hierarchical Relationship is a Broader Term to Narrower Term relationship.

Concept, terms and relationships
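
The term record elements and relationship types defined on the preceding slides could be held in a small data structure. The sketch below is one possible layout and makes no claim about how any particular taxonomy tool stores its records. The class name, field names, and the UID value `C0001` are assumptions made for illustration; the IBM labels are the ones used in this slide's example.

```python
from dataclasses import dataclass, field

@dataclass
class TermRecord:
    uid: str                                       # UID: unique identifier for the concept
    entry_term: str                                # Entry Term (preferred label / Descriptor)
    broader: list = field(default_factory=list)    # BT: hierarchical, upward
    narrower: list = field(default_factory=list)   # NT: hierarchical, downward
    used_for: list = field(default_factory=list)   # UF: equivalence (non-preferred labels)
    related: list = field(default_factory=list)    # RT: associative
    scope_note: str = ""                           # SN: usage guidance

def build_lookup(records):
    """Map every entry term and Used For term (case-insensitively) to its record."""
    lookup = {}
    for record in records:
        for label in [record.entry_term, *record.used_for]:
            lookup[label.lower()] = record
    return lookup

# Hypothetical record: "C0001" is an invented identifier.
ibm = TermRecord(
    uid="C0001",
    entry_term="IBM",
    used_for=["International Business Machines", "I.B.M."],
)
lookup = build_lookup([ibm])

def preferred(label):
    """Resolve any known label to the preferred (entry) term."""
    return lookup[label.lower()].entry_term

print(preferred("I.B.M."))
```

Resolving variant labels through the equivalence relationship is what lets a search for "International Business Machines" reach content tagged with the concept's preferred label.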

[Figure: the CONCEPT IBM is linked by RELATIONSHIPS to its TERMS. "IBM" Is Preferred Label; "International Business Machines" and "I.B.M." are each linked by Is Used For.]

Business taxonomy problem: How can a customer pick from >5,000 faucets w/o quitting?

Refine search by:
▪ Category
▪ Price
▪ Brand
▪ Color/Finish
▪ # Handles
▪ Series Name
▪ Water Filter?
▪ Faucet Spray
▪ Handle Shape
▪ Soap Dispenser?

How business taxonomy translates into front-end interface
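
A minimal Python sketch of the idea: each metadata field acts as a facet, and picking a taxonomy value for a field narrows the result set. The product records and the function names `refine` and `facet_counts` are invented for illustration; only the field names and values echo the shoe example on this slide.

```python
from collections import Counter

# Made-up product records tagged with taxonomy values per metadata field.
shoes = [
    {"Name": "Shoe A", "Type": "Boots", "Color": "Black", "Brand": "Bruno Magli", "Size": 7},
    {"Name": "Shoe B", "Type": "Boots", "Color": "Brown", "Brand": "Ben Sherman", "Size": 8},
    {"Name": "Shoe C", "Type": "Sandals", "Color": "Black", "Brand": "Bacco Bucci", "Size": 7},
    {"Name": "Shoe D", "Type": "Oxfords and More", "Color": "Black", "Brand": "Bruno Magli", "Size": 6.5},
]

def refine(items, **facets):
    """Keep only the items whose metadata matches every selected facet value."""
    return [
        item for item in items
        if all(item.get(name) == value for name, value in facets.items())
    ]

def facet_counts(items, name):
    """Count how many of the remaining items carry each value of one facet."""
    return Counter(item[name] for item in items)

black = refine(shoes, Color="Black")
print(facet_counts(black, "Type"))                      # counts shown beside each refinement link
print([s["Name"] for s in refine(black, Type="Boots")])
```

Recomputing the counts after each selection is what lets an interface show only the refinements that still lead to results.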

Metadata Field: Size
Taxonomy Values: 4.5, 5.5, 6, 6.5, 7, 8, …

Metadata Field: Type
Taxonomy Values: Athletic Inspired, Boots, Loafers and Slip-ons, Oxfords and More, Sandals, …

Metadata Field: Color
Taxonomy Values: Black, Blue, Brown, Green, Grey, Ivory, …

Metadata Field: Brand
Taxonomy Values: Antonio Maurizi, Bacco Bucci, Ben Sherman, Bruno Magli, …

Learning Objectives:
• Demonstrate knowledge of multiple taxonomy development methods.
• Demonstrate the ability to choose the appropriate taxonomy development method for use in development of an information product.
• Demonstrate knowledge of common taxonomy facets.
• Demonstrate the ability to identify specialized facets for use in an information product.
• Demonstrate the ability to map the facets to the appropriate elements in a Dublin Core-based metadata specification.

2. TAXONOMY DEVELOPMENT PROCESS

Taxonomy development methods

Method | Description
Automated analysis | Munge, blast, crunch text to analyze corpus.
Workshopping | Guide group in activities to identify key concepts.
Strawman | Prepare best guess, then bring it to the table to discuss.
Adapt Existing Vocabularies | Customize internal terminology, industry standards, etc.
Hybrid | Combination of some or all of these methods.

Key components to a successful taxonomy project

[Figure: project cycle with the steps Set-up taxonomy team; Identify business case; Planning & research; Interview stakeholders; Define use cases; Build high-level taxonomy; Build-out taxonomy detail; Validation testing & review; Migrate content; Maintain & evolve taxonomy.]

Define business case: Business case examples

• Improve search and browsing to reduce the amount of time employees spend looking for information.
• Reduce business silos, foster collaboration and content reuse, and thereby reduce redundant work.
• Reduce the amount of time employees spend e-mailing basic information to each other.
• Build confidence that employees are getting the most up-to-date information, and increase employee loyalty by helping them stay “up to date” on the company.

Research & planning

• Identify target content to be focused on.
  – Provide a list of websites (and/or other target content file stores).
  – Prioritize this list for the purposes of the taxonomy project.
• Gather any query logs, usage statistics and usability surveys.
• Collect any existing documentation related to audience personas, content organization, metadata, keywords, and any other guidelines or standards.
• Identify and gather any internal classifications (org charts, sales regions, records retention schedule, code of conduct, product lists, etc.) and any relevant industry standard classifications (UNSPSC, NAICS, USPS, regulated activities, etc.).

Interview stakeholders

• Recruit people from business-critical functions such as marketing, public relations, product marketing, legal, etc.
  – Include people who have credibility, are early adopters, hold large amounts of content, and are “squeaky wheels” or “fans.”
• Conduct 10-20 interviews.
• The goal is for stakeholders to be the review board during the taxonomy development process, and beyond.

Define use cases: Intranet examples

• Content related to business areas or facilities
  – By geographic location, by type, by specific facility, by access restrictions, by audience, etc.
    ▪ Use Case: Create a safety policies and procedures website for facilities organized by State.
    ▪ Use Scenario: Find all safety policies and procedures related to facilities located in Ohio.

• Company-wide content
  – By business function, by topic, by access rights, etc.
    ▪ Use Case: Locate any content that has policies and procedures around a particular topic.
    ▪ Use Scenario: A company-wide policy regarding smoking has changed and references to outdated policies should be removed. Find official policies, as well as newsletters related to the smoking policy, company-wide.

Define use cases: .com examples

• Web content managers
  – By content type, by topic, by location, etc.
    ▪ Use Case: Find and recall all public-facing pages that describe a specific safety tip.
    ▪ Use Scenario: Find and recall all public-facing pages that discuss gas safety.

• Public users seeking information
  – By topic, by location, etc.
    ▪ Use Case: Provide search for dividend schedules, earnings statements and stock splits, and the corresponding press releases, for a specific time period.
    ▪ Use Scenario: An investor who recently sold stock is preparing taxes and would like to do a concise search so that they can find historical information about their holdings.

Build high-level taxonomy

• Identify the types of actors
  – Audiences, roles & access rights
• Identify the types of content
• Identify the types of activities
  – Business processes, applications & uses
• Identify the types of named entities
  – Products, services, projects, organizations, locations, etc.
• Topics will be everything else.

▪ A business taxonomy should have no more than 6-10 broad divisions.

Build high level taxonomy: Oracle.com top-level taxonomy

[Figure: Oracle.com top-level taxonomy with the divisions Person, Organization, Location, Content Type, Audience, and Products. Products comprises Product Line, Technology, Application, and Industry Solution.]

▪ The Oracle.com taxonomy has no explicit “Is a” groups of topics, only actors, content types, and named entities.

Build high level taxonomy: SGMS top-level taxonomy

[Figure: SGMS top-level taxonomy, dominated by Topics.]

▪ The SGMS (Singapore Government Metadata Standard) Taxonomy is much more focused on Topics.

Build-out taxonomy detail

• Get agreement on the broad divisions first, then build out the detailed taxonomy.
• Use existing terminologies whenever they are available for business functions, locations, products & services, etc.
• Only build a vocabulary when no alternative authoritative source exists.
• Only create categories for which there already is content, or for which there is likely to be content soon.
• Keep the taxonomy broad and shallow.
  – Roll up more specific terms into broader categories.

▪ A business taxonomy should have no more than 1,200 categories.

Build out taxonomy detail: NASA Taxonomy
http://nasataxonomy.jpl.nasa.gov/

Validation testing and review

Method | Process | Who | Requires | Validation
Walk-through | Show & explain | Taxonomist; SME; Team | Rough taxonomy | Approach; Appropriateness to task
Walk-through | Check conformance to editorial rules | Taxonomist | Draft taxonomy; Editorial Rules | Consistent look and feel
Usability Testing | Contextual analysis (card sorting, scenario testing, etc.) | Users | Rough taxonomy; Tasks & Answers | Tasks are completed successfully; Time to complete task is reduced
User Satisfaction | Survey | Users | Rough taxonomy; UI Mockup; Search prototype | Reaction to taxonomy; Reaction to new interface; Reaction to search results
Tagging Samples | Tagging sample content with taxonomy | Taxonomist; Team; Indexers | Sample content; Rough taxonomy (or better) | Content ‘fit’; Fills out content inventory; Training materials for people & algorithms; Basis for quantitative methods

Migrate content

• Prioritize content to be tagged
  – Identify and dispose of ROT (redundant, outdated, trivial content).
• Use business rules to automate content tagging
  – Tag landing pages for major sections.
  – Lower-level pages inherit tags from top-level pages.
• Use workflow to enforce tagging
  – Require entry of simple tagging in order to submit an item into the content management system.
• Use templates to guide user tagging
  – Pre-populate template fields whenever possible.
  – Use context-sensitive pick lists.
  – Call out to a taxonomy service for more complex controlled vocabularies.
• Provide tagging incentives
  – Almost instantaneous feedback.

Maintain and evolve taxonomy

• Taxonomy building is iterative.
  – A taxonomy should be improved over time and maintained.
• Designate a taxonomy editor as the single point-of-contact for taxonomy changes.
• Log change requests and notify requestors.
• Prioritize taxonomy changes, e.g.:
  • Improves , use and reuse.
  • Requires creating new data or metadata.
  • Affects program operations or has a financial impact.
  • Enables communication campaigns or organizational strategy.
  • Positive impact on users.

Licensing an existing taxonomy

• See Factiva’s taxonomy www.taxonomywarehouse.com
  – There are usually license fees, but these will be less than the effort to develop an equivalent taxonomy.
  – But pre-existing taxonomies rarely fit an organization’s needs and may require extensive customization.
• Recommendation
  – Adopt a faceted approach.
  – Reuse existing (especially internal) vocabularies for as many of the facets as possible.
  – Plan on doing full-custom “Content Type” and “Topic” taxonomies.

Free sources for 8 common taxonomies

Taxonomy | Definition | Potential Sources
Organization | Organizational structure. | SP 800-87, U.S. Government Manual, Your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | Dublin Core Type Vocabulary, AGLS Document Type, Your policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | SIC, NAICS, Your market segments, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, Your sales regions, etc.
Business Activity | Business activities or functions performed to accomplish mission and goals. | Federal Enterprise Architecture Business Reference Model, Enterprise ontology, Your business functions, etc.
Topic | Business topics relevant to your mission & goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, Your research areas, etc.
Audience | Subset of constituents to whom a piece of content is directed or is intended to be used by. | GEM, ERIC Thesaurus, IEEE LOM, Your psycho-graphics or personas, etc.
Products & Services | Names of products/programs and services. | ERP system, Your products and services, etc.

Learning Objectives:
• Demonstrate the ability to identify appropriate taxonomy sources for use in development of an information product.
• Demonstrate the ability to define and populate a small taxonomy with 3-5 facets using MultiTes.
• Demonstrate the ability to design the validation methods for a taxonomy.

3. TAXONOMY CONSTRUCTION TOOLS

Tools

• Taxonomy editing
  – Data Harmony, MultiTes, Protégé, Synaptica, SchemaLogic, Wordmap
• Metadata tagging (automated)
  – CIS, ConceptSearching, Data Harmony, MetaTagger, nStein, Smartlogic, temis
• Content management
  – Documentum, Drupal, Fat Wire, Interwoven, Joomla!, OpenText, SharePoint

Vendor / Taxonomy Editing Tool | URL
Cuadra STAR/Thesaurus | www.cuadra.com/products/thesaurus.html
Thesaurus Master | www.dataharmony.com/products/tm.htm
Autonomy Interwoven MetaTagger | http://www.interwoven.com/components/pagenext.jsp?topic=PRODUCT::METATAGGER
Business Objects Tools for Advanced Visualization | http://www.sap.com/solutions/sapbusinessobjects/large/business-intelligence/dashboard-visualization/advanced-visualization/index.epx
MS Excel | www.microsoft.com
Intelligent Topic Manager | www.mondeca.com
MultiTes Pro | www.multites.com
Taxonomy/Authority File Manager | www.nstein.com/epub/ncm-taxonomy.asp
Protégé | http://protege.stanford.edu/
SchemaServer | www.schemalogic.com
Semaphore | www.smartlogic.com
Synaptica | www.synapticasoftware.com
SAS Ontology Management | http://www.sas.com/text-analytics/ontology-management/index.html
Luxid for Content Enrich | www.temis.com
Term Tree | www.termtree.com.au
Enterprise Vocab Server | www.webchoir.com/products/wvs.html
Designer | www.wordmap.com

Normal taxonomy editor functionality requirements

[Figure labels: Term Editing, Hierarchy Browser.]

Basic
▪ Standard and Custom Fields
▪ Standard and Custom Relations
▪ Data Typing and Restrictions
▪ Consistency Enforcement
▪ Flexible Reporting
▪ Flexible Importing?

Midrange
▪ UNICODE
▪ Multiple Vocabulary Support
▪ Inter-Vocabulary Relations
▪ Unique IDs: externally supplied IDs are not sufficient

Advanced
▪ Workflow
▪ Voting
▪ Change Request Mgmt.
▪ Stylistic rules enforcement
▪ Programmability

Additional functionality for taxonomy editing software:

• Aliases – Need to deal with synonyms, but also with alternative labels based on language or other factors.
• Notes – Useful to have several types of notes fields to keep public notes separate from team’s working notes.
• Effective dates – Enable the determination of what was the ‘valid’ taxonomy on dates in the past. Part of a set of strong requirements on provenance.
• Inter-category relations – Must be able to provide links that don’t follow hierarchy, and even go between vocabularies.
• Poly-hierarchy – Mid-range tools should deal with this.
• Rules checking – Check conformance to style rules like length, use of &, etc.
• Workflow – Tracking the handling of change requests, as well as the process of getting approvals for edits.

Sample taxonomy editor: Data Harmony

[Figure: Data Harmony screenshot with the panes Hierarchy Browser and Standard Term Info.]

MultiTes Taxonomy Tool

• Z39.19 compatible taxonomy editor
• Self-study: http://www.multites.com/lessons.htm
  – Getting Started with MultiTes Pro
  – Navigating your thesaurus
  – Importing data from text files
  – Working with Subject Categories
  – Working with Multilingual Thesauri

MultiTes: Formatting an import file

Recommendation: Use a text editor (Notepad)
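
For larger vocabularies, the import file could be generated rather than typed by hand. The Python sketch below converts an indented outline into the pattern used in this slide's example: each term on its own line, followed by a `BT: <parent>` line when it has a parent. The function name `outline_to_import` and the two-space indent are assumptions, and the sketch covers only the BT pattern shown here, so check the MultiTes import lessons before relying on it.

```python
def outline_to_import(outline, indent="  "):
    """Convert an indented outline into term / 'BT: parent' line pairs."""
    lines = []
    stack = []  # stack[i] holds the most recent term seen at depth i
    for raw in outline.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        stripped = raw.lstrip(" ")
        depth = (len(raw) - len(stripped)) // len(indent)
        term = stripped.strip()
        del stack[depth:]          # drop terms at this depth or deeper
        lines.append(term)
        if stack:                  # top terms have no broader term
            lines.append(f"BT: {stack[-1]}")
        stack.append(term)
    return "\n".join(lines)

sample = """\
Arithmetic
  Operations
    Addition
    Subtraction
  Fractions
"""
print(outline_to_import(sample))
```

The output can be saved as a plain text file and then imported as described on the following slides.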

Subject Taxonomy

Arithmetic
  Operations
    Addition
    Subtraction
    Multiplication
    Division
    Roots
    Factorials
    Factoring
    Properties of Operations
    Estimation
  Fractions
  Decimals
  Comparison of numbers
  Exponents

MultiTes Import Format

Arithmetic
Operations
BT: Arithmetic
Addition
BT: Operations
Subtraction
BT: Operations
Multiplication
BT: Operations
Division
BT: Operations
Roots
BT: Operations
Factorials
BT: Operations
Factoring
BT: Operations
Properties of Operations
BT: Operations
Estimation
BT: Operations
Fractions
BT: Arithmetic
Decimals
BT: Arithmetic
Comparison of numbers
BT: Arithmetic
Exponents
BT: Arithmetic

MultiTes: Create a new taxonomy, then Import a file

• File > New
• Navigate to destination directory, then enter filename
• Click Continue button in New Thesaurus pop-up
• File > Import
• Navigate to target file, then click Open button

MultiTes: Imported taxonomy

MultiTes: Hierarchy report

• Reports > Top term
  – Not Hierarchical
• In Select Term range tab, click on Print/Export button
  – Default should be set to Output to: Screen

MultiTes: Hierarchy report

MultiTes: Alphabetical report

• Reports > Alphabetical report
• Click on Print/Export button

MultiTes exercise

• Format a small taxonomy (10-20 terms, 2-3 levels deep).
• Import it into MultiTes.
• Generate hierarchy (TopTerm) and alphabetical reports.

Questions?

Joseph A. Busch, + 415-377-7912, http://www.ppc.com