SemanticTagging 1/29/09 4:41 PM Page 38


This article is reprinted with permission from EContent magazine, October, 2008. © Online, a division of Information Today, Inc.

How Semantic Tagging Increases Findability

Heather Hedden

Findability is about making information easier to find. After all, if it cannot be found, it may as well not exist. Leading information specialists have been saying this for years, and now, with the increasing volume of content and increasing pressures of time, money, and competition, more of us are finding this statement to be true. In addition to traditional indexing, information architecture has evolved to make browsing and navigation methods more effective, search engine capabilities have been improving to help us find the proverbial needle in the haystack, and bookmarking and social tagging have emerged to help us find our own content and content that we share with members of a social networking group.

The various methods of enhancing findability each have their limitations. Traditional document indexing/material cataloging and web information architecture do not go deep enough. Indexing is usually at the document level, and cataloging only works on the level of the material as a whole (books, sound recordings, video recordings, etc.). Information architecture aids in the navigation of a website, intranet, or portal, but in itself it is often not sufficient for finding specific information. Search engines match user-entered keywords and phrases to those found within the texts or metatag fields of documents, but these are still just word matches and do not necessarily go after the meaning of a document. For example, many words are quite ambiguous, and search results for words such as “state,” “log,” or “screen” would not be accurate, even in combination with other words. Social tagging only involves files or webpages that the user and colleagues have already viewed or created. More significantly, though, social tagging tends to suffer from inconsistent application of tags, such as using both synonyms (movie, motion picture, film), singular/plural forms, and abbreviations (Corporation/Corp., information/info).

New techniques and tools are being developed to address the shortcomings of these various approaches to finding information and to deliver better results in an increasingly competitive information industry. “Semantic tagging,” in the various ways that it is understood, is a term that describes many of these new (and some not-so-new) findability approaches.

Semantic tagging is by no means an accepted concept with an agreed-upon definition. Other than the obvious “tagging for meaning,” semantic tagging means different things to people coming from different parts of the information management field. It may be used interchangeably with “semantic indexing” in contexts where “indexing” is used for “tagging.” Nevertheless, in the quest for better methods of findability, the term semantic tagging is starting to appear in descriptions of information services and products, blogs, online articles, and presentations.

SEMANTIC TAGGING IN PUBLISHED INDEXES

Some people would argue that semantic tagging is nothing new. It can be defined as the assigning of selected controlled vocabulary (aka taxonomy) terms, especially by trained indexers, to content items, such as articles, images, or other documents, to reflect the meaning of the content. Human subject indexing is inherently semantic, because human indexers can discern the meaning of content. Periodical and other database index publishers have been doing this for decades. Once the domain of large database publishing companies (H.W. Wilson, ProQuest, Gale, EBSCO, etc.), this form of semantic indexing is now open to publishers of all sizes, thanks to more affordable client/server and desktop software for taxonomy management, indexing, and web database publishing.

“Semantic information … enables publishers to distinguish their content from their competitors’,” explains Bill Kasdorf of Apex CoVantage, organizer/moderator of a preconference seminar on semantic tagging at the Society for Scholarly Publishing’s (SSP) annual conference this May in Boston. “In addition, great progress has been made recently in moving semantics beyond the theoretical: Actual publishers are actually doing it, and they’re actually getting real benefits from it.”

Meanwhile, the growing popularity of social tagging has made users more aware of the value of subject terms that reflect the meaning of a piece of content, compared with free-text word/phrase search. Nevertheless, some publishers consider semantic tagging to be something more than mere controlled vocabulary-based human indexing; they are pursuing new techniques. This was evident in the participation in the SSP Boston conference’s semantic tagging seminar, “Say What You Mean: How Semantic Tagging Makes Content More Discoverable, More Useful, and More Valuable.”

One way that semantic indexing is distinguished from traditional subject indexing of documents is that it focuses on concepts rather than the documents as a whole. Panel presenter Stephen Rhind-Tutt, president of Alexander Street Press, LLC, explained that semantic indexing can answer complex questions of who, what, and when, such as “What battles during the Civil War resulted in more than 1,000 deaths?” Regular indexing merely answers the question “What documents discuss this battle?” Specialized and multilevel facets (or …, depending on your perspective) of controlled vocabularies can be implemented to support semantically complex user queries, as done by humanities publisher Alexander Street Press. Its database of theatrical plays is indexed by top-level facets, including playwright data, theater data, specific production data, theater company information, character characteristics, scene data, and play text data. Its Early Encounters in North America history database has nine controlled vocabularies, including author, source, year, place environment, flora, fauna, encounter, people, personal event, and cultural event. Setting up the controlled vocabulary and facets requires one to “go into the data and ask ‘what are the latent semantic issues that will be asked’ … This needs to be discipline specific,” according to Rhind-Tutt. Finally, the content searched with faceted taxonomies and supporting interfaces needs to be sufficiently structured with metadata, tagging, or indexing that precisely captures each subject in its appropriate facet.

Alexander Street Press LLC has developed highly structured facets of tags for plays and scenes.
The Alexander Street Press’ highly specific categories for Early Encounters in North America

Another way that semantic indexing is distinguished from traditional subject indexing of documents is that it focuses on pieces of content at a finer, granular level rather than the documents as a whole. This is an approach taken by medical research database developer Silverchair, as explained by its CTO Jake Zarnegar: “We apply semantic tags at any change of topic or concept in the data at any level—including articles, sections, paragraphs, tables, figures, equations, sidebars, videos, etc. Many taxonomic tagging systems deal with the entire data entity as one unit.” Using its internally developed TOTEM taxonomy management platform, Silverchair inserts taxonomy tags into the XML content.

According to Zarnegar, “Tagging should be done at the smallest ‘atomic’ level that can stand on its own if taken out.” Whether the original source is a book, article, or pamphlet, subject indexing is often done to the paragraph level.

Silverchair search results, indexed to the chapter subsection level and utilizing a structured taxonomy

SEMANTIC TAGGING IN SEARCH

Turning to the area of automated search and retrieval (enterprise search engines, content management systems, and related discovery and data mining products that do not utilize human indexing), semantic tagging obviously plays a smaller role. Nevertheless, some of these vendors claim to offer semantic capabilities. In the competitive enterprise search space, new technologies are often based on either autocategorization (automatic indexing/tagging) or various text analytics techniques, such as pattern recognition or entity extraction. Most text analytics is not semantic, because it does not discern the meaning of words but rather may classify words by part of speech (grammar). Various forms of autocategorization, on the other hand, may or may not involve a degree of semantic technology.
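Autocategorization in its simplest rule-based form can be sketched in a few lines of Python. The sketch assumes a tiny, hypothetical controlled vocabulary in which each concept carries its own cluster of equivalent terms (synonyms, plurals, and abbreviations like the movie/film and Corporation/Corp. variants that trip up social tagging), and it tags a text with a concept whenever any of those variants appears. The `THESAURUS` data and the `autocategorize` function are illustrative assumptions, not the API of any product named in this article.

```python
import re
from collections import Counter

# Hypothetical thesaurus-style taxonomy: each concept (preferred term) has a
# cluster of equivalent terms (synonyms, plural forms, abbreviations).
THESAURUS = {
    "Motion pictures": ["movie", "movies", "motion picture", "motion pictures",
                        "film", "films"],
    "Corporations": ["corporation", "corporations", "corp."],
    "Information": ["information", "info"],
}


def build_variant_index(thesaurus):
    """Invert the thesaurus so each lowercase variant points to its concept."""
    return {variant.lower(): concept
            for concept, variants in thesaurus.items()
            for variant in variants}


def autocategorize(text, thesaurus, min_hits=1):
    """Tag text with the concepts whose variants appear in it.

    Returns (concept, hit_count) pairs, most frequent first (ties sorted by
    name), keeping only concepts matched at least min_hits times.
    """
    index = build_variant_index(thesaurus)
    # Try longest variants first, so "motion pictures" wins over "motion picture".
    alternatives = sorted(index, key=len, reverse=True)
    # (?<!\w) and (?!\w) act as word boundaries that also work for "corp.".
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(map(re.escape, alternatives)) + r")(?!\w)")
    # Every matched variant counts as one hit for its concept.
    hits = Counter(index[m.group(1)] for m in pattern.finditer(text.lower()))
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [(concept, n) for concept, n in ranked if n >= min_hits]


if __name__ == "__main__":
    sample = ("The film studio, a Delaware corp., released two movies; "
              "more info on each motion picture is online.")
    print(autocategorize(sample, THESAURUS))
    # prints [('Motion pictures', 3), ('Corporations', 1), ('Information', 1)]
```

Mapping every variant to one preferred concept is what lets “film,” “movies,” and “motion picture” count as a single subject rather than three unrelated keywords. A production autocategorizer would add stemming, disambiguation rules, and statistical weighting, all of which this sketch deliberately leaves out.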

Collexis Holdings’ Research Profiles database with weighted subjects indicated in bar graphs



In cases where autocategorization search solutions or content management software come prepackaged with taxonomies or have a feature to build or automatically generate taxonomies (which only some vendors offer), there is a potential for what may be called semantic tagging. A simple taxonomy as used in information architecture, with a hierarchy of category terms, is not sufficient for effective autocategorization. What is needed is really more of a “thesaurus” style of taxonomy, whereby there is a cluster of synonyms or other equivalent terms (abbreviations, acronyms, spelling variations, grammatical variations, etc.) for each concept in the taxonomy. Thus, the taxonomy is composed not merely of words, but of concepts which derive meaning (“semantics”) from their cluster of synonyms.

Autocategorization products that provide integrated taxonomies include Interwoven, Inc.’s MetaTagger; Teragram Corp.’s Categorizer and Taxonomy Manager; and Northern Light Group, LLC’s Enterprise Search Engine, MI Analyst, and Analyst Direct. Northern Light supports what it calls “meaning extraction.” Knowledge discovery vendor Collexis Holdings, Inc. makes use of taxonomies in what it calls semantic tagging by using weighted taxonomy terms. In the Collexis Knowledge Dashboard product, terms’ relative weights, based on statistical approaches including frequency, uniqueness, and data field location (such as title or text body), are displayed with bar graphs. According to Collexis COO Steve Leicht, who also presented on the SSP panel, semantic tagging “can include taxonomic tagging, ontology-based tags, topic maps, other controlled vocabularies, mixed statistical approaches, etc.”

While much of text analytics does not involve semantic analysis, the specialty of natural language processing (NLP) is often involved in such attempts. NLP has many other applications beyond semantic analysis and tagging, but it is being applied in that area as well. At the fourth annual Semantic Technology conference in San Jose, Calif., in May, the topic of semantic tagging was presented by TextWise, a developer of text extraction, search, categorization, and classification technologies using both NLP and statistics. In the presentation “Applying Trainable Semantic Vectors to Tagging, Search/Discovery, Bookmarking and Matching,” a panel of TextWise speakers explained how its Semantic Signatures function as tags for bookmarking or in generating tags to map/link an existing tag set.

Semantic tagging’s integration with search technologies is also being applied in niche service areas. For example, Relevad, whose tagline is “semantic keyword analytics,” provides a hosted web service for online advertisement placement. Relevad claims a growing database of more than 8 million keywords and more than 500 million neighbor keyword meanings. Trovix, meanwhile, provides a web service that matches jobs to resumes, utilizing complex scoring algorithms in combination with a “hierarchical knowledgebase” of U.S. cities, skills, positions, industries, and companies.

SEMANTIC SOCIAL TAGGING

The term “tagging” is most strongly associated these days with social tagging or social bookmarking, whereby people assign tags (terms or keywords) of their own choice to documents, blog posts, or webpages that they have created or viewed, to assist in locating the documents later, whether by themselves or by others. Better-known tagging websites and services include Delicious, Flickr, and Technorati. There is generally no taxonomy or controlled vocabulary involved, as any words can be used as tags, although this is changing in some applications. Fundamentally, this type of tagging is “semantic” as well, because humans manually tag content for what it means. The problem is that this tagging is done based on what the document means to the tagger at the time of tagging, not necessarily what it means to other users or even to the initial tagger at a later time. Furthermore, any list of the occurrences of a tag can be long, undifferentiated, and ambiguous. The term “semantic tagging” within the sphere of social tagging, therefore, is being used to refer to a method of imposing consistent and more refined meaning; in other words, utilizing some kind of a taxonomy. Such semantic social tags are also being called “rich tags.” Not only are the tags’ meanings clarified by synonyms, but there also may be links to related-term tags and glossary definitions for tags. In other words, semantic tags or rich tags are essentially terms in what is known to librarians as a thesaurus.

Social tagging sites/services that offer what they call semantic tagging include Zigtag, a Canadian startup, and individual-led projects Faviki and Fuzzzy (yes, with three z’s). Zigtag (in private beta as of this writing) is a sidebar plug-in that differentiates itself from other tagging services by providing a “semantic dictionary” of more than 2 million tags. Tags are defined and synonyms are linked together. Faviki is a social bookmarking tool that provides terms from Wikipedia, extracted by the open DBpedia tool. This provides not only consistency but also extensive definitions for each of more than 2.18 million Wikipedia resources. Fuzzzy, on the other hand, did not start with a prebuilt taxonomy; instead, user-created terms are entered into a shared tag set (thesaurus), and various relationships (broader, narrower, related) are supported. Thus, Fuzzzy “enables global distributed tagging.” The organic tag set of Fuzzzy is built upon the Topic Map ISO standard and an underlying infrastructure with Web Services.

Fuzzzy tagging/tag creation UI supporting parent tags (broader terms), friend tags (related terms), and child tags (narrower terms)

It isn’t just new kids with extra consonants pursuing social tagging, however. Big, established content players are also getting involved. Thomson Reuters offers its open Calais Web Service, which ingests unstructured text and, using NLP, returns RDF-formatted results identifying entities, facts, and events within the text. In May, Calais was made available as plug-in software for the Drupal publishing platform, Yahoo!’s new SearchMonkey service, and the WordPress blogging platform. The Calais plug-in for WordPress, called Tagaroo, returns tag suggestions based on text typed into a blog but gives users the option of choosing which ones to apply. Calais also offers licensed code to make one’s site part of the “Semantic Web.”

TAGGING AND THE SEMANTIC WEB

Finally, semantic tagging can be defined as tagging for the semantic web. This involves tags that make use of the RDF (Resource Description Framework) specifications or OWL (Web Ontology Language) of the World Wide Web Consortium (W3C). This also implies that the tags are used for public webpages that can be accessed with semantic web browsers, rather than merely for internal enterprise or library products or services. As such, a tag is more than a term; it is an object with its own attributes. According to Rhind-Tutt, “The difference between semantic indexing and standard indexing is that the former does more than simply apply subjects to terms. It includes the addition of meta-data about tags that allows semantically indexed terms to interoperate with other similarly indexed terms.” (This is discussed at more length in the blog post “Tagging and the Semantic Web”; see www.designmills.com/2008/05/20/tagging-in-the-semantic-web.)

While social tagging can be made more semantic, we have to remember that social tagging is not always about pure findability. The social aspect is about identifying what other people have labeled as interesting or noteworthy, especially if there is a rating aspect involved. For the semantic web, on the other hand, information findability is a major objective, as stated in W3C’s Semantic Web Activity Statement: “to create a universal medium for the exchange of data. It is envisaged to smoothly interconnect personal information management, enterprise application integration, and the global sharing of commercial, scientific and cultural data.”

Silverchair’s Zarnegar put it well: “Semantic tagging is best applied in areas when there is a qualitative ‘best answer’ to a user query (as opposed to a ‘most popular’ answer) … If you look at industries where semantic tagging (and structured data) have found a foothold (aviation, medicine, genetics, chemistry, and others) you’ll see they are not areas where you want to go too far with iffy information!”

Companies Featured in This Article
Alexander Street Press, LLC: www.alexanderstreet.com
Apex CoVantage: www.apexcovantage.com
Collexis Holdings, Inc.: www.collexis.com
Interwoven, Inc.: www.interwoven.com
Northern Light Group, LLC: www.northernlight.com
Relevad: www.relevad.com
Silverchair: www.silverchair.com
Teragram Corp.: www.teragram.com
TextWise: www.textwise.com
Thomson Reuters’ Calais service: www.opencalais.com
Zigtag, Inc.: www.zigtag.com

HEATHER HEDDEN ([email protected]) IS AN INSTRUCTOR OF CONTINUING EDUCATION WORKSHOPS AT SIMMONS COLLEGE GRADUATE SCHOOL OF LIBRARY AND INFORMATION SCIENCE, AND FOUNDER AND MANAGER OF THE TAXONOMIES & CONTROLLED VOCABULARIES SIG OF THE AMERICAN SOCIETY FOR INDEXING. COMMENTS? EMAIL LETTERS TO THE EDITOR TO [email protected].
