INNOVATION

CONTENT

Mining Gold with Semantics

Experts are turning to new models to better define metadata and to store it more flexibly and accurately

By Matt Turner, Chief Technology Officer, Media and Publishing, MarkLogic Corporation

Abstract: The value of metadata continues to grow in today’s digital marketplaces and maximizing this value impacts nearly every part of entertainment and media organizations. Metadata is also becoming more complex, covering not just the asset information but now including product and title data, contributor and topics, and details on usage and consumption. With the rise of new technologies tailored to handle the intricacy of today’s information, new approaches are available to manage metadata. Enterprise NoSQL databases and Semantic technology can make a wider set of data available to users and reduce the complexity and cost of delivering this valuable resource.

With the rise in the value of metadata and its direct impact on the new digital business, metadata is becoming a critical asset for product creation and distribution. This is driving the need for more accurate and flexible metadata. To address this need, organizations need to look beyond the traditional tools and technologies for managing metadata and embrace new approaches, including NoSQL and semantic technologies, to create, manage and deliver more accurate and complete metadata.

Traditional metadata

Traditional metadata management processes define, up-front, the metadata required for the organization, using tools that range from Excel spreadsheets to taxonomy and ontology management tools. These tools all define a fixed set of attributes for the metadata, typically in the rows and columns of a relational database.

This process means creating a row for every possible type of metadata including dates, contributors, asset information and categories. This can become rapidly complex due to the need to define in advance all the possible permutations of the metadata. Problems may include defining a single model across multiple types of assets, selecting attributes that cover specific uses and adopting new information as data and purposes change.

In addition to the items of metadata, expressed as the columns in the model, the values of the columns also need to be managed. Many of the items of metadata are managed in lists called controlled vocabularies or look-up tables. Taxonomy tools are used to create and manage these lists, and the selection of these items gives the metadata context for a given purpose. The typical taxonomy approach is to manage categories in a hierarchy with the distinct levels all having distinct sub-levels of choices.

Categories and taxonomies are also tailored to specific business purposes. Internal processes have different needs than external distribution processes, for example. The metadata may have either many similar categories or generic categories that do not have the specificity required for each distinct business process. This can cause additional data processing or clean up as different systems use the metadata.

These approaches also require that all the items of metadata including categories be explicitly associated with the metadata record. If an asset is associated with several categories, all of those categories must be associated with the asset directly in the metadata record. This can lead to metadata models that have hundreds of attributes to explicitly capture all of the complexity of the information in the single metadata record.

M&E JOURNAL 130

Figure: Example of a Semantic Triple

To address these issues, metadata experts are turning to new models of data to both better define metadata and more flexibly and accurately store it.
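To make the fixed-schema problem concrete, here is a minimal Python sketch. It assumes a single hypothetical model, `AssetRow`, shared by films and music tracks; the column names and sample values are invented for illustration and do not come from any real system.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

# Hypothetical fixed schema: every asset type shares the same columns,
# so each record carries attributes that do not apply to it.
@dataclass
class AssetRow:
    asset_id: str
    title: str
    release_date: Optional[str] = None
    director: Optional[str] = None          # films only
    runtime_minutes: Optional[int] = None   # films only
    recording_artist: Optional[str] = None  # music only
    track_number: Optional[int] = None      # music only
    category_1: Optional[str] = None
    category_2: Optional[str] = None
    category_3: Optional[str] = None        # a fourth category means a schema change

def unused_columns(row: AssetRow) -> List[str]:
    """Return the names of the columns left empty for this row."""
    return [f.name for f in fields(row) if getattr(row, f.name) is None]

# Illustrative (made-up) records for two different asset types.
film = AssetRow("a1", "Example Film", director="J. Doe",
                runtime_minutes=95, category_1="Drama")
track = AssetRow("a2", "Example Track", recording_artist="The Examples",
                 track_number=3, category_1="Pop", category_2="Dance")

print(unused_columns(film))
print(unused_columns(track))
```

Each record leaves several columns empty, and every new asset type or additional category forces a change to the shared model. That is the pressure the approaches described next are meant to relieve.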

NoSQL databases

Hierarchical, complex and sparse metadata are not well suited for relational databases and their strict rows and columns. Therefore many organizations are adopting enterprise NoSQL database technology that allows metadata to be defined and stored with a flexible schema without giving up data integrity or security. Instead of defining up-front every item that must go into the metadata record, the NoSQL approach allows for each record of metadata to store only (and all!) the attributes that make sense for that particular asset.

Schema flexibility addresses some of the issues found with the traditional approach:
■ Varying types of assets can have the different types of metadata they need without having to also carry or fit into a model not suited for that asset.
■ A wider set of data can be stored, including data suited for a specific purpose for one kind of asset and not another, because the model can allow for the flexible storage of additional information without having to pre-define that information.
■ New information can be added as data sources or metadata models change without having to rebuild the system.

With enterprise NoSQL systems like MarkLogic, this flexibility can be used without any loss in data integrity or reduction in security. Enterprise NoSQL allows for processes to ensure the data is accurate and has the core attributes needed for the system and that the information can only be accessed with the correct permissions and rights.

Semantics

To address the complexity of managing categories and taxonomies and to record more accurately the intricate relationships for an asset such as categories, genres and contributors, organizations are also embracing Semantic Web technology.

Semantic Web technology is a new approach to defining and managing relationship data. Using a simple format called a triple, Semantic Web technology indicates facts, concepts and objects and how they are related.

A simple example about a contributor could be information about his location starting with a fact: “John lives in London”. Combined with the information that “London is in England,” we can now also state “John lives in England”. The data modeled in this fashion can be used with other data about England to derive new information about John, for instance that he lives in a monarchy.

The model can also be extended with more information about John. We can state that John is an actor and, if John is the lead character for a show, we can link John to the details of that show, all actors of that show, and when that show is in production. We now have new levels of insight into his activity, for instance when he was working in a given year or what other actors he has worked with.

The creation of this type of data requires data management structures called ontologies. Ontologies are similar to the taxonomies that describe look-up lists, except that ontologies describe the collection of triples and are not only hierarchical. In our example, the ontology would describe the types of data we are recording (people and places) and the types of relationships they can have (lives in and is in).

Ontologies are also expressed as triples and, along with the types of data and relationships, they can also define the values of that data. For instance we can restrict place to a list of known locations including London and England.

This approach is particularly valuable for entertainment as it allows for the accurate creation of models, or ontologies, that are tailored to the specific uses of the metadata within an organization. It can also record and manage the details of that organization’s specific products. If our simple example explains where John can live, a complex real-world example for an entertainment organization might model all the possible characteristics a famous fictional character might possess, map those attributes over the many different times that character has appeared in films or shows and provide different metadata values tailored for production and distribution.

Semantics and NoSQL together

New technology capabilities are bringing the schema flexibility of a NoSQL database together with the power of semantic technology. MarkLogic’s Enterprise NoSQL has seamlessly combined these two data models into a single component, removing the final hurdle to fully leveraging the value of the semantic models with the active, operational metadata.

This combination addresses some of the major issues with taxonomy management:
■ The relationships do not need to be only hierarchical but can express many different associations and be linked together to correctly model complex metadata attributes.
■ Using several ontologies, metadata can now record relationships (instead of categories) for the many different purposes of the asset.
■ Because the ontologies are independent of the individual metadata records, changes to the data and additions of new types of data can be done at any time without having to reprocess or remodel the metadata.

Using semantic triples, the metadata record does not need to store attributes for every single possible value related to the record. With only a few defined semantic relationships, an asset can be linked to a much broader set of data, making metadata management much more efficient. For example, in a metadata record using our fictional character, the asset only needs to be associated with the specific title where the character appears. All the other data about the character, for instance where and when they lived and what genres and categories they fit into, would be automatically available to that record with that one item of metadata.

With this model, the metadata can now provide accurate and complete information to a much wider range of audiences, the processes for maintaining the metadata are much more efficient and the metadata will be a much more valuable asset to the organization.

Figure: Melding Data Models. NoSQL metadata record (left) with semantic triples (right) associated to an ontology.

Dynamic delivery

In preparation for the hosting of the London Olympics in 2012, BBC Sport adopted a Semantic Web approach to not only the management of the metadata of the assets it would be delivering on its web pages, but also the delivery of the experience to the end users.

The BBC Sport team developed sports ontologies that contained the categories of events and sports and the key players and organizations in each sport. Individual articles and information about the teams and athletes were tagged with just the links to the ontology, greatly reducing the amount of editorial work and streamlining the content creation process.

The live video feeds, sport scores and schedules were also tagged to the ontology with automated feeds updating the information. This data was used to create the public website (bbc.sport.co.uk) that, for the Olympics alone, featured over 10,000 pages. All of this content was dynamically generated from the ontologies and the articles, video feeds, scores and schedules and was constantly updated as results and scores were tallied. The resulting user experience broke many records in the UK with, at times, more people watching and interacting with the events via the site than over traditional broadcast.

Conclusion

As metadata becomes even more important to the delivery of digital products for media and entertainment organizations, leveraging new approaches to address the many challenges with traditional tools is becoming critically important.

Semantic Web technology combined with Enterprise NoSQL allows metadata to address more information, including data for multiple purposes, with more flexibility in the management of the data while maintaining data integrity and security.

Matt Turner works with customers and prospects to create leading edge information and digital content applications with MarkLogic’s Enterprise NoSQL database. Matt works closely with MarkLogic’s customers McGraw-Hill, Warner Bros., Conde Nast and LexisNexis. Before joining MarkLogic, Matt was at Sony Music and PC World.
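Sidebar: the “John lives in London” example from the Semantics section can be sketched in a few lines of Python. This is an illustration only, using plain tuples and dictionaries. It is not MarkLogic’s API or a standards-based RDF store, and the names `TYPES`, `RELATIONS`, `is_valid`, `infer_lives_in`, `facts_about` and the sample `asset` record are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical ontology: the types of things we record and the relationships
# allowed between them. Listing the known places restricts "place" values.
TYPES: Dict[str, str] = {"John": "Person", "London": "Place", "England": "Place"}
RELATIONS: Dict[str, Tuple[str, str]] = {
    "lives in": ("Person", "Place"),
    "is in": ("Place", "Place"),
}

def is_valid(triple: Triple) -> bool:
    """Check a triple against the ontology's types and relationships."""
    subject, predicate, obj = triple
    if predicate not in RELATIONS:
        return False
    subject_type, object_type = RELATIONS[predicate]
    return TYPES.get(subject) == subject_type and TYPES.get(obj) == object_type

def infer_lives_in(triples: Set[Triple]) -> Set[Triple]:
    """Apply (x lives in y) + (y is in z) => (x lives in z) until nothing new appears."""
    known = set(triples)
    while True:
        derived = {
            (x, "lives in", z)
            for (x, p1, y) in known if p1 == "lives in"
            for (y2, p2, z) in known if p2 == "is in" and y2 == y
        }
        if derived <= known:
            return known
        known |= derived

def facts_about(subject: str, triples: Set[Triple]) -> List[Tuple[str, str]]:
    """Return every (predicate, object) pair recorded or derived for a subject."""
    return sorted((p, o) for (s, p, o) in triples if s == subject)

facts: Set[Triple] = {("John", "lives in", "London"), ("London", "is in", "England")}
all_facts = infer_lives_in(facts)

# A flexible metadata record (hypothetical): it stores one link to John,
# not a copy of everything known about him.
asset = {"asset_id": "clip-001", "features": "John"}
related = facts_about(asset["features"], all_facts)
print(related)
```

The asset record carries a single item of metadata, the link to John, yet both the stated fact and the derived one are available through it. A production system would more likely use a triple store and a query language such as SPARQL than hand-written loops.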
