Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 11

Video Search: New Challenges in the Pervasive Digital Video Era

Katerina Pastra and Stelios Piperidis Institute for Language and Speech Processing, Artemidos 6 and Epidavrou, 151-25, Maroussi, Greece, email kpastra,[email protected]

Abstract REVEAL THIS

The explosion of multimedia digital content and the development of technologies that go beyond tradi- 1 Introduction tional broadcast and TV have rendered access to such The proliferation of digital multimedia content and the content important for all end-users of these technolo- subsequent growth of the number of digital video li- gies. While originally developed for providing access braries have boosted research on the development of to multimedia digital libraries, video search technolo- video retrieval systems. Video search technology has gies assume now a more demanding role. In this paper, traditionally been conceived as a way to enable effi- we attempt to shed light onto this new role of video cient access to large video data collections, with au- search technologies, looking at the rapid developments tomatic indexing and cataloguing of this data being in the related market, the lessons learned from state an essential derivative of its development. The quest of art video search prototypes developed mainly in for efficient video retrieval mechanisms is still ongo- the digital libraries context and the new technologi- cal challenges that have risen. We focus on one of ing, with a number of different unimodal and mul- the latter, i.e., the development of cross-media deci- timodal techniques being implemented and evaluated sion mechanisms, drawing examples from REVEAL (cf. TRECVID competitions, [HC04]). THIS, an FP6 project on the retrieval of video and lan- However, the emergence of technologies crossing guage for the home user. We argue, that efficient video the boundaries between traditional broadcast and the search holds a key to the usability of the new “per- Internet, and between traditional television and com- vasive digital video” technologies and that it should puters broadens the scope of developing video search involve cross-media decision mechanisms. functionalities. Does this, new, reinforced role of video search render it indispensable for end-users of Keywords: Interactive TV, Video Search, Video the new technologies? Within the rapidly changing Retrieval, Pervasive Digital Video, Digital Libraries, multimedia-processing context, does video search be- come an even more challenging task in terms of the Digital Peer Publishing Licence retrieval performance required for achieving high us- Any party may pass on this Work by electronic ability of the new technologies it is embedded in? means and make it available for download under In this paper, we look into the role of video search in the terms and conditions of the current version the light of the new “convergent” technologies and the of the Digital Peer Publishing Licence (DPPL). technological challenges that are subsequently posed The text of the licence may be accessed and on its development. In order to do so, we present the retrieved via Internet at market status and trends in video search for the new http://www.dipp.nrw.de/. technologies, as well as the search mechanisms used within commercial and research prototypes. We dis- First presented at the 4th European Interactive cuss the effects of the new role subsumed by video TV Conference EuroITV 2006, extended and revised search and focus mainly on the use of cross-media de- for JVRB cision mechanisms for dealing with such effects. Last, urn:nbn:de:urn:nbn:de:0009-6-10736, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 11 we present REVEAL THIS, a research project which 2.1 The market players attempts to implement cross-media mechanisms for increasing the usability of both its pull and push video Electronics manufacturers, software companies, access scenarios. telecommunication giants, cable companies, broad- casters, content owners all have an interest in the new market that is being created and which extends 2 Market Status and Trends in Video beyond professional users to everyday laymen, to Search home users. As expected, the convergence of internet and TV has not only resulted in the convergence The formation of regulations for the new technolo- of the corresponding business sectors, but has also gies that enable traditional broadcast and the internet created new ones, which are interested in providing to converge (e.g. Internet Protocol TV (IPTV), Peer- end-to-end services, i.e., aggregation of video (and to-Peer (P2P) networks, Mobile TV) proves that these other) digital content and distribution of this content technologies -with the ever growing popularity- have to interested users through a push (data routing become something more than a trend or optimistic according to a user profile) or pull (data search and prospect ([Car05, Bro05]). They already are a new re- retrieval) model. ality, in which: The business sectors which are being actively in- volved in providing an enhanced TV experience (i.e. • Traditional TV sets can be extended with in- IPTV, i-TV, mobile TV etc.) could be classified in the telligent digital video recorders (DVRs), set-top following categories according to their main business boxes with PC-like functionalities, or can even activities1: communicate with personal computers for dis- playing streamed digital media through gaming consoles enabling an enhanced, interactive TV a Content owners: production companies (broad- experience [SBH+05]. casters who also produce their own content e.g. BBC are included in this category too) • TV viewing goes beyond traditional TV sets, in mobiles and portable digital media play- b TV service providers: satellite and cable compa- ers (i-pods) enabling on-the-move TV watching nies (e.g. Comcast, BeTV etc.) and TV broad- [Cha05]. casters (e.g. CNN, RTBF etc.) • File-swapping networks and headline syndica- c P2P service networks: networks that allow for tion technology facilitates the exchange of not file-swapping e.g. BitTorrent, e-donkey etc. only professional/copyrighted TV programmes but also of consumer-generated ones (e.g. pod- d Electronics manufacturers: manufacturers of casts, video blogs, etc.) [Bor05, Jaf05]. DVRs and set-top boxes (e.g. TiVo, Akimbo, Sci- In opening new markets, suggesting new business entific Atlanta etc.), portable digital media play- models, and indicating new content distribution chan- ers (e.g. Apple), mobiles (e.g. Siemens). nels, all these technologies boost the availability of digital video content and motivate digitisation. In this e Computer networking companies (e.g. Cisco), light, the question of easy and efficient access to video Internet Service Providers (e.g. AOL) and Phone content that was once posed in relation to digital video companies (e.g. BellSouth, Verizon etc.) collections raises again more demanding than ever. In this section, we will look into the role of video f Internet Protocol TV software developers (e.g. search within this new context. This will shed light , Myrio, Virage) on the implications this new role has on the develop- ment of video search mechanisms and will point to- g Content service providers: content monitoring wards possible aspects that need to be taken into con- companies which provide push and/or pull ser- sideration for developing highly usable video access vices (e.g. TVEyes, BlinkX) services. We present the interested parties in the mar- 1The categories are not mutually exclusive; on the contrary, ket of the new TV/video technologies and their stance there are organisations with a wide range of activities, which span towards video search functionalities. more than one category. urn:nbn:de:urn:nbn:de:0009-6-10736, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 11

h Web content aggregators: companies that aggre- the features of its DVRs, allowing the user to down- gate digital media (text, audio, video) or links to load movies from the internet, buy products, search these media and present them online to a user local movie theatre listings etc. [Shi06]. Computer upon request/search e.g. Google and Yahoo networking companies such as Cisco are also extend- ing their reach to the consumer networking market and i Content re-packaging companies: companies IPTV for set-top boxes, in particular, through a num- that acquire content e.g. sports videos/TV pro- ber of company acquisitions [Rea05b]. In most cases, grammes and re-package it for meeting various in-house IPTV software comes part-and-parcel with user needs e.g. interactive viewing of a car the products of the electronics manufacturers (i.e. set- race, where the user can choose his/her preferable top boxes, DVRs etc.). When not, alliances or acquisi- viewing angle(s) (e.g. Nascar) tions of businesses take place, with IPTV software de- velopers playing a key-role; for example, Siemens has All these market players have their own interest acquired Myrio in its attempt to enter the IPTV market in the new technologies and therefore attempt to ex- [Rea05a]. Deals between more than one business sec- tend the services they provide by either enriching in- tors are also growing; for example, Microsoft, BT and house developments or by making alliances with each Virgin are collaborating for providing mobile TV ser- other. Figure 1 illustrates the relative position of each vices, Microsoft contributing all software needed for business sector in the chain from content owners to packaging and viewing TV on mobiles, BT contribut- end-users of enhanced-TV. The chain reveals the de- ing a new network for mobile TV and Virgin mak- pendencies between the different market sectors for ing use of the network and providing the service in achieving an end-to-end service; these dependencies a new mobile phone [Rea06]. However, if IPTV soft- justify the increasing number of alliances, partnerships ware is what makes enhanced-TV-experience possible, and mergings among key players in these sectors and video search software seems to be what makes it us- the tendency of some businesses to extend their own able [Del05]. development activities in order to avoid such collab- orations. The dependencies illustrated in figure 1 are transitive, namely, if X is dependent on Y, and Y is 2.2 Trends in providing video search services dependent on Z, then X is dependent on Z too. Depen- dency, in this case, denotes a need for a company X to The new business sectors that have emerged (g to i collaborate with a company Y that belongs to another above) rely heavily on video/digital media search tech- business sector (i.e. provides different services or/and nology. This is the main asset and the source of com- products) or a need for company X to compensate for petition for both content service providers and web the Y services/products on its own, by extending its content aggregators. For example, TVEyes e-mails own services/products. links to digital media from a number of sources to In particular, content owners are essential in allow- a user according to his profile and/or his queries, it ing digitisation/monitoring/capture of and access to allows a user to search both archived and real time TV/video content; all parties interested in allowing broadcast content, it has extended its search function- users to watch proprietary content need agreements alities to podcasts and blogs [Pog05] and has even with them. They are themselves interested in tak- included a desktop search functionality to its engine. ing advantage of the new content distribution chan- One of its main competitors, Blinkx, allows also for an nels that are opened. TV service providers and phone integrated search providing results from a user’s desk- companies (cellular and other telecommunication gi- top and the web, including TV content, blogs, images, ants) are also competing for entering the IPTV mar- P2P content and for the creation of a customized TV ket; for the former this is a natural extension to their channel based on topics/keyword searches provided current services for which they already have a success- by the user. Automatic continuous streaming video ful business-model, while the latter consider this a new is played whenever the user logs in and can be down- opportunity [Rea05a] for extending their internet ser- loaded to a PC [Mil05]. The search tool is proactive, in vice provision services or/and allowing mobile-TV. the sense that it looks for information related to what On the other end, electronics manufacturers want the user is working on in other programs, without wait- to make the best out of the functionalities of the de- ing for the user to initiate a search [Reg06]. vices they develop. TiVo, for example, has extended On their turn, some of the big search engines have urn:nbn:de:urn:nbn:de:0009-6-10736, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 11

Figure 1: Dependencies & dependency trends among different business sectors

extended their searches to video too (cf. Yahoo video providers are another sign of the growing need of all and Google video - [Ols05]), while others, such as business sectors in the TV/video market to manage and AOL, have acquired video search software compa- distribute content efficiently to their end-users. For ex- nies (e.g. SingingFish and Truveo) for accommodat- ample, the Press Association, BBC and TVEyes part- ing these needs [OM05]. What is more important to nership in developing the “Politics Today” portal, in note though, is that other business sectors involved in which users can search for and watch audio and video the enhanced-TV-experience market have realised the files from e.g. the British parliament along with re- importance of video search technology too and have lated news articles reveals a tendency of content own- proceeded in partnerships with video search market ers for going beyond traditional archival indexing and players. TiVo, for example, has partnered with Ya- retrieval of their collections for their own professional hoo: from any yahoo TV episode page found, the use to providing such access to laymen [Rel05]. user can click on a “record to my TiVo box button” Figure 1 illustrates these direct and indirect depen- and schedule the recording through the yahoo por- dencies (dependency trends) of some business sectors tal [Sin05]. Microsoft has also been developing its on the use of video search technology. own video search software, which could possibly be used by telecom operators who already make use of 2.3 Employed video search mechanisms its IPTV software [LaM05]. While most implementation details are kept aside, it Furthermore, electronics manufacturers make al- seems that the video search mechanisms currently in liances with content aggregators (e.g. Motorola and the market have one or more of the following charac- Google deal, Intel and Google etc.), and so do phone teristics: companies (e.g. Vodafone and Google deal) in order to include video search in the services they provide. Part- • They make use of metadata provided for each nerships between content owners and content service video/TV programme by the content owners urn:nbn:de:urn:nbn:de:0009-6-10736, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 11

or/and the broadcaster (e.g. information also • There is an abundance of non-English video/TV available in electronic programme guides). Mi- programmes of interest to users, which currently crosoft, AOL and Yahoo rely exclusively on such cannot be retrieved, unless English metadata are data; the latter has also developed a headline syn- associated to them. dication based protocol for facilitating publishers in adding metatags to media files (Media RSS) • Going beyond keyword matching to intelligent [Ols05]. processing through query expansion and seman- tic search strategies is important not only for • They are text-based and actually run on closed achieving better precision and recall in retrieval, captions (e.g. Google) or/and on automati- but mainly for making the best out of queries cally generated transcriptions of the audio (e.g. formed by laymen (i.e. uninitiated everyday users TVEyes). BlinkX, on its turn, searches for of the video search services). keywords in the audio track by transforming a keyword into a concatenation of phonemes and • Users are expected to lose interest, if they have to matching the latter with the audio track. watch long video files until they (probably) reach • They apply mostly to English video files. the segment that is directly related to their query. On the other hand, sufficiently long segments are • They are mainly keyword-based searches with necessary for users being properly informed re- semantic expansion being quite restricted (e.g. garding their query. Google identifies concepts associated to the tex- tual keywords) The quest for more accurate and effective video search mechanisms has been undertaken within re- • The video retrieved is either the entirety of a pro- search prototypes. We turn to them for suggestions on gramme (or a part of its structure e.g. a film developing more efficient video search mechanisms. chapter) or a short segment consisting of a few seconds before and after the keyword matched [Inc05]. 3 Video Search in Research Proto- Will such video search mechanisms suffice for effi- types cient access to the increasing number of digital video content? The above-mentioned video search services In this section, we first present the lessons learned are still in their infancy, with beta versions of the en- from state of art prototypes that have been developed gines having been released in early summer of 2005; and evaluated within a “digital collection” access sce- therefore, no usability evaluation of these services is nario; in doing so, we explore whether they go beyond currently available. Still, some problems with such the above-mentioned video search drawbacks and if mechanisms are evident: so, to what extent their suggestions could be imple- mented in the new enhanced-TV context. We discuss • Metadata creation is time and effort consumming how differences in the context of use affect the devel- and in many cases video files are accompanied by opment of video search mechanisms and turn to sug- limited or no such information. Especially in the gestions on effective video search for the new tech- case of consumer-generated video content, meta- nologies by recently completed and on-going research data cannot be taken for granted for retrieving a projects. video file

• Closed captions are not always available for TV 3.1 Video search mechanisms in digital col- programmes (not available at all for other types lection scenarios of video data), while automatic transcriptions are not very robust for being the sole source of re- The large digital library initiatives in mid-nineties trieval; especially in cases when the recording boosted research on the development of video analysis conditions are noisy (e.g. field interviews), au- technologies and in particular of video indexing and tomatic speech recognition (ASR) degrades con- retrieval. Projects such as Informedia [Hau05] and Fis- siderably [Hau05]. chlar [Sme05] explored a variety of unimodal and/or urn:nbn:de:urn:nbn:de:0009-6-10736, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 11 multimodal mechanisms for intelligent indexing, re- text-based retrieval and actually the query-type based trieval, summarization and visualization of mainly fusion they suggest opens a number of research ques- broadcast news files. tions that remain to be addressed [Hau05]. In 2001, the need to compare the different video re- trieval approaches/prototypes led to the organisation of a Video track within the Text Retrieval and Evalua- tion Conference (TRECVID), a competition that is be- The TRECVID evaluation setup itself (and the par- ing organised every year since [HC04, Sme05]. In this ticipating systems) is more focused on a digital video evaluation campaign, competing systems are given a collection application scenario and remains detached test video collection (mainly broadcast news in Eng- from the “pervasive digital video” reality. The users lish), a number of statements expressing an informa- of its video search scenario are not the everyday tion need (topic/query) and a common shot boundary laymen/novice computer users reached by the new reference (retrieval unit) and they are asked to return enhanced-TV services. The queries/information need a ranked list of at most a thousand shots which best statements used are not relevant to the new context satisfy the need. The users who form the queries are of convergent technologies and the search unit is a trained analysts and the queries themselves may con- keyframe (representative of a shot retrieved for the sist of a mix of visual descriptions and semantic infor- user) rather than a video segment that will be pre- mation (e.g. “find shots zooming on the US Capitol sented to the user. A structured video collection with dome”) [Sme05]. no variety of genre/domain (mostly news broadcasts) Competing systems implemented a variety of in- is presupposed. The development of query-type de- dexing and retrieval mechanisms, which were able to pendent multimedia mechanisms for video retrieval re- tackle lack of metadata and closed captions, degraded lies on the availability of a “query and corresponding ASR output and strict keyword matching. It was gen- retrieved videos” corpus for training a system; given erally found that [HC04]: that there is a non-finite type of queries a user may sub- mit in the enhanced-TV context, this is an unrealistic • Systems that performed medium-specific analy- approach for the corresponding video search services. sis (e.g. automatic speech recognition, face de- tection, image analysis etc.) and which subse- quently fused the medium-specific individual re- All these must be taken into consideration when trieval scores for each shot were found to perform drawing conclusions regarding the direction the devel- slightly better than systems which relied on e.g. opment of video search mechanisms in the new con- text-based only retrieval mechanisms. text should take. It is no coincidence that stronger • Learning the best linear weights for fusing such suggestions on the need for multimedia approaches multimedia information according to the type of to video search, come from research projects that the query was found to be helpful for retrieval. implement applications for leisure and entertainment within the “digital home” and/or for the “mobile • Text query enhancement through lexical re- user”. Projects such as UP-TV [TPKC05], BUSMAN sources or associated information found in web [XVD+04] and AceMedia [BPS+05, PKS+04] are documents retrieved by a (cf. also characteristic cases; the projects consider the associa- work in the PRESTOSPACE project, [DTCP05]), tion of low-level image features and higher-level con- relevance feedback mechanisms [Jon04] and use cepts important for video indexing and retrieval and of high-level concepts associated with low-level develop tools and formalisms for the corresponding features, all seemed to contribute in more accu- semi-automatic annotation. Such an association goes rate video retrieval [Sme05]. beyond query-type dependent learning for multime- dia fusion and actually takes the TRECVID findings These findings point to slight benefits from the im- a step further. In what follows, we look into the pa- plementation of multimedia approaches for video in- rameters that effect the development of video search dexing and retrieval, ones that make use of both vi- technologies for use in the enhanced-TV context and sual and linguistic analyses of video content. Still, which have led researchers to focusing on the image- they fail to convince on the necessity of going beyond language association need. urn:nbn:de:urn:nbn:de:0009-6-10736, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 11

3.2 Technological challenges in the new video independent, should be able to analyse any medium- search context specific part of the video (i.e. audio, image, subti- tles etc.) robustly, to structure the video (i.e. iden- In going from digital video libraries to the pervasive tify meaningful, comprehensive stories/segments in it digital video reality a number of video access aspects covering a specific topic for presentation to the user), have been extended affecting the development of video to take user preferences into consideration and work search mechanisms: in real time, with the highest precision. However, From advanced computer users to laymen technology is far from achieving this. As mentioned While digital video library users were advanced earlier, recent research projects that at develop- computer users, the enhanced-TV context reaches ing video access prototypes for applications within the everyday people, uninitiated computer users. This af- enhanced-TV context, focus on the multimedia nature fects not only the type of queries (information need) of videos for dealing with this challenging task, and formed, but also their quality and accuracy, the expec- in particular on the association of semantically equiv- tations/demands on video retrieval accuracy, the length alent medium-specific pieces of information extracted of the retrieved unit, the domain and language of the from a video. retrieved data. In the next section, we present one such project, RE- From structured data collections to pervasive video VEAL THIS, which takes things a step even further, data suggesting that one should go beyond this association Digital video collections consist of data of a spe- to crossing media for performing efficient video index- cific genre and domain (or of finite number thereof), ing. with expected content and structure and of profes- sional quality. The availability of any type of video in the “convergent technologies” reality entails that video 4 The REVEAL THIS Suggestion: access mechanisms should deal with not only pro- cross-media decision mechanisms fessionally made videos, but also consumer-generated ones of probably lower quality, and more noise. Fur- The REVEAL THIS project (REtrieval of VidEo And thermore, abundance of video means non-finite vari- Language for The Home user in a Information Society) ety in domains and languages, language genres, pro- is an EC funded FP6 project which aims at developing nunciations, visual imagery and even cues revealing software for efficient access to multimedia content2. programme structure. The variety of sources of data The envisaged use scenario of the technology that is implies that associated metadata will not always be of being developed in the project is illustrated in figure 2. the same granularity or even of the same type and in Web, TV and /or Radio content monitored by the many cases will not even be available. service provider is fed into the REVEAL THIS pro- From static to dynamic search totype, it is being analysed, indexed, categorized and Within the enhanced-TV context, video search be- summarized and stored in an archive. This data can comes dynamic in two senses; first of all, not only be searched and/or can be pushed to a user according archived data but also real-time broadcast data needs to his/her interests. Novice and advanced computer to be searched, and actually the timely delivery of such users are targeted; the former will be using mainly a data to the user can become a competitive advantage simple mobile phone interface where information will for a content service provider. This imposes time- be pushed to, while advanced computer users will use constraints to a search mechanism. On the other hand, a web interface for searching for information. Cross- the search model itself goes beyond traditional search lingual retrieval is enabled (i.e. the user can issue a triggered by a user through an interface to pro-active, query in one language and retrieve data in another), personalised search for a user according to his/her while the summaries presented to the user are not only known profile and interests. Effectiveness in indicat- translated to the user’s language but are also person- ing “interesting” data for the user becomes even more alized according to a user’s interest or specific query. crucial and personalization is rendered central for the English and Greek for EU politics, news and travel- functionality of the search mechanism. related data are handled by the prototype. Based on the above, one could argue that the ideal Figure 3 illustrates the system architecture; input video search mechanism for use by e.g. a content ser- vice provider, should be language, domain and genre 2For more details cf. http://www.reveal-this.org and [PP05]. urn:nbn:de:urn:nbn:de:0009-6-10736, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 11

Figure 2: The REVEAL THIS use scenario

data is analyzed by a number of different medium- anisms supposed to decide on though? specific processors (i.e. speech processing compo- Crossing-media for achieving a task has normally nent, image analysis component, etc.) each of which taken the form of finding semantic equivalences be- provides its own interpretation of the data contribut- tween different media both of which express the same ing to the “bag” of time-aligned (in the case of audio information (e.g. a TV news programme and a cor- and video files) medium-specific pieces of information responding web news article) [BKW99]. REVEAL (e.g. speaker turns, named entities, facts, image cate- THIS suggests mechanisms for crossing-media within gories, face ids, text categories etc.). These pieces of the same multimedia document (i.e. video), where one information are available for the more complex mod- needs to go beyond the notion of semantic equivalence ules, which are responsible for indexing, categoriza- to other semantic interaction relations. tion and summarization. Cross-media decision mechanisms go a step fur- These more complex modules are viewed as deci- ther from medium-specific feature fusion suggested in sion mechanisms that should take advantage of the the literature (cf. section 3.1) and attempt to decide semantic interaction between the different medium- whether two medium-specific representations: specific pieces of information available for a video segment. The latter is the result of a similarly viewed a are associated (i.e. there is a semantic equiva- segmentation mechanism, which stands as the process- lence in their meaning) - what multimedia inte- ing (indexing- categorization- summarization-) unit gration does [PW04] (e.g. the image of the Euro- for these modules. These units/segments are seman- pean Parliament building and the corresponding tically coherent video segments covering a particular textual label), topic. The REVEAL THIS attempts to develop cross- media decision mechanisms for video segmentation, b are not associated but they complement each video indexing, video categorization and multimedia other, they collaborate in forming a message (e.g. summarization are still ongoing. What are these mech- the image of Josep Borrell and the audio of his urn:nbn:de:urn:nbn:de:0009-6-10736, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 11

Web Text Radio TV

Media Manager Video

Text Audio IAC – Keyframe Extraction, FDIC – TPC - Text Text SPC - Speech Image Analysis & Keyframes Face Detection processing processing Image Categorization & Identification

Automatically extracted metadata: TPC named entities, terms, facts Text Text Categories categorization SPC: speaker turns, speaker names, text IAC: shortcuts, keyframes, image features, image categories FDIC: face regions, names Translation

Story Boundary Detection Media Server: Cross -media Stories Storage Personallization Browse Cross-media Indexing, Query Cross-Media Categorization Retrieve Multimedia Summarization Push Notifications

Figure 3: The REVEAL THIS system architecture

speech on the European Commission stance on indexing term. In case (c), the indexing mechanism the war in Iraq) , or will have to choose (a) piece(s) of information from a specific medium that is to be trusted as more reliable c are not associated, they do not collaborate for expressing the video segment content. This choice in forming a message, they are actually se- could be domain- and/or genre-specific, it could rely mantically incompatible (contradicting), due to on each medium processor’s confidence scores etc. e.g. errors in automatic medium-specific analy- The third case is particularly important in view of the sis/interpretation technological challenges mentioned in section 3.2. Taking as an example the case of a cross-media in- One could go on elaborating the notion of cross- dexing mechanism for a video retrieval system, one media decision mechanisms addressing also the meth- could say that once the mechanism takes as input the ods that could be used for implementing such mech- medium-specific analysis of a video segment (e.g. the anisms. However, this goes beyond the scope of this output of the image analysis component, the face de- paper, which aims at suggesting these mechanisms as a tection component, the speech processing component direction in video search research, which seems more and the natural language understanding component), it appropriate in the new “pervasive digital video real- has to decide which medium-specific pieces of infor- ity”. mation (medium-specific interpretations of the video) are more representative of the content of the segment for indexing. 5 Conclusion In case (a), the association indicates a highly sig- nificant piece of information, which could be used as In this paper, we looked into the growing market of an indexing term. In case (b), a conjunction of differ- the convergent-technologies that go beyond traditional ent medium-specific pieces of information forms the broadcast and TV and in particular the effects on and urn:nbn:de:urn:nbn:de:0009-6-10736, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 11 from video search services in this context: market authors would like to thank the project participants trends have shown that video search technology is in- for fruitful discussions on the market aspects of video dispensable for those involved in the product chain search. from TV/video content to the end-user. We further presented the characteristics of the video search ser- vices provided in this market and the ones of the more References advanced video search techniques developed in state [BKW99] Susanne Boll, Wolfgang Klas, and Jochen of the art prototypes. The latter point to the fact that Wandel, A Cross-Media Adaptation Strat- one should take advantage of the different medium- egy for Multimedia Presentations, Pro- specific pieces of information that can be extracted ceedings of the seventh ACM interna- from a video; however, they fail to convince on the tional conference on Multimedia (Part 1) necessity for doing so which could be due to their re- (New York, NY, USA), ACM Press, 1999, stricted development and evaluation within a digital li- ISBN 1-58113-151-8, pp. 37–46. braries scenario. We, therefore, discussed the effects on the develop- [Bor05] John Borland, File-swap TV comes into ment of video search mechanisms when shifting from focus, http://www.news.com, August 8th, the digital libraries application scenario (for which 2005, 2005, Last visited July 31st, 2006. most of these systems were developed) to the perva- [BPS+05] Stephan Bloehdorn, Kosmas Petridis, sive digital video context; this shift was found to ren- Carsten Saathoff, Nikos Simou, Vassilis der the need for taking advantage of the multimedia Tzouvaras, Yannis S. Avrithis, Siegfried nature of videos more demanding and to make the Handschuh, Ioannis Kompatsiaris, Stef- suggestion of crossing-media for the task worth in- fen Staab, and Michael G. Strintzis, Se- vestigating. Thus, we presented the notion of cross- mantic Annotation of Images and Videos media decision mechanisms as suggested by the RE- for Multimedia Analysis, Proceedings of VEAL THIS project, a project that attempts to imple- the Second European Semantic Web Con- ment such mechanisms for the home user. ference, ESWC 2005, Lecture Notes in We have, therefore, concluded that efficient video Computer Science, Vol. 3532, Springer, search is indispensable for highly usable technologies 2005, ISBN 3-540-26124-9, pp. 592–607. that go beyond traditional broadcast and TV and that cross-media decision mechanisms may hold the key [Bro05] Anne Broache, Digital TV for its development. changeover suggested for 2009, The rapid developments in the new “pervasive digi- http://www.news.com, July 12th, 2005, tal video” market require that we step back from core 2005, Last visited July 31st, 2006. research and development in video analysis and re- trieval, in order to look at the new role video search [Car05] Doreen Carvajal, New eu rules for televi- technologies have come to play and the subsequent sion: content without frontiers?, The In- technological challenges that have risen, to learn from ternational Herald Tribune, November 28, the last decade of developing video search mecha- 2005 Monday, 2005. nisms within the digital libraries context and re-direct [Cha05] B. Charney, Free tv for your mobile, research accordingly; otherwise, there is a risk of de- http://www.news.com, May 17th, 2005, veloping technologies that are detached from the ac- 2005. tual needs of the target users, detached from the new reality that is rapidly being formed around us. In this [Del05] Kevin J. Delaney, Tv’s future may be paper, we attempted to take this step back and get an web search engines that hunt for video, overview of the video search related situation. Wall Street Journal, December 16, 2004, Thursday, Section B; Page 1, Column 5, 2005. 6 Acknowledgements [DTCP05] Mike Dowman, Valentin Tablan, Hamish This work is carried out in the framework of the RE- Cunningham, and Borislav Popov, Con- VEAL THIS project (FP6-IST-511689 grant). The tent Augmentation for Mixed-Mode News urn:nbn:de:urn:nbn:de:0009-6-10736, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 11

Broadcasts, Proceedings of the 3rd Euro- [OM05] Stefanie Olsen and Elinor Mills, pean Conference on Interactive TV: User AOL launches video search service, Centred ITV Systems, Programmes and http://www.news.com, June 30, 2005, Applications, 2005. 2005, Last visited July 31st, 2006. + [Hau05] Alexander G. Hauptmann, Lessons for [PKS 04] Kosmas Petridis, Ioannis Kompatsiaris, the Future from a Decade of Informe- Michael G. Strintzis, Stephan Bloe- dia Video Analysis Research, Image and hdorn, Siegfried Handschuh, Steffen Video Retrieval, 4th International Confer- Staab, Nikos Simou, Vassilis Tzouvaras, ence, CIVR 2005, Lecture Notes in Com- and Yannis S. Avrithis, Knowledge Repre- puter Science, Volume 3568, 2005, ISBN sentation for Semantic Multimedia Con- 978-3-540-27858-0, pp. 1–10. tent Analysis and Reasoning, Knowledge- Based Media Analysis for Self-Adaptive [HC04] Alexander G. Hauptmann and Michael G. and Agile Multi-Media, Proceedings of Christel, Successful Approaches in the the European Workshop for the Integra- TREC Video Retrieval Evaluations, Pro- tion of Knwoledge, Semantics and Digital ceedings of the 12th annual ACM inter- Media Technology, EWIMT 2004, 2004, national conference on Multimedia, SES- ISBN 0-902-23810-8. SION: Brave new topics – session 3: the [Pog05] D. Pogue, A very cool podcast service, effect of benchmarking on advances in se- The New York Times, July 27th, 2005, mantic video, 2004, ISBN 1-58113-893-8, 2005. pp. 668–675. [PP05] Stelios Piperidis and Harris Papageor- [Inc05] StreamSage Inc., Rich Media Indexing giou, REVEAL THIS: retrieval of multi- Technology Approaches, White Paper, media multilingual content for the home March 2005, 2005. user in an information society, The 2nd European Workshop on the Integration of [Jaf05] Joshua Jaffe, Online travel-guides: the Knowledge, Semantics and Digital Media next generation, http://www.news.com, Technology, EWIMT 2005, 2005, ISBN 0- September 28, 2005, 2005. 86341-595-4, pp. 461–465. [Jon04] Gareth J. F. Jones, Adaptive Systems for [PW04] Katerina Pastra and Yorick Wilks, Vision- Multimedia Information Retrieval, Adap- Language Integration in AI: A Real- tive Multimedia Retrieval, First Inter- ity Check, Proceedings of the 16th Eu- national Workshop, AMR 2003, Lec- reopean Conference on Artificial Intel- ture Notes in Computer Science, Vol- ligence, ECAI’2004, including Presti- ume 3094, 2004, ISBN 978-3-540-22163- gious Applicants of Intelligent Systems, 0, pp. 1–18. PAIS 2004, 2004, ISBN 1-58603-452-9, pp. 937–944. [LaM05] Martin LaMonica, Microsoft signs on Alcatel for IPTV, http://www.news.com, [Rea05a] Marguerite Reardon, Siemens tack- February 22nd, 2005, 2005, Last visited les Microsoft IPTV dominance, July 31st, 2006. http://www.news.com, June 13th, 2005, 2005, Last visited July 31st, 2006. [Mil05] Elinor Mills, , http://www.news.com, Oc- [Rea05b] , Tech firms focus on TV, tober 3, 2005, 2005, Last visited July 31st, http://www.news.com, November 23rd, 2006. 2005, 2005, Last visited July 31st, 2006. [Ols05] Stefanie Olsen, Yahoo, Google turn [Rea06] , Microsoft, BT, Virgin team on up volume on video search battle, mobile TV, http://www.news.com, Febru- http://www.news.com, May 4, 2005, ary, 14th 2006, 2006, Last visited July 2005, Last visited July 31st, 2006. 31st, 2006. urn:nbn:de:urn:nbn:de:0009-6-10736, ISSN 1860-2037 Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 11

[Reg06] Keith Regan, Revamped Blinkx Pro- gram to Deliver Automatic Searches, E- Commerce Times, July 3rd, 2006, 2006, Last visited July 31st, 2006.

[Rel05] PA Press Release, Pa business website set to revolutionize parliamentary moni- toring, The Press Association, November 21st, 2005, 2005.

[SBH+05] Richard Shim, John Borland, Eva Hansen, Stefanie Olsen, and B. Schuler, MeTV: Finally you are in control, http://www.news.com, April 2005, 2005, Last visited July 31st, 2006.

[Shi06] Richard Shim, Tivo beefs up patent port- folio, http://www.news.com, April 6th, 2005, 2006, Last visited July 31st, 2006.

[Sin05] Michael Singer, Yahoo, TiVo to connect services, http://www.news.com, Novem- ber 6th 2005, 2005, Last visited July 31st, 2006. Citation [Sme05] Alan F. Smeaton, TRECVID: 5 Years Katerina Pastra and Stelios Piperidis, Video Search: of Benchmarking Video Retrieval Oper- New Challenges in the Pervasive Digital Video Era, ations, MUSCLE/ImageCLEF Workshop Journal of Virtual Reality and Broadcasting, 3(2006), 2005, 2005. no. 11, May 2007, urn:nbn:de:0009-6-10736, ISSN 1860-2037. [TPKC05] Chrisa Tsinaraki, Panagiotis Poly- doros, Fotis Kazasis, and Stavros Christodoulakis, Ontology-Based Seman- tic Indexing for MPEG-7 and TV-Anytime Audiovisual Content, Multimedia Tools and Applications 26 (2005), 299–325, ISSN 1380-7501.

[XVD+04] Li-Qun Xu, Paulo Villegas, Monica´ D´ıez, Ebroul Izquierdo, Stephan Herrmann, Vincent Bottreau, Ivan Damnjanovic, and Damien Papworth, A User-Centred Sys- tem for End-to-End Secure Multimedia Content Delivery: From Content Anno- tation to Consumer Consumption, Inter- national Conference on Image and Video Retrieval, Lecture Notes in Computer Sci- ence, Volume 3115, 2004, ISBN 3-540- 22539-0, pp. 656–664.

urn:nbn:de:urn:nbn:de:0009-6-10736, ISSN 1860-2037