Journal of Virtual Reality and Broadcasting, Volume 3(2006), no. 11 JVRB EuroITV 2006 Special Issue, no. 4

Video Search: New Challenges in the Pervasive Digital Video Era

Katerina Pastra and Stelios Piperidis Institute for Language and Speech Processing, Artemidos 6 and Epidavrou, 151-25, Maroussi, Greece, email kpastra,[email protected]

Abstract

The explosion of multimedia digital content and the development of technologies that go beyond traditional broadcast and TV have rendered access to such content important for all end-users of these technologies. While originally developed for providing access to multimedia digital libraries, video search technologies now assume a more demanding role. In this paper, we attempt to shed light onto this new role of video search technologies, looking at the rapid developments in the related market, the lessons learned from state-of-the-art video search prototypes developed mainly in the digital libraries context and the new technological challenges that have arisen. We focus on one of the latter, i.e., the development of cross-media decision mechanisms, drawing examples from REVEAL THIS, an FP6 project on the retrieval of video and language for the home user. We argue that efficient video search holds a key to the usability of the new “pervasive digital video” technologies and that it should involve cross-media decision mechanisms.

Keywords: Interactive TV, Video Search, Video Retrieval, Pervasive Digital Video, Digital Libraries, REVEAL THIS

Digital Peer Publishing Licence
Any party may pass on this Work by electronic means and make it available for download under the terms and conditions of the current version of the Digital Peer Publishing Licence (DPPL). The text of the licence may be accessed and retrieved via Internet at http://www.dipp.nrw.de/.

First presented at the 4th European Interactive TV Conference EuroITV 2006, extended and revised for JVRB

urn:nbn:de:urn:nbn:de:0009-6-10736, ISSN 1860-2037

1 Introduction

The proliferation of digital multimedia content and the subsequent growth of the number of digital video libraries have boosted research on the development of video retrieval systems. Video search technology has traditionally been conceived as a way to enable efficient access to large video data collections, with automatic indexing and cataloguing of this data being an essential derivative of its development. The quest for efficient video retrieval mechanisms is still ongoing, with a number of different unimodal and multimodal techniques being implemented and evaluated (cf. TRECVID competitions, [HC04]).

However, the emergence of technologies crossing the boundaries between traditional broadcast and the Internet, and between traditional television and computers, broadens the scope of developing video search functionalities. Does this new, reinforced role of video search render it indispensable for end-users of the new technologies? Within the rapidly changing multimedia-processing context, does video search become an even more challenging task in terms of the retrieval performance required for achieving high usability of the new technologies it is embedded in?

In this paper, we look into the role of video search in the light of the new “convergent” technologies and the technological challenges that are subsequently posed on its development. In order to do so, we present the market status and trends in video search for the new technologies, as well as the search mechanisms used within commercial and research prototypes. We discuss the effects of the new role assumed by video search and focus mainly on the use of cross-media decision mechanisms for dealing with such effects. Last, we present REVEAL THIS, a research project which attempts to implement cross-media mechanisms for increasing the usability of both its pull and push video access scenarios.

2 Market Status and Trends in Video Search

The formation of regulations for the new technologies that enable traditional broadcast and the internet to converge (e.g. Internet Protocol TV (IPTV), Peer-to-Peer (P2P) networks, Mobile TV) proves that these technologies, with their ever-growing popularity, have become something more than a trend or optimistic prospect ([Car05, Bro05]). They already are a new reality, in which:

• Traditional TV sets can be extended with intelligent digital video recorders (DVRs), set-top boxes with PC-like functionalities, or can even communicate with personal computers for displaying streamed digital media through gaming consoles, enabling an enhanced, interactive TV experience [SBH+05].

• TV viewing goes beyond traditional TV sets, to mobiles and portable digital media players (i-pods), enabling on-the-move TV watching [Cha05].

• File-swapping networks and headline syndication technology facilitate the exchange of not only professional/copyrighted TV programmes but also of consumer-generated ones (e.g. podcasts, video blogs, etc.) [Bor05, Jaf05].

In opening new markets, suggesting new business models, and indicating new content distribution channels, all these technologies boost the availability of digital video content and motivate digitisation. In this light, the question of easy and efficient access to video content that was once posed in relation to digital video collections arises again, more demanding than ever.

In this section, we will look into the role of video search within this new context. This will shed light on the implications this new role has on the development of video search mechanisms and will point towards possible aspects that need to be taken into consideration for developing highly usable video access services. We present the interested parties in the market of the new TV/video technologies and their stance towards video search functionalities.

2.1 The market players

Electronics manufacturers, software companies, telecommunication giants, cable companies, broadcasters and content owners all have an interest in the new market that is being created and which extends beyond professional users to everyday laymen, to home users. As expected, the convergence of internet and TV has not only resulted in the convergence of the corresponding business sectors, but has also created new ones, which are interested in providing end-to-end services, i.e., aggregation of video (and other) digital content and distribution of this content to interested users through a push (data routing according to a user profile) or pull (data search and retrieval) model.

The business sectors which are actively involved in providing an enhanced TV experience (i.e. IPTV, i-TV, mobile TV etc.) could be classified in the following categories according to their main business activities¹:

a Content owners: production companies (broadcasters who also produce their own content e.g. BBC are included in this category too)

b TV service providers: satellite and cable companies (e.g. Comcast, BeTV etc.) and TV broadcasters (e.g. CNN, RTBF etc.)

c P2P service networks: networks that allow for file-swapping e.g. BitTorrent, e-donkey etc.

d Electronics manufacturers: manufacturers of DVRs and set-top boxes (e.g. TiVo, Akimbo, Scientific Atlanta etc.), portable digital media players (e.g. Apple), mobiles (e.g. Siemens).

e Computer networking companies (e.g. Cisco), Internet Service Providers (e.g. AOL) and Phone companies (e.g. BellSouth, Verizon etc.)

f Internet Protocol TV software developers (e.g. , Myrio, Virage)

g Content service providers: content monitoring companies which provide push and/or pull services (e.g. TVEyes, BlinkX)

¹ The categories are not mutually exclusive; on the contrary, there are organisations with a wide range of activities, which span more than one category.

h Web content aggregators: companies that aggregate digital media (text, audio, video) or links to these media and present them online to a user upon request/search e.g. Google and Yahoo

i Content re-packaging companies: companies that acquire content e.g. sports videos/TV programmes and re-package it for meeting various user needs e.g. interactive viewing of a car race, where the user can choose his/her preferred viewing angle(s) (e.g. Nascar)

All these market players have their own interest in the new technologies and therefore attempt to extend the services they provide by either enriching in-house developments or by making alliances with each other. Figure 1 illustrates the relative position of each business sector in the chain from content owners to end-users of enhanced-TV. The chain reveals the dependencies between the different market sectors for achieving an end-to-end service; these dependencies justify the increasing number of alliances, partnerships and mergers among key players in these sectors and the tendency of some businesses to extend their own development activities in order to avoid such collaborations. The dependencies illustrated in figure 1 are transitive, namely, if X is dependent on Y, and Y is dependent on Z, then X is dependent on Z too. Dependency, in this case, denotes a need for a company X to collaborate with a company Y that belongs to another business sector (i.e. provides different services or/and products) or a need for company X to compensate for the Y services/products on its own, by extending its own services/products.

In particular, content owners are essential in allowing digitisation/monitoring/capture of and access to TV/video content; all parties interested in allowing users to watch proprietary content need agreements with them. They are themselves interested in taking advantage of the new content distribution channels that are opened. TV service providers and phone companies (cellular and other telecommunication giants) are also competing for entering the IPTV market; for the former this is a natural extension to their current services for which they already have a successful business-model, while the latter consider this a new opportunity [Rea05a] for extending their internet service provision services or/and allowing mobile-TV.

On the other end, electronics manufacturers want to make the best out of the functionalities of the devices they develop. TiVo, for example, has extended the features of its DVRs, allowing the user to download movies from the internet, buy products, search local movie theatre listings etc. [Shi06]. Computer networking companies such as Cisco are also extending their reach to the consumer networking market and IPTV for set-top boxes, in particular, through a number of company acquisitions [Rea05b]. In most cases, in-house IPTV software comes part-and-parcel with the products of the electronics manufacturers (i.e. set-top boxes, DVRs etc.). When not, alliances or acquisitions of businesses take place, with IPTV software developers playing a key-role; for example, Siemens has acquired Myrio in its attempt to enter the IPTV market [Rea05a]. Deals between more than one business sector are also growing; for example, Microsoft, BT and Virgin are collaborating for providing mobile TV services, Microsoft contributing all software needed for packaging and viewing TV on mobiles, BT contributing a new network for mobile TV and Virgin making use of the network and providing the service in a new mobile phone [Rea06]. However, if IPTV software is what makes enhanced-TV-experience possible, video search software seems to be what makes it usable [Del05].

2.2 Trends in providing video search services

The new business sectors that have emerged (g to i above) rely heavily on video/digital media search technology. This is the main asset and the source of competition for both content service providers and web content aggregators. For example, TVEyes e-mails links to digital media from a number of sources to a user according to his profile and/or his queries, it allows a user to search both archived and real time broadcast content, it has extended its search functionalities to podcasts and blogs [Pog05] and has even included a desktop search functionality to its engine. One of its main competitors, Blinkx, also allows for an integrated search providing results from a user’s desktop and the web, including TV content, blogs, images and P2P content, and for the creation of a customized TV channel based on topics/keyword searches provided by the user. Automatic continuous streaming video is played whenever the user logs in and can be downloaded to a PC [Mil05]. The search tool is proactive, in the sense that it looks for information related to what the user is working on in other programs, without waiting for the user to initiate a search [Reg06].

In their turn, some of the big search engines have

Figure 1: Dependencies & dependency trends among different business sectors

extended their searches to video too (cf. Yahoo video and Google video - [Ols05]), while others, such as AOL, have acquired video search software companies (e.g. SingingFish and Truveo) for accommodating these needs [OM05]. What is more important to note though, is that other business sectors involved in the enhanced-TV-experience market have realised the importance of video search technology too and have proceeded in partnerships with video search market players. TiVo, for example, has partnered with Yahoo: from any yahoo TV episode page found, the user can click on a “record to my TiVo box button” and schedule the recording through the yahoo portal [Sin05]. Microsoft has also been developing its own video search software, which could possibly be used by telecom operators who already make use of its IPTV software [LaM05].

Furthermore, electronics manufacturers make alliances with content aggregators (e.g. Motorola and Google deal, Intel and Google etc.), and so do phone companies (e.g. Vodafone and Google deal) in order to include video search in the services they provide. Partnerships between content owners and content service providers are another sign of the growing need of all business sectors in the TV/video market to manage and distribute content efficiently to their end-users. For example, the Press Association, BBC and TVEyes partnership in developing the “Politics Today” portal, in which users can search for and watch audio and video files from e.g. the British parliament along with related news articles, reveals a tendency of content owners for going beyond traditional archival indexing and retrieval of their collections for their own professional use to providing such access to laymen [Rel05].

Figure 1 illustrates these direct and indirect dependencies (dependency trends) of some business sectors on the use of video search technology.

2.3 Employed video search mechanisms

While most implementation details are kept aside, it seems that the video search mechanisms currently in the market have one or more of the following characteristics:

• They make use of metadata provided for each video/TV programme by the content owners

or/and the broadcaster (e.g. information also available in electronic programme guides). Microsoft, AOL and Yahoo rely exclusively on such data; the latter has also developed a headline syndication based protocol for facilitating publishers in adding metatags to media files (Media RSS) [Ols05].

• They are text-based and actually run on closed captions (e.g. Google) or/and on automatically generated transcriptions of the audio (e.g. TVEyes). BlinkX, in its turn, searches for keywords in the audio track by transforming a keyword into a concatenation of phonemes and matching the latter with the audio track.

• They apply mostly to English video files.

• They are mainly keyword-based searches with semantic expansion being quite restricted (e.g. Google identifies concepts associated to the textual keywords)

• The video retrieved is either the entirety of a programme (or a part of its structure e.g. a film chapter) or a short segment consisting of a few seconds before and after the keyword matched [Inc05].

Will such video search mechanisms suffice for efficient access to the increasing amount of digital video content? The above-mentioned video search services are still in their infancy, with beta versions of the engines having been released in early summer of 2005; therefore, no usability evaluation of these services is currently available. Still, some problems with such mechanisms are evident:

• Metadata creation is time and effort consuming and in many cases video files are accompanied by limited or no such information. Especially in the case of consumer-generated video content, metadata cannot be taken for granted for retrieving a video file

• Closed captions are not always available for TV programmes (not available at all for other types of video data), while automatic transcriptions are not robust enough to be the sole source of retrieval; especially in cases when the recording conditions are noisy (e.g. field interviews), automatic speech recognition (ASR) degrades considerably [Hau05].

• There is an abundance of non-English video/TV programmes of interest to users, which currently cannot be retrieved, unless English metadata are associated to them.

• Going beyond keyword matching to intelligent processing through query expansion and semantic search strategies is important not only for achieving better precision and recall in retrieval, but mainly for making the best out of queries formed by laymen (i.e. uninitiated everyday users of the video search services).

• Users are expected to lose interest, if they have to watch long video files until they (probably) reach the segment that is directly related to their query. On the other hand, sufficiently long segments are necessary for users to be properly informed regarding their query.

The quest for more accurate and effective video search mechanisms has been undertaken within research prototypes. We turn to them for suggestions on developing more efficient video search mechanisms.

3 Video Search in Research Prototypes

In this section, we first present the lessons learned from state-of-the-art prototypes that have been developed and evaluated within a “digital collection” access scenario; in doing so, we explore whether they go beyond the above-mentioned video search drawbacks and if so, to what extent their suggestions could be implemented in the new enhanced-TV context. We discuss how differences in the context of use affect the development of video search mechanisms and turn to suggestions on effective video search for the new technologies by recently completed and on-going research projects.

3.1 Video search mechanisms in digital collection scenarios

The large digital library initiatives in the mid-nineties boosted research on the development of video analysis technologies and in particular of video indexing and retrieval. Projects such as Informedia [Hau05] and Fischlar [Sme05] explored a variety of unimodal and/or multimodal mechanisms for intelligent indexing, retrieval, summarization and visualization of mainly broadcast news files.

In 2001, the need to compare the different video retrieval approaches/prototypes led to the organisation of a Video track within the Text Retrieval and Evaluation Conference (TRECVID), a competition that has been organised every year since [HC04, Sme05]. In this evaluation campaign, competing systems are given a test video collection (mainly broadcast news in English), a number of statements expressing an information need (topic/query) and a common shot boundary reference (retrieval unit) and they are asked to return a ranked list of at most a thousand shots which best satisfy the need. The users who form the queries are trained analysts and the queries themselves may consist of a mix of visual descriptions and semantic information (e.g. “find shots zooming on the US Capitol dome”) [Sme05].

Competing systems implemented a variety of indexing and retrieval mechanisms, which were able to tackle lack of metadata and closed captions, degraded ASR output and strict keyword matching. It was generally found that [HC04]:

• Systems that performed medium-specific analysis (e.g. automatic speech recognition, face detection, image analysis etc.) and which subsequently fused the medium-specific individual retrieval scores for each shot were found to perform slightly better than systems which relied on e.g. text-based only retrieval mechanisms.

• Learning the best linear weights for fusing such multimedia information according to the type of the query was found to be helpful for retrieval.

• Text query enhancement through lexical resources or associated information found in web documents retrieved by a (cf. also work in the PRESTOSPACE project, [DTCP05]), relevance feedback mechanisms [Jon04] and use of high-level concepts associated with low-level features, all seemed to contribute to more accurate video retrieval [Sme05].

These findings point to slight benefits from the implementation of multimedia approaches for video indexing and retrieval, ones that make use of both visual and linguistic analyses of video content. Still, they fail to convince on the necessity of going beyond text-based retrieval and actually the query-type based fusion they suggest opens a number of research questions that remain to be addressed [Hau05].

The TRECVID evaluation setup itself (and the participating systems) is more focused on a digital video collection application scenario and remains detached from the “pervasive digital video” reality. The users of its video search scenario are not the everyday laymen/novice computer users reached by the new enhanced-TV services. The queries/information need statements used are not relevant to the new context of convergent technologies and the search unit is a keyframe (representative of a shot retrieved for the user) rather than a video segment that will be presented to the user. A structured video collection with no variety of genre/domain (mostly news broadcasts) is presupposed. The development of query-type dependent multimedia mechanisms for video retrieval relies on the availability of a “query and corresponding retrieved videos” corpus for training a system; given that the types of queries a user may submit in the enhanced-TV context are not finite, this is an unrealistic approach for the corresponding video search services.

All these must be taken into consideration when drawing conclusions regarding the direction the development of video search mechanisms in the new context should take. It is no coincidence that stronger suggestions on the need for multimedia approaches to video search come from research projects that implement applications for leisure and entertainment within the “digital home” and/or for the “mobile user”. Projects such as UP-TV [TPKC05], BUSMAN [XVD+04] and AceMedia [BPS+05, PKS+04] are characteristic cases; the projects consider the association of low-level image features and higher-level concepts important for video indexing and retrieval and develop tools and formalisms for the corresponding semi-automatic annotation. Such an association goes beyond query-type dependent learning for multimedia fusion and actually takes the TRECVID findings a step further. In what follows, we look into the parameters that affect the development of video search technologies for use in the enhanced-TV context and which have led researchers to focusing on the image-language association need.
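The score-fusion finding reported for TRECVID in this section (medium-specific retrieval scores per shot, combined with linear weights that depend on the type of the query) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration only: the modality names, the two query types, the weight values (assumed to have been learned offline) and the function `fuse_scores` do not come from any of the cited systems.

```python
from typing import Dict, List, Tuple

# Hypothetical weights per query type; in the systems discussed above such
# weights would be learned from training data, here they are simply assumed.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "named_person": {"speech": 0.5, "face": 0.4, "image": 0.1},
    "general": {"speech": 0.6, "face": 0.0, "image": 0.4},
}


def fuse_scores(
    shot_scores: Dict[str, Dict[str, float]], query_type: str
) -> List[Tuple[str, float]]:
    """Linearly combine per-modality scores of each shot and rank the shots."""
    weights = WEIGHTS[query_type]
    fused = {
        shot: sum(weights.get(modality, 0.0) * score
                  for modality, score in scores.items())
        for shot, scores in shot_scores.items()
    }
    # Highest fused score first, as in a ranked list of retrieved shots.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy medium-specific retrieval scores for two shots (made-up numbers).
    shots = {
        "shot1": {"speech": 0.2, "face": 0.9, "image": 0.1},
        "shot2": {"speech": 0.8, "face": 0.0, "image": 0.5},
    }
    for query_type in WEIGHTS:
        print(query_type, fuse_scores(shots, query_type))
```

With these toy numbers the ranking changes with the query type, which is the point of query-type dependent fusion; it also makes visible why the approach presupposes a known, finite set of query types with training data for each.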

3.2 Technological challenges in the new video search context

In going from digital video libraries to the pervasive digital video reality a number of video access aspects have been extended, affecting the development of video search mechanisms:

From advanced computer users to laymen

While digital video library users were advanced computer users, the enhanced-TV context reaches everyday people, uninitiated computer users. This affects not only the type of queries (information need) formed, but also their quality and accuracy, the expectations/demands on video retrieval accuracy, the length of the retrieved unit, the domain and language of the retrieved data.

From structured data collections to pervasive video data

Digital video collections consist of data of a specific genre and domain (or of a finite number thereof), with expected content and structure and of professional quality. The availability of any type of video in the “convergent technologies” reality entails that video access mechanisms should deal with not only professionally made videos, but also consumer-generated ones of probably lower quality, and more noise. Furthermore, abundance of video means non-finite variety in domains and languages, language genres, pronunciations, visual imagery and even cues revealing programme structure. The variety of sources of data implies that associated metadata will not always be of the same granularity or even of the same type and in many cases will not even be available.

From static to dynamic search

Within the enhanced-TV context, video search becomes dynamic in two senses; first of all, not only archived data but also real-time broadcast data needs to be searched, and actually the timely delivery of such data to the user can become a competitive advantage for a content service provider. This imposes time-constraints on a search mechanism. On the other hand, the search model itself goes beyond traditional search triggered by a user through an interface to pro-active, personalised search for a user according to his/her known profile and interests. Effectiveness in indicating “interesting” data for the user becomes even more crucial and personalization is rendered central for the functionality of the search mechanism.

Based on the above, one could argue that the ideal video search mechanism for use by e.g. a content service provider, should be language, domain and genre independent, should be able to analyse any medium-specific part of the video (i.e. audio, image, subtitles etc.) robustly, to structure the video (i.e. identify meaningful, comprehensive stories/segments in it covering a specific topic for presentation to the user), to take user preferences into consideration and work in real time, with the highest precision. However, technology is far from achieving this. As mentioned earlier, recent research projects that aim at developing video access prototypes for applications within the enhanced-TV context focus on the multimedia nature of videos for dealing with this challenging task, and in particular on the association of semantically equivalent medium-specific pieces of information extracted from a video.

In the next section, we present one such project, REVEAL THIS, which takes things a step even further, suggesting that one should go beyond this association to crossing media for performing efficient video indexing.

4 The REVEAL THIS Suggestion: cross-media decision mechanisms

The REVEAL THIS project (REtrieval of VidEo And Language for The Home user in an Information Society) is an EC funded FP6 project which aims at developing software for efficient access to multimedia content². The envisaged use scenario of the technology that is being developed in the project is illustrated in figure 2.

² For more details cf. http://www.reveal-this.org and [PP05].

Web, TV and/or Radio content monitored by the service provider is fed into the REVEAL THIS prototype, where it is analysed, indexed, categorized and summarized and stored in an archive. This data can be searched and/or can be pushed to a user according to his/her interests. Novice and advanced computer users are targeted; the former will be using mainly a simple mobile phone interface where information will be pushed to, while advanced computer users will use a web interface for searching for information. Cross-lingual retrieval is enabled (i.e. the user can issue a query in one language and retrieve data in another), while the summaries presented to the user are not only translated to the user’s language but are also personalized according to a user’s interest or specific query. English and Greek for EU politics, news and travel-related data are handled by the prototype.

Figure 3 illustrates the system architecture; input

Figure 2: The REVEAL THIS use scenario

data is analyzed by a number of different medium-specific processors (i.e. speech processing component, image analysis component, etc.) each of which provides its own interpretation of the data contributing to the “bag” of time-aligned (in the case of audio and video files) medium-specific pieces of information (e.g. speaker turns, named entities, facts, image categories, face ids, text categories etc.). These pieces of information are available for the more complex modules, which are responsible for indexing, categorization and summarization.

These more complex modules are viewed as decision mechanisms that should take advantage of the semantic interaction between the different medium-specific pieces of information available for a video segment. The latter is the result of a similarly viewed segmentation mechanism, which stands as the processing (indexing- categorization- summarization-) unit for these modules. These units/segments are semantically coherent video segments covering a particular topic. The REVEAL THIS attempts to develop cross-media decision mechanisms for video segmentation, video indexing, video categorization and multimedia summarization are still ongoing. What are these mechanisms supposed to decide on though?

Crossing-media for achieving a task has normally taken the form of finding semantic equivalences between different media both of which express the same information (e.g. a TV news programme and a corresponding web news article) [BKW99]. REVEAL THIS suggests mechanisms for crossing-media within the same multimedia document (i.e. video), where one needs to go beyond the notion of semantic equivalence to other semantic interaction relations.

Cross-media decision mechanisms go a step further from medium-specific feature fusion suggested in the literature (cf. section 3.1) and attempt to decide whether two medium-specific representations:

a are associated (i.e. there is a semantic equivalence in their meaning) - what multimedia integration does [PW04] (e.g. the image of the European Parliament building and the corresponding textual label),

b are not associated but they complement each other, they collaborate in forming a message (e.g. the image of Josep Borrell and the audio of his

[Figure 3 diagram labels: Web, Text, Radio, TV input; Media Manager (video, text, audio, keyframes); SPC - Speech processing; TPC - Text processing; IAC - Keyframe Extraction, Image Analysis & Image Categorization; FDIC - Face Detection & Identification; Text categorization; Translation. Automatically extracted metadata: TPC: named entities, terms, facts; text categories; SPC: speaker turns, speaker names, text; IAC: shortcuts, keyframes, image features, image categories; FDIC: face regions, names. Story Boundary Detection; Cross-media Stories; Cross-media Indexing; Cross-Media Categorization; Multimedia Summarization; Media Server: Storage, Personalization, Browse, Query, Retrieve, Push Notifications.]

Figure 3: The REVEAL THIS system architecture
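
The output of the medium-specific processors in Figure 3 and the three semantic interaction relations discussed in this section (association, complementarity, incompatibility) can be illustrated with a small sketch. It is not the REVEAL THIS implementation: the names `Annotation`, `Relation`, `annotations_in_segment` and `choose_index_term` are hypothetical, the relation between two annotations is assumed to be given (how to compute it is left open here), and resolving an incompatibility by processor confidence is only one of the options the text mentions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Relation(Enum):
    """Hypothetical labels for the three semantic interaction relations."""
    ASSOCIATED = "a"      # semantically equivalent
    COMPLEMENTARY = "b"   # jointly form a message
    INCOMPATIBLE = "c"    # contradicting, e.g. due to analysis errors


@dataclass(frozen=True)
class Annotation:
    """One time-aligned, medium-specific piece of information."""
    medium: str        # e.g. "speech", "image", "face", "text"
    label: str         # the medium-specific interpretation
    start: float       # start time in seconds
    end: float         # end time in seconds
    confidence: float  # assumed processor confidence in [0, 1]


def annotations_in_segment(bag: List[Annotation],
                           seg_start: float,
                           seg_end: float) -> List[Annotation]:
    """Return the annotations from the bag that overlap a video segment."""
    return [a for a in bag if a.start < seg_end and a.end > seg_start]


def choose_index_term(x: Annotation, y: Annotation, relation: Relation) -> str:
    """Pick an indexing term for two annotations, given their relation."""
    if relation is Relation.ASSOCIATED:
        # Case (a): both express the same meaning; one label suffices.
        return x.label
    if relation is Relation.COMPLEMENTARY:
        # Case (b): the conjunction of both pieces forms the term.
        return f"{x.label} AND {y.label}"
    # Case (c): trust the annotation with the higher confidence score.
    return max((x, y), key=lambda a: a.confidence).label
```

In this sketch a face annotation and a speech annotation that overlap the same segment would be combined into a single conjunctive term when they are complementary, while an incompatible pair would fall back to the more confident processor.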

speech on the European Commission stance on the war in Iraq), or

(c) are not associated and do not collaborate in forming a message; they are actually semantically incompatible (contradicting), due to e.g. errors in automatic medium-specific analysis/interpretation.

Taking as an example the case of a cross-media indexing mechanism for a video retrieval system, one could say that once the mechanism takes as input the medium-specific analysis of a video segment (e.g. the output of the image analysis component, the face detection component, the speech processing component and the natural language understanding component), it has to decide which medium-specific pieces of information (medium-specific interpretations of the video) are more representative of the content of the segment for indexing.

In case (a), the association indicates a highly significant piece of information, which could be used as an indexing term. In case (b), a conjunction of different medium-specific pieces of information forms the indexing term. In case (c), the indexing mechanism will have to choose one or more pieces of information from a specific medium that is to be trusted as more reliable for expressing the video segment content. This choice could be domain- and/or genre-specific, or it could rely on each medium processor's confidence scores, etc. The third case is particularly important in view of the technological challenges mentioned in section 3.2.

One could go on elaborating the notion of cross-media decision mechanisms, addressing also the methods that could be used for implementing such mechanisms. However, this goes beyond the scope of this paper, which aims at suggesting these mechanisms as a direction in video search research that seems more appropriate in the new “pervasive digital video reality”.

5 Conclusion

In this paper, we looked into the growing market of the convergent technologies that go beyond traditional broadcast and TV, and in particular into the effects on and from video search services in this context: market trends have shown that video search technology is indispensable for those involved in the product chain from TV/video content to the end-user. We further presented the characteristics of the video search services provided in this market and those of the more advanced video search techniques developed in state of the art prototypes. The latter point to the fact that one should take advantage of the different medium-specific pieces of information that can be extracted from a video; however, they fail to convince on the necessity of doing so, which could be due to their restricted development and evaluation within a digital libraries scenario.

We, therefore, discussed the effects on the development of video search mechanisms when shifting from the digital libraries application scenario (for which most of these systems were developed) to the pervasive digital video context; this shift was found to render the need for taking advantage of the multimedia nature of videos more demanding and to make the suggestion of crossing-media for the task worth investigating. Thus, we presented the notion of cross-media decision mechanisms as suggested by the REVEAL THIS project, a project that attempts to implement such mechanisms for the home user.

We have, therefore, concluded that efficient video search is indispensable for highly usable technologies that go beyond traditional broadcast and TV and that cross-media decision mechanisms may hold the key for its development.

The rapid developments in the new “pervasive digital video” market require that we step back from core research and development in video analysis and retrieval, in order to look at the new role video search technologies have come to play and the subsequent technological challenges that have arisen, to learn from the last decade of developing video search mechanisms within the digital libraries context, and to re-direct research accordingly. Otherwise, there is a risk of developing technologies that are detached from the actual needs of the target users and from the new reality that is rapidly being formed around us. In this paper, we attempted to take this step back and get an overview of the video search related situation.

6 Acknowledgements

This work is carried out in the framework of the REVEAL THIS project (FP6-IST-511689 grant). The authors would like to thank the project participants for fruitful discussions on the market aspects of video search.

Citation

Katerina Pastra and Stelios Piperidis, Video Search: New Challenges in the Pervasive Digital Video Era, Journal of Virtual Reality and Broadcasting, 3(2006), no. 11, May 2007, urn:nbn:de:0009-6-10736, ISSN 1860-2037.

References

[BKW99] Susanne Boll, Wolfgang Klas, and Jochen Wandel, A Cross-Media Adaptation Strategy for Multimedia Presentations, Proceedings of the seventh ACM international conference on Multimedia (Part 1) (New York, NY, USA), ACM Press, 1999, ISBN 1-58113-151-8, pp. 37–46.

[Bor05] John Borland, File-swap TV comes into focus, http://www.news.com, August 8th, 2005, 2005, Last visited July 31st, 2006.

[BPS+05] Stephan Bloehdorn, Kosmas Petridis, Carsten Saathoff, Nikos Simou, Vassilis Tzouvaras, Yannis S. Avrithis, Siegfried Handschuh, Ioannis Kompatsiaris, Steffen Staab, and Michael G. Strintzis, Semantic Annotation of Images and Videos for Multimedia Analysis, Proceedings of the Second European Semantic Web Conference, ESWC 2005, Lecture Notes in Computer Science, Vol. 3532, Springer, 2005, ISBN 3-540-26124-9, pp. 592–607.

[Bro05] Anne Broache, Digital TV changeover suggested for 2009, http://www.news.com, July 12th, 2005, 2005, Last visited July 31st, 2006.

[Car05] Doreen Carvajal, New EU rules for television: content without frontiers?, The International Herald Tribune, November 28, 2005 Monday, 2005.

[Cha05] B. Charney, Free tv for your mobile, http://www.news.com, May 17th, 2005, 2005.

[Del05] Kevin J. Delaney, Tv’s future may be web search engines that hunt for video, Wall Street Journal, December 16, 2004, Thursday, Section B; Page 1, Column 5, 2005.

[DTCP05] Mike Dowman, Valentin Tablan, Hamish Cunningham, and Borislav Popov, Content Augmentation for Mixed-Mode News Broadcasts, Proceedings of the 3rd European Conference on Interactive TV: User Centred ITV Systems, Programmes and Applications, 2005.

[Hau05] Alexander G. Hauptmann, Lessons for the Future from a Decade of Informedia Video Analysis Research, Image and Video Retrieval, 4th International Conference, CIVR 2005, Lecture Notes in Computer Science, Volume 3568, 2005, ISBN 978-3-540-27858-0, pp. 1–10.

[HC04] Alexander G. Hauptmann and Michael G. Christel, Successful Approaches in the TREC Video Retrieval Evaluations, Proceedings of the 12th annual ACM international conference on Multimedia, SESSION: Brave new topics – session 3: the effect of benchmarking on advances in semantic video, 2004, ISBN 1-58113-893-8, pp. 668–675.

[Inc05] StreamSage Inc., Rich Media Indexing Technology Approaches, White Paper, March 2005, 2005.

[Jaf05] Joshua Jaffe, Online travel-guides: the next generation, http://www.news.com, September 28, 2005, 2005.

[Jon04] Gareth J. F. Jones, Adaptive Systems for Multimedia Information Retrieval, Adaptive Multimedia Retrieval, First International Workshop, AMR 2003, Lecture Notes in Computer Science, Volume 3094, 2004, ISBN 978-3-540-22163-0, pp. 1–18.

[LaM05] Martin LaMonica, Microsoft signs on Alcatel for IPTV, http://www.news.com, February 22nd, 2005, 2005, Last visited July 31st, 2006.

[Mil05] Elinor Mills, , http://www.news.com, October 3, 2005, 2005, Last visited July 31st, 2006.

[Ols05] Stefanie Olsen, Yahoo, Google turn up volume on video search battle, http://www.news.com, May 4, 2005, 2005, Last visited July 31st, 2006.

[OM05] Stefanie Olsen and Elinor Mills, AOL launches video search service, http://www.news.com, June 30, 2005, 2005, Last visited July 31st, 2006.

[PKS+04] Kosmas Petridis, Ioannis Kompatsiaris, Michael G. Strintzis, Stephan Bloehdorn, Siegfried Handschuh, Steffen Staab, Nikos Simou, Vassilis Tzouvaras, and Yannis S. Avrithis, Knowledge Representation for Semantic Multimedia Content Analysis and Reasoning, Knowledge-Based Media Analysis for Self-Adaptive and Agile Multi-Media, Proceedings of the European Workshop for the Integration of Knowledge, Semantics and Digital Media Technology, EWIMT 2004, 2004, ISBN 0-902-23810-8.

[Pog05] D. Pogue, A very cool podcast service, The New York Times, July 27th, 2005, 2005.

[PP05] Stelios Piperidis and Harris Papageorgiou, REVEAL THIS: retrieval of multimedia multilingual content for the home user in an information society, The 2nd European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology, EWIMT 2005, 2005, ISBN 0-86341-595-4, pp. 461–465.

[PW04] Katerina Pastra and Yorick Wilks, Vision-Language Integration in AI: A Reality Check, Proceedings of the 16th European Conference on Artificial Intelligence, ECAI’2004, including Prestigious Applications of Intelligent Systems, PAIS 2004, 2004, ISBN 1-58603-452-9, pp. 937–944.

[Rea05a] Marguerite Reardon, Siemens tackles Microsoft IPTV dominance, http://www.news.com, June 13th, 2005, 2005, Last visited July 31st, 2006.

[Rea05b] Marguerite Reardon, Tech firms focus on TV, http://www.news.com, November 23rd, 2005, 2005, Last visited July 31st, 2006.

[Rea06] Marguerite Reardon, Microsoft, BT, Virgin team on mobile TV, http://www.news.com, February, 14th 2006, 2006, Last visited July 31st, 2006.

[Reg06] Keith Regan, Revamped Blinkx Program to Deliver Automatic Searches, E-Commerce Times, July 3rd, 2006, 2006, Last visited July 31st, 2006.

[Rel05] PA Press Release, Pa business website set to revolutionize parliamentary monitoring, The Press Association, November 21st, 2005, 2005.

[SBH+05] Richard Shim, John Borland, Eva Hansen, Stefanie Olsen, and B. Schuler, MeTV: Finally you are in control, http://www.news.com, April 2005, 2005, Last visited July 31st, 2006.

[Shi06] Richard Shim, Tivo beefs up patent portfolio, http://www.news.com, April 6th, 2005, 2006, Last visited July 31st, 2006.

[Sin05] Michael Singer, Yahoo, TiVo to connect services, http://www.news.com, November 6th 2005, 2005, Last visited July 31st, 2006.

[Sme05] Alan F. Smeaton, TRECVID: 5 Years of Benchmarking Video Retrieval Operations, MUSCLE/ImageCLEF Workshop 2005, 2005.

[TPKC05] Chrisa Tsinaraki, Panagiotis Polydoros, Fotis Kazasis, and Stavros Christodoulakis, Ontology-Based Semantic Indexing for MPEG-7 and TV-Anytime Audiovisual Content, Multimedia Tools and Applications 26 (2005), 299–325, ISSN 1380-7501.

[XVD+04] Li-Qun Xu, Paulo Villegas, Mónica Díez, Ebroul Izquierdo, Stephan Herrmann, Vincent Bottreau, Ivan Damnjanovic, and Damien Papworth, A User-Centred System for End-to-End Secure Multimedia Content Delivery: From Content Annotation to Consumer Consumption, International Conference on Image and Video Retrieval, Lecture Notes in Computer Science, Volume 3115, 2004, ISBN 3-540-22539-0, pp. 656–664.