
Expert Systems with Applications 37 (2010) 1124–1133

Contents lists available at ScienceDirect

Expert Systems with Applications

journal homepage: www.elsevier.com/locate/eswa

Enhancing TV programmes with additional contents using MPEG-7 segmentation information ☆

Marta Rey-López *, Ana Fernández-Vilas, Rebeca P. Díaz-Redondo, Martín López-Nores, José J. Pazos-Arias, Alberto Gil-Solla, Manuel Ramos-Cabrer, Jorge García-Duque

University of Vigo, Department of Telematics Engineering, 36310 Vigo, Spain

Keywords: Metadata; MPEG-7; Video tagging

Abstract

Interactive Digital TV offers a large number of TV channels, as well as new contents that come along with the TV programmes. To take advantage of these additional contents and make them easily available to viewers, this paper proposes to offer additional contents linked to the segments of TV programmes by means of semantic relations obtained using MPEG-7 segmentation information. As a practical use of this work, we propose two different application fields: t-learning, with the aim of using TV programmes to engage viewers in education; and personalised advertising, whose goal is to offer viewers products of interest to them, maximising the effectiveness of the advertising. © 2009 Elsevier Ltd. All rights reserved.

☆ Funded by the Ministerio de Educación y Ciencia research project TSI2007-61599, by the Consellería de Educación e Ordenación Universitaria incentives file 2007/000016-0, and by the Programa de Promoción Xeral da Investigación de Consellería de Innovación, Industria e Comercio research project PGIDIT05PXIC32204PN.
* Corresponding author. E-mail address: [email protected] (M. Rey-López). URL: http://idtv.det.uvigo.es/~mrey (M. Rey-López).

0957-4174/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2009.06.053
M. Rey-López et al. / Expert Systems with Applications 37 (2010) 1124–1133 1125

1. Introduction

The arrival of Interactive Digital TV (IDTV) allows viewers to access a huge amount of interactive contents in addition to the traditional TV programmes: games, web pages, learning contents, new types of advertisements, etc. However, this raises a problem: how to prevent the viewer from feeling lost in that mess of contents and offer him/her only the interesting ones. In this direction, some research efforts focus on designing audiovisual content recommenders (Björkman et al., 2006; Blanco Fernández, Pazos Arias, López Nores, Gil Solla, & Ramos Cabrer, 2006) that work according to the viewer's preferences. However, for these recommendations to succeed, offering the contents when the user is most likely to watch them is as important as selecting the suitable ones.

The recommendation systems used in many Internet web sites, e.g. the on-line store Amazon (http://www.amazon.com) (Linden, Smith, & York, 2003), address this issue by offering the user some items related to the one he/she is browsing. In contrast, TV recommenders usually suggest isolated contents to the viewer, although the techniques they use are appropriate for the aforementioned type of contextual recommendations. Taking advantage of these strategies, programmes can be grouped together according to two different criteria: the similarity of their contents (content-based filtering) and the resemblance between the profiles of the viewers who have watched them (collaborative filtering). In this manner, related contents could be offered when the viewer finishes watching a programme, as shown in Figs. 1 and 2. Such contextual recommendations take into account the fact that the user is likely to watch related contents when a programme has finished. However, the granularity of these approaches is quite coarse, since they deal with entire contents.

The main idea of this paper is to study how to identify which characteristics of the programmes can arouse the viewer's curiosity and at which point of these programmes this curiosity comes up, as well as to find mechanisms that offer the appropriate additional contents to satisfy it. Our approach requires a finer granularity than the contextual recommendation approaches mentioned above, since it seeks to establish relationships not only with entire contents but also with parts of them, such as segments of videos, some pages of a web site, or some learning objects instead of an entire course. Specifically, we are interested in using this approach in two areas of IDTV. On the one hand, to provide the user with educational contents related to the programme he/she is watching, using the characteristics of this programme as a bait to engage viewers in education. On the other hand, to offer the viewer personalised advertisements related to the contents of the programme, so that he/she feels the need to buy these products.

Our approach takes into account the fact that the user is more likely to get involved in new contents if they are related to the context of the situation he/she is living; in this case, as he/she is
playing the role of a viewer, the context is constituted by the contents of the programme he/she is watching. As an example, no matter how interested the user is in oriental cultures, if we offer him/her a documentary on the history of kimonos, or the chance to buy a related item, when he/she turns on the TV (as current TV programme recommenders do), he/she would probably decide not to take it. On the contrary, if the same elements are offered while he/she is watching the film 'Memoirs of a Geisha', they will arouse the viewer's curiosity and the probability of acceptance will increase. In order to offer these contextual recommendations, we need appropriate labelling mechanisms for the content as well as semantic reasoning algorithms to find the relationships between contents.

In this paper, the next section explains the mechanism of enhancing TV programmes with additional contents, taking into account the agents and phases of the process. This proposal relies on a correct description of the contents, so that relations can be established between them. Section 3 discusses the different mechanisms to create the descriptions, as well as the different standards used to share them. Section 4 presents the architecture of the system, together with an example for a better understanding. Then, we introduce some application scenarios, focusing on our fields of research: t-learning (TV-based interactive learning) and personalised advertising. Finally, we draw some conclusions about the proposal and motivate our future work.

Fig. 1. Contextual recommendations similar in content.

Fig. 2. Contextual recommendations watched by similar users.

2. The agents that enhance TV programmes

We have already mentioned that we want to enhance TV programmes with additional contents related to some of their characteristics: subject matter, cast, place, etc. Three different agents take part in this process: content creators, content providers and IDTV receivers.

Content creators are the agents that know the content best, which is why they can anticipate the user's needs by providing additional contents and applications: deleted scenes, videos of the filming, biographies of the participants, etc. Besides, these interactive contents can also be useful for the content creators themselves, since they can provide feedback from the users.

For example, in the reality show 'Survivor', the content creators could add an application that allows viewers to vote for the contestant who should stay on the island, as shown in Fig. 3.

Content providers do not have as much knowledge about the programme as content creators, but they know the audience best, and they know both which contents they transmit in the same time interval and which contents they want to transmit on purpose to complement the target programme, with the aim of publicising them, engaging users in new services, etc. For example, the content providers can enhance different scenes of an episode of the series
'Grey's Anatomy', as shown in Fig. 4, establishing semantic relationships between the episode and the additional contents. In this example, the episode is complemented with a web page that explains what a heart bypass operation is (to clarify the operation taking place in the scene), an episode of the series 'House' that deals with the same case as one in the 'Grey's Anatomy' episode, and the film 'Side Effects', starring one of the main actresses of the series.

Fig. 3. Voting application for the reality show 'Survivor'.

Fig. 4. Additional contents for different scenes of 'Grey's Anatomy'.

IDTV receivers are the agents that know the user best: his/her interests and preferences. Consequently, they are the most appropriate agents to carry out personalisation tasks, both filtering out those additional contents linked by the content providers that are not interesting for the user and recommending him/her new contents related to the programme that the viewer may like. These new relationships could not be established by the content provider in the previous phase because the new contents may come from a different content provider or may be old contents stored in the user's Personal Video Recorder (PVR). For example (Fig. 5), if the user likes Egyptian culture and he/she is watching the film 'The Mummy Returns', where famous pyramids appear, these will arouse his/her curiosity and he/she may like to follow a course on Egyptian art (or the module dealing with the period of the pyramids).

3. The description of TV contents

The process of enhancing TV programmes is based on the appropriate labelling of TV contents, so that the different agents mentioned in the previous section are able to establish relations between them automatically. For the granularity of the approach to be fine enough, the relations should be established between fragments of the programme and elements or fragments of the additional contents. In this manner, the extra elements can be offered at the appropriate moment during the programme; consequently, it is essential to know what is happening at every point of the programme. For this reason, the programme must be divided into segments, and these must be appropriately described. Two different aspects have to be taken into account concerning segment labelling: how these descriptions are expressed in a standardised way, so that they can be shared between different systems, and which mechanisms are used to create the descriptions.

3.1. Standards used to share the descriptions

We have considered two different standards concerning segmentation and labelling: TV-Anytime (The TV-Anytime Forum, 2004) and MPEG-7 (Multimedia Content Description Interface) (MPEG, 2003).

TV-Anytime describes a rather limited content structure, since it defines segments as temporal intervals within an audiovisual stream, i.e. continuous fragments of a programme. It also allows defining segment groups, i.e. collections of segments that are grouped together for a particular purpose or due to a shared property. By associating metadata with segments and segment groups, it is possible to restructure and re-purpose an input audiovisual
stream to generate alternative consumption and navigation modes. The main advantage of TV-Anytime is its simplicity. However, the possibilities of adding metadata to the segments are significantly limited, because it only allows describing them with a title, a synopsis, a genre, a set of keywords, links to related external material and the list of credits for the segment (actors, directors, etc.). Some basic descriptions of segments cannot be naturally represented in this standard. For example, TV-Anytime cannot properly describe a simple scene of the 'Lost' series where Jin, a Korean character, is speaking in Korean to his wife, Sun.

Fig. 5. Additional contents for 'The Mummy Returns'.

MPEG-7 offers wider possibilities and a more powerful description scheme. It allows defining many different types of segments, not only temporal intervals, as well as creating segment hierarchies. Although MPEG-7 defines a powerful description scheme of segments, not all of them are needed for the purposes of this project; hence, we use only a subset: AudioVisualSegment, AudioSegment, VideoSegment, MovingRegion (which describes moving spatio-temporal regions of video content), AudioVisualRegion (which describes moving spatio-temporal regions of audiovisual content) and StillRegion (which describes a 2D spatial region of an image or video frame). MPEG-7 also provides semantic description tools to annotate both entire TV programmes and their segments. These tools can be used to describe semantic entities (objects, agent objects, events, concepts, states, places, times and narrative worlds) that are depicted by or related to multimedia content, as well as semantic attributes and semantic relations. Some common relations, such as location, time or destination, are normalised by MPEG-7, but non-normative ones can also be described. MPEG-7 can therefore describe more complex scenes than TV-Anytime. For example, Fig. 6 shows the semantics of the 'Lost' scene mentioned above, which TV-Anytime could not properly describe.

Fig. 6. MPEG-7 semantics of the 'Lost' scene example.

3.2. Mechanisms to create the descriptions

It is commonly assumed that content creators provide detailed descriptions of the contents, since they are the ones that know the content best. However, it can be argued that this assumption compromises the feasibility of the project: on the one hand, content creators are not likely to label the programmes if doing so does not bring them economic benefits; on the other hand, even if they label them, they would probably add metadata describing only those characteristics that bring such benefits.

For this reason, our approach can be greatly enhanced if the description of the programmes provided by the content creators is complemented with additional labelling mechanisms, such as collaborative and automatic tagging. The former is an emerging technology where many users add metadata in the form of keywords to shared content, so that they can not only categorise information for themselves but also browse the information categorised by others (Golder & Huberman, 2006). It has achieved great acceptance on the Internet, in sites like Delicious (http://del.icio.us), where collaborative tagging is used to label web pages, or Google Video (http://video.google.es), where videos are indexed using the tags provided by different users. In the same manner, users could add tags to the TV programmes they are watching. The most popular tags would then be added to the metadata that describes the video and sent to other viewers. As this is a keyword-based approach, the semantic capabilities of MPEG-7 are not needed and TV-Anytime is sufficient. However, MPEG-7 is preferable if these keywords are going to complement the metadata obtained by other techniques, which is stored using MPEG-7.

Automatic tagging consists of extracting information by means of computerised methods, either directly from the video (Assfalg, Bertini, Colombo, Del Bimbo, & Nunziati, 2003; Ekin, Murat Tekalp, & Mehrotra, 2003; Leonardi & Migliorati, 2002; Yu, Leong, Xu, & Tian, 2003), from a related textual source (Bertini, Cucchiara, Bimbo, & Torniai, 2005; Gazendam, Malaisé, Schreiber, & Brugman, 2006; Rey-López, 2008; Saggion et al., 2004) or by hybrid methods (Park & Li, 2006).

4. Design of the system

The previous sections have presented a general view of the different agents that take part in the process of enhancing TV programmes, as well as the different possibilities for labelling the contents, covering both the standards and the description mechanisms. This section explains how we have translated these ideas into a real system (Figs. 7 and 8). Its core aspects are the personalisation mechanisms (both reasoning and filtering ones), which perform the enhancement of the programmes and the filtering of the additional contents according to the user's characteristics, and the labelling methods for multimedia content.

With regard to personalisation mechanisms, we have integrated into our system the AVATAR recommender developed by our research group. This system is intended to recommend TV contents to viewers according to their profiles, using a hybrid filtering approach that combines content-based methods, collaborative ones and semantic inference (see Blanco Fernández et al., 2006 for details). Its reasoning algorithms can be used both to establish relationships between contents and to filter out the additional contents that are not appropriate for the viewer. In our system, this recommender helps both content providers and IDTV receivers to find contents related to the TV programmes, and helps the latter to filter out the additional contents selected by the former.

An important fact needs to be considered in the enhancement of TV programmes and in the filtering of the additional contents according to the viewer's profile performed by IDTV receivers: these receivers usually have very limited memory and computing power. For this reason, we have envisaged two different possible scenarios. Fig. 7 shows the architecture when set-top boxes are used as IDTV receivers. Due to the technological restrictions of these devices, they use the filtering capabilities of AVATAR over partial ontologies (with less detailed classifications and descriptions) that characterise and interrelate TV programmes, in order to filter out the additional contents that are not interesting for the viewer. With this scheme, if content providers are the only ones that relate the programme to additional contents, and the user is subscribed to several content providers, it is not possible to establish relationships between contents of different providers, or with old contents stored in his/her Personal Video Recorder (PVR). To solve the first restriction, a reasoning server can be added to the scenario. This system applies the reasoning capabilities of AVATAR using full ontologies to establish all the relationships between the contents, and it can be very interesting for content providers, since it would publicise their contents.

Fig. 7. Scenario with reasoning server.
(Diagram labels: content creators with automatic tagging, manual tagging and authoring tools; content providers with automatic tagging; a reasoning server with the full ontology, a semantic reasoner and the collaborative tagger; the broadcast network carrying TV programs, additional material and MPEG-7 metadata; the return channel carrying MPEG-7 tags; and the IDTV receiver (set-top box) with a partial ontology, a semantic filter, the viewer profile, a PVR, a storage manager, an interaction engine and a feedback agent.)

On the other hand, media centres are becoming more and more popular. With this type of IDTV receiver, whose characteristics are very close to those of a Personal Computer, the reasoning algorithms can be applied in the receiver using full ontologies to establish relationships with recorded contents or with contents from other providers (Fig. 8). With this scheme, if the media centre has enough computing power, the reasoning server could be removed, and content providers could establish relationships between their own contents to publicise them.

Fig. 8. Scenario without reasoning server.
(Diagram labels: automatic tagging, manual tagging and authoring tools on the content creators' side; broadcast networks carrying TV programs, additional material and MPEG-7 metadata; a return channel carrying MPEG-7 tags to the collaborative tagger; a receiver holding the full ontology, a semantic reasoner, a semantic filter, the viewer profile and a PVR; and context+feedback, record and update flows.)

Concerning labelling methods, content creators (after recording, preparing and complementing the videos) use automatic taggers to extract as much information as possible about the video without human intervention. After automatic tagging, content creators can add further information to the programme's metadata by hand or with the help of applications such as the one presented in Tsinaraki, Polydoros, Kazasis, and Christodoulakis (2005).

Content providers can also use automatic taggers to complement the description of the videos. However, collaborative tagging is their main source of additional information about the programmes. Although viewers are commonly considered passive participants, some of them would be more active and collaborate in tagging the content, for instance, with the intention of recording the content and watching it again later, or recommending it to their housemates. Besides, content providers could encourage viewers to tag the contents by rewarding them with additional contents or advertising-free programmes. If a return channel exists, the tags written by the users are sent by the IDTV receiver to a centralised server (collaborative tagger in Figs. 7 and 8), where they are combined with the tags added by other users and included in the description provided by content creators. This centralised server is located in the reasoning server, if there is one, or in the content providers' servers otherwise.

4.1. Example

For a better understanding of the system, let us present an example of the enhancement of an episode of the well-known series 'Lost'. Content creators have sold the episode to the content provider together with the MPEG-7 metadata of Fig. 9, which describes both the syntax and the semantics of the episode. In addition, they have added a poll application related to the last scene (Fig. 10). This scene shows Sawyer, a swindler, offering Charlie, a pop star who got over his drug addiction when he arrived on the island, a Virgin statue which contains heroin. The goal of the application is to learn the viewers' opinion on this matter,

Do you think Charlie will suffer a Application added by the relapse in his addiction to content creator drugs? Yes No, he will be strong enough No, but he will keep the Virgin just in case No, Claire will realize and throw all the Virgins to the sea

Other contents broadcast

Medicine Travel Documentary about Web site advertisement Preventing drugs

Contents added by the content provider Likes travelling Relationships established in the Likes learning foreign languages IDTV receiver (if they have reasoning capabilities)

Fig. 9. Structure and description of the ‘Lost’ episode with MPEG-7 metadata.

AgentObject: AgentObject: swindler singer

specializes

specializes petrol station South Pacific pop star drugadict Virgin

location location specializes property substance represents AgentObject: Sawyer contains agent Event: SemanticPlace: AgentObject: Object: Object: swindle beach Charlie heroin statue depictedBy depictedBy depictedBy depictedBy

movin re ion audiovisual se ment still re ion movin re ion still re ion agentOf

Event: argue

agentOf mov ng reg on s reg on au o segmen mov ng reg on depictedBy depictedBy result depictedBy

AgentObject: Object: Event: agentOf AgentObject: Jack pills Speak Jin

specializes agentOfpatient instrument property

AgentObject: Event: Concept: specializes Concept: SemanticState: doctor hold language Korean language Korean

Fig. 10. Example of the process of enhancing TV programmes. asking them if Charlie is going to relapse in his addiction to drugs. Fig. 10 with additional contents like follows. In the first frame, Jack, This application is useful for the content creators since the script- a doctor, and Sawyer, a swindler, are arguing about a bottle of pills. writers can take advantage of the viewer’s opinion to write the Because of its relation to medical issues, this scene has been linked upcoming episodes of the series. to a website about medicine. In the fourth, a picture of a remote Semantic reasoners – whether in content providers’ servers or heavenly beach is shown; thus, an advertisement of travelling to in an independent reasoning server – use the descriptions of the Caribbean islands is attached. The last scene dealing with drug- programmes provided by the content creators to establish rela- addiction is connected to a documentary about how to prevent tions with other contents. In this manner, they use the first pro- your children from taking drugs. gramme as the entrance door for some other TV contents Semantic filtering systems in the user’s IDTV receiver should fil- available, thus increasing the audiences of the latter ones. For in- ter the additional contents added in the previous phase according stance, the content provider has enhanced the ‘Lost’ episode of to the viewer’s preferences. In the example of Fig. 10, the user likes M. Rey-López et al. / Expert Systems with Applications 37 (2010) 1124–1133 1131 travelling and learning foreign languages, for this reason, the web The idea of creating these experiences emerged from the need page about medicine and the documentary about drugs have been to overcome viewer’s passiveness. Since the TV set began to be filtered out, but the ad of a trip to heavenly islands is shown. 
If the the central point of living rooms, viewers got used to just sit on IDTV receiver can perform semantic reasoning, it can add new con- the sofa and watch what was shown in that box. The only interac- tents from other content providers or that have been previously re- tion took place when changing the channel or, some decades later, corded in the PVR. Among these new contents, we have ‘Intolerable browsing teletext. That is why viewers would be probably passive Cruelty’, a film starring George Clooney and Catherine Zeta-Jones, students, as opposed to e-learning ones. Entercation experiences where she plays the role of a woman who marries several men are thus a way to offer educational contents in the appropriate mo- to obtain their money in the divorce, being him one of her victims; ment, when the user feels the necessity to know more about a par- ‘The Perfect Storm’ where George Clooney plays the role of a fish- ticular subject. An example that illustrates this fact is the one erman who gets trapped in an unusually intense storm; or an shown in Fig. 5, where some educational contents about pyramids introductory course of Korean. Some of them can be related to are offered while watching the film ‘The mummy returns’. the episode: the first one is appropriate for the second frame, Entercation experiences were defined in Rey-López (2006), where Sawyer reminds when he taught his girlfriend a simple where we presented their basis and scenario, as well as how edu- con; the latter suits the third frame, where Jin, a man from Korea cational contents were structured and labelled in order to find the who cannot speak English, talks to his wife in Korean. However, appropriate ones to fulfil user’s educational needs while watching only the latter is offered to the viewer while watching the ‘Lost’ TV programmes. 
In this aspect, our approach uses the ADL SCORM episode, since he/she likes learning foreign languages and the other (Sharable Content Object Reference Model) Content Aggregation programmes are not interesting for him/her (according to the Model (ADL, 2006). To describe educational contents, SCORM uses information stored in his/her profile). As the entire course would the IEEE Learning Object Metadata Standard (LOM) (LTSC, 2002). be too heavy for a beginner, in order to refine the granularity of For the flip side, the present paper complements the cited one the approach, only the first lesson of the course is offered, allowing since it explains how TV programmes should be described – and the viewer to continue with the rest of lessons after finishing. who should describe them – in order to be able to use these The filtering of contents according to the viewer profile is very descriptions to find educational contents related to the important since showing too many extra contents could over- programme. whelm the viewer. Finally, these additional contents should not To identify new additional learning contents related to the pro- interrupt him/her, but they ought to be discreetly offered. For gramme the user is watching, we establish semantic relationships example, showing a short phrase superimposed on the image or between MPEG-7 segmentation information that goes along with even only an icon in the corner of the screen (as in Fig. 5) indicating the programme and the metadata accompanying the pedagogical the availability of additional contents. The user can watch them contents. These semantic relationships are obtained by the Intelli- when they are offered, at of the programme or never. gent Tutoring System (ITS) T-MAESTRO (Rey-López, Fernández-Vi- las, & Díaz-Redondo, 2006), which takes advantage of the 5. 
Application fields reasoning mechanisms offered by AVATAR and works with the appropriate ontologies that describe TV contents (TV-ontology Although the proposed approach can be used in many fields of Blanco-Fernández, Pazos-Arias, Gil-Solla, & Ramos-Cabrer, 2005) IDTV, some of them are particularly interesting. In the example of and pedagogical ones (SCORM ontology Rey-López, 2006). Besides, Fig. 10, a module of a course of Korean language is offered to the it also filters the additional contents to reject those ones that are viewer, in this manner the episode of the ‘Lost’ series acts as a bait not interesting for the viewer according to his/her profile. For that to engage him/her in education. We study in Section 5.1 how to ap- to be possible, it analyses the LOM metadata accompanying the ply our approach to the field of t-learning. An advertisement of a learning content and compares it with the user’s preferences. For trip to heavenly islands has been also suggested, attached to a example, this episode contains an AudioSegment of Korean lan- frame in the episode where a beautiful beach on a desert island guage and a MovingRegion with a Korean character. As a was shown, because the viewer liked travelling. The application SCORM-conformant course to learn Korean language was also of this approach in advertising (Section 5.2) is especially interest- broadcast, both elements are related by T-MAESTRO. Finally, as ing, because the benefits obtained from the sponsors when offering the user is interested in learning foreign languages, the ITS links advertisements to the viewers can encourage content creators to these two MPEG-7 segments to the pedagogical content before appropriately describe TV programmes. offering him/her the TV episode.

5.1. Education through IDTV

TV has always been close to education, since pedagogical contents have commonly been introduced in TV programmes, e.g. the famous 'Sesame Street'. However, the birth of IDTV has opened the door to new educational experiences through TV, broadly known as t-learning. Along these lines, we go a step further and use TV programmes not only to educate (like documentaries) but also as bait to attract viewers towards education. Thus, the viewer is offered educational content related to the subject of the programme he/she is watching, and is free to decide whether to study it or not. To refer to these new learning experiences, we have coined the term entercation¹ (Rey-López, 2006), meaning entertainment that educates.

¹ Although this term has been used on the web before, it did not have the connotation of attraction to education.

5.2. Personalised advertising

The second field of application where we are testing our approach to marking up TV programmes is personalised advertising. Our goal here is to implement a system that provides each TV viewer with publicity of contents, products or services that match his/her interests and needs, in a way that maximises the effectiveness of the advertising material. To this aim, we are working on a publicity model based on two main features that would not be possible without MPEG-7:

• First, in order to avoid the nuisance of classical spots (and also the effects of zapping on advertising revenues), we present the publicity jointly with the TV programmes in a non-invasive way: either with a red button approach as in Fig. 5, or by blending logos over static or moving regions as in Fig. 11. The latter possibility, put forward in López Nores, Pazos Arias, García Duque, Blanco Fernández, and Gil Solla (2007), requires the content creators to delimit the regions in MPEG-7 metadata.
• Second, instead of presenting the same advertisements to all viewers, we rely on the AVATAR recommender system to identify potentially interesting items for each individual. This is done by matching the descriptions of the items with a viewer profile that characterises him/her both as a TV viewer and as a consumer. Finally, the MPEG-7 descriptions attached to the TV programmes are taken into account, to ensure that the advertised contents, products or services fit with what the viewer is watching at any time.

Fig. 11. Blending a sample logo over a moving region.

Our advertising system also offers the possibility to link the publicity inserted in the TV programmes to interactive services that provide commercial functionalities (buying products, subscribing to contents, hiring services, etc.). This feature opens the door to turning IDTV into a platform for t-commerce, with potentially many business implications. Interestingly, the benefits obtained from sponsors can encourage content creators to provide good, detailed descriptions of the TV programmes (the better the descriptions, the more precise the targeting we can achieve). Indeed, we expect that it will be publicity, and not education, that definitely brings personalisation into the IDTV business logic: only when the MPEG-7 mark-up is already there for other purposes will it be exploited to provide t-learning as an added value.

6. Conclusions, discussion and future work

In this paper, an approach to enhance TV programmes with additional contents has been proposed, emphasising the use cases of education through IDTV and personalised advertising. For this to be possible, both the programmes and the additional contents (in this case, the educational contents and advertisements) should be appropriately tagged. In this project, MPEG-7 has been identified as suitable to structure the TV programmes in segments and describe them semantically.

In addition, the problems of writing the descriptions of TV programmes have been analysed. On the one hand, we have studied the standards used to share the descriptions between systems, where TV-Anytime and MPEG-7 have been suggested. The former has the advantage of simplicity, but providing TV programmes with semantic information is more difficult with it. The latter allows more flexibility for structuring and semantically describing the contents; however, its syntax is more complicated. On the other hand, we have described the mechanisms that take part in the description process. This approach cannot rely entirely on content creators writing the descriptions, because they might label only the characteristics of the programmes that are interesting to them, not to the viewer. For this reason, we have proposed complementing these descriptions with information obtained from additional sources: collaborative tagging and automatic tagging. Automatic tagging based on contextual sources is good evidence that some agents label contents when doing so brings them some kind of benefit. As a real example, the on-line newspaper El País writes detailed live chronicles about sports events², with the aim of attracting readers or using these chronicles to look for concrete events in the future.

² http://www.elpais.com/deportes/futbol/directo.html?p=0213_00_28_0211_1491.

As a future line of this work, we are working on improving our proposal for automatically tagging sports videos, in order to use them as a scenario to test entercation experiences in which these sports videos offer related educational contents to the viewers: for example, a documentary about the city where the match takes place or a learning element about injury prevention. The opposite approach is also interesting (edutainment experiences, education that entertains): using TV programmes (or segments of TV programmes) to complement educational contents, as introduced in Rey-López et al. (2006), to make the latter more attractive to viewers. For instance, a course for referees could be complemented with real scenes of soccer matches. To create these experiences, we are studying how to bridge the gap between the different knowledge representations, since learning content is described using the SCORM standard whereas TV programmes are described by means of MPEG-7 metadata.

In the same line as collaborative tagging, where the users' descriptions of the videos are compiled in a server and shared with other users, we are now working on collaborative relationing, so that users can establish and share their own relationships between TV contents, as introduced in Correia and Chambel (1999) and Thomas (2002). These relationships are sent to the reasoning server (or the semantic reasoners in content providers) in order to help them find new interesting relationships that can be offered to the rest of the viewers.

References

Advanced Distributed Learning (ADL) (2006). Sharable Content Object Reference Model (SCORM®) 2004 (3rd ed.). Content Aggregation Model Version 1.0.
Assfalg, J., Bertini, M., Colombo, C., Del Bimbo, A., & Nunziati, W. (2003). Semantic annotation of soccer videos: Automatic highlights identification. Computer Vision and Image Understanding, 92(2–3), 285–305.
Bertini, M., Cucchiara, R., Bimbo, A. D., & Torniai, C. (2005). Video annotation with pictorially enriched ontologies. In IEEE international conference on multimedia and expo (ICME 2005), The Netherlands.
Björkman, M., Aroyo, L., Bellekens, P., Dekker, T., Loef, E., & Pulles, R. (2006). Personalised home media centre using semantically enriched TV-anytime content. In Fourth European conference on interactive television (EuroITV 2006), Athens, Greece (pp. 165–173).
Blanco-Fernández, Y., Pazos-Arias, J. J., Gil-Solla, A., & Ramos-Cabrer, M. (2005). AVATAR: A flexible approach to improve the personalized TV by semantic inference. In First workshop on web personalization, recommender systems and intelligent user interfaces (WPRSIUI-05), Reading, UK.
Blanco Fernández, Y., Pazos Arias, J. J., López Nores, M., Gil Solla, A., & Ramos Cabrer, M. (2006). AVATAR: An improved solution for personalized TV based on semantic inference. IEEE Transactions on Consumer Electronics, 52(1), 223–231.
Correia, N., & Chambel, T. (1999). Active video watching using annotation. In ACM multimedia, Orlando, USA (pp. 151–154).
Ekin, A., Murat Tekalp, A., & Mehrotra, R. (2003). Automatic soccer video analysis and summarization. IEEE Transactions on Image Processing, 12(7), 796–807.
Gazendam, L., Malaisé, V., Schreiber, G., & Brugman, H. (2006). Deriving semantic annotations of an audiovisual program from contextual texts. In First international workshop on semantic web annotations for multimedia (SWAMM), Edinburgh.

Golder, S. A., & Huberman, B. A. (2006). The structure of collaborative tagging systems. Journal of Information Science, 32(2), 198–208.
IEEE Learning Technology Standards Committee (LTSC) (2002). Learning object metadata. IEEE Standard 1484.12.1.
Leonardi, R., & Migliorati, P. (2002). Semantic indexing of multimedia documents. IEEE Multimedia, 9(2), 44–51.
Linden, G., Smith, B., & York, J. (2003). Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1), 76–80.
López Nores, M., Pazos Arias, J. J., García Duque, J., Blanco Fernández, Y., & Gil Solla, A. (2007). Non-invasive and personalized advertising through MPEG-4 processing and semantic reasoning. In IEEE international conference on consumer electronics, Las Vegas, USA.
Moving Pictures Experts Group (MPEG) (2003). Information technology – Multimedia content description interface. Part 5. In International standard ISO/IEC 15938-5.
Park, Y., & Li, Y. (2006). Extracting salient keywords from instructional videos using joint text, audio and visual cues. In Human language technology conference of the North American chapter of the ACL (pp. 109–112). New York, USA: Association for Computational Linguistics.
Rey-López, M., Fernández-Vilas, A., Díaz-Redondo, R. P., & Pazos-Arias, J. J. (2008). Automatic live tagging of videos using chronicles. In AMDIT'08: Proceedings of the 2008 Ambi-Sys workshop on ambient media delivery and interactive television, Quebec City, Canada.
Rey-López, M., Díaz-Redondo, R. P., Fernández-Vilas, A., & Pazos-Arias, J. J. (2006). Entercation experiences: Engaging viewers in education through TV programs. In Fourth European conference on interactive television (EuroITV 2006), Athens, Greece (pp. 310–319).
Rey-López, M., Fernández-Vilas, A., & Díaz-Redondo, R. P. (2006). A model for personalized learning through IDTV. In Adaptive hypermedia, adaptive web-based systems 2006 (AH2006), Dublin, Ireland (Vol. 4018, pp. 457–461). Springer-Verlag.
Saggion, H., Cunningham, H., Bontcheva, K., Maynard, D., Hamza, O., & Wilks, Y. (2004). Multimedia indexing through multi-source and multi-language information extraction: The MUMIS project. Data & Knowledge Engineering, 48(2), 247–264.
The TV-Anytime Forum (2004). Broadcast and on-line services: Search, select and rightful use of content on personal storage systems. European standard ETSI TS 102 822.
Thomas, N. (2002). A conceptual model for peer-to-peer interactive television. In Workshop of future TV: Adaptive instruction in your living room, San Sebastian, Spain (pp. 10–14).
Tsinaraki, C., Polydoros, P., Kazasis, F., & Christodoulakis, S. (2005). Ontology-based semantic indexing for MPEG-7 and TV-anytime audiovisual content. Special issue of multimedia tools and application. Journal on Video Segmentation for Semantic Annotation and Transcoding, 26, 299–325.
Yu, X., Leong, H. W., Xu, C., & Tian, Q. (2003). Trajectory-based ball detection and tracking in broadcast soccer video (Vol. 3). Berkeley, CA (USA): ACM Multimedia (pp. 11–20).