FP6-027685 MESH

D6.1 Definition of MESH Application Scenarios, internal & external services

Contractual Date of Delivery: M3 (May 2006)
Actual Date of Delivery: 13 July 2006
Workpackage: WP6 System Architecture and Integration
Dissemination Level: Public
Nature: Report
Approval Status: Approved
Version: 5
Total Number of Pages: 54
Distribution List: WP6, TMC members, European Commission
Filename: mesh-wp6-D6.1-20060711-Application Scenarios-v5.doc
Keyword list: Users, Application scenarios, Services, External services

Abstract

This document describes the application scenarios created in MESH to drive the system design and relate the features to be developed in the technical workpackages, plus the list of services to be potentially provided by MESH (internal services) as well as external services to be linked from within the MESH platform.

The information in this document reflects only the author’s views and the European Community is not liable for any use that may be made of the information contained therein. The information in this document is provided as is and no guarantee or warranty is given that the information is fit for any particular purpose. The user thereof uses the information at its sole risk and liability.

Page 1 D6.1 – v. 5

History

Version | Date | Reason | Revised by
01 | 2006-04-30 | Initial scenarios | Nikos Sarris, Martin Pinkerneil
02 | 2006-05-30 | Updated services linked to MESH technical activities | Nikos Sarris
03a | 2006-06-25 | Updated scenarios and services according to plenary discussion and added scenario summary tables and UML models | Nikos Sarris
03b | 2006-06-30 | Integrated External Services survey | Katerina Diamantakou
04 | 2006-07-05 | Added summary & conclusions, plus updated list of references | Nikos Sarris
05 | 2006-07-11 | Revised template, some small fixes | Paulo Villegas

Author list

ATC: Nikos Sarris
ATC: Katerina Diamantakou
DW: Jochen Spanenberg
DW: Wilfried Runde
DW: Martin Pinkerneil
TID: Pedro Concejero

In addition, DIAS, DFKI and INA provided useful MESH usage ideas. The whole consortium contributed by linking the MESH technical activities to the technical concepts illustrated in the scenarios.


Executive Summary

This document is the final edition of the MESH Application Scenarios. It identifies the necessary MESH services and links them to the foreseen MESH technical activities. It was produced after iterative consultation with the whole MESH consortium, carried out through the series of actions detailed in the following table:

Action | Actor(s) | Deadline | Status
Describe a set of scenarios with associated technologies | T6.2 partners | 30-4-06 | Completed
Fill in annexed tables with relevant technical activities | All partners | 15-5-06 | Completed
Revise and merge scenarios into realisable use cases | ATC | 30-5-06 | Completed
Present and discuss final scenarios in the plenary meeting | ATC, all | 07-06-06 | Completed
Finalise and formalise scenarios | ATC | 15-06-06 | Completed
Produce relevant deliverable | ATC | 30-06-06 | Completed

This document now envisions the implementation target of the MESH project and to this end we also include an introductory chapter which outlines the ‘setting’ within which MESH intends to be positioned. This document will be further utilised by T6.1 for the detailed and formal definition of requirements and the instantiation of use cases by means of sequence diagrams, which will be carried forward to lead the technical implementation and integration of the MESH platform.


Table of Contents

1. Introduction – defining the MESH ‘setting’
2. MESH targeted users
3. MESH Application Scenarios
  3.1. Scenario 1: Ilias Bonn – a travelling journalist
    3.1.1. Storyline description
    3.1.2. Summary Table
    3.1.3. System boundary model
  3.2. Scenario 2: Martha Jong – a retired doctor
    3.2.1. Storyline description
    3.2.2. Summary Table
    3.2.3. System boundary model
  3.3. Scenario 3: John Clark – a news correspondent
    3.3.1. Storyline description
    3.3.2. Summary Table
    3.3.3. System boundary model
  3.4. Scenario 4: Claudia – a business analyst
    3.4.1. Storyline description
    3.4.2. Summary Table
    3.4.3. System boundary model
  3.5. Scenario 5: Bill Jones – building a news agency
    3.5.1. Storyline description
    3.5.2. Summary Table
    3.5.3. System boundary model
  3.6. Scenario 6: Young periodist and blogger
    3.6.1. Introduction to the scenario
    3.6.2. Storyline description
    3.6.3. Summary Table
    3.6.4. System boundary model
4. MESH list of Internal Services
5. MESH External Services
  5.1. Introduction
  5.2. Syndication
  5.3. Web Feeds
    5.3.1. RSS and
    5.3.2. Who publishes Feeds? [57]
    5.3.3. How does someone read Feeds?
    5.3.4. Feed Readers
  5.4. Electronic Content Syndication Methods
  5.5. News personal publishing sources
    5.5.1. Web
    5.5.2. Mob Blogs
    5.5.3. Spam Blogs
    5.5.4. Video Blogs
    5.5.5. Podcasting
6. Conclusions
References


1. Introduction – defining the MESH ‘setting’

Today we are confronted with vast amounts of information commonly referred to as “news”. News about all aspects of our everyday lives is now accessible in all corners of the world. But how easy is it for anyone to navigate this flood of information, and what opportunities are there to get an objective view of controversial events at national or international level?

Was the latest war an invasion or a liberation? Were the latest elections a grand victory or the result of an unfair election system?

Our era of knowledge should provide for methods of understanding the meaning of ‘news’. Contemporary methods should be able to organise news in a semantic way that would allow the reader to have a complete overview of all similar and conflicting views, being also able to filter information according to personal preferences and interests.

How can this be made possible?

♦ News has to be understood by fully or semi-automatic mechanisms. This means that a news item (in any multimedia form: text, image or video) has to be analysed and categorised (i.e. annotated) according to its contents.
♦ News consumers also have to be understood. This involves profiling individuals in a structured manner and constantly updating these profiles, both through the preferences they provide themselves and through automatic understanding of their needs and interests, obtained by monitoring their requests and habits.
♦ News items then have to be matched to the readers’ interests and requests, by reasoning about which news would be preferred by which reader and in which way.
♦ News has to be delivered in an effective way. Personalised multimedia summaries can be a basic means of navigating the full set of information, while items referring to the same subject have to be shown in parallel to make critical review possible.
♦ The source of information also has to be understood and profiled if the reader is to be assisted in forming an objective view of actual events. Structured information about the source has to be provided, helping the reader to understand whether the news provided could be biased and towards which side. Credibility also has to be measured in this way.
♦ Mobility is also a significant aspect to be taken into account. With mobile devices being used more and more for both the production and the consumption of news, special technologies need to be advanced into a framework that allows effective inclusion of mobile prosumers.

The MESH project was initiated with the vision of integrating the above needs into a setting that would bring the world of news closer to knowledge-enabled services. Twelve different organisations with expertise in all these diverse fields have joined forces to make personal navigation in the world of news a reality.
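The annotation, profiling and matching steps listed above can be illustrated with a minimal sketch. It is not the MESH design: the class names, the tag vocabulary, the interest weights and the 0.5 threshold are all assumptions made for the example, and annotation is taken as already done (each item simply carries a set of tags).

```python
from dataclasses import dataclass, field


@dataclass
class NewsItem:
    """A news item carrying annotations from a (hypothetical) analysis step."""
    title: str
    media_type: str                      # "text", "image" or "video"
    tags: set = field(default_factory=set)


@dataclass
class ReaderProfile:
    """Reader interests as tag -> weight; the weights are illustrative only."""
    name: str
    interests: dict = field(default_factory=dict)


def relevance(item, profile):
    """Sum the profile weights of every tag the item carries."""
    return sum(profile.interests.get(tag, 0.0) for tag in item.tags)


def match(items, profile, min_score=0.5):
    """Return the items scoring at least min_score, most relevant first."""
    scored = [(relevance(item, profile), item) for item in items]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in kept]


items = [
    NewsItem("Election results contested", "text", {"politics", "elections"}),
    NewsItem("Cup final highlights", "video", {"sport"}),
    NewsItem("Telecom merger announced", "text", {"business", "telecoms"}),
]
reader = ReaderProfile(
    "example reader", {"business": 0.8, "telecoms": 0.6, "politics": 0.5}
)

for item in match(items, reader):
    print(item.title)
```

A real system would update the weights from the reader's behaviour over time, which is the dynamic profiling the list describes; the sketch keeps them fixed for brevity.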


2. MESH targeted users

As the application scenarios that follow also show, three categories of potential MESH users have been identified:

1. Personal user: An individual everyday news consumer who may belong to any profession or business activity and wishes to use the MESH platform to access breaking or archived news in an easy and effective way.
2. Professional user: A professional who works in the area of Media and would like to use the MESH platform to make his/her work easier and more effective. This category may include journalists, photographers, reporters or any other employee in the Media sector.
3. Business user: A professional associated with the world of Media who needs to build a business using the MESH platform. This business will relate to models of electronic news services, such as a focused news agency or a thematic portal, which will make use of the MESH technologies to provide intelligent content services.

3. MESH Application Scenarios

A number of application scenarios have been authored in the format of storylines which envision the future uses of the MESH platform. Within these storylines we have identified the MESH services or features which are deemed necessary for the realisation of these scenarios. These services are referenced within the text¹ with a number enclosed in parentheses (Sx), which points into a table in the next section where every service is further explained and the MESH technical activities planned for the implementation of the service are indicated.

At the end of every scenario a summary table is added in which the user is classified as a ‘System Actor’ and his possible actions are outlined and mapped onto MESH functions and features. The information from this table has been used to produce a UML System Boundary Model for each scenario. The System Boundary Model identifies and describes the system boundaries, the actors and their responsibilities, and the main services offered by the system, and so assures a common understanding of the system and its purpose. These System Boundary Models will be further used within the User Requirement task (T6.1) to generate Use Cases, which will be described with UML sequence diagrams and/or activity graphs.

3.1. Scenario 1: Ilias Bonn – a travelling journalist

3.1.1. Storyline description

For three years Ilias Bonn has been travelling back and forth between Thessaloniki in Greece and Cologne in Germany. His Greek parents moved to Germany in the 1960s, where he was born in 1965. In early 2006 his parents returned to Greece to retire. Ilias is at home in both languages and cultures, Greek and German. Speaking these two languages (Ilias also speaks English and a bit of Spanish) is an asset in many respects. A fluent command of Greek helps Ilias understand and stay up to date with local Greek news and information, as this forms an important part of his conversations with his extended Greek family. German, on the other hand, is the language he uses at home in Cologne, with his numerous friends and in everyday life. English, in turn, is the language in which some of Ilias' work is undertaken. It also serves as the lingua franca, especially in working environments that involve people of different nationalities.

¹ In the electronic version of the document these references are also implemented as hyperlinks for the convenience of the reader.

Travel is a significant part of Ilias' life. It is of particular importance for his job as a freelance business and economics journalist and consultant. Hardly a week goes by without Ilias travelling to places like Brussels, Berlin or Greece, to name but a few. At the European CEDEFOP Institute (European Institute for the Advancement of Professional Skills) Ilias works as a reviewer and external advisor. CEDEFOP had always been of interest to Ilias when he was working as a consultant for asset management systems, but the physical distance always kept him from working for it. This has all changed, thanks to advancements in technology, and thanks to MESH! The institute only requires Ilias' physical presence for two days per month, as long as he can guarantee being "always-on", i.e. always reachable.

Consequently, it is vital for Ilias Bonn's professional life to have access to news content in various formats coming from various sources (S1). Due to his travelling habits it is equally important to him to have similar access capabilities through portable end devices (S44) that work in all locations world-wide and let him access the news he requires when he requires it. MESH is to provide for most of the needs of Ilias Bonn, and the following sections highlight this in greater detail.

Ilias Bonn writes as a specialist journalist about business and financial issues. His expertise lies in the area of telecommunications and how these influence journalistic working practices.
He regularly reports on international conferences for the European Journalism Centre (EJC) in Maastricht and is a lecturer / trainer for the Centre's e-Learning website. He also regularly publishes articles in various journals, magazines and newspapers, as well as in online media. Switzerland's Neue Zuercher Zeitung is the only medium that publishes his work in the "traditional" way, namely in printed form in the physical newspaper. However, Switzerland is very advanced when it comes to asset and content management. In 2009 the publishing house became one of the first commercial clients of the MESH platform. It soon became obvious that use of the MESH platform and system would have a fundamental impact on journalistic working practices and on the way news is consumed.

All tools that form part of the MESH platform are online-based (S4). Connectivity has been ensured as a result of the EU's i2010 strategy: "always-on" connectivity is guaranteed in all EU member states for all networked citizens, and any disruption of more than three minutes results in hefty fines for network providers. Consequently, there is hardly a place left in which one cannot connect online, should it be desired. One of the few places that still experiences some difficulties in this respect is the Thalys train that connects Cologne with Brussels. Although the trip can now be made in 70 minutes, and although satellite-based connections have been in operation on the train since 2007, the repeaters in some of the tunnels seem unable to cope with speeds above 320 km/h. The situation is altogether different in Switzerland, though. Connections even seem to improve and stabilise in the longer tunnels.

Travelling by plane, too, is no longer pure relaxation or offline time. Some passengers disconnect the VoIP functionalities of their devices, though, in order not to receive calls and constantly disturb fellow passengers.
Switching off is no problem, as calls are transcribed in real time using the MESH speech to text functionality (S17). Passengers who get dizzy when reading, however, can use the external speech synthesis modules (S28), which plug onto the MESH system and allow documents to be read out in one’s favourite voice. Time high above the clouds is also ideally suited to viewing video content, as this is much easier than reading or typing. Ever since MESH became fully synchronised with Ilias Bonn's profile (S21), even short journeys are ideal for effective video searches, as MESH automatically refines results to clips which could be interesting to Ilias (S16) for watching or reusing in his articles. Requesting licences (S29) for the material is dealt with automatically by the MESH platform. New licence requests only occur when new sources are accessed and used, or when content is accessed on a basis different from the usage models already in operation between Ilias and the various content suppliers he uses.

Security (S5) is an overriding concern and of the utmost priority. This is the case not only in trains, at airports or when passing through toll stations, but also in numerous other everyday situations such as accessing a physical work space. Here, too, MESH is at the forefront of developments. It has to compete with other strong products, mainly originating in the US and Asia. Unlike for its competitor products, independent organisations conduct a monthly data security audit of the MESH system and the data stored in it. This is a prerequisite in the EU, as data security and data privacy have become a citizen's right and form part of the European Citizen Charta. Ever since IPv6 was introduced, leading scientists have frequently pointed out the dangers that go with these advancements in technology. In particular, the permanent tracking of the movements and activities of individuals has been highlighted again and again. The same holds true for the misuse and abuse of personal profiles and data.

The "content auctions" (S27) (content sales of his copyrighted material) which Ilias conducts for international re-use generate further income for him. Through semantic analysis MESH understands the content of Ilias' submitted material and suggests it for purchase to specific MESH users according to the preferences described in their profiles. When a user expresses an interest in acquiring a news item, the purchase is negotiated with the owner, who is notified of all interested parties and can decide on his pricing policy.
Every three hours MESH provides Ilias with a personalised summary of Greek news (S30) including textual articles, videos and images from several sources. Pictures and video material from the early days of the Internet and are available free of charge from German providers. A high quality video of Germany's first true broadband on-demand video portal "Qurt", in turn, costs seven MESHros. The MESHro is the currency that is used to make transactions on a non-commercial basis (no re-selling allowed) among users of the MESH platform. This fee may increase to up to 20 MESHros if the item is accessed more than 2000 times. If individual items are requested more than 5000 times, the system and its agent propose a commercialisation of the item. The system also suggests paying a certain percentage to the operators of MESH (e.g. 10%) while, in turn, reducing Ilias' annual subscription fee for the MESH service by another percentage (e.g. 12 per cent, depending on the popularity of the item made available).

The MESH system / architecture "knows" what it is doing, as the scores its users give to individual content items are correlated with each other, providing content suppliers with valuable information about the quality and quantity of the content being offered (S31). Items that have been produced in the past and are stored or referenced in the MESH system remain in the MESH archive, accessible at all times. Every time new and related material is added, it is referenced, linked and associated (S32) with existing content. This process can be based on available metadata or on the automatic extraction of meaning (S13), done by individual MESH tools and components. Apart from using various automatic annotation mechanisms (S14), Ilias still also uses a manual annotation tool for tagging his work according to his personal judgement or for checking the automatic annotation provided by MESH (S15).
Furthermore, there are a number of specialists from his respective fields of expertise in various MESH peer groups (S19). While travelling, and in the course of conferences and physical meetings, they all greatly enjoy testing the limitations of the MESH platform and system. One of their favourite tests: finding a famous quote, a particular tune or a well-known movie scene… Often, the winner is neither MESH nor Google nor SMART-WEB, but one of Ilias' friends and colleagues!

The video search function also provides quick and quite reliable results, as both real-time and off-line analysis are used for the recognition of content based on the understanding of images, speech and text (S18). Even older material is always taken into consideration with every query, either because the personal agent has "learned" from past experiences (S20) or because other MESH users have fine-tuned and further improved existing data. MESH has become a vital part of the life of Ilias Bonn. He can no longer imagine life without MESH!
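The MESHro figures quoted in the storyline can be turned into a small worked example. The sketch below uses only the numbers given in the text (7 and 20 MESHros, the 2000 and 5000 thresholds, and the example percentages of 10% and 12%). Two things are assumptions: the storyline says the fee "may increase to up to" 20 MESHros, whereas the sketch jumps straight to that ceiling, and the revenue and subscription amounts in the usage lines are hypothetical.

```python
# Figures quoted in the storyline; the two percentages are given there as examples.
BASE_FEE = 7                  # MESHros for the video in the storyline
MAX_FEE = 20                  # "may increase to up to 20 MESHros"
FEE_RAISE_THRESHOLD = 2000    # accesses after which the fee may rise
COMMERCIAL_THRESHOLD = 5000   # requests after which commercialisation is proposed
OPERATOR_SHARE_PCT = 10       # "e.g. 10%"
SUBSCRIPTION_CUT_PCT = 12     # "e.g. 12 per cent"


def item_fee(access_count):
    """Fee per access. Assumption: the fee jumps straight to the ceiling."""
    return MAX_FEE if access_count > FEE_RAISE_THRESHOLD else BASE_FEE


def proposes_commercialisation(request_count):
    """True once an item has been requested more than 5000 times."""
    return request_count > COMMERCIAL_THRESHOLD


def settle(revenue, annual_subscription):
    """Return (owner income, operator share, reduced subscription fee)."""
    operator_share = revenue * OPERATOR_SHARE_PCT / 100
    reduced_subscription = annual_subscription * (100 - SUBSCRIPTION_CUT_PCT) / 100
    return revenue - operator_share, operator_share, reduced_subscription


print(item_fee(1500), item_fee(2500))
print(proposes_commercialisation(5001))
# 1000 MESHros of revenue and a 200 MESHro subscription are hypothetical inputs.
print(settle(1000, 200))
```

Keeping the percentages as integers and dividing by 100 only at the end keeps the arithmetic easy to check by hand.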

3.1.2. Summary Table

Ilias Bonn is a freelance business and economics journalist and consultant. He travels frequently all over Europe and usually works in three different languages. | Professional user

He needs MESH to:

Have access to news content in various formats coming from various sources. | News central point of access
Have similar access capabilities through portable and desktop devices. | Mobile access
Real time speech to text is used to transcribe calls or broadcast news so that bystanders are not disturbed. | Speech to Text
Text to speech modules are used for listening to textual news items spoken by Ilias' favourite newscaster. | Text to Speech
MESH refines search results, predicting Ilias' preferences according to his profile. | Profile based search results filtering
MESH suggests content according to his profile. | Profile based syndication
MESH automatically advises on acquiring licences for content Ilias requests to use in his own articles. | Use of access rights / acquiring licences
Strict security mechanisms on the MESH architecture forbid misuse of profile and copyrighted content. | Secure platform access
Ilias can submit his own news content for sale to other interested MESH users. Users express their interest and Ilias can select his pricing policy and buyer, e.g. he may choose to grant exclusive use to a news agency or allow reading by any user for a small fee. | Content syndication / auctioning
Every three hours MESH provides Ilias with a personalised summary of Greek news. | Personalised news summaries
Ilias can pay electronically using the MESHros virtual currency accepted among all MESH users. | Virtual e-payment
All material connected to MESH is cross linked according to previously existing metadata or based on annotation automatically performed through the MESH multimedia semantic analysis modules. | Delivery of cross-linked material
Ilias can also manually annotate the content he connects to MESH using the MESH desktop or mobile manual annotation tool. | Manual annotation tool on desktop or mobile
MESH encourages community building among peers who share common interests, based on information from their profiles. | Community building
MESH provides effective video search modules which return results by analysing visual, spoken and textual information. | Relevance based search
The MESH modules learn from how Ilias has used the system in the past, always trying to provide the content most suitable for him in the ways he has shown he prefers to receive it. | Dynamic profile update

3.1.3. System boundary model

[Figure: UML system boundary model for Scenario 1. Actor: Professional User. MESH System use cases: Logs in securely; Accesses all news centrally; Subscribes to MESH services; Receives personalised news; Receives personalised news summaries; Searches by relevance; Receives personalised search results; Listens to written news; Reads spoken news; Manually annotates news items; Pays in MESHros; Sells news items; Acquires content licenses; Registers in communities groups.]


3.2. Scenario 2: Martha Jong – a retired doctor

3.2.1. Storyline description

In the late 1960s Martha Jong went to the Afghan-Pakistani border region for the first time and worked as a doctor in a temporary (military) hospital in a peaceful part of this region. For the next 35 years she spent her annual vacations as an eye specialist in that same region and, during the last two years of her professional life, worked permanently for an NGO in Pakistan. Today, hundreds of diaries, thousands of photographs and a video archive with tapes in various formats such as VHS, miniDV and HDV100 are piled up in the cellar of her London house. She had learned to appreciate the advantages of the internet early on and corresponded online with her colleagues, patients, friends and the administration in Pakistan.

Since 2007, almost nothing has been published about this region, although many lives fell victim to terror there (especially in the last few years, when there was a proliferation of violence). Martha sometimes wonders whether it was all worth it. Things have not really improved much to this very day. Rather the contrary: now, in 2010, Martha can no longer travel to the area for safety reasons. Even so, she spends long hours trying to find the truth as to what is really going on in Pakistan … for herself, and for others.

Usually, she has to find and collect all the desired information herself. She does so by accessing various sources through the MESH platform, which provides her with an overview of most news in multimedia summaries (S1). However, she cannot always really trust what is presented in this way. Only a few broadcasting organisations and agencies still have correspondents in the region. A lot of information is being sent out via blogs, video-blogs, newsletters, e-mail communication and the like. Pakistani broadcasting stations and the news of the allied occupying powers can only partially be trusted.
Martha often receives photos of people and landscapes and information which seems out of date or even manipulated. She has meanwhile digitised the diaries and her entire audiovisual material and published it online. Many of her former colleagues in Pakistan have done this, too. Some have gone so far as to add newspaper articles about Pakistan to hers. Martha gives the MESH platform access to her notes via a web service and permission to also use her audiovisual material as reference material (S6). As a trusted source of information on the situation in Pakistan, she has reached a point of valued credibility (S43). Some of her former colleagues, especially the engineers who helped in building the infrastructure, also possess lots of material.

In the case of questionable news from agencies, Martha instructs MESH to carry out a deep-linking (trust levels 1 to 3) with the blogs and sources (S12). In doing so, photo and video material in particular can usually be shown to have been manipulated. Martha's photographs of buildings, people or landscapes often appear as search results. In such cases, Martha examines whether the material is genuine or has been manipulated. The research results are added to the metadata of pictures and videos.

Broadcasting organisations such as the BBC and DEUTSCHE WELLE, as well as various publishing houses, have already worked with Martha’s archive material. MESH organises and regulates this by means of usage rights (S33). Simultaneously, usage fees are paid. Big media companies often use an interface connected to the MESH system or are part of the MESH platform, so that personalised and automated deep linking can be effectively used in the comments on the contributions that have been added.

In the beginning, MESH experienced some minor problems with news summaries and creating links between them. At that time, Martha resided in Spain and looked after her daughter's children, leaving her with little time for her Pakistani interests.
In order to resolve this problem and to assist Martha further, MESH is now searching for more Spanish sources related to the subject (S11). Highly important information is sent directly to Martha’s PDA (S21), and while she is travelling by plane she is provided with relevant summaries (S30). These are usually long enough to last the duration of a four-hour flight. Summaries are read out to her, of course, as Martha has never been able to read in an airplane without becoming travel-sick (S28).

3.2.2. Summary Table

Martha Jong is a retired doctor interested in news content surveys from current and past archives. | Personal user

She needs MESH to:

She needs a central point of access to various sources of information, including personal web logs. | Interface to external news services; News central point of access
She needs summarised delivery of news found in various formats. | Multimedia summaries
She needs a way of assessing the credibility of sources she receives news from. | News source profiling
She needs to be able to follow links between content concerning similar topics. | Delivery of cross-linked material
She wants usage of her content to be controlled by access rights. | Content access rights
She needs to be able to query in one language but receive results in many languages. | Cross-language queries
Sometimes she needs delivery of content to her mobile device. | Real-time news summarisation on mobile devices

3.2.3. System boundary model

3.3. Scenario 3: John Clark – a news correspondent

3.3.1. Storyline description

John Clark is an independent journalist writing articles for several news organisations. To maintain adequate contact with past and current news events he is subscribed through the MESH platform to several news libraries (S3). Today he is sitting in his office at home trying to put together a review article on developments in the Middle East over the last decade. Before engaging in any work, he submits a query in his natural language through his home internet connection to the MESH platform (S22). The system queries all connected news organisations and a distributed semantic search is initiated in all multimedia (text, image or video) material residing in the remote libraries (S1). Although the query is submitted in English, the search engine adapts it to the language of the material queried (S11) and returns all available results in their original languages.

The platform assembles all available multimedia material that satisfies the query and categorises the content, identifying the fragments that refer to the same event and comparing them, so as to deliver the results in a structure that identifies similarities in content. All material is assessed according to the credibility of its source at the time the article was composed, as the platform utilises an intelligent source credibility measure which depends on past user satisfaction as well as on expert opinions (S43). John is then given the result of his query with links to all relevant material (S12). To help in accessing the content, the query results are clustered according to thematic content (S10); for each cluster a core overall multimedia summary is presented, together with a list of the sources it has been produced from, information about the differences between those sources and a short text summary of each source.
Organisations that John is subscribed to also offer expanded multimedia summaries and access to the original sources; in other cases he is offered a highlights excerpt and has the option either to subscribe to the organisation or to purchase access to this article only. All payments are automatically handled by the platform upon John’s approval, or cleared through his monthly subscription (S31). John marks the material he selects to download, and the platform returns it in the preferred format according to his profile (S21).

When John finishes his article he considers several alternative ways of marketing his work (S34). He can easily pass it to his preferred newspaper at the usual price he has agreed, by uploading it through the MESH platform to his folder in their private space. However, since this is a special piece of work, he decides to make it available through the platform to the highest bidding organisation. So he sets the deadline until which bids are to be collected, and either uploads his own summary, which will be viewable by all interested organisations, or instructs the platform to automatically compose a short summary. At the deadline he will log in, view the bids and decide, considering the price offered and the prestige of each organisation. He could also decide at that point to make the article available to the general public for a small fee per reader, if he considers this scheme to be more beneficial (S34).

Actually, last week when he was in Latin America covering a hot story he preferred to make the article available to the public for just a small fee. He realised he made much more money this way, as an amazing number of readers wanted to read the story as soon as it happened. He managed to be the first journalist to produce a full article, as he wrote the story on his mobile device, which used his favourite template and made the article immediately available through the MESH platform.
A simultaneous search for relevant pictures (S8) on the platform allowed him to purchase and include in his article a couple of photographs taken by a photographer who was covering the same event. The photographer had taken the photographs just a few minutes earlier, annotated them with his mobile phone (S45) and made them available through the platform. John did not even have to upload the full-resolution copies. Browsing through the thumbnails, he decided which ones he wanted to include and instructed the platform to insert them into his article according to his selected template, which had an optional placeholder for multimedia material.
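As an illustration only, the source credibility measure mentioned in the storyline (S43) could be sketched as a weighted blend of past user satisfaction and expert opinion. The class and function names, the [0, 1] rating scale, the neutral default for sources without feedback and the 0.6 expert weight are all assumptions made for this sketch; the scenario does not specify how the measure is computed.

```python
from dataclasses import dataclass


@dataclass
class SourceFeedback:
    """Hypothetical record of the feedback collected for one news source."""
    user_ratings: list[float]    # past user satisfaction, each in [0, 1]
    expert_ratings: list[float]  # expert opinions, each in [0, 1]


def mean(values: list[float], default: float = 0.5) -> float:
    """Average of the values, or a neutral default when no feedback exists yet."""
    return sum(values) / len(values) if values else default


def credibility(feedback: SourceFeedback, expert_weight: float = 0.6) -> float:
    """Weighted blend of expert opinion and user satisfaction (weight is assumed)."""
    if not 0.0 <= expert_weight <= 1.0:
        raise ValueError("expert_weight must lie in [0, 1]")
    users = mean(feedback.user_ratings)
    experts = mean(feedback.expert_ratings)
    return expert_weight * experts + (1.0 - expert_weight) * users


# One source rated by two readers and one expert.
agency = SourceFeedback(user_ratings=[0.8, 0.6], expert_ratings=[1.0])
score = credibility(agency)
```

Because the score is recomputed from whatever feedback is available when it is called, it reflects credibility at the time the article is composed, as the storyline requires.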

3.3.2. Summary Table

John Clark is a journalist working as a news correspondent. | Professional user
He needs MESH to:
Have a central point of access to various libraries of news content (as large archives of established news organisations) | News central point of access
To be able to submit natural language queries in his native language and discover content in other languages as well | Cross-language queries
To receive similar material in a cross linked manner, having their similarities identified. | Delivery of cross-linked material
To be able to follow links between content concerning similar topics | Delivery of cross-linked material
To have a way of assessing the credibility of sources he receives news from. | News source profiling
To have usage of his content controlled by access rights | Content access rights
To be able to market his content based on various business and pricing models | Content marketing
To be able to compose articles on his mobile device | Article composition on a mobile device
To be able to use his mobile device for capturing and annotating pictures and video | Annotation on a mobile device
To be able to search for material relevant to the work he is preparing | Relevance-based search

3.3.3. System boundary model

[Figure: UML system boundary model of the MESH System. Actor: Professional User. Use cases: Accesses all news centrally; Logs in securely; Subscribes to MESH services; Searches by relevance; Submits cross-language queries; Checks news source profile; Manually annotates on mobile; Composes article on mobile; Sells news items; Assigns access rights.]

3.4. Scenario 4: Claudia – a business analyst

3.4.1. Storyline description

Claudia switched on her PDA as she boarded the train to work in the morning. Claudia is an investor, and the first thing she checks every day is the latest news on the stock market, along with any developments in the financial sector. Her profile is described along these lines on the MESH platform (S21), and the first page she sees in her browser is a compiled table of viewing recommendations (S32) extracted from the news material generated since her last login, together with automatically generated expanded versions of material she accessed in previous sessions and considered interesting to track (S23). While watching content pieces, she can skim over uninteresting ones by dynamically selecting ‘summary-mode’, in which the material is excerpted on the fly (S36). From time to time references to related information appear; the system adds the ones Claudia feels are worth checking to a navigation folder for later review (if a link catches her attention, Claudia may follow it immediately and later return to the segment she was viewing). Some of the longest content pieces are excerpted so that only the portions relevant to Claudia’s interests are shown (S36). She also has a number of favourite financial analysts (S37). Every time one of them publishes an article, either directly on the MESH platform or through a news organisation, Claudia is given the link to it on her browser’s first page. She has marked some columns as critical, and the moment they are published she is also sent an alert. Although Claudia described her profile in very general terms (S24) when she first subscribed to MESH, the platform has since refined her preferences by observing what she reads and the article assessments she fills in from time to time (S25).
However, if she can afford the time, after going through the material suggested by the platform she also does a more thorough search, in case something else happens to interest her as well. Quite often she also chooses to purchase specific pictures or video clips to use in her reports and presentations. In some cases they are even free of charge for non-commercial use. Claudia repeats her reading several times through the day, either from her office on her desktop computer or remotely through her mobile device. The MESH platform immediately detects her terminal device and adapts to the appropriate preferences.
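The dynamic profile update described in this storyline (S25) can be pictured, purely as a sketch, as nudging per-topic interest weights towards the assessments Claudia gives to the articles she reads. The exponential-moving-average rule, the function name, the 0.5 starting weight for unseen topics and the learning rate are assumptions for illustration, not the MESH design.

```python
def update_profile(profile: dict[str, float],
                   article_topics: list[str],
                   rating: float,
                   learning_rate: float = 0.2) -> dict[str, float]:
    """Return a new profile whose topic weights move towards the reader's rating.

    profile        maps topic -> interest weight in [0, 1]
    article_topics topics attached to the article that was just read
    rating         the reader's assessment of the article, in [0, 1]
    """
    updated = dict(profile)  # leave the caller's profile untouched
    for topic in article_topics:
        old = updated.get(topic, 0.5)  # unseen topics start from a neutral weight
        updated[topic] = (1.0 - learning_rate) * old + learning_rate * rating
    return updated


# A profile declared in very general terms, then refined by one positive assessment.
profile = {"finance": 0.5}
profile = update_profile(profile, ["finance", "stock market"], rating=1.0)
```

A small learning rate keeps the manually declared profile (S24) dominant at first and lets observed behaviour take over gradually.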

3.4.2. Summary Table

Claudia is a business woman constantly on the move. | Personal user
She needs MESH to:
Receive news according to her reading preferences | Personalised content syndication
To have her preferences understood by the articles she selects to read | Dynamic profile update
To be able to manually define her preferences from time to time | Manual profile editing
To be instantly informed of news that matter to her in a summarised manner on her mobile device | Real-time news summarisation on mobile devices
To be able to read material in a quick and easy way on her mobile device | Easy news reading on a mobile device
To be able to easily find news items according to her interests in subject, authors, or sources | Relevance-based search

3.4.3. System boundary model

[Figure: UML system boundary model of the MESH System. Actor: Personal user. Use cases: Accesses all news centrally; Subscribes to MESH services; Logs in securely; Receives personalised news; Receives personalised news summaries; Searches by relevance; Edits Profile.]

3.5. Scenario 5: Bill Jones – building a news agency

3.5.1. Storyline description

Mr. Jones has been in news publishing for quite a few years, working for several organisations, usually on the editorial board. However, he was never satisfied with his financial rewards, even though he worked very hard and had developed a great intuition for how to market a good story. He had been looking into starting his own business in the area for quite some time, but the necessary investment was prohibitive and the associated risk too high for him to take. Recently he decided to try building a virtual news agency through the MESH platform, following the example of a former colleague. Using acquaintances he had made with several journalists, photographers and cameramen over the past few years, he struck deals with several of them, covering diverse topics, so that they will provide stories in their areas of expertise on a close-to-daily basis. It was quite easy to build his own secure space on the MESH platform (S2), which his associates will use to find suggested templates and upload material. Of course, he will also be able to purchase material published every day on the platform (S31) by independent journalists.

Having defined his suppliers, the next step was to identify his potential consumers. Again using his contacts in the world of media, he made key people aware that he will be providing daily material through the MESH platform on a number of areas of interest to them. He also announced the launch of his portal to relevant MESH societies (S19) and managed to attract several subscriptions for immediate access to published material in their areas of interest. Others promised to keep an eye on his portal on the MESH platform and maybe purchase particular news items, or decide later on a subscription that suits them.
By advertising his virtual news agency on the home page of MESH he may also attract the general public either to purchase access to particular stories or to build their own electronic newspaper (S40) based on their preferences (S21). All mechanisms are centrally provided by the platform and have been built to be easily mastered by any individual news reader. Special press review services are also available, using the technology provided by the MESH platform through specific modules, at an extra cost which is however mitigated for the agency’s clients. These reviews may involve a Market Analysis for a specific area, which may include new products, entertainment news (such as new films, books, etc.), or even an analysis of market competitors (based on published news material) (S40).

Mr. Jones also decided to appeal to younger audiences by providing personalised e-papers to students and young researchers who define their interests around particular thematic areas like science or environmental issues (S24). This will also function as a dynamic personalised encyclopaedia, delivering film clips, news items or documentaries in an organised manner according to the readers’ interests. Strict legislation, however, imposes limitations on the content MESH may deliver to particular age groups. This is handled by the rate-it MESH plugin, which rates multimedia extracts into predefined categories according to the content involved (S41) (e.g. violence or adult scenes).
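To make the age-group restriction concrete, the following sketch filters rated items against a reader's age. It assumes that a rating step has already attached content categories to each item; the category names follow the examples in the storyline, while the age thresholds, the function name and the sample items are hypothetical and not taken from any legislation or from the rate-it plugin.

```python
# Hypothetical minimum reader age per content category (assumed values).
MIN_AGE_BY_CATEGORY = {"violence": 16, "adult": 18}


def allowed_for(reader_age: int, categories: set[str]) -> bool:
    """True if the reader is old enough for every category attached to an item."""
    return all(reader_age >= MIN_AGE_BY_CATEGORY.get(cat, 0) for cat in categories)


# Each item carries the categories a rating step is assumed to have assigned.
clips = [
    ("Science documentary", set()),
    ("War report", {"violence"}),
    ("Late-night film clip", {"adult"}),
]

student_view = [title for title, cats in clips if allowed_for(15, cats)]
adult_view = [title for title, cats in clips if allowed_for(30, cats)]
```

Categories without an entry in the threshold table are treated as unrestricted here; a real deployment might prefer to block unrated material instead.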


3.5.2. Summary Table

Bill Jones is a media professional setting up a business using the MESH system. | Business user
He needs MESH to:
Provide secure access to file space where he can store news content | News content repository
To be able to purchase news items by individual journalists | Content purchasing
To deliver content in a personalised way to different subscribers according to their profile | Personalised content syndication
To be able to define different news delivery services for subscribers | Personalised content syndication
To be able to produce thematic reviews using the MESH modules | Preference based content filtering
To be able to rate delivered material according to the content and audience | Content rating

3.5.3. System boundary model

[Figure: UML system boundary model of the MESH System. Actor: Business User. Use cases: Accesses all news centrally; Logs in securely; Subscribes to MESH services; Pays in MESHros; Acquires content licenses; Sells personalised news summaries; Sells personalised news; Accesses personal content repository; Rates MM content; Advertises to Target Groups.]

3.6. Scenario 6: Young journalist and blogger

3.6.1. Introduction to the scenario

Studies of news consumption show that younger people do not trust traditional news providers, among them large newspapers and TV networks. Instead they search for information on the web, more and more every day. Recent surveys in Spain have shown that at least 1 in 5 people check for news on the web on a daily basis. This can be seen as a simple change in the channel through which information flows, but it is only one part of the changes that have taken place. The information sector is also changing in another important direction, namely the appearance of small independent websites, many of them providing news. These range from tiny gossip sites (which can nevertheless be very influential, as Monica Lewinsky's case showed) to well-maintained, more serious sites. A key development in this area has been the emergence of web services that let viewers comment on and contribute to the sites; this has been called "blogging". Basic aspects of these services are the sharing of news among the different sites via efficient syndication mechanisms, and the ease of use of the services for publication, dissemination and consumption of information. The information is not only provided as traditional text or HTML. Very often these sites also provide links to multimedia repositories, like youtube.com and many others, which make it easier for many people to upload content that can be considered news and that is, in some cases, much better than traditional professional news material.

With all these advances one might think that information is easier to find, but this is not the case, as the information is in fact more dispersed. Rather than finding the news in your favourite newspaper, you now have to search the many sites providing information. Many readers value exactly this, as they get many different views on the subject they are seeking information about.
But of course this brings the complexity of searching, verifying and filtering the information according to personal preferences and interests. Another complex aspect, and a key part of modern life, is mobility. Nowadays you do not expect to have to stay sitting at your office desk to have all the information you need. With all kinds of mobile terminals, networks and services, you expect to have the information you previously only had on a fixed computer available on the move, using several kinds of mobile devices: PDA, laptop, mobile telephone.

MESH tries to provide new services in this context, by providing the following functionalities:
- User as both producer and consumer of information.
- Mechanisms for easy syndication of different, heterogeneous sources, including multimedia material.
- Optimized search mechanisms, with relevance feedback, rule-based filtering, and optimization of search effectiveness even when the search is poorly defined.
- Personalisation of search and filtering.
- All in a mobility context, i.e., not only working on mobile terminals, but also with mechanisms adapted to working on the move (with fewer resources, less time and possibly a complex context). In other words, MESH should be able to provide lightweight mechanisms that deliver results even in constrained contexts, as is usually the case when on the move.
- All with assurance of control over privacy and intimacy.

These are some of the themes inspiring the following user scenario:

3.6.2. Storyline description

Teodora is a young journalist, daughter of the director of a well-known newspaper. She is well aware of the changes that are happening in the information sector, and has therefore set up a personal portal, teodoranews.com, with the help of the MESH platform (S40). Her site is growing quickly in number of users, currently averaging nearly 16000 visits per day, thanks to an intelligent use of the MESH blog integration module (S42), links to many other small independent sites, easily located in a navigation bar at one side of the page, RSS feeds, and many activities for syndicating and aggregating information (S42), for instance joining different information sources in her single portal. Teodora earns some money from advertising on her site, but she also works at a traditional TV channel, so she works full-time in the news sector. She would not be able to carry out any of her duties without the internet. She starts her day by consulting the MESH platform, which uses feeds from news.google.es, highly personalized to her own requirements (S20), as her main area for the TV channel is technology-related news, with a strong focus on the social changes that follow from these new technologies. As she reviews and filters these news items, she pays more attention to the most interesting ones. MESH captures her attention patterns and makes them available to those relatives (and customers) of hers who consider that she has some authority in her fields of interest. She then consults the news produced by her colleagues and friends (S21) using MESH, which retrieves summaries and filters large amounts of information to give a quick view, so that she can post news on her personal website before she leaves for her work at the TV channel. She downloads the summaries to a PDA, with all the attached information that allows her to filter and personalize them.
Very often she is in charge of a particular news programme for the TV station, and she has to prepare all the information under immense time pressure. The main problem nowadays is finding the exact information you are looking for, which is all the more difficult the less structured or less well known the theme you are searching for. For this purpose, she uses MESH as a search mechanism with highly optimized algorithms tuned to her preferences (S21). So that she can pay attention to the most relevant news items related to the topic she has to cover, MESH lets her benefit from the expertise of her community of correspondents by letting their own attention patterns filter out those news items that they did not consider worth paying attention to. When she is travelling to the TV channel studios, she consults the MESH summaries downloaded to the PDA. Also, whenever she has not found the expected information using her home computer, she uses the MESH mobile service, which is less powerful but more efficient on the move (S44).

The MESH system is at the centre of her work and she has a lot of control over many of its parameters, which allows her to optimize her use of the system. She can set her preferences manually and change them at any time (S24). These preferences configure rules, and she has control over the way the preferences are translated into filtering rules. She can also modify or update the filtering using examples. For instance, if she finds that the filtering has removed news that is important to her, she can instruct the system so that the filtering rules are changed accordingly. The MESH platform also provides suggestions for particular themes based on previous search history (S25). MESH stores the search history, protecting it with the same privacy mechanisms that are used for maintaining the user preferences within a secure service (S5).
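Updating the filtering "using examples" could, under simple assumptions, look like the sketch below. It treats the filtering rules as a set of blocked topics derived from the user's preferences and relaxes them when the user points to an item that should not have been removed. The topic-set representation and both function names are hypothetical; the scenario does not say how MESH represents its rules.

```python
def passes(item_topics: set[str], blocked_topics: set[str]) -> bool:
    """An item passes the filter when it shares no topic with the blocked set."""
    return not (item_topics & blocked_topics)


def unblock_by_example(blocked_topics: set[str],
                       wanted_item_topics: set[str]) -> set[str]:
    """Relax the rules so that an item the user flagged as important would pass."""
    return blocked_topics - wanted_item_topics


blocked = {"sports", "celebrity"}          # rules derived from preferences (assumed)
missed_item = {"technology", "celebrity"}  # item the user says was wrongly removed

before = passes(missed_item, blocked)      # the item is filtered out at first
blocked = unblock_by_example(blocked, missed_item)
after = passes(missed_item, blocked)       # the same item now passes
```

This correction is deliberately coarse: it unblocks a whole topic on the strength of one example, whereas a finer rule language could keep the topic blocked except in combination with the user's core interests.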

MESH allows for control of the recall/precision trade-off depending on the scope of the search. She controls this trade-off through a single measure of retrieval effectiveness (S21). When she is on the move, the MESH system detects this context and adapts its algorithms to present more precise information (S7), using stricter filtering rules. This way, the results provided by MESH are better suited to a less than optimal context. As an additional measure for the mobility condition, MESH applies screening mechanisms (S36) instead of a full-blown filtered and optimized search (S26), so that good-quality results can be provided quickly. This is especially effective when the user has little time to consult much information and thus requires very quick input of good, if not optimal, quality.
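The scenario does not name the single measure of retrieval effectiveness. One widely used candidate is the F-beta measure, which combines precision and recall and lets one parameter shift the balance between them; the sketch below uses it purely as an assumed stand-in, with hypothetical document identifiers.

```python
def precision_recall(retrieved: set[int], relevant: set[int]) -> tuple[float, float]:
    """Precision and recall of a result set against the known relevant items."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta: beta < 1 weights precision more heavily, beta > 1 weights recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


relevant = {1, 2, 3, 4}
strict = precision_recall({1, 2}, relevant)                  # few results, all relevant
broad = precision_recall({1, 2, 3, 4, 5, 6, 7, 8}, relevant) # everything found, plus noise

# On the move (precision matters more), the strict result set scores higher.
mobile_prefers_strict = f_beta(*strict, beta=0.5) > f_beta(*broad, beta=0.5)
# At the desk (recall matters more), the broad result set scores higher.
desk_prefers_broad = f_beta(*broad, beta=2.0) > f_beta(*strict, beta=2.0)
```

With such a measure, the stricter filtering described for mobile use corresponds to choosing a beta below 1, while a broad desktop search corresponds to a beta above 1.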

3.6.3. Summary Table

Teodora is a media professional building a part-time business by syndicating news in an intelligent way. | Business user
She needs MESH to:
Build a portal to aggregate external news sources (as RSS feeds, blogs, etc.) and syndicate that content to her subscribers | Interface to external news services
Search for news matching her interests which are automatically understood by MESH | Preference based content filtering
Be able to also manually define or amend her automatically created profile | Manual profile editing
Consult the communities she is subscribed to, for the latest developments in their common fields of interest | Community building
Search for material for the news programme she is preparing on a daily basis | Relevance-based search
Receive personalised summaries of news content on her mobile device | Real time summarisation
Automatically receive content in the most effective way, as depth of information is automatically optimised against required response time | Optimised delivery of content


3.6.4. System boundary model


4. MESH list of Internal Services

MESH Service | MESH activity | MESH Activity Title | Description of service implementation

Section 1: Platform Infrastructure

(S1) Summarised delivery of news content in various formats coming from various sources - news central point of access | S2.2.1 | Real-time analysis | Real time analysis will allow access to a variety of news sources which have not already been indexed
 | S4.2.1 | Real-time summary generation | Real time summarization will support the dynamic and automatic creation of quick summaries or previews of video data using the resulting descriptors derived from the real time analysis subtasks (which work directly over compressed video data mainly focusing on operation efficiency)
 | S4.1.2 | Usage Environment Adaptation | Adaptation engine will allow for the proper delivery of various content formats according to the user device
 | S6.3.1 | Overall MESH architecture & client-server model | MESH will provide an open architecture interfaceable to various types of news sources
(S2) Scalable file space with secure access | S6.3.1 | Overall MESH architecture & client-server model | A platform content repository will exist for MESH multimedia news content
(S3) Access of external library material | S6.3.1 | Overall MESH architecture & client-server model | The MESH architecture will allow connection of external multimedia repositories
(S4) On-line access of all MESH platform tools | S6.3.1 | Overall MESH architecture & client-server model | MESH will be a web-based platform
(S5) Secure access of content | S6.3.1 | Overall MESH architecture & client-server model | Existing security technologies will be used to ensure secure access of content and protection of user profile information
(S6) Web services that access content in external news services as blogs | S6.2.3 | Identification of external services | Existing news feeds and services will be identified
 | S6.4.3 | Interface to external services | A selection of existing news feeds and services will be interfaced to the platform

Section 2: Understanding News

(S7) Context detection for content adaptation | S5.5.2 | Reasoning tool implementation | Automatic algorithms for the detection of changes in the context.
 | S5.4.1 | User profile modelling | Users will be profiled to match news to users according to the context and user profile
 | S5.4.3 | User profile exploitation | Users will be profiled to match news to users according to the context and user profile
 | S5.1.1 | New terminal features | Adaptation of the algorithms on the mobile devices
Context modelling | S5.5.1 | Context modelling for reasoning | The context will be modelled for use by the reasoning algorithms
(S8) Searching based on news content | S4.4.3 | Hybrid query processing | Processing of complex queries based on the multimedia news content
(S9) Semantic disparity map of news content | S4.3.1 | Semantic interrelationships and automatic disparity maps | Comparison of news content to extract core meaning and identify relations and differences
 | S3.3.2 | Reasoning module |
(S10) Thematic categorisation of news items | S2.2.3 | Training and Learning Approaches for Multimodal Visual based Classification | Clustering of news in different classes based on visual descriptors
 | S4.3.2 | Media clustering and sample selection | Media of the same topic will be clustered and a representative sample will be selected
 | WP2 | Knowledge Modelling & Semantic Multimedia Analysis | WP2 will provide annotations, which can include categorisation of content. However, WP2 provides only the annotation and not the query processing as described in the scenario. These functions will be limited in the MESH application domains.

(S11) Cross-language search mechanisms | S4.4.1 | Multilingual and Natural Language queries | Full blown cross-language retrieval via query translation has not been planned, but as indicated in subtask 4.4.1, a possible alternative approach is to use the conceptual annotation of content as pivotal representation. As limited resources have been allocated for cross-lingual search functionality, experimental research is needed to decide which of the known approaches can be applied successfully given the available resources.
(S12) Cross linking of different material | S3.3.2 | Reasoning module | Application of rules to relate content items
 | S4.3.3 | Generation of semantic and visual indexes and tables of contents | Related contents will be linked. A multimedia news navigation environment (“multimedia mesh”) will be built. This requires all the work in T4.3 and will be based on metadata (from WP2, T4.2 and other subtasks in T4.3)
 | S4.3.4 | Media and metadata linking | Requires all the work in T4.3 and will be based on metadata (from WP2, T4.2 and other subtasks in T4.3)
 | T3.3 | Server-side reasoning | The Reasoning module will be utilised taking various parameters into account, as credibility of sources
(S13) Automatic extraction of meaning | S2.5.1 | Hierarchical ranking of events and highlight detection | Event detection in videos based on perceptual relevance; Highlight detection based on knowledge from the expert.
 | S2.2.1 | Real-time analysis | The real time extracted descriptors will be used for extracting some quick meaning for the sequences (in fact, this meaning will also be used in the real time summarization: S4.2.1 subtask). The meaning extracted in this task is limited to some material creation procedures: shots, camera motion or some rough understanding of scene content (e.g. are there persons in the sequence? are there some important objects?). Moreover, the textual information extracted from videos need not be limited exclusively to speech transcription; character recognition can also be performed over the video sequences (which is particularly useful in this domain where text usually summarizes a piece of news content). Current state of the art algorithms for video analysis focused on caption extraction will be implemented and refined for this task

 | S2.5.2 | Automatic key-word recommendations | Syntactic analysis of metadata will be used to generate keywords that can be verified by human annotators in a semi-automatic annotation process
 | T2.5 | Hierarchical semantic highlights detection and automatic key-word recommendations | Both S2.5.1 and S2.5.2 will be used for this process
 | T2.6 | User annotation, relevance feedback and corrective annotation | T2.6 is a specific task where users can edit – correct – insert annotations based on the results of the automatic analysis
 | WP2 | Knowledge Modelling & Semantic Multimedia Analysis | This is the objective of the whole WP2 using output from various Tasks depending on the application domain. The extent and accuracy of automatic annotation will depend on the specific domain and application
Automatic annotation of content | WP2 | Knowledge Modelling & Semantic Multimedia Analysis | The whole WP2 works towards semantic analysis of multimedia content for its (off-line) automatic annotation in the selected domains
(S14) Metadata harmonization | T3.4 | Semantic harmonization & heterogeneity | Mapping, Integration into upper models in the selected domains
Metadata mapping | S3.4.1 | Ontology mapping | Limited in the selected domains
Metadata Reasoning | S3.3.2 | Reasoning module | Reasoning engine for temporal structures
(S15) Manual annotation of multimedia content | T5.3 | Scalable visual analysis and annotation in mobile environments | Manual Annotation tool for mobile devices
 | T2.6 | User annotation, relevance feedback and corrective annotation | Manual Annotation tool for desktop devices
(S16) Personalised search results | S2.2.1 | Real-time analysis | The real time extracted descriptors will be used for carrying out coarse (first level) searches (browsing), where not much detail or precision is required
(S17) Speech to text | S2.3.1 | Speech recognition | Large vocabulary speech recognition. In MESH speech recognition will in large part be independent of user queries, or in other words: an off-line process. UT aims to work also on incremental speech indexing, to cover dynamic news wires. But the main focus will be on the offline process.
 | S4.1.2 | Usage Environment Adaptation | An adaptation module will be integrated in the adaptation engine utilising technology for the speech to text algorithms from WP2.
 | S2.3.3 | Speech and text combination and interface to multimedia knowledge | Mapping onto the ontology will be provided
(S18) Video indexing | WP2 | Knowledge Modelling & Semantic Multimedia Analysis | The whole WP2 works towards semantic analysis of video based on the understanding of images, speech and text in the selected domains

Section 3: ‘Understanding’ Users

(S19) Community building | S5.4.3 | User profile exploitation | Social network analysis and user preference clustering will be performed. Users will be able to explicitly declare their groups of friends but communities may also be automatically generated based on user interest similarity.
(S20) Long term learning for adaptation of user profile | S5.4.2 | User profile acquisition | Learning by experience methods for long-term evolution of the user profile
(S21) Representation and exploitation of user’s preferences for content adaptation | S5.2.1 | Efficient representation of knowledge structures | Representation of user preferences in the profile.
 | S5.2.3 | Profile and knowledge management tools | Contextual adaptation (user profile management) of preferences
 | S5.4.3 | User profile exploitation | Algorithms will be developed for personal presentation of news content. Use of filtering techniques will exploit user profiles and context to personalize content retrieval in a dynamic and contextual way. Developed techniques will allow matching of user profile against content metadata, source description, etc.
 | S5.4.1 | User profile modelling | Ways to formalise a user profile; Use of ontologies to structure the concepts and relations that represent user interest. User profile editing tools for manual profile update. The level of complexity to be expressed will have to be agreed
 | S4.4.2 | Relevance measures and ranking algorithms | Combination of preference-based measures with other criteria for content ranking
Proactive recommendations | S5.5.1 | Context modelling for reasoning | Context modelling and reasoning will be used to push the right content to the right users
Proactive recommendations | S5.5.2 | Reasoning tool implementation |
Effectiveness of recommendations. Flexible degree of personalisation | S5.4.3 | User profile exploitation | Dynamic balance mechanisms will adjust the degree of personalization depending on contextual conditions. Users may set the expected precision and recall manually. Criteria will be defined for when and to what degree it is appropriate to personalize
(S22) Unstructured search queries in natural language | S4.4.1 | Multilingual and Natural Language queries | Full text monolingual retrieval for three languages will be used as a baseline
(S23) Short-term preference learning | T4.5 | Usage resolution, statistics & event reporting | Different granularity levels will be implemented for preferences learning
(S24) Manual edition of user profile | S5.2.3 | Profile and knowledge management tools | A profile management editing tool will be implemented
(S25) Dynamic update of user profile | S5.4.2 | User profile acquisition | Learning mechanisms will be implemented
 | S5.4.2 | User profile acquisition | Long term analysis of access history to infer and update user interests. Will also take into account the different dimensions (long-term vs session interests)
 | S5.4.2 | User profile acquisition | The task will build mechanisms to update the user preferences based on some implicit user behaviour. A method will need to be defined for collecting implicit feedback from the user
 | S5.2.3 | Profile and knowledge management tools | User profile management tools will be implemented through the user profile manager
(S26) Choice of screening vs. learning | S5.5.3 | Reasoning negotiation and distribution | Optimisation techniques will be implemented for choosing the best parameters according to the context

Section 4: Matching and Delivering News

Content syndication - media and Syndicating news content to interested users and negotiating (S27) Content auctions T4.3 metadata linking virtual prices between the author and the consumers

Clear interfaces have to be defined for possible adaptation of off- Overall MESH architecture & S6.3.1 the-self modules Not foreseen to be demonstrated within MESH client-server model (S28) Text to speech but potential must be there for future integration An adaptation module can be integrated in the adaptation engine S4.1.2 Usage Environment Adaptation Requires off-the-self module for the text to speech functionality

Requirements, architecture and Requirements and model definition S3.2.1 model definition

Content licensing and usage model Authorization and rights Authorization and rights management (S29) S3.2.2 agreements management subsystem

Rights expressions and usage Definition of rights expressions and usage rules T3.2 rules

Knowledge Modelling & In order to provide the personalised summary, (semi-) automatic (S30) Personalised news summaries WP2 Semantic Multimedia Analysis analysis will be needed by WP2


Generation of multimedia T4.2 summaries

Quick summarisation will be accomplished with real time Quick Views S2.2.1 Real-time analysis summarization that uses the data extracted in real time analysis as guidance.

Summarization can be performed agreeing several analysis Summarization at different S4.2.1 Real-time summary generation descriptors, which can be assigned different relevance levels and granularity levels so lead to different summary lengths.

No actual e-payment method will be implemented. The concept of MESHros will be integrated in the user profile and will be handled Content syndication - media and (S31) Virtual e-payment methods T4.3 by the content syndication algorithms when item purchases are metadata linking made to simulate future integration of an actual e-payment system.

Partly based on content, partly on the basis of extracted S6.3.3 Client HC interfaces metadata. Via 6.3.3. a first demo can be delivered without dependency on ontology mapping (S32) Delivering cross-linked content according to semantic relevance Modules for automatic syndication, inter-relations and linking of Content syndication - media and T4.3 different content items will be used, along with a navigation metadata linking environment

Requirements, architecture and Requirements and model definition S3.2.1 model definition Impose usage rights on connected material Authorization and rights Authorization and rights management S3.2.2 management subsystem (S33) Rights expressions and usage Definition of rights expressions and usage rules Take care of usage conditions T3.2 rules

Authorization and rights Authorization and rights management Content authentication S3.2.2 management subsystem


Business models for marketing Business models will be described to be implemented by content (S34) T7.2 Exploitation news items syndication methods

Real time video summarization. Dynamic summarisation can be Dynamic summarisation of news (S35) S4.2.1 Real-time summary generation accomplished with real time summarization that uses the data articles / screening extracted in real time analysis (S2.2.1) as guidance.

(S36) Preference-based content filtering S5.4.3 User profile exploitation Preference-based content filtering

(S37) Preferences for authors S5.4.3 User profile exploitation Preference-based content filtering

(S38) User preference for news sources S5.4.1 User profile modelling Preference-based content filtering

Dynamic building of an electronic The multimedia mesh structure will be used for building a linked (S39) S4.3.4 Media and metadata linking newspaper set of news material matching the subscribers' preferences

Automatic creation of thematic Generation of multimedia The mechanism of multimedia summaries will be used for (S40) T4.2 reviews summaries providing thematic reviews to subscribers

Training and Learning Content classification based on extracted visual content features Automatic rating according to (S41) S2.2.3 Approaches for Multimodal (e.g. violence) content analysis Visual based Classification

Capability of integrating external news services into a MESH Integrating news from external (S42) S6.4.3 Interface to external services information service/portal news feeds (like blogs, RSS, etc.)

Section 5: Understanding News Sources

A formal model will be built for the representation of trust levels for Measuring the credibility of news Representation of trust levels for knowledge sources and trust levels will also be used for linking (S43) S5.2.2 sources knowledge sources content. The use of social networks aspects for this purpose will also be investigated.

Section 6: Mobility of News Services


Mobile devices will be supported both for authoring and receiving news content, providing a user friendly interface and customised (S44) Portable end devices S5.1.1 New terminal features features for these purposes. Continuity of interaction will be required (i.e. transfer of user profile information as well as transfer and translation of media streams)

Some of the techniques and algorithms that will be developed in WP2 for multimedia analysis will be adapted for constrained (S45) Annotation on the mobile phone S5.3.2 Scalable annotation mobile environments. This will carry out a very limited analysis and annotation for videos and images


5. MESH External Services

In this section we present a survey of existing news syndication services that are commonly used in the media community and could usefully be integrated into the MESH platform. Within Sub-task 6.4.3 (‘Interface to external services’) these services will be prioritized and the most suitable candidates will be selected for integration and use by the MESH platform.

5.1. Introduction

The publishing and journalistic communities use broadly recognized standards to organize and manage news information. Their goal is to support processes such as the description, structuring, documentation and exchange of data between large organizations. The latest method for easily distributing online content is often called a web feed, and the most popular technical format that makes it possible is RSS, which stands for Really Simple Syndication. RSS is just one standard for expressing feeds as XML; another well-known choice is Atom. Both formats have their supporters, and a critical comparison of the two is given in a later section; however, consolidation toward a single standard does not appear imminent. Other content syndication methods, described in detail in later sections, include:
♦ NewsML,
♦ NITF,
♦ PRISM,
♦ IPTC 7901,
♦ IIM, etc.
The comparative survey of these standards and feeds that follows describes their characteristics, similarities and differences. The main goal is to present the main characteristics, usefulness and necessity of news management tools, not only for the journalistic communities but also for the public news audience.

5.2. Syndication

News syndication is the process of providing automatically updated news information. Nowadays almost everyone has a list of web sites that they browse daily for updates, whether stored in bookmarks or in memory. Someone who loads 20 or 30 sites a day, and notices that a few have stopped updating frequently, will inevitably stop checking them. What if, instead, your list of bookmarks could notify you when the sites you read have been updated? You would not waste time checking those that have not changed. Instead of loading 30 sites a day, you might only need to load 10. Cutting your checking time in this way would let you start monitoring more sites, so for the same amount of time you originally invested in checking each site manually, you may end up following twice as many.


Syndication provides the tools to do this. A news reader, or aggregator² as it is also known, is a program or a web site that automatically checks your list of bookmarks (which you only have to set up once) and lets you know what is new on each site in your list. It goes beyond simple updates, though: the news reader works by pulling in the feeds of your various bookmarks. But what is a feed? [56]

5.3. Web Feeds

Advances in online publishing have made it easy not only to publish regular updates to web-based content, but also to keep track of a large number of your favorite web sites or blogs without having to remember to check each site manually or clutter your email inbox. Users can now streamline this online experience by subscribing to specific content feeds and aggregating the information in one place, to be read when required. Feeds benefit three categories of user:
♦ Consumer bottom line: subscribing to feeds makes it possible to review a large amount of online content in a very short time.
♦ Publisher bottom line: feeds permit instant distribution of content and make it "subscribable".
♦ Advertiser bottom line: advertising in feeds overcomes many of the shortcomings that traditional marketing channels encounter, including spam filters, delayed distribution, search engine rankings and general “in-box” noise.

A web feed is a document (often XML³-based) which contains content items, often summaries of stories or weblog posts with web links to longer versions. Weblogs and news websites are common sources for web feeds, but feeds are also used to deliver structured information ranging from weather data to "top ten" lists of hit tunes. The two main web feed formats are RSS (which is older and far more widely used) and Atom (a newer format that has just completed the IETF [1] standardization process).

5.3.1. RSS and ATOM

RSS and Atom are two flavors of what is more or less the same thing: a ‘feed’, which is a wrapper for pieces of regularly and sequentially updated content such as news articles, weblog posts, a series of photographs, and more.
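To make the "wrapper" idea concrete, the following minimal sketch reads the same kind of information, item titles and links, from either format using only the Python standard library. The two sample documents and the function name `read_items` are illustrative assumptions, not part of MESH; real feeds carry many more elements than the ones used here.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 1.0 documents.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample documents, trimmed to the fields used below.
RSS_SAMPLE = """<rss version="2.0"><channel>
  <title>Example news</title>
  <item><title>First story</title><link>http://example.org/1</link></item>
  <item><title>Second story</title><link>http://example.org/2</link></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <entry><title>First post</title><link href="http://example.org/a"/></entry>
</feed>"""


def read_items(xml_text):
    """Return (title, link) pairs from an RSS 2.0 or Atom 1.0 document."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        # RSS 2.0: items sit under <channel>; the link is element text.
        return [(item.findtext("title"), item.findtext("link"))
                for item in root.iter("item")]
    if root.tag == ATOM_NS + "feed":
        # Atom 1.0: elements are namespaced; the link is an href attribute.
        items = []
        for entry in root.findall(ATOM_NS + "entry"):
            link = entry.find(ATOM_NS + "link")
            href = link.get("href") if link is not None else None
            items.append((entry.findtext(ATOM_NS + "title"), href))
        return items
    raise ValueError("unrecognised feed format: " + root.tag)
```

Calling `read_items(RSS_SAMPLE)` and `read_items(ATOM_SAMPLE)` yields lists of the same shape, which is what lets a dual-format aggregator treat both kinds of feed uniformly after parsing.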

RSS [2][3]

RSS (Really Simple Syndication, Rich Site Summary or RDF [4] Site Summary) provides a convenient way to syndicate information from a variety of sources, including news stories, updates to a web site or important bulletins. Regardless of the purpose for which the RSS file is being used, by watching this XML file you can quickly and easily see whenever an update has occurred. Of course, viewing the RSS feed in Internet Explorer and manually reloading the page every few minutes is not the most efficient use of your time, so most people take advantage of some form of client software to read and monitor RSS feeds. There are many different RSS clients available, the most useful of which are the following:
♦ Rss-Reader [8] / Internet syndication and aggregation software,
♦ Sharp-Reader [10] / RSS & Atom aggregator for Windows,
♦ Feed-Reader [11] / lightweight aggregator that supports RSS and ATOM formats,
♦ Ampheta-Desk [12] / cross platform, open-sourced, syndicated news aggregator,
♦ News-Gator [13] / located behind the corporate firewall, allows critical information to be delivered and read securely, and
♦ RSS Bandit [14] / a desktop news aggregator written in C#.

2 An aggregator is a type of software that retrieves syndicated Web content [5] supplied in the form of a web feed (RSS, Atom or other XML formats) and published by weblogs and mainstream mass media websites. Aggregators reduce the time and effort needed to regularly check websites of interest for updates, creating a unique information space or "personal newspaper." An aggregator is able to subscribe to a feed, check for new content at user-determined intervals, and retrieve the content. The content is sometimes described as being "pulled" to the subscriber, as opposed to "pushed" with email or IM. Unlike recipients of some "pushed" information, the aggregator user can easily unsubscribe from a feed.
3 XML: The Extensible Markup Language (XML) [7] is a W3C-recommended general-purpose mark-up language [8] for creating special-purpose markup languages, capable of describing many different kinds of data. In other words, XML is a way of describing data, and an XML file can contain the data too, as in a database.

ATOM [2][15]

Atom is a simple way to read and write information on the web, allowing you to easily keep track of more sites in less time, and to seamlessly share your words and ideas by publishing to the web. Atom is also a system which makes it easy for you to receive regular updates from news websites. Created by leading service providers, tool vendors and independent developers, Atom is designed to be a universal publishing standard for personal content and web logs (blogs). Technical information about working with the Atom format is available at the developer information page [16], and publications or weblogs that are interested in being Atom Enabled can find out more about the benefits of Atom for publishers [17].

Comparing Standards (RSS vs. ATOM) [18]

In 2003, CNET's special report on "Battle of the Blogs" [19] provided a good explanation of the underlying debate. Basically, Dave Winer, who is credited with much of the development behind RSS 2.0, had frozen its core development "to keep the developers from screwing with it," so that it was kept "simple". This didn't sit well with others, so they decided to come up with their own flavor of blog content syndication, which along the way has been named Pie, Echo, and now Atom. The problem is that while RSS and Atom are more alike than not, they are competing specs that could splinter the market. A number of bloggers have posted that RSS was really for web site content syndication, while Atom is geared toward blog syndication.

There are many news aggregator programs and web site services that work with RSS, but fewer that can read Atom, although Atom compatibility has increased enormously in the past couple of years. For example, BottomFeeder [20] is an open source news aggregator client that runs on many different operating systems (Windows, Mac, Linux, Unix, etc.) and supports news feeds in both RSS and Atom formats. While RSS will not go away (at least in the medium term), Atom tries to be more things to more people.

RSS proponents are concerned about what a competing standard may do to splinter the marketplace. After all, for quite a few years, anyone who wanted to burn DVDs had to choose between buying a DVD-R/W or a DVD+R/W drive and hope that the DVDs would work on all equipment (DVD player, laptop DVD drive, desktop DVD-ROM drive, etc.). Only recently have dual-format burners become popular, ensuring that consumers can use their burned DVDs in the way they expect. Thus, as Atom picks up more momentum, we will see more dual-format news aggregators like BottomFeeder on the market.


Atom proponents are stymied by the freeze on the RSS core, because they see that there is much more that RSS is capable of doing and becoming. Some say that, on the one hand, the ability to develop RSS further in the Atom format (rather than letting it stagnate) is a good thing, but on the other hand it adds to its complexity. That is precisely why some RSS proponents want to keep RSS frozen: to keep it simple, so that it does not take expensive consultants and programmers to deploy it. In other words, it may not be perfect, but right now it is simple enough and works well enough that the masses can use it. It is not hard to see the logic on both sides of the debate, but unfortunately it has become personal for some of the key players.

Google's decision for Blogger was interesting in and of itself. For a long time, the standard Blogger software didn't include any RSS support, which is why they lost bloggers to other systems like Radio Userland, Movable Type, and TypePad. Now, after Google's acquisition, they went exclusively with Atom support.

There is indeed concern over RSS being frozen. Emerging technologies have a hard time emerging when they are not allowed to evolve. Apple tried to keep a tight rein over their specifications, and it made them the market leader of a 10% market for many years, while the PC platform flourished. In the interim, these developments bear watching to see which syndication standards are appropriate to support on one's web site or blog. The moral of the story is that it is definitely too soon to tell, and there may be room for both standards as long as the context is appropriately set. Given the intensity of the debate so far, we think it is safe to say we are in for more colorful developments before it is over.

5.3.2. Who publishes Feeds? [57]

Most of the biggest names on the web offer content feeds, including USATODAY.com, BBC News Headlines, ABCNews, CNET, Yahoo!, Amazon.com (including a ), and many more. In addition, hundreds of thousands of bloggers, podcasters and videobloggers publish feeds to keep themselves better connected to their readers, listeners, admirers and critics. Apple, through its iTunes Music Store, offers tens of thousands of audio and video podcasts for download, each of which is powered by a feed.

The terms "publishing a feed" and “syndication” are used to describe making a feed available for an information source, such as a blog. Like syndicated print newspaper features or broadcast programs, web feed contents may be shared and republished by other web sites. (For that reason, one popular definition of RSS is Really Simple Syndication.) More often, feeds are subscribed to directly by users with aggregators or feed readers, which combine the contents of multiple web feeds for display on a single screen or series of screens. As of 2006, the latest advance in this area is the appearance of web browsers incorporating aggregator features. Depending on the aggregator, subscription is done by manually entering the URL of a feed, by clicking a link in a web browser, or by various other methods.

Web feeds are designed to be machine readable, so there is no requirement that they be destined only for human readers. For example, business partners could use web feeds to exchange sales data or other information without any human intervention.

5.3.3. How does someone read Feeds?

Anyone who wants to browse and subscribe to feeds has various choices. Today there are more than 2,000 different feed reading applications, also known as “news aggregators” (mostly for text) or “podcatchers” (for podcasts). There are even readers that work exclusively on mobile devices. Some require a small purchase price but are among the easiest to use and ship with dozens of feeds pre-loaded, so one can explore the feed "universe" right away. Free readers are available as well; a search for "Feed reader" or "Feed aggregator" at popular search sites will yield many results. A handful of popular feed readers are listed below.

A typical interface for a feed reader will display your feeds and the number of new (unread) entries within each of those feeds. You can also organize your feeds into categories and, with certain applications, even clip and save your favourite entries. If you prefer, you can use an online, web-based service to track and manage feeds. Online services give you the advantage of being able to access your feed updates anywhere you can find a web browser. Also, upgrades and new features are added automatically.
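The bookkeeping behind such an interface can be sketched in a few lines. The class below is a minimal illustration, assuming that every entry carries a stable identifier and that fetching and parsing the feed happens elsewhere; the class and method names are hypothetical and do not describe any particular reader.

```python
class FeedTracker:
    """Tracks which entries of each subscribed feed have been read."""

    def __init__(self):
        self.entries = {}  # feed URL -> list of entry ids, in arrival order
        self.read = {}     # feed URL -> set of entry ids already read

    def subscribe(self, feed_url):
        # Subscribing twice is harmless: existing state is kept.
        self.entries.setdefault(feed_url, [])
        self.read.setdefault(feed_url, set())

    def update(self, feed_url, entry_ids):
        """Record the ids from the latest fetch; return those not seen before."""
        known = self.entries[feed_url]
        new = [e for e in entry_ids if e not in known]
        known.extend(new)
        return new

    def mark_read(self, feed_url, entry_id):
        self.read[feed_url].add(entry_id)

    def unread_counts(self):
        """Return the number of unread entries per subscribed feed."""
        return {url: len([e for e in ids if e not in self.read[url]])
                for url, ids in self.entries.items()}
```

A reader would call `update` after each periodic fetch and display the result of `unread_counts` next to each feed name; only entries returned by `update` need to be announced as new.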

5.3.4. Feed Readers

A feed reader is a free, lightweight aggregator that supports RSS or Atom formats [21]. What makes feed readers stand out from the crowd?
♦ Powerful, yet simple: feed readers' functionality is focused on the main task, reading and organizing RSS/Atom feeds, and on offering a seamless user experience.
♦ Lightweight: one of the main goals of feed reader architecture is to keep the footprint small.
♦ Mostly free: no spyware or adware packages are installed with feed readers. Open source packages are also available for those who like to get the source code and tweak it.
♦ Customizable: feed readers are completely customizable applications, which makes them well suited to media companies, corporations and other organizations interested in completely customized versions.

An example of feed reader functionality is shown below:

Beyond day-to-day use, a particularly nice feature is the ability to take news along on the go. A traveller can have the newsreader grab the latest feeds before rushing to the airport, then skip the in-flight movie to catch up on the most recent goings-on. Of course, the author has to be providing full content for this to work, and some only provide summaries; it is about 50/50. Summaries left unread can be revisited later, when the reader is connected again. In this regard, news readers also function like temporary bookmarks: unread items stay flagged until there is more time to read them. And of course, as syndication spreads across the net, more and more choices of content are available. Soon we will have a whole new problem on our hands: how many feeds are too many? [22]

Popular Content Syndication Applications

• NewsGator - Feed Demon 2.0 [23] (Windows)

• NewsGator - Inbox for Microsoft Outlook [24] (Windows)

• NewsGator – NetNewsWire [25] (OS X)

• Pluck [26] (IE or Firefox, PC)

• Firefox [27] (via "Live Bookmarks" feature)

• Safari [28] (feed support in the Apple OS X native browser)

• Pulp Fiction [29] (OS X)

Online Services

• NewsGator [30] (Online)

• My Yahoo! [31]

• Bloglines [32]

• Pluck [33] (Web Edition)

• Rojo [34]

• Newsburst [35]

Podcast Readers

• iTunes [36]

• Juice [37]

• Doppler [38]

• FireANT [39]


5.4. Electronic Content Syndication Methods

When it comes to syndicating electronic content, two main syndication methods are employed:
♦ Delivering content directly to readers.
♦ Delivering content to publishers.
To syndicate, in the sense of syndicating articles or other content, means to publish simultaneously in a number of different publications. For electronic content, the publications are likely to be web sites. "Delivering content directly to readers" is not actually syndication, because syndication implies publishing in a number of publications, which are then read by readers. However, with the popularity of RSS and Atom, delivering content directly to the reader is sometimes thought of as syndication.

News-ML [40]

“The versatile News Markup Language for global news exchange!”

At the heart of NewsML (News Markup Language) is the concept of the news item, which can contain various different media (text, photos, graphics, video) together with all the meta-information that enables the recipient to understand the relationship between components and the role of each component. Everything the recipient might need to know about the content of the news provided can be included in NewsML’s structure. For example, NewsML enables publishers to provide the same text in different languages, a video clip in different formats, or different resolutions of the same photograph.

NewsML’s rich metadata concept can help with things like revision levels, which make it easy to track the evolution of a NewsItem over time, status details (publishable, embargoed, etc.) and administrative details, such as acknowledgements or copyright details. NewsML has default metadata vocabularies to ease implementation, but it does not dictate which metadata vocabulary is used (IPTC Subject Codes, ISO country codes, etc.); providers just have to indicate which vocabulary they are using. Multiple vocabularies can be utilised within the same NewsItem. For text objects in a NewsItem, the IPTC’s News Industry Text Format (NITF) is recommended.

NewsML is flexible and extensible and uses standard Internet naming conventions for identifying the news objects in a NewsItem. As such, content does not have to be embedded within a NewsItem; pointers can be inserted to content held on a publisher’s web site instead. This means subscribers retrieve the data only when they need to, which makes NewsML bandwidth-efficient. NewsML is designed to provide a media-independent, structural framework for multimedia news, and its versions can be applied at all stages in the (electronic) news lifecycle. Typical uses would include:
♦ In and between editorial systems
♦ Between news agencies and their customers
♦ Between publishers and news aggregators
♦ Between news service providers and end users.
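The two ideas described above for NewsML, alternative versions of the same content and content held by reference rather than embedded, can be illustrated with a small data model. This is only a sketch under stated assumptions: the class and field names below are hypothetical and are not NewsML element names, and `fetch` stands in for whatever retrieval mechanism a subscriber would actually use.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rendition:
    """One alternative version of a component (hypothetical model)."""
    media_type: str  # e.g. "text", "photo", "video"
    variant: str     # e.g. a language code or a resolution label
    href: str        # pointer to the content on the publisher's site
    _data: Optional[bytes] = field(default=None, repr=False)

    def load(self, fetch):
        # Retrieve the content only on first use, then reuse the cached copy.
        if self._data is None:
            self._data = fetch(self.href)
        return self._data


@dataclass
class NewsItemSketch:
    """A news item with metadata and by-reference renditions."""
    headline: str
    revision: int
    status: str  # e.g. "publishable" or "embargoed"
    renditions: List[Rendition] = field(default_factory=list)

    def select(self, media_type, variant):
        """Return the first rendition matching type and variant, or None."""
        for rendition in self.renditions:
            if (rendition.media_type, rendition.variant) == (media_type, variant):
                return rendition
        return None
```

A subscriber would first call `select` to pick, for example, the German text, and only then call `load`, so the renditions that are never selected are never transferred. That deferred retrieval is the source of the bandwidth saving.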


Because it is intended for use in electronic production, delivery and archiving, NewsML does not include specific provision for traditional paper-based publishing, though formats intended for this purpose, such as the News Industry Text Format (NITF), can be accommodated. Similarly, it is not primarily intended for use in editing or creating news content, though it may be used as a basis for systems doing this. The need for NewsML came from the continuing growth in production, use and re-use of news throughout the world, with rapid expansion of the Internet being a strong driving force.

NITF [41]

“A solution for sharing news developed by the world's leading news publishers”

NITF (News Industry Text Format) uses the Extensible Markup Language to define the content and structure of news articles. Because metadata is applied throughout the news content, NITF documents are far more searchable and useful than HTML pages. By using NITF, publishers can adapt the look, feel, and interactivity of their documents to the bandwidth, devices, and personalized needs of their subscribers. These documents can be translated into HTML, WML (for wireless devices), RTF (for printing), or any other format the publisher wishes. NITF was developed by the International Press Telecommunications Council, an independent international association of the world's leading news agencies and publishers. It is a standard that is open, public, proven, well-used, well-documented, and well-supported.

XML-News [42]

XMLNews is a set of specifications for exchanging news objects such as stories, images, or audio clips in a standard format across different applications and operating systems. XMLNews uses Extensible Markup Language (XML) and industry standards developed by the International Press Telecommunications Council and the Newspaper Association of America. XMLNews has two parts:
♦ XMLNews-Story and
♦ XMLNews-Meta.
XMLNews-Story is an XML document type for text-based news and information. It defines the format of a news story's content and is a subset of the News Industry Text Format (NITF), the XML document type definition (DTD) designed to mark up and deliver news content in a variety of ways, including print, wireless devices, and the Web. XMLNews-Meta defines the format of any metadata associated with a story (or any other kind of news object) and is based on the World Wide Web Consortium's Resource Description Framework (RDF).

PRISM [43]

The Publishing Requirements for Industry Standard Metadata (PRISM) specification defines an XML metadata vocabulary for managing, post-processing, multi-purposing and aggregating magazine, news, catalog, book, and mainstream journal content. PRISM recommends the use of certain existing standards, such as XML, RDF, the Dublin Core, and various ISO specifications for locations, languages, and date/time formats. In addition, PRISM provides a framework for the interchange and preservation of content and metadata, a collection of elements to describe that content, and a set of controlled vocabularies listing the values for those elements.

Metadata is an exceedingly broad category of information, covering everything from an article's country of origin to the fonts used in its layout. PRISM's scope is driven by the needs of publishers to receive, track, and deliver multi-part content. The focus is on additional uses for the content, so metadata concerning the content's appearance is outside PRISM's scope. PRISM focuses on metadata for:

♦ General-purpose description of resources as a whole
♦ Specification of a resource’s relationships to other resources
♦ Definition of intellectual property rights and permissions
♦ Expressing inline metadata (that is, markup within the resource itself).
Today PRISM consists of two specifications. The PRISM Specification itself provides the definition of the overall PRISM framework. A second specification, the PRISM Aggregator DTD, is a new standard format for publishers to use for delivery of content to web sites and to aggregators and syndicators. It is an XML DTD that provides a simple, flexible model for transmitting content and PRISM metadata.

ICE [44]

“ICE is the protocol for syndicators who are distributing ‘valued content’ that generates a revenue stream or requires guaranteed delivery in a secure environment.”

ICE stands for Information and Content Exchange. On October 27, 1998, after more than a year of private development, a press summit was held in San Francisco to announce the completion of this new XML-based Web protocol. The press event was held to celebrate the completion of ICE Version 1.0 and to provide the first public look at the new standard. On October 28, 1998, W3C acknowledged the submission of a note on ICE. In June 2004 a new, Web Services compliant version, ICE 2.0, was released to support industrial-strength syndication for the next generation of the Web.

The mission of the ICE protocol is to facilitate the controlled exchange and management of electronic assets between partners and affiliates across the Web. Applications based on ICE allow companies to easily construct syndicated publishing networks by establishing Web Services based information networks. The ICE specification provides businesses with an XML-based common language and architecture that facilitates automatic exchanging, updating, supplying and controlling of assets in a trusted fashion without manual packaging or knowledge of remote Web site structures. For consumer Web sites, end users benefit from more complete, easier-to-use Web destinations that reduce the frustration of having to surf through many inadequate, narrowly focused Web sites to find what they need. ICE was originally developed in 1998 by a community of 80 content providers and software vendors. With the development of the major revision to the ICE Specification, robust content syndication is supported in a Web Services environment for the first time.

IPTC 7901 [45]

The IPTC has formulated its Recommendation 7901 for use in the transmission of text messages to newspapers, news agencies and other recipients. The first version appeared in the early eighties and was updated regularly; the last revision, number 5, was approved in 1995. Since then the development of IPTC 7901 has been frozen, despite the fact that it is still used heavily in many countries. Although designed primarily for computerized information handling, Recommendation 7901 is also suitable for transmission to non-computerized recipients. The Recommendation has been influenced by the "High-speed Wire Service Transmission Guidelines" contained in Bulletins 1312 and subsequent amendments thereto of the Newspaper Association of America (NAA), formerly the American Newspaper Publishers Association (ANPA). Because it is intended for international use, it takes into account technical and linguistic differences between countries and is designed for use in numerous languages and alphabets.


To provide a degree of flexibility and to minimize changes from earlier practices, some elements in the Recommendation have been designed as "optional" or "recommended". Those not so designated must be complied with when using the Recommendation.

IIM [46]

“The first multi-media news exchange format”

The IPTC and the Newspaper Association of America (NAA) began to work jointly in 1990 to design a globally applicable model for all kinds of data. As a result of this effort, version 1 of the "Information Interchange Model - IIM" was approved in 1991 and has been further developed since then. After the advent of new technologies for data representation, primarily XML, the development of IIM was frozen in 1997. The latest and still current version is 4.1.

Metadata elements of IIM are quite well known as the "IPTC headers" of digital image files. Adobe Systems Inc. invented their own mechanism to insert metadata structures into Photoshop, JPEG and TIFF files, but adopted the data structure of IIM and several of its metadata elements. This mechanism of inserting metadata was implemented by other software vendors as well; therefore many image library programs are able to read and write these "IPTC headers".

Besides this specific use, the IIM model is designed to provide for universal communications embracing all types of data, including text, photos, graphics, etc., on a single network or a single storage medium. A mechanism is provided to use existing formats during transition. IIM assumes that the sender wishes to transfer a data object, such as a photographic image, text or perhaps a combination of many types. An envelope is provided around the object for information as to the type of data and the file format. Additional information, such as caption, news category or dateline, is also included. The object itself is transferred together with information regarding the size of the data. Thus any form of computerized data can be transferred, together with pertinent editorial and technical information.

OCS Directory Format [47]

The Open Content Syndication Directory Format is intended to provide a concise, machine-readable listing of a set of syndicated channels. The directory format is capable of supporting multiple sites, each with multiple channels. Each channel can have multiple formats, such as RSS (Rich Site Summary) versions 0.90 or 0.91, Plain Text, Avantgo, WML or Scripting News format, as well as separate publishing schedules or languages.


5.5. News personal publishing sources

5.5.1. Web Blogs

The term blog is a blend of the terms web and log, leading to web log, weblog, and finally blog. Authoring a blog, maintaining a blog or adding an article to an existing blog is called blogging. Individual articles on a blog are called "blog posts", "posts" or "entries". A person who posts these entries is called a blogger. The first blogs were known as "online diaries" and started in 1994. The term "weblog" itself was coined by Jorn Barger on 17 December 1997.

A weblog is a website where regular entries are made (such as in a journal or diary) and presented in reverse chronological order. Blogs often offer commentary or news on a particular subject, such as food, politics, or local news; some function as more personal online diaries. A typical blog combines text, images, and links to other blogs, web pages, and other media related to its topic. Most blogs are primarily textual, although many focus on photographs, videos or audio.

A blog entry typically consists of the following:
♦ Title - main title, or headline, of the post.
♦ Body - main content of the post.
♦ Permalink - the URL of the full, individual article.
♦ Post Date - date and time the post was published.


A blog entry optionally includes the following:
♦ Comments - comments added by readers
♦ Categories (or tags [48]) - subjects that the entry discusses
♦ Trackback [49] and/or pingback [50] - links to other sites that refer to the entry

Alongside the regularly updated entries, a blog site often has a less frequently updated list of links, or blogroll, of other blogs that the author reads or is affiliated with. Although blogs are typically a text medium, there are also non-text versions such as audioblogs [52] (sometimes known as podcasts [53]), photoblogs [54] and videoblogs or vlogs [55].
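The typical and optional parts of a blog entry described above can be sketched as a simple data structure. This is only an illustrative sketch: the class name `BlogEntry`, its field names and the helper `reverse_chronological` are assumptions made for this example and are not part of any MESH specification or blogging standard.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class BlogEntry:
    """Hypothetical model of one blog entry (names are assumptions)."""
    # Parts a blog entry typically has
    title: str            # main title, or headline, of the post
    body: str             # main content of the post
    permalink: str        # URL of the full, individual article
    post_date: datetime   # date and time the post was published
    # Parts a blog entry optionally has
    comments: List[str] = field(default_factory=list)    # comments added by readers
    categories: List[str] = field(default_factory=list)  # subjects (tags) discussed
    backlinks: List[str] = field(default_factory=list)   # sites that refer to the entry


def reverse_chronological(entries: List[BlogEntry]) -> List[BlogEntry]:
    """Order entries newest first, as blogs usually present them."""
    return sorted(entries, key=lambda entry: entry.post_date, reverse=True)
```

A service that collects blog content could map each harvested post onto such a record before any further analysis; the optional lists simply stay empty when a blog does not expose them.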

5.5.2. Mob Blogs

Moblog is a blend of the words mobile and weblog. A mobile weblog, or moblog, consists of content posted to the Internet from a mobile or portable device, such as a cellular phone or PDA. Moblogs generally involve technology which allows publishing from a mobile device. Much of the earliest development of moblogs occurred in Japan, among the first countries in the world where camera phones (portable phones with built-in cameras) were widely commercially available. The first post to the web from a mobile user was from Steve Mann in 1995. He used a wearable computer, a more elaborate predecessor to modern moblogging devices. The first post to the Internet from an ordinary mobile device is believed to be by Tom Vilmer Paamand in Denmark in May 2000. The term is sometimes pronounced with the emphasis on the first syllable - MOBlog - out of affinity with the ideas about social self-organization developed in Howard Rheingold's "Smart Mobs".

5.5.3. Spam Blogs

Spam blogs, sometimes referred to by the neologism splogs, are weblog sites which the author uses only for promoting affiliated websites. The purpose is to increase the PageRank (footnote 4) of the affiliated sites, get ad impressions from visitors, and/or use the blog as a link outlet to get new sites indexed. Content is often nonsense or text stolen from other websites, with an unusually high number of links to sites associated with the splog creator, which are often disreputable or otherwise useless websites. There is frequent confusion between the terms "splog" and "". Splogs are blogs where the articles are fake, and are only created for .

4 PageRank [51] is a patented method for assigning a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set.
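To illustrate why inbound links matter for such a weighting, the following minimal sketch iterates the formula published in [51], PR(p) = (1 - d) + d * sum(PR(q) / C(q)), where the sum runs over the pages q linking to p, C(q) is the number of outgoing links of q and d is a damping factor. The function name `pagerank`, the three-page graph `EXAMPLE_LINKS` and the fixed iteration count are assumptions made for this example; it is not a description of how any search engine is implemented in practice.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)) for pages q linking to p.

    `links` maps each page to the list of pages it links to
    (assumed to contain no duplicate targets).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    rank = {page: 1.0 for page in pages}  # arbitrary starting weights
    for _ in range(iterations):
        new_rank = {page: 1.0 - damping for page in pages}
        for source, targets in links.items():
            if not targets:
                continue  # a page without outgoing links passes on nothing
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: A and B both link to C, and C links back to A.
EXAMPLE_LINKS = {"A": ["C"], "B": ["C"], "C": ["A"]}
```

In this example graph, page C receives two inbound links and ends up with the highest weight, while B, which nobody links to, keeps only the base value 1 - d. This is the effect a splog tries to exploit by publishing large numbers of links to affiliated sites.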


5.5.4. Video Blogs

A vlog or video blog [55] is a blog (short for weblog) which uses video as the primary content; the video is linked to within a videoblog post and usually accompanied by supporting text, images, and additional metadata to provide context. It has become a significant contributor to clip culture.

Blogs often take advantage of RSS or Atom for syndication to other web sites and aggregator software (feed readers). With the development of RSS enclosures, which provide the ability to attach media files to a feed item or blog post, or the use of the Atom format (which supports rich media content by design), it is possible to bypass the mainstream intermediaries and openly distribute media to the masses via the Internet. Vlogs typically take advantage of this technological development, just as audioblogs have in recent years via the podcast boom.

As of 2006, videoblogging is rising in popularity, especially since the release of the new Apple Video iPod and the availability of the iTunes Store's video content. Another indicator of its popularity is the growth in uploads as well as traffic to sites like SelfcastTV and YouTube.

One of the potential problems with vlogs is the current inability of search engines to create rich metadata or "search engine" data from the video stream. For vlogs to be fully embraced as part of web culture, some indexing solution will need to emerge.


5.5.5. Podcasting

Podcasting [53] is the method of distributing multimedia files, such as audio programs or music videos, over the Internet using either the RSS or Atom syndication formats, for playback on mobile devices and personal computers. The term podcast, like "radio", can mean both the content and the method of delivery. The host or author of a podcast is often called a podcaster. Podcasters' web sites may also offer direct download or streaming (footnote 5) of their files; a podcast, however, is distinguished by its ability to be downloaded automatically using software capable of reading RSS or Atom feeds. Usually a podcast features one type of "show", with new episodes released either sporadically or at planned intervals such as daily or weekly. In addition, there are podcast networks that feature multiple shows on the same feed.
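The automatic download mentioned above relies on the feed reader finding the media file attached to each feed item. As a minimal sketch, assuming an RSS 2.0 feed in which media files are attached through the `enclosure` element, the attached files could be listed as follows. The feed content, the example.org URLs and the function name `list_enclosures` are made up for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed: one episode with an attached audio file
# and one plain item without any media attachment.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <link>http://example.org/show</link>
    <description>Hypothetical podcast feed</description>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.org/audio/ep1.mp3"
                 length="123456" type="audio/mpeg"/>
    </item>
    <item>
      <title>Text-only note</title>
    </item>
  </channel>
</rss>"""


def list_enclosures(feed_xml):
    """Return (item title, media URL, MIME type) for every item with an enclosure."""
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None:
            title = item.findtext("title", default="")
            found.append((title, enclosure.get("url"), enclosure.get("type")))
    return found
```

A podcast client would typically poll the feed at intervals, run a step like this, and download any media URL it has not fetched before; items without an enclosure are simply skipped.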

5 Streaming media is media that is consumed (read, heard, viewed) while it is being delivered. Streaming is more a property of the delivery system than the media itself. The distinction is usually applied to media that are distributed over computer networks; most other delivery systems are either inherently streaming (radio, television) or inherently non-streaming (books, video cassettes, audio CDs).


6. Conclusions

Within this document the consortium has defined the setting into which MESH will be positioned. In this context:
♦ a high-level project description and rationale have been given within the limits of a single page;
♦ the targeted users have been identified and briefly described;
♦ six application scenarios have been authored as storylines, summarized in a tabular format and abstracted as system boundary models;
♦ the foreseen necessary MESH services have been grouped and associated with the MESH technical tasks which are expected to lead to their implementation;
♦ existing news syndication services and standards have been surveyed and listed for possible interfacing to the MESH platform.

The results of this deliverable should be used primarily to build a common understanding of the MESH vision and expected result. Moreover, the identified internal services shall evolve into more detailed descriptions as use cases, to be modelled in detail through UML sequence diagrams. Finally, the external services survey will later be used by Sub-task 6.4.3 ("Interface to external services"), where these services will be prioritized and a set of the most suitable candidates will be selected for integration and use by the MESH platform.


References

[1] Internet Engineering Task Force, http://www.ietf.org/
[2] Heinz Wittenbrink, RSS and Atom: Understanding and Implementing Content Feeds and Syndication, Packt Publishing, December 2005
[3] http://www.microsoft.com/technet/security/bulletin/secrssinfo.mspx
[4] W3C Consortium, Resource Description Framework, http://www.w3.org/RDF/
[5] http://en.wikipedia.org/wiki/Web_Syndication
[6] W3C Home page, http://www.w3c.org
[7] W3C Consortium, Extensible Markup Language (XML), http://www.w3.org/XML/
[8] James H. Coombs, Allen H. Renear, Steven J. DeRose, Markup Systems and the Future of Scholarly Text Processing, Communications of the ACM 30 (November 1987), 933-47
[9] http://www.rssreader.com/
[10] http://www.sharpreader.net/
[11] http://feedreader.com/
[12] http://disobey.com/amphetadesk/
[13] http://www.newsgator.com/
[14] http://www.rssbandit.org/
[15] http://atomenabled.org/
[16] http://www.atomenabled.org/developers/
[17] http://www.atomenabled.org/publishers/
[18] Jeff Beard, The Great RSS vs. Atom News Feed Debate, February 13, 2004
[19] Paul Festa, Dispute exposes bitter power struggle behind Web logs, CNET News.com, August 4, 2003, http://news.com.com/2009-1032-5059006.html
[20] http://www.cincomsmalltalk.com/BottomFeeder/
[21] http://www.feedreader.com/screenshots.php
[22] http://www.google.com/search?q=rss+feeds
[23] http://www.newsgator.com/NGOLProduct.aspx?ProdID=FeedDemon
[24] http://www.newsgator.com/NGOLProduct.aspx?ProdID=NewsGator+Inbox
[25] http://www.newsgator.com/NGOLProduct.aspx?ProdID=NetNewsWire
[26] http://www.pluck.com/products/getpluck.html
[27] http://www.mozilla.org/products/firefox/live-bookmarks.html
[28] http://www.apple.com/macosx/features/safari/
[29] http://freshlysqueezedsoftware.com/products/pulpfiction/
[30] http://www.newsgator.com/
[31] http://my.yahoo.com/
[32] http://www.bloglines.com/


[33] http://www.pluck.com/
[34] http://www.rojo.com/
[35] http://www.newsburst.com/
[36] http://www.apple.com/itunes/
[37] http://juicereceiver.sourceforge.net/
[38] http://www.dopplerradio.net/
[39] http://fireant.tv/download
[40] International Press Telecommunications Council, News Markup Language (NewsML), http://www.newsml.org/pages/index.php
[41] International Press Telecommunications Council, News Industry Text Format (NITF), http://www.nitf.org/
[42] XMLNews-Story & XMLNews-Meta Specifications, http://www.xmlnews.org/
[43] International Digital Enterprise Alliance (IDEAlliance), Publishing Requirements for Industry Standard Metadata (PRISM), http://www.prismstandard.org/
[44] International Digital Enterprise Alliance (IDEAlliance), Information and Content Exchange (ICE), http://www.icestandard.org/
[45] International Press Telecommunications Council, The IPTC Recommended Message Format, IPTC Recommendation 7901, 1995, http://www.iptc.org/IPTC7901
[46] International Press Telecommunications Council, Information Interchange Model (IIM), http://www.iptc.org/IIM/
[47] http://internetalchemy.org/ocs/index.html
[48] http://en.wikipedia.org/wiki/Tag
[49] Six Apart, TrackBack Technical Specification, http://www.sixapart.com/pronet/docs/trackback_spec. See also http://en.wikipedia.org/wiki/Trackback
[50] Stuart Langridge, Ian Hickson, Pingback 1.0 Specification, http://www.hixie.ch/specs/pingback/pingback
[51] Sergey Brin, Lawrence Page, "The anatomy of a large-scale hypertextual Web search engine", Proceedings of the seventh international conference on World Wide Web 7, 107-117, 1998
[52] MP3 blog, http://en.wikipedia.org/wiki/Audioblog
[53] Podcasting, http://en.wikipedia.org/wiki/Podcasting
[54] Photoblog, http://en.wikipedia.org/wiki/Photoblog
[55] Vlog, http://en.wikipedia.org/wiki/Vlog
[56] What is RSS/XML/Atom/Syndication?, http://mezzoblue.com/archives/2004/05/19/what_is_rssx/
[57] http://www.feedburner.com/fb/a/aboutrss
