European Research Consortium for Informatics and Mathematics Number 66, July 2006 www.ercim.org

Special: European Digital Library CONTENTS

JOINT ERCIM ACTIONS
4 ERCIM Beyond-the-Horizon Action Coordinates European ICT Research for the Future by Peter Kunz
5 Stelios Orphanoudakis ERCIM Memorial Seminar by Erzsébet Csuhaj-Varjú
6 The ERCIM "Alain Bensoussan" Fellowship Programme
7 ERCIM Workshop on Software Evolution by Tom Mens, Maja D'Hondt and Laurence Duchien
8 A Tribute to Franco Denoth

EUROPEAN SCENE
8 EC Expert Group on Next Generation GRIDs by Keith Jeffery

EURO-LEGAL
9 European Commission Consulting on Copyright Levy by Yue Liu

NEWS FROM W3C
10 In Memoriam: Alan Kotok
10 W3C to Participate in Advisory Board of Internet Governance Forum
10 W3C Workshop on a Device Description Repository
10 Second Incubator Group to Explore Semantic Web for Multimedia Content
11 W3C Launches WebCGM Working Group
11 Call for Implementations of Mobile Web Best Practices 1.0
11 W3C Web Security Workshop Report
11 Latest W3C Recommendations

SPECIAL THEME: EUROPEAN DIGITAL LIBRARY
12 Towards the European Digital Library - Introduction by Ingeborg Torvik Sølvberg and Costantino Thanos

Invited articles
14 From Digital Libraries to Knowledge Commons by Yannis Ioannidis
15 The European Digital Library – A Project of the Conference of European National Librarians by Elisabeth Niggemann
16 A Forward-Looking European Digital Library? Hence 5S? by Edward A. Fox
18 Technology Applied in Digital Libraries by John M Lervik and Svein Arne Brygfjeld
20 The Shifting Landscape of Digital Libraries Research and Development in Australia by Jane Hunter

Architecture
21 A Reference Architecture for Digital Library Systems by Leonardo Candela, Donatella Castelli and Pasquale Pagano
22 DelosDLMS - Infrastructure for the Next Generation of Digital Library Management Systems by Hans-J. Schek and Heiko Schuldt
24 A Powerful and Scalable Digital Library Information Service by Henri Avancini, Leonardo Candela, Andrea Manzi and Manuele Simi
25 in Peer-to-Peer-Based Digital Libraries by Hao Ding and Ingeborg Torvik Sølvberg
27 XPeer: A Digital Library for the European Higher Education Area by Mark Roantree and Zohra Bellahsène

Ontologies and Metadata
28 Increasing the Power of Semantic Interoperability for the European Library by Martin Doerr
29 A Tool for Converting Bibliographic Records by Trond Aalberg
31 Information Patterns for Digital Cultural Repositories by Chryssoula Bekiari, Panos Constantopoulos and Martin Doerr
32 Towards a Semantic Information Platform for Subsea Petroleum Processes by Jon Atle Gulla
33 Towards The European Metadata Registry by László Kovács, András Micsik and Jill Cousins

Information Access and Multimedia
34 Personalizing Digital Library Access with Preference-Based Queries by Periklis Georgiadis, Nicolas Spyratos, Vassilis Christophides and Carlo Meghini
36 Multilingual Interactive Experiments with Flickr by Jussi Karlgren, Paul Clough and Julio Gonzalo
37 Multimedia Ontologies for Video Digital Libraries by Alberto Del Bimbo, Marco Bertini and Carlo Torniai
38 Structured Multimedia Description for Simplified Interaction and Enhanced Retrieval by Stephane Marchand-Maillet, Eric Bruno and Nicolas Moënne-Loccoz
40 Taking a New Look at News by Arne Jacobs and Nektarios Moumoutzis
41 Radio Relief: Radio Archives Departments Benefit from Digital Audio Processing by Martha Larson, Thomas Beckers and Volker Schlögell
42 Self-Organizing Distributed Digital Library Supporting Audio-Video by László Kovács, András Micsik, Martin Schmidt and Markus Seidl

Repositories and Preservation
44 Repositories and Preservation in the UK by Neil Jacobs
45 Digital Library of Historical Newspapers by Martin Doerr, Georgios Markakis and Maria Theodoridou
46 DML-CZ: Czech Digital Mathematics Library by Jirí Rákosník

New Projects
47 CASPAR and a European Infrastructure for Digital Preservation by David Giaretta
49 PROBADO – Non-Textual Digital Libraries put into Practice by Thorsten Steenweg and Ulrike Steffens
50 WIKINGER - Semantically Enhanced Knowledge Repositories for Scientific Communities by Lars Broecker

R&D AND TECHNOLOGY TRANSFER
52 Visualization: Computer measures Coral Structures by Chris Kruszynski and Annette Kik
53 Pattern Recognition: Fast Synthesis of Dynamic Colour Textures by Jirí Filip, Michal Haindl and Dmitry Chetverikov
55 Multilingual/Multimodal Information Retrieval: MultiMATCH - Multilingual/Multimedia Access to Cultural Heritage by Carol Peters
56 Security: Access Control and Data Distribution Solutions for the Swedish Network Based Defence by Frej Drejhammar, Ali Ghodsi, Erik Klintskog, Erik Rissanen and Babak Sadighi
58 Ambient Intelligence: Bringing Ambient Computing out of the Labs - INRIA's Agreement with JCDecaux by Michel Banâtre
59 Text Research: New Text - New Conversations in the Media Landscape by Jussi Karlgren
60 Logistics: LOG4SMEs: Improving the Logistics Performance of SMEs in the Automotive Sector by Imre Czinege, Elisabeth Ilie-Zudor and András Pfeiffer
61 Networks: VTT Develops Dependability Evaluation Methods for IP Networks by Ilkka Norros
62 Software Engineering: Validating Complex Telecommunication Software by Sergio Contreras, María del Mar Gallardo, Pedro Merino, David Sanán, Javier Rivas and Joaquín Torrecilla
63 IST Results - Insight into EU R&D Achievements

EVENTS

64 ANNOUNCEMENTS

66 IN BRIEF

ERCIM News No. 66, July 2006
Next issue: October 2006. Special theme: Embedded Intelligence

KEYNOTE

Building Europe's Digital Library

Earlier this year, the Commission unveiled a roadmap that will see to the realization by 2010 of a distributed European library presence (also involving archives and museums). This roadmap sets clear targets: full cooperation between Europe's national libraries by the end of this year, digitized objects increasing from 2 million in 2008 to 6 million in 2010, and support for multilingual access. It builds on The European Library, a partnership between national libraries that provides an organizational nucleus for further developments.

Large-scale digitization requires that Member States and institutions ramp up their efforts, as well as making improvements in the efficiency and sophistication of processing and indexing methods for digitized data.

Horst Forster, Director Content, Directorate General Information Society and Media, European Commission

Most current digitization initiatives concentrate on material in the public domain. This creates the risk of fostering a 20th century black hole in our organised knowledge: in a world where increasingly the Web is the sole source of information, we would banish 20th century works to digital oblivion. For the audiovisual world, which was essentially born in the 20th century, this is particularly critical. Accepted ways must therefore be found that will ensure digitized copyrighted and orphan materials are accessible on the Internet. The High-Level Group on Digital Libraries set up by Commissioner Reding started its work earlier this year by looking into this subject. There is also substantial research work to be carried out.

What are the research issues to be addressed in this area, and how can the research community contribute? Over the past four years, the European Commission has already invested more than 100 M€ in funding for digital libraries research. This needs to be stepped up in the 7th Framework Programme. A solid body of existing research is creating the building blocks for a European digital library. Topics of immediate concern revolve around more sophisticated treatment of digitized materials; automated indexing of text, sound and images; improved multilingual search engines; services supporting annotations and collaborative work. The work and the results of the research community need however to be anchored in the needs of users, whether these be the owners and creators of content or the end user.

The first universal library in Alexandria burnt down 2000 years ago. Our digital libraries may face an equivalent loss if the question of how to preserve digital content is not adequately addressed. A number of ongoing Commission-funded projects are working on the integration of digital preservation tools into workflows and the preservation of different digital formats, including high-volume scientific data and multimedia music. This creates a baseline for future research.

I have no doubts that we shall reach our targets by 2010. The future will tell whether the European Digital Library will be more durable than the Library of Alexandria.

Horst Forster

JOINT ERCIM ACTIONS

ERCIM Beyond-the-Horizon Action Coordinates European ICT Research for the Future

by Peter Kunz

The results of the Beyond-the-Horizon action, which identified future ICT research challenges, were highlighted at a meeting with members of the European Parliament on 10 May in Brussels.

The results of the Beyond-the-Horizon action, which identified future ICT research challenges, were highlighted at a meeting with members of the European Parliament on 10 May in Brussels.

Bits, atoms and genes define the scene of future European research into Information and Communication Technology (ICT). Computers, the physical world and living organisms will increasingly merge, leading to entirely new methods of computing and communication. Large-scale interdisciplinary research efforts in this direction will be crucial for Europe's competitiveness in the long term.

This picture is the driving force behind the Beyond-the-Horizon action. Coordinated jointly by ERCIM and ICS-FORTH, it has identified six key research areas for developing the ICT of tomorrow's world. The action is funded by IST-FET, the Future and Emerging Technologies activity of the EU Information Society Technologies programme. Several workshops held across Europe during 2005 were followed by extensive consultation using the Internet, thus involving all relevant European research communities in the action.

Beyond-the-Horizon was presented at a meeting with members of the European Parliament on 10 May in Brussels, where researchers and officials from ERCIM and the EU further elucidated the action.

ICT has always profited from cross-fertilization with other scientific disciplines, including mathematics, biology, materials science and psychology. This is reflected in the wide range of problems and challenges to be addressed in the identified research areas. For example:
• the miniaturization of components on a chip requires new materials and new designs, as well as a search for alternative computing methods, e.g. quantum computing
• since future ICT systems need greater 'intelligence' in order to function properly, a promising way to achieve this is to study how living organisms – from a single cell to animal colonies and the human brain – process information
• the rapidly increasing volume and complexity of data and networks, in which humans interact with many small, embedded, mobile devices, requires penetrating studies of complex systems (Nature may teach us here too)
• mechanisms should be devised to ensure security for, and trust in the use of future technologies, which offer dazzling possibilities but also serious threats.

ERCIM has edited a booklet summarizing the results of the six thematic research areas identified by the Beyond-The-Horizon action. These are:
• Pervasive Computing and Communications
• Nano-Electronics and Nanotechnologies
• Security, Dependability and Trust
• Bio-ICT Synergies
• Intelligent and Cognitive Systems
• Software-Intensive Systems.

Three additional research areas were brought in by IST-FET and are included in the report:
• Quantum Information Processing and Communication
• Complex Systems
• Tera-Device Computing.

'IST Results', an online news service provided by the European Commission, has recently published a feature article on the Beyond-the-Horizon action citing Dimitris Plexousakis, scientific coordinator of the action from ICS-FORTH: "ICTs provide the glue that binds together multiple themes in European research. The time to address this multiplicity of themes and their inter-relationships is now."

ERCIM has published a booklet summarising the results of the Beyond-The-Horizon action - a European coordination action to identify ICT-related research trends and strategic areas that require support. The booklet is available for download from the ERCIM web site.

Links:
Beyond-The-Horizon home page: http://www.beyond-the-horizon.net
Booklet presenting the results of the Beyond-The-Horizon action: http://www.ercim.org/publication/policy/BTH-booklet-MAY2006.pdf
Feature article about Beyond-The-Horizon "ICTs - the glue that binds future research" published by "IST Results": http://istresults.cordis.lu/index.cfm/section/news/tpl/article/ID/82433

Please contact:
Dimitris Plexousakis, ICS-FORTH
Tel: +30 2810 391637
E-mail: [email protected]


Stelios Orphanoudakis ERCIM Memorial Seminar

by Erzsébet Csuhaj-Varjú

In honour of Stelios Orphanoudakis, the fourth president of ERCIM, a memorial seminar was held on 30 May 2006 in Budapest during the ERCIM meetings at SZTAKI.

The premature passing of Stelios Orphanoudakis on 18 March 2005 was a great loss for the ERCIM community. A commemorative event was organized by Constantine Stephanidis, director of FORTH-ICS, and was held in Budapest during the ERCIM meetings.

The memorial seminar was opened by Keith Jeffery, president of ERCIM, whose speech highlighted the accomplishments of Stelios Orphanoudakis and the outstanding role he played in ERCIM. On behalf of the ERCIM community, Keith Jeffery thanked the organizers of the meeting for their efforts, and Jose Koster (CWI) from the Human Resources Managers Task Group of ERCIM, for commissioning a sculpture for permanent exhibition at FORTH-ICS, as a symbol of the ERCIM community's high esteem for Stelios Orphanoudakis. He also thanked Cor Baayen, Dennis Tsichritzis and Gerard van Oortmerssen, former presidents of ERCIM, Alain Bensoussan, one of the 'founding fathers' of ERCIM, and the Orphanoudakis family members for their participation in the event.

The director of ICS-FORTH, Constantine Stephanidis, then gave an overview of the oeuvre of Stelios Orphanoudakis, briefly outlining his distinguished academic career and outstanding scientific achievements, his fundamental role in the life of FORTH and in the past and the future of ICT in Greece, and his influential pioneering activities in international scientific cooperation. Stelios Orphanoudakis deeply believed in ERCIM's promotion of European scientific research, and was actively committed to the achievement of this objective. His absence is a great loss for FORTH, for Greece, and for the international scientific community. As a former colleague, Constantine Stephanidis also warmly recalled their collaboration and friendship.

Technical presentations by distinguished speakers and former collaborators of Stelios Orphanoudakis followed, reflecting his main fields of research. Stelios Orphanoudakis had dedicated many years of teaching and research to the fields of computational vision and robotics, intelligent image management and retrieval by content, medical informatics, and medical imaging. The first talk was entitled 'Stelios Orphanoudakis, The European and The Greek', and was given by Niels Rossing (Danish Centre for Health Telematics, Denmark). The talk addressed past and future developments in e-Health, with reference to the pioneering ideas of Stelios Orphanoudakis in this area. Special mention was given to HYGEIAnet, the Integrated Regional Health Information Network of Crete, which is one of Orphanoudakis' most significant achievements.

The second talk, 'Computer Vision and Intelligent Systems', was presented by Jan-Olof Eklundh (KTH, Sweden), who summarized the trends in computational vision and robotics. Kostas Daniilidis (University of Pennsylvania, USA) gave the third presentation, '3D Visualization, 3D Navigation, 3D Content Creation'. The pioneering work of Stelios Orphanoudakis in all these areas was emphasized.

In addition to his research activities, Orphanoudakis had served on various committees and Working Groups of the European Commission, and was active in numerous European R&D programs.

After the break, Ilias Iakovidis (Deputy Head of Unit-ICT for Health, European Commission) gave a talk entitled

From left: Keith Jeffery, ERCIM president, Constantine Stephanidis, director of ICS-FORTH, Ava and Eleni Orphanoudakis, wife and daughter of Stelios Orphanoudakis.

From left: The former ERCIM presidents Gerard van Oortmerssen, Dennis Tsichritzis and Cor Baayen.


'e-Health: Achievements and Future Plans of the European Union'. Iakovidis also highlighted the important role played by Stelios Orphanoudakis in this area and in the European cooperation. All the presentations offered a unique blend of science and personal reflections.

In the second part of the meeting, Cor Baayen, Dennis Tsichritzis, Gerard van Oortmerssen and Alain Bensoussan discussed the extraordinary merits of Stelios Orphanoudakis as a scientist and as an individual, expressing their high estimation of his role in promoting ERCIM. They described Stelios with great warmth, giving tribute to his charismatic personality, and his vision as a scientist who realized his ideas and plans with vigour. He had been a strong and pragmatic advocate of cooperation in Europe, and a great supporter of ERCIM since its inception.

At the end of the meeting, the commemorative sculpture was presented to Constantine Stephanidis for permanent exhibition at FORTH-ICS, and small replicas were presented to Ava Orphanoudakis and the invited speakers. The seminar concluded with a heartfelt speech by Ava Orphanoudakis, which was greatly appreciated by the audience.

A replica of the memorial artwork for permanent exhibition at ICS-FORTH as a sign of the high esteem in which the ERCIM community holds its former president Stelios Orphanoudakis.

The ERCIM "Alain Bensoussan" Fellowship Programme

ERCIM offers fellowships in leading European information technology research centres. Fellowships are available for PhD-holders from all over the world.

The Fellowship Programme has been established as one of the premier activities of ERCIM. Since its inception in 1991, over 180 fellows have passed through the programme. ERCIM has now named the programme to honour Alain Bensoussan, one of ERCIM's 'founding fathers'. As president of INRIA, Alain Bensoussan initiated the creation of ERCIM in 1989 together with Cor Baayen from CWI and Gerhard Seegmueller from GMD (now part of Fraunhofer Institute).

Conditions
Applicants must:
• have obtained a PhD degree during the last 4 years prior to the application deadline or be in the last year of the thesis work with an outstanding academic record
• be fluent in English
• be discharged or get deferment from military service
• start the grant before October 2007 (for the September 2006 application deadline)
• have completed their PhD before starting the grant.

Fellowships are of 18 month duration, spent in two of the ERCIM institutes. In particular cases a fellowship might be of 12 month duration spent in one institute.

The fellow will receive a competitive monthly allowance which varies depending on the country. In order to encourage mobility a member institute will not be eligible to host a candidate of the same nationality. Further, a candidate cannot be hosted by a member institute, if he or she has already worked in this institute for a total of 6 months or more, during the last 3 years.

Topics
The programme focuses on topics defined by the ERCIM working groups and projects administrated by ERCIM. Topics include: Applications of Numerical Mathematics in Science, BioMedical Informatics, Constraints, Control and System Theory, E-Learning, Dependable Software-Intensive Embedded Systems, Digital Libraries, Environmental Modelling, Formal Methods for Industrial Critical Systems, Grids, Image and Video Understanding, IT and Mathematics applied to Interventional Medicine, Matrix Computations and Statistics, Rapid Integration of Software Engineering Techniques, Security and Trust Management, Semantic Web, Soft Computing, Software Evolution, User Interfaces for All.

In addition, applications are also welcome for other areas in which ERCIM institutes are active. Detailed description of the topics is available on the ERCIM web site.

ERCIM does not only encourage researchers from academic institutions to apply, but also scientists working in industry.

Deadlines for Application
Deadlines for applications are 30 April and 30 September every year.

More Information
Detailed information, conditions and online application form is available at: http://www.ercim.org/fellowship/


ERCIM Workshop on Software Evolution

by Tom Mens, Maja D'Hondt and Laurence Duchien

The ERCIM Working Group on Software Evolution organised its annual two-day workshop 6-7 April 2006 at Université des Sciences et Technologies de Lille (USTL) in France. The workshop reported on the theoretical, practical and empirical research on software evolution carried out by the working group members, and discussed new opportunities for collaboration.

The workshop gathered 40 researchers from ten European countries. In total, 25 position papers were submitted to the workshop, all of which were peer-reviewed by an international programme committee consisting of 17 well-known researchers. The best submissions were selected for inclusion in a special issue of Elsevier's Electronic Notes in Theoretical Computer Science. Eleven submissions were invited for a long presentation and six for short presentation. These workshop presentations covered a wide variety of research topics. Among others, the following topics were addressed, with the aim to provide either better formal support or better tool support: model-driven software evolution, aspect-oriented software evolution, component-based software evolution, architectural evolution, runtime software evolution, empirical analysis, software restructuring, and software quality measurement.

Arie Van Deursen, Delft University of Technology in the Netherlands, gave an invited talk on 'The Software Evolution Paradox: An Aspect Mining Perspective'. During this talk, he explored the relation between software evolution and the exciting research domain of aspect-oriented software development.

In addition to the scientific purpose, the workshop also hosted the annual steering committee meeting of the ERCIM Working Group on Software Evolution. The committee discussed the current status of the network, which includes over 35 members from research institutes all over Europe, 17 of which belong to ten different ERCIM partner institutes, and plans for future activities such as opportunities for proposing new initiatives within the IST domain of the EU 7th Framework Programme, in particular within the strategic objective "Adaptive Software Intensive Systems". It appears that the need for supporting software adaptation and software evolution is becoming increasingly important within this strategic objective.

The workshop was co-organised by Tom Mens (WG chair), Laurence Duchien and Maja D'Hondt (ERCIM postdoctoral fellow) who offered to host the workshop at the Laboratoire d'Informatique Fondamentale de Lille (LIFL) and INRIA Futurs in Lille, France.

Workshop participants.

Links:
Working Group website: http://w3.umh.ac.be/evol/
Workshop website: http://w3.umh.ac.be/evol/meetings/evol2006.html

Please contact:
Tom Mens, Institut d'Informatique, Université de Mons-Hainaut, Belgium
Tel: +32 65 37 3453
E-mail: [email protected]

ERCIM-Sponsored Events

ERCIM sponsors up to ten conferences, workshops and summer schools per year. The funding for all types of events is 2000 Euro.

Conferences
ERCIM invites sponsorship proposals from established conferences with an international reputation, where substantive overlap can be shown between the conference topic and ERCIM areas of activity. Typical cases would include annual conferences in computer science with international programme committees, substantial international participation, and proceedings published with an established international science publisher.

Workshops and Summer Schools
ERCIM sponsors workshops or summer schools (co-) organised by an ERCIM institute. The additional funding provided by ERCIM should be used to enhance the workshop by, for example, increasing the number of external speakers supported.

Application Deadlines
• Conferences:
15 July 2006 for conferences later than 15 May 2007
15 October 2006 for conferences later than 15 August 2007
• Workshops and summer schools:
15 July 2006 for workshops and schools later than 15 October 2006
15 October 2006 for workshops and schools later than 15 December 2006

Events sponsored in 2006
• World Wide Web Conference 2006, Edinburgh, UK, 22-26 May 2006
• CAiSE 2006 - 18th Conference on Advanced Information Systems Engineering, Luxembourg, 5-9 June 2006
• ECOOP 2006 - European Conference on Object-Oriented Programming, 20th edition, Nantes, France, 3-7 July 2006
• CONCUR 2006 - 17th International Conference on Concurrency Theory, Bonn, Germany, 27-30 August 2006
• DISC 2006 - International Symposium on Distributed Computing, Stockholm, Sweden, 19-21 September 2006

More information: http://www.ercim.org/activity/sponsored.html

EUROPEAN SCENE

It is with great sorrow that CNR announces the death of Franco EC Expert Group on Next Generation GRIDs Denoth, Director of the Institute for Informatics and Telematics of CNR, Pisa, and former member of the ERCIM by Keith Jeffery Board of Directors. The third Next Generation Grid expert group (NGG3) convened by the European Commission has completed its work and reported. The report (and much relevant documentation on GRIDs) is available at http://www.cordis.lu/ist/grids/

In the past few years, a group of high- permit the construction of such systems. level experts, named the Next Almost 20% of the NGG2 experts were Generation Grid (NGG) expert group, from ERCIM member organisations: this has developed a vision that has emerged reflects the broadening acceptance of as the European vision for Grid research. GRIDs in other organisations repre- Driven by the need and opportunity of sented by the majority of experts. bringing Grid capabilities to business Projects resulting from EC FP6 Call 5 in and citizens, the NGG vision underpins the area of GRIDs are currently under Professor Denoth had a long and the evolution of Grid from a tool to solve negotiation. distinguished scientific career at CNR, spreading over almost fifty years, and compute- and data-intensive problems beginning with his participation in the towards a general-purpose infrastructure NGG3 (2005 reporting January 2006) design of the first Italian computer to enabling complex business processes built upon these foundations and concen- be built for scientific computing and workflows across virtual organisa- trated on a service-oriented architecture activities. He was director of three tions (VOs) spanning multiple adminis- where the services have strong semantic Institutes in Pisa: the Istituto di trative domains. descriptions allowing self-choreography Elaborazione dell'Informazione (1979- (composition with flexibility and 1994), the Istituto per le Applicazioni The NGG vision, articulated by NGG1 dynamism). 
Again approximately 20% Telematiche (1999-2002), and the (2003) consists of three complementary of the experts were drawn from ERCIM Istituto di Informatica e Telematica (IIT) (from 2002) and was President of the dimensions: the end-user perspective member organisations and this team had CNR National Committee for where the simplicity of access to and use a much stronger participation from Information Sciences and of Grid technologies is exemplified; the industry, indicating the take-up of com- Technologies. He was also Italian architectural perspective where the Grid mercial interest in GRIDs. The SOKU representative at the European is seen as a large evolutionary system (Service-Oriented Knowledge Utility) Commission for the IST programme made of billions of interconnected nodes vision identifies a flexible, powerful and under FP5. From 1999 he was of any type; and the software perspective cost-efficient way of building, operating responsible for the Italian Registry of of a fully programmable and customis- and evolving IT intensive solutions for Internet Domain names and, since 2002, CNR delegate on the board of able Grid. In order to realise the Next use by businesses, science and society. It EURid, the European Registry of Generation Grid vision, numerous builds on existing industry practices, Internet Domain names. From 1991 - research priorities were identified in trends and emerging technologies and 2001, he was the CNR representative terms of properties, facilities, models, gives the rules and methods for com- on the ERCIM Board of Directors. tools, etc. which have inspired national bining them into an ecosystem that pro- and international research programmes motes collaboration and self-organisa- Denoth was a firm believer in for Grid research. Almost half of the tion. 
The benefits are increased agility, multidisciplinary research and his main scientific interests were in the NGG1 experts were from ERCIM lower overhead costs and broader avail- application of electronics and member organisations. The EC FP6 ability of useful services for everybody, information technologies to medicine Call2 in the area of GRIDs resulted in shifting the balance of power from tradi- and biology. He was awarded three projects aligned with the NGG1 vision, tional ICT (Information and Communi- prestigious prizes for his research in including the Network of Excellence cation Technology) players towards these areas. Author of many scientific managed by ERCIM: 'CoreGRID'. intermediaries and end-consumers of articles, he was member of the editorial ICT. It is fortunate that SOKU may also board of several international journals, NGG2 (2004) went further and elabo- be read as Self-Organising Knowledge and editor of a number of books. rated the middleware required for Utility and Semantic Oriented Franco Denoth will be deeply missed GRIDs and considered the requirements Knowledge Utility. by all his colleagues and friends, in of operating systems to support a GRIDs Italy and abroad, not only for his environment. Particular attention was The need for developing the SOKU scientific merits but also for the paid to the need for self-* (self-man- vision stems from the necessity of effec- warmth of his personality, his aging, self-organising, self-healing, self- tively bringing knowledge and pro- willingness to share a joke and his tuning etc) systems. There was initial cessing capabilities to everybody, thus readiness to offer advice or assistance when necessary. consideration of the need for semantic underpinning the emergence of a com- description of service components to petitive knowledge-based economy. The

8 ERCIM News No. 66, July 2006 EUROPEAN SCENE:EUROPEAN OPEN ACCESS SCENE:

SOKU vision builds on and extends the Next Generation Grids vision. It captures three key notions:
• Service Oriented - the architecture comprises services which may be instantiated and assembled dynamically, hence the structure, behaviour and location of software is changing at run-time;
• Knowledge - SOKU services are knowledge-assisted ('semantic') to facilitate automation and advanced functionality, the knowledge aspect reinforced by the emphasis on delivering high level services to the user;
• Utility - A utility is a directly and immediately useable service with established functionality, performance and dependability, illustrating the emphasis on user needs and issues such as trust.

The primary difference between the SOKU vision and earlier approaches is a switch from a prescribed layered view to a multi-dimensional mesh of concepts, applying the same mechanisms along each dimension across the traditional layers.

Thanks to the substantial investments and the numerous initiatives launched at the Member States and European levels, Europe has succeeded in establishing a leading worldwide position in Grids. The consistent portfolio of Sixth Framework Programme (FP6) Grid research projects will further contribute to the realisation of the NGG vision, thus boosting European competitiveness in Grid technologies and applications. It is no accident that ERCIM experts have been involved heavily in this strategic work. Three ERCIM personnel have been involved in each of the 3 expert groups: Thierry Priol (INRIA), Domenico Laforenza (CNR) and the author. We have worked closely and productively with our EC colleagues, particularly Franco Accordino, Max Lemke and Wolfgang Boch.

Link:
http://www.cordis.lu/ist/grids/

Please contact:
Keith G Jeffery
Director, IT CCLR and ERCIM president
E-mail: [email protected]

EURO-LEGAL

News about legal information relating to Information Technology from European directives, and pan-European legal requirements and regulations.

European Commission Consulting on Copyright Levy

Under the Copyright Directive, EU member states were given a choice: either allow private copying and give 'fair compensation' to rights holders or ban private copying. Most European countries (except 5 member states) allowed copying of music for private use. These countries add a levy to the cost of items which are likely to be used to make private copies.

The Commission is now consulting with industry so that it can change the laws around this 'copyright levy' to suit the world of digital copying. An initiative on copyright levies is in the commission work program 2006. The additional follow-up consultation focuses on a series of salient points and will run from 6 June to 14 July 2006.

The commission sought to address the issue by posing various questions from several aspects, which also indicated the possible changes in the new law:
• the efficiency of applying the levy to equipment or media that consumers use, rather than the party that carries out and controls the private copying
• the necessity and the way of improving the accountability of collecting societies with respect to the application, collection and distribution of copyright levies
• the distribution of levies among right owners
• the efficiency of current copyright levy system with regard to the development of digital sales in Europe
• copyright levies and the notion of harm based on private copying
• the criteria for establishing whether a levy is imposed on particular equipment or media
• the phenomenon of convergence and copyright levies. It pointed out that levies that were applied to photocopies or cassette decks are being increasingly deployed on digital equipment, multifunction devices such as personal computers, hard disks and even printers
• the internal market and differences in copyright levy systems
• opinions from several sectors that are affected by copyright levies such as rights holders, collecting societies, the record and film industries, the ICT industry, consumers of digital equipment and/or blank media.

In the consultation document, the commission stated that in the digital media world "it would no longer be possible to hold only liable the manufacturers or importers of equipment and media. The logic of levies would also have to be applied to broadband and infrastructure service providers including telecommunications providers that carry content." This statement may indicate the possibility to impose copyright levy on ISPs in the future. However the commission also recognized that clarifying the complex situation is not an easy task, and it may cause "a serious risk of a backlash against the rights holder community and consumer welfare".

Link:
The consultation paper is available at http://www.ec.europa.eu/internal_market/copyright/docs/levy_reform/stakeholder_consultation_en.pdf

By Yue Liu, NRCCL, Oslo, Norway

NEWS FROM W3C

W3C to Participate in Advisory Board of Internet Governance Forum

United Nations Secretary-General Kofi Annan established an Advisory Group to assist him in convening the Internet Governance Forum (IGF), a new forum for a multi-stakeholder dialogue on Internet governance. Daniel Dardailler, W3C's Associate Chair for Europe, will represent W3C on the new Advisory Board. W3C looks forward to sharing its experience in distributed consensus-building within this new international environment for standardization.

Link:
http://www.un.org/News/Press/docs//2006/sga1006.doc.htm

W3C Workshop on a Device Description Repository

Madrid, Spain, 12-13 July 2006

With ever-increasing diversity of Web-enabled devices, it is expected that content adaptation will play a significant role in the delivery of content. The successful adaptation of content to the capabilities of a device depends on reliable knowledge about the target device. For example, the selection of columns of a table may depend on the physical width of the screen. The goal of this Device Description Repository workshop is to discuss the design, the implementation and use of a repository of device information (DDR) for content and service providers, as part of W3C's Mobile Web Initiative.

Link:
http://www.w3.org/2005/MWI/DDWG/workshop2006/

Second Incubator Group to Explore Semantic Web for Multimedia Content

As part of W3C's Incubator Activity, W3C is pleased to announce the creation of the Multimedia Semantics Incubator Group, chartered to show how metadata interoperability can be achieved by using the Semantic Web technologies to integrate existing multimedia metadata standards. This new Incubator group (XG) is sponsored by W3C Members IVML-NTUA, CWI, University of Aberdeen, University of Maryland and DFKI.

Links:
Multimedia Semantics XG: http://www.w3.org/2005/Incubator/mmsem/
Incubator Activity: http://www.w3.org/2005/Incubator/

In Memoriam: Alan Kotok

Alan Kotok, W3C Associate Chair, MIT site manager and head of the W3C Systems Team, passed away mid-May in Cambridge, Massachusetts, USA. He was 64. Tim Berners-Lee, W3C Director, and Steve Bratt, W3C CEO, expressed their deep sorrow: "Our great friend, colleague, and mentor Alan Kotok has passed away. Alan's W3C involvement goes back before its formal inception, when he was still employed at Digital Equipment Corporation. His early ideas shaped W3C, and helped lead it to what it is today.

Long before Alan came to W3C, his experience established him as one of the early wise men of computer science. One of Alan's undergraduate creations was the first video game, Spacewar, which he and several classmates created for the PDP-1 in 1962. Alan was also part of the team which invented the joystick, an icon of many young computer gamers' experiences.

Alan spent 34 years with Digital Equipment Corp. in numerous leadership roles. He served as Technical Director for product strategy and development groups in Telecommunications, Storage, and Internet. Alan provided thought leadership as a member of the Corporate Strategy Group which advocated early adoption and integration of Internet and Web-based technologies.

Alan held a wide range of roles at W3C. He carried the title of Associate Chairman, but he also served as the MIT site manager, managed the Systems Team, and worked closely with the Advisory Board. His contributions to membership and financial issues were highly valued. Alan shone as a problem solver, especially in important and complex areas: patent policy development, Patent Advisory Groups, whatever processes, policies and procedures were needed to improve the W3C as a standards body.

We have opened a publicly archived mailing list, public-[email protected], http://lists.w3.org/Archives/Public/public-memoria/ to which remembrances and photographs are welcome to be sent.

The W3C Team and our organization was immeasurably better for his presence. We will all miss him for who he was, and all that he achieved."

Link:
http://en.wikipedia.org/wiki/Alan_Kotok

W3C Launches WebCGM Working Group

Computer Graphics Metafile, or CGM, is an ISO standard for tree-structured, binary graphics format that has been adopted especially by the technical industries (defense, aviation, transportation, etc.) for technical illustration in electronic documents. W3C started a Working Group which is chartered to develop a W3C Recommendation for WebCGM 2.0, starting with WebCGM 2.0 Submission.

Link:
WebCGM WG: http://www.w3.org/Graphics/WebCGM/WG/

Call for Implementations of Mobile Web Best Practices 1.0

W3C reached an important milestone toward its mission of making it as easy to use the Web on a mobile device as on a desktop computer. Written for designers of Web sites and content management systems, the 'Mobile Web Best Practices 1.0' guidelines describe how to author Web content that works well on mobile devices. W3C invites the designers of Web sites and content management systems to read the guidelines, make implementations, and test their results with the alpha version of a guidelines checker (http://www.w3.org/2006/05/mwbp-check/).

Also, in order to build a strong community of mobile Web developers, W3C has launched a wiki (http://www.w3.org/2005/MWI/BPWG/techs/TechniquesIntro) to collect observations and suggestions on techniques and implementation experience of Mobile Web Best Practices 1.0. Thirty organizations participating in the Mobile Web Initiative achieved consensus and encourage adoption and implementation of these guidelines to improve user experience and to achieve the goal of 'One Web.'

Links:
Mobile Web Best Practices 1.0 in Candidate Recommendation: http://www.w3.org/TR/2006/CR-mobile-bp-20060627/
W3C Mobile Web Initiative: http://www.w3.org/Mobile/

Latest W3C Recommendations

• Web Services Addressing 1.0 - Core
9 May 2006, Marc Hadley, Tony Rogers, Martin Gudgin
• Web Services Addressing 1.0 - SOAP Binding
9 May 2006, Tony Rogers, Martin Gudgin, Marc Hadley

A complete list of all W3C Technical Reports: http://www.w3.org/TR/

W3C Web Security Workshop Report

W3C held a workshop on 'Transparency and Usability of Web Authentication' 15-16 March 2006, in order to identify steps W3C can take to improve Web Security from the user-facing end of the spectrum. Most workshop participants came from the security and browser vendor community, such as Google, HP, IBM, KDE, Microsoft, Mozilla, Nokia, Opera, VeriSign, Yahoo!, etc., as well as leaders of the online finance actors.

The workshop program was structured into seven sessions and an open discussion of next steps. Participants considered shortcomings in the usability of current browser-based authentication technologies. Requirements for and limitations of possible improvements were also presented by a number of speakers. Approaches for concrete improvements included leveraging (secure) metadata; a number of proposals for changes to browser user interfaces and behaviors; protocol changes; and new approaches to identity online.

[Photo: Workshop participants.]

Based on the discussions, W3C staff is currently engaging those present at the workshop and other W3C Members in discussions that may lead to Working Group charters in three areas: Form-filler support; Secure Chrome; and Secure Metadata.

'Form-filler support' would enable browsers to reliably recognize log-in forms. This ability would allow browser-side credential management that is more reliable and usable than current heuristics-based form filling mechanisms. Browsers could also use this capability to launch advanced and security-focused user interfaces for credential entry.

Work on secure chrome and secure metadata would identify a baseline set of security context information that should be presented to the user, and best practices for the display of this information to the user. Work in this area may also cover restrictions on scripting capabilities that are known to make faking security indicators particularly easy.

The workshop, hosted by Citigroup, was chaired by Dan Schutzer (FSTC) and Thomas Roessler (W3C).

Link:
Workshop report: http://www.w3.org/2005/Security/usability-ws/report

SPECIAL THEME: European Digital Library

Towards the European Digital Library - Introduction to the Special Theme

by Ingeborg Torvik Sølvberg and Costantino Thanos

The recent events in the European scene concerning the digital library field - the initiatives announced by Google and Yahoo aiming at making accessible vast online libraries of books, and the European Commission's plan for a European Digital Library recently unveiled - have put 'digital libraries' in the center of the debate and interest of the European research community.

That all citizens, anywhere, anytime, should have access to Internet-connected digital devices to search all of human knowledge, regardless of barriers of time, place, culture or language has also been the vision of DELOS, the European Network of Excellence on Digital Libraries, since its inception. DELOS believes that, in the near future, networked virtual libraries will enable anyone from their home, school or office to access the knowledge contained in the digital collections created by traditional libraries, museums, archives, universities, governmental agencies, specialized organizations, and individuals around the world. These new libraries will offer digital versions of traditional library, museum and archive holdings including text, documents, video, sound and images. But they will also provide powerful new technological capabilities that enable users to refine their requests, analyze the results, access collections in other languages, share resources, and work collaboratively. No matter where the digital information resides physically, sophisticated search software can find it and present it to the user on demand.

Having said this, we are not talking about a 'googlization' of digital libraries. Digital libraries should be much more than search engine portals. They should extend traditional libraries dramatically. They should provide services, including search, that facilitate the use of their resources by their target community.

The European Digital Library will be a major step towards making this vision a reality. The European Digital Library should support the interoperability of the different eContent holders, where by interoperability we mean the ability to store and retrieve information across collections in diverse media and languages, administered independently. Clearly, techniques for querying across languages that also take cultural differences into account must be available and the results of cross-language searches must be presented in a form that is easily comprehensible to the user.

The European Digital Library must also support the storage and preservation of its digital collections. Long-term storage technologies and efficient procedures for migration of contents and processes within a digital library to new environments should be developed so that they remain available to the user. The European Digital Library must also be able to manage complex intellectual property rights which will involve both legal and cost issues.

We are convinced that the time for the European Digital Library has come. The European digital library community has carried out a considerable amount of work in recent years. DELOS, in particular, has engaged the major European teams and expertise to help this become a reality and is ready to work with the eContent holders to make this vision a reality.

This issue bears witness to the considerable amount of research carried out by the European digital library community and is organized in six sections. The first contains five invited articles. It begins with a description by Yannis Ioannidis of the long term vision for digital libraries that has been developed by DELOS during the last five years. The next article is by Elisabeth Niggemann who presents the views of CENL (Conference of European National Librarians) on the European Digital Library. Edward Fox then makes some considerations about the future European Digital Library on the basis of a theory-based approach to the field of digital libraries (the 5S model). The fourth article is co-authored by John Lervik and Svein Arne Brygfjeld and focuses on search engine technology applied in digital libraries. The final contribution in this section is by Jane Hunter and describes the shifting landscape of digital library R&D in Australia.

The rest of this section contains a selection of submitted articles on a variety of research topics within the Digital Library domain. We have grouped these articles under the following headings: digital library architectures and related concepts; ontology and metadata issues; information access and multimedia; repositories and preservation. The final sub-section presents three new projects addressing different aspects of the digital library paradigm. Overall we feel that these articles give a good picture of current trends in digital library R&D not only in ERCIM institutes but in the European research community at large.

Links:
http://www.delos.info/
http://europa.eu.int/information_society/activities/digital_libraries/

Please contact:
Ingeborg Torvik Sølvberg, NTNU, Norway
E-mail: [email protected]
Costantino Thanos, ISTI-CNR, Italy
Tel.: +39 050 3152910
E-mail: [email protected]

ARTICLES IN THIS SECTION

Towards the European Digital Library - Introduction
by Ingeborg Torvik Sølvberg, NTNU, Norway, and Costantino Thanos, ISTI-CNR, Italy

Invited articles:
From Digital Libraries to Knowledge Commons
by Yannis Ioannidis, University of Athens, Greece
The European Digital Library – A Project of the Conference of European National Librarians
by Elisabeth Niggemann, Die Deutsche Bibliothek, Frankfurt, Germany
A Forward-Looking European Digital Library? Hence 5S?
by Edward A. Fox, Virginia Tech, USA
Search Engine Technology Applied in Digital Libraries
by John M Lervik, FAST, Norway, and Svein Arne Brygfjeld, The National Library of Norway
The Shifting Landscape of Digital Libraries Research and Development in Australia
by Jane Hunter, University of Queensland, Australia

Architecture:
A Reference Architecture for Digital Library Systems
by Leonardo Candela, Donatella Castelli and Pasquale Pagano, ISTI-CNR, Italy
DelosDLMS - Infrastructure for the Next Generation of Digital Library Management Systems
by Hans-J. Schek, University of Konstanz, Germany, and Heiko Schuldt, University of Basel, Switzerland
A Powerful and Scalable Digital Library Information Service
by Henri Avancini, Leonardo Candela, Andrea Manzi and Manuele Simi, ISTI-CNR, Italy
Semantic Search in Peer-to-Peer-Based Digital Libraries
by Hao Ding, Ingeborg Torvik Sølvberg, NTNU, Norway
XPeer: A Digital Library for the European Higher Education Area
by Mark Roantree, Dublin City University, Ireland, and Zohra Bellahsène, University of Montpellier II, France

Ontologies and Metadata:
Increasing the Power of Semantic Interoperability for the European Library
by Martin Doerr, ICS-FORTH, Greece
A Tool for Converting Bibliographic Records
by Trond Aalberg, NTNU, Norway
Information Patterns for Digital Cultural Repositories
by Chryssoula Bekiari, Panos Constantopoulos and Martin Doerr, ICS-FORTH, Greece
Towards a Semantic Information Platform for Subsea Petroleum Processes
by Jon Atle Gulla, NTNU, Norway
Towards The European Metadata Registry
by László Kovács, András Micsik, SZTAKI, Hungary, and Jill Cousins, The European Library Office, The Netherlands

Information Access and Multimedia:
Personalizing Digital Library Access with Preference-Based Queries
by Periklis Georgiadis, Nicolas Spyratos, Vassilis Christophides, ICS-FORTH, Greece and Carlo Meghini, ISTI-CNR, Italy
Multilingual Interactive Experiments with Flickr
by Jussi Karlgren, SICS, Sweden, Paul Clough, University of Sheffield, UK, and Julio Gonzalo, UNED, Spain
Multimedia Ontologies for Video Digital Libraries
by Alberto Del Bimbo, Marco Bertini and Carlo Torniai, University of Florence, Italy
Structured Multimedia Description for Simplified Interaction and Enhanced Retrieval
by Stephane Marchand-Maillet, Eric Bruno and Nicolas Moënne-Loccoz, University of Geneva, Switzerland
Taking a New Look at News
by Arne Jacobs and Nektarios Moumoutzis, University of Bremen, Germany
Radio Relief: Radio Archives Departments Benefit from Digital Audio Processing
by Martha Larson, Fraunhofer IAIS, Thomas Beckers, WDR, and Volker Schlögell, Deutsche Welle, Germany
Self-Organizing Distributed Digital Library Supporting Audio-Video
by László Kovács, András Micsik, SZTAKI, Hungary, and Martin Schmidt and Markus Seidl, University of Applied Sciences St. Pölten, Austria

Repositories and Preservation:
Repositories and Preservation in the UK
by Neil Jacobs, Joint Information Systems Committee, UK
Digital Library of Historical Newspapers
by Martin Doerr, Georgios Markakis, Maria Theodoridou, ICS-FORTH, Greece
DML-CZ: Czech Digital Mathematics Library
by Jirí Rákosník, Mathematical Institute AS CR, Prague, Czech Republic

New Projects:
CASPAR and a European Infrastructure for Digital Preservation
by David Giaretta, Digital Curation Centre, UK
PROBADO – Non-Textual Digital Libraries put into Practice
by Thorsten Steenweg and Ulrike Steffens, OFFIS, Germany
WIKINGER - Semantically Enhanced Knowledge Repositories for Scientific Communities
by Lars Broecker, Fraunhofer IMK, Germany


From Digital Libraries to Knowledge Commons

by Yannis Ioannidis

Digital Libraries began as systems whose goal was to simulate the operation of traditional libraries for books and other text documents in digital form. Significant developments have been made since then, and Digital Libraries are now on their way to becoming 'Knowledge Commons'. These are pervasive systems at the centre of intellectual activity, facilitating communication and collaboration among scientists or the general public and synthesizing distributed multimedia documents, sensor data, and other information.

Digital Libraries represent the confluence of a variety of technical areas both within the field of informatics (eg data management and information retrieval), and outside it (eg library sciences and sociology). Early Digital Library efforts mostly focused on bridging some of the gaps between the constituent fields, defining 'digital library functionality', and integrating solutions from each field into systems that support such functionality. These have resulted in several successful systems: researchers, educators, students and members of other communities now continuously search Digital Libraries for information as part of their daily routines, decision-making processes, or entertainment.

Most current Digital Library systems share certain characteristics. They are content-centric, motivated by the need to organize and provide access to data and information. They concentrate on storage-centric functionality, mainly offering static storage and retrieval of information. They are specialized systems, built from scratch and tailored to the particular needs and characteristics of the data and users of their target environment, with little provision for generalization. They tend to operate in isolation, limiting the opportunities for large-scale analysis and global-scale information availability. Finally, they concentrate on material that is traditionally found in libraries, mostly related to cultural heritage. Hence, despite the undisputed advantages that current Digital Library systems offer compared to the pre-1990s era, the above restrictions limit the role that Digital Libraries can play in Knowledge Societies, which will serve as important educational nuclei in the future.

Together with the general community, the DELOS Network of Excellence on Digital Libraries has initiated a long journey from current Digital Libraries towards the vision of 'Knowledge Commons'. These will be environments that will impose no conceptual, logical, physical, temporal or personal borders or barriers on content. They will be the universal knowledge repositories and communication conduits of the future, common vehicles by which everyone will access, analyse, evaluate, enhance and exchange all forms of information. They will be indispensable tools in the daily personal and professional lives of people, allowing everyone to advance their knowledge, professions and roles in society. They will be accessible at any time and from anywhere, and will offer a user-friendly, multi-modal, efficient and effective interaction and exploration environment.

Achieving this requires significant changes to be made to past development strategies, which shaped the functionality, operational environment and other aspects of Digital Libraries. Knowledge Commons will have different characteristics. They will be person-centric, motivated by needs to provide novel, sophisticated, and personalized experiences to users. They will concentrate on communication and collaboration functionality, facilitating intellectual interactions on themes that are pertinent to their contents, with storage and retrieval being only a small part of such functionality. They will remain specialized systems that will nevertheless be built on top of widely-available, industrial-strength, generic management systems, offering all typically required functionality. In general, they will be managed by globally distributed systems, through which information sources across the world will exchange and integrate their contents. Finally, they will be characterized by universality of information and application, serving all applications and comprehensively managing all forms of content.

There are several key milestones to be achieved on the way towards Knowledge Commons. In particular, a Reference Model for Digital Libraries/Knowledge Commons must be obtained, that is, a framework with a set of interrelated concepts that will collectively capture the essence of the field and help everyone understand its basic elements. An appropriate system architecture (eg Grid or Peer-to-Peer) must also be identified. Other critical steps include devising sophisticated similarity search techniques, handling complex audio-visual content, personalizing user experiences, facilitating semantic interoperation among systems and modelling curation and preservation of content. Research within and outside DELOS is advancing steadily towards these so that a first version may become reality by the end of the decade.

It is serendipitous that, as part of its 'i2010 – a European Information Society for growth and jobs' initiative, the European Commission has recently announced its plans to foster the development of European Digital Libraries, so that Europe's written and audiovisual heritage becomes widely available. This represents a significant step towards Knowledge Commons. Extensive digitization of materials and the formation of many individual Digital Libraries should be followed by incorporation of the latter into a single, universal system that will provide unified access to all content across Europe. This will be the realization of the short-term vision of 'The European Digital Library'. If appropriately advanced and openly expandable technology is used, it can also serve as the ideal springboard for realizing the longer-term vision of Knowledge Commons.

Link:
DELOS: http://www.delos.info

Please contact:
Yannis Ioannidis, University of Athens, Greece
E-mail: [email protected]

Related Reports

"Recommendations and Observations for a European Digital Library (EDL)": Brainstorming Report, Juan-les-Pins, France, December 2005
http://www.delos.info/index.php?option=com_content&task=view&id=344&Itemid=125

"A Future Vision for Digital Libraries": DELOS Brainstorming Report, Corvara, Italy, July 2004
http://www.delos.info/files/pdf/events/2004_Jul_8_10/D8.pdf

"Digital Libraries at a Crossroads", DELOS FP5 Final Report, July 2003
In International Journal on Digital libraries, Volume 5, Number 4, August 2005, ISSN: 1432-5012 (Paper) 1432-1300 (Online), DOI: 10.1007/s00799-004-0098-4, Pages: 255 - 265, http://www.informatik.uni-trier.de/~ley/db/journals/jodl/jodl5.html

"Digital Libraries: Future Directions for a European Research Program", DELOS Brainstorming Report, San Cassiano, Italy, June 2001:
http://delos-noe.iei.pi.cnr.it/activities/researchforum/Brainstorming/brainstorming-report.pdf

The European Digital Library – A Project of the Conference of European National Librarians

by Elisabeth Niggemann

The Conference of European National Librarians (CENL) shares the vision of a European Digital Library and has been working towards this goal by creating The European Library service. The Communication published in September 2005 by the European Commissioner for Information Society and Media, Viviane Reding, on her plans for European digital libraries provided the impulse to think in a much broader way of a more comprehensive European Digital Library.

A true European Digital Library should serve all types of user needs: present and future, up-to-date or historical information, science and humanities, education, research and everyday, 'normal' information needs. It must comprise all types of media from the full gamut of Europe's cultural heritage institutions: libraries, museums and archives.

This is a highly diverse universe of knowledge, information and creativity: it exists as print, sound or image; in traditional analogue form or increasingly as 'born digital' or digitized form; and held by institutions with different professional backgrounds and traditions and in all the member states with their different institutional structures, responsibilities and financing. It would therefore be extremely difficult to organize a central, comprehensive super-structure. Instead, a networked structure is required, allowing faster and slower partners to proceed at their own speeds, while all benefit from the process.

The 'hub' of the network should be a central entry point to all the participating gateways. This network of networks must be scalable, and will rely heavily on common rules, standards and procedures. On the other hand, it must be built with diversity and heterogeneity in mind. From the users' point of view, cross-searching of all data and cross-services offered by all or most of the network members is of great importance. From the participating institutions' point of view it is the technical interoperability for limitless data exchange and high-performance access that is critical.

Partner networks need not be homogenized, but can continue to be organized in many ways. They will create among themselves a network of subnetworks, with nodes and substructures, that reflect the diverse needs of the different user communities, media types, institution types, and eventually also reflect legal frameworks.

Since the Europe of the future will be larger than it is now, it is important that all European countries are taken into consideration from the very beginning, not only those that are today's EU Member States. Building a scalable system means not only technical scalability but also functional scalability: this includes all the European languages with their different character sets.

The European Digital Library should also – from the very beginning – try to build bridges to those global or regional networks outside Europe that provide additional resources for Europe's citizens and researchers.

One European gateway built on these principles is already in existence: TEL, The European Library. TEL is an ambitious and pioneering collaboration between European national libraries, supported by the EU and created under the auspices of CENL. It offers a professionally designed and maintained single access point to their holdings, spanning a range of collections in all the partner national libraries. It already allows access to more than one million digital items, as well as millions of catalogue records. Of the 45 CENL member libraries, about thirty will be full partners at the end of 2007, including all EU member states. All 45 CENL members were involved as partners from the start.

CENL believes that TEL is a model platform and model organizational network for building the European Digital Library. As a group, the members of CENL own Europe's cultural published heritage – many of them according to legal deposit, many for the whole period of time of their nation's history. In those cases where sections of their nation's memory are not part of their holdings, they are usually part of a national network of libraries that, as a group, cooperatively own the complete national corpus.

To take up the challenge of creating a European Digital Library, CENL adopted the Resolution on Digitization of European Cultural Heritage at its last annual meeting in Luxembourg in September 2005. CENL aims at examining how best to use the organizational model of The European Library to develop and coordinate, as a common effort, the existing strategies for digitization and digital libraries, including technical, content selection and funding issues. As it is particularly important to focus on content selection, CENL established the Content Working Group which will work on how content for mass digitization in Europe can be selected and created for the European Digital Library.

The next step towards a European Digital Library will be the start of a new EU-funded project where CENL is one of the coordinators. The project will deal with the enlargement of The European Library service by new partners, with multilingual issues and content development. Negotiations with the European Commission are underway and the project should commence in summer 2006: the European Digital Library (EDL).

Links:
http://www.cenl.org
http://www.theeuropeanlibrary.org

Please contact:
Elisabeth Niggemann, Chair of CENL
Die Deutsche Bibliothek, Frankfurt, Germany
E-mail: [email protected]

A Forward-Looking European Digital Library? Hence 5S?

by Edward A. Fox

Will The European Digital Library (TEDL), as it expands beyond current efforts with The European Library, be forward-looking? Will it consider the 5S checklist: Societies, Scenarios, Spaces, Structures, and Streams? Will it draw upon the expertise that DELOS helps to make visible, and that is apparent in conferences like the European Conference on Digital Libraries?

Europe has a wonderful opportunity to become a world leader in developing a comprehensive next-generation digital library that will serve the entire European Community. Doing so will require courageous leadership, careful investment, hard work and a willingness to share for the common good.

The EU has much to draw upon as it works toward TEDL. One key resource is DELOS, the Network of Excellence on Digital Libraries, partially funded by the European Commission's Information Society Technologies Programme (IST). Another resource is the annual conference on digital libraries in Europe; eg, ECDL 2006, the 10th European Conference on Research and Advanced Technology for Digital Libraries. (Though current holdings of The European Library are limited, ECDL does come up as one of the two things found when searching for 'digital library', as can be seen in the figure.) Even beyond the borders of Europe, there exist professional groups like the IEEE-CS Technical Committee on Digital Libraries (TCDL), and conferences like JCDL and ICADL. The field of Digital Libraries, which began around 1991, has led to many projects, services, systems, theoretical results and a great deal of advanced technology. I sincerely hope that the EU will ensure that this work and all its related successes will lead to a forward-looking TEDL.

One way to focus this discussion is to use the 5S framework, a theory-based approach to the field of digital libraries. First, we consider the Society perspective. What social (and cultural, economic, legal, linguistic, national,


The European Library - http://www.theeuropeanlibrary.org - the portal for searching the content of European national libraries.

political, etc.) concerns apply? What is the target audience? Who are the users? How can their collaboration, membership in groups, and myriad other relationships be made use of, for example through collaborative filtering? How can we move beyond TEL's current focus on librarians (as is clear from the interface, query language, use of technology and provision mostly of catalogue records)? How can the citizens, students, teachers, scholars, researchers, businesses and other institutions in the EU be properly supported?

This leads us to think about Scenarios. For what purposes, goals, and objectives can TEDL be used? For what tasks and activities? Through what classes and types of services? How can such a system respect privacy, while at the same time remembering a user for longer than the brief time it takes to handle a single WWW transaction: that is, into sessions, investigations and lifelong learning activities? Will Open Access be fully supported, to all available content?

Considering Spaces, we must consider all the aspects of context, including the location of the user, the effects of time and space, the presentation of results using information visualization, integration with geographic information systems, and the simplification of searching that comes from reducing the dimensionality of a vector space (with features that range from terms, phrases and categories to concepts and color histograms)? Can TEDL work in 'Semantic Space' as we move toward Web 2.0 and the Semantic Web? Will results be grouped into useful clusters based on similarity? Will results be presented, if desired, on a map (eg, of Europe), so that locations are understood, such as the site of an author's institution?

In the case of Structures, will TEDL make use of organization in its many forms? Of ontologies and thesauri? Of appropriate category systems? Of presentation using trees or graphs? Of fields and markup structure? Of database schema and related records? Of facts, snippets (ie structures atop content streams) and extracted information?

Considering Streams, will TEDL go beyond streams or sequences of characters and consider large books? Audio? Video? Animation? Sensor streams from surveillance or satellites? Will streams be managed in concert with structures in order to find suitable video frames or book sections? Will spatial considerations help in stream selection and subsetting (eg finding a scene in a news video covering a particular heritage site or important event or personage)? Will helpful scenarios allow users to work with all media types and mixtures, providing all the support afforded for textual content, and beyond?

Clearly, TEDL could serve millions, with tens of millions of resources, including theses, reports, papers, newspapers, images from museums and courseware. Will institutional repositories at all colleges, universities, centres, and even agencies and businesses support Open Access? Will the holdings of the regional, federal and national libraries of Europe catalogue both information and full-text (and full-content), and be opened as well? Will publishers cooperate to vastly increase their market and visibility?

This is a vision to challenge the entire R&D community worldwide. I hope the EU will work in this direction and call for broad involvement and assistance in a forward-looking digital library – of and for Europe (and beyond).

Links:
DELOS: http://www.delos.info
5S framework: http://www.dlib.vt.edu/projects/5S-Model
ECDL 2006: http://www.ecdl2006.org

Please contact:
Edward A. Fox
Virginia Tech, USA
Tel: +1 540 231 5113
E-mail: [email protected]


Search Engine Technology Applied in Digital Libraries

by John M Lervik and Svein Arne Brygfjeld

Digital libraries are experiencing rapid growth with respect to both the amount and richness of available digital content. This is the result of a range of large-scale digitization projects on books and periodicals that are occurring locally, nationally and internationally. Furthermore, a number of libraries are digitizing other information carriers such as still images, audio and video. Much of the information in libraries has good metadata; to supplement this, OCR technology is used to extract text-based information from textual information within digitized images.

As a consequence of the huge amounts of digital content becoming available, modern search engines are now being introduced in digital libraries. Online users are accustomed to Internet search engines: they expect simplicity, speed and cross-collection searching. Given the complexity and amount of digital content, third-generation search technology is required to provide highly relevant search capabilities. This search technology also scales in a cost-efficient manner and provides proven methodologies for the operation and maintenance of these systems. Hence, as digital libraries grow in content volume and diversity, this third-generation search technology provides a convenient platform to give users what they expect with respect to relevance, functionality, and speed.

The exponential growth in the volume and diversity of digital library content represents both challenges and opportunities. One obvious challenge is to keep the query response time low. While conventional search technology gives acceptable query response times when searching is limited to metadata only, it cannot handle the addition of large amounts of OCR text. With a third-generation platform, searching can be performed across multiple systems and on all the available content.

Another significant challenge is to provide the most relevant information to the user. This involves finding and ranking hits from large amounts of information, as well as retrieving relevant information that the user does not know exists or that she/he may not even know how to search for. The most relevant information may exist in another database and be described in ways that do not show up in traditional searches. Cross-database searching is another area where traditional systems are insufficient. Some systems provide distributed searching through the use of Z39.50, OpenURL and other protocols. However, they are very slow due to protocols and long response times in source systems, and do not provide one consistent relevancy function across all of the content sources.

Figure 1: Functional model at the National Library of Norway

A third challenge is searching a rich set of information types, such as the content of still images or audio. With such a diversity of information, there is a clear need for a more dynamic inclusion of new search methods on various information types.

Figure 2: Contextual searching.

Third-generation search engine technology is designed to combine the scalability of Internet search engines with new and improved relevance models.


This will include contextual relevance, allowing searches to be performed across any type of content and any type and number of sources. Libraries already have a wealth of experience in harvesting information to build centralized repositories of both metadata and content. Such environments are well suited for exploiting the capabilities of search engine technology and thus meeting the challenges above. The National Library of Norway has recently implemented infrastructure based on these principles, with the FAST search platform as a core component. This is shown in Figure 1.

This model makes it possible to perform very fast and relevant searches of large amounts of information residing in disparate databases. In this case, more than thirty metadata sources are included in addition to content from digitized newspapers, books and journals. The search service provides access to the complete palette of information at the library, including books, periodicals, still images, audio and video. It also serves as a base for statistical purposes, as well as management information.

A major advance in third-generation search technology is the introduction of contextual searching. The current Web search approach provides links to documents based on a hard-wired ranking method, such as the number of inbound links or scientific citations. This approach has some fundamental limitations for use by digital libraries. First, researchers cannot find what they don't know exists, so all queries must be specified by the user and there are no tools to create data-driven content analysis. Second, researchers will only get access to the 'most popular' or 'newest' articles as defined by a black-box relevance function. It is possible that neither of these forms of ranking will illustrate the results or trends the researcher is interested in. Third, the user interaction does not provide any learning for the researcher. The researcher must open and read the full articles in order to assess them, and this time investment is usually significant. Hence, the overall approach offered by most library services does not make the best use of the offered content.

Contextual searching has been designed to address these challenges. Figure 2 illustrates some of the key components in the foundation of a contextual search.

Contextual searches introduce new metaphors for user interaction: each document is further decomposed into semantic components that can be retrieved and analysed across billions of articles. Even ambiguous, open-ended queries can therefore be answered with highly accurate 'table-of-contents'-type results relating to factual information contained inside the relevant documents. Hence, in addition to getting access to highly relevant search results, researchers can:
• discover concepts/facts they did not know existed
• understand trends and get access to the 'long tail' of information which is invisible in traditional searches
• become more efficient and understand factual patterns across different documents.

As an example of how contextual searching improves search precision, we have run two test queries against the online encyclopedia Wikipedia (see Figure 3). Results on the left are from the query "persons that appear in the same document as the word 'soccer'"; those on the right are from the query "persons that appear in the same context (ie paragraph) as the word 'soccer'". The improvement in result quality is striking, yet the first list of results corresponds to a standard Web search result which until recently was considered state-of-the-art!

Figure 3: Comparison of results from a standard Web search (left) and a contextual search (right).

Contextual searching allows the information provider to preserve all the original information and spend less time annotating and classifying the content. To end users, it gives ease-of-use through better content and features, and creative freedom to ask questions the providers may not have planned for. In essence, the value of information should not be judged by the ability to store it, but by the ability to use it.

In summary, contextual searching enables deep semantic analysis and refinement across structured, unstructured and rich media, and dynamic interpretation of contextual meaning of the content. The overall results of the contextual combination are vastly improved discovery, schema exploration and disambiguation capabilities. Contextual search is all about turning information buried inside digital libraries into value for researchers.

Please contact:
John M Lervik, CEO
Fast Search & Transfer ASA (FAST), Norway
E-mail: [email protected]

Svein Arne Brygfjeld, Digital Libraries, The National Library of Norway
E-mail: [email protected]
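
The difference between the two Wikipedia test queries in this article (same document versus same paragraph) can be illustrated with a small sketch. This is not the FAST platform's implementation, which the article does not describe. It assumes paragraphs are separated by blank lines and that a list of known person names is available; the function `people_near`, the sample text and the name list are all hypothetical.

```python
import re

def people_near(term, documents, people, scope="paragraph"):
    """Return the known person names that co-occur with `term`.

    scope="document":  the name may appear anywhere in the same document.
    scope="paragraph": the name must appear in the same paragraph.
    """
    term = term.lower()
    found = set()
    for doc in documents:
        # Assumption: paragraphs are separated by one or more blank lines.
        units = [doc] if scope == "document" else re.split(r"\n\s*\n", doc)
        for unit in units:
            text = unit.lower()
            if term in text:
                found.update(p for p in people if p.lower() in text)
    return found

# Hypothetical two-paragraph document and person list.
docs = [
    "Ada Lovelace wrote about the Analytical Engine.\n\n"
    "Pele is widely regarded as a great soccer player."
]
people = ["Ada Lovelace", "Pele"]

doc_level = people_near("soccer", docs, people, scope="document")
par_level = people_near("soccer", docs, people, scope="paragraph")
```

With this toy input the document-level query returns both names, while the paragraph-level query returns only the name that shares a paragraph with the search term. That is the kind of precision gain the comparison in Figure 3 is meant to show.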


The Shifting Landscape of Digital Libraries Research and Development in Australia

by Jane Hunter

Over the past ten years, Australia has seen steady growth in research and development in Digital Libraries (DLs). The number, size, sophistication and recognition of related projects and initiatives has increased, leading to better funding from government and private organizations. Specific strategies have been put in place to improve Australian skills, expertise and technologies in this field. The aim is to ensure that digital content being developed in the Australian creative, cultural, educational, scientific and academic sectors is accessible in the long term. This requires that it be stored and maintained in digital libraries with robust management and preservation processes.

The first phase of DL research began in the mid- to late 1990s with Australian involvement in metadata standards designed to facilitate the discovery of digital resources, ie Dublin Core. The key Australian participants in this work included the National Library (through its Digital Services Project), the National Archives and the Resource Discovery Unit at DSTC in the University of Queensland. The main focus of this work was on metadata input tools and Dublin Core-based search engines. During the late 1990s, this research activity expanded to include metadata standards for multimedia content: images, video, audio and composite multimedia objects. However, it was still primarily focused on support for managing digital collections within libraries, archives and cultural institutions. Funding was ad hoc and fragmented, and a coordinated national effort did not exist.

In the past five years, the Australian government has attempted to develop a more structured approach to the funding of DL activities through the Systemic Infrastructure Initiative (SII) and the Australian Research Information Infrastructure Committee (ARIIC). The main focus of ARIIC is to improve the access of Australian researchers to relevant information, thereby aiding their research and making their results widely available and easily accessible. In 2003, ARIIC funded four FRODO projects (Federated Repositories of Online Digital Objects) to the tune of $AUD22 million. These are:
• Australian Partnership for Sustainable Repositories (APSR)
• Meta Access Management System Project (MAMS)
• Australian Research Repositories Online to the World (ARROW)
• Australian Digital Theses Program Expansion (ADT).

These projects focused primarily on traditional scholarly publications and on evaluating existing DL technologies or middleware developed overseas that could be promoted and deployed within Australia. These included DSpace, Fedora and Shibboleth. Although some local refinements and extensions were made, the FRODO projects have not delivered any original DL research.

Figure 1: DART's secure annotation system for the collaborative annotation of 3D crystallographic structures.

In 2005, ARIIC provided funding to $AUD nine MERRI projects (Managed Environment for Research Repository Infrastructure). By this stage, the global trend was towards the adoption of e-Research infrastructure; hence, rather than dealing with traditional scholarly publications, the MERRI projects are focusing on long-term access to raw research data (including scientific, medical, financial and sensor data) and analytical services by collaborating teams of scientists. The nine projects are:
• BlueNet: the Australian Marine Science Data Network
• Molecular Medicine Informatics Model (MMIM): a multi-institutional, multi-disciplinary research and training platform for clinical research
• Time Sync: mapping the global financial system
• Australian Service for Knowledge of Open Source Software (ASK-OSS)
• Middleware Action Plan and Strategy (MAPS)


• Legal Protocols for Copyright Management: facilitating open access to research at the national and international levels (OAKLAW)
• Dataset Acquisition, Accessibility and Annotation e-Research Technology (DART)
• E-Security Framework for Research
• Regional Universities Building Research Infrastructure Collaboratively (RUBRIC).

Most of these projects are in an early stage of development, and have not yet produced any substantial results. The DART project has, however, developed and demonstrated sophisticated annotation services for communities including protein crystallographers, climate modellers and social scientists. Figure 1 illustrates a screen shot from the secure annotation system developed by the University of Queensland to enable collaborative annotation of 3D crystallographic structures.

Apart from the ARIIC-funded projects, there are a number of additional DL research projects being undertaken within Australia that are specifically focusing on preservation. The National Library's PANDORA project is investigating digital preservation technology. In addition, the PANIC project at the University of Queensland is investigating automatic obsolescence detection and notification services, and the application of Semantic Web/GRID services to the discovery of optimum preservation services.

Please contact:
Jane Hunter
University of Queensland, Australia
E-mail: [email protected]

A Reference Architecture for Digital Library Systems

by Leonardo Candela, Donatella Castelli and Pasquale Pagano

The building of a European Digital Library requires a cooperative and distributed development model that, as far as possible, promotes the sharing and reuse of current Digital Library products. In order to support this model, an abstract solution to the problem of implementing a Digital Library – in other words, a reference architecture – is fundamental.

After about fifteen years of Digital Library (DL) research, development and deployment the research community is now well versed in the successes and weaknesses of the field. Until now a pragmatic approach has been adopted in developing systems via specialized methodologies, usually by adapting techniques borrowed from other disciplines. This approach has produced a plethora of heterogeneous entities and systems – commonly classified as 'digital library systems' – and has resulted in a lack of agreement on what should constitute the fundamental aspects of DL technology. This makes the interoperability, reuse, sharing, and cooperative development of DLs extremely difficult. Moreover, the role played by DLs undergoes continuous evolution, making current systems inadequate for future applications. Modern DLs are conceived as systems to support the whole process of dealing with human knowledge production, maintenance and communication.

A Digital Library Reference Model
To overcome these limitations and lay the foundations upon which to build future DL systems, the DELOS Network of Excellence on Digital Libraries is promoting an activity, led by ISTI-CNR, aimed at producing a Digital Library Reference Model.

A reference model is an abstract framework for describing and understanding the significant relationships between entities in an environment. It consists of a minimal set of unifying concepts and relationships within a problem domain and is usually independent of specific standards, technology, implementations or other concrete details. Its goal is to enable the development and integration of systems by using consistent standards or specifications supporting that environment.

The core of this model is being developed by a consortium that includes, in addition to ISTI-CNR, four other partners: the University of Athens (GR), the University of Glasgow (UK), the University for Health Informatics and Technology (AU), and the University of Basel (CH).

The DELOS Digital Library System Reference Architecture
One of the main outcomes of this work is the Digital Library System Reference Architecture depicted in Figure 1, which provides an abstract solution to the problem of organizing and implementing DL systems. This reference architecture is based on a loosely coupled component-oriented approach. Such an approach is fundamental for the purposes of the reference architecture, since it allows for: (i) easy tailoring of the DL through component selection and replacement; (ii) reuse of the components in different contexts; (iii) distributed installation (since each component can be independently implemented); and (iv) easy support for heterogeneity issues by using or providing an appropriate component dealing with the particular issue.

Together with the component-oriented approach we also adopt a layered approach, and organize the constituent components into three tiers: (i) the Application Framework, ie the set of libraries and subsystems supporting the operation of the other DL system components; (ii) the Enabling Components, which provide the functionality required to support cooperation among the components implementing the DL application; and (iii) the DL Application Components, which provide the DL functionality specific to end users.


The components that constitute the DL Application Components tier are further organized into functional areas. Mediation components deal with and provide access to third-party information and services that vary in structure, format, media and physical representation. Information Space Management components implement and manage the DL information objects by providing storage and preservation facilities. Access components support the discovery of DL information objects via search and browse facilities. User Space Management components provide for registration and activities concerning administration of the users. DL Management components aid the administration of the DL in terms of the librarian activity, eg review processes, policy management and preservation activities. Finally, Presentation components provide users with simple access to the DL information and services, namely the graphical user interface.

The Digital Library System Reference Architecture.

Exploitation and Future Work
In order to test this model and prove its effectiveness, a DELOS DL system prototype is being implemented. The goal is to combine existing software and technologies produced by members of DELOS, and use the guidelines provided by the Reference Architecture to build a prototype. One of the first outcomes of this activity is identifying the most appropriate current technologies: the proposed component-oriented approach dovetails with the service-oriented approach and promotes the usage of P2P and Grid technologies.

These activities, ie the Digital Library Reference Model and the DL system Reference Architecture with its preliminary implementation, are the topics of a DELOS Workshop. The workshop took place in June 2006 in Rome, and involved leading DL researchers. The objective of the workshop was to discuss the current model and to plan future activities and collaborations to build on this foundational work.

Link:
DELOS website: http://www.delos.info

Please contact:
Donatella Castelli, ISTI-CNR, Pisa, Italy
E-mail: [email protected]

DelosDLMS - Infrastructure for the Next Generation of Digital Library Management Systems

by Hans-J. Schek and Heiko Schuldt

The overall goal of the DelosDLMS is the implementation of a prototype of a next- generation digital library management system. This system combines text and audio-visual searching, offers personalized browsing using new information visualization and relevance feedback tools, allows retrieved information to be annotated and processed, integrates and processes sensor data streams, and finally, from a systems engineering point of view, is easily configured and adapted while being reliable and scalable. The prototype will be built by integrating digital library functionality provided by the DELOS partners into the OSIRIS/ISIS platform, a middleware environment developed by ETH Zürich and now being extended at the University of Basel.

In the first two years of DELOS, the EC-funded Network of Excellence on Digital Libraries, work has mainly focused on improving digital libraries (DLs) by developing independent, powerful and highly sophisticated prototype systems. Current integration activities began in early 2006 and are coordinated by the architecture work package of


DELOS. These prototype systems are work. This will be achieved by inte- services which are best suited for inte- being integrated as building blocks into grating these services into the OSIRIS gration into the DelosDLMS. For the OSIRIS/ISIS, an existing middleware infrastructure, thereby combining them first version of the integrated environment that was developed at ETH with other ISIS and non-ISIS services DelosDLMS prototype, services will in Zurich. The result of the integration – into advanced, process-based DL appli- most cases be loosely coupled. The final that is, the middleware infrastructure cations. version will then support a higher degree together with all the advanced DL func- of reliability by tightly coupling as many tionality – will constitute the The DelosDLMS services as possible. Services will be DelosDLMS. The plan for the DelosDLMS includes integrated from the following areas: the upgrading of existing ISIS compo- • sophisticated term extraction from ISIS and OSIRIS nents and services and the integration of text, text indexing and collection man- A central task in the second phase of new functionalities. The final product agement DELOS is the development of a global will support multi-object multi-feature • annotation services prototype. The objective is to build a queries over collections of different • reliable sensor data management joint prototype for the future Digital media types. Personalized browsing and • multimedia indexing Library Management System that makes information access, relevance feedback • automatic search process generation available results from many groups in and object annotation will also be con- and personalization services DELOS. This will be based on the sidered. 
OSIRIS/ISIS middleware, the development of which began at ETH Zürich for ETHWorld, the virtual campus of ETH. It was further developed for data streams and for medical objects at UMIT, and is currently being extended at the University of Basel. The OSIRIS middleware (Open Service Infrastructure for Reliable and Integrated process Support) supports programming-in-the-large; ie, the combination of arbitrary application services into so-called processes. This is realized by a set of generic (application-independent) services that include the registration of services and processes, interfaces for application development, an engine for decentralized execution of processes, and services for load balancing. In addition, it features reliable execution by applying advanced database concepts (essentially for failure handling and concurrency control) at the level of processes. ISIS (Interactive SImilarity Search) is a set of DL services that have been developed on the basis of the OSIRIS middleware. ISIS includes a sophisticated index structure for similarity searching, which is particularly well suited for high-dimensional vector spaces. Furthermore, in terms of Digital Library functions, ISIS features rudimentary support for textual and content-based audiovisual searching. It also provides basic support for relevance feedback and visualization.

With the DelosDLMS, existing ISIS services will be significantly enriched by other specialized DL services that have been developed within the DELOS net-

Since information (for instance in e-Science Digital Library applications) increasingly originates from software or hardware sensors, sensor datastream processing will also be integrated in the DelosDLMS. Essentially, all this DL functionality will be made available by means of services. The challenge of DelosDLMS is therefore to provide a scalable and reliable infrastructure where these services can be plugged in and used as building blocks.

Two alternatives exist for integrating services with OSIRIS. First, there are tightly coupled services, which are tightly integrated into the OSIRIS runtime infrastructure. Advanced failure handling and load balancing are among the main advantages of this arrangement. In terms of failure handling, compensating services can be registered which are automatically invoked in case of failures. In terms of load balancing, OSIRIS can automatically choose the node carrying the lightest load to invoke a service that is deployed several times. This is particularly important for computationally expensive services like feature extraction.

Second, services can be loosely coupled with OSIRIS, meaning that services are described and invoked by standard Web service interfaces (SOAP and WSDL). This reduces the effort needed for integration but does not provide the benefits of tight coupling.

Recently, a 'call for services' has been issued to both members and non-members of DELOS. The goal is to identify

• image feature extraction
• 3D shape recognition
• special indexing techniques for video retrieval
• audio feature extraction and audio retrieval
• advanced visualization services and visual relevance feedback
• Self-organizing maps visualization
• active paper (linking digital information and paper)
• services for transformations between standards
• ontology services and natural language access
• preservation services
• services for multi-lingual access.

This list will be extended and revised after the evaluation of the call for services and during the actual integration work. Nonetheless, it highlights examples of building blocks that will be considered for DelosDLMS.

Links:
The DELOS Project: http://www.delos.info
The DELOS Architecture Work Package: http://dbis.cs.unibas.ch/delos_website/
The ISIS/OSIRIS Homepage: http://dbis.cs.unibas.ch/research/isis_osiris

Please contact:
Hans-Jörg Schek, Department of Computer and Information Science, University of Konstanz, Germany
Heiko Schuldt, Database and Information Systems Group, University of Basel, Switzerland
Tel: +41 61 267 0558
E-mail: [email protected]
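The two advantages of tight coupling described in this article (choosing the most lightly loaded node for a service that is deployed several times, and automatically invoking a registered compensating service on failure) can be illustrated with a minimal Python sketch. This is not the OSIRIS API: the names Node, ServiceRegistry, pick_node and invoke, and the numeric load metric, are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


@dataclass
class Node:
    name: str
    load: float  # hypothetical load metric; lower means lighter load


@dataclass
class ServiceRegistry:
    # service name -> nodes on which that service is deployed
    deployments: Dict[str, List[Node]] = field(default_factory=dict)
    # service name -> compensating action invoked if the service fails
    compensations: Dict[str, Callable[[Node], None]] = field(default_factory=dict)

    def deploy(self, service: str, node: Node) -> None:
        self.deployments.setdefault(service, []).append(node)

    def register_compensation(self, service: str,
                              action: Callable[[Node], None]) -> None:
        self.compensations[service] = action

    def pick_node(self, service: str) -> Node:
        # Load balancing: choose the node carrying the lightest load.
        return min(self.deployments[service], key=lambda node: node.load)

    def invoke(self, service: str, call: Callable[[Node], T]) -> T:
        node = self.pick_node(service)
        try:
            return call(node)
        except Exception:
            # Failure handling: run the compensating service, then re-raise.
            compensate = self.compensations.get(service)
            if compensate is not None:
                compensate(node)
            raise


registry = ServiceRegistry()
for name, load in [("node-a", 0.9), ("node-b", 0.2), ("node-c", 0.5)]:
    registry.deploy("feature-extraction", Node(name, load))

chosen = registry.invoke("feature-extraction", lambda node: node.name)
print(chosen)
```

A loosely coupled service would bypass this registry and be called through its Web service interface, which is why it would not benefit from either mechanism.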

ERCIM News No. 66, July 2006 23 SPECIAL THEME: European Digital Library

A Powerful and Scalable Digital Library Information Service

by Henri Avancini, Leonardo Candela, Andrea Manzi and Manuele Simi

The implementation of a Digital Library capable of putting Europe's memory on the Web demands a service-oriented, federated and distributed approach. Supporting such an approach requires the introduction of a new type of enabling service, usually called an Information Service, which can collect and disseminate information on the resources that constitute the federation. In large and distributed Digital Libraries, the key features of this service are scalability and availability.

The DLib group of the Networked Multimedia Information System Laboratory at ISTI-CNR has extensive experience in building digital libraries (DLs). This experience arises from the participation with scientific leadership in a series of EU IST projects such as SCHOLNET. It also stems from the development of OpenDLib, a highly flexible Digital Library Service System that has been shown to be suitable for building and operating a range of digital libraries. One of these is the DELOS DL, which manages the documentation of the DELOS Network of Excellence. Another is the BELIEF DL, which serves the eInfrastructure community by collecting and providing focussed views over multimedia documents, as well as presenting the latest details relating to projects, initiatives and events.

Our experience leads us to believe that the service-oriented approach with loosely coupled services is the most appropriate architectural approach for building highly distributed systems. This approach relies on independent services that provide the expected functionality by cooperating with other services of the federation. In order to produce distributed DL infrastructures of a high quality, we find that an effective discovery phase of the constituent components and careful monitoring of the infrastructure are mandatory. Supporting these features means relying on two kinds of information about services: (i) static information, which includes data that remains fixed during the service lifetime, eg location, usage policies and configuration parameters; and (ii) dynamic information, which contains data on the operational state of the service, eg the set of properties that keeps track of events during a given sequence of interactions with a user, a device or another service.

The DILIGENT Information Service

These features were recently included in the DILIGENT infrastructure, developed as part of the project of the same name. DILIGENT is a testbed infrastructure that will allow members of dynamic virtual eScience communities to create on-demand transient DLs based on shared computing, storage, applications, and multimedia and multi-type content. It is designed as a service-oriented architecture over Grid technology and relies on WS-* family standards, namely the WSRF framework and the WS-Addressing, WS-Security and WS-Notification specifications.

In this infrastructure, discovery and monitoring occur through a specific service, called an Information Service (IS), depicted in Figure 1. This service is organized in three logical parts, each serving the needs of a class of actors: information producers, collectors, and consumers.

Figure 1. Information Service Logical Architecture.

Producers and consumers are supported in interacting with the IS via a lightweight component that is distributed on each hosting node of the infrastructure. This component is called an IS-Client, and supports three main features: (i) publication of the information (IS-IP library); (ii) access to information and discovery via querying and subscription/notification mechanisms (IS-C); and (iii) the local storage and maintenance of useful and constantly updated information (IS-Cache). The IS-Client allows information in the distributed infrastructure to be efficiently accessed and published, while hiding any detail of the routing process that could identify the collectors involved.

The collectors aggregate the producers' information. This part is composed of two components, the IS-Registry and the IS-IC. The former acts as a classical registry and maintains the list of available services and their static information. The latter maintains the dynamic information and is based on a highly distributed architecture.

From an operational point of view, it is important to note that each time one of the federation's services is deployed, it is first registered on the IS-Registry, and then starts producing its dynamic information via the local IS-IP. In parallel, the IS-Cache takes care of maintaining the set of minimal information needed by locally hosted services for both publishing and querying. The IS-Registry continuously monitors the service instances, thereby maintaining an overall 'picture' of the infrastructure in line with the actual status.

As well as designing this logical organization, we are currently evaluating and comparing various caching strategies and the distribution and selection algorithms for the IS-ICs. For instance, we are investigating the use of distributed information retrieval techniques like CORI.

Next Steps

The viability of the proposed approach will be further tested in the context of the forthcoming IST project 'Digital Repository Infrastructure Vision for European Research' (DRIVER). The objective of DRIVER is to build a testbed for a future knowledge infrastructure of the European Research Area. Existing digital repositories spread over the Net will be federated, and a set of cross-repository services will be set up to provide seamless access to the DL content, regardless of which repository owns the content. Concretely, the project will start by federating 51 institutional repositories from The Netherlands, United Kingdom, Germany, France and Belgium. Each of these repositories will be considered as an element of the component-oriented infrastructure. Other components will provide digital library functionality, eg search and browse, personalized information access through recommendations, and virtual collections. In this context the Information Service will play a key role, since it will allow the other services to become aware of each other and to dynamically discover new repositories and services as they join the infrastructure.

This work would not have been possible without the help of colleagues at the NMIS Laboratory. Special thanks go to Davide Bernardini and Pasquale Pagano for their invaluable support in designing and developing this distributed and scalable Information Service.

Links:
Networked Multimedia Information System Laboratory website: http://www.isti.cnr.it/ResearchUnits/Labs/nmis-lab/
OpenDLib website: http://www.opendlib.com/
DILIGENT project website: http://www.diligentproject.org/
BELIEF project website: http://www.beliefproject.org/

Please contact:
Leonardo Candela, ISTI-CNR, Pisa, Italy
E-mail: [email protected]
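The split between static and dynamic information in the Information Service article above, and the order in which a newly deployed service is first registered and then starts publishing its state, can be sketched in a few lines of Python. This is a simplified illustration and not the DILIGENT implementation: the class names ISRegistry, ISIC and ISClient mirror the component names in the text, but every method and field is an assumption, and the subscription/notification mechanism is left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ISRegistry:
    """Keeps the list of services and their static information."""
    static_info: Dict[str, dict] = field(default_factory=dict)

    def register(self, service_id: str, profile: dict) -> None:
        self.static_info[service_id] = dict(profile)


@dataclass
class ISIC:
    """Keeps the dynamic (operational state) information of services."""
    dynamic_info: Dict[str, dict] = field(default_factory=dict)

    def publish(self, service_id: str, state: dict) -> None:
        self.dynamic_info.setdefault(service_id, {}).update(state)


class ISClient:
    """Per-node component: publication, querying and a local cache."""

    def __init__(self, registry: ISRegistry, collector: ISIC) -> None:
        self._registry = registry
        self._collector = collector
        self._cache: Dict[str, dict] = {}

    def deploy(self, service_id: str, profile: dict) -> None:
        # Step 1: a deployed service is first registered with its static data.
        self._registry.register(service_id, profile)

    def publish(self, service_id: str, state: dict) -> None:
        # Step 2: only registered services may publish dynamic information.
        if service_id not in self._registry.static_info:
            raise KeyError(f"{service_id} is not registered")
        self._collector.publish(service_id, state)
        self._cache.pop(service_id, None)  # drop the stale local copy

    def query(self, service_id: str) -> dict:
        # Consumers see one merged view and never address a collector directly.
        if service_id not in self._cache:
            merged = dict(self._registry.static_info[service_id])
            merged.update(self._collector.dynamic_info.get(service_id, {}))
            self._cache[service_id] = merged
        return self._cache[service_id]


client = ISClient(ISRegistry(), ISIC())
client.deploy("search-1", {"location": "node-7", "policy": "open"})
client.publish("search-1", {"status": "running"})
view = client.query("search-1")
print(view)
```

In this sketch the cache is invalidated only by publications made through the same client; a distributed deployment would need the notification mechanism to keep caches on other nodes current.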

Semantic Search in Peer-to-Peer-Based Digital Libraries

by Hao Ding and Ingeborg Torvik Sølvberg

Advances in peer-to-peer overlay networks and Semantic Web technology will have a substantial influence on the design and implementation of future digital libraries. However, it remains unclear how best to combine their advantages in digital library construction. Research in the IF group at the Norwegian University of Science and Technology (NTNU) is evaluating possible solutions to advance developments in this field.

One of the most important features of the digital library of the future will be that it is accessible from anywhere, by anyone and at any time. Achieving this goal requires that the digital library be investigated as an integrated whole rather than as the sum of its individual parts. The approaches used in peer-to-peer overlay networks and Semantic Web technology show promise for aspects of communication infrastructure and semantic processing respectively. However, little work has been done to determine how best to combine these two technologies to form a total solution for digital library construction. NTNU researchers, under the framework of the IKT/WEB-TEK project sponsored by the Research Council of Norway, have developed a semantic search framework for peer-to-peer based digital libraries.

Our work, as illustrated in Figure 1, has involved comparing and identifying the strengths and weaknesses of both peer-to-peer and Semantic Web technology. Based on our analysis, we concluded that these two fields are complementary, and that there are great advantages to be gained by combining them in conducting semantic searches in a large-scale distributed environment. One major weakness in the current peer-to-peer systems is their limited search capabilities, which is due to their lack of power in responding to queries. The Semantic Web and ontologies as a semantic tool provide a basis for a shared understanding across a group of individuals,


such as in detecting similar concepts among ontologies and integrating multiple ontologies at no cost to the end users. By applying ontologies, the search capability in peer-to-peer networks can be strengthened via semantic information processing. The inference engine can also be specifically adapted to achieve more reliable results by reasoning with predefined rules.

However, while the Semantic Web and ontologies provide us with a mechanism for facilitating semantic information management and processing, they focus more on local and static situations, rather than a distributed and dynamic environment. Because they are innately decentralized, peer-to-peer systems can help exploit the full potential of the Semantic Web's capabilities. In other words, peer-to-peer systems can act as a fundamental platform for the searching and sharing of distributed information by using the Semantic Web technology.

Figure 1. Combining P2P and Semantic Web for Constructing Digital Libraries.

In our survey of existing peer-to-peer systems, our project has concentrated mainly on scalability and autonomy. From a technical perspective, digital libraries need a common infrastructure that is highly scalable, customizable and adaptable. To this end, peer-to-peer systems have been suggested as one method for facilitating cooperation among digital libraries and for improving the accessibility of library services. Another critical goal of digital libraries is the sharing of resources with a wider audience. However, many inconsistencies exist across platforms, applications and capabilities. This means that library systems must often sacrifice autonomy to reach agreement with each other, so as to enable better searching and sharing of information. In comparison with client/server architecture, peer-to-peer systems provide a more open architecture by decentralizing the control from the servers, allowing nodes (eg digital libraries) to be loosely coupled. As a consequence, system scalability and robustness can be improved with a small overhead in running specific communication protocols on these nodes.

As an intermediate goal, a tentative benchmark has been proposed for selecting an appropriate peer-to-peer network for information searching in various digital library applications. In particular, our project has extended classic super-peer-based networks with load-balancing and self-organizing functionalities, thereby catering for dynamic situations that characterize digital libraries, such as continuous departures of peers, overload caused by the joining of peers, or even a system catastrophe. Evaluation results are illustrated in Figure 2.

In studying the use of Semantic Web technology to enhance search performance in digital libraries, this project investigates not just ontology-enriched metadata searching, but also the use of rules to express more complicated relations that exist among metadata records. We have compared the performance of a super-peer-based digital library by applying searches based on global schemas and ontology mapping. Currently, we are evaluating the potential introduction of rule-based reasoning in all applications.

Link:
http://www.idi.ntnu.no/grupper/if/

Please contact:
Hao Ding, NTNU, Norway
Tel: +47 73594168
E-mail: [email protected]

Figure 2. Evaluation Results: from left: (a) Self-organizing under a scenario of continuous leaving of peers; (b) Load-balancing under a scenario of continuous joining of peers; (c) Catastrophe Recovery.
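The idea of searching a super-peer network through ontology mapping, as discussed in this article, can be sketched as follows. This is only a toy illustration in Python and not the NTNU framework: each peer keeps its own subject terms, a per-peer table maps those terms to shared concepts, and a super-peer forwards a concept query to its peers. The classes LibraryPeer and SuperPeer, the mapping tables and the sample records are all invented for the example.

```python
from typing import Dict, List, Set


class LibraryPeer:
    """A digital library that indexes records with its own local terms."""

    def __init__(self, name: str, term_to_concept: Dict[str, str],
                 records: Dict[str, Set[str]]) -> None:
        self.name = name
        self.term_to_concept = term_to_concept  # local term -> shared concept
        self.records = records                  # record title -> local terms

    def search(self, concept: str) -> List[str]:
        hits = []
        for title, terms in self.records.items():
            # Translate local terms into shared concepts before matching.
            concepts = {self.term_to_concept.get(term) for term in terms}
            if concept in concepts:
                hits.append(title)
        return sorted(hits)


class SuperPeer:
    """Forwards a concept query to every attached peer and merges the hits."""

    def __init__(self, peers: List[LibraryPeer]) -> None:
        self.peers = list(peers)

    def search(self, concept: str) -> Dict[str, List[str]]:
        results = {}
        for peer in self.peers:
            hits = peer.search(concept)
            if hits:
                results[peer.name] = hits
        return results


peer_a = LibraryPeer(
    "library-a",
    {"P2P networks": "peer-to-peer", "cataloguing": "metadata"},
    {"Report 1": {"P2P networks"}, "Report 2": {"cataloguing"}},
)
peer_b = LibraryPeer(
    "library-b",
    {"overlay networks": "peer-to-peer"},
    {"Thesis 7": {"overlay networks"}},
)
results = SuperPeer([peer_a, peer_b]).search("peer-to-peer")
print(results)
```

Both libraries answer the same query although they use different local vocabularies, which is the benefit the text attributes to combining ontologies with a peer-to-peer infrastructure.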


XPeer: A Digital Library for the European Higher Education Area

by Mark Roantree and Zohra Bellahsène

The Bologna Process was initiated in 1999 by 29 Education Ministers of the EU, with the aim of providing a single European Higher Education Area (EHEA). Its goals were to promote mobility across EU states by standardizing the education programmes in all states, and by 2010, to provide a system of constructing degree programmes across multiple states. In effect, this means that the EHEA System becomes a vast digital library with the appropriate data-management functionality. This presents a major challenge to educational institutes as they prepare to modify their systems for incorporation into the EHEA Digital Library.

There are five broad goals to the Bologna Agreement: a system of easily readable and comparable degrees; a system with two study terms; the establishment of a system of credits; the promotion of mobility and European cooperation in quality assurance; and the promotion of the necessary European dimensions in higher education. To achieve this, the issues that must be overcome are integrating the very large number of databases that contain education programmes, dealing with the heterogeneity of the systems involved, and designing a single query interface to the EHEA Digital Library.

The XPeer Architecture

The XPeer Framework emerged from a collaborative effort between Dublin City University and the University of Montpellier to design a large-scale database architecture together with Query and Metadata Services. This research examined scenarios in which it was necessary to write complex queries for large numbers of heterogeneous pre-existing databases and information systems. This project faced the same issues as Bologna institutes: management of an extremely large system of databases where old information disappears and new information is made available on a regular basis. This is essentially what takes place when old courses or subjects are decommissioned and new ones emerge.

Data integration is a significant challenge: relevant data objects are split across multiple information sources and are often owned by different organizations. The sources represent, maintain and export the information using a variety of formats, interfaces and semantics. XPeer is a peer-to-peer (P2P) architecture with each peer representing a single database through an XML interface. In the Bologna context, a peer may contain a single course module or a group of modules within the educational institute and it is assumed that each institute provides any number of information peers. The XPeer Architecture is novel in that it adopts the 'super-peer' concept, which was first employed in networking terms to denote a peer of greater importance. In XPeer, we use this concept to integrate common peers, and thus define groups such as 'All modules in Europe for Final Year Computer Graphics students', 'Database courses in Europe delivered in English', 'Networking Courses in French Universities', or '1st Year Java Programming in Ireland, Greece and Germany'. In this way, we have introduced a two-tiered system with databases (peers) on one level and super-peers (integrated systems) on the second level. In Figure 1, the clusters Ci, Cj and Ck are formed using course module databases from various institutes.

Figure 1. XPeer framework for the EHEA Digital Library.

Managing the Large-Scale Digital Library

The super-peer is the access point for users in the XPeer System, and for other super-peers that wish to communicate within the system (to resolve queries internally). It is the underlying P2P framework that supports the large-scale element to the system, as there is no central control point and therefore no bottleneck or limit to the number of participating systems. The super-peer concept permits the creation of 'clusters' of interesting systems for end-users. However, as the system grows, it is necessary to introduce a form of classification to assist with query optimization. As peers


join the system, they are classified within the existing set of domains. When XPeer is used to model the Bologna Process, these domains become the major disciplines such as medicine, computing, history and engineering. Thus, while the P2P approach allows for an infinitely large digital library, the classification process ensures that the library is ordered, and the integration process (super-peer) provides a global interface to collections of related databases. Thus, peers are accessed through super-peers; super-peers belong to a specified domain; and finally, a global peer (replicated to avoid performance issues) is used to manage the set of domains.

Current and Future Work

Current research efforts are focused on the provision of query and metadata services for the EHEA Digital Library. The Query Service uses the XPath Query Language, boosted by a fast indexing system and result-merging process to facilitate distributed querying on a large scale. The Metadata Service also uses the XPath language, extended to allow network creation, the addition and removal of peers, the promotion of common peers to the role of super-peer, the addition and removal of domains and so on. The project is expected to conclude at the end of 2008 with a set of fully specified services to support the large EHEA digital library.

Links:
XPeer publications: http://www.computing.dcu.ie/~isg
The Bologna Agreement: http://www.euractiv.com/en/education/bologna-process/article-117448

Please contact:
Mark Roantree
Dublin City University, Ireland
E-mail: [email protected]

Zohra Bellahsène
University of Montpellier II, France
E-mail: [email protected]
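The access path described in this article (peers behind super-peers, super-peers classified into domains, and a global peer managing the domains) can be sketched in Python. This is an illustration only and not the XPeer services: the class names, the sample XML and the attribute names are assumptions, and the query uses the limited XPath subset supported by the standard library module xml.etree.ElementTree rather than a full XPath engine.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


class Peer:
    """One course-module database exposed through an XML interface."""

    def __init__(self, name: str, xml_text: str) -> None:
        self.name = name
        self.root = ET.fromstring(xml_text)

    def query(self, path: str) -> List[str]:
        return [module.get("title") for module in self.root.findall(path)]


class SuperPeer:
    """Integrates related peers and merges their query results."""

    def __init__(self, name: str, domain: str, peers: List[Peer]) -> None:
        self.name = name
        self.domain = domain
        self.peers = list(peers)

    def query(self, path: str) -> List[str]:
        merged: List[str] = []
        for peer in self.peers:
            merged.extend(peer.query(path))
        return sorted(merged)


class GlobalPeer:
    """Manages the set of domains and routes queries to their super-peers."""

    def __init__(self) -> None:
        self.domains: Dict[str, List[SuperPeer]] = {}

    def add(self, super_peer: SuperPeer) -> None:
        self.domains.setdefault(super_peer.domain, []).append(super_peer)

    def query(self, domain: str, path: str) -> List[str]:
        merged: List[str] = []
        for super_peer in self.domains.get(domain, []):
            merged.extend(super_peer.query(path))
        return sorted(merged)


# Invented sample data for two institutes.
dublin = Peer("institute-1", "<modules>"
              "<module title='Databases I' lang='en'/>"
              "<module title='Java Programming' lang='en'/></modules>")
montpellier = Peer("institute-2", "<modules>"
                   "<module title='Bases de donnees' lang='fr'/></modules>")

global_peer = GlobalPeer()
global_peer.add(SuperPeer("computing-courses", "computing", [dublin, montpellier]))
english_modules = global_peer.query("computing", "./module[@lang='en']")
print(english_modules)
```

Restricting a query to one domain means that only the super-peers classified under that domain are contacted, which is the query-optimization role the text assigns to classification.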

Increasing the Power of Semantic Interoperability for the European Library

by Martin Doerr

With the support of the DELOS Network of Excellence, IFLA and ICOM are merging their core ontologies. This is an important step towards semantic interoperability of metadata schemata across all archives, libraries and museums, and opens new prospects for advanced information integration services in the European Digital Library. The first draft of the combined model will be published in June 2006.

Semantic interoperability of Digital Libraries (DLs) requires compatibility of both the employed Knowledge Organization Systems (KOS; eg classification systems, terminologies and authority files) and of the employed metadata schemata. Currently, the notion and scope of DLs covers not only traditional publications, but also scientific and cultural heritage data. The grand vision is to see all these data integrated so that users are effectively supported in searching for and analyzing data across all domains. Even though the Dublin Core Metadata Element Set is well accepted as a general solution, it fails to describe more complex information assets. These include multimedia and learning objects, and data from characteristic domains such as archaeological finds or observational data from geosciences.

Core ontologies describing the semantics of metadata schemata are the most effective tool to drive global schema and information integration, and provide a more robust, scalable solution than tailored 'cross-walks' between individual schemata. Information and queries are mapped to and from the core ontology, which serves as a virtual global schema and has the capability to integrate complementary information from more restricted schemata. Many scientists question the feasibility of such a global ontology across domains. On the other side, schemata like Dublin Core reveal the existence of overarching concepts. Ideally, the European Digital Library would be based on one sufficiently expressive core ontology, not by selection, but by harmonization and integration of the relevant alternatives. The challenge is to explore practically the limits of harmonizing conceptualizations from relevant domains.

The CIDOC Conceptual Reference Model (CRM) has been developed since 1996 under the auspices of the International Committee on Documentation (CIDOC) of the International Council for Museums (ICOM) Documentation Standards Working Group. This is occurring with the initiative and support of ICS-FORTH, Heraklion, and the CRM is about to be accepted as ISO standard (currently ISO/DIS 21127) in 2006. It is a core ontology aiming to integrate cultural heritage information. It already generalizes over most data structures used by highly diverse museum disciplines, archives, and site and monument records. Even the common library format MARC ('MAchine Readable Cataloguing') can be adequately mapped to it. Its innovation is to centre descriptions not around the things, but around the events that connect people, material and immaterial things in space-time. Further, it explicitly describes the discourse on relations between identifiers and the identified, a powerful feature for the integration of information assets.

Quite independently, the FRBR model ('Functional Requirements for


Bibliographic Records') was designed as an entity-relationship model by a study group appointed by the International Federation of Library Associations and Institutions (IFLA) during the period 1991-1997. It was published in 1998. Its innovation is to cluster publications and other items around the notion of a common conceptual origin, the 'Work', in order to support information retrieval. Its focus is domain-independent and can be regarded as the most advanced formulation of library conceptualization.

Initial contacts in 2000 between the two communities eventually led to the formation in 2003 of the International Working Group on FRBR/CIDOC CRM Harmonisation. It is headed by Martin Doerr from ICS-FORTH and Patrick LeBoeuf from BNF Paris, and brings together representatives from both communities. The common goals are to express the IFLA FRBR model with the concepts, ontological methodology and notation conventions provided by the CIDOC CRM, and to merge the two object-oriented models thus obtained. This Working Group is now being supported by the DELOS NoE, and in June 2006 will publish the first complete draft of FRBROO, ie the object-oriented version of FRBR, harmonized with CIDOC CRM. This formal ontology is intended to capture and represent the underlying semantics of bibliographic information and to facilitate the integration, mediation and interchange of bibliographic and museum information. Its major innovation is a realistic, explicit model of the intellectual creation process (see Figure). Work will continue with modelling information about authority records and performing arts.

Figure. Partial model of the intellectual creation process.

The potential impact can be high. The domains explicitly covered by the combined models are already immense. Further, they seem to be applicable to the experimental and observational scientific record for e-science applications. From a methodological perspective, the endeavour experimentally proves the feasibility of finding viable common conceptual grounds even if the initial conceptualizations seem incompatible. Even though this process is intellectually demanding and time-consuming, we hope the tremendous benefits of nearly global models will encourage more integration work on the core-ontology level. A recent practical application of these models is the derivation of the CRM Core Metadata schema, which is compatible and similar in coverage and complexity to Dublin Core, but much more powerful. It allows for a minimal description of complex processes, scientific and archaeological data, and is widely extensible in a consistent way by the CRM-FRBR concepts. CRM Core can be easily used by Digital Libraries.

Links:
IFLA: http://www.ifla.org
ICOM: http://icom.museum
Definition of the CIDOC CRM: http://cidoc.ics.forth.gr
Definition of CRM Core: http://cidoc.ics.forth.gr/working_editions_cidoc.html
Definition of FRBR: http://www.ifla.org/VII/s13/frbr/frbr.htm
DELOS NoE deliverable 5.3.1: http://delos-wp5.ukoln.ac.uk/project-outcomes/SI-in-DLs

Please contact:
Martin Doerr, ICS-FORTH, Greece
Tel: +30 2810 391625
E-mail: [email protected]
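The event-centred style of description that this article attributes to the CIDOC CRM can be illustrated with a very small Python sketch. It does not use CRM class or property identifiers: the Event record, its fields and the sample data are invented for the example. The point is only that people, things, places and dates are connected through events, so questions about any of them are answered by looking at the events in which they take part.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    kind: str                      # eg "creation", "acquisition"
    participants: Tuple[str, ...]  # people or organizations involved
    things: Tuple[str, ...]        # material or immaterial things involved
    place: str
    year: int


def events_involving(events: List[Event], entity: str) -> List[Event]:
    """Return every event in which the entity takes part, ordered by year."""
    found = [e for e in events if entity in e.participants or entity in e.things]
    return sorted(found, key=lambda e: e.year)


# Invented sample data: one object connected to two events.
events = [
    Event("acquisition", ("Museum M",), ("Painting X",), "City B", 1950),
    Event("creation", ("Painter A",), ("Painting X",), "City A", 1890),
    Event("creation", ("Painter A",), ("Painting Y",), "City A", 1895),
]

history = [(e.kind, e.year) for e in events_involving(events, "Painting X")]
print(history)
```

A thing-centred record would instead need separate creator, creation date, owner and acquisition date fields on the object itself, and every schema would choose those fields differently.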

A Tool for Converting Bibliographic Records

by Trond Aalberg

The FRBR model for bibliographic information enables libraries to accommodate a broad range of user needs, and is considered to be an important step towards the next generation of library information systems. To support the application of FRBR in current library catalogues, solutions are needed to the problem of interpreting or converting MARC-based information. At the Norwegian University of Science and Technology, we have developed a conversion tool for this purpose.

The Functional Requirements for Bibliographic Records (FRBR) was published by the International Federation of Library Associations and Institutions in 1998 and is widely acknowledged within the library community as an important contribution to the modernizing of library cataloguing and information systems. The ER-model proposed by the FRBR Working Group is a formal conceptualization of the entities, attributes and relationships of concern in bibliographic information. For the end user, the model promises to support a broad range of expectations and needs.


The heart of the FRBR model is a set of entities that represent the key objects of interest to users of bibliographic information. The products of intellectual or artistic endeavour that are named or described in bibliographic records are represented by the entities work, expression, manifestation and item. The entities person and corporate body represent those responsible for the content, production, dissemination or custodianship of the product entities. An additional set includes entities that serve as the subjects of works. For each entity a set of attributes is defined and the model includes an extensive set of possible relationships that may exist between the entities.

Although many projects have explored the use of FRBR in different contexts and some tools exist, there is little support for the systematic processing of all information in all MARC records into a proper representation that directly reflects the entities, attributes and relationships of the FRBR model. Due to a paucity of reusable solutions, researchers beginning work in this area typically need to reinvent the conversion process and write their own interpretation system. The transformation from MARC to FRBR is a complex task that in many ways is different from a simple transformation such as the conversion from MARC to Dublin Core. Entities such as work or person may have duplicate descriptions in numerous records, and to be able to create a consistent set of entities with a proper set of relationships, the process needs to be based on an extensive set of rules and conditions. The final output of the process should be a normalized set of unique entities with a proper set of attributes and relationships. Additionally, the conversion process needs to support solutions to numerous problems and exceptions. These may be caused by inconsistencies and errors resulting from erroneous registrations, data imported from low-quality sources or changes made to the catalogue.

This issue, that of transforming MARC records into a representation that directly reflects the FRBR model, has been investigated by the Norwegian University of Science and Technology in a joint project together with the Norwegian library service center BIBSYS and the National Library of Norway. The purpose of this project has been to support and explore the application of the FRBR model on existing MARC-based library catalogues. The project has two major goals: the identification and modelling of the various tasks required in a conversion process, and the development of a conversion tool that is based on the use of rules and conditions to define the transformation from MARC to FRBR. The tool is based on the use of XML, includes the automatic generation of the XSL transformation files used in the conversion, and the solution is independent of any specific MARC format and cataloguing rules. Because of this, the tool is reusable across catalogues and MARC formats. It uses records in the MarcXchange format as input and produces output in a format that is based on MarcXchange, but has specific attributes for describing the types defined in the FRBR model and elements for describing the relationships between entities.

[Figure: The process of transforming MARC records into a representation based on the FRBR model. Stages shown: Export, Preprocessing, Transformation, Normalization and Postprocessing, leading from the bibliographic catalogue (MARC records, MarcXchange) to the FRBR database (FRBR entities and relationships, FRBRXML). Conversion rules are used to create the XSLT files; labelled tasks include identifying entities, selecting and assigning attributes, creating identifiers and establishing relationships.]

The conversion tool has successfully been used to transform the 4 million records of the BIBSYS database into an FRBR-ized prototype that is available on the Web. This prototype database is primarily intended to demonstrate the results of the transformation and can be used to search and navigate the BIBSYS database in the shape of FRBR entities and relationships. The actual conversion performed on this particular catalogue is still far from perfect, and the set of conditions and rules must be extended to support the exceptions and errors in the initial data. However, the tool enables librarians to easily specify and test various rules and conditions, and the overall result can be evaluated by inspecting or querying the resulting database.

The application of the FRBR model as a common ground for interchange and integration between libraries fits well with the current focus on cross-domain semantic interoperability in digital libraries. NTNU is participating in the DELOS NoE activity on development of the FRBROO ontology, and future activities include adapting the conversion system to produce bibliographic information encoded as RDF using the FRBROO ontology, for the purpose of cross-domain integration and interoperability using semantic Web technology.

Link:
BIBSYS FRBRized prototype: http://november.idi.ntnu.no/frbrized

Please contact:
Trond Aalberg, Norwegian University of Science and Technology, Norway
Tel: +47 7359 7952
E-mail: [email protected]


Information Patterns for Digital Cultural Repositories

by Chryssoula Bekiari, Panos Constantopoulos and Martin Doerr

Digital cultural repositories emerge from the ever-increasing digitization of documents and images, digital photography, analogue-to-digital conversion of audio or video recordings, and digital transcription of object information recorded in various ways. Adding original digital cultural products and digital recordings of cultural information to these digital surrogates leads to an impressive collection of digital cultural material.

Instruction in ancient Athens, red-figured attic vase, 5th c. BC, Berlin Archaeological Museum.

As digital collections are created independently by autonomous organizations, the emergence of a unified digital space is neither automatic nor easy. Aside from the legal and organizational issues, certain conditions are required for repositories to be interoperable. These are primarily concerned with documentation data and processes. Cultural documentation comprises a wide spectrum of information on the objects themselves, physical or informational, as well as related processes ranging from data acquisition to various scientific studies, conservation, exhibition design and publication. These processes may be separately documented and multiple relevant data sets may exist. If all this information is to be truly useful, we must ensure the ability exists to easily access and analyse information from disparate sources.

Interoperability has a syntactic and a semantic aspect. Syntactic interoperability is achieved by conforming to standards for information encoding and exchange. Semantic interoperability is the ability of different information systems to provide information consistent with the intended meaning. In practice, semantic interoperability aims to associate knowledge dispersed in various carriers and forms, thereby allowing related concepts to be automatically identified. To do this, standards for representing objects, functions and content must be adhered to, during both documentation and 'productive' uses of digital information.

In order to build a framework for developing interoperable cultural digital repositories we follow a dual strategy. First, we draw on standard (meta)data structures recommended by established national and international bodies concerning archaeological, ethnological, museological, archival, geographical, terminological and digital preservation data for specific application areas. Second, all structures in the framework are related to a common ontology (namely CIDOC CRM of the International Council of Museums, also ISO Draft Standard 21127) for long-term semantic interoperability, and are written in XML for syntactic interoperability. The CIDOC CRM is an ontology describing the concepts and relations involved in cultural documentation. It provides a common base for the interpretation of various forms of documentation, but does not dictate the documentation elements. Thus it plays a pivotal role in building interoperable digital cultural repositories.

On this basis we define a set of information patterns, ie fundamental types of information unit such as time, place, object composition, event etc. This

reduces the problem of designing cultural object records to one of designing a set of information patterns and a general, flexible record structure. As there are fewer information patterns than record fields, the design and the conformance with relevant standards are much more closely controlled. Rather than independent records, we thus obtain a family of records, conformant at the information pattern level. This allows different needs to be addressed and systematic data-entry procedures to be adopted, and ensures interoperability. Examples of information patterns are given in Table 1.

Table 1: Examples of information patterns.
Date: from; until
Object composition: number of parts; part; name; kind; code or cardinal number
Chronology: within; throughout; cultural period; social time
Dating: chronology; time measurement; value; method; justification; laboratory
Place: name; code; cadastral number; kind; geopolitical hierarchy; address; coordinates; values; reference point; precision of measurement; geodesic coordinate system; link to design
Event: name; kind; chronology; place; description; persons involved; organizations involved; objects involved; comprises events
Person: name; biographical data; communication data; role/capacity/social group
Organization: title; legal address; communication data; department; role/capacity/social group

Furthermore, for the description of museum objects and site monuments we have produced a comprehensive, CIDOC CRM-compliant, common


XML DTD, resulting in a new, dynamic cultural object record of unprecedented genericity. Supporting a pragmatic object documentation workflow model, this object record can accommodate everything from minimal to highly detailed object information in structured and unstructured forms, thus gracefully adapting to the needs of various working contexts.

The XML DTD is available as a ready-to-use CIDOC CRM-compliant solution, along with higher-level guidelines and the CIDOC CRM ontology itself.

Further work includes a number of application-specific extensions and, most importantly, domain extensions arising from current advances in harmonizing the CIDOC CRM with the FRBR (the bibliographic record model endorsed by IFLA).

This work, carried out at the Centre for Cultural Informatics, FORTH-ICS, was part of the compilation of guidelines for cultural digitization projects under the Greek Information Society Programme. It draws heavily on previous and concurrent experiences from several cultural information modelling and integration projects. It also benefits from the development of the CIDOC CRM ontology and the activities within the DELOS Network of Excellence on Digital Libraries, which are devoted to the harmonization of the CIDOC CRM and ABC (DELOS 1) and the CIDOC CRM and FRBR (DELOS 2) models.

Please contact:
Chryssoula Bekiari
ICS-FORTH, Greece
Tel: +30 2810 391631
E-mail: [email protected]
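The idea of a family of records that conform at the level of information patterns can be illustrated with a small sketch. The Python fragment below is only an analogy to the XML DTD described in this article, not a rendering of it: the class names, the reduced field lists (loosely following Table 1), the `ObjectRecord` container and all example values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Date:
    """'Date' pattern, reduced to the two fields 'from' and 'until'."""
    from_: str
    until: Optional[str] = None


@dataclass
class Place:
    """'Place' pattern with a small subset of the fields in Table 1."""
    name: str
    code: Optional[str] = None
    coordinates: Optional[Tuple[float, float]] = None


@dataclass
class Event:
    """'Event' pattern; it reuses other patterns for its own fields.

    Simplification: 'chronology' is typed as a Date here.
    """
    name: str
    kind: str
    chronology: Optional[Date] = None
    place: Optional[Place] = None
    persons_involved: List[str] = field(default_factory=list)


# The closed set of patterns that every record in the family must use.
PATTERNS = (Date, Place, Event)


@dataclass
class ObjectRecord:
    """Hypothetical flexible record: named slots holding pattern instances."""
    kind: str
    slots: dict = field(default_factory=dict)

    def add(self, slot: str, value) -> None:
        # Conformance is checked at the pattern level, not per record type.
        if not isinstance(value, PATTERNS):
            raise TypeError(f"{value!r} is not an information pattern")
        self.slots.setdefault(slot, []).append(value)


# Two different record types built from the same patterns (example values).
vase = ObjectRecord(kind="museum object")
vase.add("production", Event(name="production", kind="creation",
                             chronology=Date(from_="-0500", until="-0401"),
                             place=Place(name="Athens")))

site = ObjectRecord(kind="site monument")
site.add("location", Place(name="Athens", coordinates=(37.98, 23.73)))
```

Because both records are assembled from the same small set of patterns, a consumer only needs to understand the patterns, not each record layout, which is the point the article makes about controlling conformance.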

Towards a Semantic Information Platform for Subsea Petroleum Processes

by Jon Atle Gulla

The Integrated Information Platform project is defining an international semantic standard – or ontology – that will assist oil companies in making decisions and organizing collaborations. This article describes how the standard is developed and how it is used in intelligent information retrieval and reasoning.

The subsea petroleum industry is a technically challenging business with complex projects and operational structures. The projects are expensive, and they often include several large companies and span disciplines like drilling, reservoir, production, operations and maintenance. The European petroleum business is now facing a number of challenges that threaten its profitability. The costs of large older fields increase as they enter the decline phase, and new fields tend to be smaller and less scalable. We also produce more oil and gas than is added through exploration and improved oil recovery. For the Norwegian Continental Shelf for example, the addition and production numbers are at about 100 million Sm³ and 250 million Sm³ respectively. Finally, there has been an increase in the number of small and highly specialized service companies that need to collaborate closely both with each other and with the traditional bigger companies. All of this suggests that future petroleum projects need to be more cost-effective, make better use of small fields, and take advantage of the distributed and specialized skills of new service companies.

In 2004 the Norwegian Oil Industry Association (OLF) launched an Integrated Operations program that proposed the use of new information and communication technology to integrate processes onshore and offshore. OLF's own estimates indicated that the implementation of this program on the Norwegian Continental Shelf would increase oil recovery by 3-4%, accelerate production by 5-10%, and lower operational costs by 20-30%. Central to this program was the semantic and uniform treatment of heterogeneous data, which originate from various disciplines and companies, at various locations, and with various degrees of precision and formality.

The Integrated Information Platform (IIP) project was initiated in June 2004 and is a collaborative project involving academic institutions and companies active on the Norwegian Continental Shelf. Led by Det Norske Veritas, the project has a budget of about 22 million NOK (about 2.8 million Euro), and is closely coordinated with various standardization efforts in the petroleum industry. The overall objective is to use semantic technology to improve decision-making processes and reduce risks and costs in petroleum projects. In particular, the project will result in an open platform that supports semantic interoperability and intelligent information management for subsea production systems.

IIP is now completing one of the largest industrial ontologies for the terminology used in the petroleum business. Support from central industrial partners on the Norwegian Continental Shelf has been secured, and the project intends to propose this ontology as part of a new ISO standard. As such, it will also be available to companies and institutions that are not currently part of the project. Parts of the new ontology have been converted from ISO 15926 Integration of life-cycle data for oil and gas production facilities, but after looking into the terminologies used in selected petroleum projects, we also included concepts from other ISO standards. More than 40 000 concepts have now been defined and modelled in


hierarchical conceptual structures. Figure 1 shows some of the concepts that must be defined just for the representation of wellheads. A Christmas tree is the set of valves, spools and fittings connected to the top of the well and used to control the fluid flow. The final ontology, which will be available in 2007 in OWL (Web Ontology Language) with all the properties and rules incorporated, will be used to integrate petroleum applications, interpret real-time data from subsea installations, and access the information needed in decision processes.

Figure 1: Wellhead with Christmas tree and associated ontology classes.

Finding information quickly is important in petroleum processes. With the vast number of sensors and amount of communication equipment added to new installations, the problem is more to do with relevance than lack of information. There is an overwhelming amount of information available, from project documentation to streams of real-time data from subsea installations, and it may all be relevant to decisions that need to be made quickly and accurately. The project is implementing an ontology-driven approach to searching that interprets the user's query in terms of ontological concepts and associates these concepts with weighted terms used in documents and data records. The idea is to let these weighted terms define the concepts with respect to both the information available and the preferences of the user. They may be constructed on the basis of a training set provided by the user and/or based on her past behaviour. As the user is interacting with the system, her behaviour is observed and the system refines its understanding of the user's perception of the concepts. Rather than using the ontological hierarchies to expand the search and simply increase the amount of retrieved data, the system builds personalized descriptions of ontological concepts that help us better rank the documents with respect to users' interests and needs.

Future work includes a rule-based notification component that will be used to analyse anomalies in real-time data coming from sensors measuring the production of oil and gas. Rules in OWL specify properties of the equipment used and what actions should be taken if a constraint has been violated. However, since the ontology is specified in OWL Full, it is not clear how the reasoning capabilities can be added. OWL Full does not lend itself to standard Description Logic reasoning. We are therefore looking into alternative representations of these parts or other ways of adding reasoning capabilities to our semantic information management framework.

Links:
NTNU: http://www.ntnu.no
Norwegian Oil Industry Association: http://www.olf.no

Please contact:
Jon Atle Gulla, NTNU, Norway
Tel: +47 73591847
E-mail: [email protected]
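The ontology-driven search idea described in this article, in which concepts are characterised by weighted terms and documents are ranked against those terms, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the IIP implementation: the concept names are borrowed loosely from the wellhead example, while the weights, the label table, the scoring formula (weight times term count) and the sample documents are all invented here.

```python
from collections import Counter

# Hypothetical concept descriptions: ontology concept -> weighted terms.
# In the article these weights would be learned from a training set
# and from the user's past behaviour; here they are made up.
CONCEPT_TERMS = {
    "ChristmasTree": {"valve": 0.9, "spool": 0.6, "wellhead": 0.5},
    "Wellhead": {"wellhead": 1.0, "casing": 0.7},
}

# Hypothetical table mapping query phrases to ontology concepts.
LABELS = {"christmas tree": "ChristmasTree", "wellhead": "Wellhead"}


def concepts_in_query(query, labels):
    """Interpret the query as the list of concepts whose label it mentions."""
    text = query.lower()
    return [concept for label, concept in labels.items() if label in text]


def score(document, concepts):
    """Sum weight * term frequency over all terms of the query concepts."""
    counts = Counter(document.lower().split())
    return sum(weight * counts[term]
               for concept in concepts
               for term, weight in CONCEPT_TERMS[concept].items())


def rank(documents, query, labels):
    """Return document names ordered by decreasing score for the query."""
    concepts = concepts_in_query(query, labels)
    return sorted(documents,
                  key=lambda name: score(documents[name], concepts),
                  reverse=True)


# Made-up documents standing in for project documentation.
DOCUMENTS = {
    "report_a": "valve valve spool inspection report",
    "report_b": "casing pressure test",
    "report_c": "weather forecast for the platform",
}

ranking = rank(DOCUMENTS, "status of the christmas tree", LABELS)
```

Personalisation in this picture would amount to keeping a separate `CONCEPT_TERMS` table per user and adjusting its weights as the user's behaviour is observed.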

Towards The European Metadata Registry

by László Kovács, András Micsik and Jill Cousins

UKOLN and SZTAKI are partners with The European Library in a new research collaboration that aims to create a European Metadata Registry.

The existing European Library service is an ambitious and pioneering collaboration between European national libraries. It is supported by the EU and was created under the auspices of the Conference of European National Librarians (CENL). It offers a single access point to their holdings and spans a range of collections in all the partner National Libraries. This means that researchers or informed citizens in any country can use, in a single search session, not only the resources of their own national library but also those of any other partner national libraries. The European Library adds value to content by offering indexing services through the individual national libraries. It creates a pan-European platform and is a strategic initiative in European content enrichment.

To extend this work, the European Library has commenced a new collaboration with UKOLN (United Kingdom) and SZTAKI (Hungary), which aims to create a European Metadata Registry (EMR).


This resulted in some software development and a prototype of a running registry service, which is capable of registering application profiles and their semantic connections. The reuse of profiles, terms, elements and encoding schemas is effectively supported by implementing an open platform for a collaboration of users from the CORES registry. A registry is a single place to register new metadata terms, inspect terms already in use by different partners, and to propose new terms for properties that require them.

The need for this type of joint European registry is obvious: National Libraries in Europe apply different legacy metadata schemas, including Unimarc and Marc21. The European Library uses the TEL Application Profile, which is Dublin Core with extensions, in order to ensure interoperability when performing a search across libraries and collections.

The European Metadata Registry would provide a set of services:
• It would describe different metadata schemas and/or application profiles, as well as the aims, target audiences, application circumstances and scope of the schemas.
• It would represent internal semantic structure, the hidden model of schemas. Model descriptions aim both to understand and document the terms hierarchy. Because the metadata schemas of partners are based on different background semantic models, model mappings are non-trivial and still require further scientific investigation.
• It would register tools and/or on-line services available for mappings, inferencing, translations, versioning and access.
• Finally, it would register semantic connections and relations between different schemas, thereby fostering the reuse of profiles, terms, elements and encoding schemas.

With these services the EMR will support the production of relevant crosswalks between legacy metadata schemas and/or The European Library schema. In addition, it will provide collaborative services for the development and improvement of new metadata schemas. Previous metadata registry approaches have enforced strict model-based methods in order to reach precise one-to-one schema mappings. This new approach will investigate a more relaxed mapping method, as well as a new collaboration technique: a Schemapedia. This is in the style of a Wikipedia, and is for metadata professionals who either need to develop schemas for their projects or to cross metadata access tools/services. The registry can grow organically on the basis of functional granularity and bilateral mappings. The scope and scalability of the registry are also under study, since the number of registered schemas and mappings cannot be predicted.

We intend the EMR to be a standardized, flexible and user-friendly tool. It will administer European metadata from different European cultural heritage communities including libraries, museums and archives, and its aim is to ensure transparency, access and interoperability.

Registry collaboration started with an evaluation of the available registry from The European Library. Some issues were uncovered and suggestions for its improvement have been collected. A functional implementation plan for this improvement is under development, with a likely implementation date of September 2006. At the same time, initial attempts to define the 'whys' and 'whats' of a European Metadata Registry have commenced, and are likely to be funded under the European Digital Library project in eContentPlus.

Link:
http://www.TheEuropeanLibrary.org

Please contact:
Jill Cousins, The European Library Office, The Koninklijke Bibliotheek, National Library of the Netherlands
Tel: +31 70 314 0952
E-mail: [email protected]

László Kovács, SZTAKI, Hungary
Tel: +36 1 279 6212
E-mail: [email protected]

Personalizing Digital Library Access with Preference-Based Queries

by Periklis Georgiadis, Nicolas Spyratos, Vassilis Christophides and Carlo Meghini

Searching a Digital Library (DL) using traditional database (DB) or knowledge base (KB) query-answering techniques is constrained by the precision and completeness of answers: precision may lead to an empty answer, while completeness may result in a huge answer. Yet reformulating the query-filtering conditions to avoid one handicap may lead to the other: that is, relaxing the filtering conditions may lead to a huge answer, while strengthening the conditions may lead to an empty answer. Using preference-based queries to personalize user access to a Digital Library allows result sets to be tuned between these two extremes.

In general, a DL can be seen as a collection of documents residing in various information sources (on the Web). Each document can be naturally represented by a single row in the DL catalogue. A document's row contains its ID and description. The description involves a number of multi-valued attributes, called in the sequel DL columns. The values for


each column (referred to as terms of that column), come from a specified sort associated with the column, and may be organized in a hierarchy (eg taxonomy of subject headings). A traditional DB or KB query over the DL catalogue filters its rows according to the Boolean conditions on columns as defined in the query, but offers limited support for ordering the documents in the query answer.

In order to personalize DL access we advocate an enriched form of queries, called preference-based queries. A preference over a column C of the DL catalogue is any reflexive and transitive binary relation → (preorder) over the terms of C. In other words, each pair t → t' of the relation denotes that t is preferred to t'. The presence of both t → t' and t' → t makes the terms t and t' equivalent, meaning that if a document described by t is not in the DL then one described by t' is an acceptable alternative (and vice versa).

A preference-based query comprises three parts:
- the term-filtering part q, which is Boolean conditions of terms (with an option of transitive closure due to the underlying term hierarchies)
- the preference part P which consists of preferences over the columns (ie preferences on the data level) as well as priorities over the columns (ie preferences on the schema level)
- optionally, the top-k part, ie a positive integer k denoting the maximum desired number of returned documents.

Of these three parts only the term-filtering part q is always submitted online. The remaining two parts can either be submitted online (together with q), or taken from a stored user profile and appended to q automatically. In either case such an access to a DL is a personalized access.

As DL columns are in general multi-valued, and if we consider different power domain orders, a partial order preference relation over the terms of one column may define a partial preorder over the documents in many ways. The choice is application-dependent, but in general the Hoare and Smyth relations are good candidates, considering that they preserve the initial order properties.

The evaluation of a preference-based query begins by computing the set of documents ans(q) which satisfies the query part q. Then, for each preference relation over a DL column a partial order over the set ans(q) is induced. As a partially ordered set of documents is not convenient to return as a response to a user query, we can employ variants of topological sorting to induce an ordered partition of the set ans(q). This is a collection of mutually disjoint blocks of documents in a linear order that respects the partial order, and thus the initial user preferences. In this linear order each block would correspond to a screen of relevant documents that is shown to the user, with the most preferred documents appearing first. The parameter k is used as a stop condition to end the output process (when k documents have been shown to the user).

A user may express preferences over different DL columns and each preference incurs a different ordered partition. Combining preference relations over different columns and taking into account user prioritization over the columns boils down to defining a partial order over the Cartesian product of n partially ordered sets. This is a well-known problem for which various solutions exist, such as the lexicographic ordering or the Pareto orderings. In our case, we compute the product of all partitions and we use one of the known orderings on the topological distances of documents to generate the required final ordered partition.

Figure: A preference-based query over a Digital Library.

The problem of preference-based queries for digital libraries is studied jointly by the Institute of Computer Science, FORTH, Greece; the Laboratory of Research in Informatics, LRI, France; and the Institute of Science and Technology in Informatics, CNR, Italy, within the framework of the DELOS European Network of Excellence in Digital Libraries. With respect to similar approaches, the main contribution of our work lies in the combination of ordered partitions coming from preferences expressed over multi-valued attributes, rather than over functional attributes describing traditional database tuples or objects. Our framework is expressive enough to produce sequences of documents from descriptions expressed in diverse data models (eg XML, RDF/S) with respect to a variety of user preferences, while also including priorities over the preferences.

Link:
http://www.ics.forth.gr/isl

Please contact:
Nicolas Spyratos, LRI, University of Paris South, France
Tel: +33 1 69156629
E-mail: [email protected]

Vassilis Christophides, ICS-FORTH, Greece
Tel: +30 281 0391628
E-mail: [email protected]

Carlo Meghini, ISTI-CNR, Italy
Tel: +39 050 3152893
E-mail: [email protected]
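The evaluation of a preference-based query over a single column, as described in this article, can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes that the answer set ans(q) has already been computed, that the preference is given as (preferred, less preferred) term pairs, and that documents are compared with a lifting in the style of the Hoare order (a document is at least as good as another if each term of the other is matched by an equally or more preferred term of the first); the exact lifting, the function names and the sample data are assumptions. The ordered partition is obtained by repeatedly removing the undominated documents, a simple variant of topological sorting.

```python
from itertools import product


def preorder_closure(terms, pairs):
    """Reflexive and transitive closure of (preferred, less preferred) pairs."""
    geq = {(t, t) for t in terms} | set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(geq), repeat=2):
            if b == c and (a, d) not in geq:
                geq.add((a, d))
                changed = True
    return geq


def doc_geq(terms_a, terms_b, geq):
    """Hoare-style lifting (assumed form): every term of b is matched by
    an equally or more preferred term of a."""
    return all(any((ta, tb) in geq for ta in terms_a) for tb in terms_b)


def ordered_partition(docs, geq, k=None):
    """Return blocks of documents, most preferred first.

    Whole blocks are emitted; output stops once at least k documents
    have been emitted (k=None means no limit).
    """
    remaining = dict(docs)
    blocks, shown = [], 0
    while remaining and (k is None or shown < k):
        # A document is in the next block if no other remaining document
        # is strictly preferred to it.
        block = sorted(
            d for d in remaining
            if not any(
                doc_geq(remaining[o], remaining[d], geq)
                and not doc_geq(remaining[d], remaining[o], geq)
                for o in remaining if o != d
            )
        )
        blocks.append(block)
        shown += len(block)
        for d in block:
            del remaining[d]
    return blocks


# Made-up example: one multi-valued 'subject' column.
TERMS = {"databases", "logic", "networks"}
GEQ = preorder_closure(TERMS, [("databases", "logic"), ("logic", "networks")])

# ans(q): documents that already passed the term-filtering part q.
DOCS = {
    "d1": {"databases"},
    "d2": {"logic"},
    "d3": {"networks"},
    "d4": {"databases", "networks"},
    "d5": {"logic"},
}

all_blocks = ordered_partition(DOCS, GEQ)
first_screens = ordered_partition(DOCS, GEQ, k=3)
```

Each block corresponds to one screen of results. Combining the partitions obtained for several columns, with priorities between columns, would need an additional ordering on the product of the partitions, which this sketch does not attempt.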


Multilingual Interactive Experiments with Flickr

by Jussi Karlgren, Paul Clough and Julio Gonzalo

The Cross-Language Evaluation Forum (CLEF) in 2006 will feature a track on interactive retrieval from dynamic target data taken from the popular Flickr photo-sharing service. In the past, interactive tracks at CLEF have addressed applications such as information retrieval and question answering. This year however, the focus has turned to text-based image retrieval from Flickr.

Information retrieval systems, especially text retrieval systems, have in the last few decades benefited greatly from a fairly strict and straight-laced evaluation scheme, which enables system designers to run tests on versions of their system using a test collection of pre-assessed data. These tests, based on the target notion of topical relevance, with system-oriented evaluation of performance, have served the text retrieval field well. However, system evaluation only addresses some of the bottlenecks in building a successful system.

As a complement, experiments such as iCLEF, the interactive track at CLEF, aim to investigate real-life cross-language searching problems in a realistic scenario, and to give indications of how best to aid users in solving them. This crucially involves developing new evaluation methodologies and new target notions: relevance does not cover all the aspects that make an interactive session successful.

Over the past five years, the CLEF interactive track has studied various cross-language search tasks, including retrieval of documents, answers and annotated images. All tasks involve the user interacting with information systems in a language different from that of the document collection, and have been evaluated using conventional evaluation methodologies. This involves a fairly elaborate experimental setup.

This year we introduced some major changes. We want to find a collection where the cross-language search necessity arises more naturally for average users. We have chosen Flickr, a large-scale, Web-based image database serving a large social network of WWW users. It has the potential to offer both challenging and realistic multilingual search tasks for interactive experiments.

We want to use the iCLEF track to explore alternative evaluation methodologies for interactive information access. For this reason, we have decided to fix the search tasks, but to keep the evaluation methodology open. This allows each participant to contribute with their own ideas about how to study interactive issues in cross-lingual information access. Additionally, we will lower the threshold for entry to attract more participants.

This year, the tasks given to participants are:
• Topical ad-hoc retrieval over many languages: find pictures of as many different European parliaments as possible.
• Creative open-ended retrieval: illustrate a short text on a given topic with five pictures (the text is provided separately to the experiment subjects).
• Example-based retrieval: determine the name of the place shown in a given photo.

Figure: An example photo from Flickr with multilingual annotations.

The majority of Web image searching is text-based, and the success of such an approach often depends on reliably identifying relevant text associated with a particular image. Flickr is an online tool for managing and sharing personal photographs and currently contains over five million freely accessible images. These are available via the Web, and are updated daily by a large number of users. The photos are annotated by authors with freely chosen keywords in a naturally multilingual manner. Most authors use keywords in their native language; some combine keywords in more than one language. This sort of emerging, unsupervised and distributed semantic structure is known as a folksonomy and provides a modelling challenge for traditional knowledge-based retrieval approaches.

Participants will access images and metadata in Flickr through the open API


provided by Flickr, and are encouraged to log as many details as possible about every search session. A skeleton questionnaire will be provided to collect some of the evaluation metrics, and we will aim to probe notions related to user satisfaction and confidence:
• Satisfaction (all tasks): are you satisfied with how you performed the task?
• Completion (creative task, ad-hoc task): did you find sufficient results, or would you have continued if there had not been a time limit? How long would you have continued?
• Score (example-based task, ad-hoc task): one point for each relevant image found.

This year's workshop, held in Alicante in September 2006, will discuss the efficiency of search strategies, the usefulness of tested methods, and the utility of the projected evaluation methodologies.

Links:
CLEF: http://www.clef-campaign.org
Track home page: http://nlp.uned.es/iCLEF/
Data source: http://www.flickr.com

Please contact:
Jussi Karlgren, SICS, Sweden
Tel: +46 8 633 1500
E-mail: [email protected]

Multimedia Ontologies for Video Digital Libraries

by Alberto Del Bimbo, Marco Bertini and Carlo Torniai

A research activity is under way at the Department of Informatics, Florence University, which aims at creating a framework for the automatic annotation of soccer videos and the semantic retrieval of videos based on highlights and other high-level concepts.

Effective usage of multimedia digital libraries has to deal with the problem of building efficient content annotation and retrieval tools. Video digital libraries require annotation at the level of pattern specification in order to retrieve multimedia content according to specific user preferences and high level semantic content description. We have implemented multimedia ontologies, which include both visual and linguistic concepts, showing how they can be used for video annotation and retrieval and for the creation of user interfaces that accept complex queries, such as the visual prototypes of actions, their temporal evolution and relations.

Broadcasters need tools to annotate their video asset archives in order to exploit them to produce better TV programmes, and to lower the costs of indexing and search. Usually the video annotation process is carried out manually, using predefined vocabularies and taxonomies defined by the TV archivists.

The basic idea behind multimedia ontologies is that the concepts and categories defined in a traditional ontology are not rich enough to fully describe the plethora of visual events that can occur in a video. In fact, although linguistic terms are appropriate to distinguish event and object categories, they are inadequate to describe specific patterns of events or video entities. We are investigating the representation of events that share the same patterns by visual concepts, instead of linguistic concepts, in order to capture the essence of the event visual development. In this case, high level concepts, expressed through linguistic terms, and pattern specifications, represented through visual concepts, can be both organized into new extended ontologies that couple linguistic terms with visual information.

Using visual prototypes it is possible to group different video clips according to their visual features and at the same time classify them according to the linguistic high-level semantic concepts related to the visual prototype.

We have implemented a multimedia ontology for the soccer domain. A simplified schema is shown in Figure 1. Visual concepts for the different subclasses of 'Shot on Goal' are shown. The ontology is expressed using the Web Ontology Language OWL so that it can be shared and used in a search engine to perform content-based retrieval from video databases or to provide video summaries.

The creation process of the multimedia ontology is performed by selecting a representative set of sequences containing highlights described in the linguistic ontology, extracting the visual features and performing an unsupervised clustering. The clustering process, based on visual features, generates clusters of sequences representing specific patterns of the same highlight, which are regarded as specializations of the highlight. Visual concepts for each highlight specialization are automatically obtained as the centres of these clusters. Reasoning on the ontology is used in order to refine the annotation according to temporal and semantic relations between events.

The Multimedia Ontologies Annotator is the framework that allows users to import basic ontology schemas, generate the multimedia ontology, annotate video clips according to the ontology, and perform complex queries in order to retrieve videos containing specific visual concepts and other high-level linguistic concepts.

Figure 2 shows the interface of the Multimedia Ontologies Annotator.

It should be noted that users are able not only to browse, with a single interface, soccer and other video footage, but can also easily access the visual specifications of the linguistic concepts.

When users wish to see the different visual specifications of the linguistic

ERCIM News No. 66, July 2006 37 SPECIAL THEME: European Digital Library

Figure 1: Multimedia Ontology (partial view). Figure 2: Multimedia Ontologies Annotator Interface.

concept 'Shot on Goal', they simply select the concept and the interface provides the clips that represent that concept. Moreover, a cluster view of similar visual concepts related to the linguistic concept is provided.

The queries that can be performed by the system involve both visual and high-level concepts. For instance a user can query for a sequence that starts with a forward launch, finishes with a shot on goal and contains a placed kick. He can also require that all actions are visually similar to a certain video clip or that all actions took place in a given location. The video produced should contain any type of attack action or placed kick visually similar to the selected models that occurred in soccer games played in the specified location.

Our future work will deal with the automatic generation of textual and vocal descriptions for video content based on visual features and temporal and semantic relations between concepts.

This work is partially supported by the Information Society Technologies (IST) Programme of the European Commission as part of the DELOS Network of Excellence on Digital Libraries.

Please contact:
Marco Bertini, University of Florence, Italy
Tel: +39 055 4796540
E-mail: [email protected]

Alberto Del Bimbo, University of Florence
Tel: +39 055 4796540
E-mail: [email protected]

Carlo Torniai, University of Florence
Tel: +39 055 4237408
E-mail: [email protected]

Structured Multimedia Description for Simplified Interaction and Enhanced Retrieval

by Stephane Marchand-Maillet, Eric Bruno and Nicolas Moënne-Loccoz

Multimedia management is a challenge common to several user groups, from individual users to corporate groups. It is therefore of high importance to define a platform that will resolve the contradictory issues of simplified description tasks and enhanced querying capabilities. We show that, based on technology common to Digital Libraries and the Semantic Web, we can propose such a framework.

The description of visual documents is a fundamental aspect of an efficient multimedia information management system. This is supported by the fact that a significant part of the information contained in the document can only be captured via the explicit description of a human operator. However, creating such a description is known to be both expensive and incomplete by nature. The importance of context may lead to ambiguous or even contradictory content descriptions. It is therefore critical that visual content description is done in the most favourable environment.

In the context of multimedia retrieval, MPEG-7 seems to be a good candidate as a description schema, since it caters for manual and automated description processes. However, MPEG-7 description tools are still largely not operational. Further, although based on XML, MPEG-7 hardly unifies with other schemes proposed along the Semantic Web route. Rather than pursuing the MPEG-7 direction therefore, we have


constructed a generic ontology-based annotation framework, in close relationship with the emerging W3C standards RDF (Resource Description Framework) and OWL Web Ontology Language. These developments are therefore aligned with the Semantic Web initiative, which shows that rich semantic annotations are needed for automating truly useful processes, including multilingual support.

Structured Description Model

The base of our framework is DEVA, a description scheme that has many favourable properties. DEVA acknowledges the power and utility of both the Dublin Core (DC), largely used in the Digital Libraries community, and the combination of RDF and OWL as description framework and knowledge management framework respectively, both arising in the frame of the Semantic Web.

The latest version of the Dublin Core vocabulary is composed of fifteen properties, among which we can find dc:title, dc:creator, dc:format or dc:subject. DEVA uses and extends these properties within the deva:Document wrapper class, and preserves their semantics. Being a Dublin Core extension allows the DEVA model to take advantage of a recognized standard, making it compatible with most of the software tools already available.

Wrapping DC elements has also been done in W3C's RDFPic tool using the PhotoRDF, where the subject property is constrained by a content schema. By contrast, in the DEVA model the subject property is associated with the Subject class. This class allows the semantic content of a visual document to be described. The specification of the content description is an OWL ontology designed according to the four expressive levels seen earlier:
• the Document class and its Dublin Core extended properties
• the Element class, representing the most significant elements of the scene
• the Property class, representing the properties attached to an element
• the Relationship class, representing the relationships between elements.

Figure 1: Architecture of the proposed structured document management system.

Interaction Principle

An annotation tool prototype, called magritte, has been implemented to evaluate our DEVA model and to validate its relevance in the frame of image annotation. The prototype is written in Java and makes use of the open-source Jena Semantic Web Framework developed by HP Laboratories. The interaction is simplified in many ways. The action of describing a multimedia document is almost reduced to the baseline classical keywording operation. However, the tool fully exploits the context in which keyword input takes place. Hence, every action is aligned against the knowledge base. Also, the definition of properties related to enumerable content is controlled by restricted choices. In this way, the document description is incrementally constructed by narrowing the scope of possible entries.

Interaction Principle

A structured focused description is useful only when related to corresponding queries. However, in the context of an economical description, queries should, in principle, target known properties. For example, a picture whose subject is said to be a 'bird' would never respond to a query related to 'animal' in a direct search system. In fact, a 'bird' would be as close as a 'car' from such a query. The solution to this is clearly to make use of an external knowledge base to extend the 'bird' concept to that of an 'animal'.

This is the aim of the Semantic Web Knowledge Base (SWKB) that we have created to complement our description framework with a reasoning engine capable of processing high-level queries.

Figure 2: The SWKB demo tool illustrating the query "something with a shell".


SWKB is an abstract framework embedding a reasoning engine to process DEVA (RDF) data against a classical OWL-based knowledge base using RDF/S semantics. By default, SWKB embeds the Jess (Java Expert System Shell) as a reasoning engine and may well be extended to other types of reasoning.

Extension

Thus, the above framework makes it possible to create multimedia descriptions aligned with typical assets found in digital libraries, while opening up the possibility of extended queries. We are currently extending it in directions related to content-based analysis of multimedia documents with auto-annotation as a bootstrap procedure, and the use of retrieval to organize the description at the collection level. This extension will partly take place in the domain of the Cultural Heritage asset management via the MultiMATCH project (http://www.multimatch.eu).

The support of the Swiss National Science Foundation is gratefully acknowledged.

Link:
Viper Group on Multimedia Information Management: http://viper.unige.ch

Please contact:
Stéphane Marchand-Maillet
University of Geneva, Switzerland
Tel: +41 22 379 7631 / +41 22 379 7660
E-mail: [email protected]
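The 'bird' and 'animal' example from the article can be illustrated with a minimal sketch. It is written in plain Python rather than with the Jena and Jess tools the authors used, and the concept hierarchy, document identifiers and function names are all hypothetical. It only shows the general idea of extending a subject annotation through a subclass hierarchy before matching it against a query, and is not a description of how SWKB is implemented.

```python
# Hypothetical mini knowledge base: narrower concept -> broader concept.
# These concept names are illustrative assumptions, not taken from SWKB.
SUBCLASS_OF = {
    "bird": "animal",
    "tortoise": "animal",
    "car": "vehicle",
}


def ancestors(concept, subclass_of):
    """Return the concept itself followed by every broader concept above it."""
    chain = []
    while concept is not None and concept not in chain:
        chain.append(concept)
        concept = subclass_of.get(concept)
    return chain


def matches(annotated_subject, query_concept, subclass_of=SUBCLASS_OF):
    """True if the annotated subject equals the query concept or is a subclass of it."""
    return query_concept in ancestors(annotated_subject, subclass_of)


def search(documents, query_concept):
    """Return the ids of (doc_id, subject) pairs whose subject satisfies the query."""
    return [doc_id for doc_id, subject in documents if matches(subject, query_concept)]


documents = [("img1", "bird"), ("img2", "car"), ("img3", "tortoise")]
print(search(documents, "animal"))
```

A direct keyword match would return nothing for the query 'animal' here; with the hierarchy, the pictures annotated 'bird' and 'tortoise' are found while the 'car' picture is not.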

Taking a New Look at News

by Arne Jacobs and Nektarios Moumoutzis

Although video search technology is making rapid strides forward, video search engines continue to be challenged by the semantic gap. This is the difficult problem of relating low-level features to the higher-level meaning that corresponds to the human-like understanding of video content, and a solution to it is necessary for effective retrieval performance. In the Delos Network of Excellence, specifically in task 3.9, "Automatic, context-of-capture based Categorization, Structure Detection, and Segmentation of News Telecasts", our approach to bridging this semantic gap is twofold. First, we restrict the application domain to news videos, and second, we exploit the combination of multimodal analysis with semantic analysis based on ontologies.

A key observation in bridging the semantic gap in the news video domain is that semantic concepts in news videos are conventionalized in many ways. This fact can be exploited. For example, segments in news telecasts do not appear in arbitrary order, but rather follow a relatively strict scheme that determines the order of segments. Different news stories are also usually separated by anchor shots containing the presentation of the story that follows. This telecast structure allows the viewer to easily recognize different segments. Each news format has its own structural model.

We assume that these news format models can be described with context-free grammars. Such a grammar can aid the segmentation process by removing ambiguities in classification, or by associating certain audiovisual cues with segment classes (eg news story, presentation). It models all interesting constraints in visual, audio and textual features that are partly due to the way news programs are produced and partly to the habits and preferences of journalists and other agents related to the news production.

The system architecture shows the interoperation between recognizers, grammar, and parser.

For parsing of a news video following a corresponding model, we propose a system consisting of two main types of interoperating units: the recognizer unit consisting of several modules, and a parser unit. The recognizer modules analyse the telecast and each one identifies hypothesized instances of 'events' in the audiovisual input. Such events can be higher-level concepts like a specific person appearing in a shot (eg the anchor), the appearance of a certain series of frames (eg the introduction sequence, with which many news broadcasts commence), or low-level concepts, eg the similarity of two frames.

The system contains three distinct recognizer modules: the audio recognizer,


the visual recognizer and the semantic recognizer. The visual recognizer identifies video events in the news stream, such as a face appearing at an expected position in the video or the presence of a familiar frame according to the expected structure of the broadcast. The audio recognizer identifies audio events such as the presence of speech or music, the detection of predetermined keywords, and clustering of speakers. Finally, the semantic recognizer identifies the semantics involved in the telecast. This includes topic detection, high-level event detection, discourse cues and possible story segmentation points. The figure shows a sketch of the system architecture.

The recognizers normally only communicate with the parser in a one-way communication, providing a sequence of predetermined event 'tokens'. However, in the case of the semantic recognizer there may be an exception, since that module requires a transcript of the telecast in order to perform its analysis. In the case where the transcript is not provided through the input (eg in the form of closed captions), the audio recognizer provides this information.

A stochastic parser using a probabilistic grammar analyses the identifications provided by the recognizers. In essence, the recognizers provide the parser with actual lexical tokens just as a lexical analyser would provide to a programming language parser. The grammar represents the possible structures of the news telecast, so the parser can identify the exact structure of this telecast. When the parsing is complete and all the structural elements of the input have been analysed, the semantic recognizer uses that information to identify story topics and events, and to assign all required semantics to the structure tree.

The grammar for each broadcast station, even for different news programs of the same station, is distinct. This is because the grammar captures the directing elements of the broadcast, and no two programs have exactly the same directional structure. Therefore, a grammar must be produced manually for each program examined.

To determine the probability values of the rules in the grammar, it is necessary to (currently manually) complete a training process, which uses a set of correctly labelled news recordings in the form of a sequence of tokens.

Finally, the semantic recognizer has access to an upper ontology, covering all necessary aspects required for multimedia content description, as well as domain-specific ontologies created for news. The concepts acquired from these ontologies will define those detectable semantics that can be identified in the telecast.

We are currently investigating methods for automatically determining the audiovisual cues that characterize a given news format, by analysing a set of example recordings of that format. Based on the experience that these audiovisual cues do not change frequently, we expect videos from the same format to have many nearly identical video and audio sequences at similar time-points. We are trying to exploit this by using an inter-video, intra-format similarity analysis. In the future, this will overcome the limitation of manual creation of format models.

Please contact:
Arne Jacobs, Universität Bremen, Germany
E-mail: [email protected]

George Ioannidis
Universität Bremen, Germany
E-mail: [email protected]
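The idea of parsing recognizer tokens against a news format grammar can be sketched as follows. The two-rule grammar, the token names (INTRO, ANCHOR, REPORT) and the function name are assumptions made for illustration only. The parser below is also a simple deterministic one, whereas the article describes a stochastic parser with a probabilistic grammar, so this shows only how a grammar turns a token sequence into story segments.

```python
def parse_telecast(tokens):
    """Segment a token sequence into stories using a toy format grammar.

    Hypothetical grammar (an assumption, not the authors' grammar):
        Telecast -> INTRO Story+
        Story    -> ANCHOR REPORT+

    Returns a list of (start, end) token-index pairs, end exclusive,
    one pair per story. Raises ValueError if the sequence does not fit.
    """
    pos = 0
    if not tokens or tokens[pos] != "INTRO":
        raise ValueError("telecast must start with INTRO")
    pos += 1

    stories = []
    while pos < len(tokens):
        start = pos
        # Each story opens with an anchor shot presenting it.
        if tokens[pos] != "ANCHOR":
            raise ValueError(f"expected ANCHOR at token {pos}, got {tokens[pos]}")
        pos += 1
        # The anchor shot must be followed by at least one report shot.
        if pos >= len(tokens) or tokens[pos] != "REPORT":
            raise ValueError(f"expected REPORT at token {pos}")
        while pos < len(tokens) and tokens[pos] == "REPORT":
            pos += 1
        stories.append((start, pos))

    if not stories:
        raise ValueError("telecast must contain at least one story")
    return stories


tokens = ["INTRO", "ANCHOR", "REPORT", "REPORT", "ANCHOR", "REPORT"]
print(parse_telecast(tokens))
```

For the example sequence the parser finds two stories, each starting at an anchor shot, which mirrors the observation that anchor shots separate news stories.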

Radio Relief: Radio Archives Departments Benefit from Digital Audio Processing

by Martha Larson, Thomas Beckers and Volker Schlögell

The archives departments of radio broadcasters are currently facing two significant challenges, namely, how to store rapidly increasing amounts of radio content, and how to satisfy the rising demand for easy retrieval of audio clips that can be recycled into new programs. A pilot project demonstrates that digital audio processing techniques have the potential to provide much-needed support.

Radio broadcasters rely on highly specialized staff to archive broadcast content and respond to requests from journalists and editors for audio content on certain topics. As radio expands rapidly into the digital world, the amount of radio content produced and the demand for a convenient way to access this content for recycling have been growing at a rate that threatens to overwhelm archives departments. The pilot project Audiomining is being undertaken in Germany by Westdeutscher Rundfunk (WDR) and Deutsche Welle (DW) in cooperation with the Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS). It has developed an audio archive prototype that demonstrates that automatic audio processing methods have a clear and concrete potential to provide critical support for archivists, journalists and editors in the face of these challenges.

Currently, many radio broadcasters maintain extensive databases containing annotations of analogue radio recordings, painstakingly compiled by the archive staff. When the archives department receives a request from a journalist or editor, the metadata in these databases is searched and the corresponding analogue recording can be located in storage. Information concerning the recorded content that is not noted in the annotations is effectively 'lost' in the archive, since it


cannot be retrieved. As radio broadcasters move towards completely digital workflows, it becomes possible to use automatically generated metadata to supplement the human-produced annotations.

The Audiomining prototype system provides both an indexing interface (which allows archivists to load new radio content into the system for processing) and a search interface. The search interface allows archivists not only to retrieve programs from the archive using titles and production dates, but also to type in keywords, which are then searched for in speech recognition transcripts. This option means that the content of radio broadcasts is directly searchable. The search interface returns a hit list, and individual hits can be opened with a simple click in the graphic audio browser. The audio browser displays a radio program as a series of cuts corresponding to segments of the program containing music or speech. Those containing speech are further divided into segments spoken by the individual speakers, who are assigned speaker index numbers. The graphic audio browser displays keywords that have been found in the radio program at their relative positions, and it is possible to click on keywords and jump into the audio at the exact point when the keyword is spoken.

The interfaces of the Audiomining system were developed in close cooperation with archivists from WDR and DW. The project blended tried-and-true techniques used by the archives departments with new digital audio technology in order to create a concept for a new integrated workflow, which would provide comfortable and intuitive support for archivists for both annotation and retrieval of radio content. Archivists feel that the structured browsing offered by the graphic audio interface will allow them to listen to radio programs in a targeted way, using their annotation time to concentrate on adding high-level semantic labels to targeted radio segments. The keyword search also has clear potential to help archivists locate sections of radio broadcasts, in particular interviews that are relevant to user requests.

The indexing module of the Audiomining stand-alone prototype produces metadata in MPEG-7 format. First, it uses audio segmentation, based on the well-known Bayesian Information Criterion, to determine boundaries at which the quality of the audio changes (for example at a speaker turn). It then applies a classifier that separates speech from non-speech, which is generally music. In the next step, it groups all the speech segments into classes that are acoustically similar. These classes correspond to speakers and are assigned a speaker index. Finally, the speech segments are sent to the speech recognizer for the generation of speech recognition transcripts on which the keyword search is carried out.

The Audiomining project is in its final evaluation stage and has accomplished its goal of demonstrating that digital audio processing technology can be smoothly incorporated into archivists' workflows. Automatic systems will not replace human archivists in the foreseeable future. However, the potential inherent in automatic structuring and audio keyword search demonstrates promise to provide significant relief for radio broadcasters, who are inundated with audio content and sorely in need of techniques that make spoken audio as easily accessible as text.

Please contact:
Martha Larson, Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Germany
Tel: +49 2241 14 1980
E-mail: [email protected]

Thomas Beckers
WDR Dokumentation & Archive, Germany
Tel: +49 221 220 4799
E-mail: [email protected]

Volker Schlögell, Deutsche Welle Archive-Bibliothek-Dokumentation, Germany
Tel: +49 228 429 4368
E-mail: [email protected]
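The segmentation step based on the Bayesian Information Criterion can be illustrated with a simplified sketch. It assumes a single one-dimensional feature value per audio frame and looks for only one change point, whereas real systems typically work on multi-dimensional acoustic features and search for many boundaries. The function names, the penalty weight and the toy signal are all assumptions for illustration and do not describe the Audiomining implementation.

```python
import math


def _log_var(xs):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.log(max(var, 1e-6))


def best_bic_split(xs, penalty_weight=1.0, min_len=5):
    """Return (index, delta_bic) for the best single change point in xs.

    A positive delta_bic favours modelling xs[:index] and xs[index:] as two
    separate Gaussians rather than modelling all of xs as one Gaussian.
    """
    n = len(xs)
    # A 1-D Gaussian has two parameters (mean and variance), so the extra
    # model costs 0.5 * 2 * log(n), scaled by a tunable weight.
    penalty = penalty_weight * math.log(n)
    whole = 0.5 * n * _log_var(xs)
    best_index, best_delta = None, float("-inf")
    for i in range(min_len, n - min_len + 1):
        delta = (whole
                 - 0.5 * i * _log_var(xs[:i])
                 - 0.5 * (n - i) * _log_var(xs[i:])
                 - penalty)
        if delta > best_delta:
            best_index, best_delta = i, delta
    return best_index, best_delta


# Toy feature track: 50 frames around a low level, then 50 around a high level.
signal = [0.0, 1.0] * 25 + [10.0, 11.0] * 25
index, delta = best_bic_split(signal)
print(index, delta > 0)
```

On this toy signal the best split falls at frame 50, where the level changes, and the positive criterion value indicates that two segments explain the data better than one.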

Self-Organizing Distributed Digital Library Supporting Audio-Video

by László Kovács, András Micsik, Martin Schmidt and Markus Seidl

The StreamOnTheFly network combines peer-to-peer networking and open-archive principles for community radio channels and TV stations in Europe. StreamOnTheFly demonstrates new methods of archive management and personalization technologies for both audio and video. It also provides a collaboration platform for community purposes that suits the flexible activity patterns of these kinds of broadcaster communities.

Community broadcasters are non-profit, are open to the general public and have a local or regional scope. They provide access to radio and TV production facilities for organizations, groups and individuals aiming to make their own programs or shows. Community radio stations and TV channels would like to be able to archive, exchange and stream content over the Internet, but appropriate tools are not yet available at an affordable price. They have therefore expressed a desire to set up a robust distributed infrastructure with a technological solution that is flexible, effective and low-cost.

In 2002, the StreamOnTheFly project with the participation of Public Voice Lab (Austria), SZTAKI and Team Teichenberg (Austria) obtained grants


from the European Commission's IST Programme to build a middleware application for radio with various front ends. StreamOnTheFly was focused on next-generation audio content management and broadcasting offering a customizable community radio program. It is now able to demonstrate new methods of management and personalization technologies for both audio and video. It also provides a collaboration platform that suits the flexible activity patterns of local broadcaster communities.

In a time when access to broadband Internet and the demand for online video material are rapidly increasing, video compatibility has become an important criterion for online media archives. With this in mind, the development of the video extension was initiated in January 2006 by a group of developers at the University of Applied Sciences St. Pölten.

In the case of video, an emphasis was placed on compatibility with a wide variety of user devices, such as mobile phones, iPods, PDAs and Sony's Playstation Portable. The realization of this goal was achieved with the help of ffmpeg, the leading open-source video transcoding tool. Another enhancement of the archive was the adaptation of StreamOnTheFly's RSS (Really Simple Syndication) functionality to make video podcasting an integrated part of the archive.

StreamOnTheFly Architecture

The original StreamOnTheFly architecture consisted of three main components:
• station control: a Web application to manage radio broadcasting, programme scheduling, and the archiving and publishing of selected programmes to node servers
• node server: a distributed network of servers containing the archived programmes with their metadata, access statistics and usage history; users may browse for interesting materials, and compile personalized radio streams for listening
• portal server: personal or community selections of content can be presented in a customized format; the archived content is revitalized through various subjective filters.

The core of the StreamOnTheFly network combines peer-to-peer networking and open-archive principles, while other services are realized as separate network components communicating through open interfaces (APIs). There is no central server in the network, and metadata is exchanged in a peer-to-peer manner. The content is stored on the node of the publisher while all content metadata is available on all node servers. This enables fast searching and browsing with reasonable storage requirements.

A simple exchange format called XBMF (Exchange Broadcast Binary and Metadata Format) is a core element of the network. XBMF enforces the coupling of metadata with content, and also creates the possibility of providing content in different audio formats, or attaching different media (eg images, text) to the audio content.

During the evolution of the StreamOnTheFly network the node server component (archive) remained in focus, while the other two components (station control, portal) were made replaceable using standard communications. A plug-in for free, professional, radio station management software is under development: this will support the archiving of programmes on node servers. RSS 2.0 or OAI-PMH (Open Archives Initiative - Protocol for Metadata Harvesting) are used to promote archived content to portal engines (eg Typo3, Manila, Plone) and search engines. Via podcasting, selected content can be easily transferred to PDAs, iPods and other handheld devices.

Figure 1: StreamOnTheFly Node Server Interface.

Figure 2: StreamOnTheFly Network Protocols and Access Methods.

StreamOnTheFly in Operation

The StreamOnTheFly network has been operational since October 2003. The core of the network (five network nodes are up currently) is accessible in four languages (English, German, French and Hungarian), and contains more than 1800 hours of audio content stored in 1700 programmes (http://radio.sztaki.hu). Other applications of the software include an exhibition, school and fair radios, e-learning and linguistic projects. The StreamOnTheFly software suite is open-source and licensed under the GNU General Public License.

Although the StreamOnTheFly project officially finished in June 2004, voluntary work is continuing to implement various extensions and use cases. Since that time StreamOnTheFly has proved its flexibility and extensibility in several cases. The architectural decisions made in StreamOnTheFly justified themselves as an open and scaleable base for further developments.

Links:
Project home page: http://www.streamonthefly.org
The Hungarian node: http://radio.sztaki.hu

Please contact:
László Kovács, SZTAKI, Hungary
Tel: +36 1 279 6212
E-mail: [email protected]

Markus Seidl
University of Applied Sciences St. Pölten, Austria
Tel: +43 2742 313 228 - 245
E-mail: [email protected]
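The use of RSS 2.0 for podcasting archived programmes can be illustrated with a minimal sketch that builds a feed in which each item carries an enclosure pointing at a media file. The channel title, URLs, file sizes and the function name are hypothetical placeholders, and the sketch does not reproduce the feed that StreamOnTheFly node servers actually generate.

```python
import xml.etree.ElementTree as ET


def build_podcast_feed(channel_title, channel_link, programmes):
    """Build a minimal RSS 2.0 document with one <item> per programme.

    Each programme is a dict with the keys "title", "url", "bytes" and
    "mime" (hypothetical field names chosen for this sketch).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Archived programmes"
    for prog in programmes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = prog["title"]
        # The enclosure element is what podcast clients download.
        ET.SubElement(item, "enclosure", url=prog["url"],
                      length=str(prog["bytes"]), type=prog["mime"])
    return ET.tostring(rss, encoding="unicode")


# Placeholder data: example.org URLs and sizes are invented for illustration.
programmes = [
    {"title": "Morning show", "url": "http://example.org/a.mp3",
     "bytes": 1234567, "mime": "audio/mpeg"},
    {"title": "Street festival", "url": "http://example.org/b.mp4",
     "bytes": 7654321, "mime": "video/mp4"},
]
feed_xml = build_podcast_feed("Community radio archive", "http://example.org/", programmes)
print(feed_xml)
```

Because the enclosure carries its own media type, the same feed structure can serve both audio and video items, which is the property video podcasting relies on.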

Repositories and Preservation in the UK

by Neil Jacobs

In recent years, universities and colleges have begun to think more carefully about the management of their intellectual output. Influenced by enlightened self-interest and prompted by the encouragement of funding and regulatory bodies, these higher education institutions are making greater efforts to showcase their research. One of the potential tools in their armoury is the repository, which allows digital objects to be managed locally and accessed globally. In the UK, the Joint Information Systems Committee (JISC) is investing some £13.8m over three years to make this a reality.

JISC's vision is to establish a network of digital resources and services to improve content use and curation. Through a significant investment programme, JISC will build on work on the 'Information Environment' undertaken by UKOLN. This has resulted in a national architecture that provides easier discovery of, and access to, digital content. However, significant development is still required in a number of areas, eg in preserving digital content.

A new JISC development programme will fund initiatives to develop the Information Environment and support digital repositories and preservation, including cross-searching facilities across repositories. It will also fund institutions to develop a critical mass of content, and will provide preservation solutions and advice for the development of repositories. The programme builds on existing JISC work, in particular the 'Digital Repositories' programme and the 'Supporting Digital Preservation and Asset Management in Institutions' programme.

The new programme comprises the following areas:
• Digital Repositories: projects will develop repositories for universities and colleges. This includes a major project that will aid higher education institutions in establishing and developing repositories.
• Digital Preservation: the programme will develop a distributed environment for digital preservation, in which services, roles and responsibilities are defined.
• Discovery to Delivery: this includes a searching service across UK repositories, and development projects to achieve agreement on standards for searching and semantic interoperability.
• Tools and Innovation: the programme will develop new software and tools which will lead to innovative approaches to repository use and digital preservation.
• Shared Infrastructure: in support of both national and international developments, the programme will develop shared infrastructure services such as user profiling services, digital rights management, registries, identifier services, terminology and preservation services.

If these are the instruments, then what will be the outcomes? Firstly, institutional repository services will be created, enhanced and (perhaps most importantly) populated. It will be possible to search in increasingly sophisticated ways, based on a range of effective and practical interoperability standards. Such standards will also underpin preservation services, and universities and national bodies will share the responsibility for preservation. We will have a much clearer idea of how repositories should be used to support education and research. Pilots and demonstrators will illustrate the potential in this programme, and software and tools will make it practical.

It is anticipated that the programme will result in a range of benefits for universities and colleges, including an increased capability to manage intellectual property for education and research, and an infrastructure that will support the sector into the future.

Link:
http://www.jisc.ac.uk/

Please contact:
Neil Jacobs, Joint Information Systems Committee, UK
Tel: +44 117 33 10772
E-mail: [email protected]


Digital Library of Historical Newspapers

by Martin Doerr, Georgios Markakis, Maria Theodoridou

A management system for historical newspapers that supports both digital library functionality and archival management of original newspaper articles is being developed for the needs of the Vikelea Municipal Library of Heraklion. It includes OCR-based page analysis and article clipping, article-level metadata generation, semantic indexing and multifaceted classification of articles using a built-in thesaurus. We aim to improve the classification, completeness and precision of retrieved information - supporting both metadata and full-text searching - and to provide user-friendly Web access.

An important part of the study of historical newspapers consists of classifying the material and annotating it such that its future retrieval is made easier. The system has a variety of goals, including supporting the preservation, documentation and study of historical newspapers. It also aims to protect people from exposure to potential health hazards and to assist in the production and dissemination of electronic versions of publications, thereby promoting cultural education.

The structural particularities of digitized newspaper documents pose a significant challenge in creating an efficient digital library system interface. A newspaper page consists of articles (text blocks), pictures and advertisements that refer to a variety of real-world events, activities, actors and/or objects. Consequently, the page itself is not the basic conceptual unit of information and is therefore not suitable for a thorough metadata-based description of the material. Instead we focused on the notion of the segment as a basic conceptual unit. A segment may consist of one or more parts of the newspaper document that are conceptually relevant (ie an article, a group of articles or advertisements etc).

The historical newspaper management system implements a 'hybrid' form of classification and searching based on the following elements:
• user-generated metadata for each annotated segment of the original newspaper based on the CIDOC Conceptual Reference Model ISO/DIS 21127
• full text of the annotated segment of the newspaper produced by an OCR (optical character recognition) session.

The large volume of the material that needs to be digitized and classified poses another important challenge. The system will be used to digitize approximately 100,000 pages. Given the fact that each page generally contains between five and twenty articles, we need to create an efficient and flexible interface as well as a mass import/OCR mechanism in order to reduce the time and cost of the digitization process.

Historical newspaper management system architecture.

The historical newspaper management system consists of the following subsystems:

The Digital Library deals with the management of the archival catalogue and information on the contents of the newspapers. It therefore supports thematic indexing and classification based on concepts retrieved from appropriate thesauri.

At the core of the Historical Newspaper Digital Library is the Fedora open-source digital repository system, which is a flexible content repository system that provides organizations with flexible tools for managing and delivering their digital content. Fedora is jointly developed by Cornell University and the University of Virginia Library.

The functionality of the digital repository is enhanced by the use of SIS Thesaurus Management System, which is a semantic network used to store, develop and access multiple thesauri and their interrelations under one database schema. The semantic interoperability of the digital repository with the thesaurus management system aids users in classifying and retrieving newspaper articles.

The Documentation Tool provides an efficient Web-based user interface for the insertion, filing, documentation and classification of material, and follows international standards for information modelling and interoperability. We have created a flexible, easily deployable and user-friendly Web interface for this system to enable the researcher to isolate a specific conceptual entity within the document and perform an on-the-fly creation, description and storage of the produced metadata.

In addition to the creation of the segment, the system performs an

ERCIM News No. 66, July 2006 45 SPECIAL THEME: European Digital Library

extraction of the text included in the annotated segment of the document and stores it for full-text search purposes.

Graphical terminology visualization techniques enable the user to annotate the document according to appropriately developed thesauri. The combination of thesauri visual graphs and auto-complete algorithms significantly reduces the time needed for the creation of metadata and supports the efficient sharing of knowledge among the members of a community of annotators.

The Administrator Tool allows the mass storage of digitized material (JPEG images) into the digital repository, and the transformation of this material into a format that can be annotated and indexed by the experts via the documentation tool.

The historical newspaper management system is currently being used in the Vikelea Municipal Library of Heraklion to upload a significant part of the historical archive of newspapers and magazines regarding the history of Crete.

Link:
http://www.ics.forth.gr/isl/cci.html

Please contact:
Maria Theodoridou, ICS-FORTH, Greece
Tel: +30 2810 391 731
E-mail: [email protected]
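To illustrate the 'hybrid' search over segments described in the article above, here is a minimal sketch in Python. It is not the system's actual implementation (which builds on Fedora and the SIS Thesaurus Management System); the `Segment` class, its field names and the `hybrid_search` function are hypothetical names chosen for illustration, and thesaurus terms are reduced to plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One annotated segment (e.g. an article) clipped from a newspaper page."""
    segment_id: str
    page_id: str
    # User-assigned classification terms (stand-in for thesaurus concepts).
    thesaurus_terms: set = field(default_factory=set)
    # Full text of the segment as produced by an OCR session.
    ocr_text: str = ""


def hybrid_search(segments, term=None, phrase=None):
    """Return segments matching a metadata term and/or a full-text phrase.

    A criterion left as None is ignored; if both are given, both must match.
    """
    hits = []
    for seg in segments:
        if term is not None and term not in seg.thesaurus_terms:
            continue
        if phrase is not None and phrase.lower() not in seg.ocr_text.lower():
            continue
        hits.append(seg)
    return hits
```

Keeping the metadata terms and the OCR text on the same segment record is what allows a single query to combine both criteria; a page-level record could not do this, since one page typically holds several unrelated articles.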

DML-CZ: Czech Digital Mathematics Library

by Jirí Rákosník

Mathematics, much more than any other area of science, depends on access to literature that may be tens or even hundreds of years old. The rapidly increasing extent of this kind of information makes efficient searching and navigation difficult, especially if the majority of works remain accessible only in paper form. Contemporary scholarly literature is commonly available in electronic form online, which enables information to be stored, organized, searched and accessed in a digital environment. It would be highly advantageous if this were also possible for the older body of literature.

A number of recent projects worldwide – JSTOR and NUMDAM, for example – were set up with the aim of digitizing historical mathematical literature. Having different initiatives working on the same problem might result in many different formats and interfaces. To avoid such a mess, discussions started in order to define common standards and best practices. In addition, conditions were set for interlinking the individual projects in an ambitious system called the World Digital Mathematical Library (WDML). The entire mathematical literature is estimated to consist of approximately 50 million pages.

Encouraged by these activities, the Czech Mathematical Society initiated a national digitization project called DML-CZ: Czech Digital Mathematics Library (see http://dml.cuni.cz for more details about the project and other digitization initiatives). Proposed for the period 2005–2009, it is supported by the Academy of Sciences of the Czech Republic within the framework of the national research programme Information Society.

Figure: The proposed scheme of the DML-CZ.

The aim of the project is to investigate, develop and apply techniques, methods and tools that would allow the creation of a suitable infrastructure and conditions for establishing what will become the DML-CZ. It will consist of the historical mathematical literature published in the Czech lands, and upon completion it will be incorporated into the WDML. The project will involve launching the digitization process and providing end users with access to the digitized material. It will also involve research into advanced technologies for searching


mathematical documents, and for including both existing and future 'born-digital' materials. Presumably, in view of the common history and lingual similarity, suitable Slovak mathematical literature will also be included.

Creating an adequate digital library is a complex task and requires numerous problems to be solved. These include the following areas, which will be tackled within the project:
• Acquisition: technical preparation of materials to be digitized; intellectual property and copyright issues.
• Digitization: setting technical parameters compatible with the WDML Best Practice Statements; setting the digitization workflow; selection and adaptation of software supporting the digitization process; OCR processing and post-processing; provision of metadata.
• Digital documents: Digital Objects structure specification; defining standards for descriptive, structural and administrative metadata; global persistent identification; archiving and presentation formats; conversions between formats and generation of digital derivatives; inclusion of born-digital materials; automatic conversions of visually marked OCR data into logically structured documents.
• Digital library: implementation of the Content Management System; providing access to the digitized material; interlinking the content with the reference databases ZMATH and MathSciNet; research and implementation of advanced search techniques; the DML-CZ administration including long-term preservation of the digital content.
• Integration of the DML-CZ in the WDML.

The testbed for the DML-CZ is being built upon digitized documents from the Czechoslovak Mathematical Journal. The electronic material created within the DIEPER project (Mathematica Bohemica and Commentationes Mathematicae Universitatis Carolinae) offers another possibility.

The complexity of this task requires the expertise of specialists in distinct fields. The team therefore consists of five groups from different institutions:
• Mathematical Institute AS CR, Prague (project co-ordination, selection and preparation of materials for digitization, IPR and copyright issues)
• Institute of Computer Science, Masaryk University, Brno (technical integration, development of the digital library for the DML-CZ, metadata provision coordination, incorporation of the DML-CZ into the WDML)
• Faculty of Computer Science, Masaryk University, Brno (OCR post-processing, techniques for searching and presenting digital documents)
• Faculty of Mathematics and Physics, Charles University, Prague (user requirements, metadata specifications, links to ZMATH and MathSciNet)
• Library of the Academy of Sciences, Prague (digitization, OCR, storage and presentation of digitized content within the Kramerius digital library system).

Links:
http://dml.cuni.cz
http://www.jstor.org
http://www.numdam.org
http://gdz.sub.uni-goettingen.de/dieper

Please contact:
Jirí Rákosník, Mathematical Institute AS CR, Prague, Czech Republic
Tel: +420 221403446
E-mail: [email protected]

CASPAR and a European Infrastructure for Digital Preservation

by David Giaretta

The preservation of digitally encoded information is a difficult task, requiring long-term commitment and collaboration. CASPAR, a new EU FP6 Integrated Project, addresses this problem. Together with other major European initiatives, it will form the basis of a continent-wide preservation infrastructure, and will benefit both current and future users. CASPAR (Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval) is an EU Integrated Project, which began in April 2006 with a budget of around 16 MEuro (8.8 MEuro from the EU).

One of the challenges currently facing modern society is the vast amount of intrinsically fragile digital information upon which it is increasingly becoming dependent. CASPAR intends to address this problem by building a pioneering framework – based on existing and emerging standards – to support the end-to-end preservation 'lifecycle' for scientific, artistic and cultural information.

The ambitious goal is to build up a common preservation framework for heterogeneous data, along with a variety of innovative applications. This will be achieved through the following high-level objectives:
• establishing a foundation methodology applicable to an extensive range of preservation issues
• researching, developing and integrating advanced components to be used in preservation activities. These components will be the building blocks of the CASPAR Framework
• creating the CASPAR framework: the software platform that enables the building of services and applications that can be adapted to multiple areas.

The CASPAR consortium will demonstrate the validity of the CASPAR framework through heterogeneous testbeds.


Figure 1: OAIS functional model. Figure 2: OAIS information model.

These will cover a wide range of disciplines from science to culture, contemporary arts and multi-media, and will provide a reliable common infrastructure that can be used or replicated in other areas.

CASPAR proposes a set of tough metrics by which it, and any other project which claims to be doing something useful for digital preservation, may be judged.

The CASPAR consortium will also seek to guarantee the future evolution of CASPAR in the following ways:
• the CASPAR preservation user community will be built to create consensus around the initiative and gather a critical mass of potential users
• the CASPAR framework and components will be embedded within key memory organizations, both national and international.

To achieve this, CASPAR brings together a consortium covering important digital holdings, with the appropriate extensive scientific (CCLRC, the lead partner, and ESA), cultural (UNESCO) and creative expertise (INA, CNRS, University of Leeds, IRCAM and CIANT). This is combined with commercial partners (ACS, ASemantics, MetaWare, Engineering, and IBM/Haifa), experts in knowledge engineering (CNR and FORTH) and other leaders in the field of information preservation (University of Glasgow and University of Urbino).

Models
The Reference Model for an Open Archival Information System (OAIS, ISO 14721), which forms the basis of CASPAR, contains a number of models, including a functional model (Figure 1) and an information model (Figure 2). CASPAR adds to these a high-level model of virtualization and a number of high-level components.

The components of infrastructure that CASPAR will produce must themselves be preservable. To this end the project will put 'knowledge' at the heart of preservation. By this we mean that besides simple data semantics, CASPAR will also capture higher-level semantics. Furthermore, we will use Semantic Web techniques to enable the infrastructure components to survive changes over time.

Regardless of how successful CASPAR is as a project, it nevertheless has a limited life. In order to provide long-term support we aim to embed CASPAR results into the production processes of long-lived organizations such as CCLRC, ESA, UNESCO and INA, as well as many related archives.

In addition, the Task Force on Permanent Access to the Records of Science has produced a research programme and strategic plan, the former being consistent with that of CASPAR. Part of this strategic plan is to create an 'alliance' consisting initially of major data holders across Europe. Members of the alliance can, among other things, seek to align their individual infrastructures to form the basis of a Europe-wide preservation infrastructure. It is also hoped that a European Digital Information Infrastructure for Preservation and Access (EDIIPA) will be added to the ESFRI Roadmap, to further embed these activities.

Figure 3: CASPAR virtualization model.
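The OAIS information model mentioned under 'Models' can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CASPAR code: the class and function names are hypothetical, Representation Information is reduced to a free-text description, and SHA-256 is an assumed choice of fixity mechanism (OAIS does not prescribe one). The grouping follows the OAIS reference model: Content Information pairs a data object with the Representation Information needed to interpret it, and Preservation Description Information records reference, provenance, context and fixity.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentInformation:
    data_object: bytes          # the bits being preserved
    representation_info: str    # how to interpret those bits (simplified to text)


@dataclass(frozen=True)
class PreservationDescription:
    reference: str    # identifier for the content
    provenance: str   # history and origin of the content
    context: str      # relationship to its environment
    fixity: str       # digest used to detect undocumented alteration


@dataclass(frozen=True)
class ArchivalInformationPackage:
    content: ContentInformation
    pdi: PreservationDescription


def make_aip(data, representation_info, reference, provenance, context):
    """Build a package, computing the fixity digest from the data object."""
    digest = hashlib.sha256(data).hexdigest()
    return ArchivalInformationPackage(
        ContentInformation(data, representation_info),
        PreservationDescription(reference, provenance, context, digest),
    )


def fixity_ok(aip):
    """Check that the data object still matches its recorded digest."""
    return hashlib.sha256(aip.content.data_object).hexdigest() == aip.pdi.fixity
```

The point of the sketch is that the bits alone are never stored on their own: without the accompanying Representation Information a future user, to whom the data is unfamiliar, could verify the digest but not understand the content.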


Immediate Benefits from Digital Preservation
While many reasons exist for preserving digitally encoded information, a large proportion – such as legal requirements – are transitory. Longer-term reasons tend to be very worthy (eg for the good of future generations) but do not fare well in competition with other activities that seek support from cash-limited funders. In addition, benefits are hard to quantify.

Yet an immediate benefit can be identified, as long as the preservation is successful. The OAIS view is that the test of preservation is that digitally encoded information should remain comprehensible and useful for future users to whom that data is unfamiliar. However, potential users exist right now to whom the data is unfamiliar. Pulling current data from the Internet (eg for use in a GRID application) has many analogies with retrieving archived data, and indeed it may be hard to distinguish between the two. While it is true that for current data it may be possible to communicate with the data producer, it would be much more convenient not to rely on that but to have automated processes that can use the data correctly.

The virtualization techniques needed for preservation can in many cases provide exactly that capability. They also offer the opportunity to support generic applications that can deal with data from any source, by using the appropriate virtualization information.

We believe CASPAR to be the first project with the aim of producing broadly applicable components and a framework for digital preservation. The confluence of events at a European level offers the opportunity for CASPAR to make more than an ephemeral contribution, even when measured on the long timescales of data relevance. Furthermore, the techniques being adopted offer immediate benefits to current users.

Links:
CASPAR: http://www.casparpreserves.eu
DCC Development: http://dev.dcc.ac.uk
Task Force on Permanent Access: http://tfpa.kb.nl
OAIS Reference Model: http://public.ccsds.org/publications/archive/650x0b1.pdf
Digital Curation Centre (DCC): http://www.dcc.ac.uk
ESFRI: http://cordis.europa.eu/esfri/

Please contact:
David Giaretta, Associate Director (Development) Digital Curation Centre and co-ordinator of CASPAR project
Tel: +44 1235 446235
E-mail: [email protected]

PROBADO – Non-Textual Digital Libraries put into Practice

by Thorsten Steenweg and Ulrike Steffens

In the PROBADO project, librarians and computer scientists are collaborating to produce workflows, systems and tools that enable libraries to professionally handle non-textual documents alongside their traditional textual documents.

According to a study at the University of Berkeley, the information produced globally in 2002 amounts to 5 billion terabytes. Information on paper only represents 0.001% of new information recorded in all media and is very often simultaneously stored in digital format. Meanwhile, the relevance of non-textual digital documents is increasing. This is obvious in our private lives, where digital cameras and music downloads are becoming ubiquitous. However it is also true for professionals, such as architects who produce and combine digital 2- or 3D graphical models of monuments, musicians and composers who create and reuse digital audio recordings, or teachers who work with e-learning material. This last example also illustrates another development: digital documents are becoming more and more complex, ie they may consist of a variety of partial documents, possibly based on different types of media.

Figure 1: Acquisition of 3D content by scanning.

Digital libraries bring together proven expertise in typical library workflows and technical know-how on large, distributed information systems. Hence, they are also candidates for the management of complex, non-textual documents. However, today's libraries are mainly associated with the provision of literature and texts. Furthermore, they usually offer no way of taking into account a document's internal structure. For instance, it is possible to retrieve whole books, but there is no means of accessing single chapters or illustrations.

Figure 2: Interactive 3D search interface.

The PROBADO project started in February 2006 and is being conducted by the University of Bonn, the Technical University of Graz and the OFFIS Research Institute, as well as by the German National Library of Science and Technology in Hannover and the Bavarian State Library in Munich. It aims to support libraries in professionally handling non-textual, complex documents alongside their traditional text documents. The resulting information system will be the basis for a sustainable operational library service, which provides access to non-textual documents for scientists and professionals. Initially, PROBADO will provide services for music, 3D graphics and e-learning content. The underlying digital library system is, however, highly generic. Mechanisms to extend the PROBADO services to different media types will be devised in future project activities.

The challenges to be met by PROBADO can be best explained along the workflow typically implemented by a scientific library:
• In the acquisition phase, documents in various formats are collected and brought into the library.
• In the indexing phase, the documents are prepared for usage by the library patrons. Information on content, form and bibliographic data is deduced and included into the library's catalogues.
• Indexed documents can be retrieved from the library by search or browse facilities.
• A retrieved document can be accessed in some way by the library patron.
• Finally, archiving activities ensure the long-term availability of the library's documents.

Although this workflow is well understood for text documents, it raises new requirements if non-textual documents are also to be managed. PROBADO users will for example expect to be able to search for 3D models of buildings with Gothic windows, for pieces of music containing a certain musical theme or melody, or for e-learning material for students in the first year. To support them PROBADO has to offer enhanced content-based indexing and retrieval methods as well as advanced, flexible user interfaces.

In the area of music, score images are analysed by Optical Music Recognition algorithms and are later synchronized with the respective audio recordings. Among other things, the user interface enables the user to type in note representations or whistle or hum a theme into a microphone. The music index is used to retrieve pieces of music matching the user's request. It can highlight the requested part within the score and synchronously play its audio interpretation.

In the area of 3D graphics, a catalogue is developed of basic architectural shapes, which are then used to index architectural 3D models. Users can then search the model database by giving a textual description like 'buildings with Doric columns', by choosing a basic shape from the catalogue and interactively parameterizing it, or by sketching the architectural shape they are interested in. Search results can be adequately rendered in a 3D browser.

In contrast, e-learning content cannot be restricted to certain media types, and semantically combines different media in ever-changing formats. Hence, PROBADO is developing extensible indexing and retrieval algorithms. These allow existing content-based retrieval methods for different media types to be integrated and enriched, enabling searching by didactic aspects.

The PROBADO project is funded by the German Research Foundation and also collaborates with the DELOS Network of Excellence, ensuring European dissemination. The project has a tentative duration of five years.

Link:
PROBADO home page: http://www.probado.de

Please contact:
Ulrike Steffens, OFFIS, Germany
Tel: +49 441 9722 176
E-mail: [email protected]

Jochen Meyer, OFFIS, Germany
Tel: +49 441 9722 185
E-mail: [email protected]
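The article above mentions searching for pieces of music containing a given theme. One simple way to make such a search independent of the key in which the user types or hums the theme is to compare sequences of pitch intervals rather than absolute pitches. The Python sketch below illustrates that idea only; it is an assumption for illustration, not a description of PROBADO's music index, and the function names and the use of MIDI note numbers are hypothetical choices.

```python
def intervals(pitches):
    """Differences between consecutive pitches (in semitones)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def find_theme(score, theme):
    """Return the start positions in `score` where `theme` occurs,
    regardless of transposition.

    Both arguments are lists of pitches given as MIDI note numbers.
    A theme of fewer than two notes has no intervals and matches nothing.
    """
    s, t = intervals(score), intervals(theme)
    if not t:
        return []
    return [i for i in range(len(s) - len(t) + 1) if s[i:i + len(t)] == t]
```

For example, a theme entered a fifth higher than it appears in the score still matches, because its interval sequence is unchanged. A real system would also need to tolerate rhythm differences and inexact humming, which this sketch does not attempt.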

WIKINGER - Semantically Enhanced Knowledge Repositories for Scientific Communities

by Lars Broecker

While many scientific communities use the Internet for the exchange of scientific knowledge, it is only rarely used for the collaborative creation of it. The WIKINGER project is working on semantically enhanced knowledge repositories that support the collaborative generation of knowledge by providing a semi-automatically generated semantic net of the topics contained. A Web application is being developed with a front end built on Wiki technology.

As the success of the Wikipedia project shows, collaborative knowledge creation on the Internet is possible, even viable. This is an interesting result, taking into account that the user base generally acts anonymously and is spread all over the world. However, disadvantages exist to the Wiki approach to knowledge creation, especially for scientific communities. First, there is the problem of attaining a critical mass. The domains are often highly specialized, leading to only small numbers of people interested in (or even qualified for) participation, which makes attaining a critical mass of information very difficult. Second, there is a problem in the way HTML handles the linking of pages. Hyperlinks are one-way only and do not carry any semantics besides 'go there from here'. This becomes a problem as soon as the need


to assign semantic labels to associations arises, eg in order to enable more sophisticated search tools than full-text retrieval.

The scenario illustrated above is typical for academic communities, especially in the humanities. In general, a variety of publications exist that deal with special facets of the discipline. Each of these contains a multitude of pieces of information on people, institutions, places and events, as well as the associations between them. Unfortunately, organized knowledge repositories available in digital format are rare. This problem is recognized in the community, as the efforts necessary to find these pieces of information among the publications grow. Such information would be well suited for publication in a Wiki system, provided that a) the process of identifying articles and their relationships can be automated to a high degree, and b) that the problem of missing semantics in hyperlinks can be solved, since there are many different types of relationships that need to be expressed in hyperlinks.

The goal of the WIKINGER (Wiki Next-Generation Enhanced Repositories) project is the creation of a semantic Wiki containing both the entities relevant to the domain and the qualified associations connecting them. The main difference to other projects dealing with semantic Wikis is the level of automation. The project is working towards the semi-automatic creation of a base for the semantic Wiki from the digital repository, thus reducing the amount of work necessary to attain critical mass.

Figure: Creation of a semantically enhanced knowledge repository.

The process used by the project is shown in the Figure. The initial phase (labelled 0) can be seen as a bootstrapping phase for the system. An initial collection of digitally available data sources including publications, articles or databases is assembled and converted to a format suitable for further processing. The WIKINGER system stores both the original data as well as the derived format in a document repository. The data is then processed by a module doing Named Entity Recognition (NER, labelled 1), which gathers entities according to entity classes. A human annotator provides the module with examples for the designated classes, thus aiding the system in learning those classes. The advantage of this approach is the flexibility to include new classes: given specific examples, the system can learn to recognize them.

The output is a collection of recognized entities which serves as the input for stage 2. Stage 2 tries to identify the associations between the different entities. The result of this stage is a semantic net forming a hypothesis of the knowledge contained in the data sources. This hypothesis is evaluated by human experts, and this evaluation is used as the input for another iteration of the net-building process. This in turn is evaluated, and so the process continues. When the experts are satisfied with the results, the semantic net is deployed for use in the WIKINGER-repository.

This repository combines the functionality of a Wiki system with the expressiveness of associations found in languages of the Semantic Web. Nodes in the Semantic Net translate to articles in the Wiki; the different types of association between them form the hyperlinks connecting the articles. Since the Wiki is simply a user interface to the Semantic Net, the semantics behind the hyperlinks are retained and can be used for intelligent software assistants. The Net is kept in sync with the Wiki through use of a feedback loop that subjects all changes in the articles to the same process as the original data. This allows the identification of new topics or associations that come up in daily work with the Wiki.

The project is conducted by the University of Duisburg-Essen and the Fraunhofer Institute for Media Communication in cooperation with the Commission for Contemporary History (KFZG) in Bonn. The pilot project focuses on the domain of Contemporary History, in particular on the social and political history of German Catholicism. While still in its early stages, the first results from the project are very encouraging. At the moment we are working on a prototype offering basic functionality, which will enable users to test-drive the system early in the development cycle.

The project is funded by the German Federal Ministry of Research and Education in the program 'eScience'. Work on the project commenced in October 2005, and will be completed in September 2008.

Link:
WIKINGER project: http://www.wikinger-escience.de (as of yet German only)

Please contact:
Lars Broecker, Fraunhofer-Institute for Media Communication, Germany
Tel: +49 2241 14 1993
E-mail: [email protected]
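The contrast drawn in the article above between plain hyperlinks and typed, navigable associations can be made concrete with a small Python sketch. It is a toy illustration under stated assumptions, not the WIKINGER implementation: the `SemanticNet` class and its method names are hypothetical, and the entity and relation names in the usage note are placeholders rather than data from the project.

```python
from collections import defaultdict


class SemanticNet:
    """Entities connected by labelled associations, navigable in both directions."""

    def __init__(self):
        self._out = defaultdict(set)  # subject -> {(relation, object)}
        self._in = defaultdict(set)   # object  -> {(relation, subject)}

    def add(self, subject, relation, obj):
        """Record the association 'subject --relation--> obj'."""
        self._out[subject].add((relation, obj))
        self._in[obj].add((relation, subject))

    def links_from(self, entity):
        """Typed links leaving an article, as (relation, target) pairs."""
        return sorted(self._out[entity])

    def links_to(self, entity):
        """Typed links arriving at an article, as (relation, source) pairs."""
        return sorted(self._in[entity])

    def find(self, relation, obj):
        """All entities connected to `obj` by the given relation."""
        return sorted(s for r, s in self._in[obj] if r == relation)
```

With associations such as `net.add("Person A", "member_of", "Organisation B")`, a query like `net.find("member_of", "Organisation B")` answers a question that an ordinary one-way, unlabelled hyperlink cannot: who is linked to this article, and in what role.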

ERCIM News No. 66, July 2006 51 Articles in this Section R&D AND TECHNOLOGY TRANSFER

52 Visualization Computer measures Coral Visualization Structures by Chris Kruszynski and Annette Kik, Computer measures Coral Structures CWI, The Netherlands

53 Pattern Recognition by Chris Kruszynski and Annette Kik Fast Synthesis of Dynamic Colour Textures To conserve the Earth's coral reefs, biologists need to study them. In order to by Jiøí Filip, Michal Haindl, Institute make better coral measurements possible, CWI and the Universiteit van of Information Theory and Amsterdam developed sophisticated visualization methods to detect thickness, Automation, Academy of Sciences / angles, lengths, spacing and branch ordering. CRCIM, Czech Republic, and Dmitry Chetverikov, SZTAKI, Hungary Coral reefs are important for the oceans' human drilling –making it hard to distin- 55 Multilingual/Multimodal biodiversity and the growing nature- guish the inside of the coral from the Information Retrieval MultiMATCH -Multilingual/ based tourism industry sector. They background. 'Skeleton loops' – due to Multimedia Access to Cultural show a variety of forms depending on low scanner resolution or branches Heritage the environment, such as light, water growing back together - make branch by Carol Peters, ISTI-CNR, Italy flow and nutrients. Corals can, for ordering, and thus measurement, impos- instance, be ball-shaped or formed like a sible. Noise filtering might affect the 56 Security Access Control and Data tree. The same species of coral can look shape, but the noise itself influences the Distribution Solutions for the completely different depending on envi- measurements, and must thus be Swedish Network Based Defence ronmental factors, which influence the reduced. by Frej Drejhammar, Ali Ghodsi, Erik growth pattern. To compare and classify Klintskog, Erik Rissanen and Babak specimens, it is important to make very Despite these problems, an interactive Sadighi, SICS, Sweden precise measurements of coral thickness visualization system has been created

58 Ambient Intelligence
Bringing Ambient Computing out of the Labs - INRIA's Agreement with JCDecaux
by Michel Banâtre, INRIA / IRISA, France

59 Text Research
New Text - New Conversations in the Media Landscape
by Jussi Karlgren, SICS, Sweden

60 Logistics
LOG4SMEs: Improving the Logistics Performance of SMEs in the Automotive Sector
by Imre Czinege, Széchenyi István University, Győr, Hungary, and Elisabeth Ilie-Zudor and András Pfeiffer, SZTAKI, Hungary

61 Networks
VTT Develops Dependability Evaluation Methods for IP Networks
by Ilkka Norros, VTT - Technical Research Centre of Finland

62 Software Engineering
Validating Complex Telecommunication Software
by Sergio Contreras, María del Mar Gallardo, Pedro Merino, David Sanán, University of Málaga/SpaRCIM; Javier Rivas, Joaquín Torrecilla, Centro de Tecnología de las Comunicaciones, S.A (CETECOM), Spain

62 IST Results - Insight into EU R&D Achievements

52 ERCIM News No. 66, July 2006 R&D AND TECHNOLOGY TRANSFER

and branch distances. Biologists used to do this by hand, which takes a lot of time, and can cause unwanted errors. Due to the large number of branches, and complexity of corals, such manual measurements are generally only performed for a single metric – for example, branch length – or only for a small number of branches of a specimen.

To make these coral measurements easier, quicker, more accurate and robust, and more comprehensive, Krzysztof Kruszynski (CWI) and Jaap Kaandorp (Universiteit van Amsterdam) developed a method for the quantification of branching coral shape, for which coral specimens are scanned in a CT scanner. The scan data are filtered, segmented and transformed to a centerline skeleton. This method simplifies detection of features like branches, branching locations, and endpoints. The skeleton is then measured by the computer, and the results are subjected to statistical analysis; the measurements include thickness, angles, lengths, and spacing of branches.

However, several problems can occur. Noise comes from CT scanner ray scattering, decayed parts of the coral and creatures growing on it. Branches broken during transport are reattached with glue, influencing the measurements. It is difficult to fill holes - made by worms or

Volume data from a CT scanner with a computed skeleton inside. Colours indicate branch numbers, counted from the outside. Picture CWI.

and is being used. It is easier and quicker than older systems. The system has been created using existing open source software, such as the NLM Insight Segmentation and Registration Toolkit (ITK) for image processing, and the Visualization Toolkit (VTK) for visualization and interaction. New software was written for performing the actual measurements, using the VTK framework. The application area – coral biology – is novel for these techniques, some of which are quite advanced. For example, the Curvature Flow filter from ITK is an advanced image filtering technique which reduces the amount of noise with minimal impact on the shape of the coral.

Surface of a real, scanned coral and a simulated specimen. The right one is the real coral. Pictures CWI.

Some of the tasks in the new system are performed manually. Segmentation – dividing the 3D image into objects of interest – is mostly automatic, but it produces hollow coral with holes in it. The inside is filled, and the holes are patched, but the results of this automatic process must be checked for correctness. The skeleton can be extracted without user intervention, but any loops in the skeleton must be disconnected by hand; there is no known technique to reliably determine where a part of the skeleton should be removed to disconnect the loop, or how much of the skeleton should actually be removed. The computer assists the user, and the number of loops is typically very small, making this an easy and quick task.

In the future, the researchers want to create more advanced result visualizations; these are currently shown as a large collection of graphs and numbers. The accuracy of the computer measurements also still needs to be quantified. Clustering algorithms could detect similarities between species. Biologists can examine similarities and differences in shape, and study the correlation between shape and environment. This might help biologists with the important conservation challenges of coral reefs.

This work was partially carried out in the context of the Virtual Laboratory for e-Science project (http://www.vl-e.nl). This project is supported by a Bsik grant from the Dutch Ministry of Education, Culture and Science (OC&W) and is part of the ICT innovation program of the Ministry of Economic Affairs (EZ).

Links:
http://www.cwi.nl/ins3
http://www.science.uva.nl/research/scs/GF2004/
http://homepages.cwi.nl/~kruszyns/

Please contact:
Krzysztof Kruszynski, CWI
Tel: +31 20 592 4325
E-mail: [email protected]
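The skeleton measurements described in the coral article above (endpoints, branching locations, branch lengths and loops) can be illustrated with a small graph routine. This is only a sketch under stated assumptions: it is not the CWI software, which is built on ITK and VTK, and the function name `analyse_skeleton`, the input layout (a dictionary of 3D points plus a list of edges) and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def analyse_skeleton(points, edges):
    """Hypothetical helper: find endpoints, junctions, branches and loops.

    points: {node_id: (x, y, z)} skeleton node coordinates
    edges:  list of (a, b) pairs joining neighbouring skeleton nodes
    """
    nbrs = defaultdict(set)
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)

    # Endpoints touch one edge; branching locations touch three or more.
    endpoints = sorted(n for n in points if len(nbrs[n]) == 1)
    junctions = sorted(n for n in points if len(nbrs[n]) >= 3)
    key_nodes = set(endpoints) | set(junctions)

    # Trace each branch from a key node through degree-2 nodes
    # until the next key node, summing the segment lengths.
    branches = []
    seen = set()  # undirected edges already assigned to a branch
    for start in sorted(key_nodes):
        for first in sorted(nbrs[start]):
            if frozenset((start, first)) in seen:
                continue
            prev, cur = start, first
            length = math.dist(points[prev], points[cur])
            seen.add(frozenset((prev, cur)))
            while cur not in key_nodes:
                nxt = next(n for n in nbrs[cur] if n != prev)
                length += math.dist(points[cur], points[nxt])
                seen.add(frozenset((cur, nxt)))
                prev, cur = cur, nxt
            branches.append((start, cur, length))

    # Number of independent loops = edges - nodes + connected components.
    n_edges = len({frozenset(e) for e in edges})
    unvisited = set(points)
    components = 0
    while unvisited:
        components += 1
        stack = [unvisited.pop()]
        while stack:
            node = stack.pop()
            for n in nbrs[node]:
                if n in unvisited:
                    unvisited.remove(n)
                    stack.append(n)
    loops = n_edges - len(points) + components

    return {"endpoints": endpoints, "junctions": junctions,
            "branches": branches, "loops": loops}


# Toy Y-shaped skeleton: a stem (0-1-2) that forks into two tips (3 and 4).
points = {0: (0, 0, 0), 1: (0, 0, 1), 2: (0, 0, 2), 3: (1, 0, 2), 4: (-1, 0, 2)}
edges = [(0, 1), (1, 2), (2, 3), (2, 4)]
result = analyse_skeleton(points, edges)

# Joining the two tips creates one loop that a user would have to cut.
looped = analyse_skeleton(points, edges + [(3, 4)])
```

The loop count only reports that a loop exists; as the article notes, deciding where to cut it is left to the user.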

Pattern Recognition
Fast Synthesis of Dynamic Colour Textures

by Jiří Filip, Michal Haindl and Dmitry Chetverikov

The textural appearance of many real-world materials is not static, but changes over time. If such change is spatially and temporally homogeneous, these materials can be represented by means of dynamic textures (DT). DT modelling is a challenging problem that can improve the quality of computer graphics applications. As part of the MUSCLE Network of Excellence, collaboration between three ERCIM members – CRCIM-UTIA, SZTAKI and CWI – has developed a novel hybrid method for colour DT modelling.

Dynamic or temporal textures (DT) can be defined as spatially repetitive motion patterns exhibiting stationary temporal properties and having indeterminable spatial and temporal contents. The surface of water, fire and straw in the wind are typical DT examples. As a basic representation of DTs, a video sequence has finite duration. This limits the use of DTs in virtual reality systems of any kind, making temporally unconstrained modelling of DT a challenging problem for research areas such as computer vision, pattern recognition and computer graphics. Previous DT modelling approaches were based either on video editing techniques or time-consuming mathematical models, which were generally restricted to greyscale DT modelling.

The proposed method shows good performance for most of the tested DTs: this depends mainly on the properties of the original sequence. Moreover, this method significantly compresses the original data and enables high-speed synthesis of unlimited artificial sequences, which is easily performed by means of contemporary graphics hardware.

Figure 1: Scheme of the proposed dynamic texture hybrid model.

The method, illustrated in Figure 1, is based on a combination of input data dimensionality reduction using the eigen-analysis, and the subsequent modelling of resulting temporal coefficients by means of a causal simultaneous autoregressive random field model (CAR). The model is learned from real measured DTs (typically 250-frame video sequences). Measured data often show
spatial discontinuity between successive images in DT sequences of very fast processes. Furthermore, available sequences are usually too short for robust statistical estimation of model parameters. We therefore performed the interpolation of individual temporal coefficients by means of cubic splines. This pre-processing step generates additional frames between each pair of original frames and improves the learning quality of the underlying random field model. The major advantage of the CAR model is that it can be solved analytically under several additional and acceptable assumptions.

The CAR model synthesis is very simple. New temporal mixing coefficients of individual eigen-images can be directly generated from the model equation using the estimated model parametric matrix and a multivariate Gaussian generator with estimated noise variance. Both the synthesis of new temporal coefficients and the following interpolation of eigen-images can be performed at even faster rates using contemporary graphics hardware programming. Moreover, this technique enables significant compression of the original DT data, typically at a ratio of between 1:5 and 1:10 depending on the length and the character of the DT sequence. The method was verified visually and by using two proposed statistical similarity measures on dynamic texture data sets. These include fire, boiling water, moving escalators, smoke, straw etc, and are taken from the DynTex texture database maintained by our partners at CWI. The comparison of original and synthesized DT frames of natural textures is shown in Figure 2. The corresponding results for the man-made textures are illustrated in Figure 3. The analysis time of the original DT was about three minutes. The synthesis of a new DT sequence is very fast (about 60 frames/s using non-optimized CPU software implementation on a PC with an Athlon 2GHz processor), and the generation time can be further improved using the programmable processing unit of a contemporary graphics card.

Figure 2: Examples of frames from original DT (odd rows) and the corresponding synthesised frames using the proposed model (even rows) for three natural DTs.
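The synthesis step described above (new temporal coefficients drawn from an autoregressive model driven by a Gaussian generator) can be sketched as follows. This is a deliberately simplified illustration, not the authors' CAR model: it assumes each temporal coefficient is modelled on its own by a first-order scalar autoregression, whereas the article describes a multivariate model with a full parameter matrix. The function names and the toy input series are hypothetical.

```python
import random


def fit_ar1(series):
    """Least-squares fit of x[t] = a * x[t-1] + noise; returns (a, noise_std)."""
    prev, curr = series[:-1], series[1:]
    denom = sum(p * p for p in prev)
    a = sum(p * c for p, c in zip(prev, curr)) / denom if denom else 0.0
    residuals = [c - a * p for p, c in zip(prev, curr)]
    sigma = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
    return a, sigma


def synthesise(series, n_frames, seed=0):
    """Generate n_frames new coefficients continuing from the last measured one."""
    rng = random.Random(seed)
    a, sigma = fit_ar1(series)
    value = series[-1]
    out = []
    for _ in range(n_frames):
        # Model equation: previous value scaled by a, plus Gaussian noise.
        value = a * value + rng.gauss(0.0, sigma)
        out.append(value)
    return out


# Toy coefficient track (assumed data): an exactly decaying series,
# so the fitted parameter is 0.5 and the estimated noise is zero.
measured = [1.0, 0.5, 0.25, 0.125]
a, sigma = fit_ar1(measured)
new_coefficients = synthesise(measured, n_frames=1000)
```

Because the model is generative, the output length is not tied to the length of the measured sequence, which is the property the article relies on for unlimited synthesis. Real coefficient tracks would give a non-zero noise estimate.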

Figure 3: Examples of frames from original DT (odd rows) and the corresponding synthesised frames using the proposed model (even rows) for two man made DTs.

Links:
DTdemos: http://ro.utia.cz/demos/DTsynth.html
DynTex database: http://www.cwi.nl/projects/dyntex/
http://ieeexplore.ieee.org/Xplore/guesthome.jsp (see IEEE Digital Library for the article).

Please contact:
Jiří Filip, Institute of Information Theory and Automation, Academy of Sciences / CRCIM, Czech Republic
Dmitry Chetverikov, SZTAKI, Hungary
Tel: +420 266052365
E-mail: [email protected]


Multilingual/Multimodal Information Retrieval
MultiMATCH - Multilingual/Multimedia Access to Cultural Heritage

by Carol Peters

MultiMATCH, a 30-month specific targeted research project under the Sixth Framework Programme, plans to develop a multilingual search engine for the access, organisation and personalised presentation of cultural heritage information.

Cultural heritage content is everywhere on the web, in traditional environments such as libraries, museums, galleries and audiovisual archives, but also in popular magazines and newspapers, in multiple languages and multiple media. The aim of the MultiMATCH project is to enable users to explore and interact with online accessible cultural heritage content, across media types and language boundaries.

The MultiMATCH search engine will be able to:
• identify relevant material via an in-depth crawling of selected cultural heritage institutions, accepting and processing any semantic web encoding of the information retrieved;
• crawl the Internet to identify websites with cultural heritage information, locating relevant texts, images and videos, regardless of the source and target languages used to write the query and/or describe the results;
• automatically classify the results in a semantic-web compliant fashion, based on a document's content, on its metadata, on its context, and on the occurrence of relevant cultural heritage concepts in the document;
• automatically extract relevant information which will then be used to create cross-links between related material, such as the biography of an artist, exhibitions of his/her work, critical analyses, etc.;
• organise and further analyse the material crawled to serve focused queries generated from information needs formulated by the user;
• interact with the user to obtain a more specific definition of initial information requirements;
• organise the search results in an integrated, user-friendly manner, allowing users to access and exploit the information retrieved regardless of language barriers.

The concepts underlying the system are depicted in Figure 1. On the left-hand side of the figure, we show users querying the system in different languages for a range of information on the Dutch artist Vincent van Gogh, including critical analyses, biographies, details of exhibitions. The system displays the retrieved information in an integrated fashion, and in a format determined by the particular user profile. On the right-hand side, we show possible sources of this information and the ways in which it can be acquired.

The project aims at developing a system prototype that can be demonstrated for at least four languages: Dutch, English, Italian and Spanish, and extendible to others. Figure 2 gives an idea of the workflow for the system development.

The R&D work is organised around three activities:
• User-oriented research activities will primarily investigate the user requirements and consequent definition of the required functionality of the system, content selection and preparation, studies on the ontologies adopted by cultural heritage institutions and the semantic encoding to be adopted by the system.
• System-oriented research activities include the study and development of software components for the acquisition, indexing, classification, retrieval and presentation of multilingual cultural heritage information in diverse and mixed media and their integration in the system prototypes.
• Validation activities will include evaluation of the system and its components. User groups of cultural heritage institutions and cultural heritage consumers will be formed to test the system and provide feedback.

Figure 1: The MultiMATCH idea.


The consortium comprises eleven partners, representing the relevant research, industrial and application communities. Each member will play a significant role in the design and development of the system, providing a part of the necessary know-how; the blend of competences will be a key factor for the success of the project. The six academic research partners (ISTI-CNR, U.Amsterdam, UNED-Madrid, U.Geneva, U.Sheffield, Dublin City U.) have already worked closely together in coordinating the Cross-Language Evaluation Forum (CLEF). CLEF focuses on stimulating advances in research in multilingual/multimedia information retrieval and on information extraction and user/system interaction in the cross-language context. The industrial partners, OCLC PICA, UK, and WIND, Italy, will play a major role in the design of the system architecture and the integration of the various components, also with a view to the future industrialisation and commercial exploitation of the system. The cultural institutions, Casa de América, Spain, Alinari, Italy, and Sound and Vision, the Netherlands, each represent a different type of cultural institution with content in diverse media and languages, but all three have in common the desire to improve and extend their information dissemination capabilities, and to work towards the development of standards for interoperability and metadata in the cultural heritage domain. The intention of these institutions is to be able to exploit the results of the project in their future information dissemination activities.

Figure 2: Workflow for development of MultiMATCH search engine.

MultiMATCH is supported by the unit for Content, Learning and Cultural Heritage (Digicult) of the Information Society DG and is coordinated by ISTI-CNR, Pisa, Italy.

The project kick-off meeting was held in Pisa, 10-12 May 2006. The meeting was mainly dedicated to detailed planning of the activities for the first year. The first system prototype is scheduled for release in November 2007. MultiMATCH will issue a quarterly newsletter providing information on the project activities, events and results. For further information, please see the project website.

Links:
http://www.multimatch.inf
http://www.clef-campaign.org

Please contact:
Carol Peters, ISTI-CNR, Italy
MultiMATCH Project Coordinator
Tel: +39 050 3152897
E-mail: [email protected]

Security
Access Control and Data Distribution Solutions for the Swedish Network Based Defence

by Frej Drejhammar, Ali Ghodsi, Erik Klintskog, Erik Rissanen and Babak Sadighi

Network Based Defence (NBD) is a national, military-funded Swedish project with the goal of developing the next generation command and control system. The main focus of the project is to develop a system that is so scalable, flexible, robust, decentralized and interoperable that it can handle the needs of tomorrow's battlefield. In close cooperation with FMV (Swedish Defence Materiel Administration) and Saab Systems, SICS has developed a role based access control system for NBD.

DKS provides fault tolerance and enables parts of the system to continue to function autonomously in case of loss of network communications.

Access control is about deciding who gets access to what resources in the system. NBD is a plan for a military information system which can be highly flexible, resilient and provide information superiority. The system is built as a highly decentralised and dynamic system of systems. In this environment with high demands on mobility and
autonomy, traditional centralised solutions for access control can no longer be applied. SICS has previously used the Delegent authorisation server, which is based on research done at SICS, in proof of concept demonstrators for NBD. During the year Delegent has been redesigned to be based on XACML (eXtensible Access Control Markup Language), a standard for access control policies. To support the NBD requirement we have extended XACML with functions for delegated decentralised administration of policies. Decentralised administration provides more resilience to failures and faster reaction times when adapting to new situations.

Also, in order to further adapt Delegent to the NBD requirements, we have coupled Delegent to a structured peer-to-peer system (P2P) called DKS, which provides decentralised storage of the access control policies. The DKS system, implemented by SICS and KTH, provides a decentralised data management system, with additional support for a Publish and Subscribe service. With DKS, Delegent no longer relies on a centralised policy repository: policies are distributed via the DKS storage system, and the Publish and Subscribe service can work out which information is needed where in the network. This provides fault tolerance and enables parts of the system to continue to function autonomously in case of loss of network communications.

The DKS system is designed to connect a large number of machines with dynamic behaviour in an overlay network. Dynamic behaviour includes machines joining and leaving the overlay, as well as machines failing and connections to machines failing. With the minimal requirement of point to point connectivity, aggregated functionality such as reliable data storage, name-based communication and multicast is provided.

The updated versions of Delegent and DKS have been installed at the Swedish Defence Materiel Administration proof of concept facility and successfully used in experiments during the autumn of 2005.

The DKS enhanced Delegent system is a potential core component of NBD. Access control functions are no longer located on one single machine, but distributed to the edge of the network with the additional benefit of reduced bandwidth consumption and removal of a single-point-of-failure.

The successful coupling of Delegent and DKS is just one example of where structured P2P-systems can be applied within the NBD project. We foresee a multitude of other applications that could benefit from the usage of a structured P2P-system, such as service repositories, user databases and flat name space resolution services, some of which will be explored in the future. We will also continue the research on access control solutions for dynamic systems with more research on administration and revocation models and how to best present information to users.

Link:
http://www.sics.se/spot/

Please contact:
Erik Rissanen and Frej Drejhammar, SICS
Tel: +46 633 1500
E-mail: [email protected] and [email protected]
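The idea of storing access control policies in a structured peer-to-peer overlay, so that no single machine holds the repository, can be illustrated with a generic consistent-hashing ring. This is a minimal sketch under stated assumptions and is not the DKS implementation: the class `PolicyRing`, its methods, the node names and the policy string are all hypothetical, and the sketch does not re-replicate data after a failure as a real system would need to.

```python
import bisect
import hashlib


def ring_position(name):
    """Map a node name or policy key to a position on the hash ring."""
    return int(hashlib.sha1(name.encode("utf-8")).hexdigest(), 16)


class PolicyRing:
    """Hypothetical policy store: each key lives on several successor nodes."""

    def __init__(self, node_names, replicas=2):
        self.replicas = replicas
        self.stores = {name: {} for name in node_names}  # per-node local storage
        self._rebuild()

    def _rebuild(self):
        self.ring = sorted((ring_position(n), n) for n in self.stores)

    def responsible_nodes(self, key):
        # The first node clockwise from the key's position, plus its successors.
        positions = [p for p, _ in self.ring]
        start = bisect.bisect_right(positions, ring_position(key))
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def put(self, key, policy):
        for node in self.responsible_nodes(key):
            self.stores[node][key] = policy

    def get(self, key):
        for node in self.responsible_nodes(key):
            if key in self.stores[node]:
                return self.stores[node][key]
        return None

    def fail(self, node):
        # Simulate a machine leaving the overlay together with its local data.
        del self.stores[node]
        self._rebuild()


ring = PolicyRing(["node-a", "node-b", "node-c", "node-d"], replicas=2)
ring.put("policy/unit-7/map-data", "permit role=commander")
primary = ring.responsible_nodes("policy/unit-7/map-data")[0]
ring.fail(primary)
recovered = ring.get("policy/unit-7/map-data")
```

With two replicas per key, the policy stays readable after any single node failure, which mirrors the fault tolerance argument made in the article.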


Ambient Intelligence
Bringing Ambient Computing out of the Labs - INRIA's Agreement with JCDecaux

by Michel Banâtre

INRIA and the JCDecaux group, a worldwide leader in street furniture, recently signed a technology transfer agreement. This may come as a surprise, since this group's business areas make it an unusual partner for INRIA. Nevertheless, the agreement fits well with the research on ambient computing being carried out by INRIA's ACES (ambient computing and embedded systems) research team.

ACES became involved in this exciting area in 1998, working on Spontaneous Information Systems (SIS). Our SIS research involved dynamic information systems shared by proximal mobile devices that communicate through short-range radio transmission. This led to our groundbreaking work on ambient computing, which in turn led to the development and study of a variety of novel concepts, including 'spatial computing'.

In this kind of architecture, physical objects are data symbols and physical space is the basis for addressing. In other words, such an architecture supports implicit computation using the flow of data from the physical motion of the associated objects. When we proposed such concepts, there were already 'popular' solutions in the 'Ubicomp' community. Essentially they were based on 'logically centralized' approaches, built around information systems independent of the physical environment. Such concepts have only very recently emerged as relevant focus areas, thanks to the growing interest in sensor-related themes (electronic labels, sensor networks, smart dust etc).

It is important to note that despite the wealth of new ideas generated since 2000, there have been no major innovations resulting in core applications or widespread use. In other words, ambient computing hasn't left the lab. Overcoming this is a real challenge, and one that the ACES project team wanted to tackle. Our approach, which has already been used successfully several times in the past, is to "go the distance", as we did recently with Texas Instruments. This is even more critical in the domain of ambient computing, which in essence is based on information technology that is tightly coupled with the real world. If we ignore this aspect of ambient computing, we will overlook the real challenges, the very ones we must address to ensure the emergence and application of our ideas as researchers. That's why we've had numerous discussions, productive to varying degrees, with a broad range of enterprises, including equipment makers, mobile telephony operators and end users. Although these discussions did not lead to actual collaboration, they allowed us to identify and understand what was preventing ambient computing from taking hold in these enterprises. As is often the case with innovation, it isn't so much the technical difficulties that block progress, but rather the challenge of identifying the 'missing piece'.

The bottom line is that integrating 'context awareness' into mobile terminals is still very difficult. Even though it seems simple from a technical perspective, there are numerous unavoidable obstacles, not least of which is negotiating agreements between the various partners on wireless standards (Bluetooth, RFID, IR etc) and software. The situation is complicated by the current absence of an application that will bring the players together and motivate them to overcome the obstacles they face.

There exists an alternative however, which is the natural outgrowth of our proposed spatial machine and involves integrating context awareness into the environment. This approach has been tested since 2001 at INRIA Rennes with the 'WebWalker' application, which among other things allows users to move physically through the Web. With this technology, the challenge of producing an effect is linked to the quality and quantity of the sites encountered.
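The notion that physical space is the basis for addressing can be made concrete with a toy lookup in which information is attached to locations and a device 'addresses' it simply by being nearby. This is a hypothetical sketch and not the ACES software: the class `SpatialStore`, its methods, the coordinates and the payloads are all invented for illustration.

```python
import math


class SpatialStore:
    """Hypothetical store where data items are addressed by physical position."""

    def __init__(self):
        self._items = []  # list of (position, payload)

    def attach(self, position, payload):
        """Attach a piece of information to a physical location."""
        self._items.append((position, payload))

    def in_range(self, position, radius):
        """Return payloads within radio range of a device, nearest first."""
        hits = [(math.dist(position, p), payload) for p, payload in self._items]
        return [payload for d, payload in sorted(hits, key=lambda h: h[0])
                if d <= radius]


store = SpatialStore()
store.attach((0.0, 0.0), "bus timetable")
store.attach((50.0, 0.0), "event poster")

# The same query gives different answers as the device moves:
# movement through space, not an explicit request, selects the data.
near_stop = store.in_range((3.0, 4.0), radius=10.0)
near_poster = store.in_range((48.0, 0.0), radius=10.0)
```

No central directory is consulted by name here; the device's position is the only key, which is the contrast the article draws with 'logically centralized' approaches.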

Integrating 'context awareness' into the environment has been tested since 2001 at INRIA Rennes with the 'WebWalker' application, which among other things allows context-driven navigation as the user moves.


Within this context, street furniture represents a very attractive vector. The question of computing aside, street furniture is already at the centre of an environment-based information system, tied to transport, tourism, events and maps/directions, not to mention advertising. Such an information system, built around physical objects, is particularly well suited to the application of our solutions, which are also based on spatial distribution and management of information.

Currently three large global groups – ClearChannel, Viacom and JCDecaux – are the main players tapping into this application domain. It was during 'Les Transports au XXIième Siècle' – an event organized by the French Senate in April 2004 around the theme of transport in the 21st century – that JCDecaux learnt of our research, from a presentation about our 'Ubibus' system.

From the underlying principles alone, and in light of existing technology, JCDecaux immediately saw how they could benefit from our ambient computing solutions. However, convincing the group to formally adopt these solutions required significant effort on our part, not only from a technical perspective (creation and demonstration of pilots for real-world situations, evaluation of performance and development of extensibility) but also in terms of financial criteria (costs of deployment and exploitation, long-term survivability). One of JCDecaux's concerns is the 'intrusiveness' of the implicit way in which ambient computing systems function. This could meet resistance from the public, and that would represent a real problem for a group whose revenues come mainly from advertising. Our expertise and broad perspective on these problems as well as the relevance of our solutions have all been critical factors in JCDecaux's decision to work with us. We have also taken the important step of obtaining patents to protect certain core aspects of our solutions.

Link:
http://www.inria.fr/recherche/equipes/aces.en.html

Please contact:
Michel Banâtre
INRIA Rennes – Irisa
Tel: +33 2 99 84 72 85
E-mail: [email protected]

Text Research
New Text - New Conversations in the Media Landscape

by Jussi Karlgren

New text - that is, new forms of textual communication - such as blogs, instant messages, and Wikis contrast with traditional textual genres in some respects and remain true to them in others. This calls for new research methodologies and provides new challenges for text research.

Recent advances in publication and dissemination systems have given rise to new types of text - dynamic, reactive, multi-lingual, with numerous cooperating or even adversarial authors and little or no editorial control. Many of these new types of text remain true to established existing textual genres. Others break new ground, moving towards new emergent textual genres made possible by the dramatically lowered publication threshold and faster distribution mechanisms.

These new forms of text, with a considerable amount of attention from traditional media, most notably include blogs - texts written as a timely running commentary of public or private matters. Another well-established and remarkable new genre is the Wikipedia - an encyclopaedia built through the cooperative efforts of its readers. New forms of communication such as these raise questions for researchers in a variety of fields, and this past spring has seen no less than two international workshops held on the analysis of new texts - bringing together several topically similar research projects around Europe.

A newspaper reader - how new is 'new'?

One of the first questions in this research field is how new text is different. How new is 'new'? Have we never had new text before? What, in fact, is the difference between 'new' and 'old'? It is quite clear that authors of both new and traditional texts are aware of linguistic styles of various sorts and use them in ways they deem appropriate. When new genres emerge, such as blogs or Wikipedias, they may pattern themselves on existing ones, such as diaries or encyclopaedias, thereby drawing on the prestige and position of those existing genres. Alternatively, they may cast around for forms suitable for their intended impact and stature. How to achieve this form where none exists is a matter yet to be resolved!

New texts are more than simply revamped traditional texts however: they have features that traditional texts lack. They are interconnected by a network created by authors and readers in a complex interplay of explicit textual references; they also position themselves much more explicitly within a context of other texts than has previously been the case. Studying this fabric of textuality is just the first step in this area of research.

In view of the less formalized publication process, the credibility of new texts can be called into question. When traditional texts are published in paper form, a number of steps - variable from one mode of publication to another - involve satisfying editors or publishers of the veracity, relevance, quality and impact of a text. (Whether this is a good or a bad thing is a different discussion entirely!) New texts lack this guarantee of having passed many pairs of eyes en route from author to reader. There is no simple measure of the impact, the variable perceived intellectual status and quality of new texts. Understanding credibility, authority and other facets of quality is central to any attempt at analysis of the impact of new texts.

Underlying the issue of credibility and authority is the question of who the author is and why. What makes a blogger blog? Why do people devote time and energy to editing Wikipedia pages? Understanding the motivations and intentions of authors is not incidental to the task of understanding the texts. Integral to the blog is who and why; integral to the Wiki is purpose; and no-one can pretend that the texts are analysable in isolation. While texts remain texts, even with new syntactic patterns and new lexical items, their contextuality is so great as to dominate many other content features. And this, in fact, is truly new!

What services can be expected to emerge from the analysis of new text? Several information access services already use Wikipedias to extract facts and relationships for better understanding of other texts. The analysis of public opinion on issues, or of consumer attitudes towards products and services on the market, has found a rich vein of data in blogs. To do this with any level of reliability however, our processing tools, tuned to newsprint and other well-edited texts, need to address the challenges of variable or multi-lingual texts, containing register swings and formality melanges - not shoddy, but New!

We are currently in a transition phase, which is exceedingly interesting both philologically and industrially. Similar phases have been seen before, for example when, with the introduction of inexpensive printing processes, publishers put out compilations of private correspondences as one form of written communication assumed to be of public interest. The only certainty we have today is that in the future, people will find creative ways of using the technology we are introducing today - again, not unpredictable, but New!

These new movements will be discussed in coming research events. Those interested are welcome to join the discussion at [email protected]!

Links:
New Text Workshop: http://www.sics.se/jussi/newtext
AAAI Blog Symposium: http://www.umbriacom.com/aaai2006_weblog_symposium/
Int. Conference on Weblogs and Social Media, March 26-28, 2007, Boulder, Colorado, USA: http://www.icwsm.org

Please contact:
Jussi Karlgren, SICS, Sweden
Tel: +468 633 15 00
E-mail: [email protected]

Logistics
LOG4SMEs: Improving the Logistics Performance of SMEs in the Automotive Sector

by Imre Czinege, Elisabeth Ilie-Zudor and András Pfeiffer

The LOG4SMEs project aims at improving the logistics performance of small and medium-sized enterprises (SMEs) in the automotive sector.

Small and medium-sized automotive businesses are threatened by a number of pressing issues, including saturation of the market, fierce competition and the reduction of entry-barriers. Among these problems are the identification of weaknesses in logistics and production processes and the finding of appropriate action lines or IT tools to overcome them. These issues do not depend exclusively on companies themselves, but are also heavily affected by the economic, logistical and social character of the regions in which the SMEs are located.

In January 2006, partners from three regions of the European Union launched the project LOG4SMEs, a Regins project (see http://www.regins.org). The University of Bergamo from Lombardy Region (Italy) leads the consortium. Széchenyi István University represents West Pannonia (Hungary), while the Fraunhofer Institute for Manufacturing Engineering and Automation (IPA) represents the region Baden Württemberg (Germany).

One of the project's goals is to enable SMEs to compare their individual current logistics performances with industry and regional averages as well as with the best performer. A second goal is to enable each company to identify its current performance gaps and to determine whether, disregarding the specific company's actions, there are regional factors that affect its logistical performance. Special emphasis is put on identification
strategies that will allow the three regions to develop the location factors for their local automotive industries. LOG4SMEs will provide companies with the ability to acknowledge the best practices in their region/industry and will encourage the exchange of good logistics practices among companies. From an extensive survey throughout the three regions, a Web service addressed to all registered companies will provide a unified database of logistical performance indicators and practices in the automotive sector. The project will directly involve SMEs operating in the automotive sector by the provision of a survey, as well as phone and direct interviews. The design of the survey is based on the internationally acknowledged standard SCOR model and SCOR indicators. The results will also be distributed to local industrial associations or industry clusters.

Through the project web site, automotive companies will be able to compare their logistics and production performance with other SME- and industry-specific indicators as well as to derive their strengths and weaknesses. Developing a catalogue of logistics practices for each identified cluster of companies and describing the main regional location factors that foster or inhibit their logistical strategies is also in the scope of LOG4SMEs.

Please contact:
Imre Czinege, Széchenyi István University, Győr, Hungary
Tel: +36 96 613673
E-mail: [email protected]

Networks
VTT Develops Dependability Evaluation Methods for IP Networks

by Ilkka Norros

The Finnish research project 'Dependability evaluation methods for IP networks' (IPLU) aims to create a conceptual framework and methods for assessing the complex problem "Can one rely on IP technology?" The research is done by VTT and funded by several organisations, including the Ministry of Traffic and Communications, the National Emergency Supply Agency and four telecom operators.

Since the project aims at a comprehensive view of the topic, it has a multidisciplinary character combining VTT's expertise in telecommunications technology, teletraffic and network modelling, and reliability analysis. In the international workshop 'Dependability of all-IP networks', organised by the project 18-19 May 2006 (http://iplu.vtt.fi/workshop-06.html), the multidisciplinary nature was further enriched by contributions from human activity research and theoretical computer science.

This approach has proven fruitful. The established methods of safety and reliability assessment used in contexts like nuclear power are not mechanically transferable to the highly dynamic world of IP networking, and, on the other hand, researchers with telecom background are mostly unaware of the experience accumulated in the reliability research traditions. The IPLU project aims at inventing new methods, but not at inventing the wheel anew from scratch.

As one of its first tasks, the research team produced a baseline paper that sets the scene for structured discussion of the dependability problems in IP networking. The internet is recognized as a new medium the character of which is more generic than traditional electronic communication media like telephone and television. The baseline paper proposes a preliminary conceptualization of dependability, where a traditional set of dependability attributes is augmented by aspects that reflect the self-regulation features of the internet architecture. The paper is available from the project's webpage.

A preliminary conceptual framework for network dependability, discussed in the baseline paper of the IPLU project.

The main aim for the autumn of 2006 is to propose an initial set of criteria, indicators, procedures and recommendations for consideration by network operators and other actors on the telecommunication scene. VTT is prepared to continue IPLU's work in future projects.

Link:
http://iplu.vtt.fi

Please contact:
Ilkka Norros, VTT - Technical Research Centre of Finland
Tel: +358 20722 5627
E-mail: [email protected]
Ilkka Norros, VTT-Technical Research Centre of Finland As one of its first tasks, the research Tel: +358 20722 5627 team produced a baseline paper that sets E-mail: [email protected] the scene for structured discussion of the


Software Engineering Validating Complex Telecommunication Software

by Sergio Contreras, María del Mar Gallardo, Pedro Merino, David Sanán, Javier Rivas and Joaquín Torrecilla

In collaboration with the telecommunication company Centro de Tecnología de las Comunicaciones, S.A. (CETECOM), a research group at the University of Malaga is validating communication software written in languages such as SDL, ASN.1 and C.

Automatic validation of many critical systems has usually been done with model-oriented techniques, like model checking. This approach requires the previous construction of a specific model (an abstraction) of the problem, using very specific languages oriented to academic tools like SPIN. In general terms, however, this abstraction is not well suited to automatically obtain an implementation. This is why this model-based methodology has only limited success within industry and is not generally employed, in particular, by telecommunication companies.

Many companies employ classic C and C++ languages for their critical communication software, and they have traditionally replaced validation by testing and/or debugging. Other companies use development languages that preserve validation facilities, as long as they are also powerful enough to automatically obtain the final software. In particular, they use standard description languages like the notations SDL (Specification and Description Language) and ASN.1 (Abstract Syntax Notation One), which are promoted by ITU-T (Telecommunication Standardization Sector of the International Telecommunication Union). These languages offer an acceptable formal basis and they are linked to other standard notations, like TTCN (Tree and Tabular Combined Notation) and UML. In theory, using ITU-T languages makes it possible to perform validation at the same time that the software is developed; note that the executable code is obtained automatically by translating SDL to C/C++. However, these validation facilities seem to be under-exploited, and more efforts should be devoted to obtaining a methodology for effective validation of telecommunication software. This is one of the objectives of a joint project of the company Centro de Tecnología de las Comunicaciones, S.A. (CETECOM), the research institute Centro Andaluz de Innovación y Tecnología de la Información y las Comunicaciones (CITIC) and the Software Engineering Group of the University of Malaga (GISUM).

In particular, in the context of the project, CETECOM and GISUM are working on a methodology to validate existing complex software, using UMTS signalling as a case study. This software is mainly based on SDL to implement the protocol state machines, ASN.1 to describe the message types (as defined by standardization committees) and C to implement critical parts (like encoding/decoding algorithms). The whole software was produced with the tool Telelogic Tau. It contains more than 100,000 SDL symbols and it produces an implementation with more than 800,000 lines of C code. It is worth noting that such software has been previously tested with traditional methods and that it is complex enough to keep the Tau validation tool running for hours.

Figure: Validating with Tau: SDL diagram (left), validation script (center) and validation results (right).

As expected, the first problem in performing validation is the size of the system. It produces millions of global states each having more than 20 Kbytes, considering only the SDL structures. However, there is a second more interesting problem. As the current implementation is only a part of the whole signalling system, we need to complete the SDL description with information from the environment. The Tau validator can automatically produce many messages simulating the environment; realistic messages, however, have a great number of parameters that cannot be efficiently produced by the tools.
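To put the state-space figures above in perspective, the following rough sketch compares keeping every visited state verbatim with the bit-state style of storage that the authors name among their optimizations. It is only an illustration of the general idea as known from the model-checking literature: the hash scheme, the table size and all names (`full_storage_bytes`, `BitStateSet`) are assumptions made here, and nothing in it describes how the Tau validator actually stores states.

```python
import hashlib
from typing import Iterator

# Figure quoted in the article: each global state takes "more than 20 Kbytes".
STATE_SIZE_BYTES = 20 * 1024


def full_storage_bytes(num_states: int, state_size: int = STATE_SIZE_BYTES) -> int:
    """Bytes needed to keep every visited state verbatim."""
    return num_states * state_size


class BitStateSet:
    """Approximate visited-state set (hypothetical helper, not a Tau API).

    Each state is reduced to a few bit positions in a fixed-size bit array.
    Memory use no longer depends on the state size, at the price of possible
    false "already visited" answers when different states collide.
    """

    def __init__(self, num_bits: int, num_hashes: int = 2) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, state: bytes) -> Iterator[int]:
        # One salted hash per position; blake2b is an arbitrary choice here.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(state, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, state: bytes) -> bool:
        """Mark a state as visited; return True if it is definitely new."""
        is_new = False
        for pos in self._positions(state):
            index, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[index] & mask:
                is_new = True
                self.bits[index] |= mask
        return is_new


if __name__ == "__main__":
    states = 1_000_000  # "millions of global states": one million as a lower bound
    print(f"verbatim storage: {full_storage_bytes(states) / 2**30:.1f} GiB")
    visited = BitStateSet(num_bits=8 * states)
    print(f"bit-state table:  {len(visited.bits) / 2**20:.2f} MiB")
```

The trade-off is the usual one for this kind of storage: the search may skip a state that was never really visited, so coverage becomes probabilistic, which is why it is normally combined with other reductions rather than used alone.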


We have designed a methodology to deal with both problems at the same time. The approach is based on the partitioning method proposed in the literature, but we have adapted it to existing heterogeneous software (SDL, ASN.1 and C). We isolate processes and blocks and construct a realistic environment for each partition. The environment is constructed as a validation script that limits the set of messages and the range of values in the messages. Then, the validator uses its internal mechanism to generate messages automatically.

Using validation scripts, we can perform validation of separate parts; however, we still need more optimization methods to deal with complexity. One of them is bit-state representation. Another one is variable hiding, which is applied to big C structures.

The project has been successful in terms of quality of the validation results (including the confirmation of the robustness of the code) and also in terms of the methodology generated. The work has also been useful to propose some extensions in the validation tool. We have shown that validation is a valuable task for existing telecommunication code, even when validation was not a primary aim during development. This is made possible by the use of standard languages like SDL with tool support.

Links:
CETECOM: http://www.cetecom.es
CITIC: http://www.citic.es
GISUM: http://www.lcc.uma.es/~gisum/

Please contact:
Maria del Mar Gallardo
University of Málaga/SpaRCIM, Spain
E-mail: [email protected]

Pedro Merino
University of Málaga/SpaRCIM, Spain
E-mail: [email protected]

Joaquín Torrecilla, CETECOM, Spain
E-mail: [email protected]

Insight into EU R&D Achievements

More than 7 billion Euro has been invested by the European Commission in the Information Society Technologies (IST) research priority. Now, thanks to a dedicated online news service – IST Results – technology users and research teams can read about results and innovations emerging from this considerable R&D activity which have potential for further development or exploitation.

Launched in 2003 by DG Information Society and Media, this free service reports on the latest achievements from IST projects via in-depth feature articles and news-in-brief stories. There is also a calendar of events dedicated to IST project events.

Through its editorial approach, IST Results is establishing an international profile in the mainstream press with articles syndicated to leading publications such as the Financial Times, New Scientist, Wired and ZDNet as well as to specialist online information portals.

Feature articles produced by IST Results are also regularly circulated through press wire services – an article describing results from a robotics project attracted over 93,000 hits from media professionals worldwide, demonstrating the power of this approach for communicating R&D more widely than traditional Commission dissemination tools. IST Results' own website attracts around 250,000 visits a month from technology users, researchers, media and investors in more than 50 different countries.

IST Results covers nearly thirty technology and market application areas so ERCIM members are certain to find articles relevant to their own interests. Browsing through recent news reveals a wide range of achievements in just one week's reporting, such as:
• A range of GRID services which allow the exchange of data and job information between different Grid systems and give a single client access to different Grid infrastructures
  http://istresults.cordis.europa.eu/index.cfm/section/news/tpl/article/ID/82315/BrowsingType/Features
• Open source software for museums, archives and libraries, allowing them to utilise their cultural content and resources in novel ways while drastically reducing the costs of deploying digital library services
  http://istresults.cordis.europa.eu/index.cfm/section/news/tpl/article/ID/82247/BrowsingType/Features
• A sports broadcasting platform linking the existing media channels, internet, TV and phone to offer a wide variety of services for journalists, VIPs, broadcasters, advertisers and, of course, the fans
  http://istresults.cordis.europa.eu/index.cfm/section/news/tpl/article/ID/82199/BrowsingType/Features
• Professional wiki-based collaboration platform that has enabled the creation of a world-leading scientific network in archaeology
  http://istresults.cordis.europa.eu/index.cfm/section/news/tpl/article/ID/82290/BrowsingType/Features

IST Results follows a monthly Editorial Calendar and for July 2006 the theme is Cultural heritage, including digital libraries.

To find out more, browse the website at http://istresults.europa.eu/ and subscribe to the free e-alerts, RSS feed or weekly e-bulletin. If you are interested in republishing IST Results articles on your own website or would like more information, contact [email protected].

Links:
IST Results home page: http://istresults.europa.eu/
Editorial Calendar: http://istresults.cordis.europa.eu/index.cfm?section=press&tpl=editorial_themes

EVENTS

CALL FOR PARTICIPATION
2nd International Workshop on Automated Specification and Verification of Web Systems
Cyprus, 15-16 November 2006

The increased complexity of Web sites and the explosive growth of Web-based applications has turned their design and construction into a challenging problem. Nowadays, many companies have converted their Web sites into interactive, completely automated, Web-based applications (such as Amazon, on-line banking, or travel agencies) with a high complexity that requires appropriate specification and verification techniques and tools. Systematic, formal approaches to the analysis and verification can address the problems of this particular domain with automated and reliable tools that also incorporate semantic aspects.

The WWV series provides a forum for researchers from the communities of Rule-based programming, Automated Software Engineering, and Web-oriented research to facilitate the cross-fertilization and the advancement of hybrid methods that combine the three areas.

Topics
We solicit papers on formal methods and techniques applied to Web sites, Web services or Web-based applications, such as:
• rule-based approaches to Web site analysis, certification, specification, verification, and optimization
• formal models for describing and reasoning about Web sites
• model-checking, synthesis and debugging of Web sites
• abstract interpretation and program transformation applied to the semantic Web
• intelligent tutoring and advisory systems for Web specifications authoring.

WWV'06 will be held as a Special Track of the 2006 International Symposium on Leveraging Applications of Formal Methods, Verification, and Validation (ISoLA 2006).

More information:
http://www.dsic.upv.es/workshops/wwv06

CALL FOR PARTICIPATION
Loco Mummy Contest 2006
Develop a new Interface Program and win a laptop computer!

The goal of the contest is to find one or more clever and creative ways to use standard hardware to design a software application with a natural, intuitive way of interaction between the machine and its user.

The application software should be a new or updated free-of-rights multimodal User Interface software, allowing an interaction between one or several users and a computer software of any kind through ordinary devices such as keyboard, mouse, screen, microphone, loudspeakers and webcams. Complex and uncommon interfaces are explicitly banned.

The software should present a clear advantage to the user compared with standard mouse and keyboard applications. If the program is an update to an existing software, copyright issues concerning the base software must be confirmed and clearly specified by the candidate before making the registration.

Important Dates
• Entry Registration Deadline: 15 September 2006
• Software Submission Start: 15 September 2006
• Software Submission Deadline: 30 October 2006
• Award Ceremony: 14 December 2006

Prizes
• Best PC User Interface Software Award: Laptop computer
• Best PDA User Interface Software Award: Pocket PC (PDA)
• Innovation award: USB Scanner

More information:
http://www.locomummy.net/

CALL FOR PARTICIPATION
TEL-CoPs'06: First Workshop on Building Technology-Enhanced Learning Solutions for Communities of Practice
Crete, Greece, 2 October 2006

The TEL-CoPs'06 workshop, held in conjunction with the First European Conference on Technology Enhanced Learning, focuses on current research trends in technology enhanced learning solutions that address the multiplicity and complexity of needs of 'Communities of Practice' (CoP). It advocates approaches that build on the synergy of concepts such as multimedia information authoring and reuse, knowledge management, argumentation and negotiation. It will bring together scientists and engineers who work on designing and/or developing the abovementioned solutions, as well as practitioners who evaluate them in diverse real environments. Particular interest will be given to approaches built according to well-established pedagogical principles.

Topics of interest include:
• software engineering issues in tools supporting CoPs
• multimedia authoring and reuse in CoPs
• knowledge management services for CoPs
• mediation services for CoPs
• computer-supported collaborative argumentation
• learning issues and CoPs
• evaluation issues and case studies
• user profiling and awareness issues in tools supporting CoPs
• adaptability issues in tools supporting CoPs
• visualization issues in tools supporting CoPs
• Web-based interactive applications

The workshop is organised in the frame of the EC funded 'Palette' project.

More information:
Workshop web page: http://palette.cti.gr/workshops/telcops06.htm
Palette project: http://palette.ercim.org/


CALL FOR PARTICIPATION
The 9th ERCIM Workshop "User Interfaces for All"
Königswinter (Bonn), Germany, 27-28 September 2006

In the tradition of its predecessors, this workshop aims to consolidate recent work, and to stimulate further discussion on the state of the art in the field of User Interfaces for All, and its increasing range of applications in the emerging Information Society. The emphasis of this year's event is on "Universal Access in Ambient Intelligence Environments".

The workshop will therefore focus on the new HCI challenges that Ambient Intelligence brings about in a Universal Access perspective, with the aim to envisage new scenarios of use of Ambient Intelligence technologies by users with diverse needs and requirements, and to identify some of the critical issues that will have to be addressed throughout all phases and aspects of the development life-cycle of interactive applications and services.

Keynote speakers:
• Norbert Streitz, FhG-IPSI, Germany
• Alois Ferscha, Institut für Pervasive Computing, Johannes Kepler Universität Linz, Austria

More information:
http://ui4all.ics.forth.gr/workshop2006/

CALL FOR PARTICIPATION
TED 06: Towards e-Democracy: Participation, Deliberation, Communities
Mantova, Italy, 24-26 October 2006

For the past four years, the European Science Foundation programme Towards Electronic Democracy (TED) has focused on the development of methods to address societal issues via the Web and favour e-participation using the methodologies of modern decision analysis and support to involve citizens and stakeholders in the actual process of decision making: a true step towards e-democracy rather than the e-administration techniques that, by and large, have been emphasised by e-government initiatives. At TED's heart is a vision to develop methodologies which enable multiple decision analyses to be communicated, explored and, indeed, built over the WWW, thus providing the mechanism by which stakeholders may be drawn more closely into the decision making process.

This conference occurs at the end of TED's funding cycle and aims both to reflect on progress over the project and to set future research agendas.

More information:
http://www.mi.imati.cnr.it/conferences/ted06.html

CALL FOR PARTICIPATION
FMCO 2006 - 5th International Symposium on Formal Methods for Components and Objects
Amsterdam, 7-10 November 2006

Large and complex software systems provide the necessary infrastructure in all industries today. In order to construct such large systems in a systematic manner, the focus in the development methodologies has switched in the last two decades from functional issues to structural issues: both data and functions are encapsulated into software units which are integrated into large systems by means of various techniques supporting reusability and modifiability. This encapsulation principle is essential to both the object-oriented and the more recent component-based software engineering paradigms.

The symposium, hosted by CWI, is a four-day event organized to provide an atmosphere that fosters collaborative work, discussions and interaction. The program consists of keynote and tutorial presentations.

Keynote speakers include:
• Gul Agha (The University of Illinois at Urbana-Champaign, USA)
• Sophia Drossopoulou (Imperial College, UK)
• Radu Iosif (Verimag, France)
• Thierry Jeron (INRIA Rennes, France)
• Erik Meijer (Microsoft Research, USA)
• Jayadev Misra (University of Texas at Austin, USA)
• Vijay A. Saraswat (Penn State University, USA)
• Vladimiro Sassone (University of Sussex, UK)
• Jan Tretmans (Radboud University Nijmegen, The Netherlands)
• Moshe Vardi (Rice University, USA)
• Philip Wadler (University of Edinburgh, UK)

More information:
http://www.mi.imati.cnr.it/conferences/ted06.html

Fellowships available in GRID Research

The CoreGRID Network of Excellence currently offers Fellowships for postgraduate students in the field of GRID Research.

The CoreGRID web site also offers the possibility to post job announcements related to GRID research. Job postings are free of charge for academic institutions and organisations.

For available positions and job postings, see http://www.coregrid.net/jobs

CoreGRID is a Network of Excellence administrated by ERCIM

IN BRIEF

ERCIM News is the magazine of ERCIM. Published quarterly, the newsletter reports on joint actions of the ERCIM partners, and aims to reflect the contribution made by ERCIM to the European Community in Information Technology. Through short articles and news items, it provides a forum for the exchange of information between the institutes and also with the wider scientific community. This issue has a circulation of 10,500 copies. The printed version of ERCIM News has a production cost of 8 Euro per copy. It is available free of charge for certain groups.

Advertising
For current advertising rates and conditions, see http://www.ercim.org/publication/ERCIM_News/ or contact [email protected]

Copyright Notice
All authors, as identified in each article, retain copyright of their work.

ERCIM News online edition
http://www.ercim.org/publication/ERCIM_News/

ERCIM News is published by ERCIM EEIG, BP 93, F-06902 Sophia-Antipolis Cedex
Tel: +33 4 9238 5010, E-mail: [email protected]
ISSN 0926-4981

Director: Jérôme Chailloux, ERCIM Manager

Central Editor:
Peter Kunz, ERCIM office [email protected]

Local Editors:
AARIT: n.a.
CCLRC: Martin Prime [email protected]
CRCIM: Michal Haindl [email protected]
CWI: Annette Kik [email protected]
CNR: Carol Peters [email protected]
FORTH: Eleni Orphanoudakis [email protected]
Fraunhofer ICT Group: Michael Krapp [email protected]
FNR: Patrik Hitzelberger [email protected]
FWO/FNRS: Benoît Michel [email protected]
INRIA: Bernard Hidoine [email protected]
Irish Universities Association: Ray Walshe [email protected]
NTNU: Truls Gjestland [email protected]
SARIT: Harry Rudin [email protected]
SICS: Kersti Hedman [email protected]
SpaRCIM: Salvador Lucas [email protected]
SZTAKI: Erzsébet Csuhaj-Varjú [email protected]
VTT: Pia-Maria Linden-Linna [email protected]
W3C: Marie-Claire Forgue [email protected]

Subscription
Subscribe to ERCIM News by:
• sending e-mail to your local editor
• contacting the ERCIM office (see address above)
• filling out the form at the ERCIM website at http://www.ercim.org/

INRIA - Michel Cosnard appointed as INRIA's new chairman. By an order of the French president signed on 2 May 2006, Michel Cosnard has been named as Chairman of INRIA. In accordance with INRIA's internal regulations, the chairman will also assume the responsibilities of the Institute's CEO. He succeeds Gilles Kahn who passed away 9 February 2006. Michel Cosnard represents INRIA on ERCIM's board of Directors.

Michel Cosnard is a Professor at the Polytechnic School of the University of Nice-Sophia Antipolis, Director of the INRIA Sophia Antipolis research unit and member of the board of the 'Communicating Secured Solutions' cluster in the Provence Alpes Côte d'Azur region. He already served as INRIA's CEO from December 2003 to May 2004, but he asked to step down from this position for personal reasons.

Michel Cosnard is a worldwide specialist in algorithms, especially in the design and analysis of parallel algorithms and Grid computing. He has also worked on automaton and neural network complexity. He is the author of over 100 published works in the most prestigious journals of the field. He is Editor-in-Chief of Parallel Processing Letters, Member of the Editorial Board of Journal of Parallel Computing, and he has served as editor of IEEE Transactions on Parallel and Distributed Systems. He has written two books and supervised 27 theses. He has been awarded the following prizes: the Academy of Science Alfred Verdaguer Award (1994), the IFIP Silver Core Award (1995), the Charles Babbage Award from the Institute of Electrical and Electronics Engineers Computer Society (2003).

Photo: Michel Cosnard. © INRIA / C. Lebedinsky

SpaRCIM - The 2006 Spanish National Awards in Informatics were announced in May 2006. The José García Santesmases Award to the most outstanding professional career was given ex-aequo to Prof. Isidro Ramos Salavert from the Technical University of Valencia (UPV), Prof. Fernando Sáez Vacas from the Technical University of Madrid (UPM) and Prof. Martí Vergés Trías from the Technical University of Catalonia (UPC). The Aritmel Award for the researcher developing the most significant scientific contributions to Informatics Engineering was given to Emilio López-Zapata from the University of Granada.

Also, two national awards recognized the activity of private and public institutions in the area. The Mare Nostrum Award was given to Grupo Telefónica (http://www.telefonica.es/acercadetelefonica/eng/index.shtml). The Ramón Llull Award for the institutional activity in Informatics Engineering was given ex-aequo to Prof. Alberto Prieto Espinosa, from the University of Granada and Juan José Moreno-Navarro, from the Technical University of Madrid (UPM). Juan José Moreno-Navarro is also the Director of SpaRCIM.

Photo: From left: Alberto Prieto Espinosa, Isidro Ramos, Emilio López-Zapata, and Fernando Sáez Vacas.


SICS - Center for Networked Systems established - a new joint industry-academia research center for the reliable Internet. SICS is one of the winners when Swedish government agencies invest 33 million Euro in eight centers for research important for the country's future competitive strength. Each of these Institute Excellence Centres will receive funding over a six-year period, matched by corresponding funding from the business community, to build up an internationally competitive environment for research, development and innovation. The vision of SICS Center for Networked Systems is the Reliable Internet, a secure and reliable infrastructure which provides predictable service, enables robust applications on heterogeneous networks, is secure and at the same time easier to manage. SICS Center for Networked Systems is led by Bengt Ahlgren.

Photo: Bengt Ahlgren, leader of SICS Center for Networked Systems. Photo by Fredrik Olsson.

CWI - Peter Boncz wins ICTRegie Award 2006. Analyzing complex databases in record time: For his searching techniques Peter Boncz from CWI won the Dutch ICTRegie Award 2006 on May 16. Boncz developed the fast MonetDB database system. It has applications in CRM, digital forensics, science databases and ambient intelligence. With this technology, CWI could launch a successful spin-off company: Data Distilleries ('95), now taken over by SPSS. Martin Rem, chair of the jury at the Nationale ICT Awards 2006 event said: "The challenges for Peter Boncz were not only scientific. The real art was coupling research results to a convincing business model - and he succeeded." Peter Boncz works at CWI in the MultimediaN Bsik program. See: http://monetdb.cwi.nl.

Photo: Peter Boncz with the ICTRegie Award. Photo by Gerrit Serne.

CNR - Francesco Beltrame has been nominated Director of the Department for Information and Communications Technologies of the Italian National Research Council, one of the eleven macro research areas resulting from the recent restructuring of CNR. The Department is responsible for the coordination and evaluation of the scientific and technical activities of the seven CNR Institutes working in the ICT sector. Professor Beltrame holds the Chair of BioEngineering at the University of Genoa and he is President of the Scientific and Technical Committee for Industrial Research of the Italian Ministry of Education, University and Research. He is also Italian representative at the European Commission for the IST programme under FP6.

Photo: Francesco Beltrame.

CWI - CWI was rated 'excellent' by an international evaluation committee in March 2006. "The combination of mathematics and computer science and fundamental and applied research gives the institute a strong and unique position in the international research landscape," the Netherlands Organisation for Scientific Research NWO said in its press release. NWO subjected six Dutch institutes to an external evaluation. The evaluation committee comprised Frank den Hollander (Technische Universiteit Eindhoven), Christopher Baker (University of Manchester), Susan Graham (University of California, Berkeley), Wendy Hall (University of Southampton) and Kurt Mehlhorn (Max Planck Institute for Computer Science, Saarbrücken).

CWI - Krzysztof Apt elected as Member of the Academia Europaea. Krzysztof Apt (CWI and Universiteit van Amsterdam) has been elected as Member of the Academia Europaea in the Informatics Section on 26 April. This section has 66 members of whom nine scientists come from ERCIM member institutes. The Academia Europaea is a European, non-governmental association acting as an Academy. Its members are scientists and scholars who collectively aim to promote learning, education and research. See: http://www.acadeuro.org/

ERCIM – The European Research Consortium for Informatics and Mathematics is an organisation dedicated to the advancement of European research and development, in information technology and applied mathematics. Its national member institutions aim to foster collaborative work within the European research community and to increase co-operation with European industry.

ERCIM is the European Host of the World Wide Web Consortium.

Austrian Association for Research in IT
c/o Österreichische Computer Gesellschaft, Wollzeile 1-3, A-1010 Wien, Austria
Tel: +43 1 512 02 35 0, Fax: +43 1 512 02 35 9
http://www.aarit.at/

Council for the Central Laboratory of the Research Councils, Rutherford Appleton Laboratory
Chilton, Didcot, Oxfordshire OX11 0QX, United Kingdom
Tel: +44 1235 82 1900, Fax: +44 1235 44 5385
http://www.cclrc.ac.uk/

Consiglio Nazionale delle Ricerche, ISTI-CNR
Area della Ricerca CNR di Pisa, Via G. Moruzzi 1, 56124 Pisa, Italy
Tel: +39 050 315 2878, Fax: +39 050 315 2810
http://www.isti.cnr.it/

Czech Research Consortium for Informatics and Mathematics
FI MU, Botanicka 68a, CZ-602 00 Brno, Czech Republic
Tel: +420 2 688 4669, Fax: +420 2 688 4903
http://www.utia.cas.cz/CRCIM/home.html

Centrum voor Wiskunde en Informatica
Kruislaan 413, NL-1098 SJ Amsterdam, The Netherlands
Tel: +31 20 592 9333, Fax: +31 20 592 4199
http://www.cwi.nl/

Fonds National de la Recherche
6, rue Antoine de Saint-Exupéry, B.P. 1777, L-1017 Luxembourg-Kirchberg
Tel. +352 26 19 25-1, Fax +352 26 1925 35
http://www.fnr.lu

FWO
Egmontstraat 5, B-1000 Brussels, Belgium
Tel: +32 2 512.9110
http://www.fwo.be/

FNRS
rue d'Egmont 5, B-1000 Brussels, Belgium
Tel: +32 2 504 92 11
http://www.fnrs.be/

Foundation for Research and Technology – Hellas (FORTH)
Institute of Computer Science, P.O. Box 1385, GR-71110 Heraklion, Crete, Greece
Tel: +30 2810 39 16 00, Fax: +30 2810 39 16 01
http://www.ics.forth.gr/

Fraunhofer ICT Group
Friedrichstr. 60, 10117 Berlin, Germany
Tel: +49 30 726 15 66 0, Fax: +49 30 726 15 66 19
http://www.iuk.fraunhofer.de

Institut National de Recherche en Informatique et en Automatique
B.P. 105, F-78153 Le Chesnay, France
Tel: +33 1 3963 5511, Fax: +33 1 3963 5330
http://www.inria.fr/

Norwegian University of Science and Technology
Faculty of Information Technology, Mathematics and Electrical Engineering, N 7491 Trondheim, Norway
Tel: +47 73 59 80 35, Fax: +47 73 59 36 28
http://www.ntnu.no/

Spanish Research Consortium for Informatics and Mathematics
c/o Esperanza Marcos, Rey Juan Carlos University, C/ Tulipan s/n, 28933-Móstoles, Madrid, Spain
Tel: +34 91 664 74 91, Fax: +34 91 664 74 90
http://www.sparcim.org

Swedish Institute of Computer Science
Box 1263, SE-164 29 Kista, Sweden
Tel: +46 8 633 1500, Fax: +46 8 751 72 30
http://www.sics.se/

Swiss Association for Research in Information Technology
c/o Prof. Dr Alfred Strohmeier, EPFL-IC-LGL, CH-1015 Lausanne, Switzerland
Tel +41 21 693 4231, Fax +41 21 693 5079
http://www.sarit.ch/

Magyar Tudományos Akadémia Számítástechnikai és Automatizálási Kutató Intézet
P.O. Box 63, H-1518 Budapest, Hungary
Tel: +36 1 279 6000, Fax: +36 1 466 7503
http://www.sztaki.hu/

Irish Universities Association
c/o School of Computing, Dublin City University, Glasnevin, Dublin 9, Ireland
Tel: +3531 7005636, Fax: +3531 7005442
http://ercim.computing.dcu.ie/

Technical Research Centre of Finland
P.O. Box 1200, FIN-02044 VTT, Finland
Tel: +358 9 456 6041, Fax: +358 9 456 6027
http://www.vtt.fi/tte

Order Form

If you wish to subscribe to ERCIM News free of charge or if you know of a colleague who would like to receive regular copies of ERCIM News, please fill in this form and we will add you/them to the mailing list.

Send, fax or email this form to:
ERCIM NEWS
2004 route des Lucioles
BP 93
F-06902 Sophia Antipolis Cedex
Fax: +33 4 9238 5011
E-mail: [email protected]

I wish to subscribe to the
[ ] printed edition
[ ] online edition (email required)

Name:
Organisation/Company:
Address:
Postal Code:
City:
Country:
E-mail:

Data from this form will be held on a computer database. By giving your email address, you allow ERCIM to send you email

You can also subscribe to ERCIM News and order back copies by filling out the form at the ERCIM website at