CEN WORKSHOP AGREEMENT

CWA 15992

July 2009

ICS 35.240.60

English version

Harmonization of data interchange in tourism

This CEN Workshop Agreement has been drafted and approved by a Workshop of representatives of interested parties, the constitution of which is indicated in the foreword of this Workshop Agreement.

The formal process followed by the Workshop in the development of this Workshop Agreement has been endorsed by the National Members of CEN but neither the National Members of CEN nor the CEN Management Centre can be held accountable for the technical content of this CEN Workshop Agreement or possible conflicts with standards or legislation.

This CEN Workshop Agreement can in no way be held as being an official standard developed by CEN and its Members.

This CEN Workshop Agreement is publicly available as a reference document from the CEN Members National Standard Bodies.

CEN members are the national standards bodies of Austria, Belgium, Bulgaria, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland and United Kingdom.

EUROPEAN COMMITTEE FOR STANDARDIZATION
COMITÉ EUROPÉEN DE NORMALISATION
EUROPÄISCHES KOMITEE FÜR NORMUNG

Management Centre: Avenue Marnix 17, B-1000 Brussels

© 2009 CEN All rights of exploitation in any form and by any means reserved worldwide for CEN national Members.

Ref. No.: CWA 15992:2009 E

CWA 15992:2009 (E)

Contents

Foreword
Executive summary
  Problem statement
  Approach
  The five challenges
    Semantics
    Data transformation
    Process handling
    Metasearch
    Object identification
  Best practice case
  Recommendations
Summary of recommendations
  Overall recommendations
  List of recommendations on different topics
    Standards
      Short-term recommendations
      Long-term recommendations
    Taxonomies
      Short-term recommendations
      Long-term recommendations
    Ontologies
      Short-term recommendations
    Structured data mapping
      Short-term recommendations
      Long-term recommendations
    Manual semantic annotation
      Short-term recommendations
      Long-term recommendations
    Automatic information extraction
      Short-term recommendations
      Long-term recommendations
    Inter-ontology mapping
      Short-term recommendations
      Long-term recommendations
    Process handling
      Short-term recommendations
      Long-term recommendations
    Metasearch methodology
      Short-term recommendations
      Long-term recommendations
    Querying
      Short-term recommendations
      Long-term recommendations
    Object identification
      Short-term recommendations
      Long-term recommendations
1 Scope
2 Normative references
3 Abbreviations, terms and definitions
  3.1 Abbreviations
  3.2 Terms and definitions
4 Methodology and thematic overview
  4.1 Thematic circle
  4.2 Topics
    4.2.1 Semantics
    4.2.2 Data transformation
    4.2.3 Process handling
    4.2.4 Metasearch
    4.2.5 Object identification
  4.3 Cross-cutting concerns / Prerequisites
    4.3.1 Legal aspects
    4.3.2 Multiculturalism
    4.3.3 Business models
    4.3.4 Technology
5 Case study
  5.1 The processes
    5.1.1 The actors
    5.1.2 Consumer process
    5.1.3 Travel-related professional process
  5.2 The information and communication technologies
    5.2.1 Multiple levels of data sources
    5.2.2 Type of information
    5.2.3 Type of data sources
6 Semantics
  6.1 Standards
    6.1.1 Needs and requirements
      Introduction
      Needs
      Requirements
    6.1.2 State of the art
      Types of standards
      List of travel industry standards, companies and organizations (examples)
    6.1.3 Gaps and future needs
    6.1.4 Recommendations
      Short-term recommendations (1–3 years)
      Long-term recommendations (3–10 years)
  6.2 Taxonomies
    6.2.1 Needs and requirements
      Introduction
      Needs
      Requirements
    6.2.2 State of the art
      Examples of tourism taxonomies
    6.2.3 Gaps and future needs
    6.2.4 Recommendations
      Short-term recommendations (1–3 years)
      Long-term recommendations (3–10 years)
  6.3 Ontologies
    6.3.1 Needs and requirements
      Introduction
      Needs
    6.3.2 State of the art
      Definitions of the notion of ontology within the computer science domain
      Main components of an ontology
      Ontology development tools
      Ontology development languages
      Examples of standard ontologies
    6.3.3 Gaps and future needs
    6.3.4 Recommendations
      Short-term recommendations (1–3 years)
      Long-term recommendations (3–10 years)
7 Data transformation
  7.1 Structured data mapping
    7.1.1 Needs and requirements
      Introduction
      Needs
      Requirements
    7.1.2 State of the art
    7.1.3 Gaps and future needs
    7.1.4 Recommendations
      Short-term recommendations (1–3 years)
      Long-term recommendations (3–10 years)
  7.2 Manual semantic annotation
    7.2.1 Needs and requirements
    7.2.2 State of the art
    7.2.3 Gaps and future needs
    7.2.4 Recommendations
      Short-term recommendations (1–3 years)
      Long-term recommendations (3–10 years)
  7.3 Automatic information extraction
    7.3.1 Needs and requirements
      Needs
      Requirements
    7.3.2 State of the art
      Named entity recognition
      Event extraction
      Tourism-specific information extraction
    7.3.3 Gaps and future needs
      Named entity recognition
      Event extraction
      Tourism-specific information extraction
    7.3.4 Recommendations
      Short-term recommendations (1–3 years)
      Long-term recommendations (3–10 years)
  7.4 Inter-ontology mapping
    7.4.1 Needs and requirements
      Introduction
      Needs
      Requirements
    7.4.2 State of the art
    7.4.3 Gaps and future needs
    7.4.4 Recommendations
      Short-term recommendations (1–3 years)
      Long-term recommendations (3–10 years)
8 Process handling
  8.1 Needs and requirements
    8.1.1 Introduction
    8.1.2 Needs
    8.1.3 Requirements
  8.2 State of the art
    8.2.1 Global standardization efforts
    8.2.2 Application Integration and APIs
  8.3 Gaps and future needs
  8.4 Recommendations
    8.4.1 Short-term recommendations (1–3 years)
    8.4.2 Long-term recommendations (3–10 years)
9 Metasearch
  9.1 Methodology
    9.1.1 Needs and requirements
      Introduction
      Quality of results
      Response time
      Access to data
      Efforts for maintenance
    9.1.2 State of the art
      Web crawler
      HTTP requests
      Website wrapper
      Application Programming Interfaces (API)
      Web services
      Semantic annotation
      Caching mechanism
      Summary
    9.1.3 Gaps and future needs
    9.1.4 Recommendations
      Short-term recommendations (1–3 years)
      Long-term recommendations (3–10 years)
  9.2 Querying
    9.2.1 Needs and requirements
      Introduction
      Needs and requirements
    9.2.2 State of the art
      Methods for query distribution
      Query by example
      Standardized query languages
      Interface standardization
      Metadata syndication
    9.2.3 Gaps and future needs
      Query by example
      Standardized query languages / SPARQL
      Interface standardization
      Metadata syndication
    9.2.4 Recommendations
      Short-term recommendations (1–3 years)
      Long-term recommendations (3–10 years)
  9.3 Role of registries in eTourism
    9.3.1 Needs and requirements
      Introduction
      Needs
      Requirements
    9.3.2 State of the art
      UDDI and the ebXML Registry Specification
      CEN/ISSS eGovernment Focus Group and CEN/ISSS WS eGov-Share
    9.3.3 Gaps and future needs
      Shortcomings of current registry standards
      Future needs
    9.3.4 Recommendations
      Short-term recommendations (1–3 years)
      Long-term recommendations (3–10 years)
10 Object identification
  10.1 Needs and requirements
    10.1.1 Introduction
    10.1.2 Needs
    10.1.3 Requirements
      Location codes
      Travel service codes
      Travel service qualifier codes
      Travel company codes
  10.2 State of the art
    10.2.1 IATA
    10.2.2 ICAO
    10.2.3 ISO
    10.2.4 UN/LOCODE
    10.2.5 HEDNA
    10.2.6 ACRISS
    10.2.7 GIATA
    10.2.8 GS1
    10.2.9 URI
    10.2.10 UUID
  10.3 Gaps and future needs
    10.3.1 Location
      Country codes
      Region codes
      City, airport and other point of travel codes
    10.3.2 Currency and language codes
    10.3.3 Travel service codes
    10.3.4 Travel service qualifier codes
    10.3.5 Travel company codes
  10.4 Recommendations
    10.4.1 Short-term recommendations (1–3 years)
    10.4.2 Long-term recommendations (3–10 years)
11 Best practice case
  11.1 The starting point
  11.2 The existing case of euromuse.net
  11.3 Future scenario for euromuse.net
  11.4 Critical discussion
12 Bibliography and references


Foreword

The objective of the Workshop CEN/ISSS WS/eTOUR on “Harmonization of data interchange in tourism” and the production of this draft CEN Workshop Agreement (CWA) was approved by the Workshop at its plenary meeting held in Brussels on 6 February 2008.

This final version of the CWA was approved by letter ballot following the final Workshop meeting on 15 May 2009 and ending on 29 May 2009.

The document was prepared by the eTOUR Project Team:

• David Faveur, Afidium, France,
• Manfred Hackl, x+o Business Solutions GmbH, Austria,
• Marc Wilhelm Küster, Fachhochschule Worms, Germany,
• Carlos Lamsfus, Asociación Centro de Investigación Cooperativa en Turismo, Spain.

In his capacity as Chair of the Workshop, Wolfram Höpken, University of Applied Sciences Ravensburg-Weingarten and Etourism Competence Center Austria, has contributed greatly to the work on the CWA. The Secretary of the Workshop has been Håvard Hjulstad, Standards Norway.

The following companies/organizations endorsed the CWA:

• Afidium (France)
• Asociación Centro de Investigación Cooperativa en Turismo, CICtourGUNE (Spain)
• BIT Reiseliv (Norway)
• Centre de Recherche Public Henri Tudor (Luxembourg)
• ECCA – Etourism Competence Center Austria
• eCl@ss – International Classification System (Germany)
• euromuse.net – the European exhibition portal
• ETOA – European Tour Operators Association
• Fachhochschule Worms (Germany)
• Hochschule München – Fakultät für Tourismus (Germany)
• FernUniversität in Hagen (Germany)
• Infoterm – International Information Centre for Terminology
• IfM – Institute for Museum Research – SMB-PK (Germany)
• OpenTravel Alliance (USA)
• Smart Information Systems (Austria)
• Travel and Telecom Ltd (UK)
• TTI – Travel Technology Initiative Ltd (UK)
• Universitat Oberta de Catalunya (Spain)
• x+o Business Solutions GmbH (Austria)

This CEN Workshop Agreement is publicly available as a reference document from the National Members of CEN: AENOR, AFNOR, ASRO, BDS, BSI, CSNI, CYS, DIN, DS, ELOT, EVS, IBN, IPQ, IST, LVS, LST, MSA, MSZT, NEN, NSAI, ON, PKN, SEE, SIS, SIST, SFS, SN, SNV, SUTN and UNI.

Comments or suggestions from the users of the CEN Workshop Agreement are welcome and should be addressed to the CEN Management Centre.


Executive summary

Problem statement

Tourism is in the vanguard of ICT adoption and eBusiness in the area of eMarketing and online sales (B2C). Yet, in a ranking of various sectors the tourism industry only achieves a mid-level score in the overall use of ICT and eBusiness. It is still lagging behind especially regarding the deployment of ICT infrastructure and the adoption of e-integrated business processes [eBusiness W@tch Report 2006/2007, p 167]. At the same time, tourism is an important and growing sector of the European economy, with a large presence of SMEs.

Electronic data interchange and the interoperability between systems of different parties are critical for the execution of eBusiness processes throughout the entire industry. A CEN Workshop was set up to recommend approaches for reaching global interoperability, i.e. seamless data interchange and execution of business processes in the tourism sector, meeting the requirements of players on all levels of the value chain.

Approach

Data interchange has two key components: the electronic data itself and the exchange of data between two or more tasks in larger process chains. This hinges on the ability of all tasks to understand the data they are supposed to consume – i.e. data interoperability – and of processes to cooperate meaningfully – i.e. process interoperability. This draft CWA thus centres on the two core issues “data” and “processes” and on related challenges in the domain that need deeper analysis. In particular, we have identified five topics for further analysis that are briefly outlined below: “semantics”, “data transformation”, “process handling”, “metasearch” and “object identification”.

These five topics are placed in the larger context of four cross-cutting concerns that permeate all of them. Tourism transactions on the one hand regularly transcend national and cultural boundaries and frequently involve both very small and very large players. On the other hand, very many of the parameters – rating systems for accommodation, opening hours of sites, classification of beaches – are regulated nationally or even regionally and reflect cultural preferences. All transactions must naturally follow pertinent national or regional laws and regulations. This leads to the four cross-cutting concerns “Legal aspects”, “Multiculturalism”, “Business models”, and “Technology”.

The five challenges

Semantics

The meaning and structure of data are at the heart of data interoperability and, given the plethora of pertinent formats, unfortunately pose a complex problem. Agreed strategies towards the expression of semantics in eTourism applications are key to the flexible integration of heterogeneous data structures from a large number of data sources. As such, they are also a central requirement for the building of flexible, cross-organizational process chains.

Data transformation

The co-existence of many different data formats already implies the need to transform data during data exchange. This mapping can affect data structures on different levels that need to be transformed:

• Metadata: ontologies and taxonomies;
• Structured data;
• Unstructured data.

Together with well-defined semantics, data transformation is an essential tool to integrate data sources and build cross-organizational processes.
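As a minimal sketch of what a structured data mapping can look like, the following Python fragment translates records from two source formats into one shared schema. The two source formats, all field names and the shared schema are hypothetical assumptions made for illustration; they are not taken from any standard discussed in this CWA.

```python
# Illustrative sketch only: the source formats and field names below are
# hypothetical and do not correspond to any real eTourism standard.

# One mapping table per source format: source field -> shared target field.
FIELD_MAPPINGS = {
    "format_a": {"hotelName": "name", "town": "city", "stars": "rating"},
    "format_b": {"title": "name", "locality": "city", "category": "rating"},
}


def to_shared_schema(record, source_format):
    """Translate one record into the shared schema, dropping unmapped fields."""
    mapping = FIELD_MAPPINGS[source_format]
    return {target: record[source]
            for source, target in mapping.items()
            if source in record}


record_a = {"hotelName": "Hotel Example", "town": "Vienna", "stars": 4}
record_b = {"title": "Pension Sample", "locality": "Oslo", "category": 3, "fax": "n/a"}

# Both records end up with the same keys: name, city, rating.
unified = [
    to_shared_schema(record_a, "format_a"),
    to_shared_schema(record_b, "format_b"),
]
```

In a design of this kind each party keeps its own format, and only the mapping tables need maintenance when a format changes.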

Process handling

The Internet has significantly boosted the use of ICT in the tourism industry and empowered customers to make travel arrangements autonomously, using a wide variety of different data sources. This requires the seamless interplay of different computer systems, allowing new online services like dynamic packaging of tourism products.

Metasearch

Metasearch proper builds on shared semantics and data transformation to enable searches across the individual search components of heterogeneous websites and to aggregate the results in a unified list. From a user’s perspective, metasearch services thus offer a one-stop entry point to a specific type of information; from a technology perspective, they place high demands on distributed data querying.
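The aggregation step can be sketched as follows, assuming two sources that return results in different shapes. Both sources are hypothetical in-memory stand-ins for the search components of independent websites, and the record layouts and the sort order are assumptions made for illustration.

```python
# Illustrative sketch only: the two "sources" are hypothetical in-memory
# stand-ins for the search components of independent websites.

def search_source_a(city):
    # Source A reports prices in whole euros under the key "price_eur".
    data = [{"hotel": "Hotel Example", "city": "Vienna", "price_eur": 120}]
    return [row for row in data if row["city"] == city]


def search_source_b(city):
    # Source B reports prices in euro cents under the key "cents".
    data = [{"name": "Pension Sample", "town": "Vienna", "cents": 8500},
            {"name": "Hostel Demo", "town": "Oslo", "cents": 4000}]
    return [row for row in data if row["town"] == city]


def metasearch(city):
    """Query every source, normalise the results and merge them into one list."""
    results = []
    for row in search_source_a(city):
        results.append({"name": row["hotel"],
                        "price_eur": float(row["price_eur"]),
                        "source": "A"})
    for row in search_source_b(city):
        results.append({"name": row["name"],
                        "price_eur": row["cents"] / 100,
                        "source": "B"})
    # Present a single list, cheapest offer first.
    return sorted(results, key=lambda r: r["price_eur"])


offers = metasearch("Vienna")
```

The normalisation inside `metasearch` is where shared semantics and data transformation come in: without an agreed meaning for "price", the merged list could not be sorted sensibly.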

Object identification

Electronic transactions often hinge upon the ability to uniquely identify the objects on which they operate. In contrast to flights, for example, there are many types of objects in tourism that do not have a unique identifier. There is at present no universally accepted scheme to identify, say, a given hotel that should be booked, or to compare different offers for the same hotel.

Best practice case

To demonstrate the interoperability issue as a whole and to reflect on ways to solve the problems arising from the five challenges, the existing eTEN project euromuse.net has been chosen as a best practice case. euromuse.net deploys the Harmonise technology, a result of a former IST project, to mediate between different data formats from the cultural heritage and the tourism sectors, and is confronted with very much the same challenges as those discussed in the workshop report.

“Mediation” has been identified as the key concept for reaching interoperability in an area as highly fragmented and diversified as the tourism industry. This best practice case demonstrates how interoperability can be reached easily through data mediation, while leaving each partner enough flexibility to define its own data format.

Recommendations

The workshop came up with a number of recommendations that are all centred on the basic idea of dealing with the diversity of existing standards, technologies, projects, and entities – rather than bringing another standard to the market. The keywords in this context are harmonization and mediation.

The suggested approach is to examine existing standards and approaches carefully before creating something new, and to build upon them, keeping differences to a necessary minimum. This harmonization shall help to avoid isolated standards and approaches that make interoperability difficult.

Furthermore, ways should be found to mediate between the remaining differences of existing approaches. The tourism sector has come up with a broad spectrum of different standards and models, and for various reasons it will be difficult, if not impossible, to replace them. This diversity is also needed to some extent, and mediation shall help to deal with these differences.

To oversee the market, it is highly recommended to implement a watchtower as a follow-up action within the work of this CEN workshop, keeping a map of the semantic landscape to support harmonization of data and offering technology and recommendations to mediate between existing standards. HarmoNET, as an existing non-profit network, established out of a European project and dedicated to data mediation in tourism, shall be the starting point for this watchtower.

In addition it is recommended to invest in long-term research on semantic methods and tools, as well as new ways of object identification, to continue what has already started in several European projects.

These recommendations aim at keeping diversity and flexibility of the European eTourism landscape, while allowing process and data interoperability for the actors involved to achieve a higher level of e-integration.


Summary of recommendations

Overall recommendations

The workshop came up with a number of recommendations that are all centred on the basic idea of dealing with the diversity of existing standards, technologies, projects, and entities – rather than bringing another standard to the market. The keywords in this context are harmonization and mediation. The more desirable it seems to unify terms and standards to allow easy exchange of information and execution of processes, the more important it is to leave the market the flexibility and diversity to define data schemas. Instead of unification, ways should be found to mediate between the different approaches. The tourism sector has come up with a broad spectrum of different standards, and for various reasons it will be difficult, if not impossible, to replace them.

One reason is that eTourism-relevant information, above all product descriptions and information classifications, is often deeply rooted in local and national peculiarities and is sometimes even laid down in national law. Take as a simple example the classification of “sea view” or of “wellness area”.

Another reason is the play of market forces, which makes it difficult to reach consensus on the issues involved. Unlike in many other industries, such as the construction industry, the advantages of having different standards seem to outweigh the disadvantages of lacking interoperability. This can be observed in the area of destination management as well as on the side of tour operators. The need for standardization is nevertheless recognized, as can be seen from the various industry associations and forums. But strong resistance can be observed when approaches for European or worldwide standards are discussed.

Above and beyond the detailed recommendations listed in the clauses below, a general approach is therefore suggested: to harmonize existing formats and standards (keeping differences to a minimum) and to mediate between them (enabling understanding across the differences). This approach must be flexible, easy to use and cost-effective, as is the case, for example, within the project euromuse.net, which is described as the best practice case. These criteria are critical to the success of the approach, since the tourism sector is characterized by a large number of small and medium-sized organizations.

The approach of mediation shall in no way be taken as an invitation to establish as many isolated new standards as possible. Rather, one should carefully examine existing standards or approaches when starting to create something new, so that later mediation between them is as easy as possible, and deviate from other standards only where absolutely required.

To ease harmonization and mediation, it is highly recommended to implement a watchtower, keeping a map of the semantic landscape and offering technologies and recommendations to mediate between existing standards. The watchtower shall monitor relevant standards and reference lists to see what is coming up and what is used in the market. It could also help to identify existing standards or frameworks for object identification. In addition, it could keep track of technologies and projects that ease the problem of data and process interoperability, in order to come up with recommendations and best practices for data models and interoperability approaches. At the same time, the watchtower could operate a central data mediation service between the recognized standards in the field.

It is recommended to write a more detailed proposal for this “eTourism watchtower” as a follow-up action within the work of this CEN workshop. HarmoNET, as an existing non-profit network established out of a European project and dedicated to data mediation in tourism, is the ideal starting point for this watchtower. HarmoNET is composed of the main tourism bodies from different levels and is well positioned as the host for the watchtower.

In addition it is recommended to invest in long-term research on semantic methods and tools, as well as new ways of object identification, to continue what has already been started in several European projects. A more detailed recommendation on research areas is given in the following clauses. However, the proposed watchtower could also help to identify gaps and needs for long-term research.

All these recommendations aim at keeping diversity and flexibility of the European eTourism landscape, while allowing process and data interoperability for the actors involved to achieve a higher level of e-integration.

List of recommendations on different topics

Under each topic are listed short-term recommendations and long-term recommendations with time spans of 1–3 and 3–10 years respectively.

Standards

Short-term recommendations

• Leverage existing standards rather than develop new specifications whenever possible.
• Build cooperation between private associations like IATA, OTA, and XFT and formal standardization bodies such as ISO and CEN.
• Build a “watchtower” registry of relevant eTourism standards that also acts as a coordination body between various formal and informal standardization activities. Such an activity can be modelled on the MoU/MG.

Long-term recommendations

• Lower the entry barrier for participation in pertinent formal and informal standardization bodies, especially for SMEs, and extend the scope of those activities to cover the requirements of SMEs.
• Work on interoperability approaches between different standards.


Taxonomies

Short-term recommendations

• Follow existing taxonomies including established definitions wherever possible.
• Produce mappings between eTourism-related taxonomies.
• Federate existing eTourism-related taxonomies across languages based on taxonomy mappings and offer a SKOS (Simple Knowledge Organisation System) interface to them.
• Formulate guidelines for the design of eTourism-related taxonomies.
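A mapping between two taxonomies can be recorded as statements that use the SKOS mapping properties (`skos:exactMatch`, `skos:closeMatch`, `skos:broadMatch`). The sketch below holds such statements as plain Python tuples and serialises them as N-Triples. The two taxonomies, their concept identifiers and the `example.org` namespaces are hypothetical; only the SKOS namespace and property names are real.

```python
# Illustrative sketch only: both taxonomies, their concept identifiers and
# the example.org namespaces are hypothetical.

SKOS = "http://www.w3.org/2004/02/skos/core#"

# Mapping statements as (subject, predicate, object) triples.
MAPPINGS = [
    ("http://example.org/tax-a/Hotel", SKOS + "exactMatch",
     "http://example.org/tax-b/hotel"),
    ("http://example.org/tax-a/GuestHouse", SKOS + "closeMatch",
     "http://example.org/tax-b/pension"),
    ("http://example.org/tax-a/SpaHotel", SKOS + "broadMatch",
     "http://example.org/tax-b/hotel"),
]


def translate(concept, allowed=("exactMatch", "closeMatch")):
    """Return the concepts of the other taxonomy that `concept` maps to."""
    predicates = {SKOS + name for name in allowed}
    return [obj for subj, pred, obj in MAPPINGS
            if subj == concept and pred in predicates]


def to_ntriples(triples):
    """Serialise the mapping statements as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


guest_house_targets = translate("http://example.org/tax-a/GuestHouse")
spa_targets_strict = translate("http://example.org/tax-a/SpaHotel")
spa_targets_loose = translate("http://example.org/tax-a/SpaHotel",
                              allowed=("exactMatch", "closeMatch", "broadMatch"))
```

Keeping the match type explicit lets a consumer decide how loose a translation it is willing to accept, which matters when national classifications only partly overlap.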

Long-term recommendations

• Build organizational structures for the long-term duration of eTourism-related taxonomies.

Ontologies

Short-term recommendations

• Use recognized standard reference models such as the Harmonise ontology (for tourism purposes) or CIDOC CRM (for cultural heritage data) wherever possible.
• Produce guidelines for the mappings between eTourism-related ontologies based on standard reference models.
• Use established standards such as RDF(S), OWL or the Topic Maps Constraint Language to express ontologies.
• Heighten the awareness of Open Source, user-friendly tools for ontology definition such as Protégé.

Structured data mapping

Short-term recommendations

• Use (graphical) mediation tools with reasoning capabilities to automatically suggest identical (semantically equivalent) data resources, identify inconsistencies and decrease the amount of human intervention in the mapping processes.
• Pursue the design and implementation of new data resources on the basis of agreed recommendations, such as the W3C recommendations for Semantic Web technologies.


Long-term recommendations

• Use semantic web technologies (e.g. based on RDF URIs) to name and represent (data) resources on the Web so that mapping can be undertaken automatically.
• Agree on the degree of formality with which information ought to be defined, so that automatic mapping tools can compare information.
• Develop ontologies on different abstraction levels. Agreed high-level ontologies should be in place and should be used when defining domain ontologies. General domain ontologies should be reused when more specific sub-domain ontologies are defined.

Manual semantic annotation

Due to the nature of this topic, there can be some overlapping of recommendations with other issues that have already been covered, such as ontologies.

Short-term recommendations

• Enhance the use of standard ontologies (e.g. Harmonise) in the field of tourism.
• Enhance the development of ontologies with standard languages: OWL, RDF.
• Enhance the use of already existing manual annotation tools in the realm of tourism.

Long-term recommendations

• Investigate the automation of annotations.
• Investigate automatic ontology extension.

Automatic information extraction

Short-term recommendations

• Foster the use of semantic web technologies to describe non-structured data on the web by means of resources, so as to make the data machine-processable.
• Semantically annotate non-structured information.

Long-term recommendations

• Together with a recognized body such as the W3C, agree on the names to be used for the tags that represent particular tourism content and that are valid for search engines.
• Develop software that enables (semi-)automatic information annotation according to the previous recommendation.


Inter-ontology mapping

Short-term recommendations

• Foster the development of ontologies using the same standard definition language as well as the same degree of formality and expressivity to ease automatic ontology mapping, following W3C recommendations.

Long-term recommendations

• Based on the short-term recommendations, build tools with a graphical user interface that automatically merge and link ontologies, using the ontologies' reasoning capabilities to automatically find and resolve alignment inconsistencies.

Process handling

Short-term recommendations

• Simplify and rationalize existing processes – use stateless process handling or request-response pairs only.
• Build an ontology of common processes in the tourism industry.

Long-term recommendations

• Develop process mediators.
• Put research efforts into intelligent agent technologies for automatic process handling.

Metasearch methodology

Short-term recommendations

• Make use of semantic technologies to describe your data.
• Provide content and meta-content as close to an existing standard as possible.
• Provide regularly updated, external data stores with pre-processed and well-described content for fast querying (caching mechanism) if you have long query processing times or complex queries.
• Develop aggregated data repositories that provide pre-processed data from different sources.


Long-term recommendations

• Focus on the development of fast and easy-to-use alternative metasearch technologies that enable or support the use of semantic technologies for data transformation.

Querying

Short-term recommendations

• If a system should be available for external queries, make use of general query statements that are supported by a broad range of query languages. Avoid features and functionality specific to your own database.
• Further develop flexible standardized query languages that can be adapted to different system environments and support semantically enriched data.
• Publish “partial translators”, which provide a structured translation for human search concepts like “near” that can be used by different query languages.
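A “partial translator” for the concept “near” could be sketched as follows. The function translates “near a given point” into two structured forms: a coarse latitude/longitude bounding box, which most query languages can express as simple range conditions, and an exact great-circle distance check for post-filtering. The default radius of 5 km, the function names and the output layout are assumptions made for illustration; the bounding box is a simple approximation that is not suitable close to the poles.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def translate_near(lat, lon, radius_km=5.0):
    """Translate "near (lat, lon)" into a bounding box and a distance check.

    The radius is an assumed interpretation of "near", not a standard value.
    """
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    dlon = dlat / math.cos(math.radians(lat))
    box = {"min_lat": lat - dlat, "max_lat": lat + dlat,
           "min_lon": lon - dlon, "max_lon": lon + dlon}

    def is_near(other_lat, other_lon):
        return haversine_km(lat, lon, other_lat, other_lon) <= radius_km

    return box, is_near


# Example: "near" a point in central Vienna.
box, is_near = translate_near(48.2, 16.37, radius_km=5.0)
```

The bounding box can be handed to any backend that supports range comparisons, while the exact check runs on the smaller candidate set that comes back.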

Long-term recommendations

• Research technologies for flexible and adaptive query methods that are able to understand the semantics of a web repository and send an appropriate query.

Object identification

Short-term recommendations

• Build a registry of present object identifications in the tourism industry.
• Develop travel-related global geography identifiers and build transcoding capabilities.
• Develop global identifiers for travel companies.

Long-term recommendations

• Provide guidelines for travel service coding schemes.
• Build a global repository with transcoding capacities.
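The idea of a registry with transcoding capabilities can be sketched with a small in-memory table that translates between identification schemes. The three airports and their IATA and ICAO codes are real; the registry structure, the scheme keys and the internal identifiers such as `loc-0001` are hypothetical assumptions made for illustration.

```python
# Illustrative sketch only: a tiny in-memory registry. The airport codes are
# real IATA and ICAO codes; the registry structure and the internal
# identifiers ("loc-0001", ...) are hypothetical.

REGISTRY = [
    {"id": "loc-0001", "name": "Vienna International Airport",
     "iata": "VIE", "icao": "LOWW"},
    {"id": "loc-0002", "name": "Brussels Airport",
     "iata": "BRU", "icao": "EBBR"},
    {"id": "loc-0003", "name": "Oslo Airport, Gardermoen",
     "iata": "OSL", "icao": "ENGM"},
]


def transcode(code, from_scheme, to_scheme):
    """Translate a code from one identification scheme into another.

    Returns None when the code is not registered.
    """
    for entry in REGISTRY:
        if entry.get(from_scheme) == code:
            return entry.get(to_scheme)
    return None


icao_for_vienna = transcode("VIE", "iata", "icao")
```

For object types such as hotels, where no widely shared code exists, the internal identifier would have to carry the identity and the other columns would hold whatever supplier-specific codes are known.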


1 Scope

The CEN/ISSS Workshop on eTourism aims at producing guidelines for reaching global interoperability, i.e. enabling seamless data interchange and execution of eBusiness processes in the tourism sector.

The Workshop’s main deliverable will be a CEN Workshop Agreement (CWA) on “Harmonization of data interchange in tourism”.

The CWA will cover the following topics under a pan-European interoperability perspective:

a. analysis and identification of the needs of B2B and B2C partners for harmonized data interchange;
b. analysis of the gaps in the design of current interoperability approaches;
c. description of the metadata and principles and requirements for data modelling;
d. analysis of business models and legal issues (IPRs, DRMs, personal data protection and privacy);
e. analysis of existing initiatives and approaches for flexible harmonization and global interoperability (including process interoperability);
f. recommendations concerning a general framework for eTourism-related information exchange;
g. best practice case.

The Workshop’s main focus is on interoperability issues in electronic data interchange. It will analyse and further build on the results of the already completed European projects Harmonise (Tourism Harmonisation Network), HarmoTEN and Satine (Semantic based interoperability infrastructure for integrating web service platform to peer to peer networks). The Workshop’s aim is to validate and disseminate their results to a wider audience than the project partners. The CEN Workshop will build on the work done by previous projects on metadata frameworks and ontologies.

It is outside the scope of the Workshop to do any direct standardization work on terminology. Instead, it will analyse existing initiatives or approaches to the interoperability problem and recommend steps for making maximal use of such approaches, as well as the research activities necessary to improve them further.

The CEN Workshop will focus on data integration and discovery as well as seamless execution of eBusiness processes. Application of the above will support end-user satisfaction/consumption of travel products, increase data reliability, revenue generation and margin contribution, motivating early adoption and roll out to market.


2 Normative references

The following normative documents (European and International Standards) are referenced in this document. Other documents of interest are listed in the Bibliography.

• ISO 639-1:2002 Codes for the representation of names of languages — Part 1: Alpha-2 code
• ISO 639-2:1998 Codes for the representation of names of languages — Part 2: Alpha-3 code
• ISO 639-3:2007 Codes for the representation of names of languages — Part 3: Alpha-3 code for comprehensive coverage of languages
• EN ISO 3166-1:2006 Codes for the representation of names of countries and their subdivisions — Part 1: Country codes (ISO 3166-1:2006)
• ISO 3166-2:2007 Codes for the representation of names of countries and their subdivisions — Part 2: Country subdivision code
• ISO 4217:2008 Codes for the representation of currencies and funds
• ISO/IEC 7810:2003 Identification cards — Physical characteristics
• ISO/IEC 9075 (several parts) Information technology — Database languages — SQL
• ISO/IEC 10646:2003 Information technology — Universal Multiple-Octet Coded Character Set (UCS)
• ISO/IEC 13250:2003 Information technology — SGML applications — Topic maps
• ISO 16642:2003 Computer applications in terminology — Terminological markup framework
• ISO 21127:2006 Information and documentation — A reference ontology for the interchange of cultural heritage information
• ISO/IEC Guide 2:2004 Standardization and related activities — General vocabulary


3 Abbreviations, terms and definitions

3.1 Abbreviations

A
AI — artificial intelligence
API — application programming interfaces
ASCII — American Standard Code for Information Interchange

B
B2B — business to business
B2C — business to consumer
B2G — business to government

C
CEN — European Committee for Standardization
CRS — computer reservation system
CWA — CEN Workshop Agreement
CycL — Ontology language used in AI and computer science

D
DB — database
DML — data manipulation language
DRM — digital rights management

E
EDI — electronic data interchange
ETSI — European Telecommunications Standards Institute

F
ftp — file transfer protocol

G
G2C — government to citizen
GDS — global distribution system

H
HEDNA — Hotel Electronic Distribution Network Association
HTML — hypertext markup language
http — hypertext transfer protocol

I
IATA — International Air Transport Association
ICAO — International Civil Aviation Organization
ICT — information and communications technology
IEC — International Electrotechnical Commission
IEEE — Institute of Electrical and Electronics Engineers
IFITT — International Federation for IT and Travel & Tourism
IFLA — International Federation of Library Associations and Institutions
IPR — intellectual property right
ISO — International Organization for Standardization
IST — Information Society Technologies

M
M2M — machine to machine

N
NKRL — narrative knowledge representation language

O
OML — outline markup language
OWL — web ontology language

P
P3P — Platform for Privacy Preferences
PDA — personal digital assistant
PMS — property management system

Q
QBE — query by example


R
RDF — resource description framework
RDFS — resource description framework schema
RMSIG — Reference Model Special Interest Group (under IFITT)

S
SCORM — Sharable Content Object Reference Model
SHOE — simple HTML ontology extensions
SME — small and medium enterprises
SOA — service-oriented architectures
SQL — structured query language

T
TCP/IP — Transmission Control Protocol / Internet Protocol
TGV — train à grande vitesse: high-speed train

U
UCS — universal character set (ISO/IEC 10646)
UNWTO — World Tourism Organization
URI — uniform resource identifier

W
W3C — World Wide Web Consortium
WAI — Web Accessibility Initiative
WSMO — web service modeling ontology
WWW — World Wide Web

X
XFT — exchange for travel
XHTML — extensible HTML
XML — extensible markup language
XSLT — extensible stylesheet language transformations

3.2 Terms and definitions

For the purpose of this document the following terms and definitions apply.

computer reservation system (CRS) — computerized system used to store and retrieve information and conduct transactions
eTourism — eBusiness methods and techniques applied to the tourism domain
global distribution system (GDS) — CRS connecting and integrating the automated booking systems of different organizations
tour operator — person or company that organizes tours
thesaurus — controlled vocabulary containing synonyms and relationships, but not definitions (see 7.2.1.2)
taxonomy — subject-based classification using a controlled vocabulary in a hierarchy (see 6.2 and 7.2.1.1)
ontology — (1) study of the nature of being, existence or reality; (2) structured information about reality (see 6.3)
— taxonomy developed as a broad collaborative effort (see 7.2.1.3)


4 Methodology and thematic overview

Tourism is in the vanguard of ICT adoption and eBusiness in the area of eMarketing and online sales (B2C). Yet, in a ranking of various sectors, the tourism industry only achieves a mid-level score in the overall use of ICT and eBusiness. It is still lagging behind especially regarding the deployment of ICT infrastructure and the adoption of e-integrated business processes [eBusiness W@tch Report 2006/2007, p 167].

Tourism is an important and growing sector of the European economy, with a large presence of SMEs. ICT is an enabler to strengthen efficiency, reduce costs and improve competitiveness of the industry. Tourism is expected to contribute 8.4 % of total employment and 9.9 % of the GDP worldwide [World Travel and Tourism Council, 2008, p 4].

For these reasons, it is important that companies and associations in the tourism sector understand the benefits they can reap from eBusiness, enhance their ICT infrastructure, and adopt eBusiness processes.

Electronic data interchange and the interoperability between systems of different parties are critical for eBusiness processes in all industry sectors. This CWA focuses on approaches for reaching global interoperability, i.e. seamless data interchange and execution of business processes in the tourism sector.

In eBusiness implementations the tourism sector has some specificities. Data quality and reliability are critical issues (e.g. updated opening hours for a museum, reliable on-line booking). Other critical issues are territorial definition and coordination between regional or local groups and national sites. Both commercial information (B2B, B2C, B2G) and “touristic information” (information to the end user, G2C) are concerned. All involved parties provide information at different levels (e.g. government: travel warnings; B2C: the opening hours mentioned above; B2B: distribution prices and their meanings). These specificities lead to a high degree of heterogeneity in tourism. Tourism market structures are complex and highly fragmented. Information interchange on the level of processes and data structures is not harmonized, and the electronic execution of business processes on a global level is still burdened by heterogeneous interfaces and data structures.

4.1 Thematic circle

Data interchange has two key components: the electronic data itself and the exchange of data between two or more tasks in larger process chains; see figure 4-1.


Figure 4-1

This hinges on the ability of all tasks to understand the data they are supposed to consume – i.e. data interoperability – and of processes to be able to meaningfully cooperate – process interoperability. Our report thus circles around the two key concepts of data and processes; see figure 4-2.

Figure 4-2


The circle captures the relationship between data and process interoperability with key enablers that need deeper analysis. In particular, we have identified the following topics for further analysis:

• Semantics
• Data transformation
• Process handling
• Metasearch
• Object identification

The topics are placed in the larger context of four cross-cutting concerns that permeate all of them. Tourism transactions on the one hand regularly transcend national and cultural boundaries and frequently involve both very small and very large players. On the other hand, many of the parameters – rating systems for accommodation, opening hours of sites, classification of beaches – are nationally or even regionally regulated or reflect cultural preferences. All transactions must naturally follow pertinent national or regional laws and regulations.

Processes will in particular be implemented in line with the process owner’s overall business model. The data structures will similarly often be dictated by the owner’s value proposition. Furthermore, both data and processes will at least to a degree reflect the technology – software, hardware, overall connectivity etc. – on which the system in question operates.

4.2 Topics

The following subsections briefly present each of the selected topics, give a bird's-eye view, and explain the rationale for their choice. The remainder of the report then examines the issues methodically and in more detail.

4.2.1 Semantics

The meaning and structure of data is at the heart of data interoperability – and, given the plethora of pertinent formats, it is unfortunately a complex problem. Differences on the syntactic level – say, XML messages versus comma-separated files or EDI-type communications – can already affect how much semantics the data itself carries. Formal or informal standards can externally assign meaning to an otherwise meaningless data set (say, to an otherwise arbitrary sequence of fields in the rows of a csv file), or explicate the semantics of XML structures that are already partially self-explanatory to humans.
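The contrast between externally assigned and self-describing semantics can be illustrated with a minimal Python sketch. The field order, element names and sample record below are assumptions made for this illustration only; they are not taken from any existing tourism standard.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical field order agreed between two partners (an assumption for
# this example). Without such an agreement the csv row has no meaning.
HOTEL_CSV_FIELDS = ["name", "city", "country_code", "stars"]


def parse_csv_row(line: str) -> dict:
    """Attach the externally agreed field names to a positional csv row."""
    values = next(csv.reader(io.StringIO(line)))
    return dict(zip(HOTEL_CSV_FIELDS, values))


def parse_xml_record(xml_text: str) -> dict:
    """Read the same record from XML, where element names travel with the data."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


csv_record = parse_csv_row("Hotel Example,Linz,AT,3")
xml_record = parse_xml_record(
    "<hotel><name>Hotel Example</name><city>Linz</city>"
    "<country_code>AT</country_code><stars>3</stars></hotel>"
)
```

Both calls yield the same dictionary, but only the csv variant depends on a field list that is maintained outside the data.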

Taxonomies can help to unambiguously specify possible value sets for the data, ideally combined with specific definitions of the individual options and their relationship to others. Ontologies can then reference and use these value sets in properties of classes that go a long way further towards specifying the exact semantics of data.
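As a rough illustration, a value set with definitions and a class property restricted to that value set might look as follows in Python. The room types, their definitions and the class name are hypothetical and serve only to show the principle.

```python
from dataclasses import dataclass
from enum import Enum


class RoomType(Enum):
    """Hypothetical controlled value set; each member's value is its definition."""
    SINGLE = "room intended for one guest"
    DOUBLE = "room intended for two guests"
    FAMILY = "room intended for two adults and at least one child"


@dataclass(frozen=True)
class Room:
    """Hypothetical class whose room_type property must come from the value set."""
    room_type: RoomType
    area_m2: float

    def __post_init__(self) -> None:
        # Reject free-text values that are not entries of the value set.
        if not isinstance(self.room_type, RoomType):
            raise TypeError("room_type must be a RoomType entry")


room = Room(RoomType.DOUBLE, 18.0)
```

A free-text value such as "double" is rejected, whereas the taxonomy entry carries its definition with it.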

In conjunction with data transformation techniques, agreed strategies for expressing semantics in eTourism applications are crucial to the flexible integration of heterogeneous data structures from a large number of data sources. They are thus also a central requirement for building flexible, cross-organizational process chains.

4.2.2 Data transformation

The co-existence of many data formats already implies the need to transform data during data exchange. This mapping can affect data structures on different levels:

• Ontologies and taxonomies
• Structured data
• Unstructured data

Together with well-defined semantics, data transformation is an essential tool for integrating data sources and building cross-organizational processes.

Structured data: Structured data can be expressed in a number of syntactic formats (XML, csv, EDI etc.), but even within one “syntactic family” the concrete data structures regularly conflict. For XML, standardized technologies such as XSLT are used to describe and execute the mapping between data sets. However, this usually involves loss of information and only works with a limited precision depending on the similarity of the underlying data models.
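The loss of information in such a mapping can be sketched in a few lines. The example below uses Python's standard xml.etree.ElementTree module rather than XSLT, and both the source and the target structure are hypothetical formats invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical source format: a detailed hotel record.
SOURCE = """
<hotel>
  <name>Hotel Example</name>
  <address><street>Hauptplatz 1</street><city>Linz</city></address>
  <rating scheme="national">3</rating>
</hotel>
"""


def to_target(source_xml: str) -> ET.Element:
    """Map the source record to a flatter, hypothetical target format."""
    src = ET.fromstring(source_xml)
    target = ET.Element("accommodation")
    ET.SubElement(target, "title").text = src.findtext("name")
    ET.SubElement(target, "location").text = src.findtext("address/city")
    # The target model has no place for the street or the rating scheme,
    # so this information is lost in the mapping.
    ET.SubElement(target, "stars").text = src.findtext("rating")
    return target


target = to_target(SOURCE)
print(ET.tostring(target, encoding="unicode"))
```

Mapping the result back to the source format could not restore the street or the rating scheme, which is the kind of limited precision described above.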

Unstructured data: Much of the eTourism-related data is only available in unstructured formats such as web sites. To use this data in automatic transactions is difficult at best, as the semantics of individual data sets are quite unclear. Two strategies can help to explicate their meaning:

• Semantic annotation: Key information on a web page is explicitly added to the site as metadata in a machine-readable format (e.g. a serialization format of RDF).
• Automatic information extraction: The unstructured information is automatically structured according to some predefined templates. The information is then available for reuse.
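A very simple form of template-based extraction can be sketched with regular expressions. The template fields, the patterns and the sample text are assumptions made for this illustration; real extraction systems are considerably more elaborate.

```python
import re

# Hypothetical template: the fields to extract and a pattern for each field.
TEMPLATE = {
    "opening_hours": re.compile(
        r"open\s+(?:daily\s+)?from\s+(\d{1,2}:\d{2})\s+to\s+(\d{1,2}:\d{2})",
        re.IGNORECASE,
    ),
    "price_eur": re.compile(r"(\d+(?:\.\d{2})?)\s*EUR"),
}


def extract(text: str) -> dict:
    """Fill the template from free text; fields without a match stay None."""
    result = {}
    for field_name, pattern in TEMPLATE.items():
        match = pattern.search(text)
        result[field_name] = match.groups() if match else None
    return result


page_text = "The museum is open daily from 9:00 to 17:30. Admission: 8.50 EUR."
record = extract(page_text)
```

The resulting record is structured and can be reused, but its quality depends entirely on how well the patterns anticipate the wording of the source text.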

Inter-ontology mapping: Often a number of independent ontologies compete in a given domain, even more so in the case of overlapping domains. Furthermore, different, though related standards such as RDFS, OWL and Topic Maps [ISO/IEC 13250] are in common use to express ontologies syntactically or to describe the constraints applied to them. This multitude of approaches implies the need for reference ontologies such as the Harmonise ontology to exchange semantic information across individual ontologies.
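The role of a reference ontology as a hub can be sketched as follows. The ontology names, terms and reference concepts are purely illustrative and are not taken from the Harmonise ontology or any other real ontology.

```python
from typing import Optional

# Hypothetical mappings from each local ontology to shared reference concepts.
TO_REFERENCE = {
    "ontology_a": {"Hotel": "ref:Accommodation", "Zimmer": "ref:Room"},
    "ontology_b": {"LodgingFacility": "ref:Accommodation", "GuestRoom": "ref:Room"},
}


def translate(term: str, source: str, target: str) -> Optional[str]:
    """Translate a term via the reference ontology; None if no mapping exists."""
    reference_concept = TO_REFERENCE[source].get(term)
    if reference_concept is None:
        return None
    for target_term, concept in TO_REFERENCE[target].items():
        if concept == reference_concept:
            return target_term
    return None


print(translate("Hotel", "ontology_a", "ontology_b"))
```

With this design each ontology needs only one mapping to the reference concepts, instead of a separate mapping to every other ontology.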

An even larger number of formats are currently employed to express taxonomies, many of which can be mapped to the reference system defined in the Terminological Markup Framework [ISO 16642].


4.2.3 Process handling

The eTourism sector has evolved from isolated online presences to online transaction platforms. The WWW has significantly boosted the use of ICT in the tourism industry and empowered travellers and tourists to access more information from a wide variety of data sources. More and more consumers want to arrange their stay on their own and combine different products into a unique bundle instead of buying pre-packaged tours. Dynamic packaging, for example, has become one of the most discussed buzzwords at industry events, but without process interoperability it will always remain a local or otherwise limited phenomenon.

Consumers are increasingly accustomed to making online transactions, which leads to a crowding-out process: business actors have to follow demand to keep or expand their market share. Traditional distribution channels are vanishing, and more flexible and dynamic networks are emerging. A trend towards outsourcing and focussing on core competences can be observed, leading to a more consumer-centric approach and allowing highly individualized and ad-hoc product design. This challenge brings with it the need to orchestrate business processes flexibly and across organizations.

4.2.4 Metasearch

One of the prerequisites for process handling is the ability to identify the relevant players for potential joint processes and to find information across those players. Registries, especially federated registries, will play a leading role in describing potential partners and their services. They will thus help to bring them together.

Metasearch builds on shared or mapped semantics and data transformation to enable searches across different individual search components of heterogeneous instances (platforms, websites, databases) and aggregate the results in a unified list. From a user’s perspective, metasearches thus offer a one-stop entry point to a specific type of information (e.g., hotels or flights).

At present, search components differ in their query syntax, which makes it difficult to scale metasearches and to spontaneously integrate new data sources. For the actual technical realization of metasearches, agreed query strategies and query syntaxes are therefore desirable and are being worked upon.
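The principle can be sketched with two stand-in search components that expect different query syntaxes and return differently structured results. All function names, query formats and records below are hypothetical; in practice each component would be a remote service.

```python
from typing import Dict, List


def search_source_a(query: str) -> List[Dict]:
    """Hypothetical component that expects a query string such as 'city=Linz'."""
    data = [{"hotel": "Hotel Example", "city": "Linz", "price": 90.0}]
    city = query.split("=", 1)[1]
    return [r for r in data if r["city"] == city]


def search_source_b(filters: Dict) -> List[Dict]:
    """Hypothetical component that expects a dictionary of filters."""
    data = [{"title": "Pension Sample", "location": "Linz", "rate_eur": 60.0}]
    return [r for r in data if r["location"] == filters["location"]]


def metasearch(city: str) -> List[Dict]:
    """Translate one query for each source and merge the results into one list."""
    results = []
    for r in search_source_a(f"city={city}"):
        results.append({"name": r["hotel"], "city": r["city"],
                        "price_eur": r["price"], "source": "a"})
    for r in search_source_b({"location": city}):
        results.append({"name": r["title"], "city": r["location"],
                        "price_eur": r["rate_eur"], "source": "b"})
    # Present a unified list, here ordered by price.
    return sorted(results, key=lambda r: r["price_eur"])


hits = metasearch("Linz")
```

Every additional source requires its own query translation and result mapping in this sketch, which is why agreed query syntaxes would make metasearches easier to scale.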

4.2.5 Object identification

Electronic transactions often hinge upon the idea of being able to uniquely identify the objects on which they operate. A flight booking service needs to have a clear idea how the flight booked actually maps to the physical event of a flight operated between a point of departure A and a destination B. The mapping does not have to be one-to-one (in our example a flight may have a number of identifiers through code-sharing), but the object described must be distinct.
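A many-to-one mapping of this kind can be sketched as a simple lookup table. The flight numbers, airports and dates are invented for this illustration and do not refer to real flights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flight:
    """One physical flight; the fields are illustrative."""
    origin: str
    destination: str
    departure: str  # local departure time in ISO 8601 notation


# Hypothetical identifiers: two flight numbers denote one operated flight.
operated = Flight("VIE", "FRA", "2009-07-01T07:05")
FLIGHT_INDEX = {
    ("XX123", "2009-07-01"): operated,
    ("YY4567", "2009-07-01"): operated,
}


def resolve(flight_number: str, date: str) -> Flight:
    """Map an identifier to the single object it denotes."""
    return FLIGHT_INDEX[(flight_number, date)]
```

Both identifiers resolve to the same object, so a booking made under either of them refers to the same physical flight.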

While object identification does work for flights, there are many other types of objects in eTourism that do not have a unique identifier. One of the most important cases in point is accommodation. There is at present no universally accepted scheme to identify, e.g., a given hotel that should be booked.

4.3 Cross-cutting concerns / Prerequisites

By their very nature, cross-cutting concerns permeate the requirements for and implementation of all topics. Here we outline the major characteristics of these concerns, which will be referenced and (if needed) expanded upon in the relevant clauses of the report.

4.3.1 Legal aspects

eTourism transactions do not happen in a void. They regularly transcend national and regional boundaries and are governed by the legal system(s) of the country or countries concerned. Such transactions are almost always affected by contract law, such as the laws that govern the contract between the end user and the tour operator or service provider(s), between tour operators and their destinations, or the obligations of travel agencies towards their customers. Related to this are the laws regulating the redress that one of the contractual partners can seek in the case of a perceived breach of obligations.

Laws, however, influence many other areas in eTourism transactions. The following list is only indicative and certainly not a complete overview of pertinent legislation:

• Reporting obligations on security and crime prevention, especially in the case of air transport.
• Classification schemes, e.g. in the form of legally defined eTourism terms (in some countries).
• Customer protection laws setting, amongst others, minimum standards for the data provided to end users.
• Reporting obligations on statistics.
• Health and anti-discrimination regulations that can impact eTourism data and processes.
• Media publication regulations when sharing or reusing media on the Internet in general.

Some countries and regions such as Oberösterreich even have dedicated laws on tourism (see http://www.oberoesterreich-tourismus.at/alias/lto/recht/410624/tourismusrecht.html).

These laws are only partially harmonized across Europe, [Directive 90/314/EEC] as a directive setting minimum pan-European standards for customer protections for packaged tours being more an exception than the rule. Furthermore, the legal systems of countries across Europe do not necessarily cover the same areas. For example, hotel classification is mandated by law in some countries such as Italy and Greece and does not even exist in others such as Finland.

Dynamic packaging (example): The example of dynamic packaging might illustrate some aspects of the impact of the legal system on tourism. For pre-packaged tours, the legal situation is quite clear from an end user’s point of view. Such tours are always regulated by the national laws in question. The tour operator is from the customer’s perspective the only contractual partner [Freyer, 2006, p 234] [Directive 90/314/EEC] and responsible for providing all the services that were promised. It is also solely responsible for any possible redress that may result from unsatisfactory services.

The situation is much muddier for extras such as car rental at the destination, for which a travel agency only acts as an intermediary. Dynamic packaging poses even bigger problems in this respect. An intermediary – often a specialized travel agent – combines pre-assembled packages based on user preferences. The user does not administer the different items in the package himself, but gets offers for packages which are dynamically assembled based on his preferences. However, the legal and contractual consequences of such dynamic bundles are not yet clear. For an end user, such a bundle of sub-packages can also imply a set of separate contracts which do not by themselves necessarily fall under the definition of “package” of [Directive 90/314/EEC]. In consequence, the contractual situations and the legal mechanisms for redress can be considerably more complex for dynamic packaging. An unnamed provider of software components for eTourism transactions named this as the single biggest obstacle to the uptake of dynamic packaging.

4.3.2 Multiculturalism

Related to legal aspects are the multicultural facets of many eTourism transactions which span cultures and frequently involve both very small and very large players, thus also mixing organizational cultures. Culture here is a much wider concept than high culture and covers “the set of distinctive spiritual, material, intellectual and emotional features of society or a social group, and [...] encompasses, in addition to art and literature, lifestyles, ways of living together, value systems, traditions and beliefs” [UNESCO, 2002]. Europe in particular is characterized by multiculturalism right down to its official motto, unity in diversity.

Many cultural preconditions have influenced local description systems such as rating systems for accommodation or classification of beaches, which in some countries are nationally or even regionally regulated. Others, such as usual opening hours of sites or food offerings, usually follow local customs without being subject to laws.

Multilingualism: Languages are an integral and often defining part of cultures, and as such multiculturalism includes multilingualism, the coexistence of many languages. Until around the turn of the millennium the treatment of multilingual data in computer systems posed major problems. However, the widespread adoption of the Universal Character Set (UCS) [ISO/IEC 10646], also known as Unicode, and its companion standards has changed the game. The UCS is supported in virtually all current operating systems and many application programs including all major browsers and email clients. XML is squarely based on the UCS. Thus the internal representation, the exchange and the display of multilingual data are now quite unproblematic.

That said, some of the Global Distribution Systems (GDSs) that are at the core of many eTourism transactions stem from the 1950s and 1960s, and even the youngest of the “big four” GDSs, Amadeus, was written in the 1970s and 1980s. In this they long predate the UCS and have at best sketchy support for multilingual data. Many to this day operate on subsets of ASCII. This obviously can create considerable issues, notably for the handling of personal names and the names of organizations. It is outside the scope of this report to elucidate these issues in detail, though such an overview would be highly beneficial.
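The effect on personal names can be illustrated with a short Python sketch that reduces names to an upper-case ASCII subset. The transliteration rule is an assumption made for this illustration and does not reproduce the rules of any actual GDS.

```python
import unicodedata


def to_ascii_subset(name: str) -> str:
    """Reduce a name to upper-case ASCII (illustrative rule, not a GDS rule)."""
    # Split accented letters into base letter plus combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop everything that cannot be represented in ASCII.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_only.upper()


print(to_ascii_subset("Müller"))  # MULLER
print(to_ascii_subset("Muller"))  # MULLER
```

Two distinct names collapse into the same string, so the original spelling cannot be recovered from the stored value.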

Taxonomies and terminology are another important area in which data is necessarily language-dependent. The exact definitions of categories such as “double room” (with or without children), “luxury hotel” etc. will reflect the understanding in a given language and culture.

Accommodation ratings (example): Accommodation ratings are a concrete example of classification systems that reflect legal, cultural, linguistic and (in some countries) personal preferences. As we have seen, some countries require hotels to be classified whereas others do not even have a national classification system. And where such systems do exist, the quality criteria differ widely from country to country, as an overview such as http://www.hotelstars.org/ shows. For example, a three-star hotel in Germany (http://www.hotelsterne.de/uk/system_kriterien.php) is guaranteed to have rooms of 14 m² or more for singles and 18 m² or more for doubles, bilingual employees and a reception that is open 12 hours a day, to single out only a few of the criteria. The rooms in a three-star hotel in Poland (http://hotelarze.pl/en/regulations/), on the other hand, need only measure 10 m² and 14 m² respectively; the hotel must offer at least 12 hours of room service, but its staff are not required to speak foreign languages. Many other criteria are not even comparable as the overall schemes are quite different.

In countries such as the USA, the semi-official AAA classification is complemented by many classification systems that are specific to individual travel websites such as Expedia or Travelocity and at times even factor customer feedback into the figures. Depending on the (often non-transparent) weighting of criteria, these sites arrive at quite different ratings, which in turn may deviate significantly from customer ratings [Grossman, 2004]. With the increasing dominance of international sites, these additional ratings are going to start competing with the official or semi-official national European rating systems.

In view of this multitude of taxonomies, which may or may not in turn coincide with the customer’s own cultural and personal preferences, ratings will have to be based on specific properties of accommodation, and, for that matter, general service, rather than on general classifications alone. Searches for “hotels with WiFi, restaurant and rooms over 20 m²” are likely to produce more acceptable results for users of many cultural backgrounds than searches on “3-star” alone.
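The difference between the two kinds of search can be sketched as follows. The hotel records, property names and values are invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Hotel:
    """Illustrative record; names and values are invented for the example."""
    name: str
    country: str
    stars: Optional[int]  # None where no national star rating exists
    min_room_area_m2: float
    amenities: Set[str] = field(default_factory=set)


HOTELS = [
    Hotel("Hotel A", "DE", 3, 18.0, {"wifi"}),
    Hotel("Hotel B", "PL", 3, 14.0, {"wifi", "restaurant"}),
    Hotel("Hotel C", "FI", None, 22.0, {"wifi", "restaurant"}),
]


def search(hotels: List[Hotel], required: Set[str], min_area_m2: float) -> List[Hotel]:
    """Select hotels by concrete properties instead of a star category."""
    return [
        h for h in hotels
        if required <= h.amenities and h.min_room_area_m2 > min_area_m2
    ]


by_properties = search(HOTELS, {"wifi", "restaurant"}, 20.0)
by_stars = [h for h in HOTELS if h.stars == 3]
```

In this sample the property-based search finds the unrated hotel that meets all criteria, while the search on three stars returns two hotels that do not.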

4.3.3 Business models

Each player in the tourism industry operates on an implicit or explicit business model. The value proposition can be the traditional offer of a service, e.g., accommodation; it can be the “convenience” and consulting proposition of a travel agent, or the “integrated packaging” approach of a tour operator, to name only a few. Much of the thrust of customer-driven eTourism transactions stems, in fact, from the desire to disintermediate the industry or, at least, to offer a new type of intermediary that operates automatically and can thus compete primarily on the price front and that is, in part, closely related to today’s GDSs. The GDSs themselves operate on two related, but distinct business models, namely of being a service company for major service providers such as airlines, and as an integration platform for intermediaries.

All eTourism activities must be seen in the context of the relevant business models. They dictate the initial willingness to interchange data and to engage in cross-organizational processes. In a real sense this willingness is a premise for this report.

4.3.4 Technology

The advent of the World Wide Web marks a watershed also for the tourism industry. As we have seen, GDSs have been operational since the early 1960s, but they depended on highly proprietary distribution networks to allow travel agents to interact with them. The advent of videotext systems such as BTX in Germany and Minitel in France and similar technologies in the 1980s somewhat opened and standardized these channels, but by and large the communication channels remained accessible only to professional intermediaries.

The success of the WWW has largely standardized the communication channels between providers on standard internet protocols; not necessarily http, though, as many larger data sets are still transferred using ftp or related protocols. The underlying technology has in many cases changed much less, with today’s GDSs largely operating on the same transactional stacks as before, but some – though by no means all – of their details have been abstracted away through the common protocols.

This standardization on common network protocols has allowed for the rise of collaboration standards such as SOAP-based Web Services, XML-based data formats, semantic standards and, last but not least, the http standard itself, which has come to the fore again with today’s emphasis on RESTful web services. This report concentrates on the interoperability layer between implementations.


5 Case study

Mechanisms and solutions for electronic data exchange in the tourism industry were first developed a long time ago by airline companies so that they could exchange data about flights and bookings. Different standards emerged from those initial operational exchanges, taking into account the limitations of the means of communication of that time.

Over the years, the need to access inventory, prices, booking files, customer data and sales or descriptive information has boomed, first through the development of the GDSs (Sabre in 1960, Galileo in 1971), main CRSs (Pegasus, Wizcom, etc.) and more recently with the web, used both for B2B and B2C applications.

The thematic circle introduced earlier (4.1) will be illustrated through the following case study. The case study is based on a consumer (an end user or a travel-related professional) who wants to book a trip or gather travel-related information using information and communication technologies.

Figure 5-1

The case study is first detailed in terms of different trip phases and corresponding information needs and processes to be used by the consumer:

• Before the trip, to end up with a booked trip;
• Before the trip, to increase his knowledge around the trip, update the trip itself, etc.;
• During the trip, to amend his trip or input comments, media, etc.;
• After the trip, to testify, complain, etc.

Platforms, technologies, types of information and data sources are reviewed within the case study. Some drawbacks and limitations, gaps and future needs will also be identified and associated with the elements of our thematic circle, which will then be detailed later in the document.


5.1 The processes

5.1.1 The actors

We consider the case study to be as general as possible and include any type of tourism actors and business processes. In the context of the case study, we differentiate the following types of tourism actors:

• end consumer (traveller, customer booking for somebody else, etc.),
• travel related professional (incoming agent, tour operator agent, travel consultant, etc.).

Figure 5-2

5.1.2 Consumer process

Buying a trip with the help of the web can be seen as an overall process in four steps:

• Discovering:
  o select possible destinations and types of trips based on personal or family interests (a particular activity or hobby, a destination, etc.);
  o select according to a season (winter sport, sun in winter, etc.);
  o investigate prices and opportunities, accommodations, services, events, etc.;
  o explore recommendations and ratings from other travellers;
  o etc.
• Shopping, to match reality with expectations:
  o compare prices;
  o compare content of offers (similar offers, different types of trips, etc.);
  o investigate testimonies.
• Constituting the trip itself by:
  o validating price and availability for a trip from a single vendor, or
  o amalgamating components from different vendors – such as hotel vendor, pre-packaged tour vendor, airline company, etc. (in a single booking or in multiple bookings);
  o requesting bids or quotes or alerts from different vendors.
• Finalizing the buying process (confirmed or option booking(s)):
  o finally buy from a single vendor, or
  o buy the amalgamated components (stored in a single booking or in multiple bookings);
  o add links to reference data (to keep track of weather, health or country data, activities, testimony, etc.);
  o pay (deposit or total).


Once a booking is finalized, this is not the end of the process. Certain consumers would continue browsing the web to

• complement: Search for additional information, testimonies, activities to perform, exchange with people having travelled in the same club or region;
• bargain: Find better opportunities and counter proposals or complements;
• manage: Simply update their booking(s) to take into account new information (a change of plan, more people joining, new activities to cram into the agenda, a special request for a meal, printing the e-tickets or itineraries, consulting with specialists, paying the due amounts, etc.).

Finally, during or after the trip comes the part that is now booming with the web 2.0 sites: The consumer could

• testify: He will add his own piece of information on the web, using forums, testimony sites, polls;
• publish new generated content, such as media, text;
• enrich his profile(s) on the different sites in order to keep in touch with opportunities related to his interests;
• follow his subsequent trips in case he actually prepared more than one trip or he acquired components that would be valid on several trips;
• share common interests in order to organize group events;
• possibly file and follow up a complaint.

This is illustrated by the schema in figure 5-3.


Figure 5-3

5.1.3 Travel-related professional process

Not all end consumers perform the whole process online; some require assistance for part or all of the life cycle of a booking. In that case, part of the above-mentioned activities would apply in a B2B framework, with more or less the same features. Additionally, travel professionals would also use specific expert processes not necessarily available to end consumers, such as

• air ticketing in case of negotiated fares;
• building complex itineraries including items that cannot be found or bought online;
• finding availabilities or better prices where automated systems would not;
• bringing added value services or expertise that would correspond to the differentiation of the distributor (specialized destination or activity, luxury trips, etc.).

Other professional processes also revolve around the major task of publishing data for professional and end consumer use, such as

• publishing fares,
• providing information on products and destinations,
• referencing other sources of information,
• selecting and ranking data (vendors, destinations, etc.),
• etc.


Those additional processes rely on the same systems, platforms and communication means as those available to end consumers (but with advanced features), rely on specific systems not available to end consumers, or end up being manual.

5.2 The information and communication technologies

The present case study supposes that the consumer uses information and communication technologies (over the web or a private network, a public web site, a restricted B2B site, a dedicated rich application, etc.) to consult travel information. The case study considers that the front system used by the end consumer takes advantage of multilevel sources, possibly even of a multilevel dynamic network of travel related services. Though sources may publish heterogeneous structured and non-structured data, the front system would still provide homogeneous access to its end user for all the data they publish. The distributor should have the responsibility and choice of the final formatting and proposed processes.

This is of course only possible depending on the flexibility of the exchanges, on the formats made available by the sources and intermediates, on the extensive use of semantic web and other mechanisms allowing automated exchanges and recognition of meaning and data. This will be detailed in the present document.

The user may also consult different sites in parallel, therefore initiating different processes. This behaviour is considered outside the present case study.

5.2.1 Multiple levels of data sources

Figure 5-4

The owner of the communication and information technology would usually own one or several data sources and directly make use of them. That would be the case, for instance, for a hotel group and its hotel data (editorial text, prices, availabilities, comments, etc.).


The front system may also connect to other external sources to aggregate additional information. A hotel chain may not own the inventory of each hotel and could interrogate the PMS of the different hotels or the CRS of hotel groups to validate the availability.

Each of those additional external sources could therefore either own the data or itself aggregate content from other sources, therefore creating a chain of sources involved in a single request from the consumer. That would typically be the case for an online site like Opodo dynamically requesting airline availability and fares from a GDS (Amadeus in our example), itself launching requests to different airlines in relation with the expected city pair.

The added value of using layers of sources would reside in their capacity to

• concentrate coherent data from different sources (such is the case of GDSs for airlines, comparators);
• enrich data from a source by either directly adding data or by concatenating data from other external sources (like web sites proposing different types of trips).

Online agencies such as Expedia or Opodo also have back office systems to enter and maintain editorial data, price lists and stocks. That would be their own data source. They typically do not own destination, weather, policy or health related data but use external sources such as Lonely Planet or government web sites. Those distributor in-house systems also usually connect to GDSs (Global Distribution Systems such as Amadeus or Galileo) to request airline fares and availability. We would be in the situation where an intermediate data source browses other external data sources for information.

This need for a distributed architecture composed of distinct systems around the world and owned by different companies with various strategies and technologies leads to a number of constraints and requirements identified as cross-cutting aspects in the previous introduction:

• Technical aspects come first to mind, with the need to ensure compatibility of the different systems, increase the reliability of the individual elements, measure the impact on architectures and scale accordingly. Performance of the different systems and of the overall chain is key and leads to additional complexity (such as caching, uniqueness of data, etc.).
• Business models must also be taken into account because making money is central for the complete system to work smoothly. There must therefore be the capacity to:
  o use other systems against remuneration (fixed price, price per transaction, percentage of a booking, etc.);
  o add mark-ups along the chain and still get a competitive price;
  o access net prices directly on intermediate levels in the chain;
  o etc.
• Legal aspects are equally important, with the necessity to ensure that
  o the information and products found and possibly purchased on the different systems can legally be purchased or used;
  o the distributor and the end user will have the capacity to track individual providers so that they fulfil their obligations (provided the same notion exists at the provider’s place), in case of any issue.
• Even multiculturalism is present when speaking about systems composing the complete infrastructure:
  o Providing services (and support) around the clock, without stopping servers during the night, is unusual in certain countries or for small companies.
  o Documentation to consume the service may not be written in a widely used language such as English, or be available in multiple translations.
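
The business-model requirement to add mark-ups along the chain while keeping a competitive price can be made concrete with a small calculation. The following Python sketch is an illustration only: the function name, the net price and the mark-up rates are hypothetical, and it assumes that each level of the chain applies a simple percentage mark-up to the price it receives.

```python
from decimal import Decimal

def price_along_chain(net_price, markups):
    """Return the price seen at each level of the chain.

    net_price: price charged by the original provider (hypothetical value).
    markups:   percentage mark-ups, ordered from the provider towards
               the end consumer (assumption: simple percentage mark-ups).
    """
    prices = [Decimal(net_price)]
    for rate in markups:
        factor = Decimal(1) + Decimal(rate) / Decimal(100)
        prices.append(prices[-1] * factor)
    # Round every level to two decimals for display.
    return [p.quantize(Decimal("0.01")) for p in prices]

# A provider net price of 100.00, then two intermediaries adding 5 % and 10 %.
chain_prices = price_along_chain("100.00", ["5", "10"])
```

Under these assumptions the consumer-facing price is the last element of `chain_prices`, and the gap between the first and last elements shows how much room direct access to net prices at an intermediate level would leave.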

The main topics involved to allow process and data interoperability also come into play in case of multilevel data sources:

• Object identification is mandatory to avoid cumbersome and time-consuming transcoding, and to allow data enrichment along the chain and ultimately comparison and cleaning of results.
• Semantics provide the structure behind the data in order to have coherence between the layers and to merge the information after some data transformation has been performed.
• Data transformation is central to the implementation of multilevel data sources because data seldom share the same formats even when based on the same standards.
• Metasearch is the key to searching for services dynamically and having loosely bound systems.
• Without efficient process handling, the efficiency of multilevel data sources will remain minimal and would only correspond to juxtaposing data from different services without true interactions.

With all these elements in place in the multilevel data sources scenario of our case study, we could have complex processes in place like dynamic packaging for instance with

• data interoperability, sharing and grouping objects with different identifiers and semantic definitions, and
• process interoperability:
  o compatibility of the different exchanges for each sub process;
  o capacity to have evolution only on certain components of the system;
  o etc.
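
Grouping objects that carry different identifiers in different sources can be sketched as follows. This is a minimal Python illustration, not a description of any existing system: the source names, the local identifiers, the shared identifiers and the cross-reference table itself are all hypothetical, and the sketch assumes that such a table has already been built.

```python
from collections import defaultdict

# Hypothetical cross-reference table:
# (source, local identifier) -> shared identifier of the real-world object.
CROSS_REFERENCE = {
    ("source_a", "H-1001"): "hotel:0001",
    ("source_b", "77812"): "hotel:0001",
    ("source_b", "77999"): "hotel:0002",
}

def group_by_shared_id(offers):
    """Group offers that refer to the same real-world object.

    Offers whose (source, local_id) pair is unknown are collected under
    the key None so they can be reviewed rather than silently merged.
    """
    groups = defaultdict(list)
    for offer in offers:
        shared_id = CROSS_REFERENCE.get((offer["source"], offer["local_id"]))
        groups[shared_id].append(offer)
    return dict(groups)

offers = [
    {"source": "source_a", "local_id": "H-1001", "price": 120},
    {"source": "source_b", "local_id": "77812", "price": 115},
    {"source": "source_b", "local_id": "77999", "price": 90},
    {"source": "source_c", "local_id": "X1", "price": 80},
]
grouped = group_by_shared_id(offers)

# Once grouped, offers for the same hotel can be compared directly.
cheapest_hotel_1 = min(grouped["hotel:0001"], key=lambda o: o["price"])
```

Keeping unidentified offers apart, rather than guessing, reflects the point made above: without object identification, comparison and cleaning of results would require transcoding on a case-by-case basis.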

5.2.2 Type of information

The types of information that front platforms would provide in our case study are:

• product-related data (editorial data, testimony, media, prices, availability, technical data detailing the travel (flight numbers, airlines, type of rooms, etc.), marketing qualification);
• destination-related data (geography, health, climate, history, activities, religions, etc.);
• customer-related data (in case the user is a known customer or consultant – identity, preferences, past trips, relatives and family members, additional qualification, etc.);
• etc.

According to the type of information, different types of issues and needs arise. We have again grouped them according to our thematic circle, with the topics of the thematic circle first and the prerequisites afterwards:

• Object identification to pinpoint unique identical elements:
  o Same object, but coming from different sources (e.g. the same hotel from different web sites);
  o Same object, but within aggregated content (e.g. a hotel within a packaged tour being the same as a hotel sold alone).
• Semantics:
  o Identify meaning of information (e.g. identify climate information in country-related content);
  o Provide explicit objective structured definitions and rules and not just transcoding features (e.g. to explain for each provider what a double room is – like D = exactly 2 adults whereas DBL may contain 2 adults but would also accept a child in an extra bed).
• Data transformation:
  o Extract media from text;
  o Extract text from HTML;
  o Extract data with certain meaning from a complete document;
  o Map different ontologies to be able to share information (e.g. to understand that a D Room for a provider is a DBL room for another and a double for a third);
  o etc.
• Process handling:
  o Network capacity to ensure that a complete complex process would be able to run on separate systems with true system independence;
  o According to the presence or absence of certain data, launch certain processes, stop certain processes, direct to alternative sources;
  o Cost effectiveness;
  o Reliability, stability and performance to ensure that the user will not suffer from failure or inefficiency of one or several process chain elements;
  o Launch alerts according to certain contents or lack of certain contents;
  o Ensure security over the complete process chain;
  o Updates of data sources (through a manual request, remotely, with automatic synchronization, etc.);
  o etc.
• Metasearch:
  o Perform queries without being hindered by specific query syntax;
  o Browse the internet for certain types of information without having to set up the searches manually.
• Multiculturalism:
  o Multilingualism of content (information in Italian provided by the Italian government to be used in the USA);
  o Access the right media for the audience (pictures with or without people, certain hair colours, etc., would be more or less appropriate in certain countries for instance);
  o Dynamic translations.
• Technology:
  o Share communication protocols or at least use interoperable communication protocols;
  o Data accessibility (XML-based format to publish the data accessible via web services versus csv file to be sent by mail);
  o Large updates of data sources (large amounts of data for each source, multiplication of sources, etc.).
• Legal aspects:
  o Reliability of the data itself;
  o Estimate the quality of the data;
  o Determine the legal constraints associated with a piece of information;
  o Conditions to use and distribute the data;
  o Conditions to store data not owned.
• Business models:
  o Business model in relation with the use of the data.
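
The difference between mere transcoding and an explicit, structured definition can be illustrated with the double-room example from the list above. The following Python sketch is an assumption-laden illustration: the provider names, the class `RoomDefinition` and its fields are hypothetical, and the rule for DBL is read here as "exactly 2 adults, optionally 1 child in an extra bed".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RoomDefinition:
    """Explicit occupancy rule behind a provider-specific room code."""
    min_adults: int
    max_adults: int
    max_children_in_extra_bed: int

    def accepts(self, adults, children=0):
        return (self.min_adults <= adults <= self.max_adults
                and 0 <= children <= self.max_children_in_extra_bed)

# Hypothetical providers; the rules follow the D / DBL example in the text.
ROOM_DEFINITIONS = {
    ("provider_1", "D"): RoomDefinition(2, 2, 0),    # exactly 2 adults
    ("provider_2", "DBL"): RoomDefinition(2, 2, 1),  # 2 adults, child allowed
}

def matching_rooms(adults, children=0):
    """Return the (provider, code) pairs whose definition fits the party."""
    return sorted(
        key for key, definition in ROOM_DEFINITIONS.items()
        if definition.accepts(adults, children)
    )

rooms_for_couple = matching_rooms(2)
rooms_for_family = matching_rooms(2, children=1)
```

A plain transcoding table would declare D and DBL equivalent; with the explicit rules, the two codes match for a couple but only DBL matches once a child is added.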

5.2.3 Type of data sources

Typical data sources in the travel industry are:

• GDSs (Galileo/Worldspan, Amadeus and Sabre – allowing access to airline, car rental, hotel, ferry, insurance, leisure, etc. systems – via XML, Edifact or flat file transfers);
• Specialized online platforms providing structured data for a certain type of activity (car rental companies, hotel concentrators, tour operators, destination management systems, etc., usually via XML-based web services or XHTML data);
• Web sites providing HTML-based data (structured and non-structured).

This is illustrated in figure 5-5.


Figure 5-5

As introduced in the previous clauses, each data source may in turn connect to multiple data sources of the same type or of other types.

For instance, a specialized pre-packaged provider could:

• Connect to Galileo for scheduled air:
  o Galileo would directly connect to Delta airlines, but
  o Galileo would connect to Amadeus to get Air France flights;
• Connect to Pegasus for hotels:
  o Pegasus would hold certain inventories, but
  o Pegasus could access Gulliver or Transhotels for others;
• Connect to TripAdvisor to get testimonies both on airlines and hotels;
• etc.
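
The chain of sources in this example can be modelled as a simple tree. The Python sketch below is an illustration only: the dictionary layout and the function names are hypothetical, the connections merely restate the example above, and real systems would of course exchange requests rather than names.

```python
# Each source either holds data itself or delegates to the sources it is
# connected to (connections taken from the example in the text).
SOURCES = {
    "provider": ["Galileo", "Pegasus", "TripAdvisor"],
    "Galileo": ["Delta", "Amadeus"],
    "Amadeus": ["Air France"],
    "Pegasus": ["Gulliver", "Transhotels"],
}

def sources_reached(name, connections=SOURCES):
    """List every source involved in one request to `name`, depth-first."""
    reached = []
    for child in connections.get(name, []):
        reached.append(child)
        reached.extend(sources_reached(child, connections))
    return reached

def chain_depth(name, connections=SOURCES):
    """Number of levels in the longest chain starting at `name`."""
    children = connections.get(name, [])
    return 1 + max((chain_depth(c, connections) for c in children), default=0)

all_sources = sources_reached("provider")
longest_chain = chain_depth("provider")
```

Walking the tree this way shows how many independent systems a single consumer request may touch, which is where the reliability and performance constraints discussed earlier come from.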


6 Semantics

6.1 Standards

6.1.1 Needs and requirements

Introduction

The first word that may come to mind when talking about data and information interoperability and exchange is “standards”. Standards have traditionally been widely used in different industries. The general goal of standards and standardization is to allow compatibility, interoperability, safety, repeatability, quality, etc. The process of developing and agreeing upon a general standard is known as standardization.

Generally speaking a standard is an established norm or requirement that needs to be followed in order to allow components (of different nature and origin) to fit and work together. It is usually a formal document that establishes uniform engineering or technical criteria, methods, processes and practices designed to be consistently used as a rule, guideline, or definition.

The ISO/IEC Guide 2 defines a standard as “a document established by consensus and approved by a recognized body that provides for common and repeated use, rules, guidelines or characteristics for activities or their results, aimed at the achievement of the optimum degree of order in a given context”. Standards help to make life simpler and to increase the reliability and the effectiveness of many goods, services and processes. They are intended to be a summary of good and best practices rather than general practice. Standards are created by bringing together the experience and expertise of all interested parties such as the producers, sellers, buyers, users and regulators of a particular material, product, process or service. Standards are designed for voluntary use and do not impose any regulations. However, laws and regulations may refer to certain standards and make compliance with them compulsory. For example, the physical characteristics and format of credit cards are set out in International Standard ISO/IEC 7810:1996. Adhering to this standard means that the cards can be used worldwide.

Within the computer science domain and Information and Communication Technologies, standards have also been widely used and are becoming increasingly important. There are a vast number of both software and hardware developers and manufacturers worldwide that produce different items. These items need to follow particular standards in order to work together in a satisfactory manner. As the amount of information contained on the Internet increases every second, a unified representation for web data and resources is needed in today’s large-scale Internet data management systems. This unification of standards will allow machines to meaningfully process the available information and to (successfully) exchange and integrate data coming from distributed databases and information management systems. This has been occurring, e.g. in the context of eLearning with the development of the SCORM (http://www.adl.net/) and AICC (http://www.aicc.org/) standards, or in the context of telemedicine applications with the development of standard data transport protocols such as HL7 and ISO/IEEE 11073, among others.

This is also mandatory in the tourism sector as it is changing from a labour-intensive industry into a knowledge and information-intensive industry. In the tourism domain the usage of information systems to support market processes has not reached the goal of a single electronic tourism market. The complex structure in traditional tourism markets, characterized by a lot of different distribution channels and long value chains, was transformed one-to-one to its electronic counterpart. The result was a multitude of different electronic tourism markets. The most important obstacle to a single tourism market is the missing commitment of all market participants on the semantics of information to be exchanged as well as on the method for the exchange.

There have already been some efforts invested in this direction (see 6.1.2) in order to enable distributed data exchange and integration. Interoperability between databases and information sources needs to be provided on both a technical and informational (semantic) level. The social value of the Web is that it enables human communication, commerce, and opportunities to share knowledge, information and experiences. One of W3C’s (World Wide Web Consortium, http://www.w3c.org/) primary goals is to make these benefits available to all people, whatever their hardware, software, network infrastructure, native language, culture, geographical location, or physical or mental ability might be.

Needs

Benefits of use of standards

Standards have proved to be a powerful tool for organizations of all sizes, supporting innovation, increasing productivity and efficiency in their business processes. Effective standardization promotes competition and enhances profitability, enabling a business to take a leading role in shaping the industry itself. Generally speaking, standards allow a company to:

• attract and assure customers;
• demonstrate market leadership;
• create competitive advantage;
• develop and maintain best practice.

Standards within business

In modern business, effective communication along the supply chain and with legislative bodies, clients and customers is imperative. Applying standards within the everyday operation of a company provides the means to measure various variables and thus to manage their evolution, which brings benefits within the infrastructure of the company itself. Business costs and risks can be minimized, internal processes streamlined and communication improved. Standardization promotes interoperability, providing a competitive edge necessary for the effective worldwide trading of products and services.


Requirements

Within the tourism industry standards may help companies to be more competitive in terms of being present on the web by complying with information and communication standards and recommendations. In order to achieve exchange and integration of information through different information systems, information formats and transfer protocols must be compatible and ought to allow any hardware and software used to access the information to work together.

Furthermore, information integration and exchange are required to provide trade and commerce operations capacity on web sites, so that a local company can be globally present through the web and increase its business opportunities.

In order to do this, there must be some kind of standard or sufficiently agreed communication protocol between the information systems that are interchanging information. These may be standards about data repositories (how to structure data in a repository or how that data ought to be named) or standards about intermediation systems, such as the HarmoNET ontology.

Regarding Web and information standards, one of the most active bodies is the W3C. W3C designs and promotes interoperable open (non-proprietary) formats and protocols to avoid the market fragmentation of the past. A W3C Recommendation is the equivalent of a web standard, indicating that this W3C-developed specification is stable, contributes to web interoperability, and has been reviewed by the W3C membership, which favours its adoption by the industry.

Reaching a general information standard that enables information integration and exchange within the tourism sector is a relatively complex activity. Thus, recommendations by official and recognized bodies (such as W3C) in this direction should be followed to allow tourism companies’ information management systems to effectively process all information and to interoperate.

6.1.2 State of the art

As mentioned before, a standard is “a document established by consensus and approved by a recognized body that provides for common and repeated use, rules, guidelines or characteristics for activities or their results, aimed at the achievement of the optimum degree of order in a given context”.

ETSI is the European Telecommunications Standards Institute. It is an independent, non-profit organization in the telecommunications industry in Europe with world-wide projection. ETSI is officially responsible for standardization of Information and Communication Technologies (ICT) within Europe.

ETSI standards could be described in general as being “definitions and specifications for products and processes requiring repeated use”. They are a set of rules for ensuring that a process is always carried out in the same way with a certain degree of quality, or that a product is always manufactured by following the same tasks in the same order, while complying with a degree of quality assumed to be generally satisfactory.

A more complete definition of a “standard” from an ETSI perspective would be: “A technical specification approved by a recognized standardization body for repeated or continuous application, with which compliance is not compulsory and which is one of the following:

• International Standard: a standard adopted by an international standardization organization
• European Standard: a standard adopted by a European standardization body
• National Standard: a standard adopted by a national standardization body and made available to the public.”

(Source: Directive 98/34/EC definitions.)

ETSI’s standards-making priorities include:

• fully specified scoping;
• consistent use of specific terms;
• accurate referencing;
• contextualizing of abbreviations;
• accuracy and completeness of technical content;
• clear and unambiguous requirements;
• legibility and comprehension.

Two major objectives of ICT standardization are interconnection and interoperability. ETSI’s uncompromising approach facilitates these by ensuring content is easily interpretable, understandable and unambiguous. Only this level of attention to detail can produce the truly high quality standards that Industry, Operators and Users now demand to grow their increasingly global markets.

Standards can be found throughout daily life, but why would we need to use standards? Rather than asking why we would need standards, we might usefully ask ourselves what the world would be like without standards. Products would not work as expected. They would be of inferior quality and incompatible with other products or equipment; in fact, they would not even connect with them. In extreme cases, non-standardized products could potentially be dangerous.

From a user’s standpoint, standards are extremely important in the computer industry because they allow the combination of products from different manufacturers to create a customized system. Without standards, only hardware and software from the same company could be used together. In addition, standard user interfaces can make it much easier to learn how to use new applications.

Most official computer standards are set by one of the following organizations:

• ANSI (American National Standards Institute);
• ITU (International Telecommunication Union);
• IEEE (Institute of Electrical and Electronics Engineers);
• ISO (International Organization for Standardization);
• VESA (Video Electronics Standards Association).


Types of standards

The primary types of technical standards are:

• A standard specification is an explicit set of requirements for an item, material, component, system or service. It is often used to formalize the technical aspects of a procurement agreement or contract. For example, there may be a specification for a turbine blade for a jet engine which defines the exact material and performance requirements, shape, etc. This guarantees that components produced by different manufacturers may be used and assembled in the same product and perform as expected.
• A standard test method describes a definitive procedure which produces a test result comparable with a reference in order to validate a certain product. It may involve making a careful personal observation or conducting a highly technical measurement. For example, a physical property of a material is often affected by the precise method of testing: any reference to the property should therefore reference the test method used.
• A standard procedure (or standard practice) gives a set of instructions for performing operations or functions, usually tasks to be carried out in a particular order. For example, the quality assurance system at companies ensures that all procedures within one company have been identified, defined and are always carried out the same way.
• A standard guide is general information or options which do not require a specific course of action.
• A standard definition is formally established terminology that is sufficiently agreed upon within the expert community.

List of travel industry standards, companies and organizations (examples)

The following is a classification and description of standards within, or with direct relevance for, the tourism domain. Standards and initiatives have been assigned to five different categories: Tourism initiatives and vocabularies; eBusiness vocabularies; eBusiness frameworks; Business semantics; Modelling languages:

• Tourism initiatives and vocabularies:
  o ACRISS: The Association of Car Rental Industry Systems Standards has devised a car coding system, the ‘ACRISS Code’. This identifies the features of a car so that an agent can be sure that the client gets the same standard of car wherever they rent in Europe from an ACRISS Member.
  o ANSI ASC X12I TG08, American National Standards Institute: The ANSI X12 standards were the first branch-independent standards for EDI, but their focus is only on the North American market. Today ANSI X12 has specified more than 275 document types, so-called transaction sets, to be used in B2B. Similarly to UN/EDIFACT, the ANSI X12 syntax is based on hierarchical structuring and implicit data element identification. However, X12 has its own set of notations and rules on representations. X12 does not make use of composite data elements. ANSI ASC X12 is divided into branch-specific subcommittees. The subcommittee X12I is responsible for the area of transportation. Each subcommittee consists of multiple task groups. X12I TG08 is the task group for travel, tourism and leisure.
  o CEN/TC 329, European Committee for Standardization / Technical Committee Tourism Services: The Technical Committee 329 Tourism Services of the European Committee for Standardization focuses on the standardization of terminology and specification of facilities and services, including tourism-related activities, that can be used in information and reservation systems. Accordingly, CEN/TC 329 develops a European glossary of definitions for tourist terms. The project has brought together numerous national and international trade associations and interest groups as well as tour operators, public institutions and consumer groups. The glossary covers domain know-how that should be captured by an ontology in the field of tourism.
  o DATEX: DATEX is a European task force set up to standardize the interface between inter-regional Traffic Control Centres. The standardization work resulted in the Data Exchange Network (DATEX-Net) Specifications for Interoperability, which is a set of basic tools to provide a common interface including a common Data Dictionary, a common set of EDIFACT messages and a common Geographical messaging system.
  o Enjoy Europe: In the EnjoyEurope initiative (previously InTouriSME), launched in 1996, 40 European regions agreed on a common metadata definition (Minimum Data Set, MDS) about key tourism information, federating the local legacy systems and encapsulating the metadata descriptions. The first tangible result of this federation is the EnjoyEurope portal. In addition, the achieved European tourism data interoperability will provide the critical-mass information base for new services.
  o HEDNA: The Hotel Electronic Distribution Network Association (http://www.hedna.org/) is a global association focused on identifying distribution opportunities and providing solutions for the lodging industry and its distribution community. HEDNA’s activities are intended to stimulate the booking of hotel rooms through the use of Global Distribution Systems, the Internet and other electronic means. HEDNA works in the following directions:
    ▪ optimizing the use of current and emerging technologies;
    ▪ providing an opportunity for an open exchange of information among members;
    ▪ educating industry partners.
    HEDNA has produced the Unique Global Identifiers for the Hospitality Industry (UGI). A UGI is a unique reference number used to identify and provide information about operational units within the hospitality industry. It is a random number that is attached to attribute and relationship information.


  o HITIS, Hospitality Industry Technology Integration Standards: The goal of HITIS is to identify general functions (of property management systems) and standardize their implementation. In addition, a common data dictionary for hospitality-relevant data is to be developed. HITIS provides an object standard and therefore specifies standardized interfaces for objects providing the identified functions. The object standard is additionally provided as XML specifications.
  o IATA: The International Air Transport Association. The main objective of this association is to assist airline companies to achieve lawful competition and uniformity in prices. IATA assigns the following standard identifiers:
    ▪ IATA Airport Codes, also called IATA location identifiers, to designate most of the airports around the world;
    ▪ IATA Railway station Codes: following the idea of the Airport Codes, IATA has also labelled different railway stations in the world, especially where there is an agreement between an airline and a railway provider;
    ▪ IATA Airline designation codes that identify airlines operating worldwide;
    ▪ IATA also assigns codes to delays.
  o IATAN: IATAN’s mission is to promote professionalism, administer meaningful and impartial business standards, and to provide cost-effective products and services that benefit the travel industry. Through the use of its informational and other resources, IATAN provides a vital link between the supplier community and the US travel distribution network.
  o ICAO: The ICAO Council adopts standards and recommended practices concerning air navigation, prevention of unlawful interference, and facilitation of border-crossing procedures for international civil aviation. In addition, the ICAO defines the protocols for air accident investigation followed by transport safety authorities in countries signatory to the Convention on International Civil Aviation, commonly known as the Chicago Convention.
The ICAO also standardizes certain functions for use in the airline industry, such as the Aeronautical Message Handling System (AMHS), which arguably makes it a standards organization. The ICAO defines an International Standard Atmosphere (also known as ICAO Standard Atmosphere), a model of the standard variation of pressure, temperature, density, and viscosity with altitude in the Earth’s atmosphere. This is useful in calibrating instruments and designing aircraft. The ICAO standardizes machine-readable passports world-wide. Such passports have an area where some of the information otherwise written in textual form is written as strings of alphanumeric characters, printed in a manner suitable for optical character recognition. This enables border controllers and other law enforcement agents to process such passports quickly, without having to input the information manually into a computer. ICAO publishes Doc 9303, Machine Readable Travel Documents, the technical standard for machine-readable passports. A more recent standard is for biometric passports. These contain biometrics to authenticate the identity of travellers. The passport’s critical information is stored on a tiny RFID computer chip, much like information stored on smartcards. Like some smartcards, the passport book design calls for an embedded contactless chip that is able to hold digital signature data to ensure the integrity of the passport and the biometric data.
o IFITT RMSIG, IFITT Reference Model Special Interest Group: The objective of the IFITT Reference Model Special Interest Group (IFITT RMSIG) is the harmonization of electronic tourism markets in an open and flexible manner, based on a reference model. The main purpose of the IFITT RMSIG is bringing together the different market participants and domain experts to ensure a broad acceptance of the reference model. The reference model, provided by the IFITT RMSIG, is a framework for modelling electronic tourism markets. Instead of fixed standardization, the reference model enables the flexible description of specific models, based on a common modelling language and standardized building blocks as vocabulary. The purpose of the reference model is to enable the description of specific models for specific standards or data exchange formats in a form understandable by other market participants and to enable a mapping between different standards. Suppliers of tourism services or brokers within the tourism market can use the reference model to describe their specific standards or data exchange formats, which can then be understood and used by other market participants. In this way, not only new but also existing standards or data exchange formats can be integrated into one open electronic tourism market.
o JourneyWeb: The JourneyWeb project researched, designed and developed an Internet-based protocol for the dynamic exchange of electronic schedule data between distributed heterogeneous computing systems, allowing any telephone enquiry centre or Internet-based service universal access to any public transport information, regardless of location.
These services allow any traveller to obtain an unbiased selection of integrated journey alternatives, which may contain trips from multiple travel modes (train, bus, coach, air), and be remotely sourced from different databases and software suppliers.
o KAREN, Keystone Architecture Required for European Networks: The KAREN Framework Architecture gives support when intelligent transport system (ITS) implementation is being planned and prepared, and offers a basis for a European integrated approach to ITS. The organization of the project was designed to facilitate the complete process to be followed from the establishment of European requirements, through the production of a comprehensive European Transport Telematics Framework Architecture, to the creation of consensus and endorsement of the results.
o omnis-online: omnis-online is an electronic marketplace platform for the travel and tourism industry. The essential feature of omnis-online is the contractual and procedural framework which makes it possible for buyers and sellers of holiday and travel products, in different parts of the world, to trade together. omnis-online provides a standards book, which is a statement of procedures, rules and definitions to govern the use of omnis-online, especially including product description standards (based on XML).
o OTA, Open Travel Alliance: The Open Travel Alliance aims to promote the free flow of travel services through multiple distribution channels. Therefore, the objective is to provide a vocabulary and grammar for communicating travel-related information as tags across all travel industry segments. These tags will be implemented using the eXtensible Markup Language (XML).
o SIGRT, Sistema de Informação de Gestão de Recursos Turísticos: SIGRT is an information system that serves as a global reference in the promotion of national tourism products in Portugal. The enormous source of information spanning multiple tourism sectors is made available to the general public, to tourism operators and to other public or private institutions in the sector. The system is based on a tourism resource database that enables the storage, management and availability of data elements regarding the national tourism services. Within the scope of SIGRT, standardized data structures are defined for describing tourism products. The information is structured according to 190 distinct, identified types, each having its own data structure.
o TIH, Travel Information Highway: The Travel Information Highway (TIH) is an open communications approach to facilitate the exchange of real-time information between network operators themselves and with driver information service providers. It was developed with the objectives of co-ordinating network operating strategies and providing high quality information services to the travelling public.
o TIN, Tourism Information Norm for German tourism: The TIN aims to provide rules for a uniform presentation and search structure within the information and reservation systems of German tourism. Therefore, it defines and structures the characteristics for describing tourism services and specifies the access to the tourism services.
o TourinFrance: TourinFrance aims to develop a nation-wide common data format for describing and exchanging tourism information. This data format respects the independence and autonomy of all the regional and local tourist information systems, but should eventually allow local content to be aggregated at a national level. This national project will therefore enable the exchange of data between the existing players involved in French tourism: the French government services, the regions, the counties, the tourist offices, and the major tourism federations. The format, which is NOT based on XML, allows transmitting information about the following touristic entities: hotels, campsites, self-catering, restaurants, events, natural sites, cultural sites, leisure activities, tourist routes, holiday villages, and holiday resorts. Currently, TourinFrance does not support any commercial transactions.
o Transmodel: TRANSMODEL is a reference data model for Public Transport operations. TRANSMODEL is a description of the data of interest to a company in designing an Integrated Information System. TRANSMODEL is a conceptual model and does not mandate any particular implementation at the logical or physical level. TRANSMODEL increases the efficiency of transport operations by underpinning them with more secure and reliable Information Systems. TRANSMODEL is also expected to open the market by allowing integration of complementary software products from different suppliers.
o TransXChange: TransXChange aims to define a national data standard for the interchange of bus route registration, route and timetable information between operators, the Traffic Area Network, Local Authorities and Public Transport Executives, and the National Passenger Transport Information System. TransXChange is a standard for data records, thus its scope is defined by the possible extent of its contents, which are in turn determined by the concepts to be supported.
o TRIDENT, TRansport Intermodality Data sharing and Exchange NeTworks: The goal of the project is to support multimodal travel ITS services by establishing common and reusable mechanisms that enable sharing and exchanging data between transport operators (content owners) of different modes (bus/tram/metro, rail and road) as well as information service providers. It will also investigate and propose solutions for the organizational and strategic issues hampering travel intermodality. This will lead to proposals for new standards as well as to recommendations supporting the implementation of systems based on the project’s results.
o TTI, Travel Technology Initiative: The Travel Technology Initiative was created to establish technology standards within the travel industry. TTI maintains and publishes the Unicorn EDI messages, of which there are now over 130 in use throughout the travel industry. The TTI is cooperating with the OTA on establishing XML standards.
o UIC 912, Union Internationale de Chemins de Fer 912 protocol: UIC 912 is a proprietary protocol developed by UIC to support the information exchange of their members.
The application areas covered are international freight, passenger and baggage traffic, and documentary research. Each message format contains a header followed by a series of 32-bit fields. EDIFER (UIC’s competence centre for EDI standardization) maintains UIC 912. However, it was a strategic decision to adopt the UN/EDIFACT standards as the overall UIC standard. Recently, a new working group on XML has been established within UIC. UIC has also participated in ebXML.
o UN/LOCODE: This is a Code for Trade and Transport Locations. It is a geographic coding scheme developed by the United Nations Economic Commission for Europe (UNECE). The UN/LOCODE assigns codes to locations used in trade and transport with functions such as seaports, rail and road terminals, airports, post offices and border crossing points. UN/LOCODEs have five characters. The first two are letters, and come from the EN ISO 3166-1 alpha-2 country codes. Normally three letters will follow, but if there are not enough combinations, numbers from 2 to 9 can also be used. For airports, the three letters following the country code are not always identical to the IATA airport code.

o UN/EDIFACT TT&L, United Nations rules for Electronic Data Interchange for Administration, Commerce and Transport, Travel, Tourism and Leisure: UN/EDIFACT aims at facilitating the electronic exchange of business data between communication partners and comprises a set of internationally agreed standards, directories and guidelines for the electronic exchange of structured data.
o USTOA: The United States Tour Operators Association is a professional association representing the tour operator industry. It is composed of companies whose tours and packages encompass the entire globe and who conduct business in the USA.
o WATA, World Association for Travel Agencies: WATA has for 50 years been the leading association of the travel trade. With members in most countries and on all continents, it stands for quality and reliability. Its own guarantee fund covers transactions. The MASTER KEY and the GLOBAL TRAVEL PLANNER help tour operators and travel agents in their daily business, and the yearly General Assembly fosters friendship.
• eBusiness vocabularies:
o cXML, commerce eXtensible Markup Language: cXML allows buyers, suppliers, aggregators, and intermediaries to communicate using a single, standard, open language. Successful business-to-business electronic commerce (B2B e-commerce) portals depend upon a flexible, widely adopted protocol. cXML is a language designed specifically for B2B e-commerce and provides access to products and services. cXML transactions consist of documents, which are simple text files with well-defined format and content. Most types of cXML documents are analogous to hardcopy documents traditionally used in business.
o OAG, Open Applications Group: The Open Applications Group is a non-profit industry consortium focussing on promoting the easy and cost-effective integration of key business application software components for enterprise and supply chain functions for end-user organizations.
The Open Applications Group Integration Specifications (OAGIS) accelerate component integration and electronic commerce by providing capabilities for Supply Chain Integration using the Extensible Markup Language (XML). The Open Applications Group has also published a proposal for a common middleware (OAMAS) that, when adopted with OAGIS, will move the industry much closer to the vision of plug and play compatibility for business applications.
o RosettaNet: RosettaNet is an independent, non-profit organization dedicated to promoting an industry-wide initiative to agree on and adopt common electronic business processes world-wide. RosettaNet focuses on building a master dictionary to define properties for products, partners, and business transactions. This master dictionary, coupled with an established implementation framework (exchange protocols), is used to support the eBusiness dialog known as the Partner Interface Process (PIP). RosettaNet PIPs create new areas of alignment within the overall IT supply-chain eBusiness processes, allowing IT supply-chain partners to scale eBusiness, and to fully leverage electronic commerce applications and the Internet as a business-to-business commerce tool.
o SIMPL-edi: SIMPL-edi’s purpose is to provide more focused EDI messages based on simple, standard international data elements and well structured master files. It builds on the best practice work already done with an aim to provide a sound basis for the widest, most cost effective use of electronic commerce and associated computer applications. The philosophy of Simpl-EDI is a common shared understanding of what is required to pass between companies involved in the supply, transport and purchase of all types of goods and services. Simplified messages rely upon the removal of redundant or stable information from the messages themselves to master data files where it can be separately accessed or processed.
o xCBL, XML Common Business Library: The XML Common Business Library (xCBL) is a set of XML building blocks and a document framework that allows the creation of robust, reusable, XML documents to facilitate global trading. It essentially serves as the “mother code”, providing one language that all e-marketplace participants can understand. This interoperability allows businesses everywhere to easily exchange documents across multiple e-marketplaces, giving global access to buyers, suppliers, and providers of business services.
• eBusiness frameworks:
o BizTalk: BizTalk is an industry initiative defining the BizTalk Framework, an Extensible Markup Language (XML) framework for application integration and electronic commerce. It includes a design framework for implementing an XML schema and a set of XML tags used in messages sent between applications. The BizTalk Framework will be used to produce and publish XML schemas in a consistent manner.
o ebXML, electronic business XML: The mission of ebXML is to provide an open XML-based infrastructure enabling the global use of electronic business information in an interoperable, secure and consistent manner by all parties.
ebXML, sponsored by UN/CEFACT and OASIS, is a modular suite of specifications that enables enterprises of any size and in any geographical location to conduct business over the Internet. Using ebXML, companies now have a standard method to exchange business messages, conduct trading relationships, communicate data in common terms and define and register business processes.
o eCoFramework: The goal of the eCo Framework project is to develop a common framework for interoperability among XML-based application standards and key electronic commerce environments. The project’s working group will develop a specification for content names and definitions in electronic commerce documents, and an interoperable transaction framework specification.
o Ontology.org: Ontology.Org is an independent industry and research forum focussed upon the application of ontologies in Internet commerce. It is the central goal of Ontology.Org to use ontologies to address the problems that impact the formation and sustainability of large electronic trading groups.
o OntoWeb: OntoWeb (Ontology-based information exchange for knowledge management and electronic commerce) is a collaborative network of European researchers and industrial partners that aims to strengthen the European influence on Semantic Web standardization efforts such as those based on RDF and XML.
o OO-edi, Object-oriented edi: OO-edi is an attempt to put the Open-edi Reference Model into practice. It is based on business process and information modelling methodologies. The resultant model specifies the business flow needs and identifies related object classes to the extent that production of off-the-shelf software to support EDI exchanges becomes feasible. Business process and information modelling in OO-edi is based on the Unified Modelling Language (UML). The methodology used for business process and information modelling is a customization of the Rational Unified Process and is called UN/CEFACT’s Modelling Methodology (UMM or “N90”). The development of this methodology was influenced by the methodology used at SWIFT. Later on, the methodology merged with the modelling methodology and metamodel used in RosettaNet. UMM is referenced by ebXML as the preferred methodology to model ebXML compliant business processes.
o Open-edi, ISO/IEC 14662:1997 Information Technology — Open-edi reference model: Open-edi specifies a Reference Model which should enable business partners to do business electronically without any prior agreements.
o SOAP, Simple Object Access Protocol: SOAP provides the definition of an XML document which can be used for exchanging structured and typed information between peers in a decentralised, distributed environment. It is fundamentally a stateless, one-way message exchange paradigm, but applications can create more complex interaction patterns (e.g., request/response, request/multiple responses, etc.)
by combining such one-way exchanges with features provided by an underlying transport protocol or application-specific information. SOAP is silent on the semantics of any application-specific data it conveys, as it is on issues such as the routing of SOAP messages, reliable data transfer, firewall traversal, etc.
o UDDI, Universal Description, Discovery and Integration: UDDI enables companies to publish how they want to conduct business on the web, potentially fuelling growth of business-to-business (B2B) electronic commerce. UDDI will benefit businesses of all sizes by creating a global, platform-independent, open architecture for describing businesses and services, discovering those businesses and services, and integrating businesses using the Internet. Therefore, the core function of UDDI is performed by a UDDI Business Registry.
o XML/EDI, eXtensible Markup Language / Electronic Data Interchange: XML/EDI is an extension of EDI and aims to enhance the EDI mechanisms by the flexibility and extensibility of XML. The basic approach of the XML/EDI framework is expressing EDI mechanisms using XML syntax. To reach full dynamic electronic commerce, the XML/EDI framework provides three additional components: process templates containing processing information, software agents interpreting the process templates, and repositories providing syntactic and semantic information needed for the execution of EDI transactions.
• Business semantics:
o BSR, Basic Semantic Register: The BSR is an official ISO data register for use by designers, implementers and users of information systems in a manner which will allow systems development to move from a closed to an open multilingual environment, especially for use in domestic and international electronic communication including electronic commerce and EDI. The purpose of the BSR is to provide an internationally agreed register of multilingual data concepts, semantic units (SU), with its technical infrastructure. This will provide storage, maintenance and distribution facilities for reference data about semantic units and their links (bridges) with operational directories. The semantic units will be built from semantic components, which can be considered as building blocks.
o ISO/IEC 11179, Specification and Standardization of Data Elements: ISO/IEC 11179 is a multi-part International Standard concerning data element specification and standardization. The complete set includes six interrelated parts, with each part focusing on one aspect of data element development and maintenance.
o UNSPSC, United Nations Standard Product and Services Classification: The UNSPSC Code is a coding system to classify both products and services. It was established in 1999 by the merger of the UN Common Procurement Code (CPC) list with the Dun and Bradstreet Standard Product and Services Code list. UNSPSC is a five-level hierarchical taxonomy. A product or service is identified by a two-character numerical code (and a textual description) for each level.
For example, the code “90” on the highest level contains “Travel and Food and Lodging and Entertainment Services”. UNSPSC codes can be used in a UDDI registry for the identification of products and services.
o UNTDED, Trade Data Elements Directory, ISO 7372: The standard data elements included in this Directory are intended to facilitate interchange of data in international trade. These standard data elements can be used with any method for data interchange on paper documents as well as with other means of data communication: they can be selected for transmission one by one, or used within a particular system of interchange rules.
• Modelling languages:
o DCMI, Dublin Core Metadata Initiative: The Dublin Core Metadata Initiative (DCMI) is an organization dedicated to promoting the widespread adoption of interoperable metadata standards and developing specialized metadata vocabularies for describing resources that enable more intelligent information discovery systems. The DCMI is committed to the continual refinement of a “core” foundation of property values and types to provide vertically specific (or semantic) information about Web resources, much in the same way as a library card catalogue provides indexed information about book properties. The Dublin Core Metadata Element Set (DCMES) was the first metadata standard developed out of the DCMI as an IETF standard. DCMES provides a semantic vocabulary for describing “core” information properties, such as “Description”, “Creator” and “Date”.
o GINF, Generic Interoperability Framework: The Generic Interoperability Framework (GINF) has been developed to facilitate integration of heterogeneous components. One of the main principles it employs is the generic representation of protocols, languages, data and interface descriptions. The current implementation of the framework is based on RDF. The implementation of GINF provides semantic-oriented middleware for application development and integration. GINF middleware allows creating open and highly extensible client/server applications.
o OIL, Ontology Inference Layer: OIL is a proposal for a web-based representation and inference layer for ontologies, which combines the widely used modelling primitives from frame-based languages with the formal semantics and reasoning services provided by description logics. It is compatible with RDF Schema (RDFS), and includes a precise semantics for describing term meanings (and thus also for describing implied information).
o OMG, Object Management Group: The OMG was formed to create a component-based software marketplace by hastening the introduction of standardized object software. The organization’s charter includes the establishment of industry guidelines and detailed object management specifications to provide a common framework for application development. Primary goals are the reusability, portability, and interoperability of object-based software in distributed, heterogeneous environments.
Specifications for particular markets or domains (domain interfaces) are developed within specific task forces, e.g. the Transportation or the Retail group.
o RDF, Resource Description Framework: RDF provides the foundation for metadata interoperability across different resource description communities. RDF allows descriptions of Web resources - any object with a Uniform Resource Identifier (URI) as its address - to be made available in machine understandable form. This enables the semantics of objects to be expressible and exploitable. RDF is based on a concrete formal model utilizing directed graphs that allude to the semantics of resource description. The basic concept is that a Resource is described through a collection of Properties called an RDF Description. Each of these Properties has a Property Type and Value. Any resource can be described with RDF as long as the resource is identifiable with a URI.
o UML, Unified Modelling Language: The Unified Modelling Language (UML) is a language for specifying, visualizing, constructing, and documenting the artefacts of software systems, as well as for business modelling and other non-software systems. The Unified Modelling Language fuses the concepts of object-oriented analysis and design approaches (Booch, OMT and OOSE). The result is a single, common, and widely usable modelling language.
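Two of the coding schemes listed above, UN/LOCODE and UNSPSC, are described with simple fixed-width structures, which makes them easy to check mechanically. The following is a minimal sketch of such a check, based only on the structure stated in this section (two country letters plus three characters for UN/LOCODE; two-character numerical codes per level, up to five levels, for UNSPSC). The function names and sample inputs are illustrative assumptions, and the code verifies well-formedness only, not whether a code has actually been assigned.

```python
import re

# UN/LOCODE structure as described above: a two-letter country code
# (ISO 3166-1 alpha-2) followed by three characters that are letters
# or, where letters run out, digits from 2 to 9.
_LOCODE_PATTERN = re.compile(r"([A-Z]{2})([A-Z2-9]{3})")


def split_locode(code: str) -> tuple[str, str]:
    """Return (country part, location part) of a well-formed UN/LOCODE.

    Spaces are ignored and letters are upper-cased before checking.
    Raises ValueError if the structure does not match.
    """
    normalized = code.replace(" ", "").upper()
    match = _LOCODE_PATTERN.fullmatch(normalized)
    if match is None:
        raise ValueError(f"not a well-formed UN/LOCODE: {code!r}")
    return match.group(1), match.group(2)


def unspsc_levels(code: str) -> list[str]:
    """Split a UNSPSC code into two-digit level codes, highest level first.

    Accepts between one and five levels (2 to 10 digits), following the
    five-level description given in this section.
    """
    if not (code.isascii() and code.isdigit()):
        raise ValueError(f"UNSPSC code must consist of digits: {code!r}")
    if len(code) % 2 != 0 or len(code) > 10:
        raise ValueError(f"UNSPSC code must have 2, 4, 6, 8 or 10 digits: {code!r}")
    return [code[i:i + 2] for i in range(0, len(code), 2)]
```

For instance, `split_locode("AT VIE")` yields the country part `"AT"` and the location part `"VIE"`, and `unspsc_levels("90")` yields the single top-level code mentioned in the UNSPSC entry. A production system would additionally look the parts up in the published code lists.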

6.1.3 Gaps and future needs

Standards are becoming an increasingly central issue in the so-called information and knowledge-based society. Information and communication technologies are having a tremendous impact on the travel and tourism industry, transforming the whole sector on both the industry side and the consumer side. Within this context, standards in the travel and tourism industry ought to provide for the integration and exchange of heterogeneous sources of distributed tourism information, so that processes (reservation, purchasing, checking, etc.) can be carried out seamlessly, regardless of location, communication technology, language, etc.

There is a need to define and provide (semantic) definitions and clarifications in order to transform disparate localized information into a global, coherent resource within the Internet (the most common communication platform and environment in this case).

The following important functionalities should be obtained:

• Define a common language for domain experts and IT developers to formulate requirements and to agree upon system functionalities with respect to the correct handling of tourism information.
• Support (semi-)automatic data transformation algorithms from local to global structures without (essential) loss of meaning.
• Support associative queries against integrated resources by providing a global model of the basic classes and their associations to formulate such queries.
• Define data structures that can be applied, used and processed by the majority of organizations in the travel and tourism industry.
• Define the level of formality with which data ought to be specified by the different agents of the travel and tourism sector, so that these data can be compared and processed efficiently by machines with as little human intervention as possible.
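The second functionality, transformation from local to global structures without essential loss of meaning, can be illustrated with a small sketch. It assumes a purely hypothetical local schema and global structure (all field names below are invented for illustration) and uses a plain field-renaming table; real mappings would typically also need value conversion and structural changes. The point of the sketch is that fields without a global counterpart are reported rather than silently dropped, so that potential loss of meaning stays visible.

```python
from typing import Any

# Hypothetical mapping from the field names of one local system to the
# field names of a shared (global) structure. Illustrative only.
LOCAL_TO_GLOBAL = {
    "hotel_name": "name",
    "plz": "postal_code",
    "ort": "locality",
}


def to_global(
    record: dict[str, Any], mapping: dict[str, str]
) -> tuple[dict[str, Any], list[str]]:
    """Rename mapped fields and list local fields that could not be mapped.

    Returns (converted record, names of unmapped local fields).
    """
    converted: dict[str, Any] = {}
    unmapped: list[str] = []
    for field, value in record.items():
        if field in mapping:
            converted[mapping[field]] = value
        else:
            # No global counterpart: keep track of it instead of dropping it.
            unmapped.append(field)
    return converted, unmapped
```

A non-empty list of unmapped fields signals that either the mapping or the global structure needs to be extended before the transformation can be considered lossless.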

6.1.4 Recommendations

Short-term recommendations (1–3 years)

• Leverage existing standards rather than developing new specifications whenever possible.
• Build cooperation between private associations like IATA, OTA, and XFT and formal standardization bodies such as ISO and CEN.
• Build a “watchtower” registry of relevant eTourism standards that also acts as a coordination body between various formal and informal standardization activities. Such an activity can be modelled on the MoU/MG.

Long-term recommendations (3–10 years)

• Lower the entry barrier for participation in pertinent formal and informal standardization bodies, especially for SMEs, and extend the scope of those activities to cover their respective requirements.
• Work on interoperability approaches between different standards.

6.2 Taxonomies

6.2.1 Needs and requirements

Introduction

Traditionally all sciences classify their objects. Astronomy classifies celestial bodies such as planets, stars and galaxies. Botany classifies plants, chemistry the chemicals, medicine classifies illnesses, psychology classifies mental processes, library and information science classify documents and systems and methods of knowledge organization, religious studies classify religions, and the list could go on forever.

Such classifications are not performed just in order to create an aesthetic effect. Classifications are constructed in order to work efficiently, and also to provide the means to find and retrieve meaningful and required information efficiently. Classification is not something extra put on top of scientific work; rather, it is deeply integrated within scientific work itself, as it provides a deeper understanding of the subject matter under study.

For example, if a new group of chemical substances is found to help cure a certain disease and this fact is widely demonstrated, it will be classified as a kind of drug (e.g. as antidepressants, tranquillizers or anti-inflammatory drugs) that helps humans recover from that particular disease.

There is a close connection between the development of scientific concepts and classifications: e.g. when an astronomer recognizes the difference in nature between a particular star and a planet, s/he reflects this fact both in his/her conception of the item and in its later classification within the table of celestial bodies. Classification is carried out according to various criteria, and it aims at distributing entities among different groups whose members share one or several common features.

Needs

Due to the tremendous impact that information and communication technologies have had upon the travel and tourism industry, the whole sector has to be re-thought. Although traditional (tourism) research remains valid, research in other realms (especially IT-related ones) is also needed, e.g. on new information management systems (storage, management, access, retrieval), new communication technologies, channels and platforms, devices that allow people on the move to access and receive information-based services, as well as new consumption patterns and behaviours. All of this is only possible if information is classified, stored and organized in ways agreed by all agents (public institutions and bodies, industry, research communities, final users, etc.). Relevant tourism information in general, its organization within information management systems and its explicit specification through schemas or information representation methods and models need to be defined.

Information and content are key. To access the right piece of information at the right moment, information needs to be clearly stored and classified. Almost anything (including tourism information, i.e. travel, accommodation, catering, events, useful information, etc.) has to be classified following a structure, e.g. taxonomic schemas.

Requirements

As the amount of information (of all kinds) available on the web increases, the particular piece of information we seek may be buried among information we do not seek. Thus, the activity of classifying information becomes increasingly important, as it makes it easier to find particular content on the web. For a company providing such a service, this can translate into business opportunities. Easy access to information is particularly important to those planning some kind of leisure activity, as their behaviour patterns indicate that they will not spend long on Web sites looking for information. Thus, information has to be object oriented, not experience oriented. However, in order to build a successful tourism taxonomy, both approaches are required.

Taxonomies or conceptual hierarchies are crucial for any knowledge-based system, i.e. any system making use of declarative knowledge about the domain it deals with. Thus, in order for a knowledge system to succeed it has to be easy enough to use for anyone without specialized training or background on its content. Information has to be classified in a meaningful manner by taxonomists and tourism domain experts in a general way and it requires strict control over the creation of new entities and branches. Information management principles and practices, taxonomies, and other controlled vocabularies serve as knowledge management tools that can be used to help organize content and make connections between people and the information they need.

6.2.2 State of the art

Taxonomies are one way of classifying things into groups. There is a significant difference between describing the objects being classified and describing the subjects used to classify them. Taxonomies (like other classification techniques) are an approach to describing subjects: a taxonomy is a subject-based classification that arranges the terms of a controlled vocabulary into a hierarchy and does nothing more. In practice, taxonomies may be found applied to more complex structures.

The benefit of this (taxonomy) approach is that it allows related terms to be grouped together and categorized in ways that make it easier to find the correct term to use for whatever purpose. Within the tourism domain, if there is a taxonomical classification for the notion of “Event”, different “Event”s could be classified under the general one, e.g., sport events, cultural events, etc., and would allow a tourist to easily find the kind of “Event” s/he wants to undertake.

Etymologically speaking, the word “taxonomy” comes from the Greek taxis (“arrangement, order”) and nomos (“law”).

The units in taxonomies are termed taxon (plural: taxa). Initially taxonomy was only the science of classifying living organisms and species, but later the word was applied in a wider sense, and may also refer to either a classification of things, or the principles underlying that classification. Classification of species, however, began well before the eighteenth century. Aristotle distinguished species by habitat and means of reproduction, but Andrea Cesalpino produced the first significant taxonomy of plants in 1583, arranging the species in a hierarchical, graded order. His work was developed by Marcello Malpighi, who expanded his hierarchical system to include animals. The word taxonomy is sometimes used synonymously with classification and sometimes given a special meaning.

There have also been some attempts to differentiate taxonomies from simple classifications. These attempts may also serve as a review of the different definitions authors have given to the notion of taxonomy. “A taxonomy obtains when several fundamenta divisionis are considered in succession, rather than simultaneously, by an intensional cl. [classification]. The order in which fundamenta are considered is highly relevant: the taxonomy obtained by using property X to classify a genus and then property Y to classify its species is by no means the same as that obtained by considering property Y first and property X afterwards” [Marradi, 1990].
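Marradi's point about the order of the fundamenta divisionis can be illustrated with a minimal Python sketch. The offers and the property names `kind` and `season` are hypothetical, chosen only for illustration; the function simply applies the given properties one after another.

```python
from collections import defaultdict

# Hypothetical tourism offers, each described by two properties.
OFFERS = [
    {"name": "city marathon", "kind": "sport", "season": "summer"},
    {"name": "ski race", "kind": "sport", "season": "winter"},
    {"name": "opera festival", "kind": "culture", "season": "summer"},
    {"name": "christmas market", "kind": "culture", "season": "winter"},
]


def build_taxonomy(items, properties):
    """Classify items by the given properties, applied one after another."""
    if not properties:
        return sorted(item["name"] for item in items)
    first, rest = properties[0], properties[1:]
    groups = defaultdict(list)
    for item in items:
        groups[item[first]].append(item)
    return {value: build_taxonomy(members, rest) for value, members in groups.items()}


by_kind_then_season = build_taxonomy(OFFERS, ["kind", "season"])
by_season_then_kind = build_taxonomy(OFFERS, ["season", "kind"])

# Same leaves, different hierarchies: the top-level taxa differ.
print(sorted(by_kind_then_season))  # top level holds the kinds
print(sorted(by_season_then_kind))  # top level holds the seasons
```

Both results contain the same four offers as leaves, but the intermediate taxa, and therefore the paths a user would browse, are different.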

Campbell & Currier (31/10/00) [Campbell, Currier] ask: What is a taxonomy? They provide the following answer:

• A taxonomy is an ordered classification system.
• Information is grouped according to presumed natural relationships.
• Ordered resources are grouped like with like.
• The structure of a taxonomy should be consistent with user groups’ conceptualization of their subject.

Examples of tourism taxonomies

There is a vast number of taxonomic classifications within the tourism domain in the literature. Almost every project applying information management methods uses a taxonomy to organize the existing information of its universe of discourse (the project to be developed). Taxonomies are later used to design database structures, ontologies and other tools so that the information is easily accessible and retrievable for the final user.

In commercial web sites and online travel agencies, their services are often organized under taxonomies, e.g. restaurants and kinds of restaurants. Accommodation facilities are organized under different categories: hostel, 5-star hotel, 4-star hotel, etc., or even in ranges of price, depending upon the search criteria.

Other examples found in the literature are:

• Cultural Tourism Taxonomies and : The objective of the taxonomy is to develop a comprehensive map of the elements of cultural heritage that attract different people to towns and cities by:
   o identifying and categorizing a range of cultural attractors;
   o identifying interests and motivators for different types of tourists;
   o identifying relations of attraction between attractors and interests;
• Cultural Tourism Taxonomies and Folksonomies: a project within COST Action C21 defines and builds a taxonomy of tourist attraction sites. 23 types of attractions are defined on the basis of Prentice’s typology. Building a classification is a complex process. Some problems arise, as in the Dewey classification, where most of the topics concern European culture rather than worldwide culture. In the urban domain, classifications are object-oriented rather than experience-oriented classifications like the Dewey classification. Experience-oriented classification is based on usage. A folksonomy is an ethnoclassification: the goal is to define categories rather than to build a correct classification. For example, the Flickr web site contains the most popular tags used for photography.

6.2.3 Gaps and future needs

As the review of the most significant literature above shows, there is no single (correct) way of classifying things. Furthermore, the same instances could be classified in different ways (perhaps depending on their application scenario) with different objects, and different objects could be instantiated using the same meaning. Consequently, there is a need to reach an agreement in the community involving all possible agents: public administration and regulatory bodies, industry, final users, the research community, etc.

In a taxonomy the means for subject description consists of essentially one relationship: the broader/narrower relationship used to build the hierarchy. The set of terms being described is of course open, but the language used to describe them is closed, since it consists only of a single relationship. Before actually developing the taxonomy, one needs to define the scope of the classification, purpose, and types of content formats. It is crucial to bear in mind at all times the target audience and communities who will use it. An evaluation of the needs can be carried out, or interviews to identify and focus on the content final users care about and on the organization of the content.
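The single broader/narrower relationship can be sketched in a few lines of Python. The terms below are a hypothetical event taxonomy used only for illustration: the set of terms is open (new entries can be added), while the descriptive language stays closed (one relation only).

```python
# Hypothetical event taxonomy: each term maps to its single broader term.
BROADER = {
    "sport event": "event",
    "cultural event": "event",
    "football match": "sport event",
    "marathon": "sport event",
    "concert": "cultural event",
}


def ancestors(term):
    """Return the chain of broader terms, nearest first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def narrower(term):
    """Return all terms below `term`, found by inverting the same relation."""
    direct = [t for t, parent in BROADER.items() if parent == term]
    found = []
    for child in direct:
        found.append(child)
        found.extend(narrower(child))
    return sorted(found)


print(ancestors("marathon"))
print(narrower("sport event"))
```

Everything the structure can express, such as finding all kinds of sport event, is derived from that one relation; anything richer (for example, where an event takes place) lies outside what a taxonomy can state.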

Taxonomies usually require strict control over the creation of new entities and branches and this restriction needs to be overcome, especially given the way information is consumed on the web. Systems need to be as dynamic as possible, i.e., flexible.

Traditional information systems have classified information according to a particular hierarchy. Now, some information systems allow introducing links and they make information available in an easier way. In the future, information systems will not introduce classification methodologies; rather, they will categorize their content and will make it available via tags, links, etc.

6.2.4 Recommendations

Short-term recommendations (1–3 years)

• Follow existing taxonomies including established definitions wherever possible.
• Produce mappings between eTourism-related taxonomies.
• Federate existing eTourism-related taxonomies across languages based on the mappings and offer a SKOS interface to them.
• Formulate guidelines for the design of eTourism-related taxonomies.

Long-term recommendations (3–10 years)

• Build organizational structures for the long-term duration of eTourism-related taxonomies.

6.3 Ontologies

6.3.1 Needs and requirements

Introduction

The word “Ontology” (note the upper-case ‘O’) comes originally from philosophy. From a philosophical point of view, Ontology is the branch of philosophy which deals with the nature and the organization of reality [Guarino, Giaretta, 1995]. We have to go as far back as Aristotle to find the first reference to this notion: in Book IV of his Metaphysics he tries to define a “science” that is “on top of” the rest of the sciences, a science that studies the being as being (i.e. Ontology):

“There is a science that studies the being as being and its properties as such (being) which belong to it in virtue of its nature. Now, this science is not the same as any of the so called special sciences, since none of these other treat (universally) the being as being itself but reducing the being to one part of it, they (“only”) investigate the essential properties of this part. Since we are seeking the first principles and the highest causes there must be something to which these belong in virtue of its own nature. If then those who sought the elements of existing things were seeking these same principles, it is necessary that the elements must be elements of being not only by accident but just because it is being. Therefore it is of being as being that we also must grasp the first causes” [Aristotle, Metaphysics Book IV].

In the computer science domain, ontologies (note now the lower-case ‘o’) aim at capturing domain knowledge in a generic way and providing a commonly agreed understanding of a domain which may be reused and shared across applications and groups. Ontologies provide a common vocabulary of an area and define with different levels of formality the meaning of terms and the relations between them. Since the beginning of the 1990s, ontologies have become a popular research topic investigated by several Artificial Intelligence research communities, including knowledge engineering, natural language processing and knowledge representation. More recently, the notion of an ontology is also becoming widespread in fields such as intelligent information integration, information retrieval on the Internet, and knowledge management. The reason for ontologies being so popular is in large part due to what they promise: a shared and common understanding of some domain that can be communicated across people and computers.

Needs

In recent years, the development of ontologies has been moving from the realm of Artificial Intelligence (AI) laboratories to the desktops of domain experts. Ontologies have become common on the World Wide Web. Ontologies on the web range from large taxonomies categorizing web sites to categorizations of products for sale and their features. Many disciplines now develop standardized ontologies that domain experts can use to share and annotate information in their fields. Why would someone want to develop an ontology? Here are some of the (possible) reasons:

Clarification of knowledge structures

Ontological analysis clarifies the structure of knowledge. The first reason why ontologies are important is that they form the heart of any system of knowledge representation. If there are no conceptualizations underlying the knowledge, there is no vocabulary for representing it. Thus, the first step in knowledge representation is performing an effective ontological analysis of some field of knowledge. Weak analyses lead to incoherent knowledge bases.

Consider a domain in which there are people, some of whom are students, some professors, some other types of employees, some females and some males. For quite some time, a simple ontology was used in which the classes of students, employees, professors, males and females were represented as “types of” humans. Soon this caused problems because it was noted that students could also be employees at times and can also stop being students. Further ontological analysis showed that “students”, “employees”, etc. are not “types of” humans, but rather “roles” that humans can play, unlike categories such as “females”, which are in fact “types of” humans. Clarifying the ontology of this data domain made it possible to avoid various difficulties in reasoning about the data.
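The distinction between “types” and “roles” might be modelled as in the following minimal Python sketch. The class and attribute names (`Person`, `Sex`, `Role`) are assumptions made for illustration and are not taken from any ontology discussed in this document.

```python
from dataclasses import dataclass, field
from enum import Enum


class Sex(Enum):
    # A "type of" human in the sense used in the text.
    FEMALE = "female"
    MALE = "male"


class Role(Enum):
    # Roles can be taken up, combined and dropped over time.
    STUDENT = "student"
    EMPLOYEE = "employee"
    PROFESSOR = "professor"


@dataclass
class Person:
    name: str
    sex: Sex
    roles: set = field(default_factory=set)


ana = Person("Ana", Sex.FEMALE, {Role.STUDENT})
ana.roles.add(Role.EMPLOYEE)     # a student can also be an employee
ana.roles.discard(Role.STUDENT)  # and can stop being a student

print(sorted(role.value for role in ana.roles))
```

Had `Student` and `Employee` been modelled as subclasses of `Person`, an individual could neither hold both at once nor give one up without being replaced by a different object, which is the kind of reasoning difficulty the example describes.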

Knowledge sharing

Ontologies enable knowledge sharing. The second reason why ontologies are important is that they provide a means of sharing knowledge. Suppose we do an analysis and arrive at a satisfactory set of conceptualizations, and terms standing for them, for some area of knowledge, say, the domain of “electronic devices”. The resulting ontology would be likely to include terms such as “transistors” and “diodes”, and more general terms such as “functions”, “processes”, and also terms in the electrical domain, such as “voltage”, that could be necessary to represent the behaviour of these devices. It is important to note that the ontology – defined by the basic concepts involved and their relations – is intrinsic to the domain, apart from a choice of vocabulary to represent it. This ontology can be shared with others who have similar needs for knowledge representation in that domain, avoiding the need for replicating the knowledge analysis.

6.3.2 State of the art

As early as the mid-1980s, work began on building a large knowledge base of common sense. This knowledge base can be considered an ontology (probably the first). However, it was not until the beginning of the 1990s that ontologies became more widely known.

It was at that time that DARPA (Defense Advanced Research Projects Agency) started its Knowledge Sharing Effort, envisioning a new way in which intelligent systems could be built [Neches, Fikes, Finin, et al, 1991]. Building knowledge-based systems today usually entails constructing new knowledge bases from scratch. Instead, it could be done by assembling reusable components. System developers would then only need to worry about creating the specialized knowledge. The new system would interoperate with existing systems, using them to perform some of its reasoning. In this way declarative knowledge, problem solving techniques, and reasoning services would all be shared among applications. This approach would facilitate building bigger and better systems at lower cost.

Since then, considerable progress has been made in developing the conceptual bases needed for building technology that allows knowledge components to be reused and shared.

Definitions of the notion of ontology within the computer science domain

One of the first definitions of the word “ontology” within the computer science domain is due to Neches et al [1991]. They defined an ontology as follows: “An ontology defines the basic terms and relations comprising the vocabulary of a topic area as well as the rules for combining terms and relations to define extensions to the vocabulary”.

This definition gives some clues, albeit vague ones, about how to proceed to build an ontology:

• identify basic terms and relations between them,
• identify rules to combine them, and
• provide definitions of such terms and relations.

Later, in 1993, Gruber’s definition became the most referenced in the literature: “An ontology is an explicit specification of a conceptualization”. Later refinements of this definition add that the specification is formal and the conceptualization shared. Conceptualization refers to an abstract model of phenomena in the world, obtained by identifying the relevant concepts of those phenomena. Explicit means that the type of concepts used and the constraints on their use are clearly defined. Formal refers to the fact that the ontology should be machine-readable and machine-processable. Shared reflects the notion that an ontology captures consensual knowledge, that is, it is not private to some individual, but accepted by a representative group of users that belong to a particular domain of knowledge.

Finally, we include Uschold and Grüninger’s [1993] definition of an ontology: “Ontology is the term used to refer to the shared understanding of some domain of interest which may be used as a unifying framework to solve problems e.g. semantic interoperability, structuring and representing relevant concepts in a large knowledge base, etc.” In conclusion, there are almost as many definitions of this word as authors, although the last two are the most widely used in the reviewed literature.

Main components of an ontology

Ontologies provide a common vocabulary of an area and define – with different levels of formality – the meaning of the terms and the relations between them. Knowledge in ontologies is mainly formalised using five kinds of components: classes, relations, functions, axioms and instances [Gruber, 1993].

• Classes (also concepts) in the ontology are usually organized in taxonomies. Classes or concepts are used in a broad sense. A concept can be anything about which something is said and, therefore, could also be the description of a task, function, action, strategy, reasoning process, etc.
• Relations represent a type of interaction between concepts of the domain. They are formally defined as any subset of a product of n sets.
• Functions are a special case of relations in which the n-th element of the relationship is unique for the n-1 preceding elements.
• Axioms are used to model sentences that are always true. They can be included in an ontology for several purposes, such as defining the meaning of ontology components, defining complex constraints on the values of attributes, the arguments of relations, etc., verifying the correctness of the information specified in the ontology or deducing new information.
• Instances are used to represent specific elements.
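The formal definitions of relations and functions given above can be made concrete with a short Python sketch. The sets and tuples are hypothetical examples; `is_relation` checks the “subset of a product of n sets” condition and `is_function` checks that the n-th element is unique for the n-1 preceding ones.

```python
from itertools import product

# Hypothetical domain sets.
HOTELS = {"Hotel Sol", "Hostel Luna"}
CITIES = {"Madrid", "Vienna"}

# A binary relation is any subset of HOTELS x CITIES.
located_in = {("Hotel Sol", "Madrid"), ("Hostel Luna", "Vienna")}
serves = {("Hotel Sol", "Madrid"), ("Hotel Sol", "Vienna")}


def is_relation(tuples, *sets):
    """True if every tuple belongs to the product of the given sets."""
    return set(tuples) <= set(product(*sets))


def is_function(tuples):
    """True if the last element is unique for the preceding n-1 elements."""
    seen = {}
    for *args, value in tuples:
        key = tuple(args)
        if seen.setdefault(key, value) != value:
            return False
    return True


print(is_relation(located_in, HOTELS, CITIES))
print(is_function(located_in), is_function(serves))
```

Here `located_in` is both a relation and a function, since each hotel has exactly one city, whereas `serves` is only a relation because one hotel is paired with two cities.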

Ontology development tools

Ontology-building tools usually provide a graphical user interface, which allows ontologists to create ontologies without directly using a specific ontology specification language. Some tools such as Protégé, Chimaera, and FCA-Merge have been created for merging and integrating ontologies.

In the context of the Semantic Web, some tools have arisen in recent years for the annotation of web resources in SHOE, RDF, DAML+OIL and OWL. Their main objective is the creation and maintenance of ontology-based markup in static web documents. In fact, they are used to manage instances, attributes and relationships between web resources easily. Some of these annotation tools are OntoAnnotate, OntoMat, and SHOE Knowledge Annotator.

There are also some ontology-based text mining tools, which allow ontologies to be extracted from structured, semi-structured or free text. These tools are used to learn ontologies from natural language, exploiting the interacting constraints on the various language levels (from morphology to pragmatics and background knowledge) in order to discover new concepts and stipulate relationships between concepts.

There are some important parameters that can be used in the comparison and evaluation of existing tools. Some of these parameters are:

• Software architecture and tool evolution. This includes information about the hardware and software platforms necessary to use the tool, its architecture, extensibility and ontology storage. In this sense tools are moving towards Java-based applications, most of them accessible on the web.
• Interoperability. There is no standardization of the tools used to perform any of these tasks, and these environments are usually not interoperable.
• Methodology support. It is unusual for a tool to support a methodology for building ontologies.

Ontology development languages

A wide range of languages has been used for the specification of ontologies during the last decade: Ontolingua, LOOM, OCML, FLogic, CARIN. Many of these languages had already been used for representing knowledge inside knowledge-based applications, others were adapted from existing knowledge representation languages, and there is also a group of languages that were specifically created for the representation of ontologies. These languages, which we will call “traditional” languages, are in a stable phase of development, and their syntax consists of plain text where ontologies are specified.

Recently many other languages have been developed in the context of the World Wide Web: RDF, RDF Schema, SHOE, XOL, OML, OIL, DAML+OIL, and OWL. Their syntax is based on XML, which has been widely adopted as a ‘standard’ language for exchanging information on the web, except for SHOE, whose syntax is based on HTML.

Among all these languages, RDF and RDF Schema cannot be considered to be ontology specification languages per se, but rather general languages for the description of metadata in the web. Most of these “markup” languages are still in a development phase.

There are many other languages that have also been considered in this survey. For instance, some languages have been created for the specification of specific ontologies, such as CycL and GRAIL. There are also some other languages that were not created specifically for the representation of ontologies and that include additional features that are not usual in ontologies, such as NKRL.

The selection of an ontology specification language for the development of an ontology will not only depend on the characteristics of the language, but also on the tools that support it, the applications in which the ontology will be used, and the availability of reusable ontologies in the same domain in a specific language.

The most commonly used ontology development languages are the following:

• RDF: RDF (Resource Description Framework) is one of the essential tools within the Semantic Web. It is defined as a data model for objects (“resources”) and their relations. It offers simple semantics and uses an XML-based syntax. The scientific community has chosen RDF as a standard for marking up metadata, and it is widely supported by the W3C and various other organizations;
• OWL: The Web Ontology Language has been designed in order to extend RDF’s descriptive features. OWL is part of a growing set of W3C recommendations on Semantic Web related issues. OWL has three sub-languages of increasing expressivity, conceived according to the level of expressivity and formality needed in the application:
   o OWL Lite: it is widely used in cases where a hierarchical classification is needed and where there are very simple restrictions. E.g., it allows cardinality restrictions to be established, but only with the values 0 or 1. OWL Lite has less complexity than OWL DL;
   o OWL DL (Description Logic): the OWL DL language has been designed for cases in which maximum expressivity is required while retaining computational completeness and decidability. OWL DL includes all OWL language constructs, but they can be used only under certain restrictions, e.g., one class can be a sub-class of many other classes, but a class cannot be an instance of another class;
   o OWL Full is the third OWL sub-language. In OWL Full a class can simultaneously be treated as a collection of individuals and as an individual in its own right. OWL Full can be considered as an extension of RDF(S).
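The RDF data model of resources and their relations can be sketched as a set of subject, predicate, object triples. The following Python example is a simplified illustration that uses no RDF library; the `ex:` names are hypothetical and do not come from a published vocabulary, and `types_of` only imitates the subclass inference associated with RDF Schema.

```python
# Hypothetical vocabulary; the "ex:" names are illustrative only.
TRIPLES = {
    ("ex:Hostel", "rdfs:subClassOf", "ex:Accommodation"),
    ("ex:Hotel", "rdfs:subClassOf", "ex:Accommodation"),
    ("ex:Accommodation", "rdfs:subClassOf", "ex:TourismService"),
    ("ex:HotelSol", "rdf:type", "ex:Hotel"),
    ("ex:HotelSol", "ex:locatedIn", "ex:Madrid"),
}


def match(subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return {
        (s, p, o)
        for (s, p, o) in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    }


def superclasses(cls):
    """Follow rdfs:subClassOf links transitively."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for _, _, parent in match(subject=current, predicate="rdfs:subClassOf"):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def types_of(resource):
    """Asserted types plus those implied by the class hierarchy."""
    asserted = {o for _, _, o in match(subject=resource, predicate="rdf:type")}
    inferred = set()
    for cls in asserted:
        inferred |= superclasses(cls)
    return asserted | inferred


print(sorted(types_of("ex:HotelSol")))
```

The resource is only asserted to be an `ex:Hotel`; its membership of the broader classes follows from the hierarchy, which is the kind of hierarchical classification the simpler OWL sub-languages are aimed at.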

Examples of standard ontologies

There have been some research communities that have already tried to define standard ontologies that cover a particular area of knowledge in a generic way and that could thus be used in a standard way.

The CIDOC Conceptual Reference Model (CIDOC CRM)

The CIDOC CRM, a core ontology explaining the extended meaning of data structures from the humanities and cultural heritage, including the history of science, is the outcome of a long-term disciplined knowledge engineering activity which excels in its ontological commitment, i.e. acceptance of its constructs by domain experts.

The primary role of the CRM is to enable information exchange and integration between heterogeneous sources of cultural heritage information (Doe, 03). It aims at providing the semantic definitions and clarifications needed to transform disparate, localized information sources into a coherent global resource within a larger institution, in intranets or within the Internet. More concretely, it defines, and is restricted to, the underlying semantics of database schemata and document structures used in cultural heritage and museum documentation in terms of a formal ontology.

The success of the CRM relies on the fact that common meaning can be explained by a very small set of primitive concepts and relations, in contrast to data structures, which suggest to the user what to say about an object. The relations in data structures that connect items directly through highly specific, diverse kinds of relationship can frequently be expressed by data paths composed of a few fundamental relationships defined within the core ontology.

The CIDOC CRM has become the most promising core element for realizing semantic interoperability in archives, libraries and museums through its capability to link the intellectual structure of highly diverse sources and products of scientific and scholarly discourse with the elements formally handled by information systems.

The CIDOC CRM is the culmination of over 10 years’ work by the CIDOC Documentation Standards Working Group and the CIDOC CRM SIG (Special Interest Group), which are working groups of CIDOC. Since 2006 it has been the official standard ISO 21127.

FRBRoo

The FRBRoo is a formal ontology intended to capture and represent the underlying semantics of bibliographic information and to facilitate the integration, mediation, and interchange of bibliographic and museum information. The FRBR model was originally designed as an entity-relationship model by a study group appointed by the International Federation of Library Associations and Institutions (IFLA).

The CIDOC CRM model has been developed since 1996 under the auspices of the ICOM-CIDOC (International Council of Museums – International Committee for Documentation) Documentation Standards Working Group. The idea that both the library and museum communities might benefit from harmonizing the two models was first expressed in 2000 and gained ground in the following years. Eventually it led to the formation, in 2003, of the International Working Group on FRBR/CIDOC CRM Harmonisation, which brings together representatives from both communities with the common goals of:

• expressing the IFLA FRBR model with the concepts, tools, mechanisms, and notation conventions provided by the CIDOC CRM, and
• aligning (possibly even merging) the two object-oriented models with the aim to contribute to the solution of the problem of semantic interoperability between the documentation structures used for library and museum information, such that:
   o all equivalent information can be retrieved under the same notions, and
   o all directly and indirectly related information can be retrieved regardless of its distribution over individual data sources;
   o knowledge encoded for a specific application can be repurposed for other studies;
   o recall and precision in systems employed by both communities is improved;
   o both communities can learn from each other’s concepts for their mutual progress;
   o for the benefit of the scientific and scholarly communities and the general public.

In 2006 a first draft of FRBRoo was completed. It is a logically rigid model interpreting the conceptualizations expressed in FRBRer and the concepts necessary to explain the intended meaning of all FRBRer attributes and relationships. The model is formulated as an extension of the CIDOC CRM. Any conflicts occurring in the harmonization process with the CIDOC CRM have been or will be resolved on the CIDOC CRM side as well. The Harmonization Group intends to continue its work by modelling the FRAR concepts and elaborating the application of FRBR concepts to the performing arts.

HarmoNET

The Harmonisation Network for the Exchange of Travel and Tourism Information, HarmoNET, is an international network bringing together people and organizations with an interest in the topic of harmonization and seamless information exchange in travel and tourism. HarmoNET provides unique technologies and services enabling an easy, affordable and fast information exchange.

The travel and tourism industry is an information-based business in which information exchange is essential in order to maintain a dynamic market. HarmoNET aims to create for its members an international network for harmonization and seamless data exchange in the travel and tourism industry. HarmoNET does not implement a new standard, rather it provides the means for an effective data mediation process.

HarmoNET offers the following services:

• Ontology Management: HarmoNET provides and maintains a tourism specific ontology as a common definition of concepts and terms, their meaning and relations between them. This ontology serves as a common agreement for the HarmoNET mediation service as well as a reference model for building specific data models or tourism information systems.
• Mediation Service: The HarmoNET mediation service provides a technical solution to the interoperability problem. Heterogeneous data are mapped from the local format on the one side to the format on the other side.
• Community Services: In order to build a strong community and foster the communication and information exchange within the community HarmoNET offers online community services like mailing lists, discussion fora, newsletter or bulletin boards as well as traditional community services like conferences, workshops and seminars, which will allow the community to meet together and to further work on the definition of the HarmoNET ontology.
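The general idea of mediating between two local formats through a shared reference model can be sketched as follows. This is a simplified Python illustration and not the HarmoNET mediation service itself: all field names, concept names and function names are hypothetical.

```python
# All field and concept names below are hypothetical; they are not taken
# from the HarmoNET ontology or its mediation service.

# Each partner maps its local field names to shared reference concepts once.
SYSTEM_A_TO_REFERENCE = {"hotel_name": "Accommodation.name", "town": "Address.city"}
SYSTEM_B_TO_REFERENCE = {"Unterkunft": "Accommodation.name", "Ort": "Address.city"}


def to_reference(record, mapping):
    """Lift a local record to the shared reference concepts."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def from_reference(reference_record, mapping):
    """Lower a reference-level record into a partner's local format."""
    inverse = {concept: key for key, concept in mapping.items()}
    return {
        inverse[concept]: value
        for concept, value in reference_record.items()
        if concept in inverse
    }


def mediate(record, source_mapping, target_mapping):
    """Translate a record between two local formats via the reference model."""
    return from_reference(to_reference(record, source_mapping), target_mapping)


record_a = {"hotel_name": "Hotel Sol", "town": "Madrid"}
record_b = mediate(record_a, SYSTEM_A_TO_REFERENCE, SYSTEM_B_TO_REFERENCE)
print(record_b)
```

Neither partner needs to know the other's format; each only maintains its own mapping to the reference concepts, so no new standard has to be imposed on the local systems.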

SUO

Recognizing both the need for large ontologies and the need for an open process leading to a free, public standard, a diverse group of people has come together to make such a standard a reality. The Standard Upper Ontology (SUO) will be an upper level ontology that provides definitions for general-purpose terms and acts as a foundation for more specific domain ontologies.

It is estimated to contain between 1000 and 2500 terms plus roughly ten definitional statements for each term.

• The standard will be suitable to support knowledge-based reasoning applications.
• This standard will enable the development of a large (20 000+) general-purpose standard ontology of common concepts, which will provide the basis for middle level domain ontologies and lower-level application ontologies;
• The ontology will be suitable for “compilation” to more restricted forms such as XML or database schemata. This will enable database developers to define new data elements in terms of a common ontology, and thereby gain some degree of interoperability with other compliant systems.
• Owners of existing systems will be able to map existing data elements just once to a common ontology, and thereby gain a degree of interoperability with other representations that are compliant with the SUO.
• Domain-specific ontologies that are compliant with the SUO will be able to interoperate (to some degree) by virtue of the shared common terms and definitions.
• Applications of the ontology will include:
   o e-commerce applications from different domains which need to interoperate at both the data and semantic levels;
   o educational applications in which students learn concepts and relationships directly from, or expressed in terms of, a common ontology. This will also enable a standard record of learning to be kept;
   o natural language understanding tasks in which a knowledge-based reasoning system uses the ontology to disambiguate natural language terms and structures.
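The benefit of mapping existing data elements “just once” to a common ontology can be quantified with a small worked example. The Python sketch below assumes that each mapping is counted once regardless of direction; the function names are hypothetical.

```python
def pairwise_mappings(n_systems):
    """Mappings needed when every pair of systems is mapped directly."""
    return n_systems * (n_systems - 1) // 2


def hub_mappings(n_systems):
    """Mappings needed when each system maps once to a common ontology."""
    return n_systems


for n in (2, 5, 20):
    print(n, pairwise_mappings(n), hub_mappings(n))
```

Under this assumption, the number of direct mappings grows quadratically with the number of systems, while the number of mappings to a common ontology grows linearly; for two systems a direct mapping is still the cheaper option.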

SOUPA

Standard Ontology for Ubiquitous and Pervasive Applications (SOUPA) is designed to model and support pervasive computing applications. This ontology is expressed using the Web Ontology Language OWL and includes modular component vocabularies to represent intelligent agents with associated beliefs, desires, and intentions, time, space, events, user profiles, actions, and policies for security and privacy. SOUPA can be extended and used to support the applications of CoBrA, a broker-centric agent architecture for building smart meeting rooms, and MoGATU, a peer-to-peer data management for pervasive environments.

6.3.3 Gaps and future needs

Ontologies are still neither flexible nor extensible enough. The tourism sector could be partially covered by some concepts; however, extending the initial ontology to cover new concepts would require a relatively large (manual) effort. Due to the heterogeneity of the travel and tourism industry, it is a challenge for a single ontology to cover the whole market offer, so the ontology management process could become too complicated.

For tourism companies to adopt either a standard or an ontology mediation process, they need to feel able to differentiate their offer within the market. Standards and ontologies sometimes tend to bury distinctive features of products and services.

6.3.4 Recommendations

Short-term recommendations (1–3 years)

• Use recognized standard reference models such as the Harmonise ontology (for tourism purposes) or CIDOC CRM (for cultural heritage data) wherever possible.
• Produce guidelines for the mappings between eTourism-related ontologies based on standard reference models.
• Use established standards such as RDF(S), OWL or the Topic Map Constraint Language to express ontologies.
• Heighten the awareness of Open Source, user-friendly tools for ontology definition such as Protégé.

Long-term recommendations (3–10 years)

• Build ontologies to represent other standards, e.g. IATA.
• Build tools to automatically map ontologies.
• Work on automatic ontology (re)structuring and population.

7 Data transformation

7.1 Structured data mapping

7.1.1 Needs and requirements

Introduction

The so-called information society demands complete access to available information, which is most of the time distributed and heterogeneous. First, a suitable information source that potentially contains data of interest must be located. Then access to the data contained in the information source has to be provided, i.e. the information source and the querying system need to understand each other in order to effectively retrieve the particular piece of information of interest.

In order to establish comprehensive information sharing and to achieve efficient interoperability of information systems, various kinds of solutions have been made available. Within this range of possible approaches, ontologies have been shown to play an important role in resolving semantic heterogeneity among information sources by providing a shared understanding of a given domain of interest, e.g. the travel and tourism industry.

Information sources may contain information on different levels of organization: data may be structured in databases, semi-structured in XML documents or completely unstructured as web pages or other types of documents. Regardless of its origin, data has to be mapped to an ontology if the objective is to achieve interoperability of the local system with some other system. In this clause the mapping between an information source (e.g. a database, an XML file, etc.) and an ontology is reviewed.

The first step in a mapping process is to relate ontologies to the actual contents of an information source. Ontologies may relate to the database schema but also to single terms used in the database or data structure. Regardless of this distinction, we can observe different general approaches used to establish a connection between ontologies and information sources:

• Structure Resemblance: A straightforward approach to connecting the ontology with the structured data source is to simply produce a one-to-one copy of the structure and encode it in a language that makes automated reasoning possible. The integration is then performed on the copy of the model and can easily be traced back to the original data.
• Definition of Terms: In order to make the semantics of terms in a database schema clear, it is not sufficient to produce a copy of the schema. There are approaches that use the ontology to further define terms from the database or the database schema. These definitions do not correspond to the structure of the database; they are only linked to the information by the term that is defined. The definition itself can consist of a set of rules defining the term. However, in most cases terms are described by concept definitions.
• Structure Enrichment: This is the most common approach to relating ontologies to information sources. It combines the two previously mentioned approaches. A logical model is built that resembles the structure of the information source and contains additional definitions of concepts.
• Meta-Annotation: A rather new approach is the use of meta-annotations that add semantic information to an information source. This approach is becoming prominent with the need to integrate information present in the World Wide Web, where annotation is a natural way of adding semantics. We can further distinguish between annotations resembling parts of the real information and approaches avoiding redundancy.

Needs

Using information systems in the travel and tourism industry implies using information coming from different data sources. An information system works in cooperation with other systems, and information coming from various data sources may be needed to provide a particular service to a client.

Mapping is a very critical operation in various application domains such as semantic web, schema or ontology integration, data integration, data warehouses, eCommerce, etc. As it has been mentioned in previous clauses, eCommerce activities are crucial in the eTourism domain. Focussing on mappings three different kinds can be distinguished:

• Schema mapping: Mappings are established between schemas of databases. This method takes two database schemas as an input and produces a mapping between elements of the two schemas that correspond to each other.
• Ontology mapping: Ontology mapping is somewhat similar to schema mapping. In this case, the purpose of the mapping is to relate the vocabularies of two ontologies that share the same domain of discourse.
• Database-to-Ontology mapping: This is the process through which a structured data source and an ontology are semantically related at a conceptual level, i.e. relationships are set up between the ontology and data source components.

The approach to be taken requires the creation of a mapping description using some kind of formal language that maintains the level of formality and expressivity of both the ontology and the database. The document containing the description of the mappings has to show the correspondences between the components of the database’s SQL schema and those of the ontology. Afterwards, the ontology needs to be populated through the mappings that have been made explicit in the document. The process ought to be as automatic as possible in order to minimise human effort.

In order to do this, languages to define mappings are needed. These languages have to have the following features:

• They have to be fully declarative in order to efficiently define and describe mappings between relational database schemas and ontologies. They have to be expressive enough to define the semantics of the mappings.
• The language ought to define how to create instances in the ontology in terms of the data stored in the database.
• The language needs to be declarative so that inconsistencies and ambiguities in the definition of a mapping can be discovered. These potential problems have to be discovered automatically by the mapping language.
• The mapping definition language could potentially be used to automatically characterize data sources to allow dynamic query distribution in intelligent information integration approaches.
• The mapping definition language does not have to declare the degree of similarity between database elements and ontology components. Rather, it has to state under which conditions and after what transformations the database elements are equivalent to the ontology components.
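The features listed above can be illustrated with a minimal sketch. It is not one of the mapping languages reviewed in this document; the classes ClassMap and PropertyMap, the table "hotel" and the "ex:" terms are all hypothetical names chosen for illustration. The mapping is plain data (declarative), states a condition and per-column transformations, can be checked for simple inconsistencies, and is then used to create ontology instances from database rows.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class PropertyMap:
    """Maps one database column to one ontology property (hypothetical)."""
    column: str
    prop: str
    # Transformation after which the column value equals the property value.
    transform: Callable[[Any], Any] = lambda value: value


@dataclass(frozen=True)
class ClassMap:
    """Maps one database table to one ontology concept (hypothetical)."""
    table: str
    concept: str
    key: str
    properties: Tuple[PropertyMap, ...]
    # Condition under which a row is equivalent to an instance of the concept.
    condition: Callable[[Dict[str, Any]], bool] = lambda row: True


HOTEL_MAP = ClassMap(
    table="hotel",
    concept="ex:Hotel",
    key="id",
    properties=(
        PropertyMap("name", "ex:name"),
        PropertyMap("category", "ex:starRating", int),
        PropertyMap("price_cents", "ex:priceEUR", lambda cents: cents / 100),
    ),
    condition=lambda row: row["category"] is not None,
)


def check(mapping: ClassMap, columns: Iterable[str]) -> List[str]:
    """Report simple inconsistencies between a mapping and a table schema."""
    available = set(columns)
    problems = []
    for column in [p.column for p in mapping.properties] + [mapping.key]:
        if column not in available:
            problems.append(f"unknown column {mapping.table}.{column}")
    props = [p.prop for p in mapping.properties]
    for prop in sorted(set(props)):
        if props.count(prop) > 1:
            problems.append(f"property {prop} mapped more than once")
    return problems


def populate(mapping: ClassMap, rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Create ontology instances from the rows that satisfy the condition."""
    instances = []
    for row in rows:
        if not mapping.condition(row):
            continue
        instance = {"@id": f"{mapping.concept}/{row[mapping.key]}",
                    "@type": mapping.concept}
        for p in mapping.properties:
            instance[p.prop] = p.transform(row[p.column])
        instances.append(instance)
    return instances
```

Because the mapping is data rather than procedural code, the same description serves both the consistency check and the population step, which is the practical benefit of a declarative language.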

Requirements

Semantic conflicts occur whenever two contexts do not use the same interpretation of the information. Goh identifies three main causes for semantic heterogeneity that need to be overcome in order to achieve semantic interoperability [Goh, 1997]:

• Confounding conflicts occur when information items seem to have the same meaning, but differ in reality, e.g. owing to different temporal contexts.
• Scaling conflicts occur when different reference systems are used to measure a value. Examples are different currencies.
• Naming conflicts occur when naming schemes of information differ significantly. A frequent phenomenon is the presence of homonyms and synonyms.
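As a minimal sketch of how naming and scaling conflicts might be resolved against a shared vocabulary, consider the following. The field names, the synonym table and the conversion rate are assumptions made for illustration only (the rate is not a real exchange rate). Confounding conflicts are not addressed here, since they require knowledge of the context in which a value was recorded.

```python
from decimal import Decimal

# Naming conflicts: synonyms used by different sources are mapped to one
# shared term (hypothetical vocabulary).
FIELD_SYNONYMS = {
    "price": "rate",
    "tariff": "rate",
    "rate": "rate",
    "hotel_name": "name",
    "name": "name",
}

# Scaling conflicts: values measured in different reference systems are
# converted to one reference unit. The rate below is illustrative only.
TO_EUR = {"EUR": Decimal("1"), "CHF": Decimal("0.65")}


def normalise(record, currency):
    """Rename fields to shared terms and convert the rate to EUR."""
    out = {}
    for field, value in record.items():
        canonical = FIELD_SYNONYMS.get(field)
        if canonical is None:
            raise KeyError(f"no shared term for field {field!r}")
        out[canonical] = value
    out["rate"] = Decimal(str(out["rate"])) * TO_EUR[currency]
    out["currency"] = "EUR"
    return out
```

Two sources that describe the same offer as {"hotel_name": ..., "tariff": ...} in CHF and {"name": ..., "price": ...} in EUR become directly comparable after normalisation.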

The use of ontologies for the explication of implicit and hidden knowledge is a possible approach to overcome the problem of semantic heterogeneity. With respect to the impact on the data exchange, structuring conflicts can be differentiated:

• fully mappable: all clashes can be resolved without any loss of information;
• partially or non-mappable: covering the structural conflicts for which any conceivable transformation will cause a loss of information.

The following are some examples of clashes that have been identified between different standards [Dell’Erba, Fodor, Höpken, et al, 2005]:

• Different naming: Equivalent concepts have different names in different standards. This is a fully mappable semantic clash.
• Different position: Equivalent concepts have different positions within the structure of the standards. This is also a fully mappable semantic clash.
• Different scope of concepts: Concepts containing the same piece of information in different standards have different scopes, i.e. the same piece of information might be represented as a single concept or as a part of several concepts. This is also a fully mappable semantic clash.
• Different abstraction levels: The same information is represented on different levels of abstraction. This is a partially mappable semantic clash.
• Different granularity: The same information is represented on different levels of granularity. This is a partially mappable semantic clash.
• Missing concept: If a concept in one standard has no counterpart in the other standard, it cannot be mapped.
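The distinction between fully and partially mappable clashes can be made concrete with a round-trip test: a clash is fully mappable for a set of sample records if translating them to the other representation and back restores them unchanged. The sketch below is only an illustration under that assumption; the record layouts and the star-to-class rule are hypothetical and are not taken from any of the standards discussed.

```python
def is_lossless(forward, backward, samples):
    """True if every sample survives a round trip forward and back again."""
    return all(backward(forward(sample)) == sample for sample in samples)


# Different naming: the same concept under two names (fully mappable).
def rename(record):
    return {"HotelName": record["name"]}


def unrename(record):
    return {"name": record["HotelName"]}


# Different granularity: a five-level star rating versus a two-level class.
# Any mapping from stars to classes discards information (partially mappable).
def stars_to_class(record):
    return {"class": "luxury" if record["stars"] >= 4 else "standard"}


def class_to_stars(record):
    return {"stars": 5 if record["class"] == "luxury" else 3}


NAMING_OK = is_lossless(rename, unrename, [{"name": "Alpenhof"}])
GRANULARITY_OK = is_lossless(
    stars_to_class, class_to_stars, [{"stars": s} for s in range(1, 6)]
)
```

In this sketch NAMING_OK is true and GRANULARITY_OK is false, since four of the five star ratings cannot be recovered from the coarser class.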

Most current approaches to solving the interoperability problem are based on the idea of fixed, obligatory standards, which define all details of the exchanged messages. An example of an international XML-based standard is the OTA specification [OTA]. Companies that use such standards are automatically able to exchange information with each other. However, all details of the exchanged messages must be agreed among all communication participants. Defining and maintaining such standards requires a lot of effort, and such standards are therefore used almost exclusively by large companies such as hotel chains, airline companies and Global Distribution Systems (GDS).

7.1.2 State of the art

This section presents the state of the art on structured-to-ontology mapping from a database perspective. The same concepts hold for any other structured data source, such as XML data structures.

There are different mapping situations arising from database-to-ontology mapping. A database-to-ontology mapping can be defined as a set of correspondences that relate the vocabulary of a relational database schema with that of an ontology. That is, we want to relate a database’s tables, columns, primary and foreign keys, etc., with an ontology’s concepts, relations, attributes, etc.

There are several approaches in the literature that address database-to-ontology mapping. In general, they can be classified into two main categories: approaches that create a new ontology from a database and approaches that map a database to an already existing ontology.

• Creating an ontology from a database: This approach creates an ontology model from a relational database model and migrates the contents of the database to the generated ontology. The mappings here are simply the correspondences between each created ontological component (class, property, etc.) and its original database component (table, column). Mappings in this case are usually not very complex and the process can be automated to a high degree. However, this kind of direct mapping may fail to express the full semantics of the database domain. The creation of an ontology structure may require discovering hidden semantics implicitly expressed between database components (e.g. referential constraints) and taking them into account in the ontology building process.
• Mapping a database to an already existing ontology: This approach creates links between the two or populates the ontology with database content. Mappings in this case are far more complex, as different levels of overlap between the database domain and that of the ontology can be found. The two domains do not necessarily coincide, as the criteria used to design databases and those used to design ontologies are different.

Both mapping approaches comprise two processes:

• mapping definition, i.e. the definition of the mapping from the database structure (schema) to the ontology structure, and
• data migration, i.e. the migration of database content to instances of the ontology.

Volz et al [Volz, Handschuch, Staab, Studer, 2004] [Volz, Stojanovic, Stojanovic, 2002] propose an approach based on the semi-automatic generation of an F-Logic ontology from a relational database model. Mappings are defined between the database and the generated ontology. The ontology generation process takes into account different types of relationships between database tables and maps them to suitable relations in the ontology. The mapping process is not completely automatic: when several rules could be applied, user intervention is needed to choose the most suitable one.

Each table is transformed to a class and each attribute is transformed to a property. In addition, if the relational database table has foreign key references to other tables, these can be transformed to instance pointers, i.e. a new slot is added to the class representing the referencing table, and its value is an instance of the class representing the referenced table. The user manually selects the tables to be mapped to the ontology; the mapping process is then run in a completely automatic manner.
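The transformation rules just described (table to class, attribute to property, foreign key to instance pointer) can be sketched as follows. This is a simplified illustration and not the implementation of any of the cited approaches; the schema dictionary, the function names and the output layout are assumptions made for the example.

```python
# Hypothetical relational schema: table -> columns and foreign keys.
SCHEMA = {
    "hotel": {
        "columns": ["id", "name", "city_id"],
        "foreign_keys": {"city_id": "city"},  # column -> referenced table
    },
    "city": {
        "columns": ["id", "name"],
        "foreign_keys": {},
    },
}


def class_name(table):
    """Derive a class name from a table name (illustrative convention)."""
    return table.capitalize()


def generate_ontology(schema, selected_tables):
    """Apply the three rules to the tables selected by the user."""
    ontology = {}
    for table in selected_tables:
        spec = schema[table]
        datatype_properties = []
        object_properties = {}
        for column in spec["columns"]:
            target = spec["foreign_keys"].get(column)
            if target is not None and target in selected_tables:
                # Foreign key: slot whose value is an instance of the
                # class representing the referenced table.
                object_properties[column] = class_name(target)
            else:
                # Ordinary attribute: plain property of the class.
                datatype_properties.append(column)
        ontology[class_name(table)] = {
            "datatype_properties": datatype_properties,
            "object_properties": object_properties,
        }
    return ontology
```

In this sketch a foreign key that points to a table the user did not select is kept as an ordinary property, which is one possible design choice among several.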

Relational.OWL [de Laborda, Conrad, 2005] is an OWL ontology representing abstract schema components of relational databases. Based on this ontology, the schema of (virtually) any relational database can be described and in turn be used to represent the data stored in that specific database. This approach uses the meta-modelling capabilities of OWL-Full, which prevents the use of decidable inference on the resulting ontology.

The definition of mappings is automatic or semi-automatic in the approaches that create a new ontology, whereas no approach allows the completely automatic definition of mappings to an already existing ontology. On the other hand, the process of ontology population is always automatic. The approaches that create a new ontology use the massive dump process for ontology population, except for the DB2OWL approach, which allows the query-driven process.
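The two population processes can be contrasted in a short sketch. It assumes the usual reading of the terms: a massive dump migrates all database content to ontology instances in one batch, whereas a query-driven process only translates the data needed to answer a given query. The table, its contents and the function names are hypothetical, and the code does not reproduce DB2OWL or any other cited tool.

```python
import sqlite3


def make_db():
    """Create a small in-memory database with illustrative content."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hotel (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO hotel VALUES (?, ?, ?)",
        [(1, "Alpenhof", "Innsbruck"), (2, "Seeblick", "Luzern")],
    )
    return conn


def to_instance(row):
    """Turn one database row into one instance of the concept Hotel."""
    return {"@type": "Hotel", "id": row[0], "name": row[1], "city": row[2]}


def massive_dump(conn):
    """Migrate every row in one batch, whether or not it is ever queried."""
    cursor = conn.execute("SELECT id, name, city FROM hotel ORDER BY id")
    return [to_instance(row) for row in cursor]


def query_driven(conn, city):
    """Translate a query about hotels in a city into SQL and create
    instances only for the matching rows."""
    cursor = conn.execute(
        "SELECT id, name, city FROM hotel WHERE city = ? ORDER BY id", (city,)
    )
    return [to_instance(row) for row in cursor]
```

The dump yields a self-contained copy that can go stale when the database changes; the query-driven variant always reflects current data but depends on the database being reachable at query time.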

7.1.3 Gaps and future needs

Although a lot of effort has already been invested in ontology research (concept, methods, building, theory, etc.) and (commercial) application building, general mapping processes are still in their infancy. There is a clear notion of what a mapping is; however, the real semantics and expressiveness of the links themselves have not yet been clearly defined.

Most of the mappings have been defined ad-hoc, i.e. for particular cases and are neither reusable nor extensible to other cases. Besides, should changes occur within databases, the whole mapping and even ontology would have to be redefined in order to cover new concepts and relations.

The literature review has shown a number of languages that have been used to map databases to ontologies. However, there is no evidence of any language that links (maps) ontology components to database elements.

Creating mappings still requires a lot of human intervention. Although graphical interfaces have been created (as in the case of R2O), mapping work is still largely manual. How much effort is needed depends on the level of formality and expressivity with which information is represented and stored in databases. One possible way to automate the creation of mappings to a certain degree could be to recommend building ontologies with existing standard languages. Ontologies could then be compared, as they would have the same degree of expressivity and formality.

7.1.4 Recommendations

Short-term recommendations (1–3 years)

• Use (graphical) mediation tools equipped with reasoning capabilities that automatically suggest semantically equivalent data sources, identify inconsistencies and decrease the amount of human intervention in the mapping process.
• Pursue the design and implementation of new data resources on the basis of agreed recommendations, such as the W3C recommendations for Semantic Web technologies.

Long-term recommendations (3–10 years)

• Use semantic web technologies (e.g. based on RDF URIs) to name and represent (data) resources on the Web so that mapping can be undertaken automatically.
• Agree on the degree of formality with which information ought to be defined, so that automatic mapping tools compare the same kind of information.
• Foster high-level general ontologies to describe particular domains of interest so that low-level, more concrete ontologies can later be linked or merged within the (more general) structure (if and only if both ontologies are defined with the same level of formality and with the same ontology definition language).

7.2 Manual semantic annotation

Semantic Annotation is about attaching meaningful (information) structures to information resources such as documents, general multimedia content or information on the Web in such a way that they can be used by computers in a meaningful way to enhance the usefulness of those resources. Semantic Annotation formally identifies concepts and relations between concepts in documents, and is intended primarily for use by machines.

Information about documents and information sources has traditionally been managed through the use of metadata. Metadata is simply information concerning a particular source of information: author, date, origin, content, type of file, etc. Within the context of the Semantic Web (as defined by Tim Berners-Lee), annotating document content using semantic information from domain ontologies is proposed [Berners-Lee, 2001]. The result of (manually) annotating a Web information resource is Web pages with machine-interpretable mark-up that provide the source material with which agents, Semantic Web services and advanced search engines operate. The goal is to create annotations with well-defined semantics.

The amount of tourism information on the Web is huge and the diversity of its nature is also vast. Furthermore, recent studies have shown that decisions of tourists about their potential destinations are increasingly influenced by multimedia and web-based content and comments generated by other tourists. Besides, tourists have begun to share their experiences on the web in the so-called Web 2.0 phenomenon, and a tremendous number of web pages have been created by tourists and final users. Even destination management organizations are beginning to include user-generated content in their own web sites as a way to promote their destination.

All of this information (usually non-structured) has to be made available to the general public, i.e. metadata about that information has to be created in order to make that information reachable on the Internet.

7.2.1 Needs and requirements

For the sake of data interoperability and exchange, well-defined semantics are a must to ensure that annotator and annotation consumer actually share meaning. A key contribution of the Semantic Web is therefore to provide a set of worldwide standards and recommendations on manual annotation. These recommendations make it possible to operate with heterogeneous resources by providing an intermediation of common syntax, methods, semantics and understanding.

Travel and tourism is a leading industry in the application of B2C and B2B2C eCommerce and mCommerce solutions as well as Web-based information channels, and a huge number of tourism information systems have been developed in order to support all the processes related to the electronic market. If the objective is to automate eBusiness processes over the Web with no human intervention, allowing machines to interoperate automatically, information sources must be annotated so that a mediation ontology can integrate information coming from heterogeneous systems.

Therefore, in order for the tourism industry to succeed, new ways of data and content annotation have to be developed so that a particular piece of information can be used by a particular machine for a particular business process, allowing a vertical data integration approach to the tourism market.

Semantic annotation is required as it brings enhanced information retrieval and improved interoperability among systems. Information retrieval, mostly related to search on unstructured data sources, is improved by the ability to perform searches that exploit the ontology to make inferences about data from heterogeneous resources [Welty, 1999].

According to “Semantic annotation for knowledge management: requirements and a survey of the state of the art” [Uren, 2005], there are six requirements for semantic annotation:

• Standard formats: Standards can provide a bridging mechanism that allows heterogeneous resources to be accessed simultaneously and collaborating users and organizations to share annotations. Two standards can be mentioned: OWL for describing ontologies and RDF for annotation schemas;
• User centred: Easy-to-use interfaces that ease the task of annotating documents;
• Ontology support: Annotation tools need to support multiple ontologies;
• Support of heterogeneous document formats;
• Document evolution: Keep documents and annotations consistent;
• Annotation storage: There are different storage criteria. Some argue that annotations ought to be stored separately from the documents, while others argue that annotations are an integral part of the document and should therefore be stored together with it.
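Two of these requirements, annotation storage and document evolution, can be illustrated with a minimal sketch of annotations that are stored separately from the document. The Annotation record, the concept identifier "ex:Sight" and the use of a content hash to detect changed documents are assumptions made for this example; they are not prescribed by [Uren, 2005] or by any of the tools reviewed in this clause.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """An annotation kept apart from the document it describes."""
    doc_sha256: str  # fingerprint of the document version that was annotated
    start: int       # character offset where the annotated span begins
    end: int         # character offset where the annotated span ends
    concept: str     # identifier of an ontology concept (hypothetical)


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def annotate(text: str, phrase: str, concept: str) -> Annotation:
    """Link the first occurrence of a phrase to an ontology concept."""
    start = text.index(phrase)
    return Annotation(digest(text), start, start + len(phrase), concept)


def resolve(text: str, annotation: Annotation) -> Optional[str]:
    """Return the annotated span, or None if the document has changed
    since it was annotated and the offsets can no longer be trusted."""
    if digest(text) != annotation.doc_sha256:
        return None
    return text[annotation.start:annotation.end]
```

Storing annotations separately leaves the document untouched, at the price of having to detect when document and annotation have drifted apart; embedding annotations in the document avoids the drift but requires write access to the source.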

7.2.2 State of the art

There are a number of tools that produce semantic annotations, i.e. annotations that refer to a particular ontology. These tools meet some of the requirements above; however, they need further development.

Manual annotation tools allow users to manually create annotations, i.e. metadata about a particular information source. These tools are in general terms relatively similar to those used for pure textual annotations, but differ in the sense that they provide some support for ontologies.

The following list contains some of the most relevant annotation tools found in the literature:

• Amaya [Quint, 1994] is a Web browser and editor that marks up Web documents in XML or HTML. The user can make annotations in the same tool that he or she uses for browsing. It facilitates manual annotation of Web pages but does not support automatic annotation;
• The Annozilla browser aims to make all Amaya annotations readable in the Mozilla browser;
• The Mangrove system is another example of a manual but user-friendly annotation tool [McDowell, 2003]. The annotation tool is a straightforward GUI that allows users to associate a selection of tags with text that they highlight;
• Due to the increase of multimedia content on the Web, tools to annotate this kind of content have become very useful. Vannotea [Schroeter, 2003] can be used to add metadata to MPEG-2 (video), JPEG 2000 (image) and Direct 3D (mesh) files, with the mesh being used to define regions of images;
• OntoMat Annotizer: This is a tool for making annotations which is built on the principles of the CREAM framework. It has a Web browser to display the page which is being annotated and provides some reasonably user-friendly functions for manual annotation, such as drag-and-drop creation of instances and the ability to mark up pages while they are being created;
• The M-OntoMat-Annotizer [Bloehdorn, 2005] supports manual annotation of image and video data by indexers with little multimedia experience through automatic extraction of low-level features that describe objects in the content. A commercial version of OntoMat, called OntoAnnotate, is available from Ontoprise;
• SHOE Knowledge Annotator [Heflin, 2001] was an early system which allowed users to mark up HTML pages in SHOE guided by ontologies available locally or via a URL. Users were assisted by being prompted for inputs. Unusually, the SHOE Knowledge Annotator did not have a browser to display Web pages, which could only be viewed as source code.

7.2.3 Gaps and future needs

Although most annotation tools are based on GUIs that are easy to understand and use, annotating information sources is still relatively expensive. There is a need for integrated systems that allow users to deal with the documents, ontologies and the annotations that link documents to ontologies.

The most important challenge in manual annotation tools is automation – automation to support annotation, to support ontology maintenance and automation to help maintain the consistency of documents, ontologies and annotations [Uren, 2005].

Other important challenges for the future in this active research area are: automating the annotation of information of various formats, addressing issues of trust and security and resolving problems of storage.

7.2.4 Recommendations

Due to the nature of this topic, there can be some overlapping of recommendations with other issues that have already been covered, such as ontologies.

Short-term recommendations (1–3 years)

• Enhance the use of standard ontologies (e.g. Harmonise) in the field of tourism.
• Enhance the development of ontologies with standard languages: OWL, RDF.
• Enhance the use of already existing manual annotation tools in the realm of tourism.

Long-term recommendations (3–10 years)

• Investigate the automation of annotations.
• Investigate automatic ontology extension.

7.3 Automatic information extraction

7.3.1 Needs and requirements

Much of the data relevant for eTourism is available on normal public web sites. Just as tourism itself is a wide-ranging concept, data pertinent to it can stem from many sources, touristic and otherwise. Local communities often provide ample information about points of interest in their area. Theatres and orchestras regularly publish their programmes, museums inform about opening hours and ticket prices, and hotels frequently provide information about their services that complements the basic facts that are stored in major booking systems.

These scenarios are only a few examples of many. Actually, the amount of information that is stored in this way probably vastly surpasses that in structured sources. As Martin Hepp, Katharina Siorpaes, et al have analyzed, structured and unstructured data complement each other in many cases, e.g. for hotels where web sites frequently contain more complete descriptions of the hotel, while the GDSs only publish the room availability.

Normally, however, the data on the web is unstructured and geared towards human consumption only. Only rarely do metadata or formal resource descriptions reliably complement and explicate this unstructured information to facilitate its use in automated transactions or automated integration with structured resources. It seems unlikely that this situation is going to improve fundamentally over the next years.

The unstructured nature of the data invariably limits its reuse in electronic transactions. Based on this type of information it will be difficult at best to, e.g., automatically complement a hotel booking with the reservation of museum and theatre tickets.

Needs

Nonetheless, as long as there is no prospect of fundamentally reversing the present situation, companies need to leverage the currently existing data as well as possible.

Requirements

In an ideal world, information extraction would structure free text in such a way that it can be automatically analyzed, queried and integrated with structured data sources. This is certainly illusory for the foreseeable future. Nevertheless, it is necessary to explore the potential of the various facets of information extraction for the eTourism domain.

7.3.2 State of the art

Information extraction is “the automatic identification of selected types of entities, relations, or events in free text” [Grishman, 2003]. Unstructured information is thus automatically structured and usually imported into databases, XML files or other structured storage formats for subsequent analysis and evaluation.

Information extraction is by no means restricted to web sites. In fact, information extraction was originally popularized in the 1980s based on locally stored free text corpora. However, many of today’s applications incorporate the harvesting of information on the web. This is certainly also the more applicable scenario for eTourism.

Currently the two branches of information extraction that have drawn most attention in the research community are named entity recognition – the explication of references to persons, organizations, places, etc. – and event extraction; the latter, e.g., practiced in projects such as JRC’s EMM Violent Events Maps that are automatically compiled from published news feeds. Both are pertinent to eTourism. Furthermore, some research has been done on information extraction specific to eTourism.

Named entity recognition

Named entity recognition is by now a rather well understood topic with wide applications both across many fields – computational linguistics, computational philology and related disciplines, even genetics – and across many languages. Approaches for name taggers often build either on hand-crafted rules – good classifiers can reach a precision well above 90 % for English language material (cf. Grishman, 2003, note 3) – or machine learning technologies including automated learning and statistical model building. Both maximum entropy [Borthwick, 1999] and Hidden Markov [Bikel, Miller, Schwartz, Weischedel, 1997] models have been trained using tagged reference materials. The models have then been successfully applied to untrained material, reaching again precision levels above 90 % for new material.

Various readily available tools implement named entity recognition. The ANNIE package of the open source GATE suite contains resources such as tokenizers, gazetteers and semantic taggers to build rule-based named entity resolvers. Many other open source or commercial offerings are listed in http://en.wikipedia.org/wiki/Named_entity_recognition.
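The rule-based style of named entity recognition can be illustrated with a deliberately small sketch that combines a gazetteer lookup with one hand-crafted pattern. It does not use GATE or ANNIE and does not reflect their APIs; the gazetteer entries, the labels and the time pattern are assumptions made for this example, and overlapping matches are not resolved.

```python
import re

# Hypothetical gazetteer: known names and their entity types.
GAZETTEER = {
    "Innsbruck": "LOCATION",
    "Vienna": "LOCATION",
    "Tyrolean State Theatre": "ORGANIZATION",
}

# One hand-crafted rule: clock times such as 19:30.
TIME_RULE = re.compile(r"\b\d{1,2}:\d{2}\b")


def tag_entities(text):
    """Return (surface form, label) pairs in order of appearance."""
    found = []
    for name, label in GAZETTEER.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((match.start(), match.group(), label))
    for match in TIME_RULE.finditer(text):
        found.append((match.start(), match.group(), "TIME"))
    return [(surface, label) for _, surface, label in sorted(found)]
```

For the sentence "The Tyrolean State Theatre in Innsbruck opens at 19:30." the sketch tags the theatre as an organization, Innsbruck as a location and 19:30 as a time. Names missing from the gazetteer are simply not found, which is why practical systems add contextual rules or statistical models.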

Event extraction

Whereas named entity recognition is a rather well understood topic, event extraction is somewhat more experimental and by necessity more closely bound to the type of events that are supposed to be extracted. A given event type is usually captured according to a given template – essentially a database table or a set of formal assertions – whose valencies are filled from entities that are isolated in the free text. As a rule named entity recognition is a part of this explication process as named entities frequently occur in the description of events.

To illustrate this situation, a typical example of an event description from the Wall Street Journal of 1993-02-19 may help. This example is lifted directly from GATE Information Extraction:

• New York Times Co. named Russell T. Lewis, 45, president and general manager of its flagship New York Times newspaper, responsible for all business-side activities. He was executive vice president and deputy general manager. He succeeds Lance R. Primis, who in September was named president and chief operating officer of the parent.

Ideally, event extraction might automatically capture the series of events implied in this article according to a job-related template with fields such as organization, job title, newly appointed person, and previous job holder. In reality this is often highly non-trivial, as exemplified by the number of anaphoric references (“he”, “who” and “the parent”), the need for inference (Primis obviously was the previous job holder and has now been promoted) and the amount of encyclopaedic knowledge (New York Times Co. is the holding company for the newspaper) needed for interpreting even this short and seemingly simple news bulletin.
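The job-related template described above can be sketched as a simple record type. The class and field names below are hypothetical and do not reproduce the data model of GATE or any other tool; the slot values are filled in by hand from the quoted bulletin to show what a successful extraction would have to deliver.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SuccessionEvent:
    """Hypothetical template for a job-succession event."""
    organization: Optional[str] = None
    job_title: Optional[str] = None
    appointee: Optional[str] = None
    predecessor: Optional[str] = None

    def unfilled_slots(self):
        # Slots the extractor could not fill from the text.
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Slot values transcribed manually from the example bulletin.
events = [
    SuccessionEvent(
        organization="New York Times newspaper",
        job_title="president and general manager",
        appointee="Russell T. Lewis",
        predecessor="Lance R. Primis",  # requires resolving "He succeeds ..."
    ),
    SuccessionEvent(
        organization="New York Times Co.",  # requires resolving "the parent"
        job_title="president and chief operating officer",
        appointee="Lance R. Primis",
        # The bulletin does not name the previous holder of this post.
    ),
]

for event in events:
    print(event.appointee, "->", event.job_title, "| missing:", event.unfilled_slots())
```

The second record shows that a template may legitimately remain partly empty when the source text does not state a value.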

Unsurprisingly, results tend to be better if the source material already follows some recurrent pattern, as is the case, e.g., for many job postings or medical records, but also, interestingly, for news articles on violent events such as bombings or earthquakes.

The number of readily available tools for event extraction is smaller than that for named entity recognition, and they need to be heavily tailored for any given type of event extraction and template. One example of such a tool is the open source GATE Information Extraction package. Commercial offerings include the OpenCalais suite of web services.

Tourism-specific information extraction

Information extraction for tourism-specific data necessarily has to deal with a number of different types of events such as performances, sports events, entries from event calendars, etc. Each of these can have its own display rules and needs its own templates. Furthermore, pertinent data is regularly spread across many sources in many different languages, so extraction must support parsing from many languages. Ideally, the extracted data should then be stored in language-independent templates based on language-neutral concept hierarchies. For end-user consumption the templates must be rendered in various languages, ideally fully automatically. The FP4 project on Multilingual Information Extraction for Tourism and Travel Assistance (MIETTA, 1998–2000) already worked on precisely these issues. Xu, Netter and Stenzhorn elaborate on two event types, adult education courses and theatre performances, and describe the MIETTA system developed in the project. Unfortunately, they do not publish any data on the reliability of the system obtained by testing the extracted information against manually captured data, as would have been normal. Such data would obviously be a precondition for gauging the viability of the project’s approach.

A brief follow-up project on a Multilingual Information Environment for Travel and Tourism Applications (MIETTA-II) was funded under FP5 from 2001 to 2002. Its primary goal was to commercialize the findings of MIETTA. Unfortunately, there is very little information publicly available on MIETTA-II. It is not clear if the project actually achieved its primary objective and if the results of the two projects were ever deployed in real-life settings.

Current research seems to have largely abandoned information extraction in the tourism domain and to have opted for semantic web approaches to interoperability. Such approaches have been analyzed in projects such as SATINE (2004–2006) and concentrate on the semantic description of web services that give access to already structured data.

7.3.3 Gaps and future needs

Named entity recognition

For eTourism, named entity recognition is a key to linking extracted information with given locations or organizations such as hotels, theatres, or other relevant players. For this purpose one needs agreement on a suitable model to unambiguously link the names of organizations against a suitable vocabulary of organizational units in the eTourism domain, possibly based on the 29 types proposed in “Annotation guidelines for answer types” [Brunstein, 2002]. These findings need to be validated against sample data to test the level of granularity and to ensure sufficient precision in the tagging.

Event extraction

Event extraction is still a research area, although, as we have seen, first applications are operational, e.g. in the news arena. Standardization in this area would therefore be premature.

Tourism-specific information extraction

Event extraction for eTourism is still very much an area of research. In particular, it lacks performance tests that would allow an informed decision on the precision that current systems can reach. Given the great potential that information extraction can have for the domain, it would be highly desirable to have such data.

7.3.4 Recommendations

Short-term recommendations (1–3 years)

• Foster the use of semantic web technologies to describe non-structured data on the web by means of resources, so as to make the data machine-processable.
• Semantically tag non-structured information.

Long-term recommendations (3–10 years)

• Agree on the name tags (labels) that particular tourism content ought to have (preferably with the intervention of a recognized body such as the W3C), so that it is made visible to search engines.
• Develop software that enables (semi-)automatic information tagging according to the previous recommendation.

7.4 Inter-ontology mapping

7.4.1 Needs and requirements

Introduction

The mapping between an integrated global ontology and local ontologies may support enterprise knowledge management and data or information integration. In the Semantic Web an integrated global ontology extracts information from the local ones and provides a unified view through which users can query different local ontologies. In an information integration system a mediated schema is constructed for user queries. Mappings are used to describe the relationship between the mediated schema (i.e. an integrated global ontology) and the local schemas.

Needs

There may be different airlines flying to the same destinations from the same origins, and that information has to be shown to the final user in order for her to make a decision on the most convenient way to travel.

Tasks on distributed and heterogeneous systems demand support from more than one ontology. Multiple ontologies need to be accessed from different systems. In addition, the distributed nature (conceptualization) of ontology development has led to dissimilar ontologies for the same or overlapping domains. As a consequence, parties with different ontologies do not fully understand each other and cannot work together, which prevents electronic transactions. To solve these problems it is necessary to use ontology mapping to achieve interoperability among information sources and enable effective and efficient business transactions over the Internet.

Requirements

Information sharing and integration does not only have to provide full accessibility to data. In addition, it ought to make that data fully processable and interpretable by machines as well. One possible way to achieve effective heterogeneous information integration is creating links among already existing ontologies. There are different ways to map ontologies: between an integrated global ontology and local ontologies, between local ontologies, and as part of ontology merging and alignment.

Ontology mapping between local ontologies provides interoperability for highly dynamic, open and distributed environments, e.g. tourism. It can be used for mediation between distributed data in such environments. This kind of mapping is more appropriate and scalable than mappings between an integrated global ontology and local ontologies. It enables ontologies to be contextualized as it keeps content local. It can provide interoperability between local ontologies when different local ontologies cannot be integrated or merged because of mutual inconsistency of their information.

With the growing use of ontologies in different domains of interest, the problem of overlapping knowledge in a common domain becomes critical. The complexity of the travel and tourism industry could by no means be represented by a single ontology, thus multiple ontologies would have to be accessed from various applications. Inter-ontology mapping could very well provide a common layer from which several ontologies could be accessed and hence could exchange information in a semantically sound manner.

7.4.2 State of the art

The task of integrating heterogeneous information sources puts ontologies in context. They cannot be perceived as standalone models of the world but should rather be seen as the glue that puts together information of various kinds. Consequently, the relation of ontologies to their environment plays an essential role in information integration. The term mapping is used to denote the connection of ontologies to other parts of the application system. The two most important uses of mappings required for information integration are mappings between ontologies and the information they describe and mappings between different ontologies used in a system.

Many information integration systems use more than one ontology to describe the information. The problem of mapping different ontologies is a well known problem in knowledge engineering. General approaches that are used in information integration systems are:

• Defined Mappings: A common approach to the ontology mapping problem is to provide the possibility to define mappings. Different kinds of mappings are distinguished in this approach, starting from simple one-to-one mappings between classes and values up to mappings between compound expressions. This approach allows great flexibility, but it fails to ensure a preservation of semantics: the user is free to define arbitrary mappings even if they do not make sense or produce conflicts.
• Lexical Relations: These approaches extend a common description logic model by quantified inter-ontology relationships borrowed from linguistics. Some of the relationships used are synonym, hypernym, hyponym, overlap, covering and disjoint. While these relations are similar to constructs used in description logics, they do not have a formal semantics. Consequently, the subsumption algorithm is heuristic rather than formally grounded.
• Top-Level Grounding: In order to avoid a loss of semantics, one has to stay inside the formal representation language when defining mappings between different ontologies. A straightforward way to stay inside the formalism is to relate all ontologies used to a single top-level ontology. This can be done by inheriting concepts from a common top-level ontology. This approach can be used to resolve conflicts and ambiguities. While this approach allows establishing connections between concepts from different ontologies in terms of common superclasses, it does not establish a direct correspondence. This might lead to problems when exact matches are required.
• Semantic Correspondences: An approach that tries to overcome the ambiguity that arises from an indirect mapping of concepts via a top-level grounding is the attempt to identify well-founded semantic correspondences between concepts from different ontologies. In order to avoid arbitrary mappings between concepts, these approaches have to rely on a common vocabulary for defining concepts across different ontologies.
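To make the first two approaches more tangible, the following minimal sketch represents defined mappings between two local ontologies and labels each with one of the lexical relations listed above. The concept names (`hotelA:...`, `portalB:...`), the `Mapping` class and the helper function are hypothetical and chosen for illustration only; as noted above, nothing in such a structure guarantees that a mapping preserves semantics.

```python
from dataclasses import dataclass

# Lexical relations named in the text.
RELATIONS = {"synonym", "hypernym", "hyponym", "overlap", "covering", "disjoint"}


@dataclass(frozen=True)
class Mapping:
    """A user-defined correspondence between two concepts."""
    source: str
    target: str
    relation: str

    def __post_init__(self):
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {self.relation}")


def exact_matches(concept, mappings):
    """Return target concepts that are declared synonyms of `concept`."""
    return [m.target for m in mappings
            if m.source == concept and m.relation == "synonym"]


# Hypothetical mappings between two local ontologies.
mappings = [
    Mapping("hotelA:Accommodation", "portalB:Lodging", "synonym"),
    Mapping("hotelA:Suite", "portalB:Room", "hyponym"),
    Mapping("hotelA:Spa", "portalB:Wellness", "overlap"),
]

print(exact_matches("hotelA:Accommodation", mappings))
print(exact_matches("hotelA:Suite", mappings))
```

Only synonym mappings are treated as exact matches here; the second lookup returns an empty list, which mirrors the problem that weaker relations give no direct correspondence when an exact match is required.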

7.4.3 Gaps and future needs

Ontologies have been widely used in a large number of information systems for different purposes. However, there is still a lot to be done in order to successfully mediate information exchange and integration processes.

Although reasonable results have been achieved on the technical side of using ontologies for intelligent information integration, the use of inter-ontology mapping is still an exception. Reviewing the literature, it seems that most of the mappings have been realised ad-hoc, i.e. for the particular purpose of the mapping itself, especially for the connection of different ontologies. There are approaches that try to provide well-founded mappings, but they either rely on assumptions that cannot always be guaranteed or they face technical problems. There is a need to undertake research on mapping methodologies for general purposes.

Most systems only provide tools to develop ontologies, and they fail to indicate a particular methodology to develop them. The comparison of different approaches indicates that requirements concerning ontology language and structure depend on the kind of information to be integrated and the intended use of the ontology. There is a need to develop a more general methodology that includes an analysis of the integration task and supports the process of defining the role of ontologies with respect to these requirements.

Through the use of ontologies, inter-ontology mapping could offer ontology-based mediation amongst diverse information sources. eTourism is an industry with a strong need for interoperability among all agents that take part in the value chain if an integrated service (the general market trend) is to be provided. While languages for representing semantic models have been intensively studied, semantic mapping is an open and active research field still in its very early stages. There are a number of issues that need to be overcome in the near future:

• Identify incompatibilities between different data models and representation structures.
• Define the degree of formality with which information needs to be defined, so that two different data structures can be compared, merged, and eventually mapped effectively and as automatically as possible.
• Language Heterogeneity is still a problem. There are a number of standard languages (with different degrees of formality) for representing semantic models. However, one of the major challenges still is the translation between models encoded in different languages. The challenge is to provide translations with guaranteed formal properties.
• The Nature of Semantic Relations: Most existing mapping approaches use a very limited set of semantic relations that can hold between elements from different models. In particular, implication and equivalence are frequently used. Many realistic settings, however, demand richer relations such as inconsistency, cause-effect relations or overlap. Very limited work exists on approaches for measuring the degree of relatedness specified by a mapping. This is particularly important when mappings are created by automatic mapping tools. A very specific problem with respect to semantic relations is the definition of semantic relations between models that describe the domain of interest at different levels of abstraction.
• A general observation about the state of the art of mapping representations is that mappings are not yet considered to be first-class entities in semantic models. While most approaches agree on elements such as concepts, relations and instances, mappings are not yet an agreed element of semantic modelling. Important operations on mappings such as reasoning about, retrieving and composing mappings are currently not supported.
• A Framework for Comparing Mappings: A very concrete research task is to design a common framework for comparing existing mapping approaches.
• Text mining: Text analysis and conversion into a formal representation model that can automatically be linked into the common model of the tourism domain.
• Identify techniques and methodologies in order to solve these problems. The problem of semantic mapping needs to be further automated; however, complete automation is not expected to be reached.
• Tools that support the creation of a common ontology that enables semi-automatic mapping of data structures with data representation models.
• Tools that act as information and mediation brokers (data and information conversion) between the common and particular models.

7.4.4 Recommendations

Recommendations within this section are by nature very similar to the recommendations proposed within Clause 6.3 (“Ontologies”).

Short-term recommendations (1–3 years)

• Foster the development of ontologies using the same standard definition language as well as the same degree of formality and expressivity to ease automatic ontology mapping, following W3C recommendations.

Long-term recommendations (3–10 years)

• Based on the short-term recommendations, build tools with a graphical user interface that automatically merge and link ontologies, using the ontologies' reasoning capabilities to automatically find and resolve alignments and inconsistencies.

8 Process handling

8.1 Needs and requirements

8.1.1 Introduction

Consumers in the tourism industry are getting more and more used to making online transactions, and the industry is competing with services to attract these customers and get them to the actual booking act as fast as possible. Traditional distribution channels are vanishing, and more flexible and dynamic networks are emerging. This very dynamic development puts pressure on service providers: business actors have to follow demand to keep or expand their market share, otherwise they might get crowded out.

These challenges require skills in marketing but most of all in deploying modern information technology to manage the actual buying or booking process. This process, like other processes in the domain, usually requires the participation of different players along the value chain, making it necessary to interact easily with other computer systems on a process level. But the management of business processes is already difficult within one organization, which makes it a much more demanding challenge in a network of organizations.

We want to start the discussion on the topic by looking at broadly accepted definitions. Davenport [1993] defines a (business) process as “a structured, measured set of activities designed to produce a specific output for a particular customer or market. It implies a strong emphasis on how work is done within an organization, in contrast to a product focus’ emphasis on what. A process is thus a specific ordering of work activities across time and space, with a beginning and an end, and clearly defined inputs and outputs: a structure for action. ... Taking a process approach implies adopting the customer’s point of view. Processes are the structure by which an organization does what is necessary to produce value for its customers.”

Although this is a very customer-oriented definition, it is first of all broadly accepted, and second it has an important phrase: “A process is thus a specific ordering of work activities across time and space, with a beginning and an end, and clearly defined inputs and outputs.” Something similar comes from Rummler, Brache [1995] stating that “a business process is a series of steps designed to produce a product or service”.

As already outlined in the introduction to this topic, in the context of information and communication technology we consider a process to consist of data, being defined as inputs and outputs, and of its execution, being a “work activity” or step. The problem of data heterogeneity across different systems is part of the clause on semantics, while we want to discuss the dynamic aspect of executing processes by involving heterogeneous computer systems in this clause.

In fact, even the one-time exchange of data is already a simple process, which implies that data cannot be exchanged without some kind of process being involved. Since this is already true for web sites being “crawled” to get information, we do not want to consider passive process participation in our discussion of the matter. Instead, we consider a rather complex interplay of at least two participants. This has always been a problematic issue, and a more critical challenge than the mere exchange of data. The issue becomes even more pressing within a highly networked, dynamic and diverse environment like the tourism industry today. The introduction of standards is of course always a sophisticated way to meet interoperability issues, but we know from the past that it is difficult to find industry-wide acceptance. One reason is the loss of flexibility accompanying standards, another one the game of market forces.

Since we leave the problem of data mediation to the clause on semantics, and since we consider complex processes, we have named this clause “Process handling”, and we consider it the dynamic component of process interoperability with the need of active participation of all actors involved.

8.1.2 Needs

Under this clause the basic and principal needs for process interoperability are analysed and discussed, while requirements are outlined in the following clause. The challenge is to find ways of achieving process interoperability between heterogeneous systems that allow an easy integration of business processes while preserving the autonomy and diversity of the different players, which is needed to correspond with the diversity of requirements on a global scale. The following discussion does not touch business issues like pricing, virtual or ad-hoc organizational forms, dynamic packaging, legal aspects, etc. The intention is also not to design platforms for these issues; it is merely about discussing and recommending one or several ways to allow process interoperability.

According to the definition presented in the introduction, a process has a clearly defined starting point, a clearly defined ending point (an objective), and might have a number of interim steps. The following use case illustrates in simple terms a booking process in the tourism industry: The starting point is a user with the intention to book a room in a hotel he has already selected. The starting point is a clearly identified object; the ending point is a booking confirmation for this object for a specific date. The process could be broken down into the following steps:

1. Check availability of the room for the date given.
2. Get customer’s acceptance of terms (e.g. room rate for that date) and payment details.
3. Make reservation and print booking confirmation.

These three steps are very basic and might have some backward loops (e.g. if the room is not available) or sub-steps (the check for availability might include a temporary blocking of the room for the specific date). But most of all, parts of the process might run on other systems. Imagine that the booking is done on a portal comprising a number of different hotel chains. The checking of availability is done on a hotel chain’s computer system and the check for approval of payment by credit card (a pre-requisite for making a reservation) is done on a third system.

This short use case illustrates that a business process can be broken down into different steps (or sub-processes), which might need the interaction of different systems. The entire process could be drawn by using a flow chart showing the different steps and their dependencies. In any case the completion of the entire business process requires the handling of the steps and conditions. For example, if the room has been reserved in a first step and in a later step the credit card is not accepted, then the reservation of the room must be cancelled. Alternatively, it is cancelled automatically after some time if the confirmation required to complete the booking does not arrive. This is up to the design of the process on the hotel’s side. However, the portal, as owner of the entire business process, might need to deal with as many different systems as there are hotel chains represented on the portal. And each of the systems might have different naming for reserving a room (booking, reservation, locking, etc.) and different conditions (requires confirmation to complete the reservation, cancels reservation automatically after some time without confirmation, keeps reservation alive until status is changed, etc.).
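The cancellation rule in this use case (a reserved room must be released again if the credit card is rejected in a later step) can be sketched as follows. The three classes and functions stand in for the portal, the hotel chain's system and the payment system; all names and the in-memory behaviour are assumptions made for illustration, not a description of any real booking interface.

```python
class HotelSystem:
    """Stand-in for a hotel chain's system (hypothetical)."""

    def __init__(self, free_rooms):
        self.free_rooms = set(free_rooms)   # (room, date) pairs still free
        self.reserved = set()

    def is_available(self, room, date):
        return (room, date) in self.free_rooms

    def reserve(self, room, date):
        self.free_rooms.remove((room, date))
        self.reserved.add((room, date))

    def cancel(self, room, date):
        self.reserved.remove((room, date))
        self.free_rooms.add((room, date))


class PaymentSystem:
    """Stand-in for the third system that approves credit cards (hypothetical)."""

    def __init__(self, accepted_cards):
        self.accepted_cards = set(accepted_cards)

    def authorize(self, card):
        return card in self.accepted_cards


def book(hotel, payment, room, date, card):
    """Portal-side process: the portal owns the overall business process."""
    if not hotel.is_available(room, date):
        return "unavailable"
    hotel.reserve(room, date)                # step on the hotel's system
    if not payment.authorize(card):          # step on the payment system
        hotel.cancel(room, date)             # undo the earlier reservation
        return "payment rejected"
    return "confirmed"


hotel = HotelSystem({("101", "2009-07-01")})
payment = PaymentSystem({"valid-card"})
print(book(hotel, payment, "101", "2009-07-01", "bad-card"))
print(book(hotel, payment, "101", "2009-07-01", "valid-card"))
```

The sketch cancels the reservation explicitly from the portal side; a hotel system that instead cancels automatically after a timeout would need a different portal-side process, which is exactly the variation between systems described above.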

Although the portal requires just one step to be done on another system, it might have to deal with 100 different ways of handling this step if 100 hotels are involved. And each hotel might have to deal with 100 booking systems if it does business with 100 portals. Thus each actor might have to implement 100 interfaces to be interoperable with the required other systems. It is obvious that this dramatically increases the effort needed to run processes automatically with other partners.
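The arithmetic behind this observation can be made explicit. The sketch below counts interface implementations for the pairwise case described above and, as an assumed point of comparison that is not worked out in the text, for a setting in which every actor implements a single adapter to one shared format.

```python
def pairwise_interfaces(portals, hotels):
    """Each portal implements one interface per hotel, and vice versa."""
    return portals * hotels + hotels * portals


def shared_format_adapters(portals, hotels):
    """Assumed alternative: every actor implements one adapter to a shared format."""
    return portals + hotels


print(pairwise_interfaces(100, 100))     # 20000 interface implementations
print(shared_format_adapters(100, 100))  # 200 adapters
```

The pairwise count grows with the product of the number of actors, whereas the assumed shared-format count grows only with their sum.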

Since this simple use case is a very common use case and the industry is depending more and more on the interaction of different computer systems, we can assume a strong need for a solution that decreases the complexity for process interoperability in a networked environment. Typical business processes in the tourism industry are:

• searching;
• selling and buying;
• reservation;
• booking;
• modification;
• cancellation;
• confirmation;
• notification;
• payment and other money transfers.

This list might not be complete and could provoke a lot of discussion (e.g. the difference of buying and booking, confirmation and notification, etc.). However, it shall only bring examples of the frame of possibilities we are discussing. To perform all these processes in a networked environment we can assume that

• “the basic industry need is an applicable concept for the technical interaction of heterogeneous ICT systems to provoke and run complete business process cycles involving at least two different technical systems.”

“Applicable concept” shall express the need for something that is useful in daily business life.

“Technical interaction of heterogeneous ICT systems” focuses the topic on the technical level, leaving out business, legal, social or any other aspects. It has to be possible to run processes on different systems regardless of their technical specification (frameworks, programming language, database, etc.). However, it is obvious that these different systems cannot be completely different. They need at least some kind of network connection and some protocol to be able to interact with each other. In this context, and due to its relevance, it is assumed that an internet connection and industry-wide used protocols (e.g. TCP/IP) and standards (e.g. XML) are available.

“Complete business process cycle” means that a business process, wherever it starts or ends, should be carried out completely as defined.

“Involving at least two different technical systems” shall define that the topic can have a bi-directional setup, but in any case has to be flexible enough to be run in a network of different technical systems, thus more than two and up to an unspecified number.

The design and management of business processes is a subject on its own, but for the following discussion it is enough to say that business processes can be broken into a number of steps. Each of the steps needs a trigger to initiate the step, has some conditions to be started and delivers an output, including information required for the performance of the overall process (e.g. trigger for another step).

Furthermore, we assume that a step can only run on one system. If it requires two systems, the step must be broken down into several steps. This assumption is reasonable because, regardless of whether running one step across several systems is technically feasible, authority over a step has to rest with one actor. Otherwise, two or more actors would be responsible for the same step, which is obviously not feasible in practice.
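The step model described in the two paragraphs above (a trigger, start conditions, an output that may trigger another step, and exactly one owning system) can be sketched as a small data structure. The field names, the trigger strings and the runner function are hypothetical; the sketch assumes that the chain of triggers does not loop back on itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str
    system: str                         # exactly one owning system per step
    trigger: str                        # event that initiates the step
    action: Callable[[Dict], Dict]      # produces the step's output
    emits: str = ""                     # trigger handed on to the next step
    conditions: List[Callable[[Dict], bool]] = field(default_factory=list)


def run_process(steps, first_trigger, context):
    """Run steps in trigger order; stop when a condition fails or no step listens."""
    by_trigger = {step.trigger: step for step in steps}
    trace = []
    trigger = first_trigger
    while trigger in by_trigger:        # assumes an acyclic chain of triggers
        step = by_trigger[trigger]
        if not all(condition(context) for condition in step.conditions):
            break
        context.update(step.action(context))
        trace.append((step.name, step.system))
        trigger = step.emits
    return trace


# The three booking steps from the use case, with assumed outputs.
booking = [
    Step("check availability", "hotel chain system", "booking requested",
         action=lambda ctx: {"available": True}, emits="availability checked"),
    Step("get acceptance and payment details", "portal", "availability checked",
         action=lambda ctx: {"terms_accepted": True}, emits="terms accepted",
         conditions=[lambda ctx: ctx["available"]]),
    Step("make reservation", "hotel chain system", "terms accepted",
         action=lambda ctx: {"confirmed": True},
         conditions=[lambda ctx: ctx["terms_accepted"]]),
]

trace = run_process(booking, "booking requested", {})
for name, system in trace:
    print(f"{name} (on {system})")
```

Because each `Step` carries a single `system` value, a piece of work that needs two systems has to be modelled as two steps, which is the assumption stated above.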

8.1.3 Requirements

The clause about requirements shall bring the industry needs, as described above, into a more structured and operative form.

1. Network capability: Ability to run one complete business process as an interplay of an unlimited number of different and clearly separated computer systems.
2. System independence: Ability to be deployed independent from the ICT system used, especially independent of:
   o databases,
   o data structuring,
   o operating systems,
   o frameworks.
3. Player independence: Ability of each player to participate in the same business process but initiated from an unlimited number of different players.
4. Process range: Flexibility to run each business process possible in the tourism industry.
5. Player’s autonomy: Leave autonomy and flexibility to the individual player to change own system and consume external data in an autonomous way.
6. Cost effectiveness: Low cost of integration and operation.
7. Stability and reliability: Including fault secure system and error handling.
8. System performance: High performance to allow fast transactions and comprehensive handling.
9. Security: Ability to meet security and trust requirements like:
   o data encryption,
   o partner identification,
   o fraud resistance.
10. System openness: Accessibility and availability, e.g. whether the system is publicly available or the specification is open.

8.2 State of the art

The following systems are currently the state of the art in the tourism industry:

8.2.1 Global standardization efforts

Standards define a formalized schema for handling processes; thus they usually leave only little flexibility for the participants who have to comply with the standards. Furthermore, they flatten diversity, since no deviations from standards are allowed. Standards can provide rather rigid rules, as for example the standards set by the Open Travel Alliance or by OASIS (ebXML). They define the data schemas and process rules on a concrete level. Developing such a standard is in general a lengthy process, initiated either by market power or by industrial or governmental interest groups. And the implementation of a standard can be a complex and expensive procedure, which makes it especially difficult for smaller players.

Additionally, there are also more flexible initiatives, giving a framework within which players can adapt according to their needs. They are based on a non-coercive language that allows common basic elements to be expressed in a similar way for all players, while also allowing those elements to be combined in different ways so as to allow diversity. Cost of implementation can be reduced compared to a full standard since all players may publish different levels of services. The use of templates also allows a certain flexibility in the format of responses according to requesters. A drawback stems from the fact that integrating different players may require certain adaptations due to commercial or system-driven specificities. On the other hand, this fact allows competition and diversity. An example of such a language is the XFT (Exchange For Travel) language.

8.2.2 Application Integration and APIs

Application Integration and Application Programming Interfaces (API) allow a 1:1 interplay of different players. The type of interplay is defined by the partners involved jointly or by one central partner having the market power to do so (like in the case of Amadeus). Application Integration is a much deeper way of system interplay requiring development work to fulfil the purpose by “integrating” the systems, while an API is a gateway for external systems where the corresponding partners do not care about what happens behind the partner’s gate.

This is feasible when there is a central player, but is not feasible in open and dynamic networks, since an interface for each player has to be developed. This drastically increases the complexity and cost of implementation. However, Application Integration and APIs are better suited to handle different processes and are more responsive regarding specific requirements for the systems involved.

8.3 Gaps and future needs

Service providers in the tourism industry are faced with a fast-changing and highly dynamic environment. They have to meet changing market requirements in less time and more cost-efficiently. They need to offer enhanced functionality to their customers and at the same time run processes across different systems, including the integration of external information systems.

The following table highlights how far the current state of the art, as described above, meets the needs and requirements identified:

Table 1

Criteria                     Standards      Application integration
Network capability           rather no      no
System independence          rather yes     rather yes
Player independence          yes            rather no
Process range                rather no      rather yes
Player's autonomy            no             no
Cost effectiveness           rather no      no
Stability and reliability    yes            rather yes
System performance           rather yes     yes
Security                     indifferent    rather yes
System openness              rather yes     no

The individual entries are open to discussion, but in general they reflect the current situation well: Standards and Application Integration are not fully suitable for a highly networked and dynamic environment like the tourism industry today. They result in a loss of autonomy, need some central entity or controlling power, and are expensive.

For each process that runs across different systems, the interface needs to be specified, developed and maintained separately, since the systems do not all use the same standards or interfaces. If a new version of a standard or interface is published, it cannot be used automatically; it needs to be deployed and maintained manually. A more flexible solution based on a mediating technology would therefore meet the requirements better than rather rigid technologies.

Current research projects touch the issue of process interoperability by focussing on Semantic Web Services and Multi-Agent Technologies, but also Grid technologies, with the aim to develop intelligent and adaptive systems for the interplay of heterogeneous systems. Examples for these projects are:

• Agent Link: http://www.agentlink.org/
• ArguGRID: http://www.argugrid.eu/
• ARTEMIS: http://www.srdc.metu.edu.tr/webpage/projects/artemis/
• ASG: http://asg-platform.org/cgi-bin/twiki/view/Public
• BREIN: http://www.eu-brein.com/
• DIP: http://dip.semanticweb.org/index.html
• SATINE: http://www.srdc.metu.edu.tr/webpage/projects/satine/
• SUPER: http://www.ip-super.org/

These projects address the need for a more flexible and cost-efficient way to align business processes between different systems. A promising approach is Semantic Business Process Management, which results from applying Semantic Web Services to Business Process Management, as discussed for example by Hepp et al. [2005]. Based on this concept, Cimpian, Mocan [2005] proposed a process mediator that adjusts the bi-directional flow of messages based on the Web Service Modeling Ontology (WSMO). This approach is similar to that chosen by the Harmonise project (http://www.harmonet.org/) for data mediation, in which a technology for mediating between heterogeneous data sources was developed. The Harmonise technology allows the parties involved to exchange information without changing their local data structures, simply by referring to a common understanding of domain-specific ontological concepts, the Harmonise Ontology [Fodor, Werthner, 2005].

8.4 Recommendations

8.4.1 Short-term recommendations (1–3 years)

• Simplify and rationalize existing processes – use stateless process handling or request-response pairs only.
• Build an ontology of common processes in the tourism industry.


8.4.2 Long-term recommendations (3–10 years)

• Develop process mediators.
• Put research efforts into intelligent agent technologies for automatic process handling.


9 Metasearch

9.1 Methodology

9.1.1 Needs and requirements

Introduction

Metasearch is the ability to run one search process over the search engines of heterogeneous instances (platforms, websites, databases) and to aggregate the results in a unified list. In the tourism industry, metasearch engines are typically used to compile and compare specific offers. Examples are:

• Checkfelix: http://www.checkfelix.at/,
• Kayak: http://www.kayak.com/,
• Farechase: http://farechase.yahoo.com/,
• Trabber: http://www.trabber.com/,
• Kelkoo Travel: http://travel.kelkoo.co.uk/, and
• Minube: http://www.minube.com/.

Typically, search results are not stored in a database, but delivered as real-time results. However, some systems make use of data replication for static data (like, e.g., hotel descriptions).

Quality of results

Metasearch engines in tourism rely on the quality of data, especially the accuracy of information. The quality of results depends on a common understanding of the same information provided by different systems, with regard to:

• data organization: the structuring and naming of data;
• data understanding: the encoding (like different categories) and precision of information (like miles and kilometres);
• data depth: the availability of different information items in different systems (like disclosure of price composition);
• data accuracy: like prices and availability.
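The "data understanding" item can be illustrated with a small normalisation step. The following is only a sketch under stated assumptions: the field names (`distance`, `unit`, `category`) and the category mapping table are hypothetical and not taken from any real system.

```python
# Hypothetical mapping from source-specific category labels to a common star scale.
CATEGORY_MAP = {
    "3 stars": 3,
    "***": 3,
    "first class": 4,
    "****": 4,
}

KM_PER_MILE = 1.609344


def normalise(record):
    """Return a copy of a source record expressed in the common schema."""
    distance = record["distance"]
    if record.get("unit", "km") == "mi":
        distance = distance * KM_PER_MILE
    return {
        "name": record["name"],
        "distance_km": round(distance, 2),
        # Unknown labels map to None rather than to a guessed category.
        "stars": CATEGORY_MAP.get(record["category"].lower()),
    }


source_a = {"name": "Hotel A", "distance": 0.5, "unit": "mi", "category": "***"}
source_b = {"name": "Hotel B", "distance": 0.8, "unit": "km", "category": "3 Stars"}
unified = [normalise(source_a), normalise(source_b)]
```

After this step both records carry a distance in kilometres and a numeric star category, so they can be compared and ranked in one list.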

Response time

An acceptable response time is of high importance for search engines to meet users' expectations. Metasearch engines depend on the response time of the other search engines and need clever algorithms to avoid deadlocks. However, this is less a matter of information and process interoperability, except that runtime performance in aggregating data might be improved.


Access to data

Data can be accessed either by retrieving it automatically from the web interface (user interface) or via a data interface (e.g. web services). Semantic annotation of content and semantic mapping, which enable the metasearch engine to find the required information, provide part of the answer and are detailed in the corresponding clauses of this study. However, some issues remain, for instance regarding data encapsulated in purely graphical applications such as Flash applications, where the data is not accessible at all. Possible solutions come from newer technologies such as Flex applications, where the whole application is XML-based. Other issues stem from client-side calculations (e.g. options depend on different settings, or prices are calculated on the fly on the client side). In that case, the data is hard-coded in the application, and accessing the data and the corresponding rules would require interpreting the code.

Another aspect of access to data is covered in the section on querying (9.2).

Efforts for maintenance

The search on another system is often tailored to the particularities of that system. If the interfaces do not follow a given pattern or standard, they have to be updated each time the other system changes in order to maintain the service level.

9.1.2 State of the art

Web crawler

Web crawlers (synonyms: robots, bots, spiders) are software scripts and programs that browse the World Wide Web in an automated manner to create copies of websites (which are processed later by other software agents) or to gather specific information. They are used by search engines but typically not for metasearch processes, since they normally only gather information and do not run processes on other websites.

HTTP requests

HTTP requests can be used to run automated search queries on existing search engines by rebuilding the HTTP request that is used on each of the external sites. This approach is very maintenance-intensive, since every small change in the HTTP requests of an external site requires an update of the process. Depending on the external site, data is sent back in an unstructured or a structured manner and needs to be processed to bring it into the scheme (semantics) used by the metasearch engine for displaying results. Depending on how results are provided by the external systems, HTTP requests can be a lightweight, yet still maintenance-intensive, way of running a metasearch process.


Website wrapper

Website wrappers capture the unstructured information provided on websites and transform it into a structured form. Advanced wrappers can reach deeper into information architectures than web crawlers, but have to be adapted for each website that is wrapped. The provider of the wrapped website does not need to make any changes or adaptations, thus the semantics of the wrapper can be applied to the search process.

Since advanced tools can run different operations, website wrappers are well suited for metasearching. Still they need considerable maintenance efforts since a change in the wrapped website requires an update of the wrapper.
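As a rough illustration of the wrapper idea, the sketch below turns a fragment of HTML into structured records using only the Python standard library. The page fragment, the CSS class names and the assumption that prices are in EUR are all hypothetical; a real wrapper would be written against the markup of one specific site and would break when that markup changes, which is the maintenance burden described above.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real wrapper targets the markup of one specific site.
PAGE = """
<ul>
  <li class="offer"><span class="name">Hotel A</span><span class="price">95</span></li>
  <li class="offer"><span class="name">Hotel B</span><span class="price">120</span></li>
</ul>
"""


class OfferWrapper(HTMLParser):
    """Collects name/price pairs from <span class="name|price"> elements."""

    def __init__(self):
        super().__init__()
        self.offers = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "offer":
            self.offers.append({})
        elif tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field and self.offers:
            self.offers[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


wrapper = OfferWrapper()
wrapper.feed(PAGE)
# Convert the extracted strings into the (assumed) schema of the metasearch engine.
offers = [{"name": o["name"], "price_eur": float(o["price"])} for o in wrapper.offers]
```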

Application Programming Interfaces (API)

APIs are the classic way for different computer systems to interact and allow a broad range of possibilities. They can be independent of programming language and are therefore open to any kind of integration and information exchange. It has to be mentioned that implementing an API typically requires considerable effort, and there is no general standard for APIs, since APIs vary significantly depending on the purpose and the domain concerned.

Web services

A web service is a software application enabling the exchange of data in XML format to allow machine-to-machine (M2M) interaction on different platforms. Different from HTTP requests and web crawlers, the provision of a web service has to be implemented by the service provider, who is identified by listings in registries (UDDI).

A similar approach is REST (or RESTful web services) to allow the exchange of domain-specific data over HTTP without an additional messaging layer like in web services. It is often described as an easier form of web services.

Web services and REST offer means to provide data in a structured (understandable) way and are therefore well suited to obtaining information for metasearch queries. However, they still leave open the problem of different ways of describing the data.

Semantic annotation

Semantic annotations provide methods to add metadata to documents, allowing some formalised understanding of the information provided in these documents. In this way the meaning (semantics) of information can be understood automatically by information systems (see also clause on semantic annotations, 7.2). Three different approaches exist to add semantic annotations to a document:

1. Embedded annotations are added into the document.
2. The document refers to another document providing the annotations.
3. Annotations refer to the document concerned.


Semantic annotation is an enabler for metasearch, making it easier to find and understand resources with information relevant to the search criteria. It is not a search method itself and still depends on a common schema to describe the annotation, but can be a powerful method in combination with web crawlers.

Caching mechanism

Caching mechanisms aim to provide extensive content, disclosed directly by vendors on a regular basis (every day, hour, etc.) in a common language, so that it can be easily integrated by metasearch engines. Many metasearch engines in fact build internal caches from data provided by the vendors on a regular basis, so as to improve response time and to increase the probability that a vendor appears in the results (sources with poor response times are unlikely to be displayed in metasearch engines).

Summary

The methods described above can be divided into two groups.

The first group comprises methods where the agent providing a metasearch engine can integrate other search engines without any assistance from the search engines used (web crawler, HTTP requests, website wrapper). Thus the metasearch agent is more independent from the other systems. These methods are therefore more flexible, but cause considerable efforts for the implementation and maintenance of a metasearch service. However, they would obviously cause less effort if standards are supported or interoperability problems are solved.

The second group comprises methods, where the assistance of the external search engine is required, where some kind of interface is provided or where other changes are necessary. Clearly, these methods make it easier to implement and maintain a metasearch service, but require the application of standards or the solving of interoperability issues to run smoothly.

9.1.3 Gaps and future needs

The tourism industry is characterised by a vast number of search engines to search for flights, accommodations, events, attractions and other tourism services. Metasearch engines are important tools to provide a one-stop access to this information on a regional, national or transnational level.

The methods for metasearch described above provide useful tools to integrate different search engines, but quality of results, response time, access and maintenance effort depend very much on the use of standards or on some other ability to understand the other system. In particular, the combination of website wrappers and semantic annotation of websites seems a promising way to enable improved metasearch functionality. The deployment of metasearch engines could be strongly supported if either broadly accepted standards or means for the interoperability of tourism-related information were provided.


One important direction metasearch engines can take is that of semantics. Semantic search engines are becoming increasingly popular. Semantic search engines are systems that need to understand (the meaning of) both what the user is asking for as well as the information that is stored in the web. Any semantics-based query recognizes key words used in order to carry out a search and uses that same information in order to display more precise results. The final and main objective of this search technique is to find all documents on the web that contain the most relevant information related to the query (i.e. those that syntactically match with the search keywords), minimizing the number of false results.

Additionally, semantic information enables the inference of new knowledge from semantic documents, based on various logic rules together with classes, attributes and values. This knowledge can be stored and processed, which is not done by traditional search engines. The graph resulting from a semantic data collection is different from the one created by hyperlinked HTML-based documents. Thus, a semantic search engine differs from traditional ones in that it searches on semantic annotations of content, i.e. annotations realised using domain ontologies.

Another big challenge for metasearch engines, after understanding the content, is system performance. All methods have to fetch data from different systems, transform it and display it in an appropriate manner (ranking, paging, etc.). The more sources the system queries, the more benefit it offers to the user, but the slower the system becomes.
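One common way to limit the impact of slow sources is to query all sources in parallel and to drop answers that do not arrive within a time budget. The sketch below illustrates this idea only; the remote search engines are simulated with `time.sleep`, and the source names, delays, prices and the 0.5-second budget are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def make_source(name, delay, price):
    """Build a stand-in for a remote search engine that answers after `delay` seconds."""
    def search(query):
        time.sleep(delay)
        return {"source": name, "query": query, "price": price}
    return search


SOURCES = [
    make_source("fast", 0.01, 120),
    make_source("medium", 0.05, 95),
    make_source("slow", 1.0, 80),
]


def metasearch(query, timeout):
    """Query all sources in parallel and keep only answers that arrive in time."""
    pool = ThreadPoolExecutor(max_workers=len(SOURCES))
    futures = [pool.submit(source, query) for source in SOURCES]
    done, _late = wait(futures, timeout=timeout)
    # Do not block on stragglers; their results are simply ignored.
    pool.shutdown(wait=False)
    return sorted((f.result() for f in done), key=lambda r: r["price"])


results = metasearch("Rome", timeout=0.5)
```

With these simulated delays the slow source is left out of the result list, which mirrors the trade-off described above: a tighter time budget gives faster answers but fewer sources.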

9.1.4 Recommendations

Short-term recommendations (1–3 years)

• Make use of semantic technologies to describe your data.
• Provide content and meta-content as close to an existing standard as possible.
• Provide a regularly updated external data store with pre-processed and well-described content for fast querying (caching mechanism) if you have long query processing times or complex queries.
• Develop aggregated data repositories that provide pre-processed data from different sources.

Long-term recommendations (3–10 years)

• Focus on development of fast and easy to use alternatives of metasearch technologies, enabling or supporting use of semantic technologies for data transformation.


9.2 Querying

9.2.1 Needs and requirements

Introduction

More often than not the information involved in eTourism transactions is distributed across a number of different data stores, usually operated by different companies: various GDSs (potentially in their respective national incarnations), CRSs, other sources, not to forget the plethora of unstructured data such as the web. As discussed throughout this section, we often need to find information in and across many of these data sources and, indeed, often for the data sources themselves.

However in many scenarios applications need to go beyond mere search and to query across data sources for data sets that meet very specific sets of constraints. Typical queries might be:

• List all hotels in Rome with at least three stars that have availabilities between October 20th and 22nd.
• List all prices for flights to Rome that fly in on the morning of the 20th and return on the evening of the 22nd.

In many cases queries could be much more complex still and be combined with constraints based on geographical data (hotels not more than 500 metres from the Spanish Steps), price ranges (not more than EUR 100 per night), etc. In many cases subsequent queries will build on the existing result sets of simpler queries and further refine them in a piecemeal manner.
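The piecemeal refinement of result sets can be sketched as a chain of filters, each applied to the result of the previous one. The records, field names and the year 2009 in the dates below are hypothetical and serve only to make the example self-contained.

```python
from datetime import date

# Hypothetical result records, as they might look after data harmonisation.
HOTELS = [
    {"name": "Hotel A", "city": "Rome", "stars": 3, "price_eur": 95,
     "free_from": date(2009, 10, 19), "free_to": date(2009, 10, 23)},
    {"name": "Hotel B", "city": "Rome", "stars": 4, "price_eur": 140,
     "free_from": date(2009, 10, 20), "free_to": date(2009, 10, 22)},
    {"name": "Hotel C", "city": "Rome", "stars": 2, "price_eur": 60,
     "free_from": date(2009, 10, 1), "free_to": date(2009, 10, 31)},
    {"name": "Hotel D", "city": "Milan", "stars": 3, "price_eur": 90,
     "free_from": date(2009, 10, 20), "free_to": date(2009, 10, 22)},
]


def refine(results, predicate):
    """Narrow an existing result set with one further constraint."""
    return [r for r in results if predicate(r)]


arrival, departure = date(2009, 10, 20), date(2009, 10, 22)

# Step 1: hotels in Rome with at least three stars, available for the whole stay.
step1 = refine(HOTELS, lambda h: h["city"] == "Rome" and h["stars"] >= 3
               and h["free_from"] <= arrival and h["free_to"] >= departure)
# Step 2: refine the previous result set by price (not more than EUR 100 per night).
step2 = refine(step1, lambda h: h["price_eur"] <= 100)
```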

Going more into detail of human search behaviour, we can observe that users are not searching for hotels “not more than 500 metres from the Spanish Steps” but rather for hotels “close to” or “near” to the Spanish Steps. However, a hotel “near” Rome might describe another distance than a hotel “near” the Spanish Steps. The translation of human search needs or peculiarities into a respective machine-readable search query covers aspects of interoperability we are not going to cover in this clause, which focuses on machine-machine interoperability. Nevertheless, natural language processing and the transformation into search queries remain important aspects and challenges in querying.
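Although the translation of human search concepts is outside the scope of this clause, the scale dependence of "near" can be made concrete with a small sketch. The radii assigned to "landmark" and "city", the coordinates and the hotel records are all hypothetical values chosen for illustration; only the haversine great-circle formula itself is standard.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical radii in kilometres for "near", chosen per type of reference object.
NEAR_RADIUS_KM = {"landmark": 0.5, "city": 20.0}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using a mean Earth radius of 6371 km."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def near(reference, candidates):
    """Translate "near <reference>" into a radius filter."""
    radius = NEAR_RADIUS_KM[reference["kind"]]
    return [c["name"] for c in candidates
            if haversine_km(reference["lat"], reference["lon"], c["lat"], c["lon"]) <= radius]


# Approximate coordinates, for illustration only.
spanish_steps = {"kind": "landmark", "lat": 41.9060, "lon": 12.4828}
rome = {"kind": "city", "lat": 41.9028, "lon": 12.4964}
hotels = [
    {"name": "Hotel A", "lat": 41.9065, "lon": 12.4835},  # well under 500 m from the steps
    {"name": "Hotel B", "lat": 41.8500, "lon": 12.5500},  # several kilometres away
]

near_steps = near(spanish_steps, hotels)
near_rome = near(rome, hotels)
```

The same word "near" thus yields different result sets depending on the kind of reference object.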

Needs and requirements

Fast transactions

Queries along these lines are typical parts of the selection phase in eTourism transactions. A given transaction will often involve a considerable number of queries as the customer or her agents are narrowing down their result set to a small number of hits that fit the demands. Queries therefore must be fast and return results within a maximum of a few seconds. Nevertheless, we can observe metasearch engines on the market today taking minutes rather than seconds to run real-time queries on external systems.


Reflect complexity of search requirements

Queries must be able to be sufficiently expressive to model the customer’s requirements, either directly through a single complex query that enumerates all constraints, or through a sequence of simpler queries that narrow down result sets.

100 % correctness of query results at this stage is highly desirable, but not absolutely necessary. The ultimate corroboration or falsification of query results can follow at the booking phase, when a non-binding service offer is turned into a binding contract between supplier and customer.

Expansion of querying sources

The tourism industry is a highly dynamic environment, and data stores and search engines appear and disappear almost continuously. Content aggregation and syndication become indispensable tasks for the provision of one-stop platforms. Ideally, the integration of new data stores into a user-facing or travel-agency-facing metasearch engine should be largely transparent, easy and thus cost-efficient.

9.2.2 State of the art

Methods for query distribution

Technically, the search query entered into current metasearch engines has to be translated to other data stores for further processing. This can either be done by

1. manual translations on a case-by-case basis,
2. use of query by example,
3. use of standardized query languages, or
4. use of standardized query interfaces.

Furthermore, queries can be truly federated or based on pre-harvested and regularly updated data that the participating data stores provide to the metasearch engine. Some engines also combine these two approaches.

At present, option 1 dominates. For federated queries individual data stores today offer their own query strategies. These strategies often reflect their historic evolution and their specific internal processes. This makes querying one of the biggest challenges for metasearch, since queries cannot be translated easily from one system to another. The integration of each new data store means considerable custom programming, making it a costly and time-consuming enterprise.

For want of a standardized format, options 2 (“query by example”), 3 (“standardized query languages”) and 4 (“standardized query interfaces”) are at present not widely used, though they offer considerable potential for easier integration of data stores. All of them are related in that they rely on a commonly agreed way of expressing queries.


Query by example

“Query by example” was developed by IBM in the 1970s in parallel to what was to become SQL (cf. Ramakrishnan, Gehrke, 2002, chapter 6). The user supplies example result sets that can formulate constraints or other selection criteria in addition to typical string values. Examples can often be built through graphical user interfaces.

When looking for a hotel, for example, the client would specify basic hotel characteristics (e.g. name, category, etc.) and room category, and the system would on that basis return a suitable set of hotels [Höpken, 2004]. This way, query by example partially relieves users of the need to learn formalized query languages and, instead, allows them to find entries related to known samples. However, it needs clear templates for the types of example that can be constructed and used as the basis for queries across data stores.
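The principle can be sketched as matching records against a partially filled template. This is a simplified illustration, not the original IBM QBE language; the template fields (`city`, `category`, `room`) and the hotel records are hypothetical.

```python
def matches(example, record):
    """A record matches if it agrees with every field filled in on the example.

    Plain values must be equal; a callable acts as a constraint on the field.
    """
    for field, wanted in example.items():
        if field not in record:
            return False
        value = record[field]
        if callable(wanted):
            if not wanted(value):
                return False
        elif value != wanted:
            return False
    return True


def query_by_example(example, records):
    """Return all records that match the example template."""
    return [r for r in records if matches(example, r)]


# Hypothetical hotel records sharing one template (name, city, category, room).
HOTELS = [
    {"name": "Hotel A", "city": "Rome", "category": 3, "room": "double"},
    {"name": "Hotel B", "city": "Rome", "category": 4, "room": "single"},
    {"name": "Hotel C", "city": "Milan", "category": 4, "room": "double"},
]

# The "example" is a partially filled template; unspecified fields match anything.
example = {"city": "Rome", "category": lambda stars: stars >= 3, "room": "double"}
hits = query_by_example(example, HOTELS)
```

The shared template is what makes the approach work across data stores: every store must be able to interpret the same set of example fields.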

Standardized query languages

Standardized query languages usually expect users – which in this case will normally be system integrators rather than end-users – to learn a specialized language for querying a system. The best known of these is certainly the Structured Query Language (SQL) for relational databases. At least the core features of ISO/IEC 9075, the international standard specifying SQL, are implemented by virtually all suppliers of relational database management systems.

SQL does not lend itself particularly well to federated queries and is normally used to consult a given database instance. SQL-like syntax is used, however, for drill-down searches in federated registries such as those defined by the ebXML Registry Specification. Likewise, the SQL syntax has heavily influenced the syntax of a number of non-relational query languages such as the Object Query Language (OQL), the SPARQL Protocol and RDF Query Language (SPARQL), and aspects of the Topic Map Query Language (TMQL). In the following, we shall look at one new query language especially for (potentially federated) semantic queries.

SPARQL: The SPARQL Query Language for RDF is, as the name suggests, a language for querying RDF triples. This relatively new W3C Recommendation was published only in January 2008, but can already point to a considerable implementation base. It can be used to query a considerable number of commercial and non-commercial native triple stores (for a non-exhaustive list cf. http://esw.w3.org/topic/SparqlImplementations), but also adaptors such as D2R Server that sit on top of relational databases. This flexibility has encouraged the growth of a number of publicly available SPARQL endpoints, some of which are listed on http://esw.w3.org/topic/SparqlEndpoints.

Where the queried store supports the corresponding inferencing, SPARQL queries can honour the transitivity properties defined in RDF-S and OWL ontologies.

Typical queries might look like:


  SELECT ?resource
  WHERE { ?resource dc:creator "Eleanor Hallowell Abbot" }

which would list all resources created by Eleanor Hallowell Abbot available in a given triple store, or

  SELECT ?title
  WHERE { ?book dc:language "en" . ?book dc:title ?title }
  ORDER BY ?title

which lists all available titles of publications in English.

Unlike SQL, SPARQL can be used for distributed queries and aggregation of data across data stores [Schenk, Staab, 2008], [Haase, Wang, 2007], [Quilitz, Leser, 2008]. Vocabularies can be cross-referenced and cross-queried across data stores. However, where participating data stores expose divergent ontologies, suitable mappings, e.g. to a reference ontology, must exist for a distributed query to succeed. Such a search strategy could in principle scale to manually annotated data sources such as web pages annotated with RDFa.
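Why mappings to a reference ontology are needed for a distributed query can be shown with a deliberately simplified sketch. It does not use SPARQL or any RDF library; it matches plain tuples in memory. The prefixes `a:`, `b:` and `ref:`, the predicate names and the data are hypothetical.

```python
# Reference vocabulary used by the metasearch engine (hypothetical prefix "ref:").
# Each store maps its local predicates onto that reference vocabulary.
STORE_A = {
    "triples": [("a:h1", "a:hotelName", "Hotel A"), ("a:h1", "a:town", "Rome")],
    "mapping": {"a:hotelName": "ref:name", "a:town": "ref:city"},
}
STORE_B = {
    "triples": [("b:h7", "b:label", "Hotel B"), ("b:h7", "b:locatedIn", "Rome"),
                ("b:h8", "b:label", "Hotel C"), ("b:h8", "b:locatedIn", "Milan")],
    "mapping": {"b:label": "ref:name", "b:locatedIn": "ref:city"},
}


def to_reference(store):
    """Rewrite a store's triples into the reference vocabulary; unmapped triples are dropped."""
    return [(s, store["mapping"][p], o) for s, p, o in store["triples"]
            if p in store["mapping"]]


def names_in_city(stores, city):
    """Distributed query: names of all subjects whose ref:city equals `city`."""
    names = []
    for store in stores:
        triples = to_reference(store)
        subjects = {s for s, p, o in triples if p == "ref:city" and o == city}
        names += [o for s, p, o in triples if p == "ref:name" and s in subjects]
    return sorted(names)


rome_hotels = names_in_city([STORE_A, STORE_B], "Rome")
```

Without the two `mapping` tables the same query pattern could not be evaluated against both stores, since neither uses the other's predicate names.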

That said, at present few, if any, examples of distributed SPARQL queries across a number of nodes operated by different organizations are known to be used in a production environment. This applies even more to queries that include many individual web pages, although some commercial products such as AllegroGraph (http://agraph.franz.com/allegrograph/) support elaborate and largely transparent federated SPARQL queries and reasoning across distributed instances of the system. Little is known about whether the technology would, in fact, scale well enough for large heterogeneous networks and, if so, in which type of network topology.

Similarly, in spite of the rather positive overall SPARQL take-up in general, no endpoints in production use are currently known in the eTourism domain. SPARQL may or may not prove to be a good choice for the domain.

Interface standardization

Like the query language itself, the interfaces to query services can also be standardized, e.g. through shared interface specifications in WSDL. Without a shared query language the expressiveness of such services is necessarily limited, but in many cases the result sets even of simple queries can be subsequently refined in further query steps.

The Open Travel Alliance specifies query interfaces in its schemas (http://www.opentravel.org/Specifications/SchemaIndex.aspx?FolderName=2008A), e.g. for the availability of cruises, of golf courses and many more. While the long-term benefits of interface standardization may be smaller than those of a shared query language, it is a chain of piecemeal, often informal standardization activities that can lower integration costs in the short to mid-term.

Metadata syndication

The alternative to distributed queries is local data stores based on metadata syndication. In rather simple forms – regular supply of data dumps generated from supplier’s CRSs – this is in many cases the practice in today’s GDSs. Often such dumps simply replace all the supplier’s data.

Using simple syndication protocols based on Topic Map or RDF payloads, only records that have actually changed are exchanged between data stores. An -based general purpose syndication protocol is specified in part 1b of the nascent eGov-Share CWA (http://www.egovpt.org/fg/CWA_Part_1b). Nodes subscribe to change feeds and can thus import new, deleted or updated records on a case-by-case basis from their source registry, provided they have the necessary credentials to access the feeds and their metadata can be mapped onto a shared reference ontology.

Figure 9-1


Querying thus becomes a sub-problem of data integration, and queries can then be run locally against the aggregated data store. Updates can be pulled at short intervals (say, every 10 minutes), thus providing cached queries with nearly live results.
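The pull-based import of change feeds can be sketched as follows. This is not the protocol of the eGov-Share CWA; the feed format (entries with `seq`, `op`, `id` and `record`) is a hypothetical simplification that only illustrates how a node applies new, updated and deleted records and then queries its local copy.

```python
def apply_feed(store, feed, last_seen):
    """Apply all feed entries newer than `last_seen` to the local store.

    Returns the sequence number of the last entry applied.
    """
    for entry in sorted(feed, key=lambda e: e["seq"]):
        if entry["seq"] <= last_seen:
            continue  # already imported in an earlier pull
        if entry["op"] == "delete":
            store.pop(entry["id"], None)
        else:  # "create" and "update" both overwrite the local copy
            store[entry["id"]] = entry["record"]
        last_seen = entry["seq"]
    return last_seen


# Hypothetical change feed published by a source registry.
FEED = [
    {"seq": 1, "op": "create", "id": "h1", "record": {"name": "Hotel A", "price_eur": 95}},
    {"seq": 2, "op": "create", "id": "h2", "record": {"name": "Hotel B", "price_eur": 120}},
    {"seq": 3, "op": "update", "id": "h1", "record": {"name": "Hotel A", "price_eur": 89}},
    {"seq": 4, "op": "delete", "id": "h2"},
]

local_store = {}
cursor = apply_feed(local_store, FEED[:2], last_seen=0)   # first pull
cursor = apply_feed(local_store, FEED, last_seen=cursor)  # later pull, only seq 3 and 4 apply
# Queries now run locally against the aggregated store.
cheap = [r["name"] for r in local_store.values() if r["price_eur"] <= 90]
```

Keeping a cursor per source registry means that each pull transfers and applies only the records that changed since the previous pull.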

9.2.3 Gaps and future needs

Query by example

Query by example (QBE) can be used without any specific query language, only by the use of data samples. This makes it easy to implement when data interoperability is solved, independently of which kind of method is used to reach data interoperability (standard, interfaces, mediation). The main drawback is the fact that complex queries cannot be made and user requirements will not be met in most cases. However, in specific scenarios, especially when looking for descriptions or listings in domains with a shared or even standardized ontology, QBE might prove to be sufficient.

Standardized query languages / SPARQL

In order to gauge the potential of standardized query languages in general and SPARQL in particular in tourism scenarios, we need experience reports and test-beds. Such test-beds should involve major information providers and/or integrators to evaluate aspects such as:

• ease of the production of RDF triples based on existing data stores,
• ease of the implementation of SPARQL endpoints on top of existing data stores, and
• performance characteristics of federated SPARQL queries.

Since the ICT infrastructure in the tourism industry is characterized by a broad range of heterogeneous systems (and thus different databases), it is very unlikely that a typical query language can be deployed as a standard on a broad base. SPARQL, on the other hand, has the potential for broad acceptance, since it can be deployed on top of existing reference models in the case of divergent data models. In this setup it has similar benefits and constraints as QBE, but overcomes QBE’s main obstacle by allowing complex queries. SPARQL seems therefore to be one of the main potential candidates for handling metasearch queries in a distributed and divergent environment.

Interface standardization

The short-term benefit of interface standardization can be heightened by an overview of the respective schemata. On this basis the relevant fora such as the Open Travel Alliance and XFT can spot gaps and fill them. Such specifications should be elaborated in line with the general recommendations in the process handling section.

Interface standardization also seems a reasonable practical method for running distributed queries. Its potential is limited mainly by two facts. Firstly, each possible query sequence must be defined as part of the interface(s). This makes query interfaces difficult to define and adopt, and therefore limits the potential for running complex queries, since the effort of defining and deploying interfaces becomes overwhelming. Secondly, participating partners must either implement the standard or define mappings to be interoperable with the standard. Thus either query interfaces have to be implemented for each query scenario, or mappings based on a shared reference model must be set up.

Thus interface standardization is more advanced than QBE, but still has similar restrictions. It may be well suited to specific scenarios, but reaches its limits in broader deployment.

Metadata syndication

eTourism can build on the experience with metadata syndication in the eGovernment domain. Those results should be evaluated and screened for their applicability in the data integration between CRSs, GDSs and intermediaries.

In fact metadata syndication bypasses the problem of running integrated queries in “metasearch” scenarios, by making normal queries in “metadata repositories”. This seems to be a fairly practical approach. It results in fast queries allowing whatever degree of complexity – only limited by the possibilities of the query language used. One disadvantage is the hosting of redundant data. Another one is the need of constant updates in a highly dynamic environment like, e.g., for hotel bookings. It might also be doubtful that metadata syndication is feasible for networked environments, since it might result in multiple asynchronous data hubs. A clear advantage is that it keeps the data source free from queries, improving the source’s overall system performance. Especially GDSs are subject to performance problems, which are alleviated this way. Again, the full benefit might be reserved to a limited number of search scenarios.

9.2.4 Recommendations

Short-term recommendations (1–3 years)

• If a system should be available for external queries, make use of general query statements that are supported by a broad range of query languages. Avoid features and functionality specific to your own database.
• Further develop flexible standardized query languages that can be adapted to different system environments and support semantically enriched data.
• Publish “partial translators”, which provide a structured translation for human search concepts like “near”, that can be used by different query languages.

Long-term recommendations (3–10 years)

• Research technologies for flexible and adaptive query methods that are able to understand the semantics of a web repository and send an appropriate query.

9.3 Role of registries in eTourism

9.3.1 Needs and requirements

Introduction

As has been discussed above, both services and data are widely distributed in typical eTourism scenarios. In addition to the information provided by one or more large players such as a GDS, a typical eTourism transaction can bring together – or could in the future profit from combining – local and remote data from many sources. Standardized queries, e.g. based on SPARQL, or ad-hoc protocols can be used to actually retrieve specific data sets from data stores (see above) and web services can be used to access specific services, ideally through standardized APIs.

This scenario presumes, however, prior awareness of all pertinent sources of information and services that in reality no single player is acquainted with. Instead, machine-processable information on such stores and services is currently either not available at all or spread across many different data collections. These range from major commercial operators such as the GDSs themselves over national or regional integrators and portals to the web sites of small tourist destinations that list relevant services in their small geographical area.

The need for machine-processable information especially on services has long been recognized. When web services became popular in the late 1990s, three key factors were considered to be crucial for the success of the then new paradigm:

1. Technical interoperability: Web services need to exchange (often pre-defined) data structures or perform RPC-style calls across systems.
2. Description of service interfaces: The APIs of web services and the data structures must be defined in a machine-readable fashion.
3. Lists of available services: Knowledge about other existing web services and their goals is a prerequisite for using them.

Solutions for these requirements are based on open specifications and, in the context of “traditional” web services, are usually identified with the three well-known basic web service standards SOAP, WSDL and UDDI (questions of semantic interoperability were largely out of focus at that time). In RESTful Web Services the stack is somewhat less clearly defined, especially for machine-processable API descriptions, but the general requirements are very much the same.

Needs

Looking beyond those specific web service standards, the OASIS Reference Model for Service Oriented Architectures [OASIS Reference Model] explores some of these requirements on a more precise, technologically neutral level. Around the idea of a service as “the mechanism by which needs and capabilities are brought together” gravitate concepts such as interaction of services, their service descriptions and their visibility and reachability, all grounded in the willingness to collaborate with the goal of achieving a real-world effect.

Rightly, “the large amount of associated documentation and description” [OASIS Reference Model] that exists for a service is seen as one of the defining characteristics of Service-Oriented Architectures (SOAs). This service description, however, goes well beyond interface descriptions and includes both information about organizational prerequisites – the foundations for what is termed organizational interoperability – and a depiction of the service’s semantics, required for semantic interoperability. The same principles can be extended to the essential role of visibility for data stores in eBusiness and in particular eTourism transactions, an aspect less in the focus of the OASIS Reference Model.

Registries are typically regarded as one approach to achieve visibility, other options being semantic or general-purpose search engines. Registries in this sense help to find actual resources, thus enabling their discovery. For that purpose, they store more or less standardized metadata to describe those resources and offer an interface to query that metadata. This metadata could conceivably one day also be harvested using information extraction.

Requirements

Registries must facilitate finding existing services and data repositories. Together with standardized query technologies they thus help to put those resources to optimal use.

While registries focus on the visibility of resources, they build on the often unspoken assumption that there is already a willingness to collaborate and share those resources in a given context, be it within an organization or across organizational boundaries, be it for free or for a charge. This may or may not be true in a given case, and it may or may not imply that a registry owner is willing to give up control over the data. Furthermore, in the real world there is rarely a single source of information for any given area of interest, and, as we have seen in the introduction to this section, this is particularly true for the tourism sector. Individual registries are maintained at various levels of government (notably by local authorities supporting their local tourism industry), in tourism associations, GDSs and other private sector organizations. This makes sense; in many cases the maintainers are closest to the very resources themselves and have both the best first-hand knowledge and the strongest business case to keep the data up-to-date.

That said, there is also a strong requirement for centrality, or, more exactly, central interfaces to enable searches across individual registries. Otherwise any one search will involve direct queries to a large number of eTourism registries, negating the very idea of visibility of data and services.
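The requirement for a central interface can be illustrated with a minimal sketch in which one query is fanned out to several individual registries. The class and function names and the record layout are hypothetical and serve only to show the principle; real registries would be remote systems with their own query interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """A local registry holding metadata records about resources."""
    name: str
    records: list = field(default_factory=list)

    def query(self, keyword):
        # Case-insensitive match on the record's description.
        return [r for r in self.records
                if keyword.lower() in r["description"].lower()]


def central_search(registries, keyword):
    """Fan one query out to every registry and tag each hit with its source."""
    hits = []
    for registry in registries:
        for record in registry.query(keyword):
            hits.append({**record, "registry": registry.name})
    return hits


local = Registry("local-authority", [
    {"id": "r1", "description": "Hotel booking service"},
])
association = Registry("tourism-association", [
    {"id": "r2", "description": "Event calendar data store"},
    {"id": "r3", "description": "Hotel classification data"},
])

results = central_search([local, association], "hotel")
```

The caller issues a single search and still learns which registry each result came from, so the maintainers of the individual registries remain visible.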

9.3.2 State of the art

UDDI and the ebXML Registry Specification

Two well-known registry standards dominate the relatively small literature on the subject, namely UDDI and the ebXML Registry Specification. But neither standard has been widely adopted in the market. This is, as we argue in Küster, Moore, Ludwig, 2007, due to fundamental design issues that plague both specifications, namely the mixing of technical exchange formats, information models and organizational rules, which are in reality orthogonal. This leads to very limited adaptability to new requirements and to bloated specifications.

UDDI

UDDI is the best-known standard for registries of services. The UDDI 1.0 specification was formally released in 2000, pushed by major software vendors such as IBM, SAP and Microsoft. It was supposed to lay the basis for the loosely coupled operation of web services, bringing together service consumers and service providers, possibly even based on automatic discovery and cooperation. For this purpose, the vendors created three public UDDI registries that were open to all interested parties. These public registries, however, were not widely used and were eventually discontinued in early 2006.

Technically, UDDI is above all an API for a set of SOAP-based web services with their respective data models. This API has continued to grow over the three published versions of the standard and today covers, amongst others, methods for publishing information on businesses and their services, for finding them and for establishing links between them. By now, the monolithic UDDI 3.0 standard totals an estimated 400 pages, not counting the nine XML schemata with the actual API specifications.

ebXML Registry Specification

The ebXML Registry Specification is composed of two sister OASIS standards, the ebXML Registry Information Model (ebXML RIM) and the ebXML Registry Services specification (ebXML RS) [OASIS ebXML Registry], the former specifying its internal data model, the latter its SOAP-based API. In coverage it is quite similar to UDDI, though it supports more flexible content models. It distinguishes itself from UDDI by its support for federated queries across a number of different registries:

Figure 9-2

The Open Source Omar project (http://ebxmlrr.sourceforge.net/3.0/PropertiesGuide.html) is by all appearances the most popular implementation of the ebXML Registry Specification.

Semantically Enhanced Registries in SATINE

Neither UDDI nor the ebXML Registry Specification allows per se for detailed semantic descriptions of (web) services, let alone other types of resources such as data stores. Queries can at most leverage rather coarse-grained, domain-independent taxonomies such as UNSPSC.

As has been argued above, semantic technologies are a key to enabling data and process interoperability, but are at present largely underused in eTourism in general and in GDSs in particular. The SATINE project (http://www.srdc.metu.edu.tr/webpage/projects/satine) was funded under FP6 from 2004 to 2006 with the explicit goal to overcome the shortcomings of some current GDSs. SATINE set out to “provide tools and mechanisms for publishing, discovering and invoking web services through their semantics in peer-to-peer networks” (http://www.srdc.metu.edu.tr/webpage/projects/satine/deliverables/D4.1.1.doc). Semantic technologies and specifically ontologies for web services play a significant role in the SATINE architecture.

Amongst other deliverables SATINE set out to establish a “Semantic-based Interoperability Infrastructure for integrating Web Service Platforms to Peer-to-Peer Networks”. Looking at both specifications, but in particular at the ebXML RS, the SATINE project defined mechanisms for describing the Web service semantics of registry entries through the use of OWL-S (task 4.1 deliverable (http://www.srdc.metu.edu.tr/webpage/projects/satine/deliverables/D4.1.1.doc)). It built tools for mapping OWL constructs into UDDI and ebXML RS, in particular into ebXML class hierarchies, and studied how these semantic descriptions can be leveraged in queries. The user interfaces make it possible to discover the Web Services advertised in the SATINE P2P network using their semantic definitions (http://www.srdc.metu.edu.tr/webpage/projects/satine/publications/FreezedeChallenges.doc).

Discovery is intended to happen on both levels: that of concrete eTourism services and of eTourism-related collections and registries. Few strategies for actually federating those registries are defined, though.

CEN/ISSS eGovernment Focus Group and CEN/ISSS WS eGov-Share

The CEN/ISSS eGov-Share Workshop (http://www.cen.eu/cenorm/businessdomains/businessdomains/isss/workshops/wsegovshare.asp) was established in February 2008 with the aim of helping designers and developers of eGovernment systems and applications by developing approaches and tools to facilitate the sharing of information across agencies and across borders.

The workshop produces specifications, guidelines and two practical demonstrators. These are to help designers and developers of eGovernment systems and services to exchange descriptions of eGovernment resources in the widest sense and to build and maintain federated repositories that integrate resources – both services and data stores – created and managed by several agencies, creating a single point of access for users.

Figure 9-3

Local registries are aggregated into larger registries that are often targeted at specific user communities. Those aggregated registries can, of course, be further aggregated into other registries still. All the while the origin of certain metadata sets remains fully traceable through unique identifiers. Furthermore, each of the semantic descriptions is addressable through normal URLs, making the overall architecture fully RESTful and an ideal fit for Resource Oriented Architectures (ROAs) and SOAs alike.
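The aggregation principle can be sketched as follows. This is a simplified illustration under stated assumptions: the dictionary layout, the `aggregate` function and the `urn:example:` identifiers are hypothetical and are not taken from the eGov-Share specifications.

```python
def aggregate(name, *sources):
    """Merge entries from local or already aggregated registries.

    Each entry is a dict with a globally unique "id" and an "origin" naming
    the registry that first published it. Entries are keyed by "id", so an
    entry that reaches the aggregate by two routes is stored only once and
    its origin is never overwritten.
    """
    merged = {}
    for source in sources:
        for entry in source["entries"]:
            merged.setdefault(entry["id"], entry)
    return {"name": name, "entries": list(merged.values())}


town = {"name": "town", "entries": [
    {"id": "urn:example:town:1", "origin": "town",
     "title": "Museum opening hours"},
]}
region = {"name": "region", "entries": [
    {"id": "urn:example:region:7", "origin": "region",
     "title": "Hiking trail data"},
]}

regional_portal = aggregate("regional-portal", town, region)
# Aggregating again, with "town" arriving by two routes, adds no duplicates.
national_portal = aggregate("national-portal", regional_portal, town)
```

Because every entry carries its identifier and origin through each aggregation step, the source of a metadata set stays traceable however many registries it passes through.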

In the overall framework of specifications the workshop first specifies a simple domain-independent, Atom-based protocol for the exchange of semantic descriptions. It continues with a reference ontology for eGovernment resources with two representations, one in OWL and one in Topic Maps. While this ontology is naturally domain specific, the overall architecture supports plugging in arbitrary other domain reference ontologies, e.g. for eTourism. Terminological resources, e.g. “skosified” vocabularies and taxonomies such as Eurovoc, are used to provide anchor points for value domains. Soft cultural elements help to heighten the awareness of and information on culturally variable system factors.

Figure 9-4

The resulting multipart CWA is currently out for open consultation and consists of the following parts:

• CWA Part 0: Introduction
• CWA Part 1a: Reference Ontology and Metadata Schema
• CWA Part 1b: Protocol for the Syndication of Semantic Descriptions
• CWA Part 2: Federated Terminological Resources
• CWA Part 3: Establishment of a set of Soft Cultural Elements
• CWA Part 4: Evaluation and Recommendations

Future work may add specifications for the organizational arrangements especially in the eGovernment domain.

9.3.3 Gaps and future needs

Shortcomings of current registry standards

Neither UDDI nor ebXML registries have been well received in the market place. This is due to a number of serious shortcomings that affect those registries:

1. Both specifications essentially build on a fixed ontology with corresponding data formats for registry entries. This ontology is non-trivial to extend for other requirements (the ebXML Registry model being more flexible than UDDI).
2. Both specifications are overly long and complex.
3. Both are bound to a single technology stack, SOAP-based web services.
4. Both standards lack a well-defined and simple data exchange and storage format.
5. UDDI in particular clearly implies a specific set of procedures and organizational environments in which to operate.
6. UDDI has insufficient support for linking up registries. The federation support in ebXML is heavily Web Service based and difficult to implement.

Attempts such as SATINE to build ontology constructs into the registries further complicate the specifications and have seen little adoption in practice.

In short, UDDI and, to a lesser extent, the ebXML Registry Specification mix three important but orthogonal concerns that should be kept apart:

• an information model for registry entries,
• a specific technical interface to the registry, and
• organizational procedures for maintaining the registry data.

Future needs

In line with the recommendation of the CEN/ISSS eGovernment Focus Group to “build domain registries in response to the needs of individual business cases and to construct them out of existing, standardized technologies” (section 1.2.2) we need to design federated registries in the eTourism domain that are built on a suitable information model – possibly in line with models used in semantic interoperability. At the same time, we should align the technical interfaces to those registries – both notification and exchange formats – with the trends set by the CEN/ISSS eGov-Share Workshop.

Much of the eGov-Share architecture lends itself ideally to this adoption, provided that a reference ontology for eTourism-related resources is developed.

Figure 9-5

The “watchtower” registry of relevant eTourism standards (cf. recommendation 6.1.4.1) also lends itself as the test case for developing collaboration models for shared eTourism registries. Once this prototype is in operation, plans for the long-term operation of that registry must be elaborated that, again, can be exemplary for other registries in the domain.

9.3.4 Recommendations

Short-term recommendations (1–3 years)

• Develop a reference ontology for eTourism-related resources.
• Build the “lighthouse” registries (cf. other recommendations) based on the syndication specifications standardized in WS eGov-Share.
• Specify collaboration models for shared eTourism registries.

Long-term recommendations (3–10 years)

• Plan for the long-term operation and business models for the “watchtower” registry.

10 Object identification

10.1 Needs and requirements

10.1.1 Introduction

Until recently, and still as standard practice, information on travel-related products has been obtained and purchases made via intermediaries (such as agencies) that provide the information directly and perform the bookings on dedicated, possibly vendor-specific systems. As introduced in the case study, the use of the internet for travel-related searches and online shopping is increasing and already widely accepted. Multiple sources of information are available, proposing single products (like hotels, car rentals, events, etc.) or complex packaged products, comparing or aggregating information from different sources and becoming sources themselves.

Identifying identical items (like the same hotel listed under similar names on different sites), comparing information on different items (such as room or price definitions), and merging or filtering similar information from different sources (such as getting information on Baleares sometimes by searching a Spanish region and sometimes by searching Baleares directly) are next to impossible in the current situation.

10.1.2 Needs

In this clause the basic needs for unique identifiers for tourism products or services are discussed.

Travel being a change in location, precisely identifying geographical locations is a basic need for tourism. Identification mechanisms should allow searches and identification as well as geopositioning on maps, a tool increasingly used on the web in relation to travel. In a more general context, each travel service should have unique identifiers so as to allow cross-references between the various sources, reliable comparisons and data aggregation. That would cover hospitality items, events, animations, activities, historical sites, exhibitions, museums, etc. In a world of open architecture technology, that would allow matching data from different sources without time-consuming transcoding data having to be built, thereby reducing the number of translations, allowing more efficient querying and reducing the time needed to resolve query issues such as cache synchronization. That would also allow building extensive knowledge bases compiling different data sources (matching hotel, chain and testimony sites for instance, completed with regional, historical or event site information, etc.).

Certain aspects of each type of service further require unique identification so as to remove ambiguity of definition and to allow comparisons. For instance, when buying a stay in a hotel, the type of room (a double or a triple room) becomes a major criterion. At present, however, it is not necessarily clear what a double room is (what is the size of the bed, is there one bed or two, can an extra bed be added, for a child or even an adult, etc.). For other components, other features may be crucial and should be correctly identified.

On a different level, to track the different intermediaries introduced in the previous case study and possibly to allow compensation for services rendered by different entities over the whole trip (pre-trip, on-trip and post-trip), tagging individual entities such as central reservation systems, credit card companies, GDSs, web sites, wholesalers, travel agencies, chains, etc. would greatly facilitate commission and money collection.

10.1.3 Requirements

This clause outlines the different requirements that may be deduced from the previously exposed needs. Being able to uniquely identify objects corresponds to building taxonomies for certain domains or ontologies, some of them being mentioned in the following clauses. More information may be found in the taxonomy clause of this document.

Location codes

Unique, precise and exhaustive location codes are a basic requirement for the travel industry. Location coding should not be limited to general codes such as countries, cities or airports. With online information and booking facilities becoming widespread, it is now necessary to be able to associate codes with all levels of locations that can be used in a trip, such as

• touristic regions,
• terminals,
• stations (railway stations, ski stations, car rental pickup stations),
• points of interest,
• leisure, event or activity locations,
• etc.

The location codes are often directly used by the experts and become also more and more visible to end users (on itineraries, on displays, in search forms, etc.).

Geodetic coordinates are also becoming vital information for searches (“What can I do in the vicinity of my hotel?”, “What alternative hotel?”, etc.) and for representing itineraries, results, etc. However, it does not seem realistic for geodetic coordinates to be the only coding mechanism, since coordinates are complex and in essence correspond to a point. What, then, would be the coordinates of a country?
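A vicinity search of the kind quoted above can be sketched with the haversine great-circle formula. The place names and coordinates below are invented for illustration, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_vicinity(origin, places, radius_km):
    """Return the names of places within radius_km of origin (lat, lon)."""
    return [name for name, (lat, lon) in places.items()
            if distance_km(origin[0], origin[1], lat, lon) <= radius_km]


hotel = (48.2082, 16.3738)  # hypothetical hotel location
places = {
    "museum": (48.2038, 16.3616),   # roughly 1 km from the hotel
    "airport": (48.1103, 16.5697),  # roughly 18 km from the hotel
}
nearby = in_vicinity(hotel, places, radius_km=2.0)
```

The sketch also shows the limitation raised in the text: the calculation works on points, so an extended object such as a region or a country needs a code or an area description rather than a single coordinate pair.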

Travel service codes

Travel products are always composed of a number of separate travel services proposed by different vendors through a multitude of resellers. Some of those companies have codes that are standardized (such as airline IATA codes), but most have codes that depend on the vendor or distributor (for instance, hotel codes are different for each distributor, so that the same hotel has a different code in Sabre, Amadeus, Hotels.com, the hotel chains proposing the hotel and the hotel’s own property management system).

Furthermore, more and more types of leisure, activity or travel-related services are being proposed and published on the Internet without any unique identification (or classification). Unique identifiers for all those services are required to have a chance of discovering and aggregating data efficiently.

At present, however, it seems unrealistic to imagine a single global entity providing identification for all services worldwide. Specific identifiers per country or per sector would also be possible, provided there is the capacity to ensure uniqueness of codes.
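The effect of a shared identifier on matching can be shown with a small cross-reference sketch. All system names, local codes and shared identifiers below are invented for illustration; no real distributor codes are used.

```python
# Hypothetical cross-reference: (source system, local code) -> shared identifier.
# Every code in this table is invented for illustration.
CROSS_REFERENCE = {
    ("gds-a", "HTL00123"): "hotel:0001",
    ("gds-b", "VIE-4711"): "hotel:0001",
    ("chain-x", "X-88"): "hotel:0001",
    ("gds-a", "HTL00999"): "hotel:0002",
}


def shared_id(source, local_code):
    """Resolve a vendor-specific code to the shared identifier, if known."""
    return CROSS_REFERENCE.get((source, local_code))


def same_hotel(offer_a, offer_b):
    """Two offers refer to the same hotel if they resolve to the same id."""
    id_a = shared_id(*offer_a)
    id_b = shared_id(*offer_b)
    return id_a is not None and id_a == id_b
```

With such a table, offers from two distributors can be matched by a simple lookup, for example `same_hotel(("gds-a", "HTL00123"), ("gds-b", "VIE-4711"))`. Without shared identifiers, a table of this kind has to be built and maintained separately by every party that aggregates data.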

Travel service qualifier codes

To compare or qualify each type of service, structured information based on universally accepted taxonomies is increasingly required. This information must also be codified. For some services, like hotels or car rental, codification is more developed than for others, but it only corresponds to recommended codifications and not to true unique identifiers. For most services, codification is still specific to each service provider.

In that case also, it seems unrealistic to have a unique body responsible for that type of codification.

Travel company codes

The high level of intermediation and the number of different companies involved in a selling process make it complex to explain pricing schemes, to resolve complaints and to proceed with payments. Adding traceability for each step in the process is becoming an important requirement. That would imply unique identifiers for each company involved in those processes, such as

• the end travel services introduced in the previous clause,
• the wholesalers (hotel chains, tour operators),
• the distributors such as travel agents, online companies,
• the intermediaries (central reservation systems, GDSs, switch companies),
• the compensation, commission processing or payment processing companies,
• the call centres,
• etc.

10.2 State of the art

In this section, we review commonly used codes. There are other bodies producing codes that either associate other codes to regions, cities, countries or provide local codes (for instance for all cities in a country, postal codes, etc.). Those have not been reviewed, though they could very well be used to designate more locations, not linked with airports for instance.

10.2.1 IATA

The IATA codes are the first codes that come to mind in the travel industry, because they are used for airports, airline companies, etc.

• IATA Airport Codes: alpha-3 codes. The IATA alpha-3 airport codes uniquely identify individual airports worldwide. They are made up of precisely three letters; numerals are not allowed. In fact those codes have been expanded to also contain city codes in case a city has more than one airport, as well as coach, rail or ferry locations if requested by an airline or CRS. For instance, TGV railway stations usually have IATA codes because TGVs are used as feeders for the airlines. It is therefore more accurate to define IATA codes as location codes used in travel rather than only airport codes. Except for cities, the codes correspond to transportation boarding locations and not really to stay or service oriented locations. A drawback of IATA airport codes is that they cannot be extended much further to include all locations required for the travel industry.
• IATA Airline Codes: officially alphanumeric-3 codes as well as pure numeric codes (used for ticketing for instance). They were initially alphanumeric-2 codes, which are the codes that are mainly used. The alphanumeric-2 codes are used in combination with others in ticket numbers, timetables, tariffs, etc. Codes are also allocated to railway or coach companies, whenever requested by airlines or GDSs. There are also codes that are reused for different airlines whenever their destinations are not likely to overlap. Codes allocated to airlines that discontinue business may be reused after six months.
• IATA Agency Codes: numeric codes. IATA is pivotal in the worldwide accreditation of travel agents issuing airline tickets, with the exception of the USA, where this is done by the Airlines Reporting Corporation. Permission to sell airline tickets from the participating carriers is achieved through national member organizations. As a consequence, there are agencies that do not have IATA numbers, which has led to alternative solutions in some countries, such as allocating pseudo IATA numbers (for example to SNCF issuing agencies in France that are not IATA-accredited).

There are also less used IATA codes such as baggage tag issuers, delay codes, accounting prefix codes, logistics company codes, etc.

10.2.2 ICAO

• ICAO airport codes: The ICAO (International Civil Aviation Organization) alpha-4 airport identifier codes uniquely identify individual airports worldwide. They are used in flight plans to indicate departure, destination and alternate airfields, as well as in other professional aviation publications. Usually, the first two letters of ICAO codes identify the country (but do not correspond to ISO country codes). In the continental USA, however, codes normally consist of a ‘K’ followed by the airport’s IATA code.
• ICAO airline designator: The ICAO airline designator is a code assigned by the International Civil Aviation Organization (ICAO) to aircraft operating agencies, aeronautical authorities and services. The codes are always unique by airline. There are ICAO codes for companies that have no corresponding IATA code.
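The format rules described for IATA and ICAO airport codes can be expressed as simple checks. This is a minimal sketch of the code formats only; the function names are hypothetical, and a well-formed code is not necessarily an assigned one.

```python
import re

IATA_AIRPORT = re.compile(r"[A-Z]{3}")  # precisely three letters, no numerals
ICAO_AIRPORT = re.compile(r"[A-Z]{4}")  # alpha-4 identifier


def is_iata_airport_code(code):
    """Check the format of an IATA alpha-3 location code."""
    return IATA_AIRPORT.fullmatch(code) is not None


def is_icao_airport_code(code):
    """Check the format of an ICAO alpha-4 airport identifier."""
    return ICAO_AIRPORT.fullmatch(code) is not None


def likely_icao_for_continental_us(iata_code):
    """Build the usual ICAO code for an airport in the continental USA.

    Such codes "normally" consist of 'K' plus the IATA code, so the result
    is only a candidate that still needs checking against an authoritative
    list.
    """
    if not is_iata_airport_code(iata_code):
        raise ValueError(f"not an IATA alpha-3 code: {iata_code!r}")
    return "K" + iata_code
```

For example, `likely_icao_for_continental_us("JFK")` yields `"KJFK"`. Outside the continental USA no such rule applies, which is one reason why mappings between the two code sets have to be maintained as data rather than derived.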

10.2.3 ISO

A number of ISO standards are used on a regular basis in the travel industry:

• Country codes: EN ISO 3166-1 alpha-2, alpha-3 and numeric. EN ISO 3166-1, as part of the ISO 3166 standard, provides codes for the names of countries and dependent territories, and is published by the International Organization for Standardization (ISO). Some codes in fact designate regions and not countries (such as MQ for Martinique, part of France), which leads to some confusion (“Is FR only the mainland or the whole of France?” for instance). Alpha-2 codes are more often used, alone or in combinations.
• Region zones: ISO 3166-2 alphanumeric codes. ISO 3166-2 is the second part of the ISO 3166 standard published by the International Organization for Standardization (ISO). It is a system created for coding the names of country subdivisions and dependent areas, such as regions, states, departments, etc., depending on the country. They usually correspond to administrative zones.
• Language codes: ISO 639-1. Although alpha-2 codes are not sufficient to code all languages, they are sufficient in most cases. If more codes are needed, ISO 639-2 or ISO 639-3 could be used. In some cases, when local variations of the languages are important, the EN ISO 3166-1 alpha-2 country code is used in association with the language code (such as fr-FR and fr-CA).
• Currency codes: ISO 4217. The first two letters of the code are generally the two letters of the EN ISO 3166-1 alpha-2 country code and the third is usually the initial of the currency itself. In some cases, the third letter is the initial for “new” in that country’s language, to distinguish it from an older currency that was revalued; the code often long outlasts the usage of the term “new” itself.
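How these codes combine can be sketched as follows. The function names are hypothetical, and the checks cover only the shape of the codes, not whether a given code is actually assigned.

```python
import re

# Language (ISO 639-1 alpha-2) with an optional country (ISO 3166-1 alpha-2).
LOCALE = re.compile(r"([a-z]{2})(?:-([A-Z]{2}))?")


def split_locale(tag):
    """Split a tag such as "fr-CA" into (language, country or None)."""
    match = LOCALE.fullmatch(tag)
    if match is None:
        raise ValueError(f"unsupported tag: {tag!r}")
    return match.group(1), match.group(2)


def split_currency(code):
    """Split an ISO 4217 code into its usual (country, currency initial) parts.

    This follows the general pattern only; codes such as EUR are not built
    from a single country's code.
    """
    if not re.fullmatch(r"[A-Z]{3}", code):
        raise ValueError(f"not an alpha-3 currency code: {code!r}")
    return code[:2], code[2]
```

For instance, `split_locale("fr-CA")` gives `("fr", "CA")` and `split_currency("CHF")` gives `("CH", "F")`, the country code for Switzerland followed by the initial of the franc.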

10.2.4 UN/LOCODE

The United Nations Code for Trade and Transport Locations is more commonly known as UN/LOCODE. Although managed and maintained by the UNECE, it is the product of a wide collaboration in the framework of the joint trade facilitation effort undertaken within the United Nations.

Each code element consists of five characters, where the first two indicate the country (according to EN ISO 3166-1) and the following three represent the place name. Examples such as CHGVA, FRPAR, GBLON, JPTYO and USNYC ring bells for air travellers who are used to seeing the last three letters of these codes on their luggage tags. UN/LOCODE picks up the IATA location identifiers wherever possible, to benefit from their association value and to avoid unnecessary code conflicts. In allocating codes, the secretariat tries to find some mnemonic association link with the place names, to aid human memorization. This is of course increasingly difficult for large country lists where the 17576 (26 × 26 × 26) possible combinations of three letters are near exhaustion.
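The code structure and the size of the code space can be illustrated with a short sketch. It follows the letters-only description given above; the function name is hypothetical.

```python
import re

# Two letters for the country followed by three letters for the place.
LOCODE = re.compile(r"([A-Z]{2})([A-Z]{3})")


def split_locode(code):
    """Split a five-character code into its (country, place) parts."""
    match = LOCODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a five-letter location code: {code!r}")
    return match.group(1), match.group(2)


# Number of distinct three-letter place codes available within one country.
PLACE_CODES_PER_COUNTRY = 26 ** 3

country, place = split_locode("CHGVA")
```

Here `country` is `"CH"` and `place` is `"GVA"`, the part that travellers recognize from luggage tags. The limit of 17576 place codes applies per country, which is why large countries come close to exhausting it.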

Each code is also associated to different additional information, among them (possibly multiple) function(s) such as airport, harbour, railway station, road terminal, etc.

The position of this additional coding mechanism is interesting because

• it is based on existing and accepted standards (IATA codes whenever possible, EN ISO 3166-1 country codes);
• it expands the code list following the same structure and methodology;
• it takes into account the human use of the codes, facilitating mnemonic associations.

10.2.5 HEDNA

HEDNA is an international association focused on identifying distribution opportunities and providing solutions for the lodging industry and its distribution community. HEDNA compiles codes, for instance for hotel chains, room types, etc., and also provides lists and codes of conduct on how to use those lists.

HEDNA also works on a project to provide global unique identifiers.

10.2.6 ACRISS

ACRISS members utilize an industry-standard vehicle matrix to define car groups, ensuring a like-for-like comparison of standards across countries. This easy-to-use matrix consists of four categories. Each position in the four-character vehicle code represents a definable characteristic of the vehicle. The expanded vehicle matrix makes it possible to have 400 vehicle types.
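Decoding a four-character vehicle code position by position can be sketched as follows. The position names and the handful of letter values shown are assumptions recalled from the published ACRISS matrix and are deliberately incomplete; the authoritative values are those of the ACRISS matrix itself and should be checked there.

```python
# Partial, illustrative lookup tables (assumptions, not the full matrix).
POSITIONS = (
    ("category", {"M": "Mini", "E": "Economy", "C": "Compact"}),
    ("type", {"B": "2-3 door", "C": "2/4 door", "D": "4-5 door"}),
    ("transmission", {"M": "Manual", "A": "Automatic"}),
    ("fuel_aircon", {"R": "Air conditioning", "N": "No air conditioning"}),
)


def decode_acriss(code):
    """Decode a four-character vehicle code, one characteristic per position.

    Letters missing from the partial tables above are reported as "unknown".
    """
    if len(code) != 4:
        raise ValueError(f"expected four characters: {code!r}")
    return {name: table.get(letter, "unknown")
            for (name, table), letter in zip(POSITIONS, code.upper())}


vehicle = decode_acriss("ECMR")
```

Because each position is decoded independently, the same letter can carry a different meaning depending on where it appears, as with “C” and “M” in the tables above.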

This coding system has been adopted to ensure that all ACRISS members display the same coding for the same vehicles, enabling you to make an informed decision when comparing rates.

This certainly facilitates understanding what type of vehicle is being rented, though many surprises can still happen, even among ACRISS members.

ACRISS does not actually provide standardization for all car rental related data; for instance car rental stations are not standardized, nor are opening hours.

10.2.7 GIATA

GIATA acquires and standardizes (normalizes) the digital image and text data for many tour operators and travel agencies such as TUI, Thomas Cook, Easyjet, Expedia, Opodo or Lastminute.com. These data are also used by all well-known CRSs/GDSs (Amadeus, Sabre, Galileo/Worldspan) to provide decoding information based on a unique identifier present in those GDSs.

GIATA is not a global standardization body but it has compiled enough data to become de facto a “standard” source of information, their identifier becoming the identifier. It is not completely true though since it is not globally used, nor even used by the hotel owners.


10.2.8 GS1

The GS1 System is an integrated system of global standards that provides for accurate identification and communication of information regarding products, assets, services and locations. It is the most implemented supply chain standards system in the world.

GS1 Identification Keys automatically identify things such as trade items, locations, logistic units, and assets in a unique way worldwide. They can be used on bar codes, in online transactions, for selling or synchronization processes, etc.

Though this identification scheme is not used at present in a systematic way in the travel industry, it is applied in many other trades in a successful manner and could therefore be easily expanded to the travel trade.

GS1 operates in multiple sectors and industries and already works in close relation with many corporations throughout the world as well as various standardization bodies such as

• International Organization for Standardization (ISO),
• UN/EDIFACT,
• GCI (Global Commerce Initiative),
• ISBN (International Standard Book Number), and
• ISSN (International Standard Serial Number).

10.2.9 URI

Since we are reviewing methods to obtain unique identifiers, the W3C provides a means for globally unique identifiers: URIs. Uniform Resource Identifier (URI) is a compact string of characters used to identify or name a resource on the Internet. The main purpose of this identification is to enable interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols.

URIs could be used in the travel industry in a systematic way, but they have major drawbacks such as

• not being short,
• requiring registration (and therefore money),
• not really providing standard naming conventions.

10.2.10 UUID

Universally Unique Identifier (UUID) is an identifier standard used in software construction, standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE). The intent of UUIDs is to enable distributed systems to uniquely identify information without significant central coordination. Thus, anyone can create a UUID and use it to identify something with reasonable confidence that the identifier will never be unintentionally used by anyone for anything else. Information labelled with UUIDs can therefore be later combined into a single database without needing to resolve name conflicts.
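
A minimal sketch of this decentralised creation, using the `uuid` module of the Python standard library: a random (version 4) UUID needs no coordination at all, while a name-based (version 5) UUID is derived reproducibly from a namespace and a name. The exhibition URL used as the name is purely hypothetical.

```python
import uuid

# Version 4: generated from random bits, without any central coordination.
random_id = uuid.uuid4()

# Version 5: derived from a namespace and a name, so any party hashing the
# same name obtains the same identifier. The URL is a made-up example.
EXHIBITION_URL = "http://www.euromuse.net/exhibitions/example"
derived_a = uuid.uuid5(uuid.NAMESPACE_URL, EXHIBITION_URL)
derived_b = uuid.uuid5(uuid.NAMESPACE_URL, EXHIBITION_URL)

print(random_id)               # 36-character textual form, differs on every run
print(derived_a == derived_b)  # True
```

The 36-character textual form also shows why such keys are awkward for human use compared with short mnemonic codes.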

Though not directly applied in the tourism industry, since it is technically oriented, UUIDs are interesting in the sense that they do not require a centralised body for validation (though repositories or registries would be useful). UUID keys are still not directly usable due to their inherent complexity.

10.3 Gaps and future needs

In this section the gaps present in the current identification schemes are outlined as well as future needs.

10.3.1 Location

In the previous clauses we have seen that various associations and organizations propose location identifiers. However, there is currently no worldwide identification standard that can uniquely identify and provide information about entities within the travel industry.

Country codes

There is broad consensus on country codes (though several coding schemes exist). The ISO 3166 standard is very widely used and is even incorporated in other standards (such as UN codes). However, it is mostly the alpha-2 codes that are used, which limits migration to alpha-3 codes and may hinder extension of the code list.

Some “country” codes are also allocated to regions of certain countries, or even to parts of the world that are bigger than countries (like EU for the European Union, MQ for Martinique). This most likely comes from the need to have travel-oriented zones that often, but not always, coincide with countries. At present this is not done in a systematic way (there is no code for Corsica or the Balearics, for instance). There is a real need to differentiate touristic “zones” from political countries or areas.

Region codes

There is less consensus here. The ISO subdivisions of countries are less widely used because they match the needs of the travel industry less well.

• There is a need for travel-specific regions that do not map onto political or administrative boundaries: cruise regions at sea, ski or mountain regions that span several countries, and specific touristic regions that may lie within a country or across countries (the Mediterranean region, the south of France, Sardinia, the Balearics, La Réunion, etc.).
• Some countries have several levels of subdivision, whereas the current ISO codes take only one level into account (for example the French departments but not the French regions, for which a local coding is used; some of these local codes are identical to ISO subdivision codes but carry a different meaning).
• Some travel companies specialize in certain domains (diving, hunting, etc.) and require specific regions related to their specialty. There is no way to submit such regions in order to create a global repository. There should be a mechanism to submit and validate such codification, because it would allow a better understanding of offers that are at present difficult to compare.

City, airport and other point of travel codes

IATA, though widely accepted, provides a number of incomplete identifications (cities, airports, railway stations, etc.) without differentiating or identifying the types of location.

New codes are added only in connection with airline-related business, without a systematic coding process.

Furthermore, alpha-3 identification is far too limited to code travel related identifications.
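
To make the limitation concrete, the short calculation below counts how many distinct codes a fixed-length alphabetic scheme can hold. It assumes the 26 letters A to Z and no reserved or excluded combinations, which is a simplification; the function name `code_space` is introduced only for this example.

```python
ALPHABET_SIZE = 26  # letters A to Z (assumption: no digits, nothing reserved)


def code_space(length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Number of distinct codes of the given length over the alphabet."""
    return alphabet_size ** length


for length in (2, 3, 4):
    print(f"alpha-{length}: {code_space(length)} possible codes")
```

Three letters give 17,576 combinations at most, which is far fewer than the number of cities, stations and points of interest that a travel-oriented scheme would have to cover.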

Those codes are still widely used and a global coding process should allow their integration, at least for their original objectives (airport codes).

ICAO also provides airport codes in a more neutral way, including non-ISO country codes. They tend to be used internally by airlines and airports, which therefore work with two sets of codes. They tend to be specialized, though, and limited to airports.

All in all, airport codification is fairly well covered, though cluttered. However, no codification integrates terminal data with airport codes, so that vendors often create pseudo codes such as CD3 in lieu of CDG terminal 3, disrupting the initial IATA codes.

Furthermore, travel destinations are not limited to airports or main cities (which are covered by the IATA codes). A precise global definition of cities in general, villages, stations (airport terminals, ski, railway, car rental, coach, etc.), points of interest within or outside cities, lieux-dits, etc. does not exist, and this is a major issue for eTourism.

There are two possible ways forward: either differentiate airports, railway stations and cities and build an identification scheme for each type of item or, on the contrary, create a single set of identifiers for all points of travel.

The second approach corresponds to the historical approach, in which cities inherited the codes of their airports and differentiation sometimes occurred later. That seems logical because, for the purposes of a trip, a location and its airport are very similar notions, except where a location has several airports and differentiation between them is in order.

Neither IATA nor ICAO seems to be in a position to provide such coding schemes. Integrating local postal codes and possibly other codifications in a global identification process could speed up the process. The UN has also initiated the same type of process, with stations, harbours, etc., completing airport codes whenever possible.

10.3.2 Currency and language codes

Relevant ISO standards are already in place and there are no major gaps, except possibly a precise definition of the notion of localization.

10.3.3 Travel service codes

IATA and preferably ICAO provide extensive identification for airlines. Car rental companies and hotel chains are also identified by two or three identifiers. However, there is no global unique identification for hospitality items, cruise companies, events, animations, activities, services, restaurants, etc.

It is therefore impossible to have unique identifiers for each element of a trip, and consequently impossible to compare or even amalgamate information. If such identification were in place, additional qualification would then be needed, such as the rights of the source over the content (for example: is this first-hand information, and does the author have the right to create or distribute it?).

Some organizations, such as HEDNA, have such a project for hospitality services or other specific services. Private companies in certain countries provide partial data (such as GIATA in Germany). Private companies distributing content also provide unique identifiers within their own systems, which do not allow cross-referencing.

10.3.4 Travel service qualifier codes

If we want to refine the definition of travel-related items further, we would need to identify room types, car types, facilities, staff credentials, etc.

Here again, nothing comprehensive really exists. Certain associations provide recommendations or partial identification schemes and guidelines, without being able to impose a standard. For instance, there are coding recommendations for double rooms (such as DBL or just D), but these say nothing about the true occupancy of the room or its situation, view, comfort, etc.

Defining unique codes for travel services is very delicate because it touches marketing or sales oriented information which is subjective and also requires many details to allow precision. Actual codes are likely to be aggregates of different information (such as room information, bed information, features, location, etc.).

10.3.5 Travel company codes

Finally, we expressed the importance of being able to track each company in a travel related booking or data exchange process. There is presently no global service providing unique identifiers for vendors, distributors, central reservation systems, cruise companies, commission payment systems, tour operators, etc.

10.4 Recommendations

10.4.1 Short-term recommendations (1–3 years)

• Build a registry of present object identifications in the tourism industry.
• Develop travel related global geography identifiers.
• Integrate the global geography identifiers in the registry and build transcoding capability.
• Develop travel company related global identifiers.
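
The registry and transcoding capability recommended above could, in its simplest form, look like the following sketch. It is only an illustration under stated assumptions: the class `IdentifierRegistry` and its methods are hypothetical names, each record is a plain mapping from coding scheme to code, and the two airports registered (with their IATA and ICAO codes) serve merely as sample data.

```python
from typing import Optional


class IdentifierRegistry:
    """Toy registry: one record per object, indexed by (scheme, code)."""

    def __init__(self) -> None:
        self._index = {}

    def register(self, codes: dict) -> None:
        """Register one object under all of its known scheme/code pairs."""
        record = dict(codes)
        for key in record.items():
            if key in self._index:
                raise ValueError(f"{key[0]} code {key[1]!r} is already registered")
        for key in record.items():
            self._index[key] = record

    def transcode(self, source_scheme: str, code: str,
                  target_scheme: str) -> Optional[str]:
        """Translate a code between schemes; None if no mapping is known."""
        record = self._index.get((source_scheme, code))
        if record is None:
            return None
        return record.get(target_scheme)


registry = IdentifierRegistry()
registry.register({"IATA": "CDG", "ICAO": "LFPG"})  # Paris Charles de Gaulle
registry.register({"IATA": "FRA", "ICAO": "EDDF"})  # Frankfurt am Main

print(registry.transcode("IATA", "CDG", "ICAO"))
print(registry.transcode("ICAO", "EDDF", "IATA"))
```

Adding a new scheme then only requires registering further scheme/code pairs for an object; existing codes stay valid, which matches the aim of integrating current identifications rather than replacing them.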

10.4.2 Long-term recommendations (3–10 years)

• Provide recommendations for travel service coding schemes.
• Build transcoding capacity over the above-mentioned repository to transform the registry into a thesaurus.


11 Best practice case

11.1 The starting point

The objective of the best practice case is to present a real case study, based on an existing business case, that demonstrates a future scenario for the main issues of data and process interoperability. The CWA results are discussed against this case in order to demonstrate the feasibility of what the CWA states.

We have selected an existing eTEN project, which joined the workshop as a member and was also represented by keynote speakers during one of the workshop meetings in . The project, called “euromuse.net”, comes not from the core tourism domain but from cultural heritage, a driver of tourism. The project improves an existing platform in order to offer services and exhibition data to the tourism industry and aims to bridge the existing gap between cultural heritage and the tourism industry. It faces the same problems as those discussed in the workshop and has an appropriate data mediation solution in use, which illustrates the approach this CWA recommends in general for overcoming interoperability problems. It uses Harmonise 2.0 to integrate data from hundreds of Europe’s top museums and to provide this aggregated information to the various players of the tourism industry. There is also a strong need for a cost-effective and easy-to-use solution, since museums usually do not have large IT departments, if any at all.

euromuse.net has been identified as a very good starting point for discussing these issues and for demonstrating a real, live system, which could not otherwise have been implemented easily within the course of the CEN workshop. It allows a real demonstration to be made and the issues presented in this document to be discussed on the basis of a system in use.

11.2 The existing case of euromuse.net

People interested in exhibitions and museums depend on access to information which, in most cases, can only be obtained with great effort from a rather complex and scattered market, since bundled data from museums at a supranational and multilingual level are difficult to access. euromuse.net is a public access portal providing multilingual information on museums and their exhibitions throughout Europe. It offers both a ‘one-stop’ web tool covering the greatest exhibitions in Europe for the public and a special data interface, called Harmonise, that delivers structured data from the museums to the tourism sector. The euromuse.net project will deploy an existing online service, which provides multilingual information about temporary exhibitions and museums as well as other museum resources on a web platform, to develop a wider pan-European data collection based on public sector information, to be re-used by different actors in the cultural and tourism fields. The project aims at three main goals:


1. Improve and expand the existing platform, a website offering museum and exhibition information to the general public for free.
2. Integrate the museums’ information in the euromuse.net database with the Harmonise tools. Through this integration, euromuse.net’s rich content will be linked with the online offers of other European and national tourism and marketing services for culture.
3. Enhance the existing services to integrate information on scientific publications from museums and to expand the current services, which provide an overview of “virtual” museums and their (online) resources.

The main focus is to improve the connection between existing marketing and promotion channels of the tourism industry and the cultural sector by means of the euromuse.net database. A general idea of the euromuse.net project is to connect the museum sector better with relevant target groups in the tourism sector, both at a professional and at a non-professional or private level. euromuse.net services will support and strengthen existing connections between the general public interested in museums and exhibitions, the professional tourism sector and museums.

The service will help to make information about exhibitions and museums all over Europe easily accessible. It does so through three complementary services: the website http://www.euromuse.net/, aimed mainly at the general public and accessible free of charge; tools for structured data exchange with the databases of the tourism industry and other tourism players; and a scientific literature database of museum publications, aimed mainly at researchers and museum staff. The data exchange tools will enable representatives of the tourism industry and tourism services to organize personalised tourism packages for their customers through the service.

Because the requests of industrial and private users normally differ, the project offers special access for tourism industry users in addition to the euromuse.net website. Special search strings and precise queries to the euromuse.net database allow optimized preparation of organized trips. Industrial users will receive structured and formatted data via a special export from the euromuse.net database. Commercial users of this functionality will be requested to pay a contribution for the service provided.

11.3 Future scenario for euromuse.net

The current setup of euromuse.net is sufficient to collect exhibition information from hundreds of partners with different data models and to pool this information in a central repository via the Harmonise service. These data can be searched by partners external to the project, above all tourism organizations, to obtain up-to-date information about exhibitions all over Europe. This is again done via the Harmonise service, so the tourism partners receive the data in their own data format and can feed their databases easily.

This closely follows the approaches recommended in this CWA for overcoming the data interoperability problem. However, the current setup ends here, leaving open some issues that have also been discussed in the topics of this CWA. Some of them should be deployed in euromuse.net in the future; others are still not easy to solve.

Following the order of this document, the first issue is process handling. Most museums do not have a system for online ticket purchasing, but they may have one sooner or later. Online ticket sales will therefore become an issue, also because travel agencies may wish to bundle services dynamically in order to sell the client a full travel package that includes exhibitions. In principle, process handling would be easiest in a stateless way, managed purely through the exchange of data. Process mediators are currently being developed in applied ICT research and might offer an improved solution in the longer run (these process mediators work similarly to the data mediator Harmonise).

Metasearch is the next topic and, in some sense, euromuse.net is already a metasearch repository, since it aggregates data from different sources and makes them available for search queries. At present, a fixed query string or fixed query rules have to be used when querying the euromuse.net database, since no proper solution could be found to handle different query strings in a flexible and generic way. In the future, it should also be possible to map queries so that one query runs simultaneously on a larger number of instances, each of which may have a different query language. This shows the need for interoperable query languages, but also the need for registries in order to find the data instances that should be searched. Clearly, some meta-information about where to search is needed, because searching every database in the world to obtain a certain set of data is inefficient, if not impossible. Reliable registries that direct search queries to potential data sources would therefore significantly improve search efficiency.
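
The combination of a source registry and query mapping described above can be sketched as follows. This is a simplified illustration, not the euromuse.net or Harmonise implementation: the source names, the two target query formats (URL parameters and a JSON document) and all function names are assumptions made for the example.

```python
import json
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlencode


@dataclass(frozen=True)
class Query:
    object_type: str  # e.g. "exhibition"
    country: str      # ISO 3166-1 alpha-2 code
    keyword: str


def to_url_params(query: Query) -> str:
    """Query mapping for a (hypothetical) source that expects URL parameters."""
    return urlencode({"type": query.object_type,
                      "country": query.country,
                      "q": query.keyword})


def to_json(query: Query) -> str:
    """Query mapping for a (hypothetical) source that expects a JSON document."""
    return json.dumps({"object_type": query.object_type,
                       "country": query.country,
                       "keyword": query.keyword}, sort_keys=True)


@dataclass
class Source:
    name: str
    object_types: set
    countries: set
    translate: Callable[[Query], str]


# The registry holds meta-information on what each source can answer.
SOURCES = [
    Source("museum-db-a", {"exhibition", "museum"}, {"DE", "AT"}, to_url_params),
    Source("city-events-b", {"exhibition", "event"}, {"DE"}, to_json),
    Source("hotel-db-c", {"hotel"}, {"DE", "FR"}, to_url_params),
]


def route(query: Query, sources: list) -> list:
    """Select only the relevant sources and map the query to each one's format."""
    return [(source.name, source.translate(query))
            for source in sources
            if query.object_type in source.object_types
            and query.country in source.countries]


for name, native_query in route(Query("exhibition", "DE", "Bauhaus"), SOURCES):
    print(name, native_query)
```

Only the sources whose registered coverage matches the query are contacted, and each receives the query in its own format; the hotel source in this example is never searched.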

Even if various databases are searched and a large number of results are retrieved (exhibitions, in the case of euromuse.net), it is not automatically known how many exhibitions appear several times in the retrieved data sets. Object identification is thus the last of the topics covered by this CWA and is also a future enhancement of euromuse.net. If all exhibitions, museums and locations can be identified automatically, then multiple entries for the same object can be removed from the database automatically. At the moment the issue is open in euromuse.net, since the number of sources is manageable and the probability that one exhibition is reported by two museums is very low. However, this probability may rise significantly and quickly as the network grows.

11.4 Critical discussion

The discussion above showed that even in this rather small scenario, where the business case and the players can be surveyed easily, all the topics touched on in the CWA are relevant issues. Even though the project originally comes from the cultural heritage sector, it has strong links with the tourism industry, maybe stronger than might be perceived at first glance. This more “extra-orbital” issue of exhibitions might also make it easier to see the questions and answers raised in this document, since it is less bound to the topics of hotels and flights (and, of course, other products closer to the core of tourism than exhibitions). Nevertheless, even coming from the outer sphere of tourism, it is of deep relevance to tourism in Europe.


It is easy to see that the topics are exactly the same for exhibitions as they are for accommodation. euromuse.net therefore demonstrates nicely how all of the issues can also be solved on a global scale. The same technology and setup for mediating data and processes can be used for any other object, such as accommodation, flights, car rentals, events, etc.

Finally, one important issue remains unanswered, since it is outside the scope of interoperability: even if all the data could be exchanged smoothly, data sources identified easily, the content understood and booking processes run, how can data quality be assured? How can one make sure that a timetable (opening hours, flight schedules) is correct or that price quotes are valid? Quality of service and user acceptance will depend very much on data quality. Within euromuse.net, involving users to report back on the quality of information is under discussion. User involvement (user-generated content) may be a reliable source for estimating data quality. Although this topic is an important one, it is not part of this CWA on data and process interoperability.


12 Bibliography and references

The following is a list of documents and web sites other than referenced European and International Standards, which are listed in Clause 2 (“Normative references”).

[Adam, Hofer, Zang, et al, 2005] Otmar Adam, Anja Hofer, Sven Zang, Christoph Hammer, Mirko Jerrentrup, Stefan Leinenbach: “A Collaboration Framework for Cross-enterprise Business Process Management”. In: Panetto, Hervé (Hrsg.): Interoperability of Enterprise Software and Applications – INTEROP-ESA’2005. Geneva, Schwitzerland, February 23–25, 2005, Technical Sessions, 2005, p 499-510 [Addis, Boniface, Goodall, et al, 2003] M. Addis, M. Boniface, S. Goodall, P. Grimwood, S. Kim, P. Lewis, K. Martinez, A. Stevenson: “SCULPTEUR: Towards a new paradigm for multimedia museum information handling”, In: Proceedings of the Second International Conference on Semantic Web, p 582- 596, 2003 [Addis, Stevenson, 2002] M. Addis, A. Stevenson: D6.2 Impact on World-Wide Metadata Standards, Deliverable report of ARTISTE project, 2002 [Adrian, Sauermann, Roth-Berghofer, 2007] B. Adrian, L. Sauermann, T. Roth- Berghofer: “ConTag: A semantic tag recommendation system”. In: Proceedings of I-Semantics ’07, p 297-304, 2007 [Advanced Distributed Learning] http://www.adlnet.gov/ [Agent Link] http://www.agentlink.org/ [Ahern, King, Naaman, et al, 2007] S. Ahern, S. King, M. Naaman, R. Nair, J.H.I. Yang: “ZoneTag: Rich, Community-Supported Context-Aware Media Capture and Annotation”. In: Proceedings, MSI workshop CHI2007, San Jose, Calif, 2007 [AICC] Aviation Industry CBT Committee, http://www.aicc.org/ [Amadeus] http://www.amadeus.com/ [Amann, Fundulaki, 1999] B. Amann, I. Fundulaki: “Integrating Ontologies and Thesauri to build RDF Schemas”, ECDL Research and Advanced Technologies for Digital Libraries, p 234-253, 1999 [ANSI] American National Standards Institute, http://www.ansi.org/ [ArguGRID] http://www.argugrid.eu/ [Aristotle] Aristotle: Metaphisics Book IV, http://classics.mit.edu/Aristotle/metaphysics.4.iv.html [Arnarsdóttir, Berre, Hahn, Missikoff, Taglino] K. Arnarsdóttir, A.-J. Berre, A. Hahn, M. Missikoff, F. Taglino: Semantic Mapping: ontology based vs. model based approach. 
Alternative or complementary approaches?, ftp://ftp.informatik.rwth- aachen.de/Publications/CEUR-WS/Vol-200/17.pdf [ARTEMIS] http://www.srdc.metu.edu.tr/webpage/projects/artemis/ [ASG] http://asg-platform.org/cgi-bin/twiki/view/Public [Aviation Industrie CBTI Committee] http://www.aicc.org/ [Baader, Horrocks, Sattler, 2003] F. Baader, I. Horrocks, U. Sattler: “Description logics as ontology languages for the semantic web”. In: S. Staab, R. Studer, eds: Lecture Notes in Artificial Intelligence, Springer Verlag, 2003 [Bailey, 1994] K.D. Bailey: Typologies and Taxonomies - An Introduction to Classification Techniques, London, Sage Publications, Quantitative Applications in the Social Sciences, 1994

132

CWA 15992:2009 (E)

[Barrasa, Corcho, Gómez-Pérez, 2004] J. Barrasa, O. Corcho, A. Gómez-Pérez: R2O, an Extensible and Semantically Based Database-to-Ontology Mapping Language. Second Workshop on Semantic Web and Databases (SWDB2004). Toronto, Canada, August 2004 [Berners-Lee, Hendler, Lassila, 2001] Tim Berners-Lee, J. Hendler and O. Lassila: “The Semantic Web”. In: Scientific American vol 284, no 5, p 34-43, May 2001 [Bikel, Miller, Schwartz, Weischedel, 1997] Daniel M. Bikel, Scott Miller, Richard Schwartz, Ralph Weischedel: Nymble: a High-Performance Learning Name- finder, 1997, http://xxx.lanl.gov/pdf/cmp-lg/9803003 [Biron, Malhotra, 2001] P.V. Biron, A. Malhotra (Eds): XML Schema Part 2: Datatypes. W3C Recommendation, May 2001, http://www.w3.org/TR/xmlschem-2/ [Bizer, 2003] C. Bizer: D2R MAP – A Database to RDF Mapping Language, The twelfth international World Wide Web Conference, WWW2003, Budapest, Hungary, 2003 [Bloehdorn, 2005] S. Bloehdorn, K. Petridis, C. Saathoff, N. Simou, V. Tzouaras, Y.Avrithis, S. Handschuh, Y. Kompatsiaris, S. Staab, M.G. Strintzis, Semantic annotation of images and videos for multimedia analysis, in: Proceedings of the 2nd European Semantic Web Conference (ESWC 2005), 29 May–1 June 2005, Heraklion, Greece, 2005 [Borgida, An, Mylopoulos, 2005] A. Borgida, Y. An, J. Mylopoulos: Inferring Complex Semantic Mappings Between Relational Tables and Ontologies from Simple Correspondences. In: CoopIS, DOA, and ODBASE, OTM Confederated International Conferences, Cyprus, Part II, volume 3761 of LNCS, p 1152-1169, Springer, 2005 [Borthwick, 1999] Andrew Eliot Borthwick: A maximum entropy approach to named entity recognition, New York University, 1999 [BREIN] http://www.eu-brein.com/ [Brunstein, 2002] Ada Brunstein, “Annotation guidelines for answer types”, BBN Technologies, 2002, http://www.ldc.upenn.edu/Catalog/docs/LDC2005T33 /BBN-Types-Subtypes.html [Campbell, Currier] L.M. Campbell, S. 
Currier, (31/10/00), http://www.sesdl.scotcit.ac.uk/sellic_pres/sellic2.html [Chandrasekaran, Josephson, Benjamins, 1998] B. Chandrasekaran, J.R. Josephson, V.R. Benjamins: “Ontology of Tasks and Methods”, In: Proceedings of 1998 Banff Knowledge Acquisition Workshop, 1998 [CIDOC CRM] The CIDOC Conceptual Reference Model, http://cidoc.ics.forth.gr/ [Cimpian, Mocan, 2005] Emilia Cimpian, Adrian Mocan: WSMX Process Mediation Based on Choreographies, 1st International Workshop on Web Service Choreography and Orchestration for Business Process Management, 2005 [DARPA] Defense Advanced Research Projects Agency, http://www.darpa.gov [Davenport, 1993] Thomas Davenport: Process Innovation: Reengineering work through information technology, Harvard Business School Press, Boston, 1993 [Davis, Van House, Towle, et al, 2005] M. Davis, N. Van House, J. Towle, S. King, S. Ahern, C. Burgener, D. Perkel, M. Finn, V. Viswanathan, M. Rothenberg: (2005). MMM2: mobile media metadata for media sharing, CHI ’05 extended abstracts on Human factors in computing systems, April 02-07, Portland, OR, USA, 2005, http://portal.acm.org/citation.cfm?id=1056910&dl=GUIDE &coll=GUIDE&CFID=25341546&CFTOKEN=26292269

133

CWA 15992:2009 (E)

[de Laborda, Conrad, 2005] C.P. de Laborda, S. Conrad: Relational.OWL A Data and Schema Representation Format Based on OWL. In Second Asia-Pacific Conference on Conceptual Modelling (APCCM2005), volume 43 of CRPIT, p 89-96, Newcastle, Australia, 2005, ACS [Dell’Erba, Fodor, Höpken, et al, 2005] M. Dell’Erba, O. Fodor, W. Höpken, et al, “Exploiting Semantic Web Technologies for Harmonizing e-Markets”. In: IT&T Information Technology & Tourism – Application – Methodologies – Techniques, 2005 [DIP] http://dip.semanticweb.org/index.html [Directive 90/314/EEC] Council Directive 90/314/EEC of 13 June 1990 on package travel, package holidays and package tours [Dodgeball] http://www.dodgeball.com/ [Dörr, 2003] M. Dörr: “The cidoc conceptual reference module: An ontological approach to semantic interoperability of metadata”. AI Magazine 24(3) (2003), 75–92 [Dörr, Guarino, Fernández López, et al, 2001] M. Dörr, N. Guarino, M. Fernández López, E. Schulten, M. Stefanova, A. Tate: “State of the Art in Content Standards. OntoWeb Deliverable 3.1.”, Technical Report, 2001 [Dörr, Hunter, Lagoze, 2003] M. Dörr, J. Hunter, C. Lagoze: “Towards a core ontology for information integration. Journal of Digital Information 4(1)” (2003) [Dou, McDermott, Qi] D. Dou, McDermott, P. Qi: “Ontology translation by Ontology Merging and Automated Reasoning” [Dunieveld, Stoter, Weiden, et al, 2000] A.J. Dunieveld, R. Stoter, M.R. Weiden, B. Kenepa, V.R. Benjamins: “WonderTools? A comparative study of ontological engineering tools”, 2000 [Earley, 2005] S. Earley: Resolving Taxonomy Challenges and Information Architecture Conflicts, 2005 http://www.dama-nj.org/presentations/ Seth%20Earley%20Taxonomies%20May%2012%202005%20(DamaNJ).pdf [eBusiness W@tch Report 2006/2007] eBusiness W@tch Report 2006/2007, http://www.ebusiness-watch.org/key_reports/documents/EBR06.pdf [ebXML] eBusiness XML, http://www.ebxml.org/ [Echarte, Astrain, Cordoba, Villadangos, 2007] F. Echarte, J.J. Astrain, A. Cordoba, J. 
Villadangos: Ontology of Folksonomy: A New Modelling Method. Proceedings of the Semantic Authoring, Annotation and Knowledge Markup Workshop (SAAKM2007), British Columbia, Canada, Vol-289, 2007, http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-289/p08.pdf [ESP Game] http://www.espgame.org/ [ETSI] European Telecommunications Standards Institute, http://www.etsi.org/ [euromuse] http://www.euromuse.net, http://www.euromuse-project.net [Expedia] http://www.expedia.com/ [Fabian, 1975] J. Fabian: “Taxonomy and Ideology: On the Boundaries of Concept Classification”. In: M. Kinkade (ed), Linguistics and Anthropology, Lisse, p 183- 197, 1975 [Facebook] http://www.facebook.com/ [Flickr] http://www.flickr.com/ [Fodor, Werthner, 2005] Oliver Fodor, Hannes Werthner: Harmonise: a step toward an interoperable e-tourism marketplace. In: International Journal of Electronic Commerce, Winter 2004-5, Vol 9, No 2, p 11-39, 2005 [Freyer, 2006] Freyer, Walter: Tourismus: Einführung in die Fremdenverkehrsökonomie, 8th revised ed, München : Oldenbourg, 2006

134

CWA 15992:2009 (E)

[Fuxman, Hernández, Ho, et al, 2006] A. Fuxman, M.A. Hernández, H. Ho, R. Miller, P. Papotti, L. Popa: Nested Mappings: Schema Mapping Reloaded. Proc. VLDB 2006 Conf., p 67-78, Seoul, Korea, 2006
[Garshol, 2004] L.M. Garshol: Metadata? Thesauri? Taxonomies? Topic Maps! Making Sense of it all, Journal of Information Science, 2004
[Gennari, Musen, Fergerson, et al, 2002] J. Gennari, M.A. Musen, R.W. Fergerson, W.E. Grosso, M. Crubezy, H. Eriksson, N.F. Noy, S.W. Tu: The Evolution of Protégé: An Environment for Knowledge-Based Systems Development, Technical Report SMI-2002-0943, 2002
[Ghawi, Cullot] R. Ghawi, N. Cullot: Database-to-ontology Mapping Generation for semantic interoperability
[Gilchrist, 2003] A. Gilchrist: Thesauri, taxonomies and ontologies - an etymological note. Journal of Documentation, 2003, 59 (1), p 7-18
[Goodall, Lewis, Martinez, et al, 2004] S. Goodall, P.H. Lewis, K. Martinez, P. Sinclair, F. Giorgini, M.J. Addis, M.J. Boniface, C. Lahanier, J. Stevenson: “SCULPTEUR: Multimedia Retrieval for Museums”, CIVR 2004, LNCS 3115, p 638-646, 2004
[Grishman, 2003] Ralph Grishman, “Information Extraction”. In: The Oxford Handbook of Computational Linguistics, ed. R. Mitkov, Oxford University Press, 2003
[Grosof, Horrocks, Volz, Decker, 2003] B.N. Grosof, I. Horrocks, R. Volz, S. Decker: Description logic programs: Combining logic programs with description logic. In Proc. of the Twelfth International World Wide Web Conference (WWW 2003), p 48-57, ACM, 2003
[Grossman, 2004] Grossman, David: Confusion is the star of hotel rating systems, http://www.usatoday.com/travel/columnist/grossman/2004-03-05-grossman_x.htm
[Grove, 2003] A. Grove: Taxonomy. In: Encyclopedia of Library and Information Science, p 2770-2777, New York, Marcel Dekker Inc, 2003
[Gruber, 1993a] T.R. Gruber: “A translation approach to portable ontology specifications”, Knowledge Acquisition, Vol 5, 1993
[Gruber, 1993b] T.R. Gruber: “Towards Principles of the Design of Ontologies Used for Knowledge Sharing”, International Journal of Human Computer Studies, Vol 43, p 907-928, 1993
[Gruber, 2005a] T. Gruber: Ontology of Folksonomy: A Mash-up of Apples and Oranges, AIS SIGSEMIS Bulletin, 2005
[Gruber, 2005b] T. Gruber: TagOntology, a way to agree on the semantics of tagging data, 2005
[GS1] http://www.gs1.org/
[Guarino, Giaretta, 1995] N. Guarino, P. Giaretta: “Ontologies and knowledge bases. Towards a terminological clarification”, Towards Very Large Knowledge Bases. Ed IOS Press, p 25-32
[GUID] http://en.wikipedia.org/wiki/GUID
[Gulli, Signorini, 2005] A. Gulli, A. Signorini: Building an open source meta search engine [WWW2005]
[Haase, Wang, 2007] P. Haase, Y. Wang: “A decentralized infrastructure for query answering over distributed ontologies”. In: Proceedings of the 2007 ACM Symposium on Applied Computing (Seoul, Korea, March 11-15, 2007). SAC ’07. ACM, New York, NY, p 1351-1356, http://doi.acm.org/10.1145/1244002.1244294

[HarmoNET] The Harmonisation Network for the Exchange of Travel and Tourism Information, http://www.harmonet.org/
[HEDNA] http://www.hedna.org/
[Heflin, 2001] J. Heflin, J. Hendler, “A portrait of the Semantic Web in action”, IEEE Intell. Syst. 16 (2) (2001), p 54–59
[Hempel, 1965] C.G. Hempel: “Fundamentals of Taxonomy”, p 137-154. In: C. G. Hempel: Aspects of scientific explanation and other essays in the philosophy of science, New York, The Free Press, 1965
[Hepp, Leymann, Domingue, et al, 2005] Martin Hepp, Frank Leymann, John Domingue, Alexander Wahler, Dieter Fensel: Semantic Business Process Management: A Vision Towards Using Semantic Web Services for Business Process Management, Proceedings of the IEEE ICEBE. 2005
[Höpken, 2004] Wolfram Höpken: Reference Model of an Electronic Tourism Market (IFITT RM), Version 1.3, 2004, http://www.rmsig.de/documents/ReferenceModel.doc
[Hull, 1998] D.L. Hull: Taxonomy. In: Routledge Encyclopedia of Philosophy, Version 1.0, London, Routledge, 1998
[Hunter, 2002] J. Hunter: “Combining the CIDOC CRM and MPEG-7 to describe multimedia in museums”, In: Proceedings of Museums and the Web 2002 Conference, Boston, 2002
[IATA] http://www.iata.org/, http://en.wikipedia.org/wiki/IATA
[IEEE] Institute of Electrical and Electronics Engineers, http://www.ieee.org
[IFITT] International Federation for IT and Travel & Tourism, http://www.ifitt.org/
[IFLA] International Federation of Library Associations and Institutions, http://www.ifla.org/
[ISO] International Organization for Standardization, http://www.iso.org/; for references to ISO standards see also Clause 2 “Normative references” and this clause.
[ISO 3166] http://www.iso.org/iso/country_codes.htm, http://www.iso.org/iso/fr/country_codes.htm
[ISO/IEEE 11073] Health informatics — Point-of-care medical device communications (multiple parts)
[ISO 21127:2006] Information and documentation — A reference ontology for the interchange of cultural heritage information
[IST] Information Society Technologies, http://cordis.europa.eu/ist/
[ITU] International Telecommunication Union, http://www.itu.int
[Iurgel, 2004] I. Iurgel: From another point of view: art-E-fact, In: Proc. TIDSE’04 (2004) vol 1, p 26-35
[Kalfoglou, Schorlemmer, 2003] Yannis Kalfoglou, Marco Schorlemmer: Ontology mapping, the state of the art. Knowledge Engineering Review, 18(1), p 1-31, 2003
[Kim, Yang, Song, et al, 2007] H.L. Kim, S.K. Yang, S.J. Song, G.J. Breslin: “Tag Mediated Society with SCOT Ontology”, Proceedings of the Semantic Web Challenge 2007 in conjunction with the Sixth International Semantic Web Conference, November 11-15, Busan, Korea, 2007
[Knerr, 2006] T. Knerr: Tagging Ontology: Towards a Common Ontology for Folksonomies, 2006
[Konstantinou, Spanos, Chalas, et al, 2006] N. Konstantinou, D. Spanos, M. Chalas, E. Solidakis, N. Mitrou: VisAVis: An Approach to an Intermediate Layer between

Ontologies and Relational Database Contents. International Workshop on Web Information Systems Modeling (WISM 2006), Luxembourg, 2006
[Küster, Moore, Ludwig, 2007] Marc Wilhelm Küster, Graham Moore, and Christoph Ludwig, “Semantic registries.” In: XMLTage 2007 in Berlin, Berlin, 2007
[Lagoze, Hunter, 2001] C. Lagoze, J. Hunter: “The ABC Ontology and Model”, Journal of Digital Information, Vol 2, No 2, 2001
[Lahti, Palola, Korva, et al, 2006] J. Lahti, M. Palola, J. Korva, U. Westermann, K. Pentikousis, P. Pietarila: “A mobile phone-based context-aware video management application,” In: Multimedia on Mobile Devices II, Edited by Creutzburg, Takala, Chen, Proceedings of the SPIE, Volume 6074, p 204-215, 2006
[Lamsfus, Linaza, Smithers] Carlos Lamsfus, María Teresa Linaza, Tim Smithers: “Towards semantic-based information exchange and integration standards: the art-E-fact ontology as a possible extension to the CIDOC CRM (ISO/CD 21127) standard”. K-CAP2005, Banff, Alberta, Canada, Proceedings (ISSN 1613-0073) of the Workshop on Integrating Ontologies, p 49-54
[Landwehr, Bull, McDermott, Chpi, 1994] C.E. Landwehr, A.R. Bull, J.P. McDermott, W.S. Chpi: A Taxonomy of Computer Program Security Flaws, with Examples. ACM Computing Surveys, 26,3 (Sept 1994), http://chacs.nrl.navy.mil/publications/CHACS/1994/1994landwehr-acmcs.pdf
[Lassila, Swick, 1999] O. Lassila, R.R. Swick: “Resource Description Framework (RDF): Model and Syntax Specification”, Recommendation World Wide Web Consortium, February 1999
[LOCODE] http://www.unece.org/cefact/locode/
[Lu, Meng, Shu, et al, 2005] Y. Lu, W. Meng, L. Shu, C. Yu, K. Liu: Evaluation of Result Merging Strategies for Metasearch Engines. WISE Conference, 2005
[Lu, Wu, Zhao, et al, 2007] Yiyao Lu, Zonghuan Wu, Hongkun Zhao, Weiyi Meng, King-Lup Liu, Vijay Raghavan, Clement Yu: MySearchView: A Customized Metasearch Engine Generator. 26th ACM SIGMOD International Conference on Management of Data (SIGMOD 2007), Demo paper, p 1113-1115, Beijing, China, June 2007
[Marradi, 1990] A. Marradi: Classification, Typology, Taxonomy. Quality and Quantity, 1990, XXIV, 2, p 129-157. Available at: http://web.archive.org/web/20040705070709/http://www.unibo.edu.ar/marradi/classqq.pdf (Visited 2004-01-04)
[McDowell, 2003] L. McDowell, O. Etzioni, S. Gribble, A. Halevy, H. Levy, W. Pentney, D. Verma, S. Vlasseva, Enticing ordinary people onto the Semantic Web via instant gratification. In: Proceedings of the 2nd International Semantic Web Conference (ISWC 2003), October 2003
[Medjahed, Bouguettaya, 2005] Brahim Medjahed, Athman Bouguettaya: A Multilevel Composability Model for Semantic Web Services, IEEE Transactions on Knowledge and Data Engineering (July 2005), vol 17, Issue 7, p 954-968
[Meehl, 1995] P.E. Meehl: Bootstraps taxometrics: solving the classification problem in psychopathology. American Psychologist, 1995, 50(4), p 266-275
[Meng, Yu, Liu, 2002] W. Meng, C. Yu, K. Liu: Building Efficient and Effective Metasearch Engines. ACM Computing Surveys, 34(1), March 2002, p 48-89
[Merholz, 2004] P. Merholz: Ethnoclassification and vernacular vocabularies, 2004
[metasearch] http://www.trln.org/events/NISO/NISOmetasearch.ppt
[Miles, Brickley, 2005] A. Miles, D. Brickley: SKOS Core Vocabulary Specification, W3C Working Draft, 2005

[Miller, Haas, Hernandez, 2000] E. Miller, L. Haas, M.A. Hernandez: Schema Mapping as Query Discovery. Proc. VLDB 2000 Conf., p 77-88, Cairo, Egypt, 2000
[Mishler, 2006] B.D. Mishler: Integrative Biology 200A, “Principles of phylogenetics”, 2006, http://ib.berkeley.edu/courses/ib200a/pdfs/lect_12_(classification).pdf
[Mutton, P., and Golbeck, 2003] P. Mutton, J. Golbeck: “Visualization of semantic metadata and ontologies”, In: Proc. of Information Visualization 2003, London UK, 2003
[MySpace] http://www.myspace.com/
[Neches, Fikes, Finin, et al, 1991] R. Neches, R.E. Fikes, T. Finin, T.R. Gruber, T. Senator, W.R. Swartout: “Enabling technology for knowledge sharing”, AI Magazine, Vol 12, No 3, p 36-56, 1991
[Noy, McGuinness, 2001] N.F. Noy, D.L. McGuinness: “Ontology development 101: A Guide to creating your first ontology”, Stanford University, 2001
[OASIS ebXML Registry] OASIS ebXML Registry Information Model (RIM) Standard, v3.0 and OASIS ebXML Registry Services (RS) Standard, v3.0, http://www.oasis-open.org/committees/download.php/23648/regrep-3.0.1-cd3.zip
[OASIS Reference Model] http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=soa-rm, http://www.oasis-open.org/committees/download.php/16587/wd-soa-rm-cd1ED.pdf
[ontology] http://www.ontologyportal.org/pubs/IJCAI2001.pdf
[Opodo] http://www.opodo.com/
[OTA] Open Travel Alliance, http://www.opentravel.org/
[P3P] W3C’s Platform for Privacy Preferences, http://www.w3.org/P3P/
[Petrini, Risch, 2004] J. Petrini, T. Risch: Processing Queries over RDF views of Wrapped Relational Databases. In 1st International Workshop on Wrapper Techniques for Legacy Systems, WRAP 2004, Delft, Holland, 2004
[Photostuff] http://www.photostuff.com/
[Prud’Hommeaux, Seaborne, 2006] E. Prud’Hommeaux, A. Seaborne: SPARQL Query Language for RDF. World Wide Web Consortium, Working Draft WD-rdf-sparql-query-2006, 2006
[Quilitz, Leser, 2008] B. Quilitz, U. Leser: Querying Distributed RDF Data Sources with SPARQL. In: The Semantic Web: Research and Applications. LNCS 5021/2008, p 524-538
[Quint, 2004] V. Quint, I. Vatton, An Introduction to Amaya, W3C NOTE 20-February-1997, 1997, http://www.w3.org/TR/NOTE-amaya-970220.html
[Rabble] http://www.rabble.com/
[Ramakrishnan, Gehrke, 2002] Raghu Ramakrishnan, Johannes Gehrke: Database Management Systems, 3rd edition, McGraw-Hill, 2002
[RDF] http://www.w3.org/RDF/
[reference sources] http://www.libraries.rutgers.edu/rul/rr_gateway/e_ref_shelf/refmaps.shtml
[REWERSE] Reasoning on the Web with Rules and Semantics, http://rewerse.net/
[RFC 4122] http://www.ietf.org/rfc/rfc4122.txt, http://www.faqs.org/rfcs/rfc4122.html
[Rodriguez, Gómez-Pérez, 2006] J.B. Rodriguez, A. Gómez-Pérez: “Upgrading relational legacy data to the semantic web”. In: Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23 - 26, 2006), WWW ’06, ACM Press, New York, NY, p 1069-1070
[RQL] http://www.openrdf.org/doc/rql-tutorial.html

[Rummler, Brache, 1995] Rummler, Brache: Improving Performance: How to manage the white space on the organizational chart, Jossey-Bass, San Francisco, 1995
[Sanghee, Lewis, Martinez, 2004] K. Sanghee, P. Lewis, K. Martinez: “SCULPTEUR- D7.1- Semantic Network of Concepts and their Relationships”, Technical Deliverable, 2004
[Sarvas, Herrarte, Wilhelm, Davis, 2004] R. Sarvas, E. Herrarte, A. Wilhelm, M. Davis: “Metadata creation system for mobile images”. In: Proceedings of the 2nd international conference on Mobile systems, applications, and services, Boston, MA, USA, 2004, http://portal.acm.org/citation.cfm?id=990072&dl=GUIDE&coll=GUIDE&CFID=25341318&CFTOKEN=52999446
[Sarvas, Viikari, Pesonen, Nevanlinna, 2004] R. Sarvas, M. Viikari, J. Pesonen, H. Nevanlinna: “MobShare: controlled and immediate sharing of mobile images”. In: Proceedings of the 12th annual ACM international conference on Multimedia, October 10-16, 2004, New York, NY, USA, http://portal.acm.org/citation.cfm?id=1027690&dl=GUIDE&coll=GUIDE&CFID=25341318&CFTOKEN=52999446
[SATINE] Semantic Web travel services on a voyage of discovery, http://www.srdc.metu.edu.tr/webpage/projects/satine/, http://cordis.europa.eu/ictresults/index.cfm/section/news/tpl/article/BrowsingType/Features/ID/79947
[Schenk, Staab, 2008] S. Schenk, S. Staab: “Networked graphs: a declarative mechanism for SPARQL rules, SPARQL views and RDF data integration on the web”. In: Proceeding of the 17th international Conference on World Wide Web (Beijing, China, April 21-25, 2008). WWW ’08. ACM, New York, NY, p 585-594, http://doi.acm.org/10.1145/1367497.1367577
[Schroeter, 2003] R. Schroeter, J. Hunter, D. Kosovic, Vannotea, A collaborative video indexing, annotation and discussion system for broadband networks, in: Proceedings of the K-CAP 2003 Workshop on “Knowledge Markup and Semantic Annotation”, October 2003, Florida, 2003
[SCORM] Sharable Content Object Reference Model, http://www.adlnet.gov/scorm/index.aspx
[Shvaiko, Euzenat, 2005] P. Shvaiko, J. Euzenat: “A Survey of Schema-Based Matching Approaches”. In: J. Data Semantics IV 3730, 2005, p 146-171
[Silva] Nuno Silva: Ontology Mapping for Interoperability in Semantic Web, GEDAC - Knowledge Engineering and Decision Support Research Group, Porto, Portugal
[Slavic, 2000] A. Slavic: A Definition of Thesauri and Classification as Indexing Tools, 2000, http://dublincore.org/documents/thesauri-definition/ (Visited 2005-12-20)
[Smithers, Posada, Stork, et al, 2004] T. Smithers, J. Posada, A. Stork, M. Pianciamore, N. Ferreira, S. Grimm, I. Jimenez, S. di Marca, G. Marcos, M. Mauri, P. Selvini, N. Sevilmis, B. Thelen, V. Zecchino: “Information management and knowledge sharing in wide”, In: European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology, London, 2004
[SOUPA] Standard Ontology for Ubiquitous and Pervasive Applications, http://ebiquity.umbc.edu/paper/html/id/168/
[standard] http://www.webopedia.com/TERM/S/standard.htm, http://www.etsi.org/WebSite/Standards/WhatIsAStandard.aspx

[Stuckenschmidt, van Harmelen, de Waard, et al, 2004] H. Stuckenschmidt, F. van Harmelen, A. de Waard, T. Scerri, R. Bhogal, J. van Buel, I. Crowlesmith, Ch. Fluit, A. Kampman, J. Broekstra, E. van Mulligen: “Exploring Large Document Repositories with RDF Technology: The DOPE project”, IEEE Intelligent Systems, Vol 19, No 3, p 34-40, 2004
[Studer, Benjamins, Fensel, 1998] R. Studer, R. Benjamins, D. Fensel: “Knowledge Engineering”, DKE, Vol 25, No 1-2, p 161-197, 1998
[SUO] Standard Upper Ontology, http://suo.ieee.org/
[SUPER] http://www.ip-super.org/
[taxonomy] http://archive.eiffel.com/doc/manuals/technology/oosc/inheritance-design/penn.html, http://www.db.dk/bh/lifeboat_ko/concepts/taxonomy.htm, http://www.db.dk/jni/lifeboat/info.asp?subjectid=15, http://en.wikipedia.org/wiki/Taxonomy
[Terzi, Vakali, Hacid] Evimaria Terzi, Athena Vakali, Mohand-Saïd Hacid, “Knowledge Representation, ontologies and the Semantic Web”
[Topic maps] http://www.ontopia.net/topicmaps/materials/tm-vs-thesauri.html
[towntology] http://www.towntology.net/Meetings/0605-Belfast/Minutes-Presentations.pdf
[Trant, Wyman, 2006] J. Trant, B. Wyman: Investigating social tagging and folksonomy in art museums with steve.museum. World Wide Web 2006: Tagging Workshop. Editor, Edinburgh, Scotland, ACM, Access, 2006
[TUI] http://www.tui.com/
[UNESCO, 2002] UNESCO: Universal Declaration on Cultural Diversity, 2002, http://unesdoc.unesco.org/images/0012/001271/127160m.pdf
[UNWTO] World Tourism Organization, http://www.unwto.org/
[Uren, Cimiano, Iria, Handschuh, Vargas-Vera, Motta, Ciravegna] Semantic annotation for knowledge management: Requirements and a survey of the state of the art, 2005: Journal of Web Semantics
[URI, URL] http://www.w3.org/Addressing/
[URN] Uniform Resource Name, http://en.wikipedia.org/wiki/Uniform_Resource_Name, http://fr.wikipedia.org/wiki/Nom_uniformisé_de_ressource, http://de.wikipedia.org/wiki/Uniform_Resource_Name
[Uschold, Grüninger, 1996] M. Uschold, M. Grüninger: “Ontologies: Principles, Methods and Applications”, Knowledge Engineering Review, Vol 2, 1996
[Van Damme, Hepp, Siorpaes, 2007] C. Van Damme, M. Hepp, K. Siorpaes: FolksOntology: An Integrated Approach for Turning Folksonomies into Ontologies, Proceedings of the ESWC 2007 Workshop “Bridging the Gap between Semantic Web and Web 2.0”, Innsbruck, Austria, 2007
[Van Harmelen, Broekstra, Chirtiaan, et al, 2001] F. Van Harmelen, J. Broekstra, F. Chirtiaan, H. Horst, A. Kampman, J. van der Meer, M. Sabou: “Ontology-based Information Visualisation”, In: Proceedings of the Fifth International Conference on Information Visualisation, England, 2001
[VESA] Video Electronics Standards Association, http://www.vesa.org/
[Volz, Handschuch, Staab, Studer, 2004] R. Volz, S. Handschuch, S. Staab, R. Studer: OntoLiFT Demonstrator, 2004
[Volz, Stojanovic, Stojanovic, 2002] R. Volz, L. Stojanovic, N. Stojanovic: Migrating data-intensive Web Sites into the Semantic Web. ACM Symposium on Applied Computing (SAC 2002), Madrid, Spain, March 2002
[W3C] World Wide Web Consortium, http://www.w3.org/

[Wache, et al, 2001] H. Wache, et al: Ontology-Based Integration of Information - A Survey of Existing Approaches. In Stuckenschmidt, H., editor, IJCAI-2001 Workshop on Ontologies and Information Sharing, p 108-117, Seattle, USA, April 4-5, 2001
[WAI] W3C’s Web Accessibility Initiative, http://www.w3.org/WAI/
[Web Service Modeling Ontology Working Group] http://www.wsmo.org/
[Welty, 1998] C.A. Welty: The Ontological Nature of Subject Taxonomies. In: N. Guarino (ed), Proceedings of the First Conference on Formal Ontology and Information Systems, Amsterdam, IOS Press, 1998, http://www.cs.vassar.edu/faculty/welty/papers/fois-98/fois-98-1.html
[Welty, 1999] C. Welty, N. Ide, Using the right tools: enhancing retrieval from marked-up documents, J. Comput. Humanit. 33 (10) (1999), p 59–84
[Wielinga, Schreiber, Wielemaker, Sandberg, 2001] B.J. Wielinga, A.Th. Schreiber, J. Wielemaker, J.A.C. Sandberg: “From Thesaurus to Ontology”, In: Proceedings of the First International Conference on Knowledge Capture and Acquisition, p 194-201, 2001
[Wordpress] http://wordpress.com/
[World Travel and Tourism Council, 2008] World Travel and Tourism Council: 2008 Tourism and Travel Executive Summary, 2008, http://www.wttc.org/bin/pdf/temp/exec_summary_final.html
[WRL] The Web Rule Language, http://www.wsmo.org/wsml/wrl/wrl.html
[XFT] eXchange For Travel, http://www.exchangefortravel.org/
[YouTube] http://www.youtube.com/
[Zhao, Meng, Wu, et al, 2005] H. Zhao, W. Meng, Z. Wu, V. Raghavan, C. Yu: Fully Automatic Wrapper Generation for Search Engines. World Wide Web Conference (WWW14), p 66-75, 2005
EN ISO 9000:2005 Quality management systems — Fundamentals and vocabulary (ISO 9000:2005)
ISO/IEC 9834:2005 (several parts) Information technology — Open Systems Interconnection — Procedures for the operation of OSI Registration Authorities
EN ISO 14001:2004 Environmental management systems — Requirements with guidance for use (ISO 14001:2004)
