
ANNEX 18 A – Functional specifications

Publications Office AO 10592 Maintenance and evolution of the EuroVoc Front Office

Annex 18 A Description of the system

EuroVoc Front-Office – Functional Specifications


GLOSSARY & REFERENCES

1. INTRODUCTION
1.1 EuroVoc: the thesaurus
1.2 The EuroVoc website and Front Office
2. EUROVOC ONTOLOGY DESCRIPTION
2.1 Thesauri description
2.1.1 EuroVoc - Domain and Microthesaurus
2.2 Thesaurus concepts
2.3 Relationships between the concepts
2.3.1 Hierarchical relationship
2.3.2 Associative relationship
2.3.3 Polyhierarchical relationship
2.4 Thesaurus terms
2.5 Notes
2.6 Specific associations between terms (FullName – Acronym – ShortName)
2.7 Language equivalence between concepts
2.8 Description of the European Training Thesaurus (ETT, Cedefop)
2.9 Description of the Thesaurus of European Education System (TESE)
2.10 Shared concepts between EuroVoc, ETT and TESE
3. A DETAILED PRESENTATION OF THE EUROVOC WEBSITE
3.1 Page layout
3.1.1 Splash Page
3.1.2 General page layout
3.2 Language management
3.2.1 Content language
3.2.2 Interface language
3.2.3 Editorial content
3.3 Search
3.3.1 Simple Search
3.3.2 Advanced search
3.3.3 Search results
3.4 Browse
3.5 Term details
3.5.1 Term details: Concept
3.5.2 Term details: Non-Preferred Terms
3.5.3 Term details: Obsolete concept
3.5.4 Term details: Polyhierarchy
3.5.5 Term details: Compound Non-Preferred Terms
3.5.6 Term details: Full Name, Short Name and Acronyms
3.6 Map
3.7 Download
3.7.1 Download (by Domain)
3.7.2 Download (Permuted alphabetical)
3.7.3 Download (Multilingual list)
3.7.4 Download (Alphabetical Index)
3.7.5 Download (SKOS-XML)
3.8 Contribute
3.9 New approved concepts
3.10 Drupal
3.10.1 User and authorization management
3.10.2 Types of content
3.10.3 Menu item
3.10.4 Translation
3.11 PDF Layout specifications
3.11.1 Subject-oriented version
3.11.2 Permuted Alphabetical version
3.11.3 Language Dependent Appendices
3.11.4 Stop words
3.12 Generation of Permuted entries
3.13 Transition between two releases
3.14 Semantic technologies
3.14.1 URI management
3.15 Thesaurus Alignment Environment
3.15.1 Web services
3.15.2 Computing inferences and double annotations
3.16 Web services defined in the CELLAR
3.17 Licensee’s users
3.17.1 Website usage
4. TECHNICAL ARCHITECTURE
4.1 Environments
4.2 Hardware architecture
4.3 Software architecture
4.4 Publishing mechanism
4.4.1 Exporting from the Back Office
4.4.2 Importing into the Front Office
5. ONGOING DEVELOPMENTS
5.1 Alignment
5.1.1 “Disseminate Alignment” functionality
5.2 Language management
5.2.1 “Publish New Editorial Language” functionality
5.3 Contribute
5.3.1 “Captcha”
5.4 Publishing mechanism
5.4.1 “Publishing Front End”
5.5 Semantic Web features
5.5.1 “Linked Open Data”
5.5.2 “URI aliases (alternative identifier)”
6. ADDITIONAL INFORMATION


GLOSSARY & REFERENCES

Please consult the separate document “Annex 18 B Glossary and References.”

1. Introduction

1.1 EuroVoc: the thesaurus

EuroVoc is a multilingual thesaurus originally built for processing the documentary information of the EU institutions. It is a multi-disciplinary thesaurus covering fields that are sufficiently wide-ranging to encompass both EU and national points of view, with a certain emphasis on parliamentary activities.

The concepts in the EuroVoc thesaurus are classified under 21 domains that cover the main policies and activity fields of the European Union:

− POLITICS
− INTERNATIONAL RELATIONS
− EUROPEAN UNION
− LAW
− ECONOMICS
− TRADE
− FINANCE
− SOCIAL QUESTIONS
− EDUCATION AND COMMUNICATIONS
− SCIENCE
− BUSINESS AND COMPETITION
− EMPLOYMENT AND WORKING CONDITIONS
− TRANSPORT
− ENVIRONMENT
− AGRICULTURE, FORESTRY AND FISHERIES
− AGRI-FOODSTUFFS
− PRODUCTION, TECHNOLOGY AND RESEARCH
− ENERGY
− INDUSTRY
− GEOGRAPHY
− INTERNATIONAL ORGANISATIONS

The aim of EuroVoc is to provide a coherent tool for annotating the content of information resources and for searching the information resources indexed by EuroVoc.

Main uses of EuroVoc include:

− indexing the content of documents or tagging the content of websites;
− feeding search engines for multilingual information retrieval;
− use in research and academic works in the field of semantic technologies;
− automatic indexing or text mining;
− multilingual glossary or translation tool.

The main categories of users of the EuroVoc thesaurus are:

− EU institutions, for content annotation and information retrieval, for example to index the content of EU law in EUR-Lex or of the EU publications in EU-Bookshop;
− Governmental bodies;
− libraries;
− Libraries and documentation centres;
− Universities or private laboratories;
− Translators and lawyers;
− Knowledge Management entities from the public and private sectors.

Since the first printed edition published in 1982 in seven languages, EuroVoc has extended its language coverage and is now available in 23 official EU languages (Bulgarian, Spanish, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish), as well as in one further language (Serbian).

EuroVoc is revised on a regular basis, and the content is updated every two years with a new release.

1.2 The EuroVoc website and Front Office

Since 2000, the content of the EuroVoc thesaurus has been available on the EuroVoc website (http://eurovoc.europa.eu), which allows users to navigate across the structure of the thesaurus, to search for a term by language, and to display its semantic context or its language equivalents. EuroVoc end-users can download the printable PDF version or ask for a login/password to download the native version in XML or SKOS/RDF format. The EuroVoc website layout was renewed in 2008.

The EuroVoc website is a multilingual platform available in the 24 languages of the thesaurus. It allows users to search for terms and display them in their semantic environment, to display the language equivalents, to navigate through the hierarchical relations in the thesaurus, or to download it in different electronic formats. While searching, navigating and downloading the PDF or XLS files are open to the public, downloading the RDF or XML formats is currently restricted to registered users who have signed a license agreement with the Publications Office. For more statistical information about the EuroVoc website, see AO_10592_Annex18C_EUROVOC_Statistics.

The scope of the current call for tenders is the maintenance and further development of the EuroVoc Front Office.

2. EuroVoc ontology description

The EuroVoc RDF schema has been modelled as a direct extension of the SKOS-Core and SKOS-XL classes and properties.

The following annexed file contains a description of the OWL (Web Ontology Language) ontology for the version of EuroVoc available on the EuroVoc website. The file is readable with the Protégé software.

Please refer to the file eurovoc.owl in the archive “Annex_18E_Attachments”.

The following annexed file describes the RDF Schema for the version of EuroVoc available on the EuroVoc website. The documentation is available in HTML format. Open index.html to display the home page of the documentation in your browser.

Please refer to the archive Eurovoc Schema Documentation.zip in the archive “Annex_18E_Attachments”.

2.1 Thesauri description

This section describes the structure of the EuroVoc, ETT and TESE thesauri that are maintained in accordance with the recommendation of the ISO 25964 standards for “Thesauri and interoperability with other vocabularies”.

2.1.1 EuroVoc - Domain and Microthesaurus

EuroVoc is split into 21 Domains, each divided into a number of Microthesauri. Altogether there are 127 Microthesauri.

− The Domains are identified by two-digit numbers and titles in words:

Example: 10 EUROPEAN UNION

− The Microthesauri are identified by four-digit numbers (the first two digits being those of the Domain containing the Microthesaurus) followed by titles in words:

Example: 1011 European Union law

The numbering of Domains and Microthesauri is identical in all language versions.

All concepts are accompanied by a reference to a Microthesaurus, introduced by the abbreviation MT (Microthesaurus), to show to which Microthesaurus or Microthesauri they belong.

Example of a concept belonging to a single Microthesaurus:
nationality
MT 1231 international law
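The numbering rule of this section can be illustrated with a minimal Python sketch. The function name `split_mt_number` and the lookup table `DOMAIN_TITLES` are hypothetical names introduced for this illustration only; they are not part of the EuroVoc Front Office.

```python
# Illustrative lookup table: only the Domain cited in this section is listed.
DOMAIN_TITLES = {"10": "EUROPEAN UNION"}


def split_mt_number(mt_number: str) -> tuple[str, str]:
    """Return (domain_number, microthesaurus_number) for a four-digit MT number.

    The first two digits of a Microthesaurus number are the number of the
    Domain that contains it.
    """
    if len(mt_number) != 4 or not (mt_number.isascii() and mt_number.isdigit()):
        raise ValueError(f"not a four-digit Microthesaurus number: {mt_number!r}")
    return mt_number[:2], mt_number


domain, microthesaurus = split_mt_number("1011")
print(domain, DOMAIN_TITLES[domain])  # Domain number and title for MT 1011
```

Because the numbering is identical in all language versions, such a derivation would not depend on the content language.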

2.2 Thesaurus concepts

The role of the thesaurus is to remove ambiguities, ensuring that each concept is contextualised in such a way that its meaning is univocal. The meaning of the concept is determined by the field and the Microthesaurus to which it belongs, by its semantic relationships, by additional information (Scope Notes, Definition Notes or History Notes), as well as by its equivalents in other languages covered in EuroVoc.

EuroVoc also defines a sub-class of concept to represent countries; it carries the two-character ISO country code as an additional attribute.

2.3 Relationships between the concepts

2.3.1 Hierarchical relationship

The hierarchical relationship (BT/NT – Broader Term/Narrower Term) is based on hierarchical levels of superiority or subordination between concepts. The superior concept (Broader Term) constitutes a class, whereas the subordinate concepts (Narrower Terms) represent elements or parts of this class.

The relationships are indicated by the abbreviations:

− ‘BT’ (Broader Term or generic term) between a specific and a more generic concept, together with an indication of the number of hierarchical steps between the specific term and the Broader Term in question.

Example:
standard
  BT1 standardisation
    BT2 technical regulations

Concepts with no Broader Terms are called ‘Top Terms’.

− ‘NT’ (Narrower Term or specific term) between a generic and a specific concept, together with an indication of the number of hierarchical steps between the Broader Term and each Narrower Term.

Example:
standardisation
  NT1 approval
    NT2 Community certification
  NT1 EC conformity marking
  NT1 harmonisation of standards
  NT1 international standard
    NT2 European standard
  NT1 standard
    NT2 production standard
    NT2 quality standard
    NT2 safety standard
    NT2 technical standard
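The BT/NT level numbers can be derived from the direct "broader" links alone. The following Python sketch shows one possible way of doing so; it assumes that each concept has a single Broader Term, and the `BROADER` table (a subset of the examples of this section) as well as the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: each concept maps to its direct Broader Term.
BROADER = {
    "standardisation": "technical regulations",
    "approval": "standardisation",
    "Community certification": "approval",
    "standard": "standardisation",
    "technical standard": "standard",
}


def broader_chain(term, broader=BROADER):
    """Return [(1, BT1), (2, BT2), ...] by walking up to the Top Term."""
    chain = []
    level = 1
    while term in broader:
        term = broader[term]
        chain.append((level, term))
        level += 1
    return chain


def narrower_tree(term, broader=BROADER):
    """Return [(level, NT), ...] depth-first, siblings sorted alphabetically."""
    children = defaultdict(list)
    for child, parent in broader.items():
        children[parent].append(child)

    lines = []

    def walk(node, level):
        for child in sorted(children[node], key=str.lower):
            lines.append((level, child))
            walk(child, level + 1)

    walk(term, 1)
    return lines


for level, bt in broader_chain("standard"):
    print(f"BT{level} {bt}")
for level, nt in narrower_tree("standardisation"):
    print("  " * level + f"NT{level} {nt}")
```

A concept for which `broader_chain` returns an empty list is a Top Term in this sketch.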

2.3.2 Associative relationship

The associative relationship (RT – Related Term) is a relationship between two concepts which do not belong to the same hierarchical structure, although they have semantic or contextual similarities. The relationship must be made explicit because it suggests to the indexer other indexing terms with connected or similar meanings which could be used for indexing or searches.

The associative relationship is shown by the abbreviation RT (Related Term), placed between two associated concepts, and is reciprocal:

Example:
credit guarantee
RT financial solvency

2.3.3 Polyhierarchical relationship

Some concepts can belong to several fields at once for logical reasons. One of EuroVoc’s distinctive features is the limitation of polyhierarchy. Concepts which could fit into a number of subject fields are generally assigned only to the field which seems the most logical for users.

Thus the various policies of the European Union, which could logically belong to the field 10 EUROPEAN UNION, are actually in the fields to which they directly relate.

For example, Common Agricultural Policy belongs to the field 56 AGRICULTURE, FORESTRY AND FISHERIES, and EU research policy to the field 64 PRODUCTION, TECHNOLOGY AND RESEARCH.

However, polyhierarchy is accepted in the thesaurus for the field 72 GEOGRAPHY. Concepts which belong to this field can be subordinate to a number of Broader Terms.

For example, Nigeria belongs to all of the following categories: West Africa, ACP countries, OPEC countries and ECOWAS countries.

2.4 Thesaurus terms

The Thesaurus terms are split into three categories:

− The Preferred Terms (PT or descriptors), used for indexing and denoting unambiguously the name of the concepts. In the EuroVoc thesaurus, all language versions have the same status: each Preferred Term in one language necessarily matches a Preferred Term in each of the other languages. A Thesaurus concept includes by default all the language equivalents of the Preferred Term.

− The Non-Preferred Terms (NPT or non-descriptors), used as access points in the thesaurus. They are never used as indexing terms but lead to the Preferred Term to apply. Equivalence between the Non-Preferred Terms in the different languages does not necessarily exist. Each language is characterised by its wealth and its semantic and cultural differences, and a concept in one language is not always represented in another language.

− The Compound Non-Preferred Terms (Compound NPT), a type of Non-Preferred Term that instructs the user to apply a combination of several Preferred Terms.

The equivalence relationship between a Preferred Term (application of the law) and its Non-Preferred Terms (derogation from the law, enforcement of the law, implementation of the law, validity of a law) is shown by the abbreviations:

− ‘UF’ (used for), in front of the Non-Preferred Term.

Example:
application of the law
UF derogation from the law
   enforcement of the law
   implementation of the law
   validity of a law

− ‘USE’, in front of the Preferred Term.

Example:
implementation of the law
USE application of the law

The equivalence relationship between a Compound Non-Preferred Term (The Hague Programme) and a combination of Preferred Terms (EU programme; area of freedom, security and justice) is shown by the abbreviation ‘USE+’.

Example:
The Hague Programme
USE+ EU programme
     area of freedom, security and justice

The equivalence relationship between each component Preferred Term (EU programme; area of freedom, security and justice) and the Compound Non-Preferred Term is shown by the abbreviation ‘UF+’.

Example:
EU programme
UF+ The Hague Programme

area of freedom, security and justice
UF+ The Hague Programme
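Since UF is the inverse of USE, and UF+ the inverse of USE+, one direction can be derived from the other. The Python sketch below illustrates this with the examples of this section; the dictionary layout and the function names are assumptions made for the illustration and do not describe the actual data model.

```python
from collections import defaultdict

# Non-Preferred Term -> Preferred Term (USE).
USE = {
    "derogation from the law": "application of the law",
    "enforcement of the law": "application of the law",
    "implementation of the law": "application of the law",
    "validity of a law": "application of the law",
}

# Compound Non-Preferred Term -> combination of Preferred Terms (USE+).
USE_PLUS = {
    "The Hague Programme": [
        "EU programme",
        "area of freedom, security and justice",
    ],
}


def invert_use(use):
    """Derive the UF lists (Preferred Term -> Non-Preferred Terms) from USE."""
    used_for = defaultdict(list)
    for non_preferred, preferred in use.items():
        used_for[preferred].append(non_preferred)
    return {pt: sorted(npts) for pt, npts in used_for.items()}


def invert_use_plus(use_plus):
    """Derive the UF+ lists (Preferred Term -> Compound NPTs) from USE+."""
    used_for_plus = defaultdict(list)
    for compound, preferred_terms in use_plus.items():
        for preferred in preferred_terms:
            used_for_plus[preferred].append(compound)
    return {pt: sorted(compounds) for pt, compounds in used_for_plus.items()}


print(invert_use(USE))
print(invert_use_plus(USE_PLUS))
```

Keeping only one direction as the stored relation and computing the other is one way to guarantee that both displays stay consistent.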

2.5 Notes

If the meaning of a Preferred Term is not clear enough, a note can be added to clarify its meaning. EuroVoc offers three types of notes:

(1) Scope Notes, which describe the intended use of a term within the thesaurus, introduced by the abbreviation ‘SN’.

Example:
retail price
SN: Use in opposition to wholesale price, otherwise use consumer price.

(2) Definition Notes, which give a definition of a concept, introduced by the abbreviation ‘DEF’.

Example:
share farming
DEF: Arrangement under which the landowner lets his land in return for part of the produce.

(3) History Notes, which summarize the historical evolution of the concept inside the thesaurus, introduced by the abbreviation ‘HN’.

When homographs (terms with the same spelling but a different meaning) are taken as Preferred Terms or Non-Preferred Terms, their meaning is clarified by an explanation in brackets.

Example: Land ()

2.6 Specific associations between terms (FullName – Acronym – ShortName)

In EuroVoc, specific associations, not foreseen by the ISO 25964 standards, have been modelled to represent the relationships between a term (full name) and its acronym or short name.

Example:

UNIFEM
full name: UN Development Fund for Women
           United Nations Development Fund for Women
USE UN Development Programme

Example:

UN Development Programme
UF UN Capital Development Fund
   UNCDF
   UN Development Fund for Women
   UNDP [acronym]
   UNIFEM
   United Nations Capital Development Fund
   United Nations Development Fund for Women
   United Nations Development Programme
   United Nations Volunteers
   UNV
   UN Volunteers Programme

2.7 Language equivalence between concepts

A concept in EuroVoc always contains 24 Preferred Terms (one Preferred Term per language of the thesaurus). The Non-Preferred Terms are not mandatory, and their number may differ from one language to another, depending on linguistic and cultural characteristics.

2.8 Description of the European Training Thesaurus (ETT, Cedefop)

The European Training Thesaurus (hereafter ETT) contains 1,550 Preferred Terms and 950 Non-Preferred Terms used to index and to retrieve information and news from the Cedefop online catalogue on vocational training (VET-BIB), the ETV website, the Training Institutions database and ERO. It has been created by Cedefop, the Agency for Vocational Training, located in Thessaloniki (Greece).

Cedefop has also developed two other linguistic tools managed in the same Thesaurus Management System (TMS):

• The Cedefop Glossary (hereafter GLO), which contains 127 terms with their definitions (http://europass.cedefop.europa.eu/en/education-and-training-glossary).
• The Controlled Vocabulary List (hereafter CVL), a flat list of about 100 key terms in the field of vocational training. They are used to search the content of the Cedefop web portal (http://www.cedefop.europa.eu/EN/Browse-by-keyword.aspx).

In the TMS, the concepts can be shared and belong to one or more vocabularies. They are interlinked through the matched concept association ‘exact match’.

Since 2009, the Cedefop and EuroVoc vocabularies have been using the same TMS and the same data model, but on different standalone installations. Today, there is no ‘publishing mechanism’ that disseminates the content of ETT on a website.

Cedefop’s aim is to link these stand-alone linguistic tools and create interoperability between the vocabularies in the field of vocational education and training (VET), skills, qualifications and lifelong learning.

Cedefop’s vocabularies are today available in 13 languages: English, French, Danish, Dutch, Estonian, Finnish, German, Italian, Polish, Portuguese, Swedish, Czech and Maltese.

2.9 Description of the Thesaurus of European Education System (TESE)

Previously managed by the Executive Agency EAC (Education, Audiovisual and Culture), the thesaurus is no longer maintained. It contains 21 Microthesauri and 1,387 concepts, and is available in 15 languages (Czech, Finnish, French, English, Estonian, German, Polish, Greek, Dutch, Portuguese, Italian, Spanish, Romanian, Turkish and Lithuanian).

2.10 Shared concepts between EuroVoc, ETT and TESE

The Publications Office has aligned the concepts of EuroVoc 4.3, ETT and TESE. The numbers of equivalent concepts shared between the vocabularies are:

− TESE and EuroVoc: 843 equivalent concepts;
− ETT and TESE: 709 equivalent concepts;
− EuroVoc 4.3 and ETT: 1,184 equivalent concepts.

3. A detailed presentation of the EuroVoc website

This section describes the functionalities of the EuroVoc website available at the following address: http://eurovoc.europa.eu/.

3.1 Page layout

3.1.1 Splash Page

The Splash page is the first access page into the EuroVoc website and allows the users to select a default interface language, which is stored in a cookie. The selected interface language is therefore the default user language for the next EuroVoc session.

Fig. 1: EuroVoc Splash page

3.1.2 General page layout

The screen is divided into five parts:

(1) The Top Banner, which contains the breadcrumbs (navigation path), the interface language selection menu, and the header menu.

(2) The left navigation, which contains the content language selection menu and the main EuroVoc website functionalities (search, navigation, download, contribute).

(3) The central content frame, which displays the thesaurus content as well as the editorial content.

(4) The Footer navigation.

(5) The right banner, which contains the login area and offers links to the Publications Office’s websites and social media.

Fig 2: EuroVoc website general layout

3.2 Language management

3.2.1 Content language

The content of the EuroVoc website is available in 24 languages. The content language is selected from a drop-down menu that enables the end-users to choose the language in which to search and display the content of the thesaurus. The language order follows the protocol order defined by the Interinstitutional Style Guide available on the Publications Office website: http://publications.europa.eu/code/en/en-370200.htm.

The default value of the content language is the language of the user’s browser, detected by the application automatically.

The ‘term detail view’ displays each concept in the selected content language, plus the language equivalents in all the language versions on the right-hand side. A different content language can be selected at any time from the Content language menu or from the list of language equivalents.

If the translation of a Preferred Term is not available, the Preferred Term is published in English (not clickable) and is followed by the label “under translation”.
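The fallback rule for missing translations can be sketched as follows in Python. The function name, the language-keyed dictionary and the exact rendering of the label (here in brackets after the English term) are assumptions made for this illustration.

```python
PIVOT_LANGUAGE = "en"
UNDER_TRANSLATION_LABEL = "under translation"


def display_preferred_term(labels, content_language):
    """Return (text, clickable) for the Preferred Term of a concept.

    `labels` maps a language code to the Preferred Term in that language.
    When the term is missing in the content language, the English term is
    shown, followed by the "under translation" label, and is not clickable.
    """
    label = labels.get(content_language)
    if label:
        return label, True
    return f"{labels[PIVOT_LANGUAGE]} ({UNDER_TRANSLATION_LABEL})", False


# Hypothetical concept that has no Danish Preferred Term yet.
labels = {"en": "retail price", "fr": "prix de détail"}
print(display_preferred_term(labels, "fr"))
print(display_preferred_term(labels, "da"))
```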

Fig. 3: ‘Term detail view’ and language equivalents

3.2.2 Interface language

The EuroVoc website is available in 24 languages. The interface language can be selected from a drop-down menu by the end-users, and it sets the language of the editorial content of the EuroVoc website (website navigation, menu, and buttons). The language order follows the protocol order defined by http://publications.europa.eu/code/en/en-370200.htm (see 3.10.4 Translation).

Fig. 4: EuroVoc Homepage

3.2.3 Editorial content

The editorial content is defined in Drupal as language-dependent labels and tooltips, and is created in English, the default pivot language. Specific translation tables are available to provide the labels in the other interface languages (see 3.10.4 Translation).

3.3 Search

3.3.1 Simple Search

The Simple Search enables the end-user to search for a term in the selected content language.

a. The default search mode retrieves exact matches for the word entered. The search is case insensitive, and the system does not look for terms having the same root word (no stemming). If the searched word is part of a compound term, that term is displayed in the result list, irrespective of the position of the word in the compound term. Likewise, when several words are entered in the same query, all terms containing all of these words are retrieved, in any order.

b. The simple search can be expanded with left and/or right truncation and wildcards. Use the percent sign ('%') to truncate the searched word and find all terms that match the query, with zero or more characters in place of the % sign.

c. The underscore sign ('_') stands for a single character in the searched word: the character in that position is ignored, and all results matching the rest of the word are retrieved.
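The matching rules a. to c. can be approximated by the following Python sketch. It is not the implementation used by the website: it merely translates the wildcards into regular expressions and compares whole words, under the assumption that a term is split into words on any non-alphanumeric character. The function names and the sample terms are hypothetical.

```python
import re


def word_pattern(query_word):
    """Compile one query word, translating % and _ into regex wildcards."""
    parts = []
    for ch in query_word:
        if ch == "%":
            parts.append(r"\w*")  # zero or more characters
        elif ch == "_":
            parts.append(r"\w")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def simple_search(query, terms):
    """Return the terms in which every query word matches a whole word."""
    patterns = [word_pattern(word) for word in query.split()]
    hits = []
    for term in terms:
        words = re.findall(r"\w+", term)
        if all(any(p.fullmatch(w) for w in words) for p in patterns):
            hits.append(term)
    return hits


TERMS = ["fruit juice", "citrus fruit", "fruit-growing", "stone fruits", "fishery"]
print(simple_search("FRUIT", TERMS))        # exact word, case insensitive
print(simple_search("fruit%", TERMS))       # right truncation
print(simple_search("fish_ry", TERMS))      # single-character wildcard
print(simple_search("juice fruit", TERMS))  # several words, any order
```

Note that "stone fruits" is returned only for the truncated query, which reflects the absence of stemming in the default mode.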


Fig. 5: Simple search

3.3.2 Advanced search

The advanced search engine is based on Oracle Text 11g.

Accessible from the left-hand menu, the ADVANCED SEARCH option allows complex queries through different search modes and options, and makes it possible to narrow the results by using filters.

Fig. 6: Advanced search

The Advanced search functionality offers four search modes:

1. ‘Exact word’: identical to the simple search feature, this mode displays the terms containing a word that precisely matches the searched characters, found anywhere in a single-word or multiple-word expression. Truncation and wildcard searches through the _ and % signs are possible.

2. ‘Containing’: retrieves a term, word or stem enclosed anywhere in an expression or phrase. Left and right truncations are applied by default. The system shows all expressions that contain all the searched words, in any order.

3. ‘Starting with’: returns all the expressions starting with the searched word, stem or combination of words.

4. ‘Ending with’: similarly, retrieves all the expressions ending with the searched word, stem or combination of words.
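The difference between the four modes can be illustrated with plain string operations. The Python sketch below is only an approximation: the real engine relies on Oracle Text, whereas here the wildcards, the filters and stemming are deliberately left out, and the mode keywords (`"exact"`, `"containing"`, `"starting"`, `"ending"`) are hypothetical.

```python
def advanced_search(query, terms, mode="exact"):
    """Return the terms matching `query` in the given search mode."""
    q = query.casefold()
    q_words = q.split()
    hits = []
    for term in terms:
        t = term.casefold()
        if mode == "exact":
            # every searched word must equal a whole word of the term
            t_words = t.replace("-", " ").split()
            ok = all(w in t_words for w in q_words)
        elif mode == "containing":
            # every searched word may appear anywhere, even inside a word
            ok = all(w in t for w in q_words)
        elif mode == "starting":
            ok = t.startswith(q)
        elif mode == "ending":
            ok = t.endswith(q)
        else:
            raise ValueError(f"unknown search mode: {mode!r}")
        if ok:
            hits.append(term)
    return hits


TERMS = ["fish", "fishery", "shellfish", "fish farming", "sea fish"]
for mode in ("exact", "containing", "starting", "ending"):
    print(mode, advanced_search("fish", TERMS, mode))
```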

The filters available under the ADVANCED SEARCH box restrict the search to specific elements or terms. The following items can be selected cumulatively:

a. PT (Preferred Term). This option is checked by default.

b. NPT (Non-Preferred Term). This option is checked by default.

c. Acronym, formed from the initial letter or letters of each of the successive parts or major parts of a compound term (e.g. NATO).

d. Short name, a partly abbreviated usual expression (e.g. “ADR agreement”).

e. Scope Note, a short text describing the intended use of a term within the thesaurus.

f. History Note, a short text summarizing the historical evolution of the term inside the thesaurus.

g. Definition note, a short text describing the term and its meaning.

h. Country name. For this filter, the PT and NPT boxes are checked automatically and the other fields are not selectable.

i. ISO-3166-2 country code.

The following search filters can be chosen:

− Stemming: tick the Stemming checkbox to expand the results to all terms having the same root word. For example, “talk” will yield “talking”, “talks” and “talked”, as well as “talk” (but not “talkie”). For better relevance of the results, this option is not available in the ‘Starting with’ or ‘Ending with’ search modes.

The Stemming option is linked to Oracle 10 features and is so far enabled only for the Dutch, English, French, German, Italian and Spanish languages. If the content language does not support stemming, this option is disabled and thus cannot be selected. Alternate spelling is available by default for Swedish, German (including new German spelling) and Danish.

− Case sensitive: case-sensitive queries retrieve only the results having the exact case and accentuation of the word typed in the search box.

− Search by EuroVoc release number: search the concepts of a specific EuroVoc release (down to release 3.1) by selecting the number of the required version. By default, the search is performed in 'all releases'. The Release option lets users retrieve the terms of a specific release that contain the queried words, as well as all the Preferred Terms and Non-Preferred Terms of a release (when no search term is specified).

3.3.3 Search results

The search results page displays the number of concepts that match the search criteria in the central part of the layout.

The concepts (Preferred and Non-Preferred Terms) are clustered by Top Term under their Microthesaurus. The part of each term that matches the search criteria is displayed in bold. A concept may have several hits matching the query.

If the match is found in a Non-Preferred Term, the list of results shows the Preferred Term (in the left column) followed by the abbreviation 'UF' (Used For) and the matching Non-Preferred Term (in the right column).

Each Preferred Term, Non-Preferred Term or note is clickable and displays the term details.

Fig. 6: Search on fruit in the fields Preferred Term (PT), Non-Preferred Term (NPT) along with the list of results

Fig. 7: Search on fish in the fields Preferred Term (PT), Non-Preferred Term (NPT) and Notes (History, Scope and Definition) along with the list of results.

3.4 Browse

Click on ‘Browse the subject-oriented version’ to navigate through the EuroVoc thesaurus by Domains and Microthesauri. All the Domains and Microthesauri are displayed by clicking on the 'Expand all' button. They are hidden or collapsed by clicking on the 'Collapse all' button.

Fig. 8: Navigate through the Domains

Clicking on the '+' / '-' sign will expand or collapse the desired Domain only.


Fig. 9: Navigate through the Microthesaurus

The hierarchy of the concepts is displayed by clicking on a Microthesaurus.

Fig. 10: Display the content of a Microthesaurus (MT 1021) – Tree view and Related Terms

The terms belonging to a Microthesaurus are sorted alphabetically and filed under the Top Terms, while their Narrower Terms (mentioned as 'NT') and Related Terms (mentioned as 'RT', along with the number of the Microthesaurus they belong to) are placed below.

The number preceding the abbreviation NT indicates the number of levels between the NT (Narrower Term or Specific Term) and its Top Term (the term at the highest level).

At any time during navigation, users can trace back their browsing steps by clicking on the links shown in the breadcrumb, which displays the different levels of the thesaurus hierarchy that have been browsed through.

For a clearer view of the Microthesaurus structure, Related Terms can be hidden by clicking on the 'Hide RT' button and displayed again with the 'Show RT' button.

Fig. 11: Display the content of a Microthesaurus (MT 1021) – Tree view and hide the Related Terms

Likewise, clicking on the 'Flat view' button displays all terms available in the Microthesaurus alphabetically, with no hierarchical order. Non-Preferred Terms are shown in italics, together with the Preferred Term to be used instead.

Fig. 12: Display the content of a Microthesaurus (MT 1021) – Flat view

The 'Tree view' button will allow shifting back to the default hierarchical display.

See the EuroVoc ontology description on the EuroVoc website: EuroVoc Domains and Microthesauri http://eurovoc.europa.eu/drupal/?q=node/555

3.5 Term details

3.5.1 Term details: Concept Clicking on a term from the Microthesaurus structure or from the search results will display its place in the Microthesaurus , its relationships (e.g., BT, NT, USE or UF (=Used For), RT) as well as its equivalents in other languages covered in EuroVoc.

The left part displays the concept information in the selected content language:

− The concept attributes: the Preferred Term (PT), the Non-Preferred Terms, the Compound Non-Preferred Terms, the notes.

− The concept/term relations:

  o the relations between the Preferred Term and its Non-Preferred Terms (USE/UF),

  o Use instead (relation between an obsolete concept and a new concept),

  o the specific term relations (Full Name, Acronym, Short Name).

− The concept relationships:

  o Association to the Microthesaurus

  o Hierarchical (Broader Term, BT – Narrower Term, NT) and associative (Related Term, RT) relations to the concepts in EuroVoc.

Fig. 13: View the term details in English

The right part of the screen shows the language equivalents of the Preferred Term. The content language can be changed at any time, either by clicking on one of the language equivalents or by selecting a new language in the Content language list (Left Navigation).

Fig. 14: View the term details in Danish (new selected content language)

See the EuroVoc ontology description on the EuroVoc website:

• Thesaurus concepts http://eurovoc.europa.eu/drupal/?q=node/556

• Thesaurus terms http://eurovoc.europa.eu/drupal/?q=node/557

I.1.1 Term details: Country

Fig. 15: View the term details for a country (class of concept)

The thesaurus data model defines a country as a specific sub-class of the class Thesaurus concept, with the ISO country code as an additional attribute. The attribute ISO-3166-2 is displayed as the first line of the term details frame.

The country class and its ISO attribute are searchable in the advanced search (see 3.3.2 Advanced search).

See the EuroVoc ontology description on the EuroVoc website: Thesaurus concepts http://eurovoc.europa.eu/drupal/?q=node/556

3.5.2 Term details: Non-Preferred Terms

The Non-Preferred Term is displayed in the term details screen, followed by USE and the corresponding Preferred Term.

Fig. 16: View a Non-Preferred Term

Language equivalents of a Non-Preferred Term may be displayed in the right part of the screen, but they are not mandatory.

3.5.3 Term details: Obsolete concept

An obsolete concept has been replaced by a new concept but, for historical reasons, is still searchable in EuroVoc.

The Preferred Term of an obsolete concept is suffixed with the label [Obsolete] and followed by a USE instead indication leading to the replacing concept.

Fig. 17: View an obsolete concept with a Use instead association

See the EuroVoc ontology description on the EuroVoc website:

• Thesaurus concepts http://eurovoc.europa.eu/drupal/?q=node/556

3.5.4 Term details: Polyhierarchy

Concepts belonging to Domain 72 (Geography) can have more than one Broader Term.

In that case, the Microthesauri of the concept are ordered by number and the hierarchical relations are displayed below each Microthesaurus.

Fig. 18: View a concept with polyhierarchy

See the EuroVoc ontology description on the EuroVoc website: EuroVoc Domains and Microthesauri http://eurovoc.europa.eu/drupal/?q=node/555

3.5.5 Term details: Compound Non-Preferred Terms

A Compound Non-Preferred Term is a Non-Preferred Term that instructs the user to use a combination of more than one Preferred Term.

In the detailed view of a Preferred Term, the Compound Non-Preferred Terms are preceded by the abbreviation UF+.

The equivalence relationship between a Preferred Term (EU programme or area of freedom, security and justice) and a Compound Non-Preferred Term (The Hague programme) is shown by the abbreviation UF+ in the term details of the Preferred Term.

EU Programme
UF European Union programme
   Community programme
   Community framework programme
UF+ The Hague programme
MT 1016 European construction
BT1 EU action

Fig. 19: View a Compound Non-Preferred Term in the term details of a Preferred Term

In the detailed view of a Compound Non-Preferred Term, the combination of Preferred Terms to use (EU programme, area of freedom, security and justice) is shown by the abbreviation USE+.

The Hague programme
USE+ EU programme
     Area of freedom, security and justice

Fig. 20: View the association to a combination of Preferred Terms in the term details of a Compound Non-Preferred Term.

3.5.6 Term details: Full Name, Short Name and Acronyms

The EuroVoc data model defines particular relations between terms within the same language:

− hasAcronym

− hasFullName

− hasShortName

In a Non-Preferred Term, the relation is indicated before the USE instruction and the term is clickable.

Fig. 21: Non-Preferred Term View: association acronym – FullName

Fig. 22: Non-Preferred Term View: association FullName – Acronym

In a Preferred Term, the relation to a Non-Preferred Term is indicated in brackets after the Non-Preferred Term.

UN Development Programme
UF UNCDF [Acronym]
   United Nations Capital Development Fund

Fig. 23: Preferred Term View: association FullName – Acronym

3.6 Map

A graphical representation of the term's position in the Microthesaurus can be obtained by clicking on the 'Map' button. The term is shown as a white dot, with arrows representing its relations to the Microthesaurus (blue dot), Narrower Terms (red dot), Broader Terms (green dot) and Related Terms (yellow dot).

Fig. 24: Graphical map for the concept ‘financial perspective’


3.7 Download

3.7.1 Download (by Domain)

This page contains the subject-oriented version in PDF, by Domain and by language. The user can select any combination of Domain and language.

For each download, the selected PDF files are compressed in a zip file.

These files are produced by the PDF generation process.

Fig. 25: Download by Domain

3.7.2 Download (Permuted alphabetical)

This page offers access to the permuted alphabetical index in PDF format, by language. The permuted alphabetical version is an index of all the Thesaurus Terms and their permuted entries, sorted in alphabetical order.

The user selects at least one language and gets the permuted alphabetical list in a zip file.

These files are produced by the PDF generation process.

Fig. 26: Download the permuted alphabetical version

3.7.3 Download (Multilingual list)

This page allows the user to download an Excel file containing all the Preferred Terms in one language with their equivalents in a number of selected languages, sorted by concept ID number or alphabetically. At least one language must be selected.

Fig. 27: Download the Multilingual list


Fig. 28: Sample of multilingual list of Preferred Terms downloaded in English, French and Italian.

3.7.4 Download (Alphabetical Index)

This page allows the user to download all the terms (Preferred and Non-Preferred) of a selected Microthesaurus in a selected language. The file lists all the terms of the selected Microthesaurus alphabetically and, for the Non-Preferred Terms, their associated Preferred Term prefixed by the USE abbreviation.

Fig. 29: Download the alphabetical index


Fig. 30: Sample of a downloaded alphabetical index in German.

3.7.5 Download (SKOS-XML)

This area is restricted to registered EuroVoc users who have signed a EuroVoc licence agreement with the Publications Office and have received a login and password that enable them to download EuroVoc in XML or SKOS/RDF format.

The SKOS/RDF, generated in the Back-Office, is validated by the import process in the Front-Office and transformed to XML (as described in 0 Publishing mechanism). The XML also contains the DTD files. Both the original SKOS/RDF (SKOS-XL) and the generated XML files are available for download.

Item                                   Value
OS                                     Sun Solaris 10 u8
Oracle Database                        11g release 11.2.0.3.6 64 bits
Encoding                               AL32UTF8
Minimal Disk space                     2 Gbyte per database instance. Two database instances are created, one for Acceptance, one for operational
Maximum Disk space                     2 x 10 Gigabyte (disk space allocated 12 GB, 26 tables)
Minimal RAM Memory                     2 Gbyte per database
Recommended per db instance            4 Gbyte
Expected indexed words                 Starting 320.000 and growing

Item                                   Value
OS                                     Sun Solaris (tested on 10.8)
Oracle client                          11.2.0.3.0 or higher
Apache 2                               2.2.11
Apache requirements                    http://httpd.apache.org/docs/2.2/install.html
Apache Minimal Disk space              100 Megabyte
Apache Maximum Disk space              500 Megabyte
Apache Minimal RAM Memory              16 Megabyte (nowhere clearly specified)
Apache Customised RAM memory           50 Megabyte
Application Server Minimal Disk space  1 Gbyte
RDF Store Minimal Disk space           2 Gbyte
Total Application Minimal Disk space   3.1 Gbyte
Minimal RAM Memory                     2 Gbyte

Item                                   Value
PHP                                    5.2.9
PHP memory limit                       128M
MySQL                                  5.0.75
Drupal                                 6.9
JDK                                    1.6.0_07
Tomcat                                 6.0.18

The Drupal 6.9 prerequisites are Apache 2, MySQL 5.0, PHP 5.2.

Fig. 24: Screen displayed to the EuroVoc registered users.

Fig. 31: Default screen displayed to users who do not have a login and password.

3.8 Contribute

Clicking on the Contribute button opens the maintenance form that enables a user to send a proposal on a Preferred Term or Non-Preferred Term and to attach a background document to the proposal.

If the maintenance form is selected from the term detail screen, the value of the Preferred Term is displayed automatically in the form.

Fig. 32: Maintenance form called by the [Contribute] button in the concept ‘adaptation of financial perspectives’

If the maintenance form is called from the Left Navigation (Contribute item), the value of the Preferred Term is empty.

As soon as the user has sent the proposal, a confirmation is sent to both the user and the EuroVoc generic email ( [email protected] )


3.9 New approved concepts

The list of approved concepts displays the candidate or new terms that will be added in the next release of the EuroVoc thesaurus. These are approved but not yet part of the official release of the EuroVoc thesaurus.

The list of approved concepts is generated and published automatically from the Back-Office.

The list of new approved concepts is a flat list in which terms are grouped by language and then sorted alphabetically. Every Preferred Term is clickable and opens the detailed concept screen in a pop-up with the title Approved concepts for the next release.

The Microthesaurus is specified in brackets to the right of the term. Approved concepts cannot be retrieved via the search engine and are not displayed in the official release of the thesaurus.

List of new approved terms

[en] chiroptera (5211)
[en] computer science (3606)
[en] industrial accident (6806)
[en] marine science (3606)
[fr] accident industriel (6806)
[fr] chiroptera (5211)

Fig. 33: List of approved concepts
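The grouping and ordering of the list of approved concepts can be sketched as follows. This is a minimal illustration, assuming each approved term is available as a (language, label, Microthesaurus number) record; the `ApprovedTerm` type and the `approved_list` function are hypothetical names, and the plain case-insensitive sort used here is a simplification of any language-specific collation the website may apply.

```python
from typing import List, NamedTuple


class ApprovedTerm(NamedTuple):
    # Hypothetical record layout, for illustration only.
    language: str
    label: str
    microthesaurus: str


def approved_list(terms: List[ApprovedTerm]) -> List[str]:
    """Group terms by language, sort them alphabetically within each
    language, and render them as '[lang] label (MT number)'."""
    ordered = sorted(terms, key=lambda t: (t.language, t.label.casefold()))
    return [f"[{t.language}] {t.label} ({t.microthesaurus})" for t in ordered]


# The entries of the sample list above, supplied in arbitrary order.
terms = [
    ApprovedTerm("fr", "chiroptera", "5211"),
    ApprovedTerm("en", "marine science", "3606"),
    ApprovedTerm("en", "chiroptera", "5211"),
    ApprovedTerm("fr", "accident industriel", "6806"),
    ApprovedTerm("en", "industrial accident", "6806"),
    ApprovedTerm("en", "computer science", "3606"),
]

for line in approved_list(terms):
    print(line)
```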

3.10 Drupal

3.10.1 User and authorization management

The following roles have been defined in Drupal for the EuroVoc website:

− Anonymous user: public user

− Authenticated user: standard Drupal user

Fig. 34: Login area

− EuroVoc team: authenticated user with permission to add, modify or delete the multilingual content and to manage user authentication.

− Licensee user: authenticated user with permission to download the SKOS/XML. The licensee user is informed automatically by the system as soon as a new release has been published in the operational environment.

Any Authenticated user can request a new password, which is sent by email.

Fig. 35: Screen to request a new password

Authorizations are maintained in the User Module.

Fig. 36: User management Module. List of users.

Fig. 37: User management Module. Add user.

− The list of users can be imported in CSV format in the Drupal user import Module.

Fig. 38: User import Module.

− Every time a new release of EuroVoc is published in the acceptance environment, an email is sent automatically to the registered licensee users.

3.10.2 Types of content

Drupal supports different types of content:

• Page: simple Nodes used for static multilingual content that are linked to the menu items

In Drupal, a Node is assigned to each language-dependent page.

Fig. 39: Drupal Node for the Copyright notice in French.


Fig. 40: List of translated pages and their Nodes for the Copyright notice.


Fig. 41: Drupal page for the English content of Copyright Notice linked to the Legal notice menu.

• Book: allows users to structure multilingual site pages in a hierarchy or outline. Book pages are linked to menu items.

Fig. 42: Book page for the content linked to the About EuroVoc menu.


Fig. 43: Book page content.


• Help page: provides specific pop-up help for a specified page.

Fig. 44: Contextual help linked to the Microthesaurus display.

• The Help is called by the Help Menu item in the Header Menu.

Fig. 45: [Help] button to call the contextual help.

3.10.3 Menu item

Drupal menus are a hierarchical collection of links, or menu items, used to navigate a website.

In the EuroVoc website, the Header and Bottom navigation are set up as Drupal Menus.

Fig. 46: Menu item defined-area in the EuroVoc website


Fig. 47: Menu item management in Drupal 6.9.

3.10.4 Translation

Page

Every new page is created in the source language (English) and translated via the Translate Tab.

If no translation is available, the language of the page can be set to ‘language neutral’, so that the page is displayed by default in all the editorial languages of the interface.

Fig. 48: Management of translation for pages in Drupal.

Menu

The Menu Module provides an interface to control and customize the menu items in Drupal.

Fig. 49: Association to a menu item for the Legal Notice page content.

New language-dependent menu items can be set up in the Menu Module.

Fig. 50: List of language-dependent menu items (Advanced Search)

Translation labels of the EuroVoc interface can be extracted from Drupal, edited in POeditor (http://drupal.org/node/220341/) and re-imported into Drupal.

Fig. 51: Translate interface Module. Area to export the interface labels.

The Translate interface Module allows a single label to be searched for and edited in the interface languages.

Fig. 52: Translate interface Module. Search for an interface label.

Fig. 53: Translate interface Module. Update the translation for a label.

I.1.2 Non-Drupal layout

Tooltips are shown when the user moves the mouse over different items on the screen (menu, buttons, labels, terms, etc.).

Fig. 54: English Tooltip for the Alphabetical index menu.

The EuroVoc website configuration did not allow the tooltip translations for the left menu items (Content language, Simple Search, Browse, Download and Your proposal) to be managed. Therefore, the elements of the left menu have been hard-coded in Java to allow translation management through the Drupal ‘translate interface’ Module.

3.11 PDF Layout specifications

This part presents the functional requirements regarding the generation of the PDF files, downloadable from the web site and ready to use for the printable version of EuroVoc.

The following table summarizes the requirements for each type of file. The input for generating the PDF is the SKOS-XL file generated by the Back-Office.

                                   Downloadable files                        Printed version
Subject-oriented version
  Files                            One PDF file per Domain and by language   One PDF file by language
  Pagination                       Dxx/x (D + Domain number/page number)     Starts on page 21
  Title page                       One Title page                            No Title page
Alphabetical permutated version
  Files                            One PDF file by language                  One PDF file by language
  Pagination                       Starts on page 1                          Starts on page 21
  Title page                       One Title page                            No Title page
  Parts                                                                      2 parts (exception for some languages)

3.11.1 Subject-oriented version

Title Page

Figure 55 - Title page of the subject-oriented version

The font used in the document is DejaVu.

The following details are present on the title page:

− Left: Title EUROVOC Thesaurus (1),

− Right: ISSN (2), Thesaurus release number and Language of the document (3),

− Title of the document (4),

− Field xx = Domain number (5),

− Title of the Domain (6),

− Abbreviation of the language in ISO-3166-2 code (7),

− Footer: © , yyyy (year of the computer clock at the moment of production) (8).

Subject-oriented presentation of a Microthesaurus


Figure 56 – Sample of the Subject-Oriented Presentation (downloadable file)

The page is organized in a two-column layout, with a page break if a concept cannot be written with all its Related Terms on the same page. The page break can occur wherever needed. If a term (NT, BT, RT…) is too long to fit on one line, the whole phrase must be shown on one page.

(1) Header: Microthesaurus id in 4-digit format and name of the Microthesaurus, with an arrow next to it to indicate next and/or previous pages (font size: 12 pt),

(2) Footer: Domain in format DXX/YY, with XX the Domain number and YY the page number (the Title page is not included in the page numbering),

(3) Top Term (font size: 11 pt),

(4) NT1: Narrower Term of first level (font size: 10 pt),

(5) NT2: Narrower Term of second level (font size: 10 pt); NT3, NT4, NT5 use the same font,

(6) RT: Related Terms (font size: 9 pt).

Terms are ordered alphabetically for each language.

All term names are formatted with an optional revision indication between square brackets (terms released on the latest revision).

A new Microthesaurus always starts at the beginning of the page.

Terminology of the field

Each subject-oriented presentation of a Domain is followed by the terminology of the field.

The page contains the terms (PT, NPT) of the Domain and is organized in 2 columns alphabetically.

The title ‘Terminology of the field: XX Domain’ introduces an alphabetical list of the terms (PT, NPT) of the Domain.

Only the Preferred Term is followed by the Microthesaurus number between brackets.

In the case of a Non-Preferred Term, the following line contains:

USE or USE+ followed by one Preferred Term per line.

Lay-out rules:

The format for the PT is 10pt.

The format for the NPT is 9 pt.


Figure 57 - Sample of terminology of the field

3.11.2 Permuted Alphabetical version

Title Page

Figure 58 - Title page of the permuted alphabetical version, (downloadable file)

The following details are present on the title page:

− Left: Title EUROVOC Thesaurus (1),

− Right: ISSN (2), Thesaurus release number (3),

− Language of the document (4),

− Title of the document Permutated alphabetical version (5),

− Abbreviation of the language in ISO-3166-2 code (6),

− Footer: © European Communities, yyyy (year of the computer clock at the moment of production) (7).

Alphabetical permuted presentation

Figure 59 - Sample of an Alphabetical Permuted Presentation

Page heading

Left: First term of the page (1) (font size: 12 pt),

Right: Last term of the page (PT or NPT) (2).

In the case of a permutated entry, only the string before the comma is printed; e.g. for ‘account, European unit of’, only ‘account’ is printed.

Footer: Page number (3).

The content is displayed in three columns.

Entries of the index are terms:

− Preferred Term

− Preferred Term (obsolete)

− Non-Preferred Term(s) (font 9 pt)

− USE PT label on the following line

− Compound Non-Preferred Term

(5) Each Preferred Term displays (font size 10 pt):

− Note(s) (SN (7), DEF, HN)

− MT in the format of: id + name

− UF (Non-Preferred Terms)

− BT1 (6)

− BT2, BT3…

− NT1

− NT2, NT3...

− RT

Remarks:

− No hyperlink will be reproduced in the printable version. The XML will contain the PT of the concept.

− The arrow indicates that the term details continue on the next page.

− All term names are formatted with an optional revision indication between square brackets (terms released on the latest revision). The term relationships will appear in the printable version of EuroVoc.

Examples

Optional revision Indication

Ustí [V4.2]
MT 7211 regions of EU Member States
BT regions of the [V4.2]

Countries with polyhierarchy

PTOM français
DEF Collectivités territoriales de la République française créées en 1946 et supprimées en 2003. [Dictionnaire Le petit Robert, 2004] En 2003, les TOM (Territoires d'Outre-Mer) ont été remplacés par les COM (Collectivités d'Outre-Mer). Elles disposent chacune d'une organisation particulière leur permettant d'adopter des règles locales, qui peuvent être différentes de celles de la métropole dans de nombreux domaines. [Ministère de l'outre-mer, site web]
MT 7241 pays et territoires d'outre-mer
UF PTOM () [short name]
   pays et territoires français d'outre-mer [full name]
   territoires d'outre-mer de la République française [full name]
   collectivités de la République française
   collectivités françaises d'outre-mer
   COM française

To handle polyhierarchy, the same logic as in the web portal is used to present the hierarchy: the Broader Terms are listed under their respective Microthesaurus.

Narrower Term hierarchy is displayed independently from the MT and after the BT.

Preferred Term with Compound Non-Preferred Terms

Iraq
ISO IQ
MT 7226 Asie - Océanie
UF République d’Iraq [full name]
   Irak
UF+ situation en Iraq
    guerre en Iraq
RT question du (3606)
BT1 pays du Golfe
MT 7231 géographie économique
BT1 pays du Marché commun arabe
BT1 pays de l'OPEP
BT1 pays du CUEA
MT 7236 géographie politique
BT1 pays de la Ligue arabe

Obsolete concept

USSR (obsolete)
SN Use only for the State which existed until the break-up of the Soviet Union.
MT 7231 economic geography
MT 7236 political geography
USE INSTEAD
USE INSTEAD
USE INSTEAD
USE INSTEAD
USE INSTEAD
USE INSTEAD
USE INSTEAD
USE INSTEAD Kazakhstan
USE INSTEAD Kyrgyzstan
USE INSTEAD Turkmenistan
USE INSTEAD Tajikistan
USE INSTEAD Uzbekistan
USE INSTEAD
USE INSTEAD
USE INSTEAD
UF Soviet Union
BT1 CMEA countries
BT1 Eastern Bloc countries
..BT2 former socialist countries
BT1 Warsaw Pact countries

3.11.3 Language Dependent Appendices

The XSL-FO file that contains the translations by language for the production of the Title page and headers is converted into an XML format that can be edited and saved.

The examples given in this document are in English only. The following Excel sheet contains the translations into the other supported European languages for the title page and the headings. It also contains the ISSN for each publication.

Volume 1 Part A: translation table

Volume 1 – Part B: translation table


Volume 2: translation table

The alphabetical permutated printable version has one volume and two parts. The split between the two parts falls on a specific letter, defined according to the language.

Language                      Part A               Part B
CZ                            část A - A až O      část B - P až Ž
LT                            A - Y                J – Ž
DA                            del A - A til I      del B - J til Å
LV                            A līdz Ī             J līdz Ž
DE                            Teil A - A bis I     Teil B - J bis Z
PL                            A - I                J - Ż
EL                            Α - Θ                Ι - Ω
SK                            A - Í                J - Ž
ES, EN, FR, IT, PT, NL, PT    A to I               J to Z
SL                            A - I                J - Ž
ET                            A to I               J to Ü
SV                            A - I                J - Ö
FI                            A to I               J to Ö
BG                            A - Н                О - Я
HU                            A to I               J – Z
RO                            A - Î                J to

3.11.4 Stop words

A list of stop words in English, Italian and Czech is available in the following file.

Please refer to the archive StopWords.zip in the archive “Annex_18E_Attachments”

3.12 Generation of Permuted entries

The permuted entries are computed automatically for the Preferred and Non-Preferred Terms in the TMS. They can later be edited manually and re-imported in batch in XML format.

1) A permutation is generated on each significant word following a blank character, except for the first word and the Stop words.

• Example of permuted lexical form for ‘politica ambientale dell’UE’ (Italian):

ambientale dell’UE, politica politica, ambientale dell’UE UE politica, ambientale dell’

Stop words : dell’

2) In ‘unified compound terms’, the significant word is marked by a preceding space in a field called Tokenized lexical form.

• Example of permuted lexical form for a unified compound term: EU-milieubeleid (Dutch)

Tokenized lexical form: EU- milieubeleid

Computed permuted lexical form: milieubeleid, EU-

3) The list of Stop words by language is imported in a batch or updated manually.

Specific symbols or typesets can be considered as part of the stop words, for example:

• Apostrophe in Latin languages (l’, d’, dell’)

• Hyphen in Maltese (tal-, l-)
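The generation rule can be sketched in a few lines. This is a minimal illustration, not the TMS implementation: it assumes the input is the tokenized lexical form (words separated by blanks), that a permuted entry is written as the significant word and everything after it, a comma, then the preceding words (the form seen in ‘milieubeleid, EU-’ and ‘account, European unit of’), and that stop words are matched case-insensitively. The function name `permuted_entries` is hypothetical.

```python
from typing import Iterable, List


def permuted_entries(tokenized_form: str, stop_words: Iterable[str]) -> List[str]:
    """Generate one permuted entry per significant word that follows a
    blank, skipping the first word and any stop word."""
    words = tokenized_form.split()
    stops = {w.casefold() for w in stop_words}
    entries = []
    for i in range(1, len(words)):  # the first word never starts a permutation
        if words[i].casefold() in stops:
            continue
        head = " ".join(words[i:])  # significant word and the words after it
        tail = " ".join(words[:i])  # words that preceded the significant word
        entries.append(f"{head}, {tail}")
    return entries


# Dutch unified compound term, using its tokenized lexical form.
print(permuted_entries("EU- milieubeleid", []))
# English term with 'of' treated as a stop word.
print(permuted_entries("European unit of account", ["of"]))
```

Stop words that are attached to the following word by an apostrophe or a hyphen (l’, dell’, tal-) would first have to be separated from it in the tokenized form; that step is not covered by this sketch.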

3.13 Transition between two releases

The transition between two releases of EuroVoc was available in the older version of the EuroVoc website, before 2008. The functionality is currently not present.

The information was provided in HTML on the website and offered, for each language of the thesaurus:

− A list of the new descriptors (Preferred Terms)

− A list of the modified descriptors

− A list of the deleted descriptors

− A list of the modified Non Descriptors (Non-Preferred Terms)

− A list of the new Non Descriptors

− A list of the deleted Non Descriptors

− A list of changes in the hierarchical relationships

A sample of the transition between EuroVoc 4.2 and EuroVoc 4.3 is available in HTML in the following attached file:

Please refer to the archive Transition_4.3_FR.zip in the archive “Annex_18E_Attachments”

3.14 Semantic technologies

3.14.1 URI management

A number of EuroVoc resources (concepts, Preferred Terms, Non-Preferred Terms and Microthesauri) are represented by a URI, published on the EuroVoc website.

For example, http://eurovoc.europa.eu/8453 is the URI of the concept ‘digital evidence’.

As the concept contains 24 Preferred Terms (one Preferred Term per language), each Preferred Term is also represented by its own URI, for instance:

http://eurovoc.europa.eu/211540 is the URI of the Preferred Term (English)

http://eurovoc.europa.eu/22637 is the URI of the Preferred Term (French)

http://eurovoc.europa.eu/338517 is the URI of the Preferred Term (Portuguese) etc.

Each Non-Preferred Term is represented by its own URI , for instance,

http://eurovoc.europa.eu/3338517 is the URI of a Non-Preferred Term (Portuguese)

http://eurovoc.europa.eu/117946 is the URI of a Non-Preferred Term (Spanish), etc.

Clicking on a concept displayed in the list of results opens the ‘term detail view’ of the concept in the selected content language; clicking on any term (Preferred Term or Non-Preferred Term) opens the ‘term details’ in the language of the selected term.

Example of a URI for a Microthesaurus: http://eurovoc.europa.eu/100169

Currently, accessing the URIs of the EuroVoc Thesaurus and of the Domains produces an error message, as they are unknown resources [http://eurovoc.europa.eu/100141]. These resources have URIs, but they are not implemented in the EuroVoc website.

3.15 Thesaurus Alignment Environment

TAE (Thesaurus Alignment Environment) is a stand-alone application that can be used to align the concepts from two conceptSchemes automatically and to validate the mapping results manually. The TAE is outside the scope of the current call for tenders. The mapping results are exported in SKOS-Core but at this time, they are not yet disseminated in the EuroVoc website.

The following mappings are validated and ready to be published in the EuroVoc website:

• EuroVoc 4.3 – EuroVoc 4.2

• EuroVoc 4.3 – GEMET (European Environmental Agency Thesaurus)

The mappings are published on the GEMET website: http://www.eionet.europa.eu/GEMET

• EuroVoc 4.3 – Agrovoc (FAO Thesaurus)

Mapping between EuroVoc and Agrovoc is available in the Agrovoc website http://aims.fao.org/website/AGROVOC-Thesaurus/sub

The following mappings have been validated, but the targeted conceptSchemes do not have URIs today, so the results cannot be published:

• EuroVoc 4.3 – TESE (European Education Thesaurus)

• EuroVoc 4.3 – ETT (Vocational Training Thesaurus)

• EuroVoc 4.3 – OSHA thesaurus (European Agency for Safety and Health at Work)

• EuroVoc 4.3 – ECLAS Thesaurus ( Library)

• EuroVoc 4.3 – Court of Justice Case Law Classification

• EuroVoc 4.3 – Classification of the Legislation in Force (EUR-Lex)

• EuroVoc 4.3 – EU-Bookshop Classification

3.15.1 Web services

A set of Web services is available to query the conceptSchemes (EuroVoc, ETT, TESE, GEMET, Agrovoc, etc.) and their alignment results.

These Web services, exposed through a SPARQL endpoint, correspond to the following predefined queries:

− List Vocabularies: retrieves the list of vocabularies in the alignment repository;

− Search concepts: retrieves the concepts that belong to a specific conceptScheme, based on their labels;

− Get Narrower Concepts: provides the Narrower Terms of a searched concept in a selected conceptScheme;

− Read Labels: reads the labels (Preferred, Non-Preferred) of a concept in a selected conceptScheme;

− Translate URI: finds the aligned concepts for a requested URI;

− Translate string: finds the aligned concepts of a label (Preferred Term or Non-Preferred Term).
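As an illustration of how two of these predefined queries might be phrased for a SPARQL endpoint, the following Python sketch builds the query text for ‘Get Narrower Concepts’ and ‘Read Labels’. The function names are hypothetical, and the sketch assumes that the alignment repository exposes the standard SKOS Core properties (skos:narrower, skos:inScheme, skos:prefLabel, skos:altLabel), which this specification does not confirm. It only builds query strings and does not contact any endpoint.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def _iri(value: str) -> str:
    """Wrap an IRI in angle brackets, rejecting characters that would break the query."""
    if any(ch in value for ch in '<>"{}|^`\\ \n\t'):
        raise ValueError(f"not a safe IRI: {value!r}")
    return f"<{value}>"


def narrower_concepts_query(concept_uri: str, scheme_uri: str) -> str:
    """'Get Narrower Concepts' (assumed SKOS Core modelling)."""
    return (
        f"PREFIX skos: <{SKOS}>\n"
        "SELECT ?narrower WHERE {\n"
        f"  {_iri(concept_uri)} skos:narrower ?narrower .\n"
        f"  ?narrower skos:inScheme {_iri(scheme_uri)} .\n"
        "}"
    )


def read_labels_query(concept_uri: str, language: str) -> str:
    """'Read Labels': Preferred and Non-Preferred labels in one language."""
    if not language.isalpha():
        raise ValueError(f"not a plain language code: {language!r}")
    return (
        f"PREFIX skos: <{SKOS}>\n"
        "SELECT ?type ?label WHERE {\n"
        "  VALUES ?type { skos:prefLabel skos:altLabel }\n"
        f"  {_iri(concept_uri)} ?type ?label .\n"
        f'  FILTER (lang(?label) = "{language}")\n'
        "}"
    )


# Example with URIs quoted elsewhere in this document.
query = narrower_concepts_query(
    "http://eurovoc.europa.eu/5383", "http://eurovoc.europa.eu/100141"
)
```

Rejecting unsafe characters before interpolation is a deliberate choice here: the URIs and language codes would typically come from user requests, and the sketch avoids building a query from unchecked input.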

3.15.2 Computing inferences and double annotations

The alignments are expressed with the SKOS mapping properties (exactMatch, broadMatch, narrowMatch and relatedMatch). The full set of all possible semantic correspondences or deductions can be computed on the basis of defined extension rules, for example:

• Extension of an exactMatch into broadMatch: if Concept A1 has exactMatch Concept A2 and Concept A1 has NT Concept B1, we can infer that Concept B1 has broadMatch Concept A2.

• Extension of an exactMatch into narrowMatch: if Concept A1 has exactMatch Concept A2 and Concept A1 has BT Concept B1, we can infer that Concept B1 has narrowMatch Concept A2.

• Inversion of a broadMatch into narrowMatch (reciprocity): if Concept A1 has broadMatch Concept A2, we can infer that Concept A2 has narrowMatch Concept A1.

• Extension of a broadMatch into broadMatch for the specific concepts (NT): if Concept A1 has broadMatch Concept A2 and Concept A1 has NT Concept B1, we can infer that Concept B1 has broadMatch Concept A2.

• Extension of a narrowMatch into narrowMatch for the generic concepts (BT): if Concept A1 has narrowMatch Concept A2 and Concept A1 has BT Concept B1, we can infer that Concept B1 has narrowMatch Concept A2.
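The extension rules can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm used by the Thesaurus Alignment Environment: the `Mapping` type and the `infer_mappings` function are hypothetical names, the hierarchy is given as a plain dictionary of direct NT links, the inversion rule is taken as the reciprocity that SKOS defines between broadMatch and narrowMatch, and the rules are applied in a single pass rather than iterated to a fixed point.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Mapping(NamedTuple):
    source: str    # concept in the source conceptScheme
    relation: str  # "exactMatch", "broadMatch", "narrowMatch" or "relatedMatch"
    target: str    # concept in the target conceptScheme


def infer_mappings(mappings: Iterable[Mapping],
                   narrower: Dict[str, Set[str]]) -> Set[Mapping]:
    """Apply the extension rules once and return only the newly inferred mappings.

    `narrower` maps a source concept to its direct NT concepts.
    """
    known = set(mappings)

    # Derive the BT links from the NT links.
    broader: Dict[str, Set[str]] = {}
    for parent, children in narrower.items():
        for child in children:
            broader.setdefault(child, set()).add(parent)

    inferred: Set[Mapping] = set()
    for m in known:
        # exactMatch / broadMatch extend downwards: each NT gets a broadMatch.
        if m.relation in ("exactMatch", "broadMatch"):
            for child in narrower.get(m.source, ()):
                inferred.add(Mapping(child, "broadMatch", m.target))
        # exactMatch / narrowMatch extend upwards: each BT gets a narrowMatch.
        if m.relation in ("exactMatch", "narrowMatch"):
            for parent in broader.get(m.source, ()):
                inferred.add(Mapping(parent, "narrowMatch", m.target))
        # Reciprocity between broadMatch and narrowMatch.
        if m.relation == "broadMatch":
            inferred.add(Mapping(m.target, "narrowMatch", m.source))
        if m.relation == "narrowMatch":
            inferred.add(Mapping(m.target, "broadMatch", m.source))
    return inferred - known


# A1 has NT B1 and BT C1; A1 has exactMatch A2.
example = infer_mappings(
    [Mapping("A1", "exactMatch", "A2")],
    {"A1": {"B1"}, "C1": {"A1"}},
)
```

With this input, `example` contains `B1 broadMatch A2` and `C1 narrowMatch A2`.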

Semantic inferences will be applied to automated annotation and retrieval in different document collections. For example, it shall be possible to use a Preferred Term or a Non-Preferred Term of one conceptScheme to search in a collection that is not annotated with this conceptScheme but with a conceptScheme mapped to it.

USE CASE: Collection A is annotated by EuroVoc - Collection B is annotated by ETT

• A search on the Preferred Term EuroVoc:‘Female migrant’ in Collection B shall retrieve the documents annotated by ETT :‘migrant woman’.


• A search on the Preferred Term EuroVoc:‘employment statistics’ in Collection B shall retrieve the documents annotated by the exactMatch ETT :‘labour statistics’ or by the narrowMatch ETT :‘number of trainees’ or by the broadMatch ETT :‘labour market’.


Fig. 60: Mapping results for EuroVoc:‘employment statistics’.

• A search on the Non-Preferred Terms EuroVoc:‘employment level’ or EuroVoc:‘employment situation’ shall retrieve the documents annotated by ETT :‘labour statistics’
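The ‘employment statistics’ use case can be sketched as a simple query expansion. In the Python sketch below, the labels and match types are those quoted in the use case; the concept identifier `ev:employment-statistics`, the document identifiers and the function names are invented for illustration only.

```python
# EuroVoc labels (Preferred and Non-Preferred Terms) pointing to one concept.
# The identifier "ev:employment-statistics" is hypothetical.
EUROVOC_LABELS = {
    "employment statistics": "ev:employment-statistics",  # Preferred Term
    "employment level": "ev:employment-statistics",       # Non-Preferred Term
    "employment situation": "ev:employment-statistics",   # Non-Preferred Term
}

# Mapping results towards ETT, as listed in the use case.
MAPPINGS = {
    "ev:employment-statistics": [
        ("exactMatch", "labour statistics"),
        ("narrowMatch", "number of trainees"),
        ("broadMatch", "labour market"),
    ],
}

ALL_RELATIONS = ("exactMatch", "narrowMatch", "broadMatch")


def expand_to_ett(term, relations=ALL_RELATIONS):
    """Return the ETT labels aligned with a EuroVoc term through the given relations."""
    concept = EUROVOC_LABELS.get(term.lower())
    if concept is None:
        return []
    return [label for rel, label in MAPPINGS.get(concept, []) if rel in relations]


def search_collection_b(term, documents, relations=ALL_RELATIONS):
    """Return the documents of an ETT-annotated collection matching a EuroVoc term."""
    wanted = set(expand_to_ett(term, relations))
    return [doc for doc, annotations in documents.items() if wanted & set(annotations)]


# Hypothetical Collection B, annotated with ETT terms.
collection_b = {
    "doc1": ["labour statistics"],
    "doc2": ["labour market"],
    "doc3": ["vocational guidance"],
}
hits = search_collection_b("employment statistics", collection_b)
```

Restricting `relations` to `("exactMatch",)` limits the expansion to the exact equivalents, which is one way of trading recall for precision.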


Fig. 61: EuroVoc – Display of ‘labour market’ in the ‘Thesaurus Alignment Environment’ platform.

3.16 Web services defined in the CELLAR

The following conceptScheme access services have already been defined as part of the technical requirements of the CELLAR dissemination interface:

(1) Language independent services

− Check if a conceptScheme has been updated

− Get the supported languages of a conceptScheme

− Get the conceptScheme details

(2) Language specific services

− Get the top concepts of a conceptScheme

− Get the language description of a concept

− Translate a concept

− Get the contents related to a given concept (narrower, broader, related)

(3) conceptScheme loading and decoding services

− Load and decode EuroVoc in a specific language, including Domain, Microthesaurus and top term, and including language fallback procedures for decoding

− Calculate the SKOS version difference list for a new conceptScheme compared to its previous version

− Concept decoding includes alternate labels

− Alignment results and computed inferences
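The ‘language fallback’ mentioned for decoding can be pictured as follows. This Python sketch is only an assumption about what such a procedure might look like: the function name, the fallback order and the sample data are hypothetical, and the actual CELLAR fallback rules are not described in this document.

```python
def decode_label(labels, language, fallbacks=("en",)):
    """Return (language, label) for the requested language, or for the first
    fallback language that has a label; None when nothing is available.

    `labels` maps a language code to the preferred label of one concept.
    """
    for lang in (language, *fallbacks):
        if lang in labels:
            return lang, labels[lang]
    return None


# Hypothetical data: a concept with an English label only.
labels_5383 = {"en": "technological change"}
decoded = decode_label(labels_5383, "fr")
```

Returning the language together with the label lets the caller indicate to the user that the displayed label is not in the requested language.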


Figure 62 - EuroVoc Restful web services – CELLAR architecture


Figure 63 - CELLAR: EuroVoc Restful web services – Description

3.17 Licensee’s users

3.17.1 Website usage

Source: Europa Analytics - SAS Web Analytics 5.3.3, downloaded on March 06, 2012

Number of unique visitors aggregated per month for the period March 2011 – February 2012. A unique visitor is a uniquely identified client viewing pages within a defined time period (one day in this case). A unique visitor is counted once in the considered period, even if that visitor makes several visits. As identification is based on the visitor’s PC, the same visitor using more than one PC is counted several times.

Source: Europa Analytics - SAS Web Analytics 5.3.3, downloaded on March 06, 2012

Number of visits aggregated per month for the period March 2011 – February 2012. A visit is defined as a series of page requests from the same uniquely identified client.

A bounce is a visit consisting of exactly one page view: the visitor views a single page of the EuroVoc website and leaves without visiting another page during the same visit.

4. Technical Architecture

4.1 Environments

Currently, the EuroVoc Front Office is installed in two separate but technically identical environments. All new installations are first performed in the Test environment, and, in case of success, the same installation is repeated in the Production environment. The Production environment includes the actual EuroVoc website, as seen by the public. The Test environment is only accessible from within the Publications Office.

4.2 Hardware architecture

− An RDF store that offers RDF Schema inferencing, content negotiation and SPARQL querying;

− Oracle Text database and index for querying the EuroVoc multilingual content.

4.3 Software architecture

The Front-Office consists of the following components:

− Drupal 6.9, an open-source Content Management System http://drupal.org/ for the management of the multilingual editorial content and the screen layout

− A SKOS consistency validator


− A set of conversion applications to transform

• SKOS into XML

• XML into XHTML to display the thesaurus content as a web page

• XML into PDF


Fig. 64: EuroVoc global architecture.

The EuroVoc Front Office has the following software requirements:

° The OS version to be installed is Solaris 10.5.

° The Solaris OS needs to have a fixed IP address.

Item | Value
OS | Sun Solaris 10 u8
Oracle Database | 11g release 11.2.0.3.6 64 bits
Encoding | AL32UTF8
Minimal Disk space | 2 Gbyte per database instance (two database instances are created, one for acceptance, one for operational)
Maximum Disk space | 2 x 10 Gigabyte (disk space allocated 12 GB, 26 tables)
Minimal RAM Memory | 2 Gbyte per database
Recommended per db instance | 4 Gbyte
Expected indexed words | Starting 320.000 and growing

Item | Value
OS | Sun Solaris (tested on 10.8)
Oracle client | 11.2.0.3.0 or higher
Apache 2 | 2.2.11
Apache requirements | http://httpd.apache.org/docs/2.2/install.html
Apache Minimal Disk space | 100 Megabyte
Apache Maximum Disk space | 500 Megabyte
Apache Minimal RAM Memory | 16 Megabyte (nowhere clearly specified)
Apache Customised RAM memory | 50 Megabyte
Application Server Minimal Disk space | 1 Gbyte
RDF Store Minimal Disk space | 2 Gbyte
Total Application Minimal Disk space | 3.1 Gbyte
Minimal RAM Memory | 2 Gbyte
PHP | 5.2.9
PHP memory limit | 128M
MySQL | 5.0.75
Drupal | 6.9
JDK | 1.6.0_07
Tomcat | 6.0.18

The Drupal 6.9 prerequisites are Apache 2, MySQL 5.0 and PHP 5.2.

4.4 Publishing mechanism

Publishing EuroVoc involves exporting the contents of EuroVoc from the Back Office, then validating and importing the exported data in the Front Office, using dedicated software tools. If validation is successful, the contents are stored in the RDF store of the Front Office, and are made available to the public in the EuroVoc website.

The RDF store contains the concepts, the tree structure and all terms and relations (the whole SKOS file). There are two different RDF stores, one with the released thesaurus and one with the maintenance thesaurus (approved concepts). EuroVoc uses the Sesame open source framework for storage, inferencing and querying of RDF data. Detailed explanations about RDF and Sesame can be found on the portal http://www.openrdf.org/.

4.4.1 Exporting from the Back Office

The content of EuroVoc (concepts and terms) is maintained in a Thesaurus Management System (the Back Office) and exported as an RDF file in SKOS-XL format.

Two types of files are exported:

(1) A SKOS/RDF file of the EuroVoc full release (called the official release), which includes:

1. The release number (Example: EuroVoc 4.3, EuroVoc 4.4);

2. The Domains and Microthesauri;

3. The concepts (Thesaurus concepts, countries and obsolete concepts), their relations (BT/NT, RT) and properties;

4. The Thesaurus terms (Preferred Terms, Non-Preferred Terms, Compound Non-Preferred Terms), their relations (Acronym, Short Name, Full Name, TranslationOf) and their properties.

(2) A partial SKOS/RDF file that contains the candidate or approved concepts available in at least one language (called the partial release).

The SKOS/RDF file is exported and stored on a local PC and must afterwards be moved to a dedicated directory, where it is taken over by the Infrastructure Unit of the Publications Office (Infra), which is responsible for running the publishing process in the test (acceptance) and production (operational) environments.

4.4.2 Importing into the Front Office

Currently, importing data into the Front Office is handled by command-line Java applications, which require technical expertise and experience with command-line environments. They are therefore executed by technicians at the Infrastructure Unit of the Publications Office.

A SKOS validation process checks the consistency of the SKOS/RDF thesaurus against a set of rules and records the errors and warnings in a log file. Corrections to the SKOS content must be made in the TMS (Back Office) and the full export steps repeated. The two types of SKOS file are stored in an RDF store.
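The validation rules themselves are not listed in this document. As a hedged illustration of what a rule-based consistency check can look like, the Python sketch below tests two label conditions taken from the SKOS Reference (at most one preferred label per language for a concept, and no label that is both preferred and alternative for the same concept). The input format, the function name and the message wording are hypothetical; the real validator works on the SKOS/RDF export and may apply different rules.

```python
from collections import defaultdict


def validate_labels(labels):
    """Check label consistency and return a list of log messages.

    `labels` is an iterable of (concept, property, text, language) tuples,
    where property is "prefLabel" or "altLabel".
    """
    pref = defaultdict(list)  # (concept, language) -> preferred labels
    alt = defaultdict(set)    # (concept, language) -> alternative labels
    for concept, prop, text, lang in labels:
        if prop == "prefLabel":
            pref[(concept, lang)].append(text)
        elif prop == "altLabel":
            alt[(concept, lang)].add(text)

    log = []
    for (concept, lang), texts in pref.items():
        # Rule 1: no more than one preferred label per language.
        if len(texts) > 1:
            log.append(f"ERROR {concept}: {len(texts)} preferred labels in '{lang}'")
        # Rule 2: preferred and alternative labels must be disjoint.
        for text in sorted(set(texts) & alt.get((concept, lang), set())):
            log.append(f"ERROR {concept}: '{text}' is both preferred and alternative in '{lang}'")
    return log


# Hypothetical export extract with one inconsistency.
report = validate_labels([
    ("c1", "prefLabel", "labour market", "en"),
    ("c1", "altLabel", "labour market", "en"),
    ("c2", "prefLabel", "employment statistics", "en"),
])
```

In a real run the returned messages would be written to the log file mentioned above, and a non-empty list would block the import.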

On the basis of the RDF file of the EuroVoc full release, the following steps take place:

− an Oracle Text index is created to allow information retrieval in the EuroVoc website;

− the RDF file is transformed into XML, which is used as input file:

• by the PDF generation process, to publish the PDF files available for download in the EuroVoc website;

• to publish the EuroVoc content displayed in HTML in the EuroVoc website.

The EuroVoc content and indexes are stored at first in the acceptance environment. A transfer procedure, triggered manually, moves the thesaurus content, the native files ( SKOS/RDF , XML and PDF) and the Oracle Text index from the acceptance into the operational environment.

5. Ongoing developments

The following new developments are taking place under the current contract:


5.1 Alignment

Currently, the EuroVoc website does not support the publishing of alignment information.

5.1.1 “Disseminate Alignment” functionality

(1) The EuroVoc website will be able to disseminate any alignments for each concept (with a unique URI) published in the EuroVoc website, associated by a SKOS:mapping property to a concept hosted on an outside website (with a unique URI).

(2) The alignments will be exported from the ‘Thesaurus Alignment Environment’ and delivered in SKOS/RDF format.

(3) The Front Office will not place any restrictions on the number of alignments that can be integrated this way.

(4) It will be possible to add or remove an alignment at any time. The Front Office should handle these transitions gracefully, and regenerate the necessary data structures so that the change can go live immediately.

(5) The alignment results generated between the following thesauri shall be published in a first phase.

° EuroVoc and GEMET ,

° EuroVoc and ECLAS,

° EuroVoc and Agrovoc, and

° EuroVoc and a previous EuroVoc version

(6) The ‘Term details’ shall display the details for the individual concepts that belong to one conceptScheme , in the selected content language:

6.1 The SKOS mapping associations of the aligned resources that will include:

− the type of match (Has exact match, Has broad match, Has narrow match, Has related Match), followed by,

− The name or lexical value of the matching concept (hyperlinked on the target URI ) in the target conceptScheme , in the selected content language. Clicking on the link ( URI ) will open the term details of the aligned concept in a new browser window.

Example:

‘Technological change’ http://eurovoc.europa.eu/5383 hasExactmatch Technological change (Gemet) http://www.eionet.europa.eu/gemet/concept/8330
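To make the example concrete, the Python sketch below shows how such an alignment could be written as a SKOS mapping statement (one N-Triples line) and rendered as a ‘Term details’ entry. The display wording follows the match types listed above; the function names and the layout of the display line are assumptions, not the layout that will be delivered.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# Display names of the match types, as listed in the requirement above.
MATCH_LABELS = {
    "exactMatch": "Has exact match",
    "broadMatch": "Has broad match",
    "narrowMatch": "Has narrow match",
    "relatedMatch": "Has related match",
}


def mapping_triple(source_uri, relation, target_uri):
    """Serialise one alignment as an N-Triples statement."""
    if relation not in MATCH_LABELS:
        raise ValueError(f"unknown SKOS mapping property: {relation}")
    return f"<{source_uri}> <{SKOS_NS}{relation}> <{target_uri}> ."


def display_line(relation, target_label, target_scheme, target_uri):
    """Render one alignment as it might appear in the 'Term details' (hypothetical layout)."""
    return f"{MATCH_LABELS[relation]}: {target_label} ({target_scheme}) {target_uri}"


triple = mapping_triple(
    "http://eurovoc.europa.eu/5383",
    "exactMatch",
    "http://www.eionet.europa.eu/gemet/concept/8330",
)
line = display_line(
    "exactMatch", "Technological change", "Gemet",
    "http://www.eionet.europa.eu/gemet/concept/8330",
)
```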


Fig. 65 : Possible layout example for “Disseminate Alignment” development

5.2 Language management

The EuroVoc website works with several content languages, as well as interface languages.

5.2.1 “Publish New Editorial Language” functionality

The system will provide a means for a Webmaster to generate the content for a new interface language (editorial content) in the EuroVoc website.

(1) A global publishing process will generate the labels of the editorial content (for example, menu items, buttons, field names, tooltips, titles) in a new, additional language in the EuroVoc website. The solution will replace or improve the current language management functionalities with a stable, user-friendly system which allows the webmaster to edit any label.

5.3 Contribute

5.3.1 “Captcha”

(1) A captcha Drupal Module shall be installed and activated in order to avoid automated spam submission for the “Contribute” functionality.


5.4 Publishing mechanism

5.4.1 “Publishing Front End”

Currently, the Publishing Mechanism is only available to technicians who have shell access to the Front Office environments. A solution is being developed to give the Webmaster access to the Publishing Mechanism (ideally via an internally accessible web application).

(1) The ‘Publishing Front End’ will trigger the publishing of the content of EuroVoc and the alignments in the acceptance environment. It will also allow the user to cancel or restart an ongoing publishing process.

(2) The ‘Publishing Front End’ will be available for the ‘EuroVoc Team’ profile (webmaster). It will be straightforward to use, and it must provide visual feedback on the publishing processes. It must also be able to display and download the remote log files for these processes.

(3) The Publishing Front End will run independently of the actual publishing process. With the exception of file copy operations described in (6), no closing, restarting, crashing, etc. of the Publishing Front End is allowed to affect the publishing mechanism itself.

(4) After launching the Publishing Front End at any time, it must display the status of any ongoing publishing processes, as well as a brief history of previous ones. If a publishing process is ongoing, the Publishing Front End must not allow the launch of any conflicting tasks (e.g. an attempted second, concurrent import).

(5) The generation of the content of the EuroVoc website and the generation of the PDF files will be two independent phases, and can be triggered individually by the Publishing Front End. A graphical user interface option must also be present for triggering both phases, in which case the second phase will start automatically once the first one finishes successfully.

(6) As its input location, the Publishing Front End must accept any user-accessible location on the file system. The system will first zip and copy the input file(s) to its specified server-side location, then unzip them in their new location, and launch the import procedures remotely. If the copy operation fails for whatever reason, the system should recover from this gracefully, and inform the user.

(7) The Publishing Front End shall be able to ingest any RDF file exported from the TMS or from the Thesaurus Alignment Environment.
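The zip, copy and unzip sequence of requirement (6) can be sketched as follows. This Python example is a simplified illustration under stated assumptions: the function name is hypothetical, the server-side location is represented by a local directory, and launching the remote import procedures is left out. A failure during any step is caught and reported instead of being propagated.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def stage_input(input_files, server_dir):
    """Zip the input files, copy the archive to `server_dir` and unzip it there.

    Returns (ok, message) so that the caller can inform the user.
    """
    server_dir = Path(server_dir)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            # 1. Zip the input file(s) on the user's side.
            archive = Path(tmp) / "input.zip"
            with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
                for name in input_files:
                    zf.write(name, arcname=Path(name).name)
            # 2. Copy the archive to the (here: local) server-side location.
            server_dir.mkdir(parents=True, exist_ok=True)
            copied = Path(shutil.copy2(archive, server_dir))
            # 3. Unzip in the new location and remove the archive.
            with zipfile.ZipFile(copied) as zf:
                zf.extractall(server_dir)
            copied.unlink()
    except (OSError, zipfile.BadZipFile) as exc:
        return False, f"Copy failed, import not launched: {exc}"
    return True, f"{len(input_files)} file(s) staged in {server_dir}"
```

Returning a status and a message rather than raising keeps the failure handling in one place; the front end can display the message and leave the publishing mechanism untouched.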

5.5 Semantic Web features

5.5.1 “Linked Open Data”

° Today, even if EuroVoc has been modelled as an RDF schema and all the EuroVoc resources (Concepts, Preferred Terms and Non-Preferred Terms ) are identified by their own URI , the content of the EuroVoc website is not yet published according to the practices of Linked Open Data .

° For example, testing the EuroVoc resource http://eurovoc.europa.eu/5958 in the Vapour website (http://validator.linkeddata.org/vapour), a web validation service that checks whether semantic content is published according to the current Linked Data best practices, demonstrates that the EuroVoc resources are well configured as information resources (a code 200 is received in response to an HTTP GET request, which corresponds to an information resource), but content negotiation and direct dereferencing are missing.

° Content negotiation and URI dereferencing are provided in the framework of the CELLAR dissemination interface.

(1) A project is ongoing to test the rewrite rules that dereference the URIs and to update these rules according to the services defined in the CELLAR, in order to make the EuroVoc website fully Linked Data compliant.

(2) Dereferencing URIs will be made available for all the published RDF resources in the EU Thesaurus website (EuroVoc and the alignment results and the RDF schema ).

RDF resources published in the EuroVoc website will consist of:

2.1 All the versions (previous and current) of a conceptScheme available in RDF;

2.2 The conceptSchemes (Thesaurus, Domains, Microthesaurus) that shall be identified by a URI;

2.3 Metadata about the EuroVoc conceptScheme shall be provided according to VOID, an RDF Schema vocabulary for expressing metadata about RDF datasets (see ‘Describing Linked Datasets with the VOID Vocabulary’, http://www.w3.org/TR/void/). Metadata will describe the conceptScheme available in RDF (Dataset) and each RDF file of alignment results (Linkset). The contractor shall develop a mechanism that enables the Publications Office to supply and publish these metadata.

2.4 All the ontology resources (classes and properties) shall be queryable and return their HTML resource description.

(3) An HTTP content negotiation mechanism shall be set up to specify the type of representation for each RDF resource (Thesaurus, Domain, Microthesaurus, Concepts and conceptSchemes). A SKOS button shall be made available in the ‘Error! Reference source not found.’ and in the ‘Error! Reference source not found.’ to perform an HTTP GET request and provide RDF descriptions as:

− RDF, as the application/rdf+xml (RDF/XML) or application/x-turtle (Turtle) MIME types

− HTML, as the text/html MIME type

(4) The button will be designed on the same model as the current ‘Tree/Flat view’ or ‘Hide/Show RT’.

(5) Resources must be persistent and continue to be available throughout the consecutive versions of a conceptScheme. For example, URI aliases (alternative identifiers) shall be configured for the deprecated concepts that must remain available in the EuroVoc website (see chapter 5.5.2).
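The content negotiation step can be illustrated by a small function that selects a representation from the HTTP Accept header. This Python sketch is a simplified model and not the CELLAR implementation: the function name is hypothetical, only the three MIME types named above are supported, wildcard handling ignores the specificity precedence of the HTTP specification, and falling back to HTML when nothing matches is an assumption.

```python
SUPPORTED = ("text/html", "application/rdf+xml", "application/x-turtle")


def negotiate(accept_header, supported=SUPPORTED, default="text/html"):
    """Return the supported MIME type with the highest q-value in the Accept header."""
    best, best_q = None, 0.0
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        # Read the quality value, defaulting to 1.0 as in HTTP.
        q = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        # Match the exact type, a type wildcard (e.g. text/*) or */*.
        for candidate in supported:
            type_wildcard = candidate.split("/")[0] + "/*"
            if media in (candidate, type_wildcard, "*/*") and q > best_q:
                best, best_q = candidate, q
                break
    return best or default


rdf_choice = negotiate("application/x-turtle;q=0.5, application/rdf+xml")
browser_choice = negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8")
```

A browser request therefore receives the HTML page, while an RDF client asking for application/rdf+xml receives the RDF description of the same resource.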

5.5.2 “URI aliases (alternative identifier)”

° There is an ongoing project to configure the EuroVoc website to provide URI dereferencing and content negotiation for the alternative identifiers and to ensure their persistency.

° Alternative identifiers are maintained for deprecated concepts, which have been merged into new concepts in the TMS. For example, all the terms (Preferred Terms and Non-Preferred Terms) of a deprecated concept are merged as Non-Preferred Terms of a new concept. The identifier (URI) of the deprecated concept is merged as an alternative identifier of the new concept.

° Today, the EuroVoc website does not offer access to the alternative identifiers of a deprecated concept.

For example, ‘Aarhus’ (http://eurovoc.europa.eu/1) has been deprecated under ‘Midtjylland’ (http://eurovoc.europa.eu/8280). http://eurovoc.europa.eu/1 is not accessible in the EuroVoc website.

(1) All the URIs of the deprecated resources shall be available and queryable as Non-information resources, via content negotiation and URI dereferencing. Resources must be persistent and continue to be available throughout the consecutive versions of a conceptScheme. A query on an alternative identifier must lead to the URI of the associated concept. For example, http://eurovoc.europa.eu/1 must forward to http://eurovoc.europa.eu/8280.

(2) The RDF schema is being enhanced to integrate the alternative identifiers.
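The forwarding of an alternative identifier can be sketched as a lookup that returns an HTTP status and a target URI. In this Python illustration the alias pair is the example given above; the function name and the choice of status 303 for the forward are assumptions (the requirement only states that the alias ‘must forward’ to the new concept).

```python
# Alias table: deprecated URI -> URI of the concept it was merged into.
ALIASES = {"http://eurovoc.europa.eu/1": "http://eurovoc.europa.eu/8280"}

# URIs of concepts that are currently published (illustrative subset).
PUBLISHED = frozenset({"http://eurovoc.europa.eu/8280"})


def resolve(uri, aliases=ALIASES, published=PUBLISHED):
    """Return (status, target) for a requested URI.

    Chains of aliases are followed; a `seen` set guards against cycles.
    """
    seen = set()
    target = uri
    while target in aliases and target not in seen:
        seen.add(target)
        target = aliases[target]
    if target != uri:
        return 303, target   # forward to the concept that replaced the alias
    if uri in published:
        return 200, uri      # current concept, served directly
    return 404, None         # unknown resource


forwarded = resolve("http://eurovoc.europa.eu/1")
```

Following chains matters for persistency across versions: a concept deprecated in one release may itself be merged again in a later one.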

6. Additional Information

The EuroVoc thesaurus concepts are not represented in Drupal as a new CCK content type. The EuroVoc thesaurus concepts are stored in the Oracle database as shown in the diagram below:


The XML file that is produced from the SKOS/RDF file is used to publish the EuroVoc content in HTML in the EuroVoc website according to the following principle:

Every user browsing the EuroVoc portal sends HTTP requests (port 80) from his or her browser to the Apache web server. Drupal, running within this Apache server, receives the request. If the user requests a Microthesaurus navigation, the approved concepts, or a term details screen and graphical representation, the Drupal application requests this information from the Java application running within the Tomcat application server. The algorithms within Drupal are programmed in PHP 5.

The Java application that receives this request has a page template for every generic request. It constructs a SPARQL request and gets the RDF data from the RDF store.

The template, filled in with the data, is sent back to the requesting Drupal application, which sends it back to the browser together with the styling information.

If the user triggers a simple or advanced search request, the Drupal application constructs an SQL statement for the Oracle Text database. The results are presented in the result screen. Single concept pages are generated automatically and are not available for editing in Drupal. More generally, the automatically generated HTML pages (the EuroVoc thesaurus content) are not available in Drupal; only the editorial content can be edited in Drupal.
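The dispatching described above can be summarised in a short sketch. The real logic is written in PHP within Drupal; the Python below only models it. The request names, the table and column names (`eurovoc_terms`, `concept_uri`, `label`, `lang`) and the function names are hypothetical. The `CONTAINS(...) > 0` predicate is the usual Oracle Text query operator, and bind variables are used so that the search string is never concatenated into the statement.

```python
# Requests answered by the Java application in Tomcat (SPARQL on the RDF store).
TOMCAT_REQUESTS = {
    "microthesaurus_navigation",
    "approved_concepts",
    "term_details",
    "graphical_representation",
}

# Requests answered by an SQL statement on the Oracle Text database.
SEARCH_REQUESTS = {"simple_search", "advanced_search"}


def route(request_kind):
    """Return the back end that handles a given kind of request."""
    if request_kind in TOMCAT_REQUESTS:
        return "tomcat"
    if request_kind in SEARCH_REQUESTS:
        return "oracle_text"
    return "drupal"  # editorial content managed in Drupal itself


def build_search_statement(query, language):
    """Build a parameterised Oracle Text query (hypothetical schema)."""
    sql = (
        "SELECT concept_uri, label FROM eurovoc_terms "
        "WHERE lang = :lang AND CONTAINS(label, :query) > 0"
    )
    return sql, {"lang": language, "query": query}


backend = route("term_details")
statement, binds = build_search_statement("employment statistics", "en")
```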
