
Project Title: i-Treasures: Intangible Treasures – Capturing the Intangible Cultural Heritage and Learning the Rare Know-How of Living Human Treasures

Contract No: FP7-ICT-2011-9-600676
Instrument: Large Scale Integrated Project (IP)
Thematic Priority: ICT for access to cultural resources
Start of project: 1 February 2013
Duration: 48 months

D2.1 First Report on User Requirements Identification and Analysis

Due date of deliverable: 1 August 2013
Actual submission date: 6 August 2013
Version: 2nd version of D2.1
Main Authors: Francesca Pozzi (ITD-CNR), Marilena Alivizatou (UCL), Michela Ott (ITD-CNR), Francesca Dagnino (ITD-CNR), Alessandra Antonaci (ITD-CNR)


Project funded by the European Community under the 7th Framework Programme for Research and Technological Development.

Project ref. number: ICT-600676

Project title: i-Treasures – Intangible Treasures – Capturing the Intangible Cultural Heritage and Learning the Rare Know-How of Living Human Treasures

Deliverable title: First Report on User Requirements Identification and Analysis

Deliverable number: D2.1

Deliverable version: Version 2

Previous version(s): Version 1

Contractual date of delivery: 1 August 2013

Actual date of delivery: 6 August 2013

Deliverable filename: Del_2_1_FINAL2.doc

Nature of deliverable: R

Dissemination level: PU

Number of pages: 175

Workpackage: WP 2

Partner responsible: ITD-CNR

Author(s): Martine Adda-Decker (CNRS), Marilena Alivizatou (UCL), Samer Al Kork (UPMC), Angélique Amelot (CNRS), Alessandra Antonaci (ITD-CNR), George Apostolidis (AUTH), Nicolas Audibert (CNRS), Vasilis Charisis (AUTH), Marius Cotescu (ACAPELA), Lise Crevier-Buchman (CNRS), Francesca Dagnino (ITD-CNR), Bruce Denby (UPMC), Olivier Deroo (ACAPELA), Kosmas Dimitropoulos (CERTH), Cécile Fougeron (CNRS), Vasso Gatziaki (UOM), Cédric Gendrot (CNRS), Alina Glushkova (UOM), Nikos Grammalidis (CERTH), Leontios Hadjileontiadis (AUTH), Anastasios Katos (UOM), Alexandros Kitsikidis (CERTH), Ioannis Kompatsiaris (CERTH), George Kourvoulis (UOM), Gwenaelle Lo Bue (CNRS), Athanasios Manitsaris (UOM), Sotiris Manitsaris (ARMINES/ENSMP), Dimitris Manousis (UOM), Spiros Nikolopoulos (CERTH), Michela Ott (ITD-CNR), Stavros Panas (AUTH), Xrysa Papadaniil (AUTH), Savvas Pavlidis (UOM), Claire Pillot-Loiseau (CNRS), Francesca Pozzi (ITD-CNR), Thierry Ravet (UMONS), George Sergiadis (AUTH), Mauro Tavella (ITD-CNR), Joëlle Tilmanne (UMONS), Filareti Tsalakanidou (CERTH), Viki Tsekouropoulou (UOM), Jacqueline Vaissière (CNRS), Leny Vinceslas (CNRS), Christina Volioti (UOM), Erdal Yilmaz (TT/Sobee Studios).

Editor: Francesca Pozzi (ITD-CNR)

EC Project Officer: Alina Senn

Abstract: The document analyzes and describes the ‘intangible’ artistic expressions chosen by the project as use cases and defines the basic requirements of the i-Treasures system that will be developed to support information provision, preservation and education relating to these intangible heritages.

Keywords: Intangible Cultural Heritage (ICH), preservation, education, technology.


Signatures

Written by: Francesca Pozzi, WP2 Leader (ITD-CNR), 16/07/2013

Verified by: Francesca Pozzi, WP2 Leader (ITD-CNR), 02/08/2013

Approved by: Nikos Grammalidis, Coordinator (CERTH), 05/08/2013

Approved by: Yiannis (Ioannis) Kompatsiaris, Quality Manager (CERTH), 05/08/2013


Table of Contents

1. Executive summary .......... 8
2. Introduction .......... 9
2.1 Purpose and structure of the document .......... 9
2.2 Brief introduction to the i-Treasures project .......... 9
2.3 ICHs considered in the project: an overview .......... 10
2.4 Focus on Work Package 2 .......... 12
3. Overall methodology for the definition of the i-Treasures Requirements .......... 15
4. State of the art review .......... 18
4.1 Methodology and Objectives .......... 18
4.2 Safeguarding Intangible Heritage .......... 19
4.2.1 The Approach and Projects of UNESCO .......... 19
4.2.2 Community-Focused Approaches to Safeguarding Intangible Heritage .......... 21
4.3 Modern Technologies in the Transmission and Documentation of Intangible Heritage .......... 23
4.3.1 Facial Expression Analysis .......... 24
4.3.1.1 Introduction .......... 24
4.3.1.2 Key projects and applications .......... 27
4.3.1.3 Possible Use in Intangible Heritage Preservation and Transmission .......... 29
4.3.2 Vocal Tract Sensing and Modeling .......... 30
4.3.2.1 Introduction .......... 30
4.3.2.2 Key Projects and applications .......... 31
4.3.2.3 Possible Use in Intangible Heritage Preservation .......... 32
4.3.3 Motion capture - Body and Gesture Recognition .......... 33
4.3.3.1 Introduction .......... 33
4.3.3.2 Key Projects and applications & Possible Use in Intangible Heritage Transmission and Preservation .......... 37
4.3.4 Encephalogram Analysis .......... 40
4.3.4.1 Introduction .......... 40
4.3.4.2 Key Projects and applications .......... 41
4.3.4.3 Possible Use in Intangible Heritage Preservation and Transmission .......... 45
4.3.5 Semantic Multimedia Analysis .......... 46
4.3.5.1 Introduction .......... 46
4.3.5.2 Key Projects and applications .......... 47
4.3.5.3 Possible Use in Intangible Heritage Preservation and Transmission .......... 51
4.3.6 3D Visualization of Intangible Heritage .......... 53
4.3.6.1 Introduction .......... 53
4.3.6.2 Key Projects and applications .......... 54
4.3.6.3 Possible Use in Intangible Heritage Preservation and Transmission .......... 57
4.3.7 Text to Song .......... 57
4.3.7.1 Introduction .......... 57
4.3.7.2 Key Projects and applications .......... 58
4.3.7.3 Possible Use in Intangible Heritage Preservation and Transmission .......... 61
4.4 Results and Way Forward: Emerging requirements .......... 62
5. Knowledge domain definition .......... 64
5.1 Objectives and rationale .......... 64
5.2 Experts’ and Users’ Groups setting up .......... 64
5.2.1 Preliminary feedback from the Expert Groups .......... 65
5.2.2 Next Actions .......... 66
5.3 The Glossary .......... 66
5.3.1 Glossary design .......... 66
5.3.2 Glossary realization .......... 66
5.4 Questionnaires to the Experts .......... 67
5.4.1 Dimensions for knowledge domains definition .......... 68
5.4.2 Questionnaires preparation .......... 70
5.4.3 Questionnaires release and delivery .......... 75
5.4.4 Questionnaires results .......... 76
5.4.4.1 Sardinian Canto a Tenore .......... 76
5.4.4.2 Corsican Cantu in Paghjella .......... 81
5.4.4.3 Byzantine Music .......... 85
5.4.4.4 Human Beat Box .......... 88
5.4.4.5 Romanian Căluş dance .......... 91
5.4.4.6 Greek Tsamiko dance .......... 95
5.4.4.7 Walloon traditional dances .......... 99
5.4.4.8 Contemporary dance .......... 104
5.4.4.9 Contemporary music composition .......... 109
5.4.4.10 Craftsmanship .......... 112
5.5 Interviews with the Experts – Consultants .......... 119
5.5.1 Dimensions/topics of the interviews .......... 120
5.5.2 Interviews preparation .......... 121
5.5.3 Interviews release and delivery .......... 121
5.5.4 Interviews results .......... 121
5.5.4.1 Canto a Tenore .......... 121
5.5.4.2 Byzantine music .......... 124
5.5.4.3 Contemporary music composition .......... 124
6. Identification of the Use case Requirements .......... 127
6.1 Objectives and rationale .......... 127
6.2 Requirements per use cases .......... 129
6.2.1 Rare singing .......... 129
6.2.2 Rare dancing .......... 131
6.2.3 Craftsmanship .......... 133
6.2.4 Contemporary music composition .......... 135
7. Results: Towards the first definition of the i-Treasures Requirements .......... 137
7.1 Introduction .......... 137
7.2 Overall description .......... 137
7.3 Functional Requirements .......... 139
7.4 Non-functional Requirements .......... 153
8. Discussion and Conclusions .......... 158
9. References .......... 159
10. Appendixes .......... 175
10.1 Appendix to the State of the Art .......... 175
10.2 Experts’ and Users’ Groups .......... 175
10.3 Questionnaires for knowledge domain definition .......... 175
10.4 Questionnaire results: sub-use case analysis .......... 175
10.5 Guidelines for interviewers – Example .......... 175


1. Executive summary

The i-Treasures project deals with ICH (Intangible Cultural Heritage) preservation and transmission; its primary aim is to develop an open and extendable platform that provides access to ICH resources, enables knowledge exchange between researchers and contributes to the transmission of rare know-how from Living Human Treasures to apprentices. The main purpose of this document is to define the system and user requirements of the i-Treasures platform. The requirements definition process was based on a participatory approach: experts, performers and users were actively involved, through surveys and interviews, in the complex tasks of identifying the specificities of rare traditional know-how, discovering existing teaching and learning practices and, finally, identifying the most cutting-edge technologies able to support innovative learning approaches to ICH. The document therefore contains a state of the art review in the field, as well as the analysis of the artistic expressions (i.e. intangible heritages) identified by the project as use cases, namely: 1) rare traditional songs, 2) rare dance interactions, 3) traditional craftsmanship and 4) contemporary music composition.


2. Introduction

2.1 Purpose and structure of the document

This document is the first scientific deliverable of the i-Treasures project, an Integrated Project (IP) of the European Union's 7th Framework Programme under the theme “ICT for Access to Cultural Resources”. The project deals with ICH (Intangible Cultural Heritage) and its main objective is to develop a platform dedicated to supporting access to, knowledge of and transmission of the rare know-how behind certain artistic expressions (intangible heritages). The document describes the artistic expressions identified by the project as main use cases, namely: 1) rare singing, 2) rare dancing, 3) craftsmanship and 4) contemporary music composition. In addition, the document illustrates the methods used, as well as the results obtained, within the complex process of defining the user and system requirements of the i-Treasures platform. This process was based on a participatory and user-centered approach involving several stakeholders, which will lay the foundation for the rest of the project. The document is structured as follows:
- In the next section, the deliverable provides a bird's eye view of the project and of the considered ICHs, and contextualizes Work Package 2 in the framework of the whole project.
- Section 3 explains the overall design behind the work undertaken and illustrates the methodology adopted to define the requirements of the i-Treasures platform.
- Section 4 contains a state-of-the-art review in the field, which looks at the literature, resources and projects providing access to intangible cultural heritage and highlights the technology employed for ICH transmission and safeguarding.
- Section 5 presents the work done to define the knowledge domains of the use cases, describing the approaches adopted and the tools developed, as well as presenting the results of this analysis process.
- Section 6 illustrates the preliminary elaboration of the requirements at the level of the use cases.
- Section 7 describes the main results of our work, presenting the i-Treasures functional and non-functional requirements.
- Lastly, Section 8 discusses the main results obtained so far and identifies future areas of work.

2.2 Brief introduction to the i-Treasures project

As already mentioned, the i-Treasures project is about ICH (Intangible Cultural Heritage): it looks at those rare and valuable living expressions and traditions that countless groups and communities worldwide have inherited from their ancestors and still transmit to their descendants, in most cases orally or by imitation. The project will make extensive use of cutting-edge ICT and sensor technologies with the ultimate aim of developing “an open and extendable platform providing access to ICH resources, enabling knowledge exchange between researchers and contributing to the transmission of rare know-how from Living Human Treasures to apprentices” (Project Description of Work1). In addition, the project aims to propose new methods and to employ and create innovative tools able to support and enhance the passing down of rare know-how to new generations. Starting from ‘capturing’ the key aspects and features of the different ICHs, a process of data modelling will be carried out within the project, relying on advanced Semantic Multimedia Analysis techniques. The new data acquired on the ICHs will thus give rise to a knowledge base containing a wealth of information never available before; based on this wide new knowledge, it will then be possible to shape a variety of different educational paths, serving different scopes and specific educational needs, all aimed at contributing to the transmission of these peculiar artistic and cultural expressions. The i-Treasures educational platform is expected to take learners beyond the concept of “learning by imitation”: besides offering the opportunity to acquire a variety of new information on the ICHs in different formats (audio, video, narrative, etc.), it will allow learners to put themselves to the test, carrying out individual trials and receiving appropriate feedback and hints (in different formats, e.g. audio or video), so as to reach increased levels of competence in an easier, more direct and quicker way. Lastly, the platform also goes in the direction of safeguarding valuable patrimonies and sustaining the sense of identity of the local communities where the ICHs came to light, were practiced, cultivated and maintained, so as to become an integral part of their lives.

2.3 ICHs considered in the project: an overview

One decade ago, in 2003, UNESCO promulgated the Convention for the Safeguarding of the Intangible Cultural Heritage (ICH). This key document: a) defines the “intangible cultural heritage” as “the practices, representations, expressions, knowledge, skills – as well as the instruments, objects, artefacts and cultural spaces associated therewith – that communities, groups and, in some cases, individuals recognize as part of their cultural heritage”; b) highlights the urgent need for preserving/safeguarding traditional culture and traditional artistic expressions; c) identifies the various elements of the world intangible cultural heritage present in different territories by means of two lists:
- the Representative List of the Intangible Cultural Heritage of Humanity,
- and the List of Expressions in Need of Urgent Safeguarding, including those ICH that require “urgent measures to keep them alive”.
As already mentioned, within the wide panorama of existing ICHs, this project will examine in detail four use cases (areas of intangible cultural heritage), namely: 1) rare singing, 2) rare dancing, 3) craftsmanship and 4) contemporary music composition. Each use case has been further instantiated in different “sub-use cases”2. Table 2-1 contains the list of the sub-use cases tackled by the project and, for each of them, specifies: whether the sub-use case is included in one of the UNESCO lists of Intangible Cultural Heritage, the country of origin, link(s) providing an overview of the ICH itself and the i-Treasures partner in charge of it.

1 http://www.i-treasures.eu/filedepot?fid=4
2 In the i-Treasures DoW the term ‘use case’ denotes the 4 main domains that will be addressed in the project (i.e. rare traditional songs, rare dance interactions, traditional craftsmanship and contemporary music composition), while the term ‘scenario’ is used for the specific artistic expressions (e.g. the Byzantine song, the Sardinian ‘canto a tenore’, the Căluş dance, etc.). In this document we adopt the term ‘sub-use cases’ (instead of ‘scenarios’).


(Columns: Sub-use case | Listed by UNESCO in | Country | Main web reference/s | Partner responsible)

Use Case: Rare singing (partner responsible: CNRS)
- Byzantine music | not listed | Greece | http://www.ec-patr.net/en/ ; http://www.i-treasures.eu/content/byzantine-music | UOM
- Cantu in paghjella | List of Intangible Cultural Heritage in Need of Urgent Safeguarding | Corse-France | http://www.unesco.org/culture/ich/index.php?lg=en&pg=00011&USL=00315 | CNRS
- Canto a Tenore | Representative List of the Intangible Cultural Heritage of Humanity | Sardinia-Italy | http://www.unesco.org/culture/ich/index.php?lg=en&pg=00011&RL=00165 ; http://www.i-treasures.eu/content/canto-tenore | CNR
- Human Beat box | not listed | Worldwide | http://www.i-treasures.eu/content/human-beat-box | CNRS

Use Case: Rare dancing (partner responsible: UMONS)
- Căluş dance | Representative List of the Intangible Cultural Heritage of Humanity | Romania | http://www.unesco.org/culture/ich/index.php?lg=en&pg=00011&RL=00090 | CERTH
- Tsamiko Greek dance | not listed | Greece | http://www.greekdance.org/e-library/Tsamiko ; http://greekcommunity.org.nz/2012/12/greek-dance/ | CERTH
- Walloon traditional dance | not listed | Belgium | http://www.i-treasures.eu/content/walloon-traditional-dances ; http://www.dapo.be/ ; http://www.fgfw.be/ | UMONS
- Contemporary dance | not listed | not relevant for this sub-use case | http://www.i-treasures.eu/content/contemporary-dances ; http://www.blackfishacademy.com/dance.htm ; http://www.contemporary-dance.org/contemporary-dance-history.html | UMONS

Use Case: Craftsmanship (partner responsible: CERTH)
- The art of pottery | not listed | Greece, France, Turkey | http://atschool.eduweb.co.uk/sirrobhitch.suffolk/portland%20state%20university%20greek%20civilization%20home%20page%20v2/docs/8/glatt.htm ; http://www.vallauris-golfe-juan.fr/-A-village-of-ceramic-tradition-.html ; http://turkey.amethistle.com/2008/04/glorious-ceramics.html | CERTH, ARMINES, TT

Use Case: Contemporary music composition (partner responsible: UOM)
- Based on music patterns of Beethoven, Haydn or Mozart | not listed | not relevant for this sub-use case | http://www.i-treasures.eu/node/62 | UOM

Table 2-1 - List of the ICHs considered in the project

As one can see in the above table, four sub-use cases have been included for the Songs and four for the Dances; among them, three sub-use cases belong to ICHs listed by UNESCO (the Căluş dance, the Canto a Tenore and the Cantu in Paghjella). For the Craftsmanship use case, instead, three different pottery traditions have been chosen, representing the living tradition of three different countries. At the same time, the concept of ICH is not necessarily linked to past times, but may also refer to more recent living cultural expressions; for this reason, the i-Treasures project also deals with the new artistic form of Contemporary music composition and with Contemporary dance.

2.4 Focus on Work Package 2

Work Package 2 (WP2) is meant to analyse the various sub-use cases addressed in the project and design the i-Treasures platform and its functionalities, thus laying the foundations for the work to be done in subsequent WPs. Figure 2-1 provides an overview of the objectives, tasks and deliverables due under WP2; in doing so, it also highlights the objectives and tasks presently active and the way they are related (plain black arrows).


Figure 2-1 - Overview of WP2 objectives, tasks and deliverables

Figure 2-2 below depicts the overall WP2 roadmap. As the figure shows, Deliverable D2.1 is the result of: 1) the analysis and definition of use cases and sub-use cases, 2) the setting up of the users’ and experts’ groups for each sub-use case, 3) the state of the art of the technologies adopted/available so far in the field and 4) the consultation of the users’ and experts’ groups, with the final aim of eliciting the requirements, which represent the core of this deliverable. In addition, Deliverable 2.1 is the first step of the i-Treasures platform development process, and thus it is directly linked to Deliverable 2.2 (due at month 9), which will deal with the system requirements and specifications. A quick look at the roadmap in the figure also shows that the elicitation of the requirements is an ongoing process, subject to revisions at strategic moments in the project lifecycle: the first version of the requirements is thus presented here (D2.1), while a refined version will be produced at month 27 (D2.3), following the testing and evaluation of the first prototype of the i-Treasures platform; the final release of the requirements (D2.5) will be delivered at month 40 and will keep track of the results, observations and notes of the second release of the i-Treasures platform.


Figure 2-2 - WP2 roadmap


3. Overall methodology for the definition of the i-Treasures Requirements

Any software development process goes through the phase of requirements engineering, which is the process of discovering, analyzing, documenting and validating the requirements of the system to be developed (IEEE, 2004). Usually, defining the user and system requirements implies identifying all the stakeholders (users, customers, developers, etc.), taking into account all their needs and negotiating with them what the system will be able to offer (Wiegers, 1996). Analysts can employ several methods and techniques to elicit the requirements from the users/customers. As a matter of fact, this is often a collaborative and participatory process, envisaging a continuous and intensive dialogue among the stakeholders. Such a dialogue may be based on the development of ‘scenarios’ and/or ‘use cases’ (as happens in agile methods – Beck et al., 2001), the use of focus groups, workshops, interviews and questionnaires with the users/customers, more ethnographic approaches based on the direct observation of the users’ actions/needs, the study of the documentation of previous systems, etc. Defining the requirements may therefore be a very complex process, encompassing the use of more than one method or technique (Sommerville & Sawyer, 1997). The outcome of this complex process of elicitation is a list of requirements, stating what the system will do (rather than how it will do it) (IEEE, 1998). In the context of the i-Treasures project, the stakeholders include:
- experts of the various ICHs
- apprentices/students of the various ICHs
- basic users of the system (teachers, amateurs, academics, etc.)
- researchers (in various fields)
- system developers
- all the partners of the Consortium (who in some cases play one or more of the above mentioned roles).
In i-Treasures, in order to elicit the requirements, we have used a complex methodology encompassing different sources, each one providing inputs that have finally been elaborated into an overall list of Requirements (Figure 3-1).


Figure 3-1 - Overall methodology

In particular, first of all we analyzed what had previously been done in this field by conducting a state of the art review, which takes into consideration the existing literature, as well as past and/or ongoing projects in the field of ICH preservation and knowledge dissemination. As will be illustrated in Section 4, this led us to identify a set of promising technologies that could be adopted by the project for ICH transmission and documentation, as well as to understand the main gaps/needs in this field that could be addressed by i-Treasures and constitute the added value of the project. The results of the state of the art review thus provide general indications on what the system could/should do and can be seen as preliminary inputs to the i-Treasures requirements. Another crucial phase of the process was the knowledge domain definition. This is acknowledged as an essential step of the requirements definition process (Burge, n.d.), and in i-Treasures this phase was considered particularly vital, given that we are addressing de facto ten different knowledge domains (the ten sub-use cases of the project). In the field of domain knowledge acquisition several methods can be used, implying different levels of involvement of the domain experts: from indirect methods (i.e. document analysis) to direct interactions with the experts (through interviews, questionnaires, case studies, etc.) (Burge, n.d.). In the project a direct contact was preferred and, to this purpose, different experts’ and users’ groups were created and involved (see Section 5.2). As far as the adopted methods are concerned, the project conceived and developed three main tools to support the phase of knowledge domain definition, namely: a glossary, questionnaires and interviews. The first tool, i.e. the glossary, constitutes a sort of background to the process of knowledge domain definition and was considered by the Consortium essential to define a common vocabulary among all the stakeholders and within the Consortium itself, whose composition is highly inter-disciplinary. The glossary, though, should not be seen only as a tool internal to the project: it is also a legacy that the project will be able to leave to the broader ICH community, even after the project ends. Moreover, in order to define the main characteristics of each sub-use case, a set of questionnaires was delivered to experts of the various sub-use cases, so as to get a picture of what these expressions are, what their main characteristics are, what methods are usually adopted to transmit/teach/learn them, etc.


Then, a third tool was conceived and developed for some of the sub-use cases, i.e. an interview, this time delivered to a sub-set of the experts, whose main aim was to go deeper into the existing teaching/learning practices, so as to better understand what i-Treasures could offer to foster education in these fields. All in all, the glossary, questionnaires and interviews are the main sources of data and information used to define the various domains addressed by the project. All the inputs gathered through the above mentioned means were then elaborated at the level of the four use cases considered in the project, which eventually led to the production of a preliminary version of the i-Treasures requirements.
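As a purely illustrative aid (the names, fields and example entry below are assumptions made for this sketch, not the project's actual tooling or data), the following Python snippet shows one possible way the inputs gathered from the glossary, questionnaires and interviews could be consolidated into a single, traceable list of requirements of the kind presented in Section 7.

```python
# Minimal sketch of a traceable requirement record, assuming a simple
# Python-based consolidation step; all names and the example entry are
# hypothetical and only illustrate the idea of linking each requirement
# back to the sources (state of the art, questionnaires, interviews).
from dataclasses import dataclass, field

@dataclass
class Requirement:
    req_id: str                  # e.g. "FR-01" (functional) or "NFR-01"
    statement: str               # what the system shall do (not how)
    kind: str                    # "functional" or "non-functional"
    use_case: str                # e.g. "rare singing", "rare dancing"
    sources: list = field(default_factory=list)  # elicitation sources
    priority: str = "should"     # e.g. MoSCoW-style: must/should/could

# Hypothetical example of a consolidated requirement entry.
requirements = [
    Requirement(
        req_id="FR-01",
        statement="The platform shall provide video playback of expert performances.",
        kind="functional",
        use_case="rare dancing",
        sources=["state of the art review", "questionnaire"],
        priority="must",
    ),
]
```

Keeping the source(s) of each requirement explicit in this way would make it easier to trace every functional and non-functional requirement back to the review, questionnaire or interview from which it emerged.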

In the following sections, this document presents each of the phases (circles) represented in Figure 3-1. Thus, as already anticipated, the structure of the document reflects the three circles proposed in the figure: Section 4 contains the results of the state of the art review, Section 5 presents the work done to define the knowledge domains and Section 6 illustrates the elaboration of the requirements at the level of the use cases. Lastly, Section 7 describes the main results of our work, presenting the i-Treasures functional and non-functional requirements.


4. State of the art review

In the last decades the protection and promotion of cultural heritage (primarily in the form of monuments, historic sites, artefacts and, more recently, cultural expressions) has become a central topic of European and international cultural policy3. Since the end of World War II, UNESCO has been a key organization in defining cultural heritage and ensuring its protection through the adoption of a series of conventions4 and financial and administrative measures. Parallel to the work of UNESCO, governmental and non-governmental organizations, professional associations and academic institutions around Europe have been involved with documenting and providing access to different forms of cultural heritage (ranging from archaeological sites and natural parks to museum collections and folk traditions). In this process, a significant body of resources dealing with the documentation and promotion of cultural heritage through different technologies has been developed. There is little doubt that digital technologies have revolutionized scientific and public access to cultural heritage (Cameron and Kenderdine 2010, Ioannides et al. 2010). Following the adoption of the Convention for the Safeguarding of Intangible Heritage in 2003, the protection of cultural traditions has become prominent on an international level. One of the key arguments in this area is that humanity's intangible heritage is threatened by processes of globalization. Modern technologies and mass culture are often regarded as a threat to the survival of traditional expressions. According to the 2003 Convention, it falls upon national governments, cultural organizations and practicing communities to transmit these vulnerable cultural expressions to the next generations. Safeguarding activities vary according to local and national contexts. Interestingly, although modern technologies are often identified as a threat to traditional expressions, it is these very technological innovations that frequently play a key part in the preservation and dissemination of intangible heritage.

4.1 Methodology and Objectives

Drawing on the existing literature and body of research, this section of the deliverable provides an overview of current safeguarding programmes, with a particular focus on specific technological methods and how they contribute to the documentation and transmission of intangible heritage. What we argue is that new technologies can provide innovative approaches to the transmission and dissemination of intangible heritage by supporting human interaction and communication. More precisely, this section offers an overview of the literature, resources and projects providing access to intangible heritage. It aims to identify gaps and constraints of existing projects in the area. It is the result of a collaborative effort by members of the consortium and covers the state of the art in technology that could be employed in i-Treasures, in order to highlight how specific new technologies can contribute to the transmission and safeguarding of intangible heritage. In so doing, it contributes to the definition of user and system requirements. To this end, the following section looks at the broad scope of safeguarding activities supported by UNESCO, which mainly consist of national and international inventories and rely mostly on archival approaches. The document then examines projects run by museums, cultural organizations and grassroots initiatives, which are driven by

3 See www.unesco.org, www.coe.it/cultureheritage
4 Examples are: the 1954 Convention for the Protection of Cultural Heritage in the Event of Armed Conflict, the 1970 Convention on the Means of Prohibiting the Illicit Import, Export and Transfer of Ownership of Cultural Property, the 1972 World Heritage Convention and the 2003 Convention for the Safeguarding of Intangible Heritage.

community participation. Section 4.3 takes a closer look at specific technological methods (facial expression analysis and modeling, vocal tract sensing and modeling, body and gesture recognition, semantic multimedia analysis, 3D visualisation and text to song) that relate to the documentation and transmission of four use cases that are at the centre of i-Treasures: rare singing, music composition, rare dancing and craftsmanship. Section 4.4 explains how i-Treasures seeks to fill existing gaps and offer a novel approach to the safeguarding of intangible heritage by relating the results from the state of the art to the emerging requirements.

4.2 Safeguarding Intangible Heritage

4.2.1 The Approach and Projects of UNESCO

UNESCO is a major intergovernmental organization founded in the aftermath of World War II and is the educational, scientific and cultural body of the United Nations family. The organization is widely regarded as the guardian of world cultural heritage, adopting legal measures and raising awareness about humanity's heritage. The World Heritage Centre, set up to implement the 1972 World Heritage Convention, is a major institution within UNESCO concerned with the identification, conservation and promotion of monuments and sites of outstanding universal value5. Among UNESCO's projects concerned with documenting and providing access to cultural heritage is the Memory of the World, an online encyclopedic database of digitized archival material, manuscripts and rare publications6. In terms of intangible heritage, the key normative instrument of UNESCO is the 2003 Convention. This is a legally binding document that offers an intellectual and operational framework for the protection of traditional expressions by national governments and on an international level. It proposes a structured approach that highlights the urgent need to preserve traditional culture through the establishment of national and international inventories, awareness raising and community participation (UNESCO 2003). Over the course of the last decades, UNESCO has developed several projects which have been directly or indirectly related to the safeguarding of intangible heritage (Aikawa 2004). The most prominent measures have been the two Lists provided for by the 2003 Convention: the Representative List of the Intangible Cultural Heritage of Humanity and the List of Intangible Cultural Heritage in Need of Urgent Safeguarding7. These are international inventories of traditional expressions following the five domains of intangible heritage defined in the 2003 Convention. Both are accessible online and include photographs and audiovisual recordings of cultural expressions. Their primary function is that of an archival resource that raises awareness about the listed expressions and their communities. The List of Expressions in Need of Urgent Safeguarding also involves financial support for the adoption of cultural revival measures. The drawback of the lists is that they seem to serve primarily promotional objectives rather than activities that have a direct impact on local communities (Hafstein 2009). Moreover, the amount of documentation available online is relatively limited. Prior to these lists, UNESCO supported the programme for the Proclamation of Masterpieces of the Oral and Intangible Heritage of Humanity (Nas 2002, Alivizatou 2007). This was the first international project to raise major

5 http://whc.unesco.org
6 http://www.unesco.org/new/en/communication-and-information/flagship-project-activities/memory-of-the-world/homepage
7 http://www.unesco.org/culture/ich/index.php?pg=00001

Del_2_1_FINAL2.doc Page 19 of 175 D2.1 First Report on User Requirements Identification and Analysis i-Treasures ICT-600676 awareness at a governmental level and influence the adoption of the 2003 Convention. The database of selected masterpieces of intangible heritage is available online and includes photographic and audiovisual documentation. Although it helped UNESCO raise awareness towards the need to safeguard intangible heritage, the project has subsequently been criticized for the exclusive connotation of the term ‘masterpiece’ (Kirshenblatt-Gimblett 2004) and its relatively limited educational scope. Prior to the adoption of the 2003 Convention, UNESCO established several projects aimed at the safeguarding of intangible heritage. For example, the Red Book of Endangered Languages (subsequently known as Atlas of Endangered Languages) is a publication and online resource that provides basic information on more than two thousand languages8. It has taken the form of an online map and archival resource and provides an encyclopedic list of world languages ranging from vulnerable to extinct. However, the information available online is limited and there are limited learning possibilities available. The Traditional Music of the World is a project that includes a compilation of recordings of traditional music from around the world (Aikawa 2004). The recordings have been made by ethnomusicologists in situ and then copied on vinyl and CD format. Relevant photographs accompany the audio recordings. The project has made available these recordings to an international audience and has raised awareness about traditional music. However, it seems to act primarily as an archival resource and has limited educational application. Moreover, there is no online access to the recordings. The Living Human Treasures project9 was set up in the early 1990s following existing programmes set up in Japan and Korea (Yim 2003). The project supports the transmission of traditional skills to young generations through a system of nationally sponsored apprenticeships. Apprentices learn the skills involved in traditional arts and crafts by living and working closely with master craftspeople (so called Living Human Treasures, i.e. LHT). The LHT are recognized by the government and receive a salary in support of their work. The project has been applied in different national contexts (the ones with the most experience being Japan and Korea). Although its impact varies, it could be argued that knowledge transmission is restricted to the chosen apprentices rather than a broader audience. Also, occasionally the transmitted knowledge is treated as something fixed and monolithic. UNESCO has also played an active part in the establishment of national inventories of intangible heritage in several countries around the world. Some examples with information available on the Internet are the national inventories and/ or registers of Japan, Brazil, Portugal, Bulgaria and Venezuela. Different methodologies have been used for the creation of these inventories. In Bulgaria, catalogues were set up drawing on information collected through questionnaires distributed to local communities via cultural centres. In Japan and Brazil, inventories were drawn mostly though ethnographic research10. The national inventories have raised awareness about the importance of intangible heritage among local communities. However, the amount of documentation available seems to be relatively limited. 
The research on the different cultural expressions is not particularly rigorous and there seems to be relatively limited focus on education and transmission. An innovative methodology has been used for the documentation of intangible heritage in Scotland11. The UK Commission of UNESCO in collaboration with the Scottish Arts Council funds this

8 http://www.unesco.org/culture/languages-atlas
9 http://www.unesco.org/culture/ich/?pg=00061
10 http://www.unesco.org/culture/ich/index.php?pg=00080
11 www.ichscotlandwiki.org

project. It consists of an online archive of Scottish intangible heritage that is open to the public in the form of a wiki. It contains photographic and audiovisual documentation and uses participatory approaches. However, the educational application of the project is not clear. The above constitute some key projects concerning the documentation and dissemination of intangible heritage expressions by UNESCO. They provide an archival and encyclopedic approach to the documentation of intangible heritage.

4.2.2 Community-Focused Approaches to Safeguarding Intangible Heritage

It becomes clear from the above that UNESCO has adopted a primarily archival approach to the safeguarding of intangible heritage. This can be related to the fact that the 2003 Convention recognizes registers and inventories as a first step in safeguarding intangible heritage. The above projects have been run by the international organization in partnership with national governments. The second part of this review examines projects that are driven primarily by, or for the benefit of, local communities and rely on more participatory methods. These projects have not been initiated by national governments but rather by local museums, research institutes and local centres. The Oral Traditions Project of the Vanuatu Cultural Centre is a relevant case in point (for more details on the project see Bolton 2003 and Alivizatou 2012a). Vanuatu is a country in the Pacific comprising more than one hundred islands. Run under the auspices of the national museum and cultural centre, the project relies on a network of volunteers, called fieldworkers. These are representatives of different communities who each year conduct research on traditional customs and cultural expressions. The fieldworkers have been trained in ethnographic research methods and photographic and audiovisual documentation. The material collected during their research is kept in a specifically designated room of the national museum, with access limited to community representatives and museum curators. The project has been instrumental in raising awareness across the islands about the importance of traditional culture in the years following decolonization from the French and British Condominium government in 1980. Its primary function is to create a ‘memory-bank’ of traditional culture and languages, but the collected material is not only kept for posterity: it is also used in educational programmes for schools, the museum, the radio and community development. A key theme of the programme is the idea of ‘heritage for development’, translated into eco- and cultural tourism projects. The project has been running since the early 1970s and some of the issues raised relate to the limited budget, the engagement of fieldworkers and how best to protect traditional culture from commercialization (Huffman 1996). A further case relates to the topic of community-driven digital repatriation. This involves indigenous groups that collaborate with museums holding material sourced from their communities to create digital copies of objects or audiovisual recordings. For example, with financial and technological support from the National Museum of New Zealand Te Papa Tongarewa, a Maori tribe called Te Aitianga a Hauti prepared a digital resource for documenting their tangible and intangible heritage. This consists of a digital database containing photographic and audiovisual data related to collections in museums in New Zealand and overseas. The specific project was not only about cultural preservation but also about community empowerment and education. Due to various limitations relating to intellectual property rights, access to this resource is limited (Alivizatou 2012b). A similar project is the Sierra Leone Cultural Heritage project (Basu 2011).


Online learning resources constitute another area of intangible heritage preservation. For instance, many indigenous groups, in partnership with museums, have created online heritage resources with a pedagogical focus. One example is the collaboration between the Lakota Native American tribe and the Smithsonian Museum of Natural History to create an online resource for the interpretation of the Lakota calendars, known as winter counts. The resource presents stories related to the tradition of the winter counts, which are narrated by tribe members12. The specific project involves the digitization of relevant content, the use of audiovisual technology and the creation of a website. Through the project, community members are empowered to share their stories and memories. Moreover, MelOdysseia13, which was developed by the Music Library of Greece “Lilian Voudouris”, is an online interactive tool for teaching the history of music in Greek from the medieval period until today. It is designed to preserve heritage by introducing classical music to secondary school pupils, and to those who love music and are in the early stages of their acquaintance with it. This is achieved through the study of major composers, works and forms that are considered to be milestones of Western European art music. IS-HELLEANA, the Intelligent System for the HELLEnic Audiovisual National Aggregator14, is a Greek project in progress which will provide effective access to audiovisual content and act as a spark for the further development and promotion of Greek digital audiovisual content, not only in Greece but in Europe as well. The EU has also supported projects relating directly and indirectly to the transmission of intangible heritage. For example, the I-maestro project15 aims to build a multimedia environment for technology-enhanced music education. This employs self-learning environments, gestural interfaces and augmented instruments promoting new methods for music training. The question that is raised, however (and is also relevant for i-Treasures), is whether technology risks replacing human interaction as a process of transmission. More directly related to intangible heritage and local development is the EU-funded project entitled Cultural Capital Counts16. The project aims to enable a positive development of six regions in Central Europe by focusing on intangible heritage resources like living traditions, knowledge and talents. By developing a strategy based on intangible cultural resources, the project aims to enable sustainable regional development, in order to increase the regions’ attractiveness for enterprises and their competitiveness. The project appears to develop around a website that contains a list of various traditions and expressions of intangible heritage found in the six regions. It takes forward strategies for local, sustainable development and collaborative research. But the focus seems to be more on the commercialization of intangible heritage than on how these practices and traditions can be transmitted to the next generations. Another EU-funded project is Europeana17 (see also 4.3.5.2.2), which is the best-known portal for exploring the digital resources of Europe's museums, libraries, archives and audio-visual collections, thus offering direct access to millions of books,

12 www.wintercounts.si.edu
13 http://melodisia.mmb.org.gr/
14 http://www.helleana.gr/?q=el
15 www.i-maestro.org
16 www.culturalcapitalcounts.eu
17 http://www.europeana.eu/

manuscripts, paintings, films, museum objects and archival records that have been digitised throughout Europe18. Additionally, the Mediterranean Voices Project19, which was funded by Euromed Heritage, aimed at creating a database of audiovisual information about cultural expressions and traditions from twelve cities in the Mediterranean. The project involved collaborative research and capacity building through a close partnership between universities and local communities. It addressed issues of memory, space and oral history and employed ethnographic research methods to investigate personal histories of the different cities. It is not clear how the database has been subsequently used, but the project has survived its ‘funding life’ and has taken a new form, exploring issues of heritage and identity beyond the Mediterranean (Koussis et al. 2011). What is interesting to note about the above projects is that they have been adopted at a grass-roots level and as such have a more direct and significant impact on heritage preservation among local communities. They are based on the active involvement of local communities and are aimed at serving local needs. As such, they are focused not only on documenting and archiving intangible heritage but, more importantly, on processes of transmission and dissemination among practitioners and the new generation. The i-Treasures project aims to build on current developments in the field and apply participatory methodologies in public engagement with local communities. The idea is to empower local actors to use new technologies in the transmission and dissemination of intangible heritage expressions for the benefit of sustainable community development. To this end, the project aims to establish strong connections with local actors and community representatives and include local stakeholders in major phases of the development and use of the online platform. One of the central arguments of the project is that although modern technologies cannot replace human interaction in the transmission of intangible heritage, they can contribute significantly to processes of dissemination, especially among younger generations. For this reason, a particular focus of the project is the development of territorial schools that will act as local hubs for the transmission of local intangible heritage expressions (see DoW, WP8).

4.3 Modern Technologies in the Transmission and Documentation of Intangible Heritage

The third – and largest – part of the state of the art review looks more closely at the technological methods that could be employed in i-Treasures and how these can contribute to the safeguarding and transmission of intangible heritage. The rationale of the project is that new technology can be used not only for the digitization and archiving of cultural expressions but also in terms of cultural transmission, education and community development. Although technology cannot replace human interaction, it can nevertheless support cultural transmission in new and innovative ways.

18 There are a number of projects which provide digitised material to Europeana, such as CARARE (http://www.carare.eu/) – aggregates content for the archaeology and architectural heritage, ATHENA (http://www.athenaeurope.org/) – aggregates museum content and promotes standards for museum digitisation and metadata, Europeana 1914–1918 (http://www.europeana1914-1918.eu/en) – collects material relating to World War One, Europeana Libraries (http://www.europeana-libraries.eu/) – adds over 5 million digital objects to Europeana from 19 of Europe's leading research and university libraries, EUscreen (http://www.euscreen.eu/) – discovers and contributes television heritage material to Europeana, Musical Instrument Museums Online / MIMO (http://www.mimo-international.com/) – provides information on musical instruments held in public collections, etc.
19 www.medvoices.org


To this end, the document provides a detailed analysis of the different technological methods/modules that have a potential use in the preservation of intangible heritage, key projects in the field, their development and subsequent use. This section is thus divided into seven subsections according to the themes of:
- Facial expression analysis and modeling,
- Vocal tract sensing and modeling,
- Body and gesture recognition,
- Encephalography analysis,
- Semantic multimedia analysis,
- 3D Visualization,
- Text to Song technology.
Each subsection introduces the key developments in the specific field, examines major projects and discusses the potential use and contribution of the technology in the preservation of intangible heritage.

4.3.1 Facial Expression Analysis

4.3.1.1 Introduction

Facial expressions are one of the most cogent, naturally pre-eminent means for human beings to communicate emotions and affective states, to clarify and stress what is said, to signal comprehension, disagreement and intentions, and, in brief, to regulate interactions with the environment and other persons in the vicinity (Zafeiriou and Yin 2012). Facial expressions are generated by facial muscle contractions, which result in temporary deformations of facial geometry and texture. Human faces are estimated to be capable of more than ten thousand different expressions. This versatility makes non-verbal expressions of the face extremely efficient and honest, unless deliberately manipulated. Many of these expressions are directly associated with emotions and affective states such as happiness, sadness, anger, fear, surprise, disgust, shame, anguish and interest, which are universally recognized (Ekman 2003). Facial expression analysis has been an active research topic for behavioral scientists since the work of Charles Darwin in 1872 (Darwin 1904), in which he established the general principles of expression and the means of expression in both humans and animals. Darwin also grouped various kinds of expressions into similar categories and cataloged the facial deformations that occur for each category of expressions. Another important milestone in the study of facial expressions and human emotions is the work done by psychologist Paul Ekman and his colleagues since the 1970s (Ekman 1989, Ekman and Friesen 1978, Sherer and Ekman 1982, Ekman 2003). Their work has had a large influence on the development of automatic facial expression recognition systems (Bettadapura 2012). In the last quarter of the 20th century, with the advances in the fields of robotics, computer graphics and computer vision, animators and computer scientists also started showing interest in the study of facial expressions (Bettadapura 2012). The first step towards the automatic recognition of facial expressions was taken in 1978 by Suwa et al. (Suwa et al. 1978), who presented an early automatic facial expression analysis system based on motion tracking of twenty identified landmarks in an image sequence. Since then, much progress has been achieved in building computer systems that can help us understand and use this natural form of human communication.


Generally speaking, facial analysis techniques can be classified into two categories: a) those focusing on the interpretation of specific facial patterns and the classification of facial expressions into a predefined number of discrete categories, of which the most commonly used are the six basic emotions (anger, disgust, fear, happiness, sadness and surprise), widely assumed to be universal, and b) those providing descriptions of facial deformations at an abstract level in an objective manner and deferring the decision-making process to other high-level algorithms or human experts. Techniques of the second category are usually based on the Facial Action Coding System (FACS) proposed by Ekman and Friesen (1978). According to this system, all perceptible facial deformations can be described as a combination of fundamental actions of individual muscles or groups of muscles called Action Units (AUs); 44 AUs are defined, e.g. Outer Brow Raiser, Upper Lip Raiser, Lip Tightener, Jaw Drop, etc. (see Figure 4-1). This approach is suitable for describing spontaneous facial behaviors and more complex emotional states than those implied by the six basic emotions (see Figure 4-2). Moreover, AUs are independent of interpretation and thus can be used as input for the extraction of high-level features or semantics (e.g. emotional state).
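To make the AU-based description level more concrete, the short Python sketch below (an illustration only, not part of the i-Treasures system) shows how AU labels produced by some FACS-based detector could be mapped onto one of the six basic emotions; the AU combinations used here are simplified versions of commonly cited prototypical patterns (e.g. AU6 + AU12 for happiness) and should be read as assumptions.

```python
# Illustrative sketch: mapping detected FACS Action Units (AUs) to one of
# the six basic emotions. The prototype combinations below are simplified
# assumptions, not an authoritative FACS/EMFACS table.

EMOTION_PROTOTYPES = {
    "happiness": {6, 12},
    "sadness":   {1, 4, 15},
    "surprise":  {1, 2, 5, 26},
    "fear":      {1, 2, 4, 5, 20, 26},
    "anger":     {4, 5, 7, 23},
    "disgust":   {9, 15, 16},
}

def classify_emotion(detected_aus: set[int]) -> tuple[str, float]:
    """Return the emotion whose AU prototype best overlaps the detected AUs,
    together with a simple Jaccard-style score in [0, 1]."""
    best_label, best_score = "neutral", 0.0
    for label, prototype in EMOTION_PROTOTYPES.items():
        score = (len(detected_aus & prototype) / len(detected_aus | prototype)
                 if detected_aus else 0.0)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

if __name__ == "__main__":
    # AU6 (Cheek Raiser) + AU12 (Lip Corner Puller) detected by an AU detector.
    print(classify_emotion({6, 12}))   # -> ('happiness', 1.0)
```

In practice, a system of the first category would typically learn such mappings from annotated data rather than rely on fixed prototypes, which is one reason the AU-level description is kept separate from the interpretation step.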

Figure 4-1 Basic Facial Action Units of the coding system proposed by Ekman.

Figure 4-2 The six basic expressions of emotion. From left to right: a) happy, b) sad, c) angry, d) surprise, e) fear and f) disgust.
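To make the relationship between AUs and the six basic emotions more concrete, the following Python sketch maps a hypothetical set of detected AUs to commonly cited prototype AU combinations. The prototypes are indicative only (exact combinations vary across the FACS literature), and the classifier shown is purely illustrative, not a component proposed in this deliverable.

# Illustrative sketch: mapping detected FACS Action Units to the six basic
# emotions using commonly cited AU prototypes. Exact AU combinations vary
# across the literature; these sets are assumptions made for the example.
EMOTION_PROTOTYPES = {
    "happiness": {6, 12},             # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},          # inner brow raiser + brow lowerer + lip corner depressor
    "surprise":  {1, 2, 5, 26},       # brow raisers + upper lid raiser + jaw drop
    "fear":      {1, 2, 4, 5, 20, 26},
    "anger":     {4, 5, 7, 23},       # brow lowerer + lid tightener + lip tightener
    "disgust":   {9, 15, 16},         # nose wrinkler + lip corner/lower lip actions
}

def classify_from_aus(detected_aus):
    """Return the emotion whose prototype AU set best overlaps the detected AUs."""
    detected = set(detected_aus)
    scores = {emotion: len(detected & prototype) / len(prototype)
              for emotion, prototype in EMOTION_PROTOTYPES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

if __name__ == "__main__":
    # Hypothetical AU detector output: AU6 and AU12 active.
    print(classify_from_aus([6, 12]))    # ('happiness', 1.0)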


In the recent past, research has primarily focused on the use of 2D facial information, mainly due to the prevalence of data in the relevant modalities (i.e. images and videos). Comprehensive surveys in this area include those by Pantic and Rothkrantz (2000) and Zeng et al. (2007). Facial features used for expression recognition may be roughly classified into geometric, e.g. distances between facial points (Kotsia and Pitas 2007), appearance-based, such as Gabor filter responses (Bartlett et al. 2003), or holistic, such as optical flow fields (Black and Yacoob 1997). Classification methods can be roughly divided into static and dynamic ones. Static classifiers use feature vectors related to a single frame to perform classification; in the case of image sequences, this frame corresponds to the peak of the depicted expression. Probabilistic as well as rule-based techniques are popular (Kotsia and Pitas 2007, Pantic and Rothkrantz 2004). Temporal classifiers, on the other hand, try to capture the temporal pattern in the sequence of feature vectors over subsequent frames (Cohen et al. 2003, Pantic and Patras 2006).

While 2D facial expression recognition systems have achieved good performance, they are also susceptible to the problems of illumination and pose variations inherent to all 2D methods, which affect the perceived geometry and appearance of facial features. Moreover, the subtle skin deformations that characterize facial expressions (e.g. furrows, wrinkles, folds) may be difficult to capture using a 2D camera. Three-dimensional (3D) data, on the other hand, are invariant to such variations, are information-rich by nature and may capture the fine-scale facial dynamics with increased accuracy. Although the advantages of using 3D facial images are self-evident, until recently very few works had examined 3D facial expression recognition (comprehensive surveys may be found in Fang et al. (2011) and Sandbach et al. (2012)). This was mainly due to the unavailability of low-cost 3D sensors: many of the proposed methods involve highly specialized sensors and/or controlled studio environments to capture the facial geometry (Beeler et al. 2010, Bradley et al. 2010). However, recent developments in gaming technology, such as the Nintendo Wii and the Microsoft Kinect system20, offer low-cost 3D sensing solutions, which have already been used for facial expression analysis and facial animation (Weise et al. 2011).

The majority of 3D facial expression methods use static 3D images and rely on the extraction of 3D geometric/curvature features or the use of deformable models. Few works have dealt with the dynamics of facial expressions, i.e. with the incorporation of temporal information (Tsalakanidou and Malassiotis 2010, Weise et al. 2011, Sandbach et al. 2012). The work of Tsalakanidou and Malassiotis (2010) is the first fully automatic 3D facial expression recognition system presented in the literature and was developed by CERTH-ITI. Expression dynamics and 3D facial geometry combined offer a wealth of information that can be harnessed for the analysis of facial expressions. The development of such systems opens up new avenues in facial expression recognition, as 3D facial geometries ensure that all motion in the face is captured, unlike 2D data, and analysis of full expression dynamics allows cues to be detected that are unavailable in static data (Sandbach et al. 2012).

20 Kinect for Windows. From http://www.microsoft.com/en-us/kinectforwindows/
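As a concrete illustration of the "static classifier on geometric features" approach mentioned above, the sketch below computes pairwise distances between 2D facial landmarks at the peak frame and feeds them to a support vector machine. The landmarks are synthetic and scikit-learn is assumed to be available; this is a minimal sketch of the general idea, not a description of any of the cited systems.

# Minimal sketch of a static, geometry-based expression classifier: pairwise
# distances between facial landmarks at the expression's peak frame are fed
# to a conventional classifier. Landmarks here are synthetic; a real system
# would obtain them from a face/landmark detector.
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def geometric_features(landmarks):
    """landmarks: (N, 2) array of 2D facial points -> vector of pairwise distances."""
    return np.array([np.linalg.norm(landmarks[i] - landmarks[j])
                     for i, j in combinations(range(len(landmarks)), 2)])

# Synthetic training data: two "expressions", 20 samples each, 10 landmarks.
base = rng.random((10, 2))                               # neutral landmark layout
X, y = [], []
for label, offset in [(0, 0.0), (1, 0.3)]:               # label 1 = mouth landmarks displaced
    for _ in range(20):
        sample = base + rng.normal(scale=0.02, size=base.shape)
        sample[-3:, 1] += offset                         # crude "open mouth" deformation
        X.append(geometric_features(sample))
        y.append(label)

clf = SVC(kernel="rbf").fit(np.array(X), np.array(y))
print(clf.predict([X[0], X[-1]]))                        # expected: [0 1]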


4.3.1.2 Key projects and applications

During the past decade, facial expression analysis has received increased interest due to its potential applications in fields as diverse as human-machine interaction, robotics (Bettadapura 2012; Zeng et al. 2007), behavioural science, medicine, psychiatry, communication, education, entertainment and security. As expression recognition systems become more robust and closer to real time, many other innovative applications and uses will be identified.

Facial expression analysis has been investigated in many European research projects over the past decade. The main motivation behind this research is usually the desire to make human-machine interaction more natural for the human. Next-generation computing systems and robots are expected to interact with users in a way that emulates face-to-face encounters. Face-to-face communication relies significantly on the implicit and non-verbal signals expressed through body and head posture, hand gestures and facial expressions for interpreting the spoken message in a non-ambiguous way. As already explained above, facial expressions in particular are considered to be one of the most powerful and immediate means for humans to communicate their emotions, intentions and opinions to each other (Tsalakanidou and Malassiotis, 2010). An indicative list of projects involving research on facial expression recognition is provided below (more details about each project can be found in Appendix 10.1).

• HUMAINE (Human-Machine Interaction Network on Emotion)
• AGENT-DYSL (Accommodative intelliGENT educational environments for DYSlexic learners)
• FEELIX GROWING (FEEL, Interact, eXpress: a Global appRoach to develOpment With INterdisciplinary Grounding)
• CALLAS (Conveying Affectiveness in Leading-edge Living Adaptive Systems)
• PASION (Psychologically Augmented Social Interaction Over Networks)
• SEMAINE (Sustained Emotionally coloured Machine-human Interaction using Nonverbal Expression)
• TANGO (Emotional interaction grounded in realistic context)
• IMMEMO (IMMersion 3D basée sur l'interaction ÉMOtionnelle – 3D immersion based on emotional interaction)
• SIEMPRE (Social Interaction and Entrainment using Music PeRformance Experimentation)
• TARDIS (Training young Adult's Regulation of emotions and Development of social Interaction Skills)
• ASC-Inclusion (Integrated Internet-based Environment for Social Inclusion of Children with Autism Spectrum Conditions)

Most of the aforementioned projects focus on improving human-machine interaction through recognition of the emotional state of users (HUMAINE, FEELIX GROWING, PASION, TANGO, SEMAINE). Other projects focus on training users to practice and improve their communication skills (IMMEMO, TARDIS, ASC-Inclusion, AGENT-DYSL). Only two of these projects, i.e. CALLAS and SIEMPRE, are related to cultural expression. The goal of SIEMPRE is to explore interpersonal interaction between a) musicians and b) musicians and listeners, and to investigate the entrainment, emotional contagion and co-creation by which both performers and audience contribute to shaping a music event. Expressive movement, audio and physiological features are extracted and used towards this aim. However, in this case, researchers do not investigate facial expressions but head movements. The goal of CALLAS is the development of a multimodal framework for the real-time interpretation of emotional aspects for new media applications for Art and Entertainment. The target is to make users stimulating sources of human communication rather than passive spectators of artistic performances. User emotion analysis is based on different modalities such as speech recognition, gesture recognition, body motion tracking, facial expression detection, sound capture, haptic tracking, natural language processing, etc. Analysis of facial expressions is based on 2D images acquired by webcams and is used in interactive opera and interactive music showcases. In the first case, the story plot is designed depending on the facial expressions of multiple people, each one standing for a different opera character. In the second case, facial expressions along with other modalities are used to estimate the user's emotions and guide the music in a music kiosk.

It is thus clear that the investigation of facial expressions and related emotional states as an integral part of artistic and cultural expression could be addressed for the first time in the context of the i-Treasures project. Although the CALLAS project also involves emotion recognition and artistic expression, it emphasizes the spectator's experience and interaction with the artistic product. The i-Treasures project, on the other hand, aims to capture, analyze and preserve the unique characteristics of the artistic performance itself and thus focuses on the expressions and emotional state of the performer. Also, in the majority of the above-mentioned projects the facial expression analysis component uses 2D facial images captured by webcams. However, 2D-based facial expression recognition is highly affected by pose and illumination variations and usually requires the user to be recorded in relatively controlled settings. Moreover, 2D sensors may not be able to capture subtle facial movements and deformations. The use of 3D sensors may alleviate these problems. 3D sensors are used only in the PASION and TANGO projects. In PASION, a prototype 3D sensor developed for the needs of the project was used; at that time, low-cost off-the-shelf 3D sensors like Microsoft Kinect were not yet available and 3D images were usually captured using high-cost devices like laser scanners. In the TANGO project, on the other hand, affordable depth-sensing devices (like MS Kinect) are used. However, in this case, facial analysis aims at tracking facial motion in recorded image sequences and using the tracking results to control the animation of virtual characters.

A key aim of facial analysis and modeling within i-Treasures could be to leverage the technological advances in 3D sensing technologies and create a low-cost facial expression analysis system allowing a) the extraction of fine-scale facial dynamics and b) the control of the facial expressions of a digital avatar in real time with a sufficient level of realism, mainly for educational purposes. Innovations beyond the state of the art may thus include:

• Increased detection accuracy of fine-scale facial dynamics based on low-cost sensors.
Low-cost 3D sensing systems like Microsoft Kinect focus mainly on robust motion tracking for compelling real-time interaction, while geometric accuracy and appearance are of secondary importance. Special effort will be devoted to developing novel facial feature tracking techniques that combine 3D geometry with 2D texture to recognize subtle facial muscle movements and offer increased robustness against noisy data. The combined use of 2D/3D facial information will allow operation in relatively uncontrolled conditions (e.g. under pose and illumination changes, occlusions).


• Novel methods for facial feature detection. Special methods will be developed for the accurate tracking of important facial features such as the mouth and eyebrows. Mouth tracking is particularly intriguing: mouth movement contains a significant amount of information for facial expression and emotion analysis; at the same time, the mouth is the region where most facial deformations occur, causing significant tracking errors (especially when using 2D data). Mouth tracking is extremely important for the rare singing use case.
• Novel measurements for the identification of Facial Action Units. The use of 3D facial information can provide extremely useful measurements for the identification of facial action units. For example, curvature descriptors can be used for detecting deformations of cheek muscles (blow, puff, suck) or lips (press, tighten, stretch), as illustrated by the sketch following this list.
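The following Python sketch is a minimal illustration of such a curvature descriptor: it computes a mean-curvature map from a dense depth image treated as a Monge patch, using synthetic data. It is intended only to make the idea concrete and does not reflect the specific descriptors to be developed in the project.

# Sketch: mean curvature of a depth image z(x, y), treated as a Monge patch.
# Curvature maps like this are one way to build curvature descriptors for
# detecting cheek/lip deformations. The depth data here is synthetic.
import numpy as np

def mean_curvature(z):
    zy, zx = np.gradient(z)           # first derivatives (axis 0 = y, axis 1 = x)
    zxy, zxx = np.gradient(zx)        # d/dy(zx), d/dx(zx)
    zyy, _ = np.gradient(zy)          # d/dy(zy)
    num = (1 + zx**2) * zyy - 2 * zx * zy * zxy + (1 + zy**2) * zxx
    den = 2 * (1 + zx**2 + zy**2) ** 1.5
    return num / den

# Synthetic "puffed cheek": a Gaussian bump on a flat depth map.
xs, ys = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64))
depth = 0.2 * np.exp(-(xs**2 + ys**2) / 0.1)
H = mean_curvature(depth)
print("curvature at bump centre: %.3f, at corner: %.3f" % (H[32, 32], H[0, 0]))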

4.3.1.3 Possible Use in Intangible Heritage Preservation and Transmission

Facial expressions and body gestures are among the most powerful and natural means for humans to communicate emotions, affective states and intentions to each other. This natural means of communication becomes even more important in the case of artistic expressions like singing or acting, where the face and body are the main tools used by the performer to communicate the emotional aspects of their role. A great singing performance is not only the result of a great voice but also reflects the emotional involvement of the performer, who expresses what he or she feels through voice and body.

In the context of i-Treasures, facial expression analysis could be employed, for example, in the following use cases: i) rare singing and ii) contemporary music composition. In both cases, the performer's facial expressions could be analyzed in terms of facial actions, which can be used either as a means of extracting facial muscle movements useful for describing the performer's technique (in the first case) or as a means of decoding the performer's emotional state (in both cases).

For the rare singing use case, preservation and transmission of this expression involves more than analyzing voice and music patterns and decoding voice articulation. It should also involve analyzing and preserving the expressive and emotional aspects revealed by the performer's face, since the performance is more than correct voice articulation: it is also emotion revealed through voice and face. This is also very important for educational purposes. New singers will not only be taught how to use their vocal tract to sing different types of songs but can also learn how to give a complete performance. Besides the emotional aspect, facial expressions can also be used to reveal details of the performer's technique, e.g. how wide the mouth is opened when singing.

In the case of contemporary music composition, the aim is to combine both communication aspects (emotions) and control actions (gestures) to explore more sophisticated body/music interactions. The main objective is to develop a novel multimodal Human-Machine Interface for music composition, in which natural gestures performed in a real-world environment are mapped to music/voice segments, taking into account the emotional status of the performer. In this case, facial expressions and EEG signals can be used to decode the emotional state of the performer and shape (together with hand and body gestures) the output of the intangible musical instrument.


4.3.2 Vocal Tract Sensing and Modeling

4.3.2.1 Introduction

Since the dawn of human communication, man has been curious about the speech production mechanism and has sought to model and exploit it in a variety of useful applications. The first vocal tract models were physical models, constructed of tubes, valves and resonators, which sought to duplicate the intricate process by which speech is produced in the human vocal tract via the articulators: the larynx (vocal folds), tongue, lips, teeth, jaw and nasal cavity. With the advent of powerful digital computers, it became possible to produce 2D and 3D vocal tract models of surprising realism in software, often referred to as "talking heads" (Figures 4-3 and 4-4) (Engwall 1999, Fels et al., Stone 1991), which, when coupled with an appropriate acoustic simulation, allow speech to be synthesized in a way totally analogous to actual human speech production. Although some researchers have claimed that such systems perform more poorly than codebook-style vocoder synthesizers, articulatory synthesis remains an active area of research, as many researchers believe it will ultimately lead to the most effective means of communication between man and machine.

Figure 4-3 3D Vocal Tract Model with the Talking Head (Engwall 1999)


Figure 4-4 (a) Jaw and laryngeal model implemented using ArtiSynth, (b) Jaw model connected to a model of the tongue (Fels et al. 1991)

To model the vocal tract effectively, it is necessary to study and understand its physical characteristics. Early studies on cadavers were the first sources of such information, followed by various types of endoscopic investigations, many of which are still in use today; in the 20th century, however, the non-invasive nature of real-time medical imaging techniques led to significant breakthroughs in vocal tract sensing. Very high-resolution real-time imaging of the entire vocal tract is possible using cineradiography (X-rays) and magnetic resonance imaging (MRI) (Figure 4-5). As the use of X-rays on living subjects is regulated by strict radiation exposure limits, their use as a research tool is rather limited, although a number of studies have been carried out (Badin et al. 2002, Stone 1990). Concerning MRI, although real-time studies of the vocal tract have been carried out (Badin et al. 2002, Engwall 2004, Stone and Lundberg 1996), the procedure requires the subject to recline within the confines of a very constrained volume containing a strong magnetic field. Time on an MRI machine is also very expensive, and, finally, the repetition rate of MRI, at best several Hertz, is insufficient for a delicate, real-time physico-acoustic study of speech production.

Figure 4-5 Examples of tongue contours superposed on Magnetic Resonance images (Badin et al. 2002)

4.3.2.2 Key Projects and applications

In the 1980s, sonographic techniques began to become popular for vocal tract sensing studies (Stone 1991, Shawker et al. 1993). Indeed, the underside of the chin provides a very convenient window for the study of tongue movement using ultrasound (US) waves in the range of 1 to 10 MHz. In this case, the upper surface of the tongue, as an air-tissue boundary, gives a very strong reflection of the ultrasonic energy, providing a clear tongue contour which can be studied and tracked in real time, at frame rates of up to 100 images per second, which is perfectly adequate for speech production research. Although US (Figure 4-6) provides poorer resolution than MRI or cineradiography and, as a coherent wave source, is perturbed by speckle noise, there are no radiation dose limits for standard B-mode ultrasound, which, along with the high frame rates available, makes it an excellent tool for investigating the speech production process on multiple subjects. As it is non-invasive, portable and requires no external magnetic field, US can also be readily complemented with other lightweight sensors, such as: an ElectroGlottoGraph (EGG), to measure and record vocal fold contact movement during speech; External photoglottography (ePGG), to measure and record vocal fold movements during speech; a nasally mounted accelerometer, for detecting the presence of nasality in speech sounds; a video camera, to follow lip movement; and a standard microphone (Figure 4-7) (Denby et al. 2006). Standard data acquisition techniques allow the synchronized acquisition of all of these sensors simultaneously (Figure 4-8) (Hueber et al. 2008, Hueber et al. 2007).

Figure 4-6 Ultrasound image showing tongue contour (Denby and Stone 2004)

Figure 4-7 Lightweight helmet, with ultrasound probe, infra-red camera, and lapel microphone (Denby et al. 2006)

Figure 4-8 Hardware component of the acquisition system (Hueber et al. 2008)
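For illustration only, the sketch below extracts a tongue contour from a (synthetic) B-mode ultrasound frame by picking, in each image column, the strongest upward intensity transition, i.e. the bright air-tissue boundary described above. Real systems such as those cited here use considerably more robust contour-tracking methods; this is just a minimal sketch of the principle.

# Naive sketch of tongue-contour extraction from a B-mode ultrasound frame:
# for each image column, pick the row with the strongest top-down intensity
# gradient (the bright air-tissue boundary at the tongue surface). The frame
# here is synthetic.
import numpy as np

def extract_contour(frame):
    """frame: (rows, cols) grayscale ultrasound image -> one row index per column."""
    grad = np.diff(frame.astype(float), axis=0)        # vertical intensity gradient
    return np.argmax(grad, axis=0) + 1                 # strongest dark-to-bright transition

# Synthetic frame: dark background with one bright curved reflection.
rows, cols = 128, 96
frame = np.zeros((rows, cols))
true_contour = (60 + 15 * np.sin(np.linspace(0, np.pi, cols))).astype(int)
for c in range(cols):
    frame[true_contour[c]:true_contour[c] + 3, c] = 1.0   # 3-pixel-thick bright band

estimated = extract_contour(frame)
print("max contour error (pixels):", np.abs(estimated - true_contour).max())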

4.3.2.3 Possible Use in Intangible Heritage Preservation

The routine availability of inexpensive, powerful computing resources is today beginning to make unexpected inroads into a variety of new fields. Thus, in the i-Treasures project, some of the above-mentioned advanced sensing and modeling techniques may be employed to help preserve disappearing intangible cultural heritage. In particular, in the Singing use case, it is clear that sensing techniques developed for speech production research could be directly applicable, or applicable with minor modifications to take into account increased jaw movement, higher acoustic amplitude, and the like. In addition, the combination of instruments might enhance the level of knowledge. For example, as will be illustrated in the following (see Section 5.4.1.3.1), in the Sardinian Canto a Tenore some singers use a traditional laryngeal phonation, while others use a method that pitch-doubles the fundamental frequency (Henrich et al. 2006, Henrich et al. 2009). It is not clear whether this is done by vibrating both the vocal folds and the ventricular folds, as is found in diplophonia, or by amplifying an overtone, as is done in Tuva throat singing. The combination of ultrasound and EGG could allow the recording of the tongue, anterior pharyngeal wall and vocal folds during these methods. Between them, all the structures and behaviors of interest could be recorded, allowing visual and auditory documentation of the technique for purposes of archiving and future teaching. As in Cho et al. (2012), it is possible to assess changes in true vocal fold length with ultrasonography, and to observe vowel tongue shapes in untrained and trained singers (Troup et al. 2006). The following table provides a synthesis of the sensors that will be considered in i-Treasures and presents (some of the) possible future explorations.

Within i-Treasures, two real-time vocal tract models, or avatars, could be constructed for each singing type: one demonstrating the articulatory technique of the expert (to serve as feedback guiding apprentices), and the other operated by the student using his or her own articulators. In this way, the student may continue practicing and comparing styles until the desired technique has been mastered. Creating the avatars, as well as adapting the vocal tract models to each individual student, will require the acquisition of a significant amount of multi-sensor data, so that the necessary parameters can be adjusted.

Sensors and Acquisition: acoustic signal; External photoglottography (ePGG); EGG; tongue ultrasound; lip camera; piezoelectric accelerometer; video and fibroscope signal.
Possible Future Explorations: explore pharyngeal or labial embellishment (soloists); explore the nature of tiling; explore the position of the tongue and lips; explore vocal quality/tessitura of the voice alone and ornamentations; compare voice alone and accompanied (simulation); study the link between body gestures and laryngeal gestures.
Table 4-1 Summary of sensors for the rare singing use cases and possible future explorations

4.3.3 Motion capture - Body and Gesture Recognition

4.3.3.1 Introduction

4.3.3.1.1 Motion capture technologies (full body)

The study of motion is central to various scientific fields and applications. In the last decade, 3D motion capture systems have undergone rapid evolution and substantial improvements, which have attracted the attention of many application fields, such as medicine, sports, entertainment, etc. Motion capture (or "mocap") systems can be divided into two main categories: marker-based and marker-less technologies. Even if some very important improvements have been made in recent years, no perfect system exists, each one having its own advantages and drawbacks.


Marker-based technologies

Marker-based systems can be divided into two main categories: optical systems and inertial systems (accelerometers, gyroscopes, etc.).

a) Optical motion capture systems

Optical motion capture systems are based on a set of cameras around the capture scene and on markers, reflecting or emitting light, placed on the body of the performer. Optical motion capture systems such as Vicon Peak21, Phasespace22 or OptiTrack23 (commercial systems) have been used in applications such as gait analysis, rehabilitation, 3D animation and special effects in cinema (see Figure 4-9). Optical motion capture has been widely used to study the motion of performers. In (Rasamimanana and Bevilacqua 2008) and (Demoucron et al. 2008), for instance, the Vicon system was used to capture the motion of violin players; the aim of this research was the modelling of music performances by understanding different bowing strategies in violin playing. In another case, researchers tried to adapt this method to piano players (Palmer and Pfordresher 2000), while UMONS participated in research (Ofli et al. 2008) where the motion of a dancer was captured with an optical system and modelled in synchronization with the music.

Figure 4-9 The motion capture systems ViconPeak (left) and OptiTrack (right)

b) Other sensors

Various types of sensors (Coduys et al. 2004, Bevilacqua et al. 2010) or commercial interfaces, such as the Wii joystick (Grunberg, n.d.), the MotionPod24 (a set of individual sensors), or the IGS-190 inertial motion capture suit from Animazoo25, can easily provide real-time access to motion information. IRCAM (France) has developed various types of time-of-flight wireless sensor interfaces for continuous and real-time gesture-following applied in dance and music performances (Bevilacqua 2007). Numediart (UMONS) has also been using the IGS-190 inertial motion capture suit to analyse and model variations in walk motions (Tilmanne et al. 2012).

21 http://www.vicon.com 22 http://www.phasespace.com 23 http://www.naturalpoint.com/optitrack 24 http://www.movea.com/technology 25 http://www.animazoo.com


Figure 4-10 The motion capture system IGS-190 from Animazoo

Markerless technologies

Markerless technologies do not require subjects to wear specific equipment for tracking and are usually based on computer vision approaches. Even if the accuracy and sensitivity of the tracking results do not yet meet the needs of the industry for the usual use of motion capture in animation, marker-less systems are widely seen as the future of the field. Nonetheless, marker-less systems still suffer from a lack of precision and cannot compete with marker-based technologies, which now reach sub-millimetre precision in real time. On the other hand, marker-based systems are often very expensive and need a more complicated setup.

Markerless motion capture technologies based on real-time depth sensing have taken a huge step forward with the release of Microsoft Kinect and its accompanying skeleton tracking software (Kinect for Windows), as well as other affordable depth cameras (ASUS Xtion26, PMD nano27). These sensors are relatively cheap and offer a good balance between usability and cost compared to optical and inertial motion capture systems. Kinect produces a depth-map stream at 30 frames per second with subsequent real-time human skeleton tracking. Estimation of the positions of 20 predefined joints that constitute the skeleton of a person is provided by software SDKs (Microsoft Kinect SDK, OpenNI28), together with the rotational data of bones. Subsequent algorithmic processing can then be applied in order to detect the actions of the tracked person. The estimated 3D joint positions are noisy and may have significant errors when there are occlusions, which poses an additional challenge to the action detection problem. Multi-Kinect setups (calibrated with PCL29) with subsequent skeleton fusion techniques have been employed to combat the occlusion problems (Caon 2011).

In conclusion, we can say that no perfect motion capture system exists. All systems have their advantages and drawbacks, and must be carefully chosen according to the use case scenarios in which they are to be used. A compromise must be found between motion capture precision, the need for burdensome sensors, and other external constraints like the motion capture area, the lighting environment, the portability of the system, etc.

26 http://www.asus.com/Multimedia/Xtion_PRO/ 27 http://www.pmdtec.com 28 http://www.openni.org 29 http://pointclouds.org
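As a simple illustration of the post-processing that such noisy 20-joint skeleton streams typically require, the sketch below applies exponential smoothing to per-frame joint positions. The data are synthetic and the filter is generic; it is not tied to any particular SDK nor to the processing actually planned in the project.

# Sketch: simple exponential smoothing of a noisy 20-joint skeleton stream
# (e.g. 30 fps depth-sensor output). Joint positions are synthetic; a real
# pipeline would read them from a skeleton-tracking SDK.
import numpy as np

NUM_JOINTS = 20

class SkeletonSmoother:
    def __init__(self, alpha=0.4):
        self.alpha = alpha            # 0 < alpha <= 1; lower = smoother but more lag
        self.state = None

    def update(self, joints):
        """joints: (NUM_JOINTS, 3) array of 3D positions for one frame."""
        joints = np.asarray(joints, dtype=float)
        if self.state is None:
            self.state = joints
        else:
            self.state = self.alpha * joints + (1 - self.alpha) * self.state
        return self.state

rng = np.random.default_rng(1)
smoother = SkeletonSmoother()
true_pose = rng.random((NUM_JOINTS, 3))
for _ in range(30):                                    # one second of frames at 30 fps
    noisy = true_pose + rng.normal(scale=0.02, size=true_pose.shape)
    smoothed = smoother.update(noisy)
print("mean residual error:", np.linalg.norm(smoothed - true_pose, axis=1).mean())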


4.3.3.1.2 Hand and finger motion recognition

Hand motion recognition, and especially finger motion recognition, is very different from the usual motion capture approaches, which are generally designed for full-body motion capture. Although special gloves for capturing finger motion are commercially available, the above motion capture methods are usually not suitable for finger gesture recognition. In (Burns and Wanderley 2006), recognition of the musical effect of the guitarist's finger motions on discrete time events is proposed, using static finger gesture recognition based on a specific Computer Vision web platform. The approach does not take into consideration the stochastic nature of the gestures, and this method cannot be applied to human-robot collaboration. Recently, a new method for dynamic finger gesture recognition in human-computer interaction was introduced by Manitsaris (2011). This method, based on a low-cost webcam, recognizes each finger gesture individually and is non-obtrusive, since it does not put any limit on finger motions.

When considering gesture analysis, and more specifically fingering analysis, in music interaction, there are four main approaches: (a) pre-processing using score analysis based on an acyclic graph, which does not take into consideration all the factors influencing the choice of a specific fingering, such as physical and biomechanical constraints (Grunberg, n.d.); (b) real-time analysis using MIDI technology, which does not apply to classical musical instruments (Verner 1995); (c) post-processing using sound analysis, which works only when one note is played at a time (Traube 2004); and (d) computer vision methods for guitarist fingering retrieval (Burns and Wanderley 2006). The existing Computer Vision (CV) methods are low-cost, but they presuppose painted fingers with a fully extended palm in order to identify the guitarist's fingers in the image, and specific recognition platforms, such as EyesWeb. Another notable example of fingering recognition is the system of Yoshinari Takegawa, who used colour markers on the fingertips in order to develop a real-time Fingering Detection System for Piano Performance (Takegawa et al. 2006). This system is restricted to electronic keyboards, such as synthesizers, and cannot be applied to classical music instruments nor to finger gesture recognition and mapping to sounds in space. Moreover, MacRitchie used the Vicon system and Vicon marker modelling in order to visualize musical structures; this method requires the music score in advance (MacRitchie et al. 2009). None of the above methods can be extended towards dynamic gesture recognition taking into consideration the stochastic nature of gestures; they all recognize the musical effect of finger motions on discrete time events.

The study of the above categories of gesture analysis in music interaction leads to the conclusions that: (a) the gesture measurement approaches are based on rather expensive commercial systems, are suitable for offline analysis rather than live performance, and cannot be applied to finger gestures; (b) gesture recognition via WSBN or CV is low-cost and has many important paradigms of live performance applications; on the other hand, such sensors cannot be applied to finger gestures performed on the piano keyboard or on woodwind musical instruments; (c) fingerings can be retrieved with low-cost technologies, but the information acquired is related to discrete time events, without taking into consideration the stochastic nature of the gestures; and (d) new paradigms for the recognition of musicians' gestures performed on surfaces or keyboards, with a semi-extended palm, can only be based on CV.

4.3.3.1.3 Motion Capture Formats

The success of motion capture has led to a number of production houses that provide motion capture services. However, most of those companies have developed their own proprietary file formats. The most common formats for human motion capture are BVA/BVH and H-Anim. BioVision's BVH format succeeded the BVA data format with the noticeable addition of a hierarchical data structure representing the bones of the skeleton. A BVH file consists of two parts: the first section details the hierarchy and initial pose of the skeleton, and the second section describes the channel data for each frame, which is essentially the motion of the skeleton hierarchy (Meredith & Maddock, 2001). H-Anim (Humanoid Animation) is an ISO standard, part of VRML/X3D. H-Anim specifies an abstract representation for modeling three-dimensional human figures. It describes a standard way of representing humanoids, which allows human figures created with modeling tools from one vendor to be animated using motion capture data and animation tools from another vendor. Some other well-known formats are ASF/AMC (Acclaim), C3D (National Institutes of Health) and CSM (3D Studio Max, Character Studio).
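To make the two-part BVH structure described above concrete, the following Python sketch parses a toy BVH file: it counts the channels declared in the HIERARCHY section and reads the per-frame channel values from the MOTION section. The embedded file and the parser are illustrative only and do not constitute a complete BVH implementation.

# Minimal sketch of the two-part BVH structure: the HIERARCHY section declares
# joints and their channels, the MOTION section stores one line of channel
# values per frame. This toy parser only counts channels and reads frames.
BVH_TEXT = """HIERARCHY
ROOT Hips
{
  OFFSET 0.0 0.0 0.0
  CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation
  JOINT Spine
  {
    OFFSET 0.0 10.0 0.0
    CHANNELS 3 Zrotation Xrotation Yrotation
    End Site
    {
      OFFSET 0.0 10.0 0.0
    }
  }
}
MOTION
Frames: 2
Frame Time: 0.033333
0.0 90.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 90.5 0.0 0.0 0.0 0.0 5.0 0.0 0.0
"""

def parse_bvh(text):
    hierarchy, motion = text.split("MOTION")
    n_channels = sum(int(line.split()[1])
                     for line in hierarchy.splitlines()
                     if line.strip().startswith("CHANNELS"))
    lines = motion.strip().splitlines()
    n_frames = int(lines[0].split(":")[1])
    frame_time = float(lines[1].split(":")[1])
    frames = [list(map(float, l.split())) for l in lines[2:2 + n_frames]]
    assert all(len(f) == n_channels for f in frames)
    return n_channels, frame_time, frames

channels, dt, frames = parse_bvh(BVH_TEXT)
print(channels, "channels,", len(frames), "frames at", round(1 / dt), "fps")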

4.3.3.1.4 Action detection

Despite the research efforts of the past decade and many encouraging advances, accurate recognition of human actions remains a challenging task. Existing approaches to human action and gesture recognition can be coarsely grouped into two classes. The first uses 2D image sequences or 3D depth maps / silhouettes, which form a continuous evolution of body pose in time. Action descriptors, which capture both spatial and temporal characteristics, are extracted from those sequences, and conventional classifiers can be used for recognition. The methods of the other class extract features from each silhouette and model the dynamics of the action explicitly. Bag of Words30 (BoW) is often employed as an intermediate representation, with subsequent use of statistical models such as hidden Markov models31 (HMM), graphical models32 (GM) and conditional random fields33 (CRF) (Wang et al. 2012). Another more recent approach is to use the skeletal data acquired from the depth maps (Shotton et al. 2011). The subsequent use of skeletal data for action detection can itself be divided into two categories. The methods of the first category are based on 3D joint feature trajectories (Waithayanon and Aporntewan 2011); the features are either joint positions, rotational data, or some transformation of the above. These methods are mainly based on various Dynamic Time Warping (DTW) variants, like multi-dimensional dynamic time warping (MD-DTW) (Ten Holt 2007), and recognition is based on the alignment of the movement trajectories with the 'oracle' move which is being detected. Another approach is to extract features from the whole skeleton (histograms) and to use statistical models, as in the case of silhouette-based methods (Xia et al. 2012).
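The sketch below shows the core of the trajectory-alignment idea: a basic DTW distance between a recorded joint-feature sequence and an 'oracle' template, computed on synthetic trajectories. It is a textbook DTW implementation given for illustration, not the specific MD-DTW variants cited above.

# Sketch: basic dynamic time warping (DTW) between a recorded joint trajectory
# and an 'oracle' template, in the spirit of the trajectory-based action
# detection methods mentioned above. Trajectories here are synthetic.
import numpy as np

def dtw_distance(a, b):
    """a, b: (T, D) feature sequences (e.g. stacked joint coordinates per frame)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 1, 50)
oracle = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
performed = np.stack([np.sin(2 * np.pi * t**1.2), np.cos(2 * np.pi * t**1.2)], axis=1)
other = np.stack([t, t], axis=1)

print("same move (time-warped):", round(dtw_distance(performed, oracle), 2))
print("different move:         ", round(dtw_distance(other, oracle), 2))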

4.3.3.2 Key Projects and applications & Possible Use in Intangible Heritage Transmission and Preservation

As already mentioned, motion capture has developed rapidly in the last few years and has been used extensively in various fields: medical applications (Mena et al. 1981, Duvinage et al. 2012), entertainment applications (Pejsa and Pandzic 2010), artistic performances34, industrial applications and ergonomics, sports, etc.

30 http://en.wikipedia.org/wiki/Bag-of-words_model 31 https://en.wikipedia.org/wiki/Hidden_Markov_model 32 http://en.wikipedia.org/wiki/Graphical_model 33 http://en.wikipedia.org/wiki/Conditional_random_field 34 Deakin Motion Lab 2013 - Dance technology workshops (http://www.deakin.edu.au/motionlab/educational- services.php?p=Dance-technology-workshops&pid=23 )


The applications of motion capture are too numerous to be listed here, as an extensive review of motion capture is beyond the scope of the present state of the art. However, across all the application fields listed above, the motion capture research field can be addressed with regard to:

• Motion capture system design: motion capture technologies, developing new approaches for motion capture, or improving current motion capture tools.
• Motion capture for motion analysis: the use of existing motion capture systems for understanding motion, recognizing gestures, extracting information from motion capture sequences, analysing similarities and differences between motions, characterizing the motion and recognizing specific information (identity, style, activity, etc.) from the motion capture sequence, etc.
• Motion capture for animation: the use of motion capture, performed either in real time or offline, to animate virtual characters using motions recorded from human subjects.

4.3.3.2.1 Motion capture technologies for dance applications

As the interdisciplinary artist Marc Boucher says in (Boucher 2011), "Motion-capture is the most objective form of dance notation insofar as it does not rely on subjective appreciation and verbal descriptions of individuals but rather on predetermined mathematical means of specifying spatial coordinates along x, y and z axes at given moments for each marker. These data can be interpreted (inscribed, "read," and "performed") cybernetically (human-machine communication) while previous dance notation methods are based on symbolic representations, written and read by humans alone." However, as discussed above, all motion capture solutions have advantages and drawbacks, and even though motion capture is the most informative tool for recording dance, issues like the obtrusiveness of markers, the need to wear specific costumes and motion recording precision are different subjects that require further investigation and appropriate solutions. Furthermore, motion capture is not yet widely known, and its cost and complexity have also prevented this technology from reaching most artists and dancers. Wide adoption of these technologies needs adapted and usable tools and convincing system demonstrations.

Although motion capture technologies are most often designed and developed for generic application purposes, we have identified several studies where new sensors were designed or adapted to be used in the specific use case of dance motion capture. The SENSEMBLE project (Aylward and Paradiso 2006) designed a system of compact, wireless sensor modules worn at the wrists or ankles of dancers and meant to capture expressive motions in dance ensembles. The collected data enabled the authors to study whether the dancers of the ensemble were moving together, whether some were leading or lagging, or responding to one another with complementary movements. However, these sensors are meant to be worn at the wrists and ankles of a dancer, not on every body segment, and thus do not constitute a true motion capture system, since the whole body is not captured and the dance motion cannot be reconstructed from the recorded information: the sensors capture some information about the motion, but not the 3D motion itself. Saltate! (Drobny et al. 2009) is a system of wireless force sensors mounted under the dancers' feet which is used to detect synchronisation mistakes and to emphasize the beats in the music when mistakes are detected, in order to help the dancer stay in synchronisation with the music. Once again, the sensors record some information about the dance moves, and more especially about the feet's interaction with the ground, but the whole-body motion is not captured at all.


Other approaches consist of capturing the dancer's motion through motion capture in order to control the soundtrack through gesture-to-music mapping. This is, for instance, the approach followed by (Bevilacqua et al. 2001), (Bevilacqua et al. 2001a) and (Dobrian and Bevilacqua 2003), whose goal is mainly to explore possible relationships between gesture and music using the Vicon 8 optical motion capture system.

4.3.3.2.2 Dance motion analysis

Dance is an important form of human motion and expression. Detection, classification and evaluation of dance gestures and performances are research fields in which existing commercial products have often been employed (Raptis et al. 2011). Experiences such as Harmonix's Dance Central video game series35, where a player repeats the motion posed by an animated character, are becoming commonplace. Research is being conducted on the automatic evaluation of dance performance against the performance of a professional, within 3D virtual environments or virtual classes for dance learning (Alexiadis et al. 2011, Essid et al. 2012). The VR-Theater project allows choreographers to enter the desired choreography moves with a user-friendly interface, or even to record the movements of a specific performer using motion capture techniques36.

Numerous research studies have addressed the issue of synthesizing new dance motion sequences. They often base their synthesis model on existing dance motion capture databases (Alankus et al. 2005). Again, their aim is not to preserve the cultural heritage of the dance content. However, these studies have developed interesting approaches and tools, which can be used to analyse dance motions and the synchronized music track. For instance, Alankus et al. (2005) have developed a dance move detection algorithm based on the curvature of the limbs' paths, while Brand and Hertzmann (2000) have developed an unsupervised dance-modelling approach based on Hidden Markov Models. Although it was not designed specifically for dance motions, Balasubramanian et al. (2012) developed a metric for quantifying movement smoothness that could be very interesting for characterizing our collected moves.

Laban movement analysis (LMA)37 is a method developed originally by Rudolf Laban, which aims at building a language capable of precisely describing and documenting all varieties of human movement. LMA describes movements through six main characteristics of the motion: body, effort, shape, space, relationship and phrasing. Even though this method has its drawbacks and requires long training, it is one of the very few attempts at building a vocabulary or dictionary of motions that has been adopted quite widely. Bouchard et al. (2007) use LMA to extract movement qualities, which are used to automatically segment motion capture data of any kind; they hence take concepts initially developed for dance and apply them to general motions. Kahol et al. (2004) implement automated gesture segmentation dedicated to dance sequences. Dance motion capture has also been attracting great interest recently in the performing arts for its use in interactive dance performances, such as the work of James et al. (2006).

35 http://pointclouds.org 36 http://avrlab.iti.gr/HTML/Projects/current/VRTHEATER.htm 37 http://laban-eurolab.org/index.php?option=com_content&view=article&id=45&Itemid=22&lang=en
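As a small illustration of the kind of kinematic descriptor used by the curvature-based dance move detection cited above, the sketch below computes the discrete curvature of a 3D limb trajectory using kappa = |v x a| / |v|^3. The hand path is synthetic and the computation is generic; it is not a re-implementation of any of the cited algorithms.

# Sketch: discrete curvature of a 3D limb trajectory, kappa = |v x a| / |v|^3,
# the kind of per-frame quantity used in curvature-based dance move detection.
# The hand path below is synthetic.
import numpy as np

def path_curvature(points, dt=1.0 / 30):
    """points: (T, 3) positions sampled every dt seconds -> (T,) curvature values."""
    v = np.gradient(points, dt, axis=0)                # velocity
    a = np.gradient(v, dt, axis=0)                     # acceleration
    cross = np.cross(v, a)
    speed = np.linalg.norm(v, axis=1)
    return np.linalg.norm(cross, axis=1) / np.maximum(speed**3, 1e-9)

t = np.linspace(0, 2, 60)
hand_path = np.stack([np.cos(np.pi * t), np.sin(np.pi * t), 0.1 * t], axis=1)  # helix-like arc
kappa = path_curvature(hand_path)
print("mean curvature of the move:", round(kappa.mean(), 3))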


4.3.3.2.3 Intangible heritage preservation and transmission

Very few attempts at using body and gesture recognition for intangible heritage preservation can be found in the literature. To our knowledge, past attempts at preserving the intangible cultural heritage of traditional dances mainly consisted of informal interviews with the people practising these dances; the results of these interviews were then summarized in books such as (Malempré 2010). According to Calvert et al. (2005), dance has probably been the slowest art form to adopt technology, partly because useful tools have been slow to develop, given the limited commercial opportunities offered by dance applications. In their article they describe applications that animate and visualize dance, plan choreography, edit and animate notation, and enhance performance, but they do not cover intangible performance preservation. However, they interestingly underline a recurring issue in all these applications, which is the need for a unique, unambiguous way to represent human movement, and more particularly dance. Shen et al. (2012) introduce the concept of using motion capture technology for the protection of national dances in China; however, their report lacks basic details and information. In Brown et al. (2005), the creation of a motion capture database of 183 Jamaican dancers is reported. Their study aimed at evaluating whether dance reveals something about the phenotypic or genotypic quality of the dancers, and showed that there are strong positive associations between symmetry (one measure of quality in evolutionary studies) and dancing ability. However, the aim of this research was not to preserve the dance, but rather to study it, here at a very fundamental level. In the contemporary dance case, the DANCERS! project38 (Tardieu et al. 2010) aimed at collecting a database of dancers. The recording setup consisted of a formatted space, videos recorded from the front and the top of the scene, and metadata describing the dancer. No motion capture was performed, so no precise motion information is available; since the scene was not captured in 3D, the only possible views are those originally recorded on video. Some research projects have shown that dance-training systems based on motion capture technologies can successfully guide students to improve their dance skills (Chan et al. 2011) and have evaluated different kinds of augmented feedback modalities (tactile, video, sound) for learning basic dance choreographies.

4.3.4 Encephalogram Analysis

4.3.4.1 Introduction

Brainwaves are a form of repetitive neural activity of the central nervous system. Human brainwaves were first measured in 1924 by Hans Berger, who termed these electrical measurements the electroencephalogram (EEG), which literally means 'brain electricity writing'. Today, EEG has become one of the most useful tools in the diagnosis of epilepsy and other neurological disorders. The fact that a machine can read signals from the brain has sparked the imagination of scientists, artists and researchers. Consequently, the EEG has made its way into a myriad of applications such as brain-computer interfaces (BCI), tools that permit users to communicate with an interface using only their brain signals. Following that vein of creativity, researcher-musicians have envisioned the use of EEG signals for music composition and performance.

38 http://www.bud-hybrid.org/Public/index.php


4.3.4.1.1 EEG and Emotion Recognition

Emotion Recognition (ER) is one of the first and most important issues that affective computing (AC) brings forward, and it plays a dominant role in the effort to endow computers, and machines in general, with the ability to interact with humans by expressing cues that demonstrate an emotional-intelligence-related attitude. Successful ER enables machines to recognize the affective state of the user and to collect emotional data for processing, in order to proceed toward the ultimate goal of an emotion-based Human-Machine Interface: the emotion-like response. Toward effective ER, a large variety of methods and devices have been implemented, mostly concerning ER from the face (Cohen et al. 2000, Bourel et al. 2002, Lien et al. 1998), speech (Schuller et al. 2005, Busso et al. 2004) and signals from the autonomic nervous system (ANS), i.e., heart rate and galvanic skin response (GSR) (Picard et al. 2001, Nasoz et al. 2003, Lisetti and Nasoz 2004).

A relatively new field in the ER area is EEG-based ER (EEG-ER), which overcomes some of the fundamental reliability issues that arise with ER from face, voice, or ANS-related signals. For instance, a facial expression recognition approach would be useless for people who are unable to express emotions via the face even if they really feel them, such as individuals on the autism spectrum (McIntosh et al. 2006), or in situations of human social masking, for example when smiling though feeling angry. Moreover, voice and ANS signals are vulnerable to "noise" related to activity that does not derive from emotional experience; for instance, GSR signals are highly influenced by inspiration, which may be caused by physical rather than emotional activity. On the other hand, signals from the Central Nervous System (CNS), such as the EEG, the Magnetoencephalogram (MEG), Positron Emission Tomography (PET), or functional Magnetic Resonance Imaging (fMRI), are not influenced by the aforementioned factors, as they capture the expression of emotional experience at its origin. Toward such a more reliable ER procedure, EEG appears to be the least intrusive of these modalities and the one with the best time resolution compared to the other three (MEG, PET and fMRI). Motivated by the latter, a number of EEG-ER research efforts have been proposed in the literature.

4.3.4.2 Key Projects and applications

4.3.4.2.1 EEG-based Music Synthesis and Performance

In 1965, the composer Alvin Lucier composed the first musical piece using EEG, "Music for Solo Performer" (Lucier 1976). This was a piece for percussion instruments made to resonate by the performer's EEG waves. David Rosenboom began systematic research into the potential of the EEG to generate artworks, including music (Rosenboom, 1990). He developed EEG-based musical interfaces associated with a number of compositional and performance environments, exploring the hypothesis that it might be possible to detect certain aspects of our musical experience in the EEG signal (Rosenboom, 1990a). Another attempt in the same direction is the "Other Ear" project (Dribus, 2004), which was realized through the use of an additive synthesis engine by mapping EEG data recorded from a subject listening to music. The goal of this scheme was to present the data accurately in regard to its temporality and other features, while simultaneously rendering it as an aesthetically pleasing musical composition. The contribution of other physiological signals in addition to EEG was examined by Brouse et al. (2005). In their work, apart from EEG analysis, they employed electromyogram (EMG), electrocardiogram (ECG) and electro-oculogram (EOG) analysis to control sound synthesis algorithms in order to build a biologically driven musical instrument. In the same direction, Miranda and Bruce (2005) introduced a brain-computer-music-interface piano and an inter-harmonium for EEG-based music performance systems. In 2010, by modding a geeky toy, i.e. "Mindflex", and connecting it to a vintage synthesizer, the indie rock singer/guitarist Robert Schneider created a trippy instrument called "Teletron" that lets him make music with his mind (Thill 2010). Schneider characteristically claimed that "The curve of the pitch is basically the same for both synthesizers, but the left-brain synth is more logical and dry, while the right-brain synth is more dreamy and surreal". The work of Chew and Caspary (2011) presents an integration of a Brain-Computer Interface (BCI) with a music step sequencer composition program. Previous BCIs that utilize EEG data to form music provide users with little control over the final composition or do not provide enough feedback. Their interface allows a user to create and modify a melody in real time and provides continuous aural and visual feedback, thus affording users a controllable means to achieve creative expression. Lu et al. (2012) proposed mapping techniques for EEG-based music creation: the period of an EEG waveform was mapped to the duration of a note, the average power change of the EEG was mapped logarithmically to music intensity according to Fechner's law, and the fMRI blood oxygenation level represented the intensity of music. Preliminary results of this approach were demonstrated in older works by the same authors: in (Wu et al. 2009), parameters of EEG signals were translated to parameters of music, and in (Wu et al. 2010) a scale-free mapping from the amplitude of EEG to music pitch according to the power law was achieved.

Another novel approach to EEG exploitation for music performance is MoodMixer (Leslie and Mullen, 2011). MoodMixer is an interactive installation in which participants collaboratively navigate a two-dimensional music space by manipulating their cognitive state and conveying this state via wearable EEG technology. The participants can choose to actively manipulate or passively convey their cognitive state depending on their desired approach and experience level. A four-channel electronic music mixture continuously conveys the participants' expressed cognitive states, while a coloured visualization of their locations on a two-dimensional projection of cognitive state attributes aids their navigation through the space. MoodMixer is a collaborative experience that incorporates aspects of both passive and active EEG sonification and performance art, while its aesthetic design is placed within the context of existing EEG-based music and art. More recently, Folgieri and Zichella (2012) used audio and visual stimuli to train a subject to reproduce a single note with brainwaves, whereas Masaki Batoh (Watercutter 2012, Margasak 2012) introduced the "Brain Pulse Music" project through a novel apparatus, the Brain Pulse Music Machine (BPMM). The BPMM captures brain waves from the parietal and frontal lobes and converts them into a wave pulse that is the output sound. On 13 January 2013, Chicago Symphony Orchestra cellist Katinka Kleijn played the work "Intelligence in the Human-Machine", a duet for cello and brain waves written by Daniel Dehaan and Ryan Ingebritsen, wearing a portable EEG device (Margasak 2013).
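For illustration only, the sketch below maps simple EEG waveform features to note parameters in the spirit of the mappings described above: half-cycle period to note duration, logarithm of power to loudness (a Fechner-style compression), and amplitude to pitch. The signal is synthetic and the mapping is an assumption made for the example; it is not the algorithm of any of the cited works.

# Illustrative sketch (not the cited authors' exact method): mapping simple
# EEG waveform features to note parameters -- period -> duration, log power
# -> loudness, amplitude -> pitch. The EEG segment is synthetic.
import numpy as np

def eeg_to_notes(eeg, fs=128, pitch_range=(48, 84), velocity_range=(30, 110)):
    """Split the signal at zero crossings; each half-cycle becomes one note."""
    crossings = np.where(np.diff(np.signbit(eeg).astype(np.int8)) != 0)[0]
    notes = []
    for start, end in zip(crossings[:-1], crossings[1:]):
        segment = eeg[start:end]
        duration = (end - start) / fs                                    # period -> duration (s)
        power = np.mean(segment**2) + 1e-12
        velocity = int(np.interp(np.log10(power), [-2, 0], velocity_range))   # log power -> loudness
        pitch = int(np.interp(np.abs(segment).max(), [0, 1], pitch_range))    # amplitude -> pitch
        notes.append({"pitch": pitch, "velocity": velocity, "duration": duration})
    return notes

rng = np.random.default_rng(2)
t = np.arange(0, 2, 1 / 128)
eeg = 0.5 * np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)   # alpha-like signal
print(eeg_to_notes(eeg)[:3])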
Example of an existing tool: the Emotiv EEG Acquisition Device

Based on recent developments in neuro-technology, Emotiv offers a personal interface for human-computer interaction. Emotiv is a high-resolution, multi-channel, wireless neuroheadset that provides an electroencephalogram (EEG) system for research, enabling a broad range of applications including neurotherapy, biofeedback and brain-computer interfaces. Emotiv uses a set of 14 sensors plus 2 references to tune into the electric signals produced by the brain and detect the user's thoughts, feelings and expressions in real time. The 14 EEG channel names, based on the International 10-20 locations, are: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4. Apart from the 14 EEG sensors plus 2 references, Emotiv includes a gyroscope that generates positional information for cursor and camera control. It connects wirelessly to PCs running Windows, Linux or Mac OS X, while its lithium battery provides 12 hours of continuous use.


4.3.4.2.2 Differentiation of emotions from brain activity

The degree to which emotions are differentiated by unique patterns of physiological signals has long been, and to some extent still is, a matter of debate as old as the study of emotion itself. More particularly, there are two major positions on the subject. The first, supported by cognitive scientists and based on Cannon's theory (Cannon, 1927), claims that different emotions are associated with the same physiological patterns and that it is not possible to differentiate between them. The alternative position, which has become the most prominent, is based on the theoretical writings of Darwin (Darwin 1955) and James (James, 1890) and holds that different emotions are accompanied by discrete patterns of physiological activity. Furthermore, Ekman (1984) has reported that these patterns appear both in the ANS (e.g., heart rate, GSR, etc.) and the CNS (e.g., brain signals) and can differentiate the six primary emotions of happiness, sadness, anger, fear, disgust and surprise. Specifically, Ekman et al. (1983) have evidenced the existence of unique patterns of ANS activity that differentiate among the negative emotions of fear, anger, disgust and sadness. Besides the research on the ability of ANS activity to differentiate emotions, a great volume of research has also been conducted on the same ability of CNS activity. In particular, the relevant psychophysiology literature has revealed the most prominent expression of emotion in brain signals, i.e., the asymmetry between the left and right brain hemispheres. Davidson et al. (1979) developed a model that relates this asymmetric behavior to emotions, with the latter analyzed in two main dimensions, i.e., arousal and valence.

Figure 4-11 The 2D emotion model by valence and arousal (left) and the topography of the 10-20 system (right), with the Fp1, Fp2, F3 and F4 sites strongly related to emotional activity.

Valence stands for one's judgment of a situation as positive or negative, whereas arousal spans from calmness to excitement, expressing the degree of one's excitation (see Figure 4-11, left). According to that model, emotions are: 1) organized around approach-withdrawal tendencies and 2) differentially lateralized in the frontal region of the brain. The left frontal area is involved in the experience of positive emotions, such as joy or happiness (the experience of positive affect facilitates and maintains approach behaviors), whereas the right frontal region is involved in the experience of negative emotions, such as fear or disgust (the experience of negative affect facilitates and maintains withdrawal behaviors). Furthermore, Davidson et al. (1990) tried to differentiate the emotions of happiness and disgust with EEG signals captured from the left and right frontal, central, anterior temporal and parietal regions (the F3, F4, C3, C4, T3, T4, P3, P4 positions according to the 10-20 system (Jasper 1958); Figure 4-11, right). The results revealed a more right-sided activation, as far as the power of the alpha (8-12 Hz) band of the EEG signal is concerned, for the disgust condition in both the frontal and anterior temporal regions. Thus, the results strengthened the applicability of the aforementioned model and confirmed the evidenced extensive anatomical reciprocity of both regions with limbic circuits that have been directly implicated in the control of emotion (Nauta 1971). Later, Davidson (2004) examined the asymmetry concept in regard to the prefrontal cortex (PFC), based on data from the neuroscience literature on the heterogeneity of different sectors of the PFC. He envisaged the role of the PFC in affective expression in the brain and considered the use of EEG signal bands other than alpha, i.e., theta (4-7 Hz), beta (13-30 Hz) and gamma (31-100 Hz). More recently, a variety of psychophysiological studies of emotion (Hagemann et al. 1999) have confirmed and adopted the model proposed by Davidson, expanding it by arguing that frontal EEG asymmetry may serve as both a moderator and a mediator of emotion and motivation-related constructs (Coan and Allen 2004). There, activity characterized in terms of decreased power in the alpha band was found to be associated with emotional states. Finally, Aftanas et al. (2001) used affective pictures and Event-Related Desynchronization/Synchronization (ERD/ERS) analysis to study cortical activation during emotion processing. ERD refers to the desynchronization of a brain rhythm during a stimulus, which leads to a power decrease, whereas ERS refers to the synchronization of the rhythm and a power increase (Pfurtscheller and Da Silva 1999). In accordance with the asymmetry literature, they reported relatively greater right-hemisphere ERS for negative emotional states and greater left-hemisphere ERS for positive ones. All of the aforementioned studies reveal the potential of EEG signals as a means for developing an emotion recognition (ER) system based on the asymmetry concept, expressed through the frequency bands' power and the ERD/ERS phenomenon.
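As an illustration of the asymmetry concept discussed above, the sketch below computes a frontal alpha asymmetry index from two frontal channels (F3, F4) using Welch band-power estimates. It is a minimal example under assumed parameters (sampling rate, alpha band limits), not a complete emotion recognition system.

import numpy as np
from scipy.signal import welch

def band_power(x, fs, lo, hi):
    """Average power of signal x in the [lo, hi] Hz band (Welch periodogram)."""
    freqs, psd = welch(x, fs=fs, nperseg=min(len(x), 2 * fs))
    mask = (freqs >= lo) & (freqs <= hi)
    return np.trapz(psd[mask], freqs[mask])

def frontal_alpha_asymmetry(f3, f4, fs=128, band=(8.0, 12.0)):
    """ln(alpha power at F4) - ln(alpha power at F3).

    Because alpha power varies inversely with cortical activation, a positive
    index suggests relatively greater left-frontal activation, which the
    Davidson model associates with positive, approach-related affect.
    """
    return np.log(band_power(f4, fs, *band)) - np.log(band_power(f3, fs, *band))

# Usage with two synthetic one-minute channels sampled at 128 Hz.
rng = np.random.default_rng(1)
t = np.arange(0, 60, 1 / 128)
f3 = 0.8 * np.sin(2 * np.pi * 10 * t) + rng.standard_normal(t.size)
f4 = 1.2 * np.sin(2 * np.pi * 10 * t) + rng.standard_normal(t.size)
print(round(frontal_alpha_asymmetry(f3, f4), 3))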

4.3.4.2.3 The Gesture-based Compositional Tool
The connection of gesture with music is based on the perception of sound, through an abstract dramaturgy, as a living organism that constitutes an entity (mass) with specific properties and its own temporal substance, far from a mere list of sound-colours or performance-technique liaisons (Cook 1990, 2000). The innovative concept is to construct a choreography of sound gestures (Makryniotis 2004), combining the relationships of "sound to sound" and "gesture to sound" (Gritten and King 2006). The gestural material could be acquired using Microsoft Kinect (see Sections 4.3.1.1 and 4.3.1.2) and modeled using Swarm Intelligence Theory (SIT), which is rigorously applied in the robotics field: through repulsive and attractive forces, the swarm agents manage to locally define their time-dependent relationship to their neighbours, achieving, at the same time, an organized and functional trajectory of the whole swarm (Beni and Wang 1989). The application of SIT to music was initially explored by Blackwell and Bentley (2002) and Blackwell (2006), who tried to model musicians' improvisation, creating, to a degree, agent-based improvisers that actually interact with the musicians. The gesture in classical music is usually perceived as an extra-musical element that is typically connected to expressive issues (Juslin et al. 2001, Lidov 1987, Lidov 2006). Nevertheless, going back to the Baroque era, one can spot some embodied aspects of gesture, similar to rhetorical performance (e.g., by singers). Studies in Music Psychology (Davidson 1993, Juchniewicz 2008) reveal the importance of the body in music performance. Researchers like E. F. Clarke and J. Davidson (1998) and J. Sloboda (1998) have connected gestures with expression, cognitive perception and emotional transmission to the audience, but also with the compositional process (Sloboda 1998, Davidson 1994, Snook 1998, Davidson and Correia 2002).
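The following toy sketch hints at how SIT could link captured gesture to sound: agents are attracted to a tracked hand position (a stand-in for Kinect data) and repelled from each other, and the swarm centroid is mapped to pitch and loudness. All forces, constants and sound mappings are illustrative assumptions rather than the compositional tool itself.

import numpy as np

def step_swarm(positions, velocities, attractor, dt=0.05,
               attract=1.5, repel=0.02, damp=0.9):
    """One update of a toy swarm: agents are pulled toward the attractor
    (e.g. a tracked hand position) and pushed apart by a short-range
    repulsive force, as in basic swarm-intelligence models."""
    # Attraction toward the captured gesture point.
    acc = attract * (attractor - positions)
    # Pairwise repulsion keeps the swarm from collapsing onto a point.
    diff = positions[:, None, :] - positions[None, :, :]
    dist2 = (diff ** 2).sum(-1) + 1e-6
    acc += repel * (diff / dist2[..., None]).sum(axis=1)
    velocities = damp * velocities + dt * acc
    return positions + dt * velocities, velocities

def swarm_to_sound(positions):
    """Map the swarm centroid to illustrative sound parameters."""
    cx, cy = positions.mean(axis=0)
    pitch = 220.0 * 2 ** cx              # x -> pitch (octaves around 220 Hz)
    amplitude = float(np.clip(abs(cy), 0, 1))   # y -> loudness
    return pitch, amplitude

# Usage: 20 agents following a circular "hand" trajectory.
rng = np.random.default_rng(2)
pos, vel = rng.standard_normal((20, 2)), np.zeros((20, 2))
for k in range(100):
    hand = np.array([np.cos(k / 20), np.sin(k / 20)])   # stand-in for Kinect data
    pos, vel = step_swarm(pos, vel, hand)
print(swarm_to_sound(pos))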

Compositional tools based on emotion recognition - Examples of the biomusic concept transferred to the field of contemporary music39
The Metamorphosis (2011) Op. 83 (for actor, instrumental ensemble, live electronics and biosignal-motion sensors)40
The work is based on the concept of metamorphosis, as it characterizes the variations of the multiple levels of resolution of existence. Motivated by F. Kafka's work of the same name (1935), an external metamorphosis is constructed (Kafka: man to cockroach; here: actor to guinea pig), which is gradually transformed into a normalized version of the external environment in which the internal metamorphoses take place. The latter touch the space of emotions that are elicited by the alterations in relationships and dependencies in a structural connectivity network in the sound space; there, the experiential interaction is expressed through the biosignals that reflect the internal world in a sound projection, which is mobilized and transformed by the 'interferences' of the conductor's expressive movements. The trajectory of the metamorphosis puts 'sound liquidity' and continuous deconstruction/catastrophe in the foreground, setting them as the basic ingredients of the work, in an alternative perspective, that of the so-called 'biomusic'.
Common Brain (2013) Op. 86 (for sopranino, sub-bass recorder, male voice, 2 Emotiv EEG sensors and live electronics)41
The work is a commission of the Swiss contemporary music ensemble UMS 'n JIP and part of their 'Greece Project'. It is based on the construction of a "common brain" in a bilateral perception: the one that is being shared and the one that does not deviate from the norm. Starting from the point of existence (through the understanding of breathing), the multi-layered proliferation of the neurons is developed, followed by network activation, text training, external-stimulation reaction and, finally, emotional responsiveness. The performers participate in the formation of the soundscape through the real-time acquisition of their encephalogram and its transformation into sound material, creating an experiential perception of the "common brain" in many different aspects of its development and formation, exploring further the concept of biomusic.

39 Composed by Assoc. Prof. Leontios Hadjileontiadis (member of the AUTH research team).
40 http://www.youtube.com/watch?v=6zIs5LTixN4
41 http://www.youtube.com/watch?v=n0q3cMvukzk

4.3.4.3 Possible Use in Intangible Heritage Preservation and Transmission
There are important cultural differences in emotions that can be predicted, understood and connected to each other in the light of cultural expressions. The main cultural differences reflected in the affective space are expressed through initial response tendencies of appraisal, action readiness, expression and instrumental behavior, but also in regulation strategies. Moreover, the ecologies of emotion and contexts, as well as their mutual reinforcement, differ across cultures. By capturing emotions, and even better their dynamic character, using EEG signals during cultural activities, the response selection at the level of the different emotional components, the relative priorities of initial response selection and effortful regulation, the sensitivity to certain contexts, the plans that are entailed by the emotions, as well as the likely means to achieve them, could be identified and used as a dominant source of information for acquiring intangible cultural heritage elements. Consequently, the ways in which the potential of emotions is realized could reveal cultural facets that are intangible in character but form tangible measures in the affective space, contributing to their categorization and preservation as knowledge-based cultural/emotional models. Moreover, most folklore/popular culture is shaped by a logic of emotional intensification. It is less interested in making people think than in making people feel. Yet that distinction is too simple: folklore/popular culture, at its best, makes people think by making them feel. In this context, the emotions generated by folklore/popular culture are rarely personal; rather, to be traditional or popular, it has to evoke broadly shared feelings. The most emotional moments are often the ones that hit on conflicts, anxieties, fantasies and fears that are central to the culture. In this perspective, folklore/cultural expressions try to use every device their medium offers in order to maximize the emotional response of their audience. Insofar as folklore/popular artists and performers think about their craft, they are also thinking about how to achieve an emotional impact. By using EEG-based emotion acquisition from the performers of rare singing, and from the corresponding audience, the differences in the contexts within which these works are produced and consumed could be identified in the affective space, contributing to the exploration of the ways intangible cultural hierarchies respect or dismiss the affective dimensions, operating differently within different folklore cultures.

4.3.5 Semantic Multimedia Analysis

4.3.5.1 Introduction
Semantic multimedia analysis is essentially the process of mapping low-level features to high-level concepts, an issue addressed as bridging the "semantic gap", and of extracting a set of metadata that can be used to index the multimedia content in a manner coherent with human perception. The challenging aspect of this process derives from the high number of different instantiations exhibited by the vast majority of semantic concepts, which is difficult to capture using a finite number of patterns. If we consider concept detection as the result of a continuous process where the learner interacts with a set of examples and his teacher to gradually develop his system of visual perception, we may identify the following interrelations. The grounding of concepts is primarily achieved through indicative examples that are accompanied by the description of the teacher (i.e. annotations). Based on these samples the learner uses his senses to build models that are able to ground the annotated concepts, either by relying on the discriminative power of the received stimuli (i.e. discriminative models), or by shaping a model that could potentially generate these stimuli (i.e. generative models). However, these models are typically weak in generalization, at least in their early stages of development. This fact prevents them from successfully recognizing new, unseen instantiations of the modeled concepts that are likely to differ in form and appearance (i.e. the semantic gap). This is where the teacher once again comes into play to provide the learner with a set of logic-based rules or probabilistic dependencies that offer him an additional path to visual perception through inference. These rules and dependencies are essentially filters that can be applied to reduce the uncertainty of the stimuli-based models, or to generate higher forms of knowledge through reasoning. Finally, when this knowledge accumulates over time it takes the form of experience, which is a kind of information that can sometimes be transferred directly from the teacher to the learner and help him make rough approximations of the required models.
In the cultural heritage domain, multimedia analysis has been extensively used in the past decades as a form of automatically indexing multimedia cultural content. This necessity grows even more these days, considering the popularity of digitizing cultural content for purposes such as safeguarding, capturing, visualizing and presenting both the tangible and the intangible resources that broadly define that heritage. When it comes to intangible cultural heritage, the task of semantic analysis becomes even more challenging, since the significance of heritage artifacts is implied by their context and the scope of the preservation extends also to the preservation of the background knowledge that puts these artifacts in proper perspective. These intangible assets may for instance derive from the performing arts (e.g. singing, dancing, etc.), and semantic multimedia analysis is essential for mapping the low-level features originating from the signals of the utilized sensors (e.g. sound, image, EEG) to important aspects that define the examined art (e.g. singing or dancing style). In the typical case, semantic multimedia analysis consists of the following four components:
1) Pattern recognition: the low-level features are mapped to medium-level concepts (e.g. detecting the rhythm of a song from the sound).
2) Data fusion: information from the various modalities is fused to improve the detection accuracy of the various concepts (early fusion, i.e. fusion performed directly on the signals, or late fusion, i.e. combination of the results of the pattern recognition models; a minimal late-fusion sketch is given after this list).
3) Knowledge-assisted semantic analysis: explicit domain knowledge, typically provided in the form of an ontology by the domain experts, is incorporated along with the outcome of the pattern recognition models and/or the data fusion technique, in order to extract the high-level semantics.
4) Schema alignment: the metadata schemas are aligned so that the knowledge can be combined in a single schema to allow for interoperability.
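A minimal example of the late-fusion step (component 2 above) is sketched below: per-modality concept confidences are combined by a weighted average. The modality names, scores and weights are illustrative assumptions, not project-defined values.

import numpy as np

def late_fusion(scores_by_modality, weights_by_modality=None):
    """Late fusion: combine per-modality concept confidences into one score.

    `scores_by_modality` maps a modality name to a vector of confidences,
    one per concept. The weights express how much each modality is trusted.
    """
    names = sorted(scores_by_modality)
    stacked = np.array([scores_by_modality[m] for m in names], dtype=float)
    if weights_by_modality is None:
        weights_by_modality = {m: 1.0 for m in names}
    w = np.array([weights_by_modality[m] for m in names], dtype=float)
    w = w / w.sum()
    return w @ stacked   # weighted average, one fused score per concept

# Usage: three hypothetical concepts scored independently from audio,
# video and EEG classifiers.
fused = late_fusion(
    {"audio": [0.8, 0.1, 0.3], "video": [0.6, 0.2, 0.7], "eeg": [0.5, 0.4, 0.5]},
    {"audio": 0.5, "video": 0.3, "eeg": 0.2},
)
print(fused.round(2))

In practice the weights could be learned from validation data rather than fixed by hand, or the weighted average could be replaced by a probabilistic combination such as the Bayesian schemes discussed later in this section.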

4.3.5.2 Key Projects and applications
In the following, we present some important projects that have been or are currently being funded by the European Commission in the domain of cultural heritage. These projects are the outcome of the strategic focus of the EC on digitizing, analysing and, most importantly, making cultural resources interoperable, so that they become available through a common interface. In this endeavour, the need for semantic analysis of cultural content is ranked particularly high. Thus, we mention the projects that are relevant for i-Treasures with respect to their contribution to the semantic analysis of cultural heritage content.

4.3.5.2.1 Key projects in Digitization and Preservation of Cultural Heritage
During the last few years, the digitization of massive documented historical content has led to a large number of advanced libraries in Europe for which large-scale digitization of historical documents is an urgent need. In particular, we would like to recall three projects which seem particularly relevant: the IMPACT project, CULTURA and the 3D-COFORM project (for a more extensive description of these projects, please see Appendix 10.1). IMPACT and CULTURA are related to i-Treasures with respect to the semantic annotation of cultural content that they perform and the retrieval methods that they provide to the users; CULTURA additionally focuses on the personalization of the search and retrieval mechanism. In contrast with i-Treasures, however, their analysis is performed on a narrow set of media, i.e. historical documents, whereas the semantic analysis in i-Treasures will deal with diverse and plural cultural heritage content collections and is thus multimodal, since various sources of information contribute to the analysis. Lastly, cultural heritage analysis in 3D-COFORM is performed on the low-level characteristics of the digitized objects (shape, material, etc.), while in i-Treasures both low-level and abstract characteristics of the intangible cultural heritage content will be analysed.

4.3.5.2.2 Key projects in Access to Massive Cultural Heritage Collections
The analysis, representation and modelling of cultural heritage have received increasing research interest in the last decade.

There are a number of projects aimed at making the European cultural heritage accessible to all. Among others, we should make special note of the European Digital Library Network (EDLnet) project, whose activities resulted in launching the beta version of the European digital library, Europeana, which is currently the most well-known gateway for exploring the digital resources of Europe's museums, libraries, archives and audio-visual collections. Aligned with the objective of making cultural heritage accessible to all, eCLAP has been developed as the European, and GloPAD as the global, digital library hosting content on the performing arts (e.g. songs and dances); thus, both libraries provide intangible cultural heritage content (for a more extensive description of these projects, please see Appendix 10.1). Cultural heritage objects are ingested into the 3D-COFORM, Europeana, eCLAP and GloPAD data management systems without (necessarily) semantic analysis, since the metadata are provided by information obtained from experts through tagging. Besides, we would like to recall the ArcLand project, which uses remote sensing techniques to capture the shared landscape and archaeological heritage of the countries of the European Union, the 3D-ICONS project and the Linked Heritage project (for a description of these projects, please see Appendix 10.1). These are the most important projects with the common goal of developing novel access methods to cultural heritage content. However, the search and retrieval they propose is based on cultural heritage metadata provided by the users through tagging. On the contrary, in i-Treasures the metadata will be produced automatically through semantic analysis and will be used for cultural heritage preservation through training and educational activities.

4.3.5.2.3 Key projects in Semantic Analysis of Cultural Heritage and Transmission
Semantic analysis of cultural heritage is the task of recognizing patterns and concepts in heritage objects, as well as in their relationships. Thus, analysing cultural heritage in this context promotes the preservation of fragile, traditionally preserved knowledge with new means, beyond the mere digitization of objects. In particular, we want to mention two projects that focus on the semantic analysis of cultural heritage and that have similarities with i-Treasures, as well as gaps that i-Treasures aims to fill in this field, namely the PATHS project and the DECIPHER project (for a description of these projects, please see Appendix 10.1). Both DECIPHER and PATHS focus on the semantic analysis of cultural heritage objects in order to provide sophisticated and convenient access to the end-user. Both projects provide access to a vast corpus of cultural heritage and, thus, automated methods that organize the delivery of this content to the end-user are necessary. Semantic analysis plays an important role in this endeavour, as much as in the case of the i-Treasures project. However, in i-Treasures the semantic analysis results are used not only for search and retrieval, but also for intangible cultural heritage preservation, training and educational purposes.

4.3.5.2.4 Literature in Cultural Heritage Semantic Analysis
In this section we review the literature related to cultural heritage semantic analysis. Pattern recognition methodologies play an important role in this task. Moreover, we provide a short introduction to fusion and then present works that apply fusion to multimodal analysis of cultural heritage content. Knowledge-assisted methods are also highlighted, since the knowledge contributed by experts will play a crucial role in i-Treasures.


a) Pattern Recognition
The difficulty of mapping a set of low-level visual features to semantic concepts, i.e. bridging the "semantic gap", has brought the mechanisms of learning and pattern recognition to the forefront of related scientific research. It is natural that we seek to design and build machines that can recognize patterns. From automated speech recognition, fingerprint identification, optical character recognition and much more, it is clear that reliable and accurate pattern recognition would offer a great deal of perceptual capability. As humans, we learn to recognize materials, objects and scenes from very few examples and without much effort; a 3-year-old child is capable of building models for a substantial number of concepts and recognizing them using these models. In an effort to simulate human learning, researchers have developed algorithms to teach the machine how to recognize patterns by using annotated examples that relate to the pattern (positive examples) and examples that are not related to the examined pattern (negative examples). The aim of example-based learning is to generate a global model that maps the input signals/features to the desired labels and generalizes from the presented data to unseen situations in a "reasonable" way. Some of the most widely used types of classifiers are Neural Networks (multilayer perceptrons) (Egmont-Petersen et al. 2002), Support Vector Machines (Cortes and Vapnik 1995), naive Bayes (Domingos and Pazzani 1997), decision trees (Breiman et al. 1984) and radial basis function classifiers (Lukaszyk 2004). Pattern recognition plays an important role in the representation and preservation of cultural heritage. Indeed, pattern recognition techniques have been used in the cultural domain for various cultural heritage categories. In (Esposito et al. 2004), a method that processes historical documents and transforms them into metadata compliant with an XML schema is proposed. In (Malik et al. 2011), SVM-based classification of traditional Indian dance actions from multimedia data is proposed. Artifact reconstruction by matching fresco fragments is performed in (Funkhouser et al. 2011). In (Makridis and Daras 2012) and (Karasik 2010), computer vision techniques are employed in order to automatically classify archaeological pottery sherds. Specifically, in (Makridis and Daras 2012) the classification algorithm is trained using a subset of the sherds that have been classified by an expert; the proposed technique then uses these representative sherds in order to classify the remaining sherds. For the classification task, local features based on colour and texture information are extracted and a feature selection algorithm is applied. A similar approach is followed in (Karasik 2010), where automatic typology and classification of pottery sherds takes place; the ultimate goal of this work is to foster archaeological research and documentation. Moreover, a computer vision technique is also used in (Vrochidis et al. 2008), where the authors present a search engine for retrieving cultural heritage multimedia content. Their technique is based on the semantic annotation of multimedia content, which is made feasible with the help of an ontology and the low-level visual features. Lastly, in (Junyong et al. 2008), the authors present a management information system of Guangdong intangible cultural heritage utilizing GIS technology.
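In the spirit of the sherd-classification scheme of (Makridis and Daras 2012), the sketch below trains an SVM on an expert-labelled subset after a feature-selection step, using synthetic stand-ins for colour/texture features; the data, feature count and classifier settings are assumptions made purely for illustration.

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for colour/texture feature vectors extracted from
# expert-labelled items (e.g. pottery sherds); 3 classes, 40 features.
rng = np.random.default_rng(3)
X = rng.standard_normal((300, 40))
y = rng.integers(0, 3, size=300)
X[:, :5] += y[:, None]          # make the first five features informative

# Feature selection followed by an SVM classifier, mirroring the
# select-then-classify scheme described above.
model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=10),   # keep the 10 most discriminative features
    SVC(kernel="rbf", C=1.0),
)
model.fit(X[:200], y[:200])                      # expert-labelled subset
print("held-out accuracy:", model.score(X[200:], y[200:]))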
b) Fusion in semantic analysis
Fusion (Liggins et al. 2008) is the process of combining the outcomes (observations) of multiple sources of information in order to produce a single outcome that is representative of the outcomes of all sources. The multiple sources are assumed to share a common data generation procedure, which fusion methods exploit in order to combine the multiple outcomes. Thus, fusion aims to combine the observations (evidence) coming from multiple sources of information by deducing the unknown but common event that led to the observed data. This can be used as a principled formulation of the multimodal semantic analysis task. For the semantic analysis case, the original data are abstract concepts that produce the indirect data (video, audio, EEG, etc.), where the dimensionality of the latter is much higher. Bayesian inference is the procedure of estimating the values of the unobserved variables that model the concepts which led to the specific observations, given the generative model of the observations and the evidence (observed) variables. Fusion, from the Bayesian analysis viewpoint (Punska 1999), is the inference procedure where the observations come from different modalities, each modality having produced its observed data from hidden data that are common to all modalities. The Bayes rule plays a crucial role in the inference procedure. Thus, in semantic analysis, based on the assumption that there are hidden unknown variables (i.e., the high-level concepts) that feed the observed data generation process, multimodal semantic analysis based on fusion aims to infer from the observed data the concepts that led to these specific observations. The observed data come from multiple sources, which means that the data belonging to different modalities are actually indirect observations of the same hidden data. In semantic analysis, the hidden data are the high-level abstract concepts that we aim to detect in the content we analyse. In (Nikolopoulos et al. 2009), a cross-media analysis scheme for the semantic analysis of multimedia is presented. In particular, a late-fusion mechanism is proposed that combines single-media semantic outputs in order to perform multimodal semantic analysis, resulting in a common analysis for all modalities. Its main novelty relies on using the evidence extracted from heterogeneous media sources in the context of probabilistic inference (using a Bayesian network), where the incorporated prior knowledge is modelled using an ontology. In (Chang et al. 2007), semantic analysis of audio-visual content is performed by employing multimodal fusion based on statistical models. The outcome of this analysis is the classification of audio-visual content entities. Ensemble fusion using both global and local features is employed for the analysis of the visual part of the content. For the audio, Gaussian models as well as advanced statistical methods such as probabilistic latent semantic analysis (Hoffman 1999) are employed to the same end.


Several fusion frameworks, such as simple weighted averaging, multimodal context fusion by a boosted conditional random field, and multi-class joint boosting, were developed and evaluated in terms of their classification performance. The multimodal classification system that was developed was shown to outperform other methods that use single modalities only. In (Huber-Mork et al. 2011), Naïve Bayesian fusion was used for ancient coin identification. More specifically, a two-stage procedure is adopted for the identification of ancient coins in images, the aim being to classify a given coin into one of the known coin categories. In the first stage, the coin is matched to a small number of potential coin categories. In the second stage, extraction and matching of local features between the query coin image and the images contained in the subset obtained from the first stage is performed. At the end, the outputs of the first and second stages are fused in order to obtain a multimodal decision about the coin category. Multimodal semantic analysis is also performed in (Datcu and Rothkrantz 2008), where a Dynamic Bayesian Network (DBN) is employed in order to fuse the audio and visual information of multimodal data (audio-visual content) and provide an emotion recognition algorithm that is more accurate than its monomodal counterparts.
c) Knowledge-assisted semantic analysis
Research has shown that, in general, domain knowledge coming from experts can augment the performance of the semantic analysis task. In particular, in (Koolen and Kamps 2010) it is shown that both expert and non-expert users can benefit from taking into consideration the structure of cultural heritage metadata in their search queries. Interestingly, it was found that the accuracy of cultural object retrieval increases when the data are appropriately structured and this structure is exploitable by the user. Ontologies are knowledge representation models that have been used for the representation and preservation of cultural heritage. An ontology models knowledge as a set of concepts, which are entities within a specific knowledge domain; concepts have specific properties and are related to each other. Knowledge representation using ontologies has been used extensively in multimedia semantic analysis. Regarding i-Treasures, the cultural heritage knowledge is provided by the appropriate experts, who are called domain experts. Next, we give some examples of works that propose ontologies for knowledge-assisted semantic analysis of multimedia. In (Bai et al. 2007), a video semantic content analysis framework is proposed, where an ontology is used in combination with the MPEG-7 multimedia metadata standard. Based on the domain ontology that defines the semantic concepts and their relations, MPEG-7 audiovisual metadata terms are expressed in this ontology. OWL is used for the ontology description, Description Logic is used to describe the semantic events, and a reasoning algorithm is proposed for event detection. In (Dasiopoulou et al. 2005), which is an approach to knowledge-assisted semantic video object detection, Semantic Web technologies are used for knowledge representation in the RDF(S) metadata standard. Depending on concept attributes and low-level features, logic-based reasoning is employed for the detection of video objects corresponding to the semantic concepts defined in the ontology. This renders the multimedia analysis tasks that adopt this framework application- and domain-independent.
Another example of an ontology framework used to facilitate the ontology-based mapping of cultural heritage content to corresponding concepts is proposed in (Malik et al. 2011). More specifically, in this work an ontology-based framework is proposed that captures the properties of, and the relationships between, the heritage resources both at the domain knowledge level and at the feature level. The ontology used in this framework includes descriptions of domain concepts, where the descriptions are given formally in terms of the related low-level audio-visual features appearing in the cultural multimedia content, enabling in this way a convenient semantic interpretation of the multimedia data. The ontology architecture manifesting the domain knowledge, after being designed by domain experts, contains parameters that are fine-tuned by applying machine-learning techniques. Specifically, the Multimedia Web Ontology Language (MOWL) is used to encode the domain knowledge, which is trained using labeled data, and the ontology is ultimately used to automatically annotate new instances of digital heritage artefacts. In (Mulholland et al. 2011), a work developed in the framework of the DECIPHER project is presented, which proposes a methodology for the description of museum narratives (i.e., the structure of the exhibits); specifically, an ontology tailored to the needs of the project is proposed for this description. The authors of (Malik et al. 2008) propose a knowledge framework modelling intangible cultural heritage that is based on an ontology proposed in (Tan et al. 2009). In particular, the work in (Malik et al. 2008) presents a knowledge-assisted framework for the semantic analysis of cultural Indian dances, where a carefully designed ontology plays an important role in the detection of specific dance styles and moves in multimedia with cultural content. Probabilistic Bayesian inference, which takes into account the evidence (elementary concepts, such as body postures and gestures), in conjunction with the knowledge expressed via the ontology, is employed for the detection of abstract concepts, such as dance styles. Lastly, in (Malik et al. 2008), a knowledge representation framework is proposed that is designed to express and share cultural heritage knowledge based on an ontology; the CIDOC CRM data model was used to this end. In the same direction, the authors of (Lien et al. 1998) perform ontology-based semantic analysis with a view to linking media, contexts, objects, events and people.


Also, a comparative study of which modeling techniques are more appropriate for the representation of cultural heritage domain concepts is presented in (Hug and Gonzalez-Perez 2012).
d) Schema alignment and ontology-based mapping
Due to the multimodal nature of the content that is to be analyzed semantically in i-Treasures, a common metadata schema is necessary for the interoperability between the elementary-concept and semantic analysis tasks. Next, we give a number of cultural heritage metadata schemas that are designed to foster interoperability among heterogeneous cultural heritage metadata standards by making the mapping of the semantic information onto one common metadata standard convenient. This procedure can be used within i-Treasures in order to facilitate the dissemination of the semantic analysis results through second-party cultural heritage repositories, such as Europeana. A vast number of Europe's cultural heritage objects are digitised by a wide range of data providers from the library, museum, archive and audio-visual sectors, and they all use different metadata standards. These heterogeneous data need to appear in a common context. Thus, given the large variety of existing metadata schemas, ensuring interoperability across diverse cultural collections is another challenge that has received a lot of research attention. The Europeana Data Model (EDM), which was developed for the implementation of the Europeana digital library, was designed to enforce interoperability across the various content providers to the library. EDM transcends individual metadata standards without compromising their range and richness. It also facilitates Europeana's participation in the Semantic Web. Finally, the EDM semantic approach is expected to promote richer resource discovery and improved display of more complex data. The PREMIS Data Dictionary for Preservation Metadata is an international standard for metadata that was developed to support the preservation of digital objects/assets and ensure their long-term usability. The PREMIS metadata standard has been adopted globally in various projects related to digital preservation and supports numerous digital preservation software tools and systems. The CIDOC Conceptual Reference Model (CRM), an official standard since 9/12/2006, provides the ability to describe the implicit and explicit relationships of cultural heritage concepts in a formalized manner. Thus, CIDOC CRM is intended to promote a common understanding of cultural heritage information by providing a common and extensible semantic framework that can represent any cultural heritage information. It is intended to be a common language for cultural knowledge domain experts to formulate user requirements for information systems, thus facilitating interoperability between different sources of cultural heritage information at the semantic level. Lastly, the work in (Kollia et al. 2012) is worth noting, since it provides a methodology to map semantic analysis results onto the EDM metadata schema. This is very useful for applications that aim to make their metadata available and reusable by end-users and by similar but heterogeneous applications.
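The idea of schema alignment can be illustrated with a trivial sketch that renames automatically produced annotation fields onto a simplified, Dublin-Core/EDM-like target schema. The field names and the example record are placeholders chosen for illustration and do not reproduce the actual EDM or CIDOC CRM element sets.

# Internal annotation records are mapped onto a simplified target schema.
# The mapping below is a hypothetical example, not the real EDM specification.
INTERNAL_TO_TARGET = {
    "title": "dc:title",
    "performer": "dc:creator",
    "detected_concepts": "dc:subject",
    "media_uri": "edm:isShownBy",
    "use_case": "dc:type",
}

def align_record(record, mapping=INTERNAL_TO_TARGET):
    """Rename internal annotation fields to the target schema, keeping only
    the fields for which a mapping is defined."""
    return {target: record[source]
            for source, target in mapping.items() if source in record}

annotation = {
    "title": "Rare traditional singing performance",
    "detected_concepts": ["rare singing", "bass voice"],
    "media_uri": "http://example.org/media/123",
    "use_case": "rare traditional singing",
}
print(align_record(annotation))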

4.3.5.3 Possible Use in Intangible Heritage Preservation and Transmission
In i-Treasures, by the term semantic analysis we refer to the process of deducing whether the hypothesis that a number of specific concepts exist in a piece of multimedia content is true. We deal with content of multiple modalities (e.g. visual, audio, voice, EEG), thus the semantic analysis we aim to perform is multimodal. Research on semantic analysis in i-Treasures will entail the development of algorithms that analyse the intangible cultural heritage content in order to detect and decide which concepts, defined by the domain experts, are included in the content. As explained earlier, multimodal analysis can be considered as a general fusion process, which can be applied at different levels of abstraction: (a) the result level (Nephade and Huang 2001, Snoek et al. 2006), where fusion takes place in the semantic (conceptual) space, and (b) the feature level (Magalhaes and Ruger 2007, Wu et al. 2004), where fusion takes place in the feature space (by feature combination), as an intermediate stage for the semantic analysis tasks (i.e. classification, clustering, indexing, etc.). In i-Treasures, we plan to perform multimodal analysis at both levels of abstraction. Since structured knowledge will be available for the analysis task, we plan to extend the multimodal semantic analysis works in (Lakka et al. 2011, Nikolopoulos et al. 2012). In these works, the combination of multimodal information, and thus semantic analysis, is achieved by modeling the conceptual space as a Bayesian network.

This allows all evidence originating from the domain knowledge, the application content and the multiple content modalities to contribute to the support or disproof of a certain concept-related hypothesis. Thus, the information carried by each modality is stochastically modelled and concept detection is performed via probabilistic inference algorithms. On the other hand, in cases where our only source of information is the low-level features of two (or more) simultaneously captured modalities, the feature level of abstraction will be favoured, so as to exploit the statistical characteristics of the data. In this endeavour, we will leverage our work (Nikolopoulos et al. 2009), which relies on aspect models such as probabilistic latent semantic analysis (pLSA) (Hoffman 1999) to overcome the heterogeneous nature of the combined features, and extend the currently available techniques to higher order (Nikolopoulos et al. 2012) so as to efficiently handle the increased number of modalities envisaged in i-Treasures. Moreover, we also plan to employ the latent Dirichlet allocation (LDA) model (Blei et al. 2003) for this endeavour, which is a more complex and flexible hierarchical Bayesian model than pLSA, proposed as an extension of pLSA. pLSA and LDA are stochastic Bayesian methods that can model the conceptual (semantic) space via latent (hidden/unobserved) stochastic variables. They are based on the assumption that the evidence data are the product of a process applied to hidden data. All data are of course modelled as random variables. Based on a stochastic model (the prior) for the hidden data and a stochastic model for the generation of the evidence data given the hidden data, Bayesian inference can be employed to infer the hidden data. In our case, the hidden data are the high-level concepts that we assume to have generated the evidence (observed) data. The latter are either raw cultural heritage data (e.g., video, audio, EEG) or more abstract forms of information (low- and medium-level features of these data). Fusion is also a methodology we plan to employ for the semantic analysis task in i-Treasures. Semantic analysis can be viewed as the fusion of features extracted from different modalities (at the feature-level abstraction) and as the combination of multiple conceptual spaces (at the result-level abstraction). We plan to focus on the Bayesian fusion methodology: first, because it can conveniently be applied to our semantic analysis problem; and second, because Bayesian fusion relies on stochastic generative models, which is in a similar spirit to the aforementioned pLSA and LDA models. Since we have shown experimentally (Lakka et al. 2011) that, in a cross-media setting, generative models outperform discriminative ones in fusing the extracted evidence, mainly due to their ability to efficiently handle prior knowledge and to learn from a few examples, we plan to use this type of modeling instead of discriminative models. In i-Treasures, four use cases of intangible cultural heritage analysis have been specified. The cultural heritage resources regarding each use case will be available in digitized form. However, i-Treasures goes far beyond the preservation of cultural heritage in digitized form. By semantically analysing the heritage resources, intangible heritage elements can be automatically detected and annotated in a rigorous context. The automatically extracted cultural heritage knowledge enables the convenient integration of new resources into the i-Treasures digital heritage collection without the presence of a cultural heritage expert.
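A minimal sketch of the feature-level fusion with aspect models discussed above is given below: quantized feature histograms from two modalities are concatenated and fed to latent Dirichlet allocation, whose latent topics play the role of the hidden high-level concepts. The data are synthetic and the number of topics is an arbitrary assumption; the sketch illustrates the modelling idea, not the project's implementation.

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Synthetic "bag-of-features" histograms: counts of quantized audio and
# motion descriptors for 200 media items (50 audio codewords + 50 motion
# codewords). In practice these would come from the captured modalities.
rng = np.random.default_rng(4)
audio_hist = rng.poisson(2.0, size=(200, 50))
motion_hist = rng.poisson(2.0, size=(200, 50))

# Feature-level fusion: simply concatenate the per-modality histograms.
X = np.hstack([audio_hist, motion_hist])

# LDA treats each item as a mixture over latent "topics", which here play
# the role of hidden high-level concepts shared by both modalities.
lda = LatentDirichletAllocation(n_components=5, random_state=0)
theta = lda.fit_transform(X)          # per-item distribution over latent concepts
print(theta[0].round(2), theta[0].sum())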
Furthermore, the search and retrieval of cultural heritage content from a vast corpus of data, based on the extracted annotations (knowledge), becomes very convenient, not only for the cases where a user desires simple access to the content, but also for learning and educational tasks. Thus, semantic analysis extends the simple preservation of intangible cultural heritage via digitization to the preservation of the salient information imbued in the cultural heritage content (e.g., dance styles, song types, singing modes, etc.).

This information is important and i-Treasures aims to facilitate its promotion and preservation; semantic analysis plays an important role in this key objective. As stated at the beginning of this section, semantic analysis can be performed in two modes. In the first mode (result-level abstraction), the process consists of the following two steps:
- recognize cultural heritage patterns that correspond to elementary domain concepts such as relics, postures, body moves, audio tempos, etc., and
- based on these elementary domain concepts, obtain a degree of confidence for the plausibility of the hypothesis stating that the analysed media item manifests a certain high-level concept, such as a dance type, song style, etc.
Thus, using the above terminology, in the first mode a mapping from elementary domain concepts to high-level (abstract) concepts is performed. In the second mode, the mapping does not include the intermediate step of elementary concept detection; instead, less abstract characteristics (low-level features) are used. Since the elementary concepts come from different modalities, which makes the joint processing and analysis of heterogeneous content more difficult than in the monomodal case, the semantic analysis will be multimodal. Ontologies will be created by cultural heritage experts so as to model the domain knowledge. The domain knowledge is, in essence, the definition of the properties and relationships of the elementary and high-level concepts, and it will be defined in a structured and principled manner using an ontology, as in the works mentioned in Section 4.3.5.2.4, point c. Note that there will be related concepts from different modalities. Domain experts will be employed to provide domain knowledge, which in turn will be translated into formalized knowledge in the context of an ontology. The Web Ontology Language (OWL) (Horrocks et al. 2004) will be used to this end. More specifically, a custom ontology will be developed for each i-Treasures use case, with input taken from the use case experts. These ontologies will be very useful for knowledge-assisted multimodal analysis.

4.3.6 3D Visualization of Intangible Heritage

4.3.6.1 Introduction
Intangible culture is quite different from tangible culture: intangible culture such as skills, crafts, music, song, drama and other recordable culture cannot simply be touched and interacted with without the use of other means. In real life, tangible cultural heritage can be demonstrated in environments like museums and related exhibitions. Even a cultural heritage structure that has been totally destroyed, such as a temple, can be reproduced as a replica, so that audiences can personally wander inside it. On the other hand, due to its non-physical nature, intangible cultural heritage is more restricted and harder to demonstrate in real life, which makes preventing it from disappearing a real challenge. This is where 3D visualization and interaction technology comes into play. There is a famous quote attributed to Albert Einstein: "If I can't picture it I can't understand it". 3D visualization helps us create images, animations or models of anything, either tangible or intangible. In this way it becomes easier and more effective to transfer knowledge, such as communicating a message, explaining an idea, showing an event or presenting a scientific study. Thanks to the advent of computer graphics, it is now possible to create a picture or animation of almost anything; what can be done is limited only by imagination. Currently it is possible to generate realistic or abstract 3D visualizations by using 3D modeling tools or programming interfaces.


Furthermore, such technology makes it possible to transmit these intangible cultures to larger audiences than in the times when they were actively performed. In those times, only a limited number of people mastered such arts, so only a limited number of pupils could be tutored in the knowledge. With the help of widespread technology, many people can acquire such knowledge and art. It is certain that 3D visualization and interaction will hardly be on a par with the real thing, and an instruction system in a computer application or simulation cannot possibly match a real-life master's tutoring. Obviously, the degree of realism in interaction, visualization and physics simulation becomes a very important concern if users are to become well accustomed to the culture and encourage others to do so. There exist research studies and industry applications (such as games) for this purpose, as explained below.

4.3.6.2 Key Projects and applications
There are several industry applications related to intangible cultural heritage. Some are commercial and some are experimental R&D work. It can be concluded that there is an increasing trend to use more ICH content in 3D visualization studies. The following covers industrial 3D ICH applications considering the focus of the i-Treasures project.

4.3.6.2.1 Virtual Pottery
"Let's Create! Pottery" is an iOS and Android application developed by iDreams. It allows users to create realistic pottery pieces by sculpting and painting them as they want42. It is considered the best mobile application for virtual pottery. In the last update, released in March 2013, the application was improved to actually print the created pottery: after the user completes the pot, he/she can choose to print it. The created model is sent to a 3D printing firm, which prints it using 3D printers for a price that varies depending on the size of the model. L'Artisan Electronique is another project that combines traditional pottery with digital techniques. It was developed by Unfold and Tim Knapen43. It is similar to the traditional way of throwing a pot, but instead of touching the pot, a 3D scanner tracks where your hand is and the software applies the scanned hand projection to the pot to give it its shape. There are many other pottery applications similar to "Let's Create! Pottery", though not of the same quality. Their focus, however, is the same: sculpting pots by touch and painting them with the given colours and patterns.

4.3.6.2.2 Virtual Museum Applications
3D virtual museum applications give people the opportunity to visit museums virtually. Another option is using these applications on kiosks in museums to give visitors an introduction to the theme of the museum. There are immersive visualization approaches as well. V-MUST44 is a Network of Excellence recently funded by the EU that aims to provide the heritage sector with the tools and support to develop Virtual Museums that are educational, enjoyable, long-lasting and easy to maintain.

42 http://techcrunch.com/2013/03/06/lets-create-pottery-because-virtual-pottery-and-3d-printing-were-made-for-each- other 43 http://www.piepaper.com/?p=568 44 http://www.v-must.net/


Reotek45, a company specialized in interactive museums, offers many 3D intangible cultural heritage visualizations, such as the death and burial ceremonies of ancient civilizations, as shown in Figure 4-12.

Figure 4-12. Virtual Death and Burial Ceremony of the Hittite Civilization (courtesy of ReoTek)

4.3.6.2.3 Traditional Dance Applications
Dance applications are very popular on Kinect. Making games about traditional dances attracts people who are interested in their own country's, or even other countries', folk dances. As an example, there is an application called Virtual Bharatanatyam Tutor, which helps enthusiasts learn the traditional Bharatanatyam dance46. The dance is popular in the South Indian state of Tamil Nadu47. The application teaches users the body gestures of Bharatanatyam dance and shows the correct gesture if the user has made a mistake. The image below demonstrates how the application explains the mistakes and then shows and tells the correct way.

Figure 4-13 Virtual Bharatanatyam Tutor

45 http://www.reo-tek.com/en 46 http://www.behance.net/gallery/Virtual-Bharathanatyam-Tutuor-Kinect-application/6048463 47 http://en.wikipedia.org/wiki/Bharata_Natyam


Applications for various traditional dances can be created. Training as in the example above can be a part of the game, and actual dancing can be the main objective. There are several dancing games on Kinect, but they mostly focus on modern dances, or dances with simpler moves compared to the traditional dances.

4.3.6.2.4 Literature in 3D Visualization
Academia puts significant effort into the preservation of intangible cultural heritage. There are studies in the literature about the 3D visualization of, and interaction with, intangible cultures such as music, dance, drama, crafting and skills. Each study shows a different level of interaction and visualization due to its nature; for example, applications for crafts and skills use more interaction than applications for drama or dance culture. It would be possible to categorize these studies by the level of interaction, visualization or technology used, but that would be a rather vague categorization, so we categorize them simply according to their real-life counterparts, such as "dance and drama" and "crafting and skill". Another section is devoted to studies that try to recreate fragments of the environments of ancient civilizations in a vibrant manner.
a) Dance and drama
The authors of (Zhou and Mudur 2007) studied the 3D visualization of facial expressions and animations of Chinese opera, an ancient dramatic art form. They directly morph the face of a 3D painted avatar of a Chinese opera performer by using level-set deformation based on real-life input from an actual performer. Another study concerns Noh, a Japanese musical drama. In this study, a synchronous scripting-based system is utilized to realize a complete Noh play within Second Life (Choi et al. 2010). 3D Noh avatars are animated with synchronized music and audio in a 3D virtual Noh theatre.
b) Crafting and skill
Pottery is a kind of craftsmanship and can also be considered an art. Most of the studies are about ceramics, clay and pottery working. Although these studies do not emphasize the preservation of intangible cultural heritage, they can play their own part as background studies. There are three studies about 3D interactive virtual pottery making. One of the early studies was done in 1997 (Korida et al. 1997). In this study, users can create and modify virtual ceramic pottery in a 3D stereoscopic display facility by using their hands with two-handed instrumented gloves. In 2007, Han et al. presented a more accessible pottery-making virtual environment via the use of markers, as seen in Figure 4-14 (Han et al. 2007).

Figure 4-14. Virtual pottery working by use of markers

This work is a meaningful implementation in terms of accessibility for end users, since a camera and markers printed on paper are enough to use the tool. In (McDonnell et al. 2001), a more realistic clay crafting system is created via a physics-based clay model and haptic force-feedback devices. Such an implementation can provide a more realistic feel of pottery and similar crafting cultures, but would still be inaccessible to most users. Turkish Ebru art and Japanese Sumi ink marbling, also called paper marbling, are also a topic of 3D ICH visualization. Lu et al. have proposed mathematical methods to generate visual outputs similar to Figure 4-15 (Lu et al. 2012).


Figure 4-15 Virtual paper marbling

c) Environments in ancient civilizations
In (Yang et al. 2006), virtual reality technology has been utilized to demonstrate and simulate an ancient cultural space, complete with crowds, in the time of the Warring States. To realize interactive scenarios, a rule-based behaviour system was created in order to interact with the cultural environment in more depth. Another vibrant cultural environment is the Isa Bey Tekke, where Mevlevi dervish "zikr" rituals have taken place since the 14th century. These environments were presented using an interactive, story-driven approach: Huseinovic, Turcinhodzic and Rizvic have utilized museum technologies and emerging interaction design to transmit intangible heritage (Huseinovic et al. 2013).

4.3.6.3 Possible Use in Intangible Heritage Preservation and Transmission
As mentioned earlier, there is a significant difference between the amount of 3D visualization work devoted to tangible and to intangible cultural heritage. That is quite expected, given the difference between the real and the abstract. However, new advances in multimodal sensors and computer graphics make it possible to introduce new approaches to visualizing the intangible in ways that were previously impossible, or at least harder. Considering the literature survey in this field, it can be concluded that it is not easy to transfer knowledge regarding ICH into 3D visualization. However, there is growing interest from communities such as computer graphics, computer games and cultural studies. The i-Treasures project can be a pioneer in this area by introducing innovative capturing methods for intangible cultural heritage and mapping the captured data into visual outputs by using computer graphics, computer games and similar technologies. The key factor for success is a good blend of data capture, effective usage of 3D visualization, and imagination. A possible concrete use in i-Treasures could include, for example, the creation of two avatars, one showing an expert performing a particular cultural expression, and the other being operated by the student, who may practice and compare her/his performance with the master's until the desired style/movement/technique has been acquired.
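The expert/student avatar scenario sketched above could be supported by a simple comparison of the captured joint trajectories, as in the minimal example below: both performances are resampled to a common length and a per-joint RMS error is reported. The joint count, the crude resampling (instead of, say, dynamic time warping) and the error threshold are illustrative assumptions.

import numpy as np

def resample(traj, n_frames):
    """Linearly resample a (frames, joints, 3) trajectory to n_frames."""
    old = np.linspace(0.0, 1.0, len(traj))
    new = np.linspace(0.0, 1.0, n_frames)
    return np.stack([[np.interp(new, old, traj[:, j, d]) for d in range(3)]
                     for j in range(traj.shape[1])], axis=0).transpose(2, 0, 1)

def per_joint_error(learner, expert, n_frames=100):
    """RMS positional error per joint between learner and expert performances,
    after resampling both to a common length (a crude temporal alignment;
    dynamic time warping would be a natural refinement)."""
    a, b = resample(learner, n_frames), resample(expert, n_frames)
    return np.sqrt(((a - b) ** 2).sum(axis=2).mean(axis=0))

# Usage: 20 joints tracked in 3D (e.g. by Kinect); data and threshold are
# synthetic placeholders.
rng = np.random.default_rng(5)
expert = rng.standard_normal((120, 20, 3))
learner = expert[::2] + 0.05 * rng.standard_normal((60, 20, 3))
errors = per_joint_error(learner, expert)
print("joints needing practice:", np.where(errors > 0.2)[0])

Feedback of this kind could then be rendered on the student's avatar, highlighting the joints whose error exceeds the chosen threshold.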

4.3.7 Text to Song

4.3.7.1 Introduction
Singing Voice Synthesis (SVS) is a branch of text-to-speech (TTS) technology concerned with generating a synthetic interpretation of a song, given its text and musical score. The synthesis of the singing voice has been a research area for a long time, and each decade has brought a major technological improvement. The following paragraphs review the main breakthroughs of the last 30 years.


The synthesis of songs is a story almost as old as speech synthesis itself: once systems were able to speak, their inventors invariably tried to use them for singing. Most of these early attempts, however, were very basic and far below the level of quality required for true musical applications. The use of the synthetic voice in musical works in the artistic domain dates back to the beginning of the 1980s, with the 'Chant' project developed at IRCAM and used by composers of contemporary classical music (Rodet et al. 1984). 'Chant' is based on rule-driven formant synthesis, like other well-known systems such as those developed at KTH in Stockholm (Berndtsson 1996) or at CCRMA, Stanford (Cook 1992). State-of-the-art descriptions of that period can be found in Cook (1996) and in Bennett and Rodet (1989). Such systems were capable of synthesizing sung vowels realistically, at the price of considerable studio work to analyse and adjust their settings. For singing, as for speech, the 1990s were marked by the generalization of corpus-based, data-driven approaches, driven by the impressive increase in the size of available speech corpora. Another approach to singing synthesis that found its way into professional-quality productions is based not on synthesis but on voice conversion. The first example was the merging of two voices, a male alto and a female soprano, to 'create' the voice of a castrato (male soprano) for the film "Farinelli"; examples can be found on YouTube and in Bennett and Rodet (1989). The resulting voice was of high quality and fully articulate; however, this was not synthesis but voice transformation. The 2000s were marked by the appearance of Vocaloid (Kenmochi and Ohchita 2007), the first singing voice synthesis software with articulate lyrics to achieve mainstream success; it has been marketed by Yamaha since 2003. During this period concatenative synthesis became widespread, and statistical parametric synthesis systems appeared for speech and, more marginally, for singing. The recent period is marked by a revival of interest in singing voice synthesis, with demand coming from composers as well as from the audio-visual and video game industries. Recent innovations include real-time controlled voice synthesis, for example in digital and laptop orchestras (the Chorus Digitalis project, Vox Tactum, meta-orchestras, laptop orchestras), and the use of various sensors in conjunction with a synthesizer.

4.3.7.2 Key Projects and applications

4.3.7.2.1 State of Current Technology
Singing voice synthesis systems use various techniques: parametric synthesis by rule; synthesis by concatenation and conversion of units (Bonada et al. 2001, Bonada et al. 2003, Macon et al. 1997a, Macon et al. 1997b); speech synthesis with voice conversion (Kenmochi and Ohchita 2007, Saitou et al. 2007); and statistical parametric synthesis based on Hidden Markov Models (HMMs) (Saino, Tachibana and Kenmochi 2010). Most of these systems are limited in their ambitions because they use only part of the technologies available to date. Indeed, the concatenation-based systems take into account only a small portion of the existing knowledge on singing synthesis (Berndtsson 1996, Zen et al. 2009) and essentially apply methods proven effective in speech synthesis. Synthesis by rules is limited by the number of sounds that can be manually crafted by human experts. The system

developed at IRCAM48 in the years 1980-1990, for example, was based on formant synthesis and could only generate vowels. It contained, however, a large number of rules regarding the properties and characteristics of sung vowels, and the quality of its output remains impressive today. Accurate knowledge and control of both the vocal tract and the source are required to achieve high-quality singing voice synthesis. The problem is even more difficult than in the case of speech synthesis, because of the larger number of contexts present in songs (musical notes, emotional content, etc.) and the raised expectations of the listener. The context of rare songs is even more problematic, since the corpora available for training and studying these kinds of songs, when they exist at all, are small and usually consist of low-quality recordings. Most current synthesis systems are limited to modern English, Spanish or Japanese songs. The rare-songs problem has essentially never been addressed by the research community or by companies.

4.3.7.2.2 Vocal Tract Control
In the case of the vocal tract, two important themes have been identified in the control problem. The first is the accurate estimation of the spectral envelope, addressed by the STRAIGHT (Kawahara, Masuda-Katsuse and de Cheveigne 1999) and True Envelope (Robel et al. 2007) frameworks. However, because of the rapid and fine variations of the source signal that are characteristic of the singing voice, a clear relation between the spectral envelope and the vocal tract shape is not easy to extract. This brings us to the second theme: the correlation between the spectral envelope and the value of the fundamental frequency. Achieving control over this phenomenon is crucial for the correct synthesis of vibrato (Lee and Dong 2011), as well as for the synthesis of new musical notes using a spectral envelope extracted in a known context (Villavicencio and Bonada 2010).
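As an illustration of the spectral-envelope theme, the sketch below estimates a rough envelope of one audio frame by low-pass liftering the real cepstrum. This is only a simplified stand-in for dedicated frameworks such as STRAIGHT or True Envelope; the frame length, lifter cutoff and synthetic test signal are arbitrary choices made for the example.

```python
import numpy as np

def cepstral_envelope(frame, sample_rate, lifter_hz=400.0):
    """Rough spectral-envelope estimate of one windowed audio frame.

    Low-pass lifters the real cepstrum so that only slow variations of the
    log spectrum (the envelope) survive; the fast harmonic ripple due to F0
    is removed. The lifter cutoff should lie above the expected F0.
    """
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)
    # Keep only the first n_keep quefrency bins (and their mirror image).
    n_keep = int(sample_rate / lifter_hz)
    liftered = np.zeros_like(cepstrum)
    liftered[:n_keep] = cepstrum[:n_keep]
    liftered[-n_keep + 1:] = cepstrum[-n_keep + 1:]
    return np.exp(np.fft.rfft(liftered).real)  # linear-magnitude envelope

# Toy usage on a synthetic "sung" frame: 220 Hz fundamental plus harmonics.
sr = 16000
t = np.arange(1024) / sr
frame = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 8))
env = cepstral_envelope(frame, sr)
print(env.shape)  # one envelope value per FFT bin up to Nyquist
```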

F0 Control
A number of studies address the generation of the control parameters for the fundamental frequency during singing. Two main approaches have been proposed: one uses deterministic models (second-order linear systems (Fujisaki et al. 2004) excited by the basic melody provided by a MIDI score (Saitou et al. 2005)), while the other relies on HMMs trained on natural F0 contours from a song corpus (Stevens 2000). The second-order linear system control model (Saitou et al. 2005, Stevens 2000) showed very little degradation in an analysis/synthesis framework. The proposed system uses three different sub-systems to model the vibrato, overshoot and preparation phenomena present in the F0 contour of the sung voice. A psychoacoustic experiment run on a copy-synthesis system, in which the pitch contour was modified with isolated behaviour types, showed overshoot to be the most important contributor to the perceived quality of the song; the summation of all the synthesized behaviours was perceived as being close to the natural reference. Two recent studies have investigated the use of HMMs for the control of sung pitch contours. In Saino et al. (2010), HMMs are used to model both the behaviour type and the pitch contour during note production; psychoacoustic tests showed a clear improvement over the standard Vocaloid control model. In Ohishi et al. (2008), HMMs model the command signal of the sung pitch contour, which is then passed through a second-order system to recreate its fine structure. No listening tests were conducted, but a comparison between the estimated and natural signals showed promising results.
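To make the deterministic approach more concrete, the sketch below passes a stepwise target pitch through a damped second-order linear system, which naturally produces the overshoot behaviour mentioned above. The parameter values (natural frequency, damping, frame rate) are purely illustrative and are not taken from Saitou et al.; vibrato and preparation would require additional components.

```python
import numpy as np

def second_order_f0(target_f0, fs=200.0, omega=40.0, zeta=0.6):
    """Pass a stepwise target-F0 contour through a 2nd-order linear system.

    target_f0 : per-sample target pitch in Hz (e.g. held notes from a score)
    omega     : natural angular frequency (rad/s) -> speed of the transition
    zeta      : damping ratio; zeta < 1 produces the overshoot typical of
                sung note attacks (values here are illustrative only).
    """
    y = np.zeros_like(target_f0, dtype=float)   # output F0 contour
    dy = 0.0                                    # its first derivative
    dt = 1.0 / fs
    y[0] = target_f0[0]
    for n in range(1, len(target_f0)):
        # y'' + 2*zeta*omega*y' + omega^2*y = omega^2 * target
        ddy = omega**2 * (target_f0[n] - y[n - 1]) - 2 * zeta * omega * dy
        dy += ddy * dt
        y[n] = y[n - 1] + dy * dt
    return y

# Toy score: 0.5 s at 220 Hz (A3) followed by 0.5 s at 330 Hz (E4).
target = np.concatenate([np.full(100, 220.0), np.full(100, 330.0)])
contour = second_order_f0(target)
print(round(contour.max(), 1), "Hz peak -> overshoot above the 330 Hz target")
```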

48 See IRCAM (http://anasynth.ircam.fr/home/media/singing-synthesis-chant-program) for a demo


Source Control
Although efforts have been made towards modelling the singing voice source (Lu and Smith 2000, Roebel et al. 2012) and towards recording corpora for the study of a wider variety of voice qualities (Proutskova et al. 2012), there are only a few examples of source-parameter modelling in SVS systems (Roebel et al. 2012). In the field of parametric speech synthesis, using more complex and controllable models of the glottal source has given good results. In Huber et al. (2012), the Rd parameter was used to control the Liljencrants-Fant model in a speaker conversion application, with positive results. In Nambu et al. (2010), the authors attempt a joint stochastic analysis of the vocal tract and glottal flow, with applications in speaker conversion. Attempts have also been made to use source models to improve HMM-based speech synthesis systems. In Cabral et al. (2011), the parameters of the glottal flow are used to train statistical models and to reconstruct the glottal pulse at synthesis time. In Raitio et al. (2011), the authors use a database of residual frames to excite the filter generated by the HMMs; another non-parametric model of the glottal flow is used in Drugman and Dutoit (2012). All the cited approaches show improvements in voice quality.
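To illustrate what a parametric glottal source looks like, the sketch below generates a pulse train from a simple Rosenberg-style model rather than the Liljencrants-Fant model discussed above; the parameter names and default values are common textbook choices, not those of any cited system.

```python
import numpy as np

def rosenberg_pulse_train(f0, duration, sr=16000, open_quotient=0.6,
                          speed_quotient=2.0):
    """Glottal-flow pulse train from a simple Rosenberg-style model.

    open_quotient  : fraction of the period during which the glottis is open
    speed_quotient : ratio of opening time to closing time
    (A rough, illustrative stand-in for richer models such as
    Liljencrants-Fant; not a component of any specific system.)
    """
    period = int(round(sr / f0))                 # samples per glottal cycle
    t_open = open_quotient * period
    t_p = t_open * speed_quotient / (1 + speed_quotient)   # opening phase
    t_n = t_open - t_p                                      # closing phase

    n = np.arange(period, dtype=float)
    pulse = np.zeros(period)
    rising = n <= t_p
    falling = (n > t_p) & (n <= t_p + t_n)
    pulse[rising] = 0.5 * (1 - np.cos(np.pi * n[rising] / t_p))
    pulse[falling] = np.cos(np.pi * (n[falling] - t_p) / (2 * t_n))
    # Remaining samples stay zero: the closed phase of the cycle.

    n_periods = int(duration * f0)
    return np.tile(pulse, n_periods)

source = rosenberg_pulse_train(f0=150.0, duration=0.5)
print(len(source), "samples of glottal flow; feed into a vocal-tract filter")
```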

Speech-to-Singing
An important trend in singing voice synthesis is Speech-to-Singing conversion. It allows the techniques developed for TTS systems to be used to generate the underlying spoken message, upon which conversion techniques are applied to give the voice a sung quality. Such conversion systems are limited because the base signal (speech) is too far from the singing signal to yield a high-quality song. Although HMM synthesis allows easy adaptation of voice parameters, it does not yet produce exceptional voice quality and is therefore hardly usable for high-quality productions; a first singing synthesis system using this technology has nevertheless been proposed (Saino and Kenmochi 2010, Stevens 2000), limited to children's voices. Recently, several conversion techniques have been applied to transform spoken lyrics into songs. One approach makes use of the established family of techniques used to adapt HMM voices to a given target (Gales 2000, Visweswariah et al. 2002); in this direction, the technique described in Kim et al. (2011) was applied with satisfactory results to adapt a general-purpose HMM speech synthesis voice to a singing voice (Sung et al. 2011). GMM conversion has also been proposed for concatenative SVS systems (Villavicencio and Bonada 2011), showing good results for in-range conversion. The use of inferred rules and models of singing behaviour (Saitou et al. 2005) has also been proposed (Saitou et al. 2007) as a method to convert a read message into song: the method manipulates the intonation, duration and spectral envelope of the read message and shows good results for simple melodies.
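As a toy illustration of the duration and intonation manipulation involved in speech-to-singing conversion, the sketch below derives a time-stretch factor and a flat target pitch for each phone from a note list. The data classes, the one-note-per-phone alignment and the absence of vibrato modelling are simplifying assumptions made only for the example.

```python
from dataclasses import dataclass

@dataclass
class Phone:
    label: str
    spoken_dur: float   # seconds, as measured in the read/spoken version

@dataclass
class Note:
    midi: int           # MIDI note number from the score
    dur: float          # seconds, derived from the score tempo

def speech_to_singing_targets(phones, notes):
    """Derive per-phone stretch factors and target pitches.

    Assumes a toy alignment where the i-th phone is sung on the i-th note
    (padded with the last note). Real systems use much finer alignment and
    add vibrato/overshoot models on top of these flat targets.
    """
    targets = []
    for i, phone in enumerate(phones):
        note = notes[min(i, len(notes) - 1)]
        stretch = note.dur / phone.spoken_dur             # time-scale factor
        target_f0 = 440.0 * 2 ** ((note.midi - 69) / 12)  # Hz from MIDI
        targets.append((phone.label, stretch, round(target_f0, 1)))
    return targets

phones = [Phone("a", 0.12), Phone("m", 0.08), Phone("e", 0.15)]
notes = [Note(69, 0.50), Note(71, 0.25), Note(72, 0.75)]
for label, stretch, f0 in speech_to_singing_targets(phones, notes):
    print(f"{label}: stretch x{stretch:.1f}, target {f0} Hz")
```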

4.3.7.2.3 Projects
Following the analysis of a global database of patents and proceedings, we investigated whether the research carried out in this framework (singing TTS) is hampered by intellectual property belonging to an existing company. Most of the patents found in this category49 have been filed by Yamaha. The Yamaha Corporation has been active in synthesis since the beginning of the 1980s and has heavily protected its research. Microsoft also holds a patent in this area50. Two observations need to be made about these patents. On the one hand, a large part of them concern the synthesis method itself, that is to say the unit-selection approach. In the i-Treasures project, we will investigate and compare another technology, statistical parametric synthesis.

49 Patents: FR2849951, EP2270773, US20110000360, JP2010169889, JP2007272242, JP2007226174, JP2007240564, WO200484174, EP1455340, JP2004258561, JP2004061793, US20040006472, US20030159568, JP2003323188, EP1220194, JP2008170592, JP2006330625, JP2006330615, JP2005275420, JP2005018097, JP2004061753, JP2004004440, EP1220195, JP2009075611, JP2006119655, US20060173676, EP1505570, JP2004077608 50 Patent: Yao, Q. and Soong F. (2011). Synthesized singing voice waveform generator. US7977562 B2.


In Bonada and Saino (2011)51, Yamaha uses HMM technology, but only for pitch contour generation. On the other hand, Yamaha's patents systematically assume the use of their software in the context of synthesizing songs from text. As we will use neither the Yamaha software nor any patented technology for the singing TTS system, the use of this technology to produce traditional songs appears sufficiently distinct from their patents. As its name suggests, singing voice synthesis technology is largely based on regular speech synthesis techniques. In order to reproduce a song, the system needs to generate the lexical content under the constraints of segment duration and pitch provided by the score. As in the case of speech synthesis, there are two major approaches to the problem. The first is based on the concatenation of recorded sounds and is generically called unit selection. These algorithms use large databases of pre-recorded sounds, usually from a single speaker, stored with minimal quality loss; the segments in the database are selected and concatenated at synthesis time to render the desired message. This approach has been used for the synthesis of singing (Bonada et al. 2003, Meron 1999), leading to an intelligible artificial singer whose expression and vocal production are, however, somewhat caricatured (limited to the provided song style) and not controllable by the user. The second approach uses the Hidden Markov Model (HMM) framework to train statistical models of the acoustic features of phonetic units. These algorithms were initially developed for ASR applications and have gradually moved to speech synthesis, prized in particular for their low memory footprint and highly intelligible output (Tokuda et al. 2000). This technique has also been used for the synthesis of the singing voice, particularly in a Japanese system (Oura et al. 2010). In all cases, these systems are not interactive. In the next section, we describe the different technologies, their limitations and possible improvements in more detail.
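The unit-selection principle mentioned above (choosing database units that minimise the sum of target and concatenation costs) can be illustrated with a small dynamic-programming search. The cost functions and the tiny "database" below are invented for the example and bear no relation to any commercial system.

```python
import numpy as np

def select_units(target_specs, candidates, target_cost, concat_cost):
    """Toy unit selection via dynamic programming (Viterbi search).

    target_specs : list of desired unit specifications (e.g. phone + pitch)
    candidates   : list (same length) of lists of available database units
    target_cost  : f(spec, unit) -> how badly a unit matches the spec
    concat_cost  : f(unit, unit) -> how badly two units join
    Returns the cheapest sequence of units, one per target position.
    """
    n = len(target_specs)
    best = [[target_cost(target_specs[0], u) for u in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for i in range(1, n):
        row_cost, row_back = [], []
        for u in candidates[i]:
            tc = target_cost(target_specs[i], u)
            joins = [best[i - 1][j] + concat_cost(candidates[i - 1][j], u)
                     for j in range(len(candidates[i - 1]))]
            j_best = int(np.argmin(joins))
            row_cost.append(tc + joins[j_best])
            row_back.append(j_best)
        best.append(row_cost)
        back.append(row_back)
    # Trace back the cheapest path.
    idx = int(np.argmin(best[-1]))
    path = [idx]
    for i in range(n - 1, 0, -1):
        idx = back[i][idx]
        path.append(idx)
    path.reverse()
    return [candidates[i][path[i]] for i in range(n)]

# Toy usage: units are (phone, pitch_in_Hz) pairs pulled from a "database".
specs = [("a", 220.0), ("m", 220.0), ("e", 247.0)]
cands = [[("a", 210.0), ("a", 230.0)], [("m", 220.0)], [("e", 200.0), ("e", 245.0)]]
t_cost = lambda spec, u: abs(spec[1] - u[1])        # pitch mismatch
c_cost = lambda u1, u2: abs(u1[1] - u2[1]) * 0.1    # pitch jump at the join
print(select_units(specs, cands, t_cost, c_cost))
```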

4.3.7.3 Possible Use in Intangible Heritage Preservation and Transmission
Even though commercial SVS systems are available and seem quite successful in simple applications, their quality is not yet high enough for applications where the virtuosity of the singer is expected (e.g. opera, jazz, folkloric music, etc.). Academic systems have also been limited to simple applications so far. Although we do not expect to reach the level of naturalness and control of a professional human singer, we intend to provide additional stepping stones in this direction. We hope that the knowledge gained by improving the production and control models needed to reproduce the features unique to the studied rare singing techniques will benefit the general understanding and reproduction of human voice production. There are two main applications in which SVS could help preserve singing cultural heritage. The first is an educational tool, assisting pupils in their effort to master the skill by providing examples. As all three traditional singing styles studied in the i-Treasures project (Byzantine, Corsican and Sardinian music) are polyphonic, the system could also assist pupils by filling in for missing partners. We were not able to find any scientific studies regarding the use of

51 Patent: Bonada, J. and Saino K.( 2011). Apparatus and Method for Creating Singing Synthesizing Database, and Pitch Curve Generation Apparatus and Method. US20110000360 A1.

virtual tools in choir singing training. However, there is one online choir-training platform52 providing free resources, and the use of virtual tools for assisting second language learning has been well documented (Ranalli 2011, Wik 2011, Wik and Hjalmarsson 2009). A second envisioned application is a performative tool. In this scenario, the system performs as an "instrument" that can be "played" by the user. A system that uses haptic input devices (tablets, joysticks, etc.) to control the parameters of synthetic voices, both of the source and of the vocal tract, in real time is presented in Le Beux et al. (2011). The authors experimented with performing chironomic choral songs; the instrument proved easy to learn and viable, with players achieving good accuracy in synchronization and good mastery of intonation and ornamentation. Similar approaches treating the synthetic voice as an instrument have been reported in D'Alessandro et al. (2005, 2006) and in Bonada and Serra (2007). Another difficulty is the lack of linguistic resources for two of the targeted languages, Sardinian and Corsican: no previous attempt to implement any speech technology solution has been reported for these two languages. The Greek language is much better represented in this respect, with numerous commercial and academic implementations of speech recognition (Digalakis et al. 2003, Nuance Communications Inc. 2012a) and TTS systems (Karabetsos et al. 2010, Lazaridis et al. 2010, Nuance Communications Inc. 2012b). Acapela has already developed a Greek TTS system53 which can provide the linguistic resources needed for the development of the Byzantine SVS system. There is even a reported implementation of an SVS system for the Greek language (Kyritsi et al. 2007); the use of the MBROLA synthesis technique (Dutoit and Leich 1993), coupled with a simplistic pitch contour model, limited that system's applicability to simple popular songs, but it remains relevant to our future research because it is the only SVS system available in any of the targeted languages.

4.4 Results and Way Forward: Emerging requirements
From the above, it is clear that the advanced technological methods discussed earlier have so far seen limited use for the preservation or transmission of intangible heritage. At the same time, there is evidence from previous projects and studies that there are numerous opportunities to employ these technologies in cultural heritage preservation, with a particular focus on transmission and education. Although technology cannot replace human interaction, there is significant scope to develop activities that on the one hand document and preserve the knowledge of rare songs, dances, composition and craftsmanship, and on the other ensure the transmission of this knowledge to younger generations. In this sense, technology can help sustain the knowledge of the past and enable its transmission to future generations. For this reason, it is important to ensure the active involvement of cultural practitioners in the development of i-Treasures. Rather than a threat, new technologies constitute a great opportunity for the documentation and dissemination of intangible heritage. Instead of focusing only on documentation, the combination of the technologies discussed above can create a novel approach to the safeguarding of intangible heritage that is primarily focused on education, training and pedagogical interaction. In the case of rare traditional singing, the combination of these technologies can contribute significantly not only to the documentation of the knowledge of singing, but

52 www.choralia.net 53 http://www.acapela-group.com/text-to-speech-interactive-demo.html

also to its dissemination to a broader audience. Facial expression analysis technologies could enable the detailed recording of the singers' expression and singing technique, and EEG analysis could provide information about the performers' emotional state. Vocal tract sensing technologies could be used to document the various configurations of the vocal tract, and motion capture technologies could give an indication of the performers' body movements. Text-to-Song technology could provide sophisticated software both for an educational tool to be used in learning scenarios and for a 'performative tool' that can be played like a traditional musical instrument.
With respect to dance, motion capture technologies can provide a detailed representation of the movement of the human body in performance, bringing new insights into motional and gestural aspects whose examination is not always possible due to complex outfits and costumes. EEG analysis and facial expression analysis can contribute to the examination of the emotional state of the dancers while performing. Moreover, 3D visualisation can enable the representation of dance movements in a 3D, sensorimotor learning context.
Regarding craftsmanship and pottery, motion capture can again be used for the detailed documentation of hand and finger movement during the creation process. As discussed earlier, 3D visualization can provide educational opportunities for virtual learning scenarios. Concerning contemporary music composition, technologies such as motion capture and EEG could potentially provide combined information on finger movement and emotional condition. In addition, facial expression analysis could give insight into how the creative process is mirrored in the composer's face. Finally, in all use cases multimodal semantic analysis can enable the combination of different levels of information and data for documentation and subsequent use in learning and training activities.
As the world becomes increasingly dependent on digital resources, there is an important opportunity to develop a platform that enhances the transmission of traditional knowledge and skills by exploiting current developments in digital technologies. In this process, technology is no longer a threat to the survival of customs and traditions, but a tool for their sustained development in an increasingly global 21st century. As anticipated, in the remainder of this document we analyse the cultural expressions taken into account by the project and, based on this analysis together with the state of the art reported above, we define the user and system requirements for the i-Treasures platform.


5. Knowledge domain definition

5.1 Objectives and rationale
As already mentioned in Section 3, the knowledge domain definition was considered a crucial phase of the process. Acquiring knowledge about the application domain of the system to be developed is an essential step of software engineering, because knowing the "problem" helps software engineers elicit requirements at an adequately complete and detailed level. This is particularly true in i-Treasures, where we deal with very different and very complex domains, ranging from various forms of rare singing and rare dancing to contemporary music composition, contemporary dance and even some forms of craftsmanship. The acquisition of a deeper knowledge of the sub-use cases was therefore identified as the first step towards the definition of requirements. This encompassed the adoption of three main tools: a glossary, to disambiguate many terms used in the dialogue among the stakeholders; a questionnaire to be delivered to the experts, to analyse each cultural expression and understand its main traits and characteristics; and an interview to be conducted with some of the experts, to better understand teaching and learning practices and needs. These three tools are described separately, and their main results are provided in the following sections. Before the analysis of the tools, though, a section is dedicated to the setting up of the user and expert groups in the various sub-use case communities, who will constitute our main interlocutors in all the following phases of the project.

5.2 Experts' and Users' Groups setting up
As defined in the DoW, the constitution of experts' and users' groups is a crucial element of the i-Treasures project. Potential users are expected to contribute significantly to the specification of user requirements (WP2), the demonstration of applications (WP6) and the evaluation of system performance (WP7), by sharing their valuable expertise and experience, expressing their needs and concerns and proposing their ideas. Four main types of user profiles have been identified:
 expert
 apprentice
 basic user
 researcher.
The establishment of the groups is an ongoing task, which will be carried out throughout the 4 years of the project. In the following, attention is focused in particular on the establishment of the Expert Groups, as this is the group most involved in the process of requirements definition.
a) Establishment of Expert types
First of all, a list of different types of Experts was established. Here is an indicative list of such Expert types per use case:


 Rare singing: conservatoires, music schools, university departments of music, associations of singers and musicians, churches and monasteries (for Byzantine music), individual singers.
 Rare dancing: traditional dance clubs and schools, university departments of physical education and sport science, individual performers.
 Craftsmanship: wheel-throwing associations, museums, individual craftsmen, etc.
 Contemporary music composition: societies of music composers, conservatoires, university departments of music science and art, performers in music theatre, individual composers.
b) First contact with the Experts
Each partner responsible for the constitution of groups per use case and sub-use case (see Table 2-1 for partners' responsibilities) conducted actions and organized meetings and presentations of the i-Treasures project to different potential Expert Groups, such as the Associazione Tenores Sardegna, the Filathonitai byzantine choir and the Pas d'la Yau dance association. Many of these Experts were involved in the compilation of the questionnaire.
c) Selection of a sub-set of Experts for closer collaboration
After the preliminary contact with the Experts, it was important to define a set of criteria for a stricter selection of Experts in view of closer collaboration (Experts-Consultants). One of the main criteria was the acceptability of the project to the Experts: their motivation, their interest in the i-Treasures project and their understanding of its goals. The acceptability issue is described in detail below. Another important criterion is the level of expertise: all the experts selected should have significant experience in their field (on average 8-10 years) as performers or teachers. Alternatively, they should have received awards for their work and/or play an important role in the preservation and transmission of the rare know-how at local, regional or even national level. Unfortunately, it is very likely that some of these Experts will have limited availability, so it will be necessary to pay attention also to this more practical criterion, i.e. the availability of the selected Experts to participate in and contribute to the i-Treasures project. A list of the existing Expert Groups, categorized by profile / type of organisation (associations, pedagogical institutions, independent experts, apprentices, etc.) can be found in Appendix 10.2.

5.2.1 Preliminary feedback from the Expert Groups
After the meetings with the different organizations and with the independent Experts, some preliminary reflections concerning the acceptability of the project can be made. Generally speaking, acceptability is high and users' reactions are positive, although two types of barriers have been identified:
a) Psychological barriers
For some of the experts it is difficult to accept the idea of using new technologies for the preservation and transmission of their skills and know-how. The role of the i-Treasures partners in this case is to reassure them that technologies are not meant to replace the human being, but rather to provide innovative and user-friendly tools to support learning and teaching processes.


b) Practical problems
As mentioned above, the availability of some of the Experts is low, and this shall be taken into consideration, especially for the selection of the Experts-Consultants. Another practical issue is geographical distance, but for meetings and discussions this can be addressed with telecommunication tools.

5.2.2 Next Actions
Next actions concern mostly the enlargement of the existing User Groups. It is also necessary to complete the preparation of the contracts to be signed between partners and Experts-Consultants, defining issues such as intellectual property rights, ethical issues (use of invasive technologies, EEG, etc.) and image use/distribution rights.

5.3 The Glossary
From the very beginning of the project, and especially during the preliminary steps of the knowledge domain definition process (see next sections), it was clear that - due to the high level of interdisciplinarity of the Consortium - it was necessary to share terminology and meanings, so as to avoid inconsistencies and misunderstandings. Besides, given that the ICH preservation and education research field still needs to be consolidated and its research community is still taking shape, the Glossary is also a means to define common concepts and boundaries in view of future work.

5.3.1 Glossary design
Since the i-Treasures Glossary is meant to be the result of a joint effort by all the partners, thus reflecting the wide variety of expertise present in the Consortium, the tool chosen to implement it was a wiki, as this allows all the partners to easily add, integrate, modify and review definitions. The i-Treasures Glossary will encompass definitions for:
 'wide' terms that are key to the project and to the research field in general (e.g. ICH, LHT);
 technical terms that are relevant for the project and need a very precise and in-depth understanding (e.g. sensorimotor learning, Text-to-Song);
 terms that are very specific to a single research domain but need to be disambiguated because they lie at the boundary with other research domains (e.g. educational scenario and LMS, typical terms in the field of TEL; a cappella, a term typical of the music world).

5.3.2 Glossary realization
The wiki has been set up by CERTH with the support of ITD-CNR, which has also developed guidelines to support partners in the use of the wiki. Presently, the structure of the Glossary encompasses definitions and sources: each page can include a basic definition (with its sources), as well as alternative definitions (with further sources) (see Figure 5-1).


Figure 5-1 - Example of definition in the Glossary

The completion of the Glossary is an ongoing process that has just started; the Glossary will be updated throughout the project. It is available through the project website at: http://www.i-treasures.eu/mediawiki/index.php/Main_Page.

5.4 Questionnaires to the Experts
The analysis of each sub-use case was a necessary condition for understanding the needs that shall then guide the development of the i-Treasures system. Interaction with the persons who possess the knowledge and skills to perform these cultural expressions (namely the Experts) was therefore considered, at this first stage, the best way to define the boundaries and characteristics of the various ICH forms. Given that the project addresses ten different sub-use cases, it was necessary to conceive a common framework allowing the various cultural expressions to be analysed in a homogeneous way, while respecting the peculiarities of each context. Besides, given that the local communities of the various sub-use cases are very different in nature (in terms of size of the expert communities, average educational level, digital literacy skills, etc.), it was also necessary to allow a certain flexibility in the way the interactions between the Consortium and the experts were managed. This led us to conceive a general framework for the description of any ICH form, which was then taken up and customized by the different sub-use case leaders54, according to the specificities of each context and target population. The result of this customization process was the construction of ten questionnaires (one for each sub-use case), which are all based on the common framework but also contain specific questions. The ten questionnaires were then delivered to the experts in different modes (i.e. online vs. face-to-face), according to the local contexts and needs. In the following, the document first describes the common framework conceived by the Consortium, then the customization process, which was based on collaboration among the partners, and then how the questionnaires were delivered in the various situations. Finally, we provide an analysis of each sub-use case, building on the data gathered through the questionnaires.

54 Partners of the project responsible for a certain sub-use case (see Section 2.3).


5.4.1 Dimensions for knowledge domains definition
In order to support the description and analysis of the ten specific artistic expressions, the Consortium defined a common framework, composed of a set of dimensions which can - in principle - be used to describe any artistic expression. The framework was first conceived and proposed by CNR and then discussed among the partners. It encompasses:
 General information about the cultural expression
 Physical dimension
 Emotional dimension
 Social dimension
 Knowledge and meta-knowledge dimension
 Context/environment dimension
 Teaching and learning dimension
 Value
The General information about a specific artistic expression should identify the domain where the expression is rooted (dancing, singing, etc.) and give an overview of its main characteristics, clarifying its historical and geographical origins, etc. The physical dimension describes how the performer should use the body, which specific parts of the body are involved, how, etc. The emotional dimension relates to the performer's feelings and affective states during the performance. The social dimension has to do with the relationships (if any) the performer has with the other people involved in the performance (other performers, audience, etc.). The knowledge and meta-knowledge dimension includes both the theory and practice (notions, techniques, styles, etc.) the performer needs to master, the elements the performer needs to plan prior to the performance, and those s/he will need to keep under control and tune during the performance itself. The context/environment dimension describes the place where the artistic expression is usually carried out, its main characteristics, and the tools, costumes, etc. the performer needs to use. The teaching and learning dimension investigates how the cultural expression is traditionally 'taught' or 'transmitted', whether there is an official training path to be followed (with schools, teachers, etc.) or whether learning occurs through informal methods (observation, apprenticeship, etc.). Lastly, the value dimension should highlight the aspects of each cultural tradition that experts and local communities consider valuable, and the reasons why they think it is important to safeguard and preserve that specific cultural expression. Exemplar questions for each dimension have been proposed by CNR, in order to guide each sub-use case leader in the development of its own questionnaire in support of the knowledge elicitation process (see Table 5-1):


General info
Provide a short description of the artistic expression (genre, basic features, etc.)
What are the main distinctive traits characterizing the artistic expression?
What are the origins of the artistic expression? Where is it geographically located? Can you describe the historical context where it originated?
Can you give an idea of the level of diffusion of this artistic expression today (e.g. in terms of number of performers/groups, etc.)?
Please provide any material, reference, etc. documenting the artistic expression (documents, images, videos, URLs, etc.).
...

Physical
What parts of the body does the performer use?
What are the movements the performer will need to do? How will the performer enact these movements (rhythm, sequence, etc.)?
Please synthesize the key physical elements that are peculiar of / better identify the artistic expression (choose one or more options):
  Body: motion trajectories / body postures / gesture co-articulation / other (please specify)
  Head: facial expression / optical flow / vocal tract / mouth / other (please specify)
  Hands: hand postures / gesture co-articulation / finger motions / other (please specify)
  Voice: acoustic signal / other (please specify)
...

Emotional
Is there any mental or emotional attitude required of the performer during the performance?
What are the performer's feelings while performing? How do feelings impact on / how are they reflected in the performance?
...

Social
Does the performer usually perform alone or together with other people?
What kind of relationship (if any) does the performer need to have with the other performers?
What kind of relationship (if any) does the performer have with the audience?
...

Knowledge
What are the main notions the performer needs?
What are the practical skills the performer needs?
What are the techniques/styles the performer needs to master?
What elements should the performer be able to keep under control / tune during the performance?
...

Context/environment
Where does the performer usually perform?
Does the environment need to be specifically configured for the performance?
Does the performer need any specific tools/instruments to carry out the performance?
Does the performer use any specific personal equipment during the performance (dress, etc.)?
Where is the audience placed?
...

Teaching and learning
So far, how do people learn this ICH (by imitation, through dedicated training initiatives, etc.)?
Where does this mainly happen (in informal settings, in formal educational settings, etc.)?
What is the typical learning path to be followed by a learner (stages, duration, apprenticeship, etc.)?
Are there people officially entitled to teach ('teachers') or is this delegated to practitioners (for example LHTs)?
What are the most important elements the learner has to focus on / learn?
...

Value
What is - according to your perspective - the real 'value' of this artistic expression (historical value, economic value, innovation value, uniqueness value, cultural value, etc.)?
Why do you think this artistic expression deserves to be safeguarded and preserved?
What is - according to local communities - the real 'value' of this artistic expression?
...

Table 5-1 Exemplar questions for each dimension

This first attempt at instantiating the dimensions in different questions confirmed that the relevance of the dimensions can vary depending on the use case and/or sub-use case considered: for example, we might expect the 'physical' dimension to be highly relevant for the dance use case as a whole, while the 'context/environment' dimension might turn out to be of low relevance, or differently relevant depending on the kind of dance.

5.4.2 Questionnaires preparation
In order to support the partners in taking up the framework and customizing it into specific questionnaires for each sub-use case, ITD-CNR proposed a collaborative process to be carried out among the partners. Collaboration was fostered through the adoption of 'peer review', a very effective collaborative technique usually adopted to foster the social construction of knowledge55. In order to allow dialogue and collaboration, four forums (one for each use case) were created in the platform, so as to allow a quick and direct exchange among the partners. The following sequence of steps was set up and agreed with the partners:

55 As its name suggests, the peer review envisages learner analysis of an artefact produced by someone else. Usually, the process includes three phases: during the first stage learners produce an artefact (a document, a map, a hypertext, etc.); then they are asked to provide feedback on the work done by their colleagues; during the third stage, learners revise their original product according to the feedback received. During a peer review, a reciprocal teaching approach is stressed, where one’s own interpretations of reality are to be faced and compared with those of others [Pozzi F. & Persico D. (Eds.) (2011), Techniques for fostering collaboration in online learning communities: theoretical and practical perspectives, IGI Global].


1. Preparation of a first draft of the questionnaire: during this first step, sub-use case leaders were asked to customize/adapt the exemplar questions to their own sub-use case and post a preliminary proposal in the forum.
2. Internal peer review for each use case: partners involved in the same use case were asked to read the lists of questions produced by the others and provide feedback. During this revision phase, some of the experts were also contacted and provided useful observations and comments on the questionnaire structure, the formulation of single questions, etc.
3. Questionnaire refinement: during this phase, sub-use case leaders had the chance to revise their own list of questions according to the suggestions, inputs and ideas received.
4. Final release: at this last stage, partners were asked to upload the final version of the questionnaire onto the platform. Since the questionnaires were created in English, a translation into the language of the ICH was also expected.
This process was preceded by a familiarization phase (given that this was the first collaborative effort of the Consortium), during which partners were invited to introduce themselves in the forum. The activities on the platform thus followed the sequence represented in Figure 5-1.

Figure 5-1 Questionnaire preparation process

In each forum the activities were carried out by the sub-use case leaders and continuously monitored and facilitated by ITD-CNR. Participation in the forums was high for all the use cases in the different phases of the process, with 174 posts overall. The most active forums were Craftsmanship and Singing, with 49 and 45 posts respectively. Individual participation in the forums was high as well: 20 people wrote at least one post, and they are fairly evenly distributed across the use cases (the forums hosting most of the people who contributed at least one post were Craftsmanship and Singing). Figure 5-2 shows the number of posts in the forums for each phase of the questionnaire development.


Figure 5-2 Number of posts in the forum for each phase of the questionnaire development

As shown in Figure 5-2, the levels of participation in the various phases differed across the four use cases. Partners were all very active during the initial familiarization phase in all the forums (blue), and the same happened for the first release of the questionnaire (orange), during which we had about the same number of posts in three of the four forums (the exception being the Contemporary music composition forum, where exchanges could not occur because only one partner is responsible for that use case). The peer review phase (grey) registered different levels of participation in the four use cases, depending on the decision, taken during the previous phase, to have one common questionnaire or several. When partners agreed that it was more reasonable to have a common questionnaire (the Dance use case), they collaborated directly on the co-production of a single questionnaire, thus limiting the peer review phase to the interactions with the experts, which occurred outside the forum. It is noteworthy that the forums registered a peak of activity in all the use cases in the last phase (yellow), during which the questionnaires were finalized and released; this means that in the end the leaders of all the sub-use cases took advantage of the overall process and felt the need to tune and modify their questionnaires. From a more qualitative analysis of the collaboration that took place during this activity, it is interesting to note that within the various use cases the process followed different paths, due to the peculiarities of each ICH form. The different paths are represented in Figure 5-3.


A. Singing

B. Dancing

Figure 5-3 Questionnaires versioning (to be continued)


C. Craftsmanship

D. Contemporary Music Composition

Figure 5-3 Questionnaires versioning
While within the rare singing use case each sub-use case leader developed its own questionnaire and collaborated during the peer review phase in order to harmonize and improve the four documents (Figure 5-3, A. Singing), in the Dance use case one draft was enriched collaboratively by all the partners (Figure 5-3, B. Dancing). The craftsmanship and contemporary music composition use cases instead adopted a sort of 'mixed' approach (see Figure 5-3, C. Craftsmanship and D. Contemporary music).


In parallel to the process taking place in the forums, some experts were involved by some of the sub-use case leaders to give feedback on the questionnaires. In some cases, for example for the Canto a Tenore and the Walloon dance, the experts suggested creating two versions of the questionnaire: a more extended version to be delivered, for example, to the experts/consultants, and a lighter version for other experts who might have difficulties with the longer questionnaire (due to lack of time, educational background, etc.).

5.4.3 Questionnaires release and delivery
At the end of the process described above, a set of questionnaires was prepared and delivered in the different use cases (see Table 5-2):

Rare singing (four different questionnaires):
 Cantu a Tenore: long and short versions; English/Italian; face to face
 Cantu in Paghjella: single version; English/French; face to face / online
 Byzantine music: single version; English/Greek; face to face
 Human Beat Box: single version; English/French; e-mail and Facebook community

Rare dancing (one common questionnaire):
 Căluş: long version; English/Romanian; face to face / online
 Tsamiko: long version; English/Greek; face to face / online
 Walloon dance: long and short versions; English/French; face to face and e-mail
 Contemporary dance: long version; English/French; face to face

Craftsmanship (one common questionnaire):
 French pottery: single version; English/French; e-mail
 Greek pottery: single version; English/Greek; face to face / e-mail / online
 Turkish pottery: single version; English/Turkish; face to face

Contemporary music composition (one common questionnaire):
 Contemporary music composition: single version; English/French; face to face

Table 5-2 Questionnaire versions, languages, and delivery methods


As to the question types used, the questionnaires included:
 long and short free-text questions;
 Likert scales (from 1 to 5): these questions were aimed at assessing the importance (frequency, level of complexity, etc.) of different aspects (e.g. posture and gestures), or at registering the respondents' level of agreement with predefined statements about other aspects (e.g. perception of the value, etc.);
 multiple choice questions;
 matrix questions with short text boxes: identical response categories assigned to multiple items;
 rank order questions: a set of items to be ranked according to a specific attribute or characteristic.
For an overview of the structure of the questionnaires and for the complete versions of all the questionnaires see Appendix 10.3.

5.4.4 Questionnaires results
This Section contains the analysis of the ten sub-use cases considered in the project. The analysis is the result of the collection and elaboration of the data coming from the questionnaires described so far. Descriptive statistics are reported in Appendix 10.4.
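As an illustration of the kind of descriptive statistics compiled for the Likert items (assuming answers coded 1 to 5; the data below is invented for the example), a per-question summary could be computed as follows:

```python
import numpy as np

def likert_summary(responses):
    """Descriptive statistics for one 1-5 Likert item.

    responses : list/array of integer answers (1 = lowest, 5 = highest).
    Returns sample size, mean, median and the percentage of respondents
    per level, i.e. the kind of figures reported per question.
    """
    r = np.asarray(responses)
    counts = {level: int((r == level).sum()) for level in range(1, 6)}
    percent = {level: 100.0 * c / len(r) for level, c in counts.items()}
    return {"n": len(r), "mean": float(r.mean()),
            "median": float(np.median(r)), "percent": percent}

# Made-up answers to a single Likert item on a 1-5 scale.
answers = [5, 4, 5, 3, 4, 5, 5, 2, 4, 5]
print(likert_summary(answers))
```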

5.4.4.1 Sardinian Canto a Tenore

5.4.4.1.1 General Information
In 2005 the Canto a Tenore was proclaimed an 'Intangible Cultural Heritage of Humanity' by UNESCO. In the local dialect of several villages of central Sardinia, the word "tenore" is closely related in meaning to the Italian word for "choir". Canto a Tenore is a form of polyphonic singing performed by a group of four men who perform separate and specialized parts. They are characterized not only by different vocal registers and timbres, but also by freedom of movement and by the role they assume during the interpretation of the chants. Generally speaking, tenor singing can be described as solo singing accompanied "with chords" ("corfos") by a three-part vocal chorus ("su tenore"). The soloist, called "sa boghe", sings a poem in the Sardinian language (Logudorese), while the other three cantors (su bassu, sa contra, sa mesu boghe) accompany the chant with nonsense syllables (either one or two) consisting of guttural sounds characterized by a peculiar vocal tone. Having developed within the oral tradition, it does not rely on rigid schemes, written scores or preset melodies; even the text of the chant is not predetermined. The execution is built from melodic and harmonic formulas, well known among the cantors, which are combined following a canvas that can be varied by the soloist, by the choir as a whole, or by the individual vocal parts. The musical sequence is therefore not preordained by rigid scores, and through it the cantors have the opportunity to express their feelings and aesthetic taste, each time in a unique and original way. For the Canto a Tenore one finds an overall lack of historical sources, both written and oral.


Figure 5-4 Sardinian Canto a Tenore singers

Concerning the origins of the Canto a Tenore, the first evidence referring to specific musical practices dates back to the second half of the 18th century. Today it is mostly diffused in the centre-north of the island; at the moment there are 42 registered groups. Teaching is usually done orally, by imitation. Only after the invention of the phonograph, towards the end of the 19th century, did it become possible to fix sounds in a form more or less faithful to a musical performance. Moreover, the diffusion of recordings seems to have had a number of effects on the traditions of the Canto a Tenore; one of the most important is that the wide distribution of some recordings has strongly contributed to giving particular fame to some village styles, groups or individual singers within the micro-world of a tenore singing, and sometimes even beyond it (Pilosu, 2012b). Starting from the end of the Second World War, this practice began to be studied by ethnomusicologists, who are especially interested in its social and musical analysis. Nowadays the number of studies is growing, with a focus on the phonetic aspects of this singing (e.g. Bravi, 2012). Presently, one of the best-known resources in the field of the Canto a Tenore is the Encyclopedia of Sardinian Music, published by the Sardinian Union in 2012 (Pilosu, 2012a).

5.4.4.1.2 Respondents' information
The questionnaire was distributed to a group of experts (singers belonging to an Association of Canto a Tenore56) and to experts-consultants (researchers). Our sample was composed of 88 males with a mean age of 38.6 years (minimum 17, maximum 81); 86 of them are experts/singers and 2 are researchers on the Canto a Tenore (ethnomusicologists). As to educational background, most of the respondents have compulsory school education (44%), 37% have upper secondary school education and 16% have a university degree. The majority of the singers are experienced: 73% of the sample have more than 6 years of experience in the role of tenores, 14% have 5-6 years of experience and 11% only 3-4 years.

56 www.tenores.org


Although the majority of the respondents have wide experience in the practice of the Canto a Tenore, most of them (76%) have never studied music in the academic sense. All the respondents in our sample are singers, and the roles covered are the following: Contra (29%), Bassu (26%), Mesu Boghe (23%) and Boghe (22%).

5.4.4.1.3 Physical Dimension
Our experts state that the Canto a Tenore is usually performed by four singers standing upright. The traditional disposition of the singers is cross-shaped: they face each other in a closed circle, usually very close to each other (often their shoulders are touching). In many cases, the singer performing sa boghe (the solo voice) stands slightly apart from the other three (and in particular from the bassu and the contra). We asked the singers how frequently some typical positions are assumed during the performance. The position "standing with asymmetrical support of the legs" was chosen by the majority of respondents; the position "standing, back straight, legs extended" is also very frequent in the practice of the Canto a Tenore. As to the main gestures used by the singers during the performance, the opinion of our experts, in agreement with the singers', was that some typical gestures of the Canto a Tenore are very common: one of them is touching the ear with the hand (in a few cases both hands are in contact with both ears); another is placing one arm (or sometimes both) behind the back. Moreover, when singing in the traditional Sardinian costume, many singers usually put their hands on their belt. Another position indicated as common by the singers is resting an elbow on the shoulder of another singer. In order to investigate the main facial movements, we asked the singers what the main expressions/facial movements made during the performance are. According to our experts and singers, the cantors are in most cases serious and concentrated on their task of singing; they do not usually have specific facial expressions meant to convey specific emotions (such as happiness or sadness). Their gaze may be oriented in different directions: sometimes the singers look at each other in order to better regulate and synchronize their parts, in other cases their gaze is directed to the floor or towards some point outside the circle of singers, and in a few cases their eyes are closed. The mouth is generally not as open as one might expect for uttering the text (particularly among the singers of some villages, such as Bitti). Some of the singers (particularly many bassos and contras) usually protrude and bend their lips towards the right or the left. Usually in the Canto a Tenore singers stand still, but slight body movements occur between the musical phrases, and some singers often move their shoulders while singing songs with a strong rhythmic pulse; this is confirmed by our sample, who chose "still". In order to understand which internal organs are mostly involved during the performance, we asked in what way they are involved. According to the experts, singing entails vibration of the vocal cords; the two lower voices (in some cases the mesu boghe too) use a particular laryngeal contraction. In the case of the bassos, the contraction brings about a vibration of the arytenoids: the acoustic effect is a doubling of the period (i.e. a lowering of the pitch by one octave) and a particular roughness of the timbre. In the case of the contras, the laryngeal contraction does not imply the doubling of the period, but gives the timbre a particular colour. The answers from the singers show that they were not able to identify a single organ as mainly involved.


In order to investigate the most important factors characterizing the Canto a Tenore, we asked the singers' opinions on synchronization, utterance and preparation. They declared that, as far as utterance is concerned, the "right start" is considered the most important factor. Preparation is quite important, as is synchronization (this question was directed only to sa boghe). To understand the physical dimension of this treasure in more depth, we asked our sample their opinion concerning timbre and voice quality in the Canto a Tenore. According to most of the singers, both when the rhythm is free and in a "ballu", the timbre of the voices is the most important element. The last question about the physical dimension in our questionnaire focused on the main characteristics/features considered important by the singers while performing. More than half of the respondents consider ornamentation the main feature of the Canto a Tenore. Voice quality and intonation are considered equally important.

5.4.4.1.4 Emotional Dimension For the investigation of the emotional dimension, we asked to what extent feelings / moods affect the singers' performance. According to the experts, the Canto a Tenore usually has a strong emotional impact on the singers themselves. Singing in public may imply – also depending on the experience of the singers – excitement and/or anxiety. This may bring about variation in the musical tempo and in the absolute pitch of the song. The singers' opinions confirmed the experts' view of the emotional impact: the most cited emotions were "Happiness" and "Excitement".

5.4.4.1.5 Social Dimension As far as the social dimension of the Canto a Tenore is concerned, singers are generally neither keen nor able to sing together with people they do not know. In some cases, the Canto a Tenore entails intergenerational contact; however, as far as professional groups are concerned, belonging to the same (or a close) age group is more frequent. When one singer is older than the others, this is usually the boghe. Physical contact seems to be very important, as it not only expresses familiarity with one's fellow singers and puts them at ease, but also helps them to listen more closely to what the other singers are doing and to find the right harmony. To this end, eye contact and meaningful gazes are also crucial. We also investigated the impact on the performance of "external factors", i.e. whether and to what extent the presence of the public and/or of dancers is considered important by the singers. Most of the respondents confirm that they have to pay attention to what happens outside the group (e.g. the dancers).

5.4.4.1.6 Knowledge and Meta-Knowledge Dimension Concerning the theoretical aspects, according to the experts, the abilities of the four singers comprising a Canto a Tenore group can differ, and each voice needs particular knowledge and skills. As far as linguistic knowledge is concerned, knowledge of Sardinian is, generally speaking, something that all singers must have. In the case of the boghe, this is due to the fact that all the lyrics are in Sardinian (only exceptionally, and for fun, may one sing in Italian or other languages); more generally, Sardinian is the usual language of communication in the contexts where the Canto a Tenore is learned and practiced.


A perfect knowledge of the lyrics is needed. Particularly in some areas, such as Orgosolo, the value of the text (and, as a consequence, a good knowledge of many poetical texts) is considered one of the most important skills for a good boghe. While a perfect knowledge of the regional repertoire is essential for every professional singer, knowledge of the most renowned songs of other regions and/or of the most famous singers is considered a sign of the mastery of an expert singer and is usually part of the singers' learning process. According to the singers, the most important knowledge is that of the styles and modas of the region in which the singer lives, followed by mastery of the Sardinian language. As for the most important skills connected to the practice of the Canto a Tenore, according to many singers the boghe should be not only a soloist but also a person with a leading capacity in the group: he has to be able to lead the group and to take decisions, e.g. on the lyrics to sing, the pitch and the start of the performance. The ability to make variations and/or to improvise is considered important. This is also the case for the mesu boghes and – in some regions and to a lesser extent – for the contras. The singers who make up the accompanying group must be well synchronized, able to stay "in time" with the soloist (sa boghe) and with the other members, able to listen carefully to the other singers, and able to monitor the performance and to correct both the others and themselves.

5.4.4.1.7 Context/Environment Dimension As for the context/environment dimension, the Canto a Tenore is practiced in various contexts. Traditionally, it is part of the male sociality of the lower classes (especially shepherds) and as such is practiced both in tzilleris (pubs) and at spuntinos (large meals with friends), as well as on the occasion of the patron saint's day festivities. Nowadays, many Canto a Tenore groups have a semi-professional status and usually participate in musical festivals, both in Sardinia and, for some groups, outside the island. The costumes are usually different in the two kinds of context. Everybody is allowed to wear what he likes when singing in informal contexts. The traditional costume is used in the most official situations, whereas the "modern Sardinian" clothing style is preferred by many as a more practical way of presenting themselves appropriately, while remaining comfortable and easy to manage.

5.4.4.1.8 Teaching and Learning Dimension The Canto a Tenore is orally transmitted. Traditionally, no formal teaching process is foreseen, and transmission is usually based on practicing together with peers and on watching and listening to the master singers. In recent years, audio recordings have assumed a crucial role in the training of young singers, along with live practice. Moreover, "local schools" of Canto a Tenore have, in some cases, been a starting point for singers and groups. It should be noted that teachers in these schools are usually mature and expert singers without any formal musical training, and that the training process cultivated there is usually centered more on informal teaching/learning activities than on a regulated and theory-driven process. It should also be noted that the singers who answered the questionnaire did not report having learnt the Canto a Tenore in dedicated schools. This result highlights one of the distinctive aspects of the Canto a Tenore teaching/learning practice, which usually takes place in informal settings with teachers who in most cases do not have any musical/academic background. According to our experts, a formal approach to the transmission and development of the Canto a Tenore could take into account the possibilities offered by modern communication tools and technology to support the learning process and to give the learner a higher awareness of the features of the singing styles. In particular, the possibility of isolating voices and distinguishing the individual styles, as well as any kind of visual and conceptual representation of the structure of the songs and of each singer's role, could be of great importance in the learning and knowledge development processes. The singers identified video and audio recordings of singers' live performances as the most suitable tools for the teaching/learning process; "theoretical" tools (texts and images) do not seem to be valued as useful. A different consideration applies to more advanced technologies (e.g. avatars), which were not particularly appreciated, probably because most of the singers are not familiar with them.

5.4.4.1.9 Value Dimension The Canto a Tenore is an expression with a strong continuity in time, and it is usually seen, both within Sardinia and elsewhere, as the most characteristic musical expression of the island. Its rich repertoires, the unusual techniques deployed to make music in the Canto a Tenore groups, and the complexity of its timbral, melodic and rhythmic structures, together with a deep and shared "culture of singing", give it an aesthetic value which can be recognized not only by Sardinians but also by people throughout the world. From this point of view, the Canto a Tenore has also acquired an increasing economic value (in terms of number of semi-professional groups, number of performances and musical events, cultural tourism etc.) in recent years, and it is foreseeable that a wider dissemination of this music may open the way to further developments in this direction. Safeguarding and developing the Canto a Tenore may be regarded as a major objective both for Sardinians, who may cultivate a specific and strong artistry which supports the social life of the local community and allows them to express individual identities and well-being, and for the world at large, as a valuable aspect of cultural diversity, the most relevant human trait. All in all, the Canto a Tenore is a treasure of Sardinia which has to be safeguarded, because it is the artistic expression of the Sardinian tradition.

5.4.4.2 Corsican Cantu in Paghjella

5.4.4.2.1 General Information On Thursday 1st October 2009, the secular and liturgical Cantu in Paghjella was inscribed by the UNESCO Intergovernmental Committee on the List of Intangible Cultural Heritage in Need of Urgent Safeguarding (Unesco 2009). The Cantu in Paghjella designates the male chant performed a cappella by three voices (a seconda, u bassu and a terza), in both its secular and liturgical forms (Guelfucci & Salini, 2008). Depending on the repertory, diverse languages such as Corsican, Cruscu (Tuscan Corsican), Sardinian, Latin and Greek (kyrie eleison) are used. Its harmonic process is constructed as follows (Guelfucci & Salini, 2008; Salini, 1996):
- by the versu, an interweaving of words and sound that identifies the places and individuals belonging, or having belonged, to families of cantors;
- by the successive entrance and interchange of the voices: a seconda (principal voice), then u bassu (bass voice) and a terza (high voice);
- by their overlapping arrangement, the entrance of one voice in the key of another;
- by the use of ornamentation as a structural element;

Del_2_1_FINAL2.doc Page 81 of 175 D2.1 First Report on User Requirements Identification and Analysis i-Treasures ICT-600676

- by conforming to a precise behavioural code: the eye, the ear and the mouth function in a closed circle, as the singers form a circle;
- it is still transmitted orally, by impregnation and by intergenerational and endogenous imitation.
Concerning the geographical description, the Cantu in Paghjella was first practiced in the more rural and pastoral areas of northern Corsica: the regions of Bozziu, Castagniccia and Tagliu Isulacciu. Since the 1970s, thanks to the introduction of new modes of music diffusion, the practice of the Cantu in Paghjella has extended to all the regions of the island, and certain versi are 'spreading' throughout its territory. It still remains concentrated in Upper Corsica, although in southern Corsica young practitioners have also adopted and interpret the Cantu in Paghjella.

Figure 5-5 Corsican Cantu in Paghjella singers

5.4.4.2.2 Respondents’ information The questionnaire was distributed with the help of some Corsican experts, but unfortunately, to date, only two responses have been collected. This was due, as the respondents said, to a certain difficulty in putting some very complex concepts into words. In addition, most Corsican singers belong to an oral culture, so an interview would probably have been more fitting in this context. In the future, the sub-use case leaders plan to establish closer contact with La Casa di u Populu Corsu (Association of Corsican students in Paris), so as to collect more data. Translating the questionnaire into Corsican could probably also help to reach more people. The first respondent is a French man, 58 years old, who lives in the village of Sermano near Corte in Corsica. For 25 years he has been transmitting the secular and liturgical Cantu in Paghjella in its customary context. This respondent was not comfortable with the online form: he completed the questionnaire by hand and sent it to us. He is a farmer and his education level is college. He has no formal musical training but sings regularly. He grew up hearing the old repertoire of the Paghjella, and this technique is sung in his family. He learned to sing by listening to the elders of his family and to his neighbors. He teaches the Paghjella and alternates between the voices of seconda and terza. He is an expert in this vocal technique. The second respondent is a French man, 39 years old, who lives in Marseille. He is an engineer and has more than 6 years of musical experience singing Renaissance, medieval and Byzantine vocal repertoires. He has heard the old repertoire of the Paghjella, but this technique is not sung in his family. He learned to sing the Paghjella with a singing instructor, and alternates between the voices of seconda, u bassu and terza. He is an informed learner of this vocal technique.

Del_2_1_FINAL2.doc Page 82 of 175 D2.1 First Report on User Requirements Identification and Analysis i-Treasures ICT-600676

5.4.4.2.3 Physical Dimension Our experts declare that the Cantu in Paghjella is usually performed by three singers who stand upright with symmetrical leg support, a hand on the ear and the hands moving. The most typical gesture is thus a hand on an ear, and, for the two respondents, the most typical facial expressions/movements are: eyes looking at the other singers and listening to the other singers, mouth movements, and tongue and lip position. According to our two experts, the voices accompanying the seconda watch the seconda's lips in order to have a better start. This allows the singer to anticipate more easily than by listening alone; as our respondents put it, they need "to be attached to the lip movements of the other". According to our respondents, the Cantu in Paghjella is usually performed without body movement, except for the hands held forward. The internal parts of the body most engaged during the performance are those involved in abdominal and diaphragmatic breathing. We then asked our respondents to indicate their opinion about the main characteristics of the Cantu in Paghjella: for them, the right utterance is important to the performance, and synchronisation and complicity with the other singers are also necessary. While performing, melody and ornamentation are the most emphasized elements. The key features of the Cantu in Paghjella are the voice and the utterance; improvisation is also important for one respondent but not for the other.

5.4.4.2.4 Emotional Dimension For the investigation of the emotional dimension, we asked to what extent feelings / moods affect the singers' performance. According to the experts, during a Cantu in Paghjella performance the singer should be relaxed and in a spirit of conviviality and sharing, in particular in the case of religious singing. Both respondents report a high or very high impact of the singer's feelings/mood on the performance, when singing both secular and sacred songs.

5.4.4.2.5 Social Dimension The Cantu in Paghjella is group singing, with a minimum of 3 people and up to 4 when the bass is doubled. The relationships between the performer and the other singers are ones of friendship, complicity and trust, although it also happens that one sings with people one does not know; in that case, work, trust and a minimum of intimacy are needed to sustain a song whose voices are totally interdependent. There is no hierarchy and no choirmaster in the group. The first voice (seconda) starts the song, then comes the bass voice and finally the high voice (terza). The singers may be interchangeable, but the performance mainly works as an interplay of roles. The singer who sings the seconda must lead the performance, while the others wait for the right moment and for signs from the seconda before contributing; out of complicity, however, every singer can sing the seconda. There are no rules concerning proximity/disposition among the singers, who simply stand close enough to see each other's lips for synchronization purposes: as close as possible to each other, or facing each other in a circle. If there is an audience, it is better to stand in a half circle, all aligned in a single row, with no singer in the background. Concerning transmission to the audience, the Paghjella is sung first of all for oneself; then, if it is well done, it may also please others. The aim is to convey the mood of the song, to express the humility of our lives and, above all, to be true and to transmit the complicity of the singers themselves; most of the time, this is what the audience perceives as most significant.

5.4.4.2.6 Knowledge and Meta-Knowledge Dimension Concerning the theoretical aspects, according to our respondents all of them are important: knowing the texts of the songs, the specific language, the main existing styles, and the modes of their own village and of other villages. Concerning the practical skills, it is very important to be able to start at the right time, to be able to produce ornamentations and (for one respondent only) to be able to improvise. The elements the performer needs to keep under control (and in tune) during the performance are the voice and its tone, the sustaining of the fundamental tone given at the beginning of the song, the emission of the voice's harmonics, and the adaptation to the acoustics of the performance venue.

5.4.4.2.7 Context/Environment Dimension As for the context/environment dimension, the Cantu in Paghjella is practiced in various contexts: village feasts, bars, the street, among friends, or in a church or any place whose acoustics make the singers curious to try. As for specific tools or instruments, the Cantu in Paghjella is a cappella singing, although it is sometimes accompanied by the cetera, a Corsican lute. The group disposition is usually semi-circular or round, in front of an audience or in the middle of the public; the singers usually choose their position according to the most advantageous acoustics of the place. No particular dress is worn.

5.4.4.2.8 Teaching and Learning dimension The Cantu in Paghjella is orally transmitted. Traditionally, no formal teaching process is foreseen, and transmission is usually based on practicing together with peers and on watching and listening to the master singers. Some parishes or dioceses host their own brotherhood, composed of singers from all socio-professional environments. There are no formal institutions, but associations conducting workshops in singing or folk traditions organize training, and practitioners themselves meet and pass on their songs. This highlights one of the distinctive aspects of the Cantu in Paghjella teaching/learning practice: everyone usually moves at their own pace, by contacting a singing teacher or other structures, and progresses by "confronting" themselves with other singers. Learning the Paghjella comes from listening to a teacher and from discovering one's own ability to improvise and ornament. Oral transmission favours early listening and mimicry; over time, the confidence acquired increases the "recklessness" of the singer within his own habits. According to our experts, the teaching/learning methods for the Cantu in Paghjella are: learning by observation of a singing master (French: "maître de chant") through oral transmission and practical training, or by means of training courses and group training with a trainer. According to our respondents, a formal approach to the transmission and development of the Cantu in Paghjella could take into account the possibilities offered by modern communication tools and technology to support the learning process and to give the learner a higher awareness of the features of the singing styles. In particular, text (one respondent), video (one respondent) and audio tools could be suitable training material types. The feedback types considered most important to provide to the student are mainly textual, followed by audio and video. Concerning the student's evaluation, intonation is very important, followed by precision in following/creating the rhythm, articulation, the adaptation of the text to the singing, and then vocal power, the ability to improvise and the ability to produce ornamentations.

5.4.4.2.9 Value Dimension The main values attributed to the Cantu in Paghjella are historical value, cultural value and uniqueness, followed by economic value (one respondent only).

5.4.4.3 Byzantine Music

5.4.4.3.1 General Information Byzantine Music is the music that was developed within the Orthodox Church in order to fulfil its worshipping needs. It also serves as a means to highlight and emphasize the power of human speech. It is performed solely by the human voice, without the accompaniment of musical instruments, and is considered to be the evolution and cultivation of ancient Greek music. It is the music of the Byzantine Empire, originally consisting of Greek texts set to melodies. Its main characteristics are the poetry and the melody written to create a chant and, naturally, since it is so closely tied to the Orthodox Church, it is also characterized by its religiosity and the need to worship God. Byzantine Music cantors often chant with an air of seriousness and humbleness, even though through their art they try to express the inner beauty of the soul, spiritual wealth and freedom, and ultimately ethos (morality). Semeography, semadography and parasemantics57 are a unique creation of the Byzantine spirit and civilization and constitute a complete system for expressing the logic of monophonic music.

Figure 5-6 - Byzantine Music cantors

The geographical diffusion of Byzantine Music today reaches the Balkans, mainly Greece, Russia, Constantinople (the former capital of the Byzantine Empire), Ukraine, Romania, the Eastern Mediterranean and the Middle East. Regarding its origins, with the terms Byzantine and meta-Byzantine Music "[…] we characterize the Greek musical expression and tradition throughout the centuries of the Byzantine civilization. As an art form, Byzantine Music is an expression of the Byzantine culture which evolved into a self-contained and homogeneous system of semeography, in order to perfectly express the strong worshipping needs of the eastern Orthodox Church. The capturing of the semeography-based hymnology, with its handwritten code, starts approximately in the middle of the 10th century" (Professor Gregorios Th. Stathis58).

57 These three terms indicate a symbolic notation system that can describe exactly how to chant the lyrics that are written. It's uniqueness lies in the fact that these symbol do not represent absolute musical notes but they are relative to each other, meaning that the next symbol shows a tonic up or down from the previous one.

Nowadays, Byzantine Music chants can be sung in Greek, Romanian, Russian and Arabic, but it is possible to hear chants in other languages too. In the complex rhythm of Byzantine Music one can distinguish iambic and tonic rhythms. Byzantine chants require their chanters to have an extended voice with control of duration, tonality and melody.
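As footnote 57 explains, the symbols of this notation do not encode absolute notes but intervals relative to the previous note. The short Python sketch below only illustrates that idea: the step sequence is hypothetical, and the mapping onto the solmization syllables (Ni, Pa, Vou, Ga, Di, Ke, Zo) is a simplification rather than a faithful rendering of Byzantine notation.

# Relative (step-based) notation: each symbol says how many scale steps
# to move up or down from the previous note, as described in footnote 57.
SCALE = ["Ni", "Pa", "Vou", "Ga", "Di", "Ke", "Zo"]

def realise(start_index, steps):
    # Turn a starting degree plus a list of relative steps into a melodic line.
    line = [SCALE[start_index % len(SCALE)]]
    index = start_index
    for step in steps:
        index += step
        line.append(SCALE[index % len(SCALE)])
    return line

# Hypothetical step sequence, for illustration only.
print(realise(0, [1, 1, -1, 2, 0, -3]))   # ['Ni', 'Pa', 'Vou', 'Pa', 'Ga', 'Ga', 'Ni']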

5.4.4.3.2 Respondents’ information Our sample is composed of 30 people with a mean age of 30.1 years (min. 18; max. 40 years old). As to the educational background, almost all of the respondents have college or university education (93.3%) and the rest have upper secondary school education (6.7%). 70% of the respondents have more than 6 years of training in Byzantine Music, 3.3% have 1-2 years, 16.7% 3-4 years and 10% 5-6 years of experience. The roles are split into four categories: 33% are Cantors, 23.3% are Chanters, 20% are Choristers and 23.3% stated "other". None of them is a priest. As for their professional experience, most of them are students (36.7%) and teachers (26.7%); professional singers account for 5% and researchers for another 5%, while 3.3% stated "other".
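Breakdowns such as the experience and role figures above should each sum to roughly 100% once rounding is taken into account. A small helper of the kind sketched below (our own illustration, not part of the project tooling) can be used as a sanity check when reporting such data.

def check_breakdown(name, percentages, tolerance=1.0):
    # Warn if a percentage breakdown does not sum to ~100% (allowing for rounding).
    total = sum(percentages.values())
    status = "OK" if abs(total - 100.0) <= tolerance else "CHECK"
    print("%s: total = %.1f%% [%s]" % (name, total, status))

# Figures as reported for the Byzantine Music respondents.
check_breakdown("Experience", {"1-2 yrs": 3.3, "3-4 yrs": 16.7, "5-6 yrs": 10.0, ">6 yrs": 70.0})
check_breakdown("Roles", {"Cantors": 33.0, "Chanters": 23.3, "Choristers": 20.0, "Other": 23.3})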

5.4.4.3.3 Physical Dimension Most of our respondents have a strict expression while chanting. The frowning of the forehead and the closing of the eyes seem to play some part when chanting, while the tightening of the lips seems to be the least common facial expression. As far as gestures and finger movements are concerned, our experts seem to make slight gestures and finger movements while chanting; the most common gesture is the melodic movement of the hands. Regarding body movements, from the results of the questionnaires we can conclude that the body movement of the chanters is minimal. Our experts also state that the way these movements are enacted depends solely on the preferences and personal style of the chanter. Our experts seem to use abdominal/diaphragmatic and laryngeal singing styles rather than nasal ones. One of the most difficult sections of the questionnaire for our respondents was stating to what extent they feel, and to what extent they intentionally use, the inner oro-facial parts, since Byzantine teaching has no methodology for using these internal body parts: these movements are spontaneous and are the result of mechanical actions. As is natural, articulation is very important in singing in general, and so it is in Byzantine Music. Voice, utterance and rhythm all play a significant role in chanting.

5.4.4.3.4 Emotional Dimension For the investigation of this dimension, we asked our experts what kind of emotional impact the chanting of Byzantine Music has on them. Our experts state that, among a set of emotions, the strongest one is awe. It seems that while chanting our experts experience many very strong feelings, such as joy, contentment, love, compassion and devoutness, while the least intense is the feeling of pride.

58 http://www.ibyzmusic.gr/default.asp?pid=10&la=1

5.4.4.3.5 Social Dimension As far as the social dimension of Byzantine Music is concerned, singers have rather neutral feelings about singing alone or with others, but are more concentrated when they sing alone.

5.4.4.3.6 Knowledge and Meta-Knowledge Dimension A key factor in maintaining Byzantine Music throughout time has been the teachers themselves, who never settle and constantly want to learn about new ways, styles and genres of Byzantine Music. From their point of view, a proper musical performance requires the chanter to have a good understanding of the texts of the hymns, to synchronize breathing with the conceptual phrasing of the lyrics and to have good utterance. Ultimately, a chanter must try to follow his teacher's verbal methodology by imitation.

5.4.4.3.7 Context/Environment Dimension The natural place for a Byzantine Music singer to chant is, of course, the church. Along with the choir and the home, these are the three places where chanters most frequently practice their skills. They state that especially in places like a church or a monastery their emotions are at their peak. The tools they most often need to carry along are a bookstand, books or texts, and a tuning fork (diapason). For their practice, they also consider access to audio and visual material of Byzantine Music crucial.

5.4.4.3.8 Teaching and Learning Dimension It takes years of hard practice to master Byzantine Music. Through the questionnaire we gathered information indicating that two of the most important elements to be learned in the teaching and learning process are, first, the marks (semadophones), including their form, name, etymology, energy and interpretation, and, second, the sounds, such as the initial testimony signs, the tones and their branches, the musical scales, the signs of alteration and the ethos of each sound. Additional tools that aid this process are computers, with the ability to type musical texts on them, and stringed musical instruments such as the ganun and the tamburas.

5.4.4.3.9 Value Dimension Since Byzantine Music is tied to Greek history and religion through the Byzantine Empire, it constitutes an extremely valuable historical and religious heritage. It is also very important for the general art of singing because of its unique characteristics and innovative elements, such as semeography and semadography. All of the above, in addition to characteristics such as having the oldest musical alphabet and being a tool for prayer which lifts human spirituality and ethos (morality), make Byzantine Music an intangible cultural heritage that must be preserved and safeguarded.


5.4.4.4 Human Beat Box

5.4.4.4.1 General Information Human Beatbox (HBB) is an artistic form of human sound production in which the vocal organs are used to imitate percussion instruments (Tyte 2012, De Torcy et al. 2013). In contemporary western popular music, human beatboxing is an element of hip hop culture, performed either as its own form of artistic expression or as an accompaniment to rapping or singing (Kapur et al. 2004, Lederer 2005, Proctor et al. 2013). Beatboxing was pioneered in the 1980s by New York artists including Doug E. Fresh and Darren Robinson (Hess, 2007). The name reflects the origins of the practice, in which performers attempted to imitate the sounds of the synthetic drum machines popularly used in hip hop production at the time. Because it is a relatively young vocal art form, beatboxing has not been extensively studied in the musical performance or speech science literature (Sinyor et al. 2005). The main characteristics of this technique are that "no tools are needed, there is no cost" and "everyone has the instrument for HBB". The most important aspects of HBB are the control of the body, the imitation of instruments and natural noises, sound diversity, spontaneity and rhythm. HBB draws on all the rhythms already present in instrumental music, but it is mainly based on binary rhythms such as those of hip-hop.

Figure 5-7: Da vox, Human Beatbox performer during a show

Because it is a relatively young vocal art form, the main means of diffusion are TV shows, radio, concerts and battles (competitions), but its recent development is mainly due to the internet (videos, forums and social networks). HBB is present in the USA and Europe, and some beatboxers are active in Asia. Finally, there is only a small representation of HBB in Africa, probably because of the lack of telecommunication tools.

5.4.4.4.2 Respondents’ information The sample was composed of 15 males: 13 French, 1 Belgian and 1 Swiss. All of them are native French speakers and the average age is about 26 years (min: 16 years old; max: 46). 8 studied at university, 4 attended high school (lyceum), 2 went to college, and one did not attend school. Half of the respondents have been practicing music for more than 6 years, and most of them for at least 4 years. 66% of the beatboxers are self-taught. 40% of the respondents are able to sing in another technique such as jazz, choir or hip-hop. 94% do personal voice work to improve their performance, and 34% play the drums, the piano, the guitar or the violin. 73% teach Human Beatbox to young people or adults, on average 2 to 10 times per year. They perform in shows on average 2 to 10 times per year (42% of the subjects), once a year (29%) or one to six times per month (29%). Their role in Human Beatbox production is either the main role (50% of the subjects) or an accompaniment role (50%).

5.4.4.4.3 Physical Dimension Discussions with experts and answers to the questionnaire helped us to define the most typical gestures and postures used during a beatbox performance. Beatboxers usually stand up, with "the knees bent and the body down (horse rider position)", moving their hands; a microphone may also be held, which moderates the gestures. This position is the most standard, but performers agreed that each person has his own moves and positions. While holding the microphone in one hand, the other hand can perform instrumental imitations or moves that encourage and motivate the audience (hand and arm up and down). The mouth can also be covered by one or two hands in order to change the acoustic properties of the vocal tract and thus give the impression of special effects such as audio filters. According to the experts and the performers, there are active facial movements and expressions during a beatbox show. The lips and the eyes play an important role, sometimes with the involvement of the eyebrows when frowning. As facial expression is partly determined by the nature of the produced sound, the movements of the face are not always symmetrical: "I use a lot of sounds that I make with only one side of my mouth". Our respondents specified that all kinds of body movements can occur during a Human Beatbox performance (thorax, shoulders, limbs, feet…) and that they are most of the time synchronized with the rhythm. Questions about the internal parts of the body show that they are all commonly involved in the sound production process, with a predominance of the tongue, which plays a particular role. Some respondents specified that "everything moves", with reference to the vocal tract. Responses also point out that vocal exercises before the performance are essential in order to warm up these body parts. According to the performers, the key feature of Human Beatbox is definitely the rhythm, followed by the capacity for improvisation and then the voice, articulation and utterance. One participant pointed out that "there is not one good pronunciation but thousands of different ways to make a snare, for example; but some pronunciations work better than others". Open questions highlighted sub-features such as the charisma of the performer and the accuracy of his movements (the more accurate they are, the less tired the performer will be). Questions focused on the sound itself show that, while noises and consonants are the most emphasized, melodies and vowels are also very important: "Finally, I will put vowels and consonants at the same level."

5.4.4.4.4 Emotional Dimension The investigation of the emotional dimension was carried out by asking whether the performer should enter a particular mood before the performance. The main tendency of the responses was that the mood does not necessarily play an important role in the quality of the performance, but "like all arts, the beatboxer must believe in what he does, have confidence in himself like a top athlete, and be in tune with himself" to perform at his best.


5.4.4.4.5 Social Dimension According to the beatboxers, Human Beatbox can be performed in a vocal group, but it can also be practiced solo. The number of members in a group generally varies from 2 to 5 people. For some respondents, it is better not to exceed 4 or 5 people, but according to others the number can be unlimited and can reach 40 beatboxers. The relationship between performers is most of the time friendly and trusting. They like to share their references about music but also about streetwear. Apart from that, they also maintain a certain competitiveness, which seems to motivate them and to contribute to the creation process. In the Beatbox community there is no hierarchical position. Titles won during competitions and years of experience may increase the reputation of a performer, but no classes or grades are used. For example, "the old beatboxers, those who started the movement in France, who have created associations and taught and showed sounds to many novices, and those who are the biggest competitors, are the most respected". In most groups, beatboxers can swap their roles; however, some respondents argued that a beatboxer cannot easily be replaced by another: "the personality of a beatboxer is unique, like that of all people on this earth". Each beatboxer can have his own specialties: one can be very rhythmic and another may be strong in sound effects. This complementarity is essential, and usually within a group each member knows exactly what he has to do. Beatboxing works as a musical instrument, so the roles are defined when a musical project is set up. In a group, it is common to find beatboxers more experienced than others, but most of the time they take advantage of these differences: one can lay down a basic rhythm and the others follow and pick up the rhythm; then the stronger beatboxer will abandon the beat to do more vocal and melodic beatbox. As there is no rule regarding hierarchy, there is no particular disposition to adopt when practicing with others. In a group, beatboxers can stand in a semi-circle, looking at each other. During battles, the beatboxers start face to face, but also look at the public. Concerning the relationship with the public, there is no particular relationship to be established, but the closer the contact, the better the performance will be perceived by the audience; the beatboxer must stir the audience's curiosity. The beatboxer can also be part of a more traditional music formation (singing, drums, guitar, piano…), but in that case he is generally limited to the rhythm: a beatboxer focusing on vocal percussion rather than on a solo performance takes the place of a percussionist in the group.

5.4.4.4.6 Knowledge and Meta-knowledge Dimension In this section we asked the performers to point out the main knowledge necessary for Human Beatbox. Knowing the three basic sounds, inspired by drums, wind instruments and string instruments, is very important, followed by awareness of the main techniques and knowledge of the basic rhythms: "the better the sound, the richer and more varied the rhythm: it brings something new and opens more opportunities". In line with these knowledge needs, experts and respondents to the questionnaire concluded that it would be useful to find information about different styles and techniques on the platform. Physiological information (mainly articulatory material) would be suitable for explaining how to generate the different classes of sound.


Beatboxers also agreed on the importance of having very detailed information about the sounds, and on video sequences, which seem to be the favourite medium, ahead of audio tracks and text. Some respondents suggested that organising workshops with well-known performers would be a great way to promote Human Beatbox culture.

5.4.4.4.7 Context/Environment Dimension Human Beatbox can be practiced everywhere: in the street, on stage (concert rooms), at home, in groups of beatboxers, at festivals; there is no dedicated place. When performing on stage (concert room), the venue must have good sound with proper settings (microphone, speakers, monitors/feedback…), but in general there is no special configuration. Human Beatbox is usually practiced in a circle (for a group) and no particular dress code is needed.

5.4.4.4.8 Teaching and Learning Dimension Human Beatbox is originally based on oral transmission. Beatboxers mainly learn by observation and imitation and by practical training: "Personal training is the best of course, but watching videos (or a master if possible) is still necessary, at least to progress quickly by copying the technique." The most suitable learning support indicated by the subjects is audio plus video; in the same way, performers would also use audio plus video to teach. The main characteristic of a good student appears to be the ability to follow a rhythm precisely and to create new rhythms on his own.

5.4.4.4.9 Value Dimension According to the respondents, one of the great values of Human Beatbox is its innovative character. Beatboxers are also very proud of the culture accompanying their art, as well as of the values of community, generosity, solidarity and entertainment: "the Beatbox is a cultural tool that cannot be regarded as revolutionary, but it can stay smart and independent from the market and the industrial culture".

5.4.4.5 Romanian Căluş dance

5.4.4.5.1 General Information The Romanian Căluş originated as a healing and fertility ritual performed by groups of an odd number of men, bound together by an oath. By the beginning of the 20th century its ritual form survived mainly in southern Romania and among Romanian minorities in northern Bulgaria (Mellish, 2006), although remnants of this custom could be found in much of the rest of Romania and throughout the Balkans. The Căluş tradition, especially through the art of the dance, caught the attention of pilgrims, historians and researchers, and has always raised general admiration. The complexity and mystery of this ritual, along with the virtuosity of the dance accompanying it (Figure 5-8), have attracted the interest of many specialists, who have noted, over time, the beauty and complexity of this custom. At the end of 2005, the Căluş custom was included by UNESCO in the list of Masterpieces of the Oral and Intangible Heritage of Humanity.


Figure 5-8 Călus dance

The bibliography of the last centuries concerning the Căluş is rich in descriptions of the custom, mentioning all the stages of the ritual. The descriptions and theories about this custom stress mainly the aspects concerning the dance, its role and function within the ritual, and its magical virtues. From an anthropological point of view, the theories about the origins of the custom are numerous, and each researcher associates the Căluş with certain dances and rituals belonging to other peoples. A first mention is offered by the Greek historian Xenophon in his work "Anabasis", according to which the Căluş is a descendant of the Thracian dance "Kolabrismos". This dance was considered to have a warlike and dramatic character. Several centuries later, Dimitrie Cantemir mentions in "Descriptio Moldaviae": "All the Calusari bear a sword; if such a party meets another, then they must fight; if anyone is killed in such a scuffle, there is no trial; they cover their faces with cloth, lest they should be recognized; they dress in women's clothes and talk like women." The present form of the custom is based on the Thracian one, with all the influences added over time, taking into account the central position of the Romanian regions with respect to the cultural currents around them. The Căluş is a very popular dance in specific regions of the south and central parts of Romania (Figure 5-9). It is found today in the regions of Oltenia and Muntenia, and each locality has its particularities concerning the manner in which the ritual is performed. Some identifiable dance motifs, which make up the Căluş figures, are inherited or created by famous leaders or Calusari, bearing their names or names linked to the place they come from.


Figure 5-9: Geographical distribution of villages and areas with Căluş ritual/dances

The musical measure is 2/4 and the rhythm is binary or syncopated binary. The rhythmic cells are the dipyrrhic, anapaest, dactyl, spondee and amphibrach, while rhythmic motifs are formed by combining two or more of these binary cells. In the archaic form, the dance phrases and the musical phrases in most cases do not coincide: the rhythm of the dance is superimposed on and completes the rhythm of the song, creating a polyrhythmic effect.
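The rhythmic vocabulary described above lends itself to a compact representation: each cell can be written as a pattern of relative durations within the 2/4 measure, and motifs are obtained by concatenating cells. The Python sketch below is only an illustration; the duration patterns follow the standard definitions of these metrical feet (short = eighth note, long = quarter note) and are not transcriptions from the report.

# Rhythmic cells as relative durations (1 = eighth note, 2 = quarter note);
# a 2/4 bar holds 4 eighth-note units.
CELLS = {
    "dipyrrhic":  [1, 1, 1, 1],
    "anapaest":   [1, 1, 2],
    "dactyl":     [2, 1, 1],
    "spondee":    [2, 2],
    "amphibrach": [1, 2, 1],
}
EIGHTHS_PER_BAR = 4

def motif(*cell_names):
    # Build a rhythmic motif by concatenating two or more named cells.
    durations = [d for name in cell_names for d in CELLS[name]]
    bars = sum(durations) / EIGHTHS_PER_BAR
    print("%s: %s (%.1f bars of 2/4)" % (" + ".join(cell_names), durations, bars))
    return durations

motif("dactyl", "anapaest")   # hypothetical two-cell motif -> 2.0 bars of 2/4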

5.4.4.5.2 Respondents’ information Four questionnaires were filled in: one by Florian Teodoresku, an expert on the Căluş dance who is both a professor and a choreographer, and three by Căluş dance students. A general comment was that it is not easy for users to provide many comments regarding the project implementation at this stage, since they are not yet familiar with such technology.

5.4.4.5.3 Physical Dimension All the written accounts of the Căluş state that the number of dancers in the party is odd: 5, 7, 9, 11 or more. This number includes the Vataf (the leader), the Mut (the mute), the flag bearer and the Calusari. The Calusari are disposed as single units in a circle; it is the only dance from the south of the country in which the dancers are placed in such a formation. The Căluş party is composed only of men of different ages, excluding children, who are not accepted below a certain age and social status. The Calusari are freely placed in space; the specific posture for the Căluş is leaning on the stick, with the right hand as a rule, or with both hands. In some places, for certain Căluş movements, the Calusari hold each other in pairs by the left hand, with the stick in the right one. The characteristic features of this dance are: i) a starting figure of walking (Walk, "plimbari"), or a basic step, moving anticlockwise around the circle (repetition of certain cells or kinetic motifs); ii) more complex figures (Move, "mişcari") performed in place between walking steps; and iii) figures formed from combinations of spectacular elements. The figures ("mişcari") are combinations of extremely spectacular and fast stamps, heel clicks, springs and leg rotations. The introductory kinetic motif differs from one place to another, representing the identity of the respective place or party. In some situations the Sarba (another Romanian dance) also appears. The Căluş moves both ways around the circle and in place. The tempo alternates depending on the section performed: the Walk is performed Animato, the Move Vivace and the Sarba Allegro. The Căluş figures are performed at the command of the Vataf. Some parties perform the figures in a fixed succession, established by the choreographer or the instructor of the party. Basic moves: there is a high frequency of spurs on the ground or in the air (a basic kinetic motif which determines a high degree of difficulty and vigour), stamping on the ground, jumps, movements from a lying position on the back or the front, large movements along the circle line, cross steps at the front or at the back, toe-heel steps, weavings of the feet, and rotations of the feet in the air or on the ground. All body parts are used; however, the movement of the fingers is less important.

5.4.4.5.4 Emotional Dimension The dance of the Calusari illustrates the main characteristics of the people: vigor, exuberance and joy, all of these features being important within the custom. Mixed feelings and emotions are felt by the Calusari dancers, most notably devoutness and excitement.

5.4.4.5.5 Social Dimension The Căluş is a group dance using a predefined choreography, although the basic steps can be learned and practiced as a solo dance. As said before, the number of dancers in the party is odd (5, 7, 9, 11 or more). They perform in a circle in front of the audience. The Căluş figures are performed at the command of the Vataf. Some parties perform the figures in a fixed succession, established by the choreographer or the instructor of the party. Most of the interviewees stated that the primary factor that led them to learn the Căluş was their family. The existence of dancing clubs and associations was also a significant factor in their involvement in learning traditional dances.

5.4.4.5.6 Knowledge and Meta-knowledge Dimension The Căluş has a multitude of ritual meanings: fecundity, fertility, the protection of the community against the wicked fairies, the healing of those who have transgressed the folk tradition or been punished by the Iele (malevolent fairies), and many other significances concerning prophylaxis or prosperity. The Căluş as a whole is formed of a series of ritual acts which follow one another in the concrete development of the custom. The most important stages are: the forming of the Căluş party, the making of the flag and the adjacent items, the vow of the Căluş to the flag, various ritual acts performed in the course of the manifestation, the breaking up of the Căluş, and the burying of the flag, each of which is in itself a very important ceremonial act in the development of the custom. Initiation within a Căluş party refers to the mysteries of the society which it represents. Today the focus is on the imitation of the dance, attempting in this way the preservation and transmission of the dance, or of those parts of it which still remain in the present-day repertoire. There is a belief that not just anyone can be a member of the Căluş party, and that not every individual, at any age, can become an actor in this custom. The selection and participation of a man in the Căluş party brings fame and prestige within the community. The skill of mastering these rituals and the technical execution of the Căluş figures require a long period of initiation, for which the Vataf and his helpers are responsible.

5.4.4.5.7 Context/Environment Dimension The environment (music accompaniment, costumes and location/place) is extremely important, as noted by all interviewees. The main objects of the Calusari party are the flag and all the ritual objects used for its adornment (wormwood, basil, garlic, nut, tree leaves, wheat ears), which are considered plants with prophylactic or curative properties. Furthermore, the sticks of the Calusari are considered relics of swords and have different functions, serving as objects to lean on, as fighting weapons, or having a ludic function. The main accessories worn are: spurs and little bells attached to the shoes or the legs, the belt with bells woven around the waist by the Calusari from Giurgita, and handkerchiefs worn around the waist in order to ensure fertility for women. Time is also important, as the time for the Căluş ritual is the period of Rusalii (Pentecost), which occurs forty days after Orthodox Easter and lasts for seven to nine days. The custom begins on the Sunday of the Rusalii and ends after nine days, on the Tuesday of the Cioc, when the flag is buried and the party disperses, or after three days from Rusalii, on the first Tuesday.

5.4.4.5.8 Teaching and Learning Dimension Most of the interviewees believe that the best way of teaching a traditional dance is by explaining and showing (demonstration). Dance teachers also insisted that teaching methods based on demonstration and imitation can be used, and suggested explanations for correcting the student's performance before and during the repetition of a figure.

5.4.4.5.9 Value Dimension The high historical, economic and cultural value of the Căluş dance is recognized by all interviewees, and the dance is considered unique of its kind.

5.4.4.6 Greek Tsamiko dance

5.4.4.6.1 General Information The Tsamiko (Greek: Τσάμικος, Tsamikos) is a popular traditional folk dance of Greece. The dance is probably named after the Tsames of Northern Epirus, but according to other sources it is named after the clothes of the 'klephtes', the mountain fighters of the Greek war of independence: the name of the dance comes from the name used to describe the outfits they wore, which were called tsamika. The main feature of the kleftiko costume is the foustanella, a white pleated kilt. These outfits can still be seen today in parades and special events, where they are worn by the special segment of the Greek army called the Evzones59.

59 Edinburgh Greek Dance Group, http://www.greekdance.org/e-library/Tsamiko


Figure 5-10. Tsamiko traditional Greek dance

Tsamiko is danced to a 3/4 rhythm. The dance follows a strict and slow tempo with emphasis on the "attitude, style and grace" of the dancer. The steps are relatively easy but have to be precise and strictly on beat. Its variations consist of both smooth and leaping steps, which give the dance a triumphant air. In earlier times, the Tsamiko was danced in the mountainous areas of Epirus, in Northern Greece, only by men. Today, it is enjoyed throughout Greece by both men and women. It is danced in a semi-circle, with the leader performing variations while the others follow the basic steps. The deliberate, grandiose nature of this dance stirs considerable excitement in the individual, especially the leader. The peak of the dance exhorts the leader to perform outstanding gymnastic and acrobatic feats60. The dancer might even stomp his foot in response to a strong beat. There is some improvisation involved and many variations of the steps, depending on which area the dancers come from. Over time the dance has taken on many variations. Although an eight measure (sixteen steps) dance has been taught at schools and a five measure (ten steps) dance is common in northern Greece, the six measure (twelve steps) dance is by far the most widely danced in Greece and elsewhere61. The character of the music ranges from relatively gentle to dramatic and powerful, and the songs often describe heroic deeds.

5.4.4.6.2 Respondents’ information As far as gender is concerned, 7 respondents are male and 23 are female. Many respondents (5) belong to the 40-49 age category, the same number to the 50-59 category and 4 to the 20-29 category; the rest can be seen in the statistical results. The education level of all respondents is college/university. Most started dancing in the 10-19 age range and almost the same number in the 3-9 range; the other answers to this question can be seen in the statistical results section. Users perform a great number of traditional dances that do not present any homogeneity and are too many to enumerate. Most respondents' experience in traditional dancing is more than 6 years. 13 respondents are amateur dancers, 5 are dance teachers and 3 are researchers (the rest of the answers are given in detail in the statistical analysis). The great majority of interviewees learned traditional dancing with a professor/expert who explained and showed them how to do it. Participating in training sessions is the way most respondents choose to improve their traditional dancing skills. The other traditional and non-traditional dances that respondents perform are listed in the statistical analysis. Finally, most respondents also practice traditional singing.

60 http://greekcommunity.org.nz/2012/12/greek-dance/
61 Folk Dance Federation of California, http://www.folkdance.com/LDNotations/Tsamikos1995LD.pdf

5.4.4.6.3 Physical Dimension In the Tsamiko, the dancers hold each other's hands, with the arms bent 90 degrees upwards at the elbows. The performer actually uses the whole body (as in most Greek dances) and mainly the lower limbs (the steps, of course). The upper body is turned in different directions, and the upper limbs are used to give information about the position of the body and to keep the distances between the dancers. The performer has to execute 16 movements. Usually the 16 movements are counted with 10 numbers: a group of three movements (1-2-3) is counted as one. So the sequence goes 1r (1-2-3), 2l, then again 3r (1-2-3), 4l, 5r, 6l, then 7l (1-2-3), 8r, 9l, 10r (strong-weak-weak). There are many variations of the steps, depending on which area the dancers come from.
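The counting scheme just described (16 movements grouped into 10 counts, some counts carrying a 1-2-3 group) can be checked with a short sketch. The grouping below follows the sequence given in the text, reading "(strong-weak-weak)" as the accent pattern of the counts rather than as an extra triple; it is an illustration of the arithmetic, not choreography material.

# Tsamiko counting: 10 counts, three of which (1r, 3r, 7l) are 1-2-3 groups.
COUNTS = [
    ("1r", 3), ("2l", 1), ("3r", 3), ("4l", 1), ("5r", 1),
    ("6l", 1), ("7l", 3), ("8r", 1), ("9l", 1), ("10r", 1),
]

total_movements = sum(n for _, n in COUNTS)
print(len(COUNTS), "counts,", total_movements, "movements")   # 10 counts, 16 movements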

5.4.4.6.4 Emotional Dimension Due to its historical origins (Tsamikos became the favourite dance of the mountain fighters and rebels of the Greek revolution), the dance has a grandiose nature, which stirs considerable excitement in the individual, especially the leader. In the past, Tsamiko used to be danced only by men, with the leader expressing strength and courage through acrobatic movements. Nowadays, both men and women (women’s versions emphasizing footwork rather than acrobatics) dance Tsamiko, and the overwhelming feelings are still joy, contentment and pride.

5.4.4.6.5 Social Dimension The performer usually performs with a group of dancers in a circle in front of an audience. The minimum number of dancers is two, while there is no maximum. In Tsamiko there is physical interaction between all dancers: the dancers follow the basic steps of the dance, apart from the leader, who can improvise and perform outstanding gymnastic and acrobatic figures. Most of the interviewees answered that the main factor that led them to learn Tsamiko was their social environment, i.e. family and friends, as well as their interest in tradition. The existence of dancing clubs and associations also played a significant role in their involvement in learning traditional dances. The majority of the users believe that a complementary learning method could be useful for learning traditional dances.

5.4.4.6.6 Knowledge and Meta-knowledge Dimension A performer needs to know the number and the sequence of the steps. Tsamikos (like traditional dances in general) is based on highly coordinated rhythmical movements that require visual coupling between the performers and an auditory stimulus linking the performers to the music. Additionally, it requires haptic contact between the performers. The performer has to keep his/her upper body straight over the center of pressure of the body; that is, the performer has to use the “ankle strategy” (keeping the hip segment’s angle unchanged) during the dance. The performer should be able to retain at least the sequence of the steps and to coordinate the lower limbs with the music.


5.4.4.6.7 Context/Environment Dimension The performer can actually perform anywhere. People usually dance outdoors, especially in the central square of a village or a town, and sometimes in theaters. The environment does not need to have specific characteristics to be used for performance; the only limitation could be the available space. Apart from the music (a CD or live musicians) and the traditional costumes (the foustanella, a white pleated kilt), if needed, the performers do not need to have anything else with them.

5.4.4.6.8 Teaching and Learning Dimension Most of the users answered that the best way of teaching a traditional dance is by explaining and showing. Most of them stated that learning by imitation is easier than learning by studying training materials. Furthermore, dance teachers insisted that specific dance teaching methods should be adopted by the system, e.g. the progressive partial method. The most significant characteristic that a student of traditional dance must have is discipline in rhythm (synchronization). Teachers also stated that, if they observe errors in a student’s posture during teaching, they prefer either to show him/her the correct way of doing it again and let him/her imitate it, or to perform the gesture in parallel, so that he/she can watch and imitate at the same time. The average number of lessons required to learn a traditional dance is 6-8, and most respondents believe that the way the lessons are taught depends on the age or experience of the students. Most people say that a typical lesson lasts 1 hour. Moreover, learning by imitation is considered the most important learning method for teaching traditional dances. The most important training material type is regarded to be, firstly, a video of a teacher explaining the dance and, secondly, videos of people performing the dance. Respondents believe that the most important feedback the training system can provide to the student is visual. Additionally, they prefer to receive the evaluation of their performance after the whole dance is completed, rather than during the dance figure or after each dance figure. The element of the training system’s user interface that respondents consider most important is the video of the correct dance. The two key elements in determining the student’s score are the precision of body motion and the precision in following the rhythm. The ability to replay the student’s performance is regarded as the most important feature, and the ability to zoom in/out is also considered very important. For learning or practicing traditional dance without direct access to a teacher, through an interactive video game, interviewees believe that a lesson should be structured as repetition of the same dance figure, with progressive introduction of new dance figures.
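As a purely illustrative aside (not the project's actual evaluation design), the two scoring criteria the respondents ranked highest could be combined into a single score reported after the whole dance, as preferred above. The function name, tolerances and weights below are assumptions.

```python
# Hedged sketch: combine per-step motion precision and rhythm precision
# into one 0..1 score, delivered after the whole dance is completed.
# Tolerances and weights are illustrative assumptions only.

def dance_score(motion_errors, timing_errors_ms,
                motion_tolerance=0.15, timing_tolerance_ms=150,
                w_motion=0.5, w_rhythm=0.5):
    """motion_errors: normalized pose errors per step (0 = perfect);
    timing_errors_ms: absolute offsets from the beat per step, in ms."""
    def sub_score(errors, tolerance):
        # Each step scores 1 at zero error, falling to 0 at the tolerance.
        per_step = [max(0.0, 1.0 - e / tolerance) for e in errors]
        return sum(per_step) / len(per_step) if per_step else 0.0

    motion = sub_score(motion_errors, motion_tolerance)
    rhythm = sub_score(timing_errors_ms, timing_tolerance_ms)
    return w_motion * motion + w_rhythm * rhythm

# Example: small pose errors, moderate timing drift on the second step.
print(round(dance_score([0.05, 0.10, 0.02], [40, 120, 80]), 2))
```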

5.4.4.6.9 Value Dimension Most interviewees regard the cultural dimension as the main value of traditional dance. This is easily explained, as Tsamiko is a dance of the klephtes, the Greek rebels who lived in the mountains during the Greek revolutionary war against Turkey. The dance spread from Epirus to Thessaly and Roumeli when it was adopted by the klephtes. Moreover, a great number of respondents regard the historical value of the dance as very important, since it is a warlike dance that was danced in preparation for battle, as evidenced in the leader's leaps and spins, somersaults and striking of the feet on the floor. Through these steps, the leader shows his physical skill and his worth as a man and a fighter. The entire dance is an expression of bravery and the

desire to win. Nowadays, the dance is danced all over the country and is considered an integral part of Greek culture.

5.4.4.7 Walloon traditional dances

5.4.4.7.1 General Information Walloon traditional dances are essentially peasant dances originating in the 18th, 19th and early 20th centuries and practiced in the Walloon region of Belgium. They were originally mostly danced at popular balls in the villages, but almost disappeared at the end of the 19th century and the beginning of the 20th century. A few people and groups interested in preserving and perpetuating this intangible heritage conducted dance « collections » on their own initiative, by interviewing older people who used to perform the dances and hence were living representatives of this heritage, or by retrieving information from the notebooks of « ménétriers » (dance leaders) who used to go to local events (weddings, etc.) and to villages to play music and animate the traditional balls. Walloon traditional dances are hence mainly reconstitutions or dances inherited from a few older dancers through these individual dance collections. Traditional dance groups have since been practicing and teaching these dances, perpetuating the tradition of folklore, community and social interaction which is at the center of these traditional dances. All traditional dancers are amateur dancers (not professional), with different levels of expertise. Each dance group relies on one or a few experts, who have inherited their knowledge of the traditional Walloon dances from individual dance collections or from older experts. There are currently around thirty local dance groups in Belgium. Since there are no formal traditional dance schools, Walloon dances are usually taught in the local dance groups. The dances are usually practiced at the dance group rehearsals throughout the year, and most dance groups perform from time to time at local events, at traditional dance festivals in Belgium or abroad, or at folk balls organized by the different dance groups. Walloon traditional dances are mainly couple dances, which are danced with at least four couples, with no upper limit to the number of participants. Some of the main Walloon dances are called « passe-pied », « maclotte », « scottish », « amoureuse », « allemandes », « mazurka », « polka », « quadrilles », « valse ». They are performed with some variations according to the knowledge sources of the dance group and according to the region. Explanations of common features of some of these dances are given in the following62:

62 http://www.vitrifolk.be/


 La Maclote

Figure 5-11 "Maclote liégeoise" performed by the dance company "Les Vis T'Chapias du Stimont"

This word is derived from “matelot”. It originally designated the contra dances performed by deckhands. There are almost as many adapted maclotes as there are villages in Wallonia. The execution of these maclotes occupies 8 measures in 2/4 time, based on a sequence of hop steps and stamps, chasse or polka steps.
 L’Allemande
“L’Allemande” is a festive couple dance, which originally comes from Germany. It is danced in duple meter with a moderate rhythm.

Figure 5-12 "L'Allemande" by the dance company "La Compagnie Folklorique Fanny Thibout"

 La Polka
Polka is a fast couple dance, born in the early 19th century in central Europe. The step sequence begins with a preparatory hop followed by a chasse done first to the left and then to the right; this figure is performed while moving around the dance floor.
 La Quadrille
The quadrille is a dance performed by four couples in a rectangular formation. It is composed of five figures danced to a 2/4 or 6/8 rhythm. With more than one quadrille (i.e. more than 4 couples), the quadrilles can be positioned in line or in quincunx.


Figure 5-13 Quadrille during the annual village feast in Saint-Mard (http://www.soleildegaume.be)

5.4.4.7.2 Respondents’ information The questionnaires were distributed in two different Walloon traditional dance groups, called “Les Pas d’la Yau” (around 20 members, paper version of the questionnaires) and “La Compagnie Fanny Thibout” (around 20 members, questionnaires sent as a Word document by email). We received 18 answers, but some of them were very sparse, which explains why we do not have the same number of answers for each question. The sample was composed of 9 males and 7 females. The respondents are between 16 and 70 years old, with a mean age of 41.5. A high majority (69%) of the respondents have a high education level (University/High school), and some of the younger respondents are still studying. A majority of the respondents began to dance either at a very young age (before 9, 37% of the respondents) or in young adulthood (between 16 and 32, 44% of the respondents). They perform mainly Walloon traditional dances, for which they do not all use exactly the same nomenclature (Walloon traditional dances, Walloon dances, traditional Belgian dances, dances of the Walloon folklore, or an enumeration of the dances). The Walloon traditional dances which were cited are: mazurka, polka, passe-pied, quadrille, valse, scottish, maclotte, ardèges, contredanses. All respondents have more than six years of experience in the Walloon traditional dances, and all of them are amateur dancers. Four of them are dance teachers, and one is also a choreographer. All of the respondents learned the dances with a professor who explained and showed them some basics; five out of 17 also mentioned a professor who explained without showing. The other most frequently cited way of learning is being part of a dance group. They all learned by participating in trainings, although a few people also mention watching video courses, observing demonstrations by expert dancers and applying them to themselves, asking and discussing with other expert dancers, and experimenting by themselves. Another cited way of learning is participating in folk balls. Most traditional dancers (69%) have no expertise in non-traditional dances, and none of them are professional dancers. Most respondents (65%) do not practice traditional singing in addition to dancing.

5.4.4.7.3 Physical Dimension The importance of each body part during Walloon traditional dances was assessed on a scale ranging from 0 (low importance) to 5 (high importance). The body parts involved in the dance are, in order of decreasing importance: lower body (legs), feet, upper body (arms and torso), head, hands, and fingers. The difficulty of the different dance figures can vary widely according to the type of Walloon dance performed. Moreover, even the easiest figures can become difficult when one wants to reach perfection in the movement execution (body, expression, smoothness, etc.). During the Walloon dances, movements are most commonly enacted following a specific rhythm and a specific predetermined sequence of figures; spontaneity does not play any role in the dance. The key features in the Walloon traditional dances are the dance figure, the group interaction, the movement rhythm, and the movement style. All the different dances come from an oral tradition, and small variations in the step execution can appear from dance group to dance group, and from region to region.

5.4.4.7.4 Emotional Dimension Regarding the emotional aspects, the emotions mostly felt by the respondents while dancing are amusement, joy and contentment. Pride, love, and awe are also part of the dance experience. Among these, the emotions that mainly influence the dance performances are amusement and joy.

5.4.4.7.5 Social Dimension The reasons that led people to get involved in learning traditional dances are multiple, but the importance of family (47%) or friends (29%) is preponderant compared to people who decided to get involved on their own (23%). The interest in traditions is also essential (41%), while the existence of a dance group also plays an important role (29%). Walloon traditional dances are mainly performed as predefined couple or group dances. Improvisation and solo dancing are not really part of the experience; solo dancing is of interest only when rehearsing specific dance steps. In Walloon traditional dances, the average number of dancers changes according to the dance, but there are at least 8, and more often between 16 and 20 people. Most of the dance moves include interactions between two partners (couple dances) or with the whole group, and figures are less often performed solo. The majority of the respondents (77%) totally disagree with the statement that they would practice more if they could train on their own, but 70% of the people agree that complementary learning methods could be useful for learning traditional dances.

5.4.4.7.6 Knowledge and Meta-knowledge Dimension In the i-Treasure database, the respondents would mainly like to be able to search for specific dance figures, posture and moves, or sequences of moves. More than 90% of the respondents think that the i-Treasure platform will contribute to the preservation of their cultural heritage, and would be interested in a demonstration of the system.

5.4.4.7.7 Context/Environment Dimension The Walloon dances are performed mainly at the dance company place (95% of the respondents) and in dance festivals (88% of the respondents). The folkloric balls are also mentioned by 35% of the respondents as places where they perform, but these dances are almost never performed at home, in schools or in theatres.


The area required for dancing ranges between approximately 6 by 6 meters and 8 by 10 meters, but can be bigger depending on the number of dancers. For the dance performance, the musical accompaniment is very important, as are the costumes of the dancers. The music is mainly played on accordion, violin, flute and percussion, and sometimes also with bagpipes or other instruments. The costumes are mainly peasant costumes from the 18th and 19th centuries, but they can differ for each dance type. They mainly include long skirts for the women and sometimes ample blouses for the men. The costumes can sometimes interact with the dance figures (for instance, women holding and moving their skirt in their hands), and tools representing the peasant craft can also be part of a few dances.

5.4.4.7.8 Teaching and Learning Dimension The teaching of traditional dance figures is done mainly by explaining and showing the student how to perform the gesture. Learning by imitation is hence by far the most important learning method. If the student makes mistakes, the teacher will show the gesture again, or explain the error and let the student perform the gesture again. Dance classes typically last between one and two hours. The student must have good rhythmic coordination, but physical condition is not a very important criterion. For learning without access to a teacher, the respondents emphasized the importance of videos, but do not see much interest in interactive video games, which appear far from their experience, especially since the dances are performed as group dances. However, whilst they emphasize the importance of the traditional learning method, they see as an advantage the possibility of practicing at home, without specific schedules, and think that these technologies might attract the interest of some young people in traditional dances. Nevertheless, human relationships, social interactions, communication and sharing are values that are and should remain at the heart of traditional dances. Interactive video games could be interesting as complementary learning tools, for instance for rehearsing specific figures individually. If people were to learn through an interactive video game, visual feedback would again be by far the most important type of feedback. The feedback should be given after the whole dance is completed for 47% of the respondents, after each dance figure for 41%, and during the dance figure for only 23%. The training system should display a character performing the correct dance, or a video showing the correct dance, and be able to visually highlight the mistakes of the student. The precision in following the rhythm is the first element that should be evaluated, and the precision of body motion (the right move performed with the right style) is also very important. The system should enable the student to visualize the dance from different points of view, to replay his or her own performance, and to choose specific moves or specific lessons.

5.4.4.7.9 Value Dimension Traditional Walloon dancers attribute a very high cultural, but also historical, value to their dances. Information about the Walloon traditional dances can be found in a few books reporting collections of dances, on the websites of dance groups or federations of dance groups, or in videos of the dance groups’ performances. However, most dancers have gathered their knowledge through oral transmission inside the dance groups. The respondents do not see the future of the Walloon traditional dances very optimistically. Most of them think that the interest of people in traditional dances and

the number of people practicing these dances are not increasing, and that young people are not willing to learn these dances. However, most respondents think that new technologies can help preserve traditional dance heritage. Personally, they practice the dances mostly because they like dancing and because it contributes to the preservation of their cultural heritage.

5.4.4.8 Contemporary dance

5.4.4.8.1 General Information

Figure 5-14 Standing Wave – Company Bud Blumenthal

Contemporary dance appeared in the middle of the 20th century in Europe and in the United States, mostly after the Second World War, and has become increasingly popular amongst professional dancers. At the origin of contemporary dance are names such as Martha Graham, Merce Cunningham and Trisha Brown. It is now one of the most represented dance types amongst professional dancers. It consists of the exploration of the total movement potential of the body, and is not bound by set standards; contemporary dance hence covers a very wide range of motion. Contemporary dance was originally developed on the technical basis of classical and modern dance techniques, and has been continually evolving since its origins. Each contemporary dancer is influenced by different sources of inspiration and by his or her own experience. Contemporary dance styles hence often refer to the name of the dancers and choreographers from whom the style originally comes.


Figure 5-15 Dancers images from DANCERS! website

Contemporary dance is hence very diverse, and the performance differs radically from dancer to dancer. Collections such as the DANCERS! project63 can give an idea of the diversity of contemporary dance.

5.4.4.8.2 Respondents’ information Getting answers from contemporary dancers turned out to be difficult. Given that they continuously analyse and think about motion and dance motion, each word has a specific meaning for them, and each question prompts a deep reflection and takes a lot of time to answer. Besides, even though connections with dance companies were established to solicit answers, the contacted dancers lacked the time to provide responses. In the next months, it is planned to expand the distribution network for this questionnaire in order to overcome the problem and collect a significant quantity of data. Their responses will be included in the next document about user requirement identification and analysis (D2.3). In the end, only two people answered the questionnaire, but we had the occasion to discuss at length with both of them and hence to gather very detailed information. One of our respondents is a man, a contemporary dancer since the age of 25 and a choreographer, who has his own dance company and very good experience in collecting contemporary dance performances. The second one is a 43-year-old woman, a dancer of professional level, who has practiced contemporary dance since the age of 19. It is hard to draw any generalities from the personality descriptions that the respondents gave of themselves; both describe themselves as extrovert, conscientious and intellectual.

63 http://www.dancersproject.com


Contemporary dance is mainly studied through participation in dance classes, seminars and workshops. The contemporary dance styles and techniques performed by our experts include, for both of them, M. Graham, J. Limon, Release, Cunningham and Contact improvisation, and for one of them Evans and Jazz in addition. Other activities such as sports (baseball, basketball, football, wrestling, gymnastics, tai chi chuan, yoga) sometimes also influence the way contemporary dancers move. Many new techniques are evolving that are either fusions or metamorphoses of established techniques hybridized with extraneous forms of dance or sport. Examples are hip-hop techniques that are now part of contemporary dance, gymnastics, acrobatics and parkour techniques, African styles, or martial arts, all of which have found their way into contemporary dance. Improvement of contemporary dance skills is obtained mainly by participating in classes, exchanging with other contemporary dancers, watching demonstrations by expert dancers, and also by experimentation by the contemporary dancers themselves. Other dances very often performed by contemporary dancers include classical, jazz and modern dance. In addition to the exploration of all potential body moves, some contemporary dancers, like our two experts, also use their voice as part of their performance while they dance.

5.4.4.8.3 Physical Dimension The body parts involved in the dance can differ according to the contemporary dancer, since contemporary dance is about exploring all potential body movements, with an accent on different body parts according to the contemporary dance style. Our experts agree on the very high importance of feet and legs, arms and torso, but have opposite feelings about the importance of fingers and hands. Regarding the movements of the different body parts and their frequencies, there are no set standards in contemporary dance. Furthermore, all movements should be able to happen simultaneously, and each movement should be able to happen alone, depending on the need to attract attention to a part of the body, to an idea, or to a movement form or figure. Some of the major dance moves recurrent in the contemporary dance performed by our experts include: kick ball change, rond de jambe, port de bras, leap, roll on floor, battement, développé, pirouette or turn, handstand variations, isolations (only one part of the body moves), ripple of the spine, shift of the weight, “floating”, and “support”. In contemporary dance, movements are rarely enacted following a specific rhythm or at predetermined places in space, but rather spontaneously or following a predefined sequence. The key features in contemporary dance are the movement style and expression, and to a lesser extent the dance moves. The movement rhythm, on the other hand, is not a key feature, nor is the movement precision.

5.4.4.8.4 Emotional Dimension Regarding the emotional aspects, the emotions mostly felt by the respondents while dancing are joy, contentment and awe, with love and devoutness in addition for one of the respondents. Among these, the emotions that mainly influence the dance performances are awe and contentment, and also pride for one of the respondents. Inwardness, absorption, concentration and focus can also be reflected in the performance.


5.4.4.8.5 Social Dimension Learning contemporary dance almost always stems from a personal choice of the professional dancers. Contemporary dance can be performed either as a solo dance or as a partner dance, and the motion can be predefined or improvised; the choice of form changes according to the project on which the dancer is working. In the case of group dance, the average number of dancers is between 5 and 10. Contemporary dancers would not practice more if they could train on their own, and they do not agree on the need for complementary learning methods. However, some people would like to learn but do not have access to dance classes.

5.4.4.8.6 Knowledge and Meta-knowledge Dimension Our respondents would like to be able to look for all kinds of information in the i-Treasures database: mainly styles/techniques and dance figures, postures and moves, but also sequences of moves, geographical and historical information, anatomical criteria, and information about choreographers, stage and costume designers, and dance companies. They would be interested in being able to search for correlations between geography and style, movement figure and anthropological criteria, gestures and historical, geographical and anthropological criteria, psychological and gestural criteria, emotions and movements, anthropology and rhythm, and culture and movements. The i-Treasures database is believed to be able to contribute to the preservation of cultural heritage. Interesting additional features could include an Internet connection allowing constant automated searches from a lexicographic term database, triggered by movements (existing tools like Google are not adapted/efficient), and lessons, so that one can make discoveries in many diverse areas directly while studying with the machine. The automated searches should be automatically saved so that one can follow up the generated searches after the session. Anatomy lessons would be very useful (including terminology), and a visualization of how the skeleton and organs/body parts move during specific body movements would be useful, suggesting ways to avoid injuries/pain. Body temperature change monitoring (as the body moves) would also be useful. Warnings for extreme dance moves that may lead to injuries are also a feature of interest, along with personalized training (e.g. adapted for injured dancers).
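As a purely illustrative sketch of the kind of multi-criteria search described above (the record fields, sample data and function are assumptions, not the actual i-Treasures data model):

```python
# Hedged sketch: filtering dance records by several metadata fields at once.
# Field names and sample records are illustrative assumptions only.

RECORDS = [
    {"style": "Cunningham", "figure": "rond de jambe", "region": "USA", "period": "20th century"},
    {"style": "Release", "figure": "roll on floor", "region": "Europe", "period": "20th century"},
]

def search(records, **criteria):
    """Return the records whose fields match every given criterion (case-insensitive)."""
    def matches(record):
        return all(str(record.get(field, "")).lower() == str(value).lower()
                   for field, value in criteria.items())
    return [r for r in records if matches(r)]

# Example: search by style only; adding more criteria narrows the result further.
print(search(RECORDS, style="release"))
```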

5.4.4.8.7 Context/Environment Dimension Contemporary dance is usually performed either at the dance company studio or at dance festivals. The minimum area for performing contemporary dance is around 5-6 meters by 5-6 meters, up to a maximum of around 14 meters by 14 meters. For contemporary dance, the costume and the location have no importance, but the dance floor is very important and should be supple and smooth (wood or vinyl dance floor). The music may or may not be important, depending on the performer and his or her current project. The only specificity of the clothes is that they should be comfortable and not hinder the movements: roomy, wide clothes are nice to work in, but clothes tighter to the body can work too, as long as they do not bind the body; if they have elastic in the fibre they work perfectly.


5.4.4.8.8 Teaching and Learning Dimension Teaching is mainly performed by explaining and demonstrating to the student how to do the moves, or some basic principles, or by pure imitation (demonstration without explanation). On the other hand, explaining without demonstrating anything is not efficient at all. Other teaching methods useful for contemporary dance are: performance proposals, communication with students, provoking the brain, endorsing improvisation, watching, discussing anatomy and physiology, poetic descriptions, moving with eyes closed, hyper-slow moving, and partnering. Learning contemporary dance is mainly achieved by taking classes, even though learning by imitation and learning alone (discovery, experimentation) are also important. A contemporary dance student must mainly have good physical coordination and reflexes, and be motivated by the aesthetic experience. One (or sometimes two) lessons are usually required to learn a new dance move, one lesson lasting around 1.5 to 2 hours. It is very often important to have a generally shared level of proficiency among the students, so that the work is adapted to the state of development of the student. While teaching, if the student makes mistakes, both our experts demonstrate again the correct way of execution and let the student imitate it, or they do the same movement together, so the student can see and imitate at the same time. They also give verbal explanations on what he/she is doing wrong, either during the movement or after it. For learning, videos of people performing the dance or of a professor explaining the dance are the most important training material types. An interactive video game also draws interest from our experts. They see its usefulness for home use, on one’s personal schedule, and for those who do not have access to classes, in places where classes are rare or not available at all, when classes are too expensive, for people who are too shy, for more concentrated and specific training where a class is impractical, or for fun, personal exploration and deep research. But an interactive video game has many limitations and should be used as a complementary tool. If they were to learn through an interactive video game, the feedback should be visual, and given after the sequence of dance moves has been completed, or after the execution of each dance move. Important features of the system would be a display of the correct dance moves (virtual character or video), a virtual character performing the student’s dance moves, and a visual highlight of his/her mistakes. A numerical score is not important at all. When learning contemporary dance, recommendations are useful, but errors are quite subjective. The evaluation should focus on the precision of body motion (correct execution with correct style), but not so much on rhythmic precision or timing. The system should enable the student to zoom in and out and to rotate the view, to replay his or her own performance, and to choose a specific lesson and a difficulty level. There is no need for the avatar to be visually similar to the student. A lesson could consist in repeating new dance figures, then introducing other ones, and finally putting them together in a sequence.

5.4.4.8.9 Value Dimension Our experts disagree on the historical and economic value of contemporary dance, but agree on its very high value regarding culture, uniqueness and innovation. Other important values include the paradigm of physical knowledge (i.e. the expansion of the collective body model idea and its use) and the interaction between stage designer, film maker, musician and choreographer, for instance.


Information sources about contemporary dance are numerous. Many websites propose information about contemporary dance64, hundreds of books are available, and videos can be found on YouTube, Numeridanse, Dancersproject.com, Facebook, the dance companies’ websites, etc. Groups on social media (such as Facebook) are also places where much information is exchanged, along with seminars, workshops, festivals and interaction with different choreographers. The interest of people in contemporary dance is increasing, young people are willing to learn contemporary dance, and more and more people are practicing it. New technologies can be very helpful in preserving contemporary dance. Contemporary dance is essentially important for the dancers because they enjoy dancing, they like to meet new people and to cooperate with others, it improves their physical and psychological condition, and it is a way of expression.

5.4.4.9 Contemporary music composition

5.4.4.9.1 General Information According to Griffiths (1996), contemporary music refers to all post-1945 modern musical forms. In comparison to classical musical forms, contemporary music has some distinctive characteristics, such as the elimination of the notion of tonality, intense changes in the dynamics and the peculiar, unconventional use of musical instruments. In the field of contemporary music composition and of the musical instruments used, a few years ago the electronic synthesizer was a revolutionary concept of a new musical instrument, capable of producing sounds by generating electrical signals of different frequencies through pianistic gestures performed on a keyboard. Nowadays, music production still depends on musical instruments that are based on intermediate and obtrusive mechanisms (piano keyboard, violin bow, etc.). Many years of study are required to reach a good level in (a) controlling these mechanisms and (b) reading and comprehending a priori defined musical scores. This long learning procedure creates gaps between non-musicians and music. Additionally, even the emotional status of the performer is expressed through an extremely limited set of effective gestures such as “keystrokes”, “grand détaché” (sustained strokes) or “martelé” (sharp, almost percussive strokes). Nowadays, there is a growing need for a novel intangible musical instrument, in which natural gestures (effective, accompanying and figurative) performed in a real-world environment, together with emotions and other stochastic parameters, control non-sequential music. The main goal of this use case is to propose a large-scale unconventional tool for sound synthesis, “the intangible musical instrument”, based on real-time recognition of musical gestures performed in space and on the identification of the emotional status of the performer. Thus, ‘everyday gestures’ performed in space or on a surface, together with the emotional feedback of the performer, will continuously control the synthesis of music entities. This intangible musical instrument will not only be addressed to experienced performers, musicians, researchers or composers, but also to users without any specific musical knowledge. The heritage of the classic composers can in this way be

64 www.dancersproject.com ; www.cnd.fr ; www.ladanse.com ; www.dancemagazine.com ; www.dance-tech.net ; www.blackfishacademy.com/dance.htm

made available to everyone; it can be better preserved and renewed using natural body and emotional interactions. Below, the results of the questionnaires for contemporary music composition are presented, according to the proposed dimensions.
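Before the questionnaire results, a small, purely illustrative sketch of how such a gesture-and-emotion mapping might drive synthesis parameters is given below; all names, value ranges and the mapping itself are assumptions for illustration and not the project's actual design.

```python
# Hedged sketch: map a normalized hand height/speed (0..1) and an emotion
# label to simple synthesis parameters. The mapping is illustrative only.

def synthesis_parameters(hand_height, hand_speed, emotion="neutral"):
    """Return pitch (Hz), amplitude (0..1) and brightness (0..1)."""
    emotion_brightness = {"joy": 0.9, "sadness": 0.3, "neutral": 0.6}
    pitch_hz = 110.0 * (2 ** (hand_height * 3))   # roughly three octaves above A2
    amplitude = min(1.0, 0.2 + 0.8 * hand_speed)  # faster gesture -> louder
    brightness = emotion_brightness.get(emotion, 0.6)
    return {"pitch_hz": round(pitch_hz, 1),
            "amplitude": round(amplitude, 2),
            "brightness": brightness}

# Example: a raised, fast gesture performed in a joyful state.
print(synthesis_parameters(hand_height=0.8, hand_speed=0.7, emotion="joy"))
```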

5.4.4.9.2 Respondents’ information Our sample is composed of 43 respondents, 24 of whom are male and 19 female, with a mean age of 33.95 years (min. 18, max. 57). As to the educational background, most of the respondents (33) have higher education, while the rest (10 respondents) have high school education. They have been studying music for an average of 23.9 years (SD 11.68) and practice for 16.3 hours (SD 11.99) per week. The majority of the respondents are professional musicians (12 of them), 11 are composers, 10 are students and 5 are teachers, while the remaining 5 respondents are both students and teachers. The most frequently played musical instruments among the respondents are the piano (20 of them) and the guitar (8 of them). Moreover, 15 of those who play the piano belong to and follow the European piano school and the rest the Russian one. The majority of the respondents are familiar with contemporary music (39 respondents), in the sense that they mostly practice contemporary music and follow current tendencies in contemporary creation, or that they have some basic knowledge and are sometimes curious to know what is composed today. Furthermore, they are familiar (35 respondents) with the use of technological devices in general (computers, tablets, etc.), as well as (31 respondents) with the use of technological devices in their musical activity (training, composition, etc.).

5.4.4.9.3 Physical Dimension The first main goal of this section is to categorize the main gestures musicians make while performing music; the second is to analyze them and create a musical gesture dictionary/typology, which will be used for the “intangible musical instrument”. Generally speaking, musicians are expected to make different gestures for different kinds of musical instruments. Although our experts play a wide variety of musical instruments, such as piano, guitar, flute, trumpet, saxophone, violin, viola, cello, zither, etc., we focused mostly on the pianists’ gestures, as the pianists were the majority or play more than one instrument including the piano. The musicians also reported that they do not concentrate on what gestures they are going to make while performing, but only on the musical score. As a result, the process of realizing, recording and categorizing gestures was quite difficult for them. The guidelines for the physical dimension were written according to Delalande’s (1988) typology. More specifically, the gesture categories are: a. effective gesture: a sound-producing gesture; b. accompanying gesture: supports the effective gesture in various ways; c. figurative gesture: refers to a mental image that is not directly related to any physical movement, but which may be conveyed through sound. Therefore, we asked the experts not only to describe their major body movements (hand and arm gestures, etc.) but also their frequency. As a result, effective gestures include the muscular tension of the back, shoulder, arm and wrist. According to the experts, their gestures involve much more tension than external movement, meaning that they mostly use the muscles of the upper body in

order to produce sound/music. Moreover, for the pianists, the use of the leg for the pedal is also important, as it plays a crucial role in the sound result. The accompanying gestures include the correct position of the torso and knees, the relaxation of the neck and shoulders, as well as breathing. These gestures do not produce sound, but support the musician and the sound result in many ways. Figurative gestures include head or leg movements made in order for musicians to be more theatrical or to communicate with the audience. Finally, regarding facial expressions, according to our experts these are in most cases, as already mentioned, serious and concentrated on the task of performing the musical piece. Musicians do not usually have specific facial expressions meant to convey specific emotions (such as happiness, sadness, etc.). Their gaze may be oriented in different directions: sometimes, if they are playing with others, they look at their maestro for instructions or at the other musicians for better communication; in other cases their gaze is directed to the musical score.
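As a purely illustrative aside, the Delalande (1988) typology used above can be written down as a small enumeration; the classification of the example gestures follows the experts' descriptions, while the helper itself and its names are assumptions for illustration.

```python
# Hedged sketch: the three gesture categories of Delalande's typology as an
# enumeration, with a few example gestures taken from the descriptions above.

from enum import Enum

class GestureCategory(Enum):
    EFFECTIVE = "sound-producing gesture"
    ACCOMPANYING = "supports the effective gesture in various ways"
    FIGURATIVE = "mental image conveyed through sound, not a direct movement"

EXAMPLE_GESTURES = {
    "muscular tension of back, shoulder, arm and wrist": GestureCategory.EFFECTIVE,
    "pedal press with the leg (piano)": GestureCategory.EFFECTIVE,
    "torso/knee position, neck and shoulder relaxation, breathing": GestureCategory.ACCOMPANYING,
    "head or leg movement towards the audience": GestureCategory.FIGURATIVE,
}

for gesture, category in EXAMPLE_GESTURES.items():
    print(f"{category.name}: {gesture}")
```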

5.4.4.9.4 Emotional Dimension For the investigation of this dimension, we asked the experts what a musician’s feelings/moods are while performing. According to them, it is easier to perform effectively if they have experienced something important and joyful and then have to perform a sad piece of music, than if they have experienced an important and sad event and have to perform a cheerful piece of music. This is because in the first case they can imitate and feel the required emotional state more easily and therefore perform better, in contrast to the second case. We also have to highlight that many respondents support the idea that the emotional state agrees with the musical expression of an emotion.

5.4.4.9.5 Social Dimension As far as the social dimension is concerned, the respondents not only feel but also perform better when they play with others than when they play solo, because they feel more secure. They are also more concentrated on what they do when they play with others. Finally, a movement becomes a musical gesture only if it is understood as such by the perceiver, within the flow of communication between the musician and the audience.

5.4.4.9.6 Knowledge and Meta-knowledge Dimension Starting with the theoretical aspects, according to the experts a good understanding of the score is essential for proper musical execution, and knowledge of the score also supports better musical expression. In the case of the pianists, they also maintain that different fingerings and different uses of the pedal produce different musical results.

5.4.4.9.7 Context/Environment Dimension As far as the environmental dimension is concerned, musicians perform in various places, such as at home, at the conservatory, outdoors, indoors, in theaters or at the opera. The places where the respondents usually perform are the home and then the conservatory. They also believe that the performance space is very important for the final outcome.

5.4.4.9.8 Teaching and Learning Dimension According to the experts, a technological training system to support the learning process would be very important, not only for raising awareness, but also for the transmission of

contemporary music composition. Affordances might include a digital score, a feature for multimedia annotation of the score, etc. Supporting tools for practicing, such as a metronome or tuner, or collaborative tools, such as mail and forums, could also play an important role in the educational process. Moreover, the possibility of isolating instruments and distinguishing individual styles, as well as any kind of visual and conceptual representation of the structure of the contemporary musical piece, could be of great importance in the learning and knowledge development processes. The experts also believe that it is very important for the performer to be able to follow his/her personal evolution through an evaluation system. Finally, according to our experts, a possible structure for a lesson is the following:
1. Stimulus (i.e. sound, paintings, etc.)
2. Discussion (history, social context, etc.)
3. Alternatively, show and discuss other composers’ scores
4. Discuss what the student wants to compose
5. Discuss each one’s score with the other students
6. Repeat the process
7. Add new material – new gestures

5.4.4.9.9 Value Dimension The objective of the contemporary music use case is the composition of new musical pieces based on musical patterns of classical composers (e.g. Beethoven, Haydn or Mozart). According to the experts, teaching classical music is necessary for the general music education process, as it stimulates creativity, imagination and emotions and causes spiritual elevation. It contributes to the understanding and perception of contemporary music and often inspires modern musical creations, thus achieving the renewal of cultural heritage. For this reason, the value of classical music is very important. Generally speaking, this value is cultural as well as historical, because of the evolution of music through the ages. Its benefits make classical music an intangible cultural heritage that must be preserved and safeguarded. This is the purpose of contemporary music composition: the renewal of musical pieces.

5.4.4.10 Craftsmanship

5.4.4.10.1 General Information Pottery65 is the ceramic material which makes up pottery wares, of which major types include earthenware, stoneware and porcelain. The place where such wares are made is also called a pottery. Pottery also refers to the art or craft of the potter or the manufacture of pottery. Pottery is made by forming a clay body into objects of a required shape (most commonly vessels) and heating them to high temperatures in a kiln (high temperature chamber), which removes all the water from the clay and induces reactions that lead to permanent changes including increasing their strength and hardening and setting their shape. A clay body can be decorated before or after firing. Prior to some shaping processes, clay must be prepared. Kneading helps to ensure an even distribution of the moisture content throughout the body. Air trapped within the clay body needs to be removed. This is called de-airing and can be

65 http://en.wikipedia.org/wiki/Pottery

accomplished by a machine called a vacuum pug or manually by wedging. Wedging can also help produce an even moisture content. Once a clay body has been kneaded and de-aired or wedged, it is shaped by a variety of techniques. After shaping, it is dried and then fired. Today, the importance of pottery for utilitarian purposes has declined: we seldom see people using pottery jugs or containers to carry food or water. Instead, containers made of metal or plastic are used, because they are more durable than pottery, which is fragile and breaks easily.

5.4.4.10.2 Turkish Pottery (Avanos Region) Anatolian pottery dates back to the Chalcolithic (Copper) Age (7000 BC). There are several pottery centers in Anatolia which have different characteristics depending on parameters such as the material, cultural roots, etc. Avanos is the most popular traditional pottery center in Turkey. The oldest samples in this region date back to 3000 BC. Figure 5-16 shows two samples from the Avanos region which were produced around 2000 BC. The city is also a very old trade and pottery center. Historical records report that the name of the town in the Assyrian era was “Nenessa”. The town was named “Zu-Winasa” during the Hittite era. The Romans and Byzantines called the place “Venessa”, and its Ottoman-era name was “Evenez”.

Figure 5-16 Avanos Pottery Samples (~2000 BC).

The town is part of the world-famous touristic Cappadocia region. It is situated just to the north of Cappadocia and is separated from the region by the longest river in Turkey, known as the Kızılırmak (Red River). As the name indicates, the Kızılırmak has red silt in its basin and along its banks. The ceramic clay of this red silt is used as the main material in Avanos pottery. Figure 5-17 shows the Cappadocia region, the town of Avanos and the location of Avanos in Turkey.

Figure 5-17: Cappadocia, Red River and the Location of Avanos


Pottery used to be the main economic activity of Avanos. Earthenware products such as yoghurt containers, vases and pots were quite popular. It was thus quite important for the male population of Avanos to master pottery; it is said that proficiency in pottery was an unwritten precondition for a man to get married, as it indicated that he could easily provide for a family. However, the rise of plastic materials has lessened the popularity of traditional earthenware pottery. According to the records, there used to be 75 workshops and more than 300 skilled potters in Avanos. Currently there are nearly 20 workshops and almost 150 potters in the town, thanks to Cappadocia’s tourist attractions, which keep Avanos pottery alive. The pottery art and skills are mainly transferred via the master-apprentice approach in Avanos. However, there is a significant rise in the number of potters who receive formal training. Another important issue is the change in the format of the products, as the region has been influenced by the ceramic work of the Kütahya pottery region.

5.4.4.10.3 French Pottery (Vallauris – the clay city) As a representative case study of French pottery, we have chosen to focus on the city of Vallauris, also called the clay city, located in the south of France.

Figure 5-18: Vallauris location
The pottery-making traditions of Vallauris date back to the start of the Christian era. In Gallo-Roman times, large deposits of fireclay were already being used to make bricks and pots. In the 16th century, the town was ravaged by the plague. However, 70 families from the surrounding area of Gênes later repopulated it, among which were several potters. From a technical point of view, Vallauris pottery is influenced by Spanish, Italian, French and Romanian techniques. In 1966, Vallauris potters decided to create a true World Centre of Ceramics. They proposed the creation of a national competition bringing together the best artists and craftsmen in France. This idea rapidly caught on with famous contemporary figures such as André Malraux and Pablo Picasso, as well as other creative artists, so well in fact that in 1968 the competition became international. Thus the Vallauris International Biennale of Ceramics was born, playing a very important role in the local community of Vallauris, in its local identity and in tourism66. Currently around 50 expert potters and ceramists are actively working and promoting the pottery art of the region. The city has thus obtained a local quality label named «City of Artistic Professions».

5.4.4.10.4 Greek Pottery The use of pottery in Greece dates back to 7000 BC. It developed especially in fertile plains which offered appropriate soil (to make clay), plenty of water and fuel. The most significant centers of pottery production were: the Cyclades, Thessaly,

66 http://www.vallauris-golfe-juan.fr/Town-of-clay.html?lang=fr


Macedonia, Crete and the Peloponnese. Minoan vessels have been found as far away as Gibraltar, while classical vessels have been found across the Mediterranean and in Mesopotamia. Because of its relative durability, pottery comprises a large part of the archaeological record of Ancient Greece, and since there is so much of it (some 100,000 vases are recorded in the Corpus vasorum antiquorum67), it has exerted a large influence on our understanding of Greek society. The shards of pots discarded or buried in the 1st millennium BC are still the best guide we have to the customary life and mind of the ancient Greeks. Pottery in Ancient Greece was painted with both abstract designs and realistic scenes depicting everyday Greek life. Ancient Greek paintings and structures did not survive as well as Ancient Greek pottery, so the paintings on the jugs, vases and pots are extremely important for archaeologists. Clay was a very important part of ancient Greek culture68. In the absence of glass or plastic, clay was extensively used to make containers, since it is easy to find in Greece. Once fired, clay is almost indestructible (unless broken) and also fairly waterproof. These features made it a perfect material for containers, e.g. big storage jars, cups, wine bottles, etc. In Crete, the first pottery fragments date back to 7000 BC; however, pottery initially thrived in the Early Minoan period (3000-2100 BC) and reached its peak in the Middle Minoan period (2100-1600 BC). Other important chronological periods for Greek pottery, which are reflected in specific styles, include69: Protogeometric (1050-900 BC), Geometric (9th and 8th century BC), Orientalizing (900-600 BC), Black Figure (620-480 BC), Red Figure (late 6th century BC), White-ground (end of the 6th century BC) and the Hellenistic period (roughly late 4th century to 1st century BC).

Figure 5-19 a) Pithari (large jar) from Thrapsanos b) Pottery maker in Corfu c) Pottery from Sifnos d) Pottery from Archangelos (Rhodes).

In modern Greece, the pottery sector encounters difficulties, although it could well be an important and robust sector of the Greek economy and achieve a significant

67 http://en.wikipedia.org/wiki/Corpus_vasorum_antiquorum
68 http://atschool.eduweb.co.uk/sirrobhitch.suffolk/Portland%20State%20University%20Greek%20Civilization%20Home%20Page%20v2/docs/8/glatt.htm
69 http://en.wikipedia.org/wiki/Pottery_of_ancient_Greece

number of exports. In the past, there have been some programmes and initiatives, e.g. from local regional authorities (Marousi, Aigina, Thrapsanos/Margarites in Crete, Madamados in Lesbos, etc.), that gave a boost to the sector. However, these were fragmented approaches with a theoretical focus and without any continuity, and there was no central (national) organization supervising this activity, which was sometimes considered an art (Ministry of Culture) and at other times a profession (Ministry of Development). A serious crisis for Greek pottery occurred in the 1950s and 1960s because plastic and aluminum replaced pottery items in everyday life. Many traditional pottery businesses were forced to close once pottery vessels were no longer used to transport water. Wooden ovens and ceramic plates, vases, etc. were no longer made, since electric kitchens became extremely popular, and refrigerators replaced the clay items that were previously used for storage. For instance, mass production of industrial ceramics was possible at the factory of Kerameikos (a historical place in Athens, where ceramic artists worked), which exported until the 1980s but then closed. However, even after serious losses, the Greek traditional ceramics sector has recently shown signs of recovery, due to factors like the following: a) the rise of tourism and of exports through the production of souvenirs from clay, b) trends of modern society, e.g. the rise in demand for large decorative ceramic jars for balconies and gardens, c) the wide use of fireplaces and of cuisine TV shows, which led to high demand for cooking pots and other ceramic items for the kitchen, and d) the wide expansion of the Internet. There are many reasons to strengthen this sector of the economy. Traditional Greek pottery products suffered for many years from the Government’s lack of interest in the sector and from the absence of an integrated strategy for their promotion and marketing. Even outside Greek museums, tourist shops sell copies and imitations of products mainly from Asian markets. One problem that needs to be tackled is that the raw material for creating high-quality pottery is expensive, reaching 20-40% of the cost of the final product. In many cases, clay needs to be imported (one large Greek provider of clay, called ELKEA70, existed, but it has recently been declared insolvent). Specifically, clay is imported (mainly from the UK and Italy) at a rate of 50-60% for pottery artisans and 80% for fire-resistant ceramics. Although Greece has the appropriate soil for specialized ceramics, there is no good organization to produce the necessary quantities and reduce imports. The lack of promotion and marketing of Greek ceramic products in domestic and international markets, the increase in transportation costs, as well as the decrease in demand from tourists due to the global financial crisis, make the potteries (with some exceptions) focus more on the domestic market. Another problem is that the knowledge of making traditional pottery was transmitted mainly from one generation to the next, mostly from father to son, since women only had an auxiliary role. Also, in many cases the maker and the seller were not the same person.

5.4.4.10.4.1 Respondents' information
We received 26 questionnaires in total (from Greek, Turkish and French potters). The majority of the Greek questionnaires were answered by very experienced potters from various regions of Greece. The Turkish questionnaires were filled in with the help of the pottery expert Dr. Betül Aytepe, who visited 10 participants (potters) and completed the questionnaires with them after providing information about the i-Treasures project and its technical details. Since she is a respected academic in this field, it is believed that the participants gave the questionnaire due attention.

70 www.elkea.gr


Regarding the gender distribution, 13 of the participants are female and 13 are male. The age of respondents varies, but most of them (7 out of 26) belong to the 50-59 age category. The level of education also varies: College/University (10), High School/Lyceum (8) and basic education (6), while 2 did not answer. Most respondents started practicing craftsmanship at 20-29 years of age and many others at 10-19 years. The great majority (17 people) have practiced craftsmanship for more than 6 years, while a considerable number (8 people) have practiced for 1-2 years. The majority of respondents (15 people) are professional potters, some of whom are at the same time instructors (6 people), while the rest (9 people) are students. As far as experience in wheel throwing is concerned, the majority (14 people) have practiced wheel throwing for more than 6 years, while many (7 people) have practiced it for 1-2 years. The majority (12 people) learned pottery with a professor/expert who explained and showed them some basics, after which they continued learning on their own. Regarding the technology-related issues in the questionnaires, the majority seem positive. However, most of the participants also indicate that the enjoyment lies in touching the clay and shaping the material by hand; thus, they agree that hands should get dirty in order to master this handicraft tradition.

5.4.4.10.5 Physical Dimension
In wheel-throwing craftsmanship, the potter initially prepares and kneads the clay, and then centers it on the wheel. The next step is the pricking and the creation of the bottom. After the completion of this step, the first shape of the object is formed by raising the clay and creating equal thickness in the object's sides. The object is then refined. Finally, the object is given its final form and is removed from the wheel. While kneading, the potter stands upright and presses the clay with his forearms; all other steps are carried out with the potter in a seated position. Wheel-thrown pottery is made using the potter's wheel, a machine with a horizontal spinning disc on which clay is shaped into various objects. In wheel throwing, hand coordination and rhythm, body posture, the correct hand pressure and finger flexibility are very important. Most of the potters use the classical electric wheel, on which small symmetric pieces are produced (pitchers, jugs, jars, pots). The professional potters interviewed stated that the whole body takes part in pottery. It requires a lot of concentration and many hours when done professionally. Moreover, they said that the body position must be changed frequently, according to the technique or the object being created. It is a great exercise for the body that requires strength and energy. Finally, people with physical/muscular conditions affecting the torso, people who cannot apply enough energy or force, or people who are not flexible enough will have difficulty practicing pottery for a long time or professionally. Respondents say that the fingers of the right hand, the right wrist and the fingers of the left hand are the body parts most involved in wheel throwing. The left wrist, right sole and right shoulder are also involved a lot in the procedure. The left hip, left knee, left elbow, left sole and right hip are regarded as the least involved body parts. The level of involvement of all other body parts can be regarded as medium.

5.4.4.10.6 Emotional Dimension
Due to its artistic nature, pottery creation makes people feel joy, devoutness, pride and contentment. Most people, whether professional or amateur, say that joy is the most

common feeling when performing wheel throwing. According to the respondents' declarations, creating objects with great historic or spiritual value, such as copies for museums, makes them feel devoutness. Pride is an emotion that people have felt when creating things on their own since early childhood, so it is an expected answer.

5.4.4.10.7 Social Dimension
Potters usually work alone in their workshop, and in these conditions they feel more concentrated on their work. They neither encourage nor discourage their friends and relatives from getting involved with pottery; we can say that they keep a neutral attitude towards encouraging friends and relatives to take up pottery. Most of the interviewees answer that the main factor which led them to learn pottery was their family, and the second factor was their interest in tradition. Many respondents were motivated by personal inclination. It should also be mentioned that the existence of a local ceramic industry was a significant factor in their involvement in learning pottery.

5.4.4.10.8 Knowledge and Meta-knowledge Dimension
In the knowledge platform, people would like to search firstly for sequences of moves and secondly for body/hand postures and styles/techniques. Moreover, many people are interested in correlations with body and hand posture. Other types of correlations people would like to search for are: clay type and plasticity, slip, glaze, firing temperature, resistance to deformation, stages and techniques used, and geographical region for the different objects/object types. For example, specialized pottery techniques used in specific geographic regions (e.g. India) are of special interest.

5.4.4.10.9 Context/Environment Dimension
The great majority of potters use an electric wheel instead of a foot-operated one. The most important factor for clay quality is its plasticity. The basic materials/tools used to practice wheel-throwing pottery are fettling tools, wire and water; additionally, kaolin, color, glass, a scraper and a sponge can be used. Most respondents say that wearing gloves affects the procedure, because the potter loses the feeling of the clay during creation. The great majority of potters use an electric kiln rather than a traditional wood-fired kiln.

5.4.4.10.10 Teaching and Learning Dimension
Most of the users answer that the most efficient method for teaching wheel-throwing gestures is to explain and show the student some basics and then let him/her learn the rest by himself/herself. Video, audio and printed supports are not often used during the teaching of wheel-throwing pottery gestures. If they observe errors in their students' gestures/postures, most respondents try to guide the student's hands through physical contact, using their own hands. Frequent lesson attendance by the student is regarded as very important. Most respondents believe that the average number of hours typically required to learn pottery is 300; while some claim that 200 hours would be enough for a skilled student, others claimed that a minimum of 300 hours is required. Most people say the way lessons are taught depends on the age or the experience of the students. A typical lesson lasts between 45 minutes and 4 hours.


As far as training material is concerned, respondents say that video is very important, while printed text, audio and interactive games are not regarded as very important. Visual feedback is the most important type of feedback to be provided to the student by the training system, and interviewees want this feedback to be delivered in real time. Moreover, the elements of the training system's user interface that are considered important are a virtual hand model presenting the correct procedure and the visual highlighting of mistakes. The correct execution of moves is regarded as very important in determining the student's score, while correct body posture and the correct sequence of moves are also considered important (an illustrative weighting sketch is given below). Most respondents totally agree with the statements that the ability to replay the student's performance and the ability to set a difficulty level are important features, and they agree strongly that the ability to select a specific move for training is an important feature. According to most trainers, the lesson should be structured around the repetition of movements until the best execution is approached. Other answers given are: the creation of an object with an increasing degree of difficulty, and the repetition of the same gesture or/and the progressive introduction of new gestures. Most people believe the difficulty should be escalated across lessons by creating different objects of increasing complexity.
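The weighting reported above could translate into a very simple scoring rule. The sketch below is purely illustrative: the component names and the weights are assumptions introduced here, not values elicited from the respondents.

```python
# Illustrative only: component names and weights are assumptions,
# not figures elicited from the pottery questionnaires.
WEIGHTS = {
    "move_execution": 0.5,   # reported as the most important factor
    "body_posture": 0.25,    # reported as important
    "move_sequence": 0.25,   # reported as important
}

def student_score(components):
    """Combine per-component scores in [0, 1] into a single mark in [0, 100]."""
    total = sum(WEIGHTS[name] * components.get(name, 0.0) for name in WEIGHTS)
    return round(100 * total, 1)

if __name__ == "__main__":
    # Example: accurate moves, slightly off posture, correct sequence.
    print(student_score({"move_execution": 0.9,
                         "body_posture": 0.7,
                         "move_sequence": 1.0}))  # -> 87.5
```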

5.4.4.10.11 Value Dimension
The long history of pottery, with variations from region to region, justifies its historical and unique value. Moreover, since pottery is a contemporary art expression all over the world, it is highly correlated with the cultural dimension. Last but not least, pottery's economic value in many regions is of great importance. Handmade pottery can often be a folk art masterpiece. Many clay items, e.g. the pithari of Thrapsanon, Crete71, the plates of Archangelos, Rhodes72, or the pottery items made on Sifnos island73, are considered unique in the world. Finally, Greek ceramic cookware is made following strict international specifications for the safe cooking of food.

5.5 Interviews with the Experts – Consultants
As explained in the overall methodology (see Section 3), the third source from which we derived the requirements was the interview, which was conceived and carried out in some of the sub-use cases. The need for this third step of data collection arose from the analysis of the data gathered through the questionnaires (see the previous Section): while in some use cases (namely Craftsmanship and Dance) the questionnaires allowed a certain degree of detail as far as the existing teaching and learning processes are concerned (thus guaranteeing sufficient input to define a set of requirements, especially related to the possible educational functionalities of the system), in other use cases (i.e. the Songs, as well as Contemporary music) the information was vaguer. This led some of the sub-use case leaders to select a restricted set of Experts (Experts-Consultants) and come back to them, this time with an interview, more oriented towards understanding what the existing teaching/learning

71 http://www.e-thrapsano.gr/cretan-pottery/234-the-pottery-art-in-thrapsano-historical-data.html 72 http://www.visitrhodes.gr/showcontent.asp?id=172&mainid=8 73 http://sifnos.e-sifnos.com/sifnos-information/sifnos-pottery.html

practices are and, above all, what they might expect the system to offer in order to foster educational activities in these fields.

5.5.1 Dimensions/topics of the interviews
In order to define the main dimensions/topics that should constitute the main foci of the interview, a close collaboration took place between CNR and UOM. The interactions between the two resulted in the definition of a number of main dimensions/topics, which then guided the overall structure of the interview. In particular, these include:
- User interface of the platform (especially accessibility and ergonomics issues)
- Teaching and learning functionalities (what a teacher might expect from the platform to support the design, running and evaluation of learning activities in the context of one specific sub-use case). This may entail:
 Course monitoring – this topic has to do with the functionalities of the platform for tracking usage by students, scheduling and availability controls, etc.
 Course access authorization – this topic regards the ability of the platform to manage identities, roles and permissions, as well as groups (login and password) and related aspects, such as, for example, the possibility for the teacher to manage enrolments (a minimal sketch of such a role/permission model is given after this list).
 Assessment Design – this topic concerns the ability of the platform to design and manage assessment tools, such as creating and administering tests, automated scoring, self-assessment and feedback mechanisms, etc.
 Online Collaboration and Communication – this topic covers the area of communication and collaboration, i.e. the ability of the platform to provide tools supporting communication (chats, forums, video-conferencing, email, social networking, etc.) and collaboration (file exchange, wikis, blogs, commenting features, whiteboards, etc.).
 Productivity Tools – these include: bookmarks, calendar/progress review, orientation/help, search, etc.
 Practicing – the ability of the platform to offer functionalities to students, such as, for example, the possibility to choose specific lessons, to choose the level of difficulty, to dynamically adapt the learning path, to record and save progress, etc.
 Supporting tools for practicing music – these include: metronome, tuner, score follower, etc.
 Practice and Training Support for understanding and (self-)assessment – the ability of the platform to support students in the process of understanding and assessing their performance; tools may include, for example: score follower, score rendering, score annotation, sound recording and analysis, sound visualization, sound comparison, sound feature extraction, gesture follower, visualization through an avatar, etc.
 Text to song – the ability of the platform to allow students and teachers to enter some lyrics (or semadophons), so that the platform reproduces the song/chant as a professional performer would do.
- Educational content (the ability of the platform to provide access to educational/information contents). This includes:


 General – the format of the resources made available by the platform (videos, texts, images, audios, etc.), navigation issues, etc.
 Symbolic music representation – this includes the logical structure based on symbolic elements, and instructions related to rendering and synchronization with different types of media.
 Multimedia annotations – the ability of the platform to allow highlighting important elements of the score, drawing lines, circles, arrows, etc., and adding text descriptions.
 Music Exercise Generator Tool – the ability of the platform to allow teachers to create exercises through: loading and saving music material, loading and saving templates with the exercise structure, integrating music material with the exercise structure, etc.
- Functionalities for researchers (for example in terms of specific search functionalities).
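As an illustration of the 'Course access authorization' topic above, a minimal role/permission model might look like the following sketch. The role names follow the user types discussed in this deliverable, while the permission names are hypothetical placeholders.

```python
# Minimal sketch of role-based course access; permission names are hypothetical.
ROLE_PERMISSIONS = {
    "teacher":    {"create_course", "manage_enrolments", "design_assessment", "view_content"},
    "student":    {"view_content", "submit_assessment", "post_forum"},
    "researcher": {"view_content", "search_archive", "upload_material"},
    "basic_user": {"view_content"},
}

def is_allowed(role, permission):
    """Return True if the given role holds the given permission."""
    return permission in ROLE_PERMISSIONS.get(role, set())

if __name__ == "__main__":
    print(is_allowed("teacher", "manage_enrolments"))  # True
    print(is_allowed("student", "manage_enrolments"))  # False
```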

5.5.2 Interviews preparation
Once the dimensions/topics had been defined, the overall structure of the interview was designed. The idea was to conduct 'semi-structured interviews' in which the interviewer, with the help of some supporting material, would introduce the above-mentioned topics one by one and then invite the interviewee to talk freely while looking at some keywords (meant to inspire her/him). The supporting material (prepared by CNR) consisted of guidelines for the interviewers (to guarantee a certain homogeneity of approach in conducting the interviews) and of cards to be shown by the interviewers to the interviewees, to provide stimuli about a certain topic (see Appendix 10.5 for the Guidelines, which also include the cards). The interviews were to be recorded, and the recordings analysed by means of a template to be filled in by the coder with the data gathered (the template is also included in the Guidelines in Appendix 10.5).

5.5.3 Interviews release and delivery
As already mentioned, the use cases within which interviews were conducted were the Rare Songs (in particular, Canto a Tenore and Byzantine music) and Contemporary music. In particular, while for Canto a Tenore 2 Experts-Consultants (academics) were interviewed, for Byzantine music 20 interviews were conducted, and for Contemporary music 16 Experts were involved (10 of whom were interviewed by UOM and the rest by ARMINES/ENSMP; 9 interviewees were contemporary music composers and teachers, and the rest were students of contemporary music composition).

5.5.4 Interviews results
A synthesis of the main results of the interviews is provided in the following sections.

5.5.4.1 Canto a Tenore - User Interface of the platform


Taking into consideration the type of user, the experts underlined the need for a platform with an easy, user-friendly interface (for a young audience) and very intuitive menus with large buttons (for beginner users). As for the language, it should be in Italian first, but also in Sardinian, for both ideal and practical purposes: indeed, there are some terms in the Sardinian language that cannot be translated into Italian. Finally, they also proposed English, which is the language of research and helps in the dissemination and spreading of knowledge on this topic.
- Teaching and Learning Functionalities
For the definition of these dimensions, it is important to underline that in Canto a Tenore a formalized teaching process does not exist, nor does a single way to perform this type of singing: each Sardinian village has its own style (modas). We point this out to underline that it is very difficult to confine this form of singing within fixed boundaries, such as a platform. Learning Canto a Tenore is not only related to sound reproduction, but is an experience linked to context, social aspects, etc.; indeed, from the cantors' point of view, learning is based on imitation, basically through a face-to-face approach. However, the experts agreed that technology can help to develop a more structured learning and teaching approach. They identified as possible users: a person interested in Canto a Tenore but not living in Sardinia; a researcher or an expert who wants to go in depth into some technical aspects; but also a singer in Sardinia. In a learning/teaching process, the platform should contain, for the singers, videos with a lesson carried out, for example, by a master of Canto a Tenore on a specific topic such as timbre quality; for the researcher, audio with a separate recording for each voice. Finally, it can help to understand the acoustics behind the singing and to make some articulation analyses, which might be useful for the singers themselves. Virtual courses can be a profitable resource but should be conceived at an individual level. They should support two different kinds of competence (listening and singing) and should be conceived as self-learning paths. The platform should allow access to different learning objectives, both self-contained and modular. The system should also provide a file repository (for texts, videos and audio recordings), reachable through easy navigation. The tracking of activities done and to be carried out should be considered a guide for the learner in following a coherent path. Course access authorization should be at different levels according to the different types of user (expert/teacher, learner/student and single user). The assessment setting should support the user in the self-learning process, e.g. the learner should be able to record his performance and evaluate it by himself or have it evaluated by an expert singer, as in the usual tradition, or the system should give automatic feedback. One of our experts imagined a different assessment setting for each voice of the Canto a Tenore, for a more structured type of evaluation. Collaboration would be desirable in an online or blended course, because it can help the student to remain focused on the course. As regards practicing, we should not think of Canto a Tenore as a finite number of songs; there is no structured repertory. It is more correct to talk about "ways of singing" based on continuous decisions; the singers do not follow a score.
There is a sort of "grammar" that should be known by the singer, who, from time to time, follows the rules (i.e. some styles require texts with particular metrics, harmony increases and decreases) and performs. For practice, it would be useful for the student to listen to an audio recording of a master singer who sings in the style of a village, to "see" the visual representation of that particular timbre and to try to reproduce that song. After singing, the student can listen to his performance, see

the visual representation of his singing and compare it with that of the master singer. The system can give feedback to the student about the timbre, i.e. in which parts the timbre is different from or similar to the master's. It would be useful to create a graphic visualization of the two performances; in this way, from the system's feedback and from his own evaluation, the student can understand in which parts his way of singing is similar to or different from the "original". The student can thus learn whether he sings in the style of that village but with some differences (imitation is avoided in Canto a Tenore; this way of singing is in continuous evolution). Some other practice and training supports on the platform should be: the opportunity to follow and annotate the formal representation of the music, but also sound recording and analysis. Sound should be recorded, analyzed and compared with a pre-recorded sound, or analyzed following a set of rules (i.e. given that Canto a Tenore is mostly based on improvisation, a learner could try to produce a piece of a song and get back this information: is the structure of the song correct in relation to a set of rules? What is wrong?). A spectrum analysis in real time (by means of a digital spectrum analyzer) can help the learner to compare his performance with a recorded one (an illustrative sketch of such a comparison is given after the 'Educational content' notes below). Regarding the prospective implementation on the platform of the "text to song" function, it is necessary to consider that in Canto a Tenore the text can be the result of an improvisation. Often famous poetry texts are sung, so a tool which allows one to upload a text and listen to it sung as a cantor of Canto a Tenore would sing it can be useful for training listening competences and for understanding some particular rules related to the metrics and the translation of a text into a song (the text is in some way elaborated by the cantor, who does not sing the text in the written sequence but can, following specific rules, skip a verse, cut a verse, etc.).
- Educational content
The platform should support all the cited types of file in order to provide a wide range of materials, and it should include an archive with all these files. Training materials on the historical and cultural value of Canto a Tenore are important in order to understand the ICH before practicing. Video and audio recordings could be useful for training listening competences. Texts collected in a sort of song book could be useful for training singing competences: in Canto a Tenore, a text can be sung following a style and the related rules. The integration of the symbolic representation with video and audio recordings could help the learner to understand the structure of the song and reproduce it. The experts stressed that the best symbolic representations could be the spectrogram and the melodic curves, because understanding a traditional score can be very difficult for a student without a musical background; these forms of representation are more communicative and suitable for this type of singing and audience. Platform exercises should be conceived so as to support the listening and singing competences of the learner. To improve listening competence, it could be useful to integrate the listening to a song with exercises (recognition of a style, of the metrics, etc.).
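As an illustration of the real-time spectral comparison mentioned above, the following minimal offline sketch computes the spectrograms of a student take and a master recording and reports where they differ most. It assumes NumPy and SciPy are available and that both recordings are mono signals at the same sampling rate; it is not the i-Treasures analysis pipeline.

```python
# Minimal offline sketch (not the project pipeline): compare the spectrogram of a
# student's take with a master recording and report the frames that differ most.
import numpy as np
from scipy.signal import spectrogram

def spectral_difference(student, master, fs=44100):
    """Return, per time frame, the mean absolute difference (in dB) between spectra."""
    _, t_s, S_student = spectrogram(student, fs=fs, nperseg=1024)
    _, _, S_master = spectrogram(master, fs=fs, nperseg=1024)
    n = min(S_student.shape[1], S_master.shape[1])        # align on the shorter take
    db_student = 10 * np.log10(S_student[:, :n] + 1e-12)  # small offset avoids log(0)
    db_master = 10 * np.log10(S_master[:, :n] + 1e-12)
    return t_s[:n], np.mean(np.abs(db_student - db_master), axis=0)

if __name__ == "__main__":
    fs = 44100
    t = np.linspace(0, 2.0, 2 * fs, endpoint=False)
    master = np.sin(2 * np.pi * 220 * t)    # toy "master" tone
    student = np.sin(2 * np.pi * 233 * t)   # toy "student" tone, slightly sharp
    times, diff = spectral_difference(student, master, fs)
    worst = times[np.argmax(diff)]
    print(f"Largest spectral difference around t = {worst:.2f} s ({diff.max():.1f} dB)")
```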
- Functionalities for Researchers
For a researcher, according to our experts, it would be useful to find on the platform: new studies with raw data that could be further elaborated; metadata for the materials uploaded; the opportunity to upload different kinds of materials (audio, video, photos, etc.) that can be shared within the research community; and data recorded by an electroglottograph, which records the activity of the larynx and can help researchers in the study of timbre.


Finally, all these materials would also be very useful for the singers themselves, who can become more conscious of the abilities they have.

5.5.4.2 Byzantine music
- User Interface of the platform
Although most of our users are relatively young, most of them have very little experience with computers and are not very comfortable using them. Most of their web "surfing" experience comes from visiting web forums related to Byzantine Music. This leads to the conclusion that the user interface of the i-Treasures platform should have a simple hierarchical structure for its content, with distinctive menus for classifying/categorizing the various functionalities. Some of the interviewees had difficulty grasping the ultimate goal of i-Treasures, because of their limited interaction with the Internet and because the web forums they most often use are their point of reference. They therefore consider that a video introduction to the i-Treasures concepts (on the homepage) would help users get accustomed to what they are dealing with upon entering the web platform. Having a special interest in people with disabilities, our interviewees suggested that the platform should offer functionalities that help these people use it, for example the availability of high-contrast themes with large fonts, visual annotations and audio notifications.
- Teaching and learning functionalities
Some of our interviewees were also teachers, either of Byzantine Music or school teachers, etc. This put them in a position to specify some functionalities that they consider useful for a teaching and learning environment. They believe that educational content from other teachers should be reusable, in order to be able to combine different points of view (from the various teachers) when teaching Byzantine Music. Additionally, the ability to contact other teachers or learners through the platform would be much appreciated; this could be done by sending private messages or emails, through forums, etc. Quizzes and assessment tests in general are considered useful educational tools to monitor the learners' progress and performance. It would also be very useful to be able to adapt the difficulty level of the quizzes automatically, based on the learners' results or on demand. Specialized tools focused on Byzantine Music were requested. One such tool is a digital notation board (score): by entering the Byzantine Music-specific marks (semadophons) and the relevant text (hymn), the tool would be able to reproduce the chant as well as the Byzantine Music scales (a much-simplified sketch of the underlying idea is given at the end of this subsection).
- Educational content
The content regarding the Byzantine Music sub-use case that is considered vital and should be present on the platform consists of digitized manuscripts/texts and audio material covering the whole range of Byzantine styles and sounds. This content should be categorized according to the Byzantine Music hierarchy and categories.
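To make the requested notation tool slightly more concrete, a much-simplified sketch of the underlying idea is given below: interval signs entered by the user are converted into a pitch sequence over a chosen scale, which a synthesis module could then render together with the hymn text. The sign names, their interval values and the scale used here are simplified placeholders and do not accurately model Byzantine notation or its scales.

```python
# Highly simplified placeholder: a few interval signs mapped to scale steps.
# The sign values and the scale step sizes below are assumptions for illustration,
# not an accurate model of Byzantine notation or of its scales.
INTERVAL_SIGNS = {"ison": 0, "oligon": +1, "apostrophos": -1}
SCALE_STEPS_CENTS = [200, 200, 100, 200, 200, 200, 100]  # placeholder diatonic-like scale

def _cents_from_base(degree):
    """Cumulative size, in cents, of `degree` scale steps above (or below) the base pitch."""
    if degree >= 0:
        return sum(SCALE_STEPS_CENTS[i % 7] for i in range(degree))
    return -sum(SCALE_STEPS_CENTS[i % 7] for i in range(degree, 0))

def chant_pitches(signs, base_hz=196.0):
    """Turn a sequence of interval signs into a list of frequencies in Hz."""
    degree, pitches = 0, [base_hz]
    for sign in signs:
        degree += INTERVAL_SIGNS[sign]
        pitches.append(base_hz * 2 ** (_cents_from_base(degree) / 1200))
    return pitches

if __name__ == "__main__":
    # Rise two steps, hold, fall one step.
    print([round(p, 1) for p in chant_pitches(["oligon", "oligon", "ison", "apostrophos"])])
    # -> [196.0, 220.0, 246.9, 246.9, 220.0]
```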

5.5.4.3 Contemporary music composition
- User Interface of the platform
According to our interviewees' opinions, and taking into account the type of users, the platform needs to be easy to use, to have a simple hierarchical structure for its content

with minimum complexity and to be generally user-friendly. In this direction, and considering the impact the platform could have for people with special needs, a high-contrast theme with large fonts and other functionalities, such as sound notifications and annotations, should be provided.
- Teaching and learning functionalities
Since our sample includes both teachers and students, we asked them to suggest some teaching and learning functionalities. According to the teachers, the platform should have a repository/database in which all educational content is stored and archived. The ability to view and search the learning resources of other teachers in this repository would also be desirable. The platform should be able to adapt the difficulty level according to the student's preferences, as well as let each user choose between levels. Moreover, while in most cases assessment tools are used to evaluate the student's learning progress, for contemporary music composition the interviewees suggest not including such tools, but rather tools that support discussion and the sharing of opinions, such as messages or forums. They also suggest a lesson structure. The most important element in contemporary music composition is the stimulus (i.e. a sound, paintings, etc.). When a teacher shares it with the students, a discussion about its history, social context, etc. takes place. Then, or alternatively, the teacher shows a composer's score and discusses it with his/her students. The teacher discusses with the students what they want to compose, and each student discusses his/her score with classmates while he/she is in the process of composition. Users should have the ability to parameterize rhythm, pitch, etc. in order to compose a musical piece. The platform should also provide a database of sounds (i.e. musical instruments, urban and environmental sounds, etc.), as well as some other parameters, such as different styles (i.e. tonal, minimal, ambient, etc.). Course access authorization should be at different levels according to the different types of user (expert/teacher, learner/student and researcher). Communication or collaboration tools, such as private messages, email, forums, whiteboards, etc., would also be much appreciated, so that a teacher, student or researcher can contact other registered users of the platform. Some specialized tools for contemporary music composition were requested. One such tool is a digital notation board (score) in which the user could compose a musical piece and then listen to it while the tool annotates the currently played part of the score. Finally, the possibility of isolating instruments and distinguishing the individual styles in different channels for later processing, such as configuring the pace, speed and pitch or switching a channel on/off, would also be desirable (a minimal sketch of such per-channel controls is given below).
- Educational content
As far as educational content is concerned, a variety of digitized scores and audio material archived in the platform's repository/database would be very useful. This content should be categorized. The platform should also provide the user with the ability to select a different style of the currently selected musical piece, in order to study the differences.
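The per-channel controls requested above can be pictured as a small parameter structure such as the sketch below; the field names are illustrative and do not correspond to any agreed i-Treasures data model.

```python
# Illustrative per-channel controls for a multi-channel score player;
# field names are placeholders, not an agreed i-Treasures data model.
from dataclasses import dataclass, field

@dataclass
class Channel:
    instrument: str
    enabled: bool = True           # switch the channel on/off
    tempo_factor: float = 1.0      # 1.0 = original pace/speed
    pitch_shift_semitones: int = 0

@dataclass
class Mix:
    channels: list = field(default_factory=list)

    def active(self):
        """Channels that would actually be rendered."""
        return [c for c in self.channels if c.enabled]

if __name__ == "__main__":
    mix = Mix([Channel("flute"),
               Channel("cello", enabled=False),
               Channel("percussion", tempo_factor=0.8, pitch_shift_semitones=-2)])
    print([c.instrument for c in mix.active()])  # ['flute', 'percussion']
```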
- Functionalities for Researchers
The platform would be very useful to a researcher if it provided a repository/database with a variety of scores and audio material, metadata for the

materials uploaded, and the opportunity to upload different kinds of materials (audio, video, photos, etc.) that can be shared within the research community.


6. Identification of the Use case Requirements
The data coming from the state-of-the-art review, the questionnaires and the interviews, together with the resulting pictures of the available technologies and of the sub-use cases, were all used as input for the subsequent identification of the Use case Requirements, which is reported in this Section.

6.1 Objectives and rationale
Given that the goal of i-Treasures is to explore and experiment whether, and to what extent, certain technologies may serve the purpose of supporting preservation and education in the ICH field (and more specifically in the singing, dancing, craftsmanship and contemporary music composition areas), we thought it important to carry out the first elaboration of the requirements at the level of the use cases, so as to reflect their peculiarities. The identification of the requirements at this level can be considered an intermediate step, aiming to bridge the inputs gathered through the questionnaires, the interviews and the state-of-the-art review with the subsequent stage of the overall i-Treasures User Requirements definition (which is described in Section 7). The different sources of information (dimensions of the questionnaires, interviews and state of the art) were used to derive specific categories of requirements. The following figure (Figure 6-1) illustrates only the strongest relationships between sources of information and requirements (even if, clearly, all the collected data had an impact on the requirements definition).

Figure 6-1 – From questionnaires, interviews and state of the art towards Use case Requirements

Provisionally, the requirements have been grouped into five main categories. The categories reflect the partition of the project work into WPs and Tasks, so as to allow partners to contextualize requirements with respect to their work. The categories are:
 ICH Capture and Analysis (WP3-related)
 Data Fusion and Semantic Analysis (WP4-related)


 Educational processes (WP5.1-related)
 3D visualization module for Sensorimotor Learning (WP5.2-related)
 Web Platform for Research and Education (WP5.4-related)74.
Looking at the single (sub-)use cases separately, and starting from some dimensions of the questionnaires (especially the physical and emotional ones) and from the state of the art, we were able to derive a first version of the requirements concerning which aspects of the ICHs the i-Treasures system shall be able to 'capture'. From the knowledge and meta-knowledge dimension of the questionnaires, as well as from the state of the art, we got inputs as far as data fusion and semantic analysis are concerned. The educational functionalities of the system were mainly informed by the knowledge and meta-knowledge dimension and the teaching/learning dimension, as well as by the information provided by the experts during the interviews. The requirements of the 3D visualization module for sensorimotor learning drew on the interviews and the teaching/learning dimension of the questionnaires, as well as on the state-of-the-art review. The general information dimension of the questionnaire, together with the value, the knowledge and meta-knowledge and the context/environment dimensions, provided inputs for those system functionalities that will guarantee access to ICH information for the general public with research, educational or dissemination purposes. It should be noted that, in this framework, the Glossary can be conceived as a sort of background pervading the overall process of requirements elicitation. Similarly, the project Description of Work (DoW) should be considered a general background to the requirements definition process, given that this document already contained indications, constraints and suggestions for the work.

74 Given the specificities of the contemporary music composition use case, the first two categories have been merged in this use case and an additional category has been added, namely: ‘Intangible musical instrument’.


6.2 Requirements per use case
The process of requirements elicitation was started by the sub-use case leaders, who gave preliminary inputs about possible requirements derived from the analysis of the sub-use cases; these inputs were then elaborated by ITD-CNR and revised by the use case leaders. In the end, this collaborative endeavor of requirements identification led to the production of four lists of requirements (one per use case), which are reported in the following Sections.

6.2.1 Rare singing
Legend:
S1 Canto a Tenore   M Mandatory
S2 Cantu in Paghjella   D Desirable
S3 Byzantine music
S4 Human Beat Box
S1 S2 S3 S4
ICH Capture and analysis
1. The system shall be able to capture several singers (max. 4) together and be able to separate their single voices M M M
2. The system shall be able to capture the sound in high quality in order to produce the related spectrogram and to identify fundamental frequencies, ornamentations, consonants, utterances and improvisations M M M M
3. The system shall be able to detect the singer's vocal tract engagement (e.g. tongue, mandibles, lips, anterior pharyngeal wall, vocal folds and vocal tract constriction) M M M M
4. The system should be able to detect the reciprocal positions of singers D D D
5. The system should be able to detect the contacts among the singers D D D
6. The system shall be able to detect hand gestures (instrument imitation)/position and general postures. M M M M
7. The system shall be able to detect singers' facial movements D D D M
8. The system should be able to detect singers' gaze D D D
9. Sensors should not affect the performance of singers; sensor technology should cause no or minimal disturbance to the singers. D D D D
10. The system should be able to detect singers' abdominal breathing with suitable sensors. D D
11. The i-Treasures platform shall as much as possible adapt itself to the places where it is produced M M
Data Fusion and Semantic Analysis
12. The system shall be able to fuse data captured from different modalities M M M M
13. All measurements shall be time-stamped, so that only consistent measurements (those falling within a specific time-window) are fused in each data fusion cycle. M M M M
14. Different voices should be recognized using information related to the timbre, etc. D D D
15. Different styles of Human Beatbox should be recognized using information related to rhythm, geographical places and hip-hop influences. D
16. Different styles of chanting should be recognized using


information related to ornamentations, improvisations, languages, times and villages. D D D
Educational processes
17. The student shall have access to different learning materials, including textual documents, as well as videos and audios M M M M
18. The system should allow the student to separate voices while listening to audios D D D
19. The student shall be able to practise by listening to songs/chants and recognizing voices, styles (modas), rhythms and vocal gestures (for example in the form of quizzes) M M M M
20. The student shall be able to practice with the basic rules/grammar (in the form of quizzes) M M M M
21. The student should be able to record his/her voice and the system compares the related spectrogram with the one produced by a master singer D D D D
22. The overall learning path shall start from developing listening abilities and then move on to production abilities M M D
23. The system shall allow the student to choose a specific learning path according to the style to be interpreted (rhythms or general sounds) M
24. The system should be able to adjust the difficulty level of the quizzes according to the user's responses D
25. The system shall enable the student to choose a specific learning path according to the voice to be interpreted M M
26. The system shall enable the student to choose a specific lesson M M M M
27. The system shall enable the student to set a difficulty level. M M M M
28. The system shall be able to allow group work/learning M M M D
29. The teacher shall be able to have access to other teachers' learning material M
30. The system shall give its users the ability to enter marks and lyrics into the Text to Song tool, which will then produce the resulting chant (while the tool annotates the currently sung word/phrase) M M M M
31. The system should provide a tool that can accurately reproduce the Byzantine Music scales D
3D visualization module for Sensorimotor Learning
32. The 3D platform shall be able to provide visual or audio feedback M M M M
33. The platform should support interactive functionalities to facilitate learning. D D D D
34. If there is an error in the student's vocal tract movement/position, the platform should show him/her the correct way of doing it and let him/her imitate it, or show the movement/position in parallel, so he/she can watch and imitate at the same time D D D D
35. The platform should display a video/virtual character singing "appropriately" with the correct gestures D D D D
36. The platform should enable a 3D view (with the possibility to zoom in/out) D D D D
Web Platform for Research and Education
37. The user shall be able to search in the database using different criteria such as style (moda/patriarchic), groups,


rhythm, location, etc. M M M M
38. The user should be able to search using a variety of criteria, e.g. historical, geographical, etc. D D D M
39. The system should visualize the search results graphically or as text. D D D M
40. The system shall provide access to different material types, e.g. text, audio, video, 3D, etc. M M M M
41. The system shall provide a repository of Human Beatbox rhythms and standard songs M
42. The system shall provide a repository of song/chant texts M M M
43. The platform shall provide multilingual and universal access to the contents M M M M
44. The system shall be a highly customizable operating environment according to distinguished user roles. M M M D
45. The system shall provide information about Human Beatbox and various groups, but also describe other vocal singing techniques that can be reproduced by a beatboxer. M
46. The system shall have high contrast themes with large fonts to aid people with visual impairment M
47. The system shall provide historical information about Cantu in Paghjella and other vocal singing techniques, but also information about the practice of other singing groups from different villages and countries. M
48. The system shall present the proper context of Canto a Tenore, i.e. specific festivals and costumes, its history and other associated data M
49. The system shall provide rare Human Beatbox audio or video recordings that are not available on traditional communication channels, and be able to give new approaches to the present art (novel inspiration). M
50. The entire system's content shall have a simple hierarchical structure in order not to confuse its users M
51. The system shall provide Paghjella audio or video recordings that are rare or hard to find on the one hand, and the origin, inspiration and translations of texts on the other hand M
52. The system should have an introduction video describing the aim of the project D

6.2.2 Rare dancing
Legend:
D1 Tsamiko dance   M Mandatory
D2 Calus dance   D Desirable
D3 Walloon dance
D4 Contemporary dance
D1 D2 D3 D4
ICH Capture and analysis
1. Sensors should not affect the performance of dancers. Sensor technology should cause no or minimal disturbance to the dancers. D D D D
2. The sensors set-up should allow the dancer to move in a minimum area of 5x5m D
3. The sensors set-up should allow the dancer to move in a minimum area of 6x6m D
4. The sensors set-up should allow the dancer to practice in


a minimum area of 6x4m D
5. The sensors set-up should allow the dancer to move in a minimum area of 2x2m D
6. The minimum distance between a camera and a dancer shall be at least 2m. M M M M
7. The system shall be able to detect the movements of the lower body, especially the feet. M D M
8. The system shall be able to detect the movements of all the body parts. M
9. The system shall be able to analyze the basic postures and motion patterns of the dance. M D M M
10. The system shall be able to capture the motion of dancers wearing traditional dance costumes (especially long skirts). M M M M
Data Fusion and Semantic Analysis
11. The system shall be able to fuse data from different modalities in order to combine audio data and dance movements. M M M M
12. All measurements shall be time-stamped, so that only consistent measurements (those falling within a specific time-window) are fused in each data fusion cycle. M M M M
13. Different dance figures should be recognized using information related to the identification of specific motion patterns. D
14. Different styles shall be recognized using information related to the number of steps, the identification of specific motion patterns and the type of music. D D M
15. The synchronization between dance figures and music rhythm should be recognized. D D D D
16. The system should be able to analyze at the same time the basic postures of two people dancing together. D
17. Measures enabling the stylistic characterization of improvised dance sequences should be developed. D
Educational processes
18. The proposed learning scenario shall promote learning by imitation rather than learning by studying training materials. M D M M
19. The learning scenario shall support a progressive approach, in which the dance is divided into smaller parts/entities that are taught independently. M M M
20. The lessons should depend on the age or experience of the student. D D D D
21. The average time of a typical lesson should be around 45-60 min D D D D
22. The user should be able to define the number of repetitions for each dance figure. D D D D
23. The average number of lessons required should be adapted to the difficulty of the dance figure being taught. D D D D
24. The system should provide videos of people performing the dance as well as video of a teacher explaining the dance. D D D D
3D visualization module for Sensorimotor Learning
25. The 3D platform shall be able to provide visual or audio feedback M M M M


26. The platform should support interactive functionalities to facilitate learning. D D D D
27. If there is an error in the student's posture, the platform should re-show him/her the correct way of doing it and let him/her imitate it, or show the gesture in parallel, so he/she can watch and imitate at the same time D D D D
28. The system should/shall provide feedback after the whole dance is completed (M) or after each dance figure (D). M/D M/M M/D D
29. The platform shall display a virtual character performing the correct dance. M D M M
30. The platform shall display a virtual character performing the student's movements. M M M M
31. The platform should visually highlight the student's mistakes. D D D D
32. The student's score shall be determined by the precision of the body motion and the precision in following the rhythm M M M
33. The student's score shall be determined by the precision of the body motion. M M M M
34. The platform shall enable the user to rotate the avatar/view. M M M M
35. The platform shall enable the user to zoom in/out. M M M M
36. The platform shall enable the user to select specific movements for training. M M M M
37. The platform shall enable the user to choose a specific lesson. M M M M
38. The platform shall enable the user to set a difficulty level. M D D D
39. The platform should offer a personalized training style. D D D D
40. The platform should adapt the difficulty level dynamically D D D D
Web Platform for Research and Education
41. The user shall be able to search in the database using different criteria such as style, dance figures, postures and motion patterns. M M M M
42. The user should be able to search using historical or geographical criteria. D D D D
43. Search results should be visualized graphically or as text. D D D D
44. The platform shall provide different material types, e.g. text, audio, 3D, etc. M M D D
45. The platform shall provide multilingual and universal access to the contents M M M M
46. The platform shall be a highly customizable operating environment according to distinguished user roles. M M M M
47. The system shall present the proper context of each dance, i.e. specific props (stage) and costumes, its history and other associated data. M M M
48. The platform should give the user the opportunity to get in contact with people performing the dance he/she is interested in. D D D D

6.2.3 Craftsmanship
Legend:
P1 Craftsmanship   M Mandatory


D Desirable
P1
ICH Capture and analysis
1. Sensors should not affect the performance of potters. Sensor technology should cause no or minimal disturbance to the potters. D
2. The sensors should not be affected by the electromagnetic field produced by the electric wheel. D
3. Wearable glove sensors should not be used because they are considered obtrusive by the potters (they lose the touch of the clay). D
4. Wearable full-body suits should not be used because they are considered obtrusive by the potters. D
5. The system shall be able to detect the movements of the upper body, especially hands and fingers. M
6. The system shall be able to analyze the basic stages of the wheel-throwing procedure as well as hand and finger motion patterns. M
Data Fusion and Semantic Analysis
7. The system shall be able to fuse data from different sensors in order to more accurately analyze upper body motion. M
8. All measurements shall be time-stamped, so that only consistent measurements (those falling within a specific time-window) are fused in each data fusion cycle. M
9. Different styles shall be recognized using information related to the hand and finger gestures. M
Educational processes
10. The proposed learning scenario shall promote learning by imitation (sensorimotor learning). M
11. The learning scenario shall support a progressive approach, in which the pottery course is divided into smaller sections in which objects of increasing complexity are gradually introduced. M
12. The lessons should depend on the age and/or experience of the student. D
13. The average time of a typical lesson should vary from around 1-2 hours up to 3-4 hours (e.g. Turkish pottery). D
14. The learning process should envisage that the average number of hours required to learn pottery is 200-400. D
15. The system shall provide videos of pottery lessons as complementary material. M
3D visualization module for Sensorimotor Learning
16. The 3D platform shall be able to provide visual or audio feedback M
17. The platform should support interactive functionalities to facilitate learning. D
18. If there is an error in the student's posture, the platform should re-show him/her the correct way of doing it and let him/her imitate it, or show the gesture in parallel, so he/she can watch and imitate at the same time. D
19. The platform shall display a virtual character (M) and/or video (D) of the teacher focusing on the hand movements. M/D
20. The platform should display a virtual character performing the student's movements. D
21. The platform should enable the user to rotate the avatar/view. D
22. The platform shall visually highlight the student's mistakes. M
23. The platform should enable the user to zoom in/out. D
24. The platform should be able to replay the student's performance D


25. The platform shall enable the user to select specific movements for training. M
26. The platform shall enable the user to choose a specific lesson. M
27. The platform shall enable the user to set a difficulty level. M
28. The platform should offer a personalized training style. D
29. The platform should adapt the difficulty level dynamically D
Web Platform for Research and Education
30. The user shall be able to search in the database using different criteria such as style/technique, body/hand postures and motion patterns. M
31. The user should be able to search for correlations between various parameters of the wheel-throwing procedure. D
32. Search results should be visualized graphically. D
33. The platform shall provide multilingual and universal access to the contents M
34. The platform shall be a highly customizable operating environment according to distinguished user roles. M

6.2.4 Contemporary music composition
Legend:
M1 Contemporary music composition   M Mandatory
D Desirable
M1
ICH capture and analysis
1. The system shall be able to capture the sound and produce the related spectrogram M
2. The system shall be able to detect hand gestures/position M
3. Sensors should not affect the performance of musicians; sensor technology should cause no or minimal disturbance to them D
Data Fusion and Semantic Analysis
4. The system shall be able to fuse data captured from different modalities M
5. All measurements shall be time-stamped, so that only consistent measurements (those falling within a specific time-window) are fused in each data fusion cycle. M
6. Different styles of contemporary music composition should be recognized using relevant information. D
Educational processes
7. The student will have access to different learning materials, including textual documents, as well as videos and audios M
8. The system should allow the student to separate instruments while listening to musical pieces for processing, such as configuring the pace and speed, switching the channel on/off, etc. D
9. The student shall be able to practise by listening to musical pieces and recognizing styles, etc. from an annotated musical score ("karaoke style" tool) M
10. The student shall be able to practice with contemporary music composition (in the form of discussion) M
11. The system should give its users the ability to enter modern musical notation into the Text to Song tool, which will then produce the resulting musical piece (while the tool annotates the currently played word/phrase) D
12. The teacher/student/researcher shall be able to contact other registered users of the platform via communication or collaboration tools, such as email,


forum, whiteboard, etc. M
13. The system shall be able to adjust the difficulty level of the quizzes according to the user's responses M
14. The system shall enable the student to choose a specific lesson M
15. The system shall enable the student to set a difficulty level. M
16. The system shall be able to allow group work/learning M
17. The teacher shall be able to have access to other teachers' learning material M
3D visualization module for Sensorimotor Learning
18. The 3D platform shall be able to provide visual or audio feedback M
19. The platform should support interactive functionalities to facilitate learning. D
20. A teacher/student/researcher should be able to go through categories based on rhythm, pitch, etc. to compose a musical piece through the intangible musical instrument D
21. A teacher/student/researcher should be able to go through a database of sounds, such as musical instrument, urban and environmental sounds, to compose a musical piece through the intangible musical instrument D
22. A teacher/student/researcher should be able to select among different styles, such as tonal, minimal, ambient, etc., to compose a musical piece through the intangible musical instrument. D
Web platform for Research and Education
23. The user shall be able to search in the database using different criteria such as style, composers, etc. M
24. The user should be able to search using a variety of criteria, e.g. historical, geographical, etc. D
25. The system should visualize the search results graphically or as text. D
26. The system shall provide access to different material types, e.g. text, audio, video, 3D, etc. M
27. The platform shall provide multilingual and universal access to the contents M
28. The system shall provide a repository of contemporary musical scores M
29. The system shall be a highly customizable operating environment according to distinguished user roles. M
30. The system shall have high contrast themes with large fonts to aid people with visual impairment M

7. Results: Towards the first definition of the i-Treasures Requirements
So far, the present deliverable has described the methods and processes carried out within the i-Treasures project in order to define the i-Treasures User Requirements. The present section contains the results of this work, i.e. the overall picture of the User Requirements, and it is thus basically structured according to the IEEE Recommendations (1998) regarding the way a System Requirements Specification (SRS) document should be structured.

7.1 Introduction
The purpose of this document section is to define the basic functional and non-functional requirements for the i-Treasures system. The scope of the system is to allow access to and provide information on ICH, as well as to support the teaching and learning processes in this field. Starting from the four lists of requirements presented in Section 6.2, an attempt has been made to derive a first release of the i-Treasures User Requirements. In this release the i-Treasures User Requirements are aggregated into new, ‘higher order’ categories, which makes the requirements more general and possibly transferable to other contexts (i.e. other ICHs). In view of a generalization process, besides identifying higher order categories, some of the requirements have also been labelled as ‘extendable’ (‘E’ in the tables of Section 7.3), when requirements originally defined under one specific (sub-)use case have been considered transferable in principle to other use cases as well. The resulting preliminary list and the related aggregation are presented in Section 7.3, but this is still an ongoing process that will need further refinement in the following WP2 deliverables.

7.2 Overall description
Four main user roles have been identified so far for the i-Treasures system: the researcher, the student, the teacher and the ‘basic user’. Depending on the role, users will have different permissions and privileges on the system.
As to the main functionalities envisaged, one affordance of the i-Treasures system will be the detection and capture of the main ICH features. In particular, this will include capturing the performer’s relevant postures and movements (total body, feet, legs, hands and fingers, vocal tract, gaze, face, etc.), capturing sounds (through recordings, etc.), capturing contextual conditions (i.e. accessories and tools of any kind used by the performer), capturing any interactions with others, capturing single roles and single styles, and detecting synchronization aspects (among performers, among different ‘actions’ by the same performer, etc.). The system shall also be able to detect basic features/sequences/patterns of a performance, categorize improvisation patterns, and detect deviations from a standard performance. Given that the system will use sensors of various kinds to capture all the above-mentioned data, it is essential that these sensors do not disrupt or influence the performance. Besides, the system shall be able to fuse data coming from different modalities and from different sources.
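To illustrate the kind of time-window-based fusion described above (and formalised later in requirements R.10 and R.11), the following Python sketch groups time-stamped measurements from different modalities into fusion cycles and only fuses the cycles in which all required modalities are present. The class, function and modality names as well as the 100 ms window are illustrative assumptions, not design decisions of the project.

```python
from dataclasses import dataclass
from collections import defaultdict

# Illustrative sketch only: groups time-stamped measurements from different
# modalities into fusion cycles, keeping only cycles with consistent data.
# Window length and modality names are assumptions, not project choices.

@dataclass
class Measurement:
    modality: str      # e.g. "audio", "mocap", "ultrasound"
    timestamp: float   # seconds, on a clock shared by all sensors
    payload: object    # raw or pre-processed sensor data

def group_into_cycles(measurements, window=0.1):
    """Bucket measurements into fixed time windows (default 100 ms)."""
    cycles = defaultdict(list)
    for m in measurements:
        cycles[int(m.timestamp // window)].append(m)
    return cycles

def fuse(measurements, required=frozenset({"audio", "mocap"}), window=0.1):
    """Fuse only the cycles in which every required modality is present."""
    fused = []
    for index, cycle in sorted(group_into_cycles(measurements, window).items()):
        present = {m.modality for m in cycle}
        if required.issubset(present):             # consistency check
            fused.append((index * window, cycle))  # (cycle start time, data)
    return fused
```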

Another important category of functionalities of the i-Treasures system has to do with its ability to guarantee access to data and information concerning the ICHs. This means that the system will allow the storage of multimedia information (video, audio, images, text, etc.) and provide adequate, multi-criteria search functionalities to allow easy retrieval of this information. The system shall support multilingual data. Of course, storage and retrieval permissions will depend on the user’s role. Besides, the system shall provide different display modalities, allow different visualization modes (including 3D), and guarantee high levels of interactivity.
Another affordance of the system has to do with providing support to the teaching and learning processes. This means that the system will offer the teacher the possibility to design innovative learning activities for a specific sub-use case, while the student will be able to carry them out and be assessed. In particular, the system will make it possible to set up and deliver standard learning paths, as well as personalized ones, and the learning path will possibly adapt dynamically based on the student’s performance in previous activities (see the illustrative sketch at the end of this section). The system will support individual activities as well as group work, also offering communication tools; activities may include readings, exercises (quizzes, etc.), imitation, listening to/looking at performances (focusing on roles/styles/sequences/patterns, etc.), 3D visualization of models/sounds/movements, practicing and receiving immediate feedback, etc.
In the following Section, the complete list of the User Requirements is reported.
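The dynamic adaptation of the learning path mentioned above (echoed by requirements R.18.A and R.30.E) could, for instance, follow a simple rule of the kind sketched below. The thresholds, the three-step difficulty scale and the function name are purely hypothetical and only serve to make the idea concrete.

```python
# Hypothetical sketch of dynamic difficulty adjustment based on the student's
# scores in previous activities. Thresholds and levels are illustrative only.

LEVELS = ["beginner", "intermediate", "advanced"]

def next_difficulty(current_level, recent_scores):
    """Move the student up or down one level depending on recent performance."""
    if not recent_scores:
        return current_level
    average = sum(recent_scores) / len(recent_scores)
    index = LEVELS.index(current_level)
    if average >= 0.8 and index < len(LEVELS) - 1:
        index += 1        # consistently good results: raise the difficulty
    elif average < 0.5 and index > 0:
        index -= 1        # repeated difficulties: lower the difficulty
    return LEVELS[index]

# Example: a student scoring 0.9, 0.85 and 0.8 at "beginner" level
# would be promoted to "intermediate" for the next lesson.
print(next_difficulty("beginner", [0.9, 0.85, 0.8]))
```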

7.3 Functional Requirements

ICH capture and analysis
Legend: S1 Canto a Tenore, S2 Cantu in Paghjella, S3 Byzantine music, S4 Human Beat Box; D1 Tsamiko dance, D2 Calus dance, D3 Walloon dance, D4 Contemporary dance; M1 Contemporary music composition; P1 Craftsmanship. Priority: M Mandatory, D Desirable, E Extensible.

General category Original requirement S1 S2 S3 S4 D1 D2 D3 D4 M1 P1 R.1 The system shall R.1.A - The system shall be able to capture the sound in a high M M M M capture and elaborate quality in order to produce the related spectrogram and to identify sounds fundamental frequencies, ornamentations, consonants, utter, and improvisations R.1.B - The system shall be able to capture the sound and produce M the related spectrogram R.2 The system shall R.2.A - The system shall be able to capture several singers (max. 4) M M M capture different singers' together and be able to separate their single voices voices / instruments in a global performance and be able to switch on/off the single voices R.3 The system shall detect R.3.A - The system should be able to detect singers’ abdominal E D E D and capture movements breathing with suitable sensors. R.3.B - The system shall be able to detect the singer’s vocal tract M M M M engagement (e.g. tongue, mandibles, lips, anterior pharyngeal wall, vocal folds and vocal track constriction) R.3.C - The system shall be able to detect the movements of the M D M lower body, especially feet. R.3.D - The system shall be able to detect the movements of the M upper body, especially hands and fingers. R.3.E - The system shall be able to detect the movements of all the M

body parts. R.3.F - The system shall be able to analyze the basic stages of the M wheel-throwing procedure as well as hand and fingers motion patterns. R.3.G - The system shall be able to detect hand gestures M M M M (instrument imitation)/position and general postures. R.3.H - The system shall be able to detect singers’ facial D D D M movements R.3.I. - The system should be able to detect singers’ gaze D D D R.3.L - The system shall be able to detect hand gestures/ position M R.4 The system should R.4.A - The system should be able to detect the reciprocal positions D D D detect physical interactions of singers among performers in a R.4.B - The system should be able to detect the contacts among the D D D group singers R.5 The system shall adapt R.5.A - The system shall as much as possible adapt itself to the E M M E E E E to the place where the places where the performance occurs performance happen R.6 The system shall R.6.A - The system shall be able to analyze the basic postures and M D M M recognize and analyze parts motion patterns of the dance. of dance performances R.7 Sensors set-up shall R.7.A - The sensors set-up should allow the dancer to move in a D adapt to a performing area minimum area of 2x2 m ranging from 2x2 to 6x6 m R.7.B - The sensors set-up should allow the dancer to practice in a D minimum area of 6x4 m R.7.C - The sensors set-up should allow the dancer to move in a D minimum area of 6x6m R.7.D - The sensors set-up should allow the dancer to move in a D minimum area of 5x5 m R.7.E - The minimum distance between a camera and a dancer M M M M shall be at least 2 m.

R.8 Sensors should not R.8.A - Sensors should not affect the performance of dancers. D D D D affect or hinder performance Sensors technology should cause no or minimal disturbance to the and vice versa dancers. R.8.B - Sensors should not affect the performance of potters. D Sensors technology should cause no or minimal disturbance to the potters. R.8.C - The sensors should not be affected by the electromagnetic D field produced by the electric wheel. R.8.D - Wearable glove sensors should not be used because they D are considered obtrusive by the potters (lose the touch of the clay) R.8.E - Wearable full body suits should not be used because they D are considered obtrusive by the potters R.8.F - Sensors should not affect the performance of singers; D D D D sensors technology should cause no or minimal disturbance to the singers. R.8.G - Sensors should not affect the performance of musicians; D sensors technology should cause no or minimal disturbance to them R.9 Sensors usage and R.9.A - The system shall be able to capture the motion of dancers M M M M function shall not be wearing traditional dance costumes (especially long skirts). hindered by the vestments and accessories

Data Fusion and Semantic Analysis
Legend: S1 Canto a Tenore, S2 Cantu in Paghjella, S3 Byzantine music, S4 Human Beat Box; D1 Tsamiko dance, D2 Calus dance, D3 Walloon dance, D4 Contemporary dance; M1 Contemporary music composition; P1 Craftsmanship. Priority: M Mandatory, D Desirable, E Extensible.

General category Original requirement S1 S2 S3 S4 D1 D2 D3 D4 M1 P1 R.10 The system shall rely R.10.A - All measurements shall be time-stamped, so that only M M M M M M M M M M on consistent consistent measurements (those falling within a specific time- measurements window) are fused in each data fusion cycle. R.11 The system shall fuse R.11.A - The system shall be able to fuse data from different M M M M data captured from /in modalities in order to combine audio data and dance movements. different modalities and from R.11.B - The system shall be able to fuse data from different M different sources sensors in order to more accurately analyze upper body motion. R.11.C - The system shall be able to fuse data captured from M M M M M different modalities R.12 The system shall R.12.A - Different styles of contemporary music composition should D recognize different styles be recognized using relevant information (among different R.12.B - Different styles shall be recognized using information M performances) related to the hand and figure gestures R.12.C - Different styles of Human Beatbox should be recognized D using information related to rhythm, geographical places and hiphop influences. R.12.D - Different styles of chanting should be recognized using D D D information related to ornamentations, improvisations, languages, times and villages. R.12.E - Different styles shall be recognized using information D D M related to the number of steps, the identification of specific motion patterns and the type of music. R.13 The system should R.13.A - The synchronization between dance figures and music D D D D detect synchronization rhythm should be recognized.

aspects R.13.B - The system should be able to analyze at the same time the D basic postures of two people dancing together. R.14 The system should R.14.A - Different voices should be recognized using information D D D recognize different singers' related to the timbre, etc. voices / instruments in a performance R.15 The system should R.15.A - Measures enabling the stylistic characterization of D allow measures enabling the improvised dance sequences should be developed. stylistic characterization of improvised sequences R.16 The system should R.16.A - Different dance figures should be recognized using D analyze different parts within information related to the identification of specific motion patterns. a performance

Educational processes
Legend: S1 Canto a Tenore, S2 Cantu in Paghjella, S3 Byzantine music, S4 Human Beat Box; D1 Tsamiko dance, D2 Calus dance, D3 Walloon dance, D4 Contemporary dance; M1 Contemporary music composition; P1 Craftsmanship. Priority: M Mandatory, D Desirable, E Extensible.

General category Original requirement S1 S2 S3 S4 D1 D2 D3 D4 M1 P1 R.17 The educational platform R.17.A - The proposed learning scenario shall promote learning by M D M M M shall allow learning to be imitation rather than learning by studying training materials. achieved following a variety of R.17.B - The student shall be able to make practice by listening to M methods/ from a variety of musical pieces and recognizing styles, etc. from an annotated sources (learning by imitation, musical score (“karaoke style” tool) reading texts, listening/ R.17.C - The student shall be able to make practice by listening to M M M M looking at performance, …) songs/chants and recognizing voices, styles, rhythms and vocal gestures (for example in the form of quizzes) R.17.D - The student shall be able to practice with the contemporary M music composition (in the form of discussion) R.17.E - The student shall be able to practice with the basic M M M M rules/grammar (in the form of quizzes) R.17.F - The overall learning path shall start from developing M M D listening abilities and then shall go to production abilities R.17.G - The system should provide videos of people performing D D D D the dance as well as video of a teacher explaining the dance. R.17.H - The system shall provide a tool that can reproduce D accurately the Byzantine Music scales R.18 The educational platform R.18.A - The system shall be able to adjust the difficulty level of the E E D E E E E E M E shall adjust lessons/difficulty quizzes according to user’s responses levels according to students’ R.18.B - The lessons should depend on the age or experience of D D D D D characteristics and abilities the student.

R.19 The educational platform R.19.A - The learning process should foresee that the average D D D D should allow to set/suggest number of lessons is adapted to the difficulty of the dance figure the average number of being taught. lessons /repetitions for each block/unit of knowledge and R.19.B - The user should be able to define the number of repetitions D D D D for the whole performance for each dance figure. R.20 The educational platform R.20.A - The learning process should take into account that the D D D D should allow to set/suggest average time of a typical dance lesson should be around 45-60min the average time for a single R.20.B - The learning process should take into account that the D lesson and for the whole average time of a pottery typical lesson should vary from around 1 - endeavor 2 hours up to 3-4 hours (e.g. Turkish pottery) R.20.C - The learning process should envisage that the average D number of hours required to learn pottery is 200 - 400 R.21 The educational platform R.21.A - The learning scenario shall support a progressive M E M M shall support a progressive approach, in which dance is divided in smaller parts/entities that are learning approach, in which taught independently. performance is divided in R.21.B - The learning scenario shall support a progressive M smaller parts/entities that are approach, in which the pottery course is divided in smaller sections taught independently. in which objects with increasing complexity are gradually introduced R.22 The educational platform R.22.A - The student shall have access to different learning M M M M M shall provide access to a materials, including textual documents, as well as videos and audios variety of educational material R.22.B - The teacher shall be able to have access to other teachers’ M M in different formats learning material R.23 The educational platform R.23.A - The system shall be able to allow group work/learning M M M D M shall support communication and exchanges among the R.23.B - The teacher/student/researcher contacts with other M different actors in the learning registered users of the platform with communication or collaboration process tools, such as email, forum, whiteboard, etc.

R.24 The educational platform R.24.A - The system shall give the ability to its users to enter marks M M M M shall give the ability to its and lyrics into the Text to Song tool and will produce the resulting users to enter marks and chant (while the tool annotates the current singing word/phrase) lyrics /modern musical R.24.B - The system should give the ability to its users to enter D notation in to the Text to Song modern musical notation in to the Text to Song tool and will produce tool and will produce the the resulting musical piece (while the tool annotates the current resulting chant /musical piece singing word/phrase) (while the tool annotates the current singing word/phrase) R.25 The educational platform R.25.A - The system shall allow the student to choose a specific M shall allow students to make learning path according to the style to be interpreted (rhythms or choices related to the general sounds) educational path R.25.B - The system shall enable the student to choose a specific M M learning path according to the voice to be interpreted R.25.C - The system shall enable the student to choose a specific M M M M E E E E M E lesson R.25.D - The system shall enable the student to set a difficulty level. M M M M M R.26 The educational platform R.26.A - The student should be able to record his/her voice and the D D D D should foresee the recording system compares the related spectrogram with the one produced by of sounds/ movements a master singer R.27 The educational platform R.27.A - The system should allow the student to separate D should allow students to instruments while listening to musical pieces for processing, such as separate and study individual configure the pace, speed, switch on/off the channel, etc. parts of the performance (e.g. R.27.B - The system should allow the student to separate voices D D D switching on/off the single while listening to audios voices/instruments)

3D visualization module for Sensorimotor Learning
Legend: S1 Canto a Tenore, S2 Cantu in Paghjella, S3 Byzantine music, S4 Human Beat Box; D1 Tsamiko dance, D2 Calus dance, D3 Walloon dance, D4 Contemporary dance; M1 Contemporary music composition; P1 Craftsmanship. Priority: M Mandatory, D Desirable, E Extensible.

General category Original requirement S1 S2 S3 S4 D1 D2 D3 D4 M1 P1 R.28 The 3D module shall R.28.A - The platform shall enable the user to rotate the avatar/view. M M M M allow avatar showing and R.28.B - The platform shall display a virtual character (M) and/or M/ moving video (D) of the teacher focusing on the hands movements. D R.28.C - The platform should display a video/virtual character singing D D D D “appropriately” with the correct gestures R.28.D - The platform should display a virtual character performing M D M M the correct dance. R.28.E - The platform should display a virtual character performing D the student's movements. R.28.F - The platform should display a virtual character performing M M M M the student’s movements. R.28.G - The platform should enable the user to rotate the D avatar/view. R.29 The 3D module should R.29.A - The platform shall support interactive functionalities to D D D D D D D D D D allow Interactivity facilitate learning. R.30 The 3D module shall R.30.A - The platform shall enable the user to select specific E E E E M M M M E M allow personalized choices movements for training. R.30.B - The platform shall enable the user to choose a specific E E E E M M M M E M lesson. R.30.C - The platform shall enable the user to set a difficulty level. E E E E M D D D E M R.30.D - The platform should offer personalized training style. E E E E D D D D E D R.30.E - The platform should adapt the difficulty level dynamically E E E E D D D D E D

R.31 The 3D module shall R.31.A - The 3D platform shall be able to provide visual or audio M M M M M M M M M M allow providing students with feedback appropriate feedback R.31.B - If there is an error in student’s posture, the platform should D D D D D (contextual, final) in the re-show him/her the correct way of doing and let him/her imitate it or appropriate format show the gesture in parallel, so he/she can watch and imitate at the same time R.31.C - If there is an error in student’s vocal tract D D D D movement/position, the platform should show him/her the correct way of doing and let him/her imitate it or show the movement/position in parallel, so he/she can watch and imitate at the same time R.31.D - The system should provide feedback after the whole dance M/ M M/ M/ is completed (M) or after each dance figure (D). D D D R.32 The 3D module should R.32.A - The platform should be able to replay the student's D allow the recording of performance sounds/movements R.33 The 3D module should R.33.A - The platform shall enable a 3D view (with possibility to D D D D M M M M E D enable the user to zoom zoom in /out) in/out. R.34 The 3D module shall R.34.A - The student’s score shall be determined by the precision of M M M envisage that assessment the body motion and the precision in following the rhythm of students’ performance is R.34.B - The platform shall visually highlight the student's mistakes. E E E E D D D D E M carried out using various criteria R.35 The 3D module shall R.35.A - The platform shall visually highlight the student's mistakes. E E E E D D D D E M visually highlight the student's mistakes. R.36 The 3D module should R.36.A - A teacher/student/researcher studies goes through the allow to compose a musical categories based on rhythm, pitch, etc. to compose a musical piece piece through the intangible through the intangible musical instrument D

musical instrument. R.36.B - A teacher/student/researcher studies goes through a database of sounds, such as musical instruments, urban, environment sounds to compose a musical piece through the intangible musical instrument D R.36.C - A teacher/student/researcher studies selects through different styles, such as tonal, minimal, ambient, etc. to compose a musical piece through the intangible musical instrument. D
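Requirement R.34.A in the table above states that the student’s score shall combine the precision of the body motion with the precision in following the rhythm. A purely illustrative way of combining the two criteria into a single score is sketched below; the weights, error scales and function names are assumptions and not part of the specification.

```python
# Illustrative scoring sketch for R.34.A: combine pose accuracy and rhythm
# accuracy into one score in [0, 1]. Weights and error scales are assumptions.

def pose_error(student_joints, reference_joints):
    """Mean Euclidean distance (in metres) between matching 3D joint positions."""
    distances = [
        sum((s - r) ** 2 for s, r in zip(sj, rj)) ** 0.5
        for sj, rj in zip(student_joints, reference_joints)
    ]
    return sum(distances) / len(distances)

def rhythm_error(student_beats, reference_beats):
    """Mean absolute timing deviation (in seconds) between matched beats."""
    deviations = [abs(s - r) for s, r in zip(student_beats, reference_beats)]
    return sum(deviations) / len(deviations)

def performance_score(student_joints, reference_joints,
                      student_beats, reference_beats,
                      w_pose=0.6, w_rhythm=0.4):
    """Weighted score: 1.0 is a perfect imitation, 0.0 a very poor one."""
    pose = max(0.0, 1.0 - pose_error(student_joints, reference_joints) / 0.5)
    rhythm = max(0.0, 1.0 - rhythm_error(student_beats, reference_beats) / 0.5)
    return w_pose * pose + w_rhythm * rhythm
```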

Web Platform for Research and Education
Legend: S1 Canto a Tenore, S2 Cantu in Paghjella, S3 Byzantine music, S4 Human Beat Box; D1 Tsamiko dance, D2 Calus dance, D3 Walloon dance, D4 Contemporary dance; M1 Contemporary music composition; P1 Craftsmanship. Priority: M Mandatory, D Desirable, E Extensible.

General category Original requirement S1 S2 S3 S4 D1 D2 D3 D4 M1 P1 R.37 The web platform shall R.37.A - The platform shall provide multilingual and universal access M M M M M M M M M M allow multilingual and to the contents Universal Access R.37.B - The system should have high contrast themes with large E E M E E E E E M E fonts to aid people with visual impairment R.38 The web platform shall R.38.A - The platform shall be a highly customizable operating M M M D M M M M M M be a highly customizable environment according to distinguished user roles. operating environment according to distinguished user roles. R.39 The web platform shall R.39.A - The system should have an introduction video describing the D put at disposal an archive of aim of the project catalogued information and R.39.B - The system shall present the proper context of each dance, M M M data i.e. specific props (stage) and costumes, its history and other associated data. R.39.C - The system shall provide historical information about Cantu M in Paghjella and other vocal singing techniques, but also information about the practice of other singing groups from different villages and countries. R.39.D - The system shall present the proper context of Canto a M Tenore, i.e. specific festival and costumes, its history and other associated data.

R.39.E - The system shall provide information about Human Beatbox M and various groups, but also describe other vocal singing techniques that can be reproduced by a beatboxer. R.40 The web platform shall R.40.A - The system shall provide access to different material types M M M M M M D D M E provide an archive e.g. text, audio, video, 3D, etc. containing recordings of R.40.B - The system shall provide a repository of contemporary M different types, in different musical scores formats R.40.C - The system shall provide a repository of Human Beatbox M rhythms and standard songs R.40.D - The system shall provide a repository of song/chant texts M M M R.40.E - The system shall provide rare Human Beatbox audio or M video recordings that are not available on traditional communication channels, and be able to give new approaches of the present art (novel inspiration). R.40.F - The system shall provide Paghjella audio or video recordings M not found or rare on the one hand, and the origin and inspiration and translations of texts on the other hand R.41 The web platform shall R.41.A - The user shall be able to search using a variety of criteria, D D D M D allow searching the archive e.g. historical, geographical, etc. using different criteria and R.41.B - The user shall be able to search in the database using M parameters different criteria such as style, composers, etc. R.41.C - The user shall be able to search in the database using E E E E E M different criteria such as style/technique, body/hands postures and motion patterns. R.41.D - The user shall be able to search in the database using M M M M different criteria such as style (moda/patriarchic), groups, rhythm, location etc. R.41.E - The user should be able to search using historical or E E E E D D D D E E geographical criteria.

R.41.F - The user should be able to search for correlations between D various parameters of the wheel-throwing procedure. R.41.G - The user shall be able to search in the database using M M M M different criteria such as style, dance figures, postures and motion patterns. R.42 The web platform shall R.42.A - The system shall visualize the search results graphically or D D D M D D D D D D allow visualization of search as a text. results in different formats R.43 The web platform shall R.43.A - The platform shall give the user the opportunity to get in D D D D include communication contact with people performing the dance he is interested in. features R.44 The web platform shall R.44.A - The entire system content shall have a simple hierarchical M organize content in a simple structure in order not to confuse its users hierarchical structure
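The multi-criteria archive search envisaged by requirement R.41 above can, in its simplest form, be thought of as filtering archived items by metadata fields. The sketch below illustrates this under assumed field names (style, location, period) and hypothetical sample records; it does not describe the actual i-Treasures implementation.

```python
# Illustrative multi-criteria search over archive metadata (cf. R.41).
# Field names and sample records are hypothetical.

ARCHIVE = [
    {"title": "Tenore recording 12", "style": "moda", "location": "Sardinia", "period": "2013"},
    {"title": "Tsamiko lesson 3", "style": "tsamiko", "location": "Greece", "period": "2013"},
]

def search(archive, **criteria):
    """Return records whose metadata match every given criterion (case-insensitive)."""
    def matches(record):
        return all(
            str(record.get(field, "")).lower() == str(value).lower()
            for field, value in criteria.items()
        )
    return [record for record in archive if matches(record)]

# Example: all Sardinian items in 'moda' style.
print(search(ARCHIVE, style="moda", location="Sardinia"))
```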

7.4 Non-functional Requirements
The list of non-functional requirements is reported below:

• Accessibility / Usability
It is the degree to which a product, device, service, or environment is available to as many people as possible. Accessibility can be viewed as the "ability to access" and benefit from some system or entity. The concept often focuses on people with disabilities or special needs. In order to fulfil this need, certain measures must be taken into consideration according to w3.org.
  o Provide content that, when presented to the user, conveys essentially the same function or purpose as auditory or visual content, in order to aid visually impaired users.
    - Provide a text equivalent for every non-text element (e.g., via "alt", "longdesc", or in element content).
    - Provide high contrast themes with large fonts.
  o Use markup and style sheets and do so properly, in order to ensure proper rendering among different browsers.
    - Use style sheets to control layout and presentation.
    - Use relative rather than absolute units in markup language attribute values and style sheet property values.
    - Use header elements to convey document structure and use them according to specification.
    - Mark up lists and list items properly.
  o Create tables that transform gracefully (if needed), in order to be able to easily access the data in them.
    - For data tables, identify row and column headers.
    - For data tables that have two or more logical levels of row or column headers, use markup to associate data cells and header cells.
    - Do not use tables for layout unless the table makes sense when linearized. Otherwise, if the table does not make sense, provide an alternative equivalent (which may be a linearized version).
  o Ensure that pages are accessible even when newer technologies are not supported or are turned off (e.g. JavaScript).
    - Organize documents so they may be read without style sheets. For example, when an HTML document is rendered without associated style sheets, it must still be possible to read the document.
    - Ensure that pages are usable when scripts, applets, or other programmatic objects are turned off or not supported. If this is not possible, provide equivalent information on an alternative accessible page.
    - For scripts and applets, ensure that event handlers are input device-independent.
  o Provide a multilingual environment in order to maximize the use of the platform from different geographical areas across the world.

    - The user interface must be offered in multiple languages, and additional languages must be able to be added without the need for extra programming.
  o Ensure that the user interface follows principles of accessible design: device-independent access to functionality, keyboard operability, self-voicing, etc.
    - Create a logical tab order through links, form controls, and objects.
    - Provide keyboard shortcuts to important links.
  o Provide clear navigation mechanisms.
    - Clearly identify the target of each link.
    - Provide metadata to add semantic information to pages and sites.
    - Provide information about the general layout of a site (e.g., a site map or table of contents).
    - Use navigation mechanisms in a consistent manner.
    - Provide navigation bars to highlight and give access to the navigation mechanism.
    - If search functions are provided, enable different types of searches for different skill levels and preferences.
    - Place distinguishing information at the beginning of headings, paragraphs, lists, etc.
  o Ensure that documents are clear and simple.
    - Create a style of presentation that is consistent across pages.

• Availability / Fault tolerance
The degree to which a system, subsystem or piece of equipment is fully operational, in a functioning condition and accessible. To ensure maximum availability given the existing financial constraints, we define the following requirements.
  o Software failure monitoring and failover components.
  o Regular hardware maintenance.
  o Regular software upgrades (e.g., OS upgrades, bug patches, current anti-virus definitions, etc.).
  o Extensive testing and quality engineering.
  o Proper disaster recovery.
  o Scalability.

• Backup / Disaster Recovery / Robustness
Backup is the process of copying and archiving computer data so that it may be used to restore the original after a data loss event. Disaster recovery is the process of preparing for the recovery or continuation of the technology infrastructure that is vital to an organization after a natural or human-induced disaster.
  o Daily incremental backup of every system component (databases, web files, multimedia files, system files).
  o Weekly full backup of every system component (databases, web files, multimedia files, system files).
  o Weekly backup of full system images (vmdks).
  o Hardware abstraction provided by virtualization technologies ensures that the backup system images can be restored on different hardware.

• Dependency on other parties
This concerns parts/modules/services of the platform that download/upload data from other systems/platforms in order to function properly, and what happens when the other end goes offline or out of service.
  o Since we are planning to harvest metadata from other platforms such as Europeana, certain measures must be taken into account in order to prevent service loss, errors, etc. in case these other parties go offline.

• Documentation
  o Every part of the platform must be documented, and user manuals for deploying, managing, maintaining and extending it must be created in order to ensure proper installation/migration even by administrators not familiar with the i-Treasures platform.

• Efficiency (resource consumption for given load) / Environmental protection
  o The use of virtualization technologies allows multiple OS instances to be run on a single physical server, resulting in fewer wasted CPU cycles.
  o Virtualization technologies also allow the dynamic allocation of hardware resources to individual Virtual Machines (increasing available memory, CPU cores, disk space, etc.).

• Extensibility (adding features, and carry-forward of customizations at the next major version upgrade) / Maintainability
The following must be taken into account to be able to add/remove/upgrade features more efficiently:
  o The core components must be modular and developed with extensibility as well as upgradeability in mind.
  o The custom modules must also be developed in a modular/extensible/upgradeable way.
  o Rapid deployment of code fixes and service packs.
  o Support for configurable system logging and auditing.
  o Parameterization for configuration and configuration changes.
  o Migration, upgrading and backward compatibility.
  o Systems management functionality and support for systems management standards like SNMP for 3rd party tools.
  o Archive and backup functionality.
  o Remote management capabilities.

• Failure management
  o When a module or part of the platform fails, certain mechanisms must exist to disable it, so that the platform's other services that do not directly depend on the failed service continue to function properly.

• Open source / Legal and licensing issues or patent-infringement avoidability
  o The platform will be based on open source solutions that provide suitable distribution licenses (GPL, BSD) and allow their use within the i-Treasures project.

• Interoperability
Well-known and accepted standards will be used in order to ensure the interoperability of the relevant components/modules with third-party services.
  o For the process of metadata harvesting, OAI-PMH will be used.
  o For ontology creation/management, OWL will be used.
  o For metadata mapping, known metadata schemas such as Dublin Core, CIDOC-CRM, MPEG-7, etc. will be taken into account.
  o For web service endpoints, REST will be used, but other approaches will also be considered depending on the case.
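As an illustration of the OAI-PMH harvesting mentioned above, the sketch below issues a standard ListRecords request against a generic endpoint and extracts the Dublin Core titles. The endpoint URL is a placeholder, and resumption-token handling and error handling are deliberately omitted; this is not the project's harvesting component.

```python
# Illustrative OAI-PMH harvesting sketch (endpoint URL is a placeholder).
# Standard library only; resumption tokens and error handling are omitted.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}

def harvest_titles(endpoint, metadata_prefix="oai_dc"):
    """Fetch one page of records and return their Dublin Core titles."""
    query = urllib.parse.urlencode({"verb": "ListRecords",
                                    "metadataPrefix": metadata_prefix})
    with urllib.request.urlopen(f"{endpoint}?{query}") as response:
        tree = ET.parse(response)
    return [title.text for title in tree.findall(".//dc:title", DC_NS)]

# Example (placeholder endpoint):
# print(harvest_titles("https://example.org/oai"))
```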

• Network topology
[Figure 7-1: network topology of the i-Treasures platform (firewall, Web Server Virtual Machine, Data Server Virtual Machine)]

  o A firewall must be the first line of defense against abusers. The firewall will allow only the minimum required network protocols/ports to communicate with the platform that resides in the Web Server Virtual Machine, in order to protect the server from malicious attackers.
  o The Data Server Virtual Machine (databases, file servers, digital repository) will allow communication only with the web server, again with the minimum required protocols/ports, to ensure minimum exposure of the platform to malicious users.

• Scalability
  o Separating the Web Server(s) from the Data Server(s) makes horizontal scalability much easier to implement, in order to cope with increased load.

• Performance / response time
It is the amount of useful work accomplished by a computer system compared to the time and resources used.
  o Adequate response times – application loading, screen opening and refresh times, etc.
  o Latency – acceptable time to complete tasks.

  o Throughput – multiple jobs can be completed in a certain time period.
  o Quick responses for simple searches and somewhat more time for advanced searches – this varies based on the input conditions.

• Platform compatibility
  o Use technologies that are compatible with multiple web browsers (Mozilla, Microsoft, Chrome, Safari) and with web standards such as those of the W3C (e.g. CSS).
  o Support major types of tablets and mobile devices.

• Portability
  o The platform must function on all commonly available operating systems (Microsoft Windows, Apple OSX and Linux).
  o Downloadable content must be compatible with as many different platforms (Windows, OSX, Linux) as possible.

• Security / Privacy
  o Password requirements (length, special characters, expiry, recycling policies) in order to prevent successful brute force attacks as well as account theft due to password interception.
  o The database should support role-based access control and user-based privileges in order to allow actions to be taken according to the user's role.
  o The database should support per-user session resource allocation so that the user's experience with the platform is more pleasant (performance-wise).
  o The system should have the option to encrypt data before transferring them over a network (HTTPS), in order to prevent data sniffing and exposure of sensitive data of the user or the platform to malicious attackers.
  o The system should have the option to encrypt the data stored in the database, in case the database security gets compromised.
  o The system should support password reset capabilities in case an account password is lost or stolen.
  o The system should be able to lock accounts after multiple failed logon attempts to prevent brute force attacks.

• Audit and control
  o Provide reports on user activity based on the role and the application that was used, in order to be able to track down abuse or do troubleshooting.
  o Support for auditing to track and monitor user behavior, to be used for analysis in order to get useful information regarding the use of the system.
  o Support for auditing failed logon attempts in order to detect brute force attempts.

  o The audit system should be centralized and secured and should provide detailed insight into the audit data (who did what, to which data and when), in order to provide a single point for managing logging information, which makes tasks such as monitoring, backup, etc. easier and more efficient.
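The account-lockout and audit requirements listed above can be made concrete with a minimal sketch of the kind below. The maximum number of attempts, the lockout duration, the logger name and the log format are assumptions for illustration only, not project decisions.

```python
# Minimal illustration of lockout-on-failed-logins with centralized audit
# logging (cf. the Security and Audit requirements). All parameters are
# illustrative assumptions.
import logging
import time

audit_log = logging.getLogger("itreasures.audit")
logging.basicConfig(format="%(asctime)s %(name)s %(message)s", level=logging.INFO)

MAX_ATTEMPTS = 5           # failed logins tolerated before locking the account
LOCKOUT_SECONDS = 15 * 60  # how long the account stays locked

failed_attempts = {}   # username -> number of consecutive failures
locked_until = {}      # username -> time until which the account is locked

def record_login(username, success):
    """Return True if the account is (now) locked; always write an audit entry."""
    now = time.time()
    if locked_until.get(username, 0) > now:
        audit_log.info("user=%s action=login result=rejected (locked)", username)
        return True
    if success:
        failed_attempts.pop(username, None)
        audit_log.info("user=%s action=login result=success", username)
        return False
    failed_attempts[username] = failed_attempts.get(username, 0) + 1
    audit_log.info("user=%s action=login result=failure count=%d",
                   username, failed_attempts[username])
    if failed_attempts[username] >= MAX_ATTEMPTS:
        locked_until[username] = now + LOCKOUT_SECONDS
        audit_log.info("user=%s action=lockout duration=%ds", username, LOCKOUT_SECONDS)
        return True
    return False
```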

8. Discussion and Conclusions
In this deliverable we have presented the complex work that has led to the preliminary definition of the User Requirements for the i-Treasures system. The process, as we have demonstrated, has been highly collaborative and interdisciplinary, with a strong effort devoted to involving all the main stakeholders, including not only the various partners with their variety of competences, but also the communities around the single ICHs considered by the project.
The effort has given very good results in terms of sub-use case analysis and knowledge domain definition, which have then nurtured the process of requirements definition. From our point of view, one of the main outcomes of this process has been the construction of a common background among the partners and a shared understanding of the sub-use cases, as well as a joint vision of the project. Even the process of knowledge domain definition with the direct involvement of the experts/performers, which was essentially a means to gain a deep understanding of the sub-use cases, can be considered an outcome in itself.
The resulting list of requirements is certainly another important result of this effort, but it is still preliminary and may need to be further refined in the future. To this aim, two further deliverables are envisaged in the project life span (under WP2, D2.3 and D2.5), which will serve exactly to tune and refine the preliminary requirements, also on the basis of the users' impressions about the use of the i-Treasures system prototype. The requirements categorization(s) proposed here have also been conceived mainly as a working tool internal to the Consortium; in the future, in order to facilitate reading by external readers and possibly to support the re-use of these requirements in other ICH fields, it will be advisable to elaborate further on wider, more abstract categorizations.
Lastly, the direct involvement of the experts/performers – even if not always easy to obtain – should be regarded as one of the most outstanding outcomes of this stage of the project. The collaboration process just started with them will certainly be further reinforced, and other interactions with the experts will be planned in the near future, with the main aim of enriching the already available data (especially for those sub-use cases – i.e. contemporary dance, Cantu in Paghjella and Tsamiko dance – where the preliminary contacts did not lead to a sufficient amount of data), but also of starting up a process of knowledge extraction, which should be the basis of the process of know-how modelling envisaged under WP4.

Del_2_1_FINAL2.doc Page 158 of 175 D2.1 First Report on User Requirements Identification and Analysis i-Treasures ICT-600676

9. References Aftanas, L., Varlamov, A., Pavlov, S., Makhnev, V. and Reva, N. (2001) ‘Affective Picture Processing: Event-Related Synchronization within IndividuallyDefined Human Theta Band Is Modulated By Valence Dimension’, Neuroscience Letters, vol. 303, pp. 115-118. Aikawa, N. (2004) ‘An Historical Overview of the Preparation of the UNESCO International Convention for the Safeguarding of Intangible Heritage’, Museum International, vol. 56, pp. 137-149. Alankus, G., Bayazit, A. A. and Bayazit, O. B. (2005) ‘Automated motion synthesis for dancing characters: Motion Capture and Retrieval’, Comput. Animat. Virtual Worlds, vol. 16, no. 3-4, pp. 259-271. Alexiadis, D. S., Kelly, P., Daras, P., O'Connor, N. E., Boubekeur, T. and Moussa, M. B. (2011). ‘Evaluating a dancer's performance using kinect-based skeleton tracking’, Proceedings of the 19th ACM international conference on Multimedia, ACM, New York, USA,pp. 659 – 662. Alivizatou, M. (2007) ‘The UNESCO Programme for the Proclamation of Masterpieces of the Oral and Intangible Heritage of Humanity: A Critical Examination’, Journal of Museum Ethnography, vol. 19, pp. 34-42. Alivizatou, M. (2012) ‘Debating Heritage Authenticity: Kastom and Development at the Vanuatu Cultural Centre’, International Journal of Heritage Studies, vol. 18, no. 2, pp. 124- 143. Alivizatou, M. (2012a) Intangible Heritage and the Museum: New Perspectives on Cultural Preservation, Walnut Creek, CA: Left Coast Press. Aylward, R. and Paradiso, J. A. (2006) ‘Sensemble: A Wireless, Compact, Multi-User Sensor System for Interactive Dance’, in Proceedings of the International Conference on New Interfaces for Musical Expression (NIME06), Paris, France, Centre Pompidou, pp. 134-139. Badin, P., Bailly, G., Reveret, L., Baciu, M., Segebarth, C. and Savariaux, C. (2002) ‘Three- dimensional linear articulatory modeling of tongue, lips and face, based on MRI and video images’, Journal of Phonetics, vol. 30, no. 3, pp. 533-553. Bai, L., Lao,. S., Zhang,W., Jones, G. J. F. and Smeaton, A. F. (2007) ‘Video Semantic Content Analysis Framework Based on Ontology Combined MPEG-7’, Adaptive Multimedia Retrieval: Retrieval, User, and Semantics, Lecture Notes in Computer Science, July, pp. 237- 250. Balasubramanian, S., Melendez-Calderon, A. and Burdet, E. (2012) ‘A robust and sensitive metric for quantifying movement smoothness’, Biomedical Engineering, IEEE Transactions on, vol. 59, no. 8, pp. 2126-2136. Basu, B. (2011) ‘Object Diasporas, Resourcing Communities: Sierra Leonean Collections in the Global Museums cape’, Museum Anthropology, vol. 34, no. 1, pp. 28-42. Bartlett, M.S., Littlewort, G., Fasel, I. and Movellan, J. R. (2003) ‘Real time face detection and facial expression recognition: development and application to human computer interaction’, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Portland, Oregon Black, M. and Yacoob, Y. (1997) ‘Recognizing facial expressions in image sequences using local parameterized models of image motion’, International Journal of Computer Vision, vol. 25, no. 1, October, pp. 23-48. Beck et al. (2001). Manifesto for Agile Software Development, Agile Alliance, [On line] Available at: http://agilemanifesto.org [13 July 2013]. Beeler, T., Bickel, B.,. Beardsley, P., Sumner, B. and Gross M. (2010) ‘High-Quality Single- Shot Capture of Facial Geometry’, ACM Transactions on Graphics, vol. 29, no. 40, pp. 1-9.

Del_2_1_FINAL2.doc Page 159 of 175 D2.1 First Report on User Requirements Identification and Analysis i-Treasures ICT-600676

Beni, G. and Wang J. (1989) ‘Swarm Intelligence in Cellular Robotic Systems’, in Proceedings, NATO Advanced Workshop on Robots and Biological Systems, Tuscany, Italy, June, pp. 26-30 Bennett, G. and Rodet, X. (1989) ‘Synthesis of the singing voice’, in Mathews, M.V. and. Pierce, J.R, (ed.) Current directions in computer music research, Cambridge, MA, USA: MIT Press, pp.19-44. Berndtsson, G. (1996) ‘The KTH rule system for singing synthesi’, Computer Music Journal, vol. 20, no. 1, pp. 76—91. Bettadapura, V. (2012) Face Expression Recognition and Analysis: The State of the Art, Atlanta, GA: Technical report, Georgia Tech, College of Computing. Bevilacqua, F., Guédy, F., Schnell, N., Fléty, E. and Leroy, N. (2007) ‘Wireless sensor interface and gesture-follower for music pedagogy’, Proceedings of the 7th international conference on New Interfaces for Musical Expression, New York, USA, pp. 124-129. Bevilacqua, F., Naugle, L. and Dobrian, C. (2001) ‘Music control from 3D motion capture of dance’. CHI 2001 for the NIME workshop. Bevilacqua, F., Naugle, L. and Valverde, I. (2001) ‘Virtual dance and music environment using motion capture’, Proceedings of the IEEE-Multimedia Technology And Applications Conference, Irvine CA., Citeseer. Bevilacqua, F., Zamborlin, B., Sypniewski, A., Schnell, N., Guédy, F. and Rasamimanana, N. (2010). ‘Gesture in embodied communication and human-computer interaction’, 8th International Gesture Workshop , pp. 73-84. Blackwell, T. (2006 ) ‘Swarming and Music’, in Miranda, E. and Biles, J.(ed.) Evolutionary Computer Music, London: Springer. Blackwell, T. and Bentley, P (2002)‘Improvised Music with Swarms’, Proceedings of the Congress on Evolutionary Computation (CEC ’02), Honolulu, HI, pp. 1462-67 Blei, D. M.,. Ng, A. Y and Jordan M. I, (2003)’Latent Dirichlet allocation’, (Lafferty John ed, Journal of Machine Learning Research, Vol. 1, no. 4–5, January, pp. 993–1022. Bolton, L. (2003) ‘Unfolding the Moon: Enacting Women's Kastom in Vanuatu’, Honolulu, Hawai'i: University of Hawai'i Press. Bonada, J. and Serra .X. (2007) ‘Synthesis of the singing voice by performance sampling and spectral models’. Signal Processing Magazine, IEEE, vol. 24, no. 2, pp.67-79. Bonada, J., Celma, O., Loscos, A., Ortolà j. and Serra X. (2001) ‘Singing voice synthesis combining excitation plus resonance and sinusoidal plus residual models’, Proceedings of the International Computer Music Conference, Havana, Cuba, Bonada, J., Loscos, A. and Kenmochi, H. (2003)’Sample-based singing voice synthesizer by spectral concatenation’, Proceedings of the Stockholm Music Acoustics Conference, Stockholm, Sweden. Bouchard, D. and Badler, N. (2007) ‘Semantic segmentation of motion capture using laban movement analysis’,Intelligent Virtual Agents , Springer, pp. 37-44. Boucher, M. (2011) ‘Virtual Dance and Motion-Capture’, Contemporary Aesthetics, vol. 9 Bourel, F., Chibelushi, C. and Low, A. (2002) ‘Robust Facial Expression Recognition Using a State-Based Model of Spatially-Localized Facial Dynamics’, Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition, Washington, USA. Bradley, D., Heidrich, W., Popa, T. and Sheffer, A. (2010).’High Resolution Passive Facial Performance Capture’, ACM Transactions on Graphics, vol. 29, no. 41. Brand, M. and Hertzmann, A. (2000) ’Style machines’, Proceedings of the 27th annual conference on Computer graphics and interactive techniques, ACM Press/Addison-Wesley Publishing Co, pp. 183-192. Bravi, P. (2012) Mbimbom. 
L'accompagnamento vocale nel canto a tenore. Ph.D. Thesis - University of Sassari.

Del_2_1_FINAL2.doc Page 160 of 175 D2.1 First Report on User Requirements Identification and Analysis i-Treasures ICT-600676

Breiman, L., Friedman J., Olshen. R. and Stone C. (1984) ‘Classification and Regression Trees’, in Wadsworth and Brooks, Monterey, CA. Brouse, A., Castel, A., Filatriau, J., Lehembre, J., Noirhomme, R. and Simon, Q. (2005) ‘From Biological Signals to Music’, Proceedings of the 2nd International Conference on Enactive Interfaces, Genoa, Italy, . Brown, W. M., Cronk, L., Grochow, K., Jacobson, A., Liu, C. K., Popovic, Z., et al. (2005) ‘Dance reveals symmetry especially in young men’, Nature , vol.438, no. 7071, pp. 1148- 1150. Burge J. E. (n.d.) Knowledge Elicitation Tool Classification, Artificial Intelligence Research Group, Worcester Polytechnic Institute, [On line] Available at: http://web.cs.wpi.edu/~jburge/thesis/kematrix.html#_Toc417957386 [13 July 2013]. Burns, A.-M. and Wanderley, M. M. (2006) ‘Visual methods for the retrieval of guitarist fingering’, Proceedings of the Conference on New interfaces for musical expression, Pompidou: IRCAM-Centre, pp. 196-199. Busso, C., Deng, Z., Yildirim, S, Bulut, M., Lee, C., Kazemzadeh, A., Lee, S., Neumann, U. and Narayanan, S. (2004) ‘Analysis of Emotional Recognition Using Facial Expressions, Speech and Multimodal Information’, Proceedings of the International Conference on Multimodal Interfaces, ACM, New York, pp. 205-211 Cabral, J.P., Renals, S., Yamagishi, J. and Richmond, K. (2011) ‘HMM-based speech synthesiser using the LF-model of the glottal source’, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, pp.4704-4707. Calvert, T., Wilke, W., Ryman, R. and Fox, I. (2005) ‘Applications of computers to dance’. Computer Graphics and Applications, IEEE, vol. 25, no. 2, pp. 6-12. Cameron, F. and Kenderdine, S. (2010) Theorizing Digital Cultural Heritage: A Critical Discourse. Cambridge, Mass, London: MIT Press. Cannon, W. (1927) ‘The James-Lange Theory of Emotions: A Critical Examination and an Alternative Theory’, The American Journal of Phychology, vol. 39, pp. 106-124. Caon, M. (2011) ‘Context-Aware 3D Gesture Interaction Based on Multiple Kinects’, Proceedings of the First International Conference on Ambient Computing, Applications, Services and Technologies, Barcelona, Spain, pp.7-12. Chan, J. C., Leung, H., Tang, J. K. and Komura, T. (2011) ’A virtual reality dance training system using motion capture technology’, Learning Technologies, IEEE Transactions, vol 4, no. 2, pp. 187-195. Chang, S., Ellis, D., Jiang, W., Lee, K., Yanagawa, A., Loui, A. C. and Luo, J. (2007) ‘Large- scale multimodal semantic concept detection for consumer video’, Proceedings of the international workshop on Workshop on multimedia information retrieval (MIR '07), September, Germany, pp. 255-264. Chew, Y. and Caspary, E. (2011) ‘MusEEGk: A Brain Computer Musical Interface’, Proceedings of the International Conference on Human Factors in Computing Systems, Vancouver, Canada, pp.1417-1422. Choi, W., D. Baker, Takabashi, S. and Hachimura, K. (2010) ’Implementation of Japanese Intangible Cultural Heritage Noh Play in Second Life’, The Journal of IIEEJ, vol. 39, no. 1. Clarke, E. and Davidson, J. (1998) ‘The Body in Performance’ in Thomas, W. (ed), Composition Performance Reception, Aldershot, Ashgate. Cho, W., Hong, J., Park, H. (2012) ‘Real-Time Ultrasonographic Assessment of True Vocal Fold Length in Professional Singers’, Journal of Voice, vol. 26, no.6, pp. 1-6. Coan, J. and Allen, J. (2004) ‘Frontal EEG Asymmetry as a Moderator and Mediator of Emotion’ Biological Psychology, vol. 67, pp. 
7-49.

Del_2_1_FINAL2.doc Page 161 of 175 D2.1 First Report on User Requirements Identification and Analysis i-Treasures ICT-600676

Coduys, T., Henry, C. and Cont, A. (2004) ‘TOASTER and KROONDE: high-resolution and high-speed real-time sensor interfaces’, Proceedings of the Conference on New Interfaces for Musical Expression, Singapore, pp. 205-206. Cohen, I., Sebe, N., Garg, A., Chen L. and Huang T. (2003) ‘Facial expression recognition from video sequences: temporal and static modelling’, Computer Vision and Image Understanding, vol. 91, pp. 160-187. Cohen, I., Garg A. and Huang T. (2000) ‘Emotion Recognition from Facial Expression Using Multilevel HMM’, Proceedings of the Neural Information Processing Systems Workshop on Affective Computing, Breckenridge. Cook, N. (1990) Music, imagination and culture, Oxford: Clarendon Press. Cook, P. (1992) ‘Physical Models for Music Synthesis, and a Meta-Controller for Real Time Performance’, Proceedings of the International Computer Music Conference and Festival, Delphi, Greece. Cook, P. (1996) ‘Singing voice synthesis: History, current work, and future directions’, Computer Music Journal, vol. 20, no. 3, pp.38—46. Cook, N. (2000) Music: A very short introduction, Oxford: Oxford University Press. Cortes, C. and Vapnik V. (1995) ‘Support-Vector Networks’, Machine Learning, vol. 20, no. 3, pp. 273-297. D'Alessandro, C., D'Alessandro,N., Le Beux, S., Šimko, .J, Çetin, F. and Pirker, H.(2005) ‘The speech conductor: gestural control of speech synthesis’, Proceedings of the Summer Workshop on Multimodal Interfaces (eNTERFACE), Mons, Belgium, pp. 52-61. D'Alessandro, C., D'Alessandro,N., Le Beux and Doval, B.(2006) ‘Real-time CALM synthesizer new approaches in hands-controlled voice synthesis’, Proceedings of the Conference on New interfaces for Musical Expression (NIME), Paris, France, IRCAM; Centre Pompidou, pp.266-271. Darwin, C. (1904) The Expression of Emotions in Man and Animals, 2nd ed., London: John Murray. Darwin, C. (1955) The Expression of Emotion in Man and Animals, New York: Philosophical Library. Dasiopoulou, S., Mezaris, V., Kompatsiaris, I., Papastathis, V. K. and Strintzis, G. M. (2005) ‘Knowledge-Assisted Semantic Video Object Detection, IEEE Transactions on Circuits and Systems for Video Technology’, Special Issue on Analysis and Understanding for Video Adaptation, vol. 15, no. 10, pp. 1210-1224. Datcu, D. and Rothkrantz, L. J. M. (2008), Semantic Audio-Visual Data Fusion for Automatic Emotion Recognition, in Euromedia, Porto. Davidson, J. (1993) ‘Visual perception of performance manner in the movements of solo musicians’, Psychology of Music, vol. 21, pp. 103-113. Davidson, J. (1994) ‘What type of information is conveyed in the body movements of solo musician performers?’, Journal of Human Movement Studies, vol. 6, pp. 279-301. Davidson, J. and Correia, J. (2001) ‘Meaningful musical performance: A bodily experience’, Research Studies in Music Education, vol. 17, pp. 70-83. Davidson, J. and Correia, J.(2002) ‘Body Movement’, in Pamcutt, R. and McPherson, G. (ed) The Science and Psychology of Music Performance, New York, Oxford University Press. Davidson, R. (2004) ‘What Does The Prefrontal Cortex "Do" in Affect: Perspectives on Frontal EEG Asymmetry Research’, Biological Psychology, vol. 67, pp. 219-233. Davidson, R., Ekman, P., Saron, C., Senulis, J. and Friesen, W. (1990) ‘Approach- Withdrawal and Cerebral Asymmetry: Emotional Expression and Brain Physiology’, Personality and Social Psychology, vol. 58, pp. 330-341.

Davidson, R., Schwartz, G., Saron, C., Bennet, J. and Goleman, D. (1979) 'Frontal vs Parietal EEG Asymmetry During Positive and Negative Affect', Psychophysiology, vol. 16, pp. 202-203.
De Torcy, T., Clouet, A., Pillot-Loiseau, C., Vaissière, J., Brasnu, D. and Crevier-Buchman, L. (2013) 'A video-fiberscopic study of laryngopharyngeal behaviour in the Human beatbox', Logopedics Phoniatrics Vocology, online, pp. 1-11.
Degottex, G., Lanchantin, P., Roebel, A. and Rodet, X. (2013) 'Mixed source model and its adapted vocal-tract filter estimate for voice transformation and synthesis', Speech Communication, vol. 55, no. 2, pp. 278-294.
Delalande, F. (1988) 'La gestique de Gould: éléments pour une sémiologie du geste musical', in Guertin, G. (ed.) Glenn Gould, Pluriel, Saint Zenon, Canada: Louise Courteau Editrice Inc., pp. 83-111.
Demoucron, M., Askenfelt, A. and Causse, R. E. (2008) 'Observations on bow changes in violin performance', Journal of the Acoustical Society of America, vol. 123, no. 5, p. 3123.
Denby, B. and Stone, M. (2004) 'Speech synthesis from real time ultrasound images of the tongue', Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1, pp. 685-688.
Denby, B., Oussar, Y., Dreyfus, G. and Stone, M. (2006) 'Prospects for a silent speech interface using ultrasound imaging', Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1.
Digalakis, V., et al. (2003) 'Large vocabulary continuous speech recognition in Greek: Corpus and an automatic dictation system', Proceedings of the Eurospeech '03 Conference, Geneva, Switzerland.
Dobrian, C. and Bevilacqua, F. (2003) 'Gestural control of music: using the Vicon 8 motion capture system', Proceedings of the Conference on New Interfaces for Musical Expression (NIME), National University of Singapore, pp. 161-163.
Domingos, P. and Pazzani, M. J. (1997) 'On the optimality of the simple Bayesian classifier under zero-one loss', Machine Learning, vol. 29, no. 2-3, pp. 103-130.
Dribus, J. (2004) 'The Other Ear: A musical sonification of EEG data', Proceedings of the International Conference on Auditory Display, Sydney, Australia.
Drobny, D., Weiss, M. and Borchers, J. (2009) 'Saltate!: a sensor-based system to support dance beginners', Extended Abstracts on Human Factors in Computing Systems, Proceedings of the CHI '09 International Conference, New York: ACM, pp. 3943-3948.
Drugman, T. and Dutoit, T. (2012) 'The deterministic plus stochastic model of the residual signal and its applications', Audio, Speech, and Language Processing, IEEE Transactions on, vol. 20, no. 3, pp. 968-981.
Dutoit, T. and Leich, H. (1993) 'MBR-PSOLA: Text-to-speech synthesis based on an MBE re-synthesis of the segments database', Speech Communication, vol. 13, no. 3, pp. 435-440.
Duvinage, M., Castermans, T., Jimenez-Fabian, R., Hoellinger, T., Petieau, M., Verlinden, O., et al. (2012) 'Human Walk Modeled by PCPG to Control a Lower Limb Neuroprosthesis by High-Level Commands', Journal of Systemics, Cybernetics and Informatics, vol. 10, no. 3, pp. 70-80.
Ekman, P. (1984) 'Expression and the Nature of Emotion', in Scherer, K. and Ekman, P. (ed.) Approaches to Emotion, Hillsdale, New Jersey: Erlbaum.
Ekman, P. (1989) The Argument and Evidence about Universals in Facial Expressions of Emotion, New York: Wiley.
Ekman, P. (2003) Emotions Revealed: Recognizing faces and feelings to improve communication and emotional life, New York: Times Books.
Ekman, P. and Friesen, W. V. (1978) The Facial Action Coding System: A technique for measurement of facial movement, Palo Alto, CA: Consulting Psychologists Press.

Ekman, P., Levenson, R. and Friesen, W. (1983) 'Emotions Differ in Autonomic Nervous System Activity', Science, vol. 221, pp. 1208-1210.
Egmont-Petersen, M., Ridder, D. D. and Handels, H. (2002) 'Image processing with neural networks – a review', Pattern Recognition, vol. 35, no. 10, pp. 2279-2301.
Engwall, O. (1999) 'Modeling of the vocal tract in three dimensions', Proceedings of Eurospeech99, Budapest, Hungary, pp. 113-116.
Engwall, O. (2004) 'From real-time MRI to 3D tongue movements', Proceedings of the 8th International Conference on Spoken Language Processing (ICSLP), Jeju Island, Korea, vol. 2, pp. 1109-1112.
Esposito, F., Malerba, D., Semeraro, G., Altamura, O., Ferilli, S., Basile, T., Berard, M. and Ceci, M. (2004) 'Machine learning methods for automatically processing historical documents: from paper acquisition to XML transformation', Proceedings of the First International Workshop on Document Image Analysis for Libraries (DIAL '04), Palo Alto, CA, USA, pp. 328-335.
Essid, S., Alexiadis, D. S., Tournemenne, R., Gowing, M., Kelly, P., Monaghan, D. S., et al. (2012) 'An advanced virtual dance performance evaluator', Proceedings of the 37th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan, pp. 2269-2272.
Fang, T., Zhao, X., Ocegueda, O., Shah, S. K. and Kakadiaris, I. A. (2011) '3D Facial Expression Recognition: A Perspective on Promises and Challenges', Proceedings of the 9th IEEE International Conference on Automatic Face and Gesture Recognition (FG'11), Special Session: 3D Facial Behavior Analysis and Understanding, Santa Barbara, CA, USA, pp. 603-610.
Fels, S., Lloyd, J. E., Van Den Doel, K., Vogt, F., Stavness, I. and Vatikiotis-Bateson, E. (1991) 'Developing physically-based, dynamic vocal tract models using ArtiSynth', Proceedings of ISSP 6, pp. 419-426.
Folgieri, R. and Zichella, M. (2012) 'A BCI-based Application in Music: Conscious Playing of Single Notes by Brainwaves', ACM Computers in Entertainment, vol. 10, no. 3, pp. 1-10.
Fujisaki, H., Ohno, S. and Gu, W. (2004) 'Physiological and physical mechanisms for fundamental frequency control in some tone languages and a command-response model for generation of their F0 contours', Proceedings of the International Symposium on Tonal Aspects of Languages: With Emphasis on Tone Languages, Beijing, China, pp. 61-64.
Funkhouser, T., Shin, H., Toler-Franklin, C., Garcia Castaneda, A., Brown, B., Dobkin, D., Rusinkiewicz, S. and Weyrich, T. (2011) 'Learning how to match fresco fragments', ACM Journal on Computing and Cultural Heritage, vol. 4, no. 2, November, Article 7.
Gales, M. J. (2000) 'Cluster adaptive training of hidden Markov models', Speech and Audio Processing, IEEE Transactions on, vol. 8, no. 4, pp. 417-428.
Govind, D., Prasanna, S. and Yegnanarayana, B. (2011) 'Neutral to target emotion conversion using source and suprasegmental information', Proceedings of INTERSPEECH 2011, Florence, Italy, pp. 2962-2972.
Griffiths, P. (1996) Modern Music and After - Directions Since 1945, Oxford: Clarendon Press.
Gritten, A. and King, E. (2006) Music and Gesture, Aldershot: Ashgate.
Grunberg, D. (n.d.) Gesture Recognition for Conducting Computer Music, [Online] Available at: http://schubert.ece.drexel.edu/research/gestureRecognition [10 January 2009].
Hafstein, V. (2009) 'Intangible Heritage as List: From Masterpieces to Representation', in Smith, L. and Akagawa, N. (ed.) Intangible Heritage, Abingdon: Routledge, pp. 93-111.
Hagemann, D., Naumann, E., Lurken, A., Becker, G., Maier, S. and Bartussek, D. (1999) 'EEG Asymmetry, Dispositional Mood and Personality', Personality and Individual Differences, vol. 27, pp. 541-568.

Han, G., Hwang, J., Choi, S. and Kim, G. J. (2007) 'AR pottery: experiencing pottery making in the augmented space', Proceedings of the 2nd International Conference on Virtual Reality (ICVR'07), Beijing, China.
Henrich, N., Lortat-Jacob, B., Castellengo, M., Bailly, L. and Pelorson, X. (2006) 'Period-doubling occurrences in singing: the "bassu" case in traditional Sardinian "A Tenore" singing', Proceedings of the International Conference on Voice Physiology and Biomechanics, Tokyo, Japan.
Henrich, N., Bailly, L., Pelorson, X. and Lortat-Jacob, B. (2009) 'Physiological and physical understanding of singing voice practices: the Sardinian Bassu case', AIRS Start-up Meeting, Prince Edward Island, Canada.
Hess, M. (2007) Icons of Hip Hop: An Encyclopedia of the Movement, Music, and Culture, Westport: Greenwood Press, p. 640.
Hofmann, T. (1999) 'Probabilistic latent semantic indexing', Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, New York, pp. 50-57.
Horrocks, I. and Patel-Schneider, P. F. (2004) 'A proposal for an OWL rules language', Proceedings of the 13th International Conference on World Wide Web (WWW04), pp. 723-731.
Horrocks, I. (2005) 'Description logics in ontology applications', in Automated Reasoning with Analytic Tableaux and Related Methods: Proceedings of the 14th International Conference, TABLEAUX 2005, Koblenz, Germany, Berlin: Springer, pp. 2-13.
Hueber, T., Aversano, G., Chollet, G., Denby, B., Dreyfus, G., Oussar, Y. and Stone, M. (2007) 'Eigentongue feature extraction for an ultrasound-based silent speech interface', Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), vol. 1, pp. 1245-1248.
Hueber, T., Chollet, G., Denby, B. and Stone, M. (2008) 'Acquisition of ultrasound, video and acoustic speech data for a silent-speech interface application', Proceedings of the 8th International Seminar on Speech Production (ISSP), pp. 365-369.
Huber-Mörk, R., Zambanini, S., Zaharieva, M. and Kampel, M. (2011) 'Identification of ancient coins based on fusion of shape and local features', Machine Vision and Applications, vol. 22, no. 6, pp. 983-994.
Huber, S., Roebel, A. and Degottex, G. (2012) 'Glottal source shape parameter estimation using phase minimization variants', Poster presented at the INTERSPEECH12 Conference, Portland, Oregon, USA.
Huffman, K. (1996) 'The Fieldworkers of the Vanuatu Cultural Centre and their Contribution to the Audiovisual Collections', in Bonnemaison, J., Huffman, K. and Tryon, D. (ed.) Arts of Vanuatu, Honolulu: University of Hawai'i Press, pp. 290-293.
Hug, C. and Gonzalez-Perez, C. (2012) 'Qualitative Evaluation of Cultural Heritage Information Modeling Techniques', ACM Journal on Computing and Cultural Heritage, vol. 5, no. 2, Article 8.
Huseinovic, M., Turcinhodzic, R. and Rizvic, S. (2013) 'Interactive Animated Storytelling in Presenting Intangible Cultural Heritage', Proceedings of CESCG 2013: The 17th Central European Seminar on Computer Graphics, Bratislava.
Inanoglu, Z. and Young, S. (2009) 'Data-driven emotion conversion in spoken English', Speech Communication, vol. 51, no. 3, pp. 268-283.
IEEE (1998) IEEE Recommended Practice for Software Requirements Specifications, [Online] Available at: http://www.math.uaa.alaska.edu/~afkjm/cs401/IEEE830.pdf [13 July 2013].
IEEE (2004) Guide to the Software Engineering Body of Knowledge (SWEBOK), [Online] Available at: http://www.computer.org/portal/web/swebok/html/contents [13 July 2013].

Ioannides, M., Fellner, D., Georgopoulos, A. and Hadjimitsis, D. (ed.) (2010) Digital Heritage, Third International Conference, EuroMed 2010, Lemnos, Cyprus, Proceedings, Berlin, Heidelberg, New York: Springer.
James, J., Ingalls, T., Qian, G., Olsen, L., Whiteley, D., Wong, S., et al. (2006) 'Movement-based interactive dance performance', Proceedings of the 14th Annual ACM International Conference on Multimedia, New York: ACM, pp. 470-480.
James, W. (1890) The Principles of Psychology, New York: Holt, Rinehart and Winston.
Jasper, H. (1958) 'The Ten-Twenty Electrode System of the International Federation', Electroencephalography and Clinical Neurophysiology, vol. 39, pp. 371-375.
Juchniewicz, J. (2008) 'The influence of physical movement on the perception of musical performance', Psychology of Music, vol. 36, pp. 417-427.
Junyong, L., Jing, Z. M. L., Jing, Y. and Lu, L. (2008) 'Design of the Intangible Cultural Heritage Management Information System based on GIS', Proceedings of the 2008 International Conference on Information Management, Innovation Management and Industrial Engineering, Taipei, pp. 94-99.
Juslin, P. and Sloboda, J. (2001) Music and Emotion: Theory and Research, New York: Oxford University Press.
Kapur, A., Benning, M. and Tzanetakis, G. (2004) 'Query-by-beat-boxing: Music retrieval for the DJ', Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR), Barcelona, pp. 170-178.
Kahol, K., Tripathi, P. and Panchanathan, S. (2004) 'Automated gesture segmentation from dance sequences', Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition (FGR04), Seoul, Korea, pp. 883-888.
Karabetsos, S., Tsiakoulis, P., Chalamandaris, A. and Raptis, S. (2010) 'One-class classification for spectral join cost calculation in unit selection speech synthesis', Signal Processing Letters, IEEE, vol. 17, no. 8, pp. 746-749.
Karasik, A. (2010) 'A complete, automatic procedure for pottery documentation and analysis', Proceedings of the IEEE Computer Vision and Pattern Recognition Workshops (CVPRW), San Francisco, CA, USA, pp. 29-34.
Kawahara, H., Masuda-Katsuse, I. and de Cheveigne, A. (1999) 'Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds', Speech Communication, vol. 27, no. 3, pp. 187-207.
Kenmochi, H. and Ohshita, H. (2007) 'Vocaloid: commercial singing synthesizer based on sample concatenation', Proceedings of INTERSPEECH 2007, Antwerp, Belgium, pp. 4009-4010.
Kim, N. S., Sung, J. S. and Hong, D. H. (2011) 'Factored MLLR adaptation', Signal Processing Letters, IEEE, vol. 18, no. 2, pp. 99-102.
Kirshenblatt-Gimblett, B. (2004) 'Intangible Heritage as Metacultural Production', Museum International, vol. 56, pp. 52-65.
Kollia, I., Tzouvaras, V., Drosopoulos, N. and Stamou, G. (2012) 'A Systemic Approach for Effective Semantic Access to Cultural Content', Semantic Web – Interoperability, Usability, Applicability, vol. 3, no. 1, pp. 65-83.
Koolen, M. and Kamps, J. (2010) 'Searching cultural heritage data: Does structure help expert searchers?', Proceedings of RIAO '10 Adaptivity, Personalization and Fusion of Heterogeneous Information, Paris, France, pp. 152-155.
Korida, K., Nishino, H. and Utsumiya, K. (1997) 'An interactive 3D interface for a virtual ceramic art work environment', Proceedings of the 1997 International Conference on Virtual Systems and MultiMedia (VSMM '97), Washington DC, pp. 227-234.
Kotsia, I. and Pitas, I. (2007) 'Facial expression recognition in image sequences using geometric deformation features and support vector machines', IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 127-187.

Kousis, M., Selwyn, T. and Clark, D. (2011) Contested Mediterranean Spaces: Ethnographic Essays in Honour of Chris Tilly, New York, Oxford: Berghahn Books.
Kyritsi, V., Georgaki, A. and Kouroupetroglou, G. (2007) 'A score-to-singing voice synthesis system for the Greek language', Proceedings of the International Computer Music Conference, Copenhagen, Denmark, pp. 216-223.
Lakka, C., Nikolopoulos, S., Varytimidis, C. and Kompatsiaris, I. (2011) 'A Bayesian network modeling approach for cross media analysis', Signal Processing: Image Communication, vol. 26, no. 3, pp. 175-193.
Lazaridis, A., Kostoulas, T., Ganchev, T., Mporas, I. and Fakotakis, N. (2010) 'Vergina: A Modern Greek Speech Database for Speech Synthesis', Proceedings of LREC, Malta, pp. 117-121.
Le Beux, S., Feugere, L. and D'Alessandro, C. (2011) 'Chorus Digitalis: experiments in chironomic choir singing', Proceedings of INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, pp. 2005-2008.
Lederer, K. (2005) The phonetics of beatboxing, BA Dissertation, Leeds University, United Kingdom.
Lee, S. and Dong, M. (2011) 'Singing Voice Synthesis: Singer-dependent Vibrato Modeling and Coherent Processing of Spectral Envelope', Proceedings of INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, pp. 2001-2004.
Leslie, G. and Mullen, T. (2011) 'MoodMixer: EEG-based Collaborative Sonification', Proceedings of the 2011 International Conference on New Interfaces for Musical Expression, Oslo, pp. 296-299.
Lidov, D. (1987) On Musical Phrase, Monographies de sémiologie et d'analyses musicales, Montreal: Université de Montréal.
Lidov, D. (2006) 'Emotive Gesture in Music and its Contraries', in Gritten, A. and King, E. (ed.) Music and Gesture, Aldershot: Ashgate, pp. 24-44.
Lien, J., Kanade, T., Cohn, J. and Li, C. (1998) 'Automated Facial Expression Recognition Based on FACS Action Units', Proceedings of the Third IEEE Conference on Automatic Face and Gesture Recognition, Nara, pp. 390-395.
Lisetti, C. and Nasoz, F. (2004) 'Using Non-invasive Wearable Computers to Recognize Human Emotions from Physiological Signals', EURASIP Journal on Applied Signal Processing, vol. 11, pp. 1672-1687.
Liggins, M., Hall, D. L. and Llinas, J. (2008) Handbook of Multisensor Data Fusion: Theory and Practice, 2nd edition, CRC Press.
Lorenzo-Trueba, J., Barra-Chicote, R., Raitio, T., Obin, N., Alku, P., Yamagishi, J. and Montero, J. M. (2012) 'Towards glottal source controllability in expressive speech synthesis', Proceedings of INTERSPEECH12, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, pp. 1620-1623.
Lu, H. L. and Smith, J. (2000) 'Glottal source modeling for singing voice synthesis', Proceedings of the International Computer Music Conference, Berlin, Germany.
Lu, J., Wu, D., Yang, H., Luo, C., Li, C. and Yao, D. (2012) 'Scale-Free Brain-Wave Music from Simultaneously EEG and fMRI Recordings', PLoS ONE, vol. 7, no. 11, pp. 1-11.
Lu, S., Jaffer, A., Jin, X. and Zhao, H. (2012) 'Mathematical Marbling', Computer Graphics and Applications, vol. 32, no. 6, pp. 26-35.
Lucier, A. (1976) 'Statement On: Music for Solo Performer', in Rosenboom, D. (ed.) Biofeedback and the Arts: Results of Early Experiments, Vancouver: ARC Publication.
Lukaszyk, S. (2004) 'A new concept of probability metric and its applications in approximation of scattered data sets', Computational Mechanics, vol. 33, pp. 299-304.

Macon, M. W., Jensen-Link, L., Oliverio, J., Clements, M. A. and Bryan, E. (1997a) 'Concatenation-based MIDI-to-singing voice synthesis', Proceedings of the Audio Engineering Society Convention 103, pp. 1-7.
Macon, M. W., Jensen-Link, L., Oliverio, J., Clements, M. A. and Bryan, E. (1997b) 'A singing voice synthesis system based on sinusoidal modeling', Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, Germany, pp. 435-438.
MacRitchie, J., Buck, B. and Bailey, N. (2009) 'Visualising musical structure through performance gesture', Proceedings of the International Society for Music Information Retrieval Conference, Kobe, Japan, pp. 237-242.
Magalhaes, J. and Ruger, S. (2007) 'Information-theoretic semantic multimedia indexing', Proceedings of the International Conference on Image and Video Retrieval, Amsterdam, The Netherlands, New York: ACM, pp. 619-629.
Makridis, M. and Daras, P. (2012) 'Automatic classification of archaeological pottery sherds', ACM Journal on Computing and Cultural Heritage, vol. 5, no. 4, Article 15.
Makryniotis, D. (2004) The limits of the body. Multidisciplinary approaches, Athens: Nissos Publications.
Malempré, M. (2010) 'Pour une poignée de danses'.
Mallik, A., Chaudhuri, S. and Ghosh, H. (2011) 'Nrityakosha: Preserving the Intangible Heritage of Indian Classical Dance', ACM Journal on Computing and Cultural Heritage, vol. 4, no. 3, Article 11.
Mallik, A., Pasumarthi, A. P. and Chaudhury, S. (2008) 'Multimedia ontology learning for automatic annotation and video browsing', Proceedings of the International Conference on Multimedia Information Retrieval (MIR), New York, USA, pp. 387-394.
Manitsaris, S. (2011) 'Vision par ordinateur pour la reconnaissance des gestes musicaux des doigts', Revue Francophone d'Informatique Musicale, [Online] Available at: http://revues.mshparisnord.org/rfim/index.php?id=107 [13 July 2013].
Margasak, P. (2012) 'Brain-wave music to go', [Online] Available at: http://www.chicagoreader.com/Bleader/archives/2012/04/03/brain-wave-music-to-go [13 July 2013].
Margasak, P. (2013) 'Music of the hemispheres', [Online] Available at: http://www.chicagoreader.com/chicago/katinka-kleijn-daniel-dehaan-ryan-ingebritsen-cello-brain-wave-eeg/Content?oid=8417453 [13 July 2013].
McDonnell, K. T., Qin, H. and Wlodarczyk, R. A. (2001) 'Virtual clay: a real-time sculpting system with haptic toolkits', Proceedings of the 2001 Symposium on Interactive 3D Graphics, New York, USA, pp. 179-190.
McIntosh, D., Reichmann-Decker, A., Winkielman, P. and Wilbarger, J. (2006) 'When the Social Mirror Breaks: Deficits in Automatic, But Not Voluntary, Mimicry of Emotional Facial Expressions in Autism', Developmental Science, vol. 9, pp. 295-302.
Mellish, L. (2006) 'The Romanian Căluş Tradition and Its Changing Symbolism as It Travels from the Village to the Global Platform', [Online] Available at: http://mainweb.hgo.se/ [13 July 2013].
Mena, D., Mansour, J. and Simon, S. (1981) 'Analysis and synthesis of human swing leg motion during gait and its clinical applications', Journal of Biomechanics, vol. 14, no. 12, pp. 823-832.
Meredith, M. and Maddock, S. (2001) 'Motion capture file formats explained', [Online] Available at: http://www.dcs.shef.ac.uk [12 July 2013].
Meron, Y. (1999) 'High quality singing synthesis using the selection-based synthesis scheme', PhD Thesis, University of Tokyo, Tokyo.

Meron, Y. and Hirose, K. (2000) 'Synthesis of vibrato singing', Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Istanbul, Turkey, vol. 1, pp. 745-748.
Miranda, E. and Brouse, A. (2005) 'Interfacing the Brain Directly with Musical Systems: On Developing Systems for Making Music with Brain Signals', Leonardo, vol. 38, no. 4, pp. 331-336.
Mulholland, P., Wolff, A., Collins, T. and Zdrahal, Z. (2011) 'An event-based approach to describing and understanding museum narratives', Proceedings of the Detection, Representation, and Exploitation of Events in the Semantic Web Workshop, in conjunction with the International Semantic Web Conference, Bonn, Germany.
Nambu, Y., Mikawa, M. and Tanaka, K. (2010) 'Flexible voice morphing based on linear combination of multi-speakers' vocal tract area functions', Proceedings of the 18th European Signal Processing Conference (EUSIPCO-2010), Aalborg, Denmark, pp. 790-794.
Naphade, M. and Huang, T. (2001) 'A probabilistic framework for semantic video indexing, filtering, and retrieval', IEEE Transactions on Multimedia, vol. 3, no. 1, pp. 141-151.
Nas, P. (2002) 'Masterpieces of Oral and Intangible Heritage: Reflections on the UNESCO World Heritage List', Current Anthropology, vol. 43, no. 1, pp. 139-143.
Nasoz, F., Lisetti, C., Alvarez, K. and Finkelstein, N. (2003) 'Emotion Recognition from Physiological Signals for User Modeling of Affect', Proceedings of the International Conference on User Modeling, Johnstown, PA, USA.
Nauta, W. (1971) 'The Problem of the Frontal Lobe: A Reinterpretation', Psychiatric Research, vol. 8, pp. 167-187.
Nikolopoulos, S., Lakka, C., Kompatsiaris, I., Varytimidis, C., Rapantzikos, K. and Avrithis, Y. (2009) 'Compound document analysis by fusing evidence across media', Proceedings of the International Workshop on Content-Based Multimedia Indexing, Chania, Crete, pp. 175-180.
Nikolopoulos, S., Zafeiriou, S., Patras, I. and Kompatsiaris, I. (2012) 'High Order pLSA for Indexing Tagged Images', Signal Processing (Elsevier), Special Issue on Indexing of Large-Scale Multimedia Signals, vol. 93, no. 8, pp. 2212-2228.
Nuance Communications Inc. (2012a) 'Recognizer Languages – Nuance', [Online] Available at: http://www.nuance.com/for-business/by-solution/customer-service-solutions/solutions-services/inbound-solutions/self-service-automation/recognizer/recognizer-languages/index.htm [26 May 2013].
Nuance Communications Inc. (2012b) Nuance Vocalizer demo, [Online] Available at: http://www.nuance.com/vocalizer5/flash/index.html [22 May 2013].
Ofli, F., Demir, Y., Yemez, Y., Erzin, E., Tekalp, A. M., Balci, K., et al. (2008) 'An audio-driven dancing avatar', Journal on Multimodal User Interfaces, vol. 2, no. 2, pp. 93-103.
Ohishi, Y., Kameoka, H., Kashino, K. and Takeda, K. (2008) 'Parameter estimation method of F0 control model for singing voices', Proceedings of INTERSPEECH 2008, 9th Annual Conference of the International Speech Communication Association, Brisbane, Australia, pp. 139-142.
Ohishi, Y., Kameoka, H., Mochihashi, D., Nagano, H. and Kashino, K. (2010) 'Statistical modeling of F0 dynamics in singing voices based on Gaussian processes with multiple oscillation bases', Proceedings of INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Chiba, Japan, pp. 2598-2601.
Oura, K., Mase, A., Yamada, T., Muto, S., Nankaku, Y. and Tokuda, K. (2010) 'Recent development of the HMM-based singing voice synthesis system', Proceedings of the 7th Workshop on Speech Synthesis, Kyoto, Japan, pp. 211-216.
Palmer, C. and Pfordresher, P. (2000) 'From my hand to your ear: the faces of meter in performance and perception', Proceedings of the International Conference on Music Perception and Cognition, Staffordshire, England.

Pantic, M. and Patras, I. (2006) 'Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences', IEEE Transactions on Systems, Man, and Cybernetics - Part B, vol. 36, no. 2, pp. 433-449.
Pantic, M. and Rothkrantz, L. (2000) 'Automatic analysis of facial expressions: The state of the art', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1424-1445.
Pantic, M. and Rothkrantz, L. (2004) 'Facial action recognition for facial expression analysis from static face images', IEEE Transactions on Systems, Man, and Cybernetics - Part B, vol. 34, no. 3, pp. 1449-1461.
Pejsa, T. and Pandzic, I. (2010) 'State of the Art in Example-Based Motion Synthesis for Virtual Characters in Interactive Applications', Computer Graphics Forum, vol. 29, no. 1, pp. 202-226.
Pfurtscheller, G. and Da Silva, F. (1999) 'Event-Related EEG/MEG Synchronization and Desynchronization: Basic Principles', Clinical Neurophysiology, vol. 110, pp. 1842-1857.
Picard, R., Vyzas, E. and Healey, J. (2001) 'Toward Machine Emotional Intelligence: Analysis of Affective Physiological State', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 10, pp. 1175-1191.
Pilosu, S. (2012a) 'Enciclopedia della musica sarda: Canto a Tenore', in Casu, F. and Lutzu, M. (ed.) Enciclopedia della musica sarda, vols. 1 and 2, Cagliari: L'Unione Sarda.
Pilosu, S. (2012b) 'Canto a tenore and "visibility". Comparing two communities: Orgosolo and Bortigali', in Multipart Music: A specific mode of musical thinking, expressive behaviour and sound, Udine: Nota, pp. 403-414.
Proctor, M., Bresch, E., Bird, D., Nayak, K. and Narayanan, S. (2013) 'Paralinguistic mechanisms of production in human "beatboxing": A real-time magnetic resonance imaging study', Journal of the Acoustical Society of America, vol. 133, no. 2, pp. 1-12.
Proutskova, P., Rhodes, C., Wiggins, G. and Crawford, T. (2012) 'Breathy or Resonant - A Controlled and Curated Dataset for Phonation Mode Detection in Singing', Proceedings of the International Society for Music Information Retrieval Conference (ISMIR 2012), Porto, Portugal, pp. 589-594.
Punska, O. (1999) 'Bayesian approach to multisensor data fusion', PhD Dissertation, Department of Engineering, University of Cambridge.
Raitio, T., Suni, A., Pulakka, H., Vainio, M. and Alku, P. (2011) 'Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis', Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2011), Prague, Czech Republic, pp. 4564-4567.
Raitio, T., Suni, A., Yamagishi, J., Pulakka, H., Nurminen, J., Vainio, M. and Alku, P. (2011) 'HMM-based speech synthesis utilizing glottal inverse filtering', Audio, Speech, and Language Processing, IEEE Transactions on, vol. 19, no. 1, pp. 153-165.
Ranalli, J. (2011) 'Review of Task-Based Language Learning and Teaching With Technology', Language Learning & Technology, vol. 15, no. 3.
Raptis, M., Kirovski, D. and Hoppe, H. (2011) 'Real-time classification of dance gestures from skeleton animation', Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, New York, NY, USA, pp. 147-156.
Rasamimanana, N. and Bevilacqua, F. (2008) 'Effort-based analysis of bowing movements: evidence of anticipation effects', Journal of New Music Research, vol. 37, no. 4, pp. 339-351.
Robel, A., Villavicencio, F. and Rodet, X. (2007) 'On cepstral and all-pole based spectral envelope modeling with unknown model order', Pattern Recognition Letters, vol. 28, no. 11, pp. 1343-1350.
Rodet, X., Depalle, P. and Garcia, G. (1994) 'Additive analysis and synthesis using partial tracking, spectral envelopes, and inverse fast Fourier transform', The Journal of the Acoustical Society of America, vol. 95, p. 2958.

Rodet, X., Potard, Y. and Barriere, J.-B. (1984) 'The CHANT project: from the synthesis of the singing voice to synthesis in general', Computer Music Journal, vol. 8, no. 3, pp. 15-31.
Roebel, A., Huber, S., Rodet, X. and Degottex, G. (2012) 'Analysis and modification of excitation source characteristics for singing voice synthesis', Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2012), Kyoto, Japan, pp. 5381-5384.
Rosenboom, D. (1990) Extended Musical Interface with the Human Nervous System, Berkeley, CA: International Society for the Arts, Sciences and Technology.
Rosenboom, D. (1990a) 'The Performing Brain', Computer Music Journal, vol. 14, no. 1, pp. 48-65.
Saino, K., Tachibana, M. and Kenmochi, H. (2010) 'A singing style modeling system for singing voice synthesis', Proceedings of INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Chiba, Japan, pp. 2894-2897.
Saitou, T., Goto, M., Unoki, M. and Akagi, M. (2007) 'Speech-to-singing synthesis: converting speaking voices to singing voices by controlling acoustic features unique to singing voices', Proceedings of the Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York.
Saitou, T., Unoki, M. and Akagi, M. (2005) 'Development of an F0 control model based on F0 dynamic characteristics for singing-voice synthesis', Speech Communication, vol. 46, no. 3, pp. 405-417.
Salini, D. (1996) Musiques traditionnelles de Corse, Editions Messagera.
Sandbach, G., Zafeiriou, S., Pantic, M. and Rueckert, D. (2012) 'Recognition of 3D facial expression dynamics', Image and Vision Computing, vol. 30, no. 10, pp. 762-773.
Scherer, K. and Ekman, P. (1982) Handbook of Methods in Nonverbal Behavior Research, Cambridge, UK: Cambridge University Press.
Schuller, B., Reiter, S., Mueller, R., Hames, A. and Rigoll, G. (2005) 'Speaker Independent Speech Emotion Recognition by Ensemble Classification', Proceedings of the IEEE International Conference on Multimedia and Expo, Amsterdam, The Netherlands, pp. 864-867.
Shawker, T. H., Sonies, B., Stone, M. and Baum, B. J. (1983) 'Real-time ultrasound visualization of tongue movement during swallowing', Journal of Clinical Ultrasound, vol. 11, no. 9, pp. 485-490.
Shen, Y., Wu, X., Lüa, C. and Cheng, H. (2012) 'National Dances Protection Based on Motion Capture Technology', Chengdu, Sichuan, China, Singapore: IACSIT Press, vol. 51, pp. 78-81.
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., et al. (2011) 'Real-time human pose recognition in parts from single depth images', Proceedings of Computer Vision and Pattern Recognition, Colorado Springs, Colorado, USA.
Sinyor, E., McKay, C., Fiebrink, R., McEnnis, D. and Fujinaga, I. (2005) 'Beatbox classification using ACE', Proceedings of the International Conference on Music Information Retrieval, London, pp. 672-675.
Sloboda, J. (1998) 'Does music mean anything?', Musicae Scientiae, vol. 2, no. 1.
Snoek, C. G. M., Worring, M., Geusebroek, J., Koelma, D., Seinstra, F. and Smeulders, A. (2006) 'The semantic pathfinder: using an authoring metaphor for generic multimedia indexing', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1678-1689.
Snook, S. (1998) Kinesthetic analysis in post-tonal instrumental composition, PhD Thesis, University of Sydney, Australia.
Sommerville, I. and Sawyer, P. (1997) Requirements Engineering: A Good Practice Guide, New York: Wiley & Sons.
Stevens, K. N. (2000) Acoustic Phonetics, MIT Press.

Stone, M. L. (1981) 'Evidence for a rhythm pattern in speech production: Observations of jaw movement', Journal of Phonetics, vol. 9, no. 1, pp. 109-120.
Stone, M. (1990) 'A three-dimensional model of tongue movement based on ultrasound and x-ray microbeam data', The Journal of the Acoustical Society of America, vol. 87, p. 2207.
Stone, M. (1991) 'Toward a model of three-dimensional tongue movement', Journal of Phonetics, vol. 19, pp. 309-320.
Stone, M. and Lundberg, A. (1996) 'Three-dimensional tongue surface shapes of English consonants and vowels', Journal of the Acoustical Society of America, vol. 99, no. 6, pp. 3728-3737.
Sung, J. S., Hong, D. H., Kang, S. J. and Kim, N. S. (2011) 'Factored MLLR Adaptation for Singing Voice Generation', Proceedings of INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, pp. 2789-2792.
Suwa, M., Sugie, N. and Fujimora, K. (1978) 'A preliminary note on pattern recognition of human emotional expression', Proceedings of the International Joint Conference on Pattern Recognition, Kyoto, Japan.
Takegawa, Y., Terada, T. and Nishio, S. (2006) 'Design and Implementation of a Real-time Fingering Detection System for Piano Performances', Proceedings of the International Computer Music Conference, New Orleans, USA, pp. 67-74.
Tan, G., Hao, T. and Zhong, Z. (2009) 'A Knowledge Modeling Framework for Intangible Cultural Heritage Based on Ontology', Proceedings of the Second International Symposium on Knowledge Acquisition and Modeling, Wuhan, China, vol. 1, pp. 304-307.
Tardieu, D., Siebert, X., Mazzarino, B., Chessini, R., Dubois, J., Dupont, S., Varni, G. and Visentin, A. (2010) 'Browsing a dance video collection: dance analysis and interface design', Journal on Multimodal User Interfaces, vol. 4, no. 1, pp. 37-46.
Ten Holt, G. A., Reinders, M. J. and Hendriks, E. A. (2007) 'Multi-Dimensional Dynamic Time Warping for Gesture Recognition', Proceedings of the Thirteenth Annual Conference of the Advanced School for Computing and Imaging.
Thill, S. (2010) 'Mindflex Hacks Turns Brain Waves Into Music', [Online] Available at: http://www.wired.com/underwire/2010/10/robert-schneider-teletron [21 October 2010].
Tian, Y. L., Kanade, T. and Cohn, J. F. (2005) 'Facial Expression Analysis', in Handbook of Face Recognition, New York: Springer, pp. 247-275.
Tilmanne, J., Moinet, A. and Dutoit, T. (2012) 'Stylistic gait synthesis based on hidden Markov models', EURASIP Journal on Advances in Signal Processing, vol. 72, pp. 1-14.
Tokuda, K., Yoshimura, T., Masuko, T., Kobayashi, T. and Kitamura, T. (2000) 'Speech parameter generation algorithms for HMM-based speech synthesis', Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '00), Istanbul, Turkey, vol. 3, pp. 1315-1318.
Traube, C. (2004) An interdisciplinary study of the timbre of the classical guitar, PhD Thesis, McGill University.
Troup, G., Griffiths, T., Schneider-Kolsky, M. and Finlayson, T. (2006) 'Ultrasound Observation of Vowel Tongue Shapes in Trained Singers', Proceedings of the 30th Condensed Matter and Materials Meeting, Wagga Wagga, Australia.
Tsalakanidou, F. and Malassiotis, S. (2010) 'Real-time 2D+3D facial action and expression recognition', Pattern Recognition, vol. 43, no. 5, May, pp. 1763-1775.
Tyte, G. (2012) 'Beatboxing techniques', [Online] Available at: www.humanbeatbox.com [10 June 2013].
UNESCO (2003) 'Convention for the Safeguarding of Intangible Cultural Heritage'.
UNESCO (2009) 'Convention for the safeguarding of the intangible cultural heritage', Nomination for inscription on the Urgent Safeguarding List in 2009 (Reference No. 00315), pp. 1-16.

Verner, J. (1995) 'MIDI guitar synthesis yesterday, today and tomorrow, an overview of the whole fingerpicking thing', Recording Magazine, vol. 8, no. 9, pp. 52-57.
Villavicencio, F. and Bonada, J. (2010) 'Applying voice conversion to concatenative singing-voice synthesis', Proceedings of INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, pp. 2162-2165.
Visweswariah, K., Goel, V. and Gopinath, R. (2002) 'Structuring linear transforms for adaptation using training time information', Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 02), Orlando, Florida, vol. 1, pp. 585-588.
Vrochidis, S., Doulaverakis, C., Gounaris, A., Nidelkou, E., Makris, L. and Kompatsiaris, I. (2008) 'A Hybrid Ontology and Visual-based Retrieval Model for Cultural Heritage Multimedia Collections', International Journal of Metadata, Semantics and Ontologies, vol. 3, no. 3, pp. 167-182.
Waithayanon, C. and Aporntewan, C. (2011) 'A motion classifier for Microsoft Kinect', Proceedings of Computer Sciences and Convergence Information Technology, Seogwipo, South Korea, pp. 727-731.
Watercutter, A. (2012) 'This Is Your Brain on Wax', [Online] Available at: http://www.wired.com/underwire/2012/02/masaki-batoh-brain-waves-music [28 February 2012].
Wang, J., Liu, Z., Wu, Y. and Yuan, J. (2012) 'Mining actionlet ensemble for action recognition with depth cameras', Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, Rhode Island, USA, pp. 1290-1297.
Weise, T., Bouaziz, S., Li, H. and Pauly, M. (2011) 'Realtime Performance-Based Facial Animation', ACM Transactions on Graphics, vol. 30, no. 4.
Wiegers, K. E. (1996) Creating a Software Engineering Culture, Dorset House Publishing.
Wik, P. (2011) The Virtual Language Teacher: Models and applications for language learning using embodied conversational agents, Stockholm: KTH Royal Institute of Technology.
Wik, P. and Hjalmarsson, A. (2009) 'Embodied conversational agents in computer assisted language learning', Speech Communication, vol. 51, no. 10, pp. 1024-1037.
Wu, Y., Chang, E., Chang, K.-C. and Smith, J. (2004) 'Optimal multimodal fusion for multimedia data analysis', Proceedings of the 12th Annual ACM International Conference on Multimedia, New York: ACM, pp. 572-579.
Wu, D., Li, C. and Yao, D. (2009) 'Scale-Free Music of the Brain', PLoS ONE, vol. 4, no. 11, pp. 1-8.
Wu, D., Li, C., Yin, Y., Zhou, C. and Yao, D. (2010) 'Music Composition from the Brain Signal: Representing the Mental State by Music', Computational Intelligence and Neuroscience, vol. 2010, pp. 1-6.
Xia, L., Chen, C.-C. and Aggarwal, J. (2012) 'View invariant human action recognition using histograms of 3D joints', Proceedings of the Computer Vision and Pattern Recognition Workshops (CVPRW), Providence, Rhode Island, USA, pp. 20-27.
Yang, C., Peng, D. and Sun, S. (2006) 'Creating a Virtual Activity for the Intangible Cultural Heritage', Proceedings of the 16th International Conference on Artificial Reality and Telexistence - Workshops (ICAT '06), Hangzhou, China, Washington DC, USA: IEEE Computer Society, pp. 636-641.
Yim, D. (2004) 'Living Human Treasures and the Protection of Intangible Cultural Heritage', ICOM News, vol. 57, no. 4, pp. 10-12.
Zafeiriou, S. and Yin, L. (2012) '3D facial behaviour analysis and understanding', Image and Vision Computing, vol. 30, October, pp. 681-682.
Zen, H., Tokuda, K. and Black, A. W. (2009) 'Statistical parametric speech synthesis', Speech Communication, vol. 51, no. 11, pp. 1039-1064.

Zeng, Z., Pantic, M., Roisman, G. and Huang, T. (2007) 'A survey of affect recognition methods: Audio, visual, and spontaneous expressions', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 39-58.
Zhou, H. and Mudur, S. P. (2007) 'Technology and Digital Art: 3D scan-based animation techniques for Chinese opera facial expression documentation', Computers and Graphics, vol. 31, no. 6, pp. 788-799.

Further Readings

Jireghie, A., Banciu, V. and Biriş, R. T. (2013) 'Ethnographic And Medical Considerations On The "Calus" (Morris Dance)', Practice and Theory in Systems of Education, vol. 8, no. 1, pp. 87-90. Available at: http://www.eduscience.x3.hu/2310AngelaJireghie-VioricaBanciu-RodicaBiris.pdf [13 July 2013].
Salini, D. (2009) Histoire des musiques de Corse, Editions Dumane.

10. Appendixes

10.1 Appendix to the State of the Art

10.2 Experts’ and Users’ Groups

10.3 Questionnaires for knowledge domain definition

10.4 Questionnaire results: sub-use case analysis

10.5 Guidelines for interviewers – Example
