First Report on User Requirements Identification and Analysis

Total Pages: 16

File Type: PDF, Size: 1020 KB

Technical Report, August 2013. DOI: 10.13140/2.1.3418.6564. Available at ResearchGate: http://www.researchgate.net/publication/271530318
5 authors, including: Francesca Pozzi, Michela Ott, Francesca Dagnino and Alessandra Antonaci (Italian National Research Council). Available from: Francesca Pozzi. Retrieved on: 6 July 2015.

Project title: i-Treasures: Intangible Treasures – Capturing the Intangible Cultural Heritage and Learning the Rare Know-How of Living Human Treasures
Contract no: FP7-ICT-2011-9-600676
Instrument: Large Scale Integrated Project (IP)
Thematic priority: ICT for access to cultural resources
Start of project: 1 February 2013
Duration: 48 months
Project funded by the European Community under the 7th Framework Programme for Research and Technological Development.

D2.1 First Report on User Requirements Identification and Analysis
Project ref. number: ICT-600676
Deliverable number: D2.1
Deliverable version: Version 2 (previous version: Version 1)
Contractual date of delivery: 1 August 2013
Actual date of delivery: 6 August 2013
Deliverable filename: Del_2_1_FINAL2.doc
Nature of deliverable: R
Dissemination level: PU
Number of pages: 175
Workpackage: WP 2
Partner responsible: ITD-CNR
Main authors: Francesca Pozzi (ITD-CNR), Marilena Alivizatou (UCL), Michela Ott (ITD-CNR), Francesca Dagnino (ITD-CNR), Alessandra Antonaci (ITD-CNR)
Author(s): Martine Adda-Decker (CNRS), Marilena Alivizatou (UCL), Samer Al Kork (UPMC), Angélique Amelot (CNRS), Alessandra Antonaci (ITD-CNR), George Apostolidis (AUTH), Nicolas Audibert (CNRS), Vasilis Charisis (AUTH), Marius Cotescu (ACAPELA), Lise Crevier-Buchman (CNRS), Francesca Dagnino (ITD-CNR), Bruce Denby (UPMC), Olivier Deroo (ACAPELA), Kosmas Dimitropoulos (CERTH), Cécile Fougeron (CNRS), Vasso Gatziaki (UOM), Cédric Gendrot (CNRS), Alina Glushkova (UOM), Nikos Grammalidis (CERTH), Leontios Hadjileontiadis (AUTH), Anastasios Katos (UOM), Alexandros Kitsikidis (CERTH), Ioannis Kompatsiaris (CERTH), George Kourvoulis (UOM), Gwenaelle Lo Bue (CNRS), Athanasios Manitsaris (UOM), Sotiris Manitsaris (ARMINES/ENSMP), Dimitris Manousis (UOM), Spiros Nikolopoulos (CERTH), Michela Ott (ITD-CNR), Stavros Panas (AUTH), Xrysa Papadaniil (AUTH), Savvas Pavlidis (UOM), Claire Pillot-Loiseau (CNRS), Francesca Pozzi (ITD-CNR), Thierry Ravet (UMONS), George Sergiadis (AUTH), Mauro Tavella (ITD-CNR), Joëlle Tilmanne (UMONS), Filareti Tsalakanidou (CERTH), Viki Tsekouropoulou (UOM), Jacqueline Vaissière (CNRS), Leny Vinceslas (CNRS), Christina Volioti (UOM), Erdal Yilmaz (TT/Sobee Studios)
Editor: Francesca Pozzi (ITD-CNR)
EC Project Officer: Alina Senn

Abstract: The document analyzes and describes the 'intangible' artistic expressions chosen by the project as use cases and defines the basic requirements of the i-Treasures system that will be developed to support the documentation, preservation and teaching of these forms of intangible heritage.
Keywords: Intangible Cultural Heritage (ICH), preservation, education, technology.

Signatures:
Written by: Francesca Pozzi, WP2 Leader (ITD-CNR), 16/07/2013
Verified by: Francesca Pozzi, WP2 Leader (ITD-CNR), 02/08/2013
Approved by: Nikos Grammalidis, Coordinator (CERTH), 05/08/2013; Yiannis (Ioannis) Kompatsiaris, Quality Manager (CERTH), 05/08/2013

Table of Contents
1. Executive summary ... 8
2. Introduction ... 9
   2.1 Purpose and structure of the document ... 9
   2.2 Brief introduction to the i-Treasures project ... 9
   2.3 ICHs considered in the project: an overview ... 10
   2.4 Focus on Work Package 2 ... 12
3. Overall methodology for the definition of the i-Treasures Requirements ... 15
4. State of the art review ... 18
   4.1 Methodology and Objectives ... 18
   4.2 Safeguarding Intangible Heritage ... 19
       4.2.1 The Approach and Projects of UNESCO ... 19
       4.2.2 Community-Focused Approaches to Safeguarding Intangible Heritage ... 21
   4.3 Modern Technologies in the Transmission and Documentation of Intangible Heritage ... 23
       4.3.1 Facial Expression Analysis ... 24
       4.3.2 Vocal Tract Sensing and Modeling ... 30
       4.3.3 Motion Capture – Body and Gesture Recognition ... 33
       4.3.4 Encephalogram Analysis ... 40
       4.3.5 Semantic Multimedia Analysis ... 46
       4.3.6 3D Visualization of Intangible Heritage ... 53
       4.3.7 Text to Song ... 57
       (each subsection comprises: Introduction; Key Projects and Applications; Possible Use in Intangible Heritage Preservation and Transmission)
   4.4 Results and Way Forward: Emerging requirements ... 62
5. Knowledge domain definition ... 64
   5.1 Objectives and rationale ... 64
   5.2 Experts' and Users' Groups setting up ...
Recommended publications
  • Singing Voice Synthesis Using HMM-Based TTS and MusicXML
    Journal of The Korea Society of Computer and Information (www.ksci.re.kr), Vol. 20, No. 5, May 2015. http://dx.doi.org/10.9708/jksci.2015.20.5.053
    Singing Voice Synthesis Using HMM-Based TTS and MusicXML. Najeeb Ullah Khan, Jung-Chul Lee.
    Abstract: Singing voice synthesis is the generation of a song by computer from its lyrics and musical score. HMM-based synthesizers, widely used for text-to-speech (TTS) conversion, have recently been applied to singing voice synthesis as well; however, existing implementations are difficult to build because they require collecting and training on a large singing-voice database. In addition, existing commercial singing synthesis systems use a piano-roll score representation that is unfamiliar to most users, so a user interface based on easy-to-read standard notation is needed to make learning songs more convenient. To address these problems, this paper proposes generating singing voice by reusing the HMM models of an existing read-speech synthesizer and varying the HMM parameter values through pitch and duration controls suited to singing. The paper further proposes implementing a singing synthesis system with a MusicXML-based score editor as the front end, for entering notes and lyrics, and an HMM-based TTS synthesizer as the back end. Songs synthesized with the proposed method were evaluated, and the results confirm its applicability.
    Keywords: text-to-speech conversion, hidden Markov model, singing voice synthesis, score editor.
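The abstract above describes a MusicXML-based score editor feeding pitch and duration targets to an HMM synthesizer back end. As an illustration only (the helper names and the tempo/divisions defaults are assumptions, not taken from the paper), a minimal sketch of extracting such targets from a MusicXML fragment with Python's standard library:

```python
import math
import xml.etree.ElementTree as ET

# Semitone offsets of the diatonic steps from C.
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def pitch_to_hz(step, octave, alter=0):
    """Convert a MusicXML <pitch> (step/octave/alter) to frequency in Hz (A4 = 440)."""
    midi = 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter
    return 440.0 * 2 ** ((midi - 69) / 12)

def score_targets(musicxml, tempo_bpm=120, divisions=4):
    """Extract (frequency_hz, duration_s) targets from a MusicXML fragment."""
    root = ET.fromstring(musicxml)
    seconds_per_division = 60.0 / tempo_bpm / divisions
    targets = []
    for note in root.iter("note"):
        dur = int(note.findtext("duration")) * seconds_per_division
        pitch = note.find("pitch")
        if pitch is None:  # a rest: no pitch target
            targets.append((0.0, dur))
            continue
        step = pitch.findtext("step")
        octave = int(pitch.findtext("octave"))
        alter = int(pitch.findtext("alter") or 0)
        targets.append((pitch_to_hz(step, octave, alter), dur))
    return targets

fragment = """
<measure>
  <note><pitch><step>A</step><octave>4</octave></pitch><duration>4</duration></note>
  <note><pitch><step>C</step><alter>1</alter><octave>5</octave></pitch><duration>8</duration></note>
</measure>
"""
targets = score_targets(fragment)  # [(A4, 0.5 s), (C#5, 1.0 s)] at 120 bpm
```

A real front end would also read the `<divisions>` and tempo directions from the score itself rather than taking them as parameters.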
  • The Race of Sound: Listening, Timbre, and Vocality in African American Music
    UCLA Recent Work. Title: The Race of Sound: Listening, Timbre, and Vocality in African American Music. Author: Nina Sun Eidsheim. Publication date: 2018-01-11. Permalink: https://escholarship.org/uc/item/9sn4k8dr. ISBN 9780822372646. License: https://creativecommons.org/licenses/by-nc-nd/4.0/. Peer reviewed. eScholarship.org: powered by the California Digital Library, University of California.
    The Race of Sound: Listening, Timbre, and Vocality in African American Music. Nina Sun Eidsheim. Duke University Press, Durham and London, 2019. Series: Refiguring American Music, edited by Ronald Radano, Josh Kun, and Nina Sun Eidsheim; Charles McGovern, contributing editor. © 2019 Nina Sun Eidsheim. All rights reserved. Printed in the United States of America on acid-free paper. Designed by Courtney Leigh Baker; typeset in Garamond Premier Pro by Copperline Book Services.
    Library of Congress Cataloging-in-Publication Data: LCCN 2018022952 (print) | LCCN 2018035119 (ebook) | ISBN 9780822372646 (ebook) | ISBN 9780822368564 (hardcover: alk. paper) | ISBN 9780822368687 (pbk.: alk. paper). Subjects (LCSH): African Americans—Music—Social aspects; Music and race—United States; Voice culture—Social aspects—United States; Tone color (Music)—Social aspects—United States; Music—Social aspects—United States; Singing—Social aspects—United States; Anderson, Marian, 1897–1993; Holiday, Billie, 1915–1959; Scott, Jimmy, 1925–2014; Vocaloid (Computer file). Classification: LCC ML3917.U6 (ebook) | LCC ML3917.U6 E35 2018 (print) | DDC 781.2/308996073. LC record available at https://lccn.loc.gov/2018022952. Cover art: Nick Cave, Soundsuit, 2017.
  • Masterarbeit
    Master's thesis: Creation of a speech database and of a program for its analysis, in the context of speech synthesis with spectral models. Submitted for the academic degree Master of Science to the Department of Mathematics, Natural Sciences and Computer Science of the Technische Hochschule Mittelhessen by Tobias Platen, August 2014. Referee: Prof. Dr. Erdmuthe Meyer zu Bexten; co-referee: Prof. Dr. Keywan Sohrabi.
    Statutory declaration: the author affirms that the thesis was produced independently, using only the cited literature and aids, and that it has not been submitted in the same or a similar form to any other examination authority, nor published.
    Contents: 1 Introduction (1.1 Motivation; 1.2 Goals; 1.3 Historical speech synthesizers: 1.3.1 The speaking machine, 1.3.2 The vocoder and the Voder, 1.3.3 Linear predictive coding; 1.4 Modern speech synthesis algorithms: 1.4.1 Formant synthesis, 1.4.2 Concatenative synthesis). 2 Spectral models for speech synthesis (2.1 Convolution, Fourier transform and vocoders; 2.2 Phase vocoder; 2.3 Spectral model synthesis: 2.3.1 Harmonic trajectories, 2.3.2 Shape invariance ...)
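The thesis outline above builds on Fourier analysis and the phase vocoder. As a toy illustration of the analysis step only (a naive O(N²) DFT with invented function names; a real implementation would use an FFT and overlapping windowed frames):

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform of one frame (teaching sketch, O(N^2))."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def peak_bin(frame):
    """Index of the strongest bin in the lower half-spectrum (the dominant partial)."""
    spectrum = dft(frame)
    mags = [abs(c) for c in spectrum[: len(frame) // 2]]
    return max(range(len(mags)), key=mags.__getitem__)

# A 64-sample frame of a sinusoid completing exactly 5 cycles:
# all of its energy lands in DFT bin 5.
frame = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
```

A phase vocoder would additionally track the phase of each bin across successive frames to refine the frequency estimate and allow time stretching.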
  • Expression Control in Singing Voice Synthesis
    Expression Control in Singing Voice Synthesis: Features, Approaches, Evaluation, and Challenges. Martí Umbert, Jordi Bonada, Masataka Goto, Tomoyasu Nakano, and Johan Sundberg. IEEE Signal Processing Magazine, November 2015. Digital Object Identifier 10.1109/MSP.2015.2424572. Date of publication: 13 October 2015.
    In the context of singing voice synthesis, expression control manipulates a set of voice features related to a particular emotion, style, or singer. Also known as performance modeling, it has been approached from different perspectives and for different purposes, and different projects have shown a wide extent of applicability. The aim of this article is to provide an overview of approaches to expression control in singing voice synthesis. We introduce some musical applications that use singing voice synthesis techniques to justify the need for an accurate control of expression. Then, expression is defined and related to speech and instrument performance modeling. Next, we present the commonly studied set of voice parameters that can change perceptual aspects of synthesized voices ...
    ... voices that are difficult to produce naturally (e.g., castrati). More examples can be found with pedagogical purposes or as tools to identify perceptually relevant voice properties [3]. These applications of the so-called music information research field may have a great impact on the way we interact with music [4]. Examples of research projects using singing voice synthesis technologies are listed in Table 1.
    Table 1. Research projects using singing voice synthesis technologies:
      CANTOR: http://www.virsyn.de
      CANTOR DIGITALIS: https://cantordigitalis.limsi.fr/
      CHANTER: https://chanter.limsi.fr
  • A Unit Selection Text-To-Speech-And-Singing Synthesis Framework from Neutral Speech: Proof of Concept
    Adding expressiveness to unit selection speech synthesis and to numerical voice production. Doctoral thesis, Marc Freixes Guerreiro, Universitat Ramon Llull. http://hdl.handle.net/10803/672066
    Notice (translated from the Catalan and Spanish repository advisories): access to the contents of this doctoral thesis and their use must respect the rights of the author. They may be used for personal consultation or study, and in research and teaching activities and materials, under the terms established in Art. 32 of the consolidated text of the Spanish Intellectual Property Law (RDL 1/1996). Any other use requires the prior, express authorization of the author. In any use of the contents, the author's full name and the title of the thesis must be clearly indicated. Reproduction or other forms of for-profit exploitation are not authorized, nor is public communication from a site other than the TDX service, nor the presentation of its content in a window or frame external to TDX (framing). These restrictions apply to the contents of the thesis as well as to its abstracts and indexes.
  • Signal Processing
    IEEE Signal Processing Magazine, Volume 32, Number 6, November 2015.
  • Synthèse Du Chant
    Master 2 ATIAM internship report: Singing synthesis. Luc Ardaillon, 01/03/2013 – 31/07/2013. Host laboratory: IRCAM, Sound Analysis-Synthesis team. Supervisor: Axel Roebel.
    Summary: the internship presented here evaluates and adapts existing technologies for singing synthesis, in order to identify the problems to be solved and to propose suitable solutions. To this end, a synthesis system based on unit concatenation and transformation was developed. From a score and a text, such a system must produce an audio file of singing voice whose rendering is as natural and expressive as possible. A previously recorded and segmented database is used to concatenate the segments required for synthesis, which are determined by the phonetization of the input text. Various treatments are then applied to smooth the junctions between these segments and make them imperceptible. Next, to make the result match the score, the superVP software is used for the analysis, transformation and resynthesis of the sounds, applying recent high-quality algorithms, notably for transposition and time stretching. Finally, some directions for adding expressivity and achieving a more natural rendering were explored, with the implementation of rules controlling the various parameters.
    The internship project presented in this report concerns the evaluation and adaptation of existing technologies for the purpose of singing voice synthesis, identifying the main problems to be solved, for which possible solutions have been suggested.
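The summary above describes concatenating database segments and smoothing the junctions until they become imperceptible. A minimal sketch of one such smoothing step, a linear crossfade at the junction of two segments (the function name and weighting scheme are illustrative assumptions, not the report's actual processing, which relies on superVP):

```python
def crossfade_join(a, b, overlap):
    """Concatenate two sample sequences, linearly crossfading `overlap`
    samples at the junction so the seam is smoothed rather than abrupt."""
    assert overlap <= min(len(a), len(b))
    faded = [
        a[len(a) - overlap + i] * (1 - (i + 1) / (overlap + 1))
        + b[i] * ((i + 1) / (overlap + 1))
        for i in range(overlap)
    ]
    # Result length is len(a) + len(b) - overlap.
    return a[: len(a) - overlap] + faded + b[overlap:]
```

Real systems crossfade at spectrally stable points (e.g. mid-vowel) and also interpolate spectral envelopes, not just amplitudes.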
  • Review on Expression Control on Singing Voice Synthesis.Docx
    Expression Control in Singing Voice Synthesis Features, Approaches, Evaluation, and Challenges Martí Umbert, Jordi Bonada, Masataka Goto, Tomoyasu Nakano, and Johan Sundberg M. Umbert and J. Bonada are with the Music Technology Group (MTG) of the Department of Information and Communication Technologies, Universitat Pompeu Fabra, 08018 Barcelona, Spain, e-mail: {marti.umbert, jordi.bonada}@upf.edu M. Goto and T. Nakano are with the Media Interaction Group, Information Technology Research Institute (ITRI) at the National Institute of Advanced Industrial Science and Technology (AIST), Japan, e-mail: {m.goto, t.nakano}@aist.go.jp J. Sundberg is with the Voice Research Group of the Department of Speech, Music and Hearing (TMH) at the Royal Institute of Technology (KTH), Stockholm, Sweden, e-mail: [email protected] In the context of singing voice synthesis, expression control manipulates a set of voice features related to a particular emotion, style, or singer. Also known as performance modeling, it has been approached from different perspectives and for different purposes, and different projects have shown a wide extent of applicability. The aim of this article is to provide an overview of approaches to expression control in singing voice synthesis. Section I introduces some musical applications that use singing voice synthesis techniques to justify the need for an accurate control of expression. Then, expression is defined and related to speech and instrument performance modeling. Next, Section II presents the commonly studied set of voice parameters that can change perceptual aspects of synthesized voices. Section III provides, as the main topic of this review, an up-to-date classification, comparison, and description of a selection of approaches to expression control.
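Among the voice parameters studied for expression control in the review above, vibrato (periodic f0 modulation characterized by a rate and a depth) is a standard example. A small sketch of sampling an f0 contour with sinusoidal vibrato; the parameter defaults are illustrative, not values taken from the article:

```python
import math

def f0_contour(base_hz, seconds, vibrato_rate_hz=5.5, vibrato_depth_cents=50, fps=100):
    """Sample an f0 contour (one value per frame) with sinusoidal vibrato.

    Depth is expressed in cents, so the modulation is symmetric on a
    log-frequency scale, as pitch perception is.
    """
    contour = []
    for frame in range(int(seconds * fps)):
        t = frame / fps
        cents = vibrato_depth_cents * math.sin(2 * math.pi * vibrato_rate_hz * t)
        contour.append(base_hz * 2 ** (cents / 1200))
    return contour
```

An expression-control module would superimpose such a contour on the note-level pitch targets, typically with an onset delay and depth that evolve over the note.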
  • Challenges and Perspectives on Real-Time Singing Voice Synthesis (Síntese de Voz Cantada em Tempo Real: Desafios e Perspectivas)
    Revista de Informática Teórica e Aplicada (RITA), ISSN 2175-2745, Vol. 27, No. 04 (2020), 118–126. Research article.
    Challenges and Perspectives on Real-time Singing Voice Synthesis. Edward David Moreno Ordóñez, Leonardo Araújo Zoehler Brum. Universidade Federal de Sergipe, São Cristóvão, Sergipe, Brazil. Corresponding author: [email protected]. DOI: http://dx.doi.org/10.22456/2175-2745.107292. Received: 30/08/2020; accepted: 29/11/2020. CC BY-NC-ND 4.0: this work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
    Abstract: This paper describes the state of the art of real-time singing voice synthesis and presents its concept, applications and technical aspects. A technological mapping and a literature review are made in order to indicate the latest developments in this area. We make a brief comparative analysis among the selected works. Finally, we discuss challenges and future research problems.
    Keywords: real-time singing voice synthesis, sound synthesis, TTS, MIDI, computer music.
    1. Introduction: From a computational point of view, the aim of singing voice synthesis is ... real-time singing voice synthesis embedded systems, through the description of its concept, theoretical premises, main techniques used, latest developments and challenges for future research.
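A real-time system driven by MIDI, as surveyed above, must at minimum map incoming note numbers to target frequencies and pair each note with a lyric syllable. A minimal sketch under those assumptions (the function names and the repeat-last-syllable melisma rule are invented for illustration, not taken from the paper):

```python
def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def assign_syllables(note_events, syllables):
    """Pair incoming note-on events with lyric syllables, one syllable per note,
    repeating the last syllable when notes outnumber syllables (melisma)."""
    out = []
    for i, note in enumerate(note_events):
        syl = syllables[min(i, len(syllables) - 1)]
        out.append((syl, midi_to_hz(note)))
    return out

# Three note-ons (A4, C5, D5) sung on two syllables.
events = assign_syllables([69, 72, 74], ["la", "li"])
```

The hard real-time constraint is elsewhere: each (syllable, frequency) pair must be rendered with low enough latency that the synthesizer can follow a live controller.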
  • Virtual Voices on Hands: Prominent Applications on the Synthesis and Control of the Singing Voice
    VIRTUAL VOICES ON HANDS: PROMINENT APPLICATIONS ON THE SYNTHESIS AND CONTROL OF THE SINGING VOICE. Anastasia Georgaki, Lecturer, Music Department, School of Philosophy, National and Kapodistrian University of Athens, Greece. [email protected]
    ABSTRACT: The on-going research of the last thirty years on the synthesis of the singing voice highlights different aspects of this field, which involves the interdisciplinary areas of musical acoustics, signal processing, linguistics, artificial intelligence, music perception and cognition, music information retrieval and performance systems. Recent work shows that the musical and natural quality of singing voice synthesis has evolved enough for high-fidelity commercial applications to be realistically envisioned. In the first part of this paper we briefly highlight the different aspects of the on-going research (synthesis models, performance by rules, text-to-speech synthesis, controllers) through a taxonomy of these approaches. In the second part we emphasize the utility and the different applications of this research area (including the recent commercial ones) as a tool for music creativity, by presenting audio ...
    Since our last research [14], where we presented the most important projects on the synthesis of the singing voice, multiple research programs have been arising all over the world through different optical views, languages and methodologies, focusing on the mystery of the synthetic singing voice [2], [15], [16], [19], [21], [22], [33].
    2. MODELS AND RESEARCH ASPECTS: Among the various projects which use a multitude of techniques and rules concerning the analysis and synthesis of the singing voice, we have selected, as the vehicle of our discussion, the research projects which tend to have a complete point of view about the synthesis of the singing voice (in order to speak about 'models' which fulfil the expectations of a synthesizer not only from the acoustical point of view but also from the phonetic one).
  • Voice Source Modelling Techniques for Statistical Parametric Speech Synthesis
    Aalto University, Department of Signal Processing and Acoustics. Aalto DD 40/2015. Tuomo Raitio: Voice source modelling techniques for statistical parametric speech synthesis. ISBN 978-952-60-6136-8 (printed); ISBN 978-952-60-6137-5 (pdf); ISSN-L 1799-4934; ISSN 1799-4934 (printed); ISSN 1799-4942 (pdf).
    Speech is the most natural way through which humans communicate, and today speech synthesis is utilised in various applications. However, the performance of modern speech synthesisers falls far short of the abilities of human speakers: synthesising intelligible and natural-sounding speech with the desired contextual and speaker characteristics and an appropriate speaking style is extremely difficult. This thesis aims to improve both the naturalness and expressivity of speech synthesis by proposing new methods for voice source modelling in statistical parametric speech synthesis. With accurate estimation and appropriate modelling of the voice source signal, which is known to be the origin of several essential acoustic cues in spoken communication, various expressive voices are created with a high degree of naturalness and intelligibility. Evaluations in various listening contexts show that speech created with the proposed methods is assessed as more suitable than that generated with current techniques, thus providing potentially large benefits in many speech synthesis applications.
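The thesis above concerns modelling the voice source, i.e. the glottal flow signal that excites the vocal tract. As a rough illustration of what such a source signal looks like, a sketch of a Rosenberg-style one-period glottal pulse (cosine rise, cosine fall, then a closed phase); the function name and phase fractions are illustrative defaults, not the thesis's model:

```python
import math

def rosenberg_pulse(n, open_frac=0.4, close_frac=0.16):
    """One period (n samples) of a Rosenberg-style glottal flow pulse:
    cosine rise over the opening phase, cosine fall over the closing phase,
    zero during the closed phase. Fractions are of the period length."""
    n_open = int(open_frac * n)
    n_close = int(close_frac * n)
    pulse = []
    for i in range(n):
        if i < n_open:                      # opening phase: 0 -> 1
            pulse.append(0.5 * (1 - math.cos(math.pi * i / n_open)))
        elif i < n_open + n_close:          # closing phase: 1 -> 0
            pulse.append(math.cos(math.pi * (i - n_open) / (2 * n_close)))
        else:                               # closed phase
            pulse.append(0.0)
    return pulse
```

Statistical parametric synthesis, by contrast, estimates and models the source from recorded speech rather than using a fixed analytic pulse; this sketch only shows the kind of waveform being modelled.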
  • A Unit Selection Text-to-Speech-and-Singing Synthesis Framework from Neutral Speech: Proof of Concept. Marc Freixes, Francesc Alías and Joan Claudi Socoró
    Freixes et al., EURASIP Journal on Audio, Speech, and Music Processing (2019) 2019:22. https://doi.org/10.1186/s13636-019-0163-y Research, Open Access.
    A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept. Marc Freixes*, Francesc Alías and Joan Claudi Socoró.
    Abstract: Text-to-speech (TTS) synthesis systems have been widely used in general-purpose applications based on the generation of speech. Nonetheless, there are some domains, such as storytelling or voice output aid devices, which may also require singing. To enable a corpus-based TTS system to sing, a supplementary singing database should be recorded. This solution, however, might be too costly for occasional singing needs, or even unfeasible if the original speaker is unavailable or unable to sing properly. This work introduces a unit selection-based text-to-speech-and-singing (US-TTS&S) synthesis framework, which integrates speech-to-singing (STS) conversion to enable the generation of both speech and singing from an input text and a score, respectively, using the same neutral speech corpus. The viability of the proposal is evaluated considering three vocal ranges and two tempos on a proof-of-concept implementation using a 2.6-h Spanish neutral speech corpus. The experiments show that challenging STS transformation factors are required to sing beyond the corpus vocal range and/or with notes longer than 150 ms. While score-driven US configurations allow the reduction of pitch-scale factors, time-scale factors are not reduced due to the short length of the spoken vowels. Moreover, in the MUSHRA test, text-driven and score-driven US configurations obtain similar naturalness rates of around 40 for all the analysed scenarios.
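The abstract above quantifies speech-to-singing conversion in terms of the pitch-scale and time-scale transformation factors applied to spoken vowels. A minimal sketch of computing those factors for a spoken vowel and a target note; the "challenging" thresholds below are invented placeholders for illustration, not values from the paper:

```python
def sts_factors(spoken_f0_hz, spoken_dur_s, note_f0_hz, note_dur_s):
    """Pitch-scale and time-scale factors a speech-to-singing transform
    must apply to turn a spoken vowel into a target note."""
    return note_f0_hz / spoken_f0_hz, note_dur_s / spoken_dur_s

def is_challenging(pitch_scale, time_scale, max_pitch=2.0, max_time=4.0):
    """Flag transformations likely to degrade quality: pitch shifted by more
    than max_pitch in either direction, or stretched beyond max_time.
    (Thresholds are illustrative placeholders.)"""
    return pitch_scale > max_pitch or pitch_scale < 1 / max_pitch or time_scale > max_time

# A 100 ms spoken vowel at 120 Hz sung as a 500 ms note at 240 Hz:
# one octave up and a 5x stretch.
ps, ts = sts_factors(120.0, 0.1, 240.0, 0.5)
```

This mirrors the abstract's observation: short spoken vowels force large time-scale factors even when score-driven unit selection keeps the pitch-scale factor small.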