MILLA – Multimodal Interactive Language Learning Agent


João Paulo Cabral 1, Nick Campbell 1, Shree Ganesh 2, Emer Gilmartin 1, Fasih Haider 1, Eamonn Kenny 1, Mina Kheirkhah 3, Andrew Murphy 1, Neasa Ní Chiaráin 1, Thomas Pellegrini 4, Odei Rey Orozko 5

1 Trinity College Dublin, Ireland; 2 GCDH, University of Goettingen, Germany; 3 Institute for Advanced Studies in Basic Sciences, Zanjan, Iran; 4 Université de Toulouse; IRIT, France; 5 Universidad del País Vasco, Bilbao, Spain

1 Background

Learning a new language involves the acquisition and integration of a range of skills. A human tutor aids learners by (i) providing tasks suited to the learner's needs, (ii) monitoring progress and adapting task content and delivery style, and (iii) providing a source of speaking practice and motivation. With the advent of audiovisual technology and the communicative paradigm in language pedagogy, focus has shifted from written grammar and translation to developing communicative competence in listening and spoken production. The Common European Framework of Reference for Languages (CEFR) recently added a more integrative fifth skill, spoken interaction, to the traditional four skills of reading, listening, writing, and speaking (Little, 2006). While second languages have always been learned conversationally, through negotiation of meaning between speakers of different languages sharing living or working environments, such methods did not figure in formal (funded) settings. However, with increased mobility and globalisation, many learners now need a language as a practical tool rather than simply as an academic achievement (Gilmartin, 2008). Developments in Computer Assisted Language Learning (CALL) have resulted in free and commercial language learning material for autonomous study. Much of this material transfers well-established text and audiovisual exercises to the computer screen. These resources greatly help develop discrete skills, but the challenge of providing tuition and practice in the 'fifth skill', spoken interaction, remains. MILLA, developed at the 2014 eNTERFACE workshop in Bilbao, is a multimodal spoken dialogue system which combines custom modules with existing web resources in a balanced curriculum and, by integrating spoken dialogue, models some of the advantages of a human tutor.

2 MILLA System Components

Tuition Manager: MILLA's spoken dialogue Tuition Manager (Figure 1) consults a two-level curriculum of language learning tasks, a learner record, and a learner state module in order to greet and enrol learners, offer language learning submodules, provide feedback, and monitor user state with Kinect sensors. All of the Tuition Manager's interaction with the user can be performed using speech, through a CereProc Text-to-Speech (TTS) voice and CereProc's Python SDK (CereProc, 2014), with understanding via CMU's Sphinx4 ASR (Walker et al., 2004) through custom Python bindings using W3C-compliant Java Speech Grammar Format (JSGF) grammars.

[Figure 1: MILLA overview]

Tasks include spoken dialogue practice with two chatbots; first language (L1) focused and general pronunciation training; and grammar and vocabulary exercises. Several automatic speech recognition (ASR) engines (HTK, Google Speech) and text-to-speech (TTS) voices (Mac and Windows system voices, Google Speech) are used in the modules to meet the demands of particular tasks and to provide a cast of voice characters offering a variety of speech models to the learner. Microsoft's Kinect SDK ('Kinect for Windows SDK', 2014) is used for gesture recognition and as a platform for affect recognition. The Tuition Manager and all interfaces are written in Python 2.7, with additional C#, JavaScript, Java, and Bash code in the Kinect, chat, Sphinx4, and pronunciation elements. For rapid prototyping, the dialogue modules were first written in VoiceXML and then ported to Python modules.
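To illustrate the kind of command grammar the Tuition Manager can pass to the recogniser, the sketch below builds a small JSGF menu grammar. Only the JSGF syntax itself is standard; the grammar name, rules, and vocabulary are invented for illustration, since the paper does not publish MILLA's actual grammars or its custom Sphinx4 Python bindings.

# A minimal sketch (not MILLA's actual grammar) of a W3C JSGF menu
# grammar of the kind the Tuition Manager could hand to Sphinx4.

JSGF_MENU = u"""#JSGF V1.0;

grammar milla.menu;

public <command> = <module> | <control>;

<module>  = (pronunciation | chat | grammar | vocabulary) [please];
<control> = stop | repeat | help;
"""

def write_grammar(path="menu.gram"):
    # Persist the grammar so an external recogniser process can load it.
    with open(path, "w") as f:
        f.write(JSGF_MENU)
    return path

if __name__ == "__main__":
    print("Wrote JSGF grammar to %s" % write_grammar())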
Pronunciation Tuition: MILLA incorporates two pronunciation modules, both based on comparison of learner production with model production using the Goodness of Pronunciation (GOP) algorithm (Witt & Young, 2000). GOP scoring involves two phases: (1) a free phone loop recognition phase, which determines the most likely phone sequence given the input speech without giving the ASR any information about the target sentence; and (2) a forced alignment phase, which provides the ASR with the orthographic transcription and force-aligns the speech signal with the expected phone sequence. Comparing the log-likelihoods of the forced alignment and free recognition phases produces a GOP score.

The first module is a focused pronunciation tutor using HTK ASR with the five-state, 32-Gaussian-mixture monophone acoustic models provided with the Penn Aligner toolkit (Young, n.d.; Yuan & Liberman, 2008), running on the system's local machine. In this module, phone-specific threshold scores were set by artificially inserting errors into the pronunciation lexicon and running the algorithm on native recordings, as in (Kanters, Cucchiarini, & Strik, 2009). After preliminary tests, we constrained the free phone loop recogniser for more robust behaviour, using phone confusions common in specific L1s to define constrained phone grammars. A database of common errors in several L1s, with test utterances, was built into the curriculum.
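The GOP computation itself reduces to a per-phone comparison of the two log-likelihoods, normalised by the phone's duration. The sketch below is a simplified illustration of that scoring and of the phone-specific threshold test, not the HTK pipeline the module actually runs; the threshold values are placeholders, since the real ones were calibrated with the artificial lexicon errors described above.

# Simplified GOP scoring sketch (after Witt & Young, 2000). Inputs are
# per-phone log-likelihoods from the two ASR passes; in MILLA these come
# from HTK, here they are plain numbers.

# Placeholder thresholds: the real module calibrates one per phone.
THRESHOLDS = {"ae": 2.5, "th": 3.0, "r": 2.0}
DEFAULT_THRESHOLD = 2.5

def gop_score(forced_loglik, free_loglik, n_frames):
    # GOP = |forced-alignment log-likelihood - free phone loop
    # log-likelihood|, normalised by the phone's duration in frames.
    return abs(forced_loglik - free_loglik) / float(n_frames)

def judge_phone(phone, forced_loglik, free_loglik, n_frames):
    # Flag the phone as mispronounced if its GOP exceeds the threshold.
    score = gop_score(forced_loglik, free_loglik, n_frames)
    return score, score > THRESHOLDS.get(phone, DEFAULT_THRESHOLD)

if __name__ == "__main__":
    # e.g. a target phone "th" realised over 12 frames
    score, error = judge_phone("th", -310.2, -278.9, 12)
    print("GOP = %.2f, mispronounced: %s" % (score, error))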
The second module, MySpeech, is a phrase-level trainer hosted on University College Dublin's cluster and accessed by the system over the Internet (Cabral et al., 2012). It tests pronunciation at several difficulty levels, as described in (Kane & Carson-Berndsen, 2011). Difficulty levels are introduced by incorporating Broad Phonetic Groups (BPGs) to cluster similar phones; a BPG consists of phones that share similar articulatory feature information, for example plosives or fricatives. There are three difficulty levels in the MySpeech system: easy, medium, and hard, with the easiest level including a greater number of BPGs than the harder levels. The MySpeech web interface consists of several numbered panels in which users select sentences and practise their pronunciation by listening to the selected sentence spoken by a native speaker and recording their own version of the same sentence. Finally, the results panel highlights the detected mispronunciation errors of a submitted utterance using darker colours.
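One plausible reading of how BPGs ease a level is that a learner phone counts as acceptable if it falls in the same broad group as the target. The sketch below illustrates that idea; the grouping shown is a conventional articulatory one assumed for illustration, and MySpeech's actual group definitions are those of Kane & Carson-Berndsen (2011).

# Sketch of Broad Phonetic Group (BPG) matching under an assumed,
# conventional articulatory grouping; MySpeech's real groups may differ.

BPG = {
    "p": "plosive", "b": "plosive", "t": "plosive", "d": "plosive",
    "k": "plosive", "g": "plosive",
    "f": "fricative", "v": "fricative", "s": "fricative", "z": "fricative",
    "m": "nasal", "n": "nasal", "ng": "nasal",
}

def phone_accepted(target, recognised, easy=True):
    # At an easy level any phone from the target's BPG is accepted;
    # at a harder level the exact phone is required.
    if easy:
        g1, g2 = BPG.get(target), BPG.get(recognised)
        return g1 is not None and g1 == g2
    return target == recognised

if __name__ == "__main__":
    print(phone_accepted("p", "b"))              # True: same BPG
    print(phone_accepted("p", "b", easy=False))  # False: exact match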
Spoken Interaction Tuition (Chat): To provide spoken interaction practice, MILLA sends the user to Michael (Level 1) or Susan (Level 2), two chatbots created using the Pandorabots web-based chatbot hosting service (Wallace, 2003). The bots were first implemented in text-to-text form in AIML (Artificial Intelligence Markup Language); TTS and ASR were then added through the Web Speech API, conforming to W3C standards (W3C, 2014). Based on consultation with language teachers and learners, the system allows users to speak to the bot or to type chat responses. A chat log was also implemented in the interface, allowing the user to read back or replay previous interactions.

Grammar, Vocabulary and External Resources: MILLA's curriculum includes a number of graded activities from OUP's English File and the British Council's Learn English websites. Wherever possible, the system scrapes any scores returned for exercises and incorporates them into the learner's record; in other cases, the progression and scoring system requires a minimum time to be spent on an exercise before the user progresses to the next one. There are also custom morphology and syntax exercises, created using Voxeo Prophecy, still to be ported to MILLA.

User State and Gesture Recognition: MILLA includes a learner state module intended eventually to infer learner boredom or involvement. As a first pass, gestures indicating various commands were designed and incorporated into the system using Microsoft's Kinect SDK. The current implementation comprises four gestures (Stop, I don't know, Swipe Left/Right), designed by tracking the skeletal movements involved and extracting joint coordinates on the x, y, and z planes to train the recognition process. Python's socket programming modules are used to communicate between the Windows machine running the Kinect and the Mac laptop hosting MILLA, as sketched below.
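The Kinect-to-MILLA link is a plain TCP socket. The sketch below shows the shape of such a bridge on the MILLA (server) side, using only Python's standard socket module; the port number, gesture labels, and message format are assumptions for illustration, as the paper does not specify the protocol.

# Sketch of the gesture bridge between the Windows/Kinect machine (a C#
# client sending gesture labels) and the Mac running MILLA. Port number
# and wire format are invented for illustration.

import socket

HOST, PORT = "0.0.0.0", 9000
GESTURES = {"STOP", "DONT_KNOW", "SWIPE_LEFT", "SWIPE_RIGHT"}

def serve_gestures():
    # Listen for newline-free gesture labels and hand each recognised
    # one to the dialogue manager (here, just print it).
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((HOST, PORT))
    server.listen(1)
    conn, _ = server.accept()
    try:
        while True:
            data = conn.recv(64)
            if not data:
                break
            gesture = data.decode("ascii").strip()
            if gesture in GESTURES:
                print("Tuition Manager received gesture: " + gesture)
    finally:
        conn.close()
        server.close()

if __name__ == "__main__":
    serve_gestures()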
3 Future work

MILLA is an ongoing project. In particular, work is in progress on a graphical user interface and avatar to provide a more immersive version, and several new modules are planned. User trials are planned for the academic year 2014–15 in several centres providing language training to immigrants in Ireland.

Acknowledgments

João Paulo Cabral and Eamonn Kenny are supported by Science Foundation Ireland (Grant 12/CE/I2267) as part of CNGL (www.cngl.ie) at Trinity College Dublin. Shree Ganesh is supported by GCDH, University of Goettingen. Emer Gilmartin is supported by the Science Foundation Ireland Fastnet Project, Grant 09/IN.1/I2631. Neasa Ní Chiaráin and Andrew Murphy are supported by the ABAIR Project, funded by the Irish Government's Department of Arts, Heritage and the Gaeltacht.

References

Cabral, J. P., Kane, M., Ahmed, Z., Abou-Zleikha, M., Székely, E., Zahra, A., … Schlögl, S. (2012). Rapidly testing the interaction model of a pronunciation training system via Wizard-of-Oz. In LREC (pp. 4136–4142).

CereVoice Engine Text-to-Speech SDK | CereProc Text-to-Speech. (2014). Retrieved 7 July 2014, from https://www.cereproc.com/en/products/sdk

W3C. (2014). Web Speech API Specification. Retrieved 7 July 2014, from https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html

Walker, W., Lamere, P., Kwok, P., Raj, B., Singh, R., Gouvea, E., … Woelfel, J. (2004). Sphinx-4: A flexible open source framework for speech recognition.

Wallace, R. S. (2003). Be Your Own Botmaster: The Step by Step Guide to Creating, Hosting and Selling Your Own AI Chat Bot on Pandorabots. ALICE A.I. Foundation, Inc.

Witt, S. M., & Young, S. J. (2000). Phone-level pronunciation scoring and assessment for interactive language learning. Speech Communication, 30(2), 95–108.

Young, S. (n.d.). HTK Speech Recognition Toolkit. Retrieved 7 July 2014, from http://htk.eng.cam.ac.uk/

Yuan, J., & Liberman, M. (2008). Speaker identification on the SCOTUS corpus. Journal of the Acoustical Society of America, 123(5), 3878.