Voice Synthesizer Application Android

Total Page:16

File Type:pdf, Size:1020Kb

Voice Synthesizer Application Android Voice synthesizer application android Continue The Download Now link sends you to the Windows Store, where you can continue the download process. You need to have an active Microsoft account to download the app. This download may not be available in some countries. Results 1 - 10 of 603 Prev 1 2 3 4 5 Next See also: Speech Synthesis Receming Device is an artificial production of human speech. The computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware. The text-to-speech system (TTS) converts the usual text of language into speech; other systems display symbolic linguistic representations, such as phonetic transcriptions in speech. Synthesized speech can be created by concatenating fragments of recorded speech that are stored in the database. Systems vary in size of stored speech blocks; The system that stores phones or diphones provides the greatest range of outputs, but may not have clarity. For specific domain use, storing whole words or suggestions allows for high-quality output. In addition, the synthesizer may include a vocal tract model and other characteristics of the human voice to create a fully synthetic voice output. The quality of the speech synthesizer is judged by its similarity to the human voice and its ability to be understood clearly. The clear text to speech program allows people with visual impairments or reading disabilities to listen to written words on their home computer. Many computer operating systems have included speech synthesizers since the early 1990s. A review of the typical TTS Automatic Announcement System synthetic voice announces the arriving train to Sweden. Problems with playing this file? See the media report. The text to speech system (or engine) consists of two parts: front and back. The front end has two main tasks. First, it converts raw text containing symbols, such as numbers and abbreviations, into the equivalent of written words. This process is often referred to as text normalization, pre-processing, or tokenization. The front end then assigns phonetic transcriptions to each word, and divides and marks the text into prozodiotic units such as phrases, clauses and sentences. The process of assigning phonetic transcriptions to words is called the transformation of text into a phoneme or graphema to a phoneme. Phonetic transcriptions and prosody information together constitute a symbolic linguistic representation that is the output at the front end. The rear end, often called a synthesizer, converts a symbolic linguistic representation into sound. In some systems, this part includes the calculation of the target prosody (step outline, phoneme duration), which is then imposed on output speech. Long before the invention of electronic signal processing, some people tried to build machines to emulate human speech. Speech. Early legends of the existence of Brazen Heads involved Pope Sylvester II (d. 1003 AD), Albert Magnus (1198-1280), and Roger Bacon (1214-1294). In 1779, the German-Danish scientist Christian Gottlieb Kratzenstein received the first prize in a competition announced by the Russian Imperial Academy of Sciences and Arts for models he built from a human vocal tract that could produce five long vowels (in the international phonetic notation of the alphabet: aː, eː, iː, oː and uː). It was followed by the acoustic-mechanical speech machine of Wolfgang von Kempelen of Pressburg, Hungary, described in a 1791 article. This machine added models of tongue and lips, which allowed it to produce consonants as well as vowels. In 1837, Charles Wheatstone produced a talking machine designed by von Kempelen, and in 1846, Joseph Faber exhibited Euphonia. In 1923, Peydt revived Wheatstone's design. In the 1930s, Bell Labs developed a vocoder that automatically analyzed speech in its fundamental tones and resonances. From his work on the vocoder, Homer Dudley developed a voice synthesizer called The Voder (Voice Demonstrator), which he exhibited at the 1939 World's Fair in New York. Dr. Franklin S. Cooper and his colleagues at Haskins Lab built a pattern reproduction in the late 1940s and completed it in 1950. There were several different versions of this hardware device; Only one currently survives. The machine converts images of acoustic speech patterns in the form of a spectrogram back into sound. Using this device, Alvin Lieberman and his colleagues found acoustic signals for the perception of phonetic segments (consonants and vowels). Electronic computer and speech synthesizer devices used by Stephen Hawking in 1999 The first computer speech synthesis systems originated in the late 1950s. Noriko Umeda et al. developed the first common English text-speech system in 1968, at the Electrical Laboratory in Japan. In 1961, physicist John Larry Kelly Jr. and his colleague Louis Gerstman used an IBM 704 computer to synthesize speech, one of the most famous events in bell Labs history. (quote necessary) Kelly's synthesizer recorder (vocoder) recreated the song Daisy Bell to the music of Max Matthews. Coincidentally, Arthur Clarke was visiting his friend and colleague John Pierce at the Bell Labs Murray Hill facility. Clarke was so impressed by the demonstration that he used it in the climax scene of his script for his 2001 novel: A Space Odyssey, 10 where a HAL 9000 computer sings the same song as astronaut Dave Bowman puts him to sleep. Despite the success of purely electronic speech synthesis, research into mechanical speech synthesizers continues. Linear predictive coding (required third-party source) The form of speech coding began with the work of Fumitada Ithaku from the University of Nagoya and Shuzo Saito of Nippon Telegraph and Telephone (NTT) in 1966. Further developments in LPC technology were made by Bishna S. Atal and Manfred R. Schroeder at Bell Labs in the 1970s. In 1975, Fumitada Ithaku developed the Linear Spectral Pairs (LSP) method for coding high-compression speech while in NTT. From 1975 to 1981, Ithaca studied problems in speech analysis and synthesis based on the LSP method. In 1980, his team developed a speech synthesizer chip based on LSP. LSP is an important technology for speech synthesis and coding, and in the 1990s was adopted by almost all international standards of speech coding as an important component, helping to improve digital speech communication through mobile channels and the Internet. In 1975, MUSA was released and became one of the first speech synthesis systems. It consisted of independent computer equipment and specialized software that allowed him to read Italian. The second version, released in 1978, was also able to advise in Italian in the style of a cappella. The DECtalk demo, using perfect Paul and Uppity Ursula voices Dominant in the 1980s and 1990s, was a DECtalk system based primarily on the work of Dennis Klatt at MIT and Bell Labs; The latter was one of the first multilingual language independent systems, using natural language processing techniques extensively. Hand electronics with speech synthesis began to appear in the 1970s. One of the first was the portable Telesensory Systems Inc. (TSI) Speech calculator for the blind in 1976. Other devices had mostly educational purposes, such as the Speak toy and the Spell produced by Texas Instruments in 1978. In 1979, Fidelity released a colloquial version of its electronic chess computer. The first video game to feature speech synthesis was the 1980 arcade game Stratovox (known in Japan as Speak and Rescue) by Sun Electronics. The first personalized speech-synthesis computer game was Manbiki Shoujo (Shoplifting Girl), released in 1980 for PET 2001, for which the game's developer, Hiroshi Suzuki, developed a zero cross programming method for the production of synthesized speech form. Another early example, the arcade version of Berzerk, also dates back to 1980. Milton Bradley released the first multiplayer electronic game using Milton's voice synthesis in the same year. Early electronic speech synthesizers sounded robotic and were often barely clear. The quality of synthesized speech is steadily improving, but compared to 2016, the output from modern speech synthesis remains clearly distinguishable from actual human speech. Synthesized voices tended to sound male until 1990, when Anne Sirdal, of ATT Bell Laboratories, created a female voice. In 2005, Kurzweil predicted that as the cost-performance ratio made speech synthesizers cheaper and more accessible, more people would benefit from text programs. Synthesizer technology The most important qualities of the speech synthesis system are naturalness and intelligibility. Naturalness describes how close the output sounds like human speech, while intelligibility is the ease with which output is understood. The ideal speech synthesizer is natural and understandable. Speech synthesis systems usually try to maximize both characteristics. The two main technologies for generating synthetic speech waves are concatenative synthesis and formal synthesis. Each technology has strengths and weaknesses, and the intended use of a fusion system usually determines which approach is used. Synthesis of Concatenation Main article: Concatenative synthesis is based on concatenation (or stringing together) segments of recorded speech. Typically, concatinative synthesis produces the most naturally sounding synthesized speech. However, differences between natural variations of speech and the nature of automated wave segmentation methods sometimes lead to sound failures in output. There are
Recommended publications
  • THE DEVELOPMENT of ACCENTED ENGLISH SYNTHETIC VOICES By
    THE DEVELOPMENT OF ACCENTED ENGLISH SYNTHETIC VOICES by PROMISE TSHEPISO MALATJI DISSERTATION Submitted in fulfilment of the requirements for the degree of MASTER OF SCIENCE in COMPUTER SCIENCE in the FACULTY OF SCIENCE AND AGRICULTURE (School of Mathematical and Computer Sciences) at the UNIVERSITY OF LIMPOPO SUPERVISOR: Mr MJD Manamela CO-SUPERVISOR: Dr TI Modipa 2019 DEDICATION In memory of my grandparents, Cecilia Khumalo and Alfred Mashele, who always believed in me! ii DECLARATION I declare that THE DEVELOPMENT OF ACCENTED ENGLISH SYNTHETIC VOICES is my own work and that all the sources that I have used or quoted have been indicated and acknowledged by means of complete references and that this work has not been submitted before for any other degree at any other institution. ______________________ ___________ Signature Date iii ACKNOWLEDGEMENTS I want to recognise the following people for their individual contributions to this dissertation: • My brother, Mr B.I. Khumalo and the whole family for the unconditional love, support and understanding. • A distinct thank you to both my supervisors, Mr M.J.D. Manamela and Dr T.I. Modipa, for their guidance, motivation, and support. • The Telkom Centre of Excellence for Speech Technology for providing the resources and support to make this study a success. • My colleagues in Department of Computer Science, Messrs V.R. Baloyi and L.M. Kola, for always motivating me. • A special thank you to Mr T.J. Sefara for taking his time to participate in the study. • The six Computer Science undergraduate students who sacrificed their precious time to participate in data collection.
    [Show full text]
  • Development of a Text to Speech System for Devanagari Konkani
    DEVELOPMENT OF A TEXT TO SPEECH SYSTEM FOR DEVANAGARI KONKANI A Thesis submitted to Goa University for the award of the degree of Doctor of Philosophy in Computer Science and Technology By Nilesh B. Fal Dessai GOA UNIVERSITY Taleigao Plateau May 2017 Development of a Text to Speech System for Devanagari Konkani A Thesis submitted to Goa University for the award of the degree of Doctor of Philosophy in Computer Science and Technology By Nilesh B. Fal Dessai GOA UNIVERSITY Taleigao Plateau May 2017 Statement As required under the ordinance OB-9.9 of Goa University, I, Nilesh B. Fal Dessai, state that the present Thesis entitled, ‘Development of a Text to Speech System for Devanagari Konkani’ is my original contribution carried out under the supervision of Dr. Jyoti D. Pawar, Associate Professor, Department of Computer Science and Technology, Goa University and the same has not been submitted on any previous occasion. To the best of my knowledge, the present study is the first comprehensive work of its kind in the area mentioned. The literature related to the problem investigated has been cited. Due acknowledgments have been made whenever facilities and suggestions have been availed of. (Nilesh B. Fal Dessai) i Certificate This is to certify that the thesis entitled ‘Development of a Text to Speech System for Devanagari Konkani’, submitted by Nilesh B. Fal Dessai, for the award of the degree of Doctor of Philosophy in Computer Science and Technology is based on his original studies carried out under my supervision. The thesis or any part thereof has not been previously submitted for any other degree or diploma in any University or Institute.
    [Show full text]
  • Commercial Tools in Speech Synthesis Technology
    International Journal of Research in Engineering, Science and Management 320 Volume-2, Issue-12, December-2019 www.ijresm.com | ISSN (Online): 2581-5792 Commercial Tools in Speech Synthesis Technology D. Nagaraju1, R. J. Ramasree2, K. Kishore3, K. Vamsi Krishna4, R. Sujana5 1Associate Professor, Dept. of Computer Science, Audisankara College of Engg. and Technology, Gudur, India 2Professor, Dept. of Computer Science, Rastriya Sanskrit VidyaPeet, Tirupati, India 3,4,5UG Student, Dept. of Computer Science, Audisankara College of Engg. and Technology, Gudur, India Abstract: This is a study paper planned to a new system phonetic and prosodic information. These two phases are emotional speech system for Telugu (ESST). The main objective of usually called as high- and low-level synthesis. The input text this paper is to map the situation of today's speech synthesis might be for example data from a word processor, standard technology and to focus on potential methods for the future. ASCII from e-mail, a mobile text-message, or scanned text Usually literature and articles in the area are focused on a single method or single synthesizer or the very limited range of the from a newspaper. The character string is then preprocessed and technology. In this paper the whole speech synthesis area with as analyzed into phonetic representation which is usually a string many methods, techniques, applications, and products as possible of phonemes with some additional information for correct is under investigation. Unfortunately, this leads to a situation intonation, duration, and stress. Speech sound is finally where in some cases very detailed information may not be given generated with the low-level synthesizer by the information here, but may be found in given references.
    [Show full text]
  • The Conductor Model of Online Speech Modulation
    The Conductor Model of Online Speech Modulation Fred Cummins Department of Computer Science, University College Dublin [email protected] Abstract. Two observations about prosodic modulation are made. Firstly, many prosodic parameters co-vary when speaking style is altered, and a similar set of variables are affected in particular dysarthrias. Second, there is a need to span the gap between the phenomenologically sim- ple space of intentional speech control and the much higher dimensional space of manifest effects. A novel model of speech production, the Con- ductor, is proposed which posits a functional subsystem in speech pro- duction responsible for the sequencing and modulation of relatively in- variant elements. The ways in which the Conductor can modulate these elements are limited, as its domain of variation is hypothesized to be a relatively low-dimensional space. Some known functions of the cortico- striatal circuits are reviewed and are found to be likely candidates for implementation of the Conductor, suggesting that the model may be well grounded in plausible neurophysiology. Other speech production models which consider the role of the basal ganglia are considered, leading to some observations about the inextricable linkage of linguistic and motor elements in speech as actually produced. 1 The Co-Modulation of Some Prosodic Variables It is a remarkable fact that speakers appear to be able to change many aspects of their speech collectively, and others not at all. I focus here on those prosodic variables which are collectively affected by intentional changes to speaking style, and argue that they are governed by a single modulatory process, the ‘Con- ductor’.
    [Show full text]
  • COMMITTEE TRUSTEES: Philip Rubin, Mark Boxer, Sandy Cloud
    DRAFT MINUTES SPECIAL TELEPHONE MEETING OF THE COMMITTEE FOR RESEARCH, ENTREPRENEURSHIP AND INNOVATION University of Connecticut Board of Trustees February 15, 2021 COMMITTEE TRUSTEES: Philip Rubin, Mark Boxer, Sandy Cloud, Marilda Gandara ADDITIONAL TRUSTEES: Chairman Toscano COMMITTEE MEMBERS: Rich Vogel, Tim Shannon UNIVERSITY SENATE REP: Jeannette Pritchard STAFF: Joanna Desjardin, President Katsouleas, Radenka Maric, David Nobel, Rachel Rubin, Dan Schwartz, Jeffrey Shoulson, Dan Weiner Committee Vice Chair Rubin convened the meeting at 10:01 a.m. by telephone. No public comment was volunteered on any of the agenda items. On a motion by Trustee Cloud, seconded by Mr. Vogel, the minutes from the November 16, 2020, Special Meeting of the REI Committee were approved. Vice Chair Rubin informed the committee that University Senate Representative Dr. Rajeev Bansal has retired and welcomed Dr. Jeanette Pritchard as the University Senate Representative on the committee. Vice Chair Rubin gave opening remarks and presented the publication, UConn in the Media. If anyone wants a copy they can contact Vice Chair Rubin or Joanna Desjardin. Congratulated Dr. David Noble for a featured article in UConn Today on entrepreneurship. Congratulated Dr. Radenka Maric for becoming a member of AAAS and her publication, Atomistic Insights into the Hydrogen Oxidation Reaction of Palladium-Ceria Bifunctional Catalysts for Anion-Exchange Membrane Fuel Cells, which is about her important work she and her colleagues continue to do on fuel cells. Would like to hear more about this at a future meeting. Welcomed Dr. Daniel Weiner, Vice President for Global Affairs, to the meeting. Vice Chair Rubin introduced Dr. Radenka Maric, Vice President for Research, Innovation & Entrepreneurship.
    [Show full text]
  • Models of Speech Synthesis ROLF CARLSON Department of Speech Communication and Music Acoustics, Royal Institute of Technology, S-100 44 Stockholm, Sweden
    Proc. Natl. Acad. Sci. USA Vol. 92, pp. 9932-9937, October 1995 Colloquium Paper This paper was presented at a colloquium entitled "Human-Machine Communication by Voice," organized by Lawrence R. Rabiner, held by the National Academy of Sciences at The Arnold and Mabel Beckman Center in Irvine, CA, February 8-9,1993. Models of speech synthesis ROLF CARLSON Department of Speech Communication and Music Acoustics, Royal Institute of Technology, S-100 44 Stockholm, Sweden ABSTRACT The term "speech synthesis" has been used need large amounts of speech data. Models working close to for diverse technical approaches. In this paper, some of the the waveform are now typically making use of increased unit approaches used to generate synthetic speech in a text-to- sizes while still modeling prosody by rule. In the middle of the speech system are reviewed, and some of the basic motivations scale, "formant synthesis" is moving toward the articulatory for choosing one method over another are discussed. It is models by looking for "higher-level parameters" or to larger important to keep in mind, however, that speech synthesis prestored units. Articulatory synthesis, hampered by lack of models are needed not just for speech generation but to help data, still has some way to go but is yielding improved quality, us understand how speech is created, or even how articulation due mostly to advanced analysis-synthesis techniques. can explain language structure. General issues such as the synthesis of different voices, accents, and multiple languages Flexibility and Technical Dimensions are discussed as special challenges facing the speech synthesis community.
    [Show full text]
  • Estudios De I+D+I
    ESTUDIOS DE I+D+I Número 51 Proyecto SIRAU. Servicio de gestión de información remota para las actividades de la vida diaria adaptable a usuario Autor/es: Catalá Mallofré, Andreu Filiación: Universidad Politécnica de Cataluña Contacto: Fecha: 2006 Para citar este documento: CATALÁ MALLOFRÉ, Andreu (Convocatoria 2006). “Proyecto SIRAU. Servicio de gestión de información remota para las actividades de la vida diaria adaptable a usuario”. Madrid. Estudios de I+D+I, nº 51. [Fecha de publicación: 03/05/2010]. <http://www.imsersomayores.csic.es/documentos/documentos/imserso-estudiosidi-51.pdf> Una iniciativa del IMSERSO y del CSIC © 2003 Portal Mayores http://www.imsersomayores.csic.es Resumen Este proyecto se enmarca dentro de una de las líneas de investigación del Centro de Estudios Tecnológicos para Personas con Dependencia (CETDP – UPC) de la Universidad Politécnica de Cataluña que se dedica a desarrollar soluciones tecnológicas para mejorar la calidad de vida de las personas con discapacidad. Se pretende aprovechar el gran avance que representan las nuevas tecnologías de identificación con radiofrecuencia (RFID), para su aplicación como sistema de apoyo a personas con déficit de distinta índole. En principio estaba pensado para personas con discapacidad visual, pero su uso es fácilmente extensible a personas con problemas de comprensión y memoria, o cualquier tipo de déficit cognitivo. La idea consiste en ofrecer la posibilidad de reconocer electrónicamente los objetos de la vida diaria, y que un sistema pueda presentar la información asociada mediante un canal verbal. Consta de un terminal portátil equipado con un trasmisor de corto alcance. Cuando el usuario acerca este terminal a un objeto o viceversa, lo identifica y ofrece información complementaria mediante un mensaje oral.
    [Show full text]
  • Expression Control in Singing Voice Synthesis
    Expression Control in Singing Voice Synthesis Features, approaches, n the context of singing voice synthesis, expression control manipu- [ lates a set of voice features related to a particular emotion, style, or evaluation, and challenges singer. Also known as performance modeling, it has been ] approached from different perspectives and for different purposes, and different projects have shown a wide extent of applicability. The Iaim of this article is to provide an overview of approaches to expression control in singing voice synthesis. We introduce some musical applica- tions that use singing voice synthesis techniques to justify the need for Martí Umbert, Jordi Bonada, [ an accurate control of expression. Then, expression is defined and Masataka Goto, Tomoyasu Nakano, related to speech and instrument performance modeling. Next, we pres- ent the commonly studied set of voice parameters that can change and Johan Sundberg] Digital Object Identifier 10.1109/MSP.2015.2424572 Date of publication: 13 October 2015 IMAGE LICENSED BY INGRAM PUBLISHING 1053-5888/15©2015IEEE IEEE SIGNAL PROCESSING MAGAZINE [55] noVEMBER 2015 voices that are difficult to produce naturally (e.g., castrati). [TABLE 1] RESEARCH PROJECTS USING SINGING VOICE SYNTHESIS TECHNOLOGIES. More examples can be found with pedagogical purposes or as tools to identify perceptually relevant voice properties [3]. Project WEBSITE These applications of the so-called music information CANTOR HTTP://WWW.VIRSYN.DE research field may have a great impact on the way we inter- CANTOR DIGITALIS HTTPS://CANTORDIGITALIS.LIMSI.FR/ act with music [4]. Examples of research projects using sing- CHANTER HTTPS://CHANTER.LIMSI.FR ing voice synthesis technologies are listed in Table 1.
    [Show full text]
  • Text-To-Speech Synthesis System for Marathi Language Using Concatenation Technique
    Text-To-Speech Synthesis System for Marathi Language Using Concatenation Technique Submitted to Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, INDIA For the award of Doctor of Philosophy by Mr. Sangramsing Nathsing Kayte Supervised by: Dr. Bharti W. Gawali Professor Department of Computer Science and Information Technology DR. BABASAHEB AMBEDKAR MARATHWADA UNIVERSITY, AURANGABAD November 2018 Abstract A speech synthesis system is a computer-based system that should be able to read any text aloud with a particular language or multiple languages. This is also called as Text-to- Speech synthesis or in short TTS. Communication plays an important role in everyones life. Usually communication refers to speaking or writing or sending a message to another person. Speech is one of the most important ways for human communication. There have been a great number of efforts to incorporate speech for communication between humans and computers. Demand for technologies in speech processing, such as speech recogni- tion, dialogue processing, natural language processing, and speech synthesis are increas- ing. These technologies are useful for human-to-human like spoken language translation systems and human-to-machine communication like control for handicapped persons etc., TTS is one of the key technologies in speech processing. TTS system converts ordinary orthographic text into acoustic signal, which is indis- tinguishable from human speech. For developing a natural human machine interface, the TTS system can be used as a way to communicate back, like a human voice by a computer. The TTS can be a voice for those people who cannot speak. People wishing to learn a new language can use the TTS system to learn the pronunciation.
    [Show full text]
  • Wormed Voice Workshop Presentation
    Wormed Voice Workshop Presentation micro_research December 27, 2017 1 some worm poetry and songs: The WORM was for a long time desirous to speake, but the rule and order of the Court enjoyned him silence, but now strutting and swelling, and impatient, of further delay, he broke out thus... [Michael Maier] He worshipped the worm and prayed to the wormy grave. Serpent Lucifer, how do you do? Of your worms and your snakes I'd be one or two; For in this dear planet of wool and of leather `Tis pleasant to need neither shirt, sleeve, nor shoe, 2 And have arm, leg, and belly together. Then aches your head, or are you lazy? Sing, `Round your neck your belly wrap, Tail-a-top, and make your cap Any bee and daisy. Two pigs' feet, two mens' feet, and two of a hen; Devil-winged; dragon-bellied; grave- jawed, because grass Is a beard that's soon shaved, and grows seldom again worm writing the the the the,eeeronencoug,en sthistit, d.).dupi w m,tinsprsool itav f bometaisp- pav wheaigelic..)a?? orerdi mise we ich'roo bish ftroo htothuloul mespowouklain- duteavshi wn,jis, sownol hof." m,tisorora angsthyedust,es, fofald,junss ownoug brad,)fr m fr,aA?a????ck;A?stelav aly, al is.'rady'lfrdil owoncorara wns t.) sh'r, oof ofr,a? ar,a???????a? fu mo towess,eethen hrtolly-l,."tigolav ict,a???!ol, w..'m,elyelil,tstreamas..n gotaillas.tansstheatsea f mb ispot inici t.) owar.**1 wnshigigholoothtith orsir.tsotic.'m, sotamimoledug imootrdeavet..t,) sh s,tranciror."wn sieee h asinied.tiear wspilotor,) bla av.nicord,ier.dy'et.*tite m.)..*d, hrouceto hie, ig il m, bsomoug,.t.'l,t, olitel bs,.nt,.dotr tat,)aa? htotitedont,j alesil, starar,ja taie ass.nishiceroouldseal fotitoonckysil, m oitispl o anteeeaicowousomirot.
    [Show full text]
  • Attuning Speech-Enabled Interfaces to User and Context for Inclusive Design: Technology, Methodology and Practice
    CORE Metadata, citation and similar papers at core.ac.uk Provided by Springer - Publisher Connector Univ Access Inf Soc (2009) 8:109–122 DOI 10.1007/s10209-008-0136-x LONG PAPER Attuning speech-enabled interfaces to user and context for inclusive design: technology, methodology and practice Mark A. Neerincx Æ Anita H. M. Cremers Æ Judith M. Kessens Æ David A. van Leeuwen Æ Khiet P. Truong Published online: 7 August 2008 Ó The Author(s) 2008 Abstract This paper presents a methodology to apply 1 Introduction speech technology for compensating sensory, motor, cog- nitive and affective usage difficulties. It distinguishes (1) Speech technology seems to provide new opportunities to an analysis of accessibility and technological issues for the improve the accessibility of electronic services and soft- identification of context-dependent user needs and corre- ware applications, by offering compensation for limitations sponding opportunities to include speech in multimodal of specific user groups. These limitations can be quite user interfaces, and (2) an iterative generate-and-test pro- diverse and originate from specific sensory, physical or cess to refine the interface prototype and its design cognitive disabilities—such as difficulties to see icons, to rationale. Best practices show that such inclusion of speech control a mouse or to read text. Such limitations have both technology, although still imperfect in itself, can enhance functional and emotional aspects that should be addressed both the functional and affective information and com- in the design of user interfaces (cf. [49]). Speech technol- munication technology-experiences of specific user groups, ogy can be an ‘enabler’ for understanding both the content such as persons with reading difficulties, hearing-impaired, and ‘tone’ in user expressions, and for producing the right intellectually disabled, children and older adults.
    [Show full text]
  • Towards Expressive Speech Synthesis in English on a Robotic Platform
    PAGE 130 Towards Expressive Speech Synthesis in English on a Robotic Platform Sigrid Roehling, Bruce MacDonald, Catherine Watson Department of Electrical and Computer Engineering University of Auckland, New Zealand s.roehling, b.macdonald, [email protected] Abstract Affect influences speech, not only in the words we choose, but in the way we say them. This pa- per reviews the research on vocal correlates in the expression of affect and examines the ability of currently available major text-to-speech (TTS) systems to synthesize expressive speech for an emotional robot guide. Speech features discussed include pitch, duration, loudness, spectral structure, and voice quality. TTS systems are examined as to their ability to control the fea- tures needed for synthesizing expressive speech: pitch, duration, loudness, and voice quality. The OpenMARY system is recommended since it provides the highest amount of control over speech production as well as the ability to work with a sophisticated intonation model. Open- MARY is being actively developed, is supported on our current Linux platform, and provides timing information for talking heads such as our current robot face. 1. Introduction explicitly stated otherwise the research is concerned with the English language. Affect influences speech, not only in the words we choose, but in the way we say them. These vocal nonverbal cues are important in human speech as they communicate 2.1. Pitch information about the speaker’s state or attitude more effi- ciently than the verbal content (Eide, Aaron, Bakis, Hamza, Pitch contour seems to be one of the clearest indica- Picheny, and Pitrelli 2004).
    [Show full text]