Using Speech Synthesis to Create Stimuli for Speech Perception Research


Using speech synthesis to create stimuli for speech perception research
Prof. Simon King
The Centre for Speech Technology Research, University of Edinburgh, UK
www.cstr.ed.ac.uk
INSPIRE 2013, Sheffield

© Copyright Simon King, University of Edinburgh. For personal use only. Re-use or distribution not permitted.

Contents
• Part I: Motivation
  • why synthesis might be a useful tool
• Part II: Core techniques
  • formant synthesis
  • articulatory synthesis
  • physical modelling
  • vocoding
  • concatenation of diphones
  • concatenation of units from a large inventory
  • statistical parametric speech synthesis (HMMs)
• Part III: The state of the art
  • controllable HMM synthesis with articulatory and formant controls

Part I: Motivation

Goal: investigate speech perception
• How?
  • form a hypothesis
  • design the experiment
  • design the stimuli (the design is limited by the methods available for creation)
  • create the stimuli
  • play the stimuli to listeners
  • obtain responses
  • analyse the responses
  • support / refute the hypothesis

Designing stimuli
• Usually speech or speech-like sounds
• Natural speech
  • elicited from one or more speakers
• Manipulated natural speech
  • filtered, e.g. delexicalised
  • edited, e.g. modify temporal structure, remove acoustic cues, splice, ...
• Synthetic speech
  • several methods available
  • which should we choose?
• Other synthetic sounds, e.g. sine-wave speech
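Sine-wave speech, mentioned above as an "other synthetic sound", is simple to sketch in code: the speech signal is replaced by a handful of sinusoids that follow the formant tracks. A minimal illustration; the formant tracks below are invented for the example, not measured from any utterance:

```python
import numpy as np

def sine_wave_speech(formant_tracks, amp_tracks, fs=16000):
    """Synthesise sine-wave speech: one sinusoid per formant track.

    formant_tracks: arrays of frequency (Hz), one per 'formant',
                    all the same length (one value per sample).
    amp_tracks:     matching arrays of linear amplitude.
    """
    out = np.zeros(len(formant_tracks[0]))
    for freq, amp in zip(formant_tracks, amp_tracks):
        # integrate the instantaneous frequency to get a smooth phase track
        phase = 2 * np.pi * np.cumsum(freq) / fs
        out += amp * np.sin(phase)
    return out / max(1e-9, np.max(np.abs(out)))  # normalise to [-1, 1]

# Invented tracks: an [a]-like to [i]-like glide over 0.5 s
fs, n = 16000, 8000
t = np.linspace(0, 1, n)           # normalised time ramp
f1 = 700 - 400 * t                 # F1 falls 700 -> 300 Hz
f2 = 1200 + 1000 * t               # F2 rises 1200 -> 2200 Hz
f3 = np.full(n, 2800.0)            # F3 roughly constant
audio = sine_wave_speech([f1, f2, f3],
                         [np.ones(n), 0.6 * np.ones(n), 0.3 * np.ones(n)], fs)
```

Listeners presented with such stimuli often report non-speech whistles at first, yet can be induced to hear them as speech, which is exactly why they are useful perceptual probes.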
The limits of manually manipulating natural speech
• Manual editing means that only limited forms of modification are possible
  • remove information
  • splice together individual natural sounds
• Laborious, highly skilled work
• Therefore very slow to create stimuli
• Places limits on the experiments that can be performed
  • a bias towards certain types of stimuli

Doing it automatically: decomposing speech
• The speech signal we observe (the waveform) is the product of interacting processes operating at different time scales
  • at any moment in time, the signal is affected not just by the current phoneme, but by many other aspects of the context in which it occurs
  • the context is complex: it is not just the preceding/following sounds
• We have a conflict: we want to simultaneously
  • model speech as a linear string of units, for engineering simplicity
  • take into account all the long-range effects of context, before, during and after the current moment in time

Speech is produced by several interacting processes

Resolving the conflict: take context into account
• The context in which a speech sound is produced affects that sound
  • articulatory constraints: where the articulators are coming from / going to
  • phonological effects
  • prosodic environment

Modern speech synthesis
• 4 examples: diphones, unit selection, HMMs (x2)
• http://www.cstr.ed.ac.uk/projects/festival/morevoices.html
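The conflict described above is conventionally resolved by keeping the linear string of units but relabelling each unit with its context. A sketch of the simplest version, triphone labels; the `left-phone+right` label format follows the common HTK-style convention, and real systems attach much richer context (syllable, word and phrase features) in the same way:

```python
def to_triphones(phones):
    """Relabel a linear phone string as context-dependent triphone units.

    Each unit is written left-context - phone + right-context;
    sentence boundaries are marked 'sil'.
    """
    padded = ["sil"] + list(phones) + ["sil"]
    return [f"{padded[i-1]}-{padded[i]}+{padded[i+1]}"
            for i in range(1, len(padded) - 1)]

print(to_triphones(["k", "a", "t"]))
# → ['sil-k+a', 'k-a+t', 'a-t+sil']
```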
Part II: Core techniques

Part II: Core techniques - formant synthesis

Stimulus design: a simple consonant-vowel sequence
[Figure 1, from Mayo (Segmental), JASA: spectrograms (0-8000 Hz over 300 ms) of synthetic consonant-vowel stimuli: [sa] vs [ʃa] ("sigh" vs "shy"), [de] vs [be], [ta] vs [da], [ti] vs [di].]

Stimulus design: a continuum
[Figures 2-3, from Mayo (Segmental), JASA: a 9-point /s/-/ʃ/ continuum made by varying the frequency of the frication noise, spliced onto the vowel from "sigh" (/sa/ ... /(s)a/ ... /(ʃ)a/ ... /ʃa/); panels show the number of /s/ vs /ʃ/ responses for adults, 7-year-olds, 5-year-olds and 3- to 4-year-olds.]
(audio: 9-point continuum)
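A continuum like the one above can be generated mechanically once the synthesiser is parametric: fix every parameter except the cue of interest, then interpolate that cue between two endpoints. A sketch with invented endpoint values (not those used in the Mayo study):

```python
import numpy as np

def make_continuum(param_a, param_b, steps=9):
    """Linearly interpolate one synthesis parameter between two endpoints.

    Returns `steps` values from endpoint A to endpoint B; each value
    would drive one synthesis run, all sharing the same vowel portion.
    """
    return np.linspace(param_a, param_b, steps)

# Illustrative endpoints for the frication-noise centre frequency (Hz):
# a /sh/-like low endpoint and an /s/-like high endpoint.
continuum = make_continuum(2500.0, 5000.0, steps=9)
# 9 evenly spaced values from 2500 Hz to 5000 Hz, step 312.5 Hz
```

The same template extends to multidimensional continua by interpolating a vector of parameters per step.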
The Klatt vocal tract model
• F0 and gain
• Up to six vocal tract resonances (formants): F1, F2, ..., F6
• Aspiration and frication noise
• A nasal zero (the `anti-resonance' introduced when the nasal cavity is opened by lowering the velum)
[Figure: block diagram of the Klatt synthesiser - a voicing source with tilt and nasal/tracheal controls, a cascade of resonators F1-F5 with bandwidths B1-B5, and parallel amplitude-controlled paths (A2-A5, plus a bypass) for aspiration and frication noise.]

Creating speech using the Klatt model

Manipulating speech via the Klatt model
(audio: original, Klatt, Klatt (stylised))

Pros and cons of formant-based systems
• Pros
  • allows incorporation of linguistic knowledge in a transparent way
  • precise control over every parameter value
  • low memory requirements and low computational cost (for simple models like Klatt)
• Cons
  • speech quality is ultimately limited by the vocal tract model
  • creating high-quality output is skilled and laborious work
  • the work involved leads to a strong bias towards very short stimuli

Text-to-speech using formant synthesis
• The best-known system is MITalk (1970s), but hardware-based predecessors include PAT (Edinburgh, 1950s-1960s), OVE (KTH, Sweden, 1960s), and others
• Uses rules to drive an abstract and simplified vocal tract model
• MITalk was also implemented in hardware (DECtalk, as used by Stephen Hawking)
• This type of system takes a long time to develop: the rule sets are written by experts
• http://festvox.org/history/klatt.html
• Example: MITalk
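The cascade branch of the Klatt model is compact enough to sketch directly: each formant is a second-order digital resonator, and the resonators are applied in series to a periodic source. A minimal sketch only - an impulse-train source, three formants, and none of the parallel branch, noise sources or nasal zero; the coefficient formulas follow Klatt (1980):

```python
import numpy as np

def resonator(x, f, bw, fs):
    """Second-order digital resonator (Klatt 1980 coefficient formulas).

    f, bw: centre frequency and bandwidth in Hz.
    """
    T = 1.0 / fs
    c = -np.exp(-2 * np.pi * bw * T)
    b = 2 * np.exp(-np.pi * bw * T) * np.cos(2 * np.pi * f * T)
    a = 1 - b - c
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = (a * x[n]
                + b * (y[n - 1] if n >= 1 else 0.0)
                + c * (y[n - 2] if n >= 2 else 0.0))
    return y

def cascade_vowel(f0=120, formants=((700, 60), (1200, 90), (2600, 120)),
                  dur=0.3, fs=16000):
    """Steady vowel: impulse-train source through cascaded formant resonators."""
    n = int(dur * fs)
    src = np.zeros(n)
    src[::int(fs / f0)] = 1.0      # crude glottal source: one impulse per period
    y = src
    for f, bw in formants:         # the cascade branch of the Klatt model
        y = resonator(y, f, bw, fs)
    return y / np.max(np.abs(y))   # normalise to [-1, 1]

audio = cascade_vowel()
```

Even this toy version makes the "precise control" point above concrete: changing a single (f, bw) pair moves exactly one resonance, nothing else.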
Driving the Klatt model from text with rules
• A synthesiser like MITalk determines values for vowel formants using rules
  • start with a fixed default (target) value for every vowel
  • modify using co-articulation rules, reduction, etc.
• The Klatt vocal tract model is still used to create stimuli for phonetic experiments
  • reasonable results can be obtained by experts
  • but driving it automatically with rules is another matter
• It is only used for text-to-speech in legacy applications

Part II: Core techniques - articulatory synthesis

VocalTractLab (audiovisual example)

HLSyn from Sensimetrics
• a "quasi-articulatory" synthesiser
• specify the vocal tract in terms of both physical dimensions and formant frequencies
• fewer parameters than Klatt (13, instead of 40-60)
• no longer available as a product
• (audio examples: original, copy synthesis) - credit: Prosynth project, UCL
[Figure 1(a): the 13 physiologically based HL parameters: f0, ag, ap, al, ab, an, ue, ps, dc, f1, f2, f3, f4. Figure 1(b): mapping relations (including the circuit model used to calculate pressures and flows) from the HL parameters to the 40-50 acoustic KL parameters (AV, OQ, AF, ..., F1, F2, B1, B2, ...), then sources and transfer functions producing the speech output. Figure 2: spectrogram (0-5000 Hz, 0-450 ms) of the copy-synthesis example.]

TADA: TAsk Dynamic Application
• Based on Browman & Goldstein's Task Dynamic model of speech production
• Synthesis achieved using HLSyn
• (audiovisual example)
[Excerpt from the TADA manual: TADA runs under MATLAB™ (Release 14, ver. 7.0 or higher; type 'tada' at the command line) or as a stand-alone application. The TADA window shows, in the centre, the temporal display: the gestural activations input to the task dynamic model (the gestural score) and the time functions of the tract variable and articulator values that are the model outputs; at the left, the spatial display: the vocal tract shape and area function at the time of the cursor; at the right, four panels of buttons and readouts.]
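The task-dynamic idea behind TADA can be caricatured in a few lines: while a gesture is active, its target attracts a tract variable (e.g. lip aperture or constriction degree) through a critically damped second-order system, so constrictions form and release smoothly rather than jumping between targets. This is a sketch of the idea only, with invented numbers, not TADA's actual implementation:

```python
import numpy as np

def tract_variable(target_track, k=1600.0, dt=0.005):
    """Critically damped second-order dynamics toward a gesture target.

    target_track: one target value per time step (the active gesture's
    target, read off a gestural score). k is the gestural stiffness;
    damping 2*sqrt(k) makes the approach critical (no overshoot).
    """
    x, v = float(target_track[0]), 0.0
    out = []
    for target in target_track:
        a = -k * (x - target) - 2 * np.sqrt(k) * v   # spring + critical damping
        v += a * dt                                  # semi-implicit Euler step
        x += v * dt
        out.append(x)
    return np.array(out)

# A toy gestural score: hold a 10 mm aperture, then a 5 mm constriction gesture
targets = np.array([10.0] * 40 + [5.0] * 60)
track = tract_variable(targets)   # smooth approach to 5 mm, no discontinuity
```

The appeal for stimulus design is the same as for formant synthesis: one interpretable parameter (here the target or stiffness) can be varied while everything else is held fixed.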