Text to Speech Synthesis

Numerical Text to Speech Synthesis

Aim: To understand and develop a Text to Speech Synthesizer using MATLAB.

Block Diagram:

Theory:

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and it can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech. For simplicity, the only text considered here is numbers.

Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood. The most important qualities of a speech synthesis system are therefore naturalness and intelligibility. Naturalness describes how closely the output sounds like human speech, while intelligibility is the ease with which the output is understood. The ideal speech synthesizer is both natural and intelligible, so speech synthesis systems usually try to maximize both characteristics.

The two primary technologies for generating synthetic speech waveforms are concatenative synthesis and formant synthesis. This experiment uses concatenative synthesis, which is based on the concatenation (or stringing together) of segments of recorded speech. Generally, concatenative synthesis produces the most natural-sounding synthesized speech. However, differences between natural variations in speech and the nature of the automated techniques for segmenting the waveforms sometimes result in audible glitches in the output.
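
One common way to soften the glitches at segment boundaries is to overlap neighbouring segments with a short crossfade instead of joining them end to end. The sketch below is not part of the program in this report; it uses two synthetic tones as stand-ins for recorded words, and the 5 ms fade length and all variable names are assumptions chosen for illustration.

```matlab
% Hypothetical example: join two segments with a short linear crossfade.
Fs   = 22050;                      % sampling rate in Hz (same value as in the listing)
t    = (0:Fs/10-1)' / Fs;          % 0.1 s time axis as a column vector
segA = sin(2*pi*440*t);            % stand-in for one recorded word
segB = sin(2*pi*660*t);            % stand-in for the next recorded word

nFade   = round(0.005 * Fs);       % 5 ms overlap (assumed value)
fadeOut = linspace(1, 0, nFade)';  % ramp applied to the tail of segA
fadeIn  = 1 - fadeOut;             % complementary ramp for the head of segB

% Mix the overlapping samples, then assemble the joined signal.
overlap = segA(end-nFade+1:end) .* fadeOut + segB(1:nFade) .* fadeIn;
joined  = [segA(1:end-nFade); overlap; segB(nFade+1:end)];
```

The joined signal is shorter than a plain concatenation by the length of the overlap, so a fade that is too long would start to eat into the spoken words themselves.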

The following code uses a database containing recorded speech for each number word. Based on the input number, the corresponding recordings are chosen digit by digit. These sounds are then concatenated to generate a .wav file containing synthesized speech that narrates the whole number.

Procedure and MATLAB Code:

%Step 1: Get the database of speech signals (preferably in .wav format) that
%will be required for the speech synthesis code. Each speech signal in the
%database is read as sampled data and stored in a separate variable
%(using the 'wavread' function in MATLAB).
%Note: recent MATLAB releases no longer provide wavread; there,
%audioread('one.wav') plays the same role.
y1 = wavread('one');
y2 = wavread('two');
y3 = wavread('three');
y4 = wavread('four');
y5 = wavread('five');
y6 = wavread('six');
y7 = wavread('seven');
y8 = wavread('eight');
y9 = wavread('nine');
y10 = wavread('ten');
y11 = wavread('eleven');
y12 = wavread('twelve');
y13 = wavread('thirteen');
y14 = wavread('fourteen');
y15 = wavread('fifteen');
y16 = wavread('sixteen');
y17 = wavread('seventeen');
y18 = wavread('eighteen');
y19 = wavread('nineteen');
y20 = wavread('twenty');
y30 = wavread('thirty');
y40 = wavread('forty');
y50 = wavread('fifty');
y60 = wavread('sixty');
y70 = wavread('seventy');
y80 = wavread('eighty');
y90 = wavread('ninty');   % file name spelled as in the original listing
y100 = wavread('hundred');
y1000 = wavread('thousand');
y0 = wavread('zero');
lex = {'1';'2';'3';'4';'5';'6';'7';'8';'9';'0'};   % index 10 stands for the digit 0

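%Step 2: The input (number) is then taken from the user in string format.
sen = input('Please give a number to be synthesised into speech : ','s');
Fs = 22050;          % sampling rate of the output in Hz (should match the recordings)
len = length(sen);   % number of digits in the input string
pt = zeros(1,len);   % index into lex for each digit
x = [0];             % output samples, grown by concatenation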

%Step 3: Text Processing - The number string is then divided into its digits.
%Based on each digit's value and position, the sampled data of the
%corresponding speech is chosen. E.g. '358' is read as three hundred fifty
%eight; that is, hundred is added after three because it is in the
%hundreds place.
for i=1:len
    pt(i) = strmatch(sen(i),lex(:,1),'exact');

    %Step 4: Concatenation - All the sampled data is concatenated one after
    %another in the output sample in proper order (using the 'cat' function in MATLAB).

    if (i == len-1 && pt(i) == 1)
        % Tens digit is 1: look ahead to the units digit (ten to nineteen).
        pt(i+1) = strmatch(sen(i+1),lex(:,1),'exact');
        switch pt(i+1)
            case 1, x = cat(1,x,y11);
            case 2, x = cat(1,x,y12);
            case 3, x = cat(1,x,y13);
            case 4, x = cat(1,x,y14);
            case 5, x = cat(1,x,y15);
            case 6, x = cat(1,x,y16);
            case 7, x = cat(1,x,y17);
            case 8, x = cat(1,x,y18);
            case 9, x = cat(1,x,y19);
            case 10, x = cat(1,x,y10);
        end % end of switch
        break;
    elseif (i == len-1)
        % Tens digit is 2 to 9 (twenty to ninety).
        switch pt(i)
            case 2, x = cat(1,x,y20);
            case 3, x = cat(1,x,y30);
            case 4, x = cat(1,x,y40);
            case 5, x = cat(1,x,y50);
            case 6, x = cat(1,x,y60);
            case 7, x = cat(1,x,y70);
            case 8, x = cat(1,x,y80);
            case 9, x = cat(1,x,y90);
        end;
    else
        % Any other position: speak the digit itself.
        switch pt(i)
            case 1, x = cat(1,x,y1);
            case 2, x = cat(1,x,y2);
            case 3, x = cat(1,x,y3);
            case 4, x = cat(1,x,y4);
            case 5, x = cat(1,x,y5);
            case 6, x = cat(1,x,y6);
            case 7, x = cat(1,x,y7);
            case 8, x = cat(1,x,y8);
            case 9, x = cat(1,x,y9);
            case 10
                % Zero is spoken only when it is the whole input.
                if (len == 1)
                    x = cat(1,x,y0);
                end;

        end % end of switch
    end; % end of if/elseif/else
    if (i == len-2 && pt(i) ~= 10)
        x = cat(1,x,y100);
    end;
    if (i == len-3 && pt(i) ~= 10)
        x = cat(1,x,y1000);
    end;
end; % end of for loop

%Step 5: The final output sample is then converted back into speech
%(using the 'wavwrite' function in MATLAB).
%Note: in recent MATLAB releases, audiowrite('output.wav',x,Fs) plays the same role.
wavwrite(x,Fs,'output');
plot(x)
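
The text-processing rule of Step 3 can also be checked without any recordings by producing the list of words that would be spoken. The following is a minimal sketch under that assumption, limited to inputs of one to four digits; the variable names (unitWords, teenWords, tenWords, words) are hypothetical and do not appear in the listing above. Unlike the listing, it also says zero for an all-zero input such as '00'.

```matlab
% Sketch: map a digit string of up to four digits to the words to be spoken.
sen = '358';   % example input (hypothetical)

unitWords = {'one','two','three','four','five','six','seven','eight','nine'};
teenWords = {'ten','eleven','twelve','thirteen','fourteen','fifteen', ...
             'sixteen','seventeen','eighteen','nineteen'};
tenWords  = {'','twenty','thirty','forty','fifty','sixty','seventy', ...
             'eighty','ninety'};

% Reject anything that is not a string of 1 to 4 digits.
assert(~isempty(sen) && numel(sen) <= 4 && all(sen >= '0' & sen <= '9'), ...
       'Expected a string of 1 to 4 digits');

d = sen - '0';                      % numeric value of each digit
d = [zeros(1, 4 - numel(d)), d];    % left-pad: thousands, hundreds, tens, units

words = {};
if d(1) > 0, words = [words, unitWords(d(1)), {'thousand'}]; end
if d(2) > 0, words = [words, unitWords(d(2)), {'hundred'}];  end
if d(3) == 1
    words = [words, teenWords(d(4) + 1)];      % ten to nineteen
else
    if d(3) > 1, words = [words, tenWords(d(3))];  end
    if d(4) > 0, words = [words, unitWords(d(4))]; end
end
if isempty(words), words = {'zero'}; end

disp(strjoin(words, ' '));
```

For the input '358' this prints three hundred fifty eight, matching the example in Step 3. Keeping the word selection separate from the audio concatenation makes the position rules easier to test before any sound files are involved.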

Results:

Conclusions and Applications:

• The program successfully converted the input number into synthesized speech.

• A text-to-speech (TTS) system has many applications. In particular, an intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer.

• An example of a fully developed speech synthesis framework is the Festival Speech Synthesis System. The system covers the complete process from textual input, such as the text you are reading here, to audio output in the form of wave samples. The full source is freely available for those who want to look at the working details of such a system.