Requirements Specification

for

ESpeak :

Version 1.48.15

Prepared by Dimitrios Koufounakis

January 10, 2018

Copyright © 2002 by Karl E. Wiegers. Permission is granted to use, modify, and distribute this document. Software Requirements Specification for Page ii

Table of Contents

Revision History
1. Introduction
   1.1 Purpose
   1.2 Document Conventions
   1.3 Intended Audience and Reading Suggestions
   1.4 Project Scope
   1.5 References
2. Overall Description
   2.1 Product Perspective
   2.2 Product Features
   2.3 User Classes and Characteristics
   2.4 Operating Environment
   2.5 Design and Implementation Constraints
   2.6 User Documentation
   2.7 Assumptions and Dependencies
3. System Features
   3.1 System Feature 1
   3.2 System Feature 2 (and so on)
4. External Interface Requirements
   4.1 User Interfaces
   4.2 Hardware Interfaces
   4.3 Software Interfaces
   4.4 Communications Interfaces
5. Other Nonfunctional Requirements
   5.1 Performance Requirements
   5.2 Safety Requirements
   5.3 Security Requirements
   5.4 Software Quality Attributes
6. Other Requirements
Appendix A: Glossary
Appendix B: Analysis Models
Appendix : Issues List

1. Introduction

1.1 Purpose

The purpose of this document is to specify all the requirements of eSpeak: Speech Synthesis version 1.48.15. eSpeak: Speech Synthesis is a compact open source software speech synthesizer for English and other languages, available for Linux and Windows. eSpeak can be used to create an audio file that contains a spoken rendering of a given input text file.

1.2 Document Conventions

This SRS document follows the Karl E. Wiegers template. It is the only document describing the requirements of this software for the current version (1.48.15). Any future change to the software's requirements must go through a formal change and approval process for this document. The display on some devices may look different from the screenshots in this document, which were taken on Windows 10.

1.3 Intended Audience and Reading Suggestions

The intended audience of this document is as follows:

1. Software engineers: The document can be used to understand and further develop the program.
2. Programmers: By reading the document, developers will be able to understand the software in depth, which would be more difficult if their only sources were the program's code and its GUI. This way, programmers can find which elements need to be improved and which features can be added in a future release.
3. Testers: The document provides useful information about how the program responds during use and what its restrictions are, so that testers can test it better and find weak spots.
4. Users: Users can get another view of the application, mainly by reading chapters 3 and 4, in addition to the User Manual.

1.4 Project Scope

eSpeak: Speech Synthesis is open source software distributed under the GNU General Public License version 3.0 (GPLv3). The scope of this software is to convert the text of a file to speech. The output sound can be played directly or saved to a .wav file for future use ("Save to .wav" button). Options such as voice selection, voice rate and volume are available. There is also a graphical representation of the mouth movement.

Figure: eSpeak: Speech Synthesis on Windows 10 (screenshot).

1.5 References

The information in this document has been taken from the following sources:

1. http://espeak.sourceforge.net/index.html (main website)
2. https://en.wikipedia.org/wiki/ESpeakNG (Wikipedia page)

2. Overall Description

2.1 Product Perspective

In 1995, Jonathan Duddington released the Speak speech synthesizer for RISC OS computers. On 17 February 2006, Speak 1.05 was released under the GPLv2 license, initially for Linux, with a Windows SAPI 5 version added in January 2007. Development on Speak continued until version 1.14, when it was renamed to eSpeak. From eSpeak 1.27 onwards, eSpeak has used the GPLv3 license. The last official eSpeak release was 1.48.04 for Windows and Linux, 1.47.06 for RISC OS and 1.45.04 for Mac OS X. The last development release of eSpeak was 1.48.15 on 16 April 2015.

To this day, eSpeak has been used by various users to listen to blogs and news sites. The eSpeak speech synthesizer supports several languages, some better than others; in many cases the language data are initial drafts that need more work. Assistance from native speakers is welcome for these or for new languages.

2.2 Product Features

• Includes different voices, whose characteristics can be altered.
• Can produce speech output as a WAV file.
• SSML (Speech Synthesis Markup Language) is supported (not completely), as is HTML.
• Compact size. The program and its data, including many languages, total about 2 MB.
• Can be used as a front end to MBROLA diphone voices, see .html. eSpeak converts text to phonemes with pitch and length information.
• Can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
• Potential for other languages. Several are included in varying stages of progress. Help from native speakers for these or other languages is welcome.
• Development tools are available for producing and tuning phoneme data.
• Written in C.

2.3 User Classes and Characteristics

Until now, this software has been used mainly as a software component in proprietary or open source projects. Some examples are:

• Windows
• Linux and some other distributions
• Android
• The latter used Speak (its predecessor).

It can be used by a variety of users. There is no restriction on who may use the software, as it is free and open source, but users can be divided into two main categories:

❖ General purpose users: Any user who needs to convert a text of considerable size into an audio file. These users do not need previous knowledge of the software, because its functions are few and simple.

❖ Software engineers/developers in OSS or proprietary software: As mentioned above, eSpeak can be used as a software component for a specific purpose in information systems by any software company or individual with the technical expertise. Users in this category need an in-depth knowledge and understanding of the software's functionality in order to adapt it and integrate it into their own software system.

2.4 Operating Environment

eSpeak is available for the following operating systems:

• Windows.
• Linux.
• Mac OS X.
• RISC OS for ARM processors.
• eSpeak can also be ported to other platforms, including Android and Solaris.

Precompiled versions exist for each of these operating systems except Android and Solaris.

❖ In future development of the source code, platform-specific libraries should be avoided, as they make it impossible to compile the source code for other operating systems.

2.5 Design and Implementation Constraints

All the source files, including the graphical user interface, have been written in C. An in-depth understanding of the C/C++ language is required in order to further develop the project or even to understand it. The documentation is written entirely in English, so knowledge of English is also required. The software is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. Those who use, redistribute and/or modify it must agree to and fully accept the terms and conditions of the GNU GPLv3, under which it is licensed.

2.6 User Documentation

The documentation for the command line interface (CLI) form of the program can be found at http://espeak.sourceforge.net/commands.html; it explains how the CLI program should be used. eSpeak is available as a command line program for Linux and Windows only, but its graphical user interface version is provided for all the environments mentioned above. The general usage manual can be found at http://espeak.sourceforge.net/docindex.html.
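As an illustration of the CLI form, the following minimal Python sketch only assembles an argument list for the espeak program; it does not run it. The helper name build_espeak_command and its parameters are hypothetical. The options used (-v for the voice, -w to write a WAV file, -m for markup input) are the ones described in the command documentation linked above, and should be checked there for the installed version.

```python
from typing import List, Optional


def build_espeak_command(
    text: str,
    voice: Optional[str] = None,
    wav_path: Optional[str] = None,
    markup: bool = False,
) -> List[str]:
    """Build an argument list for the espeak command line program.

    Hypothetical helper: nothing is executed here. The result can be
    passed to subprocess.run() on a machine where espeak is installed.
    """
    command = ["espeak"]
    if voice is not None:
        command += ["-v", voice]      # -v selects the voice
    if wav_path is not None:
        command += ["-w", wav_path]   # -w writes a WAV file instead of playing
    if markup:
        command.append("-m")          # -m treats the input as SSML/XML markup
    command.append(text)              # the text to speak comes last
    return command
```

For example, build_espeak_command("Hello world", voice="en-us", wav_path="hello.wav") yields the arguments for saving the spoken text to hello.wav rather than playing it.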

2.7 Assumptions and Dependencies

eSpeak uses the PortAudio sound library (version 18), so the libportaudio0 library package must be installed. It may already be installed, since it is used by other software such as OpenOffice.org and the Audacity sound editor. Some Linux distributions (e.g. SuSE 10) ship version 19 of PortAudio, which has a slightly different API. The speak program can be compiled to use version 19 of PortAudio by copying the file portaudio19.h to portaudio.h before compiling. The speak program may also be compiled without PortAudio by removing the line #define USE_PORTAUDIO in the file speech.h.

Information on how to compile eSpeak for Linux can be found through a Google search.

3. System Features

The main page of the application (which is also the only page) has a text input box, all the function buttons and some options for the voice and its sound. It also has a function (button) queue that displays the functions (buttons) that were pressed most recently.

3.1 Voice Altering.

3.1.1 Description

Under the input text box, there is a combobox named "Voice". Different options appear in this combobox, each with its own characteristics. The voice can, for instance, be male, or English with a British or American accent.

3.1.2 Stimulus/Response Sequences

The user opens the "Voice" combobox by clicking on it. The combobox then shows all the available voices. To select an individual voice, the user clicks on one of the options. Once a voice has been selected, the combobox shows it as the selected one. Nothing else changes until the user presses the "speak" button, at which point the voice starts playing with the new acoustic characteristics.

3.1.3 Functional Requirements

REQ-1: Every supported voice format must be shown in the corresponding list.
REQ-2: When a voice is selected, the combobox must show the selected voice.
REQ-3: After the user has selected the preferred voice and has pressed the "speak" button, the acoustic characteristics of the voice playing must be equivalent to those of the previously selected voice.

3.2 Produce speech from input text or text file "speak".

3.2.1 Description

Beside the input text box, there is a button named "speak". Provided that text input has been given, either by typing in the input text box or by pressing the "open file" button, which transfers the text from a file to the input text box, a voice is produced in the corresponding language.

3.2.2 Stimulus/Response Sequences

Provided that text input has been given, when "speak" is pressed a voice is produced that pronounces each word given as input.

3.2.3 Functional Requirements

REQ-1: All the words given as input must be pronounced correctly and in the right order.

3.3 Pause/Resume button and Stop button.

3.3.1 Description

The Pause/Resume button has two titles, one of which is shown depending on the state of the program. Once the "speak" button has been pressed and the voice has started playing, the button is titled "Pause"; if it is pressed, its title changes to "Resume". If the Stop button is pressed while the voice is playing, the voice stops, the position of the last word pronounced is not saved, and the whole procedure must start again from the beginning.

3.3.2 Stimulus/Response Sequences

If Pause is pressed, the voice pauses and the title of the button changes to "Resume". If Resume is pressed, the voice resumes and the title of the button changes to "Pause". The Stop button stops voice production without saving the last point that was pronounced in the text.

3.3.3 Functional Requirements

REQ-1: The Resume button must change its title to "Pause" once it is pressed, and vice versa.
REQ-2: If Resume is pressed, sound production must continue from the point at which the Pause button was pressed.
REQ-3: If the Pause button is pressed, sound production must pause, saving the point in the text at which this happened.
REQ-4: If Stop is pressed, sound production must stop without saving the last point in the text.
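The behaviour required by REQ-1 to REQ-4 can be read as a small state machine. The following Python sketch is only a hypothetical model for reasoning about and testing these requirements; the class, state and method names are assumptions and do not come from the eSpeak source code.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    SPEAKING = "speaking"
    PAUSED = "paused"


class PlaybackModel:
    """Hypothetical model of the speak, Pause/Resume and Stop buttons."""

    def __init__(self, words):
        self.words = list(words)
        self.position = 0          # index of the next word to pronounce
        self.state = State.IDLE

    @property
    def pause_button_title(self):
        # REQ-1: the button reads "Resume" only while playback is paused.
        return "Resume" if self.state is State.PAUSED else "Pause"

    def speak(self):
        self.position = 0
        self.state = State.SPEAKING

    def pronounce_next(self):
        """Advance by one word while speaking; return the word or None."""
        if self.state is not State.SPEAKING:
            return None
        if self.position >= len(self.words):
            self.state = State.IDLE
            return None
        word = self.words[self.position]
        self.position += 1
        return word

    def toggle_pause(self):
        if self.state is State.SPEAKING:
            self.state = State.PAUSED      # REQ-3: the position is kept
        elif self.state is State.PAUSED:
            self.state = State.SPEAKING    # REQ-2: continue from the position

    def stop(self):
        # REQ-4: the position is discarded.
        self.position = 0
        self.state = State.IDLE
```

A tester could drive such a model alongside the real GUI and compare the button title and the resumed word after each action.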

3.4 Save to .wav button and Speak .wav button.

3.4.1 Description

Instead of playing the sound of the given input text immediately, the "Save to .wav" button saves it to a .wav file for later use. The "speak .wav" button, once pressed, prompts the user to select a .wav file from the computer's files and then plays that file like a media player.

3.4.2 Stimulus/Response Sequences

If the "Save to .wav" button is pressed, a "Save As" window opens in which the user selects the directory and the name under which the .wav file will be saved. The "speak .wav" button opens an "Open" window in which the user selects the file to be played. Once the file is chosen, eSpeak plays it.

3.4.3 Functional Requirements

REQ-1: The "Save to .wav" button must correctly produce the corresponding .wav file.
REQ-2: The "speak .wav" button must correctly play the corresponding .wav file.
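One way a tester might check part of REQ-1 is to read the header of the saved file and confirm that it is a valid WAV file with the expected parameters. The sketch below uses Python's standard wave module. The function names are hypothetical, and the write_silence helper is only a stand-in for a file saved by eSpeak, so that the example runs without the application.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (channels, bits_per_sample, sample_rate_hz, duration_s)."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        bits = wav_file.getsampwidth() * 8
        rate = wav_file.getframerate()
        duration = wav_file.getnframes() / rate
    return channels, bits, rate, duration


def write_silence(path, seconds=1, rate=16000):
    """Stand-in for a file saved by eSpeak: 16-bit mono silence."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)          # 2 bytes = 16 bits per sample
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * rate * seconds)


def roundtrip_demo():
    """Write a stand-in file to a temporary directory and describe it."""
    path = os.path.join(tempfile.mkdtemp(), "sample.wav")
    write_silence(path)
    return describe_wav(path)
```

In a real test, describe_wav would be pointed at the file chosen in the "Save As" window, and the result compared with the option selected in the Format combobox.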

3.5 Rate, Volume and Format Options.

3.5.1 Description

The Rate bar adjusts the speaking rate, that is, how many words are pronounced in a given time. The Volume bar adjusts the volume of the sound that is produced. The Format combobox lets the user choose the sampling rate (Hz), the bit size, and whether the sound will be stereo or mono. This option determines the quality of the sound.

3.5.2 Stimulus/Response Sequences

If the Rate bar is moved higher, the word rate increases; in other words, eSpeak speaks more quickly. The opposite is also true. If the Volume bar is moved higher, the sound gets louder. The opposite is also true. If an option such as "24kHz, 8 bit, stereo" is selected in Format, sound of the corresponding quality is produced.

3.5.3 Functional Requirements

REQ-1: When the Rate bar moves, the word rate of the sound must change accordingly.
REQ-2: When the Volume bar moves, the volume of the sound must change accordingly.
REQ-3: For each option in the Format combobox, sound of the appropriate quality must be produced.
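To make the link between a Format option and sound quality concrete, the following Python sketch computes the uncompressed data rate implied by a format label: sampling rate times bytes per sample times number of channels. The function names are hypothetical, and the sketch assumes that a label such as "16 kHz" means exactly 16000 Hz, which may not hold for every option in the combobox.

```python
import re


def parse_format(label):
    """Parse a label such as "16 kHz 16 bit Mono" into numeric fields."""
    match = re.search(
        r"(\d+)\s*kHz\W*(\d+)\s*bit\W*(mono|stereo)", label, re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unrecognised format label: {label!r}")
    sample_rate_hz = int(match.group(1)) * 1000   # assumption: exact kHz
    bits_per_sample = int(match.group(2))
    channels = 2 if match.group(3).lower() == "stereo" else 1
    return sample_rate_hz, bits_per_sample, channels


def bytes_per_second(label):
    """Uncompressed PCM data rate: rate x bytes per sample x channels."""
    sample_rate_hz, bits_per_sample, channels = parse_format(label)
    return sample_rate_hz * (bits_per_sample // 8) * channels
```

Under these assumptions, "16 kHz 16 bit Mono" corresponds to 16000 x 2 x 1 = 32000 bytes per second, and "24kHz, 8 bit, stereo" to 24000 x 1 x 2 = 48000 bytes per second.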

3.6 Open File, Skip and Reset buttons.

3.6.1 Description

The "Open File" button opens a text file and copies all of its data into the input text box. The "Skip" button, which has a spin box attached to it, skips n words in the text box so that they are not pronounced. The "Reset" button resets the input text box, the rate, volume and format options and the "Show all events" check box.

3.6.2 Stimulus/Response Sequences

If the "Open File" button is pressed, an "Open" window appears that prompts the user to select a text file. After the user has selected a text file, eSpeak copies its data into the input text box in the main window of the application. If the "Skip" button is pressed with the number n selected in the spin box, eSpeak skips n words in the input text box.

If "Reset" is pressed, the input text box, the voice, the rate, volume and format options and the "Show all events" check box all return to their initial state, which is as follows:

• Input text box: "Enter text you wish spoken here."
• Voice: "eSpeak-EN-US"
• Rate: bar in the middle.
• Volume: bar at maximum.
• Format: "16 kHz 16 bit Mono"
• Show all events: unchecked.

3.6.3 Functional Requirements

REQ-1: When the "Open File" button is pressed and a text file is selected, its data must be copied to the input text box.
REQ-2: When the "Skip" button is pressed, n words from the input text box must be ignored.
REQ-3: When the "Reset" button is pressed, all the fields mentioned in 3.6.2 must return to their initial state.
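REQ-2 and REQ-3 can be expressed as two small functions. The Python sketch below is a hypothetical model, not eSpeak code: the dictionary keys are invented for illustration, the rate and volume positions are recorded only as descriptive strings, and a "word" is assumed to be any run of characters separated by whitespace.

```python
# Initial state of the fields listed in 3.6.2 (key names are hypothetical).
DEFAULTS = {
    "text": "Enter text you wish spoken here.",
    "voice": "eSpeak-EN-US",
    "rate": "middle",
    "volume": "max",
    "format": "16 kHz 16 bit Mono",
    "show_all_events": False,
}


def skip_words(text, n):
    """Return the text that remains after skipping the first n words."""
    words = text.split()                  # assumption: whitespace-separated
    return " ".join(words[max(n, 0):])


def reset(settings):
    """Restore every field to its initial state, in place."""
    settings.clear()
    settings.update(DEFAULTS)
    return settings
```

With this model, skipping 2 words of "one two three four" leaves "three four", and skipping more words than the text contains leaves nothing to pronounce.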

3.7 Process XML check button (SSML).

3.7.1 Description

This function concerns the recognition of the XML-based language known as SSML (Speech Synthesis Markup Language). It can also process text in any XML-based language in order to remove the tags, so it can process HTML text as well. SSML and HTML in the same file are also acceptable. SSML inside an input text makes it possible for each word to have its own attributes. For instance, the first word can be pronounced with a certain volume and rate, while the second word is pronounced with a different volume and rate or even with another accent. A time delay between the pronunciation of words is also possible.

3.7.2 Stimulus/Response Sequences

If the "Process XML" check box is checked, the text inside tags is not pronounced. Instead, it has a special meaning for the parser, which assigns specific acoustic attributes to the words outside the XML tags. Every tag that is not part of the SSML tag set falls into the category of "tag removal", which means that the tag is removed and only its contents are heard. This is useful, for instance, if the user wants to give a web page as input to eSpeak.

3.7.3 Functional Requirements

REQ-1: The tag remover must work according to its requirements.
REQ-2: The SSML parser must also work according to its requirements.
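The "tag removal" behaviour can be illustrated with Python's standard html.parser module. This is a minimal sketch of the idea, not eSpeak's own parser: the class and function names are hypothetical, and the sketch only discards tags, without interpreting any SSML attributes.

```python
from html.parser import HTMLParser


class TagRemover(HTMLParser):
    """Collect only the text found between tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text outside tags; tags themselves are ignored.
        self.chunks.append(data)


def strip_tags(markup):
    """Return the markup with every tag removed and whitespace collapsed."""
    remover = TagRemover()
    remover.feed(markup)
    remover.close()
    return " ".join("".join(remover.chunks).split())
```

For example, strip_tags("<p>Hello <b>world</b></p>") gives "Hello world", which is the text a user would expect to hear when a web page is given as input.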

4. External Interface Requirements

4.1 User Interfaces

The Graphical User Interface of eSpeak is the simplest form of desktop GUI, and it handles the communication between the software and the user. It consists of only one main screen, which contains all the controls. There is no menu bar.

The functionality has been described above.

4.2 Hardware Interfaces

The hardware requirements are described in this section. To use this software, the user should download the appropriate file for his/her operating system and follow the installation wizard. At least 2 MB of free disk space is required. Required devices: screen, speakers, mouse.

4.3 Software Interfaces

eSpeak is written in C, which means that there is a different installation file for each operating system.

4.4 Communications Interfaces

eSpeak is not a web application, so there are no communications interfaces.

5. Other Nonfunctional Requirements

5.1 Performance Requirements

eSpeak is lightweight software. With regard to RAM usage, repeated observations showed that the software never used more than 13 MB. The worst case scenario was assumed to be the opening of a very large text file, and even under those circumstances RAM usage did not increase excessively.

5.2 Safety Requirements

The only data the software could harm are the files that it opens. eSpeak does not modify files; it only reads them, so loss of data is not possible.

5.3 Security Requirements

There are no security measures in eSpeak. If the user presses the 'x' button, no dialog box is shown asking whether he/she wants to exit. In addition, all users have the same rights.

5.4 Software Quality Attributes

❖ Easy to use. eSpeak has a very simple GUI, and the user does not need any prior knowledge.

❖ Easy maintenance. The software consists of C++ classes, so its parts can be extended or tested individually.

Appendix: Glossary

• GUI: Graphical User Interface is a type of interface that allows users to interact with electronic devices through graphical icons and visual indicators such as secondary notation, as opposed to text-based interfaces, typed command labels or text navigation.

• Library: In computer science, a library is a collection of non-volatile resources used by computer programs, often to develop software. These may include configuration data, documentation, help data, message templates, pre-written code and subroutines, classes, values or type specifications.

• MB: Megabyte, a multiple of the byte used as a unit of digital information. In this document, 1 MB = 1,048,576 bytes.

• OS: Operating System is a program that acts as an interface between the user and the computer hardware and controls the execution of all kinds of programs.

• RAM: Random Access Memory is a form of computer data storage.

• GPLv3: General Public License Version 3 is a widely used license, which guarantees end users (individuals, organizations, companies) the freedoms to run, study, share (copy) and modify the software.