Computer Science

Computer Networks

Piotr Leszczyński Book No. s4207

Remote voice Web browser for people with sight impairment

Zdalna głosowa przeglądarka WWW dla osób niewidomych

Engineering Thesis Written under the advice of Ph.D. Eng. Przemysław Skurowski

Bytom September 2009

Contents

1 Introduction

2 A brief review of speech synthesis

2.1 Human speech synthesis

2.2 Text-To-Speech systems overview

2.2.2 Concatenation Speech Systems

2.2.3 Articulator Speech Systems

2.2.4 History

3 Application modeling and implementation

3.1 Application concept

3.2 Functional requirements

3.3 Non-Functional requirements

3.4 Feasibility analysis

3.5 Technical limitations

3.5.1 Accessibility

3.5.2 Speech synthesis

3.5.3 Interpretation of Web pages

4 Technology

4.1 Java

4.2 Java Web Start

4.3 FreeTTS speech system

4.4 Netbeans framework

5 Design

5.1 Communication

5.2 Client

5.2.1 Modular design

5.2.2 Portability

5.2.3 Model-View-Controller architecture pattern

5.2.4 Class model

5.2.5 Sequence diagram

5.2.6 Dependency model

5.2.7 Interface

5.2.8 Compatibility

5.3 Server

5.3.1 Class model

5.3.2 Compatibility

5.4 Development challenges

6 Testing

6.1 Live web testing

6.1.1 Method

6.1.2 Results

6.1.3 Feedback

6.2 Synthetic testing

6.2.1 Method

6.2.2 The results

7 Conclusions

8 Summary in Polish

9 Bibliography

A Installation and use

A.1 Installation

A.2 Use

B Client compatibility list

C Server compatibility list


1 Introduction

Human beings possess five senses; according to Aristotelian psychology these are sight, hearing, smell, taste and touch [1]. When interfacing with computers we mainly use only hearing and sight. There are projects introducing smell into the equation, but those have not gone mainstream yet.

Though the main burden of communication lies on sight, we use hearing to augment multimedia and to enhance system and application communication. In most cases it would be impossible even to log in to the operating system without the sense of sight, not to mention doing anything else. That is why the situation of people with sight impairment is so difficult when it comes to interfacing with computers. It often requires a very expensive set of software, with top-of-the-line products costing between $500 and $1300 [2]: a combination of a screen reader, whose task is to identify and interpret the output of a computer screen (a task performed by the monitor and the human brain in the case of people without sight impairment), and a text-to-speech or Braille output device. Text-to-speech is the preferred and most natural method of presenting interpreted text.

I have decided to take on the problem of making computers in public places such as schools, public administration offices, airports and libraries accessible to people with sight impairment, specifically with regard to World Wide Web access. There are major obstacles to adapting computers for use by sight impaired persons. The first is the high cost of the software, which in places like schools, universities and libraries with hundreds of computers could rise to astronomical levels. Few schools can afford it when they cannot even afford all the computers they need.


The next issue is that the software needs to be installed, configured and maintained, which adds to the already high costs. These were the issues I wanted to either solve or alleviate.

The idea behind this project was to provide a Web based application that would identify and interpret a WWW page and present the output to the user with text-to-speech technology: an application that would need no installation or configuration, would be easy for persons with sight impairment to run and handle, and could be run on any computer connected to the Internet regardless of its architecture or operating system. This document presents the end result of that idea, and a working application is provided on the attached disk. I hope you find it interesting and useful.


2 A brief review of speech synthesis

This chapter introduces the basic information needed to understand how speech synthesis works on the human and mechanical levels.

2.1 Human speech synthesis

Modern researchers believe humans possessed speech abilities as early as 300,000 years ago, after the evolution of the Neanderthals [13], yet speech synthesis has been a documented subject of research for only about a century, with many of the biggest breakthroughs happening in the last 20 years.

Two major centers are responsible for the process of human speech creation: the lungs and the larynx, with its vocal cords and glottis. Humans create sound when the air pumped by the lungs moves over the vocal cords and makes them vibrate. Changing sound into speech is a much more complicated process, though; it involves the creation of phonation in the glottis and its modification into different vowels and consonants. The resulting speech is then shaped by a complex movement of the lips, tongue and soft palate, whose purpose is to filter out some frequencies and resonate others. [14]


Fig. 2.1: Human vocal tract 1

2.2 Text-To-Speech systems overview

The simplest description of a Text-To-Speech system is an application that can produce a speech sequence from supplied text. There are generally three kinds of speech synthesis methods: concatenative, articulatory and formant. Despite their differences, they all follow the model of human speech production.

Fig. 2.2: Human speech model 2

1 Human vocal tract, picture taken from the May-June 2008 issue of Duke Magazine.


2.2.2 Concatenation Speech Systems

It is the most popular and widely used method of speech synthesis. There are two methods of concatenation. The first one concatenates single words or parts of sentences from a database to create the speech. Such systems are called “Voice Response Systems” and their usability is limited to situations where a rich vocabulary is not needed and the sentence structure is strictly predefined, for example automated telephone systems or train station arrival and departure announcement systems. This method generally produces one of the best speech qualities at the cost of versatility.

The second one concatenates diphones or phones, the two smallest segments of speech in English needed to pronounce text [3]. This method allows pronouncing almost anything in the vocabulary at the cost of lower quality output. Prime examples are Festival and FreeTTS, both open source solutions, and Ivona, a commercial solution made in Poland and considered the best voice quality synthesizer in the world [4][5].

2.2.3 Articulator Speech Systems

It is the most complex and natural sounding method of speech synthesis. It tries to mimic the human vocal tract as closely as today’s technology allows by creating a computational model of every element of the vocal tract and its articulation processes. Speech is generated by simulating airflow through the model. There are a few working articulatory speech systems: gnuspeech, an open source system, and NeXT, a commercial speech system currently belonging to Apple, both originating from a system originally developed by Trillium Sound Research.

2 Human speech model, picture taken from The Scientist and Engineer's Guide to Digital Signal Processing By Steven W. Smith, Ph.D. http://www.dspguide.com/ch22/6.htm

Method | Complexity | Quality
Concatenative Speech Synthesis | Simplest | Variation from poor to good speech quality with minor sound artefacts
Formant Speech Synthesis | Moderate | Natural and good quality speech
Articulative Speech Synthesis | Complex | Very good quality and most natural speech

Fig. 2.3: Comparison of speech synthesis methods

2.2.4 History

One of the first practical applications of speech synthesis was British Telephone Company’s speaking clock, an equivalent of the Polish Zegarynka. It concatenated words from optical storage to form the sequence "At the third stroke, the time from BT will be (hour) (minute) and (second) seconds". It is still in operation after 72 years, with storage method upgrades and four different voices over the years, the latest and permanent one being that of Sara Mendes da Costa [5][7].

In 1939 Bell Laboratories developed the Voder (Voice Operating Demonstrator), a mechanical device operated by pedals and mechanical keys. It is considered to be the first true speech synthesizer. It was based on the Vocoder (Voice Coder), a device that analyzes speech in order to reconstruct an approximation of it. The Voder required a very skilled operator, but its output could sound almost like speech. Despite its shortcomings, it was the first device that showed the potential of artificial speech systems and opened the way for further developments. [7]


Fig. 2.4: Voder schematics 3

3 Voder schematics, picture taken from the “History and Development of Speech synthesis” article at http://www.acoustics.hut.fi/publications/files/theses/lemmetty_mst/chap2.html


3 Application modeling and implementation

This chapter introduces the analysis and implementation phases of the development of the Remote Voice Web Browser, a Client-Server application providing universal Web access for people with sight impairment on any PC class computer connected to the Internet, regardless of its architecture and operating system.

First, the project requirements and their analysis with respect to the needs of people with sight impairment are presented. They are followed by the technical limitations and a summary of the technology used in the development of the application.

3.1 Application concept

Fig. 3.1: Application concept

1. A person with sight impairment asks a nearby bystander to open a Web page on a public computer
2. The bystander opens a Web page containing the application
3. A request to deploy the client application is made to the deployment server
4. The client application is deployed
5. The application starts and guides the person with sight impairment through its use
6. The application is instructed to open a Web page
7. The task is delegated to an application server
8. A request is made to a WWW server somewhere on the Internet for the content of the Web page
9. The content of the requested Web page is returned to the application server
10. The content of the Web page is parsed into text, normalized and formatted for speech synthesis, then sent back to the client application
11. The content of the requested Web page is further parsed, then output to the user as synthesized speech

3.2 Functional requirements

The Voice Web Browser application has to fulfill the following functional requirements:

• Interpret any general Web pages
• Create a voice navigation table for each Web page
• Output interpreted Web pages through voice system
• Provide a voice navigation system
• Provide a voice feedback system
• Manage focus and navigation of GUI elements as required by the needs of people with sight impairment
• Work as a Web based application
• Work without installation and configuration
• Be platform independent
• Be easy to use and intuitive for people with sight impairment

3.3 Non-Functional requirements

The Voice Web Browser application has to fulfill the following non-functional requirements:

• Provide a modular design based on Netbeans API
• Support easy maintainability and modifiability
• Client side application should be efficient enough to meet the performance requirements of office type computers
• Server side application should provide scalability for hundreds of users on a typical home WWW server
• Be based on Open Source libraries
• Be distributed under an Open Source license (only on condition that the promoter and University authorities give their permission)

3.4 Feasibility analysis

• Platform independence should be achieved by writing the client and server applications in an interpreted, platform independent language. Java was chosen as the most suitable solution, accompanied by Java native libraries for the subsystems.

• Interpretation of any general Web pages should be achieved with the help of a text Web browser. After a short review Lynx was chosen because it is known as a long-time standard text browser for website development.

• All the sound systems and sound output of Web pages should be achieved with the help of the Java Speech API and a feasible speech synthesizer. After analysis of the requirements the FreeTTS speech synthesizer API was chosen because it met most of the requirements.

• Modularity should be achieved with the help of the Netbeans API and by following design patterns like Model View Controller.

• Web deployability should be achieved with a technology that provides the portability of Web based applications while retaining the capabilities of desktop applications. Java Web Start proved to be the most suitable choice.

3.5 Technical limitations

3.5.1 Accessibility

The major technical limitation was providing an application that could work on any public computer without the need for installation and configuration. The solution was a Web based application but each Web technology has its limitations and finding a suitable one was another challenge in itself.

Java applets were one solution, but they have several limitations compared to desktop applications. Applets cannot:

• Read or write to the local file system
• Make connections except to the servers from which they were deployed
• Access native libraries
• Create processes on the local machine [15]

The solution to the problem proved to be Java Web Start. It possesses all the benefits of Web based applications while retaining the capabilities of desktop applications. More information about Java Web Start can be found in the technology section of this thesis.

3.5.2 Speech synthesis

The design assumed the use of a free open source speech synthesizer. Their number is limited, though, and the chosen one had to meet specific requirements. In the end most of the needs could not be met by open source synthesizers, so the requirements were toned down to providing average speech quality, being written in Java and being compatible with Java Web Start technology. The solution was FreeTTS, an open source concatenation based speech synthesizer.

3.5.3 Interpretation of Web pages

This was the most challenging barrier. While interpreting RSS feeds is fairly easy, general interpretation of Web pages for speech synthesis could be the topic of a master's thesis in itself. It is a complicated process, partially taken care of by large applications like Web browsers. Another part of the problem is that general Web pages are written for graphical display of data, not for sound reproduction. The solution to this problem was very limited and was provided by an old technology: text Web browsers dating back to the early 1990s. I used Lynx, a text based Web browser developed in 1992 by a team at the University of Kansas to distribute campus information.
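As an illustration of how a text browser can supply page content, the following minimal Java sketch builds a Lynx command line and tidies the resulting text. The `-dump` option is a standard Lynx option that prints the rendered page as plain text; the class name, the assumption that `lynx` is on the PATH, and the normalization rules are hypothetical and are not taken from the thesis code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LynxDumpSketch {
    // Builds the command for a plain-text dump of one page.
    // Assumption: the "lynx" executable is available on the PATH.
    static List<String> dumpCommand(String url) {
        return Arrays.asList("lynx", "-dump", url);
    }

    // Hypothetical normalization for speech output: trims every line,
    // collapses runs of whitespace into one space and drops blank lines.
    static List<String> normalize(String dump) {
        List<String> lines = new ArrayList<>();
        for (String raw : dump.split("\\R")) {
            String line = raw.trim().replaceAll("\\s+", " ");
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

The command list can be passed to `java.lang.ProcessBuilder`, and the captured output to `normalize` before it is handed to the speech synthesizer.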


4 Technology

This section describes all the technologies used in development of the Web Browser application.

4.1 Java

“A high-level programming language developed by Sun Microsystems. Java was originally called OAK, and was designed for handheld devices and set-top boxes. Oak was unsuccessful so in 1995 Sun changed the name to Java and modified the language to take advantage of the burgeoning World Wide Web.

Java is an object-oriented language similar to C++, but simplified to eliminate language features that cause common programming errors. Java source code files (files with a .java extension) are compiled into a format called bytecode (files with a .class extension), which can then be executed by a Java interpreter. Compiled Java code can run on most computers because Java interpreters and runtime environments, known as Java Virtual Machines (VMs), exist for most operating systems, including UNIX, the Macintosh OS, and Windows. Bytecode can also be converted directly into machine language instructions by a just-in-time compiler (JIT).

Java is a general purpose programming language with a number of features that make the language well suited for use on the World Wide Web. Small Java applications are called Java applets and can be downloaded from a Web server and run on your computer by a Java-compatible Web browser, such as Netscape Navigator or Microsoft Internet Explorer.” [8]


With today’s progress in Java development, the language is suited for desktop, corporate and server applications, and even for games. The list of Java-compatible Web browsers also includes Mozilla Firefox, Apple Safari, Google Chrome and many embedded browsers of graphical desktop environments.

4.2 Java Web Start

Java Web Start is a technology for deploying Java applications over the Internet, based on the Java Network Launching Protocol API (JNLP). JNLP provides a browser-independent architecture for deploying applications. The programmer is only required to write an XML file with the .jnlp extension describing all the needed jars and their locations; everything else is done on the client machine by the Web Start application, which is installed with every modern Java distribution. It is a very powerful system that gives Web based applications the capabilities of Java desktop applications while retaining the portability of Web based applications. Java Web Start can also update the Java Runtime Environment on the client machine if the deployed application requires it, thus guaranteeing that the application works properly on all client machines. [9]

4.3 FreeTTS speech system

FreeTTS is a speech synthesis system written in Java and based on Flite, a synthesis engine developed at Carnegie Mellon University. Flite is derived from the Festival Speech Synthesis System from the University of Edinburgh and the FestVox project from Carnegie Mellon University. [16]

FreeTTS has several drawbacks: it does not support JSML or any other markup language, its integration with the Netbeans API requires a rewrite of source code, its speech quality is average, and implementing a Polish voice while using Java Web Start proved impossible. It is nevertheless the free open source speech synthesizer written in Java with the best voice quality.

4.4 Netbeans framework

Netbeans is a generic framework for Swing applications that provides a flexible and reliable application architecture. It saves time by relieving the developer from writing all the boilerplate code for tabbed views, menus, explorer and palette types of views, saving state, connecting actions to menu items, toolbar items and keyboard shortcuts, and window management. It also encourages the use of good design pattern solutions, for example the Lookup API, the dependency system and many elements using the MVC model. [10]

The Netbeans API provides many out of the box components that make development much quicker and easier and that can be reused during development, along with many other benefits, for example the module system, which provides easy modifiability and maintainability. Some of the benefits of the Netbeans API are:

• Modular Runtime Container

The Netbeans runtime container provides lifecycle services to Swing applications, allowing a set of modules to be composed into a single Swing application. This modularity allows developers to organize the application code into separate versioned modules. Only modules that have explicitly declared dependencies are able to use code from other modules' exposed packages. This model helps greatly when developing or maintaining large applications developed by teams of engineers. There are benefits for the end users of the application as well: because modules are pluggable, users are able to install modules into the running application. In summary, the NetBeans runtime container provides an environment for modules that handles their lifecycle and enables them to interact with each other. [17]

• Loose Coupling & Context Sensitivity Management

NetBeans provides the Lookup API, an equivalent of the JDK 6 ServiceLoader class, which is an implementation of the Service Locator design pattern. Lookup offers the same functionality but is better suited to the Netbeans platform, providing dependency injection among other benefits. It enables modules to communicate with each other in a type-safe, uncoupled way: objects defined in one module can be used in another without the modules depending on each other. [17]
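The Service Locator idea can be sketched with the JDK's own `java.util.ServiceLoader`, which the text names as the equivalent of Lookup. The `SpeechOutput` interface and the `SilentOutput` fallback below are hypothetical names invented for this illustration; the sketch only shows how a caller obtains an implementation by its interface without referring to a concrete class.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

final class ServiceLookupSketch {
    /** Hypothetical service contract; not taken from the thesis code. */
    public interface SpeechOutput {
        String describe();
    }

    /** Fallback used when no provider has been registered. */
    static final class SilentOutput implements SpeechOutput {
        @Override
        public String describe() {
            return "silent fallback";
        }
    }

    // Asks the service loader for any registered provider of the interface.
    // Providers are registered in META-INF/services files on the classpath;
    // the caller never names a concrete implementation class.
    static SpeechOutput locate() {
        Iterator<SpeechOutput> providers =
                ServiceLoader.load(SpeechOutput.class).iterator();
        return providers.hasNext() ? providers.next() : new SilentOutput();
    }
}
```

On the NetBeans platform the corresponding call would be along the lines of `Lookup.getDefault().lookup(SpeechOutput.class)`, with providers contributed by whichever modules happen to be installed.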

• System FileSystem

The NetBeans filesystem offers the ability to install folders and files, for example settings, into the application filesystem, which enables them to be read by all the modules in the application. [17]

• Window System

The NetBeans Window System API provides a multi-window GUI with tabs and modes that can be maximized and minimized, docked and undocked, and dragged and dropped out of the box. It also takes care of interactions between all the windows in the system. [17]

• Data Management

The NetBeans Nodes API provides a generic model for Swing components like JList and JTable, which can be used in every component without rewriting the model. Nodes can also be used to display data in several Swing components provided by the Netbeans Explorer & Property Sheet API. [17]


5 Design

The application consists of two parts, a server and a client. The server application retrieves, interprets, normalizes and formats Web pages for speech synthesis and sends them to the client applications. The client application's task is to turn the interpreted data into speech and present it to the user through a voice interface consisting of voice navigation and voice feedback systems.

5.1 Communication

The communication between the server and the client applications is based on HTTP (Hypertext Transfer Protocol) with the MIME type application/x-java-serialized-object, which is used for transferring serialized Java objects. HTTP is a stateless protocol, so it does not need to store session data about users. The server is designed with a stateless architecture as well: all resources are released and destroyed after the transmission. A stateless server combined with a stateless protocol results in lower resource usage and higher efficiency on the server.
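The object transfer described above can be illustrated with standard Java serialization. This is a minimal sketch: the `PageRequest` payload class is hypothetical, and the round trip runs in memory rather than over a real HTTP connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializedTransferSketch {
    // MIME type named in the text for serialized Java objects.
    static final String MIME_TYPE = "application/x-java-serialized-object";

    /** Hypothetical request payload carrying the address of the page to interpret. */
    static final class PageRequest implements Serializable {
        private static final long serialVersionUID = 1L;
        final String url;

        PageRequest(String url) {
            this.url = url;
        }
    }

    // Serializes an object into the bytes that would form the HTTP body.
    static byte[] toBytes(Serializable value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Restores an object from a received HTTP body.
    static Object fromBytes(byte[] body) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(body))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("payload class is not on the classpath", e);
        }
    }
}
```

Over a real connection the client would open a `java.net.URLConnection`, call `setDoOutput(true)` and `setRequestProperty("Content-Type", MIME_TYPE)`, and write the bytes to the connection's output stream. Because every exchange carries everything the server needs, no session has to be kept between requests.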


Fig. 5.1: Communication diagram

5.2 Client

5.2.1 Modular design

The client part of the application is designed as a modular system. This design allows easy maintainability and modifiability and prevents the creation of "spaghetti" code. Existing modules can be removed at runtime and new modules can be installed without stopping the application. GUI modules of the application are uncoupled from the Model modules and the logic modules. Any module can be removed and the application will still work, just without the capability that was provided by the removed module. By using Lookup it is even possible to remove a module and have another module automatically take over its tasks without any changes to the code. Those are just the main benefits of the modular design.

5.2.2 Portability

The application is written entirely in Java and all the external libraries are native Java. There are no dependencies on system resources. The application should run on any system and hardware architecture supporting Java. The only limitation could be the ability to play two sound streams at the same time, which is needed for the application to work completely correctly and may not be available, for example, on some Linux distributions. The application was tested on Windows XP, Windows Vista and Windows 7 by myself and on Mac OS X 10.5.8 by Fabrizio Giudici, and proved to work flawlessly.

5.2.3 Model-View-Controller architecture pattern

Model-View-Controller is an architectural pattern designed to uncouple the model functionality from the presentation and control logic of the application. It allows different presentation layers to share the same data model; otherwise the model would need to be written again for every presentation layer, increasing the work of engineers. Using the pattern not only saves time but also makes the application easier to implement and maintain. [18]
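A minimal Java sketch of the pattern follows. The class names are hypothetical and the views are reduced to callbacks; the point is only that two different presentation layers share one model, which is updated through a controller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class MvcSketch {
    /** Model: holds the page text and notifies every registered view of changes. */
    static final class PageModel {
        private final List<Consumer<String>> views = new ArrayList<>();
        private String text = "";

        void addView(Consumer<String> view) {
            views.add(view);
        }

        void setText(String newText) {
            text = newText;
            for (Consumer<String> view : views) {
                view.accept(text);
            }
        }

        String getText() {
            return text;
        }
    }

    /** Controller: turns an application event into a model update. */
    static final class PageController {
        private final PageModel model;

        PageController(PageModel model) {
            this.model = model;
        }

        void pageLoaded(String content) {
            model.setText(content.trim());
        }
    }
}
```

A visual view and a speech view could both register with the same `PageModel` through `addView`; neither knows about the other, and the model is written only once.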

Fig. 5.2: Model-View-Controller overview diagram 4

5.2.4 Class model

This section contains the UML class model of the client part of the application. The model shows packages, classes and the relations between them and serves as an overview of the application design.

4 Model-View-Controller overview diagram, picture taken from the Sun Java Blueprints http://java.sun.com/blueprints/patterns/MVC-detailed.html

Fig. 5.3: Class diagram of the client application

Model and NodeModel are the Model part of the Model-View-Controller architectural pattern. Their purpose is to shape and encapsulate the data on which the application operates.

• Model module encapsulates Web data received from the server part of the application
• NodeModel encapsulates and provides MyNode objects which inherit from AbstractNode, a part of the Netbeans API responsible for the presentation layer

Communication, Controller and VoiceSynthesizer modules are the Controller part of MVC architectural pattern. Controller is responsible for the business logic of the application.

• Communication module is responsible for the data transfer between the server and the client parts of the application. Data is transferred as serialized objects both ways using the URL class. The address of a Web page to interpret is sent to the server on a user request and the server sends back the interpreted and formatted Web page. Both the server and the client then close their sessions; in addition, the server releases all its resources, following a stateless architecture.

• Controller implements the business logic of the application and is an intermediate layer between the GUI and the Model. Its tasks include, for example, informing modules of new tasks and controlling the communication and the voice synthesis.

• VoiceSynthesizer module is a Controller part of the MVC pattern. It is responsible for interfacing with the FreeTTS and Java Speech APIs, controlling the synthesizer parameters and speech flow, and creating synthesizer objects.

GUIOptions, GUIBrowser and GUIExplorer are the View parts of the MVC architectural pattern. Their task is to provide a GUI which is used to display and input information. Considering that the users have sight impairment, the GUIs are only used for inputting data and for providing a template and event management for the sound navigation system.

• GUIBrowser is the main interface window. It allows the user to type the Web address and the inner Web number and to pause and resume the speech queue. It also listens for the mouse wheel to adjust the sound volume.
• GUIOptions is an option panel for the application. It shows the current Web server address and allows changing it; in the future it will contain more settings as the application grows. It stores settings in the user home directory.
• GUIExplorer module is responsible for the presentation of Web data in the form of Nodes. It is mainly used together with the property sheet for development purposes, such as adjusting Web data formatting, but in the future it will be reused for a Web page template creator intended to increase the quality of interpreted Web pages.

5.2.5 Sequence diagram

• Scenario 1 Use Case: The user opens a Web page (Fig. 5.4)

• Scenario 2 Use Case: The user listens to speech synthesis, changes the speaking rate and volume, pauses speech synthesis, then resumes it, but cancels the speech before the Web page finishes (Fig. 5.5)

Fig. 5.4: Sequence diagram presenting first scenario Use Case

Fig. 5.5: Sequence diagram presenting second scenario Use Case

5.2.6 Dependency model

The dependency model shows the dependencies between modules in the application. The application is divided into eight modules and three layers. There are no mutual dependencies, which in addition to the modular system means that removing one module affects only one place in the application instead of the whole system, as happens in applications with the common “spaghetti” architecture.
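The absence of mutual dependencies can be checked mechanically by searching the dependency graph for cycles. The following Java sketch uses a depth-first search; the module names in the usage note come from the class model, but the edges between them are assumptions made for illustration, since the actual edges are only shown in the figure.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DependencyCheckSketch {
    // Returns true if some module depends, directly or indirectly, on itself.
    static boolean hasCycle(Map<String, List<String>> dependsOn) {
        Set<String> visited = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String module : dependsOn.keySet()) {
            if (visit(module, dependsOn, visited, onPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String module, Map<String, List<String>> dependsOn,
                                 Set<String> visited, Set<String> onPath) {
        if (onPath.contains(module)) {
            return true;   // reached a module that is already on the current path
        }
        if (!visited.add(module)) {
            return false;  // fully explored earlier without finding a cycle
        }
        onPath.add(module);
        List<String> dependencies =
                dependsOn.getOrDefault(module, Collections.<String>emptyList());
        for (String dependency : dependencies) {
            if (visit(dependency, dependsOn, visited, onPath)) {
                return true;
            }
        }
        onPath.remove(module);
        return false;
    }
}
```

For example, a map in which GUIBrowser depends on Controller, and Controller depends on Communication, VoiceSynthesizer and Model, yields `false`; adding a hypothetical dependency from Model back to GUIBrowser would make the check return `true`.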

Fig. 5.6: Client application's dependency model and layer separation

5.2.7 Interface

The interface is designed with people with sight impairment in mind. It consists of two text fields, three buttons, four sliders, a text area and two hidden panels for development purposes.

[Figure: annotated screenshot of the interface, labelling the text field used for inputting the Web address, the text field used for inner Web page navigation, the high contrast text area showing Web pages in a large font, the options button, the pause and resume buttons used for controlling speech synthesis, and the voice synthesizer settings]

• Web address field – used for inputting Web address of a desired Web page
• Inner Web page field – used for navigation inside a Web page
• Pause and resume buttons – used for controlling speech synthesis
• Speech synthesizer settings – used for controlling main and interface synthesizers settings like volume or speaking speed
• Options button – used for entering options panel of the application
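One way the numbers typed into the inner Web page field could be mapped to link targets is sketched below in Java. It assumes, as an illustration only, that the page text comes from a Lynx dump, which ends with a "References" section listing numbered link addresses; the class name and the parsing rules are hypothetical and are not taken from the thesis code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NavigationTableSketch {
    // Matches one reference entry such as "   3. http://example.org/page".
    private static final Pattern ENTRY =
            Pattern.compile("^\\s*(\\d+)\\.\\s+(\\S+)\\s*$");

    // Builds a number-to-address table from the "References" section of a dump.
    static Map<Integer, String> fromDump(String dump) {
        Map<Integer, String> table = new LinkedHashMap<>();
        boolean inReferences = false;
        for (String line : dump.split("\\R")) {
            if (line.trim().equals("References")) {
                inReferences = true;
                continue;
            }
            if (!inReferences) {
                continue;  // ignore numbered lines in the page body itself
            }
            Matcher matcher = ENTRY.matcher(line);
            if (matcher.matches()) {
                table.put(Integer.parseInt(matcher.group(1)), matcher.group(2));
            }
        }
        return table;
    }
}
```

With such a table, typing a number into the inner Web page field could be resolved to the address of the corresponding link and passed on as the next page to open.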

5.2.8 Compatibility

The application is compatible with all the operating systems and hardware supported by Java SE 6.0+ and Java Web Start. See Annex B for the current compatibility list from Sun resources.

5.3 Server

5.3.1 Class model

This section contains the UML class model of the server part of the application. The model shows packages, classes and their relationships. Unlike the client part, the server application is not designed as a modular application. It consists of a servlet, which is the view part of the MVC model and is responsible for communication with the client application, and a controller class for business logic such as opening the Lynx process and normalizing and formatting Web data.

Fig. 5.7: Server application's UML model

5.3.2 Compatibility

Server side compatibility consists of two things:

• Lynx compatibility: supported by most Linux distributions and by Windows versions through Cygwin
• List of Servlet compatible WWW servers, see Annex C

5.4 Development challenges

• Focus management on Netbeans platform

Focus management is very important for applications designed with people with sight impairment in mind. The application needs to work exactly as planned, and focus is a key player here. The application has to start with the right window active, in the right tab, with the predefined component focused in every circumstance. The transfer of focus has to happen according to a planned traversal policy. I used the Netbeans API in the development for its many advantages, but focus management was not one of them. It was a very hard task to complete. There is not much documentation on this topic and most of it just points to Swing focus management. The solution to this problem took a lot of time to complete but taught me a lot about the design and architecture of Swing applications.
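A planned traversal policy of the kind described above can be expressed in plain Swing/AWT by subclassing `java.awt.FocusTraversalPolicy`. The Java sketch below is a minimal illustration with a hypothetical class name; it does not reproduce the thesis code or any NetBeans-specific handling.

```java
import java.awt.Component;
import java.awt.Container;
import java.awt.FocusTraversalPolicy;
import java.util.ArrayList;
import java.util.List;

/** Moves focus through a fixed, explicitly ordered list of components. */
final class OrderedFocusPolicy extends FocusTraversalPolicy {
    private final List<Component> order;

    OrderedFocusPolicy(List<? extends Component> order) {
        if (order.isEmpty()) {
            throw new IllegalArgumentException("at least one component is required");
        }
        this.order = new ArrayList<>(order);
    }

    @Override
    public Component getComponentAfter(Container root, Component current) {
        int index = order.indexOf(current);
        // Unknown components fall back to the first entry; the last wraps around.
        return index < 0 ? order.get(0) : order.get((index + 1) % order.size());
    }

    @Override
    public Component getComponentBefore(Container root, Component current) {
        int index = order.indexOf(current);
        return index < 0
                ? order.get(0)
                : order.get((index - 1 + order.size()) % order.size());
    }

    @Override
    public Component getFirstComponent(Container root) {
        return order.get(0);
    }

    @Override
    public Component getLastComponent(Container root) {
        return order.get(order.size() - 1);
    }

    @Override
    public Component getDefaultComponent(Container root) {
        return order.get(0);  // the component focused when the window opens
    }
}
```

The policy would be installed on the enclosing container with `setFocusCycleRoot(true)` and `setFocusTraversalPolicy(policy)`, so that Tab and Shift+Tab always follow the planned order regardless of the layout order of the components.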


• Implementation of FreeTTS on Netbeans platform Web Start application

The FreeTTS library creates its own classloader that tries to find all the needed jars. It is not the job of a library to provide its own custom classloader; this is bad design in general, and it is even worse when building an application on the Netbeans API, which has strict dependency policy enforcement and a wrapper library system. Basically, FreeTTS would not work without a source code rewrite. This issue took weeks to resolve, including many attempts to change the default class loader in freetts.jar, all ending in disaster due to my lack of experience with class loaders. I finally resolved the issue after weeks of tinkering with Netbeans.


6 Testing

6.1 Live web testing

6.1.1 Method

In order to test performance, find bugs in the client and server applications and get professional feedback, I created a suitable live Web environment by inviting members of the Netbeans mailing group [email protected] to take part in the testing phase. The group was created for computer science students who have completed a 16-hour training course on the Netbeans API, and it is intended for exchanging new application ideas and sharing knowledge about the API. I took the course, which was organized by Silesian JUG, in May.

GlassFish, an open source application server, was set up for the members of the group. They were asked to try the application over the Internet and provide feedback while I monitored the server. Over 30 different IP addresses connected from all over the world, including that of a Polish student whose master's thesis was an aid system for blind and sight-impaired people.

6.1.2 Results

The application server crashed 10 minutes after the tests started due to too many open local processes. After I fixed the problem by closing all the streams opened by the processes and destroying the processes after each transaction, the server ran for five straight days without a crash while its parameters and its CPU and memory usage were continuously observed.
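The fix described above, releasing every stream a process opened and destroying the process after each transaction, can be sketched as follows. This is a minimal illustration and not the server's actual code: the class name `ExternalCommand` is hypothetical, and the command to run is passed in by the caller.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper: runs one external command per transaction and cleans up. */
final class ExternalCommand {

    private ExternalCommand() {
    }

    /** Runs the command, returns its combined output and always frees its resources. */
    static String run(List<String> command) {
        Process process;
        try {
            // Merge stderr into stdout so a single reader drains everything.
            process = new ProcessBuilder(command).redirectErrorStream(true).start();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not start " + command, e);
        }
        try {
            closeQuietly(process.getOutputStream()); // the child gets no input
            StringBuilder output = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    output.append(line).append('\n');
                }
            }
            process.waitFor();
            return output.toString();
        } catch (IOException e) {
            throw new UncheckedIOException("I/O failure while running " + command, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for " + command, e);
        } finally {
            // Release all three pipes and the process itself, whatever happened above.
            closeQuietly(process.getInputStream());
            closeQuietly(process.getErrorStream());
            process.destroy();
        }
    }

    private static void closeQuietly(Closeable stream) {
        try {
            stream.close();
        } catch (IOException ignored) {
            // nothing useful can be done if closing fails
        }
    }
}
```

A call such as `ExternalCommand.run(Arrays.asList("lynx", "-dump", url))` would fit the server described in this thesis, but the exact Lynx options used by the author are not given, so `-dump` is an assumption. Each child process holds three pipe descriptors, which is why leaving them open quickly exhausts the server under load.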


6.1.3 Feedback

“I'm sitting with Toni in Geneva right before the next training day and we've just been listening to my blog in your application! Great! However, we thought that the app should be more visually pleasing... but then we realized the target audience is blind. :-) So, absence of progress bar integration isn't a problem (except, maybe the progress bar could be integrated anyway and then elevator should be played during progress of accessing the requested site).”5

“Started correctly on Mac OS X 10.5.8 and it's speaking right now Pretty cool. It's the first Java talking application that I try.”6

6.2 Synthetic testing

6.2.1 Method

A profiler tool was used for the performance measurements. It monitors important information about the runtime behavior of an application, such as CPU performance, memory usage and thread states, while imposing low overhead. The parameters of the performance tests are listed below.

Hardware specification

CPU: Intel T4200 @ 2.0 GHz
Memory: 3.00 GB
System: Microsoft Vista

Software specification

5 Geertjan Wielenga - a technical writer and trainer for Netbeans
6 Fabrizio Giudici - Java Architect, Project Manager


Operating System: Windows Vista Home Basic
Java Development Kit: 1.6.0 update 14
Netbeans Platform: 6.7.1

Profiler Calibration results

Approximate time in one methodEntry()/methodExit() call pair:
When getting absolute timestamp only: 2.7497 microseconds
When getting thread CPU timestamp only: 1.1512 microseconds
When getting both timestamps: 3.682 microseconds

Approximate time in one methodEntry()/methodExit() call pair in sampled instrumentation mode: 0.2211 microseconds

Profiler performance test settings

Scope: Entire application
Filter: Profile project and subprojects
Method tracking: Exact call tree and timing
Exclude time spent in Thread.sleep() and Object.wait()
Limit number of profiled threads: 32
Instrumentation scheme: Total
Instrument: Method.invoke()

Testing scenario

Open a predefined list of websites and have the application read them while performing minor GUI operations such as pausing, resuming and changing the volume:

http://www.theinquirer.net/ and its sub pages
http://www.bbc.co.uk/ and its sub pages
http://mmorpg.com/ and its sub pages
http://www.gazeta.pl/ and its sub pages


6.2.2 The results

The tests took an hour and a half, and their purpose was to show the behavior of the application during normal use. Several important factors were taken into consideration, following a model of application profiling and two testing patterns. The first pattern assumed reading the full content of a Web page before opening a new one. The second took only 3-4 minutes and assumed opening new Web pages at short, consecutive intervals to simulate browsing. The test results and their analysis follow.

6.2.2.1 Surviving generations / Garbage Collector CPU time

Surviving generations shows how many different generations of objects are alive on the JVM heap, where the generation of an object is the number of garbage collections it has survived since the start of the application. Generally the number of surviving generations rises during application startup and stabilizes once the application has finished loading and all temporary objects have been destroyed. If the number keeps rising instead of stabilizing, it may mean that objects placed on the heap are never removed by the Java garbage collector. This is called a memory leak, and it can lead to inefficient resource management, stability problems or even out-of-memory crashes.

Fig. 6.1: Surviving generations and Garbage Collector diagram showing the first 90 seconds of application execution


During the first two minutes of execution the graph shows a rise in surviving generations, as expected from an application that is starting up. The results then stabilize and show no signs of a memory leak.

Fig. 6.2: Surviving generations and Garbage Collector diagram showing the full 90 minutes of testing cycle

During the next hour and a half the graph shows a steady number of surviving generations. A sharp rise was noted only after the application had been running for an hour and ten minutes. The rise soon stabilized and was not a sign of a memory leak. It was caused by the heavy load that followed the change in testing pattern to opening Web pages consecutively at short intervals.
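The garbage collector side of these charts can also be sampled from inside a running JVM without a profiler. The sketch below uses the standard `java.lang.management` API. The class name `GcOverhead` is hypothetical, and the sketch is not a replacement for the profiler measurements: the standard API reports collection counts and elapsed collection time only, and it does not expose surviving generations.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical sampler for garbage collector activity since JVM start. */
final class GcOverhead {

    private GcOverhead() {
    }

    /** Total number of collections reported by all collectors. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated elapsed collection time in milliseconds. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Share of JVM uptime spent in garbage collection, as a fraction. */
    static double gcTimeFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : (double) totalCollectionMillis() / uptime;
    }
}
```

Sampling `gcTimeFraction()` at regular intervals gives a rough trend line: a share that keeps climbing under steady load would be a reason to take a closer look with a profiler.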

6.2.2.2 Total heap size / Used heap size

The heap is the area of memory in which objects are stored at run time.

Fig. 6.3: Total/Unused heap size chart


The graph shows two things. First, the heap has enough spare capacity for garbage collection to work effectively. Second, the heap size is consistent, staying at a fairly even level between 50 and 60 megabytes.

6.2.2.3 Threads / Loaded classes

Fig. 6.4: Number of Threads and loaded classes chart

The graph shows the number of active threads and loaded classes. The data is consistent and shows nothing out of the ordinary.
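The quantities plotted in the heap chart and in the thread and class chart are also available through the standard management API, which makes it possible to log them in a deployed application. The following is a small sketch under assumptions: the class name `RuntimeSnapshot` and the notion of heap headroom (the unused share of the committed heap) are illustrative and do not come from the profiler.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical one-shot reading of heap, thread and class counters. */
final class RuntimeSnapshot {

    final long heapUsedBytes;
    final long heapCommittedBytes;
    final int liveThreads;
    final int loadedClasses;

    private RuntimeSnapshot(long used, long committed, int threads, int classes) {
        this.heapUsedBytes = used;
        this.heapCommittedBytes = committed;
        this.liveThreads = threads;
        this.loadedClasses = classes;
    }

    /** Reads the current values from the JVM. */
    static RuntimeSnapshot capture() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return new RuntimeSnapshot(
                heap.getUsed(),
                heap.getCommitted(),
                ManagementFactory.getThreadMXBean().getThreadCount(),
                ManagementFactory.getClassLoadingMXBean().getLoadedClassCount());
    }

    /** Unused share of the committed heap, between 0 and 1. */
    double heapHeadroom() {
        if (heapCommittedBytes <= 0) {
            return 0.0;
        }
        return 1.0 - (double) heapUsedBytes / heapCommittedBytes;
    }

    @Override
    public String toString() {
        return String.format("heap %d/%d bytes, %d threads, %d classes",
                heapUsedBytes, heapCommittedBytes, liveThreads, loadedClasses);
    }
}
```

Logging `RuntimeSnapshot.capture()` once a minute would produce the same kind of curves as the charts in this section, at a much coarser resolution.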

6.2.2.4 CPU performance analysis

This graph shows the relative share of CPU time taken by each method in the application during the one hour and thirty minutes of testing.


Fig. A.1: Details of the two speech synthesizer threads that used the most CPU time during testing and the only module thread that used significant CPU time, the communication method sendAndReciveData

Fig. 6.5: Top 35 packages that have used the most CPU time


The results show that roughly 80% of the relative CPU time was taken by FreeTTS and the Java Sound API, while only 5% was taken by network communication. The application code itself therefore introduces very little overhead, and since no memory leaks were found, the architecture can be considered efficient.
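A much coarser version of this breakdown can be obtained without instrumentation by comparing the CPU time of live threads. The sketch below is an illustration under assumptions, not the method used for the figures above: the class name `ThreadCpuShare` is hypothetical, the result is grouped by thread name rather than by method or package, and threads that have already finished are not counted.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: relative CPU time of the currently live threads. */
final class ThreadCpuShare {

    private ThreadCpuShare() {
    }

    /**
     * Returns thread name to share of CPU time (shares sum to 1), or an empty
     * map when the JVM cannot measure per-thread CPU time.
     */
    static Map<String, Double> cpuShareByThread() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<String, Double> shares = new LinkedHashMap<String, Double>();
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return shares;
        }
        Map<String, Long> nanosByName = new LinkedHashMap<String, Long>();
        long total = 0;
        for (long id : threads.getAllThreadIds()) {
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if the thread has died
            ThreadInfo info = threads.getThreadInfo(id);   // null if the thread has died
            if (cpuNanos <= 0 || info == null) {
                continue;
            }
            // Threads that share a name are added together.
            nanosByName.merge(info.getThreadName(), cpuNanos, Long::sum);
            total += cpuNanos;
        }
        for (Map.Entry<String, Long> entry : nanosByName.entrySet()) {
            shares.put(entry.getKey(), entry.getValue() / (double) total);
        }
        return shares;
    }
}
```

In an application like this one, the synthesizer and audio threads would be expected to dominate such a listing, which is consistent with the profiler results, although the two measurements are not directly comparable.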


7 Conclusions

The project was designed to provide Web access for people with sight impairment in limited environments such as public places. Even with the numerous technical limitations and a few development problems, the project can be considered successful. It has fulfilled all of its requirements with one minor exception: to start the application, a sighted person has to type the Web address of the application into a Web browser. This is simple, takes seconds and can be done by anybody nearby, so it is not a problem in public places. I am working on a solution, but because public computers have restricted access, and Java Web Start itself has limited access to local resources, it is a very difficult task and not high on my priority list.

While the project was successful, a few things can be improved. I want to continue the project as my master's thesis, or at least release it under an open source license, so that the application can be improved and one day become a real solution for many people with sight impairment.

The voice synthesizer supports only English and is of average quality. In addition, it does not support JSML, which limits its use. Without JSML support it is not possible to control the way the synthesizer speaks or even to introduce small pauses, and Thread operations on it are unsafe and unpredictable. The solution is to use the Ivona synthesizer, which provides one of the best commercial voices on the market. However, the SDK alone costs upwards of 200 euros, so it would require support from the University or from an organization that helps people with sight impairment.

General Web parsing could be improved too. Web pages are read from top to bottom, which is not how a sighted person would read a Web page. Another problem is that much of the content of the main page, such as menus and other elements, is repeated on the sub pages, which degrades the experience and wastes the time of people using the application. The solution would be a template editor for Web pages that adapts them to the needs of the application. The editor would allow volunteers or students to write templates for the most popular pages. I started developing it on the Netbeans API with the Visual Library and even finished part of it, but development was put on hold after I realized that FreeTTS does not support JSML and lacks any control mechanism for the output. I would definitely like to restart development with Ivona.


8 Summary in Polish

Human beings have five senses. According to the psychology of Aristotle these are sight, hearing, smell, taste and touch [1]. Interaction with computers relies mainly on sight and hearing. There are projects that introduce smell generation for computers, but they have not yet matured enough for commercial implementation.

The main burden of communication between a human and a computer rests on the sense of sight, while hearing is mostly used only to augment that communication. In most cases it would be impossible even to start the operating system without sight. This is why the situation of people with sight impairment is so difficult when it comes to interacting with computers. It often requires very expensive software packages, whose prices range from 500 to 1300 dollars for high-end packages [2]. Such software is a combination of a screen reader and an output device. The screen reader performs the interpretation and analysis, tasks that are normally carried out by the computer monitor and the human brain. The output device is a speech synthesizer or a Braille output device.

I decided to take on the problem of the general availability of computers adapted for blind users, with a focus on Internet access in public places such as schools, public administration offices, airports and libraries. There are many serious obstacles to adapting computers in public places to the needs of blind people. The first is the high cost of purchasing the software, which in places such as schools, universities and libraries with a hundred computers may even exceed the cost of the IT infrastructure itself, not to mention that many schools cannot afford it when they are not even able to buy all the computers they need, at least in countries like Poland. The next issue is that the software has to be installed, configured and maintained, all of which raises costs that are already high. These are the issues I wanted to address in this project.

The idea of this project was to deliver a Web application that interprets any Web page regardless of its format and conveys its content as synthetic speech: an application that needs no installation or configuration, is easy for blind users to start and operate, and can be run on any computer connected to the Internet regardless of its operating system or architecture.


9 Bibliography

1. Kaufmann Kohler, Isaac Broyde. Senses, The Five. Jewish Encyclopedia. [Online] http://www.jewishencylopedia.com/view.jsp?artid=479&letter=S.
2. Screen Readers. Enable Mart. [Online] http://www.enablemart.com/Catalog/Screen-Readers.
3. George H.Shames, Elisabeth H. Wiig. Human Communication Disorders. s.l. : Bell & Howel Company, 1982. 0-675-09837-8.
4. Christina L. Bennett, Alan W Black. The Blizzard Challenge 2006. festvox. [Online] http://festvox.org/blizzard/bc2006/eval_blizzard2006.pdf.
5. Robert A. J. Clark, Monika Podsiadło, Mark Fraser, Catherine Mayo, Simon King. Statistical analysis of the Blizzard Challenge 2007 listening test results. festvox. [Online] http://www.festvox.org/blizzard/bc2007/blizzard_2007/full_papers/blz3_003.pdf.
6. Speaking Clock. Telephones Uk. [Online] http://www.telephonesuk.co.uk/speaking_clock.htm.
7. Lemmetty, Sami. Review of Speech Synthesis Technology. Helsinki University of Technology. [Online] http://www.acoustics.hut.fi/publications/files/theses/lemmetty_mst/chap2.html.
8. WebMediaBrands Inc. Java. Webopedia. [Online] http://www.webopedia.com/TERM/J/Java.html.
9. Sun Microsystems, Inc. Java Platform, Standard Edition (Java SE) - Java Web Start Overview. Developer Resources for Java Technology. [Online] http://java.sun.com/javase/technologies/desktop/javawebstart/overview.html.
10. Sun Microsystems, Inc. Netbeans platform. Netbeans. [Online] http://bits.netbeans.org/dev/javadoc/index.html.
11. Sun Microsystems, Inc. Java™ SE 6 Release Notes - Supported System Configurations. Developer Resources for Java Technology. [Online] http://java.sun.com/javase/6/webnotes/install/system-configurations.html.
12. Hunter, Jason. Standalone Servlet Engines. Servlets. [Online] http://www.servlets.com/engines/.
13. Science Blog. Earlier Human Speech? Science Blog. [Online] http://www.scienceblog.com/community/older/1998/B/199801121.html.
14. Vorländer, Michael. Auralization. Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality. s.l. : Springer, 2008.
15. Michael Girdley, Kathryn A. Jone. Web Programming with Java
16. FreeTTS 1.2 - A speech synthesizer written entirely in the Java™ programming language. SourceForge. [Online] http://freetts.sourceforge.net/docs/index.php#what_is_freetts.
17. Sun Microsystems. Netbeans platform. Netbeans. [Online] http://platform.netbeans.org/description.html.


A Installation and use

A.1 Installation

The client application does not require installation. A person without sight impairment is needed to open the application, but this takes only a few easy steps that anyone can do in roughly 10-20 seconds. The following steps have to be taken:

Open a Web browser and type the application's WWW address in the address field, for example www.webbrowser.org/webbrowser.jnlp

Wait for the application to finish loading. The first time, it can take up to 2 minutes on a 1 Mbit connection.

Accept the security pop-up. You can disable it for every subsequent use by ticking “always trust content from this publisher”.


A.2 Use

Once the application is open, a person with sight impairment can take over. The application is very easy to use, and users are guided by a voice navigation and feedback system.

The mouse wheel is used to increase or decrease the sound volume.

The Tab key is used to switch between:

• Web address field - used for entering the address of a Web page. Every letter typed on the keyboard is echoed back through the voice feedback system, and pressing Enter submits the typed address. The user is informed whether the address is correct or the WWW server is offline.

• Link number field - used for navigation inside the Web page. The user can access any inner part of the Web page by typing its corresponding number. These numbers are spoken during the speech output of the main Web page. The field is automatically enabled and focused once the Web page has been successfully retrieved.

• Pause button - used to pause the speech output of the Web page. Activated by pressing Enter.

• Resume button - used to resume the speech output of the Web page. Activated by pressing Enter.
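How a typed link number could be turned into an address is sketched below. The sketch assumes that the spoken numbers are the link numbers that Lynx assigns in its text dump, where links appear as `[n]` markers in the body and are listed under a trailing `References` heading. That correspondence is an assumption, the class name `LinkIndex` is hypothetical, and the actual client and server may resolve numbers differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical index from link number to target address. */
final class LinkIndex {

    // Matches reference lines such as "   12. http://example.org/page".
    private static final Pattern REFERENCE =
            Pattern.compile("^\\s*(\\d{1,6})\\.\\s+(\\S+)\\s*$");

    private final Map<Integer, String> targets = new LinkedHashMap<Integer, String>();

    private LinkIndex() {
    }

    /** Builds the index from the reference list at the end of a text dump. */
    static LinkIndex fromDump(String dump) {
        LinkIndex index = new LinkIndex();
        String[] lines = dump.split("\\r?\\n");
        // Use the last "References" line so that the same word in the page body is ignored.
        int start = -1;
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].trim().equals("References")) {
                start = i;
            }
        }
        if (start < 0) {
            return index; // the page has no links
        }
        for (int i = start + 1; i < lines.length; i++) {
            Matcher matcher = REFERENCE.matcher(lines[i]);
            if (matcher.matches()) {
                index.targets.put(Integer.parseInt(matcher.group(1)), matcher.group(2));
            }
        }
        return index;
    }

    /** Returns the address for the typed number, or null if the input is not a known link. */
    String resolve(String typed) {
        try {
            return targets.get(Integer.parseInt(typed.trim()));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    int size() {
        return targets.size();
    }
}
```

Returning `null` for unknown or non-numeric input leaves it to the caller to give spoken feedback, in line with the voice feedback the application provides for the address field.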


B Client compatibility list

Solaris™ Operating System, 32-bit and 64-bit

| Platform | Operating System Version | Desktop Managers | Browsers | JRE | JDK |
| Solaris Sparc (32) | Solaris 10 | JDS-2 (Gnome-Metacity), CDE-dtwm | Mozilla 1.4x, 1.7+ | 32-bit Install | 32-bit Install |
| | Solaris 9 | Gnome-Metacity 2.4.34 or later, CDE-dtwm | | | |
| | Solaris 8 | CDE-dtwm, Openwin-olwm | | | |
| Solaris x86 (32) | Solaris 10 | Gnome-Metacity, CDE | Mozilla 1.4x, 1.7+ | 32-bit Install | 32-bit Install |
| | Solaris 9 | Gnome-Metacity, CDE | | | |
| | Solaris 8 | CDE, Openwin | | | |
| | OpenSolaris | GNOME 2.24.0 | Firefox 3 | | |

Windows 32-bit

| Platform | Operating System Version | Desktop Managers | Browsers | JRE | JDK |
| Windows Intel IA32 | Windows XP Professional, Windows XP Home, Windows Server 2003 | Windows/Active for Windows | IE 6 SP1+, IE 7, IE 8, Mozilla 1.4.X or 1.7+, Netscape 7.X, Firefox 1.06 - 3 | 32-bit Install, Disk space | 32-bit Install, Disk space |
| | Windows 2000 Professional, Windows 2000 Server | | IE 6 SP1+, Mozilla 1.4.X or 1.7+, Netscape 7.X, Firefox 1.06 - 3 | | |
| | Windows Vista, Windows Server 2008 | | IE 7 or IE 8 | | |

Windows 64-bit

| Platform | Operating System Version | Desktop Managers | Browsers | JRE | JDK |
| Windows x64, 32-bit mode | Windows XP | Windows/Active for Windows | IE 6 SP1+, IE 7, IE 8, Mozilla 1.4.X or 1.7+, Netscape 7.X, Firefox 1.06 - 3 | 32-bit Install, Disk space | 32-bit Install, Disk space |
| | Windows Server 2003 | | IE 6 SP1+, IE 7, IE 8, Mozilla 1.4.X or 1.7+, Netscape 7.X, Firefox 1.06 - 3 | | |
| | Windows Vista, Windows Server 2008 | | IE 7 or IE 8 | | |
| Windows x64, 64-bit mode | Windows XP, Windows Server 2003 | Windows/Active for Windows | 64bit OS, 32bit Browsers: IE 6 SP1+, IE 7, IE 8, Mozilla 1.4.X or 1.7+, Netscape 7.X, Firefox 1.06 - 3+ | 64-bit Install, 32-bit Install, Disk space | 64-bit Install, 32-bit Install, Disk space |
| | Windows Vista, Windows Server 2008 | | 64bit mode, 64bit Browsers: IE 7 or IE 8 | | |

Linux 32-bit

| Platform | Operating System Version | Desktop Managers | Browsers | JRE | JDK |
| Linux IA32 | Red Hat 2.1, Red Hat Enterprise Linux 3.0, 4.0, 5.0 - 5.2 | Gnome1.4-sawfish 1.0 or later, Gnome 2.2 - metacity 2.4 or later | Mozilla 1.4.x or 1.7+, Firefox 1.06 - 3 | 32-bit Install | 32-bit Install |
| | Suse Enterprise Linux Server 8, Suse Enterprise Linux Server 9, Suse Enterprise Linux Server 10, Suse Enterprise Linux Desktop | Gnome2.0.5-Metacity 2.6.2 or later (default: 2.4) | | | |
| | Turbo Linux 10 (only Chinese and Japanese locale, no English) | Gnome-sawfish 1.0 or later | | | |

Linux 64-bit

| Platform | Operating System Version | Desktop Managers | Browsers | JRE | JDK |
| Linux x64, 32-bit mode | Suse Enterprise Linux Server 8, Suse Enterprise Linux Server 9, Suse Enterprise Linux Server 10, Suse Enterprise Linux Desktop | Gnome2.0.5-Metacity 2.6.2 or later (default: 2.4) | Mozilla 1.4.x or 1.7+, Firefox 1.06 - 3 | 32-bit Install | 32-bit Install |
| | Red Hat Enterprise Linux 3.0, 4.0, 5.0 - 5.2 | Gnome2.0.5-Metacity 2.6.2 or later (default: 2.4) | | | |
| | Turbo Linux 10 (only Chinese and Japanese locale, no English) | Gnome-sawfish 1.0 or later | | | |
| Linux x64, 64-bit | Suse Enterprise Linux Server 8, Suse Enterprise Linux Server 9, Suse Enterprise Linux Server 10, Suse Enterprise Linux Desktop | Gnome2.0.5-Metacity 2.6.2 or later (default: 2.4) | 64bit OS, 32bit mode Browsers: Mozilla 1.4.x or 1.7+, Firefox 1.06 - 3 | 64-bit Install, 32-bit Install | 64-bit Install, 32-bit Install |
| | Red Hat Enterprise Linux 3.0, 4.0, 5.0 | Gnome 2.2 - metacity 2.4 or later | 64bit mode, 64bit Browsers: | | |

[16] Fig. A.2: Client application compatibility list


C Server compatibility list

• Tomcat server
• IBM's WebSphere Application Server
• BEA Weblogic Application Server
• Caucho's Server
• Adobe's JRun Web Server
• Orion Application Server
• Oracle Application Server
• ATG Dynamo Application Server
• Pramati J2EE Server
• Borland AppServer
• Server
• The World Wide Web Consortium's Jigsaw Server
• Zeus Web Server
• iPlanet (Netscape) Web Server Enterprise Edition
• iPlanet (Netscape) Web Server Enterprise Edition for Linux
• Netscape Enterprise Server 3.5.1 and 3.6
• GemStone/J Application Server
• Gefion Software's LiteWebServer
• CtO-Jstar
• M5 Web Server
• Servertec's iServer
• Lotus's Domino Go WebServer
• Paperclips Java Servlet Server 2.0
• jo! Web Server
• KonaSoft Enterprise Server
• NGASI (Next Generation Application Server)
• Avenida Web Server
• vqServer
• Serfler
• WebEasy WEASEL Application Server
• Tandem's iTP WebServer
• Novocode's NetForge
• Enhydra [12]
