A SPEECH-ENABLED INTERFACE AGENT FOR STATIC HTML WEB PAGES

Mhd Yusley Yusoff, Ahmad Safuan Abu Bakar, Amri Mustafa, and Mohd Khairil Azhan Zakaria
Faculty of Computer Science and Information System, University of Technology Malaysia, 81310 Skudai, Johore, Malaysia
Tel: 607-5576160, Fax: 607-5565044


Abstract: A speech-enabled interface agent is a kind of interface agent. It can process speech in simple, plain English, is visible on the monitor screen, and gives feedback in response to the user's speech input. The speech-enabled interface agent is user friendly: it can reduce the time spent using the keyboard and mouse and, implicitly, the emotional tension of the user. This paper addresses the user's dependency on the keyboard and mouse by using a speech-enabled interface agent to improve performance. A speech-enabled interface agent can also help handicapped users to interact with static HTML web pages. The agent is built using Microsoft® Agent, and the Microsoft® Speech API is used to build its speech input engine. To combine Microsoft Agent and the Microsoft Speech API in static HTML web pages, an e-book was designed to meet the requirements of both. The e-book is an electronic book that has a static structure, consists of simple plain text, and is navigated through a combination of keyboard, mouse and the speech-enabled interface agent. The result is interaction between the user and the static HTML web pages, and among the static HTML web pages themselves. Finally, the computational aim is to measure the response time of the static HTML web pages to the user's speech input through the speech-enabled interface agent, given the CPU and memory available on current computer hardware.

Keywords – Speech-Enabled Interface Agent, Microsoft Agent, Microsoft Speech API, Static HTML Web Pages

1. Introduction

1.1 Speech-enabled interface agent

To define the whole meaning of the “speech-enabled interface agent”, it must be broken down into atomic definitions. Starting with the interface agent itself: an interface agent emphasizes autonomy and learning in order to perform tasks for its owner, as defined by Maes (1994) [1]. The key metaphor of the interface agent is a personal assistant who collaborates with the user in the same work environment. The “speech-enabled” part added to the interface agent refers to its capability: the agent can speak in simple, plain language, which is English. By borrowing some technologies from Microsoft, its characteristics extend to being visible on the computer monitor and giving feedback to user interaction through speech input.

1.2 Microsoft Agent

According to Microsoft, Microsoft Agent is a set of programmable software services that supports the presentation of interactive animated characters within the Microsoft® Windows® interface [2]. Developers can use characters as interactive assistants to introduce, guide, entertain, or otherwise enhance their web pages or applications in addition to the conventional use of windows, menus and controls. Microsoft Agent enables software developers and web authors to incorporate a new form of user interaction. In addition to mouse and keyboard input, Microsoft Agent includes optional support for speech recognition so applications can respond to voice commands.

Figure 1: Microsoft Agent Characters

As shown in Figure 1 above, Microsoft Agent also provides characters, which can respond using synthesized speech, recorded audio, or text in a cartoon word balloon. The characters, from left, are Genie, Merlin, Robby the Robot and Peedy the Parrot.

1.3 Microsoft Speech API

This application-programming interface (API) is an industry-standard programming interface for speech. The Speech API lets developers write Windows 32-bit based applications that use speech recognition and text to speech [3]. The API is specified as a collection of OLE Component Object Model (COM) objects. Using OLE makes speech available to developers using C/C++ or any other programming language that can access Object Linking and Embedding (OLE) objects, directly or through automation. The Speech API requires [...] and above, and it still needs third-party speech engines, one for speech recognition and one for converting text to speech.

1.4 Static HTML web pages

Static Hyper Text Markup Language (HTML) web pages are hard-coded HTML pages whose structure does not change [4]. Often these pages are created “by hand” using a text editor or a program such as Microsoft FrontPage. A static web page is stored in a file system. The concept behind the creation of static HTML web pages is WYSIWYG: what you see is what you get.

2. The Current Problem

The conventional user interface is limited to the “point and click” method: the mouse and, of course, the keyboard. There is no other way for the user to access the user interface and the content of the static HTML web pages. Generally, with a common static HTML web page, the user must scroll down or read the content of the page to find the information needed. This is time consuming, and it becomes difficult when the user has to do other things at the same time. Consider a user whose hands are full and who cannot use the keyboard or mouse to navigate the static HTML web pages to find the information needed at that moment.

Seeking information or reading the content of static HTML web pages also causes emotional tension when there is too much content to read and scrolling down the pages tires the hand. The problem stated above concerns users who can operate the mouse or keyboard by hand. However, an issue arises if the user is handicapped and cannot use his or her hands. Should he or she ask for help all the time to navigate the static HTML web pages, to find information or to read the content? The handicapped user is constrained if he or she cannot physically use the keyboard or mouse to navigate the static HTML web pages.

3. Objectives

Based on the problem explained above, the objectives of this paper are:

i. To develop an interface agent that has the speech ability to interact with the user and the static HTML web pages.

ii. To give the user the option of using either the conventional method or another method to interact with the static HTML web pages.

iii. To add some cheerful functionality through the presence of the speech-enabled interface agent in the static HTML web pages, so that the user does not feel bored.

iv. To use the speech-enabled interface agent to seek information or to read the content of the static HTML web pages in less time than using the mouse and keyboard.

v. To aid handicapped persons in navigating the static HTML web pages without having to ask for help from others.

4. Methodology

The steps for developing a speech-enabled interface agent for static HTML web pages are, in order:

i. Creating the characteristics and functionalities of the speech-enabled interface agent using Microsoft Agent.

ii. Combining the Microsoft Speech API with Microsoft Agent so that the agent can receive speech input from the user.

iii. Creating the content of the static HTML web pages. In this case, we use an already made electronic book (e-book).

iv. Combining everything in the static HTML web pages, using a scripting language for Microsoft Agent and OLE objects for accessing the speech engine.

5. The architecture

In this part, we discuss the architecture of the speech-enabled interface agent. The architecture, shown in Figure 2 below, is quite simple: it describes the speech-enabled interface agent itself and its incorporation with the static HTML web pages.

The body of the speech-enabled interface agent, as shown in Figure 2 below, is built using two technologies provided by Microsoft: Microsoft Agent and the Microsoft Speech API. These two technologies make it possible to interact with the user and the static HTML web pages using the speech-enabled interface agent.

Figure 2: The architecture describing the incorporation of the speech-enabled interface agent and the static HTML web pages. (Diagram labels: Microsoft Agent, Microsoft Speech API, Static HTML Webpages.)

Legend:
A  Other interactional method, using the speech-enabled interface agent
B  Conventional interactional method, using keyboard and mouse
C  User
D  The speech-enabled interface agent

The requirements for the user to interact with the static HTML web pages using the speech-enabled interface agent are based on these specifications:

i. Microsoft Windows 95, Windows 98, Windows NT 4.0 (x86) or later
ii. Microsoft Internet Explorer version 3.02 or later
iii. Personal computer with a Pentium 100 MHz or higher processor
iv. At least 16 MB of memory
v. Hard-disk space for core components: 1 MB
vi. Hard-disk space for optional components:
   • Lernout & Hauspie™ TruVoice® Text-to-Speech engine for speech output: 1.6 MB
   • Microsoft Speech Recognition Engine for speech input: 22 MB
   • Characters installed locally: 2-4 MB per character
vii. Windows compatible sound card
viii. Compatible speakers and microphone
ix. Microsoft Mouse or compatible pointing device

Despite all the interaction between the user and the speech-enabled interface agent, we can still interact with the static HTML web pages using the conventional method, keyboard and mouse. The speech-enabled interface agent is not meant to replace the conventional method; it is an additional option for the user. For disabled persons who cannot use their hands, however, the speech-enabled interface agent is a relief. With a little practice, we believe that these users can make great use of the speech-enabled interface agent to interact with the static HTML web pages.

6. Implementation

6.1 Develop the speech-enabled interface agent

There are two ways to develop the speech-enabled interface agent using Microsoft Agent: through COM (Component Object Model) or through the ActiveX control. To incorporate the agent into the static HTML web pages, we use Microsoft FrontPage and JScript, which requires the ActiveX control. The incorporation is described later in this section.

Because we develop the speech-enabled interface agent on a local computer (local site), not on a remote site, the agent is loaded from the local computer. For this kind of development, Microsoft provides three popular characters: Genie, Merlin, and Robby the Robot [5]. In this development, we use the character Merlin, shown in Figure 3 below.

Figure 3: The Merlin character

Then, the ActiveX control must be specified by selecting Insert | Advanced | ActiveX Control… in the Microsoft FrontPage editor, as shown in Figure 4 below.

Figure 4: Select ActiveX control

6.2 Combining with Microsoft Speech API

Most of us prefer talking to typing or pointing. Allowing the user to interact and respond by voice commands therefore improves the usefulness of the application tremendously, and Microsoft® Agent provides this benefit through voice commands. Strictly, use of the Microsoft Speech API specification must be matched with a compatible engine that supports speech input (speech recognition or SR) and speech output (text to speech or TTS).

To use the Speech API, we specified the control that handles the conversion of text into the character's speech by selecting the Insert | Advanced | ActiveX Control menu option and choosing the Microsoft Agent Lernout & Hauspie Wrapper Control, as shown in Figure 5 below.

Figure 5: Specify the control

To suit the development of the speech-enabled interface agent to the Microsoft Speech API specification, we downloaded the Microsoft Command and Control Engine from the Microsoft Agent website. A good microphone is also crucial at this stage, so a headset microphone is used because it works better than desktop or monitor microphones.

Even so, the agent is limited to the set of commands that we created, along with a few housekeeping commands such as “Hide” or “Show Command Windows”.

6.3 Altogether, in the static HTML web pages

We chose an e-book as a demo for this final development. The e-book, as mentioned in the abstract above, is a set of static HTML web pages whose structure is static, which consists of simple plain text, and which can be navigated with the conventional method, the keyboard and the mouse. The other method is the use of the speech-enabled interface agent to navigate it.

After the controls have been specified, the incorporation, before previewing it as fully functional web pages, is shown in Figure 6 below.

Figure 6: In normal view, should look like this

After that, we must add the JScript to the web pages that allows us to load and control Merlin's actions. We specified it as a function OnLoad(), as shown in Listing A below. Parts of the JScript code are shown in Listing B below.

    var Merlin;

    function OnLoad() {
        // Connect to the Agent server and load the Merlin character.
        AgentControl.Connected = true;
        AgentControl.Characters.Load("merlin",
            "http://agent.microsoft.com//characters//merlinsfx//merlinsfx.acf");
        Merlin = AgentControl.Characters.Character("merlin");

        // Fetch the states and animations used below.
        Merlin.Get("State", "Showing, Speaking");
        Merlin.Get("Animation", "Greet, GreetReturn, Acknowledge, Announce, " +
            "Congratulate, DoMagic1, DoMagic2, Read");

        Merlin.Show();  // NOTE: JScript requires parentheses, unlike VBScript
        Merlin.Get("State", "Hiding");

        Merlin.Play("Greet");
        Merlin.Speak("Hello. I am Merlin. I am the latest in Microsoft Animation Technology.");
        Merlin.Play("Announce");
        Merlin.Speak("And who might you be?");
        Merlin.Play("Acknowledge");
        Merlin.Play("Congratulate");
        Merlin.Speak("Congratulations. Well done!");
        Merlin.Play("DoMagic1");
        Merlin.Play("DoMagic2");
        Merlin.Play("Read");
        Merlin.Hide();
    }

Listing B: Parts of the JScript code

Finally, the Merlin character appears as illustrated in Figure 7 below.
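The voice commands mentioned in Section 6.2 could be connected to e-book navigation roughly as follows. This is a minimal sketch under stated assumptions, not the code used in this work: the command names, the bookCommands table, and the helpers registerCommands, dispatchCommand and makeStubCharacter are all hypothetical. It assumes that a Microsoft Agent character exposes Commands.Add(name, caption, voice, enabled, visible) and that the voice strings follow the Agent grammar notation (square brackets for optional words, a vertical bar for alternatives). A stub object stands in for the character so that the logic can be run without the ActiveX control.

```javascript
// Hypothetical command table: each entry pairs a voice grammar with an
// action on a simple e-book state object. All names are illustrative.
var bookCommands = [
  { name: "NextPage", caption: "Next page", voice: "next [page]",
    run: function (book) { book.page = Math.min(book.page + 1, book.lastPage); } },
  { name: "PrevPage", caption: "Previous page", voice: "(previous | back) [page]",
    run: function (book) { book.page = Math.max(book.page - 1, 1); } },
  { name: "FirstPage", caption: "First page", voice: "first page",
    run: function (book) { book.page = 1; } }
];

// Register every command with the character.
// ASSUMPTION: character.Commands.Add(name, caption, voice, enabled, visible).
function registerCommands(character, commands) {
  for (var i = 0; i < commands.length; i++) {
    var c = commands[i];
    character.Commands.Add(c.name, c.caption, c.voice, true, true);
  }
}

// Find the recognized command by name and run its action.
// Returns true when the command was handled, false otherwise.
function dispatchCommand(commandName, commands, book) {
  for (var i = 0; i < commands.length; i++) {
    if (commands[i].name === commandName) {
      commands[i].run(book);
      return true;
    }
  }
  return false;
}

// Stub standing in for the Merlin character; it only records registrations.
function makeStubCharacter() {
  var added = [];
  return {
    added: added,
    Commands: {
      Add: function (name, caption, voice, enabled, visible) { added.push(name); }
    }
  };
}

var book = { page: 1, lastPage: 3 };
var merlinStub = makeStubCharacter();
registerCommands(merlinStub, bookCommands);
dispatchCommand("NextPage", bookCommands, book);  // the user says "next page"
```

In an actual page, dispatchCommand would presumably be called from the agent control's Command event, passing the name of the recognized command. That wiring is left out here because it depends on the ActiveX control being present. Keeping the commands in a table means a new navigation command can be added without changing the dispatch logic.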

Figure 7: The speech-enabled interface agent incorporated with the static HTML web pages

Listing A: The function OnLoad()

7. The results

The testing of this development can first be assessed qualitatively, from the user's emotional side: the user feels that the collaboration between keyboard, mouse and the speech-enabled interface agent is needed, especially for navigating the static, plain text of the web pages. Boredom sometimes sets in because, if there is no alternative way, meaning no facilitator to navigate through the web pages, the user may not feel excited to stay and discover the whole content of the static HTML web pages.

The next test is based on the memory and CPU available on different users' computers. We tested the speech-enabled interface agent on two computers, one with an Intel Pentium IV 1GHz and one with an Intel Pentium III 500 MHz. Both have 98 MB RAM. The results are shown in Figure 8 below.

Figure 8: A time versus memory graph tested on 2 computers. (Vertical axis: Time, ms, with ticks from 1 to 4; horizontal axis: Memory, Kb.)

Legend: computer using Intel Pentium IV 1GHz; computer using Intel Pentium III 500 MHz.

The results of this kind of test do not vary much between computers with different amounts of memory and CPU. The most important factors that we found are the microphone used and the quality of the sound card available in those computers. More memory and CPU, together with a good microphone and a quality sound card, let the interaction between the user and the speech-enabled interface agent reach its maximum, so that the two sides, the user and the agent, understand each other.

8. Conclusion and future works

From the beginning of the development of the speech-enabled interface agent to its incorporation with the static HTML web pages, we have learned that the journey towards this kind of interaction with the user is a long one. There is still much work to do before the user has the sensation of working in the environment of the speech-enabled interface agent with any kind of application.

As for plans, we shall incorporate the speech-enabled interface agent with any kind of static or dynamic web pages, because those are the reality of web pages these days. In addition, the speech-enabled interface agent must be given many functions to support those kinds of web pages, and our team will make sure that the goal and its objectives are achieved exactly.

9. References

[1] Hyacinth S. Nwana, “Software agents: an overview”, Knowledge Engineering Review, p 14 (1996).

[2] Microsoft Corporation, “Introduction to Microsoft Agent”, MSDN, Platform SDK, User Interface Services, Microsoft Agent. Jan. 2000.

[3] Rozak, Mike, “Talk to your computer and have it answer back with the Microsoft speech API”, Microsoft Systems Journal, 11(1) Jan. 1996 p. 19-32.

[4] Hillerbrand, Rainer and Wierlemann, Thomas, “Static web pages”, http://mobileinternetguide.org/html/ch04s09s10.html

[5] Lenos, Doug, “An agent at your fingertips”, Microsoft Web Builder, MSDN Library, Jan 2000, Periodicals 1998.