A Speech-Enabled Interface Agent for Static HTML Web Pages
Mhd Yusley Yusoff, Ahmad Safuan Abu Bakar, Amri Mustafa, and Mohd Khairil Azhan Zakaria
Faculty of Computer Science and Information System, University of Technology Malaysia, 81310 Skudai, Johore, Malaysia
Tel: 607-5576160, Fax: 607-5565044
E-Mail: <[email protected]>, <[email protected]>, <[email protected]>, <[email protected]>

Abstract: A speech-enabled interface agent is a kind of interface agent. It can process speech in simple plain English, is visible on the monitor screen, and gives feedback in response to the user's speech input. The speech-enabled interface agent is user friendly, it can reduce the time spent using the keyboard and mouse and, implicitly, it can reduce the user's emotional tension. This paper addresses the user's dependency on the keyboard and mouse by using the speech-enabled interface agent to improve performance. A speech-enabled interface agent can also help handicapped users to interact with static HTML web pages. To build such static HTML web pages, a speech-enabled interface agent can be built using Microsoft® Agent. Microsoft® Speech API is used to build the speech input engine for the speech-enabled interface agent. To combine Microsoft Agent and Microsoft Speech API in static HTML web pages, an e-book was designed to meet the requirements of Microsoft Agent and Microsoft Speech API. This e-book is an electronic book that has a static structure, consists of simple plain text, and is navigated using a combination of keyboard, mouse and the speech-enabled interface agent. The result is interaction between the user and the static HTML web pages, and among the static HTML web pages themselves.
The computational expectation is then to measure the response time of the static HTML web pages to a user's speech input through the speech-enabled interface agent, given the current state of the computer hardware (available CPU and memory).

Keywords: Speech-Enabled Interface Agent, Microsoft Agent, Microsoft Speech API, Static HTML Web Pages

1. Introduction

1.1 Speech-enabled interface agent

To define the full meaning of the "speech-enabled interface agent", the term must be broken down into atomic definitions. Starting with the interface agent itself, it is an agent emphasizing autonomy and learning in order to perform tasks for its owner, as defined by Maes (1994) [1]. The key to the interface agent metaphor is a personal assistant who collaborates with the user in the same work environment. "Speech-enabled" describes the capability added to the interface agent: the agent is able to handle speech in simple plain language, namely English. By borrowing some technologies from Microsoft, its characteristics extend to being visible on the computer monitor and giving feedback to user interaction through speech input.

1.2 Microsoft Agent

According to Microsoft, Microsoft Agent is a set of programmable software services that supports the presentation of interactive animated characters within the Microsoft Windows® interface [2]. Developers can use characters as interactive assistants to introduce, guide, entertain, or otherwise enhance their web pages or applications in addition to the conventional use of windows, menus and controls. Microsoft Agent enables software developers and web authors to incorporate a new form of user interaction. In addition to mouse and keyboard input, Microsoft Agent includes optional support for speech recognition so applications can respond to voice commands.
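As a rough illustration of what "responding to voice commands" involves, the sketch below maps recognized phrases to page actions. It is written in Python purely for illustration and does not use the Microsoft Agent or Speech API interfaces; the class `VoiceCommandRegistry`, the example phrases and the handlers are all hypothetical names assumed for this example.

```python
from typing import Callable, Dict, Optional


class VoiceCommandRegistry:
    """Hypothetical stand-in for an agent's list of voice commands."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], str]] = {}

    @staticmethod
    def _normalize(phrase: str) -> str:
        # Lower-case and collapse whitespace so "Next  Page" matches "next page".
        return " ".join(phrase.lower().split())

    def add(self, phrase: str, handler: Callable[[], str]) -> None:
        """Register a handler to run when the phrase is recognized."""
        self._handlers[self._normalize(phrase)] = handler

    def dispatch(self, recognized_text: str) -> Optional[str]:
        """Run the matching handler; return None if no command matches."""
        handler = self._handlers.get(self._normalize(recognized_text))
        return handler() if handler else None


# Assumed example commands for navigating an e-book style page.
registry = VoiceCommandRegistry()
registry.add("next page", lambda: "Opening the next page")
registry.add("read page", lambda: "Reading the page aloud")
```

In this sketch the text produced by a speech recognizer would be passed to `registry.dispatch(...)`; an unrecognized phrase simply returns `None`, leaving the keyboard and mouse available as the fallback.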
Figure 1: Microsoft Agent Characters

As described in Figure 1 above, Microsoft Agent also provides characters, which can respond using synthesized speech, recorded audio, or text in a cartoon word balloon. From left, the characters are Genie, Merlin, Robby the Robot and Peedy the Parrot.

1.3 Microsoft Speech API

This application programming interface (API) is an industry-standard programming interface for speech. The Speech API lets developers write Windows 32-bit applications that use speech recognition and text-to-speech [3]. The API is specified as a collection of OLE Component Object Model (COM) objects. Using OLE makes speech available to developers working in Visual Basic, C/C++, or any other programming language that can access Object Linking and Embedding (OLE) objects directly or through automation. The Speech API requires Windows 95 or later, and it still needs third-party speech engines: one for speech recognition and one for converting text to speech.

1.4 Static HTML web pages

Static Hyper Text Markup Language (HTML) web pages are hard-coded HTML pages whose structure does not change [4]. Often these pages are created "by hand" using a text editor or a program such as Microsoft FrontPage. A static web page is stored in a file system. The concept behind the creation of static HTML web pages is WYSIWYG (what you see is what you get).

2. The Current Problem

The conventional user interface is limited to the "point and click" method, using the mouse and, of course, the keyboard. There is no way for the user to use another method, such as the speech-enabled interface agent, to access the user interfaces and the content of the static HTML web pages. Generally, with a common static HTML web page, the user must scroll down or read through the content of the page to find the information needed. This is time consuming, and it becomes difficult when the user has to do other things at the same time. Consider, for example, a user whose hands are full and who cannot use the keyboard and/or mouse to navigate the static HTML web pages to seek the information needed at that time.

Seeking information or reading the content of the static HTML web pages also causes emotional tension when there is too much content to read and scrolling down the web pages tires the hand. Fortunately, the problem stated above concerns users who can use a mouse and/or keyboard by hand. However, an issue arises if the user is handicapped and cannot use his or her hands. Should he/she ask for help all the time to navigate the static HTML web pages, to find information or to read the content of the web pages? A handicapped user is constrained if he/she cannot physically use the keyboard and/or mouse to navigate the static HTML web pages.

3. Objectives

Based on the problem explained above, the objectives of this paper are:

i. To develop an interface agent that has the speech ability to interact with the user and the static HTML web pages.
ii. To give the user the option of using either the conventional method or another method to interact with the static HTML web pages.
iii. To add some cheerful functionality through the presence of the speech-enabled interface agent in the static HTML web pages so that the user does not feel bored.
iv. To use the speech-enabled interface agent to seek information or to read the content of the static HTML web pages in less time than using the mouse and keyboard.
v. To aid handicapped persons in navigating the static HTML web pages without having to ask for help from others.

4. Methodology

There are steps to follow in developing a speech-enabled interface agent for static HTML web pages. In order, they are:

i. Creating the characteristics and functionalities of the speech-enabled interface agent using Microsoft Agent.
ii. Combining Microsoft Speech API with Microsoft Agent so that it can receive speech input from the user.
iii. Creating the content of the static HTML web pages. In this case, an already made electronic book (e-book) is used.
iv. Combining everything in the static HTML web pages, using a scripting language for Microsoft Agent and OLE objects for accessing the speech engine.

5. The architecture

In this part, we discuss the architecture of the speech-enabled interface agent. The architecture, as described in Figure 2 below, is quite simple: it describes the speech-enabled interface agent itself, incorporated with the static HTML web pages.

The body of the speech-enabled interface agent, as described in Figure 2 below, is built using two technologies provided by Microsoft: Microsoft Agent and Microsoft Speech API. These two technologies made it possible to interact with the user and the static

[Figure 2 diagram: boxes labeled Microsoft Agent, Microsoft Speech API and Static HTML Webpages, with markers A, B, C and D explained in the legend]

Figure 2: The architecture describing the incorporation of the speech-enabled interface agent and the static HTML web pages.

Legend:
A Other interactional method using the speech-enabled interface agent
B Conventional interactional method using keyboard and mouse
C User
D The speech-enabled interface agent

The requirements for the user to interact with the static HTML web pages using the speech-enabled interface agent are based on these specifications:

i. Microsoft Windows 95, Windows 98, Windows NT 4.0 (x86) or later
ii. Internet Explorer version 3.02 or later
iii. Personal computer with a Pentium 100 MHz or higher processor
iv. At least 16 MB of memory
v. Hard-disk space for core components: 1 MB
vi. Hard-disk space for optional components:
   • Lernout & Hauspie™ TruVoice® Text-to-Speech engine for speech output: 1.6 MB
   • Microsoft Speech Recognition Engine for speech input: 22 MB
   • Characters installed locally: 2-4 MB per character
vii. Windows compatible sound card

Then, the ActiveX control must be specified by selecting Insert | Advanced | ActiveX Control…

viii.