Voice Instant Messenger

VoIM Voice Instant Messenger

Project Report
CS 491 B – Fall 2006
Andrew Miller
December 8, 2006

Abstract

Verbal communication is one of the most important forms of communication in the world. A listener can interpret a message not only from its content but also from the speaker’s tone of voice and pitch. When communicating over the Internet, these cues are often lost, and a message can lose its actual meaning through misinterpretation. Sarcasm, compassion and confusion are commonly lost in this way. VoIM seeks to solve these problems by allowing users to send voice communications to each other, much like a text instant messenger, preserving these commonly lost but very important aspects of language. VoIM is a voice instant messenger that sends voice clips instead of text messages. Unlike most voice communication software available, VoIM combines the ability of most text instant messengers to retrieve a message at your leisure and listen to it again at a later time with the voice clarity of other popular streaming voice communication software.

I. Introduction

In a technological world like ours, why should distance separate us from verbal communication? Why should we have to be sitting next to a phone or in front of a computer to hear what somebody has to say right when they say it? We should no longer have to miss messages by not being at the right place at the right time. VoIM is a solution to these problems. Being limited to text messenger services like AOL Instant Messenger or MSN Messenger, or to live voice chat programs like Teamspeak and Ventrilo, is a thing of the past. VoIM removes both the limitation of not knowing tone of voice or mood in text on AIM or MSN Messenger and the limitation of having to be at your computer to listen to messages on Ventrilo and Teamspeak.

Too often, messages in text instant messenger programs are misinterpreted because tone of voice or mood is missing. Frustration, sarcasm, sympathy, and many other elements of language that can normally be interpreted from how something is said are lost when it is transformed into text. Take for example the phrase “Nice work”. When said with sincerity, it is a compliment, but with sarcasm it is an insult. Just by reading it, there is no way to tell the message’s intended meaning. VoIM takes care of this problem by sending the same message as a voice clip, so tone of voice is preserved. With VoIM you’ll hear the compliment or sarcasm in the person’s voice when they send their message.

What about when you have to leave your computer, or when something is too short for a phone conversation? VoIM can send a voice clip of any length to another user, who can retrieve it whenever they wish. With VoIM you can listen to a message whenever you want, as many times as you want, and even save it for future use. No longer does somebody have to repeat themselves to share something they’ve said with others. With VoIM you can forward voice clips to other users so they can hear exactly what you did. No longer will you have to retell a story or explain something to multiple people individually.

VoIM also offers a friends list to keep track of which of your buddies are online, offline and away. The away feature allows users to record a voice message that plays to explain where they are, what they’re doing and why they’re away.

VoIM can also export saved conversations and messages to a playable file so you can listen to them at any time. These files can later be imported and sent as messages to other users so you can share any conversation any time.

II. Technological Background

VoIM was produced using the Java object-oriented programming language with JBuilder 2006 Enterprise Edition as the Integrated Development Environment (IDE). The key reasons for choosing Java were the ease of network programming and Graphical User Interface (GUI) development. The reasons for using JBuilder were its ease of file management, class and function diagrams and compilation control. The JSpeex JavaSound library was also used to encode sound into a manageable size to save bandwidth and disk space.

II.A – Java

Java was developed and is maintained by Sun Microsystems as a cross-platform alternative to many object-oriented programming languages already on the market. It is a very flexible language that, rather than compiling into system-specific executables, compiles into Java bytecode that is usable on any system that has the Java runtime installed. This includes Windows, Linux and Macintosh systems and allows for rapid program development for all three systems without the need to reprogram and recompile for compatibility issues. Java was chosen as the development language for its cross-platform compatibility as well as the ease of use of its user interface development tools. Unfortunately, further research into its sound capabilities, or lack thereof, would have been wiser and could have led to a more informed decision as to which language to use.

II.B – Borland JBuilder 2006 Enterprise Edition

Borland JBuilder 2006 Enterprise Edition was chosen as the development environment for VoIM because of its ease of file management and its coding assistance. The built-in file tree manager allows for easy viewing and management of all packages and files. The integrated API assistance allows quick lookup and completion of methods and arguments for the Java runtime and for any classes developed by the user. This greatly reduces coding time, since little time is wasted consulting the API documentation for trivial functions and libraries.

The GUI and network libraries allowed for rapid development of a GUI and file transfer services. Without the ease of use and control available in the Java Swing and AWT libraries, the amount of time spent creating the GUI would have been much greater. The controls make layout easy, and objects such as buttons are very powerful tools. The network libraries allow for easy server creation and data transfer between client and server.

Borland JBuilder 2006 was chosen because of its familiarity and powerful management tools. Its integration with the Java API allowed for quick method completion and lookup. The process management features also allowed for easy testing of multiple clients and rapid recompilation and running of newly revised code.

II.C – JSpeex Java Speex Encoding/Decoding Library

The JSpeex library is an open source Java port of the Speex speech codec. The entire library is written in Java, and it provides an encoder and decoder as well as a plugin for the JavaSound API. JSpeex is one of the few active projects working on sound implementations in Java. Because of the lack of support from Sun, and because development of JavaSound was discontinued when Sun’s previous sound engineer left, Java has been left without any good support for audio compression. JSpeex, however, enables audio streams that would normally be very large and memory intensive to be compressed to only a fraction of their size, making them much more manageable and practical for web-based applications such as VoIM. JSpeex was chosen because it was one of the few Java audio libraries that actually worked and was still in development.

III. System Overview

VoIM uses a simple user interface that allows the user to do everything they need in a single window. Users can select the recipient(s) of their messages, record and save messages, play messages and forward messages all from the same window, without any complex knowledge of the program. The user can name, record, preview and send a message from the top portion of the GUI. The recipients are selected from a simple list on the left side, which shows all online, offline and away users. Older received messages are listed in the lower right portion of the GUI in a simple list that allows the user to listen to, forward or delete a message. A user can also select a friend from their list and view only the messages sent to or from that person.

When a user records a message it is saved as a message object, which includes the message, the name of the message, who it is being sent to, and who sent it. The program maintains an array of all messages the user has sent or received, which is saved to a database file when the program closes and loaded when the program is opened. The VoIM server maintains every user’s friends list and routes message traffic between all currently connected users.

IV. Design and Implementation
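As a starting point for the design, the message object and database file described in the System Overview might look like the following minimal sketch. The class name `VoiceMessage`, its field names, and the use of standard Java serialization are assumptions made for illustration; the report does not specify the actual file format.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MessageStoreSketch {

    /** Hypothetical message record: clip name, sender, recipients and audio bytes. */
    static final class VoiceMessage implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String sender;
        final ArrayList<String> recipients;
        final byte[] audio;

        VoiceMessage(String name, String sender, List<String> recipients, byte[] audio) {
            this.name = name;
            this.sender = sender;
            this.recipients = new ArrayList<>(recipients);
            this.audio = audio.clone();
        }
    }

    /** Writes every message to one file, as the client might do when it closes. */
    static void save(List<VoiceMessage> messages, File file) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(new ArrayList<>(messages));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the messages back, as the client might do when it starts. */
    @SuppressWarnings("unchecked")
    static List<VoiceMessage> load(File file) {
        if (!file.exists()) {
            return new ArrayList<>(); // first run: nothing saved yet
        }
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (List<VoiceMessage>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Unreadable message file", e);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("voim-messages", ".db");
        List<VoiceMessage> sent = new ArrayList<>();
        sent.add(new VoiceMessage("Nice work", "andrew",
                Arrays.asList("bob", "carol"), new byte[] {1, 2, 3}));
        save(sent, file);
        List<VoiceMessage> restored = load(file);
        System.out.println(restored.get(0).name + " -> " + restored.get(0).recipients);
        file.delete();
    }
}
```

Serializing the whole list in one call keeps the save and load paths symmetric, at the cost of rewriting the entire file on every save.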

IV.A – Sound and Voice

Using the JavaSound library in conjunction with the JSpeex libraries, the voice clips are recorded and then encoded as JSpeex audio streams. They are then stored on the user’s machine in a proprietary file type as Message objects, which contain the sender, any and all recipients, and the audio stream. The streams are converted and sent as ByteArrayOutputStreams, read using ByteArrayInputStreams, and then converted back to JSpeex audio streams for playback. Audio encoding with Java is very troublesome due to a complete lack of support and development by Sun Microsystems. Their previous JavaSound developer left the project in early 2005, which has left a hole in the Java framework as far as audio management goes. He was also a member of the Tritonus team, which has seen no developments for a couple of years either. The largest setback to this project has been the encoding and decoding of audio streams, because they are so large in their completely uncompressed state.
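The conversion between audio streams and byte arrays mentioned above can be illustrated with the standard JavaSound classes alone. This is only a sketch: the capture format (16 kHz, 16-bit, mono) and the class and method names are assumptions, and the JSpeex encoding step is deliberately left out because the report does not show how it was wired in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;

final class AudioBytesSketch {

    // Assumed capture format: 16 kHz, 16-bit, mono, signed, little-endian PCM.
    static final AudioFormat FORMAT = new AudioFormat(16000f, 16, 1, true, false);

    /** Wraps raw PCM bytes in an AudioInputStream, e.g. after receiving them. */
    static AudioInputStream toAudioStream(byte[] pcm) {
        long frames = pcm.length / FORMAT.getFrameSize();
        return new AudioInputStream(new ByteArrayInputStream(pcm), FORMAT, frames);
    }

    /** Drains an audio stream into a byte array, e.g. before sending it. */
    static byte[] toBytes(AudioInputStream stream) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096]; // a multiple of the 2-byte frame size
        try {
            int n;
            while ((n = stream.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] pcm = new byte[32000]; // one second of silence in the assumed format
        AudioInputStream stream = toAudioStream(pcm);
        System.out.println("frames: " + stream.getFrameLength());
        System.out.println("bytes after round trip: " + toBytes(stream).length);
    }
}
```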

Uncompressed audio is poorly suited to network use because of its size: it floods the network with traffic and consumes memory, which could make a program very inefficient. After determining that the base JavaSound support wouldn’t yield the necessary compression, and after extensive testing of the Tritonus library produced no results, we concluded that neither would be a viable solution to our needs. After much searching and testing, we finally settled on JSpeex due to its continued development and good encoding results.
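To give a sense of scale for the size problem, the short calculation below compares uncompressed PCM with a constant-bitrate encoding. The sample rate, sample size, clip length and the 16 kbit/s encoded bitrate are all assumed values chosen for illustration; the report does not state the settings VoIM actually used.

```java
final class ClipSizeSketch {

    /** Size in bytes of uncompressed PCM audio. */
    static long pcmBytes(int sampleRateHz, int bitsPerSample, int channels, int seconds) {
        return (long) sampleRateHz * (bitsPerSample / 8) * channels * seconds;
    }

    /** Size in bytes of a clip encoded at a constant bitrate. */
    static long encodedBytes(int bitsPerSecond, int seconds) {
        return (long) bitsPerSecond * seconds / 8;
    }

    public static void main(String[] args) {
        int seconds = 10;                                 // assumed clip length
        long raw = pcmBytes(16000, 16, 1, seconds);       // assumed capture format
        long encoded = encodedBytes(16000, seconds);      // assumed codec bitrate
        System.out.println("raw PCM bytes:  " + raw);
        System.out.println("encoded bytes:  " + encoded);
        System.out.println("size ratio:     " + (raw / encoded));
    }
}
```

Under these assumptions a ten-second clip shrinks from 320,000 bytes to 20,000 bytes, a factor of 16.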

IV.B – Networking

The Java network API allowed client and server communication to be very simple yet secure. When connecting, a client sends its user information to the server to be verified and updated in the server’s list of users. Each packet sent from the client carries the user ID of the client’s user so the server can verify the packet’s sender, and the server sends an appropriate response to confirm that the packet was sent properly. The client implements a PacketHandler class to handle packets from the server and perform the necessary tasks based upon the OpCode (or Operation Code, represented as a byte) of that packet. The connection with the server is handled by a Socket, which connects directly to the server’s ServerSocket and maintains that connection until the user disconnects or the server forces the client to disconnect.
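The OpCode-based handling described above can be sketched roughly as follows. The OpCode values, the header layout (one OpCode byte followed by a four-byte user ID) and the method names are hypothetical; the report does not list the real ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class PacketDispatchSketch {

    // Hypothetical OpCode values; the actual values are not given in the report.
    static final byte OP_LOGIN = 1;
    static final byte OP_STATUS = 2;
    static final byte OP_MESSAGE = 3;

    /** Builds a packet: 1-byte OpCode, 4-byte user ID, then the payload. */
    static byte[] buildPacket(byte opCode, int userId, byte[] payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(opCode);
            out.writeInt(userId);
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Verifies the sender's user ID, then picks an action from the OpCode. */
    static String handle(byte[] packet, int expectedUserId) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(packet))) {
            byte opCode = in.readByte();
            int userId = in.readInt();
            if (userId != expectedUserId) {
                return "REJECTED"; // packet claims to come from a different user
            }
            switch (opCode) {
                case OP_LOGIN:   return "LOGIN";
                case OP_STATUS:  return "STATUS";
                case OP_MESSAGE: return "MESSAGE";
                default:         return "UNKNOWN";
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] packet = buildPacket(OP_MESSAGE, 7, new byte[] {42});
        System.out.println(handle(packet, 7));
        System.out.println(handle(packet, 8));
    }
}
```

In a real handler each case would call the code that performs the operation; the strings here only stand in for those actions.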

The networking portion of this project took the most organization in terms of structure. A standard had to be created to maintain packet integrity and consistency, and to ensure that the client was forming the correct packets and that the server was reading them accurately and completely. At one point the network code had to be completely redone to fix a memory leak, as well as to reorganize the packet structure to save network bandwidth. The new structure included the client sending its particular userID with each packet so the server could verify it was getting data from the correct client. The change in packet structure also lowered the number of packets sent when sending messages to other users, by sending the list of recipients with the message instead of sending the message to each recipient individually.
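One way to carry the recipient list inside a single message packet, and to let the receiver read the audio completely, is to prefix each variable-length part with its length. The field order, field widths and class names below are assumptions for illustration, not the layout VoIM actually used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class MessagePacketSketch {

    /** Decoded form of the hypothetical message packet. */
    static final class Decoded {
        final int senderId;
        final int[] recipientIds;
        final byte[] audio;

        Decoded(int senderId, int[] recipientIds, byte[] audio) {
            this.senderId = senderId;
            this.recipientIds = recipientIds;
            this.audio = audio;
        }
    }

    /** Assumed layout: sender ID, recipient count, recipient IDs, audio length, audio. */
    static byte[] encode(int senderId, int[] recipientIds, byte[] audio) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(senderId);
            out.writeShort(recipientIds.length);
            for (int id : recipientIds) {
                out.writeInt(id);
            }
            out.writeInt(audio.length);
            out.write(audio);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the fields back in the same order they were written. */
    static Decoded decode(byte[] packet) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(packet))) {
            int senderId = in.readInt();
            int[] recipients = new int[in.readUnsignedShort()];
            for (int i = 0; i < recipients.length; i++) {
                recipients[i] = in.readInt();
            }
            byte[] audio = new byte[in.readInt()];
            in.readFully(audio); // blocks until the whole clip has been read
            return new Decoded(senderId, recipients, audio);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] packet = encode(7, new int[] {2, 3, 5}, new byte[] {10, 20});
        Decoded d = decode(packet);
        System.out.println("one packet, " + d.recipientIds.length + " recipients, "
                + d.audio.length + " audio bytes");
    }
}
```

With this layout the audio is transmitted to the server once regardless of how many recipients are listed, which is the bandwidth saving the restructuring aimed for.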

IV.C – User Interface

The client’s user interface is very simple and easy to use. Buttons and lists are used to control everything a user could want to do. The login screen has a text area for the user to enter their username, and a login button for them to connect to the server.

The friends list is to the left, where a particular user’s friends are listed and can be selected. Once a friend is selected, a user can press the “Send Message” button to open up the Send Message window. The window is separated by tabs along the top that represent each of the users that a current message could be sent to. In this window, a user has the option of recording a message, discarding the message or sending it to one or all of the currently selected friends. Users can also add or remove friends from the current message pane by selecting more friends from their friends list and pushing the Send Message button, or hitting the Remove Recipient button on that particular friend’s tab in the Send Message window.

The message list is contained in the main VoIM window on the right side. Users can scroll down through the message list to listen to, delete, forward or reply to received messages. If a friend is selected from the friends list, only messages to and from that friend will be displayed in the message list.

The user interface has been the most flexible portion of the project because it least affected the backend methods and was, as a whole, purely cosmetic. Its design has changed quite a bit, and the ways users can interact with it have improved drastically since the first design. The controls for message manipulation have been simplified, and its look has become cleaner and less scattered.

IV.D – Server

The VoIM server keeps track of users in an ArrayList of User objects. A User object contains the user’s name as a String, the user’s user ID as an integer and an ArrayList of all the user’s friends as Friend objects. A Friend object contains the friend’s user ID as an integer and the friend’s username as a String. The server implements a PacketHandler class that controls any incoming packets from the client and performs a specific operation based upon the packet’s OpCode. The server also contains an ArrayList of OnlineUser objects, which keeps track of all the current online users and contains the users’ IDs as well as their current status of Online or Away. The server maintains an array of clientConnection threads of all the currently connected clients. The clientConnection threads contain the client’s ID, the user ID and the Socket that client is connected to. The clientConnection thread handles all packets that the client sends with an instance of the PacketHandler class.
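The server-side objects described above could be modeled along the following lines. The field names and the `usersToNotify` helper are hypothetical; in particular, the rule that a status change is pushed to every online user who lists that user as a friend is an assumption about how "applicable clients" are chosen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ServerModelSketch {

    enum Status { ONLINE, AWAY }

    /** An entry in somebody's friends list. */
    static final class Friend {
        final int userId;
        final String username;
        Friend(int userId, String username) {
            this.userId = userId;
            this.username = username;
        }
    }

    /** A registered user together with that user's friends list. */
    static final class User {
        final String name;
        final int userId;
        final ArrayList<Friend> friends = new ArrayList<>();
        User(String name, int userId) {
            this.name = name;
            this.userId = userId;
        }
        User addFriend(User other) {
            friends.add(new Friend(other.userId, other.name));
            return this;
        }
    }

    /** A currently connected user and that user's status. */
    static final class OnlineUser {
        final int userId;
        final Status status;
        OnlineUser(int userId, Status status) {
            this.userId = userId;
            this.status = status;
        }
    }

    /** IDs of online users who list the changed user as a friend (assumed rule). */
    static List<Integer> usersToNotify(int changedUserId, List<User> users, List<OnlineUser> online) {
        List<Integer> result = new ArrayList<>();
        for (OnlineUser o : online) {
            if (o.userId == changedUserId) {
                continue; // no need to notify the user whose status changed
            }
            for (User u : users) {
                if (u.userId != o.userId) {
                    continue;
                }
                for (Friend f : u.friends) {
                    if (f.userId == changedUserId) {
                        result.add(u.userId);
                        break;
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        User alice = new User("alice", 1);
        User bob = new User("bob", 2);
        User carol = new User("carol", 3);
        bob.addFriend(alice);
        carol.addFriend(alice);
        List<OnlineUser> online = Arrays.asList(
                new OnlineUser(1, Status.ONLINE), new OnlineUser(3, Status.AWAY));
        System.out.println(usersToNotify(1, Arrays.asList(alice, bob, carol), online));
    }
}
```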

Server Main Class

The server was among the easier portions of the project because, as a whole, it just handled data manipulation and redirection between the clients. Only one glaring bug surfaced during the creation of the server, which involved the server rejecting multiple clients. The server’s main purpose is to handle the interactions between users and to send their status updates to all applicable clients. Maintaining the lists of friends and the status updates between users is the most complex portion of the server and took the most planning.

V. Analysis

Most of the testing of this project was related to sound recording, encoding and playback. It was the largest hurdle to overcome and was by far the largest time sink as far as research, testing and implementation are concerned. We did extensive tests with various audio codecs and configurations to determine the best possible usage. While a number of the tests of various libraries and setups failed due to JavaSound’s poor design and implementation, as well as its lack of support, JSpeex finally gave us the closest results to what we had wanted from the start: small file size and good quality. Using it, however, was just as frustrating as using the other libraries we attempted, most of which ended up not working whatsoever.

Other testing involved running many instances of the client against the server until the server cap was reached, then confirming that the server responded correctly to the clients, namely by rejecting new ones. We also tested whether multiple clients could be run on the same machine, and accommodations were made to allow this. Clients were also tested to ensure that all windows would react properly when the different features were used, regardless of which windows happened to be open or what state the client was in, and that all windows would be properly closed when the client disconnected.
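The cap behavior exercised by these tests can be sketched as a small registry that refuses connections once it is full. The class name, the cap value and the use of a set of client IDs are assumptions; the report only states that the server rejects new clients once its cap is reached.

```java
import java.util.HashSet;
import java.util.Set;

final class ConnectionCapSketch {

    private final int maxClients;
    private final Set<Integer> connected = new HashSet<>();

    ConnectionCapSketch(int maxClients) {
        this.maxClients = maxClients;
    }

    /** Returns false when the server is full or the client ID is already connected. */
    synchronized boolean tryConnect(int clientId) {
        if (connected.size() >= maxClients) {
            return false;
        }
        return connected.add(clientId);
    }

    /** Frees the slot held by the given client, if any. */
    synchronized void disconnect(int clientId) {
        connected.remove(clientId);
    }

    synchronized int count() {
        return connected.size();
    }

    public static void main(String[] args) {
        ConnectionCapSketch server = new ConnectionCapSketch(2); // assumed cap of 2
        System.out.println(server.tryConnect(1));
        System.out.println(server.tryConnect(2));
        System.out.println(server.tryConnect(3)); // over the cap
        server.disconnect(1);
        System.out.println(server.tryConnect(3)); // a slot is free again
    }
}
```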

Extensive network analysis was done on the packets to ensure the clients and servers were sending and receiving them properly, as well as reacting in an appropriate manner. During this testing, we ironed out coding bugs that sometimes caused message packets to be misinterpreted as multiple packets. Also during this testing, some packets were restructured to allow for minor changes in design that we decided upon.

VI. Conclusion

The design of VoIM has been very flexible, and has much room for additional features and growth. The power it gives users to communicate how they want to, whenever and wherever they want to, is one of its best features. No longer do people have to guess what a person means when they read text, or miss a message because they were gone, late or busy. VoIM solves all these problems by allowing users to say what they mean, without ambiguity and without missing a message. VoIM users won’t have to sit at a phone waiting for a call, or stay at their computer waiting for whoever they’re communicating with to say something. They can get the message at their leisure, and in our convenience-driven world this is something everyone could use.

The design and creation of this project was definitely a learning experience, and it taught us that we should research our development resources more thoroughly before deciding which language and libraries to use. Java, with its lack of support, was a poor choice and proved to be more of a hindrance than a benefit. A future project could revolve around the creation of a more functional JavaSound library, which would benefit the entire Java development community.